Task: For each article, find the dealer or dealers with the most expensive price.
In standard SQL (and as of MySQL 4.1), the problem can be solved with a subquery like this:
SELECT article, dealer, price
FROM   shop s1
WHERE  price=(SELECT MAX(s2.price)
              FROM shop s2
              WHERE s1.article = s2.article);
        The preceding example uses a correlated subquery, which can be
        inefficient (see Section 12.2.8.7, “Correlated Subqueries”). Other
        possibilities for solving the problem are to use an uncorrelated
        subquery in the FROM clause or a
        LEFT JOIN:
      
SELECT s1.article, dealer, s1.price FROM shop s1 JOIN ( SELECT article, MAX(price) AS price FROM shop GROUP BY article) AS s2 ON s1.article = s2.article AND s1.price = s2.price; SELECT s1.article, s1.dealer, s1.price FROM shop s1 LEFT JOIN shop s2 ON s1.article = s2.article AND s1.price < s2.price WHERE s2.article IS NULL;
        The LEFT JOIN works on the basis that when
        s1.price is at its maximum value, there is no
        s2.price with a greater value and the
        s2 rows values will be
        NULL. See Section 12.2.7.1, “JOIN Syntax”.
      
Before MySQL 4.1, subqueries are unavailable. Another approach is to solve the problem in several steps:
Get the list of (article,maxprice) pairs.
For each article, get the corresponding rows that have the stored maximum price.
This can easily be done with a temporary table and a join:
CREATE TEMPORARY TABLE tmp (
        article INT(4) UNSIGNED ZEROFILL DEFAULT '0000' NOT NULL,
        price   DOUBLE(16,2)             DEFAULT '0.00' NOT NULL);
LOCK TABLES shop READ;
INSERT INTO tmp SELECT article, MAX(price) FROM shop GROUP BY article;
SELECT shop.article, dealer, shop.price FROM shop, tmp
WHERE shop.article=tmp.article AND shop.price=tmp.price;
UNLOCK TABLES;
DROP TABLE tmp;
        If you don't use a TEMPORARY table, you must
        also lock the tmp table.
      
“Can it be done with a single query?”
Yes, but only by using a quite inefficient trick called the “MAX-CONCAT trick”:
SELECT article,
       SUBSTRING( MAX( CONCAT(LPAD(price,6,'0'),dealer) ), 7) AS dealer,
  0.00+LEFT(      MAX( CONCAT(LPAD(price,6,'0'),dealer) ), 6) AS price
FROM   shop
GROUP BY article;
+---------+--------+-------+
| article | dealer | price |
+---------+--------+-------+
|    0001 | B      |  3.99 |
|    0002 | A      | 10.99 |
|    0003 | C      |  1.69 |
|    0004 | D      | 19.95 |
+---------+--------+-------+
The last example can be made a bit more efficient by doing the splitting of the concatenated column in the client.


User Comments
The examples here by MySQL are not that helpful if you are working with temporary tables and need to add conditions to the query... like prices only from this year, for instance.
The first example falls down with temporary tables because you can't select from the same temp table twice in the same query (get "can't reopen table" error). It also doesn't give you just distinct articles, if that's what you were after. It will show the same article twice in the case where you have 2 articles from different dealers for the same price.
The second example is both difficult to understand and cumbersome to manage if you have extra query conditions (i.e. don't want to select the whole table). With large tables, you will end up needing to repeat query conditions in inner and outer queries (or at least in the LEFT JOIN ON part) for query speed.
I found another method for group-wise max that is a bit more straightforward and less duplicating of conditions:
SELECT *
FROM (SELECT *
FROM shop
[WHERE conditions]
ORDER BY price DESC) AS s
GROUP BY article
The inner query orders all records by highest price. The outer query uses a group by, which simply grabs the first distinct article that it finds. In this case the first one it finds will be the highest one since we ordered them that way.
This query is probably not as efficient as others, but it's straightforward, and it's easier on query developers when they need to use temp tables or needed to query on less than an entire table. It's only slightly less efficient than the queries given by MySQL on this page, in my tests.
Speakman: Your solution sounds good, but is it stable?
The manual tells that non-grouped fields have an indeterminate value if not all values of rows are the same.
Right now the value at the first encountered row may be used, but may it happen in the future releases that other random row will be selected? It would be nice to hear an insider's opinion.
I suggest the following syntax which orders by column1 as usual but returns the corresponding column2 when given:
max( column1 [, column2 ] )
Speakman's solution ("group by trick", as I call it) is stable and far more efficient than the methods given in the article.
This is Order(N*logN); the article is Order(N*N). That is, as the number of rows grows, the article's methods slow down quadratically -- a million rows would require a trillion operations.
Add your own comment.