Filter rows with same column value but IDs are not the biggest [duplicate] - sql

There is a table messages that contains data as shown below:
Id Name Other_Columns
-------------------------
1 A A_data_1
2 A A_data_2
3 A A_data_3
4 B B_data_1
5 B B_data_2
6 C C_data_1
If I run a query select * from messages group by name, I will get the result as:
1 A A_data_1
4 B B_data_1
6 C C_data_1
What query will return the following result?
3 A A_data_3
5 B B_data_2
6 C C_data_1
That is, the last record in each group should be returned.
At present, this is the query that I use:
SELECT
*
FROM (SELECT
*
FROM messages
ORDER BY id DESC) AS x
GROUP BY name
But this looks highly inefficient. Any other ways to achieve the same result?

MySQL 8.0 now supports windowing functions, like almost all popular SQL implementations. With this standard syntax, we can write greatest-n-per-group queries:
WITH ranked_messages AS (
SELECT m.*, ROW_NUMBER() OVER (PARTITION BY name ORDER BY id DESC) AS rn
FROM messages AS m
)
SELECT * FROM ranked_messages WHERE rn = 1;
This and other approaches to finding groupwise maximal rows are illustrated in the MySQL manual.
Below is the original answer I wrote for this question in 2009:
I write the solution this way:
SELECT m1.*
FROM messages m1 LEFT JOIN messages m2
ON (m1.name = m2.name AND m1.id < m2.id)
WHERE m2.id IS NULL;
Regarding performance, one solution or the other can be better, depending on the nature of your data, so you should test both queries and use the one that performs better for your database.
For example, I have a copy of the StackOverflow August data dump. I'll use that for benchmarking. There are 1,114,357 rows in the Posts table. This is running on MySQL 5.0.75 on my Macbook Pro 2.40GHz.
I'll write a query to find the most recent post for a given user ID (mine).
First, using the technique shown by @Eric with the GROUP BY in a subquery:
SELECT p1.postid
FROM Posts p1
INNER JOIN (SELECT pi.owneruserid, MAX(pi.postid) AS maxpostid
FROM Posts pi GROUP BY pi.owneruserid) p2
ON (p1.postid = p2.maxpostid)
WHERE p1.owneruserid = 20860;
1 row in set (1 min 17.89 sec)
Even the EXPLAIN analysis takes over 16 seconds:
+----+-------------+------------+--------+----------------------------+-------------+---------+--------------+---------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+--------+----------------------------+-------------+---------+--------------+---------+-------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 76756 | |
| 1 | PRIMARY | p1 | eq_ref | PRIMARY,PostId,OwnerUserId | PRIMARY | 8 | p2.maxpostid | 1 | Using where |
| 2 | DERIVED | pi | index | NULL | OwnerUserId | 8 | NULL | 1151268 | Using index |
+----+-------------+------------+--------+----------------------------+-------------+---------+--------------+---------+-------------+
3 rows in set (16.09 sec)
Now produce the same query result using my technique with LEFT JOIN:
SELECT p1.postid
FROM Posts p1 LEFT JOIN posts p2
ON (p1.owneruserid = p2.owneruserid AND p1.postid < p2.postid)
WHERE p2.postid IS NULL AND p1.owneruserid = 20860;
1 row in set (0.28 sec)
The EXPLAIN analysis shows that both tables are able to use their indexes:
+----+-------------+-------+------+----------------------------+-------------+---------+-------+------+--------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+----------------------------+-------------+---------+-------+------+--------------------------------------+
| 1 | SIMPLE | p1 | ref | OwnerUserId | OwnerUserId | 8 | const | 1384 | Using index |
| 1 | SIMPLE | p2 | ref | PRIMARY,PostId,OwnerUserId | OwnerUserId | 8 | const | 1384 | Using where; Using index; Not exists |
+----+-------------+-------+------+----------------------------+-------------+---------+-------+------+--------------------------------------+
2 rows in set (0.00 sec)
Here's the DDL for my Posts table:
CREATE TABLE `posts` (
`PostId` bigint(20) unsigned NOT NULL auto_increment,
`PostTypeId` bigint(20) unsigned NOT NULL,
`AcceptedAnswerId` bigint(20) unsigned default NULL,
`ParentId` bigint(20) unsigned default NULL,
`CreationDate` datetime NOT NULL,
`Score` int(11) NOT NULL default '0',
`ViewCount` int(11) NOT NULL default '0',
`Body` text NOT NULL,
`OwnerUserId` bigint(20) unsigned NOT NULL,
`OwnerDisplayName` varchar(40) default NULL,
`LastEditorUserId` bigint(20) unsigned default NULL,
`LastEditDate` datetime default NULL,
`LastActivityDate` datetime default NULL,
`Title` varchar(250) NOT NULL default '',
`Tags` varchar(150) NOT NULL default '',
`AnswerCount` int(11) NOT NULL default '0',
`CommentCount` int(11) NOT NULL default '0',
`FavoriteCount` int(11) NOT NULL default '0',
`ClosedDate` datetime default NULL,
PRIMARY KEY (`PostId`),
UNIQUE KEY `PostId` (`PostId`),
KEY `PostTypeId` (`PostTypeId`),
KEY `AcceptedAnswerId` (`AcceptedAnswerId`),
KEY `OwnerUserId` (`OwnerUserId`),
KEY `LastEditorUserId` (`LastEditorUserId`),
KEY `ParentId` (`ParentId`),
CONSTRAINT `posts_ibfk_1` FOREIGN KEY (`PostTypeId`) REFERENCES `posttypes` (`PostTypeId`)
) ENGINE=InnoDB;
Note to commenters: If you want another benchmark with a different version of MySQL, a different dataset, or different table design, feel free to do it yourself. I have shown the technique above. Stack Overflow is here to show you how to do software development work, not to do all the work for you.

UPD 2017-03-31: MySQL 5.7.5 made the ONLY_FULL_GROUP_BY switch enabled by default (hence, non-deterministic GROUP BY queries are now rejected). Moreover, the GROUP BY implementation was updated, and the solution might not work as expected anymore even with the switch disabled. One needs to check.
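To check whether the switch is active on your server before relying on such queries, a quick session-level test works (standard MySQL syntax):
SELECT @@sql_mode LIKE '%ONLY_FULL_GROUP_BY%';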
Bill Karwin's solution above works fine when the item count within groups is rather small, but the performance of the query becomes bad when the groups are rather large, since the solution requires about n*n/2 + n/2 IS NULL comparisons for a group of n rows.
I made my tests on an InnoDB table of 18,684,446 rows with 1,182 groups. The table contains test results for functional tests and has (test_id, request_id) as the primary key. Thus, test_id is a group and I was searching for the last request_id for each test_id.
Bill's solution had already been running for several hours on my Dell E4310 and I do not know when it was going to finish, even though it operates on a covering index (hence Using index in EXPLAIN).
I have a couple of other solutions that are based on the same ideas:
if the underlying index is a BTREE index (which is usually the case), the largest (group_id, item_value) pair is the last value within each group_id, that is, the first for each group_id if we walk through the index in descending order;
if we read values covered by an index, the values are read in the order of the index;
each secondary index implicitly contains the primary key columns appended to it (that is, the primary key is in the covering index). In the solutions below I operate directly on the primary key; in your case, you will just need to add the primary key columns to the result;
in many cases it is much cheaper to collect the required row ids in the required order in a subquery and join the result of the subquery on the id. Since for each row in the subquery result MySQL will need a single fetch based on the primary key, the subquery will be put first in the join and the rows will be output in the order of the ids in the subquery (if we omit an explicit ORDER BY for the join). A sketch of this pattern follows the article link below.
3 ways MySQL uses indexes is a great article to understand some details.
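As a concrete sketch of the last point above (the other_columns name is hypothetical; substitute your own non-key columns), collect the ids in a subquery and join back on the primary key:
SELECT t.test_id, t.request_id, t.other_columns
FROM (
    -- collect the (test_id, max request_id) pairs first
    SELECT test_id, MAX(request_id) AS request_id
    FROM testresults
    GROUP BY test_id
) AS ids
JOIN testresults t
    ON t.test_id = ids.test_id AND t.request_id = ids.request_id;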
Solution 1
This one is incredibly fast; it takes about 0.8 seconds on my 18M+ rows:
SELECT test_id, MAX(request_id) AS request_id
FROM testresults
GROUP BY test_id DESC;
If you want to change the order to ASC, put it in a subquery, return the ids only and use that as the subquery to join to the rest of the columns:
SELECT test_id, request_id
FROM (
SELECT test_id, MAX(request_id) AS request_id
FROM testresults
GROUP BY test_id DESC) as ids
ORDER BY test_id;
This one takes about 1.2 seconds on my data.
Solution 2
Here is another solution that takes about 19 seconds for my table:
SELECT test_id, request_id
FROM testresults, (SELECT @group:=NULL) as init
WHERE IF(IFNULL(@group, -1)=@group:=test_id, 0, 1)
ORDER BY test_id DESC, request_id DESC
It returns tests in descending order as well. It is much slower, since it does a full index scan, but it is here to give you an idea of how to output the N max rows for each group (a sketch of that generalization follows below).
The disadvantage of the query is that its result cannot be cached by the query cache.
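If you need the top N rows per group rather than one, the same user-variable idea generalizes. A sketch, assuming N = 2 and MySQL 5.x semantics (this style of variable assignment is deprecated in MySQL 8.0):
SELECT test_id, request_id
FROM (
    SELECT test_id, request_id,
           @rn := IF(@grp = test_id, @rn + 1, 1) AS rn, -- per-group counter
           @grp := test_id AS grp                       -- remember current group
    FROM testresults, (SELECT @grp := NULL, @rn := 0) AS init
    ORDER BY test_id DESC, request_id DESC
) AS ranked
WHERE rn <= 2;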

Use your subquery to return the correct grouping, because you're halfway there.
Try this:
select
a.*
from
messages a
inner join
(select name, max(id) as maxid from messages group by name) as b on
a.id = b.maxid
If it's not id you want the max of:
select
a.*
from
messages a
inner join
(select name, max(other_col) as other_col
from messages group by name) as b on
a.name = b.name
and a.other_col = b.other_col
This way, you avoid correlated subqueries and/or ordering in your subqueries, which tend to be very slow/inefficient.

I arrived at a different solution, which is to get the IDs for the last post within each group, then select from the messages table using the result from the first query as the argument for a WHERE x IN construct:
SELECT id, name, other_columns
FROM messages
WHERE id IN (
SELECT MAX(id)
FROM messages
GROUP BY name
);
I don't know how this performs compared to some of the other solutions, but it worked spectacularly for my table with 3+ million rows. (4 second execution with 1200+ results)
This should work both on MySQL and SQL Server.

Solution by subquery (fiddle link):
select * from messages where id in
(select max(id) from messages group by Name)
Solution by join condition (fiddle link):
select m1.* from messages m1
left outer join messages m2
on ( m1.id<m2.id and m1.name=m2.name )
where m2.id is null
The reason for this post is to give the fiddle links only; the same SQL is already provided in other answers.

An approach with considerable speed is as follows.
SELECT *
FROM messages a
WHERE Id = (SELECT MAX(Id) FROM messages WHERE a.Name = Name)
Result
Id Name Other_Columns
3 A A_data_3
5 B B_data_2
6 C C_data_1
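For this correlated subquery to stay fast on larger tables, an index covering both the correlation column and the aggregated column helps; a sketch (the index name is illustrative):
CREATE INDEX idx_messages_name_id ON messages (Name, Id);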

We will look at how you can use MySQL to get the last record in a GROUP BY of records. For example, suppose you have this result set of posts:
id  category_id  post_title
1   1            Title 1
2   1            Title 2
3   1            Title 3
4   2            Title 4
5   2            Title 5
6   3            Title 6
I want to be able to get the last post in each category, which are Title 3, Title 5 and Title 6. To get the posts grouped by category you might try the MySQL GROUP BY keyword:
select * from posts group by category_id
But the result we get back from this query is:
id  category_id  post_title
1   1            Title 1
4   2            Title 4
6   3            Title 6
This GROUP BY returns the first record in each group of the result set. To return the last (highest-id) record in each group instead, use a subquery:
SELECT id, category_id, post_title
FROM posts
WHERE id IN (
SELECT MAX(id)
FROM posts
GROUP BY category_id );
This will return the posts with the highest IDs in each group.
id  category_id  post_title
3   1            Title 3
5   2            Title 5
6   3            Title 6

Here are two suggestions. First, if your version of MySQL supports ROW_NUMBER(), it's very simple:
WITH Ranked AS (
SELECT Id, Name, OtherColumns,
ROW_NUMBER() OVER (
PARTITION BY Name
ORDER BY Id DESC
) AS rk
FROM messages
)
SELECT Id, Name, OtherColumns
FROM Ranked
WHERE rk = 1;
I'm assuming by "last" you mean last in Id order. If not, change the ORDER BY clause of the ROW_NUMBER() window accordingly. If ROW_NUMBER() isn't available, this is another solution:
Second, if it doesn't, this is often a good way to proceed:
SELECT
Id, Name, OtherColumns
FROM messages
WHERE NOT EXISTS (
SELECT * FROM messages as M2
WHERE M2.Name = messages.Name
AND M2.Id > messages.Id
)
In other words, select messages where there is no later-Id message with the same Name.

Clearly there are lots of different ways of getting the same results. Your question seems to be: what is an efficient way of getting the last results in each group in MySQL? If you are working with huge amounts of data, and assuming you are using InnoDB with even the latest versions of MySQL (such as 5.7.21 and 8.0.4-rc), then there might not be an efficient way of doing this.
We sometimes need to do this with tables with even more than 60 million rows.
For these examples I will use data with only about 1.5 million rows where the queries would need to find results for all groups in the data. In our actual cases we would often need to return back data from about 2,000 groups (which hypothetically would not require examining very much of the data).
I will use the following tables:
CREATE TABLE temperature(
id INT UNSIGNED NOT NULL AUTO_INCREMENT,
groupID INT UNSIGNED NOT NULL,
recordedTimestamp TIMESTAMP NOT NULL,
recordedValue INT NOT NULL,
INDEX groupIndex(groupID, recordedTimestamp),
PRIMARY KEY (id)
);
CREATE TEMPORARY TABLE selected_group(id INT UNSIGNED NOT NULL, PRIMARY KEY(id));
The temperature table is populated with about 1.5 million random records, and with 100 different groups.
The selected_group table is populated with those 100 groups (in our cases this would normally be less than 20% of all of the groups).
As this data is random it means that multiple rows can have the same recordedTimestamps. What we want is to get a list of all of the selected groups in order of groupID with the last recordedTimestamp for each group, and if the same group has more than one matching row like that then the last matching id of those rows.
If hypothetically MySQL had a last() function which returned values from the last row in a special ORDER BY clause then we could simply do:
SELECT
last(t1.id) AS id,
t1.groupID,
last(t1.recordedTimestamp) AS recordedTimestamp,
last(t1.recordedValue) AS recordedValue
FROM selected_group g
INNER JOIN temperature t1 ON t1.groupID = g.id
ORDER BY t1.recordedTimestamp, t1.id
GROUP BY t1.groupID;
which would only need to examine a few hundred rows in this case, as it doesn't use any of the normal GROUP BY functions. This would execute in 0 seconds and hence be highly efficient.
Note that normally in MySQL an ORDER BY clause follows the GROUP BY clause; here, however, the ORDER BY clause is used to determine the order for the last() function. If it came after the GROUP BY, it would be ordering the groups. If no GROUP BY clause were present, the last values would be the same in all of the returned rows.
However, MySQL does not have this, so let's look at different ideas using what it does have and show that none of these is efficient.
Example 1
SELECT t1.id, t1.groupID, t1.recordedTimestamp, t1.recordedValue
FROM selected_group g
INNER JOIN temperature t1 ON t1.id = (
SELECT t2.id
FROM temperature t2
WHERE t2.groupID = g.id
ORDER BY t2.recordedTimestamp DESC, t2.id DESC
LIMIT 1
);
This examined 3,009,254 rows and took ~0.859 seconds on 5.7.21 and slightly longer on 8.0.4-rc
Example 2
SELECT t1.id, t1.groupID, t1.recordedTimestamp, t1.recordedValue
FROM temperature t1
INNER JOIN (
SELECT max(t2.id) AS id
FROM temperature t2
INNER JOIN (
SELECT t3.groupID, max(t3.recordedTimestamp) AS recordedTimestamp
FROM selected_group g
INNER JOIN temperature t3 ON t3.groupID = g.id
GROUP BY t3.groupID
) t4 ON t4.groupID = t2.groupID AND t4.recordedTimestamp = t2.recordedTimestamp
GROUP BY t2.groupID
) t5 ON t5.id = t1.id;
This examined 1,505,331 rows and took ~1.25 seconds on 5.7.21 and slightly longer on 8.0.4-rc
Example 3
SELECT t1.id, t1.groupID, t1.recordedTimestamp, t1.recordedValue
FROM temperature t1
WHERE t1.id IN (
SELECT max(t2.id) AS id
FROM temperature t2
INNER JOIN (
SELECT t3.groupID, max(t3.recordedTimestamp) AS recordedTimestamp
FROM selected_group g
INNER JOIN temperature t3 ON t3.groupID = g.id
GROUP BY t3.groupID
) t4 ON t4.groupID = t2.groupID AND t4.recordedTimestamp = t2.recordedTimestamp
GROUP BY t2.groupID
)
ORDER BY t1.groupID;
This examined 3,009,685 rows and took ~1.95 seconds on 5.7.21 and slightly longer on 8.0.4-rc
Example 4
SELECT t1.id, t1.groupID, t1.recordedTimestamp, t1.recordedValue
FROM selected_group g
INNER JOIN temperature t1 ON t1.id = (
SELECT max(t2.id)
FROM temperature t2
WHERE t2.groupID = g.id AND t2.recordedTimestamp = (
SELECT max(t3.recordedTimestamp)
FROM temperature t3
WHERE t3.groupID = g.id
)
);
This examined 6,137,810 rows and took ~2.2 seconds on 5.7.21 and slightly longer on 8.0.4-rc
Example 5
SELECT t1.id, t1.groupID, t1.recordedTimestamp, t1.recordedValue
FROM (
SELECT
t2.id,
t2.groupID,
t2.recordedTimestamp,
t2.recordedValue,
row_number() OVER (
PARTITION BY t2.groupID ORDER BY t2.recordedTimestamp DESC, t2.id DESC
) AS rowNumber
FROM selected_group g
INNER JOIN temperature t2 ON t2.groupID = g.id
) t1 WHERE t1.rowNumber = 1;
This examined 6,017,808 rows and took ~4.2 seconds on 8.0.4-rc
Example 6
SELECT t1.id, t1.groupID, t1.recordedTimestamp, t1.recordedValue
FROM (
SELECT
last_value(t2.id) OVER w AS id,
t2.groupID,
last_value(t2.recordedTimestamp) OVER w AS recordedTimestamp,
last_value(t2.recordedValue) OVER w AS recordedValue
FROM selected_group g
INNER JOIN temperature t2 ON t2.groupID = g.id
WINDOW w AS (
PARTITION BY t2.groupID
ORDER BY t2.recordedTimestamp, t2.id
RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
)
) t1
GROUP BY t1.groupID;
This examined 6,017,908 rows and took ~17.5 seconds on 8.0.4-rc
Example 7
SELECT t1.id, t1.groupID, t1.recordedTimestamp, t1.recordedValue
FROM selected_group g
INNER JOIN temperature t1 ON t1.groupID = g.id
LEFT JOIN temperature t2
ON t2.groupID = g.id
AND (
t2.recordedTimestamp > t1.recordedTimestamp
OR (t2.recordedTimestamp = t1.recordedTimestamp AND t2.id > t1.id)
)
WHERE t2.id IS NULL
ORDER BY t1.groupID;
This one was taking forever so I had to kill it.

Here is another way to get the last related record, using GROUP_CONCAT with ORDER BY and SUBSTRING_INDEX to pick one of the records from the list:
SELECT
`Id`,
`Name`,
SUBSTRING_INDEX(
GROUP_CONCAT(
`Other_Columns`
ORDER BY `Id` DESC
SEPARATOR '||'
),
'||',
1
) Other_Columns
FROM
messages
GROUP BY `Name`
The above query will group all the Other_Columns that are in the same Name group and, using ORDER BY Id DESC, will join all the Other_Columns in each group in descending order with the provided separator (I have used ||). Applying SUBSTRING_INDEX over this list picks the first one.
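One caveat worth keeping in mind: GROUP_CONCAT truncates its result at group_concat_max_len (1024 bytes by default), so large groups can silently lose rows from the list. Raising the limit for the session is a one-liner:
SET SESSION group_concat_max_len = 1000000; -- value in bytes; adjust to your data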
Fiddle Demo

Hi @Vijay Dev, if your table messages contains Id as an auto-increment primary key, then to fetch the latest record based on the primary key your query should read as below:
SELECT m1.* FROM messages m1 INNER JOIN (SELECT max(Id) as lastmsgId FROM messages GROUP BY Name) m2 ON m1.Id=m2.lastmsgId

I've not yet tested with large DB but I think this could be faster than joining tables:
SELECT *, Max(Id) FROM messages GROUP BY Name

SELECT
column1,
column2
FROM
table_name
WHERE id IN
(SELECT
MAX(id)
FROM
table_name
GROUP BY column1)
ORDER BY column1 ;

You can view it here as well:
http://sqlfiddle.com/#!9/ef42b/9
FIRST SOLUTION
SELECT d1.ID,Name,City FROM Demo_User d1
INNER JOIN
(SELECT MAX(ID) AS ID FROM Demo_User GROUP By NAME) AS P ON (d1.ID=P.ID);
SECOND SOLUTION
SELECT * FROM (SELECT * FROM Demo_User ORDER BY ID DESC) AS T GROUP BY NAME;

If you need the most recent or oldest record of a text column in a grouped query, and you would rather not use a subquery, you can do this...
For example: you have a list of movies and need to get the count in the series and the latest movie.
id  series     name
1   Star Wars  A New Hope
2   Star Wars  The Empire Strikes Back
3   Star Wars  Return of The Jedi
SELECT COUNT(id), series, SUBSTRING(MAX(CONCAT(id, name)), LENGTH(id) + 1)
FROM Movies
GROUP BY series
This returns...
id  series     name
3   Star Wars  Return of The Jedi
MAX returns the highest value, so by concatenating the id to the name you get the newest record; then just strip off the id for your final result.
More efficient than using a subquery.
So for the given example:
SELECT MAX(Id), Name, SUBSTRING(MAX(CONCAT(Id, Other_Columns)), LENGTH(Id) + 1)
FROM messages
GROUP BY Name
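One caveat: CONCAT compares the combined value as a string, so an Id of 9 would sort after an Id of 10. A sketch of a zero-padded variant that avoids this (assuming Ids never exceed 10 digits):
SELECT MAX(Id), Name, SUBSTRING(MAX(CONCAT(LPAD(Id, 10, '0'), Other_Columns)), 11)
FROM messages
GROUP BY Name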
Happy coding, and "May The Force Be With You" :)

Try this:
SELECT jos_categories.title AS name,
joined.catid,
joined.title,
joined.introtext
FROM jos_categories
INNER JOIN (SELECT *
FROM (SELECT `title`,
catid,
`created`,
introtext
FROM `jos_content`
WHERE `sectionid` = 6
ORDER BY `id` DESC) AS yes
GROUP BY `yes`.`catid` DESC
ORDER BY `yes`.`created` DESC) AS joined
ON (joined.catid = jos_categories.id)

Here is my solution, using a window MAX over each Name partition:
SELECT
DISTINCT NAME,
MAX(ID) OVER(PARTITION BY NAME) AS LAST_ID
FROM MESSAGES;

SELECT * FROM table_name WHERE primary_key IN (SELECT MAX(primary_key) FROM table_name GROUP BY column_name )

This query might help:
SELECT
*
FROM
message
WHERE
`Id` IN (
SELECT
MAX(`Id`)
FROM
message
GROUP BY
`Name`
)
ORDER BY
`Id` DESC

I found the best solution in https://dzone.com/articles/get-last-record-in-each-mysql-group:
select * from `data` where `id` in (select max(`id`) from `data` group by `name_id`)

The below query will work fine as per your question.
SELECT M1.*
FROM MESSAGES M1,
(
SELECT SUBSTR(Others_data,1,2),MAX(Others_data) AS Max_Others_data
FROM MESSAGES
GROUP BY 1
) M2
WHERE M1.Others_data = M2.Max_Others_data
ORDER BY Others_data;

If you want the last row for each Name, then you can give a row number to each row group by the Name and order by Id in descending order.
QUERY
SELECT t1.Id,
t1.Name,
t1.Other_Columns
FROM
(
SELECT Id,
Name,
Other_Columns,
(
CASE Name WHEN @curA
THEN @curRow := @curRow + 1
ELSE @curRow := 1 AND @curA := Name END
) + 1 AS rn
FROM messages t,
(SELECT @curRow := 0, @curA := '') r
ORDER BY Name,Id DESC
)t1
WHERE t1.rn = 1
ORDER BY t1.Id;
SQL Fiddle

If performance is really your concern, you can introduce a new column on the table called IsLastInGroup, of type BIT.
Set it to true on the rows which are last in their group, and maintain it on every row insert/update/delete. Writes will be slower, but you'll benefit on reads. It depends on your use case, and I recommend it only if you're read-focused. A sketch of the maintenance follows the query below.
So your query will look like:
SELECT * FROM Messages WHERE IsLastInGroup = 1
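A minimal sketch of the write-side maintenance this implies, assuming the messages schema from the question (a trigger could encapsulate the same logic):
-- Clear the flag on the group's previous last row, then insert the new last row.
UPDATE Messages SET IsLastInGroup = 0 WHERE Name = 'A' AND IsLastInGroup = 1;
INSERT INTO Messages (Name, Other_Columns, IsLastInGroup) VALUES ('A', 'A_data_4', 1);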

MariaDB 10.3 and newer using GROUP_CONCAT.
The idea is to use ORDER BY + LIMIT:
SELECT GROUP_CONCAT(id ORDER BY id DESC LIMIT 1) AS id,
name,
GROUP_CONCAT(Other_columns ORDER BY id DESC LIMIT 1) AS Other_columns
FROM t
GROUP BY name;
db<>fiddle demo

How about this:
SELECT DISTINCT ON (name) *
FROM messages
ORDER BY name, id DESC;
I had a similar issue (on PostgreSQL, though; note that DISTINCT ON is PostgreSQL-specific syntax) on a 1M-record table. This solution takes 1.7 s vs 44 s for the one with LEFT JOIN.
In my case I had to filter the counterpart of your name field against NULL values, resulting in even better performance, 0.2 s.
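For reference, a sketch of that filtered variant (PostgreSQL-only syntax; the NULL filter is assumed to apply to name):
SELECT DISTINCT ON (name) *
FROM messages
WHERE name IS NOT NULL
ORDER BY name, id DESC;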

Yet another option without subqueries.
This solution uses the MySQL LAST_VALUE window function, exploiting the explicit window frame clause available in MySQL from version 8.0.
SELECT DISTINCT
LAST_VALUE(Id)
OVER(PARTITION BY Name
ORDER BY Id
ROWS BETWEEN 0 PRECEDING
AND UNBOUNDED FOLLOWING),
Name,
LAST_VALUE(Other_Columns)
OVER(PARTITION BY Name
ORDER BY Id
ROWS BETWEEN 0 PRECEDING
AND UNBOUNDED FOLLOWING)
FROM
tab
Try it here.

Hope the below Oracle query can help:
WITH Temp_table AS
(
SELECT id, name, othercolumns,
ROW_NUMBER() OVER (PARTITION BY name ORDER BY id DESC) AS rank
FROM messages
)
SELECT id, name, othercolumns FROM Temp_table WHERE rank = 1

Another approach: find the property with the max m2_price within each program (n properties in 1 program):
select p.*
from properties p
join (
select program_id, max(m2_price) as max_price
from properties
group by program_id
) p2 on p.program_id = p2.program_id and p.m2_price = p2.max_price

What about:
select *, max(id) from messages group by name
I have tested it on SQLite, and it returns all columns plus the max id value for each name. (SQLite documents that bare columns in a query with a single MAX() aggregate come from the row holding the maximum; other databases do not guarantee this.)

As of MySQL 8.0.14, this can also be achieved using Lateral Derived Tables:
SELECT t.*
FROM messages t
JOIN LATERAL (
SELECT name, MAX(id) AS id
FROM messages t1
WHERE t.name = t1.name
GROUP BY name
) trn ON t.name = trn.name AND t.id = trn.id
db<>fiddle

Related

Select distinct value and bring only the latest one

I have a table that stores different statuses of each transaction. Each transaction can have multiple statuses (pending, rejected, approved, etc.).
I need to build a query that brings only the last status of each transaction.
The definition for the table that stores the statuses is:
[dbo].[Cuotas_Estado]
ID int (PK)
IdCuota int (references table dbo.Cuotas - FK)
IdEstado int (references table dbo.Estados - FK)
Here's the architecture for the 3 tables:
When running a simple SELECT statement on table dbo.Cuotas_Estado you'll get:
SELECT
*
FROM [dbo].[Cuotas_Estado] [E]
But the result I need is:
IdCuota | IdEstado
2 | 1
3 | 2
9 | 3
10 | 3
11 | 4
I'm running the following select statement:
SELECT
DISTINCT([E].[IdEstado]),
[E].[IdCuota]
FROM [dbo].[Cuotas_Estado] [E]
ORDER BY
[E].[IdCuota] ASC;
This will bring this result:
So, as you can see, it's bringing duplicate values for entries 9 and 11; I need the query to bring only the latest IdEstado (3 for entry 9 and 4 for entry 11).
Can you try this?
with cte as (
select IdEstado,IdCuota,
row_number() over(partition by IdCuota order by fecha desc) as RowNum
from [dbo].[Cuotas_Estado]
)
select IdEstado,IdCuota
from cte
where RowNum = 1
You can use a correlated subquery:
SELECT e.*
FROM [dbo].[Cuotas_Estado] e
WHERE e.IdEstado = (SELECT MAX(e2.IdEstado)
FROM [dbo].[Cuotas_Estado] e2
WHERE e2.IdCuota = e.IdCuota
);
With an index on Cuotas_Estado(IdCuota, IdEstado) this is probably the most efficient method.
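A sketch of that index in T-SQL (the index name is illustrative):
CREATE INDEX IX_CuotasEstado_IdCuota_IdEstado ON dbo.Cuotas_Estado (IdCuota, IdEstado);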

Get top values from two columns

Let's say I have a table like this:
id | peru | usa
1  | 20   | 10
2  | 5    | 100
3  | 1    | 5
How can I get the top values from peru and usa as well as the specific ids, so that I get this result:
usa_id: 2 | usa: 100 | peru_id: 1 | peru: 20
Is this possible in one query, or do I have to do two ORDER BY queries?
I'm using PostgreSQL.
You can do this with some subqueries and a cross join:
select
u.id usa_id,
u.usa,
p.id peru_id,
p.peru
from
(select id, usa from mytable where usa=(select max(usa) from mytable) order by id limit 1) u
cross join (select id, peru from mytable where peru=(select max(peru) from mytable) order by id limit 1) p
;
In the case that there are multiple rows with the same max value (for usa or peru, independently), this solution will select the one with the lowest id (I've assumed that id is unique).
SELECT
t1.id as peru_id, t1.peru
, t2.id as usa_id, t2.usa
FROM tab1 t1, tab1 t2
ORDER BY t1.peru desc, t2.usa desc
limit 1
http://sqlfiddle.com/#!15/0c12f/6
Basically, what this does is a simple cartesian product, so I guess performance will be poor for large datasets.
On the fiddle it took 196 ms for a 1k-row table; on a 10k-row table, SQL Fiddle hung up.
You can consider using the MAX aggregate function in conjunction with the ARRAY type. Check this out:
CREATE TEMPORARY TABLE _test(
id integer primary key,
peru integer not null,
usa integer not null
);
INSERT INTO _test(id, peru, usa)
VALUES
(1,20,10),
(2,5,100),
(3,1,5);
SELECT MAX(ARRAY[peru, id]) AS max_peru, MAX(array[usa, id]) AS max_usa FROM _test;
SELECT x.max_peru[1] AS peru, x.max_peru[2] AS peru_id, x.max_usa[1]
AS usa, x.max_usa[2] AS usa_id FROM (
SELECT MAX(array[peru, id]) AS max_peru,
MAX(array[usa, id]) AS max_usa FROM _test ) as x;

Avoid useless subqueries or aggregations when joining and grouping

I have two tables, room and message, in a chat database :
CREATE TABLE room (
id serial primary key,
name varchar(50) UNIQUE NOT NULL,
private boolean NOT NULL default false,
description text NOT NULL
);
CREATE TABLE message (
id bigserial primary key,
room integer references room(id),
author integer references player(id),
created integer NOT NULL
);
Let's say I want to get the rooms with the number of messages from a user and the date of the most recent message:
id | number | last_created | description | name | private
----+--------+--------------+-------------+------------------+---------
2 | 1149 | 1391703964 | | Dragons & co | t
8 | 136 | 1391699600 | | Javascript | f
10 | 71 | 1391684998 | | WBT | t
1 | 86 | 1391682712 | | Miaou | f
3 | 423 | 1391681764 | | Code & Baguettes | f
...
I see two solutions:
1) selecting/grouping on the messages and using subqueries to get the room columns:
select m.room as id, count(*) number, max(created) last_created,
(select name from room where room.id=m.room),
(select description from room where room.id=m.room),
(select private from room where room.id=m.room)
from message m where author=$1 group by room order by last_created desc limit 10
This makes 3 almost identical subqueries. It looks very dirty. I could reverse it to do only 2 subqueries on message columns, but it wouldn't be much better.
2) selecting on both tables and using aggregate functions for all columns:
select room.id, count(*) number, max(created) last_created,
max(name) as name, max(description) as description, bool_or(private) as private
from message, room
where message.room=room.id and author=$1
group by room.id order by last_created desc limit 10
All those aggregate functions look messy and useless.
Is there a clean solution here?
It looks like a general problem to me. Theoretically, those aggregate functions are useless as, by construct, all the joined rows are the same row. I'd like to know if there's a general solution.
Try performing the grouping in a subquery:
select m.id, m.number, m.last_created, r.name, r.description, r.private
from (
select m.room as id, count(*) number, max(created) last_created
from message m
where author=$1
group by room
) m
join room r
on r.id = m.id
order by m.last_created desc limit 10
Edit: Another option (likely with similar performance) is to move that aggregation into a view, something like:
create view MessagesByRoom
as
select m.author, m.room, count(*) number, max(created) last_created
from message m
group by author, room
And then use it like:
select m.room, m.number, m.last_created, r.name, r.description, r.private
from MessagesByRoom m
join room r
on r.id = m.room
where m.author = $1
order by m.last_created desc limit 10
Maybe use a join?
SELECT
r.id, count(*) number_of_posts,
max(m.created) last_created,
r.name, r.description, r.private
FROM room r
JOIN message m on r.id = m.room
WHERE m.author = $1
GROUP BY r.id
ORDER BY last_created desc
You can include the columns in the group by:
select room.id, count(*) number, max(message.created) last_created,
room.name, room.description, room.private
from message join
room
on message.room=room.id and author=$1
group by room.id, name, description, private
order by last_created desc
limit 10;
EDIT:
This query will work in more recent versions of Postgres:
select room.id, count(*) number, max(message.created) last_created,
room.name, room.description, room.private
from message join
room
on message.room=room.id and author=$1
group by room.id
order by last_created desc
limit 10;
Earlier versions of the documentation are pretty clear that you would need to include all the columns:
When GROUP BY is present, it is not valid for the SELECT list
expressions to refer to ungrouped columns except within aggregate
functions, since there would be more than one possible value to return
for an ungrouped column.
The ANSI standard actually does allow the above query with just group by room.id. This is a rather recent addition to the functionality of databases that support it.

Select data from a table where only the first two columns are distinct

Background
I have a table which has six columns. The first three columns create the pk. I'm tasked with removing one of the pk columns.
I selected (using distinct) the data into a temp table (excluding the third column), and tried inserting all of that data back into the original table with the third column being '11' for every row as this is what I was instructed to do. (this column is going to be removed by a DBA after I do this)
However, when I went to insert this data back into the original table I get a pk constraint error. (shocking, I know)
The other three columns are just date columns, so the distinct select didn't create a unique pk for each record. What I'm trying to achieve is just calling a distinct on the first two columns, and then just arbitrarily selecting the three other columns as it doesn't matter which dates I choose (at least not on dev).
What I've tried
I found the following post which seems to achieve what I want:
How do I (or can I) SELECT DISTINCT on multiple columns?
I tried the answers from both Joel and Erwin.
Attempt 1:
However, with Joel's answer the returned set is too large; the inner join isn't doing what I thought it would do. Selecting distinct col1 and col2 returns 400 rows, but when I use his solution 600 rows are returned. I checked the data, and in fact there were duplicate pks. Here is my attempt at duplicating Joel's answer:
select a.emp_no,
a.eec_planning_unit_cde,
'11' as area, create_dte,
create_by_emp_no, modify_dte,
modify_by_emp_no
from tempdb.guest.temp_part_time_evaluator b
inner join
(
select emp_no, eec_planning_unit_cde
from tempdb.guest.temp_part_time_evaluator
group by emp_no, eec_planning_unit_cde
) a
ON b.emp_no = a.emp_no AND b.eec_planning_unit_cde = a.eec_planning_unit_cde
Now, if I execute just the inner select statement 400 rows are returned. If I select the whole query 600 rows are returned? Isn't inner join supposed to only show the intersection of the two sets?
Attempt 2:
I also tried the answer from Erwin. This one has a syntax error, and I'm having trouble googling the spec on the WHERE clause (specifically, the trick he is using with (emp_no, eec_planning_unit_cde)).
Here is the attempt:
select emp_no,
eec_planning_unit_cde,
'11' as area, create_dte,
create_by_emp_no,
modify_dte,
modify_by_emp_no
from tempdb.guest.temp_part_time_evaluator
where (emp_no, eec_planning_unit_cde) IN
(
select emp_no, eec_planning_unit_cde
from tempdb.guest.temp_part_time_evaluator
group by emp_no, eec_planning_unit_cde
)
Now, I realize that the post I referenced is for postgresql. Doesn't T-SQL have something similar? Trying to google parenthesis isn't working too well.
Overview of Questions:
Why doesn't inner join return an intersection of two sets? From googling this is what I thought it was supposed to do
Is there another way to achieve the same method that I was trying in attempt 2 in t-sql?
It doesn't matter to me which one of these I use, or if I use another solution... how should I go about this?
A SELECT DISTINCT is based on all columns, so it does not guarantee the first two to be distinct. Instead, group by the two key columns and take arbitrary (MAX) values for the others:
select pk1, pk2, '11', max(c1), max(c2), max(c3)
from table
group by pk1, pk2
You could TRY this:
SELECT a.emp_no,
a.eec_planning_unit_cde,
'11' as area,
b.create_dte,
b.create_by_emp_no,
b.modify_dte,
b.modify_by_emp_no
FROM
(
SELECT emp_no, eec_planning_unit_cde
FROM tempdb.guest.temp_part_time_evaluator
GROUP BY emp_no, eec_planning_unit_cde
) a
JOIN tempdb.guest.temp_part_time_evaluator b
ON a.emp_no = b.emp_no AND a.eec_planning_unit_cde = b.eec_planning_unit_cde
That would give you a distinct on those fields, but if there are differences in the data between the other columns you might have to try a more brute-force approach:
SELECT a.emp_no,
a.eec_planning_unit_cde,
a.area,
a.create_dte,
a.create_by_emp_no,
a.modify_dte,
a.modify_by_emp_no
FROM
(
SELECT ROW_NUMBER() OVER(PARTITION BY emp_no, eec_planning_unit_cde
ORDER BY emp_no, eec_planning_unit_cde) rownumber,
emp_no,
eec_planning_unit_cde,
'11' as area,
create_dte,
create_by_emp_no,
modify_dte,
modify_by_emp_no
FROM tempdb.guest.temp_part_time_evaluator
) a
WHERE rownumber = 1
I'll reply one by one:
Why doesn't inner join return an intersection of two sets? From googling this is what I thought it was supposed to do
An inner join doesn't do an intersection. Let's suppose these tables:
T1       T2
n  s     n  s
1  A     2  X
2  B     2  Y
2  C
3  D
If you join both tables by the numeric column, you don't get the intersection (2 rows). You get:
select *
from t1 inner join t2
on t1.n = t2.n;
| N | S |
---------
| 2 | B |
| 2 | B |
| 2 | C |
| 2 | C |
And, your second query approach:
select *
from t1
where t1.n in (select n from t2);
| N | S |
---------
| 2 | B |
| 2 | C |
Is there another way to achieve the same method that I was trying in attempt 2 in t-sql?
Yes, an equivalent EXISTS subquery:
select *
from t1
where exists (
select 1
from t2
where t2.n = t1.n
);
It doesn't matter to me which one of these I use, or if I use another solution... how should I go about this?
Yes, using @JTC's second query.

How to find "holes" in a table

I recently inherited a database on which one of the tables has the primary key composed of encoded values (Part1*1000 + Part2).
I normalized that column, but I cannot change the old values.
So now I have
select ID from table order by ID
ID
100001
100002
101001
...
I want to find the "holes" in the table (more precisely, the first "hole" after 100000) for new rows.
I'm using the following select, but is there a better way to do that?
select /* top 1 */ ID+1 as newID from table
where ID > 100000 and
ID + 1 not in (select ID from table)
order by ID
newID
100003
101029
...
The database is Microsoft SQL Server 2000. I'm ok with using SQL extensions.
select ID +1 From Table t1
where not exists (select * from Table t2 where t1.id +1 = t2.id);
not sure if this version would be faster than the one you mentioned originally.
SELECT (ID+1) FROM table AS t1
LEFT JOIN table as t2
ON t1.ID+1 = t2.ID
WHERE t2.ID IS NULL
This solution should give you the first and last ID values of the "holes" you are seeking. I use this in Firebird 1.5 on a table of 500K records, and although it does take a little while, it gives me what I want.
SELECT l.id + 1 start_id, MIN(fr.id) - 1 stop_id
FROM (table l
LEFT JOIN table r
ON l.id = r.id - 1)
LEFT JOIN table fr
ON l.id < fr.id
WHERE r.id IS NULL AND fr.id IS NOT NULL
GROUP BY l.id, r.id
For example, if your data looks like this:
ID
1001
1002
1005
1006
1007
1009
1011
You would receive this:
start_id stop_id
1003 1004
1008 1008
1010 1010
I wish I could take full credit for this solution, but I found it at Xaprb.
from How do I find a "gap" in running counter with SQL?
select
MIN(ID)
from (
select
100001 ID
union all
select
[YourIdColumn]+1
from
[YourTable]
where
--Filter the rest of your key--
) foo
left join
[YourTable]
on [YourIdColumn]=ID
and --Filter the rest of your key--
where
[YourIdColumn] is null
The best way is building a temp table with all IDs, then doing a left join.
declare @maxId int
select @maxId = max(YOUR_COLUMN_ID) from YOUR_TABLE_HERE
declare @t table (id int)
declare @i int
set @i = 1
while @i <= @maxId
begin
insert into @t values (@i)
set @i = @i + 1
end
select t.id
from @t t
left join YOUR_TABLE_HERE x on x.YOUR_COLUMN_ID = t.id
where x.YOUR_COLUMN_ID is null
I have thought about this question recently, and it looks like this is the most elegant way to do it:
SELECT TOP(@MaxNumber) ROW_NUMBER() OVER (ORDER BY t1.number)
FROM master..spt_values t1 CROSS JOIN master..spt_values t2
EXCEPT
SELECT Id FROM <your_table>
This solution doesn't give all the holes in the table, only the next free ones plus the first available number above the table's max; it works if you want to fill in gaps in ids, and also gives you a free id number when there is no gap:
select numb + 1 from temp
minus
select numb from temp;
This will give you the complete picture, where 'Bottom' stands for gap start and 'Top' stands for gap end:
select *
from
(
(select <COL>+1 as id, 'Bottom' AS 'Pos' from <TABLENAME> /*where <CONDITION>*/
except
select <COL>, 'Bottom' AS 'Pos' from <TABLENAME> /*where <CONDITION>*/)
union
(select <COL>-1 as id, 'Top' AS 'Pos' from <TABLENAME> /*where <CONDITION>*/
except
select <COL>, 'Top' AS 'Pos' from <TABLENAME> /*where <CONDITION>*/)
) t
order by t.id, t.Pos
Note: First and Last results are WRONG and should not be regarded, but taking them out would make this query a lot more complicated, so this will do for now.
Many of the previous answers are quite good. However, they all fail to return the first value of the sequence and/or fail to consider the lower limit 100000. They all return intermediate holes but not the very first one (100001, if missing).
A full solution to the question is the following one:
select id + 1 as newid from
(select 100000 as id union select id from tbl) t
where (id + 1 not in (select id from tbl)) and
(id >= 100000)
order by id
limit 1;
The number 100000 is to be used if the first number of the sequence is 100001 (as in the original question); otherwise it is to be modified accordingly
"limit 1" is used in order to have just the first available number instead of the full sequence
For people using Oracle, the following can be used:
select a, b from (
select ID + 1 a, max(ID) over (order by ID rows between current row and 1 following) - 1 b from MY_TABLE
) where a <= b order by a desc;
The following SQL code works well with SQLite, and should also run without issues on MySQL, MS SQL, and so on.
On SQLite this takes only 2 seconds on a table with 1 million rows (and about 100 scattered missing rows).
WITH holes AS (
SELECT
IIF(c2.id IS NULL,c1.id+1,null) as start,
IIF(c3.id IS NULL,c1.id-1,null) AS stop,
ROW_NUMBER () OVER (
ORDER BY c1.id ASC
) AS rowNum
FROM |mytable| AS c1
LEFT JOIN |mytable| AS c2 ON c1.id+1 = c2.id
LEFT JOIN |mytable| AS c3 ON c1.id-1 = c3.id
WHERE c2.id IS NULL OR c3.id IS NULL
)
SELECT h1.start AS start, h2.stop AS stop FROM holes AS h1
LEFT JOIN holes AS h2 ON h1.rowNum+1 = h2.rowNum
WHERE h1.start IS NOT NULL AND h2.stop IS NOT NULL
UNION ALL
SELECT 1 AS start, h1.stop AS stop FROM holes AS h1
WHERE h1.rowNum = 1 AND h1.stop > 0
ORDER BY start ASC