How to do this data transformation - SQL

This is my input data
GroupId Serial Action
1       1      Start
1       2      Run
1       3      Jump
1       8      End
2       9      Shop
2       10     Start
2       11     Run
For each activity sequence in a group, I want to find pairs of actions where Action1.SerialNo = Action2.SerialNo + k, and how many times that happens.
Suppose k = 1, then output will be
FirstAction NextAction Frequency
Start       Run        2
Run         Jump       1
Shop        Start      1
How can I do this in SQL, fast enough given that the input table contains millions of entries?

This should produce the result you want, but I don't know if it will be as fast as you'd like. It's worth a try.
create table Actions(
GroupId int,
Serial int,
"Action" varchar(20) not null,
primary key (GroupId, Serial)
);
insert into Actions values
(1,1,'Start'), (1,2,'Run'), (1,3,'Jump'),
(1,8,'End'), (2,9,'Shop'), (2,10,'Start'),
(2,11,'Run');
go
declare @k int = 1;
with ActionsDoubled(GroupId,Serial,Tag,"Action") as (
select
GroupId, Serial, 'a', "Action"
from Actions as A
union all
select
GroupId, Serial-@k, 'b', "Action"
from Actions as B
), Pivoted(GroupId,Serial,a,b) as (
select GroupId,Serial,a,b
from ActionsDoubled
pivot (
max("Action") for Tag in ([a],[b])
) as P
)
select
a, b, count(*) as ct
from Pivoted
where a is not NULL and b is not NULL
group by a,b
order by a,b;
go
drop table Actions;
If you will be doing the same computation for various @k values on stable data, this may work better in the long run:
declare @k int = 1;
select
GroupId, Serial, 'a' as Tag, "Action"
into ActionsDoubled
from Actions as A
union all
select
GroupId, Serial-@k, 'b', "Action"
from Actions as B;
go
create unique clustered index AD_S on ActionsDoubled(GroupId,Serial,Tag);
create index AD_a on ActionsDoubled(Tag,GroupId,Serial);
go
with Pivoted(GroupId,Serial,a,b) as (
select GroupId,Serial,a,b
from ActionsDoubled
pivot (
max("Action") for Tag in ([a],[b])
) as P
)
select
a, b, count(*) as ct
from Pivoted
where a is not NULL and b is not NULL
group by a,b
order by a,b;
go
drop table ActionsDoubled;

SELECT a1.Action AS FirstAction, a2.Action AS NextAction, COUNT(*) AS Frequency
FROM Activities a1 JOIN Activities a2
ON (a1.GroupId = a2.GroupId AND a1.Serial = a2.Serial + @k)
GROUP BY a1.Action, a2.Action;

The problem is this: Your query has to go through EVERY row regardless.
You can make it more manageable for your database by tackling each group separately as separate queries. Especially if the size of each group is SMALL.
There's a lot going on under the hood, and when the query has to scan the entire table, it can end up many times slower than running small chunks that together cover all million rows.
So for instance:
--Stickler for clean formatting...
SELECT
a1.Action AS FirstAction,
a2.Action AS NextAction,
COUNT(*) AS Frequency
FROM
Activities a1 JOIN Activities a2
ON (a1.groupid = a2.groupid
AND a1.Serial = a2.Serial + @k)
WHERE
a1.groupid = 1
GROUP BY
a1.Action,
a2.Action;
By the way, you have an index (GroupId, Serial) on the table, right?
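If that index is missing, here's a minimal sketch of one (the index name is hypothetical, and this assumes the Activities table name used in the query above):
CREATE INDEX IX_Activities_GroupId_Serial ON Activities (GroupId, Serial);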

Remove duplicated subsets from very large table

The data I'm working with is fairly complicated, so I'm just going to provide a simpler example that I can hopefully expand out to what I'm working on.
Note: I've already found a way to do it, but it's extremely slow and not scalable. It works great on small datasets, but if I applied it to the actual tables it needs to run on, it would take forever.
I need to remove entire duplicate subsets of data within a table. Removing duplicate rows is easy, but I'm stuck finding an efficient way to remove duplicate subsets.
Example:
GroupID Subset Value
------- ------ -----
1       a      1
1       a      2
1       a      3
1       b      1
1       b      3
1       b      5
1       c      1
1       c      3
1       c      5
2       a      1
2       a      2
2       a      3
2       b      2
2       b      4
2       b      6
2       c      1
2       c      3
2       c      6
So in this example, from GroupID 1, I would need to remove either subset 'b' or subset 'c' - it doesn't matter which, since both contain Values 1, 3, 5. For GroupID 2, none of the sets are duplicated, so none are removed.
Here's the code I used to solve this on a small scale. It works great, but when applied to 10+ million records... you can imagine it would be very slow. (I was later informed of the number of records; the sample data I was given was much smaller.)
DECLARE @values TABLE (GroupID INT NOT NULL, SubSet VARCHAR(1) NOT NULL, [Value] INT NOT NULL)
INSERT INTO @values (GroupID, SubSet, [Value])
VALUES (1,'a',1),(1,'a',2),(1,'a',3) ,(1,'b',1),(1,'b',3),(1,'b',5) ,(1,'c',1),(1,'c',3),(1,'c',5),
(2,'a',1),(2,'a',2),(2,'a',3) ,(2,'b',2),(2,'b',4),(2,'b',6) ,(2,'c',1),(2,'c',3),(2,'c',6)
SELECT *
FROM @values v
ORDER BY v.GroupID, v.SubSet, v.[Value]
SELECT x.GroupID, x.NameValues, MIN(x.SubSet)
FROM (
SELECT t1.GroupID, t1.SubSet
, NameValues = (SELECT ',' + CONVERT(VARCHAR(10), t2.[Value]) FROM @values t2 WHERE t1.GroupID = t2.GroupID AND t1.SubSet = t2.SubSet ORDER BY t2.[Value] FOR XML PATH(''))
FROM @values t1
GROUP BY t1.GroupID, t1.SubSet
) x
GROUP BY x.GroupID, x.NameValues
All I'm doing here is grouping by GroupID and SubSet and concatenating all of the values into a comma-delimited string... and then taking that, grouping on GroupID and the value list, and taking the MIN SubSet.
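As an aside, on SQL Server 2017 or later the FOR XML PATH concatenation can be replaced with STRING_AGG; a minimal sketch of the same query under that assumption:
SELECT x.GroupID, x.NameValues, MIN(x.SubSet) AS SubSet
FROM (
SELECT v.GroupID, v.SubSet,
STRING_AGG(CONVERT(VARCHAR(10), v.[Value]), ',') WITHIN GROUP (ORDER BY v.[Value]) AS NameValues
FROM @values v
GROUP BY v.GroupID, v.SubSet
) x
GROUP BY x.GroupID, x.NameValues;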
I'd go with something like this:
;with cte as
(
select v.GroupID, v.SubSet, checksum_agg(v.Value) h, avg(v.Value) a
from @values v
group by v.GroupID, v.SubSet
)
delete v
from @values v
join
(
select c1.GroupID, case when c1.SubSet > c2.SubSet then c1.SubSet else c2.SubSet end SubSet
from cte c1
join cte c2 on c1.GroupID = c2.GroupID and c1.SubSet <> c2.SubSet and c1.h = c2.h and c1.a = c2.a
)x on v.GroupID = x.GroupID and v.SubSet = x.SubSet
select *
from @values
From Checksum_Agg:
The CHECKSUM_AGG result does not depend on the order of the rows in
the table.
This is because it is, in effect, a sum of the values, and different sets can produce the same sum: 1 + 2 + 3 = 3 + 2 + 1 = 6, and also 3 + 3 = 6.
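For intuition, a quick illustration of two different value sets whose sums collide:
-- Both of these return 6
SELECT SUM(v) FROM (VALUES (1), (2), (3)) t(v);
SELECT SUM(v) FROM (VALUES (3), (3)) t(v);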
HashBytes is designed to produce a different value for two inputs that differ only in the order of the bytes, as well as other differences. (There is a small possibility that two inputs, perhaps of wildly different lengths, could hash to the same value. You can't take an arbitrary input and squeeze it down to an absolutely unique 16-byte value.)
The following code demonstrates how to use HashBytes to return a hash for each GroupId/Subset.
-- Thanks for the sample data!
DECLARE @values TABLE (GroupID INT NOT NULL, SubSet VARCHAR(1) NOT NULL, [Value] INT NOT NULL)
INSERT INTO @values (GroupID, SubSet, [Value])
VALUES (1,'a',1),(1,'a',2),(1,'a',3) ,(1,'b',1),(1,'b',3),(1,'b',5) ,(1,'c',1),(1,'c',3),(1,'c',5),
(2,'a',1),(2,'a',2),(2,'a',3) ,(2,'b',2),(2,'b',4),(2,'b',6) ,(2,'c',1),(2,'c',3),(2,'c',6);
SELECT *
FROM @values v
ORDER BY v.GroupID, v.SubSet, v.[Value];
with
DistinctGroups as (
select distinct GroupId, Subset
from @Values ),
GroupConcatenatedValues as (
select GroupId, Subset, Convert( VarBinary(256), (
select Convert( VarChar(8000), Cast( Value as Binary(4) ), 2 ) AS [text()]
from @Values as V
where V.GroupId = DG.GroupId and V.SubSet = DG.SubSet
order by Value
for XML Path('') ), 2 ) as GroupedBinary
from DistinctGroups as DG )
-- To see the intermediate results from the CTE you can use one of the
-- following two queries instead of the last select :
-- select * from DistinctGroups;
-- select * from GroupConcatenatedValues;
select GroupId, Subset, GroupedBinary, HashBytes( 'MD4', GroupedBinary ) as Hash
from GroupConcatenatedValues
order by GroupId, Subset;
You can use checksum_agg() over a set of rows. If the checksums are the same, this is strong evidence that the 'values' columns are equal within the grouped fields.
In the 'getChecksums' cte below, I group by the group and subset, with a checksum based on your 'value' column.
In the 'maybeBadSubsets' cte, I put a row_number over each aggregation just to identify the 2nd+ row in the event the checksums match.
Finally, I delete any subgroups so identified.
with
getChecksums as (
select groupId,
subset,
cs = checksum_agg(value)
from @values v
group by groupId,
subset
),
maybeBadSubsets as (
select groupId,
subset,
cs,
deleteSubset =
case
when row_number() over (
partition by groupId, cs
order by subset
) > 1
then 1
end
from getChecksums
)
delete v
from @values v
where exists (
select 0
from maybeBadSubsets mbs
where v.groupId = mbs.groupId
and v.SubSet = mbs.subset
and mbs.deleteSubset = 1
);
I don't know what the exact likelihood is for checksums to match. If you're not comfortable with the false positive rate, you can still use it to eliminate some branches in a more algorithmic approach in order to vastly improve performance.
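A sketch of that idea, assuming the @values table from above: use the checksum match only to produce candidate pairs, then verify each pair with an exact set comparison before deleting anything.
with getChecksums as (
select groupId, subset, cs = checksum_agg(value)
from @values
group by groupId, subset
)
select c1.groupId, c1.subset as keepSubset, c2.subset as dropSubset
from getChecksums c1
join getChecksums c2
on c1.groupId = c2.groupId
and c1.cs = c2.cs
and c1.subset < c2.subset -- keep the lower-named subset
where not exists ( -- verify: nothing in the first set is missing from the second
select value from @values v1
where v1.groupId = c1.groupId and v1.subset = c1.subset
except
select value from @values v2
where v2.groupId = c2.groupId and v2.subset = c2.subset )
and not exists ( -- ... and vice versa
select value from @values v2
where v2.groupId = c2.groupId and v2.subset = c2.subset
except
select value from @values v1
where v1.groupId = c1.groupId and v1.subset = c1.subset );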
Note: CTEs can have a quirk performance-wise. If you find that the query engine is running 'maybeBadSubsets' for each row of @values, you may need to put its results into a temp table or table variable before using it. But I believe with 'exists' you're okay as far as that goes.
EDIT:
I didn't catch it, but as the OP noticed, checksum_agg seems to perform very poorly in terms of false hits/misses. I suspect it might be due to the simplicity of the input. I changed
cs = checksum_agg(value)
above to
cs = checksum_agg(convert(int,hashbytes('md5', convert(char(1),value))))
and got better results. But I don't know how it would perform on larger datasets.

Select exactly 5 rows from a table

I have an odd requirement which ideally should be solved in SQL, not the surrounding app.
I need to select exactly 5 rows regardless of how many are actually available. In practice the number of rows available will usually be less than 5 and on some rare occasions it will be more than 5. The "extra" rows should have null in every column.
The app is written in a technology that isn't Turing-complete. This requirement is much more difficult to solve in the app's code than you might imagine! To describe it: the app is effectively a transformer that takes in a bunch of queries and spits out a report. So please understand the app is NOT written in a "programming language" in the traditional sense.
So for example, if I have a table:
A | B
-----
1 | X
2 | Y
3 | Z
Then a valid result would be
A | B
-----------
2 | Y
1 | X
3 | Z
null | null
null | null
I know this is an unusual requirement. Sadly it can't be solved in the application due to the technology being used.
Ideally this shouldn't require changes to the database but if there is no other way that changes can be arranged.
Any suggestions?
You can do something like this:
select top 5 a, b
from (select a, b, 1 as priority from t
union all
select null, null, 2
from (values (1), (2), (3), (4), (5)) v(n)
) x
order by priority;
That is, create dummy rows, append them, and then choose the first five.
I do think that this work should be done in the app, but you can do it in SQL.
Create Table #Test (A int, B int)
Insert #Test Values (1,1)
Insert #Test Values (2,1)
Insert #Test Values (3,1)
Select Top 5 * From
(
Select A, B From #Test
Union All
Select Null, Null
Union All
Select Null, Null
Union All
Select Null, Null
Union All
Select Null, Null
Union All
Select Null, Null
) A
Wrap this in a stored proc:
declare @rowcount int
select top 5 * from dbo.test
set @rowcount = @@rowcount
if @rowcount < 5
begin
select * from dbo.test
union all
select null, null from dbo.numbers where n <= 5 - @rowcount
end
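A sketch of such a wrapper (dbo.SelectExactlyFive is a hypothetical name; this assumes dbo.test has the two columns A and B from the example, and that dbo.numbers is a numbers/tally table with a column n):
CREATE PROCEDURE dbo.SelectExactlyFive
AS
BEGIN
DECLARE @rowcount int;
SELECT @rowcount = COUNT(*) FROM dbo.test;
SELECT TOP 5 A, B FROM dbo.test
UNION ALL
SELECT NULL, NULL FROM dbo.numbers WHERE n <= 5 - @rowcount;
END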
If you use some sort of tally table (although the numbers themselves do not matter, only that the table has enough records), you can use it to create the dummy rows. e.g. using sys.columns:
select top 5 a,b from
(
select a, b, 0 ord from yourTable
union all
select null a,null b, 1 from sys.columns
) t
order by ord
The advantage of the tally would be that if you need another number of rows in the future, you only need to change the top x (provided the tally table has enough rows)
Get those 3 records from your table, take a counter variable, and then from your code add NULL rows until the counter reaches 5.

Can this ORDER BY on a CASE clause be made faster?

I'm selecting results from a table of ~350 million records, and it's running extremely slowly - around 10 minutes. The culprit seems to be the ORDER BY, as if I remove it the query only takes a moment. Here's the gist:
SELECT TOP 100
(columns snipped)
FROM (
SELECT
CASE WHEN (e2.ID IS NULL) THEN
CAST(0 AS BIT) ELSE CAST(1 AS BIT) END AS RecordExists,
(columns snipped)
FROM dbo.Files AS e1
LEFT OUTER JOIN dbo.Records AS e2 ON e1.FID = e2.FID
) AS p1
ORDER BY p1.RecordExists
Basically, I'm ordering the results by whether Files have a corresponding Record, as those without need to be handled first. I could run two queries with WHERE clauses, but I'd rather do it in a single query if possible.
Is there any way to speed this up?
The ultimate issue is that the use of CASE in the sub-query introduces an ORDER BY over something that is not being used in a sargable manner. Thus the entire intermediate result-set must first be ordered to find the TOP 100 - and that is all 350+ million records!²
In this particular case, moving the CASE to the outside SELECT and using a DESC ordering (to put NULL values, which mean "0" in the current RecordExists, first) should do the trick¹. It's not a generic approach, but the ordering should be much, much faster if Files.ID is indexed. (If the query is still slow, consult the query plan to find out why ORDER BY is not using an index.)
Another alternative might be to include a persisted computed column for RecordExists (also indexed) that the ORDER BY can use via an index.
Once again, the idea is that the ORDER BY works over something sargable, which only requires reading sequentially inside the index (up to the desired number of records to match the outside limit) and not ordering 350+ million records on-the-fly :)
SQL Server is then able to push this ordering (and limit) down into the sub-query, instead of waiting for the intermediate result-set of the sub-query to come up. Look at the query plan differences based on what is being ordered.
¹ Example:
SELECT TOP 100
-- If needed
CASE WHEN (p1.ID IS NULL) THEN
CAST(0 AS BIT) ELSE CAST(1 AS BIT) END AS RecordExists,
(columns snipped)
FROM (
SELECT
(columns snipped)
FROM dbo.Files AS e1
LEFT OUTER JOIN dbo.Records AS e2 ON e1.FID = e2.FID
) AS p1
-- Hopefully ID is indexed, DESC makes NULLs (!RecordExists) go first
ORDER BY p1.ID DESC
² Actually, it seems like it could hypothetically just stop after the first 100 0's without a full sort... at least under some extreme query planner optimization under a closed function range, but that depends on when the 0's are encountered in the intermediate result set (in the first few thousand, or not until the hundreds of millions, or never?). I highly doubt SQL Server accounts for this extreme case anyway; that is, don't count on this still non-sargable behavior.
Give this form a try
SELECT TOP(100) *
FROM (
SELECT TOP(100)
0 AS RecordExists
--,(columns snipped)
FROM dbo.Files AS e1
WHERE NOT EXISTS (SELECT * FROM dbo.Records e2 WHERE e1.FID = e2.FID)
ORDER BY SecondaryOrderColumn
) X
UNION ALL
SELECT * FROM (
SELECT TOP(100)
1 AS RecordExists
--,(columns snipped)
FROM dbo.Files AS e1
INNER JOIN dbo.Records AS e2 ON e1.FID = e2.FID
ORDER BY SecondaryOrderColumn
) X
ORDER BY SecondaryOrderColumn
Key indexes:
Records (FID)
Files (FID, SecondaryOrdercolumn)
Well, the reason it is much slower is that without the ORDER BY clause it is really a very different query.
With the order by clause:
Find all matching records out of the entire 350 million rows. Then sort them.
Without the order by clause:
Find the first 100 matching records. Stop.
Q: If you say the only difference is "with/without" the "order by", then could you somehow move the "top 100" into the inner select?
EXAMPLE:
SELECT
(columns snipped)
FROM (
SELECT TOP 100
CASE WHEN (e2.ID IS NULL) THEN
CAST(0 AS BIT) ELSE CAST(1 AS BIT) END AS RecordExists,
(columns snipped)
FROM dbo.Files AS e1
LEFT OUTER JOIN dbo.Records AS e2 ON e1.FID = e2.FID
) AS p1
ORDER BY p1.RecordExists
In SQL Server, null values collate lower than any value in the domain. Given these two tables:
create table dbo.foo
(
id int not null identity(1,1) primary key clustered ,
name varchar(32) not null unique nonclustered ,
)
insert dbo.foo ( name ) values ( 'alpha' )
insert dbo.foo ( name ) values ( 'bravo' )
insert dbo.foo ( name ) values ( 'charlie' )
insert dbo.foo ( name ) values ( 'delta' )
insert dbo.foo ( name ) values ( 'echo' )
insert dbo.foo ( name ) values ( 'foxtrot' )
go
create table dbo.bar
(
id int not null identity(1,1) primary key clustered ,
foo_id int null foreign key references dbo.foo(id) ,
name varchar(32) not null unique nonclustered ,
)
go
insert dbo.bar( foo_id , name ) values( 1 , 'golf' )
insert dbo.bar( foo_id , name ) values( 5 , 'hotel' )
insert dbo.bar( foo_id , name ) values( 3 , 'india' )
insert dbo.bar( foo_id , name ) values( 5 , 'juliet' )
insert dbo.bar( foo_id , name ) values( 6 , 'kilo' )
go
The query
select *
from dbo.foo foo
left join dbo.bar bar on bar.foo_id = foo.id
order by bar.foo_id, foo.id
yields the following result set:
id name id foo_id name
-- ------- ---- ------ -------
2 bravo NULL NULL NULL
4 delta NULL NULL NULL
1 alpha 1 1 golf
3 charlie 3 3 india
5 echo 2 5 hotel
5 echo 4 5 juliet
6 foxtrot 5 6 kilo
(7 row(s) affected)
This should allow the query optimizer to use a suitable index (if one exists); however, it does not guarantee that any such index would be used.
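For example, an index the optimizer might consider here (hypothetical name):
create index ix_bar_foo_id on dbo.bar (foo_id);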
Can you try this?
SELECT TOP 100
(columns snipped)
FROM dbo.Files AS e1
LEFT OUTER JOIN dbo.Records AS e2 ON e1.FID = e2.FID
ORDER BY e2.ID ASC
This should give you the rows where e2.ID is null first. Also, make sure Records.ID is indexed. That should produce the ordering you were wanting.

Simple SQL: How to calculate unique, contiguous numbers for duplicates in a set?

Let's say I create a table with an int Page, int Section, and an int ID identity field, where the page field ranges from 1 to 8 and the section field ranges from 1 to 30 for each page. Now let's say that two records have duplicate page and section. How could I renumber those two records so that the sequence of page and section numbering is contiguous?
select page, section
from #fun
group by page, section having count(*) > 1
shows the duplicates:
page 1 section 3
page 2 section 3
Page 1 section 4 and page 2 section 4 are missing. Is there a way, without using a cursor, to find and renumber the positions in SQL 2000, which doesn't support Row_Number()?
This rownum below of course produces exactly the same number as in section:
select page, section,
(select count(*) + 1
from #fun b
where b.page = a.page and b.section < a.section) as rownum
from #fun a
I could create a pivot table having values 1 through 100, but what would I join against?
What I want to do is something like this:
update p set section = (expression that gets 4)
from #fun p
where (expression that identifies duplicate sections by page)
I don't have a 2000 server to test this on, but I think it should work.
Create test tables/data:
CREATE TABLE #fun
(Id INT IDENTITY(100,1)
,page INT NOT NULL
,section INT NOT NULL
)
INSERT #fun (page, section)
SELECT 1,1
UNION ALL SELECT 1,3 UNION ALL SELECT 1,2
UNION ALL SELECT 1,3 UNION ALL SELECT 1,5
UNION ALL SELECT 2,1 UNION ALL SELECT 2,2
UNION ALL SELECT 2,3 UNION ALL SELECT 2,5
UNION ALL SELECT 2,3
Now the processing:
-- create a worktable
CREATE TABLE #fun2
(Id INT IDENTITY(1,1)
,funId INT
,page INT NOT NULL
,section INT NOT NULL
)
-- insert data into the second temp table ordered by the relevant columns
-- the identity column will form the basis of the revised section number
INSERT #fun2 (funId, page, section)
SELECT Id,page,section
FROM #fun
ORDER BY page,section,Id
-- write the calculated section value back where it is different
UPDATE p
SET section = y.calc_section
FROM #fun AS p
JOIN
(
SELECT f2.funId, f2.id - x.adjust calc_section
FROM #fun2 AS f2
JOIN (
-- this subquery is used to calculate an offset like
-- PARTITION BY in a 2005+ ROWNUMBER function
SELECT MIN(Id) - 1 adjust, page
FROM #fun2
GROUP BY page
) AS x
ON f2.page = x.page
) AS y
ON p.Id = y.funId
WHERE p.section <> y.calc_section
SELECT * FROM #fun order by page, section
Disclaimer: I don't have SQL Server to test.
If I understand you correctly, if you knew the ROW_NUMBER of your #fun records partitioned over (page, section) duplicates, you could use this relative ranking to increment the "section":
UPDATE p
SET section = section + (rownumber - 1)
FROM #fun AS p
INNER JOIN ( -- SELECT id, ROW_NUMBER() OVER (PARTITION BY page, section) ...
SELECT a.id, COUNT(1) AS rownumber
FROM #fun a
LEFT JOIN #fun b
ON a.page = b.page AND a.section = b.section AND a.id <= b.id
GROUP BY a.id, a.page, a.section) d
ON p.id = d.id
WHERE rownumber > 1
That won't handle the case where the number of duplicates pushes you past your upper limit of 30. It may also create new duplicates if higher-numbered sections for the page already exist - that is, one instance of (pg 1, sec 3) becomes (pg 1, sec 4), which already existed - but you can run the UPDATE repeatedly until no duplicates exist.
And then add a unique index on (page, section).
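For example (hypothetical index name, using the #fun table from above):
CREATE UNIQUE INDEX IX_fun_page_section ON #fun (page, section);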

Most efficient way of selecting the changes between timestamped snapshots

I have a table that holds data about items that existed at a certain time - regular snapshots taken.
Simple example:
Timestamp ID
1         A
1         B
2         A
2         B
2         C
3         A
3         D
4         D
4         E
In this case, Item C gets created sometime between snapshots 1 and 2; sometime between snapshots 2 and 3, B and C disappear and D gets created; and so on.
The table is reasonably large (millions of records) and for each timestamp there are about 50 records.
What's the most efficient way of selecting the item IDs for items that disappear between two consecutive timestamps?
So for the above example ...
Between 1 and 2: NULL
Between 2 and 3: B, C
Between 3 and 4: A
If it doesn't make the query inefficient, can it be extended to automatically use the latest (i.e. MAX) timestamp and the previous one?
Another way to view this is that you want to find records that exist in timestamp #1 that do not exist in timestamp #2. The easiest way?
SELECT t1.ID, t1.Timestamp
FROM records AS t1
WHERE NOT EXISTS (SELECT 1 FROM records AS t2 WHERE t2.id = t1.id AND t2.Timestamp = t1.Timestamp + 1)
Of course, I'm exploiting here the fact that your example timestamps are integers, when in reality I imagine they are genuine timestamps. But it turns out the integers work so well for this particular purpose, they'd be really handy to have around. So, perhaps we should make a numbered list of all available timestamps. The easiest way to get that?
CREATE TEMPORARY TABLE timestamp_map (
timestamp_id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
timestamp_value DATETIME
);
INSERT INTO timestamp_map (timestamp_value) (SELECT DISTINCT timestamp FROM records ORDER BY timestamp);
(You could also maintain such a table permanently by use of triggers.)
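A hypothetical sketch of such a trigger (this assumes timestamp_map is made a permanent table with a UNIQUE key on timestamp_value, so INSERT IGNORE skips timestamps already recorded):
CREATE TRIGGER trg_records_timestamp
AFTER INSERT ON records
FOR EACH ROW
INSERT IGNORE INTO timestamp_map (timestamp_value)
VALUES (NEW.timestamp);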
It's a bit out there, but I've gotten similar techniques to work very efficiently in the past for data like what you describe, when lots of back-and-forth subqueries and NOT EXISTS proved too slow.
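Putting the pieces together, a sketch of the disappearance query itself, using the integer mapping (it assumes the records and timestamp_map tables above, and lists each item present at one snapshot but gone at the next):
SELECT tm1.timestamp_value AS from_ts,
tm2.timestamp_value AS to_ts,
r1.id
FROM timestamp_map tm1
JOIN timestamp_map tm2
ON tm2.timestamp_id = tm1.timestamp_id + 1
JOIN records r1
ON r1.timestamp = tm1.timestamp_value
WHERE NOT EXISTS
(
SELECT 1
FROM records r2
WHERE r2.id = r1.id
AND r2.timestamp = tm2.timestamp_value
);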
Update:
See this entry in my blog for performance details:
MySQL: difference between sets
SELECT ts,
(
SELECT GROUP_CONCAT(id)
FROM mytable mi
WHERE mi.ts =
(
SELECT MAX(ts)
FROM mytable mp
WHERE mp.ts = mo.pts
)
AND NOT EXISTS
(
SELECT NULL
FROM mytable mn
WHERE mn.ts = mo.ts
AND mn.id = mi.id
)
)
FROM (
SELECT @r AS pts,
@r := ts AS ts
FROM (
SELECT @r := NULL
) vars,
(
SELECT DISTINCT ts
FROM mytable
) moo
) mo
To select only the last change:
SELECT ts,
(
SELECT GROUP_CONCAT(id)
FROM mytable mi
WHERE mi.ts =
(
SELECT MAX(ts)
FROM mytable mp
WHERE mp.ts < mo.ts
)
AND NOT EXISTS
(
SELECT NULL
FROM mytable mn
WHERE mn.ts = mo.ts
AND mn.id = mi.id
)
)
FROM (
SELECT MAX(ts) AS ts
FROM mytable
) mo
For this to be efficient, you need a composite index on mytable (ts, id), in this order.
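For example (hypothetical index name, using the ts and id columns from the queries above):
CREATE INDEX ix_mytable_ts_id ON mytable (ts, id);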