SQL - Latest record based on time-stamp and ID

I have rewritten the whole question because I got a lot of complaints about posting images, and I added code that is closer to my situation. My apologies, I am new to SO; I am trying to make this as easy as possible for you.
I use IBM DB2 as my DBMS.
I have a query that selects a lot of records (messages), each of which has an ID (which is supposed to be unique per message), a status (error, completed) and a time-stamp. My query is the following:
select *
from tableone tr, tabletwo ms
where ms.TS BETWEEN '2017-09-15 00:00:00.000' and '2017-09-16 00:00:00.000'
and ms.ID=tr.ID
and ms.STATUS in ('ERROR','COMPLETED')
ORDER by tr.ID
The ID is unique to one message, but a message can receive multiple statuses at different time-stamps, so the query above returns multiple records per message.
I want only one record per message: the one with the latest status.
I hope you guys and gals can help, thanks in advance.

Postgres, Oracle, SQL Server:
with CTE as
(
select t1.*, row_number() over(partition by t1.ID order by t1.Timestamp desc) rn
from MyTable t1
where t1.STATUS in ('ERROR','COMPLETED')
)
select *
from CTE
where rn = 1
MySQL
select t1.*
from MyTable t1
inner join
(
select t2.ID, max(t2.Timestamp) as MaxT
from MyTable t2
where t2.STATUS in ('ERROR','COMPLETED')
group by t2.ID
) x3
on x3.ID = t1.ID
and x3.MaxT = t1.Timestamp
where t1.STATUS in ('ERROR','COMPLETED')
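A minimal, runnable sketch of the ROW_NUMBER approach, using Python's built-in sqlite3 module as a stand-in for the real DBMS (an assumption: SQLite 3.25 or later is needed for window functions). The table name messages, the column ts, and the sample rows are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for the real DBMS (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (id TEXT, status TEXT, ts TEXT)")
conn.executemany(
    "INSERT INTO messages VALUES (?, ?, ?)",
    [
        ("A1", "ERROR", "2017-09-15 08:00:00"),
        ("A1", "COMPLETED", "2017-09-15 09:30:00"),
        ("B2", "ERROR", "2017-09-15 10:00:00"),
    ],
)

# Number the rows of each message from newest to oldest, then keep row 1.
latest = conn.execute(
    """
    WITH ranked AS (
        SELECT m.*,
               ROW_NUMBER() OVER (PARTITION BY m.id ORDER BY m.ts DESC) AS rn
        FROM messages m
        WHERE m.status IN ('ERROR', 'COMPLETED')
    )
    SELECT id, status, ts FROM ranked WHERE rn = 1 ORDER BY id
    """
).fetchall()

for row in latest:
    print(row)
```

Each message ID appears once, carrying the status of its most recent row.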

Try this:
select *
from table_name a
where a.STATUS in ('ERROR','COMPLETED')
and a.TimeStamp = (select max(b.TimeStamp)
from table_name b
where a.ID = b.ID)
ORDER by a.ID
or, using SQL Server's TOP (on DB2, drop TOP and end the subquery with FETCH FIRST 1 ROW ONLY instead):
select *
from table_name a
where a.STATUS in ('ERROR','COMPLETED')
and a.TimeStamp = (select Top(1) b.TimeStamp
from table_name b
where a.ID = b.ID
order by b.TimeStamp desc)
ORDER by a.ID

Here is the code based on your values.
I reversed your ID to extract only the numeric part (the last two characters), then applied ROW_NUMBER partitioned by the new ID and sorted by timestamp descending so the newest row comes first.
With pretty ID:
select * from (
select *,ROW_NUMBER() over(partition by TrueID order by timestamp DESC) as RN
from (
SELECT REVERSE(substring(reverse([ID]),1,2)) as TrueID
,[Status]
,[Timestamp]
FROM [LegOgSpass].[dbo].[statustable])x
)z where RN= 1
With original ID:
select * from (
select *,ROW_NUMBER() over(partition by ID order by timestamp DESC) as RN
from (
SELECT ID
,[Status]
,[Timestamp]
FROM [LegOgSpass].[dbo].[statustable])x
)z where RN= 1
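The double-REVERSE expression in the "pretty ID" variant simply keeps the last two characters of the ID. A small Python sketch of the same string logic follows; the sample ID format is hypothetical, since the original sample data is not shown.

```python
def true_id(raw_id: str) -> str:
    """Mirror REVERSE(SUBSTRING(REVERSE(ID), 1, 2)): keep the last two characters."""
    reversed_id = raw_id[::-1]   # REVERSE(ID)
    first_two = reversed_id[:2]  # SUBSTRING(..., 1, 2); SQL positions are 1-indexed
    return first_two[::-1]       # the outer REVERSE restores the original order


sample = "MSG-42"  # hypothetical ID format
print(true_id(sample))
```

If the numeric suffix can be longer than two characters, the length passed to SUBSTRING would need to change accordingly.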

Related

SQL to pull only the first 10 records of all fields in table, grouped by only 2 columns

My SQL skills are too weak to solve this problem, but I am pretty certain it is possible to solve.
To simplify - I have a small table with 5 columns, let's label them A, B, C, D, E. It's 1000 rows.
I need to be able to group by columns A and B where (E is not null and E <> ''). That part I can do.
select T.A, T.B, count(*) as countAll
from TABLE T
where not T.E is null and T.E <> ''
group by T.A, T.B
But then I need to be able to get just the first 10 rows of each group of all the columns ([A-E]) included in each grouping within those parameters. This is where I'm flailing. What I need to see is all the fields in the table returned for the first 10 records of each grouping.
The below seems very similar to what I need but I so far cannot get it to even compile on my end. I must not be using the PARTITION BY clause correctly (never used it before). https://stackoverflow.com/a/51527260/3536926
SELECT MemberID, ResNumber, pcode, MemberEmail, arrivaldate,
FROM (
SELECT MemberID, ResNumber, pcode, MemberEmail, arrivaldate,
ROW_NUMBER () OVER w AS RN
FROM sometable
WINDOW w AS (PARTITION BY MemberID ORDER BY ResNumber ASC)
) X
WHERE RN <= 2
Maybe I should be using something besides GROUP BY like PARTITION BY but I'm not familiar with this?
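I think this is just a syntax error: the outer SELECT list ends with a trailing comma before FROM, and the named WINDOW clause may not be supported by your DBMS. The OVER clause can be written inline like this: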
SELECT MemberID, ResNumber, pcode, MemberEmail, arrivaldate
FROM (
SELECT MemberID, ResNumber, pcode, MemberEmail, arrivaldate,
ROW_NUMBER() OVER (PARTITION BY MemberID ORDER BY ResNumber ASC) AS RN
FROM sometable
) X
WHERE RN <= 10
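A runnable sketch of the same "first N rows per group" pattern, using Python's sqlite3 module as a stand-in database (an assumption: SQLite 3.25 or later for window functions). The sample rows are made up, and only two of the columns are kept for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sometable (MemberID INTEGER, ResNumber INTEGER)")
# Three members with 4, 2 and 1 reservations respectively (made-up data).
rows = [(1, 10), (1, 11), (1, 12), (1, 13), (2, 20), (2, 21), (3, 30)]
conn.executemany("INSERT INTO sometable VALUES (?, ?)", rows)


def first_n_per_member(n):
    """Return up to n rows per MemberID, lowest ResNumber first."""
    return conn.execute(
        """
        SELECT MemberID, ResNumber
        FROM (
            SELECT MemberID, ResNumber,
                   ROW_NUMBER() OVER (PARTITION BY MemberID
                                      ORDER BY ResNumber ASC) AS RN
            FROM sometable
        ) X
        WHERE RN <= ?
        ORDER BY MemberID, ResNumber
        """,
        (n,),
    ).fetchall()


top2 = first_n_per_member(2)
print(top2)
```

Groups with fewer than N rows are returned in full, so the limit acts as a cap per group rather than a requirement.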
Is the order of each set of returned rows important? This is a little janky but may get you close to what you're after.
WITH cte_Stuff
AS (
SELECT t1.A, t1.B, t1.C, t1.D, t1.E, ROW_NUMBER() OVER (PARTITION BY t1.A, t1.B ORDER BY (SELECT NULL)) AS RowId --Numbers rows within each (A, B) group in arbitrary order unless a real ORDER BY is specified
FROM [TABLE] t1
INNER JOIN (
SELECT t2.A, t2.B, COUNT(*) as countAll
FROM [TABLE] t2
WHERE ISNULL(t2.E, '') <> ''
GROUP BY t2.A, t2.B
) x ON t1.A = x.A
AND t1.B = x.B
)
SELECT c.A, c.B, c.C, c.D, c.E
FROM cte_Stuff c
WHERE c.RowId <= 10

How to only select the SQL row with the MAX id in this join?

I need to get the record with the MAX id from this joined table, but I only need the top row to be joined with the main query on this subquery. How can I limit the subquery to only return one row? Previously the tran_state MAX was being returned which did not work correctly.
LEFT JOIN (
SELECT
tran_id
, MAX(id) AS max_tran_id
, MAX(DATETIME(created, 'America/New_York')) AS max_tran_created
, tran_state
FROM `prod.tran`
GROUP BY tran_id
) data ON t.id = data.tran_id
I attempted to modify the query like so but the tran_state is coming back as null.
LEFT JOIN (
SELECT
tran_state,
tran_id
FROM `prod.tran` WHERE ID IN (
SELECT
MAX(ID)
FROM `prod.tran` trans
WHERE trans.tran_id = transaction_id)
) data ON t.id = data.tran_id
You can use window functions:
LEFT JOIN
(SELECT tr.*,
ROW_NUMBER() OVER (PARTITION BY tr.tran_id ORDER BY tr.id DESC) as seqnum
FROM `prod.tran` tr
) data
ON t.id = data.tran_id AND data.seqnum = 1
Try this query.
SELECT *
FROM `prod.tran`
WHERE id IN (
SELECT MAX(id)
FROM `prod.tran`
GROUP BY tran_id
)
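A small runnable sketch of joining each outer row to only its highest-id transaction, with Python's sqlite3 module standing in for the real engine (an assumption: SQLite 3.25 or later). The outer table main_t and all sample rows are hypothetical; the filter on the row number sits in the join condition and refers to the subquery's alias.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE main_t (id INTEGER);  -- hypothetical outer table
    CREATE TABLE tran (id INTEGER, tran_id INTEGER, tran_state TEXT);
    INSERT INTO main_t VALUES (1), (2), (3);
    INSERT INTO tran VALUES
        (100, 1, 'PENDING'),
        (101, 1, 'SETTLED'),
        (102, 2, 'FAILED');
    """
)

# seqnum = 1 marks the row with the highest id inside each tran_id group.
joined = conn.execute(
    """
    SELECT t.id, data.id, data.tran_state
    FROM main_t t
    LEFT JOIN (
        SELECT tr.*,
               ROW_NUMBER() OVER (PARTITION BY tr.tran_id
                                  ORDER BY tr.id DESC) AS seqnum
        FROM tran tr
    ) data
      ON t.id = data.tran_id AND data.seqnum = 1
    ORDER BY t.id
    """
).fetchall()

for row in joined:
    print(row)
```

Keeping the seqnum test in the ON clause, rather than in a WHERE clause, preserves outer rows that have no transactions at all.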

Scalable Solution to get latest row for each ID in BigQuery

I have quite a large table with an ID field and a collection_time field. I want to select the latest record for each ID. Unfortunately, the combination (ID, collection_time) is not unique in my data, and I want just one of the records with the maximum collection time. I have tried two solutions, but neither has worked for me:
First: using query
SELECT * FROM
(SELECT *, ROW_NUMBER() OVER (PARTITION BY ID ORDER BY collection_time DESC) as rn
FROM mytable) where rn=1
This results in a Resources exceeded error, which I guess is caused by the ORDER BY in the query.
Second
Using join between table and latest time:
(SELECT tab1.*
FROM mytable AS tab1
INNER JOIN EACH
(SELECT ID, MAX(collection_time) AS second_time
FROM mytable GROUP EACH BY ID) AS tab2
ON tab1.ID=tab2.ID AND tab1.collection_time=tab2.second_time)
This solution does not work for me because (ID, collection_time) is not unique, so the JOIN result contains multiple rows for some IDs.
I am wondering if there is a workaround for the resourcesExceeded error, or a different query that would work in my case?
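To make the tie problem concrete, here is a small sketch that runs both query shapes against a toy table in SQLite through Python's sqlite3 module (an assumption: SQLite 3.25 or later; the payload column and the rows are made up). It only illustrates the duplicate-row behaviour and says nothing about BigQuery's resource limits.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (ID INTEGER, collection_time INTEGER, payload TEXT)"
)
# ID 1 has two rows sharing the maximum collection_time (a tie).
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [(1, 10, "a"), (1, 20, "b"), (1, 20, "c"), (2, 5, "d")],
)

# Join against MAX(collection_time): every tied row survives.
join_rows = conn.execute(
    """
    SELECT tab1.ID, tab1.collection_time, tab1.payload
    FROM mytable AS tab1
    INNER JOIN (
        SELECT ID, MAX(collection_time) AS second_time
        FROM mytable GROUP BY ID
    ) AS tab2
    ON tab1.ID = tab2.ID AND tab1.collection_time = tab2.second_time
    """
).fetchall()

# ROW_NUMBER: exactly one row per ID; which tied row wins is arbitrary.
rn_rows = conn.execute(
    """
    SELECT ID, collection_time, payload FROM (
        SELECT *, ROW_NUMBER() OVER (PARTITION BY ID
                                     ORDER BY collection_time DESC) AS rn
        FROM mytable
    ) WHERE rn = 1
    ORDER BY ID
    """
).fetchall()

print(len(join_rows), len(rn_rows))
```

Adding a second sort key to the window's ORDER BY would make the choice among tied rows deterministic.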
SELECT
agg.table.*
FROM (
SELECT
id,
ARRAY_AGG(STRUCT(table)
ORDER BY
collection_time DESC)[SAFE_OFFSET(0)] agg
FROM
`dataset.table` table
GROUP BY
id)
This will do the job for you, and it is scalable: because it aggregates whole rows, you won't have to change the query when the schema changes.
Short and scalable version:
select array_agg(t order by collection_time desc limit 1)[offset(0)].*
from mytable t
group by t.id;
Quick and dirty option: combine both of your queries into one. First get all records with the latest collection_time (using your second query), then dedup them using your first query:
SELECT * FROM (
SELECT *, ROW_NUMBER() OVER (PARTITION BY tab1.ID) AS rn
FROM (
SELECT tab1.*
FROM mytable AS tab1
INNER JOIN (
SELECT ID, MAX(collection_time) AS second_time
FROM mytable GROUP BY ID
) AS tab2
ON tab1.ID=tab2.ID AND tab1.collection_time=tab2.second_time
)
)
WHERE rn = 1
And with Standard SQL (proposed by S.Mohsen sh)
WITH myTable AS (
SELECT 1 AS ID, 1 AS collection_time
),
tab1 AS (
SELECT ID,
MAX(collection_time) AS second_time
FROM myTable GROUP BY ID
),
tab2 AS (
SELECT * FROM myTable
),
joint AS (
SELECT tab2.*
FROM tab2 INNER JOIN tab1
ON tab2.ID=tab1.ID AND tab2.collection_time=tab1.second_time
)
SELECT * EXCEPT(rn)
FROM (
SELECT *, ROW_NUMBER() OVER (PARTITION BY ID) AS rn
FROM joint
)
WHERE rn=1
If you don't mind writing an expression for every column:
SELECT ID,
ARRAY_AGG(col1 ORDER BY collection_time DESC)[OFFSET(0)] AS col1,
ARRAY_AGG(col2 ORDER BY collection_time DESC)[OFFSET(0)] AS col2
FROM myTable
GROUP BY ID
I see no one has mentioned window functions with QUALIFY:
SELECT *, MAX(collection_time) OVER (PARTITION BY id) AS max_timestamp
FROM my_table
QUALIFY collection_time = max_timestamp
The window function adds a column max_timestamp that is accessible in the QUALIFY clause to filter on. Note that if several rows share the maximum collection_time for an ID, all of them are returned.
As per your comment, assuming you have a table of unique IDs (id_table) for which you need to find the latest collection_time, here is another way to do it using a correlated subquery. Give it a try.
SELECT id,
(SELECT Max(collection_time)
FROM mytable B
WHERE A.id = B.id) AS Max_collection_time
FROM id_table A
Another solution, which could be more scalable since it avoids multiple scans of the same table (which will happen with both self-join and correlated subquery in above answers). This solution only works with standard SQL (uncheck "Use Legacy SQL" option):
SELECT
ID,
(SELECT srow.*
FROM UNNEST(t.srows) srow
WHERE srow.collection_time = MAX(srow.collection_time))
FROM
(SELECT ID, ARRAY_AGG(STRUCT(col1, col2, col3, ...)) srows
FROM id_table
GROUP BY ID) t

oracle query: get max value with some conditions

I have a table called service_t with a column effective_dt that is populated with a Unix timestamp. I need to find, for each id, the rows with the max effective_dt, where effective_dt must not exceed a given value. I have the following SQL, but I don’t think it’s efficient:
Select *
from service_t t1
where t1.effective_dt <= :given_value
and t1.effective_dt = (select max(effective_dt)
from service_t t2
where t2.effective_dt <= :given_value
and t1.id = t2.id)
Is this efficient, or are there better ways? Thanks!
Using analytic functions is probably more efficient:
Select * from (
select t1.*,
dense_rank() over (partition by id order by effective_dt desc) rn
from service_t t1
where t1.effective_dt <= :given_value)
where rn = 1
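A quick sketch checking that the correlated-subquery form and the DENSE_RANK form return the same rows on a toy data set. It uses Python's sqlite3 module rather than Oracle (an assumption: SQLite 3.25 or later), and the sample rows and cutoff value are made up. It says nothing about relative performance, which depends on the engine and its indexes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE service_t (id INTEGER, effective_dt INTEGER)")
conn.executemany(
    "INSERT INTO service_t VALUES (?, ?)",
    [(1, 100), (1, 200), (1, 300), (2, 150), (2, 400)],
)
params = {"given_value": 250}  # hypothetical cutoff

# Correlated subquery: max effective_dt per id at or below the cutoff.
correlated = conn.execute(
    """
    SELECT t1.id, t1.effective_dt
    FROM service_t t1
    WHERE t1.effective_dt <= :given_value
      AND t1.effective_dt = (SELECT MAX(t2.effective_dt)
                             FROM service_t t2
                             WHERE t2.effective_dt <= :given_value
                               AND t2.id = t1.id)
    ORDER BY t1.id
    """,
    params,
).fetchall()

# Analytic form: filter first, rank within each id, keep rank 1.
ranked = conn.execute(
    """
    SELECT id, effective_dt FROM (
        SELECT t1.*,
               DENSE_RANK() OVER (PARTITION BY id
                                  ORDER BY effective_dt DESC) AS rn
        FROM service_t t1
        WHERE t1.effective_dt <= :given_value
    ) r
    WHERE rn = 1
    ORDER BY id
    """,
    params,
).fetchall()

print(correlated, ranked)
```

DENSE_RANK keeps every row tied for the maximum, which matches the correlated version; ROW_NUMBER would keep only one of them.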

PostgreSQL Selecting Most Recent Entry for a Given ID

The table essentially looks like:
Serial-ID, ID, Date, Data, Data, Data, etc.
There can be multiple rows for the same ID. I'd like to create a view of this table, to be used in reports, that shows only the most recent entry for each ID. It should show all of the columns.
Can someone help me with the SQL select? Thanks.
There's about 5 different ways to do this, but here's one:
SELECT *
FROM yourTable AS T1
WHERE NOT EXISTS(
SELECT *
FROM yourTable AS T2
WHERE T2.ID = T1.ID AND T2.Date > T1.Date
)
And here's another:
SELECT T1.*
FROM yourTable AS T1
LEFT JOIN yourTable AS T2 ON
(
T2.ID = T1.ID
AND T2.Date > T1.Date
)
WHERE T2.ID IS NULL
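The NOT EXISTS form and the LEFT JOIN ... IS NULL form are both anti-joins: a row is kept only when no newer row exists for the same ID. A small sketch, run against SQLite through Python's sqlite3 module instead of PostgreSQL (an assumption), with made-up rows, shows the two returning the same result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourTable (ID INTEGER, Date TEXT, Data TEXT)")
conn.executemany(
    "INSERT INTO yourTable VALUES (?, ?, ?)",
    [
        (1, "2018-01-01", "old"),
        (1, "2018-02-01", "new"),
        (2, "2018-01-15", "only"),
    ],
)

# Keep a row only if no row with the same ID has a later Date.
not_exists = conn.execute(
    """
    SELECT T1.ID, T1.Date, T1.Data
    FROM yourTable AS T1
    WHERE NOT EXISTS (
        SELECT 1 FROM yourTable AS T2
        WHERE T2.ID = T1.ID AND T2.Date > T1.Date
    )
    ORDER BY T1.ID
    """
).fetchall()

# Same idea as a self LEFT JOIN: unmatched rows (T2.ID IS NULL) are the latest.
left_join = conn.execute(
    """
    SELECT T1.ID, T1.Date, T1.Data
    FROM yourTable AS T1
    LEFT JOIN yourTable AS T2
      ON T2.ID = T1.ID AND T2.Date > T1.Date
    WHERE T2.ID IS NULL
    ORDER BY T1.ID
    """
).fetchall()

print(not_exists)
```

As with the other comparison-based approaches, rows tied on the latest Date for an ID would all be returned.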
One more:
WITH T AS (
SELECT *, ROW_NUMBER() OVER(PARTITION BY ID ORDER BY Date DESC) AS rn
FROM yourTable
)
SELECT * FROM T WHERE rn = 1
OK, I'm getting carried away; here's the last one I'll post (for now):
WITH T AS (
SELECT ID, MAX(Date) AS latest_date
FROM yourTable
GROUP BY ID
)
SELECT yourTable.*
FROM yourTable
JOIN T ON T.ID = yourTable.ID AND T.latest_date = yourTable.Date
I would use DISTINCT ON
CREATE VIEW your_view AS
SELECT DISTINCT ON (id) *
FROM your_table a
ORDER BY id, date DESC;
This works because DISTINCT ON keeps only the first row for each distinct value of the expression in parentheses. DESC in the ORDER BY means the row that would normally sort last comes first, and is therefore the one that shows in the result.
https://www.postgresql.org/docs/10/static/sql-select.html#SQL-DISTINCT
This seems like a good use for correlated subqueries:
CREATE VIEW your_view AS
SELECT *
FROM your_table a
WHERE date = (
SELECT MAX(date)
FROM your_table b
WHERE b.id = a.id
)
Your date column would need to be unique within each ID (for example a TIMESTAMP type); otherwise rows tied on the latest date are all returned.