Getting the First and Last Row Using ROW_NUMBER and PARTITION BY - sql

Sample Input
Name | Value | Timestamp
-----|-------|-----------------
One | 1 | 2016-01-01 02:00
Two | 3 | 2016-01-01 03:00
One | 2 | 2016-01-02 02:00
Two | 4 | 2016-01-03 04:00
Desired Output
Name | Value | EarliestTimestamp | LatestTimestamp
-----|-------|-------------------|-----------------
One | 2 | 2016-01-01 02:00 | 2016-01-02 02:00
Two | 4 | 2016-01-01 03:00 | 2016-01-03 04:00
Attempted Query
I am trying to use ROW_NUMBER() and PARTITION BY to get the latest Name and Value but I would also like the earliest and latest Timestamp value:
SELECT
t.Name,
t.Value,
t.????????? AS EarliestTimestamp,
t.Timestamp AS LatestTimestamp
FROM
(SELECT
ROW_NUMBER() OVER (PARTITION BY Name ORDER BY TIMESTAMP DESC) AS RowNumber,
Name,
Value
Timestamp) t
WHERE t.RowNumber = 1

This can be done using window functions min and max.
select distinct name,
min(timestamp) over(partition by name), max(timestamp) over(partition by name)
from tablename
Example
Edit: Based on the comments
select t.name,t.value,t1.earliest,t1.latest
from t
join (select distinct name,
min(tm) over(partition by name) earliest, max(tm) over(partition by name) latest
from t) t1 on t1.name = t.name and t1.latest = t.tm
Edit: Another approach is using the first_value window function, which would eliminate the need for a sub-query and join.
select distinct
name,
first_value(value) over(partition by name order by timestamp desc) as latest_value,
min(tm) over(partition by name) earliest,
-- or first_value can be used
-- first_value(timestamp) over(partition by name order by timestamp)
max(tm) over(partition by name) latest
-- or first_value can be used
-- first_value(timestamp) over(partition by name order by timestamp desc)
from t

If I'm understanding your question correctly, here's one option using the row_number function twice. Then to get them on the same row, you can use conditional aggregation.
This should be close:
SELECT
t.Name,
t.Value,
max(case when t.minrn = 1 then t.timestamp end) AS EarliestTimestamp,
max(case when t.maxrn = 1 then t.timestamp end) AS LatestTimestamp
FROM
(SELECT
ROW_NUMBER() OVER (PARTITION BY Name ORDER BY TIMESTAMP) as minrn,
ROW_NUMBER() OVER (PARTITION BY Name ORDER BY TIMESTAMP DESC) as maxrn,
Name,
Value
Timestamp
FROM YourTable) t
WHERE t.minrn = 1 or t.maxrn = 1
GROUP BY t.Name, t.Value

Use MIN(Timestamp) OVER (PARTITION BY Name) in addition to the ROW_NUMBER() column, like so:
SELECT
t.Name,
t.Value,
t.EarliestTimestamp AS EarliestTimestamp,
t.Timestamp AS LatestTimestamp
FROM
(SELECT
ROW_NUMBER() OVER (PARTITION BY Name ORDER BY TIMESTAMP DESC) AS RowNumber,
MIN(Timestamp) OVER (PARTITION BY Name) AS EarliestTimestamp,
^^
Name,
Value
Timestamp) t
WHERE t.RowNumber = 1

You can use MIN and MAX functions + OUTER APPLY:
SELECT t.Name,
p.[Value],
MIN(t.[Timestamp]) as EarliestTimestamp ,
MAX(t.[Timestamp]) as LatestTimestamp
FROM Table1 t
OUTER APPLY (SELECT TOP 1 * FROM Table1 WHERE t.Name = Name ORDER BY [Timestamp] DESC) p
GROUP BY t.Name, p.[Value]
Output:
Name Value EarliestTimestamp LatestTimestamp
One 2 2016-01-01 02:00 2016-01-02 02:00
Two 4 2016-01-01 03:00 2016-01-03 04:00

If I understood your question, use the row_number() function as follows:
SELECT
t.Name,
t.Value,
min(t.Timestamp) Over (Partition by name) As EarliestTimestamp,
t.Timestamp AS LatestTimestamp
FROM
(SELECT ROW_NUMBER() OVER (PARTITION BY Name ORDER BY TIMESTAMP DESC) AS RowNumber,
Name,
Value,
Timestamp) t
WHERE t.RowNumber = 1
Group By t.Name, t.Value, t.TimeStamp

Think simple.
select
t.Name,
MAX(t.Value),
MIN(t.Timestamp),
MAX(t.Timestamp)
FROM
t
group by
t.Name

Related

SQL The largest number of consecutive values for each value

I have Tabel MatchResults
id | player_win_id
------------------
1 | 1
2 | 1
3 | 3
4 | 1
5 | 2
6 | 3
7 | 3
8 | 1
9 | 1
10 | 1
I need to find out for each player ID the highest number of consecutive victories. I use MS SQL Server.
Expected Result
PLAYER_ID | WIN_COUNT
------------------
1 | 3
2 | 1
3 | 2
This is a type of gaps-and-islands problem. One solution uses the difference of row numbers. So, to get all streaks:
select player_win_id, count(*)
from (select t.*,
row_number() over (order by id) as seqnum,
row_number() over (partition by player_win_id order by id) as seqnum_p
from MatchResults t
) t
group by player_win_id, (seqnum - seqnum_p);
Why this works is a little tricky to explain. But if you look at the results of the subquery, you'll probably see how the difference between the row number values captures adjacent rows with the same player win id.
For the maximum, the simplest is probably just an aggregation query:
select player_win_id, max(cnt)
from (select player_win_id, count(*) as cnt
from (select t.*,
row_number() over (order by id) as seqnum,
row_number() over (partition by player_win_id order by id) as seqnum_p
from MatchResults t
) t
group by player_win_id, (seqnum - seqnum_p)
) p
group by player_win_id;
Now I understand the previous comment. The code for my table is:
select player_win_id, max(cnt)
from (select player_win_id, count(*) as cnt
from (select *,
row_number() over (order by id) as seqnum,
row_number() over (partition by player_win_id order by id) as seqnum_p
from MatchResults ) t
group by player_win_id, (seqnum - seqnum_p)
) p
group by player_win_id;

How to count repeating values in a column in PostgreSQL?

Hi I have a table like below, and I want to count the repeating values in the status column. I don't want to calculate the overall duplicate values. For example, I just want to count how many "Offline" appears until the value changes to "Idle".
This is the result I wanted. Thank you.
This is often called gaps-and-islands.
One way to do it is with two sequences of row numbers.
Examine each intermediate result of the query to understand how it works.
WITH
CTE_rn
AS
(
SELECT
status
,dt
,ROW_NUMBER() OVER (ORDER BY dt) as rn1
,ROW_NUMBER() OVER (PARTITION BY status ORDER BY dt) as rn2
FROM
T
)
SELECT
status
,COUNT(*) AS cnt
FROM
CTE_rn
GROUP BY
status
,rn1-rn2
ORDER BY
min(dt)
;
Result
| status | cnt |
|---------|-----|
| offline | 2 |
| idle | 1 |
| offline | 2 |
| idle | 1 |
WITH
cte1 AS ( SELECT status,
"date",
workstation,
CASE WHEN status = LAG(status) OVER (PARTITION BY workstation ORDER BY "date")
THEN 0
ELSE 1 END changed
FROM test ),
cte2 AS ( SELECT status,
"date",
workstation,
SUM(changed) OVER (PARTITION BY workstation ORDER BY "date") group_num
FROM cte1 )
SELECT status, COUNT(*) "count", workstation, MIN("date") "from", MAX("date") "till"
FROM cte2
GROUP BY group_num, status, workstation;
fiddle

Delete duplicated record

I have a table which contains a lot of duplicated rows like this:
id_emp id date ch_in ch_out
1 34103 2019-09-01
1 34193 2019-09-01 17:00
1 34194 2019-09-02 07:03:21 16:59:26
1 34104 2019-09-02 07:03:21 16:59:26
1 33361 2019-09-02 NULL NULL
I want just one row for each date and others must delete with condition like I want the output must be:
id_emp id date ch_in ch_out
1 34193 2019-09-01 17:00
1 34104 2019-09-02 07:03:21 16:59:26
I tried to use distinct but nothing working:
select distinct id_emp, id, date_1, ch_in,ch_out
from ch_inout
where id_emp=1 order by date_1 asc
And I tried too using this query to delete:
select *
from (
select *, rn=row_number() over (partition by date_1 order by id)
from ch_inout
) x
where rn > 1;
But nothing is working the result is empty.
You can use aggregation:
select id_emp, max(id) as id, date, min(ch_in), max(ch_out)
from ch_inout
group by id_emp, date;
This returns the maximum id for each group of rows. That is not exactly what is returned in the question, but you don't specify the logic.
EDIT:
If you want to delete all but the largest id for each id_emp/date combination, you can use:
delete c from ch_inout c
where id < (select max(c2.id)
from ch_inout c2
where c2.id_emp = c.id_emp and c2.date = c.date
);
You can use ROW_NUMBER() to identify the records you want to delete. Assuming that you want to keep the record with the lowest id on each date:
SELECT *
FROM (
SELECT
t.*,
ROW_NUMBER() OVER(PARTITION BY date ORDER BY id) rn
FROM ch_inout t
) x
WHERE rn > 1
You can easily turn this into a DELETE statement:
WITH cte AS (
SELECT
t.*,
ROW_NUMBER() OVER(PARTITION BY date ORDER BY id) rn
FROM ch_inout t
)
DELETE FROM cte WHERE rn > 1

First value in DATE minus 30 days SQL

I have bunch of data out of which I'm showing ID, max date and it's corresponding values (user id, type, ...). Then I need to take MAX date for each ID, substract 30 days and show first date and it's corresponding values within this date period.
Example:
ID Date Name
1 01.05.2018 AAA
1 21.04.2018 CCC
1 05.04.2018 BBB
1 28.03.2018 AAA
expected:
ID max_date max_name previous_date previous_name
1 01.05.2018 AAA 05.04.2018 BBB
I have working solution using subselects, but as I have quite huge WHERE part, refresh takes ages.
SUBSELECT looks like that:
(SELECT MIN(N.name)
FROM t1 N
WHERE N.ID = T.ID
AND (N.date < MAX(T.date) AND N.date >= (MAX(T.date)-30))
AND (...)) AS PreviousName
How'd you write the select?
I'm using TSQL
Thanks
I can do this with 2 CTEs to build up the dates and names.
SQL Fiddle
MS SQL Server 2017 Schema Setup:
CREATE TABLE t1 (ID int, theDate date, theName varchar(10)) ;
INSERT INTO t1 (ID, theDate, theName)
VALUES
( 1,'2018-05-01','AAA' )
, ( 1,'2018-04-21','CCC' )
, ( 1,'2018-04-05','BBB' )
, ( 1,'2018-03-27','AAA' )
, ( 2,'2018-05-02','AAA' )
, ( 2,'2018-05-21','CCC' )
, ( 2,'2018-03-03','BBB' )
, ( 2,'2018-01-20','AAA' )
;
Main Query:
;WITH cte1 AS (
SELECT t1.ID, t1.theDate, t1.theName
, DATEADD(day,-30,t1.theDate) AS dMinus30
, ROW_NUMBER() OVER (PARTITION BY t1.ID ORDER BY t1.theDate DESC) AS rn
FROM t1
)
, cte2 AS (
SELECT c2.ID, c2.theDate, c2.theName
, ROW_NUMBER() OVER (PARTITION BY c2.ID ORDER BY c2.theDate) AS rn
, COUNT(*) OVER (PARTITION BY c2.ID) AS theCount
FROM cte1
INNER JOIN cte1 c2 ON cte1.ID = c2.ID
AND c2.theDate >= cte1.dMinus30
WHERE cte1.rn = 1
GROUP BY c2.ID, c2.theDate, c2.theName
)
SELECT cte1.ID, cte1.theDate AS max_date, cte1.theName AS max_name
, cte2.theDate AS previous_date, cte2.theName AS previous_name
, cte2.theCount
FROM cte1
INNER JOIN cte2 ON cte1.ID = cte2.ID
AND cte2.rn=1
WHERE cte1.rn = 1
Results:
| ID | max_date | max_name | previous_date | previous_name |
|----|------------|----------|---------------|---------------|
| 1 | 2018-05-01 | AAA | 2018-04-05 | BBB |
| 2 | 2018-05-21 | CCC | 2018-05-02 | AAA |
cte1 builds the list of max_date and max_name grouped by the ID and then using a ROW_NUMBER() window function to sort the groups by the dates to get the most recent date. cte2 joins back to this list to get all dates within the last 30 days of cte1's max date. Then it does essentially the same thing to get the last date. Then the outer query joins those two results together to get the columns needed while only selecting the most and least recent rows from each respectively.
I'm not sure how well it will scale with your data, but using the CTEs should optimize pretty well.
EDIT: For the additional requirement, I just added in another COUNT() window function to cte2.
I would do:
select id,
max(case when seqnum = 1 then date end) as max_date,
max(case when seqnum = 1 then name end) as max_name,
max(case when seqnum = 2 then date end) as prev_date,
max(case when seqnum = 2 then name end) as prev_name,
from (select e.*, row_number() over (partition by id order by date desc) as seqnum
from example e
) e
group by id;

select top N records for each entity

I have a table like below -
ID | Reported Date | Device_ID
-------------------------------------------
1 | 2016-03-09 09:08:32.827 | 1
2 | 2016-03-08 09:08:32.827 | 1
3 | 2016-03-08 09:08:32.827 | 1
4 | 2016-03-10 09:08:32.827 | 2
5 | 2016-03-05 09:08:32.827 | 2
Now, i want a top 1 row based on date column for each device_ID
Expected Output
ID | Reported Date | Device_ID
-------------------------------------------
1 | 2016-03-09 09:08:32.827 | 1
4 | 2016-03-10 09:08:32.827 | 2
I am using SQL Server 2008 R2. i can go and write Stored Procedure to handle it but wanted do it with simple query.
****************EDIT**************************
Answer by 'Felix Pamittan' worked well but for 'N' just change it to
SELECT
Id, [Reported Date], Device_ID
FROM (
SELECT *,
Rn = ROW_NUMBER() OVER(PARTITION BY Device_ID ORDER BY [ReportedDate] DESC)
FROM tbl
)t
WHERE Rn >= N
He had mentioned this in comment thought to add it to questions so that no body miss it.
Use ROW_NUMBER:
SELECT
Id, [Reported Date], Device_ID
FROM (
SELECT *,
Rn = ROW_NUMBER() OVER(PARTITION BY Device_ID ORDER BY [ReportedDate] DESC)
FROM tbl
)t
WHERE Rn = 1
You can also try using CTE
With DeviceCTE AS
(SELECT *, ROW_NUMBER() OVER(PARTITION BY Device_ID ORDER BY [Reported Date] DESC) AS Num
FROM tblname)
SELECT Id, [Reported Date], Device_ID
From DeviceCTE
Where Num = 1
If you can't use an analytic function, e.g. because your application layer won't allow it, then you can try the following solution which uses a subquery to arrive at the answer:
SELECT t1.ID, t2.maxDate, t1.Device_ID
INNER JOIN
(
SELECT Device_ID, MAX([Reported Date]) AS maxDate
FROM yourTable
GROUP BY Device_ID
) t2
ON t1.Device_ID = t2.Device_ID AND t1.[Reported Date] = t2.maxDate
Select * from DEVICE_TABLE D
where [Reported Date] = (Select Max([Reported Date]) from DEVICE_TABLE where Device_ID = D.Device_ID)
should do the trick, assume that "top 1 row based on date column" means that you want to select the latest reported date of each Device_ID ?
As for your title, select top 5 rows of each Device_ID
Select * from DEVICE_TABLE D
where [Reported Date] in (Select top 5 [Reported Date] from DEVICE_TABLE D where Device_ID = D.Device_ID)
order by Device_ID, [Reported Date] desc
will give you the top 5 latest reports of each device id.
You may want to sort out the top 5 date if your data isn't in order...
Again with no analytic functions you can use CROSS APPLY :
DECLARE #tbl TABLE(Id INT,[Reported Date] DateTime , Device_ID INT)
INSERT INTO #tbl
VALUES
(1,'2016-03-09 09:08:32.827',1),
(2,'2016-03-08 09:08:32.827',1),
(3,'2016-03-08 09:08:32.827',1),
(4,'2016-03-10 09:08:32.827',2),
(5,'2016-03-05 09:08:32.827',2)
SELECT r.*
FROM ( SELECT DISTINCT Device_ID FROM #tbl ) d
CROSS APPLY ( SELECT TOP 1 *
FROM #tbl t
WHERE d.Device_ID = t.Device_ID ) r
Can be easily modified to support N records.
Credits go to wBob answering this question here