how to difference of row numbers approach - sql

i have data like this table
ItemId Value Date
1 2 2017-12-18 17:00:00.000
1 2 2017-12-18 17:02:00.000
1 2 2017-12-18 17:04:00.000
1 3 2017-12-18 17:06:00.000
1 3 2017-12-18 17:08:00.000
1 2 2017-12-18 17:10:00.000
1 2 2017-12-18 17:12:00.000
1 2 2017-12-18 17:16:00.000
1 4 2017-12-18 17:14:00.000
i want to output like this in sql server
ItemId Value MaxDate
1 2 2017-12-18 17:04:00.000
1 3 2017-12-18 17:08:00.000
1 2 2017-12-18 17:16:00.000
1 4 2017-12-18 17:14:00.000
thanks for your anwsers.

You appear to want the last row before value changes, although I'm not sure where value "4" comes from (my best guess is that the last input row should have a "4" and a different timestamp).
If so, you can simply use lead():
select t.*
from (select t.*,
lead(value) over (partition by itemId order by date) as next_value
from t
) t
where next_value is null or next_value <> value;

Related

How to count consecutive days in a table where days are duplicated "PostgresSQL"

Hello I would like to know the highest count of consecutive days a user has trained for.
My logs table that stores the records looks like this:
id
user_id
day
ground_id
created_at
1
1
1
1
2023-01-24 10:00:00
2
1
2
1
2023-01-25 10:00:00
3
1
3
1
2023-01-26 10:00:00
4
1
4
1
2023-01-27 10:00:00
5
1
5
1
2023-01-28 10:00:00
The closest I could get is with this query, which does work only if the user has trained on one ground at a day.
SELECT COUNT(*) AS days_in_row
FROM (SELECT row_number() OVER (ORDER BY day) - day AS grp
FROM logs
WHERE created_at >= '2023-01-24 00:00:00'
AND user_id = 1) x
GROUP BY grp
logs table:
id
user_id
day
ground_id
created_at
1
1
1
1
2023-01-24 10:00:00
2
1
2
1
2023-01-25 10:00:00
3
1
3
1
2023-01-26 10:00:00
4
1
4
1
2023-01-27 10:00:00
5
1
5
1
2023-01-28 10:00:00
This query would return a count of 5 consecutive days which is correct.
However my query doesn't work once a user trains multiple times on different training grounds in one day:
logs table:
id
user_id
day
ground_id
created_at
1
1
1
1
2023-01-24 10:00:00
2
1
2
1
2023-01-25 10:00:00
3
1
3
1
2023-01-26 10:00:00
4
1
3
2
2023-01-26 10:00:00
5
1
4
1
2023-01-27 10:00:00
Than the query from above would return a count of 2 consecutive days which is not what I expect instead I would expect the number four because the user has trained the following days in row (1,2,3,4).
Thank you for reading.
Select only distinct data of interest first
SELECT min(created_at) start, COUNT(*) AS days_in_row
FROM (SELECT created_at, row_number() OVER (ORDER BY day) - day AS grp
FROM (
select distinct day, created_at
from logs
where created_at >= '2023-01-24 00:00:00'
AND user_id = 1) t
) x
GROUP BY grp

SQL - Count total IDs each day between dates

Here is what my data looks like
ID StartDate EndDate
1 1/1/2019 1/15/2019
2 1/10/2019 1/11/2019
3 2/5/2020 3/10/2020
4 3/10/2019 3/19/2019
5 5/1/2020 5/4/2020
I am trying to get a list of every date in my data set,and how many IDs fall in that time range, aggregated to the date level. So for ID-1, it would be in the records for 1/1/2019, 1/2/2019...through 1/15/2019.
I am not sure how to do this. All help is appreciated.
If you don't have a calendar table (highly recommended), you can perform this task with an ad-hoc tally table in concert with a CROSS APPLY
Example
Declare #YourTable Table ([ID] varchar(50),[StartDate] date,[EndDate] date)
Insert Into #YourTable Values
(1,'1/1/2019','1/15/2019')
,(2,'1/10/2019','1/11/2019')
,(3,'2/5/2020','3/10/2020')
,(4,'3/10/2019','3/19/2019')
,(5,'5/1/2020','5/4/2020')
Select A.ID
,B.Date
From #YourTable A
Cross Apply (
Select Top (DateDiff(DAY,A.[StartDate],A.[EndDate])+1) Date=DateAdd(DAY,-1+Row_Number() Over (Order By (Select Null)),A.[StartDate])
From master..spt_values n1,master..spt_values n2
) B
Returns
ID Date
1 2019-01-01
1 2019-01-02
1 2019-01-03
1 2019-01-04
1 2019-01-05
1 2019-01-06
1 2019-01-07
1 2019-01-08
1 2019-01-09
1 2019-01-15
2 2019-01-10
2 2019-01-11
....
5 2020-05-01
5 2020-05-02
5 2020-05-03
5 2020-05-04

SQL Collapse Data

I am trying to collapse data that is in a sequence sorted by date. While grouping on the person and the type.
The data is stored in an SQL server and looks like the following -
seq person date type
--- ------ ------------------- ----
1 1 2018-02-10 08:00:00 1
2 1 2018-02-11 08:00:00 1
3 1 2018-02-12 08:00:00 1
4 1 2018-02-14 16:00:00 1
5 1 2018-02-15 16:00:00 1
6 1 2018-02-16 16:00:00 1
7 1 2018-02-20 08:00:00 2
8 1 2018-02-21 08:00:00 2
9 1 2018-02-22 08:00:00 2
10 1 2018-02-23 08:00:00 1
11 1 2018-02-24 08:00:00 1
12 1 2018-02-25 08:00:00 2
13 2 2018-02-10 08:00:00 1
14 2 2018-02-11 08:00:00 1
15 2 2018-02-12 08:00:00 1
16 2 2018-02-14 16:00:00 3
17 2 2018-02-15 16:00:00 3
18 2 2018-02-16 16:00:00 3
This data set contains about 1.2 million records that resemble the above.
The result that I would like to get from this would be -
person start type
------ ------------------- ----
1 2018-02-10 08:00:00 1
1 2018-02-20 08:00:00 2
1 2018-02-23 08:00:00 1
1 2018-02-25 08:00:00 2
2 2018-02-10 08:00:00 1
2 2018-02-14 16:00:00 3
I have the data in the first format by running the following query -
select
ROW_NUMBER() OVER (ORDER BY date) AS seq
person,
date,
type,
from table
group by person, date, type
I am just not sure how to keep the minimum date with the other distinct values from person and type.
This is a gaps-and-islands problem so, you can use differences of row_number() & use them in grouping :
select person, min(date) as start, type
from (select *,
row_number() over (partition by person order by seq) seq1,
row_number() over (partition by person, type order by seq) seq2
from table
) t
group by person, type, (seq1 - seq2)
order by person, start;
The correct solution using the difference of row numbers is:
select person, type, min(date) as start
from (select t.*,
row_number() over (partition by person order by seq) as seqnum_p,
row_number() over (partition by person, type order by seq) as seqnum_pt
from t
) t
group by person, type, (seqnum_p - seqnum_pt)
order by person, start;
type needs to be included in the GROUP BY.

How to get data into every 5 mins in SQL Server?

Just wandering, will it be possible to get data into every 5 mins group by name in SQL Server.
Example: I have the following data in a table:
id name message datetime
--- -------- ------- --------
1 David test 1 2017-12-18 10:00
2 David test 2 2017-12-18 10:01
3 David test 3 2017-12-18 10:03
4 Alvin bluh 1 2017-12-18 10:04
5 Alvin bluh 2 2017-12-18 10:04
6 David test 4 2017-12-18 10:06
How can I get the result as below in SQL Server?
id name message datetime
--- -------- ------- --------
1 David test 1 2017-12-18 10:00
2 David test 2
3 David test 3
4 Alvin bluh 1 2017-12-18 10:04
5 Alvin bluh 2
6 David test 4 2017-12-18 10:06
Group by Name, and group datetime by 5 mins
select id, name, message,
datetime = case when rn = 1 then datetime end
from (
select *, rn = row_number() over (partition by name,
dateadd(minute, datediff(minute, 0, datetime) / 5 * 5, 0)
order by datetime)
from yourtable t
) d
order by id

PostgreSQL - rank over rows listed in blocks of 0 and 1

I have a table that looks like:
id code date1 date2 block
--------------------------------------------------
20 1234 2017-07-01 2017-07-31 1
15 1234 2017-06-01 2017-06-30 1
13 1234 2017-05-01 2017-05-31 0
11 1234 2017-03-01 2017-03-31 0
9 1234 2017-02-01 2017-02-28 1
8 1234 2017-01-01 2017-01-31 0
7 1234 2016-11-01 2016-11-31 0
6 1234 2016-10-01 2016-10-31 1
2 1234 2016-09-01 2016-09-31 1
I need to rank the rows according to the blocks of 0's and 1's, like:
id code date1 date2 block desired_rank
-------------------------------------------------------------------
20 1234 2017-07-01 2017-07-31 1 1
15 1234 2017-06-01 2017-06-30 1 1
13 1234 2017-05-01 2017-05-31 0 2
11 1234 2017-03-01 2017-03-31 0 2
9 1234 2017-02-01 2017-02-28 1 3
8 1234 2017-01-01 2017-01-31 0 4
7 1234 2016-11-01 2016-11-31 0 4
6 1234 2016-10-01 2016-10-31 1 5
2 1234 2016-09-01 2016-09-31 1 5
I've tried to use rank() and dense_rank(), but the result I end up with is:
id code date1 date2 block dense_rank()
-------------------------------------------------------------------
20 1234 2017-07-01 2017-07-31 1 1
15 1234 2017-06-01 2017-06-30 1 2
13 1234 2017-05-01 2017-05-31 0 1
11 1234 2017-03-01 2017-03-31 0 2
9 1234 2017-02-01 2017-02-28 1 3
8 1234 2017-01-01 2017-01-31 0 3
7 1234 2016-11-01 2016-11-31 0 4
6 1234 2016-10-01 2016-10-31 1 4
2 1234 2016-09-01 2016-09-31 1 5
In the last table, the rank doesn't care about the rows, it just takes all the 1's and 0's as a unit and sets an ascending count starting at the first 1 and 0.
My query goes like this:
CREATE TEMP TABLE data (id integer,code text, date1 date, date2 date, block integer);
INSERT INTO data VALUES
(20,'1234', '2017-07-01','2017-07-31',1),
(15,'1234', '2017-06-01','2017-06-30',1),
(13,'1234', '2017-05-01','2017-05-31',0),
(11,'1234', '2017-03-01','2017-03-31',0),
(9, '1234', '2017-02-01','2017-02-28',1),
(8, '1234', '2017-01-01','2017-01-31',0),
(7, '1234', '2016-11-01','2016-11-30',0),
(6, '1234', '2016-10-01','2016-10-31',1),
(2, '1234', '2016-09-01','2016-09-30',1);
SELECT *,dense_rank() OVER (PARTITION BY code,block ORDER BY date2 DESC)
FROM data
ORDER BY date2 DESC;
By the way, the database is in postgreSQL.
I hope there's a workaround... Thanks :)
Edit: Note that the blocks of 0's and 1's aren't equal.
There's no way to get this result using a single Window Function:
SELECT *,
Sum(flag) -- now sum the 0/1 to create the "rank"
Over (PARTITION BY code
ORDER BY date2 DESC)
FROM
(
SELECT *,
CASE
WHEN Lag(block) -- check if this is the 1st row of a new block
Over (PARTITION BY code
ORDER BY date2 DESC) = block
THEN 0
ELSE 1
END AS flag
FROM DATA
) AS dt