Calculate Date difference between two consecutive rows - sql

I have a table which contains datetime rows like below.
ID | DateTime
1 | 12:00
2 | 12:02
3 | 12:03
4 | 12:04
5 | 12:05
6 | 12:10
I want to identify those rows where there is a 'gap' of 5 minutes between rows (for example, row 5 and 6).
I know that we need to use DATEDIFF, but how can I only get those rows which are consecutive with each other?

You can use the LAG and LEAD window functions for this:
SELECT ID
FROM (
SELECT ID, [DateTime],
DATEDIFF(mi, LAG([DateTime]) OVER (ORDER BY ID), [DateTime]) AS prev_diff,
DATEDIFF(mi, [DateTime], LEAD([DateTime]) OVER (ORDER BY ID)) AS next_diff
FROM mytable) AS t
WHERE prev_diff >= 5 OR next_diff >= 5
Output:
ID
==
5
6
Note: The above query assumes that order is defined by ID field. You can easily substitute this field with any other field that specifies order in your table.
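If you want to try the LAG/LEAD idea without a SQL Server instance, here is a minimal sketch using Python's built-in sqlite3 module. It assumes SQLite 3.25 or later (needed for window functions). SQLite has no DATEDIFF, so the times are stored as minutes since midnight; the column name minute_of_day is my own choice, not something from the question.

```python
import sqlite3

# Sample rows from the question; 12:00 is stored as 720 minutes since midnight
# (an assumption made here because SQLite has no DATEDIFF function).
rows = [(1, 720), (2, 722), (3, 723), (4, 724), (5, 725), (6, 730)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER, minute_of_day INTEGER)")
conn.executemany("INSERT INTO mytable VALUES (?, ?)", rows)

query = """
SELECT id
FROM (
    SELECT id,
           -- distance to the previous row (NULL for the first row)
           minute_of_day - LAG(minute_of_day) OVER (ORDER BY id) AS prev_diff,
           -- distance to the next row (NULL for the last row)
           LEAD(minute_of_day) OVER (ORDER BY id) - minute_of_day AS next_diff
    FROM mytable
) AS t
WHERE prev_diff >= 5 OR next_diff >= 5
ORDER BY id
"""
gap_ids = [row[0] for row in conn.execute(query)]
print(gap_ids)
```

The NULL differences on the first and last rows simply fail the comparison, so no special handling is needed for the edges.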

You might try this (I'm not sure whether it is fast):
SELECT cur.datetime AS current_datetime,
prev.datetime AS previous_datetime,
DATEDIFF(minute, prev.datetime, cur.datetime) AS gap
FROM my_table cur -- CURRENT is a reserved word in SQL Server, so short aliases are used
JOIN my_table prev
ON prev.datetime < cur.datetime
AND NOT EXISTS (SELECT *
FROM my_table others
WHERE others.datetime < cur.datetime
AND others.datetime > prev.datetime);
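The self-join pairs each row with the latest earlier row, which is useful on engines that lack window functions. A rough sketch of the same idea with Python's sqlite3 module follows. The minute_of_day column and the aliases cur, prev and oth are assumptions of mine, and the times are again stored as minutes since midnight.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER, minute_of_day INTEGER)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?)",
    [(1, 720), (2, 722), (3, 723), (4, 724), (5, 725), (6, 730)],
)

query = """
SELECT prev.minute_of_day AS previous_minute,
       cur.minute_of_day  AS current_minute,
       cur.minute_of_day - prev.minute_of_day AS gap
FROM my_table AS cur
JOIN my_table AS prev
  ON prev.minute_of_day < cur.minute_of_day
     -- keep only the closest earlier row: nothing may lie strictly between
 AND NOT EXISTS (SELECT 1
                 FROM my_table AS oth
                 WHERE oth.minute_of_day < cur.minute_of_day
                   AND oth.minute_of_day > prev.minute_of_day)
ORDER BY cur.minute_of_day
"""
pairs = conn.execute(query).fetchall()

# Filter for the gaps the question asks about (5 minutes or more)
big_gaps = [p for p in pairs if p[2] >= 5]
print(big_gaps)
```

Because every row is compared against every earlier row, this can get slow on large tables, which matches the caution in the answer above.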

Update for SQL Server 2012 and later: use LAG
DECLARE @tbl TABLE(ID INT, T TIME)
INSERT INTO @tbl VALUES
(1,'12:00')
,(2,'12:02')
,(3,'12:03')
,(4,'12:04')
,(5,'12:05')
,(6,'12:10');
WITH TimesWithDifferenceToPrevious AS
(
SELECT ID
,T
,LAG(T) OVER(ORDER BY T) AS prev
,DATEDIFF(MI,LAG(T) OVER(ORDER BY T),T) AS MinuteDiff
FROM @tbl
)
SELECT *
FROM TimesWithDifferenceToPrevious
WHERE ABS(MinuteDiff) >=5
The result
6 12:10:00.0000000 12:05:00.0000000 5

Related

SQL: How to create a daily view based on different time intervals using SQL logic?

Here is an example:
Id|price|Date
1|2|2022-05-21
1|3|2022-06-15
1|2.5|2022-06-19
Needs to look like this:
Id|Date|price
1|2022-05-21|2
1|2022-05-22|2
1|2022-05-23|2
...
1|2022-06-15|3
1|2022-06-16|3
1|2022-06-17|3
1|2022-06-18|3
1|2022-06-19|2.5
1|2022-06-20|2.5
...
Until today
1|2022-08-30|2.5
I tried using lag(price) over (partition by id order by date), but I can't get it right.
I'm not familiar with Azure, but it looks like you need to use a calendar table, or generate missing dates using a recursive CTE.
To get started with a recursive CTE, you can generate row numbers for each id (assuming multiple id values) in the source data, ordered by date. The rows with row number 1 (the minimum date for the corresponding id) serve as the starting point for the recursion. Then you can use the DATEADD function to generate the row for the next day. To pick up the price values from the original data, use a subquery to get the price for this new date; if there is no such value (no row for this date), fall back to the previous price value from the CTE with the COALESCE function.
For SQL Server, the query can look like this:
WITH cte AS (
SELECT
id,
date,
price
FROM (
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY id ORDER BY date) AS rn
FROM tbl
) t
WHERE rn = 1
UNION ALL
SELECT
cte.id,
DATEADD(d, 1, cte.date),
COALESCE(
(SELECT tbl.price
FROM tbl
WHERE tbl.id = cte.id AND tbl.date = DATEADD(d, 1, cte.date)),
cte.price
)
FROM cte
WHERE DATEADD(d, 1, cte.date) <= GETDATE()
)
SELECT * FROM cte
ORDER BY id, date
OPTION (MAXRECURSION 0)
Note that I added OPTION (MAXRECURSION 0) to lift the recursion limit: the default limit is 100 levels, which is not enough to generate a row for every day up to today.
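To experiment with the recursive forward fill locally, here is a small sketch using Python's sqlite3 module. A few things are my own assumptions: the anchor rows are found with a MIN(date) subquery instead of ROW_NUMBER, the fixed END_DATE stands in for GETDATE() so the output is reproducible, and the sample prices are made up. SQLite's date(x, '+1 day') plays the role of DATEADD.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER, date TEXT, price REAL)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?)",
    [(1, "2022-05-21", 2.0), (1, "2022-05-25", 3.0), (1, "2022-05-28", 2.5)],
)

END_DATE = "2022-05-30"  # hypothetical stand-in for "today"

query = """
WITH RECURSIVE cte(id, date, price) AS (
    -- anchor: the earliest row for each id
    SELECT t.id, t.date, t.price
    FROM tbl AS t
    WHERE t.date = (SELECT MIN(date) FROM tbl WHERE id = t.id)
    UNION ALL
    -- step: move one day forward, taking a new price if one exists,
    -- otherwise carrying the previous price along
    SELECT cte.id,
           date(cte.date, '+1 day'),
           COALESCE(
               (SELECT tbl.price
                FROM tbl
                WHERE tbl.id = cte.id
                  AND tbl.date = date(cte.date, '+1 day')),
               cte.price)
    FROM cte
    WHERE date(cte.date, '+1 day') <= :end_date
)
SELECT id, date, price FROM cte ORDER BY id, date
"""
daily = conn.execute(query, {"end_date": END_DATE}).fetchall()
for row in daily:
    print(row)
```

SQLite does not need an equivalent of MAXRECURSION here, because the recursion stops on its own once the date condition fails.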
The same approach for MySQL (you need MySQL 8.0 or later to use CTEs):
WITH RECURSIVE cte AS (
SELECT
id,
date,
price
FROM (
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY id ORDER BY date) AS rn
FROM tbl
) t
WHERE rn = 1
UNION ALL
SELECT
cte.id,
DATE_ADD(cte.date, interval 1 day),
COALESCE(
(SELECT tbl.price
FROM tbl
WHERE tbl.id = cte.id AND tbl.date = DATE_ADD(cte.date, interval 1 day)),
cte.price
)
FROM cte
WHERE DATE_ADD(cte.date, interval 1 day) <= NOW()
)
SELECT * FROM cte
ORDER BY id, date
Both queries produce the same results; the only difference is the use of each engine's specific date functions.
For MySQL versions below 8.0, you can use a calendar table, since there is no CTE support to generate the required date range.
Assuming there is a column in the calendar table that stores date values (let's call it date for simplicity), you can use the CROSS JOIN operator to generate, for each id in your table, the range of dates starting at that id's earliest date. Then you can use a subquery to get the latest price stored on or before the corresponding date.
So the query would look like this:
SELECT
d.id,
d.date,
(SELECT
price
FROM tbl
WHERE tbl.id = d.id AND tbl.date <= d.date
ORDER BY tbl.date DESC
LIMIT 1
) price
FROM (
SELECT
t.id,
c.date
FROM calendar c
CROSS JOIN (SELECT DISTINCT id FROM tbl) t
WHERE c.date BETWEEN (
SELECT
MIN(date) min_date
FROM tbl
WHERE tbl.id = t.id
)
AND NOW()
) d
ORDER BY id, date
Using my pseudo-calendar table with date values ranging from 2022-05-20 to 2022-05-30 and source data in that range, like so:
id|price|date
1|2|2022-05-21
1|3|2022-05-25
1|2.5|2022-05-28
2|10|2022-05-25
2|100|2022-05-30
the query produces the following results:
id|date|price
1|2022-05-21|2
1|2022-05-22|2
1|2022-05-23|2
1|2022-05-24|2
1|2022-05-25|3
1|2022-05-26|3
1|2022-05-27|3
1|2022-05-28|2.5
1|2022-05-29|2.5
1|2022-05-30|2.5
2|2022-05-25|10
2|2022-05-26|10
2|2022-05-27|10
2|2022-05-28|10
2|2022-05-29|10
2|2022-05-30|100
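The calendar-table variant can be checked the same way. Below is a sketch with Python's sqlite3 module that loads the sample rows listed above and a calendar covering 2022-05-20 to 2022-05-30. One deliberate difference, which is my assumption: the upper bound NOW() is left out, because the small calendar already ends at the last date of interest.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER, price REAL, date TEXT)")
conn.execute("CREATE TABLE calendar (date TEXT)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?)",
    [
        (1, 2, "2022-05-21"), (1, 3, "2022-05-25"), (1, 2.5, "2022-05-28"),
        (2, 10, "2022-05-25"), (2, 100, "2022-05-30"),
    ],
)
# Pseudo-calendar: one row per day from 2022-05-20 to 2022-05-30
start = date(2022, 5, 20)
conn.executemany(
    "INSERT INTO calendar VALUES (?)",
    [((start + timedelta(days=i)).isoformat(),) for i in range(11)],
)

query = """
SELECT d.id,
       d.date,
       -- latest known price on or before this calendar date
       (SELECT price
        FROM tbl
        WHERE tbl.id = d.id AND tbl.date <= d.date
        ORDER BY tbl.date DESC
        LIMIT 1) AS price
FROM (
    SELECT t.id, c.date
    FROM calendar AS c
    CROSS JOIN (SELECT DISTINCT id FROM tbl) AS t
    -- start each id at its own first date
    WHERE c.date >= (SELECT MIN(date) FROM tbl WHERE tbl.id = t.id)
) AS d
ORDER BY d.id, d.date
"""
filled = conn.execute(query).fetchall()
for row in filled:
    print(row)
```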

sum last 7 days of sales in new column

I have the following data set:
I want to create a new column that sums the last 7 days of sales. So the query result should look like the following:
Pls help
Thanks!
In standard SQL, you would use a window function -- assuming you have data for each day:
select t.*,
sum(sales) over (partition by itemid order by date rows between 6 preceding and current row) as sales_7
from t;
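Here is a small runnable sketch of that window frame, using Python's sqlite3 module (SQLite 3.25 or later). The table contents are invented for illustration: one item with exactly one row per day, which is the assumption the answer above relies on. With missing days, a ROWS frame would reach back more than 7 calendar days.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (itemid INTEGER, date TEXT, sales INTEGER)")
# Hypothetical data: ten consecutive days, sales equal to the day number
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, f"2022-01-{day:02d}", day) for day in range(1, 11)],
)

query = """
SELECT date,
       sales,
       SUM(sales) OVER (PARTITION BY itemid
                        ORDER BY date
                        ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS sales_7
FROM t
ORDER BY date
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

The first six rows sum fewer than seven days, because the frame is cut off at the start of the partition.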
Use the sum() aggregate function with GROUP BY:
select country,itemid,year,monthnumber,week, sum(sales) as sales_last_7days from your_table
where date>=DATEADD(day, -7, getdate()) and date< getdate()
group by country,itemid,year,monthnumber,week
With a window function:
select (list other columns here), sum(sum(sales)) over
(partition by week
order by date
rows between 6 preceding and current row)
from table
group by date, week;
Note that week doesn't change the GROUP BY, because a date belongs to exactly one week, but it is needed in the window.
It seems you are working with SQL Server. If so, you can use APPLY:
select t.*, t1.[last7day]
from table t outer apply
(select sum(t1.sales) as [last7day]
from table t1
where t.itemid = t1.itemid and
t1.date >= dateadd(day, -6, t.date) and -- lower bound: 6 days before the current row
t1.date <= t.date -- upper bound: the current row's date
) t1;
If you don't have exactly one day for each row, for example if you have a list of transactions...
The below example completely confused me the first time I saw it, so I've tried to comment as much as I can to explain what's happening.
Suppose we have a table tbl with date column dt and amount column amt, and for each date in tbl we want to return a rolling sum of the amount from the current day and the past 6 days.
select distinct -- see note after code on what this distinct is doing.
dt
, ( -- Has to be in brackets to denote we're returning 1 value per row.
-- for each row of T1:
select sum(T2.amt) -- the sum of amounts in T2. The where clause will restrict which rows in T2 will be summed.
from tbl T2
where T2.dt between T1.dt - 6 and T1.dt -- for each row in T1, give me all rows in T2 where the date is between 6 days before this T1 row's date and T1 row's date, giving us our rolling sum
-- WARNING: CHECK YOUR VERSION OF SQL FOR HOW TO SUBTRACT DAYS FROM A DATE, I'VE MADE IT (T1.dt - 6) FOR SIMPLICITY
-- we don't need a group by, because we're returning one value for each row in T1
)
from tbl T1
We have a main version of tbl, aliased T1. We then have a secondary table, aliased T2. For each row in T1, we're going to ask for a set of rows in T2 that we're going to sum before giving it to our main query.
To understand what's happening, run the code without the distinct. You'll notice that we have the same number of rows as in tbl, because the T2 statement is happening for every row in T1.
Notes:
If you have any days for which no rows exist in your table you will not get a calculation for this day. To be certain this doesn't happen, join your table to a table containing a distinct list of consecutive dates, and use this as your date column.
If you have nulls in your amount column the calculation will still work, but if the rolling sum covers only nulls you will get null instead of 0 as your result. If that troubles you, convert all your nulls to zeros before (or after) you use the query.
The beginning of the period will have a 'ramp up'. But this would be the same whatever method you use to do a rolling sum. If it bothers you, don't return the first 6 days.
Finally a worked example if you're playing along at home using SQL Server:
with tbl as (
-- a list of transactions from 1.10.2019 to 14.10.2019
select cast('2019-10-01' as date) dt, 1 amt
union select cast('2019-10-02' as date), 4
union select cast('2019-10-01' as date), 10
union select cast('2019-10-03' as date), 3
union select cast('2019-10-04' as date), 20
union select cast('2019-10-04' as date), 2
union select cast('2019-10-04' as date), 12
union select cast('2019-10-04' as date), 17
union select cast('2019-10-05' as date), null -- a whole week of null values because we all had the week off... I hope this data wasn't important
union select cast('2019-10-06' as date), null
union select cast('2019-10-07' as date), null
union select cast('2019-10-08' as date), null
union select cast('2019-10-09' as date), null
union select cast('2019-10-10' as date), null
union select cast('2019-10-10' as date), null
union select cast('2019-10-10' as date), null
union select cast('2019-10-11' as date), null
union select cast('2019-10-12' as date), 1
union select cast('2019-10-12' as date), 1
union select cast('2019-10-12' as date), 1
union select cast('2019-10-12' as date), 1
union select cast('2019-10-12' as date), 1
union select cast('2019-10-12' as date), 1
union select cast('2019-10-13' as date), 2
union select cast('2019-10-14' as date), 1000
)
select distinct
a.dt
, (
select sum(b.amt)
from tbl b
where b.dt between dateadd(dd, -6, a.dt) and a.dt
) past_7_days_amt
from tbl a
Returns:
+------------+-----------------+
| dt | past_7_days_amt |
+------------+-----------------+
| 2019-10-01 | 11 |
| 2019-10-02 | 15 |
| 2019-10-03 | 18 |
| 2019-10-04 | 69 |
| 2019-10-05 | 69 |
| 2019-10-06 | 69 |
| 2019-10-07 | 69 |
| 2019-10-08 | 58 |
| 2019-10-09 | 54 |
| 2019-10-10 | 51 |
| 2019-10-11 | NULL |
| 2019-10-12 | 1 |
| 2019-10-13 | 3 |
| 2019-10-14 | 1003 |
+------------+-----------------+

SQL Server - find absence date occurrences [duplicate]

I have the following dataset:
Here is script for this data:
;with dataset AS (
select 'EMP01' AS EMP_ID,CAST('2018-01-01' AS DATE) AS PERIOD_START,CAST('2018-01-31' AS DATE) AS PERIOD_END,CAST('2018-01-07' AS DATE) AS CUT_DATE
UNION
select 'EMP01' AS EMP_ID,CAST('2018-01-01' AS DATE) AS PERIOD_START,CAST('2018-01-31' AS DATE) AS PERIOD_END,CAST('2018-01-15' AS DATE) AS CUT_DATE
UNION
select 'EMP02' AS EMP_ID,CAST('2018-01-01' AS DATE) AS PERIOD_START,CAST('2018-01-31' AS DATE) AS PERIOD_END,CAST('2018-01-09' AS DATE) AS CUT_DATE
)
select *
from dataset
I need to split these periods (PERIOD_START to PERIOD_END) at each CUT_DATE, excluding the cut dates themselves from the resulting periods. There can be any number of cut dates (3, 5, 8, etc.).
The expected result for the dataset above is:
If your version of SQL Server supports LAG, you can use this.
SELECT EMPLOYEE_ID,
ITEM_TYPE,
MIN(APPLY_DATE) AS STARTDATE,
MAX(APPLY_DATE) AS ENDDATE
FROM
(SELECT T.*,
SUM(CASE WHEN PREV_TYPE=ITEM_TYPE THEN 0 ELSE 1 END)
OVER(PARTITION BY EMPLOYEE_ID ORDER BY APPLY_DATE) AS GRP
FROM (SELECT D.*,
LAG(ITEM_TYPE) OVER(PARTITION BY EMPLOYEE_ID ORDER BY APPLY_DATE) AS PREV_TYPE
FROM DATA D
) T
) T
WHERE ITEM_TYPE IN ('Sickness','Vacation')
GROUP BY EMPLOYEE_ID,ITEM_TYPE,GRP
The logic is to get the previous row's item_type (based on ascending order of apply_date) and compare it with the current row's value. If they are equal, they belong to the same group. Else you start a new group. This is done in the sum window function. After groups are assigned, you just need to get the max and min date for an employee_id,item_type.
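A runnable sketch of this LAG plus running-sum grouping, using Python's sqlite3 module (SQLite 3.25 or later), is below. The rows are hypothetical, since the original sample data is not shown here; only the column names follow the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE data (employee_id INTEGER, apply_date TEXT, item_type TEXT)"
)
# Hypothetical attendance rows for one employee
conn.executemany(
    "INSERT INTO data VALUES (?, ?, ?)",
    [
        (1, "2017-05-22", "Worked"),
        (1, "2017-05-23", "Sickness"),
        (1, "2017-05-24", "Sickness"),
        (1, "2017-05-25", "Worked"),
        (1, "2017-05-26", "Vacation"),
        (1, "2017-05-27", "Vacation"),
        (1, "2017-05-28", "Worked"),
        (1, "2017-05-29", "Sickness"),
    ],
)

query = """
SELECT employee_id, item_type, MIN(apply_date), MAX(apply_date)
FROM (
    SELECT t.*,
           -- running count of type changes: constant within one run
           SUM(CASE WHEN prev_type = item_type THEN 0 ELSE 1 END)
               OVER (PARTITION BY employee_id ORDER BY apply_date) AS grp
    FROM (
        SELECT d.*,
               LAG(item_type) OVER (PARTITION BY employee_id
                                    ORDER BY apply_date) AS prev_type
        FROM data AS d
    ) AS t
) AS g
WHERE item_type IN ('Sickness', 'Vacation')
GROUP BY employee_id, item_type, grp
ORDER BY employee_id, MIN(apply_date)
"""
absences = conn.execute(query).fetchall()
for row in absences:
    print(row)
```

The two separate sickness runs stay apart because the 'Worked' and 'Vacation' rows in between each bump the running sum.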
You would use the LAG function.
If you order by something, the LAG function gives the previous value;
a full description can be found at: http://www.sqlservercentral.com/articles/T-SQL/106783/
Take a look at vkp's answer for a full query
This is another way, if LAG is supported.
with tbl as
(select d.*
,case when (item_type = lag(item_type) over (partition by employee_id order by apply_date))
then 0
else 1
end grp_tmp
from DATA2 d
where
item_type <> 'Worked'
)
,tbl2 as
(select t.*
,sum(grp_tmp) over (order by employee_id,apply_date
rows between unbounded preceding and current row
)
as grp
from tbl t
)
select
EMPLOYEE_ID
,ITEM_TYPE
,(CONVERT(VARCHAR(24),min(apply_date),103)
+' - '
+CONVERT(VARCHAR(24),max(apply_date),103)
) as range
from tbl2
group by EMPLOYEE_ID,
ITEM_TYPE
,grp
order by
employee_id
,min(apply_date);
Output
+-------------+-----------+-------------------------+
| EMPLOYEE_ID | ITEM_TYPE | range |
+-------------+-----------+-------------------------+
| 1 | Sickness | 23/05/2017 - 24/05/2017 |
| 1 | Vacation | 26/05/2017 - 29/05/2017 |
| 1 | Sickness | 01/06/2017 - 01/06/2017 |
| 2 | Sickness | 25/05/2017 - 30/05/2017 |
+-------------+-----------+-------------------------+

Window functions with missing data

Assume that I have a table (MyTable) as follows:
item_id | date
----------------
1 | 2016-06-08
1 | 2016-06-07
1 | 2016-06-05
1 | 2016-06-04
1 | 2016-05-31
...
2 | 2016-06-08
2 | 2016-06-06
2 | 2016-06-04
2 | 2016-05-31
...
3 | 2016-05-31
...
I would like to build a weekly summary table that reports on a running 7 day window. The window would basically say "How many unique item_ids were reported in the preceding 7 days"?
So, in this case, the output table would look something like:
date | weekly_ids
----------------------
2016-05-31| 3 # All 3 were present on the 31st
2016-06-01| 3 # All 3 were present on the 31st which is < 7 days before the 1st
2016-06-02| 3 # Same
2016-06-03| 3 # Same
2016-06-04| 3 # Same
2016-06-05| 3 # Same
2016-06-06| 3 # Same
2016-06-07| 3 # Same
2016-06-08| 2 # item 3 was not present for the entire last week so it does not add to the count.
I've tried:
SELECT
item_id,
date,
MAX(present) OVER (
PARTITION BY item_id
ORDER BY date
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS is_present
FROM (
# Inner query
SELECT
item_id,
date,
1 AS present,
FROM MyTable
)
GROUP BY date
ORDER BY date DESC
This feels like it is going in the right direction. But as it is, the window runs over the wrong time-frame when dates aren't present (too many dates) and it also doesn't output records for dates when the item_id wasn't present (even if it was present on the previous date). Is there a simple resolution to this problem?
If it's helpful and necessary
I can hard-code an oldest date
I also can get a table of all of the item_ids in existence.
This query will only be run on BigQuery, so BQ specific functions/syntax are fair game and SQL functions/syntax that doesn't run on BigQuery unfortunately doesn't help me ...
I have created a temporary table to hold the dates; however, you would probably benefit from adding a permanent calendar table to your database for these joins. Trust me, it will cause fewer headaches.
DECLARE @my_table TABLE
(
item_id int,
date DATETIME
)
INSERT @my_table SELECT 1,'2016-06-08'
INSERT @my_table SELECT 1,'2016-06-07'
INSERT @my_table SELECT 1,'2016-06-05'
INSERT @my_table SELECT 1,'2016-06-04'
INSERT @my_table SELECT 1,'2016-05-31'
INSERT @my_table SELECT 2,'2016-06-08'
INSERT @my_table SELECT 2,'2016-06-06'
INSERT @my_table SELECT 2,'2016-06-04'
INSERT @my_table SELECT 2,'2016-05-31'
INSERT @my_table SELECT 3,'2016-05-31'
DECLARE @TrailingDays INT=7
DECLARE @LowDate DATETIME='01/01/2016'
DECLARE @HighDate DATETIME='12/31/2016'
DECLARE @Calendar TABLE(CalendarDate DATETIME)
DECLARE @LoopDate DATETIME=@LowDate
WHILE(@LoopDate<=@HighDate) BEGIN
INSERT @Calendar SELECT @LoopDate
SET @LoopDate=DATEADD(DAY,1,@LoopDate)
END
SELECT
date=HighDate,
weekly_ids=COUNT(DISTINCT item_id)
FROM
(
SELECT
HighDate=C.CalendarDate,
LowDate=LAG(C.CalendarDate, @TrailingDays,0) OVER (ORDER BY C.CalendarDate)
FROM
@Calendar C
WHERE
CalendarDate BETWEEN @LowDate AND @HighDate
)AS X
LEFT OUTER JOIN @my_table MT ON MT.date BETWEEN LowDate AND HighDate
GROUP BY
LowDate,
HighDate
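The calendar join can also be tried with Python's sqlite3 module. This is only a sketch: it uses the sample rows from the question, a calendar limited to 2016-05-31 through 2016-06-08, and date(day, '-7 days') in place of the LAG lookup, which I assume is equivalent as long as the calendar has no missing days. The table and column names are mine.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (item_id INTEGER, date TEXT)")
conn.execute("CREATE TABLE calendar (day TEXT)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?)",
    [
        (1, "2016-06-08"), (1, "2016-06-07"), (1, "2016-06-05"),
        (1, "2016-06-04"), (1, "2016-05-31"),
        (2, "2016-06-08"), (2, "2016-06-06"), (2, "2016-06-04"),
        (2, "2016-05-31"),
        (3, "2016-05-31"),
    ],
)
# One calendar row per day, so days without any item still appear
start = date(2016, 5, 31)
conn.executemany(
    "INSERT INTO calendar VALUES (?)",
    [((start + timedelta(days=i)).isoformat(),) for i in range(9)],
)

query = """
SELECT c.day,
       COUNT(DISTINCT m.item_id) AS weekly_ids
FROM calendar AS c
LEFT JOIN my_table AS m
       -- trailing window: from 7 days before the calendar day up to that day
       ON m.date BETWEEN date(c.day, '-7 days') AND c.day
GROUP BY c.day
ORDER BY c.day
"""
weekly = conn.execute(query).fetchall()
for row in weekly:
    print(row)
```

Because the window is driven by the calendar rather than by the rows of my_table, gaps in an item's dates no longer stretch the window.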
Try the example below. It may give you a direction to explore.
Purely GBQ - Legacy SQL
SELECT date, items FROM (
SELECT
date, COUNT(DISTINCT item_id) OVER(ORDER BY sec RANGE BETWEEN 60*60*24*2 PRECEDING AND CURRENT ROW) AS items
FROM (
SELECT
item_id, date, timestamp_to_sec(timestamp(date)) AS sec
FROM (
SELECT calendar.day AS date, MyTable.item_id AS item_id
FROM (
SELECT DATE(DATE_ADD(TIMESTAMP('2016-05-28'), pos - 1, "DAY")) AS day
FROM (
SELECT ROW_NUMBER() OVER() AS pos, *
FROM (FLATTEN((
SELECT SPLIT(RPAD('', 1 + DATEDIFF(TIMESTAMP(CURRENT_DATE()), TIMESTAMP('2016-05-28')), '.'),'') AS h
FROM (SELECT NULL)),h
)))
) AS calendar
LEFT JOIN (
SELECT date, item_id
FROM
(SELECT 1 AS item_id, '2016-06-08' AS date),
(SELECT 1 AS item_id, '2016-06-07' AS date),
(SELECT 1 AS item_id, '2016-06-05' AS date),
(SELECT 1 AS item_id, '2016-06-04' AS date),
(SELECT 1 AS item_id, '2016-05-28' AS date),
(SELECT 2 AS item_id, '2016-06-08' AS date),
(SELECT 2 AS item_id, '2016-06-06' AS date),
(SELECT 2 AS item_id, '2016-06-04' AS date),
(SELECT 2 AS item_id, '2016-05-31' AS date),
(SELECT 3 AS item_id, '2016-05-31' AS date),
(SELECT 3 AS item_id, '2016-06-05' AS date)
) AS MyTable
ON calendar.day = MyTable.date
)
)
)
GROUP BY date, items
ORDER BY date
Please note:
the oldest date, 2016-05-28, is hardcoded in the calendar subquery
the window size is controlled by RANGE BETWEEN 60*60*24*2 PRECEDING AND CURRENT ROW; if you need 7 days, the expression should be 60*60*24*6
keep in mind the specifics of COUNT(DISTINCT) in BigQuery Legacy SQL

Select rows where price didn't change

Suppose you have a table like (am using SQL Server 2008, no audit log - table is HUGE):
SecID | Date | Price
1 1/1/11 10
1 1/2/11 10
1 1/3/11 5
1 1/4/11 10
1 1/5/11 10
Suppose this table is HUGE (millions of rows for different secIDs and Date) - I would like to return the records when the price changed (looking for something better than using a cursor and iterating):
Am trying to figure out how to get:
SecID | StartDate | EndDate | Price
1 1/1/11 1/2/11 10
1 1/3/11 1/3/11 5
1 1/4/11 1/5/11 10
i.e. another way to look at it is that I am looking for a range of dates where the price has stayed the same.
This is an "islands" problem.
declare @Yourtable table
(SecID int, Date Date, Price int)
INSERT INTO @Yourtable
SELECT 1,GETDATE()-5,10 union all
SELECT 1,GETDATE()-4,10 union all
SELECT 1,GETDATE()-3,5 union all
SELECT 1,GETDATE()-2,10 union all
SELECT 1,GETDATE()-1, 10
;WITH cte AS
(
SELECT SecID,Date,Price,
ROW_NUMBER() OVER (PARTITION BY SecID ORDER BY Date) -
ROW_NUMBER() OVER (PARTITION BY Price, SecID ORDER BY Date) AS Grp
FROM @Yourtable
)
SELECT SecID,Price, MIN(Date) StartDate, MAX(Date) EndDate
FROM cte
GROUP BY SecID, Grp, Price
ORDER BY SecID, MIN(Date)
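To see why the difference of the two row numbers identifies each run, here is a sketch with Python's sqlite3 module (SQLite 3.25 or later) using the five rows from the question. The table name prices and the ISO date strings are my own assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prices (sec_id INTEGER, date TEXT, price INTEGER)")
conn.executemany(
    "INSERT INTO prices VALUES (?, ?, ?)",
    [
        (1, "2011-01-01", 10),
        (1, "2011-01-02", 10),
        (1, "2011-01-03", 5),
        (1, "2011-01-04", 10),
        (1, "2011-01-05", 10),
    ],
)

query = """
WITH cte AS (
    SELECT sec_id, date, price,
           -- position within the security minus position within the
           -- (price, security) pair: stays constant while the price repeats
           ROW_NUMBER() OVER (PARTITION BY sec_id ORDER BY date)
         - ROW_NUMBER() OVER (PARTITION BY price, sec_id ORDER BY date) AS grp
    FROM prices
)
SELECT sec_id, MIN(date) AS start_date, MAX(date) AS end_date, price
FROM cte
GROUP BY sec_id, grp, price
ORDER BY sec_id, MIN(date)
"""
islands = conn.execute(query).fetchall()
for row in islands:
    print(row)
```

For this data the grp values are 0, 0, 2, 1, 1, so the two runs at price 10 land in different groups even though the price is the same.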
If the value does not change, the std deviation will be zero
select secId
from ...
group by secId
having count(*) = 1
OR stdev(price) = 0
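I think this should work, as long as a price never comes back for the same security after changing (grouping by price alone merges separate runs that share a price):
SELECT SecID, MIN(Date) AS StartDate, MAX(Date) AS EndDate, Price
FROM BigTable
WHERE Date IS NOT NULL
GROUP BY SecID, Price
HAVING MIN(Date) <> MAX(Date)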