SQL Report for grouping by date ranges - sql

I have a table which stores data like this:
ItemID Date Value
01 1/1/15 1
01 2/1/15 2
01 3/1/15 0
01 4/1/15 0
01 5/1/15 3
01 6/1/15 1
How do I generate a report in SQL which would show the begin and end dates of all zero periods per item?
In this example, I would get :
ItemID Start End
01 3/1/15 4/1/15
The condition is that there will be multiple zero periods during the year, and all of them should appear in the report (so simple group by will not do).
Thanks very much!

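This returns the start and end dates of every run of consecutive zero values. It assumes one row per item per month, because the grouping key subtracts the row number, in months, from the date.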
;WITH Cte AS(
SELECT *,
RN = DATEADD(MONTH,- ROW_NUMBER() OVER(PARTITION BY ItemID ORDER BY [Date]), [Date])
FROM Test
WHERE Value = 0
)
SELECT
ItemID,
Start = MIN([Date]),
[End] = MAX([Date])
FROM Cte
GROUP BY
ItemID, RN
Sample Data
ItemID Date Value
------ ---------- -----------
01 2015-01-01 1
01 2015-02-01 2
01 2015-03-01 0
01 2015-04-01 0
01 2015-05-01 3
01 2015-06-01 1
01 2015-07-01 0
01 2015-08-01 0
01 2015-09-01 0
RESULT
ItemID Start End
------ ---------- ----------
01 2015-03-01 2015-04-01
01 2015-07-01 2015-09-01
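To see why grouping on RN works, here is a minimal Python sketch of the same idea. Python is used only for illustration; `zero_periods` and `month_index` are hypothetical names, not part of the answer. Within one item, consecutive monthly dates minus their row number collapse to the same key, so each run of zeros ends up in its own group.

```python
from datetime import date
from itertools import groupby

def month_index(d):
    # Months counted from year 0, so consecutive months differ by exactly 1.
    return d.year * 12 + d.month

def zero_periods(rows):
    """rows: list of (item_id, date, value); returns [(item_id, start, end)]."""
    zeros = sorted((item, d) for item, d, value in rows if value == 0)
    keyed = []
    for item, group in groupby(zeros, key=lambda r: r[0]):
        # The row number restarts per item, like PARTITION BY ItemID.
        for rn, (_, d) in enumerate(group, start=1):
            keyed.append((item, month_index(d) - rn, d))
    periods = []
    for (item, _), group in groupby(keyed, key=lambda r: (r[0], r[1])):
        dates = [d for _, _, d in group]
        periods.append((item, min(dates), max(dates)))
    return periods

# Sample data from the answer: months 1..9 of 2015 for item "01".
rows = [("01", date(2015, m, 1), v)
        for m, v in zip(range(1, 10), [1, 2, 0, 0, 3, 1, 0, 0, 0])]
periods = zero_periods(rows)
```

With the sample data this yields the two periods shown in the result above (March to April and July to September).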

A more general solution (works with SQL Server 2012+, since it uses LAG):
with x as (
select *,
case when lag(value) over(partition by itemid order by date) <> value then 1 else 0 end as l
from #t
),
y as (
select *, sum(l) over(partition by itemid order by date) as grp
from x
where value = 0
)
select itemid, min(date), max(date)
from y
group by itemid, grp
order by itemid, grp
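The flag-and-running-sum idea can be sketched in Python for a single item as follows. This is an illustration under assumptions: `zero_runs` is a hypothetical name, and the sketch accumulates the change flag over all rows before keeping the zero rows, which produces the same runs as the query. Unlike the month arithmetic, it does not depend on evenly spaced dates.

```python
def zero_runs(values):
    """values: list of (date_label, value), already ordered by date.
    Returns [(start_label, end_label)] for each run of zeros."""
    runs = {}
    grp = 0
    previous = None  # LAG() yields NULL on the first row, so no flag there
    for label, value in values:
        # Flag a change from the previous row, as in CTE x.
        if previous is not None and previous != value:
            grp += 1
        previous = value
        if value == 0:  # CTE y keeps only the zero rows
            runs.setdefault(grp, []).append(label)
    # Dicts keep insertion order, so runs come out in date order.
    return [(labels[0], labels[-1]) for labels in runs.values()]

sample = list(zip(
    ["2015-%02d-01" % m for m in range(1, 10)],
    [1, 2, 0, 0, 3, 1, 0, 0, 0],
))
runs = zero_runs(sample)
```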

Related

How can I select records from the last value accumulated

I have the following data in TABLE_A:
RegisteredDate    Quantity
2022-03-01 13:00  100
2022-03-01 13:10  20
2022-03-01 13:20  -80
2022-03-01 13:30  -40
2022-03-02 09:00  10
2022-03-02 22:00  -5
2022-03-03 02:00  -5
2022-03-03 03:00  25
2022-03-03 03:20  -10
If I add a cumulative column:
select RegisteredDate, Quantity
, sum(Quantity) over ( order by RegisteredDate ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as Summary
from TABLE_A
RegisteredDate    Quantity  Summary
2022-03-01 13:00  100       100
2022-03-01 13:10  20        120
2022-03-01 13:20  -80       40
2022-03-01 13:30  -40       0
2022-03-02 09:00  10        10
2022-03-02 22:00  -5        5
2022-03-03 02:00  -5        0
2022-03-03 03:00  25        25
2022-03-03 03:20  -10       15
Is there a way to get the following result with a query?
RegisteredDate    Quantity  Summary
2022-03-03 03:00  25        25
2022-03-03 03:20  -10       15
This result consists of the records after the last zero.
EDIT:
What I really need for this problem is 2022-03-03 03:00, the first date among the records after the last zero.
You can use the SUM window function to compute a grp column that counts, from the latest row backwards, how many zero Summary values have been seen; the rows with grp = 0 are the ones after the last zero.
Query 1:
WITH cte AS
(
SELECT RegisteredDate,
Quantity,
sum(Quantity) over (order by RegisteredDate ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as Summary
FROM TABLE_A
), cte2 AS (
SELECT *,
SUM(CASE WHEN Summary = 0 THEN 1 ELSE 0 END) OVER(order by RegisteredDate desc) grp
FROM cte
)
SELECT RegisteredDate,
Quantity
FROM cte2
WHERE grp = 0
ORDER BY RegisteredDate
Results:
| RegisteredDate | Quantity |
|----------------------|----------|
| 2022-03-03T03:00:00Z | 25 |
| 2022-03-03T03:20:00Z | -10 |
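The backwards count can be sketched in Python like this. It is an illustration only; `rows_after_last_zero` is a hypothetical name, and the rows are assumed to be already ordered by RegisteredDate.

```python
from itertools import accumulate

def rows_after_last_zero(rows):
    """rows: list of (registered_date, quantity), ordered by registered_date."""
    summaries = list(accumulate(q for _, q in rows))
    kept = []
    zeros_seen = 0  # plays the role of grp, counted from the newest row backwards
    for (registered, quantity), summary in zip(reversed(rows), reversed(summaries)):
        if summary == 0:
            zeros_seen += 1
        if zeros_seen == 0:
            kept.append((registered, quantity, summary))
    return kept[::-1]  # restore ascending date order

table_a = [
    ("2022-03-01 13:00", 100), ("2022-03-01 13:10", 20),
    ("2022-03-01 13:20", -80), ("2022-03-01 13:30", -40),
    ("2022-03-02 09:00", 10), ("2022-03-02 22:00", -5),
    ("2022-03-03 02:00", -5), ("2022-03-03 03:00", 25),
    ("2022-03-03 03:20", -10),
]
result = rows_after_last_zero(table_a)
```

If the running total never reaches zero, every row is kept, which matches what the query returns in that case.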
Use a CTE that returns the summary column and NOT EXISTS to filter out the rows that you don't need:
WITH cte AS (SELECT *, SUM(Quantity) OVER (ORDER BY RegisteredDate) Summary FROM TABLE_A)
SELECT c1.*
FROM cte c1
WHERE NOT EXISTS (
SELECT 1
FROM cte c2 WHERE c2.RegisteredDate >= c1.RegisteredDate AND c2.Summary = 0
)
ORDER BY c1.RegisteredDate;
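The ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW clause can be omitted here: with ORDER BY and no explicit frame, the default is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, which gives the same running total as long as RegisteredDate values are unique.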
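One caveat about omitting the frame: a ROWS frame and a RANGE frame only differ when the ORDER BY column has ties. The sketch below emulates both frame types in plain Python; the function names and the four sample rows (with a deliberate tie at 13:10) are made up for illustration and are not from the question.

```python
def running_sum_rows(rows):
    """ROWS frame: each row sees only itself and the rows physically before it."""
    total, out = 0, []
    for _, quantity in rows:
        total += quantity
        out.append(total)
    return out

def running_sum_range(rows):
    """RANGE frame: each row also sees every peer with the same ORDER BY value."""
    return [sum(q for key, q in rows if key <= current_key)
            for current_key, _ in rows]

# Hypothetical rows, ordered by time, with two rows sharing 13:10.
tied = [("13:00", 100), ("13:10", 20), ("13:10", -20), ("13:20", 5)]
rows_frame = running_sum_rows(tied)    # [100, 120, 100, 105]
range_frame = running_sum_range(tied)  # [100, 100, 100, 105]
```

The two results differ only on the tied rows, so on the question's data, where every RegisteredDate is distinct, both forms give the same Summary.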
Try this:
with u as
(select RegisteredDate,
Quantity,
sum(Quantity) over (order by RegisteredDate ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as Summary
from TABLE_A)
select * from u
where RegisteredDate >= all(select RegisteredDate from u where Summary = 0)
and Summary <> 0;
Basically, you want RegisteredDate to be >= every RegisteredDate where Summary = 0, and you want Summary <> 0.
When using window functions, keep in mind that the RegisteredDate column may not be unique in TABLE_A, so ordering only by RegisteredDate is not enough to get a stable result on the same dataset.
With A As (
Select ROW_NUMBER() Over (Order by RegisteredDate, Quantity) As ID, RegisteredDate, Quantity
From TABLE_A),
B As (
Select A.*, SUM(Quantity) Over (Order by ID) As Summary
From A)
Select Top 1 *
From B
Where ID > (Select MAX(ID) From B Where Summary=0)
Order by ID -- without this, Top 1 may return any qualifying row
ID  RegisteredDate    Quantity  Summary
8   2022-03-03 03:00  25        25

SQL - Snowflake - Inner Join not working as expected

I have a table ADS in Snowflake like so (data is inserted each day). Note the duplicate entries on rows 3 and 4:
ID  REPORT_DATE  CLICKS  IMPRESSIONS
1   Jan 01       20      400
1   Jan 02       25      600
1   Jan 03       80      900
1   Jan 03       80      900
2   Jan 01       30      500
2   Jan 02       55      650
2   Jan 03       90      950
I want to select all entries based on ID with the max REPORT_DATE - essentially I want to know the latest number of CLICKS and IMPRESSIONS for each ID:
ID  REPORT_DATE  CLICKS  IMPRESSIONS
1   Jan 03       80      900
2   Jan 03       90      950
This query successfully gives me the max DATE for each ID:
SELECT
MAX(REPORT_DATE),
ID
FROM ADS
GROUP BY
ID;
Result:
ID  MAX(REPORT_DATE)
1   Jan 03
2   Jan 03
However, when I try to conduct an inner join, duplicates arise:
SELECT
a.ID,
a.REPORT_DATE,
a.CLICKS,
a.IMPRESSIONS
FROM ADS a
INNER JOIN (
SELECT
MAX(REPORT_DATE) AS REPORT_DATE,
ID
FROM ADS
GROUP BY
ID
) b
ON a.ID = b.ID
AND a.REPORT_DATE = b.REPORT_DATE;
Result:
ID  REPORT_DATE  CLICKS  IMPRESSIONS
1   Jan 03       80      900
1   Jan 03       80      900
2   Jan 03       90      950
How can I construct my query to remove these duplicates?
You could use QUALIFY and ROW_NUMBER():
SELECT a.ID,a.REPORT_DATE,a.CLICKS,a.IMPRESSIONS
FROM ADS a
QUALIFY ROW_NUMBER() OVER(PARTITION BY ID ORDER BY REPORT_DATE DESC) = 1;
Note that ORDER BY REPORT_DATE alone is not stable when there is a tie. I would suggest adding another sort column so that the ordering tuple is always unique.
If the tied rows are identical, as in this sample, it does not matter which one is kept.
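As an illustration of what "keep row number 1 per ID" does, here is a small Python sketch. The name `latest_per_id` and the year 2023 are assumptions, since the question only shows month and day.

```python
from datetime import date

def latest_per_id(rows):
    """rows: list of dicts with ID, REPORT_DATE, CLICKS, IMPRESSIONS."""
    latest = {}
    for row in rows:
        current = latest.get(row["ID"])
        # Strict '>' keeps the first row seen on a tie, so exact duplicates
        # collapse to a single row, like ROW_NUMBER() = 1.
        if current is None or row["REPORT_DATE"] > current["REPORT_DATE"]:
            latest[row["ID"]] = row
    return list(latest.values())

def ad(id_, day, clicks, impressions):
    # Year 2023 is a placeholder; the source table does not state a year.
    return {"ID": id_, "REPORT_DATE": date(2023, 1, day),
            "CLICKS": clicks, "IMPRESSIONS": impressions}

ads = [ad(1, 1, 20, 400), ad(1, 2, 25, 600), ad(1, 3, 80, 900), ad(1, 3, 80, 900),
       ad(2, 1, 30, 500), ad(2, 2, 55, 650), ad(2, 3, 90, 950)]
latest = latest_per_id(ads)
```

Exactly one row per ID survives, even though ID 1 has two identical rows for Jan 03.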
You can use the row_number() window function:
select id, report_date, clicks, impressions
from (
    select id, report_date, clicks, impressions,
           row_number() over (partition by id order by report_date desc) as rn
    from ADS
) t
where rn = 1

sql date difference with multiple variables

I'm trying to get the number of days between the most recent status 1 and the EFFDATE of the status 0 that follows it.
The code below yields the following results:
SELECT * FROM
(SELECT FILEKEY, STATUS, EFFDATE FROM ASTATUSHIST
UNION
SELECT FILEKEY, ASTATUS, ASTATUSEFFDATE FROM USERS ) A
ORDER BY 1, 3 DESC
130 0 2019-10-25 00:00:00.000
130 0 2017-03-01 00:00:00.000
130 0 2017-01-01 00:00:00.000
130 1 2005-02-01 00:00:00.000
130 0 2001-03-03 00:00:00.000
130 0 2000-01-30 00:00:00.000
130 0 2000-01-01 00:00:00.000
This code combines 2 tables to get the complete history for a given user.
Ideally I could produce something that looks like this:
130 4352
or
125 null
where null marks a filekey without a status 1, or a filekey with a status 1 but without a following status 0.
Thanks
In all supported versions of SQL Server, you can use window functions:
with t as (
<your query here>
)
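-- effdate is the column name the UNION in the question produces
select t.*,
datediff(day, effdate, next_date) as days_diff
from (select t.*,
row_number() over (partition by filekey, status order by effdate desc) as seqnum,
lead(effdate) over (partition by filekey order by effdate) as next_date
from t
) t
-- seqnum = 1 is the latest row per (filekey, status);
-- add "and status = 1" to keep only the most recent status 1 row
where seqnum = 1;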
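For a single filekey, the intended calculation can be sketched in Python as below. This is an illustration; `days_after_last_status_1` is a hypothetical name, and it returns None in the two cases the question describes as null.

```python
from datetime import date

def days_after_last_status_1(history):
    """history: list of (status, effdate) for one filekey, in any order."""
    ordered = sorted(history, key=lambda r: r[1])
    last_one = None
    for i, (status, _) in enumerate(ordered):
        if status == 1:
            last_one = i  # ends up at the most recent status 1
    if last_one is None or last_one + 1 >= len(ordered):
        return None  # no status 1, or no row follows it
    # The next row in date order is what LEAD() would return.
    return (ordered[last_one + 1][1] - ordered[last_one][1]).days

filekey_130 = [
    (0, date(2019, 10, 25)), (0, date(2017, 3, 1)), (0, date(2017, 1, 1)),
    (1, date(2005, 2, 1)), (0, date(2001, 3, 3)), (0, date(2000, 1, 30)),
    (0, date(2000, 1, 1)),
]
days_130 = days_after_last_status_1(filekey_130)
```

For filekey 130 this gives 4352, the figure the question expects.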

MsSql Compare specific datetimes in sequence based on ID

I have a table where we store our data from a call and it looks like this:
CallID Arrive_Seq DateTime ActivitytypeID
1 1 2018-01-01 05:00:00 1
1 2 2018-01-01 05:00:01 2
1 3 2018-01-01 06:00:00 21
1 4 2018-01-01 06:00:01 28
1 5 2018-01-01 06:00:02 13
1 6 2018-01-01 06:00:03 22
1 7 2018-01-01 06:00:05 29
1 8 2018-01-01 06:05:00 21
1 9 2018-01-01 06:05:01 28
1 10 2018-01-01 06:05:02 13
1 11 2018-01-01 06:05:03 22
1 12 2018-01-01 06:07:45 29
Now I want to select the datediff between ActivitytypeID 21 and 29 in Arrive_Seq order. In this example each occurs twice (21 at Arrive_Seq 3 and 8, 29 at Arrive_Seq 7 and 12). The positions are not fixed, and the ActivitytypeIDs can occur more or fewer times in the sequence, but they always come in pairs. Think of ActivitytypeID 21 as 'call started' and ActivitytypeID 29 as 'call ended'.
In the example the answer would be:
SELECT DATEDIFF (SECOND, '2018-01-01 06:00:00', '2018-01-01 06:00:05') = 5 -- Compares datetime of arrive_seq 3 and 7
AND
SELECT DATEDIFF (SECOND, '2018-01-01 06:00:05', '2018-01-01 06:07:45') = 460 -- Compares datetime of arrive_seq 21 and 29
Total duration = 465
I have tried with this code but it doesn't work all the time due to row# can change based on arrive_seq and ActivitytypeID
;WITH CallbackDuration AS (
SELECT ROW_NUMBER() OVER(ORDER BY a.time_stamp ASC) AS RowNumber, DATEDIFF(second, a.time_stamp, b.time_stamp) AS 'Duration'
FROM Table a
JOIN Table b on a.call_id = b.call_id
WHERE a.call_id = 1 AND a.activity_type = 21 AND b.activity_type = 29
GROUP BY a.time_stamp, b.time_stamp,a.call_id)
SELECT SUM(Duration) AS 'Duration' FROM CallbackDuration WHERE RowNumber in (1,5,9)
I think this is what you want:
select
call_start,
call_end,
datediff (second, call_start, call_end) as duration
from
(
select
call_timestamp as call_end,
lag(call_timestamp) over (partition by call_id order by call_timestamp) as call_start,
activity_type as call_end_activity,
lag (activity_type) over (partition by call_id order by call_timestamp) as call_start_activity
from
call_log
where
activity_type in (21, 29)
) x
where
call_start_activity = 21;
Result:
call_start call_end duration
----------------------- ----------------------- -----------
2018-01-01 06:00:00.000 2018-01-01 06:00:05.000 5
2018-01-01 06:05:00.000 2018-01-01 06:07:45.000 165
(2 rows affected)
Note that the duration of the second call is computed from the start time in your sample data, 2018-01-01 06:05:00, which gives 165 seconds rather than 460.
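The pairing that LAG performs here can be sketched in Python as follows. It is an illustration only; `call_durations` is a hypothetical name and the events are assumed to belong to a single call.

```python
from datetime import datetime

def call_durations(events):
    """events: list of (timestamp, activity_type) for one call."""
    # Keep only start (21) and end (29) events, ordered by time.
    relevant = sorted(e for e in events if e[1] in (21, 29))
    durations = []
    # Pair each row with the one before it, which is what LAG() provides.
    for (prev_ts, prev_type), (ts, _) in zip(relevant, relevant[1:]):
        if prev_type == 21:  # same filter as call_start_activity = 21
            durations.append(int((ts - prev_ts).total_seconds()))
    return durations

def t(hms):
    return datetime.strptime("2018-01-01 " + hms, "%Y-%m-%d %H:%M:%S")

events = [
    (t("05:00:00"), 1), (t("05:00:01"), 2), (t("06:00:00"), 21),
    (t("06:00:01"), 28), (t("06:00:02"), 13), (t("06:00:03"), 22),
    (t("06:00:05"), 29), (t("06:05:00"), 21), (t("06:05:01"), 28),
    (t("06:05:02"), 13), (t("06:05:03"), 22), (t("06:07:45"), 29),
]
durations = call_durations(events)
```

On the sample data this produces 5 and 165 seconds, 170 in total.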
This query seems to return your expected result
-- T-SQL variables are prefixed with @
declare @x int = 21
declare @y int = 29
;with cte(CallID, Arrive_Seq, DateTime, ActivitytypeID) as (
select
a, b, cast(c as datetime), d
from (values
(1,1,'2018-01-01 05:00:00',1)
,(1,2,'2018-01-01 05:00:01',2)
,(1,3,'2018-01-01 06:00:00',21)
,(1,4,'2018-01-01 06:00:01',28)
,(1,5,'2018-01-01 06:00:02',13)
,(1,6,'2018-01-01 06:00:03',22)
,(1,7,'2018-01-01 06:00:05',29)
,(1,8,'2018-01-01 06:05:00',21)
,(1,9,'2018-01-01 06:05:01',28)
,(1,10,'2018-01-01 06:05:02',13)
,(1,11,'2018-01-01 06:05:03',22)
,(1,12,'2018-01-01 06:07:45',29)
) t(a,b,c,d)
)
select
sum(ss)
from (
select
-- seconds from each kept row to the next kept row
*, ss = datediff(ss, DateTime, lead(datetime) over (order by Arrive_Seq))
, rn = row_number() over (order by Arrive_Seq)
from
cte
where
ActivitytypeID in (@x, @y)
) t
where
-- odd rows are the start events (21), so only start-to-end gaps are summed
rn % 2 = 1

Count seconds on switch interval SQL Server

I have a table like this:
Value TimeStamp
1 2016-04-01 00:01:09.000
0 2016-04-01 00:01:09.000
0 2016-04-01 00:01:37.000
1 2016-04-01 00:01:37.000
1 2016-04-01 00:04:52.000
1 2016-04-01 00:09:58.000
1 2016-04-01 00:15:05.000
1 2016-04-01 00:20:11.000
1 2016-04-01 00:24:49.000
1 2016-04-01 00:29:55.000
1 2016-04-01 00:31:19.000
0 2016-04-01 00:31:19.000
0 2016-04-01 00:31:46.000
1 2016-04-01 00:31:46.000
1 2016-04-01 00:35:01.000
1 2016-04-01 00:40:07.000
1 2016-04-01 00:44:46.000
1 2016-04-01 00:49:52.000
1 2016-04-01 00:54:58.000
1 2016-04-01 01:00:04.000
1 2016-04-01 01:01:28.000
0 2016-04-01 01:01:28.000
0 2016-04-01 01:05:10.000
0 2016-04-01 01:09:49.000
I want to count the seconds where Value is 1 (switch ON) per day. Here is the catch: when the TimeStamp repeats, it means the switch value changed from 0 to 1 or vice versa. I have already tried several approaches, such as:
Q1 AS (SELECT ROW_NUMBER() OVER (ORDER BY TimeStamp) AS id,
Value, Timestamp
FROM Q2
GROUP BY idVBox, sensorType, sensorSubtype, timeStamp
HAVING COUNT(TimeStamp) > 1)
Then:
SELECT A.Value, DATEDIFF(SECOND,A.TimeStamp,B.TimeStamp)
FROM Q1 AS A
INNER JOIN Q1 AS B
ON B.ID = A.ID + 1
AND B.ID%2 = 0
Then I group by and sum, but the problem is that I don't know whether the value coming in from the previous day is 1 or 0, and the switch can change its state so quickly that a reading of its actual state is never recorded. Any other ideas?
What you want to do is add a dummy sensor state switch to your set at the beginning of the day before you start your calculation.
The extra records added are:
0, '2016-04-01 00:00:00'
1, '2016-04-01 00:00:00' -- This is conditional on the first record in your set having a value of 1
The overall query is below
Note: in order to determine what record is actually the first in sequence I used "ID" column.
;WITH Q0 AS(
-- Inserts a new record ( 0, '2016-04-01 00:00:00' ) to the beginning of the day
SELECT TOP 1 0 AS Value, CONVERT( DATETIME, CONVERT( DATE, LogDate )) AS LogDate
FROM #SwitchLog
UNION ALL
-- Inserts a new record ( 1, '2016-04-01 00:00:00' ) to the beginning of the day when the first record has Value = 1
SELECT Value, CONVERT( DATETIME, CONVERT( DATE, LogDate )) AS LogDate
FROM
( SELECT TOP 1 ID, Value, LogDate
FROM #SwitchLog
ORDER BY LogDate ASC, ID ASC ) AS DummyRecord --<-- NOTE: the use of a table ID column
WHERE Value = 1
UNION ALL
SELECT Value, LogDate
FROM #SwitchLog
)
,
Q1 AS (SELECT ROW_NUMBER() OVER (ORDER BY LogDate) AS id,
SUM( Value ) AS Value, LogDate
FROM Q0
GROUP BY LogDate
HAVING COUNT(LogDate) > 1)
SELECT A.Value, DATEDIFF(SECOND,A.LogDate,B.LogDate) AS Total
FROM Q1 AS A
INNER JOIN Q1 AS B
ON B.ID = A.ID + 1 AND B.ID%2 = 0
Output:
Value Total
----------- -----------
1 69
1 1782
1 1782
The same approach should be used to insert dummy record(s) at the end of the period/day ((day + 1) 00:00:00) to cater for situations where the sensor value is 1 at the end of the day.
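The effect of the start-of-day dummy records can be sketched in Python as below. This is an illustration under assumptions: `seconds_on` is a hypothetical name, the readings are an abbreviated subset of the question's rows (most repeated 1 readings are dropped), and rows sharing a timestamp are taken in list order, which is the role the ID column plays in the query. End-of-day handling is left out.

```python
from datetime import datetime

def seconds_on(readings, day_start):
    """readings: list of (value, timestamp) in log order; ties keep list order."""
    total = 0
    # Assumption mirrored from the dummy records: the state at the start of
    # the day equals the value of the first reading.
    state, since = readings[0][0], day_start
    for value, ts in readings:
        if state == 1:
            total += int((ts - since).total_seconds())
        state, since = value, ts
    return total

def t(hms):
    return datetime.strptime("2016-04-01 " + hms, "%Y-%m-%d %H:%M:%S")

readings = [
    (1, t("00:01:09")), (0, t("00:01:09")), (0, t("00:01:37")), (1, t("00:01:37")),
    (1, t("00:04:52")), (1, t("00:31:19")), (0, t("00:31:19")), (0, t("00:31:46")),
    (1, t("00:31:46")), (1, t("01:01:28")), (0, t("01:01:28")), (0, t("01:05:10")),
]
total_on = seconds_on(readings, t("00:00:00"))
```

The total is 69 + 1782 + 1782 = 3633 seconds, the sum of the three intervals in the output above.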
If you are on SQL Server 2012 or later, you can make good use of the LAG() function.
First, join the table to itself on duplicate timestamps where Value = 1. Next, calculate the difference between each "on" and the previous "on". Finally, sum it up.
NOTE: LAG() returns NULL for the first "on" of the day.
SELECT
Seconds=SUM(X.Seconds)
FROM
(
SELECT
Seconds=DATEDIFF(SECOND,LAG(T1.TimeStamp) OVER (ORDER BY T1.TimeStamp),T1.TimeStamp)
FROM
MyTable T1
INNER JOIN MyTable T2 ON T2.TimeStamp=T1.TimeStamp AND T1.Value<>T2.Value
WHERE
T1.Value=1
)AS X