Get value from previous hour and subtract from current value in SQL

I need to subtract the level value at the previous hour's max(date) from the current level value, but I'm confused about how to get the last record of each hour and subtract hour-wise.
My table records are like this:
SNO | Date | ID | Level
1 | 2021-01-13 00:07:44.190 | 1021 | 56.29
2 | 2021-01-13 00:33:44.190 | 1022 | 84.29
3 | 2021-01-13 00:35:44.190 | 1021 | 54.29
4 | 2021-01-13 00:43:44.190 | 1021 | 53.29
5 | 2021-01-13 00:47:44.190 | 1022 | 82.29
6 | 2021-01-13 01:07:44.190 | 1021 | 52.93
7 | 2021-01-13 01:33:44.190 | 1022 | 82.29
8 | 2021-01-13 01:43:44.190 | 1021 | 47.29
9 | 2021-01-13 01:47:44.190 | 1022 | 79.29
10 | 2021-01-13 02:07:44.190 | 1021 | 44.29
11 | 2021-01-13 02:33:44.190 | 1022 | 77.29
What I need is the row with max(date) in each hour for each ID, with results like this:
SNO | Date | ID | Level | Level_2
3 | 2021-01-13 00:43:44.190 | 1021 | 53.29 | <-- Level from previous last hour or 0 -->
4 | 2021-01-13 00:47:44.190 | 1022 | 82.29 | <-- Level from previous last hour or 0 -->
7 | 2021-01-13 01:43:44.190 | 1021 | 47.29 | 54.29
8 | 2021-01-13 01:47:44.190 | 1022 | 79.29 | 82.29
9 | 2021-01-13 02:07:44.190 | 1021 | 44.29 | 47.29
10 | 2021-01-13 02:33:44.190 | 1022 | 77.29 | 79.29
Please share possible solutions for this condition, and ask if you need more information.

To get the results you want, you can filter down to the last row in each hour and then use lag():
select t.*,
       lag(level) over (partition by id order by date) as prev_level
from (select t.*,
             row_number() over (partition by id, convert(date, date), datepart(hour, date)
                                order by date desc
                               ) as seqnum
      from t
     ) t
where seqnum = 1;
Note: lag() returns the level from the previous hour that has data, so this assumes that you have data for every hour.
Another way to pick the last row in each hour is to use lead() to look at the next date and check whether it falls in a later hour:
select t.*
from (select t.*,
lead(date) over (partition by id order by date) as next_date
from t
) t
where next_date is null or
datediff(hour, date, next_date) > 0;
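As a rough, runnable illustration of the first approach (last row per hour, then lag()), here is a sketch using Python's built-in sqlite3 module with the sample rows from the question. SQLite is only a stand-in here: it has no convert() or datepart(), so strftime('%Y-%m-%d %H', date) is assumed as the hour bucket, and the table name readings is made up for the example. It needs a SQLite build with window functions (3.25 or later).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table name; columns mirror the question's sample data.
conn.execute("CREATE TABLE readings (sno INTEGER, date TEXT, id INTEGER, level REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?, ?)",
    [
        (1, "2021-01-13 00:07:44.190", 1021, 56.29),
        (2, "2021-01-13 00:33:44.190", 1022, 84.29),
        (3, "2021-01-13 00:35:44.190", 1021, 54.29),
        (4, "2021-01-13 00:43:44.190", 1021, 53.29),
        (5, "2021-01-13 00:47:44.190", 1022, 82.29),
        (6, "2021-01-13 01:07:44.190", 1021, 52.93),
        (7, "2021-01-13 01:33:44.190", 1022, 82.29),
        (8, "2021-01-13 01:43:44.190", 1021, 47.29),
        (9, "2021-01-13 01:47:44.190", 1022, 79.29),
        (10, "2021-01-13 02:07:44.190", 1021, 44.29),
        (11, "2021-01-13 02:33:44.190", 1022, 77.29),
    ],
)

# Inner query: number the rows of each (id, hour) bucket from newest to oldest.
# Outer query: keep only the newest row per bucket, then LAG() fetches the
# level of the previous kept row for the same id (NULL for the first hour).
query = """
SELECT sno, date, id, level,
       LAG(level) OVER (PARTITION BY id ORDER BY date) AS prev_level
FROM (
    SELECT r.*,
           ROW_NUMBER() OVER (
               PARTITION BY id, strftime('%Y-%m-%d %H', date)
               ORDER BY date DESC
           ) AS seqnum
    FROM readings AS r
) AS last_per_hour
WHERE seqnum = 1
ORDER BY date
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The difference asked for in the title would then be level - prev_level in the outer select, with coalesce(prev_level, 0) if a missing previous hour should count as 0.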

Related

Retrieve SQL records where only the last unique entries match criteria in postgresql

I've got a long table that tracks a numerical 'state' value (0=new, 1=setup mode, 2=retired, 3=active, 4=inactive) of a collection of 'devices' historically. These devices may be activated/deactivated throughout the year, so the table is continuous collection of state changes - mostly state 3 and 4, ordered by id, with a timestamp on the end, for example:
id | device_id | new_state | when
----------+-----------+-----------+----------------------------
218010581 | 2505 | 0 | 2022-06-06 16:28:11.174084
218010580 | 2505 | 1 | 2022-06-06 16:28:11.174084
218010634 | 2505 | 3 | 2022-06-06 16:29:25.129019
218087737 | 659 | 3 | 2022-06-07 22:55:48.705208
218087744 | 1392 | 3 | 2022-06-07 22:55:59.016974
218087757 | 1556 | 3 | 2022-06-07 22:56:09.811876
218087758 | 2071 | 1 | 2022-06-07 22:56:20.850095
218087765 | 2071 | 3 | 2022-06-07 22:56:29.122074
When I want to look for a list of devices and see their 'history', I know I can use something like:
select *
from devstatechange
where device_id = 2345
order by "when";
id | device_id | new_state | when
-----------+-----------+-----------+----------------------------
184682659 | 2345 | 0 | 2021-05-27 17:03:36.894429
184682658 | 2345 | 1 | 2021-05-27 17:03:36.894429
184684721 | 2345 | 3 | 2021-05-27 17:31:01.968314
194933399 | 2345 | 4 | 2021-08-31 23:30:05.555407
195213746 | 2345 | 3 | 2021-09-03 16:53:39.043005
206278232 | 2345 | 4 | 2021-12-31 22:30:08.820068
206515355 | 2345 | 3 | 2022-01-03 16:06:01.223759
215709888 | 2345 | 4 | 2022-04-30 23:30:30.309389
215846807 | 2345 | 3 | 2022-05-02 19:40:31.525514
select *
from devstatechange
where device_id = 2351
order by "when";
id | device_id | new_state | when
-----------+-----------+-----------+----------------------------
186091252 | 2351 | 0 | 2021-06-09 15:36:02.775035
186091253 | 2351 | 1 | 2021-06-09 15:36:02.775035
186091349 | 2351 | 3 | 2021-06-09 15:37:56.965599
197880878 | 2351 | 4 | 2021-09-30 23:30:06.691835
197945073 | 2351 | 3 | 2021-10-01 15:32:35.907913
208981857 | 2351 | 4 | 2022-01-31 22:30:09.521694
209722639 | 2351 | 3 | 2022-02-09 15:20:12.412816
217666572 | 2351 | 4 | 2022-05-31 23:30:30.881928
What I am really looking for is a query that returns a unique list of devices where the latest dated entry for each device only contains a state of '4' ('inactive state'), and not include records that do not match.
So, using the above data samples, even though both devices 2345 and 2351 have states of 3 and 4 throughout their history, only device 2351 has its last dated entry with a state of 4, meaning it is currently in an 'inactive' state. Device 2345 would not appear in the result set since its last dated entry has a state of 3: it's still active.
Stabbing in the dark, I've tried variants of:
SELECT DISTINCT *
FROM devstatechange
WHERE MAX("when") AND new_state = 4
ORDER BY "when";
SELECT DISTINCT device_id, new_state, MAX("when")
FROM devstatechange
WHERE new_state = 4
ORDER BY "when";
with obviously no success.
I'm thinking I might need to 'group' the entries together, but I don't know how to specify 'return last entry only if new_state = 4' in SQL, or rather PostgreSQL.
Any tidbits or pokes in the right direction would be appreciated.
SELECT * FROM (
SELECT DISTINCT ON (device_id)
*
FROM devstatechange
ORDER BY device_id, "when" DESC
) AS latest
WHERE new_state = 4;
The DISTINCT ON keyword together with the ORDER BY will pull the newest row for each device. The outer query then filters these by your condition.
You can use the row_number() function, partitioned by device_id and ordered by "when" descending.
Try the following CTE:
with cte as
(
select id, device_id, new_state, "when",
row_number() over (partition by device_id order by "when" desc) as rn
from devstatechange
)
select * from cte where rn = 1 and new_state = 4;
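To see the row_number() approach run end to end, here is a small sketch using Python's built-in sqlite3 module and a subset of the question's rows for devices 2345 and 2351. SQLite is an assumption made for portability (it has no DISTINCT ON, but it does support row_number() from version 3.25); the query itself has the same shape as the CTE above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "when" is a keyword, so it is quoted as in the question.
conn.execute(
    'CREATE TABLE devstatechange '
    '(id INTEGER, device_id INTEGER, new_state INTEGER, "when" TEXT)'
)
conn.executemany(
    "INSERT INTO devstatechange VALUES (?, ?, ?, ?)",
    [
        (184684721, 2345, 3, "2021-05-27 17:31:01.968314"),
        (215709888, 2345, 4, "2022-04-30 23:30:30.309389"),
        (215846807, 2345, 3, "2022-05-02 19:40:31.525514"),
        (186091349, 2351, 3, "2021-06-09 15:37:56.965599"),
        (209722639, 2351, 3, "2022-02-09 15:20:12.412816"),
        (217666572, 2351, 4, "2022-05-31 23:30:30.881928"),
    ],
)

# rn = 1 marks the newest row of each device; the outer filter keeps a
# device only if that newest row has state 4 (inactive).
query = """
WITH ranked AS (
    SELECT id, device_id, new_state, "when",
           ROW_NUMBER() OVER (
               PARTITION BY device_id ORDER BY "when" DESC
           ) AS rn
    FROM devstatechange
)
SELECT device_id, "when"
FROM ranked
WHERE rn = 1 AND new_state = 4
"""
inactive = conn.execute(query).fetchall()
print(inactive)
```

Filtering on new_state = 4 before ranking would instead return the latest inactive entry of every device, including ones that were reactivated later, which is why the state check sits outside the ranking step.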
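The problem with:
SELECT DISTINCT * FROM devstatechange WHERE MAX("when") AND new_state=4 ORDER BY "when";
is that MAX("when") is not tied to a single device: it would refer to all the entries in the table, and PostgreSQL does not allow aggregate functions in a WHERE clause anyway.
You should compare against a correlated subquery instead, with the outer table aliased as dev1:
"when" = (select max("when") from devstatechange dev2 where dev2.device_id = dev1.device_id)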
You can use a CTE to obtain the last state of each device and then select only those whose last state is 4, like this:
WITH device_last_state AS (
SELECT DISTINCT ON (device_id)
id,
device_id,
first_value(new_state) over (partition by device_id order by "when" desc) as new_state,
"when"
FROM devstatechange
ORDER BY device_id, "when" DESC
)
SELECT * FROM device_last_state
WHERE new_state = 4;

SQL Server - Counting total number of days user had active contracts

I want to count the number of days on which a user had an active contract, based on a table with start and end dates for each service contract. I want to count any day with activity once, no matter whether the customer had 1 or 5 contracts active at the same time.
+---------+-------------+------------+------------+
| USER_ID | CONTRACT_ID | START_DATE | END_DATE |
+---------+-------------+------------+------------+
| 1 | 14 | 18.02.2021 | 18.04.2022 |
| 1 | 13 | 02.01.2019 | 02.01.2020 |
| 1 | 12 | 01.01.2018 | 01.01.2019 |
| 1 | 11 | 13.02.2017 | 13.02.2019 |
| 2 | 23 | 19.06.2021 | 18.04.2022 |
| 2 | 22 | 01.07.2019 | 01.07.2020 |
| 2 | 21 | 19.01.2019 | 19.01.2020 |
+---------+-------------+------------+------------+
As a result I want this table:
+---------+--------------------+
| USER_ID | DAYS_BEEING_ACTIVE |
+---------+--------------------+
| 1 | 1477 |
| 2 | 832 |
+---------+--------------------+
Where
1477 stands for 1053 (days from 13.02.2017 to 02.01.2020, when the user had active contracts) + 424 (days from 18.02.2021 to 18.04.2022)
832 stands for 529 (days from 19.01.2019 to 01.07.2020) + 303 (days from 19.06.2021 to 18.04.2022).
I tried some queries with joins, datediff's, case when conditions but nothing worked. I'll be grateful for any help.
If you don't have a Tally/Numbers table (highly recommended), you can use an ad-hoc tally/numbers table:
Select User_ID
,Days = count(DISTINCT dateadd(DAY,N,Start_Date))
from YourTable A
Join ( Select Top 10000 N=Row_Number() Over (Order By (Select NULL))
From master..spt_values n1, master..spt_values n2
) B
On N<=DateDiff(DAY,Start_Date,End_Date)
Group By User_ID
Results
User_ID Days
1 1477
2 832
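The same counting idea can be tried without SQL Server. The sketch below uses Python's built-in sqlite3 module, with a recursive CTE standing in for the ad-hoc tally built from master..spt_values. The table name contracts and the ISO date format are assumptions for the example (the question's dates are dd.mm.yyyy). As in the query above, N starts at 1, so each contract contributes the days after its start date up to and including its end date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; dates rewritten from dd.mm.yyyy to ISO yyyy-mm-dd.
conn.execute(
    "CREATE TABLE contracts "
    "(user_id INTEGER, contract_id INTEGER, start_date TEXT, end_date TEXT)"
)
conn.executemany(
    "INSERT INTO contracts VALUES (?, ?, ?, ?)",
    [
        (1, 14, "2021-02-18", "2022-04-18"),
        (1, 13, "2019-01-02", "2020-01-02"),
        (1, 12, "2018-01-01", "2019-01-01"),
        (1, 11, "2017-02-13", "2019-02-13"),
        (2, 23, "2021-06-19", "2022-04-18"),
        (2, 22, "2019-07-01", "2020-07-01"),
        (2, 21, "2019-01-19", "2020-01-19"),
    ],
)

# tally yields 1..10000. Each contract is expanded into one row per covered
# day, and COUNT(DISTINCT ...) collapses days covered by several contracts.
query = """
WITH RECURSIVE tally(n) AS (
    SELECT 1
    UNION ALL
    SELECT n + 1 FROM tally WHERE n < 10000
)
SELECT c.user_id,
       COUNT(DISTINCT date(c.start_date, '+' || t.n || ' days')) AS days
FROM contracts AS c
JOIN tally AS t
  ON t.n <= julianday(c.end_date) - julianday(c.start_date)
GROUP BY c.user_id
ORDER BY c.user_id
"""
result = conn.execute(query).fetchall()
print(result)
```

The tally size caps the longest contract that can be expanded (10000 days here), so it would need raising for longer spans.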

How to use variable lag window functions?

I have a table with the following schema:
CREATE TABLE example (
userID,
status, --'SUCCESS' or 'FAIL'
date -- self explanatory
);
INSERT INTO example
Values(123, 'SUCCESS', 20211010),
(123, 'SUCCESS', 20211011),
(123, 'SUCCESS', 20211028),
(123, 'FAIL', 20211029),
(123, 'SUCCESS', 20211105),
(123, 'SUCCESS', 20211110)
I am trying to use a lag or lead function to assess whether the current line is within a 2-week window of the previous 'SUCCESS'. Given the current data, I would expect the isWithin2WeeksofSuccessFlag to be as follows:
123, 'SUCCESS', 20211010,0 --since it is the first instance
123, 'SUCCESS', 20211011,1
123, 'SUCCESS', 20211028,1
123, 'FAIL', 20211029, 1 --failed, but criteria is that it is within 2 weeks of last success, which it is
123, 'SUCCESS', 20211105, 1 --last success is 2 rows back, but it is within 2 weeks
123, 'SUCCESS', 20211128, 0 --outside of 2 weeks
I would initially think to do something like this:
Select userID, status, date,
case when lag(status,1) over (partition by userid order by date asc) = 'SUCCESS'
and date_add('day',-14, date) <= lag(date,1) over (partition by userid order by date asc)
then 1 end as isWithin2WeeksofSuccessFlag
from example
This would work if I didn't have the 'FAIL' line in there. To handle it, I could modify the lag to 2 (instead of 1), but what about if I have 2,3,4,n 'FAIL's in a row? I would need to lag by 3,4,5,n+1. The specific number of FAILs in between is variable. How do I handle this variability?
NOTE I am querying billions of rows. Efficiency isn't really a concern (since it is for analysis), but running into memory allocation errors is. Thus, endlessly adding more window functions would likely cause an automatic termination of the query due to the memory requirement being above the node limit.
How should I handle this?
Here's an approach, also using window functions, with each "common table expression" handling one step at a time.
Note: The expected result in the question does not match the data in the question. '20211128' doesn't exist in the actual data. I used the example INSERT statement.
In the test case, I changed the column name to xdate to avoid any potential SQL reserved word issues.
The SQL:
WITH cte1 AS (
SELECT *
, SUM(CASE WHEN status = 'SUCCESS' THEN 1 ELSE 0 END) OVER (PARTITION BY userID ORDER BY xdate) AS grp
FROM example
)
, cte2 AS (
SELECT *
, MAX(CASE WHEN status = 'SUCCESS' THEN xdate END) OVER (PARTITION BY userID, grp) AS lastdate
FROM cte1
)
, cte3 AS (
SELECT *
, CASE WHEN LAG(lastdate) OVER (PARTITION BY userID ORDER BY xdate) > (xdate - INTERVAL '2' WEEK) THEN 1 ELSE 0 END AS isNear
FROM cte2
)
SELECT * FROM cte3
ORDER BY userID, xdate
;
The result:
+--------+---------+------------+------+------------+--------+
| userID | status | xdate | grp | lastdate | isNear |
+--------+---------+------------+------+------------+--------+
| 123 | SUCCESS | 2021-10-10 | 1 | 2021-10-10 | 0 |
| 123 | SUCCESS | 2021-10-11 | 2 | 2021-10-11 | 1 |
| 123 | SUCCESS | 2021-10-28 | 3 | 2021-10-28 | 0 |
| 123 | FAIL | 2021-10-29 | 3 | 2021-10-28 | 1 |
| 123 | SUCCESS | 2021-11-05 | 4 | 2021-11-05 | 1 |
| 123 | SUCCESS | 2021-11-10 | 5 | 2021-11-10 | 1 |
+--------+---------+------------+------+------------+--------+
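For anyone who wants to step through the three CTEs locally, here is a sketch of the same query run through Python's built-in sqlite3 module on the INSERT data from the question. SQLite is assumed only as a convenient test bed: dates are stored as ISO text, and date(xdate, '-14 days') replaces the INTERVAL '2' WEEK arithmetic, which SQLite does not have.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE example (userID INTEGER, status TEXT, xdate TEXT)")
conn.executemany(
    "INSERT INTO example VALUES (?, ?, ?)",
    [
        (123, "SUCCESS", "2021-10-10"),
        (123, "SUCCESS", "2021-10-11"),
        (123, "SUCCESS", "2021-10-28"),
        (123, "FAIL", "2021-10-29"),
        (123, "SUCCESS", "2021-11-05"),
        (123, "SUCCESS", "2021-11-10"),
    ],
)

# cte1: a running count of SUCCESS rows puts every FAIL into the group of
#       the SUCCESS that precedes it.
# cte2: lastdate is the SUCCESS date of the row's group.
# cte3: compare the previous row's lastdate with the current date minus
#       14 days; the first row has no previous row and falls to ELSE 0.
query = """
WITH cte1 AS (
    SELECT *,
           SUM(CASE WHEN status = 'SUCCESS' THEN 1 ELSE 0 END)
               OVER (PARTITION BY userID ORDER BY xdate) AS grp
    FROM example
),
cte2 AS (
    SELECT *,
           MAX(CASE WHEN status = 'SUCCESS' THEN xdate END)
               OVER (PARTITION BY userID, grp) AS lastdate
    FROM cte1
),
cte3 AS (
    SELECT *,
           CASE WHEN LAG(lastdate) OVER (PARTITION BY userID ORDER BY xdate)
                     > date(xdate, '-14 days')
                THEN 1 ELSE 0 END AS isNear
    FROM cte2
)
SELECT xdate, status, grp, lastdate, isNear
FROM cte3
ORDER BY userID, xdate
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Because only a fixed set of window functions is involved regardless of how many FAIL rows occur in a row, the query does not grow with the length of a FAIL streak.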
and with the data adjusted to match your expected result, plus a new user introduced, the result is this:
+--------+---------+------------+------+------------+--------+
| userID | status | xdate | grp | lastdate | isNear |
+--------+---------+------------+------+------------+--------+
| 123 | SUCCESS | 2021-10-10 | 1 | 2021-10-10 | 0 |
| 123 | SUCCESS | 2021-10-11 | 2 | 2021-10-11 | 1 |
| 123 | SUCCESS | 2021-10-28 | 3 | 2021-10-28 | 0 |
| 123 | FAIL | 2021-10-29 | 3 | 2021-10-28 | 1 |
| 123 | SUCCESS | 2021-11-05 | 4 | 2021-11-05 | 1 |
| 123 | SUCCESS | 2021-11-28 | 5 | 2021-11-28 | 0 |
| 323 | SUCCESS | 2021-10-10 | 1 | 2021-10-10 | 0 |
| 323 | SUCCESS | 2021-10-11 | 2 | 2021-10-11 | 1 |
| 323 | SUCCESS | 2021-10-28 | 3 | 2021-10-28 | 0 |
| 323 | FAIL | 2021-10-29 | 3 | 2021-10-28 | 1 |
| 323 | SUCCESS | 2021-11-05 | 4 | 2021-11-05 | 1 |
| 323 | SUCCESS | 2021-11-28 | 5 | 2021-11-28 | 0 |
+--------+---------+------------+------+------------+--------+
Here's an extra test case, which might expose problems in some solutions:
INSERT INTO example VALUES
(123, 'SUCCESS', '2021-10-11')
, (123, 'FAIL' , '2021-10-12')
, (123, 'FAIL' , '2021-10-13')
;
The result:
+--------+---------+------------+------+------------+--------+
| userID | status | xdate | grp | lastdate | isNear |
+--------+---------+------------+------+------------+--------+
| 123 | SUCCESS | 2021-10-11 | 1 | 2021-10-11 | 0 |
| 123 | FAIL | 2021-10-12 | 1 | 2021-10-11 | 1 |
| 123 | FAIL | 2021-10-13 | 1 | 2021-10-11 | 1 |
+--------+---------+------------+------+------------+--------+
If your DBMS doesn't support window function filters you can order by status desc so 'SUCCESS' goes before 'FAIL'.
select userID, status, date,
case when lag(status,1) over (partition by userid order by status desc , date asc) = 'SUCCESS'
and dateadd(d, -14, date) <= lag(date,1) over (partition by userid order by status desc , date asc)
then 1 end as isWithin2WeeksofSuccessFlag
from example
order by date

How to concat two fields and use the result in WHERE clause?

I have to get all oldest records based on the date-time information.
Data
Id | External Id | Date | Time
1 | 1000 | 2020-08-18 00:00:00 | 02:30:22
2 | 1000 | 2020-08-12 00:00:00 | 12:45:51
3 | 1556 | 2020-08-17 00:00:00 | 10:09:01
4 | 1919 | 2020-08-14 00:00:00 | 18:19:18
5 | 1919 | 2020-08-14 00:00:00 | 11:45:21
6 | 1919 | 2020-08-14 00:00:00 | 15:54:15
Expected result
Id | External Id | Date | Time
2 | 1000 | 2020-08-12 00:00:00 | 12:45:51
3 | 1556 | 2020-08-17 00:00:00 | 10:09:01
5 | 1919 | 2020-08-14 00:00:00 | 11:45:21
I'm currently doing this
SELECT *
FROM RUN AS T1
WHERE CONCAT(T1.DATE, T1.TIME) = (
SELECT MIN(CONCAT(T2.DATE, T2.TIME))
FROM RUN AS T2
WHERE T2.EXTERNAL_ID = T1.EXTERNAL_ID
)
Is this a correct way to do it?
Thank you, regards
Update 1 : Data type
DATE column is datetime
TIME column is varchar
You can use a window function such as DENSE_RANK()
SELECT ID, External_ID, Date, Time
FROM
(
SELECT DENSE_RANK() OVER (PARTITION BY External_ID ORDER BY Date, Time) AS dr,
r.*
FROM run r
) AS q
WHERE dr = 1

Where clause changing the results of my datediff column, How can I work around this?

I'm trying to obtain the time elapsed while st1=5. Here is what I currently have, which gives me the datediff time for each state change. My issue is that when I add a where st1=5 clause, the datediff shows the difference in time between instances where the state = 5 instead of the time elapsed while the state is 5.
select timestamp,st1,st2,st3,st4,
datediff(second, timestamp, lead(timestamp)
over (order by timestamp)) as timediff
from A6K_status
Order By Timestamp DESC
+-----+-----+-----+-----+---------------------+----------+
| st1 | st2 | st3 | st4 | TimeStamp | TimeDiff |
+-----+-----+-----+-----+---------------------+----------+
| 3 | 3 | 3 | 3 | 2018-07-23 07:51:06 | |
+-----+-----+-----+-----+---------------------+----------+
| 5 | 5 | 5 | 5 | 2018-07-23 07:50:00 | 66 |
+-----+-----+-----+-----+---------------------+----------+
| 0 | 0 | 10 | 10 | 2018-07-23 07:47:19 | 161 |
+-----+-----+-----+-----+---------------------+----------+
| 5 | 5 | 5 | 5 | 2018-07-23 07:39:07 | 492 |
+-----+-----+-----+-----+---------------------+----------+
| 3 | 3 | 10 | 10 | 2018-07-23 07:37:48 | 79 |
+-----+-----+-----+-----+---------------------+----------+
| 3 | 3 | 10 | 10 | 2018-07-23 07:37:16 | 32 |
+-----+-----+-----+-----+---------------------+----------+
I am trying to sum the time that the state of station1 is 5. From the table above (what I have right now), if I could just sum timediff where st1=5, that would work perfectly. But adding "where st1=5" to my query gives me the time difference between instances where the state = 5.
Any help would be much appreciated. I feel very close to the result I would like to achieve. Thank you.
Edit
This is what I would like to achieve
+-----+------------+----------+
| st1 | TimeStamp | TimeDiff |
+-----+------------+----------+
| 5 | 2018-07-23 | 558 |
+-----+------------+----------+
You would use a subquery (or CTE):
select sum(timediff)
from (select timestamp, st1, st2, st3, st4,
datediff(second, timestamp, lead(timestamp) over (order by timestamp)) as timediff
from A6K_status
) s
where st1 = 5;
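The reason the subquery matters is ordering of evaluation: lead() only sees rows that survive the WHERE of its own query level. The sketch below, using Python's built-in sqlite3 module with the rows from the question, runs both variants side by side. SQLite is an assumption here, so datediff(second, ...) is replaced by a difference of strftime('%s', ...) epoch seconds, and the column is named ts instead of timestamp.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Only st1 and the timestamp are needed for the illustration.
conn.execute("CREATE TABLE a6k_status (st1 INTEGER, ts TEXT)")
conn.executemany(
    "INSERT INTO a6k_status VALUES (?, ?)",
    [
        (3, "2018-07-23 07:37:16"),
        (3, "2018-07-23 07:37:48"),
        (5, "2018-07-23 07:39:07"),
        (0, "2018-07-23 07:47:19"),
        (5, "2018-07-23 07:50:00"),
        (3, "2018-07-23 07:51:06"),
    ],
)

# Seconds from each row to the next row in timestamp order.
timediff = """
CAST(strftime('%s', LEAD(ts) OVER (ORDER BY ts)) AS INTEGER)
  - CAST(strftime('%s', ts) AS INTEGER)
"""

# Filter applied outside: LEAD() sees every row, so each st1 = 5 row gets
# the time until the next state change.
filtered_outside = conn.execute(
    f"SELECT SUM(timediff) FROM (SELECT st1, {timediff} AS timediff "
    "FROM a6k_status) AS s WHERE st1 = 5"
).fetchone()[0]

# Filter applied inside: LEAD() only sees st1 = 5 rows, so it measures the
# gap between consecutive st1 = 5 rows instead.
filtered_inside = conn.execute(
    f"SELECT SUM(timediff) FROM (SELECT st1, {timediff} AS timediff "
    "FROM a6k_status WHERE st1 = 5) AS s"
).fetchone()[0]

print(filtered_outside, filtered_inside)
```

With the sample rows, the first variant gives the 558 seconds the question asks for (492 + 66), while the second gives 653, the gap between the two st1 = 5 rows.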
Assuming SQL Server, try something like this:
WITH SourceTable AS (
select timestamp,st1,st2,st3,st4,
datediff(second, timestamp, lead(timestamp)
over (order by timestamp)) as timediff
from A6K_status
)
SELECT SUM(timediff) as totaltimediff
FROM SourceTable
WHERE st1 = 5