LAG with condition - sql

I want to get a value from the previous row that matches a certain condition.
For example, for each row I want the timestamp of the most recent row where event = 1.
I feel this can be done without joins, using LAG and PARTITION BY with CASE, but I have not been able to crack it.
Please help.

Here is one approach using analytic functions:
WITH cte AS (
SELECT *, COUNT(CASE WHEN event = 1 THEN 1 END) OVER
(PARTITION BY customer_id ORDER BY ts) cnt
FROM yourTable
)
SELECT ts, customer_id, event,
MAX(CASE WHEN event = 1 THEN ts END) OVER
(PARTITION BY customer_id, cnt) AS desired_result
FROM cte
ORDER BY customer_id, ts;
We can restate your problem by saying that you want the desired_result column to contain the most recent timestamp at which the event was 1. The count (cnt) in the CTE above assigns each record to a pseudo group that starts every time the event is 1. Then we simply do a conditional aggregation over customer and pseudo group to find the timestamp value.
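The two steps can be traced outside the database. The following is only a rough Python sketch of the same idea, not the query itself; the sample rows and the integer timestamps are made up for illustration.

```python
from itertools import accumulate

# Hypothetical rows for one customer, already sorted by ts: (ts, event)
rows = [(1, 1), (2, 2), (3, 3), (4, 1), (5, 1), (6, 2)]

# Step 1 (the CTE): running count of event = 1 rows, i.e. the pseudo group number
cnt = list(accumulate(1 if event == 1 else 0 for _, event in rows))

# Step 2: within each pseudo group, keep the ts of the row where event = 1
group_ts = {}
for (ts, event), group in zip(rows, cnt):
    if event == 1:
        group_ts[group] = max(ts, group_ts.get(group, ts))

# Rows before the first event = 1 fall in group 0 and get None (NULL in SQL)
desired_result = [group_ts.get(group) for group in cnt]

print(cnt)             # [1, 1, 1, 2, 3, 3]
print(desired_result)  # [1, 1, 1, 4, 5, 5]
```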

One more approach with "one query":
with data as
(
select sysdate - 0.29 ts, 111 customer_id, 1 event from dual union all
select sysdate - 0.28 ts, 111 customer_id, 2 event from dual union all
select sysdate - 0.27 ts, 111 customer_id, 3 event from dual union all
select sysdate - 0.26 ts, 111 customer_id, 1 event from dual union all
select sysdate - 0.25 ts, 111 customer_id, 1 event from dual union all
select sysdate - 0.24 ts, 111 customer_id, 2 event from dual union all
select sysdate - 0.23 ts, 111 customer_id, 1 event from dual union all
select sysdate - 0.22 ts, 111 customer_id, 1 event from dual
)
select
ts, event,
last_value(case when event=1 then ts end) ignore nulls
over (partition by customer_id order by ts) desired_result,
max(case when event=1 then ts end)
over (partition by customer_id order by ts) desired_result_2
from data
order by ts
Edit: As suggested by MatBailie the max(case...) works as well and is a more general approach. The "last_value ... ignore nulls" is Oracle specific.
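Both window expressions carry the latest event = 1 timestamp forward row by row. Because the rows are ordered by ts, the running maximum and the last non-null value coincide. A minimal Python sketch of that carry-forward, with made-up rows and integer timestamps as assumptions:

```python
# Hypothetical (ts, event) rows for one customer, sorted by ts
rows = [(10, 2), (20, 1), (30, 3), (40, 1), (50, 2)]

def last_event_one_ts(rows):
    """For each row, return the ts of the most recent row (itself included) with event = 1."""
    result = []
    latest = None  # stays None (NULL) until the first event = 1 row is seen
    for ts, event in rows:
        if event == 1:
            latest = ts
        result.append(latest)
    return result

print(last_event_one_ts(rows))  # [None, 20, 20, 40, 40]
```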

Related

create time range with 2 columns date_time

I need to find the distinct time periods among multiple overlapping time periods in Teradata ANSI SQL.
For example, the table below contains multiple overlapping time periods. How can I combine them into 3 unique time periods in Teradata SQL?
I think I could do it in Python with a loop, but I am not sure how to do it in SQL.
ID   Start Date  End Date
001  2005-01-01  2006-01-01
001  2005-01-01  2007-01-01
001  2008-01-01  2008-06-01
001  2008-04-01  2008-12-01
001  2010-01-01  2010-05-01
001  2010-04-01  2010-12-01
001  2010-11-01  2012-01-01
My expected result is:
ID   start_Date  end_date
001  2005-01-01  2007-01-01
001  2008-01-01  2008-12-01
001  2010-01-01  2012-01-01
From Oracle 12, you can use MATCH_RECOGNIZE to perform a row-by-row comparison:
SELECT *
FROM table_name
MATCH_RECOGNIZE(
PARTITION BY id
ORDER BY start_date
MEASURES
FIRST(start_date) AS start_date,
MAX(end_date) AS end_date
ONE ROW PER MATCH
PATTERN (overlapping_ranges* last_range)
DEFINE overlapping_ranges AS NEXT(start_date) <= MAX(end_date)
)
Which, for the sample data:
CREATE TABLE table_name (ID, Start_Date, End_Date) AS
SELECT '001', DATE '2005-01-01', DATE '2006-01-01' FROM DUAL UNION ALL
SELECT '001', DATE '2005-01-01', DATE '2007-01-01' FROM DUAL UNION ALL
SELECT '001', DATE '2008-01-01', DATE '2008-06-01' FROM DUAL UNION ALL
SELECT '001', DATE '2008-04-01', DATE '2008-12-01' FROM DUAL UNION ALL
SELECT '001', DATE '2010-01-01', DATE '2010-05-01' FROM DUAL UNION ALL
SELECT '001', DATE '2010-04-01', DATE '2010-12-01' FROM DUAL UNION ALL
SELECT '001', DATE '2010-11-01', DATE '2012-01-01' FROM DUAL;
Outputs:
ID   START_DATE           END_DATE
001  2005-01-01 00:00:00  2007-01-01 00:00:00
001  2008-01-01 00:00:00  2008-12-01 00:00:00
001  2010-01-01 00:00:00  2012-01-01 00:00:00
Update: Alternative query
SELECT id,
start_date,
end_date
FROM (
SELECT id,
dt,
SUM(cnt) OVER (PARTITION BY id ORDER BY dt) AS grp,
cnt
FROM (
SELECT ID,
dt,
SUM(type) OVER (PARTITION BY id ORDER BY dt, ROWNUM) * type AS cnt
FROM table_name
UNPIVOT (dt FOR type IN (start_date AS 1, end_date AS -1))
)
WHERE cnt IN (1,0)
)
PIVOT (MAX(dt) FOR cnt IN (1 AS start_date, 0 AS end_date))
Or, an equivalent that does not use UNPIVOT, PIVOT or ROWNUM and works in both Oracle and PostgreSQL:
SELECT id,
MAX(CASE cnt WHEN 1 THEN dt END) AS start_date,
MAX(CASE cnt WHEN 0 THEN dt END) AS end_date
FROM (
SELECT id,
dt,
SUM(cnt) OVER (PARTITION BY id ORDER BY dt) AS grp,
cnt
FROM (
SELECT ID,
dt,
SUM(type) OVER (PARTITION BY id ORDER BY dt, rn) * type AS cnt
FROM (
SELECT r.*,
ROW_NUMBER() OVER (PARTITION BY id ORDER BY dt ASC, type DESC) AS rn
FROM (
SELECT id, 1 AS type, start_date AS dt FROM table_name
UNION ALL
SELECT id, -1 AS type, end_date AS dt FROM table_name
) r
) p
) s
WHERE cnt IN (1,0)
) t
GROUP BY id, grp
Update 2: Another Alternative
SELECT id,
MIN(start_date) AS start_date,
MAX(end_Date) AS end_date
FROM (
SELECT t.*,
SUM(CASE WHEN start_date <= prev_max THEN 0 ELSE 1 END)
OVER (PARTITION BY id ORDER BY start_date) AS grp
FROM (
SELECT t.*,
MAX(end_date) OVER (
PARTITION BY id ORDER BY start_date
ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
) AS prev_max
FROM table_name t
) t
) t
GROUP BY id, grp
This is a gaps and islands problem. Try this:
with u as
(select ID, start_date, end_date,
case
when start_date <= lag(end_date) over(partition by ID order by start_date, end_date) then 0
else 1 end as grp
from table_name),
v as
(select ID, start_date, end_date,
sum(grp) over(partition by ID order by start_date, end_date) as island
from u)
select ID, min(start_date) as start_Date, max(end_date) as end_date
from v
group by ID, island;
Basically you can identify "islands" by comparing the start_date of the current row to the end_date of the previous row (ordered by start_date, end_date): if the start_date is on or before that end_date, the row belongs to the same island; otherwise it starts a new one. Then you can do a rolling sum() over those flags to get the island numbers. Finally, select min(start_date) and max(end_date) from each island to get the desired output.
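To see the island numbering at work, here is a small Python sketch that mirrors those three steps on the sample data from the question. It is an illustration under the same assumption as the query, namely that each row only needs to be compared with the row directly before it; a range nested inside a longer earlier one would need the running maximum of earlier end dates instead, as in the prev_max query earlier in this thread.

```python
from datetime import date
from itertools import groupby

# Sample ranges from the question for ID 001: (start_date, end_date)
ranges = [
    (date(2005, 1, 1), date(2006, 1, 1)),
    (date(2005, 1, 1), date(2007, 1, 1)),
    (date(2008, 1, 1), date(2008, 6, 1)),
    (date(2008, 4, 1), date(2008, 12, 1)),
    (date(2010, 1, 1), date(2010, 5, 1)),
    (date(2010, 4, 1), date(2010, 12, 1)),
    (date(2010, 11, 1), date(2012, 1, 1)),
]

def merge_islands(ranges):
    ordered = sorted(ranges)          # order by start_date, end_date
    island = 0                        # rolling sum of the "new island" flags
    prev_end = None                   # plays the role of lag(end_date)
    labelled = []
    for start, end in ordered:
        if prev_end is None or start > prev_end:
            island += 1               # no overlap with the previous row
        labelled.append((island, start, end))
        prev_end = end
    merged = []
    for _, rows in groupby(labelled, key=lambda r: r[0]):
        rows = list(rows)
        merged.append((min(r[1] for r in rows), max(r[2] for r in rows)))
    return merged

for start, end in merge_islands(ranges):
    print(start, end)
# 2005-01-01 2007-01-01
# 2008-01-01 2008-12-01
# 2010-01-01 2012-01-01
```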
This may work with a small change to the function; I tried it in DBeaver:
select ID,Start_Date,End_Date
from
(
select t.*,
dense_rank () over(partition by extract (year from Start_Date) order BY End_Date desc) drnk
from testing_123 t
) temp
where temp.drnk = 1
ORDER BY Start_Date;
Try this
WITH a as (
SELECT
ID,
LEFT(Start_Date, 4) as Year,
MIN(Start_Date) as New_Start_Date
FROM
TAB1
GROUP BY
ID,
LEFT(Start_Date, 4)
), b as (
SELECT
a.ID,
Year,
New_Start_Date,
End_Date
FROM
a
LEFT JOIN
TAB1
ON LEFT(a.New_Start_Date, 4) = LEFT(TAB1.Start_Date, 4)
)
select
ID,
New_Start_Date as Start_Date,
MAX(End_Date)
from
b
GROUP BY
ID,
New_Start_Date;
Example: https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=97f91b68c635aebfb752538cdd752ace

Stopping leading observations once certain threshold met in Oracle

Hopefully this makes sense. In short, I have a patient with multiple encounters. The first encounter is always included, and each following encounter is included if it falls within 4 hours of the previous one.
As soon as an encounter fails that criterion, it and all later observations should be excluded from the output.
The code below shows the problem. It outputs rows 1, 2 and 4. I want rows 1 and 2 but not 4.
Any tips appreciated.
TIA
With Base as
(select 123 as ID, 12345 as enc_id, TO_DATE('2019-07-01 13:27:18', 'YYYY-MM-DD HH24:MI:SS') as dt from dual union
select 123 as ID, 12346 as enc_id, TO_DATE('2019-07-01 16:27:18', 'YYYY-MM-DD HH24:MI:SS') as dt from dual union
select 123 as ID, 12347 as enc_id, TO_DATE('2019-07-02 16:27:18', 'YYYY-MM-DD HH24:MI:SS') as dt from dual union
select 123 as ID, 12348 as enc_id, TO_DATE('2019-07-02 18:27:18', 'YYYY-MM-DD HH24:MI:SS') as dt from dual)
select * from (select ID,ENC_ID,dt,row_number() over (partition by ID order by DT) RK,
lag(dt) over (partition by ID order by dt) prev_dt,
(DT-lag(dt) over (partition by ID order by dt))*24 as time_dif_hrs from base) where RK=1 or TIME_DIF_HRS<4
You can use another analytical function sum as follows:
Select ID,ENC_ID,dt from
(select ID,ENC_ID,dt,rk,
-- running count of gaps of 4 hours or more; the first row (prev_dt is null) counts as no gap
sum(case when prev_dt is null or (dt - prev_dt) * 24 < 4 then 0 else 1 end)
over( partition by ID order by DT) as cond_met_running
from (select ID,ENC_ID,dt,
row_number() over (partition by ID order by DT) RK,
lag(dt) over (partition by ID order by dt) prev_dt
from base)
) Where rk = 1 or cond_met_running = 0
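The running-sum idea can be checked by hand. The Python sketch below traces it on the four encounters from the question; it assumes the first encounter counts as "no gap", and the function name and the max_gap_hours parameter are made up for illustration.

```python
from datetime import datetime
from itertools import accumulate

# Encounters from the question for a single ID: (enc_id, dt)
encounters = [
    (12345, datetime(2019, 7, 1, 13, 27, 18)),
    (12346, datetime(2019, 7, 1, 16, 27, 18)),
    (12347, datetime(2019, 7, 2, 16, 27, 18)),
    (12348, datetime(2019, 7, 2, 18, 27, 18)),
]

def leading_run(encounters, max_gap_hours=4):
    ordered = sorted(encounters, key=lambda e: e[1])
    flags = []
    prev_dt = None
    for _, dt in ordered:
        # 0 when the gap to the previous encounter is short enough, else 1
        gap_ok = prev_dt is None or (dt - prev_dt).total_seconds() / 3600 < max_gap_hours
        flags.append(0 if gap_ok else 1)
        prev_dt = dt
    # Once the running sum leaves 0 it never returns, so later rows stay excluded
    running = accumulate(flags)
    return [enc_id for (enc_id, _), total in zip(ordered, running) if total == 0]

kept = leading_run(encounters)
print(kept)  # [12345, 12346]
```

The fourth encounter is only 2 hours after the third, but the running sum is already 1 at that point, which is why it is dropped.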

Finding the most recent thing prior to a specific event

I'm doing some timestamp problem solving but am stuck with some join logic.
I have a table of data like so:
id, event_time, event_type, location
1001, 2018-06-04 18:23:48.526895 UTC, I, d
1001, 2018-06-04 19:26:44.359296 UTC, I, h
1001, 2018-06-05 06:07:03.658263 UTC, I, w
1001, 2018-06-07 00:47:44.651841 UTC, I, d
1001, 2018-06-07 00:48:17.857729 UTC, C, d
1001, 2018-06-08 00:04:53.086240 UTC, I, a
1001, 2018-06-12 21:23:03.071829 UTC, I, d
...
And I'm trying to find the timestamp difference between when a user has an event_type of C and the most recent event type of I up to event_type C for a given location value.
Ultimately the schema I'm after is:
id, location, timestamp_diff
1001, d, 33
1001, z, 21
1002, a, 55
...
I tried the following, which works for a single id value but doesn't seem to work for multiple ids. I might be over-complicating the issue, but I'm not sure. For one id it gives about 5 rows, which is right. However, when I open it up to two ids, I get upwards of 200 rows when I should get something like 7 (5 for the first id and 2 for the second):
with c as (
select
id
,event_time as c_time
,location
from data
where event_type = 'C'
and id = '1001'
)
,i as (
select
id
,event_time as i_time
,location
from data
where event_type = 'I'
)
,check1 as (
select
c.*
,i.i_time
from c
left join i on (c.id = i.id and c.location = i.location)
group by 1,2,3,4
having i_time <= c_time
)
,check2 as (
select
id
,c_time
,location
,max(i_time) as i_time
from check1
group by 1,2,3
)
select
id
,location
,timestamp_diff(c_time, i_time, second) as timestamp_diff
from check2
#standardSQL
SELECT id, location, TIMESTAMP_DIFF(event_time, i_event_time, SECOND) AS diff
FROM (
SELECT *, MAX(IF(event_type = 'I', event_time, NULL)) OVER(win2) AS i_event_time
FROM (
SELECT *, COUNTIF(event_type = 'C') OVER(win1) grp
FROM `project.dataset.table`
WINDOW win1 AS (PARTITION BY id, location ORDER BY event_time ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
)
WINDOW win2 AS (PARTITION BY id, location, grp ORDER BY event_time ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
)
WHERE event_type = 'C'
AND NOT i_event_time IS NULL
This version also handles some edge cases, for example consecutive 'C' events with no 'I' event in between, as in the example below:
#standardSQL
WITH `project.dataset.table` AS (
SELECT 1001 id, TIMESTAMP '2018-06-04 18:23:48.526895 UTC' event_time, 'I' event_type, 'd' location UNION ALL
SELECT 1001, '2018-06-04 19:26:44.359296 UTC', 'I', 'h' UNION ALL
SELECT 1001, '2018-06-05 06:07:03.658263 UTC', 'I', 'w' UNION ALL
SELECT 1001, '2018-06-07 00:47:44.651841 UTC', 'I', 'd' UNION ALL
SELECT 1001, '2018-06-07 00:48:17.857729 UTC', 'C', 'd' UNION ALL
SELECT 1001, '2018-06-08 00:04:53.086240 UTC', 'C', 'd' UNION ALL
SELECT 1001, '2018-06-12 21:23:03.071829 UTC', 'I', 'd'
)
SELECT id, location, TIMESTAMP_DIFF(event_time, i_event_time, SECOND) AS diff
FROM (
SELECT *, MAX(IF(event_type = 'I', event_time, NULL)) OVER(win2) AS i_event_time
FROM (
SELECT *, COUNTIF(event_type = 'C') OVER(win1) grp
FROM `project.dataset.table`
WINDOW win1 AS (PARTITION BY id, location ORDER BY event_time ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
)
WINDOW win2 AS (PARTITION BY id, location, grp ORDER BY event_time ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
)
WHERE event_type = 'C'
AND NOT i_event_time IS NULL
The result is:
Row id location diff
1 1001 d 33
whereas without handling that edge case it would be:
Row id location diff
1 1001 d 33
2 1001 d 83795
You can use a cumulative max() function to get the most recent i time before every event.
Then just filter based on the C event:
select id, location,
timestamp_diff(event_time, i_event_time, second) as diff
from (select t.*,
max(case when event_type = 'I' then event_time end) over (partition by id, location order by event_time) as i_event_time
from t
) t
where event_type = 'C';
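A rough Python sketch of the same "carry the latest I time forward, then keep the C rows" logic follows. The events and the integer second offsets are made-up stand-ins for real timestamps. Unlike the query, this sketch skips a C event that has no earlier I event instead of returning a NULL diff, and like the query it would reuse the same I event for consecutive C events.

```python
# Hypothetical simplified events: (id, location, event_time in seconds, event_type)
events = [
    (1001, "d", 0, "I"),
    (1001, "h", 50, "I"),
    (1001, "d", 100, "I"),
    (1001, "d", 133, "C"),
    (1002, "a", 10, "I"),
    (1002, "a", 65, "C"),
]

def seconds_since_last_i(events):
    latest_i = {}  # (id, location) -> time of the most recent 'I' event so far
    diffs = []
    for id_, location, t, event_type in sorted(events, key=lambda e: e[2]):
        key = (id_, location)  # same role as PARTITION BY id, location
        if event_type == "I":
            latest_i[key] = t
        elif event_type == "C" and key in latest_i:
            diffs.append((id_, location, t - latest_i[key]))
    return diffs

print(seconds_since_last_i(events))  # [(1002, 'a', 55), (1001, 'd', 33)]
```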

Get rows from current month if older is not available

I have a table that looks like this:
+--------------------+---------+
| Month (date) | amount |
+--------------------+---------+
| 2016-10-01 | 20 |
| 2016-08-01 | 10 |
| 2016-07-01 | 17 |
+--------------------+---------+
I'm looking for a query (sql statement) which satisfies the following conditions:
Give me the value of the previous month.
If there is no value for the previous month, look back in time until one is found.
If there is just a value for the current month give me this value.
In the example table the row I'm looking for would be this:
+--------------------+---------+
| 2016-08-01 | 10 |
+--------------------+---------+
Does anyone have an idea for a simple select query?
Thanks in advance,
Peter
You may need the following:
SELECT *
FROM ( SELECT *
FROM test
WHERE TRUNC(SYSDATE, 'month') >= month
ORDER BY CASE
WHEN TRUNC(SYSDATE, 'month') = month
THEN 0 /* if current month, ordered last */
ELSE 1 /* previous months are ordered first */
END DESC,
month DESC /* among previous months, the greatest first */
)
WHERE ROWNUM = 1
Another way using MAX
WITH tbl AS (
SELECT TO_DATE('2016-10-01', 'YYYY-MM-DD') AS "month", 20 AS amount FROM dual
UNION
SELECT TO_DATE('2016-08-01', 'YYYY-MM-DD') AS "month", 10 AS amount FROM dual
UNION
SELECT TO_DATE('2016-07-01', 'YYYY-MM-DD') AS "month", 5 AS amount FROM dual
)
SELECT *
FROM tbl
WHERE TRUNC("month", 'MONTH') = NVL((SELECT MAX(t."month")
FROM tbl t
WHERE t."month" < TRUNC(SYSDATE, 'MONTH')),
TRUNC(SYSDATE, 'MONTH'));
I would use row_number():
select t.*
from (select t.*,
row_number() over (order by (case when to_char(dte, 'YYYY-MM') = to_char(sysdate, 'YYYY-MM') then 1 else 2 end) desc,
dte desc
) as seqnum
from t
) t
where seqnum = 1;
Actually, you don't need row_number() for this:
select t.*
from (select t.*
from t
order by (case when to_char(dte, 'YYYY-MM') = to_char(sysdate, 'YYYY-MM') then 1 else 2 end) desc,
dte desc
) t
where rownum = 1;
It's not the nicest query but it should work.
select amount, dt from (
select amount, dt, row_number() over(partition by HERE_PUT_ID order by
case trunc(dt, 'month') when trunc(sysdate, 'month') then to_date('00010101', 'yyyymmdd') else trunc(dt, 'month') end
desc) r
from HERE_PUT_TABLE)
where r = 1;
I guess your table has some id column, so put it in place of HERE_PUT_ID, and put your table name in place of HERE_PUT_TABLE; dt stands for your date column, since date itself is a reserved word in Oracle. If you want to query the whole table as one group, just delete: partition by HERE_PUT_ID
I added more data for testing, and an "id" column (a more realistic scenario) to show how this would work. If there is no "id" in your data, simply delete any reference to it from the solution.
Notes: month is an Oracle keyword, so it is best not to use it as a column name. The solution assumes the date column contains dates that are already truncated to the beginning of the month. The trick in the "order by" of the dense_rank last is to assign a value (ANY value!) when the month is the current month; all other months get NULL, and NULLs sort after any non-null value in ascending order by default.
You may want to test the various solutions for efficiency if execution time is important.
with
inputs ( id, mth, amount ) as (
select 1, date '2016-10-01', 20 from dual union all
select 1, date '2016-08-01', 10 from dual union all
select 1, date '2016-07-01', 17 from dual union all
select 2, date '2016-10-01', 30 from dual union all
select 2, date '2016-09-01', 25 from dual union all
select 3, date '2016-10-01', 20 from dual union all
select 4, date '2016-08-01', 45 from dual union all
select 4, date '2016-06-01', 30 from dual
)
-- end of TEST DATA - the solution (SQL query) is below this line
select id,
max(mth) keep(dense_rank last order by
case when mth = trunc(sysdate, 'mm') then 0 end, mth) as mth,
max(amount) keep(dense_rank last order by
case when mth = trunc(sysdate, 'mm') then 0 end, mth) as amount
from inputs
group by id
order by id -- ORDER BY is optional
;
ID MTH AMOUNT
--- ---------- -------
1 2016-08-01 10
2 2016-09-01 25
3 2016-10-01 20
4 2016-08-01 45
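The ordering trick can be replayed in a few lines of Python. This is only a sketch of the selection rule, using the test data above; the function name is made up, and current_month is passed in explicitly (an assumption made so the run is reproducible, here October 2016 to match the output shown).

```python
from datetime import date

# Test data from the answer above: (id, mth, amount)
inputs = [
    (1, date(2016, 10, 1), 20),
    (1, date(2016, 8, 1), 10),
    (1, date(2016, 7, 1), 17),
    (2, date(2016, 10, 1), 30),
    (2, date(2016, 9, 1), 25),
    (3, date(2016, 10, 1), 20),
    (4, date(2016, 8, 1), 45),
    (4, date(2016, 6, 1), 30),
]

def pick_rows(inputs, current_month):
    best = {}
    for id_, mth, amount in inputs:
        # Any other month outranks the current month; among those, the latest wins
        key = (mth != current_month, mth)
        if id_ not in best or key > best[id_][0]:
            best[id_] = (key, mth, amount)
    return {id_: (mth, amount) for id_, (_, mth, amount) in sorted(best.items())}

result = pick_rows(inputs, current_month=date(2016, 10, 1))
for id_, (mth, amount) in result.items():
    print(id_, mth, amount)
# 1 2016-08-01 10
# 2 2016-09-01 25
# 3 2016-10-01 20
# 4 2016-08-01 45
```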
You could sort the data in the direction you want to:
with MyData as
(
SELECT to_date('2016-10-01','YYYY-MM-DD') MY_DATE, 20 AMOUNT FROM DUAL UNION
SELECT to_date('2016-08-01','YYYY-MM-DD') MY_DATE, 10 AMOUNT FROM DUAL UNION
SELECT to_date('2016-07-01','YYYY-MM-DD') MY_DATE, 17 AMOUNT FROM DUAL
),
MyResult AS (
SELECT
D.*
FROM MyData D
ORDER BY
DECODE(
12*TO_CHAR(MY_DATE,'YYYY') + TO_CHAR(MY_DATE,'MM'),
12*TO_CHAR(SYSDATE,'YYYY') + TO_CHAR(SYSDATE,'MM'),
-1,
12*TO_CHAR(MY_DATE,'YYYY') + TO_CHAR(MY_DATE,'MM'))
DESC
)
SELECT * FROM MyResult WHERE RowNum = 1

SQL to calculate difference between 2 latest recent values by event_types

The events table looks like
event_type value timestamp
2 2 06-06-2016 14:00:00
2 7 06-06-2016 13:00:00
2 2 06-06-2016 12:00:00
3 3 06-06-2016 14:00:00
3 9 06-06-2016 13:00:00
4 9 06-06-2016 13:00:00
My goal is to keep the event types that occur at least twice, subtract their two most recent values, and show the result by event_type.
The end result would be
event_type value
2 -5
3 -6
I was able to filter the events that occur at least twice and order them by timestamp descending within each event_type.
The difficult part for me is subtracting the two most recent values and showing the result by event_type.
DB / SQL experts, please help.
You can use a query like this:
SELECT event_type, diff
FROM (
SELECT event_type, value, "timestamp", rn,
value - LEAD(value) OVER (PARTITION BY event_type
ORDER BY "timestamp" DESC) AS diff
FROM (
SELECT event_type, value, "timestamp",
COUNT(*) OVER (PARTITION BY event_type) AS cnt,
ROW_NUMBER() OVER (PARTITION BY event_type ORDER BY "timestamp" DESC) AS rn
FROM mytable) AS t
WHERE cnt >=2 AND rn <= 2 ) AS s
WHERE rn = 1
The innermost subquery uses:
Window function COUNT with PARTITION BY clause, so as to calculate the population of each event_type slice.
Window function ROW_NUMBER so as to get the two latest records within each event_type slice.
The mid-level query uses LEAD window function, so as to calculate the difference between the first and the second records. The outermost query simply returns this difference.
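The same three steps (count per event_type, keep the two latest rows, subtract) can be sketched in Python on the sample data from the question. The function name is made up, and the timestamps are rewritten as ISO-style strings so that they sort chronologically, which is an assumption of this sketch.

```python
from collections import defaultdict

# Sample rows from the question: (event_type, value, timestamp as a sortable string)
events = [
    (2, 2, "2016-06-06 14:00:00"),
    (2, 7, "2016-06-06 13:00:00"),
    (2, 2, "2016-06-06 12:00:00"),
    (3, 3, "2016-06-06 14:00:00"),
    (3, 9, "2016-06-06 13:00:00"),
    (4, 9, "2016-06-06 13:00:00"),
]

def latest_diffs(events):
    by_type = defaultdict(list)
    for event_type, value, ts in events:
        by_type[event_type].append((ts, value))
    diffs = {}
    for event_type, rows in by_type.items():
        if len(rows) < 2:               # COUNT(*) OVER (...) with cnt >= 2
            continue
        rows.sort(reverse=True)         # ROW_NUMBER ordered by timestamp DESC
        (_, newest), (_, second) = rows[:2]
        diffs[event_type] = newest - second   # value - LEAD(value)
    return diffs

print(latest_diffs(events))  # {2: -5, 3: -6}
```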
This example is for Oracle only.
Test data:
with t(event_type,
value,
timestamp) as
(select 2, 2, to_timestamp('06-06-2016 14:00:00', 'mm-dd-yyyy hh24:mi:ss')
from dual
union all
select 2, 7, to_timestamp('06-06-2016 13:00:00', 'mm-dd-yyyy hh24:mi:ss')
from dual
union all
select 2, 2, to_timestamp('06-06-2016 12:00:00', 'mm-dd-yyyy hh24:mi:ss')
from dual
union all
select 3, 3, to_timestamp('06-06-2016 14:00:00', 'mm-dd-yyyy hh24:mi:ss')
from dual
union all
select 3, 9, to_timestamp('06-06-2016 13:00:00', 'mm-dd-yyyy hh24:mi:ss')
from dual
union all
select 4, 9, to_timestamp('06-06-2016 13:00:00', 'mm-dd-yyyy hh24:mi:ss')
from dual)
Query:
select event_type,
max(value) keep(dense_rank first order by rn) - max(value) keep(dense_rank last order by rn) as value
from (select event_type,
row_number() over(partition by event_type order by timestamp desc) rn,
value
from t) t
where rn in (1, 2)
group by event_type
having count (*) >= 2