Oracle Join Between with Interval Times of Minutes - sql

I am having trouble joining rows of a table based on a time interval in minutes.
Example:
table1
ID Modules Timestamp
1 Delivered 02-FEB-2020 08:24:45
1 Read 02-FEB-2020 08:27:50
1 Delivered 03-FEB-2020 09:24:45
1 Read 03-FEB-2020 10:00:50
2 Delivered 03-FEB-2020 09:28:10
2 Read 03-FEB-2020 09:30:11
Question:
Is there any way to reshape the data like this?
ID Modules1 Timestamp1 Modules2 Timestamp2
1 Delivered 02-FEB-2020 08:24:45 Read 02-FEB-2020 08:27:50
1 Delivered 03-FEB-2020 09:24:45
1 Read 03-FEB-2020 10:00:50
2 Delivered 03-FEB-2020 09:28:10 Read 03-FEB-2020 09:30:11
Goal:
If a Read occurs within 5 minutes of the Delivered row, the two rows should be joined; otherwise the rows should remain as they are.

I interpret this as a type of gaps-and-islands problem. Each "island" either starts with a lag of 5 minutes on a "Read" or any row with "Delivered".
with tgrp as (
select t.*,
sum(case when modules = 'Delivered' or
prev_timestamp < timestamp - interval '5' minute
then 1 else 0
end) over (partition by id order by timestamp) as grp
from (select t.*,
lag(timestamp) over (partition by id order by timestamp) as prev_timestamp
from t
) t
)
select id,
max(case when seqnum = 1 then modules end) as modules1,
max(case when seqnum = 1 then timestamp end) as timestamp1,
max(case when seqnum = 2 then modules end) as modules2,
max(case when seqnum = 2 then timestamp end) as timestamp2
from (select tgrp.*,
row_number() over (partition by id, grp order by timestamp) as seqnum
from tgrp
) tgrp
group by id, grp;
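For anyone who wants to try the gaps-and-islands idea without an Oracle instance, here is a minimal sketch using Python's built-in sqlite3 module. SQLite is swapped in for Oracle, so the 5-minute check is done on epoch seconds via strftime('%s', ...) rather than interval '5' minute. The column name ts (instead of timestamp) and the ISO date strings are my assumptions, and window functions need SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed schema: "ts" stands in for the question's Timestamp column.
conn.execute("CREATE TABLE t (id INTEGER, modules TEXT, ts TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", [
    (1, "Delivered", "2020-02-02 08:24:45"),
    (1, "Read",      "2020-02-02 08:27:50"),
    (1, "Delivered", "2020-02-03 09:24:45"),
    (1, "Read",      "2020-02-03 10:00:50"),
    (2, "Delivered", "2020-02-03 09:28:10"),
    (2, "Read",      "2020-02-03 09:30:11"),
])

query = """
WITH lagged AS (
    SELECT t.*,
           CAST(strftime('%s', ts) AS INTEGER) AS secs,
           LAG(CAST(strftime('%s', ts) AS INTEGER))
               OVER (PARTITION BY id ORDER BY ts) AS prev_secs
    FROM t
),
tgrp AS (
    -- A new island starts on every Delivered row, or on a row that
    -- arrives more than 300 seconds after the previous row.
    SELECT lagged.*,
           SUM(CASE WHEN modules = 'Delivered' OR prev_secs < secs - 300
                    THEN 1 ELSE 0 END)
               OVER (PARTITION BY id ORDER BY ts) AS grp
    FROM lagged
),
numbered AS (
    SELECT tgrp.*,
           ROW_NUMBER() OVER (PARTITION BY id, grp ORDER BY ts) AS seqnum
    FROM tgrp
)
SELECT id,
       MAX(CASE WHEN seqnum = 1 THEN modules END) AS modules1,
       MAX(CASE WHEN seqnum = 1 THEN ts END)      AS timestamp1,
       MAX(CASE WHEN seqnum = 2 THEN modules END) AS modules2,
       MAX(CASE WHEN seqnum = 2 THEN ts END)      AS timestamp2
FROM numbered
GROUP BY id, grp
ORDER BY id, timestamp1
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

On the sample data this should give four rows: the two paired Delivered/Read rows, plus the unpaired Delivered and the late Read for ID 1 on separate rows.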
EDIT:
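I think a simpler method is to put the data together using lead() and lag(), and then filter and adjust the final values:
select t.id, t.modules, t.timestamp,
       (case when t.modules = 'Delivered' and t.next_modules = 'Read' and
                  t.next_timestamp < t.timestamp + interval '5' minute
             then t.next_modules
        end) as modules2,
       (case when t.modules = 'Delivered' and t.next_modules = 'Read' and
                  t.next_timestamp < t.timestamp + interval '5' minute
             then t.next_timestamp
        end) as timestamp2
from (select t.*,
             lead(modules) over (partition by id order by timestamp) as next_modules,
             lead(timestamp) over (partition by id order by timestamp) as next_timestamp,
             lag(modules) over (partition by id order by timestamp) as prev_modules,
             lag(timestamp) over (partition by id order by timestamp) as prev_timestamp
      from t
     ) t
-- keep every Delivered row, plus any Read row that was not paired with the row before it
where t.modules = 'Delivered' or
      t.prev_modules is null or
      t.prev_modules <> 'Delivered' or
      t.prev_timestamp <= t.timestamp - interval '5' minute;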

You can use a self join to achieve the desired result, as follows:
with cte as
 (select t.*,
         row_number() over (partition by id, modules order by timestamp) as rn
  from your_table t)
select t1.*,
       case when t1.modules = 'Delivered' and t2.timestamp <= t1.timestamp + interval '5' minute
            then t2.timestamp
       end as timestamp2
from cte t1
left join cte t2
  on (t1.id = t2.id and t1.rn = t2.rn and t2.modules = 'Read')
left join cte t3
  on (t1.id = t3.id and t1.rn = t3.rn and t3.modules = 'Delivered')
where t1.modules = 'Delivered' or t3.timestamp + interval '5' minute < t2.timestamp
Note that this pairs the nth Delivered row with the nth Read row for each ID.
Cheers!!

Related

Hive rolling sum of data over date

I am working on Hive and am facing an issue with rolling counts. The sample data I am working on is as shown below:
and the output I am expecting is as shown below:
I tried using the following query but it is not returning the rolling count:
select event_dt,status, count(distinct account) from
(select *, row_number() over (partition by account order by event_dt
desc)
as rnum from table.A
where event_dt between '2018-05-02' and '2018-05-04') x where rnum =1
group by event_dt, status;
Please help if someone has solved a similar issue.
You seem to just want conditional aggregation:
select event_dt,
sum(case when status = 'Registered' then 1 else 0 end) as registered,
sum(case when status = 'active_acct' then 1 else 0 end) as active_acct,
sum(case when status = 'suspended' then 1 else 0 end) as suspended,
sum(case when status = 'reactive' then 1 else 0 end) as reactive
from table.A
group by event_dt
order by event_dt;
EDIT:
This is a tricky problem. The solution I've come up with does a cross-product of dates and users and then calculates the most recent status as of each date.
So:
select a.event_dt,
sum(case when aa.status = 'Registered' then 1 else 0 end) as registered,
sum(case when aa.status = 'active_acct' then 1 else 0 end) as active_acct,
sum(case when aa.status = 'suspended' then 1 else 0 end) as suspended,
sum(case when aa.status = 'reactive' then 1 else 0 end) as reactive
from (select d.event_dt, ac.account, a.status,
max(case when a.status is not null then a.timestamp end) over (partition by ac.account order by d.event_dt) as last_status_timestamp
from (select distinct event_dt from table.A) d cross join
(select distinct account from table.A) ac left join
(select a.*,
row_number() over (partition by account, event_dt order by timestamp desc) as seqnum
from table.A a
) a
on a.event_dt = d.event_dt and
a.account = ac.account and
a.seqnum = 1 -- get the last one on the date
) a left join
table.A aa
on aa.timestamp = a.last_status_timestamp and
aa.account = a.account
group by a.event_dt
order by a.event_dt;
What this is doing is creating a derived table with rows for all accounts and dates. This has the status on certain days, but not all days.
The cumulative max for last_status_timestamp calculates the most recent timestamp that has a valid status. This is then joined back to the table to get the status on that date. Voila! This is the status used for the conditional aggregation.
The cumulative max and join is a work-around because Hive does not (yet?) support the ignore nulls option in lag().
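To make the cumulative-max trick concrete, here is a rough sketch run against SQLite through Python's sqlite3 module (swapped in for Hive, so treat it as an illustration only). The question's sample data was not preserved, so the table name events, the column ts, and the four rows are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for table.A; "ts" plays the role of the timestamp column.
conn.execute("CREATE TABLE events (account INTEGER, event_dt TEXT, status TEXT, ts TEXT)")
conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", [
    (1, "2018-05-02", "Registered",  "2018-05-02 10:00:00"),
    (1, "2018-05-03", "active_acct", "2018-05-03 09:00:00"),
    (2, "2018-05-02", "Registered",  "2018-05-02 11:00:00"),
    (2, "2018-05-04", "suspended",   "2018-05-04 08:00:00"),
])

query = """
WITH last_on_day AS (
    -- latest event per account per day
    SELECT e.*,
           ROW_NUMBER() OVER (PARTITION BY account, event_dt ORDER BY ts DESC) AS seqnum
    FROM events e
),
grid AS (
    -- every (date, account) pair, carrying forward the latest known timestamp
    SELECT d.event_dt, ac.account,
           MAX(l.ts) OVER (PARTITION BY ac.account ORDER BY d.event_dt) AS last_status_ts
    FROM (SELECT DISTINCT event_dt FROM events) d
    CROSS JOIN (SELECT DISTINCT account FROM events) ac
    LEFT JOIN last_on_day l
           ON l.event_dt = d.event_dt AND l.account = ac.account AND l.seqnum = 1
)
SELECT g.event_dt,
       SUM(CASE WHEN aa.status = 'Registered'  THEN 1 ELSE 0 END) AS registered,
       SUM(CASE WHEN aa.status = 'active_acct' THEN 1 ELSE 0 END) AS active_acct,
       SUM(CASE WHEN aa.status = 'suspended'   THEN 1 ELSE 0 END) AS suspended
FROM grid g
LEFT JOIN events aa
       ON aa.ts = g.last_status_ts AND aa.account = g.account
GROUP BY g.event_dt
ORDER BY g.event_dt
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Account 1 has no event on 2018-05-04, yet it is still counted as active_acct on that date because the running MAX carries its last timestamp forward.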

Window Functions

I want to add a window function.
It should take the minimum Date where Visit = 'Y' and return the AssocId that belongs to that date.
TableA
ID Date AssocId Visit
1 1/1/17 10101 Y
1 1/2/17 10102 Y
End Results.
ID Date AssocId
1 1/1/17 10101
SQL > This gives me the min date but I need the AssocId associated to that date.
SELECT MIN(CASE WHEN A.VISIT = 'Y'
THEN A.DATE END) OVER (PARTITION BY ID)
AS MIN_DT,
You can use FIRST_VALUE():
SELECT MIN(CASE WHEN A.VISIT = 'Y' THEN A.DATE END) OVER (PARTITION BY ID) AS MIN_DT,
FIRST_VALUE(CASE WHEN A.VISIT = 'Y' THEN A.ASSOCID END) OVER (PARTITION BY ID ORDER BY A.VISIT DESC, A.DATE ASC) AS MIN_ASSOCID,
Note that this is a little tricky with conditional operations. I would be more inclined to use a subquery to nest the query operations. The outer expression would be:
SELECT MAX(CASE WHEN Date = MIN_DT THEN ASSOCID END) OVER (PARTITION BY ID)
If you wanted one row per ID, I would suggest:
select id, min(date) as min_dt,
min(associd) keep (dense_rank first order by date) as associd
from t
where visit = 'Y'
group by id;
That is, use aggregation functions (KEEP is Oracle syntax).
You seem to want:
select t.*
from table t
where t.visit = 'Y' and
t.date = (select min(t1.date) from table t1 where t1.id = t.id and t1.visit = 'Y');
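If your database has neither KEEP nor a convenient FIRST_VALUE, a ROW_NUMBER() filter is another way to pick the AssocId on the earliest visit date. Below is a small sketch of that variant, run on SQLite via Python's sqlite3 module. The names table_a, visit_date and assoc_id, the ISO dates, and the extra Visit = 'N' row are my own assumptions, added to show the filter at work.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_a (id INTEGER, visit_date TEXT, assoc_id INTEGER, visit TEXT)")
conn.executemany("INSERT INTO table_a VALUES (?, ?, ?, ?)", [
    (1, "2016-12-31", 10100, "N"),  # hypothetical earlier row that must be ignored
    (1, "2017-01-01", 10101, "Y"),
    (1, "2017-01-02", 10102, "Y"),
])

query = """
SELECT id, visit_date, assoc_id
FROM (SELECT a.*,
             ROW_NUMBER() OVER (PARTITION BY id ORDER BY visit_date) AS rn
      FROM table_a a
      WHERE visit = 'Y') ranked
WHERE rn = 1
ORDER BY id
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Filtering on visit = 'Y' inside the subquery matters here: it keeps the earlier 'N' row from taking rn = 1.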

Aggregation disables window function capability

I am trying to re-write the query where I am joining the query on itself:
select count(distinct case when cancelled_client_id is null and year(RUM.first_date) = year(date) and RUM.first_date <= RUL.date then user_id
when cancelled_client_id is null and year(coalesce(RUM.first_date,RUR.first_date)) = year(date)
and coalesce(RUM.first_date,RUR.first_date) <= RUL.date then user_id end) as
from RUL
left join
(
select enrolled_client_id, min(date) as first_date
from RUL
where enrolled_client_id is not null
group by enrolled_client_id
) RUR on RUR.enrolled_client_id=RUL.enrolled_client_id
left join
(
select managed_client_id, min(date) as first_date
from RUL
where managed_client_id is not null
group by managed_client_id
) RUM on RUM.managed_client_id=RUL.managed_client_id
Using window functions:
count(distinct case when cancelled_client_id is null
and year(min(case when enrolled_client_id is not null then date end) over(partition by enrolled_client_id)) = year(date)
and min(case when enrolled_client_id is not null then date end) over(partition by enrolled_client_id) <= date
then user_id
when cancelled_client_id_rev is null
and year(coalesce(
min(case when enrolled_client_id is not null then date end) over(partition by enrolled_client_id),
min(case when managed_client_id is not null then date end) over(partition by managed_client_id))) = year(date)
and coalesce(
min(case when enrolled_client_id is not null then date end) over(partition by enrolled_client_id),
min(case when managed_client_id is not null then date end) over(partition by managed_client_id)) <= date
then user_id end)
from RUL
However I am getting an error that "Windowed functions cannot be used in the context of another windowed function or aggregate" due to the count(distinct min). Any work-arounds?
I have no idea what the count(distinct) is supposed to be doing, but you can simplify the code to:
select count(distinct case when cancelled_client_id is null and
year(rum_first_date) = year(date) and
rum_first_date <= rul.date
then user_id
when cancelled_client_id is null and
year(coalesce(RUM_first_date, RUR_first_date)) = year(rul.date) and
coalesce(rum_first_date, rur_first_date) <= RUL.date
then user_id
end) as . . .
from (select RUL.*,
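min(case when enrolled_client_id is not null then date end) over (partition by enrolled_client_id) as rur_first_date,
min(case when managed_client_id is not null then date end) over (partition by managed_client_id) as rum_first_date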
from RUL
) RUL
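The general workaround is to compute the window function in a subquery and aggregate over its result in the outer query. Here is a stripped-down sketch of that pattern on SQLite via Python's sqlite3 module. It keeps only one client column, and the table contents and the column name dt are hypothetical, so it illustrates the shape of the query rather than the full original logic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, reduced version of RUL with a single client column.
conn.execute("CREATE TABLE rul (user_id INTEGER, enrolled_client_id INTEGER, dt TEXT)")
conn.executemany("INSERT INTO rul VALUES (?, ?, ?)", [
    (1, 10, "2019-03-01"),
    (1, 10, "2019-04-01"),
    (2, 10, "2019-05-01"),
    (4, 20, "2018-12-15"),
    (3, 20, "2019-01-10"),
])

query = """
SELECT COUNT(DISTINCT CASE WHEN strftime('%Y', first_dt) = strftime('%Y', dt)
                            AND first_dt <= dt
                           THEN user_id END) AS n_users
FROM (SELECT r.*,
             -- window function evaluated here, before any aggregation
             MIN(dt) OVER (PARTITION BY enrolled_client_id) AS first_dt
      FROM rul r) with_first
"""
n_users = conn.execute(query).fetchone()[0]
print(n_users)
```

User 3 is not counted because client 20 first appears in 2018 while that user's only row is from 2019.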

SQL query min and max group by flag

I have a table as below :
How can I craft a SQL select statement so that MIN AND MAX EVENT DATE groups results by FLAG (0,1)?
So the result would be:
Use conditional aggregation together with a window function:
SELECT card_no, descr_reader,
max(CASE WHEN flag = 0 THEN event_date END) date_in,
max(CASE WHEN flag = 1 THEN event_date END) date_out
FROM
(
SELECT *,
COUNT(flag) OVER (PARTITION BY flag ORDER BY id) Seq
FROM table t
)t
GROUP BY card_no, descr_reader, Seq
An alternative if window functions are not available:
SELECT
t1.card_no, t1.descr_reader,
t1.event_date date_in,
(select top 1 event_date from test t2
where t2.card_no = t1.card_no and
t2.reader_no = t1.reader_no and
t2.descr_reader = t1.descr_reader and
t2.event_date > t1.event_date and
t2.flag = 1
order by t2.event_date ) as date_out
FROM test t1
WHERE t1.flag = 0

SQL Query to get the Max Date of a certain Status and subtract that from the Max Date of another Status

For example, say a ticket starts in New status. I want to get the MAX date of the New status and the MAX date of the Completed status, and calculate the difference between the two.
ex.
SELECT t.ID,
MAX(update_date) WHERE t.status = 'New' start_time,
MAX(update_date) WHERE t.status = 'Completed' stop_time,
DATEDIFF(second, MAX(update_date), MAX(update_date)) elapsed_sec
FROM xxx.dbo t
GROUP BY t.ID;
Thank you so much,
P
SELECT
t.id
,DATEDIFF(second, start_time, stop_time) elapsed_sec
FROM (
SELECT DISTINCT
ID,
(SELECT MAX(update_date) from xxx.dbo WHERE status = 'New' AND ID=t2.ID) start_time,
(SELECT MAX(update_date) from xxx.dbo WHERE status = 'Completed' AND ID=t2.ID) stop_time
FROM xxx.dbo t2
) t
I would suggest doing this using conditional aggregation and not with correlated subqueries:
SELECT t.ID,
MAX(CASE WHEN t.status = 'New' THEN update_date END) as start_time,
MAX(CASE WHEN t.status = 'Completed' THEN update_date END) as stop_time,
DATEDIFF(second,
MAX(CASE WHEN t.status = 'New' THEN update_date END),
MAX(CASE WHEN t.status = 'Completed' THEN update_date END)
) as elapsed_sec
FROM xxx.dbo t
GROUP BY t.ID;
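Here is a quick way to try the conditional aggregation approach locally, sketched with Python's sqlite3 module. SQLite has no DATEDIFF, so the elapsed seconds are computed by subtracting strftime('%s', ...) values instead. The tickets table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tickets (id INTEGER, status TEXT, update_date TEXT)")
conn.executemany("INSERT INTO tickets VALUES (?, ?, ?)", [
    (1, "New",       "2020-01-01 10:00:00"),
    (1, "New",       "2020-01-01 10:05:00"),
    (1, "Completed", "2020-01-01 10:06:30"),
    (2, "New",       "2020-01-02 09:00:00"),  # never completed
])

query = """
SELECT id,
       MAX(CASE WHEN status = 'New' THEN update_date END)       AS start_time,
       MAX(CASE WHEN status = 'Completed' THEN update_date END) AS stop_time,
       CAST(strftime('%s', MAX(CASE WHEN status = 'Completed' THEN update_date END)) AS INTEGER)
     - CAST(strftime('%s', MAX(CASE WHEN status = 'New' THEN update_date END)) AS INTEGER)
                                                                AS elapsed_sec
FROM tickets
GROUP BY id
ORDER BY id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Ticket 2 has no Completed row, so its stop_time and elapsed_sec come back as NULL rather than dropping the ticket from the result.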