Lead and case expression - sql

I have this table:
ID Date
-----------------
1 1/1/2019
1 1/15/2019
Expected output:
ID DATE LEAD_DATE
-------------------------
1 1/1/2019 1/14/2019
1 1/15/2019 SYSDATE
SQL:
SELECT
*,
CASE
WHEN LEAD (a.date) OVER (PARTITION BY a.ID ORDER BY a.date) = TRUNC(a.date) THEN NULL
ELSE LEAD (a.date) OVER (PARTITION BY a.id ORDER BY a.date) - NUMTODSINTERVAL(1,'second')
END AS LEAD_DT
FROM a
Results:
ID DATE LEAD_DATE
-------------------------
1 1/1/2019 1/14/2019
1 1/15/2019
Can I add the system date when null in the case expression?

Use NVL:
SELECT
a.*,
NVL(CASE
WHEN LEAD (a.date) OVER (PARTITION BY a.ID ORDER BY a.date) = TRUNC(a.date) THEN NULL
ELSE LEAD (a.date) OVER (PARTITION BY a.id ORDER BY a.date) - NUMTODSINTERVAL(1,'second')
END, SYSDATE) AS LEAD_DT
FROM a
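Or, better yet, use a searched CASE. A simple CASE with WHEN NULL never matches, because a comparison with NULL is never true, so test for NULL with IS NULL:
SELECT
a.*,
CASE
WHEN LEAD (a.date) OVER (PARTITION BY a.ID ORDER BY a.date) IS NULL THEN SYSDATE
WHEN LEAD (a.date) OVER (PARTITION BY a.ID ORDER BY a.date) = TRUNC(a.date) THEN SYSDATE
ELSE LEAD (a.date) OVER (PARTITION BY a.id ORDER BY a.date) - NUMTODSINTERVAL(1,'second')
END AS LEAD_DT
FROM a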

Use COALESCE:
SELECT a.*,
CASE COALESCE(LEAD("Date") OVER (PARTITION BY ID ORDER BY "Date") - "Date", 0)
WHEN 0 THEN SYSDATE
ELSE LEAD("Date") OVER (PARTITION BY ID ORDER BY "Date") - INTERVAL '1' SECOND
END AS LEAD_DT
FROM a
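The NULL handling discussed in these answers can be tried out with a minimal sketch. It runs on SQLite through Python's sqlite3 module rather than Oracle, so the table layout, the column name dt, the datetime() arithmetic and the fixed FALLBACK value (standing in for SYSDATE) are all assumptions made for illustration. It shows COALESCE supplying a fallback for the last row, and a simple CASE ... WHEN NULL branch never matching.

```python
import sqlite3

# Hypothetical stand-in for table "a"; SQLite is used instead of Oracle.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (id INTEGER, dt TEXT)")
conn.executemany(
    "INSERT INTO a VALUES (?, ?)",
    [(1, "2019-01-01 00:00:00"), (1, "2019-01-15 00:00:00")],
)

# Fixed value standing in for SYSDATE so the output is reproducible.
FALLBACK = "2019-02-01 00:00:00"

rows = conn.execute(
    """
    SELECT id,
           dt,
           -- next row's date minus one second, or the fallback on the last row
           COALESCE(
               datetime(LEAD(dt) OVER (PARTITION BY id ORDER BY dt), '-1 second'),
               ?
           ) AS lead_dt,
           -- a simple CASE compares with '=', so WHEN NULL is never true
           CASE LEAD(dt) OVER (PARTITION BY id ORDER BY dt)
               WHEN NULL THEN 'matched'
               ELSE 'not matched'
           END AS simple_case
    FROM a
    ORDER BY id, dt
    """,
    (FALLBACK,),
).fetchall()

for row in rows:
    print(row)
```

The last row has no following row, so LEAD returns NULL there: COALESCE replaces it with the fallback, while the simple CASE still falls through to its ELSE branch.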

Related

How to find the time and step between status change

I'm trying to query a dataset about user status changes, and I want to find out the time it takes for the status to change and the steps in between (number of rows).
Example data:
user_id Status date
1 a 2001-01-01
1 a 2001-01-08
1 b 2001-01-15
1 b 2001-01-28
1 a 2001-01-31
1 b 2001-02-01
2 a 2001-01-08
2 a 2001-01-18
2 a 2001-01-28
3 b 2001-03-08
3 b 2001-03-18
3 b 2001-03-19
3 a 2001-03-20
Desired output:
user_id  From  to  days in between  Steps in between
1        a     b   14               2
1        b     a   16               2
1        a     b   1                1
3        b     a   12               3
You might consider the approach below.
WITH partitions AS (
SELECT *, COUNTIF(flag) OVER w AS part FROM (
SELECT *, ROW_NUMBER() OVER w AS rn, status <> LAG(status) OVER w AS flag,
FROM sample_data
WINDOW w AS (PARTITION BY user_id ORDER BY date)
) WINDOW w AS (PARTITION BY user_id ORDER BY date)
)
SELECT user_id,
LAG(ANY_VALUE(status)) OVER w AS `from`,
ANY_VALUE(status) AS `to`,
EXTRACT(DAY FROM MIN(date) - LAG(MIN(date)) OVER w) AS days_in_between,
MIN(rn) - LAG(MIN(rn)) OVER w AS steps_in_between
FROM partitions
GROUP BY user_id, part
QUALIFY `from` IS NOT NULL
WINDOW w AS (PARTITION BY user_id ORDER BY MIN(date));
Query results
with main as (
select
*,
dense_rank() over(partition by user_id order by date) as rank_,
row_number() over(partition by user_id, status order by date) as rank_2,
row_number() over(partition by user_id, status order by date) - dense_rank() over(partition by user_id order by date) as diff,
row_number() over(partition by user_id order by date) as row_num,
lag(status) over(partition by user_id order by date) as prev_status,
concat(lag(status) over(partition by user_id order by date) , ' to ' , status) as status_change
from table
),
new_rank as (
select
*,
row_num - diff as row_num_diff,
min(date) over(partition by user_id, status, diff) as min_date
from main
),
prev_date as (
select
*,
lag(min_date) over(partition by user_id order by date) as prev_min_date
from new_rank
)
select
prev_status as `from`,
status as `to`,
date_diff(min_date, prev_min_date, DAY) as days_in_between
from prev_date
where status !=prev_status and prev_status is not null
Does this seem to work? I tried to solve this, but it's very hard without a fiddle. Also:
You may remove the extra steps/ranks that I have added; I left them there so you can see what they are doing.
I don't get your steps logic, so it is missing from the code.
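For readers who want to check the gaps-and-islands idea against the sample data, here is a small sketch. It runs on SQLite through Python's sqlite3 module instead of BigQuery, so the CTE names (flagged, islands, starts, paired), the output column names and the julianday() date arithmetic are assumptions of this example, not part of either answer. A row is flagged when its status differs from the previous one, the running sum of flags numbers each run of equal statuses, and consecutive runs are then compared.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sample_data (user_id INTEGER, status TEXT, date TEXT)")
conn.executemany(
    "INSERT INTO sample_data VALUES (?, ?, ?)",
    [
        (1, "a", "2001-01-01"), (1, "a", "2001-01-08"), (1, "b", "2001-01-15"),
        (1, "b", "2001-01-28"), (1, "a", "2001-01-31"), (1, "b", "2001-02-01"),
        (2, "a", "2001-01-08"), (2, "a", "2001-01-18"), (2, "a", "2001-01-28"),
        (3, "b", "2001-03-08"), (3, "b", "2001-03-18"), (3, "b", "2001-03-19"),
        (3, "a", "2001-03-20"),
    ],
)

query = """
WITH flagged AS (      -- mark rows whose status differs from the previous row
    SELECT user_id, status, date,
           ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY date) AS rn,
           CASE WHEN status <> LAG(status) OVER (PARTITION BY user_id ORDER BY date)
                THEN 1 ELSE 0 END AS changed
    FROM sample_data
),
islands AS (           -- running sum of the flags numbers each run of equal statuses
    SELECT *, SUM(changed) OVER (PARTITION BY user_id ORDER BY date) AS part
    FROM flagged
),
starts AS (            -- first date and first row number of every run
    SELECT user_id, part, MIN(status) AS status,
           MIN(date) AS start_date, MIN(rn) AS start_rn
    FROM islands
    GROUP BY user_id, part
),
paired AS (            -- compare each run with the run before it
    SELECT user_id, part,
           LAG(status) OVER (PARTITION BY user_id ORDER BY part) AS from_status,
           status AS to_status,
           CAST(ROUND(julianday(start_date)
                - julianday(LAG(start_date) OVER (PARTITION BY user_id ORDER BY part)))
                AS INTEGER) AS days_in_between,
           start_rn - LAG(start_rn) OVER (PARTITION BY user_id ORDER BY part)
               AS steps_in_between
    FROM starts
)
SELECT user_id, from_status, to_status, days_in_between, steps_in_between
FROM paired
WHERE from_status IS NOT NULL
ORDER BY user_id, part
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

User 2 never changes status, so it has a single run and produces no output row, which matches the desired output in the question.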

Select data where sum for last 7 from max-date is greater than x

I have a data set as such:
Date Value Type
2020-06-01 103 B
2020-06-01 100 A
2020-06-01 133 A
2020-06-11 150 A
2020-07-01 1000 A
2020-07-21 104 A
2020-07-25 140 A
2020-07-28 1600 A
2020-08-01 100 A
I want a result like this:
Type ISHIGH
A 1
B 0
Here's the query I tried:
select type, case when sum(value) > 10 then 1 else 0 end as total_usage
from table_a
where (select sum(value) as usage from tableA where date = max(date)-7)
group by type, date
This is clearly not right. What is a simple way to do this?
It is a simple GROUP BY, except that you need to be able to access the max date before grouping:
select type
, max(date) as last_usage_date
, sum(value) as total_usage
, case when sum(case when date >= cutoff_date then value end) >= 1000 then 'y' end as [is high!]
from t
cross apply (
select dateadd(day, -6, max(date))
from t as x
where x.type = t.type
) as ca(cutoff_date)
group by type, cutoff_date
If you want just those two columns then a simpler approach is:
select t.type, case when sum(value) >= 1000 then 'y' end as [is high!]
from t
left join (
select type, dateadd(day, -6, max(date)) as cutoff_date
from t
group by type
) as a on t.type = a.type and t.date >= a.cutoff_date
group by t.type
Find the max date by type. Then use it to find the last 7 days and sum() the value.
with
cte as
(
select [type], max([Date]) as MaxDate
from tableA
group by [type]
)
select c.[type], sum(a.Value),
case when SUM(a.Value) > 1000 then 1 else 0 end as ISHIGH
from cte c
inner join tableA a on a.[type] = c.[type]
and a.[Date] >= DATEADD(DAY, -7, c.MaxDate)
group by c.[type]
This can be done through a cumulative total as follows:
;With CTE As (
Select [type], [date],
SUM([value]) Over (Partition by [type] Order by [date] Desc) As Total,
Row_Number() Over (Partition by [type] Order by [date] Desc) As Row_Num
From Tbl)
Select Distinct CTE.[type], Case When C.[type] Is Not Null Then 1 Else 0 End As ISHIGH
From CTE Left Join CTE As C On (CTE.[type]=C.[type]
And DateDiff(dd,C.[date],CTE.[date])<=7
And C.Total>1000)
Where CTE.Row_Num=1
I think you are quite close with your initial attempt to solve this. Just a tiny edit:
select type, case when sum(value) > 1000 then 1 else 0 end as total_usage
from tableA
where date > (select max(date)-7 from tableA)
group by type
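The per-type cutoff used in the join-based answers can be checked against the sample data with a short sketch. It uses SQLite through Python's sqlite3 module rather than SQL Server, and the column names (dt, value, type), the 6-day look-back (7 days including the max date) and the threshold of 1000 are assumptions taken from the answers, since the question only says "greater than x".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_a (dt TEXT, value INTEGER, type TEXT)")
conn.executemany(
    "INSERT INTO table_a VALUES (?, ?, ?)",
    [
        ("2020-06-01", 103, "B"), ("2020-06-01", 100, "A"), ("2020-06-01", 133, "A"),
        ("2020-06-11", 150, "A"), ("2020-07-01", 1000, "A"), ("2020-07-21", 104, "A"),
        ("2020-07-25", 140, "A"), ("2020-07-28", 1600, "A"), ("2020-08-01", 100, "A"),
    ],
)

query = """
SELECT t.type,
       SUM(t.value) AS total_usage,
       CASE WHEN SUM(t.value) > 1000 THEN 1 ELSE 0 END AS ishigh
FROM table_a AS t
JOIN (
    -- per-type cutoff: the 7-day window that ends on that type's max date
    SELECT type, date(MAX(dt), '-6 days') AS cutoff_date
    FROM table_a
    GROUP BY type
) AS m
  ON m.type = t.type
 AND t.dt >= m.cutoff_date
GROUP BY t.type
ORDER BY t.type
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Computing the cutoff per type matters here: type B's latest row is from 2020-06-01, so a cutoff based on the global max date would exclude it from the result entirely.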

How to remove NULL values from two rows in a table

The output I am getting is this:
2015-10-01 NULL
NULL NULL
NULL NULL
NULL 2015-10-05
2015-10-11 NULL
NULL 2015-10-13
2015-10-15 2015-10-16
2015-10-25 NULL
NULL NULL
NULL NULL
NULL NULL
NULL NULL
NULL 2015-10-31
I want this to be:
2015-10-01 2015-10-05
2015-10-11 2015-10-13
2015-10-15 2015-10-16
2015-10-25 2015-10-31
My code:
select (case when (end_lag <> start_date) or end_lag is null then start_date end) as start_date,
(case when (start_lead <> end_date) or start_lead is null then end_date end) as end_date
from
(select lead(start_date) over(order by start_date) as start_lead, start_date, end_date, lag(end_date) over(order by end_date) as end_lag
from projects) t1;
The original table has two attributes (start_date, end_date). I have created the lead column for start_date and the lag column for end_date.
Starting from the current results table, I would go with:
select start_date, end_date
from (select row_number() over(order by null) rn, start_date
from current_t
where start_date is not null) a
join (select row_number() over(order by null) rn, end_date
from current_t
where end_date is not null) b
on b.rn = a.rn;
You don't seem to have an ordering for your rows. So, you can just unpivot and pair them up:
select min(dte), nullif(max(dte), min(dte))
from (select x.dte, row_number() over (order by dte) as seqnum
from projects p cross join lateral
(select p.start_date as dte from dual union all
select p.end_date from dual
) x
) p
group by ceil(seqnum / 2)
Ignore the rows where both values are NULL and take the lead value from your original query. It could probably be simplified, but that is hard to know without DDL and sample data.
select *
from (
select start_date,
case when end_date is null then lead(end_date) over(order by coalesce(start_date, end_date)) else end_date end end_date
from (
select *
from (
-- your original query
select (case when (end_lag <> start_date) or end_lag is null then start_date end) as start_date,
(case when (start_lead <> end_date) or start_lead is null then end_date end) as end_date
from (
select lead(start_date) over(order by start_date) as start_lead, start_date, end_date,
lag(end_date) over(order by end_date) as end_lag
from projects) t1
---
) tbl
where not (start_date is null and end_date is null )
) t
) t
where start_date is not null
order by start_date;
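The pairing idea from the first answer (number the non-NULL start dates and the non-NULL end dates separately, then join on the row number) can be tried on the posted output. This is a sketch on SQLite through Python's sqlite3 module. The table name current_t follows that answer, and ordering each side by its own date instead of ORDER BY NULL is an assumption of this example that makes the pairing deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE current_t (start_date TEXT, end_date TEXT)")
conn.executemany(
    "INSERT INTO current_t VALUES (?, ?)",
    [
        ("2015-10-01", None), (None, None), (None, None), (None, "2015-10-05"),
        ("2015-10-11", None), (None, "2015-10-13"), ("2015-10-15", "2015-10-16"),
        ("2015-10-25", None), (None, None), (None, None), (None, None),
        (None, None), (None, "2015-10-31"),
    ],
)

query = """
SELECT s.start_date, e.end_date
FROM (
    -- n-th non-NULL start date
    SELECT start_date, ROW_NUMBER() OVER (ORDER BY start_date) AS rn
    FROM current_t
    WHERE start_date IS NOT NULL
) AS s
JOIN (
    -- n-th non-NULL end date
    SELECT end_date, ROW_NUMBER() OVER (ORDER BY end_date) AS rn
    FROM current_t
    WHERE end_date IS NOT NULL
) AS e
  ON e.rn = s.rn
ORDER BY s.rn
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

This relies on there being exactly as many non-NULL start dates as end dates, which holds for the output shown in the question.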

SQL - unique users who are visiting for the first time

Given the following table visitorLog, write a SQL query to find the following by date.
Total_Visitors
VisitorGain - compare to previous day
VisitorLoss - compare to previous day
Total_New_Visitors - unique users who are visiting for the first time
visitorLog :
*----------------------*
| Date Visitor |
*----------------------*
| 01-Jan-2011 V1 |
| 01-Jan-2011 V2 |
| 01-Jan-2011 V3 |
| 02-Jan-2011 V2 |
| 03-Jan-2011 V2 |
| 03-Jan-2011 V4 |
| 03-Jan-2011 V5 |
*----------------------*
Expected output:
*---------------------------------------------------------------------*
| Date Total_Visitors VisitorGain VisitorLoss Total_New_Visitors |
*---------------------------------------------------------------------*
| 01-Jan-2011 3 3 0 3 |
| 02-Jan-2011 1 0 2 0 |
| 03-Jan-2011 3 2 0 2 |
*---------------------------------------------------------------------*
Here is my SQL and SQL fiddle.
with cte as
(
select
date,
total_visitors,
lag(total_visitors) over (order by date) as prev_visitors,
row_number() over (order by date ) as rnk
from
(
select
*,
count(visitor) over (partition by date) as total_visitors
from visitorLog
) val
group by
date,
total_visitors
),
cte2 as
(
select
date,
sum(case when rnk = 1 then 1 else 0 end) as total_new_visitors
from
(
select
date,
visitor,
row_number() over (partition BY visitor order by date) as rnk
from visitorLog
) t
group by
date
)
select
c.date,
sum(total_visitors) as total_visitors,
sum(
case
when rnk = 1 then total_visitors
when (rnk > 1 and prev_visitors < total_visitors) then (total_visitors - prev_visitors)
else
0
end
)visitorGain,
sum(
case
when rnk = 1 then 0
when prev_visitors > total_visitors then (prev_visitors - total_visitors)
else
0
end
) as visitorLoss,
sum(total_new_visitors) as total_new_visitors
from cte c
join cte2 c2
on c.date = c2.date
group by
c.date
order by
c.date
My solution is working as expected, but I am wondering if I am missing any edge cases that may break my logic. Any help would be great.
This logic does what you want:
select date, count(*) as num_visitor,
greatest(count(*) - lag(count(*)::int, 1, 0) over (order by date), 0) as visitor_gain,
greatest(lag(count(*)::int, 1, 0) over (order by date) - count(*), 0) as visitor_loss,
count(*) filter (where seqnum = 1) as num_new_visitors
from (select vl.*,
row_number() over (partition by visitor order by date) as seqnum
from visitorLog vl
) vl
group by date
order by date
I would use window functions and aggregation:
select
date,
count(*) no_visitor,
count(*) - lag(count(*), 1, 0) over(order by date) no_visitor_diff,
count(*) filter(where rn = 1) no_new_visitors
from (
select t.*, row_number() over(partition by visitor order by date) rn
from visitorLog t
) t
group by date
order by date
The subquery ranks the visits of each customer using row_number() (the first visit of each customer gets row number 1). Then, the outer query aggregates by date, and uses lag() to get the visitor count of the "previous" day.
I don't really see the point of having two distinct columns for the difference in visitors compared to the previous day, so this gives you a single column, with a value that's either positive or negative depending on whether customers were gained or lost.
If you really want two columns, then:
greatest(count(*) - lag(count(*), 1, 0) over(order by date), 0) visitor_gain,
- least(count(*) - lag(count(*), 1, 0) over(order by date), 0) visitor_loss
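To check the logic against the expected output, here is a sketch that runs the same idea on SQLite through Python's sqlite3 module. The dates are stored in ISO format so they sort correctly as text, and the column name visit_date, the CTE names and the two-argument MAX() (used in place of GREATEST) are assumptions of this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visitorLog (visit_date TEXT, visitor TEXT)")
conn.executemany(
    "INSERT INTO visitorLog VALUES (?, ?)",
    [
        ("2011-01-01", "V1"), ("2011-01-01", "V2"), ("2011-01-01", "V3"),
        ("2011-01-02", "V2"),
        ("2011-01-03", "V2"), ("2011-01-03", "V4"), ("2011-01-03", "V5"),
    ],
)

query = """
WITH first_seen AS (   -- seqnum = 1 marks a visitor's first ever visit
    SELECT visit_date, visitor,
           ROW_NUMBER() OVER (PARTITION BY visitor ORDER BY visit_date) AS seqnum
    FROM visitorLog
),
daily AS (             -- one row per day
    SELECT visit_date,
           COUNT(*) AS total_visitors,
           SUM(CASE WHEN seqnum = 1 THEN 1 ELSE 0 END) AS total_new_visitors
    FROM first_seen
    GROUP BY visit_date
),
with_prev AS (         -- previous day's count, 0 for the first day
    SELECT *, LAG(total_visitors, 1, 0) OVER (ORDER BY visit_date) AS prev_visitors
    FROM daily
)
SELECT visit_date,
       total_visitors,
       MAX(total_visitors - prev_visitors, 0) AS visitor_gain,
       MAX(prev_visitors - total_visitors, 0) AS visitor_loss,
       total_new_visitors
FROM with_prev
ORDER BY visit_date
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

One edge case worth noting: COUNT(*) counts rows, so a visitor logged twice on the same day would be counted twice in total_visitors. Also, "previous day" here means the previous date present in the table, not necessarily the previous calendar day.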

Skip specific rows using LAG in sql

I have a table that looks like this:
Using the LAG function in SQL, I would like to apply LAG only to rows where start_date = end_date, and get the previous start_date among the rows where start_date = end_date.
So that my final table will have an extra column like this:
I hope my question is clear, any help is appreciated.
You can assign a group to these values and use that:
select t.*,
(case when start_date = end_date
then lag(start_date) over (partition by (case when start_date = end_date then 1 else 0 end) order by start_date)
end) as prev_eq_start_date
from t;
Or:
select t.*,
(case when start_date = end_date
then lag(start_date) over (partition by start_date = end_date order by start_date)
end) as prev_eq_start_date
from t;
Note: if your data is big and most rows have different dates, you might run into a resource issue. In this case, an additional, unused partition by key can help:
select t.*,
(case when start_date = end_date
then lag(start_date) over (partition by (case when start_date = end_date then 1 else 2 end), (case when start_date <> end_date then start_date end) order by start_date)
end) as prev_eq_start_date
from t;
This has no impact on the result but it can avoid a resources error caused by too many rows with different values.
Below is for BigQuery Standard SQL
#standardSQL
SELECT *, NULL AS lag_result
FROM `project.dataset.table` WHERE start_date != end_date
UNION ALL
SELECT *, LAG(start_date) OVER(ORDER BY start_date)
FROM `project.dataset.table` WHERE start_date = end_date
Applied to the sample data in your question, the result is:
Row user_id start_date end_date lag_result
1 1 2019-01-01 2019-02-28 null
2 3 2019-02-27 2019-02-28 null
3 4 2019-08-04 2019-09-01 null
4 2 2019-02-01 2019-02-01 null
5 5 2019-08-07 2019-08-07 2019-02-01
6 6 2019-08-27 2019-08-27 2019-08-07
By the way, if your start_date and end_date are of STRING data type ('27/02/2019') rather than DATE type ('2019-02-27', as assumed in the query above), you should use the one below:
#standardSQL
SELECT *, NULL AS lag_result
FROM `project.dataset.table` WHERE start_date != end_date
UNION ALL
SELECT *, LAG(start_date) OVER(ORDER BY PARSE_DATE('%d/%m/%Y', start_date))
FROM `project.dataset.table` WHERE start_date = end_date
with the result:
Row user_id start_date end_date lag_result
1 1 01/01/2019 28/02/2019 null
2 3 27/02/2019 28/02/2019 null
3 4 04/08/2019 01/09/2019 null
4 2 01/02/2019 01/02/2019 null
5 5 07/08/2019 07/08/2019 01/02/2019
6 6 27/08/2019 27/08/2019 07/08/2019
Use JOIN
SELECT T.*,T1.LAG_Result
FROM TABLE T LEFT JOIN
(
SELECT User_Id,LAG(start_date) OVER(ORDER BY start_date) LAG_Result
FROM TABLE S
WHERE start_date = end_date
) T1 ON T.User_Id = T1.User_Id
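The partition-by-condition approach can be verified on the sample rows shown in the BigQuery answer. This sketch uses SQLite through Python's sqlite3 module, and the table name t and the output column prev_eq_start_date follow the first answer. Partitioning on the boolean expression start_date = end_date is accepted by SQLite, but other engines may need the CASE form instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (user_id INTEGER, start_date TEXT, end_date TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        (1, "2019-01-01", "2019-02-28"),
        (2, "2019-02-01", "2019-02-01"),
        (3, "2019-02-27", "2019-02-28"),
        (4, "2019-08-04", "2019-09-01"),
        (5, "2019-08-07", "2019-08-07"),
        (6, "2019-08-27", "2019-08-27"),
    ],
)

query = """
SELECT user_id, start_date, end_date,
       CASE WHEN start_date = end_date
            -- LAG only sees rows in the same partition, i.e. other
            -- rows where start_date = end_date
            THEN LAG(start_date) OVER (
                     PARTITION BY start_date = end_date
                     ORDER BY start_date)
       END AS prev_eq_start_date
FROM t
ORDER BY user_id
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Rows where the dates differ fall into the other partition and are set to NULL by the surrounding CASE, so they neither receive a value nor interrupt the chain of matching rows.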