Skip specific rows using LAG in SQL

I have a table that looks like this:
Using the LAG function in SQL, I would like to apply LAG only to rows where start_date = end_date and, for each such row, get the start_date of the previous row where start_date = end_date.
So my final table would have an extra column like this:
I hope my question is clear, any help is appreciated.

You can assign a group to these values and use that:
select t.*,
(case when start_date = end_date
then lag(start_date) over (partition by (case when start_date = end_date then 1 else 0 end) order by start_date)
end) as prev_eq_start_date
from t;
Or:
select t.*,
(case when start_date = end_date
then lag(start_date) over (partition by start_date = end_date order by start_date)
end) as prev_eq_start_date
from t;
Note that if your data is big and most rows have different dates, you might run into a resource issue, because all of those rows land in a single partition. In this case, an additional, otherwise unused partition by key can help:
select t.*,
(case when start_date = end_date
then lag(start_date) over (partition by (case when start_date = end_date then 1 else 2 end), (case when start_date <> end_date then start_date end) order by start_date)
end) as prev_eq_start_date
from t;
This has no impact on the result but it can avoid a resources error caused by too many rows with different values.
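As a quick check of the first query above, here is a minimal sketch that runs it against an in-memory SQLite database through Python's sqlite3 module. SQLite is only a stand-in for the asker's database (an assumption; window functions need SQLite 3.25 or later), and the six sample rows are the ones shown in the BigQuery answer's result listing, stored as ISO date strings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (user_id INTEGER, start_date TEXT, end_date TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", [
    (1, "2019-01-01", "2019-02-28"),
    (2, "2019-02-01", "2019-02-01"),
    (3, "2019-02-27", "2019-02-28"),
    (4, "2019-08-04", "2019-09-01"),
    (5, "2019-08-07", "2019-08-07"),
    (6, "2019-08-27", "2019-08-27"),
])

# Partition on the "dates are equal" flag so LAG only ever sees rows
# from the same group; the outer CASE hides the value for the other group.
rows = conn.execute("""
    SELECT user_id,
           CASE WHEN start_date = end_date
                THEN LAG(start_date) OVER (
                         PARTITION BY (start_date = end_date)
                         ORDER BY start_date)
           END AS prev_eq_start_date
    FROM t
    ORDER BY user_id
""").fetchall()

for row in rows:
    print(row)
```

Only users 5 and 6 get a value, because user 2 is the first row in the "equal dates" partition and the remaining rows are masked by the CASE expression.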

Below is for BigQuery Standard SQL
#standardSQL
SELECT *, NULL AS lag_result
FROM `project.dataset.table` WHERE start_date != end_date
UNION ALL
SELECT *, LAG(start_date) OVER(ORDER BY start_date)
FROM `project.dataset.table` WHERE start_date = end_date
Applied to the sample data in your question, the result is:
Row user_id start_date end_date lag_result
1 1 2019-01-01 2019-02-28 null
2 3 2019-02-27 2019-02-28 null
3 4 2019-08-04 2019-09-01 null
4 2 2019-02-01 2019-02-01 null
5 5 2019-08-07 2019-08-07 2019-02-01
6 6 2019-08-27 2019-08-27 2019-08-07
By the way, if your start_date and end_date are of STRING data type ('27/02/2019') rather than DATE type ('2019-02-27', as assumed in the query above), you should use the query below:
#standardSQL
SELECT *, NULL AS lag_result
FROM `project.dataset.table` WHERE start_date != end_date
UNION ALL
SELECT *, LAG(start_date) OVER(ORDER BY PARSE_DATE('%d/%m/%Y', start_date))
FROM `project.dataset.table` WHERE start_date = end_date
with result
Row user_id start_date end_date lag_result
1 1 01/01/2019 28/02/2019 null
2 3 27/02/2019 28/02/2019 null
3 4 04/08/2019 01/09/2019 null
4 2 01/02/2019 01/02/2019 null
5 5 07/08/2019 07/08/2019 01/02/2019
6 6 27/08/2019 27/08/2019 07/08/2019

Use JOIN
SELECT T.*,T1.LAG_Result
FROM TABLE T LEFT JOIN
(
SELECT User_Id,LAG(start_date) OVER(ORDER BY start_date) LAG_Result
FROM TABLE S
WHERE start_date = end_date
) T1 ON T.User_Id = T1.User_Id

Related

How to remove NULL values from two rows in a table

output I am getting is this.
2015-10-01 NULL
NULL NULL
NULL NULL
NULL 2015-10-05
2015-10-11 NULL
NULL 2015-10-13
2015-10-15 2015-10-16
2015-10-25 NULL
NULL NULL
NULL NULL
NULL NULL
NULL NULL
NULL 2015-10-31
I want this to be
2015-10-01 2015-10-05
2015-10-11 2015-10-13
2015-10-15 2015-10-16
2015-10-25 2015-10-31
My code:
select (case when (end_lag <> start_date) or end_lag is null then start_date end) as start_date,
(case when (start_lead <> end_date) or start_lead is null then end_date end) as end_date
from
(select lead(start_date) over(order by start_date) as start_lead, start_date, end_date, lag(end_date) over(order by end_date) as end_lag
from projects) t1;
The original table has two attributes (start_date, end_date). I have created the lead column for start_date and the lag column for end_date.
Starting from the current results table, I would go with:
select start_date, end_date
from (select row_number() over(order by null) rn, start_date
from current_t
where start_date is not null) a
join (select row_number() over(order by null) rn, end_date
from current_t
where end_date is not null) b
on b.rn = a.rn;
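The pairing idea can be tried out with a small sketch in SQLite through Python's sqlite3 module (a stand-in for the asker's database, which is an assumption). One deliberate difference from the query above: the row numbers are ordered by the date columns themselves instead of `order by null`, which assumes the ranges do not overlap, so the n-th start date belongs to the n-th end date. The rows are the asker's current output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE current_t (start_date TEXT, end_date TEXT)")
conn.executemany("INSERT INTO current_t VALUES (?, ?)", [
    ("2015-10-01", None), (None, None), (None, None), (None, "2015-10-05"),
    ("2015-10-11", None), (None, "2015-10-13"), ("2015-10-15", "2015-10-16"),
    ("2015-10-25", None), (None, None), (None, None), (None, None),
    (None, None), (None, "2015-10-31"),
])

# Number the non-NULL start dates and the non-NULL end dates separately,
# then join the two numbered lists position by position.
pairs = conn.execute("""
    SELECT a.start_date, b.end_date
    FROM (SELECT ROW_NUMBER() OVER (ORDER BY start_date) AS rn, start_date
          FROM current_t
          WHERE start_date IS NOT NULL) a
    JOIN (SELECT ROW_NUMBER() OVER (ORDER BY end_date) AS rn, end_date
          FROM current_t
          WHERE end_date IS NOT NULL) b
      ON b.rn = a.rn
    ORDER BY a.start_date
""").fetchall()

for pair in pairs:
    print(pair)
```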
You don't seem to have an ordering for your rows. So, you can just unpivot and pair them up:
select min(dte), nullif(max(dte), min(dte))
from (select x.dte, row_number() over (order by dte) as seqnum
from projects p cross join lateral
(select p.start_date as dte from dual union all
select p.end_date from dual
) x
) p
group by ceil(seqnum / 2)
Ignore two NULLs and take lead value from your original query. I guess it could be simplified, hard to know without DDL and sample data.
select *
from (
select start_date,
case when end_date is null then lead(end_date) over(order by coalesce(start_date, end_date)) else end_date end end_date
from (
select *
from (
-- your original query
select (case when (end_lag <> start_date) or end_lag is null then start_date end) as start_date,
(case when (start_lead <> end_date) or start_lead is null then end_date end) as end_date
from (
select lead(start_date) over(order by start_date) as start_lead, start_date, end_date,
lag(end_date) over(order by end_date) as end_lag
from projects) t1
---
) tbl
where not (start_date is null and end_date is null )
) t
) t
where start_date is not null
order by start_date;

Finding gaps between date ranges spanning records

I'm trying to write a query where I can find any gap in the date ranges for a given ID when passing in two dates.
EDIT: I need to know if a whole gap or part of a gap exists in my date range.
I have data in this format:
Example 1:
| ID | START_DATE | END_DATE |
|----|------------|------------|
| 1 | 01/01/2019 | 30/09/2019 |
| 1 | 01/03/2020 | (null) |
Example 2:
| ID | START_DATE | END_DATE |
|----|------------|------------|
| 2 | 01/01/2019 | 30/09/2019 |
| 2 | 01/10/2019 | 01/12/2019 |
| 2 | 02/12/2019 | (null) |
NB. A null end date essentially means "still active up to current day".
E.g. Example 1 has a gap of 152 days between 30/09/2019 and 01/03/2020. If I queried in the range of 05/05/2019 - 01/09/2019 there's no gap in that range. Whereas if I'm looking at the date range 05/05/2019 - 02/10/2019 there's a single day gap in that range.
For what it's worth, I don't actually care how many days gap, just whether there is one or not.
I've tried doing something like this but it doesn't work when my date falls into a gap:
SELECT SUM(START_DATE - PREV_END_DATE - 1)
FROM
(
SELECT ID, START_DATE, END_DATE, LAG(END_DATE) OVER (ORDER BY START_DATE) AS PREV_END_DATE
FROM TBL
WHERE ID = X_ID
)
WHERE START_DATE >= Y_FIRST_DATE
AND START_DATE <= Z_SECOND_DATE;
X_ID, Y_FIRST_DATE, and Z_SECOND_DATE are just any different ID or date range I might want to pass in.
How could I go about this?
Another option is to generate the individual days with the SELECT .. FROM dual CONNECT BY LEVEL <= syntax. The EXISTS clause INTERSECTs two sets: one contains all dates between the two boundary parameters, while the other contains all dates covered by a row's START_DATE and END_DATE. The outer query then compares the number of covered days with the length of the requested range:
SELECT CASE WHEN
SUM( 1 + LEAST(Z_SECOND_DATE,NVL(END_DATE,TRUNC(SYSDATE)))
- GREATEST(Y_FIRST_DATE,START_DATE) ) = Z_SECOND_DATE - Y_FIRST_DATE + 1 THEN
'NO Gap'
ELSE
'Gap Exists'
END "gap?"
FROM TBL t
WHERE ID = X_ID
AND EXISTS ( SELECT Y_FIRST_DATE + LEVEL - 1
FROM dual
CONNECT BY LEVEL <= Z_SECOND_DATE - Y_FIRST_DATE + 1
INTERSECT
SELECT t.START_DATE + LEVEL - 1
FROM dual
CONNECT BY LEVEL <= NVL(t.END_DATE,TRUNC(SYSDATE))- t.START_DATE + 1
)
START_DATE values are assumed to be non-null based on the sample data.
This is another variation of the gaps-and-islands problem that pops up a lot here. I think this fits well with Oracle's pattern matching functionality. Take this example:
WITH tbl AS
(
SELECT 1 AS ID, to_date('01/01/2019', 'DD/MM/YYYY') AS START_DATE, to_date('30/09/2019', 'DD/MM/YYYY') AS END_DATE FROM DUAL
UNION ALL
SELECT 1 AS ID, to_date('01/03/2020', 'DD/MM/YYYY') AS START_DATE, NULL AS END_DATE FROM DUAL
UNION ALL
SELECT 2 AS ID, to_date('01/01/2019', 'DD/MM/YYYY') AS START_DATE, to_date('30/09/2019', 'DD/MM/YYYY') AS END_DATE FROM DUAL
UNION ALL
SELECT 2 AS ID, to_date('01/10/2019', 'DD/MM/YYYY') AS START_DATE, to_date('01/12/2019', 'DD/MM/YYYY') AS END_DATE FROM DUAL
UNION ALL
SELECT 2 AS ID, to_date('02/12/2019', 'DD/MM/YYYY') AS START_DATE, NULL AS END_DATE FROM DUAL
)
SELECT *
FROM tbl
MATCH_RECOGNIZE(ORDER BY ID, start_date
MEASURES b.id AS ID,
a.end_date+1 AS GAP_START,
b.start_date-1 AS GAP_END
PATTERN (A B+)
DEFINE B AS start_date > PREV(end_date)+1 AND ID = PREV(ID))L;
I know it looks long, but most of it is creating the WITH clause. The pattern matching allows you to define what a gap is and pull the information accordingly. Notice that in order to have a gap, your start date must be greater than the previous end date + 1 grouped by the ID column.
To enhance this to answer your updated/edited question, just add this line of code to the end:
WHERE GREATEST(gap_start, TO_DATE('15/09/2019', 'DD/MM/YYYY' /*Y_FIRST_DATE*/)) <= LEAST(gap_end, to_date('15/10/2019', 'DD/MM/YYYY')/*Z_SECOND_DATE*/)
You can split the date range you are passing in into individual dates and then compare them with the date ranges in your table as follows:
SELECT
CASE WHEN SUM(CASE WHEN T.ID IS NULL THEN 1 END) > 0
THEN 'THERE IS GAP'
ELSE 'THERE IS NO GAP'
END AS RESULT_
FROM ( SELECT P_IN_FROM_DATE + LEVEL - 1 AS CUST_DATES
FROM DUAL
CONNECT BY LEVEL <= P_IN_TO_DATE - P_IN_FROM_DATE + 1
) CUST_TBL
LEFT JOIN TBL T
ON CUST_TBL.CUST_DATES BETWEEN T.START_DATE AND T.END_DATE
OR ( CUST_TBL.CUST_DATES >= T.START_DATE AND T.END_DATE IS NULL )
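The calendar-style check above can be sketched outside Oracle as well. The example below is a rough illustration using SQLite through Python's sqlite3 module, with these assumptions: a recursive CTE replaces CONNECT BY LEVEL, dates are stored as ISO strings, a NULL end date is treated as "today", and `has_gap` is a hypothetical helper name. It counts the days in the requested range that no row covers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER, start_date TEXT, end_date TEXT)")
conn.executemany("INSERT INTO tbl VALUES (?, ?, ?)", [
    (1, "2019-01-01", "2019-09-30"),
    (1, "2020-03-01", None),
    (2, "2019-01-01", "2019-09-30"),
    (2, "2019-10-01", "2019-12-01"),
    (2, "2019-12-02", None),
])

# days: one row per calendar day in the requested range.
# A day is "uncovered" when no row for the id contains it.
GAP_SQL = """
WITH RECURSIVE days(d) AS (
    SELECT :first_date
    UNION ALL
    SELECT date(d, '+1 day') FROM days WHERE d < :second_date
)
SELECT COUNT(*)
FROM days
WHERE NOT EXISTS (
    SELECT 1
    FROM tbl t
    WHERE t.id = :id
      AND days.d BETWEEN t.start_date AND COALESCE(t.end_date, date('now'))
)
"""

def has_gap(id_, first_date, second_date):
    """Return True if any day in [first_date, second_date] is uncovered."""
    params = {"id": id_, "first_date": first_date, "second_date": second_date}
    (uncovered,) = conn.execute(GAP_SQL, params).fetchone()
    return uncovered > 0

print(has_gap(1, "2019-05-05", "2019-09-01"))
print(has_gap(1, "2019-05-05", "2019-10-02"))
print(has_gap(2, "2019-05-05", "2019-10-02"))
```

Generating one row per day is simple to reason about but grows with the length of the range, so for very long ranges a window-function approach may be the better fit.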
I would suggest finding the maximum end date before the current record -- based on the start date.
That would be:
select t.*
from (select t.*,
max(end_date) over (order by start_date
rows between unbounded preceding and 1 preceding
) as max_prev_end_date
from tbl t
where start_date <= :input_end_date and
end_date >= :input_start_date
) t
where max_prev_end_date < start_date;

GROUP BY ignore groups containing NULL value and fetch recent record by date from each group

I have a user_details table as below.
id user_id start_date end_date
1 55 5-1-2017 NULL
2 55 3-1-2017 4-30-2017
3 66 1-1-2018 1-31-2018
4 66 2-1-2018 4-12-2018
5 77 11-1-2016 11-30-2016
6 77 12-1-2016 NULL
7 99 8-1-2016 1-31-2017
8 99 7-1-2016 7-31-2016
I have to fetch the latest record by start_date for each user but fetch only those users having end_date set for all records of that user.
The output should be as below:
id user_id start_date end_date
4 66 2-1-2018 4-12-2018
7 99 8-1-2016 1-31-2017
How can I achieve this result?
You can use DISTINCT ON and an ORDER BY clause to get the row with the latest start_date per group.
Then eliminate the results with end_date IS NULL.
SELECT id, user_id, start_date, end_date
FROM (SELECT DISTINCT ON (user_id)
id, user_id, start_date, end_date
FROM user_details
ORDER BY user_id, start_date DESC, end_date, id) AS q
WHERE end_date IS NOT NULL;
One approach is to aggregate by user_id, and then identify which users have an end_date from the latest record which is not NULL. We use a CTE to find the max start_date value for each user. In the HAVING clause we assert that when the start_date is the latest starting date, the end_date is also not NULL.
WITH cte AS (
SELECT id, user_id, start_date, end_date,
MAX(start_date) OVER (PARTITION BY user_id) max_start_date
FROM yourTable
)
SELECT
user_id,
MIN(start_date) AS start_date,
MAX(end_date) AS end_date
FROM cte
GROUP BY
user_id
HAVING
COUNT(CASE WHEN start_date = max_start_date AND
end_date IS NOT NULL THEN 1 END) > 0;
With NOT EXISTS:
select u.*
from user_details u
where not exists (
select 1 from user_details
where user_id = u.user_id and (start_date > u.start_date or end_date is null)
)
See the demo.
Results:
| id | user_id | start_date | end_date |
| --- | ------- | ---------- | --------- |
| 4 | 66 | 2018-02-01 | 2018-04-12|
| 7 | 99 | 2016-08-01 | 2017-01-31|
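To reproduce that result locally, the NOT EXISTS query can be run against the question's sample rows in SQLite through Python's sqlite3 module. This is only a sketch: SQLite is a stand-in for the asker's database, the dates are rewritten as ISO strings so that they compare correctly as text, and the inner table gets the alias `x` for readability.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_details "
    "(id INTEGER, user_id INTEGER, start_date TEXT, end_date TEXT)"
)
conn.executemany("INSERT INTO user_details VALUES (?, ?, ?, ?)", [
    (1, 55, "2017-05-01", None),
    (2, 55, "2017-03-01", "2017-04-30"),
    (3, 66, "2018-01-01", "2018-01-31"),
    (4, 66, "2018-02-01", "2018-04-12"),
    (5, 77, "2016-11-01", "2016-11-30"),
    (6, 77, "2016-12-01", None),
    (7, 99, "2016-08-01", "2017-01-31"),
    (8, 99, "2016-07-01", "2016-07-31"),
])

# Keep a row only if the same user has no later row
# and no row at all with a NULL end_date.
latest = conn.execute("""
    SELECT u.*
    FROM user_details u
    WHERE NOT EXISTS (
        SELECT 1
        FROM user_details x
        WHERE x.user_id = u.user_id
          AND (x.start_date > u.start_date OR x.end_date IS NULL)
    )
    ORDER BY u.id
""").fetchall()

for row in latest:
    print(row)
```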
DISTINCT ON with the right index might be the most efficient method. But that index is quite specific: (user_id, start_date DESC, end_date, id). The following should have similar performance but with a simpler index:
select ud.*
from user_details ud
where ud.id = (select ud2.id
from user_details ud2
where ud2.user_id = ud.user_id
order by ud2.start_date desc
limit 1
) and
ud.end_date is not null;
For this, you want an index on user_details(user_id, start_date desc, id).

Partition rows where dates are between the previous dates

I have the below table.
I want to identify overlapping intervals of start_date and end_date.
Edit: I would like to remove the row that has the fewest days between the start and end date where those rows overlap.
Example:
pgid 1 & pgid 2 have overlapping days. Remove the row that has the least amount of days between start_date and end_date.
Table A
id pgid Start_date End_date Days
1 1 8/4/2018 9/10/2018 37
1 2 9/8/2018 9/8/2018 0
1 3 10/29/2018 11/30/2018 32
1 4 12/1/2018 sysdate 123
Expected Results:
id Start_date End_date Days
1 8/4/2018 9/10/2018 37
1 10/29/2018 11/30/2018 32
1 12/1/2018 sysdate 123
I am thinking exists:
select t.*,
(case when exists (select 1
from t t2
where t2.start_date < t.start_date and
t2.end_date > t.end_date and
t2.id = t.id
)
then 2 else 1
end) as overlap_flag
from t;
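A small sketch of how that EXISTS flag behaves on the sample rows, run in SQLite through Python's sqlite3 module (a stand-in for the asker's database). Two assumptions are baked in: dates are stored as ISO strings, and the `sysdate` end date is replaced by the fixed value 2019-04-03, which is what the listed 123 days after 12/1/2018 implies.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (id INTEGER, pgid INTEGER, start_date TEXT, end_date TEXT)"
)
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", [
    (1, 1, "2018-08-04", "2018-09-10"),
    (1, 2, "2018-09-08", "2018-09-08"),
    (1, 3, "2018-10-29", "2018-11-30"),
    (1, 4, "2018-12-01", "2019-04-03"),  # assumed stand-in for sysdate
])

# Flag 2: the row lies strictly inside another row of the same id.
# Flag 1: no such enclosing row exists.
flags = conn.execute("""
    SELECT t.pgid,
           CASE WHEN EXISTS (SELECT 1
                             FROM t t2
                             WHERE t2.id = t.id
                               AND t2.start_date < t.start_date
                               AND t2.end_date > t.end_date)
                THEN 2 ELSE 1
           END AS overlap_flag
    FROM t
    ORDER BY t.pgid
""").fetchall()

for flag in flags:
    print(flag)
```

Filtering on `overlap_flag = 1` then leaves the three expected rows. Note that this condition only catches rows fully contained in another row; partially overlapping rows would need a different comparison.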
Maybe lead and lag:
SELECT
CASE
WHEN END_DATE > LEAD (START_DATE) OVER (PARTITION BY id ORDER BY START_DATE) THEN 1
WHEN START_DATE < LAG (END_DATE) OVER (PARTITION BY id ORDER BY START_DATE) THEN 1
ELSE 0
END OVERLAP_FLAG
FROM A

Oracle SQL overlap between begin date and end date in 2 or more records

Database my_table:
id seq start_date end_date
1 1 01-01-2017 02-01-2017
1 2 07-01-2017 09-01-2017
1 3 11-01-2017 11-01-2017
2 1 20-01-2017 20-01-2017
3 1 01-02-2017 02-02-2017
3 2 03-02-2017 04-02-2017
3 3 08-01-2017 09-02-2017
3 4 09-01-2017 10-02-2017
3 5 10-01-2017 12-02-2017
My requirement is to get the first date (normally seq 1 start date) and end date (normally last seq end date) and the number of dates occurred during all seq for each unique ID.
Date occurred:
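| id    | 1          | 2          | 3          |
|-------|------------|------------|------------|
|       | 01-01-2017 | 20-01-2017 | 01-02-2017 |
|       | 02-01-2017 |            | 02-02-2017 |
|       | 07-01-2017 |            | 03-02-2017 |
|       | 08-01-2017 |            | 04-02-2017 |
|       | 09-01-2017 |            | 08-02-2017 |
|       | 11-01-2017 |            | 09-02-2017 |
|       |            |            | 10-02-2017 |
|       |            |            | 11-02-2017 |
|       |            |            | 12-02-2017 |
| total | 6          | 1          | 9          |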
Here is the result I want:
id start_date end_date num_date
1 01-01-2017 11-01-2017 6
2 20-01-2017 20-01-2017 1
3 01-02-2017 12-02-2017 9
I have tried
SELECT id
, MIN(start_date)
, MAX(end_date)
, SUM(end_date - start_date + 1)
FROM my_table
GROUP BY id
and this SQL statement works fine for id 1 and 2, since there is no overlap between the begin and end dates. But for id 3, the resulting num_date is 11. Could you please suggest an SQL statement to solve this problem? Thank you.
One more question: the dates in the database are in datetime format. How do I convert them to dates? I tried the TRUNC function, but it sometimes converts the date to the previous day instead.
You need to count how many times an end_date equals the following start_date. For this you need to use the lag() or the lead() analytic function. You can use a case expression for the comparison, but alas you can't wrap the case expression within a COUNT or SUM in the same query; you need a subquery and an outer query.
Something like this; not tested, since you didn't provide CREATE TABLE and INSERT statements to recreate your sample data.
select id, min(start_date) as start_date, max(end_date) as end_date,
sum(end_date - start_date + 1 - flag) as num_days
from ( select id, start_date, end_date,
case when start_date = lag(end_date)
over (partition by id order by end_date) then 1
else 0 end as flag
from my_table
)
group by id;
SELECT id,
MIN( start_date ) AS start_date,
MAX( end_date ) AS end_date,
SUM( end_date - start_date + 1 ) AS num_days
FROM (
SELECT id,
GREATEST(
start_date,
COALESCE(
LAG( end_date ) OVER ( PARTITION BY id ORDER BY seq ) + 1,
start_date
)
) AS start_date,
end_date
FROM your_table
)
WHERE start_date <= end_date
GROUP BY id;
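The last query can be checked with a rough SQLite translation, run through Python's sqlite3 module. Several things here are assumptions rather than part of the answer: SQLite stands in for Oracle, so the two-argument `MAX` replaces `GREATEST`, `date(..., '+1 day')` replaces `+ 1`, and `julianday` differences replace date subtraction. The start dates for id 3, seq 3 to 5, are entered as February dates, which is what the asker's list of occurred dates and the expected count of 9 imply.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_table (id INTEGER, seq INTEGER, start_date TEXT, end_date TEXT)"
)
conn.executemany("INSERT INTO my_table VALUES (?, ?, ?, ?)", [
    (1, 1, "2017-01-01", "2017-01-02"),
    (1, 2, "2017-01-07", "2017-01-09"),
    (1, 3, "2017-01-11", "2017-01-11"),
    (2, 1, "2017-01-20", "2017-01-20"),
    (3, 1, "2017-02-01", "2017-02-02"),
    (3, 2, "2017-02-03", "2017-02-04"),
    (3, 3, "2017-02-08", "2017-02-09"),  # start month assumed to be February
    (3, 4, "2017-02-09", "2017-02-10"),  # start month assumed to be February
    (3, 5, "2017-02-10", "2017-02-12"),  # start month assumed to be February
])

# Push each start date to the day after the previous end date when the
# two ranges overlap, then count the remaining days per id.
result = conn.execute("""
    SELECT id,
           MIN(start_date) AS start_date,
           MAX(end_date) AS end_date,
           CAST(SUM(julianday(end_date) - julianday(adj_start) + 1) AS INTEGER)
               AS num_days
    FROM (
        SELECT id, start_date, end_date,
               MAX(start_date,
                   COALESCE(date(LAG(end_date) OVER (PARTITION BY id ORDER BY seq),
                                 '+1 day'),
                            start_date)) AS adj_start
        FROM my_table
    )
    WHERE adj_start <= end_date
    GROUP BY id
    ORDER BY id
""").fetchall()

for row in result:
    print(row)
```

Because LAG only looks at the immediately preceding row, this handles overlaps between neighbouring ranges; a range that overlaps one further back would need a running maximum of the earlier end dates instead.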