How to remove NULL values from two rows in a table - sql

The output I am getting is this:
2015-10-01 NULL
NULL NULL
NULL NULL
NULL 2015-10-05
2015-10-11 NULL
NULL 2015-10-13
2015-10-15 2015-10-16
2015-10-25 NULL
NULL NULL
NULL NULL
NULL NULL
NULL NULL
NULL 2015-10-31
I want it to be:
2015-10-01 2015-10-05
2015-10-11 2015-10-13
2015-10-15 2015-10-16
2015-10-25 2015-10-31
My code:
select (case when (end_lag <> start_date) or end_lag is null then start_date end) as start_date,
(case when (start_lead <> end_date) or start_lead is null then end_date end) as end_date
from
(select lead(start_date) over(order by start_date) as start_lead, start_date, end_date, lag(end_date) over(order by end_date) as end_lag
from projects) t1;
The original table has two attributes (start_date, end_date). I created the lead column for start_date and the lag column for end_date.

Starting from the current result table, I would go with:
select start_date, end_date
from (select row_number() over(order by null) rn, start_date
from current_t
where start_date is not null) a
join (select row_number() over(order by null) rn, end_date
from current_t
where end_date is not null) b
on b.rn = a.rn;
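The join above pairs the n-th non-NULL start_date with the n-th non-NULL end_date. Note that `order by null` leaves the numbering order up to the database, so ordering each side by its own date column is the safer choice. Here is a minimal Python sketch of the same pairing, using the rows from the question (the variable names are illustrative, not from the original):

```python
# Rows as produced by the CASE query in the question; None stands for NULL.
rows = [
    ("2015-10-01", None),
    (None, None),
    (None, None),
    (None, "2015-10-05"),
    ("2015-10-11", None),
    (None, "2015-10-13"),
    ("2015-10-15", "2015-10-16"),
    ("2015-10-25", None),
    (None, None),
    (None, None),
    (None, None),
    (None, None),
    (None, "2015-10-31"),
]

# Number the non-NULL values on each side by date (ISO strings sort
# chronologically), then pair equal positions, as the join on rn does.
starts = sorted(s for s, _ in rows if s is not None)
ends = sorted(e for _, e in rows if e is not None)
pairs = list(zip(starts, ends))

for start_date, end_date in pairs:
    print(start_date, end_date)
```

This only works when the two sides have the same number of non-NULL values, which holds for the sample data.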

You don't seem to have an ordering for your rows. So, you can just unpivot and pair them up:
select min(dte), nullif(max(dte), min(dte))
from (select x.dte, row_number() over (order by dte) as seqnum
from projects p cross join lateral
(select p.start_date as dte from dual union all
select p.end_date from dual
) x
) p
group by ceil(seqnum / 2)

Ignore the rows where both values are NULL and take the lead value on top of your original query. It could probably be simplified, but that is hard to know without DDL and sample data.
select *
from (
select start_date,
case when end_date is null then lead(end_date) over(order by coalesce(start_date, end_date)) else end_date end end_date
from (
select *
from (
-- your original query
select (case when (end_lag <> start_date) or end_lag is null then start_date end) as start_date,
(case when (start_lead <> end_date) or start_lead is null then end_date end) as end_date
from (
select lead(start_date) over(order by start_date) as start_lead, start_date, end_date,
lag(end_date) over(order by end_date) as end_lag
from projects) t1
---
) tbl
where not (start_date is null and end_date is null )
) t
) t
where start_date is not null
order by start_date;

Related

Skip specific rows using LAG in sql

I have a table that looks like this:
Using the LAG function in SQL, I would like to apply LAG only to rows where start_date = end_date and get the previous start_date among the rows where start_date = end_date.
My final table should then have an extra column like this:
I hope my question is clear, any help is appreciated.
You can assign a group to these values and use that:
select t.*,
(case when start_date = end_date
then lag(start_date) over (partition by (case when start_date = end_date then 1 else 0 end) order by start_date)
end) as prev_eq_start_date
from t;
Or:
select t.*,
(case when start_date = end_date
then lag(start_date) over (partition by start_date = end_date order by start_date)
end) as prev_eq_start_date
from t;
Note that if your data is big and most rows have different dates, then you might have a resources issue. In this case, an additional, unused partition by key can help:
select t.*,
(case when start_date = end_date
then lag(start_date) over (partition by (case when start_date = end_date then 1 else 2 end), (case when start_date <> end_date then start_date end) order by start_date)
end) as prev_eq_start_date
from t;
This has no impact on the result but it can avoid a resources error caused by too many rows with different values.
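To check the partitioned LAG on a small data set, here is a minimal sketch using Python's built-in sqlite3 module. It assumes SQLite 3.25 or later for window function support. The table name t matches the queries above, and the six rows are the ones that appear in the result listings in this thread.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (user_id integer, start_date text, end_date text)")
conn.executemany(
    "insert into t values (?, ?, ?)",
    [
        (1, "2019-01-01", "2019-02-28"),
        (2, "2019-02-01", "2019-02-01"),
        (3, "2019-02-27", "2019-02-28"),
        (4, "2019-08-04", "2019-09-01"),
        (5, "2019-08-07", "2019-08-07"),
        (6, "2019-08-27", "2019-08-27"),
    ],
)

# Rows with start_date = end_date form their own partition, so LAG only
# ever looks back at other rows of that kind.
query = """
select user_id, start_date, end_date,
       case when start_date = end_date
            then lag(start_date) over (
                     partition by (case when start_date = end_date then 1 else 0 end)
                     order by start_date)
       end as prev_eq_start_date
from t
order by user_id
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Only users 5 and 6 get a value, because user 2 is the first row with equal dates and has nothing to look back at.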
Below is for BigQuery Standard SQL
#standardSQL
SELECT *, NULL AS lag_result
FROM `project.dataset.table` WHERE start_date != end_date
UNION ALL
SELECT *, LAG(start_date) OVER(ORDER BY start_date)
FROM `project.dataset.table` WHERE start_date = end_date
Applied to the sample data in your question, the result is:
Row user_id start_date end_date lag_result
1 1 2019-01-01 2019-02-28 null
2 3 2019-02-27 2019-02-28 null
3 4 2019-08-04 2019-09-01 null
4 2 2019-02-01 2019-02-01 null
5 5 2019-08-07 2019-08-07 2019-02-01
6 6 2019-08-27 2019-08-27 2019-08-07
By the way, if your start_date and end_date are of STRING data type ('27/02/2019') rather than DATE type ('2019-02-27', as assumed in the query above), you should use the one below:
#standardSQL
SELECT *, NULL AS lag_result
FROM `project.dataset.table` WHERE start_date != end_date
UNION ALL
SELECT *, LAG(start_date) OVER(ORDER BY PARSE_DATE('%d/%m/%Y', start_date))
FROM `project.dataset.table` WHERE start_date = end_date
with result
Row user_id start_date end_date lag_result
1 1 01/01/2019 28/02/2019 null
2 3 27/02/2019 28/02/2019 null
3 4 04/08/2019 01/09/2019 null
4 2 01/02/2019 01/02/2019 null
5 5 07/08/2019 07/08/2019 01/02/2019
6 6 27/08/2019 27/08/2019 07/08/2019
Use JOIN
SELECT T.*,T1.LAG_Result
FROM TABLE T LEFT JOIN
(
SELECT User_Id,LAG(start_date) OVER(ORDER BY start_date) LAG_Result
FROM TABLE S
WHERE start_date = end_date
) T1 ON T.User_Id = T1.User_Id

Lead and case expression

I have this table:
ID Date
-----------------
1 1/1/2019
1 1/15/2019
Expected output:
ID DATE LEAD_DATE
-------------------------
1 1/1/2019 1/14/2019
1 1/15/2019 SYSDATE
SQL:
SELECT
*,
CASE
WHEN LEAD (a.date) OVER (PARTITION BY a.ID ORDER BY a.date) = TRUNC(a.date) THEN NULL
ELSE LEAD (a.date) OVER (PARTITION BY a.id ORDER BY a.date) - NUMTODSINTERVAL(1,'second')
END AS LEAD_DT
FROM a
Results:
ID DATE LEAD_DATE
-------------------------
1 1/1/2019 1/14/2019
1 1/15/2019
Can I add the system date when null in the case expression?
Use NVL:
SELECT
a.*,
NVL(CASE
WHEN LEAD (a.date) OVER (PARTITION BY a.ID ORDER BY a.date) = TRUNC(a.date) THEN NULL
ELSE LEAD (a.date) OVER (PARTITION BY a.id ORDER BY a.date) - NUMTODSINTERVAL(1,'second')
END, SYSDATE) AS LEAD_DT
FROM a
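Or, better yet, handle the NULL inside the CASE itself:
SELECT
a.*,
CASE
-- the NULL test must use IS NULL; a simple CASE ... WHEN NULL branch is never taken
WHEN LEAD (a.date) OVER (PARTITION BY a.ID ORDER BY a.date) IS NULL THEN SYSDATE
WHEN LEAD (a.date) OVER (PARTITION BY a.ID ORDER BY a.date) = TRUNC(a.date) THEN SYSDATE
ELSE LEAD (a.date) OVER (PARTITION BY a.id ORDER BY a.date) - NUMTODSINTERVAL(1,'second')
END AS LEAD_DT
FROM a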
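A note on NULL tests in CASE: a simple CASE (CASE expr WHEN value ...) compares with =, and NULL = NULL is not true, so a WHEN NULL branch is never taken. A NULL has to be caught with IS NULL in a searched CASE, or replaced with NVL/COALESCE. The minimal sketch below demonstrates this with Python's built-in sqlite3 module. It runs on SQLite rather than Oracle, but the comparison rule is the same, and the string labels are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Three ways of trying to supply a fallback for a NULL value.
simple_case, searched_case, with_coalesce = conn.execute(
    """
    select
        case null when null then 'fallback' else 'else branch' end,
        case when null is null then 'fallback' else 'else branch' end,
        coalesce(null, 'fallback')
    """
).fetchone()

print(simple_case)    # the WHEN NULL branch is skipped
print(searched_case)  # IS NULL catches the NULL
print(with_coalesce)  # COALESCE replaces the NULL
```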
Use COALESCE:
SELECT a.*,
CASE COALESCE(LEAD("Date") OVER (PARTITION BY ID ORDER BY "Date") - "Date", 0)
WHEN 0 THEN SYSDATE
ELSE LEAD("Date") OVER (PARTITION BY ID ORDER BY "Date") - INTERVAL '1' SECOND
END AS LEAD_DT
FROM a

Number of unique dates

There is table:
CREATE TABLE my_table
(gr_id NUMBER,
start_date DATE,
end_date DATE);
All dates always have a zero time portion. I need to know the fastest way to compute the number of unique dates within each gr_id.
For example, if there are rows (dd.mm.rrrr):
1 | 01.01.2000 | 07.01.2000
1 | 01.01.2000 | 07.01.2000
2 | 01.01.2000 | 03.01.2000
2 | 05.01.2000 | 07.01.2000
3 | 01.01.2000 | 04.01.2000
3 | 03.01.2000 | 05.01.2000
then the right answer will be
1 | 7
2 | 6
3 | 5
At the moment I use an additional table
CREATE TABLE mfr_date_list
(MFR_DATE DATE);
with every date between 01.01.2000 and 31.12.2020 and query like this:
SELECT COUNT(DISTINCT mfr_date_list.mfr_date) cnt,
dt.gr_id
FROM dwh_mfr.mfr_date_list,
(SELECT gr_id,
start_date AS sd,
end_date AS ed
FROM my_table
) dt
WHERE mfr_date_list.mfr_date BETWEEN dt.sd AND dt.ed
AND dt.ed IS NOT NULL
GROUP BY dt.gr_id
This query returns the correct result set, but I don't think it's the fastest way. I think there is some way to build the query without the table mfr_date_list at all.
Oracle 11.2 64-bit.
I would expect what you're doing to be the fastest way (as always test). Your query can be simplified, though this only aids understanding and not necessarily speed:
select t.gr_id, count(distinct dl.mfr_date) as cnt
from my_table t
join mfr_date_list dl
on dl.mfr_date between t.start_date and t.end_date
where t.end_date is not null
group by t.gr_id
Whatever you do you have to generate the data between the two dates somehow as you need to remove the overlap. One way would be to use CAST(MULTISET()), as Lalit Kumar explains:
select gr_id, count(distinct end_date - column_value + 1)
from my_table m
cross join table(cast(multiset(select level
from dual
connect by level <= m.end_date - m.start_date + 1
) as sys.odcinumberlist))
group by gr_id;
GR_ID COUNT(DISTINCTEND_DATE-COLUMN_VALUE+1)
---------- --------------------------------------
1 7
2 6
3 5
This is very Oracle specific but should perform substantially better than most other row-generators as you're only accessing the table once and you're generating the minimal number of rows required due to the condition linking MY_TABLE and your generated rows.
What you really need to do is combine the ranges and then count the lengths. This can be quite challenging because of duplicate dates. The following is one way to approach this.
First, enumerate the dates and determine whether the date is "in" or "out". When the cumulative sum is 0 then it is "out":
select t.gr_id, dt,
sum(inc) over (partition by t.gr_id order by dt) as cume_inc
from (select t.gr_id, t.start_date as dt, 1 as inc
from my_table t
union all
select t.gr_id, t.end_date + 1, -1 as inc
from my_table t
) t
Then, use lead() to determine how long the period is:
with inc as (
select t.gr_id, dt,
sum(inc) over (partition by t.gr_id order by dt) as cume_inc
from (select t.gr_id, t.start_date as dt, 1 as inc
from my_table t
union all
select t.gr_id, t.end_date + 1, -1 as inc
from my_table t
) t
)
select t.gr_id,
sum(nextdt - dt) as daysInUse
from (select inc.*, lead(dt) over (partition by inc.gr_id order by dt) as nextdt
from inc
) t
where cume_inc > 0 -- skip the periods that are "out"
group by t.gr_id;
This is close to what you want. The following are two challenges: (1) putting in the limits and (2) handling ties. The following should work (although there might be off-by-one and boundary issues):
with inc as (
select t.gr_id, dt, priority,
sum(inc) over (partition by t.gr_id order by dt) as cume_inc
from ((select t.gr_id, t.start_date as dt, count(*) as inc, 1 as priority
from my_table t
group by t.gr_id, t.start_date
)
union all
(select t.gr_id, t.end_date + 1, - count(*) as inc, -1
from my_table t
group by t.gr_id, t.end_date
)
) t
)
select t.gr_id,
sum(least(nextdt, date '2020-12-31') - greatest(dt, date '2000-01-01')) as daysInUse
from (select inc.*, lead(dt) over (partition by inc.gr_id order by dt, priority) as nextdt
from inc
) t
where cume_inc > 0 -- only count the periods that are "in"
group by t.gr_id;
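The range-combining idea can be checked outside the database. The minimal Python sketch below applies the same +1/-1 event technique to the sample rows from the question: each range adds +1 on its start date and -1 on the day after its end date, and days are counted only while at least one range is open. The function and variable names are illustrative, not from the original.

```python
from collections import defaultdict
from datetime import date, timedelta


def covered_days(rows):
    """Count the distinct calendar days covered per gr_id using +1/-1 events."""
    events = defaultdict(list)
    for gr_id, start, end in rows:
        events[gr_id].append((start, 1))                     # range opens
        events[gr_id].append((end + timedelta(days=1), -1))  # closes the day after end
    result = {}
    for gr_id, evs in events.items():
        evs.sort()
        open_ranges = 0
        total = 0
        prev_day = None
        for day, inc in evs:
            if open_ranges > 0:  # the stretch since prev_day was "in"
                total += (day - prev_day).days
            open_ranges += inc
            prev_day = day
        result[gr_id] = total
    return result


rows = [
    (1, date(2000, 1, 1), date(2000, 1, 7)),
    (1, date(2000, 1, 1), date(2000, 1, 7)),
    (2, date(2000, 1, 1), date(2000, 1, 3)),
    (2, date(2000, 1, 5), date(2000, 1, 7)),
    (3, date(2000, 1, 1), date(2000, 1, 4)),
    (3, date(2000, 1, 3), date(2000, 1, 5)),
]
counts = covered_days(rows)
print(counts)
```

Duplicate and overlapping ranges need no special handling here, because events on the same day contribute zero-length stretches.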

Continuous and non continuous date spans in Oracle SQL: Finding the earliest date

I need your help to solve this problem:
Here is my data
id start_date end_date
5567 2008-04-17 2008-04-30
5567 2008-05-02 2008-07-31
5567 2008-08-01 2008-08-31
5567 2008-09-01 2009-12-31
Since there is a lapse between 2008-04-30 and 2008-05-02 the requirement is to display the earliest start date after the lapse.
id start_date end_date
5567 2008-05-02 2008-08-31
Here is another set of data:
id start_date end_date
5567 2008-04-17 2008-04-30
5567 2008-05-01 2008-07-31
5567 2008-08-01 2008-08-31
5567 2008-09-01 2009-12-31
In this case all the spans are continuous, so the earliest start date should be in the output. The output should be:
id start_date end_date
5567 2008-04-17 2008-04-30
Here is the code I have used:
select
id, min(start_date), contig
from (
select
id, start_date, end_date,
case
when lag(end_date) over (partition by id order by end_date) =
start_date-1 or row_number() over (partition by id order by
end_date)=1
then 'c' else 'l' end contig
from t2 )
group by id, contig;
It works when there are no lapses between the spans, but it returns two records when there is a lapse.
For example, when the spans are continuous my query returns:
ID MIN(START_DATE CONTIG
5567 17-APR-08 c
But when the data is not continuous it's showing two records:
ID MIN(START_DATE CONTIG
5567 02-MAY-08 l
5567 17-APR-08 c
But in this case I only want the 1st record.
I know there is a PL/SQL solution to this but can I achieve it in only SQL?
The database is Oracle 11gR2.
I think this will do what you want:
select start_date
from (select t2.start_date
from t2 left join
t2 t2p
on t2.start_date = t2p.end_date + 1
where t2p.end_date is null
-- latest first: the first row kept is the start of the most recent unbroken run
order by t2.start_date desc
) t
where rownum = 1;
You can also do this with lag():
select coalesce(min(case when prev_end_date is not null then start_date end),
min(start_date))
from (select t2.*, lag(t2.end_date) over (order by t2.start_date) as prev_end_date
from t2
) t
where prev_end_date is null or prev_end_date <> start_date - 1;
Your "else" condition is a bit tricky. You have to be careful that you don't get the minimum start date all the time.
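The logic of the lag() version can be traced in a few lines of Python. This minimal sketch walks the spans in start_date order, remembers the previous end_date, and returns the first start_date that does not follow it by exactly one day. It falls back to the earliest start_date when every span is contiguous. The function name is illustrative, and like the queries above it handles a single id.

```python
from datetime import date, timedelta


def first_start_after_gap(spans):
    """Return the first start_date that follows a gap, else the earliest start_date."""
    spans = sorted(spans)
    prev_end = None
    for start, end in spans:
        # Same test as: prev_end_date is not null and prev_end_date <> start_date - 1
        if prev_end is not None and prev_end != start - timedelta(days=1):
            return start
        prev_end = end
    return spans[0][0]


with_gap = [
    (date(2008, 4, 17), date(2008, 4, 30)),
    (date(2008, 5, 2), date(2008, 7, 31)),
    (date(2008, 8, 1), date(2008, 8, 31)),
    (date(2008, 9, 1), date(2009, 12, 31)),
]
contiguous = [
    (date(2008, 4, 17), date(2008, 4, 30)),
    (date(2008, 5, 1), date(2008, 7, 31)),
    (date(2008, 8, 1), date(2008, 8, 31)),
    (date(2008, 9, 1), date(2009, 12, 31)),
]
gap_result = first_start_after_gap(with_gap)
contiguous_result = first_start_after_gap(contiguous)
print(gap_result, contiguous_result)
```

If the data could contain more than one gap, this returns the start after the earliest gap, so the rule for that case would need to be decided first.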

using min and max in group by clause

I want the output below in Oracle SQL.
I have data in table as below :
id start_date end_date assignment number
1 2.02.2014 15.02.2014 10
2 25.02.2014 30.02.2014 20
3 26.03.2014 04.05.2014 30
4 06.06.2014 31.12.4712 10
I need output using group by
assignment_number start_date end_date
10 02.02.2014 15.02.2014
10 06.06.2014 31.12.4712
20 25.02.2014 30.02.2014
30 26.03.2014 04.05.2014
I tried using min(start_date) and max(end_date); for assignment 10 I was getting this output:
assignment_number start_date end_date
10 02.02.2014 31.12.4712
But I want as :-
assignment_number start_date end_date
10 02.02.2014 15.02.2014
10 06.06.2014 31.12.4712
Please help
I think you'd have to calculate the min and max separately, then union them. Try something like this:
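SELECT
assignment_number
, start_date
, end_date
FROM
(SELECT
t.assignment_number
, t.start_date
, t.end_date
FROM sof1 t -- sof1 is the table name used in the other answers
-- keep the row holding the earliest start_date of its assignment
WHERE t.start_date = (SELECT MIN(m.start_date) FROM sof1 m WHERE m.assignment_number = t.assignment_number)
UNION
SELECT
t.assignment_number
, t.start_date
, t.end_date
FROM sof1 t
-- keep the row holding the latest end_date of its assignment
WHERE t.end_date = (SELECT MAX(m.end_date) FROM sof1 m WHERE m.assignment_number = t.assignment_number)
)
ORDER BY
1 ASC
, 2 ASC
, 3 ASC
;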
With NOT EXISTS:
select id, to_char(start_date,'dd.mm.yyyy') start_date, to_char(end_date,'dd.mm.yyyy') end_date,ASSIGNMENT_NUMBER from sof1 s
where not exists
(select 1 from sof1 s2
where s2.assignment_number=s.assignment_number
and s2.start_date<s.start_date
)
or not exists
(select 1 from sof1 s2
where s2.assignment_number=s.assignment_number
and s2.end_date>s.end_date
)
order by ASSIGNMENT_NUMBER
With analytic function:
select id, to_char(start_date,'dd.mm.yyyy') start_date, to_char(end_date,'dd.mm.yyyy') end_date,ASSIGNMENT_NUMBER from
(select s.*
, min (start_date) over (partition by ASSIGNMENT_NUMBER) sd
, max (end_date) over (partition by ASSIGNMENT_NUMBER) ed
from sof1 s
)
where start_date=sd or end_date=ed
order by ASSIGNMENT_NUMBER, start_date
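The analytic version can be tried on the sample data with Python's built-in sqlite3 module. This is a minimal sketch, assuming SQLite 3.25 or later for window function support. Two things differ from the question and are assumptions: dates are stored as ISO text so they compare correctly as strings, and 30.02.2014 is replaced by 28.02.2014 because it is not a valid date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table sof1 (id integer, start_date text, end_date text, assignment_number integer)"
)
conn.executemany(
    "insert into sof1 values (?, ?, ?, ?)",
    [
        (1, "2014-02-02", "2014-02-15", 10),
        (2, "2014-02-25", "2014-02-28", 20),  # 30.02.2014 in the question
        (3, "2014-03-26", "2014-05-04", 30),
        (4, "2014-06-06", "4712-12-31", 10),
    ],
)

# Compute the per-assignment minimum start and maximum end next to every row,
# then keep the rows that carry either of those two values.
query = """
select assignment_number, start_date, end_date
from (select s.*,
             min(start_date) over (partition by assignment_number) as sd,
             max(end_date) over (partition by assignment_number) as ed
      from sof1 s)
where start_date = sd or end_date = ed
order by assignment_number, start_date
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Assignment 10 keeps both of its rows because one holds the minimum start_date and the other the maximum end_date.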