T-SQL Combine rows in continuation

I have a table that looks like the following.
What I want is for the rows that continue one another to be grouped together, for each "ID".
The column IsContinued marks whether the next row should be combined with the current row.
My data looks like this:
+-----+--------+-------------+-----------+----------+
| ID | Period | IsContinued | StartDate | EndDate |
+-----+--------+-------------+-----------+----------+
| 123 | 1 | 1 | 20180101 | 20180404 |
+-----+--------+-------------+-----------+----------+
| 123 | 2 | 1 | 20180501 | 20180910 |
+-----+--------+-------------+-----------+----------+
| 123 | 3 | 0 | 20181001 | 20181201 |
+-----+--------+-------------+-----------+----------+
| 123 | 4 | 1 | 20190105 | 20190228 |
+-----+--------+-------------+-----------+----------+
| 123 | 5 | 0 | 20190401 | 20190430 |
+-----+--------+-------------+-----------+----------+
| 456 | 2 | 1 | 20180201 | 20180215 |
+-----+--------+-------------+-----------+----------+
| 456 | 3 | 0 | 20180301 | 20180401 |
+-----+--------+-------------+-----------+----------+
| 456 | 4 | 0 | 20180501 | 20180530 |
+-----+--------+-------------+-----------+----------+
| 456 | 5 | 0 | 20180701 | 20180705 |
+-----+--------+-------------+-----------+----------+
The end result I want is this:
+-----+-------------+-----------+-----------+----------+
| ID | PeriodStart | PeriodEnd | StartDate | EndDate |
+-----+-------------+-----------+-----------+----------+
| 123 | 1 | 3 | 20180101 | 20181201 |
+-----+-------------+-----------+-----------+----------+
| 123 | 4 | 5 | 20190105 | 20190430 |
+-----+-------------+-----------+-----------+----------+
| 456 | 2 | 3 | 20180201 | 20180401 |
+-----+-------------+-----------+-----------+----------+
| 456 | 4 | 4 | 20180501 | 20180530 |
+-----+-------------+-----------+-----------+----------+
| 456 | 5 | 5 | 20180701 | 20180705 |
+-----+-------------+-----------+-----------+----------+
DDL Statement:
CREATE TABLE #Period (ID INT, PeriodNr INT, IsContinued INT, STARTDATE DATE, ENDDATE DATE)
INSERT INTO #Period VALUES (123,1,1,'20180101', '20180404'),
(123,2,1,'20180501', '20180910'),
(123,3,0,'20181001', '20181201'),
(123,4,1,'20190105', '20190228'),
(123,5,0,'20190401', '20190430'),
(456,2,1,'20180201', '20180215'),
(456,3,0,'20180301', '20180401'),
(456,4,0,'20180501', '20180530'),
(456,5,0,'20180701', '20180705')
The code should be run on SQL Server 2016
Thanks!

Here is one approach:
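WITH removeFluff AS
(
    -- Drop rows in the middle of a chain (IsContinued = 1 and the previous row also has IsContinued = 1).
    -- LAG defaults to 2 on the first row per ID, so that row is always kept.
    SELECT *
    FROM (
        SELECT ID, PeriodNr, IsContinued, STARTDATE, ENDDATE,
               LAG(IsContinued, 1, 2) OVER (PARTITION BY ID ORDER BY PeriodNr) AS PrevIsContinued
        FROM #Period
    ) A
    WHERE (IsContinued <> PrevIsContinued) OR (IsContinued + PrevIsContinued = 0)
)
, getValues AS
(
    -- If the previous remaining row continues into this one, borrow its period number and start date.
    SELECT ID,
           CASE WHEN LAG(IsContinued) OVER (PARTITION BY ID ORDER BY PeriodNr) = 1
                THEN LAG(PeriodNr) OVER (PARTITION BY ID ORDER BY PeriodNr)
                ELSE PeriodNr END AS PeriodStart,
           PeriodNr AS PeriodEnd,
           CASE WHEN LAG(IsContinued) OVER (PARTITION BY ID ORDER BY PeriodNr) = 1
                THEN LAG(STARTDATE) OVER (PARTITION BY ID ORDER BY PeriodNr)
                ELSE STARTDATE END AS StartDate,
           ENDDATE AS EndDate,
           IsContinued
    FROM removeFluff r
)
-- Keep only the rows that end a segment; they now also carry the segment's start values.
SELECT ID, PeriodStart, PeriodEnd, StartDate, EndDate
FROM getValues
WHERE IsContinued = 0
ORDER BY ID, PeriodStart;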
Output:
ID PeriodStart PeriodEnd StartDate EndDate
123 1 3 2018-01-01 2018-12-01
123 4 5 2019-01-05 2019-04-30
456 2 3 2018-02-01 2018-04-01
456 4 4 2018-05-01 2018-05-30
456 5 5 2018-07-01 2018-07-05
Method:
The removeFluff CTE removes rows that are unimportant. These are the rows that neither start nor end a segment (the row with Period 2 for ID 123 in your sample data).
Once the fluff is removed, every remaining row falls into one of two cases:
A.) The row is complete on its own (LAG(IsContinued) ... = 0, or there is no previous row), i.e. the previous row is complete.
B.) The row needs the "start" info from the previous row (LAG(IsContinued) ... = 1).
We apply these two cases in the CASE expressions of the getValues CTE.
Last, the final SELECT keeps only the rows with IsContinued = 0. Because LAG has already copied the "start" data onto each "end" row, the end rows are the only ones we need.
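To sanity-check the removeFluff/getValues logic end to end, here is a minimal sketch that runs the same two CTEs through Python's built-in sqlite3 module instead of SQL Server. It assumes SQLite 3.25 or later (for LAG and other window functions) and stores dates as ISO text; the table name Period and the helper combine_periods are hypothetical.

```python
import sqlite3

# Sample rows from the question: (ID, PeriodNr, IsContinued, StartDate, EndDate).
ROWS = [
    (123, 1, 1, "2018-01-01", "2018-04-04"),
    (123, 2, 1, "2018-05-01", "2018-09-10"),
    (123, 3, 0, "2018-10-01", "2018-12-01"),
    (123, 4, 1, "2019-01-05", "2019-02-28"),
    (123, 5, 0, "2019-04-01", "2019-04-30"),
    (456, 2, 1, "2018-02-01", "2018-02-15"),
    (456, 3, 0, "2018-03-01", "2018-04-01"),
    (456, 4, 0, "2018-05-01", "2018-05-30"),
    (456, 5, 0, "2018-07-01", "2018-07-05"),
]

QUERY = """
WITH removeFluff AS (
    -- Keep rows that start or end a segment; LAG defaults to 2 on the first row per ID.
    SELECT *
    FROM (
        SELECT ID, PeriodNr, IsContinued, StartDate, EndDate,
               LAG(IsContinued, 1, 2) OVER (PARTITION BY ID ORDER BY PeriodNr) AS PrevIsContinued
        FROM Period
    ) a
    WHERE (IsContinued <> PrevIsContinued) OR (IsContinued + PrevIsContinued = 0)
),
getValues AS (
    -- Borrow the start values from the previous remaining row when it continues into this one.
    SELECT ID,
           CASE WHEN LAG(IsContinued) OVER (PARTITION BY ID ORDER BY PeriodNr) = 1
                THEN LAG(PeriodNr) OVER (PARTITION BY ID ORDER BY PeriodNr)
                ELSE PeriodNr END AS PeriodStart,
           PeriodNr AS PeriodEnd,
           CASE WHEN LAG(IsContinued) OVER (PARTITION BY ID ORDER BY PeriodNr) = 1
                THEN LAG(StartDate) OVER (PARTITION BY ID ORDER BY PeriodNr)
                ELSE StartDate END AS StartDate,
           EndDate,
           IsContinued
    FROM removeFluff
)
SELECT ID, PeriodStart, PeriodEnd, StartDate, EndDate
FROM getValues
WHERE IsContinued = 0
ORDER BY ID, PeriodStart
"""


def combine_periods(rows):
    """Load rows into an in-memory table and return the combined segments."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE Period (ID INTEGER, PeriodNr INTEGER, IsContinued INTEGER,"
        " StartDate TEXT, EndDate TEXT)"
    )
    con.executemany("INSERT INTO Period VALUES (?, ?, ?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for segment in combine_periods(ROWS):
        print(segment)
```

One caveat of this approach: a chain whose last row still has IsContinued = 1 (no terminating row) is dropped by the final filter.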

Related

Number based on condition

I'm trying to generate a number based on a condition.
Whenever there is a 'Yes' in column 'Stop' within a Client's partition (ordered by Start_Date), the Dense Rank has to start over. I have tried several things, but it's still not what I want.
My table with the current number and the expected number:
+-----------+------------+------+------------+-------------+
| Client_No | Start_Date | Stop | Current_No | Expected_No |
+-----------+------------+------+------------+-------------+
| 1 | 1-1-2018 | No | 1 | 1 |
+-----------+------------+------+------------+-------------+
| 1 | 1-2-2018 | No | 2 | 2 |
+-----------+------------+------+------------+-------------+
| 1 | 1-3-2018 | No | 3 | 3 |
+-----------+------------+------+------------+-------------+
| 1 | 1-4-2018 | Yes | 1 | 1 |
+-----------+------------+------+------------+-------------+
| 1 | 1-5-2018 | No | 4 | 2 |
+-----------+------------+------+------------+-------------+
| 1 | 1-6-2018 | No | 5 | 3 |
+-----------+------------+------+------------+-------------+
| 2 | 1-2-2018 | No | 1 | 1 |
+-----------+------------+------+------------+-------------+
| 2 | 1-3-2018 | No | 2 | 2 |
+-----------+------------+------+------------+-------------+
| 2 | 1-4-2018 | Yes | 1 | 1 |
+-----------+------------+------+------------+-------------+
| 2 | 1-5-2018 | No | 3 | 2 |
+-----------+------------+------+------------+-------------+
| 2 | 1-6-2018 | Yes | 2 | 1 |
+-----------+------------+------+------------+-------------+
The query I used so far:
DENSE_RANK() OVER(PARTITION BY Client_No, Stop ORDER BY Start_Date ASC)
This does not seem to be the solution, because it keeps counting onward across the 'No' values, but I don't know how to handle this differently.
One way to solve such a gaps-and-islands puzzle is to first calculate a rank that increases at every 'Yes' stop: a running sum of the 'Yes' rows per Client_No, ordered by Start_Date.
Then calculate the row_number (or dense_rank) partitioned by Client_No and that rank.
For example:
create table test
(
Id int identity(1,1) primary key,
Client_No int,
Start_Date date,
Stop varchar(3)
)
insert into test
(Client_No, Start_Date, Stop) values
(1,'2018-01-01','No')
,(1,'2018-02-01','No')
,(1,'2018-03-01','No')
,(1,'2018-04-01','Yes')
,(1,'2018-05-01','No')
,(1,'2018-06-01','No')
,(2,'2018-02-01','No')
,(2,'2018-03-01','No')
,(2,'2018-04-01','Yes')
,(2,'2018-05-01','No')
,(2,'2018-06-01','Yes')
select *
, row_number() over (partition by Client_no, Rnk order by start_date) as rn
from
(
select *
, sum(case when Stop = 'Yes' then 1 else 0 end) over (partition by Client_No order by start_date) rnk
from test
) q
order by Client_No, start_date
GO
Id | Client_No | Start_Date | Stop | rnk | rn
-: | --------: | :------------------ | :--- | --: | :-
1 | 1 | 01/01/2018 00:00:00 | No | 0 | 1
2 | 1 | 01/02/2018 00:00:00 | No | 0 | 2
3 | 1 | 01/03/2018 00:00:00 | No | 0 | 3
4 | 1 | 01/04/2018 00:00:00 | Yes | 1 | 1
5 | 1 | 01/05/2018 00:00:00 | No | 1 | 2
6 | 1 | 01/06/2018 00:00:00 | No | 1 | 3
7 | 2 | 01/02/2018 00:00:00 | No | 0 | 1
8 | 2 | 01/03/2018 00:00:00 | No | 0 | 2
9 | 2 | 01/04/2018 00:00:00 | Yes | 1 | 1
10 | 2 | 01/05/2018 00:00:00 | No | 1 | 2
11 | 2 | 01/06/2018 00:00:00 | Yes | 2 | 1
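The same running-sum idea can be reproduced outside SQL Server. This is a minimal sketch using Python's built-in sqlite3 module (assuming SQLite 3.25 or later for window functions), with dates stored as ISO text; the helper restart_numbering is hypothetical.

```python
import sqlite3

# Sample rows from the answer above: (Client_No, Start_Date, Stop).
ROWS = [
    (1, "2018-01-01", "No"), (1, "2018-02-01", "No"), (1, "2018-03-01", "No"),
    (1, "2018-04-01", "Yes"), (1, "2018-05-01", "No"), (1, "2018-06-01", "No"),
    (2, "2018-02-01", "No"), (2, "2018-03-01", "No"), (2, "2018-04-01", "Yes"),
    (2, "2018-05-01", "No"), (2, "2018-06-01", "Yes"),
]

QUERY = """
SELECT Id, Client_No, Start_Date, Stop, rnk,
       ROW_NUMBER() OVER (PARTITION BY Client_No, rnk ORDER BY Start_Date) AS rn
FROM (
    -- Running count of 'Yes' stops: each 'Yes' opens a new group (island).
    SELECT *,
           SUM(CASE WHEN Stop = 'Yes' THEN 1 ELSE 0 END)
               OVER (PARTITION BY Client_No ORDER BY Start_Date) AS rnk
    FROM test
) q
ORDER BY Client_No, Start_Date
"""


def restart_numbering(rows):
    """Return (Id, Client_No, Start_Date, Stop, rnk, rn) for every row."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE test (Id INTEGER PRIMARY KEY, Client_No INTEGER,"
        " Start_Date TEXT, Stop TEXT)"
    )
    con.executemany(
        "INSERT INTO test (Client_No, Start_Date, Stop) VALUES (?, ?, ?)", rows
    )
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for row in restart_numbering(ROWS):
        print(row)
```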
The difference between using this:
row_number() over (partition by Client_no, Rnk order by start_date)
versus this:
dense_rank() over (partition by Client_no, Rnk order by start_date)
is that dense_rank assigns the same number to rows that share a start_date within the same Client_no & Rnk, whereas row_number always numbers them consecutively.
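A tiny illustration of that tie behaviour, sketched with Python's built-in sqlite3 module (SQLite 3.25 or later assumed). The three rows are hypothetical: the first two share a Start_Date inside the same Client_No and Rnk.

```python
import sqlite3


def compare_numbering():
    """Return (row_number, dense_rank) pairs for rows with a tied Start_Date."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (Client_No INTEGER, Rnk INTEGER, Start_Date TEXT)")
    # Hypothetical data: two rows on 2018-01-01, one on 2018-02-01.
    con.executemany(
        "INSERT INTO t VALUES (?, ?, ?)",
        [(1, 0, "2018-01-01"), (1, 0, "2018-01-01"), (1, 0, "2018-02-01")],
    )
    result = con.execute(
        """
        SELECT ROW_NUMBER() OVER (PARTITION BY Client_No, Rnk ORDER BY Start_Date) AS rn,
               DENSE_RANK() OVER (PARTITION BY Client_No, Rnk ORDER BY Start_Date) AS dr
        FROM t
        ORDER BY Start_Date, rn
        """
    ).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    print(compare_numbering())
```

The tied rows get row numbers 1 and 2 (in an arbitrary order between them) but the same dense rank of 1.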
Below is one approach that gives you the output you want.
The steps involved are:
1. Create an adjusted stop value that marks Stop as 'Yes' for the first row of every customer.
2. Create a separate table that only includes the rows where we want to start or restart counting.
3. For each row in this new table, add an end date, which is the start date of the next such row for the same customer, or a date in the future for the last row.
4. Join the original data table with the new table and run a sequence based on this new calculation.
-- Assumes the source table is named data, with columns Client_No, Start_Date and Stop.
-- 1. Creating adjusted stop value
with data_adjusted_stop as
(
select *,
case when row_number() over(partition by Client_No order by Start_Date asc) = 1 then 'Yes' else Stop end as adjusted_stop
from data
),
-- 2. Extracting the rows where we will want to (re)start the counting
data_with_cycle as
(
select Client_No,
row_number() over(partition by Client_No order by Start_Date asc) adjusted_stop_cycle,
Start_Date
from data_adjusted_stop
where adjusted_stop = 'Yes'
),
-- 3. Adding an End_Date column for each row where we will want to (re)start counting
data_with_end_date as
(
select *,
-- '9999-12-31' stands in for "no later restart", so the last cycle covers every later row
coalesce(lead(Start_Date) over (partition by Client_No order by Start_Date asc), '9999-12-31') as End_Date
from data_with_cycle
)
-- 4. Running a sequence partitioned by Client_No and the stop cycle
select data.*,
row_number() over(partition by data.Client_No, data_with_end_date.adjusted_stop_cycle order by data.Start_Date asc) as desired_output_sequence
from data
inner join data_with_end_date
on data_with_end_date.Client_No = data.Client_No
and data.Start_Date >= data_with_end_date.Start_Date
and data.Start_Date < data_with_end_date.End_Date
order by data.Client_No, data.Start_Date

How to Create a Flag Based on Date Values in Hive

I have a sample table as follows:
| name | startdate | enddate | flg |
|-------|-----------|------------|-----|
| John | 6/1/2018 | 7/1/2018 | |
| John | 10/1/2018 | 11/1/2018 | |
| John | 12/1/2018 | 12/20/2018 | |
| Ron | 3/1/2017 | 9/1/2017 | |
| Ron | 5/1/2018 | 10/1/2018 | |
| Jacob | 6/10/2018 | 6/12/2018 | |
What I want in the output: if a person has a 'startdate' within 60 days (or 2 months) of one of their 'enddate' values, set flg to 1 for that person; otherwise set flg to 0.
For example: John has a startdate of December 1st, which is within 60 days of one of his enddates (November 1st 2018). So the flg for this person is set to 1.
So the output should look like this:
| Name | startdate | enddate | flg |
|-------|-----------|------------|-----|
| John | 6/1/2018 | 7/1/2018 | 1 |
| John | 10/1/2018 | 11/1/2018 | 1 |
| John | 12/1/2018 | 12/20/2018 | 1 |
| Ron | 3/1/2017 | 9/1/2017 | 0 |
| Ron | 5/1/2018 | 10/1/2018 | 0 |
| Jacob | 6/10/2018 | 6/12/2018 | 0 |
Any idea please?
Date Functions: Use datediff and case
select Name,startdate,enddate,
case when datediff(enddate,startdate) < 60 then 1 else 0 end flag
from table
If you are comparing the previous row's enddate, use lag()
select Name,startdate,enddate,
case when datediff(startdate,prev_enddate) < 60 then 1 else 0 end flag
from
(
select Name,startdate,enddate,
lag(enddate) over(partition by Name order by startdate,enddate) as prev_enddate
from table
) t
Use lag to get the enddate of the previous row (per name). After this the flag can be set per name using max window function with a case expression that checks to see if the 60 day diff is satisfied at least once per name.
select name
,startdate
,enddate
,max(case when datediff(startdate,prev_end_dt) < 60 then 1 else 0 end) over(partition by name) as flag
from (select t.*
,lag(enddate) over(partition by name order by startdate) as prev_end_dt
from table t
) t
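The lag-plus-max pattern can be tried without a Hive cluster. This is a minimal sketch using Python's built-in sqlite3 module (SQLite 3.25 or later assumed). SQLite has no Hive-style datediff, so julianday(startdate) - julianday(prev_end_dt) is swapped in for datediff(startdate, prev_end_dt); the table name spans and the helper flag_close_spans are hypothetical, and dates are converted to ISO text.

```python
import sqlite3

# Sample rows from the question, with dates rewritten as ISO text.
ROWS = [
    ("John", "2018-06-01", "2018-07-01"),
    ("John", "2018-10-01", "2018-11-01"),
    ("John", "2018-12-01", "2018-12-20"),
    ("Ron", "2017-03-01", "2017-09-01"),
    ("Ron", "2018-05-01", "2018-10-01"),
    ("Jacob", "2018-06-10", "2018-06-12"),
]

QUERY = """
SELECT name, startdate, enddate,
       -- 1 on every row of a name if any start is under 60 days after the previous end
       MAX(CASE WHEN julianday(startdate) - julianday(prev_end_dt) < 60 THEN 1 ELSE 0 END)
           OVER (PARTITION BY name) AS flag
FROM (
    SELECT s.*,
           LAG(enddate) OVER (PARTITION BY name ORDER BY startdate) AS prev_end_dt
    FROM spans s
) t
ORDER BY name, startdate
"""


def flag_close_spans(rows):
    """Return (name, startdate, enddate, flag) rows."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE spans (name TEXT, startdate TEXT, enddate TEXT)")
    con.executemany("INSERT INTO spans VALUES (?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for row in flag_close_spans(ROWS):
        print(row)
```

The first row per name has no previous enddate, so the comparison is NULL and the CASE falls through to 0.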

SQL Select Day IN and Day OUT grouped by ID's

How can I group EIDs by dates where DATE is between 2014-01-15 and 2014-03-18?
| ID | EID  | DATE       | Status |
|----|------|------------|--------|
| 9  | 9991 | 2014-03-16 | OUT    |
| 8  | 9997 | 2014-03-18 | IN     |
| 7  | 9997 | 2014-03-16 | OUT    |
| 6  | 9999 | 2014-02-16 | IN     |
| 5  | 9999 | 2014-02-16 | OUT    |
| 4  | 9996 | 2014-03-18 | IN     |
| 3  | 9996 | 2014-03-16 | OUT    |
| 2  | 9997 | 2014-01-18 | IN     |
| 1  | 9997 | 2014-01-15 | OUT    |
The output should look like this:
| EID  | IN date    | OUT date   | DAYS OUT |
|------|------------|------------|----------|
| 9997 | 2014-03-18 | 2014-03-16 | 2        |
| 9997 | 2014-01-18 | 2014-01-15 | 3        |
| 9999 | 2014-02-16 | 2014-02-16 | 0        |
| 9996 | 2014-03-18 | 2014-03-16 | 2        |
| 9991 |            | 2014-03-16 |          |
Thank you
Here is one method that assumes the IN and OUT rows are interleaved, i.e. no two INs or two OUTs for the same EID are adjacent:
select eid,
max(case when status = 'IN' then date end) as in_date,
max(case when status = 'OUT' then date end) as out_date,
-- days from the OUT date to the matching IN date
datediff(day,
max(case when status = 'OUT' then date end),
max(case when status = 'IN' then date end)
) as days_out
from (select t.*, row_number() over (partition by eid, status order by date) as seqnum
from t
-- restrict to the requested date range
where date between '2014-01-15' and '2014-03-18'
) t
group by eid, seqnum;
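The pairing idea (the nth OUT of an EID matches its nth IN) can be checked against the sample data with a minimal sketch in Python's built-in sqlite3 module, assuming SQLite 3.25 or later. SQLite has no DATEDIFF, so julianday differences cast to INTEGER stand in for datediff(day, ...); the table name movements and the helper pair_in_out are hypothetical.

```python
import sqlite3

# Sample rows from the question: (ID, EID, DATE, Status).
ROWS = [
    (9, 9991, "2014-03-16", "OUT"),
    (8, 9997, "2014-03-18", "IN"),
    (7, 9997, "2014-03-16", "OUT"),
    (6, 9999, "2014-02-16", "IN"),
    (5, 9999, "2014-02-16", "OUT"),
    (4, 9996, "2014-03-18", "IN"),
    (3, 9996, "2014-03-16", "OUT"),
    (2, 9997, "2014-01-18", "IN"),
    (1, 9997, "2014-01-15", "OUT"),
]

QUERY = """
SELECT eid,
       MAX(CASE WHEN status = 'IN' THEN date END) AS in_date,
       MAX(CASE WHEN status = 'OUT' THEN date END) AS out_date,
       -- whole days from OUT to the matching IN; NULL when there is no IN yet
       CAST(julianday(MAX(CASE WHEN status = 'IN' THEN date END))
            - julianday(MAX(CASE WHEN status = 'OUT' THEN date END)) AS INTEGER) AS days_out
FROM (
    -- number the INs and OUTs separately per EID so they can be paired
    SELECT m.*, ROW_NUMBER() OVER (PARTITION BY eid, status ORDER BY date) AS seqnum
    FROM movements m
    WHERE date BETWEEN '2014-01-15' AND '2014-03-18'
) t
GROUP BY eid, seqnum
ORDER BY eid, seqnum
"""


def pair_in_out(rows):
    """Return (eid, in_date, out_date, days_out) per IN/OUT pair."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE movements (id INTEGER, eid INTEGER, date TEXT, status TEXT)")
    con.executemany("INSERT INTO movements VALUES (?, ?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for row in pair_in_out(ROWS):
        print(row)
```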
You have probably already done this, but have you tried writing the query like this:
SELECT [here you format as you wish] FROM [your table] WHERE date BETWEEN '2014-01-15' AND '2014-03-18' GROUP BY date
or
SELECT [here you format as you wish] FROM [your table] WHERE dateIn >= '2014-01-15' AND dateOut <= '2014-03-18' GROUP BY dateIn
Can you share your full table?

Copy value into rows below until greater value is found in SQL

I have been trying, without much luck, to copy the first sequential value in "episode" down the rows until a value greater than itself is found (see column "episode_final" below). The logic should partition the data by id, ordered by date, in SQL Server 2012. Any help would be appreciated.
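You can try using the LEAD window function to get the next episode value.
Then use CASE WHEN to flag each row whose episode is greater than the next value (or, on the last row, the previous value), and take a running SUM of those flags plus 1.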
CREATE TABLE T(
id varchar(50),
date date,
episode int
);
INSERT INTO T VALUES (123,'2018-01-01',1);
INSERT INTO T VALUES (123,'2018-01-02',1);
INSERT INTO T VALUES (123,'2018-01-10',1);
INSERT INTO T VALUES (123,'2018-01-11',1);
INSERT INTO T VALUES (123,'2018-01-12',1);
INSERT INTO T VALUES (123,'2018-01-20',2);
INSERT INTO T VALUES (123,'2018-03-20',1);
INSERT INTO T VALUES (123,'2018-05-01',1);
INSERT INTO T VALUES (123,'2018-05-10',3);
INSERT INTO T VALUES (123,'2018-05-20',1);
INSERT INTO T VALUES (345,'2018-06-20',1);
INSERT INTO T VALUES (345,'2018-07-21',1);
INSERT INTO T VALUES (345,'2018-07-22',2);
Query 1:
SELECT t1.Id,
t1.Date,
t1.episode,
(SUM(CASE WHEN episode> coalesce(nextVal,preVal) THEN 1 ELSE 0 END) over (partition by id order by [date]) + 1) episode_final
FROM (
SELECT T.*,LEAD(episode) over (partition by id order by [date]) nextVal,
LAG(episode) over (partition by id order by [date]) preVal
FROM T
)t1
Results:
| Id | Date | episode | episode_final |
|-----|------------|---------|---------------|
| 123 | 2018-01-01 | 1 | 1 |
| 123 | 2018-01-02 | 1 | 1 |
| 123 | 2018-01-10 | 1 | 1 |
| 123 | 2018-01-11 | 1 | 1 |
| 123 | 2018-01-12 | 1 | 1 |
| 123 | 2018-01-20 | 2 | 2 |
| 123 | 2018-03-20 | 1 | 2 |
| 123 | 2018-05-01 | 1 | 2 |
| 123 | 2018-05-10 | 3 | 3 |
| 123 | 2018-05-20 | 1 | 3 |
| 345 | 2018-06-20 | 1 | 1 |
| 345 | 2018-07-21 | 1 | 1 |
| 345 | 2018-07-22 | 2 | 2 |
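An alternative worth considering (not the technique used in the answer above) is a running maximum: "keep the current value until a greater one appears" is exactly MAX(episode) over all rows so far. A minimal sketch with Python's built-in sqlite3 module, assuming SQLite 3.25 or later; the helper running_max_episode is hypothetical. The same MAX(...) OVER (... ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) expression should carry over to SQL Server 2012, which supports ROWS frames on aggregate window functions.

```python
import sqlite3

# Sample rows from the answer above: (id, date, episode).
ROWS = [
    ("123", "2018-01-01", 1), ("123", "2018-01-02", 1), ("123", "2018-01-10", 1),
    ("123", "2018-01-11", 1), ("123", "2018-01-12", 1), ("123", "2018-01-20", 2),
    ("123", "2018-03-20", 1), ("123", "2018-05-01", 1), ("123", "2018-05-10", 3),
    ("123", "2018-05-20", 1), ("345", "2018-06-20", 1), ("345", "2018-07-21", 1),
    ("345", "2018-07-22", 2),
]

QUERY = """
SELECT id, date, episode,
       -- carry the largest episode seen so far forward until a larger one appears
       MAX(episode) OVER (PARTITION BY id ORDER BY date
                          ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS episode_final
FROM T
ORDER BY id, date
"""


def running_max_episode(rows):
    """Return (id, date, episode, episode_final) rows."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE T (id TEXT, date TEXT, episode INTEGER)")
    con.executemany("INSERT INTO T VALUES (?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for row in running_max_episode(ROWS):
        print(row)
```

On this sample it yields the same episode_final column as the LEAD/SUM query, without depending on how the next and previous values compare.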

Show only one record, if value same in another column SQL

I have a table with 5 columns like this:
| ID | NAME | PO_NUMBER | DATE | STATS |
|----|------|-----------|------|-------|
| 1 | Jhon | 160101-001 | 2016-01-01 | 7 |
| 2 | Jhon | 160101-002 | 2016-01-01 | 7 |
| 3 | Jhon | 160102-001 | 2016-01-02 | 7 |
| 4 | Jane | 160101-001 | 2016-01-01 | 7 |
| 5 | Jane | 160102-001 | 2016-01-02 | 7 |
| 6 | Jane | 160102-002 | 2016-01-02 | 7 |
| 7 | Jane | 160102-003 | 2016-01-02 | 7 |
I need to display all rows, but show the STATS value only once per name and DATE; the repeats should be null.
Like this:
| ID | NAME | PO_NUMBER | DATE | STATS |
|----|------|-----------|------|-------|
| 1 | Jhon | 160101-001 | 2016-01-01 | 7 |
| 2 | Jhon | 160101-002 | 2016-01-01 | null |
| 3 | Jhon | 160102-001 | 2016-01-02 | 7 |
| 4 | Jane | 160101-001 | 2016-01-01 | 7 |
| 5 | Jane | 160102-001 | 2016-01-02 | 7 |
| 6 | Jane | 160102-002 | 2016-01-02 | null |
| 7 | Jane | 160102-003 | 2016-01-02 | null |
I've had trouble getting the expected result. Thanks
From your sample data, it appears you only want to show the stats for po_number ending with 001. If so, this should be the easiest approach:
select id, name, po_number, date,
case when right(po_number, 3) = '001' then stats else null end as stats
from yourtable
If instead you want to order by the po_number, then here's one option using row_number:
select id, name, po_number, date,
case when rn = 1 then stats else null end as stats
from (
select *, row_number() over (partition by name, date order by po_number) as rn
from yourtable
) t
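Since you are using SQL Server 2012, you can use the LAG() (or LEAD()) window function to compare each DATE value with the previous row's DATE for the same NAME:
select id, name, po_number, date,
STATS = case when t.DATE = LAG(t.DATE) OVER(PARTITION BY t.NAME ORDER BY t.ID)
then NULL
else t.STATS
end
from yourtable t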
Use the code below. It partitions by name and DATE, so each row is compared with the previous row of the same person on the same day:
;with temp as (
select id, name, PO_NUMBER, DATE, STATS,
LAG (STATS, 1, 0)
OVER (PARTITION BY name, DATE ORDER BY id) AS PrevSTATS
from tableName
)
select id, name, PO_NUMBER, DATE,
case when STATS = PrevSTATS then null
else STATS end as STATS
from temp
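The row_number variant from the first answer can be checked against the sample data with a minimal sketch in Python's built-in sqlite3 module (SQLite 3.25 or later assumed). It partitions by name and date, which is what the expected output implies; the table name yourtable mirrors the answer's placeholder and the helper first_stats_per_day is hypothetical.

```python
import sqlite3

# Sample rows from the question: (ID, NAME, PO_NUMBER, DATE, STATS).
ROWS = [
    (1, "Jhon", "160101-001", "2016-01-01", 7),
    (2, "Jhon", "160101-002", "2016-01-01", 7),
    (3, "Jhon", "160102-001", "2016-01-02", 7),
    (4, "Jane", "160101-001", "2016-01-01", 7),
    (5, "Jane", "160102-001", "2016-01-02", 7),
    (6, "Jane", "160102-002", "2016-01-02", 7),
    (7, "Jane", "160102-003", "2016-01-02", 7),
]

QUERY = """
SELECT id, name, po_number, date,
       CASE WHEN rn = 1 THEN stats END AS stats  -- NULL on every repeat within the day
FROM (
    -- number each person's rows within a day, lowest po_number first
    SELECT *, ROW_NUMBER() OVER (PARTITION BY name, date ORDER BY po_number) AS rn
    FROM yourtable
) t
ORDER BY id
"""


def first_stats_per_day(rows):
    """Return (id, name, po_number, date, stats) with repeated stats nulled."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE yourtable (id INTEGER, name TEXT, po_number TEXT,"
        " date TEXT, stats INTEGER)"
    )
    con.executemany("INSERT INTO yourtable VALUES (?, ?, ?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for row in first_stats_per_day(ROWS):
        print(row)
```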