How to Create a Flag Based on Date Values in Hive

I have a sample table as follows:
| name | startdate | enddate | flg |
|-------|-----------|------------|-----|
| John | 6/1/2018 | 7/1/2018 | |
| John | 10/1/2018 | 11/1/2018 | |
| John | 12/1/2018 | 12/20/2018 | |
| Ron | 3/1/2017 | 9/1/2017 | |
| Ron | 5/1/2018 | 10/1/2018 | |
| Jacob | 6/10/2018 | 6/12/2018 | |
What I want in the output: if a person has a 'startdate' within 60 days (or 2 months) of one of that person's 'enddate' values, set flg to 1 for every row of that person; otherwise set flg to 0.
For example, John has a record with a startdate of December 1st, which is within 60 days of one of his enddate values (November 1st, 2018). So the flg for this person is set to 1.
So the output should look like this:
| Name | startdate | enddate | flg |
|-------|-----------|------------|-----|
| John | 6/1/2018 | 7/1/2018 | 1 |
| John | 10/1/2018 | 11/1/2018 | 1 |
| John | 12/1/2018 | 12/20/2018 | 1 |
| Ron | 3/1/2017 | 9/1/2017 | 0 |
| Ron | 5/1/2018 | 10/1/2018 | 0 |
| Jacob | 6/10/2018 | 6/12/2018 | 0 |
Any ideas, please?

Date Functions: Use datediff and case
select Name,startdate,enddate,
case when datediff(enddate,startdate) < 60 then 1 else 0 end flag
from table
If you are comparing the previous row's enddate, use lag()
select Name,startdate,enddate,
case when datediff(startdate,prev_enddate) < 60 then 1 else 0 end flag
from
(
select Name,startdate,enddate,
lag(enddate) over(partition by Name order by startdate,enddate) as prev_enddate
from table
) t

Use lag to get the enddate of the previous row (per name). The flag can then be set per name using the max window function over a case expression, which checks whether the 60-day difference is satisfied at least once for that name.
select name
,startdate
,enddate
,max(case when datediff(startdate,prev_end_dt) < 60 then 1 else 0 end) over(partition by name) as flag
from (select t.*
,lag(enddate) over(partition by name order by startdate) as prev_end_dt
from table t
) t
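
As a cross-check of the lag-plus-max logic above, here is a minimal Python sketch (not Hive) that reproduces the expected flags from the question's sample data. The function name `flag_by_name` and the `max_gap_days` parameter are hypothetical names introduced for illustration, and the dates are assumed to be already parsed into real date values.

```python
from collections import defaultdict
from datetime import date

# Sample rows from the question: (name, startdate, enddate).
rows = [
    ("John", date(2018, 6, 1), date(2018, 7, 1)),
    ("John", date(2018, 10, 1), date(2018, 11, 1)),
    ("John", date(2018, 12, 1), date(2018, 12, 20)),
    ("Ron", date(2017, 3, 1), date(2017, 9, 1)),
    ("Ron", date(2018, 5, 1), date(2018, 10, 1)),
    ("Jacob", date(2018, 6, 10), date(2018, 6, 12)),
]


def flag_by_name(rows, max_gap_days=60):
    """Return {name: flag}; flag is 1 if any startdate falls fewer than
    max_gap_days after the previous row's enddate for the same name."""
    by_name = defaultdict(list)
    for name, start, end in rows:
        by_name[name].append((start, end))

    flags = {}
    for name, periods in by_name.items():
        periods.sort()  # order by startdate, like the window's ORDER BY
        flag = 0
        # zip pairs each row with its predecessor, which is what lag() provides
        for (_, prev_end), (start, _) in zip(periods, periods[1:]):
            if (start - prev_end).days < max_gap_days:
                flag = 1  # max() over the partition: one hit flags the name
        flags[name] = flag
    return flags


flags = flag_by_name(rows)
for name, start, end in rows:
    print(name, start, end, flags[name])
```

A name with a single row has no previous enddate, so its flag stays 0, which mirrors the NULL from lag() falling through to the else branch.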

Related

Update another row value in SQL Server

I have a table with name, location, startdate and enddate as follows:
+------+----------+-----------+-----------+-----------+
| name | location | startdate | enddate | is_active |
+------+----------+-----------+-----------+-----------+
| A | delhi | 3/26/2019 | 3/26/2019 | 1 |
| A | delhi | 3/27/2019 | 3/27/2019 | 1 |
| A | delhi | 3/28/2019 | 3/28/2019 | 1 |
| A | delhi | 3/31/2019 | 3/31/2019 | 1 |
+------+----------+-----------+-----------+-----------+
I need to update it like this:
+------+----------+-----------+-----------+-----------+
| name | location | startdate | enddate | is_active |
+------+----------+-----------+-----------+-----------+
| A | delhi | 3/26/2019 | 3/28/2019 | 1 |
| A | delhi | 3/27/2019 | 3/27/2019 | 0 |
| A | delhi | 3/28/2019 | 3/28/2019 | 0 |
| A | delhi | 3/31/2019 | 3/31/2019 | 1 |
+------+----------+-----------+-----------+-----------+
If the startdates are consecutive, then update the end date of the first row with the end date of the last consecutive startdate, and also set is_active = 0 for the consecutive rows that follow it.
This is a gaps-and-islands problem. Here is an approach using lag() and a cumulative sum() to define the groups. The final step is conditional logic:
select
name,
location,
startdate,
case when row_number() over(partition by name, location, grp order by startdate) = 1
then max(startdate) over(partition by name, location, grp)
else enddate
end as enddate,
case when row_number() over(partition by name, location, grp order by startdate) = 1
then 1
else 0
end as is_active
from (
select
t.*,
sum(case when startdate = dateadd(day, 1, lag_enddate) then 0 else 1 end)
over(partition by name, location order by startdate) grp
from (
select
t.*,
lag(enddate) over(partition by name, location order by startdate) lag_enddate
from mytable t
) t
) t
Demo on DB Fiddle:
name | location | startdate | enddate | is_active
:--- | :------- | :--------- | :--------- | --------:
A | delhi | 2019-03-26 | 2019-03-28 | 1
A | delhi | 2019-03-27 | 2019-03-27 | 0
A | delhi | 2019-03-28 | 2019-03-28 | 0
A | delhi | 2019-03-31 | 2019-03-31 | 1
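
To make the grouping step easier to follow, here is a small Python sketch of the same gaps-and-islands idea. It is not the SQL Server query itself; `merge_consecutive` is a hypothetical helper name, and a new island is assumed to start whenever a startdate is not exactly one day after the previous row's enddate.

```python
from datetime import date, timedelta
from itertools import groupby

# Sample rows from the question: (name, location, startdate, enddate).
rows = [
    ("A", "delhi", date(2019, 3, 26), date(2019, 3, 26)),
    ("A", "delhi", date(2019, 3, 27), date(2019, 3, 27)),
    ("A", "delhi", date(2019, 3, 28), date(2019, 3, 28)),
    ("A", "delhi", date(2019, 3, 31), date(2019, 3, 31)),
]


def merge_consecutive(rows):
    """Return (name, location, startdate, enddate, is_active) tuples."""
    out = []
    ordered = sorted(rows, key=lambda r: (r[0], r[1], r[2]))
    for (name, location), part in groupby(ordered, key=lambda r: (r[0], r[1])):
        islands = []
        prev_end = None
        for _, _, start, end in part:
            # Same test as the cumulative sum(): open a new group unless
            # this startdate is the day right after the previous enddate.
            if prev_end is None or start != prev_end + timedelta(days=1):
                islands.append([])
            islands[-1].append((start, end))
            prev_end = end
        for island in islands:
            last_start = max(start for start, _ in island)
            for i, (start, end) in enumerate(island):
                if i == 0:
                    # First row of the island carries the merged end date.
                    out.append((name, location, start, last_start, 1))
                else:
                    out.append((name, location, start, end, 0))
    return out


result = merge_consecutive(rows)
for row in result:
    print(row)
```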

Create rows for iteration in SQL

I have to create a report of how long a ticket is open every first of the month, and another that shows how long it took to close a ticket. What is the best way to do this with SQL without creating an interval for each month? I am using SQL Server 2008 R2
My current data:
| Ticket | Start Date | End Date |
|--------|------------|------------|
| ABC | 5/8/2018 | 9/28/2018 |
| XYZ | 6/22/2018 | 10/15/2018 |
Expected result:
| Ticket | Start Date | End Date | Report Date | Ticket Age | Ticket Interval |
|--------|------------|------------|-------------|------------|-----------------|
| ABC | 5/8/2018 | 9/28/2018 | 6/1/2018 | 24 | |
| ABC | 5/8/2018 | 9/28/2018 | 7/1/2018 | 54 | |
| ABC | 5/8/2018 | 9/28/2018 | 8/1/2018 | 85 | |
| ABC | 5/8/2018 | 9/28/2018 | 9/1/2018 | 116 | |
| ABC | 5/8/2018 | 9/28/2018 | 10/1/2018 | | 143 |
| XYZ | 6/22/2018 | 10/15/2018 | 7/1/2018 | 9 | |
| XYZ | 6/22/2018 | 10/15/2018 | 8/1/2018 | 40 | |
| XYZ | 6/22/2018 | 10/15/2018 | 9/1/2018 | 71 | |
| XYZ | 6/22/2018 | 10/15/2018 | 10/1/2018 | 101 | |
| XYZ | 6/22/2018 | 10/15/2018 | 11/1/2018 | | 115 |
You can use recursive CTEs:
with cte as (
select ticket, sdate, edate, dateadd(month, 1, dateadd(day, 1 - day(sdate), sdate)) as reportdate
from t
union all
select ticket, sdate, edate, dateadd(month, 1, reportdate)
from cte
where reportdate <= edate
)
select cte.*, datediff(day, sdate, reportdate) as ticketage,
(case when datediff(month, edate, reportdate) = 1 then datediff(day, sdate, edate) end) as interval
from cte
order by ticket, reportdate;
I included the ticket age on the last month for the ticket. You can use a similar case expression if you really don't want it.
Here is a db<>fiddle.
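
For readers who want to trace what the recursive CTE generates, the following Python sketch walks the same steps with a plain loop. The names `next_month_start` and `monthly_report` are hypothetical; the loop emits a report date, then advances one month as long as the current report date is not past the end date, which is how the anchor and recursive members behave together.

```python
from datetime import date

# Sample tickets from the question: (ticket, start date, end date).
tickets = [
    ("ABC", date(2018, 5, 8), date(2018, 9, 28)),
    ("XYZ", date(2018, 6, 22), date(2018, 10, 15)),
]


def next_month_start(d):
    """First day of the month after d."""
    if d.month == 12:
        return date(d.year + 1, 1, 1)
    return date(d.year, d.month + 1, 1)


def monthly_report(ticket, sdate, edate):
    """Return (ticket, reportdate, ticketage, interval) rows."""
    rows = []
    report = next_month_start(sdate)  # anchor member of the CTE
    while True:
        age = (report - sdate).days
        # Equivalent of datediff(month, edate, reportdate) = 1
        months_after_end = (report.year - edate.year) * 12 + report.month - edate.month
        interval = (edate - sdate).days if months_after_end == 1 else None
        rows.append((ticket, report, age, interval))
        if report > edate:  # recursive member only fires while reportdate <= edate
            break
        report = next_month_start(report)
    return rows


report_rows = [row for t in tickets for row in monthly_report(*t)]
for row in report_rows:
    print(row)
```

As in the query, the age is still filled in on the closing month's row; only the interval column is limited to that row.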

SQL Select Day IN and Day OUT grouped by ID's

How to GROUP EIDs by dates where Date between 2014-01-15 and 2014-03-18
| ID | EID | DATE | Status |
|----|------|------------|--------|
| 9 | 9991 | 2014-03-16 | OUT |
| 8 | 9997 | 2014-03-18 | IN |
| 7 | 9997 | 2014-03-16 | OUT |
| 6 | 9999 | 2014-02-16 | IN |
| 5 | 9999 | 2014-02-16 | OUT |
| 4 | 9996 | 2014-03-18 | IN |
| 3 | 9996 | 2014-03-16 | OUT |
| 2 | 9997 | 2014-01-18 | IN |
| 1 | 9997 | 2014-01-15 | OUT |
Output should be like:
| EID | IN date | OUT date | DAYS OUT |
|------|------------|------------|----------|
| 9997 | 2014-03-18 | 2014-03-16 | 2 |
| 9997 | 2014-01-18 | 2014-01-15 | 3 |
| 9999 | 2014-02-16 | 2014-02-16 | 0 |
| 9996 | 2014-03-18 | 2014-03-16 | 2 |
| 9991 | | 2014-03-16 | |
Thank you
Here is one method that assumes that they are interleaved, so no two ins or outs are together:
select eid,
max(case when status = 'in' then date end) as in_date,
max(case when status = 'out' then date end) as out_date,
-- out date first, so days_diff is the positive number of days out
datediff(day,
max(case when status = 'out' then date end),
max(case when status = 'in' then date end)
) as days_diff
from (select t.*, row_number() over (partition by eid, status order by date) as seqnum
from t
) t
group by eid, seqnum;
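
The pairing idea (the n-th OUT of an EID goes with the n-th IN of the same EID) can be sketched in Python as follows. This is an illustration rather than a translation of the query: `pair_in_out` is a hypothetical name, the same interleaving assumption applies, and days out is taken as the IN date minus the OUT date, which is what the expected output in the question shows.

```python
from collections import defaultdict
from datetime import date

# Sample rows from the question: (id, eid, date, status).
events = [
    (9, 9991, date(2014, 3, 16), "OUT"),
    (8, 9997, date(2014, 3, 18), "IN"),
    (7, 9997, date(2014, 3, 16), "OUT"),
    (6, 9999, date(2014, 2, 16), "IN"),
    (5, 9999, date(2014, 2, 16), "OUT"),
    (4, 9996, date(2014, 3, 18), "IN"),
    (3, 9996, date(2014, 3, 16), "OUT"),
    (2, 9997, date(2014, 1, 18), "IN"),
    (1, 9997, date(2014, 1, 15), "OUT"),
]


def pair_in_out(events):
    """Return (eid, in_date, out_date, days_out) tuples, ordered by eid."""
    ins, outs = defaultdict(list), defaultdict(list)
    for _, eid, day, status in events:
        (ins if status == "IN" else outs)[eid].append(day)

    result = []
    for eid in sorted(set(ins) | set(outs)):
        in_days, out_days = sorted(ins[eid]), sorted(outs[eid])
        # Position n in each sorted list plays the role of seqnum.
        for n in range(max(len(in_days), len(out_days))):
            in_day = in_days[n] if n < len(in_days) else None
            out_day = out_days[n] if n < len(out_days) else None
            days = None
            if in_day is not None and out_day is not None:
                days = (in_day - out_day).days
            result.append((eid, in_day, out_day, days))
    return result


pairs = pair_in_out(events)
for row in pairs:
    print(row)
```

An EID with an OUT but no matching IN, such as 9991 here, keeps an empty IN date and an empty day count.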
I think you may have already done this, but have you tried a statement like:
SELECT [here you format as you wish] FROM [your table] WHERE date BETWEEN '2014-01-15' AND '2014-03-18' GROUP BY date
or
SELECT [here you format as you wish] FROM [your table] WHERE dateIn >= '2014-01-15' AND dateOut <= '2014-03-18' GROUP BY dateIn
Can you share your full table?

T-SQL Combine rows in continuation

I have a table that looks like the following.
What I want is for the rows that continue each other to be grouped together, for each "ID".
The column IsContinued marks whether the next row should be combined with the current row.
My data looks like this:
+-----+--------+-------------+-----------+----------+
| ID | Period | IsContinued | StartDate | EndDate |
+-----+--------+-------------+-----------+----------+
| 123 | 1 | 1 | 20180101 | 20180404 |
+-----+--------+-------------+-----------+----------+
| 123 | 2 | 1 | 20180501 | 20180910 |
+-----+--------+-------------+-----------+----------+
| 123 | 3 | 0 | 20181001 | 20181201 |
+-----+--------+-------------+-----------+----------+
| 123 | 4 | 1 | 20190105 | 20190228 |
+-----+--------+-------------+-----------+----------+
| 123 | 5 | 0 | 20190401 | 20190430 |
+-----+--------+-------------+-----------+----------+
| 456 | 2 | 1 | 20180201 | 20180215 |
+-----+--------+-------------+-----------+----------+
| 456 | 3 | 0 | 20180301 | 20180401 |
+-----+--------+-------------+-----------+----------+
| 456 | 4 | 0 | 20180501 | 20180530 |
+-----+--------+-------------+-----------+----------+
| 456 | 5 | 0 | 20180701 | 20180705 |
+-----+--------+-------------+-----------+----------+
The end result I want is this:
+-----+-------------+-----------+-----------+----------+
| ID | PeriodStart | PeriodEnd | StartDate | EndDate |
+-----+-------------+-----------+-----------+----------+
| 123 | 1 | 3 | 20180101 | 20181201 |
+-----+-------------+-----------+-----------+----------+
| 123 | 4 | 5 | 20190105 | 20190430 |
+-----+-------------+-----------+-----------+----------+
| 456 | 2 | 3 | 20180201 | 20180401 |
+-----+-------------+-----------+-----------+----------+
| 456 | 4 | 4 | 20180501 | 20180530 |
+-----+-------------+-----------+-----------+----------+
| 456 | 5 | 5 | 20180701 | 20180705 |
+-----+-------------+-----------+-----------+----------+
DDL Statement:
CREATE TABLE #Period (ID INT, PeriodNr INT, IsContinued INT, STARTDATE DATE, ENDDATE DATE)
INSERT INTO #Period VALUES (123,1,1,'20180101', '20180404'),
(123,2,1,'20180501', '20180910'),
(123,3,0,'20181001', '20181201'),
(123,4,1,'20190105', '20190228'),
(123,5,0,'20190401', '20190430'),
(456,2,1,'20180201', '20180215'),
(456,3,0,'20180301', '20180401'),
(456,4,0,'20180501', '20180530'),
(456,5,0,'20180701', '20180705')
The code should be run on SQL Server 2016
Thanks!
Here is one approach:
with removeFluff as
(
SELECT *
FROM (
SELECT ID, PeriodNr, IsContinued, STARTDATE, ENDDATE, LAG(IsContinued,1,2) OVER (PARTITION BY ID ORDER BY PERIODNR) Lag
FROM #Period
) A
WHERE (IsContinued <> Lag) OR (IsContinued + Lag = 0)
)
,getValues as
(
SELECT ID,
CASE WHEN LAG(IsContinued) OVER (PARTITION BY ID ORDER BY PeriodNr) = 1 THEN LAG(PeriodNr) OVER (PARTITION BY ID ORDER BY PeriodNr) ELSE PeriodNr END PeriodStart,
PeriodNr PeriodEnd,
CASE WHEN LAG(IsContinued) OVER (PARTITION BY ID ORDER BY PeriodNr) = 1 THEN LAG(STARTDATE) OVER (PARTITION BY ID ORDER BY PeriodNr) ELSE STARTDATE END StartDate,
EndDate,
IsContinued
FROM removeFluff r
)
SELECT ID, PeriodStart, PeriodEnd, StartDate, EndDate
FROM getValues
WHERE IsContinued = 0
Output:
ID PeriodStart PeriodEnd StartDate EndDate
123 1 3 2018-01-01 2018-12-01
123 4 5 2019-01-05 2019-04-30
456 2 3 2018-02-01 2018-04-01
456 4 4 2018-05-01 2018-05-30
456 5 5 2018-07-01 2018-07-05
Method:
The removeFluff cte removes rows that are unimportant. These are the records that neither start nor end a segment (line 2 in your sample data).
Now that the fluff is removed, we know that either:
A.) The line is complete on its own (LAG(IsContinued) ... = 0), i.e. the previous line is complete
B.) The line needs the "start" info from the previous line (LAG(IsContinued) ... = 1)
We apply these two cases in the CASE expressions of the getValues cte.
Last, the results are narrowed to only the important rows in the final select with IsContinued = 0. This is because we have used LAG to get the "start" data onto the "end" row, so we only want to select the end rows.
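
To check the expected result independently of the window-function approach, here is a short Python sketch that uses a different, simpler technique: a single ordered pass per ID that keeps extending the current segment until a row with IsContinued = 0 closes it. The function name `combine_periods` is hypothetical, and emitting a trailing segment that never closes is an assumption, since the sample data has no such case.

```python
from itertools import groupby

# Sample rows from the DDL: (ID, PeriodNr, IsContinued, StartDate, EndDate).
periods = [
    (123, 1, 1, "20180101", "20180404"),
    (123, 2, 1, "20180501", "20180910"),
    (123, 3, 0, "20181001", "20181201"),
    (123, 4, 1, "20190105", "20190228"),
    (123, 5, 0, "20190401", "20190430"),
    (456, 2, 1, "20180201", "20180215"),
    (456, 3, 0, "20180301", "20180401"),
    (456, 4, 0, "20180501", "20180530"),
    (456, 5, 0, "20180701", "20180705"),
]


def combine_periods(periods):
    """Return (ID, PeriodStart, PeriodEnd, StartDate, EndDate) tuples."""
    out = []
    ordered = sorted(periods, key=lambda p: (p[0], p[1]))
    for pid, rows in groupby(ordered, key=lambda p: p[0]):
        current = None  # [PeriodStart, PeriodEnd, StartDate, EndDate]
        for _, nr, is_continued, start, end in rows:
            if current is None:
                current = [nr, nr, start, end]  # open a new segment
            else:
                current[1], current[3] = nr, end  # extend the open segment
            if is_continued == 0:
                out.append((pid, *current))  # this row closes the segment
                current = None
        if current is not None:
            # Assumption: a trailing run that never closes is still reported.
            out.append((pid, *current))
    return out


combined = combine_periods(periods)
for row in combined:
    print(row)
```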

Show only one record, if value same in another column SQL

I have a table with 5 columns like this:
| ID | NAME | PO_NUMBER | DATE | STATS |
| 1 | Jhon | 160101-001 | 2016-01-01 | 7 |
| 2 | Jhon | 160101-002 | 2016-01-01 | 7 |
| 3 | Jhon | 160102-001 | 2016-01-02 | 7 |
| 4 | Jane | 160101-001 | 2016-01-01 | 7 |
| 5 | Jane | 160102-001 | 2016-01-02 | 7 |
| 6 | Jane | 160102-002 | 2016-01-02 | 7 |
| 7 | Jane | 160102-003 | 2016-01-02 | 7 |
I need to display all rows, but show the stats value only once per name and date; the duplicates for the same date should be null.
Like this
| ID | NAME | PO_NUMBER | DATE | STATS |
| 1 | Jhon | 160101-001 | 2016-01-01 | 7 |
| 2 | Jhon | 160101-002 | 2016-01-01 | null |
| 3 | Jhon | 160102-001 | 2016-01-02 | 7 |
| 4 | Jane | 160101-001 | 2016-01-01 | 7 |
| 5 | Jane | 160102-001 | 2016-01-02 | 7 |
| 6 | Jane | 160102-002 | 2016-01-02 | null |
| 7 | Jane | 160102-003 | 2016-01-02 | null |
I've had trouble getting the result I hoped for. Thanks
From your sample data, it appears you only want to show the stats for po_number ending with 001. If so, this should be the easiest approach:
select id, name, po_number, date,
case when right(po_number, 3) = '001' then stats else null end as stats
from yourtable
If instead you want to order by the po_number, then here's one option using row_number:
select id, name, po_number, date,
case when rn = 1 then stats else null end as stats
from (
select *, row_number() over (partition by name, date order by po_number) as rn
from yourtable
) t
SQL Fiddle Demo
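
The row_number idea can be checked outside the database with a short Python sketch. `blank_repeated_stats` is a hypothetical name; the first po_number within each (name, date) pair is assumed to be the row that keeps its stats value, mirroring rn = 1.

```python
# Sample rows from the question: (id, name, po_number, date, stats).
rows = [
    (1, "Jhon", "160101-001", "2016-01-01", 7),
    (2, "Jhon", "160101-002", "2016-01-01", 7),
    (3, "Jhon", "160102-001", "2016-01-02", 7),
    (4, "Jane", "160101-001", "2016-01-01", 7),
    (5, "Jane", "160102-001", "2016-01-02", 7),
    (6, "Jane", "160102-002", "2016-01-02", 7),
    (7, "Jane", "160102-003", "2016-01-02", 7),
]


def blank_repeated_stats(rows):
    """Keep stats on the first po_number per (name, date); None elsewhere."""
    seen = set()
    out = []
    # Sort by partition key, then by po_number, like the window's ORDER BY.
    for id_, name, po, day, stats in sorted(rows, key=lambda r: (r[1], r[3], r[2])):
        key = (name, day)
        if key in seen:
            stats = None  # rn > 1 within the partition
        seen.add(key)
        out.append((id_, name, po, day, stats))
    return sorted(out)  # back to id order for display


cleaned = blank_repeated_stats(rows)
for row in cleaned:
    print(row)
```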
Since you are using SQL Server 2012, you can use the LEAD() or LAG() window function to compare the DATE value:
select *,
STATS = case when t.DATE = LAG(DATE) OVER(ORDER BY ID)
then NULL
else STATS
end
from yourtable t
Use the code below:
;with temp as (
select id,name ,PO_NUMBER ,DATE, STATS,
-- partition by name and DATE only; PO_NUMBER is unique per row,
-- so including it would leave every partition with a single row
LAG (STATS, 1, 0)
OVER (PARTITION BY name ,DATE ORDER BY id) AS PrevSTATS
from tableName
)
select id,name ,PO_NUMBER ,DATE,
case when STATS = PrevSTATS then null
else STATS end as STATS
from temp