Update another row value in SQL Server

I have a table with name, location, startdate and enddate as follows:
+------+----------+-----------+-----------+-----------+
| name | location | startdate | enddate   | is_active |
+------+----------+-----------+-----------+-----------+
| A    | delhi    | 3/26/2019 | 3/26/2019 | 1         |
| A    | delhi    | 3/27/2019 | 3/27/2019 | 1         |
| A    | delhi    | 3/28/2019 | 3/28/2019 | 1         |
| A    | delhi    | 3/31/2019 | 3/31/2019 | 1         |
+------+----------+-----------+-----------+-----------+
I need to update it like this:
+------+----------+-----------+-----------+-----------+
| name | location | startdate | enddate   | is_active |
+------+----------+-----------+-----------+-----------+
| A    | delhi    | 3/26/2019 | 3/28/2019 | 1         |
| A    | delhi    | 3/27/2019 | 3/27/2019 | 0         |
| A    | delhi    | 3/28/2019 | 3/28/2019 | 0         |
| A    | delhi    | 3/31/2019 | 3/31/2019 | 1         |
+------+----------+-----------+-----------+-----------+
If the startdates are consecutive, then update the end date of the first row with the end date of the last consecutive startdate, and also set is_active = 0 for the consecutive rows that follow it.

This is a gaps-and-islands problem. Here is an approach using lag() and a cumulative sum() to define the groups. The final step is conditional logic:
select
    name,
    location,
    startdate,
    case when row_number() over(partition by name, location, grp order by startdate) = 1
        then max(startdate) over(partition by name, location, grp)
        else enddate
    end as enddate,
    case when row_number() over(partition by name, location, grp order by startdate) = 1
        then 1
        else 0
    end as is_active
from (
    select
        t.*,
        sum(case when startdate = dateadd(day, 1, lag_enddate) then 0 else 1 end)
            over(partition by name, location order by startdate) grp
    from (
        select
            t.*,
            lag(enddate) over(partition by name, location order by startdate) lag_enddate
        from mytable t
    ) t
) t
Demo on DB Fiddle:
name | location | startdate | enddate | is_active
:--- | :------- | :--------- | :--------- | --------:
A | delhi | 2019-03-26 | 2019-03-28 | 1
A | delhi | 2019-03-27 | 2019-03-27 | 0
A | delhi | 2019-03-28 | 2019-03-28 | 0
A | delhi | 2019-03-31 | 2019-03-31 | 1
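The query above only selects the new values, while the question asks for an update. Below is a minimal sketch of the same lag() plus cumulative sum() idea that also writes the result back. It is not SQL Server code: it assumes SQLite driven from Python's built-in sqlite3 module so that it can be run as is, dates stored as ISO text, julianday() standing in for dateadd(), and max(enddate) as the island's end date. The `rid`, `rn` and `island_end` names are made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table mytable (name text, location text, "
    "startdate text, enddate text, is_active integer)"
)
conn.executemany(
    "insert into mytable values (?, ?, ?, ?, 1)",
    [
        ("A", "delhi", "2019-03-26", "2019-03-26"),
        ("A", "delhi", "2019-03-27", "2019-03-27"),
        ("A", "delhi", "2019-03-28", "2019-03-28"),
        ("A", "delhi", "2019-03-31", "2019-03-31"),
    ],
)

# Innermost query: previous row's enddate. Middle query: a running count of
# "island starts" (grp). Outer queries: first row of each island keeps
# is_active = 1 and takes the island's last enddate.
island_query = """
select rid,
       case when rn = 1 then island_end else enddate end as new_enddate,
       case when rn = 1 then 1 else 0 end as new_is_active
from (
    select t.*,
           row_number() over (partition by name, location, grp order by startdate) as rn,
           max(enddate) over (partition by name, location, grp) as island_end
    from (
        select t.*,
               sum(case when julianday(startdate) = julianday(lag_enddate) + 1
                        then 0 else 1 end)
                   over (partition by name, location order by startdate) as grp
        from (
            select t.rowid as rid, t.*,
                   lag(enddate) over (partition by name, location order by startdate) as lag_enddate
            from mytable t
        ) t
    ) t
) t
"""

# Read all computed values first, then apply them row by row via rowid.
changes = [(end, active, rid) for rid, end, active in conn.execute(island_query)]
conn.executemany(
    "update mytable set enddate = ?, is_active = ? where rowid = ?", changes
)

updated = conn.execute(
    "select startdate, enddate, is_active from mytable order by startdate"
).fetchall()
print(updated)
```

In SQL Server itself the same select could probably be wrapped in a CTE and updated directly, but that variant is not shown or tested here.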

Related

How to get Max date and sum of its rows SQL

I have following table,
+----+----------+--------+---------+
| id | date     | amount | amount2 |
+----+----------+--------+---------+
| 1  | 1/1/2020 | 1000   | 500     |
| 1  | 1/3/2020 | 1558   | 100     |
| 1  | 1/3/2020 | 126    | 200     |
| 2  | 2/5/2020 | 4921   | 500     |
| 2  | 2/5/2020 | 15     | 100     |
| 2  | 1/1/2020 | 5951   | 140     |
| 2  | 1/2/2020 | 1588   | 10      |
| 2  | 1/3/2020 | 1568   | 56      |
| 2  | 1/4/2020 | 12558  | 45      |
+----+----------+--------+---------+
I need to get each id's max date and the sums of amount and amount2 for that date. According to the above data, I need the following output:
+----+----------+--------+---------+
| id | date     | amount | amount2 |
+----+----------+--------+---------+
| 1  | 1/3/2020 | 1684   | 300     |
| 2  | 2/5/2020 | 4936   | 600     |
+----+----------+--------+---------+
How can I do this.
Aggregate and use MAX OVER to get the IDs' maximum dates:
select id, [date], sum_amount, sum_amount2
from
(
select
id, [date], sum(amount) as sum_amount, sum(amount2) as sum_amount2,
max([date]) over (partition by id) as max_date_for_id
from mytable group by id, [date]
) aggregated
where [date] = max_date_for_id
order by id;
First, use dense_rank() to find the rows with the latest date per id:
dense_rank () over (partition by id order by [date] desc)
After that, simply group by and apply sum() to the amounts:
select id, [date], sum(amount), sum(amount2)
from
(
select *,
dr = dense_rank () over (partition by id order by [date] desc)
from your_table
) t
where dr = 1
group by id, [date]
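To check the dense_rank() approach against the sample data, here is a small runnable sketch. It assumes SQLite through Python's sqlite3 module rather than SQL Server, ISO-formatted date text so that ordering works, and a hypothetical column name `dt` in place of [date].

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table mytable (id integer, dt text, amount integer, amount2 integer)")
conn.executemany(
    "insert into mytable values (?, ?, ?, ?)",
    [
        (1, "2020-01-01", 1000, 500),
        (1, "2020-01-03", 1558, 100),
        (1, "2020-01-03", 126, 200),
        (2, "2020-02-05", 4921, 500),
        (2, "2020-02-05", 15, 100),
        (2, "2020-01-01", 5951, 140),
        (2, "2020-01-02", 1588, 10),
        (2, "2020-01-03", 1568, 56),
        (2, "2020-01-04", 12558, 45),
    ],
)

# dense_rank() gives every row on an id's latest date the rank 1,
# so ties on the max date are all kept and then summed.
latest_sums = conn.execute("""
    select id, dt, sum(amount) as sum_amount, sum(amount2) as sum_amount2
    from (
        select *,
               dense_rank() over (partition by id order by dt desc) as dr
        from mytable
    ) t
    where dr = 1
    group by id, dt
    order by id
""").fetchall()
print(latest_sums)
```

dense_rank() rather than row_number() matters here: row_number() would keep only one of the rows that share the latest date.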

SQL get first non value based on status

I'm not sure if the title is correct but here's my question. I have a table like this:
+----+--------+--------------+---------+------------+
| id | city | province | status | date |
+----+--------+--------------+---------+------------+
| 1 | cainta | rizal | failed | 01/01/2020 |
| 1 | null | null | success | 02/01/2020 |
| 1 | cainta | rizal | failed | 03/01/2020 |
| 2 | pasig | metro manila | failed | 04/01/2020 |
| 2 | pasig | metro manila | failed | 05/01/2020 |
| 2 | null | null | success | 06/01/2020 |
| 3 | obando | bulacan | failed | 07/01/2020 |
| 3 | null | null | failed | 08/01/2020 |
| 3 | obando | bulacan | success | 09/01/2020 |
+----+--------+--------------+---------+------------+
Now I need to get all transactions with status='success'. If I do that the output will be like this:
| id | city | province | status | date |
|------|--------|------------|----------|------------|
| 1 | nan | nan | success | 02/01/2020 |
| 2 | nan | nan | success | 06/01/2020 |
| 3 | obando | bulacan | success | 09/01/2020 |
What I need is this:
| id | city | province | status | date |
|------|--------|--------------|----------|------------|
| 1 | cainta | rizal | success | 02/01/2020 |
| 2 | pasig | metro manila | success | 06/01/2020 |
| 3 | obando | bulacan | success | 09/01/2020 |
Hopefully someone can shed some light on how to tackle this kind of situation.
Try the following using lag()
with cte as
(
    select
        *,
        lag(city) over (partition by id order by date) as ncity,
        lag(province) over (partition by id order by date) as nprovince
    from myTable
)
select
    id,
    coalesce(city, ncity) as city,
    coalesce(province, nprovince) as province,
    status,
    date
from cte
where status = 'success';
output:
| id | city | province | status | date |
| --- | ------ | ------------ | ------- | ---------- |
| 1 | cainta | rizal | success | 02/01/2020 |
| 2 | pasig | metro manila | success | 06/01/2020 |
| 3 | obando | bulacan | success | 09/01/2020 |
Perhaps a window function can help:
SELECT id, city, province, status, date
FROM (SELECT id,
max(city) OVER w AS city,
max(province) OVER w AS province,
status,
date
FROM atable
WINDOW w AS (PARTITION BY id)) AS q
WHERE status = 'success';
You can use analytic functions here:
SELECT * FROM
(SELECT T.ID, T.CITY, T.PROVINCE,
MAX(CASE WHEN STATUS = 'success' THEN DATE END)
OVER (PARTITION BY ID ORDER BY DATE) AS DATE,
ROW_NUMBER() OVER (PARTITION BY ID ORDER BY DATE) AS RN,
SUM(CASE WHEN STATUS = 'success' THEN 1 ELSE 0 END)
OVER (PARTITION BY ID) AS CNT
FROM YOUR_TABLE T)
WHERE RN = 1 AND CNT > 0
As you have changed the sample data, you can use GROUP BY as follows:
SELECT ID, MAX(CITY) AS CITY, MAX(PROVINCE) AS PROVINCE,
MAX(CASE WHEN STATUS = 'success' THEN DATE END) AS DATE
FROM YOUR_TABLE
GROUP BY ID
HAVING SUM(CASE WHEN STATUS = 'success' THEN 1 END) > 0
If you want just one row per id, you can use aggregation:
select id, max(city) as city, max(province) as province,
max(date) filter (where status = 'success') as date
from t
group by id
having count(*) filter (where status = 'success') > 0;
Note that if you can have multiple success dates per id, you can put them on the same row using array_agg():
array_agg(date) filter (where status = 'success') as dates
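Several of the answers above rely on the same idea: max() ignores nulls, so taking max(city) per id fills in the missing values. A runnable sketch of the window-function variant follows, assuming SQLite via Python's sqlite3 module, a hypothetical table name `txn`, a column `dt` in place of date, and that each id has only one distinct city and province.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table txn (id integer, city text, province text, status text, dt text)")
conn.executemany(
    "insert into txn values (?, ?, ?, ?, ?)",
    [
        (1, "cainta", "rizal", "failed", "01/01/2020"),
        (1, None, None, "success", "02/01/2020"),
        (1, "cainta", "rizal", "failed", "03/01/2020"),
        (2, "pasig", "metro manila", "failed", "04/01/2020"),
        (2, "pasig", "metro manila", "failed", "05/01/2020"),
        (2, None, None, "success", "06/01/2020"),
        (3, "obando", "bulacan", "failed", "07/01/2020"),
        (3, None, None, "failed", "08/01/2020"),
        (3, "obando", "bulacan", "success", "09/01/2020"),
    ],
)

# The window max() runs before the status filter, so the success rows
# can borrow city/province from the failed rows of the same id.
filled = conn.execute("""
    select id, city, province, status, dt
    from (
        select id,
               max(city) over (partition by id) as city,
               max(province) over (partition by id) as province,
               status, dt
        from txn
    ) q
    where status = 'success'
    order by id
""").fetchall()
print(filled)
```

If an id could have several different cities, max() would pick the alphabetically last one, which may not be what you want.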

How to Create a Flag Based on Date Values in Hive

I have a sample table as follows:
| name | startdate | enddate | flg |
|-------|-----------|------------|-----|
| John | 6/1/2018 | 7/1/2018 | |
| John | 10/1/2018 | 11/1/2018 | |
| John | 12/1/2018 | 12/20/2018 | |
| Ron | 3/1/2017 | 9/1/2017 | |
| Ron | 5/1/2018 | 10/1/2018 | |
| Jacob | 6/10/2018 | 6/12/2018 | |
What I want in the output: if a person has a 'startdate' within 60 days (or 2 months) of one of that person's 'enddate' values, set flg to 1 for that person; otherwise set flg to 0.
For example: John has a record with a startdate of December 1st, which is within 60 days of one of this person's enddates (November 1st 2018). So the flg for this person is set to 1.
So the output should look like this:
| Name | startdate | enddate | flg |
|-------|-----------|------------|-----|
| John | 6/1/2018 | 7/1/2018 | 1 |
| John | 10/1/2018 | 11/1/2018 | 1 |
| John | 12/1/2018 | 12/20/2018 | 1 |
| Ron | 3/1/2017 | 9/1/2017 | 0 |
| Ron | 5/1/2018 | 10/1/2018 | 0 |
| Jacob | 6/10/2018 | 6/12/2018 | 0 |
Any idea please?
Date Functions: Use datediff and case
select Name,startdate,enddate,
case when datediff(enddate,startdate) < 60 then 1 else 0 end flag
from table
If you are comparing the previous row's enddate, use lag()
select Name,startdate,enddate,
case when datediff(startdate,prev_enddate) < 60 then 1 else 0 end flag
from
(
select Name,startdate,enddate,
lag(enddate) over(partition by Name order by startdate,enddate) as prev_enddate
from table
) t
Use lag to get the enddate of the previous row (per name). After this the flag can be set per name using max window function with a case expression that checks to see if the 60 day diff is satisfied at least once per name.
select name
,startdate
,enddate
,max(case when datediff(startdate,prev_end_dt) < 60 then 1 else 0 end) over(partition by name) as flag
from (select t.*
,lag(enddate) over(partition by name order by startdate) as prev_end_dt
from table t
) t
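Here is a runnable sketch of the lag() plus max() over(partition by name) approach on the sample data. It does not use Hive: it assumes SQLite via Python's sqlite3 module, ISO date text, a julianday() difference in place of datediff(), and a hypothetical table name `stays`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table stays (name text, startdate text, enddate text)")
conn.executemany(
    "insert into stays values (?, ?, ?)",
    [
        ("John", "2018-06-01", "2018-07-01"),
        ("John", "2018-10-01", "2018-11-01"),
        ("John", "2018-12-01", "2018-12-20"),
        ("Ron", "2017-03-01", "2017-09-01"),
        ("Ron", "2018-05-01", "2018-10-01"),
        ("Jacob", "2018-06-10", "2018-06-12"),
    ],
)

# Inner query: previous enddate per name. Outer query: 1 if any row of the
# name starts less than 60 days after its previous enddate, spread to all
# rows of that name by the window max(). A null prev_end falls to "else 0".
flagged = conn.execute("""
    select name, startdate, enddate,
           max(case when julianday(startdate) - julianday(prev_end) < 60
                    then 1 else 0 end)
               over (partition by name) as flg
    from (
        select t.*,
               lag(enddate) over (partition by name order by startdate) as prev_end
        from stays t
    ) t
    order by name, startdate
""").fetchall()
print(flagged)
```

John's December 1st start is 30 days after the November 1st end, so all of John's rows get 1; Ron's gap is far larger than 60 days and Jacob has only one row, so both get 0.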

SQL Select Day IN and Day OUT grouped by ID's

How to GROUP EIDs by dates where Date between 2014-01-15 and 2014-03-18
| ID | EID  | DATE       | Status |
|----|------|------------|--------|
| 9  | 9991 | 2014-03-16 | OUT    |
| 8  | 9997 | 2014-03-18 | IN     |
| 7  | 9997 | 2014-03-16 | OUT    |
| 6  | 9999 | 2014-02-16 | IN     |
| 5  | 9999 | 2014-02-16 | OUT    |
| 4  | 9996 | 2014-03-18 | IN     |
| 3  | 9996 | 2014-03-16 | OUT    |
| 2  | 9997 | 2014-01-18 | IN     |
| 1  | 9997 | 2014-01-15 | OUT    |
Output should be like:
| EID  | in date    | OUT date   | DAYS OUT |
|------|------------|------------|----------|
| 9997 | 2014-03-18 | 2014-03-16 | 2        |
| 9997 | 2014-01-18 | 2014-01-15 | 3        |
| 9999 | 2014-02-16 | 2014-02-16 | 0        |
| 9996 | 2014-03-18 | 2014-03-16 | 2        |
| 9991 |            | 2014-03-16 |          |
Thank you
Here is one method that assumes that they are interleaved, so no two ins or outs are together:
select eid,
       max(case when status = 'in' then date end) as in_date,
       max(case when status = 'out' then date end) as out_date,
       datediff(day,
                max(case when status = 'out' then date end),
                max(case when status = 'in' then date end)
               ) as days_diff
from (select t.*, row_number() over (partition by eid, status order by date) as seqnum
      from t
     ) t
group by eid, seqnum;
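The pairing trick above (number the INs and the OUTs separately per eid, then group by that number) can be tried out with the sketch below. It assumes SQLite via Python's sqlite3 module instead of SQL Server, a hypothetical table name `moves`, a column `dt` in place of date, upper-case status values as in the sample, and a julianday() difference in place of datediff().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table moves (id integer, eid integer, dt text, status text)")
conn.executemany(
    "insert into moves values (?, ?, ?, ?)",
    [
        (9, 9991, "2014-03-16", "OUT"),
        (8, 9997, "2014-03-18", "IN"),
        (7, 9997, "2014-03-16", "OUT"),
        (6, 9999, "2014-02-16", "IN"),
        (5, 9999, "2014-02-16", "OUT"),
        (4, 9996, "2014-03-18", "IN"),
        (3, 9996, "2014-03-16", "OUT"),
        (2, 9997, "2014-01-18", "IN"),
        (1, 9997, "2014-01-15", "OUT"),
    ],
)

# seqnum is 1 for an eid's first OUT and first IN, 2 for the second pair, etc.
# Grouping by (eid, seqnum) therefore puts each OUT next to its matching IN.
pairs = conn.execute("""
    select eid,
           max(case when status = 'IN' then dt end) as in_date,
           max(case when status = 'OUT' then dt end) as out_date,
           cast(julianday(max(case when status = 'IN' then dt end))
                - julianday(max(case when status = 'OUT' then dt end)) as integer) as days_out
    from (
        select m.*,
               row_number() over (partition by eid, status order by dt) as seqnum
        from moves m
    ) m
    where dt between '2014-01-15' and '2014-03-18'
    group by eid, seqnum
    order by eid, seqnum
""").fetchall()
print(pairs)
```

An eid with an OUT but no IN yet (9991 here) comes back with null in_date and null days_out. As the answer says, this only pairs correctly when INs and OUTs strictly alternate per eid.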
I think you may have already tried this, but have you tried a statement like:
SELECT [here you format as you wish] FROM [your table] WHERE date BETWEEN '2014-01-15' AND '2014-03-18' GROUP BY date
or
SELECT [here you format as you wish] FROM [your table] WHERE dateIn >= '2014-01-15' AND dateOut <= '2014-03-18' GROUP BY dateIn
Can you share your full table?

T-SQL Combine rows in continuation

I have a table that looks like the following.
What I want is for the rows that continue each other to be grouped together, for each "ID".
The column IsContinued marks if the next row should be combined with the current row
My data looks like this:
+-----+--------+-------------+-----------+----------+
| ID | Period | IsContinued | StartDate | EndDate |
+-----+--------+-------------+-----------+----------+
| 123 | 1 | 1 | 20180101 | 20180404 |
+-----+--------+-------------+-----------+----------+
| 123 | 2 | 1 | 20180501 | 20180910 |
+-----+--------+-------------+-----------+----------+
| 123 | 3 | 0 | 20181001 | 20181201 |
+-----+--------+-------------+-----------+----------+
| 123 | 4 | 1 | 20190105 | 20190228 |
+-----+--------+-------------+-----------+----------+
| 123 | 5 | 0 | 20190401 | 20190430 |
+-----+--------+-------------+-----------+----------+
| 456 | 2 | 1 | 20180201 | 20180215 |
+-----+--------+-------------+-----------+----------+
| 456 | 3 | 0 | 20180301 | 20180401 |
+-----+--------+-------------+-----------+----------+
| 456 | 4 | 0 | 20180501 | 20180530 |
+-----+--------+-------------+-----------+----------+
| 456 | 5 | 0 | 20180701 | 20180705 |
+-----+--------+-------------+-----------+----------+
The end result I want is this:
+-----+-------------+-----------+-----------+----------+
| ID | PeriodStart | PeriodEnd | StartDate | EndDate |
+-----+-------------+-----------+-----------+----------+
| 123 | 1 | 3 | 20180101 | 20181201 |
+-----+-------------+-----------+-----------+----------+
| 123 | 4 | 5 | 20190105 | 20190430 |
+-----+-------------+-----------+-----------+----------+
| 456 | 2 | 3 | 20180201 | 20180401 |
+-----+-------------+-----------+-----------+----------+
| 456 | 4 | 4 | 20180501 | 20180530 |
+-----+-------------+-----------+-----------+----------+
| 456 | 5 | 5 | 20180701 | 20180705 |
+-----+-------------+-----------+-----------+----------+
DDL Statement:
CREATE TABLE #Period (ID INT, PeriodNr INT, IsContinued INT, STARTDATE DATE, ENDDATE DATE)
INSERT INTO #Period VALUES (123,1,1,'20180101', '20180404'),
(123,2,1,'20180501', '20180910'),
(123,3,0,'20181001', '20181201'),
(123,4,1,'20190105', '20190228'),
(123,5,0,'20190401', '20190430'),
(456,2,1,'20180201', '20180215'),
(456,3,0,'20180301', '20180401'),
(456,4,0,'20180501', '20180530'),
(456,5,0,'20180701', '20180705')
The code should be run on SQL Server 2016
Thanks!
Here is one approach:
with removeFluff as
(
    SELECT *
    FROM (
        SELECT ID, PeriodNr, IsContinued, STARTDATE, ENDDATE,
               LAG(IsContinued,1,2) OVER (PARTITION BY ID ORDER BY PERIODNR) Lag
        FROM #Period
    ) A
    WHERE (IsContinued <> Lag) OR (IsContinued + Lag = 0)
)
,getValues as
(
    SELECT ID,
           CASE WHEN LAG(IsContinued) OVER (PARTITION BY ID ORDER BY PeriodNr) = 1
                THEN LAG(PeriodNr) OVER (PARTITION BY ID ORDER BY PeriodNr)
                ELSE PeriodNr END PeriodStart,
           PeriodNr PeriodEnd,
           CASE WHEN LAG(IsContinued) OVER (PARTITION BY ID ORDER BY PeriodNr) = 1
                THEN LAG(STARTDATE) OVER (PARTITION BY ID ORDER BY PeriodNr)
                ELSE STARTDATE END StartDate,
           EndDate,
           IsContinued
    FROM removeFluff r
)
SELECT ID, PeriodStart, PeriodEnd, StartDate, EndDate
FROM getValues
WHERE IsContinued = 0
Output:
ID PeriodStart PeriodEnd StartDate EndDate
123 1 3 2018-01-01 2018-12-01
123 4 5 2019-01-05 2019-04-30
456 2 3 2018-02-01 2018-04-01
456 4 4 2018-05-01 2018-05-30
456 5 5 2018-07-01 2018-07-05
Method:
The removeFluff cte removes lines that are unimportant. These are the records that don't start or end a segment (line 2 in your sample data).
Now that the fluff is removed, we know that either:
A.) The line is complete on its own (LAG(IsContinued) ... = 0), i.e. the previous line is complete
B.) The line needs the "start" info from the previous line (LAG(IsContinued) ... = 1)
We apply these two cases in the CASE expressions of the getValues cte.
Last, the results are narrowed to only the important rows in the final select with IsContinued = 0. This is because we have used LAG to get the "start" data onto the "end" row, so we only want to select the end rows.
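For comparison, here is a sketch of a different technique from the one above: instead of removing the middle rows, it numbers the groups with a running sum (a row starts a new group unless the previous row has IsContinued = 1) and then aggregates per group. It assumes SQLite via Python's sqlite3 module rather than SQL Server 2016, a permanent table named `period` instead of #Period, and dates kept as yyyymmdd text; `prev_continued` and `grp` are made-up names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table period (id integer, periodnr integer, iscontinued integer, "
    "startdate text, enddate text)"
)
conn.executemany(
    "insert into period values (?, ?, ?, ?, ?)",
    [
        (123, 1, 1, "20180101", "20180404"),
        (123, 2, 1, "20180501", "20180910"),
        (123, 3, 0, "20181001", "20181201"),
        (123, 4, 1, "20190105", "20190228"),
        (123, 5, 0, "20190401", "20190430"),
        (456, 2, 1, "20180201", "20180215"),
        (456, 3, 0, "20180301", "20180401"),
        (456, 4, 0, "20180501", "20180530"),
        (456, 5, 0, "20180701", "20180705"),
    ],
)

# prev_continued = 1 means "this row continues the previous one", so it adds 0
# to the running sum and stays in the same grp; anything else (0 or null for
# the first row of an id) opens a new grp.
combined = conn.execute("""
    select id,
           min(periodnr) as period_start,
           max(periodnr) as period_end,
           min(startdate) as startdate,
           max(enddate) as enddate
    from (
        select p.*,
               sum(case when prev_continued = 1 then 0 else 1 end)
                   over (partition by id order by periodnr) as grp
        from (
            select p.*,
                   lag(iscontinued) over (partition by id order by periodnr) as prev_continued
            from period p
        ) p
    ) p
    group by id, grp
    order by id, period_start
""").fetchall()
print(combined)
```

This variant handles chains of any length in the same way, because the middle rows simply stay inside their group rather than being filtered out first.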