Change Dates using Lag based on condition - sql

Input table:
date_1 date_2 ID
2019-01-01 2019-06-30 1
2019-05-01 2019-05-31 1
2019-06-01 2019-07-30 1
2019-01-02 2019-02-28 2
2019-03-01 2019-08-30 2
2019-01-02 2019-02-28 3
2019-02-06 2019-08-30 3
I am working on a complex date problem in Hive.
I need to change the dates in the date_1 column, based on the date_2 column, for rows with the same ID.
I want to copy a row's date_2 value into date_1 of the next row, based on a condition, and I have to do this for each ID separately, i.e. PARTITION BY ID.
Note: the data is sorted by ID asc, date_1 asc, date_2 asc.
For example:
Consider the 2nd row: its date_1 is '2019-05-01'. Now look at its previous row for the same ID 1: there, date_2 is '2019-06-30'.
So, check IF the previous row's date_2 is greater than the current row's date_1, which is true for the second row of ID 1.
When it is true, replace the current row's date_1 with the previous row's date_2, i.e. change 2019-05-01 to 2019-06-30; otherwise, keep it as it is. Do the same for the 3rd row and so on:
when considering the 3rd row, look at its previous row (the 2nd), and likewise for the other rows.
Consider the 2nd row of ID 2.
Here 2019-02-28 is not greater than 2019-03-01, so keep it as it is.
Expected output:
date_1 date_2 ID
2019-01-01 2019-06-30 1
2019-06-30 2019-05-31 1
2019-06-01 2019-07-30 1
2019-01-02 2019-02-28 2
2019-03-01 2019-08-30 2
2019-01-02 2019-02-28 3
2019-02-28 2019-08-30 3

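I think you want lag() like this: take the previous row's date_2 within each ID and use it only when it is later than the current date_1.
select case when prev_date_2 > date_1 then prev_date_2 else date_1 end as date_1,
date_2,
id
from (
select date_1, date_2, id,
lag(date_2) over (partition by id order by date_1, date_2) as prev_date_2
from t
) x;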
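To check the conditional LAG logic against the expected output, here is a minimal sketch that runs the same query shape in SQLite through Python's sqlite3 module. It assumes SQLite 3.25 or later (needed for window functions), stores the dates as ISO-8601 text so that string comparison matches date order, and uses the placeholder table name t; Hive syntax differs in details, so treat this only as a check of the logic, not as a Hive test.

```python
import sqlite3

# Sample rows from the question: (date_1, date_2, id)
ROWS = [
    ("2019-01-01", "2019-06-30", 1),
    ("2019-05-01", "2019-05-31", 1),
    ("2019-06-01", "2019-07-30", 1),
    ("2019-01-02", "2019-02-28", 2),
    ("2019-03-01", "2019-08-30", 2),
    ("2019-01-02", "2019-02-28", 3),
    ("2019-02-06", "2019-08-30", 3),
]

# Previous row's date_2 per id; it replaces date_1 only when it is later.
# On the first row of each id, prev_date_2 is NULL, so the CASE keeps date_1.
QUERY = """
SELECT CASE WHEN prev_date_2 > date_1 THEN prev_date_2 ELSE date_1 END AS date_1,
       date_2,
       id
FROM (
    SELECT date_1, date_2, id,
           LAG(date_2) OVER (PARTITION BY id ORDER BY date_1, date_2) AS prev_date_2
    FROM t
) AS x
ORDER BY x.id, x.date_1, x.date_2  -- qualified: sort by the original date_1
"""


def adjust_dates(rows):
    """Load rows into an in-memory table t and return the adjusted rows."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE t (date_1 TEXT, date_2 TEXT, id INTEGER)")
        conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


for row in adjust_dates(ROWS):
    print(*row)
```

The final ORDER BY uses x.date_1 on purpose: an unqualified date_1 there would refer to the rewritten output column, which would move the 2nd row of ID 1 after the 3rd.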


How to order rows by the greatest date of each row, for a table with 8 date columns?

This is very different from doing an SQL ORDER BY on 2 date columns (or from the usual way of sorting SQL columns, which covers only 1 column). There, we would do something like:
ORDER BY CASE WHEN date_1 > date_2
THEN date_2 ELSE date_1 END
FYI, I'm using YYYY-MM-DD in this example for brevity, but I also need it to work for TIMESTAMP (YYYY-MM-DD HH:MI:SS).
I have this table:
id name date_1 date_2 date_3 date_4 date_5 date_6 date_7 date_8
1 John 2008-08-11 2008-08-12 2009-08-11 2009-08-21 2009-09-11 2017-08-11 2017-09-12 2017-09-30
2 Bill 2008-09-12 2008-09-12 2008-10-12 2011-09-12 2008-09-13 2022-05-20 2022-05-21 2022-05-22
3 Andy 2008-10-13 2008-10-13 2008-10-14 2008-10-15 2008-11-01 2008-11-02 2008-11-03 2008-11-04
4 Hank 2008-11-14 2008-11-15 2008-11-16 2008-11-17 2008-12-31 2009-01-01 2009-01-02 2009-01-02
5 Alex 2008-12-15 2018-12-15 2018-12-15 2018-12-16 2018-12-17 2018-12-18 2018-12-25 2008-12-31
But the permutations of that approach give me a headache just thinking about them.
This answer had more of a "general solution", but it was for SELECT, not for ORDER BY:
SELECT MAX(date_col)
FROM(
SELECT MAX(date_col1) AS date_col FROM some_table
UNION
SELECT MAX(date_col2) AS date_col FROM some_table
UNION
SELECT MAX(date_col3) AS date_col FROM some_table
...
)
Is there something more like that, something that could be generated by iterating a loop in, say, PHP or Node.js? I need a scalable solution.
I only need to list each row once.
I want to order the rows by whichever column holds the most recent date on that row.
Something like:
SELECT * FROM some_table WHERE
(
GREATEST OF date_1
OR date_2
OR date_3
OR date_4
OR date_5
OR date_6
OR date_7
OR date_8
)
You can use the GREATEST function to achieve it.
SELECT GREATEST(date_1,date_2,date_3,date_4,date_5,date_6,date_7,date_8) max_date,t.*
FROM Tab t
ORDER BY GREATEST(date_1,date_2,date_3,date_4,date_5,date_6,date_7,date_8) Desc;
max_date id name date_1 date_2 date_3 date_4 date_5 date_6 date_7 date_8
2022-05-22 2 Bill 2008-09-12 2008-09-12 2008-10-12 2011-09-12 2008-09-13 2022-05-20 2022-05-21 2022-05-22
2018-12-25 5 Alex 2008-12-15 2018-12-15 2018-12-15 2018-12-16 2018-12-17 2018-12-18 2018-12-25 2008-12-31
2017-09-30 1 John 2008-08-11 2008-08-12 2009-08-11 2009-08-21 2009-09-11 2017-08-11 2017-09-12 2017-09-30
2009-01-02 4 Hank 2008-11-14 2008-11-15 2008-11-16 2008-11-17 2008-12-31 2009-01-01 2009-01-02 2009-01-02
2008-11-04 3 Andy 2008-10-13 2008-10-13 2008-10-14 2008-10-15 2008-11-01 2008-11-02 2008-11-03 2008-11-04
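Since the question asks for something that could be generated in a loop, here is a small Python sketch that builds the GREATEST query above from a list of column names. The function name greatest_order_query and the table name Tab are placeholders for illustration; column and table names are pasted into the SQL text rather than bound as parameters, so they should come from a trusted list.

```python
def greatest_order_query(table, columns):
    """Build a query that orders rows by the greatest of `columns`, newest first."""
    if len(columns) < 2:
        # MySQL's GREATEST expects at least two arguments.
        raise ValueError("GREATEST needs at least two columns")
    # Identifiers cannot be bound as query parameters, so they are pasted into
    # the SQL text: only pass trusted, validated column and table names.
    greatest = "GREATEST(" + ",".join(columns) + ")"
    return (
        f"SELECT {greatest} max_date,t.*\n"
        f"FROM {table} t\n"
        f"ORDER BY {greatest} DESC;"
    )


# Generate the column list in a loop instead of typing each name.
date_columns = [f"date_{i}" for i in range(1, 9)]
print(greatest_order_query("Tab", date_columns))
```

Adding a ninth date column then only means extending the list, not rewriting the query.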
In the event of a NULL value, GREATEST could throw off the ordering (in MySQL, for example, GREATEST returns NULL if any argument is NULL).
Based on the accepted answer to a question about GREATEST and NULL handling, the same idea applies to these tables:
SELECT COALESCE (
GREATEST(date_1,date_2,date_3,date_4,date_5,date_6,date_7,date_8),
date_1,date_2,date_3,date_4,date_5,date_6,date_7,date_8
) max_date,t.*
FROM TAB t
ORDER BY COALESCE (
GREATEST(date_1,date_2,date_3,date_4,date_5,date_6,date_7,date_8),
date_1,date_2,date_3,date_4,date_5,date_6,date_7,date_8
) DESC;
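One behaviour of the COALESCE version worth knowing: when GREATEST returns NULL, it falls back to the first non-NULL date column, which is not necessarily the latest one. The Python sketch below models SQL NULL as None and assumes MySQL-style GREATEST (NULL if any argument is NULL); the sample row is hypothetical. By contrast, PostgreSQL's GREATEST already ignores NULL arguments.

```python
from datetime import date


def mysql_style_greatest(*values):
    """Model MySQL GREATEST: NULL (None) if any argument is NULL."""
    if any(v is None for v in values):
        return None
    return max(values)


def coalesce(*values):
    """Return the first non-NULL argument, like SQL COALESCE."""
    return next((v for v in values if v is not None), None)


def coalesce_fallback(values):
    """The query above: GREATEST, else the first non-NULL date column."""
    return coalesce(mysql_style_greatest(*values), *values)


def greatest_ignoring_nulls(values):
    """Latest non-NULL date, or None if every column is NULL."""
    non_null = [v for v in values if v is not None]
    return max(non_null) if non_null else None


# Hypothetical row: date_1 is NULL, the other two columns are set.
row = [None, date(2008, 8, 12), date(2017, 9, 30)]
print(coalesce_fallback(row))        # first non-NULL value, not the latest
print(greatest_ignoring_nulls(row))  # the latest non-NULL value
```

For this row the fallback yields 2008-08-12 while the latest date is 2017-09-30, so rows containing NULLs may sort lower than expected.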

sql query using time series

I have the table below in BigQuery:
Timestamp variant_id activity
2020-04-02 08:50 1 active
2020-04-03 07:39 1 not_active
2020-04-04 07:40 1 active
2020-04-05 10:22 2 active
2020-04-07 07:59 2 not_active
I want to query this subset of data to get the number of active variants per day.
If variant_id 1 is active on 2020-04-04, it is still active on the following dates (2020-04-05, 2020-04-06, ...) until its activity value becomes not_active. The goal is to count, for each day, the number of variant_ids whose activity is active, taking into account that each variant_id has the value of its last activity on a given date.
For example, the result of the desired query on this sample data must be:
Date activity_count
2020-04-02 1
2020-04-03 0
2020-04-04 1
2020-04-05 2
2020-04-06 2
2020-04-07 1
2020-04-08 1
2020-04-09 1
2020-04-10 1
Any help, please?
Consider the approach below:
select date, count(distinct if(activity = 'active', variant_id, null)) activity_count
from (
select date(timestamp) date, variant_id, activity,
lead(date(timestamp)) over(partition by variant_id order by timestamp) next_date
from your_table
), unnest(generate_date_array(date, ifnull(next_date - 1, '2020-04-10'))) date
group by date
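If applied to the sample data in your question, the output matches the expected per-day counts above (note that the end date '2020-04-10' is hardcoded in the query).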
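As a cross-check of the expected counts, here is a plain-Python sketch of the same idea: each event's state is stretched until the day before that variant's next event (the role LEAD and GENERATE_DATE_ARRAY play in the query), and the last event runs to a fixed end date, 2020-04-10 as in the answer. The function name daily_active_counts is hypothetical; in the query itself, the literal end date could be replaced by CURRENT_DATE() or a parameter.

```python
from collections import defaultdict
from datetime import date, datetime, timedelta

# Sample events from the question: (timestamp, variant_id, activity)
EVENTS = [
    ("2020-04-02 08:50", 1, "active"),
    ("2020-04-03 07:39", 1, "not_active"),
    ("2020-04-04 07:40", 1, "active"),
    ("2020-04-05 10:22", 2, "active"),
    ("2020-04-07 07:59", 2, "not_active"),
]


def daily_active_counts(events, end_date):
    """Return (day, count) pairs: variants whose latest state on that day is 'active'."""
    by_variant = defaultdict(list)
    for ts, variant_id, activity in events:
        by_variant[variant_id].append((datetime.strptime(ts, "%Y-%m-%d %H:%M"), activity))

    active = defaultdict(set)  # day -> ids of variants active that day
    days = set()               # every day covered by some event's interval
    for variant_id, rows in by_variant.items():
        rows.sort()
        for i, (ts, activity) in enumerate(rows):
            # Like LEAD(): a state lasts until the day before the next event;
            # the variant's last state runs until end_date.
            if i + 1 < len(rows):
                stop = rows[i + 1][0].date() - timedelta(days=1)
            else:
                stop = end_date
            day = ts.date()
            while day <= stop:
                days.add(day)
                if activity == "active":
                    active[day].add(variant_id)
                day += timedelta(days=1)
    return [(day, len(active[day])) for day in sorted(days)]


for day, count in daily_active_counts(EVENTS, date(2020, 4, 10)):
    print(day, count)
```

If a variant has two events on the same day, the earlier one covers no days (its stop date falls before its start), so the day's last activity wins, as the question requires.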

Specific grouping elements in SQL Server

I have a problem with my SQL task and haven't found any answers yet.
I have a table with this sample data:
ID Value Date
1 1 2020-01-01
1 2 2020-03-02
1 1 2020-03-21
1 1 2020-04-14
1 3 2020-05-01
1 1 2020-08-09
1 1 2020-09-12
1 1 2020-10-12
1 3 2020-12-04
What I want to get is:
ID Value Date
1 1 2020-01-01
1 2 2020-03-02
1 1 2020-03-21
1 3 2020-05-01
1 1 2020-08-09
1 3 2020-12-04
It's a kind of value-change history, but only when the value actually changed: when a new record has the same value as the previous one, keep the record with the earliest date.
I tried grouping and ROW_NUMBER, but got no positive results. Any ideas how to do that?
One way to articulate your logic is to say that you want to retain a record when the previous record, as ordered by the date (within a given ID), has a different value than the current record.
WITH cte AS (
SELECT *, LAG(Value) OVER (PARTITION BY ID ORDER BY Date) LagValue
FROM yourTable
)
SELECT ID, Value, Date
FROM cte
WHERE LagValue <> Value OR LagValue IS NULL
ORDER BY Date;
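The rule in this answer (keep a row when LagValue differs from Value or is NULL) can be checked with a short Python sketch that walks the rows in (ID, Date) order, which is what PARTITION BY ID ORDER BY Date does per ID. The function name value_changes is hypothetical; dates are ISO strings, so sorting them as text is chronological.

```python
from itertools import groupby

# Sample data from the question: (ID, Value, Date)
ROWS = [
    (1, 1, "2020-01-01"),
    (1, 2, "2020-03-02"),
    (1, 1, "2020-03-21"),
    (1, 1, "2020-04-14"),
    (1, 3, "2020-05-01"),
    (1, 1, "2020-08-09"),
    (1, 1, "2020-09-12"),
    (1, 1, "2020-10-12"),
    (1, 3, "2020-12-04"),
]


def value_changes(rows):
    """Keep each row whose Value differs from the previous row's Value for the same ID."""
    kept = []
    # Sort like PARTITION BY ID ORDER BY Date.
    ordered = sorted(rows, key=lambda r: (r[0], r[2]))
    for _, group in groupby(ordered, key=lambda r: r[0]):
        lag_value = None
        for i, (row_id, value, day) in enumerate(group):
            # The first row of each ID has no previous value (LagValue IS NULL).
            if i == 0 or value != lag_value:
                kept.append((row_id, value, day))
            lag_value = value
    return kept


for row in value_changes(ROWS):
    print(*row)
```

With more than one ID in the table, ordering the SQL result by ID, Date rather than Date alone keeps each ID's history together.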

How to create a specific SQL Server stored procedure? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 1 year ago.
I have a CONTRACTOR_SCHEDULER table. The letters in the Schedule column encode working time: m = 8:00-20:00, n = 20:00-8:00 (next day), d = 8:00-8:00 (next day), h = day off. (Name, Begin_date) is unique.
Name Schedule Begin_date End_date
John nhmh 2019-01-01 2019-01-08
John nnh 2019-01-09 2019-01-25
Kate dh 2019-01-01 2019-01-07
Kate mnhdh 2019-01-08 2019-01-14
Mike nh 2019-01-01 2019-02-01
Mike mh 2019-02-02 2019-12-31
I need a SQL Server stored procedure that creates a new table, CONTRACTOR_WORK_DAY, of working days built from the CONTRACTOR_SCHEDULER rows (days off do not appear in the table).
Example:
First row: John has schedule nhmh. The first letter is n, so Begin_date is 2019-01-01 20:00 and End_date is 2019-01-02 08:00. The next letter is h, so skip it as a day off. The next letter is m, so Begin_date is 2019-01-03 08:00 and End_date is 2019-01-03 20:00. Repeat the schedule until 2019-01-08, the End_date in the first table.
The table for the first row of CONTRACTOR_SCHEDULER would be:
Name Begin_date End_date
John 2019-01-01 20:00 2019-01-02 08:00
John 2019-01-03 08:00 2019-01-03 20:00
John 2019-01-05 20:00 2019-01-06 08:00
John 2019-01-08 08:00 2019-01-08 20:00
I wrote this in Python using a loop over the schedule string, but I can't figure out how to do it in T-SQL for SQL Server.
Yet another option.
I'm not sure I agree with the last record of the desired results: I get 2019-01-07, while you have 2019-01-08.
Select A.[Name]
,[Begin_Date] = convert(datetime,left(dateadd(DAY,N ,[Begin_date]),10)+' '+BegTime)
,[End_Date] = convert(datetime,left(dateadd(DAY,N+NxtDay,[Begin_date]),10)+' '+EndTime)
From YourTable A
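Cross Apply ( values ( left(replicate(Schedule,datediff(DAY,Begin_Date,End_Date)/len(Schedule)+1),datediff(DAY,Begin_Date,End_Date)) ) )B(S) -- repeat the pattern enough times to cover the whole date range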
Cross Apply (
Select N=N-1
,Subs=substring(B.S,N,1)
From ( Select Top (len(S)+1) N=Row_Number() Over (Order By (Select Null)) From master..spt_values n1 ) B1
) C
Join ( values ('m','08:00','20:00',0)
,('n','20:00','08:00',1)
,('d','08:00','08:00',1)
) D(SchdCd,BegTime,EndTime,NxtDay) on Subs=SchdCd
Order By [Begin_Date]
Results
Name Begin_Date End_Date
John 2019-01-01 20:00:00.000 2019-01-02 08:00:00.000
John 2019-01-03 08:00:00.000 2019-01-03 20:00:00.000
John 2019-01-05 20:00:00.000 2019-01-06 08:00:00.000
John 2019-01-07 08:00:00.000 2019-01-07 20:00:00.000
WITH
numbers AS
(
-- Generate a table of values (from 0 upwards)
-- Must be at least as long as the longest schedule string
SELECT
ROW_NUMBER() OVER (ORDER BY sv.number) - 1 AS id
FROM
master..spt_values sv
),
pivotted AS
(
-- Create one row for each day in the date range
-- Extract the relevant character from the schedule for each day
SELECT
c.*,
n.id AS date_offset,
SUBSTRING(c.schedule, (n.id % LEN(c.schedule)) + 1, 1) AS schedule_char
FROM
CONTRACTOR_SCHEDULER c
INNER JOIN
numbers n
ON n.id <= DATEDIFF(DAY, c.begin_date, c.end_date)
)
-- Add a number of days to the begin_date
-- Then add a number of hours based on the current character from the schedule
SELECT
pivotted.*,
DATEADD(
HOUR,
CASE pivotted.schedule_char WHEN 'm' THEN 8
WHEN 'n' THEN 20
WHEN 'd' THEN 8 END,
-- CAST to datetime first: DATEADD(HOUR, ...) raises an error on a DATE value
DATEADD(DAY, date_offset, CAST(pivotted.begin_date AS datetime))
)
AS begin_datetime,
DATEADD(
HOUR,
CASE pivotted.schedule_char WHEN 'm' THEN 20
WHEN 'n' THEN 32
WHEN 'd' THEN 32 END,
DATEADD(DAY, date_offset, CAST(pivotted.begin_date AS datetime))
)
AS end_datetime
FROM
pivotted
WHERE
pivotted.schedule_char <> 'h'
Demo: https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=f6c52e6718329f2ae013dc93edad5d78
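The asker's own Python loop isn't shown, so here is a minimal, hypothetical sketch of such a loop for comparing against the T-SQL results. It assumes the shift codes from the question (m, n, d, h), treats End_date as inclusive like the numbers-table answer, and uses the illustrative names SHIFTS and expand_schedule.

```python
from datetime import date, datetime, time, timedelta

# Shift codes from the question: (start time, length in hours); 'h' (day off) is absent.
SHIFTS = {
    "m": (time(8, 0), 12),   # 08:00-20:00
    "n": (time(20, 0), 12),  # 20:00-08:00 next day
    "d": (time(8, 0), 24),   # 08:00-08:00 next day
}


def expand_schedule(name, schedule, begin_date, end_date):
    """Yield (name, begin, end) work periods, cycling through the schedule letters day by day."""
    days = (end_date - begin_date).days
    for offset in range(days + 1):  # End_date treated as inclusive
        code = schedule[offset % len(schedule)]
        if code not in SHIFTS:  # 'h' (or any unknown letter): no work period
            continue
        start_time, hours = SHIFTS[code]
        start = datetime.combine(begin_date + timedelta(days=offset), start_time)
        yield name, start, start + timedelta(hours=hours)


for row in expand_schedule("John", "nhmh", date(2019, 1, 1), date(2019, 1, 8)):
    print(*row)
```

For John's first row this yields the same four periods as the first answer's results, the last on 2019-01-07: with the 4-letter pattern nhmh, the eighth day (2019-01-08) falls on h, which is why neither answer produces a 2019-01-08 shift.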

How to filter only a subset of data while maintain existing records as is in a table?

[SQL Novice] I have a table that looks like this:
id date
1 2019-01-01
1 2019-01-02
2 2019-03-01
2 2019-05-01
I want to filter out only the id = 2 rows whose date is between 2019-04-01 and 2019-05-01, without affecting the rows where id = 1.
The new table should look like this:
id date
1 2019-01-01
1 2019-01-02
2 2019-03-01
I tried this:
select * from table1 where id = 2 and date between '2019-03-01' and '2019-04-01'
And get this data set:
id date
2 2019-03-01
I think you want OR:
select * from table1
where id = 1 or
(id = 2 and date between '2019-03-01' and '2019-04-01')
For your desired result, you need:
select * from table1 where [date] >= '2019-01-01' and [date] <= '2019-03-01'
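Both suggestions return the desired rows on this sample. A minimal check with an in-memory SQLite database follows; SQLite is an assumption here (the question doesn't name the engine), and the dates are stored as ISO text so string comparison matches date order. Note that the second query filters on date alone, so on other data it would also drop id = 1 rows after 2019-03-01, which the OR version does not.

```python
import sqlite3

# Sample rows from the question: (id, date)
ROWS = [(1, "2019-01-01"), (1, "2019-01-02"), (2, "2019-03-01"), (2, "2019-05-01")]

# First answer: keep every id = 1 row, plus id = 2 rows inside the date range.
QUERY_OR = """
SELECT * FROM table1
WHERE id = 1 OR (id = 2 AND date BETWEEN '2019-03-01' AND '2019-04-01')
ORDER BY id, date
"""

# Second answer: a single date range that happens to cover the desired rows.
QUERY_RANGE = """
SELECT * FROM table1
WHERE [date] >= '2019-01-01' AND [date] <= '2019-03-01'
ORDER BY id, date
"""


def run(query):
    """Load the sample rows into an in-memory table1 and run `query`."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE table1 (id INTEGER, date TEXT)")
        conn.executemany("INSERT INTO table1 VALUES (?, ?)", ROWS)
        return conn.execute(query).fetchall()
    finally:
        conn.close()


for query in (QUERY_OR, QUERY_RANGE):
    print(run(query))
```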