Generate multiple rows of a new column based on one value of another column - SQL

I have a table like below:
ID | Date
1 | 2022-01-01
2 | 2022-03-21
I want to add a new column based on the date, and it should look like this:
ID | Date | NewCol
1 | 2022-01-01 | 2022-02-01
1 | 2022-01-01 | 2022-03-01
1 | 2022-01-01 | 2022-04-01
1 | 2022-01-01 | 2022-05-01
2 | 2022-03-21 | 2022-04-21
2 | 2022-03-21 | 2022-05-21
Let's say that there is an @EndDate = 2022-05-31 (that's where it should stop).
I'm having a hard time trying to figure out how to do it in SSMS. Would appreciate any insights! Thanks :)

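For reference, a minimal T-SQL setup that reproduces the sample data (the table name t is an assumption chosen to match the queries below):

create table t (ID int, [Date] date);

insert into t (ID, [Date])
values (1, '2022-01-01'),
       (2, '2022-03-21');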
In the following solutions we use string_split in combination with replicate to generate the new records.

select ID
      ,Date
      -- row_number over the generated rows supplies the month offset (1, 2, 3, ...)
      ,dateadd(month, row_number() over(partition by ID order by (select null)), Date) as NewCol
from (
    select *
    from t
    -- replicate(',', m-1) yields m empty elements from string_split,
    -- where m = number of months between Date and the end date
    outer apply string_split(replicate(',', datediff(month, Date, '2022-05-31') - 1), ',')
) t
ID | Date | NewCol
1 | 2022-01-01 | 2022-02-01
1 | 2022-01-01 | 2022-03-01
1 | 2022-01-01 | 2022-04-01
1 | 2022-01-01 | 2022-05-01
2 | 2022-03-21 | 2022-04-21
2 | 2022-03-21 | 2022-05-21
Fiddle
For SQL in Azure and SQL Server 2022 we have a cleaner solution based on ordinal:

"The enable_ordinal argument and ordinal output column are currently supported in Azure SQL Database, Azure SQL Managed Instance, and Azure Synapse Analytics (serverless SQL pool only). Beginning with SQL Server 2022 (16.x) Preview, the argument and output column are available in SQL Server."
select ID
      ,Date
      -- ordinal numbers the split elements 1, 2, 3, ... so no row_number is needed
      ,dateadd(month, ordinal, Date) as NewCol
from (
    select *
    from t
    -- the third argument (enable_ordinal = 1) adds the ordinal output column
    outer apply string_split(replicate(',', datediff(month, Date, '2022-05-31') - 1), ',', 1)
) t

with cal (id, dt) as
(
    select id, date as dt from t
    -- recurse one month at a time; note that comparing month() alone
    -- assumes every date falls in the same calendar year as the end date
    union all select id, dateadd(month, 1, dt) from cal where month(dt) < month('2022-05-31')
)
select t.id
      ,t.date
      ,cal.dt as new_col
from cal join t on t.id = cal.id and t.date != cal.dt
order by id, new_col
id | date | new_col
1 | 2022-01-01 | 2022-02-01
1 | 2022-01-01 | 2022-03-01
1 | 2022-01-01 | 2022-04-01
1 | 2022-01-01 | 2022-05-01
2 | 2022-03-21 | 2022-04-21
2 | 2022-03-21 | 2022-05-21
Fiddle

There are many ways to "explode" a row into a set; the simplest in my opinion is a recursive CTE:

DECLARE @endpoint date = '20220531';
DECLARE @prev date = DATEADD(MONTH, -1, @endpoint);

WITH x AS
(
    -- anchor: one row per source row, one month after its date
    SELECT ID, date, NewCol = DATEADD(MONTH, 1, date) FROM #d
    UNION ALL
    -- recurse: keep adding a month while we stay before the cutoff
    SELECT ID, date, DATEADD(MONTH, 1, NewCol) FROM x
    WHERE NewCol < @prev
)
SELECT * FROM x
ORDER BY ID, NewCol;

Working example in this fiddle.
Keep in mind that if you could have > 100 months you'll need to add an OPTION (MAXRECURSION n) hint (or just consider using a different solution at scale).
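For illustration, this is where the hint goes; a sketch assuming you want to lift the default 100-level cap entirely (MAXRECURSION 0 means unlimited, so use it with care):

SELECT * FROM x
ORDER BY ID, NewCol
OPTION (MAXRECURSION 0); -- 0 removes the limit; a fixed n caps recursion at n levels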

Related

Find top common value

I am trying to get the top common date grouped by code and item. Is there any way I can achieve this in Snowflake?
My current table looks something like this. I need to extract the date that is available in all items for each code. E.g. for code = 1, I only want date = 2022-03-01 because it's the only date that is common between items a, b, and c.
Code | Date | item
1 | 2022-01-01 | a
1 | 2022-03-01 | a
1 | 2022-01-01 | b
1 | 2022-03-01 | b
1 | 2022-03-01 | c
1 | 2022-05-01 | c
2 | 2022-01-01 | a
2 | 2022-05-01 | a
2 | 2022-01-01 | b
2 | 2022-03-01 | b
2 | 2022-01-01 | c
My end result:
Code | Date | item
1 | 2022-03-01 | a
1 | 2022-03-01 | b
1 | 2022-03-01 | c
2 | 2022-01-01 | a
2 | 2022-01-01 | b
2 | 2022-01-01 | c
You may use the count window function to count how many items share each date per code, then use the dense_rank function to get the rows whose date has the maximum count.
with count_dates as
(
    select *,
           -- number of items that share this (Code, Date) pair
           count(*) over (partition by Code, Date) cn
    from table_name
)
select Code, Date, item
from
(
    select *,
           -- rank dates by how common they are within each code
           dense_rank() over (partition by Code order by cn desc) rnk
    from count_dates
) T
where rnk = 1
order by Code
Using dense_rank() over (partition by Code order by cn desc) rnk will return all the most common dates (dates tied on the same maximum count); if you want to get only the latest common date, use dense_rank() over (partition by Code order by cn desc, Date desc) rnk.
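For completeness, a sketch of that variant (the same query as above over the count_dates CTE; only the ranking window changes, with the extra Date desc key breaking ties in favour of the most recent date):

select Code, Date, item
from
(
    select *,
           dense_rank() over (partition by Code order by cn desc, Date desc) rnk
    from count_dates
) T
where rnk = 1
order by Code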

Group by date and find median of processing time

I select an input date and an output date from a database. I use a formula to compute the processing time. Now I would like the values to be grouped by the date of receipt, and the median of the processing time to be output for each grouped date of receipt. Something like this:
The data I select:
input date | output date | processing time
2022-01-03 | 2022-01-03 | 0
2022-01-03 | 2022-01-06 | 3
2022-01-03 | 2022-01-11 | 8
2022-01-05 | 2022-01-10 | 5
2022-01-05 | 2022-01-15 | 10
The output I want:
input date | processing time
2022-01-03 | 3
2022-01-05 | 7.5
My SQL Code:
SELECT [received_date]
      ,CONVERT(date, [exported_on])
      ,DATEDIFF(day, [received_date], [exported_on]) AS processing_time
FROM [request]
WHERE YEAR(received_date) = 2022
GROUP BY received_date, [exported_on]
ORDER BY received_date
How can I do this? Do I need a temp table to do this, or can I modify my query?
You could try using PERCENTILE_CONT:

with cte as (
    select input_date,
           -- PERCENTILE_CONT(0.5) is the interpolated median; as a window function it
           -- repeats the value on every row, so the outer GROUP BY removes duplicates
           PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY processing_time)
               OVER (PARTITION BY input_date) as Median_Process_Time
    FROM tableA
)
SELECT *
FROM cte
GROUP BY input_date, Median_Process_Time
db fiddle
Also, you can check out the discussion here: How to find the SQL medians for a grouping
Here is my solution. Thank you for your help.
SET NOCOUNT ON;

DECLARE @working TABLE(entry_date date, exit_date date, work_time int);

INSERT INTO @working
SELECT [received] AS date_of_entry
      ,CONVERT(date, [exported]) AS date_of_exit
      ,DATEDIFF(day, [received], [exported]) AS processing_time
FROM [zsdt].[dbo].[antrag]
WHERE YEAR([received]) = 2022
  AND scanner_name IS NOT NULL
  AND exportiert_am IS NOT NULL
  AND NOT scanner_name = 'AP99'
GROUP BY [received], [exported]
ORDER BY [received] ASC;

WITH CTE AS
(   -- NTILE(2) splits each day's values into halves, once ascending and once descending
    SELECT entry_date,
           work_time,
           [half1] = NTILE(2) OVER(PARTITION BY entry_date ORDER BY work_time),
           [half2] = NTILE(2) OVER(PARTITION BY entry_date ORDER BY work_time DESC)
    FROM @working
    WHERE work_time IS NOT NULL
)
SELECT entry_date,
       -- median = average of the lower half's max and the upper half's min
       (MAX(CASE WHEN Half1 = 1 THEN work_time END) +
        MIN(CASE WHEN Half2 = 1 THEN work_time END)) / 2.0
FROM CTE
GROUP BY entry_date;

Create SQL key based on datetime that is persistent overnight

I have a time series with a table like this
CarId | EventDateTime | Event | SessionFlag | ExpectedKey
1 | 2022-01-01 7:00 | Start | 1 | 1-20220101-7
1 | 2022-01-01 7:05 | Drive | 1 | 1-20220101-7
1 | 2022-01-01 8:00 | Park | 1 | 1-20220101-7
1 | 2022-01-01 10:00 | Drive | 1 | 1-20220101-7
1 | 2022-01-01 18:05 | End | 0 | 1-20220101-7
1 | 2022-01-01 23:00 | Start | 1 | 1-20220101-23
1 | 2022-01-01 23:05 | Drive | 1 | 1-20220101-23
1 | 2022-01-02 2:00 | Park | 1 | 1-20220101-23
1 | 2022-01-02 3:00 | Drive | 1 | 1-20220101-23
1 | 2022-01-02 15:00 | End | 0 | 1-20220101-23
1 | 2022-01-02 16:00 | Start | 1 | 1-20220102-16
Other CarIds do exist.
What I am attempting to do is create the last column, ExpectedKey.
The problem I face though is midnight, as the same session can exist over two days.
The record above with ExpectedKey 1-20220101-23 is the prime example of what I'm trying to achieve.
I've played with using:
CASE
    WHEN SessionFlag <> 0
     AND SessionFlag = LAG(SessionFlag) OVER (PARTITION BY CarId ORDER BY EventDateTime)
    THEN FIRST_VALUE(CarId + '-' + CONVERT(CHAR(8), EventDateTime, 112) + '-' + CAST(DATEPART(HOUR, EventDateTime) AS VARCHAR))
             OVER (PARTITION BY CarId ORDER BY EventDateTime)
    ELSE CarId + '-' + CONVERT(CHAR(8), EventDateTime, 112) + '-' + CAST(DATEPART(HOUR, EventDateTime) AS VARCHAR)
END AS SessionId
But I can't seem to make it partition correctly overnight.
Can anyone offer advice?
This is a classic gaps-and-islands problem. There are a number of solutions.
The simplest (if not that efficient) is partitioning over a windowed conditional count
WITH Groups AS (
    SELECT *,
           -- running count of 'Start' events per car numbers each session (the islands)
           GroupId = COUNT(CASE WHEN t.Event = 'Start' THEN 1 END)
                         OVER (PARTITION BY t.CarId ORDER BY t.EventDateTime)
    FROM YourTable t
)
SELECT *,
       NewKey = CONCAT_WS('-',
           t.CarId,
           -- both the date and the hour must come from the session's first event,
           -- so the key stays the same after midnight
           FIRST_VALUE(CONVERT(varchar(8), t.EventDateTime, 112))
               OVER (PARTITION BY t.CarId, t.GroupId ORDER BY t.EventDateTime
                     ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING),
           FIRST_VALUE(DATEPART(hour, t.EventDateTime))
               OVER (PARTITION BY t.CarId, t.GroupId ORDER BY t.EventDateTime
                     ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
       )
FROM Groups t;
db<>fiddle
Using APPLY to get the Start event's datetime and form the key with concat_ws:
select *
from time_series t
cross apply
(
select top 1
ExpectedKey = concat_ws('-',
CarId,
convert(varchar(10), EventDateTime, 112),
datepart(hour, EventDateTime))
from time_series x
where x.Event = 'Start'
and x.EventDateTime <= t.EventDateTime
order by x.EventDateTime desc
) k

How to fill missing values for missing dates with value from date before in sql bigquery? [duplicate]

This question already has an answer here:
Create Balance Sheet with every date is filled in Bigquery
(1 answer)
Closed 8 months ago.
Hi, I have a product table with a daily price. The catch here is that the table only updates if there's a price change; the dates in between are not written into the table because the price is the same as the day before.
How do I fill the missing values of price with the last entry of the date before?
date | id | price
2022-01-01 | 1 | 5
2022-01-03 | 1 | 6
2022-01-05 | 1 | 7
2022-01-01 | 2 | 10
2022-01-02 | 2 | 11
2022-01-06 | 2 | 12
into
date | id | price
2022-01-01 | 1 | 5
2022-01-02 | 1 | 5
2022-01-03 | 1 | 6
2022-01-04 | 1 | 6
2022-01-05 | 1 | 7
2022-01-01 | 2 | 10
2022-01-02 | 2 | 11
2022-01-03 | 2 | 11
2022-01-04 | 2 | 11
2022-01-05 | 2 | 11
2022-01-06 | 2 | 12
I am currently thinking of creating a table for the dates, joining it, and using the lag function. Can anyone help?

select
    date, id,
    case
        when price is null then nullPrice
        else price
    end as price
from (
    select *,
           lag(price, 1) over (order by date, id asc) as nullPrice
    from price_table
    join date_table using(date)
)
Consider below:

WITH days_by_id AS (
    -- one row per id with the full range of dates between its min and max
    SELECT id, GENERATE_DATE_ARRAY(MIN(date), MAX(date)) days
    FROM sample
    GROUP BY id
)
SELECT date, id,
       -- carry the last known price forward across the gap days
       IFNULL(price, LAST_VALUE(price IGNORE NULLS) OVER (PARTITION BY id ORDER BY date)) AS price
FROM days_by_id, UNNEST(days) date
LEFT JOIN sample USING (id, date);
You can use the generate_date_array function for this:

with date_arr as (
    select dt
    from unnest(generate_date_array('2022-01-01', '2022-05-01')) as dt
)
select da.dt, t1.*
from date_arr da
left outer join table1 t1
  on da.dt = t1.date

You can replace the hardcoded dates with the max and min date from the table.
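For instance, a sketch of that variant under the same assumed names (table1 with its date column), using scalar subqueries for the range bounds:

with date_arr as (
    select dt
    from unnest(generate_date_array(
        (select min(date) from table1),   -- start of the range
        (select max(date) from table1))) as dt
)
select da.dt, t1.*
from date_arr da
left outer join table1 t1
  on da.dt = t1.date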

Create a row for each date in a range, and add 1 for each day within a date range for a record in SQL

Suppose I have a date range, @StartDate = 2022-01-01 and @EndDate = 2022-02-01, and this is a reporting period.
In addition, I also have customer records, where each customer has a LiveDate and a ServiceEndDate (or ServiceEndDate = NULL, as they are an ongoing customer).
Some customers may have a LiveDate-to-ServiceEndDate range that extends outside the reporting period. I only want to report the days on which they were a customer within the period.
Name | LiveDate | ServiceEndDate
Tom | 2021-10-11 | 2022-01-13
Mark | 2022-11-13 | 2022-02-15
Andy | 2022-01-02 | 2022-02-10
Rob | 2022-01-09 | 2022-01-14
I would like to create a table where column A is the date (iterating over every date in the reporting period) and column B is the number of customers that were a customer on that date.
Something like this:
Date | NumberOfCustomers
2022-01-01 | 2
2022-01-02 | 3
2022-01-03 | 3
2022-01-04 | 3
2022-01-05 | 3
2022-01-06 | 3
2022-01-07 | 3
2022-01-08 | 3
2022-01-09 | 4
2022-01-10 | 4
2022-01-11 | 4
2022-01-12 | 4
2022-01-13 | 4
2022-01-14 | 3
2022-01-15 | 3
And so on until the @EndDate.
Any help would be much appreciated, thanks.
You can join your table to a calendar table containing all the dates you need:
with calendar as
(
    select cast('2022-01-01' as datetime) as d
    union all
    select dateadd(day, 1, d)
    from calendar
    where d < '2022-02-01'
)
select d as "Date", count(*) as NumberOfCustomers
from calendar
inner join table_name
   on d between LiveDate and coalesce(ServiceEndDate, '9999-12-31')
group by d;
Fiddle
I would personally suggest using a Tally, rather than an rCTE, as a Tally is significantly more performant.
SELECT *
INTO dbo.YourTable
FROM (VALUES ('Tom',  CONVERT(date,'2021-10-11'), CONVERT(date,'2022-01-13')),
             ('Mark', CONVERT(date,'2022-11-13'), CONVERT(date,'2022-02-15')),
             ('Andy', CONVERT(date,'2022-01-02'), CONVERT(date,'2022-02-10')),
             ('Rob',  CONVERT(date,'2022-01-09'), CONVERT(date,'2022-01-14')))V([Name],LiveDate,ServiceEndDate);
GO

SELECT *
FROM dbo.YourTable;
GO

DECLARE @StartDate date = '20220101',
        @EndDate   date = '20220201';

WITH N AS(
    SELECT N
    FROM (VALUES(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL))N(N)),
Tally AS(
    SELECT 0 AS I
    UNION ALL
    SELECT TOP (DATEDIFF(DAY, @StartDate, @EndDate))
           ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS I
    FROM N N1, N N2, N N3), --up to 1,000 days
Dates AS(
    SELECT DATEADD(DAY, T.I, @StartDate) AS Date
    FROM Tally T)
SELECT D.Date,
       COUNT(YT.[Name]) AS NumberOfCustomers
FROM Dates D
LEFT JOIN dbo.YourTable YT ON D.[Date] >= YT.LiveDate
                          AND (D.[Date] <= YT.ServiceEndDate
                           OR YT.ServiceEndDate IS NULL)
GROUP BY D.[Date]
ORDER BY D.[Date];
GO

DROP TABLE dbo.YourTable;
Note that the results don't reflect your expected results; I suspect your expected results are wrong. For example, you have 2 people live on 2022-01-01; however, there is only 1 person who is live on that date: Tom.
This solution will also never have Mark as "live" (the rCTE method in the other answer won't either), as their end date is before their Live date. If someone can have their service end before it started, I would suggest you have a data quality issue, and you should add a CHECK CONSTRAINT to the table to ensure that the value of ServiceEndDate is >= LiveDate.
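A minimal sketch of such a constraint, using the dbo.YourTable name from the example above (the constraint name is illustrative):

ALTER TABLE dbo.YourTable
    ADD CONSTRAINT CK_YourTable_ServiceEndDate
    CHECK (ServiceEndDate IS NULL OR ServiceEndDate >= LiveDate); -- NULL = still an ongoing customer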