SQL: Getting Missing Date Values and Copy Data to Those New Dates - sql

This may seem like an odd use case, but it came up and I have been struggling to find a solution. Let's say I have this data set:
date value1 value2
2020-01-01 50 2
2020-01-04 23 5
2020-01-07 14 8
My goal is to fill in the gaps between consecutive dates, copying the values from the most recent earlier date. For example, the output I want is:
date value1 value2
2020-01-01 50 2
2020-01-02 50 2
2020-01-03 50 2
2020-01-04 23 5
2020-01-05 23 5
2020-01-06 23 5
2020-01-07 14 8
Not sure if this is something I can do with SQL but would definitely take any suggestions.

One approach is to use the window function lead() in concert with an ad-hoc tally table. If you have a calendar table (highly suggested), use that instead of the tally table.
Example
;with cte as (
Select *
,nrows = datediff(day,[date],lead([date],1,[date]) over (order by [date]))
From YourTable A
)
Select date = dateadd(day,coalesce(N-1,0),[date])
,value1
,value2
From cte A
left Join (Select Top 1000 N=Row_Number() Over (Order By (Select NULL)) From master..spt_values n1 ) B
on N<=nRows
Results
date value1 value2
2020-01-01 50 2
2020-01-02 50 2
2020-01-03 50 2
2020-01-04 23 5
2020-01-05 23 5
2020-01-06 23 5
2020-01-07 14 8
EDIT: If you have a calendar table
Select Date = coalesce(B.Date,A.Date)
,value1
,value2
From (
Select Date
,value1
,value2
,Date2 = lead([date],1,[date]) over (order by [date])
From YourTable A
) A
left Join CalendarTable B on B.Date >=A.Date and B.Date< A.Date2

Another option is to use CROSS APPLY. I am not sure how you are determining what range you want from the table, but you can easily override my guess by explicitly defining @s and @e:
DECLARE @s date, @e date;
SELECT @s = MIN(date), @e = MAX(date) FROM dbo.TheTable;
;WITH d(d) AS
(
SELECT @s UNION ALL
SELECT DATEADD(DAY,1,d) FROM d
WHERE d < @e
)
SELECT d.d, x.value1, x.value2
FROM d CROSS APPLY
(
SELECT TOP (1) value1, value2
FROM dbo.TheTable
WHERE date <= d.d
AND value1 IS NOT NULL
ORDER BY date DESC
) AS x
-- OPTION (MAXRECURSION 32767) -- if date range can be > 100 days but < 89 years
-- OPTION (MAXRECURSION 0) -- if date range can be > 89 years
If you don't like the recursive CTE, you could easily use a calendar table (but presumably you'd still need a way to define the overall date range you're after as opposed to all of time).
Example db<>fiddle

In SQL Server you can also use a cursor that iterates over the dates. If it finds values for a given date, it takes those and stores them for later. In the next iteration, if there are no values in the database for that date, it uses the stored values instead.
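The cursor answer gives no code, so here is a minimal sketch of the same carry-forward loop. It is written in Python rather than T-SQL purely to model the logic; the function name `fill_gaps` and the tuple layout are assumptions for illustration, and the sample rows are the ones from the question.

```python
from datetime import date, timedelta

def fill_gaps(rows):
    """Walk day by day and carry the last seen values forward (cursor-style)."""
    known = {d: (v1, v2) for d, v1, v2 in rows}
    current, last_day = min(known), max(known)
    stored = None  # values remembered from the most recent date that had data
    filled = []
    while current <= last_day:
        if current in known:
            stored = known[current]  # values found: keep them for later days
        filled.append((current, *stored))
        current += timedelta(days=1)
    return filled

rows = [
    (date(2020, 1, 1), 50, 2),
    (date(2020, 1, 4), 23, 5),
    (date(2020, 1, 7), 14, 8),
]
for day, value1, value2 in fill_gaps(rows):
    print(day, value1, value2)
```

The loop starts on the earliest date that has data, so `stored` is always set before it is used. A row-by-row cursor in SQL Server follows the same shape, but the set-based answers above will usually be the better choice on large tables.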

Related

Sum and segment overlapping date ranges

Our HR system specifies employee assignments, which can be concurrent. Our rostering system only allows one summary assignment for a person. Therefore I need to pre-process the HR records, so rostering can determine the number of shifts a worker is expected to work on a given day.
Looking just at worker A who has two assignments, the first is for a quarter shift and the second for a half shift, but overlapping in the middle where they work .75 shifts.
Person StartDate EndDate Shifts
A 01/01/21 04/01/21 .25
A 03/01/21 06/01/21 .5
01---02---03---04---05---06---07
Rec 1 |------------------|
Rec 2 | |===================|
Total | 0.25 | 0.75 | 0.5 |
Required output.
Person StartDate EndDate ShiftCount
A 01/01/21 02/01/21 0.25
A 03/01/21 04/01/21 0.75
A 05/01/21 06/01/21 0.5
Given this data, how do we sum and segment the data? I found an exact question for MySQL but the version was too early and code was suggested. I also found a Postgres solution but we don't have ranges.
select * from (
values
('A','01/01/21','04/01/21',0.25),
('A','03/01/21','05/01/21',0.5)
) AS Data (Person,StartDate,EndDate,Shifts);
It looks like a Gaps-and-Islands problem to me.
If it helps, cte1 is used to expand the date ranges via an ad-hoc tally table. Then cte2 is used to create the Gaps-and-Islands. The final result is then a small matter of aggregation.
Example
Set Dateformat DMY
Declare @YourTable table (Person varchar(50),StartDate Date,EndDate date,Shifts decimal(10,2))
Insert Into @YourTable values
('A','01/01/21','04/01/21',0.25)
,('A','03/01/21','05/01/21',0.5)
;with cte1 as (
Select [Person]
,[d] = dateadd(DAY,N,StartDate)
,Shifts = sum(Shifts)
From @YourTable A
Join (
Select Top 1000 N=-1+Row_Number() Over (Order By (Select Null))
From master..spt_values n1,master..spt_values n2
) B on N <= datediff(DAY,[StartDate],[EndDate])
Group By Person,dateadd(DAY,N,StartDate)
), cte2 as (
Select *
,Grp = datediff(day,'1900-01-01',d)-row_number() over (partition by Person,Shifts Order by d)
From cte1
)
Select Person
,StartDate = min(d)
,EndDate = max(d)
,Shifts = max(Shifts)
From cte2
Group By Person,Grp
Returns
Person StartDate EndDate Shifts
A 2021-01-01 2021-01-02 0.25
A 2021-01-03 2021-01-04 0.75
A 2021-01-05 2021-01-05 0.50
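To check the expected output by hand, the three steps (expand ranges to days, sum per day, merge consecutive days with equal totals) can be modelled outside the database. This is a Python sketch, not the answer's T-SQL; `segment_shifts` is a hypothetical name, and the data is the sample from the question read as day/month/year.

```python
from collections import defaultdict
from datetime import date, timedelta
from decimal import Decimal

def segment_shifts(assignments):
    """Expand ranges to days, sum shifts per day, merge runs of equal totals."""
    daily = defaultdict(Decimal)  # (person, day) -> total shifts that day
    for person, start, end, shifts in assignments:
        day = start
        while day <= end:
            daily[(person, day)] += shifts
            day += timedelta(days=1)
    segments = []
    for (person, day), total in sorted(daily.items()):
        last = segments[-1] if segments else None
        # Extend the current island only if person and total match and the day is adjacent
        if last and last[0] == person and last[3] == total and last[2] + timedelta(days=1) == day:
            last[2] = day
        else:
            segments.append([person, day, day, total])
    return [tuple(s) for s in segments]

assignments = [
    ("A", date(2021, 1, 1), date(2021, 1, 4), Decimal("0.25")),
    ("A", date(2021, 1, 3), date(2021, 1, 5), Decimal("0.5")),
]
for person, start, end, total in segment_shifts(assignments):
    print(person, start, end, total)
```

`Decimal` is used instead of floats so that equal totals compare equal, which mirrors the `decimal(10,2)` column in the table variable.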

Finding all dates after a date for a variable number of days

I have a list of dates in a table. For this example, it is the 1st day of each month. Let's call the table timeperiod, with column endTime:
endTime
1-1-2019
2-1-2019
3-1-2019
4-1-2019
I want to find all dates up to x days after each date in the list. Let's say x = 4. Then the list should be:
1-1-2019
1-2-2019
1-3-2019
1-4-2019
2-1-2019
2-2-2019
2-3-2019
2-4-2019
3-1-2019
3-2-2019
3-3-2019
3-4-2019
4-1-2019
4-2-2019
4-3-2019
4-4-2019
I have found solutions to find all dates between dates but I keep getting "Subquery returned more than 1 value" error when I try to use it with a list of dates.
Here is an example of something I tried but doesn't work
declare @days DECIMAL = 4
declare @StartDate date = (select convert(varchar, DATEADD(Day, +0, endTime),101) from timeperiod)
declare @EndDate date = (select convert(varchar, DATEADD(Day, +@days, endTime),101) from timeperiod);
;WITH cte AS (
SELECT @StartDate AS myDate
UNION ALL
SELECT DATEADD(day,1,myDate) as myDate
FROM cte
WHERE DATEADD(day,1,myDate) <= @EndDate
)
SELECT myDate
FROM cte
OPTION (MAXRECURSION 0)
Here is a row generator that generates 5 rows, 0 to 4:
WITH rg AS (
SELECT 0 AS rn
UNION ALL
SELECT rg.rn + 1
FROM rg
WHERE rn < 4
)
Here we join it with your existing table that has the firsts of the month and use DATEADD to add rn days (between 0 and 4) to endTime. The CROSS JOIN causes each row in timePeriod to repeat 5 times:
SELECT
DATEADD(DAY, rg.rn, timePeriod.endTime) as fakeEndTime
FROM
rg CROSS JOIN timePeriod
It wasn't clear to me what you mean by "x days after the date, say x = 4": to me, if the date is 1-Jan-2000, then the date 4 days after it is 5-Jan-2000.
If you only want the 1st, 2nd, 3rd and 4th of Jan, make the row generator condition < 3 instead of < 4.
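The off-by-one point is easy to check with a small model of the row generator and the CROSS JOIN. This is a Python sketch of the same pairing, not T-SQL; `days_after` and `max_offset` are names made up for the illustration, and the dates are the firsts of the month from the question.

```python
from datetime import date, timedelta

def days_after(end_times, max_offset):
    """Pair every date with offsets 0..max_offset, like the CROSS JOIN with rg."""
    offsets = range(max_offset + 1)  # max_offset=4 gives 5 offsets: 0, 1, 2, 3, 4
    return [d + timedelta(days=n) for d in end_times for n in offsets]

end_times = [date(2019, 1, 1), date(2019, 2, 1), date(2019, 3, 1), date(2019, 4, 1)]

five_per_month = days_after(end_times, 4)  # corresponds to WHERE rn < 4
four_per_month = days_after(end_times, 3)  # corresponds to WHERE rn < 3
print(len(five_per_month), len(four_per_month))
```

With `max_offset` 3, each month contributes the 1st through the 4th, which is the list the question asks for.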
Already +1'd on Caius Jard's recursive cte.
Here is yet another option using an ad-hoc tally table in concert with a CROSS JOIN
Example
Declare @YourTable Table ([endTime] date)
Insert Into @YourTable Values
('1-1-2019')
,('2-1-2019')
,('3-1-2019')
,('4-1-2019')
Select NewDate = dateadd(DAY,N-1,EndTime)
From @YourTable A
Cross Join (
Select Top (4) N=row_number() over (order by (select null))
From master..spt_values N1
) B
Returns
NewDate
2019-01-01
2019-01-02
2019-01-03
2019-01-04
2019-02-01
2019-02-02
2019-02-03
2019-02-04
2019-03-01
2019-03-02
2019-03-03
2019-03-04
2019-04-01
2019-04-02
2019-04-03
2019-04-04

Frequency Distribution by Day

I have records of No. of calls coming to a call center. When a call comes into a call center a ticket is open.
So, let's say ticket 1 (T1) is opened on 8/1/19 and stays open until 8/5/19. If a person ran a query every day, then on 8/1 it would show 1 ticket open, and the same thing on day 2 through day 5. I want to get records by day to see how many tickets were open on each day.
In short, Frequency Distribution by Day.
Ticket Open_date Close_date
T1 8/1/2019 8/5/2019
T2 8/1/2019 8/6/2019
Result:
Date # Tickets_Open
8/1/2019 2
8/2/2019 2
8/3/2019 2
8/4/2019 2
8/5/2019 2
8/6/2019 1
8/7/2019 0
8/8/2019 0
8/9/2019 0
8/10/2019 0
We can handle your requirement via the use of a calendar table, which stores all dates covering the full range in your data set.
WITH dates AS (
SELECT '2019-08-01' AS dt UNION ALL
SELECT '2019-08-02' UNION ALL
SELECT '2019-08-03' UNION ALL
SELECT '2019-08-04' UNION ALL
SELECT '2019-08-05' UNION ALL
SELECT '2019-08-06' UNION ALL
SELECT '2019-08-07' UNION ALL
SELECT '2019-08-08' UNION ALL
SELECT '2019-08-09' UNION ALL
SELECT '2019-08-10'
)
SELECT
d.dt,
COUNT(t.Open_date) AS num_tickets_open
FROM dates d
LEFT JOIN tickets t
ON d.dt BETWEEN t.Open_date AND t.Close_date
GROUP BY
d.dt;
Note that in practice if you expect to have this reporting requirement in the long term, you might want to replace the dates CTE above with a bona-fide table of dates.
This solution generates the list of dates from the tickets table using CTE recursion and calculates the count:
WITH Tickets(Ticket, Open_date, Close_date) AS
(
SELECT 'T1', '8/1/2019', '8/5/2019'
UNION ALL
SELECT 'T2', '8/1/2019', '8/6/2019'
),
Ticket_dates(Ticket, Dates) as
(
SELECT t1.Ticket, CONVERT(DATETIME, t1.Open_date)
FROM Tickets t1
UNION ALL
SELECT t1.Ticket, DATEADD(dd, 1, CONVERT(DATETIME, t1.Dates))
FROM Ticket_dates t1
inner join Tickets t2 on t1.Ticket = t2.Ticket
where DATEADD(dd, 1, CONVERT(DATETIME, t1.Dates)) <= CONVERT(DATETIME, t2.Close_date)
)
SELECT CONVERT(varchar, Dates, 1), count(*)
FROM Ticket_dates
GROUP by Dates
ORDER by Dates
A "general purpose" trick is to generate a series of numbers (CTEs are one way, but there are many alternatives) and build the needed range of dates from it. Once that exists, you can left join your ticket data to it and count by date.
CREATE TABLE mytable(
Ticket VARCHAR(8) NOT NULL PRIMARY KEY
,Open_date DATE NOT NULL
,Close_date DATE NOT NULL
);
INSERT INTO mytable(Ticket,Open_date,Close_date) VALUES ('T1','8/1/2019','8/5/2019');
INSERT INTO mytable(Ticket,Open_date,Close_date) VALUES ('T2','8/1/2019','8/6/2019');
Also note I am using a cross apply in this example to "attach" the min and max dates of your tickets to each numbered row. You would need to include your own logic on what data to select here.
;WITH
cteDigits AS (
SELECT 0 AS digit UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL
SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9
)
, cteTally AS (
SELECT
[1s].digit
+ [10s].digit * 10
+ [100s].digit * 100 /* add more like this as needed */
AS num
FROM cteDigits [1s]
CROSS JOIN cteDigits [10s]
CROSS JOIN cteDigits [100s] /* add more like this as needed */
)
select
n.num + 1 rownum
, dateadd(day,n.num,ca.min_date) as on_date
, count(t.Ticket) as tickets_open
from cteTally n
cross apply (select min(Open_date), max(Close_date) from mytable) ca (min_date, max_date)
left join mytable t on dateadd(day,n.num,ca.min_date) between t.Open_date and t.Close_date
where dateadd(day,n.num,ca.min_date) <= ca.max_date
group by
n.num + 1
, dateadd(day,n.num,ca.min_date)
order by
rownum
;
result:
+--------+---------------------+--------------+
| rownum | on_date | tickets_open |
+--------+---------------------+--------------+
| 1 | 01.08.2019 00:00:00 | 2 |
| 2 | 02.08.2019 00:00:00 | 2 |
| 3 | 03.08.2019 00:00:00 | 2 |
| 4 | 04.08.2019 00:00:00 | 2 |
| 5 | 05.08.2019 00:00:00 | 2 |
| 6 | 06.08.2019 00:00:00 | 1 |
+--------+---------------------+--------------+
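All of the answers above share one shape: build one row per day, then count the tickets whose open/close range covers that day. The following Python sketch models that shape so the expected counts can be checked quickly; the function name `tickets_open_per_day` is hypothetical, and the reporting range 8/1 to 8/10 is taken from the question's desired output.

```python
from datetime import date, timedelta

def tickets_open_per_day(tickets, first_day, last_day):
    """Count, for each day in the range, the tickets open on that day (inclusive)."""
    counts = {}
    day = first_day
    while day <= last_day:
        # Same test as "d.dt BETWEEN t.Open_date AND t.Close_date"
        counts[day] = sum(1 for _, opened, closed in tickets if opened <= day <= closed)
        day += timedelta(days=1)
    return counts

tickets = [
    ("T1", date(2019, 8, 1), date(2019, 8, 5)),
    ("T2", date(2019, 8, 1), date(2019, 8, 6)),
]
counts = tickets_open_per_day(tickets, date(2019, 8, 1), date(2019, 8, 10))
for day, open_count in counts.items():
    print(day, open_count)
```

Days with no matching ticket still appear with a count of 0, which is what the LEFT JOIN provides in the SQL versions. The last query above stops at the latest Close_date, so it does not list the zero days after 8/6.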

Months Between 2 dates for each Project

Hi I am trying to run a query to return a row for each month between 2 dates for each project that I have. See example data:
Project Start End
1 1/1/2015 3/1/2015
2 2/1/2015 4/1/2015
End Data needed:
Project Month
1 1/1/2015
1 2/1/2015
1 3/1/2015
2 2/1/2015
2 3/1/2015
2 4/1/2015
I have several projects and will need a query to do this for all of them at the same time. How can I do this in SQL Server?
Another option is a CROSS APPLY with an ad-hoc tally table
Select A.Project
,Month = B.D
From YourTable A
Cross Apply (
Select Top (DateDiff(MONTH,A.Start,A.[End])+1) D=DateAdd(Month,-1+Row_Number() Over (Order By(Select null)),A.Start)
From master..spt_values
) B
Returns
Project Month
1 2015-01-01
1 2015-02-01
1 2015-03-01
2 2015-02-01
2 2015-03-01
2 2015-04-01
This is simple if you have or create a table for Months:
create table dbo.Months([Month] date primary key);
declare @StartDate date = '20100101'
,@NumberOfYears int = 30;
insert dbo.Months([Month])
select top (12*@NumberOfYears)
[Month] = dateadd(month, row_number() over (order by number) -1, @StartDate)
from master.dbo.spt_values;
If you really do not want to have a Months table, you can use a cte like this:
declare @StartDate date = '20100101'
,@NumberOfYears int = 10;
;with Months as (
select top (12*@NumberOfYears)
[Month] = dateadd(month, row_number() over (order by number) -1, @StartDate)
from master.dbo.spt_values
)
Then query it like so:
select
t.Project
, m.Month
from t
inner join dbo.Months m
on m.Month >= t.Start
and m.Month <= t.[End]
rextester demo: http://rextester.com/SXPX26360
returns:
+---------+------------+
| Project | Month |
+---------+------------+
| 1 | 2015-01-01 |
| 1 | 2015-02-01 |
| 1 | 2015-03-01 |
| 2 | 2015-02-01 |
| 2 | 2015-03-01 |
| 2 | 2015-04-01 |
+---------+------------+
calendar and numbers tables reference:
Generate a set or sequence without loops 2- Aaron Bertrand
Creating a Date Table/Dimension in SQL Server 2008 - David Stein
Calendar Tables - Why You Need One - David Stein
Creating a date dimension or calendar table in SQL Server - Aaron Bertrand
TSQL Function to Determine Holidays in SQL Server - Tim Cullen
F_TABLE_DATE - Michael Valentine Jones
I personally like a tally table for this kind of thing. It is the swiss army knife of t-sql.
I create a view on my system for this. If you don't want to create a view you can easily use these ctes anytime you need a tally table.
create View [dbo].[cteTally] as
WITH
E1(N) AS (select 1 from (values (1),(1),(1),(1),(1),(1),(1),(1),(1),(1))dt(n)),
E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10^2 or 100 rows
E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10^4 or 10,000 rows max
cteTally(N) AS
(
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4
)
select N from cteTally
GO
Now we just need your sample data in a table.
create table #Projects
(
Project int
, Start datetime
, EndDate datetime
)
insert #Projects
select 1, '1/1/2015', '3/1/2015' union all
select 2, '2/1/2015', '4/1/2015'
At this point we get to the real issue here which is retrieving your information. With the sample data and the view this becomes pretty simple.
select p.*
, NewMonth = DATEADD(MONTH, t.N - 1, p.Start)
from #Projects p
join cteTally t on t.N <= DATEDIFF(MONTH, p.Start, p.EndDate) + 1
order by p.Project
, t.N
Generate a time series with help from the link, then join to it using BETWEEN:
SELECT --something
FROM table1 a
/* type of */ JOIN table2 b ON b.field2 BETWEEN a.field2 AND a.field3

SQL Query Find x rows forward the highest value without having a lower value in between

I have a table with the left 2 columns.
I am trying to compute the 3rd column based on some logic.
Logic: starting from date 1/1 and moving forward, the highest score reached before the score goes down is 12, on 3/1. So HighestAchieveScore for 1/1 is 12, and so forth.
If we are on a date where the next score goes down, HighestAchieveScore is that next score, as you can see at 3/01/2014.
date score HighestAchieveScore
1/01/2014 10 12
2/01/2014 11 12
3/01/2014 12 10
4/01/2014 10 11
5/01/2014 11 9
6/01/2014 9 8
7/01/2014 8 9
8/01/2014 9 9
I hope I explained it clear enough.
Thanks already for every input resolving the problem.
Let's make some test data:
DECLARE @Score TABLE
(
ScoreDate DATETIME,
Score INT
)
INSERT INTO @Score
VALUES
('01-01-2014', 10),
('01-02-2014', 11),
('01-03-2014', 12),
('01-04-2014', 10),
('01-05-2014', 11),
('01-06-2014', 9),
('01-07-2014', 8),
('01-08-2014', 9);
Now we are going to number our rows and then link to the next row to see if we are still going up
WITH ScoreRows AS
(
SELECT
s.ScoreDate,
s.Score,
ROW_NUMBER() OVER (ORDER BY ScoreDate) RN
FROM @Score s
),
ScoreUpDown AS
(
SELECT p.ScoreDate,
p.Score,
p.RN,
CASE WHEN p.Score < n.Score THEN 1 ELSE 0 END GoingUp,
ISNULL(n.Score, p.Score) NextScore
FROM ScoreRows p
LEFT JOIN ScoreRows n
ON n.RN = p.RN + 1
)
Finally, for every row that is still going up, we look forward for the first row that sits right before a fall and take its score as the max. Otherwise, we use the next row's score.
SELECT
s.ScoreDate,
s.Score,
CASE WHEN s.GoingUp = 1 THEN d.Score ELSE s.NextScore END Test
FROM ScoreUpDown s
OUTER APPLY
(
-- ORDER BY makes TOP 1 deterministic: the earliest later row that is not going up
SELECT TOP 1 * FROM ScoreUpDown d
WHERE d.ScoreDate > s.ScoreDate
AND GoingUp = 0
ORDER BY d.ScoreDate
) d;
Output:
ScoreDate Score Test
2014-01-01 00:00:00.000 10 12
2014-01-02 00:00:00.000 11 12
2014-01-03 00:00:00.000 12 10
2014-01-04 00:00:00.000 10 11
2014-01-05 00:00:00.000 11 9
2014-01-06 00:00:00.000 9 8
2014-01-07 00:00:00.000 8 9
2014-01-08 00:00:00.000 9 9
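The rule encoded by the GoingUp flag and the OUTER APPLY can be traced with a small model. This is a Python sketch mirroring the query above, not a replacement for it; `highest_achieved` is a hypothetical name, and ties are treated like the CASE expression does (an equal next score does not count as going up).

```python
def highest_achieved(scores):
    """For each score: the peak of the current rise, or the next score if not rising."""
    n = len(scores)
    # Mirrors GoingUp: true only when a next row exists and is strictly higher
    going_up = [i + 1 < n and scores[i] < scores[i + 1] for i in range(n)]
    result = []
    for i, score in enumerate(scores):
        if going_up[i]:
            j = i + 1
            while going_up[j]:  # advance to the first later row that is not going up
                j += 1
            result.append(scores[j])
        else:
            # Mirrors ISNULL(n.Score, p.Score): next score, or own score on the last row
            result.append(scores[i + 1] if i + 1 < n else score)
    return result

scores = [10, 11, 12, 10, 11, 9, 8, 9]
print(highest_achieved(scores))
```

The inner loop always terminates because the last row is never flagged as going up.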
Assuming you are wanting the third column to be computed, you can create the table like this (or add the column to an existing table), using a function to determine the value of the third column:
Create Function dbo.fnGetMaxScore(@Date Date)
Returns Int
As Begin
Declare @Ret Int
Select @Ret = Max(Score)
From YourTable
Where Date > @Date
Return @Ret
End
Go
Create Table YourTable
(
Date Date,
Score Int,
HighestAchieveScore As dbo.fnGetMaxScore(Date)
)
I'm not sure this will work.... but this is the general concept.
Self join on A.Date < B.Date to get max score, but use coalesce and a 3rd self join on a rowID assigned in a CTE to determine if the score dropped on the next record, and if it did coalesce that score in, otherwise use the max score.
NEED TO TEST but have to setup a fiddle to do so..
WITH CTE as
(SELECT Date, Score, ROW_NUMBER() OVER(ORDER BY Date ASC) AS Row FROM tableName)
SELECT A.Date, A.Score, coalesce(C.Score, Max(B.Score)) as HighestAchievedScore
FROM CTE A
LEFT JOIN CTE B
on A.Date < B.Date
LEFT JOIN CTE C
on A.Row+1=C.Row
and A.Score > C.Score
GROUP BY A.Date,
A.Score,
C.Score
This should work on SQL Server 2012 and later, but not on earlier versions, because it relies on LEAD():
WITH cte AS (
SELECT date,
LEAD(score) OVER (ORDER BY date) nextScore
FROM yourTable
)
SELECT t.date, score,
CASE
WHEN nextScore < score THEN nextScore
ELSE (
SELECT ISNULL(MAX(t1.score), t.score)
FROM yourTable t1
JOIN cte ON t1.date = cte.date
WHERE t1.date > t.date
AND ISNULL(nextScore, 0) < score
)
END AS HighestAchieveScore
FROM yourTable t
JOIN cte ON t.date = cte.date