PARTITION BY with date between 2 date - sql

I work on Azure SQL Database (SQL Server).
In SQL, I am trying to get a table with one row per day, but the individual days are not in the source table.
The example below shows what I mean:
TABLE STARTER (date format: YYYY-MM-DD):
Date begin  Date End    Category  Value
----------  ----------  --------  -----
2021-01-01  2021-01-03  1         0.2
2021-01-02  2021-01-03  1         0.1
2021-01-01  2021-01-02  2         0.3
For the result, I am trying to get this TABLE RESULT:
Date        Category  Value
----------  --------  -------------
2021-01-01  1         0.2
2021-01-01  2         0.3
2021-01-02  1         0.3 (0.2+0.1)
2021-01-02  2         0.3
2021-01-03  1         0.3 (0.2+0.1)
For each day, I want to sum the values of the rows whose range (Date begin to Date End) contains that day. I need to do that for each category.
In terms of SQL code, I tried something like this:
    SELECT SUM(CAST(Value AS float)) OVER (PARTITION BY [Date begin], Category) AS value,
           [Date begin],
           Category,
           Value
    FROM [TABLE STARTER]
This code only sums the values that share the same Date begin; it does not consider every date between Date begin and Date End.
So it doesn't calculate the sum for 2021-01-02 in Category 1, because that date is not written explicitly in the table (it only falls between 2021-01-01 and 2021-01-03).
Is it possible to do that in SQL?
Thanks so much for your help!

You can use a recursive CTE to expand the date ranges into a list of separate days. Then it's a matter of joining and aggregating.
For example:
    with
    r as (
        select category,
               min(date_begin) as date_begin, max(date_end) as date_end
        from starter
        group by category
    ),
    d as (
        select category, date_begin as d from r
        union all
        select d.category, dateadd(day, 1, d.d)
        from d
        join r on r.category = d.category
        where d.d < r.date_end
    )
    select d.d, d.category, sum(s.value) as value
    from d
    join starter s on s.category = d.category
                  and d.d between s.date_begin and s.date_end
    group by d.category, d.d;
Result:
d category value
----------- --------- -----
2021-01-01 1 0.20
2021-01-01 2 0.30
2021-01-02 1 0.30
2021-01-02 2 0.30
2021-01-03 1 0.30
See running example at db<>fiddle.
Note: SQL Server 2022 adds a GENERATE_SERIES() function, which can make this query much shorter.
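If you want to try the expand-then-aggregate idea without a SQL Server instance, here is a minimal sketch using Python's built-in sqlite3 module. It is not the query above: the SQL is in SQLite dialect (date(d, '+1 day') instead of DATEADD), it expands every source row directly rather than joining back per category, and the ROUND is only there to hide floating-point noise. The table and column names mirror the `starter` example and are otherwise assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE starter (date_begin TEXT, date_end TEXT, category INTEGER, value REAL)"
)
conn.executemany(
    "INSERT INTO starter VALUES (?, ?, ?, ?)",
    [
        ("2021-01-01", "2021-01-03", 1, 0.2),
        ("2021-01-02", "2021-01-03", 1, 0.1),
        ("2021-01-01", "2021-01-02", 2, 0.3),
    ],
)

query = """
WITH RECURSIVE days(d, date_end, category, value) AS (
    -- anchor: one row per source row, starting at its first day
    SELECT date_begin, date_end, category, value FROM starter
    UNION ALL
    -- step: advance one day until the end of that row's range
    SELECT date(d, '+1 day'), date_end, category, value
    FROM days
    WHERE d < date_end
)
SELECT d, category, ROUND(SUM(value), 2) AS value
FROM days
GROUP BY d, category
ORDER BY d, category
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

ISO-formatted date strings compare correctly as text, which is why the `d < date_end` condition works here without a real date type.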

Related

SQL: Getting Missing Date Values and Copy Data to Those New Dates

So this seems somewhat weird, but this use case came up, and I have been struggling to come up with a solution. Let's say I have this data set:
date        value1  value2
----------  ------  ------
2020-01-01  50      2
2020-01-04  23      5
2020-01-07  14      8
My goal is to try and fill in the gap between the two dates while copying whatever values were from the date before it. So for example, the data output I would want is:
date        value1  value2
----------  ------  ------
2020-01-01  50      2
2020-01-02  50      2
2020-01-03  50      2
2020-01-04  23      5
2020-01-05  23      5
2020-01-06  23      5
2020-01-07  14      8
Not sure if this is something I can do with SQL but would definitely take any suggestions.
One approach is to use the window function lead() in concert with an ad-hoc tally table if you don't have a calendar table (highly suggested).
Example
    ;with cte as (
        Select *
              ,nrows = datediff(day,[date],lead([date],1,[date]) over (order by [date]))
        From YourTable A
    )
    Select date = dateadd(day,coalesce(N-1,0),[date])
          ,value1
          ,value2
    From cte A
    left Join (Select Top 1000 N=Row_Number() Over (Order By (Select NULL)) From master..spt_values n1 ) B
      on N<=nRows
Results
date value1 value2
2020-01-01 50 2
2020-01-02 50 2
2020-01-03 50 2
2020-01-04 23 5
2020-01-05 23 5
2020-01-06 23 5
2020-01-07 14 8
EDIT: If you have a calendar table
    Select Date = coalesce(B.Date,A.Date)
          ,value1
          ,value2
    From (
        Select Date
              ,value1
              ,value2
              ,Date2 = lead([date],1,[date]) over (order by [date])
        From YourTable A
    ) A
    left Join CalendarTable B on B.Date >=A.Date and B.Date< A.Date2
Another option is to use CROSS APPLY. I am not sure how you are determining what range you want from the table, but you can easily override my guess by explicitly defining @s and @e:
    DECLARE @s date, @e date;
    SELECT @s = MIN(date), @e = MAX(date) FROM dbo.TheTable;
    ;WITH d(d) AS
    (
        SELECT @s UNION ALL
        SELECT DATEADD(DAY,1,d) FROM d
        WHERE d < @e
    )
    SELECT d.d, x.value1, x.value2
    FROM d CROSS APPLY
    (
        SELECT TOP (1) value1, value2
        FROM dbo.TheTable
        WHERE date <= d.d
        AND value1 IS NOT NULL
        ORDER BY date DESC
    ) AS x
    -- OPTION (MAXRECURSION 32767) -- if date range can be > 100 days but < 89 years
    -- OPTION (MAXRECURSION 0) -- if date range can be > 89 years
If you don't like the recursive CTE, you could easily use a calendar table (but presumably you'd still need a way to define the overall date range you're after as opposed to all of time).
Example db<>fiddle
In SQL Server you can create a cursor that iterates over the dates. If it finds values for a given date, it takes those and stores them for later. In the next iteration, it can fall back to the stored values in case there are no values in the database for that date.
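The cursor idea is easier to see written out as a loop. Below is a rough sketch in Python with sqlite3, where a client-side loop stands in for a T-SQL cursor: walk every day from the first to the last date, remember the most recent values seen, and reuse them when a day has no row. The table name `readings` is made up for the illustration; the data is the sample from the question.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE readings (date TEXT PRIMARY KEY, value1 INTEGER, value2 INTEGER)"
)
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?)",
    [("2020-01-01", 50, 2), ("2020-01-04", 23, 5), ("2020-01-07", 14, 8)],
)

# Load the known rows, keyed by ISO date string.
known = {
    d: (v1, v2)
    for d, v1, v2 in conn.execute(
        "SELECT date, value1, value2 FROM readings ORDER BY date"
    )
}
start = date.fromisoformat(min(known))
end = date.fromisoformat(max(known))

filled = []
last = None  # values carried forward from the most recent day that had a row
day = start
while day <= end:
    key = day.isoformat()
    if key in known:
        last = known[key]  # a real row exists: remember it
    filled.append((key, *last))  # otherwise reuse the stored values
    day += timedelta(days=1)

for row in filled:
    print(row)
```

A row-by-row loop like this is usually slower than the set-based queries in the other answers, so treat it as a way to reason about the expected output rather than as the recommended approach.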

Uniform distribution of monthly budget to date

I have a monthly budget that I need to distribute evenly per day.
Datasource
Month  Budget
-----  ------
Jan    31
Feb    56
I want to smoothen out to
Date    Budget
------  ------
01-Jan  1
02-Jan  1
...     ...
01-Feb  2
02-Feb  2
...     ...
How can I do this?
Assuming the month is really a date on the first day of the month, a pretty simple method uses a recursive CTE:
    with cte as (
        select month as day, budget
        from t
        union all
        select dateadd(day, 1, day), budget
        from cte
        where day < eomonth(day)
    )
    select day, budget * 1.0 / day(eomonth(day))
    from cte
    order by day;
Here is a db<>fiddle.
Just another option using an ad-hoc tally/numbers table
This assumes the source MONTH is a string and the desired year is the current year.
Example or dbFiddle
    Declare @YourTable Table ([Month] varchar(50),[Budget] money)
    Insert Into @YourTable Values
     ('Jan',31)
    ,('Feb',56)
    Select Date   = DateFromParts(year(D),month(D),N)
          ,Budget = Budget / day(D)
    From @YourTable A
    Cross Apply ( values (EOMonth(try_convert(date,concat('01-',Month,'-',year(getdate())))))) B(D)
    Join (Select Top 31 N=Row_Number() Over (Order By (Select Null)) From master..spt_values n1) C
      on N<=day(D)
Results
Date Budget
2021-01-01 1.00
2021-01-02 1.00
...
2021-01-30 1.00
2021-01-31 1.00
2021-02-01 2.00
...
2021-02-27 2.00
2021-02-28 2.00

Sum and segment overlapping date ranges

Our HR system specifies employee assignments, which can be concurrent. Our rostering system only allows one summary assignment for a person. Therefore I need to pre-process the HR records, so rostering can determine the number of shifts a worker is expected to work on a given day.
Looking just at worker A who has two assignments, the first is for a quarter shift and the second for a half shift, but overlapping in the middle where they work .75 shifts.
Person StartDate EndDate Shifts
A 01/01/21 04/01/21 .25
A 03/01/21 06/01/21 .5
01---02---03---04---05---06---07
Rec 1 |------------------|
Rec 2 | |===================|
Total | 0.25 | 0.75 | 0.5 |
Required output.
Person StartDate EndDate ShiftCount
A 01/01/21 02/01/21 0.25
A 03/01/21 04/01/21 0.75
A 05/01/21 06/01/21 0.5
Given this data, how do we sum and segment the data? I found an exact question for MySQL but the version was too early and code was suggested. I also found a Postgres solution but we don't have ranges.
select * from (
values
('A','01/01/21','04/01/21',0.25),
('A','03/01/21','05/01/21',0.5)
) AS Data (Person,StartDate,EndDate,Shifts);
It looks like a Gaps-and-Islands to me.
If it helps, cte1 is used to expand the date ranges via an ad-hoc tally table. Then cte2 is used to create the Gaps-and-Islands. The final result is then a small matter of aggregation.
Example
Set Dateformat DMY
    Declare @YourTable table (Person varchar(50),StartDate Date,EndDate date,Shifts decimal(10,2))
    Insert Into @YourTable values
     ('A','01/01/21','04/01/21',0.25)
    ,('A','03/01/21','05/01/21',0.5)
    ;with cte1 as (
        Select [Person]
              ,[d] = dateadd(DAY,N,StartDate)
              ,Shifts = sum(Shifts)
        From @YourTable A
        Join (
            Select Top 1000 N=-1+Row_Number() Over (Order By (Select Null))
            From master..spt_values n1,master..spt_values n2
        ) B on N <= datediff(DAY,[StartDate],[EndDate])
        Group By Person,dateadd(DAY,N,StartDate)
    ), cte2 as (
        Select *
              ,Grp = datediff(day,'1900-01-01',d)-row_number() over (partition by Person,Shifts Order by d)
        From cte1
    )
    Select Person
          ,StartDate = min(d)
          ,EndDate = max(d)
          ,Shifts = max(Shifts)
    From cte2
    Group By Person,Grp
Returns
Person StartDate EndDate Shifts
A 2021-01-01 2021-01-02 0.25
A 2021-01-03 2021-01-04 0.75
A 2021-01-05 2021-01-05 0.50
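For anyone who wants to step through the two stages (expand to days, then gaps-and-islands) outside SQL Server, here is a hedged sketch using Python's sqlite3. It swaps the tally table for a recursive CTE and uses SQLite's julianday() as the day number, so it is a translation of the idea rather than the query above. The table name `assignments` and the ISO-formatted dates are assumptions for the illustration, and window functions need SQLite 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE assignments (person TEXT, start_date TEXT, end_date TEXT, shifts REAL)"
)
conn.executemany(
    "INSERT INTO assignments VALUES (?, ?, ?, ?)",
    [
        ("A", "2021-01-01", "2021-01-04", 0.25),
        ("A", "2021-01-03", "2021-01-05", 0.5),
    ],
)

query = """
WITH RECURSIVE days(person, d, end_date, shifts) AS (
    -- stage 1a: expand every assignment into one row per day
    SELECT person, start_date, end_date, shifts FROM assignments
    UNION ALL
    SELECT person, date(d, '+1 day'), end_date, shifts
    FROM days
    WHERE d < end_date
),
daily AS (
    -- stage 1b: total shifts per person per day
    SELECT person, d, SUM(shifts) AS shifts
    FROM days
    GROUP BY person, d
),
islands AS (
    -- stage 2: consecutive days with the same total share the same grp value
    SELECT person, d, shifts,
           CAST(julianday(d) AS INTEGER)
             - ROW_NUMBER() OVER (PARTITION BY person, shifts ORDER BY d) AS grp
    FROM daily
)
SELECT person, MIN(d) AS start_date, MAX(d) AS end_date, shifts
FROM islands
GROUP BY person, shifts, grp
ORDER BY start_date
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The grouping trick relies on the day number and the row number both increasing by one on consecutive days, so their difference stays constant within a run and changes when there is a gap or the total changes.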

How do you get the sum of values by day, but day in columns SQL

I have a table that looks like the one below, where day, client_name and order_value are stored.
select day, client_name, order_value
from sample_table
day         client_name  order_value
----------  -----------  -----------
2021-01-01  A            100
2021-01-01  A            100
2021-01-02  A            200
2021-01-03  A            100
2021-01-01  B            300
2021-01-01  B            400
2021-01-01  C            500
2021-01-02  C            500
2021-01-02  C            500
and I want to get the sum of order_value per client by day, but days in columns.
Basically, I want my result to come out something like this.
client_name  2021-01-01  2021-01-02  2021-01-03
-----------  ----------  ----------  ----------
A            200         200         100
B            700         Null        Null
C            500         1000        Null
If you know what the days are, you can use conditional aggregation:
    select client_name,
           sum(case when day = '2021-01-01' then order_value end) as date_20210101,
           sum(case when day = '2021-01-02' then order_value end) as date_20210102,
           sum(case when day = '2021-01-03' then order_value end) as date_20210103
    from t
    group by client_name ;
If you don't know the specific dates (i.e., you want them based on the data or a variable number), then you need to use dynamic SQL. That means that you construct the SQL statement as a string and then execute it.
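To make the "construct the statement as a string, then execute it" step concrete, here is a small sketch in Python with sqlite3. It reads the distinct days, builds one SUM(CASE ...) column per day, and runs the generated statement. This only illustrates the pattern: in SQL Server itself you would typically build the string in T-SQL and run it with sp_executesql, quoting generated names with QUOTENAME. The `date_YYYYMMDD` alias scheme follows the query above; the rest of the setup is assumed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sample_table (day TEXT, client_name TEXT, order_value INTEGER)"
)
conn.executemany(
    "INSERT INTO sample_table VALUES (?, ?, ?)",
    [
        ("2021-01-01", "A", 100), ("2021-01-01", "A", 100),
        ("2021-01-02", "A", 200), ("2021-01-03", "A", 100),
        ("2021-01-01", "B", 300), ("2021-01-01", "B", 400),
        ("2021-01-01", "C", 500), ("2021-01-02", "C", 500),
        ("2021-01-02", "C", 500),
    ],
)

# Step 1: find out which days exist in the data.
days = [r[0] for r in conn.execute("SELECT DISTINCT day FROM sample_table ORDER BY day")]


def alias_for(day: str) -> str:
    # Keep only digits so the generated column name is always a safe identifier.
    return "date_" + "".join(ch for ch in day if ch.isdigit())


# Step 2: build one conditional aggregate per day; the day values themselves
# are passed as bound parameters, never spliced into the SQL text.
columns = ",\n       ".join(
    f"SUM(CASE WHEN day = ? THEN order_value END) AS {alias_for(d)}" for d in days
)
sql = (
    f"SELECT client_name,\n       {columns}\n"
    "FROM sample_table\nGROUP BY client_name\nORDER BY client_name"
)

# Step 3: execute the generated statement.
print(sql)
rows = conn.execute(sql, days).fetchall()
for row in rows:
    print(row)
```

Because the column list depends on the data, the shape of the result can change between runs, which is the main thing callers of a dynamic pivot need to be prepared for.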

Find From/To Dates across multiple rows - SQL Postgres

I want to be able to "book" within range of dates, but you can't book across gaps of days. So booking across multiple rates is fine as long as they are contiguous.
I am happy to change data structure/index, if there are better ways of storing start/end ranges.
So far I have a "rates" table which contains Start/End Periods of time with a daily rate.
e.g. Rates Table.
ID Price From To
1 75.00 2015-04-12 2016-04-15
2 100.00 2016-04-16 2016-04-17
3 50.00 2016-04-18 2016-04-30
For the above data I would want to return:
From To
2015-04-12 2016-04-30
For simplicity's sake, it is safe to assume that dates are safely consecutive. For contiguous dates, To is always 1 day before the next From.
For the case there is only 1 row, I would want it to return the From/To of that single row.
Also to clarify if I had the following data:
ID Price From To
1 75.00 2015-04-12 2016-04-15
2 100.00 2016-04-17 2016-04-18
3 50.00 2016-04-19 2016-04-30
4 50.00 2016-05-01 2016-05-21
Meaning where there is a gap >= 1 day it would count as a separate range.
In which case I would expect the following:
From To
2015-04-12 2016-04-15
2016-04-17 2016-05-21
Edit 1
After playing around I have come up with the following SQL which seems to work. Although I'm not sure if there are better ways/issues with it?
    WITH grouped_rates AS
      (SELECT
         from_date,
         to_date,
         SUM(grp_start) OVER (ORDER BY from_date, to_date) AS grp
       FROM (SELECT
               from_date,
               to_date,
               CASE WHEN (from_date - INTERVAL '1 DAY') = lag(to_date)
                         OVER (ORDER BY from_date, to_date)
                    THEN 0
                    ELSE 1
               END AS grp_start
             FROM rates
             GROUP BY from_date, to_date) AS start_groups)
    SELECT
      min(from_date) AS from_date,
      max(to_date) AS to_date
    FROM grouped_rates
    GROUP BY grp;
This is identifying contiguous overlapping groups in the data. One approach is to find where each group begins and then do a cumulative sum. The following query adds a flag indicating if a row starts a group:
    select r.*,
           (case when not exists (select 1
                                  from rates r2
                                  -- r2 starts earlier and reaches at least the day before r starts
                                  where (r2.from < r.from and r2.to >= r.from - 1) or
                                        (r2.from = r.from and r2.id < r.id)
                                 )
                 then 1 else 0 end) as StartFlag
    from rates r;
The or in the correlation condition is to handle the situation where intervals that define a group overlap on the start date for the interval.
You can then do a cumulative sum on this flag and aggregate by that sum:
    with r as (
        select r.*,
               (case when not exists (select 1
                                      from rates r2
                                      -- r2 starts earlier and reaches at least the day before r starts
                                      where (r2.from < r.from and r2.to >= r.from - 1) or
                                            (r2.from = r.from and r2.id < r.id)
                                     )
                     then 1 else 0 end) as StartFlag
        from rates r
    )
    select min(r.from), max(r.to)
    from (select r.*,
                 sum(r.StartFlag) over (order by r.from) as grp
          from r
         ) r
    group by grp;
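The flag-then-cumulative-sum idea can be checked on the second data set from the question with a short script. This is a sketch using Python's sqlite3, in SQLite dialect, and it takes the simpler route of comparing each row with the previous one via LAG, which only works under the question's assumption that ranges never overlap. Column names `from_date` and `to_date` are borrowed from the asker's own query; window functions need SQLite 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE rates (id INTEGER PRIMARY KEY, price REAL, from_date TEXT, to_date TEXT)"
)
conn.executemany(
    "INSERT INTO rates VALUES (?, ?, ?, ?)",
    [
        (1, 75.0, "2015-04-12", "2016-04-15"),
        (2, 100.0, "2016-04-17", "2016-04-18"),
        (3, 50.0, "2016-04-19", "2016-04-30"),
        (4, 50.0, "2016-05-01", "2016-05-21"),
    ],
)

query = """
WITH flagged AS (
    -- 1 when the row does not start the day after the previous row ends
    SELECT from_date, to_date,
           CASE WHEN date(from_date, '-1 day') = LAG(to_date) OVER (ORDER BY from_date)
                THEN 0 ELSE 1 END AS start_flag
    FROM rates
),
grouped AS (
    -- running total of the flags gives every contiguous run its own number
    SELECT from_date, to_date,
           SUM(start_flag) OVER (ORDER BY from_date) AS grp
    FROM flagged
)
SELECT MIN(from_date) AS from_date, MAX(to_date) AS to_date
FROM grouped
GROUP BY grp
ORDER BY from_date
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The first row has no predecessor, so LAG returns NULL, the comparison is not true, and the row is flagged as the start of a group.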
CREATE TABLE prices( id INTEGER NOT NULL PRIMARY KEY
, price MONEY
, date_from DATE NOT NULL
, date_upto DATE NOT NULL
);
-- some data (upper limit is EXCLUSIVE)
INSERT INTO prices(id, price, date_from, date_upto) VALUES
( 1, 75.00, '2015-04-12', '2016-04-16' )
,( 2, 100.00, '2016-04-17', '2016-04-19' )
,( 3, 50.00, '2016-04-19', '2016-05-01' )
,( 4, 50.00, '2016-05-01', '2016-05-22' )
;
-- SELECT * FROM prices;
-- Recursive query to "connect the dots"
WITH RECURSIVE rrr AS (
SELECT date_from, date_upto
, 1 AS nperiod
FROM prices p0
WHERE NOT EXISTS (SELECT * FROM prices nx WHERE nx.date_upto = p0.date_from) -- no preceding segment
UNION ALL
SELECT r.date_from, p1.date_upto
, 1+r.nperiod AS nperiod
FROM prices p1
JOIN rrr r ON p1.date_from = r.date_upto
)
SELECT * FROM rrr r
WHERE NOT EXISTS (SELECT * FROM prices nx WHERE nx.date_from = r.date_upto) -- no following segment
;
Result:
date_from | date_upto | nperiod
------------+------------+---------
2015-04-12 | 2016-04-16 | 1
2016-04-17 | 2016-05-22 | 3
(2 rows)