SQL query joining on existing date records and max date for missing records - sql

I have an items table with dates and values. As soon as the value gets to 1, there are no more records for that Itemid.
Item Table
Itemid ItemDate Value
1 2020-04-30 0.5
1 2020-05-31 0.75
1 2020-06-30 1.0
2 2020-05-31 0.6
2 2020-06-30 1.0
I want to join this with a simple date table
dateId EOMDate
1 2020-04-30
2 2020-05-31
3 2020-06-30
4 2020-07-31
5 2020-08-31
The result should produce one record for each date in the date table and for each item where the date is >= the Item date. Where there is an exact date match with the Item table, it will use that record from the item table. Where there is no matching record in the item table, then it uses the record with the Max(ItemDate) value, that exists in the item table.
So it should produce this:
Result EOMDate ItemDate Value
1 2020-04-30 2020-04-30 0.5
1 2020-05-31 2020-05-31 0.75
1 2020-06-30 2020-06-30 1.0
1 2020-07-31 2020-06-30 1.0
1 2020-08-31 2020-06-30 1.0
2 2020-05-31 2020-05-31 0.6
2 2020-06-30 2020-06-30 1.0
2 2020-07-31 2020-06-30 1.0
2 2020-08-31 2020-06-30 1.0
The item table has several hundred million rows, and the date table has 120 records (each month end for 10 years), so I need a solution that performs well. This has completely stumped me for some reason!
EDIT
My initial, non-working solution uses an outer apply:
select p.ItemId, p.ItemDate, d.EOMDate, p.Value
from (select ItemId, ItemDate, Value from Items) p
OUTER APPLY
(
SELECT EOMDate from dates
) d
order by p.ItemDate,d.EOMDate
However, it returns a table that has one record for each combination of Item date and EOM date. So in the above example, 20 records for ItemId 1 and 16 records for ItemId 2.
Here is the SQL to create the above example tables:
CREATE TABLE #Items (ItemId int, ItemDate date, [Value] float)
Insert into #Items (ItemId,ItemDate,[Value])
Values (1,'2020-04-30',0.5),(1,'2020-05-31',0.75),(1,'2020-06-30',1),(2,'2020-05-31',0.6),(2,'2020-06-30',1)
Create Table #dates (dateId int, EOMDate date)
Insert into #dates (dateId,EOMDate) Values (1,'2020-04-30'),(2,'2020-05-31'),(3,'2020-06-30'),(4,'2020-07-31'),(5,'2020-08-31')

One method uses apply:
select i.*, d.*
from (select ItemId, max(ItemDate) as max_date
from #Items
group by ItemId
) i outer apply
-- for each item, the first calendar date on or after its last ItemDate
(select top (1) d.*
from #dates d
where d.EOMDate >= i.max_date
order by d.EOMDate asc
) d

You can use cross join and analytical function as follows:
Select * from
(Select a.ItemId, d.EOMDate, i.ItemDate, i.Value,
-- order descending so that rn = 1 is the latest item record on or before the month end
Row_number() over (partition by a.ItemId, d.EOMDate order by i.ItemDate desc) as rn
From
(Select distinct ItemId from #Items) a
Cross join #dates d
join #Items i on i.ItemId = a.ItemId and d.EOMDate >= i.ItemDate) t
Where rn = 1
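The carry-forward rule from the question can be checked against the sample data with a small script. This is only a sketch under stated assumptions: it runs on Python's built-in sqlite3 rather than SQL Server, it swaps ROW_NUMBER for a correlated MAX subquery (a different but equivalent way to pick the latest ItemDate on or before each EOMDate), and the function name carry_forward is made up for illustration. It says nothing about performance on hundreds of millions of rows.

```python
import sqlite3

def carry_forward(items, dates):
    """For each month end, return each item's latest record on or before that date."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE Items (ItemId INT, ItemDate TEXT, Value REAL)")
    con.execute("CREATE TABLE Dates (dateId INT, EOMDate TEXT)")
    con.executemany("INSERT INTO Items VALUES (?, ?, ?)", items)
    con.executemany("INSERT INTO Dates VALUES (?, ?)", dates)
    rows = con.execute(
        """
        SELECT i.ItemId, d.EOMDate, i.ItemDate, i.Value
        FROM Dates d
        JOIN Items i
          -- keep only the item row whose date is the latest one <= the month end;
          -- ISO-8601 date strings compare correctly as text
          ON i.ItemDate = (SELECT MAX(i2.ItemDate)
                           FROM Items i2
                           WHERE i2.ItemId = i.ItemId
                             AND i2.ItemDate <= d.EOMDate)
        ORDER BY i.ItemId, d.EOMDate
        """
    ).fetchall()
    con.close()
    return rows

# Sample data copied from the question.
items = [(1, "2020-04-30", 0.5), (1, "2020-05-31", 0.75), (1, "2020-06-30", 1.0),
         (2, "2020-05-31", 0.6), (2, "2020-06-30", 1.0)]
dates = [(1, "2020-04-30"), (2, "2020-05-31"), (3, "2020-06-30"),
         (4, "2020-07-31"), (5, "2020-08-31")]

for row in carry_forward(items, dates):
    print(row)
```

Month ends that fall before an item's first record produce no row, because the subquery returns NULL and the join condition fails; that matches the missing 2020-04-30 row for ItemId 2 in the expected result.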

Related

Uniform distribution of monthly budget to date

I have a monthly budget that I need to distribute evenly across the days of each month.
Datasource
Month Budget
Jan 31
Feb 56
I want to smooth it out to:
Date Budget
01-Jan 1
02-Jan 1
... ...
01-Feb 2
02-Feb 2
... ...
How can I do this?
Assuming the month is really a date on the first day of the month, a pretty simple method uses a recursive CTE:
with cte as (
select month as day, budget
from t
union all
select dateadd(day, 1, day), budget
from cte
where day < eomonth(day)
)
select day, budget * 1.0 / day(eomonth(day))
from cte
order by day;
Just another option using an ad-hoc tally/numbers table
This assumes the source MONTH is a string and the desired year is the current year.
Create Table #YourTable ([Month] varchar(50),[Budget] money)
Insert Into #YourTable Values
('Jan',31)
,('Feb',56)
Select Date = DateFromParts(year(D),month(D),N)
,Budget = Budget / day(D)
From #YourTable A
Cross Apply ( values (EOMonth(try_convert(date,concat('01-',Month,'-',year(getdate())))))) B(D)
Join (Select Top 31 N=Row_Number() Over (Order By (Select Null)) From master..spt_values n1) C
on N<=day(D)
Results
Date Budget
2021-01-01 1.00
2021-01-02 1.00
...
2021-01-30 1.00
2021-01-31 1.00
2021-02-01 2.00
...
2021-02-27 2.00
2021-02-28 2.00
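The even split that both answers compute can also be sketched outside SQL. This is a minimal illustration, assuming the year is known (2021 here, to match the results above) and using a hypothetical helper name, spread_budget; it is not taken from either answer.

```python
import calendar
from datetime import date

def spread_budget(year, month, budget):
    """Split a monthly budget evenly across every day of that month."""
    days_in_month = calendar.monthrange(year, month)[1]
    per_day = budget / days_in_month
    return [(date(year, month, d), per_day) for d in range(1, days_in_month + 1)]

# Budgets from the question: Jan = 31, Feb = 56.
for month, budget in [(1, 31), (2, 56)]:
    daily = spread_budget(2021, month, budget)
    print(daily[0], daily[-1])
```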

Bigquery calculate split from the beginning of month until end of month/current date

I have a problem: I want to calculate the amount, split by product, from the beginning of each month until the end of the month (or the current date), like this:
Data:
Date_payment product amount
2020-02-01 aa 10
2020-02-01 aa 20
2020-02-03 bb 5
2020-02-29 bb 5
2020-03-01 aa 4
2020-03-03 aa 3
2020-03-03 bb 1
Let's say the current date is 2020-03-05.
I want to calculate the split by product and by month, from the beginning of the month until the end of the month or the current date.
My expected result is:
Date_Report product Total_amount
2020-02-01 aa 30
2020-02-02 aa 30
2020-02-03 aa 30
2020-02-03 bb 5
2020-02-04 aa 30
2020-02-04 bb 5
... (and so on until) ...
2020-02-29 aa 30
2020-02-29 bb 10
2020-03-01 aa 4
2020-03-02 aa 4
2020-03-03 aa 7
2020-03-03 bb 1
2020-03-04 aa 7
2020-03-04 bb 1
2020-03-05 aa 7
2020-03-05 bb 1
I want to see the total amount for each day, from the beginning of the month to the end of the month, for each month.
Can anyone help?
You need to generate the days, cross join with the products and bring in the existing data:
select report_date, p.product, coalesce(m.month_amount, 0) as month_amount
from unnest(generate_date_array(date('2020-01-01'), current_date, interval 1 day)) report_date cross join
(select distinct product from t) p left join
-- monthly totals per product, keyed by the first day of the payment month
(select date_trunc(Date_payment, month) as mon, product,
sum(amount) as month_amount
from t
group by 1, 2
) m
on m.mon = date_trunc(report_date, month) and m.product = p.product;
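Read literally, the expected output in the question is a month-to-date running total per product that resets on the first day of each month, with rows appearing only once a product has a non-zero total. A minimal Python sketch of that reading follows; the function name month_to_date_totals is hypothetical, and the "non-zero rows only" rule is an assumption inferred from the expected table, not something the question states outright.

```python
from datetime import date, timedelta

def month_to_date_totals(payments, today):
    """Return (day, product, running_total) rows; totals reset on the 1st of each month."""
    daily = {}
    for day, product, amount in payments:
        daily[(day, product)] = daily.get((day, product), 0) + amount
    products = sorted({product for _, product, _ in payments})
    # Start at the first day of the earliest payment month.
    day = min(d for d, _, _ in payments).replace(day=1)
    running = {}
    rows = []
    while day <= today:
        if day.day == 1:
            running = {p: 0 for p in products}  # new month: reset every product
        for p in products:
            running[p] += daily.get((day, p), 0)
            if running[p]:  # assumption: skip products with nothing paid yet this month
                rows.append((day, p, running[p]))
        day += timedelta(days=1)
    return rows

# Sample data copied from the question; the "current date" is 2020-03-05.
payments = [
    (date(2020, 2, 1), "aa", 10), (date(2020, 2, 1), "aa", 20),
    (date(2020, 2, 3), "bb", 5), (date(2020, 2, 29), "bb", 5),
    (date(2020, 3, 1), "aa", 4), (date(2020, 3, 3), "aa", 3),
    (date(2020, 3, 3), "bb", 1),
]
for row in month_to_date_totals(payments, today=date(2020, 3, 5)):
    print(row)
```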
The following query should do the job:
SELECT
DATE_TRUNC(Date_payment, MONTH) AS payment_month,
product,
SUM(amount) AS total_amount
FROM
transactions
GROUP BY
payment_month,
product
ORDER BY
payment_month,
product
Assuming your columns are as below :
create table "Example"
( "DatePayment" date,
"Product" nvarchar(5),
"Amount" int
);
Your logic could be as follows (this is written in SAP HANA SQLScript; please change it to BigQuery syntax):
DO BEGIN
DECLARE FIRSTDATE,LASTDATE DATE;
SELECT TO_DATE(YEAR(MIN("DatePayment"))||'-'||MONTH(MIN("DatePayment"))||'-01') INTO FIRSTDATE FROM "Example";
SELECT LAST_DAY(MAX("DatePayment")) INTO LASTDATE FROM "Example";
VAR1= SELECT D."DT",E."Product",case when S."Amount" is null then 0 else S."Amount" end as "Amount_CC" FROM
(select "GENERATED_PERIOD_START" AS "DT" from "SERIES_GENERATE_DATE"('INTERVAL 1 DAY' ,:FIRSTDATE, :LASTDATE ) ) D
CROSS JOIN ( SELECT DISTINCT "Product" FROM "Example" ) E
LEFT OUTER JOIN ( select "DatePayment","Product","Amount" from "Example" ) S
ON D."DT"=S."DatePayment" AND E."Product"=S."Product";
VAR2= select * , SUM("Amount_CC") OVER (PARTITION BY MONTH("DT"), "Product" ORDER BY "DT"
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW ) AS "Amount" from :VAR1 ;
select "DT" as "DatePayment","Product","Amount" from :VAR2 where "Amount" <>0 order by "DT","Product";
END;
You will get the result for all months in your dataset.

Fill Missing Dates for Running Total

I have this table
UserID Date Sale
A 2021-05-01 3
A 2021-05-03 1
A 2021-05-03 2
A 2021-05-05 5
B 2021-05-02 4
B 2021-05-03 10
What I need is something that looks like this.
UserID Date DailySale RunningSale
A 2021-05-01 3 3
A 2021-05-02 NULL 3
A 2021-05-03 3 6
A 2021-05-04 NULL 6
A 2021-05-05 5 11
B 2021-05-01 NULL 0
B 2021-05-02 4 4
B 2021-05-03 10 14
B 2021-05-04 NULL 14
B 2021-05-05 NULL 14
I need to join on itself with all the dates in a certain time period so I can create a running sum sales total by date.
I figured out how to do each part separately: I know how to do a running sum using (over partition by), and I know I can join a calendar table to my sales table to get the time period. But I want to try the self-join method by distinct(datetime), and I'm not certain how to go about that. I've tried this, but it doesn't work for me. I have over 1 million rows, so it takes over 2 minutes to finish processing, and the running-sum column looks exactly like the daily-sum column.
What's the best way to go about this?
Edit: Corrected Table Sums
You need a calendar table here containing all dates. Consider the following approach:
WITH dates AS (
SELECT '2021-05-01' AS Date UNION ALL
SELECT '2021-05-02' UNION ALL
SELECT '2021-05-03' UNION ALL
SELECT '2021-05-04' UNION ALL
SELECT '2021-05-05'
)
SELECT
u.UserID,
d.Date,
SUM(t.Sale) AS DailySale,
SUM(COALESCE(SUM(t.Sale), 0)) OVER (PARTITION BY u.UserID ORDER BY d.Date) AS RunningSale
FROM (SELECT DISTINCT UserID FROM yourTable) u
CROSS JOIN dates d
LEFT JOIN yourTable t
ON t.UserID = u.UserID AND t.Date = d.Date
GROUP BY
u.UserID,
d.Date
ORDER BY
u.UserID,
d.Date
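The shape of that query (every user crossed with every calendar date, a daily sum that stays NULL on days without sales, and a running sum that treats those days as 0) can be sketched in plain Python to check it against the expected table. The function name running_sales is hypothetical, and this is an illustration of the logic, not a replacement for the SQL.

```python
from datetime import date, timedelta

def running_sales(sales, start, end):
    """Return (user, day, daily_sale_or_None, running_total) for every user and day."""
    daily = {}
    for user, day, amount in sales:
        daily[(user, day)] = daily.get((user, day), 0) + amount
    rows = []
    for user in sorted({u for u, _, _ in sales}):
        total = 0
        day = start
        while day <= end:
            sale = daily.get((user, day))  # None plays the role of NULL
            total += sale or 0             # COALESCE(sale, 0) for the running sum
            rows.append((user, day, sale, total))
            day += timedelta(days=1)
    return rows

# Sample data copied from the question.
sales = [
    ("A", date(2021, 5, 1), 3), ("A", date(2021, 5, 3), 1),
    ("A", date(2021, 5, 3), 2), ("A", date(2021, 5, 5), 5),
    ("B", date(2021, 5, 2), 4), ("B", date(2021, 5, 3), 10),
]
for row in running_sales(sales, date(2021, 5, 1), date(2021, 5, 5)):
    print(row)
```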

Given two date ranged discount tables and product price, calculate date ranged final price

I have two tables with seasonal discounts. In each of these two tables are non overlapping date ranges, product id and discount that applies in that date range. Date ranges from one table however may overlap with date ranges in the other table. Given a third table with product id and its default price, the goal is to efficiently calculate seasonal - date ranged prices for product id after discounts from both tables have been applied.
Discounts multiply only in their overlapping period, e.g. if a first discount is 0.9 (10%) from 2019-07-01 to 2019-07-30, and a second discount is 0.8 from 2019-07-16 to 2019-08-15, this translates to: 0.9 discount from 2019-07-01 to 2019-07-15, 0.72 discount from 2019-07-16 to 2019-07-30, and 0.8 discount from 2019-07-31 to 2019-08-15.
I have managed to come up with a solution: first generate a table that holds, in order, all start and end dates from both discount tables; then generate a table of all smallest disjoint intervals; and then, for each interval, generate all prices (the default price, the price with only the discount from the first table applied, if any applies, the price with only the discount from the second table applied, if any applies, and the price with both discounts applied, if possible) and take the min of these four prices. See the sample code below.
create table #pricesDefault (product_id int, price decimal)
insert into #pricesDefault
values
(1, 100),
(2, 120),
(3, 200),
(4, 50)
create table #discountTypeA (product_id int, modifier decimal(4,2), startdate datetime, enddate datetime)
insert into #discountTypeA
values
(1, 0.75, '2019-06-06', '2019-07-06'),
(1, 0.95, '2019-08-06', '2019-08-20'),
(1, 0.92, '2019-05-06', '2019-06-05'),
(2, 0.75, '2019-06-08', '2019-07-19'),
(2, 0.95, '2019-07-20', '2019-09-20'),
(3, 0.92, '2019-05-06', '2019-06-05')
create table #discountTypeB (product_id int, modifier decimal(4,2), startdate datetime, enddate datetime)
insert into #discountTypeB
values
(1, 0.85, '2019-06-20', '2019-07-03'),
(1, 0.65, '2019-08-10', '2019-08-29'),
(1, 0.65, '2019-09-10', '2019-09-27'),
(3, 0.75, '2019-05-08', '2019-05-19'),
(2, 0.95, '2019-05-20', '2019-05-21'),
(3, 0.92, '2019-09-06', '2019-09-09')
create table #pricingPeriod (product_id int, discountedPrice decimal, startdate datetime, enddate datetime);
with allDates(product_id, dt) as
(select distinct product_id, dta.startdate from #discountTypeA dta
union all
select distinct product_id, dta.enddate from #discountTypeA dta
union all
select distinct product_id, dtb.startdate from #discountTypeB dtb
union all
select distinct product_id, dtb.enddate from #discountTypeB dtb
),
allproductDatesWithId as
(select product_id, dt, row_number() over (partition by product_id order by dt asc) 'Id'
from allDates),
sched as
(select pd.product_id, apw1.dt startdate, apw2.dt enddate
from #pricesDefault pd
join allproductDatesWithId apw1 on apw1.product_id = pd.product_id
join allproductDatesWithId apw2 on apw2.product_id = pd.product_id and apw2.Id= apw1.Id+1
),
discountAppliedTypeA as(
select sc.product_id, sc.startdate, sc.enddate,
min(case when sc.startdate >= dta.startdate and dta.enddate >= sc.enddate then pd.price * dta.modifier else pd.price end ) 'price'
from sched sc
join #pricesDefault pd on pd.product_id = sc.product_id
left join #discountTypeA dta on sc.product_id = dta.product_id
group by sc.product_id, sc.startdate , sc.enddate ),
discountAppliedTypeB as(
select daat.product_id, daat.startdate, daat.enddate,
min(case when daat.startdate >= dta.startdate and dta.enddate >= daat.enddate then daat.price * dta.modifier else daat.price end ) 'price'
from discountAppliedTypeA daat
left join #discountTypeB dta on daat.product_id = dta.product_id
group by daat.product_id, daat.startdate , daat.enddate )
select * from discountAppliedTypeB
order by product_id, startdate
Calculating a min of all possible prices is unnecessary overhead. I'd like to generate just one resulting price and use it as the final price.
Here is the resulting set:
product_id start_date end_date final_price
1 2019-05-06 00:00:00.000 2019-06-05 00:00:00.000 92.0000
1 2019-06-05 00:00:00.000 2019-06-06 00:00:00.000 100.0000
1 2019-06-06 00:00:00.000 2019-06-20 00:00:00.000 75.0000
1 2019-06-20 00:00:00.000 2019-07-03 00:00:00.000 63.7500
1 2019-07-03 00:00:00.000 2019-07-06 00:00:00.000 75.0000
1 2019-07-06 00:00:00.000 2019-08-06 00:00:00.000 100.0000
1 2019-08-06 00:00:00.000 2019-08-10 00:00:00.000 95.0000
1 2019-08-10 00:00:00.000 2019-08-20 00:00:00.000 61.7500
1 2019-08-20 00:00:00.000 2019-08-29 00:00:00.000 65.0000
1 2019-08-29 00:00:00.000 2019-09-10 00:00:00.000 100.0000
1 2019-09-10 00:00:00.000 2019-09-27 00:00:00.000 65.0000
2 2019-05-20 00:00:00.000 2019-05-21 00:00:00.000 114.0000
2 2019-05-21 00:00:00.000 2019-06-08 00:00:00.000 120.0000
2 2019-06-08 00:00:00.000 2019-07-19 00:00:00.000 90.0000
2 2019-07-19 00:00:00.000 2019-07-20 00:00:00.000 120.0000
2 2019-07-20 00:00:00.000 2019-09-20 00:00:00.000 114.0000
3 2019-05-06 00:00:00.000 2019-05-08 00:00:00.000 184.0000
3 2019-05-08 00:00:00.000 2019-05-19 00:00:00.000 138.0000
3 2019-05-19 00:00:00.000 2019-06-05 00:00:00.000 184.0000
3 2019-06-05 00:00:00.000 2019-09-06 00:00:00.000 200.0000
3 2019-09-06 00:00:00.000 2019-09-09 00:00:00.000 184.0000
Is there a more efficient solution that I am not seeing?
I have a large data set: ~20K rows in the real product prices table, and 100K-200K rows in each of the discount tables.
The indexing structure of the actual tables is as follows: product id is the clustered index in the product prices table, whilst the discount tables have an Id surrogate column as the clustered index (as well as primary key), and (product_id, start_date, end_date) as a nonclustered index.
You can generate the dates using union. Then bring in all discounts that are valid on that date, and calculate the total.
This looks like:
with prices as (
select a.product_id, v.dte
from #discountTypeA a cross apply
(values (a.startdate), (a.enddate)) v(dte)
union -- on purpose to remove duplicates
select b.product_id, v.dte
from #discountTypeB b cross apply
(values (b.startdate), (b.enddate)) v(dte)
),
p as (
select p.*, 1-a.modifier as a_discount, 1-b.modifier as b_discount, pd.price
from prices p left join
#pricesDefault pd
on pd.product_id = p.product_id left join
#discountTypeA a
on p.product_id = a.product_id and
p.dte >= a.startdate and p.dte < a.enddate left join
#discountTypeB b
on p.product_id = b.product_id and
p.dte >= b.startdate and p.dte < b.enddate
)
select p.product_id, price * (1 - coalesce(a_discount, 0)) * (1 - coalesce(b_discount, 0)) as price, a_discount, b_discount,
dte as startdate, lead(dte) over (partition by product_id order by dte) as enddate
from p
order by product_id, dte;
Here is a version that works out the price for every date. You can then either use this directly, or use one of the many solutions on SO for working out date ranges.
In this example I have hard coded the date limits, but you could easily read them from your tables if you prefer.
I haven't done any performance testing on this, but give it a go. It's quite a bit simpler, so if you have the right indexes it might be quicker.
;with dates as (
select convert(datetime,'2019-05-06') as d
union all
select d+1 from dates where d<'2019-09-27'
)
select pricesDefault.product_id, d, pricesDefault.price as baseprice,
discountA.modifier as dA,
discountB.modifier as dB,
pricesDefault.price*isnull(discountA.modifier,1)*isnull(discountB.modifier,1) as finalprice
from #pricesDefault pricesDefault
cross join dates
left join #discountTypeA discountA on discountA.product_id=pricesDefault.product_id and d between discountA.startdate and discountA.enddate
left join #discountTypeB discountB on discountB.product_id=pricesDefault.product_id and d between discountB.startdate and discountB.enddate
order by pricesDefault.product_id, d
Option (MaxRecursion 1000)
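The per-day approach, followed by collapsing consecutive days that share the same combined modifier back into ranges, can be sketched as follows. It is a minimal illustration that reuses the worked example from the question (0.9 from 2019-07-01 to 2019-07-30 and 0.8 from 2019-07-16 to 2019-08-15) and assumes inclusive end dates and non-overlapping ranges within each discount list; the function name combined_ranges is hypothetical. Decimal is used so that 0.9 * 0.8 is exactly 0.72.

```python
from datetime import date, timedelta
from decimal import Decimal

def combined_ranges(discounts_a, discounts_b):
    """Each discount is (start, end_inclusive, modifier); returns merged daily ranges."""
    everything = discounts_a + discounts_b
    if not everything:
        return []
    first = min(start for start, _, _ in everything)
    last = max(end for _, end, _ in everything)

    def modifier_on(day, discounts):
        # Ranges within one list do not overlap, so the first hit is the only hit.
        for start, end, modifier in discounts:
            if start <= day <= end:
                return modifier
        return Decimal(1)  # no discount of this type on that day

    ranges = []
    day = first
    while day <= last:
        combined = modifier_on(day, discounts_a) * modifier_on(day, discounts_b)
        if ranges and ranges[-1][2] == combined:
            ranges[-1] = (ranges[-1][0], day, combined)  # extend the current range
        else:
            ranges.append((day, day, combined))          # start a new range
        day += timedelta(days=1)
    return ranges

type_a = [(date(2019, 7, 1), date(2019, 7, 30), Decimal("0.9"))]
type_b = [(date(2019, 7, 16), date(2019, 8, 15), Decimal("0.8"))]
for start, end, modifier in combined_ranges(type_a, type_b):
    print(start, end, modifier)
```

Days covered by neither discount come out as ranges with a modifier of 1, so multiplying the default price by the modifier gives the final price for every range.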

Find From/To Dates across multiple rows - SQL Postgres

I want to be able to "book" within a range of dates, but you can't book across gaps of days. So booking across multiple rates is fine as long as they are contiguous.
I am happy to change data structure/index, if there are better ways of storing start/end ranges.
So far I have a "rates" table which contains Start/End Periods of time with a daily rate.
e.g. Rates Table.
ID Price From To
1 75.00 2015-04-12 2016-04-15
2 100.00 2016-04-16 2016-04-17
3 50.00 2016-04-18 2016-04-30
For the above data I would want to return:
From To
2015-04-12 2016-04-30
For simplicity's sake, it is safe to assume that the dates are in order and do not overlap. For contiguous rows, To is always 1 day before the next row's From.
For the case there is only 1 row, I would want it to return the From/To of that single row.
Also to clarify if I had the following data:
ID Price From To
1 75.00 2015-04-12 2016-04-15
2 100.00 2016-04-17 2016-04-18
3 50.00 2016-04-19 2016-04-30
4 50.00 2016-05-01 2016-05-21
Meaning where there is a gap >= 1 day it would count as a separate range.
In which case I would expect the following:
From To
2015-04-12 2016-04-15
2016-04-17 2016-05-21
Edit 1
After playing around I have come up with the following SQL which seems to work. Although I'm not sure if there are better ways/issues with it?
WITH grouped_rates AS
(SELECT
from_date,
to_date,
-- running count of group starts; "group" is a reserved word, so alias it as grp
SUM(grp_start) OVER (ORDER BY from_date, to_date) AS grp
FROM (SELECT
from_date,
to_date,
CASE WHEN (from_date - INTERVAL '1 DAY') = lag(to_date)
OVER (ORDER BY from_date, to_date)
THEN 0
ELSE 1
END AS grp_start
FROM rates
GROUP BY from_date, to_date) AS start_groups)
SELECT
min(from_date) AS from_date,
max(to_date) AS to_date
FROM grouped_rates
GROUP BY grp;
This is identifying contiguous overlapping groups in the data. One approach is to find where each group begins and then do a cumulative sum. The following query adds a flag indicating if a row starts a group:
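select r.*,
-- a row starts a group when no earlier row reaches up to the day before it
(case when not exists (select 1
from rates r2
where (r2.from_date < r.from_date and r2.to_date >= r.from_date - 1) or
(r2.from_date = r.from_date and r2.id < r.id)
)
then 1 else 0 end) as StartFlag
from rates r;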
The or in the correlation condition is to handle the situation where intervals that define a group overlap on the start date for the interval.
You can then do a cumulative sum on this flag and aggregate by that sum:
with r as (
select r.*,
-- a row starts a group when no earlier row reaches up to the day before it
(case when not exists (select 1
from rates r2
where (r2.from_date < r.from_date and r2.to_date >= r.from_date - 1) or
(r2.from_date = r.from_date and r2.id < r.id)
)
then 1 else 0 end) as StartFlag
from rates r
)
select min(from_date), max(to_date)
from (select r.*,
sum(r.StartFlag) over (order by r.from_date) as grp
from r
) r
group by grp;
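The start-flag idea can also be sketched as a single ordered pass: sort the rows by start date, and open a new group whenever a row starts more than one day after the current group ends. This is a minimal illustration assuming inclusive To dates, as in the question; the function name merge_contiguous is hypothetical.

```python
from datetime import date, timedelta

def merge_contiguous(rates):
    """rates: iterable of (from_date, to_date) with inclusive ends; returns merged ranges."""
    merged = []
    for start, end in sorted(rates):
        if merged and start <= merged[-1][1] + timedelta(days=1):
            # Touches or overlaps the current group: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            # Gap of at least one day: this row starts a new group.
            merged.append((start, end))
    return merged

# Second data set from the question (gap between 2016-04-15 and 2016-04-17).
rates = [
    (date(2015, 4, 12), date(2016, 4, 15)),
    (date(2016, 4, 17), date(2016, 4, 18)),
    (date(2016, 4, 19), date(2016, 4, 30)),
    (date(2016, 5, 1), date(2016, 5, 21)),
]
for start, end in merge_contiguous(rates):
    print(start, end)
```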
CREATE TABLE prices( id INTEGER NOT NULL PRIMARY KEY
, price MONEY
, date_from DATE NOT NULL
, date_upto DATE NOT NULL
);
-- some data (upper limit is EXCLUSIVE)
INSERT INTO prices(id, price, date_from, date_upto) VALUES
( 1, 75.00, '2015-04-12', '2016-04-16' )
,( 2, 100.00, '2016-04-17', '2016-04-19' )
,( 3, 50.00, '2016-04-19', '2016-05-01' )
,( 4, 50.00, '2016-05-01', '2016-05-22' )
;
-- SELECT * FROM prices;
-- Recursive query to "connect the dots"
WITH RECURSIVE rrr AS (
SELECT date_from, date_upto
, 1 AS nperiod
FROM prices p0
WHERE NOT EXISTS (SELECT * FROM prices nx WHERE nx.date_upto = p0.date_from) -- no preceding segment
UNION ALL
SELECT r.date_from, p1.date_upto
, 1+r.nperiod AS nperiod
FROM prices p1
JOIN rrr r ON p1.date_from = r.date_upto
)
SELECT * FROM rrr r
WHERE NOT EXISTS (SELECT * FROM prices nx WHERE nx.date_from = r.date_upto) -- no following segment
;
Result:
date_from | date_upto | nperiod
------------+------------+---------
2015-04-12 | 2016-04-16 | 1
2016-04-17 | 2016-05-22 | 3
(2 rows)