Propagate missing dates in Teradata - select query - sql

I have a table that looks like this:
my_date | item_id | sales
2020-03-01 | GMZS72429 | 2
2020-03-07 | GMZS72429 | 2
2020-03-09 | GMZS72429 | 1
2020-03-04 | GMZS72425 | 1
And I want it to look like this:
my_date | item_id | sales
2020-03-01 | GMZS72429 | 2
2020-03-02 | GMZS72429 | 0
... | ... | ...
2020-03-05 | GMZS72429 | 0
2020-03-06 | GMZS72429 | 0
2020-03-07 | GMZS72429 | 2
2020-03-08 | GMZS72429 | 0
2020-03-09 | GMZS72429 | 1
2020-03-01 | GMZS72425 | 0
2020-03-02 | GMZS72425 | 0
2020-03-03 | GMZS72425 | 0
2020-03-04 | GMZS72425 | 1
... | ... | ...
2020-03-09 | GMZS72425 | 0
Since I was struggling with the documentation from Teradata, I have tried generating the pair item_id - my_date using another table, followed by a left join:
with a1 as(
select distinct my_date, item_id from some_table_with_the_item_ids_and_all_dates
)
select a1.my_date, a1.item_id, coalesce(sales, 0) as sales
from a1 left join my_table on a1.item_id=my_table.item_id and a1.my_date=my_table.my_date;
This worked but it is terribly slow, and ugly. I was wondering if there is a better built-in (or alternative) method to do this. Thanks

This is a use case for Teradata's EXPAND ON syntax:
select
    new_date
    ,item_id
    ,case when my_date = new_date then sales else 0 end as sales
from
 (
   select dt.*, begin(p2) as new_date
   from
    (
      select t.*
         -- create a period for expansion in the next step
         ,period(my_date, lead(my_date, 1, my_date+1)
                          over (partition by item_id
                                order by my_date)) as pd
      from vt as t
    ) as dt
   -- now create the missing dates
   expand on pd as p2
 ) as dt

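One simple option is to use Teradata's built-in calendar view as your driver, cross joined with the distinct item ids so that every item gets a row for every date:
select
    c.calendar_date as my_date,
    i.item_id,
    coalesce(v.sales, 0) as sales
from
    sys_calendar.calendar c
    cross join (select distinct item_id from your_table) i
    left join your_table v
        on v.my_date = c.calendar_date
       and v.item_id = i.item_id
where
    c.calendar_date between (select min(my_date) from your_table) and (select max(my_date) from your_table)
order by 2, 1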
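For anyone who wants to check the expected result without a Teradata session, here is a minimal Python sketch of the same idea: build every (date, item) pair between the first and last date, then look up sales with a default of 0. The function name `fill_missing_dates` is made up for illustration, and the rows are the sample data from the question.

```python
from datetime import date, timedelta

# Sample rows from the question: (my_date, item_id, sales).
rows = [
    (date(2020, 3, 1), "GMZS72429", 2),
    (date(2020, 3, 7), "GMZS72429", 2),
    (date(2020, 3, 9), "GMZS72429", 1),
    (date(2020, 3, 4), "GMZS72425", 1),
]


def fill_missing_dates(rows):
    """Return one (my_date, item_id, sales) row per item per calendar day."""
    first = min(r[0] for r in rows)
    last = max(r[0] for r in rows)
    sales = {(d, item): s for d, item, s in rows}
    items = sorted({item for _, item, _ in rows})
    days = [first + timedelta(days=n) for n in range((last - first).days + 1)]
    # Cross join items x days, then "left join" the known sales, defaulting to 0.
    return [(d, item, sales.get((d, item), 0)) for item in items for d in days]


filled = fill_missing_dates(rows)
for row in filled:
    print(*row)
```

With the sample data this yields 9 days for each of the 2 items, which is the shape of the desired output in the question.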

Related

SQL query joining on existing date records and max date for missing records

I have an items table with dates and values. As soon as the value gets to 1, there are no more records for that Itemid.
Item Table
Itemid ItemDate Value
1 2020-04-30 0.5
1 2020-05-31 0.75
1 2020-06-30 1.0
2 2020-05-31 0.6
2 2020-06-30 1.0
I want to join this with a simple date table
dateId EOMDate
1 2020-04-30
2 2020-05-31
3 2020-06-30
4 2020-07-31
5 2020-08-31
The result should produce one record for each date in the date table and for each item where the date is >= the Item date. Where there is an exact date match with the Item table, it will use that record from the item table. Where there is no matching record in the item table, then it uses the record with the Max(ItemDate) value, that exists in the item table.
So it should produce this:
ItemId EOMDate ItemDate Value
1 2020-04-30 2020-04-30 0.5
1 2020-05-31 2020-05-31 0.75
1 2020-06-30 2020-06-30 1.0
1 2020-07-31 2020-06-30 1.0
1 2020-08-31 2020-06-30 1.0
2 2020-05-31 2020-05-31 0.6
2 2020-06-30 2020-06-30 1.0
2 2020-07-31 2020-06-30 1.0
2 2020-08-31 2020-06-30 1.0
The item table has several hundred million rows, and the date table has 120 records (each month end for 10 years), so I need a well-performing solution. This has completely stumped me for some reason!
EDIT
My initial, non-working solution uses an outer apply:
select p.ItemId, p.ItemDate, d.EOMDate, p.Value
from (select ItemId, ItemDate, Value from Items) p
OUTER APPLY
(
SELECT EOMDate from dates
) d
order by p.ItemDate,d.EOMDate
However, it returns a table that has one record for each combination of Item date and EOM date. So in the above example, 20 records for ItemId 1 and 16 records for ItemId 2.
Here is the SQL to create the above example tables:
CREATE TABLE #Items (ItemId int, ItemDate date, [Value] float)
Insert into #Items (ItemId,ItemDate,[Value])
Values (1,'2020-04-30',0.5),(1,'2020-05-31',0.75),(1,'2020-06-30',1),(2,'2020-05-31',0.6),(2,'2020-06-30',1)
Create Table #dates (dateId int, EOMDate date)
Insert into #dates (dateId,EOMDate) Values (1,'2020-04-30'),(2,'2020-05-31'),(3,'2020-06-30'),(4,'2020-07-31'),(5,'2020-08-31')
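One method uses apply. For each item, take the dates on or after its first ItemDate, then pick the latest item row on or before each of those dates:
select i.ItemId, d.EOMDate, v.ItemDate, v.Value
from (select ItemId, min(ItemDate) as min_date
      from Items
      group by ItemId
     ) i join
     dates d
     on d.EOMDate >= i.min_date cross apply
     (select top (1) it.ItemDate, it.Value
      from Items it
      where it.ItemId = i.ItemId and it.ItemDate <= d.EOMDate
      order by it.ItemDate desc
     ) v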
You can use a cross join and an analytic function as follows. The row number has to be ordered by ItemDate descending, so that rn = 1 is the latest item row on or before each EOMDate:
Select * from
(Select a.ItemId, d.EOMDate, i.ItemDate, i.Value,
Row_number() over (partition by a.ItemId, d.EOMDate order by i.ItemDate desc) as rn
From
(Select distinct ItemId from Items) a
Cross join Dates d
join Items i on i.ItemId = a.ItemId and d.EOMDate >= i.ItemDate) t
Where rn = 1
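The rule the question asks for is an "as-of" lookup: for each item and each month end on or after the item's first date, take the latest item row whose ItemDate is not after that month end. A minimal Python sketch of that rule, using the sample data from the question (the function name `as_of_join` is hypothetical), can be used to check a query's output:

```python
from bisect import bisect_right
from datetime import date

# Sample data from the question: (ItemId, ItemDate, Value) and the EOM dates.
items = [
    (1, date(2020, 4, 30), 0.5),
    (1, date(2020, 5, 31), 0.75),
    (1, date(2020, 6, 30), 1.0),
    (2, date(2020, 5, 31), 0.6),
    (2, date(2020, 6, 30), 1.0),
]
eom_dates = [date(2020, 4, 30), date(2020, 5, 31), date(2020, 6, 30),
             date(2020, 7, 31), date(2020, 8, 31)]


def as_of_join(items, eom_dates):
    """Return (ItemId, EOMDate, ItemDate, Value) rows using an as-of lookup."""
    by_item = {}
    for item_id, item_date, value in sorted(items):
        by_item.setdefault(item_id, []).append((item_date, value))
    result = []
    for item_id in sorted(by_item):
        history = by_item[item_id]          # already sorted by ItemDate
        item_dates = [d for d, _ in history]
        for eom in sorted(eom_dates):
            # Position of the last ItemDate that is <= eom.
            pos = bisect_right(item_dates, eom)
            if pos == 0:
                continue                    # month end is before the item's first row
            item_date, value = history[pos - 1]
            result.append((item_id, eom, item_date, value))
    return result


joined = as_of_join(items, eom_dates)
for row in joined:
    print(*row)
```

On the sample data this produces the nine rows listed in the question. It says nothing about performance on hundreds of millions of rows; it is only a reference for the expected result.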

How to get daily budget based on monthly budget and working days

I have 2 tables.
One table with the monthly budget, and one table with working days.
What I want is to find the daily budget based on the monthly budget and the working days.
Example:
August has a budget of 1000 and 21 working days.
September has a budget of 2000 and 23 working days.
I want to figure out the total budget between two dates.
Ex: between 2020-08-02 and 2020-09-15
But I must be sure that days in August take their budget from August, days in September take their budget from September, etc.
tbBudget:
Date | Amount
2020-08-01 | 1000
2020-09-01 | 2000
2020-10-01 | 3000
tbWorkingDays
Date | WorkingDay
2020-08-01 | 0
2020-08-02 | 0
2020-08-03 | 1
2020-08-04 | 1
2020-08-05 | 1
2020-08-06 | 1
2020-08-07 | 1
2020-08-08 | 1
...
2020-09-01 | 1
2020-09-02 | 1
2020-09-03 | 0
2020-09-04 | 1
...
2020-10-01 | 1
2020-10-02 | 0
2020-10-03 | 1
2020-10-04 | 1
I have no idea how to solve this issue. Can you help me?
My result should be like:
Date | WorkingDay | BudgetAmount
2020-08-02 | 0 | 0.0
2020-08-03 | 1 | 47.6
2020-08-04 | 1 | 47.6
2020-08-05 | 1 | 47.6
..
2020-09-13 | 1 | 86.9
2020-09-14 | 1 | 86.9
2020-09-15 | 1 | 86.9
Using CTE and group by:
with CTE1 AS(
SELECT FORMAT(A.DATE, 'MMyyyy') DATE, B.AMOUNT, SUM(CASE WHEN [WorkingDay] = 1 THEN 1 ELSE 0 END) AS TOTAL_WORKING_DAYS
FROM tbWorkingDays A INNER JOIN tbBudget B
ON (FORMAT(A.DATE, 'MMyyyy') = FORMAT(B.DATE, 'MMyyyy')) GROUP BY FORMAT(A.[DATE], 'MMyyyy'), B.AMOUNT
)
SELECT A.DATE,
A.WORKINGDAY,
CASE WHEN A.WORKINGDAY = 1 THEN B.AMOUNT/B.TOTAL_WORKING_DAYS
ELSE 0 END AS BudgetAmount
FROM CTE1 B
INNER JOIN
tbWorkingDays A
ON (FORMAT(A.DATE, 'MMyyyy') = B.DATE);
Assuming that the budgets are by month:
select wd.*,
(case when wd.workingday = 0 then 0
else b.amount * 1.0 / sum(wd.workingday) over (partition by b.date)
end) as daily_amount
from tbWorkingDays wd join
tbBudget b
on wd.date >= b.date and wd.date < dateadd(month, 1, b.date);
If the budget dates are not per month, then use apply instead:
select wd.*,
(case when wd.workingday = 0 then 0
else b.amount * 1.0 / sum(wd.workingday) over (partition by b.date)
end) as daily_amount
from tbWorkingDays wd cross apply
(select top (1) b.*
from tbBudget b
where wd.date >= b.date
order by b.date desc
) b
Use sum as an analytic function to get the number of working days per month, then divide the monthly budget by it.
Here is a functioning solution:
with tally as
(
SELECT
row_number() over (order by (select null))-1 n
from (values (null),(null),(null),(null),(null),(null),(null),(null),(null),(null),(null)) a(a)
cross join (values (null),(null),(null),(null),(null),(null),(null),(null),(null),(null),(null)) b(b)
cross join (values (null),(null),(null),(null),(null),(null),(null),(null),(null),(null),(null)) c(c)
)
, tbWorkingDays as
(
select
cast(dateadd(day,n,'2020-01-01') as date) [Date],
iif(DATEPART(WEEKDAY,cast(dateadd(day,n,'2020-01-01') as date)) in (1,7),0,1) WorkingDay
from tally
where n<365
)
, tbBudget AS
(
select * from
(values
(cast('2020-08-01' as date), cast(1000 as decimal(19,2)))
,(cast('2020-09-01' as date), cast(2000 as decimal(19,2)))
,(cast('2020-10-01' as date), cast(3000 as decimal(19,2)))
) a([Date],[Amount])
)
select
a.[Date]
,a.WorkingDay*
(b.Amount/
sum(a.WorkingDay) over (partition by year(a.Date)*100+month(a.Date)))
from tbWorkingDays a
inner join tbBudget b
on a.Date between b.Date and dateadd(day,-1,dateadd(month,1,b.date))
The work is done here:
select
a.[Date]
,a.WorkingDay*
(b.Amount/
sum(a.WorkingDay) over (partition by year(a.Date)*100+month(a.Date)))
from tbWorkingDays a
inner join tbBudget b
on a.Date between b.Date and dateadd(day,-1,dateadd(month,1,b.date))
The expression
sum(a.WorkingDay) over (partition by year(a.Date)*100+month(a.Date))
Sums the number of working days for the current month. I then join against the budget and divide the month's amount by the expression above.
To make sure budget is only allocated to working days, I simply multiply by WorkingDay: since it is 0 for a non-working day, the result is 0 for all non-working days.
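To make the arithmetic concrete, here is a small Python sketch of the same calculation: count the working days per month, then give each working day amount / count and each non-working day 0. It assumes Monday to Friday are working days (the helper `is_working_day` is a stand-in for the real tbWorkingDays table, which may also mark holidays), and that every date in the range falls in a budgeted month.

```python
from datetime import date, timedelta

# Monthly budgets from the question, keyed by (year, month).
budgets = {(2020, 8): 1000.0, (2020, 9): 2000.0, (2020, 10): 3000.0}


def is_working_day(d):
    # Assumption: Monday-Friday are working days; the real table may differ.
    return d.weekday() < 5


def daily_budget(start, end):
    """Return (date, working_day, budget_amount) for each day in [start, end]."""
    # Number of working days in each budgeted month.
    days_per_month = {}
    for year, month in budgets:
        d = date(year, month, 1)
        count = 0
        while d.month == month:
            count += is_working_day(d)
            d += timedelta(days=1)
        days_per_month[(year, month)] = count

    out = []
    d = start
    while d <= end:
        key = (d.year, d.month)
        working = is_working_day(d)
        amount = budgets[key] / days_per_month[key] if working else 0.0
        out.append((d, int(working), amount))
        d += timedelta(days=1)
    return out


rows = daily_budget(date(2020, 8, 2), date(2020, 9, 15))
total = sum(amount for _, _, amount in rows)
print(round(rows[1][2], 1), round(total, 2))
```

Under the weekday assumption August 2020 has 21 working days, so each one gets 1000 / 21, about 47.6 as in the question. September comes out at 22 weekdays here rather than the 23 in the question, so its daily figure differs from the sample output.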

How to find the continuous range of dates in SQL Server?

How do we find the continuous range of dates from the following scenario?
Id modifiedDate StartDate EndDate
1 2019-01-01 2019-01-01 2019-12-31
1 2019-02-02 2019-02-01 2019-02-28
1 2019-02-27 2019-01-15 2019-03-15
1 2019-03-01 2019-03-01 2019-03-12
2 2019-01-01 2019-01-01 2019-03-01
2 2019-05-01 2019-05-01 2019-08-01
The output I want to show is:
Id StartDate EndDate
1 2019-01-01 2019-01-15
1 2019-01-15 2019-02-01
1 2019-02-01 2019-02-28
1 2019-02-28 2019-03-01
1 2019-03-01 2019-03-12
2 2019-01-01 2019-03-01
2 2019-05-01 2019-08-01
What I have tried so far is :
With X As(
Select a.StartDate,a.EndDate,b.StartDate,b.EndDate
From table a Full Join table b ON a.endDate>b.StartDate
Where a.StartDate<>b.StartDate and b.endDate<>a.Enddate
)
Select StartDate,Enddate,Min(StartDtae)
From X
Group By StartDate,EndDate
But I couldn't fill the gaps in between the dates. How can I fix this?
You can try the following script, which I created with the help of a CTE and ROW_NUMBER(). Compared with your sample output, I get 2 additional rows from the given input data. If your sample output is correct, you can ignore this solution.
CTEs are supported by SQL Server and Oracle, among other databases, and the logic can be converted for databases that lack them.
WITH CTE
AS
(
-- UNION (not UNION ALL) removes duplicate dates before they are numbered;
-- DISTINCT next to ROW_NUMBER() would not, because every row number is unique.
SELECT id,Date, ROW_NUMBER() OVER(PARTITION BY id ORDER BY Date) RN
FROM
(
SELECT Id,StartDate Date FROM your_table
UNION
SELECT Id,EndDate FROM your_table
) A
)
SELECT A.Id, A.Date StartDate,B.Date EndDate
FROM CTE A
INNER JOIN CTE B ON A.Id = B.Id AND A.RN = B.RN - 1
Output is-
Id StartDate EndDate
1 2019-01-01 2019-01-15
1 2019-01-15 2019-02-01
1 2019-02-01 2019-02-28
1 2019-02-28 2019-03-01
1 2019-03-01 2019-03-12
1 2019-03-12 2019-03-15 -- Not exist in your expected output
1 2019-03-15 2019-12-31 -- Not exist in your expected output
Note: Adding an additional filter at the end, as below, will give you the exact output you posted. Decide for yourself which one best suits your requirement.
SELECT....
....
FROM CTE A
INNER JOIN CTE B ON A.Id = B.Id AND A.RN = B.RN - 1
WHERE B.DATE <= '2019-03-12'
The following query should give you the desired result:
WITH dates AS (SELECT StartDate
FROM TABLE
UNION
SELECT EndDate + 1
FROM TABLE)
SELECT StartDate
, (SELECT MIN(StartDate) - 1
FROM dates b
WHERE StartDate - 1 > a.StartDate) EndDate
FROM dates a
Just use lead() with union:
select t.id, t.dte as startdate,
lead(t.dte) over (partition by t.id order by t.dte) as enddate
from (select distinct t.id, v.dte
from t cross apply
(values (startdate), (enddate)) v(dte)
) t;
In addition to being concise, this probably has the best performance.
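The idea behind these answers is the same: collect every distinct start and end date per Id as a boundary, sort them, and pair each boundary with the next one (which is what lead() does). A rough Python sketch with the sample rows from the question, where `split_ranges` is a made-up name:

```python
from datetime import date

# Sample rows from the question: (Id, StartDate, EndDate).
rows = [
    (1, date(2019, 1, 1), date(2019, 12, 31)),
    (1, date(2019, 2, 1), date(2019, 2, 28)),
    (1, date(2019, 1, 15), date(2019, 3, 15)),
    (1, date(2019, 3, 1), date(2019, 3, 12)),
    (2, date(2019, 1, 1), date(2019, 3, 1)),
    (2, date(2019, 5, 1), date(2019, 8, 1)),
]


def split_ranges(rows):
    """Return (Id, StartDate, EndDate) for each pair of consecutive boundaries."""
    boundaries = {}
    for id_, start, end in rows:
        # A set plays the role of DISTINCT / UNION.
        boundaries.setdefault(id_, set()).update((start, end))
    result = []
    for id_ in sorted(boundaries):
        dates = sorted(boundaries[id_])
        # Pair each boundary with the next one, like lead() over (order by date).
        result.extend((id_, a, b) for a, b in zip(dates, dates[1:]))
    return result


ranges = split_ranges(rows)
for r in ranges:
    print(*r)
```

Unlike lead() in SQL, zip() does not emit a final row with an empty end date. Note that this approach also returns 2019-03-12 to 2019-03-15 and 2019-03-15 to 2019-12-31 for Id 1, and the uncovered gap 2019-03-01 to 2019-05-01 for Id 2, none of which appear in the expected output in the question, so an extra filter is needed if those rows are unwanted.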

Get max date for each from either of 2 columns

I have a table like below
AID BID CDate
-----------------------------------------------------
1 2 2018-11-01 00:00:00.000
8 1 2018-11-08 00:00:00.000
1 3 2018-11-09 00:00:00.000
7 1 2018-11-15 00:00:00.000
6 1 2018-12-24 00:00:00.000
2 5 2018-11-02 00:00:00.000
2 7 2018-12-15 00:00:00.000
And I am trying to get a result set as follows
ID MaxDate
-------------------
1 2018-12-24 00:00:00.000
2 2018-12-15 00:00:00.000
Each value in the id columns (AID, BID) should return the max of CDate.
E.g. in the case of 1, its max CDate is 2018-12-24 00:00:00.000 (here 1 appears under BID);
in the case of 2, the max date is 2018-12-15 00:00:00.000 (here 2 is under AID).
I tried the following.
1.
select
g.AID,g.BID,
max(g.CDate) as 'LastDate'
from dbo.TT g
inner join
(select AID,BID,max(CDate) as maxdate
from dbo.TT
group by AID,BID)a
on (a.AID=g.AID or a.BID=g.BID)
and a.maxdate=g.CDate
group by g.AID,g.BID
and 2.
SELECT
AID,
CDate
FROM (
SELECT
*,
max_date = MAX(CDate) OVER (PARTITION BY [AID])
FROM dbo.TT
) AS s
WHERE CDate= max_date
Please suggest a 3rd solution.
You can assemble the data in a table expression first; computing the max for each value is then simple. For example:
select
id, max(cdate)
from (
select aid as id, cdate from t
union all
select bid, cdate from t
) x
group by id
You seem to only care about values that are in both columns. If this interpretation is correct, then:
select id, max(cdate)
from ((select aid as id, cdate, 1 as is_a, 0 as is_b
from t
) union all
(select bid as id, cdate, 0 as is_a, 1 as is_b
from t
)
) ab
group by id
having max(is_a) = 1 and max(is_b) = 1;
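Both variants can be checked against the sample data with a few lines of Python. This is only a sketch of the unpivot-then-aggregate logic; the function name `max_date_per_id` and its `both_columns_only` flag are invented for the example.

```python
from datetime import date

# Sample rows from the question: (AID, BID, CDate).
rows = [
    (1, 2, date(2018, 11, 1)),
    (8, 1, date(2018, 11, 8)),
    (1, 3, date(2018, 11, 9)),
    (7, 1, date(2018, 11, 15)),
    (6, 1, date(2018, 12, 24)),
    (2, 5, date(2018, 11, 2)),
    (2, 7, date(2018, 12, 15)),
]


def max_date_per_id(rows, both_columns_only=False):
    """Return {id: max CDate}, treating AID and BID as one id column."""
    latest = {}
    seen_as_a, seen_as_b = set(), set()
    for aid, bid, cdate in rows:
        seen_as_a.add(aid)
        seen_as_b.add(bid)
        # "UNION ALL" of the two id columns, then MAX(CDate) per id.
        for id_ in (aid, bid):
            if id_ not in latest or cdate > latest[id_]:
                latest[id_] = cdate
    if both_columns_only:
        # Equivalent of HAVING max(is_a) = 1 AND max(is_b) = 1.
        keep = seen_as_a & seen_as_b
        latest = {k: v for k, v in latest.items() if k in keep}
    return latest


all_ids = max_date_per_id(rows)
in_both = max_date_per_id(rows, both_columns_only=True)
print(all_ids[1], all_ids[2], sorted(in_both))
```

On the sample data, ids 1 and 2 get 2018-12-24 and 2018-12-15 as requested. The "both columns" filter also keeps id 7, since 7 appears under AID and under BID.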

Oracle SQL - Select users between two date by month

I am learning SQL and I was wondering how to select active users by month, depending on their starting and ending date (both timestamp(6)). My table looks like this:
Cust_Num | Start_Date | End_Date
1 | 2018-01-01 | 2019-01-01
2 | 2018-01-01 | NULL
3 | 2019-01-01 | 2019-06-01
4 | 2017-01-01 | 2019-03-01
So, counting the active users by month, I should have an output like:
As of | Count
2018-06-01 | 3
...
2019-02-01 | 3
2019-07-01 | 1
So far, I do a manual operation by entering each month:
Select
201906,
count(distinct a.cust_num)
From
active_users a
Where
to_date('20190630','yyyymmdd') between a.start_date and nvl(a.end_date, date '9999-12-31')
union all
Select
201905,
count(distinct a.cust_num)
From
active_users a
Where
to_date('20190531','yyyymmdd') between a.start_date and nvl(a.end_date, date '9999-12-31')
union all
...
Not very optimized or sustainable if I want to cover 10 years, i.e. 120 months, lol.
Any help is welcome. Thanks a lot!
This query shows the active-user-count effective as-of the end of the month.
How it works:
Convert each input row (with StartDate and EndDate value) into two rows that represent a point-in-time when the active-user-count incremented (on StartDate) and decremented (on EndDate). We need to convert NULL to a far-off date value because NULL values are sorted before instead of after non-NULL values:
This makes your data look like this:
OnThisDate Change
2018-01-01 1
2019-01-01 -1
2018-01-01 1
9999-12-31 -1
2019-01-01 1
2019-06-01 -1
2017-01-01 1
2019-03-01 -1
Then we simply SUM OVER the Change values (after sorting) to get the active-user-count as of that specific date:
So first, sort by OnThisDate:
OnThisDate Change
2017-01-01 1
2018-01-01 1
2018-01-01 1
2019-01-01 1
2019-01-01 -1
2019-03-01 -1
2019-06-01 -1
9999-12-31 -1
Then SUM OVER:
OnThisDate ActiveCount
2017-01-01 1
2018-01-01 2
2018-01-01 3
2019-01-01 4
2019-01-01 3
2019-03-01 2
2019-06-01 1
9999-12-31 0
Then we PARTITION (not group!) the rows by month and sort them by their date so we can identify the last ActiveCount row for that month (this actually happens in the WHERE of the outermost query, using ROW_NUMBER() and COUNT() for each month PARTITION):
OnThisDate ActiveCount IsLastInMonth
2017-01-01 1 1
2018-01-01 2 0
2018-01-01 3 1
2019-01-01 4 0
2019-01-01 3 1
2019-03-01 2 1
2019-06-01 1 1
9999-12-31 0 1
Then filter on that where IsLastInMonth = 1 (actually, where ROW_NUMBER() = COUNT(*) inside each PARTITION) to give us the final output data:
At-end-of-month Active-count
2017-01 1
2018-01 3
2019-01 3
2019-03 2
2019-06 1
9999-12 0
This does result in "gaps" in the result-set because the At-end-of-month column only shows rows where the Active-count value actually changed rather than including all possible calendar months - but that's ideal (as far as I'm concerned) because it excludes redundant data. Filling in the gaps can be done inside your application code by simply repeating output rows for each additional month until it reaches the next At-end-of-month value.
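The steps above can be sketched in a few lines of Python, which may help when checking the query's output: turn each row into a +1 event and a -1 event, sort the events, keep a running total, and record the total after the last event of each month. The names `FAR_FUTURE` and `active_count_by_month` are made up for this sketch, and the data is the sample table from the question.

```python
from datetime import date
from itertools import groupby

FAR_FUTURE = date(9999, 12, 31)  # stand-in for a NULL end date

# Sample data from the question: (Cust_Num, Start_Date, End_Date).
users = [
    (1, date(2018, 1, 1), date(2019, 1, 1)),
    (2, date(2018, 1, 1), None),
    (3, date(2019, 1, 1), date(2019, 6, 1)),
    (4, date(2017, 1, 1), date(2019, 3, 1)),
]


def active_count_by_month(users):
    """Return (year, month, active_count) for each month with a change."""
    events = []
    for _, start, end in users:
        events.append((start, 1))                # count goes up on the start date
        events.append((end or FAR_FUTURE, -1))   # and down on the end date
    events.sort(key=lambda e: e[0])

    result = []
    running = 0
    # Events are sorted by date, so rows of the same month are adjacent.
    for (year, month), group in groupby(events, key=lambda e: (e[0].year, e[0].month)):
        running += sum(change for _, change in group)
        result.append((year, month, running))    # total after the month's last event
    return result


counts = active_count_by_month(users)
for year, month, n in counts:
    print(f"{year}-{month:02d}", n)
```

This prints the same six rows as the final table above, including the 9999-12 row produced by the stand-in end date.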
Here's the query using T-SQL on SQL Server (I don't have access to Oracle right now). And here's the SQLFiddle I used to come to a solution: http://sqlfiddle.com/#!18/ad68b7/24
SELECT
OtdYear,
OtdMonth,
ActiveCount
FROM
(
-- This query adds columns to indicate which row is the last-row-in-month ( where RowInMonth == RowsInMonth )
SELECT
OnThisDate,
OtdYear,
OtdMonth,
ROW_NUMBER() OVER ( PARTITION BY OtdYear, OtdMonth ORDER BY OnThisDate ) AS RowInMonth,
COUNT(*) OVER ( PARTITION BY OtdYear, OtdMonth ) AS RowsInMonth,
ActiveCount
FROM
(
SELECT
OnThisDate,
YEAR( OnThisDate ) AS OtdYear,
MONTH( OnThisDate ) AS OtdMonth,
SUM( [Change] ) OVER ( ORDER BY OnThisDate ASC ) AS ActiveCount
FROM
(
SELECT
StartDate AS [OnThisDate],
1 AS [Change]
FROM
tbl
UNION ALL
SELECT
ISNULL( EndDate, DATEFROMPARTS( 9999, 12, 31 ) ) AS [OnThisDate],
-1 AS [Change]
FROM
tbl
) AS sq1
) AS sq2
) AS sq3
WHERE
RowInMonth = RowsInMonth
ORDER BY
OtdYear,
OtdMonth
This query can be flattened into fewer nested queries by using aggregate and window functions directly instead of using aliases (like OtdYear, ActiveCount, etc) but that would make the query much harder to understand.
I have created a query which gives the result for all the months, starting from the minimum start date in the table up to the maximum end date.
You can restrict the range by adding a condition in the WHERE clause.
-- table creation
CREATE TABLE ACTIVE_USERS (CUST_NUM NUMBER, START_DATE DATE, END_DATE DATE)
-- data creation
INSERT INTO ACTIVE_USERS
SELECT * FROM
(
SELECT 1, DATE '2018-01-01', DATE '2019-01-01' FROM DUAL UNION ALL
SELECT 2, DATE '2018-01-01', NULL FROM DUAL UNION ALL
SELECT 3, DATE '2019-01-01', DATE '2019-06-01' FROM DUAL UNION ALL
SELECT 4, DATE '2017-01-01', DATE '2019-03-01' FROM DUAL
)
-- data in the actual table
SELECT * FROM ACTIVE_USERS ORDER BY CUST_NUM;
CUST_NUM START_DATE END_DATE
---------- ---------- ----------
1 2018-01-01 2019-01-01
2 2018-01-01
3 2019-01-01 2019-06-01
4 2017-01-01 2019-03-01
Query to fetch desired result
WITH CTE ( START_DATE, END_DATE ) AS
(
SELECT
ADD_MONTHS( START_DATE, LEVEL - 1 ),
ADD_MONTHS( START_DATE, LEVEL ) - 1
FROM
(
SELECT
MIN( START_DATE ) AS START_DATE,
MAX( END_DATE ) AS END_DATE
FROM
ACTIVE_USERS
)
CONNECT BY LEVEL <= CEIL( MONTHS_BETWEEN( END_DATE, START_DATE ) ) + 1
)
--
--
SELECT
C.START_DATE,
COUNT(1) AS CNT
FROM
CTE C
JOIN ACTIVE_USERS D ON
(
C.END_DATE BETWEEN
D.START_DATE
AND
CASE
WHEN D.END_DATE IS NOT NULL THEN D.END_DATE
ELSE C.END_DATE
END
)
GROUP BY
C.START_DATE
ORDER BY
C.START_DATE;
-- output --
START_DATE CNT
---------- ----------
2017-01-01 1
2017-02-01 1
2017-03-01 1
2017-04-01 1
2017-05-01 1
2017-06-01 1
2017-07-01 1
2017-08-01 1
2017-09-01 1
2017-10-01 1
2017-11-01 1
2017-12-01 1
2018-01-01 3
2018-02-01 3
2018-03-01 3
2018-04-01 3
2018-05-01 3
2018-06-01 3
2018-07-01 3
2018-08-01 3
2018-09-01 3
2018-10-01 3
2018-11-01 3
2018-12-01 3
2019-01-01 3
2019-02-01 3
2019-03-01 2
2019-04-01 2
2019-05-01 2
2019-06-01 1
30 rows selected.
Cheers!!
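For comparison, the month-by-month logic of this query can be sketched in Python: generate the first day of every month between the earliest start date and the latest end date, then count the users whose start date is on or before that month's last day and whose end date (if any) is on or after it. The helper names are hypothetical, and the data is the sample table from the question.

```python
from datetime import date, timedelta

# Sample data from the question: (CUST_NUM, START_DATE, END_DATE).
users = [
    (1, date(2018, 1, 1), date(2019, 1, 1)),
    (2, date(2018, 1, 1), None),
    (3, date(2019, 1, 1), date(2019, 6, 1)),
    (4, date(2017, 1, 1), date(2019, 3, 1)),
]


def month_starts(first, last):
    """Yield the first day of every month from first's month to last's month."""
    y, m = first.year, first.month
    while (y, m) <= (last.year, last.month):
        yield date(y, m, 1)
        y, m = (y + 1, 1) if m == 12 else (y, m + 1)


def month_end(start):
    """Return the last day of the month that begins on start."""
    if start.month == 12:
        nxt = date(start.year + 1, 1, 1)
    else:
        nxt = date(start.year, start.month + 1, 1)
    return nxt - timedelta(days=1)


def active_per_month(users):
    """Return (month_start, active_count), counted as of each month's last day."""
    first = min(s for _, s, _ in users)
    last = max(e for _, _, e in users if e is not None)
    counts = []
    for start in month_starts(first, last):
        end = month_end(start)
        n = sum(1 for _, s, e in users if s <= end and (e is None or end <= e))
        counts.append((start, n))
    return counts


per_month = active_per_month(users)
for start, n in per_month:
    print(start, n)
```

On the sample data this gives the same 30 rows as the output above, from 2017-01-01 through 2019-06-01.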