Fill rows for missing data by last day of month - sql

I have a table that looks like
UserID LastDayofMonth Count
1234 2015-09-30 00:00:00 12
1237 2015-09-30 00:00:00 5
3233 2015-09-30 00:00:00 3
8336 2015-09-30 00:00:00 22
1234 2015-10-31 00:00:00 8
1237 2015-10-31 00:00:00 5
3233 2015-10-31 00:00:00 7
8336 2015-11-30 00:00:00 52
1234 2015-11-30 00:00:00 8
1237 2015-11-30 00:00:00 5
3233 2015-11-30 00:00:00 7
(with around 10,000 rows). As you can see in the example, UserID 8336 has no record for October 31st (dates are monthly but always the last day of the month, which I want to keep). How do I return a table that fills in the missing records over a four-month period, so that users like 8336 get records like
8336 2015-10-31 00:00:00 0
I do have a calendar table with all days that I can use.

If I understand correctly, you want a record for each user and for each end of month. And, if the record does not currently exist, then you want the value of 0.
This is a two-step process. First generate all the user/month rows with a cross join, then use a left join to pick up the existing values.
So:
select u.userId, l.LastDayofMonth, coalesce(t.[Count], 0) as cnt
from (select distinct userId from t) u cross join
(select distinct LastDayofMonth from t) l left join
t
on t.userId = u.userId and t.LastDayofMonth = l.LastDayofMonth;
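The block below is a sketch of the same two-step idea (cross join users with months, then left join the data back). It runs in SQLite through Python's built-in sqlite3 module so it can be executed anywhere. It assumes dates are stored as 'YYYY-MM-DD' text and quotes the column "Count" SQLite-style, whereas the answer above uses T-SQL brackets.

```python
import sqlite3

# In-memory SQLite database used as a portable stand-in for the asker's engine.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE t (userId INTEGER, LastDayofMonth TEXT, "Count" INTEGER)')
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        (1234, "2015-09-30", 12), (1237, "2015-09-30", 5),
        (3233, "2015-09-30", 3), (8336, "2015-09-30", 22),
        (1234, "2015-10-31", 8), (1237, "2015-10-31", 5),
        (3233, "2015-10-31", 7), (8336, "2015-11-30", 52),
        (1234, "2015-11-30", 8), (1237, "2015-11-30", 5),
        (3233, "2015-11-30", 7),
    ],
)

# Step 1: every distinct user crossed with every distinct month in the table.
# Step 2: left join back to t; users with no row for a month get a count of 0.
query = """
SELECT u.userId, l.LastDayofMonth, COALESCE(t."Count", 0) AS cnt
FROM (SELECT DISTINCT userId FROM t) AS u
CROSS JOIN (SELECT DISTINCT LastDayofMonth FROM t) AS l
LEFT JOIN t
  ON t.userId = u.userId AND t.LastDayofMonth = l.LastDayofMonth
ORDER BY l.LastDayofMonth, u.userId
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

With the sample data this yields 4 users × 3 months = 12 rows, including a zero-count row for user 8336 on 2015-10-31. The month list comes only from months that already appear in the table, which is the limitation the next answer addresses.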

This solution uses a couple of CTEs, since I don't know your calendar table's layout. Its only advantage over Gordon Linoff's solution is that it doesn't assume every possible month has at least one user record. I've provided test data based on your example, with an extra record for July and no records at all for August.
/************** TEST DATA ******************/
IF OBJECT_ID('MonthlyUserCount','U') IS NULL
BEGIN
CREATE TABLE MonthlyUserCount
(
UserID INT
, LastDayofMonth DATETIME
, [Count] INT
)
INSERT MonthlyUserCount
VALUES (1234,'2015-07-31 00:00:00',12),--extra record
(1234,'2015-09-30 00:00:00',12),
(1237,'2015-09-30 00:00:00',5),
(3233,'2015-09-30 00:00:00',3),
(8336,'2015-09-30 00:00:00',22),
(1234,'2015-10-31 00:00:00',8),
(1237,'2015-10-31 00:00:00',5),
(3233,'2015-10-31 00:00:00',7),
(8336,'2015-11-30 00:00:00',52),
(1234,'2015-11-30 00:00:00',8),
(1237,'2015-11-30 00:00:00',5),
(3233,'2015-11-30 00:00:00',7)
END
/************ END TEST DATA ***************/
DECLARE @Start DATETIME;
DECLARE @End DATETIME;
--establish a date range
SELECT @Start = MIN(LastDayofMonth) FROM MonthlyUserCount;
SELECT @End = MAX(LastDayofMonth) FROM MonthlyUserCount;
--create a custom calendar of days using the date range above and flag the last day of each month
--if your calendar table already does this, modify the next CTE to mimic that functionality
WITH cteAllDays AS
(
SELECT @Start AS [Date], CASE WHEN DATEPART(mm, @Start) <> DATEPART(mm, @Start+1) THEN 1 ELSE 0 END [Last]
UNION ALL
SELECT [Date]+1, CASE WHEN DATEPART(mm,[Date]+1) <> DATEPART(mm, [Date]+2) THEN 1 ELSE 0 END
FROM cteAllDays
WHERE [Date] < @End
),
--cte using calendar of days to associate every user with every end of month
cteUserAllDays AS
(
SELECT DISTINCT m.UserID, c.[Date] LastDayofMonth
FROM MonthlyUserCount m CROSS JOIN cteAllDays c
WHERE [Last]=1
)
--left join the cte to evaluate the NULL and present a 0 count for that month
SELECT c.UserID, c.LastDayofMonth, ISNULL(m.[Count],0) [Count]
FROM cteUserAllDays c
LEFT JOIN MonthlyUserCount m ON m.UserID = c.UserID
AND m.LastDayofMonth =c.LastDayofMonth
ORDER BY c.LastDayofMonth, c.UserID
OPTION ( MAXRECURSION 0 )
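The answer above walks the calendar one day at a time and keeps only month ends. A variation is to step directly from one month end to the next. The sketch below does that in SQLite (via Python's sqlite3), assuming 'YYYY-MM-DD' text dates. The CTE names bounds and month_ends are hypothetical, and the date arithmetic uses SQLite's date() modifiers, not T-SQL functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE MonthlyUserCount (UserID INTEGER, LastDayofMonth TEXT, "Count" INTEGER)')
conn.executemany(
    "INSERT INTO MonthlyUserCount VALUES (?, ?, ?)",
    [
        (1234, "2015-07-31", 12),  # extra record; August has no rows at all
        (1234, "2015-09-30", 12), (1237, "2015-09-30", 5),
        (3233, "2015-09-30", 3), (8336, "2015-09-30", 22),
        (1234, "2015-10-31", 8), (1237, "2015-10-31", 5),
        (3233, "2015-10-31", 7), (8336, "2015-11-30", 52),
        (1234, "2015-11-30", 8), (1237, "2015-11-30", 5),
        (3233, "2015-11-30", 7),
    ],
)

query = """
WITH RECURSIVE bounds AS (
    SELECT MIN(LastDayofMonth) AS lo, MAX(LastDayofMonth) AS hi FROM MonthlyUserCount
),
month_ends(d) AS (
    SELECT lo FROM bounds
    UNION ALL
    -- next month end = (day after this month end) + 1 month - 1 day
    SELECT date(d, '+1 day', '+1 month', '-1 day')
    FROM month_ends, bounds
    WHERE d < hi
)
SELECT u.UserID, e.d AS LastDayofMonth, COALESCE(m."Count", 0) AS "Count"
FROM (SELECT DISTINCT UserID FROM MonthlyUserCount) AS u
CROSS JOIN month_ends AS e
LEFT JOIN MonthlyUserCount AS m
  ON m.UserID = u.UserID AND m.LastDayofMonth = e.d
ORDER BY e.d, u.UserID
"""
rows = conn.execute(query).fetchall()
```

Going through the first day of the following month avoids the day-overflow problem that adding a month directly to a 31st can cause. August 2015 therefore appears with zero counts even though no row mentions it.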

Related

how to aggregate one record multiple times based on condition

I have a bunch of records in the table below.
product_id produced_date expired_date
123 2010-02-01 2012-05-31
234 2013-03-01 2014-08-04
345 2012-05-01 2018-02-25
... ... ...
I want the output to show how many unexpired products we had in each month. (For example, if a product expires on August 04, it still counts toward August's stock.)
Month n_products
2010-02-01 10
2010-03-01 12
...
2022-07-01 25
2022-08-01 15
How should I do this in Presto or Hive? Thank you!
You can use the SQL below.
It uses CASE WHEN to check whether a product is expired (produced_date >= expired_date). Expired products are summed to get a count, and the data is then grouped by expiry month.
select
TRUNC(expired_date, 'MM') expired_month,
SUM( case when produced_date >= expired_date then 1 else 0 end) n_products
from mytable
group by 1
We can use the unnest and sequence functions to create a derived table of months. Joining our table with this derived table should give us the desired result.
SELECT m.month, COUNT(t.product_id) AS n_products
FROM (
SELECT x AS month
FROM (SELECT date_trunc('month', MIN(produced_date)) AS first_month,
date_trunc('month', MAX(expired_date)) AS last_month
FROM mytable) b
CROSS JOIN UNNEST(sequence(b.first_month, b.last_month, INTERVAL '1' MONTH)) AS u(x)
) m
LEFT JOIN mytable t
ON m.month >= date_trunc('month', t.produced_date)
AND m.month <= date_trunc('month', t.expired_date)
GROUP BY 1
ORDER BY 1
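The month-series approach can be tried out in SQLite (via Python's sqlite3), which has no sequence/unnest. A recursive CTE generates the months instead. The sketch assumes a hypothetical products table with 'YYYY-MM-DD' text dates. A product is counted in every month from its production month through its expiry month, so something expiring on August 4 still counts in August.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (product_id INTEGER, produced_date TEXT, expired_date TEXT)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?)",
    [(1, "2022-06-15", "2022-08-04"), (2, "2022-07-01", "2022-07-31")],
)

query = """
WITH RECURSIVE bounds AS (
    SELECT date(MIN(produced_date), 'start of month') AS first_month,
           date(MAX(expired_date), 'start of month') AS last_month
    FROM products
),
months(m) AS (
    -- one row per month start between the earliest and latest month
    SELECT first_month FROM bounds
    UNION ALL
    SELECT date(m, '+1 month') FROM months, bounds WHERE m < last_month
)
SELECT months.m AS month, COUNT(p.product_id) AS n_products
FROM months
LEFT JOIN products AS p
  ON months.m BETWEEN date(p.produced_date, 'start of month')
                  AND date(p.expired_date, 'start of month')
GROUP BY months.m
ORDER BY months.m
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Comparing month starts on both sides (rather than the raw dates) is what makes a mid-month production or expiry date still count for that whole month.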

SQL query to find out number of days in a week a user visited

I'd like to find out how many days in a week users have visited my site. For example, 1 day in a week, 2 days in a week, every day of the week (7).
I imagine the easiest way to do this would be to set a date range and find the number of days within that range (option 1). Ideally, though, I'd like the code to understand weeks so I can run several weeks in one query (option 2). I'd like each user to be counted only once (i.e., users who visited on 2 days have also visited on 1 day, but they should be counted only in the 2-days row).
In my database (using SQLWorkbench64) I have user IDs (id) and dates (dt).
I'm relatively new to SQL so any help would be very much appreciated!!
Expected results (based on total users = 5540):
Option 1:
Number of Days Users
1 2000
2 1400
3 1000
4 700
5 300
6 100
7 40
Option 2:
Week Commencing Number of Days Users
06/05/2019 1 2000
06/05/2019 2 1400
06/05/2019 3 1000
06/05/2019 4 700
06/05/2019 5 300
06/05/2019 6 100
06/05/2019 7 40
You can count visitors within a date range with the script below. If a visitor visits on multiple days within the range, they are counted only on their latest date in that range.
Note: the dates in the query are only samples.
SELECT date,COUNT(id)
FROM
(
SELECT id,max(date) date
FROM your_table
WHERE date BETWEEN '04/21/2019' AND '04/22/2019'
GROUP BY ID
)A
GROUP BY date
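You can find the first day of each date's week and then group by that. Note that with the default @@DATEFIRST of 7 (us_english), the expression below returns the Sunday that starts each week, as the output shows. Once you have the week start, the rest is a series of GROUP BYs. Here is how I did it: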
DECLARE @table TABLE
(
id INT,
date DATETIME,
MondayOfWeek DATETIME
)
DECLARE @info TABLE
(
CommencingWeek DATETIME,
NumberOfDays INT,
Users INT
)
INSERT INTO @table (id,date) VALUES
(1,'04/15/2019'), (2,'07/21/2018'), (3,'04/16/2019'), (4,'04/16/2018'), (1,'04/16/2019'), (2,'04/17/2019')
UPDATE @table
SET MondayOfWeek = CONVERT(varchar(50), (DATEADD(dd, @@DATEFIRST - DATEPART(dw, date) - 6, date)), 101)
INSERT INTO @info (CommencingWeek,NumberOfDays)
SELECT MondayOfWeek, NumberDaysInWeek FROM
(
-- count distinct calendar days, so several visits on the same day count once
SELECT id,MondayOfWeek,COUNT(DISTINCT CAST([date] AS DATE)) AS NumberDaysInWeek FROM @table
GROUP BY id,MondayOfWeek
) T1
SELECT CommencingWeek,NumberOfDays,COUNT(*) AS Users FROM @info
GROUP BY CommencingWeek,NumberOfDays
ORDER BY CommencingWeek DESC
Here is the output from my query:
CommencingWeek NumberOfDays Users
2019-04-14 00:00:00.000 1 2
2019-04-14 00:00:00.000 2 1
2018-07-15 00:00:00.000 1 1
2018-04-15 00:00:00.000 1 1
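The same "distinct days per user per week, then users per day count" shape is sketched below in SQLite (via Python's sqlite3), this time with weeks starting on Monday. The table name visits and the 'YYYY-MM-DD' text format are assumptions. The Monday is found with SQLite's 'weekday 1' date modifier rather than @@DATEFIRST, and a second visit on the same day is included to show that days, not visits, are counted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (id INTEGER, dt TEXT)")
conn.executemany(
    "INSERT INTO visits VALUES (?, ?)",
    [(1, "2019-04-15"), (2, "2018-07-21"), (3, "2019-04-16"),
     (4, "2018-04-16"), (1, "2019-04-16"), (2, "2019-04-17"),
     (1, "2019-04-16 18:30:00")],  # second visit by user 1 on the same day
)

query = """
SELECT week_start, n_days, COUNT(*) AS users
FROM (
    SELECT id,
           -- step back 6 days, then forward to the next Monday (or stay if already Monday)
           date(dt, '-6 days', 'weekday 1') AS week_start,
           -- distinct calendar days visited by this user in this week
           COUNT(DISTINCT date(dt)) AS n_days
    FROM visits
    GROUP BY id, week_start
)
GROUP BY week_start, n_days
ORDER BY week_start DESC, n_days
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Because each user lands in exactly one n_days bucket per week, a user who visited on two days appears only in the 2-days row, as the question asks.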

T-SQL Programming: Common Table Expression

I need help with the following scenario. I am using T-SQL.
Here are my table details; say the table name is #tempk.
Customer Current_Month Contract Amount
201 2015-09-01 3 100
My requirement is to add 12 months to the current month, which gives 2016-09-01. Assume
I am getting the start date of the month. I need the data in the following format:
Customer Renewal_Month Contract_months End_Month Amount
201 2015-09-01 3 2016-09-01 100
201 2015-12-01 3 2016-09-01 100
201 2016-03-01 3 2016-09-01 100
201 2016-06-01 3 2016-09-01 100
The Contract column can have any value.
Each subsequent record's Renewal_Month is the previous record's Renewal_Month plus Contract months.
I am using the following query. I have a date dimension table called Dim_Date that has date, quarter, year, month, etc.
WITH GetProrateCTE (Customer_ID,Renewal_Month,Contract_Months,End_Month,MRR) as
(SELECT Customer_ID,Renewal_Month,Contract_Months,DATEADD(month, 12,Renewal_Month) End_Month,MRR
from #tempk),
GetRenewalMonths (Customer_ID,Renewal_Month,Contract_Months,End_Month,MRR) as
(
SELECT A.Customer_ID,B.Month Renewal_Month,A.Contract_Months,A.End_Month,A.MRR
FROM GetProrateCTE A
INNER JOIN (SELECT Month from DW..Dim_Date B GROUP BY MONTH) B
ON B.Month between A.Renewal_Month and A.End_Month
)
SELECT G.Customer_ID,G.Renewal_Month,G.Contract_Months,G.End_Month,G.MRR
FROM GetRenewalMonths G
Could you please help me achieve this result? Any help would be greatly appreciated.
I want to do this with common table expressions, or would it be better to use a cursor?
You can try it this way:
WITH CTE AS
(SELECT Customer,DATEADD(MM,DATEDIFF(MM,0,Current_Month), 0) AS Renewal_Month,Contract,DATEADD(YEAR,1,Current_Month) AS End_Month,Amount,1 AS Level FROM #tempk
UNION ALL
SELECT t.Customer,DATEADD(MONTH,t.Contract,c.Renewal_Month),t.Contract,DATEADD(YEAR,1,t.Current_Month) AS End_Month,t.Amount,Level + 1
FROM #tempk t join CTE c on t.customer = c.customer
WHERE Level < (12/t.Contract))
SELECT Customer,Renewal_Month,Contract AS Contract_months,End_Month,Amount
FROM CTE
Then add your date dimension table logic on top of this as needed.
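The recursive idea can be checked with the sketch below, run in SQLite via Python's sqlite3. It uses the question's sample row with 'YYYY-MM-DD' text dates. As a variation on the answer, the recursive step reads Contract from the CTE row itself instead of joining back to #tempk, which avoids multiplying rows if a customer has several source rows. The level column is renamed lvl here, and SQLite's date() modifiers stand in for DATEADD.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tempk (Customer INTEGER, Current_Month TEXT, Contract INTEGER, Amount INTEGER)"
)
conn.execute("INSERT INTO tempk VALUES (201, '2015-09-01', 3, 100)")

query = """
WITH RECURSIVE cte(Customer, Renewal_Month, Contract, End_Month, Amount, lvl) AS (
    SELECT Customer,
           date(Current_Month, 'start of month'),
           Contract,
           date(Current_Month, '+12 months'),
           Amount,
           1
    FROM tempk
    UNION ALL
    -- each step adds Contract months to the previous renewal month
    SELECT Customer,
           date(Renewal_Month, '+' || Contract || ' months'),
           Contract, End_Month, Amount, lvl + 1
    FROM cte
    WHERE lvl < 12 / Contract  -- integer division: 12 / 3 = 4 rows in total
)
SELECT Customer, Renewal_Month, Contract AS Contract_months, End_Month, Amount
FROM cte
ORDER BY Customer, Renewal_Month
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

For a 3-month contract this produces renewal months 2015-09-01, 2015-12-01, 2016-03-01 and 2016-06-01, all ending on 2016-09-01.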

Select dates in ranges list

I have a table of records; each row contains a DATETIME column recording when the row was loaded into the table. I also have a CTE that creates ranges (the number of ranges varies), like the one below.
first_day_of_month last_day_of_moth
-------------------------------------------------------
2013-12-01 00:00:00.000 2013-12-31 23:59:59.000
2013-11-01 00:00:00.000 2013-12-31 23:59:59.000
2013-10-01 00:00:00.000 2013-12-31 23:59:59.000
2013-09-01 00:00:00.000 2013-12-31 23:59:59.000
2013-08-01 00:00:00.000 2013-12-31 23:59:59.000
Question: now I want to select the minimum DATETIME value from the first table for each range created in the CTE. I have absolutely no idea how to do it. Any ideas or links are appreciated.
For example, it should look like:
2013-12-10
2013-11-20
2013-10-05
2013-09-13
2013-08-06
Update: date or datetime, it doesn't matter.
Update 2: I found that I can join my tables using a condition like:
INNER JOIN source_monthes_dates ON
(load_timestamp >= first_day_of_month AND load_timestamp <= last_day_of_moth)
but I don't know how to get only the first date in each period.
You can use this query, which uses ROW_NUMBER() to get the minimum. ranges is the result of your CTE, and table1 is the other table where you have dates.
select x.somedate
from
(select t.somedate,
ROW_NUMBER() OVER (PARTITION BY r.first_day_of_month, r.last_day_of_moth ORDER BY t.somedate) rownumber
from ranges r
inner join table1 t
on r.first_day_of_month <= t.somedate and r.last_day_of_moth >= t.somedate) x
where x.rownumber = 1
If you want to list all the ranges, showing the earliest matching date where there is one and NULL for the others, you can join ranges once more:
select ranges.first_day_of_month, ranges.last_day_of_moth, x.somedate
from
ranges
left join
(select t.somedate, r.first_day_of_month, r.last_day_of_moth,
ROW_NUMBER() OVER (PARTITION BY r.first_day_of_month, r.last_day_of_moth ORDER BY t.somedate) rownumber
from ranges r
inner join table1 t
on r.first_day_of_month <= t.somedate and r.last_day_of_moth >= t.somedate) x
on x.first_day_of_month = ranges.first_day_of_month and x.last_day_of_moth = ranges.last_day_of_moth
where isnull(x.rownumber, 1) = 1
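When only the earliest date per range is needed, a plain GROUP BY with MIN() is an alternative to ROW_NUMBER(). Using a LEFT JOIN also keeps empty ranges with NULL. The sketch below runs it in SQLite via Python's sqlite3. It assumes non-overlapping monthly ranges and ISO-formatted text timestamps, which compare correctly as strings. The sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ranges (first_day_of_month TEXT, last_day_of_moth TEXT)")
conn.execute("CREATE TABLE table1 (somedate TEXT)")
conn.executemany("INSERT INTO ranges VALUES (?, ?)", [
    ("2013-12-01 00:00:00", "2013-12-31 23:59:59"),
    ("2013-11-01 00:00:00", "2013-11-30 23:59:59"),
    ("2013-10-01 00:00:00", "2013-10-31 23:59:59"),  # no rows fall in October
])
conn.executemany("INSERT INTO table1 VALUES (?)", [
    ("2013-12-20 09:00:00",), ("2013-12-10 08:00:00",),
    ("2013-11-25 12:00:00",), ("2013-11-20 07:30:00",),
])

# LEFT JOIN keeps every range; MIN() picks the earliest load time inside it.
query = """
SELECT r.first_day_of_month, r.last_day_of_moth, MIN(t.somedate) AS first_load
FROM ranges AS r
LEFT JOIN table1 AS t
  ON t.somedate BETWEEN r.first_day_of_month AND r.last_day_of_moth
GROUP BY r.first_day_of_month, r.last_day_of_moth
ORDER BY r.first_day_of_month DESC
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

ROW_NUMBER() remains the better fit if you need other columns from the earliest row, not just its timestamp.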

T-SQL: A list of dates without using a temp table

I have a table
|Start Date|End Date |Value|Avgerage Value Per Day|
|2011-01-01 |2012-01-01| 730 | 2|
I want to turn this table into a View
|Date| Average Value |
2011-01-01 | 2
2011-01-02 | 2
2011-01-03 | 2
.....
2011-12-31 | 2
Is it possible to generate a list of dates without using a temp table?
Any ideas?
Edit
Thanks for both of the answers.
A recursive view is similar to a temp table.
I do worry about the performance of a view, because the view will later be involved in other processes.
I'll try the recursive view then; if it doesn't fit, I may just use a hard-coded date list table.
declare @start datetime
SET @start = '20110501'
declare @end datetime
SET @end = '20120501'
;with months (date)
AS
(
SELECT @start
UNION ALL
SELECT DATEADD(day,1,date)
from months
where DATEADD(day,1,date)<=@end
)
select * from months OPTION (MAXRECURSION 0);
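The same recursive date-list idea is sketched below in SQLite via Python's sqlite3. It uses the answer's start and end dates as bound parameters; the parameter names first_day and last_day are hypothetical. SQLite's recursive CTEs have no equivalent of SQL Server's default limit of 100 recursion levels, so no MAXRECURSION-style hint is needed there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One row per day from :first_day to :last_day inclusive.
query = """
WITH RECURSIVE days(d) AS (
    SELECT date(:first_day)
    UNION ALL
    SELECT date(d, '+1 day') FROM days WHERE d < date(:last_day)
)
SELECT d FROM days ORDER BY d
"""
params = {"first_day": "2011-05-01", "last_day": "2012-05-01"}
dates = [row[0] for row in conn.execute(query, params)]
print(len(dates), dates[0], dates[-1])
```

The range crosses 29 February 2012, so the inclusive list has 367 entries rather than 366.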
Yes, you can. Although internally this is technically similar to a temp table, you can create a recursive view. It generates the days and then joins them to the ranges you need:
Create View TestView as
with Data As -- stands in for your table of data; replace it with a SELECT from your tables
(
select Cast('2012-05-01' as DATETIME) [Start Date], Cast('2012-05-02' as DATETIME) [End Date], 2 [Avgerage Value Per Day]
union all
select Cast('2012-04-01' as DATETIME) [Start Date], Cast('2012-04-05' as DATETIME) [End Date], 3 [Avgerage Value Per Day]
)
,AllDates as -- generates all days
(
select cast('1900-01-01' as datetime) TheDate
union all
select TheDate + 1
from AllDates
where TheDate + 1 < '2050-12-31'
)
select TheDate [Date], o.[Avgerage Value Per Day]
from AllDates
join Data o on TheDate Between o.[Start Date] AND o.[End Date];
You can then query it, but you need to specify a recursion limit, because the default of 100 is far too low for this calendar:
select * from TestView
OPTION (MAXRECURSION 0)
This gives the following result:
Date Avgerage Value Per Day
2012-04-01 00:00:00.000 3
2012-04-02 00:00:00.000 3
2012-04-03 00:00:00.000 3
2012-04-04 00:00:00.000 3
2012-04-05 00:00:00.000 3
2012-05-01 00:00:00.000 2
2012-05-02 00:00:00.000 2
As you can see, the test data asked for May 1 to 2 and April 1 to 5.
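A variation that avoids generating every day from 1900 to 2050 is to recurse over each range itself, one day per step, until its end date. The sketch below does this in SQLite via Python's sqlite3 with the answer's two test ranges. The table name data and the column name avg_per_day are hypothetical simplifications of the original columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE data ("Start Date" TEXT, "End Date" TEXT, avg_per_day INTEGER)')
conn.executemany("INSERT INTO data VALUES (?, ?, ?)", [
    ("2012-05-01", "2012-05-02", 2),
    ("2012-04-01", "2012-04-05", 3),
])

# Each range row seeds the recursion; every step adds one day until End Date.
query = """
WITH RECURSIVE expanded(day, end_day, avg_per_day) AS (
    SELECT "Start Date", "End Date", avg_per_day FROM data
    UNION ALL
    SELECT date(day, '+1 day'), end_day, avg_per_day
    FROM expanded
    WHERE day < end_day
)
SELECT day, avg_per_day FROM expanded ORDER BY day
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Here the recursion depth is bounded by the longest range rather than by a fixed calendar span. In SQL Server, a range longer than 100 days would still need a MAXRECURSION hint on the outer query.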