I have reviewed many posts about how to find gaps in dates and believe that I am close to figuring it out but need just a little extra help. Per my query I am pulling distinct days with a record count for each distinct day. I have added a "Gap_Days" column which should return a zero if no gap from previous date OR the number of days since the previous date. As you can see all of my Gap_Days are zero when in fact I am missing 10/24 and 10/25. Therefore on 10/26 there should be a gap of 2 since the previous date is 10/23.
Thanks in advance for pointing out what I am probably looking right at.
SELECT DISTINCT Run_Date, COUNT(Run_Date) AS Daily_Count,
Gap_Days = Coalesce(DateDiff(Day,Lag(Run_Date) Over (partition by Run_Date order by Run_Date DESC), Run_Date)-1,0)
FROM tblUnitsOfWork
WHERE (Run_Date >= '2022-10-01')
GROUP BY Run_Date
ORDER BY Run_Date DESC;
Run_Date Daily_Count Gap_Days
2022-10-29 00:00:00.000 8431 0
2022-10-28 00:00:00.000 8204 0
2022-10-27 00:00:00.000 8705 0
2022-10-26 00:00:00.000 7885 0
2022-10-23 00:00:00.000 7485 0
2022-10-22 00:00:00.000 8699 0
2022-10-21 00:00:00.000 9212 0
2022-10-20 00:00:00.000 9220 0
First let's set up some demo data:
DECLARE @table TABLE (ID INT IDENTITY, date DATE)
DECLARE @dt DATE
WHILE (SELECT COUNT(*) FROM @table) < 30
BEGIN
-- random offset between -25 and +23 days from today
SET @dt = DATEADD(DAY,(ROUND(((50 - 1 -1) * RAND() + 1), 0) - 1)-25,CURRENT_TIMESTAMP)
-- only insert dates we have not already generated
IF NOT EXISTS (SELECT 1 FROM @table WHERE date = @dt) INSERT INTO @table (date) SELECT @dt
END
ID date
--------
1 2022-11-10
2 2022-11-15
3 2022-10-20
...
28 2022-10-14
29 2022-11-13
30 2022-11-21
This gives us a table variable with 30 random dates in a 50 day window. Now let's look for missing dates:
SELECT *,
CASE WHEN ROW_NUMBER() OVER (ORDER BY date) > 1
AND LAG(date,1) OVER (ORDER BY date) <> DATEADD(DAY,-1,date)
THEN 'GAP! ' + CAST(DATEDIFF(DAY,LAG(date,1) OVER (ORDER BY date),date)-1 AS NVARCHAR) + ' DAYS MISSING!'
END AS MissingDatesFlag
FROM @table
ORDER BY date
All we're doing here is ignoring the first date (since there is nothing before it to compare against) and from then on comparing the previous date (using LAG ordered by date) to the current date. If the previous date is not exactly one day earlier, the CASE expression produces a message saying how many days are missing.
ID date MissingDatesFlag
----------------------------
1 2022-10-08 NULL
4 2022-10-09 NULL
25 2022-10-10 NULL
28 2022-10-11 NULL
22 2022-10-15 GAP! 4 DAYS MISSING!
2 2022-10-18 GAP! 3 DAYS MISSING!
12 2022-10-19 NULL
24 2022-10-20 NULL
....
15 2022-11-18 GAP! 3 DAYS MISSING!
29 2022-11-21 GAP! 3 DAYS MISSING!
20 2022-11-22 NULL
Since the demo data is randomly selected your results may vary, but they should be similar.
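To make the comparison easier to follow, here is a minimal Python sketch of the same "previous row" logic, applied to the distinct dates shown in the question. The names run_dates and gap_days are made up for illustration, and the list is assumed to be non-empty. It also hints at why the original query returned all zeros: with PARTITION BY Run_Date every partition holds a single row, so LAG has no previous row to return and COALESCE turns the NULL into 0. Ordering over the whole set, as below, avoids that.

```python
from datetime import date

# Distinct run dates from the question's output (10/24 and 10/25 are absent).
run_dates = [date(2022, 10, d) for d in (20, 21, 22, 23, 26, 27, 28, 29)]

def gap_days(dates):
    """Map each date to the number of missing days since the previous date."""
    ordered = sorted(dates)
    gaps = {ordered[0]: 0}  # the earliest date has no predecessor
    for previous, current in zip(ordered, ordered[1:]):
        # Same arithmetic as DATEDIFF(DAY, previous, current) - 1
        gaps[current] = (current - previous).days - 1
    return gaps

gaps = gap_days(run_dates)
for day in sorted(gaps, reverse=True):
    print(day, gaps[day])
```

With these dates only 2022-10-26 gets a non-zero value (2), which is what the question expected.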
My table structure in SQL Server looks as below.
id startdate enddate value
---------------------------------------
1 2019-02-06 2019-02-07 11
1 2019-01-22 2019-02-05 10
1 2019-01-15 2019-01-21 14
1 2018-12-13 2019-01-14 15
1 2018-12-09 2018-12-12 14
1 2018-08-13 2018-12-08 17
1 2018-07-19 2018-08-12 19
1 2018-06-13 2018-07-18 20
Now my query needs to display the value from the highest start date in each month. That part is fine and I know how to do it. The extra requirement is that when a month has no start date at all, the value is carried forward from the most recent earlier month. As you can see in the data above, there are no rows for November, October, or September 2018, but I still want a MM/YYYY row for each of those months, carrying the value found in the earlier month (August, which is 17 in this example). Please note that enddate is always one day before the next start date begins. Perhaps that can be used for backfilling and carrying forward the missing month values?
So my result should look like below.
id date value
----------------------------
1 2019-02 11
1 2019-01 10
1 2018-12 15
1 2018-11 17
1 2018-10 17
1 2018-09 17
1 2018-08 17
1 2018-07 19
1 2018-06 20
Do you think this can be done without using a cursor?
Alexander Volok's answer is solid, so I won't go into too much extra code. But I thought I'd explain the reasoning. In essence, what you need to do is create a skeleton date table containing all the dates and primary keys you want returned. I'm guessing you have more than one id value in your real data, so probably something like this (whether you choose to persist it or not is up to you)
create table #skelly
(
id int,
_year int,
_month int,
primary key (id, _year, _month)
)
You can get much more precise if you need to be, by only including dates which fall between the min and max StartDate per id, but that's an exercise I leave up to you.
From there, it's then just a matter of filling in the values you care about against that skeleton table. You can do this in a number of ways; by joining, cross applying or a correlated subquery (as Alexander Volok used).
DECLARE @start DATE, @end DATE;
SELECT @start = '20180601', @end = GETDATE();
;WITH Months AS
(
SELECT EOMONTH(DATEADD(month, n-1, @start)) AS DateValue FROM (
SELECT TOP (DATEDIFF(MONTH, @start, @end) + 1)
n = ROW_NUMBER() OVER (ORDER BY [object_id])
FROM sys.all_objects
) D
)
, InputData AS
(
SELECT 1 AS id, '2019-02-06' startdate, '2019-02-07' as enddate, 11 AS [value] UNION ALL
SELECT 1, '2019-01-22', '2019-01-25', 10 UNION ALL
SELECT 1, '2019-01-15', '2019-01-17', 14 UNION ALL
SELECT 1, '2018-12-13', '2018-12-19', 15 UNION ALL
SELECT 1, '2018-12-09', '2018-12-10', 14 UNION ALL
SELECT 1, '2018-08-13', '2018-12-08', 17 UNION ALL
SELECT 1, '2018-07-19', '2018-07-25', 19 UNION ALL
SELECT 1, '2018-06-13', '2018-07-18', 20
)
SELECT FORMAT(m.DateValue, 'yyyy-MM') AS [Month]
, (SELECT TOP 1 I.value FROM InputData I WHERE I.startdate < M.DateValue ORDER BY I.startdate DESC ) [Value]
FROM months m
ORDER BY M.DateValue DESC
Results to:
Month Value
2019-02 11
2019-01 10
2018-12 15
2018-11 17
2018-10 17
2018-09 17
2018-08 17
2018-07 19
2018-06 20
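For readers who want to see the carry-forward rule on its own, here is a rough Python sketch of the same lookup: for each month, take the value of the latest row whose start date falls in or before that month. The rows are copied from the InputData CTE above; month_range and value_as_of are hypothetical helper names, and the month window is fixed to 2018-06 through 2019-02 rather than running to today.

```python
from datetime import date

# (startdate, value) pairs copied from the InputData CTE above.
rows = [
    (date(2019, 2, 6), 11),
    (date(2019, 1, 22), 10),
    (date(2019, 1, 15), 14),
    (date(2018, 12, 13), 15),
    (date(2018, 12, 9), 14),
    (date(2018, 8, 13), 17),
    (date(2018, 7, 19), 19),
    (date(2018, 6, 13), 20),
]

def month_range(first, last):
    """Yield (year, month) pairs from first to last, inclusive."""
    year, month = first
    while (year, month) <= last:
        yield year, month
        month += 1
        if month == 13:
            year, month = year + 1, 1

def value_as_of(year, month):
    """Value of the latest row starting in or before the given month."""
    candidates = [(start, value) for start, value in rows
                  if (start.year, start.month) <= (year, month)]
    return max(candidates)[1] if candidates else None

result = [(f"{y}-{m:02d}", value_as_of(y, m))
          for y, m in month_range((2018, 6), (2019, 2))]
for month, value in reversed(result):
    print(month, value)
```

The months with no rows of their own (2018-09 through 2018-11) pick up 17 from August, matching the result set above.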
EDIT: Start date as of Jan 1 XXXX
I need to create a count of distinct userIds based on a 7 day grouping. Basically, if a user calls on day 1 and day 2 of the month, they are counted once. However, if they call on day 1 and day 10, they are counted twice.
Table layout:
userId CallId datetime
0 123 01/01/2016 xx:xx:xx
0 124 01/10/2016 xx:xx:xx
1 125 01/10/2016 xx:xx:xx
1 126 01/10/2016 xx:xx:xx
2 127 01/10/2016 xx:xx:xx
1 128 01/30/2016 xx:xx:xx
2 129 01/31/2016 xx:xx:xx
What I need the return to look like:
Count(UserID) Week#
1 1
3 2
2 4
Thank you for your time.
Based on Gurwinder's response I have produced the following, and included years so that it is still usable in a year's time.
SELECT COUNT(UserID), CallYear, CallWeek
FROM (
SELECT DISTINCT UserID,
datepart(year,datetime) as CallYear,
datepart(week,datetime) as CallWeek
FROM my_table
) AS t -- SQL Server requires an alias on a derived table
Group By CallYear,CallWeek
This will produce a rolling distinct count beginning Jan 1
Declare @YourTable table (userId int,CallId int,datetime datetime)
Insert Into @YourTable values
(0,123,'2016-01-01'),
(0,124,'2016-01-10'),
(1,125,'2016-01-10'),
(1,126,'2016-01-10'),
(2,127,'2016-01-10'),
(1,128,'2016-01-30'),
(2,129,'2016-01-31')
Select D1
,D2 =DateAdd(DD,6,D1)
,Cnt=count(Distinct UserID)
From @YourTable A
Join (Select Top 500 D1=DateAdd(DD,(Row_Number() Over (Order By Number)-1)*7,'2016-01-01') From master..spt_values ) B
on datetime between D1 and DateAdd(DD,6,D1)
Group By D1
Returns
D1 D2 Cnt
2016-01-01 2016-01-07 1
2016-01-08 2016-01-14 3
2016-01-29 2016-02-04 2
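The bucketing behind that result can be sketched in a few lines of Python: each call date is assigned to a 7-day block counted from the start date, and distinct users are collected per block. The rows are the sample data from above; start, users_per_block and weekly are illustrative names only.

```python
from collections import defaultdict
from datetime import date, timedelta

# (userId, call date) pairs from the sample table above.
calls = [
    (0, date(2016, 1, 1)),
    (0, date(2016, 1, 10)),
    (1, date(2016, 1, 10)),
    (1, date(2016, 1, 10)),
    (2, date(2016, 1, 10)),
    (1, date(2016, 1, 30)),
    (2, date(2016, 1, 31)),
]
start = date(2016, 1, 1)  # first day of the first 7-day block

users_per_block = defaultdict(set)
for user_id, call_date in calls:
    block = (call_date - start).days // 7  # 0 for days 1-7, 1 for days 8-14, ...
    users_per_block[block].add(user_id)    # a set keeps each user once per block

weekly = [(start + timedelta(days=7 * block), len(users))
          for block, users in sorted(users_per_block.items())]
for block_start, distinct_users in weekly:
    print(block_start, distinct_users)
```

Blocks with no calls simply do not appear, just as in the SQL output, because nothing is ever added for them.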
you can use this:
select count(distinct userid), datepart(week, datetime) week, datepart(year, datetime) year
from my_table
group by datepart(week, datetime), datepart(year, datetime);
What is your starting date? Have you looked at the DateDiff() function?
Try this:
With ABC
As
(select datepart(week, datetime) as week#
from my_table)
Select count(week#) as Times, week#
From ABC
Group By week#
Source data:
CREATE TABLE #Temp (ID INT Identity(1,1) Primary Key, BeginDate datetime, EndDate datetime, GroupBy INT)
INSERT INTO #Temp
SELECT '2015-06-05 00:00:00.000','2015-06-12 00:00:00.000',7
UNION
SELECT '2015-06-05 00:00:00.000', '2015-06-08 00:00:00.000',7
UNION
SELECT '2015-10-22 00:00:00.000', '2015-10-31 00:00:00.000',7
SELECT *, DATEDIFF(DAY,BeginDate, EndDate) TotalDays FROM #Temp
DROP TABLE #Temp
ID BeginDate EndDate GroupBy TotalDays
1 6/5/15 0:00 6/8/15 0:00 7 3
2 6/5/15 0:00 6/12/15 0:00 7 7
3 10/22/15 0:00 10/31/15 0:00 7 9
Desired Output:
ID BeginDate EndDate GroupBy TotalDays GroupCnt GroupNum
1 6/5/15 0:00 6/8/15 0:00 7 3 1 1
2 6/5/15 0:00 6/12/15 0:00 7 7 1 1
3 10/22/15 0:00 10/29/15 0:00 7 9 2 1
3 10/29/15 0:00 10/31/15 0:00 7 9 2 2
Goal:
Group the records based on ID/BeginDate/EndDate.
Based on the GroupBy number (# of days) and TotalDays (days diff),
if GroupBy >= TotalDays, keep a single group record
else multiply the group records (1 record per GroupBy count) while staying within TotalDays limit.
Apologies if it's confusing but basically, in the above example, there should be one record for each group (ID/BeginDate/EndDate) for the record where days diff b/w Begin/End date = 7 or less (GroupBy).
If the days diff goes above 7 days, create another record (for every additional 7 days diff).
So since 1st two records have days diff of 7 days or less, there's only one record.
The 3rd record has days diff of 9 (7 + 2). Therefore, there should be 2 records (1st for the first 7 days and 2nd for the additional 2 days).
GroupCnt = how many records there are in the group after applying the above rules.
GroupNum is basically row number of the group.
GroupBy # can be different for each record. Dataset is huge so performance does matter.
One pattern I was able to figure out was related to the modulus b/w GroupBy and days diff.
When the GroupBy value is < days diff, the modulus always equals GroupBy. When the GroupBy value = days diff, the modulus is always 0. And when the GroupBy value > days diff, the modulus is always less than the days diff. I'm not sure if/how to use that to group/multiply records to meet the requirement.
SELECT DISTINCT
ID
, BeginDate
, EndDate
, GroupBy
, DATEDIFF(DAY,BeginDate, EndDate) TotalDays
, CAST(GroupBy as decimal(18,6))%CAST(DATEDIFF(DAY,BeginDate, EndDate) AS decimal(18,6)) Modulus
, CASE WHEN DATEDIFF(DAY,BeginDate, EndDate) <= GroupBy THEN BeginDate END NewBeginDate
, CASE WHEN DATEDIFF(DAY,BeginDate, EndDate) <= GroupBy THEN EndDate END NewEndDate
FROM #Temp
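As a side note on the arithmetic, the number of output rows per record seems to depend on a ceiling division of TotalDays by GroupBy rather than on the modulus. A small Python sketch, where part_count is a hypothetical helper and the three TotalDays values come from the sample rows (GroupBy = 7 throughout):

```python
def part_count(total_days, group_by):
    """Rows needed to cover total_days in steps of group_by (at least one)."""
    return max(1, -(-total_days // group_by))  # ceiling division on integers

# TotalDays values from the three sample records.
summary = [(total_days, 7 % total_days, part_count(total_days, 7))
           for total_days in (3, 7, 9)]
for total_days, modulus, parts in summary:
    print(total_days, modulus, parts)
# prints:
# 3 1 1
# 7 0 1
# 9 7 2
```

The last column is the GroupCnt in the desired output, so the modulus itself is not needed to decide how many rows to produce.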
Update:
Forgot to mention that when a record gets multiplied, the begin/end dates change accordingly. In other words, the begin/end dates will reflect the GroupBy; the 3rd and 4th records of the desired output show what I mean more clearly.
Also, GroupCnt/GroupNum are not as important to calculate as grouping/multiplying the records.
You could do something like this using a recursive CTE..
;WITH cte AS (
SELECT ID,
BeginDate,
EndDate,
GroupBy,
DATEDIFF(DAY, BeginDate, EndDate) AS TotalDays,
1 AS GroupNum
FROM #Temp
UNION ALL
SELECT ID,
BeginDate,
EndDate,
GroupBy,
TotalDays,
GroupNum + 1
FROM cte
WHERE GroupNum * GroupBy < TotalDays
)
SELECT ID,
BeginDate = CASE WHEN GroupNum = 1 THEN BeginDate
ELSE DATEADD(DAY, GroupBy * (GroupNum - 1), BeginDate)
END ,
EndDate = CASE WHEN TotalDays <= GroupBy THEN EndDate
WHEN DATEADD(DAY, GroupBy * GroupNum, BeginDate) > EndDate THEN EndDate
ELSE DATEADD(DAY, GroupBy * GroupNum, BeginDate)
END ,
GroupBy,
TotalDays,
COUNT(*) OVER (PARTITION BY ID) GroupCnt,
GroupNum
FROM cte
OPTION (MAXRECURSION 0)
the cte builds out a recordset like this.
ID BeginDate EndDate GroupBy TotalDays GroupNum
----------- ----------------------- ----------------------- ----------- ----------- -----------
1 2015-06-05 00:00:00.000 2015-06-08 00:00:00.000 7 3 1
2 2015-06-05 00:00:00.000 2015-06-12 00:00:00.000 7 7 1
3 2015-10-22 00:00:00.000 2015-10-31 00:00:00.000 7 9 1
3 2015-10-22 00:00:00.000 2015-10-31 00:00:00.000 7 9 2
then you just have to take this and use some case statements to determine what the begin and end date should be.
you should end up with
ID BeginDate EndDate GroupBy TotalDays GroupCnt GroupNum
----------- ----------------------- ----------------------- ----------- ----------- ----------- -----------
1 2015-06-05 00:00:00.000 2015-06-08 00:00:00.000 7 3 1 1
2 2015-06-05 00:00:00.000 2015-06-12 00:00:00.000 7 7 1 1
3 2015-10-22 00:00:00.000 2015-10-29 00:00:00.000 7 9 2 1
3 2015-10-29 00:00:00.000 2015-10-31 00:00:00.000 7 9 2 2
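If it helps to see the splitting rule without the SQL machinery, here is a short Python sketch of the same idea: step forward GroupBy days at a time and clamp the last piece to the original end date. The function name split_range is made up for this example, and group_by is assumed to be a positive number of days.

```python
from datetime import date, timedelta

def split_range(begin, end, group_by):
    """Split [begin, end] into consecutive pieces of at most group_by days."""
    pieces = []
    start = begin
    while True:
        stop = min(start + timedelta(days=group_by), end)  # clamp to the end date
        pieces.append((start, stop))
        if stop >= end:
            break
        start = stop  # the next piece begins where this one ended
    return pieces

# The three sample records, all with GroupBy = 7.
samples = {
    1: split_range(date(2015, 6, 5), date(2015, 6, 8), 7),
    2: split_range(date(2015, 6, 5), date(2015, 6, 12), 7),
    3: split_range(date(2015, 10, 22), date(2015, 10, 31), 7),
}
for record_id, pieces in samples.items():
    for group_num, (piece_begin, piece_end) in enumerate(pieces, start=1):
        print(record_id, piece_begin, piece_end, len(pieces), group_num)
```

Records 1 and 2 stay as single rows, while record 3 becomes 10/22 to 10/29 and 10/29 to 10/31, the same rows as in the table above.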
since you're using SQL 2012, you can also use the LAG and LEAD functions in your final query.
;WITH cte AS (
SELECT ID,
BeginDate,
EndDate,
GroupBy,
DATEDIFF(DAY, BeginDate, EndDate) AS TotalDays,
1 AS GroupNum
FROM #Temp
UNION ALL
SELECT ID,
BeginDate,
EndDate,
GroupBy,
TotalDays,
GroupNum + 1
FROM cte
WHERE GroupNum * GroupBy < TotalDays
)
SELECT ID,
BeginDate = COALESCE(LAG(BeginDate) OVER (PARTITION BY ID ORDER BY GroupNum) + GroupBy * (GroupNum - 1), BeginDate),
EndDate = COALESCE(LEAD(BeginDate) OVER (PARTITION BY ID ORDER BY GroupNum) + GroupBy * GroupNum, EndDate),
GroupBy,
TotalDays,
COUNT(*) OVER (PARTITION BY ID) GroupCnt,
GroupNum
FROM cte
OPTION (MAXRECURSION 0)
CREATE TABLE dim_number (id INT);
INSERT INTO dim_number VALUES (0), (1), (2), (3); -- Populate this to a large number
SELECT
#Temp.Id,
CASE WHEN dim_number.id = 0
THEN #Temp.BeginDate
ELSE DATEADD(DAY, dim_number.id * #Temp.GroupBy, #Temp.BeginDate)
END AS BeginDate,
CASE WHEN dim_number.id = parts.count
THEN #Temp.EndDate
ELSE DATEADD(DAY, (dim_number.id + 1) * #Temp.GroupBy, #Temp.BeginDate)
END AS EndDate,
#Temp.GroupBy AS GroupBy,
DATEDIFF(DAY, #Temp.BeginDate, #Temp.EndDate) AS TotalDays,
parts.count + 1 AS GroupCnt,
dim_number.id + 1 AS GroupNum
FROM
#Temp
CROSS APPLY
(SELECT (DATEDIFF(DAY, #Temp.BeginDate, #Temp.EndDate) - 1) / #Temp.GroupBy AS count) AS parts -- minus 1 so an exact multiple of GroupBy adds no empty trailing part
INNER JOIN
dim_number
ON dim_number.id >= 0
AND dim_number.id <= parts.count
I need to pick one date from each week, and it has to be Friday. However, when Friday is null, it means no data was entered, and I have to find any other day with data in the same week. Can someone share their views on how to solve this type of situation?
If you see in the following data, in the 2nd week, Friday has null entry, so another day has to be picked up.
Day Weekdate Data entry dt Data
1 2/7/2016
2 2/8/2016
3 2/9/2016
4 2/10/2016
5 2/11/2016
6 2/12/2016 2/12/2016 500
7 2/13/2016
1 2/14/2016
2 2/15/2016
3 2/16/2016
4 2/17/2016 2/17/2016 300
5 2/18/2016
6 2/19/2016 NULL NULL
7 2/20/2016
1 2/21/2016
2 2/22/2016
3 2/23/2016
4 2/24/2016
5 2/25/2016
6 2/26/2016 2/26/2016 250
7 2/27/2016
You may try this
--Not null data
select * from tblData
where DATEPART(dw,weekDate) = 6 and data is not null
Union
Select data.* from
(
select weekDate
from tblData
where DATEPART(dw,weekDate) = 6 and data is null
) nullData --Select Friday with null data
Cross Apply
(
--Find first record with not null data that is within this week
Select top 1 *
From tblData data
Where
data.weekDate between Dateadd(day, -5, nullData.weekDate) and Dateadd(day, 1, nullData.weekDate) --Sunday through Saturday of this Friday's week
and data.data is not null
Order by data.weekDate desc
) data
You can try something like this to get the data entered for the latest date (Friday first, then every other day) for each week in your table:
SELECT
Weeks.FirstofWeek,
Detail.Day,
Detail.DataEntryDt,
Detail.Data
FROM
( --master list of weeks
SELECT DISTINCT DATEADD(DAY,(1-DATEPART(dw,Weekdate)),Weekdate) AS FirstofWeek
FROM dataTable
) AS Weeks
LEFT OUTER JOIN
( --detail
SELECT
--order first by presence of data, then by date, selecting Friday first:
ROW_NUMBER() OVER (PARTITION BY DATEADD(DAY,(1-DATEPART(dw,Weekdate)),Weekdate) ORDER BY CASE WHEN Data IS NOT NULL THEN 99 ELSE 0 END DESC, CASE WHEN [Day] = 6 THEN 99 ELSE [Day] END DESC) AS RowNum,
[Day],
DATEADD(DAY,(1-DATEPART(dw,Weekdate)),Weekdate) AS FirstofWeek,
Weekdate,
DataEntryDt,
Data
FROM dataTable
) AS Detail
ON Weeks.FirstofWeek = Detail.FirstofWeek
AND Detail.RowNum = 1 --get only top record for week with data present
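The ranking used in that ROW_NUMBER can be illustrated with a small Python sketch: within each Sunday-to-Saturday week, ignore days without data, prefer Friday, and otherwise take the latest day. The entries are the rows with a data entry from the question plus the NULL Friday; week_start and pick_per_week are hypothetical names.

```python
from datetime import date, timedelta

# Days from the question's table; the Friday 2/19 row exists but holds NULL.
entries = {
    date(2016, 2, 12): 500,
    date(2016, 2, 17): 300,
    date(2016, 2, 19): None,
    date(2016, 2, 26): 250,
}

def week_start(day):
    """Sunday that begins the week containing day (weeks run Sunday to Saturday)."""
    return day - timedelta(days=(day.weekday() + 1) % 7)

def pick_per_week(rows):
    """Return {week start: (chosen day, value)} with Friday preferred."""
    picked = {}
    for day, value in rows.items():
        if value is None:
            continue  # skip days without data
        rank = (day.weekday() == 4, day)  # Friday first, then the latest date
        key = week_start(day)
        if key not in picked or rank > picked[key][0]:
            picked[key] = (rank, day, value)
    return {week: (day, value) for week, (_, day, value) in picked.items()}

chosen = pick_per_week(entries)
for week, (day, value) in sorted(chosen.items()):
    print(week, day, value)
```

For the second week the empty Friday is skipped and Wednesday 2/17 (300) is returned instead.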
I am new to SQL and I need to find count of users every 7 days. I have a table with users for every single day starting from April 2015 up until now:
...
2015-05-16 00:00
2015-05-16 00:00
2015-05-17 00:00
2015-05-17 00:00
2015-05-17 00:00
2015-05-17 00:00
2015-05-17 00:00
2015-05-18 00:00
2015-05-18 00:00
...
and I need to count the number of users every 7 days (weekly) so I have data weekly.
SELECT COUNT(user_id), Activity_Date FROM TABLE_NAME
I need output like this:
TotalUsers week1 week2 week3 ..........and so on
82 80 14 16
I am using DB Visualizer to query Oracle database.
You should try following,
Select
sum(Week1) + sum(Week2) + sum(Week3) + sum(Week4) + sum(Week5) as Total,
sum(Week1) as Week1,
sum(Week2) as Week2,
sum(Week3) as Week3,
sum(Week4) as Week4,
sum(Week5) as Week5
From (
select
case when week = 1 then 1 else 0 end as Week1,
case when week = 2 then 1 else 0 end as Week2,
case when week = 3 then 1 else 0 end as Week3,
case when week = 4 then 1 else 0 end as Week4,
case when week = 5 then 1 else 0 end as Week5
from
(
Select
(datepart(dd,visitdate) - 1) / 7 + 1 week, -- days 1-7 => week 1, 8-14 => week 2, ...
user_id
from visitor
)T
)D
You need to add month & year in the result as well.
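To illustrate that point, here is a rough Python sketch of the same conditional aggregation keyed by year and month, so that weeks from different months are not added together. The visit dates are hypothetical, week_of_month assumes days 1 to 7 count as week 1, and pivot is just an illustrative name.

```python
from collections import defaultdict
from datetime import date

# Hypothetical visit dates, one entry per user visit.
visits = [
    date(2015, 5, 1), date(2015, 5, 7), date(2015, 5, 8),
    date(2015, 5, 16), date(2015, 5, 17), date(2015, 5, 31),
    date(2015, 6, 2),
]

def week_of_month(day):
    """Week number within the month, assuming days 1-7 are week 1."""
    return (day.day - 1) // 7 + 1

# One row per (year, month), with five counters for Week1..Week5.
pivot = defaultdict(lambda: [0] * 5)
for day in visits:
    pivot[(day.year, day.month)][week_of_month(day) - 1] += 1

for (year, month), weeks in sorted(pivot.items()):
    print(year, month, sum(weeks), *weeks)  # year, month, Total, Week1..Week5
```

Each printed row corresponds to one row of the Total/Week1..Week5 result, with the year and month added as leading columns.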
SELECT COUNT(user_id) FROM TABLE_NAME WHERE Activity_Date > SYSDATE - 7;
That would get the amount of users for the last 7 days.
This is my test table:
user_id act_date
1 01/04/2015
2 01/04/2015
3 04/04/2015
4 05/04/2015
..
This is my query:
select week_offset, count(*) nb from (
select trunc((act_date-to_date('01042015','DDMMYYYY'))/7) as week_offset from test_date)
group by week_offset
order by 1
and this is the output:
week_offset nb
0 6
1 3
4 5
5 7
6 3
7 1
18 1
Week offset is the number of the week from 01/04/2015, and we can show the first day of the week.
How do you define your weeks? Here's an approach for SQL Server that starts each seven-day block relative to the start of April. The expressions will vary according to your specific needs:
select
dateadd(
dd,
datediff(dd, cast('20150401' as date), Activity_Date) / 7 * 7,
cast('20150401' as date)
) as WeekStart,
count(*)
from T
group by datediff(dd, cast('20150401' as date), Activity_Date) / 7
Oracle:
select
trunc(Activity_date, 'DAY') as WeekStart,
count(*)
from T
group by trunc(Activity_date, 'DAY') /* D and DAY are the same thing */