SQL Server - Split year into 4 weekly periods - sql

I would like to split up the year into 13 periods with 4 weeks in each
52 weeks a year / 4 = 13 even periods
I would like each period to start on a saturday and end on a friday.
It should look like the below image
Obviously I could do this manually, but the dates would change each year and I am looking for a way to automate this with SQL rather than manually do this for each upcoming year
Is there a way to produce this yearly split automatically?

In this previous answer I show an approach to create a numbers/date table. Such a table is very handsome in many places.
With this approach you might try something like this:
CREATE TABLE dbo.RunningNumbers(Number INT NOT NULL,CalendarDate DATE NOT NULL, CalendarYear INT NOT NULL,CalendarMonth INT NOT NULL,CalendarDay INT NOT NULL, CalendarWeek INT NOT NULL, CalendarYearDay INT NOT NULL, CalendarWeekDay INT NOT NULL);
DECLARE #CountEntries INT = 100000;
DECLARE #StartNumber INT = 0;
WITH E1(N) AS(SELECT 1 FROM(VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1))t(N)), --10 ^ 1
E2(N) AS(SELECT 1 FROM E1 a CROSS JOIN E1 b), -- 10 ^ 2 = 100 rows
E4(N) AS(SELECT 1 FROM E2 a CROSS JOIN E2 b), -- 10 ^ 4 = 10,000 rows
E8(N) AS(SELECT 1 FROM E4 a CROSS JOIN E4 b), -- 10 ^ 8 = 10,000,000 rows
CteTally AS
(
SELECT TOP(ISNULL(#CountEntries,1000000)) ROW_NUMBER() OVER(ORDER BY(SELECT NULL)) -1 + ISNULL(#StartNumber,0) As Nmbr
FROM E8
)
INSERT INTO dbo.RunningNumbers
SELECT CteTally.Nmbr,CalendarDate.d,CalendarExt.*
FROM CteTally
CROSS APPLY
(
SELECT DATEADD(DAY,CteTally.Nmbr,{ts'1900-01-01 00:00:00'})
) AS CalendarDate(d)
CROSS APPLY
(
SELECT YEAR(CalendarDate.d) AS CalendarYear
,MONTH(CalendarDate.d) AS CalendarMonth
,DAY(CalendarDate.d) AS CalendarDay
,DATEPART(WEEK,CalendarDate.d) AS CalendarWeek
,DATEPART(DAYOFYEAR,CalendarDate.d) AS CalendarYearDay
,DATEPART(WEEKDAY,CalendarDate.d) AS CalendarWeekDay
) AS CalendarExt;
GO
NTILE - SQL Server 2008+ will create (almost) even chunks.
This the actual query
SELECT *,NTILE(13) OVER(ORDER BY CalendarDate) AS Periode
FROM RunningNumbers
WHERE CalendarWeekDay=6
AND CalendarDate>={d'2017-01-01'} AND CalendarDate <= {d'2017-12-31'};
GO
--Carefull with existing data!
--DROP TABLE dbo.RunningNumbers;
Hint 1: Place indexes!
Hint 2: Read the link about NTILE, especially the Remark-section.
I think this will fit for this case. You might think about using Prdp's approach with ROW_NUMBER() in conncetion with INT division. But - big advantage! - NTILE would allow PARTITION BY CalendarYear.
Hint 3: You might add a column to the table
...where you set the period's number as a fix value. This will make future queries very easy and would allow manual correction on special cases (53rd week..)

Here is one way using Calendar table
DECLARE #start DATE = '2017-04-01',
#end_date DATE = '2017-12-31'
SET DATEFIRST 7;
WITH Calendar
AS (SELECT 1 AS id,
#start AS start_date,
Dateadd(dd, 6, #start) AS end_date
UNION ALL
SELECT id + 1,
Dateadd(week, 1, start_date),
Dateadd(week, 1, end_date)
FROM Calendar
WHERE end_date < #end_date)
SELECT id,
( Row_number()OVER(ORDER BY id) - 1 ) / 4 + 1 AS Period,
start_date,
end_date
FROM Calendar
OPTION (maxrecursion 0)
I have generated dates using Recursive CTE but it is better to create a physical calendar table use it in queries like this

Firstly, you will never get 52 even weeks in a year, there are overlap weeks in most calendar standards. You will occasionally get a week 53.
You can tell SQL to use Saturday as the first day of the week with datefirst, then running a datepart on today's date with getdate() will tell you the week of the year:
SET datefirst 6 -- 6 is Saturday
SELECT datepart(ww,getdate()) as currentWeek
You could then divide this by 4 with a CEILING command to get the 4-week split:
SET datefirst 6
SELECT DATEPART(ww,getdate()) as currentWeek,
CEILING(DATEPART(ww,getdate())/4) as four_week_split

Related

How do I get the month number with the maximum number of days from the date range?

I have a table with 10 million rows, where there are two columns that contain the start date and the end date of the range. For example, 2019-09-25 and 2019-10-20. I want to extract the month number with the maximum number of days, in this example it will be 10. In addition to dates that are separated by one month, there are also such examples: 2019-07-01 and 2019-07-29 (within one month), as well as 2019-07-01 and 2019-09-05 (more than one month). How can I implement this?
Seems like you could do something like this:
SELECT CASE WHEN DATEDIFF(DAY, DATEFROMPARTS(YEAR(EndDate),MONTH(EndDate),1),EndDate) >= DATEDIFF(DAY, StartDate, EOMONTH(StartDate)) THEN DATEPART(MONTH,EndDate)
ELSE DATEPART(MONTH,StartDate)
END
FROM (VALUES('20190925','20191020'))V(StartDate,EndDate);
Does the following fit your requirements?
You can build a table of days-in-month (this would be permanent ideally)
and then join to it using the month numbers of your min and max dates.
declare #start date='20190925', #end date='20191020';
--declare #start date='20190701', #end date='20190729';
--declare #start date='20190701', #end date='20190905';
with dim as (
select m,DAY(DATEADD(DD,-1,DATEADD(mm, DATEDIFF(mm, 0, DateFromParts(Year(GetDate()),m,1) )+1, 0)))d
from (values(1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12))m(m)
)
select top(1) with ties m
from dim
where m between Month(#start) and Month(#end)
order by d desc
You don't state how you determin the most days where there are several months with the same number of months, so with ties includes all qualifying months.
Edit
So I don't know if there is a requirement to span years - the sample data suggests not - however with a permanent list of dates and corresponding days in month values (this is often part of a calendar table) a slight tweak will accomodate it.
with dim as (
select Year(#start)*100 + m m, Day(DATEADD(DD,-1,DATEADD(mm, DATEDIFF(mm, 0, DateFromParts(Year(#start),m,1) )+1, 0)))d
from (values(1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12))m(m)
union all
select Year(#end)*100 + m m, Day(DATEADD(DD,-1,DATEADD(mm, DATEDIFF(mm, 0, DateFromParts(Year(#end),m,1) )+1, 0)))d
from (values(1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12))m(m)
)
select top(1) with ties m
from dim
where m between Year(#start)*100 + Month(#start) and Year(#end)*100 + Month(#end)
order by d desc
You could try something like this
with
l0(n) as (
select 1 n
from (values (1),(1),(1),(1),(1),(1),(1),(1)) as v(n))
select top(1) with ties
vTable.*, calc.dt month_with_most_days
from (values ('20190925','20191020'),
('20190925','20191120')) vTable(startdate, enddate)
cross apply (values (datediff(month, vTable.startdate, vTable.enddate))) diff(mo_count)
cross apply (select top (diff.mo_count+1)
row_number() over (order by (select null)) n
from l0 l1, l0 l2, l0 l3, l0 l4) tally /* 8^4 months possible */
cross apply (values (cast(case when tally.n=1 then startdate
when tally.n=diff.mo_count+1 then enddate
else eomonth(dateadd(month, tally.n-1, startdate)) end as date))) calc(dt)
order by row_number() over (partition by startdate, enddate
order by day(calc.dt) desc);
startdate enddate month_with_most_days
20190925 20191020 2019-09-25
20190925 20191120 2019-10-31

How to get six weeks data from a week column?

I have a legacy query in which I am looking data for six weeks as shown below. In my below AND condition I get data for past six weeks and it worked fine in 2020 middle and end. But since 2021 started, this stopped working because of obvious subtraction I am doing with 6.
AND data.week_col::integer BETWEEN DATE_PART(w, CURRENT_DATE) - 6 AND DATE_PART(w, CURRENT_DATE) - 1
There is a bug in above query because of which it stopped working in 2021. How can I change above condition so that it can work entire year without any issues and give me data for past 6 weeks.
Update
Below is my query which I am running:
select *,
dateadd(d, - datepart(dow, trunc(CONVERT_TIMEZONE('UTC','PST8PDT',client_date))), trunc(CONVERT_TIMEZONE('UTC','PST8PDT',client_date)) + 6) as day,
date_part(week, day) as week_col
from holder data
where data.week_col::integer BETWEEN DATE_PART(w, CURRENT_DATE) - 6 AND DATE_PART(w, CURRENT_DATE) - 1
client_date column has values like this - 2021-01-15 21:30:00.0. And from that I get value of day column and from day column I get value of
week_col column as shown above.
week_col column has values like 53, 52 .... It's a week number in general.
Because of my AND condition I am getting data for week 1 only but technically I want data for 49, 50, 51, 52, 53 and 1 as it is past six weeks. Can I use day column here to get correct past six weeks?
Would this serve as a solution? I do not know much about the redshirt syntax but I read it supports dateadd(). If you are normalizing client_date to a time zone converted day with no time then why not simply use that in the comparison to the current date converted to the same time zone.
WHERE
client_date BETWEEN
DATEADD(WEEK,-6,trunc(CONVERT_TIMEZONE('UTC','PST8PDT',CURRENT_DATE)))
AND
DATEADD(WEEK,-1,trunc(CONVERT_TIMEZONE('UTC','PST8PDT',CURRENT_DATE)))
If the above logic works out then you may want to convert the -6 and -1 week to variables, if that is supported.
Solution 2
This is a bit more verbose but involves virtualizing a calender table and then joining your current date parameter into the calender data, for markers. Finally, you can join your data against the calender which has been normalized by weeks in time chronologically.
This is SQL Server syntax, however, I am certain it can be converted to RS.
DECLARE #D TABLE(client_date DATETIME)
INSERT #D VALUES
('11/20/2020'),('11/27/2020'),
('12/4/2020'),('12/11/2020'),('12/18/2020'),('12/25/2020'),
('01/8/2021'),('01/8/2021'),('1/15/2021'),('1/22/2021'),('1/29/2021')
DECLARE #Date DATETIME = '1/23/2021'
DECLARE #StartDate DATETIME = '01/01/2010'
DECLARE #NumberOfDays INT = 6000
;WITH R1(N) AS (SELECT 1 FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1))dt(n)),
R2(N) AS (SELECT 1 FROM R1 a, R1 b),
R3(N) AS (SELECT 1 FROM R2 a, R2 b),
Tally(Number) AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM R3)
,WithTally AS
(
SELECT CalendarDate = DATEADD(DAY,T.Number,#StartDate)
FROM Tally T
WHERE T.Number < #NumberOfDays
)
,Calendar AS
(
SELECT
CalendarDate,
WeekIndex = DENSE_RANK() OVER(ORDER BY DATEPART(YEAR, CalendarDate), DATEPART(WEEK, CalendarDate))
FROM
WithTally
),
CalendarAlignedWithCurrentDateParamater AS
(
SELECT *
FROM
Calendar
CROSS JOIN (SELECT WeekIndexForToday=WeekIndex FROM Calendar WHERE Calendar.CalendarDate=#Date ) AS X
)
SELECT
D.*,
C.WeekIndex,
C.WeekIndexForToday
FROM
CalendarAlignedWithCurrentDateParamater C
INNER JOIN #D D ON D.client_date = C.CalendarDate
WHERE
C.WeekIndex BETWEEN C.WeekIndexForToday-6 AND C.WeekIndexForToday-1
OPTION (MAXRECURSION 0)

SQL question - how to create date + hour dimension table?

I would like to create a table showing hours 0 through 24 for each date since 1/1/2020 (until current). It would look something like this:
enter image description here
Column 1: Date from 1/1/2020 until current
Column 2: Hour 0-24, repeating for each date
Here is one way to do this using T-SQL:
with
tally as
(
select top 1000 n = row_number() over(order by (select null)) - 1 from sys.messages
),
calendar as
(
select [Date] = cast(dateadd(d, n, '1/1/2020') as date) from tally where n < datediff(d, '1/1/2020', getdate())
),
[hours] as
(
select top 24 n from tally
)
--select * from tally;
--select * from calendar;
select [Date] = format(c.[Date], 'M/d/yyyy'), Hrs = h.n
from [hours] h
cross join calendar c
order by c.[Date], h.n;
The tally CTE creates rows with a zero based index.
The calendar CTE creates the dates between 1/1/2020 and today.
The hours CTE creates the hours from 0 through 23.
The final query creates a Cartesian Product of calendar and hours.
This is a query that generates the desired data. If the data is to be persisted in a table, then insert logic would need to be added to the final query.

Group by contiguous dates and Count

I have a table which contains information about reports being accessed along with the Date.I need to group reports being accessed according to a date range and count them.
I'm using T-SQL
Table
EventId ReportId Date
60 4 11/24/2015
59 11 11/23/2015
58 6 11/22/2015
57 11 11/22/2015
56 9 11/21/2015
55 3 11/20/2015
54 5 11/20/2015
53 6 11/19/2015
52 5 11/19/2015
51 4 11/18/2015
50 3 11/17/2015
49 9 11/16/2015
If days' difference is 3 then I need result in the format
StartDate EndDate ReportsAccessed
11/22/2015 11/24/2015 4
11/19/2015 11/21/2015 5
11/16/2015 11/18/2015 3
but the difference between days could change.
Assuming you have values for all the dates, then you can calculate the difference in days between each date and the maximum (or minimum) date. Then divide this by three and use that for aggregation:
select min(date), max(date), count(*) as ReportsAccessed
from (select t.*, max(date) over () as maxd
from table t
) t
group by (datediff(day, date, maxd) / 3)
order by min(date);
"3" is what I think you are referring to as the "difference in days".
Those 2 blocks are simply for added clarity on what parameters you'd have to change
DECLARE #t as TABLE(
id int identity(1,1),
reportId int,
dateAccess date)
DECLARE #NumberOfDays int=3;
And here comes the actual select
Select StartDate, EndDate, COUNT(reportId) from
(
select *,
DATEADD(day, DATEDIFF(DAY, dateAccess, maxdate.maxdate)%#NumberOfDays, dateAccess) as EndDate,
DATEADD(day, DATEDIFF(DAY, dateAccess, maxdate.maxdate)%#NumberOfDays-#NumberOfDays+1, dateAccess) as StartDate
from #t, (select MAX(dateAccess) maxdate from #t t2) maxdate
) results
GROUP BY StartDate, EndDate
ORDER BY StartDate desc
There are a few places I'm unsure if it's optimized or not, for instance cross joining with select max(date) instead of using a subquery, but that returns the exact result from your OP.
Basically, I simply split the entries into groups based on how far they are from the MAX(date), and then use a COUNT. On that note, it might be more useful to use COUNT(distinct ...) otherwise if someone looks at the document #9 3 times, it will tell you tha 3 documents were checked, but only 1 was truly looked at.
The upside with using MAX(date) over MIN(date) is that your first group will always have the maximal amount of days. This will prove very useful if you want to compare the last few periods to the average. The downside is that you don't have stable data. With every new entry (assuming it's a new day), your query will cycle itself to produce a new set of results. If you wanted to graph the data, you'd be better comparing to MIN(date) that way the first days won't change when you add a new one.
Depending on the usage, it could even be useful to extrapolate the number of accesses done in the last period (in that case MIN(date) is also preferable).
Here's an adaptation of Gordon's answer that's probably much more optimized (it's at the very least much more aesthetic) :
SELECT DateADD(day, -datediff(day, dateAccess, maxdate)/3*3, maxdate) as EndDate,
DateADD(day, (-datediff(day, dateAccess, maxdate)/3+1)*3, maxdate) as StartDate,
count(reportId)
from (select *, MAX(dateAccess) over() as maxdate from #t) t
GROUP BY datediff(day, dateAccess, maxdate)/3, maxdate
I will insist that most efficient way of doing this is to use tally table. That way you are getting sargable predicates with all benefits from indexes on date column:
declare #c int = 3
;with minmax as(select min(date) as mind, max(date) as maxd from t),
tally as(select #c * (-1 + row_number() over(order by(select null))) as rn
from master..spt_values),
intervals as(select dateadd(dd, rn, mind) as f, dateadd(dd, rn + #c - 1, mind) t
from tally t cross join minmax m where dateadd(dd, rn, mind) <= maxd)
select i.f as [from], i.t as [to], count(*) as reeports
from intervals i
join t on t.date >= i.f and t.date <= i.t
group by i.f, i.t
Explanation: minmax selects minimum date and maximum date from table.
tally generates numbers from 0 to N(depends on system, but enougth to calc intervals). intervals selects resulting intervals. The last part is simple join on intervals to calculate counts per interval.
Fiddle http://sqlfiddle.com/#!3/c61d1/5

Sql to select row from each day of a month

I have a table which store records of all dates of a month. I want to retrieve some data from it. The table is so large that I should only selecting a fews of them. If the records have a column "ric_date" which is a date, how can I select records from each of the dates in a month, while selecting only a fews from each date?
The table is so large that the records for 1 date can have 100000 records.
WITH T AS (
SELECT ric_date
FROM yourTable
WHERE rice_date BETWEEN #start_date AND #end_date -- thanks Aaron Bertrand
GROUP BY ric_date
)
SELECT CA.*
FROM T
CROSS APPLY (
SELECT TOP 500 * -- 'a fews'
FROM yourTable AS YT
WHERE YT.ric_date = T.ric_date
ORDER BY someAttribute -- not required, but useful
) AS CA
Rough idea. This will get the first three rows per day for the current month (or as many that exist for any given day - there may be days with no rows represented).
DECLARE
#manys INT = 3,
#month DATE = DATEADD(DAY, 1-DAY(GETDATE()), DATEDIFF(DAY, 0, GETDATE()));
;WITH x AS
(
SELECT some_column, ric_date, rn = ROW_NUMBER() OVER
(PARTITION BY ric_date ORDER BY ric_date)
FROM dbo.data
WHERE ric_date >= #month
AND ric_date < DATEADD(MONTH, 1, #month)
)
SELECT some_column, ric_date FROM x
WHERE rn <= #manys;
If you don't have supporting indexes (most importantly on ric_date), this won't necessarily scale well at the high end.