Snowflake SQL query to assign weeks to a month

I know about the Snowflake date functions for extracting the day, week, month, year, etc.
I want each week to start on Saturday and run until the next Saturday.
The following question gives an idea of how to extract the week number, but I need something that addresses my specific case:
How to get week number of month for any given date on snowflake SQL
If four or more days of a week belong to a certain month, I want to assign the week to that month; otherwise, to the next month.
Example:
The week of April 29, 2023 to May 5, 2023 has fewer than four days in April, so I want to count it as May.
The week of May 27, 2023 to June 2, 2023 has more than four days in May, so I want to count it as May.
In short, I want to assign each week to the month that holds most of its days (four or more).

Snowflake lets you set the first day of the week with the WEEK_START session parameter.
https://docs.snowflake.com/en/sql-reference/parameters.html#label-week-start
Setting it to 6 makes Saturday the first day of the week.
The WEEK() function then counts the weeks of the year using Saturday as the boundary between weeks.
Now we just need to find which calendar month holds the most days of any given week and assign the week to that month.
The script below is an example of how to build a custom date dimension table. You can generate the table once and join against it to retrieve your custom date attributes.
/***************************************************************************
A WEEK_START session parameter of 0 is the default Snowflake behavior
and has weeks start on Monday and end on Sunday (ISO standard).
https://docs.snowflake.com/en/sql-reference/parameters.html#label-week-start
-- 6 = Saturday is day 1 of the week
*********************************************************************************************/
alter session set week_start = 6;
/*********************************************************************************************
The parameters below define the temporal boundaries of the calendar table. The values must be
DATE type and can be hardcoded, the result of a query, or a combination of both.
For example, you could set date_start and date_end based on the MIN and MAX date of the table
with the finest date granularity in your data.
*********************************************************************************************/
SET date_start = TO_DATE('2022-12-18');
SET date_end = current_date(); --TIP: for the current date use current_date();
--This sets the num_days parameter to the number of days between start and end
--this value is used for the generator
set num_days = (select datediff(day, $date_start, $date_end+1));
--CTE to hold generated date range
create or replace transient table calendar as
with gen_cte as (
select
dateadd(day, -1 * row_number() over (order by null),
dateadd(day, 1, $date_end)
) as date_key
from table (generator(rowcount => ($num_days)))
order by 1)
-- calendar table expressions
, step_1 as (
select
date_key
, dayofmonth(date_key) as day_of_month
, week(date_key) as week_num --*see comments
--, dayofweekiso(date_key) as day_of_week_iso,
, dayofweek(date_key) as day_of_week
, dayname(date_key) as day_name
, month(date_key) as month_num
--, weekiso(date_key) as week_iso_num, --*see comments
, year(date_key) as year_
, yearofweek(date_key) || '-' || week_num::string as year_week_key -- yearofweek keeps a week that spans New Year under a single key
, count(date_key) over (partition by year_week_key, month_num) as days_of_week_in_month
--ceil(dayofmonth(date_key) / 7) as day_instance_in_month --used to identify 'floating' events such as "fourth thursday of november"
FROM gen_cte)
-- calculate the max number of days in each month for any week in year
, step_2 as (
select
year_week_key
, month_num
, max(step_1.days_of_week_in_month) as max_days_of_week_in_month
from step_1
group by year_week_key, month_num)
-- for any week with 2 actual month values, assign the month with the most number of days
, step_3 as (
select
year_week_key
, month_num
, row_number() over (partition by year_week_key order by max_days_of_week_in_month desc ) as month_rank
from step_2
qualify month_rank = 1
)
select
s1.date_key
, s1.day_of_month
, s1.week_num
, s1.day_of_week
, s1.day_name
, s3.month_num as assigned_month_num
, s1.month_num as actual_month_num
, s1.year_
from step_1 s1
left join step_3 s3
on s1.year_week_key = s3.year_week_key
;
-- select from your new date dimension table
select * from calendar;
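The assignment rule can also be cross-checked outside Snowflake. The following is a minimal Python sketch, not part of the original answer; the helper names week_start_saturday and assigned_month are made up for illustration. It assumes weeks run Saturday through Friday and assigns each week to the month that holds four or more of its days.

```python
from collections import Counter
from datetime import date, timedelta


def week_start_saturday(d):
    """Return the Saturday that opens the week containing d."""
    # date.weekday() is 0 for Monday, so Saturday is 5
    return d - timedelta(days=(d.weekday() - 5) % 7)


def assigned_month(d):
    """Return (year, month) of the month holding most days of d's week."""
    start = week_start_saturday(d)
    week_days = [start + timedelta(days=i) for i in range(7)]
    counts = Counter((x.year, x.month) for x in week_days)
    # A 7-day week touches at most two months, so one of them always has 4+ days
    return counts.most_common(1)[0][0]


if __name__ == "__main__":
    # The two weeks mentioned in the question
    print(assigned_month(date(2023, 4, 29)))
    print(assigned_month(date(2023, 6, 2)))
```

Because a week touches at most two months, the month of the fourth day of the week (the Saturday plus three days) gives the same answer and may be a simpler test to port back to SQL.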

Related

Getting Average for Weekdays and Weekends within 30 days in BQ

I'm trying to get the average repairs in the weekdays and weekends within the last 30 days. Each day is tagged whether it's a weekday or a weekend. Holidays are tagged as weekends.
If I use:
AVG(Completed_Repairs) OVER(PARTITION BY day_type ORDER BY UNIX_DATE(WORK_DT) RANGE BETWEEN 30 PRECEDING AND CURRENT ROW)
I only get either the average repairs for all weekdays or for all weekends in the last 30 days depending on what type of day the date is. But I also need the average for the opposite to compute a prorated monthly number. I basically would need another column with the value of the opposite day type.
If I understood correctly, not partitioning might be the way:
with
input as (
select cast('2022-10-11' as date) as WORK_DT, "weekday" as day_type, 307 as completed_repairs union all
select cast('2022-10-12' as date) as WORK_DT, "weekday" as day_type, 100 as completed_repairs union all
select cast('2022-10-09' as date) as WORK_DT, "weekend" as day_type, 750 as completed_repairs union all
select cast('2022-10-10' as date) as WORK_DT, "weekend" as day_type, 647 as completed_repairs
)
select
*,
avg(if(day_type = 'weekday', completed_repairs,0)) OVER(ORDER BY UNIX_DATE(WORK_DT) RANGE BETWEEN 30 PRECEDING AND CURRENT ROW) as avg_weekday,
avg(if(day_type = 'weekend', completed_repairs,0)) OVER(ORDER BY UNIX_DATE(WORK_DT) RANGE BETWEEN 30 PRECEDING AND CURRENT ROW) as avg_weekend,
from input
order by work_dt
Replace the 0 with null if you don't want weekend rows to affect the weekday average, and vice versa: with 0, every row in the window still counts in the denominator.
If you'd rather have a "matching" column and an "opposite" column, you can derive them from this result with a condition on day_type.
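To make the effect of the null variant concrete, here is a small Python sketch of the same 30-day window over the sample rows from the query above. It is only an illustration of the logic, not BigQuery code, and rolling_avg is a hypothetical helper name. Rows of the other day type are skipped entirely, which is what AVG does with nulls.

```python
from datetime import date, timedelta

# Sample rows copied from the input CTE: (work_dt, day_type, completed_repairs)
ROWS = [
    (date(2022, 10, 11), "weekday", 307),
    (date(2022, 10, 12), "weekday", 100),
    (date(2022, 10, 9), "weekend", 750),
    (date(2022, 10, 10), "weekend", 647),
]


def rolling_avg(rows, as_of, day_type, window_days=30):
    """Average completed_repairs of one day_type over [as_of - window_days, as_of]."""
    start = as_of - timedelta(days=window_days)
    values = [
        repairs
        for work_dt, kind, repairs in rows
        if kind == day_type and start <= work_dt <= as_of
    ]
    # Like AVG over nothing but nulls, return None when no matching row is in the window
    return sum(values) / len(values) if values else None


if __name__ == "__main__":
    for work_dt, kind, _ in sorted(ROWS):
        print(work_dt, kind,
              rolling_avg(ROWS, work_dt, "weekday"),
              rolling_avg(ROWS, work_dt, "weekend"))
```

Every row gets both averages regardless of its own day_type, which is the point of dropping the partition.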

R's ceiling_date equivalent in SQL

I want to implement R's ceiling_date function in SQL (PostgreSQL).
I have a column with a date for every day and the corresponding sales, and I want to accumulate each week's sales onto a single date (say, Friday).
Input Format:
Dates in yellow are the dates to aggregate sales on
Expected output format:
This can easily be done in R using ceiling_date but I want to do it in SQL itself.
Any help would be appreciated. Thanks
Accepting and processing the ISO 8601 standard is by far the easiest way to handle date ranges. But this imposes a standard definition of the week, which is essentially:
All weeks consist of exactly 7 days.
All weeks begin on Monday.
The first week of the year is the week that contains 4-Jan.
The date_trunc function gives the first date of the week; adding 6 gives the last day of the week.
-- ISO 8601 Week definition
select (date_trunc('week',dte)::date +6) "Week Ending"
, sum(sales) "Total Sales"
from test
group by (date_trunc('week',dte)::date +6)
order by (date_trunc('week',dte)::date +6);
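For readers who want to check the ISO grouping without a database, the same "truncate to Monday, add 6" step can be sketched in Python. This is an illustration only; iso_week_ending and weekly_totals are hypothetical names, and the rows are assumed to be (date, sales) pairs like the test table.

```python
from collections import defaultdict
from datetime import date, timedelta


def iso_week_ending(d):
    """Sunday that closes the ISO week containing d."""
    # weekday() is 0 for Monday: step back to Monday, then forward 6 days
    return d - timedelta(days=d.weekday()) + timedelta(days=6)


def weekly_totals(rows):
    """Sum sales per ISO week, keyed by the week-ending Sunday."""
    totals = defaultdict(int)
    for dte, sales in rows:
        totals[iso_week_ending(dte)] += sales
    return dict(sorted(totals.items()))


if __name__ == "__main__":
    sample = [(date(2022, 5, 1), 1), (date(2022, 5, 2), 2), (date(2022, 5, 8), 3)]
    for week_ending, total in weekly_totals(sample).items():
        print(week_ending, total)
```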
Date/week processing for non-ISO 8601 weeks is somewhat trickier, because you have to build the week definition yourself. The following does so for a week that ends on Friday (Saturday through Friday). It creates a year of date ranges anchored on the first Friday in the table, then joins using the range "contained by" operator to determine the appropriate summation period.
with periods (wk) as
( select daterange( ((min_dt + (n-1) * interval '1 week'))::date
, ((min_dt + (n) * interval '1 week'))::date
, '(]'
)
from (select min(dte) min_dt
from test
where extract(dow from dte) = 5 --- Day_Of_Week (5) = Friday
) s
cross join generate_series(0,52) gs(n)
) --select * from periods;
select upper(wk)-1 "Week Ending"
, sum(sales) "Total Sales"
from periods
join test
on (dte <@ wk)
group by upper(wk)-1
order by upper(wk)-1;
See demo of both here.
NOTE: The demo changes the sample dates from January (2022-01-01 ...) to May (2022-05-01 ...), because 6-January-2022 was a Thursday, not a Friday as the description says; 6-May-2022, however, is a Friday. Also, the sum of values for the week ending 6-May is 38 (not 42 as indicated). Finally, neither query applies a limiting date; both process through the end of the data. Nor does either address multiple years of data.
Idea: for 2022-January-1 to 2022-January-20, there are 3 Fridays: '2022-01-07', '2022-01-14', '2022-01-21'.
We need to partition by these 3 Fridays and order by sales date.
The problem is now to work out which of these 3 Fridays each date belongs to:
Get the Friday of the week each sales_date falls in.
Deal with the special cases (the days after Friday: Saturday and Sunday): when sales_date > friday, the real Friday is the next Friday.
Final code:
SELECT
*,
sum(amount) OVER (PARTITION BY sales.compute_friday ORDER BY sales_date)
FROM
sales;
processing code:
BEGIN;
CREATE TABLE sales (
sales_date date
, amount numeric
);
INSERT INTO sales (sales_date , amount)
SELECT
i
, (random() * 10)::integer
FROM
generate_series('2022-01-01'::timestamp , '2022-01-20'::timestamp , interval '1 day') g (i);
ALTER TABLE sales
ADD COLUMN friday date;
UPDATE
sales
SET
friday = (date_trunc('week' , sales_date) + interval '4 day')::date;
ALTER TABLE sales
ADD COLUMN compute_friday date;
UPDATE
sales
SET
compute_friday = CASE WHEN sales_date > friday THEN
(friday + interval '7 days')::date
ELSE
friday
END;
COMMIT;
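The two UPDATE steps above amount to "round each date up to the next Friday". A short Python sketch, offered as a cross-check rather than as part of the answer, reproduces the two-step rule and compares it with a one-step modulo form; both function names are hypothetical.

```python
from datetime import date, timedelta


def compute_friday_two_step(sales_date):
    """Mirror the two UPDATEs: Friday of the Monday-based week, then roll forward."""
    # Step 1: like date_trunc('week', sales_date) + interval '4 day'
    friday = sales_date - timedelta(days=sales_date.weekday()) + timedelta(days=4)
    # Step 2: Saturday and Sunday belong to the following Friday
    return friday + timedelta(days=7) if sales_date > friday else friday


def week_ending_friday(sales_date):
    """Same result in one step: days until the next Friday (weekday 4), 0 on a Friday."""
    return sales_date + timedelta(days=(4 - sales_date.weekday()) % 7)


if __name__ == "__main__":
    day = date(2022, 1, 1)
    while day <= date(2022, 1, 20):
        print(day, week_ending_friday(day))
        day += timedelta(days=1)
```

If the one-step form is preferred, the same modulo arithmetic could presumably be written directly in SQL, which would remove the need for the helper friday column.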

how do I create a calculated field that returns days remaining till end of FISCAL_QUARTER?

(Image: current output with no DAYS_LEFT_IN_QUARTER)
I am new to using Snowflake and was tasked with creating a Calendar Dimension table to support weekly / monthly / quarterly reports. I am confused about how to return the days remaining in the FISCAL_QUARTER. Q1 spans Feb - Apr.
Attached below is the code I have been writing to generate the dates projecting 14 years in the future.
--Set the start date and number of years to produce
SET START_DATE = '2012-01-01';
SET NUMBER_DAYS = (SELECT TRUNC(14 * 365));
--Set parameters to force ISO
ALTER SESSION SET WEEK_START = 1, WEEK_OF_YEAR_POLICY = 1;
WITH CTE_MY_DATE AS (
SELECT DATEADD(DAY, SEQ4(), $START_DATE) AS MY_DATE
FROM TABLE(GENERATOR(ROWCOUNT=>$NUMBER_DAYS)) -- Number of days after reference date in previous line
)
SELECT
MY_DATE::date
,YEAR(MY_DATE) AS YEAR
,MONTH(MY_DATE) AS MONTH
,MONTHNAME(MY_DATE) AS MONTH_ABBREVIATION
,DAY(MY_DATE)
,DAYOFWEEK(MY_DATE)
,WEEKOFYEAR(MY_DATE)
,DAYOFYEAR(MY_DATE)
,YEAR(ADD_MONTHS(DATE_TRUNC('month', MY_DATE),11)) AS FISCAL_YEAR
,CONCAT('Q', QUARTER(ADD_MONTHS(DATE_TRUNC('month', MY_DATE),11))) AS FISCAL_QUARTER
,MONTH(ADD_MONTHS(DATE_TRUNC('month', MY_DATE),11)) AS FISCAL_MONTH
FROM CTE_MY_DATE
;
Firstly, your generator can produce gaps, because the SEQx() functions are allowed to have gaps, so you need to use SEQx() in the ORDER BY of a ROW_NUMBER, like so:
WITH cte_my_date AS (
SELECT DATEADD(DAY, ROW_NUMBER() OVER(ORDER BY SEQ4()) - 1, $START_DATE) AS my_date -- ROW_NUMBER() starts at 1, so subtract 1 to keep $START_DATE as the first date
FROM TABLE(GENERATOR(ROWCOUNT=>$NUMBER_DAYS)) -- Number of days after reference date in previous line
)
The days left in the quarter are the date truncated to the quarter, plus one quarter, date-diffed in days against the date itself:
,DATEDIFF('days', my_date, DATEADD('quarter', 1, DATE_TRUNC('quarter', my_date))) AS days_left_in_quarter
How's this? You can copy/paste the code straight into snowflake to test.
Using last_day() tends to make it look a little tidier :-)
WITH CTE_MY_DATE AS (
SELECT DATEADD(DAY, SEQ4(), current_date()) AS MY_DATE
FROM TABLE(GENERATOR(ROWCOUNT=>300)))
SELECT
MY_DATE::date
,YEAR(last_day(my_date,year)) AS FISCAL_YEAR
,concat('Q',quarter(my_date)) AS FISCAL_QUARTER
,datediff(d, my_date, last_day(my_date,quarter)) AS DAYS_LEFT_IN_QUARTER
FROM CTE_MY_DATE
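Both answers measure against the calendar quarter. If the fiscal quarters are offset as in the question (Q1 = Feb to Apr), the quarter end has to be shifted as well. Below is a minimal Python sketch of that arithmetic, assuming the fiscal year starts on February 1 and counting days up to the last day of the quarter (so the last day itself gives 0, as with last_day above). The names fiscal_quarter_end and days_left_in_fiscal_quarter are hypothetical.

```python
from datetime import date, timedelta

FISCAL_START_MONTH = 2  # assumption: fiscal Q1 begins on February 1


def fiscal_quarter_end(d, first_month=FISCAL_START_MONTH):
    """Last calendar day of the fiscal quarter that contains d."""
    # Whole months elapsed since the start of the fiscal year
    offset = (d.month - first_month) % 12
    # Months from the fiscal-year start to the start of the next fiscal quarter
    months_to_next_quarter = (offset // 3 + 1) * 3
    # Calendar year in which the current fiscal year started
    fy_start_year = d.year if d.month >= first_month else d.year - 1
    total = (first_month - 1) + months_to_next_quarter
    next_quarter_start = date(fy_start_year + total // 12, total % 12 + 1, 1)
    return next_quarter_start - timedelta(days=1)


def days_left_in_fiscal_quarter(d):
    """Days from d to the last day of its fiscal quarter (0 on the last day)."""
    return (fiscal_quarter_end(d) - d).days


if __name__ == "__main__":
    for sample in (date(2012, 2, 1), date(2012, 4, 30), date(2012, 12, 10)):
        print(sample, fiscal_quarter_end(sample), days_left_in_fiscal_quarter(sample))
```

The same month shift could be expressed in Snowflake with ADD_MONTHS around DATE_TRUNC('quarter', ...), but that translation is left unverified here.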

Find previous equivalent dates over the past two calendar years

If today is say 15th August 2012 then the query should return the following
15/01/2011,
15/02/2011,
...
...
15/07/2012
15/08/2012
If today is 31st August 2012 then the query would return
31/01/2011,
28/02/2011, <<<<this is the nearest date
...
...
31/07/2012
31/08/2012
We have a vw_DimDate in our Warehouse which should help
edit
It contains the following fields
Currently I'm using the following but it seems rather convoluted! ...
DECLARE @Dt DATETIME = '31 JUL 2012'--GETDATE()
;WITH DateSet_cte(DayMarker)
AS
(
SELECT DayMarker
FROM WHData.dbo.vw_DimDate
WHERE
DayMarker >= CONVERT(DATETIME,CONVERT(CHAR(4),DATEADD(YEAR,-1,@Dt),112) + '0101') AND
DayMarker <=@Dt
)
)
, MaxDate_cte(MaxDate)
AS
(
SELECT [MaxDate] = MAX(DayMarker)
FROM DateSet_cte
)
SELECT
[Mth] = CONVERT(DATETIME,CONVERT(CHAR(6),a.DayMarker,112) + '01')
, MAX(a.DayMarker) [EquivDate]
FROM DateSet_cte a
WHERE DAY(a.DayMarker) <= (SELECT DAY([MaxDate]) FROM MaxDate_cte)
GROUP BY CONVERT(DATETIME,CONVERT(CHAR(6),a.DayMarker,112) + '01')
;with Numbers as (
select distinct number from master..spt_values where number between 0 and 23
), Today as (
select CONVERT(date,CURRENT_TIMESTAMP) as d
)
select
DATEADD(month,-number,d)
from
Numbers,Today
where DATEPART(year,DATEADD(month,-number,d)) >= DATEPART(year,d) - 1
Seems odd to want a variable number of returned values based on how far through the year we are, but that's what I've implemented.
When you use DATEADD to add months to a value, then it automatically adjusts the day number if it would have produced an out of range date (e.g. 31st February), such that it's the last day of the month. Or, as the documentation puts it:
If datepart is month and the date month has more days than the return month and the date day does not exist in the return month, the last day of the return month is returned.
Of course, if you already have a numbers table in your database, you can eliminate the first CTE. You mentioned that you "have a vw_DimDate in our Warehouse which should help", but since I have no idea on what that (presumably, a) view contains, it wasn't any help.
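The clamping behaviour described above is easy to reproduce for testing. The following Python sketch is an illustration of the rule, not SQL Server's implementation; add_months and equivalent_dates are hypothetical helpers. It steps back month by month, clamps the day to the last day of the target month, and keeps only dates from the previous calendar year onward.

```python
import calendar
from datetime import date


def add_months(d, n):
    """Shift d by n months, clamping the day to the last day of the target month."""
    index = d.year * 12 + (d.month - 1) + n
    year, month0 = divmod(index, 12)
    month = month0 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def equivalent_dates(today):
    """Equivalent dates from January of last year up to today, oldest first."""
    candidates = [add_months(today, -k) for k in range(24)]
    return sorted(x for x in candidates if x.year >= today.year - 1)


if __name__ == "__main__":
    for d in equivalent_dates(date(2012, 8, 31)):
        print(d.strftime("%d/%m/%Y"))
```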

Return just the last day of each month with SQL

I have a table that contains multiple records for each day of the month, over a number of years. Can someone help me write a query that returns only the last day of each month?
SQL Server (other DBMS will work the same or very similarly):
SELECT
*
FROM
YourTable
WHERE
DateField IN (
SELECT MAX(DateField)
FROM YourTable
GROUP BY MONTH(DateField), YEAR(DateField)
)
An index on DateField is helpful here.
PS: If your DateField contains time values, the above will give you the very last record of every month, not the last day's worth of records. In this case, reduce each datetime to its date value before doing the comparison.
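Note that MAX per month returns the latest date present in the data, which is not necessarily the calendar month end. A small Python sketch of the same grouping makes that visible; last_dates_per_month is a hypothetical name, and the input is assumed to be plain dates with no time part.

```python
from datetime import date


def last_dates_per_month(dates):
    """Latest date seen in each (year, month), like MAX(DateField) ... GROUP BY month, year."""
    latest = {}
    for d in dates:
        key = (d.year, d.month)
        if key not in latest or d > latest[key]:
            latest[key] = d
    return sorted(latest.values())


if __name__ == "__main__":
    sample = [date(2017, 4, 27), date(2017, 4, 28), date(2017, 5, 31), date(2017, 7, 24)]
    # April's latest recorded date is the 28th, not the 30th
    print(last_dates_per_month(sample))
```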
The easiest way I could find to identify if a date field in the table is the end of the month, is simply adding one day and checking if that day is 1.
where DAY(DATEADD(day, 1, AsOfDate)) = 1
If you use that as your condition (assuming AsOfDate is the date field you are looking for), then it will only return records where AsOfDate is the last day of the month.
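The "add one day and check for day 1" test is simple enough to verify by hand. A minimal Python sketch of the same check, with is_month_end as a hypothetical name:

```python
from datetime import date, timedelta


def is_month_end(d):
    """True when d is the last calendar day of its month."""
    # The day after a month end is always the 1st
    return (d + timedelta(days=1)).day == 1


if __name__ == "__main__":
    for sample in (date(2017, 4, 30), date(2017, 7, 24), date(2016, 2, 29)):
        print(sample, is_month_end(sample))
```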
Use the EOMONTH() function if it's available to you (e.g. in SQL Server). It returns the last date in a month given a date.
select distinct
Date
from DateTable
Where Date = EOMONTH(Date)
Or, you can use some date math.
select distinct
Date
from DateTable
where Date = DATEADD(MONTH, DATEDIFF(MONTH, -1, Date), -1)
In SQL Server, this is how I usually get to the last day of the month relative to an arbitrary point in time:
select dateadd(day,-day(dateadd(month,1,current_timestamp)) , dateadd(month,1,current_timestamp) )
In a nutshell:
From your reference point-in-time,
Add 1 month,
Then, from the resulting value, subtract its day-of-the-month in days.
Voilà! You have the last day of the month containing your reference point in time.
Getting the 1st day of the month is simpler:
select dateadd(day,-(day(current_timestamp)-1),current_timestamp)
From your reference point-in-time,
subtract (in days), 1 less than the current day-of-the-month component.
Stripping off/normalizing the extraneous time component is left as an exercise for the reader.
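The two recipes above can be traced step by step in Python. This is a sketch under the assumption that adding a month clamps the day to the end of the target month, as DATEADD does; add_months, last_day_of_month and first_day_of_month are hypothetical helpers.

```python
import calendar
from datetime import date, timedelta


def add_months(d, n):
    """Shift d by n months, clamping the day like DATEADD(month, ...)."""
    index = d.year * 12 + (d.month - 1) + n
    year, month0 = divmod(index, 12)
    month = month0 + 1
    return date(year, month, min(d.day, calendar.monthrange(year, month)[1]))


def last_day_of_month(d):
    """Add 1 month, then subtract the resulting day-of-month in days."""
    shifted = add_months(d, 1)
    return shifted - timedelta(days=shifted.day)


def first_day_of_month(d):
    """Subtract one less than the day-of-month in days."""
    return d - timedelta(days=d.day - 1)


if __name__ == "__main__":
    for sample in (date(2012, 1, 31), date(2012, 2, 10), date(2011, 12, 15)):
        print(sample, first_day_of_month(sample), last_day_of_month(sample))
```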
A simple way to get the last day of month is to get the first day of the next month and subtract 1.
This should work on Oracle DB
select distinct last_day(trunc(sysdate - rownum)) dt
from dual
connect by rownum < 430
order by 1
I did the following and it worked out great. I also wanted the maximum date for the current month. Here is what my output is. Notice the last date for July, which is the 24th. I pulled it on 7/24/2017, hence the result.
Year Month KPI_Date
2017 4 2017-04-28
2017 5 2017-05-31
2017 6 2017-06-30
2017 7 2017-07-24
SELECT B.Year ,
B.Month ,
MAX(DateField) KPI_Date
FROM Table A
INNER JOIN ( SELECT DISTINCT
YEAR(EOMONTH(DateField)) year ,
MONTH(EOMONTH(DateField)) month
FROM Table
) B ON YEAR(A.DateField) = B.year
AND MONTH(A.DateField) = B.Month
GROUP BY B.Year ,
B.Month
SELECT * FROM YourTableName WHERE anyfilter
AND "DATE" IN (SELECT MAX(NameofDATE_Column) FROM YourTableName WHERE
anyfilter GROUP BY
TO_CHAR(NameofDATE_Column,'MONTH'),TO_CHAR(NameofDATE_Column,'YYYY'));
Note: this answer applies to Oracle DB.
Here's how I just solved this. day_date is the date field, calendar is the table that holds the dates.
SELECT cast(datepart(year, day_date) AS VARCHAR)
+ '-'
+ cast(datepart(month, day_date) AS VARCHAR)
+ '-'
+ cast(max(DATEPART(day, day_date)) AS VARCHAR) 'DATE'
FROM calendar
GROUP BY datepart(year, day_date)
,datepart(month, day_date)
ORDER BY 1