R's ceiling_date equivalent in SQL - sql

I want to implement R's ceiling_date function in SQL (PostgreSQL).
I have a column of daily dates with corresponding sales, and I want to accumulate each week's sales onto a single date (say, Friday).
Input Format:
Dates in yellow are the dates to aggregate sales on
Expected output format:
This can easily be done in R using ceiling_date but I want to do it in SQL itself.
Any help would be appreciated. Thanks

Accepting the ISO 8601 standard is by far the easiest way to process date ranges, but it imposes a fixed definition of a week, which is essentially:
All weeks consist of exactly 7 days.
All weeks begin on Monday.
The first week of the year is the week that contains 4-Jan.
The date_trunc function gives the first date of the week; adding 6 gives the last day of the week.
-- ISO 8601 Week definition
select (date_trunc('week',dte)::date +6) "Week Ending"
, sum(sales) "Total Sales"
from test
group by (date_trunc('week',dte)::date +6)
order by (date_trunc('week',dte)::date +6);
Week processing for a non-ISO 8601 definition is somewhat trickier. The following handles weeks that end on Friday (the day after the previous Friday through Friday). It builds a year of weekly date ranges anchored on the first Friday in the table, then joins using the range containment operator to determine the appropriate summation period for each row.
with periods (wk) as
( select daterange( ((min_dt + (n-1) * interval '1 week'))::date
, ((min_dt + (n) * interval '1 week'))::date
, '(]'
)
from (select min(dte) min_dt
from test
where extract(dow from dte) = 5 --- Day_Of_Week (5) = Friday
) s
cross join generate_series(0,52) gs(n)
) --select * from periods;
select upper(wk)-1 "Week Ending"
, sum(sales) "Total Sales"
from periods
join test
on (dte <@ wk)
group by upper(wk)-1
order by upper(wk)-1;
See demo of both here.
NOTE: The demo changes the sample dates from January (2022-01-01 ...) to May (2022-05-01 ...), because 6-January-2022 was a Thursday, not a Friday as the description states, whereas 6-May-2022 is a Friday. Also, the sum of the values for the week ending 6-May is 38 (not 42 as indicated). Finally, neither query applies a limiting date; both process through the end of the data. Nor does either address multiple years of data.
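As a lighter-weight alternative to the range join, the date_trunc approach from the ISO query can be reused for Friday-ending weeks by shifting each date before truncating. This is only a sketch, not part of the original demo; it assumes the same test(dte, sales) table with dte of type date, and treats a week as Saturday through Friday.

```sql
-- Sketch: weeks run Saturday..Friday; each row is assigned to the Friday ending its week.
-- Adding 2 days moves Saturday onto Monday, so date_trunc('week', ...) returns the
-- Monday of the shifted week. The real week starts 2 days earlier (Saturday) and ends
-- 6 days after that, i.e. shifted Monday - 2 + 6 = shifted Monday + 4.
select (date_trunc('week', dte + 2)::date + 4) "Week Ending"
     , sum(sales) "Total Sales"
  from test
 group by (date_trunc('week', dte + 2)::date + 4)
 order by (date_trunc('week', dte + 2)::date + 4);
```

With this expression 2022-05-06 (a Friday) maps to itself, while 2022-05-07 (a Saturday) maps to 2022-05-13. Because nothing is anchored on the first Friday in the data, it should also carry across year boundaries.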

demo
Idea: for 2022-01-01 to 2022-01-20, there are 3 relevant Fridays: '2022-01-07', '2022-01-14', '2022-01-21'.
We partition by these 3 Fridays and order by sales date.
The problem is then to work out which of these Fridays each date belongs to:
Get the Friday of the week that each sales_date falls in.
Deal with the special cases (the days after Friday: Saturday and Sunday): when sales_date > friday, the real Friday is the next Friday.
final code:
SELECT
*,
sum(amount) OVER (PARTITION BY sales.compute_friday ORDER BY sales_date)
FROM
sales;
processing code:
BEGIN;
CREATE TABLE sales (
sales_date date
, amount numeric
);
INSERT INTO sales (sales_date , amount)
SELECT
i
, (random() * 10)::integer
FROM
generate_series('2022-01-01'::timestamp , '2022-01-20'::timestamp , interval '1 day') g (i);
ALTER TABLE sales
ADD COLUMN friday date;
UPDATE
sales
SET
friday = (date_trunc('week' , sales_date) + interval '4 day')::date;
ALTER TABLE sales
ADD COLUMN compute_friday date;
UPDATE
sales
SET
compute_friday = CASE WHEN sales_date > friday THEN
(friday + interval '7 days')::date
ELSE
friday
END;
COMMIT;
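The two helper columns above can probably be collapsed into a single expression, which avoids the ALTER TABLE and UPDATE steps. The following is a sketch under the same assumptions (the sales(sales_date date, amount numeric) table created above); it relies on extract(dow ...) returning 0 for Sunday through 6 for Saturday, so Friday is 5.

```sql
-- Sketch: days to add to reach the next Friday (0 if the date is already a Friday).
-- (5 - dow + 7) % 7 keeps the offset in the range 0..6.
SELECT
    sales_date + ((5 - extract(dow FROM sales_date)::int + 7) % 7) AS week_ending_friday
    , sum(amount) AS total_sales
FROM
    sales
GROUP BY
    1
ORDER BY
    1;
```

This returns one row per Friday rather than a running total; to keep the per-day running sum of the final code above, the same expression can be used in PARTITION BY instead of compute_friday.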

Related

Snowflake sql query to assign weeks to a month

I know about Snowflake date function to find out day, week, month, year, etc.
I want weeks to run from Saturday each week to the next Saturday.
The following question gives an idea of how to extract the week number, but I need something that addresses my specific case.
How to get week number of month for any given date on snowflake SQL
If four or more days of a week period belong to a certain month, I would assign the week to that month; otherwise, to the next month.
Example:
The week of April 29, 2023 to May 5, 2023 has fewer than four days in April, so I want to consider it as May.
The week of May 27, 2023 to June 2, 2023 has more than four days in May, so I would like to consider it as May.
In short, I want to assign each week to the month that contains more of its days (four or more).
Snowflake will allow you to set the first day of the week with a parameter.
https://docs.snowflake.com/en/sql-reference/parameters.html#label-week-start
This will allow you to set the first day of the week at Saturday.
Doing so will result in the WEEK() function counting weeks in a year using Saturday as the delimiter between weeks.
Now we just need to find which actual month has the most days for any given week and assign that week to the proper month.
The script below is an example of how to build a custom date dimension table. You can generate the table once and join against it to retrieve your custom date attributes.
/***************************************************************************
A WEEK_START session variable of 0 is the default Snowflake behavior
and has weeks start on Monday and end on Sunday (ISO standard).
https://docs.snowflake.com/en/sql-reference/parameters.html#label-week-start
-- 6 = Saturday is day 1 of the week
*********************************************************************************************/
alter session set week_start = 6;
/*********************************************************************************************
The parameters below define the temporal boundaries of the calendar table. The values must be
DATE type and can be hardcoded, the result of a query, or a combination of both.
For example, you could set date_start and date_end based on the MIN and MAX date of the table
with the finest date granularity in your data.
*********************************************************************************************/
SET date_start = TO_DATE('2022-12-18');
SET date_end = current_date(); --TIP: for the current date use current_date();
--This sets the num_days parameter to the number of days between start and end
--this value is used for the generator
set num_days = (select datediff(day, $date_start, $date_end+1));
--CTE to hold generated date range
create or replace transient table calendar as
with gen_cte as (
select
dateadd(day,'-' || row_number() over (order by null),
dateadd(day, '+1', $date_end)
) as date_key
from table (generator(rowcount => ($num_days)))
order by 1)
-- calendar table expressions
, step_1 as (
select
date_key
, dayofmonth(date_key) as day_of_month
, week(date_key) as week_num --*see comments
--, dayofweekiso(date_key) as day_of_week_iso,
, dayofweek(date_key) as day_of_week
, dayname(date_key) as day_name
, month(date_key) as month_num
--, weekiso(date_key) as week_iso_num, --*see comments
, year(date_key) as year_
, year_ || '-' ||week_num::string as year_week_key
, count(date_key) over (partition by year_week_key, month_num) as days_of_week_in_month
--ceil(dayofmonth(date_key) / 7) as day_instance_in_month --used to identify 'floating' events such as "fourth thursday of november"
FROM gen_cte)
-- calculate the max number of days in each month for any week in year
, step_2 as (
select
year_week_key
, month_num
, max(step_1.days_of_week_in_month) as max_days_of_week_in_month
from step_1
group by year_week_key, month_num)
-- for any week with 2 actual month values, assign the month with the most number of days
, step_3 as (
select
year_week_key
, month_num
, row_number() over (partition by year_week_key order by max_days_of_week_in_month desc ) as month_rank
from step_2
qualify month_rank = 1
)
select
s1.date_key
, s1.day_of_month
, s1.week_num
, s1.day_of_week
, s1.day_name
, s3.month_num as assigned_month_num
, s1.month_num as actual_month_num
, s1.year_
from step_1 s1
left join step_3 s3
on s1.year_week_key = s3.year_week_key
;
-- select from your new date dimension table
select * from calendar;
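One way to sanity-check the result is to look at the week from the question's first example. This is a sketch that assumes the calendar table was built by the script above with week_start = 6 and that its date range covers these dates.

```sql
-- Spot check (sketch): the Saturday-to-Friday week of April 29 to May 5, 2023.
-- actual_month_num is the calendar month of each day; assigned_month_num is the
-- month chosen for the whole week by the majority-of-days rule.
select date_key
     , day_name
     , actual_month_num
     , assigned_month_num
from calendar
where date_key between to_date('2023-04-29') and to_date('2023-05-05')
order by date_key;
```

If the logic behaves as intended, all seven rows should share the same assigned_month_num, since only two of those days fall in April.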

prestosql get average from last 7 days for each day

The question I have is very similar to the question here, but I am using Presto SQL (on AWS Athena) and couldn't find information on loops in Presto.
To reiterate the issue, I want the query that:
Given table that contains: Day, Number of Items for this Day
I want: Day, Average Items for Last 7 Days before "Day"
So if I have a table that has data from Dec 25th to Jan 25th, my output table should have data from Jan 1st to Jan 25th. And for each day from Jan 1-25th, it will be the average number of items from last 7 days.
Is it possible to do this with presto?
Maybe you can try this one.
The calendar Common Table Expression (CTE) is used to generate the dates between two endpoints.
with calendar as (
select date_generated
from (
values (sequence(date'2021-12-25', date'2022-01-25', interval '1' day))
) as t1(date_array)
cross join unnest(date_array) as t2(date_generated)),
The temp CTE builds a date group for each date, containing that date and the 6 days before it (the last 7 days).
temp as (select c1.date_generated as date_groups
, format_datetime(c2.date_generated, 'yyyy-MM-dd') as dates
from calendar c1, calendar c2
where c2.date_generated between c1.date_generated - interval '6' day and c1.date_generated
and c1.date_generated >= date'2021-12-25' + interval '6' day)
Output for this part:
date_groups   dates
-------------------------
2022-01-01    2021-12-26
2022-01-01    2021-12-27
2022-01-01    2021-12-28
2022-01-01    2021-12-29
2022-01-01    2021-12-30
2022-01-01    2021-12-31
2022-01-01    2022-01-01
The last part joins the day column from your table to each date and then groups by the date group.
select temp.date_groups as day
, avg(your_table.num_of_items) avg_last_7_days
from your_table
join temp on your_table.day = temp.dates
group by 1
You want a running average (AVG OVER)
select
day, amount,
avg(amount) over (order by day rows between 6 preceding and current row) as avg_amount
from mytable
order by day
offset 6;
I tried many different variations of getting the "running average" (which I now know is what I was looking for thanks to Thorsten's answer), but couldn't get the output I wanted exactly with my other columns (that weren't included in my original question) in the table, but this ended up working:
SELECT day, <other columns>, avg(amount) OVER (
PARTITION BY <other columns>
ORDER BY date(day) ASC
ROWS 6 PRECEDING) as avg_7_days_amount FROM table ORDER BY date(day) ASC

Using Date to find the inequality for sales than 500

I want to find the daily average sales for December 1998, keeping the result only when it is not greater than 100, using a where clause. The table holds the date of each sale (something like 1 December 1998, across different days, months and years) and the amount due. First I define a particular month:
DEFINE a = TO_DATE('1-Dec-1998', 'DD-Month-YYYY')
SELECT SUBSTR(Sales_Date, 4,6), (SUM(Amount_Due)/EXTRACT(DAY FROM LAST_DAY(Sales_Date))
FROM ......
WHERE SUM(AMOUNT_DUE)/EXTRACT(DAY FROM LAST_DAY(&a)) < 100
I'm stuck as to extract the sum of amount due in the month of december 1998 for the where clause....
How can I achieve the objective?
To me, it looks like this:
select to_char(sales_date, 'mm.yyyy') month,
avg(amount_due) avg_value
from your_table
where sales_date >= trunc(date '1998-12-01', 'mm')
and sales_date < add_months(trunc(date '1998-12-01', 'mm'), 1)
group by to_char(sales_date, 'mm.yyyy')
having avg(amount_due) < 100;
The WHERE clause could be simplified, but it shows how to fetch a certain period:
trunc to mm returns the first day of that month
add_months applied to that value (the first day of the month) returns the first day of the next month
the bottom line: give me all rows whose sales_date is >= the first day of this month and < the first day of the next month; in other words, the whole month
Finally, the where clause you used should actually be the having clause.
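If "daily average" means the month's total divided by the number of days in that month, as the attempt in the question suggests, the same HAVING pattern can be applied to that expression instead of avg(amount_due). A sketch, assuming the same your_table, sales_date and amount_due names and that sales_date is a DATE column:

```sql
-- Sketch: monthly total divided by the number of calendar days in the month.
-- last_day() gives the final date of the month; extract(day ...) of it is the day count.
select to_char(trunc(sales_date, 'mm'), 'mm.yyyy') month,
       sum(amount_due) / extract(day from last_day(trunc(sales_date, 'mm'))) daily_avg
from your_table
where sales_date >= date '1998-12-01'
  and sales_date <  date '1999-01-01'
group by trunc(sales_date, 'mm')
having sum(amount_due) / extract(day from last_day(trunc(sales_date, 'mm'))) < 100;
```

Note that this divides by all 31 days of December, including days with no sales, whereas avg(amount_due) averages over individual sale rows.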
As long as the amount_due column only contains numbers, you can use the sum function.
The SQL queries below should satisfy your requirement.
Select SUM(Amount_Due) from Sales where Sales_Date between DATE '1998-12-01' and DATE '1998-12-31'
OR
Select SUM(Amount_Due) from Sales where TO_CHAR(Sales_Date, 'DD-MM-YYYY') like '%-12-1998'

SQL -- computing end dates from a given start date with arbitrary breaks

I have a table of 'semesters' of variable lengths with variable breaks in between them with a constraint such that a 'start_date' is always greater than the previous 'end_date':
id start_date end_date
-----------------------------
1 2012-10-01 2012-12-20
2 2013-01-05 2013-03-28
3 2013-04-05 2013-06-29
4 2013-07-10 2013-09-20
And a table of students as follows, where a start date may occur at any time within a given semester:
id start_date n_weeks
-------------------------
1 2012-11-15 25
2 2013-02-12 8
3 2013-03-02 12
I am attempting to compute an 'end_date' by joining the 'students' on 'semesters' which takes into account the variable-length breaks in-between semesters.
I can draw in the previous semester's end date (ie from the previous row's end_date) and by subtraction find the number of days in-between semesters using the following:
SELECT start_date
, end_date
, lag(end_date) OVER () AS prev_end_date
, start_date - lag(end_date) OVER () AS days_break
FROM terms
ORDER BY start_date;
Clearly, if there were to be only two terms, it would simply be a matter of adding the 'break' in days (perhaps, cast to 'weeks') -- and thereby extend the 'end_date' by that same period of time.
But should 'n_weeks' for a given student span more than one term, how could such a query be structured ?
Been banging my head against a wall for the last couple of days and I'd be immensely grateful for any help anyone would be able to offer....
Many thanks.
Rather than just looking at the lengths of semesters or the gaps between them, you could generate a list of all the dates that are within a semester using generate_series(), like this:
SELECT
row_number() OVER (ORDER BY day) as day_number,
day
FROM
(
SELECT
generate_series(start_date, end_date, '1 day') as day
FROM
semesters
) as day_series
ORDER BY
day
(SQLFiddle demo)
This assigns each day that is during a semester an arbitrary but sequential "day number", skipping out all the gaps between semesters.
You can then use this as a sub-query/CTE JOINed to your table of students: first find the "day number" of their start date, then add 7 * n_weeks to find the "day number" of their end date, and finally join back to find the actual date for that "day number".
This assumes that there is no special handling needed for partial weeks, i.e. if n_weeks is 4, the student must be enrolled for 28 days which are within the duration of a semester. The approach could be adapted to measure weeks (pass '1 week' as the last argument to generate_series()), with the additional step of finding which week the student's start_date falls into.
Here's a complete query (SQLFiddle demo here):
WITH semester_days AS
(
SELECT
semester_id,
row_number() OVER (ORDER BY day_date) as day_number,
day_date::date
FROM
(
SELECT
id as semester_id,
generate_series(start_date, end_date, '1 day') as day_date
FROM
semesters
) as day_series
ORDER BY
day_date
)
SELECT
S.id as student_id,
S.start_date,
SD_start.semester_id as start_semester_id,
S.n_weeks,
SD_end.day_date as end_date,
SD_end.semester_id as end_semester_id
FROM
students as S
JOIN
semester_days as SD_start
On SD_start.day_date = S.start_date
JOIN
semester_days as SD_end
On SD_end.day_number = SD_start.day_number + (7 * S.n_weeks)
ORDER BY
S.start_date

Calculate closest working day in Postgres

I need to schedule some items in a Postgres query based on a requested delivery date for an order. So for example, the order has a requested delivery on a Monday (20120319 for example), and the order needs to be prepared on the prior working day (20120316).
Thoughts on the most direct method? I'm open to adding a dates table. I'm thinking there's got to be a better way than a long set of case statements using:
SELECT EXTRACT(DOW FROM TIMESTAMP '2001-02-16 20:38:40');
This gets you previous business day.
SELECT
CASE (EXTRACT(ISODOW FROM current_date)::integer) % 7
WHEN 1 THEN current_date-3
WHEN 0 THEN current_date-2
ELSE current_date-1
END AS previous_business_day
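The same CASE can be applied to an arbitrary delivery date rather than current_date. The sketch below runs it over the week around the date from the question so the weekend handling can be inspected; the generated series and the column aliases are illustrative assumptions, and holidays are not considered.

```sql
-- Sketch: previous business day for each day from Sat 2012-03-17 to Fri 2012-03-23.
-- isodow is 1 (Monday) .. 7 (Sunday); % 7 turns Sunday into 0, as in the query above.
SELECT d::date AS delivery_day
     , to_char(d, 'Dy') AS delivery_dow
     , d::date - CASE extract(isodow FROM d)::integer % 7
                   WHEN 1 THEN 3   -- Monday: go back to Friday
                   WHEN 0 THEN 2   -- Sunday: go back to Friday
                   ELSE 1          -- Tuesday to Saturday: go back one day
                 END AS previous_business_day
FROM generate_series(timestamp '2012-03-17', timestamp '2012-03-23', interval '1 day') AS g(d);
```

For the Monday delivery 2012-03-19 this yields 2012-03-16, matching the example in the question.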
To have the previous work day:
select max(s.a) as work_day
from (
select s.a::date
from generate_series('2012-01-02'::date, '2050-12-31', '1 day') s(a)
where extract(dow from s.a) between 1 and 5
except
select holiday_date
from holiday_table
) s
where s.a < '2012-03-19'
;
If you want the next work day just invert the query.
SELECT y.d AS prep_day
FROM (
SELECT generate_series(dday - 8, dday - 1, interval '1d')::date AS d
FROM (SELECT '2012-03-19'::date AS dday) x
) y
LEFT JOIN holiday h USING (d)
WHERE h.d IS NULL
AND extract(isodow from y.d) < 6
ORDER BY y.d DESC
LIMIT 1;
It should be faster to generate only as many days as necessary. I generate one week prior to the delivery. That should cover all possibilities.
isodow as extract parameter is more convenient than dow to test for workdays.
min() / max(), ORDER BY / LIMIT 1, that's a matter of taste with the few rows in my query.
To get several candidate days in descending order, not just the top pick, change the LIMIT 1.
I put the dday (delivery day) in a subquery so you only have to input it once. You can enter any date or timestamp literal. It is cast to date either way.
CREATE TABLE Holidays (Holiday, PrecedingBusinessDay) AS VALUES
('2012-12-25'::DATE, '2012-12-24'::DATE),
('2012-12-26'::DATE, '2012-12-24'::DATE);
SELECT Day, COALESCE(PrecedingBusinessDay, PrecedingMondayToFriday)
FROM
(SELECT Day, Day - CASE DATE_PART('DOW', Day)
WHEN 0 THEN 2
WHEN 1 THEN 3
ELSE 1
END AS PrecedingMondayToFriday
FROM TestDays) AS PrecedingMondaysToFridays
LEFT JOIN Holidays ON PrecedingMondayToFriday = Holiday;
You might want to rename some of the identifiers :-).