Customizing the range of a week with date_trunc - sql

I've been trying for hours now to write a date_trunc statement to be used in a group by where my week starts on a Friday and ends the following Thursday.
So something like
SELECT
DATE_TRUNC(...) sales_week,
SUM(sales) sales
FROM table
GROUP BY 1
ORDER BY 1 DESC
Which would return the results for the last complete week (by those standards) as 09-13-2019.

You can subtract 4 days and then add 4 days:
SELECT DATE_TRUNC(<whatever> - INTERVAL '4 DAY') + INTERVAL '4 DAY' as sales_week,
SUM(sales) as sales
FROM table
GROUP BY 1
ORDER BY 1 DESC

The expression
select current_date - cast(cast(7 - (5 - extract(dow from current_date)) as text) || ' days' as interval);
should always give you the previous Friday's date.

if by any chance you might have gaps in data (maybe more granular breakdowns vs just per week), you can generate a set of custom weeks and left join to that:
drop table if exists sales_weeks;
create table sales_weeks as
with
dates as (
select generate_series('2019-01-01'::date,current_date,interval '1 day')::date as date
)
,week_ids as (
select
date
,sum(case when extract('dow' from date)=5 then 1 else 0 end) over (order by date) as week_id
from dates
)
select
week_id
,min(date) as week_start_date
,max(date) as week_end_date
from week_ids
group by 1
order by 1
;

Related

create table with dates - sql

I have a query that can create a table with dates like below:
with digit as (
select 0 as d union all
select 1 union all select 2 union all select 3 union all
select 4 union all select 5 union all select 6 union all
select 7 union all select 8 union all select 9
),
seq as (
select a.d + (10 * b.d) + (100 * c.d) + (1000 * d.d) as num
from digit a
cross join
digit b
cross join
digit c
cross join
digit d
order by 1
)
select (last_day(sysdate)::date - seq.num)::date as "Date"
from seq;
How could this be changed to generate only dates
Thanks
demo:db<>fiddle
WITH dates AS (
SELECT
date_trunc('month', CURRENT_DATE) AS first_day_of_month,
date_trunc('month', CURRENT_DATE) + interval '1 month -1 day' AS last_day_of_month
)
SELECT
generate_series(first_day_of_month, last_day_of_month, interval '1 day')::date
FROM dates
date_trunc() truncates a type date (or timestamp) to a certain date part. date_trunc('month', ...) removes all parts but year and month. All other parts are set to their lowest possible values. So, the day part is set to 1. That's why you get the first day of month with this.
adding a month returns the first of the next month, subtracting a day from this results in the last day of the current month.
Finally you can generate a date series with start and end date using the generate_series() function
Edit: Redshift does not support generate_series() with type date and timestamp but with integer. So, we need to create an integer series instead and adding the results to the first of the month:
db<>fiddle
WITH dates AS (
SELECT
date_trunc('month', CURRENT_DATE) AS first_day_of_month,
date_trunc('month', CURRENT_DATE) + interval '1 month -1 day' AS last_day_of_month
)
SELECT
first_day_of_month::date + gs
FROM
dates,
generate_series(
date_part('day', first_day_of_month)::int - 1,
date_part('day', last_day_of_month)::int - 1
) as gs
This answers the original version of the question.
You would use generate_series():
select gs.dte
from generate_series(date_trunc('month', now()::date),
date_trunc('month', now()::date) + interval '1 month' - interval '1 day',
interval '1 day'
) gs(dte);
Here is a db<>fiddle.

how can i use unnest generate date array to provide end of month index?

i want to be able to create a fake index for my data so e.g. if i have an single order i want it repeated for every date in the array created below.
select
*
from
database.data,
UNNEST(GENERATE_DATE_ARRAY(
'2014-01-01',
(SELECT
MAX(Order_Date)
FROM
database.data), INTERVAL 1 MONTH)) AS month
however this creates an index of the 1st of each month, how can i change this so it's the end of every month? e.g. 2014-01-31, and 1 month interval, onwards
You can use date arithmetics:
select d.*, date_sub(date_add(dt, 1, interval 1 month), interval 1 day)
from database.data d
cross join unnest(
generate_date_array('2014-01-01', (select max(order_date) from database.data), interval 1 month)
) as dt
As of 10/14/2020, a new function LAST_DAY is released to do this in one stop:
SELECT LAST_DAY(DATE '2008-11-25', MONTH) AS last_day

PostgreSQL generate month and year series based on table field and fill with nulls if no data for a given month

I want to generate series of month and year from the next month of current year(say, start_month) to 12 months from start_month along with the corresponding data (if any, else return nulls) from another table in PostgreSQL.
SELECT ( ( DATE '2019-03-01' + ( interval '1' month * generate_series(0, 11) ) )
:: DATE ) dd,
extract(year FROM ( DATE '2019-03-01' + ( interval '1' month *
generate_series(0, 11) )
)),
coalesce(SUM(price), 0)
FROM items
WHERE s.date_added >= '2019-03-01'
AND s.date_added < '2020-03-01'
AND item_type_id = 3
GROUP BY 1,
2
ORDER BY 2;
The problem with the above query is that it is giving me the same value for price for all the months. The requirement is that the price column be filled with nulls or zeros if no price data is available for a given month.
Put the generate_series() in the FROM clause. You are summarizing the data -- i.e. calculating the price over the entire range -- and then projecting this on all months. Instead:
SELECT gs.yyyymm,
coalesce(SUM(i.price), 0)
FROM generate_series('2019-03-01'::date, '2020-02-01', INTERVAL '1 MONTH'
) gs(yyyymm) LEFT JOIN
items i
ON gs.yyyymm = DATE_TRUNC('month', s.date_added) AND
i.item_type_id = 3
GROUP BY gs.yyyymm
ORDER BY gs.yyyymm;
You want generate_series in the FROM clause and join with it, somewhat like
SELECT months.m::date, ...
FROM generate_series(
start_month,
start_month + INTERVAL '11 months',
INTERVAL '1 month'
) AS months(m)
LEFT JOIN items
ON months.m::date = items.date_added

Postgres - Cohort analysis across months sequentially, not if exists in any later month

I'm doing a cohort analysis and can get the group of users to examine, then see whether they transacted in the months following on. But I want it like this:
Of that group in December, who transacted in Jan; of the Jan group from Dec, who transacted in Feb. Basically i'm tracking decay of the customer base
What I don't want is those that return in any month following Dec, which is this:
WITH start_sample AS (
SELECT
user_fk,
created_at AS start_sample_date
FROM transactions
WHERE created_at >= '2016-11-01' AND created_at < '2016-12-01'
GROUP BY user_fk,
start_sample_date),
start_sample_min AS (
SELECT
user_fk,
MIN(start_sample_date) AS first_transaction
FROM start_sample
GROUP BY user_fk
)
SELECT
DATE_TRUNC('month', created_at) AS transacting_month,
COUNT(DISTINCT user_fk)
FROM transactions
WHERE created_at >= '2016-11-01'
AND t.user_fk IN(SELECT user_fk FROM start_sample_min)
GROUP BY transacting_month
ORDER BY transacting_month;
Then I made a churn model to see if it would get what I need, but it doesn't:
WITH monthly_users AS (
SELECT
user_fk AS monthly_user_fk,
DATE_TRUNC('month', created_at) AS month
FROM transactions
WHERE created_at >= '2016-11-01' AND created_at < '2017-12-01'
GROUP BY monthly_user_fk, month
ORDER BY monthly_user_fk, month
),
lag_lead AS (
SELECT
monthly_user_fk,
month,
LAG(month,1) OVER (PARTITION BY monthly_user_fk ORDER BY month) AS lag,
LEAD(month,1) OVER (PARTITION BY monthly_user_fk ORDER BY month) AS lead
FROM monthly_users),
lag_lead_with_diffs AS (
SELECT
monthly_user_fk,
month,
lag AS previous_month,
lead AS next_month,
EXTRACT(EPOCH FROM (month - lag)/86400)::INT AS lag_size,
EXTRACT(EPOCH FROM (lead - month)/86400)::INT AS lead_size
FROM lag_lead
),
calculated AS (
SELECT
month,
CASE WHEN previous_month IS NULL THEN 'ACTIVATION'
WHEN lag_size <= 31 THEN 'ACTIVE'
WHEN lag_size > 31 THEN 'RETURN' END AS this_month_values,
CASE WHEN (lead_size > 31 OR lead_size IS NULL) THEN 'CHURN' ELSE NULL END AS next_month_churn,
COUNT(DISTINCT monthly_user_fk) AS c_d_users
FROM lag_lead_with_diffs
GROUP BY month, 2, 3
)
SELECT
month,
this_month_values,
SUM(c_d_users) AS distinct_users
FROM calculated
GROUP BY month, this_month_values
UNION
SELECT month + INTERVAL '1 month',
'CHURN',
SUM(c_d_users)
FROM calculated
WHERE next_month_churn IS NOT NULL
GROUP BY month + INTERVAL '1 month', 2
HAVING (EXTRACT(EPOCH FROM (month + INTERVAL '1 month'))) < 1512086400
ORDER BY month, this_month_values;
However this is not fixed at the initial group. The Active group rolls from month to month.
I understand that the above is likely more complicated than what i'm asking, but I can't seem to get my head around it
Thanks in advance
Perhaps this is what you are looking for:
with Monthly_Users as (
select user_fk
, date_trunc('month',created_at) as month
, (date_part('year', created_at) - 2016) * 12
+ date_part('month', created_at) - 11 as Months_Between
from transactions
where created_at between date '2016-11-01'
and date '2017-12-01'
group by user_fk, month, months_between
), t2 as (
select Monthly_Users.*
, count(*) over (partition by user_fk
order by month rows between unbounded preceding
and 1 preceding) prev_rec_cnt
from Monthly_Users
)
select month
, count(*)
from t2
where Months_Between = Prev_Rec_Cnt
group by month
order by month;
In this query the Monthly_Users CTE is just like yours, but adds a computation of the number of Months_Between the created_at date and your initial starting date. In the second Common Table Expression, I count the number of occurrences of each user_fk prior to the current months record. Finally in the output query I limit the results to only those records where the Months_Between value matches the Prev_Rec_Cnt value. Any missed months will cause the Prev_Rec_Cnt value to not match the Months_Between value, so you'll be able to see the fall off of user_fk values from month to month.

Daily average for the month (needs number of days in month)

I have a table as follow:
CREATE TABLE counts
(
T TIMESTAMP NOT NULL,
C INTEGER NOT NULL
);
I create the following views from it:
CREATE VIEW micounts AS
SELECT DATE_TRUNC('minute',t) AS t,SUM(c) AS c FROM counts GROUP BY 1;
CREATE VIEW hrcounts AS
SELECT DATE_TRUNC('hour',t) AS t,SUM(c) AS c,SUM(c)/60 AS a
FROM micounts GROUP BY 1;
CREATE VIEW dycounts AS
SELECT DATE_TRUNC('day',t) AS t,SUM(c) AS c,SUM(c)/24 AS a
FROM hrcounts GROUP BY 1;
The problem now comes in when I want to create the monthly counts to know what to divide the daily sums by to get the average column a i.e. the number of days in the specific month.
I know to get the days in PostgreSQL you can do:
SELECT DATE_PART('days',DATE_TRUNC('month',now())+'1 MONTH'::INTERVAL-DATE_TRUNC('month',now()))
But I can't use now(), I have to somehow let it know what the month is when the grouping gets done. Any suggestions i.e. what should replace ??? in this view:
CREATE VIEW mocounts AS
SELECT DATE_TRUNC('month',t) AS t,SUM(c) AS c,SUM(c)/(???) AS a
FROM dycounts
GROUP BY 1;
A bit shorter and faster and you get the number of days instead of an interval:
SELECT EXTRACT(day FROM date_trunc('month', now()) + interval '1 month'
- interval '1 day')
It's possible to combine multiple units in a single interval value . So we can use '1 mon - 1 day':
SELECT EXTRACT(day FROM date_trunc('month', now()) + interval '1 mon - 1 day')
(mon, month or months work all the same for month units.)
To divide the daily sum by the number of days in the current month (orig. question):
SELECT t::date AS the_date
, SUM(c) AS c
, SUM(c) / EXTRACT(day FROM date_trunc('month', t::date)
+ interval '1 mon - 1 day') AS a
FROM dycounts
GROUP BY 1;
To divide monthly sum by the number of days in the current month (updated question):
SELECT DATE_TRUNC('month', t)::date AS t
,SUM(c) AS c
,SUM(c) / EXTRACT(day FROM date_trunc('month', t)::date
+ interval '1 mon - 1 day') AS a
FROM dycounts
GROUP BY 1;
You have to repeat the GROUP BY expression if you want to use a single query level.
Or use a subquery:
SELECT *, c / EXTRACT(day FROM t + interval '1 mon - 1 day') AS a
FROM (
SELECT date_trunc('month', t)::date AS t, SUM(c) AS c
FROM dycounts
GROUP BY 1
) sub;