PostgreSQL: How to select the nearest date that is not null - sql

I have a date, and I want to find all past records that have the same month and day.
The problem occurs when that date does not exist in a given year, for example 29 February.
In that case, my goal is to get the nearest date before the date that does not exist.
This is my current query, using the date 2012-02-29:
SELECT date, amount
FROM table_name
WHERE
EXTRACT(MONTH FROM date) = EXTRACT(MONTH FROM DATE('2012-02-29') )
AND EXTRACT(DAY FROM date) = EXTRACT(DAY FROM DATE('2012-02-29') )
AND date < '2012-02-29'
ORDER BY date DESC LIMIT 10;

If I understand correctly, you want one date per year with the property that that day is nearest to the given date.
I would suggest using distinct on:
select distinct on (date_trunc('year', date)) t.*
from table_name t
order by date_trunc('year', date),
abs(date_part('day', (date -
(date '2012-02-29' -
(extract(year from date '2012-02-29') - extract(year from date)) * interval '1 year'
)
)
)
);
EDIT:
An example of working code:
select distinct on (date_trunc('year', date)) t.*
from table_name t
order by date_trunc('year', date),
abs(date_part('day', date - (date '2012-02-29' -
((extract(year from date '2012-02-29') - extract(year from date)) * interval '1 year')
)
))
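The ordering expression above can be checked outside the database. The following is a minimal Python sketch, not the answer's own code: the helper names shift_to_year and nearest_per_year and the sample rows are hypothetical, and clamping 29 February to 28 February is an assumption meant to mirror how PostgreSQL subtracts a year interval from a leap day.

```python
from datetime import date


def shift_to_year(target: date, year: int) -> date:
    """Move target into the given year, clamping Feb 29 to Feb 28 in non-leap years."""
    try:
        return target.replace(year=year)
    except ValueError:
        # Feb 29 does not exist in that year (assumed to match PostgreSQL's clamping).
        return target.replace(year=year, day=28)


def nearest_per_year(dates, target):
    """For every year present in dates, keep the date closest to target's anniversary."""
    best = {}
    for d in dates:
        gap = abs((d - shift_to_year(target, d.year)).days)
        if d.year not in best or gap < best[d.year][0]:
            best[d.year] = (gap, d)
    return {year: d for year, (gap, d) in sorted(best.items())}


# Hypothetical sample data standing in for table_name.date.
rows = [
    date(2010, 2, 27),
    date(2010, 3, 5),
    date(2011, 2, 28),
    date(2011, 3, 1),
    date(2012, 2, 29),
]
result = nearest_per_year(rows, date(2012, 2, 29))
```

Like distinct on, the sketch keeps one row per year. It does not apply the question's extra filter date < '2012-02-29'.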

Related

Rewrite PostgreSQL query using CTE:

I have the following code to pull records from a daterange in PostgreSQL, and it works as intended. The "end date" is determined by the "date" column from the last record, and the "start date" is calculated by subtracting a 7-day interval from the "end date".
SELECT date
FROM files
WHERE daterange((
(SELECT date FROM files ORDER BY date DESC LIMIT 1) - interval '7 day')::date, -- "start date"
(SELECT date FROM files ORDER BY date DESC LIMIT 1)::date, -- "end date"
'(]') @> date::date
ORDER BY date ASC
I'm trying to rewrite this query using CTEs, so I can replace those subqueries with values such as end_date and start_date. Is this possible using this method or should I look for other alternatives like variables? I'm still learning SQL.
WITH end_date AS
(
SELECT date FROM files ORDER BY date DESC LIMIT 1
),
start_date AS
(
SELECT date FROM end_date - INTERVAL '7 day'
)
SELECT date
FROM files
WHERE daterange(
start_date::date,
end_date::date,
'(]') @> date::date
ORDER BY date ASC
Right now I'm getting the following error:
ERROR: syntax error at or near "-"
LINE 7: SELECT date FROM end_date - INTERVAL '7 day'
You do not need two CTEs; one is enough, and it can be joined to filter the data.
WITH RECURSIVE files AS (
SELECT CURRENT_DATE date, 1 some_value
UNION ALL
SELECT (date + interval '1 day')::date, some_value + 1 FROM files
WHERE date < (CURRENT_DATE + interval '1 month')::date
),
dates AS (
SELECT
(MAX(date) - interval '7 day')::date from_date,
MAX(date) to_date
FROM files
)
SELECT f.* FROM files f
JOIN dates d ON daterange(d.from_date, d.to_date, '(]') @> f.date
You can even build the daterange directly in the CTE and use it later, like this:
WITH dates AS (
SELECT
daterange((MAX(date) - interval '7 day')::date, MAX(date), '(]') range
FROM files
)
SELECT f.* FROM files f
JOIN dates d ON d.range @> f.date
In the first query, the recursive files CTE is used only to generate some sample data.
The query returns all file rows for dates in the last week, excluding from_date and including to_date.
date       | some_value
-----------------------
2022-09-26 | 25
2022-09-27 | 26
2022-09-28 | 27
2022-09-29 | 28
2022-09-30 | 29
2022-10-01 | 30
2022-10-02 | 31
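The '(]' bounds mean the start date is excluded and the end date is included. Here is a small Python sketch of the same filter, assuming a plain list of dates stands in for the files table; the function name last_week is hypothetical.

```python
from datetime import date, timedelta


def last_week(dates):
    """Keep dates in the half-open window (max - 7 days, max]."""
    end = max(dates)
    start = end - timedelta(days=7)
    # start < d excludes the lower bound, d <= end includes the upper bound.
    return sorted(d for d in dates if start < d <= end)


# Hypothetical sample data: 2022-09-20 through 2022-10-02.
files = [date(2022, 9, 20) + timedelta(days=i) for i in range(13)]
window = last_week(files)
```

With this sample data the window runs from 2022-09-26 to 2022-10-02, the same seven dates as in the table above.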
I think this is what you want:
WITH end_date AS
(
SELECT date FROM files ORDER BY date DESC LIMIT 1
),
start_date AS
(
SELECT date - INTERVAL '7 day' as date
FROM end_date
)
SELECT F.date, S.date startDate, E.date endDate
FROM files F
JOIN start_date S on F.date >= S.date
JOIN end_date E on F.date <= E.date
ORDER BY date ASC;
I hope I'm not repeating anything, but if I understand your problem correctly I think this will work:
with cte as (
select max (date)::date as max_date from files
)
select date
from files
cross join cte
where date >= max_date - 7
Or perhaps even:
select date
from files
where date >= (select max (date)::date - 7 from files)
Since you have already determined that the CTE has the max date, there is really no need to further bound it with a between, <= or range. You can simply say anything after that date minus 7 days.
The error in your code above is because you want this:
SELECT date - INTERVAL '7 day' as date FROM end_date
And not this:
SELECT date FROM end_date - INTERVAL '7 day'
You are subtracting from the table, which doesn't make sense.

create table with dates - sql

I have a query that can create a table with dates like below:
with digit as (
select 0 as d union all
select 1 union all select 2 union all select 3 union all
select 4 union all select 5 union all select 6 union all
select 7 union all select 8 union all select 9
),
seq as (
select a.d + (10 * b.d) + (100 * c.d) + (1000 * d.d) as num
from digit a
cross join
digit b
cross join
digit c
cross join
digit d
order by 1
)
select (last_day(sysdate)::date - seq.num)::date as "Date"
from seq;
How could this be changed to generate only the dates of the current month?
Thanks
WITH dates AS (
SELECT
date_trunc('month', CURRENT_DATE) AS first_day_of_month,
date_trunc('month', CURRENT_DATE) + interval '1 month -1 day' AS last_day_of_month
)
SELECT
generate_series(first_day_of_month, last_day_of_month, interval '1 day')::date
FROM dates
date_trunc() truncates a date (or timestamp) to a given date part. date_trunc('month', ...) keeps the year and month and sets all other parts to their lowest possible values, so the day part is set to 1. That is why this gives you the first day of the month.
Adding a month returns the first of the next month; subtracting a day from that gives the last day of the current month.
Finally, you can generate a date series between the start and end dates using the generate_series() function.
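The same three steps (truncate to the first of the month, jump to the first of the next month, step back one day) can be sketched in Python to check the arithmetic. The function name month_days is hypothetical, and the fixed date is only sample input in place of CURRENT_DATE.

```python
from datetime import date, timedelta


def month_days(today: date):
    """Return every date in the month that contains today."""
    first = today.replace(day=1)  # like date_trunc('month', ...)
    # Jump past the end of the month, then snap back to day 1 of the next month.
    next_first = (first + timedelta(days=32)).replace(day=1)
    last = next_first - timedelta(days=1)  # like "+ 1 month - 1 day"
    return [first + timedelta(days=i) for i in range((last - first).days + 1)]


days = month_days(date(2024, 2, 15))
```

February 2024 is used here because it is a leap-year month, which makes a wrong month length easy to spot.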
Edit: Redshift supports generate_series() with integer arguments but not with date or timestamp arguments. So we need to generate an integer series instead and add each value to the first of the month:
WITH dates AS (
SELECT
date_trunc('month', CURRENT_DATE) AS first_day_of_month,
date_trunc('month', CURRENT_DATE) + interval '1 month -1 day' AS last_day_of_month
)
SELECT
first_day_of_month::date + gs
FROM
dates,
generate_series(
date_part('day', first_day_of_month)::int - 1,
date_part('day', last_day_of_month)::int - 1
) as gs
This answers the original version of the question.
You would use generate_series():
select gs.dte
from generate_series(date_trunc('month', now()::date),
date_trunc('month', now()::date) + interval '1 month' - interval '1 day',
interval '1 day'
) gs(dte);

PostgreSQL generate month and year series based on table field and fill with nulls if no data for a given month

I want to generate a series of months and years, starting from next month (say, start_month) and covering 12 months from start_month, along with the corresponding data from another table in PostgreSQL (or nulls if there is none).
SELECT ( ( DATE '2019-03-01' + ( interval '1' month * generate_series(0, 11) ) )
:: DATE ) dd,
extract(year FROM ( DATE '2019-03-01' + ( interval '1' month *
generate_series(0, 11) )
)),
coalesce(SUM(price), 0)
FROM items
WHERE date_added >= '2019-03-01'
AND date_added < '2020-03-01'
AND item_type_id = 3
GROUP BY 1,
2
ORDER BY 2;
The problem with the above query is that it is giving me the same value for price for all the months. The requirement is that the price column be filled with nulls or zeros if no price data is available for a given month.
Put the generate_series() in the FROM clause. You are summarizing the data -- i.e. calculating the price over the entire range -- and then projecting this on all months. Instead:
SELECT gs.yyyymm,
coalesce(SUM(i.price), 0)
FROM generate_series('2019-03-01'::date, '2020-02-01', INTERVAL '1 MONTH'
) gs(yyyymm) LEFT JOIN
items i
ON gs.yyyymm = DATE_TRUNC('month', i.date_added) AND
i.item_type_id = 3
GROUP BY gs.yyyymm
ORDER BY gs.yyyymm;
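The point of the LEFT JOIN is that the month series drives the output, so months without items still appear with a total of 0. Below is a rough Python sketch of that shape; the names month_series and monthly_totals and the sample items are hypothetical, and the item_type_id filter is left out for brevity.

```python
from collections import defaultdict
from datetime import date


def month_series(start: date, count: int):
    """Return the first day of count consecutive months, beginning with start's month."""
    months = []
    year, month = start.year, start.month
    for _ in range(count):
        months.append(date(year, month, 1))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return months


def monthly_totals(items, start, count=12):
    """Sum prices per month, then report every month in the series, 0 where empty."""
    totals = defaultdict(int)
    for added, price in items:
        totals[added.replace(day=1)] += price  # like DATE_TRUNC('month', date_added)
    # Iterating over the series (not over totals) is what the LEFT JOIN does.
    return [(m, totals.get(m, 0)) for m in month_series(start, count)]


# Hypothetical (date_added, price) rows.
items = [(date(2019, 3, 5), 10), (date(2019, 3, 20), 5), (date(2019, 7, 1), 7)]
report = monthly_totals(items, date(2019, 3, 1))
```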
You want generate_series in the FROM clause and join with it, somewhat like
SELECT months.m::date, ...
FROM generate_series(
start_month,
start_month + INTERVAL '11 months',
INTERVAL '1 month'
) AS months(m)
LEFT JOIN items
ON months.m::date = items.date_added

Postgres - Cohort analysis across months sequentially, not if exists in any later month

I'm doing a cohort analysis and can get the group of users to examine, then see whether they transacted in the months following on. But I want it like this:
Of that group in December, who transacted in Jan; of the Jan group from Dec, who transacted in Feb. Basically I'm tracking the decay of the customer base.
What I don't want is those that return in any month following Dec, which is this:
WITH start_sample AS (
SELECT
user_fk,
created_at AS start_sample_date
FROM transactions
WHERE created_at >= '2016-11-01' AND created_at < '2016-12-01'
GROUP BY user_fk,
start_sample_date),
start_sample_min AS (
SELECT
user_fk,
MIN(start_sample_date) AS first_transaction
FROM start_sample
GROUP BY user_fk
)
SELECT
DATE_TRUNC('month', created_at) AS transacting_month,
COUNT(DISTINCT user_fk)
FROM transactions
WHERE created_at >= '2016-11-01'
AND user_fk IN (SELECT user_fk FROM start_sample_min)
GROUP BY transacting_month
ORDER BY transacting_month;
Then I made a churn model to see if it would get what I need, but it doesn't:
WITH monthly_users AS (
SELECT
user_fk AS monthly_user_fk,
DATE_TRUNC('month', created_at) AS month
FROM transactions
WHERE created_at >= '2016-11-01' AND created_at < '2017-12-01'
GROUP BY monthly_user_fk, month
ORDER BY monthly_user_fk, month
),
lag_lead AS (
SELECT
monthly_user_fk,
month,
LAG(month,1) OVER (PARTITION BY monthly_user_fk ORDER BY month) AS lag,
LEAD(month,1) OVER (PARTITION BY monthly_user_fk ORDER BY month) AS lead
FROM monthly_users),
lag_lead_with_diffs AS (
SELECT
monthly_user_fk,
month,
lag AS previous_month,
lead AS next_month,
EXTRACT(EPOCH FROM (month - lag)/86400)::INT AS lag_size,
EXTRACT(EPOCH FROM (lead - month)/86400)::INT AS lead_size
FROM lag_lead
),
calculated AS (
SELECT
month,
CASE WHEN previous_month IS NULL THEN 'ACTIVATION'
WHEN lag_size <= 31 THEN 'ACTIVE'
WHEN lag_size > 31 THEN 'RETURN' END AS this_month_values,
CASE WHEN (lead_size > 31 OR lead_size IS NULL) THEN 'CHURN' ELSE NULL END AS next_month_churn,
COUNT(DISTINCT monthly_user_fk) AS c_d_users
FROM lag_lead_with_diffs
GROUP BY month, 2, 3
)
SELECT
month,
this_month_values,
SUM(c_d_users) AS distinct_users
FROM calculated
GROUP BY month, this_month_values
UNION
SELECT month + INTERVAL '1 month',
'CHURN',
SUM(c_d_users)
FROM calculated
WHERE next_month_churn IS NOT NULL
GROUP BY month + INTERVAL '1 month', 2
HAVING (EXTRACT(EPOCH FROM (month + INTERVAL '1 month'))) < 1512086400
ORDER BY month, this_month_values;
However, this is not fixed to the initial group: the Active group rolls from month to month.
I understand that the above is likely more complicated than what I'm asking for, but I can't seem to get my head around it.
Thanks in advance
Perhaps this is what you are looking for:
with Monthly_Users as (
select user_fk
, date_trunc('month',created_at) as month
, (date_part('year', created_at) - 2016) * 12
+ date_part('month', created_at) - 11 as Months_Between
from transactions
where created_at between date '2016-11-01'
and date '2017-12-01'
group by user_fk, month, months_between
), t2 as (
select Monthly_Users.*
, count(*) over (partition by user_fk
order by month rows between unbounded preceding
and 1 preceding) prev_rec_cnt
from Monthly_Users
)
select month
, count(*)
from t2
where Months_Between = Prev_Rec_Cnt
group by month
order by month;
In this query the Monthly_Users CTE is just like yours, but it adds a computation of the number of Months_Between the created_at date and your initial starting date. In the second common table expression, I count the number of occurrences of each user_fk prior to the current month's record. Finally, in the output query I limit the results to only those records where the Months_Between value matches the Prev_Rec_Cnt value. Any missed month causes the Prev_Rec_Cnt value to fall behind the Months_Between value, so you'll be able to see the fall-off of user_fk values from month to month.
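The matching rule can be tried on a toy data set. This is a minimal Python sketch of the idea, not a translation of the query: the function name retention and the sample activity are hypothetical, and each user is assumed to be reduced to the set of month offsets (0 for the starting month) in which they transacted.

```python
from collections import Counter


def retention(months_by_user):
    """Count, per month offset, the users active in every month from offset 0 up to it."""
    counts = Counter()
    for offsets in months_by_user.values():
        # prev_count plays the role of Prev_Rec_Cnt: rows seen before the current one.
        for prev_count, offset in enumerate(sorted(offsets)):
            if offset == prev_count:  # Months_Between = Prev_Rec_Cnt
                counts[offset] += 1
    return dict(sorted(counts.items()))


# Hypothetical activity: user -> month offsets with at least one transaction.
activity = {
    "a": {0, 1, 2},  # never misses a month
    "b": {0, 2},     # skips month 1, so month 2 no longer counts
    "c": {0, 1, 3},  # drops out after month 1
    "d": {1, 2},     # not in the starting cohort at all
}
survivors = retention(activity)
```

Once a user misses a month, the offset runs ahead of the count of earlier rows and never catches up, so that user stays excluded from every later month.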

update table with dates with month

There's a table dates_calendar:
id | date
-------------------------
13 | 2016-10-23 00:00:00
14 | 2016-10-24 00:00:00
I need to update this table and insert dates until the next month counting from the last date in the table. E.g. last date is 2016-10-24 00:00:00 - I need to insert dates till 2016-10-31. After that (the last date now is 2016-10-31) next statement call should insert dates till 2016-11-30 and so on.
Here is my current SQL, but it always inserts 30 days:
INSERT INTO dates_calendar (date)
VALUES (
generate_series(
(SELECT date FROM dates_calendar ORDER BY date DESC LIMIT 1) + interval '1 day',
(SELECT date FROM dates_calendar ORDER BY date DESC LIMIT 1) + interval '1 month',
'1 day'
)
);
I'm using PostgreSQL. It would also be nice to get rid of the duplicated SELECT statement for the last date.
insert into dates_calendar (date)
select dates::date
from (
select max(date)::date+ 1 next_day, '1day'::interval one_day, '1month'::interval one_month
from dates_calendar
) s,
generate_series(
next_day,
date_trunc('month', next_day)+ one_month- one_day,
one_day) dates;
To calculate the first and last date you need to insert, you can use this query:
select max(date) + interval '1' day as first_day,
date_trunc('month', max(date) + interval '1' day) + interval '1' month - interval '1' day as last_day
from dates_calendar
The expression date_trunc('month', max(date) + interval '1' day) calculates the start of the month that contains the first date to insert. Adding one month and subtracting one day gives the last day of that month. Truncating after adding the day matters when the last stored date is already a month end such as 2016-10-31: the range then covers all of November instead of being empty.
This can then be used to generate the list of dates:
with from_to (first_day, last_day) as (
select max(date) + interval '1' day,
date_trunc('month', max(date) + interval '1' day) + interval '1' month - interval '1' day
from dates_calendar
)
select dt
from generate_series( (select first_day from from_to), (select last_day from from_to), interval '1' day) as t(dt);
And finally this can be used to insert the generated rows into the table:
with from_to (first_day, last_day) as (
select max(date) + interval '1' day,
date_trunc('month', max(date) + interval '1' day) + interval '1' month - interval '1' day
from dates_calendar
)
insert into dates_calendar (date)
select dt
from generate_series( (select first_day from from_to), (select last_day from from_to), interval '1' day) as t(dt);
with max_date (d) as (select max(date)::date from dates_calendar)
insert into dates_calendar (date)
select d
from generate_series (
(select d from max_date) + 1,
(select (date_trunc('month', d + 1) + interval '1 month')::date - 1 from max_date),
'1 day'
) g(d);
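The requirement from the question is that each call fills the calendar up to the end of the month containing the day after the last stored date. Here is a short Python sketch of that rule, useful for checking the month-end case; the function name next_batch is hypothetical.

```python
from datetime import date, timedelta


def next_batch(last: date):
    """Dates from the day after last through the end of that day's month."""
    first = last + timedelta(days=1)
    # First day of the month after the one containing first.
    next_month_start = (first.replace(day=1) + timedelta(days=32)).replace(day=1)
    end = next_month_start - timedelta(days=1)
    return [first + timedelta(days=i) for i in range((end - first).days + 1)]


mid_month = next_batch(date(2016, 10, 24))
month_end = next_batch(date(2016, 10, 31))
```

Starting from 2016-10-24 this gives the rest of October, and starting from 2016-10-31 it gives all of November, matching the two cases described in the question.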