Redshift - Adding dates (month interval) between two dates - sql

Using Amazon Redshift.
Also have a dates table with all calendar dates that can be utilized.
Question: How can I take a start timestamp (created_at) and an end timestamp (ended_at) and generate a column that steps forward one month at a time from the start timestamp until the end timestamp?
I have a table with:
user_id,
plan_id,
created_at,
ended_at, (can be null)
So if I had a created_at timestamp of 2019-07-11, I would have a column with additional rows for 2019-08-11, 2019-09-11, 2019-10-11, etc. The goal is to associate the monthly amounts paid by a user to the dates when starting with only a start and end date.
EDIT:
I used the below query which works when an ended_at timestamp is present, however, when it is null, I need to have the next month populated until an ended_at timestamp is present.
select
ps.network_id,
ps.user_id,
ps.plan_id,
ps.created_at,
extract('day' from ps.created_at) as extract_day,
d.calendar_date,
ps.archived_at as ended_at,
ps.application_fee_percent,
pp.amount,
pp.interval,
pp.name
from payments_subscriptions ps
left outer join dates d
on extract('day' from date_trunc('day',d.calendar_date)) = extract('day' from ps.created_at)
and date_trunc('day',d.calendar_date) >= date_trunc('day',ps.created_at)
and date_trunc('day',d.calendar_date) < date_trunc('day',ps.archived_at)
left outer join payments_plans pp on ps.plan_id = pp.id
where ps.network_id = '1318990'
and ps.user_id = '2343404'
order by 3,6 desc
output from above query - subscription with null ended_at needs to continue until ended_at is present

Use the DATEADD function to add an interval to a date or timestamp:
https://docs.aws.amazon.com/redshift/latest/dg/r_DATEADD_function.html
To add one month, use:
DATEADD(month, 1, CURRENT_TIMESTAMP)

For anyone looking for a potential solution, I ended up joining my dates table in this fashion:
LEFT OUTER JOIN dates d ON extract('day' FROM date_trunc('day',d.calendar_date)) = extract('day' FROM payments_subscriptions.created_at)
AND date_trunc('day',d.calendar_date) >= date_trunc('day',payments_subscriptions.created_at)
AND date_trunc('day',d.calendar_date) < date_trunc('day',getdate())
and this where clause:
WHERE (calendar_date < date_trunc('day',payments_subscriptions.archived_at) OR payments_subscriptions.archived_at is null)
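The join and the where clause above can probably be folded into a single join condition by substituting the current date when archived_at is null. The following is a minimal sketch for Redshift, assuming the table and column names from the question; the billing_date alias is made up for illustration. Note that matching on the day of the month means a subscription created on the 29th, 30th, or 31st gets no row in months that lack that day.

```sql
-- Sketch only: one row per monthly anniversary of created_at,
-- running until archived_at, or until today when archived_at is null.
SELECT
    ps.user_id,
    ps.plan_id,
    ps.created_at,
    ps.archived_at AS ended_at,
    d.calendar_date AS billing_date  -- hypothetical alias
FROM payments_subscriptions ps
LEFT OUTER JOIN dates d
    ON EXTRACT(day FROM d.calendar_date) = EXTRACT(day FROM ps.created_at)
   AND d.calendar_date >= DATE_TRUNC('day', ps.created_at)
   -- COALESCE falls back to the current timestamp for open-ended subscriptions
   AND d.calendar_date < DATE_TRUNC('day', COALESCE(ps.archived_at, GETDATE()))
ORDER BY ps.user_id, d.calendar_date;
```

Keeping every condition on dates inside the ON clause preserves the LEFT OUTER JOIN behaviour, so a subscription with no matching calendar dates still appears once with a null billing_date.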

Related

Filling in empty dates

This query returns the number of alarms created by day between a specific date range.
SELECT CAST(created_at AS DATE) AS date, SUM(1) AS count
FROM ew_alarms
LEFT JOIN site ON site.id = ew_alarms.site_id
AND ew_alarms.created_at BETWEEN '12/22/2020' AND '01/22/2021' AND (CAST(EXTRACT(HOUR FROM ew_alarms.created_at) AS INT) BETWEEN 0 AND 23.99)
GROUP BY CAST(created_at AS DATE)
ORDER BY date DESC
Result: screenshot
What's the best way to fill in the missing dates (1/16, 1/17, 1/18, etc.)? Because no alarms were created on those days, they are missing from the results, which throws off the daily average I'm ultimately trying to compute.
Would it be a generate_series query?
Yes, use generate_series(). I would suggest:
SELECT gs.dte AS date, COUNT(a.site_id) AS count
FROM GENERATE_SERIES('2020-12-22'::date, '2021-01-22'::date, INTERVAL '1 DAY') gs(dte) LEFT JOIN
ew_alarms a
ON a.created_at >= gs.dte AND
a.created_at < gs.dte + INTERVAL '1 DAY' LEFT JOIN
site s
ON s.id = a.site_id
GROUP BY gs.dte
ORDER BY date DESC;
I don't know what the hour comparison is supposed to be doing. The hour is always going to be between 0 and 23, so I removed that logic.
Note: Presumably, you want to count something from either site or ew_alarms. Counting a column from a LEFT JOINed table is what allows 0 to be returned for days with no matching rows.
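Since the end goal in the question is a daily average, the filled-in series can be wrapped in an outer query. This is a sketch for Postgres, assuming the ew_alarms table from the question and that created_at is never null; the daily and alarm_count names are made up for illustration.

```sql
-- Sketch only: average alarms per day, counting empty days as 0.
SELECT AVG(daily.alarm_count) AS avg_alarms_per_day
FROM (
    SELECT gs.dte, COUNT(a.created_at) AS alarm_count
    FROM generate_series('2020-12-22'::date, '2021-01-22'::date, INTERVAL '1 day') AS gs(dte)
    LEFT JOIN ew_alarms a
           ON a.created_at >= gs.dte
          AND a.created_at <  gs.dte + INTERVAL '1 day'
    GROUP BY gs.dte
) AS daily;
```

Because every day in the range produces a row, days with no alarms contribute a 0 to the average instead of being skipped.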

Postgres: Return zero as default for rows where there is no match

I am trying to get all the paid contracts from my contracts table and group them by month. I can get the data but for months where there is no new paid contract I want to get a zero instead of missing month. I have tried coalesce and generate_series but I cannot seem to get the missing row.
Here is my query:
with months as (
select generate_series(
'2019-01-01', current_date, interval '1 month'
) as series )
select date(months.series) as day, SUM(contracts.price) from months
left JOIN contracts on date(date_trunc('month', contracts.to)) = months.series
where contracts.tier='paid' and contracts.trial=false and (contracts.to is not NULL) group by day;
I want the results to look like:
|Contract Value| Month|
| 20 | 01-2020|
| 10 | 02-2020|
| 0 | 03-2020|
I can get the rows where there is a contract but cannot get the zero row.
Postgres Version 10.9
I think that you want:
with months as (
select generate_series('2019-01-01', current_date, interval '1 month' ) as series
)
select m.series as day, coalesce(sum(c.price), 0) sum_price
from months m
left join contracts c
on c.to >= m.series
and c.to < m.series + interval '1' month
and c.tier = 'paid'
and not c.trial
group by m.series;
That is:
you want the conditions on the left joined table in the on clause of the join rather than in the where clause; otherwise they become mandatory and filter out the rows where the left join came back empty
the filter on the date can be written without date functions; this makes the query SARGable, i.e. the database may take advantage of an index on the date column
table aliases make the query easier to read and write
You need to move conditions to the on clause:
with months as (
select generate_series( '2019-01-01'::date, current_date, interval '1 month') as series
)
select m.series as day, coalesce(sum(c.price), 0)
from months m left join
contracts c
on c.to >= m.series and
c.to < m.series + interval '1 month' and
c.tier = 'paid' and
c.trial = false
group by day;
Note some changes to the query:
The conditions on c that were in the where clause are in the on clause.
The date comparison uses simple date comparisons, rather than truncating to the month. This helps the optimizer and makes it easier to use an index.
Table aliases make the query easier to write and to read.
There is no need to convert day to a date. It already is.
to is a bad choice for a column name because it is reserved. However, I did not change it.
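To see the effect of moving the conditions into the on clause, here is a small self-contained sketch for Postgres. The inline contracts rows are made-up sample data, not taken from the question, and the to column is quoted because the name is reserved.

```sql
-- Sketch only: sample data is hypothetical.
WITH contracts(price, tier, trial, "to") AS (
    VALUES (20, 'paid', false, DATE '2020-01-15'),
           (10, 'paid', false, DATE '2020-02-03')
),
months AS (
    SELECT generate_series(DATE '2020-01-01', DATE '2020-03-01', INTERVAL '1 month') AS series
)
SELECT to_char(m.series, 'MM-YYYY') AS month,
       COALESCE(SUM(c.price), 0)    AS contract_value
FROM months m
LEFT JOIN contracts c
       ON c."to" >= m.series
      AND c."to" <  m.series + INTERVAL '1 month'
      AND c.tier = 'paid'
      AND NOT c.trial
GROUP BY m.series
ORDER BY m.series;
```

With this sample data, March has no matching contract, so the left join yields a null price for it and COALESCE turns the sum into 0. Moving the tier and trial conditions back into a where clause would drop the March row.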

How to cast unix timestamp as date and extract month from it in Presto SQL

I have the following query:
select cast(ov.ingestion_timestamp as date), date(ov.date_id), cast(ov.main_category as varchar),
sum(cast(ov.order_target as int)),
sum(cast(ov.gmv_target as int))
from tableA ov
inner join tableB cb
on date(ov.date_id) = date(cb.ingestion_timestamp)
inner join tableC loc
on date(ov.date_id) = date(loc.ingestion_timestamp)
where MONTH(date(ov.ingestion_timestamp)) = month(current_date)
group by 1,2,3
I would like to get records where the month of the ingestion_timestamp column equals the current month. All column values are stored as objects, hence I need to cast them to their respective data types. May I know how I can retrieve the month of the ingestion_timestamp column, please?
Thank you.
I would suggest not casting in the where clause: this is inefficient, because the function needs to be applied to every row before filtering.
Instead, you can compute the timestamp that corresponds to the beginning of the month, and use it for direct filtering:
where ingestion_timestamp >= to_unixtime(date_trunc('month', current_date))
If you have dates in the future, you can add an upper bound:
where
ingestion_timestamp >= to_unixtime(date_trunc('month', current_date))
and ingestion_timestamp < to_unixtime(date_trunc('month', current_date) + interval '1' month)
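If the month number itself is needed in the output, it can be derived with from_unixtime. The sketch below is for Presto and rests on an assumption the question does not confirm: that ingestion_timestamp holds epoch seconds stored as a string, so it is cast to double first. The ingestion_month alias is made up.

```sql
-- Sketch only: assumes ingestion_timestamp is epoch seconds stored as varchar.
SELECT
    month(from_unixtime(CAST(ov.ingestion_timestamp AS double))) AS ingestion_month
FROM tableA ov
WHERE CAST(ov.ingestion_timestamp AS double)
          >= to_unixtime(CAST(date_trunc('month', current_date) AS timestamp))
  AND CAST(ov.ingestion_timestamp AS double)
          <  to_unixtime(CAST(date_trunc('month', current_date) + interval '1' month AS timestamp))
```

If the column is already numeric, the casts on ingestion_timestamp can be dropped, which leaves the comparison on the bare column as the answer recommends.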

Counting Biz days for each row of database

I have a table with each row containing a start and end date with timestamp format and need to filter them by the number of business days between the start and end date.
Based on some of the solutions posted here, I created a separate table with all days and marked them with a boolean field like this:
CREATE TABLE tbl_holiday (h_date TIMESTAMP, is_holiday BOOLEAN)
Is it possible to write a query that filters by the count of days between start_date and end_date that have is_holiday set to False?
My database is Impala.
You would typically join the original table with the holiday table with inequality conditions on the start and end date, aggregate, and finally filter in a having clause by the count of business days (rows where is_holiday is false) against your target value:
select t.id, t.start_date, t.end_date
from mytable t
inner join tbl_holiday h on h.h_date between t.start_date and t.end_date
group by t.id, t.start_date, t.end_date
having sum(case when h.is_holiday then 0 else 1 end) = :no_of_business_days
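One detail worth checking is the time component: if start_date carries a time of day and h_date is stored at midnight, a plain between skips the start day. A possible variant for Impala truncates both bounds to the day and exposes the count as a column. This is a sketch that assumes mytable has an id column, as in the answer; the business_days alias is made up.

```sql
-- Sketch only: counts non-holiday calendar days per row, inclusive of both ends.
SELECT
    t.id,
    t.start_date,
    t.end_date,
    SUM(CASE WHEN h.is_holiday THEN 0 ELSE 1 END) AS business_days
FROM mytable t
INNER JOIN tbl_holiday h
        ON h.h_date BETWEEN TRUNC(t.start_date, 'DD') AND TRUNC(t.end_date, 'DD')
GROUP BY t.id, t.start_date, t.end_date;
```

Filtering on the count then becomes a having clause on the same SUM expression.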

MySQL date range SELECT + JOIN query using column with CURRENT_TIMESTAMP

I am using this query:
SELECT p.id, count(clicks.ip)
FROM `p`
LEFT JOIN c clicks ON p.id = clicks.pid
WHERE clicks.ip = '111.222.333.444'
To select clicks from table "c", that has "pid" = "p.id". The query seems to work fine, but now I want to use that query with date ranges. The "c" table has a column "time" that uses MySQL CURRENT_TIMESTAMP data type (YYYY-MM-DD HH:MM:SS). How can I use my query with date range using that column?
I want to be able to select count(clicks.ip) from a specific day, and also group the results by hour (but this is for a different query).
Use:
SELECT p.id,
COUNT(clicks.ip)
FROM `p`
LEFT JOIN c clicks ON clicks.pid = p.id
AND clicks.ip = '111.222.333.444'
AND clicks.time BETWEEN DATE_SUB(NOW(), INTERVAL 1 DAY)
AND NOW()
GROUP BY p.id
I provided an example that will count clicks that occurred between this time yesterday (DATE_SUB(NOW(), INTERVAL 1 DAY)) and today (NOW()). Mind that BETWEEN is inclusive.
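For the follow-up mentioned in the question (clicks on a specific day, grouped by hour), a half-open range avoids the inclusive upper bound of BETWEEN. This is a sketch for MySQL; the date 2021-01-22 is an arbitrary example and the click_hour and click_count aliases are made up.

```sql
-- Sketch only: clicks per hour for one IP on a single, hypothetical day.
SELECT
    HOUR(clicks.time) AS click_hour,
    COUNT(clicks.ip)  AS click_count
FROM c AS clicks
WHERE clicks.ip = '111.222.333.444'
  AND clicks.time >= '2021-01-22'
  AND clicks.time <  '2021-01-22' + INTERVAL 1 DAY
GROUP BY HOUR(clicks.time)
ORDER BY click_hour;
```

Hours with no clicks simply do not appear in the result; filling them in would need a helper table of hours joined the same way the date-series examples above fill in missing days.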