Count if previous-month data exists (PostgreSQL)

I'm stuck on a query that should count an id only if it also appeared in the previous month.
My table looks like this:
date | id |
2020-02-02| 1 |
2020-03-04| 1 |
2020-03-04| 2 |
2020-04-05| 1 |
2020-04-05| 3 |
2020-05-06| 2 |
2020-05-06| 3 |
2020-06-07| 2 |
2020-06-07| 3 |
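For reference, the sample data can be recreated like this (the solutions below assume the table is named test):

```sql
-- Sample schema and rows matching the table above
CREATE TABLE test (date date, id int);

INSERT INTO test (date, id) VALUES
('2020-02-02', 1),
('2020-03-04', 1),
('2020-03-04', 2),
('2020-04-05', 1),
('2020-04-05', 3),
('2020-05-06', 2),
('2020-05-06', 3),
('2020-06-07', 2),
('2020-06-07', 3);
```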
I'm stuck with this query:
SELECT date_trunc('month', date), id
FROM table
WHERE id IN
    (SELECT DISTINCT id FROM table
     WHERE date BETWEEN date_trunc('month', current_date) - interval '1 month'
                    AND date_trunc('month', current_date))
The main problem is that current_date is fixed: the subquery only looks at the month before the current month. Is there a dynamic way to replace current_date so the check runs for every month? Thanks.
The result I expect is:
date | count |
2020-02-01| 0 |
2020-03-01| 1 |
2020-04-01| 1 |
2020-05-01| 1 |
2020-06-01| 2 |

Solution 1 with SELF JOIN
SELECT date_trunc('month', c.date) :: date AS date
, count(DISTINCT c.id) FILTER (WHERE p.date IS NOT NULL)
FROM test AS c
LEFT JOIN test AS p
ON c.id = p.id
AND date_trunc('month', c.date) = date_trunc('month', p.date) + interval '1 month'
GROUP BY date_trunc('month', c.date)
ORDER BY date_trunc('month', c.date)
Result:
date count
2020-02-01 0
2020-03-01 1
2020-04-01 1
2020-05-01 1
2020-06-01 2
Solution 2 with WINDOW FUNCTIONS (note: the GROUPS frame clause requires PostgreSQL 11 or later)
SELECT DISTINCT ON (date) date
, count(*) FILTER (WHERE count > 0 AND previous_month) OVER (PARTITION BY date)
FROM
( SELECT DISTINCT ON (id, date_trunc('month', date))
id
, date_trunc('month', date) AS date
, count(*) OVER w AS count
, first_value(date_trunc('month', date)) OVER w = date_trunc('month', date) - interval '1 month' AS previous_month
FROM test
WINDOW w AS (PARTITION BY id ORDER BY date_trunc('month', date) GROUPS BETWEEN 1 PRECEDING AND 1 PRECEDING)
) AS a
Result:
date count
2020-02-01 0
2020-03-01 1
2020-04-01 1
2020-05-01 1
2020-06-01 2


Obtain Name Column Based on Value

I have a table that calculates, for each parent record, the number of associated records that fit a criterion. See the example below.
Note: morning, afternoon and evening cover weekdays only.
| id | morning | afternoon | evening | weekend |
| -- | ------- | --------- | ------- | ------- |
| 1 | 0 | 2 | 3 | 1 |
| 2 | 2 | 9 | 4 | 6 |
What I am trying to achieve is to determine which column has the lowest value and return its name, like this:
| id | time_of_day |
| -- | ----------- |
| 1 | morning |
| 2 | afternoon |
Here is my current SQL code to result in the first table:
SELECT
leads.id,
COALESCE(morning, 0) morning,
COALESCE(afternoon, 0) afternoon,
COALESCE(evening, 0) evening,
COALESCE(weekend, 0) weekend
FROM leads
LEFT OUTER JOIN (
    SELECT lead_id, COUNT(*) AS morning
    FROM lead_activities
    WHERE lead_activities.modality = 'Call'
      AND lead_activities.bound_type = 'outbound'
      AND extract('dow' from created_at) IN (1, 2, 3, 4, 5) -- weekdays: dow 0 = Sunday, 6 = Saturday
      AND extract('hour' from created_at) >= 0 AND extract('hour' from created_at) < 12
    GROUP BY lead_id
) morning ON morning.lead_id = leads.id
LEFT OUTER JOIN (
    SELECT lead_id, COUNT(*) AS afternoon
    FROM lead_activities
    WHERE lead_activities.modality = 'Call'
      AND lead_activities.bound_type = 'outbound'
      AND extract('dow' from created_at) IN (1, 2, 3, 4, 5)
      AND extract('hour' from created_at) >= 12 AND extract('hour' from created_at) < 17
    GROUP BY lead_id
) afternoon ON afternoon.lead_id = leads.id
LEFT OUTER JOIN (
    SELECT lead_id, COUNT(*) AS evening
    FROM lead_activities
    WHERE lead_activities.modality = 'Call'
      AND lead_activities.bound_type = 'outbound'
      AND extract('dow' from created_at) IN (1, 2, 3, 4, 5)
      AND extract('hour' from created_at) >= 17 AND extract('hour' from created_at) < 24 -- hour ranges 0-23
    GROUP BY lead_id
) evening ON evening.lead_id = leads.id
LEFT OUTER JOIN (
    SELECT lead_id, COUNT(*) AS weekend
    FROM lead_activities
    WHERE lead_activities.modality = 'Call'
      AND lead_activities.bound_type = 'outbound'
      AND extract('dow' from created_at) IN (0, 6) -- Saturday and Sunday
    GROUP BY lead_id
) weekend ON weekend.lead_id = leads.id
You can use CASE/WHEN/ELSE to check for the specific conditions and produce different values. For example:
with
q as (
-- your query here
)
select
id,
case
when morning <= least(afternoon, evening, weekend) then 'morning'
when afternoon <= least(morning, evening, weekend) then 'afternoon'
when evening <= least(morning, afternoon, weekend) then 'evening'
else 'weekend'
end as time_of_day
from q
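Note that with ties, the first matching WHEN wins (morning before afternoon, and so on). An alternative sketch unpivots the columns with a LATERAL VALUES list and picks the smallest; it scales better if more time slots are added later (the alias names v and cnt are illustrative):

```sql
with q as (
  -- your query here
)
select q.id, v.time_of_day
from q
cross join lateral (
  select time_of_day
  from (values ('morning',   q.morning),
               ('afternoon', q.afternoon),
               ('evening',   q.evening),
               ('weekend',   q.weekend)) as v(time_of_day, cnt)
  order by cnt   -- smallest count first; add a second sort key to control ties
  limit 1
) v;
```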

Count days per month from days off table

I have a table which stores a person, the start of a holiday, and the stop of a holiday.
I need to count from it how many working days per month the person was on holiday, so I want to partition this table by month.
To get public holidays I'm using: https://github.com/christopherthompson81/pgsql_holidays
Let's assume the table holds one person only, with start/stop only:
create table data (id int, start date, stop date);
This is the network_days function I wrote:
CREATE OR REPLACE FUNCTION network_days(start_date date, stop_date date) RETURNS bigint AS $$
SELECT count(*)
FROM generate_series(start_date, stop_date - interval '1 minute', interval '1 day') the_day
WHERE extract('ISODOW' FROM the_day) < 6
  AND the_day NOT IN (
      SELECT datestamp::timestamp
      FROM holidays_poland(extract(year FROM start_date)::int, extract(year FROM stop_date)::int))
$$
LANGUAGE sql
STABLE;
and I created a function with a query like this ($2 is the target year; here $2 = 2020):
SELECT month, year, sum(value_per_day)
FROM (
    SELECT to_char(dt, 'mm') AS month,
           to_char(dt, 'yyyy') AS year,
           (network_days(
               (CASE WHEN EXTRACT(year FROM df.start_date) < $2
                     THEN (date_trunc('year', df.start_date) + interval '1 year')::date
                     ELSE df.start_date END),
               (CASE WHEN EXTRACT(year FROM df.stop_date) > $2
                     THEN date_trunc('year', df.stop_date)::date
                     ELSE df.stop_date END)
           )::int::numeric / count(*) OVER (PARTITION BY id))::int AS value_per_day
    FROM intranet.dayoff df
    LEFT JOIN generate_series(
           (CASE WHEN EXTRACT(year FROM df.start_date) < $2
                 THEN (date_trunc('year', df.start_date) + interval '1 year')::date
                 ELSE df.start_date END),
           (CASE WHEN EXTRACT(year FROM df.stop_date) > $2
                 THEN date_trunc('year', df.stop_date)::date
                 ELSE df.stop_date END) - interval '1 day',
           interval '1 day') AS t (dt) ON extract('ISODOW' FROM dt) < 6
    WHERE extract(isodow FROM dt) < 6
      AND (EXTRACT(year FROM start_date) = $2 OR EXTRACT(year FROM stop_date) = $2)
) t
GROUP BY month, year
ORDER BY month;
based on: https://dba.stackexchange.com/questions/237745/postgresql-split-date-range-by-business-days-then-aggregate-by-month?rq=1
and I almost have it:
10 rows returned
| month | year | sum |
| ----- | ---- | ---- |
| 03 | 2020 | 2 |
| 04 | 2020 | 13 |
| 06 | 2020 | 1 |
| 11 | 2020 | 1 |
| 12 | 2020 | 2 |
| 05 | 2020 | 1 |
| 10 | 2020 | 2 |
| 08 | 2020 | 10 |
| 01 | 2020 | 1 |
| 02 | 2020 | 1 |
so in the function I created, I'd need to add something like this:
dt NOT IN (SELECT datestamp::timestamptz FROM holidays_poland ($2, $2))
but I end up with many conditions, and this feels like the wrong approach.
I feel like I should instead split the table from:
id start stop
1 31.12.2019 00:00:00 01.01.2020 00:00:00
2 30.03.2020 00:00:00 14.04.2020 00:00:00
3 01.05.2020 00:00:00 03.05.2020 00:00:00
to
start stop
30.03.2020 00:00:00 01.01.2020 00:00:00
01.01.2020 00:00:00 14.04.2020 00:00:00
01.05.2020 00:00:00 03.05.2020 00:00:00
and just run the network_days function on each resulting date range, but I couldn't successfully partition my query of the table to get such a result.
What do you think is the best way to achieve this calculation?
SELECT
gs::date
FROM person_holidays p,
generate_series(p.start, p.stop, interval '1 day') gs -- 1
WHERE gs::date NOT IN (SELECT holiday FROM holidays) -- 2
AND EXTRACT(isodow from gs::date) < 6 -- 3
1. Generate the date series from the person's start and stop date
2. Exclude all dates that appear in the holidays table
3. If necessary: exclude all weekend days (Saturday and Sunday)
Afterwards you are able to GROUP BY months and count the records:
SELECT
date_trunc('month', gs),
COUNT(*)
FROM person_holidays p,
generate_series(p.start, p.stop, interval '1 day') gs
WHERE gs::date NOT IN (SELECT holiday FROM holidays)
and extract(isodow from gs::date) < 6
GROUP BY 1
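If the table holds more than one person, the same pattern extends by grouping on the person as well. A minimal sketch, assuming a person column in person_holidays (the column name is an assumption):

```sql
-- Working days off per person and month, excluding holidays and weekends
SELECT p.person,
       date_trunc('month', gs) AS month,
       COUNT(*) AS working_days_off
FROM person_holidays p,
     generate_series(p.start, p.stop, interval '1 day') gs
WHERE gs::date NOT IN (SELECT holiday FROM holidays)
  AND extract(isodow FROM gs::date) < 6
GROUP BY p.person, date_trunc('month', gs)
ORDER BY p.person, month;
```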

Sum results on constant timeframe range on each date in table

I'm using a Postgres DB.
I have a table that contains test names, their results and report time:
|test_name|result |report_time|
| A |error |29/11/2020 |
| A |failure|28/12/2020 |
| A |error |29/12/2020 |
| B |passed |30/12/2020 |
| C |failure|31/12/2020 |
| A |error |31/12/2020 |
I'd like to count, per date, how many tests failed or errored in the 30 days up to that date (limited to 5 days back from the current date), so the final result will be:
| date | sum | (notes)
| 29/11/2020 | 1 | 1 failed/errored test in range (29/11 -> 29/10)
| 28/12/2020 | 2 | 2 failed/errored tests in range (28/12 -> 28/11)
| 29/12/2020 | 3 | 3 failed/errored tests in range (29/12 -> 29/11)
| 30/12/2020 | 2 | 2 failed/errored tests in range (30/12 -> 30/11)
| 31/12/2020 | 4 | 4 failed/errored tests in range (31/12 -> 31/11)
I know how to count the results per date (i.e., how many failures/errors occurred on a specific date):
SELECT report_time::date AS "Report Time",
       count(case when result in ('failure', 'error') then 1 else null end)
FROM table
WHERE report_time::date = now()::date
GROUP BY report_time::date
But I'm struggling to compute, for each date, the count over the 30 days before it.
You can generate the dates and then use window functions:
select gs.dte, num_failed_error, num_failed_error_30
from generate_series(current_date - interval '5 day', current_date, interval '1 day') gs(dte)
left join
     (select t.report_time, count(*) as num_failed_error,
             sum(count(*)) over (order by report_time
                                 range between interval '30 day' preceding and current row) as num_failed_error_30
      from t
      where t.result in ('failure', 'error') and
            t.report_time >= current_date - interval '35 day'
      group by t.report_time
     ) t
     on t.report_time = gs.dte;
Note: This assumes that report_time is only the date with no time component. If it has a time component, use report_time::date. Also note that a RANGE frame with an interval offset requires PostgreSQL 11 or later.
If you have data on each day, then this can be simplified to:
select t.report_time, count(*) as num_failed_error,
       sum(count(*)) over (order by report_time
                           range between interval '30 day' preceding and current row) as num_failed_error_30
from t
where t.result in ('failure', 'error') and
      t.report_time >= current_date - interval '35 day'
group by t.report_time
order by report_time desc
limit 5;
Since I'm using PostgreSQL 10.12 and an upgrade is currently not an option, I took a different approach: I generate the dates of the last 30 days, and for each date I compute the distinct count over the preceding 30 days:
SELECT days_range::date, SUM(number_of_tests)
FROM generate_series(now() - interval '30 day', now()::timestamp, '1 day'::interval) days_range
CROSS JOIN LATERAL (
    SELECT COUNT(DISTINCT test_name) AS number_of_tests
    FROM tests
    WHERE report_time > days_range - interval '30 day'
    GROUP BY report_time::date
    HAVING COUNT(case when result in ('failure', 'error') then 1 else null end) > 0
) AS lateral_query
GROUP BY days_range
ORDER BY days_range desc
It is definitely not a well-optimized query; it takes ~1 minute to compute.
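As a sketch of the same LATERAL idea written more directly, the subquery can count in one step and bound the window on both sides (the query above leaves the upper edge open, so each day also counts future rows):

```sql
-- Distinct failed/errored tests in the 30 days ending on each of the last 5 days
SELECT d.day::date AS date, f.cnt AS sum
FROM generate_series(current_date - interval '5 day', current_date, interval '1 day') d(day)
CROSS JOIN LATERAL (
    SELECT COUNT(DISTINCT t.test_name) AS cnt
    FROM tests t
    WHERE t.result IN ('failure', 'error')
      AND t.report_time::date >  d.day::date - 30  -- lower bound: 30 days back
      AND t.report_time::date <= d.day::date       -- upper bound: the day itself
) f
ORDER BY d.day DESC;
```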

Weeks between two dates

I'm attempting to turn two dates into a series of records. One record for each week between the dates.
Additionally the original start and end dates should be used to clip the week in case the range starts or ends mid-week. I'm also assuming that a week starts on Monday.
With a start date of: 05/09/2018 and an end date of 27/09/2018 I would like to retrieve the following results:
| # | Start Date | End date |
|---------------------------------|
| 0 | '05/09/2018' | '09/09/2018' |
| 1 | '10/09/2018' | '16/09/2018' |
| 2 | '17/09/2018' | '23/09/2018' |
| 3 | '24/09/2018' | '27/09/2018' |
I have made some progress - at the moment I can get the total number of weeks between the date range with:
SELECT (
EXTRACT(
days FROM (
date_trunc('week', to_date('27/09/2018', 'DD/MM/YYYY')) -
date_trunc('week', to_date('05/09/2018', 'DD/MM/YYYY'))
) / 7
) + 1
) as total_weeks;
Total weeks will return 4 for the above SQL. This is where I'm stuck: going from an integer to an actual set of results.
Window functions are your friend:
SELECT week_num,
min(d) AS start_date,
max(d) AS end_date
FROM (SELECT d,
count(*) FILTER (WHERE new_week) OVER (ORDER BY d) AS week_num
FROM (SELECT DATE '2018-09-05' + i AS d,
extract(dow FROM DATE '2018-09-05'
+ lag(i) OVER (ORDER BY i)
) = 1 AS new_week
FROM generate_series(0, DATE '2018-09-27' - DATE '2018-09-05') AS i
) AS week_days
) AS weeks
GROUP BY week_num
ORDER BY week_num;
week_num | start_date | end_date
----------+------------+------------
0 | 2018-09-05 | 2018-09-09
1 | 2018-09-10 | 2018-09-16
2 | 2018-09-17 | 2018-09-23
3 | 2018-09-24 | 2018-09-27
(4 rows)
Use generate_series():
select gs.*
from generate_series(date_trunc('week', '2018-09-05'::date),
'2018-09-27'::date,
interval '1 week'
) gs(dte)
Ultimately I expanded on Gordon's solution to get to the following, though Laurenz's answer is slightly more concise.
select
(
case when (week_start - interval '6 days' <= date_trunc('week', '2018-09-05'::date)) then '2018-09-05'::date else week_start end
) as start_date,
(
case when (week_start + interval '6 days' >= '2018-09-27'::date) then '2018-09-27'::date else week_start + interval '6 days' end
) as end_date
from generate_series(
date_trunc('week', '2018-09-05'::date),
'2018-09-27'::date,
interval '1 week'
) gs(week_start);
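The clipping in the CASE expressions can also be written with GREATEST/LEAST, a sketch using the same literals as above (GREATEST pins the first week to the range start, LEAST pins the last week to the range end):

```sql
SELECT greatest(week_start::date, date '2018-09-05')               AS start_date,
       least((week_start + interval '6 days')::date, date '2018-09-27') AS end_date
FROM generate_series(date_trunc('week', date '2018-09-05'),  -- Monday of the first week
                     date '2018-09-27',
                     interval '1 week') gs(week_start);
```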

How to force zero values in Redshift?

If this is my query:
select
    min(timestamp)::date as date,
    count(distinct user_id) as user_id_count,
    row_number() over (order by signup_day desc) - 1 as days_since
from
    data.table
where
    timestamp >= current_date - 3
group by
    timestamp
order by
    timestamp asc;
And these are my results
date | user_id_count | days_since
------------+-----------------+-------------
2018-01-22 | 3 | 1
2018-01-23 | 5 | 0
How can I get it the table to show (where the user ID count is 0?):
date | user_id_count | days_since
------------+-----------------+-------------
2018-01-21 | 0 | 0
2018-01-22 | 3 | 1
2018-01-23 | 5 | 0
You need to generate the dates. In Postgres, generate_series() is the way to go:
select g.ts as dte,
       count(distinct t.user_id) as user_id_count,
       row_number() over (order by g.ts desc) - 1 as days_since
from generate_series(current_date::timestamp - interval '3 day',
                     current_date::timestamp,
                     interval '1 day') g(ts)
left join data.table t
       on t.timestamp::date = g.ts
group by g.ts
order by g.ts;
You have to create a "calendar" table with dates and left join your aggregated result, like this:
with
aggregated_result as (
select ...
)
select
t1.date
,coalesce(t2.user_id_count,0) as user_id_count
,coalesce(t2.days_since,0) as days_since
from calendar t1
left join aggregated_result t2
using (date)
more on creating calendar table: How do I create a dates table in Redshift?
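As an alternative sketch, on Redshift clusters recent enough to support recursive CTEs, a small calendar can be generated inline instead of maintaining a permanent table (the window size here matches the 3-day example above):

```sql
-- Inline calendar of the last 4 days via a recursive CTE
with recursive calendar(dt) as (
    select current_date - 3 as dt          -- start of the window
    union all
    select dt + 1 from calendar
    where dt < current_date                -- end of the window
)
select dt as date from calendar;
```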