PostgreSQL group month-wise with missing values - SQL

First, an example of my table:
id_object;time;value;status
1;2014-05-22 09:30:00;1234;1
1;2014-05-22 09:31:00;2341;2
1;2014-05-22 09:32:00;1234;1
...
1;2014-06-01 00:00:00;4321;1
...
Now I need to count all rows with status=1 and id_object=1, grouped by month. This is my query:
SELECT COUNT(*)
FROM my_table
WHERE id_object=1
AND status=1
AND extract(YEAR FROM time)=2014
GROUP BY extract(MONTH FROM time)
The result for this example is:
2
1
That is 2 for May and 1 for June, but I need an output with all 12 months, including months with no data. For this example I need this output:
0 0 0 0 2 1 0 0 0 0 0 0
Thanks for the help.

You can use the generate_series() function like this:
select
g.month,
count(m)
from generate_series(1, 12) as g(month)
left outer join my_table as m on
m.id_object = 1 and
m.status = 1 and
extract(year from m.time) = 2014 and
extract(month from m.time) = g.month
group by g.month
order by g.month
SQL Fiddle demo
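With the sample rows from the question, this should return one row per month: 0 for January through April, 2 for May, 1 for June, and 0 for the remaining months.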

Rather than comparing with an extracted value, you'll want to use a range table. Something that looks like this:
month startOfMonth nextMonth
1 '2014-01-01' '2014-02-01'
2 '2014-02-01' '2014-03-01'
......
12 '2014-12-01' '2015-01-01'
As in @Roman's answer, we'll start with generate_series(), this time using it to generate the range table:
WITH Month_Range AS (SELECT EXTRACT(MONTH FROM month) AS month,
month AS startOfMonth,
month + INTERVAL '1 MONTH' AS nextMonth
FROM generate_series(CAST('2014-01-01' AS DATE),
CAST('2014-12-01' AS DATE),
INTERVAL '1 month') AS mr(month))
SELECT Month_Range.month, COUNT(My_Table)
FROM Month_Range
LEFT JOIN My_Table
ON My_Table.time >= Month_Range.startOfMonth
AND My_Table.time < Month_Range.nextMonth
AND my_table.id_object = 1
AND my_table.status = 1
GROUP BY Month_Range.month
ORDER BY Month_Range.month
(As a side note, I'm now annoyed at how PostgreSQL handles intervals)
SQL Fiddle Demo
Using the range allows any index that includes My_Table.time to be used (although not an index built over an EXTRACTed column).
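For instance, an index along these lines (just a sketch; the column order is an assumption) could be used by the range predicates above, whereas the EXTRACT-based comparison could not use it:
CREATE INDEX my_table_object_status_time_idx
    ON my_table (id_object, status, time);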
EDIT:
Modified query to take advantage of the fact that generate_series(...) will also handle date/time series.

generate_series can generate timestamp series
select
g.month,
count(t)
from
generate_series(
(select date_trunc('year', min(t.time)) from t),
(select date_trunc('year', max(t.time)) + interval '11 months' from t),
interval '1 month'
) as g(month)
left outer join
t on
t.id_object = 1 and
t.status = 1 and
date_trunc('month', t.time) = g.month
where date_trunc('year', g.month) = '2014-01-01'::date
group by g.month
order by g.month
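As a minimal illustration of the timestamp series itself (the literals are just examples):
select *
from generate_series('2014-01-01'::timestamp,
                     '2014-12-01'::timestamp,
                     interval '1 month') as g(month);
-- 2014-01-01 00:00:00, 2014-02-01 00:00:00, ..., 2014-12-01 00:00:00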

Related

Get list of counts from table based on dates in SQL

I have to fetch a list of counts from a table by department. Here is my table structure:
empid empname department departmentId joinedon
I want counts of all the employees who joined today, yesterday, and more than 2 days ago, like [12,25,89], i.e.
12* joined today
25 joined yesterday
81 joined prior to yesterday (2+ days)
* 0 if there aren't any entries for a given date range.
You would use aggregation on a case expression:
select (case when joinedon::date = current_date then 'today'
when joinedon::date = current_date - interval '1 day' then 'yesterday'
when joinedon::date < current_date - interval '1 day' then 'older'
end) as grp,
count(*)
from t
group by grp;
In addition to @Gordon Linoff's answer:
SELECT
days.day,
coalesce(t.cnt, 0) count
FROM (
SELECT * FROM (VALUES ('today'), ('yesterday'), ('older')) AS days (day)
)days
LEFT JOIN (
SELECT (CASE WHEN joinedon::date = current_date THEN 'today'
WHEN joinedon::date = current_date - interval '1 day' THEN 'yesterday'
WHEN joinedon::date < current_date - interval '1 day' THEN 'older'
end) as day,
count(*) cnt
FROM t
GROUP BY day
) t on t.day = days.day;
Test it here
You can use the group by as follows:
select department,
(case when joinedon::date = current_date then 'today'
when joinedon::date = current_date - interval '1 day' then 'yesterday'
when joinedon::date < current_date - interval '1 day' then 'More than 2 days'
end) as grp,
Coalesce(count(*),0)
from t
group by grp, department;

Window function is not allowed in WHERE clause (Redshift)

I have a dates CTE in the query below where I am using a LIMIT clause that I don't want to use. I am trying to understand how to rewrite my dates CTE so that I can avoid the LIMIT 8.
WITH dates AS (
SELECT (date_trunc('week', getdate() + INTERVAL '1 day')::date - 7 * (row_number() over (order by true) - 1) - INTERVAL '1 day')::date AS week_column
FROM dimensions.customer LIMIT 8
)
SELECT
dates.week_column,
'W' || ceiling(date_part('week', dates.week_column + INTERVAL '1 day')) AS week_number,
COUNT(DISTINCT features.client_id) AS total
FROM dimensions.program features
JOIN dates ON features.last_update <= dates.week_column
WHERE features.type = 'capacity'
AND features.status = 'CURRENT'
GROUP BY dates.week_column
ORDER by dates.week_column DESC
Below is the output I get from my inner dates CTE query:
SELECT (date_trunc('week', getdate() + INTERVAL '1 day')::date - 7 * (row_number() over (order by true) - 1) - INTERVAL '1 day')::date AS week_column
FROM dimensions.customer LIMIT 8
Output from dates CTE :
2021-01-10
2021-01-03
2020-12-27
2020-12-20
2020-12-13
2020-12-06
2020-11-29
2020-11-22
Is there any way to avoid using LIMIT 8 in my CTE query and still get the same output? Our platform doesn't allow us to run queries that contain a LIMIT clause, so I'm trying to see if I can rewrite it differently in Redshift SQL.
If I modify my dates CTE query like this, it gives me an error that a window function is not allowed in the WHERE clause.
WITH dates AS (
SELECT (date_trunc('week', getdate() + INTERVAL '1 day')::date - 7 * (row_number() over (order by true) - 1) - INTERVAL '1 day')::date AS week_column,
ROW_NUMBER() OVER () as seqnum
FROM dimensions.customer
WHERE seqnum <= 8;
)
....
Update
Something like this you mean?
WITH dates AS (
SELECT (date_trunc('week', getdate() + INTERVAL '1 day')::date - 7 * (row_number() over (order by true) - 1) - INTERVAL '1 day')::date AS week_column,
ROW_NUMBER() OVER () as seqnum
FROM dimensions.customer
)
SELECT
dates.week_column,
'W' || ceiling(date_part('week', dates.week_column + INTERVAL '1 day')) AS week_number,
COUNT(DISTINCT features.client_id) AS total
FROM dimensions.program features
JOIN dates ON features.last_update <= dates.week_column
WHERE dates.seqnum <= 8
AND features.type = 'capacity'
AND features.status = 'CURRENT'
GROUP BY dates.week_column
ORDER by dates.week_column DESC
Just move your WHERE clause to the outer SELECT. Seqnum doesn't exist until the CTE runs, but it does exist when the result of the CTE is consumed.
UPDATE ...
After moving the WHERE clause, AndyP got a correlated subquery error coming from a WHERE clause that was not included in the posted query, as shown in this somewhat modified query:
WITH dates AS
(
SELECT (DATE_TRUNC('week',getdate () +INTERVAL '1 day')::DATE- 7*(ROW_NUMBER() OVER (ORDER BY TRUE) - 1) -INTERVAL '1 day')::DATE AS week_of
FROM (SELECT 1 AS X UNION ALL SELECT 1 AS X UNION ALL SELECT 1 AS X UNION ALL SELECT 1 AS X UNION ALL SELECT 1 AS X UNION ALL SELECT 1 AS X UNION ALL SELECT 1 AS X UNION ALL SELECT 1 AS X)
)
SELECT dates.week_of,
'W' || CEILING(DATE_PART('week',dates.week_of +INTERVAL '1 day')) AS week_number,
COUNT(DISTINCT features.id) AS total
FROM dimensions.program features
JOIN dates ON features.last_update <= dates.week_of
WHERE features.version = (SELECT MAX(version)
FROM headers f2
WHERE features.id = f2.id
AND features.type = f2.type
AND f2.last_update <= dates.week_of)
AND features.type = 'type'
AND features.status = 'live'
GROUP BY dates.week_of
ORDER BY dates.week_of DESC;
This was an interesting replacement of a correlated subquery with a join, due to the inequality in the correlated subquery. We thought others might be helped by posting the final solution. This works:
WITH dates AS
(
SELECT (DATE_TRUNC('week',getdate () +INTERVAL '1 day')::DATE- 7*(ROW_NUMBER() OVER (ORDER BY TRUE) - 1) -INTERVAL '1 day')::DATE AS week_of
FROM (SELECT 1 AS X UNION ALL SELECT 1 AS X UNION ALL SELECT 1 AS X UNION ALL SELECT 1 AS X UNION ALL SELECT 1 AS X UNION ALL SELECT 1 AS X UNION ALL SELECT 1 AS X UNION ALL SELECT 1 AS X)
)
SELECT dates.week_of,
'W' || CEILING(DATE_PART('week',dates.week_of +INTERVAL '1 day')) AS week_number,
COUNT(DISTINCT features.carrier_id) AS total
FROM dimensions.program features
JOIN dates ON features.last_update <= dates.week_of
JOIN (SELECT MAX(MAX(version)) OVER(Partition by id, type Order by dates.week_of rows unbounded preceding) AS feature_version,
f2.id,
f2.type,
dates.week_of
FROM dimensions.headers f2
JOIN dates ON f2.last_update <= dates.week_of
GROUP BY f2.id,
f2.type,
dates.week_of) f2
ON features.id = f2.id
AND features.type = f2.type
AND f2.week_of = dates.week_of
AND features.version = f2.feature_version
WHERE features.type = 'type'
AND features.status = 'live'
GROUP BY dates.week_of
ORDER BY dates.week_of DESC;
Making a data segment that has all the possible MAX(version) values for all possible week_of values was the key. Hopefully having both of these queries posted will help others fix correlated subquery errors.

Customizing the range of a week with date_trunc

I've been trying for hours now to write a date_trunc statement to be used in a group by where my week starts on a Friday and ends the following Thursday.
So something like
SELECT
DATE_TRUNC(...) sales_week,
SUM(sales) sales
FROM table
GROUP BY 1
ORDER BY 1 DESC
Which would return the results for the last complete week (by those standards) as 09-13-2019.
You can subtract 4 days and then add 4 days:
SELECT DATE_TRUNC('week', <whatever> - INTERVAL '4 DAY') + INTERVAL '4 DAY' as sales_week,
SUM(sales) as sales
FROM table
GROUP BY 1
ORDER BY 1 DESC
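As a quick check against the dates in the question (the literal is only for illustration), a Thursday such as 2019-09-19 falls into the week starting Friday 2019-09-13:
SELECT DATE_TRUNC('week', TIMESTAMP '2019-09-19' - INTERVAL '4 DAY') + INTERVAL '4 DAY' AS sales_week;
-- 2019-09-13 00:00:00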
The expression
select current_date - cast(cast(7 - (5 - extract(dow from current_date)) as text) || ' days' as interval);
should always give you the previous Friday's date.
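For example, if today is a Wednesday (dow = 3), the expression subtracts 7 - (5 - 3) = 5 days, which lands on the previous Friday.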
If by any chance you might have gaps in the data (maybe more granular breakdowns vs. just per week), you can generate a set of custom weeks and left join to that:
drop table if exists sales_weeks;
create table sales_weeks as
with
dates as (
select generate_series('2019-01-01'::date,current_date,interval '1 day')::date as date
)
,week_ids as (
select
date
,sum(case when extract('dow' from date)=5 then 1 else 0 end) over (order by date) as week_id
from dates
)
select
week_id
,min(date) as week_start_date
,max(date) as week_end_date
from week_ids
group by 1
order by 1
;
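You can then left join your sales data to that table; a rough sketch, assuming a sales table with a sale date column and a sales amount (the names here are assumptions):
select
    w.week_start_date,
    coalesce(sum(s.sales), 0) as sales
from sales_weeks w
left join sales s
    on s.sale_date between w.week_start_date and w.week_end_date
group by 1
order by 1 desc;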

PostgreSQL generate month and year series based on table field and fill with nulls if no data for a given month

I want to generate a series of months and years from the next month of the current year (say, start_month) to 12 months from start_month, along with the corresponding data (if any, else nulls) from another table in PostgreSQL.
SELECT (DATE '2019-03-01' + (interval '1' month * generate_series(0, 11)))::DATE AS dd,
       extract(year FROM (DATE '2019-03-01' + (interval '1' month * generate_series(0, 11)))),
       coalesce(SUM(price), 0)
FROM items
WHERE date_added >= '2019-03-01'
  AND date_added < '2020-03-01'
  AND item_type_id = 3
GROUP BY 1, 2
ORDER BY 2;
The problem with the above query is that it is giving me the same value for price for all the months. The requirement is that the price column be filled with nulls or zeros if no price data is available for a given month.
Put the generate_series() in the FROM clause. You are summarizing the data -- i.e. calculating the price over the entire range -- and then projecting this on all months. Instead:
SELECT gs.yyyymm,
       coalesce(SUM(i.price), 0)
FROM generate_series('2019-03-01'::date, '2020-02-01', INTERVAL '1 MONTH') gs(yyyymm)
LEFT JOIN items i
    ON gs.yyyymm = DATE_TRUNC('month', i.date_added)
   AND i.item_type_id = 3
GROUP BY gs.yyyymm
ORDER BY gs.yyyymm;
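If you also need the year as a separate column, as in the original query, you can extract it from the generated value, e.g.:
SELECT gs.yyyymm::date,
       EXTRACT(YEAR FROM gs.yyyymm) AS yr
FROM generate_series('2019-03-01'::date, '2020-02-01', INTERVAL '1 MONTH') gs(yyyymm);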
You want generate_series in the FROM clause and join with it, somewhat like
SELECT months.m::date, ...
FROM generate_series(
start_month,
start_month + INTERVAL '11 months',
INTERVAL '1 month'
) AS months(m)
LEFT JOIN items
ON months.m::date = items.date_added
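If date_added holds arbitrary dates rather than first-of-month values, you would normally truncate it so whole months match, along the lines of (a sketch on top of the query above; price and item_type_id are taken from the question):
SELECT months.m::date,
       coalesce(sum(items.price), 0)
FROM generate_series(
    start_month,
    start_month + INTERVAL '11 months',
    INTERVAL '1 month'
) AS months(m)
LEFT JOIN items
    ON date_trunc('month', items.date_added) = months.m
   AND items.item_type_id = 3
GROUP BY months.m
ORDER BY months.m;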

Summing Moving Range and Criteria, Grouping by Day

What I'm trying to do is sum the last 30 days based on criteria and group by day; one day of code looks like this:
select
    sum(case when f.hire_date__c between '2017-08-01 00:00:00' and '2017-09-01 00:00:00'
             and t.createddate between '2017-08-01 00:00:00' and '2017-09-01 00:00:00'
             and t.name = 'Request' then 1 else 0 end) as Requests
from case_task_c as t
join case_file_c as f
    on f.id = t.case_file__c
I could adjust the dates accordingly for the 30-day look-back based on today's date, etc. What I can't figure out is how to have this query group by day for each day, i.e. yesterday's results, the day prior, etc., for the adjusted date ranges.
So far I have this:
select
    date(cast(f.hire_date__c as date)),
    row_number() over (order by f.hire_date__c desc) as rownumber,
    rr.Cancels as Cancels,
    qq.hires as hires,
    sum(rr.Cancels) over (rows between 1 following and 30 following) as CumulCancel,
    sum(qq.Hires) over (rows between 1 following and 30 following) as Hires
from case_file_c as f
left join (
    select
        cast(f.hire_date__c as date) as date1,
        sum(case when t.name = 'Cancellation Request' then 1 else 0 end) as Cancels
    from case_task_c as t
    join case_file_c as f
        on f.id = t.case_file__c
    group by date1
) as rr
    on rr.date1 = cast(f.hire_date__c as date)
left join (
    select
        cast(f.hire_date__c as date) as date2,
        sum(case when f.hire_date__c is not null then 1 else 0 end) as hires
    from sf_case_file_c as f
    group by date2
) as qq
    on qq.date2 = cast(f.hire_date__c as date)
where f.hire_date__c is not null
    and f.hire_date__c >= '2017-01-01 00:00:00'
    and f.hire_date__c between date_add('day', -30, current_date) and current_date
group by f.hire_date__c, rr.Cancels, qq.hires
order by f.hire_date__c desc
Even using 'current_date - interval -30 day' just ends up looking at the current date.
Using Postgres 8.0.2
Use GROUP BY like the following. You are converting the datetime to a date in the column selection but not in the GROUP BY:
GROUP BY date(cast(f.hire_date__c as date)),rr.Cancels, qq.hires