Window function is not allowed in WHERE clause (Redshift SQL)

I have a dates CTE in my query below where I am using a LIMIT clause that I don't want. I am trying to understand how to rewrite the dates CTE so that I can avoid the LIMIT 8.
WITH dates AS (
SELECT (date_trunc('week', getdate() + INTERVAL '1 day')::date - 7 * (row_number() over (order by true) - 1) - INTERVAL '1 day')::date AS week_column
FROM dimensions.customer LIMIT 8
)
SELECT
dates.week_column,
'W' || ceiling(date_part('week', dates.week_column + INTERVAL '1 day')) AS week_number,
COUNT(DISTINCT features.client_id) AS total
FROM dimensions.program features
JOIN dates ON features.last_update <= dates.week_column
WHERE features.type = 'capacity'
AND features.status = 'CURRENT'
GROUP BY dates.week_column
ORDER by dates.week_column DESC
Below is the output I get from my inner dates CTE query:
SELECT (date_trunc('week', getdate() + INTERVAL '1 day')::date - 7 * (row_number() over (order by true) - 1) - INTERVAL '1 day')::date AS week_column
FROM dimensions.customer LIMIT 8
Output from the dates CTE:
2021-01-10
2021-01-03
2020-12-27
2020-12-20
2020-12-13
2020-12-06
2020-11-29
2020-11-22
Is there any way to avoid using LIMIT 8 in my CTE query and still get the same output? Our platform doesn't allow us to run queries that contain a LIMIT clause, so I am trying to see if I can rewrite it differently in Redshift SQL.
If I modify my dates CTE query like this, it gives me the error "window function is not allowed in WHERE clause":
WITH dates AS (
SELECT (date_trunc('week', getdate() + INTERVAL '1 day')::date - 7 * (row_number() over (order by true) - 1) - INTERVAL '1 day')::date AS week_column,
ROW_NUMBER() OVER () as seqnum
FROM dimensions.customer
WHERE seqnum <= 8
)
....
Update
You mean something like this?
WITH dates AS (
SELECT (date_trunc('week', getdate() + INTERVAL '1 day')::date - 7 * (row_number() over (order by true) - 1) - INTERVAL '1 day')::date AS week_column,
ROW_NUMBER() OVER () as seqnum
FROM dimensions.customer
)
SELECT
dates.week_column,
'W' || ceiling(date_part('week', dates.week_column + INTERVAL '1 day')) AS week_number,
COUNT(DISTINCT features.client_id) AS total
FROM dimensions.program features
JOIN dates ON features.last_update <= dates.week_column
WHERE dates.seqnum <= 8
AND features.type = 'capacity'
AND features.status = 'CURRENT'
GROUP BY dates.week_column
ORDER by dates.week_column DESC

Just move your WHERE clause to the outer SELECT. seqnum doesn't exist until the CTE runs, but it does exist when the result of the CTE is consumed.
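The evaluation-order issue can be reproduced (and the fix verified) outside Redshift. Here is a minimal runnable sketch using SQLite via Python's sqlite3 (window functions need SQLite 3.25+); the customer table and its contents are invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER)")
conn.executemany("INSERT INTO customer VALUES (?)", [(i,) for i in range(20)])

# WHERE is evaluated before window functions, so seqnum cannot be filtered
# at the same query level; the filter moves to the SELECT that consumes the CTE.
rows = conn.execute("""
    WITH numbered AS (
        SELECT id, ROW_NUMBER() OVER (ORDER BY id) AS seqnum
        FROM customer
    )
    SELECT id FROM numbered WHERE seqnum <= 8
""").fetchall()
print(len(rows))  # 8
```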
UPDATE ...
After moving the WHERE clause, AndyP got a correlated-subquery error coming from a WHERE clause that was not included in the posted query, as shown in this somewhat modified query:
WITH dates AS
(
SELECT (DATE_TRUNC('week',getdate () +INTERVAL '1 day')::DATE- 7*(ROW_NUMBER() OVER (ORDER BY TRUE) - 1) -INTERVAL '1 day')::DATE AS week_of
FROM (SELECT 1 AS X UNION ALL SELECT 1 AS X UNION ALL SELECT 1 AS X UNION ALL SELECT 1 AS X UNION ALL SELECT 1 AS X UNION ALL SELECT 1 AS X UNION ALL SELECT 1 AS X UNION ALL SELECT 1 AS X)
)
SELECT dates.week_of,
'W' || CEILING(DATE_PART('week',dates.week_of +INTERVAL '1 day')) AS week_number,
COUNT(DISTINCT features.id) AS total
FROM dimensions.program features
JOIN dates ON features.last_update <= dates.week_of
WHERE features.version = (SELECT MAX(version)
FROM headers f2
WHERE features.id = f2.id
AND features.type = f2.type
AND f2.last_update <= dates.week_of)
AND features.type = 'type'
AND features.status = 'live'
GROUP BY dates.week_of
ORDER BY dates.week_of DESC;
This was an interesting replacement of a correlated query with a join, due to the inequality in the correlated subquery. We thought others might be helped by posting the final solution. This works:
WITH dates AS
(
SELECT (DATE_TRUNC('week',getdate () +INTERVAL '1 day')::DATE- 7*(ROW_NUMBER() OVER (ORDER BY TRUE) - 1) -INTERVAL '1 day')::DATE AS week_of
FROM (SELECT 1 AS X UNION ALL SELECT 1 AS X UNION ALL SELECT 1 AS X UNION ALL SELECT 1 AS X UNION ALL SELECT 1 AS X UNION ALL SELECT 1 AS X UNION ALL SELECT 1 AS X UNION ALL SELECT 1 AS X)
)
SELECT dates.week_of,
'W' || CEILING(DATE_PART('week',dates.week_of +INTERVAL '1 day')) AS week_number,
COUNT(DISTINCT features.carrier_id) AS total
FROM dimensions.program features
JOIN dates ON features.last_update <= dates.week_of
JOIN (SELECT MAX(MAX(version)) OVER (PARTITION BY id, type ORDER BY dates.week_of ROWS UNBOUNDED PRECEDING) AS feature_version,
f2.id,
f2.type,
dates.week_of
FROM dimensions.headers f2
JOIN dates ON f2.last_update <= dates.week_of
GROUP BY f2.id,
f2.type,
dates.week_of) f2
ON features.id = f2.id
AND features.type = f2.type
AND f2.week_of = dates.week_of
AND features.version = f2.feature_version
WHERE features.type = 'type'
AND features.status = 'live'
GROUP BY dates.week_of
ORDER BY dates.week_of DESC;
Needing to make a data segment that had all the possible MAX(version) values for all possible week_of values was the key. Hopefully having both of these queries posted will help others fix correlated-subquery errors.
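The core of the trick - turning a correlated "max version as of this week" subquery into a running window max - can be sketched in miniature. This uses SQLite via Python's sqlite3 with invented data, and a plain per-row MAX rather than the MAX(MAX(...)) aggregate-then-window form used above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE versions (id INTEGER, week TEXT, version INTEGER)")
conn.executemany("INSERT INTO versions VALUES (?,?,?)", [
    (1, "2020-12-27", 3),
    (1, "2021-01-03", 2),   # an older version arriving in a later week
    (1, "2021-01-10", 5),
])

# Running MAX over weeks = "max version as of each week", replacing the
# correlated  MAX(version) ... WHERE last_update <= week  subquery.
rows = conn.execute("""
    SELECT id, week,
           MAX(version) OVER (PARTITION BY id ORDER BY week
                              ROWS UNBOUNDED PRECEDING) AS max_as_of_week
    FROM versions
    ORDER BY week
""").fetchall()
print(rows)
# [(1, '2020-12-27', 3), (1, '2021-01-03', 3), (1, '2021-01-10', 5)]
```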

Related

Getting a period index from a date in PostgreSQL

Here is some Postgres code I created; it works. Is there a way to code it more efficiently? My goal is to get how many periods away a given date falls from 2014-03-01. One period is a half-year starting in March or September.
I updated this code below on 2022-05-18 at 10:19 UTC+2
select date,
dense_rank() over (order by half_year_mar_sep) as period_index
from
(
select date as date,
case when extract(month from date) = 12 then (extract(year from date) || '-09-01')
when extract(month from date) in (1, 2) then (extract(year from date) - 1 || '-09-01')
when extract(month from date) in (3, 4, 5) then (extract(year from date) || '-03-01')
when extract(month from date) in (6, 7, 8) then (extract(year from date) || '-03-01')
else extract(year from date) || '-09-01'
end::date as half_year_mar_sep
from
(
select generate_series(date '2014-03-01', CURRENT_DATE, interval '1 day')::date as date
) s1
) s2
If I encapsulate the code above into select min(date), period_index from (<code above>) s3 group by 2 order by 1, then I get exactly the result I need.
WITH cte AS (
SELECT
date1::date,
rank() OVER (ORDER BY date1)
FROM generate_series(date '2014-03-01', CURRENT_DATE + interval '1' month, interval '6 month') g (date1)
),
cteall AS (
SELECT
all_date::date
FROM
generate_series(date '2014-03-01', CURRENT_DATE + interval '1' month, interval ' 1 day') s (all_date)
),
cte3 AS (
SELECT
*
FROM
cteall c1
LEFT JOIN cte c2 ON date1 = all_date
),
cte4 AS (
SELECT
*,
count(rank) OVER w AS ct_str
FROM
cte3
WINDOW w AS (ORDER BY all_date))
SELECT
*,
rank() OVER (PARTITION BY ct_str ORDER BY all_date) AS rank1,
dense_rank() OVER (ORDER BY all_date) AS dense_rank1
FROM
cte4;
Hope it's not intimidating. Personally I find CTEs a good tool, since they make the logic clearer.
demo
useful link: How to do forward fill as a PL/PGSQL function
If you don't need some of the columns, you can simply replace * with the columns you want.
Based on @Mark's answer I wrote the code below, but it's not simpler than the original code.
select s.date,
m.period_index
from
(
select date::date as half_year_start,
rank() over (order by date) as period_index,
coalesce(lead(date::date, 1) over (), CURRENT_DATE) as following_half_year_start
from generate_series(date '2014-03-01', CURRENT_DATE + interval '1' month, interval '6 month') as date
) m
left join
(
select generate_series(date '2014-03-01', CURRENT_DATE, interval '1 day')::date as date
) s
on s.date between m.half_year_start and m.following_half_year_start
;
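Since the half-years have a fixed length, the dense_rank over a generated day series can also be replaced by plain integer arithmetic on year and month. A minimal sketch in Python, assuming dates on or after 2014-03-01; the equivalent SQL expression, using integer division, would be ((extract(year from date) - 2014) * 12 + extract(month from date) - 3) / 6 + 1:

```python
from datetime import date

def period_index(d: date) -> int:
    """Half-year period index counted from 2014-03-01.
    Period 1 = Mar-Aug 2014, period 2 = Sep 2014-Feb 2015, and so on."""
    months_since_start = (d.year - 2014) * 12 + (d.month - 3)
    return months_since_start // 6 + 1

print(period_index(date(2014, 3, 1)))    # 1
print(period_index(date(2014, 12, 25)))  # 2 (Dec belongs to the Sep-Feb half)
print(period_index(date(2022, 5, 18)))   # 17
```

This avoids generating one row per day entirely, at the cost of losing the per-day listing the original query produces.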

How to get the percentage change from same time 7 days ago?

I have a big PostgreSQL database with time series data.
I query the data with a resample to one hour. What I want is to compare the mean value from the last hour to the value at the same time 7 days ago, and I don't know how to do it.
This is what I use to get the latest value.
SELECT DATE_TRUNC('hour', datetime) AS time, AVG(value) as value, id FROM database
WHERE datetime > now() - '01:00:00'::interval
GROUP BY id, time
You can use a CTE to calculate last week's average in the same time period, then join on id and hour.
with last_week as
(
SELECT
id,
extract(hour from datetime) as time,
avg(value) as avg_value
FROM my_table
where DATE_TRUNC('hour', datetime) =
(date_trunc('hour', now() - interval '7 DAYS'))
group by 1,2
)
select n.id,
DATE_TRUNC('hour', n.datetime) AS time_now,
avg(n.value) as avg_now,
t.avg_value as avg_last_week
from my_table n
left join last_week t
on t.id = n.id
and t.time = extract(hour from n.datetime)
where datetime > now()- '01:00:00'::interval
group by 1,2,4
order by 1
I'm making a few assumptions on how your data appear.
EDIT - just noticed you asked for percent change. Showing change as a decimal:
select id,
extract(hour from time_now) as hour_now,
avg_now,
avg_last_week,
coalesce(((avg_now - avg_last_week) / avg_last_week), 0) AS CHANGE
from (
with last_week as
(
SELECT
id,
extract(hour from datetime) as time,
avg(value) as avg_value
FROM my_table
where DATE_TRUNC('hour', datetime) =
(date_trunc('hour', now() - interval '7 DAYS'))
group by 1,2
)
select n.id,
DATE_TRUNC('hour', n.datetime) AS time_now,
avg(n.value) as avg_now,
t.avg_value as avg_last_week
from my_table n
left join last_week t
on t.id = n.id
and t.time = extract(hour from n.datetime)
where datetime > now()- '01:00:00'::interval
group by 1,2,4
)z
group by 1,2,3,4
order by 1,2
db-fiddle found here: https://www.db-fiddle.com/f/rWJATypGzHPZ8sG2vXAGXC/4
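For reference, the join-and-divide pattern can be exercised end to end on a tiny invented sample. This sketch uses SQLite via Python's sqlite3, with a simplified schema (a day column instead of full timestamps) and a NULLIF guard so a zero baseline doesn't divide by zero:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hourly (id INTEGER, day TEXT, hour INTEGER, value REAL)")
conn.executemany("INSERT INTO hourly VALUES (?,?,?,?)", [
    (1, "2024-05-01", 9, 10.0),   # same hour, a week earlier
    (1, "2024-05-08", 9, 12.0),   # the current hour
])

# Self-join the current hour to the same hour 7 days earlier; NULLIF guards
# against division by zero, COALESCE turns a missing baseline into 0 change.
row = conn.execute("""
    SELECT n.id,
           AVG(n.value) AS avg_now,
           AVG(t.value) AS avg_last_week,
           COALESCE((AVG(n.value) - AVG(t.value))
                    / NULLIF(AVG(t.value), 0), 0) AS change
    FROM hourly n
    LEFT JOIN hourly t
      ON t.id = n.id AND t.hour = n.hour
     AND t.day = date(n.day, '-7 days')
    WHERE n.day = '2024-05-08'
    GROUP BY n.id
""").fetchone()
print(row)  # (1, 12.0, 10.0, 0.2) -> a 20% increase week over week
```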

How to avoid using limit query in sql redshift to get last x weeks of data

I have a query below which gives me data for the past 8 weeks and it works fine -
WITH dates
AS (
SELECT (
date_trunc('week', getdate() + INTERVAL '1 day')::DATE - 7 * (
row_number() OVER (
ORDER BY true
) - 1
) - INTERVAL '1 day'
)::DATE AS week_info
FROM data.process LIMIT 8
)
SELECT dates.week_info
,'W' || ceiling(date_part('week', dates.week_info + INTERVAL '1 day')) AS week_number
,COUNT(DISTINCT zeus.client_id) AS PROC
FROM data.active_values zeus
JOIN dates ON zeus.updated_timestamp <= dates.week_info
WHERE zeus.kites_version = (
SELECT MAX(kites_version)
FROM data.active_values f2
WHERE zeus.client_id = f2.client_id
AND zeus.type = f2.type
AND f2.updated_timestamp <= dates.week_info
)
AND zeus.type = 'hello-world'
AND zeus.STATUS = 'CURRENT'
GROUP BY dates.week_info
ORDER BY dates.week_info DESC LIMIT 8
But the problem is that I am using LIMIT 8 to get the above query working. I am trying to see if there is any way to avoid using LIMIT 8 and just use the zeus.updated_timestamp value to get the past 8 weeks of data, in a similar output format to what my current query gives.
Output is coming like this from above query and I want it to be in this format only:
week_info week_number PROC
--------------------------------
2020-10-25 W44 100
2020-10-18 W43 101
2020-10-11 W42 109
2020-10-04 W41 134
2020-09-27 W40 982
2020-09-20 W39 187
2020-09-13 W38 765
2020-09-06 W37 234
Note: the updated_timestamp column has a full timestamp in it, like 2020-10-28 18:56:25:17
So 2 removals of LIMIT were requested. The first, in the CTE, can be replaced by adding a WHERE clause in the outer SELECT - "WHERE dates.week_info > 8 weeks ago" (I'll leave it to you to define 8 weeks ago). Also, there are more efficient ways to make 8 dates than using a window function and scanning an unneeded table, but that is your choice. Changing this will remove the LIMIT / WHERE need altogether. Your CTE then looks something like:
select date_trunc('week', getdate() + INTERVAL '1 day')::DATE - (t.num * 7) - 1 as week_info
from (select 0 union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7) as t (num)
The second LIMIT comes about because of the inequality in the JOIN clause, which causes a lot of row replication - I hope this is really what you need. There will only be 8 dates coming from the CTE, and having a GROUP BY on this date means that there will only be 8 rows of output. If there are only 8 possible rows, there is no reason to have a LIMIT.
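The generate-8-rows-without-LIMIT idea can be checked with a recursive CTE as well. Here is a runnable sketch in SQLite via Python's sqlite3, with a fixed anchor date standing in for the getdate()-based expression:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Eight week dates from a recursive number generator -- no base-table scan
# and no LIMIT, same idea as the UNION ALL number list.
rows = conn.execute("""
    WITH RECURSIVE nums(n) AS (
        SELECT 0 UNION ALL SELECT n + 1 FROM nums WHERE n < 7
    )
    SELECT date('2021-01-10', '-' || (n * 7) || ' days') AS week_info
    FROM nums
    ORDER BY week_info DESC
""").fetchall()
print([r[0] for r in rows])
# ['2021-01-10', '2021-01-03', '2020-12-27', '2020-12-20',
#  '2020-12-13', '2020-12-06', '2020-11-29', '2020-11-22']
```

This reproduces the question's 8-week output exactly, stepping back 7 days at a time from the anchor.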
EDIT - merged code (untested):
WITH dates
AS (
select date_trunc('week', getdate() + INTERVAL '1 day')::DATE - (t.num * 7) - 1 as week_info
from (select 0 union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7) as t (num)
)
SELECT dates.week_info
,'W' || ceiling(date_part('week', dates.week_info + INTERVAL '1 day')) AS week_number
,COUNT(DISTINCT zeus.client_id) AS PROC
FROM data.active_values zeus
JOIN dates ON zeus.updated_timestamp <= dates.week_info
WHERE zeus.kites_version = (
SELECT MAX(kites_version)
FROM data.active_values f2
WHERE zeus.client_id = f2.client_id
AND zeus.type = f2.type
AND f2.updated_timestamp <= dates.week_info
)
AND zeus.type = 'hello-world'
AND zeus.STATUS = 'CURRENT'
GROUP BY dates.week_info
ORDER BY dates.week_info DESC
EDIT 2 - attempt to address correlated subquery issue:
If I understand correctly, the WHERE clause in question is just trying to ensure that only client_ids whose values match on kites_version are counted. A more direct (and less error-prone) way to get this is to calculate the subgroup max directly. The code below attempts to do this, but I don't have your data nor your business intent, so treat it as an example of a better way to attack this type of requirement.
WITH dates
AS (
select date_trunc('week', getdate() + INTERVAL '1 day')::DATE - (t.num * 7) - 1 as week_info
from (select 0 union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7) as t (num)
),
active_values_plus AS (
SELECT client_id, updated_timestamp, type, status, kites_version, MAX(kites_version) OVER (PARTITION BY client_id, type) AS max_kites_version
FROM data.active_values
)
SELECT dates.week_info
,'W' || ceiling(date_part('week', dates.week_info + INTERVAL '1 day')) AS week_number
,COUNT(DISTINCT zeus.client_id) AS PROC
FROM active_values_plus zeus
JOIN dates ON zeus.updated_timestamp <= dates.week_info
WHERE zeus.kites_version = zeus.max_kites_version
AND zeus.type = 'hello-world'
AND zeus.STATUS = 'CURRENT'
GROUP BY dates.week_info
ORDER BY dates.week_info DESC

SQL: Return '0' for a row if it doesn't exist

I have a SQL query which displays count, date, and time.
This is what the output looks like:
And this is my SQL query:
select
count(*),
to_char(timestamp, 'MM/DD/YYYY'),
to_char(timestamp, 'HH24')
from
MY_TABLE
where
timestamp >= to_timestamp('03/01/2016','MM/DD/YYYY')
group by
to_char(timestamp, 'MM/DD/YYYY'), to_char(timestamp, 'HH24')
Now, in the COUNT column, I want to display 0 if the count doesn't exist for that hour. So on 3/2/2016 at 8am, the count was 6. Then at 9am the count was 0, so that row didn't get displayed. I want to display that row. And at 10am and 11am the counts are displayed, then it just goes to the next day.
So how do I display a count of 0? I want to display a count for every hour of each day, no matter whether it's 0 or 6 or whatever. Thanks :)
Use a partition outer join:
SELECT m.day,
h.hr,
COALESCE( freq, 0 ) AS freq
FROM ( SELECT LEVEL - 1 AS hr
FROM DUAL
CONNECT BY LEVEL <= 24
) h
LEFT OUTER JOIN
( SELECT COUNT(*) AS freq,
TO_CHAR( "timestamp", 'mm/dd/yyyy' ) AS day,
EXTRACT( HOUR FROM "timestamp" ) AS hr
FROM MY_TABLE
WHERE "timestamp" >= TIMESTAMP '2016-03-01 00:00:00'
GROUP BY
TO_CHAR( "timestamp", 'mm/dd/yyyy' ),
EXTRACT( HOUR FROM "timestamp" )
) m
PARTITION BY ( m.day, m.hr )
ON ( m.hr = h.hr );
Use a cte to generate numbers for all the hours in a day. Then cross join the result with all the possible dates from the table. Then left join on the cte which has all date and hour combinations, to get a 0 count when a row is absent for a particular hour.
with nums(n) as (select 0 from dual
union all
select n+1 from nums where n < 23)
,dateshrscomb as (select n,dt
from nums
cross join (select distinct trunc(timestamp) dt from my_table
where timestamp >= to_timestamp('03/01/2016','MM/DD/YYYY')
) alldates
)
select count(trunc(m.timestamp)), d.dt, d.n
from dateshrscomb d
left join MY_TABLE m on to_char(m.timestamp, 'HH24') = d.n
and trunc(m.timestamp) = d.dt
and m.timestamp >= to_timestamp('03/01/2016','MM/DD/YYYY')
group by d.dt, d.n
with cteHours(h) as (select 0 from dual
union all
select h+1 from cteHours where h < 23)
, cteDates(d) AS (
SELECT
trunc(MIN(timestamp)) as d
FROM
My_Table
WHERE
timestamp >= to_timestamp('03/01/2016','MM/DD/YYYY')
UNION ALL
SELECT
d + 1 as d
FROM
cteDates
WHERE
d + 1 <= (SELECT trunc(MAX(timestamp)) FROM MY_TABLE)
)
, datesNumsCross (d,h) AS (
SELECT
d, h
FROM
cteDates
CROSS JOIN cteHours
)
select count(m.timestamp), to_char(d.d, 'MM/DD/YYYY'), d.h
from datesNumsCross d
LEFT JOIN MY_TABLE m
ON d.d = trunc(m.timestamp)
AND d.h = to_char(m.timestamp, 'HH24')
group by d.d, d.h
@VPK is doing a good job of answering; I just happened to be writing this at the same time as his last edit to generate a date-hour cross join. This solution differs from his in that it will get all dates between your desired max and min, whereas his will get only the dates within the table - so if a day is missing completely it would not be represented in his, but it would be in this one. Plus I did a little cleanup on the joins.
Here is one way to do that.
Using Oracle's hierarchical query feature and the LEVEL pseudo-column, generate the dates and hours.
Then do an outer join of the above with your data.
You need to adjust the value of LEVEL depending upon your desired range (this example uses 120). The start date needs to be set as well; it is (trunc(sysdate, 'hh24') - 2/24) in this example.
select nvl(c1.cnt, 0), d1.date_part, d1.hour_part
from
(
select
to_char(s.dt - (c.lev)/24, 'mm/dd/yyyy') date_part,
to_char(s.dt - (c.lev)/24, 'hh24') hour_part
from
(select level lev from dual connect by level <= 120) c,
(select trunc(sysdate, 'hh24')-2/24 dt from dual) s
where (s.dt - (c.lev)/24) < trunc(sysdate, 'hh24')-2/24
) d1
full outer join
(
select
count(*) cnt,
to_char(timestamp, 'MM/DD/YYYY') date_part,
to_char(timestamp, 'HH24') hour_part
from
MY_TABLE
where
timestamp >= to_timestamp('03/01/2016','MM/DD/YYYY')
group by
to_char(timestamp, 'MM/DD/YYYY'), to_char(timestamp, 'HH24')
) c1
on d1.date_part = c1.date_part
and d1.hour_part = c1.hour_part
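The common thread in these answers - generate the hour values, outer-join the data to them, and count the joined column rather than the rows - can be demonstrated in miniature. A sketch using SQLite via Python's sqlite3, with invented data and the day dimension omitted for brevity:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (ts TEXT)")
conn.executemany("INSERT INTO events VALUES (?)",
                 [("2016-03-02 08:10:00",)] * 6 + [("2016-03-02 10:05:00",)])

# Generate hours 0-23, LEFT JOIN the events, and COUNT the joined column:
# COUNT(e.ts) yields 0 for empty hours, where COUNT(*) would yield 1.
rows = conn.execute("""
    WITH RECURSIVE hours(h) AS (
        SELECT 0 UNION ALL SELECT h + 1 FROM hours WHERE h < 23
    )
    SELECT h, COUNT(e.ts) AS freq
    FROM hours
    LEFT JOIN events e ON CAST(strftime('%H', e.ts) AS INTEGER) = h
    GROUP BY h
    ORDER BY h
""").fetchall()
print(rows[8], rows[9], rows[10])  # (8, 6) (9, 0) (10, 1)
```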

Postgresql group month wise with missing values

First, an example of my table:
id_object;time;value;status
1;2014-05-22 09:30:00;1234;1
1;2014-05-22 09:31:00;2341;2
1;2014-05-22 09:32:00;1234;1
...
1;2014-06-01 00:00:00;4321;1
...
Now I need to count all rows with status=1 and id_object=1, month-wise for example. This is my query:
SELECT COUNT(*)
FROM my_table
WHERE id_object=1
AND status=1
AND extract(YEAR FROM time)=2014
GROUP BY extract(MONTH FROM time)
The result for this example is:
2
1
2 for May and 1 for June, but I need an output with all 12 months, including months with no data. For this example I need this output:
0 0 0 0 2 1 0 0 0 0 0 0
Thx for help.
You can use the generate_series() function like this:
select
g.month,
count(m)
from generate_series(1, 12) as g(month)
left outer join my_table as m on
m.id_object = 1 and
m.status = 1 and
extract(year from m.time) = 2014 and
extract(month from m.time) = g.month
group by g.month
order by g.month
sql fiddle demo
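One detail worth calling out in this approach: the filters on id_object, status, and the year live in the ON clause. Moving them to a WHERE clause would turn the LEFT JOIN into an effective inner join and drop the zero-count months. A runnable sketch using SQLite via Python's sqlite3, with a recursive CTE standing in for generate_series and the question's sample data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id_object INTEGER, time TEXT, status INTEGER)")
conn.executemany("INSERT INTO my_table VALUES (?,?,?)", [
    (1, "2014-05-22 09:30:00", 1),
    (1, "2014-05-22 09:32:00", 1),
    (1, "2014-06-01 00:00:00", 1),
])

# The filters live in the ON clause: moving them to WHERE would discard
# the unmatched (zero-count) months produced by the LEFT JOIN.
rows = conn.execute("""
    WITH RECURSIVE months(m) AS (
        SELECT 1 UNION ALL SELECT m + 1 FROM months WHERE m < 12
    )
    SELECT m, COUNT(t.time)
    FROM months
    LEFT JOIN my_table t
      ON t.id_object = 1 AND t.status = 1
     AND CAST(strftime('%m', t.time) AS INTEGER) = m
     AND strftime('%Y', t.time) = '2014'
    GROUP BY m ORDER BY m
""").fetchall()
print([c for _, c in rows])  # [0, 0, 0, 0, 2, 1, 0, 0, 0, 0, 0, 0]
```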
Rather than comparing with an extracted value, you'll want to use a range-table instead. Something that looks like this:
month startOfMonth nextMonth
1 '2014-01-01' '2014-02-01'
2 '2014-02-01' '2014-03-01'
......
12 '2014-12-01' '2015-01-01'
As in #Roman's answer, we'll start with generate_series(), this time using it to generate the range table:
WITH Month_Range AS (SELECT EXTRACT(MONTH FROM month) AS month,
month AS startOfMonth,
month + INTERVAL '1 MONTH' AS nextMonth
FROM generate_series(CAST('2014-01-01' AS DATE),
CAST('2014-12-01' AS DATE),
INTERVAL '1 month') AS mr(month))
SELECT Month_Range.month, COUNT(My_Table)
FROM Month_Range
LEFT JOIN My_Table
ON My_Table.time >= Month_Range.startOfMonth
AND My_Table.time < Month_Range.nextMonth
AND my_table.id_object = 1
AND my_table.status = 1
GROUP BY Month_Range.month
ORDER BY Month_Range.month
(As a side note, I'm now annoyed at how PostgreSQL handles intervals)
SQL Fiddle Demo
The use of the range will allow any index including My_Table.time to be used (although not if the index was built over an EXTRACTed column).
EDIT:
Modified query to take advantage of the fact that generate_series(...) will also handle date/time series.
generate_series can generate timestamp series
select
g.month,
count(t)
from
generate_series(
(select date_trunc('year', min(t.time)) from t),
(select date_trunc('year', max(t.time)) + interval '11 months' from t),
interval '1 month'
) as g(month)
left outer join
t on
t.id_object = 1 and
t.status = 1 and
date_trunc('month', t.time) = g.month
where date_trunc('year', g.month) = '2014-01-01'::date
group by g.month
order by g.month