Where clause inside an over clause in postgres - sql

Is it possible to use a WHERE clause inside an OVER clause, as below?
SELECT SUM(amount) OVER(partition by prod_name WHERE dateval > dateval_13week)
I cannot use PRECEDING and FOLLOWING inside the OVER clause because my dates are not in sequence.
All I need is to sum the records whose date falls within the 13 weeks before the current record's date.
EDIT:
sum(CASE WHEN dateval >= dateval_13week and dateval <=current_row_dateval then amount else 0 end) over (partition by prod_name order by week_end desc)
Just to elaborate, earlier I was partitioning the records with the below query when I had all my dates in a sequence. Now I have the dates in random order and there are some missing dates.
sum(amount) over
(partition by prod_name order by prod_name,week_end desc rows between 0 preceding and 12 following)

Adding to @D Stanley's answer, you can use the FILTER clause for aggregate functions in PostgreSQL:
SELECT SUM(amount) FILTER (WHERE dateval > dateval_13week)
OVER(partition by prod_name)

You could simulate the WHERE in your SUM parameter:
SELECT SUM(CASE WHEN dateval > dateval_13week THEN amount ELSE 0 END)
OVER(partition by prod_name)

You cannot filter rows with a WHERE clause inside the OVER (PARTITION BY ...) clause.
You can fix the query by using CASE to select only the rows you need, so that the sum includes the amount only where the condition is satisfied.
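
To see the two suggestions side by side, here is a minimal sketch that runs both the FILTER form and the CASE form against an in-memory SQLite database through Python's sqlite3 module. SQLite is used only as a convenient stand-in for PostgreSQL (it accepts the same syntax for this case in reasonably recent versions), and the table name sales and the sample rows are made up for illustration.

```python
import sqlite3

# Hypothetical sample data; column names follow the question, the table name
# "sales" is an assumption, and SQLite stands in for PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (prod_name TEXT, dateval TEXT, dateval_13week TEXT, amount INTEGER)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("A", "2020-01-05", "2020-01-01", 10),  # passes the condition
        ("A", "2019-12-01", "2020-01-01", 20),  # fails the condition
        ("A", "2020-02-02", "2020-01-01", 30),  # passes the condition
        ("B", "2020-01-12", "2020-01-01", 5),   # passes the condition
    ],
)

filter_vs_case = conn.execute(
    """
    SELECT prod_name, dateval,
           SUM(amount) FILTER (WHERE dateval > dateval_13week)
               OVER (PARTITION BY prod_name) AS sum_filter,
           SUM(CASE WHEN dateval > dateval_13week THEN amount ELSE 0 END)
               OVER (PARTITION BY prod_name) AS sum_case
    FROM sales
    ORDER BY prod_name, dateval
    """
).fetchall()

for row in filter_vs_case:
    print(row)
```

Both columns agree on this data. One difference to keep in mind: when no row in a partition satisfies the condition, the FILTER form yields NULL while the CASE ... ELSE 0 form yields 0.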

Related

Column neither grouped nor aggregated after introducing window query

I have trouble integrating a simple window function into my query. I work with this avocado dataset from Kaggle. I started off with a simple query:
SELECT
date,
SUM(Total_Bags) as weekly_bags,
FROM
`course.avocado`
WHERE
EXTRACT(year FROM date) = 2015
GROUP BY
date
ORDER BY
date
And it works just fine. Next, I want to add the rolling sum to the query to display along the weekly sum. I tried the following:
SELECT
date,
SUM(Total_Bags) as weekly_bags,
SUM(Total_Bags) OVER(
PARTITION BY date
ORDER BY date
ROWS BETWEEN 4 PRECEDING AND CURRENT ROW
)
FROM
`course.avocado`
WHERE
EXTRACT(year FROM date) = 2015
GROUP BY
date
ORDER BY
date
but I'm getting the common error:
SELECT list expression references column Total_Bags which is neither grouped nor aggregated at [4:7]
and I'm confused. Total_Bags was aggregated in the first query, yet when it's introduced again in the second query, it's not aggregated anymore. How do I fix this query? Thanks.
Your query returns 2 columns: date and the aggregate SUM(Total_Bags). The window function SUM() is evaluated after the aggregation, when the column Total_Bags no longer exists, which is why you can't use it inside the window function.
However, you can do what you want without GROUP BY, by using only window functions and DISTINCT:
SELECT DISTINCT date,
SUM(Total_Bags) OVER(PARTITION BY date) AS weekly_bags,
SUM(Total_Bags) OVER(
PARTITION BY date
ORDER BY date
ROWS BETWEEN 4 PRECEDING AND CURRENT ROW
)
FROM course.avocado
WHERE EXTRACT(year FROM date) = 2015
ORDER BY date;
or use a window function on the aggregated result:
SELECT date,
SUM(Total_Bags) AS weekly_bags,
SUM(SUM(Total_Bags)) OVER(
ORDER BY date
ROWS BETWEEN 4 PRECEDING AND CURRENT ROW
)
FROM course.avocado
WHERE EXTRACT(year FROM date) = 2015
GROUP BY date
ORDER BY date;
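
The nested form, a window SUM over the grouped SUM, can be tried out on a few made-up rows. This is only a sketch: it uses Python's sqlite3 module with an in-memory SQLite database as a stand-in for BigQuery, the region column and the sample values are invented, and the frame is shortened to 1 PRECEDING so the effect is visible on three dates.

```python
import sqlite3

# Hypothetical rows: several records per date, as in the avocado dataset.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE avocado (date TEXT, region TEXT, Total_Bags INTEGER)")
conn.executemany(
    "INSERT INTO avocado VALUES (?, ?, ?)",
    [
        ("2015-01-04", "East", 10),
        ("2015-01-04", "West", 5),
        ("2015-01-11", "East", 20),
        ("2015-01-11", "West", 5),
        ("2015-01-18", "East", 30),
    ],
)

# The inner SUM is the GROUP BY aggregate (one value per date); the outer SUM
# is the window function that runs over those per-date totals.
rolling = conn.execute(
    """
    SELECT date,
           SUM(Total_Bags) AS weekly_bags,
           SUM(SUM(Total_Bags)) OVER (
               ORDER BY date
               ROWS BETWEEN 1 PRECEDING AND CURRENT ROW
           ) AS rolling_bags
    FROM avocado
    GROUP BY date
    ORDER BY date
    """
).fetchall()

for row in rolling:
    print(row)
```

Note that the window has no PARTITION BY date: after grouping there is one row per date, so the frame has to span neighbouring dates to produce a rolling total.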
I tried to approach it from a different angle and it seems I have figured it out; the results look right. Here's the code:
WITH daily_bags AS
(SELECT
Date,
CAST(SUM(Total_Bags) as int64) as all_bags
FROM
`course.avocado`
WHERE
EXTRACT(year from Date) = 2015
GROUP BY
Date)
SELECT
Date,
all_bags,
-- ORDER BY inside OVER makes the frame follow date order; an ORDER BY in the CTE does not guarantee that
SUM(all_bags) OVER(
ORDER BY Date
ROWS BETWEEN 4 PRECEDING AND CURRENT ROW
) as rolling_sum
FROM
daily_bags
ORDER BY
Date
Thanks everyone for your help.

How can I group rows in SQL based on a condition

I am using Redshift SQL and would like to merge rows for users who have overlapping voucher periods into a single row (showing the minimum start date and the maximum end date).
For example, if I have these records,
I would like to achieve this result using Redshift.
The explanation is that since rows 1 and 2 have overlapping dates, I would like to combine them and take min(Start_date) and max(End_Date).
I do not really know where to start. I tried using row_number to partition them, but it does not seem to work well. This is what I tried:
select
id,
start_date,
end_date,
lag(end_date, 1) over (partition by id order by start_date) as prev_end_date,
row_number() over (partition by id, (case when prev_end_date >= start_date then 1 else 0 end) order by start_date) as rn
from users
Are there any suggestions out there? Thank you kind sirs.
This is a type of gaps-and-islands problem. Because the dates are arbitrary, let me suggest the following approach:
Use a cumulative max to get the maximum end_date among the rows before the current row.
Use logic to determine when there is no overlap (i.e. a new period starts).
A cumulative sum of the starts provides an identifier for the group.
Then aggregate.
As SQL:
select id, min(start_date), max(end_date)
from (select u.*,
sum(case when prev_end_date >= start_date then 0 else 1
end) over (partition by id
order by start_date, voucher_code
rows between unbounded preceding and current row
) as grp
from (select u.*,
max(end_date) over (partition by id
order by start_date, voucher_code
rows between unbounded preceding and 1 preceding
) as prev_end_date
from users u
) u
) u
group by id, grp;
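
As a quick check of the cumulative-max approach, here is a minimal sketch that runs essentially the same query on made-up rows. It uses Python's sqlite3 module with an in-memory SQLite database as a stand-in for Redshift; the sample vouchers are invented, and dates are stored as ISO strings so that string comparison matches date order.

```python
import sqlite3

# Hypothetical data: id 1 has two overlapping periods plus a separate one;
# id 2 has a long period that overlaps both later ones.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER, voucher_code TEXT, start_date TEXT, end_date TEXT)"
)
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?, ?)",
    [
        (1, "V1", "2021-01-01", "2021-01-10"),
        (1, "V2", "2021-01-05", "2021-01-20"),
        (1, "V3", "2021-02-01", "2021-02-05"),
        (2, "V4", "2021-03-01", "2021-03-31"),
        (2, "V5", "2021-03-10", "2021-03-15"),
        (2, "V6", "2021-03-20", "2021-04-02"),
    ],
)

merged = conn.execute(
    """
    SELECT id, MIN(start_date), MAX(end_date)
    FROM (
        SELECT u.*,
               -- 1 marks the start of a new period; the running sum numbers the periods
               SUM(CASE WHEN prev_end_date >= start_date THEN 0 ELSE 1 END) OVER (
                   PARTITION BY id ORDER BY start_date, voucher_code
                   ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
               ) AS grp
        FROM (
            SELECT u.*,
                   -- latest end_date seen on any earlier row of the same id
                   MAX(end_date) OVER (
                       PARTITION BY id ORDER BY start_date, voucher_code
                       ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
                   ) AS prev_end_date
            FROM users u
        ) u
    ) u
    GROUP BY id, grp
    ORDER BY id, MIN(start_date)
    """
).fetchall()

for row in merged:
    print(row)
```

Voucher V6 shows why a cumulative max is used rather than a plain lag(): it does not overlap the row immediately before it (V5), but it does overlap the earlier, longer V4, so all three rows for id 2 end up in one period.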
Another approach would be using recursive CTE:
Divide all rows into numbered partitions grouped by id and ordered by start_date and end_date
Iterate over them calculating group_start_date for each row (rows which have to be merged in final result would have the same group_start_date)
Finally you need to group the CTE by id and group_start_date taking max end_date from each group.
Here is corresponding sqlfiddle: http://sqlfiddle.com/#!18/7059b/2
And the SQL, just in case:
WITH cteSequencing AS (
-- Get Values Order
SELECT *, start_date AS group_start_date,
ROW_NUMBER() OVER (PARTITION BY id ORDER BY start_date, end_date) AS iSequence
FROM users),
Recursion AS (
-- Anchor - the first value in groups
SELECT *
FROM cteSequencing
WHERE iSequence = 1
UNION ALL
-- Remaining items
SELECT b.id, b.start_date, b.end_date,
CASE WHEN a.end_date > b.start_date THEN a.group_start_date
ELSE b.start_date
END
AS groupStartDate,
b.iSequence
FROM Recursion AS a
INNER JOIN cteSequencing AS b ON a.iSequence + 1 = b.iSequence AND a.id = b.id)
SELECT id, group_start_date as start_date, MAX(end_date) as end_date FROM Recursion group by id, group_start_date ORDER BY id, group_start_date

How can I reset the count to 0 in SQL when a condition is false?

I have a SQL table with the data shown in the picture.
I need to write a SQL query that counts, for each ticker, the number of consecutive days per year on which
the close_value is greater than the open_value. If close_value is less than open_value, the counter must be reset to zero, and I have to save the value the counter had at that moment.
This is an example of a gaps-and-islands problem. You can use the difference of row_numbers():
select ticker, min(date), max(date), min(open_value), max(close_value),
count(*) as num_rows
from (select t.*,
row_number() over (partition by ticker order by date) as seqnum,
row_number() over (partition by ticker, (case when close_value > open_value then 1 else 2 end) order by date) as seqnum_2
from t
) t
where close_value > open_value
group by ticker, (seqnum - seqnum_2);
This returns all such periods. You haven't specified what the result set should look like, but this should be pretty close.
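
To make the difference-of-row-numbers trick concrete, here is a minimal sketch on invented prices, run through Python's sqlite3 module against an in-memory SQLite database (a stand-in for whichever database the question uses). The ticker name and values are hypothetical, and the per-year split mentioned in the question is left out.

```python
import sqlite3

# Hypothetical prices: two up days, one down day, then three up days.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (ticker TEXT, date TEXT, open_value REAL, close_value REAL)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [
        ("ABC", "2020-01-01", 10, 11),
        ("ABC", "2020-01-02", 11, 12),
        ("ABC", "2020-01-03", 12, 11),
        ("ABC", "2020-01-04", 11, 13),
        ("ABC", "2020-01-05", 13, 14),
        ("ABC", "2020-01-06", 14, 15),
    ],
)

streaks = conn.execute(
    """
    SELECT ticker, MIN(date), MAX(date), COUNT(*) AS num_rows
    FROM (
        SELECT t.*,
               ROW_NUMBER() OVER (PARTITION BY ticker ORDER BY date) AS seqnum,
               ROW_NUMBER() OVER (
                   PARTITION BY ticker,
                                (CASE WHEN close_value > open_value THEN 1 ELSE 2 END)
                   ORDER BY date
               ) AS seqnum_2
        FROM t
    ) t
    WHERE close_value > open_value
    -- within one unbroken run of up days both counters advance together,
    -- so their difference stays constant and identifies the run
    GROUP BY ticker, (seqnum - seqnum_2)
    ORDER BY ticker, MIN(date)
    """
).fetchall()

for row in streaks:
    print(row)
```

The down day on 2020-01-03 advances seqnum but not the up-day counter, so the difference changes from 0 to 1 and the later up days fall into a new group.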

How to take only one entry from a table based on an offset to a date column value

I have a requirement to get values from a table based on an offset condition on a date column.
For example, in the table attached below, if any dates in the effectivedate column fall within 15 days of each other, I should return only the first one.
So my expected result would be as below:
Here, for policy A1234, it returns the 6/18/16 entry and skips the 6/12/16 entry, because these two dates are within 15 days of each other and I took the latest one.
If you want to group rows together that are within 15 days of each other, then you have a variant of the gaps-and-islands problem. I would recommend lag() and cumulative sum for this version:
select polno, min(effectivedate), max(expirationdate)
from (select t.*,
sum(case when prev_ed >= dateadd(day, -15, effectivedate)
then 0 else 1 -- 1 marks the start of a new group
end) over (partition by polno order by effectivedate) as grp
from (select t.*,
lag(expirationdate) over (partition by polno order by effectivedate) as prev_ed
from t
) t
) t
group by polno, grp;
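
The lag-plus-cumulative-sum pattern can be sketched on a few invented rows using Python's sqlite3 module and an in-memory SQLite database. Several things here are assumptions rather than part of the answer: the table name policies, the sample dates, the use of SQLite's date(x, '-15 days') in place of dateadd, and the choice to compare consecutive effectivedate values (the question's stated criterion) instead of the previous expirationdate. The flag is 1 whenever a row starts a new group, so the running sum numbers the groups.

```python
import sqlite3

# Hypothetical policies; dates are ISO strings so text comparison follows date order.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE policies (polno TEXT, effectivedate TEXT)")
conn.executemany(
    "INSERT INTO policies VALUES (?, ?)",
    [
        ("A1234", "2016-06-12"),
        ("A1234", "2016-06-18"),  # within 15 days of the previous row
        ("A1234", "2016-08-01"),  # more than 15 days later: new group
        ("B5678", "2016-01-10"),
    ],
)

groups = conn.execute(
    """
    SELECT polno, MIN(effectivedate), MAX(effectivedate)
    FROM (
        SELECT p.*,
               SUM(CASE WHEN prev_date >= date(effectivedate, '-15 days')
                        THEN 0 ELSE 1
                   END) OVER (PARTITION BY polno ORDER BY effectivedate) AS grp
        FROM (
            SELECT p.*,
                   LAG(effectivedate) OVER (
                       PARTITION BY polno ORDER BY effectivedate
                   ) AS prev_date
            FROM policies p
        ) p
    ) p
    GROUP BY polno, grp
    ORDER BY polno, MIN(effectivedate)
    """
).fetchall()

for row in groups:
    print(row)
```

Taking MAX(effectivedate) per group gives the latest entry of each cluster, which matches the 6/18/16 row in the question's expected result.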

How do I get a rolling view of counts?

The goal is to count everyone who fits a criterion in the three months before a specified date. The (BetweenDate - 3 months) part is the tricky bit. I am operating within a yearly window, so the look-back is not three months from getDate(); it needs to be three months back from each date Y within that window. Any ideas?
CREATE TABLE MONTH3LOOK AS Select
to_CHAR(DATE_OF_SERVICE_3013,'YYYY-MM') "Date"
,COUNT(DISTINCT case when (regexp_instr(IS_CONCAT,'(2957|29570|29571|29572|29573|29574|29575|29576|29577|29578|29579)')>0)
and
(DATE_OF_SERVICE_3013 between trunc(DATE_OF_SERVICE_3013,'MM') and add_months(trunc(DATE_OF_SERVICE_3013,'MM'),-3))
then USER end) AS Recip
FROM .NET_SERVICE
WHERE DATE_OF_SERVICE_3013 BETWEEN
TO_DATE('2013-10','YYYY-MM') AND
TO_DATE('2014-03','YYYY-MM')
group by to_CHAR(DATE_OF_SERVICE_3013,'YYYY-MM')
You will likely need to use analytic functions to get your counts and the distinct operator to simulate the group by since including the group by operator interferes with the operation of the analytic functions:
select distinct trunc(date_of_service_3013,'MM') "Date"
, count(case when regexp_like(IS_CONCAT, '(1234|5678|etc)') then user end)
over (order by trunc(date_of_service_3013, 'mm')
range between interval '3' month preceding
and current row) recip
from your_table
where DATE_OF_SERVICE_3013 BETWEEN TO_DATE('2013-10','YYYY-MM')
AND TO_DATE('2014-03','YYYY-MM');
Another way to take the effect of the group by operation into account is to use both analytic and aggregate functions:
select trunc(date_of_service_3013,'MM') "Date"
, sum(count(case when regexp_like(IS_CONCAT, '1234|5678|etc') then user end))
over (order by trunc(date_of_service_3013, 'mm')
range between interval '3' month preceding
and current row) recip
from your_table
where DATE_OF_SERVICE_3013 BETWEEN TO_DATE('2013-10','YYYY-MM')
AND TO_DATE('2014-03','YYYY-MM')
group by trunc(date_of_service_3013,'MM');
Here the aggregate count works on a month by month basis as per the group by clause, then it uses the analytic sum to add up those counts.
One thing to note about these two solutions: the where clause prevents any records prior to 2013-10 from being counted. If you want to include records prior to 2013-10 in the counts but only output 2013-10 to 2014-03, you'll need to do it in two stages, placing either of the two queries above inside a with t1 as (...) subquery factoring block with the starting date adjusted appropriately:
with t1 as (
select distinct trunc(date_of_service_3013,'MM') "Date"
, count(case when regexp_like(IS_CONCAT, '1234|5678|etc') then user end)
over (order by trunc(date_of_service_3013, 'mm')
range between interval '3' month preceding
and current row) recip
from your_table
where DATE_OF_SERVICE_3013 BETWEEN TO_DATE('2013-07','YYYY-MM')
AND TO_DATE('2014-03','YYYY-MM')
)
select * from t1 where "Date" >= TO_DATE('2013-10','YYYY-MM');
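
The two-stage idea (compute the rolling counts over a wider range, then filter the output) can be sketched with Python's sqlite3 module and an in-memory SQLite database. This is not Oracle syntax and rests on several assumptions: the table services, the columns user_id and code, and the sample rows are invented; LIKE '2957%' stands in for the regular expression; and because SQLite has no interval frames, the window ranges over a hypothetical integer month index (year * 12 + month), which needs SQLite 3.28 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE services (user_id TEXT, service_date TEXT, code TEXT)")
conn.executemany(
    "INSERT INTO services VALUES (?, ?, ?)",
    [
        ("u1", "2013-07-15", "29571"),
        ("u2", "2013-08-10", "2957"),
        ("u3", "2013-08-22", "29573"),
        ("u4", "2013-09-05", "1000"),   # does not match the code pattern
        ("u5", "2013-10-01", "29570"),
        ("u6", "2013-10-20", "2957"),
        ("u7", "2013-11-11", "29579"),
        ("u8", "2014-01-03", "29572"),  # no rows at all in 2013-12
    ],
)

rolling_counts = conn.execute(
    """
    WITH monthly AS (
        -- stage 1a: one row per month, starting three months before the report window
        SELECT strftime('%Y-%m', service_date) AS month_label,
               CAST(strftime('%Y', service_date) AS INTEGER) * 12
                   + CAST(strftime('%m', service_date) AS INTEGER) AS month_idx,
               COUNT(CASE WHEN code LIKE '2957%' THEN user_id END) AS matches
        FROM services
        WHERE service_date >= '2013-07-01' AND service_date < '2014-04-01'
        GROUP BY month_label, month_idx
    ),
    rolling AS (
        -- stage 1b: current month plus the three preceding calendar months
        SELECT month_label,
               SUM(matches) OVER (
                   ORDER BY month_idx
                   RANGE BETWEEN 3 PRECEDING AND CURRENT ROW
               ) AS recip
        FROM monthly
    )
    -- stage 2: only report the months of interest
    SELECT month_label, recip
    FROM rolling
    WHERE month_label >= '2013-10'
    ORDER BY month_label
    """
).fetchall()

for row in rolling_counts:
    print(row)
```

With the filter applied only at the end, the 2013-10 figure still includes the July to September rows. A RANGE frame is used rather than ROWS so that the missing month (2013-12) narrows the data in the window instead of pulling in an extra, older month.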