Issues with SQL window function - error that column must be aggregated or in GROUP BY

I have a table called "sales" with two columns, transaction_date and transaction_amount, containing rows such as: VALUES ('2020-01-16 00:05:54.000000', '122.02'), ('2020-01-07 20:53:04.000000', '1240.00')
I want to find the 3-day moving average for each day in January 2020. I am getting the error that transaction_amount must be included either in an aggregate function or in the GROUP BY. It does not make sense to group by it, as I only want one entry per day in the resulting table. In my code, I already have the amount in the aggregate function SUM, so I am not sure what else to try. Here is my query so far:
SELECT EXTRACT(DAY FROM transaction_time) AS Jan20_day,
SUM(transaction_amount),
SUM(transaction_amount) OVER(ORDER BY EXTRACT(DAY FROM transaction_time) ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS rolling_average
FROM sales
WHERE EXTRACT(MONTH FROM transaction_time)=1 AND EXTRACT(YEAR FROM transaction_time)=2020
GROUP BY EXTRACT(DAY FROM transaction_time)
Any insight into why I am getting the following error?
Query Error: error: column "transactions.transaction_amount" must appear in the GROUP BY clause or be used in an aggregate function

I would expect something like this:
SELECT EXTRACT(DAY FROM transaction_time) AS Jan20_day,
SUM(transaction_amount),
AVG(SUM(transaction_amount)) OVER (ORDER BY EXTRACT(DAY FROM transaction_time) ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS rolling_average
FROM sales
WHERE transaction_time >= DATE '2020-01-01' AND
transaction_time < DATE '2020-02-01'
GROUP BY EXTRACT(DAY FROM transaction_time);
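But the basic issue with your query is that the window function is evaluated after GROUP BY, when there is one row per day and transaction_amount itself is no longer available, only SUM(transaction_amount). So the window function has to wrap that aggregate: AVG(SUM(transaction_amount)) OVER (...) for a moving average, or SUM(SUM(transaction_amount)) OVER (...) for a moving total.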
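To see the evaluation order concretely, here is a small sketch that runs the nested form through Python's built-in sqlite3 module instead of PostgreSQL. The sample rows are made up, and strftime('%d', ...) stands in for EXTRACT(DAY FROM ...); SQLite needs version 3.25 or later for window functions.

```python
import sqlite3

# In-memory SQLite database; the rows below are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (transaction_time TEXT, transaction_amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [
        ("2020-01-01 09:00:00", 10.0),
        ("2020-01-01 17:30:00", 20.0),
        ("2020-01-02 12:00:00", 60.0),
        ("2020-01-03 08:15:00", 30.0),
        ("2020-01-04 19:45:00", 90.0),
    ],
)

# strftime('%d', ...) is SQLite's stand-in for EXTRACT(DAY FROM ...).
# SUM() collapses each day to one row; the outer AVG(...) OVER then runs on
# those per-day totals, averaging the current day and the two before it.
query = """
SELECT CAST(strftime('%d', transaction_time) AS INTEGER) AS jan20_day,
       SUM(transaction_amount) AS daily_total,
       AVG(SUM(transaction_amount)) OVER (
           ORDER BY CAST(strftime('%d', transaction_time) AS INTEGER)
           ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
       ) AS rolling_average
FROM sales
WHERE transaction_time >= '2020-01-01' AND transaction_time < '2020-02-01'
GROUP BY jan20_day
ORDER BY jan20_day
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Day 4, for example, averages the totals of days 2, 3 and 4: (60 + 30 + 90) / 3 = 60.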

Make sure the clauses are in the right order:
The GROUP BY clause is used with the SELECT statement.
In the query, the GROUP BY clause is placed after the WHERE clause.
In the query, the GROUP BY clause is placed before the ORDER BY clause, if there is one.
SELECT EXTRACT(DAY FROM transaction_time) AS Jan20_day, SUM(transaction_amount),
AVG(SUM(transaction_amount))
OVER(ORDER BY EXTRACT(DAY FROM transaction_time)
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS rolling_average
FROM sales
WHERE EXTRACT(MONTH FROM transaction_time)=1
AND EXTRACT(YEAR FROM transaction_time)=2020
GROUP BY EXTRACT(DAY FROM transaction_time)

Related

Column neither grouped nor aggregated after introducing window query

I'm having trouble integrating a simple window function into my query. I'm working with the avocado dataset from Kaggle. I started off with a simple query:
SELECT
date,
SUM(Total_Bags) as weekly_bags,
FROM
`course.avocado`
WHERE
EXTRACT(year FROM date) = 2015
GROUP BY
date
ORDER BY
date
And it works just fine. Next, I want to add a rolling sum to the query, displayed alongside the weekly sum. I tried the following:
SELECT
date,
SUM(Total_Bags) as weekly_bags,
SUM(Total_Bags) OVER(
PARTITION BY date
ORDER BY date
ROWS BETWEEN 4 PRECEDING AND CURRENT ROW
)
FROM
`course.avocado`
WHERE
EXTRACT(year FROM date) = 2015
GROUP BY
date
ORDER BY
date
but I'm getting the common error:
SELECT list expression references column Total_Bags which is neither grouped nor aggregated at [4:7]
and I'm confused. Total_Bags was aggregated in the first query, yet when it's introduced again in the second query, it's not aggregated anymore. How do I fix this query? Thanks.
Your query returns 2 columns: date and the aggregate SUM(Total_Bags). The window function SUM() is evaluated after the aggregation, when there is one row per date and no Total_Bags column anymore, and this is why you can't use it inside the window function.
However, you can do what you want without GROUP BY, by using only window functions and DISTINCT:
SELECT DISTINCT date,
SUM(Total_Bags) OVER(PARTITION BY date) AS weekly_bags,
SUM(Total_Bags) OVER(
-- 28 days back covers the current weekly date plus the 4 preceding ones
ORDER BY UNIX_DATE(date)
RANGE BETWEEN 28 PRECEDING AND CURRENT ROW
)
FROM course.avocado
WHERE EXTRACT(year FROM date) = 2015
ORDER BY date;
Or, use the window function on the aggregated result:
SELECT date,
SUM(Total_Bags) AS weekly_bags,
SUM(SUM(Total_Bags)) OVER(
ORDER BY date
ROWS BETWEEN 4 PRECEDING AND CURRENT ROW
)
FROM course.avocado
WHERE EXTRACT(year FROM date) = 2015
GROUP BY date
ORDER BY date;
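A minimal sketch of this second approach, using SQLite through Python's sqlite3 module as a stand-in for BigQuery. The table contents are invented, and strftime('%Y', date) replaces EXTRACT(year FROM date). It shows that after GROUP BY the window only sees one row per date, so it has to wrap the aggregate:

```python
import sqlite3

# Hypothetical stand-in for `course.avocado`: some dates have several rows
# (as with one row per region in the Kaggle data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE avocado (date TEXT, Total_Bags REAL)")
conn.executemany(
    "INSERT INTO avocado VALUES (?, ?)",
    [
        ("2015-01-04", 1.0), ("2015-01-04", 2.0),
        ("2015-01-11", 3.0), ("2015-01-11", 4.0),
        ("2015-01-18", 5.0),
        ("2015-01-25", 6.0),
        ("2015-02-01", 7.0),
        ("2015-02-08", 8.0),
    ],
)

# GROUP BY runs first, leaving one row per date with SUM(Total_Bags).
# The window function then sums those weekly totals over the current
# row and the 4 rows before it.
query = """
SELECT date,
       SUM(Total_Bags) AS weekly_bags,
       SUM(SUM(Total_Bags)) OVER (
           ORDER BY date
           ROWS BETWEEN 4 PRECEDING AND CURRENT ROW
       ) AS rolling_bags
FROM avocado
WHERE strftime('%Y', date) = '2015'
GROUP BY date
ORDER BY date
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```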
I tried to approach it from a different angle and it seems I have figured it out; the results look right. Here's the code:
WITH daily_bags AS
(SELECT
Date,
CAST(SUM(Total_Bags) as int64) as all_bags
FROM
`course.avocado`
WHERE
EXTRACT(year from Date) = 2015
GROUP BY
Date
ORDER BY
Date)
SELECT
Date,
all_bags,
SUM(all_bags) OVER(
ORDER BY Date
ROWS BETWEEN 4 PRECEDING AND CURRENT ROW
) as rolling_sum
FROM
daily_bags
ORDER BY
Date
Thanks everyone for your help.
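One caveat with this pattern: without an ORDER BY inside OVER(), the order of the rows in the frame is not guaranteed, even if the CTE itself is sorted. Below is a sketch of the same two-step pattern with the ordering placed inside the window, again run in SQLite via Python with hypothetical data:

```python
import sqlite3

# Hypothetical sample data standing in for `course.avocado`.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE avocado (Date TEXT, Total_Bags REAL)")
conn.executemany(
    "INSERT INTO avocado VALUES (?, ?)",
    [("2015-01-04", 1.5), ("2015-01-04", 2.5), ("2015-01-11", 4.0),
     ("2015-01-18", 5.0), ("2015-01-25", 6.0), ("2015-02-01", 7.0),
     ("2015-02-08", 8.0)],
)

# Step 1 (CTE): aggregate to one row per date.
# Step 2: run the window over the CTE. The ORDER BY inside OVER() decides
# which rows count as "preceding"; an ORDER BY inside the CTE does not.
cte_query = """
WITH daily_bags AS (
    SELECT Date, CAST(SUM(Total_Bags) AS INTEGER) AS all_bags
    FROM avocado
    WHERE strftime('%Y', Date) = '2015'
    GROUP BY Date
)
SELECT Date,
       all_bags,
       SUM(all_bags) OVER (
           ORDER BY Date
           ROWS BETWEEN 4 PRECEDING AND CURRENT ROW
       ) AS rolling_sum
FROM daily_bags
ORDER BY Date
"""
cte_rows = conn.execute(cte_query).fetchall()
for row in cte_rows:
    print(row)
```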

BigQuery error code: Window ORDER BY expression references column start_date which is neither grouped nor aggregated at

I am using BigQuery and I can't figure out why I get this error message:
Window ORDER BY expression references column start_date which is neither grouped nor aggregated at [4:73]
Here is my code:
SELECT EXTRACT(WEEK FROM start_date) as week, count(start_date) as count,
RANK() OVER (PARTITION BY start_station_name ORDER BY EXTRACT(WEEK FROM start_date))
from `bigquery-public-data.london_bicycles.cycle_hire`
GROUP BY EXTRACT(WEEK FROM start_date), start_station_name
I thought I had grouped by the week, as seen in the last line. So what is causing this error message?
This is a parsing error in BigQuery, which you can work around with an aggregation function. Your query has another issue: it groups by start_station_name but does not select it, so you can't tell which station each output row belongs to.
SELECT EXTRACT(WEEK FROM start_date) as week, start_station_name, count(start_date) as count,
RANK() OVER (PARTITION BY start_station_name ORDER BY MIN(EXTRACT(WEEK FROM start_date)))
from `bigquery-public-data.london_bicycles.cycle_hire`
GROUP BY 1, 2;
The MIN() really serves no purpose other than letting BigQuery parse the query. Because the expression is part of the GROUP BY, there is only one value for the MIN() to consider.
This is a bug in the BigQuery parsing, because it does not recognize that the expression is the same as the expression in the GROUP BY. Happily, it is easy to work around.
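The effect of the MIN() workaround can be checked in a small SQLite sketch run from Python. The rows are hypothetical, and SQLite's Monday-based strftime('%W', ...) replaces BigQuery's EXTRACT(WEEK FROM ...), so week numbers may differ from BigQuery's. Since each group holds a single week value, MIN() just returns it and the ranking is by week:

```python
import sqlite3

# Hypothetical miniature of the cycle_hire table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cycle_hire (start_date TEXT, start_station_name TEXT)")
conn.executemany(
    "INSERT INTO cycle_hire VALUES (?, ?)",
    [
        ("2017-01-02", "A"), ("2017-01-03", "A"),
        ("2017-01-10", "A"),
        ("2017-01-03", "B"),
        ("2017-01-17", "B"), ("2017-01-18", "B"),
    ],
)

# strftime('%W', ...) stands in for EXTRACT(WEEK FROM ...). The week
# expression is in GROUP BY, so MIN() over it sees exactly one value per group.
query = """
SELECT CAST(strftime('%W', start_date) AS INTEGER) AS week,
       start_station_name,
       COUNT(start_date) AS hires,
       RANK() OVER (
           PARTITION BY start_station_name
           ORDER BY MIN(CAST(strftime('%W', start_date) AS INTEGER))
       ) AS week_rank
FROM cycle_hire
GROUP BY 1, 2
ORDER BY start_station_name, week
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```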
Try it using a CTE, like below:
with cte as
(
SELECT *, EXTRACT(WEEK FROM start_date) as week
from `bigquery-public-data.london_bicycles.cycle_hire`
)
select week, count(start_date) as count,
RANK() OVER (PARTITION BY start_station_name ORDER BY week)
from cte
group by week, start_station_name
In the query, you need to make sure that the window's ORDER BY only uses values you are selecting (and grouping by).
The problem with your query is ORDER BY EXTRACT(WEEK FROM start_date). Rather than doing this, compute week in the CTE and write ORDER BY week, because you are then already selecting week as a column.
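The CTE variant can be sketched the same way in SQLite via Python, with made-up rows and strftime('%W', ...) in place of EXTRACT(WEEK FROM ...). Because week is a real column of the CTE, the window can order by it directly; this sketch also selects start_station_name so each rank can be attributed to a station:

```python
import sqlite3

# Hypothetical miniature of the cycle_hire table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cycle_hire (start_date TEXT, start_station_name TEXT)")
conn.executemany(
    "INSERT INTO cycle_hire VALUES (?, ?)",
    [("2017-01-02", "Soho"), ("2017-01-09", "Soho"), ("2017-01-10", "Soho"),
     ("2017-01-16", "Soho"), ("2017-01-03", "Waterloo")],
)

# The CTE materialises week as an ordinary column, so the outer query can
# group by it and use it in the window's ORDER BY without re-extracting it.
query = """
WITH cte AS (
    SELECT *, CAST(strftime('%W', start_date) AS INTEGER) AS week
    FROM cycle_hire
)
SELECT week,
       start_station_name,
       COUNT(start_date) AS hires,
       RANK() OVER (PARTITION BY start_station_name ORDER BY week) AS week_rank
FROM cte
GROUP BY week, start_station_name
ORDER BY start_station_name, week
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```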

SQL - Rolling avg over truncated date

I want to compute a weekly rolling mean of a calculated field from data whose timestamps have second precision. This is why I first truncate the date to the week.
So my provisional query is:
SELECT week, AVG(my_value) OVER(ORDER BY week ASC ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS avg_my_value
FROM
(SELECT id,
DATE_TRUNC('week', created_at) AS week,
my_value
FROM my_table
ORDER BY week ASC
)
GROUP BY week
The problem is that the AVG works, but it's computed separately for all rows that have the same week! I think some sort of inner grouping needs to be added, but I can't work out how to do that for an average.
If it matters, I am looking for a solution that works for Redshift or PostgreSQL.
If you want a cumulative average, then:
SELECT week,
AVG(AVG(my_value)) OVER (ORDER BY week ASC) AS avg_my_value
FROM (SELECT id, DATE_TRUNC('week', created_at) AS week, my_value
FROM my_table
) t
GROUP BY week;
Notes:
The ORDER BY in the subquery is superfluous.
Note the nesting of the aggregation functions.
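A sketch of the nested aggregation in SQLite via Python, with hypothetical rows. Here date(created_at, 'weekday 0', '-6 days') stands in for DATE_TRUNC('week', created_at) (both give the Monday that starts the week). Note that AVG(AVG(...)) averages the weekly averages, so every week weighs the same regardless of how many rows it has; the second column shows a row-weighted alternative, SUM(SUM(...)) / SUM(COUNT(...)), in case that is what's wanted:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER, created_at TEXT, my_value REAL)")
# Hypothetical rows: two in the week of 2021-03-01, one in each of the next two weeks.
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?)",
    [
        (1, "2021-03-01 08:00:00", 2.0),
        (2, "2021-03-03 12:30:00", 4.0),
        (3, "2021-03-09 09:15:00", 9.0),
        (4, "2021-03-16 18:45:00", 3.0),
    ],
)

# The inner AVG/SUM/COUNT aggregate per week (GROUP BY); the outer window
# functions then accumulate those weekly values from the first week onward.
query = """
SELECT week,
       AVG(AVG(my_value)) OVER (ORDER BY week) AS avg_of_weekly_avgs,
       SUM(SUM(my_value)) OVER (ORDER BY week)
         / SUM(COUNT(my_value)) OVER (ORDER BY week) AS row_weighted_avg
FROM (
    SELECT id, date(created_at, 'weekday 0', '-6 days') AS week, my_value
    FROM my_table
) t
GROUP BY week
ORDER BY week
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

In the second week the two columns already differ: the weekly averages are 3 and 9 (mean 6), while the three rows so far average (2 + 4 + 9) / 3 = 5.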

Create new table with number of incidents per month

What's up, mates? I have just started learning SQL and I'm confused here. I have to create a table with the number of incidents per month.
I already know how to create a table, but what about the rest?
SELECT
EXTRACT(month FROM dateofcall) AS x,
incidentnumber,
dateofcall
FROM
incidents
GROUP BY
incidentnumber,
x
ORDER BY
x ASC;
But it's not giving me the number of incidents per month. =(
It looks like you are grouping by too many items in your GROUP BY clause, and you are not COUNTing your incidents, just showing their details.
Try this:
SELECT EXTRACT(month FROM dateofcall) AS x,
COUNT(*) AS incidents
FROM
incidents
GROUP BY
EXTRACT(month FROM dateofcall)
ORDER BY
EXTRACT(month FROM dateofcall);
SELECT
EXTRACT(month FROM dateofcall) AS theMonth,
COUNT(*) AS theNumberOfIncidents
FROM
incidents
GROUP BY
EXTRACT(month FROM dateofcall)
ORDER BY
theMonth
Your original query wasn't counting anything. You were also grouping by incidentNumber, which I assume is your primary key; grouping by a primary key is pointless because every group then contains exactly one row.
In standard SQL, and in engines such as SQL Server, you cannot use a column alias in the GROUP BY clause, which is why the EXTRACT(month FROM dateofcall) expression is repeated. Some engines, such as PostgreSQL and MySQL, do accept the alias there.
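A small SQLite sketch via Python contrasts the two groupings. The incidents are hypothetical, and strftime('%m', ...) stands in for EXTRACT(month FROM ...):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE incidents (incidentnumber INTEGER PRIMARY KEY, dateofcall TEXT)")
conn.executemany(
    "INSERT INTO incidents VALUES (?, ?)",
    [(101, "2019-01-05"), (102, "2019-01-20"), (103, "2019-02-11"),
     (104, "2019-03-02"), (105, "2019-03-15"), (106, "2019-03-30")],
)

# Grouping by the primary key: one group per incident, so every count is 1.
per_incident = conn.execute(
    "SELECT incidentnumber, COUNT(*) FROM incidents "
    "GROUP BY incidentnumber ORDER BY incidentnumber"
).fetchall()

# Grouping by the month only: one row per month with its incident count.
# SQLite accepts the alias in GROUP BY; in SQL Server, repeat the expression.
per_month = conn.execute(
    """
    SELECT CAST(strftime('%m', dateofcall) AS INTEGER) AS theMonth,
           COUNT(*) AS theNumberOfIncidents
    FROM incidents
    GROUP BY theMonth
    ORDER BY theMonth
    """
).fetchall()

print(per_incident)
print(per_month)
```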

How do I get a rolling view of counts?

The goal is to count anyone who fits a criterion within the three months before a specified date. The (BetweenDate - 3 months) part is the tricky part. I am operating within a yearly window, not 3 months back from getDate(); I need it to be three months back from within -3 months of Y. Any ideas?
CREATE TABLE MONTH3LOOK AS Select
to_CHAR(DATE_OF_SERVICE_3013,'YYYY-MM') "Date"
,COUNT(DISTINCT case when (regexp_instr(IS_CONCAT,'(2957|29570|29571|29572|29573|29574|29575|29576|29577|29578|29579)')>0)
and
(DATE_OF_SERVICE_3013 between trunc(DATE_OF_SERVICE_3013,'MM') and add_months(trunc(DATE_OF_SERVICE_3013,'MM'),-3))
then USER end) AS Recip
FROM .NET_SERVICE
WHERE DATE_OF_SERVICE_3013 BETWEEN
TO_DATE('2013-10','YYYY-MM') AND
TO_DATE('2014-03','YYYY-MM')
group by to_CHAR(DATE_OF_SERVICE_3013,'YYYY-MM')
You will likely need to use analytic functions to get your counts, and the DISTINCT operator to simulate the GROUP BY, since a GROUP BY collapses the rows before the analytic functions run and so changes what they operate on:
select distinct trunc(date_of_service_3013,'MM') "Date"
, count(case when regexp_like(IS_CONCAT, '(1234|5678|etc)') then user end)
over (order by trunc(date_of_service_3013, 'mm')
range between interval '3' month preceding
and current row) recip
from your_table
where DATE_OF_SERVICE_3013 >= TO_DATE('2013-10','YYYY-MM')
AND DATE_OF_SERVICE_3013 < TO_DATE('2014-04','YYYY-MM');
Another way to take the effect of the GROUP BY operation into account is to use both analytic and aggregate functions:
select trunc(date_of_service_3013,'MM') "Date"
, sum(count(case when regexp_like(IS_CONCAT, '1234|5678|etc') then user end))
over (order by trunc(date_of_service_3013, 'MM')
range between interval '3' month preceding
and current row) recip
from your_table
where DATE_OF_SERVICE_3013 >= TO_DATE('2013-10','YYYY-MM')
AND DATE_OF_SERVICE_3013 < TO_DATE('2014-04','YYYY-MM')
group by trunc(date_of_service_3013,'MM');
Here the aggregate count works on a month by month basis as per the group by clause, then it uses the analytic sum to add up those counts.
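Both Oracle queries above can be mirrored in SQLite via Python to check that they agree. This is a sketch with invented data: LIKE '%2957%' stands in for REGEXP_LIKE, and because SQLite has no month intervals, months are mapped to consecutive integers (year * 12 + month) so that RANGE BETWEEN 3 PRECEDING behaves like INTERVAL '3' MONTH PRECEDING, covering the current month and the three before it. RANGE frames with a numeric offset need SQLite 3.28 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE service (date_of_service TEXT, is_concat TEXT, user_id TEXT)")
conn.executemany(
    "INSERT INTO service VALUES (?, ?, ?)",
    [
        ("2013-10-03", "x2957x", "u1"),
        ("2013-11-12", "x2957x", "u2"),
        ("2013-11-20", "other", "u3"),   # does not match, so it is not counted
        ("2013-12-01", "x2957x", "u4"),
        ("2014-01-15", "x2957x", "u5"),
        ("2014-02-10", "x2957x", "u6"),
        ("2014-03-05", "x2957x", "u7"),
    ],
)

# month_idx turns months into consecutive integers for the RANGE frame.
# The half-open date range keeps every day of March 2014.
base = """
SELECT strftime('%Y-%m', date_of_service) AS month_label,
       CAST(strftime('%Y', date_of_service) AS INTEGER) * 12
         + CAST(strftime('%m', date_of_service) AS INTEGER) AS month_idx,
       is_concat, user_id
FROM service
WHERE date_of_service >= '2013-10-01' AND date_of_service < '2014-04-01'
"""

# Approach 1: analytic COUNT over the raw rows; rows of the same month are
# RANGE peers and get the same value, so DISTINCT collapses them.
distinct_rows = conn.execute(f"""
SELECT DISTINCT month_label,
       COUNT(CASE WHEN is_concat LIKE '%2957%' THEN user_id END)
         OVER (ORDER BY month_idx RANGE BETWEEN 3 PRECEDING AND CURRENT ROW) AS recip
FROM ({base})
ORDER BY month_label
""").fetchall()

# Approach 2: COUNT per month via GROUP BY, then an analytic SUM of those counts.
grouped_rows = conn.execute(f"""
SELECT month_label,
       SUM(COUNT(CASE WHEN is_concat LIKE '%2957%' THEN user_id END))
         OVER (ORDER BY month_idx RANGE BETWEEN 3 PRECEDING AND CURRENT ROW) AS recip
FROM ({base})
GROUP BY month_idx, month_label
ORDER BY month_label
""").fetchall()

print(distinct_rows)
print(grouped_rows)
```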
One thing about these two solutions: the WHERE clause prevents any records prior to 2013-10 from being counted. If you want to include records prior to 2013-10 in the counts but only output 2013-10 to 2014-03, you'll need to do it in two stages: put either of the two queries above inside a WITH t1 AS (...) subquery factoring block, with the starting date moved back appropriately, and filter the output afterwards:
with t1 as (
select distinct trunc(date_of_service_3013,'MM') "Date"
, count(case when regexp_like(IS_CONCAT, '1234|5678|etc') then user end)
over (order by trunc(date_of_service_3013, 'mm')
range between interval '3' month preceding
and current row) recip
from your_table
where DATE_OF_SERVICE_3013 >= TO_DATE('2013-07','YYYY-MM')
AND DATE_OF_SERVICE_3013 < TO_DATE('2014-04','YYYY-MM')
)
select * from t1 where "Date" >= TO_DATE('2013-10','YYYY-MM');
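A sketch of the two-stage idea in SQLite via Python, with hypothetical data that includes one matching record in 2013-08. As in the previous translations, LIKE stands in for REGEXP_LIKE and months are mapped to integers for the RANGE frame. The date from which rows are fetched is a parameter, while the output is always filtered to months from 2013-10 on; fetching from 2013-07 lets the August record contribute to the October and November counts:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE service (date_of_service TEXT, is_concat TEXT, user_id TEXT)")
conn.executemany(
    "INSERT INTO service VALUES (?, ?, ?)",
    [
        ("2013-08-20", "x2957x", "u0"),  # before the reporting period
        ("2013-10-03", "x2957x", "u1"),
        ("2013-11-12", "x2957x", "u2"),
        ("2014-01-15", "x2957x", "u5"),
    ],
)

# Stage 1 (monthly, t1): fetch from a given date and compute rolling counts.
# Stage 2 (outer SELECT): keep only the reporting months.
QUERY = """
WITH monthly AS (
    SELECT strftime('%Y-%m', date_of_service) AS month_label,
           CAST(strftime('%Y', date_of_service) AS INTEGER) * 12
             + CAST(strftime('%m', date_of_service) AS INTEGER) AS month_idx,
           is_concat, user_id
    FROM service
    WHERE date_of_service >= ? AND date_of_service < '2014-04-01'
),
t1 AS (
    SELECT month_label,
           SUM(COUNT(CASE WHEN is_concat LIKE '%2957%' THEN user_id END))
             OVER (ORDER BY month_idx RANGE BETWEEN 3 PRECEDING AND CURRENT ROW) AS recip
    FROM monthly
    GROUP BY month_idx, month_label
)
SELECT month_label, recip FROM t1 WHERE month_label >= '2013-10' ORDER BY month_label
"""


def rolling_counts(fetch_from):
    """Rolling 4-month counts, fetching rows from fetch_from but reporting from 2013-10."""
    return conn.execute(QUERY, (fetch_from,)).fetchall()


one_stage = rolling_counts("2013-10-01")   # same start for fetching and reporting
two_stage = rolling_counts("2013-07-01")   # fetch earlier, report from 2013-10
print(one_stage)
print(two_stage)
```

December is missing from the data, and because the frame is RANGE-based on the month number rather than ROWS-based, the January count still only looks back to October.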