BigQuery error code: Window ORDER BY expression references column start_date which is neither grouped nor aggregated - sql

I am using BigQuery for SQL and I can't figure out why there is an error message that comes like this:
Window ORDER BY expression references column start_date which is neither grouped nor aggregated at [4:73]
Here is my code:
SELECT EXTRACT(WEEK FROM start_date) as week, count(start_date) as count,
RANK() OVER (PARTITION BY start_station_name ORDER BY EXTRACT(WEEK FROM start_date))
from `bigquery-public-data.london_bicycles.cycle_hire`
GROUP BY EXTRACT(WEEK FROM start_date), start_station_name
I thought I had grouped by week, as seen in the last line. So what could be causing this error message to keep popping up?

This is a parsing error in BigQuery, which you can work around with an aggregation function. Your query also has another issue: start_station_name is in the GROUP BY but not in the SELECT list.
SELECT EXTRACT(WEEK FROM start_date) as week, start_station_name, count(start_date) as count,
RANK() OVER (PARTITION BY start_station_name ORDER BY MIN(EXTRACT(WEEK FROM start_date)))
from `bigquery-public-data.london_bicycles.cycle_hire`
GROUP BY 1, 2;
The MIN() really serves no purpose other than letting BigQuery parse the query. Because the expression is part of the GROUP BY, there is only one value for the MIN() to consider.
This is a bug in BigQuery's parsing: it does not recognize that the expression is the same as the expression in the GROUP BY. Happily, it is easy to work around.
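Any aggregate that just passes the single grouped value through works the same way. For instance, a sketch assuming ANY_VALUE() is accepted in this position just like MIN():
SELECT EXTRACT(WEEK FROM start_date) as week, start_station_name, count(start_date) as count,
-- ANY_VALUE() sees exactly one week per (week, station) group, so it only exists to satisfy the parser
RANK() OVER (PARTITION BY start_station_name ORDER BY ANY_VALUE(EXTRACT(WEEK FROM start_date)))
from `bigquery-public-data.london_bicycles.cycle_hire`
GROUP BY 1, 2;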

Try like below, using a CTE:
with cte as (
SELECT *, EXTRACT(WEEK FROM start_date) as week
from `bigquery-public-data.london_bicycles.cycle_hire`
)
select week, count(start_date) as count,
RANK() OVER (PARTITION BY start_station_name ORDER BY week)
from cte
group by week, start_station_name

In the query, you need to make sure that you put ORDER BY only on values that you are selecting.
With your query, the problem is that you are doing ORDER BY EXTRACT(WEEK FROM start_date). Rather than doing this, you should write ORDER BY week, because you are already selecting week.

Related

Column neither grouped nor aggregated after introducing window query

I'm having trouble integrating a simple window function into my query. I'm working with this avocado dataset from Kaggle. I started off with a simple query:
SELECT
date,
SUM(Total_Bags) as weekly_bags,
FROM
`course.avocado`
WHERE
EXTRACT(year FROM date) = 2015
GROUP BY
date
ORDER BY
date
And it works just fine. Next, I want to add a rolling sum to the query, to display alongside the weekly sum. I tried the following:
SELECT
date,
SUM(Total_Bags) as weekly_bags,
SUM(Total_Bags) OVER(
PARTITION BY date
ORDER BY date
ROWS BETWEEN 4 PRECEDING AND CURRENT ROW
)
FROM
`course.avocado`
WHERE
EXTRACT(year FROM date) = 2015
GROUP BY
date
ORDER BY
date
but I'm getting the common error:
SELECT list expression references column Total_Bags which is neither grouped nor aggregated at [4:7]
and I'm confused. Total_Bags in the first query was aggregated, yet when it's introduced again in the second query, it's not aggregated anymore. How do I fix this query? Thanks.
In your query, which returns two columns (date and the aggregate SUM(Total_Bags)), the window SUM() is evaluated after the aggregation, when there is no longer a column Total_Bags, and this is why you can't use it inside the window function.
However, you can do what you want without GROUP BY, by using only window functions and DISTINCT:
SELECT DISTINCT date,
SUM(Total_Bags) OVER(PARTITION BY date) AS weekly_bags,
SUM(Total_Bags) OVER(
PARTITION BY date
ORDER BY date
ROWS BETWEEN 4 PRECEDING AND CURRENT ROW
)
FROM course.avocado
WHERE EXTRACT(year FROM date) = 2015
ORDER BY date;
Or, use the window function on the aggregated result:
SELECT date,
SUM(Total_Bags) AS weekly_bags,
SUM(SUM(Total_Bags)) OVER(
ORDER BY date
ROWS BETWEEN 4 PRECEDING AND CURRENT ROW
)
FROM course.avocado
WHERE EXTRACT(year FROM date) = 2015
GROUP BY date
ORDER BY date;
I tried to approach it from a different angle and it seems I have figured it out; the results look right. Here's the code:
WITH daily_bags AS
(SELECT
Date,
CAST(SUM(Total_Bags) as int64) as all_bags
FROM
`course.avocado`
WHERE
EXTRACT(year from Date) = 2015
GROUP BY
Date
ORDER BY
Date)
SELECT
Date,
all_bags,
SUM(all_bags) OVER(
ORDER BY Date
ROWS BETWEEN 4 PRECEDING AND CURRENT ROW
) as rolling_sum
FROM
daily_bags
Thanks everyone for your help.

Issues with SQL window function - error that column must be aggregated or in group by

I have a table called "sales" with two columns, transaction_date and transaction_amount, with values like ('2020-01-16 00:05:54.000000', '122.02'), ('2020-01-07 20:53:04.000000', '1240.00').
I want to find the 3-day moving average for each day in January 2020. I am getting an error that transaction_amount must be included in either an aggregate function or in the GROUP BY. It does not make sense to group by it, as I only want one entry per day in the resulting table. In my code, I already have the amount in the aggregate function SUM, so I am not sure what else to try. Here is my query so far:
SELECT EXTRACT(DAY FROM transaction_time) AS Jan20_day,
SUM(transaction_amount),
SUM(transaction_amount) OVER(ORDER BY EXTRACT(DAY FROM transaction_time)
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS rolling_average
FROM sales
WHERE EXTRACT(MONTH FROM transaction_time)=1
AND EXTRACT(YEAR FROM transaction_time)=2020
GROUP BY EXTRACT(DAY FROM transaction_time)
Any insight on why I am getting the following error?
Query Error: error: column "transactions.transaction_amount" must appear in the GROUP BY clause or be used in an aggregate function
I would expect something like this:
SELECT EXTRACT(DAY FROM transaction_time) AS Jan20_day,
SUM(transaction_amount),
SUM(SUM(transaction_amount)) OVER (ORDER BY EXTRACT(DAY FROM transaction_time) ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS rolling_average
FROM sales
WHERE transaction_time >= DATE '2020-01-01' AND
transaction_time < DATE '2020-02-01'
GROUP BY EXTRACT(DAY FROM transaction_time);
But the basic issue with your query is that you need to apply the window function to SUM(), so SUM(SUM(transaction_amount)) . . . .
The GROUP BY clause needs to go after the WHERE clause.
The GROUP BY clause is used with the SELECT statement. In the query, the GROUP BY clause is placed after the WHERE clause, and before the ORDER BY clause if one is used.
SELECT EXTRACT(DAY FROM transaction_time) AS Jan20_day, SUM(transaction_amount),
SUM(transaction_amount)
OVER(ORDER BY EXTRACT(DAY FROM transaction_time)
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS rolling_average
FROM sales
WHERE EXTRACT(MONTH FROM transaction_time)=1
AND EXTRACT(YEAR FROM transaction_time)=2020
GROUP BY EXTRACT(DAY FROM transaction_time)

SQL - Rolling avg over truncated date

I want to compute a rolling mean of a calculated field on a weekly basis, from data whose timestamps are precise to the second. This is why I first truncate the date to the week.
So my provisional query is:
SELECT week, AVG(my_value) OVER(ORDER BY week ASC ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS avg_my_value
FROM
(SELECT id,
DATE_TRUNC('week', created_at) AS week,
my_value
FROM my_table
ORDER BY week ASC
)
GROUP BY week
The problem I have is that the AVG works, but it's computed separately for each set of rows that share the same week! I think some sort of inner grouping must be added, but I can't work out how to formulate it for the case of an average.
If it matters, I am looking for a solution that works on Redshift or PostgreSQL.
If you want a cumulative average, then:
SELECT week,
AVG(AVG(my_value)) OVER (ORDER BY week ASC) AS avg_my_value
FROM (SELECT id, DATE_TRUNC('week', created_at) AS week, my_value
FROM my_table
) t
GROUP BY week;
Notes:
The ORDER BY in the subquery is superfluous.
Note the nesting of the aggregation functions.
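If a bounded rolling window is wanted rather than a cumulative one, the same pattern works with a frame clause added; a sketch, assuming a 4-week window (the current week plus the three preceding weeks):
SELECT week,
AVG(AVG(my_value)) OVER (ORDER BY week ASC
ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) AS avg_my_value
FROM (SELECT DATE_TRUNC('week', created_at) AS week, my_value
FROM my_table
) t
GROUP BY week;
Like the cumulative version, this averages the weekly averages rather than the underlying rows.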

Cannot group by timestamp_trunc

I have this query in standard SQL.
select timestamp_trunc(endTime, MONTH), count(1)
from `simple_table`
group by timestamp_trunc(endTime, MONTH);
Which returns the following error:
SELECT list expression references column endTime which is neither grouped nor aggregated at [1:24]
However, the following code:
select timestamp_trunc(endTime, MONTH)
from `simple_table`
limit 10
Works perfectly. Is there some hidden rule about BigQuery's GROUP BY behavior that I am missing?
Just do as below:
select timestamp_trunc(endTime, MONTH), count(1)
from `simple_table`
group by 1
or
select timestamp_trunc(endTime, MONTH) as m, count(1)
from `simple_table`
group by m
I think what happens here is not a problem with using functions/expressions in GROUP BY, but rather that the engine does not recognize that the expression for the field in the SELECT list and the expression in the GROUP BY are the same. They are treated as different, so the engine thinks the endTime field is an "orphan" (neither aggregated nor grouped by).
For example, the query below will work (of course it is not what you need, but it proves that GROUP BY accepts expressions):
select count(1)
from `simple_table`
group by timestamp_trunc(endTime, MONTH)

error: column "month" does not exist in PG query

My PG query:
SELECT "Tracks"."PageId",
date_trunc("month", "Tracks"."createdAt") AS month,
count(*)
FROM "Tracks"
WHERE "Tracks"."PageId" IN (29,30,31)
GROUP BY month, "Tracks"."PageId"
and my schema:
id, createdAt, updatedAt, PageId
A bit confused as to why I'm receiving this error!
You can't use an alias in the where or group by clause like this. You need to repeat the expression:
SELECT "Tracks"."PageId",
date_trunc('month', "Tracks"."createdAt") AS month,
count(*)
FROM "Tracks"
WHERE "Tracks"."PageId" IN (29,30,31)
GROUP BY date_trunc('month', "Tracks"."createdAt"), "Tracks"."PageId";
Note that the first parameter for date_trunc() is a varchar value, so you need to put that in single quotes, not double quotes.
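A minimal illustration of the difference (the now() call is just a stand-in for any timestamp expression): double quotes denote an identifier and single quotes a string literal, so "month" is parsed as a column reference.
select date_trunc("month", now());  -- ERROR: column "month" does not exist
select date_trunc('month', now());  -- works: truncates to the start of the current month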
If you don't want to repeat the expression, you can put it into a derived table:
select "PageId", month, count(*)
from (
SELECT "Tracks"."PageId",
date_trunc('month', "Tracks"."createdAt") AS month
FROM "Tracks"
WHERE "Tracks"."PageId" IN (29,30,31)
) t
group by month, "PageId";
Unrelated, but: you should really avoid quoted identifiers. They are much more trouble than they are worth.
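To illustrate why, here is a hypothetical table definition (not the real schema): once an identifier is created with double quotes, it must be referenced with the same quoting and exact casing from then on.
create table "Tracks" ("PageId" int, "createdAt" timestamp);
select pageid from "Tracks";    -- ERROR: column "pageid" does not exist (unquoted names fold to lower case)
select "PageId" from "Tracks";  -- works, but the quotes and exact casing are now mandatory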