Can LAG be used with HAVING?

I distinctly recall that T-SQL will never let you mix LAG and WHERE. For example,
SELECT FOO
WHERE LAG(BAR) OVER (ORDER BY DATE) > 7
will never work. T-SQL will not run it no matter what you do. But does T-SQL ever let you mix LAG with HAVING?
Note: an answer only needs to do one of two things: give a theoretical or documentation-based reason why it does not, or give any example at all where it does.

From Logical Processing Order of the SELECT statement:
The following steps show the logical processing order, or binding order, for a SELECT statement...
1. FROM
2. ON
3. JOIN
4. WHERE
5. GROUP BY
6. WITH CUBE or WITH ROLLUP
7. HAVING
8. SELECT
9. DISTINCT
10. ORDER BY
11. TOP
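Window functions are evaluated in the SELECT step, which comes after HAVING, so the answer is no: you can't use window functions in the HAVING clause. SQL Server rejects such a query with the error "Windowed functions can only appear in the SELECT or ORDER BY clauses."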
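To see the rule in a database you can run locally, the sketch below uses Python's built-in sqlite3 module. It assumes SQLite 3.25 or newer (the first version with window functions). SQLite is not T-SQL, but it follows the same rule and only accepts window functions in the select list and ORDER BY. The table t and its columns foo, bar, and dt are hypothetical stand-ins for the question's FOO, BAR, and DATE.

```python
import sqlite3


def try_query(conn, sql):
    """Return the query's rows, or an 'error: ...' string if SQLite rejects it."""
    try:
        return conn.execute(sql).fetchall()
    except sqlite3.Error as exc:
        return f"error: {exc}"


conn = sqlite3.connect(":memory:")
# Hypothetical table standing in for the question's FOO, BAR and DATE columns.
conn.execute("CREATE TABLE t (foo TEXT, bar INTEGER, dt TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [("a", 5, "2024-01-01"), ("b", 9, "2024-01-02"), ("c", 3, "2024-01-03")],
)

# LAG in WHERE: rejected, because WHERE is processed before window functions.
where_sql = "SELECT foo FROM t WHERE LAG(bar) OVER (ORDER BY dt) > 7"
# LAG in HAVING: also rejected, because HAVING is processed before SELECT too.
having_sql = (
    "SELECT foo, MAX(bar) FROM t GROUP BY foo "
    "HAVING LAG(MAX(bar)) OVER (ORDER BY foo) > 7"
)
# LAG in the select list: accepted.
select_sql = "SELECT foo, LAG(bar) OVER (ORDER BY dt) AS prev_bar FROM t ORDER BY dt"

for sql in (where_sql, having_sql, select_sql):
    print(try_query(conn, sql))
```

Only the last query returns rows; the error text printed for the other two is SQLite's own wording, not SQL Server's.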

A HAVING clause filters groups, so it is normally paired with GROUP BY (without GROUP BY, the whole result is treated as a single group). Its conditions can only refer to grouping columns, aggregate functions such as MIN, MAX, SUM, and COUNT, and constants. LAG is an analytic (window) function, not an aggregate, and it is evaluated after HAVING, so it is not possible to use it in a HAVING clause.
To filter on a LAG result, compute LAG in a CTE or subquery and apply the condition in the outer query.
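A minimal sketch of the CTE approach, again using Python's sqlite3 module (SQLite 3.25+) rather than SQL Server. The sales table, its columns, and the threshold of 7 are hypothetical. The CTE does the grouping and the LAG, and the outer query filters on the LAG result with an ordinary WHERE:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sales table with several rows per day.
conn.execute("CREATE TABLE sales (dt TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [
        ("2024-01-01", 4), ("2024-01-01", 6),  # day total 10
        ("2024-01-02", 2),                     # day total 2
        ("2024-01-03", 1), ("2024-01-03", 7),  # day total 8
    ],
)

# GROUP BY and LAG are evaluated inside the CTE; the outer query can then
# filter on the LAG result with a plain WHERE, which HAVING could not do.
sql = """
WITH daily AS (
    SELECT dt,
           SUM(amount) AS total,
           LAG(SUM(amount)) OVER (ORDER BY dt) AS prev_total
    FROM sales
    GROUP BY dt
)
SELECT dt, total, prev_total
FROM daily
WHERE prev_total > 7
ORDER BY dt
"""
rows = conn.execute(sql).fetchall()
print(rows)  # [('2024-01-02', 2, 10)]
```

LAG(SUM(amount)) is legal inside the CTE because window functions run after GROUP BY, which is the same ordering that rules them out of HAVING. The WITH ... AS (...) form is also valid T-SQL, so the same shape should carry over.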

Related

Still confused by the rules around selecting columns, GROUP BY, and joins

I am still confused by the syntax rules for GROUP BY. I understand we use GROUP BY when there is some aggregate function. If I have even one aggregate function in a SQL statement, do I need to put all of my selected columns into my GROUP BY clause? I don't have a specific query to ask about, but when I try to do joins, I get errors. In particular, when I use count(*) in a statement and/or a join, I just seem to mess it up.
I use BigQuery at my job. I am regularly floored by strange gaps in knowledge.
Thank you!
This is a little complicated.
First, an aggregation query does not need any aggregation functions. So this is allowed:
select a
from t
group by a;
This is equivalent, by the way, to:
select distinct a
from t;
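A small check of that equivalence, sketched with Python's sqlite3 module; the table t and its values are made up. ORDER BY is added only because neither form guarantees row order:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical one-column table with duplicate values.
conn.execute("CREATE TABLE t (a TEXT)")
conn.executemany("INSERT INTO t VALUES (?)", [("x",), ("y",), ("x",), ("z",), ("y",)])

# GROUP BY with no aggregate function: one row per distinct value of a.
grouped = conn.execute("SELECT a FROM t GROUP BY a ORDER BY a").fetchall()
# SELECT DISTINCT produces the same set of rows.
distinct = conn.execute("SELECT DISTINCT a FROM t ORDER BY a").fetchall()

print(grouped)              # [('x',), ('y',), ('z',)]
print(grouped == distinct)  # True
```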
If there are aggregation functions, then no group by is needed. So, this is allowed:
select max(a)
from t;
Such an aggregation query (with no group by) always returns one row. This is true even if the table is empty or a where clause filters out all the rows. In that case, most aggregation functions return NULL, with the notable exception of count(), which returns 0.
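The one-row behavior can be checked the same way. This sketch uses SQLite through Python's sqlite3 module, with a hypothetical table and a WHERE clause that matches nothing; SUM also returns NULL here, while both forms of COUNT return 0:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER)")  # hypothetical table
conn.executemany("INSERT INTO t VALUES (?)", [(1,), (2,), (3,)])

# No GROUP BY, and the WHERE clause matches nothing: still exactly one row.
rows = conn.execute(
    "SELECT MAX(a), SUM(a), COUNT(*), COUNT(a) FROM t WHERE a > 100"
).fetchall()
print(rows)  # [(None, None, 0, 0)]

# With GROUP BY, no input rows means no groups, so no output rows.
no_groups = conn.execute(
    "SELECT a, COUNT(*) FROM t WHERE a > 100 GROUP BY a"
).fetchall()
print(no_groups)  # []
```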
Next, if you mix aggregation functions and non-aggregation expressions in the select, then in general the non-aggregation, non-constant expressions need to be in the group by. I should note that expressions built only from the grouping columns are fine too, so you can do:
select a, concat(a, 'bcd'), count(*)
from t
group by a;
This should work, but sometimes BigQuery gets confused and will want the expression in the group by.
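A sketch of that query in SQLite via Python's sqlite3 module, with a hypothetical table t. It uses the standard || operator instead of concat(). Note that SQLite is more permissive than BigQuery about non-grouped columns, so this only shows the result shape, not how strict any particular database is:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a TEXT)")  # hypothetical table
conn.executemany("INSERT INTO t VALUES (?)", [("x",), ("x",), ("y",)])

# a || 'bcd' is built only from the grouping column a, so it has a single
# value per group; || is standard SQL string concatenation.
rows = conn.execute(
    "SELECT a, a || 'bcd', COUNT(*) FROM t GROUP BY a ORDER BY a"
).fetchall()
print(rows)  # [('x', 'xbcd', 2), ('y', 'ybcd', 1)]
```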
Finally, the SQL standard supports a query like this:
select t.*, count(*)
from t join
u
using (foo)
group by t.a;
This is allowed when a is the primary key (or equivalent) of t, because every other column of t is then functionally dependent on a. However, BigQuery does not have primary keys, so this is not relevant to that database.

Subquery requiring group by without single row function

From my understanding, queries that use one or more aggregate functions as well as at least one single-row function require the single-row functions to be placed in a GROUP BY clause, which makes sense overall.
However, I'm working through problems in an online resource and ran into the question in the picture. I answered that the query executes successfully but gives improper output, because the subquery has only an aggregate function, which led me to believe it requires no GROUP BY. Why does this subquery require a GROUP BY?
As Gordon already explained, a nested aggregate requires a GROUP BY clause. If we split the query into two parts, the first part works fine when HAVING compares against a specific value.
Example (run the queries at this link):
https://livesql.oracle.com/apex/f?p=590:1:104596775146183::NO:RP::
select count(*), PROD_CATEGORY_ID from SH.PRODUCTS group by PROD_CATEGORY_ID
having count(*)>15;
But we get an error if we nest two aggregate functions without a GROUP BY:
select max(count(PROD_CATEGORY_ID)) from SH.PRODUCTS; -- throws ORA-00978
select max(count(PROD_CATEGORY_ID)) from SH.PRODUCTS
group by PROD_CATEGORY_ID; -- returns the highest per-category product count
Combining the two gives the final result:
select count(*), PROD_CATEGORY_ID from SH.PRODUCTS
group by PROD_CATEGORY_ID
having count(*)=(select max(count(*)) from SH.PRODUCTS group by PROD_CATEGORY_ID);
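SQLite, unlike Oracle, rejects nested aggregates even with a GROUP BY, so a more portable way to write the final query computes the per-category counts in a derived table and takes an ordinary MAX over it. The sketch below runs both versions with Python's sqlite3 module; the products table is a hypothetical, trimmed-down stand-in for SH.PRODUCTS:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for SH.PRODUCTS with just the columns the query needs.
conn.execute("CREATE TABLE products (prod_id INTEGER, prod_category_id INTEGER)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?)",
    [(1, 10), (2, 10), (3, 10), (4, 20), (5, 20), (6, 30)],
)

# SQLite rejects nested aggregates such as MAX(COUNT(*)) even with GROUP BY.
try:
    conn.execute(
        "SELECT MAX(COUNT(*)) FROM products GROUP BY prod_category_id"
    ).fetchall()
    nested_ok = True
except sqlite3.Error:
    nested_ok = False

# Portable form: compute the per-category counts in a derived table first,
# then take MAX over that table in an ordinary, non-nested aggregate.
sql = """
SELECT COUNT(*), prod_category_id
FROM products
GROUP BY prod_category_id
HAVING COUNT(*) = (
    SELECT MAX(cnt)
    FROM (SELECT COUNT(*) AS cnt FROM products GROUP BY prod_category_id) counts
)
"""
rows = conn.execute(sql).fetchall()
print(nested_ok, rows)  # False [(3, 10)]
```

The derived table is aliased without AS, which Oracle also accepts for table aliases, so the same SQL should run there as well.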
There are good examples at this link:
https://mahtodeepak05.wordpress.com/2014/12/17/aggregate-function-nesting-in-oracle/
You can easily test this:
select max(count(*))
from dual;
The error is:
ORA-00978: nested group function without GROUP BY
So, a nested aggregate function seems to require a GROUP BY.

Aggregate window function output in PostgreSQL (Redshift)

I really want to use the median window function as an aggregate function.
I currently am forced to use the window function in a sub-select, and then aggregate over it like this:
SELECT id, MIN(avg_metric) AS mean, MIN(median_metric) AS median, COUNT(*)
FROM (
SELECT id, AVG(metric) OVER(PARTITION BY id) AS avg_metric, MEDIAN(metric) OVER(PARTITION BY id) AS median_metric
FROM data_table
) AS windowed
GROUP BY id;
Is there a way to aggregate over a window function result so there's only one SELECT statement?
Strictly speaking, your example query could be rewritten as:
SELECT id,
AVG(metric),
MEDIAN(metric),
COUNT(*)
FROM data_table
GROUP BY id;
But I'm wondering if you just picked a poor example that happens to be mathematically capable of simplification. This is a special case because the subquery and the main query are aggregating on the same field, and the outer aggregates are picking a minimum from what would be a set of identical values.
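The equivalence can be checked on toy data. This sketch uses Python's sqlite3 module (SQLite 3.25+); SQLite has no MEDIAN function, so only AVG and COUNT are compared, and the rows in data_table are invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data_table; SQLite has no MEDIAN, so only AVG is compared.
conn.execute("CREATE TABLE data_table (id INTEGER, metric REAL)")
conn.executemany(
    "INSERT INTO data_table VALUES (?, ?)",
    [(1, 2.0), (1, 4.0), (1, 9.0), (2, 5.0), (2, 7.0)],
)

# The question's shape: window function per row, then collapse with MIN.
windowed = conn.execute("""
    SELECT id, MIN(avg_metric) AS mean, COUNT(*)
    FROM (SELECT id, AVG(metric) OVER (PARTITION BY id) AS avg_metric
          FROM data_table) AS w
    GROUP BY id
    ORDER BY id
""").fetchall()

# The simplified shape: a single GROUP BY with ordinary aggregates.
grouped = conn.execute("""
    SELECT id, AVG(metric) AS mean, COUNT(*)
    FROM data_table
    GROUP BY id
    ORDER BY id
""").fetchall()

print(windowed)             # [(1, 5.0, 3), (2, 6.0, 2)]
print(windowed == grouped)  # True
```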
If that's not the case, and your actual query and subquery are not grouping by the same field, then the answer is no: you need a subquery, for two reasons:
First, by ANSI definition, window functions are evaluated after the WHERE, GROUP BY, and HAVING clauses. There is no clause to specify your desired behavior of aggregating after a window function, so you must use a subquery or CTE.
Second, even if you removed the windowing (the OVER() clause), you would still need to GROUP BY data that is only known after the first round of aggregation has completed.
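Here is one hypothetical case where the subquery is unavoidable: the outer GROUP BY key (whether a row is above its id's average) only exists once the window function has run. Sketched in SQLite through Python's sqlite3 module, with invented data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data_table (id INTEGER, metric REAL)")  # hypothetical
conn.executemany(
    "INSERT INTO data_table VALUES (?, ?)",
    [(1, 1.0), (1, 2.0), (1, 3.0), (2, 10.0), (2, 20.0)],
)

# The GROUP BY key (is this row above its id's average?) only exists after
# the window function runs, so the window must live in a subquery.
# SQLite returns comparisons as 0/1.
rows = conn.execute("""
    SELECT above_avg, COUNT(*)
    FROM (SELECT metric > AVG(metric) OVER (PARTITION BY id) AS above_avg
          FROM data_table) AS w
    GROUP BY above_avg
    ORDER BY above_avg
""").fetchall()
print(rows)  # [(0, 3), (1, 2)]
```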

Using COUNT in Oracle SQL Developer

I'm using Oracle SQL Developer and I can't get this query to work. It's telling me it's not a single-group group function (ORA-00937). Please help.
SELECT LGBRAND.BRAND_NAME, LGPRODUCT.PROD_DESCRIPT,
COUNT (LGPRODUCT.PROD_DESCRIPT) AS "NUMPRODUCTS"
FROM LGBRAND, LGPRODUCT
ORDER BY LGBRAND.BRAND_NAME;
What I'm trying to accomplish is to get the total number of different products for each brand name.
When you mix aggregate functions with other columns, you need a GROUP BY clause.
Aggregate functions such as AVG, COUNT, and SUM operate on each group defined by GROUP BY. If you don't use a GROUP BY clause, the function is computed over all the rows of the table, so the SELECT list cannot also contain plain, non-aggregated columns.
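SELECT LGBRAND.BRAND_NAME,
COUNT(DISTINCT LGPRODUCT.PROD_DESCRIPT) AS "NUMPRODUCTS" -- counted, not selected, so one row per brand
FROM LGBRAND
JOIN LGPRODUCT
ON LGPRODUCT.BRAND_ID = LGBRAND.BRAND_ID -- BRAND_ID is an assumed join column; use your schema's key
GROUP BY LGBRAND.BRAND_NAME
ORDER BY LGBRAND.BRAND_NAME;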
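To check a corrected query of this shape, here is a sketch in SQLite via Python's sqlite3 module. The brand_id column linking the two tables is an assumption, since the question doesn't show the schema, and the rows are invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: brand_id is an assumed key linking the two tables.
conn.execute("CREATE TABLE lgbrand (brand_id INTEGER, brand_name TEXT)")
conn.execute("CREATE TABLE lgproduct (prod_descript TEXT, brand_id INTEGER)")
conn.executemany("INSERT INTO lgbrand VALUES (?, ?)", [(1, "Acme"), (2, "Binder")])
conn.executemany(
    "INSERT INTO lgproduct VALUES (?, ?)",
    [("Hammer", 1), ("Saw", 1), ("Hammer", 1), ("Glue", 2)],
)

# Join on the assumed key, group by brand, count distinct product descriptions.
rows = conn.execute("""
    SELECT lgbrand.brand_name,
           COUNT(DISTINCT lgproduct.prod_descript) AS numproducts
    FROM lgbrand
    JOIN lgproduct ON lgproduct.brand_id = lgbrand.brand_id
    GROUP BY lgbrand.brand_name
    ORDER BY lgbrand.brand_name
""").fetchall()
print(rows)  # [('Acme', 2), ('Binder', 1)]
```

Without the join condition, the original FROM LGBRAND, LGPRODUCT would pair every brand with every product, so each brand would be credited with all products.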
You need to use a GROUP BY clause.

Select all columns on a group by throws error

I ran a query like the one below against the Products table of the Northwind database:
select * from Northwind.dbo.Products GROUP BY CategoryID
and I was hit with an error. I am sure you will hit the same error. So what is the correct statement to group all products by their category IDs?
Edit: this link really helped me understand a lot:
http://weblogs.sqlteam.com/jeffs/archive/2007/07/20/but-why-must-that-column-be-contained-in-an-aggregate.aspx
You need to use an aggregate function and then group by any non-aggregated columns.
I recommend reading up on GROUP BY.
If you're using GROUP BY in a query, every item in your SELECT list must either be contained in an aggregate function, e.g. SUM() or COUNT(), or be included in the GROUP BY clause.
Because you are using SELECT *, this is equivalent to listing ALL columns in your SELECT.
Therefore, either list them all in the GROUP BY too, wrap the remaining columns in aggregate functions where possible, or select only CategoryID.
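A sketch of the aggregate approach in SQLite via Python's sqlite3 module, using a hypothetical three-row stand-in for Northwind's Products table (only a few of its columns). CategoryID is grouped, and every other selected column is wrapped in an aggregate:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, trimmed-down stand-in for Northwind's Products table.
conn.execute(
    "CREATE TABLE Products (ProductID INTEGER, ProductName TEXT, "
    "CategoryID INTEGER, UnitPrice REAL)"
)
conn.executemany(
    "INSERT INTO Products VALUES (?, ?, ?, ?)",
    [(1, "Chai", 1, 18.0), (2, "Chang", 1, 19.0), (3, "Aniseed Syrup", 2, 10.0)],
)

# CategoryID is grouped; every other selected column is inside an aggregate.
rows = conn.execute("""
    SELECT CategoryID, COUNT(*) AS NumProducts, MIN(UnitPrice), MAX(UnitPrice)
    FROM Products
    GROUP BY CategoryID
    ORDER BY CategoryID
""").fetchall()
print(rows)  # [(1, 2, 18.0, 19.0), (2, 1, 10.0, 10.0)]
```

Against SQL Server, the same SELECT should work with FROM Northwind.dbo.Products.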