Use of HAVING without GROUP BY not working as expected - sql

I am starting to learn SQL Server, in the documentation found in msdn states like this
HAVING is typically used with a GROUP BY clause. When GROUP BY is not used, there is an implicit single, aggregated group.
This made me to think that we can use having without a groupBy clause, but when I am trying to make a query I am not able to use it.
I have a table like this
CREATE TABLE [dbo].[_abc]
(
[wage] [int] NULL
) ON [PRIMARY]
GO
INSERT INTO [dbo].[_abc] (wage)
VALUES (4), (8), (15), (30), (50)
GO
Now when I run this query, I get an error
select *
from [dbo].[_abc]
having sum(wage) > 5
Error:

The documentation is correct; i.e. you could run this statement:
select sum(wage) sum_of_all_wages
, count(1) count_of_all_records
from [dbo].[_abc]
having sum(wage) > 5
The reason your statement doesn't work is because of the select *, which means select every columns' value. When there is no group by, all records are aggregated; i.e. you only get 1 record in your result set which has to represent every record. As such, you can only* include values provided by applying aggregate functions to your columns; not the columns themselves.
* of course, you can also provide constants, so select 'x' constant, count(1) cnt from myTable would work.
There aren't many use cases I can think of where you'd want to use having without a group by, but certainly it can be done as shown above.
NB: If you wanted all rows where the wage was greater than 5, you'd use the where clause instead:
select *
from [dbo].[_abc]
where wage > 5
Equally, if you want the sum of all wages greater than 5 you can do this
select sum(wage) sum_of_wage_over_5
from [dbo].[_abc]
where wage > 5
Or if you wanted to compare the sum of wages over 5 with those under:
select case when wage > 5 then 1 else 0 end wage_over_five
, sum(wage) sum_of_wage
from [dbo].[_abc]
group by case when wage > 5 then 1 else 0 end
See runnable examples here.
Update based on comments:
Do you need having to use aggregate functions?
No. You can run select sum(wage) from [dbo].[_abc]. When an aggregate function is used without a group by clause, it's as if you're grouping by a constant; i.e. select sum(wage) from [dbo].[_abc] group by 1.
The documentation merely means that whilst normally you'd have a having statement with a group by statement, it's OK to exclude the group by / in such cases the having statement, like the select statement, will treat your query as if you'd specified group by 1
What's the point?
It's hard to think of many good use cases, since you're only getting one row back and the having statement is a filter on that.
One use case could be that you write code to monitor your licenses for some software; if you have less users than per-user-licenses all's good / you don't want to see the result since you don't care. If you have more users you want to know about it. E.g.
declare #totalUserLicenses int = 100
select count(1) NumberOfActiveUsers
, #totalUserLicenses NumberOfLicenses
, count(1) - #totalUserLicenses NumberOfAdditionalLicensesToPurchase
from [dbo].[Users]
where enabled = 1
having count(1) > #totalUserLicenses
Isn't the select irrelevant to the having clause?
Yes and no. Having is a filter on your aggregated data. Select says what columns/information to bring back. As such you have to ask "what would the result look like?" i.e. Given we've had to effectively apply group by 1 to make use of the having statement, how should SQL interpret select *? Since your table only has one column this would translate to select wage; but we have 5 rows, so 5 different values of wage, and only 1 row in the result to show this.
I guess you could say "I want to return all rows if their sum is greater than 5; otherwise I don't want to return any rows". Were that your requirement it could be achieved a variety of ways; one of which would be:
select *
from [dbo].[_abc]
where exists
(
select 1
from [dbo].[_abc]
having sum(wage) > 5
)
However, we have to write the code to meet the requirement, rather than expect the code to understand our intent.
Another way to think about having is as being a where statement applied to a subquery. I.e. your original statement effectively reads:
select wage
from
(
select sum(wage) sum_of_wage
from [dbo].[_abc]
group by 1
) singleRowResult
where sum_of_wage > 5
That won't run because wage is not available to the outer query; only sum_of_wage is returned.

HAVING without GROUP BY clause is perfectly valid but here is what you need to understand:
The result will contain zero or one row
The implicit GROUP BY will return exactly one row even if the WHERE condition matched zero rows
HAVING will keep or eliminate that single row based on the condition
Any column in the SELECT clause needs to be wrapped inside an aggregate function
You can also specify an expression as long as it is not functionally dependent on the columns
Which means you can do this:
SELECT SUM(wage)
FROM employees
HAVING SUM(wage) > 100
-- One row containing the sum if the sum is greater than 5
-- Zero rows otherwise
Or even this:
SELECT 1
FROM employees
HAVING SUM(wage) > 100
-- One row containing "1" if the sum is greater than 5
-- Zero rows otherwise
This construct is often used when you're interested in checking if a match for the aggregate was found:
SELECT *
FROM departments
WHERE EXISTS (
SELECT 1
FROM employees
WHERE employees.department = departments.department
HAVING SUM(wage) > 100
)
-- all departments whose employees earn more than 100 in total

In SQL you cannot return aggregate functioned columns directly. You need to group the non aggregate fields
As shown below example
USE AdventureWorks2012 ;
GO
SELECT SalesOrderID, SUM(LineTotal) AS SubTotal
FROM Sales.SalesOrderDetail
GROUP BY SalesOrderID
HAVING SUM(LineTotal) > 100000.00
ORDER BY SalesOrderID ;
In your case you don't have identity column for your table it should come as below
Alter _abc
Add Id_new Int Identity(1, 1)
Go

Related

Generalized way to remove empty row given by aggregate functions in SQLite

The below query would give an empty NULL row when no data is present in the Table or when WHERE fails.
SELECT MAX(Number) AS Number
FROM Table
WHERE Number > 10;
The NULL row result looks like this:-
Number
1
NULL
So to detect if the query gives actual data or not I had to do this:-
SELECT 1
FROM (
SELECT MAX(Number) AS Number
FROM Table
WHERE Number > 10
)
WHERE Number IS NOT NULL;
Now, this will give me 1 if a max number exists and 0 if (max it doesn't exist or empty table or WHERE Number < 10).
So, is this how we generally tackle the empty row produced by the aggregate function, or is there a more generalized way to do that.
An example SQLite fiddle showcasing the use case.
You are overcomplicating things with the use of the aggregate function MAX().
This part of your code:
EXISTS (
SELECT 1
FROM (
SELECT MAX(TourCompletionDateTime) AS TourCompletionDateTime
FROM Details
WHERE TourCompletionDateTime < '2022-07-26T09:36:00.730589Z'
)
WHERE TourCompletionDateTime IS NOT NULL
)
is equivalent to just:
EXISTS (
SELECT 1
FROM Details
WHERE TourCompletionDateTime < '2022-07-26T09:36:00.730589Z'
)
See the demo.

SQL: Give up/return different result if too many rows

Short version, I have a SQL statement where I only want the results if the number of rows returned is less than some value (say 1000) and otherwise I want a different result set. What's the best way to do this without incurring the overhead of returning the 1000 rows (as would happen if I used limit) when I'm just going to throw them away?
For instance, I want to return the results of
SELECT *
FROM T
WHERE updated_at > timestamp
AND name <= 'Michael'
ORDER BY name ASC
provided there are at most 1000 entries but if there are more than that I want to return
SELECT *
FROM T
ORDER BY name ASC
LIMIT 25
Two queries isn't bad, but I definitely don't want to get 1000 records back from the first query only to toss them.
(Happy to use Postgres extensions too but prefer SQL)
--
To explain I'm refreshing data requested by client in batches and sometimes the client needs to know if there have been any changes in the part they've already received. If there are too many changes, however, I'm just giving up and starting to send the records from the start again.
WITH max1000 AS (
SELECT the_row, count(*) OVER () AS total
FROM (
SELECT the_row -- named row type
FROM T AS the_row
WHERE updated_at > timestamp
AND name <= 'Michael'
ORDER BY name
LIMIT 1001
) sub
)
SELECT (the_row).* -- parentheses required
FROM max1000 m
WHERE total < 1001
UNION ALL
( -- parentheses required
SELECT *
FROM T
WHERE (SELECT total > 1000 FROM max1000 LIMIT 1)
ORDER BY name
LIMIT 25
)
The subquery sub in CTE max1000 gets the complete, sorted result for the first query - wrapped as row type, and with LIMIT 1001 to avoid excess work.
The outer SELECT adds the total row count. See:
Run a query with a LIMIT/OFFSET and also get the total number of rows
The first SELECT of the outer UNION query returns decomposed rows as result - if there are less than 1001 of them.
The second SELECT of the outer UNION query returns the alternate result - if there were more than 1000. Parentheses are required - see:
Combining 3 SELECT statements to output 1 table
Or:
WITH max1000 AS (
SELECT *
FROM T
WHERE updated_at > timestamp
AND name <= 'Michael'
ORDER BY name
LIMIT 1001
)
, ct(ok) AS (SELECT count(*) < 1001 FROM max1000)
SELECT *
FROM max1000 m
WHERE (SELECT ok FROM ct)
UNION ALL
( -- parentheses required
SELECT *
FROM T
WHERE (SELECT NOT ok FROM ct)
ORDER BY name
LIMIT 25
);
I think I like the 2nd better. Not sure which is faster.
Either optimizes performance for less than 1001 rows in most calls. If that's the exception, I would first run a somewhat cheaper count. Also depends a lot on available indexes ...
You get no row if the first query finds no row. (Seems like an odd result.)

How can I get the total result count, and a given subset ('page' of results) with the same SQL Query with Oracle

I would like to display a table of results. The data is sourced from a SQL query on an Oracle database. I would like to show the results one page (say, 10 records) at a time, minimising the actual data being sent to the front-end.
At the same time, I would like to show the total number of possible results (say, showing 1-10 of 123), and to allow for pagination (say, to calculate that 10 per page, 123 results, therefore 13 pages).
I can get the total number of results with a single count query.
SELECT count(*) AS NUM_RESULTS FROM ... etc.
and I can get the desired subset with another query
SELECT * FROM ... etc. WHERE ? <= ROWNUM AND ROWNUM < ?
But, is there a way to get all the relevant details in one single query?
Update
Actually, the above query using ROWNUM seems to work for 0 - 10, but not for 10 - 20, so how can I do that too?
ROWNUM is a bit tricky to use.
The ROWNUM pseudocolumn always starts with 1 for the first result that actually gets fetched. If you filter for ROWNUM>10, you will never fetch any result and therefore will not get any.
If you want to use it for paging (not that you really should), it requires nested subqueries:
select * from
(select rownum n, x.* from
(select * from mytable order by name) x
)
where n between 3 and 5;
Note that you need another nested subquery to get the order by right; if you put the order by one level higher
select * from
(select rownum n, x.* from mytable x order by name)
where n between 3 and 5;
it will pick 3 random(*) rows and sort them, but that is ususally not what you want.
(*) not really random, but probably not what you expect.
See http://use-the-index-luke.com/sql/partial-results/window-functions for more effient ways to implement pagination.
You can use inner join on your table and fetch total number of result in your subquery. The example of an query is as follows:
SELECT E.emp_name, E.emp_age, E.emp_sal, E.emp_count
FROM EMP as E
INNER JOIN (SELECT emp_name, COUNT(*) As emp_count
FROM EMP GROUP BY emp_name) AS T
ON E.emp_name = T.emp_name WHERE E.emp_age < 35;
Not sure exactly what you're after based on your question wording, but it seems like you want to see your specialized table of all records with a row number between two values, and in an adjacent field in each record see the total count of records. If so, you can try selecting everything from your table and joining a subquery of a COUNT value as a field by saying where 1=1 (i.e. everywhere) tack that field onto the record. Example:
SELECT *
FROM table_name LEFT JOIN (SELECT COUNT(*) AS NUM_RESULTS FROM table_name) ON 1=1
WHERE ? <= ROWNUM AND ROWNUM < ?

When would you use a column name instead of * in a count?

I understand why you use COUNT(*) and COUNT(DISTINCT col), but in which cases would you use simply COUNT(col). Wouldn't COUNT(col) return the same result as COUNT(*)?
SELECT COUNT(CustomerID) AS OrdersFromCustomerID7 FROM Orders
WHERE CustomerID=7;
and
SELECT COUNT(*) AS OrdersFromCustomerID7 FROM Orders
WHERE CustomerID=7;
both result in
OrdersFromCustomerID7
4
in this W3 school example.
When you use COUNT(Colomn) It won't count nulls.
As opposed to COUNT(*) which will count each row individually no matter null or not.
Lets take this case:
ID | NAME
1 John
2 NULL
3 Jonathan
SELECT COUNT(*) FROM Table -- return 3
SELECT COUNT(NAME) FROM Table -- return 2
Try this:
DECLARE #tbl TABLE(ID INT IDENTITY,SomeValue INT);
INSERT INTO #tbl VALUES(1),(2),(NULL);
SELECT *
FROM #tbl
SELECT COUNT(*) AS COUNT_Asterisk
,COUNT(SomeValue) AS COUNT_SomeValue
FROM #tbl
The results vary in cases where column values can be NULL. Both also vary when it comes to performance. You can refer to this article for more details count-vs-countcol
When you count a column that is not NULL, then the following produce the same results:
COUNT(*)
COUNT(1)
COUNT(column)
There is often a small potential difference in performance. The first two count rows. The third actually requires reading the column value (at least in most databases). Under some circumstances, this can require reading additional data to determine if the value is indeed not NULL. This overhead is typically quite small in an aggregation query.
The first, using *, is the original way of counting rows. The second is fine, although I prefer the first. Why? Because COUNT(1) = COUNT(2), and I find that awkward.

sql get max based on field

I need to get the ID based from what ever the max amount is. Below is giving me an error
select ID from Prog
where Amount = MAX(Amount)
An aggregate may not appear in the WHERE clause unless it is in a subquery contained in a HAVING clause or a select list, and the column being aggregated is an outer reference.
The end result is that I need to get the just the ID as I need to pass it something else that is expecting it.
You need to order by Amount and select 1 record instead...
SELECT ID
FROM Prog
ORDER BY Amount DESC
LIMIT 1;
This takes all the rows in Prog, orders them in descending order by Amount (in other words, the first sorted row has the highest Amount), then limits the query to select only one row (the one with the highest Amount).
Also, subqueries are bad for performance. This code runs on a table with 200k records in half the time as the subquery versions.
Just pass a subquery with the max value to the where clause :
select ID from Prog
where Amount = (SELECT MAX(Amount) from Prog)
If you're using SQL Server that should do it :
SELECT TOP 1 ID
FROM Prog
ORDER BY Amount DESC
This should be something like:
select P.ID from Prog P
where P.Amount = (select max(Amount) from Prog)
EDIT:
If you really want only 1 row, you should do:
select max(P.ID) from Prog P
where P.Amount = (select max(Amount) from Prog);
However, if you have multiple rows that would match amount and you only want 1 row, you should have some kind of logic behind how you pick your one row. Not just relying on this max trick, or limit 1 type logic.
Also, I don't write limit 1, because this is not ANSI sql -- it works in mysql but OP doesn't say what he wants. Every db is different -- see here: Is there an ANSI SQL alternative to the MYSQL LIMIT keyword? Don't get used to one db's extensions unless you only want to work in 1 db for the rest of your life.
select min(ID) from Prog
where Amount in
(
select max(amount)
from prog
)
The min statement ensures that you get only one result.