Can't use where clause with alias count function name - sql

SELECT
student_id
,COUNT(group_id) num
FROM StudentTable
WHERE num >= 2
GROUP BY student_id
ORDER BY num
desc
;
I keep getting the error: "num": invalid identifier. I don't understand why the WHERE clause doesn't work with the alias I gave the count. I already found a solution using the HAVING clause. I am just curious why the WHERE clause doesn't work, because to me it makes no sense.

You don't want a where clause. You want a having clause:
SELECT student_id, COUNT(group_id) num
FROM StudentTable
GROUP BY student_id
HAVING num >= 2
ORDER BY num desc;
Some databases may not accept aliases in the having clause. In that case, you need to use a subquery or repeat the definition:
SELECT student_id, COUNT(group_id) num
FROM StudentTable
GROUP BY student_id
HAVING COUNT(group_id) >= 2
ORDER BY COUNT(group_id) desc;

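You have to filter with HAVING COUNT(group_id) >= 2, because the WHERE clause isn't "aware" of the column alias.
WHERE is applied to individual rows before GROUP BY and before the SELECT list is evaluated, so neither the alias num nor the aggregate it names exists yet at that point.
ORDER BY is different: it is processed after the SELECT list, so most databases accept ORDER BY num, ORDER BY 2 or ORDER BY COUNT(group_id).
If you're looking to filter on an aggregate function, the HAVING clause is the right tool. Even then, some databases make you write COUNT(group_id) rather than the alias.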
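As a rough illustration of the difference, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in database. The sample rows are made up, and SQLite's error text differs from the "invalid identifier" message in the question, but the alias-in-WHERE query is rejected there too, while the HAVING version runs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE StudentTable (student_id INTEGER, group_id INTEGER)")
# Hypothetical sample data: student 1 is in 3 groups, student 2 in 1, student 3 in 2.
conn.executemany(
    "INSERT INTO StudentTable VALUES (?, ?)",
    [(1, 10), (1, 11), (1, 12), (2, 10), (3, 10), (3, 11)],
)

# WHERE is applied to single rows before grouping, so it cannot test COUNT().
try:
    conn.execute(
        "SELECT student_id, COUNT(group_id) AS num "
        "FROM StudentTable WHERE num >= 2 GROUP BY student_id"
    )
    where_failed = False
except sqlite3.Error:
    where_failed = True

# HAVING is applied to each group after aggregation, so the filter works.
having_rows = conn.execute(
    """
    SELECT student_id, COUNT(group_id) AS num
    FROM StudentTable
    GROUP BY student_id
    HAVING COUNT(group_id) >= 2
    ORDER BY COUNT(group_id) DESC
    """
).fetchall()

print(where_failed)  # True
print(having_rows)   # [(1, 3), (3, 2)]
```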

Related

group by with condition on count(*)

I am trying to find the count of obsv_dt which has less than million records
select obsv_dt,count(*) as c from table
group by obsv_dt
having c<1000000
order by c
This gives "unable to resolve column 'c'". I get that 'c' is an alias and that this error is expected.
How can I get this working?
select obsv_dt,count(*) as c from table
group by obsv_dt
having count(*) <1000000
order by count(*)
You cannot use the alias before it has been calculated; try:
select
obsv_dt,
count(obsv_dt) as c
from
table
group by obsv_dt
having count(obsv_dt) < 1000000
order by count(obsv_dt)
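There is a subtle difference between count(*) and count(col): count(*) counts every row, while count(col) skips rows where col is NULL. Often it does not matter. See "count(*) vs count(column-name) - which is more correct?"
Below is the logical order in which the SQL clauses are processed: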
FROM clause
ON clause
OUTER clause
WHERE clause
GROUP BY clause
HAVING clause
SELECT clause
DISTINCT clause
ORDER BY clause
TOP clause
Since the HAVING clause is evaluated before the SELECT clause, it is
unaware of the aliases. The same holds for every other clause except ORDER BY,
where the alias can be used to sort the result set.
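The ordering above can be observed with a small experiment. This is only a sketch, run against SQLite through Python's sqlite3 module; the table name obs, the extra column val and the data are invented for the example. The same grouped query is run with and without a WHERE clause: WHERE removes rows before the groups are counted, HAVING then filters the counts, and ORDER BY (which comes after SELECT) can use the alias c.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table and column names, chosen only for this illustration.
conn.execute("CREATE TABLE obs (obsv_dt TEXT, val INTEGER)")
conn.executemany(
    "INSERT INTO obs VALUES (?, ?)",
    [
        ("2020-01-01", 5), ("2020-01-01", 50),
        ("2020-01-02", 70), ("2020-01-02", 80), ("2020-01-02", 90),
        ("2020-01-03", 1),
    ],
)

QUERY = """
SELECT obsv_dt, COUNT(*) AS c
FROM obs
{where}
GROUP BY obsv_dt
HAVING COUNT(*) < 3
ORDER BY c, obsv_dt
"""

# No WHERE: the counts per date are 2, 3 and 1, and HAVING keeps those below 3.
unfiltered = conn.execute(QUERY.format(where="")).fetchall()

# WHERE runs first: rows with val < 50 are gone before anything is counted.
prefiltered = conn.execute(QUERY.format(where="WHERE val >= 50")).fetchall()

print(unfiltered)   # [('2020-01-03', 1), ('2020-01-01', 2)]
print(prefiltered)  # [('2020-01-01', 1)]
```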

SAS SQL SELECT DISTINCT WITH GROUP BY

What about SQL code like the following?
Proc SQL;
SELECT DISTINCT ID,SUM(AMOUNT) AS M,SUM(NO) AS CNT
FROM CUSTOMER_LIST
GROUP BY ID
ORDER BY CNT DESC;
QUIT;
This uses DISTINCT together with GROUP BY. Can this combination cause an error, or is DISTINCT just a redundant word?
Thanks
Use DISTINCT with GROUP BY. Can this combination cause an error? Or is DISTINCT just a redundant word?
This won't raise an error, but it is unnecessary redundancy. GROUP BY ID guarantees that each ID appears on only one row of the result set. There is no benefit in adding DISTINCT here, and it makes the intent of the query harder to understand.
On the other hand, there are situations where you would use DISTINCT without GROUP BY: typically when you want to deduplicate a set of columns but do not need aggregate functions (SUM(), COUNT()...).
SELECT ID,SUM(AMOUNT) AS M,SUM(NO) AS CNT
FROM CUSTOMER_LIST
GROUP BY ID
ORDER BY CNT DESC;
We already group by ID, so there is no need for DISTINCT.
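A quick way to convince yourself is to run both forms and compare them. The sketch below does that in SQLite via Python's sqlite3 module rather than in SAS, with made-up rows, so treat it as an illustration of standard GROUP BY behaviour and not as a statement about PROC SQL internals. The column NO is quoted because it is a keyword in SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE CUSTOMER_LIST (ID INTEGER, AMOUNT INTEGER, "NO" INTEGER)')
# Hypothetical sample data: two rows for ID 1, one row for ID 2.
conn.executemany(
    "INSERT INTO CUSTOMER_LIST VALUES (?, ?, ?)",
    [(1, 10, 2), (1, 5, 1), (2, 7, 4)],
)

GROUPED = """
SELECT {distinct} ID, SUM(AMOUNT) AS M, SUM("NO") AS CNT
FROM CUSTOMER_LIST
GROUP BY ID
ORDER BY CNT DESC
"""

with_distinct = conn.execute(GROUPED.format(distinct="DISTINCT")).fetchall()
without_distinct = conn.execute(GROUPED.format(distinct="")).fetchall()

# DISTINCT on its own is the tool for deduplicating without aggregating.
distinct_ids = conn.execute(
    "SELECT DISTINCT ID FROM CUSTOMER_LIST ORDER BY ID"
).fetchall()

print(with_distinct == without_distinct)  # True
print(without_distinct)                   # [(2, 7, 4), (1, 15, 3)]
print(distinct_ids)                       # [(1,), (2,)]
```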

Is order in a subquery guaranteed to be preserved?

I am wondering in particular about PostgreSQL. Given the following contrived example:
SELECT name FROM
(SELECT name FROM people WHERE age >= 18 ORDER BY age DESC) p
LIMIT 10
Are the names returned from the outer query guaranteed to be in the order they were for the inner query?
No, put the order by in the outer query:
SELECT name FROM
(SELECT name, age FROM people WHERE age >= 18) p
ORDER BY p.age DESC
LIMIT 10
The inner query (the subquery) returns a result set. If you put the ORDER BY there, the intermediate result set passed from the inner query to the outer query is ordered the way you designate, but without an ORDER BY in the outer query, the result set produced from that intermediate result is not guaranteed to be sorted in any way.
For simple cases, @Charles's query is the most efficient.
More generally, you can use the window function row_number() to carry any order you like to the main query, including:
order by columns not in the SELECT list of the subquery and thus not reproducible
arbitrary ordering of peers according to ORDER BY criteria. Postgres will reuse the same arbitrary order in the window function within the subquery. (But not truly random order from random() for instance!)
If you don't want to preserve arbitrary sort order of peers from the subquery, use rank() instead.
This may also be generally superior with complex queries or multiple query layers:
SELECT p.name
FROM (
SELECT name, row_number() OVER (ORDER BY <same order by criteria>) AS rn
FROM people
WHERE age >= 18
ORDER BY <any order by criteria>
) p
ORDER BY p.rn
LIMIT 10;
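To make the template above concrete, here is a small sketch that fills in the placeholders with ORDER BY age DESC. It runs on SQLite through Python's sqlite3 module instead of PostgreSQL (window functions need SQLite 3.25 or newer), and the people rows are invented. Note that age is not in the outer SELECT list: the order is carried out of the subquery only through rn.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, age INTEGER)")
# Hypothetical sample data with no ties on age.
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [("ann", 30), ("bob", 17), ("cy", 45), ("dee", 22)],
)

# row_number() records the desired order as a column, and the outer
# query sorts on that column explicitly before applying LIMIT.
oldest_two = conn.execute(
    """
    SELECT p.name
    FROM (
        SELECT name, row_number() OVER (ORDER BY age DESC) AS rn
        FROM people
        WHERE age >= 18
    ) p
    ORDER BY p.rn
    LIMIT 2
    """
).fetchall()

print(oldest_two)  # [('cy',), ('ann',)]
```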
They are not guaranteed to be in the same order, though when you run it you might see that it generally follows the order.
You should place the ORDER BY on the main query, and the subquery has to return age so that the outer query can sort on it:
SELECT name FROM
(SELECT name, age FROM people WHERE age >= 18) p
ORDER BY p.age DESC LIMIT 10

Order by not working in Oracle subquery

I'm trying to return 7 events from a table, starting from today's date, and have them in date order:
SELECT ID
FROM table
where ID in (select ID from table
where DATEFIELD >= trunc(sysdate)
order by DATEFIELD ASC)
and rownum <= 7
If I remove the 'order by' it returns the IDs just fine and the query works, but it's not in the right order. Would appreciate any help with this since I can't seem to figure out what I'm doing wrong!
(edit) for clarification, I was using this before, and the order returned was really out:
select ID
from TABLE
where DATEFIELD >= trunc(sysdate)
and rownum <= 7
order by DATEFIELD
Thanks
The values for the ROWNUM "function" are assigned before the ORDER BY is processed. That is why it doesn't work the way you used it (see the manual for a similar explanation).
When limiting a query using ROWNUM and an ORDER BY is involved, the ordering must be done in an inner select and the limit must be applied in the outer select:
select *
from (
select *
from table
where datefield >= trunc(sysdate)
order by datefield ASC
)
where rownum <= 7
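The effect of the two evaluation orders can be mimicked without a database. The following is a plain Python simulation, not Oracle: the list stands for rows in whatever order the engine happens to read them, the data is made up, and a limit of 3 is used instead of 7 to keep it short.

```python
from datetime import date

# Hypothetical (id, datefield) rows in arbitrary storage order.
rows = [
    (1, date(2024, 3, 9)),
    (2, date(2024, 3, 1)),
    (3, date(2024, 3, 5)),
    (4, date(2024, 3, 2)),
    (5, date(2024, 3, 7)),
]
LIMIT = 3

# ROWNUM filter and ORDER BY in the same query block:
# the first rows read are kept, and only those are sorted afterwards.
limited_then_sorted = [r[0] for r in sorted(rows[:LIMIT], key=lambda r: r[1])]

# ORDER BY in the inner select, ROWNUM filter in the outer select:
# everything is sorted first, then the first rows of the sorted set are kept.
sorted_then_limited = [r[0] for r in sorted(rows, key=lambda r: r[1])[:LIMIT]]

print(limited_then_sorted)  # [2, 3, 1]
print(sorted_then_limited)  # [2, 4, 3]
```

Only the second variant returns the three earliest dates; the first returns whichever three rows were read first, merely presented in date order.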
You cannot use order by in where id in (select id from ...) kind of subquery. It wouldn't make sense anyway. This condition only checks if id is in subquery. If it affects the order of output, it's only incidental. With different data query execution plan might be different and output order would be different as well. Use explicit order by at the end of the main query.
It is a well-known 'feature' of Oracle that rownum doesn't play nicely with order by. See http://www.adp-gmbh.ch/ora/sql/examples/first_rows.html for more information. In your case you should use something like:
SELECT ID
FROM (select ID, row_number() over (order by DATEFIELD ) r
from table
where DATEFIELD >= trunc(sysdate))
WHERE r <= 7
See also:
http://www.orafaq.com/faq/how_does_one_select_the_top_n_rows_from_a_table
http://www.oracle.com/technetwork/issue-archive/2006/06-sep/o56asktom-086197.html
http://asktom.oracle.com/pls/asktom/f?p=100:11:507524690399301::::P11_QUESTION_ID:127412348064
See also other similar questions on SO, eg.:
Oracle SELECT TOP 10 records
Oracle/SQL - Select specified range of sequential records
Your outer query can't "see" the ORDER BY of the inner query, and in this case ordering the inner query doesn't make sense anyway: the inner query is only used to build a subset of data for the WHERE of the outer one, so the order of that subset doesn't matter.
Maybe if you explain better what you want to do, we can help you.
ORDER BY clause in subqueries:
The ORDER BY clause is not allowed inside a subquery, with the exception of inline views. If you attempt to include an ORDER BY clause, you receive an error message.
An inline view is a query in the FROM clause.
SELECT t.*
FROM (SELECT id, name FROM student) t

MySQL query to return only duplicate entries with counts

I have a legacy MySQL table called lnk_lists_addresses with columns list_id and address_id. I'd like to write a query that reports all the cases where the same list_id-address_id combination appears more than once in the table with a count.
I tried this...
SELECT count(*), list_id, address_id
FROM lnk_lists_addresses
GROUP BY list_id, address_id
ORDER BY count(*) DESC
LIMIT 20
It works, sort of, because there are fewer than 20 duplicates. But how would I return only the counts greater than 1?
I tried adding "WHERE count(*) > 1" before and after GROUP BY but got errors saying the statement was invalid.
SELECT count(*), list_id, address_id
FROM lnk_lists_addresses
GROUP BY list_id, address_id
HAVING count(*)>1
ORDER BY count(*) DESC
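For a quick check of the HAVING version, here is a minimal sketch run against SQLite through Python's sqlite3 module as a stand-in for MySQL. The rows are invented: one combination appears three times, one twice and one only once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE lnk_lists_addresses (list_id INTEGER, address_id INTEGER)")
# Hypothetical sample data.
conn.executemany(
    "INSERT INTO lnk_lists_addresses VALUES (?, ?)",
    [(1, 100), (1, 100), (1, 100), (2, 200), (2, 200), (3, 300)],
)

# HAVING filters the groups after COUNT(*) has been computed,
# so only combinations that occur more than once remain.
duplicates = conn.execute(
    """
    SELECT COUNT(*), list_id, address_id
    FROM lnk_lists_addresses
    GROUP BY list_id, address_id
    HAVING COUNT(*) > 1
    ORDER BY COUNT(*) DESC
    LIMIT 20
    """
).fetchall()

print(duplicates)  # [(3, 1, 100), (2, 2, 200)]
```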
To combine mine and Todd.Run's answers for a more "complete" answer. You want to use the HAVING clause:
http://dev.mysql.com/doc/refman/5.1/en/select.html
You want to use a "HAVING" clause. Its use is explained in the MySQL manual.
http://dev.mysql.com/doc/refman/5.1/en/select.html
SELECT count(*) AS total, list_id, address_id
FROM lnk_lists_addresses
GROUP BY list_id, address_id
HAVING total > 1
ORDER BY total DESC
LIMIT 20
If you name the COUNT() field, you can use it later in the statement: MySQL accepts the alias in HAVING and ORDER BY, but not in WHERE.
EDIT: forgot about HAVING (>_<)