group by with condition on count(*) - sql

I am trying to find the obsv_dt values that have fewer than a million records:
select obsv_dt,count(*) as c from table
group by obsv_dt
having c<1000000
order by c
This gives the error: unable to resolve column 'c'. I get that 'c' is an alias and that this error is expected.
How can I get this working?

select obsv_dt,count(*) as c from table
group by obsv_dt
having count(*) <1000000
order by count(*)

You cannot use the alias before it has been calculated; try:
select
obsv_dt,
count(obsv_dt) as c
from
table
group by obsv_dt
having count(obsv_dt) < 1000000
order by count(obsv_dt)
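There is a subtle difference between count(*) and count(col): count(*) counts every row in the group, while count(col) counts only the rows where col is not NULL. Often it does not matter. See: count(*) vs count(column-name) - which is more correct?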
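A small sketch of where count(*) and count(col) diverge. The table obs_demo and its reading column are hypothetical, and obsv_dt is stored as text only to keep the example portable across databases.

```sql
-- Hypothetical table, for illustration only.
CREATE TABLE obs_demo (
    obsv_dt VARCHAR(10),   -- grouping key, kept as text for portability
    reading INTEGER        -- nullable column
);

INSERT INTO obs_demo (obsv_dt, reading) VALUES ('2024-01-01', 10);
INSERT INTO obs_demo (obsv_dt, reading) VALUES ('2024-01-01', NULL);
INSERT INTO obs_demo (obsv_dt, reading) VALUES ('2024-01-02', 7);

-- COUNT(*) counts rows; COUNT(reading) skips rows where reading IS NULL.
SELECT obsv_dt,
       COUNT(*)       AS all_rows,
       COUNT(reading) AS non_null_readings
FROM obs_demo
GROUP BY obsv_dt
ORDER BY obsv_dt;
-- 2024-01-01: all_rows = 2, non_null_readings = 1
-- 2024-01-02: all_rows = 1, non_null_readings = 1
```

If the counted column can never be NULL, both forms return the same number.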

Below is the logical order in which SQL clauses are processed:
FROM clause
ON clause
OUTER clause
WHERE clause
GROUP BY clause
HAVING clause
SELECT clause
DISTINCT clause
ORDER BY clause
TOP clause
Since the HAVING clause is evaluated before the SELECT clause, it is
unaware of the aliases. The same holds for every clause except ORDER BY,
where we can use the alias to sort the result set.
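As a rough illustration of that ordering, the sketch below repeats the aggregate in HAVING and uses the alias only in ORDER BY. The table obs_counts_demo is hypothetical, and the threshold 3 stands in for 1000000 so the data stays small.

```sql
-- Hypothetical table, for illustration only.
CREATE TABLE obs_counts_demo (obsv_dt VARCHAR(10));

INSERT INTO obs_counts_demo (obsv_dt) VALUES ('2024-01-01');
INSERT INTO obs_counts_demo (obsv_dt) VALUES ('2024-01-01');
INSERT INTO obs_counts_demo (obsv_dt) VALUES ('2024-01-01');
INSERT INTO obs_counts_demo (obsv_dt) VALUES ('2024-01-02');
INSERT INTO obs_counts_demo (obsv_dt) VALUES ('2024-01-03');
INSERT INTO obs_counts_demo (obsv_dt) VALUES ('2024-01-03');

SELECT obsv_dt, COUNT(*) AS c
FROM obs_counts_demo
GROUP BY obsv_dt
HAVING COUNT(*) < 3   -- runs before SELECT, so the aggregate is repeated
ORDER BY c;           -- runs after SELECT, so the alias is visible
-- 2024-01-02 | 1
-- 2024-01-03 | 2
```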

SAS SQL SELECT DISTINCT WITH GROUP BY

What about SQL code like the following?
Proc SQL;
SELECT DISTINCT ID,SUM(AMOUNT) AS M,SUM(NO) AS CNT
FROM CUSTOMER_LIST
GROUP BY ID
ORDER BY CNT DESC;
QUIT;
This uses DISTINCT together with GROUP BY. Can this combination cause an error, or is DISTINCT just a redundant word?
Thanks~
This won't raise an error, but it is unnecessary redundancy. GROUP BY ID guarantees that each ID appears on only one row of the result set. There is no benefit in adding DISTINCT here, and it makes the intent of the query harder to understand.
On the other hand, there are situations where you would use DISTINCT without GROUP BY: typically when you want to deduplicate a set of columns, but do not need to use aggregate functions (SUM(), COUNT()...).
SELECT ID,SUM(AMOUNT) AS M,SUM(NO) AS CNT
FROM CUSTOMER_LIST
GROUP BY ID
ORDER BY CNT DESC;
We already group by ID, so there is no need for DISTINCT.
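A minimal sketch of both cases discussed above, written as plain SQL rather than inside Proc SQL. The table customer_list_demo and its rows are hypothetical; the column names follow the question.

```sql
-- Hypothetical table, for illustration only.
CREATE TABLE customer_list_demo (
    ID     INTEGER,
    AMOUNT INTEGER,
    NO     INTEGER
);

INSERT INTO customer_list_demo (ID, AMOUNT, NO) VALUES (1, 10, 1);
INSERT INTO customer_list_demo (ID, AMOUNT, NO) VALUES (1, 20, 2);
INSERT INTO customer_list_demo (ID, AMOUNT, NO) VALUES (2, 5, 1);

-- GROUP BY alone already returns one row per ID.
SELECT ID, SUM(AMOUNT) AS M, SUM(NO) AS CNT
FROM customer_list_demo
GROUP BY ID
ORDER BY CNT DESC;
-- 1 | 30 | 3
-- 2 |  5 | 1

-- DISTINCT without GROUP BY: deduplicate values, no aggregates involved.
SELECT DISTINCT ID
FROM customer_list_demo
ORDER BY ID;
-- 1
-- 2
```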

count the total number of column field appeared more than once in database

I am trying to write a query that returns the number of values that are repeated (appear more than once) in one column called "abc". I tried the following, but it does not work:
select COUNT(SELECT DISTINCT card_no, COUNT(*) AS cnt )
please help, thanks in advance.
For example, given this column:
cards
123
456
123
Result:
Count
1
since 123 appears more than once.
You want the number of distinct values in the column that are repeated at least once, is that right?
SELECT COUNT(dupes)
FROM (SELECT card_no AS dupes, COUNT(*) cnt FROM table_name
GROUP BY card_no HAVING COUNT(*) > 1) A
Edit for explanation.
The inner query SELECT card_no AS dupes, COUNT(*) cnt FROM table_name GROUP BY card_no HAVING COUNT(*) > 1 returns only those values that are repeated in your table. The aliases on the columns are necessary because it's a subquery. You can run this query independently of the outer query to see what results it returns.
You have to have the group by on any field that you don't want to aggregate when you're aggregating other fields (e.g. performing a count of records), and the HAVING part is to filter out anything that isn't duplicated (i.e. has a count of 1). HAVING is the way to apply filtering on aggregated fields that you can't have in a WHERE.
The outer query SELECT COUNT(dupes)... is merely counting the number of card_no values returned by the inner query. Since these are grouped, it gives the number of distinct values that are duplicated.
The A at the end sets up an alias for the subquery so that it can be referenced elsewhere in the query as if it were an actual table. Many databases (SQL Server and MySQL, for example) require this for any subquery in the FROM clause of another query. Effectively the select in the outer query reads SELECT COUNT(A.dupes)... and without the alias A there would be no way to qualify where the dupes field comes from (even though in this case it's implied).
It's also worth noting that the field COUNT(*) cnt isn't required in the SELECT part of the subquery as it isn't being used anywhere else in the query. It will work just as effectively without it, as long as you still have the GROUP BY and HAVING clauses.
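A sketch of that trimmed-down form, run against the sample values from the question. The table name cards_demo is hypothetical.

```sql
-- Hypothetical table holding the sample values 123, 456, 123.
CREATE TABLE cards_demo (card_no INTEGER);

INSERT INTO cards_demo (card_no) VALUES (123);
INSERT INTO cards_demo (card_no) VALUES (456);
INSERT INTO cards_demo (card_no) VALUES (123);

-- The inner query keeps only repeated values; cnt is not selected
-- because the outer query never reads it.
SELECT COUNT(A.dupes) AS repeated_values
FROM (SELECT card_no AS dupes
      FROM cards_demo
      GROUP BY card_no
      HAVING COUNT(*) > 1) A;
-- 1
```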
SELECT
card_no, COUNT(*) AS "Occurrences"
FROM
YourTable
GROUP BY card_no
HAVING
COUNT(*) > 1

Can't use where clause with alias count function name

SELECT
student_id
,COUNT(group_id) num
FROM StudentTable
WHERE num >= 2
GROUP BY student_id
ORDER BY num
desc
;
I keep getting the error: "num" invalid identifier. I don't understand why the where clause doesn't work with the alias name I give it. I already figured out a solution using the "having" clause. I am just curious why the where clause doesn't work, because to me it makes no sense that it doesn't.
You don't want a where clause. You want a having clause:
SELECT student_id, COUNT(group_id) num
FROM StudentTable
GROUP BY student_id
HAVING num >= 2
ORDER BY num desc;
Some databases may not accept aliases in the having clause. In that case, you need to use a subquery or repeat the definition:
SELECT student_id, COUNT(group_id) num
FROM StudentTable
GROUP BY student_id
HAVING COUNT(group_id) >= 2
ORDER BY COUNT(group_id) desc;
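The subquery option mentioned above could look roughly like this: the aggregate is computed in a derived table, so the outer query can filter on the alias with an ordinary WHERE. The table student_groups_demo and its rows are hypothetical.

```sql
-- Hypothetical table, for illustration only.
CREATE TABLE student_groups_demo (
    student_id INTEGER,
    group_id   INTEGER
);

INSERT INTO student_groups_demo (student_id, group_id) VALUES (1, 10);
INSERT INTO student_groups_demo (student_id, group_id) VALUES (1, 11);
INSERT INTO student_groups_demo (student_id, group_id) VALUES (2, 10);
INSERT INTO student_groups_demo (student_id, group_id) VALUES (3, 10);
INSERT INTO student_groups_demo (student_id, group_id) VALUES (3, 11);
INSERT INTO student_groups_demo (student_id, group_id) VALUES (3, 12);

-- num is an ordinary column of the derived table t,
-- so WHERE can reference it in the outer query.
SELECT student_id, num
FROM (SELECT student_id, COUNT(group_id) AS num
      FROM student_groups_demo
      GROUP BY student_id) t
WHERE num >= 2
ORDER BY num DESC;
-- 3 | 3
-- 1 | 2
```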
The error comes from the WHERE clause, not from ORDER BY. WHERE is evaluated before GROUP BY and SELECT, so the alias num does not exist yet, and an aggregate such as COUNT(group_id) is not allowed there at all.
ORDER BY is evaluated after SELECT, so the query is "aware" of the column alias at that point: ORDER BY num, ORDER BY 2 and ORDER BY COUNT(group_id) all work.
If you're looking to filter on an aggregate function, the HAVING clause is the best option. In databases that do not accept aliases in HAVING, you have to say COUNT(group_id) there.

Detect duplicated rows with group by statement

To detect duplicated rows in my table I have this query:
select SeatForShowtimeID_FK,count(*) as cnt from dbo.TicketRow
group by SeatForShowtimeID_FK
having cnt>1
I want to find rows that have the same SeatForShowtimeID_FK, but when I execute this query I get this error:
Invalid column name 'cnt'.
What should I do about this?
Change having cnt > 1 to having count(*) > 1.
The HAVING clause is the WHERE clause of GROUP BY: it filters groups after aggregation.
In HAVING you can't use a column alias (some databases allow it, but this one does not).
You have written:
having cnt>1
but cnt is an alias. Your condition must be COUNT(*)>1 (or COUNT(1) as suggested by Moho).
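If the goal is to see the duplicated rows themselves rather than just the key and its count, one possible sketch is to feed the corrected HAVING query into an IN subquery. The table ticket_row_demo and the TicketRowID column are hypothetical; only SeatForShowtimeID_FK comes from the question.

```sql
-- Hypothetical table, for illustration only.
CREATE TABLE ticket_row_demo (
    TicketRowID          INTEGER,
    SeatForShowtimeID_FK INTEGER
);

INSERT INTO ticket_row_demo (TicketRowID, SeatForShowtimeID_FK) VALUES (1, 100);
INSERT INTO ticket_row_demo (TicketRowID, SeatForShowtimeID_FK) VALUES (2, 101);
INSERT INTO ticket_row_demo (TicketRowID, SeatForShowtimeID_FK) VALUES (3, 100);

-- The subquery finds keys that occur more than once;
-- the outer query returns every row carrying such a key.
SELECT TicketRowID, SeatForShowtimeID_FK
FROM ticket_row_demo
WHERE SeatForShowtimeID_FK IN (
    SELECT SeatForShowtimeID_FK
    FROM ticket_row_demo
    GROUP BY SeatForShowtimeID_FK
    HAVING COUNT(*) > 1)
ORDER BY TicketRowID;
-- 1 | 100
-- 3 | 100
```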

Avoiding Correlated Subquery in Oracle

In Oracle 9.2.0.8, I need to return a record set where a particular field (LAB_SEQ) is at a maximum (it is a sequential VARCHAR array '0001', '0002', etc.) for each of another field (WO_NUM). To select the maximum, I am attempting to order in descending order and select the first row. Everything I can find on StackOverflow suggests that the only way to do this is with a correlated subquery. Then I use this maximum in the WHERE clause of the outer query to get the row I want for each WO_NUM:
SELECT lt.WO_NUM, lt.EMP_NUM, lt.LAB_END_DATE, lt.LAB_END_TIME
FROM LAB_TIM lt WHERE lt.LAB_SEQ = (
SELECT LAB_SEQ FROM (
SELECT lab.LAB_SEQ FROM LAB_TIM lab WHERE lab.CCN='1' AND MAS_LOC='1'
AND lt.WO_NUM = lab.WO_NUM ORDER BY ROWNUM DESC
) WHERE ROWNUM=1
)
However, this returns an "invalid identifier" error for lt.WO_NUM. Research suggests that Oracle 8 only allows correlated subqueries one level deep, and suggests rewriting to avoid the subquery, which discussions of selecting maximums suggest can't be done. Any help getting this statement to execute would be greatly appreciated.
Your correlated subquery would need to be something like
SELECT lt.WO_NUM, lt.EMP_NUM, lt.LAB_END_DATE, lt.LAB_END_TIME
FROM LAB_TIM lt WHERE lt.LAB_SEQ = (
SELECT max(lab.LAB_SEQ)
FROM LAB_TIM lab
WHERE lab.CCN='1' AND MAS_LOC='1'
AND lt.WO_NUM = lab.WO_NUM
)
Since you are on Oracle 9.2, though, it will probably be more efficient to use an analytic function. I'm not sure what the predicates lab.CCN='1' AND MAS_LOC='1' are doing in your current query, so I'm not quite sure how to translate them into the analytic function approach. Is the combination of LAB_SEQ and WO_NUM not unique in LAB_TIM? Do you need to add in the predicates on CCN and MAS_LOC in order to get a single unique row for every WO_NUM? Or are you using those predicates to decrease the number of rows in your output? The basic approach will be something like
SELECT *
FROM (SELECT lt.WO_NUM,
lt.EMP_NUM,
lt.LAB_END_DATE,
lt.LAB_END_TIME,
rank() over (partition by wo_num
order by lab_seq desc) rnk
FROM LAB_TIM lt)
WHERE rnk = 1
but it's not clear to me whether CCN and MAS_LOC need to be added to the ORDER BY clause in the analytic function or whether they need to be added to the WHERE clause.
This is one case where a correlated subquery is better, particularly if you have indexes on the table. However, it should be possible to rewrite correlated subqueries as joins.
I think the following is equivalent, without the correlated subquery:
SELECT lt.WO_NUM, lt.EMP_NUM, lt.LAB_END_DATE, lt.LAB_END_TIME
FROM (select t.*, rownum as r -- Oracle needs a qualified * next to other columns
from LAB_TIM t
) lt join
(select wo_num, max(r) as maxrownum
from (select lab.LAB_SEQ, lab.wo_num, rownum as r
from LAB_TIM lab
where lab.CCN = '1' AND lab.MAS_LOC = '1'
)
group by wo_num -- required because max(r) is selected alongside wo_num
) ltsum
on lt.wo_num = ltsum.wo_num and
lt.r = ltsum.maxrownum
I'm a little unsure about how Oracle works with rownums in things like ORDER BY.
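Another way to avoid both the correlated subquery and ROWNUM is to join against a grouped MAX(LAB_SEQ). This is only a sketch: the table lab_tim_demo, its column types and its rows are hypothetical, and it assumes the CCN and MAS_LOC predicates are meant to filter which rows compete for the maximum, as in the correlated version above.

```sql
-- Hypothetical table, for illustration only.
CREATE TABLE lab_tim_demo (
    WO_NUM  VARCHAR(10),
    LAB_SEQ VARCHAR(4),    -- zero-padded, so string MAX matches numeric order
    EMP_NUM VARCHAR(10),
    CCN     VARCHAR(2),
    MAS_LOC VARCHAR(2)
);

INSERT INTO lab_tim_demo (WO_NUM, LAB_SEQ, EMP_NUM, CCN, MAS_LOC)
VALUES ('W1', '0001', 'E1', '1', '1');
INSERT INTO lab_tim_demo (WO_NUM, LAB_SEQ, EMP_NUM, CCN, MAS_LOC)
VALUES ('W1', '0002', 'E2', '1', '1');
INSERT INTO lab_tim_demo (WO_NUM, LAB_SEQ, EMP_NUM, CCN, MAS_LOC)
VALUES ('W2', '0001', 'E3', '1', '1');

-- The derived table m holds one row per WO_NUM with its highest LAB_SEQ;
-- the join then picks the matching full row.
SELECT lt.WO_NUM, lt.LAB_SEQ, lt.EMP_NUM
FROM lab_tim_demo lt
JOIN (SELECT WO_NUM, MAX(LAB_SEQ) AS max_seq
      FROM lab_tim_demo
      WHERE CCN = '1' AND MAS_LOC = '1'
      GROUP BY WO_NUM) m
  ON lt.WO_NUM = m.WO_NUM
 AND lt.LAB_SEQ = m.max_seq
ORDER BY lt.WO_NUM;
-- W1 | 0002 | E2
-- W2 | 0001 | E3
```

Taking MAX of a VARCHAR compares strings, which only agrees with numeric order while every LAB_SEQ has the same zero-padded width.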