I'll explain my problem so it becomes clearer.
I have to select the hospital with the largest number of medics.
My table looks like this :
Medic_Hospital values (codhospital,codmedic)
I have tried :
SELECT MAX(codmedic) FROM Medic_Hospital
but that only returns the number 6
( which is one of the medic's id )
SELECT codhospital,count(codmedic) FROM Medic_Hospital
where max(codmedic) = count(codmedic)
group by codhospital
but this also failed as
An aggregate may not appear in the WHERE clause unless it is in a subquery contained in a HAVING clause or a select list, and the column being aggregated is an outer reference.
SELECT codhospital,MAX(COUNT(codmedic)) from Medic_Hospital
but that failed as
"Cannot perform an aggregate function on an expression containing an
aggregate or a subquery."
I'm not very experienced in SQL and I can see that my logic is failing me here. Could someone point me in the right direction please?
You could use the TOP clause to return just the first row of an ordered query:
SELECT TOP 1 codhospital, COUNT(codmedic)
FROM Medic_Hospital
GROUP BY codhospital
ORDER BY 2 DESC
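For anyone who wants to try this without a SQL Server instance, here is a minimal sketch of the same idea using Python's built-in sqlite3 module. The sample rows are made up, and SQLite has no TOP clause, so LIMIT 1 is used as a stand-in for TOP 1; treat it as an illustration rather than the exact T-SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Medic_Hospital (codhospital INTEGER, codmedic INTEGER)")
# Made-up sample data: hospital 2 has the most medics (three of them).
conn.executemany(
    "INSERT INTO Medic_Hospital VALUES (?, ?)",
    [(1, 1), (1, 2), (2, 3), (2, 4), (2, 5), (3, 6)],
)

# Count medics per hospital, sort by that count, keep only the first row.
# LIMIT 1 is the SQLite counterpart of SQL Server's TOP 1 (an assumption of this sketch).
top_hospital = conn.execute(
    """
    SELECT codhospital, COUNT(codmedic) AS medics
    FROM Medic_Hospital
    GROUP BY codhospital
    ORDER BY medics DESC
    LIMIT 1
    """
).fetchone()

print(top_hospital)  # (2, 3)
```

Note that if two hospitals tie for the highest count, this approach returns only one of them.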
I have this query:
SELECT PersonalInfo.id, PersonalInfo.[k-commission], Abs(Not IsNull([PersonalInfo]![k-commission].[Value])) AS CommissionAbsent
FROM PersonalInfo;
PersonalInfo.k-commission is a multi-value field, and CommissionAbsent shows a duplicate value for each k-commission value. When I use DISTINCT, I get an error saying that the keyword cannot be used with a multi-value field.
Now I want to remove the duplicates and show only one result for each. I tried using a WHERE clause, but I don't know how.
Edit: I have a lot more columns; the example only shows the few I need.
You can use GROUP BY and COUNT to solve your problem. Here is an example:
SELECT clmn1, clmn2, COUNT(*) as count
FROM table
GROUP BY clmn1, clmn2
HAVING COUNT(*) > 1;
The query groups the rows in the table by the clmn1 and clmn2 columns and counts the number of rows in each group. The HAVING clause then filters the groups and returns only those with a count greater than 1, which indicates duplicates.
If you want to select all the duplicated rows, you can do it like this:
SELECT *
FROM table
WHERE (clmn1, clmn2) IN (SELECT clmn1, clmn2
FROM table
GROUP BY clmn1, clmn2
HAVING COUNT(*) > 1)
SELECT PersonalInfo.id, PersonalInfo.[k-commission], Abs(Not IsNull([PersonalInfo]![k-commission].[Value])) AS CommissionAbsent
FROM PersonalInfo
GROUP BY PersonalInfo.id, PersonalInfo.[k-commission], Abs(Not IsNull([PersonalInfo]![k-commission].[Value]))
HAVING COUNT(*) > 1
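A small sketch of the GROUP BY / HAVING pattern above, run against an in-memory SQLite database from Python. The table name items and its rows are hypothetical, and this does not model Access multi-value fields; it only shows how HAVING COUNT(*) > 1 isolates the duplicated combinations.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "items" is a hypothetical table used only for this illustration.
conn.execute("CREATE TABLE items (clmn1 TEXT, clmn2 INTEGER)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [("a", 1), ("a", 1), ("b", 2), ("c", 3), ("c", 3), ("c", 3)],
)

# One output row per (clmn1, clmn2) combination that occurs more than once.
duplicates = conn.execute(
    """
    SELECT clmn1, clmn2, COUNT(*) AS occurrences
    FROM items
    GROUP BY clmn1, clmn2
    HAVING COUNT(*) > 1
    ORDER BY clmn1
    """
).fetchall()

print(duplicates)  # [('a', 1, 2), ('c', 3, 3)]
```

The ("b", 2) combination appears once, so HAVING filters it out.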
Can you filter a SQL table based on an aggregated value, but still show column values that weren't in the aggregate statement?
My table has only 3 columns: "Composer_Tune", "_Year", and "_Rank".
I want to use SQL to find which "Composer_Tune" values are repeated in each annual list, as well as which ranks the duplicated items had.
Since I am grouping by "Composer_Tune" and "_Year", I can't list "_Rank" with my current code.
The image shows the results of my original "find the duplicates" query vs what I want:
Current vs Desired Results
I tried applying the concepts in this Aggregate Subquery StackOverflow post but am still getting "_Rank is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause" from this code:
WITH DUPE_DB AS (SELECT * FROM DB.dbo.[NAME] GROUP BY Composer_Tune, _Year HAVING COUNT(*)>1)
SELECT Composer_Tune, _Year, _Rank
FROM DUPE_DB
You need to explicitly list the columns used in the GROUP BY expression in the select list.
If you are using Transact-SQL, see its documentation for the proper use of GROUP BY.
Simply join the aggregated result set back to the original row-level table:
WITH DUPE_DB AS (
SELECT Composer_Tune, _Year
FROM DB.dbo.[NAME]
GROUP BY Composer_Tune, _Year
HAVING COUNT(*) > 1
)
SELECT n.Composer_Tune, n._Year, n._Rank
FROM DB.dbo.[NAME] n
INNER JOIN DUPE_DB
ON n.Composer_Tune = DUPE_DB.Composer_Tune
AND n._Year = DUPE_DB._Year
ORDER BY n.Composer_Tune, n._Year
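Here is a runnable sketch of that join-back approach using Python's sqlite3 module. The table name tunes and the sample rows are assumptions for illustration (the original table name is a placeholder), and SQLite stands in for SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "tunes" is a hypothetical stand-in for the real table.
conn.execute("CREATE TABLE tunes (Composer_Tune TEXT, _Year INTEGER, _Rank INTEGER)")
conn.executemany(
    "INSERT INTO tunes VALUES (?, ?, ?)",
    [
        ("Bach - Air", 2001, 3),
        ("Bach - Air", 2001, 7),    # repeated within 2001
        ("Holst - Mars", 2001, 5),
        ("Bach - Air", 2002, 1),
        ("Holst - Mars", 2002, 2),
        ("Holst - Mars", 2002, 9),  # repeated within 2002
    ],
)

# The CTE finds the (tune, year) pairs that repeat; the join brings back
# every original row for those pairs, so _Rank is available again.
rows = conn.execute(
    """
    WITH dupe AS (
        SELECT Composer_Tune, _Year
        FROM tunes
        GROUP BY Composer_Tune, _Year
        HAVING COUNT(*) > 1
    )
    SELECT n.Composer_Tune, n._Year, n._Rank
    FROM tunes AS n
    INNER JOIN dupe
        ON n.Composer_Tune = dupe.Composer_Tune
       AND n._Year = dupe._Year
    ORDER BY n.Composer_Tune, n._Year, n._Rank
    """
).fetchall()

for row in rows:
    print(row)
```

Only the two repeated (tune, year) pairs come back, each with both of its ranks.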
I have a table with two columns namely ID and KEY (let key here be an integer) such as
ID KEY
ABC 6
DEF 1
GHI 12
TASK: Get the ID of the MAX key
Solution 1:
Select Top(1) ID
from TABLE
order by KEY desc
Solution 2:
Select ID
from TABLE
where ID = MAX(ID)
EDIT: The query was invalid. This is what I meant:
Select ID
from TABLE
where KEY = (select max(KEY) from TABLE)
Is one of these solutions categorically better than the other? What are the advantages/disadvantages of each solution?
EDIT:
Assume there is no index.
Case 1 - large table
Case 2 - small table
Background:
I am doing code review and I have found both solutions multiple times in different context - sometimes with indices, sometimes without, sometimes for large tables, sometimes for small.
The two queries are different (after your edits fixing the second one).
The first necessarily returns a single row.
The second returns all matching rows.
The first returns a row even when key is NULL.
The second does not.
You should use the logic that does what you want.
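To make the difference concrete, here is a small sketch in Python with sqlite3. The table names items and all_null are hypothetical, LIMIT 1 stands in for TOP 1 (SQLite has no TOP), and KEY is quoted because it is a keyword in many dialects.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables: "items" has a tie for the max, "all_null" has only NULL keys.
conn.execute('CREATE TABLE items (ID TEXT, "KEY" INTEGER)')
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [("ABC", 6), ("DEF", 1), ("GHI", 12), ("JKL", 12)],
)
conn.execute('CREATE TABLE all_null (ID TEXT, "KEY" INTEGER)')
conn.executemany("INSERT INTO all_null VALUES (?, ?)", [("X", None), ("Y", None)])


def top_one(table):
    # Solution 1: sort descending and keep a single row.
    return conn.execute(
        f'SELECT ID FROM {table} ORDER BY "KEY" DESC LIMIT 1'
    ).fetchall()


def all_max(table):
    # Solution 2: every row whose KEY equals the maximum KEY.
    return conn.execute(
        f'SELECT ID FROM {table} '
        f'WHERE "KEY" = (SELECT MAX("KEY") FROM {table}) ORDER BY ID'
    ).fetchall()


print(len(top_one("items")))     # always exactly one row, even with a tie
print(all_max("items"))          # both tied rows
print(len(top_one("all_null")))  # still one row, although its KEY is NULL
print(all_max("all_null"))       # no rows: KEY = NULL never evaluates to true
```

Which of the tied rows Solution 1 returns is not defined unless you add a tie-breaking column to the ORDER BY.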
An aggregate may not appear in the WHERE clause unless it is in a subquery contained in a HAVING clause or a select list..
Solution 1 will be the best. A subquery in a where clause will be less optimal.
There really are lots of design techniques to look at for performance which I am not going to go into with this answer. I found this article yesterday which gave me more perspective https://www.red-gate.com/simple-talk/sql/database-administration/sql-server-storage-internals-101/
In Solution 1, the order by clause will just sort your query result.
Query execution order:
1. FROM clause
2. ON clause
3. OUTER clause
4. WHERE clause
5. GROUP BY clause
6. HAVING clause
7. SELECT clause
8. DISTINCT clause
9. ORDER BY clause
10. TOP clause
You can use the following query:
Select ID
from (
    Select ID,
    RANK() OVER (ORDER BY [KEY] DESC) AS KeyRank
    from table1
) ranked
where KeyRank = 1
Solution 1 will work, but Solution 2 (as originally written) will throw an exception like the one below:
Msg 147, Level 15, State 1, Line 22 An aggregate may not appear in the
WHERE clause unless it is in a subquery contained in a HAVING clause
or a select list, and the column being aggregated is an outer
reference.
You can go with query 1.
You cannot use query 2 as originally written, because an aggregate function cannot appear directly in a WHERE clause. If you want to combine a WHERE clause with an aggregate function, put the aggregate in a subquery, as below:
Select id from test where key in (select max(key) from test);
For reference, here is a query using only an aggregate function and a HAVING clause:
Select ID ,max(key)
from test
group by ID,key
having (key) >= 12
order by 1
The following statement works in my database:
select column_a, count(*) from my_schema.my_table group by 1;
but this one doesn't:
select column_a, count(*) from my_schema.my_table;
I get the error:
ERROR: column "my_table.column_a" must appear in the GROUP BY clause
or be used in an aggregate function
Helpful note: This thread: What does SQL clause "GROUP BY 1" mean? discusses the meaning of "group by 1".
Update:
The reason why I am confused is because I have often seen count(*) as follows:
select count(*) from my_schema.my_table
where there is no group by statement. Is COUNT always required to be followed by group by? Is the group by statement implicit in this case?
This error makes perfect sense. COUNT is an "aggregate" function. So you need to tell it which field to aggregate by, which is done with the GROUP BY clause.
The one which probably makes most sense in your case would be:
SELECT column_a, COUNT(*) FROM my_schema.my_table GROUP BY column_a;
If you only use COUNT(*), you are asking for the total number of rows, instead of aggregating by another condition. Your question of whether GROUP BY is implicit in that case could be answered with "sort of": not specifying anything is a bit like asking to "group by nothing", which means you get one huge aggregate covering the whole table.
As an example, executing:
SELECT COUNT(*) FROM table;
will show you the number of rows in that table, whereas:
SELECT col_a, COUNT(*) FROM table GROUP BY col_a;
will show you the number of rows per value of col_a. Something like:
col_a | COUNT(*)
---------+----------------
value1 | 100
value2 | 10
value3 | 123
You should also take into account that * means every row is counted, including rows that contain NULLs. If you want to count only the rows where a particular column or expression is not NULL, use COUNT(expression) instead. See the docs about aggregate functions for more details on this topic.
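A quick sketch of these three cases in Python with sqlite3. The sample rows are invented, and SQLite is used instead of the original database, so the schema prefix is dropped.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (column_a TEXT)")
# Made-up rows: two 'x', one 'y', and one NULL.
conn.executemany(
    "INSERT INTO my_table VALUES (?)",
    [("x",), ("x",), ("y",), (None,)],
)

# No GROUP BY: the whole table is one group, so a single total comes back.
total = conn.execute("SELECT COUNT(*) FROM my_table").fetchone()[0]

# COUNT(column_a) skips rows where column_a is NULL.
non_null = conn.execute("SELECT COUNT(column_a) FROM my_table").fetchone()[0]

# With GROUP BY: one count per distinct value (NULL forms its own group).
per_value = dict(
    conn.execute(
        "SELECT column_a, COUNT(*) FROM my_table GROUP BY column_a"
    ).fetchall()
)

print(total)      # 4
print(non_null)   # 3
print(per_value)
```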
If you don't use a GROUP BY clause at all, the database has no way to pair each column_a value with a count: COUNT(*) collapses the whole table into a single row, while column_a still has one value per row, so the statement is rejected. By adding GROUP BY 1 you categorize the rows by column_a, so the query can return one row, with its own count, for each distinct value.
When you have a function like count, sum etc. you need to group the other columns. This would be equivalent to your query:
select column_a, count(*) from my_schema.my_table group by column_a;
When you use count(*) with no other column, you are counting all rows from SELECT * from the table. When you use count(*) alongside another column, you are counting the number of rows for each different value of that other column. So in this case you need to group the results, in order to show each value and its count only once.
group by 1 in this case refers to column_a which has the column position 1 in your query.
This is why it works on your server. However, it is not good practice in SQL.
You should use the column name instead, because the number refers to a position in the select list: if the order of the selected columns changes, the grouping silently changes with it, which makes the code hard to maintain.
The best solution is:
select column_a, count(*) from my_schema.my_table group by column_a;
I have to find distinct count of combination of 2 variables. I used the following 2 queries to find the count:
select count(*) from
( select V1, V2
from table1
group by 1,2
) a
select count(distinct catx('-', V1, V2))
from table1
Logically, both the above queries should give the same count but I am getting different counts. Note that
both V1 and V2 are integers
Both variables can have null values, though there are no null values in my table
There are no negative values
Any idea why I might be getting different outputs? And which is the best way to find the count of distinct combinations of 2 or more columns?
Thanks.
The SAS log gives the answer when you run the first sql code. Using 'group by' requires a summary function, otherwise it is ignored. The count will therefore return the overall number of rows instead of a distinct count of the 2 variables combined.
Just add count(*) to the subquery and you will get the same answer with both methods.
select count(*) from
( select V1, V2, count(*)
from table1
group by 1,2
) a
Use DISTINCT in the subquery for the first query.
When you do a GROUP BY but don't include any aggregate function, SAS discards the GROUP BY,
so you will still have duplicate combinations of V1 and V2.
It seems that GROUP BY doesn't work that way in SAS. You can't use it to remove duplicates unless you have an aggregate function in your query. I found this in the log of my query output -
NOTE: A GROUP BY clause has been discarded because neither the SELECT
clause nor the optional HAVING clause of the associated
table-expression referenced a summary function.
This answers the question.
You can also drop the GROUP BY part and just add DISTINCT in the subquery. Also, the second query you wrote is more efficient.
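As a rough illustration of the DISTINCT-in-subquery suggestion, here is a sketch in Python with sqlite3 and invented rows. SQLite is only a stand-in: unlike SAS PROC SQL, it does not discard a GROUP BY that has no summary function, so here both forms give the same count.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (V1 INTEGER, V2 INTEGER)")
# Made-up rows: five rows, three distinct (V1, V2) combinations.
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?)",
    [(1, 2), (1, 2), (1, 3), (2, 3), (2, 3)],
)

total_rows = conn.execute("SELECT COUNT(*) FROM table1").fetchone()[0]

# DISTINCT in the subquery removes duplicate combinations before counting.
distinct_combos = conn.execute(
    "SELECT COUNT(*) FROM (SELECT DISTINCT V1, V2 FROM table1) a"
).fetchone()[0]

# In SQLite, GROUP BY without an aggregate also deduplicates; per the log note
# quoted above, SAS would discard this GROUP BY and count all rows instead.
grouped_combos = conn.execute(
    "SELECT COUNT(*) FROM (SELECT V1, V2 FROM table1 GROUP BY V1, V2) a"
).fetchone()[0]

print(total_rows, distinct_combos, grouped_combos)  # 5 3 3
```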