SQL - Displaying entries that are the max of a count? - sql

CREATE TABLE doctor( patient CHAR(13), docname CHAR(30) );
Say I had a table like this, then how would I display the names of the doctors that have the most patients? Like if the most was three and two doctors had three patients then I would display both of their names.
This would get the max patients:
SELECT MAX(count)
FROM (SELECT COUNT(docname) FROM doctor GROUP BY docname) a;
This is all the doctors and how many patients they have:
SELECT docname, COUNT(docname) FROM doctor GROUP BY docname;
Now I can't figure out how to combine them to list only the names of doctors who have the max patients.
Thanks.

This should do it.
SELECT docname, COUNT(*) FROM doctor GROUP BY docname HAVING COUNT(*) =
(SELECT MAX(c) FROM
(SELECT COUNT(patient) AS c
FROM doctor
GROUP BY docname) t)
On the other hand if you require only the first entry, then
SELECT docname, COUNT(docname) FROM doctor
GROUP BY docname
ORDER BY COUNT(docname) DESC LIMIT 1;
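To see the difference between the two queries when there is a tie, here is a small sketch that runs both against an in-memory SQLite database through Python's sqlite3 module. The sample rows and the derived-table alias `counts` are made up for illustration; other database products may need small syntax changes.

```python
import sqlite3

# Hypothetical sample data: Adams and Baker tie on three patients each.
rows = [("p1", "Adams"), ("p2", "Adams"), ("p3", "Adams"),
        ("p4", "Baker"), ("p5", "Baker"), ("p6", "Baker"),
        ("p7", "Clark")]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE doctor (patient CHAR(13), docname CHAR(30))")
conn.executemany("INSERT INTO doctor VALUES (?, ?)", rows)

# Every doctor whose patient count equals the maximum count.
all_top = conn.execute("""
    SELECT docname, COUNT(*) AS c
    FROM doctor
    GROUP BY docname
    HAVING COUNT(*) = (SELECT MAX(c)
                       FROM (SELECT COUNT(patient) AS c
                             FROM doctor
                             GROUP BY docname) AS counts)
    ORDER BY docname
""").fetchall()

# ORDER BY ... LIMIT 1 keeps a single row even when several doctors tie.
first_only = conn.execute("""
    SELECT docname, COUNT(docname) AS c
    FROM doctor
    GROUP BY docname
    ORDER BY c DESC
    LIMIT 1
""").fetchall()

print(all_top)
print(first_only)
```

The first query lists both tied doctors. The second returns one of them, and which one is not guaranteed.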

This should do it for you:
SELECT docname
FROM doctor
GROUP BY docname
HAVING COUNT(patient)=
(SELECT MAX(patientcount) FROM
(SELECT docname,COUNT(patient) AS patientcount
FROM doctor
GROUP BY docname) t1)

Here's another alternative that only has one subquery instead of two:
SELECT docname
FROM doctor
GROUP BY docname
HAVING COUNT(*) = (
SELECT COUNT(*) AS c
FROM doctor
GROUP BY docname
ORDER BY c DESC
LIMIT 1
)

Allowing for any feature in any ISO SQL specification since you did not specify a database product or version, and assuming that the table of patients is called "patients" and has a column called "docname", the following might give you what you wanted:
With PatientCounts As
(
Select docname
, Count(*) As PatientCount
From patients
Group By docname
)
, RankedCounts As
(
Select docname, PatientCount
, Rank() Over( Order By PatientCount Desc ) As PatientCountRank
From PatientCounts
)
Select docname, PatientCount, PatientCountRank
From RankedCounts
Where PatientCountRank = 1
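The sort direction inside Rank() decides which doctors end up with rank 1, so it is worth checking on a small data set. The sketch below assumes SQLite 3.25 or later (for window functions) via Python's sqlite3. The sample rows and the helper `doctors_at_rank_one` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patients (patient TEXT, docname TEXT)")
# Hypothetical rows: Adams and Baker have three patients, Clark has one.
conn.executemany("INSERT INTO patients VALUES (?, ?)", [
    ("p1", "Adams"), ("p2", "Adams"), ("p3", "Adams"),
    ("p4", "Baker"), ("p5", "Baker"), ("p6", "Baker"),
    ("p7", "Clark"),
])

def doctors_at_rank_one(direction):
    """Return (docname, count) rows ranked 1 for the given sort direction."""
    if direction not in ("ASC", "DESC"):
        raise ValueError("direction must be ASC or DESC")
    sql = f"""
        WITH PatientCounts AS (
            SELECT docname, COUNT(*) AS PatientCount
            FROM patients
            GROUP BY docname
        ),
        RankedCounts AS (
            SELECT docname, PatientCount,
                   RANK() OVER (ORDER BY PatientCount {direction})
                       AS PatientCountRank
            FROM PatientCounts
        )
        SELECT docname, PatientCount
        FROM RankedCounts
        WHERE PatientCountRank = 1
        ORDER BY docname
    """
    return conn.execute(sql).fetchall()

most = doctors_at_rank_one("DESC")   # doctors with the most patients
fewest = doctors_at_rank_one("ASC")  # doctors with the fewest patients
print(most, fewest)
```

Because Rank() gives tied rows the same rank, the descending version returns every doctor who shares the maximum.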

While using ... HAVING COUNT(*) = ( ...MAX().. ) works:
Within the query, it needs almost the same sub-query twice.
For most databases, it needs a 2nd level sub-query as MAX( COUNT(*) )
is not supported.
While using TOP / LIMIT / RANK etc works:
It uses SQL extensions for a specific database.
Also, using TOP / LIMIT of 1 will only give one row - what if there are two or more doctors with the same maximum number of patients?
I would break the problem into steps:
Get target field(s) and associated count
SELECT docName, COUNT( patient ) AS countX
FROM doctor
GROUP BY docName
Using the above as a 'statement scoped view', use this to get the max count row(s)
WITH x AS
(
SELECT docName, COUNT( patient ) AS countX
FROM doctor
GROUP BY docName
)
SELECT x.docName, x.countX
FROM x
WHERE x.countX = ( SELECT MAX( countX ) FROM x )
The WITH clause, which defines a 'statement scoped view', effectively gives named sub-queries that can be re-used within the same query.
While this solution, using statement scoped views, is longer, it is:
Easier to test
Self documenting
Extendable
It is easier to test as parts of the query can be run standalone.
It is self documenting as the query directly reflects the requirement
ie the statement scoped view lists the target field(s) and associated count.
It is extendable as if other conditions or fields are required, this can be easily added to the statement scoped view.
eg in this case, the table structure should be changed to include a doctor-id as a primary key field and this should be part of the results.
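As a rough illustration of the "easier to test" point, the counting step can be run on its own and then dropped unchanged into the WITH clause. This sketch uses SQLite through Python's sqlite3; the sample rows and the `STEP_ONE` constant are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE doctor (patient CHAR(13), docname CHAR(30))")
# Hypothetical rows: Adams and Baker tie on three patients each.
conn.executemany("INSERT INTO doctor VALUES (?, ?)", [
    ("p1", "Adams"), ("p2", "Adams"), ("p3", "Adams"),
    ("p4", "Baker"), ("p5", "Baker"), ("p6", "Baker"),
    ("p7", "Clark"),
])

# Step 1: target field and its count, runnable and checkable standalone.
STEP_ONE = """
    SELECT docName, COUNT(patient) AS countX
    FROM doctor
    GROUP BY docName
"""
counts = dict(conn.execute(STEP_ONE).fetchall())

# Step 2: reuse the same text as a statement scoped view and filter on the max.
top = conn.execute(f"""
    WITH x AS ({STEP_ONE})
    SELECT x.docName, x.countX
    FROM x
    WHERE x.countX = (SELECT MAX(countX) FROM x)
    ORDER BY x.docName
""").fetchall()

print(counts)
print(top)
```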

Take both queries and join them together to get the max:
SELECT
d.docName,
m.MaxCount
FROM
(SELECT
docname AS docName,
COUNT(docname) AS cnt
FROM
doctor
GROUP BY
docname
) d
INNER JOIN
(
SELECT
MAX(cnt) AS MaxCount
FROM
(SELECT
COUNT(docname) AS cnt
FROM
doctor
GROUP BY
docname
) c
) m ON m.MaxCount = d.cnt

Another alternative using CTE:
with cte_DocPatients
as
(
select docname, count(*) as patientCount
from doctor
group by docname
)
select docname, patientCount from
cte_DocPatients where
patientCount = (select max(patientCount) from cte_DocPatients)

If you do not need to care about performance, I think you can just sort and pick the first element. Something like this:
SELECT docname, COUNT(docname) as CNT
FROM doctor
WHERE docname = docname
GROUP BY docname
ORDER BY CNT DESC
LIMIT 1

This will give you each doctor name and respective count of treating patients
SELECT docname, COUNT(docname) as TreatingPatients FROM doctor
WHERE docname = docname
GROUP BY docname

Related

DISTINCT AND COUNT(*)=1 not working on SQL

I need to show the ID (which is unique in every case) and the name, which is sometimes different. In my code I only want to show the names IF they are unique.
I tried with both distinct and count(*)=1, nothing solves my problem.
SELECT DISTINCT id, name
FROM person
GROUP BY id, name
HAVING count(name) = 1;
The result is still showing the names multiple times
By "unique", I assume you mean names that only appear once. That is not what "distinct" means in SQL; the use of distinct is to remove duplicates (either for counting or in a result set).
If so:
SELECT MAX(id), name
FROM person
GROUP BY name
HAVING COUNT(*) = 1;
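A small check of why the original query still shows repeated names: since id is unique, grouping by id and name puts every row in its own group, so every group has a count of 1. The sketch below uses SQLite via Python's sqlite3 with made-up rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER, name TEXT)")
# Hypothetical rows: 'Ann' appears twice, 'Bob' and 'Cid' once each.
conn.executemany("INSERT INTO person VALUES (?, ?)",
                 [(1, "Ann"), (2, "Bob"), (3, "Ann"), (4, "Cid")])

# Grouping by (id, name): each group holds exactly one row, nothing is filtered.
by_id_and_name = conn.execute("""
    SELECT DISTINCT id, name
    FROM person
    GROUP BY id, name
    HAVING COUNT(name) = 1
    ORDER BY id
""").fetchall()

# Grouping by name only: groups with one row are the names that occur once.
by_name = conn.execute("""
    SELECT MAX(id), name
    FROM person
    GROUP BY name
    HAVING COUNT(*) = 1
    ORDER BY name
""").fetchall()

print(by_id_and_name)
print(by_name)
```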
If your DBMS supports it, you can use a window function:
SELECT id, name
FROM (
SELECT id, name, COUNT(*) OVER(PARTITION BY name) AS NameCount -- get count of each name
FROM person
) src
WHERE NameCount = 1
If not, you can do:
SELECT id, name
FROM person
WHERE name IN (
SELECT name
FROM person
GROUP BY name
HAVING COUNT(*) = 1 -- Only get names that occur once
)

SQL Query: Find the name of the company that has been assigned the highest number of patents

Using this query I can find the Company Assignee number for company with most patents but I can't seem to print the company name.
SELECT count(*), patent.assignee
FROM Patent
GROUP BY patent.assignee
HAVING count(*) =
(SELECT max(count(*))
FROM Patent
Group by patent.assignee);
COUNT(*) --- ASSIGNEE
9 19715
9 27895
Nesting above query into
SELECT company.compname
FROM company
WHERE ( company.assignee = ( *above query* ) );
would give an error "too many values" since there are two companies with most patents but above query takes only one assignee number in the WHERE clause. How do I solve this problem? I need to print name of BOTH companies with assignee number 19715 and 27895. Thank you.
You have started down the path of using nested queries. All you need to do is remove COUNT(*):
SELECT company.compname
FROM company
WHERE company.assignee IN
(SELECT patent.assignee
FROM Patent
GROUP BY patent.assignee
HAVING count(*) = (SELECT max(count(*))
FROM Patent
GROUP BY patent.assignee
)
);
I wouldn't write the query this way. The use of max(count(*)) is particularly jarring, but it is valid Oracle syntax.
Applying an aggregate function on another aggregate function (like max(count(*))) is illegal in many databases but I believe using the ALL operator instead and a join to get the company name would solve your problem.
Try this:
SELECT COUNT(*), p.assignee, c.compname
FROM Patent p
JOIN Company c ON c.assignee = p.assignee
GROUP BY p.assignee, c.compname
HAVING COUNT(*) >= ALL -- this predicate will return those rows
( -- for which the comparison holds true
SELECT COUNT(*) -- for all instances.
FROM Patent -- it can only be true for the highest count
GROUP BY assignee
);
Assuming you have Oracle, I thought about this a bit differently:
select
c.compname
from
company c
join
(
select
assignee,
dense_rank() over (order by count(1) desc) rnk
from
patent
group by
assignee
) p
on p.assignee = c.assignee
where
p.rnk = 1
;
I like this because it lets you find any rank. For example, if you want the top 3 you would just change p.rnk = 1 to p.rnk <= 3. If you want 10th place, you just change it to p.rnk = 10. Adding the total count and rank into the results would be easy from here too. Overall I think it's more versatile.
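Here is a rough sketch of the "any rank" idea, run on SQLite (3.25 or later) through Python's sqlite3 rather than Oracle. It is a variation, not the query above: the count is computed in a derived table first and ranked afterwards. The company names, the `patent_no` column and the helper `companies_up_to_rank` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE company (assignee INTEGER, compname TEXT);
    CREATE TABLE patent (patent_no INTEGER, assignee INTEGER);
    INSERT INTO company VALUES (19715, 'Acme'), (27895, 'Bolt'), (30001, 'Cog');
    INSERT INTO patent VALUES (1, 19715), (2, 19715),
                              (3, 27895), (4, 27895),
                              (5, 30001);
""")

def companies_up_to_rank(max_rank):
    """Names of companies whose dense rank by patent count is <= max_rank."""
    cursor = conn.execute("""
        SELECT c.compname
        FROM company c
        JOIN (SELECT assignee,
                     DENSE_RANK() OVER (ORDER BY cnt DESC) AS rnk
              FROM (SELECT assignee, COUNT(*) AS cnt
                    FROM patent
                    GROUP BY assignee) AS counts) AS p
          ON p.assignee = c.assignee
        WHERE p.rnk <= ?
        ORDER BY c.compname
    """, (max_rank,))
    return [name for (name,) in cursor]

top = companies_up_to_rank(1)            # all companies tied for first place
top_two_ranks = companies_up_to_rank(2)  # first and second place
print(top, top_two_ranks)
```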

Select entry of each group having exactly 1 entry

I am looking for an optimized query
let me show you a small example.
Let's suppose I have a table with three fields: studentId, teacherId and subject.
Now I want the rows where a physics teacher is teaching only one student, i.e.
teacher 300 is only teaching student 3, and so on.
What I have tried till now
select sid,tid from tabletesting with(nolock)
where tid in (select tid from tabletesting with(nolock)
where subject='physics' group by tid having count(tid) = 1)
and subject='physics'
The above query is working fine. But I want different solution in which I don't have to scan the same table twice.
I also tried using Rank() and Row_Number() but no result.
FYI :
I have showed you an example, this is not the actual table i am playing with, my table contain huge number of rows and columns and where clause is also very complex(i.e date comparison etc.), so I don't want to give the same where clause in subquery and outquery.
You can do this with window functions. Assuming that there are no duplicate students for a given teacher (as in your sample data):
select tt.sid, tt.tid
from (select tt.*, count(*) over (partition by tid) as scnt
from TableTesting tt
) tt
where scnt = 1;
Another way to approach this, which might be more efficient, is to use an exists clause:
select tt.sid, tt.tid
from TableTesting tt
where not exists (select 1 from TableTesting tt1 where tt1.tid = tt.tid and tt1.sid <> tt.sid)
Another option is to use an analytic function:
select sid, tid, subject from
(
select sid, tid, subject, count(sid) over (partition by subject, tid) cnt
from tabletesting
) X
where cnt = 1
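To compare the window-function and NOT EXISTS approaches on the same data, here is a minimal sketch using SQLite (3.25 or later) through Python's sqlite3. The rows are invented, since the original sample table is not shown. The subject = 'physics' conditions are my assumption about how the question's filter would be applied; they matter as soon as a teacher teaches more than one subject.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tabletesting (sid INTEGER, tid INTEGER, subject TEXT)")
# Hypothetical rows: teacher 100 has two physics students, teacher 300 has one
# physics student and one chemistry student, teacher 400 has one physics student.
conn.executemany("INSERT INTO tabletesting VALUES (?, ?, ?)", [
    (1, 100, "physics"), (2, 100, "physics"),
    (3, 300, "physics"), (4, 300, "chemistry"),
    (5, 400, "physics"),
])

# Window function: count students per (subject, teacher) in a single pass.
via_window = conn.execute("""
    SELECT sid, tid
    FROM (SELECT sid, tid, subject,
                 COUNT(sid) OVER (PARTITION BY subject, tid) AS cnt
          FROM tabletesting) AS x
    WHERE cnt = 1 AND subject = 'physics'
    ORDER BY sid
""").fetchall()

# NOT EXISTS: keep a row if no other student has the same teacher and subject.
via_not_exists = conn.execute("""
    SELECT tt.sid, tt.tid
    FROM tabletesting tt
    WHERE tt.subject = 'physics'
      AND NOT EXISTS (SELECT 1
                      FROM tabletesting tt1
                      WHERE tt1.tid = tt.tid
                        AND tt1.subject = tt.subject
                        AND tt1.sid <> tt.sid)
    ORDER BY tt.sid
""").fetchall()

print(via_window)
print(via_not_exists)
```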

UPDATE PostgreSQL table with values from self

I am attempting to update multiple columns on a table with values from another row in the same table:
CREATE TEMP TABLE person (
pid INT
, name VARCHAR(40)
, dob DATE
, younger_sibling_name VARCHAR(40)
, younger_sibling_dob DATE
);
INSERT INTO person (pid, name, dob) VALUES
(1, 'John' , '1980-01-05')
, (2, 'Jimmy', '1975-04-25')
, (3, 'Sarah', '2004-02-10')
, (4, 'Frank', '1934-12-12')
;
The task is to populate younger_sibling_name and younger_sibling_dob with the name and birthday of the person that is closest to them in age, but not older or the same age.
I can set the younger sibling dob easily because this is the value that determines the record to use with a correlated subquery (I think this is an example of that?):
UPDATE person SET younger_sibling_dob = (
SELECT MAX(dob)
FROM person AS sibling
WHERE sibling.dob < person.dob);
I just can't see any way to get the name?
The real query of this will run over about 1M rows in groups of 100-500 for each MAX selection so performance is a concern.
Edit
After trying many different approaches, I've decided on this one which I think is a good balance of being able to verify the data with the intermediate result, shows the intention of what the logic is, and performs adequately:
WITH sibling AS (
SELECT person.pid, sibling.dob, sibling.name,
row_number() OVER (PARTITION BY person.pid
ORDER BY sibling.dob DESC) AS age_closeness
FROM person
JOIN person AS sibling ON sibling.dob < person.dob
)
UPDATE person
SET younger_sibling_name = sibling.name
,younger_sibling_dob = sibling.dob
FROM sibling
WHERE person.pid = sibling.pid
AND sibling.age_closeness = 1;
SELECT * FROM person ORDER BY dob;
Rewrite 2022
I expect your added solution to perform poorly, as it's doing a lot of unnecessary work. The following should be much faster.
The question and the added solution do not define which row to pick when there are multiple with the same dob. Typically you'll want a deterministic pick. This query picks the alphabetically first name from each group of peers with the same dob. Adapt to your needs.
UPDATE person p
SET younger_sibling_name = y.name
, younger_sibling_dob = y.dob
FROM (
SELECT dob, name, lead(dob) OVER (ORDER BY dob) AS next_dob
FROM (
SELECT DISTINCT ON (dob)
dob, name
FROM person p
ORDER BY dob, name -- ①
) sub
) y
WHERE p.dob = y.next_dob;
Works since at least Postgres 8.4.
Needs an index on dob to be fast, ideally a multicolumn index on (dob, name).
Subquery sub passes over the whole table once and distills distinct rows per dob.
① I added name to ORDER BY as tiebreaker to pick the row with the alphabetically first name. Adapt to your needs.
In the outer SELECT add the next later dob (next_dob) to each row with lead() - simple now with distinct dob. Then join to that next_dob and the rest is simple.
If no younger person exists, no UPDATE happens and the columns stay NULL.
About DISTINCT ON and possibly faster query techniques for many duplicates:
Select first row in each GROUP BY group?
Optimize GROUP BY query to retrieve latest row per user
Taking dob and name from the same row guarantees we stay in sync. Multiple correlated subqueries would not offer this guarantee, and would be more expensive anyway.
Original answer
Still valid.
Old query 1
WITH cte AS (
SELECT *, dense_rank() OVER (ORDER BY dob) AS drk
FROM person
)
UPDATE person p
SET younger_sibling_name = y.name
, younger_sibling_dob = y.dob
FROM cte x
JOIN (SELECT DISTINCT ON (drk) * FROM cte) y ON y.drk = x.drk - 1
WHERE x.pid = p.pid;
In the CTE cte use the window function dense_rank() to get a rank without gaps according to the dob for every person.
Join cte to itself, but remove duplicates on dob from the second instance. Thereby everybody gets exactly one UPDATE. If more than one person shares the same dob, the same one is selected as younger sibling for all persons on the next dob. I do this with:
(SELECT DISTINCT ON (drk) * FROM cte)
Add ORDER BY drk, ... to this subquery to pick a particular person for every dob.
Old query 2
WITH cte AS (
SELECT dob, min(name) AS name
, row_number() OVER (ORDER BY dob) rn
FROM person p
GROUP BY dob
)
UPDATE person p
SET younger_sibling_name = y.name
, younger_sibling_dob = y.dob
FROM cte x
JOIN cte y ON y.rn = x.rn - 1
WHERE x.dob = p.dob;
This works, because aggregate functions are applied before window functions. And it should be very fast since both operations agree on the sort order.
Obviates the need for a later DISTINCT like in query 1.
Result is the same as query 1, exactly.
Again, you can add more columns to ORDER BY to pick a particular person for every dob.
1) Finding the MAX() can always be rewritten in terms of NOT EXISTS (...)
UPDATE person dst
SET younger_sibling_name = src.name
,younger_sibling_dob = src.dob
FROM person src
WHERE (src.dob < dst.dob
OR (src.dob = dst.dob AND src.pid < dst.pid))
AND NOT EXISTS (
SELECT * FROM person nx
WHERE (nx.dob < dst.dob
OR (nx.dob = dst.dob AND nx.pid < dst.pid))
AND (nx.dob > src.dob
OR (nx.dob = src.dob AND nx.pid > src.pid))
);
2) Instead of rank() / row_number(), you could also use a LAG() function over the WINDOW:
UPDATE person dst
SET younger_sibling_name = src.name
,younger_sibling_dob = src.dob
FROM (
SELECT pid
, LAG(name) OVER win AS name
, LAG(dob) OVER win AS dob
FROM person
WINDOW win AS (ORDER BY dob, pid)
) src
WHERE src.pid = dst.pid
;
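To check what the LAG() subquery produces before joining it back in the UPDATE, the inner SELECT can be run on its own. This sketch uses the question's four rows on SQLite (3.25 or later) through Python's sqlite3 instead of PostgreSQL; the trailing ORDER BY pid is only there to make the printed output stable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (pid INT, name VARCHAR(40), dob DATE)")
conn.executemany("INSERT INTO person (pid, name, dob) VALUES (?, ?, ?)", [
    (1, "John", "1980-01-05"),
    (2, "Jimmy", "1975-04-25"),
    (3, "Sarah", "2004-02-10"),
    (4, "Frank", "1934-12-12"),
])

# For each person, fetch name and dob of the previous row in (dob, pid) order.
pairs = conn.execute("""
    SELECT pid
         , LAG(name) OVER win AS name
         , LAG(dob)  OVER win AS dob
    FROM person
    WINDOW win AS (ORDER BY dob, pid)
    ORDER BY pid
""").fetchall()

for row in pairs:
    print(row)
```

The earliest-born person has no previous row, so both LAG() values come back as NULL. Note that with this window two people sharing a dob are treated as neighbours, ordered by pid.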
Both versions require a self-joined subquery (or CTE) because UPDATE does not allow window functions.
To get the dob and name, you can do:
update person
set younger_sibling_dob = (select dob
from person p2
where p2.dob < person.dob
order by dob desc
limit 1),
younger_sibling_name = (select name
from person p2
where p2.dob < person.dob
order by dob desc
limit 1);
If you have an index on dob, then the query will run faster.
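A minimal sketch of the two-subquery UPDATE, run on SQLite through Python's sqlite3 with the rows from the question. Adding name as a second sort key in both subqueries is my assumption; it keeps the two picks on the same row when several people share a dob.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE person (
        pid INT,
        name VARCHAR(40),
        dob DATE,
        younger_sibling_name VARCHAR(40),
        younger_sibling_dob DATE
    )
""")
conn.executemany("INSERT INTO person (pid, name, dob) VALUES (?, ?, ?)", [
    (1, "John", "1980-01-05"),
    (2, "Jimmy", "1975-04-25"),
    (3, "Sarah", "2004-02-10"),
    (4, "Frank", "1934-12-12"),
])

# Both subqueries use the same filter and the same ordering, so they select
# dob and name from the same row.
conn.execute("""
    UPDATE person
    SET younger_sibling_dob = (SELECT p2.dob
                               FROM person AS p2
                               WHERE p2.dob < person.dob
                               ORDER BY p2.dob DESC, p2.name
                               LIMIT 1),
        younger_sibling_name = (SELECT p2.name
                                FROM person AS p2
                                WHERE p2.dob < person.dob
                                ORDER BY p2.dob DESC, p2.name
                                LIMIT 1)
""")

result = conn.execute("""
    SELECT name, younger_sibling_name, younger_sibling_dob
    FROM person
    ORDER BY dob
""").fetchall()

for row in result:
    print(row)
```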

How do I create a SQL Distinct query and add some additional fields

I have the following query that selects combinations of first and last names and shows me dupes. It works, no problems here.
I want to include three other fields for reference; Id, cUser, and cDate. These additional fields, however, should not be used to determine duplicates as I'd likely not end up with any duplicates.
SELECT * FROM
(SELECT FirstName, LastName, COUNT(*) as "Count"
FROM Contacts
WHERE ContactTypeID = 1
GROUP BY LastName,FirstName
) AS X
WHERE COUNT > 1
ORDER BY COUNT DESC
Any suggestions? Thanks!
SELECT *
FROM (
SELECT *, COUNT(*) OVER (PARTITION BY FirstName, LastName) AS cnt
FROM Contacts
WHERE ContactTypeId = 1
) q
WHERE cnt > 1
ORDER BY
cnt DESC
This will return all fields for each of the duplicated records.
If these fields are always the same, then you can include them in GROUP BY and it will not affect the detection of duplicates.
If they are not, then you must decide what kind of aggregate function to apply to them. For example, MAX() or MIN() would work and would give you some indication of which values are associated with the duplicates.
Otherwise, if you want to see all of the records, you can join back to the source:
SELECT X2.* FROM
(SELECT FirstName, LastName, COUNT(*) as "Count"
FROM Contacts
WHERE ContactTypeID = 1
GROUP BY LastName,FirstName
) AS X INNER JOIN Contacts X2 ON X.LastName = X2.LastName AND X.FirstName = X2.FirstName
WHERE X."Count" > 1
ORDER BY X."Count" DESC
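Here is a small sketch of the join-back approach on SQLite via Python's sqlite3. The rows are invented, and the alias `DupCount` replaces "Count" only to avoid quoting. One detail that is my own assumption: the joined copy of Contacts is also filtered on ContactTypeID = 1, otherwise rows of other contact types with the same name would be pulled in.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Contacts (
        Id INTEGER, FirstName TEXT, LastName TEXT,
        cUser TEXT, cDate TEXT, ContactTypeID INTEGER
    )
""")
# Hypothetical rows: 'Ann Lee' is duplicated within type 1; Id 4 is type 2.
conn.executemany("INSERT INTO Contacts VALUES (?, ?, ?, ?, ?, ?)", [
    (1, "Ann", "Lee", "u1", "2020-01-01", 1),
    (2, "Ann", "Lee", "u2", "2020-02-01", 1),
    (3, "Bob", "Ray", "u1", "2020-03-01", 1),
    (4, "Ann", "Lee", "u3", "2020-04-01", 2),
])

# Detect duplicates on the name only, then join back for the reference fields.
dupes = conn.execute("""
    SELECT X2.Id, X2.FirstName, X2.LastName, X2.cUser, X2.cDate, X.DupCount
    FROM (SELECT FirstName, LastName, COUNT(*) AS DupCount
          FROM Contacts
          WHERE ContactTypeID = 1
          GROUP BY LastName, FirstName) AS X
    INNER JOIN Contacts AS X2
            ON X.LastName = X2.LastName AND X.FirstName = X2.FirstName
    WHERE X.DupCount > 1
      AND X2.ContactTypeID = 1
    ORDER BY X2.Id
""").fetchall()

for row in dupes:
    print(row)
```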