How to create a good request with "max(count(*))"? - sql

I need to find the scientist who has taken part in the most missions. I tried this query, but it did not work:
select name
from scientist, mission
where mission.nums = chercheur.nums
having count(*) = (select max(count(numis)) from mission, scientist where
mission.nums = chercheur.nums
group by name)
I have modified this query several times, but I only get errors (ora-0095 and ora-0096, if I remember correctly).
I created my tables with:
CREATE TABLE Scientist
(NUMS NUMBER(8),
NAME VARCHAR2 (15),
CONSTRAINT CP_CHER PRIMARY KEY (NUMS));
CREATE TABLE MISSION
(NUMIS NUMBER(8),
Country VARCHAR2 (15),
NUMS NUMBER(8),
CONSTRAINT CP_MIS PRIMARY KEY (NUMIS),
CONSTRAINT CE_MIS FOREIGN KEY (NUMS) REFERENCES SCIENTIST (NUMS));

You could count the missions each scientist participated in, and wrap that query in a query with a window function that will rank them according to their participation:
SELECT name
FROM (SELECT name, RANK() OVER (ORDER BY cnt DESC) AS rk
FROM (SELECT name, COUNT(*) AS cnt
FROM scientist s
JOIN mission m ON s.nums = m.nums
GROUP BY name) t
) q
WHERE rk = 1

Step 0: Format your code :-) It makes the query much easier to read.
Step 1: Count Numis per Nums in the Mission table. This tells you how many missions each scientist (Nums) took part in. This is done in the CTE block cnt_by_nums.
Step 2: Get the name of each scientist by joining cnt_by_nums with the scientist table.
Step 3: Keep only those scientists whose cnt_missions equals the maximum cnt_missions in cnt_by_nums.
with cnt_by_nums
as (select Nums,count(Numis) as cnt_missions
from mission
group by Nums
)
select a.Nums,max(b.Name) as name
from cnt_by_nums a
join scientist b
on a.Nums=b.Nums
group by a.Nums
having max(a.cnt_missions)=(select max(a1.cnt_missions) from cnt_by_nums a1)

I'd write a query like this:
SELECT NAME, COUNTER
FROM
(SELECT NAME, COUNT(M.NUMIS) AS COUNTER
FROM SCIENTIST S
LEFT JOIN MISSION M
ON S.NUMS=M.NUMS
GROUP BY NAME) NUM
INNER JOIN
(SELECT MAX(COUNTER) AS MAX_COUNTER FROM
(SELECT NAME, COUNT(M.NUMIS) AS COUNTER
FROM SCIENTIST S
LEFT JOIN MISSION M
ON S.NUMS=M.NUMS
GROUP BY NAME) C) MAX
ON NUM.COUNTER=MAX.MAX_COUNTER;
(It works on MySQL; I hope it behaves the same in Oracle.)

As you don't select the name of your scientist (only count their missions) you don't need to join those tables within the subquery. Grouping over the foreign key would be sufficient:
select count(numis) from mission group by nums
Your column names are a bit weird, but that's your choice ;-)
Selecting only the scientist with the most mission references can be achieved in two ways. One way is your approach, where you may get multiple scientists if they share the same maximum number of missions.
The first problem in your query is that you are checking an aggregate (HAVING COUNT(*) = ) without grouping. You are only grouping your subselect.
Second, in most databases you cannot aggregate an aggregate (MAX(COUNT)), but you can select only the first row of that subselect ordered by its size, or select its max by subquerying the subquery.
Approach that fetches only the first row of the subquery:
select s.name from scientist s, mission m
where m.nums = s.nums
group by name
having count(*) =
(select count(numis) from mission
group by nums
order by 1 desc
fetch first 1 row only)
Approach with double subquery:
select s.name from scientist s, mission m
where m.nums = s.nums
group by name having count(*) =
(select max(numis) from
(select count(numis) numis from mission group by nums)
)
The second way is to apply FETCH FIRST to your final result, but this gives you exactly 1 scientist even if several share the same maximum number of missions:
select s.name from scientist s, mission m
where m.nums = s.nums
group by name
order by count(*) desc
fetch first 1 row only
Doing a cartesian product is not state of the art, but the optimizer will turn it into a proper join given the join condition in the WHERE clause.
Made these with IBM Db2 but should also work on Oracle.
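The double-subquery approach can be tried out quickly. The following is a minimal sketch, assuming SQLite (through Python's sqlite3 module) as a stand-in for Oracle or Db2; the sample rows and the helper name top_scientists are invented for illustration.

```python
import sqlite3


def top_scientists(conn):
    """Return the names of all scientists tied for the most missions."""
    sql = """
        SELECT s.name
        FROM scientist s
        JOIN mission m ON m.nums = s.nums
        GROUP BY s.nums, s.name
        HAVING COUNT(*) = (
            -- take the max over the per-scientist counts in a derived
            -- table instead of writing MAX(COUNT(*)) directly
            SELECT MAX(cnt)
            FROM (SELECT COUNT(*) AS cnt FROM mission GROUP BY nums)
        )
        ORDER BY s.name
    """
    return [row[0] for row in conn.execute(sql)]


# Hypothetical sample data: Ada and Marie are tied with two missions each.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE scientist (nums INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE mission (numis INTEGER PRIMARY KEY, country TEXT,
                          nums INTEGER REFERENCES scientist (nums));
    INSERT INTO scientist VALUES (1, 'Ada'), (2, 'Marie'), (3, 'Niels');
    INSERT INTO mission VALUES
        (10, 'Chile', 1), (11, 'Peru', 1),
        (12, 'Kenya', 2), (13, 'Japan', 2),
        (14, 'Norway', 3);
""")
print(top_scientists(conn))
```

Grouping by s.nums as well as s.name keeps two different scientists who happen to share a name from being counted together.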

If you want one row, then in Oracle 12+, you can do:
SELECT name, COUNT(*) AS cnt
FROM scientist s JOIN
mission m
ON s.nums = m.nums
GROUP BY name
ORDER BY COUNT(*) DESC
FETCH FIRST 1 ROW ONLY;
In earlier versions, you would generally use a subquery:
SELECT sm.*
FROM (SELECT name, COUNT(*) AS cnt
FROM scientist s JOIN
mission m
ON s.nums = m.nums
GROUP BY name
ORDER BY COUNT(*) DESC
) sm
WHERE rownum = 1;
If you want ties, then generally window functions would be a simple solution:
SELECT sm.*
FROM (SELECT name, COUNT(*) AS cnt,
RANK() OVER (ORDER BY COUNT(*) DESC) as seqnum
FROM scientist s JOIN
mission m
ON s.nums = m.nums
GROUP BY name
ORDER BY COUNT(*) DESC
) sm
WHERE seqnum = 1;
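To see the difference between the one-row and the ties variants, here is a small sketch. It assumes SQLite 3.25 or later (through Python's sqlite3 module) instead of Oracle, so LIMIT 1 stands in for FETCH FIRST 1 ROW ONLY, and the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE scientist (nums INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE mission (numis INTEGER PRIMARY KEY, nums INTEGER);
    INSERT INTO scientist VALUES (1, 'Ada'), (2, 'Marie'), (3, 'Niels');
    INSERT INTO mission VALUES (10, 1), (11, 1), (12, 2), (13, 2), (14, 3);
""")

# One row only: a tie is broken by name here, so Marie is silently dropped.
one_row = conn.execute("""
    SELECT s.name, COUNT(*) AS cnt
    FROM scientist s JOIN mission m ON s.nums = m.nums
    GROUP BY s.name
    ORDER BY cnt DESC, s.name
    LIMIT 1
""").fetchall()

# With ties: RANK() gives every scientist with the top count the value 1.
with_ties = conn.execute("""
    SELECT name, cnt
    FROM (SELECT name, cnt, RANK() OVER (ORDER BY cnt DESC) AS seqnum
          FROM (SELECT s.name, COUNT(*) AS cnt
                FROM scientist s JOIN mission m ON s.nums = m.nums
                GROUP BY s.name))
    WHERE seqnum = 1
    ORDER BY name
""").fetchall()

print(one_row)
print(with_ties)
```

The count is computed in an inner derived table and ranked one level up, which avoids mixing the aggregate and the window function in the same SELECT.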

Related

Returning the entity with max number of participants

I'm using pgsql to find the cage number (cno) that holds the largest number of animals but doesn't have a bird in it.
The way I tried to do it is by creating a table that counts the number of animals in each cage (not including those with birds) and then return the ones where the count equals the max value.
select temp.cno,temp.size
from
(select cage.cno,cage.size,count(*) as q
from cage,animal
where cage.cno = animal.cno and cage.cno not in (select cno from animal where lower(atype)='bird')
group by cage.cno,cage.size) as temp
where temp.q = (select max(q) from temp)
I'm getting the following error message
ERROR: relation "temp" does not exist
LINE 7: where temp.q = (select max(q) from temp)
Any idea how to overcome this issue? Why isn't temp recognized within the last sub query?
Here are the tables
cage (cno, type, size)
animal (aid, aname, cno, atype)
You already found out that a subquery defined in the FROM is not visible inside another subquery defined in the WHERE clause.
This is easily solvable with the use of a CTE (with a proper join):
WITH temp AS (
SELECT c.cno, c.size, COUNT(*) AS q
FROM cage c INNER JOIN animal a
ON a.cno = c.cno
WHERE c.cno NOT IN (SELECT cno FROM animal WHERE LOWER(atype) = 'bird')
GROUP BY c.cno, c.size
)
SELECT cno, size
FROM temp
WHERE q = (SELECT MAX(q) FROM temp);
But, if there is a case that in a cage exist animals of more than one type then the condition:
c.cno NOT IN (SELECT cno FROM animal WHERE LOWER(atype) = 'bird')
is not correct, because it returns all cages which contain other types than birds without a restriction that there are only other types than birds.
You can apply this restriction with aggregation.
If you want/expect only 1 cage as result:
SELECT c.cno, c.size
FROM cage c INNER JOIN animal a
ON a.cno = c.cno
GROUP BY c.cno
HAVING MAX((LOWER(a.atype) = 'bird')::int) = 0
ORDER BY COUNT(*) DESC LIMIT 1;
If you want all cages that are tied for the largest number of animals, use the RANK() window function:
WITH cte AS (
SELECT c.cno, c.size,
RANK() OVER (ORDER BY COUNT(*) DESC) rnk
FROM cage c INNER JOIN animal a
ON a.cno = c.cno
GROUP BY c.cno
HAVING MAX((LOWER(a.atype) = 'bird')::int) = 0
)
SELECT cno, size FROM cte WHERE rnk = 1;
Note that since cno is the PRIMARY KEY of cage you only need to group by cno.
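The aggregation-based filter can be checked on a few rows. This is only a sketch under assumptions: it runs on SQLite through Python's sqlite3 module rather than Postgres, it uses a CASE expression in place of the Postgres-specific ::int cast, and the cages and animals are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cage (cno INTEGER PRIMARY KEY, type TEXT, size INTEGER);
    CREATE TABLE animal (aid INTEGER PRIMARY KEY, aname TEXT,
                         cno INTEGER, atype TEXT);
    INSERT INTO cage VALUES (1, 'indoor', 10), (2, 'outdoor', 20),
                            (3, 'indoor', 30);
    INSERT INTO animal VALUES
        (1, 'Tweety', 1, 'Bird'), (2, 'Tom', 1, 'cat'),
        (3, 'Rex', 1, 'dog'), (4, 'Nemo', 1, 'fish'),
        (5, 'Dolly', 2, 'sheep'), (6, 'Felix', 2, 'cat'),
        (7, 'Spot', 2, 'dog'), (8, 'Kitty', 3, 'cat');
""")

# A cage passes the HAVING test only if none of its animals is a bird:
# the flag is 1 for birds and 0 otherwise, so its maximum must be 0.
best_cage = conn.execute("""
    SELECT c.cno, c.size
    FROM cage c JOIN animal a ON a.cno = c.cno
    GROUP BY c.cno, c.size
    HAVING MAX(CASE WHEN LOWER(a.atype) = 'bird' THEN 1 ELSE 0 END) = 0
    ORDER BY COUNT(*) DESC
    LIMIT 1
""").fetchall()

print(best_cage)
```

Cage 1 holds the most animals but contains a bird, so the bird-free cage with the next highest count is returned.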
I solved it by ordering the results descending and using limit 1 to show the first row (which is the max)

SQL aggregate functions, inner join

I am writing a SQL query to get the SID and SNAME. In this task, I need to find which team has won the most leagues and then find the corresponding SID.
Leagues(LID, CHAMPION_TID)
LID: League ID ; CHAMPION_TID: champion team ID
SUPPORT(SID, LID)
SPONSORS(SID, SNAME)
PRIMARY KEY: LID,SID
So far, I can find the maximum number of leagues won by one team with the following SQL:
SELECT
MAX(y.cham)
FROM
(SELECT
CHAMPION_TID, COUNT(L.CHAMPION_TID) AS cham
FROM
LEAGUES L
GROUP BY
L.CHAMPION_TID) y, LEAGUES L
WHERE
y.CHAMPION_TID = L.CHAMPION_TID;
I am confused about the next step. My idea is to get the LID, then join the tables to display SID and SNAME. But I am stuck at this step.
SELECT L.LID, MAX(y.cham)
FROM
(SELECT CHAMPION_TID, COUNT(L.CHAMPION_TID) AS cham
FROM LEAGUES L
GROUP BY L.CHAMPION_TID) y, LEAGUES L
WHERE
y.CHAMPION_TID = L.CHAMPION_TID
You can use the following to find the Sponsor ID and Sponsor Name:
SELECT DISTINCT
sp.SID,
sp.SNAME
FROM
LEAGUES l3
INNER JOIN support s ON
l3.LID = s.LID
INNER JOIN SPONSORS sp ON
s.SID = sp.SID
WHERE
l3.CHAMPION_TID IN (
SELECT
l2.CHAMPION_TID
FROM
LEAGUES l2
GROUP BY
l2.CHAMPION_TID
HAVING
count(l2.CHAMPION_TID) = (
SELECT
count(l1.CHAMPION_TID)
FROM
LEAGUES l1
GROUP BY
l1.CHAMPION_TID
ORDER BY
count(l1.CHAMPION_TID) DESC
FETCH FIRST 1 ROW ONLY
)
);
It finds the count of CHAMPION_TID in LEAGUES, orders it by desc (such that the highest count is always on top), then uses it to find the associated CHAMPION_TID. It handles ties for max(count(CHAMPION_TID)) as well :)
If fetch first 1 row only does not work, you can use select top 1 l1.CHAMPION_TID...
Here is a working demo using Postgres.

Mysql subquery with "in" problem to associate parent table

I am trying to write a query that selects the contact information (table invoice_contacts) and, for each contact, the associated address (table invoice_adresses) that is used most often in the table invoice_compta.
For example, I have two contacts:
Mike
John
Mike has 2 addresses:
Paris
London
Mike has 1 invoice with Paris and 5 invoices with London, so I want the London address associated with Mike.
I tried this query with a subquery that counts, per address, the invoices associated with the contact (NB_ADRESSES) and selects only the largest (order by NB_ADRESSES desc and limit 1). It seems right, but I get an error on where ia2.ID_CONTACT = ic.ID_CONTACT: ic.ID_CONTACT is not found (and I need to correlate the subquery with the contact).
select ic.*,
ia.*
from invoice_contacts ic
left join invoice_adresses ia on ia.ID_CONTACT = ic.ID_CONTACT
and ia.ID_ADRESSE in (
select ia3.ID_ADRESSE
from (
select ia2.ID_ADRESSE,
count(*) as NB_ADRESSES
from invoice_adresses ia2
left join invoice_comptas ico on ico.ID_ADRESSE_CONTACT = ia2.ID_ADRESSE
where ia2.ID_CONTACT = ic.ID_CONTACT
group by ia2.ID_ADRESSE
order by NB_ADRESSES desc
limit 1
) as ia3
)
group by ic.ID_CONTACT
order by CONTACT_TITRE asc
I have also tried "exists" or "inner join" instead of "in", but I did not get good results, so this query seems to be the best approach for me, but I have not found the solution.
I hope you can help me :)
Thanks
UPDATE:
I finally found a solution with this query:
select ic.*,
ia.*
from invoice_contacts ic
left join invoice_adresses ia on ia.ID_CONTACT = ic.ID_CONTACT
and ia.ID_ADRESSE = (
select ia3.ID_ADRESSE
from (
select ia2.*,
count(*) as NB_ADRESSES
from invoice_adresses ia2
left join invoice_comptas ico on ico.ID_ADRESSE_CONTACT = ia2.ID_ADRESSE
group by ia2.ID_ADRESSE
) as ia3
where ia3.ID_CONTACT = ic.ID_CONTACT
order by NB_ADRESSES desc
limit 1
)
group by ic.ID_CONTACT
order by CONTACT_TITRE asc
Thanks
Let me rephrase the problem as finding the most common contact/address combination for a given invoice.
I find it hard to follow your query and your table naming. But this is the idea:
select contact, address
from (select contact, address, count(*) as cnt,
row_number() over (partition by contact order by count(*) desc) as seqnum
from invoices
group by contact, address
) ca
where seqnum = 1;
The subquery counts the number of times a given address (or city, if you prefer) occurs for each contact. row_number() enumerates these, so the most common one gets the value "1". The outer query then chooses the most common value.
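A quick way to try the idea, as a sketch only: it assumes SQLite 3.25 or later through Python's sqlite3 module rather than MySQL, and the simplified invoices(contact, address) table and its rows are hypothetical, mirroring the Mike/Paris/London example from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoices (contact TEXT, address TEXT)")

# Mike has 1 invoice for Paris and 5 for London; John has 1 for Berlin.
rows = [("Mike", "Paris")] + [("Mike", "London")] * 5 + [("John", "Berlin")]
conn.executemany("INSERT INTO invoices VALUES (?, ?)", rows)

# Count invoices per (contact, address), number the addresses of each
# contact from most to least used, and keep number 1 for every contact.
most_used = conn.execute("""
    SELECT contact, address
    FROM (SELECT contact, address,
                 ROW_NUMBER() OVER (PARTITION BY contact
                                    ORDER BY cnt DESC) AS seqnum
          FROM (SELECT contact, address, COUNT(*) AS cnt
                FROM invoices
                GROUP BY contact, address))
    WHERE seqnum = 1
    ORDER BY contact
""").fetchall()

print(most_used)
```

Because ROW_NUMBER() never assigns the same number twice within a partition, each contact gets exactly one address even when two addresses are used equally often.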

delete duplicates and retain MAX(id) mysql

I have code that lists all the duplicates of the data in the database:
SELECT MAX(id) id
FROM el_student_class_relation
GROUP BY student_id, class_id
HAVING COUNT(*) > 1
Now I'm trying to retain the MAX(id) and delete the rest of the duplicates.
I tried this code:
DELETE us
FROM el_student_class_relation us
INNER JOIN(SELECT MAX(id) id
FROM el_student_class_relation
GROUP BY student_id, class_id HAVING COUNT(*) > 1) t ON t.id = us.id
But it deletes the MAX(id) and retains the other duplicates, which is the opposite of what I want.
Try this
DELETE FROM el_student_class_relation
WHERE id not in
(
SELECT * from
(SELECT MAX(id) id
FROM el_student_class_relation
GROUP BY student_id, class_id) temp_tbl
)
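Please note:
Do not use HAVING COUNT(*) > 1 in the inner query.
It would cause a problem for a (student_id, class_id) pair that has only a single record: that record's id would be missing from the subquery result, so the row would be deleted.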
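The effect of this DELETE can be verified on a handful of rows. The sketch below assumes SQLite through Python's sqlite3 module, which does not need the extra derived-table wrapper that MySQL requires; the sample ids are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE el_student_class_relation (
        id INTEGER PRIMARY KEY, student_id INTEGER, class_id INTEGER);
    INSERT INTO el_student_class_relation VALUES
        (1, 100, 1), (2, 100, 1), (3, 100, 1),  -- three copies, keep id 3
        (4, 200, 1),                            -- unique pair, must survive
        (5, 200, 2), (6, 200, 2);               -- two copies, keep id 6
""")

# Keep the highest id of every (student_id, class_id) pair and delete
# everything else. There is no HAVING COUNT(*) > 1, so id 4 stays in
# the keep list.
conn.execute("""
    DELETE FROM el_student_class_relation
    WHERE id NOT IN (SELECT MAX(id)
                     FROM el_student_class_relation
                     GROUP BY student_id, class_id)
""")

remaining = [row[0] for row in conn.execute(
    "SELECT id FROM el_student_class_relation ORDER BY id")]
print(remaining)
```

Adding HAVING COUNT(*) > 1 back into the subquery would drop id 4 from the keep list, and the DELETE would then remove that non-duplicated row as well.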
You might try the following query that deletes all elements for which another one with a higher ID (and same class and student) exists:
DELETE
FROM el_student_class_relation el1
WHERE EXISTS (SELECT el2.id
FROM el_student_class_relation el2
WHERE el1.student_id = el2.student_id
AND el1.class_id = el2.class_id
AND el2.id > el1.id);
The direct fix for your query is to use an "anti-join", where NOT joining is the important feature. This can be done with LEFT JOIN.
DELETE
us
FROM
el_student_class_relation us
LEFT JOIN
(
SELECT student_id, class_id, MAX(id) id
FROM el_student_class_relation
GROUP BY student_id, class_id
-- HAVING COUNT(*) > 1 [Don't do this, you need to return ALL the rows you want to keep]
)
gr
ON gr.id = us.id
WHERE
gr.id IS NULL -- WHERE there wasn't a match in the "good rows" table
EDIT MariaDB and MySQL aren't the same thing. MariaDB DOES allow self joins on the table being deleted from.
In older MySQL versions, a subquery in a DELETE works a little differently: you have to wrap it in one more layer than would otherwise be required.
DELETE FROM el_student_class_relation
WHERE id not in
(
select * from (
SELECT MAX(id) id
FROM el_student_class_relation
GROUP BY student_id, class_id
) t1
)

How to select the highest value after a count() | Sql Oracle

This is my query:
SELECT f.name, COUNT(*) as num_books
from author f
JOIN book b on b.tittle = f.book
Group by f.name
Which gives me this table:
NAME NUM_BOOKS
-------------------------------------------------- ----------
Dyremann 2
Nam mann 1
Thomas 1
Asgeir 1
Tullemann 5
Plantemann 1
Beste forfatter 1
Fagmann 5
Lars 1
Hans 1
Svein Arne 1
How could I easily alter the query to display only the author with the highest number of released books? (Keep in mind that I'm rather new to SQL.)
Oracle, and as far as I know - only Oracle, allows you to nest two aggregate functions.
SELECT max (f.name) keep (dense_rank last order by count (*)) as name
from author f
JOIN book b on b.tittle = f.book
Group by f.name
In order to get ALL top authors:
select name
from (SELECT f.name,rank () over (order by count(*) desc) as rnk
from author f
JOIN book b on b.tittle = f.book
Group by f.name
)
where rnk = 1
Since Oracle 12c:
SELECT f.name
from author f
JOIN book b on b.tittle = f.book
Group by f.name
order by count (*) desc
fetch first row only /* replace ONLY with WITH TIES to get all top authors */
The best way to do this is:
SELECT f.name, COUNT(*) as num_books
from author f
JOIN book b on b.tittle = f.book
Group by f.name
Order by num_books DESC
FETCH FIRST ROW ONLY
This will order the results from biggest to smallest and return the first result.
1) Oracle Specific : ( Using ROWNUM, For Postgres/MySql use limit )
select * from
(SELECT f.name, COUNT(*) as num_books
from author f
JOIN book b on b.tittle = f.book
Group by f.name order by num_books desc )
where ROWNUM = 1
2) General Query for all databases :
select f.name,count(*) as max_num_books from author f
JOIN book b on b.tittle = f.book
Group by f.name
having count(*) =
(select max(num_books)
from
(SELECT f.name, COUNT(*) as num_books
from author f
JOIN book b on b.tittle = f.book
Group by f.name)
);
I am not sure why you need a join in the first place. It appears that the author table has a column book - why is it not enough to count(book) from that table, grouping by name? This arrangement is very strange - the author table should only have author properties, the author name should be in the title table, but you do join on author.book = book.title which seems to suggest that you do, in fact, have that strange arrangement (and therefore you don't need a join). Also, having a table and a column (in another table) share the same name, book, is a practice best to be avoided.
The most elementary solution (not the most efficient though), in this case, is
select name, count(book) as max_num_books
from author
group by name
having count(book) = (select max(count(book)) from author group by name);
The subquery groups by name, and then it selects the max over all group counts. The outer query selects the names that have a book count equal to this maximum. The subquery returns a single row in a single column - a single value. Such a query is called a "scalar" subquery and can be used wherever a single value is needed, such as the HAVING clause of the outer query. (It's in the HAVING clause and not a WHERE clause, since it refers to group properties - count(book) - and not to individual row properties).
The more efficient solution is as Dudu showed:
select name, ct as max_num_books
from ( select name, count(*) as ct, rank() over (order by count(*) desc) rnk
from author
group by name
)
where rnk = 1;