delete duplicates and retain MAX(id) mysql

I have a query that lists the highest id of each group of duplicate rows in my table:
SELECT MAX(id) id
FROM el_student_class_relation
GROUP BY student_id, class_id
HAVING COUNT(*) > 1
Now I want to keep the row with MAX(id) in each group and delete the rest of the duplicates.
I tried this:
DELETE us
FROM el_student_class_relation us
INNER JOIN(SELECT MAX(id) id
FROM el_student_class_relation
GROUP BY student_id, class_id HAVING COUNT(*) > 1) t ON t.id = us.id
But it deletes the row with MAX(id) and keeps the other duplicates, which is the opposite of what I want.

Try this
DELETE FROM el_student_class_relation
WHERE id not in
(
SELECT * from
(SELECT MAX(id) id
FROM el_student_class_relation
GROUP BY student_id, class_id) temp_tbl
)
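Please note:
Do not use HAVING COUNT(*) > 1 in the inner query.
The inner query must return the id of every row you want to keep. With the HAVING clause, a (student_id, class_id) pair that has only a single row is left out of that list, so its row would be deleted too.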
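A minimal sketch of the note above, using Python's built-in sqlite3 module as a stand-in for MySQL. This is an assumption for illustration only: SQLite does not need the extra temp_tbl wrapper, and the sample rows are made up.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here; the rows are made-up sample data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE el_student_class_relation ("
    "id INTEGER PRIMARY KEY, student_id INTEGER, class_id INTEGER)"
)
conn.executemany(
    "INSERT INTO el_student_class_relation VALUES (?, ?, ?)",
    [(1, 10, 100), (2, 10, 100), (3, 10, 100),  # three copies of one pair
     (4, 20, 100)],                             # a pair with a single row
)

# Ids the inner query returns when it still has HAVING COUNT(*) > 1.
keep_with_having = [r[0] for r in conn.execute(
    "SELECT MAX(id) FROM el_student_class_relation "
    "GROUP BY student_id, class_id HAVING COUNT(*) > 1 ORDER BY 1")]

# Delete every row that is not the highest id of its (student_id, class_id) pair.
conn.execute(
    "DELETE FROM el_student_class_relation WHERE id NOT IN ("
    "  SELECT MAX(id) FROM el_student_class_relation"
    "  GROUP BY student_id, class_id)"
)
remaining = [r[0] for r in conn.execute(
    "SELECT id FROM el_student_class_relation ORDER BY id")]

print(keep_with_having)  # id 4 is absent, so NOT IN would have deleted it
print(remaining)         # the single-row pair survives without HAVING
```

With the HAVING clause the keep-list holds only id 3, so the unique row with id 4 would be removed as well; without it, ids 3 and 4 both remain.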

You might try the following query, which deletes every row for which another row with a higher id (and the same class and student) exists. MySQL does not let a DELETE read from its target table in a subquery, so the check is written as a self-join:
DELETE el1
FROM el_student_class_relation el1
INNER JOIN el_student_class_relation el2
ON el1.student_id = el2.student_id
AND el1.class_id = el2.class_id
AND el2.id > el1.id;
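The rule in this answer (delete a row whenever a row with the same student and class but a higher id exists) can be checked on a small in-memory table. This is only a sketch: it uses Python's sqlite3 module rather than MySQL, the correlated EXISTS form is the SQLite way of writing it, and the alias `newer` and the sample rows are made up.

```python
import sqlite3

# SQLite stand-in for MySQL; table name from the question, rows invented.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE el_student_class_relation ("
    "id INTEGER PRIMARY KEY, student_id INTEGER, class_id INTEGER)"
)
conn.executemany(
    "INSERT INTO el_student_class_relation VALUES (?, ?, ?)",
    [(1, 10, 100), (2, 10, 100), (3, 20, 100), (4, 10, 200), (5, 10, 100)],
)

# Delete each row for which a newer row (higher id) of the same pair exists.
conn.execute("""
    DELETE FROM el_student_class_relation
    WHERE EXISTS (
        SELECT 1
        FROM el_student_class_relation AS newer
        WHERE newer.student_id = el_student_class_relation.student_id
          AND newer.class_id   = el_student_class_relation.class_id
          AND newer.id > el_student_class_relation.id)
""")

survivors = conn.execute(
    "SELECT id, student_id, class_id "
    "FROM el_student_class_relation ORDER BY id"
).fetchall()
print(survivors)
```

Only the highest id of each (student_id, class_id) pair is left, and pairs that never had duplicates are untouched.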

The direct fix for your query is to use an "anti-join", where NOT joining is the important feature. This can be done with LEFT JOIN.
DELETE
us
FROM
el_student_class_relation us
LEFT JOIN
(
SELECT student_id, class_id, MAX(id) id
FROM el_student_class_relation
GROUP BY student_id, class_id
-- HAVING COUNT(*) > 1 [Don't do this, you need to return ALL the rows you want to keep]
)
gr
ON gr.id = us.id
WHERE
gr.id IS NULL -- WHERE there wasn't a match in the "good rows" table
EDIT MariaDB and MySQL aren't the same thing. MariaDB DOES allow self joins on the table being deleted from.

In older MySQL versions, a subquery inside a DELETE cannot select directly from the table being deleted from, so you have to wrap it in one more derived-table layer than would otherwise be required:
DELETE FROM el_student_class_relation
WHERE id not in
(
select * from (
SELECT MAX(id) id
FROM el_student_class_relation
GROUP BY student_id, class_id
) t1
)

Related

Subquery and normal query comes out with different results

I'm new to Oracle. I'm working on an exercise using both a subquery (without JOIN) and a normal query (with JOIN), but the two queries return different results and I can't figure out why.
The exercise asks to list the details of dog owners who have booked at least twice on this platform.
SELECT PET_OWNER.Owner_id,Oname,OAdd,COUNT(*) AS BOOKING
FROM PET_OWNER
WHERE Owner_id IN(
SELECT Owner_id
FROM PET
WHERE PType = 'DOG' AND Pet_id IN(SELECT Pet_id FROM BOOKING))
GROUP BY PET_OWNER.Owner_id,Oname,OAdd
HAVING COUNT(*) >=2
ORDER BY PET_OWNER.Owner_id;
This subquery version returns no rows.
SELECT PET_OWNER.Owner_id,Oname,OAdd,COUNT(*) AS BOOKING
FROM PET_OWNER,PET,BOOKING
WHERE PET_OWNER.Owner_id = PET.Owner_id AND
PET.Pet_id = BOOKING.Pet_id AND
PType = 'DOG'
GROUP BY PET_OWNER.Owner_id,Oname,OAdd
HAVING COUNT(*) >=2
ORDER BY PET_OWNER.Owner_id;
This query returns 10 records, which is the correct answer to the question.
I expected the two queries to return the same result, but they do not.
Does anyone know what is wrong, and can anyone show me how to write this with a subquery?

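Because a duplicated join key causes duplication in the result.
In your case, Owner_id is presumably not unique in the PET table, so the join version repeats each owner once per matching PET and BOOKING row and COUNT(*) counts those rows. In the IN version each owner appears only once, so COUNT(*) is always 1 and HAVING COUNT(*) >= 2 filters everything out.
It is still possible to get the correct answer with a join. And since Owner_id in the subquery t is unique, the execution plan should be the same as for a subquery version.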
select p.* from Pet_Owner p
join (
select PET.Owner_id
from PET
inner join Booking on Booking.Pet_id = PET.Pet_id
where pType = 'DOG'
group by PET.Owner_id
having count(1) >= 2) t
on t.Owner_id = p.Owner_id
order by p.Owner_id
By the way, your SQL uses the old ANSI-89 style (comma-separated tables), while the explicit JOIN syntax has been available since ANSI-92. I know many teachers still love the old style; I hope you can read both, but write only the ANSI-92 style.
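To see the row duplication described above, here is a small sketch with Python's sqlite3 module standing in for Oracle. The tables are cut down to the columns that matter; the Booking_id column, the owner names and all rows are assumptions made for illustration.

```python
import sqlite3

# SQLite stand-in for Oracle; Booking_id and all sample rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PET_OWNER (Owner_id INTEGER PRIMARY KEY, Oname TEXT);
    CREATE TABLE PET (Pet_id INTEGER PRIMARY KEY, Owner_id INTEGER, PType TEXT);
    CREATE TABLE BOOKING (Booking_id INTEGER PRIMARY KEY, Pet_id INTEGER);
    INSERT INTO PET_OWNER VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO PET VALUES (11, 1, 'DOG'), (12, 2, 'DOG');
    INSERT INTO BOOKING VALUES (101, 11), (102, 11), (103, 12);
""")

# IN version: PET_OWNER is never multiplied, so every group has exactly one row.
in_counts = conn.execute("""
    SELECT Owner_id, COUNT(*)
    FROM PET_OWNER
    WHERE Owner_id IN (SELECT Owner_id FROM PET
                       WHERE PType = 'DOG'
                         AND Pet_id IN (SELECT Pet_id FROM BOOKING))
    GROUP BY Owner_id
    ORDER BY Owner_id
""").fetchall()

# JOIN version: each owner row is repeated once per matching booking.
join_counts = conn.execute("""
    SELECT o.Owner_id, COUNT(*)
    FROM PET_OWNER o
    JOIN PET p ON p.Owner_id = o.Owner_id
    JOIN BOOKING b ON b.Pet_id = p.Pet_id
    WHERE p.PType = 'DOG'
    GROUP BY o.Owner_id
    ORDER BY o.Owner_id
""").fetchall()

print(in_counts)    # every count is 1, so HAVING COUNT(*) >= 2 matches nothing
print(join_counts)  # owner 1 has two bookings and is counted twice
```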

What happens is that grouping PET_OWNER alone by Owner_id, Oname, OAdd gives you one row per owner, so the count is always 1. What we need is to count per owner_id across the joined tables first.
Here's your query: first get the owner_ids with count() >= 2 in a subquery, then select those owners.
select * from Pet_Owner where Owner_id in (
select t1.Owner_id from PET_OWNER t1
inner join PET t2 on t1.Owner_id = t2.Owner_id
inner join Booking t3 on t3.Pet_id = t2.Pet_id
where pType = 'DOG'
group by t1.Owner_id
having count(1) >= 2)
order by Owner_id
Without joins, nested subqueries are the only option. The count then has to come from a correlated subquery, because grouping PET_OWNER alone always yields a count of 1:
select * from Pet_Owner o
where (select count(1) from Booking b
where b.Pet_id in
(select p.Pet_id from Pet p
where p.Owner_id = o.Owner_id and p.PType = 'DOG')) >= 2
order by o.Owner_id
If you are trying to count the number of booked dogs per owner:
select * from Pet_Owner where Owner_id in (
select Owner_id from Pet where Pet_id in
(select Pet_id from Booking) and PType='DOG'
group by owner_id
having count(1) >= 2
) order by Owner_id

How to create a good request with "max(count(*))"?

I have to find the scientist who has taken part in the most missions. I tried this query, but it did not work:
select name
from scientist, mission
where mission.nums = scientist.nums
having count(*) = (select max(count(numis)) from mission, scientist where
mission.nums = scientist.nums
group by name)
I have made several modifications to this query, but I only get errors (ora-0095 and ora-0096 if I remember correctly).
I created my tables with:
CREATE TABLE Scientist
(NUMS NUMBER(8),
NAME VARCHAR2 (15),
CONSTRAINT CP_CHER PRIMARY KEY (NUMS));
CREATE TABLE MISSION
(NUMIS NUMBER(8),
Country VARCHAR2 (15),
NUMS NUMBER(8),
CONSTRAINT CP_MIS PRIMARY KEY (NUMIS),
CONSTRAINT CE_MIS FOREIGN KEY (NUMS) REFERENCES SCIENTIST (NUMS));

You could count the missions each scientist participated in, and wrap that query in a query with a window function that will rank them according to their participation:
SELECT name
FROM (SELECT name, RANK() OVER (ORDER BY cnt DESC) AS rk
FROM (SELECT name, COUNT(*) AS cnt
FROM scientist s
JOIN mission m ON s.nums = m.nums
GROUP BY name) t
) q
WHERE rk = 1

Step 0: Format your code :-) It makes it much easier to visualize.
Step 1: Get the count of Numis by Nums in the Mission table. This tells you how many missions each Nums took part in.
This is done in the CTE block cnt_by_nums.
Step 2: Get the name of the scientist by joining cnt_by_nums with the scientist table.
Step 3: Keep only those scientists whose cnt_missions equals the maximum value available in cnt_by_nums.
with cnt_by_nums
as (select Nums,count(Numis) as cnt_missions
from mission
group by Nums
)
select a.Nums,max(b.Name) as name
from cnt_by_nums a
join scientist b
on a.Nums=b.Nums
group by a.Nums
having max(a.cnt_missions)=(select max(a1.cnt_missions) from cnt_by_nums a1)

I'd write a query like this:
SELECT NAME, COUNTER
FROM
(SELECT NAME, COUNT(*) AS COUNTER
FROM SCIENTIST S
LEFT JOIN MISSION M
ON S.NUMS=M.NUMS
GROUP BY NAME) NUM
INNER JOIN
(SELECT MAX(COUNTER) AS MAX_COUNTER FROM
(SELECT NAME, COUNT(*) AS COUNTER
FROM SCIENTIST S
LEFT JOIN MISSION M
ON S.NUMS=M.NUMS
GROUP BY NAME) C) MAX
ON NUM.COUNTER=MAX.MAX_COUNTER;
(It works on MySQL; I hope it works the same way in Oracle.)

As you don't select the name of your scientist (only count their missions) you don't need to join those tables within the subquery. Grouping over the foreign key would be sufficient:
select count(numis) from mission group by nums
Your column names are a bit weird but that's your choice ;-)
Selecting only the scientist with the most missions can be done in two ways. One way is your approach, where you may get multiple scientists if they share the same maximum number of missions.
The first problem in your query is that you check an aggregate (HAVING COUNT(*) = ) without grouping. You only group in your subselect.
Second, aggregating an aggregate (MAX(COUNT)) is not generally allowed, but you can either select only the first row of that subselect ordered by its size, or select the max of it by subquerying the subquery.
Approach that keeps only the first row of the subselect:
select s.name from scientist s, mission m
where m.nums = s.nums
group by name
having count(*) =
(select count(numis) from mission
group by nums
order by 1 desc
fetch first 1 row only)
Approach with double subquery:
select s.name from scientist s, mission m
where m.nums = s.nums
group by name having count(*) =
(select max(numis) from
(select count(numis) numis from mission group by nums)
)
The second way is to do the FETCH FIRST on your final result, but this gives you exactly one scientist even if several share the same maximum number of missions:
select s.name from scientist s, mission m
where m.nums = s.nums
group by name
order by count(*) desc
fetch first 1 row only
Writing the join as a Cartesian product is not state of the art, but the optimizer will turn it into a "good join" thanks to the reference given in the where clause.
I wrote these with IBM Db2, but they should also work on Oracle.
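As a rough check of the double-subquery idea, here is a sketch that runs it on made-up data with Python's sqlite3 module (an assumption: SQLite rather than Db2 or Oracle, explicit JOIN syntax, and invented scientist names). It keeps ties, so two scientists with the same maximum are both returned.

```python
import sqlite3

# SQLite stand-in; table layout follows the question, the rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE scientist (nums INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE mission (numis INTEGER PRIMARY KEY, country TEXT,
                          nums INTEGER REFERENCES scientist (nums));
    INSERT INTO scientist VALUES (1, 'Curie'), (2, 'Darwin'), (3, 'Euler');
    INSERT INTO mission VALUES (1, 'FR', 1), (2, 'PL', 1),
                               (3, 'UK', 2), (4, 'EC', 2),
                               (5, 'CH', 3);
""")

# Inner level: missions per scientist. Middle level: the maximum of those counts.
# Outer level: every scientist whose own count equals that maximum.
top = conn.execute("""
    SELECT s.name
    FROM scientist s
    JOIN mission m ON m.nums = s.nums
    GROUP BY s.nums, s.name
    HAVING COUNT(*) = (SELECT MAX(cnt)
                       FROM (SELECT COUNT(*) AS cnt
                             FROM mission
                             GROUP BY nums))
    ORDER BY s.name
""").fetchall()

print(top)
```

Grouping by s.nums as well as s.name is a small deviation from the queries above; it keeps two different scientists who happen to share a name from being merged into one group.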

If you want one row, then in Oracle 12+, you can do:
SELECT name, COUNT(*) AS cnt
FROM scientist s JOIN
mission m
ON s.nums = m.nums
GROUP BY name
ORDER BY COUNT(*) DESC
FETCH FIRST 1 ROW ONLY;
In earlier versions, you would generally use a subquery:
SELECT sm.*
FROM (SELECT name, COUNT(*) AS cnt
FROM scientist s JOIN
mission m
ON s.nums = m.nums
GROUP BY name
ORDER BY COUNT(*) DESC
) sm
WHERE rownum = 1;
If you want ties, then generally window functions would be a simple solution:
SELECT sm.*
FROM (SELECT name, COUNT(*) AS cnt,
RANK() OVER (ORDER BY COUNT(*) DESC) as seqnum
FROM scientist s JOIN
mission m
ON s.nums = m.nums
GROUP BY name
ORDER BY COUNT(*) DESC
) sm
WHERE seqnum = 1;

Update data in table1 where duplicates in table2

I have 2 tables: Users and Results.
The Users table contains duplicate data, which is reflected in the Results table. The user below was created 3 times. I need to update the Results table so that rows referencing UserId 2 and 3 point to UserId 1, so that all results can be viewed under this one user.
This would be easy with only a few users and a few results, but in my case I have 500 duplicated users and 30000 results.
I am using SQL Server Express 2014.
I would really appreciate any help with this!
Edit: I mistyped the column names in ResultsTable. I'm sorry if that confused you.
UserTable
UserId---Fname---LName
1-----Georg-----Smith
2-----Georg-----Smith
3-----Georg-----Smith
ResultsTable
ResultId---UserRefId
1-----1
2-----2
3-----3
4-----1
I have managed to select the duplicates from the user table, but I don't know how to proceed further.
;WITH T AS
(
SELECT *, COUNT(*) OVER (PARTITION BY Fname + Lname) as Cnt
FROM TestDatabase.Users
)
SELECT Id, Fname, Lname
FROM T
WHERE Cnt > 2

Your ResultTable has 2 columns with the same UserId name. I changed the second to UserId2 for the query below:
;WITH cte As
(
SELECT R.UserId, R.UserId2,
MIN(U.UserId) OVER (PARTITION BY U.FName, U.LName) As OriginalUserId
FROM ResultTable R
INNER JOIN UserTable U ON R.UserId = U.UserId
)
UPDATE cte
SET UserId2 = OriginalUserId
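The idea of pointing every result at the lowest UserId that shares the same first and last name can be tried on the question's sample data. This is only a sketch under assumptions: it uses Python's sqlite3 module instead of SQL Server, the column names from the question (UserRefId), a correlated subquery with MIN instead of a window function, and one extra made-up user (Anna Berg) to show that non-duplicates are left alone.

```python
import sqlite3

# SQLite stand-in for SQL Server; 'Anna Berg' and result 5 are invented rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE UserTable (UserId INTEGER PRIMARY KEY, FName TEXT, LName TEXT);
    CREATE TABLE ResultsTable (ResultId INTEGER PRIMARY KEY, UserRefId INTEGER);
    INSERT INTO UserTable VALUES
        (1, 'Georg', 'Smith'), (2, 'Georg', 'Smith'),
        (3, 'Georg', 'Smith'), (4, 'Anna', 'Berg');
    INSERT INTO ResultsTable VALUES (1, 1), (2, 2), (3, 3), (4, 1), (5, 4);
""")

# For each result, look up the referenced user (dup), find all users with the
# same name (orig) and take the lowest UserId among them as the original.
conn.execute("""
    UPDATE ResultsTable
    SET UserRefId = (
        SELECT MIN(orig.UserId)
        FROM UserTable dup
        JOIN UserTable orig
          ON orig.FName = dup.FName AND orig.LName = dup.LName
        WHERE dup.UserId = ResultsTable.UserRefId)
    WHERE UserRefId IN (SELECT UserId FROM UserTable)  -- skip dangling references
""")

results = conn.execute(
    "SELECT ResultId, UserRefId FROM ResultsTable ORDER BY ResultId"
).fetchall()
print(results)
```

After the update, all of Georg Smith's results reference UserId 1, and the duplicate user rows can be removed in a separate step.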

You are on the right track with the cte. The ROW_NUMBER() function can be used to flag duplicate UserIds, then you can join the cte into the from clause of your update statement to find the UserIds you want to replace, and join again to find the UserIds you want to replace them with.
;WITH cteDedup AS(
SELECT
UserId
,FName
,LName
,ROW_NUMBER() OVER(PARTITION BY FName, LName ORDER BY UserID ASC) AS row_num
FROM UserTable
)
UPDATE rt
SET UserId = original.UserId
FROM ResultsTable rt
JOIN cteDedup dupe
ON rt.UserId = dupe.UserId
JOIN cteDedup original
ON dupe.FName = original.FName
AND dupe.LName = original.LName
WHERE dupe.row_num <> 1
AND original.row_num = 1

A slightly tricky query looks like this:
;with t as (
select fname+lname name,id,
ROW_NUMBER() over(partition by fname+lname order by id) rn
from #users
)
--for test purpose comment next 2 lines
update #results
set userid=t1.id
--and uncomment the next one
--select t.name,t.id,userid,res,t1.id id1--,(select top 1 id from t t1 where t1.name=t.name and t.rn=1) id1
from t
inner join #results r on t.id=r.userid
inner join t t1 on t.name=t1.name and t1.rn=1
And then you can delete duplicate users
;with t as (
select fname+lname name,id,
ROW_NUMBER() over(partition by name order by id) rn
from #users
)
delete t where rn>1

Retrieve rows that matches with all the values listed

Hi, I need to get the rows that match all of the group ids listed in an array.
SELECT user_id,group_id
FROM group_privilege_details g
WHERE g.group_id in (102,101)
This returns a row if any one of the group ids matches. But I need the user_id values that have all of the group ids mentioned in the list.

Assuming that you cannot have duplicate user_id/group_id combinations:
SELECT user_id,count(group_id)
FROM group_privilege_details g
WHERE g.group_id in (102,101)
GROUP BY user_id
HAVING count(group_id) = 2

Here is a variant of Steven's query for generic arrays:
SELECT user_id
FROM group_privilege_details
WHERE group_id = ANY(my_array)
GROUP BY 1
HAVING count(*) = array_length(my_array, 1)
Works as long as these requirements are met (not mentioned in the question):
(user_id, group_id) is unique in group_privilege_details.
array has only 1 dimension
base array-elements are unique
A generic solution that works regardless of these preconditions:
WITH ids AS (SELECT DISTINCT unnest(my_array) group_id)
SELECT g.user_id
FROM (SELECT user_id, group_id FROM group_privilege_details GROUP BY 1,2) g
JOIN ids USING (group_id)
GROUP BY 1
HAVING count(*) = (SELECT count(*) FROM ids)
unnest() produces one row per base-element. DISTINCT removes possible dupes. The subselect does the same for the table.
Extensive list of options for this kind of queries: How to filter SQL results in a has-many-through relation
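For readers without PostgreSQL arrays at hand, the same "must match every listed group" check can be sketched with Python's sqlite3 module. This is an assumption-laden illustration: the helper name `users_in_all_groups` and the sample rows are made up, and COUNT(DISTINCT ...) plus de-duplicating the input list are used to cover the uniqueness preconditions mentioned above.

```python
import sqlite3

# SQLite stand-in; table name from the question, rows invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE group_privilege_details (user_id INTEGER, group_id INTEGER);
    INSERT INTO group_privilege_details VALUES
        (1, 101), (1, 102),
        (2, 101),
        (3, 102), (3, 102),            -- duplicate row, still only one group
        (4, 101), (4, 102), (4, 103);
""")


def users_in_all_groups(conn, group_ids):
    """Return user_ids that belong to every group in group_ids (hypothetical helper)."""
    wanted = sorted(set(group_ids))              # de-duplicate the input list
    placeholders = ", ".join("?" for _ in wanted)
    sql = f"""
        SELECT user_id
        FROM group_privilege_details
        WHERE group_id IN ({placeholders})
        GROUP BY user_id
        HAVING COUNT(DISTINCT group_id) = ?      -- must hit every wanted group
        ORDER BY user_id
    """
    return [row[0] for row in conn.execute(sql, (*wanted, len(wanted)))]


matching = users_in_all_groups(conn, [102, 101])
print(matching)
```

User 3 is excluded even though it has two matching rows, because both rows are for the same group.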

Please find my solved query:
select user_id,login_name from user_info where user_id in (
SELECT user_id FROM
group_privilege_details g WHERE g.group_id in
(select group_id from group_privilege_details g,user_info u where u.user_id=g.user_id
and login_name='123')
GROUP BY user_id HAVING count(group_id) = (select count(group_id)
from group_privilege_details g,user_info u where u.user_id=g.user_id
and login_name='123') ) and login_name!='123'

Help with query

I'm trying to write a query that looks at a single table to see whether a student is in a team called CMHT and also in a Medic team. If they are, I don't want to see the result.
I only want to see the record if they're in only CMHT or only Medic, not both.
Would a subquery be the right way to filter this out? I've searched for NOT IN, but how can you check whether a student is in 2 or more teams or not?
Student Team ref
1 CMHT 1
1 Medic 2
2 Medic 3 this would be in the result
3 CMHT 5 this would be in the result
So far I've written the following code. Would I need to use a subquery, or do a self join and filter it that way?
SELECT Table1.Student, Table1.Team, Table1.refnumber
FROM Table1
WHERE (((Table1.Team) In ('Medics','CMHT')))

This is Mark Byers's answer with a HAVING clause instead of a subquery:
SELECT Student, MAX(Team) AS Team, MAX(ref) AS ref
FROM Table1
GROUP BY Student
HAVING COUNT(Student) = 1

SELECT *
FROM students s
WHERE NOT EXISTS
(
SELECT NULL
FROM students si
WHERE si.student = s.student
AND si.team = 'CMHT'
)
OR NOT EXISTS
(
SELECT NULL
FROM students si
WHERE si.student = s.student
AND si.team = 'Medic'
)

SELECT a.*
FROM Table1 a
INNER JOIN
( SELECT Student, COUNT(*) FROM Table1
GROUP BY Student
HAVING COUNT(*) = 1)b
ON (a.Student = b.Student)

how could you check whether a student is in 2 or more teams?
You can count the number of teams per student and then filter only those you want to see:
SELECT student FROM
(
SELECT student, COUNT(*) AS cnt
FROM Table1
GROUP BY student
) T1
WHERE cnt = 1
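A quick way to try the counting approach on the question's sample rows is the sketch below. It uses Python's sqlite3 module (an assumption, since the question does not name a database), joins the per-student count back to the table to return the full rows, and counts distinct teams so that a repeated row would not look like a second team.

```python
import sqlite3

# SQLite stand-in; the rows are the sample data from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (Student INTEGER, Team TEXT, ref INTEGER);
    INSERT INTO Table1 VALUES
        (1, 'CMHT', 1), (1, 'Medic', 2), (2, 'Medic', 3), (3, 'CMHT', 5);
""")

# Students that appear in exactly one of the two teams, with their full row.
rows = conn.execute("""
    SELECT t.Student, t.Team, t.ref
    FROM Table1 t
    JOIN (SELECT Student
          FROM Table1
          WHERE Team IN ('CMHT', 'Medic')
          GROUP BY Student
          HAVING COUNT(DISTINCT Team) = 1) one_team
      ON one_team.Student = t.Student
    WHERE t.Team IN ('CMHT', 'Medic')
    ORDER BY t.Student
""").fetchall()

print(rows)
```

Student 1 is in both teams and drops out; students 2 and 3 remain, matching the rows marked in the question.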

You can do it with outer join
select COALESCE(t1.Student, t2.Student) as Student,
COALESCE(t1.Team, t2.Team) as Team,
COALESCE(t1.ref, t2.ref) as ref
from
(select * from Student where Team = 'CMHT') t1
full outer join
(select * from Student where Team = 'Medic') t2
on t1.Student = t2.Student
where
t1.Student is null or
t2.Student is null;
