SQL: remove duplicate rows depending on a certain value - sql

I have spent a day trying to figure out how to write this query.
I have the following table:
ID Name Pregnancy Gender
1 Raghad Yes Female
1 Raghad No Female
2 Ohoud no Male
What I need is to remove the duplicate (in this case the two rows with ID 1) and keep the one row that has a pregnancy status of Yes.
To clarify, I can't use DELETE since it's a restricted database. I can only retrieve data.

Using an exists clause:
DELETE
FROM yourTable t1
WHERE
pregnancy = 'no' AND
EXISTS (SELECT 1 FROM yourTable t2 WHERE t2.ID = t1.ID AND t2.pregnancy = 'yes');
There are other ways to go about doing this, e.g. using ROW_NUMBER, but as you did not tag your database, I offer the above solution which should work on basically any database.
If you want to just view your data with the "duplicates" removed, then use:
SELECT *
FROM yourTable t1
WHERE
pregnancy = 'yes' OR
NOT EXISTS (SELECT 1 FROM yourTable t2 WHERE t2.ID = t1.ID AND t2.pregnancy = 'yes');
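A quick way to check the SELECT version against the sample rows is an in-memory SQLite database driven from Python. This is only a sketch under assumptions that are not in the question: the table name `patients` is made up, SQLite stands in for the unnamed database, and the Pregnancy column is declared COLLATE NOCASE because the sample mixes 'Yes', 'No' and 'no'.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table name; COLLATE NOCASE lets 'yes' match 'Yes' in SQLite.
conn.execute(
    "CREATE TABLE patients (ID INTEGER, Name TEXT, "
    "Pregnancy TEXT COLLATE NOCASE, Gender TEXT)"
)
conn.executemany(
    "INSERT INTO patients VALUES (?, ?, ?, ?)",
    [
        (1, "Raghad", "Yes", "Female"),
        (1, "Raghad", "No", "Female"),
        (2, "Ohoud", "no", "Male"),
    ],
)

# Keep a row if it is a 'yes' row, or if its ID has no 'yes' row at all.
rows = conn.execute(
    """
    SELECT ID, Name, Pregnancy, Gender
    FROM patients t1
    WHERE t1.Pregnancy = 'yes'
       OR NOT EXISTS (SELECT 1 FROM patients t2
                      WHERE t2.ID = t1.ID AND t2.Pregnancy = 'yes')
    ORDER BY t1.ID
    """
).fetchall()
print(rows)
```

On a database with a case-sensitive collation, the literals in the query would have to match the stored spelling exactly.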

If the Pregnancy column has just two values, "Yes" and "No", you can also use ROW_NUMBER() to get the result.
;WITH CTE
AS (
SELECT *,ROW_NUMBER() OVER (PARTITION BY id ORDER BY Pregnancy DESC) RN
FROM TABLE_NAME
)
SELECT *
FROM CTE
WHERE RN = 1
If there are more than two values and you want to give the highest priority to "Yes", you can write the query as follows:
;WITH CTE
AS (
SELECT *,ROW_NUMBER() OVER
(PARTITION BY id ORDER BY CASE WHEN Pregnancy = 'Yes' then 0 else 1 end) RN
FROM TABLE_NAME
)
SELECT *
FROM CTE
WHERE RN= 1
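The CASE-based priority can be tried out the same way. The sketch below assumes SQLite 3.25 or newer (for window functions), a made-up table name `patients`, and one hypothetical extra row with the status 'Unknown' that is not in the question, added only to show a third value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE patients (ID INTEGER, Name TEXT, Pregnancy TEXT, Gender TEXT)"
)
conn.executemany(
    "INSERT INTO patients VALUES (?, ?, ?, ?)",
    [
        (1, "Raghad", "Yes", "Female"),
        (1, "Raghad", "No", "Female"),
        (1, "Raghad", "Unknown", "Female"),  # hypothetical third status
        (2, "Ohoud", "no", "Male"),
    ],
)

# 'Yes' sorts first inside each ID; every other status shares priority 1.
rows = conn.execute(
    """
    WITH ranked AS (
        SELECT *,
               ROW_NUMBER() OVER (
                   PARTITION BY ID
                   ORDER BY CASE WHEN Pregnancy = 'Yes' THEN 0 ELSE 1 END
               ) AS rn
        FROM patients
    )
    SELECT ID, Name, Pregnancy, Gender
    FROM ranked
    WHERE rn = 1
    ORDER BY ID
    """
).fetchall()
print(rows)
```

When an ID has no 'Yes' row and several other rows, which of them gets rn = 1 is arbitrary unless a further sort column is added.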

For this sample data you can group by ID, Name, Gender and return the maximum value of the column Pregnancy for each group since Yes is greater compared to No:
SELECT ID, Name, MAX(Pregnancy) Pregnancy, Gender
FROM tablename
GROUP BY ID, Name, Gender
Results:
> ID | Name | Pregnancy | Gender
> -: | :----- | :-------- | :-----
> 1 | Raghad | Yes | Female
> 2 | Ohoud | No | Male

Here is how you could do it in MySQL 8.
Similar Common Table Expressions exist in SQL Server and Oracle.
There, a comma after the closing parenthesis that ends a CTE (WITH) definition
is needed only when another CTE definition follows.
with dups as (
Select id from test
group by id
Having count(1) > 1
)
select * from test
where id in (select id from dups)
and Pregnancy = 'Yes'
union all
select * from test where id not in (select id from dups);
Note this does it without deleting the original.
But it gives you a result set to work with that has what you want.
If you wanted to delete, then you could use this instead, after the dups CTE definition:
delete from test
where id in (select id from dups) and Pregnancy = 'No'
Or distill this into:
delete from test
where id in (Select id from test
group by id
Having count(1) > 1) and Pregnancy = 'No'

1) First of all, update the design of your table. ID should be the primary key. This automatically prevents duplicate rows with the same ID.
2) You can use GROUP BY and a HAVING clause to remove duplicates. The subquery must be tied to the row's id (here with IN), otherwise every 'no' row is deleted as soon as any id is duplicated:
delete from table where pregnancy='no' and id in (SELECT
id
FROM table
GROUP BY id
HAVING count(id)>1)
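To illustrate point 1, here is a small sketch of how a primary key rejects a second row with the same ID. It uses SQLite from Python purely as an assumption (the question does not name a database), and the table name `patients` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; ID is declared as the primary key.
conn.execute(
    "CREATE TABLE patients (ID INTEGER PRIMARY KEY, Name TEXT, "
    "Pregnancy TEXT, Gender TEXT)"
)
conn.execute("INSERT INTO patients VALUES (1, 'Raghad', 'Yes', 'Female')")

try:
    # Same ID again: the primary key constraint refuses the insert.
    conn.execute("INSERT INTO patients VALUES (1, 'Raghad', 'No', 'Female')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM patients").fetchone()[0]
print(duplicate_rejected, row_count)
```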

Related

Removing rows from result set where column only has one value against a user

I have a result set
name stage value
---- ----- -----
jim 1 4
jim 1 8
paul 1 8
paul 1 8
I want to remove the rows where 8 is the only value against a person:
keep the 2 jim rows and lose the 2 paul rows.
You can use exists. For a select query:
select t.*
from t
where exists (select 1
from t t2
where t2.name = t.name and t2.value <> 8
);
Similar logic (except using not exists rather than exists) can be used for a delete -- if you really want to delete the rows from the table.
If you have a complex query that you don't want to repeat, then window functions are helpful:
select t.*
from (select t.*,
sum(case when value <> 8 then 1 else 0 end) over (partition by name) as cnt_not_8
from t
) t
where cnt_not_8 > 0;
If your database supports analytic functions, then you can use count as follows:
Select * from
(Select t.*,
Count(case when value <> 8 then 1 end) over (partition by name) as cnt
From your_table t) t
Where cnt > 0
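The count-based filter can be checked against the rows from the question. This is a sketch that assumes SQLite 3.25 or newer (for window functions) run from Python; the derived-table alias `counted` is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (name TEXT, stage INTEGER, value INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [("jim", 1, 4), ("jim", 1, 8), ("paul", 1, 8), ("paul", 1, 8)],
)

# cnt = number of rows per name whose value is not 8.
# A name whose only value is 8 gets cnt = 0 and is filtered out.
rows = conn.execute(
    """
    SELECT name, stage, value
    FROM (SELECT t.*,
                 COUNT(CASE WHEN value <> 8 THEN 1 END)
                     OVER (PARTITION BY name) AS cnt
          FROM t) AS counted
    WHERE cnt > 0
    ORDER BY name, value
    """
).fetchall()
print(rows)
```

Both jim rows survive, including the one with value 8, because jim also has a row with a different value.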
Assuming you also have an ID column (an auto-increment integer) in your table, this query selects the highest id for each unique combination:
select max(id) from t group by name,stage,value
In your example, the two rows with the values paul,1,8 in columns name,stage,value would be reduced to the one with the latest id.
You can then use that query in a where clause to filter out any duplicates:
select * from t
where id in (select max(id) from t group by name,stage,value)
Finally you can also delete rows that are not unique if that's your goal:
delete from t
where id not in (select max(id) from t group by name,stage,value)

multi condition on different rows

age | name | course | score
_________________________
10 |James | Math | 10
10 |James | Lab | 15
12 |Oliver | Math | 15
13 |William | Lab | 13
I want to select the records where math >= 10 and lab > 11.
I wrote this query:
select * from mytable
where (course='Math' and score>10) and (course='Lab' and score>11)
but this query does not return any record.
I want this result
age | name
____________
10 |James
The condition (math >= 10 and lab > 11) is generated dynamically and may contain 2 conditions or 100 or more.
Please help me.
Your query looks for records that satisfy both conditions at once, which cannot happen, since each record has a single course.
You want a condition that applies across rows having the same name, so this suggests aggregation instead:
select age, name
from mytable
where course in ('Math', 'Lab')
group by age, name
having
max(case when course = 'Math' then score end) >= 10
and max(case when course = 'Lab' then score end) > 11
If you want the names, then use aggregation and a having clause:
select name, age
from mytable
where (course = 'Math' and score >= 10) or
(course = 'Lab' and score > 11)
group by name, age
having count(distinct course) = 2;
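Since the question says the conditions are generated dynamically, the aggregation query can be assembled from a list. The sketch below is only an illustration: the `conditions` list and the `ALLOWED_OPS` whitelist are hypothetical names, SQLite is used as a stand-in database, and it assumes each course appears at most once in the list (otherwise the distinct-course count would not equal the list length). It uses >= 10 for Math, as stated in the question text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (age INTEGER, name TEXT, course TEXT, score INTEGER)"
)
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?, ?)",
    [
        (10, "James", "Math", 10),
        (10, "James", "Lab", 15),
        (12, "Oliver", "Math", 15),
        (13, "William", "Lab", 13),
    ],
)

# Hypothetical condition list: (course, comparison operator, threshold).
conditions = [("Math", ">=", 10), ("Lab", ">", 11)]
ALLOWED_OPS = {">", ">=", "<", "<=", "=", "<>"}

clauses, params = [], []
for course, op, threshold in conditions:
    if op not in ALLOWED_OPS:  # operators cannot be bound as parameters
        raise ValueError(f"unsupported operator: {op}")
    clauses.append(f"(course = ? AND score {op} ?)")
    params.extend([course, threshold])

# A person qualifies when every listed course has a matching row.
sql = (
    "SELECT age, name FROM mytable WHERE "
    + " OR ".join(clauses)
    + " GROUP BY age, name HAVING COUNT(DISTINCT course) = ?"
)
params.append(len(conditions))

rows = conn.execute(sql, params).fetchall()
print(rows)
```

Course names and thresholds are bound as parameters; only the operator is interpolated, and only after it has been checked against the whitelist.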
If you want the detailed records, use window functions:
select t.*
from (select t.*,
(dense_rank() over (partition by name, age order by course asc) +
dense_rank() over (partition by name, age order by course desc) - 1
) as cnt_unique_courses
from mytable t
where (course = 'Math' and score >= 10) or
(course = 'Lab' and score > 11)
) t
where cnt_unique_courses = 2;
SQL Server doesn't support count(distinct) as a window function. But you can implement it by adding the ascending and descending dense_rank() and subtracting 1.
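The dense_rank() emulation of count(distinct) can be checked on the question's rows. For any row, the ascending and descending dense ranks add up to the number of distinct values plus one, hence the `- 1` in this sketch. It assumes SQLite 3.25 or newer as a stand-in for SQL Server, and the column alias `cnt` is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (age INTEGER, name TEXT, course TEXT, score INTEGER)"
)
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?, ?)",
    [
        (10, "James", "Math", 10),
        (10, "James", "Lab", 15),
        (12, "Oliver", "Math", 15),
        (13, "William", "Lab", 13),
    ],
)

# Ascending rank + descending rank - 1 = distinct courses per (name, age).
rows = conn.execute(
    """
    SELECT name, course,
           DENSE_RANK() OVER (PARTITION BY name, age ORDER BY course ASC)
         + DENSE_RANK() OVER (PARTITION BY name, age ORDER BY course DESC)
         - 1 AS cnt
    FROM mytable
    ORDER BY name, course
    """
).fetchall()
print(rows)
```

James has two distinct courses, so both of his rows get 2; Oliver and William each get 1.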
If you formulate the problem as:
Select all unique (name, age) combinations
That have a row for course Math with a score >= 10
And that have a row for course Lab with a score > 11
Then you can translate this to something very similar in SQL:
select distinct t1.age, t1.name -- unique combinations
from mytable t1
where exists ( select top 1 'x' -- with a row math score >= 10
from mytable t2
where t2.name = t1.name
and t2.age = t1.age
and t2.course = 'Math'
and t2.score >= 10 )
and exists ( select top 1 'x' -- with a row lab score > 11
from mytable t3
where t3.name = t1.name
and t3.age = t1.age
and t3.course = 'Lab'
and t3.score > 11 );
I think either your data or your condition is not right for your expected output. That said, based on your condition, you can apply each condition in a separate select and then use INTERSECT on both selections to get the filtered data, like the code below.
select Age,Name
from Table_1
where Course ='Math' and Score>=10
INTERSECT
select Age,Name
from Table_1
where Course ='Lab' and Score>11
You can write the query using a correlated subquery:
select * from table_1 t1
where score >11 and course ='lab'
and [name] in (select [name] from table_1 t2 where t1.[name] =t2.[name] and t1.age =t2.Age
and t2.Score >=10 and course = 'Math')

SQL Joining two tables and removing the duplicates from the two tables but without losing any duplicates from the tables themselves

I want to join two tables and remove duplicates from both the tables but keeping any duplicate value found in the first table.
T1
Name
-----
A
A
B
C
T2
Name
----
A
D
E
Expected result
A - > FROM T1
A - > FROM T1
B
C
D
E
I tried union but it removes all duplicates of 'A' from both tables.
How can I achieve this?
Filter T2 before UNION ALL
select col
from T1
union all
select col
from T2
where not exists (select 1 from T1 where T1.col = T2.col)
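The difference between a plain UNION and the filtered UNION ALL can be seen on the question's data. This is a sketch using SQLite from Python as an assumed stand-in database, with the column called `Name` as in the question's tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T1 (Name TEXT)")
conn.execute("CREATE TABLE T2 (Name TEXT)")
conn.executemany("INSERT INTO T1 VALUES (?)", [("A",), ("A",), ("B",), ("C",)])
conn.executemany("INSERT INTO T2 VALUES (?)", [("A",), ("D",), ("E",)])

# Plain UNION collapses every duplicate, including the two A rows of T1.
plain_union = [r[0] for r in conn.execute(
    "SELECT Name FROM T1 UNION SELECT Name FROM T2 ORDER BY Name"
)]

# UNION ALL keeps T1 untouched; T2 contributes only names missing from T1.
filtered_union_all = [r[0] for r in conn.execute(
    """
    SELECT Name FROM T1
    UNION ALL
    SELECT Name FROM T2
    WHERE NOT EXISTS (SELECT 1 FROM T1 WHERE T1.Name = T2.Name)
    ORDER BY Name
    """
)]
print(plain_union)
print(filtered_union_all)
```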
Assuming you want, for each value, the number of duplicates from the table with the most repetitions, you can do it with the ROW_NUMBER() window function, which eliminates duplicates by their sequence within the set of repetitions in each table.
SELECT Name FROM (
SELECT Name, ROW_NUMBER() OVER ( PARTITION BY Name ORDER BY Name ) AS Row
FROM T1
UNION
SELECT Name, ROW_NUMBER() OVER ( PARTITION BY Name ORDER BY Name ) AS Row
FROM T2
) x
ORDER BY Name
To see how this works out, we add two B rows to T2 then do this:
SELECT Name, ROW_NUMBER() OVER ( PARTITION BY Name ORDER BY Name ) AS Row
FROM T1
Name Row
A 1
A 2
B 1
C 1
SELECT Name, ROW_NUMBER() OVER ( PARTITION BY Name ORDER BY Name ) AS Row
FROM T2
Name Row
A 1
B 1
B 2
D 1
E 1
Now UNION them without ALL to combine and eliminate duplicates:
SELECT Name, ROW_NUMBER() OVER ( PARTITION BY Name ORDER BY Name ) AS Row
FROM T1
UNION
SELECT Name, ROW_NUMBER() OVER ( PARTITION BY Name ORDER BY Name ) AS Row
FROM T2
Name Row
A 1
A 2
B 1
B 2
C 1
D 1
E 1
The final query up top is then just eliminating the Row column and sorting the result, to ensure ascending order.
select * from T1
union all
select * from T2 where name not in (select distinct name from T1)
You should use "union all" instead of "union".
"union" removes duplicate records, while "union all" returns all of them.
For your result, the values that also exist in T1 are filtered out of T2 in the "where" clause, so "union all" keeps T1's duplicates and adds only the new values from T2:
select col1 from t1
union all
select col1 from t2 where t2.col1 not in(select t1.col1 from t1)
I don't know whether the following code is good practice, but it works:
select name from T1
UNION ALL
select name from T2 Where name not in (select name from T1)
The query above filters the T2 values against T1 and then combines the values of the two tables. UNION ALL is needed here, because a plain UNION would also collapse the duplicate 'A' rows from T1.
Note: this may not be the best way to get the result, and it can affect performance.
I will update this with a better solution after some research.
You want all names from T1 and all names from T2 except the names that are in T1.
So you can use UNION ALL for the 2 cases and the operator EXCEPT to filter the rows of T2:
SELECT Name FROM T1
UNION ALL
(
SELECT Name FROM T2
EXCEPT
SELECT Name FROM T1
)
Results:
> | Name |
> | :--- |
> | A |
> | A |
> | B |
> | C |
> | D |
> | E |

Remove duplicates in Select query based on one column

I want to select without duplicate ids and keep row '5d' and not '5e' in select statement.
table
id | name
1 | a
2 | b
3 | c
5 | d
5 | e
I tried:
SELECT id, name
FROM table t
INNER JOIN (SELECT DISTINCT id FROM table) t2 ON t.id = t2.id
For the given example an aggregation using min() would work.
SELECT id,
min(name) name
FROM table
GROUP BY id;
You can also use ROW_NUMBER():
SELECT id, name
FROM (
SELECT id, name, ROW_NUMBER() OVER(PARTITION BY id ORDER BY name) rn
FROM mytable
) x
WHERE rn = 1
This will retain the record that has the smallest name (so '5d' will come before '5e'). With this technique, you can also sort on a column other than the one where the duplicates exist (which an aggregate query with MIN() cannot do). Also, queries using window functions usually perform better than the equivalent aggregate query.
If you want to keep the row with the smallest name then you can use not exists:
select t.* from tablename t
where not exists (
select 1 from tablename
where id = t.id and name < t.name
)
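The NOT EXISTS variant can be checked against the question's rows. In this sketch the table is called `items`, a hypothetical name chosen because `table` itself is a reserved word, and SQLite is used only as an assumed stand-in database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [(1, "a"), (2, "b"), (3, "c"), (5, "d"), (5, "e")],
)

# A row is kept when no other row with the same id has a smaller name.
rows = conn.execute(
    """
    SELECT t.id, t.name
    FROM items t
    WHERE NOT EXISTS (SELECT 1 FROM items t2
                      WHERE t2.id = t.id AND t2.name < t.name)
    ORDER BY t.id
    """
).fetchall()
print(rows)
```

If two rows shared both the id and the smallest name, both would be returned, so this approach relies on the names being distinct within an id.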

how to get dupes from table using group by and/or having

If I have this table:
id | aux_id | name
------------------
1 | 22 | foo
2 | 22 | bar
3 | 19 | baz
How can I get this result, showing names that share an aux_id with at least one other record?
name
----
foo
bar
I know I need to use GROUP BY and/or HAVING but this isn't working:
SELECT name FROM my_table
GROUP BY aux_id
HAVING COUNT(aux_id) > 1
Column 'name' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
How about exists?
select t.name
from my_table t
where exists (select 1
from my_table t2
where t2.aux_id = t.aux_id and t2.name <> t.name
);
I would use exists :
select t.name
from table t
where exists (select 1 from table t1 where t1.aux_id = t.aux_id and t1.id <> t.id);
This has the advantage of letting you return all columns if you want, without using a group by clause.
An alternative, just for fun...
WITH
duplication_counts AS
(
SELECT
*,
COUNT(*) OVER (PARTITION BY aux_id) AS aux_id_occurrences
FROM
my_table
)
SELECT
*
FROM
duplication_counts
WHERE
aux_id_occurrences > 1
Group by works too, IMHO (though on large data the performance would not be as good as with EXISTS):
select * from myTable
where aux_id in
(select aux_id
from myTable
group by aux_id
having count(*) > 1)
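For reference, the GROUP BY / HAVING subquery can be run against the table from the question. This is a sketch that assumes SQLite driven from Python; the question itself appears to target SQL Server, judging by the error message.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER, aux_id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?)",
    [(1, 22, "foo"), (2, 22, "bar"), (3, 19, "baz")],
)

# The inner query finds aux_id values used more than once;
# the outer query returns the names of the rows that carry them.
names = [r[0] for r in conn.execute(
    """
    SELECT name
    FROM my_table
    WHERE aux_id IN (SELECT aux_id
                     FROM my_table
                     GROUP BY aux_id
                     HAVING COUNT(*) > 1)
    ORDER BY id
    """
)]
print(names)
```

Grouping happens only inside the subquery, so the outer select is free to return `name` or any other column without triggering the aggregate error from the question.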