select duplicate columns in sql server [duplicate] - sql

Table 1
Id Name
1 xxxxx
1 ccccc
2 uuuuu
3 ddddd
I want to select the rows whose Id appears more than once in the table.
How can I do this?

You can find the ids with multiple entries and then use the LEFT JOIN/IS NOT NULL pattern to retrieve the corresponding rows from the original table:
SELECT t1.*
FROM tbl t1
LEFT JOIN ( SELECT id
FROM tbl
GROUP BY id
HAVING COUNT(*) > 1) t2 ON t1.id = t2.id
WHERE t2.id IS NOT NULL
Other possible solutions include using EXISTS or IN clauses instead of LEFT JOIN/IS NOT NULL.
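As a rough sketch of the IN and EXISTS alternatives mentioned above, the snippet below runs both against the question's sample data. It uses Python's built-in sqlite3 module purely as a convenient test bed (an assumption; the question is about SQL Server), and the table name tbl is taken from the answer's query.

```python
import sqlite3

# In-memory database with the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?)",
    [(1, "xxxxx"), (1, "ccccc"), (2, "uuuuu"), (3, "ddddd")],
)

# Variant 1: IN against the list of ids that occur more than once.
in_query = """
SELECT id, name FROM tbl
WHERE id IN (SELECT id FROM tbl GROUP BY id HAVING COUNT(*) > 1)
ORDER BY name
"""

# Variant 2: correlated EXISTS that checks the group size for each row's id.
exists_query = """
SELECT t1.id, t1.name FROM tbl t1
WHERE EXISTS (SELECT 1 FROM tbl t2
              WHERE t2.id = t1.id
              GROUP BY t2.id
              HAVING COUNT(*) > 1)
ORDER BY t1.name
"""

in_rows = conn.execute(in_query).fetchall()
exists_rows = conn.execute(exists_query).fetchall()
print(in_rows)
print(exists_rows)
```

Both variants should return only the two rows with id 1; which one performs best depends on the engine and the indexes available.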

With window functions (X is the source table):
WITH Y AS (
    SELECT *, COUNT(*) OVER (PARTITION BY id) AS counter
    FROM X)
SELECT id, name FROM Y WHERE counter > 1

Related

SQL Joining two tables and removing the duplicates from the two tables but without losing any duplicates from the tables themselves

I want to join two tables and remove duplicates from both the tables but keeping any duplicate value found in the first table.
T1
Name
-----
A
A
B
C
T2
Name
----
A
D
E
Expected result
A - > FROM T1
A - > FROM T1
B
C
D
E
I tried UNION, but it removes all duplicates of 'A' from both tables.
How can I achieve this?
Filter T2 before the UNION ALL:
select col
from T1
union all
select col
from T2
where not exists (select 1 from T1 where T1.col = T2.col)
Assuming you want, for each value, the number of duplicates from the table with the most repetitions, you can do it with the ROW_NUMBER() window function, which numbers each row by its position within the set of repetitions in each table so that UNION can eliminate the overlap.
SELECT Name FROM (
SELECT Name, ROW_NUMBER() OVER ( PARTITION BY Name ORDER BY Name ) AS Row
FROM T1
UNION
SELECT Name, ROW_NUMBER() OVER ( PARTITION BY Name ORDER BY Name ) AS Row
FROM T2
) x
ORDER BY Name
To see how this works out, we add two B rows to T2 then do this:
SELECT Name, ROW_NUMBER() OVER ( PARTITION BY Name ORDER BY Name ) AS Row
FROM T1
Name Row
A 1
A 2
B 1
C 1
SELECT Name, ROW_NUMBER() OVER ( PARTITION BY Name ORDER BY Name ) AS Row
FROM T2
Name Row
A 1
B 1
B 2
D 1
E 1
Now UNION them without ALL to combine and eliminate duplicates:
SELECT Name, ROW_NUMBER() OVER ( PARTITION BY Name ORDER BY Name ) AS Row
FROM T1
UNION
SELECT Name, ROW_NUMBER() OVER ( PARTITION BY Name ORDER BY Name ) AS Row
FROM T2
Name Row
A 1
A 2
B 1
B 2
C 1
D 1
E 1
The final query up top is then just eliminating the Row column and sorting the result, to ensure ascending order.
See SQL Fiddle for demo.
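The ROW_NUMBER() walkthrough above can be reproduced with a small script. This is only a sketch: it assumes Python's built-in sqlite3 module with an SQLite build that supports window functions (3.25 or later), and it renames the Row alias to rn to stay clear of reserved words. T2 includes the two extra B rows used in the walkthrough.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE T1 (Name TEXT);
CREATE TABLE T2 (Name TEXT);
INSERT INTO T1 VALUES ('A'), ('A'), ('B'), ('C');
INSERT INTO T2 VALUES ('A'), ('B'), ('B'), ('D'), ('E');
""")

# Number the repetitions of each name per table, then let UNION (without ALL)
# drop the (Name, rn) pairs that both tables share.
query = """
SELECT Name FROM (
    SELECT Name, ROW_NUMBER() OVER (PARTITION BY Name ORDER BY Name) AS rn
    FROM T1
    UNION
    SELECT Name, ROW_NUMBER() OVER (PARTITION BY Name ORDER BY Name) AS rn
    FROM T2
) x
ORDER BY Name
"""

names = [row[0] for row in conn.execute(query)]
print(names)
```

Each name appears as many times as in whichever table repeats it most, which is the assumption this answer starts from.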
select * from T1
union all
select * from T2 where name not in (select distinct name from T1)
Sql Fiddle Demo
You should use "union all" instead of "union".
"union" removes duplicated records, while "union all" returns all of them.
For your result, the "where" clause already filters out of table 2 the values that also occur in table 1, so "union all" cannot reintroduce duplicates across the tables, and it keeps the repeated rows of table 1:
select col1 from t1
union all
select col1 from t2 where t2.col1 not in(select t1.col1 from t1)
I don't know whether the following code is good practice, but it works:
select name from T1
UNION ALL
select name from T2 Where name not in (select name from T1)
The query above filters T2 against the values in T1 and then combines the rows of both tables. UNION ALL is used so that the repeated 'A' rows from T1 are kept.
I hope it helps.
Note: this is not the most efficient way to get the result, and it may affect performance.
I will update the answer with a better solution after some research.
You want all names from T1 and all names from T2 except the names that are in T1.
So you can use UNION ALL for the 2 cases and the operator EXCEPT to filter the rows of T2:
SELECT Name FROM T1
UNION ALL
(
SELECT Name FROM T2
EXCEPT
SELECT Name FROM T1
)
See the demo.
Results:
> | Name |
> | :--- |
> | A |
> | A |
> | B |
> | C |
> | D |
> | E |
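To check which set operator reproduces the expected result, the following sketch runs the same filtered query once with UNION ALL and once with UNION on the question's data. It uses Python's built-in sqlite3 module as a test bed, which is an assumption; the behaviour of UNION versus UNION ALL shown here is standard SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE T1 (Name TEXT);
CREATE TABLE T2 (Name TEXT);
INSERT INTO T1 VALUES ('A'), ('A'), ('B'), ('C');
INSERT INTO T2 VALUES ('A'), ('D'), ('E');
""")

# Same query with a pluggable set operator; T2 is filtered so that only
# names missing from T1 are appended.
template = """
SELECT Name FROM T1
{op}
SELECT Name FROM T2
WHERE NOT EXISTS (SELECT 1 FROM T1 WHERE T1.Name = T2.Name)
ORDER BY Name
"""

with_union_all = [r[0] for r in conn.execute(template.format(op="UNION ALL"))]
with_union = [r[0] for r in conn.execute(template.format(op="UNION"))]

print(with_union_all)  # keeps both 'A' rows from T1
print(with_union)      # collapses the two 'A' rows into one
```

Only the UNION ALL form keeps the repeated 'A' from T1, because plain UNION deduplicates the combined result, including rows that come from the same table.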

How to write a query to delete everything except maximum value grouped by an ID?

I am trying to write a query to delete duplicate records based on an ID and a value. There are multiple rows with the same ID. The steps to get the result (and the queries I have written, as far as I understand them) are:
1. Look for the maximum value in the Value column for each ID (SELECT * FROM TABLE WHERE VALUE IN (SELECT MAX(VALUE) FROM TABLE GROUP BY ID))
Example:
Table data:
ID - Value
a - 1
a - 2
a - 3
b - 2
c - 3
Output:
ID - Value
a - 3
b - 2
c - 3
2. Exclude the results of point 1 from the table (SELECT * FROM TABLE WHERE NOT EXISTS (SELECT * FROM TABLE WHERE VALUE IN (SELECT MAX(VALUE) FROM TABLE GROUP BY ID)))
Edit: I wrote a query that finally outputs the required result for point 2
SELECT t1.* FROM TABLE t1
LEFT JOIN
(
SELECT 1 AS aux, * FROM (SELECT * FROM TABLE
WHERE VALUE IN
(SELECT MAX(VALUE) FROM TABLE group by ID))
) t2
ON
t2.ID= t1.ID
and
t2.VALUE= t1.VALUE
WHERE t2.aux IS NULL
Example:
Table data:
ID - Value
a - 1
a - 2
a - 3
b - 2
c - 3
Output:
ID - Value
a - 1
a - 2
3. Use the query of point 2 to delete rows from the table (DELETE FROM TABLE WHERE (ID,VALUE) IN (SELECT * FROM TABLE WHERE NOT EXISTS (SELECT * FROM TABLE WHERE VALUE IN (SELECT MAX(VALUE) FROM TABLE GROUP BY ID))))
Example:
Table data:
ID - Value
a - 1
a - 2
a - 3
b - 2
c - 3
Table data after the delete:
ID - Value
a - 3
b - 2
c - 3
Point 2 does not work: it returns no results. When I compared the row count of the query output from point 2 with the total row count of the table, there was a difference.
Since point 2 does not work, point 3 fails as well. What am I doing wrong?
After our discussion, I understand that you want to select the rows that match each id together with its max(value). I can therefore suggest the following script:
SELECT
DISTINCT a.*
FROM
`test-proj-261014.sample.id_value` a
RIGHT JOIN (
SELECT
id,
MAX(value) AS max_val
FROM
`test-proj-261014.sample.id_value`
GROUP BY
id
ORDER BY
id) b
ON
a.id = b.id
AND a.value = b.max_val
WHERE
a.value IS NOT NULL
ORDER BY
id;
Note that I use SELECT DISTINCT, which removes duplicated rows from the result. In addition, because null values may exist, I added the condition WHERE a.value IS NOT NULL, which filters out the rows that do not satisfy it.
The above query should solve the problem. However, if you find any discrepancy with the expected number of rows, I encourage you to explore your data set and find out why there are extra or fewer rows. You can use different types of joins to do so; one example is the following query:
SELECT
a.*
FROM
`test-proj-261014.sample.id_value` a
LEFT JOIN (
SELECT
id,
MAX(value) AS max_val
FROM
`test-proj-261014.sample.id_value`
GROUP BY
id
ORDER BY
id) b
ON
a.id = b.id
AND a.value = b.max_val
WHERE
b.max_val IS NULL
ORDER BY
id;
This query retrieves all the values which are not present in the final output generated by the first query. This would help you understand better the data you are dealing with.
I hope it helps.
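For the delete itself, one possible approach is a correlated subquery that compares each row with the maximum of its own ID. The sketch below is not the answerer's method; it assumes Python's built-in sqlite3 module and a hypothetical table id_value(id, value) loaded with the question's sample data. It first shows what an uncorrelated VALUE IN (SELECT MAX ...) filter returns on that data, which may explain the row-count discrepancy.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE id_value (id TEXT, value INTEGER)")
conn.executemany(
    "INSERT INTO id_value VALUES (?, ?)",
    [("a", 1), ("a", 2), ("a", 3), ("b", 2), ("c", 3)],
)

# The uncorrelated filter matches any row whose value equals the maximum of
# *some* id, so ('a', 2) slips through because 2 is the maximum for 'b'.
uncorrelated = conn.execute("""
SELECT id, value FROM id_value
WHERE value IN (SELECT MAX(value) FROM id_value GROUP BY id)
ORDER BY id, value
""").fetchall()

# Correlated delete: remove every row that is below the maximum of its own id.
conn.execute("""
DELETE FROM id_value
WHERE value < (SELECT MAX(m.value) FROM id_value AS m
               WHERE m.id = id_value.id)
""")
remaining = conn.execute(
    "SELECT id, value FROM id_value ORDER BY id"
).fetchall()

print(uncorrelated)
print(remaining)
```

If several rows share the maximum value for an ID, this delete keeps all of them; removing those ties as well would need an extra tie-breaking column.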

SQL: Get duplicates in same table [duplicate]

I have the following table:
name email number type
1 abc#example.com 10 A
1 abc#example.com 10 B
2 def#def.com 20 B
3 ggg#ggg.com 30 B
1 abc#example.com 10 A
4 hhh#hhh.com 60 A
I want the following:
Result
name email number type
1 abc#example.com 10 A
1 abc#example.com 10 B
1 abc#example.com 10 A
Basically, I want to find the first lines where the three columns (name, email, number) are identical and see them, regardless of type.
How can I achieve this in SQL? I don't want a result with every combination once, I want to see every line that is in the table multiple times.
I thought of doing a group by but a group by gives me only the unique combinations and every line once. I tried it with a join on the table itself but somehow it got too bloated.
Any ideas?
EDIT: I want to display the type column as well, so group by isn't working and therefore, it's not a duplicate.
You can use EXISTS for that case:
select t.*
from table t
where exists (select 1
from table
where name = t.name and email = t.email and
number = t.number and type <> t.type);
You can also use a window function if your DBMS supports it:
select *
from (select *, count(*) over (partition by name, email, number) Counter
from table
) t
where counter > 1;
Core SQL-99 compliant solution.
Have a sub-query that returns name, email, number combinations having duplicates. JOIN with that result:
select t1.*
from tablename t1
join (select name, email, number
from tablename
group by name, email, number
having count(*) > 1) t2
on t1.name = t2.name
and t1.email = t2.email
and t1.number = t2.number
You can use window functions:
select t.*
from (select t.*, count(*) over (partition by name, email, number) as cnt
from t
) t
where cnt > 1;
If you only want combos that have different types (which might be your real problem), I would suggest exists:
select t.*
from t
where exists (select 1
from t t2
where t2.name = t.name and t2.email = t.email and t2.number = t.number and t2.type <> t.type
);
For performance, you want an index on (name, email, number, type) for this version.
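The difference between the "any duplicate" query and the "duplicates with different types" query is easier to see on data. The sketch below assumes Python's built-in sqlite3 module and a hypothetical table name contacts. It loads the question's rows plus two extra rows (name 5) that are not in the original and exist only to show a duplicate whose type does not vary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE contacts (name INTEGER, email TEXT, number INTEGER, type TEXT)"
)
conn.executemany(
    "INSERT INTO contacts VALUES (?, ?, ?, ?)",
    [
        (1, "abc#example.com", 10, "A"),
        (1, "abc#example.com", 10, "B"),
        (2, "def#def.com", 20, "B"),
        (3, "ggg#ggg.com", 30, "B"),
        (1, "abc#example.com", 10, "A"),
        (4, "hhh#hhh.com", 60, "A"),
        # Hypothetical rows, not in the question: duplicated with the same type.
        (5, "iii#iii.com", 70, "A"),
        (5, "iii#iii.com", 70, "A"),
    ],
)

# Every row whose (name, email, number) combination occurs more than once.
all_dupes = conn.execute("""
SELECT t1.name, t1.type
FROM contacts t1
JOIN (SELECT name, email, number
      FROM contacts
      GROUP BY name, email, number
      HAVING COUNT(*) > 1) t2
  ON t1.name = t2.name AND t1.email = t2.email AND t1.number = t2.number
""").fetchall()

# Only rows that have a counterpart with a *different* type.
mixed_type_dupes = conn.execute("""
SELECT t.name, t.type
FROM contacts t
WHERE EXISTS (SELECT 1 FROM contacts t2
              WHERE t2.name = t.name AND t2.email = t.email
                AND t2.number = t.number AND t2.type <> t.type)
""").fetchall()

print(len(all_dupes), len(mixed_type_dupes))
```

The grouped join reports the hypothetical name 5 rows as duplicates, while the EXISTS version with type <> skips them, so the choice depends on which notion of duplicate is wanted.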

how do I make multiple count under having clause

some sample data:
Id name value ref
1 ab xy
2 aba z
3 ab xy
4 abc def
5 gxr mdy
What I am trying to do is get the rows whose combination of the two columns (name, value) appears more than once,
so row 1 and row 3 would be selected.
select name, value from table_x
where value is not null group by name having count(name) >= 2
and having count(value) >= 2;
I got stuck.
@vkp's answer is correct if you only care about finding the distinct name/value pairs that appear more than once. But if you actually want the individual rows that satisfy the criteria, try this:
SELECT t1.Name, t1.[Value]
FROM Table_X t1
JOIN
(
SELECT Name, [Value]
FROM Table_X
where [Value] IS NOT NULL
GROUP BY Name, [Value]
HAVING COUNT(1) >= 2
) t2 ON t1.Name = t2.Name AND t1.[Value] = t2.[Value]
Your syntax is incorrect. Group by both name and value, then check for a count >= 2:
select name, value
from table_x
where value is not null
group by name, value
having count(*) >= 2;
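Both answers can be tried side by side. The sketch below uses Python's built-in sqlite3 module and assumes that the third column of the sample data is value and that ref is empty, since the flattened sample does not make the column boundaries clear.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_x (id INTEGER, name TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO table_x VALUES (?, ?, ?)",
    [(1, "ab", "xy"), (2, "aba", "z"), (3, "ab", "xy"),
     (4, "abc", "def"), (5, "gxr", "mdy")],
)

# Distinct (name, value) pairs that occur at least twice.
pairs = conn.execute("""
SELECT name, value
FROM table_x
WHERE value IS NOT NULL
GROUP BY name, value
HAVING COUNT(*) >= 2
""").fetchall()

# The individual rows behind those pairs, found by joining back to the table.
ids = [row[0] for row in conn.execute("""
SELECT t1.id
FROM table_x t1
JOIN (SELECT name, value
      FROM table_x
      WHERE value IS NOT NULL
      GROUP BY name, value
      HAVING COUNT(*) >= 2) t2
  ON t1.name = t2.name AND t1.value = t2.value
ORDER BY t1.id
""")]

print(pairs)
print(ids)
```

The grouped query returns one line per repeated pair, while the join returns each underlying row, here rows 1 and 3.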

Count the number of pairs in SQL

I have a column and I would like to count the number of unique pairings of the elements within the column in SQL, for example, in Col 1 the number of unique pairings should be 6: ([1,2],[1,3],[1,4],[2,3],[2,4],[3,4]). Thanks!
col 1,
1
2
3
4
Consider a scenario in which the table contains duplicate values, for example:
col1
1
1
2
3
4
5
The total number of unique combinations is 10: ([1,2],[1,3],[1,4],[1,5],[2,3],[2,4],[2,5],[3,4],[3,5],[4,5]).
But the query below gives a count of 14, because the duplicate 1 makes the 4 pairs [1,2],[1,3],[1,4],[1,5] count twice.
select count(*)
from table t1 join
table t2
on t1.col1 < t2.col1;
To fix this, I use the following query, which removes the duplicates first and so returns the correct count. The table I have used is named countunique and stores integer values in a column named col1.
select count(*) from
(select distinct col1 from countunique) t1
join (select distinct col1 from countunique) t2
on t1.col1<t2.col1
See the SQL Fiddle for reference.
I hope this answers your question.
There are two ways. The cumbersome way is to generate the pairs and then count them:
select count(*)
from table t1 join
table t2
on t1.col1 < t2.col1;
The simpler way is to use the formula n(n-1)/2, which assumes the values in col1 are distinct:
select count(*) * (count(*) - 1) / 2
from table t;
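The three counts discussed in this thread can be compared directly. The sketch below assumes Python's built-in sqlite3 module and reuses the countunique table and col1 column from the earlier answer, loaded with the duplicate-containing example. The COUNT(DISTINCT ...) form of the formula is an adaptation for duplicated values, not something stated in the original answers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE countunique (col1 INTEGER)")
conn.executemany(
    "INSERT INTO countunique VALUES (?)",
    [(1,), (1,), (2,), (3,), (4,), (5,)],
)

# Self-join on the raw table: the duplicate 1 inflates the count.
naive = conn.execute("""
SELECT COUNT(*)
FROM countunique t1 JOIN countunique t2 ON t1.col1 < t2.col1
""").fetchone()[0]

# Self-join on the distinct values only.
deduped = conn.execute("""
SELECT COUNT(*)
FROM (SELECT DISTINCT col1 FROM countunique) t1
JOIN (SELECT DISTINCT col1 FROM countunique) t2 ON t1.col1 < t2.col1
""").fetchone()[0]

# Closed form n * (n - 1) / 2, with n taken as the number of distinct values.
formula = conn.execute("""
SELECT COUNT(DISTINCT col1) * (COUNT(DISTINCT col1) - 1) / 2
FROM countunique
""").fetchone()[0]

print(naive, deduped, formula)
```

The formula avoids the quadratic self-join entirely, which matters once the column holds more than a handful of values.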