SQL Duplicates multi criteria selection - sql

I am trying to create a query that removes duplicates based on multiple rules.
Some sample data:
Column A: Column B:
One Apple
One Pear
Two Apple
Two Mango
Three Pear
Four Mango
Five Plum
Six Mango
Zero Banana
Essentially, I would like the query to return only the pairs whose values are not duplicated in either column. That is, if a value occurs more than once in column A, every row with that value is removed (for example, the two Two rows remove Two-Apple and Two-Mango). The same logic applies to column B (for example, the repeated Apple and Mango rows are removed). So the final result would be:
Column A: Column B:
Three Pear
Zero Banana
Five Plum
Any pointers would be great. I am on SQL Server. Thank you.

You could join the table back to itself and then select the rows that have no matches (i.e. no duplicates).
SELECT source.a, source.b
FROM my_table source
LEFT JOIN my_table a_dups
ON source.a = a_dups.a
AND source.b <> a_dups.b
LEFT JOIN my_table b_dups
ON source.b = b_dups.b
AND source.a <> b_dups.a
WHERE a_dups.a IS NULL
AND b_dups.b IS NULL
I am typing this outside an IDE, so pardon any SQL errors, but hopefully it gives you the idea.

You can use window functions to get the count for each column value and then check that the count is one for both, like this:
SELECT ColumnA, ColumnB
FROM (
SELECT ColumnA, ColumnB,
COUNT(*) OVER (PARTITION BY ColumnA) as ACount,
COUNT(*) OVER (PARTITION BY ColumnB) as BCount
FROM YourTable
) X
WHERE ACount = 1 AND BCount = 1

Here it goes:
Creating sample data set:
CREATE TABLE #temp (ColumnA varchar(20), ColumnB varchar(20))
INSERT INTO #temp
VALUES('One','Apple'),
('One','Pear'),
('Two','Apple'),
('Two','Mango'),
('Three','Pear'),
('Four','Mango'),
('Five','Plum'),
('Six','Mango'),
('Zero','Banana');
Showing full data set:
SELECT * FROM #temp;
Using a common table expression with PARTITION BY to identify duplicates in both columns:
WITH CTE AS (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY ColumnA ORDER BY ColumnA) AS rn1,
ROW_NUMBER() OVER (PARTITION BY ColumnB ORDER BY ColumnB) AS rn2
FROM #temp
)
SELECT * FROM CTE
WHERE ColumnA NOT IN (SELECT ColumnA FROM CTE WHERE rn1 <> 1)
AND ColumnB NOT IN (SELECT ColumnB FROM CTE WHERE rn2 <> 1)
Showing result after:
SELECT * FROM #temp;

Related

Subtract one table from another

I have two tables with two different select statements. These tables contain only one column. I would like to subtract the rows of table2 from the rows of table1 only once. In other words, I would like to remove only one occurrence, not all.
table1:
apple
apple
orange
table2:
apple
pear
result:
apple
orange
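For reference, if A = {A, A, O} and B = {A, P}, then the plain set difference A - B is written as
select * from t1 except select * from t2
Note that EXCEPT returns distinct rows, so on the sample data this yields only orange.
Try this: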
create table #t(id varchar(10))
create table #t1(id1 varchar(10))
insert into #t values('apple'),('apple'),('orange')
insert into #t1 values('apple'),('pear')
select id
from
(
select id, rn = row_number() over (partition by id order by id) from #t
except
select id1, rn1 = row_number() over (partition by id1 order by id1) from #t1
) x
Here is an answer for an Oracle DBMS. The trick is to number the records per fruit, giving apple 1, apple 2, and so on. Then subtract the sets, so that, for instance, apple 1 is removed while apple 2 remains. (The row_number function requires a sort order; it does not matter here, but the syntax demands one.)
select fruit
from
(
select fruit, row_number() over (partition by fruit order by fruit)
from table1
minus
select fruit, row_number() over (partition by fruit order by fruit)
from table2
);
I don't think you can delete the records with a single SQL query.
In my opinion, you will need to connect to the database from a programming language and write an algorithm like this:
for(all records in table2)
{
if(record present in table1)
{
delete from table1 where record = (record in table2) limit 1;
}
}
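For SQL Server, a set-based alternative to the loop above may be possible: number the rows per value in both tables, then delete the rows of table1 whose (value, number) pair also occurs in table2. The following is a minimal sketch rather than a tested solution; the temp table names #table1 and #table2 and the column name fruit are assumptions made for illustration.

```sql
-- Hypothetical temp tables mirroring the question's sample data.
CREATE TABLE #table1 (fruit varchar(10));
CREATE TABLE #table2 (fruit varchar(10));
INSERT INTO #table1 VALUES ('apple'), ('apple'), ('orange');
INSERT INTO #table2 VALUES ('apple'), ('pear');

-- Number each occurrence per fruit in #table1 (apple 1, apple 2, ...).
WITH numbered1 AS (
    SELECT fruit,
           ROW_NUMBER() OVER (PARTITION BY fruit ORDER BY fruit) AS rn
    FROM #table1
)
-- Delete only the occurrences that have a same-numbered counterpart in #table2.
DELETE FROM numbered1
WHERE EXISTS (
    SELECT 1
    FROM (
        SELECT fruit,
               ROW_NUMBER() OVER (PARTITION BY fruit ORDER BY fruit) AS rn
        FROM #table2
    ) AS numbered2
    WHERE numbered2.fruit = numbered1.fruit
      AND numbered2.rn = numbered1.rn
);

-- Inspect what is left in #table1.
SELECT fruit FROM #table1;
```

On the sample data, only the first apple has a counterpart in #table2, so one apple and the orange should remain.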

How to select all columns for rows where I check if just 1 or 2 columns contain duplicate values

I'm having difficulty with what I figure should be an easy problem. I want to select all the columns in a table for which one particular column has duplicate values.
I've been trying to use aggregate functions, but that's constraining me as I want to just match on one column and display all values. Using aggregates seems to require that I 'group by' all columns I'm going to want to display.
If I understood you correctly, this should do:
SELECT *
FROM YourTable A
WHERE EXISTS(SELECT 1
FROM YourTable
WHERE Col1 = A.Col1
GROUP BY Col1
HAVING COUNT(*) > 1)
You can join on a derived table where you aggregate and determine "col" values which are duplicated:
SELECT a.*
FROM Table1 a
INNER JOIN
(
SELECT col
FROM Table1
GROUP BY col
HAVING COUNT(1) > 1
) b ON a.col = b.col
This query lets you ORDER BY cola in ascending or descending order, which changes which cola values are output.
with cl
as
(
select *, ROW_NUMBER() OVER(partition by colb order by cola ) as rn
from tbl)
select *
from cl
where rn > 1

How to select distinct rows with a specified condition

Suppose there is a table
_ _
a 1
a 2
b 2
c 3
c 4
c 1
d 2
e 5
e 6
How can I select the row with the minimum value for each group?
So the expected result here is:
_ _
a 1
b 2
c 1
d 2
e 5
EDIT
My actual table contains more columns and I want to select them all. The rows differ only in the last column (the second one in the example). I'm new to SQL, and my question was possibly ill-formed in its initial version.
The actual schema is:
| day | currency ('EUR', 'USD') | diff (integer) | id (foreign key) |
There are duplicate pairs (day, currency) that differ in (diff, id). I want to see a table with unique pairs (day, currency), each with the minimum diff from the original table.
Thanks!
in your case it's as simple as this:
select column1, min(column2) as column2
from YourTable
group by column1
for more than two columns I can suggest this:
select top 1 with ties
t.column1, t.column2, t.column3
from YourTable as t
order by row_number() over (partition by t.column1 order by t.column2)
take a look at this post https://stackoverflow.com/a/13652861/1744834
You can use the ranking function ROW_NUMBER() with a CTE to do this. In particular, if there are more columns than these two, it still returns one distinct row per group, like so:
;WITH RankedCTE
AS
(
SELECT *, ROW_NUMBER() OVER(PARTITION BY column1 ORDER BY column2) AS rownum
FROM YourTable
)
SELECT column1, column2
FROM RankedCTE
WHERE rownum = 1;
This will give you:
COLUMN1 COLUMN2
a 1
b 2
c 1
d 2
e 5
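Applied to the schema from the question's edit, the same ROW_NUMBER() approach might look like the sketch below. The table name #rates and the sample rows are made up for illustration; only the column names come from the question. Ordering by diff and then id is an assumption that makes the choice deterministic when two rows share the minimum diff.

```sql
-- Hypothetical temp table using the columns named in the question's edit.
CREATE TABLE #rates ([day] date, currency char(3), diff int, id int);
INSERT INTO #rates VALUES
    ('2013-01-01', 'EUR', 5, 10),
    ('2013-01-01', 'EUR', 2, 11),
    ('2013-01-01', 'USD', 7, 12),
    ('2013-01-02', 'EUR', 4, 13),
    ('2013-01-02', 'EUR', 9, 14);

-- Rank the rows inside each (day, currency) pair by diff, smallest first.
WITH ranked AS (
    SELECT [day], currency, diff, id,
           ROW_NUMBER() OVER (PARTITION BY [day], currency
                              ORDER BY diff, id) AS rn
    FROM #rates
)
-- Keep the full row that carries the minimum diff for each pair.
SELECT [day], currency, diff, id
FROM ranked
WHERE rn = 1;
```

Unlike a plain GROUP BY with MIN(diff), this keeps the id that belongs to the minimum-diff row instead of aggregating it separately.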
SELECT ColOne, Min(ColTwo)
FROM YourTable
GROUP BY ColOne
ORDER BY ColOne
PS: I'm not in front of a machine, but please give the above a try.
select MIN(col2),col1
from dbo.Table_1
group by col1

get subset of a table in SQL

I want to get a subset of a table, here's the example:
1 A
2 A
3 B
4 B
5 C
6 D
7 D
8 D
I want to get the unique record, but with the smallest id:
1 A
3 B
5 C
6 D
How can I write the SQL in SQL Server? Thanks!
Use a common-table expression like this:
;WITH DataCTE AS
(
SELECT ID, OtherCol,
ROW_NUMBER() OVER(PARTITION BY OtherCol ORDER BY ID) AS RowNum
FROM dbo.YourTable
)
SELECT *
FROM DataCTE
WHERE RowNum = 1
This "partitions" your data by the second column you have (A, B, C) and orders by the ID (1, 2, 3) - smallest ID first.
Therefore, within each "partition" (i.e. each value of your second column), the entry with RowNum = 1 is the one with the smallest ID.
select min(id), othercol
from thetable
group by othercol
and maybe with
order by othercol
... at the end if that's important
Try this:
SELECT MIN(Id) AS Id, Name
FROM MyTable
GROUP BY Name
select min(id), column2
from YourTable
group by column2
It helps if you provide the table information in the question - I've just guessed at the column names...

Select DISTINCT, return entire row

I have a table with 10 columns.
I want to return all rows for which Col006 is distinct, but return all columns...
How can I do this?
if column 6 appears like this:
| Column 6 |
| item1 |
| item1 |
| item2 |
| item1 |
I want to return two rows, one of the records with item1 and the other with item2, along with all other columns.
In SQL Server 2005 and above:
;WITH q AS
(
SELECT *, ROW_NUMBER() OVER (PARTITION BY col6 ORDER BY id) rn
FROM mytable
)
SELECT *
FROM q
WHERE rn = 1
In SQL Server 2000, provided that you have a primary key column:
SELECT mt.*
FROM (
SELECT DISTINCT col6
FROM mytable
) mto
JOIN mytable mt
ON mt.id =
(
SELECT TOP 1 id
FROM mytable mti
WHERE mti.col6 = mto.col6
-- ORDER BY
-- id
-- Uncomment the lines above if the order matters
)
Update:
Check your database version and compatibility level:
SELECT @@VERSION
SELECT COMPATIBILITY_LEVEL
FROM sys.databases
WHERE name = DB_NAME()
The keyword DISTINCT in SQL means "unique value". When applied to a single selected column, it returns as many rows as there are distinct values in that column. In effect it produces a grouped result set, so the values of the other columns are undefined unless they are fixed by aggregate functions (such as max, min, avg, etc.).
If you meant to say you want to return all rows for which Col006 has a specific value, then use the "where Col006 = value" clause.
If you meant to say you want to return all rows for which Col006 is different from all other values of Col006, then you still need to specify what that value is => see above.
If you want to say that the value of Col006 can only be evaluated once all rows have been retrieved, then use the "having Col006 = value" clause. This has the same effect as the "where" clause, but "where" gets applied when rows are retrieved from the raw tables, whereas "having" is applied once all other calculations have been made (i.e. aggregation functions have been run etc.) and just before the result set is returned to the user.
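The difference between the two clauses can be sketched on a small example. The temp table #items and its rows are hypothetical, chosen to match the item1/item2 values in the question.

```sql
-- Hypothetical temp table and data, for illustration only.
CREATE TABLE #items (Col001 int, Col006 varchar(10));
INSERT INTO #items VALUES (1, 'item1'), (2, 'item1'), (3, 'item2'), (4, 'item1');

-- WHERE filters individual rows before any grouping takes place.
SELECT Col001, Col006
FROM #items
WHERE Col006 = 'item1';

-- HAVING filters whole groups after aggregation, so it can test aggregates.
SELECT Col006, COUNT(*) AS cnt
FROM #items
GROUP BY Col006
HAVING COUNT(*) > 1;
```

The first query returns the three item1 rows; the second returns a single row for item1 with its count, because item2 occurs only once.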
UPDATE:
After having seen your edit, I have to point out that if you use any of the other suggestions, you will end up with random values in all other 9 columns for the row that contains the value "item1" in Col006, due to the constraint further up in my post.
You can group on Col006 to get the distinct values, but then you have to decide what to do with the multiple records in each group.
You can use aggregates to pick a value from the records. Example:
select Col006, min(Col001), max(Col002)
from TheTable
group by Col006
order by Col006
If you want the values to come from a specific record in each group, you have to identify it somehow. Example of using Col002 to identify the record in each group:
select t.Col006, t.Col001, t.Col002
from TheTable t
inner join (
select Col006, min(Col002) as Col002
from TheTable
group by Col006
) x on t.Col006 = x.Col006 and t.Col002 = x.Col002
order by t.Col006
SELECT *
FROM (SELECT DISTINCT YourDistinctField FROM YourTable) AS A
CROSS APPLY
( SELECT TOP 1 * FROM YourTable B
WHERE B.YourDistinctField = A.YourDistinctField ) AS NewTableName
I tried the answers posted above with no luck... but this does the trick!
select * from yourTable where column6 in (select distinct column6 from yourTable);
SELECT *
FROM harvest
GROUP BY estimated_total;
You can use GROUP BY and MIN() to get a more specific result.
Let's say that you have id as the primary key.
Say we want all the DISTINCT values of a column, estimated_total, along with one complete sample row for each distinct value. The following query should do the trick in MySQL with ONLY_FULL_GROUP_BY disabled; SQL Server rejects SELECT * combined with GROUP BY.
SELECT *, min(id)
FROM harvest
GROUP BY estimated_total;
create table #temp
(C1 TINYINT,
C2 TINYINT,
C3 TINYINT,
C4 TINYINT,
C5 TINYINT,
C6 TINYINT)
INSERT INTO #temp
SELECT 1,1,1,1,1,6
UNION ALL SELECT 1,1,1,1,1,6
UNION ALL SELECT 3,1,1,1,1,3
UNION ALL SELECT 4,2,1,1,1,6
SELECT * FROM #temp
SELECT *
FROM(
SELECT ROW_NUMBER() OVER (PARTITION BY C6 Order by C1) ID,* FROM #temp
)T
WHERE ID = 1