Remove duplicate rows from one column

The query is:
select (..)
UNION
select (..)
The result is:
Col1, Col2, Col3
Val1 Text1 Data
Val1 Text2 Data
The problem is that I need to keep only one of these two rows. The Col2 values are not literally identical, but they are equivalent in terms of business logic.
So, how can I get a result like this:
Col1, Col2, Col3
Val1 Text1 Data
OR
Col1, Col2, Col3
Val1 Text2 Data
Thank you!

You can place the UNION in a subquery and group again
SELECT
Col1,
MIN(Col2),
Col3
FROM (
SELECT Col1, Col2, Col3
FROM table1 t1
UNION ALL
SELECT Col1, Col2, Col3
FROM table2 t2
) t
GROUP BY
Col1,
Col3;
Note the use of UNION ALL rather than UNION: because you are grouping anyway, it is not necessary to de-duplicate first.
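As a rough check of the subquery-and-group idea, here is a minimal sketch that runs it against an in-memory SQLite database from Python. SQLite is only a stand-in for whatever engine you actually use, and the two tables and their single rows are assumptions made up to match the sample result in the question.

```python
import sqlite3

# In-memory database; table1/table2 and their rows are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (Col1 TEXT, Col2 TEXT, Col3 TEXT);
    CREATE TABLE table2 (Col1 TEXT, Col2 TEXT, Col3 TEXT);
    INSERT INTO table1 VALUES ('Val1', 'Text1', 'Data');
    INSERT INTO table2 VALUES ('Val1', 'Text2', 'Data');
""")

# Group on the columns that identify a row (Col1, Col3) and collapse Col2 with MIN.
rows = conn.execute("""
    SELECT Col1, MIN(Col2) AS Col2, Col3
    FROM (
        SELECT Col1, Col2, Col3 FROM table1
        UNION ALL
        SELECT Col1, Col2, Col3 FROM table2
    ) AS t
    GROUP BY Col1, Col3
""").fetchall()

print(rows)  # [('Val1', 'Text1', 'Data')]
```

Swapping MIN for MAX would keep the Text2 row instead.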

Hmmm . . . If you want one row per val, then one method is:
with t1 as ( < query 1 here > ),
t2 as ( < query 2 here > )
select t1.*
from t1
union all
select t2.*
from t2
where not exists (select 1 from t1 where t1.val = t2.val);
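The NOT EXISTS variant can be tried out the same way. This is only a sketch: it uses SQLite from Python as a stand-in engine, the tables q1 and q2 play the role of the two elided queries, and the column names val, txt and data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# q1 and q2 stand in for "query 1" and "query 2"; names and rows are assumptions.
conn.executescript("""
    CREATE TABLE q1 (val TEXT, txt TEXT, data TEXT);
    CREATE TABLE q2 (val TEXT, txt TEXT, data TEXT);
    INSERT INTO q1 VALUES ('Val1', 'Text1', 'Data');
    INSERT INTO q2 VALUES ('Val1', 'Text2', 'Data');
    INSERT INTO q2 VALUES ('Val2', 'Text3', 'Data');
""")

# Take every row of t1, then only those rows of t2 whose val is absent from t1.
rows = conn.execute("""
    WITH t1 AS (SELECT * FROM q1),
         t2 AS (SELECT * FROM q2)
    SELECT t1.* FROM t1
    UNION ALL
    SELECT t2.* FROM t2
    WHERE NOT EXISTS (SELECT 1 FROM t1 WHERE t1.val = t2.val)
    ORDER BY 1
""").fetchall()

print(rows)  # [('Val1', 'Text1', 'Data'), ('Val2', 'Text3', 'Data')]
```

With this approach the first query always wins when both queries produce a row for the same val.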

Related

Delete Duplicate record in sql server if 2 colums matching

Col1 Col2 Col3
A B 1
A B 1
A B 2
A B 2
A c 1
When the Col1 and Col2 values are the same but the Col3 values differ, I don't want those rows in the result set.
I want the result below. I tried row_number, group by, and many other things, but nothing worked. Please help.
Col1 Col2 Col3
A c 1

You can use exists:
delete from t
where exists (select 1
from t t2
where t2.col1 = t.col1 and t2.col2 = t.col2 and
t2.col3 <> t.col3
);
You can also use window functions:
with todelete as (
select t.*,
min(col3) over (partition by col1, col2) as min_col3,
max(col3) over (partition by col1, col2) as max_col3
from t
)
delete from todelete
where min_col3 <> max_col3;
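To see the EXISTS-based delete in action on the sample data from the question, here is a small sketch using SQLite from Python. SQLite is an assumption standing in for SQL Server, and it does not allow deleting through a CTE, so only the EXISTS form is shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Sample rows copied from the question; the table name t follows the answer.
conn.executescript("""
    CREATE TABLE t (col1 TEXT, col2 TEXT, col3 INTEGER);
    INSERT INTO t VALUES
        ('A', 'B', 1), ('A', 'B', 1),
        ('A', 'B', 2), ('A', 'B', 2),
        ('A', 'c', 1);
""")

# Delete every row that has a sibling with the same col1/col2 but a different col3.
conn.execute("""
    DELETE FROM t
    WHERE EXISTS (
        SELECT 1 FROM t AS t2
        WHERE t2.col1 = t.col1
          AND t2.col2 = t.col2
          AND t2.col3 <> t.col3
    )
""")

remaining = conn.execute("SELECT col1, col2, col3 FROM t").fetchall()
print(remaining)  # [('A', 'c', 1)]
```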

The best approach is to make these columns a unique composite key. But here is a query to delete all records other than your desired result.
delete from Table_1
where exists (select 1
from Table_1 t2
-- correlate on both columns so that each (Col1, Col2) pair is checked on its own
where t2.Col1 = Table_1.Col1 and t2.Col2 = Table_1.Col2
group by t2.Col1, t2.Col2
having count(*) > 1)
This might not be the most efficient query, but it works. If you don't want to delete the duplicated records and only want to retrieve the unique ones:
SELECT Col1,Col2
FROM table_1
GROUP BY Col1, Col2
HAVING Count(*) = 1
To get duplicating records:
SELECT Col2,Col1
FROM table_1
GROUP BY Col1, Col2
HAVING Count(*) > 1

SQL UNION - Adding Source

I am currently using UNION on two queries (see pseudo-code below):
query1
UNION
query2
I want to add an additional column to my results that says the source of the data. The new column called "Source" would return one of the following: "1", "2", or "both".
Being able to handle "both" is very important because query1 and query2 will have similar results and many overlapping records. If anyone could help point me in the right direction, especially with how to handle the "both" case, that would be greatly appreciated!
Sample:
If query1 has a row "Apple,Yellow,Bob" and query2 has the same row, then the result I'm hoping for is:
"Apple,Yellow,Bob,Both"
The individual queries themselves will not have duplicates, but there may be the same row both in query1 and query2 (as seen above).

You can make use of an additional column, col4, like this:
select col1, col2, col3, sum(col4)
from (
select col1, col2, col3, 1 as col4 from table1
UNION
select col1, col2, col3, 2 as col4 from table2
) t
group by col1, col2, col3
Records with a sum of 1 exist only in table1.
Records with a sum of 2 exist only in table2.
Records with a sum of 3 exist in both table1 and table2.
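A quick way to convince yourself that the flag sum works is to run it on a tiny data set. The sketch below uses SQLite from Python as a stand-in engine; the rows, the labels dictionary and the variable names are assumptions for illustration. It relies on the question's guarantee that neither query contains duplicates on its own, otherwise the sums would exceed 3.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample rows: one shared, one only in table1, one only in table2.
conn.executescript("""
    CREATE TABLE table1 (col1 TEXT, col2 TEXT, col3 TEXT);
    CREATE TABLE table2 (col1 TEXT, col2 TEXT, col3 TEXT);
    INSERT INTO table1 VALUES ('Apple', 'Yellow', 'Bob'), ('Pear', 'Green', 'Ann');
    INSERT INTO table2 VALUES ('Apple', 'Yellow', 'Bob'), ('Plum', 'Purple', 'Cy');
""")

rows = conn.execute("""
    SELECT col1, col2, col3, SUM(col4) AS flag
    FROM (
        SELECT col1, col2, col3, 1 AS col4 FROM table1
        UNION ALL
        SELECT col1, col2, col3, 2 AS col4 FROM table2
    ) AS t
    GROUP BY col1, col2, col3
    ORDER BY col1
""").fetchall()

# Translate the numeric flag into the labels the question asked for.
labels = {1: "1", 2: "2", 3: "both"}
labelled = [(c1, c2, c3, labels[flag]) for c1, c2, c3, flag in rows]
print(labelled)
# [('Apple', 'Yellow', 'Bob', 'both'), ('Pear', 'Green', 'Ann', '1'), ('Plum', 'Purple', 'Cy', '2')]
```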

add a Source field to both query 1 and query 2:
select 1 as source, ...
from table1
union
select 2 as source, ...
from table2

Here's one way
WITH T
AS (SELECT '1' AS Source,
Col1,
Col2,
Col3
FROM table1
UNION ALL
SELECT '2' AS Source,
Col1,
Col2,
Col3
FROM table2)
SELECT CASE
WHEN MAX(Source) = MIN(Source) THEN MAX(Source)
ELSE 'Both'
END AS Source,
Col1,
Col2,
Col3
FROM T
GROUP BY Col1,
Col2,
Col3
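The MIN/MAX comparison can be checked with a short script as well. This is a sketch under assumptions: SQLite from Python replaces SQL Server, the rows are invented, and the inner column is called src (a hypothetical name) to keep it distinct from the output alias.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Invented rows: Apple appears in both tables, the others in one table each.
conn.executescript("""
    CREATE TABLE table1 (Col1 TEXT, Col2 TEXT, Col3 TEXT);
    CREATE TABLE table2 (Col1 TEXT, Col2 TEXT, Col3 TEXT);
    INSERT INTO table1 VALUES ('Apple', 'Yellow', 'Bob'), ('Pear', 'Green', 'Ann');
    INSERT INTO table2 VALUES ('Apple', 'Yellow', 'Bob'), ('Plum', 'Purple', 'Cy');
""")

# If the smallest and largest source label in a group agree, the row came
# from a single query; otherwise it came from both.
rows = conn.execute("""
    WITH T AS (
        SELECT '1' AS src, Col1, Col2, Col3 FROM table1
        UNION ALL
        SELECT '2' AS src, Col1, Col2, Col3 FROM table2
    )
    SELECT CASE WHEN MAX(src) = MIN(src) THEN MAX(src) ELSE 'Both' END AS Source,
           Col1, Col2, Col3
    FROM T
    GROUP BY Col1, Col2, Col3
    ORDER BY Col1
""").fetchall()

print(rows)
# [('Both', 'Apple', 'Yellow', 'Bob'), ('1', 'Pear', 'Green', 'Ann'), ('2', 'Plum', 'Purple', 'Cy')]
```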

One more approach
SELECT col1
,col2
,source = CASE
WHEN count(DISTINCT source) > 1
THEN 'Both'
ELSE max(source)
END
FROM (
SELECT col1 ,col2, source = 'source1'
FROM source1
UNION ALL
SELECT col1, col2, source = 'source2'
FROM source2
) u
GROUP BY col1, col2

You can try this
SELECT
a.col1 , a.col2,
CASE WHEN MAX(a.Source) <> MIN(a.Source)
THEN 'BOTH'
ELSE MAX(a.Source) END
FROM
(
SELECT
col1, col2 ,'Source1' AS Source
FROM Table1
UNION ALL
SELECT
col1, col2 ,'Source2' AS Source
FROM Table2
) a
GROUP BY
a.col1 , a.col2

SQL Where Not Exists

I think I have a misunderstanding of how NOT EXISTS works and hope it can be clarified for me.
Here is the sample code I am running (also on SQL Fiddle)
select sum(col1) col1, sum(col2) col2, sum(col3) col3
from (
select 1 col1, 1 col2, 1 col3
from dual tbl1
)
where not exists(
select 2 col1, 1 col2, 1 col3
from dual tbl2
)
I thought that it should return:
1, 1, 1
But instead it returns nothing.
I make this assumption only because I thought NOT EXISTS would give me a list of all the rows in the first query that do not exist in the second query (in this case 1,1,1).
Why does this not work?
What would be the appropriate way to make it work the way I am expecting it to?

You are performing an uncorrelated subquery in your NOT EXISTS() condition. It always returns exactly one row, therefore the NOT EXISTS condition is never satisfied, and no rows from the inline view reach the SUMs.
Oracle has a rowset difference operator, MINUS, that should do what you wanted:
select sum(col1) col1, sum(col2) col2, sum(col3) col3
from (
select 1 col1, 1 col2, 1 col3
from dual tbl1
MINUS
select 2 col1, 1 col2, 1 col3
from dual tbl2
)
SQL Server has an EXCEPT operator that does the same thing as Oracle's MINUS. Some other databases implement one or the other of these.

EXISTS just returns true if a record exists in the result set; it does not do any value checking. Since the sub-query returns one record, EXISTS is true, NOT EXISTS is false, and you get no records in your result.
Typically you have a WHERE clause in the sub-query to compare values to the outer query.
One way to accomplish what you want is to use EXCEPT:
select sum(col1) col1, sum(col2) col2, sum(col3) col3
from (
select 1 col1, 1 col2, 1 col3
from dual tbl1
)
EXCEPT(
select 2 col1, 1 col2, 1 col3
from dual tbl2
)

A NOT EXISTS whose subquery is an unfiltered select from dual will never return anything, because dual always contains one row. NOT EXISTS excludes rows for which the embedded SQL returns something. Normally NOT EXISTS should be used more like this:
select ... from MY_TABLE A where not exists (select 1 from OTHER_TABLE B where A.SOME_COL = B.SOME_COL)
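The difference between the uncorrelated and the correlated form is easy to reproduce. The following sketch uses SQLite from Python as a stand-in for Oracle; SQLite has no dual table, so the constant rows are written as FROM-less selects, and the aliases a and b are assumptions. The SUMs from the question are left out so that the filtered rows themselves are visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Uncorrelated: the subquery always yields one row, so NOT EXISTS is always false.
uncorrelated = conn.execute("""
    SELECT col1, col2, col3
    FROM (SELECT 1 AS col1, 1 AS col2, 1 AS col3) AS a
    WHERE NOT EXISTS (SELECT 2 AS col1, 1 AS col2, 1 AS col3)
""").fetchall()

# Correlated: the subquery is filtered against the outer row, so it is empty
# whenever no matching row exists, and NOT EXISTS becomes true.
correlated = conn.execute("""
    SELECT col1, col2, col3
    FROM (SELECT 1 AS col1, 1 AS col2, 1 AS col3) AS a
    WHERE NOT EXISTS (
        SELECT 1
        FROM (SELECT 2 AS col1, 1 AS col2, 1 AS col3) AS b
        WHERE b.col1 = a.col1 AND b.col2 = a.col2 AND b.col3 = a.col3
    )
""").fetchall()

print(uncorrelated)  # []
print(correlated)    # [(1, 1, 1)]
```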

NOT EXISTS is not a good fit here, because the subquery always returns a single row. Try MINUS or EXCEPT instead:
select sum(col1) col1, sum(col2) col2, sum(col3) col3
from (
select 1 col1, 1 col2, 1 col3
from dual tbl1
MINUS
select 2 col1, 1 col2, 1 col3
from dual tbl2
)
select sum(col1) col1, sum(col2) col2, sum(col3) col3
from (
select 1 col1, 1 col2, 1 col3
from dual tbl1
)
EXCEPT(
select 2 col1, 1 col2, 1 col3
from dual tbl2
)

select all columns with one column has different value

In my table, some records have the same values in every column except one. I need to write a query to get those records. What's the best way to do it? The table looks like this:
colA colB colC
a b c
a b d
a b e
What's the best way to get all records with all the columns? Thanks for everyone's help.

Assuming you know that column3 will always be different, to get the rows that have more than one value:
SELECT Col1, Col2
FROM Table t
GROUP BY Col1, Col2
HAVING COUNT(distinct col3) > 1
If you need all the values in the three columns, then you can join this back to the original table:
SELECT t.*
FROM table t join
(SELECT Col1, Col2
FROM Table t
GROUP BY Col1, Col2
HAVING COUNT(distinct col3) > 1
) cols
on t.col1 = cols.col1 and t.col2 = cols.col2
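Here is a minimal sketch of the group-then-join-back pattern, run on SQLite from Python. The table name records is hypothetical (the answer's Table is a placeholder and a reserved word), and the extra x/y/z row is an assumption added to show that groups with a single colC value are left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Rows a/b/* come from the question; the x/y/z row is an added control case.
conn.executescript("""
    CREATE TABLE records (colA TEXT, colB TEXT, colC TEXT);
    INSERT INTO records VALUES
        ('a', 'b', 'c'), ('a', 'b', 'd'), ('a', 'b', 'e'),
        ('x', 'y', 'z');
""")

# Find (colA, colB) groups with more than one distinct colC,
# then join back to fetch the full rows of those groups.
rows = conn.execute("""
    SELECT r.colA, r.colB, r.colC
    FROM records AS r
    JOIN (
        SELECT colA, colB
        FROM records
        GROUP BY colA, colB
        HAVING COUNT(DISTINCT colC) > 1
    ) AS g
      ON r.colA = g.colA AND r.colB = g.colB
    ORDER BY r.colC
""").fetchall()

print(rows)  # [('a', 'b', 'c'), ('a', 'b', 'd'), ('a', 'b', 'e')]
```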

Just select those rows that have the different values:
SELECT col1, col2
FROM myTable
WHERE colWanted != knownValue
If this is not what you are looking for, please post examples of the data in the table and the wanted output.

How about something like
SELECT Col1, Col2
FROM Table
GROUP BY Col1, Col2
HAVING COUNT(*) = 1
This will give you Col1, Col2 that have unique data.

Assuming col3 holds the differences:
SELECT Col1, Col2
FROM Table
GROUP BY Col1, Col2
HAVING COUNT(*) > 1
Or, to show all three columns:
SELECT Col1, Col2, Col3
FROM Table1
GROUP BY Col1, Col2, Col3
HAVING COUNT(Col3) > 1

SQL query to simulate distinct

SELECT DISTINCT col1, col2 FROM table t ORDER BY col1;
This gives me the distinct combinations of col1 and col2. Is there an alternative way of writing the Oracle SQL query to get the unique combinations of col1 and col2 without using the keyword DISTINCT?

Use the UNIQUE keyword which is a synonym for DISTINCT:
SELECT UNIQUE col1, col2 FROM table t ORDER BY col1;

I don't see why you would want to but you could do
SELECT col1, col2 FROM table_t GROUP BY col1, col2 ORDER BY col1
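To check that GROUP BY on all selected columns gives the same rows as DISTINCT, a short comparison can be run. This is a sketch only: SQLite from Python stands in for Oracle, and the table name the_table and its rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data containing one repeated (col1, col2) combination.
conn.executescript("""
    CREATE TABLE the_table (col1 INTEGER, col2 TEXT);
    INSERT INTO the_table VALUES (1, 'a'), (1, 'a'), (2, 'b'), (1, 'c');
""")

distinct_rows = conn.execute(
    "SELECT DISTINCT col1, col2 FROM the_table ORDER BY col1, col2"
).fetchall()

# Grouping by every selected column collapses duplicates just like DISTINCT.
grouped_rows = conn.execute(
    "SELECT col1, col2 FROM the_table GROUP BY col1, col2 ORDER BY col1, col2"
).fetchall()

print(grouped_rows)                   # [(1, 'a'), (1, 'c'), (2, 'b')]
print(distinct_rows == grouped_rows)  # True
```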

Another - yet overly complex and somewhat useless - solution:
select *
from (
select col1,
col2,
row_number() over (partition by col1, col2 order by col1, col2) as rn
from the_table
)
where rn = 1
order by col1

select col1, col2
from table
group by col1, col2
order by col1
or a less elegant way:
select col1,col2 from table
UNION
select col1,col2 from table
order by col1;
or an even less elegant way:
select a.col1, a.col2
from (select col1, col2 from table
UNION
select NULL, NULL from dual) a
where a.col1 is not null
order by a.col1

Yet another ...
select
col1,
col2
from
table t1
where
not exists (select *
from table t2
where t2.col1 = t1.col1 and
t2.col2 = t1.col2 and
t2.rowid > t1.rowid)
order by
col1;

Variations on the UNION solution by @aF.:
INTERSECT
SELECT col1, col2 FROM tableX
INTERSECT
SELECT col1, col2 FROM tableX
ORDER BY col1;
MINUS
SELECT col1, col2 FROM tableX
MINUS
SELECT col1, col2 FROM tableX WHERE 0 = 1
ORDER BY col1;
MINUS (2nd version, it will return one row less than the other versions, if there is (NULL, NULL) group)
SELECT col1, col2 FROM tableX
MINUS
SELECT NULL, NULL FROM dual
ORDER BY col1;

Another ...
select col1,
col2
from (
select col1,
col2,
rowid,
min(rowid) over (partition by col1, col2) min_rowid
from table)
where rowid = min_rowid
order by col1;
