How to handle duplicates in sql on union all select statements


For example, if I have:
SELECT 'A@G.com' AS Email, 2 AS Somenumber, 3 AS Number
UNION ALL
SELECT 'A@G.com' AS Email, 2 AS Somenumber, 5 AS Number
UNION ALL
SELECT 'z@y.com' AS Email, 1 AS Somenumber, 6 AS Number
instead of:
I want to get:

SELECT Email, Somenumber, Number
FROM (
SELECT *, RowNum = ROW_NUMBER() OVER (PARTITION BY Email ORDER BY Number DESC)
FROM (
VALUES
('A@G.com', 2, 3),
('A@G.com', 2, 5),
('z@y.com', 1, 6)
) t(Email, Somenumber, Number)
) t
WHERE RowNum = 1
Output:
Email   Somenumber Number
------- ---------- ------
A@G.com 2          5
z@y.com 1          6

It looks like you're after one row per email. You can do that like:
; with all_rows as
(
... your union query here ...
)
, numbered_rows as
(
select row_number() over (partition by email order by somenumber) as rn
, *
from all_rows
)
select email
, somenumber
, number
from numbered_rows
WHERE rn = 1

I usually solve problems like this the way Devart does above, since that is how I tend to think about them.
Here is an alternative that may give a better execution plan:
select Email, max(Somenumber), max(Number) from (
SELECT 'A@G.com' AS Email, 2 AS Somenumber, 3 AS Number
UNION ALL
SELECT 'A@G.com' AS Email, 2 AS Somenumber, 5 AS Number
UNION ALL
SELECT 'z@y.com' AS Email, 1 AS Somenumber, 6 AS Number
) A
group by Email
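
One caveat with the GROUP BY variant is that each MAX is taken independently, so the result can combine values from different rows. The sketch below compares the two approaches on SQLite through Python's sqlite3 module (an assumption made so it is self-contained; window functions need SQLite 3.25 or later). The table all_rows and its sample data are hypothetical, chosen so that the two queries disagree.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for the UNION ALL result.
conn.execute("CREATE TABLE all_rows (Email TEXT, Somenumber INTEGER, Number INTEGER)")
conn.executemany(
    "INSERT INTO all_rows VALUES (?, ?, ?)",
    [("a@example.com", 9, 3), ("a@example.com", 2, 5), ("z@example.com", 1, 6)],
)

# Keeps one whole row per email: the one with the highest Number.
row_number_result = conn.execute("""
    SELECT Email, Somenumber, Number
    FROM (
        SELECT *, ROW_NUMBER() OVER (PARTITION BY Email ORDER BY Number DESC) AS RowNum
        FROM all_rows
    )
    WHERE RowNum = 1
    ORDER BY Email
""").fetchall()

# Takes the maximum of each column separately, so values may come from different rows.
group_by_result = conn.execute("""
    SELECT Email, MAX(Somenumber), MAX(Number)
    FROM all_rows
    GROUP BY Email
    ORDER BY Email
""").fetchall()

print(row_number_result)  # [('a@example.com', 2, 5), ('z@example.com', 1, 6)]
print(group_by_result)    # [('a@example.com', 9, 5), ('z@example.com', 1, 6)]
```

On the data in the question both queries return the same rows, because Somenumber is identical within each email.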

Related

How to do sorting and then numbering on an Oracle database

As an example I have a database with the following information
Name Number
Boris
Trevor
Arthur
bessie
big Dave
BOB
I want to be able to sort that data in the below order and then add a number to the number column in that specific order
Name Number
Arthur 1
BOB 2
Boris 3
big Dave 4
bessie 5
Trevor 6
I can select using the order I have specified using
select DB.TABLE.NAME , case
when row_number() over(partition by lower(DB.TABLE.NAME )
order by DB.TABLE.NAME ) = 1
then 1
else 0
end as result
from DB.TABLE;
but I then have no idea how to apply the numbers to the numbers column.
If I try a different method of sorting, I can use a sequence to apply the numbers but the order is not what I want. It seems to be the row_number() function that is causing me problems.
Any help would be appreciated.

I think what you're after is something like:
with sample_data as (select 'Boris' name from dual union all
select 'Trevor' name from dual union all
select 'BO Derek' name from dual union all
select 'Arthur' name from dual union all
select 'big dave' name from dual union all
select 'big Dave' name from dual union all
select 'BOB' name from dual union all
select 'BORAT' name from dual union all
select 'Brian' name from dual union all
select 'Big Bad Dom' name from dual)
-- end of creating a subquery "sample_data" to mimic a table with data in it.
-- see SQL below:
select name,
row_number() over (order by upper(substr(name, 1, 1)),
name) row_num
from sample_data
order by upper(substr(name, 1, 1)),
name;
NAME ROW_NUM
----------- ----------
Arthur 1
BO Derek 2
BOB 3
BORAT 4
Big Bad Dom 5
Boris 6
Brian 7
big Dave 8
big dave 9
Trevor 10
To update a table, you'd do something like (assuming name is a unique column):
merge into some_table tgt
using (select name,
row_number() over (order by upper(substr(name, 1, 1)),
name) row_num
from some_table) src
on (tgt.name = src.name)
when matched then
update set tgt.number = src.row_num;
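
A minimal sketch of the same number-then-write-back idea, run against SQLite through Python's sqlite3 module rather than Oracle (an assumption made so the example is self-contained; it needs SQLite 3.25 or later for window functions). The column name num is hypothetical, and a correlated UPDATE stands in for MERGE, which SQLite does not have. As in the answer, name is assumed to be unique.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: name is unique, num is the column to fill in.
conn.execute("CREATE TABLE some_table (name TEXT PRIMARY KEY, num INTEGER)")
names = ["Boris", "Trevor", "BO Derek", "Arthur", "big dave",
         "big Dave", "BOB", "BORAT", "Brian", "Big Bad Dom"]
conn.executemany("INSERT INTO some_table (name) VALUES (?)", [(n,) for n in names])

# Number all rows in one pass over the whole table, then match each row by name.
conn.execute("""
    UPDATE some_table
    SET num = (
        SELECT src.rn
        FROM (
            SELECT name,
                   ROW_NUMBER() OVER (ORDER BY upper(substr(name, 1, 1)), name) AS rn
            FROM some_table
        ) AS src
        WHERE src.name = some_table.name
    )
""")

numbered = conn.execute("SELECT name, num FROM some_table ORDER BY num").fetchall()
for name, num in numbered:
    print(num, name)
```

With SQLite's default binary collation the rows come out in the same order as the Oracle output shown above.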

Use a MERGE statement:
merge into the_table t
using (
select rowid as rid,
row_number() over(order by lower(name)) as result
from the_table
) nr on (nr.rid = t.rowid)
when matched then update
set "number" = nr.result;
I am not sure what the CASE is supposed to do. It only returns 1 or 0, but the expected result shows you want numbers from 1 to 6, so I removed the CASE.
If you have a proper primary key on the table, it's better to use that instead of rowid.

Try this.
select DB.TABLE.NAME ,
row_number() over(ORDER by DB.TABLE.NAME ) as Number
from DB.TABLE
order by DB.TABLE.NAME;
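If you are looking to update DB.TABLE, note that row_number() must be computed over the whole table before correlating on name; computed inside a subquery that is already filtered to one name, it would always return 1. Assuming name is unique:
update DB.TABLE t
set t.number = (select s.rn
from (select name, row_number() over (order by name) as rn
from DB.TABLE) s
where s.name = t.name);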

Thanks all for your suggestions.
I went with this hacky variation on the answer by @a_horse_with_no_name:
CREATE SEQUENCE NEWSEQ
START WITH 1
MAXVALUE 999999999999999999999999999
MINVALUE 1;
merge into DB.TABLE t
using (
select rowid as rid, DB.TABLE.NAME, case
when row_number() over(partition by lower(DB.TABLE.NAME )
order by DB.TABLE.NAME ) = 1
then 1
else 0
end as result
from DB.TABLE
) nr on (nr.rid = t.rowid)
when matched then update
set NUMBER = NEWSEQ.NEXTVAL;
drop sequence NEWSEQ;
It may not be the most efficient way to do it, but it works

Add A,B,C letters to Duplicate values

I am using SQL Server 2008 and need help producing the output below with a SQL query.
I have the following data in the table:
Id Code
-----------------
1 01012
2 01012
3 01012
4 01012
5 01013
6 01013
7 01014
I need the following output:
Id Code
-----------------
1 01012
2 01012A
3 01012B
4 01012C
5 01013
6 01013A
7 01014

You can use ROW_NUMBER. When Rn = 1, retain the original Code; otherwise append A, B, and so on.
To determine which letter to add, the formula is CHAR(65 + Rn - 2), so Rn = 2 gives CHAR(65) = 'A'.
WITH CTE AS(
SELECT *,
Rn = ROW_NUMBER() OVER(PARTITION BY Code ORDER BY Id)
FROM tbl
)
SELECT
Id,
Code = CASE
WHEN Rn = 1 THEN Code
ELSE Code + CHAR(65 + Rn - 2)
END
FROM CTE
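
The letter arithmetic can be checked outside the database. This is a small Python sketch of the same CASE expression, not the SQL Server query itself; the helper name suffixed_code is hypothetical, and a running per-code counter plays the role of ROW_NUMBER() OVER (PARTITION BY Code ORDER BY Id).

```python
def suffixed_code(code: str, rn: int) -> str:
    """Mirror CASE WHEN Rn = 1 THEN Code ELSE Code + CHAR(65 + Rn - 2) END."""
    if rn == 1:
        return code
    return code + chr(65 + rn - 2)  # rn = 2 -> 'A', rn = 3 -> 'B', ...


# Codes from the question, already in Id order.
codes = ["01012", "01012", "01012", "01012", "01013", "01013", "01014"]

seen = {}
result = []
for code in codes:
    seen[code] = seen.get(code, 0) + 1  # row number within this code
    result.append(suffixed_code(code, seen[code]))

print(result)
# ['01012', '01012A', '01012B', '01012C', '01013', '01013A', '01014']
```

The formula only yields letters up to Rn = 27 ('Z'); a code that repeats more often than that would need a different suffix scheme.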

SQL Server 2012+ Solution
Can be adapted to 2008 by replacing CONCAT with + and CHOOSE with CASE.
Data:
CREATE TABLE #tab(ID INT, Code VARCHAR(100));
INSERT INTO #tab
SELECT 1, '01012'
UNION ALL SELECT 2, '01012'
UNION ALL SELECT 3, '01012'
UNION ALL SELECT 4, '01012'
UNION ALL SELECT 5, '01013'
UNION ALL SELECT 6, '01013'
UNION ALL SELECT 7, '01014';
Query:
WITH cte AS
(
SELECT ID, Code,
[rn] = ROW_NUMBER() OVER(PARTITION BY Code ORDER BY id)
FROM #tab
)
SELECT
ID,
Code = CONCAT(Code, CHOOSE(rn, '', 'A', 'B', 'C', 'D', 'E', 'F')) -- next letters
FROM cte;

select
case
when rownum > 1 then code + char(65+rownum-2)
else code
end as code,
id
from (
select *,
ROW_NUMBER() over( partition by code order by id) as rownum
from #tab
)c

delete duplicates records leaving unique in group with priority

I have a table that is generated by a procedure that I cannot modify, and it returns data like this:
USER_ID ACTIVE_STREET STREET
----------- ----------- -----------------
1 1 STREET1
1 0 STREET1
1 0 OTHER STREET
2 0 OTHER USER STREET
2 0 OTHER USER STREET
2 0 OTHER USER STREET
2 1 OTHER USER STREET
I need to remove records from this table following these rules:
Every user has only one active street.
I must delete duplicates, but only those that have ACTIVE_STREET set to 0.
So I'd like to keep only these records:
USER_ID ACTIVE_STREET STREET
----------- ----------- -----------------
1 1 STREET1
1 0 OTHER STREET
2 1 OTHER USER STREET
I've tried grouping, but there is no id column, so I can't get ids to delete.
How can I delete those duplicates without altering the original table structure?
EDIT - based on Gordon's answer
This is really close, but there is a little difference:
IF OBJECT_ID( 'tempdb..#MY_TMP' ) IS NOT NULL
BEGIN
DROP TABLE #MY_TMP;
END;
SELECT * INTO #MY_TMP
FROM(
SELECT 1 AS USER_ID,
1 AS ACTIVE_STREET,
'STREET1' AS STREET
UNION ALL
SELECT 2 AS USER_ID,
1 AS active,
'OTHER USER STREET' AS STREET
UNION ALL
SELECT 1 AS USER_ID,
0 AS active,
'STREET1' AS STREET
UNION ALL
SELECT 1 AS USER_ID,
0 AS active,
'OTHER STREET' AS STREET
UNION ALL
SELECT 2 AS USER_ID,
0 AS active,
'OTHER USER STREET' AS STREET
UNION ALL
SELECT 2 AS USER_ID,
0 AS active,
'OTHER USER STREET 2' AS STREET ) X;
SELECT *
FROM #MY_TMP ORDER BY USER_ID, ACTIVE_STREET desc;
SELECT * FROM (
select USER_ID, MAX(ACTIVE_STREET) AS a, STREET
from #MY_TMP
group by USER_ID, STREET ) X ORDER BY USER_ID, a desc
;with todelete as (
select row_number() over (partition by user_id, ACTIVE_STREET
order by street) as seqnum
from #MY_TMP t
)
delete todelete
where seqnum > 1;
SELECT *
FROM #MY_TMP ORDER BY USER_ID, ACTIVE_STREET desc;

Does this do what you want?
select user_id, active_street, min(street) as street
from atable t
group by user_id, active_street;
It returns the results that you specify.
If you actually want to delete rows from the table, you can use row_number():
with todelete as (
select t.*, row_number() over (partition by user_id, active_street
order by street) as seqnum
from atable t
)
delete todelete
where seqnum > 1;
EDIT:
Ooops, I think I misunderstood the logic. You want to delete all streets that are the same as the active street with the flag = 0. If so, this is the query:
delete t from my_tmp t
where active_street = 0 and
exists (select 1
from my_tmp t2
where t2.user_id = t.user_id and
t2.street = t.street and
t2.active_street = 1
);
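
A minimal sketch of the DELETE ... WHERE EXISTS query on the sample data from the question, run on SQLite through Python's sqlite3 module rather than SQL Server (an assumption made so it is self-contained). The outer table is referenced by name instead of through an alias, which is a syntax adjustment for SQLite and not part of the original answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_tmp (user_id INTEGER, active_street INTEGER, street TEXT)")
conn.executemany(
    "INSERT INTO my_tmp VALUES (?, ?, ?)",
    [
        (1, 1, "STREET1"),
        (1, 0, "STREET1"),
        (1, 0, "OTHER STREET"),
        (2, 0, "OTHER USER STREET"),
        (2, 0, "OTHER USER STREET"),
        (2, 0, "OTHER USER STREET"),
        (2, 1, "OTHER USER STREET"),
    ],
)

# Delete inactive rows that duplicate the same user's active street.
conn.execute("""
    DELETE FROM my_tmp
    WHERE active_street = 0
      AND EXISTS (
          SELECT 1
          FROM my_tmp t2
          WHERE t2.user_id = my_tmp.user_id
            AND t2.street = my_tmp.street
            AND t2.active_street = 1
      )
""")

remaining = conn.execute(
    "SELECT user_id, active_street, street FROM my_tmp "
    "ORDER BY user_id, active_street DESC"
).fetchall()
print(remaining)
# [(1, 1, 'STREET1'), (1, 0, 'OTHER STREET'), (2, 1, 'OTHER USER STREET')]
```

Note that this only removes inactive rows that have an active twin. Exact duplicates among inactive streets with no active counterpart would remain and would need the row_number() approach as well.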

Create a temporary table. Move data to temp table, using GROUP BY:
insert into temptable
select USER_ID, MAX(ACTIVE_STREET), STREET
from tablename
group by USER_ID, STREET
When done, delete from original table and copy from temptable to it.

Maybe one of these variants will work for your task:
-- Create table with sample data
IF OBJECT_ID('tempdb..#MY_TMP') IS NOT NULL
DROP TABLE #MY_TMP
;
SELECT * INTO #MY_TMP
FROM (
VALUES ( 1, 1, 'STREET1' )
, ( 1, 0, 'STREET1' )
, ( 1, 0, 'OTHER STREET' )
, ( 2, 0, 'OTHER USER STREET' )
, ( 2, 0, 'OTHER USER STREET' )
, ( 2, 0, 'OTHER USER STREET' )
, ( 2, 1, 'OTHER USER STREET' )
) T([USER_ID], [ACTIVE_STREET], [STREET]);
Variant using a temp table:
1 - fill a temp table with the required results;
2 - truncate the source table;
3 - insert the data from the temp table back into the source table:
IF OBJECT_ID('tempdb..#ToBeInserted') IS NOT NULL
DROP TABLE #ToBeInserted
SELECT [USER_ID]
, [ACTIVE_STREET]
, [STREET]
INTO #ToBeInserted
FROM (SELECT *, RN = ROW_NUMBER() OVER (PARTITION BY [USER_ID], [STREET]
ORDER BY [STREET],[ACTIVE_STREET] DESC)
FROM #MY_TMP) AS T
WHERE RN = 1
TRUNCATE TABLE #MY_TMP
INSERT INTO #MY_TMP ( [USER_ID], [ACTIVE_STREET], [STREET] )
SELECT [USER_ID]
, [ACTIVE_STREET]
, [STREET]
FROM #ToBeInserted
Variant using a CTE:
WITH CTE
AS
(SELECT *, RN = ROW_NUMBER() OVER (PARTITION BY [USER_ID],[STREET]
ORDER BY [STREET],[ACTIVE_STREET] DESC)
FROM #MY_TMP)
DELETE CTE
WHERE RN > 1;

t-SQL Use row number, but on duplicate rows, use the same number

I have some data and want to number each row sequentially, except that consecutive rows with the same type should share a number; numbering continues when the type changes.
There will only be types 5 and 6, and ID is actually more complex than abc123. I've tried rank, but I seem to get two different row counts: in the example, instead of 1 2 2 3 4 it would be 1 1 2 2.
MS SQL 2008 R2

As far as I understand, you want to number your contiguous groups:
create table #Temp (id1 bigint identity(1, 1), ID nvarchar(128), Date date, Type int)
insert into #Temp
select 'abc123', '20130101', 5 union all
select 'abc124', '20130102', 6 union all
select 'abc125', '20130103', 6 union all
select 'abc126', '20130104', 5 union all
select 'abc127', '20130105', 6 union all
select 'abc128', '20130106', 6 union all
select 'abc129', '20130107', 6 union all
select 'abc130', '20130108', 6 union all
select 'abc131', '20130109', 5
;with cte1 as (
select
*,
row_number() over (order by T.Date) - row_number() over (order by T.Type, T.Date) as grp
from #Temp as T
), cte2 as (
select *, min(Date) over (partition by Type, grp) as grp2
from cte1
)
select
T.ID, T.Date, T.Type,
dense_rank() over (order by grp2)
from cte2 as T
order by id1
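
The difference-of-row-numbers trick can be tried on the same sample rows with SQLite through Python's sqlite3 module (an assumption made so the sketch is self-contained; it needs SQLite 3.25 or later). The names temp_rows, grp_start and run_no are hypothetical. This sketch partitions by both type and grp, a choice made here so that runs of different types can never share a group even when their grp values coincide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE temp_rows (id TEXT, date TEXT, type INTEGER)")
conn.executemany(
    "INSERT INTO temp_rows VALUES (?, ?, ?)",
    [
        ("abc123", "2013-01-01", 5),
        ("abc124", "2013-01-02", 6),
        ("abc125", "2013-01-03", 6),
        ("abc126", "2013-01-04", 5),
        ("abc127", "2013-01-05", 6),
        ("abc128", "2013-01-06", 6),
        ("abc129", "2013-01-07", 6),
        ("abc130", "2013-01-08", 6),
        ("abc131", "2013-01-09", 5),
    ],
)

rows = conn.execute("""
    WITH cte1 AS (
        -- Within one type, grp stays constant while the rows are consecutive.
        SELECT *,
               ROW_NUMBER() OVER (ORDER BY date)
             - ROW_NUMBER() OVER (ORDER BY type, date) AS grp
        FROM temp_rows
    ), cte2 AS (
        -- Label each run by its first date so the runs can be ranked in order.
        SELECT *, MIN(date) OVER (PARTITION BY type, grp) AS grp_start
        FROM cte1
    )
    SELECT id, type, DENSE_RANK() OVER (ORDER BY grp_start) AS run_no
    FROM cte2
    ORDER BY date
""").fetchall()

run_numbers = [run_no for _, _, run_no in rows]
print(run_numbers)  # [1, 2, 2, 3, 4, 4, 4, 4, 5]
```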

How to select top x from with params?

I'm using SQL Server 2005.
I have a Users table with userID and gender. I want to select the top 1000 males (0) and the top 1000 females (1), ordered by userID desc.
If I create a union, only one result set is ordered by userID desc. Is there another way to do this?
SELECT top 1000 *
FROM Users
where gender=0
union
SELECT top 1000 *
FROM Users
where gender=1
order by userID desc

Another way of doing it
WITH TopUsers AS
(
SELECT UserId,
Gender,
ROW_NUMBER() OVER (PARTITION BY Gender ORDER BY UserId DESC) AS RN
FROM Users
WHERE Gender IN (0,1) /*I guess this line might well not be needed*/
)
SELECT UserId, Gender
FROM TopUsers
WHERE RN <= 1000
ORDER BY UserId DESC
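
The ROW_NUMBER approach above can be tried on a small table with SQLite through Python's sqlite3 module (an assumption made so the sketch is self-contained; it needs SQLite 3.25 or later). The six sample users and the TOP_N constant are hypothetical, with the limit reduced from 1000 to 2 so the effect is visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Users (UserId INTEGER PRIMARY KEY, Gender INTEGER)")
conn.executemany(
    "INSERT INTO Users VALUES (?, ?)",
    [(1, 0), (2, 0), (3, 0), (4, 1), (5, 1), (6, 1)],
)

TOP_N = 2  # stands in for 1000 in the original query

top_users = conn.execute("""
    WITH TopUsers AS (
        -- Number the users separately within each gender, newest first.
        SELECT UserId,
               Gender,
               ROW_NUMBER() OVER (PARTITION BY Gender ORDER BY UserId DESC) AS RN
        FROM Users
    )
    SELECT UserId, Gender
    FROM TopUsers
    WHERE RN <= ?
    ORDER BY UserId DESC
""", (TOP_N,)).fetchall()

print(top_users)  # [(6, 1), (5, 1), (3, 0), (2, 0)]
```

Because the numbering restarts for each gender, a single pass over the table yields the top rows of both groups without a union.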

Martin Smith's solution is better than the following.
SELECT UserID, Gender
FROM
(SELECT TOP 1000 UserId, Gender
FROM Users
WHERE gender = 0
ORDER BY UserId DESC) m
UNION ALL
SELECT UserID, Gender
FROM
(SELECT TOP 1000 UserId, Gender
FROM Users
WHERE gender = 1
ORDER BY UserId DESC) f
ORDER BY Gender, UserID DESC
This does what you want, just change the order by if you'd rather have the latest user first, but it will get you the top 1000 of each.

I did some testing, and the results are pretty strange. If you specify an order by in both parts of a union, SQL Server gives a syntax error:
select top 2 * from #users where gender = 0 order by id
union all
select top 2 * from #users where gender = 1 order by id
That makes sense, because the order by should only be at the end of the union. But if you use the same construct in a subquery, it compiles! And works as expected:
select * from (
select top 2 * from #users where gender = 0 order by id
union all
select top 2 * from #users where gender = 1 order by id
) sub
The strangest thing happens when you specify only one order by for the subquery union:
select * from (
select top 2 * from #users where gender = 0
union all
select top 2 * from #users where gender = 1 order by id
) sub
Now it orders the first half of the union at random, but the second half by id. That's pretty unexpected. The same thing happens with the order by in the first half:
select * from (
select top 2 * from #users where gender = 0 order by id desc
union all
select top 2 * from #users where gender = 1
) sub
I'd expect this to give a syntax error, but instead it orders the first half of the union. So it looks like union interacts with order by in a different way when the union is part of a subquery.
Like Chris Diver originally posted, a good way to get out of the confusion is not to rely on the order by in a union, and specify everything explicitly:
select *
from (
select *
from (
select top 2 *
from #users
where gender = 0
order by
id desc
) males
union all
select *
from (
select top 2 *
from #users
where gender = 1
order by
id desc
) females
) males_and_females
order by
id
Example data:
create table #users (id int identity, name varchar(50), gender bit)
insert into #users (name, gender)
select 'Joe', 0
union all select 'Alex', 0
union all select 'Fred', 0
union all select 'Catherine', 1
union all select 'Diana', 1
union all select 'Esther', 1

You need to ensure that you create a sub-select for the union, then do the ordering outside over the combined results.
Something like this should work:
SELECT u.*
FROM (SELECT u1a.* FROM (SELECT TOP 1000 u1.*
FROM USERS u1
WHERE u1.gender = 0
ORDER BY u1.userid DESC) u1a
UNION ALL
SELECT u2a.* FROM (SELECT TOP 1000 u2.*
FROM USERS u2
WHERE u2.gender = 1
ORDER BY u2.userid DESC) u2a
) u
ORDER BY u.userid DESC
Also, using a UNION ALL will give better performance as the db won't bother checking for duplicates (which there won't be in this query) in the results.
