SQL Server: combine multiple data sets without duplicate data

Given three tables Ta, Tb, Tc:
Ta(ID, Field1)
Tb(ID, Field2)
Tc(ID, Field3)
Example data:
Ta
ID Field1
---------
1 A
1 B
Tb
ID Field2
---------
1 C
1 D
2 E
Tc
ID Field3
---------
1 F
2 G
2 H
Question:
How can I join this data to return:
ID Field1 Field2 Field3
-----------------------
1 A C F
1 B D NULL
2 NULL E G
2 NULL NULL H
I thought I could achieve this with outer joins but that doesn't seem to be the case. The order of the groupings doesn't really matter, as long as I bring back all information without duplicate rows.
Just to clarify: I don't really mind which combination I get, as long as the result set returns all the data in the minimum number of rows. Here's a more realistic example of what I am trying to do:
Given a person, call him John. He has two phone numbers and three email addresses:
PID Email
---------
John john@test.com
John john@mail.com
John john@john.com
PID Tel
--------
John 011
John 022
I want to return:
PID Email Tel
----------------------
John john@test.com 011
John john@mail.com 022
John john@john.com NULL

You can come close with the following:
select coalesce(ta.id, tb.id, tc.id) as id, ta.field1, tb.field2, tc.field3
from (select ta.*, row_number() over (partition by id order by (select NULL)) as seqnum
      from ta
     ) ta full outer join
     (select tb.*, row_number() over (partition by id order by (select NULL)) as seqnum
      from tb
     ) tb
     on ta.id = tb.id and
        ta.seqnum = tb.seqnum full outer join
     (select tc.*, row_number() over (partition by id order by (select NULL)) as seqnum
      from tc
     ) tc
     on coalesce(ta.id, tb.id) = tc.id and
        coalesce(ta.seqnum, tb.seqnum) = tc.seqnum
-- each (id, seqnum) pair occurs at most once per table, so no GROUP BY is needed
order by coalesce(ta.id, tb.id, tc.id),
         coalesce(ta.seqnum, tb.seqnum, tc.seqnum)
As I said in my comment, though, rows in a table have no guaranteed order, so these may not come out in the order you expect. With your sample data, you could use:
over (partition by id order by field<n>)
if the field values themselves define the ordering.

Here's an alternative using CTEs and UNION ALL, with MIN to collapse the NULLs. It doesn't guarantee the original row order, but you said you don't mind as long as all the IDs are present.
WITH TaRanked AS
(
SELECT ROW_NUMBER() OVER (PARTITION BY ID ORDER BY Field1) as Rnk, ID, Field1
FROM Ta
),
TbRanked AS
(
SELECT ROW_NUMBER() OVER (PARTITION BY ID ORDER BY Field2) as Rnk, ID, Field2
FROM Tb
),
TcRanked AS
(
SELECT ROW_NUMBER() OVER (PARTITION BY ID ORDER BY Field3) as Rnk, ID, Field3
FROM Tc
),
TUnion AS
(
SELECT Rnk, ID, Field1, NULL AS Field2, NULL AS Field3
FROM TaRanked
UNION ALL
SELECT Rnk, ID, NULL, Field2, NULL
FROM TbRanked
UNION ALL
SELECT Rnk, ID, NULL, NULL, Field3
FROM TcRanked
)
SELECT ID, MIN(Field1), MIN(Field2), MIN(Field3)
FROM TUnion
GROUP BY ID, Rnk
ORDER BY ID, Rnk
The result is
1 A C F
1 B D (null)
2 (null) E G
2 (null) (null) H
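Applied to the two-table John example from the question, the same UNION ALL plus GROUP BY idea might look like the sketch below. The table names Emails and Phones are hypothetical, since the question only shows the data, and ranking each list by its own values is an arbitrary choice that decides which email ends up next to which phone. It assumes an engine with ROW_NUMBER (SQL Server 2008 or later for the multi-row INSERT).

```sql
-- Hypothetical table names: the question shows the data but does not name the tables.
CREATE TABLE Emails (PID VARCHAR(10), Email VARCHAR(50));
CREATE TABLE Phones (PID VARCHAR(10), Tel VARCHAR(10));

INSERT INTO Emails (PID, Email) VALUES
    ('John', 'john@test.com'),
    ('John', 'john@mail.com'),
    ('John', 'john@john.com');
INSERT INTO Phones (PID, Tel) VALUES
    ('John', '011'),
    ('John', '022');

WITH Ranked AS
(
    -- Number each person's emails and phones independently: 1, 2, 3, ...
    SELECT PID,
           ROW_NUMBER() OVER (PARTITION BY PID ORDER BY Email) AS Rnk,
           Email,
           CAST(NULL AS VARCHAR(10)) AS Tel
    FROM Emails
    UNION ALL
    SELECT PID,
           ROW_NUMBER() OVER (PARTITION BY PID ORDER BY Tel),
           NULL,
           Tel
    FROM Phones
)
-- One output row per (PID, Rnk); MAX ignores the NULL placeholders.
SELECT PID, MAX(Email) AS Email, MAX(Tel) AS Tel
FROM Ranked
GROUP BY PID, Rnk
ORDER BY PID, Rnk;
```

With this ranking the emails sort alphabetically, so john@john.com is paired with 011, john@mail.com with 022, and john@test.com with NULL. That is a different pairing from the one in the question, which you said is acceptable; the row count (three) is the minimum possible.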

Related

Select the duplicate rows with specific values

How can I get only the rows that have the same ID but a different Name?
Here is an example to explain what I mean:
ID Name Date
123 Amy 08/03/2022
123 Amy 12/03/2022
456 Billy 08/03/2022
456 Cat 09/03/2022
789 Peter 10/03/2022
Expected Output:
ID Name Date
456 Billy 08/03/2022
456 Cat 09/03/2022
What I have tried:
select ID, Name, count(*)
from YourTable
group by ID, Name
having count(*) > 1
But the result includes the following rows, which I do not want:
ID Name Date
123 Amy 08/03/2022
123 Amy 12/03/2022
One approach would be to use a subquery to identify IDs that have multiple names.
SELECT *
FROM YourTable
WHERE ID IN (SELECT ID FROM YourTable GROUP BY ID HAVING COUNT(DISTINCT Name) > 1)
I'd join the table to itself like this:
SELECT DISTINCT
a.Id as ID_A,
b.Id as ID_B,
a.[Name] as Name_A
FROM
Test as a
INNER JOIN Test as b
ON A.Id = B.Id
WHERE
A.[Name] <> B.[Name]
Do you want
SELECT * FROM table_name
WHERE ID = 456;
or
SELECT * FROM table_name
WHERE ID IN
(SELECT
ID
FROM table_name
GROUP BY ID
HAVING COUNT(DISTINCT name) > 1
);
?
Window functions are likely to be the most efficient here. They do not require self-joining of the source table.
Unfortunately, SQL Server does not support COUNT(DISTINCT ...) as a window function, but we can simulate it with DENSE_RANK and MAX: the highest dense rank within an ID equals the number of distinct names for that ID.
WITH DistinctRanks AS (
SELECT *,
rnk = DENSE_RANK() OVER (PARTITION BY ID ORDER BY Name)
FROM YourTable
),
MaxRanks AS (
SELECT *,
mr = MAX(rnk) OVER (PARTITION BY ID)
FROM DistinctRanks
)
SELECT
ID,
Name,
[Date]
FROM MaxRanks t
WHERE t.mr > 1;
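To check the COUNT(DISTINCT Name) subquery answer against the question's data, here is a self-contained sketch. It assumes the table is called YourTable (as in that answer) and keeps Date as text exactly as shown, because the question doesn't say whether 08/03/2022 is day-first or month-first.

```sql
-- YourTable is a placeholder name; Date is stored as text to avoid guessing its format.
CREATE TABLE YourTable (ID INT, Name VARCHAR(20), [Date] VARCHAR(10));

INSERT INTO YourTable (ID, Name, [Date]) VALUES
    (123, 'Amy',   '08/03/2022'),
    (123, 'Amy',   '12/03/2022'),
    (456, 'Billy', '08/03/2022'),
    (456, 'Cat',   '09/03/2022'),
    (789, 'Peter', '10/03/2022');

-- Keep every row whose ID occurs with at least two different names.
SELECT ID, Name, [Date]
FROM YourTable
WHERE ID IN (SELECT ID
             FROM YourTable
             GROUP BY ID
             HAVING COUNT(DISTINCT Name) > 1)
ORDER BY ID, Name;
```

This should return only the two rows for ID 456. HAVING MIN(Name) <> MAX(Name) is an equivalent test (both ignore NULL names) if you would rather avoid COUNT(DISTINCT).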

SQL: select the top row when column values are the same

If I have a table like this:
Id StateId Name
1 1 a
2 2 b
3 1 c
4 1 d
5 3 e
6 2 f
I want to select like below:
Id StateId Name
4 1 d
5 3 e
6 2 f
For example, Ids 1, 3 and 4 have StateId 1, so select the row with the max Id, i.e. 4.
; WITH CTE AS
(
SELECT *, ROW_NUMBER() OVER(PARTITION BY STATEID ORDER BY ID DESC) AS RN
FROM YourTable
)SELECT ID, STATEID, NAME FROM CTE WHERE RN = 1
You can use ROW_NUMBER() + TOP 1 WITH TIES:
SELECT TOP 1 WITH TIES
Id,
StateId,
[Name]
FROM YourTable
ORDER BY ROW_NUMBER() OVER (PARTITION BY StateId ORDER BY Id DESC)
Output:
Id StateId Name
4 1 d
6 2 f
5 3 e
Disclaimer: I gave this answer before the OP had specified an actual database, and hence avoided using window functions. For a possibly more appropriate answer, see the reply by @Tanjim above.
Here is an option using joins which should work across most RDBMS.
SELECT t1.*
FROM yourTable t1
INNER JOIN
(
SELECT StateId, MAX(Id) AS Id
FROM yourTable
GROUP BY StateId
) t2
ON t1.StateId = t2.StateId AND
t1.Id = t2.Id
The following uses a correlated subquery to find the maximum Id for each state. The WHERE clause then includes only the rows whose Id comes from that subquery.
SELECT
[Id], [StateID], [Name]
FROM
TABLENAME S1
WHERE
Id IN (SELECT MAX(Id) FROM TABLENAME S2 WHERE S2.StateID = S1.StateID)
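A self-contained sketch of the ROW_NUMBER approach using the sample data from the question; YourTable is a placeholder name. Ordering by Id DESC inside each StateId partition puts the highest Id first, so RN = 1 picks it.

```sql
-- YourTable is a placeholder name; the rows are the question's sample data.
CREATE TABLE YourTable (Id INT, StateId INT, Name VARCHAR(10));

INSERT INTO YourTable (Id, StateId, Name) VALUES
    (1, 1, 'a'), (2, 2, 'b'), (3, 1, 'c'),
    (4, 1, 'd'), (5, 3, 'e'), (6, 2, 'f');

WITH CTE AS
(
    -- RN = 1 marks the row with the highest Id within each StateId
    SELECT Id, StateId, Name,
           ROW_NUMBER() OVER (PARTITION BY StateId ORDER BY Id DESC) AS RN
    FROM YourTable
)
SELECT Id, StateId, Name
FROM CTE
WHERE RN = 1
ORDER BY Id;
```

This should return Ids 4, 5 and 6. Unlike the MAX(Id) join, this form extends to "top N per group" by changing the filter to RN <= N.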

left join without duplicate values using MIN()

I have a table_1:
id custno
1 1
2 2
3 3
and a table_2:
id custno qty descr
1 1 10 a
2 1 7 b
3 2 4 c
4 3 7 d
5 1 5 e
6 1 5 f
When I run this query to show the minimum order quantities from every customer:
SELECT DISTINCT table_1.custno,table_2.qty,table_2.descr
FROM table_1
LEFT OUTER JOIN table_2
ON table_1.custno = table_2.custno AND qty = (SELECT MIN(qty) FROM table_2
WHERE table_2.custno = table_1.custno )
Then I get this result:
custno qty descr
1 5 e
1 5 f
2 4 c
3 7 d
Customer 1 appears twice, each time with the same minimum qty (and a different description), but I only want to see customer 1 appear once. I don't care whether that is the record with 'e' or 'f' as the description.
First of all... I'm not sure why you need to include table_1 in the queries to begin with:
select custno, min(qty) as min_qty
from table_2
group by custno;
But just in case there is other information that you need that wasn't included in the question:
select table_1.custno, coalesce(min(qty), 0) as min_qty
from table_1
left outer join table_2
on table_1.custno = table_2.custno
group by table_1.custno;
"Generic" SQL way:
SELECT table_1.custno,table_2.qty,table_2.descr
FROM table_1, table_2
WHERE table_2.id = (SELECT TOP 1 id
FROM table_2
WHERE custno = table_1.custno
ORDER BY qty )
SQL Server 2008 way (probably faster):
SELECT custno, qty, descr
FROM
(SELECT
custno,
qty,
descr,
ROW_NUMBER() OVER (PARTITION BY custno ORDER BY qty) RowNum
FROM table_2
) A
WHERE RowNum = 1
If you use SQL Server, you could use ROW_NUMBER and a CTE:
WITH CTE AS
(
SELECT table_1.custno,table_2.qty,table_2.descr,
RN = ROW_NUMBER() OVER ( PARTITION BY table_1.custno
Order By table_2.qty ASC)
FROM table_1
LEFT OUTER JOIN table_2
ON table_1.custno = table_2.custno
)
SELECT custno, qty,descr
FROM CTE
WHERE RN = 1
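A self-contained sketch of the ROW_NUMBER/CTE answer using the question's data. The extra table_2.id in the ORDER BY is an addition that is not in the answers above: it is an arbitrary tie-breaker, so customer 1 consistently gets the 'e' row (lower id) instead of whichever of 'e' or 'f' the engine happens to return first.

```sql
CREATE TABLE table_1 (id INT, custno INT);
CREATE TABLE table_2 (id INT, custno INT, qty INT, descr VARCHAR(10));

INSERT INTO table_1 (id, custno) VALUES (1, 1), (2, 2), (3, 3);
INSERT INTO table_2 (id, custno, qty, descr) VALUES
    (1, 1, 10, 'a'), (2, 1, 7, 'b'), (3, 2, 4, 'c'),
    (4, 3, 7, 'd'), (5, 1, 5, 'e'), (6, 1, 5, 'f');

WITH Ranked AS
(
    SELECT table_1.custno, table_2.qty, table_2.descr,
           -- Lowest qty first; table_2.id breaks ties, so 'e' (id 5) beats 'f' (id 6)
           ROW_NUMBER() OVER (PARTITION BY table_1.custno
                              ORDER BY table_2.qty, table_2.id) AS RN
    FROM table_1
    LEFT OUTER JOIN table_2
        ON table_1.custno = table_2.custno
)
SELECT custno, qty, descr
FROM Ranked
WHERE RN = 1
ORDER BY custno;
```

This should return (1, 5, e), (2, 4, c) and (3, 7, d). Because the ranking runs over a LEFT OUTER JOIN, a customer in table_1 with no orders would still appear once, with NULL qty and descr.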

How do I select a row based on a priority value in another row?

I am using Oracle 11g, and I have a table with the following columns and values. I want to select the value of each column based on its priority column, and I only want one row for each ID.
ID NAME NAME_PRIORITY COLOR COLOR_PRIORITY
1 SAM 2 RED 1
1 SAM 2 GREEN 2
1 JOHN 1 BLUE 3
2 MARY 2 ORANGE 1
3 JON 2 RED 2
3 PETE 3 GREEN 1
Desired Results
ID NAME NAME_PRIORITY COLOR COLOR_PRIORITY
1 JOHN 1 RED 1
2 MARY 2 ORANGE 1
3 JON 2 GREEN 1
How do I select the NAME and COLOR with the lowest priority number and return only one row for each ID?
One option is:
select d.id, min(name) keep (dense_rank first order by name_priority) name,
min(name_priority) name_priority,
min(color) keep (dense_rank first order by color_priority) color,
min(color_priority) color_priority
from yourtab d
group by id;
You can use row_number() on both the name_priority and color_priority to get the result:
select n.id,
name,
name_priority,
color,
color_priority
from
(
select id,
name,
name_priority,
row_number() over(partition by id order by name_priority) name_row
from yourtable
) n
inner join
(
select id,
color,
color_priority,
row_number() over(partition by id order by color_priority) color_row
from yourtable
) c
on n.id = c.id
and n.name_row = c.color_row
where n.name_row = 1
and c.color_row = 1
Once you have the row_number() for each priority, join the two results on the id and the row number, and return only the rows where the row number equals 1.
This query uses a common table expression and ROW_NUMBER():
WITH nameList
AS
(
SELECT ID, Name,
ROW_NUMBER() OVER (PARTITION BY ID
ORDER BY NAME_PRIORITY) rn
FROM TableName
),
colorList
AS
(
SELECT a.ID, a.Name,
b.Color, b.COLOR_PRIORITY,
ROW_NUMBER() OVER (PARTITION BY a.ID
ORDER BY COLOR_PRIORITY) rnB
FROM nameList a
INNER JOIN tableName b
ON a.ID = b.ID AND a.rn = 1
)
SELECT ID, Name, Color, COLOR_PRIORITY
FROM colorList
WHERE rnB = 1
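If you want something that doesn't depend on Oracle's KEEP (DENSE_RANK FIRST ...) syntax, one portable variation is to compute both row numbers in a single pass and pick the winning values with conditional aggregation. This is a sketch, not one of the answers above; yourtable is a placeholder name, ties on a priority are broken arbitrarily, and single-row INSERTs are used because Oracle 11g has no multi-row VALUES.

```sql
-- yourtable is a placeholder name; the rows are the question's sample data.
CREATE TABLE yourtable (
    ID INT, NAME VARCHAR(10), NAME_PRIORITY INT,
    COLOR VARCHAR(10), COLOR_PRIORITY INT
);

INSERT INTO yourtable VALUES (1, 'SAM',  2, 'RED',    1);
INSERT INTO yourtable VALUES (1, 'SAM',  2, 'GREEN',  2);
INSERT INTO yourtable VALUES (1, 'JOHN', 1, 'BLUE',   3);
INSERT INTO yourtable VALUES (2, 'MARY', 2, 'ORANGE', 1);
INSERT INTO yourtable VALUES (3, 'JON',  2, 'RED',    2);
INSERT INTO yourtable VALUES (3, 'PETE', 3, 'GREEN',  1);

WITH ranked AS
(
    -- Rank each row twice per ID: once by name priority, once by color priority
    SELECT ID, NAME, NAME_PRIORITY, COLOR, COLOR_PRIORITY,
           ROW_NUMBER() OVER (PARTITION BY ID ORDER BY NAME_PRIORITY)  AS name_rn,
           ROW_NUMBER() OVER (PARTITION BY ID ORDER BY COLOR_PRIORITY) AS color_rn
    FROM yourtable
)
-- Collapse to one row per ID, taking the name and color values from their rank-1 rows
SELECT ID,
       MAX(CASE WHEN name_rn = 1 THEN NAME END)            AS NAME,
       MAX(CASE WHEN name_rn = 1 THEN NAME_PRIORITY END)   AS NAME_PRIORITY,
       MAX(CASE WHEN color_rn = 1 THEN COLOR END)          AS COLOR,
       MAX(CASE WHEN color_rn = 1 THEN COLOR_PRIORITY END) AS COLOR_PRIORITY
FROM ranked
GROUP BY ID
ORDER BY ID;
```

This should return JOHN/RED for ID 1, MARY/ORANGE for ID 2 and JON/GREEN for ID 3, matching the desired results, while referencing the table only once.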

How do I remove duplicates in paging

table1 & table2 (image): http://aftabfarda.parsfile.com/1.png
SELECT *
FROM (SELECT DISTINCT dbo.tb1.ID, dbo.tb1.name, ROW_NUMBER() OVER (ORDER BY tb1.id DESC) AS row
FROM dbo.tb1 INNER JOIN
dbo.tb2 ON dbo.tb1.ID = dbo.tb2.id_tb1) AS a
WHERE row BETWEEN 1 AND 7
ORDER BY id DESC
Result (image): http://aftabfarda.parsfile.com/3.png
(id 11 is repeated 3 times)
How can I have this output:
ID name row
-- ------ ---
11 user11 1
10 user10 2
9 user9 3
8 user8 4
7 user7 5
6 user6 6
5 user5 7
You could apply distinct before row_number using a subquery:
select *
from (
    select row_number() over (order by sub_dist.ID desc) as row
         , *
    from (
        select distinct t1.ID
             , t1.name
        from dbo.tb1 as t1
        join dbo.tb2 as t2
          on t1.ID = t2.id_tb1
    ) as sub_dist
) as sub_with_rn
where row between 1 and 7
order by row
As an alternative to @Andomar's suggestion, you could use DENSE_RANK instead of ROW_NUMBER: rank the rows first (in the subquery), then apply DISTINCT (in the outer query):
SELECT DISTINCT
ID,
name,
row
FROM (
SELECT
t1.ID,
t1.name,
DENSE_RANK() OVER (ORDER BY t1.ID DESC) AS row
FROM dbo.tb1 t1
INNER JOIN dbo.tb2 t2 ON t1.ID = t2.id_tb1
) AS a
WHERE row BETWEEN 1 AND 7
ORDER BY ID DESC
Similar, but not quite the same; both might boil down to the same query plan, but I'm not sure. It's worth testing.
And, of course, you could also try a semi-join instead of a proper join, in the form of either IN or EXISTS, to prevent duplicates in the first place:
SELECT
ID,
name,
row
FROM (
SELECT
ID,
name,
ROW_NUMBER() OVER (ORDER BY ID DESC) AS row
FROM dbo.tb1
WHERE ID IN (SELECT id_tb1 FROM dbo.tb2)
/* Or:
WHERE EXISTS (
SELECT *
FROM dbo.tb2
WHERE id_tb1 = dbo.tb1.ID
)
*/
) AS a
WHERE row BETWEEN 1 AND 7
ORDER BY ID DESC
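The question's tables are only available as images, so the sketch below uses small hypothetical data (and omits the dbo schema) to show the EXISTS semi-join version. Because EXISTS never multiplies tb1 rows, ROW_NUMBER assigns each ID exactly one number and the page boundaries come out right.

```sql
-- Hypothetical sample data: the question's tables were only shown as images.
CREATE TABLE tb1 (ID INT, name VARCHAR(20));
CREATE TABLE tb2 (id INT, id_tb1 INT);

INSERT INTO tb1 (ID, name) VALUES
    (1, 'user1'), (2, 'user2'), (3, 'user3'), (4, 'user4'), (5, 'user5');
-- user5 is referenced three times, user3 not at all
INSERT INTO tb2 (id, id_tb1) VALUES
    (1, 5), (2, 5), (3, 5), (4, 4), (5, 2), (6, 1);

SELECT ID, name, rn
FROM (
    SELECT ID, name,
           -- Numbered after the semi-join, so each tb1 row gets exactly one number
           ROW_NUMBER() OVER (ORDER BY ID DESC) AS rn
    FROM tb1
    WHERE EXISTS (SELECT 1 FROM tb2 WHERE tb2.id_tb1 = tb1.ID)
) AS numbered
WHERE rn BETWEEN 1 AND 3   -- first page of 3 rows
ORDER BY rn;
```

With this data it should return user5, user4 and user2 numbered 1 to 3: user5 appears once even though tb2 references it three times, and user3 is skipped because tb2 never references it.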