Select duplicate records info

I have a person table:
Phone | Id1 | Id2 | Fname | Lname| Street
111111111 | A1 | 1000 | David | Luck | 123 Main Street
111111111 | A2 | 1001 | David | Luck | blank
111111111 | A3 | 1002 | David | Luck | blank
222222222 | B1 | 2000 | Smith | Nema | blank
333333333 | C1 | 3000 | Lanyn | Buck | 456 Street
I would like to have the result below:
Phone | Id1 | Id2 | Fname | Lname| Street
111111111 | A1 | 1000 | David | Luck | 123 Main Street
222222222 | B1 | 2000 | Smith | Nema | blank
333333333 | C1 | 3000 | Lanyn | Buck | 456 Street
Which SQL 2008 query should I use to keep one record per duplicated phone, preferring the record that has street info? Thanks

You want to choose a particular row. This is where the window function row_number() is most useful. The challenge is finding the right order by clause:
select p.Phone, p.Id1, p.Id2, p.Fname, p.Lname, p.Street
from (select p.*,
row_number() over (partition by phone
order by (case when street is not null then 0 else 1 end),
id2
) as seqnum
from person p
) p
where seqnum = 1
The function row_number() assigns a sequential number to rows with the same value of phone (based on the partition by clause). The row with a non-NULL street and the lowest id2 gets the value 1. If no row for that phone has a street, the row with the lowest id2 gets it. That is the row the outer filter keeps.
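To see the `row_number()` selection in action, here is a minimal sketch. It loads the question's sample rows into an in-memory SQLite database through Python's built-in `sqlite3` module and runs the same query. It assumes two things: the blank streets are stored as NULL, and the linked SQLite is 3.25 or newer, which is the version that added window functions. The outer `ORDER BY p.Phone` is added only to make the output order stable.

```python
import sqlite3

# Sample rows from the question; blank streets are assumed to be NULL.
ROWS = [
    ("111111111", "A1", 1000, "David", "Luck", "123 Main Street"),
    ("111111111", "A2", 1001, "David", "Luck", None),
    ("111111111", "A3", 1002, "David", "Luck", None),
    ("222222222", "B1", 2000, "Smith", "Nema", None),
    ("333333333", "C1", 3000, "Lanyn", "Buck", "456 Street"),
]

# Rows with a street sort first within each phone; Id2 breaks ties.
QUERY = """
SELECT p.Phone, p.Id1, p.Id2, p.Fname, p.Lname, p.Street
FROM (SELECT p.*,
             ROW_NUMBER() OVER (PARTITION BY Phone
                                ORDER BY CASE WHEN Street IS NOT NULL THEN 0 ELSE 1 END,
                                         Id2) AS seqnum
      FROM person p) p
WHERE seqnum = 1
ORDER BY p.Phone
"""


def pick_one_per_phone(rows):
    """Return one row per phone, preferring a row that has a street."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE person (Phone TEXT, Id1 TEXT, Id2 INTEGER, "
                 "Fname TEXT, Lname TEXT, Street TEXT)")
    conn.executemany("INSERT INTO person VALUES (?, ?, ?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


result = pick_one_per_phone(ROWS)
for row in result:
    print(row)
```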

If your street is blank (an empty string '') when not populated with an actual address, you can use this to get your results:
SELECT a.*
FROM Person a
JOIN (SELECT Phone, MAX(Street)'Street'
FROM Person
GROUP BY Phone
)b
ON a.Phone = b.Phone
AND a.Street = b.Street
Demo: SQL Fiddle
If your street was literally the string 'Blank' then the above would not return the desired results.
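The `MAX(Street)` approach depends on how the blanks are stored. This sketch checks both cases with Python's `sqlite3` module and an in-memory database, with the table trimmed to `Phone`, `Id1` and `Street` for brevity.

`MAX` ignores NULLs, and an empty string sorts below any real address, so with `''` blanks each phone keeps its best row. With NULL blanks the result changes. Phone `222222222` then has a `MAX(Street)` of NULL, and `a.Street = b.Street` is never true for NULL, so that phone disappears from the result.

```python
import sqlite3

# The answer's MAX(Street) query, with an ORDER BY added for stable output.
QUERY = """
SELECT a.Phone, a.Id1, a.Street
FROM Person a
JOIN (SELECT Phone, MAX(Street) AS Street
      FROM Person
      GROUP BY Phone) b
  ON a.Phone = b.Phone
 AND a.Street = b.Street
ORDER BY a.Phone
"""


def run(blank):
    """Load the sample data, storing every blank street as `blank`, then run QUERY."""
    rows = [
        ("111111111", "A1", "123 Main Street"),
        ("111111111", "A2", blank),
        ("111111111", "A3", blank),
        ("222222222", "B1", blank),
        ("333333333", "C1", "456 Street"),
    ]
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Person (Phone TEXT, Id1 TEXT, Street TEXT)")
    conn.executemany("INSERT INTO Person VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


with_empty = run("")   # blanks stored as empty strings
with_null = run(None)  # blanks stored as NULL: phone 222222222 is lost
print(with_empty)
print(with_null)
```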

SELECT a.*
FROM person a
JOIN ( SELECT Id1,
ROW_NUMBER() OVER (PARTITION BY Phone
-- rows that have a street sort first
ORDER BY CASE WHEN Street IS NOT NULL THEN 0 ELSE 1 END) AS [Rank]
FROM Person
) b
ON a.Id1 = b.Id1 -- assumes Id1 uniquely identifies a row
WHERE b.[Rank] = 1

Try this
select a.* from Table1 a
inner join
(
select distinct Phone from Table1
group by Phone
) as b
on a.Phone= b.Phone

Related

SELECT multiple instances of a row where they might have the same value

I want to select some customer details and then the customer's address details. These address details also include previous addresses: some customers have up to 3 addresses (including previous ones), and some have only one.
I want to select the customer from the Customer DB, then the customer's address(es) from the Address DB, and load up to 3 addresses per customer onto the same row.
Customer DB:
| name | value | num |SecondName| Date |
| James | HEX124 | 1 | Carl | 11022020 |
| Jack | JEU836 | 4 | Smith | 19042020 |
| Mandy | GER234 | 33 | Jones | 09042020 |
Address DB
| Address | value | PostCode |
| 1 Smith street | HEX124 | LN18HB |
| 12 fellow garden | GER234 | LN18JL |
| 8 Long street | HEX124 | FF23F2 |
| 8 Big road | HEX124 | FWF4GW |
| 89 Kings avenue | GER234 | HH29DD |
| Roadhouse Cottage | JEU836 | FK28DD |
The 'value' column in the Customer DB is the customer's unique value. The same value is used in the Address DB to assign up to 3 addresses to that specific customer.
My SQL Below:
SELECT c.name, c.value, c.num, c.secondname, c.date,
a.address, a.value, a.postcode,
a2.address, a2.value, a2.postcode,
a3.address, a3.value, a3.postcode
FROM Customers c
INNER JOIN Address a ON a.value = c.value   -- first address
INNER JOIN Address a2 ON a2.value = c.value -- second address
INNER JOIN Address a3 ON a3.value = c.value -- third address
This returns as many rows for each customer as they have addresses, and it repeats the same address 3 times on each row:
Result:
| James | HEX124 | 1 | Carl | 11022020 | 1 Smith street | HEX124 | LN18HB | 1 Smith street | HEX124 | LN18HB | 1 Smith street | HEX124 | LN18HB |
| James | HEX124 | 1 | Carl | 11022020 | 8 Long street | HEX124 | FF23F2 | 8 Long street | HEX124 | FF23F2 | 8 Long street | HEX124 | FF23F2 |
As you can see, the results above just repeat the same address for the customer: James gets 3 rows, Mandy 2 rows, and Jack 1 row (1 row per address).
So I tried adding this in, hoping it would read a different address, or return a null value if they don't have 3 addresses:
WHERE a.address <> a2.address -- the addresses must not be the same, but this loads nothing when I run the SQL
AND a.address <> a3.address
AND a2.address <> a3.address
But this returns 0 results
Desired result:
| James | HEX124 | 1 | Carl | 11022020 | 1 Smith street | HEX124 | LN18HB | 8 Long street | HEX124 | FF23F2 | 8 Big road | HEX124 | FWF4GW |
| Jack | JEU836 | 4 | Smith | 19042020 | Roadhouse Cottage | JEU836 | FK28DD |
| Mandy | GER234 | 33 | Jones | 09042020 | 12 fellow garden | GER234 | LN18JL | 89 Kings avenue | GER234 | HH29DD |
As you can see in my desired result, I am loading each customer once, with ALL of their previous address(es) at the end of their row. Some customers will have 1, 2, or 3 addresses.

Seems like what you are after here is a pivot, but on the value of a ROW_NUMBER. I assume you have some kind of always-ascending key for the addresses, so you can determine the most recent ones. I also use a cross tab (conditional aggregation), as it is far less restrictive than the PIVOT operator:
WITH RNs AS(
SELECT c.[name],
c.[value],
c.num,
c.secondname,
c.[date],
a.address,
a.[value] AS addressvalue,
a.postcode,
ROW_NUMBER() OVER (PARTITION BY c.[value] ORDER BY a.idcolumn DESC) AS RN
FROM dbo.Customer c
JOIN dbo.Address a ON a.[value] = c.[value])
SELECT R.[name],
R.[value],
R.num,
R.secondname,
R.[date],
MAX(CASE R.RN WHEN 1 THEN R.address END) AS address1,
MAX(CASE R.RN WHEN 1 THEN R.addressvalue END) AS addressvalue1,
MAX(CASE R.RN WHEN 1 THEN R.postcode END) AS postcode1,
MAX(CASE R.RN WHEN 2 THEN R.address END) AS address2,
MAX(CASE R.RN WHEN 2 THEN R.addressvalue END) AS addressvalue2,
MAX(CASE R.RN WHEN 2 THEN R.postcode END) AS postcode2,
MAX(CASE R.RN WHEN 3 THEN R.address END) AS address3,
MAX(CASE R.RN WHEN 3 THEN R.addressvalue END) AS addressvalue3,
MAX(CASE R.RN WHEN 3 THEN R.postcode END) AS postcode3
FROM RNs R
GROUP BY R.[name],
R.[value],
R.num,
R.secondname,
R.[date];
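The ROW_NUMBER-plus-cross-tab idea can be tried out with Python's `sqlite3` module and an in-memory database. This is a sketch under several assumptions, not the answer's exact query:

- SQLite 3.25+ is available for window functions.
- The customer table is trimmed to `name` and `value`.
- Only the `address` column is pivoted; `value` and `postcode` would follow the same pattern.
- `Address` gets a hypothetical ascending key, `id`.

The sketch orders by `id` ascending so the addresses come out in the order shown in the desired result. The answer orders its key descending instead, which puts the most recent address first.

```python
import sqlite3

CUSTOMERS = [("James", "HEX124"), ("Jack", "JEU836"), ("Mandy", "GER234")]

# `id` is a hypothetical always-ascending key that fixes the address order.
ADDRESSES = [
    (1, "1 Smith street", "HEX124", "LN18HB"),
    (2, "12 fellow garden", "GER234", "LN18JL"),
    (3, "8 Long street", "HEX124", "FF23F2"),
    (4, "8 Big road", "HEX124", "FWF4GW"),
    (5, "89 Kings avenue", "GER234", "HH29DD"),
    (6, "Roadhouse Cottage", "JEU836", "FK28DD"),
]

# Number each customer's addresses, then spread numbers 1-3 across columns.
QUERY = """
WITH RNs AS (
    SELECT c.name, c.value, a.address,
           ROW_NUMBER() OVER (PARTITION BY c.value ORDER BY a.id) AS RN
    FROM Customer c
    JOIN Address a ON a.value = c.value
)
SELECT name, value,
       MAX(CASE RN WHEN 1 THEN address END) AS address1,
       MAX(CASE RN WHEN 2 THEN address END) AS address2,
       MAX(CASE RN WHEN 3 THEN address END) AS address3
FROM RNs
GROUP BY name, value
ORDER BY name
"""


def addresses_on_one_row():
    """Pivot up to three addresses per customer onto a single row."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Customer (name TEXT, value TEXT)")
    conn.execute("CREATE TABLE Address (id INTEGER, address TEXT, "
                 "value TEXT, postcode TEXT)")
    conn.executemany("INSERT INTO Customer VALUES (?, ?)", CUSTOMERS)
    conn.executemany("INSERT INTO Address VALUES (?, ?, ?, ?)", ADDRESSES)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


result = addresses_on_one_row()
for row in result:
    print(row)
```

Customers with fewer than three addresses get NULL (`None` in Python) in the unused slots. That is the "return a null value" behaviour the question asks for.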

You can use ROW_NUMBER(), e.g.:
with addresses (AddrNo, [Value], Address, PostCode) as
(
Select row_number() over (partition by [value] order by Address, PostCode),
[Value], Address, PostCode
from Address
)
SELECT c.name, c.value, c.num, c.secondname, c.date,
max(case when a.AddrNo=1 then a.address end) as Address1,
max(case when a.AddrNo=1 then a.[Value] end) as Value1,
max(case when a.AddrNo=1 then a.PostCode end) as PostCode1,
max(case when a.AddrNo=2 then a.address end) as Address2,
max(case when a.AddrNo=2 then a.[Value] end) as Value2,
max(case when a.AddrNo=2 then a.PostCode end) as PostCode2,
max(case when a.AddrNo=3 then a.address end) as Address3,
max(case when a.AddrNo=3 then a.[Value] end) as Value3,
max(case when a.AddrNo=3 then a.PostCode end) as PostCode3
FROM Customers c
INNER JOIN Addresses a ON a.[value] = c.[value]
group by c.name, c.value, c.num, c.secondname, c.date;
Note: This is only one of the ways to do it. There are other ways which might be easier.

How to handle duplicates created by LEFT JOIN

LEFT TABLE:
+------+---------+--------+
| Name | Surname | Salary |
+------+---------+--------+
| Foo | Bar | 100 |
| Foo | Kar | 300 |
| Fo | Ba | 35 |
+------+---------+--------+
RIGHT TABLE:
+------+-------+
| Name | Bonus |
+------+-------+
| Foo | 10 |
| Foo | 20 |
| Foo | 50 |
| Fo | 10 |
| Fo | 100 |
| F | 1000 |
+------+-------+
DESIRED OUTPUT:
+------+---------+--------+-------+
| Name | Surname | Salary | Bonus |
+------+---------+--------+-------+
| Foo | Bar | 100 | 80 |
| Foo | Kar | 300 | 0 |
| Fo | Ba | 35 | 110 |
+------+---------+--------+-------+
The closest I get is this:
SELECT
a.Name,
Surname,
sum(Salary),
sum(Bonus)
FROM (SELECT
Name,
Surname,
sum(Salary) as Salary
FROM input
GROUP BY 1,2) a LEFT JOIN (SELECT Name,
SUM(Bonus) as Bonus
FROM input2
GROUP BY 1) b
ON a.Name = b.Name
GROUP BY 1,2;
Which gives:
+------+---------+-------------+------------+
| Name | Surname | sum(Salary) | sum(Bonus) |
+------+---------+-------------+------------+
| Fo | Ba | 35 | 110 |
| Foo | Bar | 100 | 80 |
| Foo | Kar | 300 | 80 |
+------+---------+-------------+------------+
I can't figure out how to get rid of the Bonus duplication. The ideal solution for me is the one in 'DESIRED OUTPUT': the Bonus total goes to only one row per Name, and the other rows with the same Name get 0.

You can use row_number():
select l.*, (case when l.seqnum = 1 then r.bonus else 0 end) as bonus
from (select l.*, row_number() over (partition by name order by salary) as seqnum
from "left" l
) l left join
(select r.name, sum(bonus) as bonus
from "right" r
group by r.name
) r
on r.name = l.name
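Here is a small sketch that runs the `row_number()` answer against the question's sample data with Python's `sqlite3` module, assuming SQLite 3.25+ for window functions. It makes two additions to the answer:

- The outer query lists `name`, `surname` and `salary` explicitly instead of `l.*`, so the helper `seqnum` column is not returned.
- An `ORDER BY` fixes the output order.

```python
import sqlite3

# Only the first row per name (lowest salary) receives the summed bonus.
QUERY = """
SELECT l.name, l.surname, l.salary,
       CASE WHEN l.seqnum = 1 THEN r.bonus ELSE 0 END AS bonus
FROM (SELECT l.*,
             ROW_NUMBER() OVER (PARTITION BY name ORDER BY salary) AS seqnum
      FROM "left" l) l
LEFT JOIN (SELECT r.name, SUM(bonus) AS bonus
           FROM "right" r
           GROUP BY r.name) r
  ON r.name = l.name
ORDER BY l.name DESC, l.surname
"""


def spread_bonus():
    """Load the question's two tables and return the de-duplicated bonus rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute('CREATE TABLE "left" (name TEXT, surname TEXT, salary INTEGER)')
    conn.execute('CREATE TABLE "right" (name TEXT, bonus INTEGER)')
    conn.executemany('INSERT INTO "left" VALUES (?, ?, ?)',
                     [("Foo", "Bar", 100), ("Foo", "Kar", 300), ("Fo", "Ba", 35)])
    conn.executemany('INSERT INTO "right" VALUES (?, ?)',
                     [("Foo", 10), ("Foo", 20), ("Foo", 50),
                      ("Fo", 10), ("Fo", 100), ("F", 1000)])
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


result = spread_bonus()
for row in result:
    print(row)
```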

Try a ROW_NUMBER() partitioned by Name. This gives your duplicates different numbers. You can then use a CASE to return the bonus when this number is 1, and 0 otherwise. The code can look something like this:
SELECT
a.Name,
a.Surname,
a.Salary,
CASE WHEN a.Duplicate_Order = 1
THEN b.Bonus
ELSE 0
END AS Bonus
FROM (SELECT
Name,
Surname,
sum(Salary) AS Salary,
-- ordering by Surname makes the numbering deterministic
ROW_NUMBER() OVER (PARTITION BY Name ORDER BY Surname) AS Duplicate_Order
FROM input
GROUP BY 1, 2) a
LEFT JOIN (SELECT Name,
SUM(Bonus) AS Bonus
FROM input2
GROUP BY 1) b
ON a.Name = b.Name;
Hope that helps!

You can use a correlated subquery with sum() aggregation to compute the bonus column. Then apply the lag() window function so that every row after the first for the same name gets a zero bonus:
select Name, Surname, Salary,
bonus - lag(bonus::int,1,0) over (partition by name order by salary) as bonus
from
(
select i1.*,
( select sum(Bonus)
from input2 i2
where i1.Name = i2.Name
group by i2.Name ) as bonus
from input i1
) ii
order by name desc, surname;
Demo
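The `lag()` variant can be checked the same way. This sketch runs it on SQLite through Python's `sqlite3` module, with SQLite 3.25+ assumed. The PostgreSQL-only cast `bonus::int` is dropped because the bonus is already an integer here.

Subtracting the previous row's total yields zero only because every row of the same name carries the same bonus total.

```python
import sqlite3

# Each row's bonus minus the previous row's bonus for the same name (0 for the first).
QUERY = """
SELECT Name, Surname, Salary,
       bonus - LAG(bonus, 1, 0) OVER (PARTITION BY Name ORDER BY Salary) AS bonus
FROM (SELECT i1.*,
             (SELECT SUM(Bonus)
              FROM input2 i2
              WHERE i1.Name = i2.Name
              GROUP BY i2.Name) AS bonus
      FROM input i1) ii
ORDER BY Name DESC, Surname
"""


def lag_bonus():
    """Load the question's tables and return rows with the bonus shown once per name."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE input (Name TEXT, Surname TEXT, Salary INTEGER)")
    conn.execute("CREATE TABLE input2 (Name TEXT, Bonus INTEGER)")
    conn.executemany("INSERT INTO input VALUES (?, ?, ?)",
                     [("Foo", "Bar", 100), ("Foo", "Kar", 300), ("Fo", "Ba", 35)])
    conn.executemany("INSERT INTO input2 VALUES (?, ?)",
                     [("Foo", 10), ("Foo", 20), ("Foo", 50),
                      ("Fo", 10), ("Fo", 100), ("F", 1000)])
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


result = lag_bonus()
for row in result:
    print(row)
```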

SQL - SELECT duplicates between IDs, but not show records if duplicates occur for same ID

I have the following table (simplified from the real table) at the moment:
+----+-------+-------+
| ID | Name | Phone |
+----+-------+-------+
| 1 | Tom | 123 |
| 1 | Tom | 123 |
| 1 | Tom | 123 |
| 2 | Mark | 321 |
| 2 | Mark | 321 |
| 3 | Kate | 321 |
+----+-------+-------+
My desired output in the SELECT statement is:
+----+------+-------+
| ID | Name | Phone |
+----+------+-------+
| 2 | Mark | 321 |
| 3 | Kate | 321 |
+----+------+-------+
I want to select duplicates only when they occur between two different IDs (like Mark and Kate sharing the same phone number), but not to show any records for IDs that share the same phone number with themselves only (like Tom).
Could someone advise how this can be achieved?

You can use an EXISTS condition with a correlated subquery to ensure that another record exists that has the same phone and a different id. We also need DISTINCT to remove the duplicates in the result set.
SELECT DISTINCT id, name, phone
FROM mytable t
WHERE EXISTS (
SELECT 1
FROM mytable t1
WHERE t1.phone = t.phone AND t1.id <> t.id
)
Demo on DB Fiddle:
| id | name | phone |
| --- | ---- | ----- |
| 2 | Mark | 321 |
| 3 | Kate | 321 |
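A quick way to confirm the EXISTS query on the question's data is to run it against an in-memory SQLite table from Python's `sqlite3` module, as in this sketch. The only addition is `ORDER BY id`, for a stable result.

```python
import sqlite3

ROWS = [
    (1, "Tom", 123), (1, "Tom", 123), (1, "Tom", 123),
    (2, "Mark", 321), (2, "Mark", 321), (3, "Kate", 321),
]

# Keep a row only if some other id shares its phone; DISTINCT drops repeats.
QUERY = """
SELECT DISTINCT id, name, phone
FROM mytable t
WHERE EXISTS (SELECT 1
              FROM mytable t1
              WHERE t1.phone = t.phone AND t1.id <> t.id)
ORDER BY id
"""


def shared_phones():
    """Return the rows whose phone number is used by more than one id."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mytable (id INTEGER, name TEXT, phone INTEGER)")
    conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", ROWS)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


result = shared_phones()
print(result)
```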

You can use window functions for this:
select t.*
from (select t.*,
row_number() over (partition by phone, name order by id) as seqnum,
min(id) over (partition by phone) as min_id,
max(id) over (partition by phone) as max_id
from t
) t
where seqnum = 1 and min_id <> max_id;
Another method uses aggregation and a window function:
select phone, name, id
from (select phone, name, id,
count(*) over (partition by phone) as num_ids
from t
group by phone, name, id
) pn
where num_ids > 1;
Both of these have the advantage over the exists solution (GMB's) that they refer to the "table" only once. That can be a big advantage if the table is a complex view or query. If performance is an issue, I would encourage you to test several variants to see which works best.
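Both window-function variants can be compared on the same data with this sketch, which uses Python's `sqlite3` module and assumes SQLite 3.25+. Two small changes make the comparison easy: each query returns `id, name, phone` in that order, and each sorts by `id`. Otherwise the queries follow the answer.

```python
import sqlite3

ROWS = [
    (1, "Tom", 123), (1, "Tom", 123), (1, "Tom", 123),
    (2, "Mark", 321), (2, "Mark", 321), (3, "Kate", 321),
]

# Variant 1: first row per (phone, name), kept only if the phone spans several ids.
WINDOW_QUERY = """
SELECT id, name, phone
FROM (SELECT t.*,
             ROW_NUMBER() OVER (PARTITION BY phone, name ORDER BY id) AS seqnum,
             MIN(id) OVER (PARTITION BY phone) AS min_id,
             MAX(id) OVER (PARTITION BY phone) AS max_id
      FROM t) t
WHERE seqnum = 1 AND min_id <> max_id
ORDER BY id
"""

# Variant 2: collapse to distinct (phone, name, id), then count them per phone.
GROUPED_QUERY = """
SELECT id, name, phone
FROM (SELECT phone, name, id,
             COUNT(*) OVER (PARTITION BY phone) AS num_ids
      FROM t
      GROUP BY phone, name, id) pn
WHERE num_ids > 1
ORDER BY id
"""


def run(query):
    """Load the sample rows into table t and return the query result."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER, name TEXT, phone INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?, ?, ?)", ROWS)
    rows = conn.execute(query).fetchall()
    conn.close()
    return rows


window_rows = run(WINDOW_QUERY)
grouped_rows = run(GROUPED_QUERY)
print(window_rows)
print(grouped_rows)
```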

You can use a correlated subquery with GROUP BY and HAVING, as below:
SELECT ID, NAME, MAX(PHONE)
FROM (SELECT * FROM mytable) t
GROUP BY ID, NAME
HAVING 1 = MAX(
CASE
WHEN PHONE IN (SELECT PHONE FROM mytable WHERE t.ID <> ID) THEN 1
ELSE 0
END)

Join Table From Minimum Value and Specific Name

I have:
Table id
+--------+
| number |
+--------+
| 1 |
| 2 |
| 3 |
+--------+
Table data
+-------+--------------+
| name | phone_number |
+-------+--------------+
| Bob | 111 |
| John | 333 |
| Alice | 555 |
+-------+--------------+
How can I join the tables to get this result (the minimum number, together with the row where name = 'John')?
+--------+-------+--------------+
| number | name | phone_number |
+--------+-------+--------------+
| 1 | John | 333 |
+--------+-------+--------------+

You can try the following:
select
(select min(number) FROM ID) as number, name, phone_number
from data
where name = 'John'

You can use cross join:
select min(number) as number, name, phone_number
from Table_Id
cross join Table_Data
where name = 'John'
group by name, phone_number

Depending on the RDBMS you're using, this query should get you close.
SELECT
MIN_NUMBER, NAME, PHONE_NUMBER
FROM
DATA LEFT JOIN (SELECT MIN(NUMBER) AS MIN_NUMBER FROM ID) M ON 1=1
WHERE NAME = 'John'
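The three answers above can be cross-checked with a short sketch using Python's `sqlite3` module and the question's table names, `id` and `data`. The sketch adjusts the answers in three ways:

- The cross-join version is run with a `WHERE name = 'John'` filter. Without it, the query returns one row per name.
- The `ON 1=1` derived table is given the alias `m`, which most engines require.
- The comparison uses `'John'`, because SQLite compares strings case-sensitively by default.

```python
import sqlite3

QUERIES = {
    # Scalar subquery for the minimum number.
    "scalar_subquery": """
        SELECT (SELECT MIN(number) FROM id) AS number, name, phone_number
        FROM data
        WHERE name = 'John'
    """,
    # Cross join, filtered to John before grouping.
    "cross_join": """
        SELECT MIN(number) AS number, name, phone_number
        FROM id
        CROSS JOIN data
        WHERE name = 'John'
        GROUP BY name, phone_number
    """,
    # Join to a one-row derived table on an always-true condition.
    "left_join_on_true": """
        SELECT m.min_number, name, phone_number
        FROM data
        LEFT JOIN (SELECT MIN(number) AS min_number FROM id) m ON 1 = 1
        WHERE name = 'John'
    """,
}


def run_all():
    """Run every query against the sample tables and collect the results."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE id (number INTEGER)")
    conn.execute("CREATE TABLE data (name TEXT, phone_number INTEGER)")
    conn.executemany("INSERT INTO id VALUES (?)", [(1,), (2,), (3,)])
    conn.executemany("INSERT INTO data VALUES (?, ?)",
                     [("Bob", 111), ("John", 333), ("Alice", 555)])
    results = {label: conn.execute(sql).fetchall() for label, sql in QUERIES.items()}
    conn.close()
    return results


results = run_all()
for label, rows in results.items():
    print(label, rows)
```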

If rows have repeated names only return the row with the repeat

To elaborate, say I have this table:
NAME | ID | EMAIL | TYPE
------+----+-------------+------
Joe | 1 | NULL | 01
Joe | 1 | joe#email | 02
Henry | 2 | NULL | 01
Jane | 3 | jane#email | 01
Jane | 3 | jane#email | 02
Larry | 4 | larry#email | 01
Sue | 5 | NULL | 02
I want to return this:
Joe | 1 | joe#email | 02
Henry | 2 | NULL | 01
Jane | 3 | jane#email | 02
Larry | 4 | larry#email | 01
Sue | 5 | NULL | 02
I've tried Select Distinct but that returns the original table. I have not found anything else that seems to tackle what I'm asking since the rows aren't total repeats, just the first two columns.
Select *
From Table_Name

You seem to want the record from each person with the highest TYPE value. One straightforward approach uses ROW_NUMBER to identify the records you want to retain:
SELECT NAME, ID, EMAIL, TYPE
FROM
(
SELECT NAME, ID, EMAIL, TYPE,
ROW_NUMBER() OVER (PARTITION BY ID ORDER BY TYPE DESC) rn
FROM yourTable
) t
WHERE rn = 1;
Demo
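For reference, this sketch runs the ROW_NUMBER query on the question's data through Python's `sqlite3` module, assuming SQLite 3.25+. `TYPE` is stored as text (`'01'`, `'02'`), so `ORDER BY TYPE DESC` compares strings. That works here because all the codes have the same width.

```python
import sqlite3

ROWS = [
    ("Joe", 1, None, "01"),
    ("Joe", 1, "joe#email", "02"),
    ("Henry", 2, None, "01"),
    ("Jane", 3, "jane#email", "01"),
    ("Jane", 3, "jane#email", "02"),
    ("Larry", 4, "larry#email", "01"),
    ("Sue", 5, None, "02"),
]

# Keep the highest-TYPE row per ID.
QUERY = """
SELECT NAME, ID, EMAIL, TYPE
FROM (SELECT NAME, ID, EMAIL, TYPE,
             ROW_NUMBER() OVER (PARTITION BY ID ORDER BY TYPE DESC) rn
      FROM yourTable) t
WHERE rn = 1
ORDER BY ID
"""


def latest_type_per_id():
    """Return one row per ID, the one with the highest TYPE."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE yourTable (NAME TEXT, ID INTEGER, EMAIL TEXT, TYPE TEXT)")
    conn.executemany("INSERT INTO yourTable VALUES (?, ?, ?, ?)", ROWS)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


result = latest_type_per_id()
for row in result:
    print(row)
```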

I think you can do what you want using aggregation:
select name, id, max(email) as email, max(type) as type
from tablename
group by name, id;

I would use GROUP BY and JOIN:
select t1.*
from table_name t1
join (
select id, max(type) max_type
from table_name
group by id
) t2 on t1.id = t2.id and
t1.type = t2.max_type
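The aggregation answer and the GROUP BY + JOIN answer can be compared in one sketch with Python's `sqlite3` module. On the question's data, both return the desired rows.

The extra `Ann` rows are hypothetical, added only to show where the two approaches differ. `MAX(email)` and `MAX(type)` are computed independently, so the aggregation can combine values from different rows. The join always returns an actual row.

```python
import sqlite3

ROWS = [
    ("Joe", 1, None, "01"),
    ("Joe", 1, "joe#email", "02"),
    ("Henry", 2, None, "01"),
    ("Jane", 3, "jane#email", "01"),
    ("Jane", 3, "jane#email", "02"),
    ("Larry", 4, "larry#email", "01"),
    ("Sue", 5, None, "02"),
]
# Hypothetical extra person whose email sits on the lower-TYPE row.
ANN = [("Ann", 6, "ann#email", "01"), ("Ann", 6, None, "02")]

AGG_QUERY = """
SELECT name, id, MAX(email) AS email, MAX(type) AS type
FROM tablename
GROUP BY name, id
ORDER BY id
"""

JOIN_QUERY = """
SELECT t1.name, t1.id, t1.email, t1.type
FROM tablename t1
JOIN (SELECT id, MAX(type) AS max_type
      FROM tablename
      GROUP BY id) t2
  ON t1.id = t2.id AND t1.type = t2.max_type
ORDER BY t1.id
"""


def run(query, rows):
    """Load rows into a fresh in-memory table and return the query result."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tablename (name TEXT, id INTEGER, email TEXT, type TEXT)")
    conn.executemany("INSERT INTO tablename VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(query).fetchall()
    conn.close()
    return result


agg_rows = run(AGG_QUERY, ROWS)
join_rows = run(JOIN_QUERY, ROWS)
agg_ann = run(AGG_QUERY, ANN)    # mixes values from two different rows
join_ann = run(JOIN_QUERY, ANN)  # returns the actual TYPE 02 row
print(agg_rows == join_rows)
print(agg_ann, join_ann)
```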
