Merging Duplicate Rows with SQL

Merging Duplicate Rows with SQL - sql

I have a table that contains usernames, these names are duplicated in various forms, for example, Mr. John is replicated as John Mr. I want to combine the two rows using their unique phone numbers in SQL.
I want a new table in this form after removing the duplicates

you can do it with ROW_NUMBER window function.
First, you need to group the data by your unique column (Phone_Number), then sort by name.
Preparing the table and example data:
DECLARE #vCustomers TABLE (
Name NVARCHAR(25),
Phone_Number NVARCHAR(9),
Address NVARCHAR(25)
)
INSERT INTO #vCustomers
VALUES
('Mr John', '234881675', 'Lagos'),
('Mr Felix', '234867467', 'Atlanta'),
('Mrs Ayo', '234786959', 'Doha'),
('John Mr', '234881675', 'Lagos'),
('Mr Jude', '235689760', 'Rabat'),
('Ayo', '234786959', 'Doha'),
('Jude', '235689760', 'Rabat')
After that, removing the duplicate rows:
DELETE
vc
FROM (
SELECT
ROW_NUMBER() OVER(PARTITION BY Phone_Number ORDER BY Name DESC) AS RN
FROM #vCustomers
) AS vc
WHERE RN > 1
SELECT * FROM #vCustomers
As final, the result:
Name
Phone_Number
Address
Mr John
234881675
Lagos
Mr Felix
234867467
Atlanta
Mrs Ayo
234786959
Doha
Mr Jude
235689760
Rabat

Related

Transform rows to comma delimited string by groups of n records

Based on this sample table
ID
NAME
ZIP
NTID
1
Juan
123
H1
1
Juan
123
H2
2
John
456
H3
2
John
456
H4
2
John
456
H5
I want to show ntid as a comma separated value but in groups of maximum 2 items
Expected result
ID
NAME
ZIP
NTID
1
Juan
123
H1, H2
2
John
456
H3, H4
2
John
456
H5
Using SQL Server, is there a way to achieve this?
By using the STRING_AGG function it concatenates all in a single row, but I need to group as groups of maximum of 2 members.
SELECT ID, NAME, ZIP, STRING_AGG(NTID,',')
FROM MyTable
GROUP BY ID, NAME, ZIP

So you can use a modified row number (divided by 2) to group your data into blocks of 2 rows, and then you can use string_agg().
You do need a way to order the rows though (if you want consistent results), I have assumed NTID will work, but you may have a better column to order by.
declare #Test table (ID int, [NAME] varchar(32), ZIP varchar(12), NTID varchar(2));
insert into #Test (ID, [NAME], ZIP, NTID)
values
(1, 'Juan', '123', 'H1'),
(1, 'Juan', '123', 'H2'),
(2, 'John', '456', 'H3'),
(2, 'John', '456', 'H4'),
(2, 'John', '456', 'H5');
with cte as (
select *
, (row_number() over (partition by ID order by NTID) - 1) / 2 rn
from #Test
)
select ID, [NAME], string_agg(NTID,',')
from cte
group by ID, [NAME], rn;
db<>fiddle here
Note: If you supply the DDL+DML as part of your question you make it much easier to answer.

How can I use SQL Pivot Table to turn my rows of data into columns

In SQL 2008, I need to flatten a table and show extra rows as columns. All I can find are queries with calculations. I just want to show the raw data. The data is like as below (simplified):
ID# Name Name_Type
1 Mary Jane Legal
1 MJ Nickname
1 Smith Maiden
2 John Legal
3 Suzanne Legal
3 Susie Nickname
I want the data to show as:
ID# Legal Nickname Maiden
1 Mary Jane MJ Smith
2 John
3 Suzanne Susie
where nothing shows in the column if there is not a row existing for that column. I'm thinking the Pivot Table method should work.

PIVOT requires you to use an aggregate. See this post for a better explanation of how it works.
CREATE TABLE #MyTable
(
ID# INT
, Name VARCHAR(50)
, Name_Type VARCHAR(50)
);
INSERT INTO #MyTable VALUES
(1, 'Mary Jane', 'Legal')
, (1, 'MJ', 'Nickname')
, (1, 'Smith', 'Maiden')
, (2, 'John', 'Legal')
, (3, 'Suzanne', 'Legal')
, (3, 'Susie', 'Nickname');
SELECT *
FROM
(
SELECT * FROM #MyTable
) AS Names
PIVOT (MAX(NAME)
FOR Name_Type IN ([Legal], [Nickname], [Maiden]))
AS PVT;
DROP TABLE #MyTable;

Try this (replace "new_Table" with your table - name and "ID_" with your id - column):
SELECT ID_ AS rootID, (
SELECT Name
FROM new_table
WHERE Name_type = 'legal'
AND new_table.ID_ = rootID
) AS legal,
(
SELECT Name
FROM new_table
WHERE Name_type = 'Nickname'
AND ID_ = rootID
) AS Nickname,
(
SELECT Name
FROM new_table
WHERE Name_type = 'Maiden'
AND ID_ = rootID
) AS Maiden
FROM new_table
GROUP BY rootID;

ms-access 2010: count duplicate names per household address

I am currently working with a spreadsheet in MS Access 2010 which contains about 130k rows of information about people who voted in a local election recently. Each row has their residential information (street name, number, postcode etc.) and personal information (title, surname, forename, middle name, DOB etc.). Each row represents an individual person rather than a household (therefore in many cases the same residential address appears more than once as more than one person resides in a particular household).
What I want to achieve is basically to create a new field in this dataset called 'count'. I want this field to give me a count of how many different surnames reside at a single address.
Is there an SQL script that will allow me to do this in Access 2010?
+------------------+----------+-------+---------+----------+-------------+
| PROPERTYADDRESS1 | POSTCODE | TITLE | SURNAME | FORENAME | MIDDLE_NAME |
+------------------+----------+-------+---------+----------+-------------+
FAKEADDRESS1 EEE 5GG MR BLOGGS JOE N
FAKEADDRESS2 EEE 5BB MRS BLOGGS SUZANNE P
FAKEADDRESS3 EEE 5RG MS SMITH PAULINE S
FAKEADDRESS4 EEE 4BV DR JONES ANNE D
FAKEADDRESS5 EEE 3AS MR TAYLOR STUART A
The following syntax has got me close so far:
SELECT COUNT(electoral.SURNAME)
FROM electoral
GROUP BY electoral.UPRN
However, instead of returning me all 130k odd rows, it only returns me around 67k rows. Is there anything I can do to the syntax to achieve the same result, but just returning every single row?
Any help is greatly appreciated!
Thanks

You could use something like this:
select *,
count(surname) over (partition by householdName)
from myTable
If you have only one column which contains the name,
ex: Rob Adams
then you can do this to have all the surnames in a different column so it will be easier in the select:
SELECT LEFT('HELLO WORLD',CHARINDEX(' ','HELLO WORLD')-1)
in our example:
select right (surmane, charindex (' ',surname)-1) as surname
example on how to use charindex, left and right here:
http://social.technet.microsoft.com/wiki/contents/articles/17948.t-sql-right-left-substring-and-charindex-functions.aspx
if there are any questions, leave a comment.
EDIT: I edited the query, had a syntax error, please try it again. This works on sql server.
here is an example:
create table #temp (id int, PropertyAddress varchar(50), surname varchar(50), forname varchar(50))
insert into #temp values
(1, 'hiddenBase', 'Adamns' , 'Kara' ),
(2, 'hiddenBase', 'Adamns' , 'Anne' ),
(3, 'hiddenBase', 'Adamns' , 'John' ),
(4, 'QueensResidence', 'Queen' , 'Oliver' ),
(5, 'QueensResidence', 'Queen' , 'Moira' ),
(6, 'superSecretBase', 'Diggle' , 'John' ),
(7, 'NandaParbat', 'Merlin' , 'Malcom' )
select * from #temp
select *,
count (surname) over (partition by PropertyAddress) as CountMembers
from #temp
gives:
1 hiddenBase Adamns Kara 3
2 hiddenBase Adamns Anne 3
3 hiddenBase Adamns John 3
7 NandaParbat Merlin Malcom 1
4 QueensResidence Queen Oliver 2
5 QueensResidence Queen Moira 2
6 superSecretBase Diggle John 1
Your query should look like this:
select *,
count (SURNAME) over (partition by PropertyAddress) as CountFamilyMembers
from electoral
EDIT
If over partition by isn't supported, then I guess you can get to your desired result by using group by
select *,
count (SURNAME) over (partition by PropertyAddress) as CountFamilyMembers
from electoral
group by -- put here the fields in the select (one by one), however you can't write group by *

GROUP BY creates an aggregate query, so it's by design that you get fewer records (one per UPRN).
To get the count for each row in the original table, you can join the table with the aggregate query:
SELECT electoral.*, elCount.NumberOfPeople
FROM electoral
INNER JOIN
(
SELECT UPRN, COUNT(*) AS NumberOfPeople
FROM electoral
GROUP BY UPRN
) AS elCount
ON electoral.UPRN = elCount.UPRN

Given the update I want to post another answer. Try it like this:
create table #temp2 ( PropertyAddress1 varchar(50), POSTCODE varchar(20), TITLE varchar (20),
surname varchar(50), FORENAME varchar(50), MIDDLE_NAME varchar (50) )
insert into #temp2 values
('FAKEADDRESS1', 'EEE 5GG', 'MR', 'BLOGGS', 'JOE', 'N'),
('FAKEADDRESS1', 'EEE 5BB', 'MRS', 'BLOGGS', 'SUZANNE', 'P'),
('FAKEADDRESS2', 'EEE 5RG', 'MS', 'SMITH', 'PAULINE', 'S'),
('FAKEADDRESS3', 'EEE 4BV', 'DR', 'JONES', 'ANNE', 'D'),
('FAKEADDRESS4', 'EEE 3AS', 'MR', 'TAYLOR', 'STUART', 'A')
select PropertyAddress1, surname,count (#temp2.surname) as CountADD
into #countTemp
from #temp2
group by PropertyAddress1, surname
select * from #temp2 t2
left join #countTemp ct
on t2.PropertyAddress1 = ct.PropertyAddress1 and t2.surname = ct.surname
This yields:
PropertyAddress1 POSTCODE TITLE surname FORENAME MIDDLE_NAME PropertyAddress1 surname CountADD
FAKEADDRESS1 EEE 5GG MR BLOGGS JOE N FAKEADDRESS1 BLOGGS 2
FAKEADDRESS1 EEE 5BB MRS BLOGGS SUZANNE P FAKEADDRESS1 BLOGGS 2
FAKEADDRESS2 EEE 5RG MS SMITH PAULINE S FAKEADDRESS2 SMITH 1
FAKEADDRESS3 EEE 4BV DR JONES ANNE D FAKEADDRESS3 JONES 1
FAKEADDRESS4 EEE 3AS MR TAYLOR STUART A FAKEADDRESS4 TAYLOR 1

One SQL statement for counting the records in the master table based on matching records in the detail table?

I have the following master table called Master and sample data
ID---------------Date
1 2014-09-07
2 2014-09-07
3 2014-09-08
The following details table called Details
masterId-------------Name
1 John Walsh
1 John Jones
2 John Carney
1 Peter Lewis
3 John Wilson
Now I want to find out the count of Master records (grouped on the Date column) whose corresponding details record with Name having the value "John".
I cannot figure how to write a single SQL statement for this job.
**Please note that join is needed in order to find master records for count. However, such join creates duplicate master records for count. I need to remove such duplicate records from being counted when grouping on the Date column in the Master table.
The correct results should be:
count: grouped on Date column
2 2014-09-07
1 2014-09-08
**
Thanks and regards!

This answer assumes the following
The Name field is always FirstName LastName
You are looking once and only once for the John firstname. The search criteria would be different, pending what you need
SELECT Date, Count(*)
FROM tblmaster
INNER JOIN tbldetails ON tblmaster.ID=tbldetails.masterId
WHERE NAME LIKE 'John%'
GROUP BY Date, tbldetails.masterId
What we're doing here is using a wilcard character in our string search to say "Look for John where any characters of any length follows".
Also, here is a way to create table variables based on what we're working with
DECLARE #tblmaster as table(
ID int,
[date] datetime
)
DECLARE #tbldetails as table(
masterID int,
name varchar(50)
)
INSERT INTO #tblmaster (ID,[date])
VALUES
(1,'2014-09-07'),(2,'2014-09-07'),(3,'2014-09-08')
INSERT INTO #tbldetails(masterID, name) VALUES
(1,'John Walsh'),
(1,'John Jones'),
(2,'John Carney'),
(1,'Peter Lewis'),
(3,'John Wilson')
Based on all comments below, this SQL statement in it's clunky glory should do the trick.
SELECT date,count(t1.ID) FROM #tblmaster mainTable INNER JOIN
(
SELECT ID, COUNT(*) as countOfAll
FROM #tblmaster t1
INNER JOIN #tbldetails t2 ON t1.ID=t2.masterId
WHERE NAME LIKE 'John%'
GROUP BY id)
as t1 on t1.ID = mainTable.id
GROUP BY mainTable.date

Is this what you want?
select date, count(distinct m.id)
from master m join
details d
on d.masterid = m.id
where name like '%John%'
group by date;

Add or delete repeated row

I have an output like this:
id name date school school1
1 john 11/11/2001 nyu ucla
1 john 11/11/2001 ucla nyu
2 paul 11/11/2011 uft mit
2 paul 11/11/2011 mit uft
I would like to achieve this:
id name date school school1
1 john 11/11/2001 nyu ucla
2 paul 11/11/2011 mit uft
I am using direct join as in:
select distinct
a.id, a.name,
b.date,
c.school
a1.id, a1.name,
b1.date,
c1.school
from table a, table b, table c,table a1, table b1, table c1
where
a.id=b.id
and...
Any ideas?

We will need more information such as what your tables contain and what you are after.
One thing I noticed is you have a school and then school1. 3nf states that you should never duplicate fields and append numbers to them to get more information even if you think that the relationship will only be 1 or 2 additional items. You need to create a second table that stores a user associated with 1 to many schools.

I agree with everyone else that both your source table and your desired output are poor design. While you probably can't do anything about your source table, I recommend the following code and output:
Select id, name, date, school from MyTable;
union
Select id, name, date, school1 from MyTable;
(repeat as necessary)
This will give you results in the format:
id name date school
1 john 11/11/2001 nyu
1 john 11/11/2001 ucla
2 paul 11/11/2011 mit
2 paul 11/11/2011 uft
(Note: in my version of SQL, union queries automatically select distinct records so the distinct flag isn't needed)
With this format, you could easily count the number of schools per student, number of students per school, etc.
If processing time and/or storage space is a factor here, you could then split this into 2 tables, 1 with the id,name & date, the other with the id & school (basically what JonH just said). But if you're just working up some simple statistics, this should suffice.

This problem was just too irresistable, so I just took a guess at the data structures that we are dealing with. The technology wasn't specified in the question. This is in Transact-SQL.
create table student
(
id int not null primary key identity,
name nvarchar(100) not null default '',
graduation_date date not null default getdate(),
)
go
create table school
(
id int not null primary key identity,
name nvarchar(100) not null default ''
)
go
create table student_school_asc
(
student_id int not null foreign key references student (id),
school_id int not null foreign key references school (id),
primary key (student_id, school_id)
)
go
insert into student (name, graduation_date) values ('john', '2001-11-11')
insert into student (name, graduation_date) values ('paul', '2011-11-11')
insert into school (name) values ('nyu')
insert into school (name) values ('ucla')
insert into school (name) values ('uft')
insert into school (name) values ('mit')
insert into student_school_asc (student_id, school_id) values (1,1)
insert into student_school_asc (student_id, school_id) values (1,2)
insert into student_school_asc (student_id, school_id) values (2,3)
insert into student_school_asc (student_id, school_id) values (2,4)
select
s.id,
s.name,
s.graduation_date as [date],
(select max(name) from
(select name,
RANK() over (order by name) as rank_num
from school sc
inner join student_school_asc ssa on ssa.school_id = sc.id
where ssa.student_id = s.id) s1 where s1.rank_num = 1) as school,
(select max(name) from
(select name,
RANK() over (order by name) as rank_num
from school sc
inner join student_school_asc ssa on ssa.school_id = sc.id
where ssa.student_id = s.id) s2 where s2.rank_num = 2) as school1
from
student s
Result:
id name date school school1
--- ----- ---------- ------- --------
1 john 2001-11-11 nyu ucla
2 paul 2011-11-11 mit uft

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

Merging Duplicate Rows with SQL - sql

I have a table that contains usernames, these names are duplicated in various forms, for example, Mr. John is replicated as John Mr. I want to combine the two rows using their unique phone numbers in SQL. I want a new table in this form after removing the duplicates

Related

Transform rows to comma delimited string by groups of n records

How can I use SQL Pivot Table to turn my rows of data into columns

ms-access 2010: count duplicate names per household address

One SQL statement for counting the records in the master table based on matching records in the detail table?

Add or delete repeated row

Categories

Resources