Row merging in sql - sql

How do I merge duplicate records into a single row?

You need aggregation:
select id, max(lname) as lname, max(fname) as fname, max(address) as address,
max(zip) as zip, max(city) as city, max(state) as state, max(phone) as phone
from t
group by id;

You can use aggregation:
select id, max(lname) as lname, max(fname) as fname, max(address) as address,
. . .
from t
group by id;
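The aggregation answers above work because MAX() skips NULLs, so each column collapses to its one non-NULL value per id. Below is a minimal sketch of that behavior, run against an in-memory SQLite database through Python's sqlite3 module. The table layout and sample rows are assumptions made up for illustration, not the asker's data.

```python
import sqlite3

# Hypothetical sample data: id 1 is spread over three partial rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, lname TEXT, fname TEXT, phone TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [
        (1, "Smith", None, None),
        (1, None, "Steve", None),
        (1, None, None, "555-0100"),
        (2, "Jones", "Ann", None),
    ],
)

# MAX() ignores NULLs, so each group keeps its non-NULL value per column.
merged = conn.execute(
    """
    SELECT id, MAX(lname) AS lname, MAX(fname) AS fname, MAX(phone) AS phone
    FROM t
    GROUP BY id
    ORDER BY id
    """
).fetchall()

print(merged)
# [(1, 'Smith', 'Steve', '555-0100'), (2, 'Jones', 'Ann', None)]
```

Note that if two rows for the same id hold different non-NULL values in a column, MAX() simply picks the largest one, so the merged row can mix values from different source rows.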

Related

How do I find a duplicate in SQL

I have a query that selects 3 columns. Each row should be a unique combination of county, city, and zip. However, I have reason to believe I'm getting a duplicate somewhere. How do I find the duplicate? COUNT()? This is in MS SQL Server. Any help would be most appreciated. --Jason
SELECT COUNTY, CITY, ZIP
FROM MoratoriumLocations
WHERE MoratoriumID=20
ORDER BY County
You could use GROUP BY and HAVING:
SELECT COUNTY, CITY, ZIP
FROM MoratoriumLocations
WHERE MoratoriumID=20
GROUP BY COUNTY, CITY, ZIP
HAVING COUNT(1) >1
ORDER BY County
If you want the full row details, you can use a subquery in combination with GROUP BY and HAVING:
SELECT x.*
FROM MoratoriumLocations x
INNER JOIN(
SELECT COUNTY, CITY, ZIP
FROM MoratoriumLocations
WHERE MoratoriumID=20
GROUP BY COUNTY, CITY, ZIP
HAVING COUNT(1) >1
) dups ON dups.County = x.County
AND dups.City = x.City
AND dups.Zip = x.Zip
WHERE x.MoratoriumID=20 -- keep rows from other moratoria out of the result
See Preben's answer for how to find dups.
To avoid dups altogether, consider creating a unique index.
I would suggest window functions:
SELECT ml.*
FROM (SELECT ml.*, COUNT(*) OVER (PARTITION BY County, City, Zip) as cnt
FROM MoratoriumLocations ml
WHERE MoratoriumID = 20
) ml
ORDER BY cnt DESC, County, City, Zip;
This will show the complete rows with duplicates, which can help you understand them better.
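The answers above can be tried end to end with a small sketch. It uses Python's sqlite3 module with an in-memory database rather than SQL Server, so treat it as an illustration of the technique only. The sample rows, the `clean` table, and the index name `ux_clean` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MoratoriumLocations "
    "(MoratoriumID INTEGER, County TEXT, City TEXT, Zip TEXT)"
)
conn.executemany(
    "INSERT INTO MoratoriumLocations VALUES (?, ?, ?, ?)",
    [
        (20, "Dade", "Miami", "33101"),
        (20, "Dade", "Miami", "33101"),  # the duplicate we want to find
        (20, "Leon", "Tallahassee", "32301"),
        (21, "Dade", "Miami", "33101"),  # other moratorium, not a duplicate
    ],
)

# GROUP BY + HAVING: one row per combination that occurs more than once.
dups = conn.execute(
    """
    SELECT County, City, Zip, COUNT(*) AS cnt
    FROM MoratoriumLocations
    WHERE MoratoriumID = 20
    GROUP BY County, City, Zip
    HAVING COUNT(*) > 1
    """
).fetchall()
print(dups)  # [('Dade', 'Miami', '33101', 2)]

# A unique index (hypothetical name ux_clean) rejects duplicates up front.
conn.execute(
    "CREATE TABLE clean (MoratoriumID INTEGER, County TEXT, City TEXT, Zip TEXT)"
)
conn.execute(
    "CREATE UNIQUE INDEX ux_clean ON clean (MoratoriumID, County, City, Zip)"
)
conn.execute("INSERT INTO clean VALUES (20, 'Dade', 'Miami', '33101')")
try:
    conn.execute("INSERT INTO clean VALUES (20, 'Dade', 'Miami', '33101')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True
```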

Oracle query: how do I limit the returned records to only those having a count > 1 but show full results?

I need to show all the users who have more than one ID, but not return the users who have only one. I tried GROUP BY with HAVING, but I need to list the IDs and not just count them, so I could not get that to work. I ended up using the code below, but it returns all the records.
select id, fname, lname, ssn, dob,
count(id) over (partition by fname, lname, ssn, dob) as cnt
from TABLE
order by cnt desc;
Use a subquery:
select id, fname, lname, ssn, dob
from (select id, fname, lname, ssn, dob,
count(id) over (partition by fname, lname, ssn, dob) as cnt
from TABLE
) t
where cnt >= 2
order by cnt;
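Alternatively, use a CTE with GROUP BY and HAVING, then join back to the table to list the IDs:
WITH CTE (FNAME, LNAME, TALLY) AS
(
SELECT FNAME, LNAME, COUNT(ID) AS TALLY
FROM TABLE
GROUP BY FNAME, LNAME
HAVING COUNT(ID) > 1
)
SELECT T.ID, C.FNAME,C.LNAME FROM CTE C
JOIN TABLE T
ON C.FNAME = T.FNAME
AND C.LNAME = T.LNAME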
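To see why the window count has to be wrapped in a subquery before filtering, here is a small sketch using Python's sqlite3 module (SQLite 3.25 or later is assumed for window function support) instead of Oracle. The table name `users` and the sample rows are hypothetical stand-ins for the asker's TABLE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER, fname TEXT, lname TEXT, ssn TEXT, dob TEXT)"
)
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Ann", "Lee", "111", "1990-01-01"),
        (2, "Ann", "Lee", "111", "1990-01-01"),  # same person, second ID
        (3, "Bob", "Ray", "222", "1985-05-05"),  # only one ID
    ],
)

# The window count is computed in the inner query; the outer WHERE can
# then filter on it, which is not allowed at the same query level.
multi_id = conn.execute(
    """
    SELECT id, fname, lname
    FROM (
        SELECT id, fname, lname,
               COUNT(id) OVER (PARTITION BY fname, lname, ssn, dob) AS cnt
        FROM users
    ) t
    WHERE cnt >= 2
    ORDER BY id
    """
).fetchall()

print(multi_id)  # [(1, 'Ann', 'Lee'), (2, 'Ann', 'Lee')]
```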

Advanced table deduping

I have a question. I have a table with over 2 billion rows. Many are duplicates, but there is a varchar column that holds a validity date in a format such as 201806.
I want to dedupe the table BUT keep the most current date.
Example:
ID,fname, lname, addrees, city, state, zip, validitydate
1,steve,smith, pob 123, miami, fl, 33081,201709
2,steve,smith, pob 123, miami, fl, 33081,201010
3,steve,smith, pob 123, miami, fl, 33081,201809
4,steve,smith, pob 123, miami, fl, 33081,201201
I only want to keep: steve,smith, pob 123, miami, fl, 33081,201809 as it is the most current. If I run the query below, it dedupes, but it's a crap-shoot which row is left in the table: I cannot add validitydate to the PARTITION BY, because T-SQL would then treat every row as unique.
How can I make it so it dedups but calculates to keep the most current date as the final entry?
thanks in advance.
WITH Records AS
(
SELECT fname, lname, addrees, city,
ROW_NUMBER() OVER (
PARTITION BY fname, lname, addrees, city, state, zip
ORDER BY ID) AS RecordInstance
FROM PEOPLE where lname like 'S%'
)
DELETE
FROM Records
WHERE
RecordInstance > 1
Order by month (descending) so the RecordInstance will be 1 for the most current one:
WITH Records AS (
SELECT fname, lname, addrees, city,
ROW_NUMBER() OVER (
PARTITION BY fname, lname, addrees, city, state, zip
ORDER BY validitydate DESC -- Add this to order correctly!
) AS RecordInstance
FROM PEOPLE where lname like 'S%'
)
DELETE FROM Records WHERE RecordInstance > 1
The delete also works with just the ROW_NUMBER in the CTE, ordered by validitydate descending. The most recent month then gets row number 1, and you can delete every row with a row number greater than 1:
WITH CTE AS
(
SELECT
ROW_NUMBER() OVER (PARTITION BY fname, lname, addrees, city, state, zip ORDER BY validitydate DESC, ID DESC) AS rn
FROM PEOPLE
WHERE lname like 'S%'
)
DELETE
FROM CTE
WHERE rn > 1;
Here is a link to an article I wrote regarding this very issue.
https://sqlfundamentals.wordpress.com/delete-duplicate-rows-in-t-sql/
Hope this helps.
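The keep-the-latest logic above can be checked on the asker's four sample rows. The sketch below uses Python's sqlite3 module (SQLite 3.25 or later assumed). SQLite cannot delete through a CTE the way T-SQL does, so the ranked rows are fed into DELETE ... WHERE id IN (...) instead; that substitution, plus the extra fifth row, is an assumption made for the demo. Because validitydate is a fixed-width YYYYMM string, sorting it as text gives the same order as sorting by date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE people (id INTEGER PRIMARY KEY, fname TEXT, lname TEXT, "
    "addrees TEXT, city TEXT, state TEXT, zip TEXT, validitydate TEXT)"
)
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    [
        (1, "steve", "smith", "pob 123", "miami", "fl", "33081", "201709"),
        (2, "steve", "smith", "pob 123", "miami", "fl", "33081", "201010"),
        (3, "steve", "smith", "pob 123", "miami", "fl", "33081", "201809"),
        (4, "steve", "smith", "pob 123", "miami", "fl", "33081", "201201"),
        # Hypothetical extra person with no duplicates; must survive.
        (5, "sara", "stone", "1 elm st", "tampa", "fl", "33601", "201501"),
    ],
)

# Rank rows within each duplicate group, newest validitydate first,
# then delete everything that is not ranked first.
conn.execute(
    """
    DELETE FROM people
    WHERE id IN (
        SELECT id
        FROM (
            SELECT id,
                   ROW_NUMBER() OVER (
                       PARTITION BY fname, lname, addrees, city, state, zip
                       ORDER BY validitydate DESC, id DESC
                   ) AS rn
            FROM people
        ) ranked
        WHERE rn > 1
    )
    """
)

remaining = conn.execute(
    "SELECT id, validitydate FROM people ORDER BY id"
).fetchall()
print(remaining)  # [(3, '201809'), (5, '201501')]
```

On a table with billions of rows, a single delete like this may be impractical; running it in batches (for example one name prefix at a time, as the asker's LIKE 'S%' filter suggests) is a common way to keep each transaction small.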

Get distinct name, address, max(date) while preserving ID

I have the following table structure:
ID | fname | lname | street | date
I'm trying to grab the distinct fname, lname, street and max(date), but also preserve the ID of the matching row. There might be multiple rows with matching fname, lname, and street, but all with different IDs. It seems like a simple thing, but evidently it has escaped me to this point.
I found some solutions that almost fit this but not quite. My apologies if this has been covered.
Thanks.
Try the following:
;WITH CTE AS
(
SELECT *, ROW_NUMBER() OVER(PARTITION BY fname, lname, street ORDER BY [Date] DESC) RN
FROM yourTable
)
SELECT ID, fname, lname, street, [date]
FROM CTE
WHERE RN = 1
Assuming the row with the latest date also has the highest ID in each group:
select max(ID), fname, lname, street, max(date)
from tablename
group by fname, lname, street
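The difference between the two answers shows up as soon as the newest date does not sit on the highest ID. The sketch below runs both queries on hypothetical sample rows using Python's sqlite3 module (SQLite 3.25 or later assumed); the table name yourTable is taken from the answer above, the data is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE yourTable '
    '(ID INTEGER, fname TEXT, lname TEXT, street TEXT, "date" TEXT)'
)
conn.executemany(
    "INSERT INTO yourTable VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Ann", "Lee", "1 Main", "2020-05-01"),  # newest date, lower ID
        (2, "Ann", "Lee", "1 Main", "2019-01-01"),
        (3, "Bob", "Ray", "2 Oak", "2018-03-03"),
    ],
)

# ROW_NUMBER keeps the whole row that carries the latest date.
by_row_number = conn.execute(
    """
    SELECT ID, fname, lname, street, "date"
    FROM (
        SELECT *, ROW_NUMBER() OVER (
                   PARTITION BY fname, lname, street
                   ORDER BY "date" DESC
               ) AS rn
        FROM yourTable
    ) ranked
    WHERE rn = 1
    ORDER BY ID
    """
).fetchall()

# MAX(ID) and MAX(date) are computed independently of each other.
by_max = conn.execute(
    """
    SELECT MAX(ID), fname, lname, street, MAX("date")
    FROM yourTable
    GROUP BY fname, lname, street
    ORDER BY 1
    """
).fetchall()

print(by_row_number[0])  # (1, 'Ann', 'Lee', '1 Main', '2020-05-01')
print(by_max[0])         # (2, 'Ann', 'Lee', '1 Main', '2020-05-01')
```

In the second result, ID 2 is paired with a date that actually belongs to ID 1, which is why the aggregate version is only safe when dates and IDs increase together.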

SQL DISTINCT [Alternative Using]

I have a simple query on Oracle.
SELECT DISTINCT City, Name, Surname FROM Persons
Is there any alternative sql query for the same query without DISTINCT ?
Have a look at this article
For example:
select City, Name, Surname
from (
select City, Name, Surname,
row_number() over
(partition by City, Name, Surname
order by City) rownumber
from Persons
) t
where rownumber = 1
A UNION of the query with itself also removes duplicates:
SELECT City, Name, Surname FROM Persons
UNION
SELECT City, Name, Surname FROM Persons
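Or group by all three columns (Oracle has no First() aggregate, and none is needed here because every selected column is in the GROUP BY):
SELECT City, Name, Surname
FROM Persons
GROUP BY City, Name, Surname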
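As a quick sanity check that these alternatives really return the same rows as DISTINCT, the sketch below runs four variants against a small hypothetical Persons table using Python's sqlite3 module (SQLite 3.25 or later assumed for ROW_NUMBER). It is run on SQLite rather than Oracle, so it illustrates the set logic only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Persons (City TEXT, Name TEXT, Surname TEXT)")
conn.executemany(
    "INSERT INTO Persons VALUES (?, ?, ?)",
    [
        ("Oslo", "Ann", "Lee"),
        ("Oslo", "Ann", "Lee"),  # exact duplicate
        ("Oslo", "Bob", "Ray"),
        ("Rome", "Ann", "Lee"),
    ],
)

queries = {
    "distinct": "SELECT DISTINCT City, Name, Surname FROM Persons",
    "group_by": "SELECT City, Name, Surname FROM Persons "
                "GROUP BY City, Name, Surname",
    "union": "SELECT City, Name, Surname FROM Persons "
             "UNION SELECT City, Name, Surname FROM Persons",
    "row_number": """
        SELECT City, Name, Surname
        FROM (
            SELECT City, Name, Surname,
                   ROW_NUMBER() OVER (
                       PARTITION BY City, Name, Surname ORDER BY City
                   ) AS rn
            FROM Persons
        ) t
        WHERE rn = 1
    """,
}

# Sort each result so the comparison does not depend on row order.
results = {name: sorted(conn.execute(sql).fetchall())
           for name, sql in queries.items()}

for name, rows in results.items():
    print(name, rows)
```

All four variants print the same three rows; GROUP BY is usually the closest drop-in replacement, while the ROW_NUMBER form is mostly useful when you also need to choose which duplicate to keep.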