Delete and Merge Records in SQL Server


I have a table as follows:
id | firstname | lastname | email       | homephone
---------------------------------------------------
1  | aaa       | bbb      | xxx@yyy.com | 12344444
2  | aaa       | bbb      | null        | null
3  | ccc       | ddd      | zzz@fff.com | null
4  | ccc       | ddd      | null        | 34343322
These rows are considered duplicates, so I want to keep only one record per person and fill in its nulls from the duplicates, so that the table looks like this:
id | firstname | lastname | email       | homephone
1  | aaa       | bbb      | xxx@yyy.com | 12344444
3  | ccc       | ddd      | zzz@fff.com | 34343322
So far I have managed to find the duplicates using the following code:
Select
max(a.id) as OriginalId, b.id as DuplicateId,
a.firstname, b.firstname as dup_fname,
a.lastname, b.lastname as dup_lname,
a.email, b.email as dup_email
From
tbl_xxx a
join
tbl_xxx b on a.firstname = b.firstname
and a.lastname = b.lastname
and a.email is null
and a.homephone is null
and b.email is null
and b.homephone is null
and a.id < b.id
Group by
b.id, a.firstname, b.firstname, a.lastname,
b.lastname, a.email, b.email, a.homephone, b.homephone
My merge query looks like this:
update tbl_xxx
SET
email = email,
homephone = homephone
where
firstname = firstname
and lastname = lastname
and email is null
and homephone is null
Eventually I will get distinct rows.
Is my approach correct? Kindly suggest how I can make my query more efficient.

This update:
update tbl_xxx SET
email = email,
homephone = homephone
where
firstname = firstname
and lastname = lastname
and email is null
and homephone is null
will do absolutely nothing, because the query doesn't compare the table against itself; it compares each row against itself. An update alone won't work either, because you'll still have duplicates. What you want is to remove the duplicate rows entirely, AFTER running an update that compares the table to itself. So we first run a query that copies every non-null value from the duplicates into the originals, and then we delete the rows with the higher primary key.
One way to tackle the problem is to update one column at a time:
-- Fill NULL emails from another row with the same first and last name
UPDATE t SET t.email = src.email
FROM tbl_xxx AS t
JOIN (SELECT firstname, lastname, MAX(email) AS email
      FROM tbl_xxx WHERE email IS NOT NULL
      GROUP BY firstname, lastname) AS src
  ON src.firstname = t.firstname AND src.lastname = t.lastname
WHERE t.email IS NULL;
-- Same for homephone
UPDATE t SET t.homephone = src.homephone
FROM tbl_xxx AS t
JOIN (SELECT firstname, lastname, MAX(homephone) AS homephone
      FROM tbl_xxx WHERE homephone IS NOT NULL
      GROUP BY firstname, lastname) AS src
  ON src.firstname = t.firstname AND src.lastname = t.lastname
WHERE t.homephone IS NULL;
The query finds a non-null value of each column for every first name/last name pair (MAX picks one if there are several) and copies it into the NULL fields, so any data missing from the original is filled in. It may not be 100% correct if two different people in the database share the same name, so you'll have to take that into consideration.
Then follow it up with this query, which deletes the higher-id row of each group of now-identical rows:
DELETE FROM tbl_xxx WHERE tbl_xxx.id IN (
SELECT max(id) FROM tbl_xxx
GROUP BY tbl_xxx.firstname, tbl_xxx.lastname, tbl_xxx.homephone, tbl_xxx.email
HAVING count(tbl_xxx.id) > 1);
Edit: if a record can have more than one duplicate, you could do:
DELETE FROM tbl_xxx WHERE tbl_xxx.id NOT IN (
SELECT min(id) FROM tbl_xxx
GROUP BY tbl_xxx.firstname, tbl_xxx.lastname, tbl_xxx.homephone, tbl_xxx.email);
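If you want to try the fill-then-delete idea without a SQL Server instance, here is a minimal sketch against an in-memory SQLite database through Python's sqlite3 module. It assumes the question's table and column names, with homephone stored as text. SQLite is a stand-in here, so the fill step uses correlated subqueries instead of T-SQL's UPDATE ... FROM. MAX() is just one deterministic way to pick a non-null value per name.

```python
import sqlite3

# In-memory copy of the question's sample data (homephone kept as text).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tbl_xxx (id INTEGER PRIMARY KEY, firstname TEXT, "
    "lastname TEXT, email TEXT, homephone TEXT)"
)
conn.executemany(
    "INSERT INTO tbl_xxx VALUES (?, ?, ?, ?, ?)",
    [
        (1, "aaa", "bbb", "xxx@yyy.com", "12344444"),
        (2, "aaa", "bbb", None, None),
        (3, "ccc", "ddd", "zzz@fff.com", None),
        (4, "ccc", "ddd", None, "34343322"),
    ],
)

# Step 1: fill each NULL column from a non-null value for the same name.
# The column names are fixed constants, so the f-string is safe here.
for column in ("email", "homephone"):
    conn.execute(
        f"""
        UPDATE tbl_xxx
        SET {column} = (
            SELECT MAX(src.{column}) FROM tbl_xxx AS src
            WHERE src.firstname = tbl_xxx.firstname
              AND src.lastname = tbl_xxx.lastname)
        WHERE {column} IS NULL
        """
    )

# Step 2: duplicates are now identical, so keep only the lowest id per group.
conn.execute(
    """
    DELETE FROM tbl_xxx WHERE id NOT IN (
        SELECT MIN(id) FROM tbl_xxx
        GROUP BY firstname, lastname, email, homephone)
    """
)

rows = conn.execute("SELECT * FROM tbl_xxx ORDER BY id").fetchall()
print(rows)
```

The same caveat applies as above: grouping by name treats two different people with the same name as one person.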

You can use a MERGE statement for this. Try this sample:
create table temptable (id int, firstname varchar(50), lastname varchar(50), email varchar(50), homephone varchar(50))
insert into temptable values
(1, 'aaa', 'bbb', 'xxx@yyy.com', 1234444),
(2, 'aaa', 'bbb', null, null),
(3, 'ccc', 'ddd', 'abc@ddey.com', null),
(4, 'ccc', 'ddd', null, 34343322)
select * from temptable
-- one row per name: the id to keep, plus the first non-null email and homephone (by id)
;with cte as
(
select firstname, lastname
,(select top 1 id from temptable b where b.firstname = a.firstname and b.lastname = a.lastname and (b.email is not null or b.homephone is not null) order by b.id) tid
,(select top 1 email from temptable b where b.firstname = a.firstname and b.lastname = a.lastname and b.email is not null order by b.id) email
,(select top 1 homephone from temptable b where b.firstname = a.firstname and b.lastname = a.lastname and b.homephone is not null order by b.id) homephone
from temptable a
group by firstname, lastname
)
--select * from cte
-- update the kept row and delete every row the cte does not reference
merge temptable as a
using cte as b
on (a.id = b.tid)
when matched
then
update set a.email = b.email, a.homephone = b.homephone
when not matched by source then
delete;
select * from temptable
drop table temptable

Related

Postgres - Where EXISTS else use other where condition

I am trying to write a single query that simply looks for a record based on 2 values. However, if the record doesn't exist, I want to search again where one of the values (last name) is null. I'm trying to figure out whether this is possible outside of PL/pgSQL through some use of the EXISTS or IN keywords.
SELECT t.id
FROM table t
WHERE t.first_name = :firstName AND
EXISTS (SELECT t.id FROM table t WHERE t.first_name = :firstName AND t.last_name = :lastName)
ELSE t.last_name IS NULL;
EDIT:
I have 2 records:
(1, John, null) & (2, John, Frank)
If we search for John Jonas, we expect 1 to be returned. If we search for John Frank, we expect 2 to be returned.

You might use COALESCE:
select t.id
from my_table t
where t.first_name = :firstName
and coalesce(t.last_name, :lastName) = :lastName;
The above query returns all the rows where first_name is equal to :firstName and last_name is null or equal to :lastName. The logic you want (conditional querying) is much more complex:
with condition(do_exist) as (
select exists(
select from my_table
where first_name = :firstName
and last_name = :lastName)
)
select id
from my_table
cross join condition
where first_name = :firstName
and case when do_exist then last_name = :lastName else last_name is null end;
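To see the difference between the two queries on the asker's two records, here is a small sketch that runs both through Python's sqlite3 module. It assumes SQLite as a stand-in for PostgreSQL; the conditional query needs no changes beyond `SELECT 1` inside EXISTS, because SQLite doesn't accept PostgreSQL's empty select list. The `find_ids` helper is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT)")
conn.executemany("INSERT INTO my_table VALUES (?, ?, ?)",
                 [(1, "John", None), (2, "John", "Frank")])

# COALESCE version: matches NULL last names *and* exact last names.
COALESCE_SQL = """
SELECT id FROM my_table
WHERE first_name = :first
  AND COALESCE(last_name, :last) = :last
ORDER BY id
"""

# Conditional version: exact match if one exists, otherwise fall back to NULL.
CONDITIONAL_SQL = """
WITH condition(do_exist) AS (
    SELECT EXISTS (
        SELECT 1 FROM my_table
        WHERE first_name = :first AND last_name = :last)
)
SELECT id
FROM my_table CROSS JOIN condition
WHERE first_name = :first
  AND CASE WHEN do_exist THEN last_name = :last ELSE last_name IS NULL END
ORDER BY id
"""

def find_ids(sql, first, last):
    """Run one of the queries above and return the matching ids."""
    return [row[0] for row in conn.execute(sql, {"first": first, "last": last})]

print(find_ids(COALESCE_SQL, "John", "Frank"))     # [1, 2]
print(find_ids(CONDITIONAL_SQL, "John", "Frank"))  # [2]
print(find_ids(CONDITIONAL_SQL, "John", "Jonas"))  # [1]
```

The COALESCE query returns both rows for John Frank. Only the conditional query gives the "exact match, else NULL" behavior the question asks for.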

Select mismatched column values from Two Tables

I have two tables with the same column names.
For example:
NEEDTOSYNCREQUESTS table
Column Name Value
----------------------------
ID 1
LoadId L1
ShipmentId 123
OrderId NULL
PackageId P456
CustomerOTP 99999
ClientOTP 88888
LASTSYNCEDREQUEST table:
Column Name Value
-------------------------
ID 1
LoadId L1
ShipmentId NULL
OrderId 1234567
PackageId P456
CustomerOTP 44444
ClientOTP 686868
If you compare the two tables' column values, you can see the following:
The CustomerOTP and ClientOTP values are not identical.
ShipmentId in NEEDTOSYNCREQUESTS has a value, while ShipmentId in LASTSYNCEDREQUEST is NULL.
OrderId in LASTSYNCEDREQUEST has a value, while OrderId in NEEDTOSYNCREQUESTS is NULL.
So I need to get the following output. How can I achieve this?
OUTPUT
Column Name Value
---------------------------------
ID 1
LoadId NULL
ShipmentId 123
OrderId NULL
PackageId NULL
CustomerOTP 99999
ClientOTP 88888
The condition is that I need to compare the two tables and return only the values in NEEDTOSYNCREQUESTS that have changed compared with LASTSYNCEDREQUEST. Note: if both columns have the same value, or the NEEDTOSYNCREQUESTS column has no value, that column should be NULL in the output. For example, PackageId is identical in both tables, so I need PackageId to be NULL in the output.
Please help me to achieve this in a SQL query.
Thanks in advance!

The same rules are implemented as three WHEN branches of a CASE expression, repeated for every column:
SELECT A.ID,
CASE WHEN A.LOADID = B.LOADID THEN NULL
WHEN A.LOADID IS NULL THEN NULL
WHEN (B.LOADID IS NULL AND A.LOADID IS NOT NULL) OR (A.LOADID IS NOT NULL AND B.LOADID IS NOT NULL) THEN A.LOADID END AS LOADID,
CASE WHEN A.SHIPMENTID = B.SHIPMENTID THEN NULL
WHEN A.SHIPMENTID IS NULL THEN NULL
WHEN (B.SHIPMENTID IS NULL AND A.SHIPMENTID IS NOT NULL) OR (A.SHIPMENTID IS NOT NULL AND B.SHIPMENTID IS NOT NULL) THEN A.SHIPMENTID END AS SHIPMENTID,
CASE WHEN A.ORDERID = B.ORDERID THEN NULL
WHEN A.ORDERID IS NULL THEN NULL
WHEN (B.ORDERID IS NULL AND A.ORDERID IS NOT NULL) OR (A.ORDERID IS NOT NULL AND B.ORDERID IS NOT NULL) THEN A.ORDERID END AS ORDERID,
CASE WHEN A.PACKAGEID = B.PACKAGEID THEN NULL
WHEN A.PACKAGEID IS NULL THEN NULL
WHEN (B.PACKAGEID IS NULL AND A.PACKAGEID IS NOT NULL) OR (A.PACKAGEID IS NOT NULL AND B.PACKAGEID IS NOT NULL) THEN A.PACKAGEID END AS PACKAGEID,
CASE WHEN A.CUSTOMEROTP = B.CUSTOMEROTP THEN NULL
WHEN A.CUSTOMEROTP IS NULL THEN NULL
WHEN (B.CUSTOMEROTP IS NULL AND A.CUSTOMEROTP IS NOT NULL) OR (A.CUSTOMEROTP IS NOT NULL AND B.CUSTOMEROTP IS NOT NULL) THEN A.CUSTOMEROTP END AS CUSTOMEROTP,
CASE WHEN A.CLIENTOTP = B.CLIENTOTP THEN NULL
WHEN A.CLIENTOTP IS NULL THEN NULL
WHEN (B.CLIENTOTP IS NULL AND A.CLIENTOTP IS NOT NULL) OR (A.CLIENTOTP IS NOT NULL AND B.CLIENTOTP IS NOT NULL) THEN A.CLIENTOTP END AS CLIENTOTP
FROM
NEEDTOSYNCREQUESTS A
INNER JOIN
LASTSYNCEDREQUEST B
ON A.ID = B.ID;

You can try a CASE-based query like the one below:
select
id = N.id,
Loadid = case
when ISNULL(N.Loadid,'')=ISNULL(L.Loadid,'')
then NULL
else N.LoadId
end,
Shipmentid=case
when ISNULL(N.Shipmentid,'')=ISNULL(L.Shipmentid,'')
then NULL
else N.Shipmentid
end,
orderid=case
when ISNULL(N.orderid,'')=ISNULL(L.orderid,'')
then NULL
else N.orderid
end,
packageid=case
when ISNULL(N.packageid,'')=ISNULL(L.packageid,'')
then NULL
else N.packageid
end,
customerOTP=case
when ISNULL(N.customerOTP,'')=ISNULL(L.customerOTP,'')
then NULL
else N.customerOTP
end,
clientOTP=case
when ISNULL(N.clientOTP,'')=ISNULL(L.clientOTP,'')
then NULL
else N.clientOTP
end
from
NEEDTOSYNCREQUESTS N LEFT JOIN
LASTSYNCEDREQUEST L ON
N.id=L.id
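ISNULL(x, '') can treat a NULL and an empty string as equal (and, for int columns, '' is converted to 0, so NULL and 0 match too). A null-safe comparison avoids that. Below is a sketch assuming SQLite via Python's sqlite3, where `IS NOT` is a null-safe inequality; SQL Server 2022 and later offers `IS DISTINCT FROM` for the same purpose. The column types are assumptions based on the sample values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
columns = ("ID INTEGER, LoadId TEXT, ShipmentId INTEGER, OrderId INTEGER, "
           "PackageId TEXT, CustomerOTP INTEGER, ClientOTP INTEGER")
for table in ("NEEDTOSYNCREQUESTS", "LASTSYNCEDREQUEST"):
    conn.execute(f"CREATE TABLE {table} ({columns})")
conn.execute("INSERT INTO NEEDTOSYNCREQUESTS VALUES (1, 'L1', 123, NULL, 'P456', 99999, 88888)")
conn.execute("INSERT INTO LASTSYNCEDREQUEST VALUES (1, 'L1', NULL, 1234567, 'P456', 44444, 686868)")

COMPARED = ["LoadId", "ShipmentId", "OrderId", "PackageId", "CustomerOTP", "ClientOTP"]

# Keep N's value only when it is non-null and differs from L's value.
# "a IS NOT b" is true when exactly one side is NULL, so no ISNULL() is needed.
select_list = ",\n    ".join(
    f"CASE WHEN N.{c} IS NOT NULL AND N.{c} IS NOT L.{c} THEN N.{c} END AS {c}"
    for c in COMPARED
)
sql = f"""
SELECT N.ID,
    {select_list}
FROM NEEDTOSYNCREQUESTS AS N
LEFT JOIN LASTSYNCEDREQUEST AS L ON N.ID = L.ID
"""
result = conn.execute(sql).fetchall()
print(result)  # [(1, None, 123, None, None, 99999, 88888)]
```

Because of the LEFT JOIN, a row with no last-synced counterpart comes back with all of its non-null values.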

Try this:
SELECT n.ID, n.LoadId, n.ShipmentId,
n.OrderId, NULL PackageId, n.CustomerOTP,
n.ClientOTP
FROM NEEDTOSYNCREQUESTS AS n
INNER JOIN LASTSYNCEDREQUEST AS l ON n.ID = l.ID AND n.LoadId = l.LoadId
WHERE n.CustomerOTP <> l.CustomerOTP AND
n.ClientOTP <> l.ClientOTP
AND n.ShipmentId IS NOT NULL
AND l.ShipmentId IS NULL
AND l.OrderId IS NOT NULL
AND n.OrderId IS NULL;
| ID | LoadId | ShipmentId | OrderId | PackageId | CustomerOTP | ClientOTP |
|----|--------|------------|---------|-----------|-------------|-----------|
| 1 | L1 | 123 | (null) | (null) | 99999 | 88888 |
Note that I don't understand why PackageId should be NULL, because according to your criteria it shouldn't be. In any case, I select it as a fixed NULL, so you will always get NULL no matter what the actual value is.

I often have to solve cases like this when building data extraction or integration jobs.
So my answer would be close to this:
you can use a MERGE statement and add CASE expressions if you want to customize the handling of individual columns.
MERGE LASTSYNCEDREQUEST TGT
USING (
SELECT
ID,
LoadId,
ShipmentId,
OrderId,
PackageId,
CustomerOTP,
ClientOTP
FROM NEEDTOSYNCREQUESTS
) AS SRC
ON (
SRC.ID = TGT.ID)
WHEN MATCHED
THEN
UPDATE
SET
TGT.ID = SRC.ID
,TGT.LoadID = NULL
,TGT.ShipmentID = SRC.ShipmentID
,TGT.OrderID = NULL
,TGT.PackageID = NULL
,TGT.CustomerOTP = SRC.CustomerOTP
,TGT.ClientOTP = SRC.ClientOTP
WHEN NOT MATCHED
THEN
INSERT (
ID,
LoadId,
ShipmentId,
OrderId,
PackageId,
CustomerOTP,
ClientOTP
)
VALUES (
SRC.ID,
NULL,
SRC.ShipmentId,
NULL,
NULL,
SRC.CustomerOTP,
SRC.ClientOTP
);
SELECT * FROM LASTSYNCEDREQUEST
You can try the code I wrote above.

How to get one to many relationship data?

These are two tables with a one-to-many relationship:
Employee[Table]:
---------------------------------------------------
EmpId | Name | Country | Salary | Email
--------------------------------------------------
1 John USA 875847 john#test.com
2 Mike USA 785487 mike#test.com
Lincense[Table]
----------------------------------------
EmpId | LicenseType | LincenseNumber
----------------------------------------
1 LincenseType1 12345678
1 LincenseType2 87654321
1 LincenseType3 78945613
2 LincenseType1 12345678
2 LincenseType2 87654321
2 LincenseType3 78945613
EmployeeDetails [Expected ResultSet]
-----------------------------------------------------------------------------------------------
EmpId | Name | Country | LicenseType | LicenseNumber | Salary | Email
-----------------------------------------------------------------------------------------------
1 John USA LincenseType1 12345678 875847 john#test.com
LincenseType2 87654321
LincenseType3 78945613
2 Mike USA LincenseType1 12345678 785487 mike#test.com
LincenseType2 87654321
LincenseType3 78945613
----------------------------------------------------------------------------------------------
What is the best way to get the result in the expected format above, so that each employee's details appear only once, alongside all of the associated license details?

This will do the trick for you. Remember that if empid and salary are int columns, you can only set the repeated values to NULL or 0 (an empty string '' is converted to 0). To show real blanks, the columns need to be of a string type.
SQL Code
declare @emp table (empid int, [name] nvarchar(50), Country nvarchar(50), Salary int, [Email] nvarchar(50)
)
insert into @emp
values
(1, 'John', 'USA', 875847, 'john@test.com'),
(2, 'Mike', 'USA', 785487, 'mike@test.com')
declare @lic table (empid int, licensetype nvarchar(50), licencenumber int)
insert into @lic
values
(1, 'LincenseType1', 12345678),
(1, 'LincenseType2', 87654321),
(1, 'LincenseType3', 78945613),
(2, 'LincenseType1', 12345678),
(2, 'LincenseType2', 87654321),
(2, 'LincenseType3', 78945613)
-- rn numbers each employee's licenses; employee columns are blanked after the first one
select
empid = case when rn > 1 then null else x.empid end,
[name] = case when rn > 1 then '' else [name] end,
Country = case when rn > 1 then '' else country end,
licensetype = licensetype,
licencenumber = licencenumber,
Salary = case when rn > 1 then '' else Salary end,
Email = case when rn > 1 then '' else Email end
from (
select a.empid, [name], country, licensetype, licencenumber, Salary, Email, ROW_NUMBER() over(partition by a.empid order by licensetype) as rn from @emp a left join @lic b on a.empid = b.empid
) x
-- without an ORDER BY the blank rows are not guaranteed to follow their employee
order by x.empid, x.rn
SQL Update
If LincenseType1 is always the first license for each employee, you can simplify it like this, which should be faster because it avoids ROW_NUMBER():
select
empid = case when licensetype != 'LincenseType1' then null else a.empid end,
[name] = case when licensetype != 'LincenseType1' then '' else [name] end,
Country = case when licensetype != 'LincenseType1' then '' else country end,
licensetype = licensetype,
licencenumber = licencenumber,
Salary = case when licensetype != 'LincenseType1' then '' else Salary end,
Email = case when licensetype != 'LincenseType1' then '' else Email end
from @emp a inner join @lic b on a.empid = b.empid
order by a.empid, b.licensetype
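Here is the same ROW_NUMBER() idea as a runnable sketch. It assumes SQLite 3.25 or later (for window functions) through Python's sqlite3, and the table names `emp` and `lic` are stand-ins for the table variables above. It blanks repeated employee columns with NULL instead of '', which sidesteps the int-versus-string issue mentioned at the start of this answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (empid INTEGER, name TEXT, country TEXT, salary INTEGER, email TEXT)")
conn.execute("CREATE TABLE lic (empid INTEGER, licensetype TEXT, licensenumber INTEGER)")
conn.executemany("INSERT INTO emp VALUES (?, ?, ?, ?, ?)", [
    (1, "John", "USA", 875847, "john@test.com"),
    (2, "Mike", "USA", 785487, "mike@test.com"),
])
conn.executemany("INSERT INTO lic VALUES (?, ?, ?)", [
    (1, "LincenseType1", 12345678), (1, "LincenseType2", 87654321),
    (1, "LincenseType3", 78945613), (2, "LincenseType1", 12345678),
    (2, "LincenseType2", 87654321), (2, "LincenseType3", 78945613),
])

# Number each employee's licenses, then keep employee columns only on rn = 1.
# The outer ORDER BY keeps the blanked rows directly under their employee.
rows = conn.execute("""
SELECT CASE WHEN rn = 1 THEN empid END,
       CASE WHEN rn = 1 THEN name END,
       CASE WHEN rn = 1 THEN country END,
       licensetype, licensenumber,
       CASE WHEN rn = 1 THEN salary END,
       CASE WHEN rn = 1 THEN email END
FROM (
    SELECT e.empid, e.name, e.country, l.licensetype, l.licensenumber,
           e.salary, e.email,
           ROW_NUMBER() OVER (PARTITION BY e.empid ORDER BY l.licensetype) AS rn
    FROM emp AS e LEFT JOIN lic AS l ON l.empid = e.empid
) AS x
ORDER BY x.empid, x.rn
""").fetchall()

for row in rows:
    print(row)
```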

How about this?
SELECT Employee.EmpId, Employee.Name, Employee.Country,
Lincense.LicenseType, Lincense.LincenseNumber,
Employee.Salary, Employee.Email
FROM Employee JOIN Lincense
ON (Employee.EmpId = Lincense.EmpId);
If you are looking for blank values as you have shown, the best approach is to do that in your reporting tool rather than in SQL.

This is not exactly what you want, but it's close:
SELECT A.EMPID, A.NAME, A.COUNTRY,
STUFF((SELECT ',' + B.LicenseType FROM Lincense B WHERE A.EMPID = B.EMPID ORDER BY B.LicenseType FOR XML PATH('')), 1, 1, '') AS LicenseType,
STUFF((SELECT ',' + CAST(C.LincenseNumber AS varchar(20)) FROM Lincense C WHERE A.EMPID = C.EMPID ORDER BY C.LicenseType FOR XML PATH('')), 1, 1, '') AS LicenseNumber,
A.Salary, A.Email
FROM Employee A
It will not give you blank rows, but it will put the LicenseType and LicenseNumber values into single comma-delimited columns. The CAST avoids a conversion error if LincenseNumber is numeric, and the matching ORDER BY keeps the two lists aligned.

SELECT Employee.EmpId, Employee.Name, Employee.Country,
Lincense.LicenseType, Lincense.LincenseNumber,
Employee.Salary, Employee.Email
FROM Employee
JOIN Lincense ON Employee.EmpId = Lincense.EmpId;
This gives a similar result, but every row is filled with data.
SELECT Employee.EmpId, Employee.Name, Employee.Country,
Lincense.LicenseType, Lincense.LincenseNumber,
Employee.Salary, Employee.Email
FROM Employee
LEFT OUTER JOIN Lincense ON Employee.EmpId = Lincense.EmpId;
This returns all employees, even those with no license; you can recognize those rows by the NULLs in the license columns.
SELECT Employee.EmpId, Employee.Name, Employee.Country,
Lincense.LicenseType, Lincense.LincenseNumber,
Employee.Salary, Employee.Email
FROM Employee
RIGHT OUTER JOIN Lincense ON Employee.EmpId = Lincense.EmpId;
This returns all licenses, even if the employee is not present; you can recognize those rows by the NULLs in the employee columns.
If you need the blank cells, it's bad practice to produce them in the database. Handle that in backend code once you obtain the result, or only when displaying the result in the front end.
NOTE: it's spelled Licence or License (not Lincense), depending on which English you prefer.
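Blanking in backend code, as recommended above, could look like this sketch. The `joined` rows stand in for the result of the plain JOIN query, assumed to be ordered by EmpId. The function name and column positions are hypothetical.

```python
from itertools import groupby

# Column positions in the joined result:
# EmpId, Name, Country, LicenseType, LicenseNumber, Salary, Email
EMPLOYEE_COLUMNS = {0, 1, 2, 5, 6}

def blank_repeated_employee_fields(rows):
    """Show employee fields only on the first license row of each employee.

    Assumes `rows` is already sorted by EmpId (e.g. ORDER BY EmpId, LicenseType),
    because groupby only merges adjacent rows.
    """
    display = []
    for _, group in groupby(rows, key=lambda row: row[0]):
        for i, row in enumerate(group):
            if i == 0:
                display.append(tuple(row))
            else:
                display.append(tuple("" if c in EMPLOYEE_COLUMNS else value
                                     for c, value in enumerate(row)))
    return display

joined = [
    (1, "John", "USA", "LincenseType1", 12345678, 875847, "john@test.com"),
    (1, "John", "USA", "LincenseType2", 87654321, 875847, "john@test.com"),
    (2, "Mike", "USA", "LincenseType1", 12345678, 785487, "mike@test.com"),
]
for line in blank_repeated_employee_fields(joined):
    print(line)
```

Keeping the query a plain JOIN leaves every row self-describing, so other consumers of the same data aren't affected by display formatting.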

How to combine multiple rows, merging cells with different values across columns and rows using SQL

Below is the form of data I have right now from a SQL query:
ID Name Nationality Institution Degree Result
---------------------------------------------
1 Brian USA a b c
1 Brian USA d e f
1 Brian USA h i j
2 Faye UK y z x
2 Faye UK o p q
Ideally, the data would be combined as below:
ID Name Nationality Background
-------------------------------------------------
1 Brian USA a,b,c; d,e,f; h,i,j
2 Faye UK y,z,x; o,p,q
I'm a SQL beginner and I'd very much appreciate any help with this.
Below is my current SQL query:
select
table1.id,
table1.lastname,
table1.firstname,
table1.group,
table2.institution,
table2.degree,
table2.result
from
table1
inner join
table2 on (table1.id = table2.id)
where
((table1.startyear = '2017')
and (table1.group = 'A'))

You can query as below:
Select Id, [Name], Nationality,
Background = Stuff((Select '; '+Institution+','+Degree+','+Result from #table1 where id = t.Id for xml path('')),1,2,'')
from #table1 t
Group by Id, [Name], Nationality

Using LIST() and the || string-concatenation operator in Firebird (IBExpert) gives us the query below.
It's untested, and LIST() requires Firebird 2.1 or later.
Note that GROUP is a reserved word, so you may have to escape it with double quotes in Firebird.
SELECT table1.id
, table1.lastname
, table1.firstname
, table1."group"
, List(table2.institution ||','||
table2.degree ||','||
table2.result,';') as Background
FROM table1
INNER JOIN table2
ON table1.id = table2.id
WHERE table1.startyear = '2017'
AND table1."group" = 'A'
GROUP BY table1.id
, table1.lastname
, table1.firstname
, table1."group"
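For comparison, SQLite's group_concat(X, separator) plays the same role as Firebird's LIST() here (SQL Server 2017 and later has STRING_AGG). Below is a sketch using Python's sqlite3 with the question's sample rows. The `person_degrees` table name is an assumption, since the real data comes from a join of table1 and table2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person_degrees (id INTEGER, name TEXT, nationality TEXT, "
             "institution TEXT, degree TEXT, result TEXT)")
conn.executemany("INSERT INTO person_degrees VALUES (?, ?, ?, ?, ?, ?)", [
    (1, "Brian", "USA", "a", "b", "c"),
    (1, "Brian", "USA", "d", "e", "f"),
    (1, "Brian", "USA", "h", "i", "j"),
    (2, "Faye", "UK", "y", "z", "x"),
    (2, "Faye", "UK", "o", "p", "q"),
])

# One row per person: each source row becomes "institution,degree,result"
# and the pieces are joined with "; ". The order of the pieces inside
# group_concat() is not guaranteed here.
rows = conn.execute("""
SELECT id, name, nationality,
       group_concat(institution || ',' || degree || ',' || result, '; ') AS background
FROM person_degrees
GROUP BY id, name, nationality
ORDER BY id
""").fetchall()

for row in rows:
    print(row)
```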

SQL query to output all results but only one specific value from a list without duplicating

I have a table of people and a table of client types, linked through a third table called client type details. Client types could be 'Friend', 'Enemy', 'Alien', or 'Monster'. Some people can have more than one of these and some people have none.
What I'm really trying to get is an output something like:
ID | Name | Friend | Enemy | Alien | Monster |
35 | John | Friend | -blank- | -blank- | Monster |
42 | Eric | -blank- | -blank- | -blank- | -blank- |
So John is both a Friend and a Monster, whereas Eric is neither. With the query I have tried (with just a Friend column as a first step), I get a row for everyone, but for those who are Friends I get 2 rows: one saying they are a Friend and one saying NULL.
Does any of this make sense?
Query below:
SELECT DISTINCT
cl.ClientID,
cl.dcFirstName,
(SELECT Dwh.DimClientTypes.dctName
WHERE (Dwh.DimClientTypes.dctGuid IN ('52CD80A6-D4D7-4FD3-8AE8-644A40FEC108'))
) AS Friend
FROM Dwh.DimClientTypeDetails
LEFT OUTER JOIN Dwh.DimClientTypes ON Dwh.DimClientTypeDetails.dctdTypeGuid = Dwh.DimClientTypes.dctGuid
LEFT OUTER JOIN Dwh.DimClients AS cl ON Dwh.DimClientTypeDetails.dctdClientGuid = cl.dcClientGUID
I'm really not sure the best way of approaching it so any help/advice would be very gratefully received.
Thanks
Lee

You are basically looking for a pivot over strings. You could write it using PIVOT or APPLY, but conditional aggregation seems simpler:
select
Id = cl.ClientId
, Name = cl.dcFirstName
, Friend = max(case when ct.dctName = 'Friend' then ct.dctName else null end)
, Enemy = max(case when ct.dctName = 'Enemy' then ct.dctName else null end)
, Alien = max(case when ct.dctName = 'Alien' then ct.dctName else null end)
, Monster = max(case when ct.dctName = 'Monster' then ct.dctName else null end)
from Dwh.DimClients as cl
left join Dwh.DimClientTypeDetails ctd on ctd.dctdClientGuid = cl.dcClientguid
left join Dwh.DimClientTypes ct on ct.dctGuid = ctd.dctdTypeGuid
group by cl.ClientId, cl.dcFirstName
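The conditional-aggregation query above can be tried as follows in SQLite through Python's sqlite3. The simplified table and column names (`clients`, `client_types`, `client_type_details`, the GUID strings) are hypothetical stand-ins for the Dwh tables, because SQLite has no schemas. The key points carry over: start FROM the clients table and LEFT JOIN, so Eric still appears, and GROUP BY the client, so John collapses to one row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE clients (client_id INTEGER, first_name TEXT, client_guid TEXT);
CREATE TABLE client_types (type_guid TEXT, type_name TEXT);
CREATE TABLE client_type_details (client_guid TEXT, type_guid TEXT);
INSERT INTO clients VALUES (35, 'John', 'g-john'), (42, 'Eric', 'g-eric');
INSERT INTO client_types VALUES ('t1', 'Friend'), ('t2', 'Enemy'), ('t3', 'Alien'), ('t4', 'Monster');
INSERT INTO client_type_details VALUES ('g-john', 't1'), ('g-john', 't4');
""")

# MAX() over a CASE keeps the matching type name (or NULL) per column,
# collapsing each client's type rows into a single row.
rows = conn.execute("""
SELECT cl.client_id, cl.first_name,
       MAX(CASE WHEN ct.type_name = 'Friend'  THEN ct.type_name END) AS Friend,
       MAX(CASE WHEN ct.type_name = 'Enemy'   THEN ct.type_name END) AS Enemy,
       MAX(CASE WHEN ct.type_name = 'Alien'   THEN ct.type_name END) AS Alien,
       MAX(CASE WHEN ct.type_name = 'Monster' THEN ct.type_name END) AS Monster
FROM clients AS cl
LEFT JOIN client_type_details AS ctd ON ctd.client_guid = cl.client_guid
LEFT JOIN client_types AS ct ON ct.type_guid = ctd.type_guid
GROUP BY cl.client_id, cl.first_name
ORDER BY cl.client_id
""").fetchall()

for row in rows:
    print(row)
```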
For a pivot version, this is a good example: http://rextester.com/XDACE35377
create table #t (Id int not null, Name varchar(32) not null, ClientType varchar(32) null)
insert into #t values
(35, 'John', 'Friend')
,(35, 'John', 'Monster')
,(42, 'Eric', null);
select
Id
, Name
, pvt.Friend
, pvt.Enemy
, pvt.Alien
, pvt.Monster
from #t
pivot (max(ClientType) for ClientType in ([Friend],[Enemy],[Alien],[Monster])) pvt
This pivot could be done on your schema with something like this:
with c as (
select
Id = cl.ClientId
, Name = cl.dcFirstName
, ClientType = ct.dctName
from Dwh.DimClients as cl
left join Dwh.DimClientTypeDetails ctd on ctd.dctdClientGuid = cl.dcClientguid
left join Dwh.DimClientTypes ct on ct.dctGuid = ctd.dctdTypeGuid
)
select
Id
, Name
, pvt.Friend
, pvt.Enemy
, pvt.Alien
, pvt.Monster
from c
pivot (max(ClientType) for ClientType in ([Friend],[Enemy],[Alien],[Monster])) pvt

We Keep Coding
