SQL Query Find Exact and Near Dupes

I have a SQL table with FirstName, LastName, Add1 and other fields. I am working to get this data cleaned up. There are a few instances of likely dupes -
All 3 columns are the exact same for more than 1 record
The First and Last are the same, only 1 has an address, the other is blank
The First and Last are similar (John | Doe vs John C. | Doe) and the address is the same or one is blank
I'm wanting to generate a query I can provide to the users, so they can check these records out, compare their related records and then delete the one they don't need.
I've been looking at similarity functions, soundex, and such, but it all seems so complicated. Is there an easy way to do this?
Thanks!
Edit:
So here is some sample data:
FirstName | LastName | Add1
John | Doe | 1 Main St
John | Doe |
John A. | Doe |
Jane | Doe | 2 Union Ave
Jane B. | Doe | 2 Union Ave
Alex | Smith | 3 Broad St
Chris | Anderson | 4 South Blvd
Chris | Anderson | 4 South Blvd
I really like Critical Error's query for identifying all different types of dupes. That would give me the above sample data, with the Alex Smith result not included, because there are no dupes for that.
What I want to do is take that result set and identify which are dupes for Jane Doe. She should only have 2 dupes. John Doe has 3, and Chris Anderson has 2. Can I get at that sub-result set?
Edit:
I figured it out! I will be marking Critical Error's answer as the solution, since it totally got me where I needed to go. Here is the solution, in case it might help others. Basically, this is what we are doing.
Selecting the records from the table where there are dupes
Adding a WHERE EXISTS sub-query to look in the same table for exact dupes, where the ID from the main query and sub-query do not match
Adding a WHERE EXISTS sub-query to look in the same table for similar dupes, using a Difference factor between duplicative columns, where the ID from the main query and sub-query do not match
Adding a WHERE EXISTS sub-query to look in the same table for dupes on 2 fields where a 3rd may be null for one of the records, where the ID from the main query and sub-query do not match
Each subquery is connected with an OR, so that any kind of duplicate is found
At the end of each sub-query add a nested requirement that either the main query or sub-query be the ID of the record you are looking to identify duplicates for.
DECLARE @CID AS INT
SET ANSI_NULLS ON
SET NOCOUNT ON;
SET @CID = 12345
BEGIN
    SELECT
        *
    FROM @Customers c
    WHERE
        -- Exact duplicates.
        EXISTS (
            SELECT * FROM @Customers x WHERE
                x.FirstName = c.FirstName
                AND x.LastName = c.LastName
                AND x.Add1 = c.Add1
                AND x.Id <> c.Id
                AND (x.ID = @CID OR c.ID = @CID)
        )
        -- Match First/Last name are same/similar and the address is same.
        OR EXISTS (
            SELECT * FROM @Customers x WHERE
                DIFFERENCE( x.FirstName, c.FirstName ) = 4
                AND DIFFERENCE( x.LastName, c.LastName ) = 4
                AND x.Add1 = c.Add1
                AND x.Id <> c.Id
                AND (x.ID = @CID OR c.ID = @CID)
        )
        -- Match First/Last name and one address exists.
        OR EXISTS (
            SELECT * FROM @Customers x WHERE
                x.FirstName = c.FirstName
                AND x.LastName = c.LastName
                AND x.Id <> c.Id
                AND (
                    (x.Add1 IS NULL AND c.Add1 IS NOT NULL)
                    OR
                    (x.Add1 IS NOT NULL AND c.Add1 IS NULL)
                )
                AND (x.ID = @CID OR c.ID = @CID)
        );
END

Assuming you have a unique id between records, you can give this a try:
DECLARE @Customers table ( FirstName varchar(50), LastName varchar(50), Add1 varchar(50), Id int IDENTITY(1,1) );
INSERT INTO @Customers ( FirstName, LastName, Add1 ) VALUES
    ( 'John', 'Doe', '123 Anywhere Ln' ),
    ( 'John', 'Doe', '123 Anywhere Ln' ),
    ( 'John', 'Doe', NULL ),
    ( 'John C.', 'Doe', '123 Anywhere Ln' ),
    ( 'John C.', 'Doe', '15673 SW Liar Dr' );
SELECT
    *
FROM @Customers c
WHERE
    -- Exact duplicates.
    EXISTS (
        SELECT * FROM @Customers x WHERE
            x.FirstName = c.FirstName
            AND x.LastName = c.LastName
            AND x.Add1 = c.Add1
            AND x.Id <> c.Id
    )
    -- Match First/Last name are same/similar and the address is same.
    OR EXISTS (
        SELECT * FROM @Customers x WHERE
            DIFFERENCE( x.FirstName, c.FirstName ) = 4
            AND DIFFERENCE( x.LastName, c.LastName ) = 4
            AND x.Add1 = c.Add1
            AND x.Id <> c.Id
    )
    -- Match First/Last name and one address exists.
    OR EXISTS (
        SELECT * FROM @Customers x WHERE
            x.FirstName = c.FirstName
            AND x.LastName = c.LastName
            AND x.Id <> c.Id
            AND (
                (x.Add1 IS NULL AND c.Add1 IS NOT NULL)
                OR
                (x.Add1 IS NOT NULL AND c.Add1 IS NULL)
            )
    );
Returns
+-----------+----------+-----------------+----+
| FirstName | LastName | Add1 | Id |
+-----------+----------+-----------------+----+
| John | Doe | 123 Anywhere Ln | 1 |
| John | Doe | 123 Anywhere Ln | 2 |
| John | Doe | NULL | 3 |
| John C. | Doe | 123 Anywhere Ln | 4 |
+-----------+----------+-----------------+----+
Initial resultset:
+-----------+----------+------------------+----+
| FirstName | LastName | Add1 | Id |
+-----------+----------+------------------+----+
| John | Doe | 123 Anywhere Ln | 1 |
| John | Doe | 123 Anywhere Ln | 2 |
| John | Doe | NULL | 3 |
| John C. | Doe | 123 Anywhere Ln | 4 |
| John C. | Doe | 15673 SW Liar Dr | 5 |
+-----------+----------+------------------+----+
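If it helps to see why DIFFERENCE treats 'John' and 'John C.' as a match, here is a minimal sketch that prints the SOUNDEX codes and the DIFFERENCE score side by side. The name pairs are just illustrative values in the spirit of the sample data, not rows from a real table. DIFFERENCE returns an integer from 0 (weak or no similarity) to 4 (strongly similar or identical SOUNDEX values).

```sql
-- Inspect the phonetic codes behind the DIFFERENCE() comparisons.
-- The pairs below are illustrative test values only.
SELECT v.a,
       v.b,
       SOUNDEX(v.a)         AS soundex_a,
       SOUNDEX(v.b)         AS soundex_b,
       DIFFERENCE(v.a, v.b) AS similarity   -- 0 = weak/no match, 4 = strongest match
FROM (VALUES ('John', 'John C.'),
             ('Jane', 'Jane B.'),
             ('John', 'Jane'),
             ('Doe',  'Smith')) AS v(a, b);
```

Running something like this against your own names first is worthwhile, because Soundex ignores vowels: in standard Soundex both John and Jane encode to J500, so a DIFFERENCE = 4 test on its own can flag different people as similar. Requiring the address to match as well, as the query above does, is what keeps those false positives down.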

Related

How to select DISTINCT records based on multiple columns and without considering their order

I'm currently working on a SQL Server database and I would need a query that returns pairs of customers with the same city from a table that has this structure
Customer(ID, Name, Surname, City)
and this sample data
Name | Surname | City
-----------+-----------+-----------
Foo | Foo | New York
-----------+-----------+-----------
Bar | Bar | New York
-----------+-----------+-----------
Alice | A | London
-----------+-----------+-----------
Bob | B | London
I have tried defining a query that joins the Customer table itself
SELECT C1.Name + ' ' + C1.Surname CustomerA, C2.Name + ' ' + C2.Surname CustomerB, C1.City
FROM Customer C1 JOIN Customer C2
ON C1.City = C2.City
WHERE CustomerA <> CustomerB
but it gives me a table that looks like this
CustomerA | CustomerB | City
-----------+-----------+-----------
Foo Foo | Bar Bar | New York
-----------+-----------+-----------
Bar Bar | Foo Foo | New York
-----------+-----------+-----------
Alice A | Bob B | London
-----------+-----------+-----------
Bob B | Alice A | London
with duplicated rows but with swapped customers.
My question is: how can I select each of those pairs only once (e.g. for the first two results, returning only the first or the second row would be fine)?
This would be an example of the expected result
CustomerA | CustomerB | City
-----------+-----------+-----------
Foo Foo | Bar Bar | New York
-----------+-----------+-----------
Alice A | Bob B | London

I think I understand what you are looking for, but it seems oversimplified compared to your actual problem. The query you posted was incredibly close to working. You can't reference columns by their alias in the WHERE predicates, so you will need to repeat the string concatenation from your column list. Then you can simply change the <> to either > or < so you only get one match per pair. This example should work for your problem as I understand it.
declare @Customer table
(
    CustID int identity
    , Name varchar(10)
    , Surname varchar(10)
    , City varchar(10)
)
insert @Customer
select 'Foo', 'Foo', 'New York' union all
select 'Bar', 'Bar', 'New York' union all
select 'Smith', 'Smith', 'New York' union all
select 'Alice', 'A', 'London' union all
select 'Bob', 'B', 'London'
SELECT CustomerA = C1.Name + ' ' + C1.Surname
    , CustomerB = C2.Name + ' ' + C2.Surname
    , C1.City
FROM @Customer C1
JOIN @Customer C2 ON C1.City = C2.City
where C1.Name + ' ' + C1.Surname > C2.Name + ' ' + C2.Surname
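A variation on the same idea, as a sketch only: if the table has a unique key, the pair can be ordered on that key instead of on the concatenated name. This assumes the ID column from the question's Customer(ID, Name, Surname, City) is unique (it is called CustID here, as in the example above). Comparing keys also keeps a pair of customers who happen to share the same full name, which a name comparison would drop.

```sql
-- Sketch: emit each same-city pair once by ordering on a unique key.
-- CustID is assumed to be unique per customer.
DECLARE @Customer TABLE
(
    CustID  int IDENTITY(1,1),
    Name    varchar(10),
    Surname varchar(10),
    City    varchar(10)
);

INSERT INTO @Customer (Name, Surname, City) VALUES
    ('Foo',   'Foo', 'New York'),
    ('Bar',   'Bar', 'New York'),
    ('Alice', 'A',   'London'),
    ('Bob',   'B',   'London');

SELECT C1.Name + ' ' + C1.Surname AS CustomerA,
       C2.Name + ' ' + C2.Surname AS CustomerB,
       C1.City
FROM @Customer AS C1
JOIN @Customer AS C2
    ON  C1.City   = C2.City
    AND C1.CustID < C2.CustID      -- strict inequality: no self-pairs, no swapped repeats
ORDER BY C1.City, C1.CustID, C2.CustID;
```

With the four sample rows this yields one row per city pair, with the lower CustID always in the CustomerA column.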

You can use concat and group by clause for this query
select concat(C1.Name," ",C1.surname) as CustomerA, concat(C2.Name," ",C2.surname) CustomerB,C1.city
from customer C1
left join customer C2
on C1.city=C2.city and C1.name<>C2.name
group by C1.city;

Multi-Pass Duplication Identification with Exclusions

I have a customer table with several hundred thousand records. There are a LOT of duplicates of varying degrees. I am trying to identify duplicate records with level of possibility of being a duplicate.
My source table has 7 fields and looks like this:
I look for duplicates, and put them into an intermediate table with the level of possibility, table name, and the customer number.
Intermediate Table
CREATE TABLE DataCheck (
id int identity(1,1),
reason varchar(100) DEFAULT NULL,
tableName varchar(100) DEFAULT NULL,
tableID varchar(100) DEFAULT NULL
)
Here is my code to identify and insert:
-- Match on Company, Contact, Address, City, and Phone
-- DUPE
INSERT INTO DataCheck
SELECT 'Duplicate','CUSTOMER',tcd.uid
FROM #tmpCoreData tcd
INNER JOIN
(SELECT
company,
fname,
lname,
add1,
city,
phone1,
COUNT(*) AS count
FROM #tmpCoreData
WHERE company <> ''
GROUP BY company, fname, lname, add1, city, phone1
HAVING COUNT(*) > 1) dl
ON dl.company = tcd.company
ORDER BY tcd.company
In this example, it would insert ids 101, 102
The problem is when I perform the next pass:
-- Match on Company, Address, City, Phone (Diff Contacts)
-- LIKELY DUPE
INSERT INTO DataCheck
SELECT 'Likely Duplicate','CUSTOMER',tcd.uid
FROM #tmpCoreData tcd
INNER JOIN
(SELECT
company,
add1,
city,
phone1,
COUNT(*) AS count
FROM #tmpCoreData
WHERE company <> ''
GROUP BY company, add1, city, phone1
HAVING COUNT(*) > 1) dl
ON dl.company = tcd.company
ORDER BY tcd.company
This pass would then insert, 101, 102 & 103.
The next pass drops the phone so it would insert 101, 102, 103, 104
The next pass would look for company only which would insert all 5.
I now have 14 entries into my intermediate table for 5 records.
How can I add an exclusion so the 2nd pass groups on the same Company, Address, City, Phone but DIFFERENT fname and lname. Then it should only insert 101 and 103
I considered adding a NOT IN (SELECT tableID FROM DataCheck) to ensure IDs aren't added multiple times, but the 3rd or 4th pass may find a duplicate and enter it 700 records after the row it's a duplicate of, so you lose the context of what it's a dupe of.
My output uses:
SELECT
dc.reason,
dc.tableName,
tcd.*
FROM DataCheck dc
INNER JOIN #tmpCoreData tcd
ON tcd.uid = dc.tableID
ORDER BY dc.id
And looks something like this, which is a bit confusing:

I'm going to challenge your perception of your issue, and instead propose that you calculate a simple "confidence score", which will also help you vastly simplify your results table:
WITH FirstCompany AS (SELECT custNo, company, fname, lname, add1, city, phone1
FROM(SELECT custNo, company, fname, lname, add1, city, phone1,
ROW_NUMBER() OVER(PARTITION BY company ORDER BY custNo) AS ordering
FROM CoreData) FC
WHERE ordering = 1)
SELECT RankMapping.description, Duplicate.custNo, Duplicate.company, Duplicate.fname, Duplicate.lname, Duplicate.add1, Duplicate.city, Duplicate.phone1
FROM (SELECT FirstCompany.custNo AS originalCustNo, Duplicate.*,
CASE WHEN FirstCompany.custNo = Duplicate.custNo THEN 1 ELSE 0 END
+ CASE WHEN FirstCompany.fname = Duplicate.fname AND FirstCompany.lname = Duplicate.lname THEN 1 ELSE 0 END
+ CASE WHEN FirstCompany.add1 = Duplicate.add1 AND FirstCompany.city = Duplicate.city THEN 1 ELSE 0 END
+ CASE WHEN FirstCompany.phone1 = Duplicate.phone1 THEN 1 ELSE 0 END
AS ranking
FROM FirstCompany
JOIN CoreData Duplicate
ON Duplicate.custNo >= FirstCompany.custNo
AND Duplicate.company = FirstCompany.company) Duplicate
JOIN (VALUES (4, 'original'),
(3, 'duplicate'),
(2, 'likely dupe'),
(1, 'possible dupe'),
(0, 'not likely dupe')) RankMapping(score, description)
ON RankMapping.score = Duplicate.ranking
ORDER BY Duplicate.originalCustNo, Duplicate.ranking DESC
... which generates results that look like this:
| description | custNo | company | fname | lname | add1 | city | phone1 |
|-----------------|--------|----------|---------|--------|--------------|--------------|------------|
| original | 101 | ACME INC | JOHN | DOE | 123 ACME ST | LOONEY HILLS | 1231234567 |
| duplicate | 102 | ACME INC | JOHN | DOE | 123 ACME ST | LOONEY HILLS | 1231234567 |
| likely dupe | 103 | ACME INC | JANE | SMITH | 123 ACME ST | LOONEY HILLS | 1231234567 |
| possible dupe | 104 | ACME INC | BOB | DOLE | 123 ACME ST | LOONEY HILLS | 4564567890 |
| not likely dupe | 105 | ACME INC | JESSICA | RABBIT | 456 ROGER LN | WARNER | 4564567890 |
This code baselessly assumes that the smallest custNo is the "original", and assumes matches will be equivalent to solely that one, but it's completely possible to get other matches as well (just unnest the subquery in the CTE, and remove the row number).
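One way to read "get other matches as well" is to score every pair of rows within a company instead of comparing everything to a single "original". The following is only a sketch of that idea, not the answer's own query: the @CoreData table variable and its column types are assumptions, and the rows are copied from the result table above.

```sql
-- Sketch: score every pair of customers within the same company.
-- @CoreData and its column sizes are assumed for illustration.
DECLARE @CoreData TABLE
(
    custNo  int PRIMARY KEY,
    company varchar(50),
    fname   varchar(50),
    lname   varchar(50),
    add1    varchar(50),
    city    varchar(50),
    phone1  varchar(20)
);

INSERT INTO @CoreData (custNo, company, fname, lname, add1, city, phone1) VALUES
    (101, 'ACME INC', 'JOHN',    'DOE',    '123 ACME ST',  'LOONEY HILLS', '1231234567'),
    (102, 'ACME INC', 'JOHN',    'DOE',    '123 ACME ST',  'LOONEY HILLS', '1231234567'),
    (103, 'ACME INC', 'JANE',    'SMITH',  '123 ACME ST',  'LOONEY HILLS', '1231234567'),
    (104, 'ACME INC', 'BOB',     'DOLE',   '123 ACME ST',  'LOONEY HILLS', '4564567890'),
    (105, 'ACME INC', 'JESSICA', 'RABBIT', '456 ROGER LN', 'WARNER',       '4564567890');

SELECT a.custNo AS custNoA,
       b.custNo AS custNoB,
       -- one point each for matching contact, address, and phone (0 to 3)
         CASE WHEN a.fname = b.fname AND a.lname = b.lname THEN 1 ELSE 0 END
       + CASE WHEN a.add1  = b.add1  AND a.city  = b.city  THEN 1 ELSE 0 END
       + CASE WHEN a.phone1 = b.phone1                     THEN 1 ELSE 0 END AS score
FROM @CoreData AS a
JOIN @CoreData AS b
    ON  b.company = a.company
    AND b.custNo  > a.custNo        -- each unordered pair appears once
ORDER BY score DESC, a.custNo, b.custNo;
```

Be aware that a pairwise comparison grows quadratically with the number of rows per company, so on several hundred thousand records it is probably worth restricting it to companies that already have more than one row.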

Identifying duplicate records in SQL along with primary key

I have a business case scenario where I need to do a lookup into our SQL "Users" table to find out email addresses which are duplicated. I was able to do that by the below query:
SELECT
user_email, COUNT(*) as DuplicateEmails
FROM
Users
GROUP BY
user_email
HAVING
COUNT(*) > 1
ORDER BY
DuplicateEmails DESC
I get an output like this:
user_email       DuplicateEmails
--------------------------------
abc@gmail.com    2
xyz@yahoo.com    3
Now I am asked to list out all the duplicate records in a single row of its own and display some additional properties like first name , last name and userID. All this information is stored in this table "Users". I am having difficulty doing so. Can anyone help me or put me toward right direction?
My output needs to look like this:
user_email       DuplicateEmails   FirstName   LastName   UserID
------------------------------------------------------------------------------
abc@gmail.com    2                 Tim         Lentil     timLentil
abc@gmail.com    2                 John        Doe        johnDoe12
xyz@yahoo.com    3                 brian       boss       brianTheBoss
xyz@yahoo.com    3                 Thomas      Hood       tHood
xyz@yahoo.com    3                 Mark        Brown      MBrown12

There are several ways you could do this. Here is one using a cte.
with FoundDuplicates as
(
    SELECT
        user_email, COUNT(*) as DuplicateEmails
    FROM
        Users
    GROUP BY
        user_email
    HAVING
        COUNT(*) > 1
)
select fd.user_email
    , fd.DuplicateEmails
    , u.FirstName
    , u.LastName
    , u.UserID
from Users u
join FoundDuplicates fd on fd.user_email = u.user_email
ORDER BY fd.DuplicateEmails DESC

Use count() over( Partition by ), example

You can solve it like:
DECLARE @T TABLE
(
    UserID VARCHAR(20),
    FirstName NVARCHAR(45),
    LastName NVARCHAR(45),
    UserMail VARCHAR(45)
);
INSERT INTO @T (UserMail, FirstName, LastName, UserID) VALUES
    ('abc@gmail.com', 'Tim', 'Lentil', 'timLentil'),
    ('abc@gmail.com', 'John', 'Doe', 'johnDoe12'),
    ('xyz@yahoo.com', 'brian', 'boss', 'brianTheBoss'),
    ('xyz@yahoo.com', 'Thomas', 'Hood', 'tHood'),
    ('xyz@yahoo.com', 'Mark', 'Brown', 'MBrown12');
SELECT *, COUNT (1) OVER (PARTITION BY UserMail) MailCount
FROM @T;
Results:
+--------------+-----------+----------+---------------+-----------+
| UserID       | FirstName | LastName | UserMail      | MailCount |
+--------------+-----------+----------+---------------+-----------+
| timLentil    | Tim       | Lentil   | abc@gmail.com | 2         |
| johnDoe12    | John      | Doe      | abc@gmail.com | 2         |
| brianTheBoss | brian     | boss     | xyz@yahoo.com | 3         |
| tHood        | Thomas    | Hood     | xyz@yahoo.com | 3         |
| MBrown12     | Mark      | Brown    | xyz@yahoo.com | 3         |
+--------------+-----------+----------+---------------+-----------+

Use a window function like this:
SELECT u.*
FROM (SELECT u.*, COUNT(*) OVER (PARTITION BY user_email) as numDuplicateEmails
      FROM Users u
     ) u
WHERE numDuplicateEmails > 1
ORDER BY numDuplicateEmails DESC;

I think this will also work.
WITH cte AS (
    SELECT
        *
        ,DuplicateEmails = ROW_NUMBER() OVER (PARTITION BY user_email ORDER BY user_email)
    FROM Users
)
SELECT * FROM cte
WHERE DuplicateEmails > 1 -- returns only the second and later rows for each email, not the first

SQL Server 2012 - A Little Guidance

I have searched the net but I am certain I must not be phrasing my keywords correctly because I am not finding possible solutions for my problem. I think it might be recursion, but I'm not quite certain.
I have a table that has the following categories:
ID, Author, Customer, Group
A sample dataset would be like:
ID | Author | Customer | Group
------------------------------------------
1 | Paula Hawkins | John Doe | NULL
2 | Harlan Coben | John Doe | NULL
3 | James Patterson| John Doe | NULL
4 | Paula Hawkins | Jane Doe | NULL
5 | James Patterson| Jane Doe | NULL
6 | James Patterson| Steven Doe| NULL
7 | Harlan Coben | Steven Doe| NULL
8 | Paula Hawkins | Harry Doe | NULL
9 | James Patterson| Harry Doe | NULL
It's possible a customer may have one or more than one author checked out, so what I am trying to do is group them with a unique id based on the total set of authors checked out (regardless of the customer name):
ID | Author | Customer | Group
--------------------------------------------
1 | Paula Hawkins | John Doe | 1
2 | Harlan Coben | John Doe | 1
3 | James Patterson| John Doe | 1
4 | Paula Hawkins | Jane Doe | 2
5 | James Patterson| Jane Doe | 2
6 | James Patterson| Steven Doe | 3
7 | Harlan Coben | Steven Doe | 3
8 | Paula Hawkins | Harry Doe | 2
9 | James Patterson| Harry Doe | 2
It's very possible the same customer could be found hundreds of times for multiple books, so the final group category would represent the unique value for that customer (other customers would have the same value only if everything they have checked out also matches everything the other customer has checked out).
Using the above data, Harry and Jane have the exact same authors checked out so they are in the same group but John and Steven have different combinations so they have their own unique group.
Hopefully this makes sense. Is this what is called recursion? If so then I will look towards a cte solution that uses some sort of ranking for the unique id value. Thanks for any help you give.

Not sure how to get your exact group order, but to just group customers together you can combine their authors with FOR XML and group the customers based on exact matches.
WITH cte AS (
SELECT
*,
RANK() OVER (ORDER BY Authors) [Group]
FROM (
SELECT
[Customer],
STUFF((SELECT ',' + [Author]
FROM myTable WHERE Customer = mt.Customer
ORDER BY Author
FOR XML PATH('')), 1, 1, '') AS Authors
FROM
myTable mt
GROUP BY [Customer] ) t
)
SELECT
mt.[ID],
mt.[Author],
mt.[Customer],
cte.[Group]
FROM
cte
JOIN myTable mt ON mt.Customer = cte.Customer
ORDER BY mt.[ID]
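For anyone on SQL Server 2017 or later (an assumption; the question is about 2012, where this is not available), the same "build a sorted author list per customer, then rank the lists" idea can be sketched with STRING_AGG instead of FOR XML PATH. The @myTable variable and its column sizes are made up for the example, and the rows come from the question.

```sql
-- Sketch: group customers by their exact set of authors using STRING_AGG.
-- Requires SQL Server 2017+; @myTable is an illustrative stand-in.
DECLARE @myTable TABLE (ID int, Author varchar(50), Customer varchar(50));

INSERT INTO @myTable (ID, Author, Customer) VALUES
    (1, 'Paula Hawkins',   'John Doe'),
    (2, 'Harlan Coben',    'John Doe'),
    (3, 'James Patterson', 'John Doe'),
    (4, 'Paula Hawkins',   'Jane Doe'),
    (5, 'James Patterson', 'Jane Doe'),
    (6, 'James Patterson', 'Steven Doe'),
    (7, 'Harlan Coben',    'Steven Doe'),
    (8, 'Paula Hawkins',   'Harry Doe'),
    (9, 'James Patterson', 'Harry Doe');

WITH AuthorLists AS (
    -- One row per customer with a sorted, de-duplicated author list.
    SELECT d.Customer,
           STRING_AGG(d.Author, ',') WITHIN GROUP (ORDER BY d.Author) AS Authors
    FROM (SELECT DISTINCT Customer, Author FROM @myTable) AS d
    GROUP BY d.Customer
),
Grouped AS (
    -- Customers with identical lists share a group id.
    SELECT Customer,
           DENSE_RANK() OVER (ORDER BY Authors) AS GroupId
    FROM AuthorLists
)
SELECT mt.ID, mt.Author, mt.Customer, g.GroupId
FROM @myTable AS mt
JOIN Grouped AS g ON g.Customer = mt.Customer
ORDER BY mt.ID;
```

DENSE_RANK is used here rather than RANK so the group ids stay consecutive after ties. The ids follow the alphabetical order of the author lists, so the numbering will not match the order shown in the question, only the grouping itself. The DISTINCT is there because the question says the same customer may appear many times.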

Try using cursors... Cursors are slow, but they're also easier to understand..
Here's a sample implementation...
DECLARE @GroupExists BIT
DECLARE @CurrGroup INT
DECLARE @NextGroup INT
DECLARE @Customer VARCHAR(250)
SET @NextGroup = 1
DECLARE customer_cursor CURSOR FAST_FORWARD
FOR SELECT DISTINCT Customer FROM dbo.TableName
OPEN customer_cursor
FETCH NEXT FROM customer_cursor
INTO @Customer
WHILE @@FETCH_STATUS = 0
BEGIN
    SET @GroupExists = 0
    -- Test condition to check whether this group of authors is already in use
    -- (it should set @GroupExists and @CurrGroup; left unimplemented here)
    IF @GroupExists = 1
    BEGIN
        UPDATE dbo.TableName
        SET [Group] = @CurrGroup
        WHERE Customer = @Customer
    END
    ELSE
    BEGIN
        UPDATE dbo.TableName
        SET [Group] = @NextGroup
        WHERE Customer = @Customer
        SET @NextGroup = @NextGroup + 1
    END
    FETCH NEXT FROM customer_cursor
    INTO @Customer
END
CLOSE customer_cursor
DEALLOCATE customer_cursor

You should be able to generate groups using standard SQL. The following query should do the job; I make no promises of its performance though.
WITH
CTE_CheckOutBookCount AS
(
SELECT [ID]
,[Author]
,[Customer]
,COUNT([Author]) OVER (PARTITION BY [Customer]) AS [CheckOutBooks] -- Count the number of books checked out by each customer. This will be used for our initial compare between customers.
FROM CheckedOutBooks
),
CTE_AuthorAndCountCompare AS
(
SELECT CB.[ID]
,CBC.[Customer] AS MatchedCustomers
FROM CTE_CheckOutBookCount CB
INNER JOIN CTE_CheckOutBookCount CBC ON CB.[Author] = CBC.[Author] AND CB.[CheckOutBooks] = CBC.[CheckOutBooks] --Join customer information on number of books checked out and author name of books checked out.
)
,CTE_MatchedCustomers
AS
(
SELECT
[ID]
,[Author]
,[Customer]
--Get the minimum record id of customers which match exactly on count and authors checked out. This will be used to help generate group ID.
,(
SELECT MIN(ID)
FROM CTE_AuthorAndCountCompare
WHERE CheckedOutBooks.[Customer] = CTE_AuthorAndCountCompare.MatchedCustomers
) MinID
FROM CheckedOutBooks
)
SELECT
[ID]
,[Author]
,[Customer]
,DENSE_RANK() OVER (ORDER BY MinID) AS [Group] -- Generate new group id
FROM CTE_MatchedCustomers
ORDER BY ID

SQL records with common ID - update all with single user defined function call

ClientInfo Table
------------------------------------------------------------
||ClientInfoID | ClientID | FName | MName | LName ||
||1 | 1 | Don | A | Smith||
||2 | 1 | Dan | A | Smith||
||3 | 1 | Dan | G | Smith||
||4 | 2 | John | D | Doe ||
------------------------------------------------------------
Trying to get a SQL call right in SQL Server. I've written a user-defined function that generates random first/middle/last names, which is working fine. My challenge is that I want ALL records with the same ClientID to get updated with the result of a single call to my rename function (actually 3 calls: 1 each for first, middle, and last name).
My attempt below is changing EVERY record in ClientInfo to DIFFERENT names instead of giving all ClientID = 1 records the SAME f/m/last names, ClientID = 2 the SAME f/m/last names, etc.
DESIRED RESULT:
------------------------------------------------------------
||ClientInfoID | ClientID | FName | MName | LName ||
||1 | 1 | Bill | X | Brown ||
||2 | 1 | Bill | X | Brown ||
||3 | 1 | Bill | X | Brown ||
||4 | 2 | Kate | Q | Ramirez ||
------------------------------------------------------------
ACTUAL RESULT:
----------------------------------------------------------------
|| ClientInfoID |ClientID | FName | MName | LName ||
|| 1 | 1 | Bill | X | Brown ||
|| 2 | 1 | Sue | R | Henderson||
|| 3 | 1 | Phil | S | Anders ||
|| 4 | 2 | Kate | Q | Ramirez ||
----------------------------------------------------------------
My SQL call
UPDATE ClientInfo
SET FirstName = X.NewFirstName
,MiddleName = X.NewMiddleName
,LastName = X.NewLastName
FROM (
SELECT UniqueClientID
,LastClientInfoID
,NewFirstName
,NewMiddleName
,NewLastName
FROM (
(
SELECT ClientID AS UniqueClientID
,MAX(ClientInfoID) AS LastClientInfoID
FROM ClientInfo
GROUP BY ClientID
) A INNER JOIN (
SELECT ClientInfoID
,NewFirstName = dbo.fnSampleFnameMnameLname(0, @MaxName, '')
,NewMiddleName = dbo.fnSampleFnameMnameLname(1, @MaxName, MiddleName)
,NewLastName = dbo.fnSampleFnameMnameLname(2, @MaxName, '')
FROM ClientInfo
) B ON A.LastClientInfoID = B.ClientInfoID
)
) X
WHERE ClientID = X.UniqueClientID

Solved it. Moved the creation of names attached to each clientid into a temp table first. Then just joined on clientinfo on that temp table to pull in the new sample names.
SELECT ClientID, NewFirstName, NewMiddleName, NewLastName
INTO #TempSampleNames
FROM (
(
SELECT ClientID,
MAX(ClientInfoID) MaxClientInfoID
FROM ClientInfo
GROUP BY ClientID
) A
INNER JOIN (
SELECT ClientInfoID
,NewFirstName = dbo.fnSampleFnameMnameLname(0, @MaxName, '')
,NewMiddleName = dbo.fnSampleFnameMnameLname(1, @MaxName, MiddleName)
,NewLastName = dbo.fnSampleFnameMnameLname(2, @MaxName, '')
FROM ClientInfo
) B ON A.MaxClientInfoID = B.ClientInfoID
)
UPDATE ClientInfo
SET
FirstName = B.NewFirstName
,MiddleName = B.NewMiddleName
,LastName = B.NewLastName
FROM ClientInfo A
INNER JOIN #TempSampleNames B ON A.ClientID = B.ClientID
DROP TABLE #TempSampleNames

Assuming that you have a Client table, and ClientID is a primary or unique key in that table, and ClientInfo has a one-to-many relationship with it on ClientID, then you could simply do this:
UPDATE A
SET
FirstName = B.NewFirstName
,MiddleName = B.NewMiddleName
,LastName = B.NewLastName
FROM ClientInfo A
INNER JOIN (
SELECT ClientID
,NewFirstName = dbo.fnSampleFnameMnameLname(0, @MaxName, '')
,NewMiddleName = dbo.fnSampleFnameMnameLname(1, @MaxName, MiddleName)
,NewLastName = dbo.fnSampleFnameMnameLname(2, @MaxName, '')
FROM Client
) B ON A.ClientID = B.ClientID

I would change the function from scalar to table valued
That's just a proof of concept. I pass the existing FirstName into the function to prove that the result is always the same for one client...
ATTENTION: I used your data as given in your question. One 'Don' should rather be a 'Dan' I assume...
EDIT: expand example...
CREATE FUNCTION dbo.fnSampleFNameMnameLname
(
    @Inp VARCHAR(30) --don't quite understand the @MaxName...
)
RETURNS TABLE
AS
--Create your names in one go (this is inline syntax, maybe you'd do it with multi statement syntax...)
RETURN SELECT @Inp+'FName1' AS FName, @Inp+'MName1' AS MName, @Inp+'LName1' AS LName;
GO
DECLARE @ClientInfo TABLE(ClientInfoID INT, ClientID INT, FName VARCHAR(30), MName VARCHAR(30), LName VARCHAR(30));
INSERT INTO @ClientInfo VALUES
 (1,1,'Don','A','Smith')
,(2,1,'Dan','A','Smith')
,(3,1,'Dan','G','Smith')
,(4,2,'John','D','Doe');
SELECT * FROM @ClientInfo AS ci
CROSS APPLY dbo.fnSampleFNameMnameLname(ci.FName) AS NewNames
