Objective: Using SQL Server 2005, select multiple columns while ensuring that one specific column contains no duplicates.
Issue: The following code does not remove the duplicates. The column that has duplicates is email.
SELECT DISTINCT
email,
name,
phone
FROM
database.dbo.table
WHERE
status = 'active'
GROUP BY
email,
name,
phone
Thank you in advance for any comments, suggestions or recommendations.
This removes the duplicate emails, but you have to decide which name and phone to keep for each email. Here the row that sorts first by name, then by phone, is the one kept.
WITH cl
as
(
SELECT email, name, phone, ROW_NUMBER() OVER(PARTITION BY email ORDER BY name, phone) rn
FROM
database.dbo.table
WHERE
status = 'active')
select *
from cl
where rn =1
Here is another way, using GROUP BY with an aggregate on the remaining columns. Note that MAX(name) and MAX(phone) are evaluated independently, so the name and phone returned for an email may come from different rows.
DECLARE @Table AS TABLE
(email NVARCHAR(100), name NVARCHAR(100), phone NVARCHAR(100))
INSERT INTO @Table
( email , name , phone)
VALUES ( N'fred', -- email - nvarchar(100)
N'bob', -- name - nvarchar(100)
N'steve' -- phone- nvarchar(100)
)
INSERT INTO @Table
( email , name , phone)
VALUES ( N'fred', -- email - nvarchar(100)
N'bob2', -- name - nvarchar(100)
N'ste1ve' -- phone- nvarchar(100)
)
INSERT INTO @Table
( email , name , phone)
VALUES ( N'fred1', -- email - nvarchar(100)
N'bob3', -- name - nvarchar(100)
N'steve3' -- phone- nvarchar(100)
)
SELECT email , MAX(name) c2, MAX(phone) c3 FROM @Table GROUP BY email
I currently have a query that looks like this:
Select val1, val2, val3, val4 from Table_A where someID = 10
UNION
Select oth1, val2, val3, oth4 from Table_B where someId = 10
I initially run this same query but with EXCEPT, to identify which IDs are returned with differences, and then I run the UNION query to find which columns specifically are different.
My goal is to compare the values between the two tables (some columns have different names), and that is what I'm doing.
However, the two queries have about 250 columns, so it is quite tedious to scroll through them to find the differences.
Is there a better and quicker way to identify which columns differ after running the two queries?
EDIT: Here's my current process:
DROP TABLE IF EXISTS #Table_1
DROP TABLE IF EXISTS #Table_2
SELECT 'Dave' AS Name, 'Smih' AS LName, 18 AS Age, 'Alabama' AS State
INTO #Table_1
SELECT 'Dave' AS Name, 'Smith' AS LName, 19 AS Age, 'Alabama' AS State
INTO #Table_2
--FInd differences
SELECT Name, LName,Age,State FROM #Table_1
EXCEPT
SELECT Name, LName,Age,State FROM #Table_2
--How I compare differences
SELECT Name, LName,Age,State FROM #Table_1
UNION
SELECT Name, LName,Age,State FROM #Table_2
Is there any way to streamline this so I can get a column list of differences?
Here is a generic way to handle two tables differences.
We just need to know their primary key column.
It is based on JSON, and will work starting from SQL Server 2016 onwards.
-- DDL and sample data population, start
DECLARE @TableA TABLE (rowid INT IDENTITY(1,1), FirstName VARCHAR(100), LastName VARCHAR(100), Phone VARCHAR(100));
DECLARE @TableB TABLE (rowid INT IDENTITY(1,1), FirstName VARCHAR(100), LastName VARCHAR(100), Phone VARCHAR(100));
INSERT INTO @TableA(FirstName, LastName, Phone) VALUES
('JORGE','LUIS','41514493'),
('JUAN','ROBERRTO','41324133'),
('ALBERTO','JOSE','41514461'),
('JULIO','ESTUARDO','56201550'),
('ALFREDO','JOSE','32356654'),
('LUIS','FERNANDO','98596210');
INSERT INTO @TableB(FirstName, LastName, Phone) VALUES
('JORGE','LUIS','41514493'),
('JUAN','ROBERTO','41324132'),
('ALBERTO','JOSE','41514461'),
('JULIO','ESTUARDO','56201551'),
('ALFRIDO','JOSE','32356653'),
('LUIS','FERNANDOO','98596210');
-- DDL and sample data population, end
SELECT rowid
,[key] AS [column]
,Org_Value = MAX( CASE WHEN Src=1 THEN Value END)
,New_Value = MAX( CASE WHEN Src=2 THEN Value END)
FROM (
SELECT Src=1
,rowid
,B.*
FROM @TableA A
CROSS APPLY ( SELECT [Key]
,Value
FROM OpenJson( (SELECT A.* For JSON Path,Without_Array_Wrapper,INCLUDE_NULL_VALUES))
) AS B
UNION ALL
SELECT Src=2
,rowid
,B.*
FROM @TableB A
CROSS APPLY ( SELECT [Key]
,Value
FROM OpenJson( (SELECT A.* For JSON Path,Without_Array_Wrapper,INCLUDE_NULL_VALUES))
) AS B
) AS A
GROUP BY rowid,[key]
HAVING MAX(CASE WHEN Src=1 THEN Value END)
<> MAX(CASE WHEN Src=2 THEN Value END)
ORDER BY rowid,[key];
Output
rowid  column     Org_Value  New_Value
2      LastName   ROBERRTO   ROBERTO
2      Phone      41324133   41324132
4      Phone      56201550   56201551
5      FirstName  ALFREDO    ALFRIDO
5      Phone      32356654   32356653
6      LastName   FERNANDO   FERNANDOO
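One caveat with the HAVING clause above: `<>` evaluates to UNKNOWN when either side is NULL, so a value that is NULL in one table and non-NULL in the other is not reported. Below is a minimal sketch of a NULL-aware variant. The table variables @A and @B and their data are made up for illustration, and the approach still assumes a shared rowid key and SQL Server 2016 or later.

```sql
-- Illustrative data: row 1 differs only by a NULL, row 2 is identical.
DECLARE @A TABLE (rowid INT, Phone VARCHAR(100));
DECLARE @B TABLE (rowid INT, Phone VARCHAR(100));
INSERT INTO @A VALUES (1, NULL), (2, '555');
INSERT INTO @B VALUES (1, '123'), (2, '555');

WITH unpivoted AS (
    -- Turn every column of every row into (Src, rowid, key, value).
    SELECT Src = 1, A.rowid, J.[key], J.[value]
    FROM @A AS A
    CROSS APPLY OPENJSON((SELECT A.* FOR JSON PATH, WITHOUT_ARRAY_WRAPPER, INCLUDE_NULL_VALUES)) AS J
    UNION ALL
    SELECT Src = 2, B.rowid, J.[key], J.[value]
    FROM @B AS B
    CROSS APPLY OPENJSON((SELECT B.* FOR JSON PATH, WITHOUT_ARRAY_WRAPPER, INCLUDE_NULL_VALUES)) AS J
),
paired AS (
    -- One row per (rowid, column) with both sides next to each other.
    SELECT rowid,
           [key] AS [column],
           MAX(CASE WHEN Src = 1 THEN [value] END) AS Org_Value,
           MAX(CASE WHEN Src = 2 THEN [value] END) AS New_Value
    FROM unpivoted
    GROUP BY rowid, [key]
)
SELECT rowid, [column], Org_Value, New_Value
FROM paired
WHERE Org_Value <> New_Value
   OR (Org_Value IS NULL AND New_Value IS NOT NULL)   -- NULL on the left only
   OR (Org_Value IS NOT NULL AND New_Value IS NULL)   -- NULL on the right only
ORDER BY rowid, [column];
```

With this form, a row that exists in only one of the two tables is also reported, because the missing side comes back as NULL for every column.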
I have a use case where there is a free text field and the user id in the format ab12345 (fixed) and name (dynamic) can appear anywhere in the string.
Now I need to replace the ab12345 with xxxxxxx and the names also with XXXX wherever I find them in the string.
I used:
select *
from dbo.TEST
WHERE DESCRIPTION like '%[a-zA-Z][a-zA-Z][0-9][0-9][0-9][0-9][0-9]%';
to get the user id ab12345 but I am unable to write the replace function for this since the result is dynamic.
Same with the name as well.
The following may help in redacting the user ID:
USE tempdb
GO
CREATE TABLE #CustComments
( CustomerID INT
, CustomerNotes VARCHAR(8000)
)
GO
INSERT dbo.#CustComments
( CustomerID
, CustomerNotes
)
VALUES
( 1, 'An infraction was raised on user id ab12345, and the name of the complainant is John')
, ( 2, 'The customer was not happy with person CD45678 and is going to ask William Jones to speak with George Hillman about this matter' )
, ( 3, 'A customer called and repeatedly mentioned the name of employee ZX98765 and assumes their name was Janet which is not correct')
SELECT * ,
PATINDEX('%[a-zA-Z][a-zA-Z][0-9][0-9][0-9][0-9][0-9]%', CustomerNotes) start_pos,
SUBSTRING (customernotes, (PATINDEX('%[a-zA-Z][a-zA-Z][0-9][0-9][0-9][0-9][0-9]%', CustomerNotes)) ,7 ) extractstring,
REPLACE(customernotes, substring (customernotes, (PATINDEX('%[a-zA-Z][a-zA-Z][0-9][0-9][0-9][0-9][0-9]%', CustomerNotes)) ,7 ), 'XXXXXXX') redacted
FROM #CustComments
--TIDY UP
DROP TABLE #CustComments
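One thing to watch with the REPLACE/SUBSTRING combination: when a note contains no user ID, PATINDEX returns 0, SUBSTRING(text, 0, 7) then returns the first six characters of the note, and REPLACE swaps every occurrence of those six characters for 'XXXXXXX'. A possible guard, sketched here with a hypothetical @Notes table variable, is to overwrite the match with STUFF only when PATINDEX actually found something:

```sql
-- Hypothetical sample rows; the second one contains no user ID.
DECLARE @Notes TABLE (NoteID INT, NoteText VARCHAR(8000));
INSERT INTO @Notes VALUES
    (1, 'An infraction was raised on user id ab12345 by John'),
    (2, 'No identifier appears in this note');

SELECT n.NoteID,
       CASE WHEN p.pos > 0
            THEN STUFF(n.NoteText, p.pos, 7, 'XXXXXXX')  -- overwrite the 7-character ID
            ELSE n.NoteText                               -- nothing matched: leave unchanged
       END AS redacted
FROM @Notes AS n
CROSS APPLY (
    -- Compute the match position once so it can be reused above.
    SELECT PATINDEX('%[a-zA-Z][a-zA-Z][0-9][0-9][0-9][0-9][0-9]%', n.NoteText) AS pos
) AS p;
```

This only redacts the first ID in each note; a note containing several different IDs would need the step repeated, for example in a loop that runs until PATINDEX returns 0.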
If you have, or can create, a table of names, this may work:
USE tempdb
GO
CREATE TABLE #CustComments (
CustomerID int,
CustomerNotes varchar(8000)
)
GO
INSERT #CustComments (CustomerID
, CustomerNotes)
VALUES (1, 'An infraction was raised on user id ab12345 , and the name of the complainant is Ann')
, (2, 'The customer was not happy with person CD45678 and is going to ask Richard Jones to speak with Todd Hillman about this matter')
, (3, 'A customer called and repeatedly mentioned the name of employee ZX98765 and assumes their name was Shana which is not correct')
CREATE TABLE #empname (
ename varchar(255) NOT NULL
)
GO
INSERT INTO #empname ([ename])
VALUES ('Zeph'), ('Ebony'), ('Felicia'), ('Benedict'), ('Ahmed'), ('Ira'), ('Julie'), ('Levi'),
('Sebastian'), ('Fiona'), ('Lamar'), ('Russell'), ('Abdul'), ('Lev'), ('Isaiah'), ('Charlotte'),
('Rowan'), ('Ivory'), ('Quinn'), ('Jordan'), ('Xantha'), ('Shana'), ('Mufutau'), ('Jessamine'),
('Desirae'), ('Yvette'), ('Odessa'), ('Ray'), ('Ori'), ('Zenaida'), ('Allegra'), ('Allistair'),
('Raymond'), ('Martena'), ('Cameron'), ('Ila'), ('Nigel'), ('Dale'), ('Emerald'), ('Guinevere'),
('Boris'), ('Dolan'), ('Ainsley'), ('Madeson'), ('Kadeem'), ('Ciaran'), ('Hop'), ('Louis'),
('Maia'), ('Hiroko'), ('Hakeem'), ('Cole'), ('Tyrone'), ('Amy'), ('Doris'), ('Keaton'),
('Carlos'), ('Richard'), ('Lysandra'), ('Beverly'), ('Hamish'), ('Demetria'), ('Eric'), ('Nayda'),
('Sydney'), ('Fritz'), ('Blaze'), ('Regina'), ('Ciara'), ('Ina'), ('Joan'), ('Risa'),
('Alea'), ('Denton'), ('Daryl'), ('Mollie'), ('Keane'), ('Jarrod'), ('Ann'), ('Juliet'),
('Germaine'), ('Alexa'), ('Zane'), ('Kiona'), ('Armand'), ('Jin'), ('Geraldine'), ('Natalie'),
('Nomlanga'), ('Todd'), ('Rajah'),('Lucian'), ('Idona'), ('Autumn'), ('Briar'),
-- add surname
('Hillman');
-- redact the userID std format
SELECT
CustomerID ,
--PATINDEX('%[a-zA-Z][a-zA-Z][0-9][0-9][0-9][0-9][0-9]%', CustomerNotes) start_pos,
--SUBSTRING (customernotes, (PATINDEX('%[a-zA-Z][a-zA-Z][0-9][0-9][0-9][0-9][0-9]%', CustomerNotes)) ,7 ) extractstring,
REPLACE(customernotes, substring (customernotes, (PATINDEX('%[a-zA-Z][a-zA-Z][0-9][0-9][0-9][0-9][0-9]%', CustomerNotes)) ,7 ), 'XXXXXXX') ID_redacted
INTO #ID_REDACT
FROM #CustComments
-- split into rows
SELECT customerId, value
into #SPLIT
FROM #ID_REDACT
CROSS APPLY STRING_SPLIT(ID_redacted, ' ');
-- redact based on a join with a "name" table
SELECT s.customerid,
CASE
WHEN e.ename IS NULL THEN s.value
ELSE 'XXXXXXX'
END AS name_redact
INTO #NAME_REDACT
FROM #split AS s
LEFT OUTER JOIN #empname AS e
ON s.value = e.ename
SELECT customerId,
STRING_AGG(name_redact, ' ') as full_redact
INTO #RESULTS
from #NAME_REDACT
group by CustomerID
-- RESULTS WITH COMPARISON
SELECT
C.CustomerID,
C.CustomerNotes AS Original,
R.full_redact AS Redacted
FROM #CustComments AS C
INNER JOIN #RESULTS AS R
ON C.CustomerID = R.customerId
--TIDY UP
DROP TABLE #CustComments
DROP TABLE #empname
DROP TABLE #ID_REDACT
DROP TABLE #SPLIT
DROP TABLE #NAME_REDACT
DROP TABLE #RESULTS
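A caveat on the split-and-reassemble steps: STRING_SPLIT with two arguments does not guarantee the order of the rows it returns, and STRING_AGG without WITHIN GROUP does not guarantee the order in which it concatenates, so the words of a note can in principle come back shuffled. If SQL Server 2022 or Azure SQL Database is available, the third STRING_SPLIT argument (enable_ordinal) exposes each token's position. A minimal sketch under that assumption, with hypothetical table variables @Notes and @Names standing in for the temp tables above:

```sql
-- Hypothetical stand-ins for #CustComments and #empname.
DECLARE @Notes TABLE (NoteID INT, NoteText VARCHAR(8000));
INSERT INTO @Notes VALUES (1, 'Please ask Richard Jones to call Todd Hillman');

DECLARE @Names TABLE (ename VARCHAR(255) NOT NULL);
INSERT INTO @Names VALUES ('Richard'), ('Todd'), ('Hillman');

SELECT n.NoteID,
       STRING_AGG(CASE WHEN e.ename IS NULL THEN s.value ELSE 'XXXX' END, ' ')
           WITHIN GROUP (ORDER BY s.ordinal) AS full_redact  -- rebuild in original word order
FROM @Notes AS n
CROSS APPLY STRING_SPLIT(n.NoteText, ' ', 1) AS s            -- 1 = also return the ordinal column
LEFT JOIN @Names AS e
       ON e.ename = s.value
GROUP BY n.NoteID;
```

The name table should hold each name only once, otherwise the join repeats the matching token. Words with attached punctuation (for example "Hillman,") will not match a plain name and are left as they are.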
I have searched for example code, but I couldn't get any of it to work on my SQL Server 2017.
I need to create a stored procedure that removes duplicated data from a table.
I have created this code:
CREATE PROCEDURE deldupl_LSBU_Staff AS
SELECT Phone_number, COUNT(*) as CNT
FROM LSBU_Staff
GROUP BY Phone_number
DELETE FROM LSBU_Staff
WHERE Phone_number > 1;
But when I execute this code, it deletes all the records from the table, which I do not want. I only want to delete the duplicated rows.
I have also tried another piece of code to delete the duplicated data from the table LSBU_Staff:
SELECT ROW_NUMBER() OVER(PARTITION BY Phone_number ORDER BY Phone_number)
AS del_dupl_record
FROM LSBU_Staff
WHERE Phone_number > 1
DELETE FROM LSBU_Staff
WHERE Phone_number > 1;
And it still deletes all the data.
LSBU_Staff columns are: Staff_id, LastName, FirstName, Speciality_type and Phone_number. I chose Phone_number as its identification.
Try this. It's not super elegant and can be cleaned up, but it should do the trick. It keeps the first row of each group. If you prefer to keep the last, change the condition to "l2.Staff_id > l1.Staff_id".
--
DROP TABLE IF EXISTS LSBU_Staff
CREATE TABLE LSBU_Staff
( Staff_id INT IDENTITY(1,1)
, LastName VARCHAR(32)
, FirstName VARCHAR(32)
, Speciality_type VARCHAR(32)
, Phone_number VARCHAR(32)
)
INSERT INTO LSBU_Staff (LastName, FirstName, Speciality_type, Phone_number)
VALUES
('Stilskin', 'Rumple', 'dancer' , '305-305-3050')
, ('Lamb', 'Mary', 'shepherd' , '305-123-4567')
, ('Lamb', 'Aurthur', 'shepherd' , '305-123-4567')
, ('Fenokee', 'Okee', 'swimmer' , '305-305-3051')
SELECT * FROM LSBU_Staff
DELETE LSBU_Staff
WHERE Staff_id IN
(
SELECT Staff_id
FROM LSBU_Staff l1
WHERE EXISTS (SELECT 1 FROM LSBU_Staff l2 WHERE l2.Phone_number = l1.Phone_number
AND l2.Staff_id < l1.Staff_id)
)
SELECT * FROM LSBU_Staff
DROP TABLE IF EXISTS LSBU_Staff
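Since the question asked for a stored procedure, the same idea can be wrapped in one. The sketch below is an alternative formulation rather than the code above: it numbers the rows per Phone_number with ROW_NUMBER and deletes everything after the first. It assumes LSBU_Staff exists in the dbo schema with the columns listed in the question, and that the lowest Staff_id is the row to keep.

```sql
CREATE OR ALTER PROCEDURE dbo.deldupl_LSBU_Staff
AS
BEGIN
    SET NOCOUNT ON;

    -- Number the rows within each phone number, oldest Staff_id first.
    WITH numbered AS (
        SELECT ROW_NUMBER() OVER (PARTITION BY Phone_number ORDER BY Staff_id) AS rn
        FROM dbo.LSBU_Staff
    )
    -- Deleting through the CTE removes the underlying table rows;
    -- rn = 1 (the first row per phone number) is kept.
    DELETE FROM numbered
    WHERE rn > 1;
END;
GO

EXEC dbo.deldupl_LSBU_Staff;
```

To keep the most recent row instead, order by Staff_id DESC inside the OVER clause. CREATE OR ALTER requires SQL Server 2016 SP1 or later, which covers the 2017 instance mentioned in the question.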
If you have a table, for example:
declare @MyTable table (
CustomerName nvarchar(50),
BirthDate datetime,
BirtPlace nvarchar(50),
Phone nvarchar(50),
Email nvarchar(50)
)
insert into @MyTable
(
CustomerName,
BirthDate,
BirtPlace,
Phone,
Email
)
values (
'Customer1',
'12.05.1990',
'Place1',
N'+000125456789',
N'customer@customer.com'
)
Is it possible to get the following result set?
CustomerName Customer1
BirthDate 1990-12-05
BirtPlace Place1
Phone +000125456789
Email customer@customer.com
Something like a pivot, but I have no idea how to get to this result.
As you want to change columns to rows the function you want is unpivot not pivot.
This should do the trick:
SELECT col, val
FROM
(
SELECT
CustomerName,
CAST(BirthDate AS NVARCHAR(50)) BirthDate,
BirtPlace,
Phone,
Email
FROM @MyTable
) AS t
UNPIVOT
(
val FOR col IN (CustomerName, BirthDate, BirtPlace, Phone, Email)
) AS u
Try this
SELECT myColumn, myDetail
FROM
(
SELECT
CustomerName,
CONVERT(NVARCHAR(50),BirthDate,121) AS BirthDate,
BirtPlace,
Phone,
Email
FROM
@MyTable
) AS A
UNPIVOT
(
myDetail FOR myColumn IN (CustomerName, BirthDate, BirtPlace, Phone, Email)
) AS tbUnpivot
This is an unpivot problem. If you are on an older version such as SQL Server 2000, where the UNPIVOT syntax is not available, you can use UNION ALL instead:
SELECT 'CustomerName' AS colName, CustomerName AS colVal FROM @MyTable
UNION ALL
SELECT 'BirthDate' AS colName, CAST(BirthDate AS NVARCHAR(50)) AS colVal FROM @MyTable
UNION ALL
SELECT 'BirtPlace' AS colName, BirtPlace AS colVal FROM @MyTable
UNION ALL
SELECT 'Phone' AS colName, Phone AS colVal FROM @MyTable
UNION ALL
SELECT 'Email' AS colName, Email AS colVal FROM @MyTable;
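On SQL Server 2008 and later there is a third option worth knowing, which reads the table once instead of once per column: CROSS APPLY over a VALUES row constructor. This is only a sketch of that technique, reusing the table definition from the question; the aliases t, v, colName and colVal are arbitrary.

```sql
DECLARE @MyTable TABLE (
    CustomerName NVARCHAR(50),
    BirthDate    DATETIME,
    BirtPlace    NVARCHAR(50),
    Phone        NVARCHAR(50),
    Email        NVARCHAR(50)
);
INSERT INTO @MyTable VALUES
    (N'Customer1', '19901205', N'Place1', N'+000125456789', N'customer@customer.com');

SELECT v.colName, v.colVal
FROM @MyTable AS t
CROSS APPLY (VALUES
    (N'CustomerName', t.CustomerName),
    (N'BirthDate',    CONVERT(NVARCHAR(50), t.BirthDate, 23)),  -- style 23 = yyyy-mm-dd
    (N'BirtPlace',    t.BirtPlace),
    (N'Phone',        t.Phone),
    (N'Email',        t.Email)
) AS v (colName, colVal);
```

Unlike UNPIVOT, this form keeps rows whose value is NULL, which may or may not be what you want.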
SELECT CustomerName,BirthDate,BirtPlace,Phone,Email FROM @MyTable\G
You have no ID and no foreign keys, so I think this is one way to solve your problem. The \G terminator prints each row as a vertical list; note that it is a feature of the mysql command-line client and is not part of T-SQL.
values (
'Customer1', '1990-12-05', 'Place1', '+000125456789', 'customer@customer.com'
);
I hope this helps.
Have a nice day.
I am still getting a weird error:
The select list for the INSERT statement contains more items than the insert list. The number of SELECT values must match the number of INSERT columns.
Code:
INSERT INTO @tab (Phone)
select t2.Phone
from
(
SELECT DISTINCT top 999 t3.Phone, MIN(t3.Ord)
FROM
(
select Phone1 as Phone, Ord from @tabTemp
union all
select Phone2 as Phone, Ord from @tabTemp
) t3
GROUP BY t3.Phone
ORDER BY MIN(t3.Ord) asc, t3.Phone
) t2
The idea is to select all phone numbers from @tabTemp together with their row order, then take the distinct numbers and insert them into table @tab. TOP 999 is there only to allow the ORDER BY, because the query runs inside a function (UDF).
Structures are following:
declare @tabTemp TABLE
(
Phone1 varchar(128) NULL,
Phone2 varchar(128) NULL,
Ord int
);
declare @tab TABLE
(
Phone varchar(max) NULL
);
EDITED:
FULL CODE
CREATE FUNCTION dbo.myFnc(@PID int, @VID int, @JID int, @ColumnNo int)
RETURNS @tab TABLE
(
Phone varchar(max) NULL
)
AS
BEGIN
if @PID is null and @VID is null and @JID is null
return;
if @ColumnNo is null or (@ColumnNo<>2 and @ColumnNo<>3 and @ColumnNo<>6)
return;
declare @catH int;
set @catH = dbo.fncGetCategoryID('H','tt'); -- just returning int value
declare @kvalP int;
set @kvalP = dbo.fncGetCategoryID('P','te');
declare @kvalR int;
set @kvalR = dbo.fncGetCategoryID('R','te');
declare @tabTemp TABLE
(
Phone1 varchar(128) NULL,
Phone2 varchar(128) NULL,
Ord int
);
-- finding parent subject + current one
WITH subj AS(
SELECT *
FROM Subjekt
WHERE
(ID = @PID and @PID is not null)
or
(ID = @VID and @VID is not null)
or
(ID = @JID and @JID is not null)
UNION ALL
SELECT t.*
FROM Subjekt t
INNER JOIN subj r ON r.ID = t.ID
)
INSERT INTO @tabTemp (Phone1,Phone2)
(select
(case when o.TYP1=@catH then o.TEL1 else null end) Phone1
,(case when o.TYP2=@catH then o.TEL2 else null end) Phone2
,so.POR_C
from
subj s
,SubjektPerson so
,Persons o
,recSetup idS
,recSetup idSO
,recSetup idO
where 1=1
and idO.isValid=1
and idSO.isValid=1
and idS.isValid=1
and idSO.ID0=so.ID
and idS.ID0=s.ID
and idO.ID0=o.ID
and so.ID_PERSON=o.ID
and so.ID_SUBJECT=s.ID
and (o.TYP=@kvalP or o.TYP=@kvalR)
)
INSERT INTO @tab (Phone)
select t2.Phone
from
(
SELECT DISTINCT top 999 t3.Phone, MIN(t3.Ord)
FROM
(
select Phone1 as Phone, Ord from @tabTemp
union all
select Phone2 as Phone, Ord from @tabTemp
) t3
GROUP BY t3.Phone
ORDER BY MIN(t3.Ord) asc, t3.Phone
) t2
RETURN
END
Not sure why you have distinct AND a group by on the same query. You could greatly simplify this.
INSERT INTO @tab (Phone)
SELECT top 999 t3.Phone
FROM
(
select Phone1 as Phone, Ord from @tabTemp
union all
select Phone2 as Phone, Ord from @tabTemp
) t3
GROUP BY t3.Phone
ORDER BY MIN(t3.Ord) asc, t3.Phone
Now for the error message you were receiving, it doesn't seem like it came from this block of code because the syntax is fine and the number of columns matches correctly. I suspect the error is somewhere earlier in your code.
Also, you might want to consider using temp tables instead of table variables since it seems like you have a lot of rows in these tables.
You've focussed on the wrong insert. This is the one with the mismatch:
INSERT INTO @tabTemp (Phone1,Phone2)
(select
(case when o.TYP1=@catH then o.TEL1 else null end) Phone1
,(case when o.TYP2=@catH then o.TEL2 else null end) Phone2
,so.POR_C
from
...
Two columns in the insert list, 3 columns in the subselect. I can't tell just from the naming whether POR_C was meant to end up in the Ord column or not.
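If POR_C was indeed meant to populate Ord (an assumption, since the naming does not settle it), the fix is to list all three target columns. The snippet below is a reduced, self-contained illustration rather than the full function body: @src is a hypothetical stand-in for the joined Subjekt/Persons rows, and the CASE expressions are left out.

```sql
DECLARE @tabTemp TABLE
(
    Phone1 VARCHAR(128) NULL,
    Phone2 VARCHAR(128) NULL,
    Ord    INT
);

-- Hypothetical stand-in for the rows produced by the joins in the function.
DECLARE @src TABLE (TEL1 VARCHAR(128), TEL2 VARCHAR(128), POR_C INT);
INSERT INTO @src VALUES ('111', '222', 2), ('333', NULL, 1);

-- Three columns in the insert list to match the three selected expressions.
INSERT INTO @tabTemp (Phone1, Phone2, Ord)
SELECT TEL1, TEL2, POR_C
FROM @src;

SELECT Phone1, Phone2, Ord
FROM @tabTemp
ORDER BY Ord;
```

If POR_C is not needed at all, the alternative is to drop it from the select list and keep the two-column insert list, in which case Ord stays NULL for every row.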
On the surface, it appears you may be triggering a query planner bug or something similar. There are a number of iffy things going on:
- The UNION ALL of the same table with itself
- Using both GROUP BY and DISTINCT
I'm not sure what you mean by:
"Top 999 is here only for order by purpose, because I use it into a function (UDF)."
Do you mean this whole query is executed within a UDF? If so, are there other queries that might be giving that error?