So I have an interesting scenario. There are 2 event types, Birthday and Wedding. For the Wedding EventType I'd like to combine the married couple of a house into one row, grouped by an identifier (HouseID; 2 people in the same house), the EventType, and the EventDate. For an EventType of Birthday, the 2 residents of the house would not be combined in the same row. For an EventType of Wedding, the names from the 2 rows are combined into 1 (2 rules below, based on LastName). It's possible to have only 1 person for a HouseID for either EventType, in which case that person is listed as an individual row. The combining rules for an EventType of Wedding are as follows:
If the last names are the same, the FinalName column result would be Will and Mary Stanton
If the last names are different, the FinalName column result would be Stephen Jacobs and Janetsy Lilly.
The combining of the names would be contingent on the HouseID, EventType, and EventDate being the same, specific only to Wedding. This is because it's possible for 2 people to live in a house that are married, but not to each other which we base off the EventDate; we assume the EventDate is the indicator that they are married to each other. The example table input is as follows:
CREATE TABLE #t (
HouseID INT,
FirstName NVARCHAR(64),
LastName NVARCHAR(64),
EventType NVARCHAR(64),
EventDate DATE
);
INSERT INTO #t (HouseID, FirstName, LastName, EventType, EventDate)
VALUES
(1, 'Will', 'Stanton', 'Birthday', '1974-01-05'),
(1, 'Mary', 'Stanton', 'Birthday', '1980-05-22'),
(2, 'Jason', 'Stockmore', 'Birthday', '1987-12-07'),
(3, 'Mark', 'Mellony', 'Wedding', '2021-04-04'),
(3, 'Stacy', 'Mellony', 'Wedding', '2021-04-04'),
(4, 'Stephen', 'Johnson', 'Wedding', '2012-01-30'),
(4, 'Janetsy', 'Johnson', 'Wedding', '2012-01-30'),
(5, 'George', 'Jackson', 'Wedding', '2009-11-15'),
(5, 'Sally', 'Mistmoore', 'Wedding', '2009-11-15'),
(6, 'Sandy', 'Katz', 'Wedding', '2010-03-19'),
(6, 'Jeff', 'Trilov', 'Wedding', '2016-09-09'),
(7, 'Sandra', 'Kirchbaum', 'Wedding', '2011-05-22'),
(8, 'Jessica', 'Bower', 'Birthday', '1996-02-26'),
(8, 'Frank', 'Fjorn', 'Birthday', '1969-07-19');
The ideal result based on the input table would resemble:
| HouseID | FinalName | EventType | EventDate |
| ------- | ---------------------------------- | --------- | ---------- |
| 1 | Will Stanton | Birthday | 1974-01-05 |
| 1 | Mary Stanton | Birthday | 1980-05-22 |
| 2 | Jason Stockmore | Birthday | 1987-12-07 |
| 3 | Mark and Stacy Mellony | Wedding | 2021-04-04 |
| 4 | Stephen and Janetsy Johnson | Wedding | 2012-01-30 |
| 5 | George Jackson and Sally Mistmoore | Wedding | 2009-11-15 |
| 6 | Sandy Katz | Wedding | 2010-03-19 |
| 6 | Jeff Trilov | Wedding | 2016-09-09 |
| 7 | Sandra Kirchbaum | Wedding | 2011-05-22 |
| 8 | Jessica Bower | Birthday | 1996-02-26 |
| 8 | Frank Fjorn | Birthday | 1969-07-19 |
I have tried a couple of approaches. One uses an UPDATE statement that sets FirstName and then appends each subsequent FirstName based on the row number, using a previously declared @Values1 variable, and unions in the result of the types that do not combine (in this case, Birthday). Here is where I built the names; I then used the MAX() aggregation to select the longer result:
SELECT HouseID,
FirstName
, LastName
, EventType
, EventDate
, RowNum = ROW_NUMBER() OVER (PARTITION BY LastName, EventType ORDER BY 1/0)
, Values1 = CAST(NULL AS VARCHAR(MAX))
INTO #EntityValues1
FROM #t
WHERE EventType = 'Wedding'
UPDATE #EntityValues1
SET @Values1 = Values1 =
CASE WHEN RowNum = 1
THEN FirstName
ELSE @Values1 + ' and ' + FirstName
END
However, this example only works when combining FirstName1 + FirstName2 + LastName (in a subsequent query with MAX(Values1) + ' ' + LastName). I had to write a second query to handle combining names that do not share a last name. I know this particular query is a bit tricky, but I'm wondering if there's any magic I'm missing out on. I've seen some suggestions that use a FOR XML approach with STUFF involved, and some other suggestions, but this one appears to be a tough one.
Here is an answer, with a working demo
;WITH
[DoubleWeddings] AS (
SELECT
[HouseID]
FROM
#t
WHERE
[EventType] = 'Wedding'
GROUP BY
[HouseID],
[EventDate]
HAVING
COUNT(*) = 2
),
[DoubleWeddingsSameLastName] AS (
SELECT
T.[HouseID]
FROM
[DoubleWeddings] DW
JOIN
#t T
ON T.[HouseID] = DW.[HouseID]
GROUP BY
T.[HouseID],
T.[LastName]
HAVING
COUNT(*) = 2
),
[DoubleWeddingsDifferentLastName] AS (
SELECT [HouseID] FROM [DoubleWeddings]
EXCEPT
SELECT [HouseID] FROM [DoubleWeddingsSameLastName]
),
[Couples] AS (
SELECT
T.[HouseID],
ROW_NUMBER() OVER (PARTITION BY T.[HouseID] ORDER BY 1/0) [RN],
T.[FirstName],
T.[LastName],
T.[EventType],
T.[EventDate]
FROM
[DoubleWeddings] DW
JOIN
#t T
ON T.[HouseID] = DW.[HouseID]
)
SELECT
DWSL.[HouseID],
FORMATMESSAGE(
'%s and %s %s',
F.[FirstName],
S.[FirstName],
S.[LastName]) [FinalName],
F.[EventType],
F.[EventDate]
FROM
[DoubleWeddingsSameLastName] DWSL
JOIN
[Couples] F
ON F.[HouseID] = DWSL.[HouseID] AND F.[RN] = 1
JOIN
[Couples] S
ON S.[HouseID] = DWSL.[HouseID] AND S.[RN] = 2
UNION ALL
SELECT
DWDL.[HouseID],
FORMATMESSAGE(
'%s %s and %s %s',
F.[FirstName],
F.[LastName],
S.[FirstName],
S.[LastName]) [FinalName],
F.[EventType],
F.[EventDate]
FROM
[DoubleWeddingsDifferentLastName] DWDL
JOIN
[Couples] F
ON F.[HouseID] = DWDL.[HouseID] AND F.[RN] = 1
JOIN
[Couples] S
ON S.[HouseID] = DWDL.[HouseID] AND S.[RN] = 2
UNION ALL
SELECT
T.[HouseID],
FORMATMESSAGE(
'%s %s',
T.[FirstName],
T.[LastName]) [FinalName],
T.[EventType],
T.[EventDate]
FROM
#t T
LEFT JOIN
[DoubleWeddings] DW
ON DW.[HouseID] = T.[HouseID]
WHERE
DW.[HouseID] IS NULL
ORDER BY
[HouseID],
[EventDate],
[FinalName]
There is one issue: I've used the ORDER BY 1/0 technique to skip providing an order for ROW_NUMBER(), which tends to assign the row numbers in the order of the underlying data. However, this is not guaranteed, and could vary depending on the parallelization of the query.
It would be better if the order of the combination was provided by a column in the data, however, none is present in the example.
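For what it's worth, here is a minimal sketch of the same combining rules with a deterministic order. It is not SQL Server: it runs the query through Python's built-in sqlite3 module, and it assumes a hypothetical Seq column (not present in the question's data) that records which person of a couple is listed first.

```python
import sqlite3

# Sample rows from the question. Seq is a hypothetical ordering column
# that SQLite fills in automatically, in insertion order.
ROWS = [
    (1, "Will", "Stanton", "Birthday", "1974-01-05"),
    (1, "Mary", "Stanton", "Birthday", "1980-05-22"),
    (2, "Jason", "Stockmore", "Birthday", "1987-12-07"),
    (3, "Mark", "Mellony", "Wedding", "2021-04-04"),
    (3, "Stacy", "Mellony", "Wedding", "2021-04-04"),
    (4, "Stephen", "Johnson", "Wedding", "2012-01-30"),
    (4, "Janetsy", "Johnson", "Wedding", "2012-01-30"),
    (5, "George", "Jackson", "Wedding", "2009-11-15"),
    (5, "Sally", "Mistmoore", "Wedding", "2009-11-15"),
    (6, "Sandy", "Katz", "Wedding", "2010-03-19"),
    (6, "Jeff", "Trilov", "Wedding", "2016-09-09"),
    (7, "Sandra", "Kirchbaum", "Wedding", "2011-05-22"),
    (8, "Jessica", "Bower", "Birthday", "1996-02-26"),
    (8, "Frank", "Fjorn", "Birthday", "1969-07-19"),
]

QUERY = """
WITH couples AS (
    -- Two Wedding rows sharing HouseID and EventDate form a couple.
    SELECT HouseID, EventType, EventDate,
           MIN(Seq) AS FirstSeq, MAX(Seq) AS SecondSeq
    FROM t
    WHERE EventType = 'Wedding'
    GROUP BY HouseID, EventType, EventDate
    HAVING COUNT(*) = 2
)
SELECT c.HouseID AS HouseID,
       CASE WHEN f.LastName = s.LastName
            THEN f.FirstName || ' and ' || s.FirstName || ' ' || s.LastName
            ELSE f.FirstName || ' ' || f.LastName || ' and '
                 || s.FirstName || ' ' || s.LastName
       END AS FinalName,
       c.EventType AS EventType,
       c.EventDate AS EventDate
FROM couples c
JOIN t f ON f.Seq = c.FirstSeq
JOIN t s ON s.Seq = c.SecondSeq
UNION ALL
-- Everyone who is not part of a couple keeps an individual row.
SELECT t.HouseID, t.FirstName || ' ' || t.LastName, t.EventType, t.EventDate
FROM t
WHERE NOT EXISTS (
    SELECT 1 FROM couples c WHERE t.Seq IN (c.FirstSeq, c.SecondSeq)
)
ORDER BY 1, 4  -- HouseID, then EventDate
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (Seq INTEGER PRIMARY KEY, HouseID INTEGER, "
    "FirstName TEXT, LastName TEXT, EventType TEXT, EventDate TEXT)"
)
conn.executemany(
    "INSERT INTO t (HouseID, FirstName, LastName, EventType, EventDate) "
    "VALUES (?, ?, ?, ?, ?)",
    ROWS,
)
result = conn.execute(QUERY).fetchall()
for row in result:
    print(row)
```

Because the couple is picked with MIN(Seq) and MAX(Seq), the order of the two names no longer depends on how the engine happens to read the rows. In T-SQL the same idea would use + or CONCAT instead of ||.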
To get this event calendar:
select
t.HouseID,
CONCAT(STRING_AGG(CONCAT(FirstName,
' ',
CASE WHEN ln.LastName<>t.LastName
then t.LastName END),' and '),
' ',
MIN(t.LastName)) as Name,
t.EventType,
t.EventDate
from t
left join (select t1.HouseID, min(t1.LastName) as LastName from t as t1 GROUP BY t1.HouseID having count(t1.LastName)=1) ln on ln.HouseID = t.HouseID
GROUP BY t.EventDate, t.EventType, t.HouseID
order by month(t.EventDate), day(t.EventDate);
output:
| HouseID | Name | EventType | EventDate |
| ------- | ---- | --------- | --------- |
| 1 | Will Stanton | Birthday | 1974-01-05 |
| 4 | Stephen and Janetsy Johnson | Wedding | 2012-01-30 |
| 8 | Jessica Bower | Birthday | 1996-02-26 |
| 6 | Sandy Katz | Wedding | 2010-03-19 |
| 3 | Mark and Stacy Mellony | Wedding | 2021-04-04 |
| 7 | Sandra Kirchbaum | Wedding | 2011-05-22 |
| 1 | Mary Stanton | Birthday | 1980-05-22 |
| 8 | Frank Fjorn | Birthday | 1969-07-19 |
| 6 | Jeff Trilov | Wedding | 2016-09-09 |
| 5 | George and Sally Jackson | Wedding | 2009-11-15 |
| 2 | Jason Stockmore | Birthday | 1987-12-07 |
see: DBFIDDLE
EDIT:
Corrected the column name (Name to FinalName) and the ordering of the results.
Fixed the value for HouseID = 5.
select
t.HouseID,
CONCAT(STRING_AGG(CONCAT(FirstName,
' ',
ISNULL(ln.LastName, t.LastName)
),' and '),
' ',
MIN(ln.LastName)) as FinalName,
t.EventType,
t.EventDate
from t
left join (select t1.HouseID, min(t1.LastName) as LastName from t as t1 GROUP BY t1.HouseID having count(t1.LastName)=1) ln on ln.HouseID = t.HouseID
GROUP BY t.EventDate, t.EventType, t.HouseID
order by HouseID, EventDate
;
see: DBFIDDLE
output:
| HouseID | FinalName | EventType | EventDate |
| ------- | --------- | --------- | --------- |
| 1 | Will Stanton | Birthday | 1974-01-05 |
| 1 | Mary Stanton | Birthday | 1980-05-22 |
| 2 | Jason Stockmore Stockmore | Birthday | 1987-12-07 |
| 3 | Mark Mellony and Stacy Mellony | Wedding | 2021-04-04 |
| 4 | Stephen Johnson and Janetsy Johnson | Wedding | 2012-01-30 |
| 5 | George Jackson and Sally Mistmoore | Wedding | 2009-11-15 |
| 6 | Sandy Katz | Wedding | 2010-03-19 |
| 6 | Jeff Trilov | Wedding | 2016-09-09 |
| 7 | Sandra Kirchbaum Kirchbaum | Wedding | 2011-05-22 |
| 8 | Frank Fjorn | Birthday | 1969-07-19 |
| 8 | Jessica Bower | Birthday | 1996-02-26 |
I have a SQL table with FirstName, LastName, Add1 and other fields. I am working to get this data cleaned up. There are a few instances of likely dupes -
All 3 columns are the exact same for more than 1 record
The First and Last are the same, only 1 has an address, the other is blank
The First and Last are similar (John | Doe vs John C. | Doe) and the address is the same or one is blank
I'm wanting to generate a query I can provide to the users, so they can check these records out, compare their related records and then delete the one they don't need.
I've been looking at similarity functions, soundex, and such, but it all seems so complicated. Is there an easy way to do this?
Thanks!
Edit:
So here is some sample data:
FirstName | LastName | Add1
John | Doe | 1 Main St
John | Doe |
John A. | Doe |
Jane | Doe | 2 Union Ave
Jane B. | Doe | 2 Union Ave
Alex | Smith | 3 Broad St
Chris | Anderson | 4 South Blvd
Chris | Anderson | 4 South Blvd
I really like Critical Error's query for identifying all different types of dupes. That would give me the above sample data, with the Alex Smith result not included, because there are no dupes for that.
What I want to do is take that result set and identify which are dupes for Jane Doe. She should only have 2 dupes. John Doe has 3, and Chris Anderson has 2. Can I get at that sub-result set?
Edit:
I figured it out! I will be marking Critical Error's answer as the solution, since it totally got me where I needed to go. Here is the solution, in case it might help others. Basically, this is what we are doing.
Selecting the records from the table where there are dupes
Adding a WHERE EXISTS sub-query to look in the same table for exact dupes, where the ID from the main query and sub-query do not match
Adding a WHERE EXISTS sub-query to look in the same table for similar dupes, using a Difference factor between duplicative columns, where the ID from the main query and sub-query do not match
Adding a WHERE EXISTS sub-query to look in the same table for dupes on 2 fields where a 3rd may be null for one of the records, where the ID from the main query and sub-query do not match
Each subquery is connected with an OR, so that any kind of duplicate is found
At the end of each sub-query add a nested requirement that either the main query or sub-query be the ID of the record you are looking to identify duplicates for.
DECLARE @CID AS INT
SET ANSI_NULLS ON
SET NOCOUNT ON;
SET @CID = 12345
BEGIN
SELECT
*
FROM #Customers c
WHERE
-- Exact duplicates.
EXISTS (
SELECT * FROM #Customers x WHERE
x.FirstName = c.FirstName
AND x.LastName = c.LastName
AND x.Add1 = c.Add1
AND x.Id <> c.Id
AND (x.ID = @CID OR c.ID = @CID)
)
-- Match First/Last name are same/similar and the address is same.
OR EXISTS (
SELECT * FROM #Customers x WHERE
DIFFERENCE( x.FirstName, c.FirstName ) = 4
AND DIFFERENCE( x.LastName, c.LastName ) = 4
AND x.Add1 = c.Add1
AND x.Id <> c.Id
AND (x.ID = @CID OR c.ID = @CID)
)
-- Match First/Last name and one address exists.
OR EXISTS (
SELECT * FROM #Customers x WHERE
x.FirstName = c.FirstName
AND x.LastName = c.LastName
AND x.Id <> c.Id
AND (
x.Add1 IS NULL AND c.Add1 IS NOT NULL
OR
x.Add1 IS NOT NULL AND c.Add1 IS NULL
)
AND (x.ID = @CID OR c.ID = @CID)
);
END
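In case a runnable toy version helps, here is a rough sketch of the "restrict the duplicates to one record" idea. It is only an approximation: it uses Python's built-in sqlite3 module rather than SQL Server, it leaves out the DIFFERENCE() check because stock SQLite has no such function, and the helper name find_dupes, the Customers table, and the sample rows are assumptions made for the illustration.

```python
import sqlite3

# Hypothetical sample rows; Id is assigned automatically (1..5).
CUSTOMERS = [
    ("John", "Doe", "123 Anywhere Ln"),
    ("John", "Doe", "123 Anywhere Ln"),
    ("John", "Doe", None),
    ("John C.", "Doe", "123 Anywhere Ln"),
    ("John C.", "Doe", "15673 SW Liar Dr"),
]

DUPES_FOR_ID = """
SELECT c.Id, c.FirstName, c.LastName, c.Add1
FROM Customers c
WHERE EXISTS (
    SELECT 1
    FROM Customers x
    WHERE x.Id <> c.Id
      AND x.FirstName = c.FirstName
      AND x.LastName = c.LastName
      -- Same address, or exactly one of the two addresses is missing.
      AND (x.Add1 = c.Add1
           OR (x.Add1 IS NULL) <> (c.Add1 IS NULL))
      -- Only pairs that involve the record we are inspecting.
      AND (x.Id = :cid OR c.Id = :cid)
)
ORDER BY c.Id
"""


def find_dupes(conn, cid):
    """Return the rows that duplicate, or are duplicated by, record `cid`."""
    return conn.execute(DUPES_FOR_ID, {"cid": cid}).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Customers (Id INTEGER PRIMARY KEY, "
    "FirstName TEXT, LastName TEXT, Add1 TEXT)"
)
conn.executemany(
    "INSERT INTO Customers (FirstName, LastName, Add1) VALUES (?, ?, ?)",
    CUSTOMERS,
)

for row in find_dupes(conn, 3):
    print(row)
```

Asking for record 3 (the John Doe with no address) brings back records 1, 2 and 3, while asking for record 5 brings back nothing, since its only name match has a different, non-empty address.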
Assuming you have a unique id between records, you can give this a try:
CREATE TABLE #Customers ( FirstName varchar(50), LastName varchar(50), Add1 varchar(50), Id int IDENTITY(1,1) );
INSERT INTO #Customers ( FirstName, LastName, Add1 ) VALUES
( 'John', 'Doe', '123 Anywhere Ln' ),
( 'John', 'Doe', '123 Anywhere Ln' ),
( 'John', 'Doe', NULL ),
( 'John C.', 'Doe', '123 Anywhere Ln' ),
( 'John C.', 'Doe', '15673 SW Liar Dr' );
SELECT
*
FROM #Customers c
WHERE
-- Exact duplicates.
EXISTS (
SELECT * FROM #Customers x WHERE
x.FirstName = c.FirstName
AND x.LastName = c.LastName
AND x.Add1 = c.Add1
AND x.Id <> c.Id
)
-- Match First/Last name are same/similar and the address is same.
OR EXISTS (
SELECT * FROM #Customers x WHERE
DIFFERENCE( x.FirstName, c.FirstName ) = 4
AND DIFFERENCE( x.LastName, c.LastName ) = 4
AND x.Add1 = c.Add1
AND x.Id <> c.Id
)
-- Match First/Last name and one address exists.
OR EXISTS (
SELECT * FROM #Customers x WHERE
x.FirstName = c.FirstName
AND x.LastName = c.LastName
AND x.Id <> c.Id
AND (
x.Add1 IS NULL AND c.Add1 IS NOT NULL
OR
x.Add1 IS NOT NULL AND c.Add1 IS NULL
)
);
Returns
+-----------+----------+-----------------+----+
| FirstName | LastName | Add1 | Id |
+-----------+----------+-----------------+----+
| John | Doe | 123 Anywhere Ln | 1 |
| John | Doe | 123 Anywhere Ln | 2 |
| John | Doe | NULL | 3 |
| John C. | Doe | 123 Anywhere Ln | 4 |
+-----------+----------+-----------------+----+
Initial resultset:
+-----------+----------+------------------+----+
| FirstName | LastName | Add1 | Id |
+-----------+----------+------------------+----+
| John | Doe | 123 Anywhere Ln | 1 |
| John | Doe | 123 Anywhere Ln | 2 |
| John | Doe | NULL | 3 |
| John C. | Doe | 123 Anywhere Ln | 4 |
| John C. | Doe | 15673 SW Liar Dr | 5 |
+-----------+----------+------------------+----+
I have a table that contains Home addresses and Mailing addresses. It looks like this:
ID Name StNum StName City State Zip Type
-- ---- ----- ------ ---- ----- --- ----
1 Joe 1234 Main St Waco TX 76767 HOM
1 Joe 2345 High St Waco TX 76763 MLG
2 Amy 3456 Broad St Athens GA 34622 HOM
3 Mel 987 Front St Cary NC 65331 HOM
3 Mel 1111 Main Ave Hilo HI 99779 MLG
I need to write an SQL statement that will only return the Mailing address (MLG record) if it exists, and if not, will return the Home address (HOM record).
The expected results from this table would be:
ID Name StNum StName City State Zip Type
-- ---- ----- ------ ---- ----- --- ----
1 Joe 2345 High St Waco TX 76763 MLG
2 Amy 3456 Broad St Athens GA 34622 HOM
3 Mel 1111 Main Ave Hilo HI 99779 MLG
Any help that you can provide would be much appreciated! Thanks!
use correlated subquery
select * from
(
select *,case when Type='MLG' then 1 else 0 end as typeval
from tablename
)A where typeval in (select max(case when Type='MLG' then 1 else 0 end) from tablename b
where a.name=b.name)
Or, if your DB supports row_number(), you can try the query below:
select * from
(
select *, row_number() over(partition by name order by case when Type='MLG' then 1 else 0 end desc) as rn
from tablename
)A where rn=1
In case you are using SQL Server, I would solve it with the ROW_NUMBER function.
SELECT ID, Name, StNum, StName, City, State, Zip, Type
FROM (
SELECT *
,ROW_NUMBER() OVER (PARTITION BY ID ORDER BY Type DESC) AS Rn
FROM yourtable
) AS ranked -- SQL Server requires an alias on a derived table
WHERE Rn = 1
This can be done using a WHERE clause that excludes the ids of the users who have an MLG record.
Schema (MySQL v5.7)
CREATE TABLE test (
`ID` INTEGER,
`Name` VARCHAR(3),
`StNum` INTEGER,
`StName` VARCHAR(8),
`City` VARCHAR(6),
`State` VARCHAR(2),
`Zip` INTEGER,
`Type` VARCHAR(3)
);
INSERT INTO test
(`ID`, `Name`, `StNum`, `StName`, `City`, `State`, `Zip`, `Type`)
VALUES
('1', 'Joe', '1234', 'Main St', 'Waco', 'TX', '76767', 'HOM'),
('1', 'Joe', '2345', 'High St', 'Waco', 'TX', '76763', 'MLG'),
('2', 'Amy', '3456', 'Broad St', 'Athens', 'GA', '34622', 'HOM'),
('3', 'Mel', '987', 'Front St', 'Cary', 'NC', '65331', 'HOM'),
('3', 'Mel', '1111', 'Main Ave', 'Hilo', 'HI', '99779', 'MLG');
Query #1
SELECT id,
name,
StNum,
StName,
City,
State,
Zip,
Type
FROM test t1
WHERE t1.`Type` = 'MLG'
OR t1.id NOT IN
(
SELECT id
FROM test t2
WHERE t2.`Type` = 'MLG'
);
Output :
| id | name | StNum | StName | City | State | Zip | Type |
| --- | ---- | ----- | -------- | ------ | ----- | ----- | ---- |
| 1 | Joe | 2345 | High St | Waco | TX | 76763 | MLG |
| 2 | Amy | 3456 | Broad St | Athens | GA | 34622 | HOM |
| 3 | Mel | 1111 | Main Ave | Hilo | HI | 99779 | MLG |
View on DB Fiddle
Or, my first dumb version :
This can be done using UNION
Query #1
SELECT id,
name,
StNum,
StName,
City,
State,
Zip,
Type
FROM test t1
WHERE t1.`Type` = 'MLG'
UNION ALL
SELECT id,
name,
StNum,
StName,
City,
State,
Zip,
Type
FROM test t2
WHERE t2.id NOT IN (SELECT id FROM test t3 WHERE t3.`Type` = 'MLG')
ORDER BY id;
Output
| id | name | StNum | StName | City | State | Zip | Type |
| --- | ---- | ----- | -------- | ------ | ----- | ----- | ---- |
| 1 | Joe | 2345 | High St | Waco | TX | 76763 | MLG |
| 2 | Amy | 3456 | Broad St | Athens | GA | 34622 | HOM |
| 3 | Mel | 1111 | Main Ave | Hilo | HI | 99779 | MLG |
View on DB Fiddle
This is a prioritization query. With two values, often the simplest method is union all with not exists (or not in).
That does not generalize well. For more values, using row_number() with case is convenient:
select t.*
from (select t.*,
row_number() over (partition by id
order by (case when type = 'MLG' then 1 else 2 end)
) as seqnum
from t
) t
where seqnum = 1;
In your particular case, you could use order by type desc, because the two types happen to be prioritized in reverse alphabetical ordering. However, I recommend using case because the intention is more explicit.
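To illustrate why the explicit case is safer, here is a small sketch run through Python's built-in sqlite3 module (assuming SQLite 3.25 or later for window functions; this is not the asker's database). The WRK address type and Amy's extra row are hypothetical additions, made up only to show what happens once a third type appears.

```python
import sqlite3

ADDRESSES = [
    (1, "Joe", 1234, "Main St", "Waco", "TX", "76767", "HOM"),
    (1, "Joe", 2345, "High St", "Waco", "TX", "76763", "MLG"),
    (2, "Amy", 3456, "Broad St", "Athens", "GA", "34622", "HOM"),
    (2, "Amy", 77, "Work Rd", "Athens", "GA", "34622", "WRK"),  # hypothetical row
    (3, "Mel", 987, "Front St", "Cary", "NC", "65331", "HOM"),
    (3, "Mel", 1111, "Main Ave", "Hilo", "HI", "99779", "MLG"),
]

# Explicit priority: MLG first, then HOM, then anything else.
BY_CASE = """
SELECT ID, Name, Type
FROM (SELECT a.*,
             ROW_NUMBER() OVER (
                 PARTITION BY ID
                 ORDER BY CASE Type WHEN 'MLG' THEN 1
                                    WHEN 'HOM' THEN 2
                                    ELSE 3 END
             ) AS seqnum
      FROM addr a) AS ranked
WHERE seqnum = 1
ORDER BY ID
"""

# Shortcut that relies on reverse alphabetical order of the type codes.
BY_TYPE_DESC = """
SELECT ID, Name, Type
FROM (SELECT a.*,
             ROW_NUMBER() OVER (PARTITION BY ID ORDER BY Type DESC) AS seqnum
      FROM addr a) AS ranked
WHERE seqnum = 1
ORDER BY ID
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE addr (ID INTEGER, Name TEXT, StNum INTEGER, StName TEXT, "
    "City TEXT, State TEXT, Zip TEXT, Type TEXT)"
)
conn.executemany("INSERT INTO addr VALUES (?, ?, ?, ?, ?, ?, ?, ?)", ADDRESSES)

by_case = conn.execute(BY_CASE).fetchall()
by_type_desc = conn.execute(BY_TYPE_DESC).fetchall()
print(by_case)
print(by_type_desc)
```

With the case expression Amy still gets her HOM record, whereas the order by type desc shortcut picks the made-up WRK row, because 'WRK' sorts after both 'MLG' and 'HOM'.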
I have a business case scenario where I need to do a lookup into our SQL "Users" table to find out email addresses which are duplicated. I was able to do that by the below query:
SELECT
user_email, COUNT(*) as DuplicateEmails
FROM
Users
GROUP BY
user_email
HAVING
COUNT(*) > 1
ORDER BY
DuplicateEmails DESC
I get an output like this:
user_email DuplicateEmails
--------------------------------
abc@gmail.com 2
xyz@yahoo.com 3
Now I am asked to list each duplicate record on a row of its own and display some additional properties such as first name, last name and userID. All of this information is stored in the same "Users" table. I am having difficulty doing so. Can anyone help me or point me in the right direction?
My output needs to look like this:
user_email DuplicateEmails FirstName LastName UserID
------------------------------------------------------------------------------
abc@gmail.com 2 Tim Lentil timLentil
abc@gmail.com 2 John Doe johnDoe12
xyz@yahoo.com 3 brian boss brianTheBoss
xyz@yahoo.com 3 Thomas Hood tHood
xyz@yahoo.com 3 Mark Brown MBrown12
There are several ways you could do this. Here is one using a cte.
with FoundDuplicates as
(
SELECT
user_email, COUNT(*) as DuplicateEmails
FROM
Users
GROUP BY
user_email
HAVING
COUNT(*) > 1
)
select fd.user_email
, fd.DuplicateEmails
, u.FirstName
, u.LastName
, u.UserID
from Users u
join FoundDuplicates fd on fd.user_email = u.user_email
ORDER BY fd.DuplicateEmails DESC
Use count() over( Partition by ), example
You can solve it like:
CREATE TABLE #T
(
UserID VARCHAR(20),
FirstName NVARCHAR(45),
LastName NVARCHAR(45),
UserMail VARCHAR(45)
);
INSERT INTO #T (UserMail, FirstName, LastName, UserID) VALUES
('abc@gmail.com', 'Tim', 'Lentil', 'timLentil'),
('abc@gmail.com', 'John', 'Doe', 'johnDoe12'),
('xyz@yahoo.com', 'brian', 'boss', 'brianTheBoss'),
('xyz@yahoo.com', 'Thomas', 'Hood', 'tHood'),
('xyz@yahoo.com', 'Mark', 'Brown', 'MBrown12');
SELECT *, COUNT (1) OVER (PARTITION BY UserMail) MailCount
FROM #T;
Results:
+--------------+-----------+----------+---------------+-----------+
| UserID | FirstName | LastName | UserMail | MailCount |
+--------------+-----------+----------+---------------+-----------+
| timLentil | Tim | Lentil | abc@gmail.com | 2 |
| johnDoe12 | John | Doe | abc@gmail.com | 2 |
| brianTheBoss | brian | boss | xyz@yahoo.com | 3 |
| tHood | Thomas | Hood | xyz@yahoo.com | 3 |
| MBrown12 | Mark | Brown | xyz@yahoo.com | 3 |
+--------------+-----------+----------+---------------+-----------+
Use a window function like this:
SELECT u.*
FROM (SELECT u.*, COUNT(*) OVER (PARTITION BY user_email) as numDuplicateEmails
FROM Users u
) u
WHERE numDuplicateEmails > 1
ORDER BY numDuplicateEmails DESC;
I think this will also work, although it returns only the second and later rows for each email, not every row in the group.
WITH cte AS (
SELECT
*
,DuplicateEmails = ROW_NUMBER() OVER (Partition BY user_email ORder by user_email)
FROM Users
)
Select * from CTE
where DuplicateEmails > 1
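To make the difference between the two window functions concrete, here is a small sketch using Python's built-in sqlite3 module (assuming SQLite 3.25 or later; it is not SQL Server). The Users layout follows the question, and the last sample row is a hypothetical non-duplicate added so the filter has something to exclude.

```python
import sqlite3

USERS = [
    ("timLentil", "Tim", "Lentil", "abc@gmail.com"),
    ("johnDoe12", "John", "Doe", "abc@gmail.com"),
    ("brianTheBoss", "brian", "boss", "xyz@yahoo.com"),
    ("tHood", "Thomas", "Hood", "xyz@yahoo.com"),
    ("MBrown12", "Mark", "Brown", "xyz@yahoo.com"),
    ("soloUser", "Ann", "Only", "solo@example.com"),  # hypothetical row
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Users (UserID TEXT, FirstName TEXT, "
    "LastName TEXT, user_email TEXT)"
)
conn.executemany("INSERT INTO Users VALUES (?, ?, ?, ?)", USERS)

# COUNT(*) OVER keeps every row of a duplicated email.
every_duplicate = conn.execute("""
    SELECT user_email, DuplicateEmails, FirstName, LastName, UserID
    FROM (SELECT u.*,
                 COUNT(*) OVER (PARTITION BY user_email) AS DuplicateEmails
          FROM Users u) AS counted
    WHERE DuplicateEmails > 1
    ORDER BY DuplicateEmails DESC, user_email, UserID
""").fetchall()

# ROW_NUMBER() > 1 drops the first row of each group, so it lists only
# the "extra" rows.
extras_only = conn.execute("""
    SELECT user_email, UserID
    FROM (SELECT u.*,
                 ROW_NUMBER() OVER (PARTITION BY user_email
                                    ORDER BY UserID) AS rn
          FROM Users u) AS numbered
    WHERE rn > 1
""").fetchall()

print(len(every_duplicate), "rows with COUNT(*) OVER")
print(len(extras_only), "rows with ROW_NUMBER() > 1")
```

The COUNT(*) OVER form matches the output the question asks for (all five duplicated rows), while ROW_NUMBER() > 1 is more useful when the goal is to find the rows to delete.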
I have a table in SQL where the results look something like:
Number | Name | Name 2
1 | John | Derek
1 | John | NULL
2 | Jane | Louise
2 | Jane | NULL
3 | Michael | Mark
3 | Michael | NULL
4 | Sara | Paul
4 | Sara | NULL
I want a way to say that if Number=1, return Name 2 in new column Name 3, so that the results would look like:
Number | Name | Name 2 | Name 3
1 | John | Derek | Derek
1 | John | NULL | Derek
2 | Jane | Louise | Louise
2 | Jane | NULL | Louise
3 | Michael | Mark | Mark
3 | Michael | NULL | Mark
4 | Sara | Paul | Paul
4 | Sara | NULL | Paul
The problem is that I can't say if Number=1, return Name 2 in Name 3, because my table has >100,000 records. I need it to do it automatically. More like "if Number is the same, return Name 2 in Name 3." I've tried to use a CASE statement but haven't been able to figure it out. Is there any way to do this?
Empirically, this seems to work:
SELECT
Number, Name, [Name 2],
MAX([Name 2]) OVER (PARTITION BY Number) [Name 3]
FROM yourTable;
The idea here, if I interpreted your requirements correctly, is that you want to report the non NULL value of the second name for all records as the third name value.
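Here is a quick way to check that behaviour, sketched with Python's built-in sqlite3 module (assuming SQLite 3.25 or later; the table name sample and the Name2 column name without a space are assumptions for the demo). It relies on MAX() skipping NULLs, so every row in a Number group receives the one non-NULL second name.

```python
import sqlite3

SAMPLE = [
    (1, "John", "Derek"), (1, "John", None),
    (2, "Jane", "Louise"), (2, "Jane", None),
    (3, "Michael", "Mark"), (3, "Michael", None),
    (4, "Sara", "Paul"), (4, "Sara", None),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sample (Number INTEGER, Name TEXT, Name2 TEXT)")
conn.executemany("INSERT INTO sample VALUES (?, ?, ?)", SAMPLE)

# MAX() ignores NULLs, so the window returns the group's non-NULL Name2.
filled = conn.execute("""
    SELECT Number, Name, Name2,
           MAX(Name2) OVER (PARTITION BY Number) AS Name3
    FROM sample
    ORDER BY Number, Name2 IS NULL
""").fetchall()

for row in filled:
    print(row)
```

Note that if a Number group ever held two different non-NULL second names, MAX() would simply pick the alphabetically larger one.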
Solution 3, with group by
with maxi as(
SELECT Number, max(Name2) name3
FROM #sample
group by number, name
)
SELECT f1.*, f2.name3
FROM #sample f1 inner join maxi f2 on f1.number=f2.number
Solution 4, with cross apply
SELECT *
FROM #sample f1 cross apply
(
select top 1 f2.Name2 as Name3 from #sample f2
where f2.number=f1.number and f2.Name2 is not null
) f3
you can try this:
Solution 1, with row_number
create table #sample (Number integer, Name varchar(50), Name2 varchar(50))
insert into #sample
select 1 , 'John' , 'Derek' union all
select 1 , 'John' , NULL union all
select 2 , 'Jane' , 'Louise' union all
select 2 , 'Jane' , NULL union all
select 3 , 'Michael' , 'Mark' union all
select 3 , 'Michael' , NULL union all
select 4 , 'Sara' , 'Paul' union all
select 4 , 'Sara' , NULL ;
with tmp as (
select *, row_number() over(partition by number order by case when Name2 is null then 1 else 0 end) rang -- non-null Name2 gets rang = 1
from #sample
)
select f1.Number, f1.Name, f1.Name2, f2.Name2 as Name3
from tmp f1 inner join tmp f2 on f1.Number=f2.Number and f2.rang=1
Solution 2, with lag (if your sql server version has lag function)
SELECT
Number, Name, Name2,
isnull(Name2, lag(Name2) OVER (PARTITION BY Number order by number)) Name3
FROM #sample;
Okay, this is a little hard to explain, but I'll give it my best shot.
I've got two tables, we'll call them table1 and table2. table1 looks something like this:
ID | CampaignID | Package | GroupID
1 | 1 | 1 | 1
2 | 1 | 1 | 2
3 | 1 | 2 | 2
4 | 2 | 1 | 3
5 | 2 | 2 | 3
6 | 2 | 3 | 3
etc
Table2 looks something like this:
ID | ClientID | ClientName | Package | OrderID
1 | 1111 | John Smith | 1 | 155
2 | 1111 | John Smith | 2 | 155
4 | 2222 | Dave Jones | 1 | 177
5 | 2222 | Dave Jones | 2 | 178
6 | 2222 | Dave Jones | 3 | 179
What I'm trying to do is see if, for example, John Smith has any Orders with sets of packages that match one of the Campaign Groups in table1. For the above example, John Smith's order 155 would match CampaignID 1, GroupID 2. Dave Jones's order 177 matches CampaignID 1, GroupID 1. However, orders 178 and 179 don't match anything. So the set of packages in an order needs to contain all the packages for a given group in order to match it.
For the purposes of the select statement I have the client's id along with the orderID, and I'm simply trying to see if the packages in his order match the criteria for any campaigns.
I know I probably haven't explained this too well, so let me know what needs clarifying.
EDIT:
If lets say we search for orderID 155, clientID 1111, then the desired result would be:
CampaignID | GroupID
1 | 2
Perhaps given that GroupID 1 also qualifies, it could return the groupID that qualifies with the largest number of packages.
Is this what you want?
select table1.CampaignID from Table2
left join table1 on table1.Package =table2.Package
where Table2.ClientID =@ClientID
and Table2.OrderID =@OrderID
I think I figured it out:
SELECT top 1 pcr1.GroupID FROM table1 pcr1
Where Not Exists (Select pcr2.Package from table1 as pcr2
where CampaignID = @campaignID and pcr2.GroupID= pcr1.GroupID Except
Select t2.Package from table2 as t2 where t2.OrderID = @ordID)
GROUP BY pcr1.GroupID
ORDER BY COUNT(pcr1.GroupID) DESC
Given a known order and a campaign the client is trying to apply for, this gives me the group id that best matches the given order (in regard to packages), if any exists. And just because people seem to be asking, the way the tables appear here aren't exactly the way they're implemented, it's just a simplified equivalent for the purposes of this question.
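For anyone who wants to experiment, here is a rough sketch of the same "the order must contain every package of the group" check, written as a LEFT JOIN with two counts. It runs through Python's built-in sqlite3 module rather than SQL Server, and the Campaign and ClientOrder table names and the helper matching_groups are assumptions based on the simplified tables in the question.

```python
import sqlite3

CAMPAIGN = [  # (ID, CampaignID, Package, GroupID)
    (1, 1, 1, 1), (2, 1, 1, 2), (3, 1, 2, 2),
    (4, 2, 1, 3), (5, 2, 2, 3), (6, 2, 3, 3),
]
CLIENT_ORDER = [  # (ID, ClientID, ClientName, Package, OrderID)
    (1, 1111, "John Smith", 1, 155), (2, 1111, "John Smith", 2, 155),
    (4, 2222, "Dave Jones", 1, 177), (5, 2222, "Dave Jones", 2, 178),
    (6, 2222, "Dave Jones", 3, 179),
]

# A group qualifies when every one of its packages finds a match in the
# order, i.e. the LEFT JOIN produced no unmatched (NULL) rows.
MATCHING_GROUPS = """
SELECT c.CampaignID, c.GroupID, COUNT(*) AS PackageCount
FROM Campaign c
LEFT JOIN ClientOrder o
       ON o.Package = c.Package AND o.OrderID = :order_id
GROUP BY c.CampaignID, c.GroupID
HAVING COUNT(*) = COUNT(o.Package)
ORDER BY PackageCount DESC
"""


def matching_groups(conn, order_id):
    """Return (CampaignID, GroupID, PackageCount), largest group first."""
    return conn.execute(MATCHING_GROUPS, {"order_id": order_id}).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Campaign (ID INTEGER, CampaignID INTEGER, "
    "Package INTEGER, GroupID INTEGER)"
)
conn.execute(
    "CREATE TABLE ClientOrder (ID INTEGER, ClientID INTEGER, "
    "ClientName TEXT, Package INTEGER, OrderID INTEGER)"
)
conn.executemany("INSERT INTO Campaign VALUES (?, ?, ?, ?)", CAMPAIGN)
conn.executemany("INSERT INTO ClientOrder VALUES (?, ?, ?, ?, ?)", CLIENT_ORDER)

for order_id in (155, 177, 178):
    print(order_id, matching_groups(conn, order_id))
```

Taking the first row of the result gives the qualifying group with the most packages, which is the tie-break suggested in the question's edit.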
Here is my LinkedIn article with a more in-depth explanation:
https://www.linkedin.com/pulse/profile-matching-mike-inman?lipi=urn%3Ali%3Apage%3Ad_flagship3_profile_view_base_post_details%3Bqj89uO7mTSyVtHet9544VA%3D%3D
CREATE TABLE Campaign(
ID INT,
CampaignID INT,
Package INT,
GroupID INT)
CREATE TABLE ClientOrder(
ID INT,
ClientID INT,
ClientName VARCHAR(20),
Package INT,
OrderID INT)
INSERT Campaign
(ID, CampaignID, Package, GroupID)
VALUES
(1, 1, 1, 1)
,(2, 1, 1, 2)
,(3, 1, 2, 2)
,(4, 2, 1, 3)
,(5, 2, 2, 3)
,(6, 2, 3, 3)
INSERT ClientOrder
(ID, ClientID, ClientName, Package, OrderID)
VALUES
(1, 1111, 'John Smith', 1 , 155)
,(2, 1111, 'John Smith', 2 , 155)
,(4, 2222, 'Dave Jones', 1 , 177)
,(5, 2222, 'Dave Jones', 2 , 178)
,(6, 2222, 'Dave Jones', 3 , 179)
The query:
SELECT CO.ClientName, CampaignID
FROM ClientOrder AS CO
JOIN Campaign as C1
ON CO.Package = C1.Package
GROUP BY CO.ClientName, CampaignID
HAVING COUNT(CO.Package) = (
SELECT COUNT(Package)
FROM Campaign C2
WHERE C1.CampaignID = C2.CampaignID)
Results:
ClientName CampaignID
-------------------- -----------
Dave Jones 1
John Smith 1
Dave Jones 2