I have a table from a third party source that I have no control over. Each address field contains the FULL address including postcode. Each address has a dynamic number of separate address fields included. I have got a Split() TVF that works in splitting the address fields into seperate rows. However, when I CROSS APPLY the TVF table back to the SELECT query it returns a row for each ro in the TVF table. How do I get it to return one row for the main table and separate COLUMNS from the rows in the TVF table?
Example of addresses supplied:
a. 1 The Street, The Locality, The Town, The County, The Postcode
b. 2 The Building, The Street, The Town, The Postcode
c. Floor 3, 3 The Street, The Locality, The Town, The County, The Postcode
The TVF returns these as one value per row using the ',' as a delimiter. I then need to join that data back to the original data as one column per address field.
This is my SELECT QUERY:
select DISTINCT TOP 5 ttp.ProjectID
,cac.ID
,cac1.Proprietor1Address1
--,CASE WHEN addr.ID = 1 THEN addr.Data END AS Address1
FROM ArdentTest.ardent.LRTitlesToProcess ttp
JOIN LandRegistryData.landreg.CommercialandCorporateOwnershipData cac
ON ttp.TitleNo = cac.TitleNumber
JOIN (SELECT TitleNumber
,Proprietor1Address1
FROM LandRegistryData.landreg.CommercialandCorporateOwnershipData
WHERE 1 = 1
AND ISNULL(Proprietor1Address1, '') <> '') cac1
ON ttp.TitleNo = cac1.TitleNumber
CROSS APPLY DBAdmin.resource.Split(cac1.Proprietor1Address1, ', ') addr
WHERE 1 = 1
AND ttp.DateLRRequestSent IS NULL
AND cac.ID IN (50764, 78800, 157089, 206049, 449112)
ORDER BY 1
Which produces the following results:
ProjectID ID Proprietor1Address1
1010 50764 Bridge House, 1 Walnut Tree Close, Guildford, Surrey GU1 4LZ
1010 78800 Bridge House, 1 Walnut Tree Close, Guildford, Surrey GU1 4LZ
1010 157089 Bridge House, 1 Walnut Tree Close, Guildford, Surrey GU1 4LZ
1010 206049 Bridge House, 1 Walnut Tree Close, Guildford, Surrey GU1 4LZ
1010 449112 Church House, Great Smith Street, London SW1P 3AZ
I need to use the rows from the function to add separate address columns to the result set and I cannot figure out how to do it.
If you can make some assumption like an address can contain max 10 parts. You can use this script. I used STRING_SPLIT instead of your split function.
DECLARE #Table TABLE (Id INT, Address VARCHAR(500))
INSERT INTO #Table VALUES
(1, '1 The Street, The Locality, The Town, The County, The Postcode'),
(2, '2 The Building, The Street, The Town, The Postcode'),
(3, 'Floor 3, 3 The Street, The Locality, The Town, The County, The Postcode')
SELECT * FROM #Table T
CROSS APPLY(
SELECT * FROM (SELECT ROW_NUMBER()OVER(ORDER BY (SELECT NULL)) RN, * FROM STRING_SPLIT(T.Address, ',')) SRC
PIVOT (MAX(value) FOR RN IN ([1],[2],[3],[4],[5],[6],[7],[8],[9],[10])) PVT
) X
Result:
Id Address 1 2 3 4 5 6 7 8 9 10
----------- -------------------------------------------------------------------------------- --------------------- --------------------- --------------------- --------------------- --------------------- --------------------- --------------------- --------------------- --------------------- ---------------------
1 1 The Street, The Locality, The Town, The County, The Postcode 1 The Street The Locality The Town The County The Postcode NULL NULL NULL NULL NULL
2 2 The Building, The Street, The Town, The Postcode 2 The Building The Street The Town The Postcode NULL NULL NULL NULL NULL NULL
3 Floor 3, 3 The Street, The Locality, The Town, The County, The Postcode Floor 3 3 The Street The Locality The Town The County The Postcode NULL NULL NULL NULL
You can create an XML out of your concatenated string and address each part via index
--the table (thx Serkan, I copied your DDL)
DECLARE #Table TABLE (Id INT, Address VARCHAR(500))
INSERT INTO #Table VALUES
(1, '1 The Street, The Locality, The Town, The County, The Postcode'),
(2, '2 The Building, The Street, The Town, The Postcode'),
(3, 'Floor 3, 3 The Street, The Locality, The Town, The County, The Postcode');
--This query will return the parts one by one
WITH Splitted AS
(
SELECT *
,CAST('<x>' + REPLACE((SELECT [Address] AS [*] FOR XML PATH('')),',','</x><x>') + '</x>' AS XML) AsXml
FROM #Table
)
SELECT *
,AsXml.value('/x[1]','nvarchar(max)') AS Col1
,AsXml.value('/x[2]','nvarchar(max)') AS Col2
,AsXml.value('/x[3]','nvarchar(max)') AS Col3
,AsXml.value('/x[4]','nvarchar(max)') AS Col4
,AsXml.value('/x[5]','nvarchar(max)') AS Col5
,AsXml.value('/x[6]','nvarchar(max)') AS Col6
FROM Splitted
--the result
/*+----+----------------+--------------+--------------+--------------+--------------+--------------+
| ID | Col1 | Col2 | Col3 | Col4 | Col5 | Col6 |
+----+----------------+--------------+--------------+--------------+--------------+--------------+
| 1 | 1 The Street | The Locality | The Town | The County | The Postcode | NULL |
+----+----------------+--------------+--------------+--------------+--------------+--------------+
| 2 | 2 The Building | The Street | The Town | The Postcode | NULL | NULL |
+----+----------------+--------------+--------------+--------------+--------------+--------------+
| 3 | Floor 3 | 3 The Street | The Locality | The Town | The County | The Postcode |
+----+----------------+--------------+--------------+--------------+--------------+--------------+*/
--You might use REVERSE on all steps. This will bring the Postcode in the first place in all cases (as long as the postcode is the last element there)
WITH Splitted AS
(
SELECT *
,CAST('<x>' + REPLACE((SELECT REVERSE([Address]) AS [*] FOR XML PATH('')),',','</x><x>') + '</x>' AS XML) AsXml
FROM #Table
)
SELECT *
,REVERSE(AsXml.value('/x[1]','nvarchar(max)')) AS Col1
,REVERSE(AsXml.value('/x[2]','nvarchar(max)')) AS Col2
,REVERSE(AsXml.value('/x[3]','nvarchar(max)')) AS Col3
,REVERSE(AsXml.value('/x[4]','nvarchar(max)')) AS Col4
,REVERSE(AsXml.value('/x[5]','nvarchar(max)')) AS Col5
,REVERSE(AsXml.value('/x[6]','nvarchar(max)')) AS Col6
FROM Splitted;
--the result
/*+----+--------------+------------+------------+----------------+--------------+---------+
| ID | Col1 | Col2 | Col3 | Col4 | Col5 | Col6 |
+----+--------------+------------+------------+----------------+--------------+---------+
| 1 | The Postcode | The County | The Town | The Locality | 1 The Street | NULL |
+----+--------------+------------+------------+----------------+--------------+---------+
| 2 | The Postcode | The Town | The Street | 2 The Building | NULL | NULL |
+----+--------------+------------+------------+----------------+--------------+---------+
| 3 | The Postcode | The County | The Town | The Locality | 3 The Street | Floor 3 |
+----+--------------+------------+------------+----------------+--------------+---------+*/
Related
I have a SQL table with FirstName, LastName, Add1 and other fields. I am working to get this data cleaned up. There are a few instances of likely dupes -
All 3 columns are the exact same for more than 1 record
The First and Last are the same, only 1 has an address, the other is blank
The First and Last are similar (John | Doe vs John C. | Doe) and the address is the same or one is blank
I'm wanting to generate a query I can provide to the users, so they can check these records out, compare their related records and then delete the one they don't need.
I've been looking at similarity functions, soundex, and such, but it all seems so complicated. Is there an easy way to do this?
Thanks!
Edit:
So here is some sample data:
FirstName | LastName | Add1
John | Doe | 1 Main St
John | Doe |
John A. | Doe |
Jane | Doe | 2 Union Ave
Jane B. | Doe | 2 Union Ave
Alex | Smith | 3 Broad St
Chris | Anderson | 4 South Blvd
Chris | Anderson | 4 South Blvd
I really like Critical Error's query for identifying all different types of dupes. That would give me the above sample data, with the Alex Smith result not included, because there are no dupes for that.
What I want to do is take that result set and identify which are dupes for Jane Doe. She should only have 2 dupes. John Doe has 3, and Chris Anderson has 2. Can I get at that sub-result set?
Edit:
I figured it out! I will be marking Critical Error's answer as the solution, since it totally got me where I needed to go. Here is the solution, in case it might help others. Basically, this is what we are doing.
Selecting the records from the table where there are dupes
Adding a WHERE EXISTS sub-query to look in the same table for exact dupes, where the ID from the main query and sub-query do not match
Adding a WHERE EXISTS sub-query to look in the same table for similar dupes, using a Difference factor between duplicative columns, where the ID from the main query and sub-query do not match
Adding a WHERE EXISTS sub-query to look in the same table for dupes on 2 fields where a 3rd may be null for one of the records, where the ID from the main query and sub-query do not match
Each subquery is connected with an OR, so that any kind of duplicate is found
At the end of each sub-query add a nested requirement that either the main query or sub-query be the ID of the record you are looking to identify duplicates for.
DECLARE #CID AS INT
SET ANSI_NULLS ON
SET NOCOUNT ON;
SET #CID = 12345
BEGIN
SELECT
*
FROM #Customers c
WHERE
-- Exact duplicates.
EXISTS (
SELECT * FROM #Customers x WHERE
x.FirstName = c.FirstName
AND x.LastName = c.LastName
AND x.Add1 = c.Add1
AND x.Id <> c.Id
AND (x.ID = #CID OR c.ID = #CID)
)
-- Match First/Last name are same/similar and the address is same.
OR EXISTS (
SELECT * FROM #Customers x WHERE
DIFFERENCE( x.FirstName, c.FirstName ) = 4
AND DIFFERENCE( x.LastName, c.LastName ) = 4
AND x.Add1 = c.Add1
AND x.Id <> c.Id
AND (x.ID = #CID OR c.ID = #CID)
)
-- Match First/Last name and one address exists.
OR EXISTS (
SELECT * FROM #Customers x WHERE
x.FirstName = c.FirstName
AND x.LastName = c.LastName
AND x.Id <> c.Id
AND (
x.Add1 IS NULL AND c.Add1 IS NOT NULL
OR
x.Add1 IS NOT NULL AND c.Add1 IS NULL
)
AND (x.ID = #CID OR c.ID = #CID)
);
Assuming you have a unique id between records, you can give this a try:
DECLARE #Customers table ( FirstName varchar(50), LastName varchar(50), Add1 varchar(50), Id int IDENTITY(1,1) );
INSERT INTO #Customers ( FirstName, LastName, Add1 ) VALUES
( 'John', 'Doe', '123 Anywhere Ln' ),
( 'John', 'Doe', '123 Anywhere Ln' ),
( 'John', 'Doe', NULL ),
( 'John C.', 'Doe', '123 Anywhere Ln' ),
( 'John C.', 'Doe', '15673 SW Liar Dr' );
SELECT
*
FROM #Customers c
WHERE
-- Exact duplicates.
EXISTS (
SELECT * FROM #Customers x WHERE
x.FirstName = c.FirstName
AND x.LastName = c.LastName
AND x.Add1 = c.Add1
AND x.Id <> c.Id
)
-- Match First/Last name are same/similar and the address is same.
OR EXISTS (
SELECT * FROM #Customers x WHERE
DIFFERENCE( x.FirstName, c.FirstName ) = 4
AND DIFFERENCE( x.LastName, c.LastName ) = 4
AND x.Add1 = c.Add1
AND x.Id <> c.Id
)
-- Match First/Last name and one address exists.
OR EXISTS (
SELECT * FROM #Customers x WHERE
x.FirstName = c.FirstName
AND x.LastName = c.LastName
AND x.Id <> c.Id
AND (
x.Add1 IS NULL AND c.Add1 IS NOT NULL
OR
x.Add1 IS NOT NULL AND c.Add1 IS NULL
)
);
Returns
+-----------+----------+-----------------+----+
| FirstName | LastName | Add1 | Id |
+-----------+----------+-----------------+----+
| John | Doe | 123 Anywhere Ln | 1 |
| John | Doe | 123 Anywhere Ln | 2 |
| John | Doe | NULL | 3 |
| John C. | Doe | 123 Anywhere Ln | 4 |
+-----------+----------+-----------------+----+
Initial resultset:
+-----------+----------+------------------+----+
| FirstName | LastName | Add1 | Id |
+-----------+----------+------------------+----+
| John | Doe | 123 Anywhere Ln | 1 |
| John | Doe | 123 Anywhere Ln | 2 |
| John | Doe | NULL | 3 |
| John C. | Doe | 123 Anywhere Ln | 4 |
| John C. | Doe | 15673 SW Liar Dr | 5 |
+-----------+----------+------------------+----+
I want to change multiple rows into a single row based on Number and Type:
+------+---------+----------+-------+---------+-------+
| Name | Address | City | State | Number | Type |
+------+---------+----------+-------+---------+-------+
| XYZ | 123 Rd | New York | NY | 123D | Code1 |
| XYZ | 123 Rd | New York | NY | 56A45 | Code2 |
| XYZ | 123 Rd | New York | NY | D45256 | Code3 |
| XYZ | 123 Rd | New York | NY | 345TT | Code2 |
| XYZ | 123 Rd | New York | NY | 34561NN | Code2 |
| XYZ | 123 Rd | New York | NY | 84567YY | Code2 |
+------+---------+----------+-------+---------+-------+
Result
+------+---------+----------+-------+-------+-------+----------+-----------+----------+--------+
| Name | Address | City | State | code1 | Code2 | code2_II | Code2_III | Code2_IV | Code3 |
+------+---------+----------+-------+-------+-------+----------+-----------+----------+--------+
| XYZ | 123 Rd | New York | NY | 123D | 56A45 | 345TT | 34561NN | 84567YY | D45256 |
+------+---------+----------+-------+-------+-------+----------+-----------+----------+--------+
Query I used only work for Code2 and Code2_II but I want to bring Code2_III and Code2_IV
select
Name, Addresss, City, State,
min(Number) as Code1,
(case when min(Number) <> max(Number) then max(Number) end) as Code2
from
t
group by
Name, Addresss, City, State;
As Tim mentions in the comments to your question, this will be very difficult to do without using dynamic SQL.
This code is not exactly what you are looking for but should get you most of the way there.
CREATE TABLE #Addresses (
[Name] varchar(255),
[Address] varchar(255),
[City] varchar(255),
[State] varchar(255),
[Number] varchar(255),
[Type] varchar(255)
)
INSERT INTO #Addresses
VALUES
('XYZ','123 Rd','New York','NY','123D','Code1'),
('XYZ','123 Rd','New York','NY','56A45','Code2'),
('XYZ','123 Rd','New York','NY','D45256','Code3'),
('XYZ','123 Rd','New York','NY','345TT','Code2'),
('XYZ','123 Rd','New York','NY','34561NN','Code2'),
('XYZ','123 Rd','New York','NY','84567YY','Code2');
SELECT *
FROM (
SELECT
[Name], [Address], [City], [State], [Number]
, [Type] + '_' + LTRIM(STR(rn)) AS NewType
FROM (
SELECT *
,ROW_NUMBER() OVER(PARTITION BY [Name], [Type] ORDER BY [Number]) rn
FROM #Addresses
) a
) p
PIVOT (
MAX([Number])
FOR [NewType] IN ([Code1_1], [Code2_1], [Code2_2], [Code2_3], [Code2_4], [Code3_1])
) AS pvt
In short it is using ROW_NUMBER to figure out how many Numbers exist for a particular Type. It then uses a select statement to build a new name field called NewType.
From there it uses a PIVOT statement to break out the data into separate columns.
A few things to note.
As mentioned it is somewhat dynamic but you will have to use Dynamic SQL for the PIVOT part if you are going to run this in a larger query or have varying amount of values in the Type and Number fields. However if you know you will never have more than 4 of any code you can build the statement like this
SELECT *
FROM (
SELECT
[Name], [Address], [City], [State], [Number]
, [Type] + '_' + LTRIM(STR(rn)) AS NewType
FROM (
SELECT *
,ROW_NUMBER() OVER(PARTITION BY [Name], [Type] ORDER BY [Number]) rn
FROM #Addresses
) a
) p
PIVOT (
MAX([Number])
FOR [NewType] IN ([Code1_1], [Code1_2], [Code1_3], [Code1_4], [Code2_1], [Code2_2], [Code2_3], [Code2_4], [Code3_1], [Code3_2], [Code3_3], [Code3_4])
) AS pvt
Doing it this way will produce null fields for any code that doesn't have a value.
This will likely be slow with larger datasets.
Hope this gets you point in the right direction.
You can use conditional aggregation:
select Name, Addresss, City, State,
min(case when type = 'Code1' then number) as code1,
min(case when type = 'Code2' and seqnum = 1 then number) as code2,
min(case when type = 'Code2' and seqnum = 2 then number) as code2_II,
min(case when type = 'Code2' and seqnum = 3 then number) as code2_III,
min(case when type = 'Code2' and seqnum = 4 then number) as code2_IV,
min(case when type = 'Code3' and seqnum = 1 then number) as code3
from (select t.*,
row_number() over (partition by Name, Addresss, City, State, type order by type) as seqnum
from t
) t
group by Name, Addresss, City, State;
Note: This returns exactly six code columns. That seems to be what you expect. If you want a dynamic number of columns, you need to use dynamic SQL.
SELECT DISTINCT Name,
Address,
City,
STATE,
STUFF((
SELECT ',' + [tCode1].Number
FROM #TestTable [tCode1]
WHERE [tCode1].Type = 'Code1'
FOR XML PATH('')
), 1, 1, N'') [Code1],
STUFF((
SELECT ',' + [tCode2].Number
FROM #TestTable [tCode2]
WHERE [tCode2].Type = 'Code2'
FOR XML PATH('')
), 1, 1, N'') [Code2],
STUFF((
SELECT ',' + [tCode3].Number
FROM #TestTable [tCode3]
WHERE [tCode3].Type = 'Code3'
FOR XML PATH('')
), 1, 1, N'') [Code3]
FROM #TestTable;
I referenced this https://www.mssqltips.com/sqlservertip/2914/rolling-up-multiple-rows-into-a-single-row-and-column-for-sql-server-data/
To Rolling multiple rows into one
Result:
+------+---------+----------+-------+-------+-----------------------------+---------+
| Name | Address | City | State | Code1 | Code2 | Code3 |
+------+---------+----------+-------+-------+-----------------------------+---------+
| XYZ | 123 Rd | New York | NY | 123D | 56A45,345TT,34561NN,84567YY | D45256 |
+------+---------+----------+-------+-------+-----------------------------+---------+
I have a customer table with several hundred thousand records. There are a LOT of duplicates of varying degrees. I am trying to identify duplicate records with level of possibility of being a duplicate.
My source table has 7 fields and looks like this:
I look for duplicates, and put them into an intermediate table with the level of possibility, table name, and the customer number.
Intermediate Table
CREATE TABLE DataCheck (
id int identity(1,1),
reason varchar(100) DEFAULT NULL,
tableName varchar(100) DEFAULT NULL,
tableID varchar(100) DEFAULT NULL
)
Here is my code to identify and insert:
-- Match on Company, Contact, Address, City, and Phone
-- DUPE
INSERT INTO DataCheck
SELECT 'Duplicate','CUSTOMER',tcd.uid
FROM #tmpCoreData tcd
INNER JOIN
(SELECT
company,
fname,
lname,
add1,
city,
phone1,
COUNT(*) AS count
FROM #tmpCoreData
WHERE company <> ''
GROUP BY company, fname, lname, add1, city, phone1
HAVING COUNT(*) > 1) dl
ON dl.company = tcd.company
ORDER BY tcd.company
In this example, it would insert ids 101, 102
The problem is when I perform the next pass:
-- Match on Company, Address, City, Phone (Diff Contacts)
-- LIKELY DUPE
INSERT INTO DataCheck
SELECT 'Likely Duplicate','CUSTOMER',tcd.uid
FROM #tmpCoreData tcd
INNER JOIN
(SELECT
company,
add1,
city,
phone1,
COUNT(*) AS count
FROM #tmpCoreData
WHERE company <> ''
GROUP BY company, add1, city, phone1
HAVING COUNT(*) > 1) dl
ON dl.company = tcd.company
ORDER BY tcd.companyc
This pass would then insert, 101, 102 & 103.
The next pass drops the phone so it would insert 101, 102, 103, 104
The next pass would look for company only which would insert all 5.
I now have 14 entries into my intermediate table for 5 records.
How can I add an exclusion so the 2nd pass groups on the same Company, Address, City, Phone but DIFFERENT fname and lname. Then it should only insert 101 and 103
I considered adding a NOT IN (SELECT tableID FROM DataCheck) to ensure IDs aren't added multiple times, but on the 3rd of 4th pass it may find a duplicate and entered 700 records after the row it's a duplicate of, so you lose the context of it's a dupe of.
My output uses:
SELECT
dc.reason,
dc.tableName,
tcd.*
FROM DataCheck dc
INNER JOIN #tmpCoreData tcd
ON tcd.uid = dc.tableID
ORDER BY dc.id
And looks something like this, which is a bit confusing:
I'm going to challenge your perception of your issue, and instead propose that you calculate a simple "confidence score", which will also help you vastly simplify your results table:
WITH FirstCompany AS (SELECT custNo, company, fname, lname, add1, city, phone1
FROM(SELECT custNo, company, fname, lname, add1, city, phone1,
ROW_NUMBER() OVER(PARTITION BY company ORDER BY custNo) AS ordering
FROM CoreData) FC
WHERE ordering = 1)
SELECT RankMapping.description, Duplicate.custNo, Duplicate.company, Duplicate.fname, Duplicate.lname, Duplicate.add1, Duplicate.city, Duplicate.phone1
FROM (SELECT FirstCompany.custNo AS originalCustNo, Duplicate.*,
CASE WHEN FirstCompany.custNo = Duplicate.custNo THEN 1 ELSE 0 END
+ CASE WHEN FirstCompany.fname = Duplicate.fname AND FirstCompany.lname = Duplicate.lname THEN 1 ELSE 0 END
+ CASE WHEN FirstCompany.add1 = Duplicate.add1 AND FirstCompany.city = Duplicate.city THEN 1 ELSE 0 END
+ CASE WHEN FirstCompany.phone1 = Duplicate.phone1 THEN 1 ELSE 0 END
AS ranking
FROM FirstCompany
JOIN CoreData Duplicate
ON Duplicate.custNo >= FirstCompany.custNo
AND Duplicate.company = FirstCompany.company) Duplicate
JOIN (VALUES (4, 'original'),
(3, 'duplicate'),
(2, 'likely dupe'),
(1, 'possible dupe'),
(0, 'not likely dupe')) RankMapping(score, description)
ON RankMapping.score = Duplicate.ranking
ORDER BY Duplicate.originalCustNo, Duplicate.ranking DESC
SQL Fiddle Example
... which generates results that look like this:
| description | custNo | company | fname | lname | add1 | city | phone1 |
|-----------------|--------|----------|---------|--------|--------------|--------------|------------|
| original | 101 | ACME INC | JOHN | DOE | 123 ACME ST | LOONEY HILLS | 1231234567 |
| duplicate | 102 | ACME INC | JOHN | DOE | 123 ACME ST | LOONEY HILLS | 1231234567 |
| likely dupe | 103 | ACME INC | JANE | SMITH | 123 ACME ST | LOONEY HILLS | 1231234567 |
| possible dupe | 104 | ACME INC | BOB | DOLE | 123 ACME ST | LOONEY HILLS | 4564567890 |
| not likely dupe | 105 | ACME INC | JESSICA | RABBIT | 456 ROGER LN | WARNER | 4564567890 |
This code baselessly assumes that the smallest custNo is the "original", and assumes matches will be equivalent to solely that one, but it's completely possible to get other matches as well (just unnest the subquery in the CTE, and remove the row number).
I have a table that contains Home addresses and Mailing addresses. It looks like this:
ID Name StNum StName City State Zip Type
-- ---- ----- ------ ---- ----- --- ----
1 Joe 1234 Main St Waco TX 76767 HOM
1 Joe 2345 High St Waco TX 76763 MLG
2 Amy 3456 Broad St Athens GA 34622 HOM
3 Mel 987 Front St Cary NC 65331 HOM
3 Mel 1111 Main Ave Hilo HI 99779 MLG
I need to write an SQL statement that will only return the Mailing address (MLG record) if it exists, and if not, will return the Home address (HOM record).
The expected results from this table would be:
ID Name StNum StName City State Zip Type
-- ---- ----- ------ ---- ----- --- ----
1 Joe 2345 High St Waco TX 76763 MLG
2 Amy 3456 Broad St Athens GA 34622 HOM
3 Mel 1111 Main Ave Hilo HI 99779 MLG
Any help that you can provide would be much appreciated! Thanks!
use correlated subquery
select * from
(
select *,case when Type='MLG' then 1 else 0 end as typeval
from tablename
)A where typeval in (select max(case when Type='MLG' then 1 else 0 end) from tablename b
where a.name=b.name)
OR if your DB supports row_number() then u can try below -
select * from
(
select *, row_number() over(partition by name order by case when Type='MLG' then 1 else 0 end desc)
from tablename
)A where rn=1
In case you are using SQL Server, i would solve it with the ROW_NUMBER function.
SELECT ID, Name, StNum, StName, City, State, Zip, Type
FROM (
SELECT *
,ROW_NUMBER() OVER (PARTITION BY ID ORDER BY Type DESC) AS Rn
FROM yourtable
)
WHERE Rn = 1
This can be done using a WHERE clause that exclude the ids of the users who have MLG
Schema (MySQL v5.7)
CREATE TABLE test (
`ID` INTEGER,
`Name` VARCHAR(3),
`StNum` INTEGER,
`StName` VARCHAR(8),
`City` VARCHAR(6),
`State` VARCHAR(2),
`Zip` INTEGER,
`Type` VARCHAR(3)
);
INSERT INTO test
(`ID`, `Name`, `StNum`, `StName`, `City`, `State`, `Zip`, `Type`)
VALUES
('1', 'Joe', '1234', 'Main St', 'Waco', 'TX', '76767', 'HOM'),
('1', 'Joe', '2345', 'High St', 'Waco', 'TX', '76763', 'MLG'),
('2', 'Amy', '3456', 'Broad St', 'Athens', 'GA', '34622', 'HOM'),
('3', 'Mel', '987', 'Front St', 'Cary', 'NC', '65331', 'HOM'),
('3', 'Mel', '1111', 'Main Ave', 'Hilo', 'HI', '99779', 'MLG');
Query #1
SELECT id,
name,
StNum,
StName,
City,
State,
Zip,
Type
FROM test t1
WHERE t1.`Type` = 'MLG'
OR t1.id NOT IN
(
SELECT id
FROM test t2
WHERE t2.`Type` = 'MLG'
);
Output :
| id | name | StNum | StName | City | State | Zip | Type |
| --- | ---- | ----- | -------- | ------ | ----- | ----- | ---- |
| 1 | Joe | 2345 | High St | Waco | TX | 76763 | MLG |
| 2 | Amy | 3456 | Broad St | Athens | GA | 34622 | HOM |
| 3 | Mel | 1111 | Main Ave | Hilo | HI | 99779 | MLG |
View on DB Fiddle
Or, my first dumb version :
This can be done using UNION
Schema (MySQL v5.7)
CREATE TABLE test (
`ID` INTEGER,
`Name` VARCHAR(3),
`StNum` INTEGER,
`StName` VARCHAR(8),
`City` VARCHAR(6),
`State` VARCHAR(2),
`Zip` INTEGER,
`Type` VARCHAR(3)
);
INSERT INTO test
(`ID`, `Name`, `StNum`, `StName`, `City`, `State`, `Zip`, `Type`)
VALUES
('1', 'Joe', '1234', 'Main St', 'Waco', 'TX', '76767', 'HOM'),
('1', 'Joe', '2345', 'High St', 'Waco', 'TX', '76763', 'MLG'),
('2', 'Amy', '3456', 'Broad St', 'Athens', 'GA', '34622', 'HOM'),
('3', 'Mel', '987', 'Front St', 'Cary', 'NC', '65331', 'HOM'),
('3', 'Mel', '1111', 'Main Ave', 'Hilo', 'HI', '99779', 'MLG');
Query #1
SELECT id,
name,
StNum,
StName,
City,
State,
Zip,
Type
FROM test t1
WHERE t1.`Type` = 'MLG'
UNION ALL
SELECT id,
name,
StNum,
StName,
City,
State,
Zip,
Type
FROM test t2
WHERE t2.id NOT IN (SELECT id FROM test t3 WHERE t3.`Type` = 'MLG')
ORDER BY id;
Output
| id | name | StNum | StName | City | State | Zip | Type |
| --- | ---- | ----- | -------- | ------ | ----- | ----- | ---- |
| 1 | Joe | 2345 | High St | Waco | TX | 76763 | MLG |
| 2 | Amy | 3456 | Broad St | Athens | GA | 34622 | HOM |
| 3 | Mel | 1111 | Main Ave | Hilo | HI | 99779 | MLG |
View on DB Fiddle
This is a prioritization query. With two values, often the simplest method is union all with not exists (or not in).
That does not generalize well For more values, using row_number() with case is convenient:
select t.*
from (select t.*,
row_number() over (partition by id
order by (case when type = 'MLG' then 1 else 2 end)
) as seqnum
from t
) t
where seqnum = 1;
In your particular case, you could use order by type desc, because the two types happen to be prioritized in reverse alphabetical ordering. However, I recommend using case because the intention is more explicit.
My table is as below
ID | Name | City
------ | ------ | ------
1 | ABC | London, Paris
1 | ABC | Paris
1 | ABC | Japan
2 | XYZ | Delhi
2 | XYZ | Delhi, New York
My output needs to be like this:
ID | Name | City
------ | ------ | ------
1 | ABC | London, Paris, Japan
2 | XYZ | Delhi, New York
I see it as a 2 step process:
Concatenate all the unique cities for each ID and Name. Example: For ID 1 and Name ABC Cities would be London, Paris, Japan
Update the concatenated string by grouping the ID and Name.
I am able to do this for just one group, but how do I do this for all the different groups in the table?
Also, would cursors come into picture here when I want to update the string to all the rows matching the ID and Name.
Any help or idea on this would be appreciated.
You should consider normalizing your table first.
Here, you first want to convert all the comma separated values into separate rows and then, group them together using STUFF and FOR XML PATH.
with your_table (ID, name, City)
as (
select 1, 'ABC', 'London, Paris'
union all
select 1, 'ABC', 'Paris'
union all
select 1, 'ABC', 'Japan'
union all
select 2, 'XYZ', 'Delhi'
union all
select 2, 'XYZ', 'Delhi, New York'
), your_table_modified
as (
select distinct id, name, Split.a.value('.', 'varchar(100)') City
from (
select id, name, cast('<x>' + replace(City, ', ', '</x><x>') + '</x>' as xml) as x
from your_table
) t
cross apply x.nodes('/x') as Split(a)
)
select id, name, stuff((
select ', ' + city
from your_table_modified t2
where t.id = t2.id
for xml path(''), type
).value('(./text())[1]', 'varchar(max)')
, 1, 2, '')
from your_table_modified t
group by id, name;
Produces:
Try this:
select y.id,y.name,y.city from (
select id,name, case when Id=1 and name='ABC' then 'London,Paris,Japan'
when Id=2 and name='XYZ' then 'Delhi,Newyork'
end as CIty
,row_number () over (partition by id order by name asc) as rnk
from Yourtable
)y
where y.rnk=1