SQL - Separating a merged field into separate fields based on delimiters

I have a table with an nvarchar(max) column containing merged text like the following:
ID MyString
61 Team:Finance,Accounting,HR,Country:Global,
62 Country:Germany,
63 Team:Legal,
64 Team:Finance,Accounting,Country:Global,External:Tenants,Partners,
65 External:Vendors,
What I need is to create another table where, for each row, the Team, Country and External values are separated into 3 different columns.
Id Team Country External
61 Finance,Accounting,HR Global NULL
62 NULL Germany NULL
63 Legal NULL NULL
64 Finance,Accounting Global Tenants,Partners
65 NULL NULL Vendors
What is the most efficient way to do it? I tried STRING_SPLIT but couldn't manage it.
Any help would be appreciated.

Please try the following solution.
The data resembles JSON, so we will compose proper JSON via a few REPLACE() function calls.
SQL
-- DDL and sample data population, start
DECLARE @tbl TABLE (ID INT PRIMARY KEY, tokens NVARCHAR(MAX));
INSERT INTO @tbl (ID, tokens) VALUES
(61, 'Team:Finance,Accounting,HR,Country:Global,'),
(62, 'Country:Germany,'),
(63, 'Team:Legal,'),
(64, 'Team:Finance,Accounting,Country:Global,External:Tenants,Partners,'),
(65, 'External:Vendors,');
-- DDL and sample data population, end
SELECT *
FROM @tbl
CROSS APPLY OPENJSON('{"' + REPLACE(REPLACE(REPLACE(TRIM(',' FROM tokens), ':', '": "')
    , ',Country', '", "Country')
    , ',External', '", "External') + '"}')
WITH
(
    Team VARCHAR(100) '$.Team',
    Country VARCHAR(100) '$.Country',
    [External] VARCHAR(100) '$.External'
) AS u;
Output
+----+-------------------------------------------------------------------+-----------------------+---------+------------------+
| ID | tokens | Team | Country | External |
+----+-------------------------------------------------------------------+-----------------------+---------+------------------+
| 61 | Team:Finance,Accounting,HR,Country:Global, | Finance,Accounting,HR | Global | NULL |
| 62 | Country:Germany, | NULL | Germany | NULL |
| 63 | Team:Legal, | Legal | NULL | NULL |
| 64 | Team:Finance,Accounting,Country:Global,External:Tenants,Partners, | Finance,Accounting | Global | Tenants,Partners |
| 65 | External:Vendors, | NULL | NULL | Vendors |
+----+-------------------------------------------------------------------+-----------------------+---------+------------------+
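As a quick cross-check of the REPLACE chain outside SQL Server, the same string-to-JSON composition can be mirrored in Python (the function name tokens_to_record is mine, purely for illustration):

```python
import json

def tokens_to_record(tokens: str) -> dict:
    # Mirror the T-SQL: trim the trailing comma, then turn each
    # "Key:value" boundary into JSON key/value syntax.
    s = tokens.rstrip(',')
    s = s.replace(':', '": "')
    s = s.replace(',Country', '", "Country')
    s = s.replace(',External', '", "External')
    return json.loads('{"' + s + '"}')

print(tokens_to_record('Team:Finance,Accounting,HR,Country:Global,'))
# {'Team': 'Finance,Accounting,HR', 'Country': 'Global'}
```

Running it over the sample rows reproduces the Team/Country/External split shown in the output table above.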

Firstly, let me repeat my comments here. SQL Server is the last place you should be doing this; its string manipulation is poor and you have a severely denormalised design, with denormalised data containing denormalised data. Fixing your design to a normalised approach must be a priority, as leaving your data in this state is only going to make things harder the further you go down this rabbit hole.
One method you could use to achieve this, however, would be a JSON splitter and some string aggregation, but it is really ugly. Having both the "column" and "row" delimiter be a comma (,) makes this a complete mess, and I am not going to explain what it's doing because you just should not be doing this.
WITH YourTable AS(
SELECT *
FROM (VALUES(61,'Team:Finance,Accounting,HR,Country:Global,'),
(62,'Country:Germany,'),
(63,'Team:Legal,'),
(64,'Team:Finance,Accounting,Country:Global,External:Tenants,Partners,'),
(65,'External:Vendors,'))V(ID,MyString)),
PartiallyNormal AS(
SELECT YT.ID,
CONVERT(int,LEAD(OJC.[Key],1,OJC.[Key]) OVER (PARTITION BY ID ORDER BY OJC.[Key], OJV.[Key])) AS ColumnNo,
OJV.[value],
CONVERT(int,OJC.[key]) AS [key]
FROM YourTable YT
CROSS APPLY OPENJSON(CONCAT('["', REPLACE(YT.MyString,':','","'),'"]')) OJC
CROSS APPLY OPENJSON(CONCAT('["', REPLACE(OJC.[value],',','","'),'"]')) OJV),
WithNames AS(
SELECT ID,
ColumnNo,
[value],
[key],
FIRST_VALUE(PN.[Value]) OVER (PARTITION BY ID, ColumnNo ORDER BY [Key]) AS ColumnName
FROM PartiallyNormal PN)
SELECT ID,
TRIM(',' FROM STRING_AGG(CASE ColumnName WHEN 'Team' THEN NULLIF([value],'''') END,',') WITHIN GROUP (ORDER BY [key])) AS Team, --TRIM because I've not taken the time to work out why there are sometimes a trailing comma
TRIM(',' FROM STRING_AGG(CASE ColumnName WHEN 'Country' THEN NULLIF([value],'''') END,',') WITHIN GROUP (ORDER BY [key])) AS Country,
TRIM(',' FROM STRING_AGG(CASE ColumnName WHEN 'External' THEN NULLIF([value],'''') END,',') WITHIN GROUP (ORDER BY [key])) AS [External]
FROM WithNames WN
WHERE [value] <> [ColumnName]
GROUP BY ID
ORDER BY ID;
db<>fiddle

STRING_SPLIT in SQL Server 2017 doesn't tell us the order of the items in the list, so it can't be used here on its own.
Only SQL Server 2022 adds an optional ordinal parameter to STRING_SPLIT that reports each item's position.
Until that version, the most efficient method would likely be CLR: write your parser in C# and call it as a CLR function.

Another option is:
splitting the string using the STRING_SPLIT function on the colon
extracting consecutive strings using the LAG function
removing the string identifiers (Team, Country and External)
aggregating on the ID to remove NULL values
Here's the query:
WITH cte AS (
SELECT ID,
LAG(value) OVER(PARTITION BY ID ORDER BY (SELECT 1)) AS prev_value,
value
FROM tab
CROSS APPLY STRING_SPLIT(MyString, ':')
)
SELECT ID,
MAX(CASE WHEN prev_value LIKE 'Team'
THEN REPLACE(value, ',Country', '') END) AS [Team],
MAX(CASE WHEN prev_value LIKE '%Country'
THEN LEFT(value, LEN(value)-1) END) AS [Country],
MAX(CASE WHEN prev_value LIKE '%External'
THEN LEFT(value, LEN(value)-1) END) AS [External]
FROM cte
GROUP BY ID
Check the demo here.
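The LAG pairing idea above (each colon-split fragment belongs to the key name that ends the previous fragment) can be sketched in Python. This sketch generalises the cleanup step slightly, stripping whichever key marker trails a value; the function name is mine:

```python
def parse_row(my_string: str) -> dict:
    keys = ('Team', 'Country', 'External')
    parts = my_string.split(':')
    result = {}
    # zip(parts, parts[1:]) pairs each fragment with its predecessor,
    # mimicking LAG(value) over the split order.
    for prev, value in zip(parts, parts[1:]):
        key = next((k for k in keys if prev.endswith(k)), None)
        if key is None:
            continue
        for k in keys:  # strip any trailing ",Country"/",External" marker
            value = value.replace(',' + k, '')
        result[key] = value.rstrip(',')
    return result

print(parse_row('Team:Finance,Accounting,HR,Country:Global,'))
# {'Team': 'Finance,Accounting,HR', 'Country': 'Global'}
```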

Related

Split column value into substrings and search for pattern in SQL

I have a table like this:
campaign | code
AL2330GH_HDKASL_QCLKP | NULL
JPDJK34_QPKSLL_QKPAL | NULL
QCK32_SDSKDS_TLLKA | NULL
I want to update the above table by populating the column 'code' with a substring in column 'campaign' which starts with 'AL', 'QC', or 'QP'. All the column values have 3 substrings separated by an '_'. If none of the substrings matches with the provided values, then keep the 'code' column value as NULL. And if multiple matches happen, take the first substring.
Desired Output:
campaign | code
AL2330GH_HDKASL_QCLKP | AL2330GH
JPDJK34_QPKSLL_QKPAL | QPKSLL
QCK32_SDSKDS_TLLKA | QCK32
Link to try out the problem: https://www.db-fiddle.com/f/8qoFDL1RmjwpwFNP3LP4eK/1
Here's a method using OPENJSON():
;WITH src AS
(
SELECT campaign, value, code,
rn = ROW_NUMBER() OVER (PARTITION BY campaign ORDER BY [key])
FROM
(
SELECT campaign, [key], value, code
FROM dbo.SomeTable
CROSS APPLY OPENJSON(CONCAT('["',
REPLACE(STRING_ESCAPE(campaign,'JSON'),'_','","'),'"]')) AS j
) AS x WHERE LEFT(value,2) IN ('AL','QC','QP')
)
UPDATE src SET code = value WHERE rn = 1;
Example db<>fiddle
You can use STRING_SPLIT with CROSS APPLY and the ROW_NUMBER window function to do this.
CHARINDEX returns the position of the first match of its first parameter, so by passing each split value in and ordering by that position we can find which matching substring appears first.
SELECT campaign,value
FROM (
SELECT *,
ROW_NUMBER() OVER(PARTITION BY campaign ORDER BY CHARINDEX(v.value,t1.campaign)) rn
FROM mainTable t1
CROSS APPLY STRING_SPLIT(t1.campaign,'_') v
WHERE (value LIKE 'AL%'
OR value LIKE 'QC%'
OR value LIKE 'QP%')
) t1
WHERE rn = 1
If you want to UPDATE values you can try UPDATE like this.
UPDATE t1
SET
code = value
FROM (
SELECT *,
ROW_NUMBER() OVER(PARTITION BY campaign ORDER BY CHARINDEX(v.value,t1.campaign)) rn
FROM mainTable t1
CROSS APPLY STRING_SPLIT(t1.campaign,'_') v
WHERE (value LIKE 'AL%'
OR value LIKE 'QC%'
OR value LIKE 'QP%')
) t1
WHERE rn = 1
sqlfiddle
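The CHARINDEX-based ranking used above (keep candidate substrings, order them by where they first appear in the string, take the first) corresponds to this Python sketch; the helper name first_code is mine:

```python
def first_code(campaign: str):
    # Keep '_'-separated substrings starting with one of the prefixes,
    # ranked by their first position in the string (the CHARINDEX analogue).
    candidates = [p for p in campaign.split('_')
                  if p.startswith(('AL', 'QC', 'QP'))]
    return min(candidates, key=campaign.index) if candidates else None

print(first_code('AL2330GH_HDKASL_QCLKP'))  # AL2330GH
```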
Please try the following solution.
It uses XML and XQuery for tokenization. The XML/XQuery data model is based on ordered sequences, which is exactly what we need for this scenario.
SQL
-- DDL and sample data population, start
DECLARE @tbl TABLE (id INT IDENTITY PRIMARY KEY, campaign varchar(50), code varchar(20));
INSERT INTO @tbl (campaign, code) VALUES
('AL2330GH_HDKASL_QCLKP', NULL),
('JPDJK34_QPKSLL_QKPAL', NULL),
('QCK32_SDSKDS_TLLKA', NULL);
-- DDL and sample data population, end
DECLARE @separator CHAR(1) = '_';
UPDATE t
SET code = c.query('
for $x in /root/r[substring(text()[1],1,2)=("AL","QC","QP")]
return $x').value('(/r/text())[1]', 'VARCHAR(20)')
FROM @tbl AS t
CROSS APPLY (SELECT TRY_CAST('<root><r><![CDATA[' +
REPLACE(campaign, @separator, ']]></r><r><![CDATA[') +
']]></r></root>' AS XML)) AS t1(c);
Output
+----+-----------------------+----------+
| id | campaign | Code |
+----+-----------------------+----------+
| 1 | AL2330GH_HDKASL_QCLKP | AL2330GH |
| 2 | JPDJK34_QPKSLL_QKPAL | QPKSLL |
| 3 | QCK32_SDSKDS_TLLKA | QCK32 |
+----+-----------------------+----------+

Compare two rows (both with different ID) & check if their column values are exactly the same. All rows & columns are in the same table

I have a table named "ROSTER" and in this table I have 22 columns.
I want to query and compare any 2 rows of that table to check whether each column's values in those 2 rows are exactly the same. The ID column always has different values in each row, so I will not include it in the comparison; I will just use it to refer to which rows are being compared.
If all column values are the same: Either just display nothing (I prefer this one) or just return the 2 rows as it is.
If there are some column values not the same: Either display those column names only or display both the column name and its value (I prefer this one).
Example:
ROSTER Table:
ID | NAME | TIME
1 | N1 | 0900
2 | N1 | 0801
Output:
ID | TIME
1 | 0900
2 | 0801
OR
Display "TIME"
Note: Actually I'm okay with whatever result or way of output as long as I can know in any way that the 2 rows are not the same.
What are the possible ways to do this in SQL Server?
I am using Microsoft SQL Server Management Studio 18, Microsoft SQL Server 2019-15.0.2080.9
Please try the following solution based on the ideas of John Cappelletti. All credit goes to him.
SQL
-- DDL and sample data population, start
DECLARE @roster TABLE (ID INT PRIMARY KEY, NAME VARCHAR(10), TIME CHAR(4));
INSERT INTO @roster (ID, NAME, TIME) VALUES
(1,'N1','0900'),
(2,'N1','0801');
-- DDL and sample data population, end
DECLARE @source INT = 1
, @target INT = 2;
SELECT id AS source_id, @target AS target_id
,[key] AS [column]
,source_Value = MAX(CASE WHEN Src=1 THEN Value END)
,target_Value = MAX(CASE WHEN Src=2 THEN Value END)
FROM (
SELECT Src=1
,id
,B.*
FROM @roster AS A
CROSS APPLY ( SELECT [Key]
,Value
FROM OPENJSON((SELECT A.* FOR JSON PATH, WITHOUT_ARRAY_WRAPPER, INCLUDE_NULL_VALUES))
) AS B
WHERE id=@source
UNION ALL
SELECT Src=2
,id = @source
,B.*
FROM @roster AS A
CROSS APPLY ( SELECT [Key]
,Value
FROM OPENJSON((SELECT A.* FOR JSON PATH, WITHOUT_ARRAY_WRAPPER, INCLUDE_NULL_VALUES))
) AS B
WHERE id=@target
) AS A
GROUP BY id, [key]
HAVING MAX(CASE WHEN Src=1 THEN Value END)
<> MAX(CASE WHEN Src=2 THEN Value END)
AND [key] <> 'ID' -- exclude this PK column
ORDER BY id, [key];
Output
+-----------+-----------+--------+--------------+--------------+
| source_id | target_id | column | source_Value | target_Value |
+-----------+-----------+--------+--------------+--------------+
| 1 | 2 | TIME | 0900 | 0801 |
+-----------+-----------+--------+--------------+--------------+
A general approach here might be to just aggregate over the entire table and report the state of the counts:
SELECT
CASE WHEN COUNT(DISTINCT ID) = COUNT(*) THEN 'Yes' ELSE 'No' END AS [ID same],
CASE WHEN COUNT(DISTINCT NAME) = COUNT(*) THEN 'Yes' ELSE 'No' END AS [NAME same],
CASE WHEN COUNT(DISTINCT TIME) = COUNT(*) THEN 'Yes' ELSE 'No' END AS [TIME same]
FROM yourTable;
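The unpivot-and-compare idea (turn each row into key/value pairs, then report the keys whose values differ, excluding the PK) can be sketched in Python; the function name diff_rows is illustrative only:

```python
def diff_rows(source: dict, target: dict, exclude=('ID',)):
    # OPENJSON turns each row into (key, value) pairs; a dict plays
    # that role here. Report columns whose values differ.
    return {k: (source[k], target[k])
            for k in source
            if k not in exclude and source[k] != target[k]}

print(diff_rows({'ID': 1, 'NAME': 'N1', 'TIME': '0900'},
                {'ID': 2, 'NAME': 'N1', 'TIME': '0801'}))
# {'TIME': ('0900', '0801')}
```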

How to show only the latest record in SQL

I have this issue where I want to show only the latest record (Col 1). I deleted the date column, thinking that it might not work if it has different values. But in that case the record itself still has a different name (Col 1), because the date is part of the name.
Is it possible to fetch one record in this case?
The code:
SELECT distinct p.ID,
max(at.Date) as date,
at.[RAPID3 Name] as COL1,
at.[DLQI Name] AS COL2,
at.[HAQ-DI Name] AS COL3,
phy.name as phyi,
at.State_ID
FROM dbo.[Assessment Tool] as at
Inner join dbo.patient as p on p.[ID] = at.[Owner (Patient)_Patient_ID]
Inner join dbo.[Physician] as phy on phy.ID = p.Physician_ID
where (at.State_ID in (162, 165,168) and p.ID = 5580)
group by
at.[RAPID3 Name],
at.[DLQI Name],
at.[HAQ-DI Name],
p.ID, phy.name,
at.State_ID
SS:
In this SS I want to show only the latest record (COL 1) of this ID "5580". Means the first row for this ID.
Thank you
The most accurate way to handle this: extract the date, then use TOP and ORDER BY.
create table #Temp(
ID int,
Col1 Varchar(50) null,
Col2 Varchar(50) null,
Col3 Varchar(50) null,
Phyi Varchar(50) null,
State_ID int)
Insert Into #Temp values(5580,'[9/29/2021]-[9.0]High Severity',null,null,'Eman Elshorpagy',168)
Insert Into #Temp values(5580,'[10/3/2021]-[9.3]High Severity',null,null,'Eman Elshorpagy',168)
select top 1 * from #Temp as t
order by cast((Select REPLACE((SELECT REPLACE((SELECT top 1 Value FROM STRING_SPLIT(t.Col1,'-')),'[','')),']','')) as date) desc
This is close to ANSI standard, and it also caters for the newest row per id.
The principle is to use ROW_NUMBER() using a descending order on the date/timestamp (using a DATE type instead of a DATETIME and avoiding the keyword DATE for a column name) in one query, then to select from that query using the result of row number for the filter.
-- your input, but 2 id-s to show how it works with many ..
WITH indata(id,dt,col1,phyi,state_id) AS (
SELECT 5580,DATE '2021-10-03','[10/3/2021] - [9,3] High Severity','Eman Elshorpagy',168
UNION ALL SELECT 5580,DATE '2021-09-29','[9/29/2021] - [9,0] High Severity','Eman Elshorpagy',168
UNION ALL SELECT 5581,DATE '2021-10-03','[10/3/2021] - [9,3] High Severity','Eman Elshorpagy',168
UNION ALL SELECT 5581,DATE '2021-09-29','[9/29/2021] - [9,0] High Severity','Eman Elshorpagy',168
)
-- real query starts here, replace following comma with "WITH" ...
,
with_rank AS (
SELECT
*
, ROW_NUMBER() OVER(PARTITION BY id ORDER BY dt DESC) AS rank_id
FROM indata
)
SELECT
id
, dt
, col1
, phyi
, state_id
FROM with_rank
WHERE rank_id=1
;
id | dt | col1 | phyi | state_id
------+------------+-----------------------------------+-----------------+----------
5580 | 2021-10-03 | [10/3/2021] - [9,3] High Severity | Eman Elshorpagy | 168
5581 | 2021-10-03 | [10/3/2021] - [9,3] High Severity | Eman Elshorpagy | 168
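The ROW_NUMBER() OVER (PARTITION BY id ORDER BY dt DESC) ... WHERE rank_id=1 pattern amounts to keeping the newest row per id, which can be cross-checked with a small Python sketch (function name is mine; rows are plain dicts with ISO date strings, which compare correctly as text):

```python
def latest_per_id(rows):
    # Equivalent of ranking rows per id by dt descending and
    # keeping rank 1: retain only the newest row for each id.
    best = {}
    for row in rows:
        cur = best.get(row['id'])
        if cur is None or row['dt'] > cur['dt']:
            best[row['id']] = row
    return sorted(best.values(), key=lambda r: r['id'])
```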

MS SQL: Transform delimited string with key value pairs into table where keys are column names

I have a MS SQL table that has a message field containing a string with key value pairs in comma delimited format. Example:
id | date | message
1 | 11-5-2021 | species=cat,color=black,says=meow
I need to read the data from tables message field and insert it into a table where keys are column names.
Format of the strings:
species=cat,color=black,says=meow
And this should be transformed into table as follows:
species | color | says
cat | black | meow
The order of key value pairs is not fixed in the message. Message can also contain additional keys that should be ignored.
How can I achieve this using MS SQL?
It is much easier to implement using JSON.
It will work from SQL Server 2016 onwards.
This way all the scenarios are taken into account; I added them to the DDL and sample data population section in the T-SQL.
The order of key value pairs is not fixed in the message. Message can
also contain additional keys that should be ignored.
SQL
-- DDL and sample data population, start
DECLARE @tbl TABLE (ID INT IDENTITY PRIMARY KEY, [Date] DATE, Message VARCHAR(500));
INSERT INTO @tbl VALUES
('2021-05-01', 'species=cat,color=black,says=meow'),
('2021-05-11', 'species=dog,says=bark,comment=wow,color=white');
-- DDL and sample data population, end
WITH rs AS
(
SELECT *
, '[{"' + REPLACE(REPLACE(Message
, '=', '":"')
, ',', '","') + '"}]' AS jsondata
FROM @tbl
)
SELECT rs.ID, rs.Date, report.*
FROM rs
CROSS APPLY OPENJSON(jsondata)
WITH
(
[species] VARCHAR(10) '$.species'
, [color] VARCHAR(10) '$.color'
, [says] VARCHAR(30) '$.says'
) AS report;
Output
+----+------------+---------+-------+------+
| ID | Date | species | color | says |
+----+------------+---------+-------+------+
| 1 | 2021-05-01 | cat | black | meow |
| 2 | 2021-05-11 | dog | white | bark |
+----+------------+---------+-------+------+
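The transformation itself (split pairs on ',' and '=', keep only the wanted keys, ignore extras, order-independent) is easy to cross-check in Python; parse_message is my name for it:

```python
def parse_message(message: str, wanted=('species', 'color', 'says')):
    # Same idea as the REPLACE-to-JSON trick: break the message into
    # key=value items, then project only the columns we care about.
    pairs = (item.split('=', 1) for item in message.split(','))
    record = {k: v for k, v in pairs}
    return {k: record.get(k) for k in wanted}

print(parse_message('species=dog,says=bark,comment=wow,color=white'))
# {'species': 'dog', 'color': 'white', 'says': 'bark'}
```

Note that 'comment=wow' is silently dropped and the result does not depend on the order of the pairs, matching the requirements quoted above.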
You can use string_split() and some string operations:
select t.*, ss.*
from t cross apply
(select max(case when s.value like 'color=%'
then stuff(s.value, 1, 6, '')
end) as color,
max(case when s.value like 'says=%'
then stuff(s.value, 1, 5, '')
end) as says
from string_split(t.message, ',') s
) ss
Assuming you are using a fully supported version of SQL Server you could do something like this:
SELECT MAX(CASE PN.ColumnName WHEN 'species' THEN PN.ColumnValue END) AS Species,
MAX(CASE PN.ColumnName WHEN 'color' THEN PN.ColumnValue END) AS Color,
MAX(CASE PN.ColumnName WHEN 'says' THEN PN.ColumnValue END) AS Says
FROM (VALUES(1,CONVERT(date,'20210511'),'species=cat,color=black,says=meow'))V(id,date,message)
CROSS APPLY STRING_SPLIT(V.message,',') SS
CROSS APPLY (VALUES(PARSENAME(REPLACE(SS.[value],'=','.'),2),PARSENAME(REPLACE(SS.[value],'=','.'),1)))PN(ColumnName, ColumnValue);
Hopefully the reason you are doing this exercise is to normalise your design. If it isn't, I suggest you do.

SQL Server: Split values from columns with multiple values, into multiple rows [duplicate]

This question already has answers here:
Turning a Comma Separated string into individual rows
(16 answers)
Closed 4 years ago.
I have data that currently looks like this (pipe indicates separate columns):
ID | Sex | Purchase | Type
1 | M | Apple, Apple | Food, Food
2 | F | Pear, Barbie, Soap | Food, Toys, Cleaning
As you can see, the Purchase and Type columns feature multiple values that are comma delimited (some of the cells in these columns actually have up to 50+ values recorded within). I want the data to look like this:
ID | Sex | Purchase | Type
1 | M | Apple | Food
1 | M | Apple | Food
2 | F | Pear | Food
2 | F | Barbie | Toys
2 | F | Soap | Cleaning
Any ideas on how would I be able to do this with SQL? Thanks for your help everyone.
Edit: Just to show that this is different to some of the other questions. The key here is that data for each unique row is contained across two separate columns i.e. the second word in "Purchase" should be linked with the second word in "Type" for ID #1. The other questions I've seen was where the multiple values had been contained in just one column.
Basically you will require a delimited splitter function. There are many around. Here I am using DelimitedSplit8K from Jeff Moden: http://www.sqlservercentral.com/articles/Tally+Table/72993/
-- create the sample table
create table #sample
(
ID int,
Sex char,
Purchase varchar(20),
Type varchar(20)
)
-- insert the sample data
insert into #sample (ID, Sex, Purchase, Type) select 1, 'M', 'Apple,Apple', 'Food,Food'
insert into #sample (ID, Sex, Purchase, Type) select 2, 'F', 'Pear,Barbie,Soap', 'Food,Toys,Cleaning'
select s.ID, s.Sex, Purchase = p.Item, Type = t.Item
from #sample s
cross apply DelimitedSplit8K(Purchase, ',') p
cross apply DelimitedSplit8K(Type, ',') t
where p.ItemNumber = t.ItemNumber
drop table #sample
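The positional pairing that DelimitedSplit8K's ItemNumber provides (Nth purchase joined to Nth type) corresponds to zipping the two split lists, as this Python sketch shows; the helper name is mine:

```python
def split_pairs(purchase: str, type_: str):
    # Pair the Nth purchase with the Nth type, like joining the two
    # splitter outputs on ItemNumber.
    return list(zip(purchase.split(','), type_.split(',')))

print(split_pairs('Pear,Barbie,Soap', 'Food,Toys,Cleaning'))
# [('Pear', 'Food'), ('Barbie', 'Toys'), ('Soap', 'Cleaning')]
```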
EDIT: The original question as posted had the data as strings, with pipe characters as column delimiters and commas within the columns. The below solution works for that.
The question has since been edited to show that the input data is actually in columns, not as a single string.
I've left the solution here as an interesting version of the original question.
This is an interesting problem. I have a solution that works for a single row of your data. I don't know from the question whether you are going to process it row by row, but I assume you will.
If so, this will work. I suspect there might be a better way using XML or without the temp tables, but in any case this is one solution.
declare @row varchar(1000); set @row='2 | F | Pear, Barbie, Soap | Food, Toys, Cleaning'
declare @v table(i int identity, val varchar(1000), subval varchar(100))
insert @v (val, subval) select value as val, subval from STRING_SPLIT(@row,'|')
cross apply (select value as subval from STRING_SPLIT(value,',') s) subval
declare @v2 table(col_num int, subval varchar(100), correlation int)
insert @v2
select col_num, subval,
DENSE_RANK() over (partition by v.val order by i) as correlation
from @v v
join (
select val, row_number() over (order by fst) as Col_Num
from (select val, min(i) as fst from @v group by val) colnum
) c on c.val=v.val
order by i
select col1.subval as ID, col2.subval as Sex, col3.subval as Purchase, col4.subval as Type
from @v2 col1
join @v2 col2 on col2.col_num=2
join @v2 col3 on col3.col_num=3
join @v2 col4 on col4.col_num=4 and col4.correlation=col3.correlation
where col1.col_num=1
Result is:
ID Sex Purchase Type
2 F Pear Food
2 F Barbie Toys
2 F Soap Cleaning