Split a column with comma delimiter - sql

I have a table with 3 columns with the data given below.
ID | Col1       | Col2                  | Status
1  | 8007590006 | 8002240001,8002170828 | I
2  | 8002170828 | 8002000004            | I
3  | 8002000001 | 8002240001            | I
4  | 8769879809 | 8002000001            | I
5  | 8769879809 | 8002000001            | I
Col2 can contain multiple comma delimited values. I need to update status to C if there is a value in col2 that is also present in col1.
For example, for ID = 1, col2 contains 8002170828 which is present in Col1, ID = 2. So, status = 'C'
From what I tried, I know it won't work where there are multiple values, as I need to split that data into individual values and then apply the update.
UPDATE Table1
SET STATUS = 'C'
WHERE Col1 IN (SELECT Col2 FROM Table1)

If you are using SQL Server 2016 or later, then STRING_SPLIT comes in handy:
WITH cte AS (
SELECT ID, Col1, value AS Col2
FROM Table1
CROSS APPLY STRING_SPLIT(Col2, ',')
)
UPDATE t1
SET Status = 'C'
FROM Table1 t1
INNER JOIN cte t2
ON t1.Col1 = t2.Col2;

This answer is intended as a supplement to Tim's answer.
As you don't have the native string split that came in 2016, we can make one:
CREATE FUNCTION dbo.STRING_SPLIT
(
@List NVARCHAR(MAX),
@Delimiter NVARCHAR(255)
)
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN
(
SELECT y.i.value('(./text())[1]', 'nvarchar(4000)') as value
FROM
(
SELECT x = CONVERT(XML, '<i>'
+ REPLACE(@List, @Delimiter, '</i><i>')
+ '</i>').query('.')
) AS a CROSS APPLY x.nodes('i') AS y(i)
);
GO
-- credits to sqlperformance.com for the majority of this code - https://sqlperformance.com/2012/07/t-sql-queries/split-strings
Now Tim's answer should work for you, so I won't repeat it here.
I chose an XML-based approach because it performs well and your data seems sane and won't have any XML characters in it. If it ever does contain XML characters like <, > or & that would break the parsing, they should be escaped before the split and unescaped afterwards.
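For illustration, a minimal sketch of that escape step (the sample value below is made up, and the variable names simply mirror the function parameters). Note that the value() call reads the text() node, so the standard entities are decoded again on the way out and a separate unescape step is usually not needed for &, < and >:
DECLARE @List nvarchar(max) = N'a & b,c < d,e > f'
DECLARE @Delimiter nvarchar(255) = N','
-- escape & first, then < and >, so the freshly added entities are not double-escaped
SET @List = REPLACE(REPLACE(REPLACE(@List, '&', '&amp;'), '<', '&lt;'), '>', '&gt;')
-- same split as the function body; value() on the text() node decodes the entities back
SELECT y.i.value('(./text())[1]', 'nvarchar(4000)') as value
FROM
(
SELECT x = CONVERT(XML, '<i>'
+ REPLACE(@List, @Delimiter, '</i><i>')
+ '</i>').query('.')
) AS a CROSS APPLY x.nodes('i') AS y(i)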
If you aren't allowed to create functions, you can extract everything between the RETURNS and the GO, insert it into Tim's query, tweak the variable names to be column names, and it will still work.
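For example, a rough sketch of that inlined version against the Table1 from the question might look like this (untested against your data, and it assumes Col2 only ever uses a comma as the delimiter):
WITH cte AS (
SELECT t.ID, t.Col1, y.i.value('(./text())[1]', 'nvarchar(4000)') AS Col2
FROM Table1 t
CROSS APPLY
(
-- build the <i>...</i> fragments from Col2, exactly as the function does
SELECT x = CONVERT(XML, '<i>'
+ REPLACE(t.Col2, ',', '</i><i>')
+ '</i>').query('.')
) AS a
CROSS APPLY a.x.nodes('i') AS y(i)
)
UPDATE t1
SET Status = 'C'
FROM Table1 t1
INNER JOIN cte t2
ON t1.Col1 = t2.Col2;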

Related

How to extract pattern from string in SQL Server?

I have a host information table stored in a SQL Server database, and the table has a text column storing a string formatted like an Ansible inventory. See the text below for a sample item in the text column.
host-001.servers.company.com desc='Production Web Cache' env='Prod' patch_round='Beta' dc='Main' rhel_v='7.6' primary='admin@company.com' secondary='manager@company.com'
I need to extract certain attributes from the text column, e.g. extract desc='Production Web Cache', and get its value Production Web Cache. I want to use a regular expression in a SQL query and hope to get some pointers.
Or, if you know another way of achieving this purpose, I will also highly appreciate your hints. Let me know if you need more clarification.
A very similar approach to John's. I use a JSON splitter first to get the data into parts, though this puts each value with the next header. I use CHARINDEX to find the end of the value, and then use LEFT/STUFF with that position to split each part into its value and the following header. Then I use LAG to get the actual header, rather than the next value's header. Finally, I remove the surrounding quotes.
This follows on the assumptions from my comment:
A domain is present at the start and followed by a space.
Values cannot contain =.
All values are enclosed in single quotes (') and all names are not
Note I don't include the domain in the results, but the SQL should give you more than enough to work out how to add it:
DECLARE @YourString nvarchar(4000) = N'host-001.servers.company.com desc=''Production Web Cache'' env=''Prod'' patch_round=''Beta'' dc=''Main'' rhel_v=''7.6'' primary=''admin@company.com'' secondary=''manager@company.com''';
WITH CTE AS(
SELECT *,
LAG(ContentHeader) OVER (ORDER BY [Key]) AS ActualHeader
FROM (VALUES(@YourString))V(YourString)
CROSS APPLY(VALUES(STUFF(@YourString, 1, CHARINDEX(N' ',@YourString),N'')))S(NewString)
CROSS APPLY OPENJSON('["' + REPLACE(NewString,'=','","') + '"]')OJ
CROSS APPLY(VALUES(NULLIF(CHARINDEX('''',OJ.[value],2),0)))CI(I)
CROSS APPLY(VALUES(LEFT(OJ.[Value],CI.I),STUFF(OJ.[Value],1,ISNULL(CI.I+1,0),'')))P(ContentValue,ContentHeader))
SELECT ActualHeader AS Header,
REPLACE([ContentValue],'''','') AS [Value]
FROM CTE
WHERE ActualHeader IS NOT NULL;
A little ugly, but this uses a bit of JSON (to guarantee the sequence) and the window function lead() over().
Example
Declare @YourTable table (ID int,SomeCol varchar(max))
Insert Into @YourTable values
(1,'host-001.servers.company.com desc=''Production Web Cache'' env=''Prod'' patch_round=''Beta'' dc=''Main'' rhel_v=''7.6'' primary=''admin@company.com'' secondary=''manager@company.com''')
Select A.ID
,Host = left(SomeCol,charindex(' ',SomeCol+' '))
,B.*
From @YourTable A
Cross Apply (
Select Item = ltrim(rtrim(right(Value,charindex(' ',reverse(Value)+' '))))
,Value = ltrim(rtrim(replace(
IsNull(lead( left(Value,nullif(len(Value)+1-charindex(' ',reverse(Value)+' '),0)),1) over (order by [Key])
,lead(right(Value,charindex(' ',reverse(Value)+' ')),1) over (order by [key])
),'''','')))
From OpenJSON( '["'+replace(string_escape(SomeCol,'json'),'=','","')+'"]' )
) B
Where B.Value is not null
Results
ID Host Item Value
1 host-001.servers.company.com desc Production Web Cache
1 host-001.servers.company.com env Prod
1 host-001.servers.company.com patch_round Beta
1 host-001.servers.company.com dc Main
1 host-001.servers.company.com rhel_v 7.6
1 host-001.servers.company.com primary admin@company.com
1 host-001.servers.company.com secondary manager@company.com
EDIT - Injected "HOST="
Declare @YourTable table (ID int,SomeCol varchar(max))
Insert Into @YourTable values
(1,'host-001.servers.company.com desc=''Production Web Cache'' env=''Prod'' patch_round=''Beta'' dc=''Main'' rhel_v=''7.6'' primary=''admin@company.com'' secondary=''manager@company.com''')
Select A.ID
,B.*
From @YourTable A
Cross Apply (
Select Item = ltrim(rtrim(right(Value,charindex(' ',reverse(Value)+' '))))
,Value = ltrim(rtrim(replace(
IsNull(lead(left(Value,nullif(len(Value)+1-charindex(' ',reverse(Value)+' '),0)),1) over (order by [Key])
,lead(right(Value,charindex(' ',reverse(Value)+' ')),1) over (order by [key])
),'''','')))
From OpenJSON( '["'+replace(string_escape('host='+SomeCol,'json'),'=','","')+'"]' )
) B
Where B.Value is not null
Results
ID Item Value
1 host host-001.servers.company.com
1 desc Production Web Cache
1 env Prod
1 patch_round Beta
1 dc Main
1 rhel_v 7.6
1 primary admin@company.com
1 secondary manager@company.com
Ideally, your data should be stored in separate columns. But if you are going to cram it into one column, at least use a recognized format such as XML or JSON.
Given that single quotes are valid XML attribute delimiters, you can transform this into XML and use XQuery.
It's not pretty, because the hostname value is not delimited
SELECT
v3.n.value('@host','varchar(255)'),
v3.n.value('@desc','varchar(1000)')
FROM t
CROSS APPLY(VALUES(
CHARINDEX(' ', t.value)
)) v1(space)
CROSS APPLY(VALUES(
CAST(
'<x host=''' +
CASE WHEN v1.space = 0
THEN t.value
ELSE LEFT(t.value, v1.space - 1) + '''' + SUBSTRING(t.value, v1.space, LEN(t.value))
END +
' />'
AS xml)
)) v2(xml)
CROSS APPLY v2.xml.nodes('x') v3(n);

Joining sql tables with no common columns without ordering

I have my data in the form of 2 comma-separated strings
DECLARE @ids nvarchar(max) = '1,2,3'
DECLARE @guids nvarchar(max) =
'0000000-0001-0000-0000-000000000000,
0000000-0022-0000-0000-000000000000,
0000000-0013-0000-0000-000000000000'
I need them in a table as separate columns based on their position in the string
Table1
| Id | Guid |
| 1 | 0000000-0001-0000-0000-000000000000 |
| 2 | 0000000-0022-0000-0000-000000000000 |
| 3 | 0000000-0013-0000-0000-000000000000 |
I can split both strings into separate tables by using
DECLARE @split_ids table
(value nvarchar(max))
DECLARE @xml xml
SET @xml = N'<root><r>' + replace(@ids, ',' ,'</r><r>') + '</r></root>'
INSERT INTO @split_ids(Value)
SELECT r.value('.','nvarchar(max)')
FROM @xml.nodes('//root/r') as records(r)
I've tried
SELECT t1.*, t2.*
FROM (SELECT t1.*, row_number() OVER (ORDER BY [Value]) as seqnum
from cte_Ids t1
) t1 FULL OUTER JOIN
(SELECT t2.*, row_number() OVER (ORDER BY [Value]) as seqnum
from cte_barcodes t2
) t2
ON t1.seqnum = t2.seqnum;
But that orders the tables by Value and my data is random and can't be ordered.
Is there a way of joining tables based on their row numbers without ordering them first?
Or is there another way of inserting data from a string to a table?
You do not need to split and/or insert the input data into separate tables. In this situation you simply need to parse the input strings and get the substrings and their ordinal positions (an XML-based approach or a splitter function are possible solutions).
But if you use SQL Server 2016+, a JSON-based approach is also an option. The idea is to transform the strings into valid JSON arrays (1,2,3 into [1,2,3]), parse the arrays with OPENJSON() and join the tables returned from OPENJSON() calls. As is explained in the documentation, the columns that OPENJSON() function returns (when the default schema is used) are key, value and type and in case of JSON array, the key column holds the index of the element in the specified array.
DECLARE @ids nvarchar(max) = N'1,2,3'
DECLARE @guids nvarchar(max) = N'0000000-0001-0000-0000-000000000000,0000000-0022-0000-0000-000000000000,0000000-0013-0000-0000-000000000000'
SELECT j1.[value] AS Id, j2.[value] AS Guid
FROM OPENJSON(CONCAT('[', @ids, ']')) j1
JOIN OPENJSON(CONCAT('["', REPLACE(@guids, ',', '","'), '"]')) j2 ON j1.[key] = j2.[key]
Result:
Id Guid
1 0000000-0001-0000-0000-000000000000
2 0000000-0022-0000-0000-000000000000
3 0000000-0013-0000-0000-000000000000
You need row numbering over the initial order, which means you should use some constant expression in the window function's ORDER BY clause.
SQL Server does not allow using constants directly there, but OVER (ORDER BY (SELECT 1)) is allowed:
SELECT t1.*, t2.*
FROM (SELECT t1.*, row_number() OVER (ORDER BY (select 1)) as seqnum
from cte_Ids t1
) t1 FULL OUTER JOIN
(SELECT t2.*, row_number() OVER (ORDER BY (select 1)) as seqnum
from cte_barcodes t2
) t2
ON t1.seqnum = t2.seqnum;
Note that this doesn't guarantee initial order (it will be unspecified), but often it behaves correctly :)
One solution is to parse the comma-separated values from both variables in a loop (using WHILE). The values extracted in the same iteration can then be inserted together as one row into a table; a minimal sketch follows.
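A minimal sketch of that loop, reusing the @ids and @guids strings from the question and a @result table variable made up for this demo (it assumes both strings contain the same number of elements):
DECLARE @ids nvarchar(max) = N'1,2,3'
DECLARE @guids nvarchar(max) = N'0000000-0001-0000-0000-000000000000,0000000-0022-0000-0000-000000000000,0000000-0013-0000-0000-000000000000'
DECLARE @result table (Id nvarchar(max), Guid nvarchar(max))
DECLARE @id nvarchar(max), @guid nvarchar(max)
WHILE LEN(@ids) > 0
BEGIN
-- take everything up to the next comma (or the rest of the string)
SET @id = LEFT(@ids, CHARINDEX(',', @ids + ',') - 1)
SET @guid = LEFT(@guids, CHARINDEX(',', @guids + ',') - 1)
INSERT INTO @result (Id, Guid) VALUES (@id, @guid)
-- remove the consumed value and its delimiter before the next iteration
SET @ids = STUFF(@ids, 1, CHARINDEX(',', @ids + ','), '')
SET @guids = STUFF(@guids, 1, CHARINDEX(',', @guids + ','), '')
END
SELECT Id, Guid FROM @result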
One solution uses recursive CTEs:
with cte as (
select cast(null as nvarchar(max)) as id, cast(null as nvarchar(max)) as guid, @ids + ',' as rest_ids, @guids + ',' as rest_guids, 0 as lev
union all
select left(rest_ids, charindex(',', rest_ids) - 1),
left(rest_guids, charindex(',', rest_guids) - 1),
stuff(rest_ids, 1, charindex(',', rest_ids), ''),
stuff(rest_guids, 1, charindex(',', rest_guids), ''),
lev + 1
from cte
where rest_ids <> ''
)
select id, guid
from cte
where lev > 0;

How to SELECT string between second and third instance of ",,"?

I am trying to get the string between the second and third instance of ",," using a SQL SELECT.
Apparently the functions substring and charindex are useful, and I have tried them, but the problem is that I need the string between those specific ",,"s and the length of the strings between them can change.
I can't find a working example anywhere.
Here is an example:
Table: test
Column: Column1
Row1: cat1,,cat2,,cat3,,cat4,,cat5
Row2: dogger1,,dogger2,,dogger3,,dogger4,,dogger5
Result: cat3 and dogger3, respectively
Here is my closest attempt; it works if the strings are the same length every time, but they aren't:
SELECT SUBSTRING(column1,LEN(LEFT(column1,CHARINDEX(',,', column1,12)+2)),LEN(column1) - LEN(LEFT(column1,CHARINDEX(',,', column1,20)+2)) - LEN(RIGHT(column1,CHARINDEX(',,', (REVERSE(column1)))))) AS column1
FROM testi
Just repeat the substring 3 times, each time moving on to the next ",,", e.g.
select
-- Substring till the third ',,'
substring(z.col1, 1, patindex('%,,%',z.col1)-1)
from (values ('cat1,,cat2,,cat3,,cat4,,cat5'),('dogger1,,dogger2,,dogger3,,dogger4,,dogger5')) x (col1)
-- Substring from the first ',,'
cross apply (values (substring(x.col1,patindex('%,,%',x.col1)+2,len(x.col1)))) y (col1)
-- Substring from the second ',,'
cross apply (values (substring(y.col1,patindex('%,,%',y.col1)+2,len(y.col1)))) z (col1);
And just to reiterate, this is a terrible way to store data, so the best solution is to store it properly.
Here is an alternative solution using charindex. The base idea is the same as in Dale K's answer, but instead of cutting the string, we specify the start_location for the search by using the third, optional parameter of charindex. This way we get the location of each separator and can slice each value out of the main string.
declare @vtest table (column1 varchar(200))
insert into @vtest ( column1 ) values('dogger1,,dogger2,,dogger3,,dogger4,,dogger5')
insert into @vtest ( column1 ) values('cat1,,cat2,,cat3,,cat4,,cat5')
declare @separator char(2) = ',,'
select
t.column1
, FI.FirstInstance
, SI.SecondInstance
, TI.ThirdInstance
, iif(TI.ThirdInstance is not null, substring(t.column1, SI.SecondInstance + 2, TI.ThirdInstance - SI.SecondInstance - 2), null)
from
@vtest t
cross apply (select nullif(charindex(@separator, t.column1), 0) FirstInstance) FI
cross apply (select nullif(charindex(@separator, t.column1, FI.FirstInstance + 2), 0) SecondInstance) SI
cross apply (select nullif(charindex(@separator, t.column1, SI.SecondInstance + 2), 0) ThirdInstance) TI
For transparency, I saved the separator string in a variable.
By default, charindex returns 0 if the search string is not present, so I overwrite that with NULL by using nullif.
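For instance, a quick made-up illustration of that nullif wrapper on a string with no ",," separator at all:
SELECT NULLIF(CHARINDEX(',,', 'no separator here'), 0) AS FirstInstance -- returns NULL instead of 0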
IMHO, SQL Server 2016 and its JSON support is the best option here.
SQL
-- DDL and sample data population, start
DECLARE @tbl TABLE (ID INT IDENTITY PRIMARY KEY, Tokens VARCHAR(500));
INSERT INTO @tbl VALUES
('cat1,,cat2,,cat3,,cat4,,cat5'),
('dogger1,,dogger2,,dogger3,,dogger4,,dogger5');
-- DDL and sample data population, end
WITH rs AS
(
SELECT *
, '["' + REPLACE(Tokens
, ',,', '","')
+ '"]' AS jsondata
FROM @tbl
)
SELECT rs.ID, rs.Tokens
, JSON_VALUE(jsondata, '$[2]') AS ThirdToken
FROM rs;
Output
+----+---------------------------------------------+------------+
| ID | Tokens | ThirdToken |
+----+---------------------------------------------+------------+
| 1 | cat1,,cat2,,cat3,,cat4,,cat5 | cat3 |
| 2 | dogger1,,dogger2,,dogger3,,dogger4,,dogger5 | dogger3 |
+----+---------------------------------------------+------------+
It's the same as @Yitzhak Khabinsky's answer, but I think it looks clearer:
WITH CTE_Data
AS(
SELECT 'cat1,,cat2,,cat3,,cat4,,cat5' AS [String]
UNION
SELECT 'dogger1,,dogger2,,dogger3,,dogger4,,dogger5' AS [String]
)
SELECT
A.[String]
,Value3 = JSON_VALUE('["'+ REPLACE(A.[String], ',,', '","') + '"]', '$[2]')
FROM CTE_Data AS A

fuzzy join in SQL

I was hoping someone could shed some light for me on my issue.
I need to be able to join the following two tables together in SQL
Values in table 1 for some column
QWERTY10
QAZWSXEDCR10
QAZWSXED1230
Values in table 2 for some column
QWWERTY20
QAZWSXEDCR20
QAZWSXED1240
the result that I need is
QWERTY100000 QWERTY200000
QAZWSXEDCR10 QAZWSXEDCR20
QAZWSXED1230 QAZWSXED1240
Now, for QWERTY10000 to be linked to QWERTY20000 I need to do the join on the first 6 characters of the value in the field
but for the QAZWSXEDCR10 to be linked to QAZWSXEDCR20 I need to do a join on the first 10 characters of the value in the field. If I do a join on the first 6 characters only, then I will get duplicates. I will have something like this:
QAZWSXEDCR10 QAZWSXEDCR20
QAZWSXEDCR10 QAZWSXED1240
QAZWSXED1230 QAZWSXEDCR20
QAZWSXED1230 QAZWSXED1240
and I also need QAZWSXED1230 to be linked to QAZWSXED1240 and there I need to do a join on 8 characters to make it work.
I'm having a hard time figuring out how to join my data together. I would like to avoid doing 10 different joins, each based on a different number of characters,
e.g. do a join on 6 characters first and, if not successful, then do the join on 7, 8, 9 and 10 - there must be a different way...
Can someone recommend a solution here?
As mentioned in Milney's comment, PatIndex may help by finding the string location of the first number - or special character if applicable. You can then construct a substring of the matching portions of the strings
select table1.col as col1,
table2.col as col2
from table1
inner join
table2
on substring( table1.col, 1, patindex( '%[0-9]%', table1.col ) - 1 ) =
substring( table2.col, 1, patindex( '%[0-9]%', table2.col ) - 1 )
This is a modification of Alex's answer, just to handle the case where one or both values do not contain a digit:
select t1.col as col1, t2.col as col2
from table1 t1 inner join
table2 t2
on left(t1.col, patindex('%[0-9]%', t1.col+'0') - 1) = left(t2.col, patindex('%[0-9]%', t2.col+'0') - 1);
I think this will help:
Create table #table1 ( strValue varchar(100) )
Create table #table2 ( strValue varchar(100) )
Insert Into #table1 ( strValue ) Values
('QWERTY10'), ('QAZWSXEDCR10'),('QAZWSXED1230')
Insert Into #table2 ( strValue ) Values
('QWERTY20'), ('QAZWSXEDCR20'),('QAZWSXED1240')
Declare @MaxlengthT1 int, @MaxlengthT2 int
SELECT @MaxlengthT1 = MAX(LEN(strValue)) FROM #table1
SELECT @MaxlengthT2 = MAX(LEN(strValue)) FROM #table2
select a.strValue + REPLICATE('0',@MaxlengthT1 - LEN(a.strValue)) as col1,
b.strValue + REPLICATE('0',@MaxlengthT1 - LEN(b.strValue)) as col2
from #table1 a
inner join
#table2 b
on substring( a.strValue, 0, patindex( '%[0-9]%', a.strValue )) =
substring( b.strValue, 0, patindex( '%[0-9]%', b.strValue ))
DROP TABLE #table1
DROP TABLE #table2

need to get description of id from another table which are pipe separated

I have two tables table1 and table2
in table1 there is a column with name typeids in which ids are pipe separated
ex: 2|3|4 --> these ids are the primary key in table2
table2 contains Id, Description which has data like
2-text1
3-text2
4-text3
now I need to get the table1 contents but 2|3|4 will be replaced by
text1|text2|text3
This is a really poor design for your database and, as others have said, you should do your level best to get it changed.
That said, this is possible. It is just ugly as sin and I am sure it performs like a dog, but you can blame that on your database designer. In short, you need to split your id string on the | character, join each element to your table2, and then concatenate them all back together using FOR XML. As you are using SQL Server 2016 you can use STRING_SPLIT instead of the function I have used below (a sketch of that variant follows the output at the end), though as I don't currently have access to a 2016 box, here we are (Working example):
create function dbo.StringSplit
(
@str nvarchar(4000) = ' ' -- String to split.
,@delimiter as nvarchar(1) = ',' -- Delimiting value to split on.
,@num as int = null -- Which value to return.
)
returns table
as
return
(
-- Start tally table with 10 rows.
with n(n) as (select n from (values(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) n(n))
-- Select the same number of rows as characters in isnull(#str,'') as incremental row numbers.
-- Cross joins increase exponentially to a max possible 10,000 rows to cover largest isnull(#str,'') length.
,t(t) as (select top (select len(isnull(@str,'')) a) row_number() over (order by (select null)) from n n1,n n2,n n3,n n4)
-- Return the position of every value that follows the specified delimiter.
,s(s) as (select 1 union all select t+1 from t where substring(isnull(@str,''),t,1) = @delimiter)
-- Return the start and length of every value, to use in the SUBSTRING function.
-- ISNULL/NULLIF combo handles the last value where there is no delimiter at the end of the string.
,l(s,l) as (select s,isnull(nullif(charindex(@delimiter,isnull(@str,''),s),0)-s,4000) from s)
select rn as ItemNumber
,Item
from(select row_number() over(order by s) as rn
,substring(isnull(@str,''),s,l) as item
from l
) a
where rn = @num -- Return a specific value where specified,
or @num is null -- Or everything where not.
)
go
declare @t1 table (id varchar(10));
insert into @t1 values
('2|3|4')
,('5|6|7');
declare @t2 table (id varchar(1), description varchar(10));
insert into @t2 values
('2','text1')
,('3','text2')
,('4','text3')
,('5','text4')
,('6','text5')
,('7','text6')
;
select t1.id
,stuff((select '|' + t2.description
from @t1 as t1a
cross apply dbo.StringSplit(t1a.id,'|',null) as s
join @t2 as t2
on s.Item = t2.id
where t1.id = t1a.id
for xml path('')
),1,1,''
) as t
from @t1 as t1;
Output:
+-------+-------------------+
| id | t |
+-------+-------------------+
| 2|3|4 | text1|text2|text3 |
| 5|6|7 | text4|text5|text6 |
+-------+-------------------+
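For completeness, a rough sketch of the SQL Server 2016+ variant mentioned above, swapping dbo.StringSplit for the built-in STRING_SPLIT against the same @t1/@t2 table variables (note that STRING_SPLIT does not guarantee the order of the returned items, so the concatenated order is not strictly guaranteed either):
select t1.id
,stuff((select '|' + t2.description
from @t1 as t1a
cross apply string_split(t1a.id,'|') as s
join @t2 as t2
on s.value = t2.id
where t1.id = t1a.id
for xml path('')
),1,1,''
) as t
from @t1 as t1;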