Query to update strings using string_split function - sql

I am trying to update column in table where data is in below format:
Id | ColA
----------
1 Peter,John:Ryan,Jack:Evans,Chris
2 Peter,John:Ryan,Jack
3 Hank,Tom
4
5 Cruise,Tom
I need to split the string by ':' and remove ',' and need to reverse the name and again append the same data separated by: and finally data should be as shown
Id | ColA
----------
1 John Peter:Jack Ryan:Chris Evans
2 John Peter:Jack Ryan
3 Tom Hank
4
5 Tom Cruise
Please let me know how can we achieve this
I tried to use Replace and Substring but how can we do it if we have data some are separated by two colon and some are separated by single colon.
Is there any way to identify and achieve the data in the above formatted one.

Here is a solution for SQL Server 2008 onwards.
It is based on XML and XQuery.
Using XQuery's FLWOR expression allows to tokenize odd vs. even XML elements. The rest is just a couple of the REPLACE() function calls to compose the desired output.
SQL
-- DDL and sample data population, start
DECLARE #tbl TABLE (ID INT IDENTITY PRIMARY KEY, tokens VARCHAR(1024));
INSERT INTO #tbl (tokens) VALUES
('Peter,John:Ryan,Jack:Evans,Chris'),
('Peter,John:Ryan,Jack'),
('Hank,Tom'),
(''),
('Cruise,Tom');
-- DDL and sample data population, end
DECLARE #separator CHAR(1) = ':'
, #comma CHAR(1) = ',';
SELECT ID, tokens
, REPLACE(REPLACE(c.query('
for $x in /root/r[position() mod 2 eq 0]
let $pos := count(root/r[. << $x])
return concat($x, sql:variable("#comma"), (/root/r[$pos])[1])
').value('text()[1]', 'VARCHAR(8000)')
, SPACE(1), #separator), #comma, SPACE(1)) AS result
FROM #tbl
CROSS APPLY (SELECT CAST('<root><r><![CDATA[' +
REPLACE(REPLACE(tokens,#comma,#separator), #separator, ']]></r><r><![CDATA[') +
']]></r></root>' AS XML)) AS t1(c)
ORDER BY ID;
Output
+----+----------------------------------+----------------------------------+
| ID | tokens | result |
+----+----------------------------------+----------------------------------+
| 1 | Peter,John:Ryan,Jack:Evans,Chris | John Peter:Jack Ryan:Chris Evans |
| 2 | Peter,John:Ryan,Jack | John Peter:Jack Ryan |
| 3 | Hank,Tom | Tom Hank |
| 4 | | NULL |
| 5 | Cruise,Tom | Tom Cruise |
+----+----------------------------------+----------------------------------+
SQL #2 (don't try it, it won't work)
Unfortunately, SQL Server doesn't fully support even XQuery 1.0 standard. XQuery 3.1 is the latest standard. XQuery 1.0 functions fn:substring-after() and fn:substring-before() are badly missing.
In a dream world a solution would be much simpler, along the following:
SELECT *
, c.query('
for $x in /root/r
return concat(fn:substring-after($x, ","), ",", fn:substring-before($x, ","))
')
FROM #tbl
CROSS APPLY (SELECT TRY_CAST('<root><r><![CDATA[' +
REPLACE(tokens, #separator, ']]></r><r><![CDATA[') +
']]></r></root>' AS XML)) AS t1(c);
Please up-vote the following suggestion to improve SQL Server:
SQL Server vNext (post 2019) and NoSQL functionality
It became one of the most popular requests for SQL Server.
The current voting tally is 590 and counting.

Something like this should work:
CREATE TABLE YourTableNameHere (
Id int NULL
,ColA varchar(1000) NULL
);
INSERT INTO YourTableNameHere (Id,ColA) VALUES
(1, 'Peter,John:Ryan,Jack:Evans,Chris')
,(2, 'Peter,John:Ryan,Jack')
,(3, 'Hank,Tom')
,(4, '')
,(5, 'Cruise,Tom');
SELECT
tbl.Id
,STUFF((SELECT
CONCAT(':'
,RIGHT(REPLACE(ss.value, ',', ' '), LEN(REPLACE(ss.value, ',', ' ')) - CHARINDEX(' ', REPLACE(ss.value, ',', ' '), 1)) /*first name*/
,' '
,CASE WHEN CHARINDEX(',', ss.value, 1) > 1 THEN LEFT(REPLACE(ss.value, ',', ' '), CHARINDEX(' ', REPLACE(ss.value, ',', ' '), 1) - 1) /*last name*/ ELSE '' END)
FROM
YourTableNameHere AS tbl_inner
CROSS APPLY string_split(tbl_inner.ColA, ':') AS ss
WHERE
tbl_inner.Id = tbl.Id
FOR XML PATH('')), 1, 1, '') AS ColA
FROM
YourTableNameHere AS tbl;
This uses the string_split function within a FOR XML clause to split the values in ColA by the : character, then replace the , with a space, parse to the left and right of the space, then recombine the parsed values delimited by a : character.
One thing to note here, per Microsoft the output of string_split is not guaranteed to be in the same order as the input:
Note
The order of the output may vary as the order is not guaranteed to match the order of the substrings in the input string.
So in order to guarantee the output of this function is going to concatenate the names back in the same order that they existed in the input column you would either need to implement your own function to split the string or come up with some criteria for combining them in a certain order. For example, you could recombine them in alphabetical order by adding ORDER BY ss.value to the inner query for ColA in the final result set. In my testing using your input the final values were ordered the same as the input column, but it is worth noting that that behaviour is not guaranteed and in order to guarantee it then you need to do more work.

Related

Replace a specific character with blank

How can I replace 'a' to blank?
`Name` `ID`
----------------------------------
`b,c,d,e,abb,a` `1`
`b,c,d,a,e,abb` `2`
`a,b,c,d,a,e,abb` `3`
One way to do it would be to add a , to the beginning and end of each Name, then replace every occurence of ',a,' with ',', then trim the result of the ,:
update table_name
set Name = trim(',' from replace(concat(',', Name, ','), ',a,', ','));
Fiddle
Or if you just want to do a select without changing the rows:
select trim(',' from replace(concat(',', Name, ','), ',a,', ',')) as Name, ID
from table_name;
To address #Iptr's comment, if there can be consecutive a such as a, a, ..., you could use STRING_SPLIT to get rows from comma-separated values, then filter out where the value is a, then STRING_AGG and group by to get the comma separated values back:
select ID, STRING_AGG(u.Value, ',') as Name
from table_name
cross apply STRING_SPLIT (Name, ',') u
where Value <> 'a'
group by ID
Fiddle
Here is a solution based on tokenization via XML/XQuery.
It will work starting from SQL Server 2012 onwards.
Steps:
We are tokenizing a string of tokens via XML.
XQuery FLWOR expression is filtering out the 'a' token.
Reverting it back to a string of tokens.
SQL
-- DDL and sample data population, start
DECLARE #tbl TABLE (ID INT IDENTITY PRIMARY KEY, tokens VARCHAR(1000));
INSERT INTO #tbl (tokens) VALUES
('b,c,d,e,abb,a'),
('b,c,d,a,e,abb'),
('a,b,c,d,a,e,abb');
-- DDL and sample data population, end
DECLARE #separator CHAR(1) = ',';
SELECT t.*
, REPLACE(c.query('
for $x in /root/r/text()
return if ($x = "a") then ()
else data($x)
').value('.', 'VARCHAR(MAX)'), SPACE(1), #separator) AS Result
FROM #tbl AS t
CROSS APPLY (SELECT TRY_CAST('<root><r><![CDATA[' +
REPLACE(tokens, #separator, ']]></r><r><![CDATA[') +
']]></r></root>' AS XML)) AS t1(c);
Output
+----+-----------------+-------------+
| ID | tokens | Result |
+----+-----------------+-------------+
| 1 | b,c,d,e,abb,a | b,c,d,e,abb |
| 2 | b,c,d,a,e,abb | b,c,d,e,abb |
| 3 | a,b,c,d,a,e,abb | b,c,d,e,abb |
+----+-----------------+-------------+
Try as follow:
select Replace(name, N'a', N'') as RepName , ID from yourTable
Try this.
SELECT ID,Name, REPLACE(Name, 'a', ' ')
FROM tableName;

SQL REPLACE with Multiple [0-9]

I have a string that I want to replace a group of numbers.
The string contains groupings of numbers (and a few letters). 'A12 456 1 65 7944'
I want to replace the group of 3 numbers with 'xxx', and the group of 4 numbers with 'zzzz'
I thought something like REPLACE(#str, '%[0-9][0-9][0-9]%', 'xxx') would work, but it doesn't. I can't even get '%[0-9]%' to replace anything.
If REPLACE is not suitable, how can I replace groups of numbers?
Please try the following solution based on XML and XQuery.
Notable points:
We are tokenizing input string as XML in the CROSS APPLY clause.
XQuery's FLWOR expression is checking for numeric integer values with
a particular length, and substitutes then with a replacement string.
XQuery .value() method outputs back a final result.
SQL
-- DDL and sample data population, start
DECLARE #tbl TABLE (ID INT IDENTITY PRIMARY KEY, tokens VARCHAR(MAX));
INSERT INTO #tbl (tokens) VALUES
('A12 456 1 65 7944');
-- DDL and sample data population, end
DECLARE #separator CHAR(1) = SPACE(1);
SELECT t.*
, c.query('
for $x in /root/r/text()
return if (xs:int($x) instance of xs:int and string-length($x)=3) then "xxx"
else if (xs:int($x) instance of xs:int and string-length($x)=4) then "zzzz"
else data($x)
').value('.', 'VARCHAR(MAX)') AS Result
FROM #tbl AS t
CROSS APPLY (SELECT TRY_CAST('<root><r><![CDATA[' +
REPLACE(tokens, #separator, ']]></r><r><![CDATA[') +
']]></r></root>' AS XML)) AS t1(c);
Output
+----+-------------------+-------------------+
| ID | tokens | Result |
+----+-------------------+-------------------+
| 1 | A12 456 1 65 7944 | A12 xxx 1 65 zzzz |
+----+-------------------+-------------------+

How to compare two comma-separated value in a column, and return only matching values in SQL

I have a column with comma-separated strings, I need to compare it with another comma-separated column and return only matching values.
For example
column 1 = John, james, steve
column 2 = john, smith, will, james
I need a result like John,james since it is available in both column 1 and column 2. Is this possible in SQL?
As I'm using SQL Server 2012, I'm not able to use the String_split function
You can try the following solution.
It is using XQuery and its FLWOR expression.
SQL
-- DDL and sample data population, start
DECLARE #tbl TABLE (ID INT IDENTITY PRIMARY KEY, col1 VARCHAR(MAX), col2 VARCHAR(MAX));
INSERT INTO #tbl (col1, col2) VALUES
('John, james, steve', 'john, smith, will, james'),
('Mary, Lisa', 'Mary');
-- DDL and sample data population, end
DECLARE #separator CHAR(2) = ', ';
SELECT *
, REPLACE(TRY_CAST('<root>' +
'<source><r><![CDATA[' + REPLACE(col1, #separator, ']]></r><r><![CDATA[') +
']]></r></source>' +
'<target><r><![CDATA[' + REPLACE(col2, #separator, ']]></r><r><![CDATA[') +
']]></r></target>' +
'</root>' AS XML)
.query('
for $x in /root/source/r/text(),
$y in /root/target/r/text()
where lower-case($y) eq lower-case($x)
return data($x)
').value('.', 'NVARCHAR(MAX)'), SPACE(1), ',') AS result
FROM #tbl;
Output
+----+--------------------+--------------------------+------------+
| ID | col1 | col2 | result |
+----+--------------------+--------------------------+------------+
| 1 | John, james, steve | john, smith, will, james | John,james |
| 2 | Mary, Lisa | Mary | Mary |
+----+--------------------+--------------------------+------------+
This can certainly be done using STRING_SPLIT():
DECLARE #a VARCHAR(MAX) = 'john,james,steve'
DECLARE #b VARCHAR(MAX) = 'john,smith,will,james'
SELECT a.value
FROM string_split(#a, ',') a
JOIN string_split(#b, ',') b
ON a.value = b.value
But I think the broader question is if you should be doing this in SQL at all. SQL is much better and more efficient at set-based operations, and this approach is very iterative going row by row. I would consider restructing your data so it's easier to do this or read it into memory and use C# to do this type of manipulation.

Remove specific characters for a specific text via regex in SQL Server 2008

I am using SQL Server 2008.
Data:
AA00012345/99
AX0000045687044/78
XB000077008/12
What I need to do is to remove consecutive zeros at the beginning only (after a number except 0 is seen, consecutive zeros are allowed) and the slash (/) chars from the string in the output by using REGEX.
Expected output:
AA1234599
AX4568704478
XB7700812
I managed to do it via this query:
select dbo.RegexGroup(t.Value,'(?<prefix>\w\w)(?<zeros>0*)(?<beforeslash>\d*)/(?<afterslash>\d\d)', 'prefix') +
dbo.RegexGroup(t.Value,'(?<prefix>\w\w)(?<zeros>0*)(?<beforeslash>\d*)/(?<afterslash>\d\d)', 'beforeslash') +
dbo.RegexGroup(t.Value,'(?<prefix>\w\w)(?<zeros>0*)(?<beforeslash>\d*)/(?<afterslash>\d\d)', 'afterslash')
from table t
However, I believe there should be a better and more professional way to handle this issue. Regex usage is a must. Any advice would be appreciated.
Here's one option. Query divides strings into two parts, first - letters, second - rest part, starting with number. 0 is removed by replacing with space and applying ltrim function
declare #t table (c varchar(100))
insert into #t
values ('AA00012345/99')
,('AX0000045687044/78')
,('XB000077008/12')
select
t.c, q2.p1 + replace(replace(ltrim(replace(q2.p2, '0', ' ')), ' ', '0'), '/', '')
from
#t t
cross apply (select ci = patindex('%[0-9]%', t.c)) q1
cross apply (select p1 = substring(t.c, 1, q1.ci - 1), p2 = substring(t.c, q1.ci, len(t.c))) q2
Output:
AA00012345/99 AA12345/99
AX0000045687044/78 AX45687044/78
XB000077008/12 XB77008/12
DECLARE #v varchar(100);
SET #V = 'AA00012345/99';
SELECT REPLACE(REPLACE(#v, SUBSTRING(#v, PatIndex('%[0]%', #v), 1), ''), '/', '')
SET #V = 'AX0000045687044/78';
SELECT REPLACE(#v, SUBSTRING(#v, PatIndex('%[0]%', #v), 1), '')
SET #V = 'XB000077008/12';
SELECT REPLACE(#v, SUBSTRING(#v, PatIndex('%[0]%', #v), 1), '')
GO
| (No column name) |
| :--------------- |
| AA1234599 |
| (No column name) |
| :--------------- |
| AX4568744/78 |
| (No column name) |
| :--------------- |
| XB778/12 |
dbfiddle here

Sql Procedure to count words of a given string [duplicate]

I'm trying to count how many words there are in a string in SQL.
Select ("Hello To Oracle") from dual;
I want to show the number of words. In the given example it would be 3 words though there could be more than one space between words.
You can use something similar to this. This gets the length of the string, then substracts the length of the string with the spaces removed. By then adding the number one to that should give you the number of words:
Select length(yourCol) - length(replace(yourcol, ' ', '')) + 1 NumbofWords
from yourtable
See SQL Fiddle with Demo
If you use the following data:
CREATE TABLE yourtable
(yourCol varchar2(15))
;
INSERT ALL
INTO yourtable (yourCol)
VALUES ('Hello To Oracle')
INTO yourtable (yourCol)
VALUES ('oneword')
INTO yourtable (yourCol)
VALUES ('two words')
SELECT * FROM dual
;
And the query:
Select yourcol,
length(yourCol) - length(replace(yourcol, ' ', '')) + 1 NumbofWords
from yourtable
The result is:
| YOURCOL | NUMBOFWORDS |
---------------------------------
| Hello To Oracle | 3 |
| oneword | 1 |
| two words | 2 |
Since you're using Oracle 11g it's even simpler-
select regexp_count(your_column, '[^ ]+') from your_table
Here is a sqlfiddle demo
If your requirement is to remove multiple spaces too, try this:
Select length('500 text Oracle Parkway Redwood Shores CA') - length(REGEXP_REPLACE('500 text Oracle Parkway Redwood Shores CA',
'( ){1,}', '')) NumbofWords
from dual;
Since I have used the dual table you can test this directly in your own development environment.
DECLARE #List NVARCHAR(MAX) = ' ab a
x'; /*Your column/Param*/
DECLARE #Delimiter NVARCHAR(255) = ' ';/*space*/
DECLARE #WordsTable TABLE (Data VARCHAR(1000));
/*convert by XML the string to table*/
INSERT INTO #WordsTable(Data)
SELECT Data = y.i.value('(./text())[1]', 'VARCHAR(1000)')
FROM
(
SELECT x = CONVERT(XML, '<i>'
+ REPLACE(#List, #Delimiter, '</i><i>')
+ '</i>').query('.')
) AS a CROSS APPLY x.nodes('i') AS y(i)
/*Your total words*/
select count(*) NumberOfWords
from #WordsTable
where Data is not null;
/*words list*/
select *
from #WordsTable
where Data is not null
/from this Logic you can continue alon/