Replace string without fixed length - sql

I'm looking at some data that has text formatting stored in an NTEXT field.
I'm happy enough using REPLACE to remove data of a known length and format. However, some fields contain what looks like colour formatting, and I'm trying to find a way to remove it.
An example of the data is below. If possible, I would like to remove whatever numbers follow the colours, but I can't see how to introduce a wildcard into the REPLACE statement.
Something like '\red*\green*\blue*', as in Excel, but this doesn't work in SQL Server.
declare @str varchar(1500) = '\red3\green73\blue125;Jimmy Jazz\red31\green73\blue125;'
select @str,
replace(@str,'\red31\green73\blue125;','')
Any pointers would be gratefully received, thanks in advance.
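T-SQL's REPLACE matches literal text only, so it has no direct equivalent of Excel's * wildcard. If the cleanup can happen outside SQL Server (for example, in a load script), a regular expression can express "any digits after each colour name". The following is a minimal Python sketch. COLOR_TAG and strip_color_tags are hypothetical names. It assumes the tags always appear as red, green, blue, in that order, and end with a semicolon.

```python
import re

# Hypothetical pattern: "\red<digits>\green<digits>\blue<digits>;".
# \d+ plays the role of the "*" wildcard the question asks for.
COLOR_TAG = re.compile(r"\\red\d+\\green\d+\\blue\d+;")

def strip_color_tags(text: str) -> str:
    """Return text with every RGB colour tag removed, whatever its numbers."""
    return COLOR_TAG.sub("", text)

sample = r"\red3\green73\blue125;Jimmy Jazz\red31\green73\blue125;"
print(strip_color_tags(sample))
```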

Based on your sample data, it appears that you only need to remove the numbers from your string. You can do that with PatReplace8K or PatExtract8K. Note the sample data and examples below:
-- Sample data
DECLARE @strings TABLE(stringId INT IDENTITY, string VARCHAR(100));
INSERT @strings VALUES('DeepPurple1978\yellow2\red009;pink\black3322'),
('red202\yellow5\red009;hotpink2'),('purple999\gray65\violet;blue\yellow381');
--==== Solution #1 Patreplace8k
SELECT
s.stringId,
pr.newString
FROM @strings AS s
CROSS APPLY samd.patReplace8K(s.string,'[0-9]','') AS pr;
--==== Solution #2 PatExtract8k + STRING_AGG (SQL 2017+)
SELECT
s.stringId,
NewString = STRING_AGG(pe.Item,'') WITHIN GROUP (ORDER BY pe.ItemNumber)
FROM @strings AS s
CROSS APPLY samd.patExtract8K(s.string,'[0-9]') AS pe
GROUP BY s.stringId;
--==== Solution #3 PatExtract8k + XML Concatenation (pre-SQL 2017)
SELECT
s.stringId,
NewString =
(
SELECT pe.item+''
FROM @strings AS s2
CROSS APPLY samd.patExtract8K(s2.string,'[0-9]') AS pe
WHERE s.stringId = s2.stringid
ORDER BY pe.itemNumber
FOR XML PATH('')
)
FROM @strings AS s
GROUP BY s.stringId;
Each of these solutions returns:
stringId NewString
----------- -------------------------------------
1 DeepPurple\yellow\red;pink\black
2 red\yellow\red;hotpink
3 purple\gray\violet;blue\yellow
The second and third solutions leverage concatenation. The second is compatible with SQL Server 2017+, and the third works with earlier versions (you did not mention which version you are on).
To strip only the numbers that follow one or more predefined colors, you could use PatternSplitCM. Note the use of a table holding the colors you are seeking; in the real world I'd use a real table.
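For comparison, here is a rough Python equivalent of Solution #1 (drop every character matching [0-9]), run on the same sample strings. remove_digits is a hypothetical helper name. This only illustrates the pattern and does not replace the T-SQL functions.

```python
import re

# Sample strings copied from the table variable above.
strings = {
    1: r"DeepPurple1978\yellow2\red009;pink\black3322",
    2: r"red202\yellow5\red009;hotpink2",
    3: r"purple999\gray65\violet;blue\yellow381",
}

def remove_digits(text: str) -> str:
    """Drop every character that matches [0-9], like patReplace8K(s, '[0-9]', '')."""
    return re.sub(r"[0-9]", "", text)

for string_id, value in strings.items():
    print(string_id, remove_digits(value))
```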
-- Colors
DECLARE @colors TABLE(color VARCHAR(20) PRIMARY KEY);
INSERT @colors VALUES('red'),('green'),('blue'),('yellow'),('purple'),('grey');
-- Sample data
DECLARE @strings TABLE(stringId INT IDENTITY, string VARCHAR(100));
INSERT @strings VALUES('Burger1978\yellow2\red009;pink\86thisfool'),
('red202\yellow5\red009;Freddy99'),('green999\grey65\violet;blue\yellow381');
SELECT
s.stringId, s.string, NewString =
(
SELECT
(
SELECT SUBSTRING(f.Item, IIF(f.M=0 AND EXISTS (SELECT c.Color FROM @colors AS c
WHERE c.Color = f.L),NULLIF(PATINDEX('%[^0-9]',f.item),0),1),8000)
FROM
(
SELECT ps.ItemNumber, ps.Item, ps.[Matched],
LAG(ps.Item,1,ps.Item) OVER (ORDER BY ps.ItemNumber)
FROM dbo.PatternSplitCM(s.string,'[^0-9\ ;]') AS ps
) AS f(ItemNumber,Item,M,L)
ORDER BY f.ItemNumber
FOR XML PATH(''), TYPE
).value('(text())[1]','varchar(8000)')
)
FROM @strings AS s;
Returns:
stringId string NewString
----------- --------------------------------------------- ----------------------------------------
1 Burger1978\yellow2\red009;pink\86thisfool Burger1978\yellow\red;pink\86thisfool
2 red202\yellow5\red009;Freddy99 red\yellow\red;Freddy99
3 green999\grey65\violet;blue\yellow381 green\grey\violet;blue\yellow
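The same "only after a colour" rule can be sketched in Python with a regular expression instead of PatternSplitCM. This is a swapped-in technique, not how the answer works. COLORS, WORD_THEN_DIGITS and strip_color_numbers are hypothetical names. The sketch assumes a colour counts only when an entire run of letters equals a colour name, compared case-insensitively. That rule reproduces the output table above.

```python
import re

# Hypothetical colour list mirroring the colors table variable above.
COLORS = {"red", "green", "blue", "yellow", "purple", "grey"}

# A whole run of letters (not preceded by another letter) followed by digits.
WORD_THEN_DIGITS = re.compile(r"(?<![A-Za-z])([A-Za-z]+)([0-9]+)")

def strip_color_numbers(text: str) -> str:
    """Remove the digits that directly follow a known colour name."""
    def repl(m):
        word = m.group(1)
        # Case-insensitive check, similar to a default (CI) SQL Server collation.
        return word if word.lower() in COLORS else m.group(0)
    return WORD_THEN_DIGITS.sub(repl, text)

for s in (r"Burger1978\yellow2\red009;pink\86thisfool",
          r"red202\yellow5\red009;Freddy99",
          r"green999\grey65\violet;blue\yellow381"):
    print(strip_color_numbers(s))
```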

Related

T-SQL: Count Numbers of semicolons before expression

I have a table with strings that look like this:
'9;1;test;A;11002'
How would I count how many semicolons are there before the 'A'?
Cheers!
Using string functions
select len(left(str, charindex('A', str))) - len(replace(left(str, charindex('A', str)), ';', '')) n
from tbl
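The arithmetic in that query (length of the prefix minus its length with ';' removed) can be checked with a small Python sketch. semicolons_before is a hypothetical name. Unlike CHARINDEX under a case-insensitive collation, str.find is case-sensitive. The sketch returns 0 when the marker is missing, which mirrors CHARINDEX returning 0 and LEFT(str, 0) being empty.

```python
def semicolons_before(text: str, marker: str = "A") -> int:
    """Count the ';' characters that appear before the first occurrence of marker."""
    pos = text.find(marker)
    if pos == -1:
        return 0  # marker absent: the SQL version also yields 0
    prefix = text[:pos]
    # Same trick as the query: total length minus length without semicolons.
    return len(prefix) - len(prefix.replace(";", ""))

print(semicolons_before("9;1;test;A;11002"))
```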
Hint1: The whole issue has some smell... You should not store your data as a CSV string. But sometimes we have to work with what we have...
Hint2: The following needs SQL Server 2016. With an older version we'd need to do something similar based on XML.
Try this:
--A declared table to mockup your issue
DECLARE @tbl TABLE(ID INT IDENTITY, YourCSVstring VARCHAR(100));
INSERT INTO @tbl(YourCSVstring)
VALUES('9;1;test;A;11002');
--the query
SELECT t.ID
,A.*
FROM @tbl t
CROSS APPLY OPENJSON(CONCAT(N'["',REPLACE(t.YourCSVstring,';','","'),N'"]')) A;
The idea in short:
We use some replacements to translate your CSV string into a JSON array.
Now we can use OPENJSON() to read it.
The value is the array item, and the key is its zero-based index (so the key of 'A' equals the number of semicolons in front of it).
Proceed with this however you need it.
Just for fun: you can easily read the CSV type-safely into columns by doubling the brackets ([[ and ]]) and using WITH to specify your columns:
SELECT t.ID
,A.*
FROM @tbl t
CROSS APPLY OPENJSON(CONCAT(N'[["',REPLACE(t.YourCSVstring,';','","'),N'"]]'))
WITH(FirstNumber INT '$[0]'
,SecondNumber INT '$[1]'
,SomeText NVARCHAR(100) '$[2]'
,YourLetterA NVARCHAR(100) '$[3]'
,FinalNumber INT '$[4]')A
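A Python sketch of the same trick shows why the key answers the original question. After the ';' to '","' replacement, the zero-based index of 'A' equals the number of semicolons before it. The sketch assumes the values contain no double quotes or backslashes, which would need escaping in either version. csv_to_json_items is a hypothetical name.

```python
import json

def csv_to_json_items(csv: str):
    """Mimic OPENJSON over '["' + REPLACE(csv, ';', '","') + '"]'."""
    # Hand-built JSON: breaks if a value contains " or \ (assumption: it doesn't).
    array_text = '["' + csv.replace(";", '","') + '"]'
    # OPENJSON's default schema returns key (zero-based index) and value.
    return list(enumerate(json.loads(array_text)))

items = csv_to_json_items("9;1;test;A;11002")
for key, value in items:
    print(key, value)
```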
returns:
ID FirstNumber SecondNumber SomeText YourLetterA FinalNumber
1 9 1 test A 11002

How to add '' and , for multiple ID in SQL Server

I am writing a SELECT query that has multiple IDs, and I have to manually add '','' (e.g. '12L','22C').
I have around 2000 IDs in an Excel sheet.
Is there any quicker way to add '','' to all the IDs?
SELECT id, name
FROM table
WHERE id IN ('12L', '22C', 33j, 7k, 44J, 234C)
DECLARE @Ids VARCHAR(MAX) = '12L,22C,33j,7k,44J,234C'
--Your question's answer.
DECLARE @Splitted VARCHAR(MAX) = STUFF((
SELECT CONCAT(',''', value, '''')
FROM string_split(@Ids, ',')
FOR XML PATH('')), 1, 1, '')
SELECT @Splitted
--'12L','22C','33j','7k','44J','234C'
Or, more simply:
SELECT id, name from table where id in (SELECT value FROM string_split(@Ids, ','))
STRING_SPLIT: see the docs for more information.
CONCAT: see the docs for more information.
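You may prefer to build the literal list in a small script before pasting it into the query. A Python sketch could look like this. quote_ids is a hypothetical helper. It trims each ID and doubles any embedded single quote so the resulting T-SQL literals stay valid.

```python
def quote_ids(ids: str, sep: str = ",") -> str:
    """Turn '12L,22C' into "'12L','22C'" for use inside IN (...)."""
    # Double embedded single quotes so each T-SQL string literal stays valid.
    return ",".join("'" + i.strip().replace("'", "''") + "'" for i in ids.split(sep))

print(quote_ids("12L,22C,33j,7k,44J,234C"))
```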
Here is a conceptual example for you. It will work in SQL Server 2012 onwards.
It is a three step process:
Convert input string into XML.
Convert XML into a relational resultset inside the CTE.
Join with a DB table.
-- DDL and sample data population, start
DECLARE @tbl TABLE (ID INT IDENTITY PRIMARY KEY, Code VARCHAR(10), City VARCHAR(50));
INSERT INTO @tbl (Code, City) VALUES
('10T', 'Miami'),
('45L', 'Orlando'),
('50Z', 'Dallas'),
('70W', 'Houston');
-- DDL and sample data population, end
DECLARE @Str VARCHAR(100) = '22C,45L,50Z,105M'
, @separator CHAR(1) = ',';
DECLARE @parameter XML = TRY_CAST('<root><r><![CDATA[' +
REPLACE(@Str, @separator, ']]></r><r><![CDATA[') +
']]></r></root>' AS XML);
;WITH rs AS
(
SELECT c.value('.', 'VARCHAR(10)') AS Code
FROM @parameter.nodes('/root/r/text()') AS t(c)
)
SELECT t.*
FROM @tbl AS t INNER JOIN
rs ON t.Code = rs.Code;
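The same steps can be mimicked in Python to see what the CDATA wrapping does. split_via_xml is a hypothetical name, and a plain dict stands in for the sample table. CDATA lets items contain characters such as & or < without escaping, but an item containing ]]> would still break the XML.

```python
import xml.etree.ElementTree as ET

def split_via_xml(text: str, separator: str = ",") -> list:
    """Wrap each item in <r><![CDATA[...]]></r> and read the items back."""
    xml_text = ("<root><r><![CDATA["
                + text.replace(separator, "]]></r><r><![CDATA[")
                + "]]></r></root>")
    root = ET.fromstring(xml_text)
    # Skip empty items, roughly like /root/r/text() producing no node for them.
    return [r.text for r in root.findall("r") if r.text]

# Hypothetical stand-in for the sample table: Code -> City.
tbl = {"10T": "Miami", "45L": "Orlando", "50Z": "Dallas", "70W": "Houston"}

codes = split_via_xml("22C,45L,50Z,105M")
# Inner join: keep only the codes that exist in the table.
matches = [(code, tbl[code]) for code in codes if code in tbl]
print(matches)
```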
Two alternatives if you're ok doing the transformation outside of SQL.
As one of the comments on your question suggests, you could do this in Excel using this as a formula:
="'" & A1 & "',"
Replace the "A1" with whatever cell your first ID is in. After you enter the formula, click the cell it's in, and there will be a small square on the bottom right. Double click that and it will apply the formula to every cell in the column, automatically shifting the cell reference to match the current row. You can then copy the values from that column and erase the comma at the end.
You could also use an editor that supports regular expressions, such as SSMS, Azure Data Studio or Notepad++, and do a Find+Replace:
Paste your IDs in
Hit the replace hotkey (Ctrl+H in all 3 of the ones I listed). There will be an option to enable Regular Expression (SSMS/ADS have a little .* icon, Notepad++ has a labeled radio button). Click it
Find this:
(\w+)
Replace it with this
'$1',
Copy and paste the formatted IDs into your query. As above, you'll have to erase the final comma.
This will work as long as your IDs are alphanumeric with no spaces, punctuation, etc. If the formatting is more complex, the regex (the (\w+) you search for) will need to be more complex as well. Using this strategy, you could also get rid of the linebreaks by using the regex (\w+)\r\n.
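Scripting languages accept the same pattern. In this Python sketch, the sample IDs are made up. Python's re.sub writes the back-reference as \1 where the editors above use $1. \r?\n is used so the one-line variant works with both Windows and Unix line endings.

```python
import re

pasted = "12L\n22C\n33j\n7k"  # made-up IDs, one per line as pasted from Excel

# Wrap every ID in quotes and add a comma; \1 is Python's form of $1.
quoted = re.sub(r"(\w+)", r"'\1',", pasted)

# Variant that also consumes the line break, giving a single line.
one_line = re.sub(r"(\w+)\r?\n", r"'\1',", pasted + "\n")

print(quoted)
print(one_line.rstrip(","))  # erase the final comma
```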
Hi, you can use the CONCATENATE function in Excel before you copy those IDs into SQL.

Extracting specific column values embedded within composite Strings of codes

I am trying to create a piece of code in SQL Server 2008 that will grab specific values from each distinct string within my dbo table. The ultimate goal is to make a drop-down box within Visual Studio so that one can choose all lines from the database that contain a specific product code (see definition of product code below). Example strings:
in_0314_95pf_500_w_0315
in_0314_500_95pf_0315_w
The part of these strings I am wishing to identify is the 3 digit numeric code (in this case let us call it product code) that appears once within each string. There are roughly 300 different product codes.
The problem is that these product code values do not appear in the same position within each unique string. Hence, I am having a hard time determining the product code because I can't use substring, charindex, like, etc.
Any ideas? Any help is MUCH appreciated.
This can be done with PATINDEX:
DECLARE @s NVARCHAR(100) = 'in_0314_95pf_500_w_0315'
SELECT SUBSTRING(@s, PATINDEX('%[_][0-9][0-9][0-9][_]%', @s) + 1, 3)
Output:
500
If there are no underscores, then:
SELECT SUBSTRING(@s, PATINDEX('%[^0-9][0-9][0-9][0-9][^0-9]%', @s) + 1, 3)
This pattern matches 3 digits surrounded by characters that are not digits.
EDIT:
Apply to table like:
SELECT SUBSTRING(ColumnName, PATINDEX('%[^0-9][0-9][0-9][0-9][^0-9]%', ColumnName) + 1, 3)
FROM TableName
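Outside SQL Server, the second pattern translates directly to a regular expression. In this Python sketch, THREE_DIGITS and product_code are hypothetical names. The sketch returns None when nothing matches. The T-SQL version behaves differently in that case: PATINDEX returns 0, so SUBSTRING(..., 1, 3) yields the first three characters of the string.

```python
import re

# Analogue of PATINDEX('%[^0-9][0-9][0-9][0-9][^0-9]%', s) followed by
# SUBSTRING(s, pos + 1, 3): a non-digit, three digits, then a non-digit.
THREE_DIGITS = re.compile(r"[^0-9]([0-9]{3})[^0-9]")

def product_code(s: str):
    """Return the first 3-digit code bounded by non-digits, or None if absent."""
    m = THREE_DIGITS.search(s)
    return m.group(1) if m else None

print(product_code("in_0314_95pf_500_w_0315"))
print(product_code("in_0314_500_95pf_0315_w"))
```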
One approach is to use a String splitting table function like this one which breaks the string up into its components. You can then filter the components based on your criteria:
SELECT Name
FROM dbo.splitstring('in_0314_95pf_500_w_0315', '_')
WHERE ISNUMERIC(Name) = 1 AND LEN(Name) = 3;
I've amended the function slightly to accept the delimiter as a parameter.
CREATE FUNCTION dbo.splitstring ( @stringToSplit VARCHAR(MAX), @delimiter VARCHAR(50))
RETURNS
@returnList TABLE ([Name] [nvarchar] (500))
AS
BEGIN
DECLARE @name NVARCHAR(255)
DECLARE @pos INT
WHILE CHARINDEX(@delimiter, @stringToSplit) > 0
BEGIN
SELECT @pos = CHARINDEX(@delimiter, @stringToSplit)
-- take everything before the delimiter (also correct for multi-character delimiters)
SELECT @name = SUBSTRING(@stringToSplit, 1, @pos - 1)
INSERT INTO @returnList
SELECT @name
SELECT @stringToSplit = SUBSTRING(@stringToSplit, @pos+LEN(@delimiter),
LEN(@stringToSplit)-@pos)
END
INSERT INTO @returnList
SELECT @stringToSplit
RETURN
END
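A Python port of the loop may make the logic easier to follow. It finds the delimiter, keeps what comes before it, drops the delimiter, repeats, and finally keeps the remainder. The function name is reused from the answer for clarity. The filter uses str.isdigit, which is stricter than ISNUMERIC; ISNUMERIC also accepts strings such as '1e5' or '$'.

```python
def splitstring(string_to_split: str, delimiter: str) -> list:
    """Port of the WHILE loop: split on a (possibly multi-character) delimiter."""
    result = []
    pos = string_to_split.find(delimiter)
    while pos != -1:  # WHILE CHARINDEX(@delimiter, @stringToSplit) > 0
        result.append(string_to_split[:pos])  # text before the delimiter
        string_to_split = string_to_split[pos + len(delimiter):]  # drop it
        pos = string_to_split.find(delimiter)
    result.append(string_to_split)  # remainder after the last delimiter
    return result

parts = splitstring("in_0314_95pf_500_w_0315", "_")
# Same filter idea as the WHERE clause: numeric and exactly 3 characters.
codes = [p for p in parts if p.isdigit() and len(p) == 3]
print(parts, codes)
```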
To apply this to your table, use CROSS APPLY (Single Delimiter):
SELECT mt.Name, x.Name AS ProductCode
FROM MyTable mt
CROSS APPLY dbo.splitstring(mt.Name, '_') x
WHERE ISNUMERIC(x.Name) = 1 AND LEN(x.Name) = 3
Update, Multiple Delimiters
I guess the real underlying problem is that ultimately the product codes need to be normalized out of the composite key (e.g. add a distinct ProductId or ProductCode column to the same table), derived using a query like this, and then stored back in the table via an update. Reverse engineering the product codes out of the string appears to be a trial and error process.
Nonetheless, you can keep passing the split strings through further splitting functions (one per type of delimiter) before applying your final discriminating filter:
SELECT *
FROM MyTable mt
CROSS APPLY dbo.splitstring(mt.Name, 'test') y -- First alias
CROSS APPLY dbo.splitstring(y.Name, '_') x -- Reference the preceding alias
WHERE ISNUMERIC(x.Name) = 1 AND LEN(x.Name) = 3; -- Must reference the last alias (x)
Note that the splitstring function has again been changed to accommodate multi-character delimiters.
If you have a table of the product codes (or can generate one in an inline view), you can join the list of long strings to the product codes with a LIKE clause.
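Chaining CROSS APPLY steps amounts to "split every piece again and flatten". Here is a Python sketch of that idea. It uses a made-up composite name containing both 'test' and '_' as delimiters. split_all is a hypothetical helper.

```python
def split_all(values, delimiter):
    """Split every value on delimiter and flatten: one CROSS APPLY step."""
    return [piece for value in values for piece in value.split(delimiter)]

names = ["in_0314test500_w"]  # hypothetical composite name
# First split on 'test', then split each resulting piece on '_'.
pieces = split_all(split_all(names, "test"), "_")
codes = [p for p in pieces if p.isdigit() and len(p) == 3]
print(pieces, codes)
```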
Create Table longcodes (
longcode varchar(20)
)
Create Table products (
prodCode char(3)
)
insert products values('100')
insert products values('111')
insert products values('123')
insert longcodes values ('abc_a_100_test')
insert longcodes values ('asdf_111_bob')
insert longcodes values ('in_0314_123_95pf')
insert longcodes values ('f_100_u')
insert longcodes values ('hihi_111_bye')
insert longcodes values ('in_123_0314_95pf')
insert longcodes values ('a_b__c_d_100_efg')
select *
from products p
join longcodes l on l.longcode like '%_' + p.prodCode + '_%'
And they get aligned with the product codes like this:
prodCode longcode
100 abc_a_100_test
100 f_100_u
100 a_b__c_d_100_efg
111 asdf_111_bob
111 hihi_111_bye
123 in_0314_123_95pf
123 in_123_0314_95pf
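One subtlety: in LIKE, _ is a single-character wildcard, not a literal underscore. So '%_' + code + '_%' matches the code anywhere, as long as at least one character precedes and follows it; [_] would be needed to match a literal underscore. This Python sketch reproduces the join with fnmatch by translating % to * and _ to ?. like_to_glob is a hypothetical helper, and the data is copied from the inserts above.

```python
from fnmatch import fnmatchcase

products = ["100", "111", "123"]
longcodes = ["abc_a_100_test", "asdf_111_bob", "in_0314_123_95pf", "f_100_u",
             "hihi_111_bye", "in_123_0314_95pf", "a_b__c_d_100_efg"]

def like_to_glob(code: str) -> str:
    # LIKE '%_' + code + '_%': % becomes *, and LIKE's _ (any single
    # character, not a literal underscore) becomes ?.
    return "*?" + code + "?*"

pairs = [(p, l) for p in products for l in longcodes
         if fnmatchcase(l, like_to_glob(p))]
for p, l in pairs:
    print(p, l)
```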
EDIT: Seeing the developments in the other answer, you can simplify the LIKE clause to
like '%' + p.prodCode + '%'
and just deal with the fact that you have a much greater chance of a single composite string producing multiple matches.

How to get the data between mth and nth occurrence in a string

I'm using a SQL Server query to fetch the column information, but I need the part of a particular column that lies between the 3rd and 4th occurrences of a delimiter.
Here is my sample data
[xxxxxxx||gh||vbh||CAPACITY_CPU||aed]
[qwe34||asdf||qwe||CONNECTIVITY||ghj]
[ertgfy||fgv||yuhjj||ACCESS||rty]
[tyhuj||rtg||qwert||ACCESS||TMW]
I'm looking for the data between the 3rd and 4th occurrences of ||.
Something like:
CAPACITY_CPU
CONNECTIVITY
ACCESS
My source column does not have a fixed length; the length varies.
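Use PATINDEX:
build a pattern for the part of the column that you need, then use SUBSTRING to extract the string that you want. (T-SQL patterns are LIKE-style wildcards, not full regular expressions.)
You can use a mixture of the SUBSTRING, CHARINDEX, LEFT and RIGHT functions; the best approach is to experiment with them. Note that the query below assumes the last segment is always three characters long, because it trims a fixed 6 or 7 characters from the end.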
Create table #t( Name varchar(200))
Insert into #t
values
('[xxxxxxx||gh||vbh||CAPACITY_CPU||aed]'),
('[qwe34||asdf||qwe||CONNECTIVITY||ghj]'),
('[ertgfy||fgv||yuhjj||ACCESS||rty]'),
('[tyhuj||rtg||qwert||ACCESS||TMW]')
Select * from #t
Select
name,
Right(LEFT(name,len(name)-6),charindex('||',reverse(LEFT(name,len(name)-7))))
From #t
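A more general way to express "between the m-th and n-th occurrence" is to walk the string with repeated CHARINDEX-style searches, instead of relying on a fixed-length last segment. Here is a Python sketch of that approach. between_occurrences is a hypothetical name.

```python
def between_occurrences(text: str, sep: str, m: int, n: int) -> str:
    """Return the text between the m-th and n-th occurrences of sep (1-based)."""
    positions = []
    start = 0
    while len(positions) < n:
        pos = text.find(sep, start)  # like CHARINDEX(sep, text, start)
        if pos == -1:
            raise ValueError(f"fewer than {n} occurrences of {sep!r}")
        positions.append(pos)
        start = pos + len(sep)
    return text[positions[m - 1] + len(sep):positions[n - 1]]

print(between_occurrences("[xxxxxxx||gh||vbh||CAPACITY_CPU||aed]", "||", 3, 4))
```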
1) Instead of trying to do such operations on those strings, you could normalize the database by designing and adding a new table. In that case, you would need only a simple SELECT:
SELECT Column4
FROM dbo.Table;
2) Otherwise, one solution is to convert those strings into XML and to use nodes and value XML methods:
DECLARE @Source NVARCHAR(MAX);
SET @Source =
N'[xxxxxxx||gh||vbh||CAPACITY_CPU||aed]
[qwe34||asdf||qwe||CONNECTIVITY||ghj]
[ertgfy||fgv||yuhjj||ACCESS||rty]
[tyhuj||rtg||qwert||ACCESS||TMW]';
DECLARE @EncodedSource NVARCHAR(MAX);
SET @EncodedSource = (SELECT @Source FOR XML PATH(''));
DECLARE @x XML;
SET @x = REPLACE(REPLACE(REPLACE(@EncodedSource, N'[', N'<row> <col>'), N']', N'</col> </row>'), N'||', N'</col> <col>');
SELECT r.XmlContent.value('(col[1]/text())[1]', 'NVARCHAR(100)') AS Col1,
r.XmlContent.value('(col[4]/text())[1]', 'NVARCHAR(100)') AS Col4
FROM @x.nodes('/row') r(XmlContent);
Note: you need to replace NVARCHAR(length) with the proper data type and max. length.
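The XML trick can be reproduced in Python with the standard library to show the intermediate markup. This sketch assumes the values contain no [, ] or || of their own. xml.sax.saxutils.escape plays the role of FOR XML PATH(''), and a wrapping <rows> element is added because ElementTree requires a single root element.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

source = """[xxxxxxx||gh||vbh||CAPACITY_CPU||aed]
[qwe34||asdf||qwe||CONNECTIVITY||ghj]
[ertgfy||fgv||yuhjj||ACCESS||rty]
[tyhuj||rtg||qwert||ACCESS||TMW]"""

# 1) Escape &, < and > (the T-SQL uses FOR XML PATH('') for this step).
encoded = escape(source)

# 2) Turn [ ] and || into <row>/<col> markup.
markup = (encoded.replace("[", "<row><col>")
                 .replace("]", "</col></row>")
                 .replace("||", "</col><col>"))

# 3) ElementTree needs a single root element, so wrap everything in <rows>.
root = ET.fromstring("<rows>" + markup + "</rows>")

# 4) Read the 1st and 4th <col> of every <row>, like col[1] and col[4] in XPath.
col1 = [row.findall("col")[0].text for row in root.findall("row")]
col4 = [row.findall("col")[3].text for row in root.findall("row")]
print(list(zip(col1, col4)))
```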

Splitting delimited values in a SQL column into multiple rows

I would really like some advice here, to give some background info I am working with inserting Message Tracking logs from Exchange 2007 into SQL. As we have millions upon millions of rows per day I am using a Bulk Insert statement to insert the data into a SQL table.
In fact, I bulk insert into a temp table and then MERGE the data from there into the live table. This works around parsing issues, as certain fields otherwise have quotes and such around the values.
This works well, except that the recipient-address column is a delimited field separated by a ; character, and it can sometimes be incredibly long, as there can be many email recipients.
I would like to take this column, and split the values into multiple rows which would then be inserted into another table. Problem is anything I am trying is either taking too long or not working the way I want.
Take this example data:
message-id recipient-address
2D5E558D4B5A3D4F962DA5051EE364BE06CF37A3A5@Server.com user1@domain1.com
E52F650C53A275488552FFD49F98E9A6BEA1262E@Server.com user2@domain2.com
4fd70c47.4d600e0a.0a7b.ffff87e1@Server.com user3@domain3.com;user4@domain4.com;user5@domain5.com
I would like this to be formatted as follows in my Recipients table:
message-id recipient-address
2D5E558D4B5A3D4F962DA5051EE364BE06CF37A3A5@Server.com user1@domain1.com
E52F650C53A275488552FFD49F98E9A6BEA1262E@Server.com user2@domain2.com
4fd70c47.4d600e0a.0a7b.ffff87e1@Server.com user3@domain3.com
4fd70c47.4d600e0a.0a7b.ffff87e1@Server.com user4@domain4.com
4fd70c47.4d600e0a.0a7b.ffff87e1@Server.com user5@domain5.com
Does anyone have any ideas about how I can go about doing this?
I know PowerShell pretty well, so I tried in that, but a foreach loop even on 28K records took forever to process, I need something that will run as quickly/efficiently as possible.
Thanks!
If you are on SQL Server 2016+
You can use the new STRING_SPLIT function, which I've blogged about here, and Brent Ozar has blogged about here.
SELECT s.[message-id], f.value
FROM dbo.SourceData AS s
CROSS APPLY STRING_SPLIT(s.[recipient-address], ';') as f;
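Conceptually, CROSS APPLY STRING_SPLIT produces one output row per (source row, item) pair. Here is a Python sketch of that shape using the sample rows from the question, with @ in the addresses. source_data and recipients are hypothetical names. The sketch illustrates the row shape only; it is not a faster alternative to the set-based query.

```python
# Sample rows from the question: (message-id, recipient-address).
source_data = [
    ("2D5E558D4B5A3D4F962DA5051EE364BE06CF37A3A5@Server.com", "user1@domain1.com"),
    ("E52F650C53A275488552FFD49F98E9A6BEA1262E@Server.com", "user2@domain2.com"),
    ("4fd70c47.4d600e0a.0a7b.ffff87e1@Server.com",
     "user3@domain3.com;user4@domain4.com;user5@domain5.com"),
]

# One output row per (message, recipient), like CROSS APPLY STRING_SPLIT(..., ';').
recipients = [(message_id, address)
              for message_id, addresses in source_data
              for address in addresses.split(";")]

for row in recipients:
    print(*row)
```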
If you are still on a version prior to SQL Server 2016
Create a split function. This is just one of many examples out there:
CREATE FUNCTION dbo.SplitStrings
(
@List NVARCHAR(MAX),
@Delimiter NVARCHAR(255)
)
RETURNS TABLE
AS
RETURN (SELECT Number = ROW_NUMBER() OVER (ORDER BY Number),
Item FROM (SELECT Number, Item = LTRIM(RTRIM(SUBSTRING(@List, Number,
CHARINDEX(@Delimiter, @List + @Delimiter, Number) - Number)))
FROM (SELECT ROW_NUMBER() OVER (ORDER BY s1.[object_id])
FROM sys.all_objects AS s1 CROSS APPLY sys.all_objects) AS n(Number)
WHERE Number <= CONVERT(INT, LEN(@List))
AND SUBSTRING(@Delimiter + @List, Number, 1) = @Delimiter
) AS y);
GO
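The function above treats a numbers table as a list of candidate positions. Every position n whose character in (delimiter + list) is the delimiter starts an item, and that item runs to the next delimiter. Here is a Python port of that idea. It assumes a single-character delimiter, which the one-character SUBSTRING comparison appears designed for. split_strings is a hypothetical name, and str.strip() only approximates LTRIM/RTRIM.

```python
def split_strings(lst: str, delim: str) -> list:
    """Numbers-table style split: returns (ordinal, item) pairs."""
    padded = delim + lst           # @Delimiter + @List
    searched = lst + delim         # @List + @Delimiter, for the CHARINDEX step
    items = []
    for number in range(1, len(lst) + 1):   # 1-based positions, like the tally
        if padded[number - 1] == delim:     # an item starts at this position
            end = searched.find(delim, number - 1)  # next delimiter (0-based)
            items.append(lst[number - 1:end].strip())
    # ROW_NUMBER() OVER (ORDER BY Number) gives each item its ordinal.
    return list(enumerate(items, start=1))

print(split_strings("user3@domain3.com;user4@domain4.com;user5@domain5.com", ";"))
```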
I've discussed a few others here, here, and a better approach than splitting in the first place here.
Now you can extrapolate simply by:
SELECT s.[message-id], f.Item
FROM dbo.SourceData AS s
CROSS APPLY dbo.SplitStrings(s.[recipient-address], ';') as f;
Also I suggest not putting dashes in column names. It means you always have to put them in [square brackets].
SQL Server 2016 includes a new table function, STRING_SPLIT(), similar to the previous solution.
The only requirement is to set the database compatibility level to 130 (SQL Server 2016) or higher.
You may use CROSS APPLY (available in SQL Server 2005 and above) and STRING_SPLIT function (available in SQL Server 2016 and above):
DECLARE @delimiter nvarchar(255) = ';';
-- create tables
CREATE TABLE MessageRecipients (MessageId int, Recipients nvarchar(max));
CREATE TABLE MessageRecipient (MessageId int, Recipient nvarchar(max));
-- insert data
INSERT INTO MessageRecipients VALUES (1, 'user1@domain.com; user2@domain.com; user3@domain.com');
INSERT INTO MessageRecipients VALUES (2, 'user@domain1.com; user@domain2.com');
-- insert into MessageRecipient
INSERT INTO MessageRecipient
SELECT MessageId, ltrim(rtrim(value))
FROM MessageRecipients
CROSS APPLY STRING_SPLIT(Recipients, @delimiter)
-- output results
SELECT * FROM MessageRecipients;
SELECT * FROM MessageRecipient;
-- delete tables
DROP TABLE MessageRecipients;
DROP TABLE MessageRecipient;
Results:
MessageId Recipients
----------- ----------------------------------------------------
1 user1@domain.com; user2@domain.com; user3@domain.com
2 user@domain1.com; user@domain2.com
and
MessageId Recipient
----------- ----------------
1 user1@domain.com
1 user2@domain.com
1 user3@domain.com
2 user@domain1.com
2 user@domain2.com
In PostgreSQL, for a table named yelp_business, split the semicolon-separated values of the categories column into rows and display them as a category column:
SELECT unnest(string_to_array(categories, ';')) AS category
FROM yelp_business;