How to Replace Multiple Characters in SQL? - sql

This is based on a similar question, How to Replace Multiple Characters in Access SQL?
I wrote this because SQL Server 2005 seems to limit the REPLACE() function to 19 nested replacements inside a WHERE clause.
My task is to match on a column and, to improve the chances of a match, strip multiple unneeded characters using the REPLACE() function:
DECLARE @es NVarChar(1) SET @es = ''
DECLARE @p0 NVarChar(1) SET @p0 = '!'
DECLARE @p1 NVarChar(1) SET @p1 = '#'
---etc...
SELECT *
FROM t1,t2
WHERE REPLACE(REPLACE(t1.stringkey,@p0, @es), @p1, @es)
= REPLACE(REPLACE(t2.stringkey,@p0, @es), @p1, @es)
---etc
If there are more than 19 REPLACE() calls in that WHERE clause, it doesn't work. So the solution I came up with is to create a SQL function, called trimChars in this example (excuse the variables starting at @p22):
CREATE FUNCTION [trimChars] (
@string varchar(max)
)
RETURNS varchar(max)
AS
BEGIN
DECLARE @es NVarChar(1) SET @es = ''
DECLARE @p22 NVarChar(1) SET @p22 = '^'
DECLARE @p23 NVarChar(1) SET @p23 = '&'
DECLARE @p24 NVarChar(1) SET @p24 = '*'
DECLARE @p25 NVarChar(1) SET @p25 = '('
DECLARE @p26 NVarChar(1) SET @p26 = '_'
DECLARE @p27 NVarChar(1) SET @p27 = ')'
DECLARE @p28 NVarChar(1) SET @p28 = '`'
DECLARE @p29 NVarChar(1) SET @p29 = '~'
DECLARE @p30 NVarChar(1) SET @p30 = '{'
DECLARE @p31 NVarChar(1) SET @p31 = '}'
DECLARE @p32 NVarChar(1) SET @p32 = ' '
DECLARE @p33 NVarChar(1) SET @p33 = '['
DECLARE @p34 NVarChar(1) SET @p34 = '?'
DECLARE @p35 NVarChar(1) SET @p35 = ']'
DECLARE @p36 NVarChar(1) SET @p36 = '\'
DECLARE @p37 NVarChar(1) SET @p37 = '|'
DECLARE @p38 NVarChar(1) SET @p38 = '<'
DECLARE @p39 NVarChar(1) SET @p39 = '>'
DECLARE @p40 NVarChar(1) SET @p40 = '#'
DECLARE @p41 NVarChar(1) SET @p41 = '-'
return REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(
@string, @p22, @es), @p23, @es), @p24, @es), @p25, @es), @p26, @es), @p27, @es), @p28, @es), @p29, @es), @p30, @es), @p31, @es), @p32, @es), @p33, @es), @p34, @es), @p35, @es), @p36, @es), @p37, @es), @p38, @es), @p39, @es), @p40, @es), @p41, @es)
END
This can then be used in addition to the other replace strings:
SELECT *
FROM t1,t2
WHERE dbo.trimChars(REPLACE(REPLACE(t1.stringkey,@p0, @es), @p1, @es))
= dbo.trimChars(REPLACE(REPLACE(t2.stringkey,@p0, @es), @p1, @es))
I created a few more functions to do similar replacing, chained like so: trimChars(trimMoreChars(...))
SELECT *
FROM t1,t2
WHERE dbo.trimChars(dbo.trimMoreChars(REPLACE(REPLACE(t1.stringkey,@p0, @es), @p1, @es)))
= dbo.trimChars(dbo.trimMoreChars(REPLACE(REPLACE(t2.stringkey,@p0, @es), @p1, @es)))
Can someone give me a better solution to this problem in terms of performance, and maybe a cleaner implementation?

One useful trick in SQL is the ability to use @var = function(...) in a SELECT to assign a value. If your record set has multiple records, the variable is assigned once per record, and the assignments accumulate as side effects:
declare @badStrings table (item varchar(50))
INSERT INTO @badStrings(item)
SELECT '>' UNION ALL
SELECT '<' UNION ALL
SELECT '(' UNION ALL
SELECT ')' UNION ALL
SELECT '!' UNION ALL
SELECT '?' UNION ALL
SELECT '#'
declare @testString varchar(100), @newString varchar(100)
set @testString = 'Juliet ro><0zs my s0x()rz!!?!one!#!#!#!'
set @newString = @testString
SELECT @newString = Replace(@newString, item, '') FROM @badStrings
select @newString -- returns 'Juliet ro0zs my s0xrzone'

I would seriously consider making a CLR UDF instead and using regular expressions (both the string and the pattern can be passed in as parameters) to do a complete search and replace for a range of characters. It should easily outperform this SQL UDF.

I really like @Juliett's solution! I would just use a CTE to get all the invalid characters:
DECLARE @badStrings VARCHAR(100)
DECLARE @teststring VARCHAR(100)
SET @badStrings = '><()!?#'
SET @teststring = 'Juliet ro><0zs my s0x()rz!!?!one!#!#!#!'
;WITH CTE AS
(
SELECT SUBSTRING(@badStrings, 1, 1) AS [String], 1 AS [Start], 1 AS [Counter]
UNION ALL
SELECT SUBSTRING(@badStrings, [Start] + 1, 1) AS [String], [Start] + 1, [Counter] + 1
FROM CTE
WHERE [Counter] < LEN(@badStrings)
)
SELECT @teststring = REPLACE(@teststring, CTE.[String], '') FROM CTE
SELECT @teststring
Juliet ro0zs my s0xrzone

I suggest you create a scalar user-defined function. Here is an example (sorry in advance, the variable names are in Spanish):
CREATE FUNCTION [dbo].[Udf_ReplaceChars] (
@cadena VARCHAR(500), -- String to manipulate
@caracteresElim VARCHAR(100), -- String of characters to be replaced
@caracteresReem VARCHAR(100) -- String of characters for replacement
)
RETURNS VARCHAR(500)
AS
BEGIN
DECLARE @cadenaFinal VARCHAR(500), @longCad INT, @pos INT, @caracter CHAR(1), @posCarER INT;
SELECT
@cadenaFinal = '',
@longCad = LEN(@cadena),
@pos = 1;
IF LEN(@caracteresElim)<>LEN(@caracteresReem)
BEGIN
RETURN NULL;
END
WHILE @pos <= @longCad
BEGIN
SELECT
@caracter = SUBSTRING(@cadena,@pos,1),
@pos = @pos + 1,
@posCarER = CHARINDEX(@caracter,@caracteresElim);
IF @posCarER <= 0
BEGIN
SET @cadenaFinal = @cadenaFinal + @caracter;
END
ELSE
BEGIN
SET @cadenaFinal = @cadenaFinal + SUBSTRING(@caracteresReem,@posCarER,1)
END
END
RETURN @cadenaFinal;
END
Here is an example using this function:
SELECT dbo.Udf_ReplaceChars('This is a test.','sat','Z47');
And the result is: 7hiZ iZ 4 7eZ7.
As you can see, each character of the @caracteresElim parameter is replaced by the character in the same position of the @caracteresReem parameter.

While this question was asked about SQL Server 2005, it's worth noting that as of SQL Server 2017 this can be done with the new TRANSLATE function.
https://learn.microsoft.com/en-us/sql/t-sql/functions/translate-transact-sql
I hope this information helps people who get to this page in the future.
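As a minimal sketch of how TRANSLATE could stand in for the nested REPLACE calls from the question (SQL Server 2017 or later): every unwanted character is first mapped onto one of them, which a single REPLACE then removes. The table variables, sample rows and the @bad list are made up for illustration.

```sql
-- Hypothetical sample data standing in for t1 and t2 from the question.
DECLARE @t1 TABLE (stringkey varchar(50));
DECLARE @t2 TABLE (stringkey varchar(50));
INSERT INTO @t1 VALUES ('AB-12(3)'), ('XY_9');
INSERT INTO @t2 VALUES ('AB 123'), ('ZZ*1');

-- Characters to strip. Keep the space away from the end, because LEN ignores trailing spaces.
DECLARE @bad  varchar(20) = '-()_ *';
-- TRANSLATE needs two lists of equal length: map every bad character to the first one.
DECLARE @mask varchar(20) = REPLICATE(LEFT(@bad, 1), LEN(@bad));

SELECT a.stringkey AS key1, b.stringkey AS key2
FROM @t1 AS a
JOIN @t2 AS b
  ON REPLACE(TRANSLATE(a.stringkey, @bad, @mask), LEFT(@bad, 1), '')
   = REPLACE(TRANSLATE(b.stringkey, @bad, @mask), LEFT(@bad, 1), '');
-- With the sample rows this pairs 'AB-12(3)' with 'AB 123' (both reduce to 'AB123').
```

Reusing the first unwanted character as the intermediate value avoids having to pick a sentinel that might legitimately occur in the data.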

I had a one-off data migration issue where the source data could not output correctly some unusual/technical characters plus the ubiquitous extra commas in CSVs.
We decided that for each such character the source extract should replace them with something that was recognisable to both the source system and the SQL Server that was loading them but which would not be in the data otherwise.
It did mean however that in various columns across various tables these replacement characters would appear and I would have to replace them. Nesting multiple REPLACE functions made the import code look scary and prone to errors in misjudging the placement and number of brackets so I wrote the following function. I know it can process a column in a table of 3,000 rows in less than a second though I'm not sure how quickly it will scale up to multi-million row tables.
create function [dbo].[udf_ReplaceMultipleChars]
(
@OriginalString nvarchar(4000)
, @ReplaceTheseChars nvarchar(100)
, @LengthOfReplacement int = 1
)
returns nvarchar(4000)
begin
declare @RevisedString nvarchar(4000) = N'';
declare @lengthofinput int =
(
select len(@OriginalString)
);
with AllNumbers
as (select 1 as Number
union all
select Number + 1
from AllNumbers
where Number < @lengthofinput)
select @RevisedString += case
when (charindex(substring(@OriginalString, Number, 1), @ReplaceTheseChars, 1) - 1) % 2
= 0 then
substring(
@ReplaceTheseChars
, charindex(
substring(@OriginalString, Number, 1)
, @ReplaceTheseChars
, 1
) + 1
, @LengthOfReplacement
)
else
substring(@OriginalString, Number, 1)
end
from AllNumbers
option (maxrecursion 4000);
return (@RevisedString);
end;
It works by passing in the string to be evaluated (@OriginalString) along with a string of paired characters in which the first character is to be replaced by the second, the third by the fourth, the fifth by the sixth and so on (@ReplaceTheseChars).
Here is the string of chars that I needed to replace and their replacements... [']"~,{Ø}°$±|¼¦¼ª½¬½^¾#✓
i.e. an opening square bracket denotes an apostrophe, a closing one a double quote. You can see that there were vulgar fractions as well as degree and diameter symbols in there.
There is a default @LengthOfReplacement that is included as a starting point in case anyone needs to replace with longer strings. I played around with that in my project, but single-char replacement was the main function.
The condition of the case statement is important. It ensures that a character is only replaced if it is found in your @ReplaceTheseChars variable and if it is found in an odd-numbered position (subtracting 1 from the charindex result ensures that anything NOT found returns a negative modulo value). i.e. if you find a tilde (~) in position 5 it will be replaced with a comma, but if on a subsequent run the comma is found in position 6 it will not be replaced with a curly bracket ({).
This can be best demonstrated with an example...
declare @ProductDescription nvarchar(20) = N'abc~def[¦][123';
select @ProductDescription
= dbo.udf_ReplaceMultipleChars(
@ProductDescription
/* NB the doubling up of the apostrophe is necessary in the string but resolves to a single apostrophe when passed to the function */
,'['']"~,{Ø}°$±|¼¦¼ª½¬½^¾#✓'
, default
);
select @ProductDescription
, dbo.udf_ReplaceMultipleChars(
@ProductDescription
,'['']"~,{Ø}°$±|¼¦¼ª½¬½^¾#✓'
/* if you didn't know how to type those peculiar chars in then you can build a string like this... '[' + nchar(0x0027) + ']"~,{' + nchar(0x00D8) + '}' + nchar(0x00B0) etc */
,
default
);
This will return both the value after the first pass through the function and the second time as follows...
abc,def'¼"'123 abc,def'¼"'123
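The odd-position test described above can be checked in isolation. The following is only an illustrative sketch: the @pairs value and the three probe characters are made up, and it relies on T-SQL's % operator taking the sign of the dividend.

```sql
-- Pairs list: '~' is replaced by ',' and '{' is replaced by '('.
DECLARE @pairs nvarchar(10) = N'~,{(';

SELECT v.c,
       CHARINDEX(v.c, @pairs)           AS pos,
       (CHARINDEX(v.c, @pairs) - 1) % 2 AS test
FROM (VALUES (N'~'), (N','), (N'x')) AS v(c);
-- '~' : pos 1, test  0  -> replaced (odd position, so it is a "from" character)
-- ',' : pos 2, test  1  -> kept (even position, so it is a replacement character)
-- 'x' : pos 0, test -1  -> kept (not in the list at all)
```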
A table update would just be
update a
set a.Col1 = dbo.udf_ReplaceMultipleChars(a.Col1,'~,]"',1)
from TestTable a
Finally (I hear you say!), although I've not had access to the translate function I believe that this function can process the example shown in the documentation quite easily. The TRANSLATE function demo is
SELECT TRANSLATE('2*[3+4]/{7-2}', '[]{}', '()()');
which returns 2*(3+4)/(7-2) although I understand it might not work on 2*[3+4]/[7-2] !!
My function would approach this as follows listing each char to be replaced followed by its replacement [ --> (, { --> ( etc.
select dbo.udf_ReplaceMultipleChars('2*[3+4]/{7-2}', '[({(])})', 1);
which will also work for
select dbo.udf_ReplaceMultipleChars('2*[3+4]/[7-2]', '[({(])})', 1);
I hope someone finds this useful and if you get to test its performance against larger tables do let us know one way or another!

declare @testVal varchar(20)
set @testVal = '?t/es?ti/n*g 1*2?3*'
select @testVal = REPLACE(@testVal, item, '') from (select '?' item union select '*' union select '/') list
select @testVal;

One option is to use a numbers/tally table to drive an iterative process via a pseudo-set based query.
The general idea of char replacement can be demonstrated with a simple character map table approach:
create table charMap (srcChar char(1), replaceChar char(1))
insert charMap values ('a', 'z')
insert charMap values ('b', 'y')
create table testChar(srcChar char(1))
insert testChar values ('1')
insert testChar values ('a')
insert testChar values ('2')
insert testChar values ('b')
select
coalesce(charMap.replaceChar, testChar.srcChar) as charData
from testChar left join charMap on testChar.srcChar = charMap.srcChar
Then you can bring in the tally table approach to do the lookup on each character position in the string.
create table tally (i int)
declare @i int
set @i = 1
while @i <= 256 begin
insert tally values (@i)
set @i = @i + 1
end
create table testData (testString char(10))
insert testData values ('123a456')
insert testData values ('123ab456')
insert testData values ('123b456')
select
i,
SUBSTRING(testString, i, 1) as srcChar,
coalesce(charMap.replaceChar, SUBSTRING(testString, i, 1)) as charData
from testData cross join tally
left join charMap on SUBSTRING(testString, i, 1) = charMap.srcChar
where i <= LEN(testString)
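The query above returns one row per character. As a rough sketch of how the mapped characters might be stitched back into a single string, one option is FOR XML PATH(''), which works on older versions. The table variable, the sample string and the use of sys.all_objects as a row source for the numbers are assumptions made to keep the example self-contained.

```sql
-- Hypothetical character map, mirroring the charMap table above.
DECLARE @charMap TABLE (srcChar char(1), replaceChar char(1));
INSERT INTO @charMap VALUES ('a', 'z'), ('b', 'y');

DECLARE @s varchar(10) = '123ab456';

-- Inline tally: one number per character position in @s.
WITH tally(i) AS (
    SELECT TOP (LEN(@s)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
    FROM sys.all_objects
)
SELECT (
    -- Look up each character; fall back to the original when it is not mapped.
    SELECT COALESCE(cm.replaceChar, SUBSTRING(@s, t.i, 1))
    FROM tally AS t
    LEFT JOIN @charMap AS cm ON cm.srcChar = SUBSTRING(@s, t.i, 1)
    ORDER BY t.i
    FOR XML PATH(''), TYPE
).value('.', 'varchar(10)') AS mapped;   -- 123zy456
```

On SQL Server 2017 and later, STRING_AGG with WITHIN GROUP (ORDER BY ...) could replace the FOR XML PATH step.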

I don't know why Charles Bretana deleted his answer, so I'm adding it back in as a CW answer: a persisted computed column is a REALLY good way to handle cases where you need cleansed or transformed data almost all the time but must preserve the original garbage. His suggestion is relevant and appropriate REGARDLESS of how you decide to cleanse your data.
Specifically, in my current project, I have a persisted computed column which trims all the leading zeros (luckily this is relatively easily handled in straight T-SQL) from some particular numeric identifiers stored inconsistently with leading zeros. It is stored as a persisted computed column in the tables which need it, and indexed, because that conformed identifier is often used in joins.
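A minimal sketch of what such a column could look like for the stringkey case from the question. The table name dbo.KeyDemo, the column names and the three stripped characters are invented for illustration; REPLACE is deterministic, which is what allows the column to be persisted and indexed.

```sql
-- Hypothetical table: the raw key is preserved, the cleaned key is computed once and stored.
CREATE TABLE dbo.KeyDemo (
    Id              int IDENTITY(1,1) PRIMARY KEY,
    stringkey       varchar(50) NOT NULL,
    stringkey_clean AS REPLACE(REPLACE(REPLACE(stringkey, '-', ''), ' ', ''), '_', '') PERSISTED
);

-- Index the cleaned value so joins on it do not recompute the REPLACE chain per row.
CREATE INDEX IX_KeyDemo_stringkey_clean ON dbo.KeyDemo (stringkey_clean);

INSERT INTO dbo.KeyDemo (stringkey) VALUES ('AB-12 3'), ('X_Y');

SELECT stringkey, stringkey_clean FROM dbo.KeyDemo;   -- AB123 and XY
```

Joins can then compare stringkey_clean on both sides instead of wrapping each column in functions inside the WHERE clause.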

Here are the steps
Create a CLR function
See following code:
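using System;
using System.Data.SqlTypes;
using System.Text.RegularExpressions;

public partial class UserDefinedFunctions
{
[Microsoft.SqlServer.Server.SqlFunction]
public static SqlString Replace2(SqlString inputtext, SqlString filter,SqlString replacewith)
{
string str = inputtext.ToString();
try
{
// filter is a regular expression; every match is replaced with replacewith
string pattern = (string)filter;
string replacement = (string)replacewith;
Regex rgx = new Regex(pattern);
string result = rgx.Replace(str, replacement);
return (SqlString)result;
}
catch (Exception s)
{
// on failure (e.g. an invalid pattern) the error text is returned instead of the data
return (SqlString)s.Message;
}
}
}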
Deploy your CLR function
Now Test it
See following code:
create table dbo.test(dummydata varchar(255))
Go
INSERT INTO dbo.test values('P#ssw1rd'),('This 12is #test')
Go
Update dbo.test
set dummydata=dbo.Replace2(dummydata,'[0-9#]','')
select * from dbo.test
Result:
dummydata
Psswrd
This is test
booom!
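The deployment step above is not spelled out, so here is a rough sketch of what it might involve when done by hand in T-SQL. The assembly name ReplaceFunctions and the DLL path are placeholders, not taken from the answer, and the parameter lengths are an assumption. On SQL Server 2017 and later the "clr strict security" setting may additionally require the assembly to be signed or explicitly trusted.

```sql
-- Allow CLR code to run on the instance (requires appropriate permissions).
EXEC sp_configure 'clr enabled', 1;
RECONFIGURE;
GO

-- Placeholder assembly name and path: point this at the compiled DLL.
CREATE ASSEMBLY ReplaceFunctions
FROM 'C:\path\to\ReplaceFunctions.dll'
WITH PERMISSION_SET = SAFE;
GO

-- Bind a T-SQL function to the static method: assembly.class.method.
CREATE FUNCTION dbo.Replace2 (
    @inputtext   nvarchar(4000),
    @filter      nvarchar(4000),
    @replacewith nvarchar(4000)
)
RETURNS nvarchar(4000)
AS EXTERNAL NAME ReplaceFunctions.UserDefinedFunctions.Replace2;
GO
```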

Here's a modern solution using STRING_SPLIT that's very concise. The drawback is that you need at least SQL Server 2016 running at compatibility level 130.
Declare @strOriginal varchar(100) = 'Juliet ro><0zs my s0x()rz!!?!one!#!#!#!'
Declare @strModified varchar(100) = @strOriginal
Declare @disallowed varchar(100) = '> < ( ) ! ? #'
Select
@strModified = Replace(@strModified, value, '')
From
String_Split(@disallowed,' ')
Select @strModified
It returns:
Juliet ro0zs my s0xrzone

create function RemoveCharacters(@original nvarchar(max) , @badchars nvarchar(max))
returns nvarchar(max)
as
begin
declare @len int = (select len(@badchars))
return REPLACE(TRANSLATE(@original, @badchars, replicate('#' , @len )), '#', '')
end
go
select dbo.RemoveCharacters('Hello World!' , 'lo!' )
--returns He Wrd

Related

Get value from a string between special characters in sql server [duplicate]

I need to create a function that will return the nth element of a delimited string.
For a data migration project, I am converting JSON audit records stored in a SQL Server database into a structured report using a SQL script. The goal is to deliver a SQL script and a SQL function used by the script, without any application code.
(This is a short-term fix that will be used while a new auditing feature is added to the ASP.NET/MVC application.)
There is no shortage of delimited-string-to-table examples available.
I've chosen a Common Table Expression example: http://www.sqlperformance.com/2012/07/t-sql-queries/split-strings
Example: I want to return 67 from '1,222,2,67,888,1111'
This is the easiest way to retrieve the 67 (type-safe!!):
SELECT CAST('<x>' + REPLACE('1,222,2,67,888,1111',',','</x><x>') + '</x>' AS XML).value('/x[4]','int')
In the following you will find examples how to use this with variables for the string, the delimiter and the position (even for edge-cases with XML-forbidden characters)
The easy one
This question is not about a string split approach, but about how to get the nth element. The easiest, fully inlineable way would be this IMO:
This is a real one-liner to get part 2 delimited by a space:
DECLARE @input NVARCHAR(100)=N'part1 part2 part3';
SELECT CAST(N'<x>' + REPLACE(@input,N' ',N'</x><x>') + N'</x>' AS XML).value('/x[2]','nvarchar(max)')
Variables can be used with sql:variable() or sql:column()
Of course you can use variables for delimiter and position (use sql:column to retrieve the position directly from a query's value):
DECLARE @dlmt NVARCHAR(10)=N' ';
DECLARE @pos INT = 2;
SELECT CAST(N'<x>' + REPLACE(@input,@dlmt,N'</x><x>') + N'</x>' AS XML).value('/x[sql:variable("@pos")][1]','nvarchar(max)')
Edge-Case with XML-forbidden characters
If your string might include forbidden characters, you can still do it this way. Just use FOR XML PATH on your string first to replace all forbidden characters with the fitting escape sequence implicitly.
It's a very special case if, additionally, your delimiter is the semicolon. In this case I first replace the delimiter with '#DLMT#', and finally replace that with the XML tags:
SET @input=N'Some <, > and &;Other äöü#€;One more';
SET @dlmt=N';';
SELECT CAST(N'<x>' + REPLACE((SELECT REPLACE(@input,@dlmt,'#DLMT#') AS [*] FOR XML PATH('')),N'#DLMT#',N'</x><x>') + N'</x>' AS XML).value('/x[sql:variable("@pos")][1]','nvarchar(max)');
UPDATE for SQL-Server 2016+
Regretfully the developers forgot to return the part's index with STRING_SPLIT. But, using SQL Server 2016+, there are JSON_VALUE and OPENJSON.
With JSON_VALUE we can pass in the position as the array index.
For OPENJSON the documentation states clearly:
When OPENJSON parses a JSON array, the function returns the indexes of the elements in the JSON text as keys.
A string like 1,2,3 needs nothing more than brackets: [1,2,3].
A string of words like this is an example needs to be ["this","is","an","example"].
These are very easy string operations. Just try it out:
DECLARE @str VARCHAR(100)='Hello John Smith';
DECLARE @position INT = 2;
--We can build the json-path '$[1]' using CONCAT
SELECT JSON_VALUE('["' + REPLACE(@str,' ','","') + '"]',CONCAT('$[',@position-1,']'));
--See this for a position safe string-splitter (zero-based):
SELECT JsonArray.[key] AS [Position]
,JsonArray.[value] AS [Part]
FROM OPENJSON('["' + REPLACE(@str,' ','","') + '"]') JsonArray
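One caveat with building the JSON array by plain concatenation is that a double quote or backslash inside the input would break the JSON. A possible guard, sketched here under the assumption that SQL Server 2016 or later is available, is to run the string through STRING_ESCAPE before inserting the separators. The sample string is made up.

```sql
-- Input containing double quotes, which would otherwise produce invalid JSON.
DECLARE @str nvarchar(100) = N'say "hi" now';

SELECT JSON_VALUE(
           N'["' + REPLACE(STRING_ESCAPE(@str, 'json'), N' ', N'","') + N'"]',
           '$[1]'                      -- zero-based index: the second part
       ) AS part;                      -- "hi" (including the quotes)
```

The escaping is applied first, so the separators added by REPLACE stay unescaped while the quotes that belong to the data are protected.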
In this post I tested various approaches and found, that OPENJSON is really fast. Even much faster than the famous "delimitedSplit8k()" method...
UPDATE 2 - Get the values type-safe
We can use an array within an array simply by using doubled [[]]. This allows for a typed WITH-clause:
DECLARE @SomeDelimitedString VARCHAR(100)='part1|1|20190920';
DECLARE @JsonArray NVARCHAR(MAX)=CONCAT('[["',REPLACE(@SomeDelimitedString,'|','","'),'"]]');
SELECT @SomeDelimitedString AS TheOriginal
,@JsonArray AS TransformedToJSON
,ValuesFromTheArray.*
FROM OPENJSON(@JsonArray)
WITH(TheFirstFragment VARCHAR(100) '$[0]'
,TheSecondFragment INT '$[1]'
,TheThirdFragment DATE '$[2]') ValuesFromTheArray
Here is my initial solution...
It is based on work by Aaron Bertrand http://www.sqlperformance.com/2012/07/t-sql-queries/split-strings
I simply changed the return type to make it a scalar function.
Example:
SELECT dbo.GetSplitString_CTE('1,222,2,67,888,1111',',',4)
CREATE FUNCTION dbo.GetSplitString_CTE
(
@List VARCHAR(MAX),
@Delimiter VARCHAR(255),
@ElementNumber int
)
RETURNS VARCHAR(4000)
AS
BEGIN
DECLARE @result varchar(4000)
DECLARE @Items TABLE ( position int IDENTITY PRIMARY KEY,
Item VARCHAR(4000)
)
DECLARE @ll INT = LEN(@List) + 1, @ld INT = LEN(@Delimiter);
WITH a AS
(
SELECT
[start] = 1,
[end] = COALESCE(NULLIF(CHARINDEX(@Delimiter,
@List, @ld), 0), @ll),
[value] = SUBSTRING(@List, 1,
COALESCE(NULLIF(CHARINDEX(@Delimiter,
@List, @ld), 0), @ll) - 1)
UNION ALL
SELECT
[start] = CONVERT(INT, [end]) + @ld,
[end] = COALESCE(NULLIF(CHARINDEX(@Delimiter,
@List, [end] + @ld), 0), @ll),
[value] = SUBSTRING(@List, [end] + @ld,
COALESCE(NULLIF(CHARINDEX(@Delimiter,
@List, [end] + @ld), 0), @ll)-[end]-@ld)
FROM a
WHERE [end] < @ll
)
INSERT @Items SELECT [value]
FROM a
WHERE LEN([value]) > 0
OPTION (MAXRECURSION 0);
SELECT @result=Item
FROM @Items
WHERE position=@ElementNumber
RETURN @result;
END
GO
How about:
CREATE FUNCTION dbo.NTH_ELEMENT (@Input NVARCHAR(MAX), @Delim CHAR = '-', @N INT = 0)
RETURNS NVARCHAR(MAX)
AS
BEGIN
RETURN (SELECT VALUE FROM STRING_SPLIT(@Input, @Delim) ORDER BY (SELECT NULL) OFFSET @N ROWS FETCH NEXT 1 ROW ONLY)
END
On Azure SQL Database, and on SQL Server 2022, STRING_SPLIT now has an optional ordinal parameter. If the parameter is omitted, or 0 is passed, then the function acts as it did before, and just returns a value column and the order is not guaranteed. If you pass the parameter with the value 1 then the function returns 2 columns, value, and ordinal which (unsurprisingly) provides the ordinal position of the value within the string.
So, if you wanted the 4th delimited value from the string '1,222,2,67,888,1111' you could do the following:
SELECT [value]
FROM STRING_SPLIT('1,222,2,67,888,1111',',',1)
WHERE ordinal = 4;
If the value was in a column, it would look like this:
SELECT SS.[value]
FROM dbo.YourTable YT
CROSS APPLY STRING_SPLIT(YT.YourColumn,',',1) SS
WHERE SS.ordinal = 4;
@a - the value (e.g. 'a/bb/ccc/dddd/ee/ff/....')
@p - the desired position (1,2,3...)
@d - the delimiter ( '/' )
trim(substring(replace(@a,@d,replicate(' ',len(@a))),(@p-1)*len(@a)+1,len(@a)))
The only problem is that if the desired part has leading or trailing blanks, they get trimmed.
Completely based on the article at https://exceljet.net/formula/split-text-with-delimiter
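To make the padding trick concrete, here is a small worked example with made-up values. Each delimiter is blown up to LEN(@a) spaces, so the window of LEN(@a) characters starting at (@p-1)*LEN(@a)+1 can only ever overlap the wanted element plus padding. LTRIM(RTRIM(...)) is used instead of TRIM so the sketch also runs on versions before SQL Server 2017.

```sql
DECLARE @a varchar(100) = 'a/bb/ccc/dddd',  -- sample value, LEN = 13
        @p int          = 3,                -- wanted position
        @d char(1)      = '/';              -- delimiter

-- After REPLACE: 'a' at 1, 'bb' at 15-16, 'ccc' at 30-32, 'dddd' at 46-49.
-- The window for @p = 3 covers positions 27..39, which contains only 'ccc' and spaces.
SELECT LTRIM(RTRIM(
           SUBSTRING(REPLACE(@a, @d, REPLICATE(' ', LEN(@a))),
                     (@p - 1) * LEN(@a) + 1,
                     LEN(@a))
       )) AS nth_part;   -- ccc
```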
In a rare moment of lunacy I just thought that split is far easier if we use XML to parse it out for us:
(Using the variables from @Gary Kindel's answer)
declare @xml xml
set @xml = '<split><el>' + replace(@list,@Delimiter,'</el><el>') + '</el></split>'
select
el = split.el.value('.','varchar(max)')
from @xml.nodes('/split/el') split(el)
This lists all elements of the string, split by the specified character.
We can use an xpath test to filter out empty values, and a further xpath test to restrict this to the element we're interested in. In full Gary's function becomes:
alter FUNCTION dbo.GetSplitString_CTE
(
@List VARCHAR(MAX),
@Delimiter VARCHAR(255),
@ElementNumber int
)
RETURNS VARCHAR(max)
AS
BEGIN
-- escape any XML https://dba.stackexchange.com/a/143140/65992
set @list = convert(VARCHAR(MAX),(select @list for xml path(''), type));
declare @xml xml
set @xml = '<split><el>' + replace(@list,@Delimiter,'</el><el>') + '</el></split>'
declare @ret varchar(max)
set @ret = (select
el = split.el.value('.','varchar(max)')
from @xml.nodes('/split/el[string-length(.)>0][position() = sql:variable("@elementnumber")]') split(el))
return @ret
END
You can put this select into a UDF. If you need to, you can also customize it to take the delimiter; in that case your UDF will have two inputs besides the list: the position N and the delimiter to use.
DECLARE @tlist varchar(max)='10,20,30,40,50,60,70,80,90,100'
DECLARE @i INT=1, @nth INT=3
While len(@tlist) <> 0
BEGIN
IF @i=@nth
BEGIN
select Case when charindex(',',@tlist) <> 0 Then LEFT(@tlist,charindex(',',@tlist)-1)
Else @tlist
END
END
Select @tlist = Case when charindex(',',@tlist) <> 0 Then substring(@tlist,charindex(',',@tlist)+1,len(@tlist))
Else ''
END
SELECT @i=@i+1
END
Alternatively, one can use xml, nodes() and ROW_NUMBER. We can order the elements based on their document order. For example:
DECLARE @Input VARCHAR(100) = '1a,2b,3c,4d,5e,6f,7g,8h'
,@Number TINYINT = 3
DECLARE @XML XML;
DECLARE @value VARCHAR(100);
SET @XML = CAST('<x>' + REPLACE(@Input,',','</x><x>') + '</x>' AS XML);
WITH DataSource ([rowID], [rowValue]) AS
(
SELECT ROW_NUMBER() OVER (ORDER BY T.c ASC)
,T.c.value('.', 'VARCHAR(100)')
FROM @XML.nodes('./x') T(c)
)
SELECT @value = [rowValue]
FROM DataSource
WHERE [rowID] = @Number;
SELECT @value;
I would rather create a temp table with an identity column and fill it up with output from the SPLIT function.
CREATE TABLE #tblVals(Id INT IDENTITY(1,1), Val NVARCHAR(100))
INSERT INTO #tblVals (Val)
SELECT [value] FROM STRING_SPLIT('Val1-Val3-Val2-Val5', '-')
SELECT * FROM #tblVals
Now you can easily do something like below.
DECLARE @val2 NVARCHAR(100) = (SELECT TOP 1 Val FROM #tblVals WHERE Id = 2)
You can use STRING_SPLIT with ROW_NUMBER:
SELECT value, idx FROM
(
SELECT
value,
ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) idx
FROM STRING_SPLIT('Lorem ipsum dolor sit amet.', ' ')
) t
WHERE idx=2
returns second element (idx=2): 'ipsum'
The answer is covered at the URL below.
DECLARE @ AS VARCHAR(MAX) = 'Pawan1,Pawan2,Pawan4,Pawan3'
SELECT VALUE FROM
(
SELECT VALUE , ROW_NUMBER() OVER (ORDER BY (SELECT null)) rnk FROM STRING_SPLIT(@, ',')
)x where rnk = 3
GO
https://msbiskills.com/2018/06/15/sql-puzzle-multiple-ways-to-split-a-string-and-get-nth-row-xml-advanced-sql/
I don't have enough reputation to comment, so I am adding an answer. Please adjust as appropriate.
I have a problem with Gary Kindel's answer for cases where there is nothing between two delimiters.
If you do
select dbo.GetSplitString_CTE('abc^def^^ghi','^',3)
you get
ghi
instead of an empty string.
If you comment out the
WHERE LEN([value]) > 0
line, you get the desired result.
I cannot comment on Gary's solution because of my low reputation.
I know Gary was referencing another link.
I have struggled to understand why we need this variable:
@ld INT = LEN(@Delimiter)
I also don't understand why CHARINDEX has to start at the position given by the length of the delimiter, @ld.
I tested many examples with a single-character delimiter, and they work. Most of the time the delimiter is a single character. However, since the developer included @ld as the length of the delimiter, the code has to work for delimiters of more than one character.
In this case, the following input will fail:
11,,,22,,,33,,,44,,,55,,,
I adapted the code from this link: http://codebetter.com/raymondlewallen/2005/10/26/quick-t-sql-to-parse-a-delimited-string/
I have tested various scenarios, including delimiters that have more than one character.
alter FUNCTION [dbo].[split1]
(
@string1 VARCHAR(8000) -- List of delimited items
, @Delimiter VARCHAR(40) = ',' -- delimiter that separates items
, @ElementNumber int
)
RETURNS varchar(8000)
AS
BEGIN
declare @position int
declare @piece varchar(8000)=''
declare @returnVal varchar(8000)=''
declare @Pattern varchar(50) = '%' + @Delimiter + '%'
declare @counter int =0
declare @ld int = len(@Delimiter)
declare @ls1 int = len (@string1)
declare @foundit int = 0
if patindex(@Pattern , @string1) = 0
return ''
if right(rtrim(@string1),1) <> @Delimiter
set @string1 = @string1 + @Delimiter
set @position = patindex(@Pattern , @string1) + @ld -1
while @position > 0
begin
set @counter = @counter +1
set @ls1 = len (@string1)
if (@ls1 >= @ld)
set @piece = left(@string1, @position - @ld)
else
break
if (@counter = @ElementNumber)
begin
set @foundit = 1
break
end
if len(@string1) > 0
begin
set @string1 = stuff(@string1, 1, @position, '')
set @position = patindex(@Pattern , @string1) + @ld -1
end
else
set @position = -1
end
if @foundit =1
set @returnVal = @piece
else
set @returnVal = ''
return @returnVal
END
You can create a simple table variable and use it as below:
Declare @tbl_split Table (Id INT IDENTITY(1,1), VAL VARCHAR(50))
INSERT @tbl_split SELECT VALUE
FROM string_split('999999:01', ':')
Select val from @tbl_split
WHERE Id=2

SQL to Split between Pipe [duplicate]

I have a need to create a function the will return nth element of a delimited string.
For a data migration project, I am converting JSON audit records stored in a SQL Server database into a structured report using SQL script. Goal is to deliver a sql script and a sql function used by the script without any code.
(This is a short-term fix will be used while a new auditing feature is added the ASP.NET/MVC application)
There is no shortage of delimited string to table examples available.
I've chosen a Common Table Expression example http://www.sqlperformance.com/2012/07/t-sql-queries/split-strings
Example: I want to return 67 from '1,222,2,67,888,1111'
This is the easiest answer to rerieve the 67 (type-safe!!):
SELECT CAST('<x>' + REPLACE('1,222,2,67,888,1111',',','</x><x>') + '</x>' AS XML).value('/x[4]','int')
In the following you will find examples how to use this with variables for the string, the delimiter and the position (even for edge-cases with XML-forbidden characters)
The easy one
This question is not about a string split approach, but about how to get the nth element. The easiest, fully inlineable way would be this IMO:
This is a real one-liner to get part 2 delimited by a space:
DECLARE #input NVARCHAR(100)=N'part1 part2 part3';
SELECT CAST(N'<x>' + REPLACE(#input,N' ',N'</x><x>') + N'</x>' AS XML).value('/x[2]','nvarchar(max)')
Variables can be used with sql:variable() or sql:column()
Of course you can use variables for delimiter and position (use sql:column to retrieve the position directly from a query's value):
DECLARE #dlmt NVARCHAR(10)=N' ';
DECLARE #pos INT = 2;
SELECT CAST(N'<x>' + REPLACE(#input,#dlmt,N'</x><x>') + N'</x>' AS XML).value('/x[sql:variable("#pos")][1]','nvarchar(max)')
Edge-Case with XML-forbidden characters
If your string might include forbidden characters, you still can do it this way. Just use FOR XML PATH on your string first to replace all forbidden characters with the fitting escape sequence implicitly.
It's a very special case if - additionally - your delimiter is the semicolon. In this case I replace the delimiter first to '#DLMT#', and replace this to the XML tags finally:
SET #input=N'Some <, > and &;Other äöü#€;One more';
SET #dlmt=N';';
SELECT CAST(N'<x>' + REPLACE((SELECT REPLACE(#input,#dlmt,'#DLMT#') AS [*] FOR XML PATH('')),N'#DLMT#',N'</x><x>') + N'</x>' AS XML).value('/x[sql:variable("#pos")][1]','nvarchar(max)');
UPDATE for SQL-Server 2016+
Regretfully the developers forgot to return the part's index with STRING_SPLIT. But, using SQL-Server 2016+, there is JSON_VALUE and OPENJSON.
With JSON_VALUE we can pass in the position as the index' array.
For OPENJSON the documentation states clearly:
When OPENJSON parses a JSON array, the function returns the indexes of the elements in the JSON text as keys.
A string like 1,2,3 needs nothing more than brackets: [1,2,3].
A string of words like this is an example needs to be ["this","is","an"," example"].
These are very easy string operations. Just try it out:
DECLARE #str VARCHAR(100)='Hello John Smith';
DECLARE #position INT = 2;
--We can build the json-path '$[1]' using CONCAT
SELECT JSON_VALUE('["' + REPLACE(#str,' ','","') + '"]',CONCAT('$[',#position-1,']'));
--See this for a position safe string-splitter (zero-based):
SELECT JsonArray.[key] AS [Position]
,JsonArray.[value] AS [Part]
FROM OPENJSON('["' + REPLACE(#str,' ','","') + '"]') JsonArray
In this post I tested various approaches and found, that OPENJSON is really fast. Even much faster than the famous "delimitedSplit8k()" method...
UPDATE 2 - Get the values type-safe
We can use an array within an array simply by using doubled [[]]. This allows for a typed WITH-clause:
DECLARE #SomeDelimitedString VARCHAR(100)='part1|1|20190920';
DECLARE #JsonArray NVARCHAR(MAX)=CONCAT('[["',REPLACE(#SomeDelimitedString,'|','","'),'"]]');
SELECT #SomeDelimitedString AS TheOriginal
,#JsonArray AS TransformedToJSON
,ValuesFromTheArray.*
FROM OPENJSON(#JsonArray)
WITH(TheFirstFragment VARCHAR(100) '$[0]'
,TheSecondFragment INT '$[1]'
,TheThirdFragment DATE '$[2]') ValuesFromTheArray
Here is my initial solution...
It is based on work by Aaron Bertrand http://www.sqlperformance.com/2012/07/t-sql-queries/split-strings
I simply changed the return type to make it a scalar function.
Example:
SELECT dbo.GetSplitString_CTE('1,222,2,67,888,1111',',',4)
CREATE FUNCTION dbo.GetSplitString_CTE
(
#List VARCHAR(MAX),
#Delimiter VARCHAR(255),
#ElementNumber int
)
RETURNS VARCHAR(4000)
AS
BEGIN
DECLARE #result varchar(4000)
DECLARE #Items TABLE ( position int IDENTITY PRIMARY KEY,
Item VARCHAR(4000)
)
DECLARE #ll INT = LEN(#List) + 1, #ld INT = LEN(#Delimiter);
WITH a AS
(
SELECT
[start] = 1,
[end] = COALESCE(NULLIF(CHARINDEX(#Delimiter,
#List, #ld), 0), #ll),
[value] = SUBSTRING(#List, 1,
COALESCE(NULLIF(CHARINDEX(#Delimiter,
#List, #ld), 0), #ll) - 1)
UNION ALL
SELECT
[start] = CONVERT(INT, [end]) + #ld,
[end] = COALESCE(NULLIF(CHARINDEX(#Delimiter,
#List, [end] + #ld), 0), #ll),
[value] = SUBSTRING(#List, [end] + #ld,
COALESCE(NULLIF(CHARINDEX(#Delimiter,
#List, [end] + #ld), 0), #ll)-[end]-#ld)
FROM a
WHERE [end] < #ll
)
INSERT #Items SELECT [value]
FROM a
WHERE LEN([value]) > 0
OPTION (MAXRECURSION 0);
SELECT #result=Item
FROM #Items
WHERE position=#ElementNumber
RETURN #result;
END
GO
How about:
CREATE FUNCTION dbo.NTH_ELEMENT (#Input NVARCHAR(MAX), #Delim CHAR = '-', #N INT = 0)
RETURNS NVARCHAR(MAX)
AS
BEGIN
RETURN (SELECT VALUE FROM STRING_SPLIT(#Input, #Delim) ORDER BY (SELECT NULL) OFFSET #N ROWS FETCH NEXT 1 ROW ONLY)
END
On Azure SQL Database, and on SQL Server 2022, STRING_SPLIT now has an optional ordinal parameter. If the parameter is omitted, or 0 is passed, then the function acts as it did before, and just returns a value column and the order is not guaranteed. If you pass the parameter with the value 1 then the function returns 2 columns, value, and ordinal which (unsurprisingly) provides the ordinal position of the value within the string.
So, if you wanted the 4th delimited value from the string '1,222,2,67,888,1111' you could do the following:
SELECT [value]
FROM STRING_SPLIT('1,222,2,67,888,1111',',',1)
WHERE ordinal = 4;
If the value was in a column, it would look like this:
SELECT SS.[value]
FROM dbo.YourTable YT
CROSS APPLY STRING_SPLIT(YT.YourColumn,',',1) SS
WHERE SS.ordinal = 4;
@a - the value (e.g. 'a/bb/ccc/dddd/ee/ff/....')
@p - the desired position (1,2,3...)
@d - the delimiter ( '/' )
trim(substring(replace(@a,@d,replicate(' ',len(@a))),(@p-1)*len(@a)+1,len(@a)))
The only problem: if the desired part has leading or trailing blanks, they get trimmed.
Based entirely on the article at https://exceljet.net/formula/split-text-with-delimiter
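As a worked illustration of the padding trick above, here is a minimal sketch that runs the expression against a sample value. The variable names mirror the answer (written with the `@` prefix that T-SQL requires), the sample string is made up, and `LTRIM(RTRIM(...))` stands in for `TRIM` so that it should also run on versions before SQL Server 2017.

```sql
-- Sample inputs (hypothetical values chosen for illustration)
DECLARE @a varchar(100) = 'a/bb/ccc/dddd';  -- the delimited value
DECLARE @d char(1) = '/';                   -- the delimiter
DECLARE @p int = 3;                         -- 1-based position of the wanted part

-- Replace each delimiter with LEN(@a) spaces so that every part falls inside
-- its own LEN(@a)-wide window, cut out window number @p, then trim the padding.
SELECT LTRIM(RTRIM(
           SUBSTRING(
               REPLACE(@a, @d, REPLICATE(' ', LEN(@a))),
               (@p - 1) * LEN(@a) + 1,
               LEN(@a))
       )) AS Part;  -- 'ccc'
```

Here the string is 13 characters long, so the third window covers positions 27 to 39 of the padded string, which contain only spaces and `ccc`.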
In a rare moment of lunacy I just thought that split is far easier if we use XML to parse it out for us:
(Using the variables from @Gary Kindel's answer)
declare @xml xml
set @xml = '<split><el>' + replace(@List,@Delimiter,'</el><el>') + '</el></split>'
select
el = split.el.value('.','varchar(max)')
from @xml.nodes('/split/el') split(el)
This lists all elements of the string, split by the specified character.
We can use an xpath test to filter out empty values, and a further xpath test to restrict this to the element we're interested in. In full Gary's function becomes:
alter FUNCTION dbo.GetSplitString_CTE
(
@List VARCHAR(MAX),
@Delimiter VARCHAR(255),
@ElementNumber int
)
RETURNS VARCHAR(max)
AS
BEGIN
-- escape any XML https://dba.stackexchange.com/a/143140/65992
set @List = convert(VARCHAR(MAX),(select @List for xml path(''), type));
declare @xml xml
set @xml = '<split><el>' + replace(@List,@Delimiter,'</el><el>') + '</el></split>'
declare @ret varchar(max)
-- keep only non-empty elements, then pick the one at the requested position
set @ret = (select
el = split.el.value('.','varchar(max)')
from @xml.nodes('/split/el[string-length(.)>0][position() = sql:variable("@ElementNumber")]') split(el))
return @ret
END
You can put this logic into a UDF. If you need to, you can customize it to take the delimiter as well; in that case your UDF will have two inputs: the position N and the delimiter to use.
DECLARE #tlist varchar(max)='10,20,30,40,50,60,70,80,90,100'
DECLARE #i INT=1, #nth INT=3
While len(#tlist) <> 0
BEGIN
IF #i=#nth
BEGIN
select Case when charindex(',',#tlist) <> 0 Then LEFT(#tlist,charindex(',',#tlist)-1)
Else #tlist
END
END
Select #tlist = Case when charindex(',',#tlist) <> 0 Then substring(#tlist,charindex(',',#tlist)+1,len(#tlist))
Else ''
END
SELECT #i=#i+1
END
Alternatively, one can use xml, nodes() and ROW_NUMBER. We can order the elements based on their document order. For example:
DECLARE #Input VARCHAR(100) = '1a,2b,3c,4d,5e,6f,7g,8h'
,#Number TINYINT = 3
DECLARE #XML XML;
DECLARE #value VARCHAR(100);
SET #XML = CAST('<x>' + REPLACE(#Input,',','</x><x>') + '</x>' AS XML);
WITH DataSource ([rowID], [rowValue]) AS
(
SELECT ROW_NUMBER() OVER (ORDER BY T.c ASC)
,T.c.value('.', 'VARCHAR(100)')
FROM #XML.nodes('./x') T(c)
)
SELECT #value = [rowValue]
FROM DataSource
WHERE [rowID] = #Number;
SELECT #value;
I would rather create a temp table with an identity column and fill it up with output from the SPLIT function.
CREATE TABLE #tblVals(Id INT IDENTITY(1,1), Val NVARCHAR(100))
INSERT INTO #tblVals (Val)
SELECT [value] FROM STRING_SPLIT('Val1-Val3-Val2-Val5', '-')
SELECT * FROM #tblVals
Now you can easily do something like below.
DECLARE #val2 NVARCHAR(100) = (SELECT TOP 1 Val FROM #tblVals WHERE Id = 2)
You can use STRING_SPLIT with ROW_NUMBER:
SELECT value, idx FROM
(
SELECT
value,
ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) idx
FROM STRING_SPLIT('Lorem ipsum dolor sit amet.', ' ')
) t
WHERE idx=2
returns second element (idx=2): 'ipsum'
This answer is taken from the URL given after the code.
DECLARE # AS VARCHAR(MAX) = 'Pawan1,Pawan2,Pawan4,Pawan3'
SELECT VALUE FROM
(
SELECT VALUE , ROW_NUMBER() OVER (ORDER BY (SELECT null)) rnk FROM STRING_SPLIT(#, ',')
)x where rnk = 3
GO
https://msbiskills.com/2018/06/15/sql-puzzle-multiple-ways-to-split-a-string-and-get-nth-row-xml-advanced-sql/
I don't have enough reputation to comment, so I am adding an answer. Please adjust as appropriate.
I have a problem with Gary Kindel's answer for cases where there is nothing between the two delimiters
If you do
select dbo.GetSplitString_CTE('abc^def^^ghi','^',3)
you get
ghi
instead of an empty string
If you comment out the
WHERE LEN([value]) > 0
line, you get the desired result
I cannot comment on Gary's solution because of my low reputation
I know Gary was referencing another link.
I have struggled to understand why we need this variable
#ld INT = LEN(#Delimiter)
I also don't understand why charindex has to start at the position of length of delimiter, #ld
I tested with many examples with a single character delimiter, and they work. Most of the time, delimiter character is a single character. However, since the developer included the ld as length of delimiter, the code has to work for delimiters that have more than one character
In this case, the following case will fail
11,,,22,,,33,,,44,,,55,,,
I adapted the code from this link: http://codebetter.com/raymondlewallen/2005/10/26/quick-t-sql-to-parse-a-delimited-string/
I have tested various scenarios including the delimiters that have more than one character
alter FUNCTION [dbo].[split1]
(
#string1 VARCHAR(8000) -- List of delimited items
, #Delimiter VARCHAR(40) = ',' -- delimiter that separates items
, #ElementNumber int
)
RETURNS varchar(8000)
AS
BEGIN
declare #position int
declare #piece varchar(8000)=''
declare #returnVal varchar(8000)=''
declare #Pattern varchar(50) = '%' + #Delimiter + '%'
declare #counter int =0
declare #ld int = len(#Delimiter)
declare #ls1 int = len (#string1)
declare #foundit int = 0
if patindex(#Pattern , #string1) = 0
return ''
if right(rtrim(#string1),1) <> #Delimiter
set #string1 = #string1 + #Delimiter
set #position = patindex(#Pattern , #string1) + #ld -1
while #position > 0
begin
set #counter = #counter +1
set #ls1 = len (#string1)
if (#ls1 >= #ld)
set #piece = left(#string1, #position - #ld)
else
break
if (#counter = #ElementNumber)
begin
set #foundit = 1
break
end
if len(#string1) > 0
begin
set #string1 = stuff(#string1, 1, #position, '')
set #position = patindex(#Pattern , #string1) + #ld -1
end
else
set #position = -1
end
if #foundit =1
set #returnVal = #piece
else
set #returnVal = ''
return #returnVal
END
You can create a simple table variable and use it as below:
Declare #tbl_split Table (Id INT IDENTITY(1,1), VAL VARCHAR(50))
INSERT #tbl_split SELECT VALUE
FROM string_split('999999:01', ':')
Select val from #tbl_split
WHERE Id=2

Is there a LastIndexOf in SQL Server?

I am trying to parse out a value from a string that involves getting the last index of a string. Currently, I am doing a horrible hack that involves reversing a string:
SELECT REVERSE(SUBSTRING(REVERSE(DB_NAME()), 1,
CHARINDEX('_', REVERSE(DB_NAME()), 1) - 1))
To me this code is nearly unreadable. I just upgraded to SQL Server 2016 and I am hoping there is a better way.
Is there?
If you want everything after the last _, then use:
select right(db_name(), charindex('_', reverse(db_name()) + '_') - 1)
If you want everything before, then use left():
select left(db_name(), len(db_name()) - charindex('_', reverse(db_name()) + '_'))
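To make the arithmetic in the two expressions above concrete, here is a small sketch that applies them to a fixed string instead of `DB_NAME()`. The variable `@name` and its value are hypothetical, chosen only for illustration.

```sql
-- Hypothetical sample value standing in for DB_NAME()
DECLARE @name varchar(128) = 'Sales_DB_2016';

SELECT
    -- CHARINDEX on the reversed string gives the distance of the last '_' from the end (5 here)
    RIGHT(@name, CHARINDEX('_', REVERSE(@name) + '_') - 1)         AS AfterLast,   -- '2016'
    LEFT(@name, LEN(@name) - CHARINDEX('_', REVERSE(@name) + '_')) AS BeforeLast;  -- 'Sales_DB'
```

The appended `'_'` guarantees that CHARINDEX always finds something, so for a name without an underscore the RIGHT expression returns the whole name. In that same case the LEFT expression would be asked for a length of -1, so it likely needs a guard if such names can occur.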
Wrote 2 functions, 1 to return LastIndexOf for the selected character.
CREATE FUNCTION dbo.LastIndexOf(#source nvarchar(80), #pattern char)
RETURNS int
BEGIN
RETURN (LEN(#source)) - CHARINDEX(#pattern, REVERSE(#source))
END;
GO
and 1 to return a string before this LastIndexOf. Maybe it will be useful to someone.
CREATE FUNCTION dbo.StringBeforeLastIndex(#source nvarchar(80), #pattern char)
RETURNS nvarchar(80)
BEGIN
DECLARE #lastIndex int
SET #lastIndex = (LEN(#source)) - CHARINDEX(#pattern, REVERSE(#source))
RETURN SUBSTRING(#source, 0, #lastindex + 1)
-- +1 because index starts at 0, but length at 1, so to get up to 11th index, we need LENGTH 11+1=12
END;
GO
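No, SQL Server doesn't have LastIndexOf.
These are the available string functions
But you can always create your own function:
CREATE FUNCTION dbo.LastIndexOf(@source varchar(max), @pattern char(1))
RETURNS varchar(max)
AS
BEGIN
-- Despite its name, this returns the text after the last occurrence of @pattern.
-- If @pattern does not occur in @source, SUBSTRING is given a length of -1.
DECLARE @ret varchar(max);
SET @ret = REVERSE(SUBSTRING(REVERSE(@source), 1,
CHARINDEX(@pattern, REVERSE(@source), 1) - 1));
RETURN @ret;
END;
GO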
Once you have one of the split strings from here,you can do it in a set based way like this..
declare #string varchar(max)
set #string='C:\Program Files\Microsoft SQL Server\MSSQL\DATA\AdventureWorks_Data.mdf'
;with cte
as
(select *,row_number() over (order by (select null)) as rownum
from [dbo].[SplitStrings_Numbers](#string,'\')
)
select top 1 item from cte order by rownum desc
**Output:**
AdventureWorks_Data.mdf
CREATE FUNCTION dbo.LastIndexOf(#text NTEXT, #delimiter NTEXT)
RETURNS INT
AS
BEGIN
IF (#text IS NULL) RETURN NULL;
IF (#delimiter IS NULL) RETURN NULL;
DECLARE #Text2 AS NVARCHAR(MAX) = #text;
DECLARE #Delimiter2 AS NVARCHAR(MAX) = #delimiter;
DECLARE #Index AS INT = CHARINDEX(REVERSE(#Delimiter2), REVERSE(#Text2));
IF (#Index < 1) RETURN 0;
DECLARE #ContentLength AS INT = (LEN('|' + #Text2 + '|') - 2);
DECLARE #DelimiterLength AS INT = (LEN('|' + #Delimiter2 + '|') - 2);
DECLARE #Result AS INT = (#ContentLength - #Index - #DelimiterLength + 2);
RETURN #Result;
END
Allows for multi-character delimiters like ", " (comma space).
Returns 0 if the delimiter is not found.
Takes NTEXT for convenience, as NVARCHAR(MAX) values are implicitly cast to NTEXT but not vice versa.
Handles delimiters with leading or trailing spaces correctly!
Try:
select LEN('tran van abc') + 1 - CHARINDEX(' ', REVERSE('tran van abc'))
So, the last index of ' ' is : 9
I came across this thread while searching for a solution to my similar problem which had the exact same requirement but was for a different kind of database that was lacking the REVERSE function.
In my case this was for a OpenEdge (Progress) database, which has a slightly different syntax. This made the INSTR function available to me that most Oracle typed databases offer.
So I came up with the following code:
SELECT
INSTR(foo.filepath, '/',1, LENGTH(foo.filepath) - LENGTH( REPLACE( foo.filepath, '/', ''))) AS IndexOfLastSlash
FROM foo
However, for my specific situation (being the OpenEdge (Progress) database) this did not result into the desired behaviour because replacing the character with an empty char gave the same length as the original string. This doesn't make much sense to me but I was able to bypass the problem with the code below:
SELECT
INSTR(foo.filepath, '/',1, LENGTH( REPLACE( foo.filepath, '/', 'XX')) - LENGTH(foo.filepath)) AS IndexOfLastSlash
FROM foo
Now I understand that this code won't solve the problem for T-SQL because there is no alternative to the INSTR function that offers the occurrence parameter.
Just to be thorough, I'll add the code needed to create this scalar function so it can be used the same way as in the examples above. It will do exactly what the OP wanted: serve as a LastIndexOf method for SQL Server.
-- Drop the function if it already exists
IF OBJECT_ID('INSTR', 'FN') IS NOT NULL
DROP FUNCTION INSTR
GO
-- User-defined function to implement Oracle INSTR in SQL Server
CREATE FUNCTION INSTR (#str VARCHAR(8000), #substr VARCHAR(255), #start INT, #occurrence INT)
RETURNS INT
AS
BEGIN
DECLARE #found INT = #occurrence,
#pos INT = #start;
WHILE 1=1
BEGIN
-- Find the next occurrence
SET #pos = CHARINDEX(#substr, #str, #pos);
-- Nothing found
IF #pos IS NULL OR #pos = 0
RETURN #pos;
-- The required occurrence found
IF #found = 1
BREAK;
-- Prepare to find another one occurrence
SET #found = #found - 1;
SET #pos = #pos + 1;
END
RETURN #pos;
END
GO
To state the obvious: when the REVERSE function is available you do not need to create this scalar function, and you can get the required result like this:
SELECT
LEN(foo.filepath) - CHARINDEX('\', REVERSE(foo.filepath))+1 AS LastIndexOfSlash
FROM foo
Try this.
drop table #temp
declare #brokername1 nvarchar(max)='indiabullssecurities,canmoney,indianivesh,acumencapitalmarket,sharekhan,edelweisscapital';
Create Table #temp
(
ID int identity(1,1) not null,
value varchar(100) not null
)
INSERT INTO #temp(value) SELECT value from STRING_SPLIT(#brokername1,',')
declare #id int;
set #id=(select max(id) from #temp)
--print #id
declare #results varchar(500)
select #results = coalesce(#results + ',', '') + convert(varchar(12),value)
from #temp where id<#id
order by id
print #results

clearing unwanted characters with replace [duplicate]

This is based on a similar question How to Replace Multiple Characters in Access SQL?
I wrote this since sql server 2005 seems to have a limit on replace() function to 19 replacements inside a where clause.
I have the following task: Need to perform a match on a column, and to improve the chances of a match stripping multiple un-needed chars using replace() function
DECLARE #es NVarChar(1) SET #es = ''
DECLARE #p0 NVarChar(1) SET #p0 = '!'
DECLARE #p1 NVarChar(1) SET #p1 = '#'
---etc...
SELECT *
FROM t1,t2
WHERE REPLACE(REPLACE(t1.stringkey,#p0, #es), #p1, #es)
= REPLACE(REPLACE(t2.stringkey,#p0, #es), #p1, #es)
---etc
If there are more than 19 REPLACE() calls in that where clause, it doesn't work. So the solution I came up with is to create a SQL function, called trimChars in this example (excuse the variables starting at #p22):
CREATE FUNCTION [trimChars] (
#string varchar(max)
)
RETURNS varchar(max)
AS
BEGIN
DECLARE #es NVarChar(1) SET #es = ''
DECLARE #p22 NVarChar(1) SET #p22 = '^'
DECLARE #p23 NVarChar(1) SET #p23 = '&'
DECLARE #p24 NVarChar(1) SET #p24 = '*'
DECLARE #p25 NVarChar(1) SET #p25 = '('
DECLARE #p26 NVarChar(1) SET #p26 = '_'
DECLARE #p27 NVarChar(1) SET #p27 = ')'
DECLARE #p28 NVarChar(1) SET #p28 = '`'
DECLARE #p29 NVarChar(1) SET #p29 = '~'
DECLARE #p30 NVarChar(1) SET #p30 = '{'
DECLARE #p31 NVarChar(1) SET #p31 = '}'
DECLARE #p32 NVarChar(1) SET #p32 = ' '
DECLARE #p33 NVarChar(1) SET #p33 = '['
DECLARE #p34 NVarChar(1) SET #p34 = '?'
DECLARE #p35 NVarChar(1) SET #p35 = ']'
DECLARE #p36 NVarChar(1) SET #p36 = '\'
DECLARE #p37 NVarChar(1) SET #p37 = '|'
DECLARE #p38 NVarChar(1) SET #p38 = '<'
DECLARE #p39 NVarChar(1) SET #p39 = '>'
DECLARE #p40 NVarChar(1) SET #p40 = '#'
DECLARE #p41 NVarChar(1) SET #p41 = '-'
return REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(
#string, #p22, #es), #p23, #es), #p24, #es), #p25, #es), #p26, #es), #p27, #es), #p28, #es), #p29, #es), #p30, #es), #p31, #es), #p32, #es), #p33, #es), #p34, #es), #p35, #es), #p36, #es), #p37, #es), #p38, #es), #p39, #es), #p40, #es), #p41, #es)
END
This can then be used in addition to the other replace strings
SELECT *
FROM t1,t2
WHERE dbo.trimChars(REPLACE(REPLACE(t1.stringkey,#p0, #es), #p1, #es))
= dbo.trimChars(REPLACE(REPLACE(t2.stringkey,#p0, #es), #p1, #es))
I created a few more functions to do similar replacing, nested like so: trimChars(trimMoreChars(...))
SELECT *
FROM t1,t2
WHERE dbo.trimChars(dbo.trimMoreChars(REPLACE(REPLACE(t1.stringkey,#p0, #es), #p1, #es)))
= dbo.trimChars(dbo.trimMoreChars(REPLACE(REPLACE(t2.stringkey,#p0, #es), #p1, #es)))
Can someone give me a better solution to this problem in terms of performance and maybe a cleaner implementation?
One useful trick in SQL is the ability to use #var = function(...) to assign a value. If you have multiple records in your record set, your variable is assigned once per row, so the replacements accumulate:
declare #badStrings table (item varchar(50))
INSERT INTO #badStrings(item)
SELECT '>' UNION ALL
SELECT '<' UNION ALL
SELECT '(' UNION ALL
SELECT ')' UNION ALL
SELECT '!' UNION ALL
SELECT '?' UNION ALL
SELECT '#'
declare #testString varchar(100), #newString varchar(100)
set #teststring = 'Juliet ro><0zs my s0x()rz!!?!one!#!#!#!'
set #newString = #testString
SELECT #newString = Replace(#newString, item, '') FROM #badStrings
select #newString -- returns 'Juliet ro0zs my s0xrzone'
I would seriously consider making a CLR UDF instead and using regular expressions (both the string and the pattern can be passed in as parameters) to do a complete search and replace for a range of characters. It should easily outperform this SQL UDF.
I really like @Juliett's solution! I would just use a CTE to get all the invalid characters:
DECLARE #badStrings VARCHAR(100)
DECLARE #teststring VARCHAR(100)
SET #badStrings = '><()!?#'
SET #teststring = 'Juliet ro><0zs my s0x()rz!!?!one!#!#!#!'
;WITH CTE AS
(
SELECT SUBSTRING(#badStrings, 1, 1) AS [String], 1 AS [Start], 1 AS [Counter]
UNION ALL
SELECT SUBSTRING(#badStrings, [Start] + 1, 1) AS [String], [Start] + 1, [Counter] + 1
FROM CTE
WHERE [Counter] < LEN(#badStrings)
)
SELECT #teststring = REPLACE(#teststring, CTE.[String], '') FROM CTE
SELECT #teststring
Juliet ro0zs my s0xrzone
I suggest creating a scalar user-defined function. Here is an example (apologies in advance: the variable names are in Spanish):
CREATE FUNCTION [dbo].[Udf_ReplaceChars] (
#cadena VARCHAR(500), -- String to manipulate
#caracteresElim VARCHAR(100), -- String of characters to be replaced
#caracteresReem VARCHAR(100) -- String of characters for replacement
)
RETURNS VARCHAR(500)
AS
BEGIN
DECLARE #cadenaFinal VARCHAR(500), #longCad INT, #pos INT, #caracter CHAR(1), #posCarER INT;
SELECT
#cadenaFinal = '',
#longCad = LEN(#cadena),
#pos = 1;
IF LEN(#caracteresElim)<>LEN(#caracteresReem)
BEGIN
RETURN NULL;
END
WHILE #pos <= #longCad
BEGIN
SELECT
#caracter = SUBSTRING(#cadena,#pos,1),
#pos = #pos + 1,
#posCarER = CHARINDEX(#caracter,#caracteresElim);
IF #posCarER <= 0
BEGIN
SET #cadenaFinal = #cadenaFinal + #caracter;
END
ELSE
BEGIN
SET #cadenaFinal = #cadenaFinal + SUBSTRING(#caracteresReem,#posCarER,1)
END
END
RETURN #cadenaFinal;
END
Here is an example using this function:
SELECT dbo.Udf_ReplaceChars('This is a test.','sat','Z47');
And the result is: 7hiZ iZ 4 7eZ7.
As you can see, each character of the #caracteresElim parameter is replaced by the character in the same position from the #caracteresReem parameter.
While this question was asked about SQL Server 2005, it's worth noting that as of SQL Server 2017 this can be done with the new TRANSLATE function.
https://learn.microsoft.com/en-us/sql/t-sql/functions/translate-transact-sql
I hope this information helps people who get to this page in the future.
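As a rough sketch of how TRANSLATE could be applied to the original task of stripping characters: since TRANSLATE maps characters one-to-one rather than deleting them, one approach is to map every unwanted character to a single placeholder and then remove that placeholder. The variable names, the sample string and the choice of `~` as placeholder are assumptions made for illustration.

```sql
-- Requires SQL Server 2017 or later.
DECLARE @input varchar(100) = 'Juliet ro><0zs my s0x()rz!!?!one';
DECLARE @bad   varchar(20)  = '><()!?';   -- characters to strip

-- Map every unwanted character to one placeholder ('~' is assumed not to occur
-- in the data), then delete the placeholder with a single REPLACE.
-- TRANSLATE needs its 2nd and 3rd arguments to have the same length.
SELECT REPLACE(
           TRANSLATE(@input, @bad, REPLICATE('~', LEN(@bad))),
           '~', '') AS Cleaned;  -- 'Juliet ro0zs my s0xrzone'
```

This keeps the expression at two function calls regardless of how many characters are listed in `@bad`.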
I had a one-off data migration issue where the source data could not output correctly some unusual/technical characters plus the ubiquitous extra commas in CSVs.
We decided that for each such character the source extract should replace them with something that was recognisable to both the source system and the SQL Server that was loading them but which would not be in the data otherwise.
It did mean however that in various columns across various tables these replacement characters would appear and I would have to replace them. Nesting multiple REPLACE functions made the import code look scary and prone to errors in misjudging the placement and number of brackets so I wrote the following function. I know it can process a column in a table of 3,000 rows in less than a second though I'm not sure how quickly it will scale up to multi-million row tables.
create function [dbo].[udf_ReplaceMultipleChars]
(
#OriginalString nvarchar(4000)
, #ReplaceTheseChars nvarchar(100)
, #LengthOfReplacement int = 1
)
returns nvarchar(4000)
begin
declare #RevisedString nvarchar(4000) = N'';
declare #lengthofinput int =
(
select len(#OriginalString)
);
with AllNumbers
as (select 1 as Number
union all
select Number + 1
from AllNumbers
where Number < #lengthofinput)
select #RevisedString += case
when (charindex(substring(#OriginalString, Number, 1), #ReplaceTheseChars, 1) - 1) % 2
= 0 then
substring(
#ReplaceTheseChars
, charindex(
substring(#OriginalString, Number, 1)
, #ReplaceTheseChars
, 1
) + 1
, #LengthOfReplacement
)
else
substring(#OriginalString, Number, 1)
end
from AllNumbers
option (maxrecursion 4000);
return (#RevisedString);
end;
It works by passing in the string to be evaluated, which contains the characters to be replaced (#OriginalString), along with a string of paired characters where the first character is to be replaced by the second, the third by the fourth, the fifth by the sixth and so on (#ReplaceTheseChars).
Here is the string of chars that I needed to replace and their replacements... [']"~,{Ø}°$±|¼¦¼ª½¬½^¾#✓
i.e. an opening square bracket denotes an apostrophe, a closing one a double quote. You can see that there were vulgar fractions as well as degree and diameter symbols in there.
There is a default #LengthOfReplacement that is included as a starting point if anyone needed to replace longer strings. I played around with that in my project but the single char replacement was the main function.
The condition of the case statement is important. It ensures that it only replaces the character if it is found in your #ReplaceTheseChars variable and that the character has to be found in an odd numbered position (the minus 1 from charindex result ensures that anything NOT found returns a negative modulo value). i.e if you find a tilde (~) in position 5 it will replace it with a comma but if on a subsequent run it found the comma in position 6 it would not replace it with a curly bracket ({).
This can be best demonstrated with an example...
declare #ProductDescription nvarchar(20) = N'abc~def[¦][123';
select #ProductDescription
= dbo.udf_ReplaceMultipleChars(
#ProductDescription
/* NB the doubling up of the apostrophe is necessary in the string but resolves to a single apostrophe when passed to the function */
,'['']"~,{Ø}°$±|¼¦¼ª½¬½^¾#✓'
, default
);
select #ProductDescription
, dbo.udf_ReplaceMultipleChars(
#ProductDescription
,'['']"~,{Ø}°$±|¼¦¼ª½¬½^¾#✓'
/* if you didn't know how to type those peculiar chars in then you can build a string like this... '[' + nchar(0x0027) + ']"~,{' + nchar(0x00D8) + '}' + nchar(0x00B0) etc */
,
default
);
This will return both the value after the first pass through the function and the second time as follows...
abc,def'¼"'123 abc,def'¼"'123
A table update would just be
update a
set a.Col1 = dbo.udf_ReplaceMultipleChars(a.Col1,'~,]"',1)
from TestTable a
Finally (I hear you say!), although I've not had access to the translate function I believe that this function can process the example shown in the documentation quite easily. The TRANSLATE function demo is
SELECT TRANSLATE('2*[3+4]/{7-2}', '[]{}', '()()');
which returns 2*(3+4)/(7-2) although I understand it might not work on 2*[3+4]/[7-2] !!
My function would approach this as follows listing each char to be replaced followed by its replacement [ --> (, { --> ( etc.
select dbo.udf_ReplaceMultipleChars('2*[3+4]/{7-2}', '[({(])})', 1);
which will also work for
select dbo.udf_ReplaceMultipleChars('2*[3+4]/[7-2]', '[({(])})', 1);
I hope someone finds this useful and if you get to test its performance against larger tables do let us know one way or another!
declare #testVal varchar(20)
set #testVal = '?t/es?ti/n*g 1*2?3*'
select #testVal = REPLACE(#testVal, item, '') from (select '?' item union select '*' union select '/') list
select #testVal;
One option is to use a numbers/tally table to drive an iterative process via a pseudo-set based query.
The general idea of char replacement can be demonstrated with a simple character map table approach:
create table charMap (srcChar char(1), replaceChar char(1))
insert charMap values ('a', 'z')
insert charMap values ('b', 'y')
create table testChar(srcChar char(1))
insert testChar values ('1')
insert testChar values ('a')
insert testChar values ('2')
insert testChar values ('b')
select
coalesce(charMap.replaceChar, testChar.srcChar) as charData
from testChar left join charMap on testChar.srcChar = charMap.srcChar
Then you can bring in the tally table approach to do the lookup on each character position in the string.
create table tally (i int)
declare #i int
set #i = 1
while #i <= 256 begin
insert tally values (#i)
set #i = #i + 1
end
create table testData (testString char(10))
insert testData values ('123a456')
insert testData values ('123ab456')
insert testData values ('123b456')
select
i,
SUBSTRING(testString, i, 1) as srcChar,
coalesce(charMap.replaceChar, SUBSTRING(testString, i, 1)) as charData
from testData cross join tally
left join charMap on SUBSTRING(testString, i, 1) = charMap.srcChar
where i <= LEN(testString)
I don't know why Charles Bretana deleted his answer, so I'm adding it back in as a CW answer, but a persisted computed column is a REALLY good way to handle these cases where you need cleansed or transformed data almost all the time, but need to preserve the original garbage. His suggestion is relevant and appropriate REGARDLESS of how you decide to cleanse your data.
Specifically, in my current project, I have a persisted computed column which trims all the leading zeros (luckily this is relatively easily handled in straight T-SQL) from some particular numeric identifiers stored inconsistently with leading zeros. It is stored as a persisted computed column in the tables which need it, and indexed, because that conformed identifier is often used in joins.
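The project code itself is not shown, so the following is only a sketch of what such a column might look like. The table `dbo.Accounts`, its columns and the index name are hypothetical, and the zero-trimming expression is one common way to do it, not necessarily the one used in that project.

```sql
-- Hypothetical table: RawIdentifier holds values such as '000123' or '123'
CREATE TABLE dbo.Accounts
(
    AccountId     int IDENTITY(1,1) PRIMARY KEY,
    RawIdentifier varchar(20) NOT NULL,
    -- Keep everything from the first non-zero character onward; the appended '.'
    -- makes PATINDEX find a match even when the value consists only of zeros
    -- (in which case the result is an empty string).
    CleanIdentifier AS SUBSTRING(RawIdentifier,
                                 PATINDEX('%[^0]%', RawIdentifier + '.'),
                                 LEN(RawIdentifier)) PERSISTED
);

-- Index the cleansed value so joins on it can use an index seek
CREATE INDEX IX_Accounts_CleanIdentifier ON dbo.Accounts (CleanIdentifier);
```

The original value stays untouched in `RawIdentifier`, while queries and joins can use `CleanIdentifier` without repeating the cleansing expression.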
Here are the steps
Create a CLR function
See following code:
public partial class UserDefinedFunctions
{
[Microsoft.SqlServer.Server.SqlFunction]
public static SqlString Replace2(SqlString inputtext, SqlString filter,SqlString replacewith)
{
string str = inputtext.ToString();
try
{
string pattern = (string)filter;
string replacement = (string)replacewith;
Regex rgx = new Regex(pattern);
string result = rgx.Replace(str, replacement);
return (SqlString)result;
}
catch (Exception s)
{
return (SqlString)s.Message;
}
}
}
Deploy your CLR function
Now Test it
See following code:
create table dbo.test(dummydata varchar(255))
Go
INSERT INTO dbo.test values('P#ssw1rd'),('This 12is #test')
Go
Update dbo.test
set dummydata=dbo.Replace2(dummydata,'[0-9#]','')
select * from dbo.test
dummydata, Psswrd, This is test booom!!!!!!!!!!!!!
Here's a modern solution using STRING_SPLIT that's very concise. The drawback is that you need at least version SQL Server 2016 running at compatibility level 130.
Declare #strOriginal varchar(100) = 'Juliet ro><0zs my s0x()rz!!?!one!#!#!#!'
Declare #strModified varchar(100) = #strOriginal
Declare #disallowed varchar(100) = '> < ( ) ! ? #'
Select
#strModified = Replace(#strModified, value, '')
From
String_Split(#disallowed,' ')
Select #strModified
It returns:
Juliet ro0zs my s0xrzone
create function RemoveCharacters(#original nvarchar(max) , #badchars nvarchar(max))
returns nvarchar(max)
as
begin
declare #len int = (select len(#badchars))
return REPLACE(TRANSLATE(#original, #badchars, replicate('#' , #len )), '#', '')
end
go
select dbo.RemoveCharacters('Hello World!' , 'lo!' )
--returns He Wrd

sql search from csv string

I'm building a search page where I have to search multiple fields with a single textbox,
so I will get the search text as a CSV string in my stored procedure.
My table is as below:
ID Name age
5 bob 23
6 bod.harry 34
7 charles 44
I need a SQL query something like this:
declare @searchtext varchar(100)='bob,harry,charley'
select * from employee where name like (@searchtext)
This query should return both of these records (IDs 5 and 6).
You can do it this way in a stored procedure:
declare @searchtext varchar(1000)
set @searchtext ='bob,harry,charley'
declare @filter varchar(2000)
-- turn 'a,b,c' into (name LIKE '%a%' OR name LIKE '%b%' OR name LIKE '%c%')
set @filter = '(name LIKE ''%' + replace(@searchtext,',','%'' OR name LIKE ''%') + '%'')'
exec
('
select *
from mytab
where ' + @filter + '
'
)
Use (or adapt) this splitting function:
ALTER FUNCTION [dbo].CsvToList(#SplitOn char(1), #List varchar(8000))
RETURNS TABLE
AS
RETURN
(
SELECT
ROW_NUMBER() OVER(ORDER BY number) AS RowNumber
,LTRIM(RTRIM(SUBSTRING(ListValue, number+1, CHARINDEX(#SplitOn, ListValue, number+1)-number - 1))) AS ListValue
FROM (
SELECT #SplitOn + #List + #SplitOn AS ListValue
) AS InnerQuery
INNER JOIN master.dbo.spt_values n ON n.Number < LEN(InnerQuery.ListValue)
WHERE SUBSTRING(ListValue, number, 1) = #SplitOn
AND n.type = 'P'
);
GO
usage
declare @searchtext varchar(8000)='bob,harry,charley'
select DISTINCT * from employee e
JOIN dbo.CsvToList(',', @searchtext) f
ON f.ListValue = e.name
You'll need to break #searchtext into multiple strings, one for each name. It's doable in TSQL but may be easier in your application code. You can then compare those with your Name field.
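If the split is done in T-SQL, one possible sketch (assuming SQL Server 2016 or later, where the built-in STRING_SPLIT is available) joins each search term to the table with a LIKE comparison. The table and column names are taken from the question; the rest is an illustration, not the answerer's code.

```sql
-- Requires SQL Server 2016+ (compatibility level 130) for STRING_SPLIT.
DECLARE @searchtext varchar(1000) = 'bob,harry,charley';

SELECT DISTINCT e.ID, e.Name, e.age
FROM employee AS e
JOIN STRING_SPLIT(@searchtext, ',') AS s
  ON e.Name LIKE '%' + s.value + '%';   -- substring match per search term
```

With the sample rows from the question this should return IDs 5 and 6: 'bob' matches `bob`, 'harry' matches `bod.harry`, and 'charley' matches nothing. Note that any `%` or `_` inside a search term is treated as a LIKE wildcard.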
If I'm not mistaken, SQL Server doesn't support regex. You can use table-valued parameters. If you are using Entity Framework, then you could do this:
var dc = new MyContext();
var result = dc.employees.Where(x => new [] { "bob", "harry", "charley" }.Contains(x.name));
and finally you might construct the following
select * from employee where name in (#Param1, #Param2, #Param3, #Param4)
EDIT
I strongly discourage using CSV because of the performance cost (you have to parse the CSV) and the possibility of errors (consider this CSV: Foo,Bar,"Foo with, comma","comma, "" and quote")
P.S. If you use table valued parameter when you assign the value use DataTable as source.
The above version of [dbo].CsvToList does not work with long input string with a lot of separators. Table spt_values where type='P' has limited number of records. In my case the function returned 16 rows instead of 66. Some advise to create your own table with numbers. I used a different version of this function I copied from other place:
CREATE FUNCTION [dbo].[fngCsvToList](#SplitOn char(1), #List varchar(8000))
RETURNS #Result TABLE (ListValue varchar(100))
AS
BEGIN
DECLARE #str VARCHAR(100) -- same width as ListValue, so items are not truncated
DECLARE #ind Int
IF(#List is not null)
BEGIN
SET #List = REPLACE(REPLACE(REPLACE(LTRIM(RTRIM(#List)), CHAR(10), ''), CHAR(13), ''), CHAR(9), '')
SET #ind = CharIndex(#SplitOn, #List)
WHILE #ind > 0
BEGIN
SET #str = SUBSTRING(#List, 1, #ind-1)
SET #List = SUBSTRING(#List, #ind+1, LEN(#List)-#ind)
INSERT INTO #Result values (LTRIM(RTRIM(#str)))
SET #ind = CharIndex(#SplitOn, #List)
END
SET #str = #List
INSERT INTO #Result values (LTRIM(RTRIM(#str) ))
END
RETURN
END