Replace all numbers with three digits or more - sql

I have a field say "keywords" which contains random strings of numbers and I'd like to clean the field from any string of numbers which has more than 3 digits.
I have searched and know wildcards are not possible in replace. Any idea how I can go about that?

Here's a good place to start
Say you have a table called "test_test":
create table dbo.test_test (thisStuff varchar(100));
With a value like this in it:
insert into test_test values ('Hello123 this is 12 a test 22983o398r57298298347238');
You can do some limited pattern matching with patindex():
select substring(thisStuff,
1,
patindex('%[0-9][0-9][0-9]%',thisStuff)-1) +
substring(thisStuff,
patindex('%[0-9][0-9][0-9]%',thisStuff)+3,
len(thisStuff))
from test_test
Which converts this value:
Hello123 this is 12 a test 22983o398r57298298347238
Into this value:
Hello this is 12 a test 22983o398r57298298347238
In update form it would look like this:
update test_test set thisStuff =
substring(thisStuff,
1,
patindex('%[0-9][0-9][0-9]%',thisStuff)-1) +
substring(thisStuff,
patindex('%[0-9][0-9][0-9]%',thisStuff)+3,
len(thisStuff));
Which, when run over and over, gives you the progressive values:
Hello this is 12 a test 83o398r57298298347238
Hello this is 12 a test 83or57298298347238
Hello this is 12 a test 83or98298347238
Hello this is 12 a test 83or98347238
Hello this is 12 a test 83or47238
Hello this is 12 a test 83or38
Before erroring out
Msg 3621, Level 0, State 0.
The statement has been terminated.
Msg 537, Level 16, State 2.
Invalid length parameter passed to the LEFT or SUBSTRING function. (Line 35)

Since you are on 2016, you can use String_Split() in concert with Try_Convert()
Example
Declare #YourTable table (idproduct int,searchkeywords varchar(500))
Insert Into #YourTable values
(109070,'stands & cabinets kantec ams300 1010055 43212002 03906786808 7503 ktkams ltk ams 300')
Select A.idproduct
,NewString = B.S
From #YourTable A
Cross Apply (
Select S = Stuff((Select ' ' +Value
From (Select Value,Seq=Row_Number() over (Order by (select null))
From String_Split(A.searchkeywords,' ')
) B1
Where (try_convert(float,Value) is null)
or (try_convert(float,Value) is not null and len(Value)<=3)
Order by Seq
For XML Path ('')),1,1,'')
) B
Returns
idproduct NewString
109070 stands & cabinets kantec ams300 ktkams ltk ams 300
If you are satisfied with the results, you can apply an update like so:
Update A Set searchkeywords = B.S
From #YourTable A
Cross Apply (
Select S = Stuff((Select ' ' +Value
From (Select Value,Seq=Row_Number() over (Order by (select null))
From String_Split(A.searchkeywords,' ')
) B1
Where (try_convert(float,Value) is null)
or (try_convert(float,Value) is not null and len(Value)<=3)
Order by Seq
For XML Path ('')),1,1,'')
) B

Related

I need help parsing an HL7 string with TSQL

I have a column in a table that looks like this
Name
WALKER^JAMES^K^^
ANDERSON^MICHAEL^R^^
HUFF^CHRIS^^^
WALKER^JAMES^K^^
SWEARINGEN^TOMMY^L^^
SMITH^JOHN^JACCOB^^
I need to write a query that looks like this
Name
FirstName
LastName
MiddleName
WALKER^JAMES^K^^
JAMES
WALKER
K
ANDERSON^MICHAEL^R^^
MICHAEL
ANDERSON
R
HUFF^CHRIS^^^
CHRIS
HUFF
BUTLER^STEWART^M^^
STEWART
BUTLER
M
SWEARINGEN^TOMMY^L^^
TOMMY
SWEARINGEN
L
SMITH^JOHN^JACCOB^^
JOHN
SMITH
JACCOB
I need help generating the LastName column.
This is what I've tried so far
SUBSTRING
(
--SEARCH THE NAME COLUMN
Name,
--Starting after the first '^'
CHARINDEX('^', Name) + 1 ),
--Index of second ^ minus the index of the first ^
(CHARINDEX('^', PatientName, CHARINDEX('^', PatientName) +1)) - (CHARINDEX('^', PatientName))
)
This produces:
Invalid length parameter passed to the LEFT or SUBSTRING function.
I know this can work because if I change the minus sign to a plus sign it performs as expected.
It produces the right integer.
Where am I going wrong? Is there a better way to do this?
If you are using the latest SQL Server versions 2016 13.x or higher, you can maximize the use of string_split function with ordinal (position).
declare #strTable table(sqlstring varchar(max))
insert into #strTable (sqlstring) values ('WALKER^JAMES^K^^')
insert into #strTable (sqlstring) values ('ANDERSON^MICHAEL^R^^')
insert into #strTable (sqlstring) values ('HUFF^CHRIS^^^')
insert into #strTable (sqlstring) values ('SWEARINGEN^TOMMY^L^^');
with tmp as
(select value s, Row_Number() over (order by (select 0)) n from #strTable
cross apply String_Split(sqlstring, '^', 1))
select t2.s as FirstName, t1.s as LastName, t3.s as MiddleInitial from tmp t1
left join tmp t2 on t2.n-t1.n = 1
left join tmp t3 on t3.n-t1.n = 2
where t1.n = 1 or t1.n % 5 = 1
I recommend SUBSTRING() as it will perform the best. The challenge with SUBSTRING is it's hard to account to keep track of the nested CHARDINDEX() calls so it's better to break the calculation into pieces. I use CROSS APPLY to alias each "^" found and start from there to search for the next. Also allows to do NULLIF() = 0, so if it can't find the "^", it just returns a NULL instead of erroring out
Parse Delimited String using SUBSTRING() and CROSS APPLY
DROP TABLE IF EXISTS #Name
CREATE TABLE #Name (ID INT IDENTITY(1,1) PRIMARY KEY,[Name] varchar(255))
INSERT INTO #Name
VALUES ('WALKER^JAMES^K^^')
,('ANDERSON^MICHAEL^R^^')
,('HUFF^CHRIS^^^')
,('SWEARINGEN^TOMMY^L^^');
SELECT ID
,A.[Name]
,LastName = NULLIF(SUBSTRING(A.[Name],0,idx1),'')
,FirstName = NULLIF(SUBSTRING(A.[Name],idx1+1,idx2-idx1-1),'')
,MiddleInitial = NULLIF(SUBSTRING(A.[Name],idx2+1,idx3-idx2-1),'')
FROM #Name AS A
CROSS APPLY (SELECT idx1 = NULLIF(CHARINDEX('^',[Name]),0)) AS B
CROSS APPLY (SELECT idx2 = NULLIF(CHARINDEX('^',[Name],idx1+1),0)) AS C
CROSS APPLY (SELECT idx3 = NULLIF(CHARINDEX('^',[Name],idx2+1),0)) AS D

How to SELECT string between second and third instance of ",,"?

I am trying to get string between second and third instance of ",," using SQL SELECT.
Apparently functions substring and charindex are useful, and I have tried them but the problem is that I need the string between those specific ",,"s and the length of the strings between them can change.
Can't find working example anywhere.
Here is an example:
Table: test
Column: Column1
Row1: cat1,,cat2,,cat3,,cat4,,cat5
Row2: dogger1,,dogger2,,dogger3,,dogger4,,dogger5
Result: cat3dogger3
Here is my closest attempt, it works if the strings are same length every time, but they aren't:
SELECT SUBSTRING(column1,LEN(LEFT(column1,CHARINDEX(',,', column1,12)+2)),LEN(column1) - LEN(LEFT(column1,CHARINDEX(',,', column1,20)+2)) - LEN(RIGHT(column1,CHARINDEX(',,', (REVERSE(column1)))))) AS column1
FROM testi
Just repeat sub-string 3 times, each time moving onto the next ",," e.g.
select
-- Substring till the third ',,'
substring(z.col1, 1, patindex('%,,%',z.col1)-1)
from (values ('cat1,,cat2,,cat3,,cat4,,cat5'),('dogger1,,dogger2,,dogger3,,dogger4,,dogger5')) x (col1)
-- Substring from the first ',,'
cross apply (values (substring(x.col1,patindex('%,,%',x.col1)+2,len(x.col1)))) y (col1)
-- Substring from the second ',,'
cross apply (values (substring(y.col1,patindex('%,,%',y.col1)+2,len(y.col1)))) z (col1);
And just to reiterate, this is a terrible way to store data, so the best solution is to store it properly.
Here is an alternative solution using charindex. The base idea is the same as in Dale K's an answer, but instead of cutting the string, we specify the start_location for the search by using the third, optional parameter, of charindex. This way, we get the location of each separator, and could slip each value off from the main string.
declare #vtest table (column1 varchar(200))
insert into #vtest ( column1 ) values('dogger1,,dogger2,,dogger3,,dogger4,,dogger5')
insert into #vtest ( column1 ) values('cat1,,cat2,,cat3,,cat4,,cat5')
declare #separetor char(2) = ',,'
select
t.column1
, FI.FirstInstance
, SI.SecondInstance
, TI.ThirdInstance
, iif(TI.ThirdInstance is not null, substring(t.column1, SI.SecondInstance + 2, TI.ThirdInstance - SI.SecondInstance - 2), null)
from
#vtest t
cross apply (select nullif(charindex(#separetor, t.column1), 0) FirstInstance) FI
cross apply (select nullif(charindex(#separetor, t.column1, FI.FirstInstance + 2), 0) SecondInstance) SI
cross apply (select nullif(charindex(#separetor, t.column1, SI.SecondInstance + 2), 0) ThirdInstance) TI
For transparency, I saved the separator string in a variable.
By default the charindex returns 0 if the search string is not present, so I overwrite it with the value null, by using nullif
IMHO, SQL Server 2016 and its JSON support in the best option here.
SQL
-- DDL and sample data population, start
DECLARE #tbl TABLE (ID INT IDENTITY PRIMARY KEY, Tokens VARCHAR(500));
INSERT INTO #tbl VALUES
('cat1,,cat2,,cat3,,cat4,,cat5'),
('dogger1,,dogger2,,dogger3,,dogger4,,dogger5');
-- DDL and sample data population, end
WITH rs AS
(
SELECT *
, '["' + REPLACE(Tokens
, ',,', '","')
+ '"]' AS jsondata
FROM #tbl
)
SELECT rs.ID, rs.Tokens
, JSON_VALUE(jsondata, '$[2]') AS ThirdToken
FROM rs;
Output
+----+---------------------------------------------+------------+
| ID | Tokens | ThirdToken |
+----+---------------------------------------------+------------+
| 1 | cat1,,cat2,,cat3,,cat4,,cat5 | cat3 |
| 2 | dogger1,,dogger2,,dogger3,,dogger4,,dogger5 | dogger3 |
+----+---------------------------------------------+------------+
It´s the same as #"Yitzhak Khabinsky" but i think it looks clearer
WITH CTE_Data
AS(
SELECT 'cat1,,cat2,,cat3,,cat4,,cat5' AS [String]
UNION
SELECT 'dogger1,,dogger2,,dogger3,,dogger4,,dogger5' AS [String]
)
SELECT
A.[String]
,Value3 = JSON_VALUE('["'+ REPLACE(A.[String], ',,', '","') + '"]', '$[2]')
FROM CTE_Data AS A

Order Concatenated field

I have a field which is a concatenation of single letters. I am trying to order these strings within a view. These values can't be hard coded as there are too many. Is someone able to provide some guidance on the function to use to achieve the desired output below? I am using MSSQL.
Current output
CustID | Code
123 | BCA
Desired output
CustID | Code
123 | ABC
I have tried using a UDF
CREATE FUNCTION [dbo].[Alphaorder] (#str VARCHAR(50))
returns VARCHAR(50)
BEGIN
DECLARE #len INT,
#cnt INT =1,
#str1 VARCHAR(50)='',
#output VARCHAR(50)=''
SELECT #len = Len(#str)
WHILE #cnt <= #len
BEGIN
SELECT #str1 += Substring(#str, #cnt, 1) + ','
SET #cnt+=1
END
SELECT #str1 = LEFT(#str1, Len(#str1) - 1)
SELECT #output += Sp_data
FROM (SELECT Split.a.value('.', 'VARCHAR(100)') Sp_data
FROM (SELECT Cast ('<M>' + Replace(#str1, ',', '</M><M>') + '</M>' AS XML) AS Data) AS A
CROSS APPLY Data.nodes ('/M') AS Split(a)) A
ORDER BY Sp_data
RETURN #output
END
This works when calling one field
ie.
Select CustID, dbo.alphaorder(Code)
from dbo.source
where custid = 123
however when i try to apply this to top(10) i receive the error
"Invalid length parameter passed to the LEFT or SUBSTRING function."
Keeping in mind my source has ~4million records, is this still the best solution?
Unfortunately i am not able to normalize the data into a separate table with records for each Code.
This doesn't rely on a id column to join with itself, performance is almost as fast
as the answer by #Shnugo:
SELECT
CustID,
(
SELECT
chr
FROM
(SELECT TOP(LEN(Code))
SUBSTRING(Code,ROW_NUMBER() OVER(ORDER BY (SELECT NULL)),1)
FROM sys.messages) A(Chr)
ORDER by chr
FOR XML PATH(''), type).value('.', 'varchar(max)'
) As CODE
FROM
source t
First of all: Avoid loops...
You can try this:
DECLARE #tbl TABLE(ID INT IDENTITY, YourString VARCHAR(100));
INSERT INTO #tbl VALUES ('ABC')
,('JSKEzXO')
,('QKEvYUJMKRC');
--the cte will create a list of all your strings separated in single characters.
--You can check the output with a simple SELECT * FROM SeparatedCharacters instead of the actual SELECT
WITH SeparatedCharacters AS
(
SELECT *
FROM #tbl
CROSS APPLY
(SELECT TOP(LEN(YourString)) ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) FROM master..spt_values) A(Nmbr)
CROSS APPLY
(SELECT SUBSTRING(YourString,Nmbr,1))B(Chr)
)
SELECT ID,YourString
,(
SELECT Chr As [*]
FROM SeparatedCharacters sc1
WHERE sc1.ID=t.ID
ORDER BY sc1.Chr
FOR XML PATH(''),TYPE
).value('.','nvarchar(max)') AS Sorted
FROM #tbl t;
The result
ID YourString Sorted
1 ABC ABC
2 JSKEzXO EJKOSXz
3 QKEvYUJMKRC CEJKKMQRUvY
The idea in short
The trick is the first CROSS APPLY. This will create a tally on-the-fly. You will get a resultset with numbers from 1 to n where n is the length of the current string.
The second apply uses this number to get each character one-by-one using SUBSTRING().
The outer SELECT calls from the orginal table, which means one-row-per-ID and use a correalted sub-query to fetch all related characters. They will be sorted and re-concatenated using FOR XML. You might add DISTINCT in order to avoid repeating characters.
That's it :-)
Hint: SQL-Server 2017+
With version v2017 there's the new function STRING_AGG(). This would make the re-concatenation very easy:
WITH SeparatedCharacters AS
(
SELECT *
FROM #tbl
CROSS APPLY
(SELECT TOP(LEN(YourString)) ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) FROM master..spt_values) A(Nmbr)
CROSS APPLY
(SELECT SUBSTRING(YourString,Nmbr,1))B(Chr)
)
SELECT ID,YourString
,STRING_AGG(sc.Chr,'') WITHIN GROUP(ORDER BY sc.Chr) AS Sorted
FROM SeparatedCharacters sc
GROUP BY ID,YourString;
Considering your table having good amount of rows (~4 Million), I would suggest you to create a persisted calculated field in the table, to store these values. As calculating these values at run time in a view, will lead to performance problems.
If you are not able to normalize, add this as a denormalized column to the existing table.
I think the error you are getting could be due to empty codes.
If LEN(#str) = 0
BEGIN
SET #output = ''
END
ELSE
BEGIN
... EXISTING CODE BLOCK ...
END
I can suggest to split string into its characters using referred SQL function.
Then you can concatenate string back, this time ordered alphabetically.
Are you using SQL Server 2017? Because with SQL Server 2017, you can use SQL String_Agg string aggregation function to concatenate characters splitted in an ordered way as follows
select
t.CustId, string_agg(strval, '') within GROUP (order by strval)
from CharacterTable t
cross apply dbo.SPLIT(t.code) s
where strval is not null
group by CustId
order by CustId
If you are not working on SQL2017, then you can follow below structure using SQL XML PATH for concatenation in SQL
select
CustId,
STUFF(
(
SELECT
'' + strval
from CharacterTable ct
cross apply dbo.SPLIT(t.code) s
where strval is not null
and t.CustId = ct.CustId
order by strval
FOR XML PATH('')
), 1, 0, ''
) As concatenated_string
from CharacterTable t
order by CustId

how to extract a particular id from the string using sql

I want to extract a particular ids from the records in a table.For example i have a below table
Id stringvalue
1 test (ID 123) where another ID 2596
2 next ID145 and the condition I(ID 635,897,900)
I want the result set as below
ID SV
1 123,2596
2 145,635,897,900
i have tried the below query which extracts only one ID from the string:
Select Left(substring(string,PATINDEX('%[0-9]%',string),Len(string)),3) from Table1
I seriously don't encourage the T-SQL approach (as SQL is not meant to do this), however, a working version is presented below -
Try this
DECLARE #T TABLE(ID INT IDENTITY,StringValue VARCHAR(500))
INSERT INTO #T
SELECT 'test (ID 123) where another ID 2596' UNION ALL
SELECT 'next ID145 and the condition I(ID 635,897,900)'
;WITH SplitCTE AS(
SELECT
F1.ID,
X.SplitData
,Position = PATINDEX('%[0-9]%', X.SplitData)
FROM (
SELECT *,
CAST('<X>'+REPLACE(REPLACE(StringValue,' ',','),',','</X><X>')+'</X>' AS XML) AS XmlFilter
FROM #T F
)F1
CROSS APPLY
(
SELECT fdata.D.value('.','varchar(50)') AS SplitData
FROM f1.xmlfilter.nodes('X') AS fdata(D)) X
WHERE PATINDEX('%[0-9]%', X.SplitData) > 0),
numericCTE AS(
SELECT
ID
,AllNumeric = LEFT(SUBSTRING(SplitData, Position, LEN(SplitData)), PATINDEX('%[^0-9]%', SUBSTRING(SplitData, Position, LEN(SplitData)) + 't') - 1)
FROM SplitCTE
)
SELECT
ID
,STUFF(( SELECT ',' + c1.AllNumeric
FROM numericCTE c1
WHERE c1.ID = c2.ID
FOR XML PATH(''),TYPE)
.value('.','NVARCHAR(MAX)'),1,1,'') AS SV
FROM numericCTE c2
GROUP BY ID
/*
Result
ID SV
1 123,2596
2 145,635,897,900
*/
However, I completely agree with #Giorgi Nakeuri. It is better to use some programming language (if you have that at your disposal) and use regular expression for the same. You can figure out that, I have used REPLACE function two times, first to replace the blank space and second to replace the commas(,).
Hope you will get some idea to move on.

replace value in varchar(max) field with join

I have a table that contains text field with placeholders. Something like this:
Row Notes
1. This is some notes ##placeholder130## this ##myPlaceholder##, #oneMore#. End.
2. Second row...just a ##test#.
(This table contains about 1-5k rows on average. Average number of placeholders in one row is 5-15).
Now, I have a lookup table that looks like this:
Name Value
placeholder130 Dog
myPlaceholder Cat
oneMore Cow
test Horse
(Lookup table will contain anywhere from 10k to 100k records)
I need to find the fastest way to join those placeholders from strings to a lookup table and replace with value. So, my result should look like this (1st row):
This is some notes Dog this Cat, Cow. End.
What I came up with was to split each row into multiple for each placeholder and then join it to lookup table and then concat records back to original row with new values, but it takes around 10-30 seconds on average.
You could try to split the string using a numbers table and rebuild it with for xml path.
select (
select coalesce(L.Value, T.Value)
from Numbers as N
cross apply (select substring(Notes.notes, N.Number, charindex('##', Notes.notes + '##', N.Number) - N.Number)) as T(Value)
left outer join Lookup as L
on L.Name = T.Value
where N.Number <= len(notes) and
substring('##' + notes, Number, 2) = '##'
order by N.Number
for xml path(''), type
).value('text()[1]', 'varchar(max)')
from Notes
SQL Fiddle
I borrowed the string splitting from this blog post by Aaron Bertrand
SQL Server is not very fast with string manipulation, so this is probably best done client-side. Have the client load the entire lookup table, and replace the notes as they arrived.
Having said that, it can of course be done in SQL. Here's a solution with a recursive CTE. It performs one lookup per recursion step:
; with Repl as
(
select row_number() over (order by l.name) rn
, Name
, Value
from Lookup l
)
, Recurse as
(
select Notes
, 0 as rn
from Notes
union all
select replace(Notes, '##' + l.name + '##', l.value)
, r.rn + 1
from Recurse r
join Repl l
on l.rn = r.rn + 1
)
select *
from Recurse
where rn =
(
select count(*)
from Lookup
)
option (maxrecursion 0)
Example at SQL Fiddle.
Another option is a while loop to keep replacing lookups until no more are found:
declare #notes table (notes varchar(max))
insert #notes
select Notes
from Notes
while 1=1
begin
update n
set Notes = replace(n.Notes, '##' + l.name + '##', l.value)
from #notes n
outer apply
(
select top 1 Name
, Value
from Lookup l
where n.Notes like '%##' + l.name + '##%'
) l
where l.name is not null
if ##rowcount = 0
break
end
select *
from #notes
Example at SQL Fiddle.
I second the comment that tsql is just not suited for this operation, but if you must do it in the db here is an example using a function to manage the multiple replace statements.
Since you have a relatively small number of tokens in each note (5-15) and a very large number of tokens (10k-100k) my function first extracts tokens from the input as potential tokens and uses that set to join to your lookup (dbo.Token below). It was far too much work to look for an occurrence of any of your tokens in each note.
I did a bit of perf testing using 50k tokens and 5k notes and this function runs really well, completing in <2 seconds (on my laptop). Please report back how this strategy performs for you.
note: In your example data the token format was not consistent (##_#, ##_##, #_#), I am guessing this was simply a typo and assume all tokens take the form of ##TokenName##.
--setup
if object_id('dbo.[Lookup]') is not null
drop table dbo.[Lookup];
go
if object_id('dbo.fn_ReplaceLookups') is not null
drop function dbo.fn_ReplaceLookups;
go
create table dbo.[Lookup] (LookupName varchar(100) primary key, LookupValue varchar(100));
insert into dbo.[Lookup]
select '##placeholder130##','Dog' union all
select '##myPlaceholder##','Cat' union all
select '##oneMore##','Cow' union all
select '##test##','Horse';
go
create function [dbo].[fn_ReplaceLookups](#input varchar(max))
returns varchar(max)
as
begin
declare #xml xml;
select #xml = cast(('<r><i>'+replace(#input,'##' ,'</i><i>')+'</i></r>') as xml);
--extract the potential tokens
declare #LookupsInString table (LookupName varchar(100) primary key);
insert into #LookupsInString
select distinct '##'+v+'##'
from ( select [v] = r.n.value('(./text())[1]', 'varchar(100)'),
[r] = row_number() over (order by n)
from #xml.nodes('r/i') r(n)
)d(v,r)
where r%2=0;
--tokenize the input
select #input = replace(#input, l.LookupName, l.LookupValue)
from dbo.[Lookup] l
join #LookupsInString lis on
l.LookupName = lis.LookupName;
return #input;
end
go
return
--usage
declare #Notes table ([Id] int primary key, notes varchar(100));
insert into #Notes
select 1, 'This is some notes ##placeholder130## this ##myPlaceholder##, ##oneMore##. End.' union all
select 2, 'Second row...just a ##test##.';
select *,
dbo.fn_ReplaceLookups(notes)
from #Notes;
Returns:
Tokenized
--------------------------------------------------------
This is some notes Dog this Cat, Cow. End.
Second row...just a Horse.
Try this
;WITH CTE (org, calc, [Notes], [level]) AS
(
SELECT [Notes], [Notes], CONVERT(varchar(MAX),[Notes]), 0 FROM PlaceholderTable
UNION ALL
SELECT CTE.org, CTE.[Notes],
CONVERT(varchar(MAX), REPLACE(CTE.[Notes],'##' + T.[Name] + '##', T.[Value])), CTE.[level] + 1
FROM CTE
INNER JOIN LookupTable T ON CTE.[Notes] LIKE '%##' + T.[Name] + '##%'
)
SELECT DISTINCT org, [Notes], level FROM CTE
WHERE [level] = (SELECT MAX(level) FROM CTE c WHERE CTE.org = c.org)
SQL FIDDLE DEMO
Check the below devioblog post for reference
devioblog post
To get speed, you can preprocess the note templates into a more efficient form. This will be a sequence of fragments, with each ending in a substitution. The substitution might be NULL for the last fragment.
Notes
Id FragSeq Text SubsId
1 1 'This is some notes ' 1
1 2 ' this ' 2
1 3 ', ' 3
1 4 '. End.' null
2 1 'Second row...just a ' 4
2 2 '.' null
Subs
Id Name Value
1 'placeholder130' 'Dog'
2 'myPlaceholder' 'Cat'
3 'oneMore' 'Cow'
4 'test' 'Horse'
Now we can do the substitutions with a simple join.
SELECT Notes.Text + COALESCE(Subs.Value, '')
FROM Notes LEFT JOIN Subs
ON SubsId = Subs.Id WHERE Notes.Id = ?
ORDER BY FragSeq
This produces a list of fragments with substitutions complete. I am not an MSQL user, but in most dialects of SQL you can concatenate these fragments in a variable quite easily:
DECLARE #Note VARCHAR(8000)
SELECT #Note = COALESCE(#Note, '') + Notes.Text + COALSCE(Subs.Value, '')
FROM Notes LEFT JOIN Subs
ON SubsId = Subs.Id WHERE Notes.Id = ?
ORDER BY FragSeq
Pre-processing a note template into fragments will be straightforward using the string splitting techniques of other posts.
Unfortunately I'm not at a location where I can test this, but it ought to work fine.
I really don't know how it will perform with 10k+ of lookups.
how does the old dynamic SQL performs?
DECLARE #sqlCommand NVARCHAR(MAX)
SELECT #sqlCommand = N'PlaceholderTable.[Notes]'
SELECT #sqlCommand = 'REPLACE( ' + #sqlCommand +
', ''##' + LookupTable.[Name] + '##'', ''' +
LookupTable.[Value] + ''')'
FROM LookupTable
SELECT #sqlCommand = 'SELECT *, ' + #sqlCommand + ' FROM PlaceholderTable'
EXECUTE sp_executesql #sqlCommand
Fiddle demo
And now for some recursive CTE.
If your indexes are correctly set up, this one should be very fast or very slow. SQL Server always surprises me with performance extremes when it comes to the r-CTE...
;WITH T AS (
SELECT
Row,
StartIdx = 1, -- 1 as first starting index
EndIdx = CAST(patindex('%##%', Notes) as int), -- first ending index
Result = substring(Notes, 1, patindex('%##%', Notes) - 1)
-- (first) temp result bounded by indexes
FROM PlaceholderTable -- **this is your source table**
UNION ALL
SELECT
pt.Row,
StartIdx = newstartidx, -- starting index (calculated in calc1)
EndIdx = EndIdx + CAST(newendidx as int) + 1, -- ending index (calculated in calc4 + total offset)
Result = Result + CAST(ISNULL(newtokensub, newtoken) as nvarchar(max))
-- temp result taken from subquery or original
FROM
T
JOIN PlaceholderTable pt -- **this is your source table**
ON pt.Row = T.Row
CROSS APPLY(
SELECT newstartidx = EndIdx + 2 -- new starting index moved by 2 from last end ('##')
) calc1
CROSS APPLY(
SELECT newtxt = substring(pt.Notes, newstartidx, len(pt.Notes))
-- current piece of txt we work on
) calc2
CROSS APPLY(
SELECT patidx = patindex('%##%', newtxt) -- current index of '##'
) calc3
CROSS APPLY(
SELECT newendidx = CASE
WHEN patidx = 0 THEN len(newtxt) + 1
ELSE patidx END -- if last piece of txt, end with its length
) calc4
CROSS APPLY(
SELECT newtoken = substring(pt.Notes, newstartidx, newendidx - 1)
-- get the new token
) calc5
OUTER APPLY(
SELECT newtokensub = Value
FROM LookupTable
WHERE Name = newtoken -- substitute the token if you can find it in **your lookup table**
) calc6
WHERE newstartidx + len(newtxt) - 1 <= len(pt.Notes)
-- do this while {new starting index} + {length of txt we work on} exceeds total length
)
,lastProcessed AS (
SELECT
Row,
Result,
rn = row_number() over(partition by Row order by StartIdx desc)
FROM T
) -- enumerate all (including intermediate) results
SELECT *
FROM lastProcessed
WHERE rn = 1 -- filter out intermediate results (display only last ones)