Split data element based on delimiter

I have a database called Property with a table called Location. The data looks like this:
RecordID Location
-----------------------
1 1/21/s15
2 8/1/21c59
3 1//
4 9//72
I have a script that reads records from the table and inserts them into a second table called ExpandedLocation.
This is the code of my script:
INSERT INTO [Property].[dbo].[ExpandedLocation] (LocationA, LocationB, LocationC)
SELECT
dbo.fnBuildABC(Location, 1),
dbo.fnBuildABC(Location, 2),
dbo.fnBuildABC(Location, 3)
FROM
[Property].[dbo].[Location]
This code should call the function fnBuildABC and pass it 2 parameters, Location and a number. The function should take in the parameters and split the first parameter on the slash and return either the 1st, 2nd, or 3rd portion of the passed string.
So, for example, on the first read of the Location table, I pick up the value 1/21/s15.
The function should return the following:
Parameter Value Returned Value
---------------------------------
Location, 1 1
Location, 2 21
Location, 3 s15
On the second read of the Location table, I pick up the value 8/1/21c59. The function should return the following:
Parameter Value Returned Value
-----------------------------------
Location, 1 8
Location, 2 1
Location, 3 21c59
I'm at a loss as to how to split the passed string in the function without actually inspecting each character of the string one at a time.
Any suggestions on how to start this process would be greatly appreciated. Thank you.

I would build a function that splits your string and returns 3 columns as a table.
With just 3 columns you can comfortably do that with a combination of SQL Server's string functions.
An example of a function would be:
create or alter function fnBuildABC(@location varchar(10))
returns table
as
return
select
Left(location, p1.v - 1) A,
Substring(location, p1.v + 1, p2.v - p1.v - 1) B,
Stuff(location, 1, p2.v, '') C
from (select location = @location)l
cross apply(values(CharIndex('/', location)))p1(v)
cross apply(values(CharIndex('/', location, p1.v + 1)))p2(v);
And then you can use it with your Location table:
insert into Property.dbo.ExpandedLocation (LocationA, LocationB, LocationC)
select A, B, C
from dbo.Location
cross apply fnBuildABC(Location);
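To try the same CHARINDEX logic without creating any objects, here is a minimal sketch that runs it over the four sample values from the question. The inline VALUES list and the aliases l, p1 and p2 are only there for illustration.

```sql
-- Sample values copied from the question; no tables or functions are needed.
select l.Location,
       left(l.Location, p1.v - 1)                       as A, -- text before the first slash
       substring(l.Location, p1.v + 1, p2.v - p1.v - 1) as B, -- text between the two slashes
       stuff(l.Location, 1, p2.v, '')                   as C  -- text after the second slash
from (values ('1/21/s15'), ('8/1/21c59'), ('1//'), ('9//72')) as l(Location)
cross apply (values (charindex('/', l.Location))) as p1(v)            -- position of the first slash
cross apply (values (charindex('/', l.Location, p1.v + 1))) as p2(v); -- position of the second slash
```

This assumes every value contains at least two slashes, as in the sample data. With fewer, CHARINDEX returns 0 and LEFT or SUBSTRING is handed a negative length, which raises an error.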

You could also use an XML-based split, as follows:
CREATE FUNCTION fnBuildABC( @loc VARCHAR(MAX))
RETURNS @splitted TABLE
(
[Parameter Value] INT,
[Returned Value] VARCHAR(50)
)
AS
BEGIN
DECLARE @xml xml;
SET @xml = N'<root><p>' + replace(@loc, '/','</p><p>') + '</p></root>';
INSERT INTO @splitted
SELECT ROW_NUMBER() OVER (ORDER BY l_pos) AS pos,
l_pos.value('.', 'VARCHAR(50)') AS val
FROM
@xml.nodes('//root/p') AS Portions(l_pos)
RETURN;
END;
To get all portions of a location, call the function as follows:
SELECT CONCAT('Location: ', [Parameter Value]) AS [Parameter Value], [Returned Value]
FROM
(
SELECT L.RecordID, P.* FROM ExpandedLocation L
OUTER APPLY
fnBuildABC(Location) P
) T
WHERE RecordID = 1;
To get specific portions, e.g. 1 and 2:
SELECT CONCAT('Location: ', [Parameter Value]) AS [Parameter Value], [Returned Value]
FROM
(
SELECT L.RecordID, P.* FROM ExpandedLocation L
OUTER APPLY
fnBuildABC(Location) P
) T
WHERE RecordID = 1 AND [Parameter Value] IN (1,2);

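SQL Server 2016 and later (database compatibility level 130 or higher) has a built-in STRING_SPLIT function that you can probably use. It returns a table of split values, although the order of the returned rows is not guaranteed.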
SELECT value FROM STRING_SPLIT('1/21/s15', '/')
Example output:
Value
1
21
s15
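Since the question needs the 1st, 2nd and 3rd portion specifically, the position of each piece matters. A minimal sketch, assuming SQL Server 2022 or Azure SQL Database, where STRING_SPLIT accepts a third enable_ordinal argument and returns an ordinal column. The output column names are taken from the question's ExpandedLocation table.

```sql
-- Requires SQL Server 2022 or Azure SQL Database for the third argument.
select max(case when s.ordinal = 1 then s.value end) as LocationA,
       max(case when s.ordinal = 2 then s.value end) as LocationB,
       max(case when s.ordinal = 3 then s.value end) as LocationC
from string_split('1/21/s15', '/', 1) as s; -- 1 = return the ordinal column
```

On older versions the ordinal column is not available, so one of the CHARINDEX or XML approaches is the safer choice when the position is needed.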

Related

SQL Server: Select rows with multiple occurrences of regex match in a column

I’m fairly used to using MySQL, but not particularly familiar with SQL Server. Tough luck, the database I’m dealing with here is on SQL Server 2014.
I have a table with a column whose values are all integers with leading, separating, and trailing semicolons, like these three fictitious rows:
;905;1493;384;13387;29;933;467;28732;
;905;138;3084;1387;290;9353;4767;2732;
;9085;14493;3864;130387;289;933;4767;28732;
What I am trying to do now is to select all rows where more than one number taken from a list of numbers appears in this column. So for example, given the three rows above, if I have the group 905,467,4767, the statement I’m trying to figure out how to construct should return the first two rows: the first row contains 905 and 467; the second row contains 905 and 4767. The third row contains only 4767, so that row should not be returned.
As far as I can tell, SQL Server does not actually support regex directly (and I don’t even know what managed code is), which doesn’t help. Even with regex, I wouldn’t know where to begin. Oracle seems to have a function that would be very useful, but that’s Oracle.
Most similar questions on here deal with finding multiple instances of the same character (usually singular) and solve the problem by replacing the string to match with nothing and counting the difference in length. I suppose that would technically work here, too, but given a ‘filter’ group of 15 numbers, the SELECT statement would become ridiculously long and convoluted and utterly unreadable. Additionally, I only want to match entire numbers (so if one of the numbers to match is 29, the value 29 would match in the first row, but the value 290 in the second row should not match), which means I’d have to include the semicolons in the REPLACE clause and then discount them when calculating the length. A complete mess.
What I would ideally like to do is something like this:
SELECT * FROM table WHERE REGEXP_COUNT(column, ';(905|467|4767);') > 1
– but that will obviously not work, for all kinds of reasons (the most obvious one being the nonexistence of REGEXP_COUNT outside Oracle).
Is there some sane, manageable way of doing this?

You can do
SELECT *
FROM Mess
CROSS APPLY (SELECT COUNT(*)
FROM (VALUES (905),
(467),
(4767)) V(Num)
WHERE Col LIKE CONCAT('%;', Num, ';%')) ca(count)
WHERE count > 1
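The semicolons in the LIKE pattern are what restrict the match to whole numbers. A quick sketch using two of the rows from the question and the value 29 mentioned there (the alias Has29 is made up for the example):

```sql
-- Row 1 contains ;29; and row 2 contains ;290; (values copied from the question).
select v.Col,
       case when v.Col like '%;29;%' then 1 else 0 end as Has29
from (values (';905;1493;384;13387;29;933;467;28732;'),
             (';905;138;3084;1387;290;9353;4767;2732;')) as v(Col);
```

Only the first row is flagged, because in the second row the 29 is followed by a 0 rather than a semicolon.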
Or alternatively
WITH Nums
AS (SELECT Num
FROM (VALUES (905),
(467),
(4767)) V(Num))
SELECT Mess.*
FROM Mess
CROSS APPLY (VALUES(CAST(CONCAT('<x>', REPLACE(Col, ';', '</x><x>'), '</x>') AS XML))) x(x)
CROSS APPLY (SELECT COUNT(*)
FROM (SELECT n.value('.', 'int')
FROM x.x.nodes('/x') n(n)
WHERE n.value('.', 'varchar') <> ''
INTERSECT
SELECT Num
FROM Nums) T(count)
HAVING COUNT(*) > 1) ca2(count)

Could you put your arguments into a table (perhaps using a table-valued function accepting a string (of comma-separated integers) as a parameter) and use something like this?
DECLARE @T table (String varchar(255))
INSERT INTO @T
VALUES
(';905;1493;384;13387;29;933;467;28732;')
, (';905;138;3084;1387;290;9353;4767;2732;')
, (';9085;14493;3864;130387;289;933;4767;28732;')
DECLARE @Arguments table (Arg int)
INSERT INTO @Arguments
VALUES
(905)
, (467)
, (4767)
SELECT String
FROM
@T
CROSS JOIN @Arguments
GROUP BY String
HAVING SUM(CASE WHEN PATINDEX('%;' + CAST(Arg AS varchar) + ';%', String) > 0 THEN 1 ELSE 0 END) > 1
An example of using this with a function to generate the arguments:
CREATE FUNCTION GenerateArguments (@Integers varchar(255))
RETURNS @Arguments table (Arg int)
AS
BEGIN
WITH cte
AS
(
SELECT
PATINDEX('%,%', @Integers) p
, LEFT(@Integers, PATINDEX('%,%', @Integers) - 1) n
UNION ALL
SELECT
CASE WHEN PATINDEX('%,%', SUBSTRING(@Integers, p + 1, LEN(@Integers))) + p = p THEN 0 ELSE PATINDEX('%,%', SUBSTRING(@Integers, p + 1, LEN(@Integers))) + p END
, CASE WHEN PATINDEX('%,%', SUBSTRING(@Integers, p + 1, LEN(@Integers))) = 0 THEN RIGHT(@Integers, PATINDEX('%,%', REVERSE(@Integers)) - 1) ELSE LEFT(SUBSTRING(@Integers, p + 1, LEN(@Integers)), PATINDEX('%,%', SUBSTRING(@Integers, p + 1, LEN(@Integers))) - 1) END
FROM cte
WHERE p <> 0
)
INSERT INTO @Arguments (Arg)
SELECT n
FROM cte
RETURN
END
GO
DECLARE @T table (String varchar(255))
INSERT INTO @T
VALUES
(';905;1493;384;13387;29;933;467;28732;')
, (';905;138;3084;1387;290;9353;4767;2732;')
, (';9085;14493;3864;130387;289;933;4767;28732;')
;
SELECT String
FROM
@T
CROSS JOIN GenerateArguments('905,467,4767')
GROUP BY String
HAVING SUM(CASE WHEN PATINDEX('%;' + CAST(Arg AS varchar) + ';%', String) > 0 THEN 1 ELSE 0 END) > 1

You can achieve this using the LIKE operator for the pattern match and row_number to determine the number of matches.
Here we declare the column values for testing:
DECLARE @tbl TABLE (
string NVARCHAR(MAX)
)
INSERT @tbl VALUES
(';905;1493;384;13387;29;933;467;28732;'),
(';905;138;3084;1387;290;9353;4767;2732;'),
(';9085;14493;3864;130387;289;933;4767;28732;')
Then we pass your search parameters into a table variable to be joined on:
DECLARE @search_tbl TABLE (
search_value INT
)
INSERT @search_tbl VALUES
(905),
(467),
(4767)
Finally we join the table containing the column to search onto the search table. We apply the row_number function to count the matches per row, and select from this subquery where row_number = 2, meaning that the row matched at least twice.
SELECT
string
FROM (
SELECT
tbl.string,
ROW_NUMBER() OVER (PARTITION BY tbl.string ORDER BY tbl.string) AS rn
FROM @tbl tbl
JOIN @search_tbl search_tbl ON
tbl.string LIKE '%;' + CAST(search_tbl.search_value AS NVARCHAR(MAX)) + ';%'
) tbl
WHERE rn = 2

You could build a where clause like this :
WHERE
case when column like '%;905;%' then 1 else 0 end +
case when column like '%;467;%' then 1 else 0 end +
case when column like '%;4767;%' then 1 else 0 end >= 2
The advantage is that you do not need a helper table. I don't know how you build the query, but the following also works, and is useful if the number is held in a T-SQL variable (here @n is assumed to be a varchar; cast it first if it is an int).
case when column like ('%;' + @n + ';%') then 1 else 0 end

Replace all numbers with three digits or more

I have a field say "keywords" which contains random strings of numbers and I'd like to clean the field from any string of numbers which has more than 3 digits.
I have searched and know wildcards are not possible in replace. Any idea how I can go about that?

Here's a good place to start
Say you have a table called "test_test":
create table dbo.test_test (thisStuff varchar(100));
With a value like this in it:
insert into test_test values ('Hello123 this is 12 a test 22983o398r57298298347238');
You can do some limited pattern matching with patindex():
select substring(thisStuff,
1,
patindex('%[0-9][0-9][0-9]%',thisStuff)-1) +
substring(thisStuff,
patindex('%[0-9][0-9][0-9]%',thisStuff)+3,
len(thisStuff))
from test_test
Which converts this value:
Hello123 this is 12 a test 22983o398r57298298347238
Into this value:
Hello this is 12 a test 22983o398r57298298347238
In update form it would look like this:
update test_test set thisStuff =
substring(thisStuff,
1,
patindex('%[0-9][0-9][0-9]%',thisStuff)-1) +
substring(thisStuff,
patindex('%[0-9][0-9][0-9]%',thisStuff)+3,
len(thisStuff));
Which, when run over and over, gives you the progressive values:
Hello this is 12 a test 83o398r57298298347238
Hello this is 12 a test 83or57298298347238
Hello this is 12 a test 83or98298347238
Hello this is 12 a test 83or98347238
Hello this is 12 a test 83or47238
Hello this is 12 a test 83or38
Before erroring out
Msg 3621, Level 0, State 0.
The statement has been terminated.
Msg 537, Level 16, State 2.
Invalid length parameter passed to the LEFT or SUBSTRING function. (Line 35)
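The error appears because patindex() returns 0 once no run of three digits is left, which makes the length passed to substring() negative. One way around that, sketched here under the assumption that the dbo.test_test table above exists, is to update only the rows that still match and to stop when no row was changed. stuff() is used to delete the three characters, which does the same job as the two substring() calls.

```sql
-- Assumes the dbo.test_test table and thisStuff column from the example above.
declare @rows int = 1;

while @rows > 0
begin
    -- Remove the first run of three digits, but only in rows that still have one.
    update dbo.test_test
    set thisStuff = stuff(thisStuff, patindex('%[0-9][0-9][0-9]%', thisStuff), 3, '')
    where patindex('%[0-9][0-9][0-9]%', thisStuff) > 0;

    -- Stop once an update pass changed nothing.
    set @rows = @@rowcount;
end;
```

Like the original statement, this removes digits three at a time, so a longer run can leave one or two digits behind.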

Since you are on 2016, you can use String_Split() in concert with Try_Convert()
Example
Declare @YourTable table (idproduct int,searchkeywords varchar(500))
Insert Into @YourTable values
(109070,'stands & cabinets kantec ams300 1010055 43212002 03906786808 7503 ktkams ltk ams 300')
Select A.idproduct
,NewString = B.S
From @YourTable A
Cross Apply (
Select S = Stuff((Select ' ' +Value
From (Select Value,Seq=Row_Number() over (Order by (select null))
From String_Split(A.searchkeywords,' ')
) B1
Where (try_convert(float,Value) is null)
or (try_convert(float,Value) is not null and len(Value)<=3)
Order by Seq
For XML Path ('')),1,1,'')
) B
Returns
idproduct NewString
109070 stands & cabinets kantec ams300 ktkams ltk ams 300
If you are satisfied with the results, you can apply an update like so:
Update A Set searchkeywords = B.S
From @YourTable A
Cross Apply (
Select S = Stuff((Select ' ' +Value
From (Select Value,Seq=Row_Number() over (Order by (select null))
From String_Split(A.searchkeywords,' ')
) B1
Where (try_convert(float,Value) is null)
or (try_convert(float,Value) is not null and len(Value)<=3)
Order by Seq
For XML Path ('')),1,1,'')
) B

Select rows using in with comma-separated string parameter

I'm converting a stored procedure from MySql to SQL Server. The procedure has one input parameter nvarchar/varchar which is a comma-separated string, e.g.
'1,2,5,456,454,343,3464'
I need to write a query that will retrieve the relevant rows, in MySql I'm using FIND_IN_SET and I wonder what the equivalent is in SQL Server.
I also need to order the ids as in the string.
The original query is:
SELECT *
FROM table_name t
WHERE FIND_IN_SET(id,p_ids)
ORDER BY FIND_IN_SET(id,p_ids);

The equivalent is LIKE for the WHERE clause and then CHARINDEX() for the ORDER BY:
select *
from table_name t
where ','+p_ids+',' like '%,'+cast(id as varchar(255))+',%'
order by charindex(',' + cast(id as varchar(255)) + ',', ',' + p_ids + ',');
Well, you could use charindex() for both, but the like will work in most databases.
Note that I've added delimiters to the beginning and end of the string, so 464 will not accidentally match 3464.
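To see why the extra delimiters matter, here is a small sketch comparing a naive LIKE with the delimited one for the ids 464 and 3464. The variable @p_ids stands in for the procedure parameter, and the column aliases are made up for the example.

```sql
-- @p_ids stands in for the stored procedure parameter from the question.
declare @p_ids varchar(100) = '1,2,5,456,454,343,3464';

select v.id,
       -- Naive match: 464 is found inside 3464.
       case when @p_ids like '%' + cast(v.id as varchar(10)) + '%'
            then 1 else 0 end as NaiveMatch,
       -- Delimited match: only whole list entries count.
       case when ',' + @p_ids + ',' like '%,' + cast(v.id as varchar(10)) + ',%'
            then 1 else 0 end as DelimitedMatch
from (values (464), (3464)) as v(id);
```

For 464 the naive column is 1 and the delimited column is 0. For 3464 both are 1.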

You would need to write a FIND_IN_SET function yourself, as it does not exist in SQL Server. The closest mechanism I can think of to convert a delimited string into a joinable object would be to create a table-valued function and use the result in a standard IN clause. It would need to be similar to:
DECLARE @MyParam NVARCHAR(3000)
SET @MyParam='1,2,5,456,454,343,3464'
SELECT
*
FROM
MyTable
WHERE
MyTableID IN (SELECT ID FROM dbo.MySplitDelimitedString(@MyParam,','))
And you would need to create a MySplitDelimitedString type table-valued function that would split a string and return a TABLE (ID INT) object.
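A minimal sketch of what such a function could look like. Only the name dbo.MySplitDelimitedString and the ID column come from the answer above. The parameter names and the loop are assumptions, and it expects a well-formed list of integers with no empty entries.

```sql
-- Hypothetical implementation; only the function name and the ID column come from the answer.
create function dbo.MySplitDelimitedString (@list nvarchar(3000), @delimiter nchar(1))
returns @ids table (ID int)
as
begin
    declare @pos int = charindex(@delimiter, @list);

    -- Peel off one entry per iteration until no delimiter is left.
    while @pos > 0
    begin
        insert into @ids (ID) values (cast(left(@list, @pos - 1) as int));
        set @list = substring(@list, @pos + 1, len(@list));
        set @pos = charindex(@delimiter, @list);
    end;

    -- Whatever remains is the last entry.
    if len(@list) > 0
        insert into @ids (ID) values (cast(@list as int));

    return;
end;
```

A loop like this is easy to read but not the fastest option for long lists. On SQL Server 2016 and later, the built-in STRING_SPLIT could replace it.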

A set-based solution that splits the ids into ints and joins them to the base table, which can make use of an index on the base table id. I assumed the id would be an int; otherwise just remove the cast.
declare @ids nvarchar(100) = N'1,2,5,456,454,343,3464';
with nums as ( -- Generate numbers
select top (len(@ids)) row_number() over (order by (select 0)) n
from sys.messages
)
, pos1 as ( -- Get comma positions
select c.ci
from nums n
cross apply (select charindex(',', @ids, n.n) as ci) c
group by c.ci
)
, pos2 as ( -- Distinct positions plus start and end
select ci
from pos1
union select 0
union select len(@ids) + 1
)
, pos3 as ( -- add row number for join
select ci, row_number() over (order by ci) as r
from pos2
)
, ids as ( -- id's and row id for ordering
select cast(substring(@ids, p1.ci + 1, p2.ci - p1.ci - 1) as int) id, row_number() over (order by p1.ci) r
from pos3 p1
inner join pos3 p2 on p2.r = p1.r + 1
)
select *
from ids i
inner join table_name t on t.id = i.id
order by i.r;

You can also use a regex to get the input values from the comma-separated string. Note that this is Oracle syntax (regexp_substr, dual, connect by) and will not run on SQL Server:
select * from table_name where id in (
select regexp_substr(p_ids,'[^,]+', 1, level) from dual
connect by regexp_substr(p_ids, '[^,]+', 1, level) is not null );

SQL Server Function OR Procedure to cut records form table by parameters from other table

I have two tables that I need to compare. The first table has a single column holding the full URL. The second table has two columns: the first is the URL category (the base URL) and the second is the number of the / before which the URL in the first table should be cut.
The first table is:
URL
http://10.6.2.26/ERP/HRServices/WorkflowService.asmx
http://195.170.180.170/SADAD/PaymentNotificationService.asmx
http://10.6.2.26/ERP/HRServices/WorkflowService.asmx
http://10.6.2.26/ERP/HRServices/WorkflowService.asmx
http://10.6.2.26/ERP/HRServices/WorkflowService.asmx
http://217.146.8.6/din.aspx?s=11575802&client=DynGate&p=10002926
http://195.170.180.170/SADAD/PaymentNotificationService.asmx
http://10.6.2.26/ERP/HRServices/WorkflowService.asmx
http://195.170.180.170/SADAD/PaymentNotificationService.asmx
http://www.google.com/
The second table, which it should be compared with:
URL CUT_BEFORE
http://10.6.2.26 3
http://217.146.8.6 1
http://195.170.180.170 2
I need to use the second table to cut the URLs in the first table, so that the result looks like this:
URL
http://10.6.2.26/ERP/HRServices
http://195.170.180.170/SADAD
http://10.6.2.26/ERP/HRServices
http://10.6.2.26/ERP/HRServices
http://10.6.2.26/ERP/HRServices
http://217.146.8.6
http://195.170.180.170/SADAD
http://10.6.2.26/ERP/HRServices
http://195.170.180.170/SADAD
http://www.google.com/
What would a function to do this look like in SQL Server?
Or can it be done in a stored procedure with a WHILE loop? When I tried the function from the last answer below, I used this query:
declare @table table
( main_url NVARCHAR(MAX),URL NVARCHAR(MAX), count int)
insert @table
select
Main_URL,T2.Url,T2.[Count]
from
(select
URL as Main_URL,LEFT(URL1, CHARINDEX('/', URL1) - 1) as URL1
from
(select URL,replace(stuff(URL1, 1,patindex('%://%', URL1 + '0'), ''),'//','') as URL1
from (select URL, convert(nvarchar(max),[Url]) Url1 from [dbo].[InternetUsage_nn] )T1)T)T1
left outer join [dbo].[InternetUsage_URL_List] T2
on T1.URL1=convert(nvarchar(max),T2.URL) where T2.URL is not null
select dbo.FindAbsolutePath('/',Main_url,count) from @table
waiting for your answers
Thanks

The following does what you require, with the aid of a Split function:
CREATE FUNCTION dbo.Split(@StringToSplit NVARCHAR(MAX), @Delimiter NCHAR(1))
RETURNS TABLE
AS
RETURN
(
SELECT ID = ROW_NUMBER() OVER(ORDER BY n.Number),
Position = Number,
Value = SUBSTRING(@StringToSplit, Number, CHARINDEX(@Delimiter, @StringToSplit + @Delimiter, Number) - Number)
FROM ( SELECT TOP (LEN(@StringToSplit) + 1) Number = ROW_NUMBER() OVER(ORDER BY a.object_id)
FROM sys.all_objects a
) n
WHERE SUBSTRING(@Delimiter + @StringToSplit + @Delimiter, n.Number, 1) = @Delimiter
);
Once you have this function your code becomes relatively concise:
DECLARE @T TABLE (URL VARCHAR(1000));
DECLARE @T2 TABLE (URL VARCHAR(1000), Cut_Before INT);
-- POPULATE TABLES HERE (NOT INCLUDED TO SAVE SPACE)
WITH CTE AS
( SELECT FullURL = t.URL,
BaseURLLength = LEN(ISNULL(t2.URL, t.URL)),
Remainder = ISNULL(REPLACE(t.URL, t2.URL, ''), ''),
Cut_Before = ISNULL(t2.Cut_Before, 1)
FROM @T AS t
LEFT JOIN @T2 AS t2
ON t.URL LIKE t2.URL + '/%'
)
SELECT t.FullURL,
Cut = SUBSTRING(t.FullURL, 1, BaseURLLength + LEN(s.Value) + s.Position - 1)
FROM CTE t
OUTER APPLY dbo.Split(t.Remainder, '/') AS s
WHERE s.ID = t.Cut_Before;
The premise is, the first part inside the CTE identifies the part URL for each full url by joining using LIKE. Using http://10.6.2.26/ERP/HRServices/WorkflowService.asmx as an example this will show the following:
FullURL: http://10.6.2.26/ERP/HRServices/WorkflowService.asmx
BaseURLLength: 16
Remainder: /ERP/HRServices/WorkflowService.asmx
Cut_Before: 3
Where remainder is what is left of the full url after you have removed the part URL. The split function will then split the remainder into a new row for each of its component parts:
SELECT *
FROM dbo.Split('/ERP/HRServices/WorkflowService.asmx', '/');
Will return:
ID Position Value
1 1
2 2 ERP
3 6 HRServices
4 17 WorkflowService.asmx
This is then limited to only the row that matches the Cut_Before value. This row can then be used to establish the position to "Cut" the full URL (the starting position + the length of the value at that position).
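As a worked example of that last step, the numbers shown above for this URL can be plugged straight into the SUBSTRING expression. The variable @FullURL is only there to keep the sketch self-contained.

```sql
-- Values taken from the walkthrough above:
--   BaseURLLength = 16, and the Split row with ID = 3 has Position = 6, Value = 'HRServices'.
declare @FullURL varchar(1000) = 'http://10.6.2.26/ERP/HRServices/WorkflowService.asmx';

select substring(@FullURL, 1, 16 + len('HRServices') + 6 - 1) as Cut; -- 16 + 10 + 6 - 1 = 31 characters
```

The first 31 characters are http://10.6.2.26/ERP/HRServices, which is the expected result from the question.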

I modified my code. This code block should resolve your problem.
CREATE Function FindAbsolutePath(
@TargetStr varchar(8000),
@SearchedStr varchar(8000),
@Occurrence int
)
RETURNS varchar(8000)
AS
BEGIN
DECLARE @Result varchar(8000);
if CHARINDEX('http://',@SearchedStr)>0 --fix http://
BEGIN
set @Occurrence=@Occurrence+2;
END
;WITH Occurrences AS (
SELECT
Number,
ROW_NUMBER() OVER(ORDER BY Number) AS Occurrence
FROM master.dbo.spt_values
WHERE
Number BETWEEN 1
AND LEN(@SearchedStr)
AND type='P'
AND SUBSTRING(@SearchedStr,Number,LEN(@TargetStr))=@TargetStr
)
SELECT @Result= SUBSTRING(@SearchedStr,0,Number)
FROM Occurrences
WHERE Occurrence=@Occurrence
return @Result
END
--select dbo.FindAbsolutePath('/','http://10.6.2.26/ERP/HRServices/WorkflowService.asmx',3)

replace value in varchar(max) field with join

I have a table that contains text field with placeholders. Something like this:
Row Notes
1. This is some notes ##placeholder130## this ##myPlaceholder##, #oneMore#. End.
2. Second row...just a ##test#.
(This table contains about 1-5k rows on average. Average number of placeholders in one row is 5-15).
Now, I have a lookup table that looks like this:
Name Value
placeholder130 Dog
myPlaceholder Cat
oneMore Cow
test Horse
(Lookup table will contain anywhere from 10k to 100k records)
I need to find the fastest way to join those placeholders from strings to a lookup table and replace with value. So, my result should look like this (1st row):
This is some notes Dog this Cat, Cow. End.
What I came up with was to split each row into multiple for each placeholder and then join it to lookup table and then concat records back to original row with new values, but it takes around 10-30 seconds on average.

You could try to split the string using a numbers table and rebuild it with for xml path.
select (
select coalesce(L.Value, T.Value)
from Numbers as N
cross apply (select substring(Notes.notes, N.Number, charindex('##', Notes.notes + '##', N.Number) - N.Number)) as T(Value)
left outer join Lookup as L
on L.Name = T.Value
where N.Number <= len(notes) and
substring('##' + notes, Number, 2) = '##'
order by N.Number
for xml path(''), type
).value('text()[1]', 'varchar(max)')
from Notes
I borrowed the string splitting from this blog post by Aaron Bertrand

SQL Server is not very fast with string manipulation, so this is probably best done client-side. Have the client load the entire lookup table, and replace the placeholders in the notes as they arrive.
Having said that, it can of course be done in SQL. Here's a solution with a recursive CTE. It performs one lookup per recursion step:
; with Repl as
(
select row_number() over (order by l.name) rn
, Name
, Value
from Lookup l
)
, Recurse as
(
select Notes
, 0 as rn
from Notes
union all
select replace(Notes, '##' + l.name + '##', l.value)
, r.rn + 1
from Recurse r
join Repl l
on l.rn = r.rn + 1
)
select *
from Recurse
where rn =
(
select count(*)
from Lookup
)
option (maxrecursion 0)
Another option is a while loop to keep replacing lookups until no more are found:
declare @notes table (notes varchar(max))
insert @notes
select Notes
from Notes
while 1=1
begin
update n
set Notes = replace(n.Notes, '##' + l.name + '##', l.value)
from @notes n
outer apply
(
select top 1 Name
, Value
from Lookup l
where n.Notes like '%##' + l.name + '##%'
) l
where l.name is not null
if @@rowcount = 0
break
end
select *
from @notes

I second the comment that tsql is just not suited for this operation, but if you must do it in the db here is an example using a function to manage the multiple replace statements.
Since you have a relatively small number of tokens in each note (5-15) and a very large number of tokens (10k-100k) my function first extracts tokens from the input as potential tokens and uses that set to join to your lookup (dbo.Token below). It was far too much work to look for an occurrence of any of your tokens in each note.
I did a bit of perf testing using 50k tokens and 5k notes and this function runs really well, completing in <2 seconds (on my laptop). Please report back how this strategy performs for you.
note: In your example data the token format was not consistent (##_#, ##_##, #_#), I am guessing this was simply a typo and assume all tokens take the form of ##TokenName##.
--setup
if object_id('dbo.[Lookup]') is not null
drop table dbo.[Lookup];
go
if object_id('dbo.fn_ReplaceLookups') is not null
drop function dbo.fn_ReplaceLookups;
go
create table dbo.[Lookup] (LookupName varchar(100) primary key, LookupValue varchar(100));
insert into dbo.[Lookup]
select '##placeholder130##','Dog' union all
select '##myPlaceholder##','Cat' union all
select '##oneMore##','Cow' union all
select '##test##','Horse';
go
create function [dbo].[fn_ReplaceLookups](@input varchar(max))
returns varchar(max)
as
begin
declare @xml xml;
select @xml = cast(('<r><i>'+replace(@input,'##' ,'</i><i>')+'</i></r>') as xml);
--extract the potential tokens
declare @LookupsInString table (LookupName varchar(100) primary key);
insert into @LookupsInString
select distinct '##'+v+'##'
from ( select [v] = r.n.value('(./text())[1]', 'varchar(100)'),
[r] = row_number() over (order by n)
from @xml.nodes('r/i') r(n)
)d(v,r)
where r%2=0;
--tokenize the input
select @input = replace(@input, l.LookupName, l.LookupValue)
from dbo.[Lookup] l
join @LookupsInString lis on
l.LookupName = lis.LookupName;
return @input;
end
go
return
--usage
declare @Notes table ([Id] int primary key, notes varchar(100));
insert into @Notes
select 1, 'This is some notes ##placeholder130## this ##myPlaceholder##, ##oneMore##. End.' union all
select 2, 'Second row...just a ##test##.';
select *,
dbo.fn_ReplaceLookups(notes)
from @Notes;
Returns:
Tokenized
--------------------------------------------------------
This is some notes Dog this Cat, Cow. End.
Second row...just a Horse.

Try this
;WITH CTE (org, calc, [Notes], [level]) AS
(
SELECT [Notes], [Notes], CONVERT(varchar(MAX),[Notes]), 0 FROM PlaceholderTable
UNION ALL
SELECT CTE.org, CTE.[Notes],
CONVERT(varchar(MAX), REPLACE(CTE.[Notes],'##' + T.[Name] + '##', T.[Value])), CTE.[level] + 1
FROM CTE
INNER JOIN LookupTable T ON CTE.[Notes] LIKE '%##' + T.[Name] + '##%'
)
SELECT DISTINCT org, [Notes], level FROM CTE
WHERE [level] = (SELECT MAX(level) FROM CTE c WHERE CTE.org = c.org)

To get speed, you can preprocess the note templates into a more efficient form. This will be a sequence of fragments, with each ending in a substitution. The substitution might be NULL for the last fragment.
Notes
Id FragSeq Text SubsId
1 1 'This is some notes ' 1
1 2 ' this ' 2
1 3 ', ' 3
1 4 '. End.' null
2 1 'Second row...just a ' 4
2 2 '.' null
Subs
Id Name Value
1 'placeholder130' 'Dog'
2 'myPlaceholder' 'Cat'
3 'oneMore' 'Cow'
4 'test' 'Horse'
Now we can do the substitutions with a simple join.
SELECT Notes.Text + COALESCE(Subs.Value, '')
FROM Notes LEFT JOIN Subs
ON SubsId = Subs.Id WHERE Notes.Id = ?
ORDER BY FragSeq
This produces a list of fragments with substitutions complete. I am not an MSQL user, but in most dialects of SQL you can concatenate these fragments in a variable quite easily:
DECLARE @Note VARCHAR(8000)
SELECT @Note = COALESCE(@Note, '') + Notes.Text + COALESCE(Subs.Value, '')
FROM Notes LEFT JOIN Subs
ON SubsId = Subs.Id WHERE Notes.Id = ?
ORDER BY FragSeq
Pre-processing a note template into fragments will be straightforward using the string splitting techniques of other posts.
Unfortunately I'm not at a location where I can test this, but it ought to work fine.
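On SQL Server specifically, the order in which a SELECT @variable = @variable + ... statement processes rows is not documented as guaranteed, so the ORDER BY may not be honoured. A minimal sketch of an alternative, assuming SQL Server 2017 or later where STRING_AGG is available. The table variables @Notes and @Subs simply hold the fragment and substitution rows for note 1 from the tables above.

```sql
-- Fragment and substitution rows for note 1, copied from the tables above.
declare @Notes table (Id int, FragSeq int, [Text] varchar(100), SubsId int);
declare @Subs  table (Id int, Name varchar(100), Value varchar(100));

insert into @Notes values
    (1, 1, 'This is some notes ', 1),
    (1, 2, ' this ', 2),
    (1, 3, ', ', 3),
    (1, 4, '. End.', null);

insert into @Subs values
    (1, 'placeholder130', 'Dog'),
    (2, 'myPlaceholder', 'Cat'),
    (3, 'oneMore', 'Cow');

-- Append each fragment's substitution, then join the fragments in FragSeq order.
select n.Id,
       string_agg(n.[Text] + coalesce(s.Value, ''), '')
           within group (order by n.FragSeq) as Note
from @Notes as n
left join @Subs as s on s.Id = n.SubsId
group by n.Id;
```

For note 1 this gives "This is some notes Dog this Cat, Cow. End.", the result asked for in the question.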

I really don't know how this will perform with 10k+ lookups.
How does plain old dynamic SQL perform?
DECLARE @sqlCommand NVARCHAR(MAX)
SELECT @sqlCommand = N'PlaceholderTable.[Notes]'
SELECT @sqlCommand = 'REPLACE( ' + @sqlCommand +
', ''##' + LookupTable.[Name] + '##'', ''' +
LookupTable.[Value] + ''')'
FROM LookupTable
SELECT @sqlCommand = 'SELECT *, ' + @sqlCommand + ' FROM PlaceholderTable'
EXECUTE sp_executesql @sqlCommand

And now for some recursive CTE.
If your indexes are correctly set up, this one should be very fast or very slow. SQL Server always surprises me with performance extremes when it comes to the r-CTE...
;WITH T AS (
SELECT
Row,
StartIdx = 1, -- 1 as first starting index
EndIdx = CAST(patindex('%##%', Notes) as int), -- first ending index
Result = substring(Notes, 1, patindex('%##%', Notes) - 1)
-- (first) temp result bounded by indexes
FROM PlaceholderTable -- **this is your source table**
UNION ALL
SELECT
pt.Row,
StartIdx = newstartidx, -- starting index (calculated in calc1)
EndIdx = EndIdx + CAST(newendidx as int) + 1, -- ending index (calculated in calc4 + total offset)
Result = Result + CAST(ISNULL(newtokensub, newtoken) as nvarchar(max))
-- temp result taken from subquery or original
FROM
T
JOIN PlaceholderTable pt -- **this is your source table**
ON pt.Row = T.Row
CROSS APPLY(
SELECT newstartidx = EndIdx + 2 -- new starting index moved by 2 from last end ('##')
) calc1
CROSS APPLY(
SELECT newtxt = substring(pt.Notes, newstartidx, len(pt.Notes))
-- current piece of txt we work on
) calc2
CROSS APPLY(
SELECT patidx = patindex('%##%', newtxt) -- current index of '##'
) calc3
CROSS APPLY(
SELECT newendidx = CASE
WHEN patidx = 0 THEN len(newtxt) + 1
ELSE patidx END -- if last piece of txt, end with its length
) calc4
CROSS APPLY(
SELECT newtoken = substring(pt.Notes, newstartidx, newendidx - 1)
-- get the new token
) calc5
OUTER APPLY(
SELECT newtokensub = Value
FROM LookupTable
WHERE Name = newtoken -- substitute the token if you can find it in **your lookup table**
) calc6
WHERE newstartidx + len(newtxt) - 1 <= len(pt.Notes)
-- do this while {new starting index} + {length of txt we work on} does not exceed total length
)
,lastProcessed AS (
SELECT
Row,
Result,
rn = row_number() over(partition by Row order by StartIdx desc)
FROM T
) -- enumerate all (including intermediate) results
SELECT *
FROM lastProcessed
WHERE rn = 1 -- filter out intermediate results (display only last ones)
