SQL Server 2012 T-SQL count number of words between elements of two sets

I have two sets of elements, let's say they are these words:
set 1: "nuclear", "fission", "dirty" and
set 2: "device", "explosive"
In my database, I have a text column (Description) which contains a sentence or two. I would like to find any records where Description contains an element from set 1 followed by an element from set 2, where the two elements are separated by four words or fewer. For simplicity, the number of words between the two elements can be taken as (number of spaces - 1).
I'd prefer a solution that doesn't require installing anything like CLR functions for regular expressions. If this could be done with a user-defined table-valued function instead, deployment would be simpler.
Does this sound possible?

It is possible, but I do not think it will perform well with millions of rows.
I have a solution here that handles about 10,000 rows in 2 seconds and 100,000 rows in about 20 seconds on our server. It also requires the well-known DelimitedSplit8K table-valued function from SQLServerCentral (included below as cf_DelimitedSplit8K):
DECLARE @set1 VARCHAR(MAX) = 'nuclear, fission, dirty';
DECLARE @set2 VARCHAR(MAX) = 'device, explosive';
WITH GetDistances AS
(
SELECT
DubID = ROW_NUMBER() OVER (PARTITION BY ID ORDER BY ID)
, Distance = dbo.[cf_ValueSetDistance](s.Description, @set1, @set2)
, s.ID
,s.Description
FROM #sentences s
JOIN dbo.cf_DelimitedSplit8K(@set1, ',') s1 ON s.Description LIKE '%' + RTRIM(LTRIM(s1.Item)) + '%'
JOIN dbo.cf_DelimitedSplit8K(@set2, ',') s2 ON s.Description LIKE '%' + RTRIM(LTRIM(s2.Item)) + '%'
) SELECT Distance, ID, Description FROM GetDistances WHERE DubID = 1 AND Distance BETWEEN 1 AND 4;
--10 000 rows: 2sec
--100 000 rows: 20sec
Test data generator
--DROP TABLE #sentences
CREATE TABLE #sentences
(
ID INT IDENTITY(1,1) PRIMARY KEY
, Description VARCHAR(100)
);
GO
--CREATE 10000 random sentences that are 100 chars long
SET NOCOUNT ON;
WHILE((SELECT COUNT(*) FROM #sentences) < 10000)
BEGIN
DECLARE @randomWord VARCHAR(100) = '';
SELECT TOP 100 @randomWord = @randomWord + ' ' + Item FROM dbo.cf_DelimitedSplit8K('nuclear fission dirty device explosive On the other hand, we denounce with righteous indignation and dislike men who are so beguiled and demoralized by the charms of pleasure of the moment, so blinded by desire, that they cannot foresee the pain and trouble that are bound to ensue; and equal blame belongs to those who fail in their duty through weakness of will, which is the same as saying through shrinking from toil and pain. These cases are perfectly simple and easy to distinguish. In a free hour, when our power of choice is untrammelled and when nothing prevents our being able to do what we like best, every pleasure is to be welcomed and every pain avoided. But in certain circumstances and owing to the claims of duty or the obligations of business it will frequently occur that pleasures have to be repudiated and annoyances accepted. The wise man therefore always holds in these matters to this principle of selection: he rejects pleasures to secure other greater pleasures, or else he endures pains to avoid worse pains', ' ') ORDER BY NEWID();
INSERT INTO #sentences
SELECT @randomWord
END
SET NOCOUNT OFF;
Function 1 - cf_ValueSetDistance
CREATE FUNCTION [dbo].[cf_ValueSetDistance]
(
@value VARCHAR(MAX)
, @compareSet1 VARCHAR(MAX)
, @compareSet2 VARCHAR(MAX)
)
RETURNS INT
AS
BEGIN
SET @value = REPLACE(REPLACE(REPLACE(@value, '.', ''), ',', ''), '?', '');
DECLARE @distance INT;
DECLARE @sentence TABLE( WordIndex INT, Word VARCHAR(MAX) );
DECLARE @set1 TABLE(Word VARCHAR(MAX) );
DECLARE @set2 TABLE(Word VARCHAR(MAX) );
INSERT INTO @sentence
SELECT ItemNumber, RTRIM(LTRIM(Item)) FROM dbo.cf_DelimitedSplit8K(@value, ' ')
INSERT INTO @set1
SELECT RTRIM(LTRIM(Item)) FROM dbo.cf_DelimitedSplit8K(@compareSet1, ',')
IF(EXISTS(SELECT 1 FROM @sentence s JOIN @set1 s1 ON s.Word = s1.Word))
BEGIN
INSERT INTO @set2
SELECT RTRIM(LTRIM(Item)) FROM dbo.cf_DelimitedSplit8K(@compareSet2, ',');
IF(EXISTS(SELECT 1 FROM @sentence s JOIN @set2 s2 ON s.Word = s2.Word))
BEGIN
WITH Set1 AS (
SELECT s.WordIndex, s.Word FROM @sentence s
JOIN @set1 s1 ON s1.Word = s.Word
), Set2 AS
(
SELECT s.WordIndex, s.Word FROM @sentence s
JOIN @set2 s2 ON s2.Word = s.Word
)
SELECT @distance = MIN(ABS(s2.WordIndex - s1.WordIndex)) FROM Set1 s1, Set2 s2
END
END
RETURN @distance;
END
Function 2 - DelimitedSplit8K
(There is no need to understand this code in detail. It is an extremely fast function for splitting a string into a table, written by several talented people.)
CREATE FUNCTION [dbo].[cf_DelimitedSplit8K]
(@pString VARCHAR(8000), @pDelimiter CHAR(1))
RETURNS TABLE WITH SCHEMABINDING AS
RETURN
--===== "Inline" CTE Driven "Tally Table" produces values from 0 up to 10,000...
-- enough to cover NVARCHAR(4000)
WITH E1(N) AS (
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
), --10E+1 or 10 rows
E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows
E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max
cteTally(N) AS (--==== This provides the "base" CTE and limits the number of rows right up front
-- for both a performance gain and prevention of accidental "overruns"
SELECT TOP (ISNULL(DATALENGTH(@pString),0)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4
),
cteStart(N1) AS (--==== This returns N+1 (starting position of each "element" just once for each delimiter)
SELECT 1 UNION ALL
SELECT t.N+1 FROM cteTally t WHERE SUBSTRING(@pString,t.N,1) = @pDelimiter
),
cteLen(N1,L1) AS(--==== Return start and length (for use in substring)
SELECT s.N1,
ISNULL(NULLIF(CHARINDEX(@pDelimiter,@pString,s.N1),0)-s.N1,8000)
FROM cteStart s
)
--===== Do the actual split. The ISNULL/NULLIF combo handles the length for the final element when no delimiter is found.
SELECT ItemNumber = ROW_NUMBER() OVER(ORDER BY l.N1),
Item = SUBSTRING(@pString, l.N1, l.L1)
FROM cteLen l;

I don't know about performance, but this could be done with CROSS APPLY and two table variables.
--initialize word set data
DECLARE @set1 TABLE (wordFromSet varchar(50))
DECLARE @set2 TABLE (wordFromSet varchar(50))
INSERT INTO @set1 SELECT 'nuclear' UNION SELECT 'fission' UNION SELECT 'dirty'
INSERT INTO @set2 SELECT 'device' UNION SELECT 'explosive'
SELECT *
FROM MyTable m
CROSS APPLY
(
SELECT wordFromSet
,LEN(SUBSTRING(m.Description, 1, CHARINDEX(wordFromSet, m.Description))) - LEN(REPLACE(SUBSTRING(m.Description, 1, CHARINDEX(wordFromSet, m.Description)),' ', '')) AS WordPosition
FROM @set1
WHERE m.Description LIKE '%' + wordFromSet + '%'
) w1
CROSS APPLY
(
SELECT wordFromSet
,LEN(SUBSTRING(m.Description, 1, CHARINDEX(wordFromSet, m.Description))) - LEN(REPLACE(SUBSTRING(m.Description, 1, CHARINDEX(wordFromSet, m.Description)),' ', '')) AS WordPosition
FROM @set2
WHERE m.Description LIKE '%' + wordFromSet + '%'
) w2
WHERE w2.WordPosition - w1.WordPosition BETWEEN 1 AND 4 -- set-2 word must come after the set-1 word, at most 4 positions later
Essentially, it returns only the rows from MyTable that contain at least one word from each set, and for those rows it calculates the position of each matched word: the length of the substring that ends at the word, minus the length of the same substring with the spaces removed, is the number of spaces (and therefore words) in front of it.
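A minimal worked example of that arithmetic, using a made-up sentence (not from the question). Because the prefix ends at the first letter of the matched word, it has no trailing space for LEN to ignore, and its space count is the word's zero-based position.

```sql
-- Hypothetical sample sentence, used only to illustrate the LEN/REPLACE trick.
DECLARE @Description varchar(200) = 'a dirty little explosive device';

-- Same expression as in the answer: count the spaces in the prefix that ends
-- at the first character of the matched word.
SELECT w.Word,
       LEN(p.Prefix) - LEN(REPLACE(p.Prefix, ' ', '')) AS WordPosition
FROM (VALUES ('dirty'), ('explosive')) AS w(Word)
CROSS APPLY (SELECT SUBSTRING(@Description, 1, CHARINDEX(w.Word, @Description)) AS Prefix) AS p;
-- dirty -> 1, explosive -> 3, so the two words are 3 - 1 = 2 positions apart.
```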

I am adding a new answer, even though my old one has been accepted and I can see you went for the "FULL TEXT INDEX".
I have looked at the answer @Louis gave, and I think it was clever to use "CROSS APPLY". His answer beats the performance of mine. The only problem is that his code only compares the first occurrence of each word (CHARINDEX finds only the first match). This made me want to try to combine his answer with the split function I used (DelimitedSplit8K from SQLServerCentral).
This results in a remarkable performance boost. I have tested this on 1 million rows, and the result was almost instant:
My old answer: 5min
@Louis's answer: 2min
New answer: 3sec
This does not beat the "FULLTEXT INDEX" performance-wise, but it at least supports the word search combination you specified in a relatively efficient way.
DECLARE @set1 TABLE (Word VARCHAR(50))
DECLARE @set2 TABLE (Word VARCHAR(50))
INSERT INTO @set1 SELECT 'nuclear' UNION SELECT 'fission' UNION SELECT 'dirty'
INSERT INTO @set2 SELECT 'device' UNION SELECT 'explosive'
SELECT * FROM #sentences s
CROSS APPLY
(
SELECT * FROM @set1 s1
JOIN dbo.cf_DelimitedSplit8K(s.Description, ' ') split ON split.Item = s1.Word
) s1
CROSS APPLY
(
SELECT * FROM @set2 s2
JOIN dbo.cf_DelimitedSplit8K(s.Description, ' ') split ON split.Item = s2.Word
) s2
WHERE ABS(s1.ItemNumber - s2.ItemNumber) <= 4;
Look at my old answer for the code for the dbo.cf_DelimitedSplit8K function.
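On SQL Server 2022 or later (or Azure SQL Database), the built-in STRING_SPLIT can return each word's position through its optional enable_ordinal argument, so the same CROSS APPLY idea can be sketched without a custom splitter. This is a minimal sketch under that version assumption (database compatibility level 130 or higher is also needed for STRING_SPLIT); the sample sentences are made up and the table variable stands in for #sentences. Unlike the ABS() version above, it also requires the set-1 word to come first, as the question asks.

```sql
-- Sketch for SQL Server 2022+ / Azure SQL Database (STRING_SPLIT with enable_ordinal).
-- Sample sentences are made up.
DECLARE @sentences TABLE (ID int PRIMARY KEY, Description varchar(200));
INSERT INTO @sentences VALUES
    (1, 'they found a dirty little explosive device'),
    (2, 'the device was not nuclear');

DECLARE @set1 TABLE (Word varchar(50));
DECLARE @set2 TABLE (Word varchar(50));
INSERT INTO @set1 VALUES ('nuclear'), ('fission'), ('dirty');
INSERT INTO @set2 VALUES ('device'), ('explosive');

SELECT DISTINCT s.ID, s.Description
FROM @sentences AS s
CROSS APPLY STRING_SPLIT(s.Description, ' ', 1) AS w1   -- ordinal = 1-based word position
CROSS APPLY STRING_SPLIT(s.Description, ' ', 1) AS w2
WHERE w1.value IN (SELECT Word FROM @set1)
  AND w2.value IN (SELECT Word FROM @set2)
  AND w2.ordinal - w1.ordinal BETWEEN 1 AND 4;           -- set-1 word first, at most 4 positions later
-- Only sentence 1 qualifies: in sentence 2 the set-2 word comes first.
```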

Related

How to put mailto around email addresses in text string

I am trying to figure out how to be able to select/find and format each email address contained in a piece of text.
Example string:
Notification: Organizer must notify at least 30 days prior to the event. Provide the event information, including: day of contact information, location, date, schedule, activities, etc. Paul T. Hall – paulhall@email.com - Mikel Zubizarreta – mikelzubizarreta@email.com
The output of the string should be:
Notification: Organizer must notify at least 30 days prior to the event. Provide the event information, including: day of contact information, location, date, schedule, activities, etc. Paul T. Hall – <a href='mailto:paulhall@email.com'>paulhall@email.com</a> - Mikel Zubizarreta – <a href='mailto:mikelzubizarreta@email.com'>mikelzubizarreta@email.com</a>
These are the attempts I have come up with:
Within a select:
, CASE
WHEN CHARINDEX('@',CONDITION) > 0 THEN
REPLACE(CONDITION, dbo.FN_GET_EMAIL_FROM_STRING(CONDITION), '<a href=''mailto:' + dbo.FN_GET_EMAIL_FROM_STRING(CONDITION) + '''>' + dbo.FN_GET_EMAIL_FROM_STRING(CONDITION) + '</a>')
ELSE CONDITION
END [CONDITION]
Contents of dbo.FN_GET_EMAIL_FROM_STRING(CONDITION):
ALTER FUNCTION [dbo].[FN_GET_EMAIL_FROM_STRING]
(
@TextContainingEmail VARCHAR(1000)
)
RETURNS VARCHAR(1000)
AS
BEGIN
DECLARE @retval VARCHAR(1000);
SELECT TOP
1 @retval = Items
FROM
dbo.FN_SPLIT_STRING(@TextContainingEmail, '')
WHERE
Items LIKE '%@%';
RETURN @retval;
END;
Contents of: FN_SPLIT_STRING(@TextContainingEmail, '')
ALTER FUNCTION [dbo].[FN_SPLIT_STRING]
(
@STRING NVARCHAR(4000)
, @Delimiter CHAR(1)
)
RETURNS @Results TABLE(Items NVARCHAR(4000))
AS
BEGIN
DECLARE @INDEX INT;
DECLARE @SLICE NVARCHAR(4000);
-- HAVE TO SET TO 1 SO IT DOESN'T EQUAL ZERO FIRST TIME IN LOOP
SELECT @INDEX = 1;
IF @STRING IS NULL
RETURN;
WHILE @INDEX != 0
BEGIN
-- GET THE INDEX OF THE FIRST OCCURRENCE OF THE SPLIT CHARACTER
SELECT
@INDEX = CHARINDEX(@Delimiter, LTRIM(RTRIM(@STRING)));
-- NOW PUSH EVERYTHING TO THE LEFT OF IT INTO THE SLICE VARIABLE
IF @INDEX != 0
SELECT
@SLICE = LEFT(@STRING, @INDEX - 1);
ELSE
SELECT
@SLICE = @STRING;
-- PUT THE ITEM INTO THE RESULTS SET
INSERT INTO @Results
(
Items
)
VALUES(@SLICE);
-- CHOP THE ITEM REMOVED OFF THE MAIN STRING
SELECT
@STRING = REPLACE(RIGHT(@STRING, LEN(@STRING) - @INDEX), ',', '');
-- BREAK OUT IF WE ARE DONE
IF LEN(@STRING) = 0
BREAK;
END;
RETURN;
END;
But the output for the string I used as an example at the top of this post ends up looking like this:
Notification: Organizer must notify at least 30 days prior to the event. Provide the event information, including: day of contact information, location, date, schedule, activities, etc. Paul T. Hall – <a href='mailto:paulhall@email.com'>paulhall@email.com</a> - Mikel Zubizarreta – mikelzubizarreta@email.com
As you can see, it sort of works, but it only adds the 'mailto' tag to the first email address and not the second one.
This solution uses the splitter function created by Eirikur Eiriksson based on the original function by Jeff Moden. The whole explanation of this function can be found here.
I'll just copy the code for the function.
CREATE FUNCTION [dbo].[DelimitedSplit8K_LEAD]
--===== Define I/O parameters
(@pString VARCHAR(8000), @pDelimiter CHAR(1))
RETURNS TABLE WITH SCHEMABINDING AS
RETURN
--===== "Inline" CTE Driven "Tally Table" produces values from 0 up to 10,000...
-- enough to cover VARCHAR(8000)
WITH E1(N) AS (
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
), --10E+1 or 10 rows
E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows
E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max
cteTally(N) AS (--==== This provides the "zero base" and limits the number of rows right up front
-- for both a performance gain and prevention of accidental "overruns"
SELECT 0 UNION ALL
SELECT TOP (DATALENGTH(ISNULL(@pString,1))) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4
),
cteStart(N1) AS (--==== This returns N+1 (starting position of each "element" just once for each delimiter)
SELECT t.N+1
FROM cteTally t
WHERE (SUBSTRING(@pString,t.N,1) = @pDelimiter OR t.N = 0)
)
--===== Do the actual split. The ISNULL/NULLIF combo handles the length for the final element when no delimiter is found.
SELECT ItemNumber = ROW_NUMBER() OVER(ORDER BY s.N1),
Item = SUBSTRING(@pString,s.N1,ISNULL(NULLIF((LEAD(s.N1,1,1) OVER (ORDER BY s.N1) - 1),0)-s.N1,8000))
FROM cteStart s
;
GO
This way we can identify the email addresses independently and concatenate the string again using FOR XML.
CREATE TABLE #SampleData(
String varchar(8000)
)
INSERT INTO #SampleData VALUES('Notification: Organizer must notify at least 30 days prior to the event. Provide the event information, including: day of contact information, location, date, schedule, activities, etc. Paul T. Hall – paulhall@email.com - Mikel Zubizarreta – mikelzubizarreta@email.com')
SELECT STUFF(( SELECT ' ' + CASE WHEN s.Item LIKE '_%@_%._%' THEN '<a href=''mailto:' + s.Item + '''>' + s.Item + '</a>'
ELSE s.Item END
FROM dbo.DelimitedSplit8K_LEAD( d.String, ' ') s
ORDER BY s.ItemNumber
FOR XML PATH(''), TYPE).value('./text()[1]', 'varchar(max)'), 1, 1, '')
FROM #SampleData d
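On SQL Server 2022 or later, the same rebuild can be sketched without a custom splitter or FOR XML: STRING_SPLIT with enable_ordinal (2022+) supplies the word order, and STRING_AGG ... WITHIN GROUP (2017+) puts the words back together in that order. This is a sketch under that version assumption; the sample string is a shortened, made-up variant of the question's text, and the e-mail test is the same LIKE pattern as above.

```sql
-- Sketch for SQL Server 2022+: STRING_SPLIT(..., 1) supplies the word order,
-- STRING_AGG ... WITHIN GROUP (SQL Server 2017+) glues the words back together.
DECLARE @SampleData TABLE (ID int PRIMARY KEY, String varchar(8000));
INSERT INTO @SampleData VALUES
    (1, 'Paul T. Hall - paulhall@email.com - Mikel Zubizarreta - mikelzubizarreta@email.com');

DECLARE @Linked varchar(max);

SELECT @Linked = STRING_AGG(
           CAST(CASE WHEN w.value LIKE '_%@_%._%'      -- same e-mail test as above
                     THEN '<a href=''mailto:' + w.value + '''>' + w.value + '</a>'
                     ELSE w.value
                END AS varchar(max)),                  -- avoid the 8000-byte limit
           ' ') WITHIN GROUP (ORDER BY w.ordinal)
FROM @SampleData AS d
CROSS APPLY STRING_SPLIT(d.String, ' ', 1) AS w
WHERE d.ID = 1;

SELECT @Linked AS Linked;
```

Every address gets its own link because each word is tested individually, instead of replacing only the first address found.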

SQL Server - Split column data and retrieve last second value

I have a column named MasterCode in table XYZ where data is stored in the following form:
.105248.105250.104150.111004.
First of all, I want to split the data into:
105248
105250
104150
111004
Then I want to retrieve only the second-to-last value from the above.
So for the example above, the value returned should be 104150.
Use a string-splitting function, but not the built-in one (STRING_SPLIT), since it returns only the values and you lose the position information.
You can use Jeff Moden's DelimitedSplit8K that will return the item and the item index:
CREATE FUNCTION [dbo].[DelimitedSplit8K]
--===== Define I/O parameters
(@pString VARCHAR(8000), @pDelimiter CHAR(1))
--WARNING!!! DO NOT USE MAX DATA-TYPES HERE! IT WILL KILL PERFORMANCE!
RETURNS TABLE WITH SCHEMABINDING AS
RETURN
--===== "Inline" CTE Driven "Tally Table" produces values from 1 up to 10,000...
-- enough to cover VARCHAR(8000)
WITH E1(N) AS (
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
), --10E+1 or 10 rows
E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows
E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max
cteTally(N) AS (--==== This provides the "base" CTE and limits the number of rows right up front
-- for both a performance gain and prevention of accidental "overruns"
SELECT TOP (ISNULL(DATALENGTH(@pString),0)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4
),
cteStart(N1) AS (--==== This returns N+1 (starting position of each "element" just once for each delimiter)
SELECT 1 UNION ALL
SELECT t.N+1 FROM cteTally t WHERE SUBSTRING(@pString,t.N,1) = @pDelimiter
),
cteLen(N1,L1) AS(--==== Return start and length (for use in substring)
SELECT s.N1,
ISNULL(NULLIF(CHARINDEX(@pDelimiter,@pString,s.N1),0)-s.N1,8000)
FROM cteStart s
)
--===== Do the actual split. The ISNULL/NULLIF combo handles the length for the final element when no delimiter is found.
SELECT ItemNumber = ROW_NUMBER() OVER(ORDER BY l.N1),
Item = SUBSTRING(@pString, l.N1, l.L1)
FROM cteLen l
;
Then you can use it to split the string and it will return a table like this:
DECLARE @string varchar(100) = '.105248.105250.104150.111004.';
SELECT *
FROM [dbo].[DelimitedSplit8K](@string, '.')
ItemNumber Item
1
2 105248
3 105250
4 104150
5 111004
6
You only want the parts that actually contain an item, so add a WHERE clause; you want the second-to-last one, so add ROW_NUMBER(); and you want the whole thing in a common table expression so that you can query it:
DECLARE @string varchar(100) = '.105248.105250.104150.111004.';
WITH CTE AS
(
SELECT Item, ROW_NUMBER() OVER(ORDER BY ItemNumber DESC) As rn
FROM [dbo].[DelimitedSplit8K](@string, '.')
WHERE Item <> ''
)
And the query:
SELECT Item
FROM CTE
WHERE rn = 2
Result: 104150
If there are always four parts, you can use PARSENAME():
DECLARE @s varchar(64) = '.105248.105250.104150.111004.';
SELECT PARSENAME(SUBSTRING(@s, 2, LEN(@s)-2),2);
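PARSENAME treats its input as a database object name of at most four dot-separated parts and numbers the parts from the right, which is why part 2 is the second-to-last value. A small sketch showing all four parts of the trimmed sample string (with more than four parts, PARSENAME returns NULL, so this only fits the fixed four-part case):

```sql
-- PARSENAME numbers the (at most four) dot-separated parts from the right.
DECLARE @s varchar(64) = '.105248.105250.104150.111004.';
DECLARE @trimmed varchar(64) = SUBSTRING(@s, 2, LEN(@s) - 2);  -- '105248.105250.104150.111004'

SELECT PARSENAME(@trimmed, 1) AS Part1,  -- 111004 (last)
       PARSENAME(@trimmed, 2) AS Part2,  -- 104150 (second-to-last)
       PARSENAME(@trimmed, 3) AS Part3,  -- 105250
       PARSENAME(@trimmed, 4) AS Part4;  -- 105248
```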
Depending on your version of SQL Server (STRING_SPLIT was added in SQL Server 2016), you can also use the STRING_SPLIT function.
DECLARE @string varchar(100) = '.105248.105250.104150.111004.';
SELECT value,
ROW_NUMBER() OVER (ORDER BY CHARINDEX('.' + value + '.', '.' + @string + '.')) AS Pos
FROM STRING_SPLIT(@string,'.')
WHERE RTRIM(value) <> '';
It doesn't return the original position the way Jeff's splitter does, but it compares very favourably in performance, as Aaron Bertrand's article shows:
Performance Surprises and Assumptions : STRING_SPLIT()
Edit:
Added a position column. It works in this case, but it may return wrong positions when the string contains duplicate values.
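On SQL Server 2022 or later (or Azure SQL Database), STRING_SPLIT accepts a third argument, enable_ordinal; with 1 it adds an ordinal column holding each item's position, which avoids the duplicate-value problem of the CHARINDEX trick. A minimal sketch under that version assumption:

```sql
-- Requires SQL Server 2022+ / Azure SQL Database (enable_ordinal argument).
DECLARE @string varchar(100) = '.105248.105250.104150.111004.';
DECLARE @secondLast varchar(100);

SELECT @secondLast = value
FROM (
    SELECT value, ROW_NUMBER() OVER (ORDER BY ordinal DESC) AS rn
    FROM STRING_SPLIT(@string, '.', 1)
    WHERE value <> ''          -- drop the empty items produced by the leading/trailing dots
) AS parts
WHERE rn = 2;

SELECT @secondLast AS SecondLast;   -- 104150
```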
You can create a SQL Server table-valued function that takes the string value and a delimiter as parameters, and call that function to get the expected results.
ALTER function [dbo].[SplitString]
(
@str nvarchar(4000),
@separator char(1)
)
returns table
AS
return (
with tokens(p, a, b) AS (
select
1,
1,
charindex(@separator, @str)
union all
select
p + 1,
b + 1,
charindex(@separator, @str, b + 1)
from tokens
where b > 0
)
select
p-1 ID,
substring(
@str,
a,
case when b > 0 then b-a ELSE 4000 end)
AS s
from tokens
)
To call the function
SELECT * FROM [DBO].[SPLITSTRING] ('.105248.105250.104150.111004.', '.') WHERE ISNULL(S,'') <> ''
Output
ID s
1 105248
2 105250
3 104150
4 111004
To get only the second-to-last value, you can write your query as shown below:
DECLARE @MaxID INT
SELECT @MaxID = MAX (ID) FROM (SELECT * FROM [DBO].[SPLITSTRING] ('.105248.105250.104150.111004.', '.') WHERE ISNULL(S,'') <> '') A
SELECT TOP 1 @MaxID = MAX (ID) FROM (
SELECT * FROM [DBO].[SPLITSTRING] ('.105248.105250.104150.111004.', '.') WHERE ISNULL(S,'') <> ''
)a where ID < @MaxID
SELECT * FROM [DBO].[SPLITSTRING] ('.105248.105250.104150.111004.', '.') WHERE ISNULL(S,'') <> '' AND ID = @MaxID
Output
ID s
3 104150
If you want the ID column to show 1, change the last line of the query as shown below:
SELECT 1 AS ID , S FROM [DBO].[SPLITSTRING] ('.105248.105250.104150.111004.', '.') WHERE ISNULL(S,'') <> '' AND ID = @MaxID
Then the output will be
ID S
1 104150
Hope this will help you.
Try this (each value is ranked by the position of its XML node, so the ranking follows the original order rather than the sort order of the values):
DECLARE @DATA AS TABLE (Data nvarchar(1000))
INSERT INTO @DATA
SELECT '.105248.105250.104150.111004.'
;WITH CTE
AS
(
SELECT Data,ROW_NUMBER()OVER(ORDER BY Pos DESC) AS Rnk
FROM
(
SELECT Split.a.value('.','nvarchar(100)') Data
,Split.a.value('for $s in . return count(../*[. << $s]) + 1','int') Pos
FROM(
SELECT CAST('<S>'+REPLACE(Data,'.','</S><S>')+'</S>' AS XML ) As Data
FROM @DATA
)DT
CROSS APPLY Data.nodes('S') AS Split(a)
) AS Fnl
WHERE Fnl.Data <>''
)
SELECT Data FROM CTE
WHERE Rnk=2
Result
Data
-----
104150
It can also be achieved using only string functions:
IF OBJECT_ID('tempdb..#temp') IS NOT NULL
DROP TABLE #temp
SELECT '.105248.105250.104150.111004.' code INTO #temp UNION ALL
SELECT '.205248.205250.204150.211004.'
SELECT
REVERSE(LEFT(
REVERSE(LEFT(code, LEN(code) - CHARINDEX('.', REVERSE(code), 2)))
, CHARINDEX('.',REVERSE(LEFT(code, LEN(code) - CHARINDEX('.', REVERSE(code), 2)))) -1
)
) second_last_value
FROM #temp
Result:
second_last_value
-----------------------------
104150
204150

how to modify t-sql to process multiple records not just one.

I am working on a function to remove/replace special characters from a string in a column named "Title". Currently I am testing the code on one record at a time. I would like to test the code against all the records in the table, but I do not know how to modify the current T-SQL to process all the records rather than just one at a time. I would appreciate it if someone could show me how, or what type of modifications I need to make to be able to process all records.
This is the code as I have it right now:
DECLARE @str VARCHAR(400);
DECLARE @expres VARCHAR(50) = '%[~,@,#,$,%,&,*,(,),.,!,´,:]%'
SET @str = (SELECT REPLACE(REPLACE(LOWER([a].[Title]), CHAR(9), ''), ' ', '_') FROM [dbo].[a] WHERE [a].[ID] = '43948')
WHILE PATINDEX(@expres, @str) > 0
SET @str = REPLACE(REPLACE(@str, SUBSTRING(@str, PATINDEX(@expres, @str), 1), ''), '-', ' ')
SELECT @str COLLATE SQL_Latin1_General_CP1251_CS_AS
For a Title containing the value: Schöne Wiege Meiner Leiden, the output after the code is applied would be: schone_wiege_meiner_leiden
I would like to make the code process multiple records rather than one, as it does now by specifying the ID. I want to process records in bulk.
I hope I can get some help, thank you in advance for your help.
Code example taken from: remove special characters from string in sql server
There is no need for a loop here. You can instead use a tally table, and this can quite easily become a set-based inline table-valued function. Performance-wise it will blow the doors off a loop-based scalar function.
I keep a tally table as a view in my system. Here is the code for the tally table.
create View [dbo].[cteTally] as
WITH
E1(N) AS (select 1 from (values (1),(1),(1),(1),(1),(1),(1),(1),(1),(1))dt(n)),
E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows
E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max
cteTally(N) AS
(
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4
)
select N from cteTally
Now comes the fun part: using this to parse strings and all kinds of other things. It has been dubbed the Swiss Army knife of T-SQL. Any time you start thinking loop, try to think about using a tally table instead. Here is how this function might look.
create function RemoveValuesFromString
(
@SearchVal nvarchar(max)
, @CharsToRemove nvarchar(max)
) returns table as
RETURN
with MyValues as
(
select substring(@SearchVal, N, 1) as MyChar
, t.N
from cteTally t
where N <= len(@SearchVal)
and charindex(substring(@SearchVal, N, 1), @CharsToRemove) = 0
)
select distinct MyResult = STUFF((select MyChar + ''
from MyValues mv2
order by mv2.N
FOR XML PATH(''),TYPE).value('.','NVARCHAR(MAX)'), 1, 0, '')
from MyValues mv
;
Here is an example of how you might be able to use this. I am using a table variable here but this could be any table or whatever.
declare @SomeTable table
(
SomeTableID int identity primary key clustered
, SomeString varchar(max)
)
insert @SomeTable
select 'This coffee cost $32.!!! This is a# tot$#a%l r)*i-p~!`of^%f' union all
select 'This &that'
select *
from @SomeTable st
cross apply dbo.RemoveValuesFromString(st.SomeString, '%[~,@#$%&*()!´:]%`^-') x
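If you are on SQL Server 2017 or later, the question's whole transformation can also be applied to every row at once without a loop or a helper function. TRANSLATE maps each unwanted character to a single placeholder (its two character lists must be the same length), and one REPLACE then removes the placeholder. A minimal sketch: @a is a hypothetical stand-in for [dbo].[a], the second row is made-up data, and the final COLLATE relies on the same accent-stripping collation trick as the question's code.

```sql
-- Sketch, SQL Server 2017+ (TRANSLATE). @a stands in for the question's [dbo].[a];
-- the second row is made-up sample data.
DECLARE @a TABLE (ID int PRIMARY KEY, Title varchar(400));
INSERT INTO @a VALUES (43948, 'Schöne Wiege Meiner Leiden'), (1, 'Rock, Paper & Scissors!');

SELECT a.ID,
       c.Cleaned COLLATE SQL_Latin1_General_CP1251_CS_AS AS CleanTitle  -- question's accent trick
FROM @a AS a
CROSS APPLY (
    SELECT REPLACE(
               REPLACE(
                   -- lower-case, drop tabs, spaces -> '_', then map each unwanted char to '~'
                   TRANSLATE(REPLACE(REPLACE(LOWER(a.Title), CHAR(9), ''), ' ', '_'),
                             '@#$%&*().!´:,', '~~~~~~~~~~~~~'),
                   '~', ''),          -- remove the placeholders (and any original '~')
               '-', ' ') AS Cleaned   -- the question also turns '-' into a space
) AS c;
```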

T-SQL LIKE condition on comma-separated list

Is it possible to write a LIKE condition in T-SQL to match a comma-separated list which includes wildcards to a string. Let me explain further with an example:
Say you have the following comma-separated list of URLs in a field:
'/, /news/%, /about/'
Now here are some examples of strings I'd like to match with the string above:
'/'
'/news/'
'/news/2/'
'/about/'
And here are some strings which would not match:
'/contact/'
'/about/me/'
I've achieved this in the past by writing a split function and then doing a like on each one. However I'm trying to get my query to work in SQL Server CE which doesn't support functions.
In case you are wondering, here's how I achieved it using the split function:
SELECT Widgets.Id
FROM Widgets
WHERE (SELECT COUNT(*) FROM [dbo].[Split](Urls, ',') WHERE @Input LIKE Data) > 0
And here's the split function:
CREATE FUNCTION [dbo].[Split]
(
@RowData NVARCHAR(MAX),
@Separator NVARCHAR(MAX)
)
RETURNS @RtnValue TABLE
(
[Id] INT IDENTITY(1,1),
[Data] NVARCHAR(MAX)
)
AS
BEGIN
DECLARE @Iterator INT
SET @Iterator = 1
DECLARE @FoundIndex INT
SET @FoundIndex = CHARINDEX(@Separator, @RowData)
WHILE (@FoundIndex > 0)
BEGIN
INSERT INTO @RtnValue ([Data])
SELECT Data = LTRIM(RTRIM(SUBSTRING(@RowData, 1, @FoundIndex - 1)))
SET @RowData = SUBSTRING(@RowData, @FoundIndex + DATALENGTH(@Separator) / 2, LEN(@RowData))
SET @Iterator = @Iterator + 1
SET @FoundIndex = CHARINDEX(@Separator, @RowData)
END
INSERT INTO @RtnValue ([Data])
SELECT Data = LTRIM(RTRIM(@RowData))
RETURN
END
I'd appreciate it if someone could help. Thanks
I can think of several options:
Use a session-keyed table: delete rows matching current spid, insert desired rows with current spid, read from table in SP, delete from table (again).
Make your client submit a query with many OR ... LIKE ... clauses.
Write an SP that does the same thing as your function and returns a recordset. INSERT YourTable EXEC SP @Strings and you are done!
Use the numbers-table-charindex-into-string inside of a derived table method of splitting the string.
Example
Let me flesh this out a little for you with an example combining ideas #3 and #4. Of course, your code for your function could be adapted, too.
Build a separate Numbers table. Here is example creation script:
--Numbers Table with 8192 elements (keeping it small for CE)
CREATE TABLE Numbers (
N smallint NOT NULL CONSTRAINT PK_Numbers PRIMARY KEY CLUSTERED
);
INSERT Numbers VALUES (1);
WHILE @@ROWCOUNT < 4096
INSERT Numbers SELECT N + (SELECT Max(N) FROM Numbers) FROM Numbers;
The SP:
CREATE PROCEDURE dbo.StringSplitRowset
@String varchar(8000)
AS
SELECT Substring(@String, l.StartPos, l.Chars) Item
FROM (
SELECT
S.StartPos,
IsNull(NullIf(CharIndex(',', @String, S.StartPos), 0) - S.StartPos, 8000)
FROM (
SELECT 1 UNION ALL
SELECT N.N + 1 FROM Numbers N WHERE Substring(@String, N.N, 1) = ','
) S (StartPos)
) L (StartPos, Chars);
And usage, easy as pie:
DECLARE @String varchar(8000);
SET @String = 'abc,def,ghi,jkl';
CREATE TABLE #Split (S varchar(8000));
INSERT #Split EXEC dbo.StringSplitRowset @String;
SELECT * FROM #Split;
Result:
abc
def
ghi
jkl
And finally, if you don't want to build a numbers table, you can use this SP. I think you will find that one of these two SPs performs well enough for you. There are other implementations of string splitting that could work as well.
ALTER PROCEDURE dbo.StringSplitRowset
@String varchar(8000)
AS
SELECT Substring(@String, l.StartPos, l.Chars) Item
FROM (
SELECT
S.StartPos,
IsNull(NullIf(CharIndex(',', @String, S.StartPos), 0) - S.StartPos, 8000)
FROM (
SELECT 1 UNION ALL
SELECT N.N + 1
FROM (
SELECT A.A * 4096 + B.B * 1024 + C.C * 256 + D.D * 64 + E.E * 16 + F.F * 4 + G.G N
FROM
(SELECT 0 UNION ALL SELECT 1) A (A),
(SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4) G (G),
(SELECT 0 UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3) F (F),
(SELECT 0 UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3) E (E),
(SELECT 0 UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3) D (D),
(SELECT 0 UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3) C (C),
(SELECT 0 UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3) B (B)
) N (N)
WHERE Substring(@String, N.N, 1) = ','
) S (StartPos)
) L (StartPos, Chars)
Any SQL writer serious about understanding some of the performance implications of splitting strings different ways ought to see Aaron Bertrand's blog post on splitting strings.
Also, any serious SQL Server database student ought to see Erland Sommarskog's How to Share Data between Stored Procedures.
Will SQL Server CE let you split with XML functions and use CROSS APPLY? If so, you could do something like this:
SELECT DISTINCT T1.id
FROM (
SELECT id, CAST(('<X>'+
REPLACE(REPLACE(urls,' ',''),',','</X><X>')+
'</X>'
) AS xml
) as URLsXML
FROM dbo.Widgets
) AS T1
CROSS APPLY(
SELECT N.value('.', 'varchar(50)') AS URLPattern
FROM URLsXML.nodes('X') AS S(N)
) AS T2
WHERE @Input LIKE T2.URLPattern
UPDATE: I just checked. It looks like SQL Server CE doesn't support the XML data type or CROSS APPLY.
I think you will have to populate another table with ID's and Patterns to join against.
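A minimal sketch of that join-table approach, with hypothetical table and column names. It is written for full SQL Server (SQL Server CE has no table variables, so there these would be permanent tables and @Input a query parameter), and it simply reverses the usual LIKE: the pattern comes from the row, and the input URL is the value being tested.

```sql
-- Hypothetical pattern table: one row per (widget, URL pattern) instead of a CSV field.
DECLARE @WidgetUrlPatterns TABLE (WidgetId int, UrlPattern nvarchar(200));
INSERT INTO @WidgetUrlPatterns VALUES (1, N'/'), (1, N'/news/%'), (1, N'/about/');

DECLARE @Input nvarchar(200) = N'/news/2/';

-- The stored value is the LIKE pattern; the input URL is matched against it.
SELECT DISTINCT p.WidgetId
FROM @WidgetUrlPatterns AS p
WHERE @Input LIKE p.UrlPattern;
```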

How can I pivot these key+values rows into a table of complete entries?

Maybe I demand too much from SQL but I feel like this should be possible. I start with a list of key-value pairs, like this:
'0:First, 1:Second, 2:Third, 3:Fourth'
etc. I can split this up pretty easily with a two-step parse that gets me a table like:
EntryNumber PairNumber Item
0 0 0
1 0 First
2 1 1
3 1 Second
etc.
Now, in the simple case of splitting the pairs into a pair of columns, it's fairly easy. I'm interested in the more advanced case where I might have multiple values per entry, like:
'0:First:Fishing, 1:Second:Camping, 2:Third:Hiking'
and such.
In that generic case, I'd like to find a way to take my 3-column result table and somehow pivot it to have one row per entry and one column per value-part.
So I want to turn this:
EntryNumber PairNumber Item
0 0 0
1 0 First
2 0 Fishing
3 1 1
4 1 Second
5 1 Camping
Into this:
Entry [1] [2] [3]
0 0 First Fishing
1 1 Second Camping
Is that just too much for SQL to handle, or is there a way? Pivots (even tricky dynamic pivots) seem like an answer, but I can't figure how to get that to work.
No, in SQL you can't infer columns dynamically based on the data found during the same query.
Even using the PIVOT feature in Microsoft SQL Server, you must know the columns when you write the query, and you have to hard-code them.
You have to do a lot of work to avoid storing the data in a relational normal form.
Alright, I found a way to accomplish what I was after. Strap in, this is going to get bumpy.
So the basic problem is to take a string with two kinds of delimiters: entries and values. Each entry represents a set of values, and I wanted to turn the string into a table with one column for each value per entry. I tried to make this a UDF, but the necessity for a temporary table and dynamic SQL meant it had to be a stored procedure.
CREATE PROCEDURE [dbo].[ParseValueList]
(
@parseString varchar(8000),
@itemDelimiter CHAR(1),
@valueDelimiter CHAR(1)
)
AS
BEGIN
SET NOCOUNT ON;
IF object_id('tempdb..#ParsedValues') IS NOT NULL
BEGIN
DROP TABLE #ParsedValues
END
CREATE TABLE #ParsedValues
(
EntryID int,
[Rank] int,
[Value] varchar(200)
)
So that's just basic set up, establishing the temp table to hold my intermediate results.
;WITH
E1(N) AS (SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1),--Brute forces 10 rows
E2(N) AS (SELECT 1 FROM E1 a, E1 b), --Uses a cross join to generate 100 rows (10 * 10)
E4(N) AS (SELECT 1 FROM E2 a, E2 b), --Uses a cross join to generate 10,000 rows (100 * 100)
cteTally(N) AS (SELECT ROW_NUMBER() OVER (ORDER BY N) FROM E4)
That beautiful piece of SQL comes from SQL Server Central's Forums and is credited to "a guru." It's a great little 10,000 line tally table perfect for string splitting.
INSERT INTO #ParsedValues
SELECT ItemNumber AS EntryID, ROW_NUMBER() OVER (PARTITION BY ItemNumber ORDER BY T1.N) AS [Rank],
SUBSTRING(Items.Item, T1.N, CHARINDEX(@valueDelimiter, Items.Item + @valueDelimiter, T1.N) - T1.N) AS [Value]
FROM(
SELECT ROW_NUMBER() OVER (ORDER BY T2.N) AS ItemNumber,
SUBSTRING(@parseString, T2.N, CHARINDEX(@itemDelimiter, @parseString + @itemDelimiter, T2.N) - T2.N) AS Item
FROM cteTally T2
WHERE T2.N < LEN(@parseString) + 2 --Ensures we cut out once the entire string is done
AND SUBSTRING(@itemDelimiter + @parseString, T2.N, 1) = @itemDelimiter
) AS Items, cteTally T1
WHERE T1.N < LEN(@parseString) + 2 --Ensures we cut out once the entire string is done
AND SUBSTRING(@valueDelimiter + Items.Item, T1.N, 1) = @valueDelimiter
OK, this is the first really dense, meaty part. The inner select breaks up my string along the item delimiter (the comma), using the guru's string-splitting method. That table is then passed up to the outer select, which does the same thing to each row, but this time using the value delimiter (the colon). The inner ROW_NUMBER (EntryID) and the outer ROW_NUMBER over a partition (Rank) are key to the pivot. EntryID shows which Item the values belong to, and Rank shows the ordinal of each value within its Item.
DECLARE @columns varchar(200)
DECLARE @columnNames varchar(2000)
DECLARE @query varchar(8000)
SELECT @columns = COALESCE(@columns + ',[' + CAST([Rank] AS varchar) + ']', '[' + CAST([Rank] AS varchar)+ ']'),
@columnNames = COALESCE(@columnNames + ',[' + CAST([Rank] AS varchar) + '] AS Value' + CAST([Rank] AS varchar)
, '[' + CAST([Rank] AS varchar)+ '] AS Value' + CAST([Rank] AS varchar))
FROM (SELECT DISTINCT [Rank] FROM #ParsedValues) AS Ranks
SET @query = '
SELECT '+ @columnNames +'
FROM #ParsedValues
PIVOT
(
MAX([Value]) FOR [Rank]
IN (' + @columns + ')
) AS pvt'
EXECUTE(@query)
DROP TABLE #ParsedValues
END
And at last, the dynamic sql that makes it possible. By getting a list of Distinct Ranks, we set up our column list. This is then written into the dynamic pivot which tilts the values over and slots each value into the proper column, each with a generic "Value#" heading.
Thus by calling EXEC ParseValueList with a properly formatted string of values, we can break it up into a table to feed into our purposes! It works (but is probably overkill) for simple key:value pairs, and scales up to a fair number of columns (About 50 at most, I think, but that'd be really silly.)
Anyway, hope that helps anyone having a similar issue.
(Yeah, it probably could have been done in something like SQLCLR as well, but I find a great joy in solving problems with pure SQL.)
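On SQL Server 2017 or later, the two column lists built in the procedure can also be produced with STRING_AGG, whose WITHIN GROUP clause guarantees the column order (the COALESCE concatenation trick does not guarantee any order), while QUOTENAME adds the brackets. A minimal sketch, with a hypothetical @Ranks table variable standing in for SELECT DISTINCT [Rank] FROM #ParsedValues:

```sql
-- SQL Server 2017+ sketch; @Ranks stands in for SELECT DISTINCT [Rank] FROM #ParsedValues.
DECLARE @Ranks TABLE ([Rank] int PRIMARY KEY);
INSERT INTO @Ranks VALUES (1), (2), (3);

DECLARE @columns nvarchar(max), @columnNames nvarchar(max);

SELECT @columns     = STRING_AGG(QUOTENAME(CAST([Rank] AS nvarchar(10))), N',')
                          WITHIN GROUP (ORDER BY [Rank]),
       @columnNames = STRING_AGG(QUOTENAME(CAST([Rank] AS nvarchar(10)))
                                 + N' AS Value' + CAST([Rank] AS nvarchar(10)), N',')
                          WITHIN GROUP (ORDER BY [Rank])
FROM @Ranks;

SELECT @columns AS ColumnList, @columnNames AS ColumnNameList;
-- [1],[2],[3]   and   [1] AS Value1,[2] AS Value2,[3] AS Value3
```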
Though probably not optimal, here's a more condensed solution.
DECLARE @DATA varchar(max);
SET @DATA = '0:First:Fishing, 1:Second:Camping, 2:Third:Hiking';
SELECT
DENSE_RANK() OVER (ORDER BY [Data].[row]) AS [Entry]
, [Data].[row].value('(./B/text())[1]', 'int') as "[1]"
, [Data].[row].value('(./B/text())[2]', 'varchar(64)') as "[2]"
, [Data].[row].value('(./B/text())[3]', 'varchar(64)') as "[3]"
FROM
(
SELECT
CONVERT(XML, '<A><B>' + REPLACE(REPLACE(@DATA , ',', '</B></A><A><B>'), ':', '</B><B>') + '</B></A>').query('.')
) AS [T]([c])
CROSS APPLY [T].[c].nodes('/A') AS [Data]([row]);
Hope it's not too late.
You can use the RANK function to get the position of each Item within its PairNumber, and then use PIVOT:
SELECT PairNumber, [1] ,[2] ,[3]
FROM
(
SELECT PairNumber, Item, RANK() OVER (PARTITION BY PairNumber order by EntryNumber) as RANKing
from tabla) T
PIVOT
(MAX(Item)
FOR RANKing in ([1],[2],[3])
)as PVT
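A self-contained run of this RANK + PIVOT answer, with a table variable (@tabla, standing in for the answer's tabla) loaded with the sample rows from the question:

```sql
-- @tabla stands in for the answer's "tabla"; rows are the question's sample data.
DECLARE @tabla TABLE (EntryNumber int, PairNumber int, Item varchar(50));
INSERT INTO @tabla VALUES
    (0, 0, '0'), (1, 0, 'First'), (2, 0, 'Fishing'),
    (3, 1, '1'), (4, 1, 'Second'), (5, 1, 'Camping');

SELECT PairNumber AS Entry, [1], [2], [3]
FROM (
    SELECT PairNumber, Item,
           RANK() OVER (PARTITION BY PairNumber ORDER BY EntryNumber) AS RANKing
    FROM @tabla
) AS T
PIVOT (MAX(Item) FOR RANKing IN ([1], [2], [3])) AS PVT
ORDER BY Entry;
-- Entry 0: 0, First, Fishing
-- Entry 1: 1, Second, Camping
```

The column list still has to be written out by hand; the dynamic SQL in the stored-procedure answer is what removes that restriction.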