how to modify t-sql to process multiple records not just one. - sql

I am working on a function to remove/ replace special characters from a string from a column named "Title". Currently I am testing the code for one record at a time. I would like to test the code against all the records in the table, but I do not know how to modify the current t-sql to process all the records rather than just one at a time. I would appreciate if someone could show me how, or what type of modifications I need to do to be able to process all records.
This is the code as I have it right now:
DECLARE #str VARCHAR(400);
DECLARE #expres VARCHAR(50) = '%[~,#,#,$,%,&,*,(,),.,!,´,:]%'
SET #str = (SELECT REPLACE(REPLACE(LOWER([a].[Title]), CHAR(9), ''), ' ', '_') FROM [dbo].[a] WHERE [a].[ID] = '43948')
WHILE PATINDEX(#expres, #str) > 0
SET #str = REPLACE(REPLACE(#str, SUBSTRING(#str, PATINDEX(#expres, #str), 1), ''), '-', ' ')
SELECT #str COLLATE SQL_Latin1_General_CP1251_CS_AS
For a Title containing the value: Schöne Wiege Meiner Leiden, the output after the code is applied would be: schone_wiege_meiner_leiden
I would like to make the code work to process multiple records rather that one like is done currently by specifying the ID. I want to process a bulks of records.
I hope I can get some help, thank you in advance for your help.
Code example taken from: remove special characters from string in sql server

There is no need for a loop here. You can instead use a tally table and this can become a set based inline table valued function quite easily. Performance wise it will blow the doors off a loop based scalar function.
I keep a tally table as a view in my system. Here is the code for the tally table.
create View [dbo].[cteTally] as
WITH
E1(N) AS (select 1 from (values (1),(1),(1),(1),(1),(1),(1),(1),(1),(1))dt(n)),
E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows
E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max
cteTally(N) AS
(
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4
)
select N from cteTally
Now comes the fun part, using this to parse strings and all kinds of various things. It has been dubbed the swiss army knife of t-sql. Anytime you start thinking loop, try to think about using a tally table instead. Here is how this function might look.
create function RemoveValuesFromString
(
#SearchVal nvarchar(max)
, #CharsToRemove nvarchar(max)
) returns table as
RETURN
with MyValues as
(
select substring(#SearchVal, N, 1) as MyChar
, t.N
from cteTally t
where N <= len(#SearchVal)
and charindex(substring(#SearchVal, N, 1), #CharsToRemove) = 0
)
select distinct MyResult = STUFF((select MyChar + ''
from MyValues mv2
order by mv2.N
FOR XML PATH(''),TYPE).value('.','NVARCHAR(MAX)'), 1, 0, '')
from MyValues mv
;
Here is an example of how you might be able to use this. I am using a table variable here but this could be any table or whatever.
declare #SomeTable table
(
SomeTableID int identity primary key clustered
, SomeString varchar(max)
)
insert #SomeTable
select 'This coffee cost $32.!!! This is a# tot$#a%l r)*i-p~!`of^%f' union all
select 'This &that'
select *
from #SomeTable st
cross apply dbo.RemoveValuesFromString(st.SomeString, '%[~,##$%&*()!´:]%`^-') x

Related

How to remove trailing spaces like characters in SQL Server

I have written a simple select query to select a single row from a table using a field named “Name”. The Names are sequential and presented as ‘RM001’, ‘RM002’, ‘RM003’…. This issue was that it didn’t pick up ‘RM004’ with the following query
-- Trim Name Field
UPDATE [dbo].[RoutineMaintenanceTask] SET = LTRIM(RTRIM([dbo].[RoutineMaintenanceTask].Name));
-- Select the record
SELECT *
FROM [dbo].[RoutineMaintenanceTask]
WHERE Name = 'RM004'
When I was checking the length of the value using the following query, it showed me the length as 7
-- Check the length
select (Name), len(Name) AS TextLength
from [dbo].[RoutineMaintenanceTask]
where Name = 'RM004'
It is obvious that this name contains some characters before or after, but it is not a space.
Not only that, I examined the value through Visual Studio debugger and did not notice anything unusual.
Nevertheless, when I copy the value of the “Name” from SQL results pane and copy it to notepad++, with special characters on, I was able to see this.
Ultimately, I was able to fix this the issue by adding following code before the select statement
-- Remove the tail
UPDATE [dbo].[RoutineMaintenanceTask] SET Name = substring(Name,1,5);
I just need to know how I get to know what are the hidden characters in a case like this and how to eliminate it without using substring (Because in this case, it was easy because I knew the length).
PS- I understand that using the keyword of ‘name’ as a field of a table is not a good practise, but in this context there is nothing to do with that.
It was likely either char(9), char(10), or char(13) (tab,lf,cr; respectively).
You can read up on them here: https://learn.microsoft.com/en-us/sql/t-sql/functions/char-transact-sql?view=sql-server-2017
You can remove them using REPLACE().
Such as:
DECLARE #VARIABLE VARCHAR(10)
SET #VARIABLE='RM004'+CHAR(10)+CHAR(10)
SELECT #VARIABLE, LEN(#VARIABLE)
SET #VARIABLE = REPLACE(#VARIABLE, CHAR(9),'')
SET #VARIABLE = REPLACE(#VARIABLE, CHAR(10),'')
SET #VARIABLE = REPLACE(#VARIABLE, CHAR(13),'')
SELECT #VARIABLE, LEN(#VARIABLE)
DECLARE #string VARCHAR(8000) = 'RM004
';
WITH
cte_n1 (n) AS (SELECT 1 FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) n (n)),
cte_n2 (n) AS (SELECT 1 FROM cte_n1 a CROSS JOIN cte_n1 b),
cte_Tally (n) AS (
SELECT TOP (DATALENGTH(#string))
ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM
cte_n2 a CROSS JOIN cte_n2 b
)
SELECT
position = t.n,
char_value = SUBSTRING(#string, t.n, 1),
ascii_value = ASCII(SUBSTRING(#string, t.n, 1))
FROM
cte_Tally t;

advice on improving query performance ~ 1 day

I have 5 tables - Each have tens of thousands of records
1 main/very important table (TABLE A)
2 other tables (TABLES B/C) that still important but not as important as table
2 side tables (TABLES D/E)that hold primary keys between A<=>B and A<=>C i.e. only have two columns each
The 3 main tables have ~140 columns each, all have the same column names
The purpose of my query is to perform column level matching between all the tables A<=>D<=>B and A<=>E<=>C in one query
The final query will have about 286 columns (two ID columns from each main table,
select tableA.ID1 as [TABLEAID1],
tableA.ID2 as [TABLEAID2],
tableB.ID1 as [TABLEBID1],
tableB.ID2 as [TABLEBID2],
tableC.ID1 as [TABLECID1],
tableC.ID2 as [TABLECID2],
fn_TESTMatcher(tableA.[postCode], tableB.[postCode],) as
[TABLEAB.postCode.RESULT],
fn_TESTMatcher(tableA.[CityCode], tableB.[CityCode],) as
[TABLEAB.CityCode.RESULT],
.
.
. x238 more 'fn_TESTMatcher(...) as xyz' columns
.
INTO #Results
From tableA WITH (NOLOCK)
FULL JOIN tableD WITH (NOLOCK) ON tableA.ID1 = tableD.A
) FULL JOIN tableB WITH (NOLOCK) ON tableD.B = tableB.ID1
) FULL JOIN tableE WITH (NOLOCK) ON tableA.ID1 = tableE.A
) FULL JOIN tableC WITH (NOLOCK) ON tableE.B = tableC.ID
fn_TESTMatcher is a function, it is fed the same column from two main tables, then it removes/replaces special characters/abbreviations, and then tries to match them, if they match it returns a bit '1', if not then a bit '0'.
at the moment it takes about a day to run (i can't really time it with some sort of query timer), I can comment out all the columns except for the last and run it and its fairly quick, but i dont think i can just scale that up
Does anyone have some advice? My first assumption is to start googling on what indexes are and ...maybe.. apply it to the ID1 of every table although I'm a bit hesitant on a) messing up my tables and b) adding an index that ends up being useless
===========================================
update 2: table structure wise all the columns for all the main tables are varchars, length 100-250 characters, where ID (primary key) is not nullable
With the two side tables, they just have two columns, both varchar, 100 character limit (they're both foreign keys). The most important table's ID in this is not nullable
for functions, i technically have two:
FUNCTION [dbo].[fn_TESTStripCharacters]
(
#String NVARCHAR(MAX) ,
#MatchExpression VARCHAR(255)
)
RETURNS NVARCHAR(MAX)
AS
BEGIN
DECLARE #expres VARCHAR(50) = '%[~,#,#,^,_,+,-,$,%,&,/,|,\,*,(,),.,!,`,:,<,>,?]%'
WHILE PATINDEX( #expres, #String ) > 0
SET #String = REPLACE(REPLACE(REPLACE( #String, SUBSTRING( #String, PATINDEX( #expres, #String ), 1 ),''),';',''),'-','')
RETURN #String
END
and second function
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
ALTER FUNCTION [dbo].[fn_TESTMatcher](#Field1 NVARCHAR(MAX), #Field2
NVARCHAR(MAX))
RETURNS BIT
BEGIN
SET #Field1 = UPPER(LTRIM(RTRIM(REPLACE(dbo.fn_TESTStripCharacters(#Field1,#SpecialCharacters),'-',''))))
SET #Field2 = UPPER(LTRIM(RTRIM(REPLACE(dbo.fn_TESTStripCharacters(#Field2,#SpecialCharacters),'-',''))))
SET #Field1 = REPLACE(#Field1,' RD ',' ROAD ')
SET #Field2 = REPLACE(#Field2,' RD ',' ROAD ')
SET #Field1 = REPLACE(#Field1,' ST ',' STREET ')
SET #Field2 = REPLACE(#Field2,' ST ',' STREET ')
SET #Field1 = REPLACE(#Field1,' ','')
SET #Field2 = REPLACE(#Field2,' ','')
RETURN
CASE WHEN #Field1=#Field2
THEN '1'
ELSE '0'
END
END
=============================
update 2
Example table data - assuming the same two records exist in all 3 tables (not always the case )
TableA (main + most important table):
ID1 ID2 postCode, cityCode, ................
10001 1221 IG11PJ London ................
10230 1022 IG22PJ Nottingham ................
tableB (slightly less important table)
ID1 ID2 postCode, cityCode, ................
10031 1011 IG1 1PJ london ................
10980 982 IG2 2PJ nottingham ................
tableC (slightly less important table)
ID1 ID2 postCode, cityCode, ................
10551 1011 iG1 1pj london ................
20980 982 iG2 2pJ nottingham ................
tableD (side table)
A B
10001 10031
10230 10980
table E (side table)
A B
10001 10551
10230 20980
If Tables A, B, and C should be identical save for formatting differences I would suggest you create 3 CTEs, the first selecting TableA ID and a HASHBYTES of all other columns (columns will need to be cast to char/varchar so any formatting and replacing can take place there), the second CTE the same for Table B and the third for Table C.
Then just match the HASHBYTES values.
As has already been said though, without sample data, table structures, DDL for the function etc. we are just guessing.
Sean and Milney both make very good points regarding scalar vs inline table function and use of NOLOCK
I see these as a task that does not belong in one query. I would create a new set of tables (or these tables is you have a backup/don't need to preserve the data) and then perform your data cleaning steps into those new tables.
Once you have happy the data has been normalized then do a single query to compare the tables.
Trying to put it all in one query gives no advantage and you don't step wise make progress. For example, if you find you forgot to strip spaces out of once field you have to redo EVERYTHING. If you make new tables with the "cleaned" data you can incrementally invest time on cleaning the data (which is clearly the slow part of this process) till the data is perfect and then run your quick comparison. Forgot something -- it is a relatively quick update and run.
Instead of all the headaches you are running through, making copies of everything, and then trying to parse based on functions that can not be optimized, I would suggest the following. You state you have a column that gets stripped of special characters. I would add a "CleanKey" column for each table and represented column. Then, via a possible table trigger, or before the add/save, pre-clean that value into the "CleanKey" column and you are done. Then have an index on THOSE "Clean" columns and do a direct join.
Since the rest of the system does not know of these "Clean" columns, you can add the columns, clean them out of whatever function you have and not worry about duplicating or otherwise ruining other data.
Yes, it may take a bit to pre-"Clean" these columns, but then its DONE. Your query should be fast after that.
I would agree with the others that cleaning these string values would be a good idea. But since you still need to accomplish that and I absolutely hate loops and scalar functions with a passion I decided to roll up an inline table valued function instead of these two nested scalar functions. I am not using any loops here and the performance might surprise you.
I am using a tally or numbers table for this. I like to keep one of these around as view. Here is the code for the view I use.
create View [dbo].[cteTally] as
WITH
E1(N) AS (select 1 from (values (1),(1),(1),(1),(1),(1),(1),(1),(1),(1))dt(n)),
E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows
E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max
cteTally(N) AS
(
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4
)
select N from cteTally
GO
Then you can use this tally table to derive a set based approach to accommodate the business rules you have for deciding if two values match. You also don't need comma delimiters in here. In your example you had a comma nearly every other character in the list of values to remove. A single instance of each character is sufficient.
create function [dbo].[fn_TESTMatcher_Sean]
(
#Field1 nvarchar(max)
, #Field2 nvarchar(max)
, #CharsToRemove nvarchar(max)
) returns table as
RETURN
with MyValues1 as
(
select substring(#Field1, N, 1) as MyChar
, t.N
from cteTally t
where N <= len(#Field1)
and charindex(substring(#Field1, N, 1), #CharsToRemove) = 0
)
, MyValues2 as
(
select substring(#Field2, N, 1) as MyChar
, t.N
from cteTally t
where N <= len(#Field2)
and charindex(substring(#Field2, N, 1), #CharsToRemove) = 0
)
select convert(bit, case when mv1.MyResult = mv2.MyResult then 1 else 0 end) as IsMatch
from
(
select distinct MyResult =
replace(
replace(replace(STUFF((select MyChar + ''
from MyValues1 mv2
order by mv2.N
FOR XML PATH(''),TYPE).value('.','NVARCHAR(MAX)'), 1, 0, '')
, ' RD ', ' ROAD ')
, ' ST ', ' STREET ')
, ' ', '')
from MyValues1 mv
) mv1
cross join
(
select distinct MyResult =
replace(
replace(replace(STUFF((select MyChar + ''
from MyValues2 mv2
order by mv2.N
FOR XML PATH(''),TYPE).value('.','NVARCHAR(MAX)'), 1, 0, '')
, ' RD ', ' ROAD ')
, ' ST ', ' STREET ')
, ' ', '')
from MyValues2 mv
) mv2
;
Give this a shot and let me know if this works in your environment.
For example:
select *
from fn_TESTMatcher_Sean('123 any st rd or something', '123 any street road or something', '%[~,##^_+-$%&/|\*().!`:<>?]%')
The above returns 1 because they are a match under the rules you defined.

SQL Server 2012 T-SQL count number of words between elements of two sets

I have two sets of elements, let's say they are these words:
set 1: "nuclear", "fission", "dirty" and
set 2: "device", "explosive"
In my database, I have a text column (Description) which contains a sentence or two. I would like to find any records where Description contains both an element from set 1 followed by an element from set 2, where the two elements are separated by four words or less. For simplicity, counting (spaces-1) will count words between the two elements.
I'd prefer it if a solution didn't require the installation of anything like CLR functions for regular expression. Rather, if this could be done with a user-defined table function, it would make deployment simpler.
Does this sound possible?
It is possible, but i do not think it will preform well with millions of rows.
I have a solution here that handles about 10 000 rows in 2 sec and 100 000 rows in about 20 sec on our server. It also requires the famous DelimitedSplit8K sql table function from SQLServerCentral:
DECLARE #set1 VARCHAR(MAX) = 'nuclear, fission, dirty';
DECLARE #set2 VARCHAR(MAX) = 'device, explosive';
WITH GetDistances AS
(
SELECT
DubID = ROW_NUMBER() OVER (PARTITION BY ID ORDER BY ID)
, Distance = dbo.[cf_ValueSetDistance](s.Description, #set1, #set2)
, s.ID
,s.Description
FROM #sentences s
JOIN dbo.cf_DelimitedSplit8K(#set1, ',') s1 ON s.Description LIKE '%' + RTRIM(LTRIM(s1.Item)) + '%'
JOIN dbo.cf_DelimitedSplit8K(#set2, ',') s2 ON s.Description LIKE '%' + RTRIM(LTRIM(s2.Item)) + '%'
) SELECT Distance, ID, Description FROM GetDistances WHERE DubID = 1 AND Distance BETWEEN 1 AND 4;
--10 000 rows: 2sec
--100 000 rows: 20sec
Test data generator
--DROP TABLE #sentences
CREATE TABLE #sentences
(
ID INT IDENTITY(1,1) PRIMARY KEY
, Description VARCHAR(100)
);
GO
--CREATE 10000 random sentences that are 100 chars long
SET NOCOUNT ON;
WHILE((SELECT COUNT(*) FROM #sentences) < 10000)
BEGIN
DECLARE #randomWord VARCHAR(100) = '';
SELECT TOP 100 #randomWord = #randomWord + ' ' + Item FROM dbo.cf_DelimitedSplit8K('nuclear fission dirty device explosive On the other hand, we denounce with righteous indignation and dislike men who are so beguiled and demoralized by the charms of pleasure of the moment, so blinded by desire, that they cannot foresee the pain and trouble that are bound to ensue; and equal blame belongs to those who fail in their duty through weakness of will, which is the same as saying through shrinking from toil and pain. These cases are perfectly simple and easy to distinguish. In a free hour, when our power of choice is untrammelled and when nothing prevents our being able to do what we like best, every pleasure is to be welcomed and every pain avoided. But in certain circumstances and owing to the claims of duty or the obligations of business it will frequently occur that pleasures have to be repudiated and annoyances accepted. The wise man therefore always holds in these matters to this principle of selection: he rejects pleasures to secure other greater pleasures, or else he endures pains to avoid worse pains', ' ') ORDER BY NEWID();
INSERT INTO #sentences
SELECT #randomWord
END
SET NOCOUNT OFF;
Function 1 - cf_ValueSetDistance
CREATE FUNCTION [dbo].[cf_ValueSetDistance]
(
#value VARCHAR(MAX)
, #compareSet1 VARCHAR(MAX)
, #compareSet2 VARCHAR(MAX)
)
RETURNS INT
AS
BEGIN
SET #value = REPLACE(REPLACE(REPLACE(#value, '.', ''), ',', ''), '?', '');
DECLARE #distance INT;
DECLARE #sentence TABLE( WordIndex INT, Word VARCHAR(MAX) );
DECLARE #set1 TABLE(Word VARCHAR(MAX) );
DECLARE #set2 TABLE(Word VARCHAR(MAX) );
INSERT INTO #sentence
SELECT ItemNumber, RTRIM(LTRIM(Item)) FROM dbo.cf_DelimitedSplit8K(#value, ' ')
INSERT INTO #set1
SELECT RTRIM(LTRIM(Item)) FROM dbo.cf_DelimitedSplit8K(#compareSet1, ',')
IF(EXISTS(SELECT 1 FROM #sentence s JOIN #set1 s1 ON s.Word = s1.Word))
BEGIN
INSERT INTO #set2
SELECT RTRIM(LTRIM(Item)) FROM dbo.cf_DelimitedSplit8K(#compareSet2, ',');
IF(EXISTS(SELECT 1 FROM #sentence s JOIN #set2 s2 ON s.Word = s2.Word))
BEGIN
WITH Set1 AS (
SELECT s.WordIndex, s.Word FROM #sentence s
JOIN #set1 s1 ON s1.Word = s.Word
), Set2 AS
(
SELECT s.WordIndex, s.Word FROM #sentence s
JOIN #set2 s2 ON s2.Word = s.Word
)
SELECT #distance = MIN(ABS(s2.WordIndex - s1.WordIndex)) FROM Set1 s1, Set2 s2
END
END
RETURN #distance;
END
Function 2 - DelimitedSplit8K
(No need to even try to understand this code, this is an extremely fast function for splitting a string to a table, written by several talented people):
CREATE FUNCTION [dbo].[cf_DelimitedSplit8K]
(#pString VARCHAR(8000), #pDelimiter CHAR(1))
RETURNS TABLE WITH SCHEMABINDING AS
RETURN
--===== "Inline" CTE Driven "Tally Table" produces values from 0 up to 10,000...
-- enough to cover NVARCHAR(4000)
WITH E1(N) AS (
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
), --10E+1 or 10 rows
E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows
E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max
cteTally(N) AS (--==== This provides the "base" CTE and limits the number of rows right up front
-- for both a performance gain and prevention of accidental "overruns"
SELECT TOP (ISNULL(DATALENGTH(#pString),0)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4
),
cteStart(N1) AS (--==== This returns N+1 (starting position of each "element" just once for each delimiter)
SELECT 1 UNION ALL
SELECT t.N+1 FROM cteTally t WHERE SUBSTRING(#pString,t.N,1) = #pDelimiter
),
cteLen(N1,L1) AS(--==== Return start and length (for use in substring)
SELECT s.N1,
ISNULL(NULLIF(CHARINDEX(#pDelimiter,#pString,s.N1),0)-s.N1,8000)
FROM cteStart s
)
--===== Do the actual split. The ISNULL/NULLIF combo handles the length for the final element when no delimiter is found.
SELECT ItemNumber = ROW_NUMBER() OVER(ORDER BY l.N1),
Item = SUBSTRING(#pString, l.N1, l.L1)
FROM cteLen l;
I dont know anything about performance, but this could be done with cross apply and two temporary tables.
--initialize word set data
DECLARE #set1 TABLE (wordFromSet varchar(n))
DECLARE #set2 TABLE (wordFromSet varchar(n))
INSERT INTO #set1 SELECT 'nuclear' UNION SELECT 'fission' UNION SELECT 'dirty'
INSERT INTO #set2 SELECT 'device' UNION SELECT 'explosive'
SELECT *
FROM MyTable m
CROSS APPLY
(
SELECT wordFromSet
,LEN(SUBSTRING(m.Description, 1, CHARINDEX(wordFromSet, m.Description))) - LEN(REPLACE(SUBSTRING(m.Description, 1, CHARINDEX(wordFromSet, m.Description)),' ', '')) AS WordPosition
FROM #set1
WHERE m.Description LIKE '%' + wordFromSet + '%'
) w1
CROSS APPLY
(
SELECT wordFromSet
,LEN(SUBSTRING(m.Description, 1, CHARINDEX(wordFromSet, m.Description))) - LEN(REPLACE(SUBSTRING(m.Description, 1, CHARINDEX(wordFromSet, m.Description)),' ', '')) AS WordPosition
FROM #set2
WHERE m.Description LIKE '%' + wordFromSet + '%'
) w2
WHERE w2.WordPosition - w1.WordPosition <= treshold
Essentially it will only return rows from MyTable that have at least a word from both sets, and for these rows it will calculate which word position it holds by calculating the difference in length between the substring that ends at the words position and the same substring with spaces removed.
I am adding a new answer, even if my old one has been accepted and I can see you went for the "FULL TEXT INDEX".
I have looked at the answer #Louis gave, and I think it was clever to use "CROSS APPLY". His answer beats the performance of mine. The only problem is that his code will only compare from the first instance of a word. This made me want to try to combine his answer with the split function I used (DelimitedSplit8K from SQLServerCentral).
This results in a remarkable performance boost, I have tested this on 1 million rows, and the result was almost instant:
My old answer: 5min
#Louis answer: 2min
New answer: 3sec
This do not beet the "FULLTEXT INDEX" performance wise, but it at least supports the word search combination specification you provided in a relatively effective way.
DECLARE #set1 TABLE (Word VARCHAR(50))
DECLARE #set2 TABLE (Word VARCHAR(50))
INSERT INTO #set1 SELECT 'nuclear' UNION SELECT 'fission' UNION SELECT 'dirty'
INSERT INTO #set2 SELECT 'device'UNION SELECT 'explosive'
SELECT * FROM #sentences s
CROSS APPLY
(
SELECT * FROM #set1 s1
JOIN dbo.cf_DelimitedSplit8K(s.Description, ' ') split ON split.Item = s1.Word
) s1
CROSS APPLY
(
SELECT * FROM #set2 s2
JOIN dbo.cf_DelimitedSplit8K(s.Description, ' ') split ON split.Item = s2.Word
) s2
WHERE ABS(s1.ItemNumber - s2.ItemNumber) <= 4;
Look at my old answer for the code for the dbo.cf_COM_DelimitedSplit8K function.

Print bullet before each sentence + new line after each sentence SQL

I have a text like: Sentence one. Sentence two. Sentence three.
I want it to be:
Sentence one.
Sentence two.
Sentence three.
I assume I can replace '.' with '.' + char(10) + char(13), but how can I go about bullets? '•' character works fine if printed manually I just do not know how to bullet every sentence including the first.
-- Initial string
declare #text varchar(100)
set #text = 'Sentence one. Sentence two. Sentence three.'
-- Setting up replacement text - new lines (assuming this works) and bullets ( char(149) )
declare #replacement varchar(100)
set #replacement = '.' + char(10) + char(13) + char(149)
-- Adding a bullet at the beginning and doing the replacement, but this will also add a trailing bullet
declare #processedText varchar(100)
set #processedText = char(149) + ' ' + replace(#text, '.', #replacement)
-- Figure out length of substring to select in the next step
declare #substringLength int
set #substringLength = LEN(#processedText) - CHARINDEX(char(149), REVERSE(#processedText))
-- Removes trailing bullet
select substring(#processedText, 0, #substringLength)
I've tested here - https://data.stackexchange.com/stackoverflow/qt/119364/
I should point out that doing this in T-SQL doesn't seem correct. T-SQL is meant to process data; any presentation-specific work should be done in the code that calls this T-SQL (C# or whatever you're using).
Here's my over-the-top approach but I feel it's a fairly solid approach. It combines classic SQL problem solving techniques of Number tables for string slitting and use of the FOR XML for concatenating the split lines back together. The code is long but the only place you'd need to actually edit is the SOURCE_DATA section.
No knock on #Jeremy Wiggins approach, but I prefer mine as it lends itself well to a set based approach in addition to being fairly efficient code.
-- This code will rip lines apart based on #delimiter
-- and put them back together based on #rebind
DECLARE
#delimiter char(1)
, #rebind varchar(10);
SELECT
#delimiter = '.'
, #rebind = char(10) + char(149) + ' ';
;
-- L0 to L5 simulate a numbers table
-- http://billfellows.blogspot.com/2009/11/fast-number-generator.html
WITH L0 AS
(
SELECT
0 AS C
UNION ALL
SELECT
0
)
, L1 AS
(
SELECT
0 AS c
FROM
L0 AS A
CROSS JOIN L0 AS B
)
, L2 AS
(
SELECT
0 AS c
FROM
L1 AS A
CROSS JOIN L1 AS B
)
, L3 AS
(
SELECT
0 AS c
FROM
L2 AS A
CROSS JOIN L2 AS B
)
, L4 AS
(
SELECT
0 AS c
FROM
L3 AS A
CROSS JOIN L3 AS B
)
, L5 AS
(
SELECT
0 AS c
FROM
L4 AS A
CROSS JOIN L4 AS B
)
, NUMS AS
(
SELECT
ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS number
FROM
L5
)
, SOURCE_DATA (ID, content) AS
(
-- This query simulates your input data
SELECT 1, 'Sentence one. Sentence two. Sentence three.'
UNION ALL SELECT 7, 'In seed time learn, in harvest teach, in winter enjoy.Drive your cart and your plow over the bones of the dead.The road of excess leads to the palace of wisdom.Prudence is a rich, ugly old maid courted by Incapacity.He who desires but acts not, breeds pestilence.'
)
, MAX_LENGTH AS
(
-- this query is rather important. The current NUMS query generates a
-- very large set of numbers but we only need 1 to maximum lenth of our
-- source data. We can take advantage of a 2008 feature of letting
-- TOP take a dynamic value
SELECT TOP (SELECT MAX(LEN(SD.content)) AS max_length FROM SOURCE_DATA SD)
N.number
FROM
NUMS N
)
, MULTI_LINES AS
(
-- This query will make many lines out a single line based on the supplied delimiter
-- Need to retain the ID (or some unique value from original data to regroup it
-- http://www.sommarskog.se/arrays-in-sql-2005.html#tblnum
SELECT
SD.ID
, LTRIM(substring(SD.content, Number, charindex(#delimiter, SD.content + #delimiter, Number) - Number)) + #delimiter AS lines
FROM
MAX_LENGTH
CROSS APPLY
SOURCE_DATA SD
WHERE
Number <= len(SD.content)
AND substring(#delimiter + SD.content, Number, 1) = #delimiter
)
, RECONSITITUE (content, ID) AS
(
-- use classic concatenation to put it all back together
-- using CR/LF * (space) as delimiter
-- as a correlated sub query and joined back to our original table to preserve IDs
-- https://stackoverflow.com/questions/5196371/sql-query-concatenating-results-into-one-string
SELECT DISTINCT
STUFF
(
(
SELECT #rebind + M.lines
FROM MULTI_LINES M
WHERE M.ID = ML.ID
FOR XML PATH('')
)
, 1
, 1
, '')
, ML.ID
FROM
MULTI_LINES ML
)
SELECT
R.content
, R.ID
FROM
RECONSITITUE R
Results
content ID
----------------------------------------------------------- ---
• In seed time learn, in harvest teach, in winter enjoy.
• Drive your cart and your plow over the bones of the dead.
• The road of excess leads to the palace of wisdom.
• Prudence is a rich, ugly old maid courted by Incapacity.
• He who desires but acts not, breeds pestilence. 7
• Sentence one.
• Sentence two.
• Sentence three. 1
(2 row(s) affected)
References
Number table
Splitting strings via number table
SQL Query - Concatenating Results into One String
select '• On '+ cast(getdate() as varchar)+' I discovered how to do this '
Sample

How can I pivot these key+values rows into a table of complete entries?

Maybe I demand too much from SQL but I feel like this should be possible. I start with a list of key-value pairs, like this:
'0:First, 1:Second, 2:Third, 3:Fourth'
etc. I can split this up pretty easily with a two-step parse that gets me a table like:
EntryNumber PairNumber Item
0 0 0
1 0 First
2 1 1
3 1 Second
etc.
Now, in the simple case of splitting the pairs into a pair of columns, it's fairly easy. I'm interested in the more advanced case where I might have multiple values per entry, like:
'0:First:Fishing, 1:Second:Camping, 2:Third:Hiking'
and such.
In that generic case, I'd like to find a way to take my 3-column result table and somehow pivot it to have one row per entry and one column per value-part.
So I want to turn this:
EntryNumber PairNumber Item
0 0 0
1 0 First
2 0 Fishing
3 1 1
4 1 Second
5 1 Camping
Into this:
Entry [1] [2] [3]
0 0 First Fishing
1 1 Second Camping
Is that just too much for SQL to handle, or is there a way? Pivots (even tricky dynamic pivots) seem like an answer, but I can't figure how to get that to work.
No, in SQL you can't infer columns dynamically based on the data found during the same query.
Even using the PIVOT feature in Microsoft SQL Server, you must know the columns when you write the query, and you have to hard-code them.
You have to do a lot of work to avoid storing the data in a relational normal form.
Alright, I found a way to accomplish what I was after. Strap in, this is going to get bumpy.
So the basic problem is to take a string with two kinds of delimiters: entries and values. Each entry represents a set of values, and I wanted to turn the string into a table with one column for each value per entry. I tried to make this a UDF, but the necessity for a temporary table and dynamic SQL meant it had to be a stored procedure.
CREATE PROCEDURE [dbo].[ParseValueList]
(
#parseString varchar(8000),
#itemDelimiter CHAR(1),
#valueDelimiter CHAR(1)
)
AS
BEGIN
SET NOCOUNT ON;
IF object_id('tempdb..#ParsedValues') IS NOT NULL
BEGIN
DROP TABLE #ParsedValues
END
CREATE TABLE #ParsedValues
(
EntryID int,
[Rank] int,
Pair varchar(200)
)
So that's just basic set up, establishing the temp table to hold my intermediate results.
;WITH
E1(N) AS (SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1),--Brute forces 10 rows
E2(N) AS (SELECT 1 FROM E1 a, E1 b), --Uses a cross join to generate 100 rows (10 * 10)
E4(N) AS (SELECT 1 FROM E2 a, E2 b), --Uses a cross join to generate 10,000 rows (100 * 100)
cteTally(N) AS (SELECT ROW_NUMBER() OVER (ORDER BY N) FROM E4)
That beautiful piece of SQL comes from SQL Server Central's Forums and is credited to "a guru." It's a great little 10,000 line tally table perfect for string splitting.
INSERT INTO #ParsedValues
SELECT ItemNumber AS EntryID, ROW_NUMBER() OVER (PARTITION BY ItemNumber ORDER BY ItemNumber) AS [Rank],
SUBSTRING(Items.Item, T1.N, CHARINDEX(#valueDelimiter, Items.Item + #valueDelimiter, T1.N) - T1.N) AS [Value]
FROM(
SELECT ROW_NUMBER() OVER (ORDER BY T2.N) AS ItemNumber,
SUBSTRING(#parseString, T2.N, CHARINDEX(#itemDelimiter, #parseString + #itemDelimiter, T2.N) - T2.N) AS Item
FROM cteTally T2
WHERE T2.N < LEN(#parseString) + 2 --Ensures we cut out once the entire string is done
AND SUBSTRING(#itemDelimiter + #parseString, T2.N, 1) = #itemDelimiter
) AS Items, cteTally T1
WHERE T1.N < LEN(#parseString) + 2 --Ensures we cut out once the entire string is done
AND SUBSTRING(#valueDelimiter + Items.Item, T1.N, 1) = #valueDelimiter
Ok, this is the first really dense meaty part. The inner select is breaking up my string along the item delimiter (the comma), using the guru's string splitting method. Then that table is passed up to the outer select which does the same thing, but this time using the value delimiter (the colon) to each row. The inner RowNumber (EntryID) and the outer RowNumber over Partition (Rank) are key to the pivot. EntryID show which Item the values belong to, and Rank shows the ordinal of the values.
DECLARE #columns varchar(200)
DECLARE #columnNames varchar(2000)
DECLARE #query varchar(8000)
SELECT #columns = COALESCE(#columns + ',[' + CAST([Rank] AS varchar) + ']', '[' + CAST([Rank] AS varchar)+ ']'),
#columnNames = COALESCE(#columnNames + ',[' + CAST([Rank] AS varchar) + '] AS Value' + CAST([Rank] AS varchar)
, '[' + CAST([Rank] AS varchar)+ '] AS Value' + CAST([Rank] AS varchar))
FROM (SELECT DISTINCT [Rank] FROM #ParsedValues) AS Ranks
SET #query = '
SELECT '+ #columnNames +'
FROM #ParsedValues
PIVOT
(
MAX([Value]) FOR [Rank]
IN (' + #columns + ')
) AS pvt'
EXECUTE(#query)
DROP TABLE #ParsedValues
END
And at last, the dynamic sql that makes it possible. By getting a list of Distinct Ranks, we set up our column list. This is then written into the dynamic pivot which tilts the values over and slots each value into the proper column, each with a generic "Value#" heading.
Thus by calling EXEC ParseValueList with a properly formatted string of values, we can break it up into a table to feed into our purposes! It works (but is probably overkill) for simple key:value pairs, and scales up to a fair number of columns (About 50 at most, I think, but that'd be really silly.)
Anyway, hope that helps anyone having a similar issue.
(Yeah, it probably could have been done in something like SQLCLR as well, but I find a great joy in solving problems with pure SQL.)
Though probably not optimal, here's a more condensed solution.
DECLARE #DATA varchar(max);
SET #DATA = '0:First:Fishing, 1:Second:Camping, 2:Third:Hiking';
SELECT
DENSE_RANK() OVER (ORDER BY [Data].[row]) AS [Entry]
, [Data].[row].value('(./B/text())[1]', 'int') as "[1]"
, [Data].[row].value('(./B/text())[2]', 'varchar(64)') as "[2]"
, [Data].[row].value('(./B/text())[3]', 'varchar(64)') as "[3]"
FROM
(
SELECT
CONVERT(XML, '<A><B>' + REPLACE(REPLACE(#DATA , ',', '</B></A><A><B>'), ':', '</B><B>') + '</B></A>').query('.')
) AS [T]([c])
CROSS APPLY [T].[c].nodes('/A') AS [Data]([row]);
Hope is not too late.
You can use the function RANK to know the position of each Item per PairNumber. And then use Pivot
SELECT PairNumber, [1] ,[2] ,[3]
FROM
(
SELECT PairNumber, Item, RANK() OVER (PARTITION BY PairNumber order by EntryNumber) as RANKing
from tabla) T
PIVOT
(MAX(Item)
FOR RANKing in ([1],[2],[3])
)as PVT