SQL Server Recursive CTE not returning expected rows

I'm building a Markov chain name generator. I'm trying to replace a while loop with a recursive CTE. Limitations in using top and order by in the recursive part of the CTE have led me down the following path.
The point of all of this is to generate names based on a model, which is just another word that I've chunked into three-character segments and stored in three columns in the Markov_Model table. The next character in the sequence will be a character from the Markov_Model such that the 1st and 2nd characters in the model match the penultimate and ultimate characters of the word being generated. Rather than generate a probability matrix for that third character, I'm using a scalar function that finds all the characters that fit the criteria and picks one of them at random: order by newid().
The problem is that this formulation of the CTE gets the desired number of rows in the anchor segment, but the union that recursively calls the CTE only unions one row from the anchor. I've attached a sample of the desired output at the bottom.
The query:
;with names as
(
select top 5
cast('+' as nvarchar(50)) as char1,
cast('+' as nvarchar(50)) as char2,
cast(char3 as nvarchar(50)) as char3,
cast('++' + char3 as nvarchar(100)) as name_in_progress,
1 as iteration
from markov_Model
where char1 is null
and char2 is null
order by newid() -- Get some random starting characters
union all
select
n.char2 as char1,
n.char3 as char2,
cast(fnc.addition as nvarchar(50)) as char3,
cast(n.name_in_progress + fnc.addition as nvarchar(100)),
1 + n.iteration
from names n
cross apply (
-- This function takes the preceding two characters,
-- and gets a random character that follows the pattern
select isnull(dbo.[fn_markov_getNext] (n.char2, n.char3), ',') as addition
) fnc
)
select *
from names
option (maxrecursion 3) -- For debug
The trouble is the union only unions one row.
Example output:
char1 char2 char3 name_in_progress iteration
+ + F ++F 1
+ + N ++N 1
+ + K ++K 1
+ + S ++S 1
+ + B ++B 1
+ B a ++Ba 2
B a c ++Bac 3
a c h ++Bach 4
Note I'm using + and , as null replacers/delimiters.
What I want to see is the entirety of the previous recursion, with the addition of the new characters to the name_in_progress; each pass should modify the entirety of the previous pass.
My desired output would be:
Top 10 of the Markov_Model table:
Text of the function that gets the next character from the Markov_Model:
CREATE FUNCTION [dbo].[fn_markov_getNext]
(
@char2 nvarchar(1),
@char3 nvarchar(1)
)
RETURNS nvarchar(1)
AS
BEGIN
DECLARE @newChar nvarchar(1)
set @newChar = (
select top 1
isnull(char3, ',')
from markov_Model mm
where isnull(mm.char1, '+') = isnull(@char2, '+')
and isnull(mm.char2, '+') = isnull(@char3, ',')
order by (select new_id from vw_getGuid) -- A view that calls newid()
)
return @newChar
END

Related

SQL Server: Select rows with multiple occurrences of regex match in a column

I’m fairly used to using MySQL, but not particularly familiar with SQL Server. Tough luck, the database I’m dealing with here is on SQL Server 2014.
I have a table with a column whose values are all integers with leading, separating, and trailing semicolons, like these three fictitious rows:
;905;1493;384;13387;29;933;467;28732;
;905;138;3084;1387;290;9353;4767;2732;
;9085;14493;3864;130387;289;933;4767;28732;
What I am trying to do now is to select all rows where more than one number taken from a list of numbers appears in this column. So for example, given the three rows above, if I have the group 905,467,4767, the statement I’m trying to figure out how to construct should return the first two rows: the first row contains 905 and 467; the second row contains 905 and 4767. The third row contains only 4767, so that row should not be returned.
As far as I can tell, SQL Server does not actually support regex directly (and I don’t even know what managed code is), which doesn’t help. Even with regex, I wouldn’t know where to begin. Oracle seems to have a function that would be very useful, but that’s Oracle.
Most similar questions on here deal with finding multiple instances of the same character (usually singular) and solve the problem by replacing the string to match with nothing and counting the difference in length. I suppose that would technically work here, too, but given a ‘filter’ group of 15 numbers, the SELECT statement would become ridiculously long and convoluted and utterly unreadable. Additionally, I only want to match entire numbers (so if one of the numbers to match is 29, the value 29 would match in the first row, but the value 290 in the second row should not match), which means I’d have to include the semicolons in the REPLACE clause and then discount them when calculating the length. A complete mess.
What I would ideally like to do is something like this:
SELECT * FROM table WHERE REGEXP_COUNT(column, ';(905|467|4767);') > 1
– but that will obviously not work, for all kinds of reasons (the most obvious one being the nonexistence of REGEXP_COUNT outside Oracle).
Is there some sane, manageable way of doing this?
You can do
SELECT *
FROM Mess
CROSS APPLY (SELECT COUNT(*)
FROM (VALUES (905),
(467),
(4767)) V(Num)
WHERE Col LIKE CONCAT('%;', Num, ';%')) ca(count)
WHERE count > 1
SQL Fiddle
Or alternatively
WITH Nums
AS (SELECT Num
FROM (VALUES (905),
(467),
(4767)) V(Num))
SELECT Mess.*
FROM Mess
CROSS APPLY (VALUES(CAST(CONCAT('<x>', REPLACE(Col, ';', '</x><x>'), '</x>') AS XML))) x(x)
CROSS APPLY (SELECT COUNT(*)
FROM (SELECT n.value('.', 'int')
FROM x.x.nodes('/x') n(n)
WHERE n.value('.', 'varchar') <> ''
INTERSECT
SELECT Num
FROM Nums) T(count)
HAVING COUNT(*) > 1) ca2(count)
Could you put your arguments into a table (perhaps using a table-valued function accepting a string (of comma-separated integers) as a parameter) and use something like this?
DECLARE @T table (String varchar(255))
INSERT INTO @T
VALUES
(';905;1493;384;13387;29;933;467;28732;')
, (';905;138;3084;1387;290;9353;4767;2732;')
, (';9085;14493;3864;130387;289;933;4767;28732;')
DECLARE @Arguments table (Arg int)
INSERT INTO @Arguments
VALUES
(905)
, (467)
, (4767)
SELECT String
FROM
@T
CROSS JOIN @Arguments
GROUP BY String
HAVING SUM(CASE WHEN PATINDEX('%;' + CAST(Arg AS varchar) + ';%', String) > 0 THEN 1 ELSE 0 END) > 1
An example of using this with a function to generate the arguments:
CREATE FUNCTION GenerateArguments (@Integers varchar(255))
RETURNS @Arguments table (Arg int)
AS
BEGIN
WITH cte
AS
(
SELECT
PATINDEX('%,%', @Integers) p
, LEFT(@Integers, PATINDEX('%,%', @Integers) - 1) n
UNION ALL
SELECT
CASE WHEN PATINDEX('%,%', SUBSTRING(@Integers, p + 1, LEN(@Integers))) + p = p THEN 0 ELSE PATINDEX('%,%', SUBSTRING(@Integers, p + 1, LEN(@Integers))) + p END
, CASE WHEN PATINDEX('%,%', SUBSTRING(@Integers, p + 1, LEN(@Integers))) = 0 THEN RIGHT(@Integers, PATINDEX('%,%', REVERSE(@Integers)) - 1) ELSE LEFT(SUBSTRING(@Integers, p + 1, LEN(@Integers)), PATINDEX('%,%', SUBSTRING(@Integers, p + 1, LEN(@Integers))) - 1) END
FROM cte
WHERE p <> 0
)
INSERT INTO @Arguments (Arg)
SELECT n
FROM cte
RETURN
END
GO
DECLARE @T table (String varchar(255))
INSERT INTO @T
VALUES
(';905;1493;384;13387;29;933;467;28732;')
, (';905;138;3084;1387;290;9353;4767;2732;')
, (';9085;14493;3864;130387;289;933;4767;28732;')
;
SELECT String
FROM
@T
CROSS JOIN GenerateArguments('905,467,4767')
GROUP BY String
HAVING SUM(CASE WHEN PATINDEX('%;' + CAST(Arg AS varchar) + ';%', String) > 0 THEN 1 ELSE 0 END) > 1
You can achieve this by using LIKE for the pattern match and ROW_NUMBER to count the matches per row.
Here we declare the column values for testing:
DECLARE @tbl TABLE (
string NVARCHAR(MAX)
)
INSERT @tbl VALUES
(';905;1493;384;13387;29;933;467;28732;'),
(';905;138;3084;1387;290;9353;4767;2732;'),
(';9085;14493;3864;130387;289;933;4767;28732;')
Then we pass your search parameters into a table variable to be joined on:
DECLARE @search_tbl TABLE (
search_value INT
)
INSERT @search_tbl VALUES
(905),
(467),
(4767)
Finally, we join the table containing the column to search to the search table. ROW_NUMBER numbers the matches found for each string, so selecting from this subquery where the row number equals 2 returns, exactly once, every string that matched at least twice.
SELECT
string
FROM (
SELECT
tbl.string,
ROW_NUMBER() OVER (PARTITION BY tbl.string ORDER BY tbl.string) AS rn
FROM @tbl tbl
JOIN @search_tbl search_tbl ON
tbl.string LIKE '%;' + CAST(search_tbl.search_value AS NVARCHAR(MAX)) + ';%'
) tbl
WHERE rn = 2
You could build a where clause like this :
WHERE
case when column like '%;905;%' then 1 else 0 end +
case when column like '%;467;%' then 1 else 0 end +
case when column like '%;4767;%' then 1 else 0 end >= 2
The advantage is that you do not need a helper table. I don't know how you build the query, but the following also works, and is useful if the numbers are in a tsql variable.
case when column like ('%;' + @n + ';%') then 1 else 0 end
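If the numbers are held in int variables rather than strings, the concatenation needs an explicit cast; otherwise SQL Server tries to convert the '%;' pieces to int and raises a conversion error. A minimal sketch of the same CASE-sum idea, assuming a table variable @t with a column col and three illustrative variables @n1, @n2 and @n3 (none of these names come from the question):

```sql
-- Sample rows from the question, held in a hypothetical table variable
DECLARE @t TABLE (col varchar(255));
INSERT INTO @t VALUES
    (';905;1493;384;13387;29;933;467;28732;'),
    (';905;138;3084;1387;290;9353;4767;2732;'),
    (';9085;14493;3864;130387;289;933;4767;28732;');

-- Illustrative int variables; CAST keeps the + operator a string concatenation
DECLARE @n1 int = 905, @n2 int = 467, @n3 int = 4767;

SELECT col
FROM @t
WHERE CASE WHEN col LIKE '%;' + CAST(@n1 AS varchar(10)) + ';%' THEN 1 ELSE 0 END
    + CASE WHEN col LIKE '%;' + CAST(@n2 AS varchar(10)) + ';%' THEN 1 ELSE 0 END
    + CASE WHEN col LIKE '%;' + CAST(@n3 AS varchar(10)) + ';%' THEN 1 ELSE 0 END
    >= 2;
```

For the three sample rows this should return the first two, because the surrounding semicolons stop 467 from matching inside 4767.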

How to obtain certain value from URL? [duplicate]

This question already has answers here:
SQL Server 2008 R2 - How to split my varchar column string and get 3rd index string
(4 answers)
Closed 7 years ago.
Given this URL:
www.google.com/hsisn/-#++#/valuetoretrive/+#(#(/.html
The value to retrieve is between the 4th and 5th slash.
How to retrieve that particular value using SQL Server 2008?
There is no function in SQL server to get the nth occurrence of a value, the only function is CHARINDEX, which will retrieve the first instance after the specified starting position. So the only way to utilise this is to cascade each value found, i.e:
Find the position of 1st "/"
Find the position of the next "/" after the first one
Find the position of the next "/" after the second one
So each calculation requires the result of the previous one, which to get the 5th occurrence gets fairly messy, but not impossible if you use CROSS APPLY to reuse your results. Once you have the position of the 4th and 5th occurrence you can use SUBSTRING to extract the text:
SELECT t.url,
Value = SUBSTRING(t.url, p4.Position, p5.Position - p4.Position - 1)
FROM (SELECT url = 'URL:/www.google.com/hsisn/-#++#/valuetoretrive/+#(#(/.html') AS t
CROSS APPLY (SELECT 1 + CHARINDEX('/', url)) AS p1 (Position)
CROSS APPLY (SELECT 1 + CHARINDEX('/', url, p1.Position)) AS p2 (Position)
CROSS APPLY (SELECT 1 + CHARINDEX('/', url, p2.Position)) AS p3 (Position)
CROSS APPLY (SELECT 1 + CHARINDEX('/', url, p3.Position)) AS p4 (Position)
CROSS APPLY (SELECT 1 + CHARINDEX('/', url, p4.Position)) AS p5 (Position);
ADDENDUM
The other option you have, if you want more flexibility, i.e. get the text between the 50th and 51st occurrence, is to utilise a split function. The most efficient way to split strings is with a CLR function, but the next best T-SQL only method for this purpose is to use a numbers table to split your string, and in the absence of this create your own using stacked CTEs.
I will assume that you don't have a numbers table and use a stacked CTE in the interest of a complete working example.
CREATE FUNCTION dbo.Split (@StringToSplit VARCHAR(1000), @Delimiter CHAR(1))
RETURNS TABLE
AS
RETURN
( WITH N1 (N) AS (SELECT 1 FROM (VALUES (1), (1), (1), (1), (1), (1), (1), (1), (1), (1)) AS t (n)),
N2 (N) AS (SELECT 1 FROM N1 AS N1 CROSS JOIN N1 AS N2),
Numbers (N) AS (SELECT ROW_NUMBER() OVER(ORDER BY n1.N) FROM N1 CROSS JOIN N2 AS N2)
SELECT Position = ROW_NUMBER() OVER(ORDER BY n.N),
Value = SUBSTRING(@StringToSplit, n.N, ISNULL(NULLIF(CHARINDEX(@Delimiter, @StringToSplit, n.N + 1), 0), LEN(@StringToSplit)) - n.N)
FROM Numbers AS n
WHERE SUBSTRING(@Delimiter + @StringToSplit, n.N, 1) = @Delimiter
);
Which you can call fairly simply:
DECLARE @Table TABLE (URL VARCHAR(255) NOT NULL);
INSERT @Table VALUES ('URL:/www.google.com/hsisn/-#++#/valuetoretrive/+#(#(/.html');
SELECT s.*
FROM @Table AS t
CROSS APPLY dbo.Split(t.URL, '/') AS s;
Which gives you:
Position Value
---------------------
1 URL:
2 www.google.com
3 hsisn
4 -#++#
5 valuetoretrive
6 +#(#(
7 .htm
So you can simply select the 5th value from this by adding a WHERE clause:
DECLARE @Table TABLE (URL VARCHAR(255) NOT NULL);
INSERT @Table
VALUES
('URL:/www.google.com/hsisn/-#++#/valuetoretrive/+#(#(/.html'),
('URL:/www.google.com/hsisn/-#++#/valuetoretrive2/+#(#(/.html');
SELECT t.URL, s.Value
FROM @Table AS t
CROSS APPLY dbo.Split(t.URL, '/') AS s
WHERE s.Position = 5;
If you don't know beforehand the length of the URL, the value to retrieve, or the slash positions, you can use this snippet:
declare @uri varchar(max) = 'URL:/www.google.com/hsisn/-#++#/valuetoretrive/+#(#(/.html'
,@startAt int = 0
,@slashCount int = 0
while @slashCount < 5
begin
set @startAt = CHARINDEX('/',@uri);
set @slashCount = @slashCount + 1;
if (@slashCount = 5)
set @uri = SUBSTRING(@uri, 0, @startAt)
else
set @uri = SUBSTRING(@uri, @startAt + 1, LEN(@uri))
-- debug info
select @startAt, @slashCount, @uri
end
It will decompose the string, consuming it one slash at a time until it reaches the 4th and 5th slashes, and keep whatever lies between them.
OUTPUT
5 1 www.google.com/hsisn/-#++#/valuetoretrive/+#(#(/.html
15 2 hsisn/-#++#/valuetoretrive/+#(#(/.html
6 3 -#++#/valuetoretrive/+#(#(/.html
6 4 valuetoretrive/+#(#(/.html
15 5 valuetoretrive
You can also get it using a CROSS APPLY instead of a while loop, but this way your code will not need to get big and messy to reach anything beyond the 10th slash.

replace value in varchar(max) field with join

I have a table that contains text field with placeholders. Something like this:
Row Notes
1. This is some notes ##placeholder130## this ##myPlaceholder##, #oneMore#. End.
2. Second row...just a ##test#.
(This table contains about 1-5k rows on average. Average number of placeholders in one row is 5-15).
Now, I have a lookup table that looks like this:
Name Value
placeholder130 Dog
myPlaceholder Cat
oneMore Cow
test Horse
(Lookup table will contain anywhere from 10k to 100k records)
I need to find the fastest way to join those placeholders from strings to a lookup table and replace with value. So, my result should look like this (1st row):
This is some notes Dog this Cat, Cow. End.
What I came up with was to split each row into multiple for each placeholder and then join it to lookup table and then concat records back to original row with new values, but it takes around 10-30 seconds on average.
You could try to split the string using a numbers table and rebuild it with for xml path.
select (
select coalesce(L.Value, T.Value)
from Numbers as N
cross apply (select substring(Notes.notes, N.Number, charindex('##', Notes.notes + '##', N.Number) - N.Number)) as T(Value)
left outer join Lookup as L
on L.Name = T.Value
where N.Number <= len(notes) and
substring('##' + notes, Number, 2) = '##'
order by N.Number
for xml path(''), type
).value('text()[1]', 'varchar(max)')
from Notes
SQL Fiddle
I borrowed the string splitting from this blog post by Aaron Bertrand
SQL Server is not very fast with string manipulation, so this is probably best done client-side. Have the client load the entire lookup table, and replace the placeholders in the notes as they arrive.
Having said that, it can of course be done in SQL. Here's a solution with a recursive CTE. It performs one lookup per recursion step:
; with Repl as
(
select row_number() over (order by l.name) rn
, Name
, Value
from Lookup l
)
, Recurse as
(
select Notes
, 0 as rn
from Notes
union all
select replace(Notes, '##' + l.name + '##', l.value)
, r.rn + 1
from Recurse r
join Repl l
on l.rn = r.rn + 1
)
select *
from Recurse
where rn =
(
select count(*)
from Lookup
)
option (maxrecursion 0)
Example at SQL Fiddle.
Another option is a while loop to keep replacing lookups until no more are found:
declare @notes table (notes varchar(max))
insert @notes
select Notes
from Notes
while 1=1
begin
update n
set Notes = replace(n.Notes, '##' + l.name + '##', l.value)
from @notes n
outer apply
(
select top 1 Name
, Value
from Lookup l
where n.Notes like '%##' + l.name + '##%'
) l
where l.name is not null
if @@rowcount = 0
break
end
select *
from @notes
Example at SQL Fiddle.
I second the comment that T-SQL is just not suited for this operation, but if you must do it in the database, here is an example that uses a function to manage the multiple replace statements.
Since you have a relatively small number of tokens in each note (5-15) and a very large number of tokens in the lookup table (10k-100k), my function first extracts the potential tokens from the input and uses that set to join to your lookup (dbo.[Lookup] below). Looking for an occurrence of every lookup token in each note would be far too much work.
I did a bit of perf testing using 50k tokens and 5k notes and this function runs really well, completing in <2 seconds (on my laptop). Please report back how this strategy performs for you.
note: In your example data the token format was not consistent (##_#, ##_##, #_#), I am guessing this was simply a typo and assume all tokens take the form of ##TokenName##.
--setup
if object_id('dbo.[Lookup]') is not null
drop table dbo.[Lookup];
go
if object_id('dbo.fn_ReplaceLookups') is not null
drop function dbo.fn_ReplaceLookups;
go
create table dbo.[Lookup] (LookupName varchar(100) primary key, LookupValue varchar(100));
insert into dbo.[Lookup]
select '##placeholder130##','Dog' union all
select '##myPlaceholder##','Cat' union all
select '##oneMore##','Cow' union all
select '##test##','Horse';
go
create function [dbo].[fn_ReplaceLookups](@input varchar(max))
returns varchar(max)
as
begin
declare @xml xml;
select @xml = cast(('<r><i>'+replace(@input,'##' ,'</i><i>')+'</i></r>') as xml);
--extract the potential tokens
declare @LookupsInString table (LookupName varchar(100) primary key);
insert into @LookupsInString
select distinct '##'+v+'##'
from ( select [v] = r.n.value('(./text())[1]', 'varchar(100)'),
[r] = row_number() over (order by n)
from @xml.nodes('r/i') r(n)
)d(v,r)
where r%2=0;
--tokenize the input
select @input = replace(@input, l.LookupName, l.LookupValue)
from dbo.[Lookup] l
join @LookupsInString lis on
l.LookupName = lis.LookupName;
return @input;
end
go
return
--usage
declare @Notes table ([Id] int primary key, notes varchar(100));
insert into @Notes
select 1, 'This is some notes ##placeholder130## this ##myPlaceholder##, ##oneMore##. End.' union all
select 2, 'Second row...just a ##test##.';
select *,
dbo.fn_ReplaceLookups(notes)
from @Notes;
Returns:
Tokenized
--------------------------------------------------------
This is some notes Dog this Cat, Cow. End.
Second row...just a Horse.
Try this
;WITH CTE (org, calc, [Notes], [level]) AS
(
SELECT [Notes], [Notes], CONVERT(varchar(MAX),[Notes]), 0 FROM PlaceholderTable
UNION ALL
SELECT CTE.org, CTE.[Notes],
CONVERT(varchar(MAX), REPLACE(CTE.[Notes],'##' + T.[Name] + '##', T.[Value])), CTE.[level] + 1
FROM CTE
INNER JOIN LookupTable T ON CTE.[Notes] LIKE '%##' + T.[Name] + '##%'
)
SELECT DISTINCT org, [Notes], level FROM CTE
WHERE [level] = (SELECT MAX(level) FROM CTE c WHERE CTE.org = c.org)
SQL FIDDLE DEMO
Check the below devioblog post for reference
devioblog post
To get speed, you can preprocess the note templates into a more efficient form. This will be a sequence of fragments, with each ending in a substitution. The substitution might be NULL for the last fragment.
Notes
Id FragSeq Text SubsId
1 1 'This is some notes ' 1
1 2 ' this ' 2
1 3 ', ' 3
1 4 '. End.' null
2 1 'Second row...just a ' 4
2 2 '.' null
Subs
Id Name Value
1 'placeholder130' 'Dog'
2 'myPlaceholder' 'Cat'
3 'oneMore' 'Cow'
4 'test' 'Horse'
Now we can do the substitutions with a simple join.
SELECT Notes.Text + COALESCE(Subs.Value, '')
FROM Notes LEFT JOIN Subs
ON SubsId = Subs.Id WHERE Notes.Id = ?
ORDER BY FragSeq
This produces a list of fragments with substitutions complete. I am not an MS SQL user, but in most dialects of SQL you can concatenate these fragments in a variable quite easily:
DECLARE @Note VARCHAR(8000)
SELECT @Note = COALESCE(@Note, '') + Notes.Text + COALESCE(Subs.Value, '')
FROM Notes LEFT JOIN Subs
ON SubsId = Subs.Id WHERE Notes.Id = ?
ORDER BY FragSeq
Pre-processing a note template into fragments will be straightforward using the string splitting techniques of other posts.
Unfortunately I'm not at a location where I can test this, but it ought to work fine.
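The fragment join described above can be sketched in T-SQL as follows. This is only an illustration under stated assumptions: the table variables @Notes and @Subs and their column types are made up to mirror the two sample tables, and the fragments are concatenated with FOR XML PATH instead of the SELECT @Note = @Note + ... pattern, because SQL Server does not guarantee the result of variable concatenation combined with ORDER BY.

```sql
-- Hypothetical table variables mirroring the Notes and Subs samples above
DECLARE @Notes TABLE (Id int, FragSeq int, [Text] varchar(200), SubsId int NULL);
DECLARE @Subs  TABLE (Id int PRIMARY KEY, Name varchar(100), Value varchar(100));

INSERT INTO @Notes VALUES
    (1, 1, 'This is some notes ', 1),
    (1, 2, ' this ', 2),
    (1, 3, ', ', 3),
    (1, 4, '. End.', NULL),
    (2, 1, 'Second row...just a ', 4),
    (2, 2, '.', NULL);

INSERT INTO @Subs VALUES
    (1, 'placeholder130', 'Dog'),
    (2, 'myPlaceholder', 'Cat'),
    (3, 'oneMore', 'Cow'),
    (4, 'test', 'Horse');

-- One output row per note: fragments are joined to their substitutions
-- and glued back together in FragSeq order.
SELECT n.Id,
       Note = (SELECT f.[Text] + COALESCE(s.Value, '')
               FROM @Notes AS f
               LEFT JOIN @Subs AS s ON s.Id = f.SubsId
               WHERE f.Id = n.Id
               ORDER BY f.FragSeq
               FOR XML PATH(''), TYPE).value('.', 'varchar(max)')
FROM (SELECT DISTINCT Id FROM @Notes) AS n
ORDER BY n.Id;
```

With the sample data, note 1 should come back as "This is some notes Dog this Cat, Cow. End.". The TYPE directive plus .value() is there so that characters such as < or & in a fragment are not left XML-escaped.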
I really don't know how it will perform with 10k+ of lookups.
How does plain old dynamic SQL perform?
DECLARE @sqlCommand NVARCHAR(MAX)
SELECT @sqlCommand = N'PlaceholderTable.[Notes]'
SELECT @sqlCommand = 'REPLACE( ' + @sqlCommand +
', ''##' + LookupTable.[Name] + '##'', ''' +
LookupTable.[Value] + ''')'
FROM LookupTable
SELECT @sqlCommand = 'SELECT *, ' + @sqlCommand + ' FROM PlaceholderTable'
EXECUTE sp_executesql @sqlCommand
Fiddle demo
And now for some recursive CTE.
If your indexes are correctly set up, this one should be very fast or very slow. SQL Server always surprises me with performance extremes when it comes to the r-CTE...
;WITH T AS (
SELECT
Row,
StartIdx = 1, -- 1 as first starting index
EndIdx = CAST(patindex('%##%', Notes) as int), -- first ending index
Result = substring(Notes, 1, patindex('%##%', Notes) - 1)
-- (first) temp result bounded by indexes
FROM PlaceholderTable -- **this is your source table**
UNION ALL
SELECT
pt.Row,
StartIdx = newstartidx, -- starting index (calculated in calc1)
EndIdx = EndIdx + CAST(newendidx as int) + 1, -- ending index (calculated in calc4 + total offset)
Result = Result + CAST(ISNULL(newtokensub, newtoken) as nvarchar(max))
-- temp result taken from subquery or original
FROM
T
JOIN PlaceholderTable pt -- **this is your source table**
ON pt.Row = T.Row
CROSS APPLY(
SELECT newstartidx = EndIdx + 2 -- new starting index moved by 2 from last end ('##')
) calc1
CROSS APPLY(
SELECT newtxt = substring(pt.Notes, newstartidx, len(pt.Notes))
-- current piece of txt we work on
) calc2
CROSS APPLY(
SELECT patidx = patindex('%##%', newtxt) -- current index of '##'
) calc3
CROSS APPLY(
SELECT newendidx = CASE
WHEN patidx = 0 THEN len(newtxt) + 1
ELSE patidx END -- if last piece of txt, end with its length
) calc4
CROSS APPLY(
SELECT newtoken = substring(pt.Notes, newstartidx, newendidx - 1)
-- get the new token
) calc5
OUTER APPLY(
SELECT newtokensub = Value
FROM LookupTable
WHERE Name = newtoken -- substitute the token if you can find it in **your lookup table**
) calc6
WHERE newstartidx + len(newtxt) - 1 <= len(pt.Notes)
-- keep recursing while {new starting index} + {length of txt we work on} - 1 does not exceed the total length
)
,lastProcessed AS (
SELECT
Row,
Result,
rn = row_number() over(partition by Row order by StartIdx desc)
FROM T
) -- enumerate all (including intermediate) results
SELECT *
FROM lastProcessed
WHERE rn = 1 -- filter out intermediate results (display only last ones)

Print bullet before each sentence + new line after each sentence SQL

I have a text like: Sentence one. Sentence two. Sentence three.
I want it to be:
Sentence one.
Sentence two.
Sentence three.
I assume I can replace '.' with '.' + char(10) + char(13), but how can I go about bullets? '•' character works fine if printed manually I just do not know how to bullet every sentence including the first.
-- Initial string
declare @text varchar(100)
set @text = 'Sentence one. Sentence two. Sentence three.'
-- Setting up replacement text - new lines (assuming this works) and bullets ( char(149) )
declare @replacement varchar(100)
set @replacement = '.' + char(10) + char(13) + char(149)
-- Adding a bullet at the beginning and doing the replacement, but this will also add a trailing bullet
declare @processedText varchar(100)
set @processedText = char(149) + ' ' + replace(@text, '.', @replacement)
-- Figure out length of substring to select in the next step
declare @substringLength int
set @substringLength = LEN(@processedText) - CHARINDEX(char(149), REVERSE(@processedText))
-- Removes trailing bullet
select substring(@processedText, 0, @substringLength)
I've tested here - https://data.stackexchange.com/stackoverflow/qt/119364/
I should point out that doing this in T-SQL doesn't seem correct. T-SQL is meant to process data; any presentation-specific work should be done in the code that calls this T-SQL (C# or whatever you're using).
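A shorter variant of the same replacement, as a sketch only: if the sentences are always separated by a period followed by a single space (an assumption about the input), replacing '. ' rather than '.' leaves the final period alone, so no trailing bullet has to be trimmed afterwards. The variable names are illustrative, and char(149) is the same bullet character used above.

```sql
DECLARE @text   varchar(100) = 'Sentence one. Sentence two. Sentence three.';
DECLARE @bullet char(1)      = char(149);            -- bullet, as in the code above
DECLARE @crlf   char(2)      = char(13) + char(10);  -- carriage return + line feed

-- Prefix the first sentence, then turn every '. ' separator into
-- '.' + line break + bullet + space.
SELECT @bullet + ' '
     + REPLACE(@text, '. ', '.' + @crlf + @bullet + ' ') AS bulleted_text;
```

Input that uses a different separator (for example no space after the period) would need a different search string.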
Here's my over-the-top approach, but I feel it's a fairly solid one. It combines the classic SQL problem-solving techniques of a numbers table for string splitting and FOR XML for concatenating the split lines back together. The code is long, but the only place you'd actually need to edit is the SOURCE_DATA section.
No knock on @Jeremy Wiggins' approach, but I prefer mine as it lends itself well to a set-based approach in addition to being fairly efficient code.
-- This code will rip lines apart based on @delimiter
-- and put them back together based on @rebind
DECLARE
@delimiter char(1)
, @rebind varchar(10);
SELECT
@delimiter = '.'
, @rebind = char(10) + char(149) + ' ';
;
-- L0 to L5 simulate a numbers table
-- http://billfellows.blogspot.com/2009/11/fast-number-generator.html
WITH L0 AS
(
SELECT
0 AS C
UNION ALL
SELECT
0
)
, L1 AS
(
SELECT
0 AS c
FROM
L0 AS A
CROSS JOIN L0 AS B
)
, L2 AS
(
SELECT
0 AS c
FROM
L1 AS A
CROSS JOIN L1 AS B
)
, L3 AS
(
SELECT
0 AS c
FROM
L2 AS A
CROSS JOIN L2 AS B
)
, L4 AS
(
SELECT
0 AS c
FROM
L3 AS A
CROSS JOIN L3 AS B
)
, L5 AS
(
SELECT
0 AS c
FROM
L4 AS A
CROSS JOIN L4 AS B
)
, NUMS AS
(
SELECT
ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS number
FROM
L5
)
, SOURCE_DATA (ID, content) AS
(
-- This query simulates your input data
SELECT 1, 'Sentence one. Sentence two. Sentence three.'
UNION ALL SELECT 7, 'In seed time learn, in harvest teach, in winter enjoy.Drive your cart and your plow over the bones of the dead.The road of excess leads to the palace of wisdom.Prudence is a rich, ugly old maid courted by Incapacity.He who desires but acts not, breeds pestilence.'
)
, MAX_LENGTH AS
(
-- this query is rather important. The current NUMS query generates a
-- very large set of numbers but we only need 1 to the maximum length of our
-- source data. We can take advantage of a 2008 feature of letting
-- TOP take a dynamic value
SELECT TOP (SELECT MAX(LEN(SD.content)) AS max_length FROM SOURCE_DATA SD)
N.number
FROM
NUMS N
)
, MULTI_LINES AS
(
-- This query will make many lines out of a single line based on the supplied delimiter
-- Need to retain the ID (or some unique value from the original data) to regroup it
-- http://www.sommarskog.se/arrays-in-sql-2005.html#tblnum
SELECT
SD.ID
, LTRIM(substring(SD.content, Number, charindex(@delimiter, SD.content + @delimiter, Number) - Number)) + @delimiter AS lines
FROM
MAX_LENGTH
CROSS APPLY
SOURCE_DATA SD
WHERE
Number <= len(SD.content)
AND substring(@delimiter + SD.content, Number, 1) = @delimiter
)
, RECONSITITUE (content, ID) AS
(
-- use classic concatenation to put it all back together
-- using CR/LF * (space) as delimiter
-- as a correlated sub query and joined back to our original table to preserve IDs
-- https://stackoverflow.com/questions/5196371/sql-query-concatenating-results-into-one-string
SELECT DISTINCT
STUFF
(
(
SELECT @rebind + M.lines
FROM MULTI_LINES M
WHERE M.ID = ML.ID
FOR XML PATH('')
)
, 1
, 1
, '')
, ML.ID
FROM
MULTI_LINES ML
)
SELECT
R.content
, R.ID
FROM
RECONSITITUE R
Results
content ID
----------------------------------------------------------- ---
• In seed time learn, in harvest teach, in winter enjoy.
• Drive your cart and your plow over the bones of the dead.
• The road of excess leads to the palace of wisdom.
• Prudence is a rich, ugly old maid courted by Incapacity.
• He who desires but acts not, breeds pestilence. 7
• Sentence one.
• Sentence two.
• Sentence three. 1
(2 row(s) affected)
References
Number table
Splitting strings via number table
SQL Query - Concatenating Results into One String
select '• On '+ cast(getdate() as varchar)+' I discovered how to do this '
Sample

SQL: Find rows where Column contains all of the given words

I have some column EntityName, and I want to have users to be able to search names by entering words separated by space. The space is implicitly considered as an 'AND' operator, meaning that the returned rows must have all of the words specified, and not necessarily in the given order.
For example, if we have rows like these:
abba nina pretty balerina
acdc you shook me all night long
sth you are me
dream theater it's all about you
when the user enters: me you, or you me (the results must be equivalent), the result has rows 2 and 3.
I know I can go like:
WHERE Col1 LIKE '%' + word1 + '%'
AND Col1 LIKE '%' + word2 + '%'
but I wanted to know if there's some more optimal solution.
The CONTAINS would require a full text index, which (for various reasons) is not an option.
Maybe Sql2008 has some built-in, semi-hidden solution for these cases?
The only thing I can think of is to write a CLR function that does the LIKE comparisons. This should be many times faster.
Update: Now that I think about it, it makes sense CLR would not help. Two other ideas:
1 - Try indexing Col1 and do this:
WHERE (Col1 LIKE word1 + '%' or Col1 LIKE '%' + word1 + '%')
AND (Col1 LIKE word2 + '%' or Col1 LIKE '%' + word2 + '%')
Depending on the most common searches (starts with vs. substring), this may offer an improvement.
2 - Add your own full text indexing table where each word is a row in the table. Then you can index properly.
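The second idea can be sketched as a word table queried with a grouped join. Everything here is hypothetical: the table variables @Entity, @EntityWord and @Search, their columns, and the hand-written word rows (in practice the word table would be filled by a split routine and kept in sync with the source column). Note that this matches whole words only, which differs from the substring behaviour of LIKE '%word%'.

```sql
-- Source rows from the question
DECLARE @Entity TABLE (EntityId int PRIMARY KEY, EntityName varchar(200));
INSERT INTO @Entity VALUES
    (1, 'abba nina pretty balerina'),
    (2, 'acdc you shook me all night long'),
    (3, 'sth you are me'),
    (4, 'dream theater it''s all about you');

-- One row per (word, entity); the key puts Word first so lookups by word can seek
DECLARE @EntityWord TABLE (EntityId int, Word varchar(50), PRIMARY KEY (Word, EntityId));
INSERT INTO @EntityWord VALUES
    (1,'abba'),(1,'nina'),(1,'pretty'),(1,'balerina'),
    (2,'acdc'),(2,'you'),(2,'shook'),(2,'me'),(2,'all'),(2,'night'),(2,'long'),
    (3,'sth'),(3,'you'),(3,'are'),(3,'me'),
    (4,'dream'),(4,'theater'),(4,'it''s'),(4,'all'),(4,'about'),(4,'you');

-- The words the user typed, already split
DECLARE @Search TABLE (Word varchar(50) PRIMARY KEY);
INSERT INTO @Search VALUES ('me'), ('you');

-- Keep only entities that contain every search word
SELECT e.EntityId, e.EntityName
FROM @Entity AS e
JOIN @EntityWord AS w ON w.EntityId = e.EntityId
JOIN @Search AS s ON s.Word = w.Word
GROUP BY e.EntityId, e.EntityName
HAVING COUNT(DISTINCT w.Word) = (SELECT COUNT(*) FROM @Search);
```

For the search "me you" this should return rows 2 and 3, regardless of the order in which the words were entered.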
Function
CREATE FUNCTION [dbo].[fnSplit] ( @sep CHAR(1), @str VARCHAR(512) )
RETURNS TABLE AS
RETURN (
WITH Pieces(pn, start, stop) AS (
SELECT 1, 1, CHARINDEX(@sep, @str)
UNION ALL
SELECT pn + 1, stop + 1, CHARINDEX(@sep, @str, stop + 1)
FROM Pieces
WHERE stop > 0
)
SELECT
pn AS Id,
SUBSTRING(@str, start, CASE WHEN stop > 0 THEN stop - start ELSE 512 END) AS Data
FROM
Pieces
)
Query
DECLARE @FilterTable TABLE (Data VARCHAR(512))
INSERT INTO @FilterTable (Data)
SELECT DISTINCT S.Data
FROM fnSplit(' ', 'word1 word2 word3') S -- Contains words
SELECT DISTINCT
T.*
FROM
MyTable T
INNER JOIN @FilterTable F1 ON T.Col1 LIKE '%' + F1.Data + '%'
LEFT JOIN @FilterTable F2 ON T.Col1 NOT LIKE '%' + F2.Data + '%'
WHERE
F2.Data IS NULL
Source: SQL SELECT WHERE field contains words
http://msdn.microsoft.com/en-us/magazine/cc163473.aspx
You're going to end up with a full table scan anyway.
The collation can make a big difference apparently. Kalen Delaney in the book "Microsoft SQL Server 2008 Internals" says:
Collation can make a huge difference
when SQL Server has to look at almost
all characters in the strings. For
instance, look at the following:
SELECT COUNT(*) FROM tbl WHERE longcol LIKE '%abc%'
This may execute 10 times faster or more with a binary collation than a nonbinary Windows collation. And with varchar data, this executes up to seven or eight times faster with a SQL collation than with a Windows collation.
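The collation can be overridden for a single expression with a COLLATE clause, so the effect described in the quote can be tried without altering the column. A minimal sketch, assuming a made-up table variable @tbl with a longcol column; Latin1_General_BIN is a binary collation, which also makes the comparison case-sensitive:

```sql
DECLARE @tbl TABLE (longcol varchar(200));
INSERT INTO @tbl VALUES ('xxabcxx'), ('xxABCxx'), ('no match here');

-- Force a binary comparison for this predicate only.
-- Only the lowercase 'abc' row matches under a binary collation.
SELECT COUNT(*) AS binary_matches
FROM @tbl
WHERE longcol COLLATE Latin1_General_BIN LIKE '%abc%';
```

Whether the speed-up is worth the change in matching semantics depends on the data; the timings in the quote are the book's, not something this sketch measures.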
WITH Tokens AS(SELECT 'you' AS Token UNION ALL SELECT 'me')
SELECT ...
FROM YourTable AS t
WHERE (SELECT COUNT(*) FROM Tokens WHERE t.Col1 LIKE '%'+Tokens.Token+'%')
=
(SELECT COUNT(*) FROM Tokens) ;
This should ideally be done with the help of Full text search as mentioned above.
BUT,
If you don't have full text configured for your DB, here is a performance intensive solution for doing a prioritized string search.
-- table to search in
drop table if exists dbo.myTable;
go
CREATE TABLE dbo.myTable
(
myTableId int NOT NULL IDENTITY (1, 1),
code varchar(200) NOT NULL,
description varchar(200) NOT NULL -- this column contains the values we are going to search in
) ON [PRIMARY]
GO
-- function to split space separated search string into individual words
drop function if exists [dbo].[fnSplit];
go
CREATE FUNCTION [dbo].[fnSplit] (@StringInput nvarchar(max),
@Delimiter nvarchar(1))
RETURNS @OutputTable TABLE (
id nvarchar(1000)
)
AS
BEGIN
DECLARE @String nvarchar(100);
WHILE LEN(@StringInput) > 0
BEGIN
SET @String = LEFT(@StringInput, ISNULL(NULLIF(CHARINDEX(@Delimiter, @StringInput) - 1, -1),
LEN(@StringInput)));
SET @StringInput = SUBSTRING(@StringInput,
ISNULL(NULLIF(CHARINDEX(@Delimiter, @StringInput), 0), LEN(@StringInput)) + 1,
LEN(@StringInput));
INSERT INTO @OutputTable (id)
VALUES (@String);
END;
RETURN;
END;
GO
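-- As a possible alternative to the custom split function: on SQL Server 2016 or later
-- (database compatibility level 130+), the built-in STRING_SPLIT does the same job.
-- This is only a sketch; the @search value is the sample string used in the script below.

```sql
DECLARE @search varchar(max) = 'infection upper acute genito';

-- STRING_SPLIT returns a single column named value, one row per token
SELECT value AS id
FROM STRING_SPLIT(@search, ' ')
WHERE value <> '';  -- skip empty tokens produced by repeated spaces
```

-- One caveat: STRING_SPLIT does not guarantee the order of the returned rows, while the
-- search script ranks words by their position in the search string. SQL Server 2022 adds
-- an optional third argument (enable_ordinal) that returns an ordinal column for this.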
-- this is the search script, which can optionally be converted to a stored procedure / function
declare @search varchar(max) = 'infection upper acute genito'; -- enter your search string here
-- the search string above should return rows containing the following
-- infection in upper side with acute genitointestinal tract
-- acute infection in upper teeth
-- acute genitointestinal pain
if (len(trim(@search)) = 0) -- if the search string is empty, just return records ordered alphabetically (TRIM requires SQL Server 2017+)
begin
select 1 as Priority, myTableid, code, Description from myTable order by Description
return;
end
declare @splitTable Table(
wordRank int Identity(1,1), -- individual words are assigned a priority order (in order of occurrence/position)
word varchar(200)
)
declare @nonWordTable Table( -- table used to trim out auxiliary verbs, prepositions etc. from the search
id varchar(200)
)
insert into @nonWordTable values
('of'),
('with'),
('at'),
('in'),
('for'),
('on'),
('by'),
('like'),
('up'),
('off'),
('near'),
('is'),
('are'),
(','),
(':'),
(';')
insert into @splitTable
select id from dbo.fnSplit(@search,' '); -- fnSplit returns one row per space-separated word of the search string; for this example the output is:
-- id
-------------
-- infection
-- upper
-- acute
-- genito
delete s from @splitTable s join @nonWordTable n on s.word = n.id; -- trimming out non-words here
declare @countOfSearchStrings int = (select count(word) from @splitTable); -- count of space-separated words in the search
declare @highestPriority int = POWER(@countOfSearchStrings,3);
with plainMatches as
(
select myTableid, @highestPriority as Priority from myTable where Description like @search -- exact matches have the highest priority
union
select myTableid, @highestPriority-1 as Priority from myTable where Description like @search + '%' -- then with something at the end
union
select myTableid, @highestPriority-2 as Priority from myTable where Description like '%' + @search -- then with something at the beginning
union
select myTableid, @highestPriority-3 as Priority from myTable where Description like '%' + @search + '%' -- then if the search string falls somewhere in between
),
splitWordMatches as( -- give each searched word a rank based on its position in the search string
-- and calculate its char index in the field being searched
select myTable.myTableid, (@countOfSearchStrings - s.wordRank) as Priority, s.word,
wordIndex = CHARINDEX(s.word, myTable.Description) from myTable join @splitTable s on myTable.Description like '%'+ s.word + '%'
-- and not exists(select myTableid from plainMatches p where p.myTableId = myTable.myTableId) -- no need to look at rows already found in plainMatches, as they are ranked highest
-- this check takes a long time though, so it is commented out; leaving it out has no impact on the result
),
matchingRowsWithAllWords as (
select myTableid, count(myTableid) as myTableCount from splitWordMatches group by(myTableid) having count(myTableid) = @countOfSearchStrings
)
, -- trim off the CTEs from here if you don't need word order to be considered for priority
wordIndexRatings as( -- reverse the char indexes retrieved above so that words occurring earlier get a higher weight
-- and then normalize them to sequential values
select s.myTableid, Priority, word, ROW_NUMBER() over (partition by s.myTableid order by wordindex desc) as comparativeWordIndex
from splitWordMatches s join matchingRowsWithAllWords m on s.myTableId = m.myTableId
)
,
wordIndexSequenceRatings as ( -- needed to ensure that if the same set of search words is found in two rows,
-- their sequence in the field value is taken into account for higher priority
select w.myTableid, w.word, (w.Priority + w.comparativeWordIndex + coalesce(sequncedPriority ,0)) as Priority
from wordIndexRatings w left join
(
select w1.myTableid, w1.priority, w1.word, w1.comparativeWordIndex, count(w1.myTableid) as sequncedPriority
from wordIndexRatings w1 join wordIndexRatings w2 on w1.myTableId = w2.myTableId and w1.Priority > w2.Priority and w1.comparativeWordIndex>w2.comparativeWordIndex
group by w1.myTableid, w1.priority,w1.word, w1.comparativeWordIndex
)
sequencedPriority on w.myTableId = sequencedPriority.myTableId and w.Priority = sequencedPriority.Priority
),
prioritizedSplitWordMatches as ( -- this calculates the cumulative priority for a field value
select w1.myTableId, sum(w1.Priority) as OverallPriority from wordIndexSequenceRatings w1 join wordIndexSequenceRatings w2 on w1.myTableId = w2.myTableId
where w1.word <> w2.word group by w1.myTableid
),
completeSet as (
select myTableid, priority from plainMatches -- get plain matches, which should be ranked highest
union
select myTableid, OverallPriority as priority from prioritizedSplitWordMatches -- get ranked split-word matches (ordered by word rank in the search string and by sequence)
),
maximizedCompleteSet as( -- set the priority of a field value = maximum priority for that field value
select myTableid, max(priority) as Priority from completeSet group by myTableId
)
select priority, myTable.myTableid , code, Description from maximizedCompleteSet m join myTable on m.myTableId = myTable.myTableId
order by Priority desc, Description -- order by priority desc to get the highest-rated items on top
--offset 0 rows fetch next 50 rows only -- optional paging