Can SQL STRING_SPLIT use two (or more) separators? - sql

I have a list of drug names that are stored in various upper and lower case combinations. I need to capitalize the first letter of each word in a string, excluding certain words. The string is separated by spaces, but can also be separated by a forward slash.
The following code works:
create table #exclusionlist (word varchar(25))
create table #drugnames (drugname varchar(50))
insert into #exclusionlist values ('ER')
insert into #exclusionlist values ('HCL')
insert into #drugnames values ('DRUGNAME ER')
insert into #drugnames values ('drugname hcl')
insert into #drugnames values ('ONEDRUG/OTHERDRUG')
select 'Product Name' = drugname
, 'Product Name 2' = STUFF((SELECT ' ' +
case when value in (select word from #exclusionlist) then upper(value)
else upper(left(value, 1)) + lower(substring(value, 2, len(value))) end
from STRING_SPLIT(drugname, ' ')
FOR XML PATH('')) ,1,1,'')
from #drugnames
The output looks like this:
Drugname ER
Drugname HCL
Onedrug/otherdrug
How can I get that last one to look like this:
Onedrug/Otherdrug
I did try STRING_SPLIT(replace(drugname, '/', ' '), ' ') but obviously replaces the slash with a space. And if the slash is at the end of the string like ONEDRUG/OTHERDRUG/ then the result looks like Onedrug Otherdrug
It's possible that the string may end in a forward slash due to the field only holding N number of characters. When data gets inserted into the table, only the first N characters of the drug name are inserted. If that Nth character is a slash, the string will end in a slash.

You can use a CASE expression for the separator parameter to the STRING_SPLIT function.
In the code below, a common table expression (CTE) uses STRING_SPLIT to split out all the drugname words and capitalize the first letter of each word as appropriate.
The CTE results are unioned together using STRING_AGG to join the drugnames back together. Note that separator parameter for STRING_AGG cannot be an expression. Using a CASE expression results in this error:
Msg 8733, Level 16, State 1, Line 14 Separator parameter for
STRING_AGG must be a string literal or variable.
For SQL Server, you would need to be on SQL 2017 or greater for the STRING_AGG function. (I added an id/identity column to the #drugnames temp table to assist with grouping.)
DROP TABLE IF EXISTS #exclusionlist;
DROP TABLE IF EXISTS #drugnames;
CREATE TABLE #exclusionlist (word VARCHAR(25))
CREATE TABLE #drugnames (id INT IDENTITY, drugname VARCHAR(50))
INSERT INTO #exclusionlist VALUES ('ER')
INSERT INTO #exclusionlist VALUES ('HCL')
INSERT INTO #drugnames VALUES ('DRUGNAME ER')
INSERT INTO #drugnames VALUES ('drugname hcl')
INSERT INTO #drugnames VALUES ('ONEDRUG/OTHERDRUG')
;WITH SomeStuff AS
(
select d.id, d.drugname,
sp.value AS SplitValue, el.word AS ExclusionListSpaceWord,
COALESCE(el.word,
UPPER(LEFT(sp.value, 1)) + LOWER(RIGHT(sp.value, LEN(sp.value) - 1))
) AS CapitalizedWord
from #drugnames d
CROSS APPLY STRING_SPLIT(d.drugname, CASE WHEN d.drugname LIKE '%/%' THEN '/' ELSE ' ' END ) sp
LEFT JOIN #exclusionlist el
ON el.word = sp.value
)
SELECT ss.id, STRING_AGG(ss.CapitalizedWord, '/') AS ReconstructedDrugname
FROM SomeStuff ss
WHERE ss.drugname LIKE '%/%'
GROUP BY ss.id
UNION
SELECT ss.id, STRING_AGG(ss.CapitalizedWord, ' ') AS ReconstructedDrugname
FROM SomeStuff ss
WHERE ss.drugname LIKE '% %'
GROUP BY ss.id
Output:
id ReconstructedDrugname
----------- ----------------------
1 Drugname ER
2 Drugname HCL
3 Onedrug/Otherdrug

If (hopefully) using SQL 2017+ you can somewhat compact the logic. first a single string split by using translate to create a common separator. THen apply the exclude word criteria forllowed by the upper-case criteria, then re-aggregate, noting which separator to use.
select *
from #drugnames
outer apply (
select case
when max(sep)=' ' then String_Agg(word,' ')
else String_Agg(word,'/') end NewName
from (
select
case
when exists (select * from #exclusionlist x where x.word = value)
then Upper(value)
else Stuff(Lower(value),1,1,Upper(Left(value,1)))
end word, Iif(drugname like '%/%','/',' ') sep
from String_Split(Translate(drugname,' /','**'),'*')
)w
)new;
Example DB<>Fiddle
Output:
Note - using string_split does not, according to the documentation, guarantee the ordering of the values. In practice, I've never seen this be the case and since you're already using the function I'm using here also. There's plenty of ways to split the string while retaining an ordering (using json for example) should it ever prove necessary.
If you are still on 2016 then the string_agg can just be replaced with the for xml implementation you're already using.

If the "/" is replaced to a " /", then the split can still happen on the space.
And the extra space can be removed afterwards.
SELECT
[Product Name] = d.drugname
, [Product Name 2] = ca.drugname2
FROM #drugnames d
CROSS APPLY (
SELECT
REPLACE(LTRIM(x.value('(./text())[1]','VARCHAR(MAX)')),' /','/') AS drugname2
FROM
(
SELECT ' '+
CASE
WHEN e.word IS NOT NULL THEN e.word
WHEN s.value LIKE '/%'
THEN STUFF(LOWER(s.value),1,2,UPPER(LEFT(s.value,2)))
ELSE STUFF(LOWER(s.value),1,1,UPPER(LEFT(s.value,1)))
END
FROM STRING_SPLIT(REPLACE(d.drugname,'/',' /'),' ') s
LEFT JOIN #exclusionlist e
ON e.word = s.value
FOR XML PATH(''), TYPE
) q(x)
) ca;
Product Name
Product Name 2
DRUGNAME ER
Drugname ER
drugname hcl
Drugname HCL
ONEDRUG/OTHERDRUG
Onedrug/Otherdrug
Test on db<>fiddle here

Related

I need help parsing an HL7 string with TSQL

I have a column in a table that looks like this
Name
WALKER^JAMES^K^^
ANDERSON^MICHAEL^R^^
HUFF^CHRIS^^^
WALKER^JAMES^K^^
SWEARINGEN^TOMMY^L^^
SMITH^JOHN^JACCOB^^
I need to write a query that looks like this
Name
FirstName
LastName
MiddleName
WALKER^JAMES^K^^
JAMES
WALKER
K
ANDERSON^MICHAEL^R^^
MICHAEL
ANDERSON
R
HUFF^CHRIS^^^
CHRIS
HUFF
BUTLER^STEWART^M^^
STEWART
BUTLER
M
SWEARINGEN^TOMMY^L^^
TOMMY
SWEARINGEN
L
SMITH^JOHN^JACCOB^^
JOHN
SMITH
JACCOB
I need help generating the LastName column.
This is what I've tried so far
SUBSTRING
(
--SEARCH THE NAME COLUMN
Name,
--Starting after the first '^'
CHARINDEX('^', Name) + 1 ),
--Index of second ^ minus the index of the first ^
(CHARINDEX('^', PatientName, CHARINDEX('^', PatientName) +1)) - (CHARINDEX('^', PatientName))
)
This produces:
Invalid length parameter passed to the LEFT or SUBSTRING function.
I know this can work because if I change the minus sign to a plus sign it performs as expected.
It produces the right integer.
Where am I going wrong? Is there a better way to do this?
If you are using the latest SQL Server versions 2016 13.x or higher, you can maximize the use of string_split function with ordinal (position).
declare #strTable table(sqlstring varchar(max))
insert into #strTable (sqlstring) values ('WALKER^JAMES^K^^')
insert into #strTable (sqlstring) values ('ANDERSON^MICHAEL^R^^')
insert into #strTable (sqlstring) values ('HUFF^CHRIS^^^')
insert into #strTable (sqlstring) values ('SWEARINGEN^TOMMY^L^^');
with tmp as
(select value s, Row_Number() over (order by (select 0)) n from #strTable
cross apply String_Split(sqlstring, '^', 1))
select t2.s as FirstName, t1.s as LastName, t3.s as MiddleInitial from tmp t1
left join tmp t2 on t2.n-t1.n = 1
left join tmp t3 on t3.n-t1.n = 2
where t1.n = 1 or t1.n % 5 = 1
I recommend SUBSTRING() as it will perform the best. The challenge with SUBSTRING is it's hard to account to keep track of the nested CHARDINDEX() calls so it's better to break the calculation into pieces. I use CROSS APPLY to alias each "^" found and start from there to search for the next. Also allows to do NULLIF() = 0, so if it can't find the "^", it just returns a NULL instead of erroring out
Parse Delimited String using SUBSTRING() and CROSS APPLY
DROP TABLE IF EXISTS #Name
CREATE TABLE #Name (ID INT IDENTITY(1,1) PRIMARY KEY,[Name] varchar(255))
INSERT INTO #Name
VALUES ('WALKER^JAMES^K^^')
,('ANDERSON^MICHAEL^R^^')
,('HUFF^CHRIS^^^')
,('SWEARINGEN^TOMMY^L^^');
SELECT ID
,A.[Name]
,LastName = NULLIF(SUBSTRING(A.[Name],0,idx1),'')
,FirstName = NULLIF(SUBSTRING(A.[Name],idx1+1,idx2-idx1-1),'')
,MiddleInitial = NULLIF(SUBSTRING(A.[Name],idx2+1,idx3-idx2-1),'')
FROM #Name AS A
CROSS APPLY (SELECT idx1 = NULLIF(CHARINDEX('^',[Name]),0)) AS B
CROSS APPLY (SELECT idx2 = NULLIF(CHARINDEX('^',[Name],idx1+1),0)) AS C
CROSS APPLY (SELECT idx3 = NULLIF(CHARINDEX('^',[Name],idx2+1),0)) AS D

SQL Server: Select rows with multiple occurrences of regex match in a column

I’m fairly used to using MySQL, but not particularly familiar with SQL Server. Tough luck, the database I’m dealing with here is on SQL Server 2014.
I have a table with a column whose values are all integers with leading, separating, and trailing semicolons, like these three fictitious rows:
;905;1493;384;13387;29;933;467;28732;
;905;138;3084;1387;290;9353;4767;2732;
;9085;14493;3864;130387;289;933;4767;28732;
What I am trying to do now is to select all rows where more than one number taken from a list of numbers appears in this column. So for example, given the three rows above, if I have the group 905,467,4767, the statement I’m trying to figure out how to construct should return the first two rows: the first row contains 905 and 467; the second row contains 905 and 4767. The third row contains only 4767, so that row should not be returned.
As far as I can tell, SQL Server does not actually support regex directly (and I don’t even know what managed code is), which doesn’t help. Even with regex, I wouldn’t know where to begin. Oracle seems to have a function that would be very useful, but that’s Oracle.
Most similar questions on here deal with finding multiple instances of the same character (usually singular) and solve the problem by replacing the string to match with nothing and counting the difference in length. I suppose that would technically work here, too, but given a ‘filter’ group of 15 numbers, the SELECT statement would become ridiculously long and convoluted and utterly unreadable. Additionally, I only want to match entire numbers (so if one of the numbers to match is 29, the value 29 would match in the first row, but the value 290 in the second row should not match), which means I’d have to include the semicolons in the REPLACE clause and then discount them when calculating the length. A complete mess.
What I would ideally like to do is something like this:
SELECT * FROM table WHERE REGEXP_COUNT(column, ';(905|467|4767);') > 1
– but that will obviously not work, for all kinds of reasons (the most obvious one being the nonexistence of REGEXP_COUNT outside Oracle).
Is there some sane, manageable way of doing this?
You can do
SELECT *
FROM Mess
CROSS APPLY (SELECT COUNT(*)
FROM (VALUES (905),
(467),
(4767)) V(Num)
WHERE Col LIKE CONCAT('%;', Num, ';%')) ca(count)
WHERE count > 1
SQL Fiddle
Or alternatively
WITH Nums
AS (SELECT Num
FROM (VALUES (905),
(467),
(4767)) V(Num))
SELECT Mess.*
FROM Mess
CROSS APPLY (VALUES(CAST(CONCAT('<x>', REPLACE(Col, ';', '</x><x>'), '</x>') AS XML))) x(x)
CROSS APPLY (SELECT COUNT(*)
FROM (SELECT n.value('.', 'int')
FROM x.x.nodes('/x') n(n)
WHERE n.value('.', 'varchar') <> ''
INTERSECT
SELECT Num
FROM Nums) T(count)
HAVING COUNT(*) > 1) ca2(count)
Could you put your arguments into a table (perhaps using a table-valued function accepting a string (of comma-separated integers) as a parameter) and use something like this?
DECLARE #T table (String varchar(255))
INSERT INTO #T
VALUES
(';905;1493;384;13387;29;933;467;28732;')
, (';905;138;3084;1387;290;9353;4767;2732;')
, (';9085;14493;3864;130387;289;933;4767;28732;')
DECLARE #Arguments table (Arg int)
INSERT INTO #Arguments
VALUES
(905)
, (467)
, (4767)
SELECT String
FROM
#T
CROSS JOIN #Arguments
GROUP BY String
HAVING SUM(CASE WHEN PATINDEX('%;' + CAST(Arg AS varchar) + ';%', String) > 0 THEN 1 ELSE 0 END) > 1
And example of using this with a function to generate the arguments:
CREATE FUNCTION GenerateArguments (#Integers varchar(255))
RETURNS #Arguments table (Arg int)
AS
BEGIN
WITH cte
AS
(
SELECT
PATINDEX('%,%', #Integers) p
, LEFT(#Integers, PATINDEX('%,%', #Integers) - 1) n
UNION ALL
SELECT
CASE WHEN PATINDEX('%,%', SUBSTRING(#Integers, p + 1, LEN(#Integers))) + p = p THEN 0 ELSE PATINDEX('%,%', SUBSTRING(#Integers, p + 1, LEN(#Integers))) + p END
, CASE WHEN PATINDEX('%,%', SUBSTRING(#Integers, p + 1, LEN(#Integers))) = 0 THEN RIGHT(#Integers, PATINDEX('%,%', REVERSE(#Integers)) - 1) ELSE LEFT(SUBSTRING(#Integers, p + 1, LEN(#Integers)), PATINDEX('%,%', SUBSTRING(#Integers, p + 1, LEN(#Integers))) - 1) END
FROM cte
WHERE p <> 0
)
INSERT INTO #Arguments (Arg)
SELECT n
FROM cte
RETURN
END
GO
DECLARE #T table (String varchar(255))
INSERT INTO #T
VALUES
(';905;1493;384;13387;29;933;467;28732;')
, (';905;138;3084;1387;290;9353;4767;2732;')
, (';9085;14493;3864;130387;289;933;4767;28732;')
;
SELECT String
FROM
#T
CROSS JOIN GenerateArguments('905,467,4767')
GROUP BY String
HAVING SUM(CASE WHEN PATINDEX('%;' + CAST(Arg AS varchar) + ';%', String) > 0 THEN 1 ELSE 0 END) > 1
You can achieve this using the like function for the regex and row_number to determine the number of matches.
Here we declare the column values for testing:
DECLARE #tbl TABLE (
string NVARCHAR(MAX)
)
INSERT #tbl VALUES
(';905;1493;384;13387;29;933;467;28732;'),
(';905;138;3084;1387;290;9353;4767;2732;'),
(';9085;14493;3864;130387;289;933;4767;28732;')
Then we pass your search parameters into a table variable to be joined on:
DECLARE #search_tbl TABLE (
search_value INT
)
INSERT #search_tbl VALUES
(905),
(467),
(4767)
Finally we join the table with the column to search for onto the search table. We apply the row_number function to determine the number of times it matches. We select from this subquery where the row_number = 2 meaning that it joined at least twice.
SELECT
string
FROM (
SELECT
tbl.string,
ROW_NUMBER() OVER (PARTITION BY tbl.string ORDER BY tbl.string) AS rn
FROM #tbl tbl
JOIN #search_tbl search_tbl ON
tbl.string LIKE '%;' + CAST(search_tbl.search_value AS NVARCHAR(MAX)) + ';%'
) tbl
WHERE rn = 2
You could build a where clause like this :
WHERE
case when column like '%;905;%' then 1 else 0 end +
case when column like '%;467;%' then 1 else 0 end +
case when column like '%;4767;%' then 1 else 0 end >= 2
The advantage is that you do not need a helper table. I don't know how you build the query, but the following also works, and is useful if the numbers are in a tsql variable.
case when column like ('%;' + #n + ';%') then 1 else 0 end

deleting second comma in data

Ok so I have a table called PEOPLE that has a name column. In the name column is a name, but its totally a mess. For some reason its not listed such as last, first middle. It's sitting like last,first,middle and last first (and middle if there) are separated by a comma.. two commas if the person has a middle name.
example:
smith,steve
smith,steve,j
smith,ryan,tom
I'd like the second comma taken away (for parsing reason ) spaces put after existing first comma so the above would come out looking like:
smith, steve
smith, steve j
smith, ryan tom
Ultimately I'd like to be able to parse the names into first, middle, and last name fields, but that's for another post :_0. I appreciate any help.
thank you.
Drop table T1;
Create table T1(Name varchar(100));
Insert T1 Values
('smith,steve'),
('smith,steve,j'),
('smith,ryan,tom');
UPDATE T1
SET Name=
CASE CHARINDEX(',',name, CHARINDEX(',',name)+1) WHEN
0 THEN Name
ELSE
LEFT(name,CHARINDEX(',',name, CHARINDEX(',',name)+1)-1)+' ' +
RIGHT(name,LEN(Name)-CHARINDEX(',',name, CHARINDEX(',',name)+1))
END
Select * from T1
This seems to work. Not the most concise but avoids cursors.
DECLARE #people TABLE (name varchar(50))
INSERT INTO #people
SELECT 'smith,steve'
UNION
SELECT 'smith,steve,j'
UNION
SELECT 'smith,ryan,tom'
UNION
SELECT 'commaless'
SELECT name,
CASE
WHEN CHARINDEX(',',name) > 0 THEN
CASE
WHEN CHARINDEX(',',name,CHARINDEX(',',name) + 1) > 0 THEN
STUFF(STUFF(name, CHARINDEX(',',name,CHARINDEX(',',name) + 1), 1, ' '),CHARINDEX(',',name),1,', ')
ELSE
STUFF(name,CHARINDEX(',',name),1,', ')
END
ELSE name
END AS name2
FROM #people
Using a table function to split apart the names with a delimiter and for XML Path to stitch them back together, we can get what you're looking for! Hope this helps!
Declare #People table(FullName varchar(200))
Insert Into #People Values ('smith,steve')
Insert Into #People Values ('smith,steve,j')
Insert Into #People Values ('smith,ryan,tom')
Insert Into #People Values ('smith,john,joseph Jr')
Select p.*,stuff(fn.FullName,1,2,'') as ModifiedFullName
From #People p
Cross Apply (
select
Case When np.posID<=2 Then ', ' Else ' ' End+np.Val
From #People n
Cross Apply Custom.SplitValues(n.FullName,',') np
Where n.FullName=p.FullName
For XML Path('')
) fn(FullName)
Output:
ModifiedFullName
smith, steve
smith, steve j
smith, ryan tom
smith, john joseph Jr
SplitValues table function definition:
/*
This Function takes a delimited list of values and returns a table containing
each individual value and its position.
*/
CREATE FUNCTION [Custom].[SplitValues]
(
#List varchar(max)
, #Delimiter varchar(1)
)
RETURNS
#ValuesTable table
(
posID int
,val varchar(1000)
)
AS
BEGIN
WITH Cte AS
(
SELECT CAST('<v>' + REPLACE(#List, #Delimiter, '</v><v>') + '</v>' AS XML) AS val
)
INSERT #ValuesTable (posID,val)
SELECT row_number() over(Order By x) as posID, RTRIM(LTRIM(Split.x.value('.', 'VARCHAR(1000)'))) AS val
FROM Cte
CROSS APPLY val.nodes('/v') Split(x)
RETURN
END
GO
String manipulation in SQLServer, outside of writing your own User Defined Function, is limited but you can use the PARSENAME function for your purposes here. It takes a string, splits it on the period character, and returns the segment you specify.
Try this:
DECLARE #name VARCHAR(100) = 'smith,ryan,tom'
SELECT REVERSE(PARSENAME(REPLACE(REVERSE(#name), ',', '.'), 1)) + ', ' +
REVERSE(PARSENAME(REPLACE(REVERSE(#name), ',', '.'), 2)) +
COALESCE(' ' + REVERSE(PARSENAME(REPLACE(REVERSE(#name), ',', '.'), 3)), '')
Result: smith, ryan tom
If you set #name to 'smith,steve' instead, you'll get:
Result: smith, steve
Segment 1 actually gives you the last segment, segment 2 the second to last etc. Hence I've used REVERSE to get the order you want. In the case of 'steve,smith', segment 3 will be null, hence the COALESCE to add an empty string if that is the case. The REPLACE of course changes the commas to periods so that the split will work.
Note that this is a bit of a hack. PARSENAME will not work if there are more than four parts and this will fail if the name happens to contain a period. However if your data conforms to these limitations, hopefully it provides you with a solution.
Caveat: it sounds like your data may be inconsistently formatted. In that case, applying any automated treatment to it is going to be risky. However, you could try:
UPDATE people SET name = REPLACE(name, ',', ' ')
UPDATE people SET name = LEFT(name, CHARINDEX(' ', name)-1)+ ', '
+ RIGHT(name, LEN(name) - CHARINDEX(' ', name)
That'll work for the three examples you give. What it will do to the rest of your set is another question.
Here's an example with CHARINDEX() and SUBSTRING
WITH yourTable
AS
(
SELECT names
FROM
(
VALUES ('smith,steve'),('smith,steve,j'),('smith,ryan,tom')
) A(names)
)
SELECT names AS old,
CASE
WHEN comma > 0
THEN SUBSTRING(spaced_names,0,comma + 1) --before the comma
+ SUBSTRING(spaced_names,comma + 2,1000) --after the comma
ELSE spaced_names
END AS new
FROM yourTable
CROSS APPLY(SELECT CHARINDEX(',',names,CHARINDEX(',',names) + 1),REPLACE(names,',',', ')) AS CA(comma,spaced_names)

Select statement that concatenates the first character after every '/' character in a column

So I am trying to write a query which, among other things, brings back the first character in a Varchar field, then returns the first character which appears after each / character throughout the rest of the field.
The field I am refrering too will contain a group of last names, separated by a '/'. For example: Fischer-Costello/Korbell/Morrison/Pearson
For the above example, I would want my select statement to return: FKMP.
So far, I have only been able to get my code to return the first character + the first character after the FIRST (and only the first) '/' character.
So for the above example input, my select statement would return: FK
Here is the code that I have written so far:
select rp.CONTACT_ID, ra.TRADE_REP, c.FIRST_NAME, c.LAST_NAME,
UPPER(LEFT(FIRST_NAME, 1)) + SUBSTRING(c.first_name,CHARINDEX('/',c.first_name)+1,1) as al_1,
UPPER(LEFT(LAST_NAME, 1)) + SUBSTRING(c.LAST_name,CHARINDEX('/',c.LAST_name)+1,1) as al_2
from dbo.REP_ALIAS ra
inner join dbo.REP_PROFILE rp on rp.CONTACT_ID = ra.CONTACT_ID
inner join dbo.CONTACT c on rp.CONTACT_ID = c.CONTACT_ID
where
rp.CRD_NUMBER is null and
ra.TRADE_REP like '%DNK%' and
(c.LAST_NAME like '%/%' or c.FIRST_NAME like '%/%') and
ra.TRADE_FIRM in
(
'xxxxxxx',
'xxxxxxx'
)
If you read the code, it's obvious that I am attempting to perform the same concatenation on the first_name column as well. However, I realize that a solution which will work for the Last_name column (used in my example), will also work for the first_name column.
Thank you.
Some default values
DECLARE #List VARCHAR(50) = 'Fischer-Costello/Korbell/Morrison/Pearson'
DECLARE #SplitOn CHAR(1) = '/'
This area just splits the string into a list
DECLARE #RtnValue table
(
Id int identity(1,1),
Value nvarchar(4000)
)
While (Charindex(#SplitOn, #List)>0)
Begin
Insert Into #RtnValue (value)
Select
Value = ltrim(rtrim(Substring(#List,1,Charindex(#SplitOn,#List)-1)))
Set #List = Substring(#List,Charindex(#SplitOn,#List)+len(#SplitOn+',')-1,len(#List))
End
Insert Into #RtnValue (Value)
Select Value = ltrim(rtrim(#List))
Now lets grab the first character of each name and stuff it back into a single variable
SELECT STUFF((SELECT SUBSTRING(VALUE,1,1) FROM #RtnValue FOR XML PATH('')),1,0,'') AS Value
Outputs:
Value
FKMP
Here is another way to do this would be a lot faster than looping. What you need is a set based splitter. Jeff Moden at sql server central has one that is awesome. Here is a link to the article. http://www.sqlservercentral.com/articles/Tally+Table/72993/
Now I know you have to signup for an account to view this but it is free and the logic in that article will change the way you look at data. You might also be able to find his code posted if you search for DelimitedSplit8K.
At any rate, here is how you could implement this type of splitter.
declare #Table table(ID int identity, SomeValue varchar(50))
insert #Table
select 'Fischer-Costello/Korbell/Morrison/Pearson'
select ID, STUFF((select '' + left(x.Item, 1)
from #Table t2
cross apply dbo.DelimitedSplit8K(SomeValue, '/') x
where t2.ID = t1.ID
for xml path('')), 1, 0 , '') as MyResult
from #Table t1
group by t1.ID

SQL: Find rows where Column contains all of the given words

I have some column EntityName, and I want to have users to be able to search names by entering words separated by space. The space is implicitly considered as an 'AND' operator, meaning that the returned rows must have all of the words specified, and not necessarily in the given order.
For example, if we have rows like these:
abba nina pretty balerina
acdc you shook me all night long
sth you are me
dream theater it's all about you
when the user enters: me you, or you me (the results must be equivalent), the result has rows 2 and 3.
I know I can go like:
WHERE Col1 LIKE '%' + word1 + '%'
AND Col1 LIKE '%' + word2 + '%'
but I wanted to know if there's some more optimal solution.
The CONTAINS would require a full text index, which (for various reasons) is not an option.
Maybe Sql2008 has some built-in, semi-hidden solution for these cases?
The only thing I can think of is to write a CLR function that does the LIKE comparisons. This should be many times faster.
Update: Now that I think about it, it makes sense CLR would not help. Two other ideas:
1 - Try indexing Col1 and do this:
WHERE (Col1 LIKE word1 + '%' or Col1 LIKE '%' + word1 + '%')
AND (Col1 LIKE word2 + '%' or Col1 LIKE '%' + word2 + '%')
Depending on the most common searches (starts with vs. substring), this may offer an improvement.
2 - Add your own full text indexing table where each word is a row in the table. Then you can index properly.
Function
CREATE FUNCTION [dbo].[fnSplit] ( #sep CHAR(1), #str VARCHAR(512) )
RETURNS TABLE AS
RETURN (
WITH Pieces(pn, start, stop) AS (
SELECT 1, 1, CHARINDEX(#sep, #str)
UNION ALL
SELECT pn + 1, stop + 1, CHARINDEX(#sep, #str, stop + 1)
FROM Pieces
WHERE stop > 0
)
SELECT
pn AS Id,
SUBSTRING(#str, start, CASE WHEN stop > 0 THEN stop - start ELSE 512 END) AS Data
FROM
Pieces
)
Query
DECLARE #FilterTable TABLE (Data VARCHAR(512))
INSERT INTO #FilterTable (Data)
SELECT DISTINCT S.Data
FROM fnSplit(' ', 'word1 word2 word3') S -- Contains words
SELECT DISTINCT
T.*
FROM
MyTable T
INNER JOIN #FilterTable F1 ON T.Col1 LIKE '%' + F1.Data + '%'
LEFT JOIN #FilterTable F2 ON T.Col1 NOT LIKE '%' + F2.Data + '%'
WHERE
F2.Data IS NULL
Source: SQL SELECT WHERE field contains words
http://msdn.microsoft.com/en-us/magazine/cc163473.aspx
You're going to end up with a full table scan anyway.
The collation can make a big difference apparently. Kalen Delaney in the book "Microsoft SQL Server 2008 Internals" says:
Collation can make a huge difference
when SQL Server has to look at almost
all characters in the strings. For
instance, look at the following:
SELECT COUNT(*) FROM tbl WHERE longcol LIKE '%abc%'
This may execute 10 times faster or more with a binary collation than a nonbinary Windows collation. And with varchar data, this executes up to seven or eight times faster with a SQL collation than with a Windows collation.
WITH Tokens AS(SELECT 'you' AS Token UNION ALL SELECT 'me')
SELECT ...
FROM YourTable AS t
WHERE (SELECT COUNT(*) FROM Tokens WHERE y.Col1 LIKE '%'+Tokens.Token+'%')
=
(SELECT COUNT(*) FROM Tokens) ;
This should ideally be done with the help of Full text search as mentioned above.
BUT,
If you don't have full text configured for your DB, here is a performance intensive solution for doing a prioritized string search.
-- table to search in
drop table if exists dbo.myTable;
go
CREATE TABLE dbo.myTable
(
myTableId int NOT NULL IDENTITY (1, 1),
code varchar(200) NOT NULL,
description varchar(200) NOT NULL -- this column contains the values we are going to search in
) ON [PRIMARY]
GO
-- function to split space separated search string into individual words
drop function if exists [dbo].[fnSplit];
go
CREATE FUNCTION [dbo].[fnSplit] (#StringInput nvarchar(max),
#Delimiter nvarchar(1))
RETURNS #OutputTable TABLE (
id nvarchar(1000)
)
AS
BEGIN
DECLARE #String nvarchar(100);
WHILE LEN(#StringInput) > 0
BEGIN
SET #String = LEFT(#StringInput, ISNULL(NULLIF(CHARINDEX(#Delimiter, #StringInput) - 1, -1),
LEN(#StringInput)));
SET #StringInput = SUBSTRING(#StringInput, ISNULL(NULLIF(CHARINDEX
(
#Delimiter, #StringInput
),
0
), LEN
(
#StringInput)
)
+ 1, LEN(#StringInput));
INSERT INTO #OutputTable (id)
VALUES (#String);
END;
RETURN;
END;
GO
-- this is the search script which can be optionally converted to a stored procedure /function
declare #search varchar(max) = 'infection upper acute genito'; -- enter your search string here
-- the searched string above should give rows containing the following
-- infection in upper side with acute genitointestinal tract
-- acute infection in upper teeth
-- acute genitointestinal pain
if (len(trim(#search)) = 0) -- if search string is empty, just return records ordered alphabetically
begin
select 1 as Priority ,myTableid, code, Description from myTable order by Description
return;
end
declare #splitTable Table(
wordRank int Identity(1,1), -- individual words are assinged priority order (in order of occurence/position)
word varchar(200)
)
declare #nonWordTable Table( -- table to trim out auxiliary verbs, prepositions etc. from the search
id varchar(200)
)
insert into #nonWordTable values
('of'),
('with'),
('at'),
('in'),
('for'),
('on'),
('by'),
('like'),
('up'),
('off'),
('near'),
('is'),
('are'),
(','),
(':'),
(';')
insert into #splitTable
select id from dbo.fnSplit(#search,' '); -- this function gives you a table with rows containing all the space separated words of the search like in this e.g., the output will be -
-- id
-------------
-- infection
-- upper
-- acute
-- genito
delete s from #splitTable s join #nonWordTable n on s.word = n.id; -- trimming out non-words here
declare #countOfSearchStrings int = (select count(word) from #splitTable); -- count of space separated words for search
declare #highestPriority int = POWER(#countOfSearchStrings,3);
with plainMatches as
(
select myTableid, #highestPriority as Priority from myTable where Description like #search -- exact matches have highest priority
union
select myTableid, #highestPriority-1 as Priority from myTable where Description like #search + '%' -- then with something at the end
union
select myTableid, #highestPriority-2 as Priority from myTable where Description like '%' + #search -- then with something at the beginning
union
select myTableid, #highestPriority-3 as Priority from myTable where Description like '%' + #search + '%' -- then if the word falls somewhere in between
),
splitWordMatches as( -- give each searched word a rank based on its position in the searched string
-- and calculate its char index in the field to search
select myTable.myTableid, (#countOfSearchStrings - s.wordRank) as Priority, s.word,
wordIndex = CHARINDEX(s.word, myTable.Description) from myTable join #splitTable s on myTable.Description like '%'+ s.word + '%'
-- and not exists(select myTableid from plainMatches p where p.myTableId = myTable.myTableId) -- need not look into myTables that have already been found in plainmatches as they are highest ranked
-- this one takes a long time though, so commenting it, will have no impact on the result
),
matchingRowsWithAllWords as (
select myTableid, count(myTableid) as myTableCount from splitWordMatches group by(myTableid) having count(myTableid) = #countOfSearchStrings
)
, -- trim off the CTE here if you don't care about the ordering of words to be considered for priority
wordIndexRatings as( -- reverse the char indexes retrived above so that words occuring earlier have higher weightage
-- and then normalize them to sequential values
select s.myTableid, Priority, word, ROW_NUMBER() over (partition by s.myTableid order by wordindex desc) as comparativeWordIndex
from splitWordMatches s join matchingRowsWithAllWords m on s.myTableId = m.myTableId
)
,
wordIndexSequenceRatings as ( -- need to do this to ensure that if the same set of words from search string is found in two rows,
-- their sequence in the field value is taken into account for higher priority
select w.myTableid, w.word, (w.Priority + w.comparativeWordIndex + coalesce(sequncedPriority ,0)) as Priority
from wordIndexRatings w left join
(
select w1.myTableid, w1.priority, w1.word, w1.comparativeWordIndex, count(w1.myTableid) as sequncedPriority
from wordIndexRatings w1 join wordIndexRatings w2 on w1.myTableId = w2.myTableId and w1.Priority > w2.Priority and w1.comparativeWordIndex>w2.comparativeWordIndex
group by w1.myTableid, w1.priority,w1.word, w1.comparativeWordIndex
)
sequencedPriority on w.myTableId = sequencedPriority.myTableId and w.Priority = sequencedPriority.Priority
),
prioritizedSplitWordMatches as ( -- this calculates the cumulative priority for a field value
select w1.myTableId, sum(w1.Priority) as OverallPriority from wordIndexSequenceRatings w1 join wordIndexSequenceRatings w2 on w1.myTableId = w2.myTableId
where w1.word <> w2.word group by w1.myTableid
),
completeSet as (
select myTableid, priority from plainMatches -- get plain matches which should be highest ranked
union
select myTableid, OverallPriority as priority from prioritizedSplitWordMatches -- get ranked split word matches (which are ordered based on word rank in search string and sequence)
),
maximizedCompleteSet as( -- set the priority of a field value = maximum priority for that field value
select myTableid, max(priority) as Priority from completeSet group by myTableId
)
select priority, myTable.myTableid , code, Description from maximizedCompleteSet m join myTable on m.myTableId = myTable.myTableId
order by Priority desc, Description -- order by priority desc to get highest rated items on top
--offset 0 rows fetch next 50 rows only -- optional paging