Searching for multiple words in a given column using regular expressions in SQL Server


I want to search for multiple words in a particular column. The words in the search string may appear in any order. For example, I want to search for the book name "Harry Potter Dream world" in the books table using the LIKE operator and regular expressions.
I know that we can do this with multiple LIKE operators, as in the query below:
SELECT *
FROM TABLE_1
WHERE bookname LIKE '%Harry Potter%' OR bookname LIKE '%Heaven world%'
In this case, I want to do this in a single query. I also tried the FREETEXT option, but that is not useful when I use a self-join. Kindly suggest any other alternatives to solve this.
Also, could you show how to use a regular expression to search for multiple words in SQL Server? I tried several options, but none of them worked for me.

How about this one...
DECLARE @phrase nvarchar(max) = 'Harry Potter Dream world'
;WITH words AS (
SELECT word = y.i.value('(./text())[1]', 'nvarchar(4000)')
FROM (
SELECT x =
CONVERT(XML, '<i>'
+ REPLACE(@phrase, ' ', '</i><i>')
+ '</i>').query('.')
) AS a CROSS APPLY x.nodes('i') AS y(i)
)
SELECT *
FROM
TABLE_1
CROSS APPLY (
SELECT found = 1
FROM words
WHERE bookname like '%' + word + '%') search
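The CROSS APPLY above returns a book once for every search word it contains. A minimal sketch of a variation follows; the @books and @words table variables and their sample rows are made up for illustration, with @words standing in for the output of the words CTE. EXISTS returns each book at most once when any word matches, and NOT EXISTS combined with NOT LIKE requires every word to be present, in any order. SQL Server versions up to 2022 have no built-in regular-expression functions, so the sketch stays with LIKE.

```sql
-- Hypothetical sample data; @books stands in for TABLE_1.
DECLARE @books TABLE (bookname nvarchar(200));
INSERT INTO @books VALUES
 (N'Harry Potter and the Dream world'),
 (N'Dream world'),
 (N'Some other book');

-- Search words, already split (for example by the XML splitter above).
DECLARE @words TABLE (word nvarchar(100));
INSERT INTO @words VALUES (N'Harry'), (N'Potter'), (N'Dream'), (N'world');

-- Any word matches: EXISTS returns each book at most once.
SELECT b.bookname
FROM @books AS b
WHERE EXISTS (SELECT 1
              FROM @words AS w
              WHERE b.bookname LIKE N'%' + w.word + N'%');

-- Every word matches, in any order: no search word may be missing.
SELECT b.bookname
FROM @books AS b
WHERE NOT EXISTS (SELECT 1
                  FROM @words AS w
                  WHERE b.bookname NOT LIKE N'%' + w.word + N'%');
```

Both queries still do substring matching, so a short word such as "a" matches almost every title.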

Searching with LIKE could lead to very many hits, especially if you deal with a search string containing "the" or "a"...
The following code first splits your search string into its words, then splits the book names into words and checks for full-word hits:
DECLARE @tbl TABLE(ID INT, BookName VARCHAR(100));
INSERT INTO @tbl VALUES
(1,'Harry Potter')
,(2,'Dream world')
,(3,'A Midsummer Night''s Dream')
,(4,'Some other Book') --will not be found
,(5,'World of Warcraft');
DECLARE @phrase nvarchar(max) = 'Harry Potter o Dream world'
;WITH words AS (
SELECT word = z.i.value('.', 'nvarchar(max)')
FROM (SELECT CAST('<i>' + REPLACE(@phrase, ' ', '</i><i>') + '</i>' AS XML)) AS x(y)
CROSS APPLY x.y.nodes('/i') AS z(i)
)
SELECT *
FROM @tbl AS tbl
WHERE EXISTS
(
SELECT 1
FROM
(
SELECT z.i.value('.', 'nvarchar(max)')
FROM (SELECT CAST('<i>' + REPLACE(tbl.BookName, ' ', '</i><i>') + '</i>' AS XML)) AS x(y)
CROSS APPLY x.y.nodes('/i') AS z(i)
) AS checkWords(word)
WHERE EXISTS(SELECT 1 FROM words WHERE words.word=checkWords.word)
)
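If splitting every book name is too heavy, a lighter way to get whole-word hits is to pad both the column and the word with spaces before comparing. This is only a sketch under the assumption that words are separated by single spaces and carry no punctuation; the sample rows and the @word variable are made up for illustration.

```sql
-- Hypothetical sample data, modelled on the table above.
DECLARE @tbl TABLE (ID int, BookName varchar(100));
INSERT INTO @tbl VALUES
 (1, 'Harry Potter'),
 (2, 'Some other Book'),
 (3, 'World of Warcraft');

DECLARE @word varchar(100) = 'o';

-- Plain substring match: hits every title that contains the letter "o".
SELECT ID, BookName
FROM @tbl
WHERE BookName LIKE '%' + @word + '%';

-- Whole-word match: the padding forces a space on both sides of the word,
-- so "o" no longer matches inside "Potter" or "other".
SELECT ID, BookName
FROM @tbl
WHERE ' ' + BookName + ' ' LIKE '% ' + @word + ' %';
```

Like any LIKE with a leading wildcard, this cannot use an index seek, so it trades the XML splitting cost for a full scan.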

Related

How can I compare these two strings in SQL Server?

I need to compare one string against another to see whether any parts of the strings match. This would be useful for checking a list of salesperson IDs against the IDs assigned to a specific GM, to see which of them fall inside or outside that GM's list of IDs:
ID_SP        ID_GM        NEEDED FIELD (overlap)
136,338,342  512,338,112  338
512,112,208  512,338,112  512,112
587,641,211  512,338,112  null
I'm struggling on how to achieve this. I'm guessing some sort of UDF?
I realize this would be much easier to have done prior to using the for XML path(''), but I'm hoping for a solution that doesn't require me to unravel the data as that will blow up the overall size of the dataset.

No, that is not how you do it. You would go back to the raw data. To get the ids in common:
select tbob.id
from t tbob join
t tmary
on tbob.id = tmary.id and tbob.manager = 'Bob' and tmary.manager = 'Mary';

Since the data set isn't two raw sources, but one concatenated field and a hardcoded string field holding the list of GM IDs (the same value for every row), the correct answer (from the starting point of the question) is to split the concatenated field with something like nodes('/M') as Split(a).
Then you get something like this:
ID_SP ID_GM
136 512,338,112
338 512,338,112
342 512,338,112
and can do something like this:
case when ID_GM not like '%'+ID_SP+'%' then 1 else 0 end as 'indicator'
From here you can aggregate back and sum the indicator field: a sum > 0 means that at least one ID_SP is missing from the list of ID_GMs (drop the not to flag the matching IDs instead).
Hope this helps someone else.
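One caveat with the LIKE test above: a bare '%'+ID_SP+'%' also matches partial IDs, so 12 would be found inside 512 or 112. A common workaround, sketched here with made-up sample rows and a hypothetical in_gm_list column name, is to wrap both sides in the delimiter so that only complete list entries match.

```sql
-- Hypothetical sample data: one already-split ID_SP per row.
DECLARE @t TABLE (ID_SP varchar(10), ID_GM varchar(100));
INSERT INTO @t VALUES
 ('338', '512,338,112'),
 ('12',  '512,338,112'),
 ('136', '512,338,112');

SELECT ID_SP,
       ID_GM,
       -- Wrap the list and the ID in commas so only whole entries match:
       -- ',512,338,112,' contains ',338,' but not ',12,'.
       CASE WHEN ',' + ID_GM + ',' LIKE '%,' + ID_SP + ',%'
            THEN 1 ELSE 0 END AS in_gm_list
FROM @t;
```

This assumes the list contains no spaces around the commas; otherwise the values would need trimming first.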

-- Try This
Declare @String1 as varchar(100)='512,112,208';
Declare @String2 as varchar(100)='512,338,112';
WITH FirstStringSplit(S1) AS
(
SELECT CAST('<x>' + REPLACE(@String1,',','</x><x>') + '</x>' AS XML)
)
,SecondStringSplit(S2) AS
(
SELECT CAST('<x>' + REPLACE(@String2,',','</x><x>') + '</x>' AS XML)
)
SELECT STUFF(
(
SELECT ',' + part1.value('.','nvarchar(max)')
FROM FirstStringSplit
CROSS APPLY S1.nodes('/x') AS A(part1)
WHERE part1.value('.','nvarchar(max)') IN(SELECT B.part2.value('.','nvarchar(max)')
FROM SecondStringSplit
CROSS APPLY S2.nodes('/x') AS B(part2)
)
FOR XML PATH('')
),1,1,'')

Gordon is correct that you should not do this. This ought to be done with the raw data. The following code will "go back to the raw data" and solve this with an easy INNER JOIN.
The CTEs will create derived tables (all the many rows you want to avoid) and check them for equality (without using indexes, which is one more reason to do this in advance):
DECLARE @tbl TABLE(ID INT IDENTITY,ID_SP VARCHAR(100),ID_GM VARCHAR(100));
INSERT INTO @tbl VALUES
('136,338,342','512,338,112')
,('512,112,208','512,338,112')
,('587,641,211','512,338,112');
WITH Splitted AS
(
SELECT t.*
,CAST('<x>' + REPLACE(t.ID_SP,',','</x><x>') + '</x>' AS xml) AS PartedSP
,CAST('<x>' + REPLACE(t.ID_GM,',','</x><x>') + '</x>' AS xml) AS PartedGM
FROM @tbl AS t
)
,SetSP AS
(
SELECT Splitted.ID
,Splitted.ID_SP
,x.value('text()[1]','int') AS SP_ID
FROM Splitted
CROSS APPLY PartedSP.nodes('/x') AS A(x)
)
,SetGM AS
(
SELECT Splitted.ID
,Splitted.ID_GM
,x.value('text()[1]','int') AS GM_ID
FROM Splitted
CROSS APPLY PartedGM.nodes('/x') AS A(x)
)
,BackToYourRawData AS --Here is the point you should do this in advance!
(
SELECT SetSP.ID
,SetSP.SP_ID
,SetGM.GM_ID
FROM SetSP
INNER JOIN SetGM ON SetSP.ID=SetGM.ID
AND SetSP.SP_ID=SetGM.GM_ID
)
SELECT ID
,STUFF((
SELECT ',' + CAST(rd2.SP_ID AS VARCHAR(10))
FROM BackToYourRawData AS rd2
WHERE rd.ID=rd2.ID
ORDER BY rd2.SP_ID
FOR XML PATH('')),1,1,'') AS CommonID
FROM BackToYourRawData AS rd
GROUP BY ID;
The result:
ID  CommonID
1   338
2   112,512

SSRS selecting results based on comma delimited list with like statement

This is based on my question "SSRS selecting results based on comma delimited list".
Is it possible to do this with a LIKE instead of an EQUALS, as below?
WHERE value like 'abc%','def%'
One thing to note is that the % is not included in the list.

One option is to split the passed-in SSRS variable (CSV) into a table and join on that:
DECLARE @tab TABLE (Col1 NVARCHAR(200))
INSERT INTO @tab (Col1)
VALUES (N'abc'),(N'def'),(N'xyz'),(N'nop'),(N'ghi'),(N'lmn')
DECLARE @substrings NVARCHAR(200) = 'abc,def,ghi'
;WITH cteX
AS( --dynamically split the string
SELECT Strings = y.i.value('(./text())[1]', 'nvarchar(4000)')
FROM
(
SELECT x = CONVERT(XML, '<i>'
+ REPLACE(@substrings, ',', '</i><i>')
+ '</i>').query('.')
) AS a CROSS APPLY x.nodes('i') AS y(i)
)
SELECT
T.*
FROM @tab T
INNER JOIN cteX X ON X.Strings = T.Col1
With the sample data, this returns the rows abc, def and ghi.
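The join above compares with equals. To get the prefix behaviour the question asks for (the % not being part of the list), the join condition can append the wildcard itself. A minimal sketch, assuming the list has already been split into a table; the @prefixes table variable and the sample rows are made up for illustration:

```sql
-- Hypothetical sample data.
DECLARE @tab TABLE (Col1 nvarchar(200));
INSERT INTO @tab (Col1) VALUES (N'abcdef'), (N'defghi'), (N'xyz');

-- Stands in for the split result (cteX above); values carry no wildcard.
DECLARE @prefixes TABLE (Prefix nvarchar(200));
INSERT INTO @prefixes (Prefix) VALUES (N'abc'), (N'def');

-- EXISTS keeps each row once even if several prefixes match it.
SELECT T.Col1
FROM @tab AS T
WHERE EXISTS (SELECT 1
              FROM @prefixes AS P
              WHERE T.Col1 LIKE P.Prefix + N'%');
```

If a list value can itself contain %, _ or [, it would have to be escaped before being used as a LIKE pattern.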

Separate words by group wise for each row in SQL

I have a string something like
No People,Day,side view,looking at camera,snow,mountain,tranquil scene,tranquility,Night,walking,water,Two Person,looking Down
And I have a table Group_words
| Group | Category |
|-------|----------|
| No People,One Person,Two Person,Three Person,Four Person,five person,medium group of people,large group of people,unrecognizable person,real people | People |
| Day,dusk,night,dawn,sunset,sunrise | Weather |
| looking at camera,looking way,looking sideways,looking down,looking up | View Angle |
I want to check every comma-separated word against the Group_words table and find the wrong combinations.
For the above string the result should be: "No People,Day,side view,looking at camera,snow,mountain,tranquil scene,tranquility,walking,water"
Night is removed because Day is already in the string.
Two Person is removed because No People is already in the string.
looking Down is removed because looking at camera is already in the string.
I know it's too complicated, but simply put, I want to remove from the string the words that conflict with another word of the same group in the Group_words table.

Wow, you should be re-designing your tables. Anyway, here is my attempt using Jeff Moden's DelimitedSplit8k.
I believe you now have this function since I answered one of your previous questions that also uses this function.
First, you want to split your @string input into separate rows. You should also split the Group_Words table.
After that you do a LEFT JOIN to get the matching categories. Then you eliminate the invalid words.
See it in action here: SQL Fiddle
DECLARE @string VARCHAR(8000)
SET @string = 'No People,Day,side view,looking at camera,snow,mountain,tranquil scene,tranquility,Night,walking,water,Two Person,looking Down'
-- Split @string variable
DECLARE @tbl_string AS TABLE(ItemNumber INT, Item VARCHAR(8000))
INSERT INTO @tbl_string
SELECT
ItemNumber, LTRIM(RTRIM(Item))
FROM dbo.DelimitedSplit8K(@string, ',')
-- Normalize Group_Words
DECLARE @tbl_grouping AS TABLE(Category VARCHAR(20), ItemNumber INT, Item VARCHAR(8000))
INSERT INTO @tbl_grouping
SELECT
w.Category, s.ItemNumber, LTRIM(RTRIM(s.Item))
FROM Group_Words w
CROSS APPLY dbo.DelimitedSplit8K(w.[Group], ',')s
;WITH Cte AS(
SELECT
s.ItemNumber,
s.Item,
g.category,
RN = ROW_NUMBER() OVER(PARTITION BY g.Category ORDER BY s.ItemNumber)
FROM @tbl_string s
LEFT JOIN @tbl_grouping g
ON g.Item = s.Item
)
SELECT STUFF((
SELECT ',' + Item
FROM Cte
WHERE
RN = 1
OR Category IS NULL
ORDER BY ItemNumber
FOR XML PATH(''), TYPE).value('.', 'VARCHAR(MAX)'),
1, 1, '')
OUTPUT:
| |
|--------------------------------------------------------------------------------------------------|
| No People,Day,side view,looking at camera,snow,mountain,tranquil scene,tranquility,walking,water |
If your @string input has more than 8000 characters, DelimitedSplit8K will slow down. You can use other splitters instead. Here is one taken from Aaron Bertrand's article.
CREATE FUNCTION dbo.SplitStrings_XML
(
@List NVARCHAR(MAX),
@Delimiter NVARCHAR(255)
)
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN
(
SELECT Item = y.i.value('(./text())[1]', 'nvarchar(4000)')
FROM
(
SELECT x = CONVERT(XML, '<i>'
+ REPLACE(@List, @Delimiter, '</i><i>')
+ '</i>').query('.')
) AS a CROSS APPLY x.nodes('i') AS y(i)
);
GO

Dynamic String replacement in a list for SQL 2005

I have a column that can have results like the following:
ER
ER,ER,ER
ER,ER,OR,OR
OR
OR,OR
OR,OR,OR,ER,ER
I am looking for a way to replace a string such as "ER,ER,ER,OR,OR" with just "ER,OR". No matter how many times ER or OR shows up, I want each displayed only once. Thank you.

Create a split function (or search for hundreds of variations on this site):
CREATE FUNCTION [dbo].[SplitStrings_XML]
(
@List NVARCHAR(MAX),
@Delimiter NVARCHAR(255)
)
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN
(
SELECT Item = y.i.value(N'./text()[1]', N'nvarchar(4000)')
FROM
(
SELECT x = CONVERT(XML, '<i>'
+ REPLACE(@List, @Delimiter, '</i><i>') + '</i>').query('.')
) AS a
CROSS APPLY x.nodes('i') AS y(i)
);
GO
Then you can do this:
DECLARE @x TABLE(foo VARCHAR(32));
INSERT @x SELECT 'ER'
UNION ALL SELECT 'ER,ER,ER'
UNION ALL SELECT 'ER,ER,OR,OR'
UNION ALL SELECT 'OR'
UNION ALL SELECT 'OR,OR'
UNION ALL SELECT 'OR,OR,OR,ER,ER';
;WITH g AS
(
SELECT x.foo, s.Item
FROM @x AS x
CROSS APPLY dbo.SplitStrings_XML(x.foo, ',') AS s
GROUP BY x.foo, s.Item
)
SELECT original = x.foo, new = STUFF((SELECT ',' + Item FROM g
WHERE foo = x.foo GROUP BY Item
FOR XML PATH(''),
TYPE).value(N'./text()[1]', N'nvarchar(max)'), 1, 1, '')
FROM @x AS x;
Results are almost exactly the way you want, except order from the initial string is not preserved:
original          new
----------------- ----------
ER                ER
ER,ER,ER          ER
ER,ER,OR,OR       ER,OR
OR                OR
OR,OR             OR
OR,OR,OR,ER,ER    ER,OR

A quick and dirty way would be like this... though it's limited to the example you provide in the question only. Conceptually there are better ways as provided in the comment to your question by KM.
update mytable
set mycolumn =
case when (mycolumn like '%er%' and not mycolumn like '%or%') then 'ER'
when (mycolumn like '%or%' and not mycolumn like '%er%') then 'OR'
when (mycolumn like '%er%' and mycolumn like '%or%') then 'ER,OR' end

How to sort the words of a single cell in an SQL table?

For example:
Pillars 101 in an apartment
Zuzu Durga International Hotel
Wyndham Garden Fresh Meadows
Need to sort the above as,
101 an apartment in Pillars
Durga Hotel International Zuzu
Fresh Garden Meadows Wyndham

Try this:
DECLARE @tbl TABLE(YourString VARCHAR(100));
INSERT INTO @tbl VALUES
('Pillars 101 in an apartment')
,('Zuzu Durga International Hotel')
,('Wyndham Garden Fresh Meadows');
SELECT CAST('<x>' + REPLACE((SELECT YourString AS [*] FOR XML PATH('')),' ','</x><x>') + '</x>' AS XML).query
('
for $x in /x
order by $x
return
concat($x/text()[1], " ")
').value('.','varchar(max)')
FROM @tbl;
The code will first transform your text into XML like <x>Pillars</x><x>101</x> ....
Then a FLWOR XQuery is used to return the text parts sorted.
The last call to .value() will return the sorted fragments as text again.
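The result (XQuery's order by compares by Unicode code point, so capitalised words sort before lowercase ones):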
101 Pillars an apartment in
Durga Hotel International Zuzu
Fresh Garden Meadows Wyndham
Final statement
This code is kind of an exercise. Your design is really bad and should be changed...

So there's nothing that you can do natively. If you want to sort the values just as a return value, i.e. not update the database itself, you can transform the results with either a stored procedure or perhaps a view.
So let's construct an answer.
Let's just assume you want to do it visually, for a single row. If you have SQL 2016 you can use STRING_SPLIT but SQL Fiddle doesn't, so I used a common UDF fnSplitString
http://sqlfiddle.com/#!6/7194d/2
SELECT value
FROM fnSplitString('Pillars 101 in an apartment', ' ')
WHERE RTRIM(value) <> '';
That gives me each word, split out. What about ordering it?
SELECT value
FROM fnSplitString('Pillars 101 in an apartment', ' ')
WHERE RTRIM(value) <> ''
ORDER BY value;
And if I want to do it for each row in the DB table I have? http://sqlfiddle.com/#!6/7194d/8
SELECT split.value
FROM [Data] d
CROSS APPLY dbo.fnSplitString(IsNull(d.Value,''), ' ') AS split
WHERE RTRIM(split.value) <> ''
ORDER BY value;
That's sort of helpful, except now all my words are jumbled. Let's go back to our original query and identify each row. Each row probably has an Identity column on it. If so, you've got your grouping there. If not, you can use ROW_NUMBER, such as:
SELECT
ROW_NUMBER() OVER(ORDER BY d.Value) AS [Identity] -- here, use identity instead of row_number
, d.Value
FROM [Data] d
If we then use this query as a subquery in our select, we get:
http://sqlfiddle.com/#!6/7194d/21
SELECT d.[Identity], split.value
FROM
(
SELECT
ROW_NUMBER() OVER(ORDER BY d.Value) AS [Identity] -- here, use identity instead of row_number
, d.Value
FROM [Data] d
) d
CROSS APPLY dbo.fnSplitString(IsNull(d.Value,''), ' ') AS split
WHERE RTRIM(split.value) <> ''
ORDER BY d.[Identity], value;
This query now sorts all words within each identity. But now you need to reconstruct those individual words back into a single string, right? For that, you can use STUFF together with FOR XML PATH. In my example I use a CTE because of SQL Fiddle limitations, but you could use a temp table, too.
WITH tempData AS (
SELECT d.[Identity], split.value
FROM
(
SELECT
ROW_NUMBER() OVER(ORDER BY d.Value) AS [Identity] -- here, use identity instead of row_number
, d.Value
FROM [Data] d
) d
CROSS APPLY dbo.fnSplitString(IsNull(d.Value,''), ' ') AS split
WHERE RTRIM(split.value) <> ''
)
SELECT grp.[Identity]
, STUFF((SELECT N' ' + [Value] FROM tempData WHERE [Identity] = grp.[Identity] ORDER BY Value FOR XML PATH(N''))
, 1, 1, N'')
FROM (SELECT DISTINCT [Identity] FROM tempData) AS grp
Here's the end result fiddle: http://sqlfiddle.com/#!6/7194d/27
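On SQL Server 2017 or later, the split, sort and reassemble steps above can be sketched with the built-in STRING_SPLIT (available from SQL Server 2016, compatibility level 130) and STRING_AGG, without a UDF. The @Data table variable, its Id column and the Sorted alias are assumptions made for illustration, not part of the fiddle.

```sql
-- Hypothetical stand-in for the [Data] table, with an identity per row.
DECLARE @Data TABLE (Id int IDENTITY(1,1), Value varchar(100));
INSERT INTO @Data (Value) VALUES
 ('Pillars 101 in an apartment'),
 ('Zuzu Durga International Hotel'),
 ('Wyndham Garden Fresh Meadows');

SELECT d.Id,
       -- Reassemble the words of each row in sorted order.
       STRING_AGG(s.value, ' ') WITHIN GROUP (ORDER BY s.value) AS Sorted
FROM @Data AS d
CROSS APPLY STRING_SPLIT(d.Value, ' ') AS s
WHERE s.value <> ''          -- skip empty fragments from repeated spaces
GROUP BY d.Id;
```

The sort order follows the collation of the split values, so whether "an" sorts before or after "Pillars" depends on whether that collation is case-sensitive.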
As expressed in comments already, this is not a common case for SQL. It's an unnecessary burden on the server. I would recommend pulling data out of SQL and sorting it through your programming language of choice; or making sure it's sorted as you insert it into the DB. I went through the exercise because I had a few minutes to kill :)

Already +1 on Shnugo's solution. I actually watch for his posts.
Just another option: use a parse UDF in concert with a CROSS APPLY.
Example
Select B.*
From YourTable A
Cross Apply ( Select Sorted=Stuff((Select ' ' +RetVal From [dbo].[tvf-Str-Parse](A.SomeCol,' ') Order By RetVal For XML Path ('')),1,1,'') )B
Returns
Sorted
101 an apartment in Pillars
Durga Hotel International Zuzu
Fresh Garden Meadows Wyndham
The UDF if Interested
CREATE FUNCTION [dbo].[tvf-Str-Parse] (@String varchar(max),@Delimiter varchar(10))
Returns Table
As
Return (
Select RetSeq = Row_Number() over (Order By (Select null))
,RetVal = LTrim(RTrim(B.i.value('(./text())[1]', 'varchar(max)')))
From (Select x = Cast('<x>' + replace((Select replace(@String,@Delimiter,'§§Split§§') as [*] For XML Path('')),'§§Split§§','</x><x>')+'</x>' as xml).query('.')) as A
Cross Apply x.nodes('x') AS B(i)
);
--Thanks Shnugo for making this XML safe
--Select * from [dbo].[tvf-Str-Parse]('Dog,Cat,House,Car',',')
