Compare all possible substrings of two strings in SQL Server 2008

I have a table with a column JobSkill whose value is ".net sap lead". Now the user enters the value "abap sap hana". I want to include a WHERE condition that matches 3 or more consecutive characters (spaces included) that appear in both strings. In the above scenario both strings share the substring "sap", so the condition should evaluate to true. Below is my query. Please help. Previously I was using CHARINDEX, but it does not serve the purpose. I am using SQL Server 2008.
SELECT Email_Id, JobSkill FROM Jobs
WHERE CHARINDEX(JobSkill, 'abap sap hana') > 0

You need to create a function that loops through every character position x of String1 except the last two, and checks whether String2 is LIKE '%' + SUBSTRING(String1, x, 3) + '%'.
So for the strings ('abcd acd g', 'ert acd'),
it should check
'ert acd' like '%abc%'
'ert acd' like '%bcd%'
'ert acd' like '%cd %'
'ert acd' like '%d a%'
and so on...
If LIKE returns true, break out of the loop.
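The loop described above can be checked outside the database with a short Python sketch before writing it as a T-SQL function. This is a minimal illustration, not the answer's code: `share_substring` is a hypothetical name, and Python's `in` is a plain substring test, whereas in T-SQL a window containing `%`, `_` or `[` would be read as a LIKE wildcard and would need escaping. Checking windows of exactly 3 characters is enough, because any shared run of 3 or more characters contains a shared run of exactly 3.

```python
def share_substring(s1: str, s2: str, min_len: int = 3) -> bool:
    """Return True if s1 and s2 share a run of min_len consecutive characters
    (spaces included)."""
    # Every start position of s1 except the last (min_len - 1) ones
    for x in range(len(s1) - min_len + 1):
        window = s1[x:x + min_len]  # plays the role of SUBSTRING(String1, x, 3)
        if window in s2:            # plays the role of String2 LIKE '%' + window + '%'
            return True             # first hit: break out of the loop
    return False


print(share_substring(".net sap lead", "abap sap hana"))  # True
print(share_substring("abcd acd g", "ert acd"))           # True
print(share_substring(".net", "java"))                    # False
```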

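Try something like this. It splits the input on spaces and returns the rows whose JobSkill contains any of the resulting words:
DECLARE @Input varchar(8000) = 'abap sap hana'
SELECT j.Email_Id
,j.JobSkill
FROM Jobs j
INNER JOIN (
SELECT LTRIM(RTRIM(m.n.value('.[1]', 'varchar(8000)'))) SearchString
FROM (
-- turn 'abap sap hana' into <XMLRoot><RowData>abap</RowData><RowData>sap</RowData>...</XMLRoot>
SELECT CAST('<XMLRoot><RowData>' + REPLACE(@Input, ' ', '</RowData><RowData>') + '</RowData></XMLRoot>' AS XML) AS x
) t
-- one row per <RowData> node, i.e. one row per word
CROSS APPLY x.nodes('/XMLRoot/RowData') m(n)
) T ON j.JobSkill LIKE '%' + T.SearchString + '%'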
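Note that this join tests whole space-separated words of the input rather than every 3-character window, so it answers a slightly different question; a JobSkill that contains several of the words is also returned once per matching word (SELECT DISTINCT avoids that). A rough Python model of what the join computes, with `matches_any_word` as a hypothetical helper and `in` standing in for LIKE:

```python
def matches_any_word(job_skill: str, user_input: str) -> bool:
    """Mimic the join above: split the input on single spaces, trim each piece,
    and test whether any piece occurs anywhere inside job_skill."""
    for token in user_input.split(" "):
        token = token.strip()
        if not token:
            # Two spaces in a row yield an empty <RowData/>; in SQL that becomes
            # LIKE '%%', which matches every non-NULL JobSkill. Skipped here.
            continue
        if token in job_skill:
            return True
    return False


print(matches_any_word(".net sap lead", "abap sap hana"))  # True ("sap")
print(matches_any_word("abap developer", "ab"))            # True, though "ab" is only 2 characters
```

The second call shows the difference from the 3-character rule: a short input word still matches.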

Related

How to optimize Impala query to combine LIKE with IN (literally or effectively)?

I need to optimize a query in Impala SQL that does partial string matches on about 60 different strings against two columns in a database of 50+ billion rows. The values in these two columns are encrypted and have to be decrypted with a user-defined function (in Java) to do the partial string match. So the query would look something like:
SELECT decrypt_function(column_A), decrypt_function(column_B) FROM myTable WHERE ((decrypt_function(column_A) LIKE '%' + partial_string_1 + '%') OR (decrypt_function(column_B) LIKE '%' + partial_string_1 + '%')) OR ((decrypt_function(column_A) LIKE '%' + partial_string_2 + '%') OR (decrypt_function(column_B) LIKE '%' + partial_string_2 + '%')) OR ... [up to partial_string_60]
What I really want is to decrypt the two column values I'm comparing once per row, compare them with all the partial strings, and then go on to the next row (for 55 billion rows). Is that possible somehow? Can a subquery assign the decrypted column value to a variable before it is compared with each of the 60 strings, and then go on to the next row?
Or is some other optimization possible? E.g. using 'IN', so ... WHERE (decrypt_function(column_A) IN ('%' + partial_string_1 + '%', '%' + partial_string_2 + '%', ... , '%' + partial_string_60 + '%')) OR (decrypt_function(column_B) IN ('%' + partial_string_1 + '%', '%' + partial_string_2 + '%', ... , '%' + partial_string_60 + '%'))
Thanks
Decrypt in a subquery. Also, regexp_like can take many patterns joined with OR (|), so you can check all alternatives in a single regular expression; you may need to split it into several function calls if the pattern string gets too long:
select colA, ColB
from
(--decrypt in the subquery
SELECT decrypt_function(column_A) as colA, decrypt_function(column_B) as ColB
FROM myTable
) as s
where
--put most frequent substrings first in the regexp
regexp_like(ColA,'partial_string_1|partial_string_2|partial_string_3') --add more
OR
regexp_like(ColB,'partial_string_1|partial_string_2|partial_string_3')
In Hive use this syntax:
where ColA rlike 'partial_string_1|partial_string_2|partial_string_3'
OR ColB rlike 'partial_string_1|partial_string_2|partial_string_3'
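When the 60 partial strings come from a list, the alternation can be generated rather than typed. A minimal Python sketch, assuming the strings should match literally: `escape_literal` and `build_alternations` are hypothetical helpers, and `max_len=1000` is an arbitrary placeholder, not a documented Impala or Hive limit. Escaping uses only backslash forms that Java and RE2 regex syntax both accept.

```python
REGEX_META = set(r"\.^$|?*+()[]{}")


def escape_literal(s: str) -> str:
    """Backslash-escape regex metacharacters so a partial string is matched literally."""
    return "".join("\\" + ch if ch in REGEX_META else ch for ch in s)


def build_alternations(substrings, max_len=1000):
    """Join escaped substrings into 'a|b|c' patterns, starting a new pattern
    whenever adding the next one would push it past max_len characters."""
    patterns, current = [], ""
    for s in substrings:
        piece = escape_literal(s)
        candidate = current + "|" + piece if current else piece
        if current and len(candidate) > max_len:
            patterns.append(current)  # close the full pattern
            current = piece
        else:
            current = candidate
    if current:
        patterns.append(current)
    return patterns


# Keep the most frequent substrings first, as suggested above.
for p in build_alternations(["sap", "hana", "c++"], max_len=10):
    print(f"regexp_like(colA, '{p}')")
```

Each printed pattern becomes one regexp_like (or rlike) call, combined with OR. In Hive and Impala string literals a backslash may itself act as an escape character, so the backslashes might need doubling when pasted into the query.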

Filter rows by whether a text column contains any words in a string in SQL

My SQL Server database table has a column named text that holds a long string.
The search list is a string of words separated by commas. I want to select the rows where the text column contains any one of the words in that string.
DECLARE @words_to_search nvarchar(50)
SET @words_to_search = 'apple, pear, orange'
SELECT *
FROM myTbl
WHERE text ??? --how to specify text contains @words_to_search
Thanks a lot in advance.
If you're running SQL Server 2016 or later, you can use STRING_SPLIT to convert the words to search into a single column table, and then JOIN that to your table using LIKE:
DECLARE @words_to_search nvarchar(50)
SET @words_to_search = 'apple,pear,orange'
SELECT *
FROM myTbl
JOIN STRING_SPLIT(@words_to_search, ',') ON text LIKE '%' + value + '%';
Note that as the query is written it will (for example) match apple within Snapple. You can work around that by making the JOIN condition a bit more complex:
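SELECT *
FROM myTbl t
JOIN STRING_SPLIT(@words_to_search, ',') v
ON t.text LIKE '%[^A-Za-z]' + v.value + '[^A-Za-z]%' -- word in the middle
OR t.text LIKE v.value + '[^A-Za-z]%' -- word at the start
OR t.text LIKE '%[^A-Za-z]' + v.value -- word at the end
OR t.text = v.value; -- the whole text is the word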
First, I would use EXISTS unless you want to return the matching word; a JOIN returns one row per matching word.
Second, if words are separated by spaces, you can do this with a single LIKE comparison:
select t.*
from t
where exists (select 1
from string_split(@words_to_search, ',') s
where ' ' + t.text + ' ' like '% ' + value + ' %'
);
For more generic separators, you can use:
select t.*
from t
where exists (select 1
from string_split(@words_to_search, ',') s
where ' ' + t.text + ' ' like '%[^A-Za-z]' + value + '[^A-Za-z]%'
);
Adjust the character class to whatever describes your separators.
Note that your list of words is separated by a comma-space, not just a comma. However, based on your description (not the sample data), I have only used a ',' for the separator.
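The padding trick can be modeled in Python to see which texts it accepts. This sketch is an adaptation, not the answer's code: `contains_any_word` is hypothetical, it trims each word so the comma-space list from the question also works, and `re.IGNORECASE` stands in for a case-insensitive collation.

```python
import re


def contains_any_word(text: str, words_to_search: str) -> bool:
    """Mirror ' ' + text + ' ' LIKE '%[^A-Za-z]' + value + '[^A-Za-z]%' for each word."""
    padded = " " + text + " "  # padding lets words at the start/end match too
    for word in words_to_search.split(","):
        word = word.strip()  # tolerate the comma-space list 'apple, pear, orange'
        if not word:
            continue
        # A non-letter must sit on both sides, so 'apple' does not match 'Snapple'.
        pattern = "[^A-Za-z]" + re.escape(word) + "[^A-Za-z]"
        # IGNORECASE assumes a case-insensitive collation, which is common in SQL Server.
        if re.search(pattern, padded, flags=re.IGNORECASE):
            return True
    return False


print(contains_any_word("I like apple pie", "apple, pear, orange"))    # True
print(contains_any_word("Snapple is a drink", "apple, pear, orange"))  # False
```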

SQL Server | Look for specific keywords in strings

I need your help.
I am trying to match a manually created lookup list of specific keywords against a fact table of comments. Purpose: an attempt to categorize these comments.
Example
comment: A lot more power than the equivalent from Audi.
keyword from keyword-list: Audi
category from keyword-list: competitor
I tried something like
SELECT
FC.comment_id, KWM.keyword, KWM.category
FROM
dbo.factcomments FC
INNER JOIN
(SELECT
keywordmatcher = '%[,. ]' + keyword + '[ .,]%',
keyword,
category
FROM
dbo.keywordlist) KWM ON FC.comment LIKE KWM.keywordmatcher
Maybe a bad example, but I only want specific matches --> no matches if the keyword is part of another word in the fact comments (e.g. 'part' but not 'apart').
Because my first try didn't match keywords at the beginning or end of strings, I did something really nasty:
SELECT
FC.comment_id, KWM.keyword, KWM.category
FROM
dbo.factcomments FC
INNER JOIN
(SELECT
keyword,
category
FROM
dbo.keywordlist) KWM ON FC.comment LIKE '%[,. ]' + KWM.keyword + '[ .,]%'
OR FC.comment LIKE KWM.keyword + '[ .,]%'
OR FC.comment LIKE '%[,. ]' + KWM.keyword
I know...
Besides that, I also want to detect comments where '!', '?', ''', '-' or '_' appears before or after these keywords. Is there any clever way to do so?
In fact, I want any comments where there are no word characters before or after the keyword; any other character is OK.
In the JOIN condition, REPLACE() all non-alphanumeric characters in FC.Comment with a space character, and surround the result with spaces. Something like this:
' '+REPLACE(FC.Comment, ...)+' '
Then do your LIKE comparison like this:
LIKE '% '+KWM.Keyword+' %'
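The REPLACE-and-pad approach can be sketched in Python, where one regular-expression substitution does the job of the `REPLACE(FC.Comment, ...)` step (T-SQL's REPLACE handles one search string per call). `keyword_matches` is a hypothetical name, and the keyword is assumed to consist of alphanumeric words separated by single spaces, since it is only lower-cased, not normalized.

```python
import re


def keyword_matches(comment: str, keyword: str) -> bool:
    """Replace every non-alphanumeric character with a space, pad both ends with
    a space, then look for ' keyword ' (the LIKE '% ' + keyword + ' %' test)."""
    normalized = " " + re.sub(r"[^A-Za-z0-9]", " ", comment) + " "
    # lower() mimics a case-insensitive collation
    return (" " + keyword.lower() + " ") in normalized.lower()


print(keyword_matches("A lot more power than the equivalent from Audi.", "Audi"))  # True
print(keyword_matches("It fell apart.", "part"))                                   # False
print(keyword_matches("'Audi'-fans_unite!", "Audi"))                               # True
```

Because '_' is outside [A-Za-z0-9], it counts as a separator here, which matches the requirement that '!', '?', ''', '-' and '_' next to a keyword are acceptable.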
A different approach:
declare @comment varchar(255)=concat(' ','A lot more power than the equivalent from Audi.',' ')
declare @keyword varchar(50)='Audi'
DECLARE @allowedStrings VARCHAR(100)
DECLARE @teststring VARCHAR(100)
SET @allowedStrings = '><()!?#_-.\/?!*&^%$#()~'
-- recursive CTE that returns one row per character of @allowedStrings
;WITH CTE AS
(
SELECT SUBSTRING(@allowedStrings, 1, 1) AS [String], 1 AS [Start], 1 AS [Counter]
UNION ALL
SELECT SUBSTRING(@allowedStrings, [Start] + 1, 1) AS [String], [Start] + 1, [Counter] + 1
FROM CTE
WHERE [Counter] < LEN(@allowedStrings)
)
-- remove each of those characters from @comment
SELECT @comment = REPLACE(@comment, CTE.[String], '') FROM CTE
Change @comment however you like and check the result:
SELECT
@comment as Comment , @keyword as KeyWord,
-- 1 if the first occurrence of @keyword is surrounded by spaces
iif(substring(@comment,PATINDEX(concat('%',@keyword,'%'),@comment)-1,len(@keyword)+2)=concat(' ',@keyword,' '),1,0) as isMatch
This idea is borrowed from https://stackoverflow.com/a/29162400/10735793

SQL Server check if value is substring inside isnull

I have a field in the UI that passes to a stored procedure either NULL (when the field is left empty) or a contract number (when it is filled in). Substrings of the contract number are accepted as input.
Inside the procedure, I need to filter the results by this parameter.
I need something similar to this:
SELECT * FROM tableName tn
WHERE
tn.ContractNumber LIKE ISNULL('%' + @contractNumber + '%', tn.ContractNumber)
What do you think is the best approach? The problem is that a condition like this does not return any rows.
Simply:
SELECT *
FROM tableName tn
WHERE tn.ContractNumber LIKE '%' + @contractNumber + '%'
OR @contractNumber IS NULL
You are really checking multiple conditions, so keeping them separate reads more intuitively (for most people, anyway).
I assume this is just a sample query, and you are not selecting * in reality...
Another one:
SELECT *
FROM tableName tn
WHERE tn.ContractNumber LIKE '%' + ISNULL(@contractNumber, '%') + '%'
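The two answers agree except for rows whose ContractNumber is itself NULL: the first returns them when no filter is given, the second never does, because NULL LIKE '%%%' is unknown. A small Python model of both WHERE clauses, using None for NULL and a literal substring test in place of LIKE (the function names are hypothetical):

```python
from typing import Optional


def filter_or_is_null(contract: Optional[str], search: Optional[str]) -> bool:
    """WHERE ContractNumber LIKE '%' + @contractNumber + '%' OR @contractNumber IS NULL"""
    if search is None:
        return True  # the IS NULL branch is true, even for rows whose contract is NULL
    if contract is None:
        return False  # NULL LIKE ... is unknown, so the row is filtered out
    return search in contract


def filter_isnull_wildcard(contract: Optional[str], search: Optional[str]) -> bool:
    """WHERE ContractNumber LIKE '%' + ISNULL(@contractNumber, '%') + '%'"""
    if contract is None:
        return False  # NULL LIKE '%%%' is unknown, so NULL contracts never qualify
    # With @contractNumber NULL the pattern is '%%%', which matches any non-NULL value
    return search is None or search in contract


print(filter_or_is_null(None, None), filter_isnull_wildcard(None, None))  # True False
```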

How to compare strings in SQL without using LIKE

I am using the SQL query shown below to compare AMCCode. But if I compare AMCCode '1' using the LIKE operator, it matches all the entries with AMCCode 1, 10, 11, 12, 13 ... 19, 21, 31, etc. I want to match only AMCCode 1. Please suggest how I can do it. The code is given below:
ISNULL(CONVERT(VARCHAR(10),PM.PA_AMCCode), '') like
(
CASE
WHEN @AMCCode IS NULL THEN '%'
ELSE '%'+@AMCCode+ '%'
END
)
This is the part of the code where I need to replace the LIKE operator with another operator, so that searching for AMCCode 1 returns only 1, not 10, 11, 12 ... Please help.
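I think you are looking for something like this, where @AMCCodes is a comma-separated list of codes:
where ',' + @AMCCodes + ',' like '%,' + cast(PM.PA_AMCCode as varchar(255)) + ',%'
This includes the delimiters in the comparison, so a code of 1 matches the list item ',1,' but not ',10,'.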
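Note that a better method is to split the string and use a join, something like this:
select t.*
from t cross apply
(select cast(code as int) as code
-- dbo.split stands for a user-defined split function; SQL Server 2008 has none built in (STRING_SPLIT arrived in 2016)
from dbo.split(@AMCCodes, ',') s(code)
) s
where t.AMCCode = s.code;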
This is better because under some circumstances, this version can make use of an index on AMCCode.
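The effect of wrapping both sides in delimiters, and of the split-and-compare alternative, can be checked with a small Python model. `code_in_list` and `code_in_list_split` are hypothetical helpers; Python's `in` on strings plays the role of LIKE with a leading and trailing `%`.

```python
def code_in_list(code: int, codes_csv: str) -> bool:
    """Delimiter trick: ',' + @AMCCodes + ',' LIKE '%,' + code + ',%'.
    Commas on both sides mean '1' can only match a whole list item."""
    return ("," + str(code) + ",") in ("," + codes_csv + ",")


def code_in_list_split(code: int, codes_csv: str) -> bool:
    """Split-and-compare, like the dbo.split join: compare whole integers."""
    return code in {int(part) for part in codes_csv.split(",") if part.strip()}


print("1" in "10")                         # True: why the original '%1%' pattern over-matches
print(code_in_list(1, "10"))               # False
print(code_in_list(11, "10,11,12"))        # True
print(code_in_list_split(1, "10, 11, 1"))  # True
```

The split version also tolerates spaces after the commas, because `int()` ignores surrounding whitespace.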
If you want to match the value exactly, you don't need '%' in the pattern. You can just use LIKE as below:
ISNULL(CONVERT(VARCHAR(10),PM.PA_AMCCode), '') like
(
CASE
WHEN @AMCCode IS NULL THEN '%'
ELSE @AMCCode
END
)
If @AMCCode is never NULL, you can drop the CASE expression (with a NULL pattern, LIKE matches no rows):
ISNULL(CONVERT(VARCHAR(10),PM.PA_AMCCode), '') like @AMCCode