Fastest way to compare substring property between two strings in SQL Server

Given two strings A and B, what is the fastest way to compare whether A is a substring of B or B is a substring of A?
A LIKE '%' + B + '%' OR B LIKE '%' + A + '%'
or
CHARINDEX(A,B) <> 0 OR CHARINDEX(B,A) <> 0
I believe it's the former because it doesn't calculate the location.
Question 1: Is there a faster way to do it? I want to minimize the number of times B has to be evaluated, because B is a string I get by processing another column value.
As an additional note,
Basically I want to do something as follows with a column, C
SELECT
CASE WHEN A LIKE Processing(C) THEN 0
WHEN A LIKE '%' + PROCESSING(C) + '%' OR PROCESSING(C) LIKE '%' + A + '%' THEN LEN(A) - LEN(PROCESSING(C))
END AS Score
FROM #table
where A and C are columns in the table #table. As can be seen, Processing(C) is called a huge number of times, because it is evaluated several times for each record.
Question 2: Should I put Processing(C) in a separate temp table and then run the substring check against that column, or continue with the same approach?

My guess is that charindex() and like would have similar performance in this case. Don't hesitate to test which is faster (and report back on the results so we can all learn).
However, this particular optimization probably won't make a difference to the overall query. Your question may be an example of premature optimization.
Once upon a time, I thought that like performed worse than the comparable string operation. However, like is optimized in many databases, including SQL Server. As an example of the optimization, like is able to use indexes (when there is no wildcard or the wildcard is at the end). charindex() does not use indexes. If you are looking for matches at the beginning of the respective strings, then your query could possibly take advantage of indexes.
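To make the index point concrete, here is a minimal sketch. The table dbo.Items, its column Name, and the index name are hypothetical and not part of the question; comparing the actual execution plans of the three queries is the way to confirm the behavior on your own data.

```sql
-- Hypothetical table and index, used only for illustration.
CREATE TABLE dbo.Items (Id INT IDENTITY PRIMARY KEY, Name VARCHAR(100) NOT NULL);
CREATE INDEX IX_Items_Name ON dbo.Items (Name);

-- Trailing wildcard only: the optimizer can seek on IX_Items_Name.
SELECT Id FROM dbo.Items WHERE Name LIKE 'abc%';

-- Leading wildcard: every Name value has to be examined (scan).
SELECT Id FROM dbo.Items WHERE Name LIKE '%abc%';

-- CHARINDEX wraps the column in a function call, so it also scans.
SELECT Id FROM dbo.Items WHERE CHARINDEX('abc', Name) > 0;
```

The substring checks in the question both have a leading wildcard, so they fall into the second and third categories.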
EDIT:
For your concern about PROCESSING(c), you might consider a subquery:
SELECT (CASE WHEN A LIKE Processing_C THEN 0
WHEN A LIKE '%' + Processing_C + '%' OR Processing_C LIKE '%' + A + '%'
THEN LEN(A) - LEN(Processing_C)
END) AS Score
FROM (select t.*, PROCESSING(C) as Processing_C
from #table t
) t
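The same idea can also be written with CROSS APPLY, which names the computed value once without nesting the whole query. This is only a sketch: dbo.Processing is a placeholder for whatever expression or function the question calls PROCESSING, and whether SQL Server actually evaluates it once per row should be checked in the execution plan rather than assumed.

```sql
SELECT CASE
           WHEN t.A LIKE p.Processing_C THEN 0
           WHEN t.A LIKE '%' + p.Processing_C + '%'
             OR p.Processing_C LIKE '%' + t.A + '%'
               THEN LEN(t.A) - LEN(p.Processing_C)
       END AS Score
FROM #table AS t
-- dbo.Processing is a hypothetical stand-in for the question's PROCESSING(C).
CROSS APPLY (SELECT dbo.Processing(t.C) AS Processing_C) AS p;
```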

Related

SQL Server Partial Match

I have two columns and I am trying to see whether there is a partial match between their strings. Column A has string: 0C000702AA-G and Column B has string S0C000702AB-DI. I did try:
CASE WHEN ColumnA LIKE '%' + ColumnB + '%' THEN '1' ELSE '0' END AS 'Match'
but it returns a 0. Is there a better way to see if there is almost a match?
Column A = 0C000702AA-G and Column B = S0C000702AB-DI. As you can see Column B is almost the same as A, B has prefix of 'S' and ends with 'AB-DI'. The result should return 1 because the part in the middle '0C000702AA' is the same both sides.
I just tested:
CASE WHEN '%' + ColumnA + '%' LIKE '%' + ColumnB + '%' THEN '1' ELSE '0' END AS 'Match'
Still returns 0
You can utilize DIFFERENCE function, which compares the SOUNDEX values of the two strings. If difference is 0, then no similarity. If the difference is 4, they are very similar. Read more about SOUNDEX and DIFFERENCE
CAVEAT: The comparison is based on how the strings sound. So it will not be very suitable for your needs, as your values are identifier-like strings that include digits.
DECLARE @table table(columnA CHAR(100), ColumnB CHAR(100))
INSERT INTO @table values
('0C000702AA-G','S0C000702AB-DI')
SELECT SOUNDEX(ColumnA) as columnASoundex, SOUNDEX(columnB) as ColumnBSoundex,
DIFFERENCE(ColumnA,ColumnB) as Similarity from @table
columnASoundex  ColumnBSoundex  Similarity
0000            S000            3
But, if you want to go for even detailed comparison, you can use a CLR stored procedure leveraging C# fuzzy matching libraries like fuzzystring. Also refer to SO post fuzzy matching in C#
UPDATE As OP confirmed, the above approach works only in some cases. So, OP has to figure out a better approach, which would suit all his needs.

Use a "case when in" expression where the list values are in a single field?

I'm using SQL Server and would like to check if today's day name is in a list of values in a single field/column.
An example of the column "start_days" contents is:
'Monday','Tuesday','Sunday'
'Thursday'
'Friday','Sunday'
'Tuesday','Sunday'
'Tuesday','Wednesday','Thursday','Friday'
The code I am trying to run on this is:
case
when datename(weekday,getdate()) in (start_days) then 1
else 0
end as today_flag
And the result is 0 for every row.
Am I doing something wrong here or is it just not possible to use a single field as a list of values in the statement?
As a starter: you should fix your data model and not store multiple values in a single column. Storing a list of values in a database column basically defeats the purpose of a relational database. Here is a related reading on that topic.
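As a rough sketch of what a fixed model could look like, assuming one row per schedule and weekday (the table and column names dbo.Schedule, dbo.ScheduleStartDay, schedule_id and day_name are hypothetical, not taken from the question):

```sql
-- Hypothetical normalized layout: one row per (schedule, weekday).
CREATE TABLE dbo.Schedule (schedule_id INT PRIMARY KEY);
CREATE TABLE dbo.ScheduleStartDay (
    schedule_id INT NOT NULL REFERENCES dbo.Schedule (schedule_id),
    day_name    VARCHAR(9) NOT NULL,   -- e.g. 'Monday'
    PRIMARY KEY (schedule_id, day_name)
);

-- The flag becomes a plain equality test instead of string matching.
SELECT s.schedule_id,
       CASE WHEN EXISTS (SELECT 1
                         FROM dbo.ScheduleStartDay AS d
                         WHERE d.schedule_id = s.schedule_id
                           AND d.day_name = DATENAME(weekday, GETDATE()))
            THEN 1 ELSE 0
       END AS today_flag
FROM dbo.Schedule AS s;
```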
That said, here is one option using pattern matching:
case
when ',' + start_days + ',' like '%,' + datename(weekday,getdate()) + ',%' then 1
else 0
end as today_flag
If you really have single quotes around values within the list, then we need to include them in the match:
case
when ',' + start_days + ',' like '%,''' + datename(weekday,getdate()) + ''',%' then 1
else 0
end as today_flag
If the values always are weekdays, this can be simplified since there is no risk of overlapping values:
case
when start_days like '%''' + datename(weekday,getdate()) + '''%' then 1
else 0
end as today_flag
The right answer to the question is fixing your data model. Storing multiple values like that will lead you to many issues, and you're stuck on one right now.
Until then, you could use the LIKE operator to get the desired results:
SELECT *, CASE WHEN
Value LIKE CONCAT('%', QUOTENAME(DATENAME(WEEKDAY,GETDATE()), ''''), '%')
THEN 1
ELSE 0
END
FROM
(
VALUES
('''Monday'',''Tuesday'',''Sunday'''),
('''Thursday'''),
('''Friday'',''Sunday'''),
('''Tuesday'',''Sunday'''),
('''Tuesday'',''Wednesday'',''Thursday'',''Friday''')
) T(Value)
Here is a db<>fiddle where you can see how it's working online.

SQL Server check if value is substring inside isnull

I have a field in UI interface that passes to a stored procedure a null value (when field is unfilled) or a contract number when it is filled. Substrings of the contract number are accepted as input.
Inside the procedure, I need to filter the results by this parameter.
I need something similar to this:
SELECT * FROM tableName tn
WHERE
tn.ContractNumber LIKE ISNULL('%' + @contractNumber + '%', tn.ContractNumber)
What do you think is the best approach? The problem is that a condition like this does not return any rows.
Simply:
SELECT *
FROM tableName tn
WHERE tn.ContractNumber LIKE '%' + @contractNumber + '%'
OR @contractNumber IS NULL
You are really checking multiple conditions, so keeping them separate reads more intuitively (for most people, anyway).
I assume this is just a sample query, and you are not selecting * in reality...
Another one:
SELECT *
FROM tableName tn
WHERE tn.ContractNumber LIKE '%' + ISNULL(@contractNumber, '%') + '%'
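The two answers are not quite equivalent when ContractNumber itself can be NULL. A small self-contained sketch of the difference, using a made-up table variable and sample values:

```sql
-- Sample data is invented for illustration only.
DECLARE @t TABLE (ContractNumber VARCHAR(20) NULL);
INSERT INTO @t VALUES ('AB-1001'), ('CD-2002'), (NULL);

DECLARE @contractNumber VARCHAR(20) = NULL;

-- OR form: with a NULL parameter every row qualifies, including the NULL row.
SELECT * FROM @t tn
WHERE tn.ContractNumber LIKE '%' + @contractNumber + '%'
   OR @contractNumber IS NULL;

-- ISNULL form: the pattern becomes '%%%', which matches any non-NULL string,
-- so the row whose ContractNumber is NULL is filtered out.
SELECT * FROM @t tn
WHERE tn.ContractNumber LIKE '%' + ISNULL(@contractNumber, '%') + '%';
```

If the column is declared NOT NULL, both forms return the same rows.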

Like Query in SQL taking time

So I've looked around to try to find some posts on this and there are many
Like Query 1 and Like Query 2 but none that address my specific question (that I could find).
I have two tables in which I have around 5000000+ records and I am returning Search result from these tables as :
SELECT A.ContactFirstName, A.ContactLastName
FROM Customer.CustomerDetails AS A WITH (nolock)
WHERE (A.ContactFirstName + ' ' + A.ContactLastName LIKE '%' + 'a' + '%')
UNION
SELECT C.ContactFirstName, C.ContactLastName
FROM Customer.Contacts AS C WITH (nolock)
WHERE (C.ContactFirstName + ' ' + C.ContactLastName LIKE '%' + 'a' + '%')
My problem is it is taking around 1 minute to execute.
For above query I am expecting result like :
Please suggest me the best practice to improve performance. Thanks in advance.
NOTE : No missing Indexes.
When you use LIKE '%xxx%', indexes are not used, which I think is why your query is slow. When you use LIKE 'xxx%', an index is used (if one exists on the column, of course). Another problem: you apply LIKE to a concatenated column, and I don't know whether an index is used in that case. Also, why test 'xxx' + ' ' + 'yyy' LIKE 'z%'? Just test 'xxx' LIKE 'z%'; it's the same. You can try modifying your query like this:
SELECT A.ContactFirstName, A.ContactLastName
FROM Customer.CustomerDetails AS A WITH (nolock)
WHERE A.ContactFirstName LIKE '%a%' or A.ContactLastName LIKE '%a%'
UNION
SELECT C.ContactFirstName, C.ContactLastName
FROM Customer.Contacts AS C WITH (nolock)
WHERE C.ContactFirstName LIKE 'a%'
Use CHARINDEX, which may improve the performance of the search. It returns the position of the first occurrence of the search string and does not look for any further matches.
DECLARE @Search VARCHAR(10)='a'
SELECT A.ContactFirstName, A.ContactLastName
FROM Customer.CustomerDetails AS A WITH (NOLOCK)
WHERE CHARINDEX(@Search,(A.ContactFirstName + ' ' + A.ContactLastName),0)>0

Is it possible to search for multiple terms in a column by using a LIKE statement?

I'm trying to understand if the above question is possible. I've been conceptually thinking about it, and basically what I'm looking to do is:
Specify keywords that may appear in a title. Let's use the two terms "Portfolio" and "Mike".
I'm hoping to generate a query that lets me find titles that contain either Portfolio or Mike. The two terms need not appear together.
For instance, if I have a title dubbed: "Portfolio A" and another title "Mike's favorite" I'd like both of these titles to be returned.
The issue I've encountered with using a LIKE statement is the following:
WHERE 1=1
and rpt_title LIKE ''%'+@report_title+'%'''
If I were to input: 'Portfolio,Mike' it would search for the occurrence of just that within a title.
EDIT: I should have been a bit more clear. I believe it's necessary for me to input my variable as 'Portfolio, Mike' in order for it to find the multiple values. Is this possible?
I'm assuming you could maybe use a charindex with a substring and a replace?
Yep, multiple Like statements with OR will work just fine -- just make sure you use the correct parentheses:
SELECT ...
FROM ...
WHERE 1=1
and (rpt_title LIKE '%Portfolio%'
or rpt_title LIKE '%Mike%')
However, I might suggest you look into using a full-text search.
http://msdn.microsoft.com/en-us/library/ms142571.aspx
I can propose a solution where you could specify any number of masks, without using multiple LIKE -
DECLARE @temp TABLE (st VARCHAR(100))
INSERT INTO @temp (st)
VALUES ('Portfolio photo'),('- Mike'),('blank'),('else'),('est')
DECLARE @delims VARCHAR(30)
SELECT @delims = '|Portfolio|Mike|' -- %Portfolio% OR %Mike% OR etc.
SELECT t.st
FROM @temp t
CROSS JOIN (
SELECT substr =
SUBSTRING(
@delims,
number + 1,
CHARINDEX('|', @delims, number + 1) - number - 1)
FROM [master].dbo.spt_values n
WHERE [type] = N'P'
AND number <= LEN(@delims) - 1
AND SUBSTRING(@delims, number, 1) = '|'
) s
WHERE t.st LIKE '%' + s.substr + '%'
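For the comma-separated input described in the question ('Portfolio, Mike'), a similar result can be sketched with STRING_SPLIT. This is a different technique from the spt_values approach above and assumes SQL Server 2016 or later with database compatibility level 130 or higher; the variable name @report_title is taken from the question, the sample rows from the answer above.

```sql
DECLARE @temp TABLE (st VARCHAR(100));
INSERT INTO @temp (st)
VALUES ('Portfolio photo'), ('- Mike'), ('blank'), ('else'), ('est');

DECLARE @report_title VARCHAR(200) = 'Portfolio, Mike';

-- EXISTS avoids returning a row twice when it matches more than one keyword.
SELECT t.st
FROM @temp AS t
WHERE EXISTS (
    SELECT 1
    FROM STRING_SPLIT(@report_title, ',') AS s
    -- Trim the space that follows the comma in 'Portfolio, Mike'.
    WHERE t.st LIKE '%' + LTRIM(RTRIM(s.value)) + '%'
);
```

Note that an empty keyword (for example from a trailing comma) produces the pattern '%%' and matches every row, so the input may need validating first.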