SQL query optimization (Search)

I am trying to optimize a stored procedure which is slow at the moment. It takes a few parameters to search on, any of which can be null. The query inside the SP looks like this:
SELECT
*some fields from the table*
FROM
[hrmCase]
WHERE
BrkId = @BrkId
AND
(ChannelId IN ('TO','TD'))
AND
case
when @PsId is null then 1
when (@PsId is not null) and ((SELECT SUBSTRING(UPPER(DATENAME(MONTH, Created)),1,3) + CAST(PSId AS varchar) PSId) like ('%' + @PsId + '%')) then 1 else 0
end = 1
AND
case
when @ACaseId is null then 1
when (@ACaseId is not null) and (AId like ('%' + @ACaseId + '%')) then 1 else 0
end = 1
AND
case
when @DateCreated is null then 1
when (@DateCreated is not null) and (dbo.StripTime(Created) = dbo.StripTime(@DateCreated)) then 1 else 0
end = 1
AND
case
when @Clients is null then 1
when (@Clients is not null) and (Client like ('%' + @Clients + '%')) then 1 else 0
end = 1
Is this the best way to do it, or should I build a dynamic query based on the input parameters, like below?
Declare @SQLQuery AS NVarchar(4000)
Declare @ParamDefinition AS NVarchar(2000)
Set @SQLQuery = 'SELECT *some fields from the table* FROM [hrmCase] WHERE (1=1)'
If @PsId Is Not Null
Set @SQLQuery = @SQLQuery + ' And (SELECT SUBSTRING(UPPER(DATENAME(MONTH, Created)),1,3) + CAST(PSId AS varchar) PSId) like (''%''' + @PsId + '''%'')'
etc..
Which of the above two approaches is deemed more professional, and which will be quicker? Please suggest if there is a better way of doing the same thing.
Cheers,
DS

SQL Server is very bad at dealing with optional parameters like this. We've had many cases where we get such a query working acceptably, and then four months later something changes and it turns into table scans (or worse: I've seen performance far worse than simple table scans) all over again. (We have a 1.5 TB database on SSD on mirrored 40-core servers.) The best solution we've found is to use separate stored procedures or dynamic SQL. Many "experts" frown on dynamic SQL, but the reality is that in any decent environment these days, recompilation of dynamic SQL is insignificant in terms of CPU use and performance delay, and it eliminates the kinds of issues you're seeing, because you remove the conditionals in the WHERE clause that confuse the query optimizer.
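As a rough sketch of that approach applied to your procedure (the parameter types here are assumptions; note that the values are still passed to sp_executesql as parameters rather than concatenated into the string, which avoids injection and lets each predicate combination get its own cached plan):
Declare @SQLQuery AS NVarchar(4000)
Declare @ParamDefinition AS NVarchar(2000)
Set @SQLQuery = N'SELECT *some fields from the table*
    FROM [hrmCase]
    WHERE BrkId = @BrkId AND ChannelId IN (''TO'',''TD'')'
-- append only the predicates whose parameters were actually supplied
If @ACaseId Is Not Null
    Set @SQLQuery = @SQLQuery + N' AND AId LIKE ''%'' + @ACaseId + ''%'''
If @Clients Is Not Null
    Set @SQLQuery = @SQLQuery + N' AND Client LIKE ''%'' + @Clients + ''%'''
-- the parameter types below are assumed; use your actual column types
Set @ParamDefinition = N'@BrkId int, @ACaseId varchar(50), @Clients varchar(100)'
Exec sp_executesql @SQLQuery, @ParamDefinition,
    @BrkId = @BrkId, @ACaseId = @ACaseId, @Clients = @Clients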
The leading wildcards will also result in table scans in every case. Full-text indexing can do this type of search much more efficiently than regular SQL; note that some editions of SQL Server support full-text indexing and queries and some do not.
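For illustration, a full-text version of the Client filter might look like this (a sketch only: it assumes a full-text index already exists on hrmCase(Client), and CONTAINS matches whole words or prefixes rather than arbitrary substrings, so it is not an exact drop-in for LIKE '%...%'):
-- assumes a full-text catalog and index on hrmCase(Client) have already been created
SELECT *some fields from the table*
FROM [hrmCase]
WHERE BrkId = @BrkId
  AND ChannelId IN ('TO','TD')
  AND CONTAINS(Client, @Clients)  -- word/prefix match, not arbitrary substring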

Related

SQL Server Partial Match

I have two columns and I am trying to see if there is a partial match between the two strings. Column A has the string 0C000702AA-G and Column B has the string S0C000702AB-DI. I did try:
CASE WHEN ColumnA LIKE '%' + ColumnB + '%' THEN '1' ELSE '0' END AS 'Match'
but it returns a 0. Is there a better way to see if there is almost a match?
Column A = 0C000702AA-G and Column B = S0C000702AB-DI. As you can see, Column B is almost the same as A: B has a prefix of 'S' and ends with 'AB-DI'. The result should return 1 because the part in the middle, '0C000702AA', is the same on both sides.
I just tested:
CASE WHEN '%' + ColumnA + '%' LIKE '%' + ColumnB + '%' THEN '1' ELSE '0' END AS 'Match'
Still returns 0
You can utilize the DIFFERENCE function, which compares the SOUNDEX values of the two strings. If the difference is 0, there is no similarity; if the difference is 4, they are very similar. Read more about SOUNDEX and DIFFERENCE.
CAVEAT: The comparison is based on how the strings sound, so it will not be very suitable for your needs, as you have an identifier kind of value that includes digits.
DECLARE @table table(columnA CHAR(100), ColumnB CHAR(100))
INSERT INTO @table values
('0C000702AA-G','S0C000702AB-DI')
SELECT SOUNDEX(ColumnA) as columnASoundex, SOUNDEX(columnB) as ColumnBSoundex,
DIFFERENCE(ColumnA,ColumnB) as Similarity from @table
columnASoundex  ColumnBSoundex  Similarity
0000            S000            3
But if you want an even more detailed comparison, you can use a CLR stored procedure leveraging C# fuzzy-matching libraries like fuzzystring. Also refer to the SO post on fuzzy matching in C#.
UPDATE: As the OP confirmed, the above approach works only in some cases, so the OP will have to figure out a better approach that suits all of his needs.

Will Using Short-Circuiting in WHERE Clause Improve Speed

Use case: I am going to be using SQL Server to retrieve values from a large table (1,000,000+ rows) where many different columns can be used as filter criteria, some more frequently used than others.
Questions
Would it be faster to utilize short-circuiting in the WHERE clause so that fewer comparisons are done?
Should the most commonly used criteria be filtered on first, to do even fewer comparisons?
Should the most commonly used criteria be indexed?
Example
No short circuiting
SELECT value
FROM AssignmentTable
WHERE (criteriaOne = <criteriaOneValue> OR criteriaOne IS NULL)
AND (criteriaTwo = <criteriaTwoValue> OR criteriaTwo IS NULL)
AND (criteriaThree = <criteriaThreeValue> OR criteriaThree IS NULL)
AND ... for all criteria (roughly 15)
With short circuiting
SELECT value
FROM AssignmentTable
WHERE 1 =
CASE
WHEN (criteriaOne = <criteriaOneValue> OR criteriaOne IS NULL) THEN
CASE
WHEN (criteriaTwo = <criteriaTwoValue> OR criteriaTwo IS NULL) THEN
CASE
WHEN (criteriaThree = <criteriaThreeValue> OR criteriaThree IS NULL) THEN 1
ELSE 0
END
ELSE 0
END
ELSE 0
END
The pattern for doing this without dynamic SQL in SQL Server is to use OPTION (RECOMPILE) so that the unneeded predicates are pruned before the query optimizer generates a query plan.
EG:
SELECT value
FROM AssignmentTable
WHERE (Column1 = @column1 OR @column1 IS NULL)
AND (Column2 = @column2 OR @column2 IS NULL)
AND (Column3 = @column3 OR @column3 IS NULL)
AND ... for all criteria (roughly 15)
OPTION (RECOMPILE)
See the classic Dynamic Search Conditions in T-SQL for a complete discussion of the alternatives.

Like Query in SQL taking time

So I've looked around to try to find some posts on this, and there are many (e.g. Like Query 1 and Like Query 2), but none that address my specific question (that I could find).
I have two tables with around 5,000,000+ records each, and I am returning search results from them like this:
SELECT A.ContactFirstName, A.ContactLastName
FROM Customer.CustomerDetails AS A WITH (nolock)
WHERE (A.ContactFirstName + ' ' + A.ContactLastName LIKE '%' + 'a' + '%')
UNION
SELECT C.ContactFirstName, C.ContactLastName
FROM Customer.Contacts AS C WITH (nolock)
WHERE (C.ContactFirstName + ' ' + C.ContactLastName LIKE '%' + 'a' + '%')
My problem is that it takes around 1 minute to execute.
Please suggest the best practice to improve performance. Thanks in advance.
NOTE: No missing indexes.
When you use LIKE '%xxx%', indexes are not used; that is why your query is slow, I think. When you use LIKE 'xxx%', an index can be used (if an index exists on the column, of course). Another problem: you do the LIKE on a concatenated column, and I don't know whether an index can be used in that case. And why do 'xxx' + ' ' + 'yyy' LIKE 'z%' at all, when 'xxx' LIKE 'z%' is the same thing? You can try to modify your query like this:
SELECT A.ContactFirstName, A.ContactLastName
FROM Customer.CustomerDetails AS A WITH (nolock)
WHERE A.ContactFirstName LIKE '%a%' or A.ContactLastName LIKE '%a%'
UNION
SELECT C.ContactFirstName, C.ContactLastName
FROM Customer.Contacts AS C WITH (nolock)
WHERE C.ContactFirstName LIKE 'a%'
Use CHARINDEX, which can improve the performance of the search: it scans the string for the first occurrence of the search value and stops as soon as a match is found, rather than searching for any further matches.
DECLARE @Search VARCHAR(10)='a'
SELECT A.ContactFirstName, A.ContactLastName
FROM Customer.CustomerDetails AS A WITH (NOLOCK)
WHERE CHARINDEX(@Search,(A.ContactFirstName + ' ' + A.ContactLastName),0)>0 -- CHARINDEX is 1-based and returns 0 when not found, so > 0 means "found anywhere"

Fastest way to compare substring property between two strings in sql server

Given two strings A and B, what is the fastest way to compare whether A is a substring of B or B is a substring of A?
A LIKE '%' + B + '%' OR B LIKE '%' + A + '%'
or
CHARINDEX(A,B) <> 0 OR CHARINDEX(B,A) <> 0
I believe it's the former, because it doesn't have to calculate the location.
Question 1: Is there a faster way to do it? I want to minimize the number of times B has to be used, as B is a string I get by processing another column value.
As an additional note, basically I want to do something as follows with a column, C:
SELECT
CASE WHEN A LIKE Processing(C) THEN 0
WHEN A LIKE '%' + PROCESSING(C) + '%' OR PROCESSING(C) LIKE '%' + A + '%' THEN LEN(A) - LEN(PROCESSING(C))
END AS Score
FROM #table
where A and C are columns in the table #table. As can be seen, the number of times I am calling Processing(C) is huge, as it is evaluated for each record.
Question 2: Should I put Processing(C) it in a separate temp table and then run substring check against that column or continue with the same approach.
My guess is that charindex() and like would have similar performance in this case. Don't hesitate to test which is faster (and report back on the results so we can all learn).
However, this particular optimization probably won't make a difference to the overall query. Your question may be an example of premature optimization.
Once upon a time, I thought that like performed worse than the comparable string operation. However, like is optimized in many databases, including SQL Server. As an example of the optimization, like is able to use indexes (when there is no wildcard or the wildcard is at the end). charindex() does not use indexes. If you are looking for matches at the beginning of the respective strings, then your query could possibly take advantage of indexes.
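For instance (a sketch, reusing the Customer.Contacts table from the previous question and assuming an index exists on ContactLastName):
-- sargable: no leading wildcard, so the index on ContactLastName can be seeked
SELECT ContactLastName FROM Customer.Contacts WHERE ContactLastName LIKE 'Smi%'
-- not sargable: the leading wildcard forces a scan of every row
SELECT ContactLastName FROM Customer.Contacts WHERE ContactLastName LIKE '%Smi%'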
EDIT:
For your concern about PROCESSING(c), you might consider a subquery:
SELECT (CASE WHEN A LIKE Processing_C THEN 0
WHEN A LIKE '%' + Processing_C + '%' OR Processing_C LIKE '%' + A + '%'
THEN LEN(A) - LEN(Processing_C)
END) AS Score
FROM (select t.*, PROCESSING(C) as Processing_C
from #table t
) t

Speed up this Sybase stored procedure

I have this stored procedure which I am using to populate a user table. It seems slow: it takes around 6 seconds on average to return the records. Is there anything I can do to tweak this sproc to make it faster?
CREATE PROCEDURE dbo.usmGetPendingAuthorizations
(
@empCorpId char(8) = null,
@empFirstName char(30) = null,
@empLastName char(30) = null,
@accessCompletionStatus char(20) = null,
@ownerCorpId char(8) = null,
@reqCorpId char(8) = null,
@reqDate datetime = null,
@rowCount int = 100
)
AS BEGIN
SET ROWCOUNT @rowCount
SELECT
UPPER(LTRIM(RTRIM(pa.RequestorCorpId))) AS ReqCorpId,
UPPER(LTRIM(RTRIM(pa.AccessCompletionStatus))) AS AccessCompletionStatus,
UPPER(LTRIM(RTRIM(pa.Comment))) AS ReqComment,
UPPER(LTRIM(RTRIM(pa.ValidLoginInd))) AS ValidLoginInd,
UPPER(LTRIM(RTRIM(pa.OwnerCorpId))) AS OwnerCorpId,
UPPER(LTRIM(RTRIM(pa.UserTypeCode))) AS UserTypeCode,
UPPER(LTRIM(RTRIM(pa.SelectMethod))) AS SelectMethod,
pa.ExpirationDate AS ExpirationDate,
pa.RequestorDate AS ReqDate,
pa.BeginDate AS BeginDate,
pa.EndDate AS EndDate,
UPPER(LTRIM(RTRIM(pa.UserGroupTypeCode))) AS UserGroupTypeCode,
pa.SubsidiaryId AS SubsidiaryId,
UPPER(LTRIM(RTRIM(pa.EmployeeCorpId))) AS EmpCorpId,
emp.empKeyId AS EmpKeyId,
LTRIM(RTRIM(emp.firstName)) AS EmpFirstName,
LTRIM(RTRIM(emp.lastName)) AS EmpLastName
FROM
dbo.PendingAuthorization AS pa JOIN capmark..EmployeeDataExtract AS emp
ON
UPPER(LTRIM(RTRIM(pa.EmployeeCorpId))) = UPPER(LTRIM(RTRIM(emp.corporateId)))
WHERE
UPPER(LTRIM(RTRIM(pa.EmployeeCorpId))) LIKE ISNULL(UPPER(LTRIM(RTRIM(@empCorpId))), '%')
AND UPPER(LTRIM(RTRIM(emp.firstName))) LIKE ISNULL('%' + UPPER(LTRIM(RTRIM(@empFirstName))) + '%', '%')
AND UPPER(LTRIM(RTRIM(emp.lastName))) LIKE ISNULL('%' + UPPER(LTRIM(RTRIM(@empLastName))) + '%', '%')
AND pa.AccessCompletionStatus LIKE ISNULL(UPPER(LTRIM(RTRIM(@accessCompletionStatus))), '%')
AND pa.OwnerCorpId LIKE ISNULL(UPPER(LTRIM(RTRIM(@ownerCorpId))), '%')
AND pa.RequestorCorpId LIKE ISNULL(UPPER(LTRIM(RTRIM(@reqCorpId))), '%')
AND DATEDIFF(dd, pa.RequestorDate, CONVERT(VARCHAR(10), ISNULL(@reqDate, pa.RequestorDate), 101)) = 0
SET ROWCOUNT 0
END
The main problem is the liberal use of functions, especially in the join. Where functions are used in this way, Sybase cannot take advantage of indexes on those fields. Take, for example, the join:
ON
UPPER(LTRIM(RTRIM(pa.EmployeeCorpId))) = UPPER(LTRIM(RTRIM(emp.corporateId)))
Are all those trims and uppers really needed?
If you have dirty data stored (mixed case, with some leading and some trailing spaces), I suggest that you tighten up the way the data are stored and/or updated so that such data can't get in. Carry out a one-time scrub of the data to make all corporate IDs uppercase with no trailing or leading spaces.
Once you've got clean data, you can add an index on the corporateId column in the EmployeeDataExtract table (or rebuild it if one already exists) and change the join to:
ON
pa.EmployeeCorpId = emp.corporateId
If you really can't ensure clean data in the PendingAuthorization table then you'd have to leave the functions wrapping on that side of the join, but at least the index on the emp table will be available for the optimiser to consider.
The use of LIKE with leading edge wildcards makes indexes unusable, but that may be unavoidable in your case.
It looks like the PendingAuthorization.RequestorDate field is used to select data for only one date: the one supplied in @reqDate. You could transform that part of the WHERE clause into a range query, and then an index on the date field could be used.
To do that, you would take just the date part of @reqDate (ignoring the time of day) and then derive 'date+1' from it; those two values bound the range. Whether this helps much depends on how many distinct RequestorDate days are present in the PendingAuthorization table.
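Roughly like this (a sketch in the same T-SQL dialect; @reqDateStart and @reqDateEnd are hypothetical local variables, and the CONVERT(..., 101) round-trip mirrors the date-truncation idiom already used in the procedure):
DECLARE @reqDateStart datetime, @reqDateEnd datetime
-- strip the time-of-day portion, then derive date+1
SELECT @reqDateStart = CONVERT(datetime, CONVERT(varchar(10), @reqDate, 101))
SELECT @reqDateEnd = DATEADD(dd, 1, @reqDateStart)
-- then replace the DATEDIFF predicate in the WHERE clause with a sargable range test:
-- AND (@reqDate IS NULL
--      OR (pa.RequestorDate >= @reqDateStart AND pa.RequestorDate < @reqDateEnd))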
Although you never seem to accept any answers, I will try to help you :).
First, I would revise the WHERE clause. Instead of your LIKEs:
UPPER(LTRIM(RTRIM(pa.EmployeeCorpId))) LIKE ISNULL(UPPER(LTRIM(RTRIM(@empCorpId))), '%')
I would use this:
(UPPER(LTRIM(RTRIM(pa.EmployeeCorpId))) LIKE UPPER(LTRIM(RTRIM(@empCorpId)))) OR (@empCorpId IS NULL)
So, can you try the following WHERE clause instead of yours to see if there is any difference in performance?
WHERE
(UPPER(LTRIM(RTRIM(pa.EmployeeCorpId))) LIKE UPPER(LTRIM(RTRIM(@empCorpId))) OR @empCorpId IS NULL)
AND (UPPER(LTRIM(RTRIM(emp.firstName))) LIKE '%' + UPPER(LTRIM(RTRIM(@empFirstName))) + '%' OR @empFirstName IS NULL)
AND (UPPER(LTRIM(RTRIM(emp.lastName))) LIKE '%' + UPPER(LTRIM(RTRIM(@empLastName))) + '%' OR @empLastName IS NULL)
AND (pa.AccessCompletionStatus LIKE UPPER(LTRIM(RTRIM(@accessCompletionStatus))) OR @accessCompletionStatus IS NULL)
AND (pa.OwnerCorpId LIKE UPPER(LTRIM(RTRIM(@ownerCorpId))) OR @ownerCorpId IS NULL)
AND (pa.RequestorCorpId LIKE UPPER(LTRIM(RTRIM(@reqCorpId))) OR @reqCorpId IS NULL)
AND (DATEDIFF(dd, pa.RequestorDate, CONVERT(VARCHAR(10), ISNULL(@reqDate, pa.RequestorDate), 101)) = 0)
Second, in general, SELECT works faster if the columns referenced in the WHERE clause are indexed appropriately.
Then, the DATEDIFF along with CONVERT in the WHERE clause is not going to speed up your query either.
But the main question is: how many rows are in the joined tables? Six seconds might not be that bad. You can check and play with the query plan to find any potential bottlenecks.
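For example, in Sybase ASE you can inspect the plan and I/O for one call like this (a sketch; showplan and statistics io are standard session options, and the parameter value is made up):
set showplan on
set statistics io on
go
exec dbo.usmGetPendingAuthorizations @empLastName = 'SMITH'
go
set showplan off
set statistics io off
go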