I've been reading around and found that using LIKE causes a big slowdown in queries.
A workmate recommended we use
SELECT Name
FROM mytable a
WHERE a.Name IN (SELECT Name
                 FROM mytable
                 WHERE Name LIKE '%' + ISNULL(@Name, N'') + '%'
                 GROUP BY Name)
in lieu of
SELECT Name
FROM mytable a
WHERE a.Name LIKE '%' + ISNULL(@Name, N'') + '%'
Now I'm no SQL expert and I don't really understand the inner workings of these statements. Is this a better option, worth the effort of typing a few extra characters with each LIKE statement? Is there an even better (and easier to type) alternative?
There are a couple of performance issues to address...
Don't Access the Same Table More Than Once, If Possible
Don't use a subquery for criteria that can be applied without referencing additional copies of the same table. Using a copy of the table is acceptable if you need data from it via aggregate functions (MAX, MIN, etc.), though analytic functions (ROW_NUMBER, RANK, etc.) might be more accommodating (assuming they're supported).
Don't Compare What You Don't Need To
If your parameter is NULL, and that means you want any value for the columns you are comparing against, don't include the filter criteria at all. A statement like this:
WHERE a.Name LIKE '%' + ISNULL(@Name, N'') + '%'
...guarantees the optimizer will have to compare values for the Name column, wildcards or not. Worse still in the case of LIKE, wildcarding the left side of the evaluation ensures that an index can't be used even if one is present on the column being searched.
A better performing approach would be:
IF @Name IS NOT NULL
BEGIN
    SELECT ...
    FROM ...
    WHERE a.name LIKE '%' + @Name + '%'
END
ELSE
BEGIN
    SELECT ...
    FROM ...
END
Well-performing SQL is all about tailoring the query to exactly what you need, which is why you should be considering dynamic SQL when you have queries with two or more independent criteria.
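As an illustration, a minimal sketch of that dynamic-SQL pattern (SQL Server 2008+ syntax; the query text and the @pName parameter are assumptions, not from the original):
-- Append a predicate only when the criterion was actually supplied,
-- so the optimizer never sees a dead ISNULL(...) comparison.
DECLARE @sql nvarchar(max) = N'SELECT Name FROM mytable WHERE 1 = 1';

IF @Name IS NOT NULL
    SET @sql += N' AND Name LIKE ''%'' + @pName + ''%''';
-- ...append further optional criteria the same way...

EXEC sp_executesql @sql, N'@pName nvarchar(100)', @pName = @Name;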
Use The Right Tool
The LIKE operator isn't very efficient at searching text when you're checking for the existence of a string within text data. Full Text Search (FTS) technology was designed to address the shortcomings:
IF @Name IS NOT NULL
BEGIN
    SELECT ...
    FROM ...
    WHERE CONTAINS(a.name, @Name)
END
ELSE
BEGIN
    SELECT ...
    FROM ...
END
Always Test & Compare
I agree with LittleBobbyTables - the decision ultimately relies on checking the query/execution plan for all the alternatives, because table design and data can affect optimizer decisions and performance. In SQL Server, the plan with the lowest subtree cost is the most efficient, but that can change over time if the table statistics and indexes aren't maintained.
Simply compare the execution plans and you should see the difference.
I don't have your exact data, but I ran the following queries against a SQL Server 2005 database of mine (yes, it's nerdy):
SELECT UnitName
FROM Units
WHERE (UnitName LIKE '%Space Marine%')
SELECT UnitName
FROM Units
WHERE UnitName IN (SELECT UnitName
                   FROM Units
                   WHERE UnitName LIKE '%Space Marine%'
                   GROUP BY UnitName)
Here were my execution plan results (screenshots not reproduced here): your co-worker's suggestion adds a nested loop and a second clustered index scan to my query. Your mileage may vary, but definitely check the execution plans to see how they compare. I can't imagine how it would be more efficient.
Unless IIQR is some smaller table that indexes the names somehow (and is not the original table being queried here from the start), I don't see how that longer version helps at all; it's doing the exact same thing, but adding an extra step of creating a result set which is then used in an IN.
But I'd be dubious even if IIQR is a smaller 'index' table. I'd want to see more about the database in question and what the query plan ends up being for each.
LIKE can have a negative effect on query performance because it often requires a table scan - physically loading each record's relevant field and searching for the text in question. Even if the field is indexed, this is likely the case. But there may be no way around it, if what you need to do is search for partial text at any possible location inside a field.
Depending on the size of the table in question, though, it may really not matter at all.
For you, though, I would suggest that keeping it simple is best. Unless you really do know what the whole effect of complicating a query would be on performance, it can be hard to decide which way to do things.
Related
This question skirts around what I'm wondering, but the answers don't exactly address it.
It would seem that in general '=' is faster than 'like' when using wildcards. This appears to be the conventional wisdom. However, let's suppose I have a column containing a limited number of different fixed, hardcoded varchar identifiers, and I want to select all rows matching one of them:
select * from table where value like 'abc%'
and
select * from table where value = 'abcdefghijklmn'
'Like' should only need to test the first three chars to find a match, whereas '=' must compare the entire string. In this case it would seem to me that 'like' would have an advantage, all other things being equal.
This is intended as a general, academic question, and so should not matter which DB, but it arose using SQL Server 2005.
See https://web.archive.org/web/20150209022016/http://myitforum.com/cs2/blogs/jnelson/archive/2007/11/16/108354.aspx
Quote from there:
The rules for index usage with LIKE are loosely like this:
1. If your filter criteria uses equals (=) and the field is indexed, then most likely it will use an INDEX/CLUSTERED INDEX SEEK.
2. If your filter criteria uses LIKE with no wildcards (like if you had a parameter in a web report that COULD have a % but you instead use the full string), it is about as likely as #1 to use the index. The increased cost is almost nothing.
3. If your filter criteria uses LIKE, but with a wildcard at the beginning (as in Name0 LIKE '%UTER'), it's much less likely to use the index, but it still may at least perform an INDEX SCAN on a full or partial range of the index.
4. HOWEVER, if your filter criteria uses LIKE, but starts with a STRING FIRST and has wildcards somewhere AFTER that (as in Name0 LIKE 'COMP%ER'), then SQL may just use an INDEX SEEK to quickly find rows that have the same first starting characters, and then look through those rows for an exact match.
(Also keep in mind, the SQL engine still might not use an index the way you're expecting, depending on what else is going on in your query and what tables you're joining to. The SQL engine reserves the right to rewrite your query a little to get the data in a way that it thinks is most efficient, and that may include an INDEX SCAN instead of an INDEX SEEK.)
It's a measurable difference.
Run the following:
Create Table #TempTester (id int, col1 varchar(20), value varchar(20))
go
INSERT INTO #TempTester (id, col1, value)
VALUES
(1, 'this is #1', 'abcdefghij')
GO
INSERT INTO #TempTester (id, col1, value)
VALUES
(2, 'this is #2', 'foob'),
(3, 'this is #3', 'abdefghic'),
(4, 'this is #4', 'other'),
(5, 'this is #5', 'zyx'),
(6, 'this is #6', 'zyx'),
(7, 'this is #7', 'zyx'),
(8, 'this is #8', 'klm'),
(9, 'this is #9', 'klm'),
(10, 'this is #10', 'zyx')
GO 10000
CREATE CLUSTERED INDEX ixId ON #TempTester(id)
CREATE NONCLUSTERED INDEX ixTesting ON #TempTester(value)
Then:
SET SHOWPLAN_XML ON
Then:
SELECT * FROM #TempTester WHERE value LIKE 'abc%'
SELECT * FROM #TempTester WHERE value = 'abcdefghij'
The resulting execution plan shows that the first operation, the LIKE comparison, is about 10 times more expensive than the = comparison.
If you can use an = comparison, please do so.
You should also keep in mind that when using LIKE, some SQL flavors will ignore indexes, and that will kill performance. This is especially true if you don't use the "starts with" pattern of your example.
You should really look at the execution plan for the query and see what it's doing; guess as little as possible.
This being said, the "starts with" pattern can be and is optimized in SQL Server. It will use the table's index. EF 4.0 switched to LIKE for StartsWith for this very reason.
If value is unindexed, both result in a table-scan. The performance difference in this scenario will be negligible.
If value is indexed, as Daniel points out in his comment, the = will result in an index lookup, which is O(log N). The LIKE will (most likely, depending on how selective it is) result in a partial scan of the index >= 'abc' and < 'abd', which will require more effort than the =.
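To see what that partial scan means, the prefix LIKE from the test script above is roughly equivalent to a half-open range predicate (the optimizer performs this rewrite internally):
-- Why 'abc%' can seek on the ixTesting index: it is treated as a range.
SELECT * FROM #TempTester WHERE value LIKE 'abc%'
-- behaves roughly like:
SELECT * FROM #TempTester WHERE value >= 'abc' AND value < 'abd'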
Note that I'm talking SQL Server here - not all DBMSs will be nice with LIKE.
You are asking the wrong question. In databases it is not the performance of the operator that matters; it is always the SARGability of the expression, and the coverability of the overall query. The performance of the operator itself is largely irrelevant.
So, how do LIKE and = compare in terms of SARGability? LIKE, when used with an expression that does not start with a constant (e.g. LIKE '%something'), is by definition non-SARGable. But does that make = or LIKE 'something%' SARGable? No. As with any question about SQL performance, the answer does not lie with the text of the query, but with the schema deployed. These expressions may be SARGable if an index exists to satisfy them.
So, truth be told, there are small differences between = and LIKE. But asking whether one operator or the other is 'faster' in SQL is like asking 'What goes faster, a red car or a blue car?'. You should be asking questions about engine size and vehicle weight, not about the color... To approach questions about optimizing relational tables, the place to look is your indexes and your expressions in the WHERE clause (and other clauses, but it usually starts with the WHERE).
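To make the SARGability point concrete, a small illustration (hypothetical table and index; the comments state the typical SQL Server behavior):
-- Assuming an index exists on Name:
CREATE INDEX IX_mytable_Name ON mytable (Name);

SELECT Name FROM mytable WHERE Name = N'widget';       -- SARGable: index seek
SELECT Name FROM mytable WHERE Name LIKE N'widget%';   -- SARGable: range seek
SELECT Name FROM mytable WHERE Name LIKE N'%widget%';  -- non-SARGable: scan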
A personal example using MySQL 5.5: I had an inner join between two tables, one of 3 million rows and one of 10 thousand rows.
When using a LIKE on an index as below (no wildcards), it took about 30 seconds:
where login like '12345678'
Using EXPLAIN, I could see that the index was not being used (output not reproduced here).
When using an '=' on the same query, it took about 0.1 seconds:
where login ='12345678'
Using EXPLAIN, I could see that the index was being used (output not reproduced here).
The LIKE completely cancelled the index seek, so the query took 300 times longer.
= is much faster than LIKE, even without wildcards. I tested on MySQL with 11 GB of data and more than 100 million records; the f_time column is indexed.
SELECT * FROM XXXXX WHERE f_time = '1621442261'
# took 0.00 sec and returned 330 records
SELECT * FROM XXXXX WHERE f_time LIKE '1621442261'
# took 44.71 sec and returned 330 records
Besides all the answers, there is this to consider:
'like' is case insensitive, so every character needs to be compared twice, whereas the '=' only compares once for identical characters.
This issue arises with or without indexes.
Maybe you are looking about Full Text Search.
In contrast to full-text search, the LIKE Transact-SQL predicate works on character patterns only. Also, you cannot use the LIKE predicate to query formatted binary data. Furthermore, a LIKE query against a large amount of unstructured text data is much slower than an equivalent full-text query against the same data. A LIKE query against millions of rows of text data can take minutes to return, whereas a full-text query can take only seconds or less against the same data, depending on the number of rows that are returned.
I was working with a huge database that has more than 400M records, and I put LIKE in the search query. Here are the final results.
There were three tables: tb1, tb2 and tb3. When I used equals (=) in all tables, the query response time was 193 ms. When I put LIKE in one of the tables, the response time was 19.22 sec, and with LIKE on all tables the response time was 112 sec.
I have the SQL query:
SELECT ISNULL(t.column1, t.column2) as [result]
FROM t
I need to filter the data by the [result] column. Which of the two approaches below is best regarding performance:
WHERE ISNULL(t.column1, t.column2) = @filterValue
or:
WHERE t.column1 = @filterValue OR t.column2 = @filterValue
UPDATE: Sorry, I forgot to mention that column2 is always null if column1 is filled.
Measure, don't guess! This is something you should be doing yourself, with production-like data. We don't know the make-up of your data and that makes a big difference.
Having said that, I wouldn't do it either way. I'd create another column, column3, to store column1 if non-NULL and column2 if column1 is NULL.
Then I'd have an insert/update trigger to populate that column correctly, index it and use the screaming-banshee-speed:
select t.column3 as [result] from t
The vast majority of databases are read more often than written and it's better if this calculation is done as few times as possible (i.e., when the data changes, not every time you select it). If you want your databases to be scalable, don't use per-row functions.
It's perfectly valid to sacrifice disk space for speed and the triggers ensure that the data doesn't become inconsistent.
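On SQL Server, a persisted computed column can often achieve the same thing without hand-written trigger logic; a minimal sketch, assuming column1 and column2 have compatible types (the index name is made up):
-- The computation happens once, at write time, and the result is
-- indexable; consistency is maintained by the engine, not a trigger.
ALTER TABLE t ADD column3 AS ISNULL(column1, column2) PERSISTED;
CREATE INDEX IX_t_column3 ON t (column3);

-- The filter then becomes a plain, index-friendly comparison:
-- SELECT column3 AS [result] FROM t WHERE column3 = @filterValue;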
If adding another column and triggers is out of the question, I'd go for the or solution since it can often be split into two parallel queries by the smarter DBMS engines.
An alternative, which MarkB gave but since deleted his answer so I'll have to go hunting for another good answer of his to upvote :-), is to use UNION ALL. If your DBMS isn't quite smart enough to recognise OR as a chance for parallelism, it may be smart enough to recognise UNION ALL in that context, something like:
select column1 as c from t where column1 is not NULL
union all
select column2 as c from t where column1 is NULL
But again, it depends on both your database and your data. A smart DBA would put the whole thing in a stored procedure so they could swap in a new method seamlessly should the data change its properties.
On an MSSQL table (MSSQL 2000) with 13,000,000 entries and indexes on Col1 and Col2, I get the following results:
select top 1000000 * from Table1 with(nolock) where isnull(Col1,Col2) > '0'
-- Compile-Time: 4ms
-- CPU-Time: 18265ms
-- Elapsed-Time: 24882ms = ~25s
select top 1000000 * from Table1 with(nolock) where Col1 > '0' or (Col1 is null and Col2 > '0')
-- Compile-Time: 9ms
-- CPU-Time: 7781ms
-- Elapsed-Time: 25734ms = ~26s
The measured values are subject to strong fluctuations based on the workload of the server.
The first statement needs less time to compile but takes more CPU time to execute (clustered index scan).
It's important to know that many storage engines have an optimizer that reorganizes the statement for better results and execution times. Ultimately, both statements will be rewritten into mostly the same statement by the optimizer.
I think your replacement expression does not mean the same thing. Assume filterValue is 2; then ISNULL(1,2)=2 is false, but 1=2 or 2=2 is true. The expression you need looks more like:
(c1=filter) or ((c1 is null) and (c2 = filter));
There is a chance that a server can answer this from the index on c1. The first part of the solution is an index scan over c1=filter. The second part is a scan over c1=null and then a linear search for c2=filter. I'd even say that a clustered index (c1,c2) could work here.
OTOH, you should rather measure before making assumptions like this; speculation usually doesn't work in SQL unless you have intimate knowledge of the implementation. For example, I'm pretty sure that the query planner already knows that ISNULL(X,Y) can be decomposed into a boolean statement, with its implications for searching, but I would not rely on that; rather, measure and then decide what to do.
I have measured the performance of both queries on SQL Server 2008 and got the following results:
Both approaches had almost the same Estimated Subtree Cost metric.
But the OR approach had a more accurate value for the Estimated Number of Rows metric.
So the query optimizer will build a more appropriate execution plan for the OR approach than for the ISNULL approach.
So I have a database table in MySQL that has a column containing a string. Given a target string, I want to find all the rows whose column value is a substring of the target, i.e. all the rows for which the target string is a superstring of the column. At the moment I'm using a query along the lines of:
SELECT * FROM table WHERE 'my superstring' LIKE CONCAT('%', column, '%')
My worry is that this won't scale. I'm currently doing some tests to see if this is a problem but I'm wondering if anyone has any suggestions for an alternative approach. I've had a brief look at MySQL's full-text indexing but that also appears to be geared toward finding a substring in the data, rather than finding out if the data exists in a given string.
You could create a temporary table with a full text index and insert 'my superstring' into it. Then you could use MySQL's full text match syntax in a join query with your permanent table. You'll still be doing a full table scan on your permanent table because you'll be checking for a match against every single row (what you want, right?). But at least 'my superstring' will be indexed so it will likely perform better than what you've got now.
Alternatively, you could consider simply selecting the column from the table and performing the match in a high-level language. Depending on how many rows are in the table, this approach might make more sense. Offloading heavy tasks to a client (e.g. the web server) can often be a win because it reduces load on the database server.
If your superstrings are URLs, and you want to find substrings in them, it would be useful to know if your substrings can be anchored on the dots.
For instance, you have these superstrings:
www.mafia.gov.ru
www.mymafia.gov.ru
www.lobbies.whitehouse.gov
If your rules contain "mafia" and you want the first two to match, then what I'll say doesn't apply.
Otherwise, you can parse your URLs into tokens like: ['www', 'mafia', 'gov', 'ru']
Then it will be much easier to look up each element in your table.
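A sketch of that lookup (MySQL; the rules table and column are hypothetical):
-- Parse the target URL into tokens in the application, then a plain
-- indexed equality lookup replaces the substring scan:
SELECT *
FROM rules
WHERE rule IN ('www', 'mafia', 'gov', 'ru');  -- tokens of the target URL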
Well, it appears the answer is that you don't. This type of indexing is generally not available, and if you want it within your MySQL database you'll need to create your own extensions to MySQL. The alternative I'm pursuing is to do the indexing in my application.
Thanks to everyone that responded!
I created a search solution using views that needed to be robust enough to grow with the customer's needs. For example:
CREATE TABLE tblMyData
(
    MyId bigint identity(1,1),
    Col01 varchar(50),
    Col02 varchar(50),
    Col03 varchar(50)
)

CREATE VIEW viewMySearchData
AS
SELECT
    MyId,
    ISNULL(Col01, '') + ' ' +
    ISNULL(Col02, '') + ' ' +
    ISNULL(Col03, '') + ' ' AS SearchData
FROM tblMyData
SELECT
    t1.MyId,
    t1.Col01,
    t1.Col02,
    t1.Col03
FROM tblMyData t1
INNER JOIN viewMySearchData t2
    ON t1.MyId = t2.MyId
WHERE t2.SearchData LIKE '%search string%'
If they then decide to add columns to tblMyData and want those columns to be searched, modify viewMySearchData by adding the new columns to the "AS SearchData" section.
If they decide there are too many columns in the search, just modify viewMySearchData by removing the unwanted columns from the "AS SearchData" section.
Do T-SQL queries in SQL Server support short-circuiting?
For instance, I have a situation where I have two databases and I'm comparing data between the two tables to match and copy some info across. In one table, the "ID" field will always have leading zeros (such as "000000001234"), and in the other table, the ID field may or may not have leading zeros (might be "000000001234" or "1234").
So my query to match the two is something like:
select * from table1 where table1.ID LIKE '%1234'
To speed things up, I'm thinking of adding an OR before the like that just says:
table1.ID = table2.ID
to handle the case where both ID's have the padded zeros and are equal.
Will doing so speed up the query by matching items on the "=" and not evaluating the LIKE for every single row (will it short circuit and skip the LIKE)?
SQL Server does NOT short-circuit WHERE conditions.
It can't, since it's a cost-based system: How SQL Server short-circuits WHERE condition evaluation.
You could add a computed column to the table. Then, index the computed column and use that column in the join.
Ex:
Alter Table Table1 Add PaddedId As Right('000000000000' + Id, 12)
Create Index idx_WhateverIndexNameYouWant On Table1(PaddedId)
Then your query would be...
select * from table1 where table1.PaddedID ='000000001234'
This will use the index you just created to quickly return the row.
You want to make sure that at least one of the tables is using its actual data type for the IDs and that it can use an index seek if possible. Which one should be converted to the other depends on the selectivity of your query and the rate of matches, though. If you know that you have to scan through the entire first table, then you can't use a seek anyway, and you should convert that ID to the data type of the other table.
To make sure that you can use indexes, also avoid LIKE. As an example, it's much better to have:
WHERE
T1.ID = CAST(T2.ID AS VARCHAR) OR
T1.ID = RIGHT('0000000000' + CAST(T2.ID AS VARCHAR), 10)
than:
WHERE
T1.ID LIKE '%' + CAST(T2.ID AS VARCHAR)
As Steven A. Lowe mentioned, the second query might be inaccurate as well.
If you are going to be using all of the rows from T1 though (in other words a LEFT OUTER JOIN to T2) then you might be better off with:
WHERE
CAST(T1.ID AS INT) = T2.ID
Do some query plans with each method if you're not sure and see what works best.
The absolute best route to go though is as others have suggested and change the data type of the tables to match if that's at all possible. Even if you can't do it before this project is due, put it on your "to do" list for the near future.
How about:
table1WithZero.ID = REPLICATE('0', 12 - LEN(table2.ID)) + table2.ID
In this case, it should be able to use the index on table1.
Just in case it's useful: as the linked page in Mladen Prajdic's answer explains, CASE clauses are short-circuit evaluated.
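A sketch of how that could be exploited here (table names from the question; the usual caveat that the optimizer may still rearrange things applies):
-- The cheap equality test is listed first; the LIKE branch only has
-- to be evaluated for rows where the first WHEN does not match.
SELECT t1.*
FROM table1 t1
INNER JOIN table2 t2
    ON 1 = CASE
               WHEN t1.ID = t2.ID THEN 1
               WHEN t1.ID LIKE '%' + t2.ID THEN 1
               ELSE 0
           END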
If the ID is purely numeric (as in your example), I would recommend (if possible) changing that field to a number type instead. If the database is already in use, it might be hard to change the type though.
fix the database to be consistent
select * from table1 where table1.ID LIKE '%1234'
will match '1234', '01234', '00000000001234', but also '999991234'. Using LIKE pretty much guarantees an index scan (assuming table1.ID is indexed!). Cleaning up the data will improve performance significantly.
if cleaning up the data is not possible, write a user-defined function (UDF) to strip off leading zeros, e.g.
select * from table1 where dbo.udfStripLeadingZeros(table1.ID) = '1234'
this may not improve performance (since the function will have to run for each row), but it will eliminate false matches and make the intent of the query more obvious
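A sketch of such a UDF (T-SQL; the implementation details are assumptions):
-- Strip leading zeros from a varchar ID.
CREATE FUNCTION dbo.udfStripLeadingZeros (@s varchar(50))
RETURNS varchar(50)
AS
BEGIN
    -- PATINDEX locates the first non-'0' character; everything from
    -- there on is the significant part. All-zero input yields '0'.
    RETURN CASE
        WHEN @s IS NULL THEN NULL
        WHEN PATINDEX('%[^0]%', @s) = 0 THEN '0'
        ELSE SUBSTRING(@s, PATINDEX('%[^0]%', @s), LEN(@s))
    END;
END;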
EDIT: Tom H's suggestion to CAST to an integer would be best, if that is possible.
Can/Should I use a LIKE criteria as part of an INNER JOIN when building a stored procedure/query? I'm not sure I'm asking the right thing, so let me explain.
I'm creating a procedure that is going to take a list of keywords to be searched for in a column that contains text. If I was sitting at the console, I'd execute it as such:
SELECT Id, Name, Description
FROM dbo.Card
WHERE Description LIKE '%warrior%'
OR
Description LIKE '%fiend%'
OR
Description LIKE '%damage%'
But a trick I picked up a little while ago to do "strongly typed" list parsing in a stored procedure is to parse the list into a table variable/temporary table, converting it to the proper type, and then do an INNER JOIN against that table in my final result set. This works great when sending, say, a list of integer IDs to the procedure. I wind up with a final query that looks like this:
SELECT Id, Name, Description
FROM dbo.Card
INNER JOIN #tblExclusiveCard ON dbo.Card.Id = #tblExclusiveCard.CardId
I want to use this trick with a list of strings. But since I'm looking for a particular keyword, I am going to use the LIKE clause. So ideally I'm thinking I'd have my final query look like this:
SELECT Id, Name, Description
FROM dbo.Card
INNER JOIN #tblKeyword ON dbo.Card.Description LIKE '%' + #tblKeyword.Value + '%'
Is this possible/recommended?
Is there a better way to do something like this?
The reason I'm putting wildcards on both ends of the clause is because there are "archfiend", "beast-warrior", "direct-damage" and "battle-damage" terms that are used in the card texts.
I'm getting the impression that depending on the performance, I can either use the query I specified or use a full-text keyword search to accomplish the same task?
Other than having the server do a text index on the fields I want to text search, is there anything else I need to do?
Try this
select * from Table_1 a
left join Table_2 b on b.type LIKE '%' + a.type + '%'
This practice is not ideal. Use with caution.
Your first query will work but will require a full table scan because any index on that column will be ignored. You will also have to do some dynamic SQL to generate all your LIKE clauses.
Try a full-text search if you're using SQL Server, or check out one of the Lucene implementations. Joel talked about his success with it recently.
Try it...
select * from table11 a
inner join table2 b on b.id like '%' + a.id + '%'
where a.city = 'abc'
It works for me. :-)
It seems like you are looking for full-text search, because you want to query a set of keywords against the card description and find any hits. Correct?
Personally, I have done it before, and it has worked out well for me. The only issues I could see are possibly with an unindexed column, but I think you would have the same issue with a WHERE clause.
My advice to you is just look at the execution plans between the two. I'm sure that it will differ which one is better depending on the situation, just like all good programming problems.
@Dillie-O
How big is this table?
What is the data type of the Description field?
If either is small, a full-text search will be overkill.
@Dillie-O
Maybe not the answer you were looking for, but I would advocate a schema change...
proposed schema:
create table name(
    nameID int identity(1,1)
    ,name varchar(50))

create table description(
    descID int identity(1,1)
    ,[desc] varchar(50)) -- something reasonable; to make the most of it, always lower-case your values

create table nameDescJunc(
    nameID int
    ,descID int)
This will let you use indexes without having to implement a bolt-on solution, and keeps your data atomic.
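A hypothetical query against that schema (the keyword value is made up):
-- Plain equality on the description table, so its index can seek:
select n.name
from name n
inner join nameDescJunc j on j.nameID = n.nameID
inner join description d on d.descID = j.descID
where d.[desc] = 'warrior'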
related: Recommended SQL database design for tags or tagging
a trick I picked up a little while ago to do "strongly typed" list parsing in a stored procedure is to parse the list into a table variable/temporary table
I think what you might be alluding to here is to put the keywords to include into a table then use relational division to find matches (could also use another table for words to exclude). For a worked example in SQL see Keyword Searches by Joe Celko.
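A sketch of that relational-division approach with the tables from this question (assuming the keywords in #tblKeyword are distinct):
-- Return only cards whose Description matches EVERY keyword:
SELECT c.Id, c.Name, c.Description
FROM dbo.Card c
INNER JOIN #tblKeyword k
    ON c.Description LIKE '%' + k.Value + '%'
GROUP BY c.Id, c.Name, c.Description
HAVING COUNT(*) = (SELECT COUNT(*) FROM #tblKeyword)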
Performance will depend on the actual server that you use, the schema of the data, and the amount of data. With current versions of MS SQL Server, that query should run just fine (MS SQL Server 7.0 had issues with that syntax, but that was addressed in SP2).
Have you run that code through a profiler? If the performance is fast enough and the data has the appropriate indexes in place, you should be all set.
LIKE '%fiend%' will never use a seek; LIKE 'fiend%' will. Simply put, a leading-wildcard search is not sargable.
Try this:
SELECT Id, Name, Description
FROM dbo.Card
INNER JOIN #tblKeyword
    ON dbo.Card.Description LIKE CONCAT('%', #tblKeyword.Value, '%')