SQL Server like statement behavior for %% - sql

In terms of performance, how does the like operator behaves when applied to strings with multiple % placeholders?
for example:
select A from table_A where A like 'A%'
takes the same time to select than
select A from table_A where A like 'A%%'
???

Your queries:
select A from table_A where A like 'A%'
and
select A from table_A where A like 'A%%'
^ optimizer will remove second redundant %
are equivalent, the optimizer will remove the second % in the second query
just like it would remove the 1=1 from:
select A from table_A where A like 'A%%' and 1=1
However, this query is very different:
select A from table_A where A like '%A%'
The when using 'A%' it will use the index to find everything starting with an A, like a person using a phone book would quickly look for the start of a name. However when using '%A%' it will scan the entire table looking for anything containing an A, thus slower and no index usage. Like if you had to find every name in the phone book that contained an A, that would take a while!

It will treat them same. If there is an index on column A, it will use that index just as it would with a single wildcard. However, if you were to add a leading wildcard, that would force a table scan regardless of whether an index existed or not.

For the most part the pattern that you're using will not affect the performance of the query. The key to the performance for this is the appropriate use of indexes. In your example, an index on the column will work well because it will seek values that start with 'A', then match the full pattern. There may be some more-challenging patterns around, but the performance difference is negligible between them.
There is one important condition where the wildcard character will hurt performance. And, that is when it is at the beginning of of the pattern. For, example, '%A' will gain no benefit from an index, because it indicates you want to match on any value that starts with any valid character. All rows must be evaluated to meet this criteria.

Related

difference b/w where column='' and column like '' in sql [duplicate]

This question skirts around what I'm wondering, but the answers don't exactly address it.
It would seem that in general '=' is faster than 'like' when using wildcards. This appears to be the conventional wisdom. However, lets suppose I have a column containing a limited number of different fixed, hardcoded, varchar identifiers, and I want to select all rows matching one of them:
select * from table where value like 'abc%'
and
select * from table where value = 'abcdefghijklmn'
'Like' should only need to test the first three chars to find a match, whereas '=' must compare the entire string. In this case it would seem to me that 'like' would have an advantage, all other things being equal.
This is intended as a general, academic question, and so should not matter which DB, but it arose using SQL Server 2005.
See https://web.archive.org/web/20150209022016/http://myitforum.com/cs2/blogs/jnelson/archive/2007/11/16/108354.aspx
Quote from there:
the rules for index usage with LIKE
are loosely like this:
If your filter criteria uses equals =
and the field is indexed, then most
likely it will use an INDEX/CLUSTERED
INDEX SEEK
If your filter criteria uses LIKE,
with no wildcards (like if you had a
parameter in a web report that COULD
have a % but you instead use the full
string), it is about as likely as #1
to use the index. The increased cost
is almost nothing.
If your filter criteria uses LIKE, but
with a wildcard at the beginning (as
in Name0 LIKE '%UTER') it's much less
likely to use the index, but it still
may at least perform an INDEX SCAN on
a full or partial range of the index.
HOWEVER, if your filter criteria uses
LIKE, but starts with a STRING FIRST
and has wildcards somewhere AFTER that
(as in Name0 LIKE 'COMP%ER'), then SQL
may just use an INDEX SEEK to quickly
find rows that have the same first
starting characters, and then look
through those rows for an exact match.
(Also keep in mind, the SQL engine
still might not use an index the way
you're expecting, depending on what
else is going on in your query and
what tables you're joining to. The
SQL engine reserves the right to
rewrite your query a little to get the
data in a way that it thinks is most
efficient and that may include an
INDEX SCAN instead of an INDEX SEEK)
It's a measureable difference.
Run the following:
Create Table #TempTester (id int, col1 varchar(20), value varchar(20))
go
INSERT INTO #TempTester (id, col1, value)
VALUES
(1, 'this is #1', 'abcdefghij')
GO
INSERT INTO #TempTester (id, col1, value)
VALUES
(2, 'this is #2', 'foob'),
(3, 'this is #3', 'abdefghic'),
(4, 'this is #4', 'other'),
(5, 'this is #5', 'zyx'),
(6, 'this is #6', 'zyx'),
(7, 'this is #7', 'zyx'),
(8, 'this is #8', 'klm'),
(9, 'this is #9', 'klm'),
(10, 'this is #10', 'zyx')
GO 10000
CREATE CLUSTERED INDEX ixId ON #TempTester(id)CREATE CLUSTERED INDEX ixId ON #TempTester(id)
CREATE NONCLUSTERED INDEX ixTesting ON #TempTester(value)
Then:
SET SHOWPLAN_XML ON
Then:
SELECT * FROM #TempTester WHERE value LIKE 'abc%'
SELECT * FROM #TempTester WHERE value = 'abcdefghij'
The resulting execution plan shows you that the cost of the first operation, the LIKE comparison, is about 10 times more expensive than the = comparison.
If you can use an = comparison, please do so.
You should also keep in mind that when using like, some sql flavors will ignore indexes, and that will kill performance. This is especially true if you don't use the "starts with" pattern like your example.
You should really look at the execution plan for the query and see what it's doing, guess as little as possible.
This being said, the "starts with" pattern can and is optimized in sql server. It will use the table index. EF 4.0 switched to like for StartsWith for this very reason.
If value is unindexed, both result in a table-scan. The performance difference in this scenario will be negligible.
If value is indexed, as Daniel points out in his comment, the = will result in an index lookup which is O(log N) performance. The LIKE will (most likely - depending on how selective it is) result in a partial scan of the index >= 'abc' and < 'abd' which will require more effort than the =.
Note that I'm talking SQL Server here - not all DBMSs will be nice with LIKE.
You are asking the wrong question. In databases is not the operator performance that matters, is always the SARGability of the expression, and the coverability of the overall query. Performance of the operator itself is largely irrelevant.
So, how do LIKE and = compare in terms of SARGability? LIKE, when used with an expression that does not start with a constant (eg. when used LIKE '%something') is by definition non-SARGabale. But does that make = or LIKE 'something%' SARGable? No. As with any question about SQL performance the answer does not lie with the query of the text, but with the schema deployed. These expression may be SARGable if an index exists to satisfy them.
So, truth be told, there are small differences between = and LIKE. But asking whether one operator or other operator is 'faster' in SQL is like asking 'What goes faster, a red car or a blue car?'. You should eb asking questions about the engine size and vechicle weight, not about the color... To approach questions about optimizing relational tables, the place to look is your indexes and your expressions in the WHERE clause (and other clauses, but it usually starts with the WHERE).
A personal example using mysql 5.5: I had an inner join between 2 tables, one of 3 million rows and one of 10 thousand rows.
When using a like on an index as below(no wildcards), it took about 30 seconds:
where login like '12345678'
using 'explain' I get:
When using an '=' on the same query, it took about 0.1 seconds:
where login ='12345678'
Using 'explain' I get:
As you can see, the like completely cancelled the index seek, so query took 300 times more time.
= is much faster than LIKE, even without wildcard. I tested on MySQL with 11GB of data and more than 100 million of records, the f_time column is indexed.
SELECT * FROM XXXXX WHERE f_time = '1621442261'
#took 0.00sec and return 330 records
SELECT * FROM XXXXX WHERE f_time LIKE '1621442261'
#took 44.71sec and return 330 records
Besides all the answers, there this to consider:
'like' is case insensitive, so every character needs to be compared twice, whereas the '=' only compares once for identical characters.
This issue arises with or without indexes.
Maybe you are looking about Full Text Search.
In contrast to full-text search, the LIKE Transact-SQL predicate works on
character patterns only. Also, you cannot use the LIKE predicate to
query formatted binary data. Furthermore, a LIKE query against a large
amount of unstructured text data is much slower than an equivalent
full-text query against the same data. A LIKE query against millions
of rows of text data can take minutes to return; whereas a full-text
query can take only seconds or less against the same data, depending
on the number of rows that are returned.
I was working with a huge database that has more then 400M records and I put LIKE in search query. Here is the final results.
There were three tables tb1, tb2 and tb3. When I use EQUAL for in all tables QUERY the response time was 193ms. and when I put LIKE in one of he table the response time was 19.22 sec. and for all table LIKE response time was 112 Sec

Oracle does full table scan in like '%test% query

I have a table with millions of entries in it. There are also some indices for the three fields city, street and name.
But when i perform the following query, it takes 10 seconds+ to return any result.
SELECT bd.*
FROM BASEDATA bd
WHERE 1=1
AND lower(city) LIKE '%city%'
AND lower(street) LIKE '%street%'
AND lower(name) LIKE '%schmidt%'
When looking at the explain plan, it shows the the query is executed with a full table scan instead of using the indices .
Basically an index organises values in an alphanumeric order. Given a predicate it looks up the index starting from the leading edge of the value. So for key = 'ABC' it goes to the part of the index with values starting with A and searches from there.
Now we look at your query and we see that none of the predicates in your WHERE clause have leading values. lower(city) LIKE '%city%' can literally match anything from aaa city to zzz city. So potentially every record in the table. An index is useless in such a scenario, and a full table scan is way more efficient.
(Incidentally, applying a function to a column, as in lower(city) would also prevent the use of an index, unless you have the appropriate function-based index on that column.)
If you want to do lots of this sort of querying you should investigate Oracle's Text functionality. It uses special indexes to support free text operators like contains(). There are overheads for these indexes, so you need to understand what benefits you will get. Find out more.

Oracle CONTAINS() not returning results for numbers

So I have this table with a full text indexed column "value". The field contains some number strings in it, which are correctly returned when i use a query like so:
select value
from mytable
where value like '17946234';
Trouble is, it's incredibly slow because there are a lot of rows in this table, but when i use the CONTAINS operator I get no results:
select value
from mytable
where CONTAINS ( value, '17946234',1)>0
Anyone got any thoughts?
Unfortunately, I'm not an Oracle dude, and the same query works fine in SQL Server. I feel like it must be a stoplist or something with the Oracle Lexer that I can change, but not really sure how.
This could be due to INDEXTYPE IS CTXSYS.CONTEXT in general, or the index not having been updated after the looked for records where added (CONTEXT type indexes are not transactional, whilst CTXCAT type ones are).
However, if you did not accidentally lose the wildcard in your statement (In other words: If no wildcard is required indeed.), you could just query
select value from mytable where value = '17946234';
which could possibly be backed by an ordinary index. (Depending on your specific data distribution and the queries run, an index might not help query performance.)
Otherwise
select value from mytable where instr(value, '17946234') > 0;
might be sufficient.

wildcard or "in list" when querying in Postgres

I have a few tables where I need to get the data related to foo. The size of the tables are about 10^8 rows.
So I need to get all rows where the column include substring 'foo' from these tables.
select * from bar where my_col like '%foo%';
I know this is slow so I check the possible values:
select distinct my_col from bar where my_col like '%foo%';
-- => ('xx_foo', 'yy_foo', 'xx_foo_xx', 'foo' ... 'xx_foo_yy')
The number of possible values varies between 3 and 20.
Now how slow is '%foo%' really?
select * from bar where my_col like '%foo%';
-- or
select * from bar where my_col in('foo', 'xx_foo' ... 'foo_yy'); -- list_size = 20
Any general rule on when to use what, or is testing the speed for different cases the only way to go?
Edit: I do not own the table and no index exists on the column foo. So it needs to do a full table scan no matter what.
If you use %foo%, you will get a full-table scan, which is slow.
If you use IN with a list of values, than an index can be used if it exists on the column on which you have the condition.
So, if you are able, you should avoid using %foo%. Depending on how often new values may appear in the table, you might consider using an extra table holding the distinct values and use it when querying your main table, and update that extra table whenever new distinct value comes to play (if it is possible in your design).
A search using the like operator will sure lead to a table scan when the pattern starts with a %. When using the in operator and the values are not more than a few percent of the values in the table an index can be used, if it exists. Check the cardinality concept:
http://en.wikipedia.org/wiki/Cardinality_%28SQL_statements%29
The DBMS knows about the cardinalities keeping statistics about the tables. If your column has high cardinality and an index on it then an index scan is likely when using the in operator. To update the statistics issue an analyze command.

Database Index not used if the where criteria is !=?

I have a index on a column and it is correctly used when the query is
select * from Table where x = 'somestring'
However it seems to be not used when the query is something like
select * from Table where x != 'someotherstring'
Is this normal or am I missing something else in the query? The actual query is of course much larger and so it could be caused by some other factor. Any other ideas why an index would not be used in a query?
This is normal. Index will only be used if you have a '=' condition. Searching index for != condition is not effective.
Similarly, this may use the index (in Oracle)
select * from Table where x like 'some%'
but this wouldn't
select * from Table where x like '%thing%'
Also,
select * from Table where x between 1 and 10 will use the index
but not
select * from Table where x not between 1 and 10
this is absolutely normal. index is used to look for exact something. where you start when I ask you to look a dictionary when I told you not start with 'S'.
you can always do this.
select * from Table a
where not exist (select * from table b where x = 'somestring' and a.key = b.key)
It may use index if the index is clustering and there are not so many different values of the indexed attribute (so we can quickly decide which blocks we may skip). But if the indexed attribute is, say, a key then using index in this case makes absolutely no sense.
That is indeed normal - to use the index, you need to use a exact match (like the "=" equals operator), or something like a range query.
A query that defines a "negative" criteria (NOT something or another) typically can't be satisfied by an index lookup - you'll have to look up everything except a certain value. That doesn't work nicely - typically, a full table scan (clustered index scan in SQL Server) will be quicker, just checking for the criteria to be matched (or not matched, in that case).
I think that a != condition can use an index (in MSSQL). According the the execution plan in MSSQL, if I have an index on a single field, and I apply a where clause on that field, one with a != and one with =, they both result the same execution plan, both using an index seek.
You didn't say what database engine you are using.
MS SQL Server, for example, has both Equality indexes and Inequality indexes.
The latter are used when the not equal operator is in play.