I have a scenario where I need to perform the following operation:
SELECT *
FROM
[dbo].[MyTable]
WHERE
[Url] LIKE '%<some url>%';
I have to use two % wildcard characters, at the beginning and the end of the URL ('%<some url>%'), because the user should be able to find the complete URL even if they type only partial text. For example, if the URL is http://www.google.co.in and the user types "goo", the URL must appear in the search results. The LIKE operator is causing performance issues. I need an alternative so that I can get rid of this statement and its wildcards; in other words, I don't want to use LIKE in this scenario. I tried T-SQL CONTAINS, but it does not solve my problem. Is there any other alternative that can perform pattern matching and return results quickly?
Starting a like with a % is going to cause a scan. No getting around it. It has to evaluate every value.
If you index the column it should be an index (rather than table) scan.
You don't have an alternative that will not cause a scan.
Charindex and patindex are alternatives but will still scan and not fix the performance issue.
Could you break the components out into a separate table?
www
google
co
in
And then search on like 'goo%'?
That would use an index as it does not start with %.
Better yet you could search on 'google' and get an index seek.
You would also want each string to be unique in that table, with a separate join table keyed on an int PK, so that a search does not return multiple 'www' rows, for instance.
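The component-table idea can be sketched outside the database. This is a minimal Python sketch under stated assumptions, not the answerer's implementation: `split_url`, `build_token_index`, and `search_prefix` are hypothetical names, a sorted list with `bisect` stands in for the index on the component column, and a dict stands in for the join table.

```python
import bisect
import re


def split_url(url):
    """Split a URL into lowercase components, e.g. www / google / co / in."""
    without_scheme = re.sub(r"^[a-z][a-z0-9+.-]*://", "", url.lower())
    return [part for part in re.split(r"[^a-z0-9]+", without_scheme) if part]


def build_token_index(urls):
    """Return (sorted unique tokens, token -> set of URL ids).

    The sorted token list plays the role of the unique, indexed string
    column; the dict plays the role of the join table keyed on an int PK.
    """
    token_to_ids = {}
    for url_id, url in enumerate(urls):
        for token in split_url(url):
            token_to_ids.setdefault(token, set()).add(url_id)
    return sorted(token_to_ids), token_to_ids


def search_prefix(prefix, tokens, token_to_ids):
    """Find ids of URLs with a component starting with prefix (LIKE 'goo%')."""
    prefix = prefix.lower()
    start = bisect.bisect_left(tokens, prefix)  # the "index seek"
    matches = set()
    for token in tokens[start:]:
        if not token.startswith(prefix):
            break  # sorted order: no later token can match
        matches |= token_to_ids[token]
    return matches


urls = [
    "http://www.google.co.in",
    "http://www.example.com",
    "https://goodreads.com/book",
]
tokens, token_to_ids = build_token_index(urls)
found = sorted(urls[i] for i in search_prefix("goo", tokens, token_to_ids))
print(found)
```

Note the trade-off: a term such as "oogl" finds nothing here, because only the start of each component is searchable. That is the price of avoiding the leading %.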
I suspect FullText CONTAINS was not faster because FullText kept the URL as one word.
You could create a FULLTEXT index.
First create your catalog:
CREATE FULLTEXT CATALOG ft AS DEFAULT;
Now assuming your table is called MyTable, the column is TextColumn and it has a unique index on it called UX_MyTable_TextColumn:
CREATE FULLTEXT INDEX ON [dbo].[MyTable](TextColumn)
KEY INDEX UX_MyTable_TextColumn
Now you can search the table using CONTAINS:
SELECT *
FROM MyTable
WHERE CONTAINS(TextColumn, 'searchterm')
To my knowledge there's no alternative to like or contains (full text search feature) which would give better performance.
What you can do is try to improve performance by optimising your query.
To do that, you need to know a bit about your users & how they'll use your system.
I suspect most people will enter a URL from the start of the address (i.e. without protocol), so you could do something like this:
declare @searchTerm nvarchar(128) = 'goo'
set @searchTerm = coalesce(replace(@searchTerm ,'''',''''''),'')
select @searchTerm
SELECT *
FROM [dbo].[MyTable]
WHERE [Url] LIKE 'http://' + @searchTerm + '%'
or [Url] LIKE 'https://' + @searchTerm + '%'
or [Url] LIKE 'http://www.' + @searchTerm + '%'
or [Url] LIKE 'https://www.' + @searchTerm + '%'
or [Url] LIKE '%' + @searchTerm + '%'
option (fast 1); --get back the first result asap;
That gives you some optimisation: if the URL is http://www.google.com, the index on the Url column can be used, since http://www.goo is at the start of the string.
The option (fast 1) hint at the end is there to ensure this benefit is seen: since the last condition, LIKE '%searchTerm%', can't make use of indexes, we'd rather return the first results as soon as we can than wait for that slow part to complete.
Have a think of other common usage patterns and ways around those.
As written, your query cannot be further optimized, and there is no way of getting around the LIKE to do your searching. The only thing you can do to improve performance is reduce the SELECT to return only the columns you need if you don't need all of them, and create an index on URL with those columns included. The LIKE will not be able to use the index for seeking, but the reduced data size for scanning can help. If you have a SQL Server edition that supports compression, that will help as well.
For instance, if you really need only column A, write
SELECT A FROM [dbo].[MyTable] WHERE [Url] LIKE '%<some url>%';
And create the index as
CREATE INDEX IX_MyTable_URL
ON MyTable([Url])
INCLUDE (A) WITH (DATA_COMPRESSION = PAGE);
If A is already included in your primary key, the INCLUDE is unnecessary.
Your query is a very simple one and I see no reason for it to be slow. The DBMS will read record by record and compare strings. Usually it can even do this in parallel threads.
What do you think can be the reason for your statement being so slow? Are there billions of records in your table? Do your records contain so much data?
Your best bet is not to care about the query, but about the database and your system. Others have already suggested an index on the url column, so rather than scanning the table, the index can be scanned. Is max degree of parallelism mistakenly set? Is your table fragmented? Is your hardware appropriate? These are the things to consider here.
However: charindex('oogl', url) > 0 does the same as url like '%oogl%', but internally they work differently somehow. For some people the LIKE expression turned out faster, for others the CHARINDEX method. Maybe it depends on the query, number of processors, operating system, whatever. It may be worth a try.
I have a table with a full text index. This query returns 2 results:
SELECT *
FROM SampleTable
WHERE ContentForSearch LIKE '% mount %'
ORDER BY 1
This query returns 1 result:
SELECT *
FROM SampleTable
WHERE CONTAINS(ContentForSearch, '"mount"')
ORDER BY 1
Adding a new instance of the word "mount" to the table does show up in the search. Why?
I've already checked the stopwords list as best as I knew how to. This returns no results:
SELECT *
FROM sys.fulltext_stoplists
This returns no results:
SELECT *
FROM sys.fulltext_system_stopwords
WHERE stopword like '%mount%'
I also checked to see if the index was up to date, this returned the current time (minus a few minutes) and a 0, indicating idle:
SELECT DATEADD(ss, FULLTEXTCATALOGPROPERTY('SampleTableCatalog','PopulateCompletionAge'), '1/1/1990') AS LastPopulated,
FULLTEXTCATALOGPROPERTY('SampleTableCatalog','PopulateStatus')
I also did some searches in the string that doesn't show up in the CONTAINS result to see if the ASCII values were strange (and can provide queries if needed), but they were exactly the same as the one that did show up.
On one copy of the database, someone ran:
ALTER FULLTEXT INDEX ON SampleTable SET STOPLIST = OFF;
ALTER FULLTEXT INDEX ON SampleTable SET STOPLIST = SYSTEM;
and that seemed to fix it, but I have no idea why, and I'm uncomfortable making changes I don't understand.
UPDATE
Stoleg's comments eventually led me to the solution. Full text indexing was somehow turned off on a certain database server. When that database was then restored to another server, the entries that didn't get indexed on the first server were still not indexed, even though the new server was properly updating the index. I found this by using Stoleg's queries to check which rows were missing from the index, and then checking the modified date for those rows (which luckily was stored). I noticed the pattern that rows from the dates when the database was on the other server were not in the index. The solution on the problem server was to turn on full text indexing and rebuild the catalogs. As to how the indexing got turned off, I don't understand it myself. The comment from the DBA on how he solved it was "I added full text search as resource to cluster node."
Well, the obvious question is: Is 'mount' in your stoplist?
Microsoft's article "Configure and Manage Stopwords and Stoplists for Full-Text Search" shows you how to query and update your stop words.
You might also want to review the general info on stoplist from Microsoft.
ADDED
Don't take it as an insult (not that you sounded insulted). Way too many times, people say they have checked something when they only thought they had: looking at the wrong database, etc. So I wanted you to make sure. I interpreted your "some of the time" as "works with LIKE, not with CONTAINS", so I thought it was more likely to actually be the stoplist.
The only other "obvious" solution would be to rebuild the full text index -- with the thought that changing the stoplist has the same effect on the other database. I suppose you could restart the server first too. But, as another mysterious solution, not a first choice.
Changes the full-text stoplist that is associated with the index, if any.
OFF
Specifies that no stoplist be associated with the full-text index.
SYSTEM
Specifies that the default full-text system STOPLIST should be used for this full-text index.
stoplist_name
Specifies the name of the stoplist to be associated with the full-text index.
For more information, see Configure and Manage Stopwords and Stoplists for Full-Text Search.
This just removed and reset the stoplist to system default.
It is because of the extra spaces in '% mount %'
try:
SELECT *
FROM SampleTable
WHERE ContentForSearch LIKE '%mount%'
ORDER BY 1
I have a Users table containing about 500,000 rows of user data.
The full name of the user is stored in 4 columns, each of type nvarchar(50).
I have a computed column called UserFullName that is the concatenation of the 4 columns.
I have a stored procedure that searches the Users table by name using the LIKE operator, as below:
Select *
From Users
Where UserFullName like N'%'+@FullName+'%'
I have a performance issue when executing this SP: it takes a long time :(
Is there any way to overcome the poor performance of the LIKE operator?
Not while still using the like operator in that fashion. The % at the start means your search needs to read every row and look for a match. If you really need that kind of search you should look into using a full text index.
Make sure your computed column is indexed, that way it won't have to compute the values each time you SELECT
Also, depending on your indexing, using PATINDEX might be quicker, but really you should use a fulltext index for this kind of thing:
http://msdn.microsoft.com/en-us/library/ms187317.aspx
If you are using an index, it will be good.
So you can give the table an id column or something and index it, like this:
ALTER TABLE tablename ADD CONSTRAINT UQ_tablename_id UNIQUE (id);
Take a look at this article: http://use-the-index-luke.com/sql/where-clause/searching-for-ranges/like-performance-tuning
It clearly describes how LIKE works in terms of performance.
Most likely you are facing these issues because the whole table has to be traversed, due to the leading % symbol.
You should try creating a list of substrings (in a separate table, for example) representing k-mers and search for them without a preceding %. An index on that column would also help. Please read more about k-mers here: https://en.m.wikipedia.org/wiki/K-mer
This will not break an index and will be more efficient to search.
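To make the k-mer suggestion concrete, here is a minimal Python sketch of the idea, not a database implementation. The names `kmers`, `build_kmer_index`, and `search`, the choice `K = 3`, and the sample names are all assumptions for illustration; in SQL the dict would be a separate indexed table of (k-mer, row id) pairs.

```python
from collections import defaultdict

K = 3  # k-mer length; an assumption, pick it to suit the shortest search term


def kmers(text, k=K):
    """Return the set of all length-k substrings of text (lowercased)."""
    text = text.lower()
    return {text[i:i + k] for i in range(len(text) - k + 1)}


def build_kmer_index(names):
    """Map each k-mer to the ids of the rows that contain it."""
    index = defaultdict(set)
    for row_id, name in enumerate(names):
        for gram in kmers(name):
            index[gram].add(row_id)
    return index


def search(term, names, index):
    """Return the names containing term, using exact k-mer lookups only."""
    grams = kmers(term)
    if not grams:
        # Term shorter than K: no k-mer to look up, fall back to a full scan.
        return [n for n in names if term.lower() in n.lower()]
    candidates = set.intersection(*(index.get(g, set()) for g in grams))
    # Sharing every k-mer does not guarantee a contiguous match, so verify.
    return [names[i] for i in sorted(candidates) if term.lower() in names[i].lower()]


names = ["Ahmed Ali Hassan Omar", "Mohamed Salah Ibrahim", "Sara Hassan Youssef"]
index = build_kmer_index(names)
print(search("hassan", names, index))
print(search("sal", names, index))
```

Each lookup is an equality match on a k-mer, which an ordinary index can serve. The cost is extra storage and maintenance: every row contributes one entry per distinct k-mer.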
I have a query which slows down immensely when I add an additional WHERE condition,
which essentially is just a LIKE lookup on a varchar(500) field
where...
and (xxxxx.yyyy like '% blahblah %')
I've been racking my head but pretty much the query slows down terribly when I add this in.
I'm wondering if anyone has suggestions in terms of changing field type, index setup, or index hints or something that might assist.
any help appreciated.
sql 2000 enterprise.
HERE IS SOME ADDITIONAL INFO:
oops. as some background unfortunately I do need (in the case of the like statement) to have the % at the front.
There is business logic behind that which I can't avoid.
I have since created a full text catalogue on the field which is causing me problems
and converted the search to use the contains syntax.
Unfortunately, although this has increased performance on occasion, it appears to be slow (slower) for new word searches.
So if I search for apple, apple appears to be faster the subsequent times, but not for new searches such as orange (for example).
So i don't think i can go with that (unless you can suggest some tinkering to make that more consistent).
Additional info:
the table contains only around 60k records
the field i'm trying to filter is a varchar(500)
sql 2000 on windows server 2003
The query i'm using is definitely convoluted
Sorry, I've had to replace proprietary stuff, but this should give you an indication of the query:
SELECT TOP 99 AAAAAAAA.Item_ID, AAAAAAAA.CatID, AAAAAAAA.PID, AAAAAAAA.Description,
AAAAAAAA.Retail, AAAAAAAA.Pack, AAAAAAAA.CatID, AAAAAAAA.Code, BBBBBBBB.blahblah_PictureFile AS PictureFile,
AAAAAAAA.CL1, AAAAAAAA.CL1, AAAAAAAA.CL2, AAAAAAAA.CL3
FROM CCCCCCC INNER JOIN DDDDDDDD ON CCCCCCC.CID = DDDDDDDD.CID
INNER JOIN AAAAAAAA ON DDDDDDDD.CID = AAAAAAAA.CatID LEFT OUTER JOIN BBBBBBBB
ON AAAAAAAA.PID = BBBBBBBB.Product_ID INNER JOIN EEEEEEE ON AAAAAAAA.BID = EEEEEEE.ID
WHERE
(CCCCCCC.TID = 654321) AND (DDDDDDDD.In_Use = 1) AND (AAAAAAAA.Unused = 0)
AND (DDDDDDDD.Expiry > '10-11-2010 09:23:38') AND
(
(AAAAAAAA.Code = 'red pen') OR
(
(my_search_description LIKE '% red %') AND (my_search_description LIKE '% nose %')
AND (DDDDDDDD.CID IN (63,153,165,305,32,33))
)
)
AND (DDDDDDDD.CID IN (20,32,33,63,64,65,153,165,232,277,294,297,300,304,305,313,348,443,445,446,447,454,472,479,481,486,489,498))
ORDER BY AAAAAAAA.f_search_priority DESC, DDDDDDDD.Priority DESC, AAAAAAAA.Description ASC
You can see throwing in the my_search_description filter also includes a dddd.cid filter (business logic).
This is the part which is slowing things down (from a 1.5-2 second load of my pages to a 6-8 second load (ow ow ow)).
It might be my lack of understanding of how to get the full text search catalogue working.
I'm very impressed by the answers, so if anyone has any tips I'd be most grateful.
If you haven't already, enable full text indexing.
Unfortunately, using the LIKE clause on a query really does slow things down. Full Text Indexing is really the only way that I know of to speed things up (at the cost of storage space, of course).
Here's a link to an overview of Full-Text Search in SQL Server which will show you how to configure things and change your queries to take advantage of the full-text indexes.
More details would certainly help, but...
Full-text indexing can certainly be useful (depending on the more details about the table and your query). Full Text indexing requires a good bit of extra work both in setup and querying, but it's the only way to try to do the sort of search you seek efficiently.
The problem with LIKE that starts with a Wildcard is that SQL server has to do a complete table scan to find matching records - not only does it have to scan every row, but it has to read the contents of the char-based field you are querying.
With or without a full-text index, one thing can possibly help: Can you narrow the range of rows being searched, so at least SQL doesn't need to scan the whole table, but just some subset of it?
The '% blahblah %' is a problem for improving performance. Putting the wildcard at the beginning tells SQL Server that the string can begin with any legal character, so it must scan the entire index. Your best bet if you must have this filter is to focus on your other filters for improvement.
Using LIKE with a wildcard at the beginning of the search pattern forces the server to scan every row. It's unable to use any indexes. Indexes work from left to right, and since there is no constant on the left, no index is used.
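The "left to right" point can be illustrated with a small Python sketch. It is only an analogy: a sorted list stands in for a B-tree index, and `prefix_seek` and `substring_scan` are hypothetical helper names. Both return the matching values together with the number of entries they had to examine.

```python
import bisect

# A sorted list stands in for a B-tree index on the column.
index = sorted(["apple pie", "blue pen", "red nose", "red pen", "tomato"])


def prefix_seek(index, prefix):
    """LIKE 'prefix%': binary-search to the first candidate, then read a range."""
    start = bisect.bisect_left(index, prefix)
    rows, examined = [], 0
    for value in index[start:]:
        examined += 1
        if not value.startswith(prefix):
            break  # sorted order: nothing further can match
        rows.append(value)
    return rows, examined


def substring_scan(index, term):
    """LIKE '%term%': sort order gives no starting point, so test every entry."""
    rows = [value for value in index if term in value]
    return rows, len(index)


print(prefix_seek(index, "red"))
print(substring_scan(index, "pen"))
```

With a known prefix, the sort order tells the search where to start and where to stop. With a leading wildcard, a match can sit anywhere in the value, so every entry has to be checked.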
From your WHERE clause, it looks like you're trying to find rows where a specific word exists in an entry. If you're searching for a whole word, then full text indexing may be a solution for you.
Full text indexing creates an index entry for each word that's contained in the specified column. You can then quickly find rows that contain a specific word.
As other posters have correctly pointed out, the use of the wildcard character % at the start of the LIKE expression results in a query plan that uses a SCAN operation. A scan operation touches every row in the table or index, depending on the type of scan operation being performed.
So the question really then becomes, do you actually need to search for the given text string anywhere within the column in question?
If not, great, problem solved but if it is essential to your business logic then you have two routes of optimization.
Really go to town on increasing the overall selectivity of your query by focusing your optimization efforts on the remaining search arguments.
Implement a Full Text Indexing Solution.
I don't think this is a valid answer, but I'd like to throw it out there for comments from more experienced posters... are these equivalent?
where (xxxxx.yyyy like '% blahblah %')
vs
where patindex('% blahblah %', xxxxx.yyyy) > 0
As far as I know, they're equivalent from a database logic standpoint, as both force the same scan. I guess it couldn't hurt to try?
In SQL Server 2005, I have a Product search that looks like:
select ProductID, Name, Email
from Product
where Name = @Name
I've been asked to ignore a couple "special" characters in Product.Name, so that a search for "Potatoes" returns "Po-ta-toes" as well as "Potatoes". My first thought is to just do this:
select ProductID, Name, Email
from Product
where REPLACE(Name, '-', '') = @Name
...but on second thought, I wonder if I'm killing performance by running a function on EVERY candidate result. Does SQL have some optimization magic that help it do this kind of thing quickly? Can you think of anything easier I might be able to try with the requirements I have?
More standards-based: You could add a new column, e.g., searchable_name, precalculate the results of the REPLACE (and any other tweaks, e.g., SOUNDEX) on INSERT/UPDATE and store them in the new column, then search against that column.
Less standards-based: Lots of RDBMS provide a feature where you can create an INDEX using a function; this is often called a functional index. Your situation seems fairly well suited to such a feature.
Most powerful/flexible: Use a dedicated search tool such as Lucene. It might seem overkill for this situation, but they were designed for searching, and most offer sophisticated stemming algorithms that would almost certainly solve this problem.
You will likely get better performance if you are willing to force the first character to be alphabetic, like this...
select ProductID, Name, Email
from Product
where REPLACE(Name, '-', '') = @Name
And Name Like Left(@Name, 1) + '%'
If the name column is indexed, you will likely get an index seek instead of a scan. The downside is, you will not return rows where the value is "-po-ta-to-es" because the first character does not match.
Can you add a field to your product table with a search-able version of the product name with special characters already removed? Then you can do the 'replace' only once for each record, and do efficient searches against the new field.
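The precomputed-column suggestion could look roughly like the following. This is a sketch under assumptions, not SQL Server code: it uses Python's built-in `sqlite3` as a stand-in database, and the `SearchName` column, the `IX_Product_SearchName` index, and the `normalize` helper are hypothetical names. The Email column from the question is left out for brevity.

```python
import sqlite3


def normalize(name):
    """Strip the ignored characters; only '-' here, as in the question."""
    return name.replace("-", "")


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Product (ProductID INTEGER PRIMARY KEY, Name TEXT, SearchName TEXT)"
)
# The search column is a plain column, so an ordinary index applies to it.
conn.execute("CREATE INDEX IX_Product_SearchName ON Product (SearchName)")

for name in ["Potatoes", "Po-ta-toes", "Tomatoes"]:
    # The REPLACE work happens once per write, not once per row per search.
    conn.execute(
        "INSERT INTO Product (Name, SearchName) VALUES (?, ?)",
        (name, normalize(name)),
    )

rows = conn.execute(
    "SELECT Name FROM Product WHERE SearchName = ? ORDER BY ProductID",
    (normalize("Potatoes"),),
).fetchall()
names = [row[0] for row in rows]
print(names)
```

The search term goes through the same `normalize` step as the stored values, so the comparison becomes a plain equality on an indexed column. The application (or a trigger) has to keep `SearchName` in sync whenever `Name` changes.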
I was curious since I read it in a doc. Does writing
select * from CONTACTS where id = '098' and name like 'Tom%';
speed up the query as opposed to
select * from CONTACTS where name like 'Tom%' and id = '098';
The first has an indexed column on the left side. Does it actually speed things up or is it superstition?
Using php and mysql
Check the query plans with explain. They should be exactly the same.
This is purely superstition. I see no reason that either query would differ in speed. If it were an OR query rather than an AND query, however, then I could see that having it on the left might speed things up.
interesting question, i tried this once. query plans are the same (using EXPLAIN).
but considering short-circuit-evaluation i was wondering too why there is no difference (or does mysql fully evaluate boolean statements?)
You may be misremembering or misreading something else, regarding which side of a string literal the wildcards are on in a LIKE predicate. Putting the wildcard on the right (as in your example) allows the query engine to use any indexes that might exist on the column you are searching (in this case, name). But if you put the wildcard on the left,
select * from CONTACTS where name like '%Tom' and id = '098';
then the engine cannot use any existing index and must do a complete table scan.