where like over varchar(500) - sql

I have a query which slows down immensely when i add an addition where part
which essentially is just a like lookup on a varchar(500) field
where...
and (xxxxx.yyyy like '% blahblah %')
I've been racking my head but pretty much the query slows down terribly when I add this in.
I'm wondering if anyone has suggestions in terms of changing field type, index setup, or index hints or something that might assist.
any help appreciated.
sql 2000 enterprise.
HERE IS SOME ADDITIONAL INFO:
oops. as some background unfortunately I do need (in the case of the like statement) to have the % at the front.
There is business logic behind that which I can't avoid.
I have since created a full text catalogue on the field which is causing me problems
and converted the search to use the contains syntax.
Unfortunately although this has increased performance on occasion it appears to be slow (slower) for new word searchs.
So if i have apple.. apple appears to be faster the subsequent times but not for new searches of orange (for example).
So i don't think i can go with that (unless you can suggest some tinkering to make that more consistent).
Additional info:
the table contains only around 60k records
the field i'm trying to filter is a varchar(500)
sql 2000 on windows server 2003
The query i'm using is definitely convoluted
Sorry i've had to replace proprietary stuff.. but should give you and indication of the query:
SELECT TOP 99 AAAAAAAA.Item_ID, AAAAAAAA.CatID, AAAAAAAA.PID, AAAAAAAA.Description,
AAAAAAAA.Retail, AAAAAAAA.Pack, AAAAAAAA.CatID, AAAAAAAA.Code, BBBBBBBB.blahblah_PictureFile AS PictureFile,
AAAAAAAA.CL1, AAAAAAAA.CL1, AAAAAAAA.CL2, AAAAAAAA.CL3
FROM CCCCCCC INNER JOIN DDDDDDDD ON CCCCCCC.CID = DDDDDDDD.CID
INNER JOIN AAAAAAAA ON DDDDDDDD.CID = AAAAAAAA.CatID LEFT OUTER JOIN BBBBBBBB
ON AAAAAAAA.PID = BBBBBBBB.Product_ID INNER JOIN EEEEEEE ON AAAAAAAA.BID = EEEEEEE.ID
WHERE
(CCCCCCC.TID = 654321) AND (DDDDDDDD.In_Use = 1) AND (AAAAAAAA.Unused = 0)
AND (DDDDDDDD.Expiry > '10-11-2010 09:23:38') AND
(
(AAAAAAAA.Code = 'red pen') OR
(
(my_search_description LIKE '% red %') AND (my_search_description LIKE '% nose %')
AND (DDDDDDDD.CID IN (63,153,165,305,32,33))
)
)
AND (DDDDDDDD.CID IN (20,32,33,63,64,65,153,165,232,277,294,297,300,304,305,313,348,443,445,446,447,454,472,479,481,486,489,498))
ORDER BY AAAAAAAA.f_search_priority DESC, DDDDDDDD.Priority DESC, AAAAAAAA.Description ASC
You can see throwing in the my_search_description filter also includes a dddd.cid filter (business logic).
This is the part which is slowing things down (from a 1.5-2 second load of my pages down to a 6-8 second load (ow ow ow))
It might be my lack of understanding of how to have the full text search catelogue working.
Am very impressed by the answers so if anyone has any tips I'd be most greatful.

If you haven't already, enable full text indexing.
Unfortunately, using the LIKE clause on a query really does slow things down. Full Text Indexing is really the only way that I know of to speed things up (at the cost of storage space, of course).
Here's a link to an overview of Full-Text Search in SQL Server which will show you how to configure things and change your queries to take advantage of the full-text indexes.

More details would certainly help, but...
Full-text indexing can certainly be useful (depending on the more details about the table and your query). Full Text indexing requires a good bit of extra work both in setup and querying, but it's the only way to try to do the sort of search you seek efficiently.
The problem with LIKE that starts with a Wildcard is that SQL server has to do a complete table scan to find matching records - not only does it have to scan every row, but it has to read the contents of the char-based field you are querying.
With or without a full-text index, one thing can possibly help: Can you narrow the range of rows being searched, so at least SQL doesn't need to scan the whole table, but just some subset of it?

The '% blahblah %' is a problem for improving performance. Putting the wildcard at the beginning tells SQL Server that the string can begin with any legal character, so it must scan the entire index. Your best bet if you must have this filter is to focus on your other filters for improvement.

Using LIKE with a wildcard at the beginning of the search pattern forces the server to scan every row. It's unable to use any indexes. Indexes work from left to right, and since there is no constant on the left, no index is used.
From your WHERE clause, it looks like you're trying to find rows where a specific word exists in an entry. If you're searching for a whole word, then full text indexing may be a solution for you.
Full text indexing creates an index entry for each word that's contained in the specified column. You can then quickly find rows that contain a specific word.

As other posters have correctly pointed out, the use of the wildcard character % within the LIKE expression is resulting in a query plan being produced that uses a SCAN operation. A scan operation touches every row in the table or index, dependant on the type of scan operation being performed.
So the question really then becomes, do you actually need to search for the given text string anywhere within the column in question?
If not, great, problem solved but if it is essential to your business logic then you have two routes of optimization.
Really go to town on increasing the overall selectivity of your query by focusing your optimization efforts on the remaining search arguments.
Implement a Full Text Indexing Solution.

I don't think this is a valid answer, but I'd like to throw it out there for some more experienced posters comments...are these equivlent?
where (xxxxx.yyyy like '% blahblah %')
vs
where patindex(%blahbalh%, xxxx.yyyy) > 0
As far as I know, that's equivlent from a database logic standpoint as it's forcing the same scan. Guess it couldn't hurt to try?

Related

Optimizing queries that search for text

I have SSRS report that uses a query that searches for specific text in the Where clause. I know using IDs is the preferred method of filtering data down, but this table only makes use of varchar strings...Very long strings I might add.
There are 5 different varchar strings that the query searches for. The strings are static, so there will be no change in the string text (Thankfully). The query itself is very straightforward. I know using wildcards would affect performance for the worse. I did some research and it seems like there's not much I can do. Before I give up, I was hoping the smart folks at Stackoverflow may have an idea of what I could do to optimize the query.
SELECT * FROM
Reviews
WHERE ReviewFieldOption = 'The student displayed proficient knowledge of the material provided over the semester'
For your query, you want in index on ReviewFieldOption.
This works for strings, even long strings -- assuming they are not tooooo long. There are some limits on the size of keys in an index.

SQL Server Efficient Search for LIKE '%str%'

In Sql Server, I have a table containing 46 million rows.
In "Title" column of table, I want make search. The word may be at any index of field value.
For example:
Value in table: BROTHERS COMPANY
Search string: ROTHER
I want this search to match the given record. This is exactly what LIKE '%ROTHER%' do. However, LIKE '%%' usage should not be used on large tables because of performance issues. How can I achieve it?
Though I don't know your requirements, your best approach may be to challenge them. Middle-of-the-string searches are usually not very practical. If you can get your users to perform prefix searches (broth%) then you can easily use Full Text's wildcard search (CONTAINS(*, '"broth*"')). Full Text can also handle suffix searches (%rothers) with a little extra work.
But when it comes to middle-of-the-string searches with SQL Server, you're stuck using LIKE. However you may be able to improve performance of LIKE by using a binary collation as explained in this article. (I hate to post a link without including its content but it is way too long of an article to post here and I don't understand the approach enough to sum it up.)
If that doesn't help and if middle-of-the-string searches are that important of a requirement then you should consider using a different search solution like Lucene.
Add Full-Text index if you want.
You can search the table using CONTAINS:
SELECT *
FROM YourTable
WHERE CONTAINS(TableColumnName, 'SearchItem')

SQL - searching database with the LIKE operator

Given your data stored somewhere in a database:
Hello my name is Tom I like dinosaurs to talk about SQL.
SQL is amazing. I really like SQL.
We want to implement a site search, allowing visitors to enter terms and return relating records. A user might search for:
Dinosaurs
And the SQL:
WHERE articleBody LIKE '%Dinosaurs%'
Copes fine with returning the correct set of records.
How would we cope however, if a user mispells dinosaurs? IE:
Dinosores
(Poor sore dino). How can we search allowing for error in spelling? We can associate common misspellings we see in search with the correct spelling, and then search on the original terms + corrected term, but this is time consuming to maintain.
Any way programatically?
Edit
Appears SOUNDEX could help, but can anyone give me an example using soundex where entering the search term:
Dinosores wrocks
returns records instead of doing:
WHERE articleBody LIKE '%Dinosaurs%' OR articleBody LIKE '%Wrocks%'
which would return squadoosh?
If you're using SQL Server, have a look at SOUNDEX.
For your example:
select SOUNDEX('Dinosaurs'), SOUNDEX('Dinosores')
Returns identical values (D526) .
You can also use DIFFERENCE function (on same link as soundex) that will compare levels of similarity (4 being the most similar, 0 being the least).
SELECT DIFFERENCE('Dinosaurs', 'Dinosores'); --returns 4
Edit:
After hunting around a bit for a multi-text option, it seems that this isn't all that easy. I would refer you to the link on the Fuzzt Logic answer provided by #Neil Knight (+1 to that, for me!).
This stackoverflow article also details possible sources for implentations for Fuzzy Logic in TSQL. Once respondant also outlined Full text Indexing as a potential that you might want to investigate.
Perhaps your RDBMS has a SOUNDEX function? You didn't mention which one was involved here.
SQL Server's SOUNDEX
Just to throw an alternative out there. If SSIS is an option, then you can use Fuzzy Lookup.
SSIS Fuzzy Lookup
I'm not sure if introducing a separate "search engine" is possible, but if you look at products like the Google search appliance or Autonomy, these products can index a SQL database and provide more searching options - for example, handling misspellings as well as synonyms, search results weighting, alternative search recommendations, etc.
Also, SQL Server's full-text search feature can be configured to use a thesaurus, which might help:
http://msdn.microsoft.com/en-us/library/ms142491.aspx
Here is another SO question from someone setting up a thesaurus to handle common misspellings:
FORMSOF Thesaurus in SQL Server
Short answer, there is nothing built in to most SQL engines that can do dictionary-based correction of "fat fingers". SoundEx does work as a tool to find words that would sound alike and thus correct for phonetic misspellings, but if the user typed in "Dinosars" missing the final U, or truly "fat-fingered" it and entered "Dinosayrs", SoundEx would not return an exact match.
Sounds like you want something on the level of Google Search's "Did you mean __?" feature. I can tell you that is not as simple as it looks. At a 10,000-foot level, the search engine would look at each of those keywords and see if it's in a "dictionary" of known "good" search terms. If it isn't, it uses an algorithm much like a spell-checker suggestion to find the dictionary word that is the closest match (requires the fewest letter substitutions, additions, deletions and transpositions to turn the given word into the dictionary word). This will require some heavy procedural code, either in a stored proc or CLR Db function in your database, or in your business logic layer.
You can also try the SubString(), to eliminate the first 3 or so characters . Below is an example of how that can be achieved
SELECT Fname, Lname
FROM Table1 ,Table2
WHERE substr(Table1.Fname, 1,3) || substr(Table1.Lname,1 ,3) = substr(Table2.Fname, 1,3) || substr(Table2.Lname, 1 , 3))
ORDER BY Table1.Fname;

Placing index columns on the left of a mysql WHERE statement?

I was curious since i read it in a doc. Does writing
select * from CONTACTS where id = ‘098’ and name like ‘Tom%’;
speed up the query as oppose to
select * from CONTACTS where name like ‘Tom%’ and id = ‘098’;
The first has an indexed column on the left side. Does it actually speed things up or is it superstition?
Using php and mysql
Check the query plans with explain. They should be exactly the same.
This is purely superstition. I see no reason that either query would differ in speed. If it was an OR query rather than an AND query however, then I could see that having it on the left may spped things up.
interesting question, i tried this once. query plans are the same (using EXPLAIN).
but considering short-circuit-evaluation i was wondering too why there is no difference (or does mysql fully evaluate boolean statements?)
You may be mis-remembering or mis-reading something else, regarding which side the wildcards are on a string literal in a Like predicate. Putting the wildcard on the right (as in yr example), allows the query engine to use any indices that might exist on the table column you are searching (in this case - name). But if you put the wildcard on the left,
select * from CONTACTS where name like ‘%Tom’ and id = ‘098’;
then the engine cannot use any existing index and must do a complete table scan.

Need Pattern for dynamic search of multiple sql tables

I'm looking for a pattern for performing a dynamic search on multiple tables.
I have no control over the legacy (and poorly designed) database table structure.
Consider a scenario similar to a resume search where a user may want to perform a search against any of the data in the resume and get back a list of resumes that match their search criteria. Any field can be searched at anytime and in combination with one or more other fields.
The actual sql query gets created dynamically depending on which fields are searched. Most solutions I've found involve complicated if blocks, but I can't help but think there must be a more elegant solution since this must be a solved problem by now.
Yeah, so I've started down the path of dynamically building the sql in code. Seems godawful. If I really try to support the requested ability to query any combination of any field in any table this is going to be one MASSIVE set of if statements. shiver
I believe I read that COALESCE only works if your data does not contain NULLs. Is that correct? If so, no go, since I have NULL values all over the place.
As far as I understand (and I'm also someone who has written against a horrible legacy database), there is no such thing as dynamic WHERE clauses. It has NOT been solved.
Personally, I prefer to generate my dynamic searches in code. Makes testing convenient. Note, when you create your sql queries in code, don't concatenate in user input. Use your #variables!
The only alternative is to use the COALESCE operator. Let's say you have the following table:
Users
-----------
Name nvarchar(20)
Nickname nvarchar(10)
and you want to search optionally for name or nickname. The following query will do this:
SELECT Name, Nickname
FROM Users
WHERE
Name = COALESCE(#name, Name) AND
Nickname = COALESCE(#nick, Nickname)
If you don't want to search for something, just pass in a null. For example, passing in "brian" for #name and null for #nick results in the following query being evaluated:
SELECT Name, Nickname
FROM Users
WHERE
Name = 'brian' AND
Nickname = Nickname
The coalesce operator turns the null into an identity evaluation, which is always true and doesn't affect the where clause.
Search and normalization can be at odds with each other. So probably first thing would be to get some kind of "view" that shows all the fields that can be searched as a single row with a single key getting you the resume. then you can throw something like Lucene in front of that to give you a full text index of those rows, the way that works is, you ask it for "x" in this view and it returns to you the key. Its a great solution and come recommended by joel himself on the podcast within the first 2 months IIRC.
What you need is something like SphinxSearch (for MySQL) or Apache Lucene.
As you said in your example lets imagine a Resume that will composed of several fields:
List item
Name,
Adreess,
Education (this could be a table on its own) or
Work experience (this could grow to its own table where each row represents a previous job)
So searching for a word in all those fields with WHERE rapidly becomes a very long query with several JOINS.
Instead you could change your framework of reference and think of the Whole resume as what it is a Single Document and you just want to search said document.
This is where tools like Sphinx Search do. They create a FULL TEXT index of your 'document' and then you can query sphinx and it will give you back where in the Database that record was found.
Really good search results.
Don't worry about this tools not being part of your RDBMS it will save you a lot of headaches to use the appropriate model "Documents" vs the incorrect one "TABLES" for this application.