I have an application where search queries take too much time. There are several different search queries, and most of them use the LIKE operator (with '%__%'). I need some general guidelines (dos and don'ts) for writing better and faster search queries.
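Querying in SQL with a wildcard at the end is quite efficient, because an index on the column can still be used to find the prefix. There is definitely a performance issue with a wildcard at the beginning, since the index ordering no longer helps and every row has to be checked.
One way around this is Full Text Search, which SQL Server 2008 supports.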
You'll have to change how your query works to utilize full text indexing, but it should dramatically improve your text querying performance.
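To illustrate why the position of the wildcard matters, here is a minimal sketch in Python. It is not how any particular database engine is implemented; the sorted list only stands in for an ordered index, and the names `prefix_search` and `contains_search` are made up for the example.

```python
import bisect

# Hypothetical sorted "index" over a name column (an ordered index keeps keys sorted too).
names = sorted(["alpha", "alphabet", "beta", "betamax", "gamma", "megabeta"])

def prefix_search(sorted_keys, prefix):
    """Like LIKE 'prefix%': binary-search to the first candidate, read while it matches."""
    position = bisect.bisect_left(sorted_keys, prefix)
    matches = []
    while position < len(sorted_keys) and sorted_keys[position].startswith(prefix):
        matches.append(sorted_keys[position])
        position += 1
    return matches

def contains_search(keys, term):
    """Like LIKE '%term%': ordering does not help, so every key is examined."""
    return [key for key in keys if term in key]

prefix_hits = prefix_search(names, "beta")
contains_hits = contains_search(names, "beta")
```

The prefix search touches only the matching keys plus one more, while the contains search has to look at the whole list, which is roughly the difference between an index seek and a full scan.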
Related
I am writing a prototype of a new app for an enterprise. I want to include a great search engine, which is something they have never had before. What I am looking for is something that can translate a Lucene-style query language into SQL statements on a key-value pair data model (three fields: grouping id, key, value).
I've been looking for a while now and haven't had any luck. I'm about to open the source for Lucene and see if I can pull the query algorithms out and have them generate SQL instead of index search commands, but I'm not very hopeful.
I can't just run Lucene or any other indexing system in this enterprise for political and regulatory reasons, so that's not an option.
Does this type of system exist?
see if I can pull the query algorithms out and have them generate sql instead
Don't waste your time. SQL and Lucene queries work in a completely different way; this is because they use different underlying data structures, algorithms, etc.
The best you can do is to write an SQL query parser and rewrite those queries into Lucene queries. But you'd have to be naive to think you can write a full-blown SQL query parser. You can easily solve simple cases, but what are you going to do when somebody sends you a JOIN? Or a GROUP BY bar HAVING foo>3?
If you can't jump over political hurdles, just use one of the full text indexing algorithms databases can offer; this is better than nothing.
Is there a way to query a full text index to help determine additional noise words? I would like to add some custom noise words and wondered if there's a way to analyse the index to help determine suggestions.
As simple as in
http://arcanecode.com/2008/05/29/creating-and-customizing-noise-words-in-sql-server-2005-full-text-search/
where this is explained (how to do it). Coming up with proper ones, though, is hard.
I decided to look into Lucene.NET because I wasn't happy with the relevance calculations in SQL Server full text indexing.
I managed to figure out how to index all the content pretty quickly and then used Luke to find noise words. I have now edited the SQL Server noise files based on this analysis. Now I have a search solution that works reasonably well using SQL Server full text indexing, but I plan to move to Lucene.NET in the future.
Using SQL Server full text indexing as a base, I developed a domain-centric approach to finding relevant content using a tool I understood. After some serious thinking and testing, I ended up using many measures to determine the relevance of a search result beyond what analysing text content for term frequency and word distance provides. SQL Server full text indexing gave me a great start, and now I have a strategy I can express using Lucene that will work very well.
It would have taken me a whole lot longer to understand Lucene and develop a strategy for the search at the same time. If anyone out there is still reading this, use full text indexing for testing your idea and then move to Lucene once you have a strategy you know will work for your domain.
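The noise-word analysis mentioned above can be approximated without any search engine at all. The following is a rough sketch, assuming the content is available as plain strings; the function name `suggest_noise_words` and the `min_doc_ratio` threshold are hypothetical, and this is not what Luke or SQL Server do internally. It simply ranks terms by how many documents they appear in.

```python
import re
from collections import Counter

def suggest_noise_words(documents, min_doc_ratio=0.8):
    """Return terms that occur in at least min_doc_ratio of the documents."""
    doc_freq = Counter()
    for text in documents:
        # Count each term once per document (document frequency, not raw count).
        doc_freq.update(set(re.findall(r"[a-z0-9']+", text.lower())))
    threshold = min_doc_ratio * len(documents)
    frequent = [(term, count) for term, count in doc_freq.items() if count >= threshold]
    # Most widespread terms first; ties broken alphabetically for stable output.
    frequent.sort(key=lambda item: (-item[1], item[0]))
    return [term for term, _ in frequent]

docs = [
    "The product ships with the manual",
    "The manual covers the product warranty",
    "Warranty claims for the product",
]
candidates = suggest_noise_words(docs, min_doc_ratio=1.0)
```

Terms that show up in nearly every document carry little information for ranking, so they are candidates for the noise list. Note that domain words (here "product") surface alongside ordinary stop words, which is why the suggestions still need a human review.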
I've found a number of resources that talk about tuning the database server, but I haven't found much on the tuning of the individual queries.
For instance, in Oracle, I might try adding hints to ignore indexes or to use sort-merge vs. correlated joins, but I can't find much on tuning Postgres other than using explicit joins and recommendations when bulk loading tables.
Do any such guides exist so I can focus on tuning the most run and/or underperforming queries, hopefully without adversely affecting the currently well-performing queries?
I'd even be happy to find something that compared how certain types of queries performed relative to other databases, so I had a better clue of what sort of things to avoid.
update:
I should've mentioned, I took all of the Oracle DBA classes along with their data modeling and SQL tuning classes back in the 8i days ... so I know about 'EXPLAIN', but that's more to tell you what's going wrong with the query, not necessarily how to make it better. (e.g., are 'where var=1 or var=2' and 'where var in (1,2)' considered the same when generating an execution plan? What if I'm doing it with 10 permutations? When are multi-column indexes used? Are there ways to get the planner to optimize for fastest start vs. fastest finish? What sort of 'gotchas' might I run into when moving from mySQL, Oracle or some other RDBMS?)
I could write any complex query dozens if not hundreds of ways, and I'm hoping to not have to try them all and find which one works best through trial and error. I've already found that 'SELECT count(*)' won't use an index, but 'SELECT count(primary_key)' will ... maybe a 'PostgreSQL for experienced SQL users' sort of document that explained sorts of queries to avoid, and how best to re-write them, or how to get the planner to handle them better.
update 2:
I found a Comparison of different SQL Implementations which covers PostgreSQL, DB2, MS-SQL, mySQL, Oracle and Informix, and explains if, how, and gotchas on things you might try to do. Its references section linked to Oracle / SQL Server / DB2 / Mckoi / MySQL Database Equivalents (which is what its title suggests) and to the wikibook SQL Dialects Reference, which covers whatever people contribute (includes some DB2, SQLite, mySQL, PostgreSQL, Firebird, Virtuoso, Oracle, MS-SQL, Ingres, and Linter).
As for badly performing queries - run EXPLAIN ANALYZE and read the output.
You can paste the EXPLAIN ANALYZE output into a site like explain.depesz.com - it will help you find the elements that really take the most time.
There is a nice online tool that takes the output of EXPLAIN ANALYZE, and graphically shows you critical parts (e.g. wrong estimates, hot spots, etc)
http://explain.depesz.com/help
Btw, I think posted queries become public, and the "previous explains" link has been hit by spambots.
http://www.postgresql.org/docs/current/static/indexes-examine.html
You can give hints of a sort through planner settings: SET enable_indexscan TO false; makes PostgreSQL avoid plain index scans wherever another plan is possible.
To address your point, unfortunately the only way to tune a query in Postgres is pretty much to tune the database underlying it. In Oracle, you can set all of those options on a query-by-query basis and trump the optimizer's plan in the process, but in Postgres, you're pretty much at the mercy of the optimizer, for good and ill.
The PGAdmin3 tool includes a graphical explanation tool for breaking down how a query is handled. It also is especially helpful for showing where table scans occur.
The best I've seen are in here: http://wiki.postgresql.org/wiki/Using_EXPLAIN, but the latest PDF in there is from 2008, so there may be something more recent. I'm interested to hear other users' answers.
Also, something's brewing in the contrib packages: http://www.sai.msu.su/~megera/wiki/plantuner
I have a SQL 2000 database with around 10 million rows and I need to make a query to get the product information based on the full / partial text search.
Based on this I need to join back to other tables to check on my business process.
I have this implemented using a SQL stored procedure, but I can only validate around 6 rows a second (without threads; it's long business logic). I am trying to find better ways to improve performance.
Lucene.NET might help with this. I have a couple of questions.
Can you point me to the right sources?
While building index on Lucene, how would I sync up with the SQL database and lucene DB?
Do you think Lucene can give real performance gain?
You can start with Mark Krellenstein's 'Search Engine versus DBMS', to see whether a full text search engine, such as Lucene, is the solution for you. In theory, Lucene should be faster than SQL for textual search, but your mileage may vary.
You can do incremental updates with Lucene, which are a bit similar to database replication. This keeps the Lucene index synchronized with the database.
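As a rough illustration of the incremental-update idea, here is a sketch in plain Python rather than Lucene itself. It assumes each database row carries a change stamp (the hypothetical `updated_at` field) and that the index remembers the highest stamp it has already processed; `Row` and `SearchIndex` are made-up names, not Lucene classes.

```python
from dataclasses import dataclass, field

@dataclass
class Row:
    id: int
    text: str
    updated_at: int  # hypothetical, monotonically increasing change stamp

@dataclass
class SearchIndex:
    docs: dict = field(default_factory=dict)  # row id -> indexed text
    last_synced: int = 0                      # highest change stamp already indexed

    def sync(self, rows):
        """Re-index only the rows changed since the previous sync; return how many."""
        changed = [row for row in rows if row.updated_at > self.last_synced]
        for row in changed:
            self.docs[row.id] = row.text      # replace any stale copy
        if changed:
            self.last_synced = max(row.updated_at for row in changed)
        return len(changed)

table = [Row(1, "red widget", 10), Row(2, "blue widget", 11)]
index = SearchIndex()
first = index.sync(table)               # initial load indexes both rows
table[1] = Row(2, "green widget", 12)   # simulate an UPDATE in the database
second = index.sync(table)              # only the modified row is re-indexed
```

This sketch does not handle deleted rows; a real setup would also need some way to learn about deletions, for example a tombstone table or a periodic full rebuild.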
Here is an article on using LINQ to Lucene to work with SQL. This may point you in the right direction.
I have an MS SQL database and have a varchar field that I would like to do queries like where name like '%searchTerm%'. But right now it is too slow, even with SQL enterprise's full text indexing.
Can someone explain how Lucene .Net might help my situation? How does the indexer work? How do queries work?
What is done for me, and what do I have to do?
I saw this guy (Michael Neel) present on Lucene at a user group meeting - effectively, you build index files (using Lucene) and they have pointers to whatever you want (database rows, whatever)
http://code.google.com/p/vinull/source/browse/#svn/Examples/LuceneSearch
Very fast, flexible and powerful.
What's good with Lucene is the ability to index a variety of things (files, images, database rows) together in your own index using Lucene and then translating that back to your business domain, whereas with SQL Server, it all has to be in SQL to be indexed.
It doesn't look like his slides are up there in Google code.
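The idea of index files that hold pointers back to database rows can be shown with a toy inverted index. This is only a sketch of the general technique in Python, not Lucene's actual file format or API; `build_index`, `search`, and the sample rows are invented for the example.

```python
from collections import defaultdict

def build_index(rows):
    """Map each lower-cased word to the set of row ids that contain it."""
    index = defaultdict(set)
    for row_id, text in rows.items():
        for word in text.lower().split():
            index[word].add(row_id)
    return index

def search(index, query):
    """Return ids of rows containing every word in the query (AND semantics)."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

# Hypothetical rows keyed by primary key, as they might come from a products table.
rows = {101: "Blue steel widget", 102: "Red steel bolt", 103: "Blue plastic widget"}
index = build_index(rows)
hits = search(index, "blue widget")
```

The search returns primary keys only, so the application then fetches the matching rows from the database by id. Each lookup is a dictionary access per query word instead of a scan over every row, which is where the speed comes from.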
This article (strangely enough it's on the top of the Google search results :) has a fairly good description of how the Lucene search could be optimised.
A properly configured Lucene should easily beat SQL Server (pre-2005) full-text indexing search. If you are on MS SQL 2005 and your search performance is still too slow, you might consider checking your DB setup.