I'm trying to optimize my queries by using ColdFusion's Query of Queries feature to access a cached query of about 45,000 words.
With the query below I had a lot of success in speed by switching to QoQ:
<cfquery name="FindAnagrams" dbtype="query" >
SELECT AllWords.Word, AllWords.AnagramKey
FROM AllWords
WHERE AllWords.WordLength = #i#
</cfquery>
Executions went from ~400ms to ~15ms.
This query below, however, was only slightly reduced in execution time (from ~500ms to ~400ms):
<cfquery name="TopStartWith" dbtype="query" maxrows="15">
SELECT AllWords.Word
FROM AllWords
WHERE AllWords.Word LIKE <cfoutput>'#Word#%' </cfoutput>
AND AllWords.Word <> '#Word#'
ORDER BY AllWords.Frequency DESC;
</cfquery>
Removing maxrows did not really help. My database fields are indexed and I'm at the end of my knowledge of optimizing queries (can you index a column of a CF QoQ object?). I suspect it is the ORDER BY that is causing the delay, but I am unsure. How can I further improve the speed of such queries? Many thanks.
For optimizing the second query, there are a couple of approaches you could take.
Firstly, see if your database supports something like function-based indexes (an Oracle term, but the feature is available on other platforms). See this for a MySQL example: Is it possible to have function-based index in MySQL?
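For example, something along these lines might work (untested, just a sketch; it assumes the AllWords table and Word column from your question, and either an Oracle backend or MySQL 5.7+ with generated columns):

-- Oracle: a function-based index on the first three letters of the word
CREATE INDEX idx_allwords_prefix ON AllWords (SUBSTR(Word, 1, 3));

-- MySQL 5.7+: index an equivalent stored generated column instead
ALTER TABLE AllWords
  ADD COLUMN WordPrefix CHAR(3)
  GENERATED ALWAYS AS (SUBSTRING(Word, 1, 3)) STORED;
CREATE INDEX idx_allwords_wordprefix ON AllWords (WordPrefix);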
Secondly, you could pre-process your words into a structure which supports the query you're after. I'm assuming you're currently loading the query into application or session scope elsewhere. When you do that you could also process the words into a structure like:
{
'tha':['thames','that'],
'the':['them','then','there'],
//etc
}
Instead of running a QoQ, you get the first 3 letters of the word, look up the array, then iterate over it, finding matches. Essentially, it's pretty similar to what a function-based index is doing, but in code. You're trading memory for speed, but with only 45,000 words, the structure isn't going to be enormous.
The LIKE clause probably causes the poor performance of your second query. You can see a similar performance penalty if you use LIKE in a regular database query. Since LIKE performs a wildcard search against the entire string stored in the database column, it can't just do an EQUALS comparison.
I want to improve the performance of a simple query, with a typical structure like this:
SELECT title,datetime
FROM LICENSE_MOVIES
WHERE client='Alex'
As you can read on different websites, like this one, you should create an index like this:
CREATE INDEX INDEX_LICENSE_MOVIES
ON LICENSE_MOVIES(client);
But there is no performance improvement in the query; it is as if it were "ignoring" the index.
I have tried to use hints, as this webpage says.
And the query ends up like this:
SELECT /*+ INDEX(LICENSE_MOVIES INDEX_LICENSE_MOVIES) */ title, datetime
FROM LICENSE_MOVIES
WHERE client='Alex'
Is there any error in this syntax? Why can't I see any improvement?
Oracle has a smart optimizer. It does not always use indexes -- in fact, you might be surprised to learn that sometimes using an index is exactly the wrong thing to do.
In your case, your data fits on a handful of data pages (well, dozens). The question is: how many "Alex"s are in the data? If there is just one, then Oracle should use the index, as follows:
Oracle looks up the row containing "Alex" in the index.
Oracle identifies the data page where the row is located.
Oracle loads the data page.
Oracle processes the query and returns the results.
If lots of rows (say more than a few dozen) are for "Alex", then the optimizer is going to "think" . . . "Gosh, I need to read every data page anyway. Let me avoid using the index and just scan all the data."
Of course, this decision is based on the available statistics (which might be inaccurate or out-of-date). But there are definitely circumstances where a full table scan is the right approach, even when an index is available.
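If you want to see what the optimizer is actually choosing, something like the following (sketched from memory, run in SQL*Plus or similar) refreshes the statistics and shows the plan:

-- refresh the optimizer statistics for the table
EXEC DBMS_STATS.GATHER_TABLE_STATS(USER, 'LICENSE_MOVIES');

-- then look at the plan Oracle picks for the query
EXPLAIN PLAN FOR
SELECT title, datetime FROM LICENSE_MOVIES WHERE client = 'Alex';

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);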
We’re having a problem we were hoping the good folks of Stack Overflow could help us with. We’re running SQL Server 2008 R2 and are having problems with a query that takes a very long time to run on a moderate set of data, about 100,000 rows. We're using CONTAINS to search through XML files and LIKE on another column to support leading wildcards.
We’ve reproduced the problem with the following small query that takes about 35 seconds to run:
SELECT something FROM table1
WHERE (CONTAINS(TextColumn, '"WhatEver"') OR
DescriptionColumn LIKE '%WhatEver%')
If we modify the query above to use UNION instead, the running time drops from 35 seconds to under 1 second. We would like to avoid using this approach to solve the issue.
SELECT something FROM table1 WHERE CONTAINS(TextColumn, '"WhatEver"')
UNION
SELECT something FROM table1 WHERE DescriptionColumn LIKE '%WhatEver%'
The column that we’re using CONTAINS to search through is of type image and contains XML files anywhere from 1k to 20k in size.
We have no good theories as to why the first query is so slow so we were hoping someone here would have something wise to say on the matter. The query plans don’t show anything out of the ordinary as far as we can tell. We've also rebuilt the indexes and statistics.
Is there anything blatantly obvious we’re overlooking here?
Thanks in advance for your time!
Why are you using DescriptionColumn LIKE '%WhatEver%' instead of CONTAINS(DescriptionColumn, '"WhatEver"')?
CONTAINS is obviously a Full-Text predicate and will use the SQL Server Full-Text engine to filter the search results; LIKE, however, is a "normal" SQL Server keyword, so SQL Server will not use the Full-Text engine to assist with this query. In this case, because the LIKE term begins with a wildcard, SQL Server will be unable to use any indexes to help with the query either, which will most likely result in a table scan and/or poorer performance than using the Full-Text engine.
It's difficult (impossible, really) to tell without an execution plan; however, my guess at what's happening would be:
The UNION variation of the query is performing a table scan against table1 - the table scan is not fast, but because there are relatively few rows in the table it is not performing that slowly (compared to a 35s benchmark).
In the OR variation of the query, SQL Server first uses the Full-Text engine to filter based on the CONTAINS, and then goes on to perform a RID lookup on each matching row in the result to filter based on the LIKE predicate. However, for some reason SQL Server has massively underestimated the number of rows (this can happen with certain types of predicate) and so goes on to perform several thousand RID lookups, which ends up being incredibly slow (a table scan would have been much quicker).
To really understand what's going on you need to get a query plan.
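For a rough comparison you could run both variations with statistics switched on (a sketch using the names from your question) and compare logical reads and elapsed time, alongside the actual execution plans from SSMS:

SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- OR variation
SELECT something FROM table1
WHERE CONTAINS(TextColumn, '"WhatEver"')
   OR DescriptionColumn LIKE '%WhatEver%';

-- UNION variation
SELECT something FROM table1 WHERE CONTAINS(TextColumn, '"WhatEver"')
UNION
SELECT something FROM table1 WHERE DescriptionColumn LIKE '%WhatEver%';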
Did you guys try this:
SELECT *
FROM table
WHERE CONTAINS((column1, column2, column3), '"*keyword*"')
Instead of this:
SELECT *
FROM table
WHERE CONTAINS(column1, '"*keyword*"')
OR CONTAINS(column2, '"*keyword*"')
OR CONTAINS(column3, '"*keyword*"')
The first one is a lot faster.
I just ran into this. This is reportedly a bug in SQL Server 2008 R2:
http://www.arcomit.co.uk/support/kb.aspx?kbid=000060
Your approach of using a UNION of two selects instead of an OR is the workaround they recommend in that article.
I have a query that looks something like this:
select xmlelement("rootNode",
(case
when XH.ID is not null then
xmlelement("xhID", XH.ID)
else
xmlelement("xhID", xmlattributes('true' AS "xsi:nil"), XH.ID)
end),
(case
when XH.SER_NUM is not null then
xmlelement("serialNumber", XH.SER_NUM)
else
xmlelement("serialNumber", xmlattributes('true' AS "xsi:nil"), XH.SER_NUM)
end),
/*repeat this pattern for many more columns from the same table...*/
FROM XH
WHERE XH.ID = 'SOMETHINGOROTHER'
It's ugly and I don't like it, and it is also the slowest executing query (there are others of similar form, but much smaller and they aren't causing any major problems - yet). Maintenance is relatively easy as this is mostly a generated query, but my concern now is for performance. I am wondering how much of an overhead there is for all of these case expressions.
To see if there was any difference, I wrote another version of this query as:
select xmlelement("rootNode",
xmlforest(XH.ID, XH.SER_NUM,...
(I know that this query does not produce exactly the same thing; my plan was to move the logic for handling the renaming and the xsi:nil attribute to XSL or maybe to PL/SQL.)
I tried to get execution plans for both versions, but they are the same. I'm guessing that the logic does not get factored into the execution plan. My gut tells me the second version should execute faster, but I'd like some way to prove that (other than writing a PL/SQL test function with timing statements before and after the query and running that code over and over again to get a test sample).
Is it possible to get a good idea of how much the case-when will cost?
Also, I could write the case-when using the decode function instead. Would that perform better (than case-statements)?
Just about anything in your SELECT list, unless it is a user-defined function which reads a table or view, or a nested subselect, can usually be neglected for the purpose of analyzing your query's performance.
Open your connection properties and set the value SET STATISTICS IO on. Check out how many reads are happening. View the query plan. Are your indexes being used properly? Do you know how to analyze the plan to see?
For the purposes of performance tuning you are dealing with this statement:
SELECT *
FROM XH
WHERE XH.ID = 'SOMETHINGOROTHER'
How does that query perform? If it returns in markedly less time than the XML version then you need to consider the performance of the functions, but I would be astonished if that were the case (oh ho!).
Does this return one row or several? If one row then you have only two things to work with:
is XH.ID indexed and, if so, is the index being used?
does the "many more columns from the same table" indicate a problem with chained rows?
If the query returns several rows then ... Well, actually you have the same two things to work with. It's just the emphasis is different with regards to indexes. If the index has a very poor clustering factor then it could be faster to avoid using the index in favour of a full table scan.
Beyond that you would need to look at physical problems - I/O bottlenecks, poor interconnects, a dodgy disk. The reason why your scope for tuning the query is so restricted is because - as presented - it is a single table, single column read. Most tuning is about efficient joining. Now if XH transpires to be a view over a complex query then it is a different matter.
You can use good old tkprof to analyze statistics. Use one of the many forms of ALTER SESSION that turn on stats gathering. The DBMS_PROFILER package also gathers statistics if your cursor is in a PL/SQL code block.
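A minimal sketch of the ALTER SESSION route (the trace file identifier is just an arbitrary tag I made up):

-- tag the trace file so it is easy to find, then switch SQL trace on
ALTER SESSION SET TRACEFILE_IDENTIFIER = 'xml_case_test';
ALTER SESSION SET SQL_TRACE = TRUE;

-- run both versions of the query here

ALTER SESSION SET SQL_TRACE = FALSE;
-- then run tkprof on the generated trace file to compare the two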
How does one performance tune a SQL Query?
What tricks/tools/concepts can be used to change the performance of a SQL Query?
How can the benefits be Quantified?
What does one need to be careful of?
What tricks/tools/concepts can be used to change the performance of a SQL Query?
Using Indexes? How do they work in practice?
Normalised vs Denormalised Data? What are the performance vs design/maintenance trade-offs?
Pre-processed intermediate tables? Created with triggers or batch jobs?
Restructure the query to use Temp Tables, Sub Queries, etc?
Separate complex queries into multiples and UNION the results?
Anything else?
How can performance be Quantified?
Reads?
CPU Time?
"% Query Cost" when different versions run together?
Anything else?
What does one need to be careful of?
Time to generate Execution Plans? (Stored Procs vs Inline Queries)
Stored Procs being forced to recompile
Testing on small data sets (Do the queries scale linearly, or square law, etc?)
Results of previous runs being cached
Optimising "normal case", but harming "worst case"
What is "Parameter Sniffing"?
Anything else?
Note to moderators:
This is a huge question; should I have split it up into multiple questions?
Note To Responders:
Because this is a huge question please reference other questions/answers/articles rather than writing lengthy explanations.
I really like the book "Professional SQL Server 2005 Performance Tuning" to answer this. It's Wiley/Wrox, and no, I'm not an author, heh. But it explains a lot of the things you ask for here, plus hardware issues.
But yes, this question is way, way beyond the scope of something that can be answered in a comment box like this one.
Writing sargable queries is one of the things needed; if you don't write sargable queries then the optimizer can't take advantage of the indexes. Here is one example: Only In A Database Can You Get 1000%+ Improvement By Changing A Few Lines Of Code, where a query went from over 24 hours to 36 seconds.
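As a rough illustration of the idea (the table and column names here are invented):

-- non-sargable: the function wrapped around the column blocks an index seek
SELECT OrderID FROM Orders WHERE YEAR(OrderDate) = 2008;

-- sargable rewrite: a range on the bare column can use an index on OrderDate
SELECT OrderID FROM Orders
WHERE OrderDate >= '20080101' AND OrderDate < '20090101';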
Of course you also need to know the difference between these 3 join types:
loop join,
hash join,
merge join
see here: http://msdn.microsoft.com/en-us/library/ms173815.aspx
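In SQL Server you can force each physical join type with a join hint, which is a handy way to compare them in the execution plan; a sketch with made-up table names (don't leave hints like this in production code):

-- force a nested loops join; HASH or MERGE can be substituted for LOOP
SELECT c.CustomerName, o.OrderDate
FROM Customers AS c
INNER LOOP JOIN Orders AS o
    ON o.CustomerID = c.CustomerID;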
Here are some basic steps to follow:
Define business requirements first
SELECT fields instead of using SELECT *
Avoid SELECT DISTINCT
Create joins with INNER JOIN (not WHERE); see the sketch after this list
Use WHERE instead of HAVING to define filters
Proper indexing
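For instance, the SELECT-list and INNER JOIN points look something like this (table and column names are invented, so treat this as a sketch):

-- implicit join in the WHERE clause, returning every column:
SELECT *
FROM Customers c, Orders o
WHERE c.CustomerID = o.CustomerID;

-- explicit INNER JOIN with only the fields that are needed:
SELECT c.CustomerName, o.OrderDate
FROM Customers c
INNER JOIN Orders o ON o.CustomerID = c.CustomerID;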
Here are some basic steps which we can follow to increase the performance:
Check for indexes on the PK and FK columns of the tables involved; if it is still taking time, index the columns present in the query.
All indexes are modified after every write operation, so do not index each and every column.
Before a batch insertion, drop the indexes and then recreate them afterwards.
Select sparingly
Use IF EXISTS instead of COUNT (see the sketch after this list)
Before blaming the DBA, first check network connections
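A small T-SQL-flavoured sketch of the EXISTS-versus-COUNT point (the names are invented):

-- counts every matching row just to test for presence:
IF (SELECT COUNT(*) FROM Orders WHERE CustomerID = 42) > 0
    PRINT 'customer has orders';

-- EXISTS can stop at the first matching row:
IF EXISTS (SELECT 1 FROM Orders WHERE CustomerID = 42)
    PRINT 'customer has orders';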
We have a whole bunch of queries that "search" for clients, customers, etc. You can search by first name, email, etc. We're using LIKE statements in the following manner:
SELECT *
FROM customer
WHERE fname LIKE '%someName%'
Does full-text indexing help in the scenario? We're using SQL Server 2005.
It will depend upon your DBMS. I believe that most systems will not take advantage of the full-text index unless you use the full-text functions (e.g. MATCH/AGAINST in MySQL or FREETEXT/CONTAINS in MS SQL).
Here are two good articles on when, why, and how to use full-text indexing in SQL Server:
How To Use SQL Server Full-Text Searching
Solving Complex SQL Problems with Full-Text Indexing
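For illustration, the full-text predicates look roughly like this (this assumes a full-text index has already been created on fname; syntax sketched from memory):

-- SQL Server: goes through the full-text engine
SELECT * FROM customer WHERE CONTAINS(fname, '"someName"');

-- MySQL equivalent (needs a FULLTEXT index on fname)
SELECT * FROM customer WHERE MATCH(fname) AGAINST('someName');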
FTS can help in this scenario; the question is whether it is worth it or not.
To begin with, let's look at why LIKE may not be the most effective search. When you use LIKE, especially when you are searching with a % at the beginning of your comparison, SQL Server needs to perform both a table scan of every single row and a byte-by-byte check of the column you are checking.
FTS has some better algorithms for matching data, as well as better statistics on variations of names. Therefore FTS can provide better performance for matching Smith, Smythe, Smithers, etc. when you look for Smith.
It is, however, a bit more complex to use FTS, as you'll need to master CONTAINS vs FREETEXT and the arcane format of the search. However, if you want to do a search where either FName or LName match, you can do that with one statement instead of an OR.
To determine if FTS is going to be effective, determine how much data you have. I use FTS on a database of several hundred million rows and that's a real benefit over searching with LIKE, but I don't use it on every table.
If your table size is more reasonable, less than a few million rows, you can get similar speed by creating an index for each column that you're going to be searching on, and SQL Server should perform an index scan rather than a table scan.
According to my test scenario:
SQL Server 2008
10,000,000 rows, each with a string like "wordA wordB wordC ..." (varying between 1 and 30 words)
selecting count(*) with CONTAINS(column, '"wordB"')
result size: several hundred thousand rows
catalog size: approx 1.8 GB
The full-text index was in the range of 2 seconds, whereas LIKE '% wordB %' was in the range of 1-2 minutes.
But this holds only if you don't use any additional selection criteria! E.g. if I additionally used a LIKE 'prefix%' on a primary key column, the performance was worse, since going into the full-text index costs more than doing a string search in a few fields (as long as there are not too many of them).
So I would recommend a full-text index only in cases where you have to do a "free string search" or use some of its special features...
To answer the question specifically for MSSQL, full-text indexing will NOT help in your scenario.
In order to improve that query you could do one of the following:
(1) Configure a full-text catalog on the column and use the CONTAINS() function.
(2) If you were primarily searching with a prefix (i.e. matching from the start of the name), you could change the predicate to the following and create an index over the column:
where fname like 'prefix%'
(1) is probably overkill for this, unless the performance of the query is a big problem.
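A sketch of option (2), assuming the customer table from the question (the index name is made up):

CREATE INDEX IX_customer_fname ON customer (fname);

-- a prefix search with no leading wildcard can seek on that index
SELECT * FROM customer WHERE fname LIKE 'someName%';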