Oracle: Poor performance on smaller result sets?

I am running into a very strange bit of behavior with a query in Oracle. The query itself is enormous and quite complex... but is basically the same every time I run it. However, it seems to execute more slowly when returning a smaller result set. The best example I can give is that if I set this filter on it,
and mgz_query.IsQueryStatus(10001,lqo.query_id)>0
which returns 960 of 12,429 records, I see an execution time of about 1.9 seconds. However, if I change the filter to
and mgz_query.IsQueryStatus(10005,lqo.query_id)>0
which returns 65 of 12,429 records, I see an execution time of about 6.8 seconds. When digging a bit deeper, I found that it seems the smaller result set was performing considerably more buffer gets than the larger result set. This seems completely counter-intuitive to me.
The query this is being run against is roughly 8000 characters long (Unless someone wants it, I'm not going to clutter this post with the entire query), includes 4 'Union All' statements, but otherwise filters primarily on indexes and is pretty efficient, apart from its massive size.
The filter in use is executed via the below function.
Function IsQueryStatus(Pni_QueryStateId in number,
                       Pni_Query_Id     in number) return pls_integer as
  -- Counts the m_query rows with the given id that are in the given state
  -- (typically 0 or 1).
  vn_count pls_integer;
Begin
  select count(1)
    into vn_count
    from m_query mq
   where mq.id = Pni_Query_Id
     and mq.state_id = Pni_QueryStateId;
  return vn_count;
End;
Any ideas as to what may be causing the smaller result set to perform so much worse than the large result set?

I think you are facing a situation where determining that something is not in the set takes a lot longer than determining if it is in the set. This can occur quite often. For instance, if there is an index on m_query(id), then consider how the where clause might be executed:
(1) The value Pni_Query_Id is looked up in the index. There is no match. Query is done with a value of 0.
(2) There are a bunch of matches. Now, let's fetch the pages where state_id is located and compare to Pni_QueryStateId. Ohh, that's a lot more work.
If that is the case, then having an index on m_query(id, state_id) should help the query.
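A minimal sketch of that index (the index name is just illustrative):
create index m_query_id_state_ix on m_query (id, state_id);
With both columns in the index, the count(1) in the function can be answered from the index alone, whether or not a matching row exists.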
By the way, this is assuming that the only change is in the function call in the where clause. If there are other changes to get fewer rows, you might just be calling this function fewer times.

Related

Why the difference in speed between these SQL queries?

I'm currently trying to write a query to sift through our ERP DB and I noticed a very odd drop in speed when I removed a filter condition.
This is what the Query looked like before (it took less than a second to complete, typically returned anywhere from 10 to hundreds of records depending on the order and item)
SELECT TOP 1000 jobmat.job, jobmat.suffix, jobmat.item, jobmat.matl_qty,
jobmat.ref_type, jobmat.ref_num, spec.NoteContent, spec.NoteDesc,
job.ord_num, jobmat.RowPointer
FROM jobmat
INNER JOIN ObjectNotes AS obj ON obj.RefRowPointer = jobmat.RowPointer
INNER JOIN SpecificNotes AS spec ON obj.SpecificNoteToken = spec.SpecificNoteToken
INNER JOIN job ON job.job = jobmat.job AND job.suffix = jobmat.suffix
WHERE ord_num LIKE '%3766%' AND ref_type = 'P' AND
(spec.NoteDesc LIKE '%description%' OR spec.NoteContent LIKE '%COMPANY%DWG%1162402%')
And this is what I changed the WHERE statement to:
WHERE ord_num LIKE '%3766%' AND ref_type = 'P' AND
spec.NoteContent LIKE '%COMPANY%DWG%1162402%'
Running it after having made this modification bumped my runtime up to about 9 seconds (it normally returns 1-3 records). Is there an obvious reason that I'm missing? I would have thought the runtimes would be roughly the same. Any help is greatly appreciated!
Edit: I have run both versions of this query many times to test, and the runtimes are fairly consistent; <1 second for version 1, ~9 seconds for version 2.
Your original query has to find 1000 matching rows before it returns the result set. It is doing this by creating a result set based on the JOINs and then filtering based on the WHERE clause.
The condition:
(spec.NoteDesc LIKE '%description%' OR spec.NoteContent LIKE '%COMPANY%DWG%1162402%')
matches more rows than:
(spec.NoteContent LIKE '%COMPANY%DWG%1162402%')
Hence, the second version has to go through and process more rows to get 1,000 matches.
That is why the second version is slower.
If you want the second version to seem faster, you can add an ORDER BY clause: that forces all the data to be read in both cases, and because the sort then takes additional time related to the number of matching rows, you'll probably see the second come out slightly faster than the first. One caveat: both will then take at least 9 seconds.
EDIT:
I like the above explanation, but if all the rows are processed in both cases, then it might not be correct. Here is another explanation, but it is more speculative.
It depends on NoteContent being very long, so the LIKE has a noticeable effect on query performance. The second version might be "optimized" by pushing the filtering clauses to the node in the processing that reads the data. This is something that SQL Server regularly does. That means that the LIKE is being processed on all the data in the second case.
Because of the OR condition, the first example cannot push the filtering down to that level. Then, the expensive LIKE on NoteContent is short-circuited in multiple ways:
The join conditions.
The other WHERE conditions.
The first LIKE condition.
Because of all this filtering, the second LIKE condition is run on many fewer rows, and the first version runs faster. Usually filtering before joining is a good idea, but there are definitely exceptions to this rule.
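One rough way to test which explanation applies (a sketch, SQL Server) is to compare the I/O of the two variants with the statistics output turned on:
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
-- run version 1 (with the OR on NoteDesc / NoteContent) here
-- run version 2 (NoteContent LIKE only) here
SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
If the second version reports far more logical reads on SpecificNotes and the joined tables, it really is working through many more rows before the TOP 1000 is satisfied.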

SQL Server: Returning different number of rows from the same source data?

Today I encountered by far the strangest behaviour of SQL Server 2008 I have seen so far (or maybe I was just too tired to figure out what was going on).
I have a fairly complicated statement operating on around 1,100,000 rows. My source data is in a table used by the subquery, and it doesn't change. The statement looks something like this (I'll include only the parts that could, in my opinion, cause the error):
SELECT
-- here are some columns straight from the subquery and some hardcoded columns like
'My company' AS CompanyName
-- and here is the most important part
SUM(Subq.ImportantFloatWithBigPrecision)
FROM
(
SELECT
-- Here are some simple columns fetched from table and after that:
udf.GetSomeMappedValue(Value) AS MappedValueFromFunction
,ImportantFloatWithBigPrecision
FROM MyVeryBigTable
WHERE ImportantFloatWithBigPrecision <> 0
AND(...)
) Subq
GROUP BY everything except SUM(Subq.ImportantFloatWithBigPrecision)
HAVING SUM(Subq.ImportantFloatWithBigPrecision)<>0
ORDER BY (...)
I've checked the subquery quite a few times and it returns the same results every time, but the whole query returns 850-855 rows from the same data! Most of the time (around 80%) it is 852 rows, but sometimes it's 855, sometimes 850, and I really have no idea why.
Surprisingly, removing the <> 0 condition from the subquery helped at first glance (so far I have got the same results 6 times), but it has a drastic impact on performance: for this number of rows it runs about 8-9 minutes longer (remember the udf? :/).
Could someone explain this to me? Any ideas/questions? I'm completely clueless...
EDIT (inspired by Crono): We're running this on two environments, dev and test, and comparing the results, so maybe some of the settings differ.
EDIT 2: Values of ImportantFloatWithBigPrecision come from a really wide range (around -1,000,000 to +1,000,000), but there are 'few' (probably, at this scale, around 25k-30k) rows with values really close to 0 (first non-zero digit in the 6th place after the decimal separator, some starting even further out), both negative and positive.
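For context (made-up values, not the real data): floating-point addition is not associative, so the same values summed in a different order need not give exactly the same total, which matters when sums this close to zero are then compared with <> 0:
DECLARE @a float = 1e17, @b float = -1e17, @c float = 1;
-- (1e17 + -1e17) + 1 = 1, but 1e17 + (-1e17 + 1) = 0,
-- because -1e17 + 1 rounds back to -1e17 at that magnitude.
SELECT (@a + @b) + @c AS summed_left_to_right,
       @a + (@b + @c) AS summed_right_to_left;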

Finding Execution time of query using SQL Developer

I am a beginner with Oracle DB.
I want to know execution time for a query. This query returns around 20,000 records.
When I look at it in SQL Developer, it shows only 50 rows, tweakable to a maximum of 500. And using F5, up to 5000.
I would have done this by making changes in the application, but application redeployment is not possible as it is running in production.
So I am limited to using only SQL Developer. How can I get the number of seconds spent executing the query?
Any ideas on this will help me.
Thank you.
Regards,
JE
If you scroll down past the 50 rows initially returned, it fetches more. When I want all of them, I just click on the first of the 50, then press Ctrl+End to scroll all the way to the bottom.
This will update the display of the time that was used (just above the results it will say something like "All Rows Fetched: 20000 in 3.606 seconds") giving you an accurate time for the complete query.
If your statement is part of an already deployed application and if you have rights to access the view V$SQLAREA, you could check for number of EXECUTIONS and CPU_TIME. You can search for the statement using SQL_TEXT:
SELECT CPU_TIME, EXECUTIONS
FROM V$SQLAREA
WHERE UPPER (SQL_TEXT) LIKE 'SELECT ... FROM ... %';
This is the most precise way to determine the actual run time. The view V$SESSION_LONGOPS might also be interesting for you.
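For example, a quick look at long-running operations while the statement is executing could be (a sketch; the columns shown are standard ones in that view):
SELECT sid, opname, target, sofar, totalwork,
       elapsed_seconds, time_remaining
FROM   v$session_longops
WHERE  time_remaining > 0;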
If you don't have access to those views you could also use a cursor loop for running through all records, e.g.
CREATE OR REPLACE PROCEDURE speedtest AS
  v_count pls_integer;
  v_start pls_integer;
  v_end   pls_integer;
  cursor c_cursor is
    SELECT ...;
BEGIN
  v_start := dbms_utility.get_time;   -- start time stamp (hundredths of a second)
  v_count := 0;
  FOR rec in c_cursor
  LOOP
    v_count := v_count + 1;
  END LOOP;
  v_end := dbms_utility.get_time;     -- end time stamp
  dbms_output.put_line('Fetched ' || v_count || ' rows in ' ||
                       ((v_end - v_start) / 100) || ' seconds');
END;
Depending on the architecture this might be more or less accurate, because data might need to be transmitted to the system your SQL is running on.
You can change those limits; but you'll be using some time in the data transfer between the DB and the client, and possibly for the display; and that in turn would be affected by the number of rows pulled by each fetch. Those things affect your application as well though, so looking at the raw execution time might not tell you the whole story anyway.
To change the worksheet (F5) limit, go to Tools->Preferences->Database->Worksheet, and increase the 'Max rows to print in a script' value (and maybe 'Max lines in Script output'). To change the fetch size go to the Database->Advanced panel in the preferences; maybe to match your application's value.
This isn't perfect, but if you don't want to see the actual data and just want the time it takes to run in the DB, you can wrap the query to get a single row:
select count(*) from (
  <your original query>
);
It will normally execute the entire original query and then count the results, which won't add anything significant to the time. (It's feasible it might rewrite the query internally I suppose, but I think that's unlikely, and you could use hints to avoid it if needed).
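If you are worried about that, one possibility (a sketch, using the standard NO_MERGE hint to keep the inline view as its own query block) is:
select /*+ no_merge(q) */ count(*) from (
  <your original query>
) q;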

long running queries: observing partial results?

As part of a data analysis project, I will be issuing some long running queries on a mysql database. My future course of action is contingent on the results I obtain along the way. It would be useful for me to be able to view partial results generated by a SELECT statement that is still running.
Is there a way to do this? Or am I stuck with waiting until the query completes to view results which were generated in the very first seconds it ran?
Thank you for any help : )
In the general case a partial result cannot be produced. For example, if you have an aggregate function with a GROUP BY clause, then all the data has to be analysed before the first row is returned. A LIMIT clause will not help you, because it is applied after the output is computed. Maybe you can give concrete data and the SQL query?
One thing you may consider is sampling your tables down. This is good practice in data analysis in general to get your iteration speed up when you're writing code.
For example, if you have table create privileges and you have some mega-huge table X with key unique_id and some data data_value
If unique_id is numeric, in nearly any database
create table sample_table as
select unique_id, data_value
from X
where mod(unique_id, <some_large_prime_number_like_1013>) = 1
will give you a random sample of data to work out your queries on, and you can inner join your sample_table against the other tables to improve the speed of testing / query results. Thanks to the sampling, your query results should be roughly representative of what you will get. Note that the number you're modding with has to be prime, otherwise it won't give a correct sample. The example above will shrink your table down to about 0.1% of the original size (0.0987% to be exact).
Most databases also have better sampling and random number methods than just using mod. Check the documentation to see what's available for your version.
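For example, in MySQL a quick, if non-repeatable, alternative to the mod trick is to filter on RAND(); a sketch using the same hypothetical table and columns as above:
create table sample_table_rand as
select unique_id, data_value
from X
where rand() < 0.001;  -- keep roughly 0.1% of rows; the sample changes on each run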
Hope that helps,
McPeterson
It depends on what your query is doing. If it needs to have the whole result set before producing output - such as might happen for queries with GROUP BY, ORDER BY, or HAVING clauses - then there is nothing to be done.
If, however, the reason for the delay is client-side buffering (which is the default mode), then that can be adjusted by setting "mysql_use_result" as an attribute of the database handle rather than the default "mysql_store_result". This is true for the Perl and Java interfaces; in the C interface, I think you have to use mysql_use_result() instead of mysql_store_result() to retrieve the results unbuffered.

What was your coolest SQL optimization, on a slow performing query?

Just speaking to a colleague of mine. He was walking with a hop in his step, on the way to the coffee machine.
I asked him "what's with the 'swarmy' walk?", he said, "I just reduced a two hour long query down to 40 seconds! It feels so good".
He altered a stored procedure that was using cursors and introduced a temp table refactored from the original dataset - I will email him soon to get more info on the actual implementation.
But ultimately, he was buzzing.
The question is: what SQL sticks in your mind and has made you buzz whilst optimising slow-performing queries?
I have to say it was when I learned how to create and use covered indexes. Now, THAT was a performance booster.
Using SQL's BULKIMPORT to reduce several hours of inherited INSERT code to less than a minute.
It's always nice to take a poorly written, cursor-laden query and eliminate cursors, cut the code by half, and improve performance many-fold.
Some of the best improvements are in clarity (and often result in nice performance boosts, too).
Sorry, I don't tend to get a buzz from that sort of thing, but most situations have been pretty basic: monitoring the performance of queries and adding indexes to speed them up.
Now, increasing the speed of "real" code that I've written by changing data structures and algorithms within the class, that's where I get my buzz (and a reputation as the go-to man for performance fixes at work).
Hey, on the iPhone, which uses SQLite, I straight away reduced my database processing time from 40 seconds to 2 seconds with the use of exclusive write transactions... I was super happy doing this,
as this was my first experience of SQL on an embedded device - quite different from the usual server-related stuff (indexes, normalization, etc.).
As far as servers go, indexes are a real blessing. Also, if you take a bit of pain and get rid of as many NULLs as you can in your tables, you will be surprised at the performance gains - not many developers focus on NULLs; they usually go with indexes and other documented stuff.
A few other lesser-exploited ways: using XML to process multiple batch inserts / updates / deletes in one go instead of doing one insert at a time - in SQL 2005 this can be super cool, as sketched below.
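A minimal sketch of that XML batching pattern for SQL Server 2005 (the table name and XML shape here are made up for illustration):
DECLARE @batch xml;
SET @batch = N'<rows><r id="1" name="Alpha"/><r id="2" name="Beta"/></rows>';
-- One set-based INSERT instead of one INSERT per row.
INSERT INTO dbo.SomeTable (id, name)
SELECT r.value('@id', 'int'),
       r.value('@name', 'nvarchar(50)')
FROM @batch.nodes('/rows/r') AS t(r);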
It's all about indexes. And avoiding stupid things that make them useless.
Changing the order of conditions inside the WHERE clause so it filters on the most discriminating condition first (while at the same time removing indexes on non-discriminating columns like gender).
Back in the day, I worked on a CICS/DB2 system, written in COBOL. A lot of our queries were doing full table scans (and slow) even though we had all the proper indexes and WHERE clauses.
It turned out (and I may have this backwards, it's been 15 years) that the problem was that we were using PIC S9(n) COMP in WORKING STORAGE for the query parameters, but DB2 wanted PIC S9(n) COMP-3. By using the wrong data type, DB2 had to do a full table scan in order to convert the values in the database to the value we were passing in. We changed our variable definitions and the queries were able to use the indexes now, which dramatically improved our performance.
I had a query that was originally written for SQL Server 6.5, which did not support the SQL 92 join syntax, i.e.
select foo.baz
from foo
left outer join bar
on foo.a = bar.a
was instead written as
select foo.baz
from foo, bar
where foo.a *= bar.a
The query had been around for a while, and the relevant data had accumulated to the point where the query ran too slowly, taking about 90 seconds to complete. By the time this problem arose, we had upgraded to SQL Server 7.
After mucking about with indexes and other Easter-egging, I changed the join syntax to be SQL 92 compliant. The query time dropped to 3 seconds.
I don't think I'll ever have that feeling again. I was a f%$^ing hero.
I answered this on a previous question ("Biggest performance improvement you’ve had with the smallest change?"); however, it's such a simple improvement, yet one that is so often overlooked, that it bears repeating:
Indexes!
Well, we had a similar thing where we had a slow query on an Open Freeway site. The answer wasn't so much optimising the query as optimising the server it was on. We increased the cache limit and cache size so that the server would not run the query so often.
This has massively increased the speed of the system and ultimately made the client happy! :)
Not quite the calibre of the original poster's optimisation skills, but it definitely made us buzz!
Splitting one ridiculously long stored procedure, which did a great deal of "if it's after 5 pm, return this bit of sql" and which took in excess of 20 seconds to run, into a set of stored procedures called by one controlling sp, got the times down to sub-second responses.
One word: Dynamic Queries.
If you're searching with large numbers of parameters, you can discount them from the SQL string. This has sped up my queries dramatically and with relative ease.
Create PROCEDURE dbo.qryDynamic
(
    @txtParameter1 nvarchar(255),
    @txtParameter2 nvarchar(255)
)
AS
BEGIN
    -- Build the WHERE clause only from the parameters actually supplied,
    -- then run the result with sp_executesql so the values stay parameterised.
    DECLARE @SQL nvarchar(2500)
    DECLARE @txtJoin nvarchar(50)

    SET @txtJoin = ' WHERE '
    SET @SQL = 'SELECT qry_DataFromAView.*
                FROM qry_DataFromAView'

    IF @txtParameter1 IS NOT NULL
    BEGIN
        SET @SQL = @SQL + @txtJoin + ' (Field1 LIKE N''%'' + @dynParameter1 + N''%'') '
        SET @txtJoin = ' AND '
    END

    IF @txtParameter2 IS NOT NULL
    BEGIN
        SET @SQL = @SQL + @txtJoin + ' (Field2 LIKE N''%'' + @dynParameter2 + N''%'') '
        SET @txtJoin = ' AND '
    END

    SET @SQL = @SQL + ' ORDER BY Field2'

    EXEC sp_executesql @SQL,
         N'@dynParameter1 nvarchar(255), @dynParameter2 nvarchar(255)',
         @dynParameter1 = @txtParameter1,
         @dynParameter2 = @txtParameter2
END
GO
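For example, called with only the first parameter supplied (the search value is just illustrative):
EXEC dbo.qryDynamic @txtParameter1 = N'smith', @txtParameter2 = NULL;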
I had a warm glow after being able to use a Cross Tab query to scrap oodles (technical term) of processing and lookups...
Usually it's simple things like adding indexes or only getting the data you need, but when you find a problem that fits an answer you've seen before... good times!
(Halfway off topic)
I rewrote a 3000 line stored procedure into LINQ2SQL/C#.
The stored procedure juggled lots of data between a bunch of unindexed temp tables.
The LINQ2SQL version read the data into a couple of Dictionaries and ILookups and then I joined the data manually with plain old C# code.
The stored procedure took about 20 seconds and the LINQ2SQL/C# version took 0.2 seconds.