What was your coolest SQL optimization on a slow-performing query?

Just speaking to a colleague of mine. He was walking with a hop in his step, on the way to the coffee machine.
I asked him, "What's with the 'swarmy' walk?" He said, "I just reduced a two-hour-long query down to 40 seconds! It feels so good."
He altered a stored procedure that was using cursors and introduced a temp table refactored from the original dataset - I will email him soon to get more info on the actual implementation.
But ultimately, he was buzzing.
Question is, what SQL optimisation sticks in your mind and has made you buzz whilst optimising slow-performing queries?

I have to say when I learned how to create and use covered indexes. Now, THAT was a performance booster.
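For anyone unfamiliar with the technique, here's a minimal sketch (table and column names are hypothetical): the index carries every column the query touches, so the engine never has to visit the table itself.
-- SQL Server syntax; INCLUDE adds non-key columns to the index leaf level.
CREATE NONCLUSTERED INDEX IX_Orders_CustomerId
ON Orders (CustomerId)
INCLUDE (OrderDate, TotalAmount);
-- This query is now "covered": answered entirely from the index.
SELECT CustomerId, OrderDate, TotalAmount
FROM Orders
WHERE CustomerId = 42;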

Using SQL's BULKIMPORT to reduce several hours of inherited INSERT code to less than a minute.
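Presumably this refers to T-SQL's BULK INSERT; a minimal sketch, with a hypothetical file path and staging table:
-- Loads the whole file in one minimally-logged operation instead of
-- one INSERT per row; terminators depend on the source file format.
BULK INSERT dbo.StagingOrders
FROM 'C:\data\orders.csv'
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', TABLOCK);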

It's always nice to take a poorly written, cursor-laden query and eliminate cursors, cut the code by half, and improve performance many-fold.
Some of the best improvements are in clarity (and often result in nice performance boosts, too).
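As a sketch of the pattern (hypothetical table, T-SQL): a row-by-row cursor loop collapses into a single set-based statement.
-- Instead of a cursor that fetches each open order and updates it one
-- at a time, one set-based UPDATE handles the whole batch in a single pass:
UPDATE o
SET o.Status = 'expired'
FROM Orders o
WHERE o.Status = 'open'
  AND o.OrderDate < DATEADD(day, -30, GETDATE());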

Sorry, I don't tend to get a buzz from that sort of thing; most situations have been pretty basic - monitoring the performance of queries and adding indexes to speed them up.
Now, increasing the speed of "real" code that I've written by changing data structures and algorithms within the class, that's where I get my buzz (and my reputation as the go-to man for performance fixes at work).

Hey, on the iPhone, which uses SQLite, I straight away reduced my database processing time from 40 seconds to 2 seconds with the use of exclusive write transactions... I was super happy doing this,
as this was my first experience of SQL on an embedded device - quite different from the usual server-related stuff (indexes, normalization, etc.).
--- As far as servers go, indexes are a real blessing. Also, if you take a bit of pain and get rid of as many NULLs as you can in your tables, you will be surprised with the performance gains - not many developers focus on NULLs; they usually go with indexes and other documented stuff.
A few other lesser-exploited ways: using XML to process multiple batch inserts/updates/deletes in one go instead of doing one insert at a time - in SQL 2005 this can be super cool.
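A minimal sketch of that exclusive-transaction idea (hypothetical table; the win comes from paying the journal/commit cost once rather than once per statement):
BEGIN EXCLUSIVE TRANSACTION;
INSERT INTO scores (player, points) VALUES ('a', 10);
INSERT INTO scores (player, points) VALUES ('b', 20);
-- ... many more inserts, then a single commit:
COMMIT;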

It's all about indexes. And avoiding stupid things that make them useless.

Changing the order of conditions inside the WHERE clause so that it filters on the most discriminating condition first (while, at the same time, indexes on non-discriminating columns like gender are removed).

Back in the day, I worked on a CICS/DB2 system, written in COBOL. A lot of our queries were doing full table scans (and were slow) even though we had all the proper indexes and WHERE clauses.
It turned out (and I may have this backwards, it's been 15 years) that the problem was that we were using PIC S9(n) COMP in WORKING STORAGE for the query parameters, but DB2 wanted PIC S9(n) COMP-3. By using the wrong data type, DB2 had to do a full table scan in order to convert the values in the database to the value we were passing in. We changed our variable definitions and the queries were able to use the indexes, which dramatically improved our performance.
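The same class of problem shows up in any engine when a predicate's type doesn't match the column's type: the database converts the column value on every row instead of converting the parameter once. A hypothetical illustration in plain SQL:
-- Suppose account_no is a VARCHAR column with an index on it.
SELECT * FROM accounts WHERE account_no = 12345;
-- The numeric literal forces a per-row conversion of account_no: full scan.
SELECT * FROM accounts WHERE account_no = '12345';
-- A type-matched literal lets the optimizer use the index.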

I had a query that was originally written for SQL Server 6.5, which did not support the SQL 92 join syntax, i.e.
select foo.baz
from foo
left outer join bar
on foo.a = bar.a
was instead written as
select foo.baz
from foo, bar
where foo.a *= bar.a
The query had been around for a while, and the relevant data had accumulated to make the query run too slowly, taking about 90 seconds to complete. By the time this problem arose, we had upgraded to SQL Server 7.
After mucking about with indexes and other Easter-egging, I changed the join syntax to be SQL 92 compliant. The query time dropped to 3 seconds.
I don't think I'll ever have that feeling again. I was a f%$^ing hero.

I answered this on a previous question ("Biggest performance improvement you've had with the smallest change?"); however, it's such a simple improvement, yet one so often overlooked, that it bears repeating:
Indexes!

Well, we had a similar thing where we had a slow query on an Open Freeway site. The answer wasn't so much optimising the query as optimising the server it was on. We increased the cache limit and cache size so that the server would not run the query so often.
This has massively increased the speed of the system and ultimately made the client happy! :)
Not quite the calibre of the original post's optimisation skills, but it definitely made us buzz!

Splitting one ridiculously long stored procedure, which did a great deal of "if it's after 5 pm, return this bit of SQL" and which took in excess of 20 seconds to run, into a set of stored procedures called by one controlling sp, got the times down to subsecond responses.
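A minimal sketch of that controlling-procedure pattern (all names hypothetical): each branch becomes its own procedure, so each gets its own cached plan instead of sharing one giant body.
CREATE PROCEDURE dbo.GetReport
AS
BEGIN
    -- Dispatch to the right specialized procedure based on time of day:
    IF DATEPART(hour, GETDATE()) >= 17
        EXEC dbo.GetReport_AfterHours
    ELSE
        EXEC dbo.GetReport_BusinessHours
END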

One word: Dynamic Queries.
If you are searching with large numbers of parameters, you can discount them from the SQL string. This has sped up my queries dramatically and with relative ease.
CREATE PROCEDURE dbo.qryDynamic
(
    @txtParameter1 nvarchar(255),
    @txtParameter2 nvarchar(255)
)
AS
BEGIN
    DECLARE @SQL nvarchar(2500)
    DECLARE @txtJoin nvarchar(50)
    SET @txtJoin = ' WHERE '
    SET @SQL = 'SELECT qry_DataFromAView.*
    FROM qry_DataFromAView'
    IF @txtParameter1 IS NOT NULL
    BEGIN
        SET @SQL = @SQL + @txtJoin + ' Field1 LIKE N''%'' + @dynParameter1 + N''%'' '
        SET @txtJoin = ' AND '
    END
    IF @txtParameter2 IS NOT NULL
    BEGIN
        SET @SQL = @SQL + @txtJoin + ' Field2 LIKE N''%'' + @dynParameter2 + N''%'' '
        SET @txtJoin = ' AND '
    END
    SET @SQL = @SQL + ' ORDER BY Field2'
    EXEC sp_executesql @SQL,
        N'@dynParameter1 nvarchar(255), @dynParameter2 nvarchar(255)',
        @dynParameter1 = @txtParameter1, @dynParameter2 = @txtParameter2
END
GO
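A hypothetical call: when a parameter is NULL, its LIKE clause never makes it into the SQL string, so the optimiser only ever sees the filters that actually apply.
EXEC dbo.qryDynamic @txtParameter1 = N'smith', @txtParameter2 = NULL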

I had a warm glow after being able to use a Cross Tab query to scrap oodles (technical term) of processing and lookups...
Usually it's simple things like adding indexes or only getting the data you need, but when you find a problem that fits an answer you've seen before... good times!

(Halfway off topic)
I rewrote a 3000 line stored procedure into LINQ2SQL/C#.
The stored procedure juggled lots of data between a bunch of unindexed temp tables.
The LINQ2SQL version read the data into a couple of Dictionaries and ILookups and then I joined the data manually with plain old C# code.
The stored procedure took about 20 seconds and the LINQ2SQL/C# version took 0.2 seconds.

Related

How to get a count of the number of times a sql statement has executed in X hours?

I'm using an Oracle DB. I want to be able to count the number of times that a SQL statement was executed in X hours. For instance, how many times has the statement Select * From ExampleTable been executed in the past 5 hours?
I tried looking in V$SQL, V$SQLSTATS, V$SQLAREA, but they only keep a record of a statement's total number of executions. They don't store the times at which the individual executions occurred. Is there any view I missed, or something else that keeps track of each individual statement execution plus a timestamp, so that I can query which executions occurred in the past X hours? Thanks for the help.
The views in the Active Workload Repository store historical SQL execution information, specifically the view DBA_HIST_SQLSTAT.
The view is not perfect; it contains a summary of the top SQL statements. This is almost perfect information for performance tuning - in practice, sampling will catch any performance problem. But if you're looking for a perfect record of every SQL execution, as far as I know the only way to get that information is through tracing, which is buggy and slow.
Hopefully this query is good enough:
select begin_interval_time, end_interval_time, executions_delta, dba_hist_sqlstat.*
from dba_hist_sqlstat
join dba_hist_snapshot
on dba_hist_sqlstat.snap_id = dba_hist_snapshot.snap_id
and dba_hist_sqlstat.instance_number = dba_hist_snapshot.instance_number
order by begin_interval_time desc, sql_id;
Apologies for putting this in an answer instead of a comment (I don't have the required reputation), but I think you may be out of luck. There is an AskTOM thread asking basically the same question. Tom says that unless you are using ASH, that just isn't something the database is designed to do.

Oracle: Poor performance on smaller result sets?

I am running into a very strange bit of behavior with a query in Oracle. The query itself is enormous and quite complex... but is basically the same every time I run it. However, it seems to execute more slowly when returning a smaller result set. The best example I can give is that if I set this filter on it,
and mgz_query.IsQueryStatus(10001,lqo.query_id)>0
which returns 960 of 12,429 records, I see an execution time of about 1.9 seconds. However, if I change the filter to
and mgz_query.IsQueryStatus(10005,lqo.query_id)>0
which returns 65 of 12,429 records, I see an execution time of about 6.8 seconds. When digging a bit deeper, I found that it seems the smaller result set was performing considerably more buffer gets than the larger result set. This seems completely counter-intuitive to me.
The query this is being run against is roughly 8,000 characters long (unless someone wants it, I'm not going to clutter this post with the entire query). It includes 4 'UNION ALL' statements, but otherwise filters primarily on indexes and is pretty efficient, apart from its massive size.
The filter in use is executed via the below function.
Function IsQueryStatus(Pni_QueryStateId in number,
                       Pni_Query_Id in number) return pls_integer as
  vn_count pls_integer;
Begin
  select count(1)
    into vn_count
    from m_query mq
   where mq.id = Pni_Query_Id
     and mq.state_id = Pni_QueryStateId;
  return vn_count;
End;
Any ideas as to what may be causing the smaller result set to perform so much worse than the large result set?
I think you are facing a situation where determining that something is not in the set takes a lot longer than determining if it is in the set. This can occur quite often. For instance, if there is an index on m_query(id), then consider how the where clause might be executed:
(1) The value Pni_Query_Id is looked up in the index. There is no match. Query is done with a value of 0.
(2) There are a bunch of matches. Now, let's fetch the pages where state_id is located and compare to Pni_QueryStateId. Ohh, that's a lot more work.
If that is the case, then having an index on m_query(id, state_id) should help the query.
By the way, this is assuming that the only change is in function call in the where clause. If there are other changes to get fewer rows, you might just be calling this function fewer times.
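For concreteness, the suggested index (using the table and column names from the function above):
-- Lets the count be answered from the index alone, whether or not a row matches:
CREATE INDEX ix_m_query_id_state ON m_query (id, state_id);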

mysterious oracle query

If a query in Oracle takes 11 minutes the first time it is executed, and 25 seconds the next time, even with the buffer being flushed, what is the possible cause? Could it be that the query is written in a bad way?
set timing on;
set echo on
set lines 999;
insert into elegrouptmp select idcll,idgrpl,0 from elegroup where idgrpl = 109999990;
insert into SLIMONTMP (idpartes, indi, grecptseqs, devs, idcll, idclrelpayl)
select rel.idpartes, rel.indi, rel.idgres,rel.iddevs,vpers.idcll,nvl(cdsptc.idcll,vpers.idcll)
from
relbqe rel,
elegrouptmp ele,
vrdlpers vpers
left join cdsptc cdsptc on
(cdsptc.idclptcl = vpers.idcll and
cdsptc.cdptcs = 'NOS')
where
rel.idtits = '10BCPGE ' and
vpers.idbqes = rel.idpartes and
vpers.cdqltptfc = 'N' and
vpers.idcll = ele.idelegrpl and
ele.idgrpl = 109999990;
alter system flush shared_pool;
alter system flush buffer_cache;
alter system flush global context;
select /* original */ mvtcta_part_SLIMONtmp.idpartes,mvtcta_part_SLIMONtmp.indi,mvtcta_part_SLIMONtmp.grecptseqs,mvtcta_part_SLIMONtmp.devs,
mvtcta_part_SLIMONtmp.idcll,mvtcta_part_SLIMONtmp.idclrelpayl,mvtcta_part_vrdlpers1.idcll,mvtcta_part_vrdlpers1.shnas,mvtcta_part_vrdlpers1.cdqltptfc,
mvtcta_part_vrdlpers1.idbqes,mvtcta_part_compte1.idcll,mvtcta_part_compte1.grecpts,mvtcta_part_compte1.seqc,mvtcta_part_compte1.devs,mvtcta_part_compte1.sldminud,
mvtcta.idcll,mvtcta.grecptseqs,mvtcta.devs,mvtcta.termel,mvtcta.dtcptl,mvtcta.nusesi,mvtcta.fiches,mvtcta.indl,mvtcta.nuecrs,mvtcta.dtexel,mvtcta.dtvall,
mvtcta.dtpayl,mvtcta.ioi,mvtcta.mtd,mvtcta.cdlibs,mvtcta.libcps,mvtcta.sldinitd,mvtcta.flagtypei,mvtcta.flagetati,mvtcta.flagwarnl,mvtcta.flagdonei,mvtcta.oriindl,
mvtcta.idportfl,mvtcta.extnuecrs
from SLIMONtmp mvtcta_part_SLIMONtmp
left join vrdlpers mvtcta_part_vrdlpers1 on
(
mvtcta_part_vrdlpers1.idbqes = mvtcta_part_SLIMONtmp.idpartes
and mvtcta_part_vrdlpers1.cdqltptfc = 'N'
and mvtcta_part_vrdlpers1.idcll = mvtcta_part_SLIMONtmp.idcll
)
left join compte mvtcta_part_compte1 on
(
mvtcta_part_compte1.idcll = mvtcta_part_vrdlpers1.idcll
and mvtcta_part_compte1.grecpts = substr (mvtcta_part_SLIMONtmp.grecptseqs, 1, 2 )
and mvtcta_part_compte1.seqc = substr (mvtcta_part_SLIMONtmp.grecptseqs, -1 )
and mvtcta_part_compte1.devs = mvtcta_part_SLIMONtmp.devs
and (mvtcta_part_compte1.devs = ' ' or ' ' = ' ')
and mvtcta_part_compte1.cdpartc not in ( 'L' , 'R' )
)
left join mvtcta mvtcta on
(
mvtcta.idcll = mvtcta_part_SLIMONtmp.idclrelpayl
and mvtcta.devs = mvtcta_part_SLIMONtmp.devs
and mvtcta.grecptseqs = mvtcta_part_SLIMONtmp.grecptseqs
and mvtcta.flagdonei <> 0
and mvtcta.devs = mvtcta_part_compte1.devs
and mvtcta.dtvall > 20101206
)
where 1=1
order by mvtcta_part_compte1.devs,
mvtcta_part_SLIMONtmp.idpartes,
mvtcta_part_SLIMONtmp.idclrelpayl,
mvtcta_part_SLIMONtmp.grecptseqs,
mvtcta.dtvall;
"if a query in oracle takes the first
time it is executed 11 minutes, and
the next time, the same query 25
seconds, with the buffer being
flushed, what is the possible cause?"
The thing is, flushing the DB Buffers, like this ...
alter system flush shared_pool
/
... wipes the Oracle data store but there are other places where data gets cached. For instance the chances are your OS caches its file reads.
EXPLAIN PLAN is good as a general guide to how the database thinks it will execute a query, but it is only a prediction. It can be thrown out by poor statistics or ambient conditions. It is not good at explaining why a specific instance of a query took as much time as it did.
So, if you really want to understand what occurs when the database executes a specific query you need to get down and dirty, and learn how to use the Wait Interface. This is a very powerful tracing mechanism, which allows us to see the individual events that happen over the course of a single query execution. Each version of Oracle has extended the utility and richness of the Wait Interface, but it has been essential to proper tuning since Oracle 9i (if not earlier).
Find out more by reading Roger Schrag's very good overview.
In your case you'll want to run the trace multiple times. In order to make it easier to compare results you should use a separate session for each execution, setting the 10046 event each time.
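For reference, setting the 10046 event per session typically looks like this (level 12 captures wait events plus bind values):
ALTER SESSION SET EVENTS '10046 trace name context forever, level 12';
-- ... run the query under test, then switch the trace off:
ALTER SESSION SET EVENTS '10046 trace name context off';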
What else was happening on the box when you ran these? You can get different timings based on other processes chewing resources. Also, with a lot of joins, performance will depend on memory usage (hash_area_size, sort_area_size, etc.) and availability, so perhaps you are paging (check temp space size/usage also). In short, try sql_trace and tkprof to analyze deeper.
Sometimes a block is written to the file system before it is committed (a dirty block). When that block is read later, Oracle sees that it was uncommitted. It checks the open transaction and, if the transaction isn't still there, it knows the change was committed. Therefore it writes the block back as a clean block. It is called delayed block cleanout.
That is one possible reason why reading blocks for the first time can be slower than a subsequent re-read.
It could be that the second time, the execution plan is already known. Maybe the optimizer has a very hard time finding an execution plan for some reason.
Try setting
alter session set optimizer_max_permutations=100;
and rerun the query. See if that makes any difference.
could it be that the query is written in a bad way?
"bad" is a rather emotional expression - but broadly speaking, yes, if a query performs significantly faster the second time it's run, it usually means there are ways to optimize the query.
Actually optimizing the query is - as APC says - rather a question of getting "down and dirty". An obvious candidate in your example might be the substring: if the table is huge and the substring misses the index, I'd imagine that would take a bit of time, and I'd imagine the results of all those substring operations are cached somewhere.
Here's Tom Kyte's take on flushing Oracle buffers as a testing practice. Suffice it to say he's not a fan. He favors the approach of attempting to emulate your production load with your test data ("real life"), and tossing out the first and last runs. @APC's point about OS caching is Tom's point - to get rid of that (non-trivial!) effect you'd need to bounce the server, not just the database.

Efficient way to compute accumulating value in sqlite3

I have an sqlite3 table that tells when I gain/lose points in a game. Sample/query result:
SELECT time,p2 FROM events WHERE p1='barrycarter' AND action='points'
ORDER BY time;
1280622305|-22
1280625580|-9
1280627919|20
1280688964|21
1280694395|-11
1280698006|28
1280705461|-14
1280706788|-13
[etc]
I now want my running point total. Given that I start w/ 1000 points,
here's one way to do it.
SELECT DISTINCT(time),
       (SELECT 1000 + SUM(p2)
        FROM events e
        WHERE p1 = 'barrycarter' AND action = 'points'
          AND e.time <= e2.time) AS points
FROM events e2
WHERE p1 = 'barrycarter' AND action = 'points'
ORDER BY time
but this is highly inefficient. What's a better way to write this?
MySQL has @variables so you can do things like:
SELECT time, @tot := @tot + points ...
but I'm using sqlite3 and the above isn't ANSI-standard SQL anyway.
More info on the db if anyone needs it: http://ccgames.db.94y.info/
EDIT: Thanks for the answers! My dilemma: I let anyone run any
single SELECT query on "http://ccgames.db.94y.info/". I want to give
them useful access to my data, but not to the point of allowing
scripting or allowing multiple queries with state. So I need a single
SQL query that can do accumulation. See also:
Existing solution to share database data usefully but safely?
SQLite is meant to be a small embedded database. Given that definition, it is not unreasonable to find many limitations with it. The task at hand is not solvable using SQLite alone, or it will be terribly slow as you have found. The query you have written is a triangular cross join that will not scale, or rather, will scale badly.
The most efficient way to tackle the problem is through the program that is making use of SQLite, e.g. if you were using Web SQL in HTML5, you can easily accumulate in JavaScript.
There is a discussion about this problem in the sqlite mailing list.
Your 2 options are:
Iterate through all the rows with a cursor and calculate the running sum on the client.
Store sums instead of, or as well as storing points. (if you only store sums you can get the points by doing sum(n) - sum(n-1) which is fast).
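For later readers: SQLite 3.25+ (released long after this thread) supports window functions, which turn the running total into a single efficient query over the table from the question:
SELECT time,
       1000 + SUM(p2) OVER (ORDER BY time) AS points
FROM events
WHERE p1 = 'barrycarter' AND action = 'points'
ORDER BY time;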

How do I output progress messages from a SELECT statement?

I have a SQL script that I want to output progress messages as it runs. Having it output messages between SQL statements is easy, however I have some very long running INSERT INTO SELECTs. Is there a way to have a select statement output messages as it goes, for example after every 1000 rows, or every 5 seconds?
Note: This is for SQL Anywhere, but answers in any SQL dialect will be fine.
There's no way to retrieve the execution status of a single query. None of the mainstream database engines provide this functionality.
Furthermore, any progress implementation, were one to exist, would add measurable overhead. If a query is already taking uncomfortably long, slowing it down further just to show progress might not be a design goal.
You may find this article on estimating SQL execution progress helpful, though its practical implications are limited.
SQL itself has no provision for this kind of thing. Any way of doing this would involve talking directly to the database engine, and would not be standard across databases.
Really the idea of progress with set based operations (which is what a relational database uses) wouldn't be too helpful, at least not as displayed with a progress bar (percent done vs total). By the time the optimizer figured out what it needed to do and really understood the full cost of the operation, you have already completed a significant portion of the operation. Progress displays are really meant for iterative operations rather than set operations.
That's talking about your general SELECT statement execution. For inserts that are separate statements there are all kinds of ways to do that from the submitter by monitoring the consumption rate of the statements. If they are bulk inserts (select into, insert from, and the like) then you really have the same problem that I described above. Set operations are batched in a way that make a progress bar type of display somewhat meaningless.
I am on the SQL Anywhere engine development team and there is currently no way to do this. I can't promise anything, but we are considering adding this type of functionality to a future release.
There's certainly no SQL-standard solution to this. Sorry to be doom-laden, but I haven't seen anything that can do this in Oracle, SQL Server, Sybase or MySQL, so I wouldn't be too hopeful for SQLAnywhere.
I agree that SQL does not have a way to do this directly. One way might be to insert only the TOP 1000 at a time and then print your status message, then keep repeating this as needed (in a loop of some kind). The downside is that you would then need a way to keep track of where you are.
I should note that this approach will not be as efficient as just doing one big INSERT.
Here's what I would do (Sybase / SQL Server syntax):
DECLARE @total_rows int
SELECT @total_rows = count(*)
FROM Source_Table

WHILE @total_rows > (SELECT count(*) FROM Target_Table)
BEGIN
    SET rowcount 1000
    print 'inserting 1000 rows'
    INSERT Target_Table
    SELECT *
    FROM Source_Table s
    WHERE NOT EXISTS( SELECT 1
                      FROM Target_Table t
                      WHERE t.id = s.id )
END
set rowcount 0
print 'done'
Or you could do it based on IDs (assumes Id is a number):
DECLARE @min_id int,
        @max_id int,
        @start_id int,
        @end_id int

SELECT @min_id = min(id),
       @max_id = max(id)
FROM Source_Table

SELECT @start_id = @min_id,
       @end_id = @min_id + 1000

WHILE @end_id <= @max_id
BEGIN
    print 'inserting id range: ' + convert(varchar, @start_id) + ' to ' + convert(varchar, @end_id)
    INSERT Target_Table
    SELECT *
    FROM Source_Table s
    WHERE id BETWEEN @start_id AND @end_id

    SELECT @start_id = @end_id + 1,
           @end_id = @end_id + 1000
END
set rowcount 0
print 'done'
One thought might be to have a separate process count the number of rows in the table the insert is writing to, to determine what percentage of them are there already. This of course would require that you know the final total. It would probably only be okay if you're not too worried about server load.
On the off chance you're using Toad, you can generate a set of INSERT statements from a table and configure it to commit at a user-specified frequency. You could modify your scripts a little bit and then see how much of the new data has been committed as you go.
You can simulate the effect for your users by timing several runs, then having a progress bar advance at the average records / second rate.
The only other ways will be
1 - Refer to the API of your database engine to see if it makes any provision for that
or
2 - Break your INSERT into many smaller statements, and report on them as you go. But that will have a significant negative performance impact.
If you really need it, then for insert/update/delete you can use some trigger logic with DB variables, and from time to time run a query to retrieve the variable data and display some progress to the user.
If you want to use this, I can write an example and send it.
Stumbled upon this old thread looking for something else. I disagree with the idea that we don't want progress information just because it's a set operation. Users will often tolerate even a long wait if they know how long it is.
Here's what I suggest:
Each time this runs, log the number of rows inserted and the total time, then add a step at the beginning of that process to query that log and calculate an estimated total time. If you base your estimate on the last runs, you should be able to present an acceptably good guess for the wait time for the thing to finish.
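A minimal sketch of that logging idea (log table and names are hypothetical; Sybase/SQL Server syntax to match the answers above):
-- Maintained by the batch job, one row per completed run:
-- CREATE TABLE insert_run_log (run_at datetime, rows_inserted int, total_seconds int)

DECLARE @rows_to_insert int
SELECT @rows_to_insert = count(*) FROM Source_Table

-- Average throughput of past runs, applied to the size of the current batch:
SELECT AVG(total_seconds * 1.0 / rows_inserted) * @rows_to_insert AS estimated_seconds
FROM insert_run_log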