query slow once an hour or so, takes less than 100 ms 99% of the time

query slow once an hour or so, takes less than 100 ms 99% of the time - sql

I am troubleshooting a slow query, it runs in less than 100 ms 99% of the time, but once in an hr (or two no pattern, i guess), goes bad and does 6 million reads and takes 11 seconds! I saw the query plan, it does do a clustered index scan, I noticed the cached_plans dynamic management view use counts column keeps increasing every time the query executes, so i am thinking its the same plan, just wondering why at one point it goes out-of-whack! any pointers will be helpful. I haven't tried anything as it runs pretty fast most of the time.

First something could easily be blocking the query to make it run slow. Otr there could be other things happening onthe server at the same time that are consuming most of its resources.
Next, the parameters of the query might be bad for the saved execution plan.
Or the statistics might be out of date
Or if the query is an action query as opposed to a select, the particular parameters may be causing a problem in a trigger that makes it take longer.
Or teh query might be returning significanlty more results at times. If you run it at 10 and return 10 results and an import puts more records inteh table that meet the query conditions, at 10:30 you might return a million results which would clearly be slower.
One thing I like to do in such circumstances is set up logging so that the exact query is logged with the time at the time of execution. Then you can see what the query that ran sloweractaully was if you have varaible , than might be differnt from run to run.

Related

deleting 500 records from table with 1 million records shouldn't take this long

I hope someone can help me. I have a simple sql statement
delete from sometable
where tableidcolumn in (...)
I have 500 records I want to delete and recreate. The table recently grew to over 1 mill records. The problem is the statement above is taking over 5 minutes without completing. I have a primary key and 2 non clustered non unique indexes. My delete statement is using the primary key.
Can anyone help me understand why this statement is taking so long and how I can speed it up?

There are two areas I would look at first, locking and a bad plan.
Locking - run your query and while it is running see if it is being blocked by anything else "select * from sys.dm_exec_requests where blocking_session_id <> 0" if you see anything blocking your request then I would start with looking at:
https://www.simple-talk.com/sql/database-administration/the-dba-as-detective-troubleshooting-locking-and-blocking/
If there is no locking then get the execution plan for the insert, what is it doing? it it exceptionally high?
Other than that, how long do you expect it to take? Is it a little bit longer than that or a lot longer? Did it only get so slow after it grew significantly or has it been getting slower over a long period of time?
What is the I/O performance, what are your average read / write times etc etc.

TL;DR: Don't do that (instead of a big 'in' clause: preload and use a temporary table).
With the number of parameters, unknown backend configuration (even though it should be fine by today's standards) and not able to guess what your in-memory size may be during processing, you may be hitting (in order) a stack, batch or memory size limit, starting with this answer. Also possible to hit an instruction size limit.
The troubleshooting comments may lead you to another answer. My pivot's the 'in' clause, statement size, and that all of these links include advice to preload a temporary table and use that with your query.

SQL Server stored procedure reducing amount of memory granted

Execution Plan Download Link: https://www.dropbox.com/s/3spvo46541bf6p1/Execution%20plan.xml?dl=0
I'm using SQL Server 2008 R2
I have a pretty complex stored procedure that's requesting way too much memory upon execution. Here's a screenshot of the execution plan:
http://s15.postimg.org/58ycuhyob/image.png
The underlying query probably needs a lot of tuning as indicated by massive number of estimated rows, but that's besides the point. Regardless of the complexity of the query, it should not be requesting 3 gigabytes of memory upon execution.
How do I prevent this behavior? I've tried the following:
DBCC FREEPROCCACHE to clear plan cache. This accomplished nothing.
Setting RECOMPILE option on both SP and SQL level. Again, this does nothing.
Messing around with MAXDOP option, from 0 to 8. Same issue.
The query returns about ~1k rows on average, and it does look into a table with more than 3 million rows with about 4 tables being joined. Executing the query returns the result in less than 3 seconds in majority of the cases.
Edit:
One more thing, using query hints is not really viable for this case since the parameters vary greatly for our case.
Edit2:
Uploaded execution plan upon request
Edit3:
I've tried rebuilding/reorganizing fragmented indices. Apparently, there were few but nothing too serious. Anyhow, this didn't reduce the amount of memory granted and didn't reduce the number of estimated rows (If this is somehow related).

You say optimizing the query is besides the point, but actually it's actually just the point. When a query is executed, SQL Server will -after generating the execution plan- reserve the memory needed for executing the query. The more rows intermediate results are estimated to hold the more memory is estimated to be required.
So, rewrite your query and/or create new indexes to get a decent query plan. A quick glance at the query plan shows some nested loops without join predicates and a number of table scans of which probably only a few records are used.

View Clustered Index Seek over 0.5 million rows takes 7 minutes

Take a look at this execution plan: http://sdrv.ms/1agLg7K
It’s not estimated, it’s actual. From an actual execution that took roughly 30 minutes.
Select the second statement (takes 47.8% of the total execution time – roughly 15 minutes).
Look at the top operation in that statement – View Clustered Index Seek over _Security_Tuple4.
The operation costs 51.2% of the statement – roughly 7 minutes.
The view contains about 0.5M rows (for reference, log2(0.5M) ~= 19 – a mere 19 steps given the index tree node size is two, which in reality is probably higher).
The result of that operator is zero rows (doesn’t match the estimate, but never mind that for now).
Actual executions – zero.
So the question is: how the bleep could that take seven minutes?! (and of course, how do I fix it?)
EDIT: Some clarification on what I'm asking here.
I am not interested in general performance-related advice, such as "look at indexes", "look at sizes", "parameter sniffing", "different execution plans for different data", etc.
I know all that already, I can do all that kind of analysis myself.
What I really need is to know what could cause that one particular clustered index seek to be so slow, and then what could I do to speed it up.
Not the whole query.
Not any part of the query.
Just that one particular index seek.
END EDIT
Also note how the second and third most expensive operations are seeks over _Security_Tuple3 and _Security_Tuple2 respectively, and they only take 7.5% and 3.7% of time. Meanwhile, _Security_Tuple3 contains roughly 2.8M rows, which is six times that of _Security_Tuple4.
Also, some background:
This is the only database from this project that misbehaves.
There are a couple dozen other databases of the same schema, none of them exhibit this problem.
The first time this problem was discovered, it turned out that the indexes were 99% fragmented.
Rebuilding the indexes did speed it up, but not significantly: the whole query took 45 minutes before rebuild and 30 minutes after.
While playing with the database, I have noticed that simple queries like “select count(*) from _Security_Tuple4” take several minutes. WTF?!
However, they only took several minutes on the first run, and after that they were instant.
The problem is not connected to the particular server, neither to the particular SQL Server instance: if I back up the database and then restore it on another computer, the behavior remains the same.

First I'd like to point out a little misconception here: although the delete statement is said to take nearly 48% of the entire execution, this does not have to mean it takes 48% of the time needed; in fact, the 51% assigned inside that part of the query plan most definitely should NOT be interpreted as taking 'half of the time' of the entire operation!
Anyway, going by your remark that it takes a couple of minutes to do a COUNT(*) of the table 'the first time' I'm inclined to say that you have an IO issue related to said table/view. Personally I don't like materialized views very much so I have no real experience with them and how they behave internally but normally I would suggest that fragmentation is causing its toll on the underlying storage system. The reason it works fast the second time is because it's much faster to access the pages from the cache than it was when fetching them from disk, especially when they are all over the place. (Are there any (max) fields in the view ?)
Anyway, to find out what is taking so long I'd suggest you rather take this code out of the trigger it's currently in, 'fake' an inserted and deleted table and then try running the queries again adding times-stamps and/or using some program like SQL Sentry Plan Explorer to see how long each part REALLY takes (it has a duration column when you run a script from within the program).
It might well be that you're looking at the wrong part; experience shows that cost and actual execution times are not always as related as we'd like to think.

Observations include:
Is this the biggest of these databases that you are working with? If so, size matters to the optimizer. It will make quite a different plan for large datasets versus smaller data sets.
The estimated rows and the actual rows are quite divergent. This is most apparent on the fourth query. "delete c from #alternativeRoutes...." where the _Security_Tuple5 estimates returning 16 rows, but actually used 235,904 rows. For that many rows an Index Scan could be more performant than Index Seeks. Are the statistics on the table up to date or do they need to be updated?
The "select count(*) from _Security_Tuple4" takes several minutes, the first time. The second time is instant. This is because the data is all now cached in memory (until it ages out) and the second query is fast.
Because the problem moves with the database then the statistics, any missing indexes, et cetera are in the database. I would also suggest checking that the indexes match with other databases using the same schema.
This is not a full analysis, but it gives you some things to look at.

Fyodor,
First:
The problem is not connected to the particular server, neither to the particular SQL Server instance: if I back up the database and then restore it on another computer, the behavior remains the same.
I presume that you: a) run this query in isolated environment, b) the data is not under mutation.
Is this correct?
Second: post here your CREATE INDEX script. Do you have a funny FILLFACTOR? SORT_IN_TEMPDB?
Third: which type is your ParentId, ObjectId? int, smallint, uniqueidentifier, varchar?

After writing SQL statements in MySQL, how to measure the speed / performance of them?

I saw something from an "execution plan" article:
10 rows fetched in 0.0003s (0.7344s)
(the link: http://explainextended.com/2009/09/18/not-in-vs-not-exists-vs-left-join-is-null-mysql/ )
How come there are 2 durations shown? What if I don't have large data set yet. For example, if I have only 20, 50, or even just 100 records, I can't really measure how faster 2 different SQL statements compare in term of speed in real life situation? In other words, there needs to be at least hundreds of thousands of records, or even a million records to accurately compares the performance of those 2 different SQL statements?

For your first question:
X row(s) fetched in Y s (Z s)
X = number of rows (of course);
Y = time it took the MySQL server to execute the query (parse, retrieve, send);
Z = time the resultset spent in transit from the server to the client;
(Source: http://forums.mysql.com/read.php?108,51989,210628#msg-210628)
For the second question, you will never ever know how the query performs unless you test with a realistic number of records. Here is a good example of how to benchmark correctly: http://www.mysqlperformanceblog.com/2010/04/21/mysql-5-5-4-in-tpcc-like-workload/
That blog in general as well as the book "High Performance MySQL" is a goldmine.

The best way to test and compare performance of operations is often (if not always !) to work with a realistic set of data.
If you plan on having millions of rows when your application is in production, then, you should test with millions of rows right now, and not only a dozen !
A couple of tips :
While benchmarking, use select SQL_NO_CACHE ..., instead of select ...
This will prevent MySQL from using its query cache (which would make the first query take a normal amount of time, and re-executing it several times a lot faster)
Learn how to use EXPLAIN, and understand its output
Read the Chapter 7. Optimization section of the manual ;-)

Generally when there are 2 times shown, one is CPU time and one is wall-clock time. I cannot recall which is which, but it appears that the first is the CPU time and the second is elapsed time.

Long running Jobs Performance Tips

I have been working with SQL server for a while and have used lot of performance techniques to fine tune many queries. Most of these queries were to be executed within few seconds or may be minutes.
I am working with a job which loads around 100K of data and runs for around 10 hrs.
What are the things I need to consider while writing or tuning such query? (e.g. memory, log size, other things)

Make sure you have good indexes defined on the columns you are querying on.

Ultimately, the best thing to do is to actually measure and find the source of your bottlenecks. Figure out which queries in a stored procedure or what operations in your code take the longest, and focus on slimming those down, first.
I am actually working on a similar problem right now, on a job that performs complex business logic in Java for a large number of database records. I've found that the key is to process records in batches, and make as much of the logic as possible operate on a batch instead of operating on a single record. This minimizes roundtrips to the database, and causes certain queries to be much more efficient than when I run them for one record at a time. Limiting the batch size prevents the server from running out of memory when working on the Java side. Since I am using Hibernate, I also call session.clear() after every batch, to prevent the session from keeping copies of objects I no longer need from previous batches.
Also, an RDBMS is optimized for working with large sets of data; use normal SQL operations whenever possible. Avoid things like cursors, and a lot procedural programming; as other people have said, make sure you have your indexes set up correctly.

It's impossible to say without looking at the query. Just because you have indexes doesn't mean they are being used. You'll have to look at the execution plan and see if they are being used. They might show that they aren't useful to the execution plan.
You can start with looking at the estimated execution plan. If the job actually completes, you can wait for the actual execution plan. Look at parameter sniffing. Also, I had an extremely odd case on SQL Server 2005 where
SELECT * FROM l LEFT JOIN r ON r.ID = l.ID WHERE r.ID IS NULL
would not complete, yet
SELECT * FROM l WHERE l.ID NOT IN (SELECT r.ID FROM r)
worked fine - but only for particular tables. Problem was never resolved.
Make sure your statistics are up to date.

If possible post your query here so there is something to look at. I recall a query someone built with joins to 12 different tables dealing with around 4 or so million records that took around a day to run. I was able to tune that to run within 30 mins by eliminating the unnecessary joins. Where possible try to reduce the datasets you are joining before returning your results. Use plenty of temp tables, views etc if you need.
In cases of large datasets with conditions try to preapply your conditions through a view before your joins to reduce the number of records.
100k joining 100k is a lot bigger than 2k joining 3k

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas