When I run a certain stored procedure for the first time it takes about 2 minutes to finish. When I run it for the second time it finishes in about 15 seconds. I'm assuming that this is because everything is cached after the first run. Is it possible for me to "warm the cache" before I run this procedure for the first time? Is the cached information only used when I call the same stored procedure with the same parameters again, or will it be used if I call the same stored procedure with different parameters?
When you perform your query, the data is read into memory in blocks. These blocks remain in memory, but they get "aged". This means the blocks are tagged with their last access time, and when SQL Server requires another block for a new query and the memory cache is full, the least recently used block (the oldest) is kicked out of memory. (In most cases; full table scan blocks are instantly aged to prevent full table scans from overrunning memory and choking the server.)
What is happening here is that the data blocks in memory from the first query haven't been kicked out of memory yet so can be used for your second query, meaning disk access is avoided and performance is improved.
So what your question is really asking is "can I get the data blocks I need into memory without reading them into memory (actually doing a query)?". The answer is no, unless you want to cache the entire tables and have them reside in memory permanently which, from the query time (and thus data size) you are describing, probably isn't a good idea.
Your best bet for performance improvement is looking at your query execution plans and seeing whether changing your indexes might give a better result. There are two major areas that can improve performance here:
creating an index where the query could use one to avoid inefficient queries and full table scans
adding more columns to an index to avoid a second disk read. For example, say you have a query that returns columns A and B, with a WHERE clause on A and C, and you have an index on column A. Your query will use the index for column A, requiring one disk read, but will then require a second disk hit to get columns B and C. If the index had all of columns A, B and C in it, that second disk hit to get the data can be avoided (a sketch of such an index follows below).
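For example, a covering index for the hypothetical A/B/C query above might look like this (the table and index names are made up for illustration):

-- Key on the filtered columns (A, C), with B included so the query is fully covered
CREATE NONCLUSTERED INDEX IX_MyTable_A_C
ON dbo.MyTable (A, C)
INCLUDE (B);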
I don't think that generating the execution plan will cost more than 1 second.
I believe that the difference between first and second run is caused by caching the data in memory.
The data in the cache can be reused by any further query (stored procedure or simple select).
You can 'warm' the cache by reading the data through any select that reads the same data. But that initial read will still cost roughly the same 90 seconds.
You can check the execution plan to find out which tables and indexes your query uses. You can then execute some SQL to get the data into the cache, depending on what you see.
If you see a clustered index seek, you can simply do SELECT * FROM my_big_table to force all the table's data pages into the cache.
If you see a non-clustered index seek, you could try SELECT first_column_in_index FROM my_big_table.
To force a load of a specific index, you can also use the WITH(INDEX(index)) table hint in your cache warmup queries.
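As a rough sketch, and with hypothetical table and index names (substitute whatever your execution plan actually shows), the warmup queries could look like this:

-- Pull the clustered index (all of the table's data pages) into the buffer cache
SELECT * FROM dbo.my_big_table;

-- Pull a specific non-clustered index into the cache via a table hint
SELECT first_column_in_index
FROM dbo.my_big_table WITH (INDEX(IX_my_big_table_first_column));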
SQL Server caches data read from disk.
Subsequent reads will therefore do less IO.
This is of great help since disk IO is usually the bottleneck.
More at:
http://blog.sqlauthority.com/2014/03/18/sql-server-performance-do-it-yourself-caching-with-memcached-vs-automated-caching-with-safepeak/
The execution plan (the cached info for your procedure) is reused every time, even with different parameters. It is one of the benefits of using stored procs.
The very first time a stored procedure is executed, SQL Server generates an execution plan and puts it in the procedure cache.
Certain changes to the database can trigger an automatic update of the execution plan (and you can also explicitly demand a recompile).
Execution plans are dropped from the procedure cache based on their "age". (From MSDN: objects infrequently referenced are soon eligible for deallocation, but are not actually deallocated unless memory is required for other objects.)
I don't think there is any way to "warm the cache", except to perform the stored proc once. This will guarantee that there is an execution plan in the cache and any subsequent calls will reuse it.
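If you ever need to force a fresh plan instead, a minimal sketch would be the following (the procedure name and body are hypothetical placeholders):

-- Mark an existing procedure so its plan is regenerated on the next execution
EXEC sp_recompile N'dbo.usp_MyProcedure';
GO

-- Or have the plan rebuilt on every call by creating the procedure WITH RECOMPILE
CREATE PROCEDURE dbo.usp_MyProcedure_Recompiled @SomeParam int
WITH RECOMPILE
AS
    SELECT @SomeParam AS EchoedParam;   -- placeholder body
GO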
More detailed information is available in the MSDN documentation: http://msdn.microsoft.com/en-us/library/ms181055(SQL.90).aspx
Execution Plan Download Link: https://www.dropbox.com/s/3spvo46541bf6p1/Execution%20plan.xml?dl=0
I'm using SQL Server 2008 R2
I have a pretty complex stored procedure that's requesting way too much memory upon execution. Here's a screenshot of the execution plan:
http://s15.postimg.org/58ycuhyob/image.png
The underlying query probably needs a lot of tuning, as indicated by the massive number of estimated rows, but that's beside the point. Regardless of the complexity of the query, it should not be requesting 3 gigabytes of memory upon execution.
How do I prevent this behavior? I've tried the following:
DBCC FREEPROCCACHE to clear plan cache. This accomplished nothing.
Setting RECOMPILE option on both SP and SQL level. Again, this does nothing.
Messing around with MAXDOP option, from 0 to 8. Same issue.
The query returns about 1k rows on average, and it looks at a table with more than 3 million rows, with about 4 tables being joined. Executing the query returns the result in less than 3 seconds in the majority of cases.
Edit:
One more thing, using query hints is not really viable for this case since the parameters vary greatly for our case.
Edit2:
Uploaded execution plan upon request
Edit3:
I've tried rebuilding/reorganizing fragmented indices. Apparently, there were a few, but nothing too serious. Anyhow, this didn't reduce the amount of memory granted and didn't reduce the number of estimated rows (if that is somehow related).
You say optimizing the query is beside the point, but actually it is exactly the point. When a query is executed, SQL Server will, after generating the execution plan, reserve the memory needed for executing the query. The more rows the intermediate results are estimated to hold, the more memory is estimated to be required.
So, rewrite your query and/or create new indexes to get a decent query plan. A quick glance at the query plan shows some nested loops without join predicates and a number of table scans of which probably only a few records are used.
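If it helps the investigation, you can also compare how much memory the query asked for with what it was actually granted. This is only a sketch using the standard memory grants DMV; run it from a second session while the procedure executes:

-- Requested vs. granted memory for currently executing queries
SELECT session_id,
       requested_memory_kb,
       granted_memory_kb,
       ideal_memory_kb,
       query_cost,
       dop
FROM sys.dm_exec_query_memory_grants;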
There was a query in production which was running for several (5-6) hours. I looked into its execution plan and found that it was ignoring a parallel hint on a huge table. Reason - it was using TABLE ACCESS BY INDEX ROWID. So after I added a /*+ full(huge_table) */ hint before the parallel(huge_table) hint, the query started running in parallel, and it finished in less than 3 minutes. What I could not fathom was the reason for this HUGE difference in performance.
The following are the advantages of parallel FTS I can think of:
Parallel operations are inherently fast if you have more idle CPUs.
Parallel operations in 10g use direct I/O, which bypasses the buffer cache, meaning there is no risk of "buffer busy waits" or any other contention for the buffer cache.
Sure there are the above advantages but then again the following disadvantages are still there:
Parallel operations still have to do I/O, and this I/O would be more than what we have for TABLE ACCESS BY INDEX ROWID, as the entire table is scanned and is costlier (all physical reads).
Parallel operations are not very scalable which means if there aren't enough free resources, it is going to be slow
With the above knowledge at hand, I see only one reason that could have caused the poor performance for the query when it used ACCESS BY INDEX ROWID - some sort of contention like "buffer busy waits". But that doesn't show up in the AWR top 5 wait events. The top two events were "db file sequential read" and "db file scattered read". Is there something else that I have missed taking into consideration? Please enlighten me.
First, without knowing anything about your data volumes, statistics, the selectivity of your predicates, etc., I would guess that the major benefit you're seeing is from doing a table scan rather than trying to use an index. Indexes are not necessarily fast and table scans are not necessarily slow. If you are using a rowid from an index to access a row, Oracle is limited to doing single block reads (sequential reads in Oracle terms), and it's going to have to read the same block many times if the block has many rows of interest. A full table scan, on the other hand, can do nice, efficient multiblock reads (scattered reads in Oracle terms). Sure, an individual single block read is going to be more efficient than a single multiblock read, but the multiblock read is much more efficient per byte read. Additionally, if you're using an index, you've potentially got to read a number of blocks from the index periodically to find out the next rowid to read from the table.
You don't actually need to read all that much data from the table before a table scan is more efficient than an index. Depending on a host of other factors, the tipping point is probably in the 10-20% range (that's a very, very rough guess). Imagine that you had to get a bunch of names from the phone book and that the phone book had an index that included the information you're filtering on and the page that the entry is on. You could use an index to find the name of a single person you want to look at, flip to the indicated page, record the information, flip back to the index, find the next name, flip back, etc. Or you could simply start at the first name, scan until you find a name of interest, record the information, and continue the scan. It doesn't take too long before you're better off ignoring the index and just reading from the table.
Adding parallelism doesn't reduce the amount of work your query does (in fact, adding in parallel query coordination means that you're doing more work). It's just that you're doing that work over a shorter period of elapsed time by using more of the server's available resources. If you're running the query with 6 parallel slaves, that could certainly allow the query to run 5 times faster overall (parallel query obviously scales a bit less than linearly because of overheads). If that's the case, you'd expect that doing a table scan made the query 20 times faster and adding parallelism added another factor of 5 to get your 100x improvement.
I have a stored procedure which builds a dynamic SQL statement depending on its input parameters and then executes it.
One of the queries is causing timeouts, so I have decided to check it. The first time (and only the first time) the statement in question is executed it is slow (30-45 seconds), and every subsequent execution takes 1-2 seconds.
In order to reproduce the issue, I am using
DBCC FREEPROCCACHE
DBCC DROPCLEANBUFFERS
I am really confused where the problem is, because ordinarily, if a SQL statement is slow, it is always slow. Now, it has a long execution time only the first time.
Is it possible that the statement itself is slow and needs optimization, or could the problem be caused by something else?
The execution plan is below, but for me there is nothing strange with it:
From your reply to my comment, it would appear that the first time this query runs it is performing a lot of physical reads or read-ahead reads, meaning that a lot of IO is required to get the right pages into the buffer pool to satisfy this query.
Once pages are read into the buffer pool (memory) they generally stay there so that physical IO is not required to read them again (you can see this is happening as you indicated that the physical reads are converted to logical reads the second time the query is run). Memory is orders of magnitude faster than disk IO, hence the difference in speed for this query.
Looking at the plan, I can just about see that every read operation is being done against the clustered index of the table. As the clustered index contains every column for the row it is potentially fetching more data per row than is actually required for the query.
Unless you are selecting every column from every table, I would suggest creating non-clustered covering indexes that satisfy this query (and that are as narrow as possible); this will reduce the I/O required by the query and make it less expensive the first time round.
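As a sketch only (the table, key column and included columns below are hypothetical; pick them from the plan's seek predicates and output list):

-- Narrow covering index: key on the filtered column, selected columns included
CREATE NONCLUSTERED INDEX IX_MyTable_Covering
ON dbo.MyTable (FilterColumn)
INCLUDE (SelectedColumn1, SelectedColumn2);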
Of course this may not be possible/viable for you to do, in which case you should either just take the hit on the first run and not empty the caches, or rewrite the query itself to be more efficient and perform less reads.
It's very simple why the very first execution takes longer and all later executions finish fairly quickly: the reason behind this mystery is "cached execution plans".
While working with stored procedures, SQL Server takes the following steps:
1) Parse the syntax of the command.
2) Translate it to a query tree.
3) Develop an execution plan.
4) Execute.
The first two steps only take place when you create a stored procedure.
The third step only takes place on the very first execution, or if the cached plan has been flushed from the cache.
The fourth step takes place on every execution, and it is the only step that takes place after the very first execution, as long as the plan is still in the cache.
In your case it's quite understandable that the very first execution took long and later executions ran fairly quickly.
To reproduce the "issue" you executed the DBCC FREEPROCCACHE and DBCC DROPCLEANBUFFERS commands, which flush the plan cache and the buffer cache and cause your stored procedure to create a new execution plan on its next execution. Hope this will clear the fog a little bit :)
Generally, when a stored procedure is first created, or its statistics etc. are reset, it will take the first value passed into the stored procedure as the 'default' value for its parameters. It will then try to optimize itself based on that.
To stop that from happening, there are a couple of things you can do.
You could potentially use the Query hints feature to mark certain variables as Unknown. So, as an example, at the end of the stored procedure you could put something along the lines of:
select * from foo where foo.bar = @myParam option (optimize for (@myParam unknown))
As another approach, you could force the SQL plan to be re-compiled each time - which might be a good idea if your stored procedure is highly variable in the type of SQL it generates. The way you'd do that is:
select * from foo where foo.bar = @myParam option (recompile)
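Since your statement is built dynamically, the hint just needs to be appended to the generated text before it is executed. A minimal sketch with hypothetical table and parameter names:

DECLARE @sql nvarchar(max);
SET @sql = N'SELECT * FROM dbo.foo WHERE foo.bar = @myParam OPTION (RECOMPILE)';
EXEC sp_executesql @sql, N'@myParam int', @myParam = 42;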
Hope this helps.
I have a large table (~170 million rows, 2 nvarchar and 7 int columns) in SQL Server 2005 that is constantly being inserted into. Everything works ok with it from a performance perspective, but every once in a while I have to update a set of rows in the table which causes problems. It works fine if I update a small set of data, but if I have to update a set of 40,000 records or so it takes around 3 minutes and blocks on the table which causes problems since the inserts start failing.
If I just run a select to get back the data that needs to be updated I get back the 40k records in about 2 seconds. It's just the updates that take forever. This is reflected in the execution plan for the update, where the clustered index update takes up 90% of the cost and the index seek and top operator to get the rows take up 10% of the cost. The column I'm updating is not part of any index key, so it's not like it's reorganizing anything.
Does anyone have any ideas on how this could be sped up? My thought now is to write a service that will just see when these updates have to happen, pull back the records that have to be updated, and then loop through and update them one by one. This will satisfy my business needs but it's another module to maintain and I would love if I could fix this from just a DBA side of things.
Thanks for any thoughts!
Actually it might reorganise pages if you update the nvarchar columns.
Depending on what the update does to these columns they might cause the record to grow bigger than the space reserved for it before the update.
(See an explanation of how nvarchar is stored at http://www.databasejournal.com/features/mssql/physical-database-design-consideration.html.)
So say a record has a string of 20 characters saved in the nvarchar column; this takes 20*2 + 2 bytes (2 for the length pointer) of space. This is written at the initial insert into your table (based on the index structure). SQL Server will only use as much space as your nvarchar really takes.
Now comes the update and inserts a string of 40 characters. And oops, the space for the record within your leaf structure of your index is suddenly too small. So off goes the record to a different physical place with a pointer in the old place pointing to the actual place of the updated record.
This then causes your index to go stale and because the whole physical structure requires changing you see a lot of index work going on behind the scenes. Very likely causing an exclusive table lock escalation.
Not sure how best to deal with this. Personally if possible I take an exclusive table lock, drop the index, do the updates, reindex. Because your updates sometimes cause the index to go stale this might be the fastest option. However this requires a maintenance window.
You should batch up your update into several updates (say 10000 at a time, TEST!) rather than one large one of 40k rows.
This way you will avoid a table lock; SQL Server will only take out 5000 locks (page or row) before escalating to a table lock, and even this is not very predictable (memory pressure etc). Smaller updates made in this fashion will at least avoid the concurrency issues you are experiencing.
You can batch the updates using a service or firehose cursor.
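For illustration only, a plain T-SQL batching loop could look something like this (the table and column names are hypothetical, and the batch size should be tested):

DECLARE @batch int, @rows int;
SET @batch = 10000;
SET @rows = @batch;

WHILE @rows = @batch
BEGIN
    -- Each statement touches at most @batch rows, staying under the lock escalation threshold
    UPDATE TOP (@batch) t
    SET    t.FlagColumn = 1
    FROM   dbo.BigTable AS t
    WHERE  t.FlagColumn = 0;    -- predicate identifying the rows still to be updated

    SET @rows = @@ROWCOUNT;
END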
Read this for more info:
http://msdn.microsoft.com/en-us/library/ms184286.aspx
Hope this helps
Robert
The most brute-force (and simplest) way is to have a basic service, as you mentioned. That has the advantage of being able to scale with the load on the server and/or the data load.
For example, if you have a set of updates that must happen ASAP, then you could turn up the batch size. Conversely, for less important updates, you could have the update "server" slow down if each update is taking "too long" to relieve some of the pressure on the DB.
This sort of "heartbeat" process is rather common in systems and can be very powerful in the right situations.
It's weird that your analyzer says it takes time to update the clustered index. Did the size of the data change when you updated? It seems like the nvarchar is driving the data to be reorganized, which might need updates to index pointers (as KMB has already pointed out). In that case you might want to increase the percentage of free space on the data and index pages so that the data and the index pages can grow without relinking/reallocation. Since an update is an IO-intensive operation (unlike a read, which can be buffered), the performance also depends on several factors:
1) Are your tables partitioned by data?
2) Does the entire table lie on the same SAN disk (or is the SAN striped well)?
3) How verbose is the transaction logging? Can the buffer size of the transaction logging be increased to support larger writes to the log and support massive inserts?
It's also important which API/language you are using; e.g. JDBC supports a batch update feature which makes the updates a little more efficient if you are doing multiple updates.
If I look in my profiler for SQL-server, it comes up with a lot of duplicate queries such as:
exec sp_executesql N'SELECT *
FROM [dbo].[tblSpecifications] AS [t0]
WHERE [t0].[clientID] = @p0
ORDER BY [t0].[Title]', N'@p0 int', @p0 = 21
A lot of these queries are not needed to display real time data, that is, if someone inserted a new record that was matched in that query it wouldn't matter if it didn't display for up to an hour after insertion.
You can output cache the asp.net pages, but I was wondering if there was similar functionality on the dbms (SQL-server in particular), which saves a query results in a cache and renews that cache after a set period of time, say 1 hour, with the aim of improving retrieval speeds of records.
In SQL Server 2000 and prior, you can use DBCC PINTABLE (databaseid, tableid), but it's best to allow SQL Server to manage your memory.
If you have an expensive aggregate query that you would like "cached", create an indexed view to materialize the results.
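As a rough illustration (the view, table and column names are made up; indexed views also come with a list of requirements such as SCHEMABINDING and COUNT_BIG):

CREATE VIEW dbo.vw_SalesTotals
WITH SCHEMABINDING
AS
SELECT ClientID,
       COUNT_BIG(*) AS RowCnt,        -- COUNT_BIG(*) is required when GROUP BY is used
       SUM(Amount)  AS TotalAmount    -- assumes Amount is declared NOT NULL
FROM   dbo.Sales
GROUP  BY ClientID;
GO

-- The unique clustered index is what actually materialises and maintains the results
CREATE UNIQUE CLUSTERED INDEX IX_vw_SalesTotals
ON dbo.vw_SalesTotals (ClientID);
GO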
Otherwise, the amount of time a database page remains in memory is determined by the least recently used policy. The header of each data page in cache stores details about the last two times it was accessed. A background process scans the cache, and decrements a usecount if the page has not been accessed since the last scan. When SQL Server needs to free cache, pages with the lowest usecount are flushed first. (Professional SQL Server 2008 Internals and Troubleshooting)
sys.dm_os_buffer_descriptors contains one row for each data page currently in the cache.
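A rough way to see how that translates per object in the current database (a sketch using the standard catalog views):

-- Count cached data pages per object
SELECT o.name   AS object_name,
       COUNT(*) AS cached_pages
FROM   sys.dm_os_buffer_descriptors AS bd
JOIN   sys.allocation_units AS au ON au.allocation_unit_id = bd.allocation_unit_id
JOIN   sys.partitions       AS p  ON p.hobt_id   = au.container_id
JOIN   sys.objects          AS o  ON o.object_id = p.object_id
WHERE  bd.database_id = DB_ID()
AND    au.type IN (1, 3)             -- IN_ROW_DATA and ROW_OVERFLOW_DATA pages
GROUP  BY o.name
ORDER  BY cached_pages DESC;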
Query results are not cached, but the data pages themselves will remain in cache until they are pushed out by other read operations. They next time your query is submitted, these pages will be read from memory instead of disk.
This is a main reason to avoid table scans where possible. If the table being scanned is big enough, your cache gets flooded with potentially useless data.
A lot of people have a "who cares how long the query takes, it is running it batch mode" attitude, but they fail to see the impact on other processes, such as the one you mentioned.
No, but there are a ton of caching solutions out there such as Memcached and Ehcache.
Not to miss the obvious, you could also create a wholly separate reporting table and update it hourly. While there'd be a cost in populating and administering it, you could limit the fields to what's needed and optimize the indices for reads.
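As a sketch of that idea (the reporting table name is hypothetical; tblSpecifications is the table from the profiler trace above), an hourly SQL Agent job could simply rebuild it:

-- Refresh the denormalised reporting table on a schedule
TRUNCATE TABLE dbo.rpt_Specifications;

INSERT INTO dbo.rpt_Specifications (clientID, Title)
SELECT clientID, Title
FROM   dbo.tblSpecifications;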