SQL Server: force cleanup of deallocated internal objects in tempdb to release disk space reserved for them within an open session

I have a large cursor-based query which runs within a stored procedure under a job. It makes tons of calculations for bunches of market data in a loop all day long. Each iteration pulls pieces of historical time series from disk, fetches them into temporary tables with appropriate indexing, joins them in a number of transformations with intermediate results and stores the calculation output back to disk. At the end of each loop I drop (mostly) or truncate all temporary tables to deallocate the pages of user objects inside tempdb and get the namespace ready for the next iteration.
My problem is that after each cycle, all the internal objects which the DB Engine creates for query execution and spills to tempdb keep the disk space reserved for them, even though they are deallocated after the transactions commit. And it adds up on every cycle as the next bunch of new internal objects is spilled to disk.
This leads to permanent tempdb growth, all of it accumulating as reserved space tied to more and more deallocated internal objects. The DB Engine releases/shrinks (whatever) these tons of wasted disk space only after the session closes, when the proc finishes its cycles.
I can work around the problem by reducing the number of cycles in each job run and just starting it again. But I would like a proper, fundamental solution: I need a command or any kind of trick inside a session to force garbage collection on demand, to clean up / kill the deallocated internal objects completely and release the tempdb disk space reserved for them. Days of googling did not help. Folks, help!
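In case it helps anyone diagnosing the same thing: a query along these lines (using the tempdb space-usage DMVs) shows whether it really is internal objects holding the reserved space, and which session owns them:
-- Breakdown of tempdb reservations by consumer (pages are 8 KB)
SELECT SUM(user_object_reserved_page_count)     * 8 / 1024.0 AS user_objects_MB,
       SUM(internal_object_reserved_page_count) * 8 / 1024.0 AS internal_objects_MB,
       SUM(version_store_reserved_page_count)   * 8 / 1024.0 AS version_store_MB,
       SUM(unallocated_extent_page_count)       * 8 / 1024.0 AS free_MB
FROM tempdb.sys.dm_db_file_space_usage;
-- Internal-object pages allocated/deallocated per session
SELECT session_id,
       internal_objects_alloc_page_count,
       internal_objects_dealloc_page_count
FROM sys.dm_db_session_space_usage
ORDER BY internal_objects_alloc_page_count DESC;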

We have exactly the same issue:
time-consuming recalculations are executed every night;
a lot of temporary tables are used in order to get parallel execution plans.
In order to fix the issue, we just divided the work into smaller processes, each executed in a separate session but chained (in order to avoid blocking issues): when the first part finishes, it fires up the next part, which in turn fires the next one, and so on.
For example (if you have a way to chain your calculations), you can break your loop iterations into separate calls of the procedure with different parameters. When these run in different sessions, the pages are released as each session finishes.
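A hedged sketch of that pattern (procedure, parameter and job names are all made up here): each chunk runs as its own SQL Agent job, and the last thing a chunk does is start the job for the next chunk, so every chunk gets a fresh session and its tempdb reservations are released when that session ends.
CREATE PROCEDURE dbo.usp_RecalcChunk     -- hypothetical procedure, one Agent job per chunk
    @ChunkId     int,
    @NextJobName sysname = NULL          -- NULL for the final chunk
AS
BEGIN
    -- ... the heavy temp-table work for this chunk only ...
    -- Queue the next chunk in a brand-new session (sp_start_job is fire-and-forget)
    IF @NextJobName IS NOT NULL
        EXEC msdb.dbo.sp_start_job @job_name = @NextJobName;
END;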

Related

SQL stored procedure - how to optimize slow delete?

I've got a seemingly simple stored procedure that is taking too long to run (25 minutes on about 1 million records). I was wondering what I can do to speed it up. It's just deleting records in a given set of statuses.
Here's the entire procedure:
ALTER PROCEDURE [dbo].[spTTWFilters]
AS
BEGIN
    DELETE FROM TMW
    WHERE STATUS IN ('AVAIL', 'CANCL', 'CONTACTED', 'EDI-IN', 'NOFRGHT', 'QUOTE');
END
I can obviously beef up my Azure SQL instance to run faster, but are there other ways to improve? Is my syntax not ideal? Do I need to index the STATUS column? Thanks!
So the answer, as is generally the case with large data-update operations, is to break it up into several smaller batches.
Every DML statement runs inside its own transaction by default, whether one is explicitly declared or not. By running a delete that affects a large number of rows in a single batch, locks are held on the indexes and the base table for the duration of the operation, and the log file keeps growing, internally creating new VLFs for the entire transaction.
Moreover, if the delete is aborted before it completes, the rollback may well take considerably longer to complete, since rollbacks are always single-threaded.
Breaking it into batches, usually performed in some form of loop working progressively through a range of key values, allows the deletes to occur in smaller, more manageable chunks. In this case, having a range of different status values to delete separately appears to be enough to effect a worthwhile improvement.
You can also use the TOP keyword to delete a large amount of data in a loop, or use the = sign (one status at a time) instead of the IN keyword.
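For example, a minimal batched version of the delete could look like this (the batch size of 10000 is an arbitrary placeholder to tune):
DECLARE @rows int = 1;
WHILE @rows > 0
BEGIN
    DELETE TOP (10000) FROM TMW
    WHERE STATUS IN ('AVAIL', 'CANCL', 'CONTACTED', 'EDI-IN', 'NOFRGHT', 'QUOTE');
    SET @rows = @@ROWCOUNT;   -- stop once nothing is left to delete
END;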

SQL Server Multiple calls to the same Stored Procedure affects performance?

I have a SP similar to below:
DECLARE my_cursor CURSOR FAST_FORWARD FOR
    SELECT TOP 500 id
    FROM Table
    WHERE colA = 1 AND colB = colC
-- call another SP
-- some other select/update operations
-- deallocate the cursor when everything is done
Now, say I have a Java program that calls this SP in a loop:
for (int i = 0; i < 1000; i++)
    JPA.em().createNativeQuery("exec my_SP").executeUpdate(); // each time gets 500 to process
I realised that the processing time gradually increases. It takes only a few seconds to finish the first iterations, but it quickly goes up to around a minute per iteration.
I am guessing this is due to a shortage of memory, as on each loop the SP/cursor occupies some memory. So eventually SQL Server needs to reclaim some of it, and this becomes more frequent as the loop moves on (hence the longer processing time).
I have two questions:
If the above is correct, why doesn't the increased time level off after a certain point? Every time I am dealing with a constant number (500) of records, so I thought, in terms of memory, it is like 500 out and 500 in.
If the above is incorrect, why is the processing time increasing in each loop?
Is the cursor the main reason, or is it because I split the SP calls into, in this case, a thousand pieces?
I believe your issue is related to database locking.
You are firing off many threads of the same query, all of which are getting in each other's way. SQL Server applies a read lock to prevent phantom reads, but the main issue would be your update statements, which need a write lock. Other threads have to wait until it is released before they can continue, so a backlog of waiting threads forms, otherwise known as DB blocking.
You can experiment by commenting out the SQL update statements and you may also need to add the following line at the top of your script for testing purposes:
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
You should find things speed up.

SQL Server cache question

When I run a certain stored procedure for the first time it takes about 2 minutes to finish. When I run it for the second time it finishes in about 15 seconds. I'm assuming that this is because everything is cached after the first run. Is it possible for me to "warm the cache" before I run this procedure for the first time? Is the cached information only used when I call the same stored procedure with the same parameters again, or will it be used if I call the same stored procedure with different params?
When you perform your query, the data is read into memory in blocks. These blocks remain in memory but they get "aged". This means the blocks are tagged with their last access time, and when SQL Server requires another block for a new query and the memory cache is full, the least recently used block (the oldest) is kicked out of memory. (In most cases; full-table-scan blocks are instantly aged to prevent full table scans from overrunning memory and choking the server.)
What is happening here is that the data blocks in memory from the first query haven't been kicked out of memory yet, so they can be used for your second query, meaning disk access is avoided and performance is improved.
So what your question is really asking is "can I get the data blocks I need into memory without reading them into memory (actually doing a query)?". The answer is no, unless you want to cache the entire tables and have them reside in memory permanently which, from the query time (and thus data size) you are describing, probably isn't a good idea.
Your best bet for performance improvement is looking at your query execution plans and seeing whether changing your indexes might give a better result. There are two major areas that can improve performance here:
creating an index where the query could use one to avoid inefficient queries and full table scans
adding more columns to an index to avoid a second disk read. For example, you have a query that returns columns A and B with a where clause on A and C, and you have an index on column A. Your query will use the index for column A, requiring one disk read, but then require a second disk hit to get columns B and C. If the index had all of columns A, B and C in it, the second disk hit to get the data could be avoided.
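As a rough sketch with invented names, the covering index for that example would look something like:
-- Keyed on the WHERE-clause columns (A, C), with B included so a query
-- returning A and B is answered from the index alone (no second lookup)
CREATE NONCLUSTERED INDEX IX_MyTable_A_C_incl_B
    ON dbo.MyTable (A, C)
    INCLUDE (B);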
I don't think that generating the execution plan will cost more than 1 second.
I believe that the difference between first and second run is caused by caching the data in memory.
The data in the cache can be reused by any further query (stored procedure or simple select).
You can 'warm' the cache by reading the data through any select that reads the same data. But that will probably cost about the same 90 seconds as well.
You can check the execution plan to find out which tables and indexes your query uses. You can then execute some SQL to get the data into the cache, depending on what you see.
If you see a clustered index seek, you can simply do SELECT * FROM my_big_table to force all the table's data pages into the cache.
If you see a non-clustered index seek, you could try SELECT first_column_in_index FROM my_big_table.
To force a load of a specific index, you can also use the WITH(INDEX(index)) table hint in your cache warmup queries.
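Putting that together, a warm-up query might look like this (the index name is a placeholder):
-- Scans the hinted index end to end, pulling its pages into the buffer pool
SELECT COUNT_BIG(*)
FROM my_big_table WITH (INDEX (IX_my_index));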
SQL Server caches data read from disk.
Consecutive reads will do less IO.
This is of great help since disk IO is usually the bottleneck.
More at:
http://blog.sqlauthority.com/2014/03/18/sql-server-performance-do-it-yourself-caching-with-memcached-vs-automated-caching-with-safepeak/
The execution plan (the cached info for your procedure) is reused every time, even with different parameters. It is one of the benefits of using stored procs.
The very first time a stored procedure is executed, SQL Server generates an execution plan and puts it in the procedure cache.
Certain changes to the database can trigger an automatic update of the execution plan (and you can also explicitly demand a recompile).
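For reference, both forms of explicit recompile look like this (the procedure name is just a placeholder):
-- Flag the procedure so a fresh plan is compiled on its next execution
EXEC sp_recompile N'dbo.MyProc';
-- Or compile a throw-away plan for a single call
EXEC dbo.MyProc WITH RECOMPILE;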
Execution plans are dropped from the procedure cache based on their "age". (From MSDN: Objects infrequently referenced are soon eligible for deallocation, but are not actually deallocated unless memory is required for other objects.)
I don't think there is any way to "warm the cache", except to perform the stored proc once. This will guarantee that there is an execution plan in the cache and any subsequent calls will reuse it.
more detailed information is available in the MSDN documentation: http://msdn.microsoft.com/en-us/library/ms181055(SQL.90).aspx

SQL SERVER Procedure Inconsistent Performance

I am working on a SQL Job which involves 5 procs, a few while loops and a lot of Inserts and Updates.
This job processes around 75000 records.
Now, the job works fine for 10000/20000 records at a speed of around 500/min. After around 20000 records, execution just dies. It loads around 3000 records every 30 mins and stays at that speed.
I was suspecting the network, but don't know for sure. These kinds of queries are difficult to analyze through SQL Performance Monitor. I'm not very sure where to start.
Also, there is a single cursor in one of the procs, which executes for very few records.
Any suggestions on how to speed this process up on the full-size data set?
I would check if your updates are within a transaction. If they are, it could explain why it dies after a certain amount of "modified" data. You might check how large your "tempdb" gets as an indicator.
Also, I have seen cases where, during long-running transactions, the database would die when there is other usage at the same time, again because of transactionality and improper isolation levels.
If you can split your job into independent, non-overlapping chunks, you might want to do it: for example, doing the job in chunks by dates, ID ranges of "root" objects, etc.
I suspect your whole process is flawed. I import a datafile that contains 20,000,000 records and hits many more tables and does some very complex processing in less time than you are describing for 75000 records. Remember looping is every bit as bad as using cursors.
I think if you set this up as an SSIS package you might be surprised to find the whole thing can run in just a few minutes.
With your current set-up, consider whether you are running out of room in the temp database, or whether it is trying to grow and can't grow fast enough. Also consider whether, at the time the slowdown starts, some other job is running that might be causing blocking. And get rid of the loops and process things in a set-based manner.
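As a rough illustration of what "set-based instead of a loop" means here (all table and column names invented), the per-customer insert-or-update step can usually be collapsed into two statements that handle every staged record at once:
-- Update the customers that already exist
UPDATE tgt
SET    tgt.Status = src.Status
FROM   dbo.Customers AS tgt
JOIN   #StagedRecords AS src ON src.CustomerId = tgt.CustomerId;
-- Insert the ones that don't
INSERT INTO dbo.Customers (CustomerId, Status)
SELECT src.CustomerId, src.Status
FROM   #StagedRecords AS src
WHERE  NOT EXISTS (SELECT 1 FROM dbo.Customers AS tgt
                   WHERE tgt.CustomerId = src.CustomerId);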
Okay...so here's what I am doing in steps:
Loading a file into a TEMP table, just as an intermediary.
Do some validations on all records using set-based operations.
Actual Processing Starts NOW.
TRANSACTION BEGIN HERE......
LOOP STARTS HERE
a. Pick records based on the TEMP table's PK (say customer A).
b. Retrieve data from existing tables (e.g. employer information)
c. Validate information received/retrieved.
d. Check if the record already exists - if so UPDATE, else INSERT. (THIS HAPPENS IN A SEPARATE PROCEDURE)
e. Find ALL Customer A family members (PROCESS ALL IN ANOTHER **LOOP** - SEPARATE PROC)
f. Update status for Customer A and his family members.
LOOP ENDS HERE
TRANSACTION ENDS HERE

Persistent temp tables in SQL?

Is it possible to have a 'persistent' temp table in MS-SQL? What I mean is that I currently have a background task which generates a global temp table, which is used by a variety of other tasks (which is why I made it global). Unfortunately, if the table becomes unused, it gets deleted by SQL Server automatically - this is gracefully handled by my system, since it just queues it up to be rebuilt again, but ideally I would like it to be built just once a day. So, ideally, I could just set some timeout parameter, like "if nothing touches this for 1 hour, then delete".
I really don't want it in my existing DB because it will cause loads more headaches related to managing the DB (fragmentation, log growth, etc), since it's effectively rollup data, only useful for a 24 hour period, and takes up more than one gigabyte of HD space.
Worst case my plan is to create another DB on the same drive as tempdb, call it something like PseudoTempDB, and just handle the dropping myself.
Any insights would be greatly appreciated!
If you create a table as tempdb.dbo.TempTable, it won't get dropped until:
a - SQL Server is restarted
b - You explicitly drop it
If you would like to have it always available, you could create that table in model, so that it gets copied to tempdb during a restart (but it will also be created in any new database you create afterwards, so you would have to delete it manually), or use a startup stored procedure to create it. There would be no way of persisting the data through restarts, though.
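A hedged sketch of the startup-procedure route (all names invented; the procedure has to live in master to be marked as a startup procedure):
USE master;
GO
CREATE PROCEDURE dbo.usp_CreateSharedWorkTable
AS
    -- Recreate the shared table in tempdb after every restart
    CREATE TABLE tempdb.dbo.SharedWorkTable
    (
        Id      int          NOT NULL PRIMARY KEY,
        Payload varchar(100) NULL
    );
GO
EXEC sp_procoption @ProcName = N'dbo.usp_CreateSharedWorkTable',
                   @OptionName = 'startup',
                   @OptionValue = 'on';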
I would go with your plan B, "create another DB on the same drive as tempdb, call it something like PseudoTempDB, and just handle the dropping myself."
How about creating a permanent table? Say, MyTable. Once every 24 hours, refresh the data like this:
Create a new table MyTableNew and populate it
Within a transaction, drop MyTable and use sp_rename to rename MyTableNew to MyTable (see the sketch below)
This way, you're recreating the table every day.
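A minimal sketch of that daily swap, assuming MyTableNew has just been populated:
BEGIN TRANSACTION;
    DROP TABLE dbo.MyTable;
    EXEC sp_rename 'dbo.MyTableNew', 'MyTable';   -- MyTableNew takes MyTable's place
COMMIT TRANSACTION;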
If you're worried about log files, store the table in a different database and set it to Recovery Model: Simple.
I have to admit to doing a double-take on this question: "persistent" and "temp" don't usually go together! How about a little out-of-the-box thinking? Perhaps your background task could periodically run a trivial query to keep SQL from marking the table as unused. That way, you'd take pretty direct control over creation and tear down.
After 20 years of experience dealing with all major RDBMS in existence, I can only suggest a couple of things for your consideration:
Note the oxymoronic concepts: "persistent" and "temp" are complete opposites. Choose one, and one only.
You're not doing your database any favors by writing data to the temp DB on a manual, semi-permanent, user-driven basis. Normal tablespaces (i.e. user databases) are already there for that purpose. The temp DB is for temporary things.
If you already know that such a table will be permanently used ("daily basis" IS permanent), then create it as a normal table on a user database/schema.
Every time you delete and recreate the very same table, you're fragmenting your whole database, with the perverse bonus of never giving the DB engine's optimizer a chance to assist you with even crude optimizations. Instead, try truncating it. Your rollback segments will thank you for that small relief, and the disk space will probably still be allocated for when you repopulate it the next day. You can force that desired behavior by specifying a separate tablespace and datafile for that table alone.
Finally, and most important of all: stop mortifying yourself and your DB engine over a measly 1 GB of data. You're wasting CPU, I/O cycles, adding latency, fragmentation, and so on for the sake of saving literally 0.02 cents of hardware real estate. Talk about dropping to the floor in a tuxedo to pick up a brown cent. 😂