I have a stored procedure that performs calculations and stores a large amount of data in new tables, in another database.
If anything goes wrong, I just drop the tables.
I've already set the recovery mode for this database as simple, as the data is available elsewhere. Is there anything else I can do in the stored procedure to limit writing to the transaction log or remove transactions entirely to speed up the process?
It is impossible to completely eliminate the transaction log from the equation in SQL Server.
You may try the bulk-logged recovery model in conjunction with bulk insert, but if your calculations are complex and cannot be expressed within a single select statement, it could be worth trying SSIS.
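If you want to experiment with that, switching the recovery model is a one-line change (a sketch only; YourTargetDb is a placeholder name):

-- Switch the target database to the bulk-logged recovery model so that
-- bulk operations (SELECT INTO, BULK INSERT, etc.) can be minimally logged.
ALTER DATABASE YourTargetDb SET RECOVERY BULK_LOGGED;

-- ... run the bulk load here ...

-- Switch back afterwards (SIMPLE in this case, per the question).
ALTER DATABASE YourTargetDb SET RECOVERY SIMPLE;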
I suggest using an SSIS package to transfer the data from one database to the other. In SSIS you can control how the data is transformed, and you can use bulk insert; in bulk-insert mode the amount written to the transaction log is kept to a minimum.
I ran into similar situations even while using SSIS, where my staging database(s) kept logs more than 10 times the size of the actual data (on simple logging and using bulk insert). After lots of searching I have found that it is not feasible to prevent this from happening when doing large data operations like loading a data warehouse. Instead it is easier to just clean up after you are done by shrinking the log.
dbcc shrinkfile
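For example (a sketch; the logical log file name and the target size are placeholders you would look up first with sp_helpfile):

-- Find the logical name of the log file for the current database.
EXEC sp_helpfile;

-- Shrink the log file down to roughly 1 GB (1024 MB).
DBCC SHRINKFILE (MyStagingDb_log, 1024);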
Good day,
Two questions:
A) If I have something like this:
COMPLEX QUERY
WAIT FOR LOG TO FREE UP (DELAY)
COMPLEX QUERY
Would this actually work? Or would the log segment of tempdb remain just as full, due to still holding on to the log of the first query?
B) In the situation above, is it possible to have the middle query perform a dump tran with truncate_only?
(It's a very long chain of various queries that are run together. They don't change anything in the databases and I don't care to even keep the logs if I don't have to.)
The reason for the chain is because I need the same two temp tables, and a whole bunch of variables, for various queries in the chain (some of them for all of the queries). To simplify the usage of the query chain by a user with VERY limited SQL knowledge, I collect very simple information at the beginning of the long script, retrieve the rest automatically, and then use it throughout the script.
I doubt either of these would work, but I thought I may as well ask.
Sybase versions 15.7 and 12 (12.? I don't remember)
Thanks,
Ziv.
Per my understanding of @michael-gardner's answer, this is what I plan:
FIRST TEMP TABLES CREATION
MODIFYING OPERATIONS ON FIRST TABLES
COMMIT
QUERY1: CREATE TEMP TABLE OF THIS QUERY
QUERY1: MODIFYING OPERATIONS ON TABLE
QUERY1: SELECT
COMMIT
(REPEAT)
DROP FIRST TABLES (end of script)
I read that 'select into' is minimally logged, so I'm creating the table with a create statement (I have to do it this way for other reasons) and using 'select into existing table' for the initial population of the temp tables.
Once done with the table, I drop it, then 'commit'.
At various points in the chain I check the log segment of tempdb; if it's <70% (normally at >98%), I use a goto to reach the end of the script, where I drop the last temp tables and the script ends (so there's no need for a manual 'commit' there).
I misunderstood the whole "on commit preserve rows" thing, that's solely on IQ, and I'm on ASE.
Dumping the log mid-transaction won't have any effect on the amount of log space. The Sybase log marker will only move if there is a commit (or rollback), AND if there isn't an older open transaction (which can be found in syslogshold).
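For reference, a quick way to see the oldest open transaction that is pinning the log (a sketch; filter on whichever database you care about, tempdb in this case):

-- Show the oldest active transaction holding the log, per database.
select dbid, spid, starttime, name
from master..syslogshold
where dbid = db_id('tempdb')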
There are a couple of different ways you can approach solving the issue:
Add log space to tempdb.
This would require no changes to your code, and is not very difficult. It's even possible that tempdb is not properly sized for the system, and the extra log space would be useful to other applications utilizing tempdb.
Rework your script to add a commit at the beginning, and query only for the later transactions.
This would accomplish a couple of things. The commit at the beginning would move the log marker forward, which would allow the log dump to reclaim space. Then, since the rest of your queries are only reads, there shouldn't be any transaction space associated with them. Remember the transaction log only stores information on Insert/Update/Delete, not Reads.
In the example you listed above, the user's details could be stored and committed to the database, then the rest of the queries would just be select statements using those details for the variables, and a final transaction would clean up the table. In this scenario the log is only held for the first transaction and the last transaction, but the queries in the middle would not fill the log.
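That pattern might look roughly like this (a sketch only; #user_details, some_table and the columns are made-up placeholders, not your actual schema):

-- Create the temp table first, outside any transaction.
create table #user_details (user_id int, region varchar(30))

-- Transaction 1: capture the user's input and commit so the log marker can move.
begin transaction
    insert into #user_details values (42, 'EMEA')
commit transaction

-- Middle of the chain: read-only selects against #user_details and the real tables.
-- These generate essentially no log activity because nothing is modified.
select t.*
from   some_table t, #user_details u
where  t.region = u.region

-- End of script: clean up the temp table.
drop table #user_details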
Without knowing more about the DB configuration or query details it's hard to get much more detailed.
I have a SQL statement that updates records in a table if the query returns any records. The query only returns records if they need to be updated. When I run the select on the query I get no records so when the update runs there should be no records updated.
The problem I'm having is that the query in the stored procedure won't finish because the transaction log fills up before the query can complete. I'm not concerned about the transaction log filling up right now.
My question is, if there no records are being updated then why is anything being written to the transaction log?
We need more information before this problem can be solved ...
Remus has a great idea to look at the entries in the log file.
Executing DBCC SQLPERF(logspace) will give you how full the log file is.
Clear the log file using a transaction log backup. This is assuming the recovery model is FULL and a FULL backup has been done.
Re-run the update stored procedure. Look at the transaction log file entries.
A copy of the stored procedure and table definitions would be great. Looking for other processes (sp_who2) during the execution that might fill the log is another good place to look.
Any triggers that might cause updates, deletes or inserts can add to the log file size, as suggested by Martin.
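The log-space check and log backup from the steps above might look like this (a sketch; YourDb and the backup path are placeholders):

-- See how full each transaction log is (percentage used per database).
DBCC SQLPERF(LOGSPACE);

-- Back up (and thereby clear) the log, assuming FULL recovery and an existing full backup.
BACKUP LOG YourDb TO DISK = 'D:\Backups\YourDb_log.trn';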
Good luck.
Looks like the issue was in the join. It was trying to join so many records that tempdb was filling up to the point there was no more space on the drive.
I have a performance problem
I need to run a stored procedure from .NET 1.1. This stored procedure calls 8 stored procedures. Each one of them processes information to build a comparison between old and new information and then affects the physical table in the database.
The problem appears as soon as I try to run it directly from SSMS. The server starts crashing and gets so slow that it's almost impossible to work; I think the infrastructure people have to restart the service directly on the server.
I'm working in a development environment, so it's not a big problem there, but I can't upload this into the production environment as it is.
I've been thinking of using the nested procedures only for comparison purposes, never touching the physical data from them. I would retrieve their results into temporary tables in the principal procedure, then open my try-catch and begin-end transaction blocks and modify the database from the principal procedure using the information in the temp tables.
My principal stored procedure looks as follows. Is this the best way I can do this?
create proc spTest
as
/*Some processes here, temporary tables, etc...*/
begin try
    begin distributed transaction

    exec sp_nested1
    exec sp_nested2
    exec sp_nested3
    exec sp_nested4
    exec sp_nested5
    exec sp_nested6
    exec sp_nested7
    exec sp_nested8

    /*more processes here, updates, deletes, extra inserts, etc...*/

    commit transaction
end try
begin catch
    rollback transaction
    DECLARE @ERROR VARCHAR(3000)
    SELECT @ERROR = CONVERT(VARCHAR(3000), ERROR_MESSAGE())
    RAISERROR(@ERROR, 16, 32)
    RETURN
end catch
The basic structure of each nested stored proc is similar, but none of them calls any other proc; each one only has its own try and catch blocks.
Any help will be really appreciated... The version I'm using is SQL Server 2005.
Thank you all in advance....
First when things are slow, there is likely a problem in what you wrote. The first place to look is the execution plan of each stored proc. Do you have table scans?
Have you run each one individually and seen how fast each one is? This would help you define whether the problem is the 8 procs or something else. You appear to have a lot of steps involved in this, the procs may or may not even be the problem.
Are you processing data row-by-row by using a cursor or while loop or scalar User-defined function or correlated subquery? This can affect speed greatly. Do you have the correct indexing? Are your query statements sargable? I see you have a distributed transaction, are you sure the user running the proc has the correct rights on other servers? And that the servers exist and are running? Are you running out of room in the temp db? Do you need to run this in batches rather than try to update millions of records across multiple servers?
Without seeing this mess, it is hard to determine what might be causing it to slow.
But I will share how I work with long, complex procs. First, they all have a test variable that I use to roll back the transactions at the end until I'm sure I'm getting the right actions happening. I also return the results of what I have inserted before doing the rollback. Now, this initially isn't going to help the speed problem, but set it up anyway, because if you can't figure out what the problem is from the execution plan, then what you probably want to do is comment out everything but the first step, run the proc in test mode (and roll back), and then keep adding steps until you see the one it is getting stuck on. Of course it may be more than one.
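The test-variable pattern could look something like this (a sketch; @Test, spTestableLoad and the table name are placeholders, not the poster's actual code):

create proc spTestableLoad
    @Test bit = 1   -- 1 = dry run (roll back at the end), 0 = real run
as
begin try
    begin transaction

    -- ... the real work: inserts, updates, calls to nested procs, etc. ...

    -- Return what was written so it can be inspected before a rollback.
    select * from dbo.SomeTargetTable

    if @Test = 1
        rollback transaction
    else
        commit transaction
end try
begin catch
    if @@trancount > 0
        rollback transaction
end catch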
I have an application in which I have to insert rows into a SQL Server 2008 database in groups of N tuples, and all the tuples have to be inserted for the insert to be a success. My question is: how do I insert these tuples so that, in case any of them fails, I can do a rollback to eliminate all the tuples that were inserted correctly?
Thanks
On SQL Server you might consider doing a bulk insert.
From .NET, you can use SqlBulkCopy.
Table-valued parameters (TVPs) are a second route. In your insert statement, use WITH (TABLOCK) on the target table for minimal logging, e.g.:
INSERT Table1 WITH (TABLOCK) (Col1, Col2....)
SELECT Col1, Col2, .... FROM @tvp
Wrap it in a stored procedure that exposes @tvp as a parameter, add some transaction handling, and call this procedure from your app.
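A minimal sketch of that wrapper, assuming a made-up table dbo.Table1(Col1, Col2) and a matching user-defined table type (names are illustrative only):

-- Table type that mirrors the columns being loaded.
CREATE TYPE dbo.Table1Type AS TABLE (Col1 int, Col2 nvarchar(100));
GO

CREATE PROCEDURE dbo.InsertTable1Batch
    @tvp dbo.Table1Type READONLY
AS
BEGIN
    SET NOCOUNT ON;

    BEGIN TRY
        BEGIN TRANSACTION;

        -- TABLOCK on the target helps the insert qualify for minimal logging.
        INSERT dbo.Table1 WITH (TABLOCK) (Col1, Col2)
        SELECT Col1, Col2 FROM @tvp;

        COMMIT TRANSACTION;
    END TRY
    BEGIN CATCH
        IF @@TRANCOUNT > 0
            ROLLBACK TRANSACTION;

        DECLARE @msg nvarchar(2048) = ERROR_MESSAGE();
        RAISERROR(@msg, 16, 1);
    END CATCH
END;
GO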
You might even try passing the data as XML if it has a nested structure, and shredding it to tables on the database side.
You should look into transactions. This is a good intro article that discusses rolling back and such.
If you are inserting the data directly from the program, it seems like what you need are transactions. You can start a transaction directly in a stored procedure or from a data adapter written in whatever language you are using (for instance, in C# you might be using ADO.NET).
Once all the data has been inserted, you can commit the transaction or do a rollback if there was an error.
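In T-SQL, the all-or-nothing insert of a group of rows might look like this (a sketch; the table and column names are placeholders):

BEGIN TRY
    BEGIN TRANSACTION;

    -- Insert the whole group of N tuples inside one transaction.
    INSERT dbo.TargetTable (Col1, Col2) VALUES (1, 'a');
    INSERT dbo.TargetTable (Col1, Col2) VALUES (2, 'b');
    -- ... remaining tuples ...

    COMMIT TRANSACTION;   -- only reached if every insert succeeded
END TRY
BEGIN CATCH
    IF @@TRANCOUNT > 0
        ROLLBACK TRANSACTION;   -- undoes every tuple inserted so far
END CATCH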
See Scott Mitchell's "Managing Transactions in SQL Server Stored Procedures" for some details on creating, committing, and rolling back transactions.
For MySQL, look into LOAD DATA INFILE which lets you insert from a disk file.
Also see the general MySQL discussion on Speed of INSERT Statements.
For a more detailed answer please provide some hints as to the software stack you are using, and perhaps some source code.
You have two competing interests, doing a large transaction (which will have poor performance, high risk of failure), or doing a rapid import (which is best not to do all in one transaction).
If you are adding rows to a table, then don't run in a transaction. You should be able to identify which rows are new and delete them should you not like how they look on the first round.
If the transaction is complicated (each row affects dozens of tables, etc) then run them in transactions in small batches.
If you absolutely have to run a huge data import in one transaction, consider doing it when the database is in single user mode and consider using the checkpoint keyword.
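Those two suggestions in T-SQL form (a sketch; YourDb is a placeholder):

-- Put the database in single-user mode for the duration of the import.
ALTER DATABASE YourDb SET SINGLE_USER WITH ROLLBACK IMMEDIATE;

-- ... run the big import here, issuing CHECKPOINT periodically to flush dirty pages ...
CHECKPOINT;

-- Let everyone back in when the import is done.
ALTER DATABASE YourDb SET MULTI_USER;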
My database background is mainly Oracle, but I've recently been helping with some SQL Server work. My group has inherited some SQL server DTS packages that do daily loads and updates of large amounts of data. Currently it is running in SQL Server 2000, but will soon be upgraded to SQL Server 2005 or 2008. The mass updates are running too slowly.
One thing I noticed about the code is that some large updates are done in procedural code in loops, so that each statement only updates a small portion of the table in a single transaction. Is this a sound method to do updates in SQL server? Locking of concurrent sessions should not be an issue because user access to tables is disabled while the bulk loading takes place. I've googled around some, and found some articles suggesting that doing it this way conserves resources, and that resources are released each time an update commits, leading to greater efficiency. In Oracle this is generally a bad approach, and I've used single transactions for very large updates with success in Oracle. Frequent commits slow the process down and use more resources in Oracle.
My question is, for mass updates in SQL Server, is it generally a good practice to use procedural code, and commit many SQL statements, or to use one big statement to do the whole update?
Sorry Guys,
None of the above answers the question; they are just examples of how you can do things. The answer is: more resources get used with frequent commits, however the transaction log cannot be truncated until a commit point. Thus, if your single spanning transaction is very big, it will cause the transaction log to grow and possibly fragment, which, if undetected, will cause problems later. Also, in a rollback situation, the duration is generally twice as long as the original transaction. So if your transaction fails after half an hour, it will take an hour to roll back and you can't stop it :-)
I have worked with SQL Server2000/2005, DB2, ADABAS and the above is true for all. I don't really see how Oracle can work differently.
You could possibly replace the T-SQL with a bcp command and there you can set the batch size without having to code it.
Issuing frequent commits within a single table scan is preferable to running multiple scans with small processing numbers, because generally, if a table scan is required, the whole table will be scanned even if you are only returning a small subset.
Stay away from snapshots. A snapshot will only increase the number of IOs and compete for IO and CPU.
In general, I find it better to update in batches - typically somewhere between 100 and 1000 rows. It all depends on how your tables are structured: foreign keys? Triggers? Or just updating raw data? You need to experiment to see which scenario works best for you.
If I am in pure SQL, I will do something like this to help manage server resources:
SET ROWCOUNT 1000

WHILE 1=1 BEGIN
    DELETE FROM MyTable WHERE ...
    IF @@ROWCOUNT = 0
        BREAK
END

SET ROWCOUNT 0
In this example, I am purging data. This would only work for an UPDATE if you could restrict or otherwise selectively update rows. (Or only insert xxxx number of rows into an auxiliary table that you can JOIN against.)
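For an UPDATE, the equivalent sketch might look like this, assuming there is some way (here a hypothetical Processed flag) to skip rows that have already been handled:

SET ROWCOUNT 1000

WHILE 1=1 BEGIN
    -- Only touch rows that have not been updated yet.
    UPDATE MyTable
    SET    SomeColumn = 'new value',
           Processed  = 1
    WHERE  Processed = 0

    IF @@ROWCOUNT = 0
        BREAK
END

SET ROWCOUNT 0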
But yes, try not to update xx million rows at one time. It takes forever and if an error occurs, all those rows will be rolled back (which takes an additional forever.)
Well everything depends.
But ... assuming your db is in single user mode or you have table locks (tablockx) against all the tables involved, batches will probably perform worse. Especially if the batches are forcing table scans.
The one caveat is that very complex queries will quite often consume resources in tempdb; if tempdb runs out of space (because the execution plan required a nasty, complicated hash join), you are in deep trouble.
Working in batches is a general practice that is quite often used in SQL Server (when it's not in snapshot isolation mode) to increase concurrency and avoid huge transaction rollbacks because of deadlocks (you tend to get deadlocks galore when updating a 10-million-row table that is active).
When you move to SQL Server 2005 or 2008, you will need to redo all those DTS packages in SSIS. I think you will be pleasantly surprised to see how much faster SSIS can be.
In general, in SQL Server 2000, you want to run things in batches of records if the whole set ties up the table for too long. If you are running the packages at night when there is no use of the system, you may be able to get away with a set-based insert of the entire dataset. Row-by-row is always the slowest method, so avoid that if possible as well (especially if all the row-by-row inserts are in one giant transaction!). If you have 24-hour access with no downtime, you will almost certainly need to run in batches.