SQL Server MERGE on a large table with small log file

I am running a MERGE statement on a large table (5 million rows) in a database with a small transaction log (2 GB). I am getting this error:
Merge for MyTable failed: The transaction log for database 'MyDb' is full due to 'ACTIVE_TRANSACTION'.
Can this be solved by something other than extending the log file? I can't really afford to extend the log file at the moment.

If you have a fixed log file size, you have essentially two options:
Temporarily change the recovery model of your database from FULL to BULK_LOGGED. You'll lose the ability to do point-in-time recovery during this period, but it allows you to get the operation done quickly and then switch back. There are other caveats, so you need to do some research to make sure this is what you want to do.
Instead of changing the transaction log, you can adopt a batching approach to commit small batches of changes at a time, thus allowing the log to flush as needed.
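A minimal sketch of the second option - batching by key range - assuming a numeric Id key (all object names and sizes are hypothetical):

DECLARE @BatchSize INT = 50000;
DECLARE @MinId BIGINT, @MaxId BIGINT;

SELECT @MinId = MIN(Id), @MaxId = MAX(Id) FROM dbo.SourceTable;

WHILE @MinId <= @MaxId
BEGIN
    MERGE dbo.MyTable AS tgt
    USING (SELECT Id, Payload
           FROM dbo.SourceTable
           WHERE Id BETWEEN @MinId AND @MinId + @BatchSize - 1) AS src
        ON tgt.Id = src.Id
    WHEN MATCHED THEN
        UPDATE SET tgt.Payload = src.Payload
    WHEN NOT MATCHED BY TARGET THEN
        INSERT (Id, Payload) VALUES (src.Id, src.Payload);

    -- Each iteration commits on its own, so the log space can be reused
    -- (SIMPLE recovery) or freed by a log backup (FULL/BULK_LOGGED).
    SET @MinId = @MinId + @BatchSize;
END;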

Related

SQL SERVER TRANSACTION LOG

What are the consequences if transaction log growth is restricted and the log is full in SQL Server?
It will explode and burn down your house...
Seriously, it will cause problems such as not being able to perform any further transactions.
I strongly agree with Kundan, but would like to add some more points:
Additionally, transaction log expansion may occur for one of the
following reasons or in one of the following scenarios:
A very large transaction log file.
Transactions may fail and may start to roll back.
Transactions may take a long time to complete.
Performance issues may occur.
Blocking may occur.
The database is participating in an AlwaysOn availability group.
You can take the following actions if the log file is full:
Backing up the log.
Freeing disk space so that the log can automatically grow.
Moving the log file to a disk drive with sufficient space.
Increasing the size of a log file.
Adding a log file on a different disk.
Completing or killing a long-running transaction.
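A hedged T-SQL sketch of backing up the log, growing the log file, and adding a second log file (database, file, and path names are placeholders):

-- Back up the log so the inactive portion can be reused
-- (applies to the FULL / BULK_LOGGED recovery models).
BACKUP LOG MyDb TO DISK = N'D:\Backups\MyDb_log.trn';

-- Increase the size of the log file if it is capped too low.
ALTER DATABASE MyDb
MODIFY FILE (NAME = MyDb_log, SIZE = 4096MB);

-- Or add another log file on a different disk that has free space.
ALTER DATABASE MyDb
ADD LOG FILE (NAME = MyDb_log2, FILENAME = N'E:\SQLLogs\MyDb_log2.ldf', SIZE = 2048MB);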
For more info, please refer to the links below:
https://support.microsoft.com/en-in/help/317375/a-transaction-log-grows-unexpectedly-or-becomes-full-in-sql-server
https://msdn.microsoft.com/en-us/library/ms175495.aspx

Why do SQL databases use a write-ahead log over a command log?

I read about VoltDB's command log. The command log records the transaction invocations instead of each row change as in a write-ahead log. By recording only the invocation, the command logs are kept to a bare minimum, limiting the impact the disk I/O will have on performance.
Can anyone explain the database theory behind why VoltDB uses a command log and why standard SQL databases such as Postgres, MySQL, SQL Server, and Oracle use a write-ahead log?
I think it is better to rephrase the question:
Why does the new, distributed VoltDB use a command log instead of a write-ahead log?
Let's do an experiment and imagine you are going to write your own storage/database implementation. Undoubtedly you are advanced enough to abstract a file system and use block storage along with some additional optimizations.
Some basic terminology:
State: the stored information at a given point in time
Command: a directive to the storage to change its state
So your database can be pictured as a set of storage blocks holding the current state. The next step is to execute some command against that state:
Please note several important aspects:
A command may affect many stored entities, so many blocks will get dirty
Next state is a function of the current state and the command
Some intermediate states can be skipped, because it is enough to have a chain of commands instead.
Finally, you need to guarantee data integrity.
Write-Ahead Logging - the central concept is that state changes should be logged before any heavy update to permanent storage. Following our idea, we can log the incremental changes for each block.
Command Logging - the central concept is to log only the command, which is then used to reproduce the state.
There are pros and cons to both approaches: the write-ahead log contains all the changed data, while the command log requires additional processing at replay time but is fast and lightweight to write.
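A small hypothetical T-SQL illustration of the difference (the procedure, table, and column are invented for this example):

-- Hypothetical procedure that touches a million rows in one call.
CREATE PROCEDURE dbo.RaisePrices
    @Factor DECIMAL(5, 2)
AS
    UPDATE dbo.Products SET Price = Price * @Factor;
GO

-- A command log persists only the invocation below (a few bytes);
-- replay re-executes the procedure, which therefore must be deterministic.
-- A write-ahead log persists the change to every modified row/page
-- (potentially gigabytes); replay simply reapplies those changes.
EXEC dbo.RaisePrices @Factor = 1.05;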
VoltDB: Command Logging and Recovery
The key to command logging is that it logs the invocations, not the
consequences, of the transactions. By recording only the invocation,
the command logs are kept to a bare minimum, limiting the impact the disk I/O will
have on performance.
Additional notes
SQLite: Write-Ahead Logging
The traditional rollback journal works by writing a copy of the
original unchanged database content into a separate rollback journal
file and then writing changes directly into the database file.
A COMMIT occurs when a special record indicating a commit is appended
to the WAL. Thus a COMMIT can happen without ever writing to the
original database, which allows readers to continue operating from the
original unaltered database while changes are simultaneously being
committed into the WAL.
PostgreSQL: Write-Ahead Logging (WAL)
Using WAL results in a significantly reduced number of disk writes,
because only the log file needs to be flushed to disk to guarantee
that a transaction is committed, rather than every data file changed
by the transaction.
The log file is written sequentially, and so the
cost of syncing the log is much less than the cost of flushing the
data pages. This is especially true for servers handling many small
transactions touching different parts of the data store. Furthermore,
when the server is processing many small concurrent transactions, one
fsync of the log file may suffice to commit many transactions.
Conclusion
Command Logging:
is faster
has a lower footprint
has a heavier "replay" procedure
requires frequent snapshots
Write-Ahead Logging is a technique to provide atomicity. Better command-logging performance should also improve transaction processing. (See also: Databases on 1 Foot.)
Confirmation
VoltDB Blog: Intro to VoltDB Command Logging
One advantage of command logging over ARIES style logging is that a
transaction can be logged before execution begins instead of executing
the transaction and waiting for the log data to flush to disk. Another
advantage is that the IO throughput necessary for a command log is
bounded by the network used to relay commands and, in the case of
Gig-E, this throughput can be satisfied by cheap commodity disks.
It is important to remember that VoltDB is distributed by nature, so transactions are a little tricky to handle and the performance impact is noticeable.
VoltDB Blog: VoltDB’s New Command Logging Feature
The command log in VoltDB consists of stored procedure invocations and
their parameters. A log is created at each node, and each log is
replicated because all work is replicated to multiple nodes. This
results in a replicated command log that can be de-duped at replay
time. Because VoltDB transactions are strongly ordered, the command
log contains ordering information as well. Thus the replay can occur
in the exact order the original transactions ran in, with the full
transaction isolation VoltDB offers. Since the invocations themselves
are often smaller than the modified data, and can be logged before
they are committed, this approach has a very modest effect on
performance. This means VoltDB users can achieve the same kind of
stratospheric performance numbers, with additional durability
assurances.
From the description of Postgres' write-ahead log, http://www.postgresql.org/docs/9.1/static/wal-intro.html, and VoltDB's command log (which you referenced), I can't see much difference at all. It appears to be an identical concept with a different name.
Both sync only the log file to the disk but not the data so that the data could be recovered by replaying the log file.
Section 10.4 of the VoltDB documentation explains that their community version does not have a command log, so it would not pass the ACID test. Even in the enterprise edition, I don't see the details of their transaction isolation (e.g. http://www.postgresql.org/docs/9.1/static/transaction-iso.html) needed to make me comfortable that VoltDB is as serious as Postgres.
With WAL, readers read pages from the not-yet-flushed log; no modification is made to the main DB. With command logging, you have no ability to read from the command log.
Command logging is therefore vastly different. VoltDB uses command logging to create recovery points and ensure durability, sure - but it is writing to the main db store (RAM) in real time - with all the attendant locking issues, etc.
The way I read it is as follows (my own opinion):
Command logging as described here logs only the transactions as they occur, not what happens in or to them. OK, so here is the magic piece... If you want to roll back, you need to restore the last snapshot and then replay all the transactions that were applied after that (described in the link above). So effectively you are restoring a backup and re-applying all your scripts, only VoltDB has now automated it for you.
The real difference that I see is that you cannot logically roll back to a point in time as you can with a normal transaction log. Normal transaction logs (MSSQL, MySQL etc.) can easily roll back to a point in time (in the correct setup) because the transactions can be 'reversed'.
An interesting question comes up - referring to the post by pedz: will it always pass the ACID test even with the command log? Will do some more reading...
Addendum: did more reading and I don't think this is a good idea for very big and busy transactional databases. A DB snapshot is automatically created when the command logs fill up - to save you from big transaction logs and the IO used for them? You are going to incur large amounts of IO with snapshots being taken at regular intervals, and you are also using your memory to the brink. Also, in my view you lose the ability to easily roll back to a point in time before the last automatic snapshot - I think this will get very tricky to manage.
I'd rather stick to transaction logs for transactional systems. It's proven and it works.
It's really just a matter of granularity. They log operations at the level of stored procedures; most RDBMSs log at the level of individual statements (and 'lower'). Also, their blurb regarding advantages is a bit of a red herring:
One advantage of command logging over ARIES style logging is that a
transaction can be logged before execution begins instead of executing
the transaction and waiting for the log data to flush to disk.
They have to wait for the command to be logged too; it's just a much smaller record.
If I'm not mistaken VoltDB's unit of transaction is a stored proc. Traditional RDBMS usually need to support ad-hoc transactions containing any number of statements, so procedure-level logging is out of the question. Furthermore stored procedures are often not truly deterministic in traditional RDBMS (i.e. given params+log+data always produce same output), which they would have to be for this to work.
Nevertheless the performance improvements would be substantial for this constrained RDBMS model.
A few terms before I start explaining:
Logging schemes: the database uses logging schemes such as shadow paging and the write-ahead log (WAL) to implement atomicity, isolation, and durability (how is a different topic).
In order to understand why WAL is better, let's look at an issue with shadow paging. In shadow paging, the database maintains a master version and a shadow version of the database: if the table has a billion rows and the buffer pool manager does not have enough memory to hold all the tuples (records) in memory, the dirty pages are written to the shadow version rather than the master version until the transaction(s) are committed.
Once all the transactions are committed, a flag is switched and the shadow version becomes the master version; the superseded pages of the old master become garbage and can be collected.
The issue with this approach is the large number of fragmented, randomly located tuples it leaves behind; random access is slower than writing the dirty pages sequentially, and sequential writing is exactly what a write-ahead log does.
The other trade-off of using WAL is better runtime performance (as you are not doing random IO to flush out the pages) but slower recovery. With shadow paging, by contrast, recovery is faster (but recovery is only needed occasionally).

SQL Server 2005-- How can I detach & attach a DB and not create a T-Log File?

I have a database that is a container for data that is exported nightly from another database. Each night all of the data is deleted and refreshed. We previously migrated this process from SQL 2000 to 2005. The process is causing the associated Tran Log to get out of hand in size.
To fix this problem I've decided to remove the tran log file from the database since the data is re-exported each night.
I found this article and I've been trying to follow step II.
SQL SERVER – Shrinking Truncate Log File – Log Full – Part 2
The problem that I am having is that once I re-attach the database, a new transaction log is created. To be sure, when I go to attach the mdf file, I see a list of two database files on the attach screen, at which point I remove the log file from the list. Regardless, a new tran log file is created.
The log file is getting out of control. Is there another way that I can remove it from my database, or stop it from growing beyond a reasonable size?
I know that I can set a max tran log file size, but I wasn't sure whether, once that limit was reached, it would automatically shrink the log file or just start raising errors. Any tips or pointers would be helpful. I don't mind the existence of the tran log, I just want it to stay at a manageable size.
If you don't need a transaction history and point-in-time recovery, set the database recovery model to 'Simple'.
The transaction log will then only store enough information to roll back pending transactions, rather than being a complete log of all (most) DB changes.
See also: http://msdn.microsoft.com/en-us/library/ms175987.aspx
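A minimal sketch of that change plus a one-off shrink, assuming placeholder database and logical file names:

-- Switch to SIMPLE recovery: the log now only needs to cover open transactions.
ALTER DATABASE MyExportDb SET RECOVERY SIMPLE;

-- One-off shrink of the already-bloated log file back to a sensible size (in MB).
DBCC SHRINKFILE (MyExportDb_log, 512);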
That said, if like me you've ever pressed F5 mid-query by mistake, just before you got around to typing the 'WHERE' clause:
DELETE FROM OrderLine
....then being able to undo the last 5 minutes' worth of damage is very handy.

SQL Server 2008 - Shrinking the Transaction Log - Any way to automate?

I went in and checked my Transaction log the other day and it was something crazy like 15GB. I ran the following code:
USE mydb
GO
BACKUP LOG mydb WITH TRUNCATE_ONLY
GO
DBCC SHRINKFILE(mydb_log,8)
GO
Which worked fine, shrank it down to 8MB...but the DB in question is a Log Shipping Publisher, and the log is already back up to some 500MB and growing quick.
Is there any way to automate this log shrinking, outside of creating a custom "Execute T-SQL Statement Task" Maintenance Plan Task, and hooking it on to my log backup task? If that's the best way then fine...but I was just thinking that SQL Server would have a better way of dealing with this. I thought it was supposed to shrink automatically whenever you took a log backup, but that's not happening (perhaps because of my log shipping, I don't know).
Here's my current backup plan:
Full backups every night
Transaction log backups once a day, late morning (maybe hook the log shrinking onto this... it doesn't need to be shrunk every day though)
Or maybe I just run it once a week, after I run a full backup task? What do you all think?
If your log file grows by 500 MB every night, there is only one correct action: pre-grow the file to 500 MB and leave it there (a sketch follows the list below). Shrinking the log file is damaging. Having the log file auto-grow is also damaging:
you hit the zero-fill initialization of each file growth during normal operations, reducing performance
your log grows in small increments, creating many virtual log files and resulting in poorer operational performance
your log gets fragmented during shrinkage; while not as bad as data file fragmentation, log file fragmentation still impacts performance
one day the daily growth of 500 MB will run out of disk space and you'll wish the file had been pre-grown
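A hedged sketch of pre-growing the log once (the names and sizes are placeholders, not a recommendation for your workload):

-- Grow the log once to the size the nightly activity actually needs,
-- and set a reasonably large fixed growth increment as a safety net.
ALTER DATABASE MyDb
MODIFY FILE (NAME = MyDb_log, SIZE = 512MB, FILEGROWTH = 128MB);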
You don't have to take my word for it; you can read what some of the MVPs have to say on their blogs about the practice of shrinking log and data files on a regular basis:
Auto-shrink – turn it OFF!
Oh, the horror! Please stop telling people they should shrink their log files!
Why you want to be restrictive with shrink of database files
Don't Touch that Shrink Button!
Do not truncate your ldf files!
There are more, I just got tired of linking them.
Every time you shrink a log file, a fairy loses her wings.
I'd think more frequent transaction log backups.
I think what you suggest in your question is the right approach. That is, "hook the Log shrinking onto" your nightly backup/maintenance task process. The main thing is that you are regularly doing transaction log backups, which is what allows the log file to be shrunk when you run the shrink task. The key thing to keep in mind is that this is a two-step process: 1) back up your transaction log, which automatically "truncates" your log file; 2) run a shrink against your log file. "Truncate" doesn't necessarily (or ever?) mean that the file will shrink... shrinking it is a separate step you must do.
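A hedged sketch of those two steps, with placeholder database, file, and path names:

-- 1) Back up the log: this truncates (marks as reusable) the inactive portion.
BACKUP LOG MyDb TO DISK = N'D:\Backups\MyDb_log.trn';

-- 2) Shrink the physical log file down to a target size in MB.
DBCC SHRINKFILE (MyDb_log, 1024);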
For SQL Server 2005:
DBCC SHRINKFILE (Database_log_file_name, NOTRUNCATE)
This statement doesn't break log shipping, but you may need to run it more than once. Between runs, let the log shipping backup, copy, and restore jobs complete, then run the statement again.
Shrink and truncate are different.
My experiences:
AA db, 6.8GB transaction log
first run: 6.8 GB
log shipping backup, copy, restore after second run: 1.9 GB
log shipping backup, copy, restore after third run: 1.7 GB
log shipping backup, copy, restore after fourth run: 1 GB
BB db, 50GB transaction log
first run: 39 GB
log shipping backup, copy, restore after second run: 1 GB
Creating a transaction log backup doesn't mean that the online transaction log file size will be reduced; the file size remains the same. When a transaction is backed up, it is marked for overwriting in the online transaction log. It's not automatically removed and no space is freed on disk, so the size remains the same.
Once you set the LDF file size, maintain its size by setting the right transaction log backup frequency.
Paul Randal provides details here:
Understanding Logging and Recovery in SQL Server
Understanding SQL Server Backups
Based on Microsoft's recommendations, before you shrink the log file you should first try the following:
Freeing disk space so that the log can automatically grow.
Moving the log file to a disk drive with sufficient space.
Increasing the size of a log file.
Adding a log file on a different disk.
Turn on auto growth by using the ALTER DATABASE statement to set a non-zero growth increment for the FILEGROWTH option.
ALTER DATABASE EmployeeDB MODIFY FILE ( NAME = SharePoint_Config_log, SIZE = 2MB, MAXSIZE = 200MB, FILEGROWTH = 10MB );
Also, be aware that a shrink operation run via a maintenance plan affects both the *.mdf file and the *.ldf file, so if you want to shrink only the *.ldf file to an appropriate target size, create a SQL Agent job task and use the following commands:
USE sharepoint_config
GO
ALTER DATABASE sharepoint_config SET RECOVERY SIMPLE
GO
DBCC SHRINKFILE('SharePoint_Config_log', 100)
GO
ALTER DATABASE sharepoint_config SET RECOVERY FULL
GO
Note: 100 is called the target_size for the file in megabytes, expressed as an integer. If not specified, DBCC SHRINKFILE reduces the size to the default file size. The default size is the size specified when the file was created.
In my humble opinion, it's not recommended to perform the shrink operation periodically! Only do it in those circumstances where you really need to reduce the physical size.
You can also check this useful guide: Shrink a transaction log file Maintenance Plan in SQL Server

How to temporarily disable the log in SQL2000/2005?

Is there a way to stop the log file from growing (or at least from growing as much) in SQL2000/2005?
I am running a very extensive process with loads of inserts and the log is going through the roof.
EDIT: please note I am talking about a batch-import process, not about everyday updates of live data.
You can't disable the log, but you could perform your inserts in batches and backup/truncate the log in between batches.
If the data originates from outside your database you could also consider using BCP.
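For example, a hedged BULK INSERT sketch (path, table name, and batch size are hypothetical); with the TABLOCK hint under the SIMPLE or BULK_LOGGED recovery model the load can qualify for minimal logging, and BATCHSIZE keeps each committed chunk small:

BULK INSERT dbo.ImportTarget
FROM 'C:\data\nightly_extract.csv'
WITH (
    FIELDTERMINATOR = ',',
    ROWTERMINATOR = '\n',
    BATCHSIZE = 50000,   -- each 50k-row batch commits separately
    TABLOCK              -- table lock, a prerequisite for minimal logging
);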
Remember that setting the recovery model to SIMPLE only allows you to recover the database to the point of your most recent backup. Any changes made after that backup was created will be lost.
Changing the recovery model will also render your old log backups useless if you need to restore, as it breaks the log chain.
If you normally need full recovery, you'll instead want to increase your log backup frequency during the load process. This can be done by changing the job schedule for the log backup via the sp_update_jobschedule procedure in the msdb database, both before and after the load process.
Your batch may make too much use of temporary tables.
You can turn 'autogrowth' off when creating a database.
You can change this setting separately for the data file and/or the log file.
(Screenshot: Change Autogrowth setting in SQL Server - http://www.server-management.co.uk/images/library/c1652bc7-.jpg)
Changing the recovery model to SIMPLE keeps the log from growing as much.
What's everyone's opinion of this solution?