I'm hoping to get some opinions on what could be the cause of strange checkpoint behaviour in SQL Server.
I have a database which is in the SIMPLE recovery model and starts at 10 GB in size. The database is on a SQL Server 2017 instance and is configured for Indirect Checkpoints with target_recovery_time_in_seconds set to 60.
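For reference, the indirect-checkpoint configuration described above can be set and verified along these lines (a minimal sketch; the database name is hypothetical):

```sql
-- Hypothetical database name; enables indirect checkpoints with a
-- 60-second recovery target, matching the configuration described above.
ALTER DATABASE [MyDatabase]
SET TARGET_RECOVERY_TIME = 60 SECONDS;

-- Verify the setting:
SELECT name, target_recovery_time_in_seconds
FROM sys.databases
WHERE name = N'MyDatabase';
```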
We have alerts that trigger at 70% transaction log usage, which is typically the point at which an internal CHECKPOINT would occur. We continued to receive alerts as the transaction log grew, and it eventually registered 99% full, at which point no further growth occurred.
The log_reuse_wait_desc column in sys.databases showed ACTIVE_TRANSACTION as the reason the last attempted log truncation failed. I checked all the relevant DMVs I could think of and confirmed that there were no active transactions running.
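A sketch of the kind of checks involved (database name hypothetical): sys.databases shows why truncation is being held up, and the transaction DMVs show any open transactions.

```sql
-- Check why log truncation is being held up:
SELECT name, log_reuse_wait_desc
FROM sys.databases
WHERE name = N'MyDatabase';  -- hypothetical name

-- Look for open transactions (one of several relevant DMVs):
SELECT st.session_id,
       at.transaction_id,
       at.name,
       at.transaction_begin_time
FROM sys.dm_tran_active_transactions AS at
JOIN sys.dm_tran_session_transactions AS st
  ON at.transaction_id = st.transaction_id;
```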
Issuing a CHECKPOINT manually cleared the wait_desc and truncated the log.
My theory is that the database had an active transaction at the time log truncation was last attempted, either when 70% log usage was breached or later, when the target number of dirty buffers to be flushed to disk was reached. In either case, an active transaction at that point prevented log truncation. Since that last checkpoint there has been minimal activity, so the dirty-buffer threshold was never reached again and no further checkpoint was attempted. Therefore, even though there is now no active transaction, log truncation could not take place until a CHECKPOINT was issued.
I intend to enable Trace Flag 3502 so I can see the checkpoint activity while this transaction is supposedly running.
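Something along these lines; trace flag 3605 is commonly enabled alongside 3502 so that the checkpoint messages are routed to the error log:

```sql
-- Enable trace flag 3502 globally so each checkpoint's start and end
-- are recorded; 3605 routes the trace output to the error log.
DBCC TRACEON (3502, 3605, -1);

-- ...observe checkpoint activity, then turn the flags back off:
DBCC TRACEOFF (3502, 3605, -1);
```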
Has anyone encountered this behaviour, or does anyone know whether SQL Server backs off from running checkpoints once transaction log usage exceeds 70%, even as the log continues to fill?
Many thanks!
As pointed out by @sepupic, a checkpoint issued at 70% log space usage is a characteristic of automatic checkpoints, not internal checkpoints (see comments on the question).
The simple reason for the observed behaviour is that the indirect checkpoints would have responded to dirty page threshold breaches while the active transaction continued to execute. The active transaction prevented those checkpoints from truncating the log, so the transaction log continued to grow.
Between the completion of the last indirect checkpoint and the completion of the previously active transaction (the one preventing log truncation), there were insufficient dirty pages to trigger another indirect checkpoint.
That is why log_reuse_wait_desc remained ACTIVE_TRANSACTION even though no active transaction was found on investigation, and why the log usage was cleared immediately by issuing a manual CHECKPOINT command.
Related
I updated data in my table and, before committing the transaction, I shut the database down with SHUTDOWN ABORT. When I started the database again, the data was gone.
How can I recover an uncommitted transaction in Oracle 11g?
Recovery here involves two distinct steps (besides some workarounds):
Cache Recovery
To solve this dilemma, two separate steps are generally used by Oracle Database for a successful recovery of a system failure: rolling forward with the redo log (cache recovery) and rolling back with the rollback or undo segments (transaction recovery).
The online redo log is a set of operating system files that record all changes made to any database block, including data, index, and rollback segments, whether the changes are committed or uncommitted. All changes to Oracle Database blocks are recorded in the online redo log.
The first step of recovery from an instance or media failure is called cache recovery or rolling forward, and involves reapplying all of the changes recorded in the redo log to the datafiles. Because rollback data is also recorded in the redo log, rolling forward also regenerates the corresponding rollback segments.
Rolling forward proceeds through as many redo log files as necessary to bring the database forward in time. Rolling forward usually includes online redo log files (instance recovery or media recovery) and could include archived redo log files (media recovery only).
After rolling forward, the data blocks contain all committed changes. They could also contain uncommitted changes that were either saved to the datafiles before the failure, or were recorded in the redo log and introduced during cache recovery.
Transaction Recovery
After the roll forward, any changes that were not committed must be undone. Oracle Database applies undo blocks to roll back uncommitted changes in data blocks that were either written before the failure or introduced by redo application during cache recovery. This process is called rolling back or transaction recovery.
Figure 12-2 illustrates rolling forward and rolling back, the two steps necessary to recover from any type of system failure.
Figure 12-2 Basic Recovery Steps: Rolling Forward and Rolling Back
Oracle Database can roll back multiple transactions simultaneously as needed. All transactions that were active at the time of failure are marked as terminated. Instead of waiting for SMON to roll back terminated transactions, new transactions can recover blocking transactions themselves to get the row locks they need.
Source link here.
A small addition, to shed some light on the case:
Oracle performs crash recovery and instance recovery automatically after an instance failure. In the case of media failure, a database administrator (DBA) must initiate a recovery operation. Recovering a backup involves two distinct operations: rolling the backup forward to a more recent time by applying redo data, and rolling back all changes made in uncommitted transactions to their original state.
In general, recovery refers to the various operations involved in restoring, rolling forward, and rolling back a backup. Backup and recovery refers to the various strategies and operations involved in protecting the database against data loss and reconstructing the database should a loss occur.
In brief, you cannot recover the updated data: it must be rolled back to preserve database consistency. Bear in mind that transactions are atomic, so they are either committed or rolled back. Since the session that initiated the transaction has been killed (stopped), no one can commit it, so SMON performs a rollback.
Uncommitted transactions will be rolled back once the instance starts after the crash.
What is Checkpoint in SQL Server Transaction, what are the different types of Checkpoint
A checkpoint writes the current in-memory modified pages (known as dirty pages) and transaction log information from memory to disk, and also records information about the transaction log.
Automatic
Issued automatically in the background to meet the upper time limit suggested by the recovery interval server configuration option. Automatic checkpoints run to completion. Automatic checkpoints are throttled based on the number of outstanding writes and whether the Database Engine detects an increase in write latency above 20 milliseconds.
Indirect
Issued in the background to meet a user-specified target recovery time for a given database. The default is 0, which indicates that the database will use automatic checkpoints, whose frequency depends on the recovery interval setting of the server instance.
Manual
Issued when you execute a Transact-SQL CHECKPOINT command. The manual checkpoint occurs in the current database for your connection. By default, manual checkpoints run to completion. Throttling works the same way as for automatic checkpoints. Optionally, the checkpoint_duration parameter specifies a requested amount of time, in seconds, for the checkpoint to complete.
Internal
Issued by various server operations such as backup and database-snapshot creation to guarantee that disk images match the current state of the log.
A checkpoint creates a known good point from which the SQL Server Database Engine can start applying changes contained in the log during recovery after an unexpected shutdown or crash.
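As a small illustration of the manual variant described above (checkpoint_duration is a request, not a guarantee):

```sql
-- Flush dirty pages for the current database and let the checkpoint
-- run to completion:
CHECKPOINT;

-- Ask SQL Server to try to finish the checkpoint within ~10 seconds;
-- the engine adjusts checkpoint I/O to meet the request where possible:
CHECKPOINT 10;
```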
While doing a batch delete operation, forcing a CHECKPOINT helped the deletion complete faster.
What are the consequences if transaction log growth is restricted and the log becomes full in SQL Server?
It will explode and burn your house down.
Seriously though, it will cause problems such as not being able to perform any transactions.
I strongly agree with Kundan.
But I would like to add some more points on this:
Additionally, transaction log expansion may occur for one of the following reasons or in one of the following scenarios:
A very large transaction log file.
Transactions may fail and may start to roll back.
Transactions may take a long time to complete.
Performance issues may occur.
Blocking may occur.
The database is participating in an AlwaysOn availability group.
You can take the following actions if the log file is full:
Backing up the log.
Freeing disk space so that the log can automatically grow.
Moving the log file to a disk drive with sufficient space.
Increasing the size of a log file.
Adding a log file on a different disk.
Completing or killing a long-running transaction.
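A couple of the actions above, sketched in T-SQL (database, file, and path names are hypothetical; in the SIMPLE recovery model a checkpoint allows log reuse, while in FULL or BULK_LOGGED a log backup is what frees space):

```sql
-- Back up the log so the inactive portion can be reused
-- (FULL / BULK_LOGGED recovery):
BACKUP LOG [MyDatabase] TO DISK = N'X:\Backups\MyDatabase.trn';

-- Increase the size of the log file:
ALTER DATABASE [MyDatabase]
MODIFY FILE (NAME = N'MyDatabase_log', SIZE = 8GB);
```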
For more info please refer to the below mentioned link:
https://support.microsoft.com/en-in/help/317375/a-transaction-log-grows-unexpectedly-or-becomes-full-in-sql-server
https://msdn.microsoft.com/en-us/library/ms175495.aspx
We are using HSQLDB (2.2.8) as an in-memory database for our web application, which runs on the Tomcat web server (6.19). We have set the maximum size of the log file with hsqldb.log_size=200 (i.e. 200 MB), but on some instances of our production environment the log file (~/tomcat/work/hypersonic/localDB.log) is growing way beyond that limit (40 GB).
Looking further into the logs, we found that the DB stopped performing the CHECKPOINT operation. What is the default behaviour of HSQLDB for performing the periodic CHECKPOINT operation? Is there any way we can stop this log file from growing?
After the .log file reaches its size limit, the CHECKPOINT operation is performed once all connections to the database have committed. You may have a connection with an uncommitted transaction.
You can check the INFORMATION_SCHEMA.SYSTEM_SESSIONS table to see whether a session is in the middle of a transaction. You can reset such sessions with the ALTER SESSION statement.
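A rough sketch of that check, assuming admin privileges (the session id 42 is hypothetical; see the sessions chapter of the guide for the exact ALTER SESSION options):

```sql
-- Sessions with a non-zero TRANSACTION_SIZE have uncommitted work
-- holding up the checkpoint:
SELECT SESSION_ID, USER_NAME, TRANSACTION_SIZE
FROM INFORMATION_SCHEMA.SYSTEM_SESSIONS
WHERE TRANSACTION_SIZE > 0;

-- Roll back the stuck session's transaction from an admin connection:
ALTER SESSION 42 RELEASE;
```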
http://www.hsqldb.org/doc/2.0/guide/sessions-chapt.html
What are the possible causes of premature redo log switching in Oracle other than reaching the specified file size and executing ALTER SYSTEM SWITCH LOGFILE?
We have a situation where some (but not all) of our nodes are prematurely switching redo log files before they fill up. This happens every 5-15 minutes, and the size of the logs in each case varies wildly (from 15% to 100% of the specified size).
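For diagnosing this, the switch history and current log state can be inspected per thread with queries along these lines (run as a suitably privileged user):

```sql
-- How often each RAC thread has been switching:
SELECT thread#, sequence#, first_time
FROM v$log_history
ORDER BY first_time DESC;

-- Size and status of the current redo log groups:
SELECT group#, thread#, bytes/1024/1024 AS size_mb, status
FROM v$log;
```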
This article says that it behaves differently in RAC.
In a parallel server environment, the LGWR process in each instance holds a KK instance lock on its own thread. The id2 field identifies the thread number. This lock is used to trigger forced log switches from remote instances. A log switch is forced whenever the current SCN for a thread falls behind the force SCN recorded in the database entry section of the controlfile. The force SCN is one more than the highest high SCN of any log file reused in any thread.