WAL log files fill up quickly - how to prevent this? - config

Currently the WAL logs in the folder “/engine-rocksdb/journals” are filling up the disk.
When does ArangoDB do a cleaning run of these logs and delete them automatically, and how can I trigger this cleaning run earlier? My ArangoDB 3.10 runs in single-server mode in a virtual environment (cloud with network storage).
The log files are growing very fast because there are many writes to the DB. What is the best way to handle this?
What I have done so far:
If I set “rocksdb.wal-archive-size-limit”, it does delete the logs when the configured limit is reached, but it writes warnings like this to the log file:
2022-09-27T17:53:04Z [898948] WARNING [d9793] {engines} forcing removal of RocksDB WAL file '/archive/813371.log' with start sequence 5387062892 because of overflowing archive. configured maximum archive size is 1073741824, actual archive size is: 75401520
However, I still don't understand the meaning of the log output: "configured maximum archive size is 1073741824, actual archive size is: 75401520". Why is the "actual archive size" smaller than the configured maximum?
But what are the consequences of lowering the "wal-archive-size-limit" value? Is it possible to switch off the WAL archive completely? What exactly is it for? As I understand it, ArangoDB needs it for transaction safety (i.e. in case of power loss), right?
In general, yes, this is a good thing, but how can I get ArangoDB to a) limit this WAL archive (without error messages) and b) do a cleaning run faster?
thx :-)

When does ArangoDB do a cleaning run of these logs and delete them automatically and how to trigger this cleaning run earlier?
ArangoDB uses RocksDB underneath, and RocksDB moves WAL files (.log files) into its archive as soon as possible. For that to happen, all data from a WAL file needs to be safely stored in the column families' .sst files and flushed to disk.
ArangoDB will delete files from the WAL archive (and only from there) once it can be sure that an archived WAL file is no longer used. It will not remove files from the archive that are, or may be, in current use.
There are a few reasons why ArangoDB may keep archived WAL files for some time:
- When server-to-server replication is used: while a follower replicates data, it may read from the leader's WAL. Deleting a WAL file on the leader may make that replication fail.
- When arangodump is used to create a database dump, it creates a snapshot of the data on the server, and the WAL files for that snapshot are kept around until the snapshot is no longer needed (i.e. arangodump finishes).
- For the first 180 seconds after server start, all WAL files are intentionally kept, for forensic reasons and to allow followers to replay events from a leader's WAL after a restart. The value of 180 seconds can be changed via the startup option --rocksdb.wal-file-timeout-initial.
- There can be background processing of changes that still refers to data in WAL files. For example, each insert into a collection needs to increase the collection's count() value by 1. To save an extra write into RocksDB on each insert, the count() value is only written to the storage engine by a background thread, ideally only once every X insert operations. This may keep WAL files around a bit longer, especially if the background thread cannot keep up with the insert workload.
There is the startup option --rocksdb.wal-archive-size-limit to put a hard limit on the cumulative size of the WAL files in the archive. From your question, it appears that you are currently using ArangoDB version 3.10.
From the warning message you posted, it seems that the WAL archive cleanup somehow applies the wrong limit values.
It turns out that there has been a recent bugfix, released in ArangoDB version 3.10.1, 3.9.4, and 3.8.8, that should rectify this behavior. So upgrading to one of these or later versions may actually help when using the WAL archive size limit.
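For reference, both startup options mentioned above can be passed on the arangod command line (or put into the server's configuration file); the 2 GiB limit and 60-second timeout below are only example values, not recommendations:

    # example values only - adjust to your workload and disk budget
    arangod --rocksdb.wal-archive-size-limit 2147483648 \
            --rocksdb.wal-file-timeout-initial 60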

We shared your question in the Speedb hive on Discord, and here is what we got for you:
"By default, ArangoDB sets max_wal_size to 1 GB. The value of rocksdb.wal-archive-size-limit must be set to at least twice this number (otherwise you may end up with a single WAL file and the delete will fail)."
Hope this helps. If it doesn't, or you have follow-up questions, please join the Speedb Discord and we will be happy to help.
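To make that rule of thumb concrete: with the default max_wal_size of 1 GB (1073741824 bytes, the same number shown in your warning), the archive limit should be at least 2 x 1073741824 = 2147483648 bytes. In the configuration file that might look like the sketch below; the [rocksdb] section / key layout assumes the usual mapping of --rocksdb.* startup options to arangod.conf, so check your own config file's structure:

    [rocksdb]
    # at least twice the 1 GiB max_wal_size default, per the advice above
    wal-archive-size-limit = 2147483648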

Related

How do I keep Particular ServiceControl Audit db (Raven5) from getting larger and larger in size?

We recently upgraded Particular.ServiceControl.Audit to v4.26 and recreated our Audit instance, since from v4.26 and up new audit instances will bump up RavenDB from 3.5 to 5.4.
We did this hoping to remedy a problem in 4.25.x where compacting the audit database would not free up any disk space.
As it is, the database is still growing really large (300-400 GB). To test whether the data contained should actually use up all this disk space, we tried tampering with the ServiceControl.Audit/AuditRetentionPeriod config parameter, setting it to a small value like 1 day (the previous value was 30 days), naively perhaps expecting the db to shrink at some point, i.e. that retention would somehow influence the disk space used. The db file remained the same size (and kept growing).
The docs for compacting the database in ServiceControl only mention the ESENT-based RavenDB 3.5, but RavenDB 5+ uses Raven's own Voron storage engine. There do not seem to be Particular docs describing how to compact the Voron db.
So we tried to follow the RavenDB docs for compacting the db through Raven Studio after putting the service in maintenance mode. As the screenshot shows, this operation stopped or stalled very shortly after initializing (disregard time elapsed, as we actually left it running for several hours).
We tried this several times with no luck. We can see that the disk containing the specific db has zero reads or writes, so it is definitely stalled in some way. Pressing the "Abort" button also got stuck every time, after which we resorted to simply restarting the entire service (which then seemed to bring the db back to normal operation).
So the question is: how do we keep the audit db from growing indefinitely? At no point do we see it stop growing or stay the same size, so SSD disk costs keep going up.

Redis data recovery from Append Only File?

If we enable appendonly (AOF) in the redis.conf file, every operation which changes the Redis database is logged to that file.
Now, suppose Redis has used all the memory allocated to it via the "maxmemory" directive in the redis.conf file.
To store more data, it starts removing data according to one of the eviction behaviours (volatile-lru, allkeys-lru, etc.) specified in the redis.conf file.
Suppose some data gets removed from main memory, but its log will still be there in the AppendOnlyFile (correct me if I am wrong). Can we get that data back using this AppendOnlyFile?
Simply put, is there any way we can get that removed data back into main memory? For example, can we store that data on disk and load it back into main memory when required?
I got this answer from Google Groups. I'm sharing it.
----->
Eviction of keys is recorded in the AOF as explicit DEL commands, so when the file is replayed fully, consistency is maintained.
The AOF is used only to recover the dataset after a restart, and is not used by Redis for serving data. If the key still exists in it (with a subsequent eviction DEL), the only way to "recover" it is by manually editing the AOF to remove the respective deletion and restarting the server.
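To make the first point concrete: the AOF stores plain Redis commands in the RESP protocol format, so a key that was written and later evicted shows up as the original write followed by a DEL. The excerpt below is illustrative only (key and value are invented, and the exact contents depend on the Redis version and on any AOF rewrites that have happened):

    *3
    $3
    SET
    $5
    mykey
    $5
    hello
    *2
    $3
    DEL
    $5
    mykey

The first entry is the original SET mykey hello; the second is the DEL appended when mykey was evicted, which is exactly the record you would have to remove by hand before replaying the file.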
-----> Another answer for this
The AOF, as its name suggests, is a file that's appended to. It's not a database that Redis searches through and deletes the creation record when a deletion record is encountered. In my opinion, that would be too much work for too little gain.
As mentioned previously, a configuration that re-writes the AOF (see the BGREWRITEAOF command as one example) will erase any keys from the AOF that had been deleted, and now you can't recover those keys from the AOF file. The AOF is not the best medium for recovering deleted keys. It's intended as a way to recover the database as it existed before a crash - without any deleted keys.
If you want to be able to recover data after it was deleted, you need a different kind of backup. More likely a snapshot (RDB) file that's been archived with the date/time that it was saved. If you learn that you need to recover data, select the snapshot file from a time you know the key existed, load it into a separate Redis instance, and retrieve the key with RESTORE or GET or similar commands. As has been mentioned, it's possible to parse the RDB or AOF file contents to extract data from them without loading the file into a running Redis instance. The downside to this approach is that such tools are separate from the Redis code and may not always understand changes to the data format of the files the way the Redis server does. You decide which approach will work with the level of speed and reliability you want.
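A rough sketch of that snapshot-based recovery, assuming you have an archived dump.rdb and a free scratch port (the paths, port and key name are made up, and MIGRATE with the COPY option needs Redis 3.0 or newer):

    # start a throwaway instance that loads the archived snapshot
    redis-server --port 6380 --dir /backups/2024-01-15 --dbfilename dump.rdb --appendonly no

    # inspect the key you lost
    redis-cli -p 6380 GET some:evicted:key

    # copy it back into the live instance on 6379, keeping the local copy
    redis-cli -p 6380 MIGRATE 127.0.0.1 6379 some:evicted:key 0 5000 COPY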
But its log will still be there in the AppendOnlyFile (correct me if I am wrong). Can we get that data back using this AppendOnlyFile?
NO, you CANNOT get the data back. When Redis evicts a key, it also appends a delete command to the AOF. After rewriting the AOF, anything about the evicted key will be removed.
Is there any way we can get that removed data back into main memory? Can we store that data on disk and load it back into main memory when required?
NO, you CANNOT do that. You have to use another durable data store (e.g. MySQL, MongoDB) to save data to disk, and use Redis as a cache.
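If you follow that advice and treat Redis purely as a cache in front of the durable store, the relevant redis.conf settings would look something like this (the 2gb cap is only an example):

    # cap memory and evict least-recently-used keys across the whole keyspace
    maxmemory 2gb
    maxmemory-policy allkeys-lru

    # persistence is optional when the data can always be rebuilt from the primary store
    appendonly no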

What to do with old files of the SoftIndexFileStore in Infinispan persistent cache store?

I have a clustered cache store set up with Infinispan (8.2.4 Final) using the SoftIndexFileStore for persistence.
The documentation states that if entries expire it's not possible for the Compactor to clean up purged entries, and the disk usage will grow over time. From the user guide:
When entries are stored with expiration, SIFS cannot detect that some
of those entries are expired. Therefore, such old file will not be
compacted (method AdvancedStore.purgeExpired() is not implemented).
This can lead to excessive file-system space usage.
Most of my entries expire, but there are some which need to persist indefinitely, meaning I can't simply run a cleanup job every once in a while to delete all the data files.
How do I deal with this wasted disk usage? After several weeks of running I see many files which haven't been modified in weeks. Is it safe to delete old files which haven't been modified in, say, a month?
No; old files won't ever be modified again (they are written once and then considered immutable until removed). Removing them manually could lead to failures since these files are referenced in the index.
Regrettably, when the store is iterated and entries are found to be expired, Compactor.free() is not called, because there could be multiple concurrent iterations and we could end up calling it many times for a single entry.
A proper solution would be implementing a periodic (or JMX-triggered) process that goes through old files, computes the space occupied by expired entries and schedules files that exceed some threshold for compaction. This should go into Compactor. Please see the SIFS javadoc for a general design description.
If you're interested in developing this feature and you want to discuss that more, please go to Infinispan forum.

SQL Log File Not Shrinking in SQL Server 2012

I am dealing with someone else's backup maintenance plan and have an issue with the log file. I have a database that sits on one drive with a size of 31 GB and a log file that sits on another server with a size of 20 GB; the database is in the Full recovery model. There is a maintenance plan that runs once a day to do a complete backup, and a second plan that backs up the log file every 15 minutes. I have checked the drive that the log file gets backed up to and there is still plenty of room, but the log file never gets smaller after the backup. Is there something missing from the maintenance plan?
Thanks in advance
The situation as you describe it seems fine.
A transaction log backup does not shrink the log file. However, it does truncate the log, which means that space inside the file can be reused:
From Books Online (Transaction Log Truncation):
Log truncation automatically frees space in the logical log for reuse
by the transaction log.
Also, from Managing the Transaction Log:
Log truncation, which is automatic under the simple recovery model, is
essential to keep the log from filling. The truncation process reduces
the size of the logical log file by marking as inactive the virtual
log files that do not hold any part of the logical log.
This means that each time the transaction log backup occurs in your scenario, it's creating free space in the file which can be used by subsequent transactions.
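You can watch this happen yourself: after a log backup the file size stays the same, but the space inside it is marked reusable. A quick check, with a placeholder database name and backup path:

    -- what (if anything) is currently preventing log truncation
    SELECT name, log_reuse_wait_desc FROM sys.databases;

    -- the same kind of log backup your maintenance plan runs every 15 minutes
    BACKUP LOG [YourDb] TO DISK = N'E:\LogBackups\YourDb_log.trn';

If log_reuse_wait_desc shows LOG_BACKUP, the log is simply waiting for the next backup; other values (e.g. ACTIVE_TRANSACTION or REPLICATION) point to something else holding the log.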
Leading on from this, should you shrink the file as well? Generally speaking, the answer is no. Assuming your database does not suddenly have massive one-off spikes in usage, the transaction log will have grown to a size to accommodate the typical workload.
This means if you start shrinking the log, SQL Server will just need to grow it again... This is a resource intensive operation, affecting server performance, and no transactions can complete while the log is growing.
The current plan and file sizes all seem reasonable to me.
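If you want to confirm that, a quick way to see how little of the 20 GB file is actually in use (this reports all databases on the instance):

    -- log size and percentage of the log actually in use, per database
    DBCC SQLPERF(LOGSPACE);

A large file with a low "Log Space Used (%)" is exactly the healthy steady state described above.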
I don't know if this applies to your situation, but earlier versions of SQL Server 2012 have a bug that crops up when the model database is set to the Simple recovery model. For any database created while model is set to Simple, the log file will continue to grow in an attempt to reach the 2,097,152 MB limit. This still applies if you alter the database to Full afterwards. KB article 2830400 states that altering to Full, then altering back to Simple is a workaround -- that was not my experience. Running CU 7 for SP1 was the only trick that worked for me.
The article provides links for the first updates that resolved this bug: "Cumulative Update 4 for SQL Server 2012 SP1", as well as, "Cumulative Update 7 for SQL Server 2012" (if you haven't installed SP1).
If you change the recovery model to Full and then back to Simple, the shrink will work successfully.
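In T-SQL, that suggestion (and the KB workaround mentioned above) amounts to the following; the database and logical log file names are placeholders, and whether it helps depends on whether you are actually hitting that bug:

    USE [YourDb];
    ALTER DATABASE [YourDb] SET RECOVERY FULL;
    ALTER DATABASE [YourDb] SET RECOVERY SIMPLE;

    -- the shrink that previously refused to release space (target size in MB)
    DBCC SHRINKFILE (N'YourDb_log', 1024);

Keep in mind that switching to Simple breaks the log backup chain, so take a full backup afterwards if you move back to Full.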

When should one use auto shrink on log files in SQL Server?

I have had a few problems with log files growing too big on my SQL Servers (2000). Microsoft doesn't recommend using auto shrink for log files, but since it is a feature it must be useful in some scenarios. Does anyone know when is proper to use the auto shrink property?
Your problem is not that you need to auto-shrink periodically but that you need to back up the log files periodically. (We back ours up every 15 minutes.) Backing up the database itself is not sufficient; you must back up the log as well. If you do not back up the transaction log, it will grow until it takes up all the space on the drive. If you back it up, the space is freed to be reused (you will still probably need to shrink once after the first backup to get the log down to a more reasonable size). If you don't need to be able to recover transactions (which you should need to be able to do, unless your entire database consists of tables that are loaded from another source and can easily be re-loaded), then set your database to the simple recovery model.
One reason why auto-shrinking isn't such a good idea is that you will be growing the transaction log frequently, which slows down performance. If you back up the log, once it reaches a relatively stable size (the amount of space normally used by the transaction log in the time period between backups), the log will only need to grow occasionally if there is an unusually heavy amount of transactions.
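Spelled out in T-SQL, with placeholder names and paths (the 15-minute schedule itself lives in a SQL Agent job or maintenance plan):

    USE [YourDb];

    -- scheduled log backup while the database is in FULL recovery
    BACKUP LOG [YourDb] TO DISK = N'E:\LogBackups\YourDb_log.trn';

    -- one-time shrink after the first log backup, if the log had already ballooned
    DBCC SHRINKFILE (N'YourDb_log', 2048);

    -- only if you genuinely do not need point-in-time recovery
    ALTER DATABASE [YourDb] SET RECOVERY SIMPLE;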
My take on this is that auto-shrink is useful when you have many fairly small databases that frequently get larger due to added data, and then have a lot of empty space afterwards. You also need to not mind that the files will be fragmented on the disk when they frequently grow and shrink. I'd never use auto-shrink on a critical database or one larger than 2 GB, as you never know when the shrink operation will kick in, and access to the database will be blocked until the shrink has completed.
You should never have auto-shrink turned on. It causes performance degradation in several ways. The file system and indexes become fragmented, and it is very resource-intensive. It is also not necessary if you manage your backups correctly.
Read this answer from Paul Randal on Server Fault and Just Say No To Auto-Shrink!!
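Acting on that advice is a one-liner each way (the database name is a placeholder; DATABASEPROPERTYEX and this ALTER syntax are available from SQL Server 2000 onwards):

    -- 1 means auto-shrink is currently enabled
    SELECT DATABASEPROPERTYEX('YourDb', 'IsAutoShrink');

    -- turn it off
    ALTER DATABASE [YourDb] SET AUTO_SHRINK OFF;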
I used to use it when we had a demo version of a huge database that took up a lot of space on the laptop, so we used it to keep the size down.
The key is to use it only when the data is basically throw away.
You should truncate the logs periodically as a part of your backup strategy.