Why are there unused pages in my SQLite database?

If I look at the database settings of my database (with SQLite Manager), I have a freelist_count of 16. According to this source, the freelist count is
the number of unused pages in the database file
What is meant by unused pages, and why are there unused pages? Is it bad to have a freelist count greater than zero? What can I do to reduce the number to zero?

From SQLite FAQ:
(12) I deleted a lot of data but the database file did not get any smaller. Is this a bug?
No. When you delete information from an SQLite database, the unused disk space is added to an internal "free-list" and is reused the next time you insert data. The disk space is not lost. But neither is it returned to the operating system.
If you delete a lot of data and want to shrink the database file, run the VACUUM command. VACUUM will reconstruct the database from scratch. This will leave the database with an empty free-list and a file that is minimal in size. Note, however, that the VACUUM can take some time to run (around a half second per megabyte on the Linux box where SQLite is developed) and it can use up to twice as much temporary disk space as the original file while it is running.
As of SQLite version 3.1, an alternative to using the VACUUM command is auto-vacuum mode, enabled using the auto_vacuum pragma.
There's nothing bad about having some free pages, unless they take up a significant amount of space. It is up to you to decide where "nothing bad" ends and "significant" begins, depending on the needs of your application.
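For reference, a minimal sketch of how you might inspect and reclaim free pages from the sqlite3 shell; the pragmas and commands are standard SQLite, only the usage pattern shown here is illustrative:
-- show how many pages are currently on the free-list
PRAGMA freelist_count;
-- rebuild the database file from scratch, which empties the free-list
VACUUM;
-- alternatively, enable auto-vacuum so free pages are reclaimed automatically;
-- switching an existing database to FULL requires running VACUUM once afterwards
PRAGMA auto_vacuum = FULL;
VACUUM;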

Related

WAL log files fill up quickly - how to prevent this?

Currently the logs in the folder "/engine-rocksdb/journals" are filling up the disk (WAL logs).
When does ArangoDB do a cleaning run of these logs and delete them automatically, and how can I trigger this cleaning run earlier? My ArangoDB 3.10 runs in single-server mode and in a virtual environment (cloud with network storage).
The logfiles are growing very fast for me because there are many writes to the DB. What is the best way to handle this? Any ideas?
What I have done so far:
If I set the value "rocksdb.wal-archive-size-limit", it does delete the logs when the set limit is reached, but it shows errors in the logfile:
2022-09-27T17:53:04Z [898948] WARNING [d9793] {engines} forcing removal of RocksDB WAL file '/archive/813371.log' with start sequence 5387062892 because of overflowing archive. configured maximum archive size is 1073741824, actual archive size is: 75401520
However, I still don't understand the meaning of the logfile output: "configured maximum archive size is 1073741824, actual archive size is: 75401520". The "actual archive size" is smaller?
But what are the consequences of lowering the "wal-archive-size-limit" value? Is it possible to switch off the WAL archive completely? What exactly is it for? As I understand it, ArangoDB needs it for transaction safety (i.e. in case of power loss), right?
In general, yes, this is a good thing, but how can I get ArangoDB to a) limit this WAL archive (without error messages) and b) do a cleaning run faster?
thx :-)
When does ArangoDB do a cleaning run of these logs and delete them automatically and how to trigger this cleaning run earlier?
ArangoDB uses RocksDB underneath, and RocksDB will move WAL files (.log files) into its archive as soon as possible. For that to happen, all data from a WAL file needs to have been safely stored in the column families' .sst files and flushed to disk.
ArangoDB will delete files from the WAL archive (and only from there) once it can be sure that an archived WAL file is not used anymore. It will not remove files from the archive that are or may be in current use.
There are a few reasons why ArangoDB may keep archived WAL files for some time:
when server-to-server replication is used: while a follower replicates data, it may read from the leader's WAL. Deleting the WAL file on the leader may make the replication fail
when arangodump is used to create a database dump, it will create a snapshot of data on the server, and the WAL files for that snapshot will be kept around until the snapshot isn't needed anymore (i.e. arangodump finishes).
for the first 180 seconds after server start, all WAL files are intentionally kept, for forensic reasons and to allow followers to replay events from a leader's WAL when it is restarted. The value of 180 seconds can be changed by adjusting the startup option --rocksdb.wal-file-timeout-initial.
there can be some background processing of changes that may refer to data from WAL files. For example, each insert into a collection will need to increase the collection's count() value by 1. To save an extra write into RocksDB on each insert, the count() value is only written to the storage engine by a background thread, ideally only once every X insert operations. However, this may lead to WAL files being around for a bit longer, especially if the background thread cannot keep up with the insert workload.
There is the startup option --rocksdb.wal-archive-size-limit to put a hard limit on the cumulative size of the WAL files in the archive. From your question, it appears that you are currently using ArangoDB version 3.10.
From the warning message you posted, it seems that the WAL archive cleanup somehow applies the wrong limit values.
It turns out that there has been a recent bugfix, released in ArangoDB versions 3.10.1, 3.9.4, and 3.8.8, that should rectify this behavior. So upgrading to one of these or later versions may actually help when using the WAL archive size limit.
I shared your question in the Speedb hive on Discord, and here is what we got for you:
"By default, ArangoDB sets max_wal_size to 1 GB; the value of rocksdb.wal-archive-size-limit must be set to at least twice this number (otherwise you may end up with a single WAL file and the delete will fail)."
Hope this helps. If it doesn't, or if you have follow-up questions, please join the Speedb Discord and we will be happy to help.

Can I remove #img_bkp_cache generated by Hyper Backup in Synology?

I have a Synology with 2 TB of disk space, and it is backed up every day by Hyper Backup (with Smart Recycle).
But there is a file, #img_bkp_cache, that keeps growing and now takes almost 1/5th of the total disk capacity:
368G /volume2/#img_bkp_cache
1.3T /volume2/Samba
Is it safe to remove that cache file? How to do that? What can I do to shrink it otherwise?
Thank you for your help.
Here is Synology support answer (translated):
The cache image contains your remote backup index. This index is compared to the remote index to figure out which elements have changed. If you have several remote backups, then #img_bkp_cache will get bigger and bigger.
The index takes roughly 5% of the total size of a backup.
It is not really safe to remove #img_bkp_cache. If you do so, the remote backup will not be affected, but it will be impossible to manage incremental backups.
In a nutshell, this file is important and cannot be deleted without consequences.
Note: Finally, I switched from RAID 1 to RAID 5 and doubled my storage capacity (I had a fifth volume that was unused), which "solved" the problem.

Is there any logic in just maxing tempdb and never having it change size?

The reason I ask is that we have a dedicated RAID 10 array with ~150GB for tempdb (the "T" drive). It is only used for storing tempdb. The T drive isn't used by SQL Server or any other process for anything else.
Our DBA has tempdb set up with a 15GB initial size and 20% autogrow increments. Every time the server starts it is resized to 15GB, and then over the course of the day it grows to ~80GB (on average). Now IT is looking into making the initial size larger, say 30 or 40GB, but given that the drive is ONLY used for tempdb, my thinking is: why not "max it" right away?
Is there any negative effect to simply creating 4 data files in the primary filegroup for tempdb, giving them each an initial size of 30GB (120GB total), turning autogrow off, and being done with it?
Are there any limits on SQL Server's ability to span multiple tempdb data files in one query? I.e. will it cause problems if tempdb has, say, 70GB total free but the file used by one process is full (30 of 30GB used)?
I would size them to about 100GB and leave autogrow on; this way you don't have to wait for the files to grow every time. I would also add multiple files.
Is there any negative effect to simply create 4 data files in the primary group for tempdb, give them each an initial size of 30GB, turn autogrow off and be done with it?
Sounds like a good plan to me; however, I would leave autogrow on just in case someone decides to do a sort operation on a big table which doesn't have an index on that column.
See also here: http://technet.microsoft.com/en-us/library/cc966534.aspx
It is recommended to have .25 to 1 data files (per filegroup) for each CPU on the host server.
This is especially true for TEMPDB where the recommendation is 1 data file per CPU.
Dual core counts as 2 CPUs; logical procs (hyperthreading) do not.
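As an illustration only (not part of the original answers), here is a sketch of the kind of T-SQL this advice translates to; the logical file names, path, and sizes are placeholders, so check sys.master_files for the actual logical names on your server:
-- pre-size the existing primary tempdb data file and disable autogrow
ALTER DATABASE tempdb
MODIFY FILE (NAME = tempdev, SIZE = 30GB, FILEGROWTH = 0);
-- add further equally sized data files (one shown; repeat as needed)
ALTER DATABASE tempdb
ADD FILE (NAME = tempdev2, FILENAME = 'T:\tempdb2.ndf', SIZE = 30GB, FILEGROWTH = 0);
If you prefer to keep autogrow on as suggested above, replace FILEGROWTH = 0 with a fixed increment such as FILEGROWTH = 1GB.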
We have found it very useful to create large TempDB data and log files. Any actions that limit server OS activities such as resizing TempDB increase server efficiencies. We have a 16 processor machine with 113 GB dedicated to TempDB data space. This machine is dedicated to large SSIS ETL processes, thus resulting in mass data operations.
The bulk of our ETL operations spawn up to 4 SQL threads. After initially configuring a TempDB file for each processor (16), we quickly realized via performance monitoring that our configuration was forcing SQL\windows to unnecessarily span the multiple TempDB files. We settled on 5 larger TempDB data files and realized performance improvements. We have since moved on to a 24 processor box and are using 8 TempDB files.
Please note that this is a large data-migration server; I'm sure transaction-oriented systems would still benefit from the recommended 1:1 processor-to-TempDB-file configuration. It should also be noted that having a large growth increment (%) on a TempDB file may force a critical transaction to wait for the file-growth operation, and thus may not be appropriate for your specific application.

Does SQL's DELETE statement truly delete data?

Story: today one of our customers asked us whether all the data he deleted in the program was truly unrecoverable.
Aside from scheduled backups, we shrink the log file once a day, and we use the DELETE command to remove records from our tables where needed.
Still, just for the sake of it, I opened the .mdf file with an editor (I used PSPad) and searched for a particular unique piece of data that, I was sure, was inside one of the tables.
Problem: I tracked it down in the file, then executed the DELETE command, and it was still there.
Question:
Is there a particular command we are not aware of to physically delete the records from the disk?
Note: we know there are particular techniques to recover lost data from hard drives, but here I am talking about a notepad wannabe!
The text may still be there, but SQL Server has no concept of that data having any structure or being available.
The "freed space" is simply deallocated: not removed, compacted or zeroed.
The "Instant File Initialization" feature relies on this too (not zeroing the entire MDF file) and previous disk data is still available eben for a brand new database:
Because the deleted disk content is overwritten only as new data is written to the files, the deleted content might be accessed by an unauthorized principal.
Edit: To reclaim space:
ALTER INDEX ... WITH REBUILD is the best way
DBCC SHRINKFILE using NOTRUNCATE can compact pages into the gaps caused by deallocated pages, but it won't reclaim space within a page for deleted rows
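As a hedged sketch of what those two options look like in T-SQL (the table, index, and logical file names below are placeholders):
-- rebuild an index so its data is rewritten into new, compactly filled pages
ALTER INDEX IX_Orders_CustomerId ON dbo.Orders REBUILD;
-- compact used pages toward the start of the data file without
-- releasing the freed space back to the operating system
DBCC SHRINKFILE (MyDatabase_Data, NOTRUNCATE);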
SQL Server just marks the space of deleted rows as available, but does not reorganize the database and does not zero out the freed up space. Try to "Shrink" the database, and the deleted rows should no longer be found.
Thanks, gbn, for your correction. A page is the allocation unit of the database, and shrinking a database only eliminates pages, but does not compact them. You'd have to delete all rows in a page in order to see them disappear after shrinking.
If your client is concerned about data security, they should use Transparent Data Encryption. Even if you obliterate the information from the table, the record is still in the log. Even when the log is recycled, the info is still in the backups.
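For context, a minimal sketch of enabling TDE; the password, certificate, and database names are placeholders, and in practice you would also back up the certificate and its private key:
USE master;
CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'Pl@ceholderPassw0rd!';
CREATE CERTIFICATE TdeCert WITH SUBJECT = 'TDE certificate';
GO
USE MyDatabase;
CREATE DATABASE ENCRYPTION KEY
    WITH ALGORITHM = AES_256
    ENCRYPTION BY SERVER CERTIFICATE TdeCert;
ALTER DATABASE MyDatabase SET ENCRYPTION ON;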
You could update the record with dummy values before issuing the delete, thereby overwriting the data on disk before the database marks it as free. (Whether this also works with LOB fields would warrant investigation, though).
And of course, you'd still have the problem of logs and backups, but I take it you already solved those.
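To illustrate the overwrite-then-delete idea, a sketch against a hypothetical Customers table (the table and column names are made up for this example):
-- overwrite the sensitive columns in place first, then delete the row
UPDATE dbo.Customers
SET Name = REPLICATE('X', LEN(Name)),
    Email = REPLICATE('X', LEN(Email))
WHERE CustomerId = 42;
DELETE FROM dbo.Customers
WHERE CustomerId = 42;
Keep in mind that, as noted above, the pre-update values can still live on in the transaction log and in backups.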

When should one use auto shrink on log files in SQL Server?

I have had a few problems with log files growing too big on my SQL Servers (2000). Microsoft doesn't recommend using auto shrink for log files, but since it is a feature it must be useful in some scenarios. Does anyone know when it is appropriate to use the auto shrink property?
Your problem is not that you need to auto-shrink periodically but that you need to back up the log files periodically. (We back ours up every 15 minutes.) Backing up the database itself is not sufficient; you must back up the log as well. If you do not back up the transaction log, it will grow until it takes up all the space on the drive. If you back it up, the space is freed to be reused (you will still probably need to shrink after the first backup to get the log down to a more reasonable size). If you don't need to be able to recover transactions (which you should need, unless your entire database consists of tables that are loaded from another source and can easily be re-loaded), then set your database to simple recovery mode.
One reason why auto-shrinking isn't such a good idea is that you will be growing the transaction log frequently, which slows down performance. If you back up the log, then once you get to a relatively stable size (the amount of space normally used by the transaction log in the time period between backups), the log will only need to grow occasionally if there is an unusually heavy amount of transactions.
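As an illustration of the two approaches above, a sketch with placeholder database and backup path names:
-- back up the transaction log regularly (e.g. every 15 minutes via a scheduled job)
BACKUP LOG MyDatabase TO DISK = 'D:\Backups\MyDatabase_log.trn';
-- or, if point-in-time recovery is not needed, switch to simple recovery
ALTER DATABASE MyDatabase SET RECOVERY SIMPLE;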
My take on this is that auto-shrink is useful when you have many fairly small databases that frequently get larger due to added data, and then have a lot of empty space afterwards. You also need to not mind that the files will be fragmented on the disk when they frequently grow and shrink. I'd never use auto-shrink on a critical database or one larger than 2 GB, as you never know when the shrink operation will kick in, and access to the database will be blocked until the shrink has completed.
You should never have auto-shrink turned on. It causes performance degradation in several ways: the file system and indexes become fragmented, and it is very resource-intensive. It is also not necessary if you manage your backups correctly.
Read this answer from Paul Randal on Server Fault and Just Say No To Auto-Shrink!!
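For reference, a quick sketch of how you might check and disable the option on SQL Server 2005 and later (the database name is a placeholder; the catalog view shown here does not exist on SQL Server 2000, although the ALTER DATABASE statement works there too):
-- check whether auto-shrink is currently enabled (1 = on)
SELECT name, is_auto_shrink_on FROM sys.databases WHERE name = 'MyDatabase';
-- turn it off
ALTER DATABASE MyDatabase SET AUTO_SHRINK OFF;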
I used to use it when we had a demo version of a huge database that took up a lot of space on the laptop, so we used it to keep the size down.
The key is to use it only when the data is basically throw away.
You should truncate the logs periodically as a part of your backup strategy.