Is SELECT retrieving data from the WAL files?

To my understanding, a database can postpone writing to the table files to boost I/O performance. When a transaction is COMMITted, the data is written to the WAL files.
I'm curious how long the writing to the table files can be delayed. In particular, when I run a simple SELECT, e.g.
SELECT * from myTable;
after a COMMIT, is it possible that the database has to retrieve data from the WAL files in addition to the table files?

The documentation talks about being able to postpone the flushing of data pages:
If we follow this procedure, we do not need to flush data pages to disk on every transaction commit.
What WAL files allow an RDBMS to do is keep "dirty" data pages in memory and flush them to disk at a later time. It does not mean that the data pages themselves get modified at a later time.
So the answer to your question is "No, a SELECT always retrieves data from the data pages, not from the WAL files".

PostgreSQL does not read from WAL for this purpose during normal operations. It would only read WAL in order to apply changes to the data files after a crash (or during replication onto another server).
When data from ordinary data files is changed, the pages of the data files are kept in shared memory (in shared_buffers) until they are written to disk. Any other processes wanting to see that data will find it in that shared memory, and it will be in its changed form. They will always look for it in shared_buffers before they try to read it from disk, so no one will ever see the stale on-disk version of the data, except for the recovery process after a crash.
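To make this concrete, here is a minimal sketch with psycopg2 (the DSN, the mytable table and its columns are hypothetical stand-ins for the question's example): one session commits an update and a second session immediately reads the new value. The read is served from shared_buffers or the data files, never from the WAL, even if the modified page has not been written back to disk yet.

    import psycopg2

    dsn = "dbname=test user=postgres"   # hypothetical connection string

    writer = psycopg2.connect(dsn)
    reader = psycopg2.connect(dsn)

    with writer.cursor() as cur:
        cur.execute("UPDATE mytable SET value = value + 1 WHERE id = 1")
    writer.commit()  # the WAL record is flushed; the data page may stay dirty in shared_buffers

    with reader.cursor() as cur:
        # Served from shared_buffers (or the data files), not from the WAL files.
        cur.execute("SELECT value FROM mytable WHERE id = 1")
        print(cur.fetchone())  # already shows the committed value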

Related

WAL log files fill up quickly - how to prevent this?

Currently the logs in the folder “/engine-rocksdb/journals” are filling up (WAL logs).
When does ArangoDB do a cleaning run of these logs and delete them automatically, and how can I trigger this cleaning run earlier? My ArangoDB 3.10 runs in single-server mode and in a virtual environment (cloud with network storage).
The log files are growing very fast for me because there are many writes to the DB. What is the best approach, any ideas?
What I have done so far:
If I set the value “rocksdb.wal-archive-size-limit”, it does delete the logs when the set limit is reached, but it shows errors like this in the logfile:
2022-09-27T17:53:04Z [898948] WARNING [d9793] {engines} forcing removal of RocksDB WAL file '/archive/813371.log' with start sequence 5387062892 because of overflowing archive. configured maximum archive size is 1073741824, actual archive size is: 75401520
However, I still don't understand the meaning of the logfile output: "configured maximum archive size is 1073741824, actual archive size is: 75401520". The "actual archive size" is smaller?
But what are the consequences of lowering the "wal-archive-size-limit" value? Is it possible to switch off the WAL archive completely? What exactly is it for? As I understand it, ArangoDB needs it for transaction safety (i.e. in case of power loss), right?
In general, yes, this is a good thing, but how can I get ArangoDB to a) limit this WAL archive (without error messages) and b) do a cleaning run faster?
thx :-)
When does ArangoDB do a cleaning run of these logs and delete them automatically and how to trigger this cleaning run earlier?
ArangoDB uses RocksDB underneath, and RocksDB will move WAL files (.log files) into its archive as soon as possible. In order to do so, all data from the WAL file needs to be safely stored in the column families' .sst files and flushed to disk.
ArangoDB will delete files from the WAL archive (and only from there) once it can be sure that an archived WAL file is not used anymore. It will not remove files from the archive that are or may be in current use.
There are a few reasons why ArangoDB may keep archived WAL files for some time:
when server-to-server replication is used: while a follower replicates data, it may read from the leader's WAL. Deleting the WAL file on the leader may make the replication fail.
when arangodump is used to create a database dump, it will create a snapshot of data on the server, and the WAL files for that snapshot will be kept around until the snapshot isn't needed anymore (i.e. arangodump finishes).
the first 180 seconds after server start, all WAL files are intentionally kept, for forensic reasons, and to allow followers to replay events from a leader's WAL when it is restarted. The value of 180 seconds can be changed by adjusting the startup option --rocksdb.wal-file-timeout-initial.
there can be some background processing of changes that may refer to data from WAL files. For example, each insert into a collection will need to increase the collection's count() value by 1. To save an extra write into RocksDB on each insert, the count() value is only written to the storage engine by a background thread, ideally only once every X insert operations. However, this may lead to WAL files being around for a bit longer, especially if the background thread cannot keep up with the insert workload.
There is the startup option --rocksdb.wal-archive-size-limit to put a hard limit on the cumulative size of the WAL files in the archive. From your question, it appears that you are currently using ArangoDB version 3.10.
From the warning message you posted, it seems that the WAL archive cleanup somehow applies the wrong limit values.
It turns out that there has been a recent bugfix, released in ArangoDB version 3.10.1, 3.9.4, and 3.8.8, that should rectify this behavior. So upgrading to one of these or later versions may actually help when using the WAL archive size limit.
I shared your question in the Speedb Hive on Discord, and here is what we got for you:
"By default, ArangoDB sets max_wal_size to 1G; the value of rocksdb.wal-archive-size-limit must be set to at least twice this number (otherwise you may end up with a single WAL file and the delete will fail)."
Hope this helps. If it doesn't, or you have follow-up questions, please join the Speedb Discord and we will be happy to help.
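As a quick sanity check of the sizing rule quoted above, the 1073741824 from the warning message is exactly 1 GiB, which matches the default max_wal_size mentioned in the reply; twice that would then be the suggested lower bound for --rocksdb.wal-archive-size-limit. This is just arithmetic derived from the reply, not an official recommendation:

    # Sizing rule from the reply above: archive limit >= 2 * max_wal_size.
    GIB = 1024 ** 3

    max_wal_size = 1 * GIB                 # ArangoDB's default, per the reply
    min_archive_limit = 2 * max_wal_size   # suggested lower bound

    print(max_wal_size)        # 1073741824 -> the "configured maximum archive size" in the warning
    print(min_archive_limit)   # 2147483648 -> a safer value for --rocksdb.wal-archive-size-limit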

Redis data recovery from Append Only File?

If we enable the append-only file (appendonly yes) in the redis.conf file, every operation that changes the Redis database is logged in that file.
Now, suppose Redis has used all the memory allocated to it by the "maxmemory" directive in the redis.conf file.
To store more data, it starts removing data according to one of the eviction behaviours (volatile-lru, allkeys-lru, etc.) specified in the redis.conf file.
Suppose some data gets removed from main memory, but its log entry will still be there in the append-only file (correct me if I am wrong). Can we get that data back using this append-only file?
Simply put, I want to ask whether there is any way we can get that removed data back into main memory. For example, can we store that data on disk and load it back into main memory when required?
I got this answer from Google Groups; I'm sharing it here.
----->
Eviction of keys is recorded in the AOF as explicit DEL commands, so when the file is replayed, full consistency is maintained.
The AOF is used only to recover the dataset after a restart, and is not used by Redis for serving data. If the key still exists in it (with a subsequent eviction DEL), the only way to "recover" it is by manually editing the AOF to remove the respective deletion and restarting the server.
-----> Another answer for this
The AOF, as its name suggests, is a file that's appended to. It's not a database that Redis searches through and deletes the creation record when a deletion record is encountered. In my opinion, that would be too much work for too little gain.
As mentioned previously, a configuration that re-writes the AOF (see the BGREWRITEAOF command as one example) will erase any keys from the AOF that had been deleted, and now you can't recover those keys from the AOF file. The AOF is not the best medium for recovering deleted keys. It's intended as a way to recover the database as it existed before a crash - without any deleted keys.
If you want to be able to recover data after it was deleted, you need a different kind of backup. More likely a snapshot (RDB) file that's been archived with the date/time that it was saved. If you learn that you need to recover data, select the snapshot file from a time you know the key existed, load it into a separate Redis instance, and retrieve the key with RESTORE or GET or similar commands. As has been mentioned, it's possible to parse the RDB or AOF file contents to extract data from them without loading the file into a running Redis instance. The downside to this approach is that such tools are separate from the Redis code and may not always understand changes to the data format of the files the way the Redis server does. You decide which approach will work with the level of speed and reliability you want.
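As an illustration of the snapshot approach, here is a hedged sketch with redis-py; it assumes the archived RDB file has already been loaded into a second Redis instance on port 6380, and the key name is made up for the example:

    import redis

    prod = redis.Redis(host="localhost", port=6379)      # the instance that evicted the key
    recovery = redis.Redis(host="localhost", port=6380)  # instance started from the archived RDB snapshot

    payload = recovery.dump("lost:key")                  # serialized value, or None if the key is absent
    if payload is not None:
        # ttl=0 means no expiry; replace=True overwrites any existing value.
        prod.restore("lost:key", 0, payload, replace=True)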
But its log entry will still be there in the append-only file (correct me if I am wrong). Can we get that data back using this append-only file?
NO, you CANNOT get the data back. When Redis evicts a key, it also appends a delete command to the AOF. After rewriting the AOF, anything about the evicted key will be removed.
if there is any way we can get that removed data back into main memory? For example, can we store that data on disk and load it back into main memory when required?
NO, you CANNOT do that. You have to use another durable data store (e.g. MySQL, MongoDB) for saving data to disk, and use Redis as a cache.
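For illustration, here is a minimal cache-aside sketch with redis-py; fetch_user_from_database and the key naming are hypothetical stand-ins for whatever durable store (MySQL, MongoDB, ...) you put in front of Redis:

    import json
    import redis

    r = redis.Redis(host="localhost", port=6379, decode_responses=True)

    def fetch_user_from_database(user_id):
        # Hypothetical durable-store lookup (MySQL, MongoDB, ...); replace with a real query.
        return {"id": user_id, "name": "example"}

    def get_user(user_id):
        cache_key = f"user:{user_id}"
        cached = r.get(cache_key)                     # try the cache first
        if cached is not None:
            return json.loads(cached)
        user = fetch_user_from_database(user_id)      # fall back to the durable store
        r.set(cache_key, json.dumps(user), ex=3600)   # repopulate the cache with a TTL
        return user

If Redis later evicts user:42, the next get_user(42) simply reloads it from the durable store.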

When does DBWn write modified buffers from the database buffer cache to disk?

I am learning some basic knowledge about Oracle Database architecture, and there are two examples:
The steps involved in executing a data manipulation language (DML) statement.
The steps involved in executing COMMIT command.
Executing DML steps
The steps are as follows:
1) The server process receives the statement and checks the library cache for any shared SQL area that contains a similar SQL statement.
2) If a shared SQL area is found, the server process checks the user’s access privileges for the requested data, and the existing shared SQL area is used to process the statement. If not, a new shared SQL area is allocated for the statement, so that it can be parsed and processed.
3) If the data and undo segment blocks are not already in the buffer cache, the server process reads them from the data files into the buffer cache. The server process locks the rows that are to be modified.
4) The server process records the changes to be made to the data buffers as well as the undo changes. These changes are written to the redo log buffer before the in-memory data and undo buffers are modified. This is called write-ahead logging.
5) The undo segment buffers contain values of the data before it is modified. The undo buffers are used to store the before image of the data so that the DML statements can be rolled back, if necessary. The data buffers record the new values of the data.
6) The user gets feedback from the DML operation (such as how many rows were affected by the operation).
COMMIT Process steps
The steps are as follows:
1) The server process places a commit record, along with the system change number (SCN), in the redo log buffer. The SCN is monotonically incremented and is unique within the database.
2) The LGWR background process performs a contiguous write of all the redo log buffer entries up to and including the commit record to the redo log files.
3) If modified blocks are still in the SGA, and if no other session is modifying them, then the database removes lock-related transaction information from the blocks.
4) The server process provides feedback to the user process about the completion of the transaction.
First question: Does a server process or background process eventually move or migrate the redo log files into the data files? If yes, how does this process work?
Thanks to Nicholas Krasnov's and JSapkota's comments. There is no such "migration" process, because the two serve different purposes: data files hold the data of the database, while redo log files are used to recover the database. DBWn is responsible for writing data to the data files, and LGWR writes the redo log buffer to the active redo log file on disk.
My second question: When does DBWn (the database writer process) write modified buffers from the cache to the database files on disk? Does it update the files before COMMIT or after COMMIT?
DBWn does not write to the database files just because a commit statement was issued. A commit simply marks the end of a transaction: locks on tables or rows are released, the SCN is incremented, and LGWR writes the SCN and the changes to the online redo log file.
The database buffer cache has two lists:
1) Write list
2) Least-recently-used (LRU) list
The least-recently-used (LRU) list holds dirty buffers, i.e. buffers that were modified. A commit will indirectly make a buffer dirty, and the buffer will stay in the cache because it has been recently accessed.
DBWn does not write to the data files specifically before the commit or after it. It has its own triggers, such as when a checkpoint happens, when the number of dirty buffers reaches a threshold, or when there are no free buffers, etc.
I hope I answered your question. Thank you.
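To tie the two step lists together, here is a rough sketch using python-oracledb, with comments mapping the calls to the steps above; the connection details and the emp table are assumptions made for the example, not part of the original question:

    import oracledb

    # Hypothetical connection details; adjust for your environment.
    conn = oracledb.connect(user="scott", password="tiger", dsn="localhost/orclpdb1")
    cur = conn.cursor()

    # DML: changes go to data and undo buffers in the buffer cache; the redo for them
    # is written to the redo log buffer first (write-ahead logging). Nothing is forced
    # to the data files at this point.
    cur.execute("UPDATE emp SET sal = sal * 1.1 WHERE deptno = :d", d=10)

    # COMMIT: LGWR flushes the redo log buffer (including the commit record and SCN)
    # to the online redo log files. DBWn writes the dirty data blocks later, on its own
    # triggers (checkpoint, dirty-buffer threshold, no free buffers, ...).
    conn.commit()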

Core Data sqlite file growing

I have a synchronization process and I'm using Core Data to store a lot of information. Several times I downloaded the actual SQLite database file with the Organizer to check whether the data is correct.
Some days ago I noticed that the size difference between two SQLite files was huge. One file was 80 MB, the other about 100 MB. When I checked the data in them with an SQLite viewer there was no difference: the same tables, same indexes, same rows. How can that be? Is it possible that some data is still in the file after I delete objects through Core Data?
EDIT:
The solution is an option flag which can be put into an options NSDictionary and passed as a parameter to the addPersistentStore method.
NSSQLiteManualVacuumOption
Option key to rebuild the store file, forcing a database wide defragmentation when the store is added to the coordinator.
This invokes SQLite's VACUUM command. It is ignored by stores other than the SQLite store.
SQLite does not proactively return unused disk space as it deletes data, for performance reasons. This could be why you see the difference. See this link for more info:
SQLite FAQ
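The underlying SQLite behaviour is easy to reproduce outside of Core Data. Here is a small Python sketch using the standard sqlite3 module (the file name and table are made up for the demo): the file stays large after a DELETE and only shrinks after VACUUM, which is what NSSQLiteManualVacuumOption invokes:

    import os
    import sqlite3

    db_path = "demo.sqlite"   # stand-in for the Core Data store file
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS blobs (data BLOB)")
    conn.executemany("INSERT INTO blobs VALUES (?)", [(b"x" * 100_000,)] * 100)
    conn.commit()
    print(os.path.getsize(db_path))   # grows to hold the rows

    conn.execute("DELETE FROM blobs")
    conn.commit()
    print(os.path.getsize(db_path))   # roughly unchanged: freed pages stay inside the file

    conn.execute("VACUUM")            # rebuilds the file and releases the free pages
    print(os.path.getsize(db_path))   # now the file shrinks
    conn.close()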

Is having multiple data/log files a good thing even on the same LUN?

I have read that it is a good idea to have one file per CPU/CPU core so that SQL Server can more efficiently stream data to and from the disks. OK, I can see the benefit if they are on different spindles, but what if I only have one spindle (4 drives in RAID 10) for my data files (.mdf and .ndf)? Will I still benefit from splitting the data files (from just the .mdf file into an .mdf plus several .ndf files)? The same goes for the log file, although I see no benefit there, as the data has to be written serially and you're limited by the spindle's sequential write speed...
FYI, this is in regards to SQL Server 2005/2008...
Thanks.
The recommendation for multiple tempdb data files is definitely not about IOPS. It is about contention on the allocation pages (GAM, SGAM, PFS) in tempdb. SQL 2005+ doesn't put as heavy a load on these pages, but contention still occurs. Not all systems require a 1-file-to-1-core mapping. Most systems will perform well with 1 file per 2 or 4 cores. Having too many files adds overhead for managing the files. A good recommendation is to start with 1:4 or 1:2 and increase if contention continues. Don't go above 1:1.
For other databases, this is not recommended.
And yes, only 1 log file ... always.
8 Steps to better Transaction Log throughput:
Create only ONE transaction log file. Even though you can create multiple transaction log files, you only need one... SQL Server DOES NOT "stripe" across multiple transaction log files. Instead, SQL Server uses the transaction log files sequentially.
Misconceptions around TF 1118:
Why is the trace flag not required so much in 2005 and 2008? In SQL Server 2005, my team changed the allocation system for tempdb to reduce the possibility of contention. There is now a cache of temp tables. When a new temp table is created on a cold system (just after startup) it uses the same mechanism as for SQL 2000. When it is dropped though, instead of all the pages being deallocated completely, one IAM page and one data page are left allocated, and the temp table is put into a special cache. Subsequent temp table creations will look in the cache to see if they can just grab a pre-created temp table 'off the shelf'. If so, this avoids accessing the allocation bitmaps completely. The temp table cache isn't huge (I think it's 32 tables), but this can still lead to a big drop in latch contention in tempdb.
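If you do add tempdb data files along the 1:4 to 1:2 starting point from the answer above, a hedged sketch with pyodbc could look like this; the connection string, file names, paths and sizes are placeholders, not recommendations:

    import pyodbc

    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 17 for SQL Server};SERVER=localhost;Trusted_Connection=yes;",
        autocommit=True,   # ALTER DATABASE cannot run inside a user transaction
    )
    cur = conn.cursor()

    # Add two extra data files alongside the default tempdev, keeping them equally sized.
    for i in (2, 3):
        cur.execute(f"""
            ALTER DATABASE tempdb ADD FILE (
                NAME = N'tempdev{i}',
                FILENAME = N'D:\\TempDB\\tempdev{i}.ndf',
                SIZE = 1024MB, FILEGROWTH = 256MB
            )
        """)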
So the answer is NO to both questions. Log striping was never an issue, and one-NDF-per-CPU is largely a myth, one that will take a very long time to die out. Multiple files IMHO make sense only if you can stripe IO (separate LUNs). Multiple filegroups though make sense, but not for IO reasons, for administrative purposes: piecemeal restores and archive read-only filegroups.
Still good. This is not about IOPS; it is about SQL Server BLOCKING a file for certain operations, mostly when file extents are allocated to a table / index. If you do a lot of inserts / updates, multiple files basically mean that another thread will lock a different file instead of waiting on the first one.
So this is not really about IOPS load; it is about blocking behavior.