HSQLDB - how will data persist when the process crashes?

I want data to be committed to the HSQLDB data file when the process crashes, without writing to the log file. I set the command SET FILES WRITE DELAY FALSE, but it needs the log file enabled to persist the data. As logging creates a performance impact, I don't want it. So is there any way to achieve this objective?
What is the impact of logging on the application?
Thanks in Advance,
Anil

HSQLDB writes data to the .log file by default. This .log file is used for recovery in case of a crash.
When SET FILES WRITE DELAY FALSE is set, the engine performs a file sync on the .log file at each commit. The file sync operation is the reason for the performance impact. This setting is useful for tests in which many changes are written quickly and the process always exits. It is not necessary for many applications, because by default a sync is performed on the .log file every 0.5 seconds if new records have been written. This delay can be reduced to 100 milliseconds without a big impact on performance.
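For example, the delay can be tightened instead of forcing a sync on every commit (a minimal sketch; the 100 millisecond value is just the one mentioned above):
SET FILES WRITE DELAY 100 MILLIS
-- or sync on every commit (safest, but slow):
SET FILES WRITE DELAY FALSE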
You can turn logging off with the SET FILES LOG FALSE statement and perform mass inserts or updates, but this is not recommended as a default setting.
At any time, including when logging has been turned off, you can perform an explicit CHECKPOINT to commit all the data to the .data and .script files.
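A minimal sketch of the bulk-load pattern described in the two paragraphs above, re-enabling the log afterwards:
SET FILES LOG FALSE
-- ... perform the mass inserts or updates here ...
-- write all committed data to the .data and .script files:
CHECKPOINT
SET FILES LOG TRUE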

Related

WAL log files fill up quickly - how to prevent this?

Currently the logs in the folder “/engine-rocksdb/journals” (the WAL logs) are filling up.
When does ArangoDB do a cleaning run of these logs and delete them automatically, and how can I trigger this cleaning run earlier? My ArangoDB 3.10 runs in single-server mode and in a virtual environment (cloud with network storage).
The log files are growing very fast because there are many writes to the DB. What is the best way to handle this, any ideas?
What I have done so far:
If I set the value “rocksdb.wal-archive-size-limit” it does delete the logs when the set limit is reached, but it shows errors in the logfile:
2022-09-27T17:53:04Z [898948] WARNING [d9793] {engines} forcing removal of RocksDB WAL file '/archive/813371.log' with start sequence 5387062892 because of overflowing archive. configured maximum archive size is 1073741824, actual archive size is: 75401520
However, I still don't understand the meaning of the logfile output: "configured maximum archive size is 1073741824, actual archive size is: 75401520". The "actual archive size" is smaller?
But what are the consequences of lowering the "wal-archive-size-limit" value? Is it possible to switch off the WAL archive completely? What exactly is it for? As I understand it, ArangoDB needs it for transaction safety (i.e. in case of power loss), right?
In general, yes, this is a good thing, but how can I get ArangoDB to a) limit this WAL archive (without error messages) and b) do a cleaning run faster?
thx :-)
When does ArangoDB do a cleaning run of these logs and delete them automatically and how to trigger this cleaning run earlier?
ArangoDB uses RocksDB underneath, and RocksDB will move WAL files (.log files) into its archive as soon as possible. In order to do so, all data from a WAL file needs to have been safely stored in the column families' .sst files and flushed to disk.
ArangoDB will delete files from the WAL archive (and only from there) once it can ensure that an archived WAL file is no longer used. It will not remove files from the archive that are or may still be in use.
There are a few reasons why ArangoDB may keep archived WAL files for some time:
when server-to-server replication is used: while a follower replicates data, it may read from the leader's WAL. Deleting the WAL file on the leader may make the replication fail.
when arangodump is used to create a database dump, it will create a snapshot of data on the server, and the WAL files for that snapshot will be kept around until the snapshot isn't needed anymore (i.e. arangodump finishes).
the first 180 seconds after server start, all WAL files are intentionally kept, for forensic reasons, and to allow followers to replay events from a leader's WAL when it is restarted. The value of 180 seconds can be changed by adjusting the startup option --rocksdb.wal-file-timeout-initial.
there can be some background processing of changes that may refer to data from WAL files. For example, each insert into a collection will need to increase the collection's count() value by 1. To save an extra write into RocksDB on each insert, the count() value is only written to the storage engine by a background thread, ideally only once every X insert operations. However, this may lead to WAL files being around for a bit longer, especially if the background thread cannot keep up with the insert workload.
There is the startup option --rocksdb.wal-archive-size-limit to put a hard limit on the cumulative size of the WAL files in the archive. From your question, it appears that you are currently using ArangoDB version 3.10.
From the warning message you posted, it seems that the WAL archive cleanup somehow applies the wrong limit values.
It turns out that there has been a recent bugfix, released in ArangoDB version 3.10.1, 3.9.4, and 3.8.8, that should rectify this behavior. So upgrading to one of these or later versions may actually help when using the WAL archive size limit.
I shared your question in the Speedb hive on Discord, and here is what we got for you:
"By default, ArangoDB set the max_wal_size to 1G the value of rocksdb.wal-archive-size-limit must be set to at least twice this number (otherwise you may end up with a single WAL file and the delete will fail)."
Hope this helps. If it doesn't, or if you have follow-up questions, please join the Speedb Discord and we will be happy to help.

SQL Server MERGE on a large table with small log file

I am running a MERGE statement on a large table (5M rows) with a small log file (2 GB). I am getting this error:
Merge for MyTable failed: The transaction log for database 'MyDb' is full due to 'ACTIVE_TRANSACTION'.
Could this be solved by something other than extending the log file? I can't really afford to extend the log file right now.
If you have a fixed log file size, you have essentially two options:
Temporarily change the recovery mode of your database from FULL to BULK-LOGGED. You'll lose the ability to do point-in-time recovery during this period, but it allows you to quickly do the operation and then go back. There are other caveats, so you need to do some research to make sure this is what you want to do.
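A minimal sketch of that option, assuming the 'MyDb' database from your error message:
ALTER DATABASE MyDb SET RECOVERY BULK_LOGGED;
-- run the MERGE here
ALTER DATABASE MyDb SET RECOVERY FULL;
-- then take a log backup; the backup covering the bulk-logged window
-- cannot be used for point-in-time restore within that window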
Instead of changing the transaction log, you can adopt a batching approach that commits small batches of changes at a time, so the log can be truncated or backed up as needed between batches.
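A sketch of the batching approach, assuming an integer key and a separate source/staging table (all names and the batch size are illustrative):
DECLARE @batchSize INT = 50000;
DECLARE @minId BIGINT = 0, @maxId BIGINT;
SELECT @maxId = MAX(Id) FROM dbo.SourceTable;
WHILE @minId <= @maxId
BEGIN
    -- merge one key range at a time so each transaction stays small
    MERGE dbo.MyTable AS t
    USING (SELECT Id, Value FROM dbo.SourceTable
           WHERE Id > @minId AND Id <= @minId + @batchSize) AS s
        ON t.Id = s.Id
    WHEN MATCHED THEN UPDATE SET t.Value = s.Value
    WHEN NOT MATCHED THEN INSERT (Id, Value) VALUES (s.Id, s.Value);
    SET @minId = @minId + @batchSize;
END
Each MERGE here is its own transaction, so the log space can be reused between batches; under FULL recovery you would still need log backups running between batches.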

one log4net log file for many applications

I have two scheduled tasks which write to the same log4net log file.
When one task is running, it writes to the log file successfully. However, when both are running, the first to start writes to the log file and the second does not.
Do you have to have one log file per app? I have read the documentation but can't find an answer.
If you really want to write to the same file from two processes, it is possible to use a different locking model than the default to allow this (a configuration sketch follows the list). Here are the existing models in log4net:
ExclusiveLock: locks the file exclusively, for one process only
MinimalLock: locks the file during the least amount of time possible, making changing/deleting the file during logging possible
InterProcessLock: allows synchronization between processes
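For example, the locking model is set on the appender in the XML configuration; a minimal sketch (appender name, file path, and layout are illustrative):
<appender name="SharedFileAppender" type="log4net.Appender.RollingFileAppender">
  <file value="shared.log" />
  <appendToFile value="true" />
  <!-- release the lock between writes so another process can log to the same file -->
  <lockingModel type="log4net.Appender.FileAppender+MinimalLock" />
  <layout type="log4net.PatternLayout">
    <conversionPattern value="%date [%thread] %-5level %logger - %message%newline" />
  </layout>
</appender>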
So it is definitely possible to have multiple processes write to the same file without losing info. However, as COLD TOLD said and as the log4net documentation recommends:
All locking strategies have issues and you should seriously consider
using a different strategy that avoids having multiple processes
logging to the same file.
Yes, that is the best way to go, unless you only run one app and expect the other not to run. While one app is using the file, the other may not be able to write to it, so you either need to coordinate which application accesses each log, or create separate log files. I personally would use a database in this situation.

atomic inserts in BigQuery

When I load more than one CSV file, how does BigQuery handle the errors?
bq load --max_bad_records=30 dbname.finalsep20xyz gs://sep20new/abc.csv.gz,gs://sep20new/xyzcsv.gz
There are a few files in the batch job that may fail to load, since the number of expected columns will not match. I still want to load the rest of the files. If the file abc.csv fails, will the xyz.csv file still be loaded?
Or will the entire job fail and no record will be inserted?
I tried with dummy records but could not conclusively find how the errors in multiple files are handled.
Loads are atomic -- either all files commit or no files do. You can break the loads up into multiple jobs if you want them to complete independently. An alternative would be to set max_bad_records to something much higher.
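For example, splitting your command into one job per file (names taken from your question; the higher max_bad_records value is illustrative) lets each file succeed or fail independently:
bq load --max_bad_records=100 dbname.finalsep20xyz gs://sep20new/abc.csv.gz
bq load --max_bad_records=100 dbname.finalsep20xyz gs://sep20new/xyzcsv.gz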
We would still prefer that you launch fewer jobs with more files, since we have more flexibility in how we handle the imports. That said, recent changes to load quotas mean that you can submit more simultaneous load jobs, and still higher quotas are planned soon.
Also please note that all BigQuery actions that modify BQ state (load, copy, query with a destination table) are atomic; the only job type that isn't atomic is extract, since there is a chance that it might fail after having written out some of the exported data.

How to temporarily disable the log in SQL2000/2005?

Is there a way to stop the log file from growing (or at least from growing as much) in SQL2000/2005?
I am running a very extensive process with loads of inserts and the log is going through the roof.
EDIT: please note I am talking about a batch-import process, not about everyday updates of live data.
You can't disable the log, but you could perform your inserts in batches and backup/truncate the log in between batches.
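For example, between batches you can back up the log so its space can be reused (the database name and backup path are illustrative):
BACKUP LOG MyImportDb TO DISK = 'D:\Backups\MyImportDb_log.trn';
-- or, on SQL 2000/2005 only, discard the log contents (this breaks the log backup chain):
-- BACKUP LOG MyImportDb WITH TRUNCATE_ONLY;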
If the data originates from outside your database you could also consider using BCP.
Remember that setting the recovery model to SIMPLE only allows you to recover the database to the point of your most recent backup. Any changes committed after that backup was created cannot be recovered.
Changing the recovery model will also change the log chain, so your old log backups will be of no use if you need to restore.
If you normally need full recovery, you'll want to increase your log backup frequency during the load process. This can be done by changing the job schedule for the log backup via the sp_update_jobschedule procedure in the msdb database, both before and after the load process.
Your batch may make too much use of temporary tables.
You can turn 'autogrowth' off when creating a database.
You can change this setting separately for the database and/or the log file.
(Screenshot: Change Autogrowth setting in SQL Server - http://www.server-management.co.uk/images/library/c1652bc7-.jpg)
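The same setting can also be changed from T-SQL; a sketch, assuming hypothetical database and logical log file names:
ALTER DATABASE MyImportDb
MODIFY FILE (NAME = MyImportDb_log, FILEGROWTH = 0);  -- FILEGROWTH = 0 disables autogrowth for that file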
Changing the recovery model to SIMPLE keeps the log from growing as much.
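A sketch of that approach around a batch import, assuming you switch back and restart your backups afterwards (the database name is illustrative):
ALTER DATABASE MyImportDb SET RECOVERY SIMPLE;
-- run the batch import; the log is truncated at each checkpoint
ALTER DATABASE MyImportDb SET RECOVERY FULL;
-- take a full (or differential) backup now so log backups can resume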
What's people's opinion of this solution?