Limiting logfile size and archiving - express

I'm using an Express app and I need to save whatever logs it produces. Any logging middleware will work (winston, simple-node-logger, etc.), but there are strict requirements: a log file should not exceed 50 MB. When it reaches this size it should be zipped and stored as history data. Only 20 log zips may exist at a time. Simply logging to a file and limiting its size is easy enough by just setting up the winston config. But how do I set up size monitoring and the zipping feature AND limit the number of history logs? All of this has to work simultaneously while the Express app is running. Thanks!
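For reference, this is roughly how such a setup is commonly wired together with winston plus the winston-daily-rotate-file transport (a minimal sketch, not a drop-in answer; note that the transport gzips rotated files rather than producing .zip archives, and the option values simply mirror the requirements above):

    const express = require('express');
    const winston = require('winston');
    require('winston-daily-rotate-file'); // registers winston.transports.DailyRotateFile

    // Rotate at ~50 MB, compress each rotated file, keep at most 20 archives.
    const rotateTransport = new winston.transports.DailyRotateFile({
      dirname: 'logs',
      filename: 'app-%DATE%.log',
      maxSize: '50m',      // rotate once the current file reaches 50 MB
      zippedArchive: true, // gzip the rotated file
      maxFiles: 20,        // keep at most 20 rotated files; older ones are deleted
    });

    const logger = winston.createLogger({ transports: [rotateTransport] });

    const app = express();
    app.use((req, res, next) => {
      logger.info(`${req.method} ${req.url}`); // simple request-logging middleware
      next();
    });
    app.listen(3000);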

Related

WAL log files fill up quickly - how to prevent this?

Currently the logs in the folder “/engine-rocksdb/journals” are filling up (WAL logs).
When does ArangoDB do a cleaning run of these logs and delete them automatically and how to trigger this cleaning run earlier? My ArangoDB 3.10 runs in single mode and in a virtual environment (cloud with a network storage).
The log files are growing very fast for me because there are many writes to the DB. What is the best way to handle this, any ideas?
What I have done so far:
If I set the value “rocksdb.wal-archive-size-limit” it does delete the logs when the set limit is reached, but it shows errors in the logfile:
2022-09-27T17:53:04Z [898948] WARNING [d9793] {engines} forcing removal of RocksDB WAL file '/archive/813371.log' with start sequence 5387062892 because of overflowing archive. configured maximum archive size is 1073741824, actual archive size is: 75401520
However, I still don't understand the meaning of the logfile output: "configured maximum archive size is 1073741824, actual archive size is: 75401520". The "actual archive size" is smaller?
But what are the consequences of lowering the "wal-archive-size-limit" value? Is it possible to switch off the WAL archive completely? What exactly is it for? As I understand it, ArangoDB needs it for transaction security (i.e. in case of power loss), right?
In general, yes, this is a good thing, but how can I get ArangoDB to a) limit this WAL archive (without error messages) and b) do a cleaning run faster?
thx :-)
When does ArangoDB do a cleaning run of these logs and delete them automatically and how to trigger this cleaning run earlier?
ArangoDB uses RocksDB underneath, and RocksDB will move WAL files (.log files) into its archive as soon as possible. In order to do so, all data from the WAL file needs to have been safely stored in the column families' .sst files and flushed to disk.
ArangoDB will delete files from the WAL archive (and only from there) once it can assure that an archived WAL file is not used anymore. It will not remove files from the archive that are or may be in current use.
There are a few reasons why ArangoDB may keep archived WAL files for some time:
when server-to-server replication is used: while a follower replicates data, it may read from the leader's WAL. Deleting the WAL file on the leader may make the replication fail
when arangodump is used to create a database dump, it will create a snapshot of data on the server, and the WAL files for that snapshot will be kept around until the snapshot isn't needed anymore (i.e. arangodump finishes).
the first 180 seconds after server start, all WAL files are intentionally kept, for forensic reasons, and to allow followers to replay events from a leader's WAL when it is restarted. The value of 180 seconds can be changed by adjusting the startup option --rocksdb.wal-file-timeout-initial.
there can be some background processing of changes that may refer to data from WAL files. For example, each insert into a collection will need to increase the collection's count() value by 1. To save an extra write into RocksDB on each insert, the count() value is only written to the storage engine by a background thread, ideally only once every X insert operations. However, this may lead to WAL files being around for a bit longer, especially if the background thread cannot keep up with the insert workload.
There is the startup option --rocksdb.wal-archive-size-limit to put a hard limit on the cumulated size of the WAL files in the archive. From your question, it appears that you are currently using ArangoDB version 3.10.
From the warning message you posted, it seems that the WAL archive cleanup somehow applies the wrong limit values.
It turns out that there has been a recent bugfix, released in ArangoDB version 3.10.1, 3.9.4, and 3.8.8, that should rectify this behavior. So upgrading to one of these or later versions may actually help when using the WAL archive size limit.
I shared your question in the Speedb hive on Discord, and here is what we got for you:
"By default, ArangoDB set the max_wal_size to 1G the value of rocksdb.wal-archive-size-limit must be set to at least twice this number (otherwise you may end up with a single WAL file and the delete will fail)."
Hope this helps; if it doesn't, or you have follow-up questions, please join the Speedb Discord and we will be happy to help.
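Putting the quoted advice together, the startup options already mentioned above could be set along these lines (a sketch only; the archive size value is an example following the "at least twice max_wal_size" rule, not a recommendation):

    # Sketch: example arangod startup options; values are illustrative.
    # 2147483648 bytes = 2 GiB, i.e. twice the default max_wal_size of 1 GiB.
    arangod \
      --rocksdb.wal-archive-size-limit 2147483648 \
      --rocksdb.wal-file-timeout-initial 180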

How to resolve this error in Google Data Fusion: "Stage x contains a task of very large size (2803 KB). The maximum recommended task size is 100 KB."

I need to move data from a parameterized S3 bucket into Google Cloud Storage. Basic data dump. I don't own the S3 bucket. It has the following syntax:
s3://data-partner-bucket/mykey/folder/date=2020-10-01/hour=0
I was able to transfer data at the hourly granularity using the Amazon S3 Client provided by Data Fusion. I wanted to bring over a day's worth of data, so I reset the path in the client to:
s3://data-partner-bucket/mykey/folder/date=2020-10-01
It seemed like it was working until it stopped. The status is "Stopped." When I review the logs just before it stopped I see a warning, "Stage 0 contains a task of very large size (2803 KB). The maximum recommended task size is 100 KB."
I examined the data in the S3 bucket. Each folder contains a series of log files. None of them are "big". The largest folder contains a total of 3MB of data.
I saw a similar question for this error, but the answer involved Spark coding that I don't have access to in Data Fusion.
Screenshot of Advanced Settings in Amazon S3 Client
These are the settings I see in the client. Maybe there is another setting somewhere I need to set? What do I need to do so that Data Fusion can import these files from S3 to GCS?
When you deploy the pipeline you are redirected to a new page with a Ribbon at the top. One of the tools in the Ribbon is Configure.
In the Resources section of the Configure modal you can specify the memory resources. I fiddled around with the numbers; 1000 MB worked, 6 MB was not enough (for me).
I processed 756K records in about 46 min.

Speed Up Hot Folder Data Upload

I have a CSV file of around 188 MB. When I try to upload data using the hot folder technique it takes too much time, 10-12 hrs. How can I speed up the data upload?
Thanks
The default value of impex.import.workers is 1. Try changing this value. I also recommend running a performance test with a smaller file than 188 MB first (just to get quick results).
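For example, the property would typically go into the platform's local.properties (a sketch; the file location is the usual place for such properties, and the value of 8 is purely illustrative, see the core-count guidance below):

    # local.properties (sketch; value is an example)
    impex.import.workers=8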
Adjust the number of ImpEx threads on the backoffice server to speed up ImpEx file processing. It is recommended that you start with a value equal to the number of cores available on a backoffice node. You should not adjust it any higher than 2 * the number of cores, and only then if the ImpEx processes will be the only thing running on the node. The actual value may be somewhere in between and can only be determined by testing and analyzing the other processes, jobs, and apps running on your server, to ensure you are not maxing out the CPU.
NOTE: this value could be higher for lower environments since Hybris will likely be the only process running.
Taken from Tuning Parameters - Hybris Wiki

Storage limit for indexeddb on IE10

We are building a web app that stores lots of files as blobs with IndexedDB. If the user uses our app at its maximum, we could store as much as 15 GB of files in IndexedDB.
We ran into a problem with IE10, that I strongly suspect is a quota issue.
After having successfully saved some files, a new call to store.put(data, key); never completes.
Basically, the function is called, but neither a success event nor an error event ever fires.
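For context, the call is wired up in the standard IndexedDB way, roughly like this (simplified sketch; the object store and variable names are illustrative):

    // Simplified sketch of how we store a file blob; names are illustrative.
    var tx = db.transaction('files', 'readwrite');
    var store = tx.objectStore('files');
    var request = store.put(data, key);

    request.onsuccess = function () { console.log('stored', key); };      // stops firing after ~250 MB
    request.onerror = function (e) { console.error('put failed', e); };   // never fires either
    tx.onabort = function () { console.warn('transaction aborted'); };    // not fired
    tx.onerror = function (e) { console.error('transaction error', e); }; // not fired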
If I look into the IndexedDB folder of IE 10, I see a handful of what look like temporary files (of 512 kB each) getting created and removed indefinitely.
When looking at the "Cache and Database" parameters window, I see that my site's database has reached 250 MB.
Looking further, I found this blog entry http://msdnrss.thecoderblogs.com/2012/12/using-html5javascript-in-windows-store-apps-data-access-and-storage-mechanism-ii/ which incidentally says that the storage limit for Windows Store apps is 250 MB.
I am not using any Windows Store mechanism, but I figured I could be a victim of the same arbitrary limit.
So, my questions are:
Is there any way to bypass this limit? The user is asked for permission to exceed a 10 MB limit, but I saw no prompt shown to the user when the 250 MB limit was reached.
Is there any other way to store more than 250 MB of data with IE10?
Thanks, I'll take any clues.
I'm afraid you can't. Providing the storage limit and asking the user to allow more space is the responsibility of the browser vendor, so I don't think the first option is applicable.
I know the user can allow a website to exceed a given limit (Internet Options > General > Browsing history > Settings > Caches and databases), but I don't know if that will overrule the 250 MB. It may be that this is a hardcoded limit you can't exceed.
This limit is bound to a domain, meaning you can't solve it by creating multiple databases. The only solution would be to store on multiple domains, but in that case you can't cross-access them. Also, as far as I can see, the 250 MB limit applies to the IndexedDB API and the File API combined.

HSQLDB - how will data persist when the process crashes?

I want data to be committed to the HSQLDB data file so that it survives a process crash, without writing to the log file. I set the command "SET FILES WRITE DELAY FALSE", but it needs the log file enabled to persist the data. As logging creates a performance impact, I don't want it. So is there any way to achieve this objective?
What is the impact of logging on the application?
Thanks in Advance,
Anil
HSQLDB writes data to the .log file by default. This .log file is used for recovery in case of a crash.
When SET FILES WRITE DELAY FALSE is set, the engine performs a file sync on the .log file at each commit. The reason for the performance impact is the file sync operation. This setting is useful for tests, when many changes are written quickly and the process always exits. It is not necessary for many applications, as by default a sync is performed on the .log file every 0.5 seconds if new records have been written. This can be reduced to 100 milliseconds without a big impact on performance.
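For reference, the write-delay variants discussed above look like this (a sketch following the HSQLDB SET FILES WRITE DELAY syntax; the values echo the defaults mentioned in this answer):

    -- Sketch of the write-delay settings discussed above.
    SET FILES WRITE DELAY TRUE;        -- default behaviour: sync the .log file about every 0.5 seconds
    SET FILES WRITE DELAY 100 MILLIS;  -- tighter window, still little performance impact
    SET FILES WRITE DELAY FALSE;       -- sync at every commit (the setting from the question)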
You can turn logging off with the SET FILES LOG FALSE statement and perform mass inserts or updates, but this is not recommended as a default setting.
At any time, including when logging has been turned off, you can perform an explicit CHECKPOINT to commit all the data to the .data and .script files.
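As a rough illustration of the last two points, a bulk load without the .log file could look like this (a sketch; only the statement names are taken from the answer above, the surrounding flow is an example):

    -- Sketch: bulk load with the .log file disabled, then force data to disk.
    SET FILES LOG FALSE;   -- stop writing to the .log file (no crash recovery during this window)
    -- ... perform the mass INSERT / UPDATE statements here ...
    CHECKPOINT;            -- commits all data to the .data and .script files
    SET FILES LOG TRUE;    -- re-enable the .log file for normal operation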