How to terminate a process with a Lucene NRT Reader/Writer gracefully? - lucene

We are using Lucene's near-real-time search feature for full-text search in our application. Since commits are costly, we commit to the index after, say, every 10 documents added (we expect around 150 to 200 documents per hour for indexing). Now, if I want to terminate my process, how do I make sure that all documents in memory are committed to disk before the process is killed? Is there any recommended approach here? Or is my document volume too low to worry about, and should I just commit on every addition?
Should I track all uncommitted documents? And if the process gets killed before they are committed to disk, should I index the uncommitted ones again when the process starts up?
Lucene NRT is used in a process that runs embedded Jetty. Is it the right approach to send a shutdown command (invoke some servlet) to Jetty, wait until all documents are committed, and then terminate using System.exit()?

You could add a hook that commits all the buffered documents in the destroy method of your servlet, and make sure that the embedded servlet container is shut down before System.exit is called (for example by adding a shutdown hook to the JVM).
But this is still not perfect: if your process gets killed, all the buffered data will be lost. Another solution is to use soft commits. Soft commits are cheap commits (no fsync is performed), so no data will be lost if your process gets killed (but data could still be lost if the server shuts down unexpectedly).
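The shutdown-hook option plus the commit-every-N pattern from the question can be sketched as follows. This is a minimal sketch, not the actual Lucene API: the IndexWriter calls are abstracted behind a Runnable (commitAction), so everything Lucene-specific is an assumption noted in the comments.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of "commit every N documents" with a JVM shutdown hook.
// The Lucene-specific calls are abstracted behind "commitAction"
// (e.g. () -> indexWriter.commit()), so this compiles without Lucene.
class BufferedIndexer {
    private final Runnable commitAction;
    private final int batchSize;
    private final AtomicInteger pending = new AtomicInteger();

    BufferedIndexer(Runnable commitAction, int batchSize) {
        this.commitAction = commitAction;
        this.batchSize = batchSize;
        // Flush buffered documents on normal JVM shutdown. Note: this
        // does NOT run if the process is killed with SIGKILL.
        Runtime.getRuntime().addShutdownHook(new Thread(this::flush));
    }

    void addDocument(Object doc) {
        // ... indexWriter.addDocument(doc) would go here ...
        if (pending.incrementAndGet() >= batchSize) {
            flush();
        }
    }

    synchronized void flush() {
        if (pending.get() > 0) {
            commitAction.run();   // e.g. indexWriter.commit()
            pending.set(0);
        }
    }
}
```

Calling flush() from the servlet's destroy method as well costs nothing and covers the case where the container shuts the webapp down before the JVM exits.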
To sum up:
- Shutdown hook: best throughput, but data can be lost if the process gets killed.
- Soft commit: no data is lost if the process gets killed, but data may be lost if the server shuts down unexpectedly.
- Hard commit (default): no data loss at all, but slow (an fsync needs to be performed).

Deciphering redis logs

In redis logs, a line related to background saves appears often, e.g.:
[11465] 06 Mar 08:10:11.292 * RDB: 541 MB of memory used by copy-on-write
Can anyone clarify what this line precisely means?
When Redis wants to save a snapshot, it does so by forking itself first; the forked process then saves the dataset, undisturbed by having to serve requests, etc.
Since you have two processes now, it could mean using twice as much RAM, right? But no: operating systems actually optimize this scenario by having the new process reference the memory pages of the old process.
The interesting thing happens when the original server's memory changes after the fork (because you issue update commands, for example). The forked process has to keep whatever memory state it had when it was forked, so before changing a shared page the system copies it for the forked process (so that it is no longer shared) and only then changes the original process's page. This is called "copy-on-write".
In your case, this roughly means that during the time needed to save the snapshot, you changed 541 MB of data.

Can a process terminate after I/O without returning to the CPU?

I have a question about the following diagram from Operating Systems Concepts: http://unboltingbinary.in/wp-content/uploads/2015/04/image028.jpg
This diagram seems to imply that after every I/O operation, the process is placed back on the ready queue before being sent to the CPU again. However, is it possible for a process to terminate after I/O but before being sent to the ready queue?
Suppose we have a program that computes a number and then writes it to storage. In this case, does the process really need to return to the CPU after the I/O operation? It seems to me that the process should be allowed to terminate right after I/O. That way, there would be no need for a context switch.
Once one process has successfully executed a termination request on another, the threads of the terminated process should never run again, no matter what state they were in (blocked on I/O, blocked on inter-thread communication, running on a core, sleeping, whatever): they must all be stopped immediately if running, and all be put in a state where they will never run again.
Anything else would be a security issue: terminated threads should not be given any execution time at all (otherwise it might not even be possible to terminate the process).
Process termination requires the CPU. Changes to kernel-mode structures on process exit, returning memory resources, etc. all require the CPU.
A process does not simply evaporate. The term you want here is process rundown, I think.

Cassandra Commit and Recovery on a Single Node

I am a newbie to Cassandra and have been searching for information about commits and crash recovery in Cassandra on a single node; I'm hoping someone can clarify the details.
I am testing Cassandra, so I set it up on a single node. I am using the DataStax stress tool to insert millions of rows. What happens if there is an electrical failure or system shutdown? Will all the data that was in Cassandra's memory get written to disk upon Cassandra restart (I guess the commit log acts as an intermediary)? How long does this process take?
Thanks!
Cassandra's commit log gives Cassandra durable writes. When you write to Cassandra, the write is appended to the commit log before the write is acknowledged to the client. This means every write that the client receives a successful response for is guaranteed to be written to the commit log. The write is also made to the current memtable, which will eventually be written to disk as an SSTable when large enough. This could be a long time after the write is made.
However, the commit log is not immediately synced to disk, for performance reasons. The default is periodic mode (set by the commitlog_sync parameter in cassandra.yaml) with a period of 10 seconds (set by commitlog_sync_period_in_ms in cassandra.yaml). This means the commit log is synced to disk every 10 seconds, so with this behaviour you could lose up to 10 seconds of writes if the server loses power. If you had multiple nodes in your cluster and used a replication factor greater than one, you would need to lose power to multiple nodes within 10 seconds to lose any data.
If this risk window isn't acceptable, you can use batch mode for the commit log. This mode won't acknowledge writes to the client until the commit log has been synced to disk. The time window is set by commitlog_sync_batch_window_in_ms, default is 50 ms. This will significantly increase your write latency and probably decrease the throughput as well so only use this if the cost of losing a few acknowledged writes is high. It is especially important to store your commit log on a separate drive when using this mode.
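The two modes described above correspond to settings in cassandra.yaml, roughly as follows (defaults shown; check your Cassandra version's shipped cassandra.yaml for the exact names and values):

```yaml
# Default: ack writes immediately, fsync the commit log every 10 s.
commitlog_sync: periodic
commitlog_sync_period_in_ms: 10000

# Alternative: don't ack writes until the commit log is synced.
# commitlog_sync: batch
# commitlog_sync_batch_window_in_ms: 50
```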
In the event that your server loses power, on startup Cassandra replays the commit log to rebuild its memtable. This process will take seconds (possibly minutes) on very write heavy servers.
If you want to ensure that the data in the memtables is written to disk, you can run 'nodetool flush' (this operates per node). This will create a new SSTable and delete the commit log segments referring to data in the flushed memtables.
You are asking something like:
What happens if there is a network failure while data is being loaded into Oracle using SQL*Loader?
Or what happens if Sqoop stops processing due to some condition while transferring data?
Simply put, whatever data was transferred before the electrical failure or system shutdown remains as it is.
As for the second question: whenever the memtable runs out of space, i.e. when the number of keys exceeds a certain limit (128 by default) or when the configured time duration is reached (cluster clock), it is flushed to an SSTable, an immutable store.

How do I back up Redis server RDB and AOF files for recovery to ensure minimal data loss?

Purpose:
I am trying to make backup copies of both dump.rdb every X time and appendonly.aof every Y time, so that if the files get corrupted for whatever reason (or even just the AOF's appendonly.aof file), I can restore my data from the dump.rdb.backup snapshot and then replay whatever has changed since with the most recent copy of appendonly.aof.backup I have.
Situation:
I backup dump.rdb every 5 minutes, and backup appendonly.aof every 1 second.
Problems:
1) Since dump.rdb is written in the background into a temporary file by a child process, what happens to the key changes that occur while the child process is creating the new image? I know the AOF file will keep appending regardless of the background write, but does the new dump.rdb file contain those key changes too?
2) If dump.rdb does NOT contain the key changes, is there some way to figure out the exact point at which the child process was forked? That way I can keep track of the point after which the AOF file holds the most up-to-date information.
Thanks!
Usually, people use either RDB or AOF as a persistence strategy; having both of them is quite expensive. Running a dump every 5 minutes and copying the AOF file every second sounds awfully frequent. Unless the Redis instances only store a tiny amount of data, you will likely kill the I/O subsystem of your box.
Now, regarding your questions:
1) Semantic of the RDB mechanism
The dump mechanism exploits the copy-on-write mechanism implemented by modern OS kernels when they clone/fork processes. When the fork is done, the system creates the background process just by copying the page table; the pages themselves are shared between the two processes. If the Redis process then writes to a page, the OS transparently duplicates it (so that the Redis instance has its own version and the background process keeps the previous one). The background process therefore has the guarantee that its memory structures are kept constant (and therefore consistent).
The consequence is that any write operation started after the fork will not be included in the dump. The dump is a consistent snapshot taken at fork time.
2) Keeping track of the fork point
You can estimate the fork timestamp by running the INFO persistence command and calculating rdb_last_save_time - rdb_last_bgsave_time_sec, but it is not very accurate (one-second resolution only).
To be a bit more accurate (millisecond), you can parse the Redis log file to extract the following lines:
[3813] 11 Apr 10:59:23.132 * Background saving started by pid 3816
You need at least the "notice" log level to see these lines.
As far as I know, there is no way to correlate a specific entry in the AOF with the fork operation of the RDB (i.e. it is not possible to be 100% accurate).
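If you go the log-parsing route, extracting the fork time from such a line can be sketched like this. The log format is assumed from the example above and may vary between Redis versions, so treat the regular expression as an assumption to verify against your own logs.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Extract the timestamp and child pid from a Redis
// "Background saving started by pid NNNN" log line.
class RdbForkLogParser {
    // Assumed format: "[3813] 11 Apr 10:59:23.132 * Background saving started by pid 3816"
    private static final Pattern LINE = Pattern.compile(
        "\\[\\d+\\] (\\d{2} \\w{3} \\d{2}:\\d{2}:\\d{2}\\.\\d{3}) \\* " +
        "Background saving started by pid (\\d+)");

    // Returns {timestamp, pid}, or null if the line doesn't match.
    static String[] parse(String logLine) {
        Matcher m = LINE.matcher(logLine);
        if (!m.find()) return null;
        return new String[] { m.group(1), m.group(2) };
    }
}
```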

TransactedReceiveScope - when does the Transaction Commit?

Scenario:
We have a WCF workflow with a client that does NOT use transaction flow.
The workflow contains several sequential TransactedReceiveScopes (using content-based correlation).
The TransactedReceiveScopes contain custom db operations.
Observations:
When we run SQL profiler against the first call, we see all the custom db calls, and the SaveInstance call in the profile trace.
We've noticed that, even though the SendReply is at the very end of the TransactedReceiveScope, sometimes the SendReply occurs a good 10 seconds before the transaction gets committed.
We tried changing TimeToPersist and TimeToUnload to zero, but that had no effect. (The trace shows the SaveInstance happening immediately anyway; rather, it is the commit that seems to be delayed.)
Questions:
Are our observations correct?
At what point is the transaction committed? Is this like garbage collection, i.e. does it commit some time later when the system is not busy?
Is there any way to control the commit delay, or is the only way to do this to use transaction flow from the client (and then it should all commit when the client commits, including the persist)?
The TransactedReceiveScope commits the transaction when the body completes, but as all execution is done through the workflow scheduler, that could be some time later. It is not related to garbage collection, and there is no real way to influence it other than to avoid a busy machine and a lot of other parallel activities that could also be in the execution queue.