How to recover from a JVM subprocess running out of memory?

I have two JVM processes, A and B. Process A communicates with the user and uses B as a slave to do the heavy computation: User -> A -> B.compute.
However, the method B.compute can run out of memory for certain inputs (it is impossible to know which in advance). In such a case I want to inform the user that the input data is not appropriate, and I want to restart B.
I found the following (not very detailed) solutions on Google:
catch the error in B: catch (OutOfMemoryError e)
use the JVM option -XX:OnOutOfMemoryError=restart-command
manually restart B from A
Which method is the most appropriate to use?
Please show a minimal (OS-agnostic) working example.
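To make the setup concrete, here is a rough sketch (purely illustrative, not working production code) of how A might launch B as a child JVM with ProcessBuilder and bring it back up after a crash; the class and jar names are placeholders:

import java.io.IOException;

public class WorkerLauncher {

    private Process worker;

    // Start B in its own JVM with its own heap limit, so an OutOfMemoryError
    // in B cannot take A down with it.
    public void start() throws IOException {
        worker = new ProcessBuilder("java", "-Xmx512m", "-cp", "worker.jar", "WorkerMain")
                .inheritIO()
                .start();
    }

    // Called by A whenever it notices B is gone (e.g. after B died with an OOM).
    public void restartIfDead() throws IOException {
        if (worker == null || !worker.isAlive()) {
            start();
        }
    }
}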

Letting the JVM terminate abruptly is never a good design for any kind of application.
If you know that there are situations that will cause this, I would design process B to monitor its own memory usage and then terminate processing of data if it is going to run out of memory.
You can do this as simply as:
Runtime rt = Runtime.getRuntime();
// heap currently in use, in megabytes
long usedMem = (rt.totalMemory() - rt.freeMemory()) / 1024 / 1024;
You could set a threshold on used (or free) memory at which your B process stops processing the input, throws away any partial results and informs A that the input is erroneous. The garbage collector will then reclaim the unused memory and B will be ready for more input (if you really have to, you could call System.gc() explicitly to request a collection, but I wouldn't recommend it).
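A minimal sketch of that idea, assuming the computation can be split into chunks and that checking once per chunk is frequent enough; the 90% threshold and all class/method names here are purely illustrative:

public class BoundedComputation {

    // Abort the current input when the heap is more than ~90% full (illustrative threshold).
    private static final double MAX_HEAP_FRACTION = 0.90;

    private static void checkMemory() {
        Runtime rt = Runtime.getRuntime();
        long used = rt.totalMemory() - rt.freeMemory();
        if ((double) used / rt.maxMemory() > MAX_HEAP_FRACTION) {
            // Drop references to partial results before reporting back to A.
            throw new IllegalStateException("input too large, aborting");
        }
    }

    public static void compute(byte[][] chunks) {
        for (byte[] chunk : chunks) {
            checkMemory();   // bail out before the next allocation pushes us over the edge
            process(chunk);
        }
    }

    private static void process(byte[] chunk) {
        // ... the actual heavy computation ...
    }
}

A would then treat the reported failure as "input not appropriate" and leave B running, so no restart is needed.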

Related

Vulkan command buffer synchronization for the case of updating the command buffer

Suppose we have 3 command buffers A, B and C. All enable VK_COMMAND_BUFFER_USAGE_SIMULTANEOUS_USE_BIT when they are created. The dependencies are as follows:
B -> A
C -> B, the next image in the swapchain
The synchronization between them is done using semaphores. Most of the time, I can pre-create A, B and C and then just submit them one after another to the rendering queue in the rendering loop. At some point, I want to modify command buffer A. However, the problem is that several submissions of A are already in the rendering queue. At that point the rendering queue may look like
A B C A B
I cannot modify command buffer A because it is being executed or is queued on the GPU. The most naive way is to call vkQueueWaitIdle on the CPU side to wait for all command buffers to be done. Then I can modify A and continue with my rendering sequence. The problem with this method is that it waits for all command buffers to be done, while in my opinion I only need to wait for the pending submissions of A. Is it possible to do that? Is there a better way to modify A without calling vkQueueWaitIdle?
Don't modify the command buffer. Create a new one and record into that. It really shouldn't matter whether it's the command buffer object A or some alternative command buffer object A'. What matters are the commands you record into it.
In any case, the typical way to know when an operation is finished with a command buffer (or some set thereof) is to use a fence at queue submission time. Fences are particularly coarse grained, but you can query information about their status from the CPU.
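A very rough sketch of that bookkeeping in Java, with CompletableFuture standing in for a fence (this is not Vulkan code, just an illustration that you only track and wait on the pending submissions of A, not on every command buffer):

import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.CompletableFuture;

public class PerSubmissionTracking {

    // One completion handle ("fence") per submission of A that is still in flight.
    private final Queue<CompletableFuture<Void>> pendingA = new ArrayDeque<>();

    void submitA(CompletableFuture<Void> fenceLikeHandle) {
        pendingA.add(fenceLikeHandle);
    }

    // Before re-recording A, wait only on A's handles; B and C keep going untouched.
    void waitBeforeRewritingA() {
        CompletableFuture<Void> handle;
        while ((handle = pendingA.poll()) != null) {
            handle.join();
        }
    }
}

With real fences, the join() loop becomes a single vkWaitForFences call over just those fences.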

Can a process terminate after I/O without returning to the CPU?

I have a question about the following diagram from Operating Systems Concepts: http://unboltingbinary.in/wp-content/uploads/2015/04/image028.jpg
This diagram seems to imply that after every I/O operation, the process is placed back on the ready queue before being sent to the CPU again. However, is it possible for a process to terminate after I/O but before being sent to the ready queue?
Suppose we have a program that computes a number and then writes it to storage. In this case, does the process really need to return to the CPU after the I/O operation? It seems to me that the process should be allowed to terminate right after I/O. That way, there would be no need for a context switch.
Once one process has successfully executed a termination request on another, the threads of the terminated process should never run again, no matter what state they were in - blocked on I/O, blocked on inter-thread comms, running on a core, sleeping, whatever - they must all be stopped immediately if running and all be put in a state where they will never run again.
Anything else would be a security issue - terminated threads should not be given execution at all (else it may not be possible to terminate the process).
Process termination requires the CPU. Changes to kernel-mode structures on process exit, returning memory resources, etc. all require the CPU.
A process does not simply evaporate. The term you want here is process rundown, I think.

Operating Systems - General Process Creation

Review Question
Consider the Program
#include <stdio.h>
#include <stdlib.h>   /* for exit() */

int main(void) {
    putchar('X');
    exit(0);
}
Suppose it is compiled and an a.out file is generated. Now suppose that a user in a local console window types a.out and hits the return key. What happens? Be sure to describe a plausible but detailed and comprehensive sequence of operating system actions and events, not just what the user sees.
My answer
First, the shell will create a process in User Space
Then it will perform the system call 'putchar', which simulates input, and the process will switch to kernel mode
It will then add the process (thread) to the long term scheduler where it will join the set of all processes that are ready to run
Once it is selected, it will move to the short term scheduler, where it will receive some processing time (ready -> running)
Since this process is an IO bound process, it will then head to the IO queue, where it will be stored in a buffer where it awaits execution (running -> waiting)
Once the IO is complete, the putchar call will print the X on the peripheral for which it is applied (the monitor) (waiting -> running)
Once the process returns to the short term scheduler it will again receive more processing time. Since there is nothing left to do but terminate, the process terminates (running -> terminated)
Is this a valid understanding? Am I missing some critical concepts of process creation? I know it is a relatively simple process, but please advise on anything I am missing.
Thanks for reading, and thanks in advance for assistance.
First, the shell will create a process in User Space
// A lot of things happen before this!
//The program will be loaded by the loader.
//VM areas will be created for this process.
//Linking against the library files will be done.
//Then a series of page faults will occur to bring your file into physical and virtual memory.
Then it will perform the system call 'putchar', which simulates input, and the process will switch to kernel mode
//putchar is not a system call at all!
//putchar will call its library implementation, which will in turn make a write() system call, and your program will get trapped inside the kernel.
It will then add the process (thread) to the long term scheduler where it will join the set of all processes that are ready to run
//Totally depends upon the scheduling algorithm; it is even possible that your process will be the first to run!
Once it is selected, it will move to the short term scheduler, where it will receive some processing time (ready -> running)
//Right, waiting on RunQ
Since this process is an IO bound process, it will then head to the IO queue, where it will be stored in a buffer where it awaits execution (running -> waiting)
//Sort of; it will be waiting on the I/O queue, waiting for an interrupt, to write to the output device.
Once the IO is complete, the putchar call will print the X on the peripheral for which it is applied (the monitor) (waiting -> running)
//Correct
Once the process returns to the short term scheduler it will again receive more processing time. Since there is nothing left to do but terminate, the process terminates (running -> terminated)
//Before this it will again get trapped inside the kernel when your program executes the exit(0) call (or returns from main()).
//Control then goes back to the startup routine that was responsible for calling the main() function.
//The startup routine returns the status 0 to the operating system, and the OS will kill this process and move it to the terminated state.
I still don't think this is a complete picture, as hundreds of machine instructions will be executed for this program and it is difficult to pinpoint each and every one.
But if you still have some doubt, post a comment!
Hope this helps!

process states - new state & ready state

As the Operating System Concepts book illustrates in its section "Process States":
A process has defined states: new, ready, running, waiting and terminated.
I have a conflict between the new and ready states. I know that in the ready state the process is allocated in memory and all resources needed at creation time have been allocated, and it is only waiting for CPU time (scheduling).
But what is the new state? What is the stage before the process is allocated in memory?
Not all the tasks submitted to the OS can be allocated memory immediately, so they have to remain in the new state. The decision as to when they move to the ready state is taken by the long-term scheduler. More info about the long-term scheduler is here: http://en.wikipedia.org/wiki/Scheduling_(computing)#Long-term_scheduling
To be more precise, the new state is for those processes which are just being created. They haven't been created fully yet and are still in their growing stage.
The ready state, on the other hand, means that the process, whose information is stored in the PCB (Process Control Block), has got all the resources it required for execution, but the CPU is not yet running that process's instructions.
I'll give you a simple example:
Say you have two processes. Process A is syncing your data to cloud storage and Process B is printing other data.
While Process B is still being created and entered into a PCB, Process A has already been created but is not getting the chance to run because the CPU hasn't reached its instructions yet. Process B, meanwhile, still needs the printer to be found, other drivers to be checked and the pages to be printed to be verified.
So here Process A has been created and is waiting for CPU time - hence it is in the ready state. Process B is waiting for the printer to be initialised and the files to be examined for printing - hence it is in the new state (meaning it hasn't been successfully added to a PCB yet).
One more thing to guide you: for each process there is a Process Control Block, PCB, which stores the process-specific information.
I hope this clears your doubt. Feel free to comment on whatever you don't understand.

Batch printing exception

I get this error while printing multiple .xps documents to a physical printer
Dim defaultPrintQueue As PrintQueue = GetForwardPrintQueue(My.Settings.SelectedPrinter)
Dim xpsPrintJob As PrintSystemJobInfo
xpsPrintJob = defaultPrintQueue.AddJob(JobName, Document, False)
Documents are spooled successfully until a print job exception occurs.
The InnerException is Insufficient memory to continue the execution of the program.
The source is PresentationCore.dll
Where should I start searching?
When attempting to perform tasks that may fail due to temporary or permanent restrictions on some resource, I tend to use a back-off strategy. This strategy has been followed on things as diverse as message queuing and socket opens.
The general process for such a strategy is as follows.
set maxdelay to 16    # maximum time period between attempts
set maxtries to 10    # maximum attempts
set delay to 0
set tries to 0

while more actions needed:
    if delay is not 0:
        sleep delay
    attempt action
    if action failed:
        add 1 to tries
        if tries is greater than maxtries:
            exit with permanent error
        if delay is 0:
            set delay to 1
        else:
            double delay
        if delay is greater than maxdelay:
            set delay to maxdelay
    else:
        set delay to 0
        set tries to 0
This allows the process to run at full speed in the vast majority of cases but backs off when errors start occurring, hopefully giving the resource provider time to recover. The gradual increase in delays allows for more serious resource restrictions to recover and the maximum tries catches what you would term permanent errors (or errors that are taking too long to recover).
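For what it's worth, here is the same strategy sketched in Java for a single retried action (the helper class and its names are illustrative, not tied to any printing API):

import java.util.concurrent.Callable;

public final class Backoff {

    // Retry an action with exponential back-off, mirroring the pseudocode above.
    public static <T> T run(Callable<T> action, int maxTries, long maxDelaySeconds) throws Exception {
        long delay = 0;
        int tries = 0;
        while (true) {
            if (delay != 0) {
                Thread.sleep(delay * 1000);
            }
            try {
                return action.call();                 // success: carry on at full speed
            } catch (Exception e) {
                tries++;
                if (tries > maxTries) {
                    throw e;                          // permanent error: give up
                }
                delay = (delay == 0) ? 1 : delay * 2; // back off: 1, 2, 4, 8, ... seconds
                if (delay > maxDelaySeconds) {
                    delay = maxDelaySeconds;
                }
            }
        }
    }
}

In the printing scenario, the retried action would be the call that adds the job to the queue.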
I actually prefer this try-it-and-catch-failure approach to the check-if-okay-then-try one since the latter can still often fail if something changes between the check and the try. This is called the "better to seek forgiveness than ask permission" method, which also works quite well with bosses most of the time, and wives a little less often :-)
One particularly useful case was a program which opened a separate TCP session for each short-lived transaction. On older hardware, the closed sockets (those in TCP WAIT state) eventually disappeared before they were needed again.
But, as the hardware got faster, we found that we could open sessions and do work much quicker and Windows was running out of TCP handles (even when increased to the max).
Rather than having to re-engineer the communications protocol to maintain sessions, this strategy was implemented to allow graceful recovery in the event handles were starved.
Granted it's a bit of a kludge but this was legacy software approaching end-of-life, where bug fixes are often just enough to get it working and it wasn't deemed strategic enough to warrant spending a lot of money in fixing it properly.
Update: It may be that there's a (more permanent) problem with PresentationCore. This KB article states that there's a memory leak in WPF within .NET 3.5SP1 (of which your print driver may be a client).
If the backoff strategy doesn't fix your problem (it may not if it's a leak in a long lived process), you might want to try applying the hotfix. Me, I'd replicate the problem in a virtual machine and then patch that to test it (but I'm an extreme paranoid).
It was found by googling PresentationCore Insufficient memory to continue the execution of the program and checking the first link here. Search for the string "hotfix that relates to this issue" on that page.
Before adding a new job to the queue you should check the queue state. There is more info on the PrintQueue.IsOutOfMemory property and the related properties that can be queried to verify that the queue is not in an error state.
Of course, pax's hint to use a defensive strategy when accessing resources like printers is best practice. For starters, you may want to put the line that adds the job into a Try block.
You might want to consider launching a new process to handle the printing of each document; the overhead should be low compared to the effort of printing the documents.