Daemon with Clojure/JVM - jvm

I'd like to have a small (not doing too damn much) daemon running on a little server, watching a directory for new files being added to it (and any directories in the main one), and calling another Clojure program to deal with that new file.
Ideally, each file would be added to a queue (a list represented by a ref in Clojure?) and the main process would take care of those files in the queue on a FIFO basis.
My question is: is having a JVM up running this little program all the time too much a resource hog? And do you have any suggestions as to how go about doing this?
Thank you very much!
EDIT: Another question I should ask: should I run this as its own instance (using less memory) and have it launch a new JVM when a file is seen, or have it on the same JVM the Clojure code that will process the file?

As long as it is running fine now and it has no memory leaks it should be fine.
From the daemon terminology I gather it is on a unix clone, and in this case best is to start it from an init script, or from the rc.local script. Unfortunately details differ from OS to OS to be more specific.
Limit the memry using -Xmx=64m or something to make sure it fails before taking down the rest of the services. Play a bit with the number to find the lowest reliable size.
Also, since clojures claim to fame is its ability to deal with concurrency it make a lot of sense to only run one JVM with all functionality running on it in multiple threads. The overhead of spawning new processes is already very big and if it is a JVM which needs to JIT and warm up its memory management, doubly so. On a resource constrained machine could pose a problem. and on a resource rich machine this is a waste.
I always found that the JVM is not made to quickly run something script like and exit again. It is really not made for that use case in my opinion
.

Related

Does a new process is created when syscall is called in Minix?

For example, when we call write(...) in program in minix. Does a new process is created(like with fork()) or is it done within current process?
Is it efficient to make a lot of syscalls?
Process creation is strictly fork's / exec's job. What kind of process could a system call like write possibly spawn?
Now, Minix is a microkernel, meaning that things like file systems run in userland processes. Writing to a file could therefore possibly spawn a new process somewhere else, but that depends on your file system driver. I haven't paid attention to the MinixFS driver so far, so I can't tell you whether that happens -- but it's not very likely, process creation still being relatively expensive.
It's almost never efficient to make a lot of syscalls (context switches being involved). However, "performant", "efficient" and "a lot" are all very relative things, so I can't tell you something you probably don't know already.

What happens if an MPI process crashes?

I am evaluating different multiprocessing libraries for a fault tolerant application. I basically need any process to be allowed to crash without stopping the whole application.
I can do it using the fork() system call. The limit here is that the process can be created on the same machine, only.
Can I do the same with MPI? If a process created with MPI crashes, can the parent process keep running and eventually create a new process?
Is there any alternative (possibly multiplatform and open source) library to get the same result?
As reported here, MPI 4.0 will have support for fault tolerance.
If you want collectives, you're going to have to wait for MPI-3.something (as High Performance Mark and Hristo Illev suggest)
If you can live with point-to-point, and you are a patient person willing to raise a bunch of bug reports against your MPI implementation, you can try the following:
disable the default MPI error handler
carefully check every single return code from your MPI programs
keep track in your application which ranks are up and which are down. Oh, and when they go down they can never get back. but you're unable to use collectives anyway (see my opening statement), so that's not a huge deal, right?
Here's an old paper (back when Bill still worked at Argonne. I think it's from 2003):
http://www.mcs.anl.gov/~lusk/papers/fault-tolerance.pdf . It lays out the kinds of fault tolerant things one can do in MPI. Perhaps such a "constrained MPI" might still work for your needs.
If you're willing to go for something research quality, there's two implementations of a potential fault tolerance chapter for a future version of MPI (MPI-4?). The proposal is called User Level Failure Mitigation. There's an experimental version in MPICH 3.2a2 and a branch of Open MPI that also provides the interfaces. Both are far from production quality, but you're welcome to try them out. Just know that since this isn't in the MPI Standard, the function prefixes are not MPI_*. For MPICH, they're MPIX_*, for the Open MPI branch, they're OMPI_* (though I believe they'll be changing theirs to be MPIX_* soon as well.
As Rob Latham mentioned, there will be lots of work you'll need to do within your app to handle failures, though you don't necessarily have to check all of your return codes. You can/should use MPI error handlers as a callback function to simplify things. There's information/examples in the spec available along with the Open MPI branch.

Linux process activities

Is there possibility to show what's going on under specified process in Linux?
For example, i run SQL query -> select evil_function();
and notice that process under Linux uses all cpu.
So is there something with what I can see whats going on under this process?
What I want is to see what queries is running under this process.
Thanks!
strace will tell you what system calls the process is making.
To see what called routines are taking the most CPU, you need to run a profiling tool, and make sure the executable of the process you in compiled correctly (sometimes it needs to be instrumented during compilation for profiling, sometimes it just needs to be compiled with debug symbols, or not stripped of them after compilation).
You might want to look at oprofile, valgrind, gprof and for starters on free tools - there are also commercial products available.
Here are a few links:
http://www.pixelbeat.org/programming/profiling/
http://en.wikipedia.org/wiki/List_of_performance_analysis_tools
You are mixing a whole bunch of things.
If you are talking about MySQL do:
show processlist;
For info specifically about linux processes, you can strace the process to get a list of system function that it calls. Unless you are experienced with linux this will be useless to you.
If the process is paused then you can find out what function it is stopped on, but that's probably not what you want, since you say the process is running.
There are also various tools that can give you info on what parts of the disk the process is reading, and how much memory it's allocating.
And finally you can use gdb to break into the process and single step your way through it to see exactly what it's doing. This will also likely be useless to you since an SQL server does a LOT of things - far to many to understand by this method.

What causes a program to freeze

From what experience I have programming whenever a program has a problem it crashes, whether it is from an unhanded exception or a piece of code that should have been checked for errors, but was not and threw one. What would cause a program to completely freeze a system to the point of requiring a restart.
Edit: Thanks for the answers. As for the language and OS this question was inspired by me playing Fallout and the game freezing twice in an hour causing me to have to restart the xbox, so I am guessing c++.
A million different things. The most common that come to mind are:
Spawning too many threads or processes, which drowns the OS scheduler.
Gobbling too much RAM, which puts the memory manager into page-fault hell.
In a Dotnet/Java type environment its quite difficult to seize a system up, because the Runtime keeps you code at a distance from the OS.
Closer to the metal say C or C++, Assembly etc you have to play fair with the rest of the system - If you dont have it already grab a copy of Petzold and observe/experiment yourself with the amount of 'boilerplate' code to get a single Window running...
Even closer, down at the driver level all sorts of things can happen...
There are number of reasons, being internal or external that leads to deadlocked application, more general case is when something is being asked for by a program but is not given that leads to infinite waiting, the practical example to this is, a program writes some text to a file, but when it is about to open a file for writing, same file is opened by any other application, so the requesting app will wait (freeze in some cases if not coded properly) until it gets exclusive control of the file.
And a critical freeze that leads to restarting the system is when the file which is asked for is something which very important for the OS. However, you may not need to restart the system in order to get it back to normal, unless the program which was frozen is written in a language that produces native binary, i.e. C/C++ to be precise. So if application is written in a language which works with the concept of managed code, like any .NET language, it will not need a system restart to get things back to normal.
page faults, trying to access inaccessible data or memory(acces violation), incompatible data types etc.

How would I go about taking a snapshot of a process to preserve its state for future investigation? Is this possible?

Whether this is possible I don't know, but it would mighty useful!
I have a process that fails periodically (running in Windows 2000). I then have just one chance to react to it before having to restart it and painfully wait for it to fail again. I didn't write the process so don't have the source to debug. The failure is seemingly random.
With a snapshot of the process I could repeatedly and quickly test reactions to the failure.
I had thought of running inside a VM but this isn't possible in this instance.
EDIT:
#Jon Cage asked:
When you say a snapshot, you mean capturing a process when it's about to fail (including memory, program state etc. etc.) ...and then replaying it's final few seconds repeatedly to see what effect it has on some other component?
This is exactly what I mean!
I think minidump is what you are looking for.
You can also used Userdump:
The User Mode Process Dumper
(userdump) dumps any running Win32
processes memory image (including
system processes such as csrss.exe,
winlogon.exe, services.exe, etc) on
the fly, without attaching a debugger,
or terminating target processes.
Generated dump file can be analyzed or
debugged by using the standard
debugging tools.
This article shows you how to use it.
My best bet is to start the process in a debugger (OllyDbg being my preferred tool).
The process will pause on an exception, and you can try to figure out what happened shortly before that.
This needs some understanding of assembler and does not allow to create a snapshot of the process for later analysis. You would need to write your own debugger for that - it should be theoretically possible.