It's common these days to have to run daemons alongside a workflow. For example, deep learning jobs require parameter servers and hyperparameter optimization servers. I don't see any obvious supported way of doing this in Snakemake.
I could put code into the Snakefile that starts up a daemon when the workflow starts and kills it when it exits. I could also define a daemon as a rule that generates a daemon.pid file and have rules that need the daemon depend on that file.
Neither of those is ideal, though, because neither really expresses the intent of the daemon in the workflow. On top of that, while the .pid file approach may start the daemon only when it is needed, it doesn't shut it down when it's no longer needed.
How do people deal with this in their workflows?
Ideally, there would be a separate declaration for a "daemon", and it would get started just before the first rule that depends on the daemon starts, and it gets shut down when no more rules need it. Other workflow systems take that approach. Is there anything like it in Snakemake?
Granted, I've never used daemons, but from my understanding your situation isn't particularly complicated.
One option could be to use the onstart/onsuccess/onerror handlers. Of course, this would start the daemon right at the beginning of the pipeline and stop it at the end, which may be undesirable.
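For example, a minimal sketch of the handler approach (start-daemon and stop-daemon stand in for whatever actually launches and stops your daemon):

onstart:
    shell("start-daemon")

onsuccess:
    shell("stop-daemon")

onerror:
    shell("stop-daemon")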
Alternatively, make the first rule requiring the daemon depend on daemon.start. Then a final rule, which depends on the output of the last rule requiring the daemon, stops the daemon and touches a daemon.end file. Something like:
rule all:
    input:
        'foo.txt',
        'daemon.end',

rule start_daemon:
    output:
        touch('daemon.start'),
    shell:
        r"""
        start-daemon
        """

rule do_stuff_with_daemon:
    input:
        'daemon.start',
    output:
        'foo.txt',
    shell:
        r"""
        do stuff with daemon
        """

rule stop_daemon:
    input:
        'foo.txt',
    output:
        touch('daemon.end'),
    shell:
        r"""
        stop-daemon
        """
Related
We run a daemon on all of our machines; however, we need to feed different machines different configurations.
What we need is this:
When someone reconfigures something in the front end, we need to generate a new configuration and send it to the specified machine.
Besides, we also need to execute commands, like a restart, after the configuration has been distributed.
We should also have a way to check whether the configuration on the specified machine is the newest one, i.e. whether the configuration distribution in the first phase succeeded or not.
I'd keep the generation of the new configs separate from the actual deployment.
I'd have a centralized daemon keeping track of the front-end changes and updating the configs and the necessary machine/config mapping in a central, well-known location. In a simpler form it could also be an on-demand, manually executed process.
Your existing daemon would be modified to periodically check whether the config for its own machine has changed and, if so, apply the change, run the necessary deployment commands, and maybe even report its own progress and results back to the central location for the centralized daemon's overall status reporting. This keeps the daemon code simpler (free from the machine/config mapping logic) and thus more reliable.
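As a rough illustration of that per-machine loop (the paths, service name, and interval below are hypothetical, and in practice this would live inside your existing daemon):

import hashlib, shutil, socket, subprocess, time

SHARED = "/shared/configs/" + socket.gethostname() + ".conf"   # hypothetical central location
LOCAL = "/etc/myservice/myservice.conf"                        # hypothetical local config

def digest(path):
    try:
        with open(path, "rb") as f:
            return hashlib.sha256(f.read()).hexdigest()
    except FileNotFoundError:
        return None

while True:
    new = digest(SHARED)
    if new is not None and new != digest(LOCAL):
        shutil.copy(SHARED, LOCAL)                              # apply the new config
        subprocess.run(["systemctl", "restart", "myservice"])   # hypothetical deployment command
        # report progress/results back to the central location here if desired
    time.sleep(60)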
I have an Apache module that acts as a security filter, allowing requests to pass or not. This is a custom-made module; I don't want to use any existing module.
I actually have two questions:
The module has its own log file. I'm thinking that the best location would be /var/log/apache2/, but since the Apache process runs as the www-data user, it cannot create files in that path. I want to find a solution for the log file that is not very intrusive (in terms of security) for a typical web server. Where would be the best place, and what kind of security attributes should be set?
The module communicates with another process using pipes. I would like to spawn this process from the Apache module only when I need it. Where should I locate this binary, and how should I set its privileges to be as unintrusive as possible?
Apache starts under the superuser first and performs the module initialization (calling the module_struct::register_hooks function). There you can create the log files and either chown them to www-data or keep the file descriptors open in order to use them later from the forked and setuid'd worker processes.
(And if you need an alternative, I think it's also possible to log with syslog and configure it to route your log messages to your log file).
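This is not Apache module code (that would be C), but a minimal Python sketch of the first pattern: create the log as root, hand ownership to www-data, and keep the already-open descriptor usable after the privilege drop. The log path is a placeholder.

import grp, os, pwd

LOG = "/var/log/apache2/mymodule.log"       # hypothetical log path

# Running as root: create/open the log and hand ownership to www-data.
log_fd = os.open(LOG, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o640)
uid = pwd.getpwnam("www-data").pw_uid
gid = grp.getgrnam("www-data").gr_gid
os.chown(LOG, uid, gid)

# After dropping privileges, the already-open descriptor keeps working.
os.setgid(gid)
os.setuid(uid)
os.write(log_fd, b"worker started\n")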
Under the worker process you are already running as the www-data user, so there isn't much you can do to further secure the execution. For example, AFAIK, you can't setuid to yet another user or chroot to protect the filesystem.
What you can do to improve security is to use a mandatory access control system. For example, with AppArmor you could tell the operating system which binaries your Apache module may execute, stopping it from executing anything unwanted, and you can limit that binary's filesystem access, preventing it from touching www-data files that don't belong to it.
I have a tool which supports interactive queries through a Tcl shell. I want to create a web application through which users can send different queries to the tool. I have done some basic programming using the Apache web server and CGI scripts, but I am unable to think of a way to keep the shell alive and send queries to it.
Some more information:
Let me describe it more. The tool builds a graph data structure; after building it, users can query for information using the Tcl shell, something like getting all child nodes of a particular node. I cannot rebuild the data structure with every query because building takes a lot of time. I want to build the data structure and somehow keep the shell alive. The Apache server should send all the queries to that shell and return the responses back to the user.
You might want to create a daemon process, perhaps using expect, that spawns your interactive program. The daemon program could listen to queries over TCP using Tcl's socket command. Your CGI program would create a client socket to talk to the daemon.
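If Tcl/expect isn't a hard requirement, the same architecture can be sketched in Python; the tool command and the one-line-per-query protocol below are assumptions about how your interactive shell behaves:

import socketserver
import subprocess

# Start the interactive tool once, so the expensive graph is only built once.
tool = subprocess.Popen(["your-tool"],                  # hypothetical command
                        stdin=subprocess.PIPE,
                        stdout=subprocess.PIPE,
                        text=True)

class QueryHandler(socketserver.StreamRequestHandler):
    def handle(self):
        query = self.rfile.readline().decode().strip()  # one query per connection
        tool.stdin.write(query + "\n")
        tool.stdin.flush()
        self.wfile.write(tool.stdout.readline().encode())  # assumes one-line replies

with socketserver.TCPServer(("127.0.0.1", 9000), QueryHandler) as server:
    server.serve_forever()

The CGI side then just opens a client socket to 127.0.0.1:9000, writes the query, and reads the reply.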
I'd embed the graph-managing program into an interpreter that's also running a small webserver (e.g., tclhttpd, though that's not the only option) and have the rest of the world interact with the graph through RESTful web accesses. This could then be integrated behind Apache in any way you like: a CGI thunk would work, or you could do request forwarding, or you could write some server-side code to do it (there are many options there). You could even just let clients connect directly. Many options would work.
The question appears to be incomplete, as you did not specify what exactly "interactive" means with regard to your tool.
How does it support interactive queries? Does it call gets in a kind of endless loop and process each line as it's read? If so, the solution to your problem is simple: the Tcl shell is not really concerned about whether its standard input is connected to an interactive terminal or not. So just spawn your tool in your CGI request-handling code, write the user's query to that process's stdin stream, flush it, and then read all the text written by that process to its stdout and stderr streams. Then send them back to the browser. How exactly to spawn the process and communicate with it via its standard streams heavily depends on your CGI code.
If you don't get the idea, try writing your query to a file and then doing something like
$ tclsh /path/to/your/tool/script.tcl </path/to/the/query.file
and the tool should respond in the usual way.
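From Python CGI code, for example, the same per-request spawn might look roughly like this (the script path is the placeholder from above):

import subprocess

def run_query(query: str) -> str:
    # Spawn the tool, feed the query on stdin, and collect everything it prints.
    result = subprocess.run(["tclsh", "/path/to/your/tool/script.tcl"],
                            input=query + "\n",
                            capture_output=True, text=True)
    return result.stdout + result.stderr   # send this back to the browser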
If the interaction is carried out some other way in your tool, then you probably have to split it into "core" and "front-end" parts, so that the core just reads queries and outputs results while the front-end part carries out the interaction. Then hook up that core to your CGI processing code in the way outlined above.
As part of my learning process, I thought it would be good to expand a little on what I know about Apache. I have several questions, and while I know some of the stuff may require a rather lengthy explanation, I hope you can provide an overview so I know where to go looking (preferably with reference to mod_wsgi). I have read some resources found by searching on Google, and what I know comes from there, so please bear with me.
What does the Apache lifecycle look like before, during, and after it receives an HTTP request? Does it spawn a new child process to do the work, or create a thread in one of the child processes?
Does Apache by default run under www-data? If that's the case and I want a directory under my project folder to be used for logs, can I just change the folder's group to www-data and allow group write access?
What user will the Python interpreter run under, after being invoked by Apache? And what user will processes created by Popen or multiprocessing from there run under?
I ran ps U www-data. Why are there so many processes with
S 0:00 /usr/sbin/apache2 -k start
The Apache prefork MPM handles one connection per process. To handle connections quickly and avoid spawning processes on demand, Apache maintains a process pool; this explains why you see so many processes in the process list. When a connection comes in, it is handed to one of the already existing processes.
Some more information is here: http://httpd.apache.org/docs/2.0/en/mod/prefork.html
The answer to question 2 is yes: Apache workers run as www-data by default (at least on Debian-based systems), and you can grant access to any directory by changing its group to www-data.
Read:
http://www.fmc-modeling.org/category/projects/apache/amp/Apache_Modeling_Project.html
http://code.google.com/p/modwsgi/wiki/ProcessesAndThreading
http://code.google.com/p/modwsgi/wiki/QuickConfigurationGuide#Delegation_To_Daemon_Process
http://code.google.com/p/modwsgi/wiki/ConfigurationDirectives#WSGIDaemonProcess
The first one will tell you all the gory details of how Apache works internally. The others relate specifically to mod_wsgi and its process/threading model.
I've recently created a dropbox system using inotify, watching for files created in a particular directory. The directory I'm watching is mounted from an NFS server, and inotify is behaving differently than I'd expect. Consider the following scenario in which an inotify script is run on machine A, watching /some/nfs/dir/also/visible/to/B.
- Using machine A to create a file in /some/nfs/dir/also/visible/to/B, the script behaves as expected. Using machine B to carry out the same action, the script is not notified about a new file dropped in the directory.
- When the script is run on the NFS server, it gets notified when files are created from both machine A and machine B.
Is this a bug in the package I'm using to access inotify, or is this expected behaviour?
inotify requires support from the kernel to work. When an application watches a directory, it asks the kernel to inform it when changes occur there. When a change occurs, in addition to writing it to disk, the kernel also notifies the watching process.
On a remote NFS machine, the change is not visible to the kernel; it happens entirely remotely. NFS predates inotify, and there is no network-level support for it in NFS, or anything equivalent.
If you want to get around this, you can run a service on the storage server (since that kernel will always see changes to the filesystem) that brokers inotify requests for remote machines and forwards the data to the remote clients.
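A minimal sketch of such a broker, run on the NFS server itself; it assumes the third-party inotify_simple package and a hypothetical UDP listener on each client, so the watched path, client hosts, and port all need adapting:

import json, socket
from inotify_simple import INotify, flags

WATCH_DIR = "/some/nfs/dir/also/visible/to/B"
CLIENTS = [("machine-a", 5005), ("machine-b", 5005)]    # hypothetical listeners

inotify = INotify()
inotify.add_watch(WATCH_DIR, flags.CREATE | flags.CLOSE_WRITE | flags.DELETE)
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

while True:
    for event in inotify.read():    # blocks until something changes locally
        msg = json.dumps({"name": event.name, "mask": event.mask}).encode()
        for client in CLIENTS:
            sock.sendto(msg, client)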
Edit: It seems odd to me that NFS should be blamed for its lack of support for inotify.
Network File System (NFS) is a distributed file system protocol originally developed by Sun Microsystems in 1984. (Wikipedia article)
However:
Inotify (inode notify) is a Linux kernel subsystem that acts to extend filesystems to notice changes to the filesystem. [...] It has been included in the mainline Linux kernel from release 2.6.13 (June 18, 2005) [...]. (Wikipedia article)
It's hard to expect a portable network protocol/application to support a specific kernel feature developed for a different operating system, one that appeared more than twenty years later. Even if NFS did include extensions for it, they would not be available or useful on other operating systems.
Another problem with this: let's suppose we are not using a network at all, but rather a local filesystem with good inotify support, ext3 (suppose it's mounted at /mnt/foo). But instead of a real disk, the filesystem is mounted from a loopback device, and the underlying file is in turn accessible at a different location in the VFS (say, /var/images/foo.img).
Now, you're not supposed to modify mounted ext3 filesystems, but it's still reasonably safe to do so if the change is to file contents instead of metadata.
So suppose a clever user modifies the file system image (/var/images/foo.img) in a hex editor, replacing a file's contents with some other data, while at the same time an inotify watch is observing the same file on the mounted filesystem.
There's no reasonable way one can arrange for inotify to always inform the watching process of this sort of change. Although there are probably some gyrations that could be taken to make ext3 notice and honor the change, none of that would apply to, say, the XFS driver, which is otherwise quite similar.
Nor should it. You're cheating! inotify can only inform you of changes that occurred through the VFS at the actual mountpoint being watched. If the changes occurred outside that VFS, because of a change to the underlying data, inotify can't help you and isn't designed to solve that problem.
Have you considered using a message queue for network notification?
To anyone who has come across this question while searching for why a bind mount in Docker will not detect file changes from the host directory (for hot reloading of an app): it's because file changes made on the host are not propagated to the kernel the container's watcher is listening to.
Only changes made from inside the container itself are communicated to that kernel. The solution is to have your live-reload utility turn on "polling mode" instead of using fsnotify.
I found SGI FAM, which uses a supervisor daemon to monitor file modifications. It supports NFS, and you can see some description on Wikipedia.
I agree with SingleNegationElimination's explanation, and would like to add that iSCSI targets will work, since they alert the kernel.
So things on "real" file systems (relative to the system, that is) will trigger Inotify to alert. Like Rsync'ing, net-catting something into a mounted partition.
If you have to get notifications via inotify (or have to use inotify) you can make a cron to rsync -avz over to the file system. Drawbacks of course are that you are using real system hdd space.
I second #SingleNegationElimination.
Also, you can try notify-forwarder.
Machine A watches for local inotify events, then forwards them to Machine B (via UDP).
Machine B doesn't (can't?) replay the events, but fires an ATTRIB event for the changed file.
If you use vagrant, use vagrant-notify-forwarder.
The problem with notify-forwarder is that it does not trigger an inotify event. It uses utime to update the timestamp of the file on the remote system, but inotify fails to see this.
AFAIK, the timestamp already gets updated when using an NFS mount. I have verified this myself between a Synology NAS NFS server and a Raspbian NFS mount (client).
Here's my solution / hack on the client:
#!/bin/bash
# Poll a directory tree and report when anything changes (a workaround for
# inotify not seeing remote NFS changes).
path=$1

# Hash a recursive listing so any change in names, sizes or mtimes shows up.
firstmd5=$(ls -laR "$path" | md5sum | awk '{ print $1 }')

while true
do
    lastmd5=$(ls -laR "$path" | md5sum | awk '{ print $1 }')
    if [ "$firstmd5" != "$lastmd5" ]
    then
        firstmd5=$lastmd5
        echo "files changed"
    fi
    sleep 1
done
Granted, this doesn't report on the specific file being changed, but does provide a general notification hook that something's changed.
It's annoying / kludgy but if I needed more details I would do some additional hacking to isolate the actual files changed.
Improved the script with an action on click and an icon:
#!/bin/bash
# Poll a camera directory for new files and raise a desktop notification
# (dunstify) with an "open" action that launches QtVsPlayer.
DAT=$(date +%Y%m%d)
CAM="cam1 "
CHEMIN=/mnt/cams/cam1/$DAT/

first="$CHEMIN"
if [ -d "$CHEMIN" ]; then
    first=$(ls -1rt "$CHEMIN" | tail -n 1)
fi
echo "$first"

while true
do
    if [ -d "$CHEMIN" ]; then
        last=$(ls -1rt "$CHEMIN" | tail -n 1)
        if [ "$first" != "$last" ]
        then
            first=$last
            echo "$last created"
            #notify-send -h string:desktop-entry:nautilus -c "transfer.complete" -u critical -i $PWD../QtVsPlayer.png $CAM $last"\n\r"$CHEMIN
            reply=$(dunstify -a QtVsPlayer -A 'open,ouvrir' -i "QtVsPlayer" "$CAM $last"$'\n\r'"$CHEMIN")
            if [[ "$reply" == "open" ]]; then
                QtVsPlayer -s "$CHEMIN$last"
            fi
        fi
    fi
    sleep 5m
done