Setting environment variables before the execution of the pyiron wrapper on a remote cluster - pyiron

I use a jobfile for SLURM in ~/pyiron/resources/queues/, which looks roughly like this:
#!/bin/bash
#SBATCH --output=time.out
#SBATCH --job-name={{job_name}}
#SBATCH --workdir={{working_directory}}
#SBATCH --get-user-env=L
#SBATCH --partition=cpu
module load some_python_module
export PYTHONPATH=path/to/lib:$PYTHONPATH
echo {{command}}
As you can see, I need to load a module to access the correct python version before calling "python -m pyiron.base.job.wrappercmd ..." and I also want to set the PYTHONPATH variable.
Setting the environment directly in the SLURM jobfile works, of course, but it is very inconvenient, because I need a new jobfile under ~/pyiron/resources/queues/ whenever I want to run a calculation with a slightly different environment. Ideally, I would like to be able to adjust the environment directly in the Jupyter notebook. Something like an {{environment}} block in the above jobfile, which can be configured via Jupyter, seems like a nice solution.
As far as I can tell, this is impossible with the current version of pyiron and pysqa. Is there a similar solution available?
As an alternative, I could also imagine storing the above jobfile close to the Jupyter notebook. This would also make it easier for my colleagues to reproduce the setup. Is there an option to define a specific file to be used as a jinja2 template for the jobfile?
I could achieve my intended setup by writing a temporary jobfile under ~/pyiron/resources/queues/ via Jupyter before running the pyiron job, but this feels like quite a hacky solution.
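For illustration, a minimal sketch of that temporary-jobfile workaround, assuming the jinja2 templates live directly under ~/pyiron/resources/queues/ (the file name is made up, and the new template would still need to be registered in the queue configuration):

import os

# body of the temporary jobfile, reusing the template from above
template = """#!/bin/bash
#SBATCH --output=time.out
#SBATCH --job-name={{job_name}}
#SBATCH --workdir={{working_directory}}
#SBATCH --get-user-env=L
#SBATCH --partition=cpu
module load some_python_module
export PYTHONPATH=path/to/lib:$PYTHONPATH
echo {{command}}
"""

queue_dir = os.path.expanduser("~/pyiron/resources/queues")
with open(os.path.join(queue_dir, "cpu_custom.sh"), "w") as f:
    f.write(template)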
Thank you very much,
Florian

To explain the example in a bit more detail:
I create a notebook named readenv.ipynb with the following content:
import subprocess
subprocess.check_output("echo ${My_SPECIAL_VAR}", shell=True)
This reads the environment variable My_SPECIAL_VAR.
I can now submit this job using a second Jupyter notebook:
import os
os.environ["My_SPECIAL_VAR"] = "SoSpecial"
from pyiron import Project
pr = Project("envjob")
job = pr.create_job(pr.job_type.ScriptJob, "script")
job.script_path = "readenv.ipynb"
job.server.queue = "cm"
job.run()
In this case I first set the environment variable and then submit a script job. The script job is able to read the corresponding environment variable because it is forwarded by the --get-user-env=L option. So you should be able to define the environment in the Jupyter notebook which you use to submit the calculation.

Related

Is there a way to get ipython autocompletion when piping a pandas dataframe to a function?

For example, if I have a pipe function:
def process_data(weighting, period, threshold):
    # do stuff
Can I get autocompletion on the process_data arguments?
There are a lot of arguments to remember and I would like to make sure they get passed in correctly. In ipython, the function can autocomplete to show me the keyword args which is really neat, but I would like it to do this when piping a pandas dataframe too!
I don't see how this would be possible, but then again, I'm truly in awe of ipython and all its greatness. So, is this possible? If not, are there other hacks that people have come up with?
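For context, "piping" here refers to pandas' DataFrame.pipe, which passes the frame as the first positional argument; a small sketch, assuming process_data accepts the frame as its first parameter:

import pandas as pd

def process_data(df, weighting, period, threshold):
    # do stuff with df
    return df

df = pd.DataFrame({"a": [1, 2, 3]})

# the keyword arguments are exactly where autocompletion would help
result = df.pipe(process_data, weighting=0.5, period=30, threshold=1.0)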
Install the pyreadline library.
$ pip install pyreadline
Update:
It seems like this problem is specific to some versions of ipython. The solution is the following:
Run below command from the terminal:
$ ipython profile create
It will create a default profile at ~/.ipython/profile_default/ipython_config.py
Now edit this ipython_config.py and add the below lines and it will solve the issue.
c = get_config()
c.Completer.use_jedi = False
Reference:
https://github.com/jupyter/notebook/issues/2435
https://ipython.readthedocs.io/en/stable/config/intro.html

Maintain python virtual environment across raku shell commands

Is there a way to activate a python virtual env in one raku shell command, and then access the env in the next shell command? I want to do something like this in raku. Let's assume there is an executable named "execute_software" under the "some_env" env:
shell("source some_env");
shell("execute_software XXX XXX");
shell("source deactivate");
Currently, this doesn't work for me.
Thanks!
Tao
I don't know how you expected the environment to stay around after the shell process exits: each shell(...) call runs in its own child process, so whatever source some_env sets up is gone before the next call starts.
That is not something you can do with anything, as far as I'm aware.
If that is something you want, may I suggest using Inline::Python?
use Inline::Python;
my $py = Inline::Python.new();
$py.run('print("hello world")');
use string:from<Python>;
say string::capwords('foo bar'); # prints "Foo Bar"

How to access cluster_config dict within rule?

I'm working on writing a benchmarking report as part of a workflow, and one of the things I'd like to include is information about the amount of resources requested for each job.
Right now, I can manually require the cluster config file ('cluster.json') as a hardcoded input. Ideally, though, I would like to be able to access the per-rule cluster config information that is passed through the --cluster-config arg. In __init__.py, this is accessed as a dict called cluster_config.
Is there any way of importing or copying this dict directly into the rule?
From the documentation, it looks like you can now use a custom wrapper script to access the job properties (including the cluster config data) when submitting the script to the cluster. Here is an example from the documentation:
#!/usr/bin/env python3
import os
import sys

from snakemake.utils import read_job_properties

jobscript = sys.argv[1]
job_properties = read_job_properties(jobscript)

# do something useful with the threads
threads = job_properties["threads"]

# access a property defined in the cluster configuration file (Snakemake >=3.6.0)
time = job_properties["cluster"]["time"]

os.system("qsub -t {threads} {script}".format(threads=threads, script=jobscript))
During submission (last line of the previous example) you could either pass the arguments you want from the cluster.json to the script or dump the dict into a JSON file, pass the location of that file to the script during submission, and parse the json file inside your script. Here is an example of how I would change the submission script to do the latter (untested code):
#!/usr/bin/env python3
import json
import os
import sys
import tempfile

from snakemake.utils import read_job_properties

jobscript = sys.argv[1]
job_properties = read_job_properties(jobscript)
threads = job_properties["threads"]

# dump the full properties dict into a temporary JSON file
fd, job_json = tempfile.mkstemp(suffix='.json')
with os.fdopen(fd, 'w') as handle:
    json.dump(job_properties, handle)

os.system("qsub -t {threads} {script} -- {job_json}".format(threads=threads, script=jobscript, job_json=job_json))
job_json should now appear as the first argument to the job script. Make sure to delete the job_json at the end of the job.
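For the receiving side, a hypothetical sketch of loading the dumped properties back in, assuming the JSON path reaches a Python script as its first command-line argument:

import json
import sys

# path of the JSON file created by the submission script above
job_json = sys.argv[1]
with open(job_json) as handle:
    job_properties = json.load(handle)

# e.g. record the requested time in the benchmarking report
print(job_properties["cluster"]["time"])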
From a comment on another answer, it appears that you are just looking to store the job_json somewhere along with the job's output. In that case, it might not be necessary to pass job_json to the job script at all. Just store it in a place of your choosing.
You can easily manage the cluster resources per rule.
Indeed, you can use the keyword "resources:" like this:
rule one:
    input: ...
    output: ...
    resources:
        gpu=1,
        time="HH:MM:SS"
    threads: 4
    shell: "..."
You can take the resource values from the yaml configuration file for the cluster, given with the parameter --cluster-config, like this:
rule one:
    input: ...
    output: ...
    resources:
        time=cluster_config["one"]["time"]
    threads: 4
    shell: "..."
When you call snakemake, you just have to access the resources like this (example for a SLURM cluster):
snakemake --cluster "sbatch -c {threads} -t {resources.time} " --cluster-config cluster.yml
Each rule will then be submitted to the cluster with its specific resources.
For more information, you can check the documentation at this link: http://snakemake.readthedocs.io/en/stable/snakefiles/rules.html
Best regards

How to read live output from subprocess python 2.7 and Apache

I have an Apache web server and I made a python script to run a command. The command I'm running launches a ROS launch file, which runs indefinitely. I would like to read the output from the subprocess live and display it in the page. With my code so far I only managed to get the output printed after I terminate the process. I've tried all kinds of solutions from the web but none of them seem to work:
import subprocess

command = "roslaunch package test.launch"
proc = subprocess.Popen(
    command,
    stdout=subprocess.PIPE,
    stderr=subprocess.STDOUT,
    env=env,
    shell=True,
    bufsize=1,
)
print "Content-type:text/html\r\n\r\n"
for line in iter(proc.stdout.readline, ''):
    strLine = str(line).rstrip()
    print(">>> " + strLine)
    print("<br/>")
The problem is that the output of roslaunch is being buffered. subprocess is not the best tool for real-time output processing in such a situation, but there is a perfect tool for just that task in Python: pexpect. The following snippet should do the trick:
import pexpect

command = "roslaunch package test.launch"
p = pexpect.spawn(command)
print "Content-type:text/html\r\n\r\n"
while not p.eof():
    strLine = p.readline()
    print(">>> " + strLine)
    print("<br/>")
Andrzej Pronobis' answer above suffices for UNIX-based systems, but the package pexpect does not work as effectively as one would expect on Windows in certain scenarios. Here, spawn() doesn't work on Windows as expected. We can still use it with some alterations described in the official docs.
The better way here might be to use wexpect (see its official docs). It caters to Windows alone.
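For completeness, an untested sketch of the same loop on Windows, assuming wexpect mirrors the pexpect API used above:

import wexpect

command = "roslaunch package test.launch"
p = wexpect.spawn(command)
print "Content-type:text/html\r\n\r\n"
while not p.eof():
    strLine = p.readline()
    print(">>> " + strLine)
    print("<br/>")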

Persistent Python Command-Line History

I'd like to be able to "up-arrow" to commands that I entered in a previous Python interpreter session. I have found the readline module, which offers functions like read_history_file, write_history_file, and set_startup_hook. I'm not quite savvy enough to put this into practice though, so could someone please help? My thoughts on the solution are:
(1) Modify .login so that PYTHONSTARTUP points to a python script.
(2) In that python script file do something like:
def command_history_hook():
    import readline
    readline.read_history_file('.python_history')

command_history_hook()
(3) Whenever the interpreter exits, write the history to the file. I guess the best way to do this is to define a function in your startup script and exit using that function:
def ex():
    import readline
    readline.write_history_file('.python_history')
    exit()
It's very annoying to have to exit using parentheses, though: ex(). Is there some python sugar that would allow ex (without the parens) to run the ex function?
Is there a better way to cause the history file to write each time? Thanks in advance for all solutions/suggestions.
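As an aside, one well-known (if hacky) way to get the paren-less ex behavior is to give an object a __repr__ with side effects, so that merely evaluating the name at the prompt runs the code; a sketch only:

import readline

class Exiter(object):
    # typing ex (no parentheses) at the prompt saves the history and exits
    def __repr__(self):
        readline.write_history_file('.python_history')
        raise SystemExit

ex = Exiter()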
Also, there are two architectural choices as I can see. One choice is to have a unified command history. The benefit is simplicity (the alternative that follows litters your home directory with a lot of files.) The disadvantage is that interpreters you run in separate terminals will be populated with each other's command histories, and they will overwrite one another's histories. (this is okay for me since I'm usually interested in closing an interpreter and reopening one immediately to reload modules, and in that case that interpreter's commands will have been written to the file.) One possible solution to maintain separate history files per terminal is to write an environment variable for each new terminal you create:
# get_env_variable / set_env_variable are placeholder helpers for reading
# and writing the terminal's environment.
from random import choice
import string

def random_key():
    return ''.join([choice(string.uppercase + string.digits) for i in range(16)])

def command_history_hook():
    import readline
    key = get_env_variable('command_history_key')
    if key:
        readline.read_history_file('.python_history_{0}'.format(key))
    else:
        set_env_variable('command_history_key', random_key())

def ex():
    import readline
    key = get_env_variable('command_history_key')
    if not key:
        key = random_key()
        set_env_variable('command_history_key', key)
    readline.write_history_file('.python_history_{0}'.format(key))
    exit()
By decreasing the random key length from 16 to, say, 1 you could decrease the number of files littering your directories to 36, at the expense of a possible (2.8% chance for two terminals) overlap.
I think the suggestions in the Python documentation pretty much cover what you want. Look at the example pystartup file toward the end of section 13.3:
http://docs.python.org/tutorial/interactive.html
or see this page:
http://rc98.net/pystartup
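In essence, the pystartup file from those links loads a history file at startup and registers an atexit hook that saves it on exit; roughly (the history path is an assumption), saved as the file PYTHONSTARTUP points to:

import atexit
import os
import readline

history_path = os.path.expanduser("~/.python_history")  # assumed location

# reload the history from the previous session, if any
if os.path.exists(history_path):
    readline.read_history_file(history_path)

# save the history automatically when the interpreter exits
def save_history(path=history_path):
    import readline
    readline.write_history_file(path)

atexit.register(save_history)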
But, for an out of the box interactive shell that provides all this and more, take a look at using IPython:
http://ipython.scipy.org/moin/
Try using IPython as a python shell. It already has everything you ask for. They have packages for most popular distros, so install should be very easy.
Persistent history has been supported out of the box since Python 3.4. See this bug report.
Use pip to install the pyreadline package:
pip install pyreadline
If all you want is to use interactive history substitution without all the file stuff, all you need to do is import readline:
import readline
And then you can use the up/down keys to navigate through past commands. This is the same for Python 2 and 3.
This wasn't clear to me from the docs, but maybe I missed it.