How to add a new version of VASP in pyiron - pyiron

I have a newly compiled version of VASP that applies magnetic constraints to the local moment orientations on the fly, which I'd like to test and use with pyiron.
Please could you provide guidance and steps to follow in order to add this version of VASP to pyiron as one more executable?
Thank you,
Eduardo

You need to add corresponding run script(s) to the pyiron resources.
In your .pyiron config file the paths for the resources are stated as RESOURCE_PATHS = /comma/separated/list, /of/paths/to/the/resources.
In the resources you need to add the run scripts in the directory vasp/bin/ using the naming convention run_code_version[_mpie].sh. The actual run script is, of course, dependent on the cluster and the libraries used to build VASP and might look similar to
run_vasp_version.sh (single core version):
module load intel/...
srun -n 1 /path/to/the/new/executable/version/vasp_std
run_vasp_version_mpie.sh (mpi version):
module load intel/... impi/...
srun -n $1 /path/to/the/new/executable/version/vasp_std
Please have a look at the other vasp run scripts which you have used from the shared resources.
(For MPIE colleagues: Detailed examples of the setup/run scripts for vasp on cmti can be found in this private repo.)
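As a rough illustration of the steps above, the following Python sketch reads RESOURCE_PATHS from config text and writes a run script into vasp/bin/ below a resource directory. It assumes the .pyiron file is INI-style with a [DEFAULT] section; the helper names, the version label and the "#!/bin/bash" first line are hypothetical choices for this example, and the script body has to be adapted to your cluster.

```python
import configparser
import stat
from pathlib import Path


def resource_paths(config_text):
    """Return the RESOURCE_PATHS entries found in .pyiron-style config text.

    Assumption: the file is INI-style and the key lives in [DEFAULT].
    """
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    raw = parser["DEFAULT"]["RESOURCE_PATHS"]
    return [Path(entry.strip()) for entry in raw.split(",") if entry.strip()]


def write_run_script(resource_dir, version, body, suffix=""):
    """Create vasp/bin/run_vasp_<version><suffix>.sh and mark it executable.

    `suffix` is appended after the version, e.g. for the MPI variant.
    """
    bin_dir = Path(resource_dir) / "vasp" / "bin"
    bin_dir.mkdir(parents=True, exist_ok=True)
    script = bin_dir / "run_vasp_{0}{1}.sh".format(version, suffix)
    script.write_text("#!/bin/bash\n" + body)
    # Add the owner-execute bit so the script can be launched directly
    script.chmod(script.stat().st_mode | stat.S_IXUSR)
    return script
```

For example, write_run_script(paths[0], "myversion", "module load intel/...\nsrun -n 1 /path/to/the/new/executable/version/vasp_std\n") would create run_vasp_myversion.sh in the first resource directory, where "myversion" is a placeholder.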

Why does multiprocessing Julia break my module imports?

My team is trying to run a library (Cbc with JuMP) with multiprocessing, using the julia -p # argument. Our code is in a Julia package, so we can run it fine with julia --project; it just runs with one process. Specifying both at once (julia --project -p 8), however, breaks our ability to run the project, because running using PackageName afterwards results in an error. We also intend to compile this with the PackageCompiler library, so getting it to work with a project is necessary.
We have our project in a folder with a src directory, a Project.toml, and a Manifest.toml
src contains: main.jl and Solver.jl
Project.toml contains:
name = "Solver"
uuid = "5a323fe4-ce2a-47f6-9022-780aeeac18fe"
authors = ["..."]
version = "0.1.0"
Normally, our project works fine starting this way (single threaded):
julia --project
julia> using Solver
julia> include("src/main.jl")
If we add the -p 8 argument when starting Julia, we get an error upon typing using Solver:
ERROR: On worker 2:
ArgumentError: Package Solver [5a323fe4-ce2a-47f6-9022-780aeeac18fe] is required but does not seem to be installed:
- Run `Pkg.instantiate()` to install all recorded dependencies.
We have tried running using Pkg; Pkg.instantiate(); using Solver but this doesn't help as another error just happens later (at the include("src/main.jl") step):
ERROR: LoadError: On worker 2:
ArgumentError: Package Solver not found in current path:
- Run `import Pkg; Pkg.add("Solver")` to install the Solver package.
and then following that suggestion produces another error:
ERROR: The following package names could not be resolved:
* Solver (not found in project, manifest or registry)
Please specify by known `name=uuid`.
Why does this module import work fine in single process mode, but not with -p 8?
Thanks in advance for your consideration
First, it is important to note that you are NOT using multi-threaded parallelism; you are using distributed parallelism. When you start Julia with -p 2 you launch two additional worker processes that do not share memory with the master process. Additionally, the project is only loaded in the master process, which is why the other processes cannot see whatever is in the project. You can learn more about the different kinds of parallelism that Julia offers in the official documentation.
To load the environment in all the workers, you can add this to the beginning of your file.
using Distributed
addprocs(2; exeflags="--project")
@everywhere using Solver
@everywhere include("src/main.jl")
and remove the -p argument from the command line you launch Julia with. This will load the project on all the processes. The @everywhere macro tells all the processes to perform the given task. This part of the docs explains it.
Be aware, however, that parallelism doesn't work automatically, so if your software is not written with distributed parallelism in mind, it may not get any benefit from the newly launched workers.
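The same effect can be reproduced outside Julia. The following is a Python analogy, not Julia code: a freshly started child process does not inherit changes the parent made to its own module search path, and only sees the module when the location is passed on explicitly, much as exeflags="--project" does for Julia workers. The function name is hypothetical.

```python
import os
import subprocess
import sys


def child_can_import(module_name, search_dir=None):
    """Start a fresh Python process and report whether it can import module_name.

    If search_dir is given, it is handed to the child explicitly via PYTHONPATH;
    otherwise the child only sees its default search path.
    """
    env = dict(os.environ)
    if search_dir is not None:
        existing = env.get("PYTHONPATH")
        env["PYTHONPATH"] = (
            search_dir if not existing else search_dir + os.pathsep + existing
        )
    result = subprocess.run(
        [sys.executable, "-c", "import {0}".format(module_name)],
        env=env,
        capture_output=True,
    )
    return result.returncode == 0
```

Adding a directory to sys.path in the parent makes the import work there, but child_can_import only succeeds once the directory is passed as search_dir.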
There is an issue with Julia when an uncompiled module exists and several parallel processes try to compile it at the same time for the first use.
Hence, if you are running your own module across many processes on a single machine, you always need to load it in the following way (this assumes that the Julia process is started in the folder where your project is located):
using Distributed, Pkg
@everywhere using Distributed, Pkg
Pkg.activate(".")
@everywhere Pkg.activate(".")
using YourModuleName
@everywhere using YourModuleName
I think this approach is undocumented but I found it experimentally to be most robust.
If you do not use this pattern, sometimes (not always!) a compilation race occurs and strange things tend to happen.
Note that if you are running a distributed cluster you need to modify the code above to run the initialization on a single worker from each node first, and then on all workers.

How can I install matplotlib for my AWS Elastic Beanstalk application?

I'm having a hell of a time deploying matplotlib on AWS Elastic Beanstalk. I gather that my issue comes from some dependencies and the way that EB deploys packages installed with PIP, and have attempted to follow the instructions here on SO for resolving the issue.
I first tried incrementally deploying, as suggested in the linked answer, by adding pieces of the matplotlib package stack to my requirements.txt file in stages. But this takes forever (for each stage) and is prone to failure and timing out (which seems to leave build directories behind that stall subsequent package installations).
So the simple solution mentioned off-handedly at the end of the answer appeals to me: just eb ssh, activate the virtualenv with
source /opt/python/run/venv/bin/activate
and pip install packages manually. But I can't get this to work either. First, I'm often confronted with left-behind build directories (as mentioned above)
pip can't proceed with requirement 'xxxx' due to a pre-existing build directory.
location: /opt/python/run/venv/build/xxxx
This is likely due to a previous installation that failed.
pip is being responsible and not assuming it can delete this.
Please delete it and try again.
But even after removing these, I consistently get
Exception:
Traceback (most recent call last):
File "/opt/python/run/venv/lib/python2.7/site-packages/pip/basecommand.py", line 122, in main
status = self.run(options, args)
File "/opt/python/run/venv/lib/python2.7/site-packages/pip/commands/install.py", line 278, in run
requirement_set.prepare_files(finder, force_root_egg_info=self.bundle, bundle=self.bundle)
File "/opt/python/run/venv/lib/python2.7/site-packages/pip/req.py", line 1197, in prepare_files
do_download,
File "/opt/python/run/venv/lib/python2.7/site-packages/pip/req.py", line 1375, in unpack_url
self.session,
File "/opt/python/run/venv/lib/python2.7/site-packages/pip/download.py", line 582, in unpack_http_url
unpack_file(temp_location, location, content_type, link)
File "/opt/python/run/venv/lib/python2.7/site-packages/pip/util.py", line 625, in unpack_file
untar_file(filename, location)
File "/opt/python/run/venv/lib/python2.7/site-packages/pip/util.py", line 533, in untar_file
os.makedirs(location)
File "/opt/python/run/venv/lib64/python2.7/os.py", line 157, in makedirs
mkdir(name, mode)
OSError: [Errno 13] Permission denied: '/opt/python/run/venv/build/xxxx'
in response to pip install xxxx (and sudo pip fails with sudo: pip: command not found).
What can I do to get this working on AWS-EB? In particular, what do I need to do to get the simple SSH+PIP approach working; or is there some other better — simpler! — approach I should try.
FWIW, I have a .ebextensions/software.config with
packages:
  yum:
    gcc-c++: []
    gcc-gfortran: []
    python-devel: []
    atlas-sse3-devel: []
    lapack-devel: []
    libpng-devel: []
    freetype-devel: []
    zlib-devel: []
and a requirements.txt that ends with
pytz==2014.10
pyparsing==2.0.3
python-dateutil==2.4.0
nose==1.3.4
six>=1.8.0
mock==1.0.1
numpy==1.9.1
matplotlib==1.4.2
After about 4 hours, I've gotten as far as numpy (as reported by pip list in the EB virtualenv).
And (in case it matters) the user who is SSHing is part of a group with the policy
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "elasticbeanstalk:*",
        "ec2:*",
        "elasticloadbalancing:*",
        "autoscaling:*",
        "cloudwatch:*",
        "s3:*",
        "sns:*",
        "cloudformation:*",
        "rds:*",
        "sqs:*",
        "iam:PassRole"
      ],
      "Resource": "*"
    }
  ]
}
I have used many approaches to build and deploy numpy/scipy/matplotlib, on Windows as well as Linux systems. I have used system-provided package managers (aptitude, rpm), 3rd-party package managers (pypm), Python package managers (easy_install, pip), source releases, used different build environments/tools (GCC, but also Intel MKL, OpenMP). While doing so, I have run into many many quite annoying situations, but have also learned a lot about the pros and cons of each approach.
I have no experience with Elastic Beanstalk (EB), but I have experience with EC2. I see that you can SSH into an instance and poke around. So, what I suggest further below is based on
above-stated experiences and on
the more or less obvious boundary conditions regarding Beanstalk and on
your application scenario, described in another question here on SO and on
the fact that you just want to get things running, quickly
My suggestion: start off with not building these things yourself. Do not use pip. If possible, try to use the package manager of the Linux distribution in place and let it handle the installation of everything required for you, with a single command (e.g. sudo apt-get install python-matplotlib).
Disadvantages:
possibly old package versions, depending on the Linux distro in use
non-optimized builds (e.g. not built against e.g. Intel MKL or not leveraging OpenMP features or not using special instruction sets)
Advantages:
it quickly downloads, because packages are most likely cached near your machine
it quickly installs (these packages are pre-built, no compilation involved)
it just works
So, I hope you can just use aptitude or rpm or whatever on these machines and inherit the great work that the distribution package maintainers do for you, behind the scenes.
Once you are confident in your application and identified some bottleneck or issue, you might have reason to use a newer version of numpy/matplotlib/... or you might have reason to have a faster version of these, by creating an optimized build.
Edit: EB-related details of outlined approach
In the meantime we have learned that EB by default runs Amazon Linux, which is based on Red Hat Enterprise Linux. Accordingly, it uses yum as its package manager, and packages are in RPM format.
Amazon provides documentation about available packages. In Amazon Linux 2014.09, these packages are available: http://aws.amazon.com/de/amazon-linux-ami/2014.09-packages/
In this list we find
numpy-1.7.2
python-matplotlib-0.99.1.2
This version of matplotlib is very old, according to the changelog it is from September 2009: "2009-09-21 Tagged for release 0.99.1".
I did not anticipate it to be so old, but still, it might be sufficient for your needs. So we proceed with our plan (but I'd understand if that's a blocker).
Now, we have learned that system Python and EB Python are isolated from each other. That does not mean that EB Python cannot access system Python site packages. We just need to tell it so. A simple and clean method is to set up a proper directory structure with the packages that should be accessible to EB Python, and to communicate this directory to EB Python via sys.path.
Clearly, we need to customize the bootstrapping phase of EB containers. The available tools are documented here: http://docs.aws.amazon.com/elasticbeanstalk/latest/dg/customize-containers-ec2.html
Obviously, we want to make use of the packages approach, and tell EB to install the numpy and python-matplotlib packages via yum. So the corresponding config file section should contain:
packages:
  yum:
    numpy: []
    python-matplotlib: []
Explicitly mentioning numpy might not be necessary, it likely is a dependency of python-matplotlib.
Also, we need to make use of the commands section:
You can use the commands key to execute commands on the EC2 instance.
The commands are processed in alphabetical order by name, and they run
before the application and web server are set up and the application
version file is extracted.
The following three commands create above-mentioned directory, and set up symbolic links to the numpy/mpl installation paths (these paths hopefully are available in the moment these commands become executed):
commands:
  00-create-dir:
    command: "mkdir -p /opt/py26-selected-site-packages"
  01-link-numpy:
    command: "ln -s /usr/lib64/python2.6/site-packages/numpy /opt/py26-selected-site-packages/numpy"
  02-link-mpl:
    command: "ln -s /usr/lib64/python2.6/site-packages/matplotlib /opt/py26-selected-site-packages/matplotlib"
Two uncertainties: first, the AWS docs do not clarify that packages are processed before commands are executed. You have to try. If it does not work, use container_commands. Secondly, it is just an educated guess that /usr/lib64/python2.6/site-packages/matplotlib is available after installing python-matplotlib. It should be installed to this place, but it may end up somewhere else. This needs to be tested. Numpy should end up where specified, as inferred from this article.
[UPDATE FROM SEB]
AWS documentation says "The cfn-init helper script processes these configuration sections in the following order: packages, groups, users, sources, files, commands, and then services."
http://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-init.html
So, your approach is safe
[/UPDATE]
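The second uncertainty (where yum actually puts the packages) can be checked on the instance before relying on the symbolic links. A minimal sketch follows; the function name is hypothetical, and the second candidate directory is only a guess at an alternative location, not something the AWS documentation states.

```python
import os


def find_package_dir(package, candidate_dirs):
    """Return the first candidate directory containing a `package` subdirectory, else None."""
    for base in candidate_dirs:
        if os.path.isdir(os.path.join(base, package)):
            return base
    return None


# The first entry is the path assumed in the commands section above;
# the second is a guessed alternative for non-lib64 installs.
CANDIDATES = [
    "/usr/lib64/python2.6/site-packages",
    "/usr/lib/python2.6/site-packages",
]

if __name__ == "__main__":
    for name in ("numpy", "matplotlib"):
        print("%s -> %s" % (name, find_package_dir(name, CANDIDATES)))
```

If a package is reported under a different directory than expected, the ln -s source path in the corresponding command has to be adjusted.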
The crucial step, as pointed out in the comments to this answer, is to tell your Python app where to look for packages. Direct modification of sys.path before attempting to import is a reliable method to take control of this. The following code adds our special directory to the list of directories in which Python looks for packages, and then attempts to import matplotlib:
import sys
sys.path.append("/opt/py26-selected-site-packages")
from matplotlib import pyplot
The order in sys.path defines priorities, so in case there is any other matplotlib or numpy package available in one of the other directories, it might be a better idea to
sys.path.insert(0, "/opt/py26-selected-site-packages")
However, this should not be necessary if our whole approach was well thought-through.
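A slightly more defensive variant of the sys.path change is sketched below: it only touches the search path if the directory exists and is not already listed, so importing the module twice does not add duplicates. The helper name is hypothetical; the directory is the one created by the commands section.

```python
import os
import sys

EXTRA_SITE_DIR = "/opt/py26-selected-site-packages"


def prepend_site_dir(path, sys_path=None):
    """Put `path` first in the module search path if it exists and is not listed yet.

    Returns True when the search path was changed. `sys_path` defaults to
    sys.path and is only a parameter to make the function easy to try out.
    """
    if sys_path is None:
        sys_path = sys.path
    if not os.path.isdir(path) or path in sys_path:
        return False
    sys_path.insert(0, path)
    return True


if __name__ == "__main__":
    if prepend_site_dir(EXTRA_SITE_DIR):
        import matplotlib
        # Shows which installation was actually picked up
        print(matplotlib.__file__)
```

Printing matplotlib.__file__ is a quick way to confirm that the yum-installed copy, and not some other one, was imported.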
To add to Jan-Philip's answer:
AWS Elastic Beanstalk uses the Amazon Linux distribution (except for .NET environments). Amazon Linux uses the yum package manager. Matplotlib is available in Amazon's software repository.
[ec2-user@ip-1-1-1-174 ~]$ yum list | grep matplot
python-matplotlib.x86_64 0.99.1.2-1.6.amzn1 amzn-main
If this version is the one you need for your application, I would try to simply modify your .ebextensions/software.config file and to add the package to the yum section of it:
packages:
  yum:
    python-matplotlib: []
    python-devel: []
    atlas-sse3-devel: []
    lapack-devel: []
    libpng-devel: []
    freetype-devel: []
    zlib-devel: []
A last note about AWS Elastic Beanstalk and SSH.
While Amazon lets you SSH into your Elastic Beanstalk instances, you should use this only for debugging, to understand why your app failed or is not installing as expected.
Other than that, your deployment must be 100% automatic. When Elastic Beanstalk (Auto Scaling, to be precise) scales out your infrastructure (adds instances) or scales it in (terminates instances) depending on your application workload, all your manual configuration will be lost.
Best practice is not to install SSH keys on your production environment; this further reduces the attack surface.
I might be a bit late to this question, but since AWS and many other cloud service providers are moving to Docker, and since you haven't specified the platform, I have a quick solution to your question:
Use the generic docker platform.
I created some images with Python, Numpy, Scipy and Matplotlib preinstalled, so you can directly pull and start using them with one line of code.
Python 2.7 (this one also has the versions that you specified for numpy and matplotlib)
sudo docker pull chuseuiti/pynuscimat2.7
Python 3.4
sudo docker pull chuseuiti/pynusci
However you can create your own image or modify existing images.
In case you want to automate your instances, you can pass a Dockerfile to AWS with the definition of your image.
Tip, in case you don't know Docker:
You need to log in before you can pull:
sudo docker login
After pulling the image, you can create a container from it and work inside it with the following command:
sudo docker run -i -t chuseuiti/pynuscimat2.7 bash
PS. At least on the free tier, AWS always complains about running out of time with scipy and matplotlib because they take too long to install. That is why I use this option.

How do you compile a Pharo VM without an image?

I have already cloned the VM and installed all dependencies for my platform. Now I am a bit confused, because a couple of guides suggest that a Pharo image should be started to generate the C sources translated from Slang.
"Unix"
PharoVMBuilder buildUnix32.
"OSX"
PharoVMBuilder buildMacOSX32.
"Windows"
PharoVMBuilder buildWin32.
But how do you generate a VM when you cannot start a VM on your platform? This sounds like a chicken-and-egg problem.
Does this mean it is not possible to build a VM if you cannot start an image on that platform?
If you download pre-generated sources from the CI server as suggested by Esteban, you don't need the pharo-vm sources cloned from any repository. Just uncompress in a new folder and build from there.
Assuming you have your new sources in c:\phs, open directories.cmake and change the hardcoded paths as follows:
set(topDir "c:/phs/")
set(buildDir "c:/phs/build")
set(thirdpartyDir "${buildDir}/thirdparty")
set(platformsDir "c:/phs/platforms")
set(srcDir "c:/phs/src")
set(srcPluginsDir "${srcDir}/plugins")
set(srcVMDir "${srcDir}/vm")
set(platformName "win32")
set(targetPlatform ${platformsDir}/${platformName})
set(crossDir "${platformsDir}/Cross")
set(platformVMDir "${targetPlatform}/vm")
set(outputDir "c:/phs/results")
As you could not start a VM, I suppose you need to change at least the compilation flags used to generate the sources on the CI server. They are in c:\phs\build\CMakeLists.txt, especially the following flags:
Set -march=... to your processor architecture (search for Safe Cflags)
Remove -g0, which suppresses debug information
Remove -O2 (optimizations)
Remove -DNDEBUG
Change -DDEBUGVM=0 to -DDEBUGVM=1
and finally start the build script
cd /c/phs/build
bash build.sh
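If the sources are regenerated often, the flag changes listed above could be scripted instead of edited by hand. The following Python sketch applies them to a flag string; the function name is hypothetical, how the string is read from and written back to CMakeLists.txt is left out, and -march is not touched because its value depends on your processor.

```python
def make_debug_flags(flags):
    """Apply the debug-build changes to a whitespace-separated compiler flag string."""
    kept = []
    for token in flags.split():
        if token in ("-g0", "-O2", "-DNDEBUG"):
            # Dropped: debug-info suppression, optimization, and NDEBUG
            continue
        if token == "-DDEBUGVM=0":
            token = "-DDEBUGVM=1"
        kept.append(token)
    return " ".join(kept)
```

All other flags pass through unchanged and keep their original order.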
You need to pre-generate the sources elsewhere or take pre-generated sources from another place.
Let's assume you want to compile for some kind of Unix; you can download pre-generated sources from here:
https://ci.inria.fr/pharo/view/3.0-VM/job/PharoSVM/Architecture=32,Slave=vm-builder-linux/lastSuccessfulBuild/artifact/sources.tar.gz (for a stack vm)
https://ci.inria.fr/pharo/view/3.0-VM/job/PharoVM/Architecture=32,Slave=vm-builder-linux/lastSuccessfulBuild/artifact/sources.tar.gz (for a cog vm)

Saving ipython aliases

In ipython 0.10 and 0.11, is there an easy way to make and save aliases? I know there is discussion of allowing aliases to be stored in 0.12, but what can I do with my students that will be easy? I'd like to save this alias:
alias rtupdate (cd ~/projects/researchtools; hg pull; hg update)
Is the only real option to edit ~/.ipython/ipythonrc or follow http://ipython.scipy.org/Wiki/tips for 0.10 or work with the alias manager in 0.11 (http://wiki.ipython.org/Cookbook/Moving_config_to_IPython_0.11) ?
Students each have their own VMWare Ubuntu 11.04 virtual machine with ipython 0.10.1. I can make this a separate shell executable block in org-mode and add makefiles that will remind people how to do a pull and update with mercurial, but I have yet to explain what a Makefile is. e.g. this kind of hint:
https://bitbucket.org/schwehr/researchtools/src/829773b7db64/Makefile
Are your students on their own machines, or do you control their environment?
If you want configuration to survive from one session to the next, the official way to do that is to edit your config, but there are other ways. For instance, you could write an IPython extension which defines extra aliases, and provide that to your students.
What may be easiest for your students, though, is to simply provide a script to run on startup, containing the lines you want to run, defining aliases, etc. You can call it something like init.ipy, then just instruct IPython to run the script. This can be done in config with InteractiveShellApp.exec_files, or you can just specify it at the command-line with ipython -i init.ipy, or at any later point with %run init.ipy.
Note that a script with the .ipy extension is allowed to have IPython commands (e.g. %alias rtupdate (cd ~/projects/researchtools; hg pull; hg update)), but if you use .py it is treated as a regular Python script.
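If several aliases have to be distributed, the startup script itself can be generated. The Python sketch below renders the text of such an init.ipy file with one %alias line per entry; the function name is hypothetical, and the output only uses the %alias syntax shown above.

```python
def render_init_ipy(aliases):
    """Return the text of a startup script with one %alias line per entry.

    `aliases` maps an alias name to the shell command it should run.
    """
    lines = [
        "%alias {0} {1}".format(name, command)
        for name, command in sorted(aliases.items())
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    text = render_init_ipy(
        {"rtupdate": "(cd ~/projects/researchtools; hg pull; hg update)"}
    )
    print(text, end="")
```

Redirecting the output to init.ipy gives a file that students can load with ipython -i init.ipy or %run init.ipy.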

Creating a new Trac project via trac-admin initenv

I'm somewhat new to Trac.
I'm running trac version 0.11.7 on an ubuntu system.
I'm trying to create another project via the following command:
"trac-admin /var/lib/trac/shipping_tracker initenv".
After answering the various questions, the program fails and returns an error
( see: http://pastebin.com/yijzpB3i ) "Table 'system' already exists"
Does this mean that every time I need to create a new project, I'll have to go into
the mysql database and create a new database, like trac1, trac2, etc.?
I did notice this particular ticket ( http://trac.edgewall.org/ticket/5138 ) where
someone states you have to create a new database for each project. Is this correct??
Thank you.
--Mike
Every Trac environment, being a completely self-contained space, uses a separate database. So yes, you need to create a new database for each environment (although it might be a bad idea to name them trac1, trac2 etc.).
If you want to create new environments often, what you really need is probably multi-project support, which allows you to have different projects within one environment. However, it is still not done as of Trac 0.13, and is planned for 0.14.
You might also want to read about various ideas on having multiple projects with Trac. One of them deals with making Trac store multiple environments in a single database, though it might be outdated and probably breaks automatic updates.
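To keep the one-database-per-environment rule manageable, the database name can be derived from the project name rather than numbered. A small Python sketch that builds the MySQL connection string to give to initenv follows; the function name, the trac_ prefix and the credentials are hypothetical, and the database itself still has to be created in MySQL beforehand.

```python
def trac_db_string(project, user, password, host="localhost", prefix="trac_"):
    """Build a Trac MySQL connection string whose database is named after the project."""
    return "mysql://{0}:{1}@{2}/{3}{4}".format(user, password, host, prefix, project)
```

For the environment from the question, trac_db_string("shipping_tracker", "tracuser", "secret") gives a string pointing at a database called trac_shipping_tracker.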
I am using Trac 1.0, running as a stand-alone server, and in order to run multiple projects on one Trac installation you still need to set up a new environment using
trac-admin /path/to/trac/yournewproject initenv
... then create a .htpasswd file in the /path/to/trac/yournewproject dir and add users using
htpasswd /path/to/trac/yournewproject/.htpasswd newuser
(or copy an existing .htpasswd file there) ... and then restart Trac with a command similar to the following:
python /path/to/tracd --user=yourlinuxuser --group=yourlinuxgroup -d \
-b hostname -p 8000 \
--basic-auth=oldproject,/path/to/trac/oldproject/.htpasswd,realmname \
--basic-auth=yournewproject,/path/to/trac/yournewproject/.htpasswd,realmname \
/path/to/trac/oldproject \
/path/to/trac/yournewproject
This is valid in case you are using the same type of basic authentication as I do.
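Because the command grows by one --basic-auth option and one path per project, it can be convenient to generate it. The Python sketch below builds the argument list from a mapping of project names to environment directories, using only the options shown in the command above; the function name and the default tracd path are placeholders.

```python
def tracd_command(projects, realm, hostname, port=8000,
                  user=None, group=None, tracd="/path/to/tracd"):
    """Build the tracd argument list for several environments with basic auth.

    `projects` maps a project name to its environment directory; each
    environment is expected to contain its own .htpasswd file.
    """
    ordered = sorted(projects.items())
    cmd = ["python", tracd]
    if user:
        cmd.append("--user=" + user)
    if group:
        cmd.append("--group=" + group)
    cmd += ["-d", "-b", hostname, "-p", str(port)]
    for name, env_dir in ordered:
        cmd.append("--basic-auth={0},{1}/.htpasswd,{2}".format(name, env_dir, realm))
    # The environment directories come last, one per project
    cmd += [env_dir for _, env_dir in ordered]
    return cmd
```

The resulting list can be passed to subprocess.call, or joined with spaces to inspect the command before running it.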