How can I install matplotlib for my AWS Elastic Beanstalk application? - numpy

I'm having a hell of a time deploying matplotlib on AWS Elastic Beanstalk. I gather that my issue comes from some dependencies and the way that EB deploys packages installed with PIP, and have attempted to follow the instructions here on SO for resolving the issue.
I first tried incrementally deploying, as suggested in the linked answer, by adding pieces of the matplotlib package stack to my requirements.txt file in stages. But this takes forever (for each stage) and is prone to failure and timing out (which seems to leave build directories behind that stall subsequent package installations).
So the simple solution mentioned off-handedly at the end of the answer appeals to me: just eb ssh, activate the virtialenv with
source /opt/python/run/venv/bin/activate
and pip install packages manually. But I can't get this to work either. First I'm often confronted with left-beind build directories (as mentioned above)
pip can't proceed with requirement 'xxxx' due to a pre-existing build directory.
location: /opt/python/run/venv/build/xxxx
This is likely due to a previous installation that failed.
pip is being responsible and not assuming it can delete this.
Please delete it and try again.
But even after removing these, I consistently get
Exception:
Traceback (most recent call last):
File "/opt/python/run/venv/lib/python2.7/site-packages/pip/basecommand.py", line 122, in main
status = self.run(options, args)
File "/opt/python/run/venv/lib/python2.7/site-packages/pip/commands/install.py", line 278, in run
requirement_set.prepare_files(finder, force_root_egg_info=self.bundle, bundle=self.bundle)
File "/opt/python/run/venv/lib/python2.7/site-packages/pip/req.py", line 1197, in prepare_files
do_download,
File "/opt/python/run/venv/lib/python2.7/site-packages/pip/req.py", line 1375, in unpack_url
self.session,
File "/opt/python/run/venv/lib/python2.7/site-packages/pip/download.py", line 582, in unpack_http_url
unpack_file(temp_location, location, content_type, link)
File "/opt/python/run/venv/lib/python2.7/site-packages/pip/util.py", line 625, in unpack_file
untar_file(filename, location)
File "/opt/python/run/venv/lib/python2.7/site-packages/pip/util.py", line 533, in untar_file
os.makedirs(location)
File "/opt/python/run/venv/lib64/python2.7/os.py", line 157, in makedirs
mkdir(name, mode)
OSError: [Errno 13] Permission denied: '/opt/python/run/venv/build/xxxx'
in response to pip install xxxx (and sudo pip fails with sudo: pip: command not found).
What can I do to get this working on AWS-EB? In particular, what do I need to do to get the simple SSH+PIP approach working; or is there some other better — simpler! — approach I should try.
FWIW, I have a .ebextensions/software.config with
packages:
yum:
gcc-c++: []
gcc-gfortran: []
python-devel: []
atlas-sse3-devel: []
lapack-devel: []
libpng-devel: []
freetype-devel: []
zlib-devel: []
and a requirements.txt that ends with
pytz==2014.10
pyparsing==2.0.3
python-dateutil==2.4.0
nose==1.3.4
six>=1.8.0
mock==1.0.1
numpy==1.9.1
matplotlib==1.4.2
After about 4 hours, I've gotten far as numpy (as reported by pip list in the EB virtualenv).
And (in case it matters) the user who is SSHing is part in a group with the policy
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"elasticbeanstalk:*",
"ec2:*",
"elasticloadbalancing:*",
"autoscaling:*",
"cloudwatch:*",
"s3:*",
"sns:*",
"cloudformation:*",
"rds:*",
"sqs:*",
"iam:PassRole"
],
"Resource": "*"
}
]
}

I have used many approaches to build and deploy numpy/scipy/matplotlib, on Windows as well as Linux systems. I have used system-provided package managers (aptitude, rpm), 3rd-party package managers (pypm), Python package managers (easy_install, pip), source releases, used different build environments/tools (GCC, but also Intel MKL, OpenMP). While doing so, I have run into many many quite annoying situations, but have also learned a lot about the pros and cons of each approach.
I have no experience with Elastic Beanstalk (EB), but I have experience with EC2. I see that you can SSH into an instance and poke around. So, what I suggest further below is based on
above-stated experiences and on
the more or less obvious boundary conditions regarding Beanstalk and on
your application scenario, described in another question here on SO and on
the fact that you just want to get things running, quickly
My suggestion: start off with not building these things yourself. Do not use pip. If possible, try to use the package manager of the Linux distribution in place and let it handle the installation of everything required for you, with a single command (e.g. sudo apt-get install python-matplotlib).
Disadvantages:
possibly old package versions, depending on the Linux distro in use
non-optimized builds (e.g. not built against e.g. Intel MKL or not leveraging OpenMP features or not using special instruction sets)
Advantages:
it quickly downloads, because packages are most likely cached near your machine
it quickly installs (these packages are pre-built, no compilation involved)
it just works
So, I hope you can just use aptitude or rpm or whatever on these machines and inherit the great work that the distribution package maintainers do for you, behind the scenes.
Once you are confident in your application and identified some bottleneck or issue, you might have reason to use a newer version of numpy/matplotlib/... or you might have reason to have a faster version of these, by creating an optimized build.
Edit: EB-related details of outlined approach
In the meantime we have learned that EB by default runs Amazon Linux which is based on Red Hat Enterprise Linux. Likewise, it uses yum as package manager and packages are in RPM format.
Amazon provides documentation about available packages. In Amazon Linux 2014.09, these packages are available: http://aws.amazon.com/de/amazon-linux-ami/2014.09-packages/
In this list we find
numpy-1.7.2
python-matplotlib-0.99.1.2
This version of matplotlib is very old, according to the changelog it is from September 2009: "2009-09-21 Tagged for release 0.99.1".
I did not anticipate it to be so old, but still, it might be sufficient for your needs. So we proceed with our plan (but I'd understand if that's a blocker).
Now, we have learned that system Python and EB Python are isolated from each other. That does not mean that EB Python cannot access system Python site packages. We just need it to tell so. A simple and clean method is to set up a proper directory structure with the packages that should be accessible to EB Python, and to communicate this directory to EB Python via sys.path.
Clearly, we need to customize the bootstrapping phase of EB containers. The available tools are documented here: http://docs.aws.amazon.com/elasticbeanstalk/latest/dg/customize-containers-ec2.html
Obviously, we want to make use of the packages approach, and tell EB to install the numpy and python-matplotlib packages via yum. So the corresponding config file section should contain:
packages:
yum:
numpy: []
python-matplotlib: []
Explicitly mentioning numpy might not be necessary, it likely is a dependency of python-matplotlib.
Also, we need to make use of the commands section:
You can use the commands key to execute commands on the EC2 instance.
The commands are processed in alphabetical order by name, and they run
before the application and web server are set up and the application
version file is extracted.
The following three commands create above-mentioned directory, and set up symbolic links to the numpy/mpl installation paths (these paths hopefully are available in the moment these commands become executed):
commands:
00-create-dir:
command: "mkdir -p /opt/py26-selected-site-packages"
01-link-numpy:
command: "ln -s /usr/lib64/python2.6/site-packages/numpy /opt/py26-selected-site-packages/numpy"
02-link-mpl:
command: "ln -s /usr/lib64/python2.6/site-packages/matplotlib /opt/py26-selected-site-packages/matplotlib"
Two uncertainties: the AWS docs to not clarify that packages are processed before commands are executed. You have to try. It it does not work, use container_commands. Secondly, it is just an educated guess that /usr/lib64/python2.6/site-packages/matplotlib is available after installing python-matplotlib. It should be installed to this place, but it may end up somewhere else. Needs to be tested. Numpy should end up where specified as inferred from this article.
[UPDATE FROM SEB]
AWS documentation says "The cfn-init helper script processes these configuration sections in the following order: packages, groups, users, sources, files, commands, and then services."
http://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-init.html
So, your approach is safe
[/UPDATE]
The crucial step, as pointed out in the comments to this answer, is to tell your Python app where to look for packages. Direct modification of sys.path before attempting to import is a reliable method to take control of this. The following code adds our special directory to the selection of directories in which Python looks out for packages, and then attempts to import matplotlib:
sys.path.append("/opt/py26-selected-site-packages")
from matplotlib import pyplot
The order in sys.path defines priorities, so in case there is any other matplotlib or numpy package available in one of the other directories, it might be a better idea to
sys.path.insert(0, "/opt/py26-selected-site-packages")
However, this should not be necessary if our whole approach was well thought-through.

To add to Jan-Philip Answer :
AWS Elastic Beanstalk is using Amazon Linux distribution (except for .Net environments). Amazon Linux uses the yum package manager. MatPlotLib is available in Amazon's software repository.
[ec2-user#ip-1-1-1-174 ~]$ yum list | grep matplot
python-matplotlib.x86_64 0.99.1.2-1.6.amzn1 amzn-main
If this version is the one you need for your application, I would try to simply modify your .ebextensions/software.config file and to add the package to the yum section of it:
packages:
yum:
python-matplotlib: []
python-devel: []
atlas-sse3-devel: []
lapack-devel: []
libpng-devel: []
freetype-devel: []
zlib-devel: []
A last note about AWS Elastic BeansTalk and SSH.
While Amazon gives you the possibility to SSH to your Elastic Beanstalk instances, you should use this possibility only for debugging purposes, to understand why your app failed or is not installing as suggested.
Other than that, your deployment must be 100% automatic. When Elastic Beanstalk (Auto Scaling to be precise) will scale out your infrastructure (add more instances) or scale it in (terminate instances) depending on your application workload, all your manual configuration will be lost.
Best practices is to not install SSH keys on your production environment, it further reduces the surface of attacks.

I might be a bit late to this question, but as AWS and a lot of the cloud service providers are moving into Docker and taking into consideration that you haven't specified the platform . I have a fast solution to your question:
Use the generic docker platform.
I created some images with Python, Numpy, Scipy and Matplotlib preinstalled, so you can directly pull and start using them with one line of code.
Python 2.7(This one also has the versions that you were specifying for numpy and matplotlib)
sudo docker pull chuseuiti/pynuscimat2.7
Python 3.4
sudo docker pull chuseuiti/pynusci
However you can create your own image or modify existing images.
In case you want to automate your instances, you can pass a Dockerfile to AWS with the definition of your image.
Tip, in case you don't know about docker:
It is need to login before been able to pull:
sudo docker login
After pulling the image, you can generate and work in a container created from an image with the next code:
sudo docker run -i -t chuseuiti/pynuscimat2.7 bash
PS. At least with the free tier AWS is always complaining about running out of time with scipy and matplotlib, it takes too much time to install them, that is why I use this option.

Related

How to create a RedHat repository with a reduced set of packages

Dears,
I am managing a pool of servers running under RHEL 7.6, I created a local repository of RHEL packages to be able to udpated the other servers by limiting the internet access to the server hosting the local repository.
I used the reposync command to populate my repository but I am downloading a huge number of rpms packages!
I would like to reduce the set of packages to download to the ones already deployed on all the severs, I can do the list using the rpm command, (~750 packages).
I read that there is an includepkgs directive to be used with the reposync command.
How is it working, what is the required format?
I know it is possible to use the yumdownloader command to update the local repository, how is it possible to populate the repository for the first time ?
Any help advice would be appreciated
Regards
Fdv
It seems that the best option is to limit to the last version of the packages by using the option :
-n, --newest-only Download only newest packages per-repo

Cannot import Selenium even when it's installed [duplicate]

After searching the web for hours i didnt yet find an answer to my problem. I am using Python 3.6 and i cant import selenium. I always get the message "No module named 'selenium''
I tried everything, i first downloaded selenium from this website https://pypi.python.org/pypi/selenium/3.6.0 .
Then I tried python -m pip install -U selenium and didnt work either. I tried some other things that people said but they didnt work either.
Im using windows 10.
Any help?
As you mentioned you are using Python 3.6 following the steps :
Open Command Line Interface (CLI) and issue the command python to check if Python is properly installed :
C:\Users\username>python
Python 3.6.1 (v3.6.1:69c0db5, Jan 16 2018, 17:54:52) [MSC v.1900 32 bit (Intel)]
on win32
Type "help", "copyright", "credits" or "license" for more information.
>>>
Ensure pip is working properly :
C:\Users\username>pip
Usage:
pip <command> [options]
Commands:
install Install packages.
download Download packages.
uninstall Uninstall packages.
freeze Output installed packages in requirements format.
list List installed packages.
show Show information about installed packages.
check Verify installed packages have compatible dependencies.
search Search PyPI for packages.
wheel Build wheels from your requirements.
hash Compute hashes of package archives.
completion A helper command used for command completion.
help Show help for commands.
General Options:
-h, --help Show help.
--isolated Run pip in an isolated mode, ignoring environment variables and user configuration.
-v, --verbose Give more output. Option is additive, and can be used up to 3 times.
-V, --version Show version and exit.
-q, --quiet Give less output. Option is additive, and can be used up to 3 times (corresponding to WARNING, ERROR, and CRITICAL logging levels).
--log <path> Path to a verbose appending log.
--proxy <proxy> Specify a proxy in the form [user:passwd#]proxy.server:port.
--retries <retries> Maximum number of retries each connection should attempt (default 5 times).
--timeout <sec> Set the socket timeout (default 15 seconds).
--exists-action <action> Default action when a path already exists: (s)witch, (i)gnore, (w)ipe, (b)ackup, (a)bort.
--trusted-host <hostname> Mark this host as trusted, even though it does not have valid or any HTTPS.
--cert <path> Path to alternate CA bundle.
--client-cert <path> Path to SSL client certificate, a single file containing the private key and the certificate in PEM format.
--cache-dir <dir> Store the cache data in <dir>.
--no-cache-dir Disable the cache.
--disable-pip-version-check
Don't periodically check PyPI to determine
whether a new version of pip is available for
download. Implied with --no-index.
Install latest selenium through pip :
C:\Users\username>pip install -U selenium
Collecting selenium
Downloading selenium-3.8.1-py2.py3-none-any.whl (931kB)
100% |¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦| 942kB 322kB/s
Installing collected packages: selenium
Successfully installed selenium-3.8.1
Confirm that Selenium is installed :
C:\Users\username>pip freeze
selenium==3.8.1
Open an IDE (e.g Eclipse, PyCharm) and write a simple program as follows :
from selenium import webdriver
driver = webdriver.Firefox(executable_path="C:\\path\\to\\geckodriver.exe")
driver.get('https://stackoverflow.com')
Execute the program on which Firefox Quantum Browser will be initiated and the url https://stackoverflow.com will be accessed.
Python Download Location (Windows) :
Python (for Windows) can be download from the following location :
https://www.python.org/downloads/
I'm on VS Code in Windows 10, and here's how I solved it.
You need to pay attention to where the Python is located (in my case),
1) C:\Users\_Me_\AppData\Local\Programs\Python\Python38\
and where the Python looks for libraries/packages, including the ones installed using pip (again, in my case),
2) C:\Users\_Me_\AppData\Roaming\Python\Python38\
I don't know why these two locations are different (gotta fix it at some point). It seemed that Python was running from the first location, but it was looking for libraries in the second!:/
Anyway, since I have limited experience in Python , I just copied the \Lib\site-packages from the first location (including selenium folders) to \site-packages in second one in hopes of solving the issue, which worked out for me!
How to check for there locations
1) Open Python CLI, typed the following command:
which python
2) Open Python CLI, typed the following commands (From this answer):
>>> import site
>>> site.USER_SITE
EDIT
Since this seems a temporary solution, I uninstalled Python and reinstalled it again in a proper directory (other than the default install directory), and now which python and which pip point to the same folder! Problem solved!
If these solutions did not work for you, try looking into which Python interpreter your IDE is using (in my case I was using VSCode - you can find the interpreter in the bottom bar). Worked for me.
Easiest way is to copy all files of "venv" Lib, Scripts, Selenium and other folder into your main project folder.
This issue occur as pycharm directly take from virtual environment venv.
Hope this will resolve your problem :)

Difference between "command" or specific module execution in Ansible

I'm starting with Ansible, and I found that there is a module called command which lets me execute any command in a remote node.
I saw a couple of example where initial setups are solved by using command instead of specific modules. For example, as far as I know, both of these do the same task:
- name: Install git using apt module
apt:
name: git
state: present
- name: Install git using command
command: apt-get install git
So, my question is: is there any difference or any reason to use a module instead of command?
The difference in short is that using a specific module will give you playbook's idempotence and provide better portability and readability.
What I mean by idempotence? When you run:
- name: Install git using apt module
apt:
name: git
state: present
It will install git package only if it is not yet installed on the target system and after playbook run this task will be reported in green colour (OK) if git had been already installed.
2nd approach with the command module:
- name: Install git using command
command: apt-get install git
Above command will always report status as changed (yellow colour) when in fact nothing changed (assuming git package had been already installed). There are ways to make tasks that use the command module idempotent as well but it costs you some more work.
Best practice is to always use a specific module before command in playbooks.
Ansible is all about describing and managing system state. When you run a playbook on a certain target system it can be very misleading to see a task reporting a changed state while in fact nothing has been changed.
Think declaratively about describing desired state, not about low level commands needed to get a system to this state.
Below article will also provide some explanation around differences and consequences of using command vs specific module:
Ansible Best Practices: The Essentials
There are probably numerous reasons but here are a few:
Intrinsic idempotence (does not execute task every time without extra effort)
Superior readability (much clearer what you are trying to do)
More concise tasks (much fewer words to describe the task)
Platform-agnostic execution (works on all OS instead of just one without extra effort)

How to run scripts automatically after deployment in AWS using EB CLI?

I am trying to make a Django server on AWS. My django app depends on some mathematical python libraries like numpy, scipy, sklearn etc. However there is an issue for which I need to this after every deployment
sudo nano /etc/httpd/conf.d/wsgi.conf
---------------------------------------
add this line in the file
WSGIApplicationGroup %{GLOBAL}
---------------------------------------
sudo /etc/init.d/httpd reload
Basically I need "WSGIApplicationGroup %{GLOBAL}" in my wsgi.conf file otherwise I get 504. I am using a Custom AMI built on top of Amazon Linux 2014 and I am using EB CLI for deployment. However whenever I deploy the wsgi.conf is reset and it does not contain the line that I have added previously and I need to manually SSH into the EC2 instance and do this task myself. It gives a overhead on every deployment and its also not feasible once we scale up (cloning or creating instances also resets it). So is there a way that this will be automatically done after every deployment ?
The content of the wsgi.conf is fixed, so basically I can make a script easily to create it but the issue is how to trigger the script automatically ?
PS:I am new to AWS
You need to use AWS Elastic Beanstalk feature called .ebextensions: http://docs.aws.amazon.com/elasticbeanstalk/latest/dg/customize-containers-ec2.html
In your case you can't use Files or Commands sections, because:
The commands are processed in alphabetical order by name, and they run
before the application and web server are set up and the application
version file is extracted.
You need to use Container_commands section:
They run after the application and web server have been set up and the
application version file has been extracted, but before the
application version is deployed.
Example .ebextensions/01wsgi.config (not tested :-))
container_commands:
apache_reload:
command: |
echo "WSGIApplicationGroup %{GLOBAL}" >> /etc/httpd/conf.d/wsgi.conf
/etc/init.d/httpd reload
Feel free to tweak my example as you want, for example you can copy your temporary wsgi.conf file somewhere and then replace original in Container_commands section.

building apache from source on debian

I'm trying to build apache from source on debian. The only reason I'm not using spt-get install is because in the apache cookbook, they recommend installing from source.I get the following error when I ./configure:
configure: error: invalid variable name: ' --with-mpm'
I also saw some warnings when I ./buildconf Is this something I should be concerned about? This is my first attempt at compiling from source, and I'd really appreciate any help.
I'm using the ./configure arguments directly from the apache cookbook:
./configure --prefix=/usr/local/apache --with-layout=Apache --enable-modules=most --enable-mods-shared=all \ --with-mpm=prefork
I'm running a minimum debian install in virtual box to train myself for deploying in the rackspace cloud soon.
EDIT: I'm building Apache 2.2.16
I suspect you are typing that entire build line you provided on one line, complete with the '\' in the middle.
You should get rid of '\', which in bash either treats the following as part of the same string, but the slash has to immediately follow a non-whitespace character. It is also used for special escape sequences, which I think is the case here and generating that message.
This should be the correct line in your case.
./configure --prefix=/usr/local/apache --with-layout=Apache --enable-modules=most --enable-mods-shared=all --with-mpm=prefork
On a side note, doesn't the Apache Cookbook say that building from source is one possibility for installing it, in addition to installing from a pre-packaged build like you can get from Debian's repositories? I suppose if you really wanted a far newer build or a more repeatable process to ensure consistency across a variety of distributions, building from scratch will do that for you, but otherwise I would try to utilize the distribution's package management as much as possible. Building from source removes you from the security patches and ease-of-upgrade path that Debian APT gives you.