AWS Elastic Beanstalk: setting up X virtual framebuffer (Xvfb) - selenium

I'm trying to get a Selenium script running on an Elastic Beanstalk server. To achieve this I am using the pyvirtualdisplay package, following this answer. However, for the display driver to run, Xvfb also needs to be installed on the system. I'm getting this error message:
OSError=[Errno 2] No such file or directory: 'Xvfb'
Is there any way to manually install this on EB? I have also set up an EC2 server as suggested here, but the whole process seems unnecessary for this task.

You can create a file in .ebextensions/, for example .ebextensions/xvfb.config, with the following content:
packages:
  yum:
    xorg-x11-server-Xvfb: []
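
Once the package is installed, the Python side could look roughly like the sketch below. It checks that an Xvfb executable is on PATH before starting the display, so a missing package produces a clearer message than the OSError above. The helper names `xvfb_available` and `start_virtual_display` and the default screen size are made up for illustration; `Display` comes from the pyvirtualdisplay package mentioned in the question.

```python
import shutil


def xvfb_available() -> bool:
    """Return True if an Xvfb executable can be found on PATH."""
    return shutil.which("Xvfb") is not None


def start_virtual_display(width: int = 1280, height: int = 800):
    """Start a virtual display, failing early if Xvfb is missing.

    The helper name and default size are illustrative assumptions.
    """
    if not xvfb_available():
        raise RuntimeError(
            "Xvfb not found on PATH; install xorg-x11-server-Xvfb first"
        )
    # Imported lazily so the PATH check works even without pyvirtualdisplay.
    from pyvirtualdisplay import Display

    display = Display(visible=0, size=(width, height))
    display.start()
    return display
```

Call `start_virtual_display()` before creating the Selenium driver, and call `stop()` on the returned display when the script finishes.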

Can't create extension pg_cron in bitnami:postgres docker container?

I am running a Docker container with a database based on the bitnami:postgres image. It is all working fine, but now I want to install pg_cron to schedule automatic jobs.
I installed it and it is available as a possible extension in DBeaver. But when I select and install it I get the message:
ERROR: extension "pg_cron" must be installed in schema "pg_catalog"
When I run the command
Create Extension pg_cron;
I get:
ERROR: pg_cron can only be loaded via shared_preload_libraries
HINT: Add pg_cron to the shared_preload_libraries configuration variable in postgresql.conf.
I tried to change the postgresql.conf file, but when I restart my Docker container to apply the changes, shared_preload_libraries is always reset to pgaudit.

Python cron job with Chrome not running in AWS EC2

I've been using an EC2 instance to run a Python script with cron every day for a month or so. The script uses Selenium.
Everything was working correctly until today, when my script did not run.
I have tried to run it manually but it's not working either. The error message is:
raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"cssselector","selector":"#ctl00_ctl00_moteurRapideOffre_ctl01_EngineCriteriaCollection_Contract > option:nth-child(5)"}
(Session info: headless chrome=90.0.4430.85)
However, the same script runs fine on my computer (i.e. on my MacBook, not on AWS EC2).
As the problem seems to come from Chrome, I uninstalled it on AWS EC2 using:
sudo yum remove google-chrome-stable
Then I reinstalled it using:
curl https://intoli.com/install-google-chrome.sh | bash
sudo mv /usr/bin/google-chrome-stable /usr/bin/google-chrome
google-chrome --version && which google-chrome
If I try to run Chrome on the EC2 instance using /usr/bin/google-chrome, it does not work and displays the following error message:
ERROR:browser_main_loop.cc(1386)] Unable to open X display.
I don't know if it was working before as I have never used it this way. But it seems to be a problem.
I have read that this might be because there is no screen, and that I should use a package named xvfb. I tried to install it with the following command:
sudo yum install xorg-x11-server-Xvfb
I guess the package was correctly installed, but things are not working any better.
To sum up, I think the problem in my Python code is linked to the fact that Google Chrome is not working correctly, and this might be linked to xvfb. But I am not sure at all; it is just what I have tried so far.
Could you please help me? Thanks!
You can set up your cron job like this; it runs every 30 minutes:
*/30 * * * * export DISPLAY=:0 && <do whatever you want>
If this does not work and google-chrome or firefox is not found, run the command below in your shell (bash, fish, zsh, etc.) to get your PATH:
echo $PATH
Copy whatever that command prints and paste it above your cron job, like this:
PATH=<output of echo $PATH>
*/30 * * * * export DISPLAY=:0 && <your selenium script>
You can remove the export DISPLAY=:0 part if you want to run this in the background or make your driver headless.
The reason for doing this is that you may have installed the browser from snapd or another separate source, so its location is not on the PATH that cron uses.
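
To see what the cron job actually sees, a small diagnostic along these lines could be run both from your shell and from cron, and the two outputs compared. This is only a sketch: the function name `browser_environment` and the default list of binaries are assumptions, not part of Selenium or cron.

```python
import os
import shutil


def browser_environment(binaries=("google-chrome", "chromedriver", "firefox"), env=None):
    """Report DISPLAY, PATH and where each browser binary resolves.

    `env` defaults to the current process environment; passing a dict
    makes it easy to simulate the reduced environment of a cron job.
    """
    env = os.environ if env is None else env
    report = {"DISPLAY": env.get("DISPLAY"), "PATH": env.get("PATH", "")}
    for name in binaries:
        # shutil.which returns None when the binary is not on the given PATH.
        report[name] = shutil.which(name, path=report["PATH"])
    return report
```

Printing `browser_environment()` at the start of the cron-run script shows whether the browser is missing from PATH or whether DISPLAY is unset.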

nifi pyspark - "no module named boto3"

I'm trying to run a PySpark job I created that downloads and uploads data from S3 using the boto3 library. While the job runs fine in PyCharm, when I try to run it in NiFi using this template https://github.com/Teradata/kylo/blob/master/samples/templates/nifi-1.0/template-starter-pyspark.xml
the ExecutePySpark processor errors with "No module named boto3".
I made sure it was installed in my active conda environment.
Any ideas? I'm sure I'm missing something obvious.
Here is a picture of the nifi spark processor.
Thanks,
tim
The Python environment that PySpark runs in is configured via the PYSPARK_PYTHON variable.
Go to Spark installation directory
Go to conf
Edit spark-env.sh
Add this line: export PYSPARK_PYTHON=PATH_TO_YOUR_CONDA_ENV (this should point to the python executable inside that environment)
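
To confirm which interpreter a job is really using, and whether it can see boto3, a check like the following could be placed at the top of the PySpark script. It is a sketch only; `module_available` and `interpreter_report` are hypothetical helper names, and it inspects only the driver-side interpreter.

```python
import importlib.util
import sys


def module_available(name: str) -> bool:
    """Return True if `name` can be imported by this interpreter."""
    return importlib.util.find_spec(name) is not None


def interpreter_report(modules=("boto3",)):
    """Describe the running interpreter and which modules it can see."""
    return {
        "executable": sys.executable,  # compare this with PYSPARK_PYTHON
        "modules": {name: module_available(name) for name in modules},
    }
```

If the reported executable is not the one inside your conda environment, the processor is launching a different Python than the one where boto3 was installed.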

How to run scripts automatically after deployment in AWS using EB CLI?

I am trying to set up a Django server on AWS. My Django app depends on some mathematical Python libraries like numpy, scipy, sklearn, etc. However, there is an issue that forces me to do the following after every deployment:
sudo nano /etc/httpd/conf.d/wsgi.conf
---------------------------------------
add this line in the file
WSGIApplicationGroup %{GLOBAL}
---------------------------------------
sudo /etc/init.d/httpd reload
Basically I need "WSGIApplicationGroup %{GLOBAL}" in my wsgi.conf file, otherwise I get a 504. I am using a custom AMI built on top of Amazon Linux 2014, and I am using the EB CLI for deployment. However, whenever I deploy, wsgi.conf is reset and no longer contains the line I added, so I have to SSH into the EC2 instance and redo the task manually. This adds overhead to every deployment and is not feasible once we scale up (cloning or creating instances also resets it). Is there a way to have this done automatically after every deployment?
The content of wsgi.conf is fixed, so I can easily write a script to create it, but the issue is how to trigger the script automatically.
PS: I am new to AWS.
You need to use the AWS Elastic Beanstalk feature called .ebextensions: http://docs.aws.amazon.com/elasticbeanstalk/latest/dg/customize-containers-ec2.html
In your case you can't use the files or commands sections, because:
The commands are processed in alphabetical order by name, and they run
before the application and web server are set up and the application
version file is extracted.
You need to use the container_commands section:
They run after the application and web server have been set up and the
application version file has been extracted, but before the
application version is deployed.
Example .ebextensions/01wsgi.config (not tested :-))
container_commands:
  apache_reload:
    command: |
      echo "WSGIApplicationGroup %{GLOBAL}" >> /etc/httpd/conf.d/wsgi.conf
      /etc/init.d/httpd reload
Feel free to tweak my example as you like. For example, you can ship your own wsgi.conf to a temporary location and then replace the original with it in the container_commands section.
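
If you would rather call a script from the command than use a plain echo, a small helper like this keeps the edit idempotent, so the directive is not appended twice when the file happens to survive a deployment. This is a sketch; `ensure_line` is a hypothetical name, and the path and directive you pass in are the ones from the question.

```python
from pathlib import Path


def ensure_line(conf_path: str, line: str) -> bool:
    """Append `line` to the file unless it is already present.

    Returns True if the file was modified, False otherwise.
    """
    path = Path(conf_path)
    text = path.read_text() if path.exists() else ""
    if line in (existing.strip() for existing in text.splitlines()):
        return False
    # Start on a fresh line if the file does not end with a newline.
    prefix = "" if text == "" or text.endswith("\n") else "\n"
    with path.open("a") as handle:
        handle.write(prefix + line + "\n")
    return True
```

The script would call `ensure_line("/etc/httpd/conf.d/wsgi.conf", "WSGIApplicationGroup %{GLOBAL}")` and then reload Apache as in the example above.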

How do I run puppet agent inside a Docker container to build it out?

If I run a Docker container with CMD ["/usr/sbin/sshd", "-D"], I can keep the daemon running, which is good.
Then I want to run puppet agent too, to build out said container as, say, an Apache server.
Is it possible to do this and then expose the Apache server?
Here is another solution. We use the ENTRYPOINT Dockerfile instruction as described here: https://docs.docker.com/articles/dockerfile_best-practices/#entrypoint. With it you can start puppet agent and other services in the background before the instruction from CMD, or the command passed via docker run, is executed.
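
An entrypoint of that kind could be sketched as follows. This is an illustration, not the setup the answer actually used: the function name `run_entrypoint`, the `BACKGROUND` list and the puppet arguments are assumptions to adapt to your own configuration. The `spawn` and `replace` parameters exist only so the logic can be exercised without starting real processes.

```python
import os
import subprocess

# Assumed background services; adjust the command to your puppet setup.
BACKGROUND = [["puppet", "agent", "--no-daemonize"]]


def run_entrypoint(cmd_args, spawn=subprocess.Popen, replace=os.execvp):
    """Start background services, then hand control to the CMD arguments."""
    if not cmd_args:
        raise SystemExit("no command given")
    for service in BACKGROUND:
        # Popen returns immediately, so the service keeps running
        # in the background while we continue.
        spawn(service)
    # Replace this process with the command from CMD or docker run.
    replace(cmd_args[0], cmd_args)
```

A real entrypoint script would end with `run_entrypoint(sys.argv[1:])` (after importing sys), so that Docker passes the CMD or docker run arguments through to it.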