Scrapy + Splash (Docker) Issue

Scrapy + Splash (Docker) Issue - scrapy

I have scrapy and scrapy-splash set up on a AWS Ubuntu server. It works fine for a while, but after a few hours I'll start getting error messages like this;
Traceback (most recent call last):
File "/home/ubuntu/.local/lib/python3.5/site-
packages/twisted/internet/defer.py", line 1384, in _inlineCallbacks
result = result.throwExceptionIntoGenerator(g)
File "/home/ubuntu/.local/lib/python3.5/site-
packages/twisted/python/failure.py", line 393, in throwExceptionIntoGe
nerator
return g.throw(self.type, self.value, self.tb)
File "/home/ubuntu/.local/lib/python3.5/site-
packages/scrapy/core/downloader/middleware.py", line 43, in process_re
quest
defer.returnValue((yield download_func(request=request,spider=spider)))
twisted.internet.error.ConnectionRefusedError: Connection was refused by
other side: 111: Connection refused.
I'll find that the splash process in docker has either terminated, or is unresponsive.
I've been running the splash process with;
sudo docker run -p 8050:8050 scrapinghub/splash
as per the scrapy-splash instructions.
I tried starting the process in a tmux shell to make sure the ssh connection is not interfering with the splah process, but no luck.
Thoughts?

You should run the container with --restart and -d options. See the documentation how to run Splash in production.

Related

pexpect error when trying to scp in a nohup process

I have an scp function as follows:
def scp_to38():
child = pexpect.spawn(f"scp {tar_file_path} bbmprd#10.82.20.38:/app2/upload/")
child.expect(f"bbmprd#10.82.20.38's password:")
child.sendline("password")
child.interact()
print(f'{tar_file_path} SENT TO 38:/app2/upload/')
and it is throwing me an error saying a problem with child.interact()
Traceback (most recent call last):
File "scan_for_upload.py", line 39, in <module>
scp_to38()
File "scan_for_upload.py", line 26, in scp_to38
child.interact()
File "/app/anaconda3/envs/python37-1/lib/python3.7/site-packages/pexpect/pty_spawn.py", line 788, in interact
mode = tty.tcgetattr(self.STDIN_FILENO)
termios.error: (25, 'Inappropriate ioctl for device')
I am running the script in a nohup process, I believe that is causing this issue. Because when I run the script in the console, it works fine. How do I interact with the scp password in a nohup process using pexpect module? (I can't install any other external modules in my environment)

interact() is for manual interaction with the spawned process and it requires to be running on a tty but nohup provides no tty. You can just replace child.interact() with child.expect(pexpect.EOF, timeout=None).

Accessing Kaggle tools in VM by mounting key

I am trying to use kaggle command line tool and I am running into problems with using it inside my own vm. I downloaded the API token from the site and placed it in /.kaggle/kaggle.json on windows. My vm has ubuntu installed and in the Vagrant file I have the following:
config.vm.synced_folder ENV['HOME'] + "/.kaggle", "/home/ubuntu/.kaggle", mount_options: ['dmode=700,fmode=700']
config.vm.provision "shell", inline: <<-SHELL
echo "export KAGGLE_CONFIG_DIR='/home/ubuntu/.kaggle/kaggle.json'" >> /etc/profile.d/myvar.sh
SHELL
when running env command in the vm I see it is correct:
KAGGLE_CONFIG_DIR=/home/ubuntu/.kaggle/kaggle.json
However, when I try to use the kaggle command for example kaggle -h I get the the following
(main) vagrant#dev:/home/ubuntu/.kaggle$ ls
kaggle.json
(main) vagrant#dev:/home/ubuntu/.kaggle$ kaggle -h
Traceback (most recent call last):
File "/user/home/venvs/main/bin/kaggle", line 5, in <module>
from kaggle.cli import main
File "/user/home/venvs/main/lib/python3.7/site-packages/kaggle/__init__.py", line 23, in <module>
api.authenticate()
File "/user/home/venvs/main/lib/python3.7/site-packages/kaggle/api/kaggle_api_extended.py", line 149, in authenticate
self.config_file, self.config_dir))
OSError: Could not find kaggle.json. Make sure it's located in /home/ubuntu/.kaggle/kaggle.json. Or use the environment method.
The paths are all correct and the file is where it should be looking for it. Anyone know what the issue could be? Is it because it is mounted?

Alright, I misread the instructions: "You can define a shell environment variable KAGGLE_CONFIG_DIR to change this location to $KAGGLE_CONFIG_DIR/kaggle.json"
So the env variable should be /home/ubuntu/.kaggle/ instead of /home/ubuntu/.kaggle/kaggle.json.

Cannot start GlassFish server via Terminal - No handler was ready to authenticate

I download the zip file of GlassFish 4.1.1, after extract it, I use Terminal to start the server using asadmin start-domain command. It give me this error:
Traceback (most recent call last):
File "/usr/local/bin/asadmin", line 260, in <module> autoscale = boto.connect_autoscale()
File "/Library/Python/2.7/site-packages/boto/__init__.py", line 208, in connect_autoscale**kwargs)
File "/Library/Python/2.7/site-packages/boto/ec2/autoscale/__init__.py", line 115, in __init__profile_name=profile_name)
File "/Library/Python/2.7/site-packages/boto/connection.py", line 1100, in __init__provider=provider)
File "/Library/Python/2.7/site-packages/boto/connection.py", line 569, in __init__host, config, self.provider, self._required_auth_capability())
File "/Library/Python/2.7/site-packages/boto/auth.py", line 997, in get_auth_handler 'Check your credentials' % (len(names), str(names))) boto.exception.NoAuthHandlerFound: No handler was ready to authenticate. 1 handlers were checked. ['HmacAuthV4Handler'] Check your credentials
I'm using MacOS Sierra 10.12.2, anyone know how to fix that error?

The problem here is that you have the boto Python AWS command line utilities installed. One of those utilities is called asadmin and your shell thinks you mean to call the asadmin (AWS autoscaling admin) command, rather than the GlassFish asadmin file.
After you extract GlassFish, you need to reference the asadmin file that comes with GlassFish, so start the domain as follows:
glassfish4/bin/asadmin start-domain

I get an Input/Output error when trying to install a VOLTTRON agent

SSH into a VOLTTRON instance, installing agents works. Log out, log back in and installing results in the following error:
2016-09-13 11:46:24,409 () volttron.platform.vip.agent.subsystems.rpc ERROR: unhandled exception in JSON-RPC method 'install_agent':
Traceback (most recent call last):
File "/home/volttron/volttron/volttron/platform/vip/agent/subsystems/rpc.py", line 168, in method
return method(*args, **kwargs)
File "/home/volttron/volttron/volttron/platform/control.py", line 287, in install_agent
agent_uuid = self._aip.install_agent(path, vip_identity=vip_identity)
File "/home/volttron/volttron/volttron/platform/aip.py", line 296, in install_agent
unpack(agent_wheel, dest=agent_path)
File "/home/volttron/volttron/env/local/lib/python2.7/site-packages/wheel/tool/__init__.py", line 135, in unpack
sys.stderr.write("Unpacking to: %s\n" % (destination))
IOError: [Errno 5] Input/output error

When any background process is disowned, the ssh session is terminated, stdeff and stdout are not redirected to /dev/null, and the process tries to write to either it results in an IOError.
In this case one of the third party libraries that VOLLTRON uses when installing an agent tries to write to stderr (much to our chagrin). Even if the platform is run with the -l option it will still occasionally write to stderr. Unfortunately there is no reliable way for VOLTTRON to do the right thing with stderr in all cases so we have to leave it up to the user to know when they need to redirect output to /dev/null.
To run in the background use start-stop-daemon which automatically redirects everything to /dev/null or use this command to start the platform:
volttron -vv -l volttron.log > /dev/null 2>&1&
You can then safely disown the process and logout. Installations will still work.

unable to connect to remote host with rabbitmqadmin

I'm trying to connect to a remote rabbitmq host using the cli rabbitmqadmin.
The command I'm trying to execute is:
rabbitmqadmin --host=$RABBITMQ_HOST --port=443 --ssl --vhost=$RABBITMQ_VHOST --username=$RABBITMQ_USERNAME --password=$RABBITMQ_PASSWORD list queues
Before you ask: the environmental variables RABBITMQ_HOST, RABBITMQ_VHOST and so on are set... I double and triple checked this already.
The error I get back is:
Traceback (most recent call last):
File "/usr/local/sbin/rabbitmqadmin", line 1007, in <module>
main()
File "/usr/local/sbin/rabbitmqadmin", line 413, in main
method()
File "/usr/local/sbin/rabbitmqadmin", line 588, in invoke_list
format_list(self.get(uri), cols, obj_info, self.options)
File "/usr/local/sbin/rabbitmqadmin", line 436, in get
return self.http("GET", "%s/api%s" % (self.options.path_prefix, path), "")
File "/usr/local/sbin/rabbitmqadmin", line 475, in http
self.options.port)
File "/usr/local/sbin/rabbitmqadmin", line 451, in __initialize_https_connection
context = self.__initialize_tls_context())
File "/usr/local/sbin/rabbitmqadmin", line 467, in __initialize_tls_context
self.options.ssl_key_file)
TypeError: coercing to Unicode: need string or buffer, NoneType found
From the last line I assume it's a python related problem, my current python version is 2.7.12, if I try to connect to the local instance of rabbitmq with
rabbitmqadmin list queues
everything works fine. Any help is greatly appreciated thanks :)

shouldn't those env vars have a $ in front of them, and the params without =?
rabbitmqadmin --host $RABBITMQ_HOST --port 443 --ssl --vhost $RABBITMQ_VHOST --username $RABBITMQ_USERNAME --password $RABBITMQ_PASSWORD list queues`
maybe the = doesn't matter, but i'm pretty sure you need $ in front of the env vars

Validate that you are using the same rabbitmqadmin version as the version of your remote hosted broker. Using a mismatching rabbitmqadmin version will result in that error (for example rabbitmqadmin 3.6.4 querying a 3.5.7 server).
Browse to http://server-name:15672/cli/ and download correct tool from there.
https://github.com/rabbitmq/rabbitmq-management/issues/299

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

Scrapy + Splash (Docker) Issue - scrapy

You should run the container with --restart and -d options. See the documentation how to run Splash in production.

Related

pexpect error when trying to scp in a nohup process

Accessing Kaggle tools in VM by mounting key

Cannot start GlassFish server via Terminal - No handler was ready to authenticate

I get an Input/Output error when trying to install a VOLTTRON agent

unable to connect to remote host with rabbitmqadmin

Categories

Resources