I have scrapy and scrapyd installed on a Debian machine. I log in to this server using an SSH tunnel. I then start scrapyd by running:
scrapyd
Scrapyd starts up fine. I then open another SSH tunnel to the server and schedule my spider with:
curl localhost:6800/schedule.json -d project=myproject -d spider=myspider
The spider runs nicely and everything is fine.
The problem is that scrapyd stops running when I quit the session where I started it. This prevents me from using cron to schedule spiders with scrapyd, since scrapyd isn't running when the cron job fires.
My simple question is: how do I keep scrapyd running so that it doesn't shut down when I quit the SSH session?
Run it in a screen session:
$ screen
$ scrapyd
# hit ctrl-a, then d to detach from that screen
$ screen -r # to re-attach to your scrapyd process
You might consider launching scrapyd with supervisor.
There is a good example supervisord config available here:
https://github.com/JallyHe/scrapyd/blob/master/supervisord.conf
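A minimal program section might look like this (a sketch; the directory and log paths are assumptions to adjust for your setup):

[program:scrapyd]
command=scrapyd
directory=/home/user/myproject   ; assumption: your project directory
autostart=true
autorestart=true
redirect_stderr=true
stdout_logfile=/var/log/scrapyd.log

With supervisor managing the process, scrapyd survives SSH logouts and is restarted automatically if it crashes.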
How about this?
$ sudo service scrapyd start
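If the scrapyd package installed an init script at /etc/init.d/scrapyd, you can also enable it at boot (a sketch, assuming a Debian-style setup with update-rc.d):

$ sudo update-rc.d scrapyd defaults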
Related
I created a GCP VM (Ubuntu) and installed Python and Scrapy.
I would like to run my spider from there: scrapy crawl test -o test1.csv
I opened the terminal from GCP and ran the spider; it worked, but it will take at least 3 hours.
How can I make sure the script continues after I exit the (browser) terminal?
You can use nohup to make sure the crawling continues:
nohup scrapy crawl test -o test1.csv &
When you log off, the crawler will continue until it finishes. The & at the end makes the process execute in the background.
To redirect the output to a log file, you can execute it as follows:
nohup scrapy crawl test -o test1.csv &> test.log &
For a better way to run and deploy spiders on a server, you can check out scrapyd.
You can create a run.py file in the spiders directory.
The file's contents:
from scrapy.cmdline import execute

# equivalent to running "scrapy crawl test -o test1.csv" from the shell
execute(['scrapy', 'crawl', 'test', '-o', 'test1.csv'])
After that, run:
nohup python -u run.py > spider_house.log 2>&1 &
If logging is configured inside the crawler (for example via Scrapy's LOG_FILE setting), the log will be written according to that configuration, and the output redirected by nohup will not be used.
If a resumable crawl is configured with the JOBDIR= parameter, you can gracefully pause the crawler so that the next run resumes from where it stopped. To close the crawler gracefully, send it SIGINT (signal 2):
kill -2 <pid>
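Putting it together, a resumable background crawl might be started like this (a sketch; the JOBDIR path crawls/test-1 is an assumption):

nohup scrapy crawl test -s JOBDIR=crawls/test-1 -o test1.csv > spider.log 2>&1 &

Sending SIGINT once, as above, lets Scrapy flush its state to the JOBDIR, so running the same command again resumes the crawl.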
When I run Redis with redis-server CONFIG_FILE, the process runs in the background. If I run it without the CONFIG_FILE parameter, it does not run in the background. How can I make it run in the foreground with a configuration file? This is useful when running the command in a Docker container: the container stops if the process daemonizes into the background.
Try setting daemonize to 'no' in your CONFIG_FILE.
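For example (assuming your config file lives at /etc/redis/redis.conf):

# in /etc/redis/redis.conf
daemonize no

You can also override the setting when starting the server, since redis-server accepts config directives as command-line arguments:

redis-server /etc/redis/redis.conf --daemonize no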
All,
I have successfully installed my ServiceStack console app on my DigitalOcean droplet and can run it from the command line using mono. When I do this, my app is accessible using Postman from my laptop.
I have also tried to use Upstart to run my app as a daemon. I can see from the logging that it successfully launches when I reboot, but unless I am logged in as root and have started my console app from the command line, I can't access the console app from the outside when running as the daemon. I have tried this with ufw enabled (configured to allow the port I am using) and disabled and it makes no difference.
I am reasonably certain this is a permissions issue in my upstart config file for my console app, but since I am brand new to Linux, I am unclear as to my next step to get this console app available as a daemon.
Any and all help is greatly appreciated...
Bruce
# ServiceStack GeoAPIConsole Application
# description "GeoAPIConsole"
# author "Bruce Parr"
setuid root
# start on started rc
start on started networking
stop on stopping rc
respawn
exec start-stop-daemon --start --exec /usr/bin/mono /var/console/GeoAPIConsole.exe
This worked. I added a user geoapiconsole and added the -S and -c switches, then started it with initctl start GeoAPIConsole
# ServiceStack Example Application
description "ServiceStack Example"
author "ServiceStack"
start on started rc
stop on stopping rc
respawn
exec start-stop-daemon -S -c geoapiconsole --exec /usr/bin/mono /var/console/GeoAPIConsole.exe
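For reference, the unprivileged user can be created like this (a sketch; -r makes it a system account and /bin/false blocks interactive logins):

sudo useradd -r -s /bin/false geoapiconsole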
I can't seem to automatically start my celeryd script, located at /etc/init.d/celeryd, every time my Amazon Linux AMI 2013.03.1 machine boots. I have to run /etc/init.d/celeryd start manually. Once started, though, it works right away.
Any ideas? I tried
sudo chkconfig /etc/init.d/celeryd on
You need to write a simple startup script:
Create a file called celeryd.sh
vim /etc/init.d/celeryd.sh
Inside that file:
#!/bin/sh
## Starts celeryd on boot ##
/etc/init.d/celeryd start
Make it executable:
chmod +x /etc/init.d/celeryd.sh
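To make the script actually run at boot, reference it from rc.local (a sketch; Amazon Linux executes /etc/rc.local at the end of startup):

echo "/etc/init.d/celeryd.sh" >> /etc/rc.local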
Done.
You can run init 6 (reboot) to test whether it works.
More on this: http://www.cyberciti.biz/tips/linux-how-to-run-a-command-when-boots-up.html
I have the following Procfile that I use with foreman to do development work for a Heroku site:
web: gunicorn project_name.wsgi -b 0.0.0.0:$PORT
worker: python manage.py rqworker default
redis: redis-server
Everything worked great until I added the redis line. While the app runs fine, I cannot kill foreman with control-c -- it just keeps running. The only way I can kill foreman is by killing the redis-server process.
How can I get foreman to respond (and stop) to the control-c?
This usually happens because redis or memcached won't shut down. So I have just created a script that I run to kill the development environment. Currently it is:
#!/bin/bash
# shut down the Redis server cleanly
redis-cli SHUTDOWN
# kill any running memcached processes
killall memcached
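Make it executable and run it whenever foreman hangs on control-c (the filename kill_dev.sh is an assumption):

chmod +x kill_dev.sh
./kill_dev.sh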