running python program on virtual machine - scrapy

I created a GCP VM (Ubuntu) and installed Python and Scrapy.
I would like to run my spider from there with: scrapy crawl test -o test1.csv
I opened the terminal from the GCP console and ran the spider, which worked, but the crawl will take at least 3 hours.
How can I make sure the script keeps running after I exit the terminal (browser)?

You can use nohup to make sure the crawling continues:
nohup scrapy crawl test -o test1.csv &
When you log off, the crawler will continue until it finishes. The & at the end makes the process run in the background.
To redirect the output to a log file, you can execute it as follows:
nohup scrapy crawl test -o test1.csv &> test.log &
For a more managed way to run and deploy spiders on a server, you can check out scrapyd.
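If you would rather start the crawl from Python than from the shell, here is a minimal sketch of roughly the same idea as the nohup line above. It assumes a POSIX system (start_new_session is not meaningful on Windows); the function name launch_detached and the log path are just illustrative, not anything Scrapy provides.

```python
import subprocess


def launch_detached(cmd, log_path):
    """Start cmd in its own session, appending stdout and stderr to log_path."""
    log_file = open(log_path, "ab")
    try:
        proc = subprocess.Popen(
            cmd,
            stdin=subprocess.DEVNULL,   # no terminal input
            stdout=log_file,            # like "> test.log"
            stderr=subprocess.STDOUT,   # like "2>&1"
            start_new_session=True,     # setsid(): detach from the terminal's session
        )
    finally:
        log_file.close()  # the child keeps its own copy of the descriptor
    return proc


# Hypothetical usage, mirroring the command from the question:
# launch_detached(["scrapy", "crawl", "test", "-o", "test1.csv"], "test.log")
```

Because the child runs in a new session, closing the browser terminal should not deliver a hangup signal to it, but I have not tested this on the GCP browser SSH client specifically.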

You can create a run.py file in the spiders directory with the following contents:
from scrapy.cmdline import execute
execute(['scrapy', 'crawl', 'test', '-o', 'test1.csv'])
After that, run:
nohup python -u run.py > spider_house.log 2>&1 &
If logging is configured inside the crawler, the log is written according to that configuration, and the log file set up through nohup is not used.
If a resumable crawl is configured with the JOBDIR= setting, stop the crawler gracefully so that the next run resumes from where the last one stopped. To do that, send it SIGINT (signal 2), replacing pid with the crawler's process ID:
kill -2 pid
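The same graceful stop can be sent from Python. This is only a sketch: request_graceful_stop is a made-up helper name, and you still have to find the crawler's PID yourself (for example with ps).

```python
import os
import signal


def request_graceful_stop(pid):
    """Send SIGINT (signal 2) to pid, the same signal Ctrl-C delivers."""
    os.kill(pid, signal.SIGINT)


# Hypothetical usage, with 12345 standing in for the crawler's real PID:
# request_graceful_stop(12345)
```

Send the signal once and give the crawler time to finish its in-flight requests; as far as I know, a second SIGINT makes Scrapy shut down immediately, which defeats the purpose of the pause.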

Related

How can I run scrapy background in ubuntu?

I developed a spider with Scrapy and run it with the command
scrapy crawl myspider
Now I am trying to run it in the background with the command:
nohup scrapy crawl myspider &
but after I close the SSH session, Scrapy stops. Why?
My guess is this has nothing to do with Scrapy.
Does this suggested solution work?
https://unix.stackexchange.com/questions/658535/what-is-the-real-reason-why-nohup-a-out-dies-when-ssh-session-times-out

Is it possible to run local servers on AWS-CodeBuild?

Good morning,
I'm using CodeBuild to test my application,
and I was wondering if it's possible to run a local server inside a build.
I created an NPM script to start a local server, but every time I run the tests, CodeBuild passes through the command without waiting.
I searched the AWS documentation, which says to use the "nohup" command, but it doesn't work for me.
Just to be clear, my expectation is that CodeBuild runs the command, waits for it to finish, and proceeds to the next command without closing the open server.
Do any of you have an idea?
Command:
- nohup yarn start-server
Start a background process and wait for it to complete later:
nohup sleep 30 & echo $! > pidfile
…
wait $(cat pidfile)
Start a background process and do not wait for it to ever complete:
nohup sleep 30 & disown $!
Start a background process and kill it later:
nohup sleep 30 & echo $! > pidfile
…
kill $(cat pidfile)
https://docs.aws.amazon.com/codebuild/latest/userguide/build-env-ref-background-tasks.html
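If the build already has Python available, the "start in the background, do something else, kill it later" pattern can also be sketched with subprocess. The helper name run_with_background is made up, and "yarn test" in the usage comment is an assumed test command, not something from the question. Note that this does not wait for the server to become ready; you would still need a sleep or a port check before the tests.

```python
import subprocess


def run_with_background(server_cmd, task_cmd, stop_timeout=10):
    """Run task_cmd while server_cmd runs in the background, then stop the server.

    Returns (task_returncode, server_returncode).
    """
    server = subprocess.Popen(server_cmd)   # returns immediately, like "cmd &"
    try:
        task_rc = subprocess.run(task_cmd).returncode   # blocks until the task ends
    finally:
        server.terminate()                  # SIGTERM, like "kill $(cat pidfile)"
        try:
            server.wait(timeout=stop_timeout)
        except subprocess.TimeoutExpired:
            server.kill()                   # escalate if SIGTERM was ignored
            server.wait()
    return task_rc, server.returncode


# Hypothetical usage:
# run_with_background(["yarn", "start-server"], ["yarn", "test"])
```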

Keep scrapyd running

I have scrapy and scrapyd installed on a Debian machine. I log in to this server using an SSH tunnel. I then start scrapyd by running:
scrapyd
Scrapyd starts up fine, and I then open another SSH tunnel to the server and schedule my spider with:
curl localhost:6800/schedule.json -d project=myproject -d spider=myspider
The spider runs nicely and everything is fine.
The problem is that scrapyd stops running when I quit the session where I started it. This prevents me from using cron to schedule spiders with scrapyd, since scrapyd isn't running when the cron job is launched.
My simple question is: how do I keep scrapyd running so that it doesn't shut down when I quit the SSH session?
Run it in a screen session:
$ screen
$ scrapyd
# hit ctrl-a, then d to detach from that screen
$ screen -r # to re-attach to your scrapyd process
You might consider launching scrapyd with supervisor.
And there is a good .conf script available as a gist here:
https://github.com/JallyHe/scrapyd/blob/master/supervisord.conf
How about starting it as a service?
$ sudo service scrapyd start
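Once scrapyd stays up, the cron job from the question can schedule the spider from Python instead of curl. A minimal sketch, assuming scrapyd listens on localhost:6800 as in the question; the function names are illustrative only.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_schedule_request(project, spider, base_url="http://localhost:6800"):
    """Build the POST request that the curl command in the question sends."""
    body = urlencode({"project": project, "spider": spider}).encode("ascii")
    return Request(base_url + "/schedule.json", data=body, method="POST")


def schedule_spider(project, spider, base_url="http://localhost:6800"):
    """Send the request and return scrapyd's raw response body as text."""
    request = build_schedule_request(project, spider, base_url)
    with urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8")


# Hypothetical usage from a cron-driven script:
# print(schedule_spider("myproject", "myspider"))
```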

running same script over many machines

I have set up a few EC2 instances, which all have a script in the home directory. I would like to run the script simultaneously across each EC2 instance, i.e. without going through a loop.
I have seen csshX for OS X for interactive terminal usage, but I was wondering what the command-line code is to execute commands like
ssh user@ip.address . test.sh
to run the test.sh script across all instances, since
csshX user@ip.address.1 user@ip.address.2 user@ip.address.3 . test.sh
does not work.
I would like to do this from the command line, as I would like to automate this process by adding it to a shell script.
And for bonus points: if there is a way to send a message back to the machine sending the command that it has finished running the script, that would be fantastic.
Will it be good enough to have a master shell script that runs all these things in the background? E.g.,
#!/bin/sh
pidlist="ignorethis"
for ip in ip1 ip2
do
    ssh user@$ip . test.sh &
    pidlist="$pidlist $!" # get the process number of the last forked process
done
# Now all processes are running on the remote machines, and we want to know
# when they are done.
# (EDIT) It's probably better to use the 'wait' shell built-in; that's
# precisely what it seems to be for.
while true
do
    sleep 1
    alldead=true
    for pid in $pidlist
    do
        if kill -0 $pid > /dev/null 2>&1
        then
            alldead=false
            echo some processes alive
            break
        fi
    done
    if $alldead
    then
        break
    fi
done
echo all done.
It will not be exactly simultaneous, but it should kick off the remote scripts in parallel.
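For comparison, here is the same "start everything, then wait" idea sketched in Python, which also covers the bonus question because each exit code comes back to the calling machine. The helper names are made up, and the user, hosts, and remote command are placeholders you would replace with your own.

```python
import subprocess


def run_parallel(commands):
    """Start every command at once, then wait for all; return exit codes in order."""
    procs = [subprocess.Popen(cmd) for cmd in commands]   # all started before any wait
    return [proc.wait() for proc in procs]


def ssh_commands(hosts, user, remote_command):
    """Build one ssh argument list per host (user and hosts are placeholders)."""
    return [["ssh", f"{user}@{host}", remote_command] for host in hosts]


# Hypothetical usage:
# codes = run_parallel(ssh_commands(["ip1", "ip2"], "user", ". test.sh"))
# print("all done, exit codes:", codes)
```

An exit code of 0 from ssh normally means the remote command succeeded, so a non-zero entry tells you which host needs a second look.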

Run a php script in background on debian (Apache)

I'm trying to make push notifications work on my Debian VPS (apache2, mysql).
I use a php script from this tutorial (http://www.raywenderlich.com/3525/apple-push-notification-services-tutorial-part-2).
Basically, the script runs an infinite loop that checks a mysql table for new records every couple of seconds. The tutorial says it should be run as a background process.
// This script should be run as a background process on the server. It checks
// every few seconds for new messages in the database table push_queue and
// sends them to the Apple Push Notification Service.
//
// Usage: php push.php development &
So I have four questions.
1 - How do I start the script from the terminal? What should I type? The script location on the server is:
/var/www/development_folder/scripts/push2/push.php
2 - How can I kill it if I need to (without having to restart Apache)?
3 - Since the push notification is essential, I need a way to check if the script is running.
The code (from the tutorial) calls a function if something goes wrong:
function fatalError($message)
{
    writeToLog('Exiting with fatal error: ' . $message);
    exit;
}
Maybe I can put something in there to restart the script? But it would also be nice to have a cron job or something that checks every 5 minutes or so whether the script is running, and starts it if it isn't.
4 - Can I make the script start automatically after an Apache or MySQL restart, for example if the server crashes or something else happens that requires an Apache restart?
Thanks a lot in advance
You could run the script with the following command:
nohup php /var/www/development_folder/scripts/push2/push.php > /dev/null &
nohup means that the command should not quit when you, for example, close your terminal window (it ignores the hangup signal). If you don't care about this, you could just start the process with "php /var/www/development_folder/scripts/push2/push.php &" instead. PS: by default nohup logs the script output to a file called nohup.out; if you do not want this, just add > /dev/null as I've done here. The & at the end means that the process will run in the background.
I would only recommend starting the push script like this while you test your code. The script should be run as a daemon at system-startup instead (see 4.) if it's important that it runs all the time.
Just type
ps ax | grep push.php
and you will get the process ID (PID). It will look something like this:
4530 pts/3 S 0:00 php /var/www/development_folder/scripts/push2/push.php
The pid is the first number you'll see. You can then run the following command to kill the script:
kill -9 4530
If you run ps ax | grep push.php again the process should now be gone.
I would recommend that you make a cronjob that checks if the php-script is running, and if not, starts it. You could do this with ps ax and grep checks inside your shell script. Something like this should do it:
if ! ps ax | grep -v grep | grep 'push.php' > /dev/null
then
nohup php /var/www/development_folder/scripts/push2/push.php > /dev/null &
else
echo "push-script is already running"
fi
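If grepping ps output feels fragile, an alternative (a different technique from the ps/grep check above) is to have the launcher write the PID to a pidfile and probe it with signal 0, which checks for existence without actually signalling the process. This is a sketch only: the helper names are made up, the pidfile path is hypothetical, and push.php does not write a pidfile by itself.

```python
import os


def is_running(pid):
    """Return True if a process with this PID exists (signal 0 sends nothing)."""
    if pid is None or pid <= 0:
        return False
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False      # no such process
    except PermissionError:
        return True       # exists, but belongs to another user
    return True


def pid_from_file(path):
    """Read a PID from a pidfile; return None if the file is missing or malformed."""
    try:
        with open(path) as handle:
            return int(handle.read().strip())
    except (OSError, ValueError):
        return None


# Hypothetical usage in a cron-driven watchdog:
# if not is_running(pid_from_file("/var/run/push.pid")):
#     ...restart the script and rewrite the pidfile...
```

Keep in mind that PIDs get reused, so after a reboot a stale pidfile can point at an unrelated process.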
If you want the script to start after the system boots, you could make a file in /etc/init.d (e.g. /etc/init.d/mypushscript) with something like this inside:
php /var/www/development_folder/scripts/push2/push.php
(You should probably have a lot more in this file.)
You would also need to run the following commands:
chmod +x /etc/init.d/mypushscript
update-rc.d mypushscript defaults
to make the script start at boot-time. I have not tested this so please do more research before making your own init script!