Setting up a crontab for Scrapy

I am trying to set up a crontab entry to scrape something. So far, I have written:
23 18 * * * cd PycharmProjects/untitled/Project1 && scrapy crawl xx -o test.csv
But when I do that I get this:
/bin/sh: scrapy: command not found.
What should I do?
I tried to locate scrapy on my Mac but couldn't find it, yet I am able to run the second part of the crontab task from the terminal.

Since cron runs jobs with a minimal PATH rather than the one from your shell, it doesn't know where scrapy is.
The easiest remedy is to use the full path to scrapy:
$ which scrapy
/usr/bin/scrapy
Then use that instead of just scrapy:
23 18 * * * cd PycharmProjects/untitled/Project1 && /usr/bin/scrapy crawl xx -o test.csv
Another way is to set the PATH environment variable in your crontab:
PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin
# or your custom path; check your `.bashrc` for the PATH you have set in your shell
23 18 * * * cd PycharmProjects/untitled/Project1 && scrapy crawl xx -o test.csv
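Before installing a job, it can help to reproduce cron's sparse environment from a terminal. The sketch below assumes a Vixie-style cron that starts jobs in your home directory with SHELL=/bin/sh and PATH=/usr/bin:/bin (exact defaults vary between cron implementations); the function name cron_like is hypothetical.

```shell
#!/bin/bash
# cron_like 'COMMAND LINE': run a command line roughly the way cron would.
# Assumed cron defaults: start in $HOME, SHELL=/bin/sh, PATH=/usr/bin:/bin.
cron_like() {
    # env -i discards the inherited environment; only the variables listed here survive.
    env -i HOME="$HOME" SHELL=/bin/sh PATH=/usr/bin:/bin \
        /bin/sh -c "cd \"\$HOME\" && $1"
}
```

For example, cron_like 'cd PycharmProjects/untitled/Project1 && scrapy crawl xx -o test.csv' should fail with the same kind of "not found" error as the real job, while the version using the full path to scrapy should get past that error.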
Side note:
It's also very common to wrap the command in a script that sets PATH and any other configuration, and to call that script from cron instead of calling the commands directly.
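A minimal sketch of such a wrapper for the Scrapy job, assuming bash. The script name run-crawl.sh, the function cron_run and the PATH list are illustrative choices, not anything Scrapy or cron requires; add the directory that `which scrapy` reported if it is not already listed.

```shell
#!/bin/bash
# run-crawl.sh (hypothetical name): a cron-safe wrapper with its own PATH.
# Assumption: scrapy lives in one of these directories; extend the list as needed.
export PATH=/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin

# cron_run DIR CMD [ARGS...]: run CMD inside DIR, with a clear error if CMD is missing.
cron_run() {
    local dir=$1
    shift
    if ! command -v "$1" >/dev/null 2>&1; then
        echo "cron_run: '$1' not found in PATH=$PATH" >&2
        return 127
    fi
    # The subshell keeps the cd from affecting the rest of the script.
    ( cd "$dir" && "$@" )
}
```

The wrapper would end with a call such as cron_run "$HOME/PycharmProjects/untitled/Project1" scrapy crawl xx -o test.csv, and the crontab entry then only needs the script's full path, for example 23 18 * * * /path/to/run-crawl.sh. The explicit "not found" message ends up in cron's mail, which is easier to act on than a bare shell error.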

Related

CasperJS and cronjob

So I have PhantomJS and CasperJS installed and everything is working fine, but when I try to add my CasperJS file to a cron job (Ubuntu), I get this error:
/bin/sh: 1: /usr/local/bin/casperjs: not found
My crontab file:
0 */1 * * * PHANTOMJS_EXECUTABLE=/usr/local/bin/phantomjs
/usr/local/bin/casperjs /usr/local/share/casper-test/test.js 2>&1
Any ideas what's wrong?
A crontab entry has to fit on one line. If you want to use several commands on one line, you have to separate them with a semicolon, and export the variable so that casperjs actually sees it:
0 */1 * * * export PHANTOMJS_EXECUTABLE=/usr/local/bin/phantomjs ; /usr/local/bin/casperjs /usr/local/share/casper-test/test.js 2>&1
Or, if you need to run the commands sequentially and only move on when the previous one has succeeded, use the && operator.
For better readability you could just put those commands in a shell script and run that from cron.
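The shell rules behind this answer can be checked directly, since cron hands each line to /bin/sh. The sketch below assumes printenv is available (it is in GNU coreutils and on macOS) and uses a throwaway variable name, DEMO_VAR. It also shows a detail that matters for PHANTOMJS_EXECUTABLE: an assignment followed by ; only sets a shell variable, so a program started afterwards does not see it unless the variable is exported or the assignment is written as a prefix of the command.

```shell
#!/bin/bash
# Each line hands a command line to /bin/sh, the same way cron does.

# ';' runs the next command whatever the previous one returned:
sh -c 'false ; echo "ran after ;"'               # prints: ran after ;
# '&&' runs the next command only if the previous one succeeded:
sh -c 'false && echo "ran after &&"' || true     # prints nothing
# A plain assignment before ';' is not passed to child processes:
sh -c 'DEMO_VAR=x ; printenv DEMO_VAR' || true   # prints nothing
# An exported variable is passed on:
sh -c 'export DEMO_VAR=x ; printenv DEMO_VAR'    # prints: x
# So is an assignment written as a prefix of the command itself:
sh -c 'DEMO_VAR=x printenv DEMO_VAR'             # prints: x
```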

Create a backup of a database every day using cron

I have this command, which creates a backup of my database:
pg_dump -U dbadmin -h 127.0.0.1 123telcom -f dbbackup
Now I want to create a backup every night.
Is there a way to run this command with crontab?
0 3 * * * pg_dump -U dbadmin -h 127.0.0.1 123telcom -f dbbackup
I'm new to PuTTY, so if anyone could help me a little, that would be great.
I suspect that you have fallen foul of cron's PATH set up.
If you look in /etc/crontab, you will see that it defines its own PATH, which is probably different from the PATH set up for your login.
Create your script with these first two lines:
#!/bin/bash
PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin
where PATH includes whatever is set up in your login environment, and make sure the script is executable (chmod +x).
To see what is going on, try this script:
#!/bin/bash
PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin
echo $PATH >> /home/yourhome/cron.txt
Create an entry in /etc/crontab (entries there include the user to run as, here root):
* * * * * root /home/yourhome/yourshell.sh
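On most Linux systems cron notices changes to /etc/crontab by itself (it checks the file's modification time every minute), so you normally don't need to restart it. Alternatively, add the same line without the root field to root's own crontab with sudo crontab -e, then save and exit (Ctrl+O and Ctrl+X in the nano editor).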
Then check the cron.txt file to see what it is using for PATH.
PS: Don't forget to remove this entry from the crontab afterwards.
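A slightly extended version of the test script records a timestamp, the PATH and SHELL that cron provides, and where (if anywhere) pg_dump would be found. Unlike the test script above, it deliberately does not set PATH, so the log shows cron's own PATH. It is a sketch assuming bash; the function name log_cron_env is hypothetical, and the log defaults to cron.txt in your home directory.

```shell
#!/bin/bash
# log_cron_env LOGFILE: append a snapshot of the environment this script runs in.
# PATH is intentionally left untouched so the log shows what cron provides.
log_cron_env() {
    local logfile=$1
    {
        echo "--- $(date '+%Y-%m-%d %H:%M:%S')"
        echo "PATH=$PATH"
        echo "SHELL=${SHELL:-<unset>}"
        # command -v prints where pg_dump would be found, or fails if it would not be.
        echo "pg_dump: $(command -v pg_dump || echo 'not found')"
    } >> "$logfile"
}

log_cron_env "${1:-$HOME/cron.txt}"
```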

Command works in shell but not Objective-C or C

I want to run the following shell command in Objective-C
sshfs -C -p 22 user@remote.computer.com ~/local/directory/path
using the command system("sshfs -C -p 22 user@remote.computer.com ~/local/directory/path");
but I get sh: sshfs: command not found in NSLog.
If I copy and paste it into terminal however, it works.
The PATH used by a GUI application does not include any changes you have made in the shell startup files in your home directory (e.g. ~/.bashrc).
One way is to use the full path to sshfs in the system() call (run which sshfs in Terminal to find it). In a Cocoa app I would use NSTask for more control; since NSTask does not go through a shell, ~ is not expanded there, so pass full paths such as /Users/username/Projects for the arguments as well.
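Because NSTask starts the program directly rather than through a shell, an argument such as ~/local/directory/path reaches sshfs literally. In Cocoa the usual fix is NSString's stringByExpandingTildeInPath; the shell sketch below shows the same transformation a shell performs. The function name expand_tilde is hypothetical, and it only handles a leading ~ for the current user, not ~otheruser.

```shell
#!/bin/bash
# expand_tilde PATH: replace a leading "~" with $HOME, as a shell would.
# Programs started without a shell (e.g. via NSTask) never get this for free.
expand_tilde() {
    case $1 in
        "~")    printf '%s\n' "$HOME" ;;
        "~/"*)  printf '%s\n' "$HOME/${1#"~/"}" ;;
        *)      printf '%s\n' "$1" ;;
    esac
}
```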

Cron Job Rails 3 - Loading system ruby not RVM ruby

I'm trying to set up a cron job with the following command:
crontab -l
# Begin Whenever generated tasks for: myapp
* * * * * /bin/bash -l -c 'cd /Users/boris/projects/myapp && script/rails runner "Resque.enqueue(MyModel)"'
I get the following error, in which I can see it's loading Ruby 1.8. The problem is that I'm using RVM with Ruby 1.9.2. How do I specify the correct RVM path in cron?
Subject: Cron <boris@jz> /bin/bash -l -c cd /Users/boris/projects/myapp && script/rails runner "Resque.enqueue(Place)"
X-Cron-Env: <SHELL=/bin/sh>
X-Cron-Env: <PATH=/usr/bin:/bin>
X-Cron-Env: <LOGNAME=boris>
X-Cron-Env: <USER=boris>
X-Cron-Env: <HOME=/Users/boris>
Message-Id: <20110523022400.A5B242C608D@jz.local>
Date: Sun, 22 May 2011 19:24:00 -0700 (PDT)
/System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/rubygems/custom_require.rb:31:in `gem_original_require': no such file to load -- bundler/setup (LoadError)
from /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/rubygems/custom_require.rb:31:in `require'
from /Users/boris/projects/myapp/config/boot.rb:6
from script/rails:5:in `require'
from script/rails:5
How do I specify the correct RVM path in CRON?
Thanks in advance
Ruby path, according to which ruby:
/Users/boris/.rvm/rubies/ruby-1.9.2-p180/bin/ruby
Please do not use the -l switch in cron jobs. The -l (--login) switch makes bash run as a login shell, so it loads your environment and things might appear to work. However, cron jobs run in non-interactive, non-login shells by design, and invoking them as if they were login shells is bad practice. Also, when bash starts a login shell, it first loads the system environment (/etc/profile), and if something in that file prints to the screen (like the motd), your cron job will report nasty errors like this:
stty: TIOCGETD: Inappropriate ioctl for device
You don't need to write a cron runner either (following that logic, you might as well write a cron runner runner). Please keep things simple: all you need to do is configure your cron job to launch a bash shell and make that shell load your environment.
The shebang line in your script should not hard-code the path to a ruby executable; let env pick up rvm's ruby instead:
#!/usr/bin/env ruby
This runs whichever ruby comes first in PATH, so once your environment (with RVM) has been loaded, the script runs RVM's ruby, just as it would on the command line.
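The effect of #!/usr/bin/env can be seen without Ruby or RVM: env simply runs the first matching interpreter it finds in PATH. The sketch below builds a throwaway script whose shebang asks env for a made-up interpreter called demo-interp, then runs it once with cron's usual minimal PATH and once with the interpreter's directory placed first in PATH, which is roughly what loading RVM does for ruby. It assumes bash, mktemp, and a temporary directory that allows executing files.

```shell
#!/bin/bash
# env_shebang_demo: show that "#!/usr/bin/env NAME" depends entirely on PATH.
# "demo-interp" is a made-up interpreter name used only for this illustration.
env_shebang_demo() {
    local dir
    dir=$(mktemp -d)
    # A fake interpreter that just reports it was chosen.
    printf '#!/bin/sh\necho "demo-interp was found via PATH"\n' > "$dir/demo-interp"
    # A script whose shebang asks env to look demo-interp up in PATH.
    printf '#!/usr/bin/env demo-interp\n' > "$dir/script"
    chmod +x "$dir/demo-interp" "$dir/script"

    echo "With cron's default PATH:"
    PATH=/usr/bin:/bin "$dir/script" 2>&1 || echo "  -> failed: env could not find demo-interp"
    echo "With the interpreter's directory first in PATH:"
    PATH="$dir:/usr/bin:/bin" "$dir/script"

    rm -rf "$dir"
}
```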
On many Unix-derived systems, a crontab can have a configuration section (environment settings) before the lines that define the jobs to be run. If that is the case, specify:
SHELL=/path/to/bash
This ensures that the cron job is spawned from bash. Your environment is still missing, though, so to make bash load it, add the following to the configuration section:
BASH_ENV=/path/to/environment
where /path/to/environment is typically your .bash_profile or .bashrc.
HOME is set automatically from the crontab owner's entry in /etc/passwd, but you can override it:
HOME=/path/to/home
After this, a cron job might look like this:
15 14 1 * * $HOME/rvm_script.rb
What if your crontab doesn't support a configuration section? Then you have to give all the environment directives on one line, together with the job itself. For example:
15 14 1 * * export BASH_ENV=/path/to/environment && /full/path/to/bash -c '/full/path/to/rvm_script.rb'
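To see what BASH_ENV does, compare two non-interactive bash runs, one without and one with BASH_ENV pointing at a file. This is a sketch assuming bash and an env that supports -u (GNU coreutils and macOS do); the throwaway file stands in for .bash_profile or .bashrc, and GREETING is an arbitrary variable name. Bash only reads BASH_ENV when it runs as bash: cron's default /bin/sh is either a different shell or bash in sh mode, and neither reads it, which is why the SHELL or explicit bash part matters.

```shell
#!/bin/bash
# bash_env_demo: show that a non-interactive bash sources the file named by BASH_ENV.
bash_env_demo() {
    local envfile
    envfile=$(mktemp)
    echo 'export GREETING="loaded from BASH_ENV"' > "$envfile"

    # Without BASH_ENV nothing is sourced, so GREETING stays unset.
    env -u BASH_ENV -u GREETING bash -c 'echo "without: ${GREETING:-<unset>}"'
    # With BASH_ENV, bash sources the file before running the command.
    env -u GREETING BASH_ENV="$envfile" bash -c 'echo "with: ${GREETING:-<unset>}"'

    rm -f "$envfile"
}
```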
Your problem is that you're executing two commands, but not in the way you expect. The two commands are:
/bin/bash -l -c cd /Users/boris/projects/myapp
script/rails runner "Resque.enqueue(MyModel)"
The second only runs if the first succeeds. I think you just need some quotes:
* * * * * /bin/bash -l -c 'cd /Users/boris/projects/myapp && script/rails runner "Resque.enqueue(MyModel)"'
The single quotes pass your cd ... && script/rails ... pair to /bin/bash as a single command string, so the working directory is changed before script/rails is executed.
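The difference the quotes make can be reproduced with harmless commands. In the sketch below (assuming bash and sh are available; / and /usr merely stand in for the project directory), /bin/sh splits the unquoted line at &&, so bash -c receives only cd as its command string and the next word becomes $0; the directory change happens in a child bash that exits immediately.

```shell
#!/bin/bash
# demo_bash_c_quoting: compare an unquoted and a quoted `bash -c` command line.
# The function body is a subshell, so the caller's working directory is left alone.
demo_bash_c_quoting() (
    cd / || exit 1
    # Unquoted: bash gets only "cd" as its command ("/usr" becomes $0), and the
    # following pwd runs in sh, whose directory never changed. HOME=/ just makes
    # the bare cd predictable.
    HOME=/ sh -c 'bash -c cd /usr && pwd'      # prints /
    # Quoted: cd and pwd run inside the same bash, so pwd sees the new directory.
    HOME=/ sh -c "bash -c 'cd /usr && pwd'"    # prints /usr
)
```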
The easiest solution is to use this command instead (note the ./ in front of script/rails):
# Begin Whenever generated tasks for: myapp
* * * * * /bin/bash -l -c 'cd /Users/boris/projects/myapp && ./script/rails runner "Resque.enqueue(MyModel)"'

rsync: polling for new files

I've got:
$ rsync -azv zope@myserver:/smb/Data/*/*/* ~/rsynced_samples/
And I want it to run forever, syncing any new file as soon as it appears on myserver:
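(specifying a poll interval, such as 4 seconds, would be an OK compromise)
Instead of polling with rsync, you can use inotifywait, which relies on the Linux kernel's inotify file-change notifications. It watches local directories, so it has to run on the machine where the files appear (myserver here).
This script (inotify.sh) should give you an idea: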
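#!/bin/bash
# inotify.sh: report every file that changes in the given directory
directory=$1
# %w%f prints the watched directory followed by the file name, i.e. the file's path
inotifywait -q -m --format '%w%f' -e modify -e move -e create -e delete "${directory}" |
while read -r line
do
    echo "doing something with: $line"
    # for example:
    # cp "$line" <somewhere>
done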
You can invoke the script with the directory to monitor, like this:
./inotify.sh ~/Desktop/
The $line variable contains the full file path.
If you only want newly created files, use just the "-e create" flag.
Use cron to set up a check based on your time interval (say, every minute, perhaps?). This link should help: http://www.cyberciti.biz/faq/how-do-i-add-jobs-to-cron-under-linux-or-unix-oses/
Note that the crontab is set up on your machine, not inside your bash script.
Also useful: http://benr75.com/pages/using_crontab_mac_os_x_unix_linux
And here is an example:
1) crontab -e // opens your current crontab, or creates one if it does not exist
2) enter: * * * * * /full/path/to/file.sh >> log.txt // runs the script every minute and appends its output to a log file (use the script's full path, since cron's PATH won't include its directory)
Hope that helps.
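Standard cron cannot run a job more often than once a minute, so for the 4-second interval mentioned in the question a simple loop is a common alternative. This is a sketch assuming bash; the function name poll_every is hypothetical, and the max-runs argument exists only so the loop can be bounded (0 means run forever).

```shell
#!/bin/bash
# poll_every SECONDS MAX_RUNS CMD [ARGS...]: run CMD, sleep, repeat.
# MAX_RUNS=0 means "forever"; a bound is mainly useful for testing.
poll_every() {
    local interval=$1 max_runs=$2 runs=0
    shift 2
    while [ "$max_runs" -eq 0 ] || [ "$runs" -lt "$max_runs" ]; do
        # A failed run (e.g. the server is briefly unreachable) is reported
        # on stderr but does not stop the loop.
        "$@" || echo "poll_every: '$1' exited with status $?" >&2
        runs=$((runs + 1))
        sleep "$interval"
    done
}
```

For the question's case this could be started as poll_every 4 0 rsync -azv 'zope@myserver:/smb/Data/*/*/*' ~/rsynced_samples/; quoting the remote pattern keeps the local shell from trying to expand it, so it reaches the remote side unexpanded.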