How to both show the log on the console and save an external log for a crawl? - scrapy

Just like the title says. I use -s LOG_FILE=mylog.txt to save an external log, but I would also like to see the log as the spider is running. Is there a way to do that? I'm using Windows 10 and would prefer an answer that works there.
Amateur developer without computing background here so please go easy on me.

Use GNU's tee tool:
scrapy crawl myspider 2>&1 | tee crawl.log
2>&1 redirects stderr to stdout, since you most likely want errors and info in the same file.
| tee crawl.log pipes that output to tee, which splits it between the crawl.log file and stdout.
There is a tee implementation for Windows too:
There's a Win32 port of the Unix tee command that does exactly that. See http://unxutils.sourceforge.net/ or http://getgnuwin32.sourceforge.net/
taken from: https://stackoverflow.com/a/796492/3737009
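Since you're on Windows 10: if you run the crawl from PowerShell, the built-in Tee-Object cmdlet (aliased to tee) should do the same job without installing anything; a minimal sketch, assuming your spider is named myspider:
scrapy crawl myspider 2>&1 | Tee-Object -FilePath crawl.log
Here 2>&1 merges PowerShell's error stream into the success stream, and Tee-Object writes the combined output to crawl.log while still printing it to the console.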

Related

ssh one shot command gives partial results

I execute a command to grep a long log file on a remote server, and the problem is that whenever I ssh in first and then run the grep command interactively, I get far more matches than if I do it in one shot as follows:
ssh host 'less file | grep something'
I suspected some default automatic timeout with the one-shot version, so I experimented with the options -o ServerAliveInterval=<seconds> -o ServerAliveCountMax=<int>, but to no avail. Any idea what could be the problem?
The problem was related to less. It does not behave well outside of interactive mode. Using cat solved the issue.
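In one-shot mode there is no need for a pager (or even cat) at all, since grep can read the file directly:
ssh host 'grep something file'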

parse output from running wget command

I'm using wget to synchronise my repository server (I know, wget is not the best tool, but company policy forces me...).
This is the wget command:
/usr/bin/wget --no-check-certificate -r -N -np -nH --cut-dirs=2 --include-directories=dir_1/dir_2/RPMS.all https://repo_url/dir_1/dir_2/RPMS.all
This does the job, but I would like to capture the output of wget which looks like this (e.g.) :
--2016-07-07 16:59:10-- https://repo_url/dir_1/dir_2/RPMS.all/repodata/d65d6fc4c2a0500803acde0525aa3e604a5ea03ac7b11c5694cc8b1de08ce7cc-filelists.xml.gz
Reusing existing connection to repo_url:443.
Proxy request sent, awaiting response... 200 OK
Length: 156605 (153K) [application/octet-stream]
Server file no newer than local file ‘RPMS.all/repodata/d65d6fc4c2a0500803acde0525aa3e604a5ea03ac7b11c5694cc8b1de08ce7cc-filelists.xml.gz’ -- not retrieving.
so I can process this output (using grep, awk or whatever) and show only the current file that I'm wget-ing.
Apart from that, I want to display that output on the same line over and over until finished (maybe even discarding the 'no newer' files, like the one above).
I tried several solutions I found (e.g. using IFS or shopt or stdbuf), but none seem to work. I also tried with the wget -O - option, but that doesn't work either.
Maybe to clarify a bit more:
I'd like to do this while wget is working. I don't want to do this when wget is finished, but process each connection while wget is running, whether the source file is newer or not.
Is this at all possible?
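It should be possible. wget writes its progress messages to stderr, so one approach (a sketch, assuming GNU grep's --line-buffered option; the pattern and printf formatting are illustrative and not tested against this exact output) is to merge stderr into stdout, keep only the '--timestamp-- URL' lines, and use a carriage return to overwrite the same console line:
/usr/bin/wget --no-check-certificate -r -N -np -nH --cut-dirs=2 \
    --include-directories=dir_1/dir_2/RPMS.all \
    https://repo_url/dir_1/dir_2/RPMS.all 2>&1 |
  grep --line-buffered '^--' |          # keep only the '--timestamp-- URL' lines
  while IFS= read -r line; do
    printf '\r%-120s' "${line##* }"     # overwrite the console line with the current URL
  done
echo
grep --line-buffered is GNU grep's option to flush each matching line as soon as it is printed, which matters inside a pipeline; if buffering still gets in the way, stdbuf -oL can be placed in front of each stage.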

Android Gradle save log output to file

Using Android and Gradle, how can I save the console messages of gradlew tasks to a file? For example, when running gradlew connectedCheck -i, how do I save the run times and any failures to a file?
In bash/command line run:
./gradlew connectedCheck -i 2>&1 | tee file.txt
In PowerShell on Windows, where GNU tee is not normally installed, you can save the output with the standard redirection operator (the syntax looks like Bash, and it does indeed work):
./gradlew connectedCheck -i 2>&1 > file.txt
Note that, unlike tee, plain redirection writes only to the file and shows nothing on the console. As far as I know this works all the way back to PowerShell 2.0, which we still use at work on some of our older servers; I can't find docs for anything older than v3.0, for which the documentation is here:
about_Redirection | Microsoft Docs
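If you also want to watch the output while it is being written, PowerShell's Tee-Object cmdlet (aliased to tee) should behave like the Unix tool; a sketch:
./gradlew connectedCheck -i 2>&1 | Tee-Object -FilePath file.txt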

Redirect stderr to stdout in C shell

When I run the following command in csh, I get nothing, but it works in bash.
Is there any equivalent in csh which can redirect the standard error to standard out?
somecommand 2>&1
The csh shell has never been known for its extensive ability to manipulate file handles in the redirection process.
You can redirect both standard output and error to a file with:
xxx >& filename
but that's not quite what you were after, redirecting standard error to the current standard output.
However, if your underlying operating system exposes the standard output of a process in the file system (as Linux does with /dev/stdout), you can use that method as follows:
xxx >& /dev/stdout
This will force both standard output and standard error to go to the same place as the current standard output, effectively what you have with the bash redirection, 2>&1.
Just keep in mind this isn't a csh feature. If you run on an operating system that doesn't expose standard output as a file, you can't use this method.
However, there is another method. You can combine the two streams into one if you send it to a pipeline with |&, then all you need to do is find a pipeline component that writes its standard input to its standard output. In case you're unaware of such a thing, that's exactly what cat does if you don't give it any arguments. Hence, you can achieve your ends in this specific case with:
xxx |& cat
Of course, there's also nothing stopping you from running bash (assuming it's on the system somewhere) within a csh script to give you the added capabilities. Then you can use the rich redirections of that shell for the more complex cases where csh may struggle.
Let's explore this in more detail. First, create an executable echo_err that will write a string to stderr:
#include <stdio.h>

int main (int argc, char *argv[]) {
    fprintf (stderr, "stderr (%s)\n", (argc > 1) ? argv[1] : "?");
    return 0;
}
Then a control script test.csh which will show it in action:
#!/usr/bin/csh
ps -ef ; echo ; echo $$ ; echo
echo 'stdout (csh)'
./echo_err csh
bash -c "( echo 'stdout (bash)' ; ./echo_err bash ) 2>&1"
The echo of the PID and ps are simply so you can ensure it's csh running this script. When you run this script with:
./test.csh >test.out 2>test.err
(the initial redirection is set up by bash before csh starts running the script), and examine the out/err files, you see:
test.out:
UID PID PPID TTY STIME COMMAND
pax 5708 5364 cons0 11:31:14 /usr/bin/ps
pax 5364 7364 cons0 11:31:13 /usr/bin/tcsh
pax 7364 1 cons0 10:44:30 /usr/bin/bash
5364
stdout (csh)
stdout (bash)
stderr (bash)
test.err:
stderr (csh)
You can see there that the test.csh process is running in the C shell, and that calling bash from within there gives you the full bash power of redirection.
The 2>&1 in the bash command quite easily lets you redirect standard error to the current standard output (as desired) without prior knowledge of where standard output is currently going.
I object to the above answer and provide my own. csh DOES have this capability, and here is how it's done:
xxx |& some_exec # will pipe merged output to your some_exec
or
xxx |& cat > filename
or if you just want it to merge streams (to stdout) and not redirect to a file or some_exec:
xxx |& tee /dev/null
As paxdiablo said, you can use >& to redirect both stdout and stderr. However, if you want them separated, you can use the following:
(command > stdoutfile) >& stderrfile
...as indicated the above will redirect stdout to stdoutfile and stderr to stderrfile.
xxx >& filename
Or do this to see everything on the screen and have it go to your file:
xxx |& tee ./logfile
What about just the following?
xxx >& /dev/stdout
I think this is the correct answer for csh.
xxx >/dev/stderr
Note that most csh installations are really tcsh in modern environments:
rmockler> ls -latr /usr/bin/csh
lrwxrwxrwx 1 root root 9 2011-05-03 13:40 /usr/bin/csh -> /bin/tcsh
Using a backtick-embedded statement to demonstrate:
echo "`echo 'standard out1'` `echo 'error out1' >/dev/stderr` `echo 'standard out2'`" | tee -a /tmp/test.txt ; cat /tmp/test.txt
The other suggestions don't work in my csh environment.

Run a php script in background on debian (Apache)

I'm trying to make push notifications work on my Debian VPS (Apache2, MySQL).
I use a php script from this tutorial (http://www.raywenderlich.com/3525/apple-push-notification-services-tutorial-part-2).
Basically, the script runs in an infinite loop that checks a MySQL table for new records every couple of seconds. The tutorial says it should be run as a background process.
// This script should be run as a background process on the server. It checks
// every few seconds for new messages in the database table push_queue and
// sends them to the Apple Push Notification Service.
//
// Usage: php push.php development &
So I have four questions.
1. How do I start the script from the terminal? What should I type? The script location on the server is:
/var/www/development_folder/scripts/push2/push.php
2. How can I kill it if I need to (without having to restart Apache)?
3. Since push notifications are essential, I need a way to check that the script is running.
The code (from the tutorial) calls a function if something goes wrong:
function fatalError($message)
{
    writeToLog('Exiting with fatal error: ' . $message);
    exit;
}
Maybe I can put something in there to restart the script? It would also be nice to have a cron job or something that checks every 5 minutes or so whether the script is running, and starts it if it isn't.
4. Can I make the script start automatically after an Apache or MySQL restart, e.g. if the server crashes or something else happens that requires an Apache restart?
Thanks a lot in advance
You could run the script with the following command:
nohup php /var/www/development_folder/scripts/push2/push.php > /dev/null &
nohup means that the command will not quit (it ignores the hangup signal) when you e.g. close your terminal window. If you don't care about that, you can just start the process with php /var/www/development_folder/scripts/push2/push.php & instead. NB: by default nohup logs the script's output to a file called nohup.out; if you don't want that, add > /dev/null as I've done here. The & at the end means the process will run in the background.
I would only recommend starting the push script like this while you test your code. If it's important that the script runs all the time, it should instead be run as a daemon at system startup (see 4.).
Just type
ps ax | grep push.php
and you will get the processid (pid). It will look something like this:
4530 pts/3 S 0:00 php /var/www/development_folder/scripts/push2/push.php
The pid is the first number you'll see. You can then run the following command to kill the script:
kill -9 4530
If you run ps ax | grep push.php again the process should now be gone.
I would recommend that you make a cronjob that checks if the php-script is running, and if not, starts it. You could do this with ps ax and grep checks inside your shell script. Something like this should do it:
if ! ps ax | grep -v grep | grep 'push.php' > /dev/null
then
    nohup php /var/www/development_folder/scripts/push2/push.php > /dev/null &
else
    echo "push-script is already running"
fi
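To run the check every 5 minutes, save the snippet above as an executable script and reference it from your crontab (the path /usr/local/bin/check_push.sh is just an example name):
chmod +x /usr/local/bin/check_push.sh
crontab -e
# then add this line:
*/5 * * * * /usr/local/bin/check_push.sh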
If you want the script to start up after booting the system, you could make a file in /etc/init.d (e.g. /etc/init.d/mypushscript) with something like this inside:
php /var/www/development_folder/scripts/push2/push.php
(You should probably have a lot more in this file.)
You would also need to run the following commands:
chmod +x /etc/init.d/mypushscript
update-rc.d mypushscript defaults
to make the script start at boot-time. I have not tested this so please do more research before making your own init script!
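For reference, here is a minimal sketch of what a fuller init script might look like, assuming the paths used above (the LSB header fields and the pid-file location are illustrative, and a real script should also handle status and restart):
#!/bin/sh
### BEGIN INIT INFO
# Provides:          mypushscript
# Required-Start:    $local_fs $network mysql
# Required-Stop:     $local_fs $network
# Default-Start:     2 3 4 5
# Default-Stop:      0 1 6
# Short-Description: APNs push queue worker
### END INIT INFO

SCRIPT=/var/www/development_folder/scripts/push2/push.php
PIDFILE=/var/run/mypushscript.pid

case "$1" in
  start)
    nohup php "$SCRIPT" > /dev/null 2>&1 &
    echo $! > "$PIDFILE"    # remember the pid so we can stop it later
    ;;
  stop)
    if [ -f "$PIDFILE" ]; then
      kill "$(cat "$PIDFILE")" && rm -f "$PIDFILE"
    fi
    ;;
  *)
    echo "Usage: $0 {start|stop}"
    exit 1
    ;;
esac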