parse output from running wget command

I'm using wget to synchronise my repository server (I know, wget is not the best tool, but company policy forces me...).
This is the wget command:
/usr/bin/wget --no-check-certificate -r -N -np -nH --cut-dirs=2 --include-directories=dir_1/dir_2/RPMS.all https://repo_url/dir_1/dir_2/RPMS.all
This does the job, but I would like to capture the output of wget, which looks like this (for example):
--2016-07-07 16:59:10-- https://repo_url/dir_1/dir_2/RPMS.all/repodata/d65d6fc4c2a0500803acde0525aa3e604a5ea03ac7b11c5694cc8b1de08ce7cc-filelists.xml.gz
Reusing existing connection to repo_url:443.
Proxy request sent, awaiting response... 200 OK
Length: 156605 (153K) [application/octet-stream]
Server file no newer than local file ‘RPMS.all/repodata/d65d6fc4c2a0500803acde0525aa3e604a5ea03ac7b11c5694cc8b1de08ce7cc-filelists.xml.gz’ -- not retrieving.
so that I can process this output (using grep, awk or whatever) and show only the file that wget is currently fetching.
On top of that, I want to display that output on the same line over and over until wget has finished (maybe even discarding the 'no newer' files, like the one above).
I tried several solutions I found (e.g. using IFS, shopt or stdbuf), but none seems to work. I also tried wget's -O - option, but that doesn't work either.
Maybe to clarify a bit more:
I'd like to do this while wget is working. I don't want to do this when wget is finished, but process each connection while wget is running, whether the source file is newer or not.
Is this at all possible?
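One possible approach, offered only as a sketch: wget writes its log to stderr, so that stream has to be redirected with 2>&1 before it can be piped anywhere. The helper below (show_current_url is a made-up name, not part of wget) reads the log line by line while wget is still running and keeps rewriting a single terminal line with the URL taken from each request line. It assumes the request lines keep the "--YYYY-MM-DD HH:MM:SS-- URL" shape shown above and that the terminal understands the ANSI "erase line" sequence.

```shell
# Hypothetical helper: reads wget's log on stdin and shows only the URL
# that is currently being requested, reusing one terminal line.
show_current_url() {
    while IFS= read -r line; do
        case $line in
            # Request lines start with a timestamp such as --2016-07-07 ...
            --[0-9][0-9][0-9][0-9]-*)
                # \r returns to column 0, ESC[K erases the old text;
                # the URL is the last space-separated field of the line.
                printf '\r\033[K%s' "${line##* }"
                ;;
        esac
    done
    # Finish with a newline so the shell prompt starts on a fresh line.
    printf '\n'
}
```

It would be used along the lines of "/usr/bin/wget [options as above] URL 2>&1 | show_current_url". A plain read loop is used rather than awk because some awk implementations buffer piped input, which would delay the display. To hide files that are not retrieved, matching on wget's "Saving to:" lines instead of the request lines is one option, assuming your wget version prints them.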

Related

How to save output of python-swiftclient to file when downloading a directory?

Sometimes I get errors when I download files from a cloud with python-swiftclient, like this one:
Error downloading object 'uploads/1/image.png': Object GET failed: https://orbit.brightbox.com/v1/acc-12345/uploads/1/image.png 500 Internal Error b'An error occurred'
To find all the errors and re-download the failed files, I want to save the output of the swift command to a file.
I tried the following:
swift-cli -A https://orbit.brightbox.com/v1/acc-12345 \
-U user -K secret download uploads 2>&1 | tee uploads.log
# and
swift-cli -A https://orbit.brightbox.com/v1/acc-12345 \
-U user -K secret download uploads > uploads.log
But this didn't work. man swift describes the -o option:
For a single object download, you may use the -o [--output]
option to redirect the output to a specific file or if "-" then just redirect to stdout or with --no-download actually not to write anything to disk.
but when I try to download a directory with the -o option, it fails with
-o option only allowed for single file downloads
How can I save the log to a file when I download a directory with the swift CLI?

Actually, redirecting output to a file does work with swift-client:
swift-cli -A https://orbit.brightbox.com/v1/acc-12345 \
-U user -K secret download uploads > uploads.log
I was confused because, after starting the command above, I ran the following in another terminal window:
tail -f uploads.log
But it didn't give me any output (unlike what I saw when running the download command without redirection).
It seems that swift-client writes to the file in batches: I had to wait about a minute until tail -f dumped a hundred or so lines like this to the console:
uploads/documents/1/image.png [auth 0.000s, headers 0.390s, total 14.361s, 0.034 MB/s]
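Once the log has been captured, the failed objects can be pulled out of it. The following is a minimal sketch that assumes the error lines have exactly the shape quoted in the question ("Error downloading object '...': ..."); failed_objects is a hypothetical helper name, not a swift command. If the error lines turn out to be written to stderr, the capture needs 2>&1 as in the first attempt.

```shell
# Hypothetical helper: print the object name from every
# "Error downloading object '<name>': ..." line.
# Reads the files given as arguments, or stdin when none are given.
failed_objects() {
    sed -n "s/^Error downloading object '\([^']*\)':.*/\1/p" "$@"
}
```

For example, "failed_objects uploads.log > failed.txt" would leave one object name per line in failed.txt, ready to be fed to a retry loop.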

scp command - transfer folder over ssh

I have an Arduino Yun and want to set up a server on it.
So what I want is to copy a folder that contains a .py file and an index.html to my Yun.
I used the Mac terminal for this.
The command looks like this:
scp -r /Users/gudi/Desktop/LobsterHeartRate root@192.168.240.1:/mnt/sda1
The terminal then asked for the password.
After I typed it, it showed:
scp: /mnt/sda1/LobsterHeartRate: Not a directory
I didn't type /mnt/sda1/LobsterHeartRate, so why does it show this error?

Your command
scp -r /Users/gudi/Desktop/LobsterHeartRate root@192.168.240.1:/mnt/sda1
requires that the remote directory /mnt/sda1 exists. It looks like this is not true in your case. Check it using ssh root@192.168.240.1 ls /mnt/sda1.
scp is a simple tool: it does not let you rename directories on the fly, and the target directory must exist. You might try
scp -r /Users/gudi/Desktop/LobsterHeartRate root@192.168.240.1:/mnt/
ssh root@192.168.240.1 mv /mnt/LobsterHeartRate /mnt/sda1
or similar, if that suits your needs. But for copying more files, rsync is usually more suitable. Check its manual page and give it a try next time.

As @Jens Höpken notes, your post is a bit sparse. But trying to read between the lines of your post I suspect that LobsterHeartRate is a DIRECTORY on your local system but a FILE named LobsterHeartRate in your target system. This might be happening right at the top of the directory tree, or perhaps you have directories/files of the same name further down the tree. scp -rv might help resolve any confusions here.
Beware: scp -r resolves symbolic links. If you want to preserve symlinks you need to do something else. For historic reasons I use the following, though cpio with a find front-end opens up interesting possibilities for fine-grained file selections.
( cd /Users/gudi/Desktop && tar -cf - LobsterHeartRate ) |
ssh root@192.168.240.1 'cd /mnt/sda1 && tar -xf -'
For a safe "dry run" you could change the -xf to a -tf. The && chains are required to prevent bad things from happening if any prior command fails.
Disclaimer: any debugging is left as an exercise for the student.
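To see how the tar pipe behaves without touching the remote machine, the same pattern can be tried locally first. In this sketch copy_tree is a hypothetical helper name and the right-hand subshell stands in for the ssh command; it is meant as an experiment, not as a replacement for the command above.

```shell
# copy_tree SRC_PARENT NAME DEST_PARENT
# Hypothetical local stand-in for the tar-over-ssh pipe: packs
# SRC_PARENT/NAME and unpacks it as DEST_PARENT/NAME.
# Symlinks are stored as symlinks rather than followed.
copy_tree() {
    ( cd "$1" && tar -cf - "$2" ) | ( cd "$3" && tar -xf - )
}
```

Running it on a scratch directory that contains a symlink and then inspecting the copy with ls -l shows whether the link survived, which is the difference from scp -r described above.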

Setup Amazon S3 backup on QNAP using s3cmd

I own a QNAP-219P and I want to set this up manually using s3cmd.
I did quite a bit of research on this, and here are the references I got:
http://web.archive.org/web/20091120211330/http://codemonkeybrown.com/qnaps3.html
http://wiki.qnap.com/wiki/Running_Your_Own_Application_at_Startup
http://wiki.qnap.com/wiki/Add_items_to_crontab
http://blog.wingateuk.com/2013/03/cloud-backup-on-qnap-nas.html?showComment=1413660445187#c8935766892046800936
I'm trying to get the s3cmd to work on my TS-219P.
I got everything to work (on command line), even running the script file (s3-backup.sh) on command line:
#!/bin/bash <-- I also tried #!/bin/sh
/share/maintenance/s3cmd-1.5.0-rc1/s3cmd --rr sync -rv /share/all-shared-folders/emilie/ s3://kingjim-backup/kingjim-nas/emilie/ >> /share/maintenance/log/s3cmd/backup_`date "+%Y%m%d-%H-%M"`.log <-- I also tried running s3cmd via python by adding /usr/bin/python on the front.
If I run using the SSH command prompt, it seems to work perfectly.
The problem though, is the cronjob. I can confirm the cronjob trigger, and it was run, because my log file (the one above) was generated, but the log is always empty, even though I'm sure there are some new files created/modified.
This is my cronjob task:
14 3 * * * /share/maintenance/s3-backup.sh 2>&1 | logger
I've done a number of different variations on the above, but couldn't find out what was missing.
I feel like some dependency is missing when the crontab is running, as compared to when I run it on command prompt. But I don't know how to debug crontab.

I found out that the problem was that s3cmd could not find its configuration file when it was run from cron.
So the fix was simply to copy the .s3cfg file to a safe shared folder, and then call s3cmd with the "--config" parameter followed by the path to that file.
Like this:
/share/maintenance/s3-backup/s3cmd/s3cmd --config /share/maintenance/s3-backup/s3cmd.config --rr sync -rv /share/MD0_DATA/ s3://xxx-backup/xxx-nas/ >> /share/maintenance/s3-backup/logs/backup_`date "+%Y%m%d-%H-%M"`.log 2>&1
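Why the log stayed empty can be illustrated with a small sketch. In the original script only stdout was appended to the log file, so any error message from s3cmd (which presumably went to stderr) never reached it. run_logged below is a hypothetical wrapper, not part of s3cmd, that appends both streams to one log file. The order matters: 2>&1 has to come after the >> redirection.

```shell
# run_logged LOG_FILE COMMAND [ARGS...]
# Hypothetical wrapper: appends stdout AND stderr of COMMAND to LOG_FILE
# and returns COMMAND's exit status.
run_logged() {
    log_file=$1
    shift
    # ">> file" first, then "2>&1", so stderr follows stdout into the file.
    "$@" >> "$log_file" 2>&1
}
```

Inside the backup script this could wrap the s3cmd call, so that a failure such as a missing configuration file shows up in the log instead of leaving it empty.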

Is there a simple way to use scp that will behave like rsync -u or cp -u

I'd like to be able to upload to my remote server, transferring only new or updated files. I am using nanoblogger, and it appears to upload the entire thing every time using plain scp -r, but I can't find any -u option for scp in the man pages.
I suppose I could try to somehow script the upload with an ls or find that grabs only files updated in the last $n minutes, or something, but that seems heavy handed.

Use rsync over SSH.
If you can scp, you can very probably rsync over ssh:
rsync -a /some/dir/ user@server:/dest/dir/

scp doesn't have any conditional option, and probably won't get it anytime soon. rsync seems like a very reasonable way, if it is installed on the target system; if not, some find + uniq magic could do the job, but would be serious work. Compiling rsync would probably be faster :-).

Is there a curl/wget option that prevents saving files in case of http errors?

I want to download a lot of URLs in a script, but I do not want to save the ones that lead to HTTP errors.
As far as I can tell from the man pages, neither curl nor wget provides such functionality.
Does anyone know of another downloader that does?

I think the -f option to curl does what you want:
-f, --fail
(HTTP) Fail silently (no output at all) on server errors. This is mostly done to better
enable scripts etc to better deal with failed attempts. In normal cases when an HTTP
server fails to deliver a document, it returns an HTML document stating so (which often
also describes why and more). This flag will prevent curl from outputting that and
return error 22. [...]
However, if the response was actually a 301 or 302 redirect, that still gets saved, even if its destination would result in an error:
$ curl -fO http://google.com/aoeu
$ cat aoeu
<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>301 Moved</TITLE></HEAD><BODY>
<H1>301 Moved</H1>
The document has moved
here.
</BODY></HTML>
To follow the redirect to its dead end, also give the -L option:
-L, --location
(HTTP/HTTPS) If the server reports that the requested page has moved to a different
location (indicated with a Location: header and a 3XX response code), this option will
make curl redo the request on the new place. [...]

A one-liner I just set up for this very purpose:
(works only with a single file, might be useful for others)
A=$$; ( wget -q "http://foo.com/pipo.txt" -O "$A.d" && mv "$A.d" pipo.txt ) || ( rm -f "$A.d"; echo "Removing temp file" )
This attempts to download the file from the remote host into a temporary file named after the shell's PID. If there is an error, the temporary file is removed. Otherwise it is kept and renamed.

Ancient thread, but I landed here looking for a solution and ended up writing some shell code to do it.
if [ "$(curl -s -w "%{http_code}" --compressed -o /tmp/something \
    http://example.com/my/url/)" = "200" ]; then
    echo "yay"; cp /tmp/something /path/to/destination/filename
fi
This downloads the output to a temporary file and creates or overwrites the destination file only if the status was 200. My use case is slightly different: the output takes more than 10 seconds to generate, and I did not want the destination file to remain blank for that duration.

NOTE: I am aware that this is an older question, but I believe I have found a better solution for wget users than any of the above answers provide.
wget -q "$URL" 2>/dev/null
This saves the target file to the local directory if and only if the HTTP status code is in the 200 range (OK).
Additionally, if you want to do something like print a message whenever the request fails, you can check the wget exit code for non-zero values like so:
wget -q "$URL" 2>/dev/null
if [ $? != 0 ]; then
    echo "There was an error!"
fi
I hope this is helpful to someone out there facing the same issues I was.
Update:
I just put this into a more script-able form for my own project, and thought I'd share:
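function dl {
    # Download $BASE_URL/$1 into the local directory given by dirname "$1".
    pushd . > /dev/null
    cd "$(dirname "$1")"
    wget -q "$BASE_URL/$1" 2> /dev/null
    if [ $? -ne 0 ]; then
        echo ">> ERROR could not download file \"$1\"" 1>&2
        exit 1   # aborts the whole script, not just this function
    fi
    popd > /dev/null
}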
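The exit-code checks above only distinguish zero from non-zero. As a sketch of how the failure could be reported more precisely, the helper below maps the exit statuses listed in the "Exit Status" section of the GNU wget manual to short messages. describe_wget_status is a hypothetical name, and the table assumes a reasonably recent GNU wget; other implementations (for example the BusyBox one) may use different codes.

```shell
# Hypothetical helper: translate a GNU wget exit status into a message.
# The meanings follow the "Exit Status" section of the wget manual.
describe_wget_status() {
    case $1 in
        0) echo "no problems occurred" ;;
        1) echo "generic error" ;;
        2) echo "parse error" ;;
        3) echo "file I/O error" ;;
        4) echo "network failure" ;;
        5) echo "SSL verification failure" ;;
        6) echo "username/password authentication failure" ;;
        7) echo "protocol error" ;;
        8) echo "server issued an error response" ;;
        *) echo "unknown exit status $1" ;;
    esac
}
```

After a call such as wget -q "$URL", passing $? to this function turns a bare "There was an error!" into something like "server issued an error response", which is the status an HTTP 404 or 500 produces.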

I have a workaround to propose: it does download the file, but it then removes it if its size is 0 (which happens if a 404 occurs).
wget -O <filename> <url/to/file>
if [[ $(du <filename> | cut -f 1) == 0 ]]; then
    rm <filename>;
fi;
It works in zsh, but you can adapt it for other shells.
Note that wget only creates the (empty) file in the first place if you provide the -O option.
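For shells other than zsh, the size check above can be written with the POSIX -s test instead of du and cut. This is a minimal sketch; remove_if_empty is a hypothetical helper name.

```shell
# Hypothetical helper: delete FILE if it exists but is empty.
remove_if_empty() {
    # -s is true only when the file exists and is larger than zero bytes.
    if [ -e "$1" ] && [ ! -s "$1" ]; then
        rm -- "$1"
    fi
}
```

It would be called right after the download, for example wget -O out.html URL followed by remove_if_empty out.html (out.html being a placeholder name).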

As an alternative, you can download to a temporary file and rotate it into place:
wget http://example.net/myfile.json -O myfile.json.tmp -t 3 -q && mv myfile.json.tmp myfile.json
This command always downloads to the file "myfile.json.tmp"; only when the wget exit status is 0 is the file rotated to "myfile.json".
This prevents the final file from being overwritten when a network failure occurs.
The advantage of this method is that, if something goes wrong, you can inspect the temporary file and see what error message was returned.
The "-t" parameter sets how many times wget attempts the download in case of error.
The "-q" parameter enables quiet mode, which is important with cron because cron will report any output of wget.
The "-O" parameter is the output file path and name.
Remember that for cron schedules it's very important to always provide the full path for all files and, in this case, for the "wget" program itself as well.
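The "download to a temporary file, rename only on success" pattern can also be wrapped in a small function so that it works with any downloader that can write to stdout. This is a sketch under assumptions: fetch_atomic is a hypothetical name, and unlike the command above it deletes the temporary file on failure instead of keeping it for inspection.

```shell
# fetch_atomic DEST COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND with stdout going to a temporary file
# next to DEST, and replaces DEST only if COMMAND exits with status 0.
fetch_atomic() {
    dest=$1
    shift
    tmp=$(mktemp "${dest}.XXXXXX") || return 1
    if "$@" > "$tmp"; then
        # Note: mktemp creates the file with mode 0600, which mv preserves.
        mv "$tmp" "$dest"
    else
        status=$?      # exit status of the failed command
        rm -f "$tmp"
        return "$status"
    fi
}
```

With wget this might look like fetch_atomic myfile.json wget -q -t 3 -O - http://example.net/myfile.json; an existing myfile.json is left untouched whenever wget reports an error.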

You can download the file without saving it by using the "-O -" option:
wget -O - http://jagor.srce.hr/
You can get more information at http://www.gnu.org/software/wget/manual/wget.html#Advanced-Usage
