CUB_200_2011 dataset download link error in Google Colab

I am trying to download the CUB_200_2011 dataset in Colab using
!wget http://www.vision.caltech.edu/visipedia-data/CUB-200-2011/CUB_200_2011.tgz
After running this I got:
--2021-05-28 10:13:12-- http://www.vision.caltech.edu/visipedia-data/CUB-200-2011/CUB_200_2011.tgz
Resolving www.vision.caltech.edu (www.vision.caltech.edu)... 34.208.54.77
Connecting to www.vision.caltech.edu (www.vision.caltech.edu)|34.208.54.77|:80... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://drive.google.com/file/d/1hbzc_P1FuxMkcabkgn9ZKinBwW683j45/view [following]
--2021-05-28 10:13:12-- https://drive.google.com/file/d/1hbzc_P1FuxMkcabkgn9ZKinBwW683j45/view
Resolving drive.google.com (drive.google.com)... 74.125.195.102, 74.125.195.113, 74.125.195.138, ...
Connecting to drive.google.com (drive.google.com)|74.125.195.102|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: ‘CUB_200_2011.tgz’
CUB_200_2011.tgz [ <=> ] 71.36K --.-KB/s in 0.03s
2021-05-28 10:13:13 (2.41 MB/s) - ‘CUB_200_2011.tgz’ saved [73069]
The length is unspecified, the content type says it is an HTML file, and I cannot extract the archive; I get an error:
!tar -xvzf CUB_200_2011.tgz
gzip: stdin: not in gzip format
tar: Child returned status 1
tar: Error is not recoverable: exiting now
Is there anything wrong with the link or what is the problem?

Look at the message carefully: the download URL now redirects to a Google Drive page, and wget ends up on the confirmation page instead of starting the actual download. The following command is prepared for your case: it requests the download by its Google Drive file id, writes the output to CUB_200_2011.tgz, stores the session cookie in /tmp/cookies.txt (kept across requests with --keep-session-cookies), extracts the confirmation token so the download is auto-confirmed, skips the certificate check with --no-check-certificate, and removes /tmp/cookies.txt once the download is finished.
!wget --load-cookies /tmp/cookies.txt "https://docs.google.com/uc?export=download&confirm=$(wget --quiet --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate 'https://docs.google.com/uc?export=download&id=1hbzc_P1FuxMkcabkgn9ZKinBwW683j45' -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1\n/p')&id=1hbzc_P1FuxMkcabkgn9ZKinBwW683j45" -O CUB_200_2011.tgz && rm -rf /tmp/cookies.txt
Also, there is nothing wrong with your tar command; it will work once the first command completes correctly. Hopefully this resolves your issue.
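If that one-liner is hard to remember, the gdown tool handles the Google Drive confirmation step for you. A minimal sketch, assuming the same file id and that gdown is (or can be) installed in the Colab runtime:
!pip install -q gdown
!gdown "https://drive.google.com/uc?id=1hbzc_P1FuxMkcabkgn9ZKinBwW683j45" -O CUB_200_2011.tgz
!tar -xzf CUB_200_2011.tgz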

It seems that the original authors redirected the dataset link to a Google Drive link (this broke many online tutorials), but a new public mirror of the data is provided by fast.ai and can be fetched in an IPython session with the following line:
!wget https://s3.amazonaws.com/fast-ai-imageclas/CUB_200_2011.tgz
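Whichever source you use, a quick check before extracting confirms you got a real archive rather than an HTML page (file name taken from the question):
!file CUB_200_2011.tgz      # should report "gzip compressed data", not "HTML document"
!tar -xzf CUB_200_2011.tgz  # should now extract instead of failing with "not in gzip format"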

Related

podman CentOS 8 not starting container as non-root user

I am trying to start a busybox container as a non-root user on a CentOS 8 server, but it gives the message below.
What is the correct way to start the container as non-root user?
podman run -it --name busy docker.io/library/busybox sh
Trying to pull docker.io/library/busybox...
Getting image source signatures
Copying blob bdbbaa22dec6 done
Copying config 6d5fcfe5ff done
Writing manifest to image destination
Storing signatures
ERRO[0003] Error pulling image ref //busybox:latest: Error committing the finished image: error adding layer with blob "sha256:bdbbaa22dec6b7fe23106d2c1b1f43d9598cd8fc33706cc27c1d938ecd5bffc7": Error processing tar file(exit status 1): there might not be enough IDs available in the namespace (requested 65534:65534 for /home): lchown /home: invalid argument
Failed
Error: unable to pull docker.io/library/busybox: unable to pull image: Error committing the finished image: error adding layer with blob "sha256:bdbbaa22dec6b7fe23106d2c1b1f43d9598cd8fc33706cc27c1d938ecd5bffc7": Error processing tar file(exit status 1): there might not be enough IDs available in the namespace (requested 65534:65534 for /home): lchown /home: invalid argument
Yes, the command you run is correct. On my Fedora 31 system it works just fine.
[testuser@fedora31 ~]$ podman run -it --name busy docker.io/library/busybox sh
Trying to pull docker.io/library/busybox...
Getting image source signatures
Copying blob bdbbaa22dec6 done
Copying config 6d5fcfe5ff done
Writing manifest to image destination
Storing signatures
/ # exit
[testuser@fedora31 ~]$ podman --version
podman version 1.8.0
[testuser@fedora31 ~]$
The flag --rm is also often useful.
It seems the error you get is related to the UID mapping.
Here is some information regarding running "rootless" podman:
https://github.com/containers/libpod/blob/master/docs/tutorials/rootless_tutorial.md
What might also be interesting is this note from https://github.com/containers/libpod/blob/master/rootless.md:
"Does not work on NFS or parallel filesystem homedirs"
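The "might not be enough IDs available in the namespace" message usually means the user has no (or too small a) subordinate UID/GID range. A minimal sketch of one common fix, assuming the user is called testuser (the range values are a typical default, not from the thread):
grep testuser /etc/subuid /etc/subgid          # check whether any ranges are allocated
sudo usermod --add-subuids 100000-165535 --add-subgids 100000-165535 testuser
podman system migrate                          # have podman pick up the new mapping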

How to save output of python-swiftclient to file when downloading a directory?

Sometimes I get errors when I download files from a cloud with python-swiftclient, like this one:
Error downloading object 'uploads/1/image.png': Object GET failed: https://orbit.brightbox.com/v1/acc-12345/uploads/1/image.png 500 Internal Error b'An error occurred'
To search for all the errors and re-download the failed files, I want to save the output of the swift command to a file.
I tried the following:
swift-cli -A https://orbit.brightbox.com/v1/acc-12345 \
-U user -K secret download uploads 2>&1 | tee uploads.log
# and
swift-cli -A https://orbit.brightbox.com/v1/acc-12345 \
-U user -K secret download uploads > uploads.log
But this didn't work. man swift describes the -o option:
For a single object download, you may use the -o [--output]
option to redirect the output to a specific file or if "-" then just redirect to stdout or with --no-download actually not to write anything to disk.
but when I try to download a directory with the -o option, it fails with
-o option only allowed for single file downloads
How can I save log to a file when I download a directory with swift CLI?
Actually, redirecting output to a file does work with swift-client:
swift-cli -A https://orbit.brightbox.com/v1/acc-12345 \
-U user -K secret download uploads > uploads.log
I was confused because, after I started the command above, I ran the following in another terminal window:
tail -f uploads.log
But it didn't give me any output (unlike what I saw when running the download command without redirection).
It seems that the swift client writes to the file in batches, and I needed to wait about a minute until tail -f dumped a hundred lines like this into the console:
uploads/documents/1/image.png [auth 0.000s, headers 0.390s, total 14.361s, 0.034 MB/s]
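The delay is ordinary output buffering: when stdout goes to a file or pipe instead of a terminal, Python buffers it in large blocks. Since python-swiftclient is a Python program, one way to make the log lines appear immediately (a sketch based on that assumption, reusing the command from the question) is to disable Python's buffering for that run:
PYTHONUNBUFFERED=1 swift-cli -A https://orbit.brightbox.com/v1/acc-12345 \
    -U user -K secret download uploads 2>&1 | tee uploads.log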

parse output from running wget command

I'm using wget to synchronise my repository server (I know, wget is not the best tool, but company policy forces me...).
This is the wget command:
/usr/bin/wget --no-check-certificate -r -N -np -nH --cut-dirs=2 --include-directories=dir_1/dir_2/RPMS.all https://repo_url/dir_1/dir_2/RPMS.all
This does the job, but I would like to capture the output of wget, which looks like this (for example):
--2016-07-07 16:59:10-- https://repo_url/dir_1/dir_2/RPMS.all/repodata/d65d6fc4c2a0500803acde0525aa3e604a5ea03ac7b11c5694cc8b1de08ce7cc-filelists.xml.gz
Reusing existing connection to repo_url:443.
Proxy request sent, awaiting response... 200 OK
Length: 156605 (153K) [application/octet-stream]
Server file no newer than local file ‘RPMS.all/repodata/d65d6fc4c2a0500803acde0525aa3e604a5ea03ac7b11c5694cc8b1de08ce7cc-filelists.xml.gz’ -- not retrieving.
so I can process this output (using grep, awk, or whatever) and show only the file wget is currently fetching.
Apart from that, I want to display that output on the same line, overwritten over and over until wget is finished (maybe even discarding the 'no newer' files, like the one above).
I tried several solutions I found (e.g. using IFS, shopt, or stdbuf), but none seem to work. I also tried the wget -O - option, but that doesn't work either.
Maybe to clarify a bit more:
I'd like to do this while wget is working. I don't want to do this when wget is finished, but process each connection while wget is running, whether the source file is newer or not.
Is this at all possible?
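One possible sketch (not from the original thread): wget logs its progress to stderr, so merge that into stdout, keep only the request lines, and rewrite the current URL on a single line. grep's --line-buffered flag keeps the output flowing while wget is still running:
/usr/bin/wget --no-check-certificate -r -N -np -nH --cut-dirs=2 \
    --include-directories=dir_1/dir_2/RPMS.all https://repo_url/dir_1/dir_2/RPMS.all 2>&1 \
  | grep --line-buffered '^--' \
  | awk '{ printf "\r\033[K%s", $3; fflush() }'   # $3 is the URL on each "--date time-- URL" line
echo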

I am trying to integrate LDAP with devstack, and when I ran ./stack.sh I got this: localrc: line 9: KEYSTONE_IDENTITY_BACKEND: command not found

localrc file:
ADMIN_PASSWORD=password2
MYSQL_PASSWORD=password2
RABBIT_PASSWORD=password2
SERVICE_PASSWORD=password2
SERVICE_TOKEN=token2
ENABLED_SERVICES=key,n-api,n-crt,n-obj,n-cpu,n-net,n-cond,cinder,c-sch,c-api,c-vol,n-sch,n-novnc,n-xvnc,n-cauth,horizon,mysql,rabbit,ldap
KEYSTONE_IDENTITY_BACKEND=ldap
KEYSTONE_CLEAR_LDAP=yes
LDAP_PASSWORD=9632
I followed this website: http://www.ibm.com/developerworks/cloud/library/cl-ldap-keystone/
I am assuming the above snippet is from a file written in shell script. Your example looks Ok.
I checked the link you provided and noted that the line you say failed is written in the IBM example as:
KEYSTONE_IDENTITY_BACKEND = ldap
which is not legal sh (or bash) and would cause exactly the error message you described:
KEYSTONE_IDENTITY_BACKEND = ldap
-bash: KEYSTONE_IDENTITY_BACKEND: command not found
I suspect you copied and pasted the bad example from the link into your localrc file, which caused the error you saw, but somehow when you wrote the SO question, you corrected the mistake by removing the spaces around the "=".
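For completeness, the underlying shell rule (a minimal illustration, not from the thread): with spaces around "=", bash treats the variable name as a command to run rather than an assignment.
KEYSTONE_IDENTITY_BACKEND=ldap      # assignment: no spaces around "="
KEYSTONE_IDENTITY_BACKEND = ldap    # parsed as a command named KEYSTONE_IDENTITY_BACKEND with arguments "=" and "ldap"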
Edit: Investigation
TL;DR
Create a file in the root of the devstack repo, devstack/local.conf with the contents:
[[local|localrc]]
ADMIN_PASSWORD=password2
MYSQL_PASSWORD=password2
RABBIT_PASSWORD=password2
SERVICE_PASSWORD=password2
SERVICE_TOKEN=token2
ENABLED_SERVICES=key,n-api,n-crt,n-obj,n-cpu,n-net,n-cond,cinder,c-sch,c-api,c-vol,n-sch,n-novnc,n-xvnc,n-cauth,horizon,mysql,rabbit,ldap
KEYSTONE_IDENTITY_BACKEND=ldap
KEYSTONE_CLEAR_LDAP=yes
LDAP_PASSWORD=9632
Full Description
I installed devstack on CentOS 7 (using the Devstack Quick Start Guide):
git clone https://git.openstack.org/openstack-dev/devstack
cd devstack
./stack.sh
I entered passwords as prompted, but eventually it failed with the error:
Error: pg_config executable not found.
Please add the directory containing pg_config to the PATH
or specify the full executable path with the option:
python setup.py build_ext --pg-config /path/to/pg_config build ...
or with the pg_config option in 'setup.cfg'.
I traced the problem to a limited PATH in the sudoers entry, and because my PostgreSQL install is in a non-standard location, I linked pg_config into /usr/local/bin and ran stack.sh again:
sudo ln -s /usr/pgsql-9.3/bin/pg_config /usr/local/bin/pg_config
./stack.sh
(You probably won't have to do this if Postgres is in a standard location).
The install took a long time, but eventually finished:
This is your host IP address: 192.168.200.181
This is your host IPv6 address: ::1
Horizon is now available at http://192.168.200.181/dashboard
Keystone is serving at http://192.168.200.181/identity/
The default users are: admin and demo
The password: 12345678
2016-07-17 18:16:32.834 | WARNING:
2016-07-17 18:16:32.834 | Using lib/neutron-legacy is deprecated, and it will be removed in the future
2016-07-17 18:16:32.834 | stack.sh completed in 1447 seconds.
I killed the devstack session and did it all again with a clean git repo and a local.conf file:
./unstack.sh
cd ..
git clone https://git.openstack.org/openstack-dev/devstack
cd devstack
cat << __EOF > local.conf
[[local|localrc]]
ADMIN_PASSWORD=password2
MYSQL_PASSWORD=password2
RABBIT_PASSWORD=password2
SERVICE_PASSWORD=password2
SERVICE_TOKEN=token2
ENABLED_SERVICES=key,n-api,n-crt,n-obj,n-cpu,n-net,n-cond,cinder,c-sch,c-api,c-vol,n-sch,n-novnc,n-xvnc,n-cauth,horizon,mysql,rabbit,ldap
KEYSTONE_IDENTITY_BACKEND=ldap
KEYSTONE_CLEAR_LDAP=yes
LDAP_PASSWORD=9632
__EOF
./stack.sh
This time there were no password prompts, so the local config was definitely read.

Is there a curl/wget option that prevents saving files in case of http errors?

I want to download a lot of urls in a script but I do not want to save the ones that lead to HTTP errors.
As far as I can tell from the man pages, neither curl nor wget provides such functionality.
Does anyone know about another downloader that does?
I think the -f option to curl does what you want:
-f, --fail
(HTTP) Fail silently (no output at all) on server errors. This is mostly done to better
enable scripts etc to better deal with failed attempts. In normal cases when an HTTP
server fails to deliver a document, it returns an HTML document stating so (which often
also describes why and more). This flag will prevent curl from outputting that and
return error 22. [...]
However, if the response was actually a 301 or 302 redirect, that still gets saved, even if its destination would result in an error:
$ curl -fO http://google.com/aoeu
$ cat aoeu
<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>301 Moved</TITLE></HEAD><BODY>
<H1>301 Moved</H1>
The document has moved
here.
</BODY></HTML>
To follow the redirect to its dead end, also give the -L option:
-L, --location
(HTTP/HTTPS) If the server reports that the requested page has moved to a different
location (indicated with a Location: header and a 3XX response code), this option will
make curl redo the request on the new place. [...]
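Putting the two flags together for the batch case described in the question, a minimal sketch (the urls.txt file and the loop are my own assumptions, not from the thread; depending on the curl version, some failures can still leave an empty file behind):
# one URL per line in urls.txt; -f suppresses error bodies, -L follows redirects
# to their final status, -sS stays quiet but still reports errors, and -O keeps
# the remote file name only when the final response succeeds
while read -r url; do
    curl -fsSL -O "$url" || echo "skipped $url (HTTP error)" >&2
done < urls.txt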
A one-liner I just set up for this very purpose
(it works only with a single file, but might be useful for others):
A=$$; ( wget -q "http://foo.com/pipo.txt" -O $A.d && mv $A.d pipo.txt ) || (rm $A.d; echo "Removing temp file")
This will attempt to download the file from the remote host. If there is an error, the file is not kept. In all other cases, it is kept and renamed.
Ancient thread, but I landed here looking for a solution and ended up writing some shell code to do it.
if [ `curl -s -w "%{http_code}" --compress -o /tmp/something \
http://example.com/my/url/` = "200" ]; then
echo "yay"; cp /tmp/something /path/to/destination/filename
fi
This downloads the output to a temp file and creates/overwrites the destination file only if the status was 200. My use case is slightly different: in my case the output takes more than 10 seconds to generate, and I did not want the destination file to remain blank for that duration.
NOTE: I am aware that this is an older question, but I believe I have found a better solution for those using wget than any of the above answers provide.
wget -q $URL 2>/dev/null
will save the target file to the local directory if and only if the HTTP status code is within the 200 range (OK).
Additionally, if you want to do something like print an error whenever the request fails, you can check the wget exit code for a non-zero value like so:
wget -q $URL 2>/dev/null
if [ $? != 0 ]; then
echo "There was an error!"
fi
I hope this is helpful to someone out there facing the same issues I was.
Update:
I just put this into a more script-able form for my own project, and thought I'd share:
function dl {
    pushd . > /dev/null
    cd $(dirname $1)
    wget -q $BASE_URL/$1 2> /dev/null
    if [ $? != 0 ]; then
        echo ">> ERROR could not download file \"$1\"" 1>&2
        exit 1
    fi
    popd > /dev/null
}
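A hypothetical invocation of that helper (BASE_URL and the path are placeholders; the local subdirectory must already exist, because the function cd's into it before downloading):
BASE_URL=https://repo.example.com/rpms
dl subdir/package-1.0.rpm    # saves into ./subdir/, or prints an error and exits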
I have a workaround to propose: it does download the file, but it also removes it if its size is 0 (which happens if a 404 occurs).
wget -O <filename> <url/to/file>
if [[ $(du <filename> | cut -f 1) == 0 ]]; then
rm <filename>;
fi;
It works for zsh but you can adapt it for other shells.
But it only saves the file in the first place if you provide the -O option.
As an alternative, you can download to a temporary file and rotate it:
wget http://example.net/myfile.json -O myfile.json.tmp -t 3 -q && mv myfile.json.tmp myfile.json
The previous command always downloads to the file "myfile.json.tmp"; however, the file is only renamed to "myfile.json" when the wget exit status is 0.
This prevents the final file from being overwritten when a network failure occurs.
The advantage of this method is that, if something goes wrong, you can inspect the temporary file and see what error message was returned.
The -t parameter retries the download several times in case of error.
The -q flag enables quiet mode, which matters with cron because cron will report any output wget produces.
The -O flag sets the output file path and name.
Remember that for cron schedules it is very important always to provide the full path for all files and, in this case, for the wget program itself as well.
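As a hypothetical crontab entry tying these points together (the paths are placeholders, not from the answer):
# run every 15 minutes; full paths for wget, mv, and all files, as noted above
*/15 * * * * /usr/bin/wget -q -t 3 http://example.net/myfile.json -O /var/data/myfile.json.tmp && /bin/mv /var/data/myfile.json.tmp /var/data/myfile.json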
You can download the file without saving it to disk by using the "-O -" option, as in:
wget -O - http://jagor.srce.hr/
You can get more information at http://www.gnu.org/software/wget/manual/wget.html#Advanced-Usage