tar -xvf file.tar.gz failed with: gzip: stdin: not in gzip format

I archived a set of files with the command tar -czvf file.tar.gz file/ and then copied the archive to a USB drive (ext4 format); I checked at that point that I could untar it. After I reinstalled the system, mounting the USB drive produced errors, so I ran fsck /dev/sdc1, after which I could mount it and copy the archive to my PC. But when I untar it with tar -xvf file.tar.gz, the error appears again:
gzip: stdin: not in gzip format
tar: Child returned status 1
tar: Error is not recoverable: exiting now
I have no idea how to rescue the data.
Any help would be appreciated. Thanks.

#sitexa, I got this error when the file was not fully downloaded (transferred).
tar xvfz apache-tomcat-8.5.12.tar.gz
gzip: stdin: not in gzip format
tar: Child returned status 1
tar: Error is not recoverable: exiting now
This error occurs because the file was not completely downloaded.
reference : https://ubuntuforums.org/showthread.php?t=1319801

Check the file type with the command:
file <file name>
If output like
<filename>: HTML document, ASCII text, with very long lines
comes back, the file may be corrupted (or is not a real archive at all).
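As a further check, gzip and tar can test the archive directly; a minimal sketch (the file name is the one from the question above, adjust to yours):
# does the file start with a recognizable gzip header at all?
file file.tar.gz
# test the compressed stream without extracting anything
gzip -t file.tar.gz
# if the data was actually written as a plain (uncompressed) tar, this may still list it
tar -tvf file.tar.gz
If gzip -t also reports "not in gzip format", the leading bytes of the file are already wrong, which usually means the copy or download was truncated or corrupted.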

Related

CUB_200_2011 Dataset Download Link Error in Colab

I am trying to download the CUB_200_2011 dataset in Colab using
!wget http://www.vision.caltech.edu/visipedia-data/CUB-200-2011/CUB_200_2011.tgz
After running this I got:
--2021-05-28 10:13:12-- http://www.vision.caltech.edu/visipedia-data/CUB-200-2011/CUB_200_2011.tgz
Resolving www.vision.caltech.edu (www.vision.caltech.edu)... 34.208.54.77
Connecting to www.vision.caltech.edu (www.vision.caltech.edu)|34.208.54.77|:80... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://drive.google.com/file/d/1hbzc_P1FuxMkcabkgn9ZKinBwW683j45/view [following]
--2021-05-28 10:13:12-- https://drive.google.com/file/d/1hbzc_P1FuxMkcabkgn9ZKinBwW683j45/view
Resolving drive.google.com (drive.google.com)... 74.125.195.102, 74.125.195.113, 74.125.195.138, ...
Connecting to drive.google.com (drive.google.com)|74.125.195.102|:443... connected.
HTTP request sent, awaiting response... 200 OK
**Length: unspecified [text/html]**
Saving to: ‘CUB_200_2011.tgz’
CUB_200_2011.tgz [ <=> ] 71.36K --.-KB/s in 0.03s
2021-05-28 10:13:13 (2.41 MB/s) - ‘CUB_200_2011.tgz’ saved [73069]
Length is unspecified, it says it's an HTML file, and I cannot untar it, as I get an error.
!tar -xvzf CUB_200_2011.tgz
gzip: stdin: not in gzip format
tar: Child returned status 1
tar: Error is not recoverable: exiting now
Is there anything wrong with the link or what is the problem?
Look at the message carefully: the download URL redirects to a Google Drive page, which shows a confirmation page instead of starting the download. The following command is prepared for your case: it drives the download with the Google Drive file id, sets CUB_200_2011.tgz as the output file, uses /tmp/cookies.txt (written with --save-cookies and --keep-session-cookies) to hold the session cookie during the download, auto-confirms the download by extracting the confirm token, skips the certificate check with --no-check-certificate, and removes /tmp/cookies.txt once the download is over.
!wget --load-cookies /tmp/cookies.txt "https://docs.google.com/uc?export=download&confirm=$(wget --quiet --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate 'https://docs.google.com/uc?export=download&id=1hbzc_P1FuxMkcabkgn9ZKinBwW683j45' -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1\n/p')&id=1hbzc_P1FuxMkcabkgn9ZKinBwW683j45" -O CUB_200_2011.tgz && rm -rf /tmp/cookies.txt
Also, there is nothing wrong with your tar command; it should work properly once the first command completes correctly. Hopefully this resolves your issue.
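As an alternative, a sketch using the gdown utility (assuming it is available or installable in the Colab session; the file id is the one from the Drive URL above), which handles the Google Drive confirmation step itself:
!pip install -q gdown
!gdown "https://drive.google.com/uc?id=1hbzc_P1FuxMkcabkgn9ZKinBwW683j45" -O CUB_200_2011.tgz
!tar -xvzf CUB_200_2011.tgz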
It seems that the original authors redirected the dataset link to a Google Drive link (this broke tons of online tutorials), but a new public mirror of the data is provided by fast.ai and can be obtained in an IPython session with the following line:
!wget https://s3.amazonaws.com/fast-ai-imageclas/CUB_200_2011.tgz

Download/Copy tar.gz File from S3 to EC2

When I download a tar.gz file from AWS S3 and then try to untar it, I get the following error:
tar -xzvf filename_backup_jan212021_01.tar.gz
gzip: stdin: not in gzip format
tar: Child returned status 1
tar: Error is not recoverable: exiting now
When I check what type of file it is, I get this:
file filename_backup_jan212021_01.tar.gz
filename_backup_jan212021_01.tar.gz: ASCII text
This is the command I am using to copy the file from S3 to my EC2:
aws s3 cp s3://bucket_name/filename_backup_jan212021_01.tar.gz .
Please help me find a way to extract the tar.gz file after downloading it from AWS S3.
tar -xzvf filename_backup_jan212021_01.tar.gz
gzip: stdin: not in gzip format
file filename_backup_jan212021_01.tar.gz
filename_backup_jan212021_01.tar.gz: ASCII text
cat filename_backup_jan212021_01.tar.gz
/home/ec2-user/file_delete_01.txt
/home/ec2-user/file_jan2021.txt
/home/ec2-user/filename_backup_jan1.tar.gz
/home/ec2-user/filename_backup_jan1.txt
/home/ec2-user/filename_backup_jan2.tar.gz
/home/ec2-user/filename_backup_jan3.tar.gz
All of these indicate that the file uploaded to S3 is not a gzip'd tar file, but rather a plain text file (a list of file names) uploaded with a .tar.gz filename. While filenames and extensions indicate content type to humans, computers think otherwise :)
You can create the archive and upload it with
tar cvzf <archive name> </path/to/files/to/be/tarred> && aws s3 cp <archive name> s3://<bucket path>
and then use the commands you mention in the question to download and extract it. Of course, replace the placeholders with the proper names.
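For instance, a minimal sketch with hypothetical names (the bucket, key prefix, and source directory are placeholders, adjust them to your setup):
# on the machine holding the data: build a real gzip'd tar and upload it
tar -czvf backup_jan212021_01.tar.gz /home/ec2-user/data/
aws s3 cp backup_jan212021_01.tar.gz s3://my-bucket/backups/
# on the EC2 instance: download, verify, extract
aws s3 cp s3://my-bucket/backups/backup_jan212021_01.tar.gz .
file backup_jan212021_01.tar.gz    # should now report "gzip compressed data"
tar -xzvf backup_jan212021_01.tar.gz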

s3cmd put files from tar stdin: [Errno 32] Broken pipe

I was trying to upload files from stdin with s3cmd, using the command below:
tar cfz - folder | s3cmd put - s3://backups/abcd2.tar
but sometimes I get this error:
ERROR: Cannot retrieve any response status before encountering an EPIPE or ECONNRESET exception
WARNING: Upload failed: /abcd2.tar?partNumber=21&uploadId=... ([Errno 32] Broken pipe)
WARNING: Waiting 3 sec...
I think s3cmd is waiting too long for tar, and Amazon S3 closes the connection (RequestTimeout).
How can I solve this problem? Maybe add a buffer between tar and s3cmd, but how?
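Two possible approaches, as a sketch (mbuffer is a separate utility that may need to be installed, and the 512M buffer size is an arbitrary choice):
# Option 1: write the archive to disk first, then upload, so S3 never has to wait on tar
tar cfz /tmp/abcd2.tar.gz folder && s3cmd put /tmp/abcd2.tar.gz s3://backups/abcd2.tar.gz
# Option 2: keep streaming, but smooth out slow tar output with an in-memory buffer
tar cfz - folder | mbuffer -m 512M | s3cmd put - s3://backups/abcd2.tar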

ansible - unarchive - input file not found

I'm getting this error while Ansible (1.9.2) is trying to unpack the file.
19:06:38 TASK: [jmeter | unpack jmeter] ************************************************
19:06:38 fatal: [jmeter01.veryfast.server.jenkins] => input file not found at /tmp/apache-jmeter-2.13.tgz or /tmp/apache-jmeter-2.13.tgz
19:06:38
19:06:38 FATAL: all hosts have already failed -- aborting
19:06:38
I checked on the target server: the /tmp/apache-jmeter-2.13.tgz file exists and has valid permissions (for testing I even set 777, though that is not required, but I still got the above error message).
I also checked the md5sum of this file (compared it with the one on the Apache JMeter site) -- it matches!
# md5sum apache-jmeter-2.13.tgz|grep 53dc44a6379b7b4a57976936f3a65e03
53dc44a6379b7b4a57976936f3a65e03 apache-jmeter-2.13.tgz
When I use tar -xvzf on this file, tar is able to list/extract the contents of the .tgz file.
What could I be missing? At this point, I'm wondering whether the unarchive module in Ansible has a bug.
My last resort (if I can't get unarchive in Ansible to work) would be to use the command module with "tar -xzvf /tmp/.....", but I'd prefer not to do that.
The default behavior of unarchive is to find the file on your local system, copy it to the remote host, and unpack it there. I suspect that if you're getting a file-not-found error, you need to specify copy=no in your task.
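A minimal sketch of such a task in the old key=value form (the dest path /opt is just an assumption, use whatever your role actually targets):
- name: unpack jmeter
  unarchive: src=/tmp/apache-jmeter-2.13.tgz dest=/opt copy=no
With copy=no, Ansible treats src as a path that already exists on the remote host instead of first looking for it on the control machine.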

why is gzip trying to compress itself

Attempting to run gzip from a command prompt to compress any file returns
gzip: /usr/bin/gzip is not a directory or a regular file - ignored
as the first line of output.
Here's what I can share that may shed some light:
oslevel
7.1.0.0
echo $SHELL
/usr/bin/ksh
gzip -V
gzip 1.2.4 (18 Aug 93)
Compilation options:
DIRENT UTIME STDC_HEADERS HAVE_UNISTD_H
To produce the error, all I have to do is try to compress any file with gzip (e.g., gzip test.out). The error occurs when run from the command prompt as well as when run from cron.
Any thoughts as to why this is happening?
Additional requested information:
gzip -h
gzip 1.2.4 (18 Aug 93)
usage: gzip [-cdfhlLnNrtvV19] [-S suffix] [file ...]
-c --stdout write on standard output, keep original files unchanged
-d --decompress decompress
-f --force force overwrite of output file and compress links
-h --help give this help
-l --list list compressed file contents
-L --license display software license
-n --no-name do not save or restore the original name and time stamp
-N --name save or restore the original name and time stamp
-q --quiet suppress all warnings
-r --recursive operate recursively on directories
-S .suf --suffix .suf use suffix .suf on compressed files
-t --test test compressed file integrity
-v --verbose verbose mode
-V --version display version number
-1 --fast compress faster
-9 --best compress better
file... files to (de)compress. If none given, use standard input.
file /usr/bin/gzip
/usr/bin/gzip: executable (RISC System/6000) or object module
gzip *.out
gzip: /usr/bin/gzip is not a directory or a regular file - ignored
gzip -d *.gz
gzip: /usr/bin/gzip is not a directory or a regular file - ignored
Found the issue:
We have an environment file where we load common environment variables. One line in the file is
export GZIP="/usr/bin/gzip"
According to the gzip documentation, the GZIP environment variable is used to hold default options. So gzip is probably prepending its value to the command line, and since the value is not really an option, it interprets it as a file name to compress, which it refuses because /usr/bin/gzip is actually a symbolic link (not a directory or a regular file). By unsetting the variable, the error goes away.
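A quick sketch to reproduce and clear the error (test.out is the example file name used above):
# reproduce: gzip prepends $GZIP to its argument list, so the path is treated as something to compress
export GZIP="/usr/bin/gzip"
gzip test.out        # gzip: /usr/bin/gzip is not a directory or a regular file - ignored
# fix: unset the variable, or keep only real options in it (e.g. GZIP="-9")
unset GZIP
gzip test.out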