why am I getting a negative compression ratio for a gz file - gzip

I found an interesting thing about a gz file: the compression ratio is negative.
[root#pridns named]# ll dns-query.log-2022083103*
-rw-r--r-- 1 named named 1.2G Aug 31 03:10 dns-query.log-2022083103.gz
[root#pridns named]#
[root#pridns named]# gzip -l dns-query.log-2022083103.gz
compressed uncompressed ratio uncompressed_name
1187103824 679547787 -74.7% dns-query.log-2022083103
[root#pridns named]#
[root#pridns named]# python -c "print((679547787-1187103824)/679547787.0)"
-0.7469026413
[root#pridns named]#
[root#pridns named]# gunzip -c dns-query.log-2022083103.gz > dns-query.log-2022083103
[root#pridns named]#
[root#pridns named]# ll dns-query.log-2022083103*
-rw-r--r-- 1 root root 8.7G Sep 16 11:06 dns-query.log-2022083103
-rw-r--r-- 1 named named 1.2G Aug 31 03:10 dns-query.log-2022083103.gz
[root#pridns named]#
[root#pridns named]# ls -l dns-query.log-2022083103*
-rw-r--r-- 1 root root 9269482379 Sep 16 11:06 dns-query.log-2022083103
-rw-r--r-- 1 named named 1187103824 Aug 31 03:10 dns-query.log-2022083103.gz
[root#pridns named]#
[root#pridns named]# python -c "print((9269482379-1187103824)/9269482379.0)"
0.871934184082
[root#pridns named]# gzip --version
gzip 1.5
Copyright (C) 2007, 2010, 2011 Free Software Foundation, Inc.
Copyright (C) 1993 Jean-loup Gailly.
This is free software. You may redistribute copies of it under the terms of
the GNU General Public License <http://www.gnu.org/licenses/gpl.html>.
There is NO WARRANTY, to the extent permitted by law.
Written by Jean-loup Gailly.
[root#pridns named]#
The correct compression ratio should be 87.2%.
I also tested another file, and its compression ratio is reported correctly.
[root#pridns named]# gzip -l dns-security.log-2022083103.gz
compressed uncompressed ratio uncompressed_name
131503235 1275215408 89.7% dns-security.log-2022083103
[root#pridns named]# zcat dns-security.log-2022083103.gz > dns-security.log-2022083103
[root#pridns named]# ls -l dns-security.log-2022083103*
-rw-r--r-- 1 root root 1275215408 Sep 16 11:31 dns-security.log-2022083103
-rw-r--r-- 1 named named 131503235 Aug 31 03:10 dns-security.log-2022083103.gz
[root#pridns named]# python -c "print((1275215408-131503235)/1275215408.0)"
0.896877630105
[root#pridns named]#

The comments of @pmqs are correct. From the spec at https://www.rfc-editor.org/rfc/rfc1952, the original file size is stored in only 4 bytes. 9269482379 = 0x22881138B; the high bits are truncated, so the stored size is 0x2881138B = 679,547,787.
2.3.1. Member header and trailer
...
ISIZE (Input SIZE)
This contains the size of the original (uncompressed) input
data modulo 2^32.
...
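Since ISIZE is just the last 4 bytes of the gzip member (stored little-endian), you can read the stored value directly and reproduce the truncation with python -c one-liners in the same style as above. This is only a sketch assuming the file from this question; both commands should print 679547787, i.e. 9269482379 modulo 2^32:
$ python -c "import struct; f=open('dns-query.log-2022083103.gz','rb'); f.seek(-4,2); print(struct.unpack('<I', f.read(4))[0])"
$ python -c "print(9269482379 % 2**32)"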

Related

maxminddb module for apache 2.4 on ubuntu error: The MaxMind DB file contains invalid metadata

Here is my system information:
$ apachectl -v
Server version: Apache/2.4.18 (Ubuntu)
Server built: 2016-07-14T12:32:26
$ cat /etc/*release* | grep -i dist
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=16.04
DISTRIB_CODENAME=xenial
DISTRIB_DESCRIPTION="Ubuntu 16.04.1 LTS"
$ uname -r
4.4.0-57-generic
I wish to install mod_maxminddb. I installed geoipupdate as a prerequisite and configured it to fetch not only the GeoLite databases but also the commercial ones (using this):
$ ls -l /etc/GeoIP.conf
-rw-r--r-- 1 root root 818 Dec 24 18:29 /etc/GeoIP.conf
$ ls -l /usr/share/GeoIP/
total 0
$ geoipupdate
$ ls -l /usr/share/GeoIP/
total 187444
-rw-r--r-- 1 root root 112192399 Dec 24 18:46 GeoIP2-City.mmdb
-rw-r--r-- 1 root root 3012279 Dec 24 18:46 GeoIP2-Country.mmdb
-rw-r--r-- 1 root root 47721533 Dec 24 18:46 GeoIPCity.dat
-rw-r--r-- 1 root root 1699494 Dec 24 18:45 GeoIP.dat
-rw-r--r-- 1 root root 4189407 Dec 24 18:45 GeoIPISP.dat
-rw-r--r-- 1 root root 4299547 Dec 24 18:45 GeoLiteASNum.dat
-rw-r--r-- 1 root root 17760694 Dec 24 18:45 GeoLiteCity.dat
-rw-r--r-- 1 root root 1054583 Dec 24 18:45 GeoLiteCountry.dat
Then I compiled and installed mod_maxminddb, and when Apache is configured to use the commercial databases, the following error is thrown:
$ apachectl -M
AH00526: Syntax error on line 12 of /etc/apache2/mods-enabled/maxminddb.conf:
MaxMindDBFile: Failed to open /usr/share/GeoIP/GeoIPCity.dat: The MaxMind DB file contains invalid metadata
But that is not the case when Apache is configured to use the GeoLite databases.
Any ideas?
You didn't include your Apache config, but you appear to be trying to use mod_maxminddb with GeoIPCity.dat, which is a GeoIP Legacy database. mod_maxminddb only works with GeoIP2 (.mmdb) databases. Adjust line 12 of maxminddb.conf to point at GeoIP2-City.mmdb instead.
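For reference, here is a minimal sketch of what maxminddb.conf could look like with the standard mod_maxminddb directives; the database name CITY_DB is just an illustrative label, the point is that MaxMindDBFile must reference a .mmdb file:
<IfModule mod_maxminddb.c>
    MaxMindDBEnable On
    # must point at a GeoIP2 (.mmdb) database, not a legacy .dat file
    MaxMindDBFile CITY_DB /usr/share/GeoIP/GeoIP2-City.mmdb
</IfModule>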

gitlab backup: make gitlab-rake produce tar.gz files not tar

The backup files I get with gitlab-rake are tar files. How can I get tar.gz instead?
Here are the files:
root#gitlab:~# ll /mnt/backup-git/ -h
total 1.9G
-rw------- 1 git git 57M Nov 29 15:57 1480431448_gitlab_backup.tar
-rw------- 1 git git 57M Nov 29 15:57 1480431473_gitlab_backup.tar
-rw------- 1 git git 452M Nov 30 02:00 1480467623_gitlab_backup.tar
Here are my configuration values for the backup:
$ grep -i backup /etc/gitlab/gitlab.rb | grep -v '^#'
gitlab_rails['backup_path'] = "/mnt/backup-git/"
gitlab_rails['backup_keep_time'] = 604800
To create them I follow the documentation here (omnibus installation):
root#gitlab:~# crontab -l | grep -v '^#'
0 2 * * * /opt/gitlab/bin/gitlab-rake gitlab:backup:create CRON=1
It doesn't really make sense to compress the GitLab backup tar files. The backup tar file is the final tarball produced by the backup process, and its contents are already compressed individually during that process. You can read more here.
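You can see this for yourself by listing one of the tarballs; its members are already compressed (for example, the database dump is typically a .sql.gz file; the exact entries vary by GitLab version):
root#gitlab:~# tar -tvf /mnt/backup-git/1480467623_gitlab_backup.tar | head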

ocaml 4.01.0 → 4.02.1, binary size became larger

On Ubuntu 14.04, 32 bit:
➥ cat test.ml
let () = print_endline "hello";
➥ opam switch list | grep " C "
4.01.0 C 4.01.0 Official 4.01.0 release
➥ ocamlopt test.ml
➥ ls -l a.out
-rwxrwxr-x 1 shorrty shorrty 158569 Oct. 30 13:29 a.out
➥ opam switch 4.02.0
➥ eval `opam config env`
➥ ocamlopt test.ml
➥ ls -l a.out
-rwxrwxr-x 1 shorrty shorrty 171122 Oct. 30 13:30 a.out
➥ opam switch 4.02.1
➥ eval `opam config env`
➥ ocamlopt test.ml
➥ ls -l a.out
-rwxrwxr-x 1 shorrty shorrty 171196 Oct. 30 14:08 a.out
The executable size keeps getting bigger: 158569 → 171122 → 171196.
In more complex applications I see an even greater increase in file size.
Any ideas how to fix this?
Update #1:
Tried strip:
➥ strip -V | head -n 1
GNU strip (GNU Binutils for Ubuntu) 2.24
➥ ls -l
-rwxrwxr-x 1 shorrty shorrty 158569 Oct. 30 15:22 a.4.01.0.out
-rwxrwxr-x 1 shorrty shorrty 117368 Oct. 30 15:26 a.4.01.0.out.stripped
-rwxrwxr-x 1 shorrty shorrty 171122 Oct. 30 15:03 a.4.02.0.out
-rwxrwxr-x 1 shorrty shorrty 127580 Oct. 30 15:26 a.4.02.0.out.stripped
-rwxrwxr-x 1 shorrty shorrty 171196 Oct. 30 15:21 a.4.02.1.out
-rwxrwxr-x 1 shorrty shorrty 127612 Oct. 30 15:26 a.4.02.1.out.stripped
-rwxrwxr-x 1 shorrty shorrty 158569 Oct. 30 15:21 a.out
It continues to grow: 117368 → 127580 → 127612
Update #2:
Tried the -compact option; it didn't help:
➥ opam switch 4.01.0 && eval `opam config env`
➥ ocamlopt test.ml && ls -l a.out
-rwxrwxr-x 1 shorrty shorrty 158569 Oct. 30 22:02 a.out
➥ ocamlopt -compact test.ml && ls -l a.out
-rwxrwxr-x 1 shorrty shorrty 158569 Oct. 30 22:03 a.out
➥ opam switch 4.02.1 && eval `opam config env`
➥ ocamlopt test.ml && ls -l a.out
-rwxrwxr-x 1 shorrty shorrty 171196 Oct. 30 22:05 a.out
➥ ocamlopt -compact test.ml && ls -l a.out
-rwxrwxr-x 1 shorrty shorrty 171196 Oct. 30 22:05 a.out
Tried the -inline option; it didn't help either:
➥ opam switch 4.01.0 && eval `opam config env`
➥ ocamlopt -inline 0 test.ml && ls -l a.out
-rwxrwxr-x 1 shorrty shorrty 158569 Oct. 30 22:07 a.out
➥ ocamlopt -inline 1 test.ml && ls -l a.out
-rwxrwxr-x 1 shorrty shorrty 158569 Oct. 30 22:07 a.out
➥ opam switch 4.02.1 && eval `opam config env`
➥ ocamlopt -inline 0 test.ml && ls -l a.out
-rwxrwxr-x 1 shorrty shorrty 171196 Oct. 30 22:08 a.out
➥ ocamlopt -inline 1 test.ml && ls -l a.out
-rwxrwxr-x 1 shorrty shorrty 171196 Oct. 30 22:09 a.out
Your code didn't change, but it links against the Pervasives module, which changed between 4.01 and 4.02.
Notably, the format machinery was rewritten to use GADT-based formats instead of strings.
This made to_string, of_string and format concatenation considerably heavier.
See this discussion for more details.
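If you want to see where the extra bytes went, comparing the two binaries with standard binutils is a quick, if rough, check. A sketch, assuming the unstripped files from Update #1 are still around; the GADT format code should show up in 4.02 native executables under symbols prefixed camlCamlinternalFormat, which do not exist in 4.01:
➥ size a.4.01.0.out a.4.02.1.out
➥ nm a.4.01.0.out | grep -c camlCamlinternalFormat
➥ nm a.4.02.1.out | grep -c camlCamlinternalFormat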

ipython works wrong with awk?

Dear all! I've found a problem with ipython.
When I input
!ls -l | awk '{print $$1}'
It gives me:
drwxr-xr-x 2 ckivip ckivip 4096 Oct 11 20:38 Desktop
drwxr-xr-x 6 ckivip ckivip 4096 Oct 11 22:57 Documents
drwxrwxr-x 3 ckivip ckivip 4096 Oct 11 12:53 Downloads
drwxr-xr-x 6 ckivip ckivip 4096 Sep 29 18:22 Epigenetics
drwxr-xr-x 2 ckivip ckivip 4096 Sep 20 14:59 Music
drwxr-xr-x 23 ckivip ckivip 4096 Oct 10 11:02 Pictures
drwxr-xr-x 8 ckivip ckivip 4096 Sep 20 15:21 Project
drwx------ 5 ckivip ckivip 4096 Sep 25 21:31 R
drwxr-xr-x 5 ckivip ckivip 4096 Oct 9 19:23 Share
However, when I input
!ls -l | awk '{print $1}'
It gives me:
drwxr-xr-x
drwxr-xr-x
drwxrwxr-x
drwxr-xr-x
drwxr-xr-x
drwxr-xr-x
drwxr-xr-x
drwx------
The "$" symbol is so annoying. And the ugliest part is that I also can't pass Python variables to the shell using '$' when I use awk. How can I deal with this?
I'm not familiar with ipython, but to address the part about passing the values of shell variables to awk: you do that with awk -v variable=value. So if you have a shell variable "$1" that contains the value "3", and you want awk to print the 3rd field of its input based on that, then the syntax is:
awk -v f="$1" '{ print $f }'
so in the above you could try:
!ls -l | awk -v f="$1" '{print $f}'
or if doubling the shell "$"s is required:
!ls -l | awk -v f="$$1" '{print $f}'
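As for passing a Python variable into the shell: ipython expands $name (and {name}) inside ! commands to the value of the Python variable name, and a doubled $$ becomes a literal $ for the shell. Assuming that behaviour, combining it with the awk -v idiom above would look something like this (col is just an illustrative variable name):
col = 3
!ls -l | awk -v f=$col '{print $$f}'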
Hope that helps.

How to check if a Unix .tar.gz file is a valid file without uncompressing?

I have found the question How to determine if data is valid tar file without a file?, but I was wondering: is there a ready-made command-line solution?
What about just getting a listing of the tarball and throwing away the output, rather than decompressing the file?
tar -tzf my_tar.tar.gz >/dev/null
Edited as per comment. Thanks zrajm!
Edited as per comment. Thanks Frozen Flame! This test in no way implies integrity of the data. Because tar was designed as a tape archival utility, most implementations will allow multiple copies of the same file!
You could probably use the gzip -t option to test the file's integrity:
http://linux.about.com/od/commands/l/blcmdl1_gzip.htm
from: http://unix.ittoolbox.com/groups/technical-functional/shellscript-l/how-to-test-file-integrity-of-targz-1138880
To test the gzip file is not corrupt:
gunzip -t file.tar.gz
To test the tar file inside is not corrupt:
gunzip -c file.tar.gz | tar -t > /dev/null
As part of the backup you could probably just run the latter command and
check the value of $? afterwards for a 0 (success) value. If either the tar
or the gzip has an issue, $? will have a non-zero value.
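Put into a script, that check might look like the following sketch (bash-specific because of pipefail, which makes sure a gunzip failure isn't masked by tar succeeding; the file name is just an example):
#!/bin/bash
set -o pipefail
if gunzip -c file.tar.gz | tar -t > /dev/null; then
    echo "file.tar.gz: OK"
else
    echo "file.tar.gz: verification failed" >&2
    exit 1
fi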
If you want to do a real test extract of a tar file without extracting to disk, use the -O option. This spews the extract to standard output instead of the filesystem. If the tar file is corrupt, the process will abort with an error.
Example of a failed tarball test...
$ echo "this will not pass the test" > hello.tgz
$ tar -xvzf hello.tgz -O > /dev/null
gzip: stdin: not in gzip format
tar: Child returned status 1
tar: Error exit delayed from previous errors
$ rm hello.*
Working Example...
$ ls hello*
ls: hello*: No such file or directory
$ echo "hello1" > hello1.txt
$ echo "hello2" > hello2.txt
$ tar -cvzf hello.tgz hello[12].txt
hello1.txt
hello2.txt
$ rm hello[12].txt
$ ls hello*
hello.tgz
$ tar -xvzf hello.tgz -O
hello1.txt
hello1
hello2.txt
hello2
$ ls hello*
hello.tgz
$ tar -xvzf hello.tgz
hello1.txt
hello2.txt
$ ls hello*
hello1.txt hello2.txt hello.tgz
$ rm hello*
You can also check the contents of a *.tar.gz file using pigz (parallel gzip) to speed up the archive check:
pigz -cvdp number_of_threads /[...]path[...]/archive_name.tar.gz | tar -tv > /dev/null
A nice option is to use tar -tvvf <filePath>, which adds a line that reports the archive format.
Example in a valid .tar file:
> tar -tvvf filename.tar
drwxr-xr-x 0 diegoreymendez staff 0 Jul 31 12:46 ./testfolder2/
-rw-r--r-- 0 diegoreymendez staff 82 Jul 31 12:46 ./testfolder2/._.DS_Store
-rw-r--r-- 0 diegoreymendez staff 6148 Jul 31 12:46 ./testfolder2/.DS_Store
drwxr-xr-x 0 diegoreymendez staff 0 Jul 31 12:42 ./testfolder2/testfolder/
-rw-r--r-- 0 diegoreymendez staff 82 Jul 31 12:42 ./testfolder2/testfolder/._.DS_Store
-rw-r--r-- 0 diegoreymendez staff 6148 Jul 31 12:42 ./testfolder2/testfolder/.DS_Store
-rw-r--r-- 0 diegoreymendez staff 325377 Jul 5 09:50 ./testfolder2/testfolder/Scala.pages
Archive Format: POSIX ustar format, Compression: none
Corrupted .tar file:
> tar -tvvf corrupted.tar
tar: Unrecognized archive format
Archive Format: (null), Compression: none
tar: Error exit delayed from previous errors.
I have tried the following commands and they work well:
bzip2 -t file.bz2
gunzip -t file.gz
However, these two commands are time-consuming. Maybe we need a quicker way to determine whether a compressed file is intact.
These are all very sub-optimal solutions. From the GZIP spec
ID1 (IDentification 1)
ID2 (IDentification 2)
These have the fixed values ID1 = 31 (0x1f, \037), ID2 = 139
(0x8b, \213), to identify the file as being in gzip format.
This check has to be coded into whatever language you're using.
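From the shell, that header check is just a matter of looking at the first two bytes, e.g. with this sketch (note it only verifies the gzip magic number, it says nothing about the integrity of the rest of the file):
$ [ "$(head -c 2 file.gz | od -An -tx1 | tr -d ' ')" = "1f8b" ] && echo "gzip magic present"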
> use the -O option. [...] If the tar file is corrupt, the process will abort with an error.
Sometimes yes, but sometimes not. Let's see an example of a corrupted file:
echo Pete > my_name
tar -cf my_data.tar my_name
# Simulate a corruption
sed < my_data.tar 's/Pete/Fool/' > my_data_now.tar
# "my_data_now.tar" is the corrupted file
tar -xvf my_data_now.tar -O
It shows:
my_name
Fool
Even if you execute
echo $?
tar says that there was no error:
0
But the file was corrupted: it now contains "Fool" instead of "Pete".
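That is because plain tar only checksums its own headers, not the file data, so a silent byte change in a member's contents goes unnoticed. A gzip layer, on the other hand, stores a CRC-32 of the uncompressed data in its trailer, so the gzip -t / gunzip -t tests from the earlier answers do catch this kind of corruption in a .tar.gz. A sketch continuing the example above (it uses a small python3 snippet only to flip one byte in place, so the change is guaranteed):
gzip -c my_data.tar > my_data.tar.gz
# flip one byte in the middle of the compressed stream
python3 -c "
import os
path = 'my_data.tar.gz'
off = os.path.getsize(path) // 2          # somewhere inside the deflate data
with open(path, 'r+b') as f:
    f.seek(off); b = f.read(1)[0]
    f.seek(off); f.write(bytes([b ^ 0xFF]))
"
gzip -t my_data.tar.gz; echo $?
# gzip reports invalid data or a CRC mismatch, and $? is non-zero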