Why do 7zip and gzip add 0x0A at the end of gzip compressed data - gzip

Wikipedia states (apparently wrongly, at least for real-world files) that the gzip format demands that the last 4 bytes be the uncompressed size (mod 4 GB).
I have found a credible answer on SO explaining that sometimes there is junk at the end of gzip data, so you cannot rely on the last 4 bytes being the size.
Unfortunately this matches my experiments (both the terminal gzip and the 7zip archiver add a 0x0A byte for my small test example).
My question is: why do gzip and 7zip do this?
Obviously they do it because they are written that way, but I wonder about the motivation to break the format specification.
I know that some formats have padding requirements, but I found nothing like that for gzip.
Edit: my process:
echo "Testing rocks:) Debugging sucks :(" >> test_data
rm test_data.gz
gzip -6 test_data
vim -c "noautocmd edit test_data.gz"
in vim: :%!xxd -c 4
and the last 5 bytes are the size (35) followed by 0x0a (0x23 = 35, stored little-endian, so the bytes are 23 00 00 00, then 0a)
The 7zip process is just using the GUI to make an archive.

Your testing process is wrong. Vim is what adds 0x0A to the end of the file. Here is a simpler test, using xxd directly (why did you even use Vim?):
echo "Testing rocks:) Debugging sucks :(" >> test_data
gzip -6 test_data
xxd -c 4 test_data.gz
Output:
0000000: 1f8b 0808 ....
0000004: 453c 5d59 E<]Y
0000008: 0003 7465 ..te
000000c: 7374 5f64 st_d
0000010: 6174 6100 ata.
0000014: 0b49 2d2e .I-.
0000018: c9cc 4b57 ..KW
000001c: 28ca 4fce (.O.
0000020: 2eb6 d254 ...T
0000024: 7049 4d2a pIM*
0000028: 4d4f 0789 MO..
000002c: 1497 0245 ...E
0000030: 14ac 34b8 ..4.
0000034: 00f4 a724 ...$
0000038: 5623 0000 V#..
000003c: 00 .
As you can see, there is no 0x0A at the end. Vim appends a newline to the end of a file when writing it, if one is not already present; that is its default behaviour.
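As an aside, if you do want to poke at such files in Vim, open them in binary mode so no newline is appended on write. You can also sanity-check the ISIZE field directly with standard tools; a minimal sketch, assuming a single-member gzip file and a little-endian machine (which matches gzip's on-disk byte order):
vim -b test_data.gz
tail -c 4 test_data.gz | od -An -tu4
The od command prints 35 for the example above, matching the 23 00 00 00 bytes in the dump.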

Related

Why doesn't 'utf8-c8' encoding work when reading filehandles

I wish to read byte sequences that will not decode as valid UTF-8, specifically byte sequences that correspond to high and low surrogate code points. The result should be a Raku string.
I read that, in Raku, the 'utf8-c8' encoding can be used for this purpose.
Consider code point U+D83F. It is a high surrogate (reserved for the high half of UTF-16 surrogate pairs).
U+D83F has a byte sequence of 0xED 0xA0 0xBF, if encoded as UTF-8.
Slurping a file? Works
If I slurp a file containing this byte sequence, using 'utf8-c8' as the encoding, I get the expected result:
echo -n $'\ud83f' >testfile # Create a test file containing the byte sequence
myprog1.raku:
#!/usr/local/bin/raku
$*OUT.encoding('utf8-c8');
print slurp('testfile', enc => 'utf8-c8');
$ ./myprog1.raku | od -An -tx1
ed a0 bf
✔️ expected result
Slurping a filehandle? Doesn't work
But if I switch from slurping a file path to slurping a filehandle, it doesn't work, even though I set the filehandle's encoding to 'utf8-c8':
myprog2.raku
#!/usr/local/bin/raku
$*OUT.encoding('utf8-c8');
my $fh = open "testfile", :r, :enc('utf8-c8');
print slurp($fh, enc => 'utf8-c8');
#print $fh.slurp; # I tried this too: same error
$ ./myprog2.raku
Error encoding UTF-8 string: could not encode Unicode Surrogate codepoint 55359 (0xD83F)
in block <unit> at ./myprog2.raku line 4
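A possible workaround, sketched under the assumption that Buf.decode accepts 'utf8-c8' on your Rakudo build, is to open the handle in binary mode and decode the buffer explicitly:
#!/usr/local/bin/raku
$*OUT.encoding('utf8-c8');
my $fh = open "testfile", :r, :bin;  # binary handle: no decoding happens on read
print $fh.slurp.decode('utf8-c8');   # decode the Buf explicitly
Piped through od -An -tx1 as above, this should print the raw ed a0 bf bytes if the workaround applies.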
Environment
Edit 2022-10-30: I originally used my distro's package (Fedora Linux 36: Rakudo version 2020.07). I just downloaded the latest Rakudo binary release (2022.07-01). Result was the same.
$ /usr/local/bin/raku --version
Welcome to Rakudo™ v2022.07.
Implementing the Raku® Programming Language v6.d.
Built on MoarVM version 2022.07.
$ uname -a
Linux hx90 5.19.16-200.fc36.x86_64 #1 SMP PREEMPT_DYNAMIC Sun Oct 16 22:50:04 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
$ lsb_release -a
LSB Version: :core-4.1-amd64:core-4.1-noarch
Distributor ID: Fedora
Description: Fedora release 36 (Thirty Six)
Release: 36
Codename: ThirtySix

Issues converting a small Hex value to a Binary value

I am trying to take the contents of a file that has a Hex number and convert that number to Binary and output to a file.
This is what I am trying but not getting the binary value:
xxd -r -p Hex.txt > Binary.txt
The contents of Hex.txt is: ff
I have also tried FF and 0xFF, but would like to just use ff since the device I am pulling the info from has it in that format.
Instead of 11111111, which is what it should be, I get a y with 2 dots above it (ÿ).
If I change it to ee, I get an i with 2 dots. It seems to be reading the hex just fine, but from what I have read about xxd -r -p, it outputs the raw byte rather than the 1s-and-0s text I am after.
The other ways I have found to convert hex to binary have either also not worked or involve a pretty big Bash script, which seems unnecessary for what I thought would be a simple task.
This also gives me the y with 2 dots:
$ for i in $(cat Hex.txt) ; do printf "\x$i" ; done > Binary.txt
For some reason almost every solution I find gives me this format instead of a human-readable binary value with 1s and 0s.
Any help is appreciated. I am planning on using this in a script to pull the relay values from Digital Loggers devices using curl and giving Home Assistant a readable file to record the relay state. The Digital Loggers curl command gives the state of all 8 relays at once as hex, instead of letting you pull the status of a specific relay.
If "file.txt" contains:
fe
0a
and you run this:
perl -ane 'printf("%08b\n",hex($_))' file.txt
You'll get this:
11111110
00001010
If you use it a lot, you might want to make a bash function of it in your login profile along these lines - being extremely respectful of spaces and semi-colons that might look unnecessary:
bin(){ perl -ane 'printf("%08b\n",hex($_))' $1 ; }
Then you'll be able to do:
bin file.txt
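Given your stated goal of checking one relay out of the eight, you can also skip the text conversion entirely and test individual bits with shell arithmetic. A sketch, assuming Hex.txt holds a single hex byte such as ff, and a hypothetical mapping of relay N to bit N-1:
state=$((16#$(cat Hex.txt)))
relay=3
if (( (state >> (relay - 1)) & 1 )) ; then echo "relay $relay: ON" ; else echo "relay $relay: OFF" ; fi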
If you dislike Perl for some reason, you can achieve something similar without it as follows:
tr '[:lower:]' '[:upper:]' < file.txt |
while read h ; do
echo "obase=2; ibase=16; $h" | bc
done
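Note that bc prints no leading zeros, so 0a comes out as 1010 rather than 00001010. Since a byte's binary form is at most 8 digits, you can zero-pad by letting printf treat those digits as a decimal number:
tr '[:lower:]' '[:upper:]' < file.txt |
while read h ; do
printf '%08d\n' "$(echo "obase=2; ibase=16; $h" | bc)"
done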

Contiki compile error, " ERROR: address 0x820003 out of range at line 1740 of..."

I started to use the Contiki operating system with the Atmel ATmega128RFA1.
I can compile my example, but the hex file is bad. The error is:
ERROR: address 0x820003 out of range at line 1740 of ipso.hex (I am not using IPSO, I just kept the name).
When I compile on a Linux system the program size is 27804 bytes and the data is 4809 bytes.
When I compile on Windows the program is 28292 and the data is 4791.
I use only one process and one etimer; I would like to turn one LED on and off.
The makefile consists of:
TARGET=avr-atmega128rfa1
CONTIKI = ../..
include $(CONTIKI)/Makefile.include
all:
	make -f Makefile.ipso TARGET=avr-atmega128rfa1 ipso.elf
	avr-objcopy -O ihex -R .eeprom ipso.elf ipso.hex
	avr-size -C --mcu=atmega128rfa1 ipso.elf
I can't program the controller. What is the problem?
Thank you.
Special sections in the .elf file start above 0x810000 and must be removed when generating a hex file for programming a particular memory, e.g.
$ avr-objdump -h webserver6.avr-atmega128rfa1
webserver6.avr-atmega128rfa1: file format elf32-avr
Sections:
Idx Name Size VMA LMA File off Algn
0 .data 00001bda 00800200 0000e938 0000ea2c 2**0
CONTENTS, ALLOC, LOAD, DATA
1 .text 0000e938 00000000 00000000 000000f4 2**1
CONTENTS, ALLOC, LOAD, READONLY, CODE
2 .bss 000031a6 00801dda 00801dda 00010606 2**0
ALLOC
3 .eeprom 00000029 00810000 00810000 00010606 2**0
CONTENTS, ALLOC, LOAD, DATA
4 .fuse 00000003 00820000 00820000 0001062f 2**0
CONTENTS, ALLOC, LOAD, DATA
5 .signature 00000003 00840000 00840000 00010632 2**0
CONTENTS, ALLOC, LOAD, READONLY, DATA
So (note that the failing address 0x820003 lies in the .fuse section, which starts at 0x820000):
avr-objcopy -O ihex -R .eeprom -R .fuse -R .signature ipso.elf ipso.hex
Alternatively, copy only the desired sections:
avr-objcopy -O ihex -j .text -j .data ipso.elf ipso.hex
Relocating the EEPROM section with avr-objcopy --change-section-lma .eeprom=0 also works for me.
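For reference, if the program actually uses EEPROM data, the usual companion step is to extract it into its own image, relocated to 0 so programming tools accept it (the ipso_eeprom.hex name is just illustrative):
avr-objcopy -O ihex -j .eeprom --change-section-lma .eeprom=0 ipso.elf ipso_eeprom.hex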

Weird pcap header of byte sequence 0a 0d 0d 0a created on Mac?

I have a PCAP file that was created on a Mac with mergecap; it can be parsed on a Mac with Apple's libpcap but cannot be parsed on a Linux system. The combined file has an extra 16-byte header that contains 0a 0d 0d 0a 78 00 00 00 before the 4d 3c 2b 1a intro that's common in pcap files. Here is a hex dump:
0000000: 0a0d 0d0a 7800 0000 4d3c 2b1a 0100 0000 ....x...M<+.....
0000010: ffff ffff ffff ffff 0100 4700 4669 6c65 ..........G.File
0000020: 2063 7265 6174 6564 2062 7920 6d65 7267 created by merg
0000030: 696e 673a 200a 4669 6c65 313a 2037 2e70 ing: .File1: 7.p
0000040: 6361 7020 0a46 696c 6532 3a20 362e 7063 cap .File2: 6.pc
0000050: 6170 200a 4669 6c65 333a 2034 2e70 6361 ap .File3: 4.pca
0000060: 7020 0a00 0400 0800 6d65 7267 6563 6170 p ......mergecap
Does anybody know what this is, or how I can read it on a Linux system with libpcap?
I have a PCAP file
No, you don't. You have a pcap-ng file.
that can be parsed on a Mac with Apple's libpcap
libpcap 1.1.0 and later can also read some pcap-ng files (the pcap API only allows a file to have one link-layer header type, one snapshot length, and one byte order, so only pcap-ng files where all sections have the same byte order and all interfaces have the same link-layer header type and snapshot length are supported), and OS X Snow Leopard and later have a libpcap based on 1.1.x, so they can read those files.
(OS X Mountain Lion and later have tweaked libpcap to allow it to write pcap-ng files as well; the -P flag makes tcpdump write out pcap-ng files, with text comments attached to some outgoing packets indicating the process ID and process name of the process that sent them - pcap-ng allows text comments to be attached to packets.)
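For example, per that note, Apple's tcpdump can be asked for pcap-ng output directly (the interface and file name here are placeholders):
tcpdump -P -i en0 -w capture.pcapng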
but cannot be parsed on a Linux system
Your Linux system probably has an older libpcap version. (Note: do not be confused by Debian and Debian derivatives calling the libpcap package "libpcap0.8" - they're not still using libpcap 0.8.)
combined file has an extra 16-byte header that contains 0a 0d 0d 0a 78 00 00 00
A pcap-ng file is a sequence of "blocks" that start with a 4-byte block type and a 4-byte length, both in the byte order of the host that wrote them.
They're divided into "sections", each one beginning with a "Section Header Block" (SHB). The block type for the SHB is 0x0a0d0d0a, which is byte-order-independent (so that you don't have to know the byte order to read the SHB) and contains carriage returns and line feeds. That way, if the file is, for example, transferred between a UN*X system and a Windows system by a tool that thinks it's transferring a text file and "helpfully" tries to fix line endings, the SHB magic number will be damaged and it will be obvious that the file was corrupted in transit; think of it as the equivalent of a shock indicator.
The 0x78000000 is the length; what follows it is the "byte-order magic" field, which is 0x1A2B3C4D (which is not the same as the 0xA1B2C3D4 magic number for pcap files), and which serves the same purposes as the pcap magic number, namely:
it lets code identify that the file is a pcap-ng file
it lets code determine the byte order of the section.
(No, you don't need to know the length before looking for the magic number; once you've found the magic number, you then check the length to make sure it's at least 28, and if it's less than 28, you reject the block as not being valid.)
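A quick shell-level check for an SHB at the start of a capture file (the four block-type bytes read the same in either byte order; capture.pcapng is a placeholder name):
head -c 4 capture.pcapng | xxd -p
This prints 0a0d0d0a for a pcap-ng file.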
Does anybody know what this is?
A (little-endian) pcap-ng file.
or how I can read it on a Linux system with libpcap?
Either read it on a Linux system with a newer version of libpcap (which may mean a newer version of whatever distribution you're using, or may just mean doing an update if that will get you a 1.1.0 or later version of libpcap), read it with Wireshark or TShark (which have their own library for reading capture files, which supports the native pcap and pcap-ng formats, as well as a lot of other formats), or download a newer version of libpcap from tcpdump.org, build it, install it, and then build whatever other tools need to read pcap-ng files with that version of libpcap rather than the one that comes with the system.
Newer versions of Wireshark write pcap-ng files by default, including in tools such as mergecap; you can get them to write pcap files with a flag argument of -F pcap.
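If updating libpcap is not convenient, a practical alternative is to convert, or re-merge, into classic pcap format with those same tools, assuming your Wireshark installation provides mergecap and editcap. A sketch using the file names from the merge comment in the dump (merged.pcap and merged.pcapng are placeholder names):
mergecap -F pcap -w merged.pcap 7.pcap 6.pcap 4.pcap
editcap -F pcap merged.pcapng merged.pcap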

How to save and retrieve string with accents in redis?

I cannot manage to set and retrieve strings with accents in my redis db.
Chars with accents come back encoded; how can I retrieve them as they were set?
redis> set test téléphone
OK
redis> get test
"t\xc3\xa9l\xc3\xa9phone"
I know this has already been asked
(http://stackoverflow.com/questions/6731450/redis-problem-with-accents-utf-8-encoding) but there is no detailed answer.
The Redis server itself stores all data as binary objects, so it is not dependent on the encoding. The server will just store what is sent by the client (including UTF-8 chars).
Here are a few experiments:
$ echo téléphone | hexdump -C
00000000 74 c3 a9 6c c3 a9 70 68 6f 6e 65 0a |t..l..phone.|
c3a9 is the representation of the 'é' char.
$ redis-cli
> set t téléphone
OK
> get t
"t\xc3\xa9l\xc3\xa9phone"
Actually the data is correctly stored in the Redis server. However, when redis-cli is launched in a terminal, it interprets the output and applies the sdscatrepr function to transform non-printable chars (whose definition is locale-dependent, and may be broken for multibyte chars anyway).
A simple workaround is to launch redis-cli with the 'raw' option:
$ redis-cli --raw
> get t
téléphone
Your own application will probably use one of the client libraries rather than redis-cli, so it should not be a problem in practice.
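To convince yourself that the bytes survive the round trip, compare hexdumps of the input and of the raw output; a small sketch (the trailing 0a in each dump is the newline added by echo and by redis-cli's raw mode, not part of the stored value):
$ echo téléphone | hexdump -C
$ redis-cli set test téléphone
$ redis-cli --raw get test | hexdump -C
Both dumps should show the same bytes.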