DEFLATE: how to handle "no distance codes" case? - gzip

I mostly get RFC 1951, however I'm not too clear on how to manage the case where (when using dynamic Huffman tables) no distance codes are needed or present. For example, let's take the input:
abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ01234567890987654321ZYXWVUTSR
where no backreference is possible since there are no repetitions of length >= 3.
According to RFC 1951, at least one distance code must be present regardless, otherwise it wouldn't be possible to encode HDIST - 1. I understand, according to the reference, that such code should be of zero bits to signal "no distance codes".
One distance code of zero bits means that there are no distance codes
used at all (the data is all literals).
In infgen symbols, I'd expect to see a dist 0 0.
Analyzing what gzip does with infgen, however, I see that TWO distance codes are emitted (each 1 bit long) for the above input (even though none is actually used then):
! infgen 2.4 output
!
gzip
!
last
dynamic
litlen 48 6
litlen 49 6
litlen 50 6
...cut...
litlen 121 6
litlen 122 6
litlen 256 6
dist 0 1
dist 1 1
literal 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ01234567890987654321Z
literal 'YXWVUTSR
end
!
crc
length
So what's the correct behavior in these cases?

If there are no matches in the deflate block, there will be no lengths from the length/literal code, and so the decoder will never look for a distance code. In that case, what would make the most sense is to provide no information at all about a distance code.
However the format does not permit that, since the 5-bit HDIST value in the header is interpreted as 1 to 32 distance codes, for which lengths must be provided for in the header. You must provide at least one distance code length in the header, even though it will never be used.
There are several valid things you can do in that case. RFC 1951 notes you can provide a single distance code (HDIST == 0, meaning one length), with length zero, which would be just one zero in the list of lengths.
It is also permitted to provide a single code of length one, or you could do as zlib is doing, which is to provide two codes of length one. You can actually put any valid distance code description you like there, and it will still be accepted.
As to why zlib's deflate is choosing to define two codes there, I can only guess that Jean-loup was being conservative, writing something he knew that even an over-simplified inflator would have to accept. Both gzip and zopfli do the same thing. They all do the same thing when there is only one distance code used. They could emit just the single one-bit distance code, per the RFC, but they emit two single-bit distance codes, one of which is never used.
Really the right thing to do would be to write a single zero length as noted in the RFC, which would take the fewest number of bits in the header. I will consider updating zlib to do that, to eke out a few more bits of compression.

Related

Bsd correction possible?

I am looking at using the BSD checksum described here at wiki BSD does anyone know if you can use it for basic error correction?
Consider an 8 bit or 16 bit left rotating checksum where all the message bytes are supposed to be zero, but one them has a single bit error. The checksum will detect the error, but you'd get the same checksum for message[0] = 0x01, or message[1] = 0x02, ... , or message[7] = 0x80. The checksum can't determine which of these 8 (or more) possible error cases occurred, so it can't be used for error correction.
You'd need at least something like a Hamming code, BCH code or RS code to be able to correct one more bit errors. Since you have CRC as a tag, a single bit correcting binary BCH code is essentially the same as a CRC using a "primitive" polynomial that is the basis for a finite field, if the message length (including the CRC) is shorter than the number of possible values in the finite field. For example, a 15 bit message would have 11 data bits and 4 "parity" bits, based on a finite field of GF(2^4) (GF(16)).

Method to get non-base units?

Is there a method of using the exponent properties of LabView units for carrying custom units? For example I would find it convenient to use milli-Amperes instead of Amperes in my data wires.
My first attempt at doing so looks like this, but trying to get the value out at the end gives me nothing.
I would find it convenient to use milli-Amperes instead of Amperes in my data wires
For a wire, it's not possible, and it's not a problem, here's why:
I'm afraid what you want make little sense, since you're milli-Amperes instead of Amperes refers to representing your data, while a wire is just raw data. Adding the milli- to a floating point changes the exponent, not the mantissa, so there's no loss or gain of precision in the value that your number carries.
Now if we talk about an indicator which is technically a display of the wire value, you change the unit from "A" to "mA" to have the display you want.
Finally, in your attempt with "set numeric info", the -3 factor added next to Amperes means the unit is A^-3, not mA.
You can use data that don't use units, however than you will loose your automatic check of the units.
For display properties you can tweak the display format to show different outputs:
This format string is constructed as following:
% numeric
^ engineering notation, exponents in multiples of three
# no trailing zeros
_6 six significat digits
e scientific notation (1e1 for instance)
The prefix is the best way to affect the presentation of the value on a specific front panel.
When passing data from VI to VI, the prefix is not passed, and the data uses the base ( Amps, Volts, etc...)
In my example below, the unitless value 3 is assigned units of Amp in mA.vi. The front panel indicator is set to show units of mA.
In Watts.vi I multiply the Amps OUT of mA.vi by a constant of 9V and the result is wired to the indicator x*y.
x*y has units of W and I changed the prefix to k for presentation.
The NI forums have several threads that report certain functions (square and square root specifically) can cause unit errors or broken wires. Most folks don't even know the units capability exists, and most that do have tried and abandoned them. :)

u32 filter matching clarification

I've been following through the tutorial for u32 pattern matching here: Link
Most of it is straightforward until I get to the section where the IP header length is grabbed, using the following:
0>>22&0x3C
I don't understand why this was chosen instead of:
0>>24&0x0F
From my understanding, the filter chosen will shift the first byte 22 to the right, then apply a mask to strip the first and last 2 bits off, giving us to the correct lower nibble for the IP header length. The second will complete the full shift to the right, only needing to strip the first 4 bits.
My question is, why was the first chosen and not the second? I believe it's because of the multiply that needs to take place, but I don't understand what effect that operation would have if both filters would return the correct value.
The IP Header length is specified in 32 bit words rather than 8 bit bytes, so whatever value is in the IPH field will need to be multiplied by 4 which can be accomplished by a shift left of 2; therefore instead of shifting right by 24, masking by 0x0F and shifting right by two the author decided only to shift right by 22 and masking by 0x03c.
That is, both operations don't return the same value, the first operation returns the value of the second multiplied by 4. To get the same result as the first you would
0>>24&0x0F<<2

Trying to understand nbits value from stratum protocol

I'm looking at the stratum protocol and I'm having a problem with the nbits value of the mining.notify method. I have trouble calculating it, I assume it's the currency difficulty.
I pull a notify from a dogecoin pool and it returned 1b3cc366 and at the time the difficulty was 1078.52975077.
I'm assuming here that 1b3cc366 should give me 1078.52975077 when converted. But I can't seem to do the conversion right.
I've looked here, here and also tried the .NET function BitConverter.Int64BitsToDouble.
Can someone help me understand what the nbits value signify?
You are right, nbits is current network difficulty.
Difficulty encoding is throughly described here.
Hexadecimal representation like 0x1b3cc366 consists of two parts:
0x1b -- number of bytes in a target
0x3cc366 -- target prefix
This means that valid hash should be less than 0x3cc366000000000000000000000000000000000000000000000000 (it is exactly 0x1b = 27 bytes long).
Floating point representation of difficulty shows how much current target is harder than the one used in the genesis block.
Satoshi decided to use 0x1d00ffff as a difficulty for the genesis block, so the target was
0x00ffff0000000000000000000000000000000000000000000000000000.
And 1078.52975077 is how much current target is greater than the initial one:
$ echo 'ibase=16;FFFF0000000000000000000000000000000000000000000000000000 / 3CC366000000000000000000000000000000000000000000000000' | bc -l
1078.52975077482646448605

How can I use SYNCSORT to format a Packed Decimal field with a specifc sign value?

I want to use SYNCSORT to force all Packed Decimal fields to a negative sign value. The critical requirement is the 2nd nibble must be Hex 'D'. I have a method that works but it seems much too complex. In keeping with the KISS principle, I'm hoping someone has a better method. Perhaps using a bit mask on the last 4 bits? Here is the code I have come up with. Is there a better way?
*
* This sort logic is intended to force all Packed Decimal amounts to
* have a negative sign with a B'....1101' value (Hex 'xD').
*
SORT FIELDS=COPY
OUTFIL FILES=1,
INCLUDE=(8,1,BI,NE,B'....1..1',OR, * POSITIVE PACKED DECIMAL
8,1,BI,EQ,B'....1111'), * UNSIGNED PACKED DECIMAL
OUTREC=(1:1,7, * INCLUDING +0
8:(-1,MUL,8,1,PD),PD,LENGTH=1,
9:9,72)
OUTFIL FILES=2,
INCLUDE=(8,1,BI,EQ,B'....1..1',AND, * NEGATIVE PACKED DECIMAL
8,1,BI,NE,B'....1111'), * NOT UNSIGNED PACKED DECIMAL
OUTREC=(1:1,7, * INCLUDING -0
8:(+1,MUL,8,1,PD),PD,LENGTH=1,
9:9,72)
In the code that processes the VSAM file, can you change the read logic to GET with KEY GTEQ and check for < 0 on the result instead of doing a specific keyed read?
If you did that, you could accept all three negative packed values xA, xB and xD.
Have you considered writing an E15 user exit? The E15 user exit lets you
manipulate records as they are input to the sort process. In this case you would have a
REXX, COBOL or other LE compatible language subroutine patch the packed decimal sign field as it is input to the sort process. No need to split into multiple files to be merged later on.
Here is a link to example JCL
for invoking an E15 exit from DFSORT (same JCL for SYNCSORT). Chapter 4 of this reference
describes how to develop User Exit routines, again this is a DFSORT manual but I believe SyncSort is
fully compatible in this respect. Writing a user exit is no different than writing any other subroutine - get the linkage right and the rest is easy.
This is a very general outline, but I hope it helps.
Okay, it took some digging but NEALB's suggestion to seek help on MVSFORUMS.COM paid off... here is the final result. The OUTREC logic used with SORT/MERGE replaces OUTFIL and takes advantage of new capabilities (IFTHEN, WHEN and OVERLAY) in Syncsort 1.3 that I didn't realize existed. It pays to have current documentation available!
*
* This MERGE logic is intended to assert that the Packed Decimal
* field has a negative sign with a B'....1101' value (Hex X'.D').
*
*
MERGE FIELDS=(27,5.4,BI,A),EQUALS
SUM FIELDS=NONE
OUTREC IFTHEN=(WHEN=(32,1,BI,NE,B'....1..1',OR,
32,1,BI,EQ,B'....1111'),
OVERLAY=(32:(-1,MUL,32,1,PD),PD,LENGTH=1)),
IFTHEN=(WHEN=(32,1,BI,EQ,B'....1..1',AND,
32,1,BI,NE,B'....1111'),
OVERLAY=(32:(+1,MUL,32,1,PD),PD,LENGTH=1))
Looking at the last byte of a packed field is possible. You want positive/unsigned to negative, so if it is greater than -1, subtract it from zero.
From a short-lived Answer by MikeC, it is now known that the data contains non-preferred signs (that is, it can contain A through F in the low-order half-byte, whereas a preferred sign would be C (positive) or D (negative). F is unsigned, treated as positive.
This is tested with DFSORT. It should work with SyncSORT. Turns out that DFSORT can understand a negative packed-decimal zero, but it will not create a negative packed-decimal zero (it will allow a zoned-decimal negative zero to be created from a negative zero packed-decimal).
The idea is that a non-preferred sign is valid and will be accurately signed for input to a decimal machine instruction, but the result will always be a preferred sign, and will be correct. So by adding zero first, the field gets turned into a preferred sign and then the test for -1 will work as expected. With data in the sign-nybble for packed-decimal fields, SORT has some specific and documented behaviours, which just don't happen to help here.
Since there is only one value to deal with to become the negative zero, X'0C', after the normalisation of signs already done, there is a simple test and replacement with a constant of X'0D' for the negative zero. Since the negative zero will not work, the second test is changed from the original minus one to zero.
With non-preferred signs in the data:
SORT FIELDS=COPY
INREC IFTHEN=(WHEN=INIT,
OVERLAY=(32:+0,ADD,32,1,PD,TO=PD,LENGTH=1)),
IFTHEN=(WHEN=(32,1,CH,EQ,X'0C'),
OVERLAY=(32:X'0D')),
IFTHEN=(WHEN=(32,1,PD,GT,0),
OVERLAY=(32:+0,SUB,32,1,PD,TO=PD,LENGTH=1))
With preferred signs in the data:
SORT FIELDS=COPY
INREC IFTHEN=(WHEN=(32,1,CH,EQ,X'0C'),
OVERLAY=(32:X'0D')),
IFTHEN=(WHEN=(32,1,PD,GT,0),
OVERLAY=(32:+0,SUB,32,1,PD,TO=PD,LENGTH=1))
Note: If non-preferred signs are stuffed through a COBOL program not using compiler option NUMPROC(NOPFD) then results will be "interesting".