Floating point serialization, lexicographical comparison == floating point comparison - serialization

I'm looking for a way to serialize floating-point numbers so that a lexicographical comparison of their serialized forms gives the same result as a floating-point comparison. I think it is possible by storing them in the form:
| sign bit (1 for positive) | exponent | significand |
The exponent and the significand would be serialized as big-endian, and the complement would be taken for negative numbers.
Would this work? I don't mind if it breaks for NaN, but having INF comparison working would be nice.

The IEEE floating-point format is specifically designed so that "plain" integer comparison can be used on the bit patterns. However, this only works directly when both numbers are non-negative; because the format is sign-magnitude, the order is reversed when both numbers are negative.
Your suggestion to complement the numbers when they are negative is sound, so this will work.
This will work for ±Inf and for subnormal numbers. NaNs, however, will not work, or rather, they will be considered "larger" than +Inf (or "smaller" than -Inf if their sign bit is set).
The only problematic case is "-Zero" (i.e. sign=1, exponent=0, and mantissa=0). According to IEEE, Zero == -Zero. You have to decide if you want to emit -Zero as Zero, treat them as different, or add special code to the comparison routine.
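The scheme described above can be sketched in Python as follows. This is a minimal illustration, assuming IEEE 754 double precision; the function names float_to_sortable_bytes and sortable_bytes_to_float are made up for this example. It treats -Zero and Zero as different (with -Zero sorting first) and makes no attempt to handle NaN.

```python
import struct

_MASK = 0xFFFFFFFFFFFFFFFF   # all 64 bits set
_SIGN = 1 << 63              # the IEEE 754 sign bit of a double


def float_to_sortable_bytes(x: float) -> bytes:
    """Encode a double as 8 bytes whose bytewise order matches numeric order."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    if bits & _SIGN:
        bits ^= _MASK        # negative: complement every bit
    else:
        bits |= _SIGN        # non-negative: set the leading bit to 1
    return struct.pack(">Q", bits)


def sortable_bytes_to_float(data: bytes) -> float:
    """Invert float_to_sortable_bytes."""
    (bits,) = struct.unpack(">Q", data)
    if bits & _SIGN:
        bits &= ~_SIGN & _MASK   # was non-negative: clear the leading bit
    else:
        bits ^= _MASK            # was negative: undo the complement
    return struct.unpack(">d", struct.pack(">Q", bits))[0]
```

Sorting a list of floats with key=float_to_sortable_bytes should then give the same order as sorting the floats themselves, including for infinities and subnormals.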

Related

In PDF format syntax should number 1.e10 be written as 10000000000.?

Looking at the PDF Reference, version 1.7, for how objects of type number
are written in valid syntax, it states:
Note: PDF does not support the PostScript syntax for numbers with
nondecimal radices (such as 16#FFFE ) or in exponential format (such
as 6.02E23 ).
However it also does not mandate a maximum range the numbers should be in. This seems to suggest it would be correct to write
1.00E10 as 10000000000
or
1.00E-50 as 0.00000000000000000000000000000000000000000000000001
This question hence has two aspects:
a) Is the notation correct (as provided in the examples)?
b) Does the PDF format expect implementations to use (or at least fall back
to) some bigint/bigfloat handling of numbers, as it seems not to provide
any range for the numbers?
First of all, for normative information on PDF you should refer to the appropriate ISO standards, in particular ISO 32000. Yes, Part 1 (ISO 32000-1) in particular is derived from the PDF reference 1.7 without that many changes, but not without changes either. (Ok, in some situations one has to consult the old PDF reference, too, to understand some of these changes.)
Adobe has published a copy thereof (with "ISO" in the page headers removed) on its web site: https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf
Now to your question:
According to ISO 32000, both part 1 and 2:
An integer shall be written as one or more decimal digits optionally preceded by a sign. [...]
A real value shall be written as one or more decimal digits with an optional sign and a leading, trailing, or embedded PERIOD (2Eh) (decimal point).
(section 7.3.3 "Numeric Objects")
Thus, concerning your question a)
Is the notation correct (as provided in the examples)?
Yes, 10000000000 is an integer valued numeric object, 0.00000000000000000000000000000000000000000000000001 is a real valued numeric object.
Concerning your question b)
Does the PDF format expect implementations to use (or at least fall back to) some bigint/bigfloat handling of numbers, as it seems not to provide any range for the numbers?
No, in the same section as quoted above you also find
The range and precision of numbers may be limited by the internal representations used in the computer on which the conforming reader is running; Annex C gives these limits for typical implementations.
and Annex C recommends at least the following limits:
| Quantity | Limit | Description |
| integer | 2,147,483,647 | Largest integer value; equal to 2^31 − 1. |
| integer | -2,147,483,648 | Smallest integer value; equal to −2^31. |
| real | ±3.403 × 10^38 | Largest and smallest real values (approximate). |
| real | ±1.175 × 10^-38 | Nonzero real values closest to 0 (approximate). Values closer than these are automatically converted to 0. |
| real | 5 | Number of significant decimal digits of precision in fractional part (approximate). |
(ISO 32000-1)
Integers
Integer values (such as object numbers) can often be expressed within 32 bits.
Real numbers
Modern computers often represent and process real numbers using IEEE Standard for Floating-Point Arithmetic (IEEE 754) single or double precision.
(ISO 32000-2)
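As an illustration of writing numbers in this syntax, here is a minimal Python sketch that renders a float without exponential notation. The function name pdf_number is hypothetical, and the choice to trim trailing zeros is an assumption of this example, not something the standard requires. Note that a value such as 1e-50 is syntactically valid when written out, but per the Annex C limits quoted above a typical reader may well treat it as 0.

```python
import math
from decimal import Decimal


def pdf_number(x: float) -> str:
    """Render a float as plain decimal digits, never in exponential format."""
    if math.isnan(x) or math.isinf(x):
        raise ValueError("NaN and infinity have no PDF numeric representation")
    # repr() gives the shortest string that round-trips; Decimal expands
    # any exponent when formatted with "f".
    text = format(Decimal(repr(x)), "f")
    if "." in text:
        # Drop redundant trailing zeros and a dangling decimal point.
        text = text.rstrip("0").rstrip(".")
    return text


print(pdf_number(1.00e10))
print(pdf_number(1.00e-50))
```

With the two values from the question, this should print the same digit strings the question proposes.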

Why is BigFloat.to_s not precise enough?

I am not sure if this is a bug, but I've been playing with big and I can't understand why this code works this way:
https://carc.in/#/r/2w96
Code
require "big"
x = BigInt.new(1<<30) * (1<<30) * (1<<30)
puts "BigInt: #{x}"
x = BigFloat.new(1<<30) * (1<<30) * (1<<30)
puts "BigFloat: #{x}"
puts "BigInt from BigFloat: #{x.to_big_i}"
Output
BigInt: 1237940039285380274899124224
BigFloat: 1237940039285380274900000000
BigInt from BigFloat: 1237940039285380274899124224
At first I thought that BigFloat requires changing BigFloat.default_precision to work with bigger numbers. But from this code it looks like it only matters when trying to output the #to_s value.
The same code with the precision of BigFloat set to 1024 (https://carc.in/#/r/2w98):
Output
BigInt: 1237940039285380274899124224
BigFloat: 1237940039285380274899124224
BigInt from BigFloat: 1237940039285380274899124224
BigFloat.to_s uses LibGMP.mpf_get_str(nil, out expptr, 10, 0, self). Where GMP is saying:
mpf_get_str (char *str, mp_exp_t *expptr, int base, size_t n_digits, const mpf_t op)
Convert op to a string of digits in base base. The base argument may vary from 2 to 62 or from -2 to -36. Up to n_digits digits will be generated. Trailing zeros are not returned. No more digits than can be accurately represented by op are ever generated. If n_digits is 0 then that accurate maximum number of digits are generated.
Thanks.
In GMP (this applies to all languages, not just Crystal), integers (C mpz_t, Crystal BigInt) and floats (C mpf_t, Crystal BigFloat) have separate default precisions.
Also, note that using an explicit precision is better than setting a default one, because the default precision might not be reentrant (it depends on a configure-time switch). Also, if someone reads only a part of your code, they may skip the part that sets the default precision and assume a wrong one. Although I do not know the Crystal binding well, I assume that such functionality is exposed somewhere.
The zero passed as n_digits to mpf_get_str means that the number of digits is derived from the precision. The number of significant decimal digits is proportional and close to precision / log2(10). Floating-point numbers have finite precision. In this case, it was not the mpf_get_str call that made the last digits zero; it was the internal representation that did not keep such data. It looks like your (default) precision is too small to store all the necessary digits.
To summarize, there are two solutions:
Set a global default precision. Although this approach will work, it requires either changing the default precision frequently or using a single precision for the whole program. Either way, relying on the default precision is a form of procrastination that is going to take its vengeance later.
Set the precision on a per-variable basis. This is a better solution than the former. Although it requires more code (1-2 more lines per variable initialization), it is going to pay off later. For example, in a space object tracking system, the physics calculations have to be super-precise, but other systems could use lower-precision numbers to save time and memory.
I am still unsure what made the conversion BigFloat --> BigInt yield the missing digits.
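The relationship between binary precision and printed decimal digits can be sketched in Python. This is only an illustration under stated assumptions: the default precision is taken to be 64 bits, and rounding up with math.ceil is chosen because it reproduces the 20 digits seen in the question's output; GMP's exact rule may differ. As a side observation, 2^90 is a power of two, so a binary float can hold it exactly at any precision, which would be consistent with to_big_i returning every digit while only the decimal string is cut short.

```python
import math
from decimal import Context, Decimal


def approx_decimal_digits(precision_bits: int) -> int:
    """Rough count of decimal digits that precision_bits bits can express."""
    return math.ceil(precision_bits / math.log2(10))


value = (1 << 30) ** 3                 # 2**90, the number from the question
digits = approx_decimal_digits(64)     # assumption: 64-bit default precision

# Round the exact integer to that many significant decimal digits,
# mimicking a digit-limited string conversion.
rounded = int(Context(prec=digits).plus(Decimal(value)))

print(digits)     # number of significant digits kept
print(value)      # exact value
print(rounded)    # value as it appears with only `digits` digits

# A power of two survives a round trip through a 53-bit double unchanged.
print(int(float(value)) == value)
```

The rounded value printed here has the same digits as the "BigFloat:" line in the first output above.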

Why is there one more negative int than positive int?

The upper limit for any int data type (excluding tinyint), is always one less than the absolute value of the lower limit.
For example, the upper limit for an int is 2,147,483,647 and ABS(lower limit) = 2,147,483,648.
Is there a reason why there is always one more negative int than positive int?
EDIT: Changed since question isn't directly related to DB's
The types you provided are signed integers. Let's look at a one-byte (8-bit) example. With 1 byte you have 2^8 combinations, which gives you 256 possible numbers to store.
Now you want to have the same number of positive and negative numbers (each group should have 128).
The point is 0 doesn't have +0 and -0. There is only one 0.
So you end up with range -128..-1..0..1..127.
The same logic works for 16/32/64-bit.
EDIT:
Why the range is -128 to 127?
It depends on how you represent signed integer:
Signed magnitude representation
Ones' complement
Two's complement
This question isn't really related to databases.
As lad2025 points out, there are an even number of values. So, by including 0, there would be one more positive or negative value. The question you are asking seems to be: "Why is there one more negative value than positive value?"
Basically, the reason is the sign-bit. One possible implementation of negative numbers is to use n - 1 bits for the absolute value and then 0 and 1 for the sign bit. The problem with this approach is that it permits +0 and -0. That is not desirable.
To fix this, computer scientists devised the two's-complement representation for signed integers. (Wikipedia explains this in more detail.) Basically, this representation keeps the concept of a sign bit that can be tested, but it changes how negative values are encoded. If +1 is represented as 001, then -1 is represented as 111. That is, the negative value is the bit-wise complement of (the positive value minus one). In fact, the negative is always generated by subtracting 1 and then taking the bit-wise complement.
The issue is then the value 100 (followed by any number of zeros). The sign bit is set, so it is negative. However, when you subtract 1 and invert, it becomes itself again (011 --> 100). There is an argument for calling this "infinity" or "not a number". Instead it is assigned the smallest possible negative number.
Let's say you have a 4-byte (32-bit) integer. The range typically used by C++ is -2^31 to 2^31 - 1.
So we end up with a range -2^31 ... 0 ... 2^31 - 1.
We can think of this as having 2^31 non-negative integers (note 0 is included) and 2^31 negative integers.
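The points made in these answers can be checked with a short Python sketch. The helper names signed_range and negate are made up for this example; negate follows the "subtract 1, then complement" rule described above, restricted to a fixed bit width.

```python
def signed_range(bits: int) -> tuple[int, int]:
    """Smallest and largest value of a two's-complement integer of this width."""
    return -(1 << (bits - 1)), (1 << (bits - 1)) - 1


def negate(pattern: int, bits: int) -> int:
    """Negate a bit pattern: subtract 1, complement, keep only `bits` bits."""
    mask = (1 << bits) - 1
    return ~(pattern - 1) & mask


print(signed_range(8))                  # one-byte range
print(signed_range(32))                 # four-byte range
print(format(negate(0b001, 3), "03b"))  # +1 -> -1 in three bits
print(format(negate(0b100, 3), "03b"))  # the odd one out maps to itself
```

The last line shows why there is one extra negative value: the pattern 100 has no positive counterpart, so it is assigned to the most negative number.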

How to write negative number in obj C?

I'm trying to write a negative number like this:
} else if ([newsCondition.temperature floatValue] == -7.0f) {
but that won't trigger it, and the negative symbol is black whilst the number is blue. How can I write the number so that it triggers when temperature is equal to -7.0 degrees?
The way you've written your negative number (-7.0f) is correct.
As for your code not triggering: the floating point representation of numbers is not perfect, and you have to be aware of these issues when comparing floating point numbers to each other.
If you're wanting to compare two floating point numbers, you can use an 'epsilon' (i.e. acceptable error) for the comparison. This is basically checking if the numbers are close enough.
Simple naive example:
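#include <math.h>   // for fabsf
#define EPSILON 0.00001f
float x = 0.09f;
float y = 0.0901f;
// fabsf, not abs: abs() takes an int and would truncate the difference to 0
if (fabsf(y - x) < EPSILON) {
    // close enough to be considered equal;
    // do something here
}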
For more discussion, see http://floating-point-gui.de/errors/comparison/
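The same idea can be sketched in Python, which ships a ready-made tolerance comparison in math.isclose. The helper name nearly_equal, the tolerance values and the sample temperature are assumptions chosen for this illustration; suitable tolerances depend on how the temperature is produced.

```python
import math


def nearly_equal(a: float, b: float,
                 rel_tol: float = 1e-6, abs_tol: float = 1e-6) -> bool:
    """True when a and b differ by no more than the given tolerances."""
    return math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol)


temperature = -7.0000001   # hypothetical reading that is "almost" -7.0

print(temperature == -7.0)              # exact comparison
print(nearly_equal(temperature, -7.0))  # tolerant comparison
print(nearly_equal(-7.1, -7.0))         # a genuinely different value
```

An absolute tolerance matters for values near zero, where a purely relative tolerance becomes vanishingly small.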
Floating-point arithmetic is considered an esoteric subject by many
people. This is rather surprising because floating-point is ubiquitous
in computer systems. Almost every language has a floating-point
datatype; computers from PCs to supercomputers have floating-point
accelerators; most compilers will be called upon to compile
floating-point algorithms from time to time; and virtually every
operating system must respond to floating-point exceptions such as
overflow. This paper presents a tutorial on those aspects of
floating-point that have a direct impact on designers of computer
systems. It begins with background on floating-point representation
and rounding error, continues with a discussion of the IEEE
floating-point standard, and concludes with numerous examples of how
computer builders can better support floating-point.
From What Every Computer Scientist Should Know About Floating-Point Arithmetic

How can I use SYNCSORT to format a Packed Decimal field with a specific sign value?

I want to use SYNCSORT to force all Packed Decimal fields to a negative sign value. The critical requirement is the 2nd nibble must be Hex 'D'. I have a method that works but it seems much too complex. In keeping with the KISS principle, I'm hoping someone has a better method. Perhaps using a bit mask on the last 4 bits? Here is the code I have come up with. Is there a better way?
*
* This sort logic is intended to force all Packed Decimal amounts to
* have a negative sign with a B'....1101' value (Hex 'xD').
*
SORT FIELDS=COPY
OUTFIL FILES=1,
INCLUDE=(8,1,BI,NE,B'....1..1',OR, * POSITIVE PACKED DECIMAL
8,1,BI,EQ,B'....1111'), * UNSIGNED PACKED DECIMAL
OUTREC=(1:1,7, * INCLUDING +0
8:(-1,MUL,8,1,PD),PD,LENGTH=1,
9:9,72)
OUTFIL FILES=2,
INCLUDE=(8,1,BI,EQ,B'....1..1',AND, * NEGATIVE PACKED DECIMAL
8,1,BI,NE,B'....1111'), * NOT UNSIGNED PACKED DECIMAL
OUTREC=(1:1,7, * INCLUDING -0
8:(+1,MUL,8,1,PD),PD,LENGTH=1,
9:9,72)
In the code that processes the VSAM file, can you change the read logic to GET with KEY GTEQ and check for < 0 on the result instead of doing a specific keyed read?
If you did that, you could accept both negative packed sign values, xB and xD.
Have you considered writing an E15 user exit? The E15 user exit lets you
manipulate records as they are input to the sort process. In this case you would have a
REXX, COBOL or other LE compatible language subroutine patch the packed decimal sign field as it is input to the sort process. No need to split into multiple files to be merged later on.
Here is a link to example JCL
for invoking an E15 exit from DFSORT (same JCL for SYNCSORT). Chapter 4 of this reference
describes how to develop User Exit routines, again this is a DFSORT manual but I believe SyncSort is
fully compatible in this respect. Writing a user exit is no different than writing any other subroutine - get the linkage right and the rest is easy.
This is a very general outline, but I hope it helps.
Okay, it took some digging but NEALB's suggestion to seek help on MVSFORUMS.COM paid off... here is the final result. The OUTREC logic used with SORT/MERGE replaces OUTFIL and takes advantage of new capabilities (IFTHEN, WHEN and OVERLAY) in Syncsort 1.3 that I didn't realize existed. It pays to have current documentation available!
*
* This MERGE logic is intended to assert that the Packed Decimal
* field has a negative sign with a B'....1101' value (Hex X'.D').
*
*
MERGE FIELDS=(27,5.4,BI,A),EQUALS
SUM FIELDS=NONE
OUTREC IFTHEN=(WHEN=(32,1,BI,NE,B'....1..1',OR,
32,1,BI,EQ,B'....1111'),
OVERLAY=(32:(-1,MUL,32,1,PD),PD,LENGTH=1)),
IFTHEN=(WHEN=(32,1,BI,EQ,B'....1..1',AND,
32,1,BI,NE,B'....1111'),
OVERLAY=(32:(+1,MUL,32,1,PD),PD,LENGTH=1))
Looking at the last byte of a packed field is possible. You want positive/unsigned to negative, so if it is greater than -1, subtract it from zero.
From a short-lived Answer by MikeC, it is now known that the data contains non-preferred signs (that is, it can contain A through F in the low-order half-byte, whereas a preferred sign would be C (positive) or D (negative)). F is unsigned, treated as positive.
This is tested with DFSORT. It should work with SyncSORT. Turns out that DFSORT can understand a negative packed-decimal zero, but it will not create a negative packed-decimal zero (it will allow a zoned-decimal negative zero to be created from a negative zero packed-decimal).
The idea is that a non-preferred sign is valid and will be accurately signed for input to a decimal machine instruction, but the result will always be a preferred sign, and will be correct. So by adding zero first, the field gets turned into a preferred sign and then the test for -1 will work as expected. With data in the sign-nybble for packed-decimal fields, SORT has some specific and documented behaviours, which just don't happen to help here.
Since there is only one value to deal with to become the negative zero, X'0C', after the normalisation of signs already done, there is a simple test and replacement with a constant of X'0D' for the negative zero. Since the negative zero will not work, the second test is changed from the original minus one to zero.
With non-preferred signs in the data:
SORT FIELDS=COPY
INREC IFTHEN=(WHEN=INIT,
OVERLAY=(32:+0,ADD,32,1,PD,TO=PD,LENGTH=1)),
IFTHEN=(WHEN=(32,1,CH,EQ,X'0C'),
OVERLAY=(32:X'0D')),
IFTHEN=(WHEN=(32,1,PD,GT,0),
OVERLAY=(32:+0,SUB,32,1,PD,TO=PD,LENGTH=1))
With preferred signs in the data:
SORT FIELDS=COPY
INREC IFTHEN=(WHEN=(32,1,CH,EQ,X'0C'),
OVERLAY=(32:X'0D')),
IFTHEN=(WHEN=(32,1,PD,GT,0),
OVERLAY=(32:+0,SUB,32,1,PD,TO=PD,LENGTH=1))
Note: If non-preferred signs are stuffed through a COBOL program not using compiler option NUMPROC(NOPFD) then results will be "interesting".
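For readers who want to see the sign handling at the byte level, outside of SORT, here is a minimal Python sketch of the bit-mask idea raised in the question. It is not SORT syntax, and the function names are made up for this example. It assumes the usual packed-decimal sign codes: B and D negative, A, C, E and F positive, with C and D preferred.

```python
NEGATIVE_SIGN = 0x0D   # preferred negative sign nibble


def sign_of(packed: bytes) -> int:
    """Return -1 or +1 according to the sign nibble of a packed-decimal field."""
    nibble = packed[-1] & 0x0F
    if nibble in (0x0B, 0x0D):
        return -1
    if nibble in (0x0A, 0x0C, 0x0E, 0x0F):
        return 1
    raise ValueError(f"invalid sign nibble {nibble:X}")


def force_negative_sign(packed: bytes) -> bytes:
    """Overwrite the low nibble of the last byte with D, keeping the digits."""
    if not packed:
        raise ValueError("empty field")
    last = (packed[-1] & 0xF0) | NEGATIVE_SIGN
    return packed[:-1] + bytes([last])


print(force_negative_sign(b"\x5C").hex().upper())   # +5 becomes -5
print(force_negative_sign(b"\x0F").hex().upper())   # unsigned 0 becomes -0
```

Because only the low nibble is replaced, a zero also ends up as X'0D', which corresponds to the explicit X'0C' to X'0D' replacement in the SORT statements above.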