In PDF format syntax should number 1.e10 be written as 10000000000.? - pdf

Looking at the PDF Referene ver 1.7 about how objects of type number
are writen according to valid syntax it informs.
Note: PDF does not support the PostScript syntax for numbers with
nondecimal radices (such as 16#FFFE ) or in exponential format (such
as 6.02E23 ).
However it also does not mandate a maximum range the numbers should be in. This seems to suggest it would be correct to write
1.00E10 as 10000000000
or
1.00E-50 as 0.00000000000000000000000000000000000000000000000001
This question has hence 2 aspects:
a) is the notation correct (as provided in the examples?
b) does pdf format expect implementations to use (or at least fall back
to some bigint/bigfloat handling) of numbers, as it seems to not provide
any range for the numbers?

First of all, for normative information on PDF you should refer to the appropriate ISO standards, in particular ISO 32000. Yes, Part 1 (ISO 32000-1) in particular is derived from the PDF reference 1.7 without that many changes, but not without changes either. (Ok, in some situations one has to consult the old PDF reference, too, to understand some of these changes.)
Adobe has published a copy thereof (with "ISO" in the page headers removed) on its web site: https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf
Now to your question:
According to ISO 32000, both part 1 and 2:
An integer shall be written as one or more decimal digits optionally preceded by a sign. [...]
A real value shall be written as one or more decimal digits with an optional sign and a leading, trailing, or embedded PERIOD (2Eh) (decimal point).
(section 7.3.3 "Numeric Objects")
Thus, concerning your question a)
is the notation correct (as provided in the examples?
Yes, 10000000000 is an integer valued numeric object, 0.00000000000000000000000000000000000000000000000001 is a real valued numeric object.
Concerning your question b)
does pdf format expect implementations to use (or at least fall back to some bigint/bigfloat handling) of numbers, as it seems to not provide any range for the numbers?
No, in the same section as quoted above you also find
The range and precision of numbers may be limited by the internal representations used in the computer on which the conforming reader is running; Annex C gives these limits for typical implementations.
and Annex C recommends at least the following limits:
integer
2,147,483,647
Largest integer value; equal to 231 − 1.
integer
-2,147,483,648
Smallest integer value; equal to −231
real
±3.403 × 1038
Largest and smallest real values (approximate).
real
±1.175 × 10-38
Nonzero real values closest to 0 (approximate). Values closer than these are automatically converted to 0.
real
5
Number of significant decimal digits of precision in fractional part (approximate).
(ISO 32000-1)
Integers
Integer values (such as object numbers) can often be expressed within 32 bits.
Real numbers
Modern computers often represent and process real numbers using IEEE Standard for Floating-Point Arithmetic (IEEE 754) single or double precision.
(ISO 32000-2)

Related

Why BigFloat.to_s is not precise enough?

I am not sure if this is a bug. But I've been playing with big and I cant understand why this code works this way:
https://carc.in/#/r/2w96
Code
require "big"
x = BigInt.new(1<<30) * (1<<30) * (1<<30)
puts "BigInt: #{x}"
x = BigFloat.new(1<<30) * (1<<30) * (1<<30)
puts "BigFloat: #{x}"
puts "BigInt from BigFloat: #{x.to_big_i}"
Output
BigInt: 1237940039285380274899124224
BigFloat: 1237940039285380274900000000
BigInt from BigFloat: 1237940039285380274899124224
First I though that BigFloat requires to change BigFloat.default_precision to work with bigger number. But from this code it looks like it only matters when trying to output #to_s value.
Same with precision of BigFloat set to 1024 (https://carc.in/#/r/2w98):
Output
BigInt: 1237940039285380274899124224
BigFloat: 1237940039285380274899124224
BigInt from BigFloat: 1237940039285380274899124224
BigFloat.to_s uses LibGMP.mpf_get_str(nil, out expptr, 10, 0, self). Where GMP is saying:
mpf_get_str (char *str, mp_exp_t *expptr, int base, size_t n_digits, const mpf_t op)
Convert op to a string of digits in base base. The base argument may vary from 2 to 62 or from -2 to -36. Up to n_digits digits will be generated. Trailing zeros are not returned. No more digits than can be accurately represented by op are ever generated. If n_digits is 0 then that accurate maximum number of digits are generated.
Thanks.
In GMP (it applies to all languages not just Crystal), integers (C mpz_t, Crystal BigInt) and floats (C mpf_t, Crystal BigFloat) have separate default precision.
Also, note that using an explicit precision is better than setting a default one, because the default precision might not be reentrant (it depends on a configure-time switch). Also, if someone reads only a part of your code, they may skip the part with setting the default precision and assume a wrong one. Although I do not know the Crystal binding well, I assume that such functionality is exposed somewhere.
The zero parameter passed to mpf_get_str means to guess the value from the precision. I know the number of significant digits is proportional and close to precision / log2(10). Floating point numbers have finite precision. In that case, it was not the mpf_get_str call which made the last digits zero - it was the internal representation that did not keep such data. It looks like your (default) precision is too small to store all the necessary digits.
To summarize, there are two solutions:
Set a global default precision. Although this approach will work, it will require to either change the default precision frequently, or use one in the whole program. Both ways, the approach with the default precision is a form of procrastination which is going to have its vengeance later.
Set a precision on variable basis. This is a better solution than the former. Although it requires more code (1-2 more lines per variable initialization), it is going to pay back later. For example, in a space object tracking system, the physics calculations have to be super-precise, but other systems could use lower precision numbers for speed and memory saving.
I am still unsure what made the conversion BigFloat --> BigInt yield the missing digits.

Fortran: can you explain this formatting string

I have a Fortran program which I need to modify, so I'm reading it and trying to understand. Can you please explain what the formatting string in the following statement means:
write(*,'(1p,(5x,3(1x,g20.10)))') x(jr,1:ncols)
http://www.fortran.com/F77_std/rjcnf0001-sh-13.html
breifly, you are writing three general (g) format floats per line. Each float has a total field width of 20 characters and 10 places to the right of the decimal. Large magnitude numbers are in exponential form.
The 1xs are simply added spaces (which could as well have been accomplished by increasing the field width ie, g21.10 since the numbers are right justified. The 5x puts an additional 5 spaces at the beginning of each line.
The somewhat tricky thing here is tha lead 1p which is a scale factor. It causes the mantissa of all exponential form numbers produced by the following g format to be multiplied by 10, and the exponent changed accordingly, ie instead of the default,
g17.10 -> b0.1234567890E+12
you get:
1p,g17.10 -> b1.2345678900E+11
b denotes a blank in the output. Be sure to allow room for a - in your field width count...
for completeness in the case of scale greater than one the number of decimal places is reduced (preserving the total precision) ie,
3p,g17.10 -> b123.45678900E+09 ! note only 8 digits after the decimal
that is 1p buys you a digit of precision over the default, but you don't get any more. Negative scales cost you precision, preserving the 10 digits:
-7p,g17.10 -> b0.0000000123E+19
I should add, the p scale factor edit descriptor does something completely different on input. Read the docs...
I'd like to add slightly to George's answer. Unfortunately this is a very nasty (IMO) part of Fortran. In general, bear in mind that a Fortran format specification is automatically repeated as long as there are values remaining in the input/output list, so it isn't necessary to provide formats for every value to be processed.
Scale factors
In the output, all floating point values following kP are multiplied by 10k. Fields containing exponents (E) have their exponent reduced by k, unless the exponent format is fixed by using EN (engineering) or ES (scientific) descriptors. Scaling does not apply to G editing, unless the value is such that E editing is applied. Thus, there is a difference between (1P,G20.10) and (1P,F20.10).
Grouping
A format like n() repeats the descriptors within parentheses n times before proceeding.

How are the digits in ObjC method type encoding calculated?

Is is a follow-up to my previous question:
What are the digits in an ObjC method type encoding string?
Say there is an encoding:
v24#0:4:8#12B16#20
How are those numbers calculated? B is a char so it should occupy just 1 byte (not 4 bytes). Does it have something to do with "alignment"? What is the size of void?
Is it correct to calculate the numbers as follows? Ask sizeof on every item and round up the result to multiple of 4? And the first number becomes the sum of all the other ones?
The numbers were used in the m68K days to denote stack layout. That is, you could literally decode the the method signature and, for just about all types, know exactly which bytes at what offset within the stack frame you could diddle to get/set arguments.
This worked because the m68K's ABI was entirely [IIRC -- been a long long time] stack based argument/return passing. There wasn't anything shoved into registers across call boundaries.
However, as Objective-C was ported to other platforms, always-on-the-stack was no longer the calling convention. Arguments and return values are often passed in registers.
Thus, those offsets are now useless. As well, the type encoding used by the compiler is no longer complete (because it never was terribly useful) and there will be types that won't be encoded. Not too mention that encoding some C++ templatized types yields method type encoding strings that can be many Kilobytes in size (I think the record I ran into was around 30K of type information).
So, no, it isn't correct to use sizeof() to generate the numbers because they are effectively meaningless to everything. The only reason why they still exist is for binary compatibility; there are bits of esoteric code here and there that still parse the type encoding string with the expectation that there will be random numbers sprinkled here and there.
Note that there are vestiges of API in the ObjC runtime that still lead one to believe that it might be possible to encode/decode stack frames on the fly. It really isn't as the C ABI doesn't guarantee that argument registers will be preserved across call boundaries in the face of optimization. You'd have to drop to assembly and things get ugly really really fast (>shudder<).
The full encoding string is constructed (in clang) by the method ASTContext::getObjCEncodingForMethodDecl, which you can find in lib/AST/ASTContext.cpp.
The method that does the size rounding is ASTContext::getObjCEncodingTypeSize, in the same file. It forces each size to be at least the size of an int. On all of Apple's current platforms, an int is 4 bytes.
The stack frame size and argument offsets are calculated by the compiler. I'm actually trying to track this down in the Clang source myself this week; it possibly has something to do with CodeGenTypes::arrangeObjCMessageSendSignature. (Looks like Rob just made my life a lot easier!)
The first number is the sum of the others, yes -- it's the total space occupied by the arguments. To get the size of the type represented by an ObjC type encoding in your code, you should use NSGetSizeAndAlignment().

Floating point serialization, lexicographical comparison == floating point comparison

I'm looking for a way to serialize floating points so that in their serialized form a lexicographical comparison is the same as a floating point comparison. I think it is possible by storing it in the form:
| signed bit (1 for positive) | exponent | significand |
The exponent and the significand would be serialized as big-endian and the complement would be taken for negative numbers.
Would this work? I don't mind if it breaks for NaN, but having INF comparison working would be nice.
The format of IEEE numbers are specifically designed so that "plain" integer comparison could be used. However, this only applies when two numbers of the same sign is compared.
Your suggestion to complement the numbers when they are negative is sound, so this will work.
This will work for +-Inf:s and for subnormal numbers. NaN:s, however, will not work, or rather, they will be considered "larger" than inf:s.
The only problematic case is "-Zero" (i.e. sign=1, exponent=0, and mantissa=0). Accoring to IEEE, Zero == -Zero. You have to decide if you want to emit -Zero as Zero, treat them as different, or add special code to the comparison routine.

How can I use SYNCSORT to format a Packed Decimal field with a specifc sign value?

I want to use SYNCSORT to force all Packed Decimal fields to a negative sign value. The critical requirement is the 2nd nibble must be Hex 'D'. I have a method that works but it seems much too complex. In keeping with the KISS principle, I'm hoping someone has a better method. Perhaps using a bit mask on the last 4 bits? Here is the code I have come up with. Is there a better way?
*
* This sort logic is intended to force all Packed Decimal amounts to
* have a negative sign with a B'....1101' value (Hex 'xD').
*
SORT FIELDS=COPY
OUTFIL FILES=1,
INCLUDE=(8,1,BI,NE,B'....1..1',OR, * POSITIVE PACKED DECIMAL
8,1,BI,EQ,B'....1111'), * UNSIGNED PACKED DECIMAL
OUTREC=(1:1,7, * INCLUDING +0
8:(-1,MUL,8,1,PD),PD,LENGTH=1,
9:9,72)
OUTFIL FILES=2,
INCLUDE=(8,1,BI,EQ,B'....1..1',AND, * NEGATIVE PACKED DECIMAL
8,1,BI,NE,B'....1111'), * NOT UNSIGNED PACKED DECIMAL
OUTREC=(1:1,7, * INCLUDING -0
8:(+1,MUL,8,1,PD),PD,LENGTH=1,
9:9,72)
In the code that processes the VSAM file, can you change the read logic to GET with KEY GTEQ and check for < 0 on the result instead of doing a specific keyed read?
If you did that, you could accept all three negative packed values xA, xB and xD.
Have you considered writing an E15 user exit? The E15 user exit lets you
manipulate records as they are input to the sort process. In this case you would have a
REXX, COBOL or other LE compatible language subroutine patch the packed decimal sign field as it is input to the sort process. No need to split into multiple files to be merged later on.
Here is a link to example JCL
for invoking an E15 exit from DFSORT (same JCL for SYNCSORT). Chapter 4 of this reference
describes how to develop User Exit routines, again this is a DFSORT manual but I believe SyncSort is
fully compatible in this respect. Writing a user exit is no different than writing any other subroutine - get the linkage right and the rest is easy.
This is a very general outline, but I hope it helps.
Okay, it took some digging but NEALB's suggestion to seek help on MVSFORUMS.COM paid off... here is the final result. The OUTREC logic used with SORT/MERGE replaces OUTFIL and takes advantage of new capabilities (IFTHEN, WHEN and OVERLAY) in Syncsort 1.3 that I didn't realize existed. It pays to have current documentation available!
*
* This MERGE logic is intended to assert that the Packed Decimal
* field has a negative sign with a B'....1101' value (Hex X'.D').
*
*
MERGE FIELDS=(27,5.4,BI,A),EQUALS
SUM FIELDS=NONE
OUTREC IFTHEN=(WHEN=(32,1,BI,NE,B'....1..1',OR,
32,1,BI,EQ,B'....1111'),
OVERLAY=(32:(-1,MUL,32,1,PD),PD,LENGTH=1)),
IFTHEN=(WHEN=(32,1,BI,EQ,B'....1..1',AND,
32,1,BI,NE,B'....1111'),
OVERLAY=(32:(+1,MUL,32,1,PD),PD,LENGTH=1))
Looking at the last byte of a packed field is possible. You want positive/unsigned to negative, so if it is greater than -1, subtract it from zero.
From a short-lived Answer by MikeC, it is now known that the data contains non-preferred signs (that is, it can contain A through F in the low-order half-byte, whereas a preferred sign would be C (positive) or D (negative). F is unsigned, treated as positive.
This is tested with DFSORT. It should work with SyncSORT. Turns out that DFSORT can understand a negative packed-decimal zero, but it will not create a negative packed-decimal zero (it will allow a zoned-decimal negative zero to be created from a negative zero packed-decimal).
The idea is that a non-preferred sign is valid and will be accurately signed for input to a decimal machine instruction, but the result will always be a preferred sign, and will be correct. So by adding zero first, the field gets turned into a preferred sign and then the test for -1 will work as expected. With data in the sign-nybble for packed-decimal fields, SORT has some specific and documented behaviours, which just don't happen to help here.
Since there is only one value to deal with to become the negative zero, X'0C', after the normalisation of signs already done, there is a simple test and replacement with a constant of X'0D' for the negative zero. Since the negative zero will not work, the second test is changed from the original minus one to zero.
With non-preferred signs in the data:
SORT FIELDS=COPY
INREC IFTHEN=(WHEN=INIT,
OVERLAY=(32:+0,ADD,32,1,PD,TO=PD,LENGTH=1)),
IFTHEN=(WHEN=(32,1,CH,EQ,X'0C'),
OVERLAY=(32:X'0D')),
IFTHEN=(WHEN=(32,1,PD,GT,0),
OVERLAY=(32:+0,SUB,32,1,PD,TO=PD,LENGTH=1))
With preferred signs in the data:
SORT FIELDS=COPY
INREC IFTHEN=(WHEN=(32,1,CH,EQ,X'0C'),
OVERLAY=(32:X'0D')),
IFTHEN=(WHEN=(32,1,PD,GT,0),
OVERLAY=(32:+0,SUB,32,1,PD,TO=PD,LENGTH=1))
Note: If non-preferred signs are stuffed through a COBOL program not using compiler option NUMPROC(NOPFD) then results will be "interesting".