How to print a number with number of fixed characters in Smalltalk/Pharo - formatting

Actual task: I want to print the matrix (of my own implementation) in humanly readable format. As a pre-requisite, I figured I need to be able to specify "fit the number representation into X characters". I found #printShowingDecimalPlaces: and #printPaddedWith:to: in Float and Integer classes (the first method is in more general Number class). Individually, they work, but the former works on fractional part only and the the latter on part before fractional, e.g.:
10.3 printPaddedWith: Character space to: 5.
"' 10.3'"
-10.3 printPaddedWith: Character space to: 5.
"' -10.3'"
10.3 printShowingDecimalPlaces: 3.
"'10.300'"
Also, their action on very large (or equally small numbers) in scientific form is not ideal:
12.3e9 printShowingDecimalPlaces: 3.
"'12300000000.000'"
12.3e9 printPaddedWith: Character space to: 5.
"' 1.23e10'"
So, I would like something like Common Lisp's (FORMAT T "~10g" 12.3d9) or C's printf("%10g", 12.3e9), that (a) restricts the whole width to 10 characters and (b) chooses the most suitable format depending on the size of the number. Is there something like this in Pharo?

For versatile printing options, I suggest loading NumberPrinter package from
http://ss3.gemstone.com/ss/NumberPrinter/
(FloatPrinter fixed) digitCount: 2; print: 10.3.
-> '10.30'
I did not try it in recent Pharo versions though.
EDIT:
Ah, but I see no format for handling exponents multiple of 3, maybe you would have to create a subclass for such format.
EDIT:
Or I missunderstood: you don't want it to print as '12.3e9' but rather '1.23e10'? note that apart significand digitCount, you need extra size for at worst 1 for sign + 1 for fraction separator + 1 for exponent letter + 1 for exponent sign + 3 for exponent (worst case for double precision floating point).
The more or less equivalent to g format would be something like this:
(FloatPrinter freeFormat)
totalWidth: 13; "size of the generated string"
digitCount: 6; "number of significant figures"
print: -12.3e-205.
->' -1.23e-204'

Related

what's best way for awk to check arbitrary integer precision

from GNU gawk's page
https://www.gnu.org/software/gawk/manual/html_node/Checking-for-MPFR.html
they have a formula to check arbitrary precision
function adequate_math_precision(n) { return (1 != (1+(1/(2^(n-1))))) }
My question is : wouldn't it be more efficient by staying within integer math domain with a formula such as
( 2^abs(n) - 1 ) % 2 # note 2^(n-1) vs. 2^|n| - 1
Since any power of 2 must also be even, then subtracting 1 must always be odd, then its modulo (%) over 2 becomes indicator function for is_odd() for n >= 0, while the abs(n) handles the cases where it's negative.
Or does the modulo necessitate a casting to float point, thus nullifying any gains ?
Good question. Let's tackle it.
The proposed snippet aims at checking wether gawk was invoked with the -M option.
I'll attach some digression on that option at the bottom.
The argument n of the function is the floating point precision needed for whatever operation you'll have to perform. So, say your script is in a library somewhere and will get called but you have no control over it. You'll run that function at the beginning of the script to promptly throw exception and bail out, suggesting that the end result will be wrong due to lack of bits to store numbers.
Your code stays in the integer realm: a power of two of an integer is an integer. There is no need to use abs(n) here, because there is no point in specifying how many bits you'll need as a negative number in the first place.
Then you subtract one from an even, integer number. Now, unless n=0, in which case 2^0=1 and then your code reads (1 - 1) % 2 = 0, your snippet shall always return 1, because the quotient (%) of an odd number divided by two is 1.
Problem is: you are trying to calculate a potentially stupidly large number in a function that should check if you are able to do so in the first place.
Since any power of 2 must also be even, then subtracting 1 must always
be odd, then its modulo (%) over 2 becomes indicator function for
is_odd() for n >= 0, while the abs(n) handles the cases where it's
negative.
Except when n=0 as we discussed above, you are right. The snippet will tell that any power of 2 is even, and any power of 2, minus 1, is odd. We were discussing another subject entirely thought.
Let's analyze the other function instead:
return (1 != (1+(1/(2^(n-1)))))
Remember that booleans in awk runs like this: 0=false and non zero equal true. So, if 1+x where x is a very small number, typically a large power of two (2^122 in the example page) is mathematically guaranteed to be !=1, in the digital world that's not the case. At one point, floating computation will reach a precision rock bottom, will be rounded down, and x=0 will be suddenly declared. At that point, the arbitrary precision function will return 0: false: 1 is equal 1.
A larger discussion on types and data representation
The page you link explains precision for gawk invoked with the -M option. This sounds like technoblahblah, let's decipher it.
At one point, your OS architecture has to decide how to store data, how to represent it in memory so that it can be accessed again and displayed. Terms like Integer, Float, Double, Unsigned Integer are examples of data representation. We here are addressing Integer representation: how is an integer stored in memory?
A 32-bit system will use 4 bytes to represent and integer, which in turn determines how larger the integer will be. The 32 bits are read from most significative (MSB) to less significative (LSB) and if signed, one bit will represent the sign (the MSB typically, drastically reducing the max size of the integer).
If asked to compute a large number, a machine will try to fit in in the max number available. If the end result is larger than that, you have overflow and end up with a wrong result or an error. Many online challenges typically ask you to write code for arbitrary long loops or large sums, then test it with inputs that will break the 64bit barrier, to see if you master proper types for indexes.
AWK is not a strongly typed language. Meaning, any variable can store data, regardless of the type. The data type can change and it is determined at runtime by the interpreter, so that the developer doesn't need to care. For instance:
$awk '{a="this is text"; print a; a=2; print a; print a+3.0*2}'
-| this is text
-| 2
-| 8
In the example, a is text, then is an integer and can be summed to a floating point number and printed as integer without any special type handling.
The Arbitrary Precision Page presents the following snippet:
$ gawk -M 'BEGIN {
> s = 2.0
> for (i = 1; i <= 7; i++)
> s = s * (s - 1) + 1
> print s
> }'
-| 113423713055421845118910464
There is some math voodoo behind, we will skip that. Since s is interpreted as a floating point number, the end result is computed as floating point.
Try to input that number on Windows calculator as decimal, and it will fail. Although you can compute it as a binary. You'll need the programmer setting and to add up to 53 bits to be able to fit it as unsigned integer.
53 is a magic number here: with the -M option, gawk uses arbitrary precision for numbers. In other words, it commandeers how many bits are necessary, track them and breaks free of the native OS architecture. The default option says that gawk will allocate 53 bits for any given arbitrary number. Fun fact, the actual result of that snippet is wrong, and it would take up to 100 bits to compute correctly.
To implement arbitrary large numbers handling, gawk relies on an external library called MPFR. Provided with an arbitrary large number, MPFR will handle the memory allocation and bit requisition to store it. However, the interface between gawk and MPFR is not perfect, and gawk can't always control the type that MPFR will use. In case of integers, that's not an issue. For floating point numbers, that will result in rounding errors.
This brings us back to the snippet at the beginning: if gawk was called with the -M option, numbers up to 2^53 can be stored as integers. Floating points will be smaller than that (you'll need to make the comma disappear somehow, or rather represent it spending some of the bits allocated for that number, just like the sign). Following the example of the page, and asking an arbitrary precision larger than 32, the snippet will return TRUE only if the -M option was passed, otherwise 1/2^(n-1) will be rounded down to be 0.

Remove decimal separator with output format on real values in Fortran [duplicate]

So I have some code that does essentially this:
REAL, DIMENSION(31) :: month_data
INTEGER :: no_days
no_days = get_no_days()
month_data = [fill array with some values]
WRITE(1000,*) (month_data(d), d=1,no_days)
So I have an array with values for each month, in a loop I fill the array with a certain number of values based on how many days there are in that month, then write out the results into a file.
It took me quite some time to wrap my head around the whole 'write out an array in one go' aspect of WRITE, but this seems to work.
However this way, it writes out the numbers in the array like this (example for January, so 31 values):
0.00000 10.0000 20.0000 30.0000 40.0000 50.0000 60.0000
70.0000 80.0000 90.0000 100.000 110.000 120.000 130.000
140.000 150.000 160.000 170.000 180.000 190.000 200.000
210.000 220.000 230.000 240.000 250.000 260.000 270.000
280.000 290.000 300.000
So it prefixes a lot of spaces (presumably to make columns line up even when there are larger values in the array), and it wraps lines to make it not exceed a certain width (I think 128 chars? not sure).
I don't really mind the extra spaces (although they inflate my file sizes considerably, so it would be nice to fix that too...) but the breaking-up-lines screws up my other tooling. I've tried reading several Fortran manuals, but while some of the mention 'output formatting', I have yet to find one that mentions newlines or columns.
So, how do I control how arrays are written out when using the syntax above in Fortran?
(also, while we're at it, how do I control the nr of decimal digits? I know these are all integer values so I'd like to leave out any decimals all together, but I can't change the data type to INTEGER in my code because of reasons).
You probably want something similar to
WRITE(1000,'(31(F6.0,1X))') (month_data(d), d=1,no_days)
Explanation:
The use of * as the format specification is called list directed I/O: it is easy to code, but you are giving away all control over the format to the processor. In order to control the format you need to provide explicit formatting, via a label to a FORMAT statement or via a character variable.
Use the F edit descriptor for real variables in decimal form. Their syntax is Fw.d, where w is the width of the field and d is the number of decimal places, including the decimal sign. F6.0 therefore means a field of 6 characters of width with no decimal places.
Spaces can be added with the X control edit descriptor.
Repetitions of edit descriptors can be indicated with the number of repetitions before a symbol.
Groups can be created with (...), and they can be repeated if preceded by a number of repetitions.
No more items are printed beyond the last provided variable, even if the format specifies how to print more items than the ones actually provided - so you can ask for 31 repetitions even if for some months you will only print data for 30 or 28 days.
Besides,
New lines could be added with the / control edit descriptor; e.g., if you wanted to print the data with 10 values per row, you could do
WRITE(1000,'(4(10(F6.0,:,1X),/))') (month_data(d), d=1,no_days)
Note the : control edit descriptor in this second example: it indicates that, if there are no more items to print, nothing else should be printed - not even spaces corresponding to control edit descriptors such as X or /. While it could have been used in the previous example, it is more relevant here, in order to ensure that, if no_days is a multiple of 10, there isn't an empty line after the 3 rows of data.
If you want to completely remove the decimal symbol, you would need to rather print the nearest integers using the nint intrinsic and the Iw (integer) descriptor:
WRITE(1000,'(31(I6,1X))') (nint(month_data(d)), d=1,no_days)

Bitwise negation up to the first positive bit

I was working in a project and I can´t use the bitwise negation with a U32 bits (Unsigned 32 bits) because when I tried to use the negation operator for example I have 1 and the negation (according to this function) was the biggest number possible with U32 and I expected the zero. My idea was working with a binary number like (110010) and I need to only negate the bits after the first 1-bit(001101). There is a way to do that in LabVIEW?
This computes the value you are looking for.
1110 --> 0001 (aka, 1)
1010 --> 0101 (aka, 101)
111 --> 000 (aka, 0) [indeed, all patterns that are all "1" will become 0]
0 --> 0 [because there are no bits to negate... maybe you want to special case this as "1"?)
Note: This is a VI Snippet. Save the .png file to your disk then drag the image from your OS into LabVIEW and it will generate the block diagram (I wrote it in LV 2016, so it works for 2016 or later). Sometimes dragging directly from browser to diagram works, but most browsers seem to strip out the EXIF data that makes that work.
Here's an alternative solution without a loop. It formats the input into its string representation (without leading zeros) to figure out how many bits to negate - call this n - and then XOR's the input with 2^n - 1.
Note that this version will return an output of 1 for an input of 0.
Using the string functions feels a bit hacky... but it doesn't use a loop!!
Obviously we could instead try to get the 'bit length' of the input using its base-2 log, but I haven't sat down and worked out how to correctly ensure that there are no rounding issues when the input has only its most significant bit set, and therefore the base-2 log should be exactly an integer, but might come out a fraction smaller.
Here's a solution without strings/loops using the conversion to float (a common method of computing floor(log_2(x))). This won't work on unsigned types.

Test cases for numeric input

What are some common (or worthwhile) tests, test questions, weaknesses, or misunderstandings dealing with numeric inputs?
This is a community wiki. Please add to it.
For example, here are a couple sample ideas:
I commonly see users enter text into number fields (eg, ">4" or "4 days", etc).
Fields left blank (null)
Very long numeric strings
Multiple decimals and commas (eg, "4..4" and "4,,434.4.4")
Boundary Value Analysis:
Lower Boundary
Lower Boundary - 1 (for decimal/float, use smaller amounts)
Upper Boundary
Upper Boundary + 1
Far below the lower boundary (eg, beyond the hardware boundary value)
Far above the upper boundary
middle of the range
0
0.0
White space, nothing else " "
String input & other incorrect data types.
Number with text in front or back, eg "$5.00", "4 lbs", "about 60", "50+"
Negative numbers
+ sign with positive numbers, "+4"
Both plus and minus sign, eg "+-4" and "-4e+30"
Exponents of 10, both uppercase and lowercase, positive and negative eg "4e10", "-5E-10", "+6e+60", etc
Too many "e" characters, eg "4e4e4" "4EE4"
Impossibly large/small exponents or inappropriate ones
Decimal values that cannot be represented in a computer
eg, .3 + .6 == 1.0? This bug affects most hardware, so outputs that compare decimal values should allow for a degree of error.
Integer/hardware overflow. Eg, for 32-bit integers, what happens when adding 4 billion to 4 billion?
wrong use of decimal sign and thousands seperator ("," vs. ".") (MikeD)
internationalisation i18n issues: In english aplications you write "12,345.67" meaning "12345.67" in german you write "12345,67" – (k3b)
leading 0's don't make number octal (common javascript bug)

Correct termiology for documentation [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 4 years ago.
Improve this question
The documentation below is for a module, which has now been "decommissioned"
and I'm writing it's replacement.
Before i write the replacement I want to get my terms right.
I know the terms are wrong in the documentation - it was hacked together quickly
so i could instruct a college working on the hardware side of this project on how to use a program I'ld made.
Full documentary can be found here for any who are interested (in so much as has been written and added to our wiki), the Website may only be available to certain IPS's (depends on you ISP - university internet connections are most likely to work), and the SVN repo is private.
So there are alot of terms that are wrong.
such as.
deliminators
formatted string containing value expressions (might now be wrong but is hard to say)
What are the correct terms for these.
And what other mistakes have I made
==== formatted string containing value expressions ====
Before I start on actual programs an explanation of:
"formatted string containing value expressions" and how to encode values in them.
The ''formatted string containing value expressions'' is at the core of doing low level transmission.
We know the decimal 65, hex 41, binary 0100 0001, and the ascii character 'A' all have the same binary representation, so to tell which we are using we have a series of deliminators - numbers preceded by:
# are decimal
$ are Hex
# are binary
No deliminator, then ascii.
Putting a sign indicator after the deliminator is optional. It is required if you want to send a negative number.
You may put muliple values in the same string.
eg: "a#21#1001111$-0F"
All values in a ''formatted string containing value expressions'' must be in the range -128 to 255 (inclusive) as they must fit in 8bytes (other values will cause an error). Negative numbers have the compliment of 2 representation for their binary form.
There are some problems with ascii - characters that can't be sent (in future versions this will be fixed by giving ascii a delineator and some more code to make that deliminator work, I think).
Characters that can't be sent:
* The delineator characters: $##
* Numbers written immediately after a value that could have contained those digits:
* 0,1,2,3,4,5,6,7,8,9 for decimal
* 0,1,2,3,4,5,6,7,8,9,a,b,c,d,e,f,A,B,C,D,E,F for hex
* 0,1 for binary
For a start, deliminator would probably be delimiter, although I notice your text has both delineator and deliminator in it - perhaps deliminator is a special delimiter/terminator combination :-)
However, a delimiter is usually used to separate fields and is usually present no matter what. What you have is an optional prefix which dictates the following field type. So I would probably call that a "prefix" or "type prefix" instead.
The "formatted string containing value expressions" I would just call a "value expression string" or "value string" to change it to a shorter form.
One other possible problem:
must be in the range -128 to 255 (inclusive) as they must fit in 8bytes
I think you mean 8 bits.
Try something like the following:
==== Value string encoding ====
The value string is at the core of the data used for low level
transmissions.
Within the value string the following refixes are used:
# decimal
$ Hex
# binary
No prefix - ASCII.
An optional sign may be included after the delimiter for negative numbers.
Negative numbers are represented using twos complement.
The value string may contain multiple values:
eg: "a#21#1001111$-0F"
All elements of the value string must represent an 8bit value and must
be in the range -128 to 255
When using ASCII representation the following characters that can't be sent
* The delineator characters: $## (use prefixed hex value.)
* Numbers written immediately after a value that could have
contained those digits:
* 0,1,2,3,4,5,6,7,8,9 for decimal
* 0,1,2,3,4,5,6,7,8,9,a,b,c,d,e,f,A,B,C,D,E,F for hex
* 0,1 for binary