Is it possible to use one variable to represent numbers of unlimited length in programming? [duplicate]

I've been using C# for three years to make games, and I've played with various simulations where the numbers sometimes get so big that Int32 is not enough to store the value. Eventually even Int64 became insufficient for my experiments; it took several such fields (actually an array of variable length) and a special property to handle such big numbers correctly. And so I wondered: is there a way to declare a numeric variable with unlimited (unknown beforehand) length so I can relax and let the computer do the math?
We can write any kind of number we like on paper without needing any special kind of paper. We can also type a lot of words into a text file without needing special file system alterations to make it save and load correctly. Isn't there a variable for declaring a who-knows-how-long-it-will-be number in any programming language?

Starting with .NET 4, the .NET framework contains a BigInteger structure, which can handle integers of arbitrary size.
Since your question is language-agnostic, it might be worth mentioning that internally BigInteger stores the value in an array of unsigned integers; see the following SO question for details:
How does the BigInteger store values internally?
BigInteger is immutable, so there is no need to "resize" the array. Arithmetic operations create new instances of BigInteger, with appropriately sized arrays.
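For example, a minimal C# sketch (assuming a project that references System.Numerics):

```csharp
using System;
using System.Numerics;   // BigInteger lives in System.Numerics (requires .NET 4 or later)

class BigIntegerDemo
{
    static void Main()
    {
        BigInteger n = long.MaxValue;   // implicit conversion from Int64
        n = n * n * n;                  // far beyond the range of any built-in integer type

        BigInteger factorial = 1;
        for (int i = 2; i <= 100; i++)
            factorial *= i;             // 100! has 158 decimal digits

        Console.WriteLine(n);
        Console.WriteLine(factorial);
    }
}
```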

Most modern dynamic languages such as Perl6, Tcl8 and Ruby go one step further by allowing you to store arbitrarily large numbers (up to available RAM) in their number types.
Most of these languages don't have separate integer and floating point types, but rather a single "number" type that automatically gets converted to whatever it needs to be to be stored in RAM. Some, like Perl6, even include complex numbers in their "number" type.
How it's implemented at the machine level is that by default numbers are assumed to be integers, so int32 or int64. If the result of a calculation or assignment isn't an integer, the number is converted to a float or double. If an integer grows too large, the interpreter/runtime environment silently converts it to a bigint object/struct (which is simply a big, growable array or linked list of ints).
How it appears to the programmer is that numbers have unlimited size (again, up to available RAM).
Still, there are gotchas with this system (kind of like the 0.1+0.2!=0.3 issue with floats) so you'd still need to be aware of the underlying implementation even if you can ignore it 99.99% of the time.
For example, if at any point your super large number gets converted to a floating point number (most likely a double in hardware), you'll lose precision, because that's just how floating point numbers work. Sometimes you can do this accidentally. In some languages, for example, the power function (like pow() in C) returns a floating point result, so raising an integer to the power of another integer may lose precision in the result if it's too large.
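As a concrete illustration of the pow() gotcha (sketched in C# rather than a dynamic language): Math.Pow works in doubles, so a large integer power silently loses precision, while BigInteger.Pow stays exact.

```csharp
using System;
using System.Numerics;

class PowPrecisionDemo
{
    static void Main()
    {
        // 10^25 needs 84 bits, but a double's mantissa holds only 53 bits.
        BigInteger exact     = BigInteger.Pow(10, 25);
        BigInteger viaDouble = new BigInteger(Math.Pow(10, 25));   // goes through a double

        Console.WriteLine(exact);       // 10000000000000000000000000
        Console.WriteLine(viaDouble);   // a nearby value such as 10000000000000000905969664
    }
}
```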
For the most part, it works. And I personally feel that this is the sane way of dealing with numbers. Lots of language designers have apparently come up with this solution independently.

Is it possible to [...] represent numbers of unlimited length [...]?
No.
On existing computers it is not possible to represent unlimited numbers because the machines are finite. Even when using all existing storage it is not possible to store unlimited numbers.
It is possible, though, to store very large numbers. Wikipedia has information on the concept of arbitrary precision integers.

"Unlimited" - no, as Nikolai Ruhe soundly pointed out. "Unknown" - yes, qualified by the first point. :}
A BigInteger type is available in .NET 4.0 and in Java as others point out.
For .NET 2.0+, take a look at IntX.
More generally, languages (or at least a de facto library used with them) tend to have some support for arbitrarily long integers, which provides a means of dealing with the "unknown" you describe.
A discussion on the Ubuntu forums somewhat addresses this question more generally and touches on specifics in more languages - some of which provide simpler means of leveraging arbitrarily large integers (e.g. Python and Common Lisp). Personally, the "relax and let the computer do the math" factor was highest for me in Common Lisp years ago: so it may pay to look around broadly for perspective as you seem inclined to do.

How to decide when to use fixed point arithmetic over float?
I have read that fixed point is used when there is no floating-point unit in the processor. When there is no FPU, does that mean the 'float' datatype is not supported?
If you have no FPU, fixed point will be faster and more deterministic.
If you have no FPU, fixed point will in many cases be smaller - certainly for simple arithmetic.
If you need your code to generate bit-identical results across different platforms or toolchains with or without an FPU, then fixed point is necessary.
If you need to do complex math requiring trig or log functions for example, floating point is the path of least resistance, but by no means the only option; you need to develop or find a library (see links below), and there are plenty of ways of doing that badly.
If you need wide dynamic range floating point is simpler. For example the square root of a number less than one is a smaller number - with fixed point you can run out of bits and end up with zero, with floating point, the point is simply moved to increase the resolution at the expense of range.
If you have an FPU and are using an RTOS and don't want the overhead of stacking FPU registers on a context switch (or if it is not supported), fixed-point avoids the need, and avoids errors if you forget to enable the option for every task that needs it.
Generally if your operation is trivial, use fixed point, or at least an integer representation, by selecting your units appropriately. For example, storing voltage values in integer millivolts (or even in ADC quanta) rather than in volts can avoid unnecessary floating point.
If you are doing complex maths and have an FPU, floating point is the simpler, least error-prone solution. Even without an FPU, if your solution meets timing and code size constraints, floating point may still be simpler, but may restrict your ability to use the same code in more constrained execution environments. So if reuse across a wide range of platforms is required, fixed point may be preferable.
Whatever you do, avoid "decimal fixed point" in most cases; where possible use a binary fixed-point representation (Q representation), where for example 10Q6 has 10 integer bits and 6 fractional bits. The reason for this is that rescaling after a multiply or divide is then a shift operation rather than a potentially expensive multiply or divide, and you lose no precision in the rescaling.
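To make the Q-representation point concrete, here is a minimal C# sketch of a Q16.16 format (16 integer bits, 16 fractional bits; the class and method names are just illustrative):

```csharp
using System;

static class Q16_16
{
    const int FracBits = 16;            // Q16.16: 16 fractional bits
    const int One = 1 << FracBits;      // fixed-point representation of 1.0

    public static int FromDouble(double x) => (int)Math.Round(x * One);
    public static double ToDouble(int q)   => (double)q / One;

    // Rescaling after a multiply is just a shift, not a division.
    public static int Mul(int a, int b) => (int)(((long)a * b) >> FracBits);

    // For division, pre-shift the numerator to keep the fractional bits.
    public static int Div(int a, int b) => (int)(((long)a << FracBits) / b);
}

class FixedPointDemo
{
    static void Main()
    {
        int a = Q16_16.FromDouble(3.25);
        int b = Q16_16.FromDouble(1.5);
        Console.WriteLine(Q16_16.ToDouble(Q16_16.Mul(a, b)));  // 4.875
        Console.WriteLine(Q16_16.ToDouble(Q16_16.Div(a, b)));  // ~2.1666 (limited by 16 fractional bits)
    }
}
```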
Some useful references:
https://www.drdobbs.com/cpp/optimizing-math-intensive-applications-w/207000448
http://jet.ro/files/The_neglected_art_of_Fixed_Point_arithmetic_20060913.pdf
If there is no FPU available, compilers can often emulate floating point arithmetics. But this is inefficient, as it takes a lot of cycles.
If you are resource constrained (which you often are in environments without an FPU), you can then opt for fixed point arithmetic, which uses regular integer operations.
Just rambling: When I was using FP, I missed support from the compiler (C/C++) to be able to mark variables to be fixed point (with some specific number of fractional bits).
If you have a standard compliant compiler, then float and double are always available and work correctly. If there isn't an FPU then the calculations are done in software (called soft-FPU or FPU emulation). This is slower and uses more memory.
When to use fixed point is mainly a matter of opinion, but when NOT to use it is when a variable has a large dynamic range, i.e. when a number could be very large but you still need it to be accurate when it is very small.
E.g.: displaying the speed of a car. I need to know the difference between 68 and 70 mph, but I don't care about the difference between 0.68 mph and 0.70 mph. This has low dynamic range (that I care about), so I could use fixed point if other reasons suggested that I might want to. Alternatively, measuring the radioactivity of a sample: I care about the difference between 100 and 90 counts per second, and I still care about the difference between 1 and 0.9 counts per second. This high dynamic range means that fixed point would not be suitable.
How to decide when to use fixed point arithmetic over float?
It depends on many factors which may or may not affect you, including...
Pros:
Fixed-point requires less circuitry so may be more practical on smaller, simpler devices.
Fixed-point uses less energy so may be more practical
on battery-powered devices,
in applications where intensive computation incurs a significant energy bill, or
where heat dissipation is a problem.
Fixed-point is really just integer arithmetic so operations are lossless.
Fixed-point allows you to precisely express real numbers in the number base of your choice, e.g. the values 1/10 or 1/3.
Floating-point arithmetic can exhibit inconsistent behavior related to things like
global rounding modes,
optimisation,
associativity,
implementation-defined behavior, and
variations in FPU hardware.
Cons:
While lossless, fixed-point arithmetic is prone to out-of-range errors, such as overflow. (Libraries and UB sanitizers can help avoid/detect errors.)
Lossless division is achieved with the help of the modulus operator (%), which is often harder to work with.
Fixed-point arithmetic is not as well supported in most languages: you have to perform error-prone calculations by hand using integers or find a library to help you.
Floating-point formats tend to be more consistent across architectures, unlike integers which vary in width and endianness.
Not only do floating-point types have a dynamic radix point, but the optimal position of that point is maintained automatically, saving headaches and precision loss.
Really, floating-point is an easy-to-use and widely-supported solution for representing real numbers. If floating-point works for you, think carefully before exploring alternatives.
Exception: an application where the pros make fixed-point a natural choice is currency. Lossless representation of denominations is especially important in finance. (example)
I have read that, fixed point is used when there is no Floating point unit in the processor.
That is true. Without an FPU, floating-point arithmetic is very slow!
When there is no FPU, does that mean 'float' datatype is not supported ?
No, that shouldn't necessarily follow, although implementations may vary. (IIRC, older versions of GCC had a flag to enable floating-point support.) It's entirely possible to perform floating-point operations using
the compiler, by generating equivalent ALU instructions, or
the system, by interrupting the process when an FPU instruction is encountered and deferring to a software routine.
Both of these are much slower, so using fixed-point arithmetic on such hardware may become a practical choice. It may be best to think of fixed-point as an optimisation technique, used once a performance deficit has been identified.

Emulating numerical operations in software

The numerical operations we do in our programs are limited by the number of bytes that a language specifies for a given datatype (or that the hardware supports). Say I can use an integer to do calculations on my paycheck (even "short" is more than enough for a year's earnings!!! ;) ) but can't do the same with Bill Gates' wealth. So, we go for things like long long and stuff. But aren't we still at the mercy of the number of bits that are given to us?
So, how about if I emulate numerical operations in software? Say a class that abstracts and can do numerical operations on numbers with 1000s of digits...
Of course it will be far too slow, but I am not much worried about complexity; I'm looking more at just computability...
Maybe I can use it to calculate PI to 1000-digit accuracy in a month, or a Mersenne prime in a few years, and take home $100K ;)
So now my question,
1) Are there already any such libraries out there to do this kind of stuff (in C/C++)?
2) If I go about implementing one, do you have any suggestions for me? (+, -, *, /, %, <<, >> operations should be enough, I guess)
PS:
I am a C/C++ programmer.
And this limitation has been bugging me since my school days.
Such datatypes are known as arbitrary-precision numbers. In Java, there are the classes BigDecimal and BigInteger, which handle the basic operations (+, -, *, /) at the digit level. They have no 'built-in' size limitation. They are actually not that slow and are used in a lot of real-world domains.
C/C++ don't have it built-in but there are a lot of libraries out there. See a list here:
http://en.wikipedia.org/wiki/Arbitrary-precision_arithmetic#Libraries
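For a feel of what such a class does internally, here is a minimal sketch of schoolbook addition on arrays of decimal digits; it is written in C# only for consistency with the .NET discussion earlier on this page, and the same idea carries over directly to C/C++ (real libraries use base-2^32 or base-2^64 "limbs" rather than base 10):

```csharp
using System;
using System.Linq;

static class SchoolbookAdd
{
    // Numbers are stored as arrays of decimal digits, least significant digit first.
    public static int[] Add(int[] a, int[] b)
    {
        int n = Math.Max(a.Length, b.Length);
        var result = new int[n + 1];
        int carry = 0;
        for (int i = 0; i < n; i++)
        {
            int sum = carry
                    + (i < a.Length ? a[i] : 0)
                    + (i < b.Length ? b[i] : 0);
            result[i] = sum % 10;   // digit stays in range
            carry = sum / 10;       // carry propagates to the next position
        }
        result[n] = carry;
        return result;
    }

    static void Main()
    {
        int[] x = { 9, 9, 9 };   // 999, least significant digit first
        int[] y = { 2, 1 };      // 12
        Console.WriteLine(string.Join("", Add(x, y).Reverse()));  // 1011
    }
}
```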

Why aren't Floating-Point Decimal numbers hardware accelerated like Floating-Point Binary numbers?

Is it worth it to implement it in hardware? If yes why? If not why not?
Sorry, I thought it was clear that I am talking about decimal rational numbers! OK, something like decNumber++ for C++, decimal for .NET... Hope it is clear now :)
The latest revision of the IEEE 754:2008 standard does indeed define hardware decimal floating point numbers, using the representations shown in the software referenced in the question. The previous version of the standard (IEEE 754:1985) did not provide decimal floating point numbers. Most current hardware implements the 1985 standard and not the 2008 standard, but IBM's iSeries computers using Power6 chips have such support, and so do the z10 mainframes.
The standardization effort for decimal floating point was spearheaded by Mike Cowlishaw of IBM UK, who has a web site full of useful information (including the software in the question). It is likely that in due course, other hardware manufacturers will also introduce decimal floating point units on their chips, but I have not heard a statement of direction for when (or whether) Intel might add one. Intel does have optimized software libraries for it.
The C standards committee is looking to add support for decimal floating point and that work is TR 24732.
Some IBM processors have dedicated decimal hardware included (a decimal floating point, DFP, unit).
In addition to Daniel Pryden's answer, the main reason is that DFP units need more transistors on a chip than BFP units. The reason is the BCD code used to calculate decimal numbers in a binary environment. IEEE 754-2008 has several methods to minimize this overhead. It seems that the DPD method (http://en.wikipedia.org/wiki/Densely_packed_decimal) is more effective than the BID method (http://en.wikipedia.org/wiki/Binary_Integer_Decimal).
Normally, you need 4 bits to cover the decimal range from 0 to 9. The values 10 to 15 are invalid, but still representable, in BCD.
Therefore, DPD compresses 3*4 = 12 bits into 10 bits, covering the range from 000 to 999 within the 2^10 = 1024 possibilities.
In general it can be said that BFP is faster than DFP, and BFP needs less space on a chip than DFP.
The question of why IBM implemented a DFP unit is quite simply answered:
They build servers for the finance market. If data represents money, it should be reliable.
With hardware-accelerated decimal arithmetic, some errors that occur in binary do not occur at all.
1/5 = 0.2 => 0.001100110011001100110011... in binary, so recurring fractions could be avoided.
And the ever-present round() function in Excel would no longer be needed :D
(-> try the formula =1*(0,5-0,4-0,1) wtf!)
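To make the Excel example concrete, here is a minimal C# sketch of the same expression in binary and in decimal arithmetic:

```csharp
using System;

class BinaryVsDecimal
{
    static void Main()
    {
        // Binary floating point: 0.5 is exact, but 0.4 and 0.1 are not.
        double d = 1 * (0.5 - 0.4 - 0.1);
        Console.WriteLine(d);   // something like -2.7755575615628914E-17, not 0

        // Decimal arithmetic (System.Decimal is a base-10 format): exact for these values.
        decimal m = 1 * (0.5m - 0.4m - 0.1m);
        Console.WriteLine(m);   // 0.0
    }
}
```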
Hope that explains your question a little!
There is (a tiny bit of) decimal string acceleration, but...
This is a good question. My first reaction was "macro ops have always failed to prove out", but after thinking about it, what you are talking about would go a whole lot faster if implemented in a functional unit. I guess it comes down to whether those operations are done enough to matter. There is a rather sorry history of macro op and application-specific special-purpose instructions, and in particular the older attempts at decimal financial formats are just legacy baggage now. For example, I doubt if they are used much, but every PC has the Intel BCD opcodes, which consist of
DAA, AAA, AAD, AAM, DAS, AAS
Once upon a time, decimal string instructions were common on high-end hardware. It's not clear that they ever made much of a benchmark difference. Programs spend a lot of time testing and branching and moving things and calculating addresses. It normally doesn't make sense to put macro-operations into the instruction set architecture, because overall things seem to go faster if you give the CPU the smallest number of fundamental things to do, so it can put all its resources into doing them as fast as possible.
These days, not even all the binary ops are actually in the real ISA. The CPU translates the legacy ISA into micro-ops at runtime. It's all part of going fast by specializing in core operations. For now the left-over transistors seem to be waiting for graphics and 3D work, i.e. MMX, SSE, 3DNow!
I suppose it's possible that a clean-sheet design might do something radical and unify the current (HW) scientific and (SW) decimal floating point formats, but don't hold your breath.
No, they are very memory-inefficient. And the calculations are also not easy to implement in hardware (of course it can be done, but it can also use a lot of time).
Another disadvantage of the decimal format is that it's not widely used. The format was popular for a time, until research showed that binary-formatted numbers were more accurate; now programmers know better. The decimal format isn't efficient and is more lossy. Also, additional hardware representations require additional instruction sets, which can lead to more difficult code.
Decimals (and more generally, fractions) are relatively easy to implement as a pair of integers. General purpose libraries are ubiquitous and easily fast enough for most applications.
Anyone who needs the ultimate in speed is going to hand-tune their implementation (e.g. changing the divisor to suit a particular usage, algebraically combining/reordering the operations, clever use of SIMD shuffles...). Merely encoding the most common functions into a hardware ISA would surely never satisfy them -- in all likelihood, it wouldn't help at all.
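As a sketch of the "pair of integers" idea (a C# sketch; the type and member names are illustrative, not from any particular library):

```csharp
using System;
using System.Numerics;   // for BigInteger.GreatestCommonDivisor

// A minimal exact fraction: numerator/denominator kept in lowest terms.
readonly struct Fraction
{
    public readonly long Num, Den;

    public Fraction(long num, long den)
    {
        if (den == 0) throw new DivideByZeroException();
        if (den < 0) { num = -num; den = -den; }   // keep the sign in the numerator
        long g = (long)BigInteger.GreatestCommonDivisor(num, den);
        Num = num / g; Den = den / g;
    }

    public static Fraction operator +(Fraction a, Fraction b)
        => new Fraction(a.Num * b.Den + b.Num * a.Den, a.Den * b.Den);

    public static Fraction operator *(Fraction a, Fraction b)
        => new Fraction(a.Num * b.Num, a.Den * b.Den);

    public override string ToString() => $"{Num}/{Den}";
}

class FractionDemo
{
    static void Main()
    {
        var tenth = new Fraction(1, 10);
        var third = new Fraction(1, 3);
        Console.WriteLine(tenth + third);   // 13/30, exact (no binary rounding involved)
    }
}
```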
The hardware you want used to be fairly common.
Older CPUs had hardware BCD (binary coded decimal) arithmetic. (The little Intel chips had a little support, as noted by earlier posters.)
Hardware BCD was very good at speeding up FORTRAN, which used 80-bit BCD for numbers.
Scientific computing used to make up a significant percentage of the worldwide market. Since everyone (relatively speaking) got a home PC running Windows, that market became tiny as a percentage. So nobody does it anymore.
Since you don't mind having 64bit doubles (binary floating point) for most things, it mostly works.
If you use 128bit binary floating point on modern hardware vector units it's not too bad. Still less accurate than 80bit BCD, but you get that.
At an earlier job, a colleague formerly from JPL was astonished we still used FORTRAN. "We've converted to C and C++," he told us. I asked him how they solved the problem of lack of precision. They'd not noticed. (They also don't have the same space-probe landing accuracy they used to have. But anyone can miss a planet.)
So, basically 128bit doubles in the vector unit are more okay, and widely available.
My twenty cents. Please don't represent it as a floating point number :)
The decimal floating-point standard (IEEE 754-2008) is already implemented in hardware by two companies: in IBM's POWER 6/7 based servers, and in SilMinds' SilAx PCIe-based acceleration card.
SilMinds published a case study about converting decimal arithmetic execution to use its HW solutions. A great boost in speed and slashed energy consumption are presented.
Moreover several publications by "Michael J. Schulte" and others reveal very positive benchmarks results, and some comparison between DPD and BID formats (both defined in the IEEE 754-2008 standard)
You can find pdfs to:
Performance analysis of decimal floating-point libraries and its impact on decimal hardware and software solutions
A survey of hardware designs for decimal arithmetic
Energy and Delay Improvement via Decimal Floating Point Units
Those 3 papers should be more than enough for your questions!
I speculate that there are no compute-intensive applications of decimal numbers. On the other hand, floating-point numbers are extensively used in engineering applications, which must handle enormous amounts of data and do not need exact results, just to stay within a desired precision.
The simple answer is that computers are binary machines. They don't have ten fingers, they have two. So building hardware for binary numbers is considerably faster, easier, and more efficient than building hardware for decimal numbers.
By the way: decimal and binary are number bases, while fixed-point and floating-point are mechanisms for approximating rational numbers. The two are completely orthogonal: you can have floating-point decimal numbers (.NET's System.Decimal is implemented this way) and fixed-point binary numbers (normal integers are just a special case of this).
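You can see the "floating decimal point" of System.Decimal directly with decimal.GetBits, which exposes the integer mantissa and the base-10 scale factor (a small sketch):

```csharp
using System;

class DecimalLayoutDemo
{
    static void Main()
    {
        // decimal stores a 96-bit integer plus a base-10 scale factor (the position of
        // the decimal point), so 1.0m and 1.00m have different internal representations.
        foreach (decimal d in new[] { 1.0m, 1.00m, 123.45m })
        {
            int[] bits = decimal.GetBits(d);
            int scale = (bits[3] >> 16) & 0xFF;
            Console.WriteLine($"{d} -> mantissa {bits[0]}, scale {scale}");
        }
        // Output (roughly):
        //   1.0    -> mantissa 10,    scale 1
        //   1.00   -> mantissa 100,   scale 2
        //   123.45 -> mantissa 12345, scale 2
    }
}
```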
Floating point math essentially IS an attempt to implement decimals in hardware. It's troublesome, which is why the Decimal types are created partly in software. It's a good question, why CPUs don't support more types, but I suppose it goes back to CISC vs. RISC processors -- RISC won the performance battle, so they try to keep things simple these days I guess.
Modern computers are usually general purpose. Floating point arithmetic is very general purpose, while Decimal has a far more specific purpose. I think that's part of the reason.
Do you mean the typical numeric integral types "int", "long", "short" (etc.)? Because operations on those types are definitely implemented in hardware. If you're talking about arbitrary-precision large numbers ("BigNums" and "Decimals" and such), it's probably a combination of rarity of operations using these data types and the complexity of building hardware to deal with arbitrarily large data formats.

Overhead of using bignums

I have hit upon this problem of whether to use bignums in my language as the default datatype when numbers are involved. I've evaluated this myself and reduced it to a convenience-and-comfort vs. performance question. The answer to that question depends on how large the performance hit is in programs that aren't getting optimized.
How small is the overhead of using bignums in places where a fixnum or integer would have sufficed? How small can it be in the best implementations? What kind of implementations reach the smallest overhead, and what kind of additional tradeoffs do they result in?
What kind of hit to overall language performance can I expect if I make my language default to bignums?
You can perhaps look at how Lisp does it. It will almost always do exactly the right thing and implicitly convert the types as it becomes necessary. It has fixnums ("normal" integers), bignums, ratios (reduced proper fractions represented as a pair of integers) and floats (in different sizes). Only floats have a precision error, and they are contagious, i.e. once a calculation involves a float, the result is a float, too. "Practical Common Lisp" has a good description of this behaviour.
To be honest, the best answer is "try it and see".
Clearly bignums can't be as efficient as native types, which typically fit in a single CPU register, but every application is different - if yours doesn't do a whole load of integer arithmetic then the overhead could be negligible.
Come to think of it... I don't think it will have much of a performance hit at all.
Because bignums, by nature, have a very large base, say a base of 65536 or larger, which is usually close to the maximum possible value of traditional fixnums and integers.
I don't know how large you would set the bignum's base to be, but if you set it sufficiently large so that, when it is used in place of fixnums and/or integers, it never exceeds its first bignum digit, then the operations will be nearly identical to normal fixnum/int operations.
This opens an opportunity for optimizations where for a bignum that never grows over its first bignum-digit, you could replace them with uber-fast one-bignum-digit operation.
And then switch over to n-digit algorithms when the second bignum-digit is needed.
This could be implemented with a bit flag and a validating check on all arithmetic operations. Roughly thinking, you could use the highest-order bit to signify a bignum: if a data block has its highest-order bit set to 0, then process it as if it were a normal fixnum/int, but if it is set to 1, then parse the block as a bignum structure and use the bignum algorithms from there.
That should avoid performance hits from simple loop iterator variables which I think is the first possible source of performance hits.
It's just my rough thinking though, a suggestion since you should know better than me :-)
p.s. sorry, forgot what the technical terms of bignum-digit and bignum-base were
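A rough C# sketch of the hybrid scheme described above: values stay in a machine word until an addition would overflow, then get promoted to an arbitrary-precision representation (BigInteger stands in for the n-digit code path; all names are illustrative):

```csharp
using System;
using System.Numerics;

// A number that is a plain 64-bit integer until it overflows, then a BigInteger.
readonly struct HybridInt
{
    readonly long small;
    readonly BigInteger big;
    readonly bool isBig;

    HybridInt(long s)       { small = s; big = BigInteger.Zero; isBig = false; }
    HybridInt(BigInteger b) { small = 0; big = b;               isBig = true;  }

    public static implicit operator HybridInt(long value) => new HybridInt(value);

    public static HybridInt operator +(HybridInt a, HybridInt b)
    {
        if (!a.isBig && !b.isBig)
        {
            long sum = unchecked(a.small + b.small);
            // Signed overflow check: the result's sign differs from both operands only on overflow.
            bool overflow = ((a.small ^ sum) & (b.small ^ sum)) < 0;
            if (!overflow) return new HybridInt(sum);     // fast path, no allocation
        }
        return new HybridInt(a.AsBig() + b.AsBig());      // slow path, promote to bignum
    }

    BigInteger AsBig() => isBig ? big : small;

    public override string ToString() => isBig ? big.ToString() : small.ToString();
}

class HybridDemo
{
    static void Main()
    {
        HybridInt x = long.MaxValue;
        Console.WriteLine(x + 1);   // 9223372036854775808: promoted instead of wrapping
    }
}
```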
Your reduction is correct, but the choice depends on the performance characteristics of your language, which we cannot possibly know!
Once you have your language implemented, you can measure the performance difference, and perhaps offer the programmer a directive to choose the default.
You will never know the actual performance hit until you create your own benchmark, as the results will vary per language, per language revision and per CPU. There's no language-independent way to measure this, except for the obvious fact that a 32-bit integer uses twice the memory of a 16-bit integer.
How small is the overhead of using bignums in places where a fixnum or integer would have sufficed? How small can it be in the best implementations?
The bad news is that even in the best possible software implementation, bignums are going to be slower than the built-in arithmetic by orders of magnitude (i.e. anything from a factor of 10 up to a factor of 1000).
I don't have exact numbers but I don't think exact numbers will help very much in such a situation: If you need big numbers, use them. If not, don't. If your language uses them by default (which language does? some dynamic languages do …), think whether the disadvantage of switching to another language is compensated for by the gain in performance (which it should rarely be).
(Which could roughly be translated to: there's a huge difference but it shouldn't matter. If (and only if) it matters, use another language because even with the best possible implementation, this language evidently isn't well-suited for the task.)
I totally doubt that it would be worth it, unless it is very domain-specific.
The first thing that comes to mind are all the little for loops throughout programs, are the little iterator variables all gonna be bignums? That's scary!
But if your language is rather functional... then maybe not.

When to use Fixed Point these days

For intense number-crunching I'm considering using fixed point instead of floating point. Of course it'll matter how many bytes the fixed point type is in size, on what CPU it'll be running, whether I can use (for Intel) MMX or SSE or whatever new things come up...
I'm wondering if, these days when floating point runs faster than ever, it is ever worth considering fixed point. Are there general rules of thumb where we can say it'll matter by more than a few percent? What is the overview from 35,000 feet of numerical performance? (BTW, I'm assuming a general CPU as found in most computers, not a DSP or specialized embedded system.)
It's still worth it. Floating point is faster than in the past, but fixed-point is also. And fixed is still the only way to go if you care about precision beyond that guaranteed by IEEE 754.
In situations where you are dealing with very large amounts of data, fixed point can be twice as memory efficient, e.g. a four byte long integer as opposed to an eight byte double. A technique often used in large geospatial datasets is to reduce all the data to a common origin, such that the most significant bits can be disposed of, and work with fixed point integers for the rest. Floating point is only important if the point does actually float, i.e. you are dealing with a very wide range of numbers at very high accuracy.
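A small sketch of the common-origin trick (the origin, resolution and coordinate values here are made up for illustration):

```csharp
using System;

class CommonOriginDemo
{
    // All coordinates in a tile are stored as 32-bit offsets from a shared origin,
    // at 1 cm resolution: 4 bytes per value instead of an 8-byte double.
    const double ResolutionMetres = 0.01;

    static int Encode(double metres, double originMetres)
        => checked((int)Math.Round((metres - originMetres) / ResolutionMetres));

    static double Decode(int offset, double originMetres)
        => originMetres + offset * ResolutionMetres;

    static void Main()
    {
        double origin = 4_500_000.0;    // e.g. a tile's origin in projected metres
        double x = 4_503_217.37;        // an actual coordinate

        int packed = Encode(x, origin); // 321737, fits easily in an int
        Console.WriteLine(packed);
        Console.WriteLine(Decode(packed, origin));  // approximately 4503217.37 (within 1 cm)
    }
}
```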
Another good reason to use fixed decimal is that rounding is much simpler and more predictable. Most financial software uses fixed-point arbitrary-precision decimals with half-even rounding to represent money.
It's nearly ALWAYS faster to use fixed point (experience of x86, Pentium, 68k and ARM). It can, though, also depend on the application type. For graphics programming (one of my main uses of fixed point) I've been able to optimize the code using prebuilt cosine tables, log tables etc. But the basic mathematical operations have also proven faster.
A comment on financial software. It was said in an earlier answer that fixed point is useful for financial calculations. In my own experience (development of a large treasury management system and extensive experience of credit card processing) I would NOT use fixed point. You will have rounding errors using either floating or fixed point. We always use whole amounts to represent monetary amounts, counting in the smallest unit possible (1c for euro or dollar). This ensures no partial amounts are ever lost. When doing complex calculations, values are converted to doubles, application-specific rounding rules are applied, and the results are converted back to whole numbers.
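A small sketch of the whole-amounts workflow described above, with made-up figures: amounts are held in integer cents, converted to double for the complex step, and rounded back under an explicit rule.

```csharp
using System;

class MoneyDemo
{
    static void Main()
    {
        long principalCents = 1_234_56;     // $1,234.56 held as whole cents
        double annualRate = 0.0375;

        // Complex calculation done in double...
        double interest = principalCents * annualRate / 12.0;

        // ...then an explicit rounding rule (here half-even, "banker's rounding")
        // converts the result back to whole cents, so no partial cents are ever carried.
        long interestCents = (long)Math.Round(interest, MidpointRounding.ToEven);

        Console.WriteLine(interestCents);   // 386 cents, i.e. $3.86
    }
}
```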
Use fixed-point when the hardware doesn't support floating-point or the hardware implementation sucks.
Also beware when making classes for it. Something you think would be quick could actually turn out to be a dog when it comes to profiling due to (un)necessary copies of classes. That is another question for another time however.
Another reason to use fixed point is that ARM devices, like mobile phones and tablets, lack an FPU (at least many of them do).
For developing real-time applications it makes sense to optimize functions using fixed-point arithmetic. There are implementations of FFTs (Fast Fourier Transforms), very important for graphics, whose efficiency improvements rely on avoiding floating-point arithmetic.
Since you are using a general-purpose CPU, I would suggest not using fixed point, unless performance is so critical for your application that you have to count every tick. The hassle of implementing fixed point, and dealing with issues like overflow, is just not worth it when you have a CPU that will do it for you.
IMHO, fixed point is only necessary when you are using a DSP without hardware support for floating point operations.