Emulating numerical operations in software

The numerical operations we do in our programs are limited by the number of bytes that a language specifies for a given datatype (or that the hardware supports). Say I can use an integer to do calculations on my paycheck (even a "short" is more than enough for a year's earnings! ;) ) but can't do the same with Bill Gates's wealth. So we go for things like long long and such. But aren't we still at the mercy of the number of bits that are given to us?
So, what if I emulate numerical operations in software? Say, a class that abstracts the representation and can do numerical operations on numbers with thousands of digits...
Of course it will be far too slow, but I am not much worried about complexity; I'm looking more at just computability...
Maybe I can use it to calculate pi to 1000-digit accuracy in a month, or find a Mersenne prime in a few years and take home $100K ;)
So now my questions:
1) Are there already any libraries out there (in C/C++) to do this kind of stuff?
2) If I go about implementing one, do you have any suggestions for me? (The +, -, *, /, %, <<, >> operations should be enough, I guess.)
PS:
I am C/C++ programmer.
And this limitation has been bugging me since my school days.

Such datatypes are known as arbitrary-precision numbers. In Java, there are the classes BigDecimal and BigInteger, which handle the basic operations (+, -, *, /) at the digit level. They have no built-in size limitation. They are actually not that slow, and they are used in a lot of real-world domains.
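To give a feel for what "digit level" means, here is a minimal C++ sketch: numbers as arrays of decimal digits, multiplied schoolbook-style. Real libraries use a much larger base and faster algorithms (Karatsuba, FFT); this is just the shape of the idea.

    #include <vector>

    // Digits stored least-significant first, one decimal digit per element.
    // Schoolbook O(n*m) multiplication with a final carry-normalization pass.
    std::vector<int> multiply(const std::vector<int>& a,
                              const std::vector<int>& b) {
        std::vector<long long> acc(a.size() + b.size(), 0);
        for (std::size_t i = 0; i < a.size(); ++i)
            for (std::size_t j = 0; j < b.size(); ++j)
                acc[i + j] += static_cast<long long>(a[i]) * b[j];
        std::vector<int> result(acc.size());
        long long carry = 0;
        for (std::size_t k = 0; k < acc.size(); ++k) {
            long long cur = acc[k] + carry;          // normalize one position
            result[k] = static_cast<int>(cur % 10);
            carry = cur / 10;
        }
        while (result.size() > 1 && result.back() == 0)
            result.pop_back();                       // trim leading zeros
        return result;
    }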
C/C++ don't have this built in, but there are a lot of libraries out there. See the list here:
http://en.wikipedia.org/wiki/Arbitrary-precision_arithmetic#Libraries
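For instance, GMP (one of the libraries on that list) has a C++ interface that overloads the usual operators, so arbitrary-precision code reads like ordinary integer code. A minimal sketch (link with -lgmpxx -lgmp):

    #include <gmpxx.h>
    #include <iostream>

    int main() {
        mpz_class a("123456789012345678901234567890");
        mpz_class b = a * a + 42;        // operators work like built-in ints
        std::cout << b << std::endl;     // prints the full result, no overflow
        return 0;
    }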

Related

What makes non-linear functions computationally expensive in hardware (e.g. FPGA)?

I've read some articles stating that non-linear functions (like exponentials) are computationally expensive.
I was wondering what makes them computationally expensive.
When people say 'computationally expensive', do they mean in terms of time taken or hardware resources used?
I've tried searching on Google, but I couldn't find any simple explanation of this.
I'm not pretending to offer the definitive answer, but start with what you have in an FPGA.
Normally you're limited to adders, multipliers, and some memory. What can you do with those?
A linear function is easy, taking just one multiplier and one adder.
Non-linear functions - what are those? They are either polynomial, requiring you to spend a ton of multipliers (the higher the polynomial's degree, the more multipliers), or even transcendental, requiring you to find some satisfactory approximation and compute it in many steps.
Even simple integer division can't be done in one clock cycle; simple implementations require as many steps as there are bits in the numbers being divided.
The other possible solution is to use a lookup table, and that's great for a small range of arguments. But if you want the function's values over a wide range of arguments, or with greater precision, you'll end up with a lookup table so large that it can't fit in the device you have to work with.
So those are the main costs: you'll spend lots of dedicated hardware resources (multipliers, memory for lookup tables), or spend lots of time in multi-step approximation algorithms, or in algorithms that refine the result one "digit" per iteration (integer division, CORDIC, etc.).
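To make the "one digit per iteration" idea concrete, here is a double-precision C++ model of CORDIC in rotation mode (the function name and layout are mine, not from any particular FPGA toolkit). In hardware the multiplications by 2^-i are plain bit shifts and the atan(2^-i) values come from a small constant table; each iteration buys roughly one bit of accuracy:

    #include <cmath>

    void cordicSinCos(double angle, int iterations, double* s, double* c) {
        // Total gain of the micro-rotations; pre-scaling x undoes it.
        double gain = 1.0;
        for (int i = 0; i < iterations; ++i)
            gain *= std::sqrt(1.0 + std::ldexp(1.0, -2 * i));
        double x = 1.0 / gain, y = 0.0, z = angle;
        for (int i = 0; i < iterations; ++i) {
            double d = (z >= 0.0) ? 1.0 : -1.0;             // drive z to 0
            double xNew = x - d * y * std::ldexp(1.0, -i);  // a shift in HW
            y = y + d * x * std::ldexp(1.0, -i);
            z = z - d * std::atan(std::ldexp(1.0, -i));     // table lookup in HW
            x = xNew;
        }
        *c = x;  // ~cos(angle)
        *s = y;  // ~sin(angle); valid for |angle| <= ~1.74 rad
    }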

Is it possible to use one variable to represent numbers of unlimited length in programming? [duplicate]

This question already has answers here:
Most efficient implementation of a large number class
(5 answers)
I've been using C# for three years to make games, and I've played with various simulations where numbers sometimes get so big that Int32 is not enough to store the value. Eventually even Int64 became insufficient for my experiments; it took several such fields (actually an array of variable length) and a special property to handle such big numbers correctly. And so I wondered: is there a way to declare a numeric variable with unlimited (unknown beforehand) length so I can relax and let the computer do the math?
We can write any kind of number we like on paper without needing any special kind of paper. We can also type a lot of words in a text file without needing special file-system alterations to make it save and load correctly. Isn't there a way, in any programming language, to declare a who-knows-how-long-it-will-be number?
Starting with .NET 4, the .NET framework contains a BigInteger structure, which can handle integers of arbitrary size.
Since your question is language-agnostic, it might be worth mentioning that internally BigInteger stores the value in an array of unsigned integers; see the following SO question for details:
How does the BigInteger store values internally?
BigInteger is immutable, so there is no need to "resize" the array. Arithmetic operations create new instances of BigInteger, with appropriately sized arrays.
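As an illustration of that layout (a sketch, not the actual .NET source), here is addition over little-endian base-2^32 limbs in C++, returning a new, appropriately sized array each time:

    #include <cstdint>
    #include <vector>

    // Value stored least-significant limb first in base 2^32, like the
    // uint array inside BigInteger. Addition allocates a fresh array,
    // mirroring the immutable style described above.
    std::vector<uint32_t> add(const std::vector<uint32_t>& a,
                              const std::vector<uint32_t>& b) {
        std::vector<uint32_t> sum;
        uint64_t carry = 0;
        for (std::size_t i = 0; i < a.size() || i < b.size() || carry; ++i) {
            uint64_t d = carry;
            if (i < a.size()) d += a[i];
            if (i < b.size()) d += b[i];
            sum.push_back(static_cast<uint32_t>(d)); // low 32 bits
            carry = d >> 32;                         // carry into next limb
        }
        return sum;
    }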
Most modern dynamic languages, such as Perl 6, Tcl 8, and Ruby, go one step further by allowing you to store unlimited-size numbers (up to available RAM) in their number types.
Most of these languages don't have separate integer and floating-point types, but rather a single "number" type that automatically gets converted to whatever it needs to be to be stored in RAM. Some, like Perl 6, even include complex numbers in their "number" type.
How it's implemented at the machine level is that, by default, numbers are assumed to be integers - so int32 or int64. If need be, numbers are converted to floats or doubles when the result of a calculation or assignment isn't an integer. If an integer grows too large, the interpreter/runtime environment silently converts it to a bigint object/struct (which is simply a big, growable array or linked list of ints).
How it appears to the programmer is that numbers have unlimited size (again, up to available RAM).
Still, there are gotchas with this system (kind of like the 0.1 + 0.2 != 0.3 issue with floats), so you'd still need to be aware of the underlying implementation, even if you can ignore it 99.99% of the time.
For example, if at any point your very large number gets converted to a floating-point number (most likely a double in hardware), you'll lose precision, because that's just how floating-point numbers work. Sometimes you can do it accidentally: in some languages, for example, the power function (like pow() in C) returns a floating-point result, so raising an integer to the power of another integer may truncate the result if it's too large.
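A quick C++ illustration of the pow() gotcha, assuming typical IEEE 754 doubles: 10^23 needs more significand bits than a double has, so the result comes back slightly off:

    #include <cmath>
    #include <cstdio>

    int main() {
        // 10^23 needs 77 significand bits; a double has only 53.
        std::printf("%.0f\n", std::pow(10.0, 23.0));
        // Typically prints 99999999999999991611392, not 10^23 exactly.
        return 0;
    }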
For the most part, it works. And I personally feel that this is the sane way of dealing with numbers. Lots of language designers have apparently come up with this solution independently.
Is it possible to [...] represent numbers of unlimited length [...]?
No.
On existing computers it is not possible to represent unlimited numbers because the machines are finite. Even when using all existing storage it is not possible to store unlimited numbers.
It is possible, though, to store very large numbers. Wikipedia has information on the concept of arbitrary precision integers.
"Unlimited" - no, as Nikolai Ruhe soundly pointed out. "Unknown" - yes, qualified by the first point. :}
A BigInteger type is available in .NET 4.0 and in Java as others point out.
For .NET 2.0+, take a look at IntX.
More generally, languages (or at least a de facto library used with them) tend to have some support for arbitrarily long integers, which provides a means of dealing with the "unknown" you describe.
A discussion on the Ubuntu forums addresses this question somewhat more generally and touches on specifics in more languages, some of which provide simpler means of leveraging arbitrarily large integers (e.g. Python and Common Lisp). Personally, the "relax and let the computer do the math" factor was highest for me in Common Lisp years ago, so it may pay to look around broadly for perspective, as you seem inclined to do.

Difference between Gene Expression Programming and Cartesian Genetic Programming

Something pretty annoying in evolutionary computing is that mildly different and overlapping concepts tend to be given dramatically different names. My latest confusion of this kind is that gene expression programming (GEP) seems very similar to cartesian genetic programming (CGP).
(How) are these fundamentally different concepts?
I've read that indirect encoding of GP instructions is an effective technique (both GEP and CGP do that). Has some sort of consensus been reached that indirect encoding has made classic tree-based GP obsolete?
Well, it seems that there is some difference between gene expression programming (GEP) and cartesian genetic programming (CGP, or what I view as classic genetic programming), but the difference might be more hyped up than it really ought to be. Please note that I have never used GEP, so all of my comments are based on my experience with CGP.
In CGP there is no distinction between genotype and phenotype; in other words, if you're looking at the "genes" of a CGP, you're also looking at their expression. There is no encoding here, i.e. the expression tree is the gene itself.
In GEP the genotype is expressed into a phenotype, so if you're looking at the genes you will not readily know what the expression is going to look like. The "inventor" of GEP, Cândida Ferreira, has written a really good paper, and there are some other resources that try to give a shorter overview of the whole concept.
Ferreira says that the benefits are "obvious," but I really don't see anything that would necessarily make GEP better than CGP. Apparently GEP is multigenic, which means that multiple genes are involved in the expression of a trait (i.e. an expression tree). In any case, the fitness is calculated on the expressed tree, so it doesn't seem like GEP is doing anything to increase the fitness. What the author claims is that GEP reaches a given fitness faster (i.e. in fewer generations), but frankly speaking you can see dramatic performance shifts in a CGP just by using a different selection algorithm, a different tournament structure, splitting the population into tribes, migrating specimens between tribes, including diversity in the fitness, etc.
Selection:
random
roulette wheel
top-n
take half
etc.
Tournament Frequency:
once per epoch
once per every data instance
once per generation.
Tournament Structure:
Take 3, kill 1 and replace it with the child of the other two.
Sort all individuals in the tournament by fitness, kill the lower half and replace it with the offspring of the upper half (where lower is worse fitness and upper is better fitness).
Randomly pick individuals from the tournament to mate and kill the excess individuals.
Tribes
A population can be split into tribes that evolve independently of each other:
Migration - periodically, individual(s) from one tribe are moved to another tribe
The tribes are logically separated so that they're like their own separate populations running in separate environments.
Diversity Fitness
Incorporate diversity into the fitness: count how many individuals have the same fitness value (and are thus likely to have the same phenotype), and penalize their fitness by a proportionate amount: the more individuals with the same fitness value, the greater the penalty for those individuals. This way specimens with unique phenotypes are encouraged, so there will be much less stagnation of the population.
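As a C++ sketch of that last idea (the function name and penalty scheme are my own, assuming a higher fitness value is better):

    #include <unordered_map>
    #include <vector>

    // Penalize individuals that share a fitness value (a cheap proxy for
    // sharing a phenotype); the penalty grows with the duplicate count.
    std::vector<double> diversityAdjusted(const std::vector<double>& fitness,
                                          double penaltyPerDuplicate) {
        std::unordered_map<double, int> count;
        for (double f : fitness) ++count[f];
        std::vector<double> adjusted;
        adjusted.reserve(fitness.size());
        for (double f : fitness)
            adjusted.push_back(f - penaltyPerDuplicate * (count[f] - 1));
        return adjusted;
    }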
Those are just some of the things that can greatly affect the performance of a CGP, and when I say greatly I mean on the same order as, or greater than, Ferreira's reported gains. So if Ferreira didn't tinker with those ideas much, she could have seen much slower performance from her CGPs... especially if she didn't do anything to combat stagnation. So I would be careful when reading performance statistics on GEP, because sometimes people fail to account for all of the "optimizations" available out there.
There seems to be some confusion in these answers that must be clarified. Cartesian GP is different from classic GP (aka tree-based GP) and from GEP. Even though they share many concepts and take inspiration from the same biological mechanisms, the representation of the individuals (the solutions) varies.
In CGP the representation (mapping between genotype and phenotype) is indirect; in other words, not all of the genes in a CGP genome will be expressed in the phenome (a concept also found in GEP and many others). The genotypes can be encoded in a grid or array of nodes, and the resulting program graph is the expression of the active nodes only.
In GEP the representation is also indirect, and similarly not all genes will be expressed in the phenotype. The representation in this case is very different from tree GP or CGP, but the genotypes are still expressed into a program tree. In my opinion GEP is a more elegant representation and easier to implement, but it also suffers from some defects: you have to find the appropriate tail and head sizes, which is problem-specific; the multigenic version is a bit of a forced glue between expression trees; and it suffers from quite a lot of bloat.
Independently of which representation may be better than the other in some specific problem domain, both are general purpose and can be applied to any domain as long as you can encode it.
In general, GEP is simpler than GP. Let's say you allow the following nodes in your program: constants, variables, +, -, *, /, if, ...
For each such node, with GP you must implement the following operations:
- randomize
- mutate
- crossover
- and probably other genetic operators as well
In GEP, for each such node only one operation needs to be implemented: deserialize, which takes an array of numbers (like doubles in C or Java) and returns a node. It resembles object deserialization in languages like Java or Python (the difference being that deserialization in programming languages uses byte arrays, whereas here we have arrays of numbers). Even this 'deserialize' operation doesn't have to be written by the programmer: it can be provided by a generic algorithm, just as it is in Java or Python deserialization.
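A hypothetical C++ sketch of such a generic deserialize, with an invented node set and gene layout, just to show the shape of the operation:

    #include <cmath>
    #include <vector>

    // Invented node set: 0 -> +, 1 -> -, 2 -> *, 3 -> /, 4 -> variable,
    // 5 -> constant. Two genes per node: a kind selector and a payload.
    struct Node {
        int kind;       // index into the node set above
        double payload; // constant value or variable index, depending on kind
    };

    std::vector<Node> deserialize(const std::vector<double>& genes) {
        std::vector<Node> nodes;
        for (std::size_t i = 0; i + 1 < genes.size(); i += 2) {
            Node n;
            n.kind = static_cast<int>(std::fabs(genes[i])) % 6;
            n.payload = genes[i + 1];
            nodes.push_back(n);
        }
        return nodes;
    }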
This simplicity may, from one point of view, make the search for the best solution less successful; but from the other side, it requires less work from the programmer, and simpler algorithms may execute faster (they are easier to optimize, and more code and data fit in the CPU cache, and so on). So I would say that GEP is slightly better, but of course the definitive answer depends on the problem, and for many problems the opposite may be true.

Why aren't Floating-Point Decimal numbers hardware accelerated like Floating-Point Binary numbers?

Is it worth implementing in hardware? If yes, why? If not, why not?
Sorry, I thought it was clear that I am talking about decimal rational numbers! OK, something like decNumber++ for C++, decimal for .NET... Hope it is clear now :)
The latest revision of the standard (IEEE 754-2008) does indeed define decimal floating-point numbers, using the representations shown in the software referenced in the question. The previous version of the standard (IEEE 754-1985) did not provide decimal floating-point numbers. Most current hardware implements the 1985 standard and not the 2008 standard, but IBM's iSeries computers using POWER6 chips have such support, and so do the z10 mainframes.
The standardization effort for decimal floating point was spearheaded by Mike Cowlishaw of IBM UK, who has a web site full of useful information (including the software in the question). It is likely that in due course, other hardware manufacturers will also introduce decimal floating point units on their chips, but I have not heard a statement of direction for when (or whether) Intel might add one. Intel does have optimized software libraries for it.
The C standards committee is looking to add support for decimal floating point; that work is TR 24732.
Some IBM processors include dedicated decimal hardware (a decimal floating-point (DFP) unit).
Adding to Daniel Pryden's answer: the main reason is that DFP units need more transistors on a chip than BFP units do. The reason is the BCD code used to calculate decimal numbers in a binary environment. IEEE 754-2008 has several methods to minimize that overhead. It seems that the DPD method (http://en.wikipedia.org/wiki/Densely_packed_decimal) is more effective than the BID method (http://en.wikipedia.org/wiki/Binary_Integer_Decimal).
Normally, you need 4 bits to cover the decimal range 0 to 9. The codes for 10 to 15 are invalid but still possible in BCD.
Therefore, DPD compresses 3 x 4 = 12 bits into 10 bits, covering the range 000 to 999 with 1024 (2^10) possibilities.
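To make the waste concrete, here is plain (uncompressed) BCD in C++: each digit occupies a nibble, and six of the sixteen nibble codes are never produced, which is exactly the redundancy DPD squeezes out:

    #include <cstdint>

    // One decimal digit per 4-bit nibble; codes 10-15 go unused.
    uint8_t bcdPack(int hi, int lo) {
        return static_cast<uint8_t>((hi << 4) | lo);
    }
    int bcdHi(uint8_t b) { return b >> 4; }
    int bcdLo(uint8_t b) { return b & 0x0F; }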
In general, BFP is faster than DFP, and BFP needs less space on a chip than DFP.
The question of why IBM implemented a DFP unit has a simple answer: they build servers for the financial market. If data represents money, it should be reliable.
With hardware-accelerated decimal arithmetic, some errors that occur in binary simply do not occur.
1/5 = 0.2 => 0.001100110011001100110011... recurring in binary, so with decimal arithmetic such recurring fractions are avoided.
And the ubiquitous ROUND() function in Excel would no longer be needed :D
(-> the formula =1*(0.5-0.4-0.1)... WTF!)
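That formula is easy to reproduce in C++ with ordinary binary doubles:

    #include <cstdio>

    int main() {
        // Exactly zero in decimal arithmetic, but 0.4 and 0.1 have no
        // exact binary representation, so the difference doesn't cancel.
        std::printf("%.17g\n", 0.5 - 0.4 - 0.1);
        // Typically prints about -2.7755575615628914e-17, not 0.
        return 0;
    }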
Hope that explains your question a little!
There is (a tiny bit of) decimal string acceleration, but...
This is a good question. My first reaction was "macro ops have always failed to prove out", but after thinking about it, what you are talking about would go a whole lot faster if implemented in a functional unit. I guess it comes down to whether those operations are done enough to matter. There is a rather sorry history of macro op and application-specific special-purpose instructions, and in particular the older attempts at decimal financial formats are just legacy baggage now. For example, I doubt if they are used much, but every PC has the Intel BCD opcodes, which consist of
DAA, AAA, AAD, AAM, DAS, AAS
Once upon a time, decimal string instructions were common on high-end hardware. It's not clear that they ever made much of a benchmark difference. Programs spend a lot of time testing and branching and moving things and calculating addresses. It normally doesn't make sense to put macro-operations into the instruction set architecture, because overall things seem to go faster if you give the CPU the smallest number of fundamental things to do, so it can put all its resources into doing them as fast as possible.
These days, not even all the binary ops are actually in the real ISA. The CPU translates the legacy ISA into micro-ops at runtime. It's all part of going fast by specializing in core operations. For now the leftover transistors seem to be going to graphics and 3D work, i.e., MMX, SSE, 3DNow!
I suppose it's possible that a clean-sheet design might do something radical and unify the current (HW) scientific and (SW) decimal floating point formats, but don't hold your breath.
No, they are very memory-inefficient, and the calculations are not easy to implement in hardware either (of course it can be done, but it can also use a lot of time).
Another disadvantage of the decimal format is that it's not widely used. Before research showed that binary-formatted numbers were more accurate, the format was popular for a time, but now programmers know better. The decimal format is less efficient and more lossy, and additional hardware representations require additional instruction sets, which can lead to more difficult code.
Decimals (and more generally, fractions) are relatively easy to implement as a pair of integers. General-purpose libraries are ubiquitous and easily fast enough for most applications.
Anyone who needs the ultimate in speed is going to hand-tune their implementation (e.g. changing the divisor to suit a particular usage, algebraically combining/reordering the operations, clever use of SIMD shuffles...). Merely encoding the most common functions into a hardware ISA would surely never satisfy them; in all likelihood, it wouldn't help at all.
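A minimal C++ sketch of such a pair-of-integers fraction (names are mine; no overflow handling, and the denominator is assumed nonzero):

    #include <cstdlib>
    #include <numeric>

    // A fraction kept in lowest terms, with the sign carried by num.
    struct Rational {
        long long num, den;
        Rational(long long n, long long d) {
            if (d < 0) { n = -n; d = -d; }
            long long g = std::gcd(std::llabs(n), d);
            num = n / g;
            den = d / g;
        }
    };

    Rational operator+(Rational a, Rational b) {
        return Rational(a.num * b.den + b.num * a.den, a.den * b.den);
    }

    Rational operator*(Rational a, Rational b) {
        return Rational(a.num * b.num, a.den * b.den);
    }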
The hardware you want used to be fairly common.
Older CPUs had hardware BCD (binary-coded decimal) arithmetic. (The little Intel chips had a little support, as noted by earlier posters.)
Hardware BCD was very good at speeding up FORTRAN, which used 80-bit BCD for numbers.
Scientific computing used to make up a significant percentage of the worldwide market. Since everyone (relatively speaking) got a home PC running Windows, that market became a tiny percentage, so nobody does it anymore.
Since you don't mind having 64-bit doubles (binary floating point) for most things, it mostly works.
If you use 128-bit binary floating point on modern hardware vector units, it's not too bad. Still less accurate than 80-bit BCD, but you get that.
At an earlier job, a colleague formerly from JPL was astonished that we still used FORTRAN. "We've converted to C and C++," he told us. I asked him how they had solved the problem of the lack of precision. They hadn't noticed. (They also don't have the same space-probe landing accuracy they used to have. But anyone can miss a planet.)
So, basically, 128-bit doubles in the vector unit are mostly okay, and widely available.
My twenty cents. Please don't represent it as a floating point number :)
The decimal floating-point standard (IEEE 754-2008) is already implemented in hardware by two companies: IBM, in its POWER6/7-based servers, and SilMinds, with its SilAx PCIe-based acceleration card.
SilMinds published a case study about converting decimal arithmetic execution to use its hardware solutions. A great boost in speed and slashed energy consumption are reported.
Moreover, several publications by Michael J. Schulte and others reveal very positive benchmark results, and some comparisons between the DPD and BID formats (both defined in the IEEE 754-2008 standard).
You can find PDFs of:
Performance analysis of decimal floating-point libraries and its impact on decimal hardware and software solutions
A survey of hardware designs for decimal arithmetic
Energy and Delay Improvement via Decimal Floating Point Units
Those 3 papers should be more than enough for your questions!
I speculate that there are no compute-intensive applications of decimal numbers. On the other hand, floating-point numbers are used extensively in engineering applications, which must handle enormous amounts of data and do not need exact results; they just need to stay within a desired precision.
The simple answer is that computers are binary machines. They don't have ten fingers, they have two. So building hardware for binary numbers is considerably faster, easier, and more efficient than building hardware for decimal numbers.
By the way: decimal and binary are number bases, while fixed-point and floating-point are mechanisms for approximating rational numbers. The two are completely orthogonal: you can have floating-point decimal numbers (.NET's System.Decimal is implemented this way) and fixed-point binary numbers (normal integers are just a special case of this).
Floating-point math essentially IS an attempt to implement decimals in hardware. It's troublesome, which is why Decimal types are implemented partly in software. It's a good question why CPUs don't support more types, but I suppose it goes back to CISC vs. RISC processors: RISC won the performance battle, so they try to keep things simple these days, I guess.
Modern computers are usually general purpose. Floating point arithmetic is very general purpose, while Decimal has a far more specific purpose. I think that's part of the reason.
Do you mean the typical numeric integral types "int", "long", "short" (etc.)? Because operations on those types are definitely implemented in hardware. If you're talking about arbitrary-precision large numbers ("BigNums" and "Decimals" and such), it's probably a combination of rarity of operations using these data types and the complexity of building hardware to deal with arbitrarily large data formats.

When to use Fixed Point these days

For intense number crunching I'm considering using fixed point instead of floating point. Of course it'll matter how many bytes the fixed-point type is, what CPU it'll be running on, and whether I can use (for Intel) MMX or SSE or whatever comes along next...
I'm wondering whether, these days, when floating point runs faster than ever, it is ever worth considering fixed point. Are there general rules of thumb where we can say it'll matter by more than a few percent? What is the view of numerical performance from 35,000 feet? (BTW, I'm assuming a general CPU as found in most computers, not a DSP or a specialized embedded system.)
It's still worth it. Floating point is faster than in the past, but so is fixed point. And fixed point is still the only way to go if you care about precision beyond that guaranteed by IEEE 754.
In situations where you are dealing with very large amounts of data, fixed point can be twice as memory-efficient, e.g. a four-byte integer as opposed to an eight-byte double. A technique often used with large geospatial datasets is to reduce all the data to a common origin, such that the most significant bits can be disposed of, and to work with fixed-point integers for the rest. Floating point is only important if the point actually floats, i.e. you are dealing with a very wide range of numbers at very high accuracy.
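A C++ sketch of that origin-reduction trick (the millimeter resolution and the names are my own assumptions):

    #include <cstdint>
    #include <vector>

    // Coordinates stored as millimeter offsets from a per-dataset origin:
    // 4-byte integers instead of 8-byte doubles, halving the memory.
    struct PointStore {
        double originX, originY;           // stored once for the dataset
        std::vector<int32_t> dxMm, dyMm;   // fixed-point offsets

        void add(double x, double y) {
            dxMm.push_back(static_cast<int32_t>((x - originX) * 1000.0));
            dyMm.push_back(static_cast<int32_t>((y - originY) * 1000.0));
        }
        double x(std::size_t i) const { return originX + dxMm[i] / 1000.0; }
        double y(std::size_t i) const { return originY + dyMm[i] / 1000.0; }
    };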
Another good reason to use fixed decimal is that rounding is much simpler and more predictable. Most financial software uses fixed-point arbitrary-precision decimals with half-even rounding to represent money.
It's nearly ALWAYS faster to use fixed point (my experience: x86, Pentium, 68k, and ARM). It can, though, also depend on the application type. For graphics programming (one of my main uses of fixed point) I've been able to optimize code using prebuilt cosine tables, log tables, etc. But the basic mathematical operations have also proven faster.
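For illustration, a minimal Q16.16 sketch in C++ (the names are mine); addition and subtraction are just the ordinary integer operators:

    #include <cstdint>

    typedef int32_t fix16;  // 16 integer bits, 16 fractional bits

    inline fix16 fixFromDouble(double x) {
        return static_cast<fix16>(x * 65536.0);
    }
    inline double fixToDouble(fix16 x) { return x / 65536.0; }

    // Widen to 64 bits so the intermediate product/quotient can't overflow.
    inline fix16 fixMul(fix16 a, fix16 b) {
        return static_cast<fix16>((static_cast<int64_t>(a) * b) >> 16);
    }
    inline fix16 fixDiv(fix16 a, fix16 b) {
        return static_cast<fix16>((static_cast<int64_t>(a) << 16) / b);
    }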
A comment on financial software: it was said in an earlier answer that fixed point is useful for financial calculations. From my own experience (development of a large treasury-management system and extensive credit-card processing work), I would NOT use fixed point. You will have rounding errors using either floating or fixed point. We always use whole amounts to represent monetary amounts, counting in the minimum unit possible (1 cent for euro or dollar). This ensures that no partial amounts are ever lost. When doing complex calculations, values are converted to doubles, application-specific rounding rules are applied, and the results are converted back to whole numbers.
Use fixed point when the hardware doesn't support floating point, or when the hardware implementation sucks.
Also beware when writing classes for it. Something you think would be quick could actually turn out to be a dog when profiled, due to (un)necessary copies of objects. That is another question for another time, however.
Another reason to use fixed point is that ARM devices, like mobile phones and tablets, often lack an FPU (at least many of them do).
For developing real-time applications it makes sense to optimize functions using fixed-point arithmetic. There are implementations of the FFT (Fast Fourier Transform), very important for graphics, whose efficiency gains rely on fixed-point arithmetic.
Since you are using a general-purpose CPU, I would suggest not using fixed point, unless performance is so critical for your application that you have to count every tick. The hassle of implementing fixed point, and dealing with issues like overflow, is just not worth it when you have a CPU that will do it for you.
IMHO, fixed point is only necessary when you are using a DSP without hardware support for floating point operations.