Why isn't MIPS BLT instruction implemented in hardware? - branch

I'd like to ask why the BLT instruction is not a part of MIPS ISA. Instead they actually make it a pseudo-instruction for assembly programmers. I just couldn't recognize the difference between BLT and BLTZ (which is a part of MIPS ISA) from hardware implementation point of view.
By the way the book "Computer Organization and Design" says :
Heeding von Neumann's warning about the simplicity of the "equipment" the MIPS architecture doesn't include branch on less than because it's too complicated either it would stretch the clock cycle time or it would take extra clock cycles per instruction, the two faster instructions are more useful.
But I still have no idea why those might happen.

It's much easier to compare with zero
Because less than zero is really just a sign bit check while less than is
a subtract and then jump based on the sign bit of the result, or
a lexicographic comparison of the binary representations of the input values
Either way is far more complex than a single bit check, therefore BLTZ is much faster. Similarly, >= 0 also needs only 1 sign bit check. <= 0 and > 0 need another zero check but it's rather trivial.
In fact, it's not common that an architecture supports jump-and-compare between two values directly in a single instruction, although most will have jumps based on the value in relation to 0. Even CISC architectures like x86 require the user to compare and then jump

Related

How to decide when to use fixed point arithmetic over float?

How to decide when to use fixed point arithmetic over float?
I have read that, fixed point is used when there is no Floating point unit in the processor. When there is no FPU, does that mean 'float' datatype is not supported ?
If you have no FPU fixed point will be faster and more deterministic
If you have no FPU fixed point will in many cases be smaller - certainly for simple arithmetic.
If you need your code to generate bit-identical results across different platforms or toolchains with or without an FPU, then fixed point is necessary.
If you need to do complex math requiring trig or log functions for example, floating point is the path of least resistance, but by no means the only option - but you need to develop or find a library (see links below) - and there are plenty of ways of doing that badly.
If you need wide dynamic range floating point is simpler. For example the square root of a number less than one is a smaller number - with fixed point you can run out of bits and end up with zero, with floating point, the point is simply moved to increase the resolution at the expense of range.
If you have an FPU and are using an RTOS and don't want the overhead of stacking FPU registers on a context switch (or if it is not supported), fixed-point avoids the need, and avoids errors if you forget to enable the option for every task that needs it.
Generally if your operation is trivial, use fixed point or at least an integer representation by selecting your units appropriately. For example storing voltage values in integer millivolts (or even in ADC quanta) rather then in volts can avoid unnecessary floating point.
If you are doing complex maths and have an FPU, floating point is the simpler, least error prone solution. Even without and FPU, if your solution meets timing and code size constraints, floating point may still be simpler, but may restrict your ability to use the same code in more constraint execution environments. So is reuse across a wide range of platforms is required fixed-point may be preferable.
Whatever you do, avoid "decimal fixed point" in most cases, use where possible use a binary fixed point representation (Q representation), where for example 10Q6 has 10 integer bits and 6 fractional bits. The reason for this is that rescaling following multiply/divide are then shift operations rather than potentially expensive multiply/divide operations and you loose no precision in the rescaling.
Some useful references:
https://www.drdobbs.com/cpp/optimizing-math-intensive-applications-w/207000448
http://jet.ro/files/The_neglected_art_of_Fixed_Point_arithmetic_20060913.pdf
If there is no FPU available, compilers can often emulate floating point arithmetics. But this is inefficient, as it takes a lot of cycles.
If you are resource constraint (which you often are in environments without a FPU), you can then opt for fixed point arithmetic, which uses regular integer operations.
Just rambling: When I was using FP, I missed support from the compiler (C/C++) to be able to mark variables to be fixed point (with some specific number of fractional bits).
If you have a standard compliant compiler, then float and double are always available and work correctly. If there isn't an FPU then the calculations are done in software (called soft-FPU or FPU emulation). This is slower and uses more memory.
When to use fixed point is mainly a matter of opinion, but when NOT to use it is when a variable has a large dynamic range, ie: when a number could be very large but you still need it to be accurate if it is very small.
Eg: Displaying the speed of a car. I need to know the difference between 68 and 70 mph, but I don't care about the difference between 0.68 mph and 0.70 mph. This has low dynamic range (that I care about) so I could use fixed-point if other reasons suggested that I might want to. Alternatively, measuring the radio-activity of a sample: I care about the difference between 100 and 90 counts per second and I still care about the difference between 1 and 0.9 counts per second. This high dynamic range means that fixed point would not be suitable.
How to decide when to use fixed point arithmetic over float?
It depends on many factors which may or may not affect you, including...
Pros:
Fixed-point requires less circuitry so may be more practical on smaller, simpler devices.
Fixed-point uses less energy so may be more practical
on battery-powered devices,
in applications where intensive computation incurs a significant energy bill, or
where heat dissipation is a problem.
Fixed-point is really just integer arithmetic so operations are lossless.
Fixed-point allows you to precisely express real numbers in the number base of your choice, e.g. the values 1/10 or 1/3.
Floating-point arithmetic can exhibit inconsistent behavior related to things like
global rounding modes,
optimisation,
associativity,
implementation-defined behavior, and
variations in FPU hardware.
Cons:
While lossless, fixed-point arithmetic is prone to out-of-range errors, such as overflow. (Libraries and UB sanitizers can help avoid/detect errors.)
Lossless division is achieved with the help of the modulus operator (%), which is often harder to work with.
Fixed-point arithmetic is not as well supported in most languages: you have to perform error-prone calculations by hand using integers or find a library to help you.
Floating-point formats tend to be more consistent across architectures, unlike integers which vary in width and endianness.
Not only do floating-point types have a dynamic radix point, but the optimal position of that point is maintained automatically, saving headaches and precision loss.
Really, floating-point is an easy-to-use and widely-supported solution for representing real numbers. If floating-point works for you, think carefully before exploring alternatives.
Exception: an application where the pros make fixed-point a natural choice is currency. Lossless representation of denominations is especially important in finance. (example)
I have read that, fixed point is used when there is no Floating point unit in the processor.
That is true. Without an FPU, floating-point arithmetic is very slow!
When there is no FPU, does that mean 'float' datatype is not supported ?
No, that shouldn't necessarily follow, although implementations may vary. (IIRC, older versions of GCC had a flag to enable floating-point support.) It's entirely possible to perform floating-point operations using
the compiler, by generating equivalent ALU instructions, or
the system, by interrupting the process when an FPU instruction is encountered and deferring to a software routine.
Both of these are much slower, so using fixed-point arithmetic on such hardware may become a practical choice. It may be best to think of fixed-point as an optimisation technique, used once a performance deficit has been identified.

SMT solvers for bit vector arithmetic

I'm planning some experiments in symbolic execution of C code, using an off-the-shelf SMT solver, and wondering which solver to use; looking at e.g. the SMT contest entrants, and taking only the open-source systems, narrows it down to Beaver, Boolector, CVC3, OpenSMT, Sateen, Sonolar, STP, Verit; which is still a long list.
Trying to narrow it down a little further, I notice that some of the systems advertise the ability to handle bit vector arithmetic, whereas others only advertise the ability to handle general integer arithmetic. In principle, the former is correct for C, where variables are machine words, not unbounded integers. How much difference does it make in practice? What happens if you try to use a general integer system for this kind of job? Does one of the following scenarios apply?
A bit vector system is slightly more efficient, but you can use either, no problem.
You can use a general integer system with a bit of tweaking.
A general integer system is fine for signed int (because the result of overflow is undefined) but will give the wrong answer for unsigned.
A general integer system just isn't correct for machine word arithmetic, and I can reduce my short list to only those systems that provide bit vector arithmetic.
Something else...?
I've tried to ask as specific a question as possible, but if anyone can suggest any other criteria for narrowing down the list, that would be great!
I've had good experience using STP for symbolic execution. STP was designed precisely for this task. Also, there have been a number of symbolic execution tools that have successfully used STP for this purpose, so there is reason to believe that STP doesn't suck. I would definitely recommend STP to others as a default choice for this sort of experimentation.
However, I haven't tried the other systems, so I don't know how STP compares to them.
Personally, I see STP as the baseline and the default choice for this kind of application. So, if you only have time to try one solver, trying STP seems like a pretty reasonable choice.
If I had to guess, my guess would be that bit-vector arithmetic is important to support, because any large systems code is going to have a non-trivial amount of code that performs bitwise operations. Also, I'd suspect/worry some systems code may rely upon the behavior of unsigned arithmetic to wrap modulo 2n, and if you try to model it with integers, you will not get the semantics of C right (because, as you say, integers just aren't correct for machine-word arithmetic) and consequently, if you try to use an integer-only solver, you may experience some difficulties. However, I have no hard evidence for either of these suspicions.
P.S. Z3 might also be a contender to add to your list to consider. (Do you really need your solver to be open-source, as long as it is free? I'd expect that a symbolic execution tool would use it only as a blackbox, without modification.)
According to SMT-Wikipedia at 2011-08, we have:
Based on these measures, it appears that the most vibrant, well-organized projects are OpenSMT, STP and CVC4.
I'm just checking this stuff - so far, all three seems reasonable, plus older CVC -> CVC3.

Consistant behaviour of float code with GCC

I do some numerical computing, and I have often had problems with floating points computations when using GCC. For my current purpose, I don't care too much about the real precision of the results, but I want this firm property:
no matter WHERE the SAME code is in my program, when it is run on the SAME inputs, I want it to give the SAME outputs.
How can I force GCC to do this? And specifically, what is the behavior of --fast-math, and the different -O optimizations?
I've heard that GCC might try to be clever, and sometimes load floats in registers, and sometime read them directly from memory, and that this might change the precision of the floats, resulting in a different output. How can I avoid this?
Again, I want :
my computations to be fast
my computations to be reliable (ie. same input -> same result)
I don't care that much about the precision for this particular code, so I can be fine with reduced precision if this brings reliability
could anyone tell me what is the way to go for this problem?
If your targets include x86 processors, using the switch that makes gcc use SSE2 instructions (instead of the historical stack-based ones) will make these run more like the others.
If your targets include PowerPC processors, using the switch that makes gcc not use the fmadd instruction (to replace a multiplication followed by an addition in the source code) will make these run more like the others.
Do not use --fast-math: this allows the compiler to take some shortcuts, and this will cause differences between architectures. Gcc is more standard-compliant, and therefore predictable, without this option.
Including your own math functions (exp, sin, ...) in your application instead of relying on those from the system's library can only help with predictability.
And lastly, even when the compiler does rigorously respect the standard (I mean C99 here), there may be some differences, because C99 allows intermediate results to be computed with a higher precision than required by the type of the expression. If you really want the program always to give the same results, write three-address code. Or, use only the maximum precision available for all computations, which would be double if you can avoid the historical x86 instructions. In any case do not use lower-precision floats in an attempt to improve predictability: the effect would be the opposite, as per the above clause in the standard.
I think that GCC is pretty well documented so I'm not going to reveal my own ignorance by trying to answer the parts of your question about its options and their effects. I would, though, make the general statement that when numeric precision and performance are concerned, it pays big dividends to read the manual. The clever people who work on GCC put a lot of effort into their documentation, reading it is rewarding (OK, it can be a trifle dull, but heck, it's a compiler manual not a bodice-ripper).
If it is important to you that you get identical-to-the-last-bit numeric results you'll have to concern yourself with more than just GCC and how you can control its behaviour. You'll need to lock down the libraries it calls, the hardware it runs on and probably a number of other factors I haven't thought of yet. In the worst (?) case you may even want to, and I've seen this done, write your own implementations of f-p maths to guarantee bit-identity across platforms. This is difficult, and therefore expensive, and leaves you possibly less certain of the correctness of your own code than of the code usd by GCC.
However, you write
I don't care that much about the precision for this particular code, so I can be fine with reduced precision if this brings reliability
which prompts the question to you -- why don't you simply use 5-decimal-digit precision as your standard of (reduced) precision ? It's what an awful lot of us in numerical computing do all the time; we ignore the finer aspects of numerical analysis since they are difficult, and costly in computation time, to circumvent. I'm thinking of things like interval arithmetic and high-precision maths. (OF course, if 5 is not right for you, choose another single-digit number.)
But the good news is that this is entirely justifiable: we're dealing with scientific data which, by its nature, comes with errors attached (of course we generally don't know what the errors are but that's another matter) so it's OK to disregard the last few digits in the decimal representation of, say, a 64-bit f-p number. Go right ahead and ignore a few more of them. Even better, it doesn't matter how many bits your f-p numbers have, you will always lose some precision doing numerical calculations on computers; adding more bits just pushes the errors back, both towards the least-significant-bits and towards the end of long-running computations.
The case you have to watch out for is where you have such a poor algorithm, or a poor implementation of an algorithm, that it loses lots of precision quickly. This usually shows up with any reasonable size of f-p number. Your test suite should have exposed this if it is a real problem for you.
To conclude: you have to deal with loss of precision in some way and it's not necessarily wrong to brush the finer details under the carpet.

Why aren't Floating-Point Decimal numbers hardware accelerated like Floating-Point Binary numbers?

Is it worth it to implement it in hardware? If yes why? If not why not?
Sorry I thought it is clear that I am talking about Decimal Rational Numbers! Ok something like decNumber++ for C++, decimal for .NET... Hope it is clear now :)
The latest revision of the IEEE 754:2008 standard does indeed define hardware decimal floating point numbers, using the representations shown in the software referenced in the question. The previous version of the standard (IEEE 754:1985) did not provide decimal floating point numbers. Most current hardware implements the 1985 standard and not the 2008 standard, but IBM's iSeries computers using Power6 chips have such support, and so do the z10 mainframes.
The standardization effort for decimal floating point was spearheaded by Mike Cowlishaw of IBM UK, who has a web site full of useful information (including the software in the question). It is likely that in due course, other hardware manufacturers will also introduce decimal floating point units on their chips, but I have not heard a statement of direction for when (or whether) Intel might add one. Intel does have optimized software libraries for it.
The C standards committee is looking to add support for decimal floating point and that work is TR 24732.
Some IBM processors have dedicated decimal hardware included (Decimal Floating Point | DFP- unit).
In contribution of
answered Sep 18 at 23:43
Daniel Pryden
the main reason is that DFP-units need more transistors in a chip then BFP-units. The reason is the BCD Code to calculate decimal numbers in a binary environment. The IEEE754-2008 has several methods to minimize the overload. It seems that the DPD hxxp://en.wikipedia.org/wiki/Densely_packed_decimal method is more effective in comparison to the BID hxxp://en.wikipedia.org/wiki/Binary_Integer_Decimal method.
Normally, you need 4 bits to cover the decimal range from 0 to 9. Bit the 10 to 15 are invalid but still possible with BCD.
Therefore, the DPD compress 3*4=12 bit into 10 bit to cover the range from 000 to 999 with 1024 (10^2)possibilities.
In general it is to say, that BFP is faster then DFP.
And BFP need less space on a chip then DFP.
The question why IBM implemented a DFP unit is quite simple answered:
They build servers for the finance market. If data represents money, it should be reliable.
With hardware accelerated decimal arithmetic, some errors do not accour as in binary.
1/5 = 0.2 => 0.0110011001100110011001100110... in binary so recurrent fractions could be avoided.
And the overhelming round() function in excel would be useless anymore :D
(->function =1*(0,5-0,4-0,1) wtf!)
hope that explain your question a little!
There is (a tiny bit of) decimal string acceleration, but...
This is a good question. My first reaction was "macro ops have always failed to prove out", but after thinking about it, what you are talking about would go a whole lot faster if implemented in a functional unit. I guess it comes down to whether those operations are done enough to matter. There is a rather sorry history of macro op and application-specific special-purpose instructions, and in particular the older attempts at decimal financial formats are just legacy baggage now. For example, I doubt if they are used much, but every PC has the Intel BCD opcodes, which consist of
DAA, AAA, AAD, AAM, DAS, AAS
Once upon a time, decimal string instructions were common on high-end hardware. It's not clear that they ever made much of a benchmark difference. Programs spend a lot of time testing and branching and moving things and calculating addresses. It normally doesn't make sense to put macro-operations into the instruction set architecture, because overall things seem to go faster if you give the CPU the smallest number of fundamental things to do, so it can put all its resources into doing them as fast as possible.
These days, not even all the binary ops are actually in the real ISA. The cpu translates the legacy ISA into micro-ops at runtime. It's all part of going fast by specializing in core operations. For now the left-over transisters seem to be waiting for some graphics and 3D work, i.e., MMX, SSE, 3DNow!
I suppose it's possible that a clean-sheet design might do something radical and unify the current (HW) scientific and (SW) decimal floating point formats, but don't hold your breath.
No, they are very memory-inefficient. And the calculations are also on hardware not easy to implement (of course it can be done, but it also can use a lot of time).
Another disadvantage of the decimal format is, it's not widly used, before research showed that the binary-formatted numbers were more accurate the format was popular for a time. But now programmers know better. The decimal format is't efficient and is more lossy. Also additional hardware-representations require additional instruction-sets, that can lead to more difficult code.
Decimals (and more generally, fractions) are relatively easy to implement as a pair of integers. General purpose libraries are ubiquitous and easily fast enough for most applications.
Anyone who needs the ultimate in speed is going to hand-tune their implementation (eg changing the divisor to suit a particular usage, algebraicly combining/reordering the operations, clever use of SIMD shuffles...). Merely encoding the most common functions into a hardware ISA would surely never satisfy them -- in all likelihood, it wouldn't help at all.
The hardware you want used to be fairly common.
Older CPU's had hardware BCD (Binaray coded decimal) arithmetic. ( The little Intel chips had a little support as noted by earlier posters)
Hardware BCD was very good at speeding up FORTRAN which used 80 bit BCD for numbers.
Scientific computing used to make up a significant percentage of the worldwide market.
Since everyone (relatively speaking) got home PC running windows, the market got tiny
as a percentage. So nobody does it anymore.
Since you don't mind having 64bit doubles (binary floating point) for most things, it mostly works.
If you use 128bit binary floating point on modern hardware vector units it's not too bad. Still less accurate than 80bit BCD, but you get that.
At an earlier job, a colleague formerly from JPL was astonished we still used FORTRAN. "We've converted to C and C++ he told us." I asked him how they solved the problem of lack of precision. They'd not noticed. (They have also not the same space probe landing accuracy they used to have. But anyone can miss a planet.)
So, basically 128bit doubles in the vector unit are more okay, and widely available.
My twenty cents. Please don't represent it as a floating point number :)
Decimal floating-point standard (IEEE 754-2008) is already implemented in hardware by two companies; IBM's POWER 6/7 based servers, and SilMinds SilAx PCIe-based acceleration card.
SilMinds published a case study about converting the Decimal arithmetic execution to use its HW solutions. A great boost in time and slashed energy consumption are presented.
Moreover several publications by "Michael J. Schulte" and others reveal very positive benchmarks results, and some comparison between DPD and BID formats (both defined in the IEEE 754-2008 standard)
You can find pdfs to:
Performance analysis of decimal floating-point libraries and its impact on decimal hardware and software solutions
A survey of hardware designs for decimal arithmetic
Energy and Delay Improvement via Decimal Floating Point Units
Those 3 papers should be more than enough for your questions!
I speculate that there are no compute-intensive applications of decimal numbers. On the other hand, floating points number are extensively used in engineering applications, which must handle enormous amounts of data and do not need exact results, just need to stay within a desired precision.
The simple answer is that computers are binary machines. They don't have ten fingers, they have two. So building hardware for binary numbers is considerably faster, easier, and more efficient than building hardware for decimal numbers.
By the way: decimal and binary are number bases, while fixed-point and floating-point are mechanisms for approximating rational numbers. The two are completely orthogonal: you can have floating-point decimal numbers (.NET's System.Decimal is implemented this way) and fixed-point binary numbers (normal integers are just a special case of this).
Floating point math essentially IS an attempt to implement decimals in hardware. It's troublesome, which is why the Decimal types are created partly in software. It's a good question, why CPUs don't support more types, but I suppose it goes back to CISC vs. RISC processors -- RISC won the performance battle, so they try to keep things simple these days I guess.
Modern computers are usually general purpose. Floating point arithmetic is very general purpose, while Decimal has a far more specific purpose. I think that's part of the reason.
Do you mean the typical numeric integral types "int", "long", "short" (etc.)? Because operations on those types are definitely implemented in hardware. If you're talking about arbitrary-precision large numbers ("BigNums" and "Decimals" and such), it's probably a combination of rarity of operations using these data types and the complexity of building hardware to deal with arbitrarily large data formats.

When to use Fixed Point these days

For intense number-crunching i'm considering using fixed point instead of floating point. Of course it'll matter how many bytes the fixed point type is in size, on what CPU it'll be running on, if i can use (for Intel) the MMX or SSE or whatever new things come up...
I'm wondering if these days when floating point runs faster than ever, is it ever worth considering fixed point? Are there general rules of thumb where we can say it'll matter by more than a few percent? What is the overview from 35,000 feet of numerical performance? (BTW, i'm assuming a general CPU as found in most computers, not DSP or specialized embedded systems.)
It's still worth it. Floating point is faster than in the past, but fixed-point is also. And fixed is still the only way to go if you care about precision beyond that guaranteed by IEEE 754.
In situations where you are dealing with very large amounts of data, fixed point can be twice as memory efficient, e.g. a four byte long integer as opposed to an eight byte double. A technique often used in large geospatial datasets is to reduce all the data to a common origin, such that the most significant bits can be disposed of, and work with fixed point integers for the rest. Floating point is only important if the point does actually float, i.e. you are dealing with a very wide range of numbers at very high accuracy.
Another good reason to use fixed decimal is that rounding is much simpler and predictable. Most of the financial software uses fixed point arbitrary precision decimals with half-even rounding to represent money.
Its nearly ALWAYS faster to use fixed point (experience of x86, pentium, 68k and ARM). It can, though, also depend on the application type. For graphics programming (one of my main uses of fixed point) I've been able to optimize the code using prebuilt cosine tables, log tables etc. But also the basic mathematical operations have also proven faster.
A comment on financial software. It was said in an earlier answer that fixed point is useful for financial calculations. In my own experience (development of large treasury management system and extensive experience of credit card processing) I would NOT use fixed point. You will have rounding errors using either floating or fixed point. We always use whole amounts to represent monetary amounts, counting the minimum amount possible (1c for Euro or dollar). This ensure no partial amounts are ever lost. When doing complex calculations values are converted to doubles, application specific rounding rules are applied and results are converted back to whole numbers.
Use fixed-point when the hardware doesn't support floating-point or the hardware implementation sucks.
Also beware when making classes for it. Something you think would be quick could actually turn out to be a dog when it comes to profiling due to (un)necessary copies of classes. That is another question for another time however.
Another reason to use fixed-point is that ARM devices, like mobile phones and tablets, lack of FPU (at least many of them).
For developing real-time applications it makes sense to optimize functions using fixed-point arithmetic. There are implementations of FFTs (Fast Fourier Transform), very importan for graphics, that base its improvements on efficiency on relying on floating point arithmetic.
Since you are using a general-purpose CPU, I would suggest not using fixed point, unless performance is so critical for your application that you have to count every tic. The hassle of implementing fixed point, and dealing with issues like overflow is just not worth it, when you have a CPU, which will do it for you.
IMHO, fixed point is only necessary when you are using a DSP without hardware support for floating point operations.