I've just spent most of today trying to find some sort of function to generate keys for known images, for later comparison to determine what the image is. I have attempted to use SIFT and SURF descriptors, both of which are too slow (and patented for commercial use). My latest attempt was creating a dct hash using:
int mm_dct_imagehash(const char* file, float sigma, uint64_t *hash){
    if (!file) return -1;
    if (!hash) return -2;
    *hash = 0;
    IplImage *img = cvLoadImage(file, CV_LOAD_IMAGE_GRAYSCALE);
    if (!img) return -3;
    cvSmooth(img, img, CV_GAUSSIAN, 7, 7, sigma, sigma);
    IplImage *img_resized = cvCreateImage(cvSize(32,32), img->depth, img->nChannels);
    if (!img_resized) return -4;
    cvResize(img, img_resized, CV_INTER_CUBIC);
    IplImage *img_prime = cvCreateImage(cvSize(32,32), IPL_DEPTH_32F, img->nChannels);
    if (!img_prime) return -5;
    cvConvertScale(img_resized, img_prime,1, 0);
    IplImage *dct_img = cvCreateImage(cvSize(32,32), IPL_DEPTH_32F, img->nChannels);
    if (!dct_img) return -6;
    cvDCT(img_prime, dct_img, CV_DXT_FORWARD);
    cvSetImageROI(dct_img, cvRect(1,1,8,8));
    double minval, maxval;
    cvMinMaxLoc(dct_img, &minval, &maxval, NULL, NULL, NULL);
    double medval = (maxval + minval)/2;
    int i,j;
    for (i=1;i<=8;i++){
        const float *row = (const float*)(dct_img->imageData + i*dct_img->widthStep);
        for (j=1;j<=8;j++){
            if (row[j] > medval){
                (*hash) |= 1;
            }
            (*hash) <<= 1;
        }
    }
    return 0;
}
This did generate something of the type I was looking for, but when I tried comparing it to a database of known hashes, I had as many false positives as I had positives. And so, I'm back at it and thought I might ask the experts.
Would any of you know/have a function that could give me some sort of identifier/checksum for provided images, which would remain similar across similar images so it could be used to quickly identify images via comparison to a database? In short, which category of checksums the image best matches to?
I'm not looking for theories, concepts, papers or ideas, but actually working solutions. I'm not spending another day digging at a dead end, and appreciate anyone who takes the time to put together some code.
With a bit more research, I know that the AutoIt devs designed PixelChecksum to use the "Adler-32" algorithm. I guess the next step is to find a C implementation and to get it to process pixel data. Any suggestions are welcome!
A google search for "microsoft image hashing" has near the top the two best papers on the subject I am aware of. Both offer practical solutions.
The short answer is that there's no out of the box working solution for your problem. Additionally, the Adler-32 algorithm will not solve your problem.
Unfortunately, comparing images by visual similarity using image signatures (or a related concept) is a very active and open research topic. For example, you said that you had many false positives in your tests. However, what counts as a correct or incorrect result is subjective and will depend on your application.
In my opinion, the only way to solve your problem is to find an adequate image descriptor for your problem and then use it to compare the images. Note that comparing descriptors extracted from images is not a trivial task.
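As a rough sketch of the comparison step only: 64-bit hashes like the one in the question are commonly compared by Hamming distance (the number of differing bits) rather than for exact equality. The function names and the maxDistance parameter below are hypothetical, and a usable threshold would have to be tuned on real data; this is not a claim about how the question's database lookup was done.

```cpp
#include <bitset>
#include <cstddef>
#include <cstdint>
#include <vector>

// Number of bits in which two 64-bit hashes differ.
int hammingDistance(std::uint64_t a, std::uint64_t b) {
    return static_cast<int>(std::bitset<64>(a ^ b).count());
}

// Index of the closest known hash, or -1 if none is within maxDistance.
// maxDistance is a hypothetical tuning parameter, not a recommended value.
int findClosest(const std::vector<std::uint64_t>& known,
                std::uint64_t query, int maxDistance) {
    int best = -1;
    int bestDist = maxDistance + 1;
    for (std::size_t i = 0; i < known.size(); ++i) {
        int d = hammingDistance(known[i], query);
        if (d < bestDist) {
            bestDist = d;
            best = static_cast<int>(i);
        }
    }
    return best;
}
```

A linear scan like this is fine for small databases; larger ones need some kind of index, which is beyond this sketch.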
How can I efficiently get a single move out of an attack mask that looks like this:
for a queen.
What I've done in the past is to get the square index of every possible queen move by counting the trailing zeros (bitScanForward). After generating each move, I removed its square from the attack mask and continued with the next attack square. Is there any technique to get the single attack bits directly?
I think what you are describing is already the most efficient way: loop over the bitboard until it is zero, picking one move at a time.
To sketch the idea with some code, it could look like this:
using Bitboard = uint64_t; // 64 bit unsigned integer
Move* createAllMoves(Bitboard mask, int from_sq, Move* pMoves) {
    while (mask != 0) {
        int to_sq = findAndClearSetBit(mask);
        *pMoves++ = createMove(from_sq, to_sq);
    }
    return pMoves; // points one past the last move written
}
The findAndClearSetBit function can choose any set bit, but commonly on today's hardware, finding the least significant bit is most efficient. If you are using GCC or Clang, you can use __builtin_ctzll which should be optimized to the specific hardware:
int findAndClearSetBit(Bitboard& mask) {
    int sq = __builtin_ctzll(mask); // find least significant bit
    mask &= mask - 1; // clear least significant bit
    return sq;
}
If I am not mistaken, your existing function bitScanForward is already an implementation to find the least significant bit. So, you can use it to get a portable version.
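For reference, a compiler-independent sketch of the same loop. bitScanForwardPortable and targetSquares are made-up names for illustration; the simple shift loop stands in for whatever bitScanForward you already have and is not meant to be fast.

```cpp
#include <cstdint>
#include <vector>

using Bitboard = std::uint64_t;

// Portable fallback: index of the least significant set bit.
// The mask must be non-zero.
int bitScanForwardPortable(Bitboard mask) {
    int sq = 0;
    while ((mask & 1) == 0) {
        mask >>= 1;
        ++sq;
    }
    return sq;
}

// Collect the target squares of all set bits, lowest square first.
std::vector<int> targetSquares(Bitboard mask) {
    std::vector<int> squares;
    while (mask != 0) {
        squares.push_back(bitScanForwardPortable(mask));
        mask &= mask - 1;  // clear the least significant set bit
    }
    return squares;
}
```

The loop body runs once per set bit, so a queen with a dozen attack squares costs a dozen iterations regardless of where the bits sit.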
For example, there are QR scanners which scan video stream in real time and get QR codes info.
I would like to check the light source from the video, if it is on or off, it is quite powerful so it is no problem.
I will probably take a video stream as input, maybe make images of it and analyze images or stream in real time for presence of light source (maybe number of pixels of certain color on the image?)
How do I approach this problem? Is there some sample source or library for this?
It sounds like you are asking for information about several discrete steps. There are a multitude of ways to do each of them, and if you get stuck on any individual step it would be a good idea to post a question about it individually.
1: Get video Frame
Like chaitanya.varanasi said, the AVFoundation framework is the best way of getting access to a video frame on iOS. If you want something less flexible and quicker, try looking at OpenCV's video capture. The goal of this step is to get access to a pixel buffer from the camera. If you have trouble with this, ask about it specifically.
2: Put pixel buffer into OpenCV
This part is really easy. If you get it from OpenCV's video capture you are already done. If you get it from AVFoundation you will need to put it into OpenCV like this:
//Buffer is of type CVImageBufferRef, which is what AVFoundation should be giving you
//I assume it is BGRA or RGBA formatted, if it isn't, change CV_8UC4 to the appropriate format
CVPixelBufferLockBaseAddress( Buffer, 0 );
int bufferWidth = (int)CVPixelBufferGetWidth(Buffer);
int bufferHeight = (int)CVPixelBufferGetHeight(Buffer);
size_t bytesPerRow = CVPixelBufferGetBytesPerRow(Buffer); //rows may be padded, so pass the real stride
unsigned char *pixel = (unsigned char *)CVPixelBufferGetBaseAddress(Buffer);
cv::Mat image = cv::Mat(bufferHeight,bufferWidth,CV_8UC4,pixel,bytesPerRow); //put buffer in open cv, no memory copied
//Process image Here
//End processing
CVPixelBufferUnlockBaseAddress( Buffer, 0 );
Note: I am assuming you plan to do this in OpenCV since you used its tag. Also I assume you can get the OpenCV framework to link to your project. If that is an issue, ask a specific question about it.
3: Process Image
This part is by far the most open ended. All you have said about your problem is that you are trying to detect a strong light source. One very quick and easy way of doing that would be to compute the mean pixel value of a greyscale image. If you get the image in colour you can convert it with cvtColor. Then just call cv::mean (cvAvg in the old C API) on it to get the mean value. Hopefully you can tell if the light is on by how that value fluctuates.
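If you want to see what that mean amounts to without pulling in OpenCV, here is a minimal sketch that works directly on a BGRA pixel buffer. The function names, the BGRA channel order, and the threshold are assumptions for illustration; the weights are the usual Rec. 601 luma weights.

```cpp
#include <cstddef>
#include <cstdint>

// Mean brightness (0..255) of a BGRA frame.
// bytesPerRow may be larger than width * 4 when rows are padded.
double meanBrightness(const std::uint8_t* pixels, std::size_t width,
                      std::size_t height, std::size_t bytesPerRow) {
    if (width == 0 || height == 0) return 0.0;
    double sum = 0.0;
    for (std::size_t y = 0; y < height; ++y) {
        const std::uint8_t* row = pixels + y * bytesPerRow;
        for (std::size_t x = 0; x < width; ++x) {
            const double b = row[4 * x];
            const double g = row[4 * x + 1];
            const double r = row[4 * x + 2];
            sum += 0.114 * b + 0.587 * g + 0.299 * r;  // Rec. 601 luma
        }
    }
    return sum / static_cast<double>(width * height);
}

// Hypothetical decision rule: the threshold has to be calibrated per scene.
bool lightIsOn(double meanValue, double threshold) {
    return meanValue > threshold;
}
```

Comparing against a running average of recent frames instead of a fixed threshold would probably cope better with changing ambient light.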
chaitanya.varanasi suggested another option, you should check it out too.
OpenCV is a very large library that can do a wide variety of things. Without knowing more about your problem I don't know what else to tell you.
Look at the AVFoundation Framework from Apple.
Hope it helps!
You can try this method: start by getting all images to an AVCaptureVideoDataOutput. From the method captureOutput:didOutputSampleBuffer:fromConnection: you can sample/calculate every pixel. Source: answer
Also, you can take a look at this SO question where they check if a pixel is black. If it's such a powerful light source, you can take the inverse of the pixel and then determine using a set threshold for black.
The above sample code only provides access to the pixel values stored in the buffer; you cannot run any other commands but those that change those values on a pixel-by-pixel basis:
for ( uint32_t y = 0; y < height; y++ )
for ( uint32_t x = 0; x < width; x++ )
bgraImage.at<cv::Vec<uint8_t,4> >(y,x)[1] = 0;
This—to use your example—will not work with the code you provided:
cv::Mat bgraImage = cv::Mat( (int)height, (int)extendedWidth, CV_8UC4, base );
cv::Mat grey = bgraImage.clone();
cv::cvtColor(grey, grey, 44);
What is a quick and easy way to 'checksum' an array of floating point numbers, while allowing for a specified small amount of inaccuracy?
e.g. I have two algorithms which should (in theory, with infinite precision) output the same array. But they work differently, and so floating point errors will accumulate differently, though the array lengths should be exactly the same. I'd like a quick and easy way to test if the arrays seem to be the same. I could of course compare the numbers pairwise, and report the maximum error; but one algorithm is in C++ and the other is in Mathematica and I don't want the bother of writing out the numbers to a file or pasting them from one system to another. That's why I want a simple checksum.
I could simply add up all the numbers in the array. If the array length is N, and I can tolerate an error of 0.0001 in each number, then I would check if abs(sum1-sum2)<0.0001*N. But this simplistic 'checksum' is not robust, e.g. to an error of +10 in one entry and -10 in another. (And anyway, probability theory says that the error probably grows like sqrt(N), not like N.) Of course, any checksum is a low-dimensional summary of a chunk of data so it will miss some errors, if not most... but simple checksums are nonetheless useful for finding non-malicious bug-type errors.
Or I could create a two-dimensional checksum, [sum(x[n]), sum(abs(x[n]))]. But is this the best I can do, i.e. is there a different function I might use that would be "more orthogonal" to the sum(x[n])? And if I used some arbitrary functions, e.g. [sum(f1(x[n])), sum(f2(x[n]))], then how should my 'raw error tolerance' translate into 'checksum error tolerance'?
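To make the two-dimensional checksum concrete, here is a minimal sketch with a worst-case tolerance. If every entry differs by at most eps, then |sum(x) - sum(y)| <= N*eps, and because ||a| - |b|| <= |a - b| the same bound holds for sum(abs(x)); more generally, a function f with Lipschitz constant L gives a bound of L*N*eps. Checksum2, checksum2 and compatible are names made up for this sketch.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

struct Checksum2 {
    double sum;     // sum of x[n]
    double sumAbs;  // sum of abs(x[n])
};

Checksum2 checksum2(const std::vector<double>& x) {
    Checksum2 c = {0.0, 0.0};
    for (std::size_t i = 0; i < x.size(); ++i) {
        c.sum += x[i];
        c.sumAbs += std::fabs(x[i]);
    }
    return c;
}

// True if the two checksums are compatible with a per-entry error of at
// most eps over n entries (worst-case bound n * eps for both components).
bool compatible(const Checksum2& a, const Checksum2& b,
                std::size_t n, double eps) {
    const double tol = static_cast<double>(n) * eps;
    return std::fabs(a.sum - b.sum) <= tol &&
           std::fabs(a.sumAbs - b.sumAbs) <= tol;
}
```

The second component catches the +10/-10 case when the two entries end up with different magnitudes, but it still misses compensating errors that preserve both sums (e.g. {20, 20} versus {30, 10}).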
I'm programming in C++, but I'm happy to see answers in any language.
i have a feeling that what you want may be possible via something like gray codes. if you could translate your values into gray codes and use some kind of checksum that was able to correct n bits you could detect whether or not the two arrays were the same except for n-1 bits of error, right? (each bit of error means a number is "off by one", where the mapping would be such that this was a variation in the least significant digit).
but the exact details are beyond me - particularly for floating point values.
i don't know if it helps, but what gray codes solve is the problem of pathological rounding. rounding sounds like it will solve the problem - a naive solution might round and then checksum. but simple rounding always has pathological cases - for example, if we use floor, then 0.9999999 and 1 are distinct. a gray code approach seems to address that, since neighbouring values are always single bit away, so a bit-based checksum will accurately reflect "distance".
[update:] more exactly, what you want is a checksum that gives an estimate of the hamming distance between your gray-encoded sequences (and the gray encoded part is easy if you just care about 0.0001 since you can multiply everything by 10000 and use integers).
and it seems like such checksums do exist: Any error-correcting code can be used for error detection. A code with minimum Hamming distance, d, can detect up to d − 1 errors in a code word. Using minimum-distance-based error-correcting codes for error detection can be suitable if a strict limit on the minimum number of errors to be detected is desired.
so, just in case it's not clear:
multiply by the inverse of the minimum error to get integers
convert to gray code equivalent
use an error detecting code with a minimum hamming distance larger than the error you can tolerate.
but i am still not sure that's right. you still get the pathological rounding in the conversion from float to integer. so it seems like you need a minimum hamming distance that is 1 + len(data) (worst case, with a rounding error on each value). is that feasible? probably not for large arrays.
maybe ask again with better tags/description now that a general direction is possible? or just add tags now? we need someone who does this for a living. [i added a couple of tags]
I've spent a while looking for a deterministic answer, and been unable to find one. If there is a good answer, it's likely to require heavy-duty mathematical skills (functional analysis).
I'm pretty sure there is no solution based on "discretize in some cunning way, then apply a discrete checksum", e.g. "discretize into strings of 0/1/?, where ? means wildcard". Any discretization will have the property that two floating-point numbers very close to each other can end up with different discrete codes, and then the discrete checksum won't tell us what we want to know.
However, a very simple randomized scheme should work fine. Generate a pseudorandom string S from the alphabet {+1,-1}, and compute csx=sum(X_i*S_i) and csy=sum(Y_i*S_i), where X and Y are my original arrays of floating point numbers. If we model the errors as independent Normal random variables with mean 0, then it's easy to compute the distribution of csx-csy. We could do this for several strings S, and then do a hypothesis test that the mean error is 0. The number of strings S needed for the test is fixed, it doesn't grow linearly in the size of the arrays, so it satisfies my need for a "low-dimensional summary". This method also gives an estimate of the standard deviation of the error, which may be handy.
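A minimal sketch of that randomized scheme, assuming both sides can generate the same sign strings (here from a shared std::mt19937 seed; signChecksum and its parameters are names made up for illustration). Under the stated model, with independent errors of standard deviation sigma on N entries, each component of csx-csy has mean 0 and standard deviation sigma*sqrt(N).

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// k projections of x onto pseudorandom +1/-1 strings derived from seed.
// Both arrays to be compared must be checksummed with the same k and seed.
std::vector<double> signChecksum(const std::vector<double>& x,
                                 std::size_t k, std::uint32_t seed) {
    std::mt19937 gen(seed);
    std::vector<double> cs(k, 0.0);
    for (std::size_t j = 0; j < k; ++j) {
        for (std::size_t i = 0; i < x.size(); ++i) {
            // The low bit of each draw picks the sign S_i for string j.
            cs[j] += (gen() & 1u) ? x[i] : -x[i];
        }
    }
    return cs;
}
```

On the Mathematica side the same sign strings would have to be reproduced or imported, since its built-in random generators will not match std::mt19937 by default.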
Try this:
#include <complex>
#include <cmath>
#include <iostream>
const size_t no_freqs = 3;
const double freqs[no_freqs] = {0.05, 0.16, 0.39}; // (for example)
int main() {
    std::complex<double> spectral_amplitude[no_freqs];
    for (size_t i = 0; i < no_freqs; ++i) spectral_amplitude[i] = 0.0;
    size_t n_data = 0;
    std::complex<double> datum;
    while (std::cin >> datum) {
        for (size_t i = 0; i < no_freqs; ++i) {
            spectral_amplitude[i] += datum * std::exp(
                std::complex<double>(0.0, 1.0) * freqs[i] * double(n_data)
            );
        }
        ++n_data; // advance the sample index so each datum gets its own phase
    }
    std::cout << "Fuzzy checksum:\n";
    for (size_t i = 0; i < no_freqs; ++i) {
        std::cout << real(spectral_amplitude[i]) << "\n";
        std::cout << imag(spectral_amplitude[i]) << "\n";
    }
    std::cout << "\n";
    return 0;
}
It returns just a few, arbitrary points of a Fourier transform of the entire data set. These make a fuzzy checksum, so to speak.
How about computing a standard integer checksum on the data obtained by zeroing the least significant digits of the data, the ones that you don't care about?
I know that you should only optimize things when it is deemed necessary. But, if it is deemed necessary, what are your favorite low level (as opposed to algorithmic level) optimization tricks?
For example: loop unrolling.
gcc -O2
Compilers do a lot better job of it than you can.
Picking a power of two for filters, circular buffers, etc.
So very, very convenient.
Why, bit twiddling hacks, of course!
One of the most useful in scientific code is to replace pow(x,4) with x*x*x*x. Pow is almost always more expensive than multiplication. This is followed by replacing a division inside a loop with a multiplication by the reciprocal, computed once:
for(int i = 0; i < N; i++)
z += x/y;
// becomes:
double denom = 1/y;
for(int i = 0; i < N; i++)
z += x*denom;
But my favorite low level optimization is to figure out which calculations can be removed from a loop. It's always faster to do the calculation once rather than N times. Depending on your compiler, some of these may be automatically done for you.
Inspect the compiler's output, then try to coerce it to do something faster.
I wouldn't necessarily call it a low level optimization, but I have saved orders of magnitude more cycles through judicious application of caching than I have through all my applications of low level tricks combined. Many of these methods are applications specific.
Having an LRU cache of database queries (or any other IPC based request).
Remembering the last failed database query and returning a failure if re-requested within a certain time frame.
Remembering your location in a large data structure to ensure that if the next request is for the same node, the search is free.
Caching calculation results to prevent duplicate work. In addition to more complex scenarios, this is often found in if or for statements.
CPUs and compilers are constantly changing. Whatever low level code trick that made sense 3 CPU chips ago with a different compiler may actually be slower on the current architecture and there may be a good chance that this trick may confuse whoever is maintaining this code in the future.
++i can be faster than i++, because it avoids creating a temporary.
Whether this still holds for modern C/C++/Java/C# compilers, I don't know. It might well be different for user-defined types with overloaded operators, whereas in the case of simple integers it probably doesn't matter.
But I've come to like the syntax... it reads like "increment i" which is a sensible order.
Using template metaprogramming to calculate things at compile time instead of at run-time.
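A classic illustration, as a minimal sketch: a factorial that the compiler evaluates while instantiating the templates, so nothing is computed at run time. The Factorial name is just an example; in C++11 and later a constexpr function is usually the simpler way to get the same effect.

```cpp
// Factorial<N>::value is computed by the compiler during instantiation.
template <unsigned N>
struct Factorial {
    static const unsigned long long value = N * Factorial<N - 1>::value;
};

// Specialization that terminates the recursion.
template <>
struct Factorial<0> {
    static const unsigned long long value = 1;
};

// Checked at compile time; costs nothing at run time.
static_assert(Factorial<5>::value == 120, "evaluated at compile time");
```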
Years ago with a not-so-smart compiler, I got great mileage from function inlining, walking pointers instead of indexing arrays, and iterating down to zero instead of up to a maximum.
When in doubt, a little knowledge of assembly will let you look at what the compiler is producing and attack the inefficient parts (in your source language, using structures friendlier to your compiler.)
precalculating values.
For instance, instead of sin(a) or cos(a), if your application doesn't necessarily need angles to be very precise, maybe you represent angles in 1/256 of a circle, and create arrays of floats sine[] and cosine[] precalculating the sin and cos of those angles.
And, if you need a vector at some angle of a given length frequently, you might precalculate all those sines and cosines already multiplied by that length.
Or, to put it more generally, trade memory for speed.
Or, even more generally, "All programming is an exercise in caching" -- Terje Mathisen
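A minimal sketch of the angle table described above, assuming angles are stored as unsigned counts of 1/256 of a circle. SineTable, lookupSin and lookupCos are made-up names, and instead of two arrays this version reuses one table for the cosine by offsetting a quarter turn.

```cpp
#include <cmath>
#include <cstddef>

const std::size_t kSteps = 256;  // angles are measured in 1/256 of a circle

struct SineTable {
    float sine[kSteps];

    // Fill the table once, up front.
    SineTable() {
        const double twoPi = 6.283185307179586;
        for (std::size_t i = 0; i < kSteps; ++i) {
            sine[i] = static_cast<float>(std::sin(twoPi * i / kSteps));
        }
    }

    // Angles wrap around naturally via the modulo.
    float lookupSin(unsigned angle) const { return sine[angle % kSteps]; }

    // cos(a) == sin(a + quarter turn), so one table serves both.
    float lookupCos(unsigned angle) const {
        return sine[(angle + kSteps / 4) % kSteps];
    }
};
```

Whether this beats calling sin and cos directly depends on the target; on a desktop CPU the table competes for cache with everything else, so it is worth measuring.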
Some things are less obvious. For instance traversing a two dimensional array, you might do something like
for (x=0;x<maxx;x++)
for (y=0;y<maxy;y++)
You might find the processor cache likes it better if you do:
for (y=0;y<maxy;y++)
for (x=0;x<maxx;x++)
or vice versa.
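To spell out why the order matters in C and C++: a two dimensional array is stored row by row, so the loop whose index moves along a row should be the inner one. A minimal sketch, assuming a maxy-by-maxx grid kept in one contiguous vector (the function names are made up):

```cpp
#include <cstddef>
#include <vector>

// Element (x, y) of the grid lives at index y * maxx + x.

// Inner loop walks consecutive addresses: cache friendly.
double sumRowMajor(const std::vector<double>& grid,
                   std::size_t maxx, std::size_t maxy) {
    double total = 0.0;
    for (std::size_t y = 0; y < maxy; ++y)
        for (std::size_t x = 0; x < maxx; ++x)
            total += grid[y * maxx + x];
    return total;
}

// Inner loop jumps maxx elements at a time: same result, but each
// access may touch a different cache line once the grid is large.
double sumColumnMajor(const std::vector<double>& grid,
                      std::size_t maxx, std::size_t maxy) {
    double total = 0.0;
    for (std::size_t x = 0; x < maxx; ++x)
        for (std::size_t y = 0; y < maxy; ++y)
            total += grid[y * maxx + x];
    return total;
}
```

For small grids that fit in cache the two are indistinguishable; the gap only shows up once a row no longer fits.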
Don't do loop unrolling. Don't do Duff's device. Make your loops as small as possible, anything else inhibits x86 performance and gcc optimizer performance.
Getting rid of branches can be useful, though - so getting rid of loops completely is good, and those branchless math tricks really do work. Beyond that, try never to go out of the L2 cache - this means a lot of precalculation/caching should also be avoided if it wastes cache space.
And, especially for x86, try to keep the number of variables in use at any one time down. It's hard to tell what compilers will do with that kind of thing, but usually having fewer loop iteration variables/array indexes will end up with better asm output.
Of course, this is for desktop CPUs; a slow CPU with fast memory access can precalculate a lot more, but in these days that might be an embedded system with little total memory anyway…
I've found that changing from a pointer to indexed access may make a difference; the compiler has different instruction forms and register usages to choose from. Vice versa, too. This is extremely low-level and compiler dependent, though, and only good when you need that last few percent.
for (i = 0; i < n; ++i)
*p++ = ...; // some complicated expression
// versus:
for (i = 0; i < n; ++i)
p[i] = ...; // some complicated expression
Optimizing cache locality - for example when multiplying two matrices that don't fit into cache.
Allocating with new on a pre-allocated buffer using C++'s placement new.
Counting down a loop. It's cheaper to compare against 0 than N:
for (i = N; --i >= 0; ) ...
Shifting and masking by powers of two is cheaper than division and remainder, / and %
#define WORD_LOG 5
#define SIZE (1 << WORD_LOG)
#define MASK (SIZE - 1)
uint32_t bits[K];
void set_bit(unsigned i)
{
    bits[i >> WORD_LOG] |= (1u << (i & MASK));
}
Here
(i >> WORD_LOG) == (i / SIZE) and
(i & MASK) == (i % SIZE)
because SIZE is 32, or 2^5.
Jon Bentley's Writing Efficient Programs is a great source of low- and high-level techniques -- if you can find a copy.
Eliminating branches (if/elses) by using boolean math:
if(x == 0)
x = 5;
// becomes:
x += (x == 0) * 5;
// if '5' were a power of two, let's say 4:
x += (x == 0) << 2;
// divide by 2 if flag is set
sum >>= (blendMode == BLEND);
This REALLY speeds things up, especially when those ifs are in a loop or somewhere that is being called a lot.
The one from Assembler:
xor ax, ax
instead of:
mov ax, 0
Classical optimization for program size and performance.
In SQL, if you only need to know whether any data exists or not, don't bother with COUNT(*):
SELECT 1 FROM table WHERE some_primary_key = some_value
If your WHERE clause is likely to return multiple rows, add a LIMIT 1 too.
(Remember that databases can't see what your code's doing with their results, so they can't optimise these things away on their own!)
Recycling the frame-pointer all of a sudden
Pascal calling-convention
Rewrite stack-frame tail call optimization (although it sometimes messes with the above)
Using vfork() instead of fork() before exec()
And one I am still looking for, an excuse to use: data driven code-generation at runtime
Liberal use of __restrict to eliminate load-hit-store stalls.
Rolling up loops.
Seriously, the last time I needed to do anything like this was in a function that took 80% of the runtime, so it was worth trying to micro-optimize if I could get a noticeable performance increase.
The first thing I did was to roll up the loop. This gave me a very significant speed increase. I believe this was a matter of cache locality.
The next thing I did was add a layer of indirection, and put some more logic into the loop, which allowed me to only loop through the things I needed. This wasn't as much of a speed increase, but it was worth doing.
If you're going to micro-optimize, you need to have a reasonable idea of two things: the architecture you're actually using (which is vastly different from the systems I grew up with, at least for micro-optimization purposes), and what the compiler will do for you.
A lot of the traditional micro-optimizations trade space for time. Nowadays, using more space increases the chances of a cache miss, and there goes your performance. Moreover, a lot of them are now done by modern compilers, and typically better than you're likely to do them.
Currently, you should (a) profile to see if you need to micro-optimize, and then (b) try to trade computation for space, in the hope of keeping as much as possible in cache. Finally, run some tests, so you know if you've improved things or screwed them up. Modern compilers and chips are far too complex for you to keep a good mental model, and the only way you'll know if some optimization works or not is to test.
In addition to Joshua's comment about code generation (a big win), and other good suggestions, ...
I'm not sure if you would call it "low-level", but (and this is downvote-bait) 1) stay away from using any more levels of abstraction than absolutely necessary, and 2) stay away from event-driven notification-style programming, if possible.
If a computer executing a program is like a car running a race, a method call is like a detour. That's not necessarily bad except there's a strong temptation to nest those things, because once you've written a method call, you tend to forget what that call could cost you.
If you're relying on events and notifications, it's because you have multiple data structures that need to be kept in agreement. This is costly, and should only be done if you can't avoid it.
In my experience, the biggest performance killers are too much data structure and too much abstraction.
I was amazed at the speedup I got by replacing a for loop adding numbers together in structs:
const unsigned long SIZE = 100000000;
typedef struct {
int a;
int b;
int result;
} addition;
addition *sum;
void start() {
    unsigned int byte_count = SIZE * sizeof(addition);
    sum = malloc(byte_count);
    unsigned int i = 0;
    if (i < SIZE) {
        do {
            sum[i].a = i;
            sum[i].b = i;
            i++;
        } while (i < SIZE);
    }
}
void test_func() {
    unsigned int i = 0;
    if (i < SIZE) { // this is about 30% faster than the more obvious for loop, even with O3
        do {
            addition *s1 = &sum[i];
            s1->result = s1->b + s1->a;
            i++;
        } while ( i<SIZE );
    }
}
void finish() {
Why doesn't gcc optimise for loops into this? Or is there something I missed? Some cache effect?
What is the best method for comparing IEEE floats and doubles for equality? I have heard of several methods, but I wanted to see what the community thought.
The best approach I think is to compare ULPs.
bool is_nan(float f)
{
    return (*reinterpret_cast<unsigned __int32*>(&f) & 0x7f800000) == 0x7f800000 && (*reinterpret_cast<unsigned __int32*>(&f) & 0x007fffff) != 0;
}
bool is_finite(float f)
{
    return (*reinterpret_cast<unsigned __int32*>(&f) & 0x7f800000) != 0x7f800000;
}
// if this symbol is defined, NaNs are never equal to anything (as is normal in IEEE floating point)
// if this symbol is not defined, NaNs are hugely different from regular numbers, but might be equal to each other
#define UNEQUAL_NANS 1
// if this symbol is defined, infinities are never equal to finite numbers (as they're unimaginably greater)
// if this symbol is not defined, infinities are 1 ULP away from +/- FLT_MAX
// test whether two IEEE floats are within a specified number of representable values of each other
// This depends on the fact that IEEE floats are properly ordered when treated as signed magnitude integers
bool equal_float(float lhs, float rhs, unsigned __int32 max_ulp_difference)
{
    if(is_nan(lhs) || is_nan(rhs))
        return false;
    if((is_finite(lhs) && !is_finite(rhs)) || (!is_finite(lhs) && is_finite(rhs)))
        return false;
    signed __int32 left(*reinterpret_cast<signed __int32*>(&lhs));
    // transform signed magnitude ints into 2s complement signed ints
    if(left < 0)
        left = 0x80000000 - left;
    signed __int32 right(*reinterpret_cast<signed __int32*>(&rhs));
    // transform signed magnitude ints into 2s complement signed ints
    if(right < 0)
        right = 0x80000000 - right;
    if(static_cast<unsigned __int32>(std::abs(left - right)) <= max_ulp_difference)
        return true;
    return false;
}
A similar technique can be used for doubles. The trick is to convert the floats so that they're ordered (as if integers) and then just see how different they are.
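For compilers without __int32, the same idea can be sketched with standard types. This is a variant rather than a drop-in replacement for the code above: it assumes float is a 32-bit IEEE 754 type, copies the bits with std::memcpy instead of reinterpret_cast, widens to 64 bits so the subtraction cannot overflow, and treats infinities as equal only to themselves. orderedBits and almostEqualUlps are names chosen for this sketch.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Map a float's bit pattern to an integer that increases monotonically
// with the float's value (+0.0 and -0.0 both map to 0).
std::int64_t orderedBits(float f) {
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);  // well-defined way to read the bits
    if (u & 0x80000000u) {
        // Negative float: sign-magnitude becomes a negative integer.
        return -static_cast<std::int64_t>(u & 0x7fffffffu);
    }
    return static_cast<std::int64_t>(u);
}

// True if a and b are at most maxUlps representable values apart.
bool almostEqualUlps(float a, float b, std::uint32_t maxUlps) {
    if (std::isnan(a) || std::isnan(b)) return false;   // NaN equals nothing
    if (std::isinf(a) || std::isinf(b)) return a == b;  // policy choice
    std::int64_t diff = orderedBits(a) - orderedBits(b);
    if (diff < 0) diff = -diff;
    return diff <= static_cast<std::int64_t>(maxUlps);
}
```

For example, almostEqualUlps(1.0f, std::nextafter(1.0f, 2.0f), 1) holds because the two values are adjacent floats.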
I have no idea why this damn thing is screwing up my underscores. Edit: Oh, perhaps that is just an artefact of the preview. That's OK then.
The current version I am using is this
bool is_equals(float A, float B,
               float maxRelativeError, float maxAbsoluteError)
{
    if (fabs(A - B) < maxAbsoluteError)
        return true;
    float relativeError;
    if (fabs(B) > fabs(A))
        relativeError = fabs((A - B) / B);
    else
        relativeError = fabs((A - B) / A);
    if (relativeError <= maxRelativeError)
        return true;
    return false;
}
This seems to take care of most problems by combining relative and absolute error tolerance. Is the ULP approach better? If so, why?
@DrPizza: I am no performance guru but I would expect fixed point operations to be quicker than floating point operations (in most cases).
It rather depends on what you are doing with them. A fixed-point type with the same range as an IEEE float would be many many times slower (and many times larger).
Things suitable for floats:
3D graphics, physics/engineering, simulation, climate simulation....
In numerical software you often want to test whether two floating point numbers are exactly equal. LAPACK is full of examples for such cases. Sure, the most common case is where you want to test whether a floating point number equals "Zero", "One", "Two", "Half". If anyone is interested I can pick some algorithms and go more into detail.
Also in BLAS you often want to check whether a floating point number is exactly Zero or One. For example, the routine dgemv can compute operations of the form
y = beta*y + alpha*A*x
y = beta*y + alpha*A^T*x
y = beta*y + alpha*A^H*x
So if beta equals One you have a "plus assignment" and for beta equals Zero a "simple assignment". So you certainly can cut the computational cost if you give these (common) cases a special treatment.
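A stripped-down sketch of that special treatment, using a plain vector x in place of the matrix-vector product so it stays short. scaleAndAdd is a made-up name and this is not the actual dgemv code; it only shows where the exact comparisons come in.

```cpp
#include <cstddef>
#include <vector>

// Computes y = beta*y + alpha*x element by element.
// Assumes x has at least as many elements as y.
void scaleAndAdd(double alpha, const std::vector<double>& x,
                 double beta, std::vector<double>& y) {
    const std::size_t n = y.size();
    if (beta == 0.0) {
        // Simple assignment: the old contents of y are never read.
        for (std::size_t i = 0; i < n; ++i) y[i] = alpha * x[i];
    } else if (beta == 1.0) {
        // Plus assignment: no multiplication of y needed.
        for (std::size_t i = 0; i < n; ++i) y[i] += alpha * x[i];
    } else {
        // General case.
        for (std::size_t i = 0; i < n; ++i) y[i] = beta * y[i] + alpha * x[i];
    }
}
```

The exact test matters for more than speed: with beta exactly zero, y can hold uninitialized or non-finite values on input without contaminating the result, which a tolerance-based "close to zero" check followed by a multiplication would not guarantee.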
Sure, you could design the BLAS routines in such a way that you can avoid exact comparisons (e.g. using some flags). However, the LAPACK is full of examples where it is not possible.
There are certainly many cases where you don't want to check for "is exactly equal". For many people this even might be the only case they ever have to deal with. All I want to point out is that there are other cases too.
Although LAPACK is written in Fortran the logic is the same if you are using other programming languages for numerical software.
Oh dear lord please don't interpret the float bits as ints unless you're running on a P6 or earlier.
Even if it causes it to copy from vector registers to integer registers via memory, and even if it stalls the pipeline, it's the best way to do it that I've come across, insofar as it provides the most robust comparisons even in the face of floating point errors.
i.e. it is a price worth paying.
This seems to take care of most problems by combining relative and absolute error tolerance. Is the ULP approach better? If so, why?
ULPs are a direct measure of the "distance" between two floating point numbers. This means that they don't require you to conjure up the relative and absolute error values, nor do you have to make sure to get those values "about right". With ULPs, you can express directly how close you want the numbers to be, and the same threshold works just as well for small values as for large ones.
If you have floating point errors you have even more problems than this. Although I guess that is up to personal perspective.
Even if we do the numeric analysis to minimize accumulation of error, we can't eliminate it and we can be left with results that ought to be identical (if we were calculating with reals) but differ (because we cannot calculate with reals).
If you are looking for two floats to be equal, then they should be identically equal in my opinion. If you are facing a floating point rounding problem, perhaps a fixed point representation would suit your problem better.
Perhaps we cannot afford the loss of range or performance that such an approach would inflict.
@DrPizza: I am no performance guru but I would expect fixed point operations to be quicker than floating point operations (in most cases).
@Craig H: Sure. I'm totally okay with it printing that. If a or b store money then they should be represented in fixed point. I'm struggling to think of a real world example where such logic ought to be applied to floats. Things suitable for floats:
real world values (like from an ADC)
For all these things, either you crunch the numbers and simply present the results to the user for human interpretation, or you make a comparative statement (even if such a statement is, "this thing is within 0.001 of this other thing"). A comparative statement like mine is only useful in the context of the algorithm: the "within 0.001" part depends on what physical question you're asking. That's my 0.02. Or should I say 2/100ths?
It rather depends on what you are doing with them. A fixed-point type with the same range as an IEEE float would be many many times slower (and many times larger).
Okay, but if I want an infinitesimally small bit-resolution then it's back to my original point: == and != have no meaning in the context of such a problem.
An int lets me express ~10^9 values (regardless of the range) which seems like enough for any situation where I would care about two of them being equal. And if that's not enough, use a 64-bit OS and you've got about 10^19 distinct values.
I can express values in a range of 0 to 10^200 (for example) in an int; it is just the bit-resolution that suffers (resolution would be greater than 1, but, again, no application has that sort of range as well as that sort of resolution).
To summarize, I think in all cases one either is representing a continuum of values, in which case != and == are irrelevant, or one is representing a fixed set of values, which can be mapped to an int (or another fixed-precision type).
An int lets me express ~10^9 values (regardless of the range) which seems like enough for any situation where I would care about two of them being equal. And if that's not enough, use a 64-bit OS and you've got about 10^19 distinct values.
I have actually hit that limit... I was trying to juggle times in ps and time in clock cycles in a simulation where you easily hit 10^10 cycles. No matter what I did I very quickly overflowed the puny range of 64-bit integers... 10^19 is not as much as you think it is, gimme 128 bits computing now!
Floats allowed me to get a solution to the mathematical issues, as the values overflowed with lots of zeros at the low end. So you basically had a decimal point floating around in the number with no loss of precision (I could live with the more limited number of distinct values allowed in the mantissa of a float compared to a 64-bit int, but desperately needed the range!).
And then things converted back to integers to compare etc.
Annoying, and in the end I scrapped the entire attempt and just relied on floats and < and > to get the work done. Not perfect, but works for the use case envisioned.
If you are looking for two floats to be equal, then they should be identically equal in my opinion. If you are facing a floating point rounding problem, perhaps a fixed point representation would suit your problem better.
Perhaps I should explain the problem better. In C++, the following code:
#include <iostream>
using namespace std;
int main()
{
    float a = 1.0;
    float b = 0.0;
    for(int i=0;i<10;++i)
    {
        b += 0.1;
    }
    if(a != b)
    {
        cout << "Something is wrong" << endl;
        return 1;
    }
}
prints the phrase "Something is wrong". Are you saying that it should?