I am wondering if it's possible to multiply a Cycles type variable in gem5. What I want to represent is a certain latency that is added n times. So something like this:
return lookupLatency * n;
I'm getting this error:
error: could not convert '(((const BaseCache*)this)->BaseCache::lookupLatency.Cycles::operator uint64_t() * ((uint64_t)n))' from 'uint64_t {aka long unsigned int}' to 'Cycles'
Is there any way to do that quickly?
Cycles is just a wrapper class around uint64_t, and you can check its functionality in src/base/types.hh.
That being said, the constructor is explicit, and you are trying to implicitly create a Cycles variable from lookupLatency * n, which is a uint64_t. Just call the constructor to make it Cycles again:
return Cycles(lookupLatency * n);
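To see why the explicit wrap is needed, here is a minimal sketch using a stand-in class. SketchCycles and repeatedLatency are hypothetical names invented for this illustration; the real Cycles class in src/base/types.hh has more functionality, and the overflow assert is an extra precaution that is not part of the original answer.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <type_traits>

// Hypothetical stand-in for gem5's Cycles (the real one is in src/base/types.hh).
class SketchCycles {
  public:
    // explicit: a raw uint64_t never silently becomes a SketchCycles.
    explicit constexpr SketchCycles(std::uint64_t count) : count_(count) {}
    // Implicit conversion out to the raw count, as seen in the error message.
    constexpr operator std::uint64_t() const { return count_; }
  private:
    std::uint64_t count_;
};

// This is the property that makes "return lookupLatency * n;" fail to compile.
static_assert(!std::is_convertible<std::uint64_t, SketchCycles>::value,
              "uint64_t must not convert implicitly");

// Hypothetical helper: latency of n back-to-back lookups.
SketchCycles repeatedLatency(SketchCycles lookupLatency, std::uint64_t n) {
    const std::uint64_t raw = lookupLatency;  // implicit conversion out
    // Assumed precaution: the product must not wrap around 64 bits.
    assert(n == 0 || raw <= std::numeric_limits<std::uint64_t>::max() / n);
    return SketchCycles(raw * n);             // explicit conversion back
}
```

The multiplication itself happens on plain integers; only the final result is wrapped again.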
In the math-headers we see
extern float fabsf(float);
extern double fabs(double);
extern long double fabsl(long double);
...
extern float fmodf(float, float);
extern double fmod(double, double);
extern long double fmodl(long double, long double);
Why is there one function for each type?
Isn't this a lot of duplicate code? If I were to, say, write a lerp function or a clamp function, would I need to write one for each type?
Seems like we will have duplicate code where there's only one thing changing – the type.
extern float clampf(float value, float min, float max)
{
if(value > max)
return max;
if(value < min)
return min;
return value;
}
extern double clamp(double value, double min, double max)
{
if(value > max)
return max;
if(value < min)
return min;
return value;
}
Question 1: What is the historical reason for this structure?
Question 2: Should I follow the same pattern? Or should I only implement the double-kind since it is the one which is most common?
Question 3: Or should I just use macros to overcome the type issue altogether?
Historically (circa C89 and before), the math library contained only the double-precision versions of these functions, which is why those versions have no suffix. If you needed to compute the sine of a float, you either wrote your own implementation, or (more likely!) you simply wrote:
float x;
float y = sin(x);
However, this introduces some overhead on modern architectures. Specifically, on the most common architectures today, it is necessary for the compiler to emit code that looks something like this:
convert x to double
call sin
convert result to float
These conversions are pretty fast (about the same as an addition, usually), but they still have some cost. On top of the cost of conversion, sin needs to deliver a result that has ~53 bits of precision, more than half of which are completely wasted if the result is just going to be converted back to single precision. Between these two factors, it is possible for a dedicated single-precision sin routine to be about twice as fast; that’s a significant win for some very frequently-used library functions!
If we look at functions like fabs (and assume that the compiler does not simply inline and lower them), the situation is much, much worse. fabs, on a typical modern architecture, is a simple bitwise-and operation. So the two conversions bracketing the call (if all you have is double) are significantly more expensive than the operation itself, and can easily cause a 5x slowdown. That’s why multiple versions of these functions were added to support each FP type.
If you don’t want to keep track of all of them, you can #include <tgmath.h>, which will infer the correct function to use based on the type of the argument (meaning
sin((float)x)
will generate a call to sinf(x), whereas
sin((long double)x)
will call sinl(x)).
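If the code is compiled as C++ rather than C, the same type-based selection comes from the overloads in <cmath> instead of <tgmath.h>. The sketch below only checks which overload is chosen; magnitude and overloadDemo are hypothetical names used for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <type_traits>

// The result type shows which overload was picked for each argument type.
static_assert(std::is_same<decltype(std::fabs(1.0f)), float>::value,
              "float argument selects the float overload");
static_assert(std::is_same<decltype(std::fabs(1.0)), double>::value,
              "double argument selects the double overload");
static_assert(std::is_same<decltype(std::fabs(1.0L)), long double>::value,
              "long double argument selects the long double overload");

// Hypothetical helper: stays in single precision from argument to result.
float magnitude(float x) {
    return std::fabs(x);
}

void overloadDemo() {
    assert(magnitude(-2.5f) == 2.5f);
    assert(magnitude(4.0f) == 4.0f);
}
```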
In your own code, you usually know a priori what the type of your arguments is, and only need to support one or maybe two types. clamp and lerp in particular are graphics operations, and almost universally are used only in single-precision variants.
Incidentally, the fact that you’re using clamp and lerp is a pretty good indication that you might want to look at writing your code in OpenCL instead of C/Obj-C; the OpenCL math library implements these operations (and many other similar operations) for you, and provides implementations that work with a wide range of basic types, including vectors.
float and double are different data types, same as int and long int. You can use the functions which operate on double on float values and implicit conversion will happen to make it work as expected in most circumstances, but if you use functions which operate on float on double values, you will almost inevitably lose precision.
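The precision loss can be checked directly with a round trip through float. This is a small sketch; survivesFloatRoundTrip is a hypothetical helper name, and it assumes the usual IEEE 754 formats (24-bit significand for float, 53-bit for double).

```cpp
#include <cassert>

// True when value is exactly representable as a float, so narrowing loses nothing.
bool survivesFloatRoundTrip(double value) {
    const float narrowed = static_cast<float>(value);
    return static_cast<double>(narrowed) == value;
}

void roundTripDemo() {
    assert(survivesFloatRoundTrip(0.5));          // exactly representable in both types
    assert(!survivesFloatRoundTrip(16777217.0));  // 2^24 + 1 needs 25 significant bits
    assert(!survivesFloatRoundTrip(0.1));         // rounded differently in float and double
}
```

Going the other way, every float value is exactly representable as a double, which is why passing a float to a double function is safe.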
There are other, longer explanations available, e.g. "What's the difference between a single precision and double precision floating point operation?".
I have a program on which I was trying to perform some loop optimization. It's written in C++ and compiled using gcc.
Eventually, using a profiler, I tracked down more than half the execution time of the loop to the line
double x_component = in.input_vector[in.dimension_to_process] - \
(center_of_bin_0 + (double) nn * grid_distance);
Everything on this line is of type double, with the exception of the loop index nn, which is of type long unsigned int.
The cast from long unsigned int to double generates the assembly instruction fxtod, which the profiler flagged.
As a test I removed the reference to nn from the line, thus removing the cast from unsigned int to double, and the execution time of the loop reduced by almost a half, in a loop that performs about a dozen floating point operations on an Ultrasparc IV processor. I confirmed that this is also the case on an Ultrasparc II.
Is it normal for a cast from int to double to be much more expensive than a cache miss, let alone a floating point multiply? And if so, what does everyone else normally do about it?
A lookup table for all possible values of nn (which in this case have a known limited range) would be faster than this.
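The lookup-table idea could be sketched roughly as follows: the integer-to-double conversions are done once, outside the hot loop, and the loop then reads precomputed values. makeBinCenters and bin_count are hypothetical names; whether this actually beats the in-loop conversion depends on the hardware and the table size, so it would need to be profiled.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Precompute center_of_bin_0 + nn * grid_distance for every nn in [0, bin_count).
std::vector<double> makeBinCenters(double center_of_bin_0,
                                   double grid_distance,
                                   std::size_t bin_count) {
    std::vector<double> centers;
    centers.reserve(bin_count);
    for (std::size_t nn = 0; nn < bin_count; ++nn) {
        // The conversion from integer to double happens here, once per bin.
        centers.push_back(center_of_bin_0 + static_cast<double>(nn) * grid_distance);
    }
    return centers;
}

void binCentersDemo() {
    const std::vector<double> centers = makeBinCenters(1.0, 0.5, 4);
    assert(centers.size() == 4);
    assert(centers[0] == 1.0);
    assert(centers[3] == 2.5);
}
```

Inside the hot loop, the flagged expression would then become a subtraction of centers[nn] from the input component.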
Jumping straight to code, this is what I would like to do:
size_t len = obj->someLengthFunctionThatReturnsTypeSizeT();
array<int>^ a = gcnew array<int>(len);
When I try this, I get the error
conversion from size_t to int, possible loss of data
Is there a way I can get this code to compile without explicitly casting to int? I find it odd that I can't initialize an array to this size, especially because there is a LongLength property (and how could you get a length as a long - bigger than int - if you can only initialize a length as an int?).
Thanks!
P.S.: I did find this article that says that it may be impractical to allocate an array that is truly size_t, but I don't think that is a concern. The point is that the length I would like to initialize to is stored in a size_t variable.
Managed arrays are implemented for using Int32 as indices, there is no way around that. You cannot allocate arrays larger than Int32.MaxValue.
You could use the static method Array::CreateInstance (the overload that takes a Type and an array of Int64), and then cast the resulting System::Array to the appropriate actual array type (e.g. array<int>^). Note that the passed values must not be larger than Int32.MaxValue. And you would still need to cast.
So you have at least two options. Either casting:
// Would truncate the value if it is too large
array<int>^ a = gcnew array<int>((int)len);
or this (no need to cast len, but the result of CreateInstance):
// Throws an ArgumentOutOfRangeException if len is too large
array<int>^ a = (array<int>^)Array::CreateInstance(int::typeid, len);
Personally, I find the first better. You still might want to check the actual size of len so that you don't run into any of the mentioned errors.
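The size check mentioned above might look like the following. This is a sketch in plain C++ rather than C++/CLI, and checkedArrayLength is a hypothetical helper name; the returned value is what would then be passed to gcnew array<int>(...).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <stdexcept>

// Narrow a size_t to a 32-bit signed length, failing loudly instead of truncating.
std::int32_t checkedArrayLength(std::size_t len) {
    const std::size_t limit =
        static_cast<std::size_t>(std::numeric_limits<std::int32_t>::max());
    if (len > limit) {
        throw std::length_error("length does not fit in a 32-bit signed int");
    }
    return static_cast<std::int32_t>(len);
}

void checkedLengthDemo() {
    assert(checkedArrayLength(10) == 10);
    bool rejected = false;
    try {
        checkedArrayLength(static_cast<std::size_t>(2147483648u));  // INT32_MAX + 1
    } catch (const std::length_error&) {
        rejected = true;
    }
    assert(rejected);
}
```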
I have this code:
unsigned int k=(len - sizeof(MSG_INFO));
NSLog(@"%d",k);
for( unsigned int ix = 0; ix < k; ix++)
{
m_pOutPacket->m_buffer[ix] = (char)(pbuf[ix + sizeof(MSG_INFO)]);
}
The problem is, when:
len = 0 and sizeof(MSG_INFO)=68;
k=-68;
With these values, execution enters the for loop and appears to run forever.
Your code says: unsigned int k. So k isn't -68, it's unsigned. This makes k a very big number; with a 4-byte int it would be 4294967228 (2^32 - 68). This is obviously far more than 0, so it's going to take your for loop a while to get that high, although it would terminate eventually.
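The wraparound can be reproduced in isolation. The sketch below uses std::uint32_t to pin the width to 4 bytes, which is an assumption about the platform's unsigned int; wrappedDifference is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstdint>

// Unsigned subtraction is modular: a result below zero wraps to 2^32 + (len - header_size).
std::uint32_t wrappedDifference(std::uint32_t len, std::uint32_t header_size) {
    return len - header_size;
}

void wrapDemo() {
    assert(wrappedDifference(100, 68) == 32u);        // ordinary case
    assert(wrappedDifference(0, 68) == 4294967228u);  // 2^32 - 68, not -68
}
```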
The reason you think that it's -68 is that a function like NSLog has no direct knowledge of the arguments passed in; it determines how to treat them from the format string supplied as the first argument.
You're calling this:
NSLog(@"%d",k);
This tells NSLog to treat the argument as a signed int (%d). You should be doing this:
NSLog(@"%u",k);
So that NSLog treats the argument as the type that it is: unsigned (%u). See the NSLog documentation.
As it stands, I'd expect your buffer to overrun, trashing memory as the loop runs and your application to crash.
After reflecting, I believe @FreeAsInBeer is correct: you don't want to iterate through the for loop in this situation, and you could probably fix this by using signed ints. However, it seems to me that you would be better off checking len > sizeof(MSG_INFO) and, if that isn't the case, handling it differently. In most situations I can think of, I wouldn't want to perform any processing after the for loop if I'd failed to read sufficient information for a message...
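One way to express that check is to compute the payload length through a small guard that never subtracts when the packet is too short. payloadLength is a hypothetical helper name; returning 0 for a short packet is just one possible policy, and the caller may prefer to report an error instead.

```cpp
#include <cassert>
#include <cstddef>

// Number of bytes after the header, or 0 when the packet is too short to hold one.
std::size_t payloadLength(std::size_t len, std::size_t header_size) {
    return len > header_size ? len - header_size : 0;
}

void payloadDemo() {
    assert(payloadLength(100, 68) == 32);  // normal packet
    assert(payloadLength(68, 68) == 0);    // header only
    assert(payloadLength(0, 68) == 0);     // short read: the copy loop runs zero times
}
```

With k computed this way, the loop condition ix < k is false from the start for a short packet, so the buffer is never touched.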
I'm not really sure what is going on here, as the loop should never execute. I've loaded up your code, and it seems that the unsigned part of your int declaration is causing the issues. If you remove both of your unsigned specifiers, your code will execute as it should, without ever entering the loop.
Does it make any difference if I use e.g. short or char type of variable instead of int as a for-loop initializer?
for (int i = 0; i < 10; ++i) {}
for (short i = 0; i < 10; ++i) {}
for (char i = 0; i < 10; ++i) {}
Or maybe there is no difference? Maybe I make things even worse and efficiency decreases? Does using a different type save memory and increase speed? I am not sure, but I suppose that the ++ operator may need to widen the type and, as a result, slow down execution.
It will not make any difference you should be caring about, provided the range you iterate over fits into the type you choose. Performance-wise, you'll probably get the best results when the size of the iteration variable is the same as the platform's native integer size, but any decent compiler will optimize it to use that anyway. On a managed platform (e.g. C# or Java), you don't know the target platform at compile time, and the JIT compiler is basically free to optimize for whatever platform it is running on.
The only thing you might want to watch out for is when you use the loop counter for other things inside the loop; changing the type may change the way these things get executed, up to the point (in C++ at least) that a different overload for a function or method may get called because the loop variable has a different type. An example would be when you output the loop variable through a C++ stream, like so: cout << i << endl;. Similarly, the type of the loop variable can propagate into the implicit types of (sub-)expressions that contain it, and lead to hidden overflows in numeric calculations, e.g.: int j = i * i;.
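The stream example can be made concrete with a small sketch. streamCounters is a hypothetical helper that runs the same loop with different counter types and collects what operator<< writes; the expected strings assume an ASCII execution character set.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Run the same loop with a different counter type and capture the streamed output.
template <typename Counter>
std::string streamCounters() {
    std::ostringstream out;
    for (Counter i = 65; i < 68; ++i) {
        out << i;  // the operator<< overload is chosen by the static type of i
    }
    return out.str();
}

void streamDemo() {
    assert(streamCounters<int>() == "656667");    // printed as numbers
    assert(streamCounters<short>() == "656667");  // still numbers
    assert(streamCounters<char>() == "ABC");      // printed as characters
}
```

The loop body is identical in all three cases; only the counter type changes which overload is called.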