Will code written in this style be optimized out by RVO in C++11? - optimization

I grew up in the days when passing around structures was bad mojo because they are often large, so pointers were always the way to go. Now that C++11 has quite good RVO (right value optimization), I'm wondering if code like the following will be efficient.
As you can see, my class has a bunch of vector structures (not pointers to them). The constructor accepts value structures and stores them away.
My -hope- is that the compiler will use move semantics so that there really is no copying of data going on; the constructor will (when possible) just assume ownership of the values passed in.
Does anyone know if this is true, and happens automagically, or do I need a move constructor with the && syntax and so on?
// ParticleVertex
//
// Class that represents the particle vertices
class ParticleVertex : public Vertex
{
public:
D3DXVECTOR4 _vertexPosition;
D3DXVECTOR2 _vertexTextureCoordinate;
D3DXVECTOR3 _vertexDirection;
D3DXVECTOR3 _vertexColorMultipler;
ParticleVertex(D3DXVECTOR4 vertexPosition,
D3DXVECTOR2 vertexTextureCoordinate,
D3DXVECTOR3 vertexDirection,
D3DXVECTOR3 vertexColorMultipler)
{
_vertexPosition = vertexPosition;
_vertexTextureCoordinate = vertexTextureCoordinate;
_vertexDirection = vertexDirection;
_vertexColorMultipler = vertexColorMultipler;
}
virtual const D3DVERTEXELEMENT9 * GetVertexDeclaration() const
{
return particleVertexDeclarations;
}
};

Yes, indeed you should trust the compiler to optimally "move" the structures:
Want Speed? Pass By Value
Guideline: Don’t copy your function arguments. Instead, pass them by value and let the compiler do the copying
In this case, you'd move the arguments into the constructor call:
ParticleVertex myPV(std::move(pos),
std::move(textureCoordinate),
std::move(direction),
std::move(colorMultipler));
In many contexts, the std::move will be implicit, e.g.
D3DXVECTOR4 getFooPosition() {
D3DXVECTOR4 result;
// bla
return result; // NRVO, std::move only required with MSVC
}
ParticleVertex myPV(getFooPosition(), // implicit rvalue-reference moved

RVO means Return Value Optimization not Right value optimization.
RVO is a optimization performed by the compiler when the return of a function is by value, and its clear that the code returns a temporary object created in the body, so the copy can be avoided. The function returns the created object directly.
What C++11 introduces is Move Semantics. Move semantics allows us to "move" the resource from a certain temporary to a target object.
But, move implies that the object wich the resource comes from, is in a unusable state after the move. This is not the case (I think) you want in your class, because the vertex data is used by the class, even if the user calls to this function or not.
So, use the common return by const reference to avoid copies.
On the other hand,, DirectX provides handles to the resources (Pointers), not the real resource. Pointers are basic types,its copying is cheap, so don't worry about performance. In your case, you are using 2d/3d vectors. Its copying is cheap too.
Personally, I think that returning a pointer to an internal resource is a very bad idea, always. I think that in this case the best aproach is to return by const reference.

Related

Is Kotlin synchronized() not locking basic types?

class Notification(val context: Context, title: String, message: String) {
private val channelID = "TestMessages"
companion object ID {
var s_notificationID = -1
}
init {
var notificationID = -1
synchronized(s_notificationID) {
if (++s_notificationID == 0)
createNotificationChannel()
notificationID = s_notificationID
}
The above is being called simultaneously from two threads. A breakpoint in createNotificationChannel() clearly showed that sometimes s_notificationID equals 1.
However, if I change
synchronized(s_notificationID)
to synchronized(ID)
then it seems to lock fine.
Is synchronized() not locking basic types? And if so, why does it compile?
A look at the generated JVM bytecode indicates that the ID example looks like
synchronized(ID) { ... }
which is what you'd expect. However, the s_notificationID example looks more like
synchronized(Integer.valueOf(s_notificationID)) { ... }
In Java, we can only synchronize on objects, not on primitives. Kotlin mostly removes this distinction, but it looks like you've found one place where the implementation still seeps through. Since s_notificationID is an int as far as the JVM is concerned (hence, not an object) but synchronized expects an object, Kotlin is "smart" enough to wrap the value in Integer.valueOf on demand. Unfortunately for you, that produces wildly inconsistent results, because
This method will always cache values in the range -128 to 127, inclusive, and may cache other values outside of this range.
So for small numbers, this is guaranteed to lock on some cached object in memory that you don't control. For large ones, it may be a fresh object (hence always unlocked) or it might again end up on a cached object out of your hands.
The lesson here, it seems, is: Don't synchronize on primitive types.
Silvio Mayolo explained why it is not a good idea to synchronize on primitives (actually, I think the compiler should warn about this). But I believe there is another problem with this code, probably the main one that makes your synchronized blocks work in parallel.
The problem is that you replace the value of s_notificationID. Even if it would be an object, not a primitive, your synchronized blocks would still run in parallel, because each call to synchronized uses a different object. This is why in Java we usually synchronize on this and not on a field that we need to modify.
TL;DR The lesson here, it seems, is: Don't synchronize on primitive types.
synchronized(i) where i is Int, is actually synchronized(Integer.valueOf(i)).
Only in the range -128 to 127 this value is guaranteed to be a cached value.
Another fact is that ++i cannot be looked at as a mutation of the "object" i, but rather as replacing i by a new "object" with the value i+1.
Thank you broot & Silvio Mayolo for the above.
Experiments I did prove the above.
In my original code I have removed the ++ from
++s_notificationID. Amazingly or not, the lock worked now.
Now with that change I changed var s_notificationID = -1 to be var s_notificationID = -1000. Even more amazing, now the lock again stopped working.
Still, I think this anomaly of basic types undermines the attempt of Kotlin to see basic types as objects, and I think this should have been mentioned clearly in Kotlin documentation.

populate object from command line and check object state

I populate an object based on the users input from the commandline.
The object needs to have a certain amount of data to proceed. My solution so far is nested if-statements to check if the object is ready. Like below example.
Maybe 3 if-statements aren't so bad(?) but what if that number of if-statements starts to increase? What are my alternatives here? Let's say that X, Y and Z are three completely different things. For example let's say that object.X is a list of integers and object.Y is a string and maybe Z is some sort of boolean to return true only if object.Y has a certain amount of values?
I'm not sure polymorhism will work in this case?
do
{
if (object.HasX)
{
if (object.HasY)
{
if (object.HasZ)
{
//Object is ready to proceed.
}
else
{
//Object is missing Z. Handle it...
}
}
else
{
//Object is missing Y. Handle it...
}
}
else
{
//Object is missing X. Handle it...
}
} while (!String.IsNullOrEmpty(line));
For complex logic workflow, I have found, it's important for maintainability to decide which level of abstraction the logic should live in.
Will new logic/parsing rules have to be added regularly?
Unfortunately, there isn't a way to avoid having to do explicit conditionals, they have to live somewhere.
Some things that can help keep it clean could be:
Main function is only responsible for converting command line arguments to native datatypes, then it pushes the logic down to an object builder class, This will keep main function stable and unchanged, except for adding flag descriptions, THis should keep the logic out of the domain, and centralized to the builder abstraction
Main function is responsible for parsing and configuring the domain, this isolates all the messy conditionals in the main/parsing function and keeps the logic outside of the domain models
Flatten the logic, if not object.hasX; return, next step you know has.X, this will still have a list of conditionals but will be flatter
Create a DSL declarative rule language (more apparent when flattening). This could be a rule processor, where the logic lives, then the outer main function could define that states that are necessary to proceed

No constructor found in the absence of writeln

I am using DMD64 D Compiler v2.063.2 on Ubuntu 13.04 64-bit.
I have written a class as below:
class FixedList(T){
// list
private T[] list;
// number of items
private size_t numberOfItems;
// capacity
private size_t capacity;
// mutex
private Mutex listMutex;
// get capacity
#property public size_t Capacity(){ return capacity; }
#property public shared size_t Capacity(){ return capacity; }
// constructor
public this( size_t capacity ){
// initialise
numberOfItems = 0;
this.capacity = capacity;
writeln("Cons Normal");
}
// constructor
public shared this( size_t capacity ){
// initialise
numberOfItems = 0;
this.capacity = capacity;
// create mutex
listMutex = cast(shared)(new Mutex());
writeln("Cons Shared");
}
}
While class is written in this way, in main function, I wrote that code:
auto list1 = new shared FixedList!int( 128 );
auto list2 = new FixedList!int( 128 );
Output with this, there is no error at all and the output is as below:
Cons Shared
Cons Normal
What I do next is to remove both writeln lines from the code, and when I recompile the code, it starts showing error messages as below:
src/webapp.d(61): Error: constructor lists.FixedList!(int).FixedList.this called with argument types:
((int) shared)
matches both:
lists.d(28): lists.FixedList!(int).FixedList.this(ulong capacity)
and:
lists.d(37): lists.FixedList!(int).FixedList.this(ulong capacity)
src/app.d(61): Error: no constructor for FixedList
src/app.d(62): Error: constructor lists.FixedList!(int).FixedList.this called with argument types:
((int))
matches both:
lists.d(28): lists.FixedList!(int).FixedList.this(ulong capacity)
and:
lists.d(37): lists.FixedList!(int).FixedList.this(ulong capacity)
src/app.d(62): Error: no constructor for FixedList
make: *** [all] Error 1
Basically writeln function is preventing the error. Actually writeln is preventing in many places and I am not sure about why this is happening.
I even tried to compile the the code with m32 flag for 32-bit, but it is still same. Am I doing something wrong, or is this a bug?
pure, nothrow, and #safe are inferred for template functions. As FixedList is templated, its constructors are templated. writeln is not (and cannot be) pure as it does I/O. So, while writeln is in the constructors, they are inferred to not be pure, but everything else that the constructors are doing is pure, so without the calls to writeln, they become pure.
Under some circumstances, the compiler is able to alter the return type of pure functions to implicitly convert it to immutable or shared. This works, because in those cases, the compiler knows that what's being returned is a new, unique object and that casting it to immutable or shared would not violate the type system. Not all pure functions qualify, as the parameter types can affect whether the compiler can guarantee that the return value is unique, but many pure functions are able to take advantage of this and implicitly convert their return value to immutable or shared. This is useful, because it can avoid code duplication (for different return types) or copying - since if the type returned doesn't match what you need with regards to immutable or shared, and you can't guarantee that it's not referred to elsewhere, you have to copy it to get the type that you want. In this case, the compiler is able to make the guarantee that the object is not referred to elsewhere, so it can safely cast it for you.
Constructors effectively return new values, so they can be affected by this feature. This makes it so that if a constructor is pure, you can often construct immutable and shared values from it without having to duplicate the constructor (like you'd have to do if it weren't pure). As with other pure functions, whether this works or not depends on the constructor's parameter types, but it's frequently possible, and it helps avoid code duplication.
What's causing you problems is that when FixedList's constructors are both pure, the compiler is able to use either of them to construct a shared object. So, it doesn't know which one to choose, and gives you an ambiguity error.
I've reported this as a bug on the theory that the compiler should probably prefer the constructer which is explicitly marked as shared, but what the compiler devs will decide, I don't know. The ability to implicitly convert return values from pure functions is a fairly new feature and exactly when we can and can't do those implicit conversions is still being explored, which can result both in unanticipated problems (like this one probably is) as well as compiler bugs (e.g. there's at least one case with immutable, where it currently does the conversion when it shouldn't). I'm sure that these issues will be ironed out fairly quickly though.
A pure constructor can build a shared object without being marked shared itself.
Apparently, pureness is inferred for constructors.
writeln is not pure. So, with it in place, the constructors are not pure.
When writeln is removed, the constructors become pure. Both constructors now match the shared call.

How a programmers solve the dilemma of using old variables instead of new variables?

For example:
... some code
int sizeOfSomeObject = someObject.length();
... some code, sizeOfSomeObject is not need anymore
now I need other int variable for other action(for example, for position in some object), and i have the dilemma: create a new variable or use sizeOfSomeObject for this. In the first case I will keep readability, but lose performance. In the second case - on the contrary. What usually do programmers in this situation?
In the first case I will keep readability, but lose performance. In the second case - on the contrary.
So did you benchmark it? I suspect no, you didn't. Most modern compilers do a lot of agressive analysis during register allocation, so if the optimizer perceives that there's a variable that's not used anymore, but there's a new variable of the same type, it will just merge the two variables to the same memory region or processor register. No need to worry about performance penalties.
And anyway, don't do premature optimization (which this is). In 90% of the cases, readability is more important than "performance".
All in all, go ahead and create a new variable with an appropriate, different, descriptive name. And just for fun, compile this version and the version in which you used the same variable name, and look at the generated assembly (or bytecode, or...) - and find out that they're identical.
I would use different named variables for different things.
In terms of something like this, I don't think just one variable would cause a massive performance hit. In most languages you have the option to clear variables from memory in some way when they are no longer in use, so I would recommend doing that so that the code means something to you or others when read at a later date.
In C++, you can use blocks for objects to be destroyed as soon as they are not needed anymore:
void some_function () {
{
MyClass c;
// ... here we use c ...
}
// now c has been destroyed
{
MyClass d;
// ... here we use d ...
}
// now d has been destroyed
}
In your example (with int variables), there is no reason to worry about performance. The worst thing that could probably happen is memory for two variables being used instead of one, but (i) that's negligible and (ii) int's will probably live in a CPU register, anyway. If you really worry, use the block approach for your int example.
It depends how often such an int would be initialized. If it's not in some hugely nested for loop, most (all) programmers will go for the first. Besides, most modern programming languages have a garbage collector, which cleans up left over objects.
Decent compiler will optimize out your second variable, so that shouldn't be an issue.
That said, there are situations where variable reuse makes sense. E.g., you might have some variable that holds a generic output populated from call to some external API. According to the context and parameters passed to the API you'll process the data differently but it's probably better (more readable etc.) to reuse the same data variable.
For example, something like this:
void* data = getSomeData(params);
//process data
//change params
data = getSomeData(params);
//process data
//change params
data = getSomeData(params);

std::unique_ptr and pointer-to-pointer

I've been using std::unique_ptr to store some COM resources, and provided a custom deleter function. However, many of the COM functions want pointer-to-pointer. Right now, I'm using the implementation detail of _Myptr, in my compiler. Is it going to break unique_ptr to be accessing this data member directly, or should I store a gajillion temporary pointers to construct unique_ptr rvalues from?
COM objects are reference-countable by their nature, so you shouldn't use anything except reference-counting smart pointers like ATL::CComPtr or _com_ptr_t even if it seems inappropriate for your usecase (I fully understand your concerns, I just think you assign too much weight to them). Both classes are designed to be used in all valid scenarios that arise when COM objects are used, including obtaining the pointer-to-pointer. Yes, that's a bit too much functionality, but if you don't expect any specific negative consequences you can't tolerate you should just use those classes - they are designed exactly for this purpose.
I've had to tackle the same problem not too long ago, and I came up with two different solutions:
The first was a simple wrapper that encapsulated a 'writeable' pointer and could be std::moved into my smart pointer. This is just a little more convenient that using the temp pointers you are mentioning, since you cannot define the type directly at the call-site.
Therefore, I didn't stick with that. So what I did was a Retrieve helper-function that would get the COM function and return my smart-pointer (and do all the temporary pointer stuff internally). Now this trivially works with free-functions that only have a single T** parameter. If you want to use this on something more complex, you can just pass in the call via std::bind and only leave the pointer-to-be-returned free.
I know that this is not directly what you're asking, but I think it's a neat solution to the problem you're having.
As a side note, I'd prefer boost's intrusive_ptr instead of std::unique_ptr, but that's a matter of taste, as always.
Edit: Here's some sample code that's transferred from my version using boost::intrusive_ptr (so it might not work out-of-the box with unique_ptr)
template <class T, class PtrType, class PtrDel>
HRESULT retrieve(T func, std::unique_ptr<PtrType, PtrDel>& ptr)
{
ElementType* raw_ptr=nullptr;
HRESULT result = func(&raw_ptr);
ptr.reset(raw_ptr);
return result;
}
For example, it can be used like this:
std::unique_ptr<IFileDialog, ComDeleter> FileDialog;
/*...*/
using std::bind;
using namespace std::placeholders;
std::unique_ptr<IShellItem, ComDeleter> ShellItem;
HRESULT status = retrieve(bind(&IFileDialog::GetResult, FileDialog, _1), ShellItem);
For bonus points, you can even let retrieve return the unique_ptr instead of taking it by reference. The functor that bind generates should have signature typedefs to derive the pointer type. You can then throw an exception if you get a bad HRESULT.
C++0x smart pointers have a portable way to get at the raw pointer container .get() or release it entirely with .release(). You could also always use &(*ptr) but that is less idiomatic.
If you want to use smart pointers to manage the lifetime of an object, but still need raw pointers to use a library which doesn't support smart pointers (including standard c library) you can use those functions to most conveniently get at the raw pointers.
Remember, you still need to keep the smart pointer around for the duration you want the object to live (so be aware of its lifetime).
Something like:
call_com_function( &my_uniq_ptr.get() ); // will work fine
return &my_localscope_uniq_ptr.get(); // will not
return &my_member_uniq_ptr.get(); // might, if *this will be around for the duration, etc..
Note: this is just a general answer to your question. How to best use COM is a separate issue and sharptooth may very well be correct.
Use a helper function like this.
template< class T >
T*& getPointerRef ( std::unique_ptr<T> & ptr )
{
struct Twin : public std::unique_ptr<T>::_Mybase {};
Twin * twin = (Twin*)( &ptr );
return twin->_Myptr;
}
check the implementation
int wmain ( int argc, wchar_t argv[] )
{
std::unique_ptr<char> charPtr ( new char[25] );
delete getPointerRef(charPtr);
getPointerRef(charPtr) = 0;
return charPtr.get() != 0;
}