OpenMP - Iterator in the For-loop - iterator

#pragma omp parallel for reduction(+ : numOfVecs)
for(itc=clus.begin() ; itc!=clus.end() ; itc++)
{
numOfVecs += (*itc)->getNumOfVecs();
}
I have a couple of loops like the one above where I need iterators. But I get the error 'invalid controlling predicate'. Is there any way to overcome this?
Thanks in advance
By the way, I'm using the latest versions of Code::Blocks and MinGW. I'm new to this, but I think they support OpenMP 3.0 by default with -fopenmp. The iterator I'm using is a list iterator.

std::list<T>::iterator is a bidirectional iterator. As far as I know, the OpenMP 3.0 parallel for loop works with random access iterators only (and does not allow != as the loop condition, as ejd mentioned). Maybe you can use a std::vector instead.

Related

How to disable all optimization when using COSMIC compiler?

I am using the COSMIC compiler in the STVD IDE, and even though optimization is turned off with -no (the documentation says "-no: do not use optimizer"), some lines of code get removed: a breakpoint cannot be placed on them, nor are they to be found in the disassembly.
I tried setting -oc (leave removed instructions as comments), which did not even show the removed lines as comments.
bool foo(void)
{
uint8_t val;
if (globalvar > 5)
val = 0;
for (val = 0; val < 8; val++)
{
/* some code... */
}
return true;
}
I do know it seems idiotic to set val to 0 prior to the for loop, but let's just assume it is necessary for some reason. When I set no optimization, I expect the code not to be optimized, but instead the val = 0; gets removed without a trace.
I am not looking for a workaround like declaring val volatile, which solves the problem. I am rather looking for a way to prevent the optimization, or at least to understand/know what changes are made to my code when compiling.
It is not clear from the manual, but it seems that the -no option prevents assembly level optimisation. It seems possible that the code generator stage that runs before assembly optimisation may perform higher level optimisation such as redundant code removal.
From the manual:
-cp
disable the constant propagation optimization. By default,
when a variable is assigned with a constant, any subsequent access to that variable is replaced by the constant
itself until the variable is modified or a flow break is
encountered (function call, loop, label ...).
It seems that it is this constant propagation feature that you must explicitly disable.
It is unusual perhaps, but it appears that this compiler optimises by default, distinguishes between compiler optimisations and assembler optimisations (the latter performed on the generated assembly), and then makes you switch off each individual optimisation separately.
To avoid this in the code, rather than switching the optimisation off globally, you could initialise val to a non-zero value in this case:
uint8_t val = -1;
Then the later assignment to zero will require explicit code. This has the advantage over volatile perhaps in that it will not block optimisations when you do enable them.
I believe that this behaviour is allowed by the C language specification.
You are effectively writing the same value either once or twice to the same variable on successive lines of code. The compiler could assign this value to either a processor register or a memory location as it sees fit and knows that the value following the initial assignment in the for loop is the same as the value assigned when the if clause is actioned. As a result the language spec allows the compiler to throw the redundant code away.
The way to force the compiler to perform all read and write accesses to the variable is to use the volatile keyword. That is what it is for.

How to use Callgrind to profile specific functions?

Following this, I wrapped my functions with CALLGRIND_xxx_INSTRUMENTATION macros. However, I always get "out of memory".
Here is a simplified version of my program; callgrind still runs out of memory with it, even though I can run callgrind fine without the macros.
#include <cstdio>
#include <valgrind/callgrind.h>
void foo(int i)
{
printf("i=%d\n", i);
}
int main()
{
for (int i=0; i<1048576; i++)
{
CALLGRIND_START_INSTRUMENTATION;
foo(i);
CALLGRIND_STOP_INSTRUMENTATION;
}
}
To run this, "valgrind --tool=callgrind --instr-atstart=no ./foo >foo.out".
Did I do anything wrong? Please help. Thanks!
The typical use case for CALLGRIND_START_INSTRUMENTATION is to skip instrumenting the application startup code. If you call it in a loop, that is costly in both memory and CPU, as callgrind re-instruments the code each time.
If you are interested in measuring only some functions, you should instead
start instrumentation once before the loop, and then use CALLGRIND_TOGGLE_COLLECT before/after the function calls you are interested in.
That will use both less CPU and less memory.
If you want to do the above, use the options --instr-atstart=no and --collect-atstart=no. You then start instrumentation at the relevant place in your program (e.g. after the startup/initialisation code), and insert calls to CALLGRIND_TOGGLE_COLLECT around the functions you are interested in.
Note that instead of modifying your program to call CALLGRIND_TOGGLE_COLLECT for a bunch of functions,
you can also use the command-line option --toggle-collect=<function> one or more times.

Why use a single incrementer class

Below code are found in WebKit:
RefPtr<Element> element = pendingScript.releaseElementAndClear();
if (ScriptElement* scriptElement = toScriptElement(element.get())) {
NestingLevelIncrementer nestingLevelIncrementer(m_scriptNestingLevel);
IgnoreDestructiveWriteCountIncrementer ignoreDestructiveWriteCountIncrementer(m_document);
//Do something else...
}
NestingLevelIncrementer is a simple class which increases the counter on construction and decreases it on destruction. You can check the implementation here.
In this snippet, I think that is similar to increasing and decreasing the counter directly. Perhaps the only benefit is that you don't have to remember to decrease the counter afterwards, but a new class is introduced.
Is there any other reason to use this pattern?
The intent is for the increment to be reversed no matter how the something else concludes; the stack variable will be destroyed when the method returns or an exception is thrown.
An alternative approach in other languages would use try...finally; see this for more discussion on RAII in C++ vs. finally:
Does C++ support 'finally' blocks? (And what's this 'RAII' I keep hearing about?)

cli/c++ increment operator overloading

I have a question regarding operator overloading in a C++/CLI environment.
static Length^ operator++(Length^ len)
{
Length^ temp = gcnew Length(len->feet, len->inches);
++temp->inches;
temp->feet += temp->inches/temp->inchesPerFoot;
temp->inches %= temp->inchesPerFoot;
return temp;
}
(The code is from Ivor Horton's book.)
Why do we need to declare a new class object (temp) on the managed heap just to return it?
I've googled for information on overloading, but there's really not much out there, and I feel kind of lost.
This is the way operator overloading is implemented in .NET. An overloaded operator is a static function that returns a new instance instead of changing the current instance. Therefore, the postfix and prefix ++ operators are the same. Most information about operator overloading talks about native C++. You can find .NET-specific information by looking at C# samples, for example this: http://msdn.microsoft.com/en-us/library/aa288467(v=vs.71).aspx
The .NET GC makes it cheap to create lots of lightweight new instances, which are collected automatically. This is why .NET overloaded operators are simpler than in native C++.
Yes, because you're overloading the POST-increment operator here. Hence, the original value may still be used elsewhere in the code, copied and stored somewhere, despite the existence of the new value. Example:
store_length_somewhere( len++ );
While len will be increased, the original value might be stored by the function somewhere else. That means that you might need two different values at the same time. Hence the creation and return of a new value.

Using memcpy to copy managed structures

I am working in mixed mode (managed C++ and C++ in one assembly). I am in a situation something like this.
ManagedStructure ^ managedStructure = gcnew ManagedStructure();
//here i set different properties of managedStructure
Then I call "Method", given below, and pass it "&managedStructure":
Method(void *ptrToStruct)
{
ManagedStructure ^ managedStructure2 = gcnew ManagedStructure();
memcpy(&managedStructure2 , ptrToStruct, sizeof(managedStructure2 ));
}
I have the following questions about this scenario.
1) Is it safe to use memcpy like this? If not, what is the alternative to achieve the same functionality? (I can't change the definition of "Method".)
2) I am not freeing any memory, as both structures are managed. Is that fine?
You could look into using a copy constructor or something similar. Check out this article as it explains a few things that might be useful.
I would assume your memory model is OK since it's all managed.
I'm not sure but you might need to pin managedStructure2 before the memcpy, look at the docs for pin_ptr<>. If it's not pinned, GC might happen on a separate thread in the middle of your memcpy, resulting in an intermittent bug.
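For illustration, a C++/CLI sketch of pinning before memcpy (compile with /clr; the buffers here are hypothetical placeholders, not your ManagedStructure):

```cpp
#include <cstring>
using namespace System;

int main()
{
    // A managed buffer that the GC is normally free to relocate.
    array<unsigned char>^ managedBuffer = gcnew array<unsigned char>(64);
    unsigned char nativeBuffer[64] = { 0 };

    // pin_ptr prevents the GC from moving managedBuffer for as long as
    // 'pinned' is in scope, so the raw pointer stays valid inside memcpy.
    pin_ptr<unsigned char> pinned = &managedBuffer[0];
    std::memcpy(pinned, nativeBuffer, 64);
    return 0;
}
```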