Kcachegrind/callgrind is inaccurate for dispatcher functions? - valgrind

I have a model code on which kcachegrind/callgrind reports strange results. It is kind of dispatcher function. The dispatcher is called from 4 places; each call says, which actual do_J function to run (so the first2 will call only do_1 and do_2 and so on)
Source (this is a model of actual code)
#define N 1000000
int a[N];
int do_1(int *a) { int i; for(i=0;i<N/4;i++) a[i]+=1; }
int do_2(int *a) { int i; for(i=0;i<N/2;i++) a[i]+=2; }
int do_3(int *a) { int i; for(i=0;i<N*3/4;i++) a[i]+=3; }
int do_4(int *a) { int i; for(i=0;i<N;i++) a[i]+=4; }
int dispatcher(int *a, int j) {
if(j==1) do_1(a);
else if(j==2) do_2(a);
else if(j==3) do_3(a);
else do_4(a);
}
int first2(int *a) { dispatcher(a,1); dispatcher(a,2); }
int last2(int *a) { dispatcher(a,4); dispatcher(a,3); }
int inner2(int *a) { dispatcher(a,2); dispatcher(a,3); }
int outer2(int *a) { dispatcher(a,1); dispatcher(a,4); }
int main(){
first2(a);
last2(a);
inner2(a);
outer2(a);
}
Compiled with gcc -O0; Callgrinded with valgrind --tool=callgrind; kcachegrinded with kcachegrind and qcachegrind-0.7.
Here is a full callgraph of the application. All paths to do_J go through dispatcher and this is good (the do_1 is just hided as too fast, but it is here really, just left to do_2)
Lets focus on do_1 and check, who called it (this picture is incorrect):
And this is very strange, I think, only first2 and outer2 called do_1 but not all.
Is it a limitation of callgrind/kcachegrind? How can I get accurate callgraph with weights (proportional to running time of every function, with and without its childs)?

Yes, this is limitation of callgrind format. It doesn't store full trace; it only stores parent-child calls information.
There is a google-perftools project with pprof/libprofiler.so CPU profiler, http://google-perftools.googlecode.com/svn/trunk/doc/cpuprofile.html . libprofiler.so can get profile with calltraces and it will store every trace event with full backtrace. pprof is converter of libprofile's output to graphic formats or to callgrind format. In full view the result will be the same as in kcachegrind; but if you will focus on some function, e.g. do_1 using pprof's option focus; it will show accurate calltree when focused on function.

Related

Parallel Dynamic Programming with CUDA

It is my first attempt to implement recursion with CUDA. The goal is to extract all the combinations from a set of chars "12345" using the power of CUDA to parallelize dynamically the task. Here is my kernel:
__device__ char route[31] = { "_________________________"};
__device__ char init[6] = { "12345" };
__global__ void Recursive(int depth) {
// up to depth 6
if (depth == 5) return;
// newroute = route - idx
int x = depth * 6;
printf("%s\n", route);
int o = 0;
int newlen = 0;
for (int i = 0; i<6; ++i)
{
if (i != threadIdx.x)
{
route[i+x-o] = init[i];
newlen++;
}
else
{
o = 1;
}
}
Recursive<<<1,newlen>>>(depth + 1);
}
__global__ void RecursiveCount() {
Recursive <<<1,5>>>(0);
}
The idea is to exclude 1 item (the item corresponding to the threadIdx) in each different thread. In each recursive call, using the variable depth, it works over a different base (variable x) on the route device variable.
I expect the kernel prompts something like:
2345_____________________
1345_____________________
1245_____________________
1234_____________________
2345_345_________________
2345_245_________________
2345_234_________________
2345_345__45_____________
2345_345__35_____________
2345_345__34_____________
..
2345_245__45_____________
..
But it prompts ...
·_____________
·_____________
·_____________
·_____________
·_____________
·2345
·2345
·2345
·2345
...
What I´m doing wrong?
What I´m doing wrong?
I may not articulate every problem with your code, but these items should get you a lot closer.
I recommend providing a complete example. In my view it is basically required by Stack Overflow, see item 1 here, note use of the word "must". Your example is missing any host code, including the original kernel call. It's only a few extra lines of code, why not include it? Sure, in this case, I can deduce what the call must have been, but why not just include it? Anyway, based on the output you indicated, it seems fairly evident the launch configuration of the host launch would have to be <<<1,1>>>.
This doesn't seem to be logical to me:
I expect the kernel prompts something like:
2345_____________________
The very first thing your kernel does is print out the route variable, before making any changes to it, so I would expect _____________________. However we can "fix" this by moving the printout to the end of the kernel.
You may be confused about what a __device__ variable is. It is a global variable, and there is only one copy of it. Therefore, when you modify it in your kernel code, every thread, in every kernel, is attempting to modify the same global variable, at the same time. That cannot possibly have orderly results, in any thread-parallel environment. I chose to "fix" this by making a local copy for each thread to work on.
You have an off-by-1 error, as well as an extent error in this loop:
for (int i = 0; i<6; ++i)
The off-by-1 error is due to the fact that you are iterating over 6 possible items (that is, i can reach a value of 5) but there are only 5 items in your init variable (the 6th item being a null terminator. The correct indexing starts out over 0-4 (with one of those being skipped). On subsequent iteration depths, its necessary to reduce this indexing extent by 1. Note that I've chosen to fix the first error here by increasing the length of init. There are other ways to fix, of course. My method inserts an extra _ between depths in the result.
You assume that at each iteration depth, the correct choice of items is the same, and in the same order, i.e. init. However this is not the case. At each depth, the choices of items must be selected not from the unchanging init variable, but from the choices passed from previous depth. Therefore we need a local, per-thread copy of init also.
A few other comments about CUDA Dynamic Parallelism (CDP). When passing pointers to data from one kernel scope to a child scope, local space pointers cannot be used. Therefore I allocate for the local copy of route from the heap, so it can be passed to child kernels. init can be deduced from route, so we can use an ordinary local variable for myinit.
You're going to quickly hit some dynamic parallelism (and perhaps memory) limits here if you continue this. I believe the total number of kernel launches for this is 5^5, which is 3125 (I'm doing this quickly, I may be mistaken). CDP has a pending launch limit of 2000 kernels by default. We're not hitting this here according to what I see, but you'll run into that sooner or later if you increase the depth or width of this operation. Furthermore, in-kernel allocations from the device heap are by default limited to 8KB. I don't seem to be hitting that limit, but probably I am, so my design should probably be modified to fix that.
Finally, in-kernel printf output is limited to the size of a particular buffer. If this technique is not already hitting that limit, it will soon if you increase the width or depth.
Here is a worked example, attempting to address the various items above. I'm not claiming it is defect free, but I think the output is closer to your expectations. Note that due to character limits on SO answers, I've truncated/excerpted some of the output.
$ cat t1639.cu
#include <stdio.h>
__device__ char route[31] = { "_________________________"};
__device__ char init[7] = { "12345_" };
__global__ void Recursive(int depth, const char *oroute) {
char *nroute = (char *)malloc(31);
char myinit[7];
if (depth == 0) memcpy(myinit, init, 6);
else memcpy(myinit, oroute+(depth-1)*6, 6);
myinit[6] = 0;
if (nroute == NULL) {printf("oops\n"); return;}
memcpy(nroute, oroute, 30);
nroute[30] = 0;
// up to depth 6
if (depth == 5) return;
// newroute = route - idx
int x = depth * 6;
//printf("%s\n", nroute);
int o = 0;
int newlen = 0;
for (int i = 0; i<(6-depth); ++i)
{
if (i != threadIdx.x)
{
nroute[i+x-o] = myinit[i];
newlen++;
}
else
{
o = 1;
}
}
printf("%s\n", nroute);
Recursive<<<1,newlen>>>(depth + 1, nroute);
}
__global__ void RecursiveCount() {
Recursive <<<1,5>>>(0, route);
}
int main(){
RecursiveCount<<<1,1>>>();
cudaDeviceSynchronize();
}
$ nvcc -o t1639 t1639.cu -rdc=true -lcudadevrt -arch=sm_70
$ cuda-memcheck ./t1639
========= CUDA-MEMCHECK
2345_____________________
1345_____________________
1245_____________________
1235_____________________
1234_____________________
2345__345________________
2345__245________________
2345__235________________
2345__234________________
2345__2345_______________
2345__345___45___________
2345__345___35___________
2345__345___34___________
2345__345___345__________
2345__345___45____5______
2345__345___45____4______
2345__345___45____45_____
2345__345___45____5______
2345__345___45____5_____5
2345__345___45____4______
2345__345___45____4_____4
2345__345___45____45____5
2345__345___45____45____4
2345__345___35____5______
2345__345___35____3______
2345__345___35____35_____
2345__345___35____5______
2345__345___35____5_____5
2345__345___35____3______
2345__345___35____3_____3
2345__345___35____35____5
2345__345___35____35____3
2345__345___34____4______
2345__345___34____3______
2345__345___34____34_____
2345__345___34____4______
2345__345___34____4_____4
2345__345___34____3______
2345__345___34____3_____3
2345__345___34____34____4
2345__345___34____34____3
2345__345___345___45_____
2345__345___345___35_____
2345__345___345___34_____
2345__345___345___45____5
2345__345___345___45____4
2345__345___345___35____5
2345__345___345___35____3
2345__345___345___34____4
2345__345___345___34____3
2345__245___45___________
2345__245___25___________
2345__245___24___________
2345__245___245__________
2345__245___45____5______
2345__245___45____4______
2345__245___45____45_____
2345__245___45____5______
2345__245___45____5_____5
2345__245___45____4______
2345__245___45____4_____4
2345__245___45____45____5
2345__245___45____45____4
2345__245___25____5______
2345__245___25____2______
2345__245___25____25_____
2345__245___25____5______
2345__245___25____5_____5
2345__245___25____2______
2345__245___25____2_____2
2345__245___25____25____5
2345__245___25____25____2
2345__245___24____4______
2345__245___24____2______
2345__245___24____24_____
2345__245___24____4______
2345__245___24____4_____4
2345__245___24____2______
2345__245___24____2_____2
2345__245___24____24____4
2345__245___24____24____2
2345__245___245___45_____
2345__245___245___25_____
2345__245___245___24_____
2345__245___245___45____5
2345__245___245___45____4
2345__245___245___25____5
2345__245___245___25____2
2345__245___245___24____4
2345__245___245___24____2
2345__235___35___________
2345__235___25___________
2345__235___23___________
2345__235___235__________
2345__235___35____5______
2345__235___35____3______
2345__235___35____35_____
2345__235___35____5______
2345__235___35____5_____5
2345__235___35____3______
2345__235___35____3_____3
2345__235___35____35____5
2345__235___35____35____3
2345__235___25____5______
2345__235___25____2______
2345__235___25____25_____
2345__235___25____5______
2345__235___25____5_____5
2345__235___25____2______
2345__235___25____2_____2
2345__235___25____25____5
2345__235___25____25____2
2345__235___23____3______
2345__235___23____2______
2345__235___23____23_____
2345__235___23____3______
2345__235___23____3_____3
2345__235___23____2______
2345__235___23____2_____2
2345__235___23____23____3
2345__235___23____23____2
2345__235___235___35_____
2345__235___235___25_____
2345__235___235___23_____
2345__235___235___35____5
2345__235___235___35____3
2345__235___235___25____5
2345__235___235___25____2
2345__235___235___23____3
2345__235___235___23____2
2345__234___34___________
2345__234___24___________
2345__234___23___________
2345__234___234__________
2345__234___34____4______
2345__234___34____3______
2345__234___34____34_____
2345__234___34____4______
2345__234___34____4_____4
2345__234___34____3______
2345__234___34____3_____3
2345__234___34____34____4
2345__234___34____34____3
2345__234___24____4______
2345__234___24____2______
2345__234___24____24_____
2345__234___24____4______
2345__234___24____4_____4
2345__234___24____2______
2345__234___24____2_____2
2345__234___24____24____4
2345__234___24____24____2
2345__234___23____3______
2345__234___23____2______
2345__234___23____23_____
2345__234___23____3______
2345__234___23____3_____3
2345__234___23____2______
2345__234___23____2_____2
2345__234___23____23____3
2345__234___23____23____2
2345__234___234___34_____
2345__234___234___24_____
2345__234___234___23_____
2345__234___234___34____4
2345__234___234___34____3
2345__234___234___24____4
2345__234___234___24____2
2345__234___234___23____3
2345__234___234___23____2
2345__2345__345__________
2345__2345__245__________
2345__2345__235__________
2345__2345__234__________
2345__2345__345___45_____
2345__2345__345___35_____
2345__2345__345___34_____
2345__2345__345___45____5
2345__2345__345___45____4
2345__2345__345___35____5
2345__2345__345___35____3
2345__2345__345___34____4
2345__2345__345___34____3
2345__2345__245___45_____
2345__2345__245___25_____
2345__2345__245___24_____
2345__2345__245___45____5
2345__2345__245___45____4
2345__2345__245___25____5
2345__2345__245___25____2
2345__2345__245___24____4
2345__2345__245___24____2
2345__2345__235___35_____
2345__2345__235___25_____
2345__2345__235___23_____
2345__2345__235___35____5
2345__2345__235___35____3
2345__2345__235___25____5
2345__2345__235___25____2
2345__2345__235___23____3
2345__2345__235___23____2
2345__2345__234___34_____
2345__2345__234___24_____
2345__2345__234___23_____
2345__2345__234___34____4
2345__2345__234___34____3
2345__2345__234___24____4
2345__2345__234___24____2
2345__2345__234___23____3
2345__2345__234___23____2
1345__345________________
1345__145________________
1345__135________________
1345__134________________
1345__1345_______________
1345__345___45___________
1345__345___35___________
1345__345___34___________
1345__345___345__________
1345__345___45____5______
1345__345___45____4______
1345__345___45____45_____
1345__345___45____5______
1345__345___45____5_____5
1345__345___45____4______
1345__345___45____4_____4
1345__345___45____45____5
1345__345___45____45____4
1345__345___35____5______
1345__345___35____3______
1345__345___35____35_____
1345__345___35____5______
1345__345___35____5_____5
1345__345___35____3______
1345__345___35____3_____3
1345__345___35____35____5
1345__345___35____35____3
1345__345___34____4______
1345__345___34____3______
1345__345___34____34_____
1345__345___34____4______
1345__345___34____4_____4
1345__345___34____3______
1345__345___34____3_____3
1345__345___34____34____4
1345__345___34____34____3
1345__345___345___45_____
1345__345___345___35_____
1345__345___345___34_____
1345__345___345___45____5
1345__345___345___45____4
1345__345___345___35____5
1345__345___345___35____3
1345__345___345___34____4
1345__345___345___34____3
1345__145___45___________
1345__145___15___________
1345__145___14___________
1345__145___145__________
1345__145___45____5______
1345__145___45____4______
1345__145___45____45_____
1345__145___45____5______
1345__145___45____5_____5
1345__145___45____4______
1345__145___45____4_____4
1345__145___45____45____5
1345__145___45____45____4
1345__145___15____5______
1345__145___15____1______
1345__145___15____15_____
1345__145___15____5______
1345__145___15____5_____5
1345__145___15____1______
1345__145___15____1_____1
1345__145___15____15____5
1345__145___15____15____1
1345__145___14____4______
1345__145___14____1______
1345__145___14____14_____
1345__145___14____4______
1345__145___14____4_____4
1345__145___14____1______
1345__145___14____1_____1
1345__145___14____14____4
1345__145___14____14____1
1345__145___145___45_____
1345__145___145___15_____
1345__145___145___14_____
1345__145___145___45____5
1345__145___145___45____4
1345__145___145___15____5
1345__145___145___15____1
1345__145___145___14____4
1345__145___145___14____1
1345__135___35___________
1345__135___15___________
1345__135___13___________
1345__135___135__________
1345__135___35____5______
1345__135___35____3______
1345__135___35____35_____
1345__135___35____5______
1345__135___35____5_____5
1345__135___35____3______
1345__135___35____3_____3
1345__135___35____35____5
1345__135___35____35____3
1345__135___15____5______
1345__135___15____1______
1345__135___15____15_____
1345__135___15____5______
1345__135___15____5_____5
1345__135___15____1______
1345__135___15____1_____1
1345__135___15____15____5
1345__135___15____15____1
1345__135___13____3______
1345__135___13____1______
1345__135___13____13_____
1345__135___13____3______
1345__135___13____3_____3
1345__135___13____1______
1345__135___13____1_____1
1345__135___13____13____3
1345__135___13____13____1
1345__135___135___35_____
1345__135___135___15_____
1345__135___135___13_____
1345__135___135___35____5
1345__135___135___35____3
1345__135___135___15____5
1345__135___135___15____1
1345__135___135___13____3
1345__135___135___13____1
1345__134___34___________
1345__134___14___________
1345__134___13___________
1345__134___134__________
1345__134___34____4______
1345__134___34____3______
1345__134___34____34_____
1345__134___34____4______
1345__134___34____4_____4
1345__134___34____3______
1345__134___34____3_____3
1345__134___34____34____4
1345__134___34____34____3
1345__134___14____4______
1345__134___14____1______
1345__134___14____14_____
1345__134___14____4______
1345__134___14____4_____4
1345__134___14____1______
1345__134___14____1_____1
1345__134___14____14____4
1345__134___14____14____1
1345__134___13____3______
1345__134___13____1______
1345__134___13____13_____
1345__134___13____3______
1345__134___13____3_____3
1345__134___13____1______
1345__134___13____1_____1
1345__134___13____13____3
1345__134___13____13____1
1345__134___134___34_____
1345__134___134___14_____
1345__134___134___13_____
1345__134___134___34____4
1345__134___134___34____3
1345__134___134___14____4
1345__134___134___14____1
1345__134___134___13____3
1345__134___134___13____1
1345__1345__345__________
1345__1345__145__________
1345__1345__135__________
1345__1345__134__________
1345__1345__345___45_____
1345__1345__345___35_____
1345__1345__345___34_____
1345__1345__345___45____5
1345__1345__345___45____4
1345__1345__345___35____5
1345__1345__345___35____3
1345__1345__345___34____4
1345__1345__345___34____3
1345__1345__145___45_____
1345__1345__145___15_____
1345__1345__145___14_____
1345__1345__145___45____5
1345__1345__145___45____4
1345__1345__145___15____5
1345__1345__145___15____1
1345__1345__145___14____4
1345__1345__145___14____1
1345__1345__135___35_____
1345__1345__135___15_____
1345__1345__135___13_____
1345__1345__135___35____5
1345__1345__135___35____3
1345__1345__135___15____5
1345__1345__135___15____1
1345__1345__135___13____3
1345__1345__135___13____1
1345__1345__134___34_____
1345__1345__134___14_____
1345__1345__134___13_____
1345__1345__134___34____4
1345__1345__134___34____3
1345__1345__134___14____4
1345__1345__134___14____1
1345__1345__134___13____3
1345__1345__134___13____1
1245__245________________
1245__145________________
1245__125________________
1245__124________________
1245__1245_______________
1245__245___45___________
1245__245___25___________
1245__245___24___________
1245__245___245__________
1245__245___45____5______
1245__245___45____4______
1245__245___45____45_____
1245__245___45____5______
1245__245___45____5_____5
1245__245___45____4______
1245__245___45____4_____4
1245__245___45____45____5
1245__245___45____45____4
1245__245___25____5______
1245__245___25____2______
1245__245___25____25_____
1245__245___25____5______
1245__245___25____5_____5
1245__245___25____2______
1245__245___25____2_____2
1245__245___25____25____5
1245__245___25____25____2
1245__245___24____4______
1245__245___24____2______
1245__245___24____24_____
1245__245___24____4______
1245__245___24____4_____4
1245__245___24____2______
1245__245___24____2_____2
1245__245___24____24____4
1245__245___24____24____2
1245__245___245___45_____
1245__245___245___25_____
1245__245___245___24_____
1245__245___245___45____5
1245__245___245___45____4
1245__245___245___25____5
1245__245___245___25____2
1245__245___245___24____4
1245__245___245___24____2
1245__145___45___________
1245__145___15___________
1245__145___14___________
1245__145___145__________
1245__145___45____5______
1245__145___45____4______
1245__145___45____45_____
1245__145___45____5______
1245__145___45____5_____5
1245__145___45____4______
...
1235__1235__235___25_____
1235__1235__235___23_____
1235__1235__235___35____5
1235__1235__235___35____3
1235__1235__235___25____5
1235__1235__235___25____2
1235__1235__235___23____3
1235__1235__235___23____2
1235__1235__135___35_____
1235__1235__135___15_____
1235__1235__135___13_____
1235__1235__135___35____5
1235__1235__135___35____3
1235__1235__135___15____5
1235__1235__135___15____1
1235__1235__135___13____3
1235__1235__135___13____1
1235__1235__125___25_____
1235__1235__125___15_____
1235__1235__125___12_____
1235__1235__125___25____5
1235__1235__125___25____2
1235__1235__125___15____5
1235__1235__125___15____1
1235__1235__125___12____2
1235__1235__125___12____1
1235__1235__123___23_____
1235__1235__123___13_____
1235__1235__123___12_____
1235__1235__123___23____3
1235__1235__123___23____2
1235__1235__123___13____3
1235__1235__123___13____1
1235__1235__123___12____2
1235__1235__123___12____1
1234__234________________
1234__134________________
1234__124________________
1234__123________________
1234__1234_______________
1234__234___34___________
1234__234___24___________
1234__234___23___________
1234__234___234__________
1234__234___34____4______
1234__234___34____3______
1234__234___34____34_____
1234__234___34____4______
1234__234___34____4_____4
1234__234___34____3______
1234__234___34____3_____3
1234__234___34____34____4
1234__234___34____34____3
1234__234___24____4______
1234__234___24____2______
1234__234___24____24_____
1234__234___24____4______
1234__234___24____4_____4
1234__234___24____2______
1234__234___24____2_____2
1234__234___24____24____4
1234__234___24____24____2
1234__234___23____3______
1234__234___23____2______
1234__234___23____23_____
1234__234___23____3______
1234__234___23____3_____3
1234__234___23____2______
1234__234___23____2_____2
1234__234___23____23____3
1234__234___23____23____2
1234__234___234___34_____
1234__234___234___24_____
1234__234___234___23_____
1234__234___234___34____4
1234__234___234___34____3
1234__234___234___24____4
1234__234___234___24____2
1234__234___234___23____3
1234__234___234___23____2
1234__134___34___________
1234__134___14___________
1234__134___13___________
1234__134___134__________
1234__134___34____4______
1234__134___34____3______
1234__134___34____34_____
1234__134___34____4______
1234__134___34____4_____4
1234__134___34____3______
1234__134___34____3_____3
1234__134___34____34____4
1234__134___34____34____3
1234__134___14____4______
1234__134___14____1______
1234__134___14____14_____
1234__134___14____4______
1234__134___14____4_____4
1234__134___14____1______
1234__134___14____1_____1
1234__134___14____14____4
1234__134___14____14____1
1234__134___13____3______
1234__134___13____1______
1234__134___13____13_____
1234__134___13____3______
1234__134___13____3_____3
1234__134___13____1______
1234__134___13____1_____1
1234__134___13____13____3
1234__134___13____13____1
1234__134___134___34_____
1234__134___134___14_____
1234__134___134___13_____
1234__134___134___34____4
1234__134___134___34____3
1234__134___134___14____4
1234__134___134___14____1
1234__134___134___13____3
1234__134___134___13____1
1234__124___24___________
1234__124___14___________
1234__124___12___________
1234__124___124__________
1234__124___24____4______
1234__124___24____2______
1234__124___24____24_____
1234__124___24____4______
1234__124___24____4_____4
1234__124___24____2______
1234__124___24____2_____2
1234__124___24____24____4
1234__124___24____24____2
1234__124___14____4______
1234__124___14____1______
1234__124___14____14_____
1234__124___14____4______
1234__124___14____4_____4
1234__124___14____1______
1234__124___14____1_____1
1234__124___14____14____4
1234__124___14____14____1
1234__124___12____2______
1234__124___12____1______
1234__124___12____12_____
1234__124___12____2______
1234__124___12____2_____2
1234__124___12____1______
1234__124___12____1_____1
1234__124___12____12____2
1234__124___12____12____1
1234__124___124___24_____
1234__124___124___14_____
1234__124___124___12_____
1234__124___124___24____4
1234__124___124___24____2
1234__124___124___14____4
1234__124___124___14____1
1234__124___124___12____2
1234__124___124___12____1
1234__123___23___________
1234__123___13___________
1234__123___12___________
1234__123___123__________
1234__123___23____3______
1234__123___23____2______
1234__123___23____23_____
1234__123___23____3______
1234__123___23____3_____3
1234__123___23____2______
1234__123___23____2_____2
1234__123___23____23____3
1234__123___23____23____2
1234__123___13____3______
1234__123___13____1______
1234__123___13____13_____
1234__123___13____3______
1234__123___13____3_____3
1234__123___13____1______
1234__123___13____1_____1
1234__123___13____13____3
1234__123___13____13____1
1234__123___12____2______
1234__123___12____1______
1234__123___12____12_____
1234__123___12____2______
1234__123___12____2_____2
1234__123___12____1______
1234__123___12____1_____1
1234__123___12____12____2
1234__123___12____12____1
1234__123___123___23_____
1234__123___123___13_____
1234__123___123___12_____
1234__123___123___23____3
1234__123___123___23____2
1234__123___123___13____3
1234__123___123___13____1
1234__123___123___12____2
1234__123___123___12____1
1234__1234__234__________
1234__1234__134__________
1234__1234__124__________
1234__1234__123__________
1234__1234__234___34_____
1234__1234__234___24_____
1234__1234__234___23_____
1234__1234__234___34____4
1234__1234__234___34____3
1234__1234__234___24____4
1234__1234__234___24____2
1234__1234__234___23____3
1234__1234__234___23____2
1234__1234__134___34_____
1234__1234__134___14_____
1234__1234__134___13_____
1234__1234__134___34____4
1234__1234__134___34____3
1234__1234__134___14____4
1234__1234__134___14____1
1234__1234__134___13____3
1234__1234__134___13____1
1234__1234__124___24_____
1234__1234__124___14_____
1234__1234__124___12_____
1234__1234__124___24____4
1234__1234__124___24____2
1234__1234__124___14____4
1234__1234__124___14____1
1234__1234__124___12____2
1234__1234__124___12____1
1234__1234__123___23_____
1234__1234__123___13_____
1234__1234__123___12_____
1234__1234__123___23____3
1234__1234__123___23____2
1234__1234__123___13____3
1234__1234__123___13____1
1234__1234__123___12____2
1234__1234__123___12____1
========= ERROR SUMMARY: 0 errors
$
The answer given by Robert Crovella is correct at the 5th point, the mistake was in the using of init in every recursive call, but I want to clarify something that can be useful for other beginners with CUDA.
I used this variable because when I tried to launch a child kernel passing a local variable I always got the exception: Error: a pointer to local memory cannot be passed to a launch as an argument.
As I´m C# expert developer I´m not used to using pointers (Ref does the low-level-work for that) so I thought there was no way to do it in CUDA/c programming.
As Robert shows in its code it is possible copying the pointer with memalloc for using it as a referable argument.
Here is a kernel simplified as an example of deep recursion.
__device__ char init[6] = { "12345" };
__global__ void Recursive(int depth, const char* route) {
// up to depth 6
if (depth == 5) return;
//declaration for a referable argument (point 6)
char* newroute = (char*)malloc(6);
memcpy(newroute, route, 5);
int o = 0;
int newlen = 0;
for (int i = 0; i < (6 - depth); ++i)
{
if (i != threadIdx.x)
{
newroute[i - o] = route[i];
newlen++;
}
else
{
o = 1;
}
}
printf("%s\n", newroute);
Recursive <<<1, newlen>>>(depth + 1, newroute);
}
__global__ void RecursiveCount() {
Recursive <<<1, 5>>>(0, init);
}
I don't add the main call because I´m using ManagedCUDA for C# but as Robert says it can be figured-out how the call RecursiveCount is.
About ending arrays of char with /0 ... sorry but I don't know exactly what is the benefit; this code works fine without them.

How to use DLLs to exchange values between MQL4 programs?

First of all I need to say I don't know much about DLLs.
I am trying to send data from one program to another, using functions from kernel32.dll. My programs are coded in MQL4.
This is the code I use for the server part, which is supposed to save the data:
#define INVALID_HANDLE_VALUE -1
#define BUF_SIZE 256
#define PAGE_READWRITE 0x0004
#define FILE_MAP_ALL_ACCESS 0xf001F
#import "kernel32.dll"
int CreateFileMappingA(int hFile, int lpAttributes, int flProtect, int dwMaximumSizeHigh, int dwMaximumSizeLow, string lpName);
int MapViewOfFile(int hFileMappingObject, int dwDesiredAccess, int dwFileOffsetHigh, int dwFileOffsetLow, int dwNumberOfBytesToMap);
int UnmapViewOfFile(int lpBaseAddress);
int RtlMoveMemory(int DestPtr, string s, int Length);
int CloseHandle(int hwnd);
int CreateMutexA(int attr, int owner, string mutexName);
int ReleaseMutex(int hnd);
int WaitForSingleObject(int hnd, int dwMilliseconds);
bool started = False;
int hMapFile = 0;
int pBuf=0;
int hMutex;
int OnInit()
{
if(!started) {
started = true;
string szName="Global\\Value1";
int hMapFile = CreateFileMappingA(INVALID_HANDLE_VALUE,0,PAGE_READWRITE,0,BUF_SIZE,szName);
if(hMapFile==0) {
Alert("CreateFileMapping failed!");
return;
}
pBuf = MapViewOfFile(hMapFile, FILE_MAP_ALL_ACCESS, 0, 0, BUF_SIZE);
if(pBuf==0) {
Alert("Map View failed!");
return;
}
hMutex = CreateMutexA(0,0,"PriceMapMutex");
}
}
void OnTick()
{
WaitForSingleObject(hMutex,1000);
if(pBuf==0) return;
string szMsg = DoubleToStr(Bid,Digits);
Comment("Data: ",szMsg);
RtlMoveMemory(pBuf, szMsg, StringLen(szMsg)+1);
ReleaseMutex(hMutex);
return(0);
}
int deinit()
{
switch(UninitializeReason()) {
case REASON_CHARTCLOSE:
case REASON_REMOVE:
UnmapViewOfFile(pBuf);
CloseHandle(hMapFile);
break;
}
return(0);
}
This is what I use for my client part, which is supposed to pick up the data:
#define INVALID_HANDLE_VALUE -1
#define BUF_SIZE 1024
#define FILE_MAP_READ 4
extern int BufferSize = 1024;
#import "kernel32.dll"
int OpenFileMappingA(int dwDesiredAccess, bool bInheritHandle, string lpName);
string MapViewOfFile(int hFileMappingObject, int dwDesiredAccess, int dwFileOffsetHigh, int dwFileOffsetLow, int dwNumberOfBytesToMap);
int UnmapViewOfFile(string lpBaseAddress);
int CloseHandle(int hwnd);
int CreateMutexA(int attr, int owner, string mutexName);
int ReleaseMutex(int hnd);
int WaitForSingleObject(int hnd, int dwMilliseconds);
string szName;
int hMapFile;
string obj;
string data;
int hMutex;
double dd;
int OnInit()
{
szName="Global\\Value1";
hMapFile = OpenFileMappingA(FILE_MAP_READ,False,szName);
if(hMapFile==0) {
Alert("CreateFile Failed!");
return;
}
obj="data";
ObjectCreate(obj,OBJ_HLINE,0,0,0);
ObjectSet(obj,OBJPROP_COLOR,Gold);
hMutex = CreateMutexA(0,0,"PriceMapMutex");
}
void OnDeinit(const int reason)
{
CloseHandle(hMapFile);
Comment("");
ObjectDelete(obj);
return(0);
}
void start()
{
getsignal();
Comment("Data: ",DoubleToStr(dd,Digits));
Sleep(50);
}
void getsignal() {
WaitForSingleObject(hMutex,333);
data = MapViewOfFile(hMapFile,FILE_MAP_READ,0,0,BUF_SIZE);
dd = StrToDouble(data);
ReleaseMutex(hMutex);
UnmapViewOfFile(data);
ObjectMove(obj,0,Time[0],dd);
}
The code basically works. However I am facing 2 major problems with it.
Problem number 1:
I want to exchange multiple values ( value1, value2, value3, ... ). For some reason it seems to be irrelevant which name I use for szName="Global\\Value1". The server saves the value and the client picks it up no matter what names I use szName="Global\\Value1", szName="Global\\Value2" or szName="Global\\Value3". So for example in the server code I use szName="Global\\Value1" and in my client code I use szName="Global\\Value3" the client still picks up the value which the server writes to szName="Global\\Value1".
Problem number 2:
my client is only stable for about 5 hours. After that I get a message in the client programme saying
"There is a problem and the program needs to be closed...".
Then I close and restart my client and its again working for the next 5 hours.
Has anyone any idea?
FILE MEDIUM
I agree that Kernel32 is not a good option if you need to do MT4-to-MT4 interfacing. The reason is that Kernel32 is Windows specific. The EA won't work on Mac. Also, messing around with Kernel32 DLL might cause memory leaks (eg, your 5hr live). Plus, it requires user to know how to enable DLL (you'd be surprise how many users have no idea how to enable it).
Recommendation:
If you only need SAME MT4 exchange (EAs between charts), then use the GlobalVariableGet(), GlobalVariableSet(), etc.
If you need exchange between 2 different MT4s (on the same PC) --even if it is across different broker MT4, then use the FILE system: Files\FilePipe.mqh which allows you to write to the common MT4 folder:
#include <Files\FilePipe.mqh>
CFilePipe voPipeOut;
voPipeOut.Open("yourFileName.txt", FILE_WRITE|FILE_COMMON|FILE_BIN);
voPipeOut.WriteString("WhatEverMessage, probably some CSV value here");
voPipeOut.Close();
and subsequently
CFilePipe voPipeFile;
string vsInString = "";
voPipeFile.Open("yourFileName.txt", FILE_SHARE_READ|FILE_COMMON|FILE_BIN);
voPipeFile.Seek(0,SEEK_SET);
voPipeFile.ReadString( vsInString );
voPipeFile.Close();
This way, your EA won't depend on DLLs and also works in a wide range of environments. It is very fast (under 2ms for a 1Mb pipe). It works even for cross-broker interfacing (exchanging info [feed?] between 2 different brokers).
The best idea?
The best thing I can advise you is to stop trying to tweak KERNEL32.DLL published API to make it use with MetaTrader Terminal 4 code-execution ecosystem and to rather start designing a professional distributed system, independently of injecting objects into the O/S pagefile and hassling with semaphores and MUTEX-es.
Besides the best next step:
MQL4 code ought NEVER block. A MUTEX-signalling turned into a non-blocking state is a must
MQL4 code / API mapper ought respect the data-types and their actual memory sizes in MQL4
MQL4 code ought conform to the recent New-MQL4 rules ( sections are in "old"-MQL4 )
MQL4 declared string is not a C-lang string, but rather a struct! Handle with care!
MQL4 code violated in several places syntax rules, just test with #property strict
MQL4 code is "shooting itself in leg" when ignoring namespace boundaries / scopes of declaration
MQL4 code ignores potential error states, not inspecting any GetLastError() to handle such collision(s)
MQL4 code does not gracefully return resources ( forgets to clear 'em )
MQL4 code proposed exposes itself into an immense risk of KERNEL32.DLL API usage unlocked stealth security flaw / enabling a run-time hijacking hack
better use separation of concerns, using ZeroMQ or nanomsg messaging to "exchange values between ( not only ) MQL4 programs"

Fixing Puzzle Game

I'm new to C++ and programming in general, and this is my first 'project'. I'm sorry if this is not an applicable question, tell me and I will delete it right away.
I want to build a puzzle game, but instead of sliding tiles, the user swaps the number adjacent to '0', where '0' is the blank tile.
This is the code I've so far:
#include <iostream>
#include <ctime>
#include <algorithm>
using namespace std;
int grid[]= {1,2,3,4,5,6,7,8,0};
void shuffle()
{
random_shuffle(&grid[0],&grid[9]);
}
void print_puzzle()
{
shuffle();
int a=grid[0];
int b=grid[1];
int c=grid[2];
int d=grid[3];
int e=grid[4];
int f=grid[5];
int g=grid[6];
int h=grid[7];
int i=grid[8];
cout<<endl;
cout<<" "<<a<<" "<<b<<" "<<c<<endl;
cout<<" "<<d<<" "<<e<<" "<<f<<endl;
cout<<" "<<g<<" "<<h<<" "<<i<<endl;
}
bool check()
{
if (grid[0]==1&&grid[1]==2&&grid[2]==3&&grid[3]==4&&grid[4]==5&&grid[6]==7&&grid[7]==8&&grid[8]==0)
{
return true;
}
else
{
return false;
}
}
void swap_puzzle()
{
print_puzzle();
int temp=0;
cout<<"Enter the number adjacent to '0' that you want to be swapped with '0'."<<endl;
cin>> temp;
while(check==false)
{
for(int i=0; i<9; i++)
{
if(grid[i]==temp)
{
int pos_num=i;
for(int j=0; j<9; j++)
{
if(grid[j]==0)
{
int pos_0=j;
swap(grid[pos_num], grid[pos_0]);
break;
}
}
}
}
print_puzzle();
}
}
int main()
{
srand ( time(NULL) );
cout<<"The Rules:"<<endl;
cout<<"There will be a shuffled 3x3 grid with numbers from 1-8, with one blank space denoted by '0'. You can swap '0' only with adjacent numbers. The goal of this game is to rearrange the numbers in the grid in ascending order."<<endl;
cout<<endl;
cout<<"The Shuffled Grid: "<<endl;
swap_puzzle();
return 0;
}
After swapping the number that the enter user with '0', the grid just shuffles instead of swapping them as I intended. I tried fixing this problem by breaking the code into smaller functions, but that didn't help.
Any advice would be appreciated.
Your code has several problems.
At the end of your swap_puzzle() function, inside the while loop, you call print_puzzle(). print_puzzle calls shuffle(). That's why everything is shuffled instead of having just a swap operation. To fix this you must decouple these 2 functions: printing is one thing, shuffling is another one. Neither of these should trigger the other. Or MAYBE you could have the shuffle function that calls the print one at the end, to show the new status; but a "print" function should never do anything other than printing.
Then, in your check() function you have forgotten grid[5]==6. Actually this might not be important, because if all the others are at the right place, then grid[5] should automatically be right. Anyway, for what it costs, I would definitely add it.
Then, your swap_puzzle() has a problem where you call swap and then you break. You have 2 nested for loops, and you must break from both. Right now, you are breaking from the inner one (the one that uses j), which means that then the program will resume at the next iteration of i. It is possible, then, that the swap conditions are met again, as a result of the previous swap. So the user wants to make one move, and he ends up with 2. To be precise, the second one will revert the first one, and of course this is not wanted. To solve this you could add a bool variable called swapped and set it to false at the beginning of the while loop; when you reach the break, before breaking you set it to true. Then, immediately after the end of the inner loop, you check the value of swapped: if it is true you have to break again.
As your program is structured right now, you display cout and read the input with cin only once. You must put both functions inside the while loop, so that they are run every time, for every move.
In your print_puzzle function you have this line
shuffle();
That mean every time you go to print your puzzle, it shuffles it. You don't want that.
Also this line:
while(check==false)
should be
while(check()==false)
check by itself is the pointer to the function, not a call to the function.
You also want to move the request for a new number to swap inside your loop, otherwise your loop will just go on forever.
I did some minor changes and it works.
starting game may need to shuffle once.
call to shuffle() in swap_puzzle() (but don't shuffle it when printing)
put cin>> temp; in the while loop so users can input till puzzle solved
break the inner loop too(i added some vars for that)
finally call to check() and break if true. thats means solved.
--
#include <iostream>
#include <ctime>
#include <algorithm>
using namespace std;
int grid[]= {1,2,3,4,5,6,7,8,0};
void shuffle()
{
random_shuffle(&grid[0],&grid[9]);
}
void print_puzzle()
{
// shuffle();
int a=grid[0];
int b=grid[1];
int c=grid[2];
int d=grid[3];
int e=grid[4];
int f=grid[5];
int g=grid[6];
int h=grid[7];
int i=grid[8];
cout<<endl;
cout<<" "<<a<<" "<<b<<" "<<c<<endl;
cout<<" "<<d<<" "<<e<<" "<<f<<endl;
cout<<" "<<g<<" "<<h<<" "<<i<<endl;
}
bool check()
{
if (grid[0]==1&&grid[1]==2&&grid[2]==3&&grid[3]==4&&grid[4]==5&&grid[6]==7&&grid[7]==8&&grid[8]==0)
{
return true;
}
else
{
return false;
}
}
void swap_puzzle()
{
shuffle();
print_puzzle();
int temp=0;
bool swaped = false;
while(1)
{
swaped = false;
cout<<"Enter the number adjacent to '0' that you want to be swapped with '0'."<<endl;
cin>> temp;
for(int i=0; i<9; i++)
{
if(grid[i]==temp)
{
int pos_num=i;
for(int j=0; j<9; j++)
{
if(grid[j]==0)
{
int pos_0=j;
swap(grid[pos_num], grid[pos_0]);
swaped = true;
break;
}
}
}
if(swaped) break;
}
print_puzzle();
if(check()==true) break;
}
}
int main()
{
srand ( time(NULL) );
cout<<"The Rules:"<<endl;
cout<<"There will be a shuffled 3x3 grid with numbers from 1-8, with one blank space denoted by '0'. You can swap '0' only with adjacent numbers. The goal of this game is to rearrange the numbers in the grid in ascending order."<<endl;
cout<<endl;
cout<<"The Shuffled Grid: "<<endl;
swap_puzzle();
return 0;
}

How to declare a C function with an undetermined return type?

Can I declare a C function with an undetermined return type (without C compiler warning)? The return type could be int, float, double, void *, etc.
undetermined_return_type miscellaneousFunction(undetermined_return_type inputValue);
And you can use this function in other functions to return a value (although that could be a run time error):
BOOL isHappy(int feel){
return miscellaneousFunction(feel);
};
float percentage(float sales){
return miscellaneousFunction(sales);
};
What I'm looking for:
To declare and to implement a C function (or Obj-C method) with an undefined-return-type could be useful for aspect-oriented programming.
If I could intercept Obj-C messages in another function in run time, I might return the value of that message to the original receiver or not with doing something else action. For example:
- (unknown_return_type) interceptMessage:(unknown_return_type retValOfMessage){
// I may print the value here
// No idea how to print the retValOfMessage (I mark the code with %???)
print ("The message has been intercepted, and the return value of the message is %???", retValOfMessage);
// Or do something you want (e.g. lock/unlock, database open/close, and so on).
// And you might modify the retValOfMessage before returning.
return retValOfMessage;
}
So I can intercept the original message with a little addition:
// Original Method
- (int) isHappy{
return [self calculateHowHappyNow];
}
// With Interception
- (int) isHappy{
// This would print the information on the console.
return [self interceptMessage:[self calculateHowHappyNow]];
}
You can use a void * type.
Then for example:
float percentage(float sales){
return *(float *) miscellaneousFunction(sales);
}
Be sure not to return a pointer to a object with automatic storage duration.
You may use the preprocessor.
#include <stdio.h>
#define FUNC(return_type, name, arg) \
return_type name(return_type arg) \
{ \
return miscellaneousFunction(arg); \
}
FUNC(float, undefined_return_func, arg)
int main(int argc, char *argv[])
{
printf("\n %f \n", undefined_return_func(3.14159));
return 0;
}
May be a union as suggested by thejh
typedef struct
{
enum {
INT,
FLOAT,
DOUBLE
} ret_type;
union
{
double d;
float f;
int i;
} ret_val;
} any_type;
any_type miscellaneousFunction(any_type inputValue) {/*return inputValue;*/}
any_type isHappy(any_type feel){
return miscellaneousFunction(feel);
}
any_type percentage(any_type sales){
return miscellaneousFunction(sales);
}
Here with ret_type you can know data type of return value and ret_type. i,f,d can give you corresponding value.
All elements will use same memory space and only one should be accessed.
Straight C doesn't support dynamically-typed variables (variants) since it is statically typed, but there might be some libraries that do what you want.

Trying to parse OpenCV YAML ouput with yaml-cpp

I've got a series of OpenCv generated YAML files and would like to parse them with yaml-cpp
I'm doing okay on simple stuff, but the matrix representation is proving difficult.
# Center of table
tableCenter: !!opencv-matrix
rows: 1
cols: 2
dt: f
data: [ 240, 240]
This should map into the vector
240
240
with type float. My code looks like:
#include "yaml.h"
#include <fstream>
#include <string>
struct Matrix {
int x;
};
void operator >> (const YAML::Node& node, Matrix& matrix) {
unsigned rows;
node["rows"] >> rows;
}
int main()
{
std::ifstream fin("monsters.yaml");
YAML::Parser parser(fin);
YAML::Node doc;
Matrix m;
doc["tableCenter"] >> m;
return 0;
}
But I get
terminate called after throwing an instance of 'YAML::BadDereference'
what(): yaml-cpp: error at line 0, column 0: bad dereference
Abort trap
I searched around for some documentation for yaml-cpp, but there doesn't seem to be any, aside from a short introductory example on parsing and emitting. Unfortunately, neither of these two help in this particular circumstance.
As I understand, the !! indicate that this is a user-defined type, but I don't see with yaml-cpp how to parse that.
You have to tell yaml-cpp how to parse this type. Since C++ isn't dynamically typed, it can't detect what data type you want and create it from scratch - you have to tell it directly. Tagging a node is really only for yourself, not for the parser (it'll just faithfully store it for you).
I'm not really sure how an OpenCV matrix is stored, but if it's something like this:
class Matrix {
public:
Matrix(unsigned r, unsigned c, const std::vector<float>& d): rows(r), cols(c), data(d) { /* init */ }
Matrix(const Matrix&) { /* copy */ }
~Matrix() { /* delete */ }
Matrix& operator = (const Matrix&) { /* assign */ }
private:
unsigned rows, cols;
std::vector<float> data;
};
then you can write something like
void operator >> (const YAML::Node& node, Matrix& matrix) {
unsigned rows, cols;
std::vector<float> data;
node["rows"] >> rows;
node["cols"] >> cols;
node["data"] >> data;
matrix = Matrix(rows, cols, data);
}
Edit It appears that you're ok up until here; but you're missing the step where the parser loads the information into the YAML::Node. Instead, se it like:
std::ifstream fin("monsters.yaml");
YAML::Parser parser(fin);
YAML::Node doc;
parser.GetNextDocument(doc); // <-- this line was missing!
Matrix m;
doc["tableCenter"] >> m;
Note: I'm guessing dt: f means "data type is float". If that's the case, it'll really depend on how the Matrix class handles this. If you have a different class for each data type (or a templated class), you'll have to read that field first, and then choose which type to instantiate. (If you know it'll always be float, that'll make your life easier, of course.)