Measuring Time inside kernel on intel iGPU - gpu

I am very new to OpenCL, however I have fair amount of experience on GPU programming using CUDA. I used to use clock function inside CUDA kernel (as mentioned in here) to measure ticks of certain operations inside the kernel. I wrote a simple OpenCL vector addition kernel and tried to run it on the intel integrated GPU. The program ran fine and gave correct output. But then I tried to use the clock function inside the kernel function and there is JIT compilation error while executing the clBuildProgram. The vector addition kernel that I wanted to execute is provided below:
__kernel void testVecAdd(__global const int *a,__global const int *b,__global int *c,
__global float *t){
clock_t start = clock();
int gid = get_global_id(0);
c[gid] = a[gid] + b[gid];
t[gid] = (float)(clock()-start)/CLOCKS_PER_SEC;
}
The errors are as follows:
/home/duttasankha/Desktop/SANKHA_ALL/IGPU_RESEARCH_RELATED/OCL_PRAC_DIR/test_OCL_1.cl:6:2: error: use of undeclared identifier 'clock_t'
clock_t start = clock();
^
/home/duttasankha/Desktop/SANKHA_ALL/IGPU_RESEARCH_RELATED/OCL_PRAC_DIR/test_OCL_1.cl:11:19: error: implicit declaration of function 'clock' is invalid in OpenCL
t[gid] = (float)(clock()-start)/CLOCKS_PER_SEC;
^
/home/duttasankha/Desktop/SANKHA_ALL/IGPU_RESEARCH_RELATED/OCL_PRAC_DIR/test_OCL_1.cl:11:27: error: use of undeclared identifier 'start'; did you mean 'sqrt'?
t[gid] = (float)(clock()-start)/CLOCKS_PER_SEC;
^~~~~
sqrt
CTHeader.h:5277:40: note: 'sqrt' declared here
double16 __attribute__((overloadable)) sqrt(double16);
^
/home/duttasankha/Desktop/SANKHA_ALL/IGPU_RESEARCH_RELATED/OCL_PRAC_DIR/test_OCL_1.cl:11:27: error: taking address of function is not allowed
t[gid] = (float)(clock()-start)/CLOCKS_PER_SEC;
^
/home/duttasankha/Desktop/SANKHA_ALL/IGPU_RESEARCH_RELATED/OCL_PRAC_DIR/test_OCL_1.cl:11:34: error: use of undeclared identifier 'CLOCKS_PER_SEC'
t[gid] = (float)(clock()-start)/CLOCKS_PER_SEC;
^
Failed to build program...: -11 (CL_BUILD_PROGRAM_FAILURE)
Build failed!
I was able to do this in the CUDA as it supports clock function. But similar goals was not achieved with the intel iGPU. I also tried other functions to measure the ticks but none of them worked as well. I also tried offline compilation using ioc64 but I got same errors. I was just wondering if someone could tell me is there anything wrong I am doing in here or getting the ticks using clock (or similar) functions is not possible in the intel integrated GPU. It is absolutely necessary for me to get this execution traces. So if using clock function is not a viable option then I was wondering what would be the alternate option in here to achieve same goals and how can I use it? Thank you.

I have posted this in the intel opencl forum and the solution has been provided there. Please follow this forum post link to find the answer. If you have any following questions, you can post either in here or in the intel forum. Thanks.

Related

Error: No operator "=" matches these operands in "Servo_Project.cpp", Line: 15, Col: 22

So I tried using code from another post around here to see if I could use it, it was a code meant to utilize a potentiometer to move a servo motor, but when I attempted to compile it is gave the error above saying No operator "=" matches these operands in "Servo_Project.cpp". How do I go about fixing this error?
Just in case ill say this, the boards I was trying to compile the code were a NUCLEO-L476RG, the board from the post I mentioned utilized Nucleo L496ZG board and a Tower Pro Micro Servo 9G.
#include "mbed.h"
#include "Servo.h"
Servo myservo(D6);
AnalogOut MyPot(A0);
int main() {
float PotReading;
PotReading = MyPot.read();
while(1) {
for(int i=0; i<100; i++) {
myservo = (i/100);
wait(0.01);
}
}
}
This line:
myservo = (i/100);
Is wrong in a couple of ways. First, i/100 will always be zero - integer division truncates in C++. Second, there's not an = operator that allows an integer value to be assigned to a Servo object. YOu need to invoke some kind of Servo method instead, likely write().
myservo.write(SOMETHING);
The SOMETHING should be the position or speed of the servo you're trying to get working. See the Servo class reference for an explanation. Your code tries to use fractions from 0-1 and thatvisn't going to work - the Servo wants a position/speed between 0 and 180.
You should look in the Servo.h header to see what member functions and operators are implemented.
Assuming what you are using is this, it does have:
Servo& operator= (float percent);
Although note that the parameter is float and you are passing an int (the parameter is also in the range 0.0 to 1.0 - so not "percent" as its name suggests - so be wary, both the documentation and the naming are poor). You should have:
myservo = i/100.0f;
However, even though i / 100 would produce zero for all i in the loop, that does not explain the error, since an implicit cast should be possible - even if clearly undesirable. You should look in the actual header you are using to see if the operator= is declared - possibly you have the wrong file or a different version or just an entirely different implementation that happens to use teh same name.
I also notice that if you look in the header, there is no documentation mark-up for this function and the Servo& operator= (Servo& rhs); member is not documented at all - hence the confusing automatically generated "Shorthand for the write and read functions." on the Servo doc page when the function shown is only one of those things. It is possible it has been removed from your version.
Given that the documentation is incomplete and that the operator= looks like an after thought, the simplest solution is to use the read() / write() members directly in any case. Or implement your own Servo class - it appears to be only a thin wrapper/facade of the PwmOut class in any case. Since that is actually part of mbed rather than user contributed code of unknown quality, you may be on firmer ground.

PGI compiler issue in routine directive

This is a section of C++ develoepd muti-gpu code.
I am trying the CPU version where OPENACC=0
#if (OPENACC==1)
#pragma acc routine
#endif
void myCass::method( int i, int j, int dir, int index )
{
#if (OPENACC==1)
double Sn[ZSIZE];
#else
double *Sn=new double[ZSIZE] (double *Sn=()malloc(ZSIZE))
#endif
}
The following method gives the compiler error "PGCC-S-1000-Call in
OpenACC region to procedure '_Znam' which has no acc routine
information" but if I replace the "new" with C-style allocation
(i.e. malloc ) in compiles fine. Is this something to be expected? I
use PGI version 18.1
Is it safe to use a large private variable such as Sn ?
You need to give a level of parallelism on your acc routine pragma. It looks like the correct pragma in your case would be acc routine seq since it contains no loops to parallelize.
"_Znam" is the mangled name for the new operator which doesn't have a device side equivalent. However, there is a CUDA device malloc routine and why it works.
Though, I would highly recommend you not perform device side dynamic allocation. Besides being very slow, the device heap is very small (default is around 8MB). While you can increase this by setting the PGI environment variable "PGI_ACC_CUDA_HEAPSIZE", the max is still only around 32MB. (Note that I haven't tested the max device heap size in awhile so this may have increased but I'd still not recommended doing device side allocation.)
Also, yes, if the level of parallelism is not specified, PGI will default to using "seq".

How to demonstrate a memory misalignment error in C on a macbook pro (Intel 64 bit processor)

In an attempt to understand C memory alignment or whatever the term is (data structure alignment?), i'm trying to write code that results in a alignment error. The original reason that brought me to learning about this is that i'm writing data parsing code that reads binary data received over the network. The data contains some uint32s, uint64s, floats, and doubles, and i'd like to make sure they are never corrupted due to errors in my parsing code.
An unsuccessful attempt at causing some problem due to misalignment:
uint32_t integer = 1027;
uint8_t * pointer = (uint8_t *)&integer;
uint8_t * bytes = malloc(5);
bytes[0] = 23; // extra byte to misalign uint32_t data
bytes[1] = pointer[0];
bytes[2] = pointer[1];
bytes[3] = pointer[2];
bytes[4] = pointer[3];
uint32_t integer2 = *(uint32_t *)(bytes + 1);
printf("integer: %u\ninteger2: %u\n", integer, integer2);
On my machine both integers print out the same. (macbook pro with Intel 64 bit processor, not sure what exactly determines alignment behaviour, is it the architecture? or exact CPU model? or compiler maybe? i use Xcode so clang)
I guess my processor/machine/setup supports unaligned reads so it takes the above code without any problems.
What would a case where parsing of say an uint32_t would fail because of code not taking alignment in account? Is there a way to make it fail on a modern Intel 64 bit system? Or am i safe from alignment errors when using simple datatypes like integers and floats (no structs)?
Edit: If anyone's reading this later, i found a similar question with interesting info: Mis-aligned pointers on x86
Normally, the x86 architecture doesn't have alignment requirements [except for some SIMD instructions like movdqa).
However, since you're trying to write code to cause such an exception ...
There is an alignment check exception bit that can be set into the x86 flags register. If you turn in on, an unaligned access will generate an exception which will show up [under linux at least] as a bus error (i.e. SIGBUS)
See my answer here: any way to stop unaligned access from c++ standard library on x86_64? for details and some sample programs to generate an exception.

3D boolean assertion fail. "mark" not matching

I'm trying to do some 3D boolean operations with CGAL. I have successfully converted my polyhedra to nef polyhedra. However, when ever I try doing a simple union, I get an assertion fail on line 286 of SM_overlayer.h:
CGAL error: assertion violation!
Expression : G.mark(v1,0)==G.mark(v2,0)&& G.mark(v1,1)==G.mark(v2,1)
File : ./CGAL/include/CGAL/Nef_S2/SM_overlayer.h
Line : 287
I tried searching the documentation for "mark". Apparently it is a template argument on the Nef_polyhedron_3" that defaults to bool. However the documentation also says it is not implemented and that you shouldn't mess with it. I am a bit confused why there is even an assertion on some unimplemented feature. I tried simply commenting out the assertion, but it simply goes on to fail at a later point.
I am using the following kernel and typedefs as it was the only example I could find that allowed doubles in the construction of the meshes.
typedef CGAL::Exact_predicates_exact_constructions_kernel Kernel;
typedef CGAL::Nef_polyhedron_3<Kernel> Nef_polyhedron;
typedef CGAL::Polyhedron_3<Kernel> Polyhedron;
I used CGAL 4.6.3 installed with the exe installer. I have also tried the 4.7 beta and I get the same exception(at line 300 instead though).
Related github issue: https://github.com/CGAL/cgal/issues/353
EDIT: The issue turned out to be with the meshes. The meshes I were using had self intersections. Thus, even though is_valid, is_closed and is_triangular returned true, the mesh was in valid for conversion to a nef polyhedron. From CGAL 4.7. The Polygon mesh processing package has been introduced which contains this which can be used to check for self intersections.

function that returns value from dlsym()?

Stupid question that I'm sure is some bit of syntax that's not right. How do I get dlsym to work with a function that returns a value? I'm getting the error 'invalid conversion of void* to LSError (*)()' in the following code - trying to get the compile the linux lightscribe sample program hoping that I can link it against the OSX dylib (why the hell won't HP release an actual Cocoa SDK? LS has only been around for what? 6 or 7 years now?):
void* LSHandle = dlopen("liblightscribe.1.dylib", RTLD_LOCAL|RTLD_LAZY);
if (LSHandle) {
LSError (*LS_DiscPrinter_ReleaseExclusiveUse)() = dlsym(LSHandle, "LS_DiscPrinter_ReleaseExclusiveUse");
..
lsError = LS_DiscPrinter_ReleaseExclusiveUse( pDiscPrinter);
The C standard does not actually define behaviour for converting to and from function pointers. Explanations vary as to why; the most common being that not all architectures implement function pointers as simple pointers to data. On some architectures, functions may reside in an entirely different segment of memory that is unaddressable using a pointer to void.
The “recommended” way to use dlsym is:
LSError (*LS_DiscPrinter_ReleaseExclusiveUse)(LS_DiscPrinterHandle);
*(void **)&LS_DiscPrinter_ReleaseExclusiveUse = dlsym("LS_DiscPrinter_ReleaseExclusiveUse");
Read the rationale and example on the POSIX page for dlsym for more information.