Objective C OpenCL Kernel Error Checking - objective-c

My kernel seems to not work if I pass in a certain image2d_t but works if I pass in another, and I don't know how to check if there is an error.
My kernel code:
kernel void change_color(read_only image2d_t img, write_only image2d_t out) {
int x = get_global_id(0);
int y = get_global_id(1);
int2 coords = {x, y};
float4 v = {1, 1, 0, 1};
write_imagef(out, coords, v);
}
Relevant host code:
cl_image_format format = {CL_BGRA, CL_UNSIGNED_INT8};
cl_image screen = gcl_create_image(
&format,
width,
height,
0,
frameSurface);
cl_ndrange range = {
2, {0,0,0}, {width,height,0}, {0,0,0}
};
change_color_kernel(&range, screen, output);
If I change
change_color_kernel(&range, screen, output)
to
change_color_kernel(&range, output, output)
the code works perfectly fine and the output image turns yellow; otherwise, the program runs as if the kernel was not invoked at all, and the output image remains unchanged.
Does this mean that passing screen results in some kind of error? How do I check what the error is, and what could be the cause?
Note: I do not know if my initialization of screen from frameSurface is correct.
Note #2: This is simply for testing purposes. I know that I'm not using img in the kernel, but the error should not be happening anyway.

Related

Raw Input mouse lastx, lasty with odd values while logged in through RDP

When I attempt to update my mouse position from the lLastX, and lLastY members of the RAWMOUSE structure while I'm logged in via RDP, I get some really odd numbers (like > 30,000 for both). I've noticed this behavior on Windows 7, 8, 8.1 and 10.
The usFlags member returns a value of MOUSE_MOVE_ABSOLUTE | MOUSE_VIRTUAL_DESKTOP. Regarding the MOUSE_MOVE_ABSOLUTE, I am handling absolute positioning as well as relative in my code. However, the virtual desktop flag has me a bit confused as I assumed that flag was for a multi-monitor setup. I've got a feeling that there's a connection to that flag and the weird numbers I'm getting. Unfortunately, I really don't know how to adjust the values without a point of reference, nor do I even know how to get a point of reference.
When I run my code locally, everything works as it should.
So does anyone have any idea why RDP + Raw Input would give me such messed up mouse lastx/lasty values? And if so, is there a way I can convert them to more sensible values?
It appears that when using WM_INPUT through remote desktop, the MOUSE_MOVE_ABSOLUTE and MOUSE_VIRTUAL_DESKTOP bits are set, and the values seem to range from 0 to USHRT_MAX.
I never found clear documentation stating which coordinate system is used when the MOUSE_VIRTUAL_DESKTOP bit is set, but this has worked well so far:
case WM_INPUT: {
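RAWINPUT input; // large enough for mouse and keyboard packets
UINT buffer_size = sizeof(input);
if (GetRawInputData((HRAWINPUT)lparam, RID_INPUT, &input, &buffer_size, sizeof(RAWINPUTHEADER)) == (UINT)-1) {
break; // the call failed, so there is nothing valid to read
}
RAWINPUT* raw = &input;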
if (raw->header.dwType != RIM_TYPEMOUSE) {
break;
}
const RAWMOUSE& mouse = raw->data.mouse;
if ((mouse.usFlags & MOUSE_MOVE_ABSOLUTE) == MOUSE_MOVE_ABSOLUTE) {
static Vector3 last_pos = vector3(FLT_MAX, FLT_MAX, FLT_MAX);
const bool virtual_desktop = (mouse.usFlags & MOUSE_VIRTUAL_DESKTOP) == MOUSE_VIRTUAL_DESKTOP;
const int width = GetSystemMetrics(virtual_desktop ? SM_CXVIRTUALSCREEN : SM_CXSCREEN);
const int height = GetSystemMetrics(virtual_desktop ? SM_CYVIRTUALSCREEN : SM_CYSCREEN);
const Vector3 absolute_pos = vector3((mouse.lLastX / float(USHRT_MAX)) * width, (mouse.lLastY / float(USHRT_MAX)) * height, 0);
if (last_pos != vector3(FLT_MAX, FLT_MAX, FLT_MAX)) {
MouseMoveEvent(absolute_pos - last_pos);
}
last_pos = absolute_pos;
}
else {
MouseMoveEvent(vector3((float)mouse.lLastX, (float)mouse.lLastY, 0));
}
}
break;
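To isolate the scaling step used above, here is a minimal sketch in plain C with no Windows headers. The function names and the `origin`/`extent` parameters are hypothetical; on Windows the values would presumably come from GetSystemMetrics (for example SM_XVIRTUALSCREEN and SM_CXVIRTUALSCREEN), and it assumes the 0 to 65535 range observed in this answer.

```c
#include <assert.h>

/* Largest value of the normalized absolute range seen in lLastX/lLastY. */
#define RAW_ABS_MAX 65535.0f

/*
 * Map a normalized absolute coordinate (0..65535) to pixels.
 * `origin` and `extent` are hypothetical parameters describing the left/top
 * edge and the width/height of the (virtual) screen in pixels.
 */
float raw_abs_to_pixels(long raw, int origin, int extent)
{
    assert(extent > 0);
    return (float)origin + ((float)raw / RAW_ABS_MAX) * (float)extent;
}

/* Movement between two absolute samples; the origin cancels out. */
float raw_abs_delta(long prev_raw, long cur_raw, int extent)
{
    return raw_abs_to_pixels(cur_raw, 0, extent)
         - raw_abs_to_pixels(prev_raw, 0, extent);
}
```

Because only the difference between consecutive samples is passed to MouseMoveEvent, the origin of the virtual desktop does not matter for relative movement. It would matter if you needed an actual cursor position.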

How-to convert an iOS camera image to greyscale using the Accelerate Framework?

It seems like this should be simpler than I'm finding it to be.
I have an AVFoundation frame coming back in the standard delegate method:
- (void)captureOutput:(AVCaptureOutput *)captureOutput
didOutputSampleBuffer:(CMSampleBufferRef)sampleBuffer
fromConnection:(AVCaptureConnection *)connection
where I would like to convert the frame to greyscale using the Accelerate.Framework.
There is a family of conversion functions in the framework, including vImageConvert_RGBA8888toPlanar8(), which looks like it might be what I need. However, I can't find any examples of how to use them!
So far, I have the code:
- (void)captureOutput:(AVCaptureOutput *)captureOutput
didOutputSampleBuffer:(CMSampleBufferRef)sampleBuffer
fromConnection:(AVCaptureConnection *)connection
{
@autoreleasepool {
CVImageBufferRef imageBuffer = CMSampleBufferGetImageBuffer(sampleBuffer);
/*Lock the image buffer*/
CVPixelBufferLockBaseAddress(imageBuffer,0);
/*Get information about the image*/
uint8_t *baseAddress = (uint8_t *)CVPixelBufferGetBaseAddress(imageBuffer);
size_t width = CVPixelBufferGetWidth(imageBuffer);
size_t height = CVPixelBufferGetHeight(imageBuffer);
size_t stride = CVPixelBufferGetBytesPerRow(imageBuffer);
// vImage In
Pixel_8 *bitmap = (Pixel_8 *)malloc(width * height * sizeof(Pixel_8));
const vImage_Buffer inImage = { bitmap, height, width, stride };
//How can I take this inImage and convert it to greyscale?????
//vImageConvert_RGBA8888toPlanar8()??? Is the correct starting format here??
}
}
So I have two questions:
(1) In the code above, is RGBA8888 the correct starting format?
(2) How can I actually make the Accelerate.Framework call to convert to greyscale?
There is an easier option here. If you change the camera acquire format to YUV, then you already have a greyscale frame that you can use as you like. When setting up your data output, use something like:
dataOutput.videoSettings = @{ (id)kCVPixelBufferPixelFormatTypeKey : @(kCVPixelFormatType_420YpCbCr8BiPlanarFullRange) };
You can then access the Y plane in your capture callback using:
CVPixelBufferRef pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer);
CVPixelBufferLockBaseAddress(pixelBuffer, kCVPixelBufferLock_ReadOnly);
uint8_t *yPlane = CVPixelBufferGetBaseAddressOfPlane(pixelBuffer, 0);
// ... do stuff with your greyscale camera image ...
CVPixelBufferUnlockBaseAddress(pixelBuffer, kCVPixelBufferLock_ReadOnly); // flags must match the lock call
The vImage method is to use vImageMatrixMultiply_Planar8 and a 1x3 matrix.
vImageConvert_RGBA8888toPlanar8 is the function you use to convert a RGBA8888 buffer into 4 planar buffers. These are used by vImageMatrixMultiply_Planar8. vImageMatrixMultiply_ARGB8888 will do it too in one pass, but your gray channel will be interleaved with three other channels in the result. vImageConvert_RGBA8888toPlanar8 itself doesn't do any math. All it does is separate your interleaved image into separate image planes.
If you need to adjust the gamma as well, then probably vImageConvert_AnyToAny() is the easy choice. It will do the fully color managed conversion from your RGB format to a grayscale colorspace. See vImage_Utilities.h.
I like Tark's answer better, though. It just leaves you having to color manage the luminance manually (if you care).
Convert BGRA Image to Grayscale with Accelerate vImage
This method illustrates using Accelerate's vImage to convert BGRA images to grayscale. Your image may well be in RGBA format, in which case you'll need to adjust the matrix accordingly, but the camera outputs BGRA so I'm using it here. The values in the matrix are the same values OpenCV uses for cvtColor; there are other values you might try, such as luminosity. I assume you malloc the appropriate amount of memory for the result. For grayscale that is only one channel, or 1/4 of the memory used for BGRA. If anyone finds issues with this code, please leave a comment.
Performance note
Converting to grayscale in this way may NOT be the fastest. You should check the performance of any method in your environment. Brad Larson's GPUImage might be faster, or even OpenCV's cvtColor. In any case you will want to remove the calls to malloc and free for the intermediate buffers and manage them for the app lifecycle. Otherwise, the function call will be dominated by the malloc and free. Apple's docs recommend reusing the whole vImage_Buffer when possible.
You can also read about solving the same problem with NEON intrinsics.
Finally, the fastest method is not converting at all. If you're getting image data from the device camera, the camera natively produces the kCVPixelFormatType_420YpCbCr8BiPlanarFullRange format, so grabbing the first plane's data (the Y channel, luma) is the fastest way to get grayscale.
BGRA to Grayscale
- (void)convertBGRAFrame:(const CLPBasicVideoFrame &)bgraFrame toGrayscale:(CLPBasicVideoFrame &)grayscaleFrame
{
vImage_Buffer bgraImageBuffer = {
.width = bgraFrame.width,
.height = bgraFrame.height,
.rowBytes = bgraFrame.bytesPerRow,
.data = bgraFrame.rawPixelData
};
void *intermediateBuffer = malloc(bgraFrame.totalBytes);
vImage_Buffer intermediateImageBuffer = {
.width = bgraFrame.width,
.height = bgraFrame.height,
.rowBytes = bgraFrame.bytesPerRow,
.data = intermediateBuffer
};
int32_t divisor = 256;
// int16_t a = (int16_t)roundf(1.0f * divisor);
int16_t r = (int16_t)roundf(0.299f * divisor);
int16_t g = (int16_t)roundf(0.587f * divisor);
int16_t b = (int16_t)roundf(0.114f * divisor);
const int16_t bgrToGray[4 * 4] = { b, 0, 0, 0,
g, 0, 0, 0,
r, 0, 0, 0,
0, 0, 0, 0 };
vImage_Error error;
error = vImageMatrixMultiply_ARGB8888(&bgraImageBuffer, &intermediateImageBuffer, bgrToGray, divisor, NULL, NULL, kvImageNoFlags);
if (error != kvImageNoError) {
NSLog(@"%s, vImage error %zd", __PRETTY_FUNCTION__, error);
}
vImage_Buffer grayscaleImageBuffer = {
.width = grayscaleFrame.width,
.height = grayscaleFrame.height,
.rowBytes = grayscaleFrame.bytesPerRow,
.data = grayscaleFrame.rawPixelData
};
void *scratchBuffer = malloc(grayscaleFrame.totalBytes);
vImage_Buffer scratchImageBuffer = {
.width = grayscaleFrame.width,
.height = grayscaleFrame.height,
.rowBytes = grayscaleFrame.bytesPerRow,
.data = scratchBuffer
};
error = vImageConvert_ARGB8888toPlanar8(&intermediateImageBuffer, &grayscaleImageBuffer, &scratchImageBuffer, &scratchImageBuffer, &scratchImageBuffer, kvImageNoFlags);
if (error != kvImageNoError) {
NSLog(@"%s, vImage error %zd", __PRETTY_FUNCTION__, error);
}
free(intermediateBuffer);
free(scratchBuffer);
}
CLPBasicVideoFrame.h - For reference
typedef struct
{
size_t width;
size_t height;
size_t bytesPerRow;
size_t totalBytes;
unsigned long pixelFormat;
void *rawPixelData;
} CLPBasicVideoFrame;
I got through the grayscale conversion, but was having trouble with the quality when I found a book on the web called Instant OpenCV for iOS. I picked up a copy, and it has a number of gems, although the code is a bit of a mess. On the bright side, it is a very reasonably priced eBook.
I'm very curious about that matrix. I toyed around with it for hours trying to figure out what the arrangement should be. I would have thought the values should be on the diagonal, but the Instant OpenCV guys put it as above.
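On the question of the matrix layout, a scalar sketch of the arithmetic may help. It assumes the row-vector-times-matrix convention that the code above appears to rely on: each output channel is the sum of all input channels weighted by one column. The function name `pixel_times_matrix` is hypothetical, and this is only an illustration of the arithmetic, not vImage's implementation (its rounding and bias handling may differ).

```c
#include <assert.h>
#include <stdint.h>

/*
 * Multiply one interleaved 8-bit pixel (as a row vector) by a row-major
 * 4x4 matrix, divide by `divisor`, and clamp each result to 0..255.
 */
void pixel_times_matrix(const uint8_t in[4], const int16_t m[16],
                        int32_t divisor, uint8_t out[4])
{
    assert(divisor != 0);
    for (int col = 0; col < 4; ++col) {
        int32_t sum = 0;
        for (int row = 0; row < 4; ++row) {
            /* Input channel `row` contributes to output channel `col`. */
            sum += (int32_t)in[row] * m[row * 4 + col];
        }
        sum /= divisor;
        if (sum < 0) sum = 0;
        if (sum > 255) sum = 255;
        out[col] = (uint8_t)sum;
    }
}

/* Weights from the answer above, pre-rounded:
 * round(0.114 * 256) = 29, round(0.587 * 256) = 150, round(0.299 * 256) = 77. */
static const int16_t kBgrToGray[16] = { 29,  0, 0, 0,
                                        150, 0, 0, 0,
                                        77,  0, 0, 0,
                                        0,   0, 0, 0 };
```

Under that convention, putting the weights in the first column makes output channel 0 the weighted sum of B, G, and R, which is the plane the later ARGB8888toPlanar8 call extracts. Putting them on the diagonal would only scale each channel independently and never mix them into a single gray value.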
If you need to use BGRA video streams, you can use this excellent conversion here.
This is the function you'll need:
void neon_convert (uint8_t * __restrict dest, uint8_t * __restrict src, int numPixels)
{
int i;
uint8x8_t rfac = vdup_n_u8 (77);
uint8x8_t gfac = vdup_n_u8 (151);
uint8x8_t bfac = vdup_n_u8 (28);
int n = numPixels / 8;
// Convert per eight pixels
for (i=0; i < n; ++i)
{
uint16x8_t temp;
uint8x8x4_t rgb = vld4_u8 (src);
uint8x8_t result;
temp = vmull_u8 (rgb.val[0], bfac);
temp = vmlal_u8 (temp,rgb.val[1], gfac);
temp = vmlal_u8 (temp,rgb.val[2], rfac);
result = vshrn_n_u16 (temp, 8);
vst1_u8 (dest, result);
src += 8*4;
dest += 8;
}
}
More optimisations (using assembly) are in the link.
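Note that the loop above processes numPixels / 8 groups, so any leftover pixels (numPixels % 8) are not converted. A scalar fallback for the remainder could look like the sketch below. `convert_tail` is a hypothetical helper name, and it assumes the same BGRA byte order and the same weights as the NEON code.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Scalar BGRA -> gray conversion using the same 8-bit weights as the NEON
 * loop (28 + 151 + 77 = 256), meant for the pixels the vector loop skips.
 */
void convert_tail(uint8_t *dest, const uint8_t *src, int numPixels)
{
    assert(numPixels >= 0);
    for (int i = 0; i < numPixels; ++i) {
        /* The largest possible sum is 255 * 256, which fits in 16 bits. */
        uint16_t sum = (uint16_t)(src[0] * 28 + src[1] * 151 + src[2] * 77);
        dest[i] = (uint8_t)(sum >> 8);  /* same shift as vshrn_n_u16(temp, 8) */
        src += 4;                       /* skip B, G, R, A */
    }
}
```

One way to use it would be to call `convert_tail(dest, src, numPixels % 8)` at the end of neon_convert, after the loop, since `src` and `dest` have already been advanced past the vectorised part at that point.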
(1) My experience with the iOS camera framework has been with images in the kCMPixelFormat_32BGRA format, which is compatible with the ARGB8888 family of functions. (It may be possible to use other formats as well.)
(2) The simplest way to convert from BGR to grayscale on iOS is to use vImageMatrixMultiply_ARGB8888ToPlanar8():
https://developer.apple.com/documentation/accelerate/1546979-vimagematrixmultiply_argb8888top
Here is a fairly complete example written in Swift. I'm assuming the Objective-C code would be similar.
guard let imageBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else {
// TODO: report error
return
}
// Lock the image buffer
if (kCVReturnSuccess != CVPixelBufferLockBaseAddress(imageBuffer, CVPixelBufferLockFlags.readOnly)) {
// TODO: report error
return
}
defer {
CVPixelBufferUnlockBaseAddress(imageBuffer, CVPixelBufferLockFlags.readOnly)
}
// Create input vImage_Buffer
let baseAddress = CVPixelBufferGetBaseAddress(imageBuffer)
let width = CVPixelBufferGetWidth(imageBuffer)
let height = CVPixelBufferGetHeight(imageBuffer)
let stride = CVPixelBufferGetBytesPerRow(imageBuffer)
var inImage = vImage_Buffer(data: baseAddress, height: UInt(height), width: UInt(width), rowBytes: stride)
// Create output vImage_Buffer
let bitmap = malloc(width * height)
var outImage = vImage_Buffer(data: bitmap, height: UInt(height), width: UInt(width), rowBytes: width)
defer {
// Make sure to free unless the caller is responsible for this
free(bitmap)
}
// Arbitrary divisor to scale coefficients to integer values
let divisor: Int32 = 0x1000
let fDivisor = Float(divisor)
// Rec.709 coefficients
var coefficientsMatrix = [
Int16(0.0722 * fDivisor), // blue
Int16(0.7152 * fDivisor), // green
Int16(0.2126 * fDivisor), // red
0 // alpha
]
// Convert to greyscale
if (kvImageNoError != vImageMatrixMultiply_ARGB8888ToPlanar8(
&inImage, &outImage, &coefficientsMatrix, divisor, nil, 0, vImage_Flags(kvImageNoFlags))) {
// TODO: report error
return
}
The code above was inspired by a tutorial from Apple on grayscale conversion, which can be found at the following link. It also includes conversion to a CGImage if that is needed. Note that they assume RGB order instead of BGR, and they only provide 3 coefficients instead of 4 (a mistake?).
https://developer.apple.com/documentation/accelerate/vimage/converting_color_images_to_grayscale

Why doesn't this OpenCL kernel work?

I am trying to write a physics emulator for Mac which uses OpenCL hardware acceleration. Right now, when I try to run my kernel with OpenCL, the debugger reports a SIGABRT at my gcl_memcpy line, which fetches the results from the kernel.
Right now, I have managed to simplify the code quite a bit and still get the error. Here is my barebones OpenCL kernel which still causes a crash:
kernel void pointfield(global const float * points,
float posX, float posY,
global float * fieldsOut) {
size_t index = get_global_id(0);
float2 chargePosition = vload2(0, &points[index * 3]);
vstore2(chargePosition, 0, &fieldsOut[2 * index]);
}
This literally just loads a 2-dimensional vector and then stores it immediately. And I know what you are thinking, but the problem is not an overflow error. This equivalent code works:
kernel void pointfield(global const float * points,
float posX, float posY,
global float * fieldsOut) {
size_t index = get_global_id(0);
fieldsOut[2 * index] = points[3 * index];
fieldsOut[2 * index + 1] = points[3 * index + 1];
}
What on EARTH is wrong with my vector code?
Also, in case it is relevant, here is the gist of how I am creating the OpenCL context:
dispatch_queue_t queue = gcl_create_dispatch_queue(CL_DEVICE_TYPE_GPU, NULL);
if (!queue) {
queue = gcl_create_dispatch_queue(CL_DEVICE_TYPE_CPU, NULL);
}
cl_device_id gpu = gcl_get_device_id_with_dispatch_queue(queue);
char name[128];
clGetDeviceInfo(gpu, CL_DEVICE_NAME, 128, name, NULL);
dispatch_sync(queue, ^{
cl_ndrange range = {
1,
{0, 0, 0},
{self.pointChargeCount, 0, 0},
0
};
pointfield_kernel(&range, (cl_float *)_pointCharges.clBuffer, point.x, point.y,
(cl_float *)_fieldValues.clBuffer);
[_fieldValues getCLBuffer]; // calls gcl_memcpy
});
If I omit the [_fieldValues getCLBuffer] call, the abort happens at the end of the dispatch_sync call rather than at the gcl_memcpy call.
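For reference, the indexing performed by the two kernels can be compared on the host with a plain C emulation. This is only a sketch: the `_emu` names are hypothetical stand-ins, and it relies on the OpenCL C definition that vload2(offset, p) and vstore2(v, offset, p) access the two elements starting at p + 2 * offset. It does not reproduce or explain the abort; it only shows that, for a given index, the vector version touches the same elements as the scalar version.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { float x, y; } float2_emu;  /* stand-in for OpenCL's float2 */

/* vload2(offset, p) reads the two floats at p[2*offset] and p[2*offset + 1]. */
float2_emu vload2_emu(size_t offset, const float *p)
{
    float2_emu v = { p[2 * offset], p[2 * offset + 1] };
    return v;
}

/* vstore2(v, offset, p) writes to p[2*offset] and p[2*offset + 1]. */
void vstore2_emu(float2_emu v, size_t offset, float *p)
{
    p[2 * offset] = v.x;
    p[2 * offset + 1] = v.y;
}

/* Host-side copy of the first kernel's body for one work-item `index`. */
void pointfield_emu(const float *points, float *fieldsOut, size_t index)
{
    assert(points != NULL && fieldsOut != NULL);
    float2_emu chargePosition = vload2_emu(0, &points[index * 3]);
    vstore2_emu(chargePosition, 0, &fieldsOut[2 * index]);
}
```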

Android - Trying to gradually fill a circle bottom to top

I'm trying to fill a round circle (transparent other than the outline of the circle) in an ImageView.
I have the code working:
public void setPercentage(int p) {
if (this.percentage != p ) {
this.percentage = p;
this.invalidate();
}
}
@Override public void onDraw(Canvas canvas) {
Canvas tempCanvas;
Paint paint;
Bitmap bmCircle = null;
if (this.getWidth() == 0 || this.getHeight() == 0 )
return ; // nothing to do
mergedLayersBitmap = Bitmap.createBitmap(this.getWidth(), this.getHeight(), Bitmap.Config.ARGB_8888);
tempCanvas = new Canvas(mergedLayersBitmap);
paint = new Paint(Paint.ANTI_ALIAS_FLAG);
paint.setStyle(Paint.Style.FILL_AND_STROKE);
paint.setFilterBitmap(false);
bmCircle = drawCircle(this.getWidth(), this.getHeight());
tempCanvas.drawBitmap(bmCircle, 0, 0, paint);
paint.setXfermode(new PorterDuffXfermode(PorterDuff.Mode.CLEAR));
// 100f forces float division; with ints, percentage/100 is 0 for every value below 100
tempCanvas.clipRect(0, 0, this.getWidth(), (int) FloatMath.floor(this.getHeight() - this.getHeight() * (percentage / 100f)));
tempCanvas.drawColor(0xFF660000, PorterDuff.Mode.CLEAR);
canvas.drawBitmap(mergedLayersBitmap, null, new RectF(0,0, this.getWidth(), this.getHeight()), new Paint());
canvas.drawBitmap(mergedLayersBitmap, 0, 0, new Paint());
}
static Bitmap drawCircle(int w, int h) {
Bitmap bm = Bitmap.createBitmap(w, h, Bitmap.Config.ARGB_8888);
Canvas c = new Canvas(bm);
Paint p = new Paint(Paint.ANTI_ALIAS_FLAG);
p.setColor(drawColor);
c.drawOval(new RectF(0, 0, w, h), p);
return bm;
}
It kind of works. However, I have two issues: I run out of memory quickly and the GC goes crazy. How can I utilize the least amount of memory for this operation?
I know I shouldn't be instantiating objects in onDraw; however, I'm not sure where else to draw. Thank you.
Pseudocode would look something like this:
for each pixel inside CircleBitmap {
if (pixel.y is < Yboundary && pixelIsInCircle(pixel.x, pixel.y)) {
CircleBitmap.setPixel(pixel.x, pixel.y, Color.rgb(45, 127, 0));
}
}
That may be slow, but it would work, and the smaller the circle, the faster it would go.
Just know the basics: the bitmap width and height (for example 256x256) and the circle's radius, and to make things easy, center the circle at 128,128. Then, as you go pixel by pixel, check the pixel's X and Y to see whether it falls inside the circle and below the Y limit line.
Then just use:
CircleBitmap.setPixel(x, y, Color.rgb(45, 127, 0));
Edit: to speed things up, don't even bother looking at the pixels above the Y limit.
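The per-pixel test described above can be sketched in plain C (kept language-neutral on purpose; the same integer arithmetic carries over to Java). All function names here are hypothetical. The sketch assumes y grows downward, as in Android bitmaps, so "below the limit line" means y >= the limit; if you measure y upward, the comparison flips.

```c
#include <assert.h>

/*
 * Returns 1 if pixel (x, y) should be painted: it lies inside the circle
 * centred at (cx, cy) with the given radius, and on or below row y_limit.
 */
int pixel_is_filled(int x, int y, int cx, int cy, int radius, int y_limit)
{
    assert(radius >= 0);
    int dx = x - cx;
    int dy = y - cy;
    return y >= y_limit && dx * dx + dy * dy <= radius * radius;
}

/* Row at which the fill starts for a given percentage (0..100). */
int fill_start_row(int height, int percentage)
{
    /* Multiply before dividing so integer division does not truncate to 0. */
    return height - (height * percentage) / 100;
}

/* Count painted pixels in a w x h bitmap, skipping rows above the limit. */
int count_filled(int w, int h, int cx, int cy, int radius, int percentage)
{
    int y_limit = fill_start_row(h, percentage);
    int count = 0;
    for (int y = y_limit; y < h; ++y)
        for (int x = 0; x < w; ++x)
            count += pixel_is_filled(x, y, cx, cy, radius, y_limit);
    return count;
}
```

Starting the outer loop at the limit row is the "don't look at pixels above the Y limit" optimisation from the edit above.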
In case you want to see another (perhaps cleaner) solution, look at this link: filling a circle gradually from bottom to top android

Rotating camera around the X-axis (three.js)

I am trying to rotate the camera around the X-axis of the scene.
At this point my code is like this:
rotation += 0.05;
camera.position.y = Math.sin(rotation) * 500;
camera.position.z = Math.cos(rotation) * 500;
This makes the camera move around but during the rotation something weird happens and either the camera flips, or it skips some part of the imaginary circle it's following.
You have only provided a snippet of code, so I have to make some assumptions about what you are doing.
This code:
rotation += 0.05;
camera.position.x = 0;
camera.position.y = Math.sin(rotation) * 500;
camera.position.z = Math.cos(rotation) * 500;
camera.lookAt( scene.position ); // the origin
will cause the "flipping" you refer to because the camera is trying to remain "right side up", and it will quickly change orientation as it passes over the "north pole."
If you offset the camera's x-coordinate like so,
camera.position.x = 200;
the camera behavior will appear more natural to you.
Three.js tries to keep the camera facing up. When you pass 0 along the z-axis, it'll "fix" the camera's rotation. You can just check and reset the camera's angle manually.
camera.lookAt( scene.position ); // the origin
if (camera.position.z < 0) {
camera.rotation.z = 0;
}
I'm sure this is not the best solution, but if anyone else runs across this question while playing with three.js (like I just did), it'll get you one step further.
This works for me, I hope it helps.
Rotating around X-Axis:
var x_axis = new THREE.Vector3( 1, 0, 0 );
var quaternion = new THREE.Quaternion();
camera.position.applyQuaternion(quaternion.setFromAxisAngle(x_axis, rotation_speed));
camera.up.applyQuaternion(quaternion.setFromAxisAngle(x_axis, rotation_speed));
Rotating around Y-Axis:
var y_axis = new THREE.Vector3( 0, 1, 0 );
camera.position.applyQuaternion(quaternion.setFromAxisAngle(y_axis, angle));
Rotating around Z-Axis:
var z_axis = new THREE.Vector3( 0, 0, 1 );
camera.up.applyQuaternion(quaternion.setFromAxisAngle(z_axis, angle));
I wanted to move my camera to a new location while having the camera look at a particular object, and this is what I came up with [make sure to load tween.js]:
/**
* Helper to move camera
* @param loc Vec3 - where to move the camera; has x, y, z attrs
* @param lookAt Vec3 - where the camera should look; has x, y, z attrs
* @param duration int - duration of transition in ms
**/
function flyTo(loc, lookAt, duration) {
// Use initial camera quaternion as the slerp starting point
var startQuaternion = camera.quaternion.clone();
// Use dummy camera focused on target as the slerp ending point
var dummyCamera = camera.clone();
dummyCamera.position.set(loc.x, loc.y, loc.z);
// set the dummy camera quaternion
var rotObjectMatrix = new THREE.Matrix4();
rotObjectMatrix.makeRotationFromQuaternion(startQuaternion);
dummyCamera.quaternion.setFromRotationMatrix(rotObjectMatrix);
dummyCamera.up.copy(camera.up); // Vector3.set expects x, y, z, so copy the up vector instead
console.log(camera.quaternion, dummyCamera.quaternion);
// create dummy controls to avoid mutating main controls
var dummyControls = new THREE.TrackballControls(dummyCamera);
dummyControls.target.set(loc.x, loc.y, loc.z);
dummyControls.update();
// Animate between the start and end quaternions
new TWEEN.Tween(camera.position)
.to(loc, duration)
.onUpdate(function(timestamp) {
// Slerp the camera quaternion for smooth transition.
// `timestamp` is the eased time value from the tween.
THREE.Quaternion.slerp(startQuaternion, dummyCamera.quaternion, camera.quaternion, timestamp);
camera.lookAt(lookAt);
})
.onComplete(function() {
controls.target = new THREE.Vector3(scene.children[1].position-0.001);
camera.lookAt(lookAt);
}).start();
}
Example usage:
var pos = {
x: -4.3,
y: 1.7,
z: 7.3,
};
var lookAt = scene.children[1].position;
flyTo(pos, lookAt, 60000);
Then in your update()/render() function, call TWEEN.update();