Trying to create a C Function List - objective-c

I have been trying to create a list of general global C functions for various classes to use, and although I've done this in the past, this one is having problems. Here are the .h and .c parts of the list.
CDGeometry.h
//CDPoint
//////////////////////////////////////////////////////////////////////
typedef struct CDPoint {
CGFloat x, y, z;
} CDPoint;
// Creates a CDPoint from 3 float numbers
CDPoint CDPointMake(float x, float y, float z);
//CDLine
//////////////////////////////////////////////////////////////////////
typedef struct CDLine {
CDPoint a, b;
} CDLine;
// Creates a CDLine from 2 CDPoints
CDLine CDLineMake(CDPoint a, CDPoint b);
//CDVector
//////////////////////////////////////////////////////////////////////
typedef struct CDVector {
CDPoint start, finish;
CDPoint gradient;
} CDVector;
// Creates a CDVector from 2 CDPoints
CDVector CDVectorMake(CDPoint startPoint, CDPoint endPoint);
// Returns a point travelled to on a given vector, using a start point and a distance scalar.
CDPoint CDVectorTrace(CDVector vecToTrace, CDPoint startPoint, float distance);
//CDExtra
//////////////////////////////////////////////////////////////////////
// This is stuff that shouldn't really be in this section, but is here for convenience until there are enough functions for it to be standalone.
GLfloat* CDMeshColorsCreateGrey(CGFloat bValue, CGFloat vertCount);
CGFloat* CDMeshVertexesCreateRectangle(CGFloat height, CGFloat width);
CDGeometry.c
#include "CDGeometry.h"
//CDGeometry.c
/* A collection of functions and typedefs that aid 2D and 3D environment positioning and provides methods for objects and processes. Also includes elements relevant to collision detection. */
//CDPoint
//////////////////////////////////////////////////////////////////////
CDPoint CDPointMake(float x, float y, float z)
{
return (CDPoint) {x, y, z};
}
//CDLine
//////////////////////////////////////////////////////////////////////
CDLine CDLineMake(CDPoint a, CDPoint b)
{
return (CDLine) {a, b};
}
//CDVector
//////////////////////////////////////////////////////////////////////
CDVector CDVectorMake(CDPoint startPoint, CDPoint endPoint)
{
CDPoint grad = CDPointMake(startPoint.x / endPoint.x,
startPoint.y / endPoint.y,
startPoint.z / endPoint.z);
return (CDVector) {startPoint, endPoint, grad};
}
//CDExtra
/////////////////////////////////////////////////////////////////////
GLfloat* CDMeshColorsCreateGrey(CGFloat bValue, CGFloat vertCount)
{
GLfloat *greyColor = (GLfloat *) malloc(vertCount * 4 * sizeof(GLfloat));
int index = 0;
for (index = 0; index < (vertCount); index++)
{
int position = index * 4;
greyColor[position] = bValue;
greyColor[position + 1] = bValue;
greyColor[position + 2] = bValue;
greyColor[position + 3] = 1.0;
}
return greyColor;
}
CGFloat* CDMeshVertexesCreateRectangle(CGFloat height, CGFloat width) {
CGFloat *squareVertexes = (CGFloat *) malloc(8 * sizeof(CGFloat));
squareVertexes[0] = -(width / 2);
squareVertexes[1] = -(height / 2);
squareVertexes[2] = (width / 2);
squareVertexes[3] = -(height / 2);
squareVertexes[4] = (width / 2);
squareVertexes[5] = (height / 2);
squareVertexes[6] = -(width / 2);
squareVertexes[7] = (height / 2);
return squareVertexes;
}
When I don't import any framework, I receive 'Parse Error: unknown type name' for CGFloat and GLfloat. When I do import one inside the .h file, I get Parse and Semantic errors, where NSString is an unknown type name inside the framework, as well as other 'Expected identifier or "("' errors.
I've never had to include this header for my original C function lists. I've gone through other example code from Apple and checked the headers of other classes that use these functions and typedefs, and I can't find the problem.

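CGFloat is part of the CoreGraphics framework, so the compiler only knows the type once a header that declares it has been included. A pure C file cannot import Objective-C frameworks such as Foundation or UIKit, which is where the NSString errors come from. If you want to stay in pure C, either include only C headers or define the values as plain float. If the functions are only going to be used from Objective-C, you can make the implementation a .m file and you should not have any trouble.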
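As a minimal sketch of the pure-C route, assuming you are happy to store coordinates as plain float instead of CGFloat, the point and line helpers from the question need no framework at all. The type and function names are the ones from the question; marking the functions static is only so the sketch fits in a single file.

```c
/* CDGeometry sketch in pure C: no CoreGraphics, so plain float replaces CGFloat. */
typedef struct CDPoint {
    float x, y, z;
} CDPoint;

typedef struct CDLine {
    CDPoint a, b;
} CDLine;

/* Creates a CDPoint from 3 float numbers. */
static CDPoint CDPointMake(float x, float y, float z)
{
    CDPoint p = { x, y, z };
    return p;
}

/* Creates a CDLine from 2 CDPoints. */
static CDLine CDLineMake(CDPoint a, CDPoint b)
{
    CDLine l = { a, b };
    return l;
}
```

A call such as CDLineMake(CDPointMake(1, 2, 3), CDPointMake(4, 5, 6)) then compiles in a .c file with no imports.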

When I use Y-Combinator and block in C, I meet a strange thing in parameter value

When I try to calculate sinh−1(x) using these functions:
double asinh_recursion(double buf, double increment, double input_var, unsigned long item_count) {
if (fabs(increment) < 1E-5) {
return buf;
}
return asinh_recursion(buf + increment, increment * (-1) * (2 * item_count - 1) * (2 * item_count -1) / (2 * item_count + 1) / 2 / item_count * input_var, input_var, item_count + 1);
}
double asinh(double x) {
if (!(fabs(x) < 1.0)) {
printf("error asinh():wrong param x(fabs(x) > 1.0)");
return -1.0;
}
return asinh_recursion(0.0, x, x * x, 1);
}
it seems to work.
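For reference, the recursion above appears to sum the Maclaurin series of the inverse hyperbolic sine, which converges only for |x| < 1 (hence the range check). Writing t_n for the n-th term, which is the code's increment with n = item_count:

```latex
\sinh^{-1}(x) = \sum_{n=1}^{\infty} t_n
             = x - \frac{x^{3}}{6} + \frac{3x^{5}}{40} - \cdots,
\qquad
t_1 = x,
\qquad
t_{n+1} = -\,t_n\,\frac{(2n-1)^{2}}{(2n+1)\,2n}\,x^{2}.
```

For x = 0.5 this gives t_2 ≈ -0.020833 and t_3 ≈ 0.002344, so the partial sum after two terms is 0.479167.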
But when I try to do it with a block and a Y combinator:
typedef void * (^YCBlock)(void *);
YCBlock Y;
double asinh_with_block(double x) {
if (!(fabs(x) < 1.0)) {
printf("error asinh():wrong param x(fabs(x) > 1.0)");
return -1.0;
}
Y= (YCBlock) ^ (YCBlock f) {
return (YCBlock) ^ (YCBlock g) {
return g(g);
}(
(YCBlock) ^ (YCBlock h) {
return f(^ (void * x) { return ((YCBlock)h(h))(x); });
}
);
};
typedef double (^ RECUR_BLK_TYPE)(double, double, unsigned long);
RECUR_BLK_TYPE recur_block = Y(^(RECUR_BLK_TYPE recur_block){
return Block_copy(^ double (double buf, double increment, unsigned long item_count){
if (item_count < 4) {
printf("param:%lf,%lf,%lu\n", buf, increment, item_count);
}
if (fabs(increment) < 1E-5) {
return buf;
}
buf = buf + increment;
increment = increment * (-1) * (2 * item_count - 1) * (2 * item_count -1) / (2 * item_count + 1) / 2 / item_count * (x * x);
++item_count;
if (item_count < 4) {
printf("\tbuf:%lf\n", buf);
}
return recur_block(buf, increment, item_count);
});
});
double ret = recur_block(0, x, 1);
Block_release(recur_block);
Block_release(Y);
return ret;
}
the output is strange (x = 0.5):
param:0.000000,0.500000,1
buf:0.500000
param:0.500000,-0.020833,2
buf:0.479167
param:0.500000,0.002344,3
...
asinh_with_block(0.500000):0.500000
It seems that inside the block, at some point, I pass buf=0.479167, but the next time I print it, it is still 0.500000.
I want to find out why it behaves like this; maybe I wrote some wrong code somewhere...
The problem is that your Y combinator is only made to work with an underlying function that takes one void * parameter and returns a void *. You can see that in the line:
return f(^ (void * x) { return ((YCBlock)h(h))(x); });
The block in there takes x (one argument) and passes x on to another thing as one argument. For it to work with a recursive function of multiple arguments, this block must take those multiple arguments and pass them all on (of course, the types all need to be right too, because different types have different sizes, and the ABI for passing and returning values of different types is different). So you will need a different Y combinator for each function signature.
You have a recursive function that takes three parameters (two doubles and an unsigned long) and returns a double. You can (minimally) make it work by changing the relevant block in the Y combinator and coercing it from the wrong type to the right type:
return f(^ (double buf, double increment, unsigned long item_count) {
return ((RECUR_BLK_TYPE)((YCBlock)h(h)))(buf, increment, item_count);
});
But to really make it clean with correct type safety without this unsafe casting would require you to carefully set up the types. Something like this:
typedef double (^Func)(double, double, unsigned long);
typedef Func (^FuncFunc)(Func);
typedef Func (^RecursiveFunc)(void *);
typedef Func (^YCBlock)(FuncFunc);
Y = ^(FuncFunc f) {
return ^(RecursiveFunc g) {
return g(g);
}(
^(void *temp) {
RecursiveFunc h = temp; // trick to hide the recursive typing
return f(^(double buf, double increment, unsigned long item_count) {
return h(h)(buf, increment, item_count);
});
}
);
};

QuadTree or KD Tree for objective c? [closed]

I've been looking for a while for a decent piece of code implementing one of those algorithms to use in my app.
I found this example: http://rosettacode.org/wiki/K-d_tree#C
But when I put the code in Xcode, I get errors, for example:
"use of undeclared identifier", "expected ';' at the end of declaration".
I guess a header file is missing?
I copied the code from the link and made a minor edit which moved
"swap" from being an inline nested function to a static function.
Compiled with "gcc -std=c99 file.c", it builds fine. So, no, it doesn't
need another include file. Maybe you mispasted it.
If you are happy with this answer, you could accept it. Thanks.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <math.h>
#include <time.h>
#define MAX_DIM 3
struct kd_node_t{
double x[MAX_DIM];
struct kd_node_t *left, *right;
};
static inline double /* static: a plain C99 inline has no external definition and can fail to link */
dist(struct kd_node_t *a, struct kd_node_t *b, int dim)
{
double t, d = 0;
while (dim--) {
t = a->x[dim] - b->x[dim];
d += t * t;
}
return d;
}
static void swap(struct kd_node_t *x, struct kd_node_t *y) {
double tmp[MAX_DIM];
memcpy(tmp, x->x, sizeof(tmp));
memcpy(x->x, y->x, sizeof(tmp));
memcpy(y->x, tmp, sizeof(tmp));
}
/* see quickselect method */
struct kd_node_t*
find_median(struct kd_node_t *start, struct kd_node_t *end, int idx)
{
if (end <= start) return NULL;
if (end == start + 1)
return start;
struct kd_node_t *p, *store, *md = start + (end - start) / 2;
double pivot;
while (1) {
pivot = md->x[idx];
swap(md, end - 1);
for (store = p = start; p < end; p++) {
if (p->x[idx] < pivot) {
if (p != store)
swap(p, store);
store++;
}
}
swap(store, end - 1);
/* median has duplicate values */
if (store->x[idx] == md->x[idx])
return md;
if (store > md) end = store;
else start = store;
}
}
struct kd_node_t*
make_tree(struct kd_node_t *t, int len, int i, int dim)
{
struct kd_node_t *n;
if (!len) return 0;
if ((n = find_median(t, t + len, i))) {
i = (i + 1) % dim;
n->left = make_tree(t, n - t, i, dim);
n->right = make_tree(n + 1, t + len - (n + 1), i, dim);
}
return n;
}
/* global variable, so sue me */
int visited;
void nearest(struct kd_node_t *root, struct kd_node_t *nd, int i, int dim,
struct kd_node_t **best, double *best_dist)
{
double d, dx, dx2;
if (!root) return;
d = dist(root, nd, dim);
dx = root->x[i] - nd->x[i];
dx2 = dx * dx;
visited ++;
if (!*best || d < *best_dist) {
*best_dist = d;
*best = root;
}
/* if chance of exact match is high */
if (!*best_dist) return;
if (++i >= dim) i = 0;
nearest(dx > 0 ? root->left : root->right, nd, i, dim, best, best_dist);
if (dx2 >= *best_dist) return;
nearest(dx > 0 ? root->right : root->left, nd, i, dim, best, best_dist);
}
#define N 1000000
#define rand1() (rand() / (double)RAND_MAX)
#define rand_pt(v) { v.x[0] = rand1(); v.x[1] = rand1(); v.x[2] = rand1(); }
int main(void)
{
int i;
struct kd_node_t wp[] = {
{{2, 3}}, {{5, 4}}, {{9, 6}}, {{4, 7}}, {{8, 1}}, {{7, 2}}
};
struct kd_node_t this = {{9, 2}};
struct kd_node_t *root, *found, *million;
double best_dist;
root = make_tree(wp, sizeof(wp) / sizeof(wp[1]), 0, 2);
visited = 0;
found = 0;
nearest(root, &this, 0, 2, &found, &best_dist);
printf(">> WP tree\nsearching for (%g, %g)\n"
"found (%g, %g) dist %g\nseen %d nodes\n\n",
this.x[0], this.x[1],
found->x[0], found->x[1], sqrt(best_dist), visited);
million = calloc(N, sizeof(struct kd_node_t));
srand(time(0));
for (i = 0; i < N; i++) rand_pt(million[i]);
root = make_tree(million, N, 0, 3);
rand_pt(this);
visited = 0;
found = 0;
nearest(root, &this, 0, 3, &found, &best_dist);
printf(">> Million tree\nsearching for (%g, %g, %g)\n"
"found (%g, %g, %g) dist %g\nseen %d nodes\n",
this.x[0], this.x[1], this.x[2],
found->x[0], found->x[1], found->x[2],
sqrt(best_dist), visited);
/* search many random points in million tree to see average behavior.
tree size vs avg nodes visited:
10 ~ 7
100 ~ 16.5
1000 ~ 25.5
10000 ~ 32.8
100000 ~ 38.3
1000000 ~ 42.6
10000000 ~ 46.7 */
int sum = 0, test_runs = 100000;
for (i = 0; i < test_runs; i++) {
found = 0;
visited = 0;
rand_pt(this);
nearest(root, &this, 0, 3, &found, &best_dist);
sum += visited;
}
printf("\n>> Million tree\n"
"visited %d nodes for %d random findings (%f per lookup)\n",
sum, test_runs, sum/(double)test_runs);
// free(million);
return 0;
}

CUDA Thrust sort_by_key when the key is a tuple dealt with by zip_iterator's with custom comparison predicate

I've looked through a lot of questions here for something similar and there are quite a few, albeit with one minor change. I'm trying to sort values with a zip_iterator as a compound key.
Specifically, I have the following function:
void thrustSort(
unsigned int * primaryKey,
float * secondaryKey,
unsigned int * values,
unsigned int numberOfPoints)
{
thrust::device_ptr<unsigned int> dev_ptr_pkey = thrust::device_pointer_cast(primaryKey);
thrust::device_ptr<float> dev_ptr_skey = thrust::device_pointer_cast(secondaryKey);
thrust::device_ptr<unsigned int> dev_ptr_values = thrust::device_pointer_cast(values);
thrust::tuple<thrust::device_ptr<unsigned int>, thrust::device_ptr<float> > keytup_begin =
thrust::make_tuple<thrust::device_ptr<unsigned int>, thrust::device_ptr<float> >(dev_ptr_pkey, dev_ptr_skey);
thrust::zip_iterator<thrust::tuple<thrust::device_ptr<unsigned int>, thrust::device_ptr<float> > > first =
thrust::make_zip_iterator<thrust::tuple<thrust::device_ptr<unsigned int>, thrust::device_ptr<float> > >(keytup_begin);
thrust::sort_by_key(first, first + numberOfPoints, dev_ptr_values, ZipComparator());
}
and this custom predicate:
typedef thrust::device_ptr<unsigned int> tdp_uint ;
typedef thrust::device_ptr<float> tdp_float ;
typedef thrust::tuple<tdp_uint, tdp_float> tdp_uif_tuple ;
struct ZipComparator
{
__host__ __device__
inline bool operator() (const tdp_uif_tuple &a, const tdp_uif_tuple &b)
{
if(a.head < b.head) return true;
if(a.head == b.head) return a.tail < b.tail;
return false;
}
};
The errors I'm getting are:
Error 1 error : no instance of constructor "thrust::device_ptr<T>::device_ptr [with T=unsigned int]" matches the argument list C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v4.0\include\thrust\detail\tuple.inl 309 1 ---
Error 2 error : no instance of constructor "thrust::device_ptr<T>::device_ptr [with T=float]" matches the argument list C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v4.0\include\thrust\detail\tuple.inl 401 1 ---
Any ideas what might cause this / how do I write a predicate that indeed works?
Thanks in Advance,
Nathan
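The comparator is called with the dereferenced key values, so its arguments are of type const thrust::tuple<unsigned int, float>&. The const tdp_uif_tuple& type you defined expands to const thrust::tuple<thrust::device_ptr<unsigned int>, thrust::device_ptr<float> >&, which is a tuple of pointers rather than a tuple of values.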
The code below compiles for me:
struct ZipComparator
{
__host__ __device__
inline bool operator() (const thrust::tuple<unsigned int, float> &a, const thrust::tuple<unsigned int, float> &b)
{
if(a.head < b.head) return true;
if(a.head == b.head) return a.tail < b.tail;
return false;
}
};
Hope it does for you as well :)
http://code.google.com/p/thrust/wiki/QuickStartGuide#zip_iterator has more details on the zip iterator.
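The ordering this comparator implements is a plain lexicographic comparison on the (unsigned int, float) pair: the primary key decides, and the secondary key only breaks ties. Purely as an illustration, here is the same ordering on the host in C with qsort. The KeyPair struct and the function name are hypothetical and are not part of Thrust.

```c
#include <stdlib.h>

/* Hypothetical host-side stand-in for one zipped (primary, secondary) key. */
typedef struct KeyPair {
    unsigned int primary;
    float secondary;
} KeyPair;

/* qsort-style comparison: negative if a sorts before b, positive if after, 0 if equal. */
static int key_pair_compare(const void *lhs, const void *rhs)
{
    const KeyPair *a = (const KeyPair *)lhs;
    const KeyPair *b = (const KeyPair *)rhs;
    if (a->primary != b->primary)
        return a->primary < b->primary ? -1 : 1;   /* primary key decides */
    if (a->secondary != b->secondary)
        return a->secondary < b->secondary ? -1 : 1; /* tie: secondary key decides */
    return 0;
}
```

Sorting an array of keys is then qsort(keys, count, sizeof keys[0], key_pair_compare).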
Not required, but if you're looking to clean up the length of those templates, you can do this:
void thrustSort(
unsigned int * primaryKey,
float * secondaryKey,
unsigned int * values,
unsigned int numberOfPoints)
{
tdp_uint dev_ptr_pkey(primaryKey);
tdp_float dev_ptr_skey(secondaryKey);
tdp_uint dev_ptr_values(values);
thrust::tuple<tdp_uint, tdp_float> keytup_begin = thrust::make_tuple(dev_ptr_pkey, dev_ptr_skey);
thrust::zip_iterator<thrust::tuple<tdp_uint, tdp_float> > first =
thrust::make_zip_iterator(keytup_begin);
thrust::sort_by_key(first, first + numberOfPoints, dev_ptr_values, ZipComparator());
}
A lot of the template arguments can be inferred from the arguments.
This is a fully worked example of how to use sort_by_key when the key is a tuple built with zip_iterators and the ordering is a custom comparison operator.
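#include <cstdio>
#include <cstdlib>
#include <thrust/device_ptr.h>
#include <thrust/device_vector.h>
#include <thrust/iterator/zip_iterator.h>
#include <thrust/sort.h>
#include <thrust/tuple.h>
#include "Utilities.cuh" // author's own helper header; presumably where gpuErrchk is defined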
// --- Defining tuple type
typedef thrust::tuple<int, int> Tuple;
/**************************/
/* TUPLE ORDERING FUNCTOR */
/**************************/
struct TupleComp
{
__host__ __device__ bool operator()(const Tuple& t1, const Tuple& t2)
{
if (t1.get<0>() < t2.get<0>())
return true;
if (t1.get<0>() > t2.get<0>())
return false;
return t1.get<1>() < t2.get<1>();
}
};
/********/
/* MAIN */
/********/
int main()
{
const int N = 8;
// --- Keys and values on the host: allocation and definition
int h_keys1[N] = { 1, 3, 3, 3, 2, 3, 2, 1 };
int h_keys2[N] = { 1, 5, 3, 8, 2, 8, 1, 1 };
float h_values[N] = { 0.3, 5.1, 3.2, -0.08, 2.1, 5.2, 1.1, 0.01};
printf("\n\n");
printf("Original\n");
for (int i = 0; i < N; i++) {
printf("%i %i %f\n", h_keys1[i], h_keys2[i], h_values[i]);
}
// --- Keys and values on the device: allocation
int *d_keys1; gpuErrchk(cudaMalloc(&d_keys1, N * sizeof(int)));
int *d_keys2; gpuErrchk(cudaMalloc(&d_keys2, N * sizeof(int)));
float *d_values; gpuErrchk(cudaMalloc(&d_values, N * sizeof(float)));
// --- Keys and values: host -> device
gpuErrchk(cudaMemcpy(d_keys1, h_keys1, N * sizeof(int), cudaMemcpyHostToDevice));
gpuErrchk(cudaMemcpy(d_keys2, h_keys2, N * sizeof(int), cudaMemcpyHostToDevice));
gpuErrchk(cudaMemcpy(d_values, h_values, N * sizeof(float), cudaMemcpyHostToDevice));
// --- From raw pointers to device_ptr
thrust::device_ptr<int> dev_ptr_keys1 = thrust::device_pointer_cast(d_keys1);
thrust::device_ptr<int> dev_ptr_keys2 = thrust::device_pointer_cast(d_keys2);
thrust::device_ptr<float> dev_ptr_values = thrust::device_pointer_cast(d_values);
// --- Declare outputs
thrust::device_vector<float> d_values_output(N);
thrust::device_vector<Tuple> d_keys_output(N);
auto begin_keys = thrust::make_zip_iterator(thrust::make_tuple(dev_ptr_keys1, dev_ptr_keys2));
auto end_keys = thrust::make_zip_iterator(thrust::make_tuple(dev_ptr_keys1 + N, dev_ptr_keys2 + N));
thrust::sort_by_key(begin_keys, end_keys, dev_ptr_values, TupleComp());
int *h_keys1_output = (int *)malloc(N * sizeof(int));
int *h_keys2_output = (int *)malloc(N * sizeof(int));
float *h_values_output = (float *)malloc(N * sizeof(float));
gpuErrchk(cudaMemcpy(h_keys1_output, d_keys1, N * sizeof(int), cudaMemcpyDeviceToHost));
gpuErrchk(cudaMemcpy(h_keys2_output, d_keys2, N * sizeof(int), cudaMemcpyDeviceToHost));
gpuErrchk(cudaMemcpy(h_values_output, d_values, N * sizeof(float), cudaMemcpyDeviceToHost));
printf("\n\n");
printf("Ordered\n");
for (int i = 0; i < N; i++) {
printf("%i %i %f\n", h_keys1_output[i], h_keys2_output[i], h_values_output[i]);
}
}

How to quickly find an image in another image using CUDA?

In my current project I need to find the pixel-exact position of an image contained in another, larger image. The smaller image is never rotated or stretched (so it should match pixel by pixel), but it may have a different brightness and some of its pixels may be distorted. My first attempt was to do it on the CPU, but it was too slow. The calculations are highly parallel, so I decided to use the GPU. I have just started to learn CUDA and wrote my first CUDA app. My code works, but it is still too slow even on the GPU: when the larger image is 1024x1280 and the smaller one is 128x128, the program takes 2000 ms on a GeForce GTX 560 Ti. I need results in less than 200 ms. In the future I'll probably need a more complex algorithm, so I'd rather have even more computational power in reserve. How can I optimise my code to achieve that speed-up?
CUDAImageLib.dll:
#include "cuda_runtime.h"
#include "device_launch_parameters.h"
#include <stdio.h>
#include <cutil.h>
//#define SUPPORT_ALPHA
__global__ void ImageSearch_kernel(float* BufferOut, float* BufferB, float* BufferS, unsigned int bw, unsigned int bh, unsigned int sw, unsigned int sh)
{
unsigned int bx = threadIdx.x + blockIdx.x * blockDim.x;
unsigned int by = threadIdx.y + blockIdx.y * blockDim.y;
float diff = 0;
for (unsigned int y = 0; y < sh; ++y)
{
for (unsigned int x = 0; x < sw; ++x)
{
unsigned int as = (x + y * sw) * 4;
unsigned int ab = (x + bx + (y + by) * bw) * 4;
#ifdef SUPPORT_ALPHA
diff += ((abs(BufferS[as] - BufferB[ab]) + abs(BufferS[as + 1] - BufferB[ab + 1]) + abs(BufferS[as + 2] - BufferB[ab + 2])) * BufferS[as + 3] * BufferB[ab + 3]);
#else
diff += abs(BufferS[as] - BufferB[ab]);
diff += abs(BufferS[as + 1] - BufferB[ab + 1]);
diff += abs(BufferS[as + 2] - BufferB[ab + 2]);
#endif
}
}
BufferOut[bx + (by * (bw - sw))] = diff;
}
extern "C" int __declspec(dllexport) __stdcall ImageSearchGPU(float* BufferOut, float* BufferB, float* BufferS, int bw, int bh, int sw, int sh)
{
int aBytes = (bw * bh) * 4 * sizeof(float);
int bBytes = (sw * sh) * 4 * sizeof(float);
int cBytes = ((bw - sw) * (bh - sh)) * sizeof(float);
dim3 threadsPerBlock(32, 32);
dim3 numBlocks((bw - sw) / threadsPerBlock.x, (bh - sh) / threadsPerBlock.y);
float *dev_B = 0;
float *dev_S = 0;
float *dev_Out = 0;
unsigned int timer = 0;
float sExecutionTime = 0;
cudaError_t cudaStatus;
// Choose which GPU to run on, change this on a multi-GPU system.
cudaStatus = cudaSetDevice(0);
if (cudaStatus != cudaSuccess) {
fprintf(stderr, "cudaSetDevice failed! Do you have a CUDA-capable GPU installed?");
goto Error;
}
// Allocate GPU buffers for three vectors (two input, one output) .
cudaStatus = cudaMalloc((void**)&dev_Out, cBytes);
if (cudaStatus != cudaSuccess) {
fprintf(stderr, "cudaMalloc failed!");
goto Error;
}
cudaStatus = cudaMalloc((void**)&dev_B, aBytes);
if (cudaStatus != cudaSuccess) {
fprintf(stderr, "cudaMalloc failed!");
goto Error;
}
cudaStatus = cudaMalloc((void**)&dev_S, bBytes);
if (cudaStatus != cudaSuccess) {
fprintf(stderr, "cudaMalloc failed!");
goto Error;
}
// Copy input vectors from host memory to GPU buffers.
cudaStatus = cudaMemcpy(dev_B, BufferB, aBytes, cudaMemcpyHostToDevice);
if (cudaStatus != cudaSuccess) {
fprintf(stderr, "cudaMemcpy failed!");
goto Error;
}
cudaStatus = cudaMemcpy(dev_S, BufferS, bBytes, cudaMemcpyHostToDevice);
if (cudaStatus != cudaSuccess) {
fprintf(stderr, "cudaMemcpy failed!");
goto Error;
}
cutCreateTimer(&timer);
cutStartTimer(timer);
// Launch a kernel on the GPU with one thread for each element.
ImageSearch_kernel<<<numBlocks, threadsPerBlock>>>(dev_Out, dev_B, dev_S, bw, bh, sw, sh);
// cudaDeviceSynchronize waits for the kernel to finish, and returns
// any errors encountered during the launch.
cudaStatus = cudaDeviceSynchronize();
if (cudaStatus != cudaSuccess) {
fprintf(stderr, "cudaDeviceSynchronize returned error code %d after launching addKernel!\n", cudaStatus);
goto Error;
}
cutStopTimer(timer);
sExecutionTime = cutGetTimerValue(timer);
// Copy output vector from GPU buffer to host memory.
cudaStatus = cudaMemcpy(BufferOut, dev_Out, cBytes, cudaMemcpyDeviceToHost);
if (cudaStatus != cudaSuccess) {
fprintf(stderr, "cudaMemcpy failed!");
goto Error;
}
Error:
cudaFree(dev_Out);
cudaFree(dev_B);
cudaFree(dev_S);
return (int)sExecutionTime;
}
extern "C" int __declspec(dllexport) __stdcall FindMinCPU(float* values, int count)
{
int minIndex = 0;
float minValue = 3.4e+38F;
for (int i = 0; i < count; ++i)
{
if (values[i] < minValue)
{
minValue = values[i];
minIndex = i;
}
}
return minIndex;
}
C# test app:
using System;
using System.Collections.Generic;
using System.Text;
using System.Diagnostics;
using System.Drawing;
namespace TestCUDAImageSearch
{
class Program
{
static void Main(string[] args)
{
using(Bitmap big = new Bitmap("Big.png"), small = new Bitmap("Small.png"))
{
Console.WriteLine("Big " + big.Width + "x" + big.Height + " Small " + small.Width + "x" + small.Height);
Stopwatch sw = new Stopwatch();
sw.Start();
Point point = CUDAImageLIb.ImageSearch(big, small);
sw.Stop();
long t = sw.ElapsedMilliseconds;
Console.WriteLine("Image found at " + point.X + "x" + point.Y);
Console.WriteLine("total time=" + t + "ms kernel time=" + CUDAImageLIb.LastKernelTime + "ms");
}
Console.WriteLine("Hit key");
Console.ReadKey();
}
}
}
//#define SUPPORT_HSB
using System;
using System.Collections.Generic;
using System.Text;
using System.Runtime.InteropServices;
using System.Drawing;
using System.Drawing.Imaging;
namespace TestCUDAImageSearch
{
public static class CUDAImageLIb
{
[DllImport("CUDAImageLib.dll")]
private static extern int ImageSearchGPU(float[] bufferOut, float[] bufferB, float[] bufferS, int bw, int bh, int sw, int sh);
[DllImport("CUDAImageLib.dll")]
private static extern int FindMinCPU(float[] values, int count);
private static int _lastKernelTime = 0;
public static int LastKernelTime
{
get { return _lastKernelTime; }
}
public static Point ImageSearch(Bitmap big, Bitmap small)
{
int bw = big.Width;
int bh = big.Height;
int sw = small.Width;
int sh = small.Height;
int mx = (bw - sw);
int my = (bh - sh);
float[] diffs = new float[mx * my];
float[] b = ImageToFloat(big);
float[] s = ImageToFloat(small);
_lastKernelTime = ImageSearchGPU(diffs, b, s, bw, bh, sw, sh);
int minIndex = FindMinCPU(diffs, diffs.Length);
return new Point(minIndex % mx, minIndex / mx);
}
public static List<Point> ImageSearch(Bitmap big, Bitmap small, float maxDeviation)
{
int bw = big.Width;
int bh = big.Height;
int sw = small.Width;
int sh = small.Height;
int mx = (bw - sw);
int my = (bh - sh);
int nDiff = mx * my;
float[] diffs = new float[nDiff];
float[] b = ImageToFloat(big);
float[] s = ImageToFloat(small);
_lastKernelTime = ImageSearchGPU(diffs, b, s, bw, bh, sw, sh);
List<Point> points = new List<Point>();
for(int i = 0; i < nDiff; ++i)
{
if (diffs[i] < maxDeviation)
{
points.Add(new Point(i % mx, i / mx));
}
}
return points;
}
#if SUPPORT_HSB
private static float[] ImageToFloat(Bitmap img)
{
int w = img.Width;
int h = img.Height;
float[] pix = new float[w * h * 4];
int i = 0;
for (int y = 0; y < h; ++y)
{
for (int x = 0; x < w; ++x)
{
Color c = img.GetPixel(x, y);
pix[i] = c.GetHue() / 360;
pix[i + 1] = c.GetSaturation();
pix[i + 2] = c.GetBrightness();
pix[i + 3] = c.A;
i += 4;
}
}
return pix;
}
#else
private static float[] ImageToFloat(Bitmap bmp)
{
int w = bmp.Width;
int h = bmp.Height;
int n = w * h;
float[] pix = new float[n * 4];
System.Diagnostics.Debug.Assert(bmp.PixelFormat == PixelFormat.Format32bppArgb);
Rectangle r = new Rectangle(0, 0, w, h);
BitmapData bmpData = bmp.LockBits(r, ImageLockMode.ReadOnly, bmp.PixelFormat);
System.Diagnostics.Debug.Assert(bmpData.Stride > 0);
int[] pixels = new int[n];
System.Runtime.InteropServices.Marshal.Copy(bmpData.Scan0, pixels, 0, n);
bmp.UnlockBits(bmpData);
int j = 0;
for (int i = 0; i < n; ++i)
{
pix[j] = (pixels[i] & 255) / 255.0f;
pix[j + 1] = ((pixels[i] >> 8) & 255) / 255.0f;
pix[j + 2] = ((pixels[i] >> 16) & 255) / 255.0f;
pix[j + 3] = ((pixels[i] >> 24) & 255) / 255.0f;
j += 4;
}
return pix;
}
#endif
}
}
Looks like what you are talking about is a well known problem: Template matching. The easiest way forward is to convolve the Image (the bigger image) with the template (the smaller image). You could implement convolutions in one of two ways.
1) Modify the convolutions example from the CUDA SDK (similar to what you are doing anyway).
2) Use FFTs to implement the convolution. Ref. Convolution theorem. You will need to remember
% MATLAB format
L = size(A) + size(B) - 1;
conv2(A, B) = IFFT2(FFT2(A, L) .* FFT2(B, L));
You could use cufft to implement the 2 dimensional FFTs (After padding them appropriately). You will need to write a kernel that does element wise multiplication and then normalizes the result (because CUFFT does not normalize) before performing the inverse FFT.
For the sizes you mention (1024 x 1280 and 128 x 128), the inputs must be padded to at least (1024 + 128 - 1) x (1280 + 128 - 1) = 1151 x 1407. But FFTs are fastest when the (padded) input sizes are powers of 2, so you will need to pad both the large and small images to 2048 x 2048.
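The padding arithmetic above can be sketched in a few lines of C. The helper names are hypothetical; the code only computes the padded length along one axis and does not call CUFFT.

```c
#include <stddef.h>

/* Smallest power of two that is >= n (n must be at least 1). */
static size_t next_pow2(size_t n)
{
    size_t p = 1;
    while (p < n)
        p <<= 1;
    return p;
}

/* Padded length along one axis for a linear convolution of lengths a and b. */
static size_t padded_fft_length(size_t a, size_t b)
{
    return next_pow2(a + b - 1);
}
```

Here padded_fft_length(1024, 128) and padded_fft_length(1280, 128) both return 2048, matching the sizes quoted above.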
You could speed up your calculations by using faster memory access, for example by using
Texture Cache for the big image
Shared Memory or Constant Cache for the small image or parts of it.
But your real problem is the whole approach of your comparison. Comparing the images pixel by pixel at every possible location will never be efficient. There is just too much work to do. First you should think about finding ways to
Select the interesting regions of the big image where the small image might be contained, and only search in those
Find a faster comparison mechanism that represents the images by something other than their raw pixel values. You should be able to compare the images through a representation with less data, e.g. a color histogram or integral images.
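To make the integral-image suggestion concrete, here is a minimal single-channel sketch in C. Function names and the memory layout are assumptions for illustration. Once the table is built, the sum of any window costs four lookups, so one possible use (an assumption, not something the answer spells out) is to discard candidate positions whose window sum is far from the template's before running the full pixel comparison.

```c
#include <stddef.h>

/* Builds a summed-area table: sat[(y+1)*(w+1) + (x+1)] holds the sum of all
   pixels in rows 0..y and columns 0..x. sat must hold (w+1)*(h+1) doubles. */
static void integral_image(const float *img, size_t w, size_t h, double *sat)
{
    for (size_t x = 0; x <= w; ++x)
        sat[x] = 0.0;                       /* zero top border */
    for (size_t y = 0; y < h; ++y) {
        double row = 0.0;                   /* running sum of the current row */
        sat[(y + 1) * (w + 1)] = 0.0;       /* zero left border */
        for (size_t x = 0; x < w; ++x) {
            row += img[y * w + x];
            sat[(y + 1) * (w + 1) + (x + 1)] = sat[y * (w + 1) + (x + 1)] + row;
        }
    }
}

/* Sum of the rw x rh window whose top-left pixel is (x0, y0), in O(1). */
static double window_sum(const double *sat, size_t w,
                         size_t x0, size_t y0, size_t rw, size_t rh)
{
    size_t s = w + 1;                       /* row stride of the table */
    return sat[(y0 + rh) * s + (x0 + rw)] - sat[y0 * s + (x0 + rw)]
         - sat[(y0 + rh) * s + x0]        + sat[y0 * s + x0];
}
```

The table is built once per big image; each candidate position then needs only one window_sum call instead of a pass over every pixel in the window.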

How to subclass in Go

In C I can do something like this
struct Point {
int x,y;
};
struct Circle {
struct Point p; // must be first!
int rad;
};
void move(struct Point *p,int dx,int dy) {
....
}
struct Circle c = .....;
move( (struct Point*)&c,1,2);
Using this approach, I can pass any struct(Circle,Rectangle,etc) that has struct Point as first member.
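For concreteness, here is a compilable version of the C idiom described above. The body of move is an assumption, since the question elides it; the cast itself is valid because a pointer to a struct can be converted to a pointer to its first member.

```c
struct Point {
    int x, y;
};

struct Circle {
    struct Point p;   /* must be first so a Circle* can be viewed as a Point* */
    int rad;
};

/* Assumed body: the question elides it; here it shifts the point by (dx, dy). */
static void move(struct Point *p, int dx, int dy)
{
    p->x += dx;
    p->y += dy;
}
```

With struct Circle c = {{0, 0}, 5}, the call move((struct Point *)&c, 1, 2) updates c.p and leaves c.rad untouched.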
How can I do the same in Go?
Actually, there's a simpler way to do it, which is more similar to the OP's example:
type Point struct {
x, y int
}
func (p *Point) Move(dx, dy int) {
p.x += dx
p.y += dy
}
type Circle struct {
*Point // embedding Point in Circle
rad int
}
// Circle now implicitly has the "Move" method
c := &Circle{&Point{0, 0}, 5}
c.Move(7, 3)
Also notice that Circle would also fulfill the Mover interface that PeterSO posted.
http://golang.org/doc/effective_go.html#embedding
"Although Go has types and methods and allows an object-oriented style of programming, there is no type hierarchy. The concept of “interface” in Go provides a different approach that we believe is easy to use and in some ways more general. There are also ways to embed types in other types to provide something analogous—but not identical—to subclassing." (Is Go an object-oriented language?, FAQ)
For example,
package main
import "fmt"
type Mover interface {
Move(x, y int)
}
type Point struct {
x, y int
}
type Circle struct {
point Point
rad int
}
func (c *Circle) Move(x, y int) {
c.point.x = x
c.point.y = y
}
type Square struct {
diagonal int
point Point
}
func (s *Square) Move(x, y int) {
s.point.x = x
s.point.y = y
}
func main() {
var m Mover
m = &Circle{point: Point{1, 2}}
m.Move(3, 4)
fmt.Println(m)
m = &Square{3, Point{1, 2}}
m.Move(4, 5)
fmt.Println(m)
}