Is a "local memory program" the same thing as a "serial program"?


Would something like this, which does not use pthreads, MPI, OpenMP, etc., be considered a local memory program?
int main() {
    int arr[] = { 10, 7, 8, 9, 1, 5 };
    int n = sizeof(arr) / sizeof(arr[0]);
    quickSort(arr, 0, n - 1);
    printf("Sorted quicksort: \n");
    printArray(arr, n);
    return 0;
}

Related

Find leaf nodes of the binary search tree

I was asked this in an interview: given the preorder traversal of a binary search tree, find the leaf nodes without constructing the original tree. I know the property that a binary search tree has to satisfy, but I cannot see how to use it here. The only thing I can identify is that the first node in the preorder traversal is always the root. A Google search did not turn up anything for this problem. I do not want the code; a simple hint to begin with would be sufficient.
EDIT: After a lot of trial and error I got this solution:
#include<iostream>
#include<vector>
#include<string>
using namespace std;
void fl(vector<int> &v, int lo, int hi){
if (lo>hi) return;
if (lo == hi) { cout<<"leaf ^^^^^^^ "<< v[hi]<<"\n"; return; }
int root = v[lo];
int i;
for(i = lo+1 ; i <= hi ; i++) if (v[i] > root) break;
fl(v, lo+1, i -1);
fl(v, i , hi);
}
int main(){
vector<int> v1 = {8, 3, 1, 6, 4, 7, 10, 14, 13};
vector<int> v2 = {27, 14, 10, 19, 35, 31, 42};
vector<int> v3 = {9,8,7,6,5,4,3,2,1};
fl(v3,0,v3.size()-1);
return 0;
}
Any suggestions for improvement, other than variable names, would be very helpful.

This program prints the leaf nodes given the preorder traversal of a BST. The program is pretty self-explanatory.
public static void findLeafs(int[] arr) {
if (arr == null || arr.length == 0)
return;
Stack<Integer> stack = new Stack<>();
for(int n = 1, c = 0; n < arr.length; n++, c++) {
if (arr[c] > arr[n]) {
stack.push(arr[c]);
} else {
boolean found = false;
while(!stack.isEmpty()) {
if (arr[n] > stack.peek()) {
stack.pop();
found = true;
} else
break;
}
if (found)
System.out.println(arr[c]);
}
}
System.out.println(arr[arr.length-1]);
}

def getLeafNodes(data):
    if data:
        root=data[0]
        leafNodes=[]
        process(data[1:],root,leafNodes)
        return leafNodes

def process(data,root,leafNodes):
    if data:
        # split the remaining preorder keys around the current root
        left=[]
        right=[]
        for i in range(len(data)):
            if data[i]<root:
                left.append(data[i])
            if data[i]>root:
                right.append(data[i])
        if len(left)==0 and len(right)==0:
            leafNodes.append(root)
            return
        if len(left)>0:
            process(left[1:],left[0],leafNodes)
        if len(right)>0:
            process(right[1:],right[0],leafNodes)
    else:
        # no keys left below this root, so it is a leaf
        leafNodes.append(root)

#--Run--
print(getLeafNodes([890,325,290,530,965]))
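
For comparison, here is a minimal sketch in C of a single-pass variant. It is not taken from any of the answers above: it swaps in a bounds-based recursion that consumes the preorder array once and treats a node as a leaf when neither of its subtrees consumed a key. The names walk and bst_leaves_from_preorder are hypothetical, and the sketch assumes distinct keys and a valid preorder sequence.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Consumes the subtree whose keys lie strictly between lo and hi, starting at
   pre[*pos]. Appends every leaf it finds to out. Returns 1 if at least one
   node was consumed, 0 if the next key belongs to some ancestor's subtree. */
static int walk(const int *pre, size_t n, size_t *pos,
                long long lo, long long hi, int *out, size_t *count)
{
    if (*pos >= n || pre[*pos] <= lo || pre[*pos] >= hi)
        return 0;

    int key = pre[(*pos)++];
    int has_left  = walk(pre, n, pos, lo, key, out, count);
    int has_right = walk(pre, n, pos, key, hi, out, count);

    if (!has_left && !has_right)
        out[(*count)++] = key;   /* no children: this key is a leaf */
    return 1;
}

/* Writes the leaves (left to right) into out, which must hold n ints.
   Returns the number of leaves written. */
size_t bst_leaves_from_preorder(const int *pre, size_t n, int *out)
{
    assert(pre != NULL && out != NULL);
    size_t pos = 0, count = 0;
    walk(pre, n, &pos, (long long)INT_MIN - 1, (long long)INT_MAX + 1,
         out, &count);
    return count;
}
```

Each key is visited once, so the work is linear in the array length, whereas the scan-and-split versions can degrade to quadratic time on a skewed tree such as {9,8,7,6,5,4,3,2,1}.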

Have to compare equality of 5 numbers and choose best of 3 in C

I have to write code that compares 5 integers to see which ones are equal, and picks the value that occurs at least 3 times among the 5 numbers, e.g.
a=0,b=0,c=0,d=2,e=3
then Valid_nr=0,
a=6,b=0,c=6,d=6,e=6
then Valid_nr=6,
a=8,b=8,c=8,d=8,e=8
then Valid_nr=8
Please suggest some logic, because I find it confusing to write short and efficient code to achieve this.

Example written in C#:
class Program
{
static List<int> numbers = new List<int>();
static int NumOfequals = 1;
static int ValidNumber;
static void Main(string[] args)
{
for (int i=0;i<5;i++)
{
int input;
Console.WriteLine("Enter number " + (i+1).ToString() + " and press enter");
if (int.TryParse(Console.ReadLine(),out input))
{
numbers.Add(input);
}
else
{
Console.WriteLine("not an integer");
}
}
numbers.Sort();
for (int i=0;i<numbers.Count;i++)
{
if (i<numbers.Count-1)
{
if (numbers[i]==numbers[i+1])
{
NumOfequals++;
ValidNumber=numbers[i];
}
else
{
if (NumOfequals < 3)
{
NumOfequals = 1;
ValidNumber=-1;
}
}
}
}
Console.WriteLine("number of equal numbers is " + NumOfequals.ToString()+" and the valid number is:"+ValidNumber.ToString());
Console.ReadLine();
}
}

I'll show you a function in C that generalizes the problem in your question. equalities() compares the elements of an array of int and returns the highest number of occurrences of any single value. A pointer to an int is passed as well, to store the value of the repeated number. You can easily specialize this function for your problem, or use it as it is and just check whether the returned value is >= 3.
#include <stdio.h>
#include <stdlib.h>
int equalities( int *nums, int n, int *x ) {
int * vals = malloc(n * sizeof(int));
int * reps = calloc(n,sizeof(int));
int i,j;
for ( i = 0; i < n; ++i ) {
for ( j = 0; j < n && reps[j] != 0 && vals[j] != nums[i]; ++j );
if ( j != 0 && reps[j] == reps[0]) { // new max
vals[j] = vals[0];
vals[0] = nums[i];
++reps[0];
} else {
vals[j] = nums[i];
++reps[j];
}
}
*x = vals[0];
int rep = reps[0];
free(vals);
free(reps);
return rep;
}
int main(void) {
int test1[] = { 1, 4, 2, 4, 4 };
int test2[] = { 0, 0, 2, 0, 0 };
int test3[] = { 1, 3, 2, 4, 5 };
int test4[] = { 1, 1, 1, 1, 1 };
int test5[] = { 5, 5, 0, 5, 0 };
int y = -1;
int r = equalities(test1,5,&y);
printf("Repetitions: %d, Number: %d\n",r,y);
r = equalities(test2,5,&y);
printf("Repetitions: %d, Number: %d\n",r,y);
r = equalities(test3,5,&y);
printf("Repetitions: %d, Number: %d\n",r,y);
r = equalities(test4,5,&y);
printf("Repetitions: %d, Number: %d\n",r,y);
r = equalities(test5,5,&y);
printf("Repetitions: %d, Number: %d\n",r,y);
return 0;
}
The output of the test main() is:
Repetitions: 3, Number: 4
Repetitions: 4, Number: 0
Repetitions: 1, Number: 1
Repetitions: 5, Number: 1
Repetitions: 3, Number: 5
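
Since 3 out of 5 is a strict majority, the question's special case can also be handled without the helper arrays. The sketch below is not the answerer's method: it swaps in the Boyer-Moore majority vote followed by a confirming count. The function name majority_value is hypothetical, and it only reports a value that fills more than half of the array (which for 5 numbers means at least 3).

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 and stores the value in *valid if some value occupies more than
   half of nums[0..n-1]; returns 0 (leaving *valid untouched) otherwise. */
int majority_value(const int *nums, size_t n, int *valid)
{
    assert(nums != NULL && valid != NULL);

    /* Pass 1: pair off differing values; only a majority value can survive. */
    int candidate = 0;
    size_t votes = 0;
    for (size_t i = 0; i < n; ++i) {
        if (votes == 0) {
            candidate = nums[i];
            votes = 1;
        } else if (nums[i] == candidate) {
            ++votes;
        } else {
            --votes;
        }
    }

    /* Pass 2: the survivor is only a candidate, so confirm it by counting. */
    size_t count = 0;
    for (size_t i = 0; i < n; ++i)
        if (nums[i] == candidate)
            ++count;

    if (2 * count > n) {
        *valid = candidate;
        return 1;
    }
    return 0;
}
```

Called on the five inputs as an array, a return value of 1 means the stored value is the Valid_nr from the question; a return value of 0 means no value appears three times.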

OpenCL Local memory and Xcode

I'm trying to learn OpenCL on a Mac, which appears to have some differences in implementation from the OpenCL book I'm reading. I want to be able to dynamically allocate local memory on the GPU. What I'm reading is I need to use the clSetKernelArg function, but that doesn't work within Xcode 6.4. Here's the code as it stands (never mind it's a pointless program, just trying to learn the syntax for shared memory). In Xcode, the kernel is written as a stand-alone .cl file similar to CUDA, so that's a separate file.
add.cl:
kernel void add(int a, int b, global int* c, local int* d)
{
d[0] = a;
d[1] = b;
*c = d[0] + d[1];
}
main.c:
#include <stdio.h>
#include <OpenCL/opencl.h>
#include "add.cl.h"
int main(int argc, const char * argv[]) {
int a = 3;
int b = 5;
int c;
int* cptr = &c;
dispatch_queue_t queue = gcl_create_dispatch_queue(CL_DEVICE_TYPE_GPU, NULL);
void* dev_c = gcl_malloc(sizeof(cl_int), NULL, CL_MEM_WRITE_ONLY);
// attempt to create local memory buffer
void* dev_d = gcl_malloc(2*sizeof(cl_int), NULL, CL_MEM_READ_WRITE);
// clSetKernelArg(add_kernel, 3, 2*sizeof(cl_int), NULL);
dispatch_sync(queue, ^{
cl_ndrange range = { 1, {0, 0, 0}, {1, 0, 0}, {1, 0, 0} };
// This gives a warning:
// Warning: Incompatible pointer to integer conversion passing 'cl_int *'
// (aka 'int *') to parameter of type 'size_t' (aka 'unsigned long')
add_kernel(&range, a, b, (cl_int*)dev_c, (cl_int*)dev_d);
gcl_memcpy((void*)cptr, dev_c, sizeof(cl_int));
});
printf("%d + %d = %d\n", a, b, c);
gcl_free(dev_c);
dispatch_release(queue);
return 0;
}
I've tried putting clSetKernelArg where indicated and it doesn't like the first argument:
Error: Passing 'void (^)(const cl_ndrange *, cl_int, cl_int, cl_int *, size_t)' to parameter of incompatible type 'cl_kernel' (aka 'struct _cl_kernel *')
I've looked and looked but can't find any examples illustrating this point within the Xcode environment. Can you point me in the right direction?

Managed to solve this by ditching Apple's extensions and using standard OpenCL 1.2 calls. That means replacing gcl_malloc with clCreateBuffer, replacing dispatch_sync with clEnqueueNDRangeKernel, and most importantly, using clSetKernelArg with NULL in the last argument for local variables. Works like a charm.
Here's the new version:
char kernel_add[1024] =
"kernel void add(int a, int b, global int* c, local int* d) \
{\
d[0] = a;\
d[1] = b;\
*c = d[0] + d[1];\
}";
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <OpenCL/opencl.h>
int main(int argc, const char * argv[]) {
int a = 3;
int b = 5;
int c;
cl_device_id device_id;
int err = clGetDeviceIDs(NULL, CL_DEVICE_TYPE_GPU, 1, &device_id, NULL);
cl_context context = clCreateContext(0, 1, &device_id, NULL, NULL, &err);
cl_command_queue queue = clCreateCommandQueue(context, device_id, 0, &err);
// kernel_add is the source string; the kernel function inside it is named "add"
const char* srccode = kernel_add;
cl_program program = clCreateProgramWithSource(context, 1, &srccode, NULL, &err);
err = clBuildProgram(program, 0, NULL, NULL, NULL, NULL);
cl_kernel kernel = clCreateKernel(program, "add", &err);
cl_mem dev_c = clCreateBuffer(context, CL_MEM_WRITE_ONLY, sizeof(int), NULL, NULL);
err = clSetKernelArg(kernel, 0, sizeof(int), &a);
err |= clSetKernelArg(kernel, 1, sizeof(int), &b);
err |= clSetKernelArg(kernel, 2, sizeof(cl_mem), &dev_c);
// local argument: pass only a size and a NULL pointer; the kernel uses d[0] and d[1]
err |= clSetKernelArg(kernel, 3, 2 * sizeof(int), NULL);
size_t one = 1;
err = clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &one, NULL, 0, NULL, NULL);
clFinish(queue);
err = clEnqueueReadBuffer(queue, dev_c, CL_TRUE, 0, sizeof(int), &c, 0, NULL, NULL);
clReleaseMemObject(dev_c);
clReleaseKernel(kernel);
clReleaseProgram(program);
clReleaseCommandQueue(queue);
clReleaseContext(context);
return 0;
}

In regular OpenCL, for a kernel parameter declared as a local pointer, you don't allocate a host buffer and pass it in (like you're doing with dev_d). Instead you do a clSetKernelArg with the size of the desired local storage but a NULL pointer (like this: clSetKernelArg(kernel, 2, sizeof(cl_int) * local_work_size[0], NULL)). You'll have to translate that into the Xcode way if you insist on being platform-specific.

What's wrong with my code? The backwards loop doesn't work (Option 2)

Option (2) crashes/overloads my coding software, even though it has the same code as option (1). Does anybody know why it's doing this and how to fix it?
#include "aservelibs/aservelib.h"
#include <stdio.h>
#include <math.h>
int length();
float mtof(int note);
int main() {
// do while the user hasnt pressed exit key (whatever)
int control[8] = {74, 71, 91, 93, 73, 72, 5, 84};
int index;
int mod;
float frequency;
int notes[8];
int response;
mod = aserveGetControl(1);
// ask backwards, forwards, exit
// SCALING
// (getControl(75) / ((127 - 0) / (1000 - 100))) + 100;
while(true) {
printf("Run Loop Forwards (1), Backwards (2), Exit (0)\n");
scanf("%d", &response);
if(response == 1) {
while(mod == 0) {
for(index = 0; index < 8; index++) {
notes[index] = aserveGetControl(control[index]);
frequency = mtof(notes[index]);
aserveOscillator(0, frequency, 1.0, 0);
aserveSleep(length());
printf("Slider Value:%5d\n", notes[index]);
mod = aserveGetControl(1);
}
}
} else if(response == 2) {
// here is the part where the code is exactly
// the same apart from the for loop which is
// meant to make the loop go backwards
while(mod == 0) {
for(index = 8; index > 0; index--) {
notes[index] = aserveGetControl(control[index]);
frequency = mtof(notes[index]);
aserveOscillator(0, frequency, 1.0, 0);
aserveSleep(length());
printf("Slider Value:%5d\n", notes[index]);
mod = aserveGetControl(1);
}
}
} else if(response == 0) {
return 0;
}
}
}
int length() {
return (aserveGetControl(75)/((127.0 - 0) / (1000 - 100))) + 100;
}
float mtof(int note) {
return 440 * pow(2, (note-69) / 12.0);
}

Your for loops aren't exactly the same.
The first option goes through { 0, 1, ..., 7 }
The second option goes through { 8, 7, ..., 1 }
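Notice also that control[8] and notes[8] are out of bounds (the valid indices are 0..7). Reading control[8] and writing notes[8] is undefined behavior, so when the loop touches these locations the application runs into an error.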
Change the second for loop to
for (index = 7; index >= 0; index--) {
...
}
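
As a small illustration of the corrected bounds, here is a sketch in C of a loop that visits indices n-1 down to 0. The function copy_reversed is hypothetical and only stands in for the per-index work in the question. It uses an unsigned index, where the test "i >= 0" would always be true, so the decrement is folded into the condition instead.

```c
#include <assert.h>
#include <stddef.h>

/* Copies src into dst in reverse order, visiting indices n-1 down to 0. */
void copy_reversed(const int *src, int *dst, size_t n)
{
    assert(src != NULL && dst != NULL);

    /* i starts at n, is tested against 0 and then decremented, so the loop
       body sees n-1, n-2, ..., 0 and never indexes src[n]. */
    for (size_t i = n; i-- > 0; )
        dst[n - 1 - i] = src[i];
}
```

With a signed int index, the form given in the answer (index = 7; index >= 0; index--) is equivalent for an 8-element array.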

OpenCL - Kernel crashes on the second run

I am trying to run a code which works the first time but crashes the second time that it runs. The function which causes the crash is part of the class Octree_GPU and is this:
int Octree_GPU::runCreateNodeKernel(int length)
{
cl_uint nodeLength;
if(nodeNumsArray[length-1] == 0)
nodeLength = nodeAddArray[length-1];
else
nodeLength = nodeAddArray[length-1]+8;
nodeArray = (cl_uint*)malloc(sizeof(cl_uint)*nodeLength);
nodePointsArray = (cl_int*)malloc(sizeof(cl_uint)*nodeLength);
startIndexArray = (cl_int*)malloc(sizeof(cl_int)*nodeLength);
d_nodeAdd = clCreateBuffer(context, CL_MEM_READ_WRITE, sizeof(cl_uint)*length, NULL, &err);
d_nodeArray = clCreateBuffer(context,CL_MEM_READ_WRITE, sizeof(cl_uint)*temp_length, NULL, &err);
d_numPoints = clCreateBuffer(context, CL_MEM_READ_WRITE, sizeof(cl_uint)*length, NULL, &err);
d_pointIndex = clCreateBuffer(context, CL_MEM_READ_WRITE,sizeof(cl_uint)*length,NULL, &err);
d_nodePointsArray = clCreateBuffer(context, CL_MEM_READ_WRITE, sizeof(cl_int)*temp_length, NULL, &err);
d_nodeIndexArray = clCreateBuffer(context,CL_MEM_READ_WRITE, sizeof(cl_int)*temp_length, NULL, &err);
err |= clEnqueueWriteBuffer(commands, d_nodeAdd, CL_TRUE, 0, sizeof(cl_uint)*length, nodeAddArray, 0, NULL,NULL);
err |= clEnqueueWriteBuffer(commands, d_numPoints,CL_TRUE, 0, sizeof(cl_uint)*length,numPointsArray,0,NULL,NULL);
err |= clEnqueueWriteBuffer(commands, d_pointIndex, CL_TRUE, 0, sizeof(cl_uint)*length,pointStartIndexArray,0, NULL, NULL);
clFinish(commands);
err = clSetKernelArg(createNodeKernel, 0, sizeof(cl_mem), &d_odata);
err |= clSetKernelArg(createNodeKernel, 1, sizeof(cl_mem), &d_nodeNums);
err |= clSetKernelArg(createNodeKernel, 2, sizeof(cl_mem), &d_nodeAdd);
err |= clSetKernelArg(createNodeKernel, 3, sizeof(cl_mem), &d_numPoints);
err |= clSetKernelArg(createNodeKernel, 4, sizeof(cl_mem), &d_pointIndex);
err |= clSetKernelArg(createNodeKernel, 5, sizeof(cl_mem), &d_nodeArray);
err |= clSetKernelArg(createNodeKernel, 6, sizeof(cl_mem), &d_nodePointsArray);
err |= clSetKernelArg(createNodeKernel, 7, sizeof(cl_mem), &d_nodeIndexArray);
clFinish(commands);
if(err != CL_SUCCESS) {
printf("Cannot set Kernel Arg \n");
exit(1);
}
size_t global_size[1] = {limit-1};
err = clEnqueueNDRangeKernel(commands, createNodeKernel, 1, NULL, global_size, NULL, 0, NULL, NULL);
if(err != CL_SUCCESS) {
printf(" Kernel does not work \n");
exit(1);
}
clFinish(commands);
err = clEnqueueReadBuffer(commands, d_nodeArray, CL_TRUE, 0, sizeof(cl_uint)*temp_length, nodeArray, 0, NULL, NULL);
err|= clEnqueueReadBuffer(commands, d_nodePointsArray, CL_TRUE, 0, sizeof(cl_int)*nodeLength, nodePointsArray, 0, NULL, NULL);
err|= clEnqueueReadBuffer(commands, d_nodeIndexArray, CL_TRUE, 0, sizeof(cl_int)*nodeLength, startIndexArray, 0, NULL, NULL);
clFinish(commands);
clReleaseMemObject(d_nodeAdd);
clReleaseMemObject(d_numPoints);
clReleaseMemObject(d_nodeArray);
clReleaseMemObject(d_nodePointsArray);
clFinish(commands);
return 0;
}
Please note that d_odata and d_nodeNums have been declared in the previous functions. The kernel code is given below for the same:
__kernel void createNode(__global int* uniqueCode, __global int* nodeNums,__global int* nodeAdd, __global int* numPoints, __global int* pointIndex,__global int* nodeArray, __global int* nodePoints,__global int* nodeIndex)
{
int ig = get_global_id(0);
int add;
int num = uniqueCode[ig];
int pt = numPoints[ig];
int ind = pointIndex[ig];
int temp,j;
if(nodeNums[ig] == 8)
{
for(int i=0;i<8;i++)
{
temp = ((int)num/10)*10+i;
add = nodeAdd[ig] + i;
nodeArray[add] = temp;
nodePoints[add] = select(0, pt, temp==num);
nodeIndex[add] = select(-1, ind, temp==num);
barrier(CLK_LOCAL_MEM_FENCE);
}
}
else
{
j = num % 10;
nodeAdd[ig] = nodeAdd[ig-1];
add = nodeAdd[ig]+j;
nodePoints[add] = pt;
nodeIndex[add] = ind;
barrier(CLK_LOCAL_MEM_FENCE);
}
}
I have tried to find out why but have not succeeded. I might be overlooking something really simple. Thank you for your help.

I'm not 100% sure this is causing the crash, but where you've written
if(nodeNums[ig] == 8)
{
for(int i=0;i<8;i++)
{
barrier(CLK_LOCAL_MEM_FENCE);
}
}
else
{
barrier(CLK_LOCAL_MEM_FENCE);
}
This means that different threads in a work group will execute different numbers of barriers, which may cause a hang or crash. A barrier has to be reached by every work item in the work group, the same number of times, before any of them can continue; with CLK_LOCAL_MEM_FENCE it also makes accesses to local memory visible across the group.
On a non-crash note, it looks like you're using CLK_LOCAL_MEM_FENCE (which ensures that local memory accesses are visible across threads) when you mean CLK_GLOBAL_MEM_FENCE (which ensures that global memory accesses are visible across threads). Either way, a barrier only synchronises work items within the same work group.
Also
nodeAdd[ig] = nodeAdd[ig-1];
is not correct for ig == 0, because it reads nodeAdd[-1]. This may not be causing the actual crash (I've found that OpenCL can be unfortunately quite forgiving), but it's worth fixing.
