OpenACC; copy_in not working? - gpu

I have this sample code:
#include <stdio.h>
#include <stdlib.h>
#ifdef _OPENACC
#include <openacc.h>
#endif
#define N 1000
int main() {
#ifdef _OPENACC
acc_init(acc_device_not_host);
printf(" Compiling with OpenACC support \n");
#endif
double * a;
int n = 100;
a = (double *) malloc(n * sizeof(double));
for (int i = 0; i < n; i++)
a[i] = 1.0f;
#pragma acc data copy_in(a[0:n])
{
#pragma acc kernels loop
for (int i = 0; i < n; i++)
a[i] = (double) i + a[i];
}
#ifdef _OPENACC
acc_shutdown(acc_device_not_host);
#endif
printf("Value of a[10]: %lf\n", a[10]);
return 0;
}
My teacher told me that the output should be 1.0: because I use copy_in, a is copied to the accelerator, but when the region ends, a on the host still contains 1.0 in every position. But when I run this code I get 11.0. Why?

There are a couple of things going on here. First, the correct clause is copyin (no underscore). Second, because copyin only copies the input values into the region, changes made inside the data region are not copied back to the CPU. Unless you're running on a shared-memory target, such as a multicore CPU, the value of a at your printf statement will look as if the loop never ran. To get the results back from the data region, use a copy clause instead: it tells the compiler to copy the input values into the region and copy the output values out of it.
Since you're getting 11, clearly the loop is getting run somewhere. What compiler are you using and what flags? Either you're not actually building with OpenACC enabled or you're running on a shared memory target and your teacher isn't.
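A minimal sketch of the copy variant described above, assuming the loop is moved into a helper function (add_index is a hypothetical name, not from the question). It is meant as an illustration, not as the asker's exact build setup:

```c
/* Hypothetical helper: adds each element's index to its value.
 * `copy` transfers a[0:n] to the device on entry and back to the
 * host on exit; with `copyin` the device results would be discarded. */
void add_index(double *a, int n) {
    #pragma acc data copy(a[0:n])
    {
        #pragma acc kernels loop
        for (int i = 0; i < n; i++)
            a[i] = (double) i + a[i];
    }
}
```

If the compiler is invoked without OpenACC support, the pragmas are ignored and the loop simply runs on the host, so the result is visible to the caller in either case.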

Related

How to use scanf in a Swift environment

I am practicing Objective C to get a better understanding of C and was using the newest Xcode, but using the terminal to write simple programs. In the program below I can't seem to get the scanf function to work. Is there a different function that I can use to read input from the terminal so I can check the rest of the syntax and code?
#import <Foundation/Foundation.h>
int main (int argc, char *argv[])
{
int n, number, triangularNumber;
NSLog (@"What triangular number do you want?");
scanf ("%i", &number);
triangularNumber = 0;
for ( n = 1; n <= number; ++n )
triangularNumber += n;
NSLog (@"Triangular number %i is %i\n", number, triangularNumber);
return 0;
}
You can't have a space in between the scanf and (). The scanf function should turn purple when done correctly. Just take out the space and you should be fine.
You can try this (this is Swift, Objective C is the same):
let handle = NSFileHandle.fileHandleWithStandardInput()
let input = NSString(data: handle.availableData, encoding: NSUTF8StringEncoding)
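Plain C input functions can also be called from an Objective-C program. A minimal sketch, assuming input is read one line at a time; parse_int, read_int, and triangular are hypothetical helper names, not part of the question's code:

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: parses a decimal integer from a line of text.
 * Returns 1 on success and stores the value in *out, 0 otherwise. */
int parse_int(const char *line, int *out) {
    char *end;
    long value = strtol(line, &end, 10);
    if (end == line)   /* no digits were consumed */
        return 0;
    *out = (int) value;
    return 1;
}

/* Hypothetical helper: reads one line from `in` and parses it. */
int read_int(FILE *in, int *out) {
    char line[64];
    if (fgets(line, sizeof line, in) == NULL)
        return 0;
    return parse_int(line, out);
}

/* Sum of 1..number, as in the question's loop. */
int triangular(int number) {
    int sum = 0;
    for (int n = 1; n <= number; ++n)
        sum += n;
    return sum;
}
```

In the question's main, read_int(stdin, &number) could stand in for the scanf call, with the return value checked before computing the result.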

Returning Factorials greater than 12 in Objective-C

I have the following code, and I want it to give me factorial values up to 20.
- (NSUInteger)factorialofNumber:(NSUInteger)n {
static NSUInteger f[N + 1];
static NSUInteger i = 0;
if (i == 0)
{
f[0] = 1;
i = 1;
}
while (i <= n)
{
f[i] = i * f[i - 1];
i++;
}
return f[n];
}
The issue is that when this executes, all values above 12 are incorrect. I have placed this in my prefix file to try and have larger values available but it hasn't fixed the problem. Any hints to work around this?
#define NS_BUILD_32_LIKE_64 1
The factorial of 13 is 6,227,020,800, which is bigger than the largest 32-bit unsigned integer (4,294,967,295).
Instead of NSUInteger you could use unsigned long long or uint64_t. This way you always get 64 bit values and keep binary compatibility, which might suffer with NS_BUILD_32_LIKE_64 declared. 64-Bit Transition Guide for Cocoa:
The NS_BUILD_32_LIKE_64 macro is useful when binary compatibility is not a concern, such as when building an application.
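A minimal sketch of the uint64_t suggestion in plain C, keeping the question's cached-table approach. MAX_N and the out-of-range return value of 0 are assumptions made for this example:

```c
#include <stdint.h>

#define MAX_N 20  /* 20! is the largest factorial that fits in 64 bits */

/* Returns n! for 0 <= n <= MAX_N, or 0 if n is out of range
 * (0 is never a valid factorial, so it can serve as an error value). */
uint64_t factorial(unsigned n) {
    static uint64_t table[MAX_N + 1];
    static unsigned filled = 0;   /* number of entries computed so far */
    if (n > MAX_N)
        return 0;
    if (filled == 0) {
        table[0] = 1;
        filled = 1;
    }
    /* Extend the cached table only as far as this call needs. */
    while (filled <= n) {
        table[filled] = (uint64_t) filled * table[filled - 1];
        filled++;
    }
    return table[n];
}
```

Because the element type is always 64 bits wide, the result does not depend on whether NSUInteger is 32 or 64 bits on the target.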

Get the boot time in objective c

How can I get the boot time of iOS in Objective-C?
Is there a way to get it?
Don't know if this will work in iOS, but in OS X (which is essentially the same OS) you would use sysctl(). This is how the OS X Unix utility uptime does it. Source code is available - search for "boottime".
#include <sys/types.h>
#include <sys/sysctl.h>
// ....
#define MIB_SIZE 2
int mib[MIB_SIZE];
size_t size;
struct timeval boottime;
mib[0] = CTL_KERN;
mib[1] = KERN_BOOTTIME;
size = sizeof(boottime);
if (sysctl(mib, MIB_SIZE, &boottime, &size, NULL, 0) != -1)
{
// successful call
NSDate* bootDate = [NSDate dateWithTimeIntervalSince1970:boottime.tv_sec];
}
The restricted nature of the iOS sandbox might keep this from working; I don't know, because I haven't tried it.
I took JeremyP's answer, gave the result the full microsecond precision, clarified the names of local variables, improved the order, and put it into a method:
#include <sys/types.h>
#include <sys/sysctl.h>
// ....
+ (nullable NSDate *)bootDate
{
// nameIntArray and nameIntArrayLen
int nameIntArrayLen = 2;
int nameIntArray[nameIntArrayLen];
nameIntArray[0] = CTL_KERN;
nameIntArray[1] = KERN_BOOTTIME;
// boot_timeval
struct timeval boot_timeval;
size_t boot_timeval_size = sizeof(boot_timeval);
if (sysctl(nameIntArray, nameIntArrayLen, &boot_timeval, &boot_timeval_size, NULL, 0) == -1)
{
return nil;
}
// bootSince1970TimeInterval
NSTimeInterval bootSince1970TimeInterval = (NSTimeInterval)boot_timeval.tv_sec + ((NSTimeInterval)boot_timeval.tv_usec / 1000000);
// return
return [NSDate dateWithTimeIntervalSince1970:bootSince1970TimeInterval];
}
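The microsecond-precision conversion used in the method above can be isolated in plain C. A minimal sketch; timeval_to_seconds is a hypothetical name, and the sysctl call itself is left out because it is specific to OS X and related systems:

```c
#include <sys/time.h>

/* Converts a struct timeval to seconds as a double,
 * keeping the microsecond part as the fraction. */
double timeval_to_seconds(struct timeval tv) {
    return (double) tv.tv_sec + (double) tv.tv_usec / 1000000.0;
}
```

The result can be passed to dateWithTimeIntervalSince1970: in the same way as bootSince1970TimeInterval.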

Dynamic allocating an array (dynamic size of the vector implementation)

In Objective-C, we can create vector objects as follows:
SomeClass* example[100];
or
int count[7000];
But what if we only know the size of the vector when the class is initialized?
(Maybe we need example[756] or count[15])
First of all, those aren't vector objects, they're compile-time arrays. One of the features of compile time arrays is automatic memory management; that is, you don't have to worry about allocation and deallocation of these arrays.
If you want to create an array whose size you don't know until runtime, you'll need to use new[] and delete[]:
int size = somenumber;
int* arr = new int[size];
// use arr
arr[0] = 4;
// print the first value of arr which is 4
cout << arr[0];
The catch is that after you're done with this array, you have to deallocate it:
delete[] arr;
If you forget to deallocate something created by new with a corresponding delete [1], you'll create a memory leak.
You are probably better off using std::vector though because it manages memory for you automatically:
// include the header
#include <vector>
using namespace std; // so we don't have std:: everywhere
vector<int> vec; // create a vector for ints
vec.push_back(4); // add some data
vec.push_back(5);
vec.push_back(6);
// vec now holds 4, 5, and 6
cout << vec[0]; // print the first element of vec which is 4
// we can add as many elements to vec as we want without having to use any
// deallocation functions on it like delete[] or anything
// when vec goes out of scope, it will clean up after itself and you won't have any leaks
[1] Make sure you use delete on pointers that you created with new and delete[] on pointers you make with new[x]. Do not mix and match them. Again, if you use std::vector, you don't have to worry about this.
Why not just use a std::vector?
//file.mm
#include <vector>
-(void)function
{
std::vector<int> count;
std::vector<SomeClass*> example;
count.push_back(10); // add 10 to the array;
count.resize(20); // make count hold 20 objects
count[10] = 5; //set object at index of 10 to the value of 5
}
Then you do something like:
SomeClass **example = calloc(numClasses, sizeof(SomeClass *));
or:
int *count = malloc(num_of_counts * sizeof(int));
Note that you should:
#include <stdlib.h>
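A minimal sketch of the calloc approach with the matching cleanup, assuming the size is only known at runtime; make_counts is a hypothetical helper name:

```c
#include <stdlib.h>

/* Hypothetical helper: allocates `n` zero-initialized ints.
 * Returns NULL if n is 0 or the allocation fails.
 * The caller releases the array with free(). */
int *make_counts(size_t n) {
    if (n == 0)
        return NULL;
    return calloc(n, sizeof(int));
}
```

In an Objective-C class, the pointer would typically be stored in an instance variable during init and passed to free() in dealloc.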
C++ cannot make global/local arrays of a variable size, only dynamic arrays on the heap.
int main() {
int variable = 100;
SomeClass* example = new SomeClass[variable];
//do stuff
delete [] example; //DO NOT FORGET THIS. Better yet, use a std::vector
return 0;
}
I don't know anything about objective-C, but your question is probably only one or the other.

Quickest way to be sure region of memory is blank (all NULL)?

If I have an unsigned char *data pointer and I want to check whether size_t length of the data at that pointer is NULL, what would be the fastest way to do that? In other words, what's the fastest way to make sure a region of memory is blank?
I am implementing in iOS, so you can assume iOS frameworks are available, if that helps. On the other hand, simple C approaches (memcmp and the like) are also OK.
Note, I am not trying to clear the memory, but rather trying to confirm that it is already clear (I am trying to find out whether there is anything at all in some bitmap data, if that helps). For example, I think the following would work, though I have not tried it yet:
- (BOOL)data:(unsigned char *)data isNullToLength:(size_t)length {
unsigned char tester[length];
memset(tester, 0, length);
if (memcmp(tester, data, length) != 0) {
return NO;
}
return YES;
}
I would rather not create a tester array, though, because the source data may be quite large and I'd rather avoid allocating memory for the test, even temporarily. But I may just be too conservative there.
UPDATE: Some Tests
Thanks to everyone for the great responses below. I decided to create a test app to see how these performed. The results surprised me, so I thought I'd share them. First I'll show you the version of the algorithms I used (in some cases they differ slightly from those proposed) and then I'll share some results from the field.
The Tests
First I created some sample data:
size_t length = 1024 * 768;
unsigned char *data = (unsigned char *)calloc(length, sizeof(unsigned char));
int i;
int count;
long check;
int loop = 5000;
Each test consisted of a loop run loop times. During the loop some random data was added to and removed from the data byte stream. Note that half the time there was actually no data added, so half the time the test should not find any non-zero data. Note the testZeros call is a placeholder for calls to the test routines below. A timer was started before the loop and stopped after the loop.
count = 0;
for (i=0; i<loop; i++) {
int r = random() % length;
if (random() % 2) { data[r] = 1; }
if (! testZeros(data, length)) {
count++;
}
data[r] = 0;
}
Test A: nullToLength. This was more or less my original formulation above, debugged and simplified a bit.
- (BOOL)data:(void *)data isNullToLength:(size_t)length {
void *tester = calloc(length, 1); // sizeof(void) is not standard C; allocate length bytes
int test = memcmp(tester, data, length);
free(tester);
return (! test);
}
Test B: allZero. Proposal by Carrotman.
BOOL allZero (unsigned char *data, size_t length) {
bool allZero = true;
for (int i = 0; i < length; i++){
if (*data++){
allZero = false;
break;
}
}
return allZero;
}
Test C: is_all_zero. Proposed by Lundin.
BOOL is_all_zero (unsigned char *data, size_t length)
{
BOOL result = TRUE;
unsigned char* end = data + length;
unsigned char* i;
for(i=data; i<end; i++) {
if(*i > 0) {
result = FALSE;
break;
}
}
return result;
}
Test D: sumArray. This is the top answer from the nearly duplicate question, proposed by vladr.
BOOL sumArray (unsigned char *data, size_t length) {
int sum = 0;
for (int i = 0; i < length; ++i) {
sum |= data[i];
}
return (sum == 0);
}
Test E: lulz. Proposed by Steve Jessop.
BOOL lulz (unsigned char *data, size_t length) {
if (length == 0) return 1;
if (*data) return 0;
return memcmp(data, data+1, length-1) == 0;
}
Test F: NSData. This is a test using NSData object I discovered in the iOS SDK while working on all of these. It turns out Apple does have an idea of how to compare byte streams that is designed to be hardware independent.
- (BOOL)nsdTestData: (NSData *)nsdData length: (NSUInteger)length {
void *tester = calloc(length, 1); // sizeof(void) is not standard C; allocate length bytes
NSData *nsdTester = [NSData dataWithBytesNoCopy:tester length:(NSUInteger)length freeWhenDone:NO];
int test = [nsdData isEqualToData:nsdTester];
free(tester);
return (test);
}
Results
So how did these approaches compare? Here are two sets of data, each representing 5000 loops through the check. First I tried this on the iPhone Simulator running on a relatively old iMac, then I tried this running on a first generation iPad.
On the iPhone 4.3 Simulator running on an iMac:
// Test A, nullToLength: 0.727 seconds
// Test F, NSData: 0.727
// Test E, lulz: 0.735
// Test C, is_all_zero: 7.340
// Test B, allZero: 8.736
// Test D, sumArray: 13.995
On a first generation iPad:
// Test A, nullToLength: 21.770 seconds
// Test F, NSData: 22.184
// Test E, lulz: 26.036
// Test C, is_all_zero: 54.747
// Test B, allZero: 63.185
// Test D, sumArray: 84.014
These are just two samples; I ran the test many times with only slightly varying results. The order of performance was always the same: A and F very close, E just behind, then C, B, and D. I'd say that A, F, and E are virtual ties. On iOS I'd prefer F because it takes advantage of Apple's protection from processor change issues, but A and E are very close. The memcmp approach clearly wins over the simple loop approach: close to ten times faster in the simulator and twice as fast on the device itself. Oddly enough, D, the winning answer from the other thread, performed very poorly in this test, probably because it does not break out of the loop when it hits the first non-zero byte.
I think you should do it with an explicit loop, but just for lulz:
if (length == 0) return 1;
if (*pdata) return 0;
return memcmp(pdata, pdata+1, length-1) == 0;
Unlike memcpy, memcmp does not require that the two data sections don't overlap.
It may well be slower than the loop, though, because the un-alignedness of the input pointers means there probably isn't much the implementation of memcmp can do to optimize, plus it's comparing memory with memory rather than memory with a constant. Easy enough to profile it and find out.
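A minimal sketch of the overlapping memcmp trick as a complete function; all_zero_memcmp is a hypothetical name. It works because a zero first byte, plus every byte being equal to its successor, implies that every byte is zero:

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if all `length` bytes at `data` are zero, 0 otherwise.
 * Compares the buffer against itself shifted by one byte. */
int all_zero_memcmp(const unsigned char *data, size_t length) {
    if (length == 0)
        return 1;
    if (data[0] != 0)
        return 0;
    return memcmp(data, data + 1, length - 1) == 0;
}
```

Whether this beats an explicit loop depends on the platform's memcmp implementation, so it is worth profiling on the target device.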
Not sure if it's the best, but I probably would do something like this:
bool allZero = true;
for (size_t i = 0; i < length; i++){
if (*data++){
//Roll back so data points to the non-zero char
data--;
//Do whatever is needed if it isn't zero.
allZero = false;
break;
}
}
If you've just allocated this memory, you can always call calloc rather than malloc (calloc guarantees that the memory is zeroed out). (Edit: reading your comment on the first post, you don't really need this. I'll leave it here just in case.)
If you're allocating the memory yourself, I'd suggest using the calloc() function. It's just like malloc(), except it zeros out the buffer first. It's what's used to allocate memory for Objective-C objects and is the reason that all ivars default to 0.
On the other hand, if this is a statically declared buffer, or a buffer you're not allocating yourself, memset() is the easy way to do this.
Logic to get a value, check it, and set it will be at least as expensive as just setting it. You want it to be null, so just set it to null using memset().
This would be the preferred way to do it in C:
BOOL is_all_zero (const unsigned char* data, size_t length)
{
BOOL result = TRUE;
const unsigned char* end = data + length;
const unsigned char* i;
for(i=data; i<end; i++)
{
if(*i > 0)
{
result = FALSE;
break;
}
}
return result;
}
(Though note that, strictly and formally speaking, a memory cell containing a NULL pointer need not necessarily be 0, as long as a null pointer cast results in the value zero, and a cast of a zero to a pointer results in a NULL pointer. In practice, this shouldn't matter as all known compilers use 0 or (void*) 0 for NULL.)
Note the edit to the initial question above. I did some tests and it is clear that the memcmp approach or using Apple's NSData object and its isEqualToData: method are the best approaches for speed. The simple loops are clearer to me, but slower on the device.