Some relevant info: I am on OS X, using GCD in Objective-C. I have a background task that produces a very large const char *, which is then fed into a new background task. This cycle repeats until the const char * is empty. Currently I create NSStrings in the blocks and then convert straight back to char *. As you can imagine, this does a ton of unnecessary copying.
I am wondering how __block variables work for non-objects or how I can get away from NSStrings?
Or
How is memory managed for non-object types?
It is currently just blowing up with ~2 gigs of memory all from the strings.
Here is how it currently looks:
-(void)doSomething:(NSString*)input{
    __block NSString* blockCopy = input;
    void (^blockTask)(void);
    blockTask = ^{
        const char* input = [blockCopy UTF8String];
        //remainder will point to somewhere along input
        const char* remainder = NULL;
        myCoolCFunc(input,&remainder);
        if(remainder != NULL && remainder[0] != '\0'){
            //this is what's killing me: the NSString creation of remainder
            [self doSomething:@(remainder)];
        }
    };
    /*...create background queue if needed */
    dispatch_async(backgroundQueue,blockTask);
}
There is no need to use NSString at all and no need to use the __block attribute:
-(void)doSomething:(const char *)input{
    void (^blockTask)(void);
    // The block captures only the pointer, so the buffer behind input
    // must stay valid until the block has finished running.
    blockTask = ^{
        const char* remainder = NULL;
        myCoolCFunc(input,&remainder);
        if(remainder != NULL && remainder[0] != '\0'){
            [self doSomething:remainder];
        }
    };
    /*...create background queue if needed */
    dispatch_async(backgroundQueue,blockTask);
}
There is also little need for recursion, as an iterative approach is possible:
blockTask = ^{
    // Captured variables are read-only inside a block, so iterate with a local copy.
    const char* current = input;
    const char* remainder = NULL;
    while (YES) {
        myCoolCFunc(current,&remainder);
        if(remainder == NULL || remainder[0] == '\0')
            break;
        current = remainder;
    }
};
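For what it's worth, the iterative idea can be tried out in plain C without GCD at all. This is only a sketch: consume_field is a hypothetical stand-in for myCoolCFunc (here it just eats one comma-separated field), and process_all is a made-up name. The point is that the loop only advances a pointer through the original buffer, so nothing is copied. The buffer itself still has to stay alive for as long as the loop runs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for myCoolCFunc: consumes one comma-separated
   field and points *remainder at the unconsumed tail (which is the
   terminating '\0' when nothing is left). */
static void consume_field(const char *input, const char **remainder)
{
    assert(input != NULL && remainder != NULL);
    while (*input != '\0' && *input != ',')
        input++;
    if (*input == ',')
        input++;                /* skip the separator */
    *remainder = input;
}

/* Walks the whole buffer in place and returns how many passes were made. */
size_t process_all(const char *input)
{
    size_t passes = 0;
    const char *current = input;
    const char *remainder = NULL;
    for (;;) {
        consume_field(current, &remainder);
        passes++;
        if (remainder == NULL || remainder[0] == '\0')
            break;
        current = remainder;    /* no copy: just move the pointer */
    }
    return passes;
}
```

Calling process_all("a,b,c") makes three passes over the same buffer; wrapping that call in a single dispatch_async block would keep all of the work on the background queue.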
I noticed that, while generating a texture from an MTLBuffer created from mmap() via newBufferWithBytesNoCopy, if the size requested by the len argument to mmap, page aligned, is larger than the actual size of the file, page aligned, the mmap call succeeds, and the newBufferWithBytesNoCopy message does not result in a nil return or error, but when I pass the buffer to the GPU to copy the data to an MTLTexture, the following is printed to the console, and all GPU commands fail to perform any action:
Execution of the command buffer was aborted due to an error during execution. Internal Error (IOAF code -536870211)
Here is code to demonstrate the problem:
static id<MTLDevice> Device;
static id<MTLCommandQueue> Queue;
static id<MTLTexture> BlockTexture[3];
#define TEX_LEN_1 1 // These are all made 1 in this question for simplicity
#define TEX_LEN_2 1
#define TEX_LEN_4 1
#define TEX_SIZE ((TEX_LEN_1<<10)+(TEX_LEN_2<<11)+(TEX_LEN_4<<12))
#define PAGE_ALIGN(S) ((S)+PAGE_SIZE-1&~(PAGE_SIZE-1))
int main(void) {
    if (!(Queue = [Device = MTLCreateSystemDefaultDevice() newCommandQueue]))
        return EXIT_FAILURE;
    @autoreleasepool {
        const id<MTLBuffer> data = ({
            void *const map = ({
                NSFileHandle *const file = [NSFileHandle fileHandleForReadingAtPath:[NSBundle.mainBundle pathForResource:@"Content" ofType:nil]];
                if (!file)
                    return EXIT_FAILURE;
                mmap(NULL, TEX_SIZE, PROT_READ, MAP_SHARED, file.fileDescriptor, 0);
            });
            if (map == MAP_FAILED)
                return errno;
            [Device newBufferWithBytesNoCopy:map length:PAGE_ALIGN(TEX_SIZE) options:MTLResourceStorageModeShared deallocator:^(void *const ptr, const NSUInteger len){
                munmap(ptr, len);
            }];
        });
        if (!data)
            return EXIT_FAILURE;
        const id<MTLCommandBuffer> buffer = [Queue commandBuffer];
        const id<MTLBlitCommandEncoder> encoder = [buffer blitCommandEncoder];
        if (!encoder)
            return EXIT_FAILURE;
        {
            MTLTextureDescriptor *const descriptor = [MTLTextureDescriptor new];
            descriptor.width = descriptor.height = 32;
            descriptor.mipmapLevelCount = 6;
            descriptor.textureType = MTLTextureType2DArray;
            descriptor.storageMode = MTLStorageModePrivate;
            const enum MTLPixelFormat format[] = {MTLPixelFormatR8Unorm, MTLPixelFormatRG8Unorm, MTLPixelFormatRGBA8Unorm};
            const NSUInteger len[] = {TEX_LEN_1, TEX_LEN_2, TEX_LEN_4};
            for (NSUInteger i = 3, off = 0; i--;) {
                descriptor.pixelFormat = format[i];
                const NSUInteger l = descriptor.arrayLength = len[i];
                const id<MTLTexture> texture = [Device newTextureWithDescriptor:descriptor];
                if (!texture)
                    return EXIT_FAILURE;
                const NSUInteger br = 32<<i, bi = 1024<<i;
                for (NSUInteger j = 0; j < l; off += bi)
                    [encoder copyFromBuffer:data sourceOffset:off sourceBytesPerRow:br sourceBytesPerImage:bi sourceSize:(const MTLSize){32, 32, 1} toTexture:texture destinationSlice:j++ destinationLevel:0 destinationOrigin:(const MTLOrigin){0}];
                [encoder generateMipmapsForTexture:BlockTexture[i] = texture];
            }
        }
        [encoder endEncoding];
        [buffer commit];
    }
    // Rest of code to initialize application (omitted)
}
In this case, the command will fail if the actual Content file is smaller than 4097 bytes, assuming a 4096-byte page size. What is strangest is that neither mmap() nor newBufferWithBytesNoCopy fails in this case, yet the GPU execution fails so badly that all subsequent GPU calls fail as well.
Is there a way to cause predictable behavior? I thought that mmap() space beyond the file was just valid 0 memory. Why is this apparently not the case if the space is being used by the GPU? At the very least, how can I detect GPU execution errors or invalid buffers like this to handle them gracefully, besides manually checking if the file is too small? Am I using these functions incorrectly somehow? Is something here undefined behavior?
My research efforts included Google searches for terms such as newBufferWithBytesNoCopy and/or mmap together with 536870211, which returned absolutely no results. Now, this question is the only result for such searches.
My guess is that this problem has to do with the inner workings of the GPU and/or the MTLBuffer implementation and/or mmap() and its underlying facilities. Not having access to these inner workings, I have no idea how to even start figuring out a solution. I would appreciate it if an expert could explain what is actually going on behind the scenes to cause this error and how to avoid it (besides manually checking whether the file is too small, as this is a 'workaround' that does not really fix the problem at its base), or at the very least how to detect GPU crashes of this type and abort the application gracefully.
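One part of this can be pinned down from the mmap() side alone. POSIX says that touching whole pages of a mapping that lie past the end of the file raises SIGBUS, and only the tail of the last partial page reads as zeros, so the space beyond the file is not simply valid zero memory. Whether that is exactly what the GPU driver trips over here is an assumption, but it would fit the symptom. A minimal sketch of the size check in plain C follows; map_exact is a hypothetical helper name, and the check is deliberately stricter than page granularity.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Maps `needed` bytes of the file at `path` read-only, but only if the
   file really contains at least that many bytes. Returns MAP_FAILED
   otherwise, so the caller never gets pages beyond the end of the file. */
void *map_exact(const char *path, size_t needed)
{
    assert(path != NULL && needed > 0);
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return MAP_FAILED;
    struct stat st;
    void *map = MAP_FAILED;
    if (fstat(fd, &st) == 0 && st.st_size >= 0 && (size_t)st.st_size >= needed)
        map = mmap(NULL, needed, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);  /* the mapping stays valid after the descriptor is closed */
    return map;
}
```

This is still the manual check the question hoped to avoid, but it turns a GPU-side failure into an ordinary MAP_FAILED result that can be handled before any buffer is created.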
I'm parsing MIDI messages inside an autoreleasepool. After determining which message was sent, I make calls to the main thread using dispatch_async(dispatch_get_main_queue(), ...). This works well most of the time, but occasionally I get a crash where it stops on the line
while (semaphore_wait(midiReceivedSemaphore) == KERN_SUCCESS ) {
The method looks something like this:
@autoreleasepool {
    unsigned int maxPacketLength = 0x100;
    unsigned char* data = malloc(maxPacketLength);
    UInt16 length;
    while (semaphore_wait(midiReceivedSemaphore) == KERN_SUCCESS ) {
        midi_packet_buffer_next_packet_length(midiPacketBuffer, &length);
        if( length > 0) {
            length = midi_packet_buffer_read(midiPacketBuffer, data, length);
            for (unsigned int packetContentsIndex = 0; packetContentsIndex < length; packetContentsIndex++) {
                while (packetContentsIndex < length) {
                    Byte command = data[packetContentsIndex];
                    switch (command) {
                        case 0xA0: //aftertouch
                            commandLength += 2;
                            int meterChannel = data[packetContentsIndex +1];
                            int meterValue = data[packetContentsIndex +2];
                            if (meterValue < 0x10)
                            {
                                dispatch_async(dispatch_get_main_queue(), ^{
                                    [delegate sendLevelMeterPT:meterChannel withValue:meterValue ];
                                });
                            }
                            break;
                        default:
                            break;
                    }
                }
            }
        }
    }
    free(data);
}
Is this a very bad way of handling the MIDI messages? I used dispatch_async so that the calls wouldn't slow down the MIDI processing thread; before I used dispatch_async, the method missed some messages when they came in fast. Now it catches all the messages, but I get the occasional crash. I'm thinking I might have to rewrite the whole MIDI processing method, but I want to know the proper way to go about it. Any advice would be great.
I'm trying to create my own custom assert. However, I would like my assertion to automatically include all of the relevant variables. This seems really basic to me, and I've searched around for about an hour, but I can't seem to find a way to get access to all the relevant stack frame variables. Does anyone know how to get these variables?
FYI - I don't need to access the variables in the debugger, I need to access them programmatically. I would like to upload them along with the crash report to give me more information about the crash. I also know that I can print them out manually...that is exactly what I'm looking to avoid.
You are basically asking to re-invent a good sized chunk of the debugger.
Without symbols, there isn't anything you can interrogate to figure out the layout of the local frame. Even with symbols, it is quite likely that the optimizer will have stomped on any local variables as the optimizer will re-use stack slots at whim once it determines the variable is no longer needed within the frame.
Note that many crashes won't be able to be caught at all or, if caught, the frame within which they occurred will have long since been destroyed.
Since you mention that you are creating a custom assertion, it sounds like you really aren't looking to introspect crashes as much as dump a snapshot of the local frame when you programmatically detect that things have gone off the rails. While there really isn't a means of automatically reporting on local stack state, you could do something like:
{ ... some function ....
    ... local variables ...
    #define reportblock ^{ ... code that summarizes locals ... ; return summary; }
    YourAssert( cond, "cond gone bad. summary: %@", reportblock());
}
Note that the #define ensures that each YourAssert() captures the state at the time of the assertion. Note also that the above might have a potentially significant impact on performance.
Note also that I just made that code up. It seems like it is worthy of investigation, but may prove non-viable for a number of reasons.
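To make the "name the locals yourself at the assertion site" idea concrete, here is a rough plain-C sketch of the same pattern without blocks. Everything in it is hypothetical: REPORT_UNLESS, record_failure, last_report, report_mentions and checked_divide are made-up names, and a real crash reporter would upload the text rather than keep it in a static buffer. It does not discover locals automatically; the caller still lists them in the format string.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

static char last_report[1024];  /* most recent failure report, "" if none */

/* Formats "file:line: (cond) failed; <caller summary>" into last_report. */
static void record_failure(const char *file, int line, const char *cond,
                           const char *fmt, ...)
{
    assert(file != NULL && cond != NULL && fmt != NULL);
    int used = snprintf(last_report, sizeof last_report,
                        "%s:%d: (%s) failed; ", file, line, cond);
    if (used < 0 || (size_t)used >= sizeof last_report)
        return;                 /* the prefix alone filled the buffer */
    va_list args;
    va_start(args, fmt);
    vsnprintf(last_report + used, sizeof last_report - (size_t)used, fmt, args);
    va_end(args);
}

/* The summary arguments are only evaluated when the condition fails. */
#define REPORT_UNLESS(cond, ...) \
    do { if (!(cond)) record_failure(__FILE__, __LINE__, #cond, __VA_ARGS__); } while (0)

/* Small helper so callers can inspect the report. */
int report_mentions(const char *needle)
{
    return strstr(last_report, needle) != NULL;
}

/* Example caller: the locals are named explicitly at the assertion site. */
int checked_divide(int numerator, int denominator)
{
    REPORT_UNLESS(denominator != 0, "numerator=%d denominator=%d",
                  numerator, denominator);
    return denominator != 0 ? numerator / denominator : 0;
}
```

Because the summary is built inside the failing branch, the cost on the success path is just the condition check, which sidesteps most of the performance concern mentioned above.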
If you're willing to use Objective-C++, then this is definitely a possibility, as long as you are also willing to declare your variables differently, and understand that you will only be able to grab your own variables with this method.
Also note that it will increase your stack frame size with extra __stack_ variables, which could cause memory issues (although I doubt it, personally).
It won't work with certain constructs such as for-loops, but for 95% of cases, this should work for what you need:
#include <cstdio>
#include <vector>
struct stack_variable;
static std::vector<const stack_variable *> stack_variables;
struct stack_variable {
    void **_value;
    const char *_name;
    const char *_type;
    const char *_file;
    const char *_line;
private:
    template<typename T>
    stack_variable(const T& value, const char *type, const char *name, const char *file, const char *line) : _value((void **) &value), _name(name), _type(type), _file(file), _line(line) {
        add(*this);
    }
    static inline void add(const stack_variable &var) {
        stack_variables.push_back(static_cast<const stack_variable *>(&var));
    }
    static inline void remove(const stack_variable &var) {
        for (auto it = stack_variables.begin(); it != stack_variables.end(); it++) {
            if ((*it) == &var) {
                stack_variables.erase(it);
                return;
            }
        }
    }
public:
    template<typename T>
    static inline stack_variable create(const T& value, const char *type, const char *name, const char *file, const char *line) {
        return stack_variable(value, type, name, file, line);
    }
    ~stack_variable() {
        remove(*this);
    }
    void print() const {
        // treat the value as a pointer
        printf("%s:%s - %s %s = %p\n", _file, _line, _type, _name, *_value);
    }
    static void dump_vars() {
        for (auto var : stack_variables) {
            var->print();
        }
    }
};
#define __LINE_STR(LINE) #LINE
#define _LINE_STR(LINE) __LINE_STR(LINE)
#define LINE_STR _LINE_STR(__LINE__)
#define LOCAL_VAR(type, name, value)\
    type name = value;\
    stack_variable __stack_ ## name = stack_variable::create<type>(name, #type, #name, __FILE__, LINE_STR);\
    (void) __stack_ ## name;
Example:
int temp() {
    LOCAL_VAR(int, i_wont_show, 0);
    return i_wont_show;
}
int main(){
    LOCAL_VAR(long, l, 15);
    LOCAL_VAR(int, x, 192);
    LOCAL_VAR(short, y, 256);
    temp();
    l += 10;
    stack_variable::dump_vars();
}
Output (note the junk extra bytes for the values smaller than sizeof(void *), there isn't much I can do about that):
/Users/rross/Documents/TestProj/TestProj/main.mm:672 - long l = 0x19
/Users/rross/Documents/TestProj/TestProj/main.mm:673 - int x = 0x5fbff8b8000000c0
/Users/rross/Documents/TestProj/TestProj/main.mm:674 - short y = 0xd000000010100
Threads will royally screw this up, however, so in a multithreaded environment this is (almost) worthless.
I decided to add this as a separate answer, as it uses the same approach as my other one, but this time with an all ObjC code. Unfortunately, you still have to re-declare all of your stack variables, just like before, but hopefully now it will work better with your existing code-base.
StackVariable.h:
#import <Foundation/Foundation.h>
#define LOCAL_VAR(p_type, p_name, p_value)\
    p_type p_name = p_value;\
    StackVariable *__stack_ ## p_name = [[StackVariable alloc] initWithPointer:&p_name\
                                                                         size:sizeof(p_type)\
                                                                         name:#p_name\
                                                                         type:#p_type\
                                                                         file:__FILE__\
                                                                         line:__LINE__];\
    (void) __stack_ ## p_name;
@interface StackVariable : NSObject
-(id) initWithPointer:(void *) ptr
                 size:(size_t) size
                 name:(const char *) name
                 type:(const char *) type
                 file:(const char *) file
                 line:(const int) line;
+(NSString *) dump;
@end
StackVariable.m:
#import "StackVariable.h"
static NSMutableArray *stackVariables;
@implementation StackVariable {
    void *_ptr;
    size_t _size;
    const char *_name;
    const char *_type;
    const char *_file;
    int _line;
}
-(id) initWithPointer:(void *)ptr size:(size_t)size name:(const char *)name type:(const char *)type file:(const char *)file line:(int)line
{
    if (self = [super init]) {
        if (stackVariables == nil) {
            stackVariables = [NSMutableArray new];
        }
        _ptr = ptr;
        _size = size;
        _name = name;
        _type = type;
        _file = file;
        _line = line;
        [stackVariables addObject:[NSValue valueWithNonretainedObject:self]];
    }
    return self;
}
-(NSString *) description {
    NSMutableString *result = [NSMutableString stringWithFormat:@"%s:%d - %s %s = { ", _file, _line, _type, _name];
    const uint8_t *bytes = (const uint8_t *) _ptr;
    for (size_t i = 0; i < _size; i++) {
        [result appendFormat:@"%02x ", bytes[i]];
    }
    [result appendString:@"}"];
    return result;
}
+(NSString *) dump {
    NSMutableString *result = [NSMutableString new];
    for (NSValue *value in stackVariables) {
        __weak StackVariable *var = [value nonretainedObjectValue];
        [result appendString:[var description]];
        [result appendString:@"\n"];
    }
    return result;
}
-(void) dealloc {
    [stackVariables removeObject:[NSValue valueWithNonretainedObject:self]];
}
@end
Example:
#include "StackVariable.h"
int temp() {
    LOCAL_VAR(int, i_wont_show, 0);
    return i_wont_show;
}
int main(){
    LOCAL_VAR(long, l, 15);
    LOCAL_VAR(int, x, 192);
    LOCAL_VAR(short, y, 256);
    temp();
    l += 10;
    puts([[StackVariable dump] UTF8String]);
}
Output:
/Users/rross/Documents/TestProj/TestProj/main.m:676 - long l = { 19 00 00 00 00 00 00 00 }
/Users/rross/Documents/TestProj/TestProj/main.m:677 - int x = { c0 00 00 00 }
/Users/rross/Documents/TestProj/TestProj/main.m:678 - short y = { 00 01 }
This requires ARC (and all of its magic) enabled for any file you want to test this in, or you will have to release the __stack_ variables manually, which won't be pretty.
However, it now gives you a hex dump of the variable (rather than the weird pointer one), and if you really tried hard enough (using __builtin_types_compatible_p), it could detect whether the result was an object, and print that.
Once again, threads will mess this up, but a simple way to fix that would be to create a NSDictionary of NSArrays, with a NSThread as the key. Makes it a bit slower, but let's be honest, if you're using this over the C++ version, you aren't going for performance.
I know I can loop through each character of two NSString objects using characterAtIndex: and compare them, but this approach would be very expensive if I use this function frequently.
Is there anything built in for this, or a more efficient way to do it?
The quickest way I can think of is to get a C string from each and then iterate through the strings.
Just a quick example (fix it to your liking):
const char* myCString = [myNSStringInstance UTF8String];
const char* string2 = [nsstring2 UTF8String];
// Assume same length. You can fix this
size_t len = strlen(myCString); // compute once instead of on every iteration
for(size_t i = 0; i < len; i++) {
    if(myCString[i] != string2[i]) {
        // Do something here...
    }
}
It's a little hackish, but you could get the C string for each and then use pointer indexing. It's the same basic algorithm as the idea you mentioned, but theoretically as efficient as you could reasonably expect a solution to be (just looking at two memory addresses and comparing their contents).
Pseudo code:
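const char *cStringA = [stringA cStringUsingEncoding:NSUTF8StringEncoding];
const char *cStringB = [stringB cStringUsingEncoding:NSUTF8StringEncoding];
// Only compare up to the end of the shorter string
size_t shorterStringLength = MIN(strlen(cStringA), strlen(cStringB));
int mismatchIndex = -1;
for(size_t i = 0; i<shorterStringLength; i++) {
    if (cStringA[i] != cStringB[i]) {
        mismatchIndex = (int)i;
        break;
    }
}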
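The same comparison can also be written as a small standalone C function that handles strings of different lengths without calling strlen first. This is just a sketch, and first_mismatch is a hypothetical name. Note that the index it returns is a byte offset into the UTF-8 data, which only matches the NSString character index for plain ASCII text.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the byte index of the first difference between two
   NUL-terminated strings, or -1 if they are identical. A string that
   is a strict prefix of the other differs at its own terminator. */
long first_mismatch(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    size_t i = 0;
    /* Stop at the end of a, or at the first differing byte. If b ends
       first, its '\0' differs from a[i], so b is never read past its end. */
    while (a[i] != '\0' && a[i] == b[i])
        i++;
    return (a[i] == b[i]) ? -1 : (long)i;
}
```

From Objective-C it could be called as first_mismatch([stringA UTF8String], [stringB UTF8String]).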
If I have an unsigned char *data pointer and I want to check whether size_t length of the data at that pointer is NULL, what would be the fastest way to do that? In other words, what's the fastest way to make sure a region of memory is blank?
I am implementing in iOS, so you can assume iOS frameworks are available, if that helps. On the other hand, simple C approaches (memcmp and the like) are also OK.
Note, I am not trying to clear the memory, but rather trying to confirm that it is already clear (I am trying to find out whether there is anything at all in some bitmap data, if that helps). For example, I think the following would work, though I have not tried it yet:
- (BOOL)data:(unsigned char *)data isNullToLength:(size_t)length {
    unsigned char tester[length]; // a variable-length array cannot take an initializer
    memset(tester, 0, length);
    if (memcmp(tester, data, length) != 0) {
        return NO;
    }
    return YES;
}
I would rather not create a tester array, though, because the source data may be quite large and I'd rather avoid allocating memory for the test, even temporarily. But I may just be being too conservative there.
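One way to keep the memcmp idea without a tester array as large as the data is to compare against a small, fixed block of zeros one chunk at a time. The following is a sketch under that assumption; is_zero_chunked is a hypothetical name and the 4096-byte chunk size is an arbitrary choice, not a tuned value.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if all `length` bytes at `data` are zero, else 0.
   Uses a fixed zero block, so no memory is allocated per call. */
int is_zero_chunked(const unsigned char *data, size_t length)
{
    static const unsigned char zeros[4096]; /* static storage is zero-initialized */
    assert(data != NULL || length == 0);
    while (length > 0) {
        size_t n = length < sizeof zeros ? length : sizeof zeros;
        if (memcmp(data, zeros, n) != 0)
            return 0;           /* stop at the first non-zero chunk */
        data += n;
        length -= n;
    }
    return 1;
}
```

It still exits early, like the explicit loops, but only at chunk granularity.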
UPDATE: Some Tests
Thanks to everyone for the great responses below. I decided to create a test app to see how these performed. The results surprised me, so I thought I'd share them. First I'll show the versions of the algorithms I used (in some cases they differ slightly from those proposed), and then I'll share some results from the field.
The Tests
First I created some sample data:
size_t length = 1024 * 768;
unsigned char *data = (unsigned char *)calloc(sizeof(unsigned char), (unsigned long)length);
int i;
int count;
long check;
int loop = 5000;
Each test consisted of a loop run loop times. During the loop some random data was added to and removed from the data byte stream. Note that half the time there was actually no data added, so half the time the test should not find any non-zero data. Note the testZeros call is a placeholder for calls to the test routines below. A timer was started before the loop and stopped after the loop.
count = 0;
for (i=0; i<loop; i++) {
    int r = random() % length;
    if (random() % 2) { data[r] = 1; }
    if (! testZeros(data, length)) {
        count++;
    }
    data[r] = 0;
}
Test A: nullToLength. This was more or less my original formulation above, debugged and simplified a bit.
- (BOOL)data:(void *)data isNullToLength:(size_t)length {
    void *tester = (void *)calloc(sizeof(void), (unsigned long)length);
    int test = memcmp(tester, data, length);
    free(tester);
    return (! test);
}
Test B: allZero. Proposal by Carrotman.
BOOL allZero (unsigned char *data, size_t length) {
    bool allZero = true;
    for (int i = 0; i < length; i++){
        if (*data++){
            allZero = false;
            break;
        }
    }
    return allZero;
}
Test C: is_all_zero. Proposed by Lundin.
BOOL is_all_zero (unsigned char *data, size_t length)
{
    BOOL result = TRUE;
    unsigned char* end = data + length;
    unsigned char* i;
    for(i=data; i<end; i++) {
        if(*i > 0) {
            result = FALSE;
            break;
        }
    }
    return result;
}
Test D: sumArray. This is the top answer from the nearly duplicate question, proposed by vladr.
BOOL sumArray (unsigned char *data, size_t length) {
    int sum = 0;
    for (int i = 0; i < length; ++i) {
        sum |= data[i];
    }
    return (sum == 0);
}
Test E: lulz. Proposed by Steve Jessop.
BOOL lulz (unsigned char *data, size_t length) {
    if (length == 0) return 1;
    if (*data) return 0;
    return memcmp(data, data+1, length-1) == 0;
}
Test F: NSData. This is a test using NSData object I discovered in the iOS SDK while working on all of these. It turns out Apple does have an idea of how to compare byte streams that is designed to be hardware independent.
- (BOOL)nsdTestData: (NSData *)nsdData length: (NSUInteger)length {
    void *tester = (void *)calloc(sizeof(void), (unsigned long)length);
    NSData *nsdTester = [NSData dataWithBytesNoCopy:tester length:(NSUInteger)length freeWhenDone:NO];
    int test = [nsdData isEqualToData:nsdTester];
    free(tester);
    return (test);
}
Results
So how did these approaches compare? Here are two sets of data, each representing 5000 loops through the check. First I tried this on the iPhone Simulator running on a relatively old iMac, then I tried this running on a first generation iPad.
On the iPhone 4.3 Simulator running on an iMac:
// Test A, nullToLength: 0.727 seconds
// Test F, NSData: 0.727
// Test E, lulz: 0.735
// Test C, is_all_zero: 7.340
// Test B, allZero: 8.736
// Test D, sumArray: 13.995
On a first generation iPad:
// Test A, nullToLength: 21.770 seconds
// Test F, NSData: 22.184
// Test E, lulz: 26.036
// Test C, is_all_zero: 54.747
// Test B, allZero: 63.185
// Test D, sumArray: 84.014
These are just two samples; I ran the test many times with only slightly varying results. The order of performance was always the same: A and F very close, E just behind, then C, B, and D. I'd say that A, F, and E are virtual ties. On iOS I'd prefer F because it takes advantage of Apple's protection from processor change issues, but A and E are very close. The memcmp approach clearly wins over the simple loop approach: close to ten times faster in the simulator and more than twice as fast on the device itself. Oddly enough, D, the winning answer from the other thread, performed very poorly in this test, probably because it does not break out of the loop when it hits the first non-zero byte.
I think you should do it with an explicit loop, but just for lulz:
if (length == 0) return 1;
if (*pdata) return 0;
return memcmp(pdata, pdata+1, length-1) == 0;
Unlike memcpy, memcmp does not require that the two data sections don't overlap.
It may well be slower than the loop, though, because the un-alignedness of the input pointers means there probably isn't much the implementation of memcmp can do to optimize, plus it's comparing memory with memory rather than memory with a constant. Easy enough to profile it and find out.
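Spelled out as a complete C function, the overlapping-memcmp trick looks roughly like this. It is a sketch only; is_zero_overlap is a hypothetical name. The reasoning is that once the first byte is known to be zero, comparing the buffer with itself shifted by one byte checks that every byte equals its predecessor, and therefore that every byte is zero.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if all `length` bytes at `data` are zero, else 0. */
int is_zero_overlap(const unsigned char *data, size_t length)
{
    assert(data != NULL || length == 0);
    if (length == 0)
        return 1;
    if (data[0] != 0)
        return 0;
    /* memcmp only reads, so the two ranges are allowed to overlap.
       Byte i is compared with byte i+1 for i in [0, length-2]. */
    return memcmp(data, data + 1, length - 1) == 0;
}
```

As noted above, whether this beats a plain loop depends on the memcmp implementation, so it is worth profiling on the target device.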
Not sure if it's the best, but I probably would do something like this:
bool allZero = true;
// length is the number of bytes to check (a size_t)
for (size_t i = 0; i < length; i++){
    if (*data++){
        //Roll back so data points to the non-zero char
        data--;
        //Do whatever is needed if it isn't zero.
        allZero = false;
        break;
    }
}
If you've just allocated this memory, you can always call calloc rather than malloc (calloc guarantees that all the data is zeroed out). (Edit: reading your comment on the first post, you don't really need this. I'll leave it here just in case.)
If you're allocating the memory yourself, I'd suggest using the calloc() function. It's just like malloc(), except it zeros out the buffer first. It's what's used to allocate memory for Objective-C objects and is the reason that all ivars default to 0.
On the other hand, if this is a statically declared buffer, or a buffer you're not allocating yourself, memset() is the easy way to do this.
Logic to get a value, check it, and set it will be at least as expensive as just setting it. You want it to be null, so just set it to null using memset().
This would be the preferred way to do it in C:
BOOL is_all_zero (const unsigned char* data, size_t length)
{
    BOOL result = TRUE;
    const unsigned char* end = data + length;
    const unsigned char* i;
    for(i=data; i<end; i++)
    {
        if(*i > 0)
        {
            result = FALSE;
            break;
        }
    }
    return result;
}
(Though note that, strictly and formally speaking, a memory cell containing a NULL pointer need not necessarily be 0, as long as a null pointer cast results in the value zero, and a cast of a zero to a pointer results in a NULL pointer. In practice, this shouldn't matter, as all known compilers use 0 or (void*) 0 for NULL.)
Note the edit to the initial question above. I did some tests and it is clear that the memcmp approach or using Apple's NSData object and its isEqualToData: method are the best approaches for speed. The simple loops are clearer to me, but slower on the device.