Elegant Algorithm for Parsing Data Stream Into Record


I am interfacing with a hardware device that streams data to my app over Wi-Fi. The data is streaming in just fine. The data contains a character header (DATA:) that indicates a new record has begun. The issue is that the data I receive doesn't necessarily fall on the header boundary, so I have to capture data until what I've captured contains the header. Then everything that precedes the header goes into the previous record, and everything that comes after it goes into a new record. I have this working, but wondered if anyone has done this before and has a good computer-sciencey way to solve the problem.
Here's what I do:
1. Convert the NSData of the current read to an NSString.
2. Append the NSString to a placeholder string.
3. Check the placeholder string for the header (DATA:). If the header is not there, just wait for the next read.
4. If the header exists, append whatever precedes it to a previous-record placeholder and hand that placeholder off to an array as a complete record that I can further parse into fields.
5. Take whatever shows up after the header and place it in the record placeholder so that it can be appended to in the next read. Repeat steps 3 - 5.
Let me know if you see any flaws with this or have a suggestion for a better way.
Seems there should be some design pattern for this, but I can't think of one.
Thanks.
UPDATE: Here is a little bit of code:
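uint8_t buf[1024];
// read:maxLength: returns an NSInteger: bytes read, 0 at end of stream, -1 on error
NSInteger len = [(NSInputStream *)stream read:buf maxLength:sizeof(buf)];
if (len > 0) {
    [data appendBytes:(const void *)buf length:(NSUInteger)len];
    // assumed to be an ivar initialized to 0, so the total survives between reads
    bytesRead += len;
} else {
    NSLog(@"No data.");
}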
How would this code be changed then to implement a finite state machine?

That seems pretty much how I'd do it. The only thing I might do differently is write an NSData category that does the linear search of DATA: for me, just to save the overhead of converting it to a string. It wouldn't be that hard to do, either. Something like:
#import <Foundation/Foundation.h>
#include <string.h>

@interface NSData (Search)
- (NSRange)rangeOfData:(NSData *)aData;
@end

@implementation NSData (Search)
// (Foundation also has a built-in -rangeOfData:options:range:, iOS 4.0+ / OS X 10.6+.)
- (NSRange)rangeOfData:(NSData *)aData {
    const uint8_t *bytes = [self bytes];        // void * cannot be indexed, so use a byte pointer
    NSUInteger length = [self length];
    const uint8_t *searchBytes = [aData bytes];
    NSUInteger searchLength = [aData length];
    if (searchLength == 0 || searchLength > length) {
        return NSMakeRange(NSNotFound, 0);
    }
    // Try every start position. Merely resetting a match counter on a mismatch
    // would miss inputs such as "DDATA:", because the second 'D' is never retried.
    for (NSUInteger index = 0; index + searchLength <= length; index++) {
        if (memcmp(bytes + index, searchBytes, searchLength) == 0) {
            return NSMakeRange(index, searchLength);
        }
    }
    return NSMakeRange(NSNotFound, 0);
}
@end
Then you can just use:
NSData * searchData = [@"DATA:" dataUsingEncoding:NSUTF8StringEncoding];
while(receivingData) {
if ([receivedData rangeOfData:searchData].location != NSNotFound) {
//WOOO!
}
}
(warning: typed in a browser)
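For comparison, the accumulate-then-search steps from the question can be done entirely on bytes, skipping the NSString conversion as this answer suggests. The following is a minimal sketch in plain C (which also compiles as Objective-C); the names `splitter`, `splitter_feed` and `record_fn`, the fixed 4096-byte buffer, and the choice to discard bytes that arrive before the first header are illustrative assumptions, not part of the original code.

```c
#include <stddef.h>
#include <string.h>

#define HEADER "DATA:"
#define HEADER_LEN 5
#define BUF_CAP 4096   /* assumption: pending data never exceeds this */

/* Hypothetical callback that receives one complete record (header excluded). */
typedef void (*record_fn)(const unsigned char *rec, size_t len, void *ctx);

typedef struct {
    unsigned char buf[BUF_CAP];  /* bytes received but not yet emitted */
    size_t len;
    int seen_header;             /* 0 until the first header arrives */
} splitter;

/* Linear search for the header; returns its index or -1. */
static long find_header(const unsigned char *b, size_t len) {
    for (size_t i = 0; i + HEADER_LEN <= len; i++)
        if (memcmp(b + i, HEADER, HEADER_LEN) == 0) return (long)i;
    return -1;
}

/* Appends one read's bytes and emits every record that is now complete.
   Returns 0, or -1 if the new bytes would overflow the buffer. */
int splitter_feed(splitter *s, const unsigned char *in, size_t n,
                  record_fn emit, void *ctx) {
    if (n > BUF_CAP - s->len) return -1;
    memcpy(s->buf + s->len, in, n);
    s->len += n;
    long h;
    while ((h = find_header(s->buf, s->len)) >= 0) {
        /* Everything before the header finishes the previous record;
           bytes before the very first header belong to no record. */
        if (s->seen_header) emit(s->buf, (size_t)h, ctx);
        s->seen_header = 1;
        /* Keep whatever follows the header as the start of the next record. */
        size_t rest = s->len - (size_t)h - HEADER_LEN;
        memmove(s->buf, s->buf + h + HEADER_LEN, rest);
        s->len = rest;
    }
    return 0;
}
```

Because unconsumed bytes stay in the buffer, a header split across two reads (for example "...DA" followed by "TA:...") is still found on the next call. Rescanning from the start of the buffer on every read is the simplest choice; remembering where the last search stopped would avoid repeated work on long records.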

This is a classic finite state machine problem. A lot of data protocols that are stream based can be described with a finite state machine.
Basically you have states and transitions. Boost has a finite state machine library, but it could be overkill. You can implement it as a switch.
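enum { STATE_D, STATE_A, STATE_T, STATE_A2, STATE_COLON } currentState = STATE_D;
while (stream.hasData) {
    char nextInput = stream.get();
    switch (currentState) {
        case STATE_D: // waiting for 'D'
            currentState = (nextInput == 'D') ? STATE_A : STATE_D;
            break;
        case STATE_A: // saw "D", waiting for 'A'; otherwise start over
            currentState = (nextInput == 'A') ? STATE_T
                         : (nextInput == 'D') ? STATE_A : STATE_D;
            break;
        // ... same pattern for 'T', the second 'A' and ':'
    }
}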
Requested elaboration:
Basically, look at the diagram below: it's a finite state machine. At any given time the machine is in exactly one state. Every time a character is input into the state machine, a transition is taken and the current state moves (possibly back into the same state). So all you have to do is model your networked data as a finite state machine and then implement that machine. There are libraries that lay it out for you; then all you have to do is implement exactly what happens on each transition. For you, that probably means interpreting or saving the byte of data. The interpretation depends on the transition, and the transition depends on the current state and the current input. Here is an example FSM.
[Figure: example FSM that reaches its final state after reading DATA:, originally at http://www.freeimagehosting.net/uploads/b1706f2a8d.png]
Note that if the characters DATA: are entered, the state moves to the last circle. Any other sequence will keep the state in one of the first five states (the top row). You can also have splits, so the FSM can make decisions: if you get a sequence like DATA2:, you can branch off into a DATA2: part of the machine and interpret the data differently there.
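To make the diagram concrete for this protocol, here is a minimal sketch in plain C of such a machine driving record splitting, one transition per byte. States 0 to 4 count how many header bytes have matched so far (the top row); reaching 5 is the final state. The names `fsm`, `fsm_step` and `record_fn`, the fixed `REC_CAP`, and silently dropping bytes past it are assumptions for illustration. The simple mismatch rule is only valid because no proper prefix of DATA: is also a suffix of it; a general header would need a KMP-style failure table.

```c
#include <stddef.h>

#define HDR "DATA:"
#define HDR_LEN 5
#define REC_CAP 1024   /* assumption: bytes beyond this are dropped */

/* Hypothetical callback that receives one complete record. */
typedef void (*record_fn)(const char *rec, size_t len, void *ctx);

typedef struct {
    int state;          /* header bytes matched so far: 0..HDR_LEN-1 */
    int in_record;      /* 1 once the first header has been seen */
    char rec[REC_CAP];  /* bytes of the current record */
    size_t len;
} fsm;

static void append(fsm *m, char c) {
    if (m->len < REC_CAP) m->rec[m->len++] = c;
}

/* One transition of the machine per input byte. */
void fsm_step(fsm *m, char c, record_fn emit, void *ctx) {
    if (c == HDR[m->state]) {
        if (++m->state == HDR_LEN) {      /* final state: a full "DATA:" */
            if (m->in_record) emit(m->rec, m->len, ctx);
            m->in_record = 1;              /* bytes before the first header are dropped */
            m->len = 0;
            m->state = 0;
        }
        return;
    }
    /* Mismatch: the header bytes matched so far were ordinary data. */
    for (int i = 0; i < m->state; i++) append(m, HDR[i]);
    if (c == HDR[0]) {
        m->state = 1;                      /* this 'D' may start a new header */
    } else {
        append(m, c);
        m->state = 0;
    }
}
```

Because the machine keeps its state between calls, it does not matter where one network read ends and the next begins. The last record stays buffered until the next header arrives.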

Related

Sudoku Backtracking Algorithm Failure

I'm trying to generate a sudoku board, and although I can generate a solution, I now need to remove squares that a user can then fill in. To do this, I'd like to use backtracking to check, each time I remove a square, that the board
1. is still solvable, and
2. has only one solution.
The Problem
When I test my backtracking algorithm on this board (where the zeroes are empty squares), it returns this solution. Obviously I would prefer not to end up with several 9s in the first row, for example.
My Code
- (BOOL) solveArray: (NSArray*) numArray {
NSMutableArray* board = [numArray mutableCopy];
for (int i=0; i<9; i++) { //make all of the arrays mutable as well (since it's 2D)
[board replaceObjectAtIndex:i withObject:[board[i] mutableCopy]];
}
//if everything is filled in, it's done
if (![self findUnassignedLocation:board]) {
NSLog(@"\n%@", [SudokuBoard sudokuBoardWithArray:board]);
return TRUE;
}
NSArray* poss = [[SudokuBoard sudokuBoardWithArray:board] possibleNumbersForRow:self.arow Col:self.acol];
//if there are no options for a location, this didn't work.
if ([poss count] == 0) {
return FALSE;
}
//otherwise, continue recursively until we find a solution
else {
for (int i=0; i<[poss count]; i++) {
//make a tentative assignment
board[self.arow][self.acol] = poss[i];
//return, if successful, done
if ([self solveArray:board]) {
return TRUE;
}
//if that didn't work, unmake it and retry with other options
board[self.arow][self.acol] = [NSNumber numberWithInt:0];
}
}
return FALSE;
}
Any thoughts on where I might be going wrong?

Each level of recursion needs its own row and column variables. That is, row and column should be inputs to solveArray and outputs of findUnassignedLocation instead of being member variables. As it is, when there is backtracking the row and column of the failed level get reused by the caller.
Given that some assigned locations are being overwritten, maybe findUnassignedLocation also contains an error.
Given that the result is invalid, maybe possibleNumbersForRow also contains an error.
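A minimal C sketch of the fix described above: the empty-cell search returns the row and column through output parameters, so each recursion level keeps its own copy instead of sharing member variables. It also counts solutions up to a limit, so `count_solutions(board, 2) == 1` is one way to check "still solvable and only one solution". The `int[9][9]` board and the names `can_place`, `find_empty` and `count_solutions` are assumptions, not the asker's SudokuBoard API, and the given clues are assumed not to conflict already.

```c
#include <stdbool.h>

/* True if digit d can go at (row, col) without repeating in the row,
   the column, or the 3x3 box. */
static bool can_place(int board[9][9], int row, int col, int d) {
    for (int i = 0; i < 9; i++) {
        if (board[row][i] == d || board[i][col] == d) return false;
    }
    int br = row - row % 3, bc = col - col % 3;
    for (int r = br; r < br + 3; r++)
        for (int c = bc; c < bc + 3; c++)
            if (board[r][c] == d) return false;
    return true;
}

/* Finds the first empty (0) cell; row/col are outputs, not shared state. */
static bool find_empty(int board[9][9], int *row, int *col) {
    for (int r = 0; r < 9; r++)
        for (int c = 0; c < 9; c++)
            if (board[r][c] == 0) { *row = r; *col = c; return true; }
    return false;
}

/* Counts solutions by backtracking, stopping once `limit` (>= 1) is
   reached. Every tentative assignment is undone before returning. */
int count_solutions(int board[9][9], int limit) {
    int row, col;                                  /* local to this level */
    if (!find_empty(board, &row, &col)) return 1;  /* board is full */
    int total = 0;
    for (int d = 1; d <= 9 && total < limit; d++) {
        if (!can_place(board, row, col, d)) continue;
        board[row][col] = d;                       /* tentative assignment */
        total += count_solutions(board, limit - total);
        board[row][col] = 0;                       /* undo it */
    }
    return total;
}
```

When removing squares, a square can stay removed only if the count with a limit of 2 is still exactly 1.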

Converting a string variable from Binary to Decimal in Objective C

I'm trying to create a binary-to-decimal calculator, and I am having trouble getting any sort of conversion to actually work. First off, I'd like to introduce myself as a complete novice to Objective-C and to programming in general. As a result, many concepts will appear difficult to me, so I am mostly looking for the easiest way to understand, not the most efficient way of doing this.
I currently have a calculator that accepts input and displays it in a label. This part is working fine and I have no issues with it. The variable the input is stored in is _display = [[NSMutableString stringWithCapacity:20] retain];
This is working perfectly and I am able to modify the data accordingly. What I would like to do is display an NSString of the conversion in another label. I have tried a few solutions without any decent results; this is the latest attempt:
- (NSMutableString *)displayValue2:(long long)element
{
_str= [[NSMutableString alloc] initWithString:@""];
if(element > 0){
for(NSInteger numberCopy = element; numberCopy > 0; numberCopy >>= 1)
{
[_str insertString:((numberCopy & 1) ? @"1" : @"0") atIndex:0];
}
}
else if(element == 0)
{
[_str insertString:@"0" atIndex:0];
}
else
{
element = element * (-1);
_str = [self displayValue2:element];
[_str insertString:@"0" atIndex:0];
NSLog(@"Prima for: %@",_str);
for(int i=0; i<[_str length];i++)
_str = _display;
NSLog(@"Dopo for: %@",_str);
}
return _str;
}
Within my view controller I have a Convert button set up; when it is pressed, I want to set the second display field to the decimal equivalent. That part works: if I make displayValue2 return a string of my choosing, it is displayed. All I need is help getting this conversion to work. At the moment this bit of code has led to "incomplete implementation" being displayed at the top of my class. Please help, and cheers to those who take time out to help.

So basically all you are really looking for is a way to convert binary numbers into decimal numbers, correct? Another way to think of this problem is changing a number's base from base 2 to base 10. I have used functions like this before in my projects:
+ (NSNumber *)convertBinaryStringToDecimalNumber:(NSString *)binaryString {
    NSUInteger totalValue = 0;
    for (NSUInteger i = 0; i < binaryString.length; i++) {
        // read digits from the right; subtracting '0' (ASCII 48) gives 0 or 1
        NSUInteger digit = [binaryString characterAtIndex:(binaryString.length - 1 - i)] - '0';
        totalValue += digit << i;   // digit * 2^i, without floating-point pow()
    }
    return @(totalValue);
}
Obviously this accesses the binary number through its string representation. This works well because you can easily access each digit of a string, which is more difficult to do with a number. You could also easily change the return type from an NSNumber to an NSString. This also works for your element == 0 scenario.
// original number wrapped as a string
NSString *stringValue = [NSString stringWithFormat:@"%d", 11001];
// convert the value and get an NSNumber back
NSNumber *result = [self.class convertBinaryStringToDecimalNumber:stringValue];
// prints 25
NSLog(@"%@", result);
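The same base-2 to base-10 conversion can also be written with Horner's rule, reading the digits left to right and doubling as you go, which avoids both `pow` and the reversed index. This is a sketch in plain C (also valid Objective-C); the name `binary_string_to_u64` and the choice to reject anything other than '0' and '1' are assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Converts a string of '0'/'1' characters to its value using Horner's
   rule (value = value * 2 + digit). Returns false for an empty string,
   any other character, or a value that does not fit in 64 bits. */
bool binary_string_to_u64(const char *s, uint64_t *out) {
    if (*s == '\0') return false;
    uint64_t value = 0;
    for (; *s != '\0'; s++) {
        if (*s != '0' && *s != '1') return false;
        if (value > (UINT64_MAX >> 1)) return false;   /* doubling would overflow */
        value = (value << 1) | (uint64_t)(*s - '0');
    }
    *out = value;
    return true;
}
```

From Objective-C it could be called as `binary_string_to_u64([stringValue UTF8String], &value)`; for "11001" it yields 25.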
If I misunderstood something, please clarify; if you do not understand the code, let me know. Also, this may not be the most efficient approach, but it is simple and clean.
I also strongly agree with Hot Licks' comment. If you are truly interested in learning well and want to become a well-rounded programmer, there are a few basics you should learn first (I learned with Java and am glad that I did).

Implementing path compression in a disjoint set data structure?

Here is my Objective-C implementation of a disjoint set.
- Positive numbers point to the parent.
- Negative numbers indicate a root and its children count. (So each element starts disjoint at -1.)
- The index acts as the data I am grouping.
Seems to work OK... I just had a couple of questions.
find: How can I compress the path? Because I am not doing it recursively, do I have to store the path and loop over it again to set each parent after finding the root?
join: I am basing join on children count instead of depth!? I guess that is not right. Do I need to do something special during join if the depths are equal?
Thanks.
DisjointSet.h
@interface DisjointSet : NSObject
{
NSMutableArray *_array;
}
- (id)initWithSize:(NSInteger)size;
- (NSInteger)find:(NSInteger)item;
- (void)join:(NSInteger)root1 root2:(NSInteger)root2;
@end
DisjointSet.m
#import "DisjointSet.h"
@implementation DisjointSet
- (id)initWithSize:(NSInteger)size
{
self = [super init];
if (self)
{
_array = [NSMutableArray arrayWithCapacity:size];
for (NSInteger i = 0; i < size; i++)
{
[_array addObject:[NSNumber numberWithInteger:-1]];
}
}
return self;
}
- (NSInteger)find:(NSInteger)item
{
while ([[_array objectAtIndex:item] integerValue] >= 0)
{
item = [[_array objectAtIndex:item] integerValue];
}
return item;
}
- (void)join:(NSInteger)root1 root2:(NSInteger)root2
{
if (root1 == root2) return;
NSInteger data1 = [[_array objectAtIndex:root1] integerValue];
NSInteger data2 = [[_array objectAtIndex:root2] integerValue];
if (data2 < data1)
{
[_array setObject:[NSNumber numberWithInteger:data2 + data1] atIndexedSubscript:root2];
[_array setObject:[NSNumber numberWithInteger:root2] atIndexedSubscript:root1];
}
else
{
[_array setObject:[NSNumber numberWithInteger:data1 + data2] atIndexedSubscript:root1];
[_array setObject:[NSNumber numberWithInteger:root1] atIndexedSubscript:root2];
}
}
@end

For the find operation, there is no need to store the path (separately from your _array) or to use recursion. Either of those approaches requires O(P) storage (P = path length). Instead, you can just traverse the path twice. The first time, you find the root. The second time, you set all of the children to point to the root. This takes O(P) time and O(1) storage.
- (NSInteger)findItem:(NSInteger)item {
NSInteger root;
NSNumber *rootObject = nil;
for (NSInteger i = item; !rootObject; ) {
NSInteger parent = [_array[i] integerValue];
if (parent < 0) {
root = i;
rootObject = @(i);
}
i = parent;
}
for (NSInteger i = item; i != root; ) {
NSInteger parent = [_array[i] integerValue];
_array[i] = rootObject;
i = parent;
}
return root;
}
For the merge operation, you want to store each root's rank (which is an upper bound on its depth), not each root's descendant count. Storing each root's rank allows you to merge the shorter tree into the taller tree, which guarantees O(log N) time for find operations. The rank only increases when the trees to be merged have equal rank.
- (void)joinItem:(NSInteger)a item:(NSInteger)b {
    // a and b must both be roots (e.g. results of findItem:)
    if (a == b) return;   // otherwise a root would end up pointing to itself
    NSInteger aRank = -[_array[a] integerValue];
    NSInteger bRank = -[_array[b] integerValue];
    if (aRank < bRank) {
        NSInteger t = a;
        a = b;
        b = t;
    } else if (aRank == bRank) {
        _array[a] = @(-aRank - 1);
    }
    _array[b] = @(a);
}

You definitely should implement path compression using recursion. I would not even think about trying to do it non-recursively.
Implementing the disjoint-set data structure should be very easy and can be done in a few lines. It's very easy to translate from the pseudocode to any programming language; you can find the pseudocode on Wikipedia. (Unfortunately, I can't read Objective-C, so I cannot really judge whether your code is correct.)

Yes. To implement highest ancestor compression without recursion you need to maintain your own list. Make one pass up the chain to get pointers to the sets that need their parent pointers changed and also to learn the root. Then make a second pass to update the necessary parent pointers.
The recursive method is doing the same thing. The first pass is the "winding up" of the recursion, which stores the sets needing parent pointer updates on the program stack. The second pass is in reverse as the recursion unwinds.
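To make that correspondence concrete, here is a sketch in C of recursive path compression using the question's representation (a negative entry marks a root). The name `find_recursive` and the `long` array are assumptions. Each pending call holds one node of the path on the program stack while the recursion winds up; as it unwinds, that node is pointed straight at the root.

```c
/* set[x] < 0 marks a root; otherwise set[x] is x's parent. */
long find_recursive(long *set, long x) {
    if (set[x] < 0) return x;                 /* reached the root */
    set[x] = find_recursive(set, set[x]);     /* on unwinding, point x at the root */
    return set[x];
}
```

The recursion depth equals the length of the path being compressed, which is what the stack-size concern below is about.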
I differ with those who say the recursive method is always best. In a reasonable number of systems (especially embedded ones), the program stack is of limited size. There are cases where many unions are performed in a row before a find. In such cases, the parent chain can be O(n) in length for n elements, and collapsing by recursion can blow out the stack. Since you are working in Objective-C, this may be iOS. I do not know the default stack size there, but if you use recursion it's worth looking at; it might be smaller than you think. This article implies 512 KB for secondary threads and 1 MB for the main thread.
Iterative, constant space alternative
Actually, the main reason I'm writing is to point out that you still get O(log^* n) amortized time per operation over n operations (just a shade less efficient than full collapsing, and still effectively O(1)) if you only do factor-of-two compression: in the find operation, change parent pointers so that they point to their grandparents instead of the root. This can be done with iteration in constant storage. This lecture at Princeton talks about this algorithm and implements it in a loop with 5 lines of C. See slide 29.
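A sketch of that grandparent scheme (usually called path halving) in C, again using the question's negative-root representation; the name `find_halving` is an assumption. It uses one loop and constant extra storage: each visited node is repointed to its grandparent, and the walk then jumps to that grandparent.

```c
/* set[x] < 0 marks a root; otherwise set[x] is x's parent. */
long find_halving(long *set, long x) {
    while (set[x] >= 0) {
        long parent = set[x];
        if (set[parent] >= 0) set[x] = set[parent];  /* point x at its grandparent */
        x = set[x];                                  /* continue from there */
    }
    return x;
}
```

On a chain 4 → 3 → 2 → 1 → 0, one call leaves 4 pointing at 2 and 2 pointing at 0, roughly halving the path each time it is traversed.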

Arbitrary precision bit manipulation (Objective C)

I need to do bit operations on representations of arbitrary precision numbers in Objective C. So far I have been using NSData objects to hold the numbers - is there a way to bit shift the content of those? If not, is there a different way to achieve this?

Using NSMutableData you can fetch the byte in a char, shift your bits and replace it with -replaceBytesInRange:withBytes:.
I don't see any other solution except writing your own data holder class using a char * buffer to hold the raw data.
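The byte-by-byte approach has one subtlety: bits shifted out of one byte have to be carried into the next. A minimal C sketch, assuming the number is stored little-endian (byte 0 least significant) in a fixed-width buffer, so bits shifted past the top byte are discarded; the function name `shift_left_le` is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Shifts a little-endian number in buf[0..len-1] left by `bits`, in place.
   Width is fixed: bits pushed past the top byte are lost, zeros enter below. */
void shift_left_le(uint8_t *buf, size_t len, size_t bits) {
    size_t whole = bits / 8;     /* whole-byte part of the shift */
    unsigned extra = bits % 8;   /* remaining 0..7 bits */
    if (whole >= len) {          /* everything is shifted out */
        for (size_t i = 0; i < len; i++) buf[i] = 0;
        return;
    }
    /* Walk from the most significant byte down, so every source byte is
       read before it is overwritten. */
    for (size_t i = len; i-- > 0; ) {
        uint8_t hi = (i >= whole) ? buf[i - whole] : 0;
        uint8_t lo = (i >= whole + 1) ? buf[i - whole - 1] : 0;
        buf[i] = extra ? (uint8_t)((hi << extra) | (lo >> (8 - extra))) : hi;
    }
}
```

For example, shifting {0x81, 0x01} (the value 0x0181) left by one bit gives {0x02, 0x03}, i.e. 0x0302; the top bit of byte 0 was carried into byte 1.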

As you'll have spotted, Apple doesn't provide arbitrary precision support. Nothing is provided larger than the 1024-bit integers in vecLib.
I also don't think NSData provides shifts and rolls. So you're going to have to roll your own. E.g. a very naive version, which may have some small errors as I'm typing it directly here:
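#import <Foundation/Foundation.h>

// Treats the bytes as a little-endian number (byte 0 is least significant).
@interface NSData (Shifts)
- (NSData *)dataByShiftingLeft:(NSUInteger)bitCount;
@end

@implementation NSData (Shifts)
- (NSData *)dataByShiftingLeft:(NSUInteger)bitCount
{
    // we'll work byte by byte
    NSUInteger wholeBytes = bitCount >> 3;
    NSUInteger extraBits = bitCount & 7;
    NSUInteger length = self.length;
    // dataWithLength: zero-fills, so the vacated low bytes are already 0
    NSMutableData *newData = [NSMutableData dataWithLength:length + wholeBytes + (extraBits ? 1 : 0)];
    if (length == 0) return newData;
    const uint8_t *sourceBytes = [self bytes];
    uint8_t *destinationBytes = [newData mutableBytes];
    if (extraBits)
    {
        uint8_t carry = 0; // high bits shifted out of the previous byte
        for (NSUInteger index = 0; index < length; index++)
        {
            destinationBytes[index + wholeBytes] = (uint8_t)((sourceBytes[index] << extraBits) | carry);
            carry = sourceBytes[index] >> (8 - extraBits);
        }
        destinationBytes[length + wholeBytes] = carry;
    }
    else
    {
        // just copy all of self, offset by the whole-byte shift
        memcpy(destinationBytes + wholeBytes, sourceBytes, length);
    }
    return newData;
}
@end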
Of course, that assumes the number of bits you want to shift by is itself expressible as an NSUInteger, amongst other sins.

Quickest way to be sure region of memory is blank (all NULL)?

If I have an unsigned char *data pointer and I want to check whether size_t length of the data at that pointer is NULL, what would be the fastest way to do that? In other words, what's the fastest way to make sure a region of memory is blank?
I am implementing in iOS, so you can assume iOS frameworks are available, if that helps. On the other hand, simple C approaches (memcmp and the like) are also OK.
Note, I am not trying to clear the memory, but rather trying to confirm that it is already clear (I am trying to find out whether there is anything at all in some bitmap data, if that helps). For example, I think the following would work, though I have not tried it yet:
- (BOOL)data:(unsigned char *)data isNullToLength:(size_t)length {
    unsigned char tester[length];   // a variable-length array cannot take an initializer
    memset(tester, 0, length);
    if (memcmp(tester, data, length) != 0) {
        return NO;
    }
    return YES;
}
I would rather not create a tester array, though, because the source data may be quite large and I'd rather avoid allocating memory for the test, even temporarily. But I may just be too conservative there.
UPDATE: Some Tests
Thanks to everyone for the great responses below. I decided to create a test app to see how these performed; the results surprised me, so I thought I'd share them. First I'll show you the versions of the algorithms I used (in some cases they differ slightly from those proposed), and then I'll share some results from the field.
The Tests
First I created some sample data:
size_t length = 1024 * 768;
unsigned char *data = (unsigned char *)calloc(sizeof(unsigned char), (unsigned long)length);
int i;
int count;
long check;
int loop = 5000;
Each test consisted of a loop run loop (5000) times. During the loop, some random data was added to and removed from the data byte stream. Note that half the time no data was actually added, so half the time the test should not find any nonzero data. Note that the testZeros call is a placeholder for calls to the test routines below. A timer was started before the loop and stopped after it.
count = 0;
for (i=0; i<loop; i++) {
int r = random() % length;
if (random() % 2) { data[r] = 1; }
if (! testZeros(data, length)) {
count++;
}
data[r] = 0;
}
Test A: nullToLength. This was more or less my original formulation above, debugged and simplified a bit.
- (BOOL)data:(void *)data isNullToLength:(size_t)length {
void *tester = calloc(length, 1); // sizeof(void) is not valid standard C; use 1-byte elements
int test = memcmp(tester, data, length);
free(tester);
return (! test);
}
Test B: allZero. Proposal by Carrotman.
BOOL allZero (unsigned char *data, size_t length) {
bool allZero = true;
for (int i = 0; i < length; i++){
if (*data++){
allZero = false;
break;
}
}
return allZero;
}
Test C: is_all_zero. Proposed by Lundin.
BOOL is_all_zero (unsigned char *data, size_t length)
{
BOOL result = TRUE;
unsigned char* end = data + length;
unsigned char* i;
for(i=data; i<end; i++) {
if(*i > 0) {
result = FALSE;
break;
}
}
return result;
}
Test D: sumArray. This is the top answer from the nearly duplicate question, proposed by vladr.
BOOL sumArray (unsigned char *data, size_t length) {
int sum = 0;
for (int i = 0; i < length; ++i) {
sum |= data[i];
}
return (sum == 0);
}
Test E: lulz. Proposed by Steve Jessop.
BOOL lulz (unsigned char *data, size_t length) {
if (length == 0) return 1;
if (*data) return 0;
return memcmp(data, data+1, length-1) == 0;
}
Test F: NSData. This is a test using an NSData method I discovered in the iOS SDK while working on all of these. It turns out Apple does provide a way to compare byte streams that is designed to be hardware independent.
- (BOOL)nsdTestData: (NSData *)nsdData length: (NSUInteger)length {
void *tester = calloc(length, 1); // sizeof(void) is not valid standard C; use 1-byte elements
NSData *nsdTester = [NSData dataWithBytesNoCopy:tester length:(NSUInteger)length freeWhenDone:NO];
int test = [nsdData isEqualToData:nsdTester];
free(tester);
return (test);
}
Results
So how did these approaches compare? Here are two sets of data, each representing 5000 loops through the check. First I tried this on the iPhone Simulator running on a relatively old iMac, then I tried this running on a first generation iPad.
On the iPhone 4.3 Simulator running on an iMac:
// Test A, nullToLength: 0.727 seconds
// Test F, NSData: 0.727
// Test E, lulz: 0.735
// Test C, is_all_zero: 7.340
// Test B, allZero: 8.736
// Test D, sumArray: 13.995
On a first generation iPad:
// Test A, nullToLength: 21.770 seconds
// Test F, NSData: 22.184
// Test E, lulz: 26.036
// Test C, is_all_zero: 54.747
// Test B, allZero: 63.185
// Test D, sumArray: 84.014
These are just two samples; I ran the test many times with only slightly varying results. The order of performance was always the same: A and F very close, E just behind, then C, B, and D. I'd say that A, F, and E are virtual ties. On iOS I'd prefer F because it takes advantage of Apple's protection from processor change issues, but A and E are very close. The memcmp approach clearly wins over the simple loop approach: close to ten times faster in the simulator and about two and a half times as fast on the device itself. Oddly enough, D, the winning answer from the other thread, performed very poorly in this test, probably because it does not break out of the loop when it hits the first nonzero byte.

I think you should do it with an explicit loop, but just for lulz:
if (length == 0) return 1;
if (*pdata) return 0;
return memcmp(pdata, pdata+1, length-1) == 0;
Unlike memcpy, memcmp does not require the two memory regions to be non-overlapping.
It may well be slower than the loop, though, because the misalignment between the two input pointers means there probably isn't much the implementation of memcmp can do to optimize, plus it's comparing memory with memory rather than memory with a constant. It's easy enough to profile it and find out.

Not sure if it's the best, but I probably would do something like this:
bool allZero = true;
for (size_t i = 0; i < length; i++){
if (*data++){
//Roll back so data points to the non-zero char
data--;
//Do whatever is needed if it isn't zero.
allZero = false;
break;
}
}
If you've just allocated this memory, you can always call calloc rather than malloc (calloc guarantees that the allocated memory is zeroed). (Edit: reading your comment on the first post, you don't really need this. I'll just leave it in case.)

If you're allocating the memory yourself, I'd suggest using the calloc() function. It's just like malloc(), except it zeros out the buffer first. It's what's used to allocate memory for Objective-C objects and is the reason that all ivars default to 0.
On the other hand, if this is a statically declared buffer, or a buffer you're not allocating yourself, memset() is the easy way to do this.

Logic to get a value, check it, and set it will be at least as expensive as just setting it. You want it to be null, so just set it to null using memset().

This would be the preferred way to do it in C:
BOOL is_all_zero (const unsigned char* data, size_t length)
{
BOOL result = TRUE;
const unsigned char* end = data + length;
const unsigned char* i;
for(i=data; i<end; i++)
{
if(*i > 0)
{
result = FALSE;
break;
}
}
return result;
}
(Though note that, strictly and formally speaking, a null pointer need not be stored in memory as all bits zero: the standard only requires that converting the constant 0 to a pointer type yields a null pointer. In practice, this shouldn't matter, as all known compilers use 0 or (void*) 0 for NULL.)
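A common middle ground between the byte-at-a-time loops and memcmp is to test one machine word at a time. This is a sketch, not one of the versions benchmarked in the question, and its speed relative to them has not been measured here; the name `is_all_zero_words` is hypothetical. Each word is loaded with memcpy so the code stays correct for unaligned pointers and does not break strict-aliasing rules; compilers typically turn that memcpy into a single load.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

bool is_all_zero_words(const unsigned char *data, size_t length) {
    size_t i = 0;
    /* Check sizeof(size_t) bytes per iteration. */
    for (; i + sizeof(size_t) <= length; i += sizeof(size_t)) {
        size_t word;
        memcpy(&word, data + i, sizeof word);
        if (word != 0) return false;
    }
    /* Check the remaining tail bytes one at a time. */
    for (; i < length; i++) {
        if (data[i] != 0) return false;
    }
    return true;
}
```

Like the loop answers, it stops at the first nonzero word, so it does not pay for the whole buffer when data is found early.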

Note the edit to the initial question above. I did some tests and it is clear that the memcmp approach or using Apple's NSData object and its isEqualToData: method are the best approaches for speed. The simple loops are clearer to me, but slower on the device.
