Parse CSV with double quote in some cases - objective-c

I have CSV data that comes in this format:
a1, a2, a3, "a4,a5", a6
Only fields that contain a comma are quoted.
Using Objective-C, how can I easily parse this? Company policy means I have to avoid open source CSV parsers. Thanks.

I agree with rmaddy that a full csv parsing algorithm is beyond the scope of SO, however, here is one possible way of tackling this:
1. Extract the whole line into an NSString.
2. Iterate over the NSString, pushing each character into another string.
3. When a comma is encountered, save the stored string, ignore the comma and create a new blank string.
4. Repeat until the end of the line.
5. Keep a flag that records whether an opening double quote has been found. While the flag is set, skip step 3. When the closing quote is found, unset the flag and continue as before.
This is generally applicable to any language (using their respective native string classes) and such an algorithm can form a small basis for a full CSV parser. In this particular case however, you may not need any more functionality than this.
For some sample code, I would encourage you to have a look at my answer to this CSV-related question as it demonstrates a way of splitting and storing strings in Objective-C.

This snippet worked perfectly for me...
BOOL quotesOn = NO;
NSString* line = @"a1, a2, a3, \"a4,a5\", a6";
NSMutableArray* lineParts = [[NSMutableArray alloc] init];
NSMutableString* linePart = [[NSMutableString alloc] init];
for (NSUInteger i = 0; i < line.length; i++)
{
    unichar current = [line characterAtIndex: i];
    if (current == '"')
    {
        // Toggle quoted mode; the quote character itself is not stored.
        quotesOn = !quotesOn;
        continue;
    }
    if (!quotesOn && current == ',')
    {
        // An unquoted comma ends the field. Empty fields are skipped.
        if (linePart.length > 0)
            [lineParts addObject: linePart];
        linePart = [[NSMutableString alloc] init];
        continue;
    }
    // Everything else, including spaces after commas, is kept in the field.
    [linePart appendFormat: @"%C", current];
}
if (linePart.length > 0)
    [lineParts addObject: linePart];
My 5 elements are in the lineParts array...

Related

Parsing arithmetic expression for long numbers that need formatting

I am trying to make a simple calculator app. Currently, the app works perfectly. One problem: It's smart enough to change results into formatted numbers (800000 = 800,000), but not full expressions (200*600/21000 = 200*600/21,000).
I would like to be able to have a method that I could feed a string and get back a string of properly formatted numbers with operations still inside the string.
Example:
I feed the method 30000/80^2. Method gives back 30,000/80^2.
EDIT: People seem to be misunderstanding the question (Or it's possible I am misunderstanding the answers!) I want to be able to separate the numbers - 60000/200000 would separate into 60000 & 200000. I can do it from there.
Well, what's the problem? You obviously can parse the whole expression (you say calculator works), you can format single numbers (you say you can format results).
The only thing you need is to parse the expression, format all the numbers and recompose the expression...
EDIT: There is a simpler solution. For formatting, you don't need to parse the expression into a tree. You just have to find the numbers.
I suggest creating a character set of all operators:
NSCharacterSet* operators = [NSCharacterSet characterSetWithCharactersInString:@"+*-/^()"];
NSCharacterSet* whitespaces = [NSCharacterSet whitespaceCharacterSet];
Then split the expression using this set:
NSString* expression = [...];
NSMutableString* formattedExpression = [NSMutableString string];
NSRange numberRange = NSMakeRange(0, 0);
for (NSUInteger i = 0; i < expression.length; i++) {
    unichar character = [expression characterAtIndex:i];
    if ([whitespaces characterIsMember:character] || [operators characterIsMember:character]) {
        // A separator ends the current number, if any: format it and emit it.
        if (numberRange.length > 0) {
            NSString* number = [expression substringWithRange:numberRange];
            NSString* formattedNumber = [self formatNumber:number];
            [formattedExpression appendString:formattedNumber];
            numberRange.length = 0;
        }
    }
    else if (numberRange.length == 0) {
        // First character of a new number.
        numberRange.location = i;
        numberRange.length = 1;
    }
    else {
        numberRange.length++;
    }
    // Operators are copied through unchanged; whitespace is dropped.
    if ([operators characterIsMember:character]) {
        [formattedExpression appendFormat:@"%C", character];
    }
}
// Emit a number that runs to the end of the expression.
if (numberRange.length > 0) {
    NSString* number = [expression substringWithRange:numberRange];
    NSString* formattedNumber = [self formatNumber:number];
    [formattedExpression appendString:formattedNumber];
}
Note that this should work even for numbers prefixed by a sign. I am ignoring all whitespace because if you want to have a pretty expression, you probably want to handle whitespace differently (e.g. no space after (, space before +/-, space after - only if it's not a number sign...). In general, for handling spaces, parsing the expression into a tree would simplify matters. Also note that infix expressions are not unambiguous, which means that you should sometimes add parentheses. However, that can't be done without parsing into a tree.
Look up NSNumberFormatter. Not only will that handle formatting of numbers, it will do so based on the user's locale.
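The loop above calls a formatNumber: helper that is never shown. Here is a minimal sketch of what such a helper might look like with NSNumberFormatter, assuming ARC. The function name FormatNumberString is made up for illustration, and the locale is pinned to en_US only so the result is reproducible; in an app you would probably leave the formatter on the user's current locale.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not from the original answer): adds grouping
// separators to a plain number string. Returns the input unchanged
// when it cannot be parsed as a number.
static NSString *FormatNumberString(NSString *number) {
    NSNumberFormatter *formatter = [[NSNumberFormatter alloc] init];
    formatter.numberStyle = NSNumberFormatterDecimalStyle;
    // Assumption: fixed locale for a predictable "," grouping separator.
    formatter.locale = [NSLocale localeWithLocaleIdentifier:@"en_US"];

    NSNumber *value = [formatter numberFromString:number];
    if (value == nil) {
        return number;
    }
    return [formatter stringFromNumber:value];
}
```

With this in place, FormatNumberString(@"21000") should give 21,000. Keep in mind that the decimal style limits the number of fraction digits by default, so a calculator that shows long decimals would need to raise maximumFractionDigits.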

capitalizedString doesn't capitalize correctly words starting with numbers?

I'm using the NSString method [myString capitalizedString], to capitalize all words of my string.
However capitalization doesn't work very well for words starting with numbers.
i.e. 2nd chance
becomes
2Nd Chance
Even if n is not the first letter of the word.
thanks
You have to roll your own solution to this problem. The Apple docs state that you may not get the specified behavior using that function for multi-word strings and for strings with special characters. Here's a pretty crude solution:
NSString *text = @"2nd place is nothing";
// break the string into words by separating on spaces.
NSArray *words = [text componentsSeparatedByString:@" "];
// create a new array to hold the capitalized versions.
NSMutableArray *newWords = [[NSMutableArray alloc]init];
// we want to ignore words starting with numbers.
// This class helps us to determine if a string is a number.
NSNumberFormatter *num = [[NSNumberFormatter alloc]init];
for (NSString *item in words) {
    NSString *word = item;
    // skip empty items (produced by consecutive spaces), then check whether
    // the first letter of the word is not a number (numberFromString returns nil)
    if (item.length > 0 && [num numberFromString:[item substringWithRange:NSMakeRange(0, 1)]] == nil) {
        word = [item capitalizedString]; // capitalize that word.
    }
    // if it is a number, don't change the word (this is implied).
    [newWords addObject:word]; // add the word to the new list.
}
NSLog(@"%@", [newWords description]);
Unfortunately this seems to be the general behaviour of capitalizedString.
Perhaps a not so nice workaround / hack would be to replace each number with a string before the transformation, and then change it back afterwards.
So, "2nd chance" -> "xyznd chance" -> "Xyznd Chance" -> "2nd Chance"

Is there a way to get Spell Check data from an NSString?

I'm writing a simple shift cipher iPhone app as a pet project, and one piece of functionality I'm currently designing is a "universal" decryption of an NSString, that returns an NSArray, all of NSStrings:
- (NSArray*) decryptString: (NSString*)ciphertext{
    NSMutableArray* theDecryptions = [NSMutableArray arrayWithCapacity:ALPHABET];
    for (int i = 0; i < ALPHABET; ++i) {
        NSString* theNewPlainText = [self decryptString:ciphertext ForShift:i];
        [theDecryptions insertObject:theNewPlainText
                             atIndex:i];
    }
    return theDecryptions;
}
I'd really like to pass this NSArray into another method that attempts to spell check each individual string within the array, and builds a new array that puts the strings with the fewest typo'd words at lower indices, so they're displayed first. I'd like to use the system's dictionary like a text field would, so I can match against words that have been trained into the phone by its user.
My current guess is to split a given string up into words, then spell check each with NSSpellChecker's -checkSpellingOfString:startingAt: and use the number of correct words to sort the array. Is there an existing library method or well-accepted pattern that would help return such a value for a given string?
Well, I found a solution that works using UIKit/UITextChecker. It correctly finds the user's most preferred language dictionary, but I'm not sure if it includes learned words in the actual rangeOfMisspelledWordInString:... method. If it doesn't, calling [UITextChecker hasLearnedWord:] on currentWord inside the bottom if statement should be enough to find user-taught words.
As noted in the comments, it may be prudent to call rangeOfMisspelledWordInString:... with each of the top few languages in [UITextChecker availableLanguages], to help multilingual users.
-(void) checkForDefinedWords {
    NSArray* words = [message componentsSeparatedByString:@" "];
    NSInteger wordsFound = 0;
    UITextChecker* checker = [[UITextChecker alloc] init];
    //get the first language in the checker's memory- this is the user's
    //preferred language.
    //TODO: May want to search with every language (or top few) in the array
    NSString* preferredLang = [[UITextChecker availableLanguages] objectAtIndex:0];
    //for each word in the array, determine whether it is a valid word
    for(NSString* currentWord in words){
        NSRange range;
        range = [checker rangeOfMisspelledWordInString:currentWord
                                                 range:NSMakeRange(0, [currentWord length])
                                            startingAt:0
                                                  wrap:NO
                                              language:preferredLang];
        //if it is valid (no errors found), increment wordsFound
        if (range.location == NSNotFound) {
            //NSLog(@"%@ %@", @"Valid Word found:", currentWord);
            wordsFound++;
        }
        else {
            //NSLog(@"%@ %@", @"Invalid Word found:", currentWord);
        }
    }
    //After all "words" have been searched, save wordsFound to validWordCount
    [self setValidWordCount:wordsFound];
    [checker release];
}

get the first letter of each word in a string using objective-c

Example of what I am trying to do:
String = "This is my sentence"
I am looking to get this as a result: "TIMS"
I am struggling with objective-c and strings for some reason
Naïve solution:
NSMutableString * firstCharacters = [NSMutableString string];
NSArray * words = [@"this is my sentence" componentsSeparatedByCharactersInSet:[NSCharacterSet whitespaceCharacterSet]];
for (NSString * word in words) {
    if ([word length] > 0) {
        NSString * firstLetter = [word substringToIndex:1];
        [firstCharacters appendString:[firstLetter uppercaseString]];
    }
}
Note that this is kinda stupid about breaking up words (just going by spaces, which isn't always the best approach), and it doesn't handle characters that take more than one UTF-16 code unit (surrogate pairs or composed character sequences).
If you need to handle those characters, change the if() statement inside the loop to:
    if ([word length] > 0) {
        NSString * firstLetter = [word substringWithRange:[word rangeOfComposedCharacterSequenceAtIndex:0]];
        [firstCharacters appendString:[firstLetter uppercaseString]];
    }
You could always use the method cStringUsingEncoding: and just iterate over the const char*. Or better, you could use the method getCharacters:range:.
When you iterate, you just need a for loop that checks whether the previous character is the ' ' character and, if so, appends the current character to your temporary variable. If you want it uppercase, just use uppercaseString at the end.
See the Apple docs for more information:
http://developer.apple.com/mac/library/documentation/Cocoa/Reference/Foundation/Classes/NSString_Class/Reference/NSString.html#//apple_ref/occ/instm/NSString/getCharacters:range:
I also struggle with strings sometimes; the method names are not really similar to those in other languages such as C++ or Java.
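The getCharacters:range: approach described in this answer could be sketched roughly as follows. The function name FirstLetters is made up for illustration, and the sketch treats only the plain space character as a word separator, exactly as the answer describes, so it shares the limitations noted earlier (tabs, punctuation and surrogate pairs are not handled).

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: copies the string into a unichar buffer, then keeps
// every character that starts a word (first character, or preceded by ' ').
static NSString *FirstLetters(NSString *sentence) {
    NSUInteger length = sentence.length;
    if (length == 0) {
        return @"";
    }
    unichar *buffer = malloc(length * sizeof(unichar));
    if (buffer == NULL) {
        return @"";
    }
    [sentence getCharacters:buffer range:NSMakeRange(0, length)];

    NSMutableString *result = [NSMutableString string];
    for (NSUInteger i = 0; i < length; i++) {
        BOOL atWordStart = (i == 0) || (buffer[i - 1] == ' ');
        // Skip spaces themselves so runs of spaces add nothing.
        if (atWordStart && buffer[i] != ' ') {
            [result appendFormat:@"%C", buffer[i]];
        }
    }
    free(buffer);
    // Uppercase once at the end, as suggested above.
    return [result uppercaseString];
}
```

Calling FirstLetters(@"This is my sentence") returns TIMS.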
The shortest and fastest way is to enumerate the words of the string, using the code below.
Swift
import Foundation

let fullWord = "This is my sentence"
var result = ""
fullWord.enumerateSubstrings(in: fullWord.startIndex..<fullWord.endIndex, options: .byWords) { (substring, _, _, _) in
    if let substring = substring {
        result += substring.prefix(1).capitalized
    }
}
print(result)
Output
TIMS

Best way to split a string into tokens skipping escaped delimiters?

I'm receiving an NSString which uses commas as delimiters, and a backslash as an escape character. I was looking into splitting the string using componentsSeparatedByString, but I found no way to specify the escape character. Is there a built-in way to do this? NSScanner? CFStringTokenizer?
If not, would it be better to split the string at the commas and then rejoin tokens that were falsely split (after inspecting them for a (non-escaped) escape character at the end), or to loop through each character looking for a comma and then look back one character to see if the comma is escaped (and then one more character to see if the escape character is itself escaped)?
Now that I think about it, I would need to check that the amount of escape characters before a delimiter is even, because only then is the delimiter itself not being escaped.
If someone has a method that does this, I'd appreciate it if I could take a look at it.
I think the most straightforward method to do this would be to go through the string character by character as you suggest, appending into new string objects. You can follow two simple rules:
if you find a backslash, skip it and copy the next character (if one exists) unconditionally
if you find a comma, end the current section and start a new one
You could do this manually or use some of the functionality of NSScanner to help you (scanUpToCharactersFromSet:intoString:)
I would prefer to use a regular expression based parser to weed out the escape characters and then possibly doing a split operation (of some type) on the string.
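A rough sketch of how the regular-expression idea might look with NSRegularExpression, assuming commas as delimiters and a backslash as the escape character. The function name SplitOnUnescapedCommas is made up for illustration. It relies on the observation from the question: a comma is a real delimiter only when an even number of backslashes (possibly zero) comes directly before it.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: splits on commas preceded by an even number of
// backslashes, then strips the escaping backslashes from each token.
static NSArray *SplitOnUnescapedCommas(NSString *input) {
    // Zero or more backslash pairs followed by a comma, where the pairs are
    // not themselves preceded by another backslash (negative lookbehind).
    NSRegularExpression *delimiter =
        [NSRegularExpression regularExpressionWithPattern:@"(?<!\\\\)(?:\\\\\\\\)*,"
                                                  options:0
                                                    error:NULL];
    // A backslash followed by the single character it escapes.
    NSRegularExpression *escape =
        [NSRegularExpression regularExpressionWithPattern:@"\\\\(.)"
                                                  options:0
                                                    error:NULL];

    NSMutableArray *tokens = [NSMutableArray array];
    NSUInteger tokenStart = 0;
    NSArray *matches = [delimiter matchesInString:input
                                          options:0
                                            range:NSMakeRange(0, input.length)];
    NSMutableArray *rawTokens = [NSMutableArray array];
    for (NSTextCheckingResult *match in matches) {
        // The comma is the last character of each match.
        NSUInteger commaIndex = NSMaxRange(match.range) - 1;
        [rawTokens addObject:[input substringWithRange:
                                 NSMakeRange(tokenStart, commaIndex - tokenStart)]];
        tokenStart = commaIndex + 1;
    }
    [rawTokens addObject:[input substringFromIndex:tokenStart]];

    for (NSString *raw in rawTokens) {
        // Replace each escape sequence with the escaped character alone.
        [tokens addObject:[escape stringByReplacingMatchesInString:raw
                                                           options:0
                                                             range:NSMakeRange(0, raw.length)
                                                      withTemplate:@"$1"]];
    }
    return tokens;
}
```

Whether this is clearer than a character-by-character loop is debatable; the doubled escaping (once for the Objective-C literal, once for the regex) makes the patterns hard to read, which is worth weighing before choosing this route.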
Okay, (I hope) this is what wipolar suggested. It's the first implementation that works. I've just started with a non-garbage-collected language, so please post a comment if you think this code can be improved, especially in the memory-management department.
- (NSArray *) splitUnescapedCharsFrom: (NSString *) str atChar: (char) delim withEscape: (char) esc
{
    // Autoreleased, so nothing is leaked when the immutable copy is returned.
    NSMutableArray * result = [NSMutableArray array];
    NSMutableString * currWord = [[NSMutableString alloc] init];
    for (NSUInteger i = 0; i < [str length]; i++)
    {
        unichar c = [str characterAtIndex:i];
        if (c == esc)
        {
            // Copy the escaped character verbatim; a trailing escape
            // character has nothing after it, so it is dropped.
            if (i + 1 < [str length])
            {
                [currWord appendFormat:@"%C", [str characterAtIndex:++i]];
            }
        }
        else if (c == delim)
        {
            // Unescaped delimiter: store the token and start a new one.
            [result addObject:[NSString stringWithString:currWord]];
            [currWord setString:@""];
        }
        else
        {
            [currWord appendFormat:@"%C", c];
        }
    }
    [result addObject:[NSString stringWithString:currWord]];
    [currWord release];
    return [NSArray arrayWithArray:result];
}