How do I split an NSString by each character in the string? - objective-c

I have the following code, which works as I expect. What I would like to know is whether there is an accepted Cocoa way of splitting a string into an array, with each character of the string as an object in the array.
- (NSString *)doStuffWithString:(NSString *)string {
    NSMutableArray *stringBuffer = [NSMutableArray arrayWithCapacity:[string length]];
    for (int i = 0; i < [string length]; i++) {
        [stringBuffer addObject:[NSString stringWithFormat:@"%C", [string characterAtIndex:i]]];
    }
    // doing stuff with the array
    return [stringBuffer componentsJoinedByString:@""];
}

As a string is already an array of characters, that seems, ... redundant.

If you really need an NSArray of NSStrings of one character each, I think your way of creating it is OK.
But I doubt that your goal cannot be reached in a more readable, safer (and better-performing) way. One thing especially seems dangerous to me: splitting strings into Unicode characters does not, most of the time, do what you might expect. There are characters that are composed of more than one Unicode code point, and there are Unicode code points that really are more than one character. Unless you know about these (or can guarantee that your input does not contain arbitrary strings), you shouldn’t poke around in Unicode strings at the character level.
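If an array of one-character strings is really what you need, one way to sidestep the code-point problem is to let Foundation find the character boundaries. The following is only a sketch: the helper name SplitIntoComposedCharacters is made up for illustration, and it assumes a compiler with blocks support.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper (not a Foundation API): returns one NSString per
// composed character sequence, so a base letter plus a combining accent
// stays together in a single element.
static NSArray *SplitIntoComposedCharacters(NSString *string) {
    NSMutableArray *characters = [NSMutableArray arrayWithCapacity:string.length];
    [string enumerateSubstringsInRange:NSMakeRange(0, string.length)
                               options:NSStringEnumerationByComposedCharacterSequences
                            usingBlock:^(NSString *substring, NSRange substringRange,
                                         NSRange enclosingRange, BOOL *stop) {
        [characters addObject:substring];
    }];
    return characters;
}
```

For plain ASCII input this gives the same elements as the characterAtIndex: loop; the difference only shows up with combining marks, surrogate pairs and similar sequences.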

Related

Replacing character within cstring - getting bad access

Is it possible to replace a character in a C string after converting it from an NSString via the UTF8String method?
For example, take the code below. It formats a string according to a particular rule.
- (NSString *)formatString:(NSString *)input {
    if (input.length==0) {
        return @"";
    }
    //code to determine rule
    ....
    ....
    // substitute output format with input characters
    if (rule) {
        input = [input substringFromIndex:prefix.length];
        char *string = (char *)[rule UTF8String];
        int repCount = 0;
        for (int i=0; i<rule.length; i++) {
            if (string[i] == '#') {
                if (repCount < input.length)
                    string[i] = [input characterAtIndex:repCount++];//bad access
                else
                    string[i] = ' ';
            }
        }
        NSMutableString *output = [NSMutableString stringWithCString:string encoding:NSUTF8StringEncoding];
        ...
        ... //do something with the output
        return output;
    } else {
        return input;
    }
}
Initially string[0] has '#' and it should get replaced with the character in the input. This is not happening.
In a word, NO. That buffer doesn't belong to you so leave it alone.
A couple of issues:
You are casting UTF8String, which returns a const char *, to char *. UTF8String is, by definition, returning a read-only string and you should use it as such. (You really should use casts sparingly, if at all. Certainly never use casts to override const qualifiers for variables.)
If you want to perform this C-string manipulation, you have to copy the string to your own buffer. For example, use the getCString:maxLength:encoding: or getCharacters:range: methods (but only after you've created a buffer to receive the characters, and remember to leave room for the NUL terminator).
By the way, you're also taking the result of characterAtIndex, which is a unichar (which can be larger than 8 bits), and storing it in your char * buffer (8 bits per character). I'd be wary about mixing and matching those without being very careful. It is best to pick one and stick with it (and unichar offers a little more tolerance for those non-8-bit characters).
Perhaps you check for this earlier, but you're setting string to be those characters after the prefix, and then proceed to check the next rule.length number of characters. But, as far as I can tell, you have no assurances that string actually has that many characters left in it. You should test for that, or else that will also cause problems.
Personally, I'd retire this whole C-string algorithm and employ the appropriate NSString and/or NSMutableString methods to do whatever replacement you wanted, e.g. stringByReplacingCharactersInRange, stringByReplacingOccurrencesOfString, or the equivalent NSMutableString methods, replaceCharactersInRange or replaceOccurrencesOfString.
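As a rough illustration of the NSMutableString route, the placeholder substitution from the question could look something like this. FillPlaceholders is a hypothetical helper name, the sketch assumes ARC, and it works on unichars, so input containing surrogate pairs would still need extra care.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: replaces each '#' in `rule` with the next character of
// `input`, or with a space once `input` has been used up.
static NSString *FillPlaceholders(NSString *rule, NSString *input) {
    NSMutableString *output = [rule mutableCopy]; // our own buffer, safe to edit
    NSUInteger next = 0;                          // next unused unichar in input
    for (NSUInteger i = 0; i < output.length; i++) {
        if ([output characterAtIndex:i] != '#') {
            continue;
        }
        NSString *replacement = @" ";
        if (next < input.length) {
            replacement = [input substringWithRange:NSMakeRange(next, 1)];
            next++;
        }
        // One unichar is swapped for one unichar, so the loop index stays valid.
        [output replaceCharactersInRange:NSMakeRange(i, 1) withString:replacement];
    }
    return output;
}
```

With a rule of @"(###) ###" and an input of @"12345", this would produce @"(123) 45 ", and no buffer owned by another object is ever written to.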

Split NSString into words, then rejoin it into original form

I am splitting an NSString like this (filterString is an NSString):
seperatorSet = [NSMutableCharacterSet whitespaceAndNewlineCharacterSet];
[seperatorSet formUnionWithCharacterSet:[NSCharacterSet punctuationCharacterSet]];
NSMutableArray *words = [[filterString componentsSeparatedByCharactersInSet:seperatorSet] mutableCopy];
I want to put words back into the form of filter string with the original punctuation and spacing. The reason I want to do this is I want to change some words and put it back together as it was originally.
A more robust way to split by words is to use string enumeration. A space is not always the delimiter, and not all languages delimit words with spaces anyway (e.g. Japanese).
NSString * string = @" \n word1! word2,%$?'/word3.word4 ";
[string enumerateSubstringsInRange:NSMakeRange(0, string.length)
                           options:NSStringEnumerationByWords
                        usingBlock:
    ^(NSString *substring, NSRange substringRange, NSRange enclosingRange, BOOL *stop) {
        NSLog(@"Substring: '%@'", substring);
    }];
// Logs:
// Substring: 'word1'
// Substring: 'word2'
// Substring: 'word3'
// Substring: 'word4'
NSString *myString = @"Foo Bar Blah B..";
NSArray *myWords = [myString componentsSeparatedByCharactersInSet:
                    [NSCharacterSet characterSetWithCharactersInString:@" "]
                   ];
NSString* string = [myWords componentsJoinedByString: @" "];
NSLog(@"%@",string);
Since you eliminate the original punctuation, there's no way to turn it back automatically.
The only way is not to use componentsSeparatedByCharactersInSet.
An alternative solution may be to iterate through the string and, for each char, check if it belongs to your character set.
If yes, add the char to a list and the substring to another list (you may use NSMutableArray class).
This way, for example, you know that the punctuation char between the first and the second substring is the first character in your list of separators.
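A variation on the same idea is to leave the separators where they are and only remember where the words sit. The sketch below assumes ARC and blocks; ReplaceWords and the replacements dictionary are hypothetical names used for illustration, and word boundaries are whatever NSStringEnumerationByWords reports rather than the question's character set.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: swaps every word that appears as a key in
// `replacements` for its value, leaving spacing and punctuation untouched.
static NSString *ReplaceWords(NSString *text, NSDictionary *replacements) {
    NSMutableArray *wordRanges = [NSMutableArray array];
    [text enumerateSubstringsInRange:NSMakeRange(0, text.length)
                             options:NSStringEnumerationByWords
                          usingBlock:^(NSString *word, NSRange wordRange,
                                       NSRange enclosingRange, BOOL *stop) {
        if (replacements[word] != nil) {
            [wordRanges addObject:[NSValue valueWithRange:wordRange]];
        }
    }];

    NSMutableString *result = [text mutableCopy];
    // Walk backwards so that earlier ranges stay valid after each replacement.
    for (NSValue *value in [wordRanges reverseObjectEnumerator]) {
        NSRange range = [value rangeValue];
        NSString *word = [text substringWithRange:range];
        [result replaceCharactersInRange:range withString:replacements[word]];
    }
    return result;
}
```

Because the original string is edited in place, there is nothing to rejoin afterwards: everything between the words is simply never touched.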
You can use the componentsJoinedByString: method of NSArray to rejoin the words:
NSString *orig = [words componentsJoinedByString:@" "];
How are you determining which words need to be replaced? Instead of breaking it apart in the first place, perhaps using -stringByReplacingOccurrencesOfString:withString:options:range: would be more suitable.
My guess is you may not be using the best API. If you're really worried about words, you should be using a word-based API. I'm a bit hazy on whether that would be NSDataDetector or something else. (I believe NSRegularExpression can deal with word boundaries in a smarter way.)
If you are using Mac OS X 10.7+ or iOS 4+, you can use NSRegularExpression. The pattern to match a whole word is \bword\b, where \b matches a word boundary. Look at the methods replaceMatchesInString:options:range:withTemplate: and stringByReplacingMatchesInString:options:range:withTemplate:.
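A minimal sketch of the regular-expression approach might look like this. ReplaceWholeWord is a hypothetical helper name; escaping the word and the replacement is an extra precaution that the answer above does not mention.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: replaces whole-word occurrences of `word` in `text`.
static NSString *ReplaceWholeWord(NSString *text, NSString *word, NSString *replacement) {
    // Escape the word so that characters such as '.' are matched literally.
    NSString *pattern = [NSString stringWithFormat:@"\\b%@\\b",
                         [NSRegularExpression escapedPatternForString:word]];
    NSError *error = nil;
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:pattern options:0 error:&error];
    if (regex == nil) {
        return text; // pattern failed to compile; return the input unchanged
    }
    // Escape the template so '$' and '\' in the replacement are taken literally.
    return [regex stringByReplacingMatchesInString:text
                                           options:0
                                             range:NSMakeRange(0, text.length)
                                      withTemplate:[NSRegularExpression escapedTemplateForString:replacement]];
}
```

Replacing "cat" with "dog" in @"cat catalog cat." this way should leave "catalog" alone, since there is no word boundary after its first three letters.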
Under 10.6 or earlier, if you wish to use regular expressions, you can wrap the C-based regcomp/regexec functions; they support word boundaries as well. However, you may prefer to use one of the other Cocoa options mentioned in other answers for this simple case.

Random uppercase - lowercase

I'd like to have a string change its letters to lowercase or uppercase randomly (in Xcode).
For example: "example" to "ExaMpLe" or "eXAMPle" or "ExAmPlE" or something similar, chosen randomly.
How can I solve this?
Thanks
You could either use the -uppercaseString and -lowercaseString methods on substrings, or use the toupper() and tolower() functions on characters. There's no way to simply filter a string; you'll want to use either an NSMutableString or a C array of characters.
See this question for how to get a random boolean value, which you can use to decide whether a character should be uppercase or lowercase.
NSString has both a lowercaseString and uppercaseString method. You can iterate over the characters in a string as a sequence of substrings, using some random source to call the appropriate lower/upper case on each of them, collecting the result. Something like...
NSMutableString *result = [NSMutableString string];
for (NSUInteger i = 0; i < [myString length]; i++)
{
    NSString *substring = [myString substringWithRange:NSMakeRange(i, 1)];
    [result appendString:(rand() % 2) ? [substring lowercaseString]
                                      : [substring uppercaseString]];
}
You may prefer a better source of entropy than rand, but it'll do for an example (don't forget to seed it if you use this code as is). If the strings are large, you can do it in-place on an NSMutableString.
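As one possible variation, here is a sketch that swaps rand for arc4random_uniform, which needs no seeding, and walks composed character sequences instead of single unichars. RandomizeCase is a hypothetical helper name, and arc4random_uniform is assumed to be available, as it is on Apple platforms.

```objective-c
#import <Foundation/Foundation.h>
#include <stdlib.h>

// Hypothetical helper: returns a copy of `string` in which each character is
// independently upper- or lowercased with equal probability.
static NSString *RandomizeCase(NSString *string) {
    NSMutableString *result = [NSMutableString stringWithCapacity:string.length];
    [string enumerateSubstringsInRange:NSMakeRange(0, string.length)
                               options:NSStringEnumerationByComposedCharacterSequences
                            usingBlock:^(NSString *character, NSRange characterRange,
                                         NSRange enclosingRange, BOOL *stop) {
        BOOL upper = (arc4random_uniform(2) == 0); // 0 or 1, no seeding needed
        [result appendString:upper ? [character uppercaseString]
                                   : [character lowercaseString]];
    }];
    return result;
}
```

Note that case mapping can change a string's length for some characters, so the result is not guaranteed to have the same length as the input for arbitrary text.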
You could break the word into an array of letters and loop over it, using a random number to determine the case of each letter. After looping over the array, simply stick the letters back together using NSMutableString.
NSString has uppercaseString and lowercaseString methods you can use.

How to get a single NSString character from an NSString

I want to get a character from somewhere inside an NSString. I want the result to be an NSString.
This is the code I use to get a single character at index i:
[[s substringFromIndex:i] substringToIndex:1]
Is there a better way to do it?
This will also retrieve a character at index i as an NSString, and you're only using an NSRange struct rather than an extra NSString.
NSString * newString = [s substringWithRange:NSMakeRange(i, 1)];
If you just want to get one character from an NSString, you can try this.
- (unichar)characterAtIndex:(NSUInteger)index;
Used like so:
NSString *originalString = @"hello";
int index = 2;
NSString *theCharacter = [NSString stringWithFormat:@"%c", [originalString characterAtIndex:index-1]];
//returns "e".
Your suggestion only works for simple characters like ASCII. NSStrings store Unicode, and if your character is several unichars long then you could end up with gibberish. Use
- (NSRange)rangeOfComposedCharacterSequenceAtIndex:(NSUInteger)index;
if you want to determine how many unichars your character is. I use this to step through my strings to determine where the character borders occur.
Being fully Unicode-aware is a bit of work, but how much depends on which languages you handle. I see a lot of Asian text, so many of my characters take up more than one unichar, and it's work that I need to do.
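Put together with substringWithRange:, that method gives a character-as-NSString lookup that does not cut a sequence in half. This is just a sketch, and ComposedCharacterAtIndex is a hypothetical helper name.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the whole composed character sequence that
// contains the unichar at `index`. Like characterAtIndex:, the underlying
// call raises an exception if `index` is out of bounds.
static NSString *ComposedCharacterAtIndex(NSString *string, NSUInteger index) {
    NSRange range = [string rangeOfComposedCharacterSequenceAtIndex:index];
    return [string substringWithRange:range];
}
```

To step through a string, advance the index by the length of each returned range (NSMaxRange of the range gives the next starting index) rather than by one.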
NSMutableString *myString = [NSMutableString stringWithFormat:@"Malayalam"];
NSMutableString *revString = [NSMutableString string];
for (int i = 0; i < myString.length; i++) {
    // Prepend each character, so revString ends up as myString reversed.
    revString = [NSMutableString stringWithFormat:@"%C%@", [myString characterAtIndex:i], revString];
}
NSLog(@"%@", revString);

Best way to split a string into tokens skipping escaped delimiters?

I'm receiving an NSString which uses commas as delimiters, and a backslash as an escape character. I was looking into splitting the string using componentsSeparatedByString, but I found no way to specify the escape character. Is there a built-in way to do this? NSScanner? CFStringTokenizer?
If not, would it be better to split the string at the commas, and then rejoin tokens that were falsely split (after inspecting them for a (non-escaped) escape character at the end) or looping through each character trying to find a comma, and then looking back one character to see if the comma is escaped or not (and then one more character to see if the escape character is escaped).
Now that I think about it, I would need to check that the amount of escape characters before a delimiter is even, because only then is the delimiter itself not being escaped.
If someone has a method that does this, I'd appreciate it if I could take a look at it.
I think the most straightforward method to do this would be to go through the string character by character as you suggest, appending into new string objects. You can follow two simple rules:
if you find a backslash, skip it but copy the next character (if one exists) unconditionally
if you find a comma, that ends the current section
You could do this manually or use some of the functionality of NSScanner to help you (scanUpToCharactersFromSet:intoString:)
I would prefer to use a regular expression based parser to weed out the escape characters and then possibly doing a split operation (of some type) on the string.
Okay, (I hope) this is what wipolar suggested. It's the first implementation that works. I've only just started with a language that isn't garbage-collected, so please post a comment if you think this code can be improved, especially in the memory-management department.
- (NSArray *) splitUnescapedCharsFrom: (NSString *) str atChar: (char) delim withEscape: (char) esc
{
    // Autoreleased objects: nothing to release by hand, so nothing can leak.
    NSMutableArray * result = [NSMutableArray array];
    NSMutableString * currWord = [NSMutableString string];
    for (NSUInteger i = 0; i < [str length]; i++)
    {
        unichar c = [str characterAtIndex:i];
        if (c == esc)
        {
            // Copy the escaped character verbatim; a trailing escape is dropped.
            if (i + 1 < [str length])
            {
                [currWord appendFormat:@"%C", [str characterAtIndex:++i]];
            }
        }
        else if (c == delim)
        {
            // Unescaped delimiter: finish the current token and start a new one.
            [result addObject:[NSString stringWithString:currWord]];
            [currWord setString:@""];
        }
        else
        {
            [currWord appendFormat:@"%C", c];
        }
    }
    [result addObject:[NSString stringWithString:currWord]];
    return [NSArray arrayWithArray:result];
}