How do I split a string with special characters into a NSMutableArray - objective-c

I'm trying to separate a string with Danish characters into an NSMutableArray, but something is not working. :(
My code:
NSString *danishString = @"æøå";
NSMutableArray *characters = [[NSMutableArray alloc] initWithCapacity:[danishString length]];
for (int i=0; i < [danishString length]; i++)
{
NSString *ichar = [NSString stringWithFormat:@"%c", [danishString characterAtIndex:i ]];
[characters addObject:ichar];
}
If I NSLog the danishString, it works (it prints æøå),
but if I NSLog the characters (the array), I get some very strange characters. What is wrong?
/Morten

First of all, your code is incorrect. characterAtIndex: returns a unichar, so you should use @"%C" (uppercase) as the format specifier.
Even with the correct format specifier, your code is unsafe and, strictly speaking, still incorrect, because not all Unicode characters can be represented by a single unichar. You should always handle Unicode strings per substring:
It's common to think of a string as a sequence of characters, but when
working with NSString objects, or with Unicode strings in general, in
most cases it is better to deal with substrings rather than with
individual characters. The reason for this is that what the user
perceives as a character in text may in many cases be represented by
multiple characters in the string.
You should definitely read the String Programming Guide.
Finally, the correct code for you:
NSString *danishString = @"æøå";
NSMutableArray *characters = [[NSMutableArray alloc] initWithCapacity:[danishString length]];
[danishString enumerateSubstringsInRange:NSMakeRange(0, danishString.length) options:NSStringEnumerationByComposedCharacterSequences usingBlock:^(NSString *substring, NSRange substringRange, NSRange enclosingRange, BOOL *stop) {
[characters addObject:substring];
}];
If NSLog(@"%@", characters); shows "strange characters" of the form "\Uxxxx", that's correct: it's NSArray's default stringification via its description method. If you want to see the "normal" characters, print them one by one:
for (NSString *c in characters) {
NSLog(@"%@", c);
}
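To see why per-unichar indexing is fragile, here is a minimal sketch that wraps the enumeration shown above in a helper (`ComposedCharacters` is a hypothetical name, not a Foundation API) so you can compare `length` with the number of user-perceived characters. The test strings are assumptions chosen for illustration: a decomposed "é" (U+0065 followed by the combining acute accent U+0301) and an emoji outside the Basic Multilingual Plane, which NSString stores as a UTF-16 surrogate pair.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: splits a string into user-perceived characters
// (composed character sequences) using enumerateSubstringsInRange:.
NSArray<NSString *> *ComposedCharacters(NSString *string) {
    NSMutableArray<NSString *> *result = [NSMutableArray array];
    [string enumerateSubstringsInRange:NSMakeRange(0, string.length)
                               options:NSStringEnumerationByComposedCharacterSequences
                            usingBlock:^(NSString *substring, NSRange substringRange,
                                         NSRange enclosingRange, BOOL *stop) {
        // Each substring is one whole character, however many unichars it uses
        [result addObject:substring];
    }];
    return result;
}
```

For example, `@"e\u0301".length` is 2 while `ComposedCharacters(@"e\u0301").count` is 1, and the same 2-versus-1 split holds for `@"\U0001F600"`. A loop that steps one unichar at a time would cut both of these in half.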

In your example, characterAtIndex: returns a unichar, not an NSString. If you want NSStrings, get a substring instead:
NSString *danishString = @"æøå";
NSMutableArray *characters = [[NSMutableArray alloc] initWithCapacity:[danishString length]];
for (int i=0; i < [danishString length]; i++)
{
NSRange r = NSMakeRange(i, 1);
NSString *ichar = [danishString substringWithRange:r];
[characters addObject:ichar];
}

You could do something like the following, which should be fine with Danish characters, but would break down if you have decomposed characters. I suggest reading the String Programming Guide for more information.
NSString *danishString = @"æøå";
NSMutableArray* characters = [NSMutableArray array];
for( int i = 0; i < [danishString length]; i++ ) {
NSString* subchar = [danishString substringWithRange:NSMakeRange(i, 1)];
if( subchar ) [characters addObject:subchar];
}
That would split the string into an array of individual characters, assuming that every character is stored as a single precomposed unichar.
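If you prefer an index-based loop like the ones above, one hedged alternative is to advance by `rangeOfComposedCharacterSequenceAtIndex:` instead of by 1, so that decomposed characters and surrogate pairs are never cut in half. This is a sketch; `SplitByComposedSequences` is a hypothetical name.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: index-based split that never cuts a composed
// character sequence (base + combining marks, or a surrogate pair) in half.
NSArray<NSString *> *SplitByComposedSequences(NSString *string) {
    NSMutableArray<NSString *> *characters = [NSMutableArray array];
    NSUInteger i = 0;
    while (i < string.length) {
        // Range of the whole user-perceived character that contains index i
        NSRange r = [string rangeOfComposedCharacterSequenceAtIndex:i];
        [characters addObject:[string substringWithRange:r]];
        // Jump past the entire sequence rather than a single unichar
        i = NSMaxRange(r);
    }
    return characters;
}
```

A decomposed "å" (`@"a\u030A"`, "a" plus the combining ring above) comes back as a single two-unichar element, which a `NSMakeRange(i, 1)` loop would split into two.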

It is printing the Unicode escape sequences of the characters. Either way, you can use these Unicode escapes (with \u) anywhere.

Related

Uppercase random characters in a NSString

I'm trying to figure out the best approach to a problem. I have an essentially random alphanumeric string that I'm generating on the fly:
NSString *string = @"e04325ca24cf20ac6bd6ebf73c376b20ac57192dad83b22602264e92dac076611b51142ae12d2d92022eb2c77f";
You can see that there are no special characters, just numbers and letters, and all the letters are lowercase. Changing all the letters in this string to uppercase is easy:
[string uppercaseString];
The hard part is that I want to capitalize random characters in this string, not all of them. For example, this could be the output on one execution:
E04325cA24CF20ac6bD6eBF73C376b20Ac57192DAD83b22602264e92daC076611b51142AE12D2D92022Eb2C77F
This could be the output on another, since it's random:
e04325ca24cf20aC6bd6eBF73C376B20Ac57192DAd83b22602264E92dAC076611B51142AE12D2d92022EB2c77f
In case it makes this easier, let's say I have two variables as well:
int charsToUppercase = 12;//hardcoded value for how many characters to uppercase here
int totalChars = 90;//total string length
In this instance it would mean that 12 random characters out of the 90 in this string would be uppercased. What I've figured out so far is that I can loop through each char in the string relatively easily:
NSUInteger len = [string length];
unichar buffer[len+1];
[string getCharacters:buffer range:NSMakeRange(0, len)];
NSLog(@"loop through each char");
for(int i = 0; i < len; i++) {
NSLog(@"%C", buffer[i]);
}
Still stuck with selecting random chars in this loop to uppercase, so not all are uppercased. I'm guessing a condition in the for loop could do the trick well, given that it's random enough.
Here's one way, not particularly concerned with efficiency, but not silly efficiency-wise either: create an array of the characters in the original string, building an index of which ones are letters along the way...
NSString *string = @"e04325ca24cf20ac6bd6ebf73c376b20ac57192dad83b22602264e92dac076611b51142ae12d2d92022eb2c77f";
NSMutableArray *chars = [@[] mutableCopy];
NSMutableArray *letterIndexes = [@[] mutableCopy];
for (int i=0; i<string.length; i++) {
unichar ch = [string characterAtIndex:i];
// add each char as a string to a chars collection
[chars addObject:[NSString stringWithFormat:@"%C", ch]];
// record the index of letters
if ([[NSCharacterSet letterCharacterSet] characterIsMember:ch]) {
[letterIndexes addObject:@(i)];
}
}
Now, select randomly from the letterIndexes (removing them as we go) to determine which letters shall be upper case. Convert the member of the chars array at that index to uppercase...
int charsToUppercase = 12;
for (int i=0; i<charsToUppercase && letterIndexes.count; i++) {
NSInteger randomLetterIndex = arc4random_uniform((u_int32_t)(letterIndexes.count));
NSInteger indexToUpdate = [letterIndexes[randomLetterIndex] intValue];
[letterIndexes removeObjectAtIndex:randomLetterIndex];
[chars replaceObjectAtIndex:indexToUpdate withObject:[chars[indexToUpdate] uppercaseString]];
}
Notice the && check on letterIndexes.count. This guards against the case where charsToUppercase exceeds the number of letters. The upper bound of conversions to uppercase is all of the letters in the original string.
Now all that's left is to join the chars array into a string...
NSString *result = [chars componentsJoinedByString:@""];
NSLog(@"%@", result);
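The steps above can be folded into one reusable function. This is a minimal sketch of the same technique (collect letter positions, then draw them randomly without replacement); the name `UppercaseRandomLetters` and its signature are hypothetical.

```objc
#import <Foundation/Foundation.h>
#include <stdlib.h> // arc4random_uniform

// Hypothetical helper wrapping the steps above: uppercase exactly
// min(count, number of letters) randomly chosen letters in `string`.
NSString *UppercaseRandomLetters(NSString *string, NSUInteger count) {
    NSMutableArray<NSString *> *chars = [NSMutableArray array];
    NSMutableArray<NSNumber *> *letterIndexes = [NSMutableArray array];
    NSCharacterSet *letters = [NSCharacterSet letterCharacterSet];
    for (NSUInteger i = 0; i < string.length; i++) {
        unichar ch = [string characterAtIndex:i];
        [chars addObject:[NSString stringWithFormat:@"%C", ch]];
        // Remember where the letters are; digits are never candidates
        if ([letters characterIsMember:ch]) {
            [letterIndexes addObject:@(i)];
        }
    }
    // Draw letter positions without replacement so no letter is picked twice
    for (NSUInteger n = 0; n < count && letterIndexes.count > 0; n++) {
        uint32_t pick = arc4random_uniform((uint32_t)letterIndexes.count);
        NSUInteger index = letterIndexes[pick].unsignedIntegerValue;
        [letterIndexes removeObjectAtIndex:pick];
        chars[index] = [chars[index] uppercaseString];
    }
    return [chars componentsJoinedByString:@""];
}
```

Usage would be `UppercaseRandomLetters(string, 12)` for the example in the question. Because positions are removed once chosen, the result always has exactly `count` uppercase letters, or all of them if `count` is larger than the number of letters.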
EDIT: Following the discussion in the OP's comments, you could, instead of a charsToUppercase input parameter, be given a probability of uppercase change as an input. That would compress this idea into a single loop with a little less data transformation...
NSString *string = @"e04325ca24cf20ac6bd6ebf73c376b20ac57192dad83b22602264e92dac076611b51142ae12d2d92022eb2c77f";
float upperCaseProbability = 0.5;
NSMutableString *result = [@"" mutableCopy];
for (int i=0; i<string.length; i++) {
NSString *chString = [string substringWithRange:NSMakeRange(i, 1)];
BOOL toUppercase = arc4random_uniform(1000) / 1000.0 < upperCaseProbability;
if (toUppercase) {
chString = [chString uppercaseString];
}
[result appendString:chString];
}
NSLog(@"%@", result);
However this assumes a given uppercase probability for any character, not any letter, so it won't result in a predetermined number of letters changing case.

Take all numbers separated by spaces from a string and place in an array

I have a NSString formatted like this:
"Hello world 12 looking for some 56"
I want to find all instances of numbers separated by whitespace and place them in an NSArray. I don't want to remove the numbers, though.
What's the best way of achieving this?
This is a solution using regular expression as suggested in the comment.
NSString *string = @"Hello world 12 looking for some 56";
NSRegularExpression *expression = [NSRegularExpression regularExpressionWithPattern:@"\\b\\d+" options:0 error:nil];
NSArray *matches = [expression matchesInString:string options:0 range:(NSMakeRange(0, string.length))];
NSMutableArray *result = [[NSMutableArray alloc] init];
for (NSTextCheckingResult *match in matches) {
[result addObject:[string substringWithRange:match.range]];
}
NSLog(@"%@", result);
First split the string into an array using NSString's componentsSeparatedByString: method (see this SO question). Then iterate over the array and check whether each element is a number (see this SO question: Checking if NSString is Integer).
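The split-then-check approach described above could look roughly like this sketch. `WhitespaceSeparatedNumbers` is a hypothetical name, and using NSScanner's `scanInteger:` together with `isAtEnd` as the "is this an integer?" test is one common choice, not necessarily the one in the linked question.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: split on whitespace, then keep only the tokens
// that parse completely as an integer.
NSArray<NSString *> *WhitespaceSeparatedNumbers(NSString *string) {
    NSArray<NSString *> *tokens =
        [string componentsSeparatedByCharactersInSet:[NSCharacterSet whitespaceCharacterSet]];
    NSMutableArray<NSString *> *numbers = [NSMutableArray array];
    for (NSString *token in tokens) {
        if (token.length == 0) continue; // consecutive spaces produce empty tokens
        NSScanner *scanner = [NSScanner scannerWithString:token];
        NSInteger value;
        // The token counts as a number only if the scan consumes all of it
        if ([scanner scanInteger:&value] && [scanner isAtEnd]) {
            [numbers addObject:token];
        }
    }
    return numbers;
}
```

Because `scanInteger:` accepts a leading sign, tokens such as "-5" are kept, while "12abc" or "56." are rejected since the scanner does not reach the end of the token.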
Depending on where you perform this action, it may not be fast enough for large strings (for example, if it runs while a table cell is being configured, scrolling may become choppy).
Code:
+ (NSArray *)getNumbersFromString:(NSString *)str {
NSMutableArray *retVal = [NSMutableArray array];
NSCharacterSet *numericSet = [NSCharacterSet decimalDigitCharacterSet];
NSMutableString *placeholder = [NSMutableString string];
unichar currentChar;
// walk forward so the digits of each number keep their order
for (NSUInteger i = 0; i < [str length]; i++) {
currentChar = [str characterAtIndex:i];
if ([numericSet characterIsMember:currentChar]) {
// a digit: append it to the number being built
[placeholder appendFormat:@"%C", currentChar];
} else if ([placeholder length] > 0) {
// a non-digit ends the current number: store it as an NSNumber
[retVal addObject:@([placeholder integerValue])];
[placeholder setString:@""];
}
}
// don't lose a number at the very end of the string
if ([placeholder length] > 0) [retVal addObject:@([placeholder integerValue])];
return [retVal copy];
}
To explain what is happening above, essentially I am:
- going through every character until I find a digit
- adding that digit, and any digits that follow it, to a string
- once a non-digit (or the end of the string) ends the number, adding it to an array as an NSNumber
Hope this helps; please ask for clarification if needed.

Get a substring from an NSString until arriving to any letter in an NSArray - objective C

I am trying to parse a set of words that contain first Greek letters, then English letters. This would be easy if there were a delimiter between the two sets. This is what I've built so far:
- (void)loadWordFileToArray:(NSBundle *)bundle {
NSLog(@"loadWordFileToArray");
if (bundle != nil) {
NSString *path = [bundle pathForResource:@"alfa" ofType:@"txt"];
//pull the content from the file into memory
NSData* data = [NSData dataWithContentsOfFile:path];
//convert the bytes from the file into a string
NSString* string = [[NSString alloc] initWithBytes:[data bytes]
length:[data length]
encoding:NSUTF8StringEncoding];
//split the string around newline characters to create an array
NSString* delimiter = @"\n";
incomingWords = [string componentsSeparatedByString:delimiter];
NSLog(@"incomingWords count: %lu", (unsigned long)incomingWords.count);
}
}
-(void)parseWordArray{
NSLog(@"parseWordArray");
NSString *seperator = @" = ";
int i = 0;
for (i=0; i < incomingWords.count; i++) {
NSString *incomingString = [incomingWords objectAtIndex:i];
NSScanner *scanner = [NSScanner localizedScannerWithString: incomingString];
NSString *firstString;
NSString *secondString;
NSInteger scanPosition;
[scanner scanUpToString:seperator intoString:&firstString];
scanPosition = [scanner scanLocation];
secondString = [[scanner string] substringFromIndex:scanPosition+[seperator length]];
// NSLog(@"greek: %@", firstString);
// NSLog(@"english: %@", secondString);
[outgoingWords insertObject:[NSMutableArray arrayWithObjects:@"greek", firstString, @"english",secondString,@"category", @"", nil] atIndex:0];
[englishWords insertObject:[NSMutableArray arrayWithObjects:secondString,nil] atIndex:0];
}
}
But I cannot count on there being delimiters.
I have looked at this question. I want something similar: grab the characters in the string until an English letter is found, then put that first group into one new string and all the characters after it into a second new string.
I only have to run this a few times, so optimization is not my highest priority. Any help would be appreciated.
EDIT:
I've changed my code as shown below to make use of NSLinguisticTagger. This works, but is this the best way? Note that, for some reason, the language tag returned for the English characters is "und" (undetermined).
The incoming string is: άγαλμα, το statue, only the last 6 characters are in english.
int j = 0;
for (j=0; j<incomingString.length; j++) {
NSString *language = [tagger tagAtIndex:j scheme:NSLinguisticTagSchemeLanguage tokenRange:NULL sentenceRange:NULL];
if ([language isEqual: #"und"]) {
NSLog(@"j is: %i", j);
int k = 0;
for (k=0; k<j; k++) {
NSRange range = NSMakeRange (0, k);
NSString *tempString = [incomingString substringWithRange:range ];
NSLog (@"tempString: %@", tempString);
}
return;
}
NSLog (@"Language: %@", language);
}
Alright, so what you could do is use NSLinguisticTagger to find out the language of each word (or letter); if the language has changed, you know where to split the string. You can use NSLinguisticTagger like this:
NSArray *tagschemes = @[NSLinguisticTagSchemeLanguage];
NSLinguisticTagger *tagger = [[NSLinguisticTagger alloc] initWithTagSchemes:tagschemes options: NSLinguisticTaggerOmitPunctuation | NSLinguisticTaggerOmitWhitespace];
[tagger setString:@"This is my string in English."];
NSString *language = [tagger tagAtIndex:0 scheme:NSLinguisticTagSchemeLanguage tokenRange:NULL sentenceRange:NULL];
//Loop through each index of the string's characters and check the language as above.
//If it has changed then you can assume the language has changed.
Alternatively, you can use NSSpellChecker's requestCheckingOfString:range:types:options:inSpellDocumentWithTag:completionHandler: (macOS only) to get the dominant language in a range of characters:
NSSpellChecker *spellChecker = [NSSpellChecker sharedSpellChecker];
[spellChecker setAutomaticallyIdentifiesLanguages:YES];
NSString *spellCheckText = @"Guten Herr Mustermann. Dies ist ein deutscher Text. Bitte löschen Sie diesen nicht.";
[spellChecker requestCheckingOfString:spellCheckText
range:(NSRange){0, [spellCheckText length]}
types:NSTextCheckingTypeOrthography
options:nil
inSpellDocumentWithTag:0
completionHandler:^(NSInteger sequenceNumber, NSArray *results, NSOrthography *orthography, NSInteger wordCount) {
NSLog(@"dominant language = %@", orthography.dominantLanguage);
}];
This answer has information on how to detect the language of an NSString.
Allow me to introduce two good friends of mine.
NSCharacterSet and NSRegularExpression.
Along with them, normalization (in the Unicode sense).
First, you should normalize strings before analyzing them against a character set.
You will need to look at the choices, but normalizing to all composed forms is the way I would go.
This means an accented character is one instead of two or more.
It simplifies the number of things to compare.
Next, you can easily build your own NSCharacterSet objects from strings (loaded from files even) to use to test set membership.
Lastly, regular expressions can achieve the same thing, using Unicode property names as classes or categories of characters. Regular expressions can be terser, and also more expressive.
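Putting the normalization and NSCharacterSet advice together, a minimal sketch might look like the following. It assumes (hypothetically) that each line is "Greek part, then English part" and that the English part starts with an ASCII Latin letter that never appears in the Greek part; `SplitAtFirstLatinLetter` is a hypothetical name.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: splits "Greek part + English part" at the first
// ASCII Latin letter. Returns nil when the string has no Latin letter.
NSArray<NSString *> *SplitAtFirstLatinLetter(NSString *line) {
    // Normalize to precomposed (NFC) form first, as suggested above
    NSString *normalized = [line precomposedStringWithCanonicalMapping];
    // Assumption: the English part begins with a plain ASCII letter
    NSCharacterSet *latin = [NSCharacterSet characterSetWithCharactersInString:
        @"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"];
    NSRange first = [normalized rangeOfCharacterFromSet:latin];
    if (first.location == NSNotFound) {
        return nil;
    }
    NSCharacterSet *whitespace = [NSCharacterSet whitespaceCharacterSet];
    NSString *greek = [[normalized substringToIndex:first.location]
                          stringByTrimmingCharactersInSet:whitespace];
    NSString *english = [[normalized substringFromIndex:first.location]
                            stringByTrimmingCharactersInSet:whitespace];
    return @[greek, english];
}
```

For the line in the question, this yields "άγαλμα, το" and "statue". Swapping the hand-built set for a loaded-from-file set, or for a regular expression over a Unicode script property, only changes how `latin` is defined.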

Extracting sentences containing keywords objective c

I have a block of text (a newspaper article, if it's of any relevance) and was wondering if there is a way to extract all sentences containing a particular keyword in Objective-C. I've been looking a bit at ParseKit but am not having much luck!
You can enumerate sentences using native NSString methods like this...
NSString *string = @"your text";
NSMutableArray *sentences = [NSMutableArray array];
[string enumerateSubstringsInRange:NSMakeRange(0, string.length)
options:NSStringEnumerationBySentences
usingBlock:^(NSString *substring, NSRange substringRange, NSRange enclosingRange, BOOL *stop) {
//check that this sentence has the string you are looking for
NSRange range = [substring rangeOfString:@"The text you are looking for"];
if (range.location != NSNotFound) {
[sentences addObject:substring];
}
}];
for (NSString *sentence in sentences) {
NSLog(@"%@", sentence);
}
At the end you will have an array of sentences all containing the text you were looking for.
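A small variation on the sentence enumeration above, sketched as a reusable function: the keyword match ignores case, so a sentence that starts with "Happy" is also found. `SentencesContaining` is a hypothetical name. Note that it still matches substrings, so "happy" also matches "unhappy"; whole-word matching would need a word-level check or a regular expression.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: same sentence enumeration as above, but the keyword
// match ignores case.
NSArray<NSString *> *SentencesContaining(NSString *text, NSString *keyword) {
    NSMutableArray<NSString *> *matches = [NSMutableArray array];
    [text enumerateSubstringsInRange:NSMakeRange(0, text.length)
                             options:NSStringEnumerationBySentences
                          usingBlock:^(NSString *sentence, NSRange sentenceRange,
                                       NSRange enclosingRange, BOOL *stop) {
        // Case-insensitive substring search within this sentence only
        NSRange hit = [sentence rangeOfString:keyword options:NSCaseInsensitiveSearch];
        if (hit.location != NSNotFound) {
            [matches addObject:sentence];
        }
    }];
    return matches;
}
```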
Here's another way of achieving what you want:
NSString *aString = @"your text"; // the text to search
NSString *wordYouAreLookingFor = @"happy";
NSArray *arrayOfSentences = [aString componentsSeparatedByString:@". "]; // get the single sentences
NSMutableArray *sentencesWithMatchingWord = [[NSMutableArray alloc] init];
for (NSString *singleSentence in arrayOfSentences) {
// keep the sentence if it contains the word
if ([singleSentence rangeOfString:wordYouAreLookingFor].location != NSNotFound) {
[sentencesWithMatchingWord addObject:singleSentence];
}
}
Edit: As noted in the comments, there are some inherent weaknesses in my solution: it requires perfectly formatted text where period + space is only used to end sentences... I'll leave it here, as it could be viable for people splitting a text with another (known) separator.

string tokenizer in ios

I'd like to tokenize a string into characters and store the tokens in a string array. I am trying to use the following code, which is not working because I am using C notation to access the array. What needs to be changed in place of travel path[i]?
NSArray *tokanizedTravelPath= [[NSArray alloc]init];
for (int i=0; [travelPath length]; i++) {
tokanizedTravelPath[i]= [travelPath characterAtIndex:i];
You can't store unichars in an NSArray*. What exactly are you trying to accomplish? An NSString* is already a great representation for a collection of unichars, and you already have one of those.
You need an NSMutableArray to set every element of the array (otherwise you can't change its objects). Also, you can only insert objects into the array, so you can:
- Insert an NSString containing the character;
- Use a C-style array instead.
This is how to do with the NSMutableArray:
NSMutableArray *tokanizedTravelPath= [[NSMutableArray alloc]init];
for (int i=0; i<[travelPath length]; i++)
{
[tokanizedTravelPath insertObject: [NSString stringWithFormat: @"%C", [travelPath characterAtIndex:i]] atIndex: i];
}
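The second option, a C-style array, is only mentioned above, so here is a minimal sketch of it. It assumes you really want raw UTF-16 code units; `CopyUnichars` is a hypothetical name, and the caller owns (and must free) the returned buffer.

```objc
#import <Foundation/Foundation.h>
#include <stdlib.h>

// Hypothetical helper for the "C-style array" option: copies the UTF-16
// code units of `string` into a malloc'd buffer. The caller must free() it.
unichar *CopyUnichars(NSString *string, NSUInteger *outLength) {
    NSUInteger length = string.length;
    // malloc(0) may return NULL, so always allocate at least one unit
    unichar *buffer = malloc(MAX(length, (NSUInteger)1) * sizeof(unichar));
    if (buffer == NULL) {
        return NULL;
    }
    [string getCharacters:buffer range:NSMakeRange(0, length)];
    *outLength = length;
    return buffer;
}
```

After the call, `buffer[i]` works with plain C indexing for `i < length`. Each element is one unichar, so a character outside the Basic Multilingual Plane occupies two slots.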
I count 3 errors in your code; I explain them at the end of my answer.
First, I want to show you a better approach to split a string into its characters.
While I agree with Kevin that an NSString is already a great representation of Unicode characters, you can use this block-based code to split it into substrings and save them to an array.
From the docs:
enumerateSubstringsInRange:options:usingBlock:
Enumerates the substrings of the specified type in the specified range of the string.
NSString *hwlloWord = @"Hello World";
NSMutableArray *charArray = [NSMutableArray array];
[hwlloWord enumerateSubstringsInRange:NSMakeRange(0, [hwlloWord length])
options:NSStringEnumerationByComposedCharacterSequences
usingBlock:^(NSString *substring,
NSRange substringRange,
NSRange enclosingRange,
BOOL *stop)
{
[charArray addObject:substring];
}];
NSLog(@"%@", charArray);
Output:
(
H,
e,
l,
l,
o,
" ",
W,
o,
r,
l,
d
)
But actually your problems are of another nature:
An NSArray is immutable: once instantiated, it cannot be altered. For a mutable array, use the NSArray subclass NSMutableArray.
Also, characterAtIndex: does not return an object but a primitive type (unichar), and primitives can't be stored in an NSArray. You have to wrap it in an NSString or some other object.
You could use substringWithRange instead.
NSMutableArray *tokanizedTravelPath= [NSMutableArray array];
for (int i=0; i < [hwlloWord length]; ++i) {
NSLog(@"%@",[hwlloWord substringWithRange:NSMakeRange(i, 1)]);
[tokanizedTravelPath addObject:[hwlloWord substringWithRange:NSMakeRange(i, 1)]];
}
Also, your for-loop condition is wrong. It must be for (int i=0; i < [travelPath length]; i++)