NSString Length - Special Characters - objective-c

I have a UITextField that users will be entering characters into. Put simply, how can I get its actual length? When the string contains only A-Z and 1-9 characters it works as expected, but any emoji or special characters get double counted.
In its simplest form, the following reports 2 characters for some special characters such as emoji:
NSLog(@"Field '%@' contains %lu chars", myTextBox.text, (unsigned long)[myTextBox.text length]);
I have tried looping through each character using characterAtIndex, substringFromIndex, etc. and got nowhere.
As per answer below, exact code used to count characters (hope this is the right approach but it works..):
NSString *sString = txtBox.text;
__block int length = 0;
[sString enumerateSubstringsInRange:NSMakeRange(0, [sString length])
    options:NSStringEnumerationByComposedCharacterSequences
    usingBlock:^(NSString *substring, NSRange substringRange, NSRange enclosingRange, BOOL *stop) {
        length++;
    }];
NSLog(@"Total: %d", length);

[myTextBox.text length] returns the number of unichars (UTF-16 code units), not the visible length of the string. For example, é can be stored as e + ´ (a combining accent), which is 2 unichars. Emoji characters usually take more than 1 unichar as well.
The sample below enumerates each composed character sequence in the string. If you log substringRange, its length can therefore be greater than 1.
__block NSInteger length = 0;
[string enumerateSubstringsInRange:range
    options:NSStringEnumerationByComposedCharacterSequences
    usingBlock:^(NSString *substring, NSRange substringRange, NSRange enclosingRange, BOOL *stop) {
        length++;
    }];
You should go and watch Session 128 - Advanced Text Processing from WWDC 2011. They explain why it works like that. It's really great!
I hope this helps.
Cheers!
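To make the difference between the two counts concrete, here is a minimal sketch that wraps the enumeration above in a function. The name VisibleLength is hypothetical (it is not part of Foundation), and the sketch assumes ARC.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper (not a Foundation API): counts composed character
// sequences, i.e. what the user perceives as characters, instead of the
// UTF-16 code units reported by -length.
static NSUInteger VisibleLength(NSString *string) {
    __block NSUInteger count = 0;
    [string enumerateSubstringsInRange:NSMakeRange(0, string.length)
                               options:NSStringEnumerationByComposedCharacterSequences |
                                       NSStringEnumerationSubstringNotRequired
                            usingBlock:^(NSString *substring, NSRange substringRange,
                                         NSRange enclosingRange, BOOL *stop) {
        // substring is nil here because of SubstringNotRequired; we only count.
        count++;
    }];
    return count;
}
```

For a string such as @"e\u0301" (e followed by a combining acute accent), -length reports 2 while VisibleLength reports 1.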

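We can also consider the option below:
const char *cString = [myTextBox.text UTF8String];
int textLength = (int)strlen(cString);
Note that this gives the number of UTF-8 bytes, not the number of visible characters: accented letters typically take 2 bytes and emoji 4 or more, so it only matches the visible length for plain ASCII text.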
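It is worth checking what strlen on the UTF-8 representation actually measures. The sketch below uses a hypothetical helper named UTF8ByteCount; it counts bytes, so the result grows with the encoded size of each character rather than with the number of characters shown on screen.

```objective-c
#import <Foundation/Foundation.h>
#include <string.h>

// Hypothetical helper: number of bytes in the UTF-8 encoding of the string.
// strlen stops at the first NUL byte, so strings with embedded NULs would be
// truncated; -lengthOfBytesUsingEncoding: avoids that caveat.
static NSUInteger UTF8ByteCount(NSString *string) {
    return (NSUInteger)strlen([string UTF8String]);
}
```

With this helper, a precomposed é (U+00E9) counts as 2 and the G clef U+1D11E counts as 4, which is neither the -length value nor the visible length.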

Related

How to handle 32bit unicode characters in a NSString

I have a NSString containing a unicode character bigger than U+FFFF, like the MUSICAL SYMBOL G CLEF symbol '𝄞'. I can create the NSString and display it.
NSString *s = @"A\U0001d11eB"; // "A𝄞B"
NSLog(@"String = \"%@\"", s);
The log is correct and displays the 3 characters. This tells me the NSString is well formed and there is no encoding problem.
String = "A𝄞B"
But when I try to loop through all characters using the method
- (unichar)characterAtIndex:(NSUInteger)index
everything goes wrong.
The type unichar is 16 bits so I expect to get the wrong character for the musical symbol. But the length of the string is also incorrect!
NSLog(@"Length = %d", (int)[s length]);
for (int i=0; i<[s length]; i++)
{
    NSLog(@" Character %d = %c", i, [s characterAtIndex:i]);
}
displays
Length = 4
Character 0 = A
Character 1 = 4
Character 2 = .
Character 3 = B
What methods should I use to correctly parse my NSString and get my 3 unicode characters?
Ideally the right method should return a type like wchar_t in place of unichar.
Thank you
NSString *s = @"A\U0001d11eB";
NSData *data = [s dataUsingEncoding:NSUTF32LittleEndianStringEncoding];
const wchar_t *wcs = [data bytes];
for (int i = 0; i < [data length]/4; i++) {
    NSLog(@"%#010x", wcs[i]);
}
Output:
0x00000041
0x0001d11e
0x00000042
(The code assumes that wchar_t has a size of 4 bytes and little-endian encoding.)
length and characterAtIndex: do not give the expected result because \U0001d11e
is stored internally as a UTF-16 "surrogate pair", i.e. as two unichars.
Another useful method for general Unicode strings is
[s enumerateSubstringsInRange:NSMakeRange(0, [s length])
    options:NSStringEnumerationByComposedCharacterSequences
    usingBlock:^(NSString *substring, NSRange substringRange, NSRange enclosingRange, BOOL *stop) {
        NSLog(@"%@", substring);
    }];
Output:
A
𝄞
B
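If you want the 32-bit code points without going through NSData, the surrogate pairs can also be combined by hand. The following is a minimal sketch using the standard UTF-16 decoding arithmetic; the function name CodePoints is hypothetical, and unpaired surrogates are simply passed through unchanged.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the Unicode code points of a string as
// NSNumbers, merging each UTF-16 surrogate pair into one 32-bit value.
static NSArray *CodePoints(NSString *s) {
    NSMutableArray *result = [NSMutableArray array];
    NSUInteger length = [s length];
    for (NSUInteger i = 0; i < length; i++) {
        unichar c = [s characterAtIndex:i];
        uint32_t codePoint = c;
        // High surrogates are 0xD800-0xDBFF, low surrogates 0xDC00-0xDFFF.
        if (c >= 0xD800 && c <= 0xDBFF && i + 1 < length) {
            unichar low = [s characterAtIndex:i + 1];
            if (low >= 0xDC00 && low <= 0xDFFF) {
                codePoint = 0x10000 + (((uint32_t)c - 0xD800) << 10) + ((uint32_t)low - 0xDC00);
                i++; // Skip the low surrogate that was just consumed.
            }
        }
        [result addObject:@(codePoint)];
    }
    return result;
}
```

For the string @"A\U0001d11eB" this yields three numbers: 0x41, 0x1d11e and 0x42.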

Objective-C: Randomly replace characters in string

I'd like to have a function that removes a random set of characters from a string and replaces them with '_', e.g. to create a fill-in-the-blanks type of situation. The way I have it now works, but it's not smart. Also, I don't want to replace spaces with blanks (as you can see in the while loop). Any suggestions on a more efficient way to do this?
blankItem = @"Remove Some Characters";
for(int j=0;j<totalRemove;j++)
{
    replaceLocation=arc4random() % blankItem.length;
    while ([blankItem characterAtIndex:replaceLocation] == '_' || [blankItem characterAtIndex:replaceLocation] == ' ') {
        replaceLocation=arc4random() % blankItem.length;
    }
    blankItem= [blankItem stringByReplacingCharactersInRange:NSMakeRange(replaceLocation, 1) withString:@"_"];
}
My issue is with the for and while loops in terms of efficiency. But, maybe efficiency isn't of the essence in something this small?
If the number of characters to remove/replace is small compared to the length of the
string, then your solution is good, because the probability of a "collision" in the
while-loop is small. You can improve the method by using a single mutable string instead of
allocating a new string in each step:
NSString *string = @"Remove Some Characters";
int totalRemove = 5;
NSMutableString *result = [string mutableCopy];
for (int j=0; j < totalRemove; j++) {
    int replaceLocation;
    do {
        replaceLocation = arc4random_uniform((uint32_t)[result length]);
    } while ([result characterAtIndex:replaceLocation] == '_' || [result characterAtIndex:replaceLocation] == ' ');
    [result replaceCharactersInRange:NSMakeRange(replaceLocation, 1) withString:@"_"];
}
If the number of characters to remove/replace is about the same magnitude as the
length of the string, then a different algorithm might be better.
The following code uses the ideas from Unique random numbers in an integer array in the C programming language to replace characters
at random positions with a single loop over all characters of the string.
An additional (first) pass is necessary because of your requirement that space characters
are not replaced.
NSString *string = @"Remove Some Characters";
int totalRemove = 5;
// First pass: Determine number of non-space characters:
__block int count = 0;
[string enumerateSubstringsInRange:NSMakeRange(0, [string length])
    options:NSStringEnumerationByComposedCharacterSequences
    usingBlock:^(NSString *substring, NSRange substringRange, NSRange enclosingRange, BOOL *stop) {
        if (![substring isEqualToString:@" "]) {
            count++;
        }
    }];
// Second pass: Replace characters at random positions:
__block int c = count; // Number of remaining non-space characters
__block int r = totalRemove; // Number of remaining characters to replace
NSMutableString *result = [string mutableCopy];
[result enumerateSubstringsInRange:NSMakeRange(0, [result length])
    options:NSStringEnumerationByComposedCharacterSequences
    usingBlock:^(NSString *substring, NSRange substringRange, NSRange enclosingRange, BOOL *stop) {
        if (![substring isEqualToString:@" "]) {
            // Replace this character with probability r/c:
            if (arc4random_uniform(c) < r) {
                [result replaceCharactersInRange:substringRange withString:@"_"];
                r--;
                if (r == 0) *stop = YES; // Stop enumeration, nothing more to do.
            }
            c--;
        }
    }];
Another advantage of this solution is that it handles surrogate pairs (e.g. emoji) and composed character sequences correctly, even if these are stored as two separate unichars in the string.
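The same replace-with-probability-r/c idea can be packaged as a reusable function. This is only a sketch under a few assumptions: the name BlankOut is hypothetical, ARC is enabled, existing '_' characters are skipped as in the question, and the request is clamped when it exceeds the number of replaceable characters. It records the candidate ranges first and then walks them backwards, so a replacement that shortens the string never invalidates a range that is still to be visited.

```objective-c
#import <Foundation/Foundation.h>
#include <stdlib.h>

// Hypothetical helper: replaces totalRemove randomly chosen characters
// (excluding spaces and existing underscores) with '_'.
static NSString *BlankOut(NSString *string, uint32_t totalRemove) {
    // First pass: collect the range of every replaceable composed character.
    NSMutableArray *ranges = [NSMutableArray array];
    [string enumerateSubstringsInRange:NSMakeRange(0, string.length)
                               options:NSStringEnumerationByComposedCharacterSequences
                            usingBlock:^(NSString *substring, NSRange substringRange,
                                         NSRange enclosingRange, BOOL *stop) {
        if (![substring isEqualToString:@" "] && ![substring isEqualToString:@"_"]) {
            [ranges addObject:[NSValue valueWithRange:substringRange]];
        }
    }];

    uint32_t remaining = (uint32_t)ranges.count;        // candidates not yet visited
    uint32_t toReplace = MIN(totalRemove, remaining);   // replacements still to make
    NSMutableString *result = [string mutableCopy];

    // Second pass (back to front): pick each candidate with probability
    // toReplace / remaining, which selects exactly toReplace of them.
    for (NSValue *value in [ranges reverseObjectEnumerator]) {
        if (toReplace == 0) break;
        if (arc4random_uniform(remaining) < toReplace) {
            [result replaceCharactersInRange:value.rangeValue withString:@"_"];
            toReplace--;
        }
        remaining--;
    }
    return result;
}
```

A call such as BlankOut(@"Remove Some Characters", 5) returns a string of the same length with exactly five underscores and both spaces untouched; which five positions are chosen varies from run to run.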

matching multiple words with enumerateSubstringsInRange in NSMutableAttributedString

I am trying to match the string below but unfortunately it only gives me "nope" as the result. Can anyone help? thanks in advance!
NSMutableAttributedString *text = [[NSMutableAttributedString alloc] initWithString:@"darn thing suddenly erupted without any warning."];
NSString *findMe = @"suddenly erupted";
[[text string] enumerateSubstringsInRange:NSMakeRange(0, [text length]) options:NSStringEnumerationByWords usingBlock:^(NSString *substring, NSRange substringRange, NSRange enclosingRange, BOOL *stop) {
    if ([findMe isEqualToString:substring]) {
        NSLog(@"found it");
    }
    else {
        NSLog(@"nope");
    }
}];
Your method is only enumerating separate words. "suddenly erupted" are two words.
Why don't you use -rangeOfString: to find out whether text contains some substring? For example:
NSLog(@"%@", [[text mutableString] rangeOfString:findMe].location == NSNotFound ? @"nope" : @"found it");
enumerateSubstringsInRange:options:usingBlock: has options such as
NSStringEnumerationByLines
NSStringEnumerationBySentences
NSStringEnumerationByParagraphs
NSStringEnumerationByComposedCharacterSequences
NSStringEnumerationByWords
With NSStringEnumerationByWords the block receives one word at a time, so the comparison only succeeds when the string you are looking for is itself a single word,
e.g.
NSString *text = @"darn thing suddenlyerupted without any warning.";
NSString *findMe = @"suddenlyerupted";
So you can't match a multi-word substring this way. You need to customize the block or use a different approach.
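One way to "customize" the word enumeration so that it can match a phrase is to collect the words first and then look for the phrase's words as a consecutive run. The sketch below is only an illustration; WordsIn and ContainsPhrase are hypothetical names, and the comparison is case-sensitive.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: splits text into words using NSStringEnumerationByWords.
static NSArray *WordsIn(NSString *text) {
    NSMutableArray *words = [NSMutableArray array];
    [text enumerateSubstringsInRange:NSMakeRange(0, text.length)
                             options:NSStringEnumerationByWords
                          usingBlock:^(NSString *substring, NSRange substringRange,
                                       NSRange enclosingRange, BOOL *stop) {
        [words addObject:substring];
    }];
    return words;
}

// Hypothetical helper: YES if the words of phrase occur consecutively in text.
static BOOL ContainsPhrase(NSString *text, NSString *phrase) {
    NSArray *haystack = WordsIn(text);
    NSArray *needle = WordsIn(phrase);
    if (needle.count == 0 || needle.count > haystack.count) return NO;
    // Slide a window of needle.count words over the haystack.
    for (NSUInteger i = 0; i + needle.count <= haystack.count; i++) {
        NSArray *window = [haystack subarrayWithRange:NSMakeRange(i, needle.count)];
        if ([window isEqualToArray:needle]) return YES;
    }
    return NO;
}
```

Unlike a plain rangeOfString: search, this only matches whole words, so looking for @"sudden" in the example sentence returns NO while @"suddenly erupted" returns YES.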

How do I split a string with special characters into a NSMutableArray

I'm trying to separate a string with Danish characters into an NSMutableArray. But something is not working. :(
My code:
NSString *danishString = @"æøå";
NSMutableArray *characters = [[NSMutableArray alloc] initWithCapacity:[danishString length]];
for (int i=0; i < [danishString length]; i++)
{
    NSString *ichar = [NSString stringWithFormat:@"%c", [danishString characterAtIndex:i ]];
    [characters addObject:ichar];
}
If I do an NSLog on the danishString it works (returns æøå);
But if I do an NSLog on the characters (the array) I get some very strange characters - What is wrong?
/Morten
First of all, your code is incorrect. characterAtIndex: returns a unichar, so you should use @"%C" (uppercase) as the format specifier.
Even with the correct format specifier, your code is unsafe and, strictly speaking, still incorrect, because not all Unicode characters can be represented by a single unichar. You should always handle Unicode strings per substring:
It's common to think of a string as a sequence of characters, but when
working with NSString objects, or with Unicode strings in general, in
most cases it is better to deal with substrings rather than with
individual characters. The reason for this is that what the user
perceives as a character in text may in many cases be represented by
multiple characters in the string.
You should definitely read the String Programming Guide.
Finally, the correct code for you:
NSString *danishString = @"æøå";
NSMutableArray *characters = [[NSMutableArray alloc] initWithCapacity:[danishString length]];
[danishString enumerateSubstringsInRange:NSMakeRange(0, danishString.length) options:NSStringEnumerationByComposedCharacterSequences usingBlock:^(NSString *substring, NSRange substringRange, NSRange enclosingRange, BOOL *stop) {
    [characters addObject:substring];
}];
If NSLog(@"%@", characters); shows "strange characters" of the form "\Uxxxx", that is expected: it is how NSArray's description method prints non-ASCII strings by default. You can print the characters one by one if you want to see the "normal" characters:
for (NSString *c in characters) {
    NSLog(@"%@", c);
}
In your example, characterAtIndex: returns a unichar, not an NSString. If you want NSStrings, try getting a substring instead:
NSString *danishString = @"æøå";
NSMutableArray *characters = [[NSMutableArray alloc] initWithCapacity:[danishString length]];
for (int i=0; i < [danishString length]; i++)
{
    NSRange r = NSMakeRange(i, 1);
    NSString *ichar = [danishString substringWithRange:r];
    [characters addObject:ichar];
}
You could do something like the following, which should be fine with Danish characters, but would break down if you have decomposed characters. I suggest reading the String Programming Guide for more information.
NSString *danishString = @"æøå";
NSMutableArray* characters = [NSMutableArray array];
for( int i = 0; i < [danishString length]; i++ ) {
    NSString* subchar = [danishString substringWithRange:NSMakeRange(i, 1)];
    if( subchar ) [characters addObject:subchar];
}
That would split the string into an array of individual characters, assuming that all the code points were composed characters.
It is printing the Unicode escape sequences of the characters. In any case, you can use such escapes (with \u) anywhere.
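For completeness, an index-based loop can also be made safe for decomposed characters and surrogate pairs by asking the string for the range of the composed character sequence at each index. This is a minimal sketch; SplitIntoCharacters is a hypothetical name, while rangeOfComposedCharacterSequenceAtIndex: is an existing NSString method.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: splits a string into user-perceived characters by
// stepping from one composed character sequence to the next.
static NSArray *SplitIntoCharacters(NSString *string) {
    NSMutableArray *characters = [NSMutableArray array];
    NSUInteger index = 0;
    while (index < string.length) {
        NSRange range = [string rangeOfComposedCharacterSequenceAtIndex:index];
        [characters addObject:[string substringWithRange:range]];
        index = NSMaxRange(range); // Jump past the whole sequence.
    }
    return characters;
}
```

Calling SplitIntoCharacters(@"æøå") gives an array of three one-character strings, and a decomposed "a" plus combining ring stays together as a single element.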

How do you get the number of words in a NSTextStorage/NSString?

So my question is basically how do you get the number of words in a NSTextStorage/NSString? I don't want the character length but the word length. Thanks.
If you're on 10.6 or later, the following may be the easiest solution:
- (NSUInteger)numberOfWordsInString:(NSString *)str {
    __block NSUInteger count = 0;
    [str enumerateSubstringsInRange:NSMakeRange(0, [str length])
        options:NSStringEnumerationByWords|NSStringEnumerationSubstringNotRequired
        usingBlock:^(NSString *substring, NSRange substringRange, NSRange enclosingRange, BOOL *stop) {
            count++;
        }];
    return count;
}
If you want to take the current locale into account when doing word-splitting you can also add NSStringEnumerationLocalized to the options.
You could always count the number of spaces and add one.
To be more accurate, one would have to take into account all non-letter characters: commas, full stops, whitespace characters, etc.
[[string componentsSeparatedByString:@" "] count];
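A slightly more robust variant of the split-and-count idea is sketched below. It assumes that words are separated by whitespace or newlines only (punctuation attached to a word still counts as part of it), and the name WhitespaceWordCount is hypothetical.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: counts whitespace-separated tokens, ignoring the empty
// components produced by leading, trailing, or repeated separators.
static NSUInteger WhitespaceWordCount(NSString *string) {
    NSCharacterSet *separators = [NSCharacterSet whitespaceAndNewlineCharacterSet];
    NSArray *parts = [string componentsSeparatedByCharactersInSet:separators];
    NSUInteger count = 0;
    for (NSString *part in parts) {
        if (part.length > 0) {
            count++;
        }
    }
    return count;
}
```

Unlike counting spaces and adding one, this returns 0 for an empty string and is not thrown off by double spaces or line breaks.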
When using NSTextStorage, you can use the words method to get to the number of words. It might not be the most memory-efficient way to count words, but it does a pretty good job at ignoring punctuation marks and other non-word characters:
NSString *input = @"one - two three four .";
NSTextStorage *storage = [[NSTextStorage alloc] initWithString:input];
NSLog(@"word count: %lu", (unsigned long)[[storage words] count]);
The output will be word count: 4.
CFStringTokenizer is your friend.
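A minimal sketch of what counting words with CFStringTokenizer could look like is shown below. It assumes an Apple platform with CoreFoundation and ARC (for the __bridge cast); the function name TokenizerWordCount is hypothetical, and how punctuation is tokenized depends on the tokenizer and locale.

```objective-c
#import <Foundation/Foundation.h>
#import <CoreFoundation/CoreFoundation.h>

// Hypothetical helper: counts word tokens reported by CFStringTokenizer.
static NSUInteger TokenizerWordCount(NSString *string) {
    CFStringRef cfString = (__bridge CFStringRef)string;
    CFLocaleRef locale = CFLocaleCopyCurrent();
    CFStringTokenizerRef tokenizer =
        CFStringTokenizerCreate(kCFAllocatorDefault,
                                cfString,
                                CFRangeMake(0, CFStringGetLength(cfString)),
                                kCFStringTokenizerUnitWord,
                                locale);
    NSUInteger count = 0;
    // AdvanceToNextToken returns kCFStringTokenizerTokenNone when no tokens remain.
    while (CFStringTokenizerAdvanceToNextToken(tokenizer) != kCFStringTokenizerTokenNone) {
        count++;
    }
    CFRelease(tokenizer);
    CFRelease(locale);
    return count;
}
```

The tokenizer also gives access to each token's range via CFStringTokenizerGetCurrentTokenRange, which is useful if you need the words themselves and not just the count.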
Use that:
NSArray *words = [theStorage words];
NSUInteger wordCount = [words count];
Is that your problem?