I recently wanted to use regular expressions in a Cocoa app, but I found that Cocoa does not include a regex class. So I decided to use RegexKit (OgreKit is good, but for some reason it did not run properly on my OS X 10.6.4 x86_64 machine).
I have a file content like:
12:20:30 - 01:20:30
some text
11:20:30 - 04:20:30
some text
And I want to pick up all time values and text values. I found this example code in a guide:
NSString *entireString = NULL, *totalString = NULL, *dollarsString = NULL, *centsString = NULL;
NSString *regexString = @"owe:\\s*\\$?(?<total>(?<dollars>\\d+)\\.(?<cents>\\d+))";
[@"You owe: 1234.56 (tip not included)" getCapturesWithRegexAndReferences:regexString, @"$0", &entireString, @"${total}", &totalString, @"${dollars}", &dollarsString, @"${cents}", &centsString, nil];
// entireString = @"owe: 1234.56";
// totalString = @"1234.56";
// dollarsString = @"1234";
// centsString = @"56";
So I wrote a regex string
NSString *regexString = @"\\s*\\n*(?<start>\\d\\d:\\d\\d:\\d\\d*)\\s*-\\s*(?<end>\\d\\d:\\d\\d:\\d\\d*)\\s*\\n*(?<text>.*)";
and it worked fine, but it only matches once. I need to pick up all named captures, as we can in OgreKit, e.g.:
OGRegularExpression *regex = [OGRegularExpression regularExpressionWithString:@"<video src=\"(?<imageURL>.+)\".+>"
                                                                     options:OgreCaptureGroupOption
                                                                      syntax:OgreRubySyntax
                                                             escapeCharacter:OgreBackslashCharacter];
NSArray *matches = [regex allMatchesInString:@"<video src=\"http://test.com/hello.jpg\">"];
if (matches != nil && ([matches count] == 1))
{
    OGRegularExpressionMatch *match = [matches objectAtIndex: 0];
    NSString *result = [match substringNamed:@"imageURL"];
    // : http://test.com/hello.jpg
}
It's easy to get all named capture values from a matches array. Does anyone know how to do this in RegexKit? Thanks.
Take a look at the documentation under Enumerating all the Matches in a String by a Regular Expression. Specifically, you want to learn about the RKEnumerator class.
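For comparison, on systems where Foundation's NSRegularExpression is available (OS X 10.7 and later, so newer than the 10.6.4 setup in the question), the same "enumerate every match" idea can be sketched without RegexKit. Note that this is a swapped-in technique, not RegexKit's RKEnumerator API. The helper name ParseTimeEntries and the dictionary keys are made up for illustration, and numbered groups stand in for the named captures.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns one dictionary per match with the keys
// "start", "end" and "text". Returns an empty array if the pattern is invalid.
static NSArray<NSDictionary<NSString *, NSString *> *> *ParseTimeEntries(NSString *content) {
    // Group 1 = start time, group 2 = end time, group 3 = rest of the next line.
    NSString *pattern = @"(\\d\\d:\\d\\d:\\d\\d)\\s*-\\s*(\\d\\d:\\d\\d:\\d\\d)\\s*\\n(.*)";
    NSError *error = nil;
    NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:pattern
                                                                           options:0
                                                                             error:&error];
    if (regex == nil) {
        return @[];
    }

    NSMutableArray<NSDictionary<NSString *, NSString *> *> *entries = [NSMutableArray array];
    NSArray<NSTextCheckingResult *> *matches = [regex matchesInString:content
                                                              options:0
                                                                range:NSMakeRange(0, content.length)];
    // Every match carries its own capture ranges, so all occurrences are visited.
    for (NSTextCheckingResult *match in matches) {
        [entries addObject:@{
            @"start": [content substringWithRange:[match rangeAtIndex:1]],
            @"end":   [content substringWithRange:[match rangeAtIndex:2]],
            @"text":  [content substringWithRange:[match rangeAtIndex:3]],
        }];
    }
    return entries;
}
```

Because `.` does not match a newline by default, the third group stops at the end of the text line, and the next match starts at the following time range.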
I am working with an Objective-C application; specifically, I am gathering the dictionary representation of NSUserDefaults with this code:
NSUserDefaults *defaults = [NSUserDefaults standardUserDefaults];
NSDictionary *userDefaultsDict = [defaults dictionaryRepresentation];
While enumerating keys and objects of the resulting dict, sometimes I find a kind of opaque string that you can see in the following picture:
So it seems like an encoding problem.
If I try to print description of the string, the debugger correctly prints:
Printing description of obj:
tsuqsx
However, if I try to write obj to a file, or use it in any other way, I get an unreadable output like this:
What I would like to achieve is the following:
Detect in some way that the string has the encoding problem.
Convert the string to UTF8 encoding to use it in the rest of the program.
Any help is greatly appreciated. Thanks
EDIT: Very Hacky possible Solution that helps explaining what I am trying to do.
After trying all possible solutions based on dataUsingEncoding and back, I ended up with the following solution, absolutely weird, but I post it here, in the hope that it can help somebody to guess the encoding and what to do with unprintable characters:
- (BOOL)isProblematicString:(NSString *)candidateString {
BOOL returnValue = YES;
if ([candidateString length] <= 2) {
return NO;
}
const char *temp = [candidateString UTF8String];
long length = temp[0];
char *dest = malloc(length + 1);
long ctr = 1;
long usefulCounter = 0;
for (ctr = 1;ctr <= length;ctr++) {
if ((ctr - 1) % 3 == 0) {
memcpy(&dest[ctr - usefulCounter - 1],&temp[ctr],1);
} else {
if (ctr != 1 && ctr < [candidateString length]) {
if (temp[ctr] < 0x10 || temp[ctr] > 0x1F) {
returnValue = NO;
}
}
usefulCounter += 1;
}
}
memset(&dest[length],0,1);
free(dest);
return returnValue;
}
- (NSString *)utf8StringFromUnknownEncodedString:(NSString*)originalUnknownString {
const char *temp = [originalUnknownString UTF8String];
long length = temp[0];
char *dest = malloc(length + 1);
long ctr = 1;
long usefulCounter = 0;
for (ctr = 1;ctr <= length;ctr++) {
if ((ctr - 1) % 3 == 0) {
memcpy(&dest[ctr - usefulCounter - 1],&temp[ctr],1);
} else {
usefulCounter += 1;
}
}
memset(&dest[length],0,1);
NSString *returnValue = [[NSString alloc] initWithUTF8String:dest];
free(dest);
return returnValue;
}
This returns me a string that I can use to build a full UTF8 string. I am looking for a clean solution. Any help is greatly appreciated. Thanks
We're talking about a string which comes from the /Library/Preferences/.GlobalPreferences.plist
(key com.apple.preferences.timezone.new.selected_city).
NSString *city = [[NSUserDefaults standardUserDefaults]
    stringForKey:@"com.apple.preferences.timezone.new.selected_city"];
NSLog(@"%@", city); // \^Zt\^\\^]s\^]\^\u\^V\^_q\^]\^[s\^W\^Zx\^P
(lldb) p [city description]
(__NSCFString *) $1 = 0x0000600003f6c240 @"\x1at\x1c\x1ds\x1d\x1cu\x16\x1fq\x1d\x1bs\x17\x1ax\x10"
What I would like to achieve is the following:
Detect in some way that the string has the encoding problem.
Convert the string to UTF8 encoding to use it in the rest of the program.
&
After trying all possible solutions based on dataUsingEncoding and back.
This string has no encoding problem and characters like \x1a, \x1c, ... are valid characters.
You can call dataUsingEncoding: with ASCII, UTF-8, ... but all these characters will still be
present. They're called control characters (or non-printing characters). The linked Wikipedia page explains what these characters are and how they're defined in ASCII, extended ASCII and unicode.
What you're looking for is a way to remove control characters from a string.
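As for the "detect" half of the question, a minimal sketch could look like the following. It assumes that "problematic" simply means "contains at least one character from NSCharacterSet.controlCharacterSet"; the function name ContainsControlCharacters is made up for illustration.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: YES if the string contains at least one control
// (or format) character, NO otherwise.
static BOOL ContainsControlCharacters(NSString *string) {
    NSRange range = [string rangeOfCharacterFromSet:NSCharacterSet.controlCharacterSet];
    return range.location != NSNotFound;
}
```

Keep in mind that newlines and tabs are control characters too, so ordinary multi-line text would also be flagged by this check.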
Remove control characters
We can create a category for our new method:
@interface NSString (ControlCharacters)
- (NSString *)stringByRemovingControlCharacters;
@end
@implementation NSString (ControlCharacters)
- (NSString *)stringByRemovingControlCharacters {
    // TODO Remove control characters
    return self;
}
@end
In all examples below, the city variable is created in this way ...
NSString *city = [[NSUserDefaults standardUserDefaults]
    stringForKey:@"com.apple.preferences.timezone.new.selected_city"];
... and contains @"\x1at\x1c\x1ds\x1d\x1cu\x16\x1fq\x1d\x1bs\x17\x1ax\x10". Also all
examples below were tested with the following code:
NSString *cityWithoutCC = [city stringByRemovingControlCharacters];
// tsuqsx
NSLog(@"%@", cityWithoutCC);
// {length = 6, bytes = 0x747375717378}
NSLog(@"%@", [cityWithoutCC dataUsingEncoding:NSUTF8StringEncoding]);
Split & join
One way is to utilize the NSCharacterSet.controlCharacterSet.
There's a stringByTrimmingCharactersInSet:
method (NSString), but it removes these characters from the beginning/end only,
which is not what you're looking for. There's a trick you can use:
- (NSString *)stringByRemovingControlCharacters {
NSArray<NSString *> *components = [self componentsSeparatedByCharactersInSet:NSCharacterSet.controlCharacterSet];
    return [components componentsJoinedByString:@""];
}
It splits the string by control characters and then joins these components back. Not a very efficient way, but it works.
ICU transform
Another way is to use ICU transform (see ICU User Guide).
There's a stringByApplyingTransform:reverse:
method (NSString), but it only accepts predefined constants. Documentation says:
The constants defined by the NSStringTransform type offer a subset of the functionality provided by the underlying ICU transform functionality. To apply an ICU transform defined in the ICU User Guide that doesn't have a corresponding NSStringTransform constant, create an instance of NSMutableString and call the applyTransform:reverse:range:updatedRange: method instead.
Let's update our implementation:
- (NSString *)stringByRemovingControlCharacters {
NSMutableString *result = [self mutableCopy];
    [result applyTransform:@"[[:Cc:] [:Cf:]] Remove"
reverse:NO
range:NSMakeRange(0, self.length)
updatedRange:nil];
return result;
}
[:Cc:] represents control characters and [:Cf:] represents format characters. Together they cover the same characters as the NSCharacterSet.controlCharacterSet mentioned above. Documentation:
A character set containing the characters in Unicode General Category Cc and Cf.
Iterate over characters
NSCharacterSet also offers the characterIsMember: method. Here we need to iterate over characters (unichar) and check if it's a control character or not.
Let's update our implementation:
- (NSString *)stringByRemovingControlCharacters {
if (self.length == 0) {
return self;
}
NSUInteger length = self.length;
unichar characters[length];
[self getCharacters:characters];
NSUInteger resultLength = 0;
unichar result[length];
NSCharacterSet *controlCharacterSet = NSCharacterSet.controlCharacterSet;
for (NSUInteger i = 0 ; i < length ; i++) {
if ([controlCharacterSet characterIsMember:characters[i]] == NO) {
result[resultLength++] = characters[i];
}
}
return [NSString stringWithCharacters:result length:resultLength];
}
Here we filter out all characters (unichar) which belong to the controlCharacterSet.
Other ways
There are other ways to iterate over characters, for example: Most efficient way to iterate over all the chars in an NSString.
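One such alternative, sketched here as a free function rather than the category method used above, walks the string by composed character sequences with enumerateSubstringsInRange:options:usingBlock: and keeps only the pieces that contain no control character. The name RemoveControlCharacters is made up for illustration, and this is not claimed to be the most efficient option.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: rebuilds the string from the composed character
// sequences that contain no control (or format) character.
static NSString *RemoveControlCharacters(NSString *input) {
    NSMutableString *result = [NSMutableString stringWithCapacity:input.length];
    NSCharacterSet *controlCharacterSet = NSCharacterSet.controlCharacterSet;

    [input enumerateSubstringsInRange:NSMakeRange(0, input.length)
                              options:NSStringEnumerationByComposedCharacterSequences
                           usingBlock:^(NSString *substring, NSRange substringRange,
                                        NSRange enclosingRange, BOOL *stop) {
        // Keep the sequence only if none of its characters is a control character.
        if ([substring rangeOfCharacterFromSet:controlCharacterSet].location == NSNotFound) {
            [result appendString:substring];
        }
    }];
    return result;
}
```

Unlike the unichar loop, this variant never splits a surrogate pair, at the cost of creating a short substring for every character.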
BBEdit & others
Let's write this string to a file:
NSString *city = [[NSUserDefaults standardUserDefaults]
    stringForKey:@"com.apple.preferences.timezone.new.selected_city"];
[city writeToFile:@"/Users/zrzka/city.txt"
atomically:YES
encoding:NSUTF8StringEncoding
error:nil];
It's up to the editor how all these control characters are handled and displayed. Here's an example: Visual Studio Code.
View - Render Control Characters off:
View - Render Control Characters on:
BBEdit displays upside-down question marks, but I'm sure there's a way to
toggle control character rendering. I don't have BBEdit installed to verify it.
Here's my program so far. My intention is for the if statement to compare the letter in the string letterGuessed to a character in the string userInputPhraseString. Here's what I have. While coding in Xcode, I get an "expected '['" error. I have no idea why.
NSString *letterGuessed = userInputGuessedLetter.text;
NSString *userInputPhraseString = userInputPhraseString.text;
int loopCounter = 0;
int stringLength = userInputPhraseString.length;
while (loopCounter < stringLength){
if (guessedLetter isEqualToString:[userInputPhraseString characterAtIndex:loopIndexTwo])
{
//if statement true
}
loopCounter++;
}
You are missing enclosing square brackets on this line:
if (guessedLetter isEqualToString:[userInputPhraseString characterAtIndex:loopIndexTwo])
It should be:
if ([guessedLetter isEqualToString:[userInputPhraseString characterAtIndex:loopIndexTwo]])
Edit: That won't fix your problem, though, because characterAtIndex: returns a unichar, not an NSString.
It's not clear what you are trying to do, but I suppose that letterGuessed holds one character and userInputPhraseString holds many, so you want to know whether letterGuessed occurs inside userInputPhraseString, correct?
Here is one solution without loops. I replaced the inputs with fixed values for testing, and the code works.
NSString *letterGuessed = @"A"; //Change to your inputs
NSString *userInputPhraseString = @"BBBA"; //Since it has A it will be true in the test
NSCharacterSet *cset = [NSCharacterSet characterSetWithCharactersInString:letterGuessed];
NSRange range = [userInputPhraseString rangeOfCharacterFromSet:cset];
if (range.location != NSNotFound) { //Is letterGuessed in userInputPhraseString?
    NSLog(@"YES"); //userInput does contain A...
} else {
    NSLog(@"NO");
}
As for your code, I fixed a couple of errors. First, characterAtIndex: gives you a unichar (an integer value), which you were trying to compare to an NSString (an object). I also fixed a couple of syntax issues and switched to extracting a one-character substring via a range. Again, the example above is the best approach I know for what you want to accomplish, but for the sake of learning, here is your code fixed.
NSString *letterGuessed = @"A"; //Change to your inputs
NSString *userInputPhraseString = @"BBBA"; //Since it has A it will be true in the test
NSInteger loopCounter = 0; //Use NSInteger instead of int.
NSInteger stringLength = userInputPhraseString.length;
BOOL foundChar = NO; //Just for the sake of returning NOT FOUND in NSLog
while (loopCounter < stringLength){
    //Here we will get a letter for each iteration.
    NSString *scannedLetter = [userInputPhraseString substringWithRange:NSMakeRange(loopCounter, 1)]; // Removed loopIndexTwo
    if ([scannedLetter isEqualToString:letterGuessed])
    {
        NSLog(@"FOUND CHARACTER");
        foundChar = YES;
    }
    loopCounter++;
}
if (!foundChar) NSLog(@"NOT FOUND");
NSRange holds a location and a length, so on every iteration we move to a new location and take 1 character.
Also, if this approach is what you want, I would strongly suggest a for loop.
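A for-loop version might look like the sketch below. It compares unichar values directly, which is what characterAtIndex: returns, instead of building one-character substrings. The function name PhraseContainsLetter is made up for illustration, and only the first character of the guess is considered.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: YES if the first character of `letter` occurs
// anywhere in `phrase`. The comparison is case-sensitive.
static BOOL PhraseContainsLetter(NSString *phrase, NSString *letter) {
    if (letter.length == 0) {
        return NO; // Nothing to look for.
    }
    unichar target = [letter characterAtIndex:0];
    for (NSUInteger i = 0; i < phrase.length; i++) {
        // unichar is an integer type, so == is the right comparison here.
        if ([phrase characterAtIndex:i] == target) {
            return YES;
        }
    }
    return NO;
}
```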
This has given me quite a big headache. For whatever reason, when I use this code, the if statement always evaluates to false:
while(!feof(file))
{
    NSString *line = [self readNSString:file];
    NSLog(@"%@", line);
    NSLog(@"%@", search);
    NSRange textRange;
    textRange = [line rangeOfString:search];
    if(textRange.location != NSNotFound)
    {
        NSString *result = [line substringFromIndex:NSMaxRange([line rangeOfString:search])];
        resultView.text = result;
    }
    else
    {
        resultView.text = @"Not found";
    }
}
When the functions execute, the two NSLogs tell me that the "line" and "search" strings are what they should be, so then why does the if statement always evaluate to false? I must be missing something simple, having another set of eyes would be great. Thanks
edit: (function "readNSString")
- (NSString*)readNSString:(FILE*) file
{
    char buffer[300];
    NSMutableString *result = [NSMutableString stringWithCapacity:256];
    int read;
    do
    {
        if(fscanf(file, "%299[^\n]%n%*c", buffer, &read) == 1)
            [result appendFormat:@"%s", buffer];
        else
            break;
    } while(read == 299);
    return result;
}
edit 2:
search is set with a call to the first function, with an NSString* variable as a parameter, like this:
NSString *textFieldText = [[NSString alloc]
    initWithFormat:@"%@", textField.text];
[self readFile:textFieldText];
edit 3 (NSLogs output)
line: Germany Italy France
search: Italy
I think that you are using rangeOfString: and NSNotFound correctly, so the problem possibly has to do with how the string is created from the file data using appendFormat:@"%s".
I suspect there may be an encoding issue between your two string formats - I would investigate whether the "%s" encodes the null terminated C string properly into the same format as a unicode NSString with the appropriate encoding.
Try hard coding the value you are getting from the readNSString function as a string literal in code just for testing and see if that comparison works, if so this would tend to indicate it probably is something to do with the encoding of the string created from the file.
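One way to take the encoding out of the equation is to build the string with an explicit encoding instead of the %s format specifier. This is only a sketch: it assumes the file is UTF-8 (the question does not say), and the helper name StringFromCBuffer is made up for illustration.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: converts a NUL-terminated C buffer to an NSString,
// assuming UTF-8. Returns an empty string if the bytes are not valid UTF-8.
static NSString *StringFromCBuffer(const char *buffer) {
    NSString *string = [NSString stringWithCString:buffer encoding:NSUTF8StringEncoding];
    return string != nil ? string : @"";
}
```

Inside readNSString:, the append could then be written as [result appendString:StringFromCBuffer(buffer)], which makes the assumed encoding visible in the code.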
I'm writing a simple shift cipher iPhone app as a pet project, and one piece of functionality I'm currently designing is a "universal" decryption of an NSString, that returns an NSArray, all of NSStrings:
- (NSArray*) decryptString: (NSString*)ciphertext{
NSMutableArray* theDecryptions = [NSMutableArray arrayWithCapacity:ALPHABET];
for (int i = 0; i < ALPHABET; ++i) {
NSString* theNewPlainText = [self decryptString:ciphertext ForShift:i];
[theDecryptions insertObject:theNewPlainText
atIndex:i];
}
return theDecryptions;
}
I'd really like to pass this NSArray into another method that attempts to spell check each individual string within the array, and builds a new array that puts the strings with the fewest typo'd words at lower indices, so they're displayed first. I'd like to use the system's dictionary like a text field would, so I can match against words that have been trained into the phone by its user.
My current guess is to split a given string up into words, then spell check each with NSSpellChecker's -checkSpellingOfString:startingAt: and use the number of correct words to sort the array. Is there an existing library method or well-accepted pattern that would help return such a value for a given string?
Well, I found a solution that works using UIKit's UITextChecker. It correctly finds the user's most preferred language dictionary, but I'm not sure whether the rangeOfMisspelledWordInString:... method takes learned words into account. If it doesn't, calling [UITextChecker hasLearnedWord:] with currentWord inside the bottom if statement should be enough to find user-taught words.
As noted in the comments, it may be prudent to call rangeOfMisspelledWordInString:... with each of the top few languages in [UITextChecker availableLanguages], to help multilingual users.
-(void) checkForDefinedWords {
NSArray* words = [message componentsSeparatedByString:@" "];
NSInteger wordsFound = 0;
UITextChecker* checker = [[UITextChecker alloc] init];
//get the first language in the checker's memory- this is the user's
//preferred language.
//TODO: May want to search with every language (or top few) in the array
NSString* preferredLang = [[UITextChecker availableLanguages] objectAtIndex:0];
//for each word in the array, determine whether it is a valid word
for(NSString* currentWord in words){
NSRange range;
range = [checker rangeOfMisspelledWordInString:currentWord
range:NSMakeRange(0, [currentWord length])
startingAt:0
wrap:NO
language:preferredLang];
//if it is valid (no errors found), increment wordsFound
if (range.location == NSNotFound) {
//NSLog(@"%@ %@", @"Valid Word found:", currentWord);
wordsFound++;
}
else {
//NSLog(@"%@ %@", @"Invalid Word found:", currentWord);
}
}
//After all "words" have been searched, save wordsFound to validWordCount
[self setValidWordCount:wordsFound];
[checker release];
}
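To get from a per-string word count to the ordering asked for in the question, the counts can drive a sort. The sketch below uses only Foundation and takes the scoring step as a block, so a UITextChecker-based count like the one above could be plugged in. The names WordScore and SortCandidatesByScore are made up for illustration.

```objc
#import <Foundation/Foundation.h>

// Hypothetical scoring block: returns the number of valid words in a candidate.
typedef NSInteger (^WordScore)(NSString *candidate);

// Hypothetical helper: returns the candidates ordered so that the highest
// score (most valid words) comes first.
static NSArray<NSString *> *SortCandidatesByScore(NSArray<NSString *> *candidates,
                                                  WordScore score) {
    return [candidates sortedArrayUsingComparator:^NSComparisonResult(NSString *a, NSString *b) {
        NSInteger scoreA = score(a);
        NSInteger scoreB = score(b);
        if (scoreA > scoreB) return NSOrderedAscending;  // a goes before b
        if (scoreA < scoreB) return NSOrderedDescending; // b goes before a
        return NSOrderedSame;
    }];
}
```

The comparator calls the scoring block more than once per string, so if scoring is expensive (as spell checking is likely to be), it may be worth computing each score once and caching it before sorting.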
Hey folks, below is a piece of code I used for a school assignment.
Whenever I enter a word with an O in it (a capital o), it fails!
Whenever there are one or more capital O's in the input, this program returns false and logs: sentence not a palindrome.
A palindrome, for those who don't know, is a word that reads the same from left to right and backwards (e.g. lol, kayak, reviver, etc.).
I found this bug when trying to check the 'oldest' palindrome ever found: SATOR AREPO TENET OPERA ROTAS.
When I change all the capital o's to lowercase o's, it works, and returns true.
Let me state clearly, with this piece of code ALL sentences/words with capital O's return false. A single capital o is enough to fail this program.
-(BOOL)testForPalindrome:(NSString *)s position:(NSInteger)pos {
NSString *string = s;
NSInteger position = pos;
NSInteger stringLength = [string length];
NSString *charOne = [string substringFromIndex:position];
charOne = [charOne substringToIndex:1];
NSString *charTwo = [string substringFromIndex:(stringLength - 1 - position)];
charTwo = [charTwo substringToIndex:1];
if(position > (stringLength / 2)) {
NSString *printableString = [NSString stringWithFormat:@"De following word or sentence is a palindrome: \n\n%@", string];
NSLog(@"%@ is a palindrome.", string);
[textField setStringValue:printableString];
return YES;
}
if(charOne != charTwo) {
NSLog(@"%@, %@", charOne, charTwo);
NSLog(@"%i", position);
NSLog(@"%@ is not a palindrome.", string);
return NO;
}
return [self testForPalindrome:string position:position+1];
}
So, is this some weird bug in Cocoa?
Or am I missing something?
This of course is not a bug in Cocoa, as you probably knew deep down inside.
Your comparison is causing this 'bug in Cocoa': you're comparing the addresses of charOne and charTwo. Instead, you should compare the contents of the strings with the isEqualToString: message.
Use:
if(![charOne isEqualToString:charTwo]) {
Instead of:
if(charOne != charTwo) {
Edit: tested it in a test project and can confirm this is the problem.
Don't use charOne != charTwo
Instead use one of the NSString Compare Methods.
if ([charOne caseInsensitiveCompare:charTwo] != NSOrderedSame)
It may also have to do with localization (but I doubt it).
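Putting the answers together, the check can also be written without recursion or substrings by comparing unichar values from both ends of the string. This is a sketch, not the asker's assignment code: the function name IsPalindrome is made up, the comparison is case-sensitive, and spaces count as characters.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: YES if the string reads the same forwards and backwards.
static BOOL IsPalindrome(NSString *string) {
    NSUInteger length = string.length;
    for (NSUInteger i = 0; i < length / 2; i++) {
        // Compare character values, not object addresses.
        if ([string characterAtIndex:i] != [string characterAtIndex:length - 1 - i]) {
            return NO;
        }
    }
    return YES;
}
```

With this version, capital letters behave like any other character, because no NSString pointers are compared at all.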