Regex - excluding a word from a search - objective-c

I've tried the following to exclude the words 'and' and 'the' from a regex search but it doesn't seem to be working. Any idea what I'm doing wrong?
NSString *pattern = [NSString stringWithFormat:@"\\b%@\\b\\b(?!.*\\b(?:and|the)\\b)", word];
NSRegularExpression *regex = [[NSRegularExpression alloc] initWithPattern:pattern options:NSRegularExpressionCaseInsensitive error:nil];

It seems Objective-C's NSRegularExpression does support negative lookbehinds, so here are two excellent posts about them:
http://www.codinghorror.com/blog/2005/10/excluding-matches-with-regular-expressions.html (by Jeff Atwood)
http://www.regular-expressions.info/lookaround.html (second link on the above blog post)
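For what it's worth, here is a minimal sketch of the lookaround idea. It assumes the goal is to match every whole word except "and" and "the" (the question doesn't spell this out), and it uses a negative lookahead at the start of each word instead of a lookbehind. The function name WordsExcludingStopWords is made up for illustration.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: returns every whole word in `text` except "and" and "the".
// Assumption: "excluding" means those two words should never be matched.
NSArray<NSString *> *WordsExcludingStopWords(NSString *text) {
    // \b                  start at a word boundary
    // (?!(?:and|the)\b)   give up if the word starting here is exactly "and" or "the"
    // \w+                 otherwise consume the whole word
    NSString *pattern = @"\\b(?!(?:and|the)\\b)\\w+";
    NSError *error = nil;
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:pattern
                                                  options:NSRegularExpressionCaseInsensitive
                                                    error:&error];
    if (regex == nil) {
        NSLog(@"Invalid pattern: %@", error);
        return @[];
    }

    NSMutableArray<NSString *> *words = [NSMutableArray array];
    NSArray<NSTextCheckingResult *> *matches =
        [regex matchesInString:text options:0 range:NSMakeRange(0, text.length)];
    for (NSTextCheckingResult *match in matches) {
        [words addObject:[text substringWithRange:match.range]];
    }
    return words;
}
```

Calling WordsExcludingStopWords(@"The cat and the android") should skip "The", "and" and "the" but keep "android", because the lookahead only rejects a word when a boundary follows the stop word.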

Related

NSRegularExpression matching and replacing with exclude

I'm working on a small iOS app and got stuck creating a pattern for the NSRegularExpression class. I need a pattern that matches a particular word so I can replace it, but that skips the word if it has already been replaced. That way, if the user processes the same text several times, the replacement happens only once.
Example:
I need to find every "yes" in a given text and replace it with "probably yes". But the "yes" inside "probably yes" must be left alone if the user processes the text again, so the result doesn't become "probably probably yes".
NSRegularExpression *regexYesReplace = [NSRegularExpression regularExpressionWithPattern:@"some pattern" options:0 error:&error];
NSString *replacementStringYesReplace = @"probably yes";
replacedText = [regexYesReplace stringByReplacingMatchesInString:afterText options:options range:range withTemplate:replacementStringYesReplace];
I tried to adapt the pattern from this question to NSRegularExpression syntax, but it didn't work:
Regex replace text but exclude when text is between specific tag
Maybe someone has had the same problem. Thanks in advance.
You can use a negative look-behind:
(?<!probably )yes
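A minimal sketch of how that pattern might be wired into NSRegularExpression. The helper name ReplaceYes is hypothetical, and the \b word boundaries are an added assumption (so that words like "yesterday" are left alone); drop them if you want the bare pattern from the answer.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: replaces "yes" with "probably yes" unless the match is
// already preceded by "probably ", so running it repeatedly changes nothing more.
NSString *ReplaceYes(NSString *text) {
    // (?<!probably )  negative look-behind: not directly preceded by "probably "
    // \byes\b         "yes" as a whole word (the boundaries are an assumption)
    NSError *error = nil;
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:@"(?<!probably )\\byes\\b"
                                                  options:0
                                                    error:&error];
    if (regex == nil) {
        NSLog(@"Invalid pattern: %@", error);
        return text;
    }
    return [regex stringByReplacingMatchesInString:text
                                           options:0
                                             range:NSMakeRange(0, text.length)
                                      withTemplate:@"probably yes"];
}
```

Passing the result back through ReplaceYes a second time should return the same string, which is the behaviour the question asks for.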

Are regexes the right way to extract digits from a ticket number in an NSString?

I would like to programmatically receive a JIRA ticket number, like @"ART-235", and obtain the bare digits / number, @"235".
A question I asked about using regular expressions turned up "Regular expressions in an Objective-C Cocoa application", which links to https://developer.apple.com/library/ios/documentation/Foundation/Reference/NSRegularExpression_Class/Reference/Reference.html, and it does look like I can use a regular expression such as \D*?(\d+) to retrieve the value.
However, I wanted to check in and ask if there is a less bletcherous way to do this, or is this an example of why Objective-C is called a bit archaic? The second link gives what looks like everything I need, but it smells a little funny. For the objective stated above, do I want to use regular expressions, or is there a more nicely idiomatic way to perform this sort of string manipulation?
Sounds like -componentsSeparatedByString: would do what you need.
Getting pieces of a fixed, known, format that doesn't use paired delimiters or nesting is exactly the kind of thing that regexes are made to do. I don't see a thing wrong with using one here.
To address your question as written (about "iteration"), however, you might want to look at NSScanner, which does move through the characters of a string by "character class", allowing you to evaluate them as you go.
NSString *ticket = @"ART-235";
NSScanner *scanner = [NSScanner scannerWithString:ticket];
// Skip everything before the first digit
[scanner scanUpToCharactersFromSet:[NSCharacterSet decimalDigitCharacterSet]
                        intoString:nil];
// Then read the number as an integer...
NSInteger ticketNumber;
[scanner scanInteger:&ticketNumber];
// ...or, instead of the two lines above, as a string
NSString *ticketNumberString;
[scanner scanCharactersFromSet:[NSCharacterSet decimalDigitCharacterSet]
                    intoString:&ticketNumberString];
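For comparison, the regex route could look roughly like this. It is only a sketch: it reuses the \D*?(\d+) pattern from the question and reads capture group 1, and the function name DigitsFromTicket is made up for illustration.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the first run of digits in a ticket ID such as
// @"ART-235", or nil when the string contains no digits.
NSString *DigitsFromTicket(NSString *ticket) {
    NSError *error = nil;
    // \D*?   lazily skip non-digits
    // (\d+)  capture group 1: the digits themselves
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:@"\\D*?(\\d+)"
                                                  options:0
                                                    error:&error];
    if (regex == nil) {
        NSLog(@"Invalid pattern: %@", error);
        return nil;
    }
    NSTextCheckingResult *match =
        [regex firstMatchInString:ticket options:0 range:NSMakeRange(0, ticket.length)];
    if (match == nil) {
        return nil;
    }
    return [ticket substringWithRange:[match rangeAtIndex:1]];
}
```

DigitsFromTicket(@"ART-235") should give @"235"; call integerValue on the result if you need a number.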
Like other answers have already said: that simple case can be solved using componentsSeparatedByString:@"-".
That said, your original question is how to enumerate individual characters.
Not all characters are the same size; some languages combine more than one character into a new character. When enumerating such a string you most likely want the result of that composition, not the individual pieces. In Objective-C you can enumerate these composed characters like this:
NSString *myString = @"Hello Strings!";
[myString enumerateSubstringsInRange:NSMakeRange(0, myString.length)
                             options:NSStringEnumerationByComposedCharacterSequences
                          usingBlock:^(NSString *substring, NSRange substringRange, NSRange enclosingRange, BOOL *stop) {
    // Do something with the composed character
    NSLog(@"%@", substring);
}];
The example above will log each character one by one.
I made a simple method for you that does the trick, provided that the ticket identifiers will always be in a "string-number" format!
-(int) numberFromJiraTicket:(NSString*)ticketId
{
    //Get number as string
    NSString *number = [[ticketId componentsSeparatedByString:@"-"] lastObject];
    //Return the INT representation of the number
    return [number intValue];
}

replace matches in NSString with template using NSRegularExpression

I'm trying to detect <br> or <Br> or < br>,... in an NSString and replace it with \n.
I use NSRegularExpression and I wrote this code:
NSString *string = @"123 < br><br>1245; Ross <Br>Test 12<br>";
NSError *error = NULL;
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"<[* ](br|BR|bR|Br|br)>" options:NSRegularExpressionCaseInsensitive error:&error];
NSString *modifiedString = [regex stringByReplacingMatchesInString:string options:0 range:NSMakeRange(0, [string length]) withTemplate:@"\n"];
NSLog(@"%@", modifiedString);
It works fine, but it replaces only the first match, not all of them. Please help me detect all matches and replace them.
Thanks
You currently don't handle an arbitrary amount of white space. For good measure you should also handle white space after br, and the closing slash too, since <br /> is also a valid way of writing the line break (and the required form in XHTML).
You would end up with a pattern that looks like this
<\s*(br|BR|bR|Br|br)\s*\/*>
or, written as an NSRegularExpression:
NSError *error = NULL;
NSRegularExpression *regex =
    [NSRegularExpression regularExpressionWithPattern:@"<\\s*(br|BR|bR|Br|br)\\s*\\/*>"
                                              options:0
                                                error:&error];
Edit
You could also make the pattern more compact by separating the two letters
<\s*([bB][rR])\s*\/*>
You're close. You need to handle any number of spaces after the initial <, including none at all.
Using your example, the regex <\s*(br|BR|bR|Br|br)> accepts zero or more spaces before the br. You can simplify it further by making it case-insensitive with the (?i) flag, which handles every capitalization of br with a cleaner-looking regex. To do that, use (?i)<\s*br>.
I think for completeness you can also include an arbitrary amount of space AFTER the br, just to handle anything that could be thrown. I agree with adding in some sort of catch for a /> to end the pattern, since <br/> is valid HTML as well. It makes the regex look a little more crazy, but it boils down to just adding the other 3 pieces.
(?i)<\s*br\s*\/?\s*>
It looks really scary, but breaks down very simply into a few parts:
(?i) turns on case insensitive to handle the variations on the br.
<\s* is the start of the tag directly followed by an arbitrary number of spaces.
br\s* is your br chars followed by an arbitrary number of spaces.
\/? is to handle 0 or 1 instances of the closing slash (to handle valid HTML tags like <br/> and <br>).
\s*> is handling an arbitrary number of spaces and then the closing >.
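Put together, the breakdown above might be used like this. Treat it as a sketch rather than a drop-in: the helper name ReplaceLineBreakTags is hypothetical, and it passes NSRegularExpressionCaseInsensitive instead of writing (?i) inside the pattern, which should be equivalent here.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: replaces every <br> variant (<br>, < br>, <Br>, <br/>, <br />)
// in `html` with a newline character.
NSString *ReplaceLineBreakTags(NSString *html) {
    // <\s*    opening bracket plus optional white space
    // br\s*   the tag name plus optional white space
    // /?\s*>  optional closing slash, optional white space, closing bracket
    NSError *error = nil;
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:@"<\\s*br\\s*/?\\s*>"
                                                  options:NSRegularExpressionCaseInsensitive
                                                    error:&error];
    if (regex == nil) {
        NSLog(@"Invalid pattern: %@", error);
        return html;
    }
    // stringByReplacingMatchesInString: replaces all matches, not just the first.
    return [regex stringByReplacingMatchesInString:html
                                           options:0
                                             range:NSMakeRange(0, html.length)
                                      withTemplate:@"\n"];
}
```

With the string from the question, all four tags should come back as newlines, including the "< br>" variant that the original [* ] character class only matched when exactly one space or asterisk was present.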

iOS Determine if two unicode characters are actually one letter in another language and put in tableview index

In a UITableView's index scroller (the scroller on the right side containing the characters for each section), how do I display a mix of English characters and, say, Japanese characters? Is there a way to grab the first character of an NSString and then check whether it's actually part of an é or something similar (since é can be two Unicode characters: e + `)? Any code snippets would be very helpful. If I just take the first character, it ends up displaying random characters like "=" or "~" instead of the Japanese character.
Thanks!
NOTE: I'm not using UILocalizedIndexedCollation because I am using Core Data's NSFetchedResultsController. In many places online I've read that you can't really use both.
EDIT: I can get the character now, however the tableview index doesn't seem to render them properly. Does anyone have something like Japanese characters displaying in the tableview index?
The most solid way is to use the NSString methods that are sensitive to these characters. You would probably be interested in the WWDC2011 - Session 128 - Advanced Text Processing video. It talks extensively about just this subject. Pay attention to the part about "Composed Character Sequences".
Based on the information presented there you could probably do something like this:
#warning I haven't tested this thoroughly
NSString *string = @"Hello";
__block NSString *firstCharacterSequence = nil;
[string enumerateSubstringsInRange:NSMakeRange(0, string.length)
                           options:NSStringEnumerationByComposedCharacterSequences
                        usingBlock:^(NSString *substring, NSRange substringRange, NSRange enclosingRange, BOOL *stop){
    firstCharacterSequence = substring;
    *stop = YES;
}];
NSLog(@"%@",firstCharacterSequence);
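If only the first composed character is needed, a shorter alternative might be NSString's -rangeOfComposedCharacterSequenceAtIndex:. The sketch below assumes that is all the index title requires; the function name FirstComposedCharacter is made up, and it does not address how the table view index renders the result.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the first user-visible character of `string`,
// keeping a base letter and its combining marks together.
// Returns nil for an empty string, since index 0 would be out of bounds.
NSString *FirstComposedCharacter(NSString *string) {
    if (string.length == 0) {
        return nil;
    }
    NSRange range = [string rangeOfComposedCharacterSequenceAtIndex:0];
    return [string substringWithRange:range];
}
```

For a string that starts with e followed by a combining acute accent, the returned substring should have length 2 rather than 1, so the accent stays attached to its letter.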

NSPredicate and Regex

Can someone please help me with using Regex with NSPredicate?
NSString *regex = @"(?:[A-Za-z0-9])";
NSPredicate *pred = [NSPredicate predicateWithFormat:@"SELF MATCHES %@", regex];
if ([pred evaluateWithObject:mystring])
{
//do something
}
Testing the above with mystring set to qstring123 doesn't seem to work. I am expecting it to enter the if condition because it supposedly should match the regex.
Besides, I need a regex for alphanumeric text that also allows commas and spaces.
Will this work?
@"(?:[A-Za-z0-9])*(?:,[A-sa-z0-9)*(?:\s[A-sa-s0-9])"
Please help.
From my experimentation, it tries to match the regex against the entire string, and won't match inside a string.
Therefore, the regex [a-zA-Z0-9]+ works, but [a-zA-Z0-9] does not.
With that in mind, you may want to rework your comma-matching predicate, or use a more full-featured regex solution, like the amazingly awesome RegexKit and RegexKitLite.
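As a rough sketch of what that means for the alphanumeric-plus-commas-and-spaces case: since MATCHES has to cover the entire string, a single character class with a trailing + may be enough. This assumes ASCII letters and digits only and that commas and spaces may appear anywhere; the function name IsAlphanumericWithCommasAndSpaces is hypothetical.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: YES when `candidate` consists only of ASCII letters,
// digits, commas and spaces. MATCHES must cover the whole string, hence the "+".
BOOL IsAlphanumericWithCommasAndSpaces(NSString *candidate) {
    NSString *regex = @"[A-Za-z0-9, ]+";
    NSPredicate *predicate = [NSPredicate predicateWithFormat:@"SELF MATCHES %@", regex];
    return [predicate evaluateWithObject:candidate];
}
```

With this, qstring123 and "foo, bar 12" should both pass, while a string containing any other punctuation should fail.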