Objective-C, regular expression match repetition - objective-c

I've run into a problem getting a regular expression to capture every repetition of a group.
Here is a simple example:
NSString *string = @"A1BA2BA3BC";
NSString *pattern = @"(A[^AB]+B)+C";
NSError *error = nil;
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:pattern options:0 error:&error];
NSArray *array = [regex matchesInString:string options:0 range:NSMakeRange(0, [string length])];
The returned array has one element containing two ranges: the whole input string and the last captured group, "A3B". The first two groups, "A1B" and "A2B", are not captured as I expected.
I've tried everything from greedy to lazy matching.

A Quantifier Does not Spawn New Capture Groups
Except in .NET, which has CaptureCollections, adding a quantifier to a capture group does not create more captures. The group number stays the same (in your case, Group 1), and the content returned is the last capture of the group.
Reference
Everything about Regex Capture Groups (see Generating New Capture Groups Automatically)
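To make the point above concrete, here is a minimal sketch that runs the question's pattern and reads group 1 back. The helper name LastCaptureOfRepeatedGroup is hypothetical and used only for illustration; the Foundation calls are the standard NSRegularExpression and NSTextCheckingResult methods.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the text held by capture group 1 after the
// first match of (A[^AB]+B)+C, or nil if nothing matched.
static NSString *LastCaptureOfRepeatedGroup(NSString *input) {
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:@"(A[^AB]+B)+C"
                                                  options:0
                                                    error:NULL];
    NSTextCheckingResult *match =
        [regex firstMatchInString:input
                          options:0
                            range:NSMakeRange(0, input.length)];
    if (match == nil || [match rangeAtIndex:1].location == NSNotFound) {
        return nil;
    }
    // The result carries two ranges: index 0 (whole match) and index 1
    // (the group). Each repetition overwrites group 1, so only the last survives.
    return [input substringWithRange:[match rangeAtIndex:1]];
}
```

Calling LastCaptureOfRepeatedGroup(@"A1BA2BA3BC") should return @"A3B": the quantifier repeats the group three times, but there is still only one group to read.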
Iterating the Groups
If you wanted to match all the substrings while still validating that they are in a valid string (composed of such groups and ending in C), you could use:
A[^AB]+B(?=(?:A[^AB]+B)*C)
The pattern to validate the whole string, of course, would be
^(?:A[^AB]+B)+C$
To iterate over the substrings, something like:
NSString *subject = @"A1BA2BA3BC";
NSError *error = nil;
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"A[^AB]+B(?=(?:A[^AB]+B)*C)" options:0 error:&error];
NSArray *matches = [regex matchesInString:subject options:0 range:NSMakeRange(0, [subject length])];
if ([matches count] > 0) {
    for (NSTextCheckingResult *match in matches) {
        // Each match is one "A...B" substring: A1B, then A2B, then A3B.
        NSRange matchRange = [match range];
        NSString *result = [subject substringWithRange:matchRange];
        NSLog(@"%@", result);
    }
} else {
    // No matches.
}

Related

Regular expression to grab usernames from string

I need to find usernames (like Twitter ones) in strings. For example, if the string is:
"Hello, @username! How are you? And @username2??"
I want to extract @username and @username2.
Do you know how to do it in Objective-C? I found this Python regex for Twitter usernames, but it does not work for me.
I tried it like this, but it is not working:
NSString *comment = @"Hello, @username! How are you? And @username2??";
NSError *error = nil;
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"(?<=^|(?<=[^a-zA-Z0-9-\\.]))@([A-Za-z]+[A-Za-z0-9-]+)" options:0 error:&error];
NSArray *matches = [regex matchesInString:comment options:0 range:NSMakeRange(0, comment.length)];
for (NSTextCheckingResult *match in matches) {
    NSRange wordRange = [match rangeAtIndex:1];
    NSString *username = [comment substringWithRange:wordRange];
    NSLog(@"searchUsersInComment result --> %@", username);
}
(?<=^|(?<=[^a-zA-Z0-9-\\.]))@([A-Za-z]+[A-Za-z0-9-]+) is meant to skip email addresses and grab only usernames. Since your string doesn't contain any emails, you can just use @([A-Za-z]+[A-Za-z0-9-]+)
Your regex is wrong. You need to modify it to:
NSString *comment = @"Hello, @username! How are you? And @username2??";
NSError *error = nil;
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"@([A-Za-z]+[A-Za-z0-9-]+)" options:0 error:&error];
NSArray *matches = [regex matchesInString:comment options:0 range:NSMakeRange(0, comment.length)];
for (NSTextCheckingResult *match in matches) {
    // Group 1 is the username without the leading @.
    NSRange wordRange = [match rangeAtIndex:1];
    NSString *username = [comment substringWithRange:wordRange];
    NSLog(@"searchUsersInComment result --> %@", username);
}
FYI: Any subpattern inside a pair of parentheses will be captured as a group. In practice, this can be used to extract information like phone numbers or emails from all sorts of data.
Imagine, for example, that you had a command line tool to list all the image files you have in the cloud. You could then use a pattern such as ^(IMG\d+\.png)$ to capture and extract the full filename, but if you only wanted to capture the filename without the extension, you could use the pattern ^(IMG\d+)\.png$ which only captures the part before the period.
I suggest reading up on capture groups: http://regexone.com/lesson/capturing_groups

How can I replace one pair of character with multiple occurrence in a string?

Original string: This is a sentence with (noun) (verb) (adverb).
The original sentence has three occurrences of (...). I need to keep the last one intact but replace the rest with @"".
Required string: This is a sentence with (adverb).
I can do it with NSRange, but I am looking for an NSRegularExpression pattern.
Also, which is more efficient: the NSRange approach or NSRegularExpression?
Code:
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"\\(.*?\\)" options:NSRegularExpressionCaseInsensitive error:NULL];
NSString *newString = [regex stringByReplacingMatchesInString:modify options:0 range:NSMakeRange(0, [modify length]) withTemplate:@""];
Output: This is a sentence with
You can obtain the match ranges themselves and do the replacement manually, ignoring the last one.
NSMutableString *newString = [modify mutableCopy];
NSArray<NSTextCheckingResult *> *matches = [regex matchesInString:newString options:0 range:NSMakeRange(0, newString.length)];
if (matches.count >= 2)
{
    // Enumerate backwards so that each replacement doesn't invalidate the other ranges
    for (NSInteger i = (NSInteger)matches.count - 2; i >= 0; i--)
    {
        NSTextCheckingResult *result = matches[i];
        [newString replaceCharactersInRange:result.range withString:@""];
    }
}
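Since the question asked for a pattern, one possible alternative is a lookahead that only matches a parenthesized chunk when another "(" still follows it. This is a sketch under some assumptions: the helper name RemoveAllButLastParenthesized is made up for illustration, parentheses are not nested, and the text is on a single line (the dot does not cross line breaks by default).

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: removes every "(...)" chunk except the last one.
static NSString *RemoveAllButLastParenthesized(NSString *input) {
    // \([^)]*\)   a parenthesized chunk without nested parentheses
    // \s*         whitespace directly after the chunk, removed with it
    // (?=.*\()    match only if another "(" appears later in the text
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:@"\\([^)]*\\)\\s*(?=.*\\()"
                                                  options:0
                                                    error:NULL];
    return [regex stringByReplacingMatchesInString:input
                                           options:0
                                             range:NSMakeRange(0, input.length)
                                      withTemplate:@""];
}
```

For @"This is a sentence with (noun) (verb) (adverb)." this should give @"This is a sentence with (adverb).". Whether it is faster than the manual range loop would have to be measured; the lookahead rescans the rest of the string for each candidate.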

String Trimming with Certain keyword

I have a string like below.
<br><br><br><br><br> SomeHtmlString <br><br><br><br><br>
I want to strip the leading and trailing br tags, like a trim function, while preserving the br tags in the middle of SomeHtmlString.
Is there a concise way to do this?
e.g.
<br><br><br>test1<br><br>test2<br><br><br><br>
to
test1<br><br>test2
Here is a method using regular expressions. It matches only one <br> at a time and removes it from either the beginning or the end of the string.
NSMutableString *replaceMe = [[NSMutableString alloc]
    initWithString:@"<br><br > <br > test<br>test2<br><br>"];
NSError *error = nil;
NSRegularExpression *regex = [NSRegularExpression
    regularExpressionWithPattern:@"^ *<br *> *"
    options:NSRegularExpressionCaseInsensitive
    error:&error];
// Strip one leading <br> per pass until nothing is replaced.
// The options: argument takes NSMatchingOptions, so pass 0 for "none".
while ([regex replaceMatchesInString:replaceMe options:0 range:NSMakeRange(0, replaceMe.length) withTemplate:@""] != 0) {
}
regex = [NSRegularExpression
    regularExpressionWithPattern:@" *<br *> *$"
    options:NSRegularExpressionCaseInsensitive
    error:&error];
// Strip one trailing <br> per pass until nothing is replaced.
while ([regex replaceMatchesInString:replaceMe options:0 range:NSMakeRange(0, replaceMe.length) withTemplate:@""] != 0) {
}
NSLog(@"string=%@", replaceMe);
and that does strip "<br><br > <br > test<br>test2<br><br>" down to test<br>test2.
It's probably not the neatest solution but it is very easy to modify to match different expressions, with different whitespace, for example.
It's also possible to use the regular expressions to match several <br>s in one go:
NSRegularExpression *regex = [NSRegularExpression
    regularExpressionWithPattern:@"^ *(<br *> *)+"
    options:NSRegularExpressionCaseInsensitive
    error:&error];
[regex replaceMatchesInString:replaceMe options:0 range:NSMakeRange(0, replaceMe.length) withTemplate:@""];
// Reuse the same variable for the trailing pattern instead of redeclaring it.
regex = [NSRegularExpression
    regularExpressionWithPattern:@" *(<br *> *)+$"
    options:NSRegularExpressionCaseInsensitive
    error:&error];
[regex replaceMatchesInString:replaceMe options:0 range:NSMakeRange(0, replaceMe.length) withTemplate:@""];
which avoids the looping but is a little harder to modify.
You can do this:
NSString *htmlString = @"<br><br><br><br><br> SomeHtmlString <br><br><br><br><br>";
NSString *pureString = [htmlString stringByReplacingOccurrencesOfString:@"<br>" withString:@""];
So you'll have @" SomeHtmlString " in pureString. Note that this removes every <br>, including any inside SomeHtmlString.
You could use this to strip out the unwanted bits:
[yourString stringByReplacingOccurrencesOfString:@"<br>" withString:@""];
Then you would use something like this to remake your string the way you want it:
NSString *newString = [NSString stringWithFormat:@"<br>%@<br>", yourString];
You might also want to look at stringByTrimmingCharactersInSet:
There are so many things you can do with NSString. Check out the Class Reference: https://developer.apple.com/library/mac/#documentation/Cocoa/Reference/Foundation/Classes/NSString_Class/Reference/NSString.html
EDIT:
substringToIndex: could be your friend here. You can do this to find out if the first 4 characters of your string consist of the characters you want to remove:
// Guard the length first: substringToIndex: throws if the index is out of bounds.
if ([yourString length] >= 4) {
    NSString *subString = [yourString substringToIndex:4];
    if ([subString isEqualToString:@"<br>"]) {
        yourString = [yourString substringFromIndex:4];
    }
}
Then you are creating a new string without those 4 characters. You keep doing this until the first 4 characters are not equal to the ones you want to remove.
You can do something similar at the end of your string using substringFromIndex:. You will need to know the length of your original string to make sure none of your substrings go out of bounds.
Alternative regular expression rendition:
NSString *input = @"<br><br><br><br><br><br>test<br>test2<br><br><br><br><br><br><br><br><br><br>";
__block NSString *output = nil;
NSError *error = nil;
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"^(<br>)*(.*?)(<br>)*$"
                                                                       options:NSRegularExpressionCaseInsensitive
                                                                         error:&error];
[regex enumerateMatchesInString:input
                        options:0
                          range:NSMakeRange(0, [input length])
                     usingBlock:^(NSTextCheckingResult *result, NSMatchingFlags flags, BOOL *stop) {
    // Group 2 is the text between the leading and trailing <br> runs.
    NSRange matchRange = [result rangeAtIndex:2];
    output = [input substringWithRange:matchRange];
}];
if (output) {
    NSLog(@"Found: %@", output);
}
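The regex answers above can also be folded into a single pass by putting the leading and trailing cases into one alternation. The following is only a sketch: the helper name TrimBreakTags is hypothetical, and it assumes the tags look like <br> or <br > (no attributes, no self-closing <br/>).

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: strips leading and trailing <br> runs in one pass,
// leaving <br> tags in the middle of the string untouched.
static NSString *TrimBreakTags(NSString *html) {
    // First alternative:  one or more <br> tags anchored at the start.
    // Second alternative: one or more <br> tags anchored at the end.
    // \s* allows whitespace around and inside the tags.
    NSString *pattern = @"^(?:\\s*<br\\s*>)+\\s*|\\s*(?:<br\\s*>\\s*)+$";
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:pattern
                                                  options:NSRegularExpressionCaseInsensitive
                                                    error:NULL];
    return [regex stringByReplacingMatchesInString:html
                                           options:0
                                             range:NSMakeRange(0, html.length)
                                      withTemplate:@""];
}
```

With the question's example, TrimBreakTags(@"<br><br><br>test1<br><br>test2<br><br><br><br>") should return @"test1<br><br>test2". The result is a new string, so this works on immutable NSString input as well.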

Objective-C NSRegularExpressions, finding first occurrence of numbers in a string

I'm pretty green at regex with Objective-C. I'm having some difficulty with it.
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"\\b([1-9]+)\\b" options:NSRegularExpressionCaseInsensitive error:&regError];
if (regError) {
    NSLog(@"%@", regError.localizedDescription);
}
__block NSString *foundModel = nil;
[regex enumerateMatchesInString:self.model options:kNilOptions range:NSMakeRange(0, [self.model length]) usingBlock:^(NSTextCheckingResult *match, NSMatchingFlags flags, BOOL *stop) {
    foundModel = [self.model substringWithRange:[match rangeAtIndex:0]];
    *stop = YES;
}];
All I'm looking to do is take a string like
150A
And get
150
First, the problems with the regex:
You are using word boundaries (\b) on both sides, which means you are only looking for a number that stands by itself (e.g. 15 but not 150A).
Your character range does not include 0, so it would not capture 150. It needs to be [0-9]+, or better yet \d+.
So to fix this, if you want to capture any number, all you need is \d+. If you want to capture only numbers at the start of a word, put the word boundary only at the beginning: \b\d+.
Now, to get the first occurrence, you can use -[NSRegularExpression rangeOfFirstMatchInString:options:range:]:
NSError *regError = nil;
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"\\b\\d+" options:NSRegularExpressionCaseInsensitive error:&regError];
// Check the return value rather than the error pointer to detect failure.
if (regex == nil) {
    NSLog(@"%@", regError.localizedDescription);
}
NSString *model = @"150A";
NSString *foundModel = nil;
NSRange range = [regex rangeOfFirstMatchInString:model options:kNilOptions range:NSMakeRange(0, [model length])];
if (range.location != NSNotFound)
{
    foundModel = [model substringWithRange:range];
}
NSLog(@"Model: %@", foundModel);
What about .*?(\d+).*? ?
That captures the number in group 1, so you can reference it wherever you want.

Why does NSRegularExpression not honor capture groups in all cases?

Main problem: ObjC can tell me there were six matches when my pattern is @"\\b(\\S+)\\b", but when my pattern is @"A b (c) or (d)", it only reports one match, "c".
Solution
Here's a function which returns the capture groups as an NSArray. I'm an Objective C newbie so I suspect there are better ways to do the clunky work than by creating a mutable array and assigning it at the end to an NSArray.
- (NSArray *)regexWithResults:(NSString *)haystack pattern:(NSString *)strPattern
{
    NSError *error = nil;
    NSMutableArray *arMutable = [[NSMutableArray alloc] init];
    // Pass 0 for "no options": NSRegularExpressionSearch is an NSString
    // compare option, not an NSRegularExpressionOptions value.
    NSRegularExpression *regex = [NSRegularExpression
                                  regularExpressionWithPattern:strPattern
                                  options:0 error:&error];
    NSArray *arTextCheckingResults = [regex matchesInString:haystack
                                                    options:0
                                                      range:NSMakeRange(0, [haystack length])];
    for (NSTextCheckingResult *ntcr in arTextCheckingResults) {
        // Range 0 is the whole match; capture groups start at index 1.
        for (NSUInteger captureIndex = 1; captureIndex < ntcr.numberOfRanges; captureIndex++) {
            NSRange captureRange = [ntcr rangeAtIndex:captureIndex];
            if (captureRange.location == NSNotFound) {
                continue; // this group did not take part in the match
            }
            NSString *capture = [haystack substringWithRange:captureRange];
            //NSLog(@"Found '%@'", capture);
            [arMutable addObject:capture];
        }
    }
    return [arMutable copy];
}
Problem
I am accustomed to using parentheses to match capture groups in Perl in a manner like this:
#!/usr/bin/perl -w
use strict;
my $str = "This sentence has words in it.";
if(my ($what, $inner) = ($str =~ /This (\S+) has (\S+) in it/)) {
print "That $what had '$inner' in it.\n";
}
That code will produce:
That sentence had 'words' in it.
But in Objective C, with NSRegularExpression, we get different results. Sample function:
- (void)regexTest:(NSString *)haystack pattern:(NSString *)strPattern
{
    NSError *error = NULL;
    NSArray *arTextCheckingResults;
    // 0 means "no options"; NSRegularExpressionSearch belongs to NSString comparison.
    NSRegularExpression *regex = [NSRegularExpression
                                  regularExpressionWithPattern:strPattern
                                  options:0
                                  error:&error];
    NSUInteger numberOfMatches = [regex numberOfMatchesInString:haystack options:0 range:NSMakeRange(0, [haystack length])];
    NSLog(@"Pattern: '%@'", strPattern);
    NSLog(@"Search text: '%@'", haystack);
    NSLog(@"Number of matches: %lu", numberOfMatches);
    arTextCheckingResults = [regex matchesInString:haystack options:0 range:NSMakeRange(0, [haystack length])];
    for (NSTextCheckingResult *ntcr in arTextCheckingResults) {
        NSString *match = [haystack substringWithRange:[ntcr rangeAtIndex:1]];
        NSLog(@"Found string '%@'", match);
    }
}
Calls to that test function, and the results show it is able to count the number of words in the string:
NSString *searchText = @"This sentence has words in it.";
[myClass regexTest:searchText pattern:@"\\b(\\S+)\\b"];
Pattern: '\b(\S+)\b'
Search text: 'This sentence has words in it.'
Number of matches: 6
Found string 'This'
Found string 'sentence'
Found string 'has'
Found string 'words'
Found string 'in'
Found string 'it'
But what if the capture groups are explicit, like so?
[myClass regexTest:searchText pattern:@".*This (sentence) has (words) in it.*"];
Result:
Pattern: '.*This (sentence) has (words) in it.*'
Search text: 'This sentence has words in it.'
Number of matches: 1
Found string 'sentence'
Same as above, but with \S+ instead of the actual words:
[myClass regexTest:searchText pattern:@".*This (\\S+) has (\\S+) in it.*"];
Result:
Pattern: '.*This (\S+) has (\S+) in it.*'
Search text: 'This sentence has words in it.'
Number of matches: 1
Found string 'sentence'
How about a wildcard in the middle?
[myClass regexTest:searchText pattern:@"^This (\\S+) .* (\\S+) in it.$"];
Result:
Pattern: '^This (\S+) .* (\S+) in it.$'
Search text: 'This sentence has words in it.'
Number of matches: 1
Found string 'sentence'
References:
NSRegularExpression
NSTextCheckingResult
NSRegularExpression matching options
I think if you change
// returns the range which matched the pattern
NSString *match = [haystack substringWithRange:ntcr.range];
to
// returns the range of the first capture
NSString *match = [haystack substringWithRange:[ntcr rangeAtIndex:1]];
You will get the expected result, for patterns containing a single capture.
See the documentation for -[NSTextCheckingResult rangeAtIndex:]:
A result must have at least one range, but may optionally have more (for example, to represent regular expression capture groups).
Passing rangeAtIndex: the value 0 always returns the value of the range property. Additional ranges, if any, will have indexes from 1 to numberOfRanges-1.
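As a rough Objective-C counterpart to the Perl list assignment in the question, the sketch below collects every capture group of the first match by walking indexes 1 to numberOfRanges-1. The helper name CapturesOfFirstMatch is hypothetical, and representing a non-participating group as an empty string is just one possible convention.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the capture groups (index 1 and up) of the
// first match, or an empty array if the pattern does not match.
static NSArray<NSString *> *CapturesOfFirstMatch(NSString *haystack, NSString *pattern) {
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:pattern options:0 error:NULL];
    NSTextCheckingResult *match =
        [regex firstMatchInString:haystack
                          options:0
                            range:NSMakeRange(0, haystack.length)];
    NSMutableArray<NSString *> *captures = [NSMutableArray array];
    // If match is nil, numberOfRanges is 0 and the loop body never runs.
    for (NSUInteger i = 1; i < match.numberOfRanges; i++) {
        NSRange range = [match rangeAtIndex:i];
        // A group that did not take part in the match reports NSNotFound.
        [captures addObject:(range.location == NSNotFound)
                                ? @""
                                : [haystack substringWithRange:range]];
    }
    return [captures copy];
}
```

For example, CapturesOfFirstMatch(@"This sentence has words in it.", @"This (\\S+) has (\\S+) in it") should return an array holding @"sentence" and @"words", which lines up with $what and $inner in the Perl version.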
Change how you read the NSTextCheckingResult:
- (void)regexTest:(NSString *)haystack pattern:(NSString *)strPattern
{
    NSError *error = NULL;
    NSArray *arTextCheckingResults;
    // 0 means "no options"; NSRegularExpressionSearch belongs to NSString comparison.
    NSRegularExpression *regex = [NSRegularExpression
                                  regularExpressionWithPattern:strPattern
                                  options:0
                                  error:&error];
    NSRange stringRange = NSMakeRange(0, [haystack length]);
    NSUInteger numberOfMatches = [regex numberOfMatchesInString:haystack
                                                        options:0 range:stringRange];
    NSLog(@"Number of matches for '%@' in '%@': %lu", strPattern, haystack, (unsigned long)numberOfMatches);
    // matchesInString: takes NSMatchingOptions, so pass 0 here as well.
    arTextCheckingResults = [regex matchesInString:haystack options:0 range:stringRange];
    for (NSTextCheckingResult *ntcr in arTextCheckingResults) {
        NSRange matchRange = [ntcr rangeAtIndex:1];
        NSString *match = [haystack substringWithRange:matchRange];
        NSLog(@"Found string '%@'", match);
    }
}
NSLog output:
Found string 'words'