Regular expression to grab usernames from string - objective-c

I need to find usernames (like Twitter ones) in strings. For example, if the string is:
"Hello, @username! How are you? And @username2??"
I want to isolate/extract @username and @username2.
Do you know how to do this in Objective-C? I found this Python regex for Twitter usernames, but it does not work for me.
I tried it like this, but it is not working:
NSString *comment = @"Hello, @username! How are you? And @username2??";
NSError *error = nil;
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"(?<=^|(?<=[^a-zA-Z0-9-\\.]))@([A-Za-z]+[A-Za-z0-9-]+)" options:0 error:&error];
NSArray *matches = [regex matchesInString:comment options:0 range:NSMakeRange(0, comment.length)];
for (NSTextCheckingResult *match in matches) {
    NSRange wordRange = [match rangeAtIndex:1];
    NSString *username = [comment substringWithRange:wordRange];
    NSLog(@"searchUsersInComment result --> %@", username);
}

The lookbehind in (?<=^|(?<=[^a-zA-Z0-9-\\.]))@([A-Za-z]+[A-Za-z0-9-]+) is there to skip email addresses and grab only usernames. Since your string doesn't contain any emails, you can just use @([A-Za-z]+[A-Za-z0-9-]+)
Your regex is wrong. You need to modify it to:
NSString *comment = @"Hello, @username! How are you? And @username2??";
NSError *error = nil;
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"@([A-Za-z]+[A-Za-z0-9-]+)" options:0 error:&error];
NSArray *matches = [regex matchesInString:comment options:0 range:NSMakeRange(0, comment.length)];
for (NSTextCheckingResult *match in matches) {
    NSRange wordRange = [match rangeAtIndex:1];
    NSString *username = [comment substringWithRange:wordRange];
    NSLog(@"searchUsersInComment result --> %@", username);
}
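If the input can contain email addresses after all, one option is a negative lookbehind in front of the @. The following is only a sketch: the helper name UsernamesInString and the exact character classes (including the underscore) are my own assumptions, not the pattern from the question.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: collects the names that follow an "@" which is NOT
// directly preceded by a letter, digit, '_', '.' or '-' (as it is in an email).
static NSArray<NSString *> *UsernamesInString(NSString *text) {
    NSError *error = nil;
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:@"(?<![A-Za-z0-9_.-])@([A-Za-z][A-Za-z0-9_-]*)"
                                                  options:0
                                                    error:&error];
    if (regex == nil) {
        NSLog(@"Invalid pattern: %@", error);
        return @[];
    }
    NSMutableArray<NSString *> *names = [NSMutableArray array];
    NSArray<NSTextCheckingResult *> *matches =
        [regex matchesInString:text options:0 range:NSMakeRange(0, text.length)];
    for (NSTextCheckingResult *match in matches) {
        // Range 1 is the capture group, i.e. the name without the "@".
        [names addObject:[text substringWithRange:[match rangeAtIndex:1]]];
    }
    return names;
}
```

For @"Hello, @username! Mail bob@example.com. And @username2??" this should return username and username2 and skip the email address.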
FYI: Any subpattern inside a pair of parentheses will be captured as a group. In practice, this can be used to extract information like phone numbers or emails from all sorts of data.
Imagine for example that you had a command line tool to list all the image files you have in the cloud. You could then use a pattern such as ^(IMG\d+\.png)$ to capture and extract the full filename, but if you only wanted to capture the filename without the extension, you could use the pattern ^(IMG\d+)\.png$ which only captures the part before the period.
I suggest reading up on capture groups: http://regexone.com/lesson/capturing_groups
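To make the group numbering concrete, here is a minimal sketch of the filename example with NSRegularExpression. ImageBaseName is a hypothetical helper name; the point is only that range 0 is the whole match and range 1 is the first parenthesized group.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the file name without its ".png" extension,
// or nil when the name does not look like IMG<digits>.png.
static NSString *ImageBaseName(NSString *fileName) {
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:@"^(IMG\\d+)\\.png$"
                                                  options:0
                                                    error:NULL];
    NSTextCheckingResult *match =
        [regex firstMatchInString:fileName options:0 range:NSMakeRange(0, fileName.length)];
    if (match == nil) {
        return nil;
    }
    // [match rangeAtIndex:0] would be the full "IMG0042.png";
    // [match rangeAtIndex:1] is only the part inside the parentheses.
    return [fileName substringWithRange:[match rangeAtIndex:1]];
}
```

Note that the backslashes are doubled because the pattern lives inside an Objective-C string literal.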

Related

Regular expression substitution problem in Objective-C

Trying to capitalize all tags and running into trouble with substitution. Any idea why "upperCaseString" method isn't working?
NSError *error = nil;
NSMutableString *stringToCap = [NSMutableString stringWithString:@"<kaboom>stuff</kaboom>"];
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"(</?[a-zA-Z].*?>)" options:NSRegularExpressionCaseInsensitive error:&error];
NSMutableString *modifiedString = [NSMutableString stringWithString:[regex stringByReplacingMatchesInString:stringToCap options:0 range:NSMakeRange(0, [stringToCap length]) withTemplate:@"$1".uppercaseString]];
NSLog(@"%@", modifiedString);
Produces: <kaboom>stuff</kaboom> when I expect <KABOOM>stuff</KABOOM>
stringByReplacingMatchesInString:options:range:withTemplate: doesn't work like that. The last argument is a plain NSString, and the string you are passing is the result of the expression @"$1".uppercaseString, which is just @"$1": the uppercasing is applied to the template text itself, before any match is substituted.
A possible algorithm (pseudo code):
for NSTextCheckingResult *match in [regex matchesInString:... options:... range:...] do
    extract the substring at match.range from modified string
    uppercase it
    replace the substring at match.range with uppercased result
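The algorithm above could be sketched in Objective-C roughly as follows. UppercaseTags is a hypothetical helper name, and walking the matches in reverse is my own precaution so that a replacement can never shift a range that still has to be processed.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: uppercases everything that looks like an HTML/XML tag.
static NSString *UppercaseTags(NSString *input) {
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:@"</?[a-zA-Z].*?>"
                                                  options:0
                                                    error:NULL];
    NSMutableString *result = [input mutableCopy];
    NSArray<NSTextCheckingResult *> *matches =
        [regex matchesInString:input options:0 range:NSMakeRange(0, input.length)];
    // Go from the last match to the first, replacing each tag in place.
    for (NSTextCheckingResult *match in [matches reverseObjectEnumerator]) {
        NSString *tag = [input substringWithRange:match.range];
        [result replaceCharactersInRange:match.range withString:tag.uppercaseString];
    }
    return result;
}
```

With the string from the question, UppercaseTags(@"<kaboom>stuff</kaboom>") gives <KABOOM>stuff</KABOOM>. Be aware that this uppercases the whole tag, attributes and their values included.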

How to Get Percentage From a NSString - Objective C

I would like to get the substring of an NSString that contains a percentage value.
For example:
1. Get 10% off with this item.
2. 55% off when you purchase this.
The function should return 10% and 55% respectively.
In Java I use the regex \\d+%
I don't know how to do the same in Objective-C.
I have searched for it, but I am a bit lost.
You should be able to use NSRegularExpression to execute the same regex that you use in Java. There is a good tutorial for NSRegularExpression here:
https://www.raywenderlich.com/30288/nsregularexpression-tutorial-and-cheat-sheet
I was able to accomplish it with this code:
NSString *string = @"10% off with this item";
NSError *error = nil;
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"\\d+%" options:0 error:&error];
NSTextCheckingResult *result = [regex firstMatchInString:string options:0 range:NSMakeRange(0, [string length])];
NSString *substring = [string substringWithRange:result.range];
NSLog(@"%@", substring); // 10%
The key is the NSTextCheckingResult. It contains the NSRange of the match in the original string, so you can grab the matched substring.
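One thing to keep in mind is that firstMatchInString:options:range: returns nil when nothing matches, so it is worth checking the result before using its range. If a string may contain several percentages, matchesInString:options:range: returns all of them. A small sketch (PercentagesInString is a hypothetical helper name):

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns every "<digits>%" occurrence in the text,
// or an empty array when there is none.
static NSArray<NSString *> *PercentagesInString(NSString *text) {
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:@"\\d+%"
                                                  options:0
                                                    error:NULL];
    NSMutableArray<NSString *> *found = [NSMutableArray array];
    NSArray<NSTextCheckingResult *> *matches =
        [regex matchesInString:text options:0 range:NSMakeRange(0, text.length)];
    for (NSTextCheckingResult *match in matches) {
        [found addObject:[text substringWithRange:match.range]];
    }
    return found;
}
```

For "Get 10% off with this item." this yields a single element, 10%, and for a string without any percentage it yields an empty array instead of crashing or returning garbage.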

Objective-C, regular expression match repetition

I have a problem getting a regular expression to match every repetition of a group.
This is a simple example:
NSString *string = @"A1BA2BA3BC";
NSString *pattern = @"(A[^AB]+B)+C";
NSError *error = nil;
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:pattern options:0 error:&error];
NSArray *array = [regex matchesInString:string options:0 range:NSMakeRange(0, [string length])];
The returned array has one element, which contains two ranges: the whole input string and the last captured group, "A3B". The first two groups, "A1B" and "A2B", are not captured as I expected.
I've tried everything from greedy to lazy matching.
A Quantifier Does not Spawn New Capture Groups
Except in .NET, which has CaptureCollections, adding a quantifier to a capture group does not create more captures. The group number stays the same (in your case, Group 1), and the content returned is the last capture of the group.
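A quick way to see this with NSRegularExpression is to look at group 1 of the single match. This is only an illustrative sketch, and LastRepeatedGroup is a hypothetical helper name.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns what group 1 of (A[^AB]+B)+C holds after a match,
// or nil when the text does not match at all.
static NSString *LastRepeatedGroup(NSString *text) {
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:@"(A[^AB]+B)+C"
                                                  options:0
                                                    error:NULL];
    NSTextCheckingResult *match =
        [regex firstMatchInString:text options:0 range:NSMakeRange(0, text.length)];
    if (match == nil) {
        return nil;
    }
    // numberOfRanges is 2 here (whole match + group 1), no matter how many
    // times the quantified group repeated.
    return [text substringWithRange:[match rangeAtIndex:1]];
}
```

For @"A1BA2BA3BC" the helper returns A3B: each repetition overwrites the previous content of group 1.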
Reference
Everything about Regex Capture Groups (see Generating New Capture Groups Automatically)
Iterating the Groups
If you wanted to match all the substrings while still validating that they are in a valid string (composed of such groups and ending in C), you could use:
A[^AB]+B(?=(?:A[^AB]+B)*C)
The whole string, of course, would be
^(?:A[^AB]+B)+C$
To iterate the substrings: something like
NSString *subject = @"A1BA2BA3BC";
NSError *error = nil;
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"A[^AB]+B(?=(?:A[^AB]+B)*C)" options:0 error:&error];
NSArray *matches = [regex matchesInString:subject options:0 range:NSMakeRange(0, [subject length])];
NSUInteger matchCount = [matches count];
if (matchCount) {
    for (NSUInteger matchIdx = 0; matchIdx < matchCount; matchIdx++) {
        NSTextCheckingResult *match = [matches objectAtIndex:matchIdx];
        NSRange matchRange = [match range];
        NSString *result = [subject substringWithRange:matchRange];
        NSLog(@"%@", result); // A1B, then A2B, then A3B
    }
}
else { // Nah... No matches.
}

Objective-c NSRegularExpression doesn't match

I want to search a string with my regex, but my regex doesn't match anything ...
This is the content I have:
<h4>Text</h4>
<p><span>Some text I want to catch</p></span>
<p><span>Some text I want to catch</p></span>
<p><span>Some text I want to catch</p></span>
<h4>Other Text</h4>
<p><span>...<p><span>
...
This is my NSRegularExpression:
NSString *regex = @"<h4>Text</h4>(.*?)<h4>";
NSError *error = nil;
NSRegularExpression *pattern = [NSRegularExpression regularExpressionWithPattern:regex options:0 error:&error];
NSRange rangeOfString = NSMakeRange(0, content.length);
NSArray *matches = [[NSArray alloc] init];
matches = [pattern matchesInString:content options:0 range:rangeOfString];
NSString *matchText;
NSMutableArray *mutableArray = [[NSMutableArray alloc] init];
for (NSTextCheckingResult *match in matches) {
    matchText = [content substringWithRange:[match rangeAtIndex:1]];
    [mutableArray addObject:matchText];
}
I want to catch the text (with tags) between the two headlines, but my NSArray "matches" / NSMutableArray "mutableArray" stays empty.
My other regexes are working ...
I checked this regex in an online regex-evaluator and got my text but in my application this regular expression doesn't work.
Is something wrong with my code or regular expression?
By default, the dot . does not match a line separator.
Since the text that you want to capture spans multiple lines, you have to add the
NSRegularExpressionDotMatchesLineSeparators option:
NSRegularExpression *pattern = [NSRegularExpression regularExpressionWithPattern:regex
                                                                         options:NSRegularExpressionDotMatchesLineSeparators
                                                                           error:&error];
Alternatively, add (?s) to the pattern to add the "s" flag.
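To see the difference the flag makes, here is a minimal sketch that runs the same pattern with and without the inline (?s). FirstGroup is a hypothetical helper, and the sample content is a shortened stand-in for the HTML in the question.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns capture group 1 of the first match, or nil.
static NSString *FirstGroup(NSString *pattern, NSString *content) {
    NSError *error = nil;
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:pattern
                                                  options:0
                                                    error:&error];
    NSTextCheckingResult *match =
        [regex firstMatchInString:content options:0 range:NSMakeRange(0, content.length)];
    return match ? [content substringWithRange:[match rangeAtIndex:1]] : nil;
}
```

With content set to @"<h4>Text</h4>\n<p>one</p>\n<p>two</p>\n<h4>Other Text</h4>", FirstGroup(@"<h4>Text</h4>(.*?)<h4>", content) returns nil because the dot stops at the first newline, while FirstGroup(@"(?s)<h4>Text</h4>(.*?)<h4>", content) returns the two paragraphs including the surrounding newlines.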

String Trimming with Certain keyword

I have a string like below.
<br><br><br><br><br> SomeHtmlString <br><br><br><br><br>
I want to remove the leading and trailing br tags, like a trim function would, while preserving the br tags in the middle of SomeHtmlString.
Is there a function that does this concisely?
e.g.
<br><br><br>test1<br><br>test2<br><br><br><br>
to
test1<br><br>test2
Here is a method using regular expressions. It matches only one <br> at a time and removes it from either the beginning or the end of the string.
NSMutableString *replaceMe = [[NSMutableString alloc]
    initWithString:@"<br><br > <br > test<br>test2<br><br>"];
NSError *error = nil;
// Remove one leading <br> per pass until nothing is replaced any more.
NSRegularExpression *regex = [NSRegularExpression
    regularExpressionWithPattern:@"^ *<br *> *"
                         options:NSRegularExpressionCaseInsensitive
                           error:&error];
while ([regex replaceMatchesInString:replaceMe options:0 range:NSMakeRange(0, replaceMe.length) withTemplate:@""] != 0) {
}
// Same again for the trailing <br> tags.
regex = [NSRegularExpression
    regularExpressionWithPattern:@" *<br *> *$"
                         options:NSRegularExpressionCaseInsensitive
                           error:&error];
while ([regex replaceMatchesInString:replaceMe options:0 range:NSMakeRange(0, replaceMe.length) withTemplate:@""] != 0) {
}
NSLog(@"string=%@", replaceMe);
and that does strip "<br><br > <br > test<br>test2<br><br>" down to test<br>test2.
It's probably not the neatest solution but it is very easy to modify to match different expressions, with different whitespace, for example.
It's also possible to use the regular expressions to match several <br>s in one go:
NSRegularExpression *regex = [NSRegularExpression
    regularExpressionWithPattern:@"^ *(<br *> *)+"
                         options:NSRegularExpressionCaseInsensitive
                           error:&error];
[regex replaceMatchesInString:replaceMe options:0 range:NSMakeRange(0, replaceMe.length) withTemplate:@""];
regex = [NSRegularExpression
    regularExpressionWithPattern:@" *(<br *> *)+$"
                         options:NSRegularExpressionCaseInsensitive
                           error:&error];
[regex replaceMatchesInString:replaceMe options:0 range:NSMakeRange(0, replaceMe.length) withTemplate:@""];
which avoids the looping but is a little harder to modify.
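The two passes can probably also be folded into a single pattern with an alternation, one branch anchored at the start and one at the end. This is a sketch under my own assumptions: TrimBreakTags is a hypothetical helper name, and I use \s instead of a literal space so that tabs and newlines around the tags are trimmed as well.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: strips runs of <br> (plus surrounding whitespace) from
// both ends of the string and leaves the <br> tags in the middle alone.
static NSString *TrimBreakTags(NSString *html) {
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:@"^\\s*(?:<br\\s*>\\s*)+|\\s*(?:<br\\s*>\\s*)+$"
                                                  options:NSRegularExpressionCaseInsensitive
                                                    error:NULL];
    return [regex stringByReplacingMatchesInString:html
                                           options:0
                                             range:NSMakeRange(0, html.length)
                                      withTemplate:@""];
}
```

TrimBreakTags(@"<br><br><br>test1<br><br>test2<br><br><br><br>") gives test1<br><br>test2. The inner tags survive because neither branch can match there: the first needs the start of the string and the second needs the end.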
You can do this:
NSString* htmlString= @"<br><br><br><br><br> SomeHtmlString <br><br><br><br><br>";
NSString* pureString= [htmlString stringByReplacingOccurrencesOfString: @"<br>" withString: @""];
So you'll have @" SomeHtmlString " in pureString.
You could use this to strip out the unwanted bits:
[yourString stringByReplacingOccurrencesOfString:@"<br>" withString:@""];
Then you would use something like this to remake your string the way you want it:
NSString *newString = [NSString stringWithFormat:@"<br>%@<br>", yourString];
You might also want to look at stringByTrimmingCharactersInSet:
There are so many things you can do with NSString. Check out the Class Reference: https://developer.apple.com/library/mac/#documentation/Cocoa/Reference/Foundation/Classes/NSString_Class/Reference/NSString.html
EDIT:
substringToIndex: could be your friend here. You can do this to find out if the first 4 characters of your string consist of the characters you want to remove:
NSString *subString = [yourString substringToIndex:4];
if ([subString isEqualToString:@"<br>"]) {
    yourString = [yourString substringFromIndex:4];
}
Then you are creating a new string without those 4 characters. You keep doing this until the first 4 characters are not equal to the ones you want to remove.
You can do something similar at the end of your string using substringFromIndex. You will need to know the length of your original string to make sure none of your substrings go out of bounds.
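A variation on the same idea, sketched with hasPrefix: and hasSuffix: so there is no index arithmetic to get wrong (substringToIndex: raises an exception when the string is shorter than the index). TrimBreakTagsByPrefix is a hypothetical helper name, and it only handles the exact tag text "<br>".

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: repeatedly drops a leading or trailing "<br>".
static NSString *TrimBreakTagsByPrefix(NSString *html) {
    NSString *tag = @"<br>";
    NSString *result = html;
    // Drop tags from the front until the string starts with something else.
    while ([result hasPrefix:tag]) {
        result = [result substringFromIndex:tag.length];
    }
    // Drop tags from the back in the same way.
    while ([result hasSuffix:tag]) {
        result = [result substringToIndex:result.length - tag.length];
    }
    return result;
}
```

Unlike the regex versions this does not tolerate whitespace or variants such as <br >, so it fits best when the markup is known to be uniform.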
Alternative regular expression rendition:
NSString *input = @"<br><br><br><br><br><br>test<br>test2<br><br><br><br><br><br><br><br><br><br>";
__block NSString *output;
NSError *error;
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"^(<br>)*(.*?)(<br>)*$"
                                                                       options:NSRegularExpressionCaseInsensitive
                                                                         error:&error];
[regex enumerateMatchesInString:input
                        options:0
                          range:NSMakeRange(0, [input length])
                     usingBlock:^(NSTextCheckingResult *result, NSMatchingFlags flags, BOOL *stop) {
                         NSRange matchRange = [result rangeAtIndex:2];
                         output = [input substringWithRange:matchRange];
                     }];
if (output)
    NSLog(@"Found: %@", output);