NSRegularExpression to extract url between 2 strings - objective-c

I'm using an NSRegularExpression to extract a URL between 2 strings, here is the whole string:
<a href="/url?q=http://www.myurl.com/videos/how-to/&sa=
I need to extract the part between /url?q= and &sa=.

It can be found and dug out using a regex with positive look-behind and look-ahead, like this:
NSString *orgStr = @"<a href=\"/url?q=http://www.myurl.com/videos/how-to/&sa=";
NSString *URLRegExPattern = @"(?<=url\\?q=).*(?=&sa=)";
NSError *regExErr;
NSRegularExpression *URLRegEx = [NSRegularExpression regularExpressionWithPattern:URLRegExPattern
                                                                          options:0
                                                                            error:&regExErr];
NSString *URLString = nil;
NSRange range = [URLRegEx rangeOfFirstMatchInString:orgStr
                                            options:0
                                              range:NSMakeRange(0, orgStr.length)];
// rangeOfFirstMatchInString: returns {NSNotFound, 0} when nothing matches
if (!NSEqualRanges(range, NSMakeRange(NSNotFound, 0))) {
    URLString = [orgStr substringWithRange:range];
}
NSLog(@"URL: %@", URLString);

Alternatively, you can use the NSString method - (NSArray *)componentsSeparatedByString:(NSString *)separator
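As a rough sketch of how componentsSeparatedByString: could be applied to this problem, the function below splits once on the leading marker and once on the trailing marker. The helper name SubstringBetween and its parameters are made up for illustration and are not part of Foundation.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the text between the first `start` marker and
// the first `end` marker that follows it, or nil if either marker is missing.
static NSString *SubstringBetween(NSString *source, NSString *start, NSString *end) {
    // Index 1 holds the text that follows the first occurrence of `start`.
    NSArray<NSString *> *afterStart = [source componentsSeparatedByString:start];
    if (afterStart.count < 2) {
        return nil;
    }
    // Index 0 holds the text that precedes the first occurrence of `end`.
    NSArray<NSString *> *beforeEnd = [afterStart[1] componentsSeparatedByString:end];
    if (beforeEnd.count < 2) {
        return nil;
    }
    return beforeEnd[0];
}
```

Calling SubstringBetween(orgStr, @"/url?q=", @"&sa=") on the string from the question should give http://www.myurl.com/videos/how-to/. Unlike the greedy .* in the regex version, this stops at the first &sa= rather than the last one.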

Related

How to Get Percentage From a NSString - Objective C

I would like to get a substring for a NSString that contains a percentage value.
For example:
1. Get 10% off with this item.
2. 55% off when you purchase this.
The function should return 10% and 55% respectively.
I am using regex in Java \\d+%
I don't know how to do the same in objective c.
I have searched it but I am a bit lost.
You should be able to use NSRegularExpression to execute the same regex that you use in java. There is a good tutorial for NSRegularExpression here.
https://www.raywenderlich.com/30288/nsregularexpression-tutorial-and-cheat-sheet
I was able to accomplish it with this code:
NSString *string = @"10% off with this item";
NSError *error = nil;
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"\\d+%" options:0 error:&error];
NSTextCheckingResult *result = [regex firstMatchInString:string options:0 range:NSMakeRange(0, [string length])];
NSString *substring = [string substringWithRange:result.range];
NSLog(@"%@", substring); // 10%
The key is the NSTextCheckingResult. It contains the NSRange of the match in the original string, so you can extract the matched substring.
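If a string may contain more than one percentage, the same idea extends to matchesInString:options:range:, which returns one NSTextCheckingResult per match. The following is only a sketch; the function name PercentagesInString is an assumption made for this example.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: collects every "<digits>%" substring found in `text`.
static NSArray<NSString *> *PercentagesInString(NSString *text) {
    NSError *error = nil;
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:@"\\d+%" options:0 error:&error];
    if (regex == nil) {
        return @[]; // the pattern failed to compile
    }
    NSMutableArray<NSString *> *found = [NSMutableArray array];
    NSArray<NSTextCheckingResult *> *matches =
        [regex matchesInString:text options:0 range:NSMakeRange(0, text.length)];
    for (NSTextCheckingResult *match in matches) {
        // Each result carries the range of one match in the original string.
        [found addObject:[text substringWithRange:match.range]];
    }
    return found;
}
```

For the two sample sentences in the question this should return an array holding 10% and 55% respectively, and an empty array when no percentage is present.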

How to use this regex with NSRegularExpression?

I'm trying to extract the youtube video id from a URL using the regex from this answer. However I can't figure out how to format it correctly to work with NSRegularExpression. I have tried escaping the backslashes for C, as well as using escapedTemplateForString and escapedPatternForString. I also tried adding a backslash before the opening/closing brackets. Each case returns NSNotFound for all URLs I try.
// Original: /^.*(?:(?:youtu\.be\/|v\/|vi\/|u\/\w\/|embed\/)|(?:(?:watch)?\?v(?:i)?=|\&v(?:i)?=))([^#\&\?]*).*/
NSString *c_escaped = @"/^.*(youtu.be\\/|v\\/|e\\/|u\\/\\w+\\/|embed\\/|v=)([^#\\&\\?]*).*/";
NSString *template = [NSRegularExpression escapedTemplateForString:c_escaped]; // "/^.*(youtu.be\\/|v\\/|e\\/|u\\/\\w+\\/|embed\\/|v=)([^#\\&\\?]*).*/"
NSString *pattern = [NSRegularExpression escapedPatternForString:c_escaped]; // "\/\^\.\*\(youtu\.be\\\/\|v\\\/\|e\\\/\|u\\\/\\w\+\\\/\|embed\\\/\|v=\)\(\[\^#\\&\\\?]\*\)\.\*\/"
NSRegularExpression *expr = [NSRegularExpression regularExpressionWithPattern:c_escaped
options:0
error:&error];
NSRange range = [expr rangeOfFirstMatchInString:self options:0 range:NSMakeRange(0, self.length)];
NSRegularExpression *expr1 = [NSRegularExpression regularExpressionWithPattern:template
options:0
error:&error];
NSRange range1 = [expr1 rangeOfFirstMatchInString:self options:0 range:NSMakeRange(0, self.length)];
NSRegularExpression *expr2 = [NSRegularExpression regularExpressionWithPattern:pattern
options:0
error:&error];
NSRange range2 = [expr2 rangeOfFirstMatchInString:self options:0 range:NSMakeRange(0, self.length)];
These are the URLs I've tested against:
NSArray *urls = @[
    @"//www.youtube-nocookie.com/embed/up_lNV-yoK4?rel=0",
    @"http://www.youtube.com/user/Scobleizer#p/u/1/1p3vcRhsYGo",
    @"http://www.youtube.com/watch?v=cKZDdG9FTKY&feature=channel",
    @"http://www.youtube.com/watch?v=yZ-K7nCVnBI&playnext_from=TL&videos=osPknwzXEas&feature=sub",
    @"http://www.youtube.com/ytscreeningroom?v=NRHVzbJVx8I",
    @"http://www.youtube.com/user/SilkRoadTheatre#p/a/u/2/6dwqZw0j_jY",
    @"http://youtu.be/6dwqZw0j_jY",
    @"http://www.youtube.com/watch?v=6dwqZw0j_jY&feature=youtu.be",
    @"http://youtu.be/afa-5HQHiAs",
    @"http://www.youtube.com/user/Scobleizer#p/u/1/1p3vcRhsYGo?rel=0",
    @"http://www.youtube.com/watch?v=cKZDdG9FTKY&feature=channel",
    @"http://www.youtube.com/watch?v=yZ-K7nCVnBI&playnext_from=TL&videos=osPknwzXEas&feature=sub",
    @"http://www.youtube.com/ytscreeningroom?v=NRHVzbJVx8I",
    @"http://www.youtube.com/embed/nas1rJpm7wY?rel=0",
    @"http://www.youtube.com/watch?v=peFZbP64dsU",
    @"http://youtube.com/v/dQw4w9WgXcQ?feature=youtube_gdata_player",
    @"http://youtube.com/vi/dQw4w9WgXcQ?feature=youtube_gdata_player",
    @"http://youtube.com/?v=dQw4w9WgXcQ&feature=youtube_gdata_player",
    @"http://www.youtube.com/watch?v=dQw4w9WgXcQ&feature=youtube_gdata_player",
    @"http://youtube.com/?vi=dQw4w9WgXcQ&feature=youtube_gdata_player",
    @"http://youtube.com/watch?v=dQw4w9WgXcQ&feature=youtube_gdata_player",
    @"http://youtube.com/watch?vi=dQw4w9WgXcQ&feature=youtube_gdata_player",
    @"http://youtu.be/dQw4w9WgXcQ?feature=youtube_gdata_player"];
You need to delete the leading and trailing slashes. The problem is that you're adapting this from JavaScript, which uses "/" to delimit regex literals; the slashes are not part of the pattern itself. You also need to escape the backslashes so the Obj-C compiler produces the right string, but that's all. Try this:
@implementation NSString (youtube)
- (BOOL)isYouTubeURL
{
    NSString *youtubePattern = @"^.*(?:(?:youtu\\.be\\/|v\\/|vi\\/|u\\/\\w\\/|embed\\/)|(?:(?:watch)?\\?v(?:i)?=|\\&v(?:i)?=))([^#\\&\\?]*).*";
    NSRegularExpression *expr = [NSRegularExpression regularExpressionWithPattern:youtubePattern
                                                                          options:0
                                                                            error:nil];
    NSRange range = [expr rangeOfFirstMatchInString:self options:0 range:NSMakeRange(0, self.length)];
    return range.location != NSNotFound;
}
@end
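The method above only reports whether the pattern matches. Since the question asks for the video id itself, here is a sketch that reads capture group 1 through rangeAtIndex:. It reuses the same pattern; the function name YouTubeVideoID is an assumption made for this example.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the text captured by group 1 of the pattern
// above (the video id), or nil when the URL does not match.
static NSString *YouTubeVideoID(NSString *urlString) {
    NSString *pattern = @"^.*(?:(?:youtu\\.be\\/|v\\/|vi\\/|u\\/\\w\\/|embed\\/)|(?:(?:watch)?\\?v(?:i)?=|\\&v(?:i)?=))([^#\\&\\?]*).*";
    NSRegularExpression *expr =
        [NSRegularExpression regularExpressionWithPattern:pattern options:0 error:nil];
    NSTextCheckingResult *match =
        [expr firstMatchInString:urlString options:0 range:NSMakeRange(0, urlString.length)];
    if (match == nil) {
        return nil;
    }
    // Index 0 is the whole match; index 1 is the first capture group.
    NSRange idRange = [match rangeAtIndex:1];
    if (idRange.location == NSNotFound) {
        return nil;
    }
    return [urlString substringWithRange:idRange];
}
```

For example, YouTubeVideoID(@"http://www.youtube.com/watch?v=peFZbP64dsU") should return peFZbP64dsU.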

Regular expressions to filter text

In Objective-C I have a string as follows:
CAST(407704969.734560,
I want to extract the digits:
407704969.734560
The code I'm using is this one:
NSString *stringToCheck = @"CAST(407704969.734560,";
NSRange searchedRange = NSMakeRange(0, [stringToCheck length]);
NSString *pattern = @"(?<=CAST\\()(\\d+?.?\\d+?)(?=,)";
NSError *error = nil;
NSRegularExpression* regex = [NSRegularExpression regularExpressionWithPattern:pattern options:0 error:&error];
NSArray* matches = [regex matchesInString:stringToCheck options:0 range: searchedRange];
for (NSTextCheckingResult* match in matches) {
    NSString* matchText = [stringToCheck substringWithRange:[match range]];
    NSLog(@"match: %@", matchText);
}
I guess the problem is in the regex, since I can't find any tutorial about it.
You could try using following regex:
PATTERN
CAST\((\d+?\.?\d+?),
INPUT
CAST(407704969.734560,
OUTPUT
Match 1: CAST(407704969.734560,
Group 1: 407704969.734560
Or if you only need the digits try this:
PATTERN
(?<=CAST\()(\d+?\.?\d+?)(?=,)
INPUT
CAST(407704969.734560,
OUTPUT
Match 1: 407704969.734560
And here is a short but really nice regex tutorial:
www.codeproject.com
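To use the first pattern from Objective-C, each backslash has to be doubled inside the string literal, and the digits can then be read from capture group 1. The snippet below is a minimal sketch of that; the function name NumberInsideCast is made up for illustration.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the number that follows "CAST(" and precedes
// the comma, or nil when the text does not match.
static NSString *NumberInsideCast(NSString *text) {
    // Regex: CAST\((\d+?\.?\d+?),  written with doubled backslashes for Obj-C.
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:@"CAST\\((\\d+?\\.?\\d+?),"
                                                  options:0
                                                    error:nil];
    NSTextCheckingResult *match =
        [regex firstMatchInString:text options:0 range:NSMakeRange(0, text.length)];
    if (match == nil) {
        return nil;
    }
    // Group 1 holds only the digits and the decimal point.
    return [text substringWithRange:[match rangeAtIndex:1]];
}
```

With the input from the question, NumberInsideCast(@"CAST(407704969.734560,") should return 407704969.734560.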

String Trimming with Certain keyword

I have a string like below.
<br><br><br><br><br> SomeHtmlString <br><br><br><br><br>
I want to remove the leading and trailing br tags, like a trim function, while preserving the br tags in the middle of SomeHtmlString.
Is there a function that does this concisely?
e.g.
<br><br><br>test1<br><br>test2<br><br><br><br>
to
test1<br><br>test2
Here is a method using regular expressions. It matches only one tag at a time and removes it at either the beginning or the end of the string.
NSMutableString *replaceMe = [[NSMutableString alloc]
    initWithString:@"<br><br > <br > test<br>test2<br><br>"];
NSError *error = NULL;
// Strip one leading <br> per pass until nothing is replaced
NSRegularExpression *regex = [NSRegularExpression
    regularExpressionWithPattern:@"^ *<br *> *"
    options:NSRegularExpressionCaseInsensitive
    error:&error];
do {
    ;
} while ([regex replaceMatchesInString:replaceMe options:0 range:NSMakeRange(0, replaceMe.length) withTemplate:@""] != 0);
// Strip one trailing <br> per pass until nothing is replaced
regex = [NSRegularExpression
    regularExpressionWithPattern:@" *<br *> *$"
    options:NSRegularExpressionCaseInsensitive
    error:&error];
do {
    ;
} while ([regex replaceMatchesInString:replaceMe options:0 range:NSMakeRange(0, replaceMe.length) withTemplate:@""] != 0);
NSLog(@"string=%@", replaceMe);
and that does strip "<br><br > <br > test<br>test2<br><br>" down to test<br>test2.
It's probably not the neatest solution but it is very easy to modify to match different expressions, with different whitespace, for example.
It's also possible to use the regular expressions to match several <br>s in one go:
NSRegularExpression *regex = [NSRegularExpression
    regularExpressionWithPattern:@"^ *(<br *> *)+"
    options:NSRegularExpressionCaseInsensitive
    error:&error];
[regex replaceMatchesInString:replaceMe options:0 range:NSMakeRange(0, replaceMe.length) withTemplate:@""];
regex = [NSRegularExpression
    regularExpressionWithPattern:@" *(<br *> *)+$"
    options:NSRegularExpressionCaseInsensitive
    error:&error];
[regex replaceMatchesInString:replaceMe options:0 range:NSMakeRange(0, replaceMe.length) withTemplate:@""];
which avoids the looping but is a little harder to modify.
You can do this:
NSString* htmlString= @"<br><br><br><br><br> SomeHtmlString <br><br><br><br><br>";
NSString* pureString= [htmlString stringByReplacingOccurrencesOfString: @"<br>" withString: @""];
So you'll have @" SomeHtmlString " in pureString.
You could use this to strip out the unwanted bits:
[yourString stringByReplacingOccurrencesOfString:@"<br>" withString:@""];
Then you would use something like this to remake your string the way you want it:
NSString *newString = [NSString stringWithFormat:@"<br>%@<br>", yourString];
You might also want to look at stringByTrimmingCharactersInSet:
There are so many things you can do with NSString. Check out the Class Reference: https://developer.apple.com/library/mac/#documentation/Cocoa/Reference/Foundation/Classes/NSString_Class/Reference/NSString.html
EDIT:
substringToIndex: could be your friend here. You can do this to find out if the first 4 characters of your string consist of the characters you want to remove:
NSString *subString = [yourString substringToIndex:4];
if ([subString isEqualToString:@"<br>"]) {
    yourString = [yourString substringFromIndex:4];
}
This creates a new string without those 4 characters. Keep doing this until the first 4 characters are no longer equal to the ones you want to remove.
You can do something similar at the end of your string using substringFromIndex. You will need to know the length of your original string to make sure none of your substrings go out of bounds.
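The repeated check at both ends could be sketched as two loops. This version swaps substringToIndex: comparisons for hasPrefix: and hasSuffix:, which sidesteps the out-of-bounds concern because they simply return NO for strings shorter than the tag. The function name TrimTag is an assumption made for this example.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: removes every leading and trailing copy of `tag`
// from `text`, leaving copies in the middle untouched.
static NSString *TrimTag(NSString *text, NSString *tag) {
    if (tag.length == 0) {
        return text; // nothing to trim
    }
    NSString *result = text;
    // Drop leading copies of the tag one at a time.
    while ([result hasPrefix:tag]) {
        result = [result substringFromIndex:tag.length];
    }
    // Drop trailing copies of the tag one at a time.
    while ([result hasSuffix:tag]) {
        result = [result substringToIndex:result.length - tag.length];
    }
    return result;
}
```

Applied to the example from the question, TrimTag(@"<br><br><br>test1<br><br>test2<br><br><br><br>", @"<br>") should give test1<br><br>test2. Note that it matches the tag exactly, so variants such as <br > or surrounding spaces are not handled.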
Alternative regular expression rendition:
NSString *input = @"<br><br><br><br><br><br>test<br>test2<br><br><br><br><br><br><br><br><br><br>";
__block NSString *output;
NSError *error;
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"^(<br>)*(.*?)(<br>)*$"
                                                                       options:NSRegularExpressionCaseInsensitive
                                                                         error:&error];
[regex enumerateMatchesInString:input
                        options:0
                          range:NSMakeRange(0, [input length])
                     usingBlock:^(NSTextCheckingResult *result, NSMatchingFlags flags, BOOL *stop) {
    // Group 2 is the text between the leading and trailing <br> runs
    NSRange matchRange = [result rangeAtIndex:2];
    output = [input substringWithRange:matchRange];
}];
if (output)
    NSLog(@"Found: %@", output);

How to strip down the string?

I have a really long string, and I just want to extract a certain substring from it. How can I do that?
For example, I have:
this is the image <img src="http://vnexpress.net/Files/Subject/3b/bd/67/6f/chungkhoan-xanhdiem2.jpg"> and it is very beautiful.
Now I want to extract from this long string only http://vnexpress.net/Files/Subject/3b/bd/67/6f/chungkhoan-xanhdiem2.jpg
Please show me how I can do this.
You can use regular expressions for this:
NSRegularExpression* regex = [[NSRegularExpression alloc] initWithPattern:@"src=\"([^\"]*)\"" options:NSRegularExpressionCaseInsensitive error:nil];
NSString *text = @"this is the image <img src=\"http://vnexpress.net/Files/Subject/3b/bd/67/6f/chungkhoan-xanhdiem2.jpg\"> and it is very beautiful.";
NSArray *imgs = [regex matchesInString:text options:0 range:NSMakeRange(0, [text length])];
if (imgs.count != 0) {
    NSTextCheckingResult* r = [imgs objectAtIndex:0];
    NSLog(@"%@", [text substringWithRange:[r rangeAtIndex:1]]);
}
This regular expression is the heart of the solution:
src="([^"]*)"
It matches the src attribute and captures the content between the quotes (note the pair of parentheses). This capture is then retrieved with [r rangeAtIndex:1] and used to extract the part of the string that you are looking for.
You should use a regular expression, probably using the NSRegularExpression class.
Here's an example that does exactly what you want (from here):
- (NSString *)stripOutHttp:(NSString *)httpLine
{
// Setup an NSError object to catch any failures
NSError *error = NULL;
// create the NSRegularExpression object and initialize it with a pattern
// the pattern will match any http or https url, with option case insensitive
NSRegularExpression *regex = [NSRegularExpression
regularExpressionWithPattern:@"https?://([-\\w\\.]+)+(:\\d+)?(/([\\w/_\\.]*(\\?\\S+)?)?)?"
options:NSRegularExpressionCaseInsensitive
error:&error];
// create an NSRange object using our regex object for the first match in the string httpline
NSRange rangeOfFirstMatch = [regex rangeOfFirstMatchInString:httpLine
options:0
range:NSMakeRange(0, [httpLine length])];
// check that our NSRange object is not equal to range of NSNotFound
if (!NSEqualRanges(rangeOfFirstMatch, NSMakeRange(NSNotFound, 0)))
{
// Since we know that we found a match, get the substring from the parent
// string by using our NSRange object
NSString *substringForFirstMatch = [httpLine substringWithRange:rangeOfFirstMatch];
NSLog(@"Extracted URL: %@",substringForFirstMatch);
// return the matching string
return substringForFirstMatch;
}
return NULL;
}
Another option is NSScanner:
NSString *urlString = nil;
NSString *htmlString = @"..."; // your string
NSScanner *scanner = [NSScanner scannerWithString:htmlString];
// Skip ahead to the first <img tag
[scanner scanUpToString:@"<img" intoString:nil];
if (![scanner isAtEnd]) {
    // Skip ahead to the start of the URL
    [scanner scanUpToString:@"http" intoString:nil];
    // Stop at the closing quote (or the end of the tag) so the quote is not included
    NSCharacterSet *charset = [NSCharacterSet characterSetWithCharactersInString:@"\">"];
    [scanner scanUpToCharactersFromSet:charset intoString:&urlString];
}
NSLog(@"%@", urlString);