How to strip down the string? - objective-c

I have a really long string, and I just want to extract a certain substring from it. How can I do that?
For example, I have:
this is the image <img src="http://vnexpress.net/Files/Subject/3b/bd/67/6f/chungkhoan-xanhdiem2.jpg"> and it is very beautiful.
and I want to extract only http://vnexpress.net/Files/Subject/3b/bd/67/6f/chungkhoan-xanhdiem2.jpg from this long string.
Please show me how I can do this.

You can use regular expressions for this:
NSRegularExpression* regex = [[NSRegularExpression alloc] initWithPattern:@"src=\"([^\"]*)\"" options:NSRegularExpressionCaseInsensitive error:nil];
NSString *text = @"this is the image <img src=\"http://vnexpress.net/Files/Subject/3b/bd/67/6f/chungkhoan-xanhdiem2.jpg\"> and it is very beautiful.";
NSArray *imgs = [regex matchesInString:text options:0 range:NSMakeRange(0, [text length])];
if (imgs.count != 0) {
    NSTextCheckingResult* r = [imgs objectAtIndex:0];
    NSLog(@"%@", [text substringWithRange:[r rangeAtIndex:1]]);
}
This regular expression is the heart of the solution:
src="([^"]*)"
It matches the src attribute and captures the content between the quotes (note the pair of parentheses). This capture is then retrieved with [r rangeAtIndex:1] and used to extract the part of the string that you are looking for.
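For reuse, the same capture-group approach can be wrapped in a small function. This is a sketch: the name FirstSrcAttribute is made up for illustration, and it assumes the src value is in double quotes and that only the first one is wanted.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the value of the first src="..." attribute
// in `html`, or nil if there is none. Assumes double-quoted values.
static NSString *FirstSrcAttribute(NSString *html) {
    NSError *error = nil;
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:@"src=\"([^\"]*)\""
                                                  options:NSRegularExpressionCaseInsensitive
                                                    error:&error];
    if (regex == nil) {
        NSLog(@"Invalid pattern: %@", error);
        return nil;
    }
    NSTextCheckingResult *match = [regex firstMatchInString:html
                                                    options:0
                                                      range:NSMakeRange(0, html.length)];
    if (match == nil) {
        return nil;
    }
    // Capture group 1 holds the text between the quotes.
    return [html substringWithRange:[match rangeAtIndex:1]];
}
```

firstMatchInString:options:range: stops after the first hit, so there is no need to build the full matches array when only one result is wanted.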

You should use a regular expression, probably using the NSRegularExpression class.
Here's an example that does exactly what you want (from here):
- (NSString *)stripOutHttp:(NSString *)httpLine
{
    // Set up an NSError object to catch any failures
    NSError *error = nil;
    // create the NSRegularExpression object and initialize it with a pattern
    // the pattern will match any http or https url, with option case insensitive
    // ('-' is allowed in the path so names like chungkhoan-xanhdiem2.jpg are not cut off)
    NSRegularExpression *regex = [NSRegularExpression
        regularExpressionWithPattern:@"https?://([-\\w\\.]+)+(:\\d+)?(/([-\\w/_\\.]*(\\?\\S+)?)?)?"
                             options:NSRegularExpressionCaseInsensitive
                               error:&error];
    // get the NSRange of the first match in the string httpLine
    NSRange rangeOfFirstMatch = [regex rangeOfFirstMatchInString:httpLine
                                                         options:0
                                                           range:NSMakeRange(0, [httpLine length])];
    // check that the range is not {NSNotFound, 0}
    if (!NSEqualRanges(rangeOfFirstMatch, NSMakeRange(NSNotFound, 0)))
    {
        // Since we know that we found a match, get the substring from the parent
        // string by using our NSRange object
        NSString *substringForFirstMatch = [httpLine substringWithRange:rangeOfFirstMatch];
        NSLog(@"Extracted URL: %@", substringForFirstMatch);
        // return the matching string
        return substringForFirstMatch;
    }
    return nil;
}

NSString *urlString = nil;
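NSString *htmlString = @"this is the image <img src=\"http://vnexpress.net/Files/Subject/3b/bd/67/6f/chungkhoan-xanhdiem2.jpg\"> and it is very beautiful.";
NSScanner *scanner = [NSScanner scannerWithString:htmlString];
[scanner scanUpToString:@"<img" intoString:nil];
if (![scanner isAtEnd]) {
    [scanner scanUpToString:@"http" intoString:nil];
    // stop at the closing quote (or '>' for unquoted values) so the quote is not included
    NSCharacterSet *charset = [NSCharacterSet characterSetWithCharactersInString:@"\">"];
    [scanner scanUpToCharactersFromSet:charset intoString:&urlString];
}
NSLog(@"%@", urlString);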
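A variation on the scanner approach (a sketch, not the answer's exact code) anchors on src=" rather than on http, so it also works for relative URLs, and stops at the closing quote. The function name SrcValueByScanning is hypothetical, and the code assumes a double-quoted attribute.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the text between the first `src="` and the
// next double quote, or nil if the attribute is missing, unterminated or empty.
static NSString *SrcValueByScanning(NSString *html) {
    NSScanner *scanner = [NSScanner scannerWithString:html];
    // Don't silently skip whitespace; we want the value exactly as written.
    scanner.charactersToBeSkipped = nil;
    // Move to the start of the attribute and step over `src="`.
    [scanner scanUpToString:@"src=\"" intoString:NULL];
    if (![scanner scanString:@"src=\"" intoString:NULL]) {
        return nil;
    }
    NSString *value = nil;
    // Collect everything up to the closing quote.
    [scanner scanUpToString:@"\"" intoString:&value];
    if (![scanner scanString:@"\"" intoString:NULL]) {
        return nil; // no closing quote
    }
    return value;
}
```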

Related

NSString doesn't get overwritten with a new value

I'm trying to remove all the URLs from a text:
- (NSString *)cleanText:(NSString *)text{
    NSString *string = @"This is a sample of a http://abc.com/efg.php?EFAei687e3EsA sentence with a URL within it.";
    NSDataDetector *linkDetector = [NSDataDetector dataDetectorWithTypes:NSTextCheckingTypeLink error:nil];
    NSArray *matches = [linkDetector matchesInString:string options:0 range:NSMakeRange(0, [string length])];
    for (NSTextCheckingResult *match in matches) {
        if ([match resultType] == NSTextCheckingTypeLink) {
            NSString *matchingString = [match description];
            NSLog(@"found URL: %@", matchingString);
            string = [string stringByReplacingOccurrencesOfString:matchingString withString:@""];
        }
    }
    NSLog(@"%@", string);
    return string;
}
However, the string is returned unchanged (even though there is a match).
Update: console output:
found URL: <NSLinkCheckingResult: 0xb2b03f0>{22, 36}{http://abc.com/efg.php?EFAei687e3EsA}
2013-10-02 20:19:52.772
This is a sample of a http://abc.com/efg.php?EFAei687e3EsA sentence with a URL within it and a number 097843.
A working recipe was provided by @Raphael Schweikert.
The problem is that [match description] does not return the matching string; it returns a string that looks like this:
"<NSLinkCheckingResult: 0x8cd5150>{22,36}{http://abc.com/efg.php?EFAei687e3EsA}"
To replace the matched URL in your string, you should do:
string = [string stringByReplacingCharactersInRange:match.range withString:@""];
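According to Apple’s own Douglas Davidson, the matches are guaranteed to be in the order they appear in the string. So instead of sorting the matches array (as I suggested), it can just be iterated in reverse. Going back to front matters because each replacement shortens the string and shifts the ranges of every later match, while the ranges of earlier matches stay valid.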
The whole code sample would then look as follows:
NSString *string = @"This is a sample of a http://abc.com/efg.php sentence (http://abc.com/efg.php) with a URL within it and some more text afterwards so there is no index error.";
NSDataDetector *linkDetector = [NSDataDetector dataDetectorWithTypes:NSTextCheckingTypeLink error:nil];
NSArray *matches = [linkDetector matchesInString:string options:0 range:NSMakeRange(0, [string length])];
for (NSTextCheckingResult *match in [matches reverseObjectEnumerator]) {
    string = [string stringByReplacingCharactersInRange:match.range withString:@""];
}
The check for match.resultType == NSTextCheckingTypeLink can be omitted as you’ve already specified in the options that you’re only interested in links.
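The same back-to-front idea can be packaged as a function that edits one NSMutableString in place instead of creating a new NSString per match. This is a sketch; StringByRemovingLinks is a made-up name, and what counts as a link is left entirely to NSDataDetector.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: deletes every link NSDataDetector finds in `text`.
// Matches are removed in reverse order so the remaining ranges stay valid.
static NSString *StringByRemovingLinks(NSString *text) {
    NSError *error = nil;
    NSDataDetector *detector = [NSDataDetector dataDetectorWithTypes:NSTextCheckingTypeLink
                                                               error:&error];
    if (detector == nil) {
        NSLog(@"Could not create detector: %@", error);
        return text;
    }
    NSArray *matches = [detector matchesInString:text
                                         options:0
                                           range:NSMakeRange(0, text.length)];
    NSMutableString *result = [NSMutableString stringWithString:text];
    for (NSTextCheckingResult *match in [matches reverseObjectEnumerator]) {
        [result deleteCharactersInRange:match.range];
    }
    return result;
}
```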

String trimming with a certain keyword

I have a string like the one below.
<br><br><br><br><br> SomeHtmlString <br><br><br><br><br>
I want to remove the leading and trailing br tags, like a trim function, while preserving the br tags in the middle of SomeHtmlString.
Is there a function that does this concisely?
For example, turn:
<br><br><br>test1<br><br>test2<br><br><br><br>
to
test1<br><br>test2
Here is a method using regular expressions. It matches only one tag at a time and removes it from either the beginning or the end of the string.
NSMutableString *replaceMe = [[NSMutableString alloc]
    initWithString:@"<br><br > <br > test<br>test2<br><br>"];
NSError *error = NULL;
NSRegularExpression *regex = [NSRegularExpression
    regularExpressionWithPattern:@"^ *<br *> *"
                         options:NSRegularExpressionCaseInsensitive
                           error:&error];
do {
    ;
} while ([regex replaceMatchesInString:replaceMe options:0 range:NSMakeRange(0, replaceMe.length) withTemplate:@""] != 0);
regex = [NSRegularExpression
    regularExpressionWithPattern:@" *<br *> *$"
                         options:NSRegularExpressionCaseInsensitive
                           error:&error];
do {
    ;
} while ([regex replaceMatchesInString:replaceMe options:0 range:NSMakeRange(0, replaceMe.length) withTemplate:@""] != 0);
NSLog(@"string=%@", replaceMe);
and that does strip "<br><br > <br > test<br>test2<br><br>" down to test<br>test2.
It's probably not the neatest solution but it is very easy to modify to match different expressions, with different whitespace, for example.
It's also possible to use the regular expressions to match several <br>s in one go:
NSRegularExpression *regex = [NSRegularExpression
    regularExpressionWithPattern:@"^ *(<br *> *)+"
                         options:NSRegularExpressionCaseInsensitive
                           error:&error];
[regex replaceMatchesInString:replaceMe options:0 range:NSMakeRange(0, replaceMe.length) withTemplate:@""];
regex = [NSRegularExpression
    regularExpressionWithPattern:@" *(<br *> *)+$"
                         options:NSRegularExpressionCaseInsensitive
                           error:&error];
[regex replaceMatchesInString:replaceMe options:0 range:NSMakeRange(0, replaceMe.length) withTemplate:@""];
which avoids the looping but is a little harder to modify.
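The two passes can also be folded into a single pattern with an alternation: one branch anchored at the start, one at the end. This sketch (the function name StringByTrimmingBRTags is hypothetical) also accepts the self-closing <br/> form and any whitespace, which goes slightly beyond the patterns above.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: strips runs of <br> tags (and surrounding whitespace)
// from both ends of `html`, leaving tags in the middle untouched.
static NSString *StringByTrimmingBRTags(NSString *html) {
    NSError *error = nil;
    // Branch 1 removes a run at the start, branch 2 a run at the end.
    // Without NSRegularExpressionAnchorsMatchLines, ^ and $ refer to the whole string.
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:@"^(\\s*<br\\s*/?>\\s*)+|(\\s*<br\\s*/?>\\s*)+$"
                                                  options:NSRegularExpressionCaseInsensitive
                                                    error:&error];
    if (regex == nil) {
        NSLog(@"Invalid pattern: %@", error);
        return html;
    }
    return [regex stringByReplacingMatchesInString:html
                                           options:0
                                             range:NSMakeRange(0, html.length)
                                      withTemplate:@""];
}
```

Using stringByReplacingMatchesInString: returns a new string, so the input can be an immutable NSString.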
You can do this:
NSString* htmlString= @"<br><br><br><br><br> SomeHtmlString <br><br><br><br><br>";
NSString* pureString= [htmlString stringByReplacingOccurrencesOfString: @"<br>" withString: @""];
So you'll have @" SomeHtmlString " in pureString. Note that this removes every <br>, including any in the middle of the string.
You could use this to strip out the unwanted bits:
yourString = [yourString stringByReplacingOccurrencesOfString:@"<br>" withString:@""];
Then you would use something like this to rebuild your string the way you want it:
NSString *newString = [NSString stringWithFormat:@"<br>%@<br>", yourString];
You might also want to look at stringByTrimmingCharactersInSet:.
There are so many things you can do with NSString. Check out the Class Reference: https://developer.apple.com/library/mac/#documentation/Cocoa/Reference/Foundation/Classes/NSString_Class/Reference/NSString.html
EDIT:
substringToIndex: could be your friend here. You can do this to find out if the first 4 characters of your string consist of the characters you want to remove:
NSString *subString = [yourString substringToIndex:4];
if ([subString isEqualToString:@"<br>"]) {
    yourString = [yourString substringFromIndex:4];
}
Then you are creating a new string without those 4 characters. You keep doing this until the first 4 characters are not equal to the ones you want to remove.
You can do something similar at the end of your string using substringFromIndex:. Check the string's length first: substringToIndex:4 raises an NSRangeException if the string has fewer than 4 characters, and the same applies to out-of-bounds indexes at the end.
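A bounds-safe way to express that loop is to use hasPrefix: and hasSuffix:, which simply return NO when the string is shorter than the tag. A sketch, with a made-up function name StringByTrimmingTag:

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: repeatedly drops `tag` from the front and the back
// of `string` until neither end matches it.
static NSString *StringByTrimmingTag(NSString *string, NSString *tag) {
    if (tag.length == 0) {
        return string; // nothing to trim
    }
    while ([string hasPrefix:tag]) {
        string = [string substringFromIndex:tag.length];
    }
    while ([string hasSuffix:tag]) {
        string = [string substringToIndex:string.length - tag.length];
    }
    return string;
}
```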
Alternative regular expression rendition:
NSString *input = @"<br><br><br><br><br><br>test<br>test2<br><br><br><br><br><br><br><br><br><br>";
__block NSString *output = nil;
NSError *error = nil;
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"^(<br>)*(.*?)(<br>)*$"
                                                                       options:NSRegularExpressionCaseInsensitive
                                                                         error:&error];
[regex enumerateMatchesInString:input
                        options:0
                          range:NSMakeRange(0, [input length])
                     usingBlock:^(NSTextCheckingResult *result, NSMatchingFlags flags, BOOL *stop) {
    NSRange matchRange = [result rangeAtIndex:2];
    output = [input substringWithRange:matchRange];
}];
if (output)
    NSLog(@"Found: %@", output);

NSRegularExpression to extract url between 2 strings

I'm using an NSRegularExpression to extract a URL between two strings. Here is the whole string:
<a href="/url?q=http://www.myurl.com/videos/how-to/&sa=
I need to extract part between /url?q= and &sa=.
It can be found and dug out using a regex with positive look-behind and look-ahead, like this:
NSString *orgStr = @"<a href=\"/url?q=http://www.myurl.com/videos/how-to/&sa=";
NSString *URLRegExPattern = @"(?<=url\\?q=).*(?=&sa=)";
NSError *regExErr = nil;
NSRegularExpression *URLRegEx = [NSRegularExpression regularExpressionWithPattern:URLRegExPattern
                                                                          options:0
                                                                            error:&regExErr];
NSString *URLString = nil;
NSRange range = [URLRegEx rangeOfFirstMatchInString:orgStr
                                            options:0
                                              range:NSMakeRange(0, orgStr.length)];
if (!NSEqualRanges(range, NSMakeRange(NSNotFound, 0))) {
    URLString = [orgStr substringWithRange:range];
}
NSLog(@"URL: %@", URLString);
You can also use the NSString method - (NSArray *)componentsSeparatedByString:(NSString *)separator.
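A sketch of the componentsSeparatedByString: approach, assuming each marker occurs once, start before end (the helper name SubstringBetween is hypothetical). If the start marker occurs again before the end marker, this sketch returns nil.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the text between the first `start` marker and
// the following `end` marker, or nil if either marker is missing.
static NSString *SubstringBetween(NSString *string, NSString *start, NSString *end) {
    NSArray *afterStart = [string componentsSeparatedByString:start];
    if (afterStart.count < 2) {
        return nil; // start marker not found
    }
    // Element 1 is the text after the first start marker.
    NSArray *beforeEnd = [[afterStart objectAtIndex:1] componentsSeparatedByString:end];
    if (beforeEnd.count < 2) {
        return nil; // end marker not found after the start marker
    }
    return [beforeEnd objectAtIndex:0];
}
```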

In Objective-C, is it possible to get the position of a regex match within the string?

If I have the string "Hello World", is it possible to use NSRegularExpression with the pattern @"World" to get the position of the match? In the "Hello World" example, the position/index of the match should be 6.
In PHP I'd use preg_match with the PREG_OFFSET_CAPTURE flag to achieve this. Does Objective-C support this?
You can do it the Cocoa way:
NSRegularExpression *regex = [[NSRegularExpression alloc] initWithPattern:@"world" options:0 error:NULL];
// omitted error checking for the sake of simplicity
NSString *str = @"Hello world!";
[regex enumerateMatchesInString:str
                        options:0
                          range:NSMakeRange(0, str.length)
                     usingBlock:^(NSTextCheckingResult *result, NSMatchingFlags flags, BOOL *stop)
{
    NSLog(@"Match at [%lu, %lu]", (unsigned long)result.range.location, (unsigned long)result.range.length);
}];
[regex release]; // manual reference counting only; omit under ARC
Or the POSIX way (this may be convenient for you, since you want only one match, and this function/method returns the match range directly):
#include <regex.h>
// Note: rm_so/rm_eo are byte offsets into the UTF-8 string, so they only
// equal NSString (UTF-16) indexes for ASCII text.
- (NSRange)matchString:(NSString *)string toRegex:(NSString *)regex
{
    regex_t regex_obj;
    regmatch_t match;
    const char *regex_str;
    const char *match_str;
    int error;
    regex_str = [regex UTF8String];
    error = regcomp(&regex_obj, regex_str, REG_EXTENDED);
    if (error)
    {
        return NSMakeRange(NSNotFound, 0);
    }
    match_str = [string UTF8String];
    error = regexec(&regex_obj, match_str, 1, &match, 0);
    // free the compiled regex on both the success and the failure path
    regfree(&regex_obj);
    if (error)
    {
        return NSMakeRange(NSNotFound, 0);
    }
    return NSMakeRange(match.rm_so, match.rm_eo - match.rm_so);
}
This is somewhat long in Cocoa, but you can do it:
NSError *error = NULL;
NSRegularExpression *regex = [NSRegularExpression
    regularExpressionWithPattern:@"world"
                         options:0
                           error:&error];
NSString *str = @"Hello, world!";
NSTextCheckingResult *match = [regex firstMatchInString:str
                                                options:0
                                                  range:NSMakeRange(0, [str length])];
if (match) {
    NSRange matchRange = [match range];
    NSLog(@"%lu", (unsigned long)matchRange.location);
}
This prints 7.
If you're going to make heavy use of regular expressions, I recommend looking at RegexKit or RegexKitLite.
Yes, it is possible. You can use the NSRegularExpression method rangeOfFirstMatchInString:options:range:, which returns the range of the first match. You could also do this with the NSString method rangeOfString: if you don't need a regular expression.
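For the simple case, NSString can also run the regex itself: passing NSRegularExpressionSearch to rangeOfString:options: treats the search string as an ICU regular expression and returns the range of the first match. A sketch with a hypothetical helper name, IndexOfFirstMatch; the index is in UTF-16 code units, like all NSString indexes.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: index of the first match of `pattern` in `string`,
// or NSNotFound if there is no match.
static NSUInteger IndexOfFirstMatch(NSString *string, NSString *pattern) {
    // NSRegularExpressionSearch makes rangeOfString: interpret `pattern` as a regex.
    NSRange range = [string rangeOfString:pattern options:NSRegularExpressionSearch];
    return range.location;
}
```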

Using NSRegularExpression to extract URLs on the iPhone

I'm using the following code in my iPhone app, taken from here, to extract all URLs from stripped .html code.
I'm only able to extract the first URL, but I need an array containing all URLs. My NSArray isn't returning NSStrings for each URL, only the objects' descriptions.
How do I make my arrayOfAllMatches return all URLs, as NSStrings?
-(NSArray *)stripOutHttp:(NSString *)httpLine {
    // Setup an NSError object to catch any failures
    NSError *error = NULL;
    // create the NSRegularExpression object and initialize it with a pattern
    // the pattern will match any http or https url, with option case insensitive
    NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"http?://([-\\w\\.]+)+(:\\d+)?(/([\\w/_\\.]*(\\?\\S+)?)?)?" options:NSRegularExpressionCaseInsensitive error:&error];
    // create an NSRange object using our regex object for the first match in the string httpline
    NSRange rangeOfFirstMatch = [regex rangeOfFirstMatchInString:httpLine options:0 range:NSMakeRange(0, [httpLine length])];
    NSArray *arrayOfAllMatches = [regex matchesInString:httpLine options:0 range:NSMakeRange(0, [httpLine length])];
    // check that our NSRange object is not equal to range of NSNotFound
    if (!NSEqualRanges(rangeOfFirstMatch, NSMakeRange(NSNotFound, 0))) {
        // Since we know that we found a match, get the substring from the parent string by using our NSRange object
        NSString *substringForFirstMatch = [httpLine substringWithRange:rangeOfFirstMatch];
        NSLog(@"Extracted URL: %@",substringForFirstMatch);
        NSLog(@"All Extracted URLs: %@",arrayOfAllMatches);
        // return all matching url strings
        return arrayOfAllMatches;
    }
    return NULL;
}
Here is my NSLog output:
Extracted URL: http://example.com/myplayer
All Extracted URLs: (
"<NSExtendedRegularExpressionCheckingResult: 0x106ddb0>{728, 53}{<NSRegularExpression: 0x106bc30> http?://([-\\w\\.]+)+(:\\d+)?(/([\\w/_\\.]*(\\?\\S+)?)?)? 0x1}",
"<NSExtendedRegularExpressionCheckingResult: 0x106ddf0>{956, 66}{<NSRegularExpression: 0x106bc30> http?://([-\\w\\.]+)+(:\\d+)?(/([\\w/_\\.]*(\\?\\S+)?)?)? 0x1}",
"<NSExtendedRegularExpressionCheckingResult: 0x106de30>{1046, 63}{<NSRegularExpression: 0x106bc30> http?://([-\\w\\.]+)+(:\\d+)?(/([\\w/_\\.]*(\\?\\S+)?)?)? 0x1}",
"<NSExtendedRegularExpressionCheckingResult: 0x106de70>{1129, 67}{<NSRegularExpression: 0x106bc30> http?://([-\\w\\.]+)+(:\\d+)?(/([\\w/_\\.]*(\\?\\S+)?)?)? 0x1}"
)
The method matchesInString:options:range: returns an array of NSTextCheckingResult objects. You can use fast enumeration to iterate through the array, pull out the substring of each match from your original string, and add the substring to a new array.
// https? makes the "s" optional; http? would make the "p" optional instead
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"https?://([-\\w\\.]+)+(:\\d+)?(/([\\w/_\\.]*(\\?\\S+)?)?)?" options:NSRegularExpressionCaseInsensitive error:&error];
NSArray *arrayOfAllMatches = [regex matchesInString:httpLine options:0 range:NSMakeRange(0, [httpLine length])];
NSMutableArray *arrayOfURLs = [[NSMutableArray alloc] init];
for (NSTextCheckingResult *match in arrayOfAllMatches) {
    NSString* substringForMatch = [httpLine substringWithRange:match.range];
    NSLog(@"Extracted URL: %@",substringForMatch);
    [arrayOfURLs addObject:substringForMatch];
}
// return non-mutable version of the array
return [NSArray arrayWithArray:arrayOfURLs];
Try NSDataDetector:
NSDataDetector *linkDetector = [NSDataDetector dataDetectorWithTypes:NSTextCheckingTypeLink error:nil];
NSArray *matches = [linkDetector matchesInString:text options:0 range:NSMakeRange(0, [text length])];
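If you want strings back rather than NSTextCheckingResult objects, each link result's URL property can be read directly. A sketch (LinkStringsInText is a hypothetical name); be aware that the detector may normalize the NSURL, for example by adding a scheme to a bare www. address, so the strings are not guaranteed to be exact substrings of the input.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: absolute URL strings of all links NSDataDetector finds.
static NSArray *LinkStringsInText(NSString *text) {
    NSError *error = nil;
    NSDataDetector *detector = [NSDataDetector dataDetectorWithTypes:NSTextCheckingTypeLink
                                                               error:&error];
    if (detector == nil) {
        NSLog(@"Could not create detector: %@", error);
        return [NSArray array];
    }
    NSMutableArray *urls = [NSMutableArray array];
    NSArray *matches = [detector matchesInString:text
                                         options:0
                                           range:NSMakeRange(0, text.length)];
    for (NSTextCheckingResult *match in matches) {
        // For link results, URL holds the detected NSURL.
        if (match.URL != nil) {
            [urls addObject:match.URL.absoluteString];
        }
    }
    return urls;
}
```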
With NSDataDetector in Swift:
let types: NSTextCheckingType = .Link
var error : NSError?
let detector = NSDataDetector(types: types.rawValue, error: &error)
// NSRange lengths are in UTF-16 code units, so count the utf16 view
var matches = detector!.matchesInString(text, options: nil, range: NSMakeRange(0, count(text.utf16)))
for match in matches {
    println(match.URL!)
}
Using Swift 2.0:
let text = "http://www.google.com. http://www.bla.com"
let types: NSTextCheckingType = .Link
let detector = try? NSDataDetector(types: types.rawValue)
guard let detect = detector else {
    return
}
let matches = detect.matchesInString(text, options: .ReportCompletion, range: NSMakeRange(0, text.utf16.count))
for match in matches {
    print(match.URL!)
}
Using Swift 3.0:
let text = "http://www.google.com. http://www.bla.com"
let types: NSTextCheckingResult.CheckingType = .link
let detector = try? NSDataDetector(types: types.rawValue)
let matches = detector?.matches(in: text, options: .reportCompletion, range: NSMakeRange(0, text.utf16.count))
for match in matches! {
    print(match.url!)
}
To get all links from a given string:
NSRegularExpression *expression = [NSRegularExpression regularExpressionWithPattern:@"(?i)\\b((?:[a-z][\\w-]+:(?:/{1,3}|[a-z0-9%])|www\\d{0,3}[.]|[a-z0-9.\\-]+[.][a-z]{2,4}/)(?:[^\\s()<>]+|\\(([^\\s()<>]+|(\\([^\\s()<>]+\\)))*\\))+(?:\\(([^\\s()<>]+|(\\([^\\s()<>]+\\)))*\\)|[^\\s`!()\\[\\]{};:'\".,<>?«»“”‘’]))" options:NSRegularExpressionCaseInsensitive error:NULL];
NSString *someString = @"www.facebook.com/link/index.php This is a sample www.google.com of a http://abc.com/efg.php?EFAei687e3EsA sentence with a URL within it.";
NSArray *matches = [expression matchesInString:someString options:0 range:NSMakeRange(0, someString.length)];
for (NSTextCheckingResult *result in matches) {
    NSString *url = [someString substringWithRange:result.range];
    NSLog(@"found url:%@", url);
}
I found myself so nauseated by the complexity of this simple operation ("match ALL the substrings") that I made a little library I am humbly calling Unsuck which adds some sanity to NSRegularExpression in the form of from and allMatches methods. Here's how you'd use them:
NSRegularExpression *re = [NSRegularExpression from: @"(?i)\\b(https?://.*)\\b"]; // or whatever your favorite regex is; Hossam's seems pretty good
NSArray *matches = [re allMatches:httpLine];
Please check out the unsuck source code on github and tell me all the things I did wrong :-)
Note that (?i) makes it case insensitive so you don't need to specify NSRegularExpressionCaseInsensitive.
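If you would rather not add a dependency, a small category gives a similar convenience. This is a sketch: the category name MatchedStrings and the method matchedStringsInString: are hypothetical and are not Unsuck's API.

```objc
#import <Foundation/Foundation.h>

// Hypothetical category: returns the matched substrings (not
// NSTextCheckingResult objects) for every match of the receiver.
@interface NSRegularExpression (MatchedStrings)
- (NSArray *)matchedStringsInString:(NSString *)string;
@end

@implementation NSRegularExpression (MatchedStrings)
- (NSArray *)matchedStringsInString:(NSString *)string {
    NSMutableArray *strings = [NSMutableArray array];
    NSArray *matches = [self matchesInString:string
                                     options:0
                                       range:NSMakeRange(0, string.length)];
    for (NSTextCheckingResult *match in matches) {
        [strings addObject:[string substringWithRange:match.range]];
    }
    return strings;
}
@end
```

Categories on framework classes share one method namespace, so in real code a prefixed method name would reduce the risk of colliding with a future Apple method.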