String Trimming with Certain keyword - objective-c

I have a string like below.
<br><br><br><br><br> SomeHtmlString <br><br><br><br><br>
I want to remove the leading and trailing br tags, the way a trim function would, while preserving the br tags in the middle of SomeHtmlString.
Is there a function that does this concisely?
e.g.
<br><br><br>test1<br><br>test2<br><br><br><br>
to
test1<br><br>test2

Here is a method using regular expressions. It matches only one tag at a time and removes it from either the beginning or the end of the string.
NSMutableString *replaceMe = [[NSMutableString alloc]
    initWithString:@"<br><br > <br > test<br>test2<br><br>"];
NSError *error = nil;
// Strip one leading <br> (with optional spaces) per pass until none is left.
NSRegularExpression *regex = [NSRegularExpression
    regularExpressionWithPattern:@"^ *<br *> *"
    options:NSRegularExpressionCaseInsensitive
    error:&error];
do {
    ;
} while ([regex replaceMatchesInString:replaceMe options:0 range:NSMakeRange(0, replaceMe.length) withTemplate:@""] != 0);
// Same idea for one trailing <br> per pass.
regex = [NSRegularExpression
    regularExpressionWithPattern:@" *<br *> *$"
    options:NSRegularExpressionCaseInsensitive
    error:&error];
do {
    ;
} while ([regex replaceMatchesInString:replaceMe options:0 range:NSMakeRange(0, replaceMe.length) withTemplate:@""] != 0);
NSLog(@"string=%@", replaceMe);
and that does strip "<br><br > <br > test<br>test2<br><br>" down to test<br>test2.
It's probably not the neatest solution but it is very easy to modify to match different expressions, with different whitespace, for example.
It's also possible to use the regular expressions to match several <br>s in one go:
NSRegularExpression *regex = [NSRegularExpression
    regularExpressionWithPattern:@"^ *(<br *> *)+"
    options:NSRegularExpressionCaseInsensitive
    error:&error];
[regex replaceMatchesInString:replaceMe options:0 range:NSMakeRange(0, replaceMe.length) withTemplate:@""];
regex = [NSRegularExpression
    regularExpressionWithPattern:@" *(<br *> *)+$"
    options:NSRegularExpressionCaseInsensitive
    error:&error];
[regex replaceMatchesInString:replaceMe options:0 range:NSMakeRange(0, replaceMe.length) withTemplate:@""];
which avoids the looping but is a little harder to modify.
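The two anchored patterns above can also be combined into one expression with an alternation, so a single replacement pass handles both ends. This is only a sketch: the function name TrimBreakTags is made up for illustration, and it assumes the same loose definition of a tag (optional spaces around and inside `<br>`) as the patterns above.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not from the answer above): removes runs of <br> tags
// at the start and at the end of the string, leaving the middle untouched.
static NSString *TrimBreakTags(NSString *html) {
    NSError *error = nil;
    // Left alternative: a run anchored at the start.
    // Right alternative: a run anchored at the end.
    NSRegularExpression *regex = [NSRegularExpression
        regularExpressionWithPattern:@"^ *(?:<br *> *)+| *(?:<br *> *)+$"
                             options:NSRegularExpressionCaseInsensitive
                               error:&error];
    if (regex == nil) {
        NSLog(@"Invalid pattern: %@", error);
        return html;
    }
    // Returns a new string; the input is not modified.
    return [regex stringByReplacingMatchesInString:html
                                           options:0
                                             range:NSMakeRange(0, html.length)
                                      withTemplate:@""];
}
```

Calling TrimBreakTags on the string from the question should leave only the text and the br tags between test1 and test2.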

You can do this:
NSString *htmlString = @"<br><br><br><br><br> SomeHtmlString <br><br><br><br><br>";
NSString *pureString = [htmlString stringByReplacingOccurrencesOfString:@"<br>" withString:@""];
So you'll have @" SomeHtmlString " in pureString.

You could use this to strip out the unwanted bits:
yourString = [yourString stringByReplacingOccurrencesOfString:@"<br>" withString:@""];
Then you would use something like this to remake your string the way you want it:
NSString *newString = [NSString stringWithFormat:@"<br>%@<br>", yourString];
You might also want to look at stringByTrimmingCharactersInSet:
There are so many things you can do with NSString. Check out the Class Reference: https://developer.apple.com/library/mac/#documentation/Cocoa/Reference/Foundation/Classes/NSString_Class/Reference/NSString.html
EDIT:
substringToIndex: could be your friend here. You can do this to find out if the first 4 characters of your string consist of the characters you want to remove:
// substringToIndex: raises an exception if the string is shorter than 4 characters.
if (yourString.length >= 4) {
    NSString *subString = [yourString substringToIndex:4];
    if ([subString isEqualToString:@"<br>"]) {
        yourString = [yourString substringFromIndex:4];
    }
}
Then you are creating a new string without those 4 characters. You keep doing this until the first 4 characters are not equal to the ones you want to remove.
You can do something similar at the end of your string using substringFromIndex. You will need to know the length of your original string to make sure none of your substrings go out of bounds.
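The repeated prefix and suffix check described above can be sketched with hasPrefix: and hasSuffix:, which avoid the out-of-bounds problem because they simply return NO when the string is shorter than the tag. The function name TrimTag is hypothetical, and unlike the regular expression answers it only matches the exact tag text, with no tolerance for spaces or different letter case.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: repeatedly drops `tag` from both ends of `string`.
static NSString *TrimTag(NSString *string, NSString *tag) {
    if (tag.length == 0) {
        return string;  // nothing to strip
    }
    NSString *result = string;
    // Remove leading copies of the tag.
    while ([result hasPrefix:tag]) {
        result = [result substringFromIndex:tag.length];
    }
    // Remove trailing copies of the tag.
    while ([result hasSuffix:tag]) {
        result = [result substringToIndex:result.length - tag.length];
    }
    return result;
}
```

For example, TrimTag(yourString, @"<br>") would strip the outer tags and keep any that sit between other text.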

Alternative regular expression rendition:
NSString *input = @"<br><br><br><br><br><br>test<br>test2<br><br><br><br><br><br><br><br><br><br>";
__block NSString *output = nil;
NSError *error = nil;
// Group 2 lazily captures whatever sits between the leading and trailing <br> runs.
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"^(<br>)*(.*?)(<br>)*$"
    options:NSRegularExpressionCaseInsensitive
    error:&error];
[regex enumerateMatchesInString:input
    options:0
    range:NSMakeRange(0, [input length])
    usingBlock:^(NSTextCheckingResult *result, NSMatchingFlags flags, BOOL *stop) {
        NSRange matchRange = [result rangeAtIndex:2];
        output = [input substringWithRange:matchRange];
    }];
if (output)
    NSLog(@"Found: %@", output);

Related

Regular expression substitution problem in Objective-C

Trying to capitalize all tags and running into trouble with substitution. Any idea why "upperCaseString" method isn't working?
NSError *error = nil;
NSMutableString *stringToCap = [NSMutableString stringWithString:@"<kaboom>stuff</kaboom>"];
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"(</?[a-zA-Z].*?>)" options:NSRegularExpressionCaseInsensitive error:&error];
NSMutableString *modifiedString = [NSMutableString stringWithString:[regex stringByReplacingMatchesInString:stringToCap options:0 range:NSMakeRange(0, [stringToCap length]) withTemplate:@"$1".uppercaseString]];
NSLog(@"%@", modifiedString);
Produces: <kaboom>stuff</kaboom> when I expect <KABOOM>stuff</KABOOM>
stringByReplacingMatchesInString:options:range:withTemplate: doesn't work like that. The last argument is just an NSString, and the string you are passing is the result of the expression @"$1".uppercaseString, which is still @"$1" because uppercasing "$1" changes nothing.
A possible algorithm (pseudo code):
for NSTextCheckingResult *match in [regex matchesInString:... options:... range:...] do
extract the substring at match.range from modified string
uppercase it
replace the substring at match.range with uppercased result
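One way the pseudo code above might look in Objective-C is sketched below. It assumes ARC, and the function name UppercaseTags is made up for illustration. The matches are processed from last to first so that a replacement can never shift a range that has not been handled yet. Note that the pattern uppercases everything between the angle brackets, including any attributes.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns a copy of `input` with every tag uppercased.
static NSString *UppercaseTags(NSString *input) {
    NSRegularExpression *regex = [NSRegularExpression
        regularExpressionWithPattern:@"</?[a-zA-Z].*?>"
                             options:0
                               error:NULL];
    NSMutableString *result = [input mutableCopy];
    NSArray<NSTextCheckingResult *> *matches =
        [regex matchesInString:input options:0 range:NSMakeRange(0, input.length)];
    // Walk backwards so earlier ranges stay valid after each replacement.
    for (NSTextCheckingResult *match in [matches reverseObjectEnumerator]) {
        NSString *tag = [result substringWithRange:match.range];
        [result replaceCharactersInRange:match.range withString:tag.uppercaseString];
    }
    return result;
}
```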

Regular expression to grab usernames from string

I need to find usernames (like Twitter ones) in strings. For example, if the string is:
"Hello, @username! How are you? And @username2??"
I want to isolate/extract @username and @username2.
Do you know how to do this in Objective-C? I found this for Python, "regex for Twitter username", but it does not work for me.
I tried it like this, but it is not working:
NSString *comment = @"Hello, @username! How are you? And @username2??";
NSError *error = nil;
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"(?<=^|(?<=[^a-zA-Z0-9-\\.]))@([A-Za-z]+[A-Za-z0-9-]+)" options:0 error:&error];
NSArray *matches = [regex matchesInString:comment options:0 range:NSMakeRange(0, comment.length)];
for (NSTextCheckingResult *match in matches) {
    NSRange wordRange = [match rangeAtIndex:1];
    NSString *username = [comment substringWithRange:wordRange];
    NSLog(@"searchUsersInComment result --> %@", username);
}
(?<=^|(?<=[^a-zA-Z0-9-\\.]))@([A-Za-z]+[A-Za-z0-9-]+) is meant to skip email addresses and grab only usernames. As your string doesn't contain any emails, you can just use @([A-Za-z]+[A-Za-z0-9-]+).
Your regex is wrong. You need to modify it to:
NSString *comment = @"Hello, @username! How are you? And @username2??";
NSError *error = nil;
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"@([A-Za-z]+[A-Za-z0-9-]+)" options:0 error:&error];
NSArray *matches = [regex matchesInString:comment options:0 range:NSMakeRange(0, comment.length)];
for (NSTextCheckingResult *match in matches) {
    // Group 1 is the name without the leading @.
    NSRange wordRange = [match rangeAtIndex:1];
    NSString *username = [comment substringWithRange:wordRange];
    NSLog(@"searchUsersInComment result --> %@", username);
}
FYI: Any subpattern inside a pair of parentheses will be captured as a group. In practice, this can be used to extract information like phone numbers or emails from all sorts of data.
Imagine, for example, that you had a command line tool to list all the image files you have in the cloud. You could then use a pattern such as ^(IMG\d+\.png)$ to capture and extract the full filename, but if you only wanted to capture the filename without the extension, you could use the pattern ^(IMG\d+)\.png$, which only captures the part before the period.
I suggest reading about capturing groups: http://regexone.com/lesson/capturing_groups
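As a small illustration of capture groups with NSRegularExpression, the sketch below applies the pattern ^(IMG\d+)\.png$ (with the period escaped so it matches only a literal dot) and reads group 1 through rangeAtIndex:. The function name ImageBaseName and the sample filename are invented for this example.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the part of "IMG<digits>.png" before the
// extension, or nil when the filename does not have that shape.
static NSString *ImageBaseName(NSString *filename) {
    NSRegularExpression *regex = [NSRegularExpression
        regularExpressionWithPattern:@"^(IMG\\d+)\\.png$"
                             options:0
                               error:NULL];
    NSTextCheckingResult *match =
        [regex firstMatchInString:filename
                          options:0
                            range:NSMakeRange(0, filename.length)];
    if (match == nil) {
        return nil;
    }
    // Index 0 is the whole match; index 1 is the first parenthesized group.
    return [filename substringWithRange:[match rangeAtIndex:1]];
}
```

Here ImageBaseName(@"IMG0042.png") should give IMG0042, while a name such as notes.txt gives nil.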

How to use this regex with NSRegularExpression?

I'm trying to extract the youtube video id from a URL using the regex from this answer. However I can't figure out how to format it correctly to work with NSRegularExpression. I have tried escaping the backslashes for C, as well as using escapedTemplateForString and escapedPatternForString. I also tried adding a backslash before the opening/closing brackets. Each case returns NSNotFound for all URLs I try.
// Original: /^.*(?:(?:youtu\.be\/|v\/|vi\/|u\/\w\/|embed\/)|(?:(?:watch)?\?v(?:i)?=|\&v(?:i)?=))([^#\&\?]*).*/
NSString *c_escaped = @"/^.*(youtu.be\\/|v\\/|e\\/|u\\/\\w+\\/|embed\\/|v=)([^#\\&\\?]*).*/";
NSString *template = [NSRegularExpression escapedTemplateForString:c_escaped]; // "/^.*(youtu.be\\/|v\\/|e\\/|u\\/\\w+\\/|embed\\/|v=)([^#\\&\\?]*).*/"
NSString *pattern = [NSRegularExpression escapedPatternForString:c_escaped]; // "\/\^\.\*\(youtu\.be\\\/\|v\\\/\|e\\\/\|u\\\/\\w\+\\\/\|embed\\\/\|v=\)\(\[\^#\\&\\\?]\*\)\.\*\/"
NSRegularExpression *expr = [NSRegularExpression regularExpressionWithPattern:c_escaped
options:0
error:&error];
NSRange range = [expr rangeOfFirstMatchInString:self options:0 range:NSMakeRange(0, self.length)];
NSRegularExpression *expr1 = [NSRegularExpression regularExpressionWithPattern:template
options:0
error:&error];
NSRange range1 = [expr1 rangeOfFirstMatchInString:self options:0 range:NSMakeRange(0, self.length)];
NSRegularExpression *expr2 = [NSRegularExpression regularExpressionWithPattern:pattern
options:0
error:&error];
NSRange range2 = [expr2 rangeOfFirstMatchInString:self options:0 range:NSMakeRange(0, self.length)];
These are the URLs I've tested against:
NSArray *urls = @[
    @"//www.youtube-nocookie.com/embed/up_lNV-yoK4?rel=0",
    @"http://www.youtube.com/user/Scobleizer#p/u/1/1p3vcRhsYGo",
    @"http://www.youtube.com/watch?v=cKZDdG9FTKY&feature=channel",
    @"http://www.youtube.com/watch?v=yZ-K7nCVnBI&playnext_from=TL&videos=osPknwzXEas&feature=sub",
    @"http://www.youtube.com/ytscreeningroom?v=NRHVzbJVx8I",
    @"http://www.youtube.com/user/SilkRoadTheatre#p/a/u/2/6dwqZw0j_jY",
    @"http://youtu.be/6dwqZw0j_jY",
    @"http://www.youtube.com/watch?v=6dwqZw0j_jY&feature=youtu.be",
    @"http://youtu.be/afa-5HQHiAs",
    @"http://www.youtube.com/user/Scobleizer#p/u/1/1p3vcRhsYGo?rel=0",
    @"http://www.youtube.com/watch?v=cKZDdG9FTKY&feature=channel",
    @"http://www.youtube.com/watch?v=yZ-K7nCVnBI&playnext_from=TL&videos=osPknwzXEas&feature=sub",
    @"http://www.youtube.com/ytscreeningroom?v=NRHVzbJVx8I",
    @"http://www.youtube.com/embed/nas1rJpm7wY?rel=0",
    @"http://www.youtube.com/watch?v=peFZbP64dsU",
    @"http://youtube.com/v/dQw4w9WgXcQ?feature=youtube_gdata_player",
    @"http://youtube.com/vi/dQw4w9WgXcQ?feature=youtube_gdata_player",
    @"http://youtube.com/?v=dQw4w9WgXcQ&feature=youtube_gdata_player",
    @"http://www.youtube.com/watch?v=dQw4w9WgXcQ&feature=youtube_gdata_player",
    @"http://youtube.com/?vi=dQw4w9WgXcQ&feature=youtube_gdata_player",
    @"http://youtube.com/watch?v=dQw4w9WgXcQ&feature=youtube_gdata_player",
    @"http://youtube.com/watch?vi=dQw4w9WgXcQ&feature=youtube_gdata_player",
    @"http://youtu.be/dQw4w9WgXcQ?feature=youtube_gdata_player"];
You need to delete the leading and trailing slashes. The problem is that you're adapting this from JavaScript, where "/" delimits a regex literal and is not part of the pattern itself. You also need to escape the backslashes so that the Obj-C compiler does the right thing, but that's all. Try this:
@implementation NSString (youtube)
- (BOOL)isYouTubeURL
{
    NSString *youtubePattern = @"^.*(?:(?:youtu\\.be\\/|v\\/|vi\\/|u\\/\\w\\/|embed\\/)|(?:(?:watch)?\\?v(?:i)?=|\\&v(?:i)?=))([^#\\&\\?]*).*";
    NSRegularExpression *expr = [NSRegularExpression regularExpressionWithPattern:youtubePattern
        options:0
        error:nil];
    NSRange range = [expr rangeOfFirstMatchInString:self options:0 range:NSMakeRange(0, self.length)];
    return range.location != NSNotFound;
}
@end
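The category method above only reports whether the URL matches. Since the question asks for the video id itself, a variation could read capture group 1 from the match. The sketch below uses the same pattern; the function name YouTubeVideoID is hypothetical, and returning nil for an empty capture is a choice made for this example.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the video id captured by group 1, or nil.
static NSString *YouTubeVideoID(NSString *url) {
    NSString *pattern = @"^.*(?:(?:youtu\\.be\\/|v\\/|vi\\/|u\\/\\w\\/|embed\\/)|(?:(?:watch)?\\?v(?:i)?=|\\&v(?:i)?=))([^#\\&\\?]*).*";
    NSRegularExpression *expr =
        [NSRegularExpression regularExpressionWithPattern:pattern options:0 error:NULL];
    NSTextCheckingResult *match =
        [expr firstMatchInString:url options:0 range:NSMakeRange(0, url.length)];
    if (match == nil) {
        return nil;
    }
    // Group 1 is everything after the marker up to the next #, & or ?.
    NSRange idRange = [match rangeAtIndex:1];
    if (idRange.location == NSNotFound || idRange.length == 0) {
        return nil;
    }
    return [url substringWithRange:idRange];
}
```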

Objective-C NSRegularExpressions, finding first occurrence of numbers in a string

I'm pretty green at regex with Objective-C. I'm having some difficulty with it.
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"\\b([1-9]+)\\b" options:NSRegularExpressionCaseInsensitive error:&regError];
if (regError) {
    NSLog(@"%@",regError.localizedDescription);
}
__block NSString *foundModel = nil;
[regex enumerateMatchesInString:self.model options:kNilOptions range:NSMakeRange(0, [self.model length]) usingBlock:^(NSTextCheckingResult *match, NSMatchingFlags flags, BOOL *stop) {
    foundModel = [self.model substringWithRange:[match rangeAtIndex:0]];
    *stop = YES;
}];
All I'm looking to do is take a string like
150A
And get
150
First, the problems with the regex:
You are using word boundaries (\b) on both sides, which means you are only looking for a number that stands by itself (e.g. 15 but not 150A).
Your number range does not include 0, so it would not capture 150. It needs to be [0-9]+, or better yet \d+.
So to fix this, if you want to capture any number, all you need is \d+. If you want to capture anything that starts with a number, put the word boundary only at the beginning: \b\d+.
Now, to get the first occurrence you can use -[regex rangeOfFirstMatchInString:options:range:]:
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"\\b\\d+" options:NSRegularExpressionCaseInsensitive error:&regError];
if (regError) {
    NSLog(@"%@",regError.localizedDescription);
}
NSString *model = @"150A";
NSString *foundModel = nil;
NSRange range = [regex rangeOfFirstMatchInString:model options:kNilOptions range:NSMakeRange(0, [model length])];
if(range.location != NSNotFound)
{
    foundModel = [model substringWithRange:range];
}
NSLog(@"Model: %@", foundModel);
What about .*?(\d+).*? ?
That would capture the number in group 1, and you would be able to use it wherever you want.

In objective-c is it possible to get the position of a regex match within the string

If I have the string "Hello World", is it possible to use NSRegularExpression with the pattern @"World" to get the position of the match, i.e. in the "Hello World" example the position/index of the match should be "6"?
In PHP I'd use preg_match with the "PREG_OFFSET_CAPTURE" flag to achieve this. Does Objective-C support this?
You can do it the Cocoa way:
NSRegularExpression *regex = [[NSRegularExpression alloc] initWithPattern:@"world" options:0 error:NULL];
// omitted error checking for the sake of simplicity
NSString *str = @"Hello world!";
[regex enumerateMatchesInString:str
    options:0
    range:NSMakeRange(0, str.length)
    usingBlock:^(NSTextCheckingResult *result, NSMatchingFlags flags, BOOL *stop)
{
    // NSRange fields are NSUInteger, so cast them and print with %lu.
    NSLog(@"Match at [%lu, %lu]", (unsigned long)result.range.location, (unsigned long)result.range.length);
}];
[regex release]; // manual reference counting only; omit this line under ARC
Or the POSIX way (this may be convenient for you, since you want only one match, and this function/method returns the match range directly):
#include <regex.h>
- (NSRange)matchString:(NSString *)string toRegex:(NSString *)regex
{
    regex_t regex_obj;
    regmatch_t match;
    const char *regex_str;
    const char *match_str;
    int error;
    regex_str = [regex UTF8String];
    error = regcomp(&regex_obj, regex_str, REG_EXTENDED);
    if (error)
    {
        return NSMakeRange(NSNotFound, 0);
    }
    match_str = [string UTF8String];
    error = regexec(&regex_obj, match_str, 1, &match, 0);
    // Free the compiled pattern on both the success and the failure path.
    regfree(&regex_obj);
    if (error)
    {
        return NSMakeRange(NSNotFound, 0);
    }
    // rm_so and rm_eo are byte offsets into the UTF-8 buffer; they equal
    // NSString indices only when the text before the match is plain ASCII.
    return NSMakeRange(match.rm_so, match.rm_eo - match.rm_so);
}
This is somewhat long in Cocoa, but you can do it:
NSError *error = nil;
NSRegularExpression *regex = [NSRegularExpression
    regularExpressionWithPattern:@"world"
    options:0
    error:&error];
NSString *str = @"Hello, world!";
NSTextCheckingResult *match = [regex firstMatchInString:str
    options:0
    range:NSMakeRange(0, [str length])];
if (match) {
    NSRange matchRange = [match range];
    NSLog(@"%lu", (unsigned long)matchRange.location);
}
This prints 7.
If you're going to make heavy use of regular expressions, I recommend looking at RegexKit or RegexKitLite.
Yes, it is possible. You can use the NSRegularExpression method rangeOfFirstMatchInString:options:range:, which returns the range of the first match. You could also do this with the NSString method rangeOfString: if you don't need a regular expression.