Objective-C regular expression

I have a problem reading the image path from a string like background-image:url(/assets/test.jpg)
I want to get the string inside the parentheses, without the parentheses themselves.
Here is the code I used:
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"\\bbackground-image:url\(.*\)\\b" options:NSRegularExpressionCaseInsensitive error:nil];
thumbnail = [regex stringByReplacingMatchesInString:thumbnail options:0 range:NSMakeRange(0, [thumbnail length]) withTemplate:@"$1"];
What I get is (/assets/test.jpg)

Use the following pattern to get the expected result:
background-image:url\\((.*)\\)
Applied to your code:
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"background-image:url\\((.*)\\)" options:NSRegularExpressionCaseInsensitive error:nil];
Using this the result will be "/assets/test.jpg", just as you want it to be.
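Your code should have given you a warning about an unknown escape sequence for "\(". You have to use "\\(" to escape a "(". Also get rid of the "\\b" at the beginning and end of your pattern. The trailing \b in particular cannot match after the closing ")" at the end of the string, because ")" is not a word character.
But be aware that this pattern only works when your string contains nothing but "background-image:url(somevaluehere)". The greedy .* runs up to the last ")" in the string, so any text after the url(...) part would end up in the result.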
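A minimal sketch of the same fix that reads capture group 1 directly with firstMatchInString:options:range: and rangeAtIndex:, instead of replacing the whole string. It assumes Foundation's NSRegularExpression. The function name ExtractBackgroundImageURL is hypothetical and not part of the original code.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the text between "url(" and ")" in a CSS
// background-image declaration, or nil if the pattern does not match.
static NSString *ExtractBackgroundImageURL(NSString *css)
{
    NSError *error = nil;
    // "\\(" in the literal reaches the regex engine as "\(": a literal parenthesis.
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:@"background-image:url\\((.*)\\)"
                                                  options:NSRegularExpressionCaseInsensitive
                                                    error:&error];
    if (regex == nil) {
        NSLog(@"Invalid pattern: %@", error);
        return nil;
    }
    NSTextCheckingResult *match =
        [regex firstMatchInString:css options:0 range:NSMakeRange(0, [css length])];
    if (match == nil) {
        return nil;
    }
    // Capture group 1 is the (.*) part, i.e. the path without the parentheses.
    return [css substringWithRange:[match rangeAtIndex:1]];
}
```

Called as ExtractBackgroundImageURL(@"background-image:url(/assets/test.jpg)"), this should return @"/assets/test.jpg". Unlike the replacement approach, text outside the match is never copied into the result.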
EDIT:
What does \\b mean?
\\b is a word boundary, usually written as \b. In an NSString literal you need to write \\b, because the \ itself has to be escaped to be treated as a real backslash.
Here is some information on what word boundaries match:
There are three different positions that qualify as word boundaries:
Before the first character in the string, if the first character is a word character.
After the last character in the string, if the last character is a word character.
Between two characters in the string, where one is a word character and the other is not a word character.
Simply put: \b allows you to perform a "whole words only" search using a regular expression in the form of \bword\b. A "word character" is a character that can be used to form words. All characters that are not "word characters" are "non-word characters".
Taken from http://www.regular-expressions.info/wordboundaries.html
I hope this clarifies this a bit.
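The "whole words only" search described above can be sketched as follows, assuming Foundation's NSRegularExpression. CountWholeWord is a hypothetical helper name. The word is passed through +escapedPatternForString: so that any regex metacharacters in it are matched literally.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: counts whole-word occurrences of `word` in `text`
// using the pattern \bword\b.
static NSUInteger CountWholeWord(NSString *text, NSString *word)
{
    // In the literal, "\\b" is two characters (a backslash and a 'b'),
    // which the regex engine reads as a word boundary.
    NSString *pattern = [NSString stringWithFormat:@"\\b%@\\b",
                         [NSRegularExpression escapedPatternForString:word]];
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:pattern options:0 error:NULL];
    return [regex numberOfMatchesInString:text
                                  options:0
                                    range:NSMakeRange(0, [text length])];
}
```

In "a word, a sword, words" only the standalone "word" counts. In "sword" and "words" there is no boundary between "word" and the adjacent letter.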

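I believe you should change your regex to \\bbackground-image:url\\((.*)\\) (written as an NSString literal).
I added () around .* to capture the match you want. The parentheses that should be matched literally need \\( and \\) so that they reach the regex engine as \( and \). The trailing \\b has to go, because it cannot match after the final ")".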

Related

NSRegularExpression - Problem with a Pattern

I've written a little program to find a string in a string, which works fine so far. But I have a problem with NSRegularExpression: I need the right pattern for my special case, and I'm stuck.
NSString *strRegExp = [NSString stringWithFormat:@"?trunk/%@/%@/+\\([a-zA-Z0-9_\\-\\.])+/Host-1", inputstrse, inputstrsno];
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:strRegExp options: NSRegularExpressionCaseInsensitive error:NULL];
NSArray *arrayOfAllMatches = [regex matchesInString:inputurl options:0 range:NSMakeRange(0, [inputurl length])];
The NSRegularExpression pattern should match strings that look like this:
trunk/%@/%@/some-text-1/Host-1
trunk/test/1/5-text-text/Host-1
Here trunk/%@/%@/ and /Host-1 always stay the same. Only the part in the middle is variable, and it always looks like this:
NUMBER-Some-Text -> 5-Hello-World -> /trunk/test/1/5-hello-world/Host-1
I've tried different regexes, as you can see here: "?trunk/%@/%@/+\([a-zA-Z0-9_\-\.])+/Host-1", but it still doesn't seem to work. Maybe someone can help me.
Maybe there is a problem when I build the pattern with:
NSString *strRegExp = [NSString stringWithFormat:@"?trunk/%@/%@/+\\([a-zA-Z0-9_\\-\\.])+/Host-1", inputstrse, inputstrsno];
and use it later like this:
regularExpressionWithPattern:strRegExp
I hope someone can help me. I'm new to regular expressions.
Generally, describing a regex in words, as in "I want to match a number of letters, then a dash, then a number" and so on, is the easiest way to construct one. Using a tool such as http://www.regexr.com also simplifies things a lot.
From what I understand you want to match the following:
trunk/test/1/[some number]-[some text]-[some other text]/Host-1
If so, then the following regular expression should cut it:
trunk\/test\/1\/[0-9]*-[a-zA-Z]*-[a-zA-Z]*\/Host-1
It does the following:
trunk\/test\/1\/: Match the constant string trunk/test/1/ (The backslashes are escapes)
[0-9]*-: Match any number of digits followed by a -
[a-zA-Z]*-: Match any number of letters followed by a -
[a-zA-Z]*: Match any number of letters
\/Host-1: Match the constant string /Host-1
Here is a link to RegExr which you can use if you want to experiment with different input data or changes to the regex: http://regexr.com/39tgn
The following pattern was provided in the comments: trunk\/test\/1\/.*\/Host-1. It's a bit less strict but does the job as well.
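As a sketch, here is the strict pattern above written as an Objective-C string literal, assuming Foundation's NSRegularExpression. MatchesTrunkPath is a hypothetical name. Every regex backslash is doubled for the compiler, so "\\/" reaches the engine as "\/". The engine treats that as a plain "/", and an unescaped "/" would work just as well.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: YES if `path` contains a match for the strict pattern
// trunk/test/1/<digits>-<letters>-<letters>/Host-1.
static BOOL MatchesTrunkPath(NSString *path)
{
    NSRegularExpression *regex = [NSRegularExpression
        regularExpressionWithPattern:@"trunk\\/test\\/1\\/[0-9]*-[a-zA-Z]*-[a-zA-Z]*\\/Host-1"
                             options:0
                               error:NULL];
    NSRange whole = NSMakeRange(0, [path length]);
    return [regex numberOfMatchesInString:path options:0 range:whole] > 0;
}
```

The pattern is not anchored, so a leading "/" as in /trunk/test/1/5-hello-world/Host-1 is fine. A middle part with only one dash, such as 5-hello, does not match.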
I don't know Objective-C, but your regex has a number of oddities. If I remove those, I get something that I think does what you want.
Your first character is a ?, which can't work. ? is a quantifier that applies to the preceding character (or class, or group), and as the first character there is nothing for it to apply to.
/+\\( <-- unsure what you were trying to do here. The \\ in the string literal reaches the regex engine as a single \, so the engine sees /+\( , meaning "1 or more / followed by a literal (". That also leaves the later ")" without a group to close.
[a-zA-Z0-9_\\-\\.] can be written much more concisely as [\w.-]. If you place the + inside the parentheses, the entire unknown string is captured in capture group 1.
From the comments: each %@ is variable text. The first is always just letters, and the second is always just numbers. That would be [a-zA-Z]+ and \d+ respectively in a regex. But I would actually use [^/]+ (any character that isn't /), so that the code doesn't break when someone puts a different character in the path. For example, trunk/this_text/4/.../Host-1 would break on the _.
Combined, this gives (changed after comments):
trunk/[^/]+/[^/]+/([\w.-]+)/Host-1
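Note that this is the raw pattern as the regex engine should see it, without any escaping. Unlike C#'s verbatim @"..." strings, an Objective-C @"..." literal is an ordinary NSString literal that processes C escape sequences, so every backslash in the pattern must be doubled: @"trunk/[^/]+/[^/]+/([\\w.-]+)/Host-1".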
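A minimal sketch of how this pattern could be built with stringWithFormat: (as in the question) and used to pull out the middle segment. It assumes Foundation's NSRegularExpression. MiddleSegment is a hypothetical name, and the section and number arguments stand in for the question's inputstrse and inputstrsno. Those values are assumed to come from user input, so they are passed through +escapedPatternForString: before being spliced into the pattern.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the variable middle segment of
// trunk/<section>/<number>/<middle>/Host-1, or nil if there is no match.
static NSString *MiddleSegment(NSString *url, NSString *section, NSString *number)
{
    // Escape the inputs so characters like "." or "+" in them match literally.
    NSString *pattern = [NSString stringWithFormat:@"trunk/%@/%@/([\\w.-]+)/Host-1",
                         [NSRegularExpression escapedPatternForString:section],
                         [NSRegularExpression escapedPatternForString:number]];
    NSError *error = nil;
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:pattern
                                                  options:NSRegularExpressionCaseInsensitive
                                                    error:&error];
    if (regex == nil) {
        NSLog(@"Invalid pattern %@: %@", pattern, error);
        return nil;
    }
    NSTextCheckingResult *match =
        [regex firstMatchInString:url options:0 range:NSMakeRange(0, [url length])];
    // Group 1 is ([\w.-]+), the part between the fixed prefix and /Host-1.
    return match ? [url substringWithRange:[match rangeAtIndex:1]] : nil;
}
```

With an invalid pattern such as the question's leading "?", regularExpressionWithPattern: returns nil and fills in the error. Logging that error is a quick way to see why nothing matches.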

Regex: searching for function(@"string content") to get "string content"

I have a little regex problem (don't we all sometimes).
The code snippets are Objective-C, but I believe the regular expressions work the same way.
I have two functions:
NSString * CRLocalizedString(NSString *key)
NSString * CRLocalizedArgString(NSString *key, ...)
These are scattered around my project for localisation.
Now I want to find them all.
Going through the directory, parsing all the files, etc. is all fine.
The regexes I use on the files are
[NSRegularExpression regularExpressionWithPattern:@"CRLocalizedString\\(@\\\"[^)]+\\\"\\)" options:0 error:&error];
[NSRegularExpression regularExpressionWithPattern:@"CRLocalizedArgString\\([^)]+\\)" options:0 error:&error];
This works perfectly, except that my terminating character is a ).
The problem occurs with function calls like this
CRLocalizedString(@"Happy =), o so happy =D");
CRLocalizedArgString(@"Filter (%i)", 0.75f);
The regex ends the string at "Filter (%i" and at "Happy =)".
This is where my regex knowledge ends, and I don't know what to do anymore.
I thought about using ");" as the terminator, but calls don't always end that way.
So I was hoping someone here could help me (completely different approaches than regex are also welcome, of course).
Kind regards
Saren
Let's write your first regex without the extra level of C escapes:
CRLocalizedString\(@\"[^)]+\"\)
You don't have to escape a " for a regex, so let's get rid of those extra backslashes:
CRLocalizedString\(@"[^)]+"\)
So, you want to match a quoted string using "[^)]+". But that doesn't match every quoted string.
What is a quoted string? It's a ", followed by any number of string atoms, followed by another ". What is a string atom? It's any character except " or \, or a \ followed by any character. So here's a regex for a quoted string:
"([^"\\]|\\.)*"
Sticking that back into your first regex, we get this:
CRLocalizedString\(@"([^"\\]|\\.)*"\)
Quoting it in an Objective-C string literal gives us this:
@"CRLocalizedString\\(@\"([^\"\\\\]|\\\\.)*\"\\)"
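A sketch of the quoted-string pattern in use, assuming Foundation's NSRegularExpression. LocalizedStringKeys is a hypothetical helper. One change from the answer's pattern is labeled here as an assumption: the atom alternation is made non-capturing with (?:...) and wrapped in an outer group. As a result, capture group 1 holds the whole string contents rather than only the last atom.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the contents of every CRLocalizedString(@"...")
// call in `source`. The engine sees CRLocalizedString\(@"((?:[^"\\]|\\.)*)"\)
// where each atom is a non-quote, non-backslash character or a backslash
// followed by any character.
static NSArray *LocalizedStringKeys(NSString *source)
{
    NSRegularExpression *regex = [NSRegularExpression
        regularExpressionWithPattern:@"CRLocalizedString\\(@\"((?:[^\"\\\\]|\\\\.)*)\"\\)"
                             options:0
                               error:NULL];
    NSMutableArray *keys = [NSMutableArray array];
    NSArray *matches = [regex matchesInString:source
                                      options:0
                                        range:NSMakeRange(0, [source length])];
    for (NSTextCheckingResult *match in matches) {
        // Group 1 spans everything between the opening and closing quotes.
        [keys addObject:[source substringWithRange:[match rangeAtIndex:1]]];
    }
    return keys;
}
```

Because the string part can only end at an unescaped quote, the ")" inside "Happy =), o so happy =D" no longer ends the match early. An escaped quote such as \" inside the string is consumed as a single atom.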
It is impossible to write a regex to match calls to CRLocalizedArgString in the general case, because such calls can take arbitrary expressions as arguments, and regexes cannot match arbitrary expressions (because they can contain arbitrary levels of nested parentheses, which regexes cannot match).
You could just hope that there are no parentheses in the argument list, and use this regex:
CRLocalizedArgString\(@"([^"\\]|\\.)*"[^)]*\)
Quoting it in an Objective-C string literal gives us this:
@"CRLocalizedArgString\\(@\"([^\"\\\\]|\\\\.)*\"[^)]*\\)"
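The following sketch exercises the CRLocalizedArgString pattern and also shows the limitation described above. It assumes Foundation's NSRegularExpression, and LocalizedArgCalls is a hypothetical helper name. When an argument itself contains parentheses, [^)]* stops at the first ")" and the match ends too early.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the full text of every CRLocalizedArgString call
// matched by the answer's pattern. It assumes that the arguments after the
// format string contain no ")" characters.
static NSArray *LocalizedArgCalls(NSString *source)
{
    NSRegularExpression *regex = [NSRegularExpression
        regularExpressionWithPattern:@"CRLocalizedArgString\\(@\"([^\"\\\\]|\\\\.)*\"[^)]*\\)"
                             options:0
                               error:NULL];
    NSMutableArray *calls = [NSMutableArray array];
    NSArray *matches = [regex matchesInString:source
                                      options:0
                                        range:NSMakeRange(0, [source length])];
    for (NSTextCheckingResult *match in matches) {
        [calls addObject:[source substringWithRange:[match range]]];
    }
    return calls;
}
```

For CRLocalizedArgString(@"Filter (%i)", 0.75f) the whole call is matched. For a nested call like CRLocalizedArgString(@"x %d", foo(1)), the match stops at the ")" of foo(1) and drops the outer closing parenthesis.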

How do I check for this odd space character - " " in Objective-C?

I wrote some regexes to play with spaces in strings, and they work beautifully, except when I come across this character: " " instead of " ". You probably think I'm crazy, but apparently they're different. Check out this RegEx app (oddly enough, this character often crashes it):
When I use the weird space: (screenshot not preserved)
When I use a normal space: (screenshot not preserved)
As you can see, many more spaces are detected with the normal space, but the weird spaces are not detected.
What is this space? How do I get rid of it?
Unicode has a lot of different space characters. The space you posted in your question -- in both the title and the body -- is a regular ASCII space, good old U+0020.
If you want to check exactly what you've copied onto your clipboard, you can run the command pbpaste(1) on Mac OS X. For example, if you copied a non-breaking space (U+00A0), you could identify it like so:
# Write pasteboard contents to stdout, convert from UTF-8 to UTF-32 for easy
# code point identification, then hex dump the contents
$ pbpaste | iconv -f utf-8 -t utf-32be | hexdump -C
00000000 00 00 00 a0 |....|
00000004
Depending on the regex engine you're using, it may not treat all of them as whitespace, especially if you rely on the \s character class. If you want to be sure to match the space character you have, include it explicitly in your character class, e.g. [\s<YOURSPACEHERE>], where <YOURSPACEHERE> is copied and pasted from the character you want to match.
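Try "\p{Z}" (written @"\\p{Z}" in an NSString literal) for your regular expression. It's the Unicode Separator property, which covers space characters such as U+0020 and the no-break space U+00A0, as well as the line and paragraph separators.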
See: NSRegularExpression and Unicode Regular Expressions.
Just as a test of my answer, I constructed the following unit test.
- (void)testPattern
{
    // U+00A0 is the no-break space
    NSString *string = @"xxx\u00A0yyy";
    NSString *pattern = @"\\p{Z}";
    NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:pattern options:0 error:NULL];
    NSUInteger number = [regex numberOfMatchesInString:string options:0 range:NSMakeRange(0, [string length])];
    // STAssertEquals compares type encodings, so the expected value must also be an NSUInteger
    STAssertEquals(number, (NSUInteger)1, @"");
}
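To answer the "how do I get rid of it" part, the same property can drive a replacement. This is a minimal sketch assuming Foundation's NSRegularExpression. NormalizeSpaces is a hypothetical name, and replacing with a plain ASCII space (rather than deleting the character) is an assumption about what is wanted.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: replaces every Unicode separator (\p{Z}: space
// separators such as U+0020 and U+00A0, plus the line and paragraph
// separators U+2028/U+2029) with a plain ASCII space.
static NSString *NormalizeSpaces(NSString *string)
{
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:@"\\p{Z}" options:0 error:NULL];
    return [regex stringByReplacingMatchesInString:string
                                           options:0
                                             range:NSMakeRange(0, [string length])
                                      withTemplate:@" "];
}
```

Tabs and newlines are not in the Z category, so they pass through unchanged.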
They're probably non-breaking spaces (U+00A0), seeing as all the lines end with spaces that are matched by \s rather than these mystery spaces. Try matching \xA0 (written @"\\xA0" in an NSString literal).
You can match Unicode characters with \x{NNNN}, where NNNN is the hexadecimal code point of the character. See the ICU User Guide.
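A short sketch of the \x{hhhh} escape, assuming Foundation's NSRegularExpression (which uses ICU syntax). CountNoBreakSpaces is a hypothetical helper. The literal "\\x{00A0}" reaches the engine as \x{00A0} and matches only the no-break space, not an ordinary U+0020 space.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: counts no-break spaces (U+00A0) in `string` using the
// ICU \x{hhhh} escape.
static NSUInteger CountNoBreakSpaces(NSString *string)
{
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:@"\\x{00A0}" options:0 error:NULL];
    return [regex numberOfMatchesInString:string
                                  options:0
                                    range:NSMakeRange(0, [string length])];
}
```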

OS X: Using a literal asterisk in a regular expression

I'm writing a program to make text that begins with /* and ends with */ a different color (syntax highlighting for a C comment). When I try this
@"/\*.*\*/";
I get unknown escape sequence. So I figured that to get a literal asterisk I had to use this
@"/[*].*[*]/";
and I get no errors, but when I use this code
commentPattern = @"/[*].*[*]/";
reg = [NSRegularExpression regularExpressionWithPattern:commentPattern options:kNilOptions error:nil];
results = [reg matchesInString:self.string options:kNilOptions range:NSMakeRange(0, [self.string length])];
for (NSTextCheckingResult *result in results)
{
[self setTextColor:[NSColor colorWithCalibratedRed:0.0 green:0.7 blue:0.0 alpha:1.0] range:result.range];
}
the text color of the comments doesn't change, but I don't see anything wrong with my regular expression. Can someone tell me why this won't work? I don't think it's a problem with the way I get the results or change their color, because I use the same method for other regular expressions.
You want to use this: "\\*".
\* is the escape sequence for * in regular expressions, but in C strings, \ also begins an escaped character token, so you have to escape that as well.
@"/\*.*\*/";
I get unknown escape sequence.
The compiler first converts the escape sequences in a string literal, and only the result is handed over to the regex engine. For instance, an escape sequence might be \t, which represents a tab, or \n, which represents a newline. The compiler turns each escape sequence into the single character it stands for. The message you get says that \* is not a legal escape sequence in a string literal.
The regex engine needs to see a literal backslash followed by a *. To get a literal backslash into a string, you need to write \\. However, for readability I prefer using a character class, as you did in your second attempt.
You should NSLog what the results array contains to see what matches you are getting. If the matches are what you expect, then the problem is not with the regex.
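A sketch of that debugging step, assuming Foundation's NSRegularExpression. MatchRanges is a hypothetical helper that logs and returns the ranges a pattern matches. It also shows that the escaped form @"/\\*.*\\*/" and the character-class form @"/[*].*[*]/" find the same comment.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: logs and returns the ranges (boxed as NSValue) that
// `pattern` matches in `text`, or nil if the pattern is invalid.
static NSArray *MatchRanges(NSString *text, NSString *pattern)
{
    NSError *error = nil;
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:pattern options:0 error:&error];
    if (regex == nil) {
        NSLog(@"Invalid pattern %@: %@", pattern, error);
        return nil;
    }
    NSMutableArray *ranges = [NSMutableArray array];
    for (NSTextCheckingResult *result in
         [regex matchesInString:text options:0 range:NSMakeRange(0, [text length])]) {
        NSLog(@"Matched %@", NSStringFromRange([result range]));
        [ranges addObject:[NSValue valueWithRange:[result range]]];
    }
    return ranges;
}
```

For "int x; /* note */ int y;" both patterns should report a single range starting at 7 with length 10. If multi-line comments are also meant to be colored, note two more things. First, . does not match line terminators unless NSRegularExpressionDotMatchesLineSeparators is passed. Second, the greedy .* runs to the last */ on a line, while a lazy .*? would stop at the first one.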

NSRegularExpression to add escape characters

I'm trying to replace [word] with \[word\] using NSRegularExpression:
NSRegularExpression *metaRegex = [NSRegularExpression regularExpressionWithPattern:@"([\\[\\]])"
options:0
error:&metaRegexError];
NSString *escapedTarget = [metaRegex stringByReplacingMatchesInString:string
options:0
range:NSMakeRange(0, string.length)
withTemplate:@"\\$1"];
But the output of this is $1word$1. You would think the first \ would escape the second \ character but instead it looks like it's escaping the $ character... How do I tell it to escape \ and not $?
Try:
@"\\\\$1"
for the replacement template. Basically: in your original @"\\$1", the \\ only escapes the \ for the string literal, so the template the regex engine receives is \$1. That \ then escapes the $ in the template, causing your issue.
You actually need four backslashes, like this:
@"\\\\$1"
Why is this unwieldy system required? Well, think of it this way: the \ character is both the C escape character and the regex escape character. If you write the literal with only one backslash, you might get a warning, because the compiler will think you're using an escape sequence \$. To escape the backslash for the compiler, you need to write two backslashes, which evaluate to only one in the final NSString data.
However, the template parser needs two backslashes in the NSString itself, so you need to escape both of them in the string literal. So \\\\ resolves to \\ in the actual data, which the template parser then collapses to a single literal \ character.
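Putting the four-backslash template into a small helper, as a sketch assuming Foundation's NSRegularExpression. EscapeBrackets is a hypothetical name. The comments trace what the compiler hands to the regex engine for both the pattern and the template.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: puts a backslash in front of every [ or ] in `string`.
static NSString *EscapeBrackets(NSString *string)
{
    // Pattern: the literal "([\\[\\]])" reaches the engine as ([\[\]]),
    // a capture group holding one "[" or "]".
    NSRegularExpression *metaRegex =
        [NSRegularExpression regularExpressionWithPattern:@"([\\[\\]])" options:0 error:NULL];
    // Template: the literal "\\\\$1" reaches the engine as \\$1, i.e. an
    // escaped (literal) backslash followed by the contents of group 1.
    return [metaRegex stringByReplacingMatchesInString:string
                                               options:0
                                                 range:NSMakeRange(0, [string length])
                                          withTemplate:@"\\\\$1"];
}
```

EscapeBrackets(@"[word]") should produce the characters \[word\]. If writing the backslashes by hand gets error-prone, Foundation's +[NSRegularExpression escapedTemplateForString:] can produce the escaped form of literal template text.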