I need to find a proper regular expression for words like [[ "objective C" ]], [[ "Java" ]], and [[ "perl programming"]] in Objective-C.
I tried many combinations, such as:
NSString *pattern1 = @"[\[][\[][ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz \",]+]]";
NSString *pattern2 = @"\[\[[ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz \",]+]]";
The Apple documentation for the NSRegularExpression class says I need to use \ to treat the next character as a literal. Can somebody help me find the error in the regular expressions above?
\ is an escape character for NSString, so you need to escape it.
\[ in a regex, becomes \\[ in NSString.
By the way a simpler regex for matching a single element is
\[\[ \"(\w|\s)+\" \]\]
which escaped for NSString is
@"\\[\\[ \\\"(\\w|\\s)+\\\" \\]\\]"
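As a rough illustration of the answer above, here is a minimal sketch that applies a doubly escaped pattern to the sample input from the question. The helper name GRBracketedTerms is hypothetical, and the pattern is a slightly relaxed variant (optional whitespace around the quotes) chosen so that all three sample terms match; it is not the exact regex from the answer.

```objectivec
#import <Foundation/Foundation.h>

// Hypothetical helper: returns every substring that looks like [[ "some words" ]].
static NSArray<NSString *> *GRBracketedTerms(NSString *text) {
    // What the regex engine sees: \[\[\s*"[\w\s]+"\s*\]\]
    // Each regex backslash is doubled, and each " is escaped for the literal.
    NSString *pattern = @"\\[\\[\\s*\"[\\w\\s]+\"\\s*\\]\\]";
    NSError *error = nil;
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:pattern options:0 error:&error];
    if (regex == nil) {
        NSLog(@"Invalid pattern: %@", error);
        return @[];
    }
    NSMutableArray<NSString *> *results = [NSMutableArray array];
    NSRange whole = NSMakeRange(0, text.length);
    for (NSTextCheckingResult *match in [regex matchesInString:text options:0 range:whole]) {
        [results addObject:[text substringWithRange:match.range]];
    }
    return results;
}
```

Calling GRBracketedTerms on the sentence from the question should return the three bracketed terms, in order.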
Related
I have a little regex problem (don't we all sometimes).
The few pieces of code are from Objective C but regex expressions are still the same I believe.
I have two functions called
NSString * CRLocalizedString(NSString *key)
NSString * CRLocalizedArgString(NSString *key, ...)
These are scattered around my project for localisation.
Now I want to find them all.
Well go to directory, parse all files, etc
All fine there.
The regexes I use on the files are
[NSRegularExpression regularExpressionWithPattern:@"CRLocalizedString\\(@\\\"[^)]+\\\"\\)" options:0 error:&error];
[NSRegularExpression regularExpressionWithPattern:@"CRLocalizedArgString\\([^)]+\\)" options:0 error:&error];
This works perfectly, except that my terminating character is a ).
The problem occurs with function calls like this
CRLocalizedString(@"Happy =), o so happy =D");
CRLocalizedArgString(@"Filter (%i)", 0.75f);
The regex ends the string at "Filter (%i" and at "Happy =)".
This is where my regex knowledge ends, and I do not know what to do anymore.
I thought of using ");" as the terminator, but that isn't always how the call ends.
So I was hoping someone here knows a solution (completely different approaches than regex are also welcome, of course).
Kind regards
Saren
Let's write your first regex without the extra level of C escapes:
CRLocalizedString\(@\"[^)]+\"\)
You don't have to escape a " for a regex, so let's get rid of those extra backslashes:
CRLocalizedString\(@"[^)]+"\)
So, you want to match a quoted string using "[^)]+". But that doesn't match every quoted string.
What is a quoted string? It's a ", followed by any number of string atoms, followed by another ". What is a string atom? It's any character except " or \, or a \ followed by any character. So here's a regex for a quoted string:
"([^"\\]|\\.)*"
Sticking that back into your first regex, we get this:
CRLocalizedString\(@"([^"\\]|\\.)*"\)
Here's a link to a regex tester demonstrating that regex.
Quoting it in an Objective-C string literal gives us this:
@"CRLocalizedString\\(@\"([^\"\\\\]|\\\\.)*\"\\)"
It is impossible to write a regex to match calls to CRLocalizedArgString in the general case, because such calls can take arbitrary expressions as arguments, and regexes cannot match arbitrary expressions (because they can contain arbitrary levels of nested parentheses, which regexes cannot match).
You could just hope that there are no parentheses in the argument list, and use this regex:
CRLocalizedArgString\(@"([^"\\]|\\.)*"[^)]*\)
Here's a link to a regex tester demonstrating that regex.
Quoting it in an Objective-C string literal gives us this:
@"CRLocalizedArgString\\(@\"([^\"\\\\]|\\\\.)*\"[^)]*\\)"
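To see the two quoted patterns in action, here is a minimal sketch that runs them over the problem calls from the question. The helper GRFindCalls and the two pattern functions are hypothetical names introduced for illustration; the second pattern still assumes there are no parentheses in the arguments after the string, as noted above.

```objectivec
#import <Foundation/Foundation.h>

// Regex: CRLocalizedString\(@"([^"\\]|\\.)*"\)
static NSString *GRStringCallPattern(void) {
    return @"CRLocalizedString\\(@\"([^\"\\\\]|\\\\.)*\"\\)";
}

// Regex: CRLocalizedArgString\(@"([^"\\]|\\.)*"[^)]*\)
static NSString *GRArgStringCallPattern(void) {
    return @"CRLocalizedArgString\\(@\"([^\"\\\\]|\\\\.)*\"[^)]*\\)";
}

// Hypothetical helper: returns the text of every match of `pattern` in `source`.
static NSArray<NSString *> *GRFindCalls(NSString *source, NSString *pattern) {
    NSError *error = nil;
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:pattern options:0 error:&error];
    if (regex == nil) {
        NSLog(@"Invalid pattern: %@", error);
        return @[];
    }
    NSMutableArray<NSString *> *calls = [NSMutableArray array];
    NSRange whole = NSMakeRange(0, source.length);
    for (NSTextCheckingResult *match in [regex matchesInString:source options:0 range:whole]) {
        [calls addObject:[source substringWithRange:match.range]];
    }
    return calls;
}
```

With these patterns, a ) inside the quoted string no longer ends the match early, because the string is consumed atom by atom up to its closing ".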
I've received a Warning in Xcode: Unknown escape sequence '\]'
Code in Question: _regexForFindingTags = [[NSRegularExpression alloc] initWithPattern:@"\[.*?\]" options:ops error:&error];
The Problematic Search Pattern: \[.*?\]
Why is there a Warning for this Specific Search Pattern?
How can this Warning be Overcome?
My Search Pattern works in Regex Tester (granted, that's in JavaScript). According to Ray Wenderlich's NSRegularExpression Tutorial, the ] character should be escapable using the \ character, so I'm missing something...
The warning comes from the compiler parsing the string literal, not from the regex engine. String literals have their own escape sequences, and \[ and \] are not among them, so @"\[.*?\]" is flagged before regex syntax even comes into play (it is just a string, after all). So, if the original regex is \[.*?\], it must be written as:
[… initWithPattern:@"\\[.*?\\]" …];
That is, you escape the brackets at the regex level and then also escape the backslashes at the string-literal level, so @"\\[.*?\\]" becomes \[.*?\] in memory.
Unfortunately, you need to escape the \ itself.
So each backslash needs to be written as \\ in NSString literals.
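The two levels of escaping can be checked with a short sketch like the one below. GRFindTags is a hypothetical helper name, and the sample input is made up for illustration.

```objectivec
#import <Foundation/Foundation.h>

// Hypothetical helper: returns every [tag] found in `text`.
static NSArray<NSString *> *GRFindTags(NSString *text) {
    // The literal @"\\[.*?\\]" is stored as the 7 characters \[.*?\]
    NSString *pattern = @"\\[.*?\\]";
    NSError *error = nil;
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:pattern options:0 error:&error];
    if (regex == nil) {
        NSLog(@"Invalid pattern: %@", error);
        return @[];
    }
    NSMutableArray<NSString *> *tags = [NSMutableArray array];
    NSRange whole = NSMakeRange(0, text.length);
    for (NSTextCheckingResult *match in [regex matchesInString:text options:0 range:whole]) {
        [tags addObject:[text substringWithRange:match.range]];
    }
    return tags;
}
```

Logging the pattern with NSLog(@"%@", pattern) is a quick way to see what the regex engine actually receives.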
I want to know the full set of characters that have to be escaped in an Objective-C NSString literal in order to be recognized properly; e.g. " has to be escaped as \", as in
NSString *temporaryString = @"That book is dubbed as \"the little book\".";
Is this set the same as the one for a C char * string?
Thanks for your help :D
The only characters that have to be escaped are the " (double-quote) and \ (backslash) characters.
There are other special character literals such as \n that have special meaning but those are really a separate issue.
Objective-C NSString values use the same set of special character literals as C.
I'm trying to replace [word] with \[word\] using NSRegularExpression:
NSRegularExpression *metaRegex = [NSRegularExpression regularExpressionWithPattern:@"([\\[\\]])"
                                                                            options:0
                                                                              error:&metaRegexError];
NSString *escapedTarget = [metaRegex stringByReplacingMatchesInString:string
                                                              options:0
                                                                range:NSMakeRange(0, string.length)
                                                         withTemplate:@"\\$1"];
But the output of this is $1word$1. You would think the first \ would escape the second \ character but instead it looks like it's escaping the $ character... How do I tell it to escape \ and not $?
Try:
@"\\\\$1"
for the replacement template. Basically, \\ escapes the \ for the string literal, so the template is \$1 by the time it is sent to the regex engine. The \ then escapes the $ in the template, causing your issue.
You actually need four backslashes, like this:
@"\\\\$1"
Why is this unwieldy system required? Well, think of it this way. The \ character is used as both the C escape character and the regex escape character. So, if you write an expression with only one backslash, you might get an error, because the compiler will think you're using the escape sequence \$. To escape the backslash, you need to use two backslashes, which evaluate to only one in the final NSString data.
However, you really need two backslashes in the NSString itself to be sent to the regex parser, so you need to escape both backslashes in the string literal. So, \\\\ resolves to \\ in the actual data, which the regex parser then collapses to a single literal \ character.
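Putting the four-backslash template together with the pattern from the question gives a sketch like the following. GREscapeBrackets is a hypothetical helper name; the pattern and template are the ones discussed above.

```objectivec
#import <Foundation/Foundation.h>

// Hypothetical helper: puts a backslash in front of every [ and ] in `input`.
static NSString *GREscapeBrackets(NSString *input) {
    NSError *error = nil;
    // Regex seen by the engine: ([\[\]])
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:@"([\\[\\]])" options:0 error:&error];
    if (regex == nil) {
        NSLog(@"Invalid pattern: %@", error);
        return input;
    }
    // The literal @"\\\\$1" is stored as \\$1: the template engine reads \\ as one
    // literal backslash and $1 as the first capture group.
    return [regex stringByReplacingMatchesInString:input
                                           options:0
                                             range:NSMakeRange(0, input.length)
                                      withTemplate:@"\\\\$1"];
}
```

For the input [word], this should produce \[word\] rather than $1word$1.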
Alright, I'm trying to write some code that removes words that contain an apostrophe from an NSString. To do this, I've decided to use regular expressions, and I wrote one, that I tested using this website: http://rubular.com/r/YTV90BcgoQ
Here, the expression is: \S*'+\S
As shown on the website, the words containing an apostrophe are matched. But for some reason, in the application I'm writing, using this code:
sourceString = [sourceString stringByReplacingOccurrencesOfRegex:@"\S*'+\S" withString:@""];
Doesn't return any positive result. By NSLogging the 'sourceString', I notice that words like 'Don't' and 'Doesn't' are still present in the output.
It doesn't seem like my expression is the problem, but maybe RegexKitLite doesn't accept certain types of expressions? If someone knows what's going on here, please enlighten me !
Literal NSStrings use \ as an escape character so that you can put things like newlines \n into them. Regexes also use backslashes as an escape character for character classes like \S. When your literal string gets run through the compiler, the backslashes are treated as escape characters, and don't make it to the regex pattern.
Therefore, you need to escape the backslashes themselves in your literal NSString, in order to end up with backslashes in the string that is used as the pattern: @"\\S*'+\\S".
You should have seen a compiler warning about "Unknown escape sequence" -- don't ignore those warnings!
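For reference, here is a minimal sketch of the corrected pattern in use. It swaps in Foundation's NSRegularExpression in place of RegexKitLite, which is an assumption made only to keep the example free of third-party code; the doubled backslashes are the same either way. GRRemoveApostropheWords is a hypothetical helper name.

```objectivec
#import <Foundation/Foundation.h>

// Hypothetical helper: deletes every match of \S*'+\S from `input`.
static NSString *GRRemoveApostropheWords(NSString *input) {
    NSError *error = nil;
    // The literal @"\\S*'+\\S" is stored as \S*'+\S, the pattern tested on Rubular.
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:@"\\S*'+\\S" options:0 error:&error];
    if (regex == nil) {
        NSLog(@"Invalid pattern: %@", error);
        return input;
    }
    return [regex stringByReplacingMatchesInString:input
                                           options:0
                                             range:NSMakeRange(0, input.length)
                                      withTemplate:@""];
}
```

Note that the pattern only consumes one non-space character after the apostrophe, so words with longer endings would need \S* at the end instead.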