Xcode - replace function with regex and two-digit capture group (back reference) - objective-c

I would like to use the Xcode's find in project option to normalize the signatures of methods.
I wrote the find expression:
^\s*([+-])\s*\((\w+)\s*(\*?)\s*\)\s*(\w+)(\s*(:)\s*(\()\s*(\w+)\s*(\*?)\s*(\))\s*(\w+))?
and the replacement expression:
\1 \(\2\3\)\4\6\7\8\9\10\11
The test string is:
+(NSString *) testFunction : (NSInteger ) arg1
and the desired result:
+ (NSString*)testFunction:(NSInteger)arg1
Unfortunatelly Xcode isn't able to recognize te two digit capture group \10 and translates it to \1 and '0' character and so long. How to solve this problem or bug?
Thanks in advance,
MichaƂ

I believe #trojanfoe is correct; regexes can only have nine capture groups. This is waaay more than you need for your particular example, though.
^\s*([+-])\s*\((\w+)\s*(\*?)\s*\)\s*(\w+)(\s*(:)\s*(\()\s*(\w+)\s*(\*?)\s*(\))\s*(\w+))?
\1 \(\2\3\)\4\6\7\8\9\10\11
The first thing I notice is that you're not using \5, so there's no reason to capture it at all. Next, I notice that \6 corresponds to the regex (:), so you can avoid capturing it and replace \6 with : in the output. \7 corresponds to (\(), so you can replace \7 with ( in the output. ...Iterating this approach yields a much simpler pair of regexes: one for zero-argument methods and one for one-argument methods.
^\s*([+-])\s*\((\w+)\s*(\*?)\s*\)\s*(\w+)
\1 \(\2\3\)\4
^([+-] \(\w+\*?\)\w+)\s*:\s*\(\s*(\w+)\s*(\*?)\s*\)\s*(\w+)
\1:\(\2\3\)\4
Notice that I can capture the whole regex [+-] \(\w+\*?\)\w+ without all those noisy \s*s, because it's been normalized already by the first regex's pass.
However, this whole idea is a huge mistake. Consider the following Objective-C method declarations:
-(const char *)toString;
-(id)initWithA: (A) a andB: (B) b andC: (C) c;
-(NSObject **)pointerptr;
-(void)performBlock: (void (^)(void)) block;
-(id)stringWithFormat: (const char *) fmt, ...;
None of these are going to be parsed correctly by your regex. The first one contains a two-word type const char instead of a single word; the second has more than one parameter; the third has a double pointer; the fourth has a very complicated type instead of a single word; and the fifth has not only const char but a variadic argument list. I could go on, through out parameters and arrays and __attribute__ syntax, but surely you're beginning to see why regexes are a bad match for this problem.
What you're really looking for is an indent program (named after GNU indent, which unfortunately doesn't do Objective-C). The best-known and best-supported Objective-C indent program is called uncrustify; get it here.

Related

Regular expression to extract a number of steps

I have a localized string that looks something like this in English:
"
5 Mile(s)
5,252 Step(s)
"
My app is localized both in left-to-right and right-to-left languages so I don't want to make assumptions either about the ordering of the step(s) or about the formatting of the number (e.g. 5,252 can be 5.252 depending on user locale). So I need to account for possibilities that can include things like
Step(s) 5.252
as well as what's above.
A few other caveats
All I know is that if the Step(s) line is in there, it will be on its own line (hence in my regex I require \n at each end of the string)
No guarantee that the Mile(s) information will be in the string at all, let alone whether it will be before or after Step(s)
Here's my attempt at pattern extraction:
NSString *patternString = [NSString stringWithFormat:#"\\n(([0-9,\\.]*)\s*%#|%#\s*([0-9,\\.]*))\\n",
NSLocalizedString(#"Step(s)",nil), NSLocalizedString(#"Step(s)",nil)];
There appear to be two problems with this:
XCode is indicating Unknown escape sequence '\s' for the second \s in the pattern string above
No matches are being found even for strings like the following:
0.2 Mile(s)
1,482 Step(s)
Ideally I would extract the 1,482 out of this string in a way that is localization friendly. How should I modify my regex?
as far as the regex, perhaps this approach might work - it simply matches (with named groups) each couplet of numbers in sequence, with the assumption the first is miles and the second is steps. Decimals in the . or , form are optional:
(?<miles>\d+(?:[.,]\d+)?).*?(?<steps>\d+(?:[.,]\d+)?)
(and i think it should be \\s) - i'm not an ios guy, but if you can use a regex literal it would be way more readable.
regular expression demo
First I'd like to ask - Why is Mile(s) mentioned in the question at all?
And now to my two bits - you could simply use a positive look-ahead:
^(?=.*Step\(s\))[^\d]*(\d+(?:[.,]\d+)?)
It makes sure the expected word is present on the line, and then captures the number on it, allowing for localized, optional, decimal separator and decimals. This way it doesn't matter if the numer is before, or after, the "word".
It doesn't take localization of the "word" into account, but that you seem to have handled by yourself ;)
See it here at regex101.
Your regex is close, although in Obj-C you need to double-escape the \s and (s):
^(([0-9,.]*)\\s*%#|%#\\s*([0-9,.]*))$
In your NSLocalizedString you likely also need to escape the parentheses enclosing (s):
NSString *patternString = [NSString stringWithFormat:#"^(([\\d,.]+)\\s%#|%#\\s([\\d,.]+))$",
NSLocalizedString(#"Step\\(s\\)",nil), NSLocalizedString(#"Step\\(s\\)",nil)];
If you don't escape (s) then the regex engine is probably going to interpret it as a capture group.
Looking at NSLog you can see what the pattern actually reads like:
NSLog(#"patternString: %#", patternString);
Output:
patternString: ^(([\d,.]+)\sStep\(s\)|Step\(s\)\s([\d,.]+))$
Since you mentioned the Mile(s) part may not be in the string at all I'm assuming it isn't relevant to the regular expression. As I understand from the question, you just need to capture the number of steps and nothing else. On this basis, here's a modified version of your existing regex:
NSString *patternString =
[NSString stringWithFormat:#"^(?:([0-9,.]*)\\s*%#|%#\\s*([0-9,.]*))$",
NSLocalizedString(#"Step\\(s\\)",nil), NSLocalizedString(#"Step\\(s\\)",nil)];
Demo:
https://www.regex101.com/r/Q6ff1b/1
This is based on the following tips/modifications:
Use the m (= UREGEX_MULTILINE) flag option when creating the regex to specify that ^ and $ match the start and end of each line. This is more sophisticated than using \n as it will also handle the start and end of the string where this might not be present. See here.
Always use a double backslash (\\) for regex escaping - otherwise NSString will interpret the single backslash to be escaping the next character and convert it before it gets to the regex.
Literal parentheses need to be escaped - e.g. Step\\(s\\) instead of Step(s).
Characters within a character class (i.e. anything within the [] square brackets) don't need to be escaped - so it would be . rather than \\. - the latter.
If you are using (x|y|...) as a choice and don't need it to be a capturing group, use ?: after the first parenthesis to ensure it doesn't get captured - i.e. (?:x|y|...).

Regex (searching for function(#"string content") to get "string content"

I have a little regex problem (don't we all sometimes).
The few pieces of code are from Objective C but regex expressions are still the same I believe.
I have two functions called
NSString * CRLocalizedString(NSString *key)
NSString * CRLocalizedArgString(NSString *key, ...)
These are scattered around my project for localisation.
Now I want to find them all.
Well go to directory, parse all files, etc
All fine there.
The regexes I use on the files are
[NSRegularExpression regularExpressionWithPattern:#"CRLocalizedString\\(#\\\"[^)]+\\\"\\)" options:0 error:&error];
[NSRegularExpression regularExpressionWithPattern:#"CRLocalizedArgString\\([^)]+\\)" options:0 error:&error];
And this works perfect except that my terminates character is an ).
The problem occurs with function calls like this
CRLocalizedString(#"Happy =), o so happy =D");
CRLocalizedArgString(#"Filter (%i)", 0.75f);
The regex ends the string at "Filter (%i" and at "Happy =)".
And this is where my regex knowledge ends and I do not now what to do anymore.
I thought using ");" as an end but this isn't always the case.
So I was hoping someone here knew something for me (complete different things then regex are also allowed of course)
Kind regards
Saren
Let's write your first regex without the extra level of C escapes:
CRLocalizedString\(#\"[^)]+\"\)
You don't have to escape a " for a regex, so let's get rid of those extra backslashes:
CRLocalizedString\(#"[^)]+"\)
So, you want to match a quoted string using "[^)]+". But that doesn't match every quoted string.
What is a quoted string? It's a ", followed by any number of string atoms, followed by another ". What is a string atom? It's any character except " or \, or a \ followed by any character. So here's a regex for a quoted string:
"([^"\\]|\\.)*"
Sticking that back into your first regex, we get this:
CRLocalizedString\(#"([^"\\]|\\.)*"\)
Here's a link to a regex tester demonstrating that regex.
Quoting it in an Objective-C string literal gives us this:
#"CRLocalizedString\\(#\"([^\"\\\\]|\\\\.)*\"\\)"
It is impossible to write a regex to match calls to CRLocalizedArgString in the general case, because such calls can take arbitrary expressions as arguments, and regexes cannot match arbitrary expressions (because they can contain arbitrary levels of nested parentheses, which regexes cannot match).
You could just hope that there are no parentheses in the argument list, and use this regex:
CRLocalizedArgString\(#"([^"\\]|\\.)*"[^)]*\)
Here's a link to a regex tester demonstrating that regex.
Quoting it in an Objective-C string literal gives us this:
#"CRLocalizedArgString\\(#\"([^\"\\\\]|\\\\.)*\"[^)]*\\)"

What does the \? (backslash question mark) escape sequence mean?

I'm writing a regular expression in Objective-C.
The escape sequence \w is illegal and emits a warning, so the regular expression /\w/ must be written as #"\\w"; the escape sequence \? is valid, apparently, and doesn't emit a warning, so the regular expression /\?/ must be written as #"\\?" (i.e., the backslash must be escaped).
Question marks aren't invisible like \t or \n, so why is \? a valid escape sequence?
Edit: To clarify, I'm not asking about the quantifier, I'm asking about a string escape sequence. That is, this doesn't emit a warning:
NSString *valid = #"\?";
By contrast, this does emit a warning ("Unknown escape sequence '\w'"):
NSString *invalid = #"\w";
It specifies a literal question mark. It is needed because of a little-known feature called trigraphs, where you can write a three-character sequence starting with question marks to substitute another character. If you have trigraphs enabled, in order to write "??" in a string, you need to write it as "?\?" in order to prevent the preprocessor from trying to read it as the beginning of a trigraph.
(If you're wondering "Why would anybody introduce a feature like this?": Some keyboards or character sets didn't include commonly used symbols like {. so they introduced trigraphs so you could write ??< instead.)
? in regex is a quantifier, it means 0 or 1 occurences. When appended to the + or * quantifiers, it makes it "lazy".
For example, applying the regex o? to the string foo? would match o.
However, the regex o\? in foo? would match o?, because it is searching for a literal question mark in the string, instead of an arbitrary quantifier.
Applying the regex o*? to foo? would match oo.
More info on quantifiers here.

How do I format an NSString over multiple lines

Does anyone know how to format an NSString over multiple lines?
e.g. this doesn't build:
return #"asdfasdf" +
"asdfasdf";
return #"asdfasdf"
#"asdfasdf";
I suggest using this syntax instead of
return #"asdfasdf"
"asdfasdf";
just to distinguish C-strings from ObjectiveC ones.
I was having this problem all the time (especially with HTML strings), so I made a tiny tool to convert text to an escaped multi-line Objective-C string:
http://multilineobjc.herokuapp.com/
Hope this saves you some time.
If you remove the +, the compiler will join the two strings together. See C syntax: string literal concatenation.
return #"asdfasdf"
"asdfasdf";
Note that neither GCC nor LLVM seem to care if you omit the # prefix from the later strings.

Regular expression for extracting a number

I would like to be able to extract a number from within a string formatted as follows:
"<[1085674730]> hello foo1, how are you doing?"
I'm a novice with regular expressions, I only want to be able to extract a number that is enclosed in the greater/less-than and bracket symbols, but I'm not sure how to go about it. I have to match numeric digits only, but I'm not sure what syntax is used for only searching within these symbols.
UPDATE:
Thank you all for you input, sorry for not being more specific, as I explained to kiamlaluno, I'm using VB.Net as the language for my application. I was wondering why some of the implementations were not working. In fact, the only one that did work was the one described by Matthew Flaschen. But that captures the symbols around the number as well as the number itself. I would like to only capture the number that is encased in the symbols and filter out the symbols themselves.
Use:
<\[(\d+)\]>
This is tested with ECMAScript regex.
It means:
\[ - literal [
( - open capturing group
\d - digit
+ - one or more
) - close capturing group
\] - literal ]
The overall functionality is to capture one or more digits surrounded by the given characters.
Combine Mathews post with lookarounds http://www.regular-expressions.info/lookaround.html. This will exclude the prefix and suffix.
(?<=<\[)\d+(?=\]>)
I didn't test this regex but it should be very close to what you need. Double check at the link provided.
Hope this helps!
$subject = "<[1085674730]> hello foo1, how are you doing?";
preg_match('/<\[(\d+)\]>/', $subject, $matches);
$matches[1] will contain the number you are looking for.
Use:
/<\[([[:digit:]]+)\]>/
If your implementation doesn't support the handy [:digit:] syntax, then use this:
/<\[([\d]+)\]>/
And if your implementation doesn't support the handy \d syntax, then use this:
/<\[([0-9]+)\]>/