How can I make this regex? - objective-c

I'm sorry for being annoying and asking other people to do this for me, but I have been trying for a while now and can't seem to get a working one. This is what it needs to allow:
Lower case letters
Upper case letters
Apostrophes (')
Dashes (-)
It doesn't matter what order these characters come in; the string should be rejected only if it contains anything other than the characters above. It is for Objective-C, if that affects anything in regular expressions.
NSString *nameRegEx = @"^[A-Z][a-zA-Z]+$";
NSPredicate *firstTest = [NSPredicate predicateWithFormat:@"SELF MATCHES %@", nameRegEx];

For the "upper & lower-case letters, dash and apostrophe" part of the regex, try this character class (repeat it with + and anchor it, as in ^[a-zA-Z'\\-]+$, to check the whole string):
[a-zA-Z'\\-]
You need to escape the - dash if you're not going to depend on it being in certain syntactic positions in the [] character class.
In Java, we'd need to use \\ double backslashes: the compiler treats a single backslash as the start of a string escape sequence, so we need a double backslash to get a single \ past the compiler to act as an escape in the regex. Objective-C string literals work the same way.
Hope this helps.
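Here is a minimal sketch of the whole check. It is written in Java because the answer already compares against Java, and Java string literals need the same backslash doubling as Objective-C ones. The class and method names are hypothetical. In the question's code, the same pattern string would presumably take the place of nameRegEx in the NSPredicate MATCHES test.

```java
import java.util.regex.Pattern;

// Hypothetical helper: accepts only ASCII letters, apostrophes and hyphens.
class NameCheck {
    // The literal "\\-" reaches the regex engine as \- (an escaped hyphen),
    // just as it would from an Objective-C @"..." literal.
    private static final Pattern NAME = Pattern.compile("^[a-zA-Z'\\-]+$");

    static boolean isValidName(String candidate) {
        // matches() must consume the whole input, so the empty string and
        // anything containing other characters are rejected.
        return NAME.matcher(candidate).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"O'Brien", "Smith-Jones", "Anne Marie", "R2D2"};
        for (String s : samples) {
            System.out.println(s + " -> " + isValidName(s));
        }
    }
}
```

Because matches() has to consume the entire input, the ^ and $ anchors are redundant here. They only document the intent, but they do matter with find()-style APIs.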

Related

Objective C - RegEx - Invalid Range when trying to match spaces [duplicate]

How do I rewrite the [a-zA-Z0-9!$* \t\r\n] pattern to match a hyphen along with the existing characters?
The hyphen is usually a normal character in regular expressions. Only if it’s in a character class and between two other characters does it take a special meaning.
Thus:
[-] matches a hyphen.
[abc-] matches a, b, c or a hyphen.
[-abc] matches a, b, c or a hyphen.
[ab-d] matches a, b, c or d (only here the hyphen denotes a character range).
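To check these rules quickly, here is a small Java sketch that tests single characters against each class. Java's regex engine is used only as a convenient stand-in, and the class name is hypothetical.

```java
import java.util.regex.Pattern;

// Checks how a hyphen's position inside [...] changes its meaning.
class HyphenPositions {
    static boolean matchesOne(String charClass, String input) {
        // Pattern.matches compiles the class and requires the whole input to match it.
        return Pattern.matches(charClass, input);
    }

    public static void main(String[] args) {
        String[] classes = {"[-]", "[abc-]", "[-abc]", "[ab-d]"};
        String[] inputs = {"-", "c", "d"};
        for (String cls : classes) {
            for (String in : inputs) {
                System.out.println(cls + " vs \"" + in + "\": " + matchesOne(cls, in));
            }
        }
    }
}
```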
Escape the hyphen.
[a-zA-Z0-9!$* \t\r\n\-]
UPDATE:
Never mind this answer - you can add the hyphen to the group but you don't have to escape it. See Konrad Rudolph's answer instead which does a much better job of answering and explains why.
It’s less confusing to always use an escaped hyphen, so that it doesn't have to be positionally dependent. That’s a \- inside the bracketed character class.
But there’s something else to consider. Some of those enumerated characters should possibly be written differently. In some circumstances, they definitely should.
This comparison of regex flavors says that C♯ can use some of the simpler Unicode properties. If you’re dealing with Unicode, you should probably use the general category \p{L} for all possible letters, and maybe \p{Nd} for decimal numbers. Also, if you want to accommodate all dash punctuation, not just HYPHEN-MINUS, you should use the \p{Pd} property. You might also want to write that sequence of whitespace characters simply as \s, assuming that’s not too general for you.
All together, that works out to a pattern of [\p{L}\p{Nd}\p{Pd}!$*\s] to match any one character from that set.
I’d likely use that anyway, even if I didn’t plan on dealing with the full Unicode set, because it’s a good habit to get into, and because these things often grow beyond their original parameters. Now when you lift it to use in other code, it will still work correctly. If you hard‐code all the characters, it won’t.
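A hedged illustration in Java, whose java.util.regex also supports the \p{L}, \p{Nd} and \p{Pd} general categories. The hard-coded ASCII class rejects an accented letter and an en dash that the property-based class accepts. The class name and sample string are made up for the example.

```java
import java.util.regex.Pattern;

// Compares a hard-coded ASCII class with the Unicode-property class from the answer.
class UnicodeClasses {
    // Original enumerated set, with an unescaped hyphen at the end of the class.
    static final Pattern ASCII_ONLY = Pattern.compile("[a-zA-Z0-9!$* \\t\\r\\n-]+");
    // Letters, decimal digits, dash punctuation, the three symbols and whitespace.
    static final Pattern UNICODE = Pattern.compile("[\\p{L}\\p{Nd}\\p{Pd}!$*\\s]+");

    static boolean asciiMatches(String s) {
        return ASCII_ONLY.matcher(s).matches();
    }

    static boolean unicodeMatches(String s) {
        return UNICODE.matcher(s).matches();
    }

    public static void main(String[] args) {
        // Sample contains U+00EB (e with diaeresis) and U+2013 (EN DASH).
        String sample = "Zo\u00EB \u2013 42!";
        System.out.println("ASCII class:   " + asciiMatches(sample));
        System.out.println("Unicode class: " + unicodeMatches(sample));
    }
}
```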
[-a-z0-9]+, [a-z0-9-]+ and [a-z-0-9]+ are all equivalent: a hyphen between two ranges is treated as a literal character. Likewise, [a-z0-9-+()]+ allows a hyphen.
Use \p{Pd} to match any type of hyphen or dash. The '-' character is just one type of hyphen, and it also happens to be a special character in regex.
Is this what you are after?
MatchCollection matches = Regex.Matches(mystring, "-");

Regular expression to extract a number of steps

I have a localized string that looks something like this in English:
"
5 Mile(s)
5,252 Step(s)
"
My app is localized both in left-to-right and right-to-left languages so I don't want to make assumptions either about the ordering of the step(s) or about the formatting of the number (e.g. 5,252 can be 5.252 depending on user locale). So I need to account for possibilities that can include things like
Step(s) 5.252
as well as what's above.
A few other caveats:
All I know is that if the Step(s) line is in there, it will be on its own line (hence in my regex I require \n at each end of the string)
No guarantee that the Mile(s) information will be in the string at all, let alone whether it will be before or after Step(s)
Here's my attempt at pattern extraction:
NSString *patternString = [NSString stringWithFormat:@"\\n(([0-9,\\.]*)\s*%@|%@\s*([0-9,\\.]*))\\n",
NSLocalizedString(@"Step(s)",nil), NSLocalizedString(@"Step(s)",nil)];
There appear to be two problems with this:
Xcode is indicating Unknown escape sequence '\s' for the second \s in the pattern string above
No matches are being found even for strings like the following:
0.2 Mile(s)
1,482 Step(s)
Ideally I would extract the 1,482 out of this string in a way that is localization friendly. How should I modify my regex?
As far as the regex goes, perhaps this approach might work: it simply matches (with named groups) each pair of numbers in sequence, assuming the first is miles and the second is steps. Decimals in the . or , form are optional:
(?<miles>\d+(?:[.,]\d+)?).*?(?<steps>\d+(?:[.,]\d+)?)
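Here is a minimal Java sketch of the named-group approach. It assumes the input is the two-line string from the question; Java stands in for NSRegularExpression, and the class name is hypothetical. One thing the regex does not show on its own: . does not match a newline by default, so .*? cannot get from the miles line to the steps line. The sketch therefore turns on DOTALL, which corresponds to NSRegularExpressionDotMatchesLineSeparators in NSRegularExpression.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MilesAndSteps {
    // DOTALL lets .*? cross the newline between the two lines; without it,
    // this pattern would match "0" as miles and "2" as steps inside "0.2".
    private static final Pattern PAIR = Pattern.compile(
            "(?<miles>\\d+(?:[.,]\\d+)?).*?(?<steps>\\d+(?:[.,]\\d+)?)",
            Pattern.DOTALL);

    // Returns {miles, steps}, or null if no pair of numbers is found.
    static String[] extract(String text) {
        Matcher m = PAIR.matcher(text);
        if (!m.find()) {
            return null;
        }
        return new String[] {m.group("miles"), m.group("steps")};
    }

    public static void main(String[] args) {
        String[] result = extract("0.2 Mile(s)\n1,482 Step(s)");
        System.out.println("miles=" + result[0] + " steps=" + result[1]);
    }
}
```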
(And I think it should be \\s.) I'm not an iOS guy, but if you can use a regex literal it would be much more readable.
First, I'd like to ask: why is Mile(s) mentioned in the question at all?
And now for my two bits: you could simply use a positive lookahead:
^(?=.*Step\(s\))[^\d]*(\d+(?:[.,]\d+)?)
It makes sure the expected word is present on the line, and then captures the number on it, allowing for an optional decimal separator (. or , depending on locale) followed by decimals. This way it doesn't matter whether the number comes before or after the "word".
It doesn't take localization of the "word" into account, but that you seem to have handled by yourself ;)
See it here at regex101.
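Here is a minimal Java sketch of the lookahead version; Java stands in for NSRegularExpression, and the class name is hypothetical. It assumes multiline mode, where ^ matches at the start of every line. NSRegularExpression calls this NSRegularExpressionAnchorsMatchLines. Without it, ^ only matches at the very start of the text, so nothing is found when the Mile(s) line comes first.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class StepsLookahead {
    // MULTILINE makes ^ match at the start of every line, so the lookahead
    // can select the line that contains "Step(s)".
    private static final Pattern STEPS = Pattern.compile(
            "^(?=.*Step\\(s\\))[^\\d]*(\\d+(?:[.,]\\d+)?)", Pattern.MULTILINE);

    static String steps(String text) {
        Matcher m = STEPS.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(steps("0.2 Mile(s)\n1,482 Step(s)"));
        System.out.println(steps("Step(s) 5.252"));
    }
}
```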
Your regex is close, although in Obj-C you need to double-escape the \s and (s):
^(([0-9,.]*)\\s*%@|%@\\s*([0-9,.]*))$
In your NSLocalizedString you likely also need to escape the parentheses enclosing (s):
NSString *patternString = [NSString stringWithFormat:@"^(([\\d,.]+)\\s%@|%@\\s([\\d,.]+))$",
NSLocalizedString(@"Step\\(s\\)",nil), NSLocalizedString(@"Step\\(s\\)",nil)];
If you don't escape (s) then the regex engine is probably going to interpret it as a capture group.
Looking at NSLog you can see what the pattern actually reads like:
NSLog(@"patternString: %@", patternString);
Output:
patternString: ^(([\d,.]+)\sStep\(s\)|Step\(s\)\s([\d,.]+))$
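The same double escaping can be checked in Java, whose string literals follow the same backslash rules. String.format with %s stands in for stringWithFormat: with %@, and the class and method names are hypothetical. Printing the result gives the same pattern as the NSLog output above. The extraction helper assumes multiline mode, so that ^ and $ match at line boundaries.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PatternFromFormat {
    // Same format string as the Objective-C answer, with %s standing in for %@.
    // Each \\ in the Java literal becomes a single \ in the resulting pattern.
    static String buildPattern(String localizedWord) {
        return String.format("^(([\\d,.]+)\\s%s|%s\\s([\\d,.]+))$", localizedWord, localizedWord);
    }

    // Returns whichever group matched: group 2 (number before) or group 3 (number after).
    static String steps(String text, String pattern) {
        Matcher m = Pattern.compile(pattern, Pattern.MULTILINE).matcher(text);
        if (!m.find()) {
            return null;
        }
        return m.group(2) != null ? m.group(2) : m.group(3);
    }

    public static void main(String[] args) {
        // The localized word arrives already escaped, as in the answer's NSLocalizedString key.
        String pattern = buildPattern("Step\\(s\\)");
        System.out.println("patternString: " + pattern);
        System.out.println(steps("0.2 Mile(s)\n1,482 Step(s)", pattern));
    }
}
```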
Since you mentioned the Mile(s) part may not be in the string at all I'm assuming it isn't relevant to the regular expression. As I understand from the question, you just need to capture the number of steps and nothing else. On this basis, here's a modified version of your existing regex:
NSString *patternString =
[NSString stringWithFormat:@"^(?:([0-9,.]*)\\s*%@|%@\\s*([0-9,.]*))$",
NSLocalizedString(@"Step\\(s\\)",nil), NSLocalizedString(@"Step\\(s\\)",nil)];
Demo:
https://www.regex101.com/r/Q6ff1b/1
This is based on the following tips/modifications:
Use the m (= UREGEX_MULTILINE) flag option, which NSRegularExpression exposes as NSRegularExpressionAnchorsMatchLines, when creating the regex to specify that ^ and $ match the start and end of each line. This is more robust than using \n, as it also handles the start and end of the string, where a newline might not be present.
Always use a double backslash (\\) for regex escaping - otherwise NSString will interpret the single backslash to be escaping the next character and convert it before it gets to the regex.
Literal parentheses need to be escaped - e.g. Step\\(s\\) instead of Step(s).
Most characters within a character class (i.e. anything within the [] square brackets) don't need to be escaped, so it would be . rather than \\. inside [0-9,.].
If you are using (x|y|...) as a choice and don't need it to be a capturing group, use ?: after the first parenthesis to ensure it doesn't get captured - i.e. (?:x|y|...).
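Putting these tips together, here is a hedged Java sketch; Java stands in for NSRegularExpression, and the names are hypothetical. As a variation on escaping the parentheses by hand, it quotes the localized word with Pattern.quote. NSRegularExpression offers +escapedPatternForString: for the same purpose, so the translated string would not need any regex escaping.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LocalizedSteps {
    // Pattern.quote escapes the localized word (e.g. the parentheses in "Step(s)"),
    // so the translation can be used exactly as it appears in the UI.
    static Pattern stepsPattern(String localizedWord) {
        String word = Pattern.quote(localizedWord);
        // (?: ... ) groups the two alternatives without capturing;
        // MULTILINE makes ^ and $ match at every line boundary.
        return Pattern.compile("^(?:([0-9,.]*)\\s*" + word + "|" + word + "\\s*([0-9,.]*))$",
                Pattern.MULTILINE);
    }

    static String extractSteps(String text, String localizedWord) {
        Matcher m = stepsPattern(localizedWord).matcher(text);
        if (!m.find()) {
            return null;
        }
        // Only one of the two groups takes part in a match; the other is null.
        return m.group(1) != null ? m.group(1) : m.group(2);
    }

    public static void main(String[] args) {
        System.out.println(extractSteps("0.2 Mile(s)\n1,482 Step(s)", "Step(s)"));
        System.out.println(extractSteps("Step(s) 5.252", "Step(s)"));
    }
}
```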

NSRegularExpression - Problem with a Pattern

I've written a little program to find a string within a string, which works fine so far. But I have a problem with NSRegularExpression: I need the right pattern for my special case and I'm stuck.
NSString *strRegExp = [NSString stringWithFormat:@"?trunk/%@/%@/+\\([a-zA-Z0-9_\\-\\.])+/Host-1", inputstrse , inputstrsno];
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:strRegExp options: NSRegularExpressionCaseInsensitive error:NULL];
NSArray *arrayOfAllMatches = [regex matchesInString:inputurl options:0 range:NSMakeRange(0, [inputurl length])];
The NSRegularExpression pattern should match strings that look like this:
trunk/%@/%@/some-text-1/Host-1
trunk/test/1/5-text-text/Host-1
Where trunk/%@/%@/ and /Host-1 always stay the same. Only the part in the middle is variable and always looks like this:
NUMBER-Some-Text -> 5-Hello-World -> /trunk/test/1/5-hello-world/Host-1
I've tried different regexes, as you can see here: "?trunk/%@/%@/+\([a-zA-Z0-9_\-\.])+/Host-1", but it still doesn't seem to work. Maybe someone can help me.
Maybe there is a problem when I build the pattern with:
NSString *strRegExp = [NSString stringWithFormat:@"?trunk/%@/%@/+\\([a-zA-Z0-9_\\-\\.])+/Host-1", inputstrse , inputstrsno];
And use it later like that:
regularExpressionWithPattern:strRegExp
I hope someone can help me; I'm new to regular expressions.
Generally, describing a regex in words, as in "I want to match a number of letters, then a dash, then a number" and so on, is the easiest way to construct one. Also, using a tool such as http://www.regexr.com simplifies things a lot.
From what I understand you want to match the following:
trunk/test/1/[some number]-[some text]-[some other text]/Host-1
If so, then the following regular expression should cut it:
trunk\/test\/1\/[0-9]*-[a-zA-Z]*-[a-zA-Z]*\/Host-1
It does the following:
trunk\/test\/1\/: Match the constant string trunk/test/1/ (The backslashes are escapes)
[0-9]*-: Match any number of digits followed by a -
[a-zA-Z]*-: Match any number of letters followed by a -
[a-zA-Z]*: Match any number of letters
\/Host-1: Match the constant string /Host-1
Here is a link to RegExr which you can use if you want to experiment with different input data or changes to the regex: http://regexr.com/39tgn
The following pattern was provided in the comments: trunk\/test\/1\/.*\/Host-1. It's a bit less strict but does the job as well.
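Here is a quick Java check of the strict pattern; Java is only a stand-in engine, and the class name is hypothetical. Outside a /.../ regex literal, / is an ordinary regex character, so the \/ escapes are dropped here. Keeping them would also work.

```java
import java.util.regex.Pattern;

class TrunkPathStrict {
    // trunk/test/1/<digits>-<letters>-<letters>/Host-1, with no \/ escapes.
    private static final Pattern PATH =
            Pattern.compile("trunk/test/1/[0-9]*-[a-zA-Z]*-[a-zA-Z]*/Host-1");

    // find() looks for the pattern anywhere in the URL, so a leading "/" is fine.
    static boolean containsPath(String url) {
        return PATH.matcher(url).find();
    }

    public static void main(String[] args) {
        System.out.println(containsPath("/trunk/test/1/5-hello-world/Host-1"));
        System.out.println(containsPath("trunk/test/1/hello/Host-1"));
    }
}
```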
I don't know Objective-C, but your regex has a bunch of oddities; if I remove those, I get something that I think is what you want to achieve.
Your first character is a ?, which can't work: it's a quantifier that says something about the preceding character (or class or group). If it's the first character, there is nothing preceding it.
/+\\( <-- unsure what you were trying to do here, but once the string literal turns \\ into a single \, the regex engine reads it as "one or more / followed by a literal (", which also leaves the later ) without a matching opening parenthesis.
[a-zA-Z0-9_\\-\\.] can be written much more concisely as [\w.-], and if you place the + inside the parentheses it will capture the entire unknown string in capture group 1.
From comments: So %@ is a variable text; the first is always just letters, the second is always just numbers. That would be [a-zA-Z]+ and \d+ respectively in a regex. But actually I would use [^/]+ (any character that isn't /) so that the code doesn't break when someone puts a different character in this path, like trunk/this_text/4/.../Host-1, which would break on the _.
Combined this makes (changed after comments):
trunk/[^/]+/[^/]+/([\w.-]+)/Host-1
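Note that this is the raw pattern, without the escaping needed to get it into the regex engine through a string literal. In C#, a verbatim string started with @"..." doesn't need escaping, but Objective-C's @"..." is an ordinary C-style literal, so every backslash must be doubled (e.g. [\\w.-] in the literal).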
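Here is a Java sketch of the combined pattern that pulls out capture group 1; Java is only a stand-in engine, and the class name is hypothetical. Note that \w has to be written as \\w inside the string literal, exactly as it would inside an Objective-C @"..." literal.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TrunkPathCapture {
    // trunk/<anything but />/<anything but />/<word chars, dots, hyphens>/Host-1
    private static final Pattern PATH =
            Pattern.compile("trunk/[^/]+/[^/]+/([\\w.-]+)/Host-1");

    // Returns the variable middle segment, or null if the URL doesn't match.
    static String middlePart(String url) {
        Matcher m = PATH.matcher(url);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(middlePart("/trunk/test/1/5-hello-world/Host-1"));
        System.out.println(middlePart("trunk/this_text/4/7-some-text/Host-1"));
    }
}
```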

Regex (searching for function(@"string content") to get "string content")

I have a little regex problem (don't we all sometimes).
The few pieces of code are in Objective-C, but I believe regular expressions are still the same.
I have two functions called
NSString * CRLocalizedString(NSString *key)
NSString * CRLocalizedArgString(NSString *key, ...)
These are scattered around my project for localisation.
Now I want to find them all.
Well, go to the directory, parse all the files, etc.
All fine there.
The regexes I use on the files are
[NSRegularExpression regularExpressionWithPattern:@"CRLocalizedString\\(@\\\"[^)]+\\\"\\)" options:0 error:&error];
[NSRegularExpression regularExpressionWithPattern:@"CRLocalizedArgString\\([^)]+\\)" options:0 error:&error];
And this works perfectly, except that my terminating character is a ).
The problem occurs with function calls like this
CRLocalizedString(@"Happy =), o so happy =D");
CRLocalizedArgString(@"Filter (%i)", 0.75f);
The regex ends the string at "Filter (%i" and at "Happy =)".
And this is where my regex knowledge ends, and I don't know what to do anymore.
I thought of using ");" as the end, but that isn't always the case.
So I was hoping someone here could help me (completely different approaches than regex are also welcome, of course).
Kind regards
Saren
Let's write your first regex without the extra level of C escapes:
CRLocalizedString\(@\"[^)]+\"\)
You don't have to escape a " for a regex, so let's get rid of those extra backslashes:
CRLocalizedString\(@"[^)]+"\)
So, you want to match a quoted string using "[^)]+". But that doesn't match every quoted string.
What is a quoted string? It's a ", followed by any number of string atoms, followed by another ". What is a string atom? It's any character except " or \, or a \ followed by any character. So here's a regex for a quoted string:
"([^"\\]|\\.)*"
Sticking that back into your first regex, we get this:
CRLocalizedString\(@"([^"\\]|\\.)*"\)
Here's a link to a regex tester demonstrating that regex.
Quoting it in an Objective-C string literal gives us this:
@"CRLocalizedString\\(@\"([^\"\\\\]|\\\\.)*\"\\)"
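Here is a Java sketch of the quoted-string regex. Java string literals escape exactly like Objective-C ones, so the literal is the same apart from the leading @; the class name is hypothetical. The sketch wraps the repetition in an extra capture group, ((?:[^"\\]|\\.)*), because the answer's ([^"\\]|\\.)* group only keeps the last string atom it matched.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuotedKey {
    // Regex seen by the engine: CRLocalizedString\(@"((?:[^"\\]|\\.)*)"\)
    private static final Pattern CALL =
            Pattern.compile("CRLocalizedString\\(@\"((?:[^\"\\\\]|\\\\.)*)\"\\)");

    // Returns the text between the quotes of the first call, or null.
    static String key(String source) {
        Matcher m = CALL.matcher(source);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(key("CRLocalizedString(@\"Happy =), o so happy =D\");"));
        System.out.println(key("CRLocalizedString(@\"Say \\\"hi\\\"\");"));
    }
}
```

Escape sequences inside the key come back as written (Say \"hi\"), not unescaped.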
It is impossible to write a regex to match calls to CRLocalizedArgString in the general case, because such calls can take arbitrary expressions as arguments, and regexes cannot match arbitrary expressions (because they can contain arbitrary levels of nested parentheses, which regexes cannot match).
You could just hope that there are no parentheses in the argument list, and use this regex:
CRLocalizedArgString\(@"([^"\\]|\\.)*"[^)]*\)
Here's a link to a regex tester demonstrating that regex.
Quoting it in an Objective-C string literal gives us this:
@"CRLocalizedArgString\\(@\"([^\"\\\\]|\\\\.)*\"[^)]*\\)"
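Here is the same kind of Java check for the CRLocalizedArgString regex; the class name is hypothetical. The second call shows the limitation described above: a nested parenthesis in the argument list cuts the match short at the first ).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ArgCall {
    // Regex seen by the engine: CRLocalizedArgString\(@"([^"\\]|\\.)*"[^)]*\)
    private static final Pattern CALL =
            Pattern.compile("CRLocalizedArgString\\(@\"([^\"\\\\]|\\\\.)*\"[^)]*\\)");

    // Returns the whole matched call text, or null.
    static String firstCall(String source) {
        Matcher m = CALL.matcher(source);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstCall("CRLocalizedArgString(@\"Filter (%i)\", 0.75f);"));
        // [^)]* stops at the first ), so the call's own closing ) is left out.
        System.out.println(firstCall("CRLocalizedArgString(@\"x %d\", max(1, 2));"));
    }
}
```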

RegexKitLite Not Matching NSString Correctly

Alright, I'm trying to write some code that removes words that contain an apostrophe from an NSString. To do this, I've decided to use regular expressions, and I wrote one, that I tested using this website: http://rubular.com/r/YTV90BcgoQ
Here, the expression is: \S*'+\S
As shown on the website, the words containing an apostrophe are matched. But for some reason, in the application I'm writing, using this code:
sourceString = [sourceString stringByReplacingOccurrencesOfRegex:@"\S*'+\S" withString:@""];
Doesn't return any positive result. By NSLogging the 'sourceString', I notice that words like 'Don't' and 'Doesn't' are still present in the output.
It doesn't seem like my expression is the problem, but maybe RegexKitLite doesn't accept certain types of expressions? If someone knows what's going on here, please enlighten me!
Literal NSStrings use \ as an escape character so that you can put things like newlines \n into them. Regexes also use backslashes as an escape character for character classes like \S. When your literal string gets run through the compiler, the backslashes are treated as escape characters, and don't make it to the regex pattern.
Therefore, you need to escape the backslashes themselves in your literal NSString, in order to end up with backslashes in the string that is used as the pattern: @"\\S*'+\\S".
You should have seen a compiler warning about "Unknown escape sequence" -- don't ignore those warnings!
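For comparison, here is a Java sketch; the class name is hypothetical. javac rejects "\S" outright as an illegal escape character rather than just warning, and String.replaceAll plays the role of stringByReplacingOccurrencesOfRegex:withString:. As in the original, removing a word leaves its surrounding spaces behind.

```java
// Hypothetical helper mirroring the RegexKitLite call from the question.
class ApostropheWords {
    // "\\S*'+\\S" in Java source is the pattern \S*'+\S that was tested on Rubular.
    static String removeApostropheWords(String text) {
        return text.replaceAll("\\S*'+\\S", "");
    }

    public static void main(String[] args) {
        System.out.println(removeApostropheWords("I Don't know why it Doesn't work"));
    }
}
```

The final \S takes only one character after the last apostrophe. A word like rock'n'roll is therefore only partly removed, leaving oll. \S*'+\S* would remove it whole.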