NSRegularExpression escaping parentheses - objective-c

I'm using regular expressions to find some values in a string. What I'm trying to find looks something like this:
Dealt to SanderDecler [2s 5d]
But I can't find a way to escape these square brackets, and I had the same problem with parentheses earlier. I tried escaping them as \( and \[, but that gave no matches. So I replaced each of them with a dot, which does match, but that doesn't seem like the best approach, and I imagine specifying the exact character is better for performance too.
So my question is: how can I match parentheses and square brackets?
Here's what my code looks like now. It works, but it's not optimal:
NSString *expression =
    @"^Dealt to (.{1,12}) .([0-9TJKQA][cdhs]) ([0-9TJKQA][cdhs]).";
NSRegularExpression *regex =
    [NSRegularExpression regularExpressionWithPattern:expression
                                              options:NSRegularExpressionAnchorsMatchLines
                                                error:nil];
for (NSTextCheckingResult *result in [regex matchesInString:history options:NSMatchingReportCompletion range:NSMakeRange(0, history.length)])
{
    NSLog(@"%@", [history substringWithRange:[result rangeAtIndex:0]]);
}

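Try this. Inside an Objective-C string literal each backslash must itself be escaped, so the regex \[ is written \\[:
@"^Dealt to (.{1,12}) \\[([0-9TJKQA][cdhs]) ([0-9TJKQA][cdhs])\\]"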
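To try the escaped-bracket pattern outside Xcode, here is a minimal Python sketch. It assumes Python's re module treats \[ and \] the same way NSRegularExpression does for this pattern (as literal brackets); the hand line is the one from the question, and the variable names and the extra first line of `history` are made up for illustration. The raw string stands in for the doubled backslashes that an Objective-C literal needs.

```python
import re

# Raw string: the regex engine sees \[ and \], i.e. literal square brackets.
# In an Objective-C literal the same pattern is written with \\[ and \\].
PATTERN = re.compile(
    r"^Dealt to (.{1,12}) \[([0-9TJKQA][cdhs]) ([0-9TJKQA][cdhs])\]",
    re.MULTILINE,  # rough counterpart of NSRegularExpressionAnchorsMatchLines
)

# Hypothetical hand history; only the second line comes from the question.
history = "Some other line\nDealt to SanderDecler [2s 5d]\n"

match = PATTERN.search(history)
player, first_card, second_card = match.groups()
print(player, first_card, second_card)
```

Because the brackets are now matched literally, the three capture groups hold only the player name and the two cards.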

Related

RegEx for parsing chemical formulas

I need a way to separate a chemical formula into its components. The result should look like this:
Ag3PO4 -> [Ag3, P, O4]
H2O -> [H2, O]
CH3OOH -> [C, H3, O, O, H]
Ca3(PO4)2 -> [Ca3, (PO4)2]
I don't know regex syntax, but I know I need something like this:
[An optional parenthesis][A capital letter][0 or more lowercase letters][0 or more numbers][An optional parenthesis][0 or more numbers]
This worked:
NSRegularExpression *regex = [NSRegularExpression
    regularExpressionWithPattern:@"[A-Z][a-z]*\\d*|\\([^)]+\\)\\d*"
                         options:0
                           error:nil];
NSArray *tests = [[NSArray alloc] initWithObjects:@"Ca3(PO4)2", @"HCl", @"CaCO3", @"ZnCl2", @"C7H6O2", @"BaSO4", nil];
for (NSString *testString in tests)
{
    NSLog(@"Testing: %@", testString);
    NSArray *myArray = [regex matchesInString:testString options:0 range:NSMakeRange(0, [testString length])];
    NSMutableArray *matches = [NSMutableArray arrayWithCapacity:[myArray count]];
    for (NSTextCheckingResult *match in myArray) {
        NSRange matchRange = [match rangeAtIndex:0];
        [matches addObject:[testString substringWithRange:matchRange]];
        NSLog(@"%@", [matches lastObject]);
    }
}
(PO4)2 really stands apart from the other components.
Let's start simple and match the items without parentheses:
[A-Z][a-z]?\d*
Using the regex above we can successfully parse Ag3PO4, H2O and CH3OOH.
Then we need to add an expression for a parenthesized group. A group by itself can be matched using:
\(.*?\)\d+
So we combine the two with an "or" condition:
[A-Z][a-z]?\d*|\(.*?\)\d+
This works for the given cases, but you may have more samples to test.
Note: it will have problems with nested parentheses, e.g. Co3(Fe(CN)6)2.
If you want to handle that case, you can use the following regex (it needs an engine that allows variable-length lookbehind):
[A-Z][a-z]?\d*|(?<!\([^)]*)\(.*\)\d+(?![^(]*\))
For Objective-C you can use the expression without lookarounds:
[A-Z][a-z]?\d*|\([^()]*(?:\(.*\))?[^()]*\)\d+
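To see the difference on the nested example, here is a small Python sketch. It assumes Python's re module behaves like the Objective-C engine for these two patterns (neither uses lookarounds); the names SIMPLE and NESTED are just labels for this illustration.

```python
import re

# Simple alternation from earlier in the answer: the lazy .*? stops at the
# first closing parenthesis, so a nested group is cut short.
SIMPLE = re.compile(r"[A-Z][a-z]?\d*|\(.*?\)\d+")

# Lookaround-free variant: allows one nested (...) block inside the group.
NESTED = re.compile(r"[A-Z][a-z]?\d*|\([^()]*(?:\(.*\))?[^()]*\)\d+")

formula = "Co3(Fe(CN)6)2"
simple_parts = SIMPLE.findall(formula)
nested_parts = NESTED.findall(formula)
print(simple_parts)  # ['Co3', '(Fe(CN)6']
print(nested_parts)  # ['Co3', '(Fe(CN)6)2']
```

The simple pattern loses the trailing )2, while the second one keeps the whole outer group together.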
Or a regex with repetition, in case a formula has several parenthesized blocks inside one group, like A(B(CD)3E(FG)4)5 (I don't know of any such formulas):
[A-Z][a-z]?\d*|\((?:[^()]*(?:\(.*\))?[^()]*)+\)\d+
When you encounter a parenthesized group, you don't want to parse what's inside, right?
If there are no nested groups, you can simply use
[A-Z][a-z]*\d*|\([^)]+\)\d*
\d is a shortcut for [0-9], and [^)] means any character except a closing parenthesis.
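As a quick check against the examples from the question, here is a minimal Python sketch of the same pattern. The helper name split_formula is made up for this illustration, and it assumes Python's re module handles this pattern the same way as NSRegularExpression.

```python
import re

# An element symbol with an optional count, or a parenthesized group with an
# optional count. [^)]+ means the group may not contain nested parentheses.
COMPONENT = re.compile(r"[A-Z][a-z]*\d*|\([^)]+\)\d*")

def split_formula(formula):
    """Return the top-level components of a chemical formula string."""
    return COMPONENT.findall(formula)

for formula in ["Ag3PO4", "H2O", "CH3OOH", "Ca3(PO4)2"]:
    print(formula, "->", split_formula(formula))
```

Each match is returned whole because the pattern has no capturing groups, which is the same thing rangeAtIndex:0 gives you in the Objective-C loop.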
This should just about work:
/(\(?)([A-Z])([a-z]*)([0-9]*)(\))?([0-9]*)/g
Play around with it here: http://refiddle.com/
This pattern should work, depending on your regex engine; it relies on recursion, (?R), which not every engine supports:
([A-Z][a-z]*\d*)|(\((?:[^()]+|(?R))*\)\d*) with the gm options
It is better to limit the matches to valid element symbols. In simple form:
^((Ac|Ag|Al|Am|Ar|As|At|Au|B|Ba|Be|Bh|Bi|Bk|Br|C|Ca|Cd|Ce|Cf|Cl|Cm|Co|Cr|Cs|Cu|Ds|Db|Dy|Er|Es|Eu|F|Fe|Fm|Fr|Ga|Gd|Ge|H|He|Hf|Hg|Ho|Hs|I|In|Ir|K|Kr|La|Li|Lr|Lu|Md|Mg|Mn|Mo|Mt|N|Na|Nb|Nd|Ne|Ni|No|Np|O|Os|P|Pa|Pb|Pd|Pm|Po|Pr|Pt|Pu|Ra|Rb|Re|Rf|Rg|Rh|Rn|Ru|S|Sb|Sc|Se|Sg|Si|Sm|Sn|Sr|Ta|Tb|Tc|Te|Th|Ti|Tl|Tm|U|V|W|Xe|Y|Yb|Zn|Zr)\d*)+$
This doesn't deal with the parenthesized groups.
We worked this out during the San Diego Python Users Group meeting.

Regex pattern to find occurrences of html tags

Say I have a string that looks like this:
iword/i
Here the tag is i. This is similar to an HTML tag except without the <> angle brackets.
Or say I have
emword/em
Here the tag is em.
What I want is a pattern that removes these tags.
I'm testing this pattern:
<([A-Z][A-Z0-9]*)\b[^>]*>(.*?)</\1> on http://rubular.com/, but it is not working properly.
Specifically, this is what I want to do in Objective-C:
NSString *string = @"iword/i";
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:pattern options:NSRegularExpressionCaseInsensitive error:&error];
return [regex stringByReplacingMatchesInString:string options:0 range:NSMakeRange(0, string.length) withTemplate:@""];
which should remove everything but word.
You're going to need a complete list of the HTML tags you want to remove (i, em, b, what else?), since you have to search specifically for the tags to remove.
One way of doing this is: \b(i|em|b)(\w*)\/(i|em|b)\b (and, as you've seen before with Objective-C, the backslashes will likely need doubling in the string literal)
In action: http://regex101.com/r/qL3cU9
Input:
iword/i
emword/em
bword/b
ibword/ib
notgoing/tomatch this
Substitution result:
word
word
word
ibword/ib
notgoing/tomatch this
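The substitution above can be reproduced with a short Python sketch. The answer does not state the replacement template, so this assumes it is the second capture group (the wrapped word); strip_tags is a made-up helper name, and re.IGNORECASE mirrors the NSRegularExpressionCaseInsensitive option from the question.

```python
import re

# Opening tag name, the wrapped word, a slash, then a closing tag name.
TAG = re.compile(r"\b(i|em|b)(\w*)/(i|em|b)\b", re.IGNORECASE)

def strip_tags(text):
    """Replace each tag-wrapped word with the word alone (capture group 2)."""
    return TAG.sub(r"\2", text)

lines = ["iword/i", "emword/em", "bword/b", "ibword/ib", "notgoing/tomatch this"]
stripped = [strip_tags(line) for line in lines]
print(stripped)
```

With NSRegularExpression the equivalent template would be @"$2" instead of @"". Note that the pattern does not check that the closing name equals the opening one; using a backreference (\1) in place of the third group would enforce that.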

Objective C Regex?

I'm trying to parse a 7-digit number from a page's source code and the pattern that I look for is
/nnnnnnn"
where "n" is a digit. I'm trying with the following regex and in a regex test site it works, but not in obj-c. Is it possible that I'm passing the wrong option or something?
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"/\d\d\d\d\d\d\d\">" options:NSRegularExpressionSearch error:nil];
NSUInteger numberOfMatches = [regex numberOfMatchesInString:contents
                                                    options:0
                                                      range:NSMakeRange(0, [contents length])];
You should double the backslashes in front of your ds, like this:
@"/\\d\\d\\d\\d\\d\\d\\d\">"
The backslash is a special character inside a string literal: it changes how the character after it is interpreted. For the regex engine to see a backslash, you need two backslashes in the literal.
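The same two-level escaping can be illustrated in Python, shown here only as an analogy: the compiler or interpreter consumes one layer of backslashes before the regex engine sees the pattern. The page fragment in `contents` is hypothetical, and \d{7} is used as shorthand for seven \d in a row.

```python
import re

# "\\d" in ordinary source text is the two characters \ and d, which is what
# the regex engine needs to see; a raw string spells the same pattern directly.
escaped = "/\\d{7}\">"
raw = r'/\d{7}">'

contents = '<a href="/item/1234567">link</a>'  # hypothetical page source
found = re.findall(escaped, contents)
print(escaped == raw, found)
```

Objective-C has no raw string literals, so there the doubled form is the one to use.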

Objective-C: Parsing String into an Array under Special Circumstances

I have a string:
[{"id":1,"gameName":"arizona","cost":"0.5E1","email":"hi@gmail.com","requests":0},{"id":2,"gameName":"arizona","cost":"0.5E1","email":"hi@gmail.com","requests":0},{"id":3,"gameName":"arizona","cost":"0.5E1","email":"hi@gmail.com","requests":0}]
However, I would like to parse this string into an array such as:
[{"id":1,"gameName":"arizona","cost":"0.5E1","email":"hi@gmail.com","requests":0},
{"id":2,"gameName":"arizona","cost":"0.5E1","email":"hi@gmail.com","requests":0},
{"id":3,"gameName":"arizona","cost":"0.5E1","email":"hi@gmail.com","requests":0}]
This array is delimited by the comma in between the curly braces: },{
I tried using the command
NSArray *responseArray = [response componentsSeparatedByString:@","];
but this separates the string into values at EVERY comma, which is not desirable.
Then I tried using regex:
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"\\{.*\\}" options:NSRegularExpressionCaseInsensitive error:&error];
NSArray *matches = [regex matchesInString:response options:0 range:NSMakeRange(0, [response length])];
which found one match, running from the first curly brace to the last curly brace.
I was wondering if anyone knew how to solve this problem efficiently?
This string seems to be valid JSON. Try a JSON parser: NSJSONSerialization
I agree with H2CO3's suggestion to use a parser where possible.
But looking at your attempted regex, it looks like you just need to make it non-greedy, i.e.
@"\\{.*?\\}"
       ^
       |
       Add this question mark for non-greedy matching.
Of course, this will fail if you have deeper levels of (what I assume to be) nested arrays. Go with the JSON parser!
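Both suggestions can be compared side by side in a short Python sketch. It uses Python's json module as a stand-in for NSJSONSerialization, and a shortened version of the response string (the cost and email fields are left out to keep it readable).

```python
import json
import re

# Shortened stand-in for the response string from the question.
response = (
    '[{"id":1,"gameName":"arizona","requests":0},'
    '{"id":2,"gameName":"arizona","requests":0},'
    '{"id":3,"gameName":"arizona","requests":0}]'
)

# Preferred: let a JSON parser build the array of dictionaries.
games = json.loads(response)

# Regex alternative: greedy .* spans from the first { to the last },
# while lazy .*? stops at the nearest }.
greedy = re.findall(r"\{.*\}", response)
lazy = re.findall(r"\{.*?\}", response)
print(len(games), len(greedy), len(lazy))
```

The parser also gives you typed values (numbers, strings) instead of raw substrings, and it keeps working when objects are nested, which the lazy regex does not.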

Objective C. Regular expression to eliminate anything after 3 dots

I wrote the following code to eliminate anything after 3 dots
currentItem.summary = @"I am just testing. I am ... the second part should be eliminated";
NSError *error = nil;
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"(.)*(/././.)(.)*" options:0 error:&error];
if (nil != regex) {
    currentItem.summary = [regex stringByReplacingMatchesInString:currentItem.summary
                                                          options:0
                                                            range:NSMakeRange(0, [currentItem.summary length])
                                                     withTemplate:@"$1"];
}
However, my input and output are the same. The correct output should be "I am just testing. I am".
I was trying to do this using regular expression because I have a database of other regular expressions that I run on the string. I know the performance might not be as good as a plain text find or replace but the strings involved are short. I also tried using "\" to escape the dots in the regex, but I was getting a warning.
There is another question with a similar topic but the match strings are not for objective c.
This is much easier and will accomplish what you want:
NSRange range = [currentItem.summary rangeOfString:@"..."];
if (range.location != NSNotFound) { // NSRange is a struct, so compare its location
    currentItem.summary = [currentItem.summary substringToIndex:range.location];
}
You have forward slashes, /, instead of backward slashes, \, in your pattern. Also if you wish to match everything before the three dots you should use (.*) - tag everything matched by the enclosed .*. (The other parentheses in the pattern are redundant.)
Nice alternative:
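NSScanner *scanner = [NSScanner scannerWithString:currentItem.summary];
NSString *summary = nil; // you cannot take the address of a property, so scan into a local
[scanner scanUpToString:@"..." intoString:&summary];
if (summary) currentItem.summary = summary;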
My recommended regex for your problem:
regularExpressionWithPattern:@"^(.*?)\\s*\\.{3}.*$"
Main differences between this one and yours:
uses backslashes to escape special chars
uses ^ and $ to anchor at the beginning and end of the string
only captures the interesting section with ()
strips whitespace before the ... by ignoring any number of whitespace chars (\s*).
After correcting the slashes and other improvements, my final expression is:
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"^(.*)\\.{3}.*$"
                                                                       options:0
                                                                         error:&error];
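Here is a small Python sketch of what that final expression returns for the sample string, next to a variant that also trims the whitespace before the dots. It assumes Python's re module handles these patterns the same way as NSRegularExpression; r"\1" plays the role of the @"$1" template.

```python
import re

summary = "I am just testing. I am ... the second part should be eliminated"

# Greedy capture: group 1 runs up to the dots, including the space before them.
greedy = re.sub(r"^(.*)\.{3}.*$", r"\1", summary)

# Lazy capture followed by \s*: the whitespace before the dots is dropped too.
trimmed = re.sub(r"^(.*?)\s*\.{3}.*$", r"\1", summary)

print(repr(greedy))   # 'I am just testing. I am '
print(repr(trimmed))  # 'I am just testing. I am'
```

The single dot after "testing" is left alone in both cases, since \.{3} only matches three dots in a row.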