Efficient way of replacing "a" with "an"? - objective-c

In my most recent project I am dynamically constructing sentences, then after the fact going through my text to grammatically "clean things up". One task I have is to switch occurrences of "a" to "an" where the first letter of the next word is a vowel. For now, I'm only concerned with lowercase English language words, and am ignoring following words that begin with 'h'.
The solution I have in place now works, but it looks terribly inefficient and definitely will not scale should I want to support internationalization in the future.
if ([destination rangeOfString:@" a "].location != NSNotFound) {
    destination = [destination stringByReplacingOccurrencesOfString:@" a a" withString:@" an a"];
    destination = [destination stringByReplacingOccurrencesOfString:@" a e" withString:@" an e"];
    destination = [destination stringByReplacingOccurrencesOfString:@" a i" withString:@" an i"];
    destination = [destination stringByReplacingOccurrencesOfString:@" a o" withString:@" an o"];
    destination = [destination stringByReplacingOccurrencesOfString:@" a u" withString:@" an u"];
}
I check for the " a " case up front, just to skip the inefficiency of all those replacement lines to follow. I'm thinking there must be a way of doing this in a sleeker, more efficient manner, perhaps using regular expressions?

One Foundation tool that could be helpful here is NSRegularExpression, along the regular expression lines you suggested.
Here's an example:
NSString* source = @"What is a apple doing in a toilet? A umbrella is in there too!";
NSRegularExpression *regex = [NSRegularExpression
    regularExpressionWithPattern:@"\\b([Aa])( [aeiou])"
    options:0
    error:nil];
NSString* result = [regex
    stringByReplacingMatchesInString:source
    options:0
    range:NSMakeRange(0, [source length])
    withTemplate:@"$1n$2"];
A couple minor notes:
The options:0 and error:nil entries are just me punting on options that might be useful in the real-world use case.
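I used a word boundary (\\b) instead of a leading space so that an "A" starting a sentence (or the whole string) is caught too, while the final "a" of a longer word such as "banana" is still not matched. [edit: I originally said the boundary was for tricky post-punctuation occurrences such as "It rained; a earthworm appeared.", but I was wrong: those already have a space before the "a". What I was really thinking of was an "A" starting a sentence.]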
Hope that's helpful!
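If this runs often, it may be worth wrapping the regex in a small function. Below is a minimal sketch; FixIndefiniteArticles is a hypothetical name, and it recompiles the NSRegularExpression on every call for simplicity (storing it once, e.g. in a static or a property, would avoid that). It uses exactly the pattern and template above, so it inherits the same English-only, next-letter-only limitations.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: turns "a"/"A" into "an"/"An" before a lowercase vowel.
// Like the pattern in the answer, it only looks at the next letter, so
// "a unicorn" would become "an unicorn" and "a hour" is left alone.
static NSString *FixIndefiniteArticles(NSString *source)
{
    // \b stops the final "a" of words like "banana" from matching;
    // $1 keeps the original case of the article, $2 keeps " <vowel>".
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:@"\\b([Aa])( [aeiou])"
                                                  options:0
                                                    error:NULL];
    return [regex stringByReplacingMatchesInString:source
                                           options:0
                                             range:NSMakeRange(0, source.length)
                                      withTemplate:@"$1n$2"];
}
```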

Related

Apostrophes (') are not recognised in regular expression

I want a regular expression for a first name that can contain:
1) Letters
2) Spaces
3) Apostrophes
Examples: Raja, Raja reddy, Raja's
I used ^([a-z]+[,.]?[ ]?|[a-z]+[']?)+$, but it fails to recognise apostrophes (').
- (BOOL)validateFirstNameOrLastNameOrCity:(NSString *) inputCanditate {
    NSString *firstNameRegex = @"^([a-z]+[,.]?[ ]?|[a-z]+[']?)+$";
    NSPredicate *firstNamePredicate = [NSPredicate predicateWithFormat:@"SELF MATCHES[c] %@",firstNameRegex];
    return [firstNamePredicate evaluateWithObject:inputCanditate];
}
May I recommend ^[A-Z][a-zA-Z ']*$ ?
// NSRegularExpression is available in Foundation on iOS 4.0+ and OS X 10.7+.
// inputCanditate is the parameter of the method from the question.
NSError *error = NULL;
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"^[A-Z][a-zA-Z ']*$" options:0 error:&error];
NSUInteger numberOfMatches = [regex numberOfMatchesInString:inputCanditate options:0 range:NSMakeRange(0, [inputCanditate length])];
return numberOfMatches > 0;
^[A-Z] : force the name to start with a capital letter from A to Z
[a-zA-Z ']* : followed by any number of characters that can be 'a' to 'z', 'A' to 'Z', a space or a single quote
$ : and nothing else up to the end of the string
I think you are looking for a pattern like this: ^[a-zA-Z ']+$
However, this is pretty bad. What about umlauts, accents, and a whole lot of other letters that are not part of the ASCII alphabet?
A better solution would be to allow any kind of letter from any language.
To do so you can use the Unicode "letter" category \p{L}, e.g. ^[\p{L}]+$.
...or you could just drop that rule altogether, as reasonably suggested.
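A minimal sketch of the \p{L} idea, extended with the space and apostrophe the question asks for (that combination is my assumption about what is wanted). IsValidName is a hypothetical helper. \p{M} (combining marks) is included as well, so an accented letter typed in decomposed form ("e" followed by a combining diaeresis) passes just like a precomposed "ë".

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: accepts letters from any script, combining marks,
// spaces and apostrophes, and nothing else.
static BOOL IsValidName(NSString *candidate)
{
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:@"^[\\p{L}\\p{M} ']+$"
                                                  options:0
                                                    error:NULL];
    NSRange whole = NSMakeRange(0, candidate.length);
    return [regex numberOfMatchesInString:candidate options:0 range:whole] > 0;
}
```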

RegEx for parsing chemical formulas

I need a way to separate a chemical formula into its components. The result should look like this:
Ag3PO4 -> [Ag3, P, O4]
H2O -> [H2, O]
CH3OOH -> [C, H3, O, O, H]
Ca3(PO4)2 -> [Ca3, (PO4)2]
I don't know regex syntax, but I know I need something like this
[An optional parenthesis][A capital letter][0 or more lowercase letters][0 or more numbers][An optional parenthesis][0 or more numbers]
This worked:
NSRegularExpression *regex = [NSRegularExpression
    regularExpressionWithPattern:@"[A-Z][a-z]*\\d*|\\([^)]+\\)\\d*"
    options:0
    error:nil];
NSArray *tests = [[NSArray alloc] initWithObjects:@"Ca3(PO4)2", @"HCl", @"CaCO3", @"ZnCl2", @"C7H6O2", @"BaSO4", nil];
for (NSString *testString in tests)
{
    NSLog(@"Testing: %@", testString);
    NSArray *myArray = [regex matchesInString:testString options:0 range:NSMakeRange(0, [testString length])];
    NSMutableArray *matches = [NSMutableArray arrayWithCapacity:[myArray count]];
    for (NSTextCheckingResult *match in myArray) {
        NSRange matchRange = [match rangeAtIndex:0];
        [matches addObject:[testString substringWithRange:matchRange]];
        NSLog(@"%@", [matches lastObject]);
    }
}
(PO4)2 is the case that really stands apart from the others.
Let's start simple and match the items without parentheses:
[A-Z][a-z]?\d*
Using regex above we can successfully parse Ag3PO4, H2O, CH3OOH.
Then we need to add an expression for groups. A group by itself can be matched using:
\(.*?\)\d+
So we add it as an alternative with |:
[A-Z][a-z]?\d*|\(.*?\)\d+
This works for the given cases, but you may have more samples.
Note: it will have problems with nested parentheses, e.g. Co3(Fe(CN)6)2.
If you want to handle that case, you can use the following regex:
[A-Z][a-z]?\d*|(?<!\([^)]*)\(.*\)\d+(?![^(]*\))
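NSRegularExpression (ICU) only allows lookbehind of bounded length, so (?<!\([^)]*) is rejected there. For Objective-C you can use this expression without lookarounds instead: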
[A-Z][a-z]?\d*|\([^()]*(?:\(.*\))?[^()]*\)\d+
Or use a regex with repetitions (I don't know of such formulas, but in case there is something like A(B(CD)3E(FG)4)5, with multiple parenthesis blocks inside one):
[A-Z][a-z]?\d*|\((?:[^()]*(?:\(.*\))?[^()]*)+\)\d+
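To try the two lookaround-free patterns with NSRegularExpression, something like the following sketch could be used. SplitFormula is a hypothetical helper and the test formulas are the ones from the question and this answer. Neither pattern is a real parser: the greedy .* inside a group can reach past the end of that group, so for two nested groups side by side, such as (A(B)2)3(C(D)4)5, a single match can swallow both.

```objc
#import <Foundation/Foundation.h>

// The two lookaround-free patterns from the answer, with the backslashes
// doubled for Objective-C string literals.
static NSString *const kOneNestedLevel =
    @"[A-Z][a-z]?\\d*|\\([^()]*(?:\\(.*\\))?[^()]*\\)\\d+";
static NSString *const kRepeatedBlocks =
    @"[A-Z][a-z]?\\d*|\\((?:[^()]*(?:\\(.*\\))?[^()]*)+\\)\\d+";

// Hypothetical helper: returns every match of `pattern` in `formula`, in order.
static NSArray<NSString *> *SplitFormula(NSString *formula, NSString *pattern)
{
    NSError *error = nil;
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:pattern options:0 error:&error];
    if (regex == nil) {
        NSLog(@"Bad pattern: %@", error);
        return @[];
    }
    NSMutableArray<NSString *> *parts = [NSMutableArray array];
    NSArray<NSTextCheckingResult *> *matches =
        [regex matchesInString:formula options:0 range:NSMakeRange(0, formula.length)];
    for (NSTextCheckingResult *match in matches) {
        [parts addObject:[formula substringWithRange:match.range]];
    }
    return parts;
}
```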
When you encounter a parenthesis group, you don't want to parse what's inside, right?
If there are no nested parenthesis groups you can simply use
[A-Z][a-z]*\d*|\([^)]+\)\d*
\d is shorthand for a digit ([0-9]), and [^)] means anything but a closing parenthesis.
This should just about work:
/(\(?)([A-Z])([a-z]*)([0-9]*)(\))?([0-9]*)/g
Play around with it here: http://refiddle.com/
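This pattern should work, depending on your regex engine. It relies on recursion ((?R)), which PCRE supports but NSRegularExpression (ICU) does not:
([A-Z][a-z]*\d*)|(\((?:[^()]+|(?R))*\)\d*) with the g and m flags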
Better to limit the set of chars to valid chemical names. In simple form:
^((Ac|Ag|Al|Am|Ar|As|At|Au|B|Ba|Be|Bh|Bi|Bk|Br|C|Ca|Cd|Ce|Cf|Cl|Cm|Co|Cr|Cs|Cu|Ds|Db|Dy|Er|Es|Eu|F|Fe|Fm|Fr|Ga|Gd|Ge|H|He|Hf|Hg|Ho|Hs|I|In|Ir|K|Kr|La|Li|Lr|Lu|Md|Mg|Mn|Mo|Mt|N|Na|Nb|Nd|Ne|Ni|No|Np|O|Os|P|Pa|Pb|Pd|Pm|Po|Pr|Pt|Pu|Ra|Rb|Re|Rf|Rg|Rh|Rn|Ru|S|Sb|Sc|Se|Sg|Si|Sm|Sn|Sr|Ta|Tb|Tc|Te|Th|Ti|Tl|Tm|U|V|W|Xe|Y|Yb|Zn|Zr)\d*)+$
This doesn't deal with the parenthesized groups.
This we worked out during the San Diego Python Users Group meeting.
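Combining the element list with the parenthesized groups it leaves out could look like this sketch. The group support is my own extension (non-nested groups only), IsPlausibleFormula is a hypothetical name, and the symbol list is copied as-is from the answer, so it omits some recently named superheavy elements such as Cn and Fl. It only checks that the string is built from known symbols, not that the formula is chemically sensible.

```objc
#import <Foundation/Foundation.h>

// Element symbols exactly as listed in the answer above.
static NSString *const kElements =
    @"Ac|Ag|Al|Am|Ar|As|At|Au|B|Ba|Be|Bh|Bi|Bk|Br|C|Ca|Cd|Ce|Cf|Cl|Cm|Co|Cr|Cs|Cu|"
    @"Ds|Db|Dy|Er|Es|Eu|F|Fe|Fm|Fr|Ga|Gd|Ge|H|He|Hf|Hg|Ho|Hs|I|In|Ir|K|Kr|La|Li|"
    @"Lr|Lu|Md|Mg|Mn|Mo|Mt|N|Na|Nb|Nd|Ne|Ni|No|Np|O|Os|P|Pa|Pb|Pd|Pm|Po|Pr|Pt|Pu|"
    @"Ra|Rb|Re|Rf|Rg|Rh|Rn|Ru|S|Sb|Sc|Se|Sg|Si|Sm|Sn|Sr|Ta|Tb|Tc|Te|Th|Ti|Tl|Tm|"
    @"U|V|W|Xe|Y|Yb|Zn|Zr";

// Hypothetical helper: YES if the whole string is a sequence of element
// symbols with optional counts, or (non-nested) groups of them with a count.
static BOOL IsPlausibleFormula(NSString *formula)
{
    // One element symbol followed by an optional count, e.g. "Ca3".
    NSString *element = [NSString stringWithFormat:@"(?:%@)\\d*", kElements];
    // Elements or "(elements)count", repeated, anchored to the whole string.
    NSString *pattern = [NSString stringWithFormat:@"^(?:%@|\\((?:%@)+\\)\\d*)+$",
                                                   element, element];
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:pattern options:0 error:NULL];
    NSRange whole = NSMakeRange(0, formula.length);
    return [regex numberOfMatchesInString:formula options:0 range:whole] > 0;
}
```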

NSPredicate Detect First & Last Name

I am trying to use NSPredicate to evaluate whether or not an NSString has both a first and last name (essentially a space between two non-digit words). This code hasn't been working for me (code taken and modified slightly from "What are best practices for validating email addresses in Objective-C for iOS 2.0?"):
-(BOOL) validName:(NSString*) nameString {
    NSString *regExPattern = @"[A-Z]+_[A-Z]";
    NSRegularExpression *regEx = [[NSRegularExpression alloc] initWithPattern:regExPattern options:NSRegularExpressionCaseInsensitive error:nil];
    NSUInteger regExMatches = [regEx numberOfMatchesInString:nameString options:0 range:NSMakeRange(0, [nameString length])];
    if (regExMatches == 0) {
        return NO;
    } else {
        return YES;
    }
}
I think there is something wrong with my regEx pattern, but I'm not sure how to fix it. This is how I check the string:
if([self validName:nameTextField.text]) {
// Valid Name
} else {
// Name not valid
}
First, if you want to match a space, then just put a space in the regex pattern. The underscore you have now will require an underscore in your name field in order to match.
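Second, NSPredicate's MATCHES compares the whole string against the regex, so with a single [A-Z] after the space the pattern would not accept normal last names (which have more than one character), even with the space fixed. You'll need an expression that covers the whole last part of the name. (NSRegularExpression's numberOfMatchesInString:, which your code actually uses, also counts partial matches, so there you should anchor the pattern with ^ and $ to get the same whole-string check.)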
Third, since you pass the text field's contents directly into the check, you are putting some pressure on your users to type everything the way you expect. You might want to clean the string up a bit before testing it. Personally, I would at least trim whitespace from both ends and replace multiple spaces with a single one.
Here is some code that does this:
NSString *regExPattern = @"^[A-Z]+ [A-Z]+$"; // The final "+" covers the rest of the last name; ^ and $ make the whole string match.
Check:
NSString *name = nameTextField.text;
name = [name stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]];
name = [name stringByReplacingOccurrencesOfString:@" +"
                                       withString:@" "
                                          options:NSRegularExpressionSearch
                                            range:NSMakeRange(0, name.length)];
if([self validName: name]) {
    // Valid Name
} else {
    // Name not valid
}
As you can imagine, there are many ways to do this, but this is a start. You should reconsider your test for "correct" names, though, as there are many names that won't pass your simple regex, for instance names with apostrophes and accents:
Jim O'Malley
Zoë Jones
etc.
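One way to put the clean-up advice together with a check that accepts names like these is sketched below. HasFirstAndLastName is a hypothetical helper, and the pattern (words of \p{L} letters, \p{M} combining marks and apostrophes, separated by single spaces) is just one possible choice; names with hyphens or other punctuation would still need to be added.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: YES for two or more space-separated words made of
// letters (any script), combining marks or apostrophes.
static BOOL HasFirstAndLastName(NSString *input)
{
    // Trim both ends and collapse runs of spaces before testing.
    NSString *name = [input stringByTrimmingCharactersInSet:
                                [NSCharacterSet whitespaceAndNewlineCharacterSet]];
    name = [name stringByReplacingOccurrencesOfString:@" +"
                                           withString:@" "
                                              options:NSRegularExpressionSearch
                                                range:NSMakeRange(0, name.length)];
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:@"^[\\p{L}\\p{M}']+(?: [\\p{L}\\p{M}']+)+$"
                                                  options:0
                                                    error:NULL];
    return [regex numberOfMatchesInString:name options:0 range:NSMakeRange(0, name.length)] > 0;
}
```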
If you just want to check for the space-separated fore- and surname, I would try this:
- (BOOL)validName:(NSString*)name
{
    // Trim and collapse spaces first (as above), or stray spaces produce empty components.
    NSArray *components = [name componentsSeparatedByString:@" "];
    return ([components count] >= 2);
}
This checks that there are at least two components separated by a space, so it also works for names with three or more components (middle names).

Objective C. Regular expression to eliminate anything after 3 dots

I wrote the following code to eliminate anything after 3 dots:
currentItem.summary = @"I am just testing. I am ... the second part should be eliminated";
NSError * error = NULL;
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"(.)*(/././.)(.)*" options:0 error:&error];
if(nil != regex){
    currentItem.summary = [regex stringByReplacingMatchesInString:currentItem.summary
        options:0 range:NSMakeRange(0, [currentItem.summary length])
        withTemplate:@"$1"];
}
However, my input and output are the same. The correct output should be "I am just testing. I am".
I was trying to do this using regular expression because I have a database of other regular expressions that I run on the string. I know the performance might not be as good as a plain text find or replace but the strings involved are short. I also tried using "\" to escape the dots in the regex, but I was getting a warning.
There is another question with a similar topic but the match strings are not for objective c.
This is much easier and will accomplish what you want:
NSRange range = [currentItem.summary rangeOfString:@"..."];
if (range.location != NSNotFound) {
    currentItem.summary = [currentItem.summary substringToIndex:range.location];
}
You have forward slashes, /, instead of backslashes, \, in your pattern. Also, if you wish to match everything before the three dots, you should use (.*), which captures everything matched by the enclosed .*. (The other parentheses in the pattern are redundant.)
Nice alternative:
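NSScanner *scanner = [NSScanner scannerWithString:currentItem.summary];
NSString *beforeDots = nil; // a property's address can't be taken, so scan into a local
if ([scanner scanUpToString:@"..." intoString:&beforeDots]) currentItem.summary = beforeDots;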
My recommended regex for your problem:
regularExpressionWithPattern:@"^(.*?)\\s*\\.{3}.*$"
Main differences between this one and yours:
uses backslashes to escape special chars
uses ^ and $ to anchor at the beginning and end of the string
only captures the interesting section with (), lazily (.*?) so that it stops at the first ...
strips whitespace before the ... with \s* (any number of whitespace chars); the lazy capture matters here, since a greedy (.*) would swallow that whitespace
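As a self-contained sketch of the regex route, here is a function built on a lazy-capture variant, ^(.*?)\s*\.{3}.*$. The lazy (.*?) stops at the first ... and leaves the whitespace in front of it to \s*, so that whitespace is dropped from the result. TruncateAtEllipsis is a hypothetical name. Since . does not match newlines by default, a summary with a line break before the dots is returned unchanged; the NSRegularExpressionDotMatchesLineSeparators option would change that.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: keeps only the text before the first "...",
// minus any whitespace right before the dots. Strings without "..."
// don't match the pattern and come back unchanged.
static NSString *TruncateAtEllipsis(NSString *text)
{
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:@"^(.*?)\\s*\\.{3}.*$"
                                                  options:0
                                                    error:NULL];
    return [regex stringByReplacingMatchesInString:text
                                           options:0
                                             range:NSMakeRange(0, text.length)
                                      withTemplate:@"$1"];
}
```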
After correcting the slashes and other improvements, my final expression is:
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"^(.*)\\.{3}.*$"
                                                                       options:0
                                                                         error:&error];

Split NSString into words, then rejoin it into original form

I am splitting an NSString like this (filterString is an NSString):
seperatorSet = [NSMutableCharacterSet whitespaceAndNewlineCharacterSet];
[seperatorSet formUnionWithCharacterSet:[NSCharacterSet punctuationCharacterSet]];
NSMutableArray *words = [[filterString componentsSeparatedByCharactersInSet:seperatorSet] mutableCopy];
I want to put words back into the form of filter string with the original punctuation and spacing. The reason I want to do this is I want to change some words and put it back together as it was originally.
A more robust way to split by words is to use string enumeration. A space is not always the delimiter, and not all languages delimit words with spaces anyway (e.g. Japanese).
NSString *string = @" \n word1! word2,%$?'/word3.word4 ";
[string enumerateSubstringsInRange:NSMakeRange(0, string.length)
                           options:NSStringEnumerationByWords
                        usingBlock:
    ^(NSString *substring, NSRange substringRange, NSRange enclosingRange, BOOL *stop) {
        NSLog(@"Substring: '%@'", substring);
    }];
// Logs:
// Substring: 'word1'
// Substring: 'word2'
// Substring: 'word3'
// Substring: 'word4'
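Building on this, here is a minimal sketch of how enumeration lets you change words in place without touching the punctuation and spacing around them. ReplaceWords and the dictionary-of-replacements interface are hypothetical, chosen for illustration. The ranges are collected first and then replaced back to front, so replacing one word never shifts the ranges still waiting to be replaced.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: replaces every word that appears as a key in
// `replacements`, leaving all non-word characters exactly as they were.
static NSString *ReplaceWords(NSString *text, NSDictionary<NSString *, NSString *> *replacements)
{
    NSMutableArray<NSValue *> *ranges = [NSMutableArray array];
    [text enumerateSubstringsInRange:NSMakeRange(0, text.length)
                             options:NSStringEnumerationByWords
                          usingBlock:^(NSString *substring, NSRange substringRange,
                                       NSRange enclosingRange, BOOL *stop) {
        if (replacements[substring] != nil) {
            [ranges addObject:[NSValue valueWithRange:substringRange]];
        }
    }];

    NSMutableString *result = [text mutableCopy];
    // Work backwards so earlier ranges are still valid after each replacement.
    for (NSValue *value in [ranges reverseObjectEnumerator]) {
        NSRange range = value.rangeValue;
        NSString *word = [text substringWithRange:range];
        [result replaceCharactersInRange:range withString:replacements[word]];
    }
    return result;
}
```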
NSString *myString = @"Foo Bar Blah B..";
NSArray *myWords = [myString componentsSeparatedByCharactersInSet:
                        [NSCharacterSet characterSetWithCharactersInString:@" "]
                    ];
NSString *string = [myWords componentsJoinedByString:@" "];
NSLog(@"%@", string);
Since you eliminate the original punctuation, there's no way to turn it back automatically.
The only way is not to use componentsSeparatedByCharactersInSet.
An alternative solution may be to iterate through the string and, for each char, check if it belongs to your character set.
If yes, add the char to a list and the substring to another list (you may use NSMutableArray class).
This way, for example, you know that the punctuation char between the first and the second substring is the first character in your list of separators.
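The bookkeeping described above might look something like this minimal sketch. It swaps the hand-written character loop for NSScanner, and SplitKeepingSeparators/RejoinWords are hypothetical names. separators[i] holds whatever came just before words[i], so writing them back alternately rebuilds the original string exactly, including after you change some of the words.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: splits `string` into alternating separator/word runs.
// After the call, separators[i] is the text that came just before words[i];
// a trailing separator run gets an empty word, so both arrays stay equal length.
static void SplitKeepingSeparators(NSString *string, NSCharacterSet *separatorSet,
                                   NSMutableArray<NSString *> *words,
                                   NSMutableArray<NSString *> *separators)
{
    NSScanner *scanner = [NSScanner scannerWithString:string];
    scanner.charactersToBeSkipped = nil; // don't silently skip whitespace
    while (![scanner isAtEnd]) {
        NSString *separator = nil;
        NSString *word = nil;
        // Each scan returns NO when it consumes nothing; fall back to @"".
        if (![scanner scanCharactersFromSet:separatorSet intoString:&separator]) {
            separator = @"";
        }
        if (![scanner scanUpToCharactersFromSet:separatorSet intoString:&word]) {
            word = @"";
        }
        [separators addObject:separator];
        [words addObject:word];
    }
}

// Rebuilds the string by writing separators[i] followed by words[i].
static NSString *RejoinWords(NSArray<NSString *> *words, NSArray<NSString *> *separators)
{
    NSMutableString *result = [NSMutableString string];
    for (NSUInteger i = 0; i < words.count; i++) {
        [result appendString:separators[i]];
        [result appendString:words[i]];
    }
    return result;
}
```

With the question's separator set (whitespace plus punctuationCharacterSet), you would change entries in words and call RejoinWords to get the text back with its original punctuation and spacing.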
You can use NSArray's componentsJoinedByString: method to rejoin the words (it puts the same separator between every word, so it only restores the original text if that was a single repeated separator):
NSString *orig = [words componentsJoinedByString:@" "];
How are you determining which words need to be replaced? Instead of breaking it apart in the first place, perhaps using -stringByReplacingOccurrencesOfString:withString:options:range: would be more suitable.
My guess is you may not be using the best API. If you're really worried about words, you should be using a word-based API. I'm a bit hazy on whether that would be NSDataDetector or something else. (I believe NSRegularExpression can deal with word boundaries in a smarter way.)
If you are using Mac OS X 10.7+ or iOS 4+, you can use NSRegularExpression. The pattern to replace a word is \bword\b (written @"\\bword\\b" in an Objective-C string literal); \b matches a word boundary. Look at the methods replaceMatchesInString:options:range:withTemplate: and stringByReplacingMatchesInString:options:range:withTemplate:.
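A minimal sketch of that NSRegularExpression route. ReplaceWholeWord is a hypothetical helper; escapedPatternForString: and escapedTemplateForString: are used so that characters like . or $ in the word or its replacement are taken literally. Because only the matched word is replaced, the surrounding punctuation and spacing stay exactly as they were.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: replaces `word` only where it stands alone as a word.
static NSString *ReplaceWholeWord(NSString *text, NSString *word, NSString *replacement)
{
    // \b on both sides: "cat" matches in "cat." but not in "catalog".
    NSString *pattern = [NSString stringWithFormat:@"\\b%@\\b",
                                  [NSRegularExpression escapedPatternForString:word]];
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:pattern options:0 error:NULL];
    NSString *template = [NSRegularExpression escapedTemplateForString:replacement];
    return [regex stringByReplacingMatchesInString:text
                                           options:0
                                             range:NSMakeRange(0, text.length)
                                      withTemplate:template];
}
```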
Under 10.6 or earlier, if you wish to use regular expressions, you can wrap the C-based regcomp/regexec functions; they support word boundaries as well. However, you may prefer one of the other Cocoa options mentioned in other answers for this simple case.