Parsing "Key" = "Value" pair - objective-c

I'm trying to parse a string in the following format using a regex:
"Key" = "Value";
The following code is used to extract the "key" and "value":
NSString* pattern = @"([\"\"'])(?:(?=(\\\\?))\\2.)*?\\1";
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:pattern
options:0
error:NULL];
NSRange matchRange = NSMakeRange(0, line.length);
NSTextCheckingResult *match = [regex firstMatchInString:line options:0 range:matchRange];
NSRange rangeKeyMatch = [match rangeAtIndex:0];
matchRange.location = rangeKeyMatch.length;
matchRange.length = line.length - rangeKeyMatch.length;
NSTextCheckingResult *match2 = [regex firstMatchInString:line options:0 range:matchRange];
NSRange rangeValueMatch = [match2 rangeAtIndex:0];
This doesn't look efficient, and it does not reject the following example as invalid:
"key" = "value" = "something else";
Is there a more efficient way to parse this kind of string?

I'm not familiar with that dialect, but since you've tagged regex, here's one that should do it in principle: ^"([^"]*)" = "([^"]*)";$
You haven't specified the format exactly, so you may need to allow optional whitespace here and there depending on your input. Depending on the regex dialect, you may also need to escape the parentheses.
For example with sed, you'd have to write:
echo '"Key" = "Value";' | sed -e 's#^"\([^"]*\)" = "\([^"]*\)";$#key is \1 and value is \2#'
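In Objective-C, the anchored pattern above could be wrapped roughly as follows. This is only a sketch: the helper name ParseKeyValueLine is hypothetical, and the optional whitespace (\s*) around the tokens is an assumption about the input format rather than something the question states.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not from the original posts): returns @[key, value]
// when the whole line has the form "Key" = "Value"; and nil otherwise.
static NSArray<NSString *> *ParseKeyValueLine(NSString *line) {
    // ^ and $ anchor the match to the whole line; \s* tolerates optional spaces.
    NSString *pattern = @"^\\s*\"([^\"]*)\"\\s*=\\s*\"([^\"]*)\"\\s*;\\s*$";
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:pattern options:0 error:NULL];
    NSTextCheckingResult *match =
        [regex firstMatchInString:line options:0 range:NSMakeRange(0, line.length)];
    if (match == nil) {
        return nil; // Not exactly one key/value pair.
    }
    // Group 1 is the key, group 2 is the value (both without the quotes).
    return @[ [line substringWithRange:[match rangeAtIndex:1]],
              [line substringWithRange:[match rangeAtIndex:2]] ];
}
```

Because the pattern is anchored at both ends and neither group can contain a quote, a line such as "key" = "value" = "something else"; yields nil instead of a partial match.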

This code should match "key" = "value" and not "key" = "value" = "something else":
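NSString *line = @"\"key\" = \"value\"";
NSError *error = NULL;
// The leading ^ is needed so that "key" = "value" = "x" cannot match from its second pair onwards.
NSString *pattern = @"^\\\"(\\w+)\\\"\\s=\\s\\\"(\\w+)\\\"$";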
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:pattern
options:NSRegularExpressionAnchorsMatchLines error:&error];
NSRange matchRange = NSMakeRange(0, line.length);
NSTextCheckingResult *match = [regex firstMatchInString:line options:0 range:matchRange];
/* It looks like you were not quite looking at the ranges properly. rangeAtIndex:0 is the entire match; the capture groups start at index 1. */
NSRange rangeKeyMatch = [match rangeAtIndex:1];
NSRange rangeValueMatch = [match rangeAtIndex:2];
NSLog(@"Key: %@, Value: %@", [line substringWithRange:rangeKeyMatch], [line substringWithRange:rangeValueMatch]);

Related

How can I replace one pair of character with multiple occurrence in a string?

Original String is: This is a sentence with (noun) (verb) (adverb).
The original sentence has three occurrences of (). I need the last one left intact and the rest replaced with @"".
Required String: This is a sentence with (adverb).
I can do it with NSRange but I am looking for NSRegularExpression pattern.
Also, which is more efficient: the NSRange approach or NSRegularExpression?
CODE
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"\\(.*?\\)" options:NSRegularExpressionCaseInsensitive error:NULL];
NSString *newString = [regex stringByReplacingMatchesInString:modify options:0 range:NSMakeRange(0, [modify length]) withTemplate:@""];
Output:: This is a sentence with
You can obtain the match ranges themselves and do the replacement manually, ignoring the last one.
NSMutableString* newString = [modify mutableCopy];
NSArray<NSTextCheckingResult*>* matches = [regex matchesInString:newString options:0 range:NSMakeRange(0, newString.length)];
if (matches.count >= 2)
{
// Enumerate backwards so that each replacement doesn't invalidate the other ranges
for (NSInteger i = matches.count - 2; i >= 0; i--)
{
NSTextCheckingResult* result = matches[i];
[newString replaceCharactersInRange:result.range withString:@""];
}
}
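Since the question asked for a pure NSRegularExpression pattern, one possible sketch uses a lookahead so that a parenthesized group is removed only when another "(" follows later in the string. The function name KeepLastParenthesizedGroup is hypothetical, and the pattern assumes single-line input without nested parentheses.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: removes every "(...)" group except the last one.
static NSString *KeepLastParenthesizedGroup(NSString *input) {
    // \([^)]*\)   one parenthesized group
    // \s*         plus any whitespace that follows it
    // (?=.*\()    only if another "(" appears later on the same line
    NSString *pattern = @"\\([^)]*\\)\\s*(?=.*\\()";
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:pattern options:0 error:NULL];
    return [regex stringByReplacingMatchesInString:input
                                           options:0
                                             range:NSMakeRange(0, input.length)
                                      withTemplate:@""];
}
```

For the sentence in the question this gives "This is a sentence with (adverb).", and the trailing \s* also removes the space left behind by each deleted group.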

Objective-C, regular expression match repetition

I found a problem in regular expression to match all group repetition.
This is a simple example:
NSString *string = @"A1BA2BA3BC";
NSString *pattern = @"(A[^AB]+B)+C";
NSError *error = nil;
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:pattern options:0 error:&error];
NSArray *array = [regex matchesInString:string options:0 range:NSMakeRange(0, [string length])];
The returned array has one element, which contains two ranges: the whole input string and the last captured group, "A3B". The first two groups, "A1B" and "A2B", are not captured as I expected.
I've tried all from greedy to lazy matching.
A Quantifier Does not Spawn New Capture Groups
Except in .NET, which has CaptureCollections, adding a quantifier to a capture group does not create more captures. The group number stays the same (in your case, Group 1), and the content returned is the last capture of the group.
Reference
Everything about Regex Capture Groups (see Generating New Capture Groups Automatically)
Iterating the Groups
If you wanted to match all the substrings while still validating that they are in a valid string (composed of such groups and ending in C), you could use:
A[^AB]+B(?=(?:A[^AB]+B)*C)
The whole string, of course, would be
^(?:A[^AB]+B)+C$
To iterate the substrings: something like
NSError *error = NULL;
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"A[^AB]+B(?=(?:A[^AB]+B)*C)" options:0 error:&error];
NSArray *matches = [regex matchesInString:subject options:0 range:NSMakeRange(0, [subject length])];
NSUInteger matchCount = [matches count];
if (matchCount) {
for (NSUInteger matchIdx = 0; matchIdx < matchCount; matchIdx++) {
NSTextCheckingResult *match = [matches objectAtIndex:matchIdx];
NSRange matchRange = [match range];
NSString *result = [subject substringWithRange:matchRange];
}
}
else { // Nah... No matches.
}
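An alternative sketch splits the work into two passes: first validate the whole string with the anchored pattern, then collect each A…B group with a simpler pattern. The function name ExtractRepeatedGroups is hypothetical, and returning an empty array for invalid input is an arbitrary choice made for this example.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns every A...B group of a valid string, or an
// empty array when the string is not of the form (A...B)+C.
static NSArray<NSString *> *ExtractRepeatedGroups(NSString *subject) {
    NSRange full = NSMakeRange(0, subject.length);

    // Pass 1: the whole string must consist of one or more groups followed by C.
    NSRegularExpression *validator =
        [NSRegularExpression regularExpressionWithPattern:@"^(?:A[^AB]+B)+C$"
                                                  options:0
                                                    error:NULL];
    if ([validator numberOfMatchesInString:subject options:0 range:full] == 0) {
        return @[];
    }

    // Pass 2: each match is one group, so no quantified capture group is needed.
    NSRegularExpression *groupRegex =
        [NSRegularExpression regularExpressionWithPattern:@"A[^AB]+B"
                                                  options:0
                                                    error:NULL];
    NSMutableArray<NSString *> *groups = [NSMutableArray array];
    for (NSTextCheckingResult *match in
         [groupRegex matchesInString:subject options:0 range:full]) {
        [groups addObject:[subject substringWithRange:match.range]];
    }
    return groups;
}
```

Unlike the lookahead version, this also rejects strings with unexpected text before the first group, at the cost of scanning the input twice.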

Regular expressions to filter text

In Objective-C I have a string as follows:
CAST(407704969.734560,
I want to extract the digits:
407704969.734560
The code I'm using is this one:
NSString *stringToCheck = @"CAST(407704969.734560,";
NSRange searchedRange = NSMakeRange(0, [stringToCheck length]);
NSString *pattern = @"(?<=CAST\\()(\\d+?.?\\d+?)(?=,)";
NSError *error = nil;
NSRegularExpression* regex = [NSRegularExpression regularExpressionWithPattern:pattern options:0 error:&error];
NSArray* matches = [regex matchesInString:stringToCheck options:0 range: searchedRange];
for (NSTextCheckingResult* match in matches) {
NSString* matchText = [stringToCheck substringWithRange:[match range]];
NSLog(@"match: %@", matchText);
}
I guess the problem is in the regex, since I can't find any tutorial about it.
You could try using following regex:
PATTERN
CAST\((\d+?\.?\d+?),
INPUT
CAST(407704969.734560,
OUTPUT
Match 1: CAST(407704969.734560,
Group 1: 407704969.734560
Or if you only need the digits try this:
PATTERN
(?<=CAST\()(\d+?\.?\d+?)(?=,)
INPUT
CAST(407704969.734560,
OUTPUT
Match 1: 407704969.734560
And here is a short but really nice regex tutorial:
www.codeproject.com
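In Objective-C, the first pattern could be applied along these lines. This is a sketch under assumptions: the function name ExtractCastNumber is hypothetical, and the number part is tightened slightly to \d+(?:\.\d+)? so that an integer without a fraction is also accepted, which the original pattern does not spell out.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the number that follows "CAST(" and precedes
// the comma, or nil when the text does not contain that shape.
static NSString *ExtractCastNumber(NSString *text) {
    // Every regex backslash is doubled inside an Objective-C string literal.
    NSString *pattern = @"CAST\\((\\d+(?:\\.\\d+)?),";
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:pattern options:0 error:NULL];
    NSTextCheckingResult *match =
        [regex firstMatchInString:text options:0 range:NSMakeRange(0, text.length)];
    if (match == nil) {
        return nil;
    }
    // Index 0 is the whole match ("CAST(407704969.734560,"); index 1 is the number.
    return [text substringWithRange:[match rangeAtIndex:1]];
}
```

Reading capture group 1 avoids the lookbehind and lookahead entirely, which keeps the pattern easier to follow.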

String Trimming with Certain keyword

I have a string like below.
<br><br><br><br><br> SomeHtmlString <br><br><br><br><br>
I want to remove br tags like trim function preserving middle br tags in SomeHtmlString.
Is there a function that does this concisely?
e.g.
<br><br><br>test1<br><br>test2<br><br><br><br>
to
test1<br><br>test2
Here is a method using regular expressions. It matches only one tag at a time and removes it from either the beginning or the end of the string.
NSMutableString *replaceMe = [[NSMutableString alloc ]
initWithString:@"<br><br > <br > test<br>test2<br><br>"];
NSError *error = NULL;
NSRegularExpression *regex = [NSRegularExpression
regularExpressionWithPattern:@"^ *<br *> *"
options:NSRegularExpressionCaseInsensitive
error:&error];
// Keep stripping leading tags until nothing is replaced any more.
do {
;
} while ([regex replaceMatchesInString:replaceMe options:0 range:NSMakeRange(0, replaceMe.length) withTemplate:@""] != 0);
regex = [NSRegularExpression
regularExpressionWithPattern:@" *<br *> *$"
options:NSRegularExpressionCaseInsensitive
error:&error];
// Same again for trailing tags.
do {
;
} while ([regex replaceMatchesInString:replaceMe options:0 range:NSMakeRange(0, replaceMe.length) withTemplate:@""] != 0);
NSLog(@"string=%@", replaceMe);
and that does strip "<br><br > <br > test<br>test2<br><br>" down to test<br>test2.
It's probably not the neatest solution but it is very easy to modify to match different expressions, with different whitespace, for example.
It's also possible to use the regular expressions to match several <br>s in one go:
NSRegularExpression *regex = [NSRegularExpression
regularExpressionWithPattern:@"^ *(<br *> *)+"
options:NSRegularExpressionCaseInsensitive
error:&error];
[regex replaceMatchesInString:replaceMe options:0 range:NSMakeRange(0, replaceMe.length) withTemplate:@""];
regex = [NSRegularExpression
regularExpressionWithPattern:@" *(<br *> *)+$"
options:NSRegularExpressionCaseInsensitive
error:&error];
[regex replaceMatchesInString:replaceMe options:0 range:NSMakeRange(0, replaceMe.length) withTemplate:@""];
which avoids the looping but is a little harder to modify.
You can do this:
NSString* htmlString= @"<br><br><br><br><br> SomeHtmlString <br><br><br><br><br>";
NSString* pureString= [htmlString stringByReplacingOccurrencesOfString: @"<br>" withString: @""];
So you'll have @" SomeHtmlString " in pureString.
You could use this to strip out the unwanted bits:
[yourString stringByReplacingOccurrencesOfString:@"<br>" withString:@""];
Then you would use something like this to remake your string the way you want it:
NSString *newString = [NSString stringWithFormat:@"<br>%@<br>", yourString];
You might also want to look at stringByTrimmingCharactersInSet:
There are so many things you can do with NSString. Check out the Class Reference: https://developer.apple.com/library/mac/#documentation/Cocoa/Reference/Foundation/Classes/NSString_Class/Reference/NSString.html
EDIT:
substringToIndex: could be your friend here. You can do this to find out if the first 4 characters of your string consist of the characters you want to remove:
NSString *subString = [yourString substringToIndex:4];
if ([subString isEqualToString:@"<br>"]) {
yourString = [yourString substringFromIndex:4];
}
Then you are creating a new string without those 4 characters. You keep doing this until the first 4 characters are not equal to the ones you want to remove.
You can do something similar at the end of your string using substringFromIndex. You will need to know the length of your original string to make sure none of your substrings go out of bounds.
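The prefix and suffix stripping described above can be sketched with hasPrefix: and hasSuffix:, which avoid the out-of-bounds risk because they never take a substring of a string that is too short. The function name TrimTag is hypothetical, and the sketch only handles exact, case-sensitive occurrences of the tag with no surrounding whitespace.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: strips repeated occurrences of `tag` from both ends of
// `input` while leaving occurrences in the middle untouched.
static NSString *TrimTag(NSString *input, NSString *tag) {
    if (tag.length == 0) {
        return input; // An empty tag would otherwise loop forever.
    }
    NSString *result = input;
    // Drop leading tags one at a time.
    while ([result hasPrefix:tag]) {
        result = [result substringFromIndex:tag.length];
    }
    // Drop trailing tags one at a time.
    while ([result hasSuffix:tag]) {
        result = [result substringToIndex:result.length - tag.length];
    }
    return result;
}
```

With the example from the question, TrimTag(@"<br><br><br>test1<br><br>test2<br><br><br><br>", @"<br>") leaves test1<br><br>test2.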
Alternative regular expression rendition:
NSString *input = @"<br><br><br><br><br><br>test<br>test2<br><br><br><br><br><br><br><br><br><br>";
__block NSString *output;
NSError *error;
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"^(<br>)*(.*?)(<br>)*$"
options:NSRegularExpressionCaseInsensitive
error:&error];
[regex enumerateMatchesInString:input
options:0
range:NSMakeRange(0, [input length])
usingBlock:^(NSTextCheckingResult *result, NSMatchingFlags flags, BOOL *stop) {
NSRange matchRange = [result rangeAtIndex:2];
output = [input substringWithRange:matchRange];
}];
if (output)
NSLog(@"Found: %@", output);

Regular expression, specifying optional capture groups?

Is there a way to write a regular expression pattern that will create one or two groups based on the input text? For example:
// ONE
NSString *pattern = @"([0-9]+).([0-9]+)";
NSString *inputText = @"ThisIs MyTest72.56String";
// OUTPUT match = 72.56, group1 = 72, group2 = 56
What I am trying to get is:
// TWO
NSString *pattern = @"([0-9]+).([0-9]+)";
NSString *inputText = @"ThisIs MyTest72String";
// OUTPUT match = 72, group1 = 72, group2 = Empty
I was thinking I could use (?:), but that just removes the group.
What I am after is:
Text = "ThisIs MyTest72String"
Match = 72
Group1 = 72
Group2 = Empty
Text = "ThisIs MyTest72.56String"
Match = 72.56
Group1 = 72
Group2 = 56
EDIT:
This sort of works, although I would like to get rid of the "S" in the initial match.
Pattern = ([0-9]+).([0-9]*)
Text = "ThisIs MyTest72String"
Match = 72S
Group1 = 72 //RangeAtIndex:1 {13,2}
Group2 = Empty //RangeAtIndex:2 {16,0}
Text = "ThisIs MyTest72.56String"
Match = 72.56
Group1 = 72
Group2 = 56
This is close, but in the case of "Empty" (Group2) I was expecting rangeAtIndex:2 to equal NSNotFound. The docs say "The range {NSNotFound, 0} is returned if one of the capture groups did not participate in this particular match". Does the group being empty not count as "not participating"?
Does this give you what you want?
([0-9]+)(?:\.([0-9]+))?
I've escaped the decimal point (which you hadn't; I'm unsure whether this is needed in your target language) and wrapped the decimal point and everything after it in an optional non-capturing group.
Should just be a matter of checking for the existence of a second group.
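A minimal Objective-C sketch of that check follows. The function name ParseNumberParts is hypothetical, and returning an empty string for a missing fraction is just one way to represent "Empty". It relies on the documented behaviour that rangeAtIndex: returns {NSNotFound, 0} for a group that did not participate in the match.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns @[integerPart, fractionPart] for the first
// number in the text, with @"" as the fraction when there is none, or nil
// when the text contains no digits at all.
static NSArray<NSString *> *ParseNumberParts(NSString *text) {
    // The optional non-capturing group wraps the dot and the second capture.
    NSString *pattern = @"([0-9]+)(?:\\.([0-9]+))?";
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:pattern options:0 error:NULL];
    NSTextCheckingResult *match =
        [regex firstMatchInString:text options:0 range:NSMakeRange(0, text.length)];
    if (match == nil) {
        return nil;
    }
    NSString *integerPart = [text substringWithRange:[match rangeAtIndex:1]];
    // Group 2 sits inside the skipped optional group when there is no ".NN",
    // so its location is NSNotFound rather than an empty range.
    NSRange fractionRange = [match rangeAtIndex:2];
    NSString *fractionPart = (fractionRange.location == NSNotFound)
        ? @""
        : [text substringWithRange:fractionRange];
    return @[ integerPart, fractionPart ];
}
```

This also explains the behaviour seen with ([0-9]*): a group that matches zero characters still participates in the match, so its range is an empty range at a real location, whereas a group inside a skipped optional group reports NSNotFound.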
How about this:
NSString *inputText = @"ThisIs MyTest72.56String";
// Setup an NSError object to catch any failures
NSError *error = NULL;
// create the NSRegularExpression object and initialize it with a pattern (the dot is escaped so it only matches a literal ".")
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"\\d+\\.\\d+" options:NSRegularExpressionCaseInsensitive error:&error];
// get the range of the first match in inputText
NSRange rangeOfFirstMatch = [regex rangeOfFirstMatchInString:inputText options:0 range:NSMakeRange(0, [inputText length])];
// check that our NSRange object is not equal to range of NSNotFound
if (!NSEqualRanges(rangeOfFirstMatch, NSMakeRange(NSNotFound, 0))) {
// Since we know that we found a match, get the substring from the parent string by using our NSRange object
NSString *substringForFirstMatch = [inputText substringWithRange:rangeOfFirstMatch];
NSLog(@"Extracted string: %@",substringForFirstMatch); // Extracted string: 72.56
regex = [NSRegularExpression regularExpressionWithPattern:@"\\d+" options:NSRegularExpressionCaseInsensitive error:&error];
NSArray *matches = [regex matchesInString:substringForFirstMatch options:0 range:NSMakeRange(0, [substringForFirstMatch length])];
for (NSTextCheckingResult *match in matches) {
NSString *matchString = [substringForFirstMatch substringWithRange:[match range]];
NSLog(@"match string: %@", matchString);
// match string: 72
// match string: 56
}
}
Use this pattern:
pattern = @"([0-9]+)\\.([0-9]+)?";
and then in the NSTextCheckingResult check if the group range location is NSNotFound.
Example code:
NSString *pattern = @"([0-9]+)\\.([0-9]+)?";
NSString *string = @"ThisIs MyTest72.56String";
//NSString *string = @"ThisIs MyTest72.XXString";
NSRegularExpression *regex = [NSRegularExpression
regularExpressionWithPattern:pattern
options:NSRegularExpressionCaseInsensitive
error:nil];
NSTextCheckingResult *match = [regex firstMatchInString:string options:0 range:NSMakeRange(0, string.length)];
for (int groupNumber=1; groupNumber<match.numberOfRanges; groupNumber+=1) {
NSRange groupRange = [match rangeAtIndex:groupNumber];
if (groupRange.location != NSNotFound)
NSLog(@"match %d: '%@'", groupNumber, [string substringWithRange:groupRange]);
else
NSLog(@"match %d: '%@'", groupNumber, @"");
}
NSLog output:
match 1: '72'
match 2: '56'
With the second (commented-out) string, group 2 does not participate in the match, so the second line is "match 2: ''".