Regexp in iOS to find comments - objective-c

I am trying to find and process Java-style comments within a string in Objective-C.
I have a few regex snippets which almost work but I am stuck on one hurdle: different options seem to make the different styles work.
For example, I am using this to match:
NSArray* matches = [[NSRegularExpression regularExpressionWithPattern:expression options:NSRegularExpressionAnchorsMatchLines error:nil] matchesInString:string options:0 range:searchRange];
With these options I can find and process single-line comments (//) but not multiline ones (/* */). If I change the option to NSRegularExpressionDotMatchesLineSeparators, multiline comments work fine but I can no longer find the 'end' of a single-line comment.
I suppose I really need dot-matches-line-separators plus a better way of finding the end of a single-line comment?
The regexp I have so far are:
@"/\\*.*?\\*/"
@"//.*$"
Clearly, if dot matches a line separator then the second one (single line) never 'finishes', but how do I fix this? I found some suggestions for single-line comments that were more like:
@"(\/\/[^"\n\r]*(?:"[^"\n\r]*"[^"\n\r]*)*[\r\n])"
But that doesn't seem to work at all!
Thanks in advance for any pointers.

It turns out the example I had was pretty close; for some reason I had some additional backslashes in there that weren't needed. It now reads:
@"(//[^\"\n\r]*(?:\"[^\"\n\r]*\"[^\"\n\r]*)*[\r\n])"
(that is, as written in the Objective-C source). To clarify my own point, I am using NSRegularExpressionDotMatchesLineSeparators and this now works exactly as I'd expect.
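A different way to handle both comment styles with one pattern is to limit dot-matches-newline to the block-comment branch with an inline flag group, and let the line-comment branch stop at the first line break. The following is a minimal sketch, not the asker's code: the helper name FindComments is hypothetical, and it assumes the ICU syntax (?s:...) that NSRegularExpression accepts.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the text of every /* ... */ and // comment in `source`.
static NSArray<NSString *> *FindComments(NSString *source) {
    // (?s:...) enables dot-matches-newline for the block-comment branch only.
    // The line-comment branch uses [^\r\n]* so it stops at the end of the line.
    NSString *pattern = @"(?s:/\\*.*?\\*/)|//[^\\r\\n]*";
    NSError *error = nil;
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:pattern
                                                  options:0
                                                    error:&error];
    if (regex == nil) {
        NSLog(@"Invalid pattern: %@", error);
        return @[];
    }
    NSMutableArray<NSString *> *comments = [NSMutableArray array];
    NSRange whole = NSMakeRange(0, source.length);
    for (NSTextCheckingResult *match in
         [regex matchesInString:source options:0 range:whole]) {
        [comments addObject:[source substringWithRange:match.range]];
    }
    return comments;
}
```

This sketch does not look at string literals, so a // inside a quoted string (a URL, for instance) is still reported as a comment; the quote-aware pattern above is aimed at that case.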

Related

Adding an NSDictionary to an NSMutableDictionary

I've been working on this for a few days now. I searched Stack Overflow and other sites for solutions, but none of them appear to work. Most of the postings I've found are quite old (before 2013), so I'm thinking this is not the right way to do this.
I thought this would work:
[localMutableDictionary addEntriesFromDictionary:deviceDictionary];
but localMutableDictionary remains null.
I've worked around this using an array of integers instead of a mutable dictionary. But that doesn't give me the right result when I add the array to an NSDictionary for subsequent processing with NSJSONSerialization. Values from my array don't get double quote marks around them. The JSON receiver / parser is expecting values in quotes (it runs with JSON produced in VB code for a similar app). I can use an alternative parser to work around this, but I would rather get a clean solution.
This is probably a case of there being a simple syntax that I haven't managed to find, or that I'm just using an out-of-date style. Or I may just be adding my array to the NSDictionary "the wrong way". A solution for either method would work for me - thank you.
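One possible explanation, offered as a guess since the question does not show how localMutableDictionary is created: if the variable was declared but never allocated, it is nil, and sending addEntriesFromDictionary: to nil silently does nothing. The sketch below allocates the dictionary first and converts each value to an NSString so that NSJSONSerialization writes it in quotes. The helper name JSONDataWithQuotedValues is hypothetical, and the code assumes the keys are strings and the values are NSNumber or NSString objects.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: merges `deviceDictionary` into a newly allocated mutable
// dictionary and serializes it with every value written as a JSON string.
static NSData *JSONDataWithQuotedValues(NSDictionary *deviceDictionary) {
    // Allocate before use; messaging a nil dictionary is a silent no-op.
    NSMutableDictionary *localMutableDictionary = [NSMutableDictionary dictionary];
    [localMutableDictionary addEntriesFromDictionary:deviceDictionary];

    NSMutableDictionary *quoted = [NSMutableDictionary dictionary];
    [localMutableDictionary enumerateKeysAndObjectsUsingBlock:
        ^(id key, id value, BOOL *stop) {
            // Assumption: values are NSNumber or NSString, so -description
            // gives their plain text form (for example @42 becomes @"42").
            quoted[key] = [value description];
        }];

    NSError *error = nil;
    NSData *json = [NSJSONSerialization dataWithJSONObject:quoted
                                                   options:0
                                                     error:&error];
    if (json == nil) {
        NSLog(@"Serialization failed: %@", error);
    }
    return json;
}
```

NSNumber values are emitted as bare JSON numbers, which is why the values from the integer array appeared without quotes; storing strings instead changes that.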

Xcode finding multiple lines using regular expression

I have the following lines in many places in my code. I want to find all of them at once and replace each such block with a new comment. However, I am only able to search for a single line at a time, and I can't work out how to include a newline in my regular expression. Please help.
// Block Solver
// We develop a block solver that includes the joint limit.
// when the mass has poor distribution (leading to large torques about..
//
Thanks in advance
Search for:
^(?://.*\n?)+
and replace all with nothing.
This will find all lines that start with //.
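The same pattern can be applied in code rather than in the Xcode find panel. This is a rough sketch under a few assumptions: the helper name ReplaceCommentBlocks is hypothetical, the comment lines start in column 0 as in the example, and NSRegularExpressionAnchorsMatchLines stands in for the editor's per-line handling of ^.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: replaces each run of consecutive lines that start
// with // by `newComment`.
static NSString *ReplaceCommentBlocks(NSString *source, NSString *newComment) {
    // AnchorsMatchLines lets ^ match at the start of every line.
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:@"^(?://.*\\n?)+"
                                                  options:NSRegularExpressionAnchorsMatchLines
                                                    error:NULL];
    // Escape the replacement so any $ or \ in it is taken literally.
    NSString *escaped = [NSRegularExpression escapedTemplateForString:newComment];
    return [regex stringByReplacingMatchesInString:source
                                           options:0
                                             range:NSMakeRange(0, source.length)
                                      withTemplate:escaped];
}
```

Because the match includes the trailing newline of the last comment line, the replacement text should end with a newline if the following code is to stay on its own line.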

Objective-C RegexKitLite match one string or another

I'm trying to use regexkitlite for string matching in objective-c and I'm having some problems with it. What I'm trying to do is search a large string for substrings matching:
"http://[something].jpg"
"http://[something].png"
Basically, I want to find all links to images from the original string. What I have currently is:
NSString *regexString = @"http://[a-zA-Z0-9._%+-/]+\.jpg";
Now this is working for .jpg images, but of course it doesn't match .png images. I would really like to use one regexString that would match either, but I can't figure out how.
Reading some regex tutorials for other languages, I think it is something along the lines of:
NSString *regexString = @"http://[a-zA-Z0-9._%+-/]+\.(?:jpg|png)";
But I can't quite get it right.
Any help would be greatly appreciated.
You don't need a non-capturing group around the file extensions. It's good practice to use them, but it could be causing an error here. (Does the library support it?)
Also, I simplified your regex slightly by using a predefined character class.
NSString *regexString = @"http://[\\w.%+-/]+\\.(jpg|png)";
You can see this in action here.
You can also add any file extensions that you want. Ex: (jpg|png|gif|...).
Updated: Apple now includes regular expression support with NSRegularExpression, which is available in OS X v10.7 and later.
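With NSRegularExpression the alternation can be written as below. This is a sketch, and the helper name FindImageLinks is hypothetical. Two details differ from the patterns quoted above: backslashes are doubled so that the regex engine, not the Objective-C compiler, sees \w and \., and the hyphen is moved to the end of the character class, because `+-/` inside brackets is read as a range (it happens to cover + , - . /).

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: returns every http:// link in `text` that ends in .jpg or .png.
static NSArray<NSString *> *FindImageLinks(NSString *text) {
    // The hyphen is last in the class so it is a literal, not a range operator.
    NSString *pattern = @"http://[\\w.%+/-]+\\.(?:jpg|png)";
    NSError *error = nil;
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:pattern
                                                  options:0
                                                    error:&error];
    if (regex == nil) {
        NSLog(@"Invalid pattern: %@", error);
        return @[];
    }
    NSMutableArray<NSString *> *links = [NSMutableArray array];
    for (NSTextCheckingResult *match in
         [regex matchesInString:text options:0 range:NSMakeRange(0, text.length)]) {
        [links addObject:[text substringWithRange:match.range]];
    }
    return links;
}
```

Passing NSRegularExpressionCaseInsensitive as the option would also pick up extensions such as .JPG, if that is wanted.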

Regex to replace asterisk characters with html bold tag

Does anyone have a good regex to do this? For example:
This is *an* example
should become
This is <b>an</b> example
I need to run this in Objective C, but I can probably work that bit out on my own. It's the regex that's giving me trouble (so rusty...). Here's what I have so far:
s/\*([0-9a-zA-Z ])\*/<b>$1<\/b>/g
But it doesn't seem to be working. Any ideas? Thanks :)
EDIT: Thanks for the answer :) If anyone is wondering what this looks like in Objective-C, using RegexKitLite:
NSString *textWithBoldTags = [inputText stringByReplacingOccurrencesOfRegex:@"\\*([0-9a-zA-Z ]+?)\\*" withString:@"<b>$1<\\/b>"];
EDIT AGAIN: Actually, to encompass more characters for bolding I changed it to this:
NSString *textWithBoldTags = [inputText stringByReplacingOccurrencesOfRegex:@"\\*([^\\*]+?)\\*" withString:@"<b>$1<\\/b>"];
Why don't you just do \*([^*]+)\* and replace it with <b>$1</b> ?
You're only matching one character between the *s. Try this:
s/\*([0-9a-zA-Z ]*?)\*/<b>$1<\/b>/g
or to ensure there's at least one character between the *s:
s/\*([0-9a-zA-Z ]+?)\*/<b>$1<\/b>/g
I wrote a slightly more complex version that ensures the asterisk is always at the boundary so it ignores hanging star characters:
/\*([^\s][^\*]+?[^\s])\*/
Test phrases with which it works and doesn't:
This one regexp works for me (JavaScript)
x.match(/\B\*[^*]+\*\B/g)
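For readers without RegexKitLite, the \*([^*]+)\* suggestion can be tried with NSRegularExpression. This is a minimal sketch; the helper name BoldFromAsterisks is hypothetical.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: wraps every *...* span in <b>...</b>.
static NSString *BoldFromAsterisks(NSString *input) {
    // [^*]+ matches one or more characters that are not an asterisk,
    // so each match ends at the nearest closing asterisk.
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:@"\\*([^*]+)\\*"
                                                  options:0
                                                    error:NULL];
    // $1 in the template refers to the text captured between the asterisks.
    return [regex stringByReplacingMatchesInString:input
                                           options:0
                                             range:NSMakeRange(0, input.length)
                                      withTemplate:@"<b>$1</b>"];
}
```

The forward slash has no special meaning in the replacement template, so it does not need a backslash here.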

How to Parse Some Wiki Markup

Hey guys, given a data set in plain text such as the following:
==Events==
* [[312]] – [[Constantine the Great]] is said to have received his famous [[Battle of Milvian Bridge#Vision of Constantine|Vision of the Cross]].
* [[710]] – [[Saracen]] invasion of [[Sardinia]].
* [[939]] – [[Edmund I of England|Edmund I]] succeeds [[Athelstan of England|Athelstan]] as [[King of England]].
*[[1275]] – Traditional founding of the city of [[Amsterdam]].
*[[1524]] – [[Italian Wars]]: The French troops lay siege to [[Pavia]].
*[[1553]] – Condemned as a [[Heresy|heretic]], [[Michael Servetus]] is [[burned at the stake]] just outside [[Geneva]].
*[[1644]] – [[Second Battle of Newbury]] in the [[English Civil War]].
*[[1682]] – [[Philadelphia]], [[Pennsylvania]] is founded.
I would like to end up with an NSDictionary or other form of collection so that I can have the year (The Number on the left) mapping to the excerpt (The text on the right). So this is what the 'template' is like:
*[[YEAR]] – THE_TEXT
Though I would like the excerpt to be plain text, that is, no wiki markup so no [[ sets. Actually, this could prove difficult with alias links such as [[Edmund I of England|Edmund I]].
I am not all that experienced with regular expressions so I have a few questions. Should I first try to 'beautify' the data? For example, removing the first line which will always be ==Events==, and removing the [[ and ]] occurrences?
Or perhaps a better solution: Should I do this in passes? So for example, the first pass I can separate each line into * [[710]] and [[Saracen]] invasion of [[Sardinia]]. and store them into different NSArrays.
Then go through the first NSArray of years and only get the text within the [[]] (I say text and not number because it can be 530 BC), so * [[710]] becomes 710.
And then for the excerpt NSArray, go through and if an [[some_article|alias]] is found, make it only be [[alias]] somehow, and then remove all of the [[ and ]] sets?
Is this possible? Should I use regular expressions? Are there any ideas you can come up with for regular expressions that might help?
Thanks! I really appreciate it.
EDIT: Sorry for the confusion, but I only want to parse the above data. Assume that that's the only type of markup that I will encounter. I'm not necessarily looking forward to parsing wiki markup in general, unless there is already a pre-existing library which does this. Thanks again!
This code assumes you are using RegexKitLite:
NSString *data = @"* [[312]] – [[Constantine the Great]] is said to have received his famous [[Battle of Milvian Bridge#Vision of Constantine|Vision of the Cross]].\n\
* [[710]] – [[Saracen]] invasion of [[Sardinia]].\n\
* [[939]] – [[Edmund I of England|Edmund I]] succeeds [[Athelstan of England|Athelstan]] as [[King of England]].\n\
*[[1275]] – Traditional founding of the city of [[Amsterdam]].";
NSString *captureRegex = @"(?i)(?:\\* *\\[\\[)([0-9]*)(?:\\]\\] \\– )(.*)";
NSRange captureRange;
NSRange stringRange;
stringRange.location = 0;
stringRange.length = data.length;
do
{
    captureRange = [data rangeOfRegex:captureRegex inRange:stringRange];
    if ( captureRange.location != NSNotFound )
    {
        NSString *year = [data stringByMatching:captureRegex options:RKLNoOptions inRange:stringRange capture:1 error:NULL];
        NSString *textStuff = [data stringByMatching:captureRegex options:RKLNoOptions inRange:stringRange capture:2 error:NULL];
        stringRange.location = captureRange.location + captureRange.length;
        stringRange.length = data.length - stringRange.location;
        NSLog(@"Year:%@, Stuff:%@", year, textStuff);
    }
}
while ( captureRange.location != NSNotFound );
Note that you really need to study up on regular expressions to build these well, but here's what the one I have is saying:
(?i)
Ignore case, I could have left that out since I'm not matching letters.
(?:\* *\[\[)
?: means don't capture this block, I escape * to match it, then there are zero or more spaces (" *") then I escape out two brackets (since brackets are also special characters in a regex).
([0-9]*)
Grab anything that is a number.
(?:\]\] \– )
Here's where we ignore stuff again, basically matching "]] – ". Note that for any "\" in the regex, I have to add another one in the Objective-C string above, since "\" is a special character in a string... and yes, that means a regex-escaped single "\" ends up as "\\" in an Obj-C string.
(.*)
Just grab anything else. By default "." does not match line terminators, so the match stops at the end of the line, which is why it doesn't just match everything else. You'll have to add code to strip out the [[LINK]] stuff from the text.
The NSRange variables are used to keep matching through the file without re-matching original matches. So to speak.
Don't forget that after you add the RegexKitLite class files, you also need to add the linker flag for the ICU library (-licucore) or you'll get lots of link errors (the RegexKitLite site has installation instructions).
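The step left open above, stripping the [[LINK]] markup and collecting the results in an NSDictionary, might look like the following. It is a sketch that swaps RegexKitLite for NSRegularExpression. The helper name ParseEvents is hypothetical, and it assumes one event per line in the *[[YEAR]] – TEXT shape from the question, with an en dash as the separator.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: maps each year to its excerpt with wiki links flattened.
static NSDictionary<NSString *, NSString *> *ParseEvents(NSString *data) {
    // One event per line: "*[[YEAR]] – TEXT", with optional spaces after the asterisk.
    NSRegularExpression *line =
        [NSRegularExpression regularExpressionWithPattern:@"^\\* *\\[\\[([^\\]]+)\\]\\] – (.*)$"
                                                  options:NSRegularExpressionAnchorsMatchLines
                                                    error:NULL];
    // [[target|alias]] becomes alias; [[target]] becomes target.
    NSRegularExpression *link =
        [NSRegularExpression regularExpressionWithPattern:@"\\[\\[(?:[^\\]|]*\\|)?([^\\]]*)\\]\\]"
                                                  options:0
                                                    error:NULL];
    NSMutableDictionary<NSString *, NSString *> *events = [NSMutableDictionary dictionary];
    for (NSTextCheckingResult *match in
         [line matchesInString:data options:0 range:NSMakeRange(0, data.length)]) {
        NSString *year = [data substringWithRange:[match rangeAtIndex:1]];
        NSString *text = [data substringWithRange:[match rangeAtIndex:2]];
        // A later event in the same year overwrites an earlier one in this sketch.
        events[year] = [link stringByReplacingMatchesInString:text
                                                      options:0
                                                        range:NSMakeRange(0, text.length)
                                                 withTemplate:@"$1"];
    }
    return events;
}
```

The year is captured as text rather than digits, so a value such as 530 BC is kept as is, and the ==Events== heading is skipped because it does not match the line pattern.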
I'm no good with regular expressions, but this sounds like a job for them. I imagine a regex would sort this out for you quite easily.
Have a look at the RegexKitLite library.
If you want to be able to parse Wikitext in general, you have a lot of work to do. Just one complicating factor is templates. How much effort do you want to go to cope with these?
If you're serious about this, you probably should be looking for an existing library which parses Wikitext. A brief look round finds this CPAN library, but I have not used it, so I can't cite it as a personal recommendation.
Alternatively, you might want to take a simpler approach and decide which particular parts of Wikitext you're going to cope with. This might be, for example, links and headings, but not lists. Then you have to focus on each of these and turn the Wikitext into whatever you want that to look like. Yes, regular expressions will help a lot with this bit, so read up on them, and if you have specific problems, come back and ask.
Good luck!