This is a bit of a puzzler for me. I have a string that looks like:
fanspd<fanspd>3</fanspd>
doorinprocess<doorinprocess>0</doorinprocess>
timeremaining<timeremaining>0</timeremaining>
macaddr<macaddr>60:CB:FB:99:99:C1</macaddr>
ipaddr<ipaddr>10.0.0.6</ipaddr>
model<model>4.4eWHF</model>
softver: <softver>2.14.2</softver>
interlock1: <interlock1>0</interlock1>
interlock2: <interlock2>0</interlock2>
cfm: <cfm>2200</cfm>
power: <power>120</power>
inside: <house_temp>-99</house_temp>
<DNS1>10.0.0.1</DNS1>
attic: <attic_temp>76</attic_temp>
OA: <oa_temp>-99</oa_temp>
server response: <server_response>Ó£àêEE²ç©þ]kõ «jsÐ</server_response>
DIP Switches: <DIPS>11100</DIPS>
Remote Switch: <switch2>1111</switch2>
Setpoint:<Setpoint>0</Setpoint>
The string includes "\n" line breaks, so I have split it into corresponding lines that look like
fanspd<fanspd>0</fanspd>
All I really want is the char(s) in the middle of the line. In the above example it would be 0.
I can match everything with regular expressions by doing the following:
(.*)(<[a-z]+>)(.*)(</[a-z]+>)
But what I'd like is something more that would exclude or strip away or remove all the junk and grab the middle chars.
(!(.*)(!<[a-z]+>))(.*)(!(</[a-z]+>))
I've tried this and it does not work. I've also thought of doing another [NSString componentsSeparatedByString:] (with either < or > as the separator), but that would leave me with more parsing yet to do, and I think there should be a way to get just the chars in between the tags with either regular expressions or string comparison, or some similar way of parsing them out.
Any suggestions or help would be greatly appreciated.
Thanks
Two things.
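Your regular expression does not escape the forward slash. (This matters when the pattern is written as a /.../ regex literal, as on Rubular; NSRegularExpression itself accepts an unescaped /.)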
Your regular expression seems overly complicated for what you are trying to do.
If all you want is that lone middle character with regular expressions,
Try this:
<[a-z]+>(.*)<\/[a-z]+>
Here's a great tool to play around with:
http://rubular.com
Heck you could probably even get away with:
<[a-z]+>(.*)<\/
EDIT:
I figured out your problem partially, some of the tags part way down contain characters other than a through z. So here you go:
<.+>(.*)<\/.+>
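For anyone who wants to try that last pattern outside of Rubular, here is a minimal sketch in Python (not Objective-C, purely so it can be run quickly). The names tag_value and sample are made up for illustration, and the sample is just a few lines from the question.

```python
import re

# The final pattern from the answer: any tag, capture what sits between
# the opening and the closing tag.
TAG_VALUE = re.compile(r"<.+>(.*)<\/.+>")

def tag_value(line):
    """Return the text between the tags on one line, or None if no tags."""
    match = TAG_VALUE.search(line)
    return match.group(1) if match else None

sample = (
    "fanspd<fanspd>3</fanspd>\n"
    "softver: <softver>2.14.2</softver>\n"
    "inside: <house_temp>-99</house_temp>\n"
    "<DNS1>10.0.0.1</DNS1>"
)

# Split on the line breaks first, as described in the question.
values = [tag_value(line) for line in sample.split("\n")]
```

The same pattern string should carry over to NSRegularExpression; group 1 of each match is the value.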
Related
I have a localized string that looks something like this in English:
"
5 Mile(s)
5,252 Step(s)
"
My app is localized both in left-to-right and right-to-left languages so I don't want to make assumptions either about the ordering of the step(s) or about the formatting of the number (e.g. 5,252 can be 5.252 depending on user locale). So I need to account for possibilities that can include things like
Step(s) 5.252
as well as what's above.
A few other caveats
All I know is that if the Step(s) line is in there, it will be on its own line (hence in my regex I require \n at each end of the string)
No guarantee that the Mile(s) information will be in the string at all, let alone whether it will be before or after Step(s)
Here's my attempt at pattern extraction:
NSString *patternString = [NSString stringWithFormat:@"\\n(([0-9,\\.]*)\s*%@|%@\s*([0-9,\\.]*))\\n",
NSLocalizedString(@"Step(s)",nil), NSLocalizedString(@"Step(s)",nil)];
There appear to be two problems with this:
Xcode is indicating Unknown escape sequence '\s' for the second \s in the pattern string above
No matches are being found even for strings like the following:
0.2 Mile(s)
1,482 Step(s)
Ideally I would extract the 1,482 out of this string in a way that is localization friendly. How should I modify my regex?
As far as the regex goes, perhaps this approach might work: it simply matches (with named groups) each pair of numbers in sequence, on the assumption that the first is miles and the second is steps. Decimals in the . or , form are optional:
(?<miles>\d+(?:[.,]\d+)?).*?(?<steps>\d+(?:[.,]\d+)?)
(And I think it should be \\s.) I'm not an iOS guy, but if you can use a regex literal it would be far more readable.
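A rough way to check the named-group idea, sketched in Python. Two things here are assumptions on my part and not in the answer: Python spells named groups (?P<name>...) instead of (?<name>...), and I turn on DOTALL so that .*? can cross the line break between the two numbers. The function name miles_and_steps is made up.

```python
import re

# Same shape as the pattern above, in Python's named-group syntax.
# DOTALL lets ".*?" span the newline between the miles and steps lines.
PAIR = re.compile(
    r"(?P<miles>\d+(?:[.,]\d+)?).*?(?P<steps>\d+(?:[.,]\d+)?)",
    re.DOTALL,
)

def miles_and_steps(text):
    """Return (miles, steps) as strings, or None if two numbers are not found."""
    match = PAIR.search(text)
    if match is None:
        return None
    return match.group("miles"), match.group("steps")

result = miles_and_steps("0.2 Mile(s)\n1,482 Step(s)")
```

Note that this relies on the miles line always being present and coming first, which the question says is not guaranteed.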
First I'd like to ask - Why is Mile(s) mentioned in the question at all?
And now to my two bits - you could simply use a positive look-ahead:
^(?=.*Step\(s\))[^\d]*(\d+(?:[.,]\d+)?)
It makes sure the expected word is present on the line, and then captures the number on it, allowing for a localized, optional decimal separator and decimals. This way it doesn't matter whether the number is before or after the "word".
It doesn't take localization of the "word" into account, but that you seem to have handled by yourself ;)
See it here at regex101.
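If you want to poke at the look-ahead locally, here is a small Python sketch. It assumes the multi-line flag so that ^ matches at the start of each line; steps_on_line is a made-up name.

```python
import re

# Look-ahead checks that "Step(s)" is on this line, then the first number
# on that line is captured, whichever side of the word it is on.
STEPS = re.compile(r"^(?=.*Step\(s\))[^\d]*(\d+(?:[.,]\d+)?)", re.MULTILINE)

def steps_on_line(text):
    """Return the number from the Step(s) line as a string, or None."""
    match = STEPS.search(text)
    return match.group(1) if match else None

number_first = steps_on_line("0.2 Mile(s)\n1,482 Step(s)")
word_first = steps_on_line("Step(s) 5.252")
```

The Mile(s) line is skipped because the look-ahead fails there, so the order of the two lines should not matter.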
Your regex is close, although in Obj-C you need to double-escape the \s and (s):
^(([0-9,.]*)\\s*%@|%@\\s*([0-9,.]*))$
In your NSLocalizedString you likely also need to escape the parentheses enclosing (s):
NSString *patternString = [NSString stringWithFormat:@"^(([\\d,.]+)\\s%@|%@\\s([\\d,.]+))$",
NSLocalizedString(@"Step\\(s\\)",nil), NSLocalizedString(@"Step\\(s\\)",nil)];
If you don't escape (s) then the regex engine is probably going to interpret it as a capture group.
Looking at NSLog you can see what the pattern actually reads like:
NSLog(@"patternString: %@", patternString);
Output:
patternString: ^(([\d,.]+)\sStep\(s\)|Step\(s\)\s([\d,.]+))$
Since you mentioned the Mile(s) part may not be in the string at all I'm assuming it isn't relevant to the regular expression. As I understand from the question, you just need to capture the number of steps and nothing else. On this basis, here's a modified version of your existing regex:
NSString *patternString =
[NSString stringWithFormat:@"^(?:([0-9,.]*)\\s*%@|%@\\s*([0-9,.]*))$",
NSLocalizedString(@"Step\\(s\\)",nil), NSLocalizedString(@"Step\\(s\\)",nil)];
Demo:
https://www.regex101.com/r/Q6ff1b/1
This is based on the following tips/modifications:
Use the m (= UREGEX_MULTILINE) flag option when creating the regex (NSRegularExpressionAnchorsMatchLines in NSRegularExpression) to specify that ^ and $ match the start and end of each line. This is more robust than using \n, as it also handles the start and end of the string, where a newline might not be present.
Always use a double backslash (\\) for regex escaping - otherwise NSString will interpret the single backslash to be escaping the next character and convert it before it gets to the regex.
Literal parentheses need to be escaped - e.g. Step\\(s\\) instead of Step(s).
A literal dot within a character class (i.e. within the [] square brackets) doesn't need to be escaped, so it would be . rather than \\. there.
If you are using (x|y|...) as a choice and don't need it to be a capturing group, use ?: after the first parenthesis to ensure it doesn't get captured - i.e. (?:x|y|...).
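Putting those tips together, here is a hedged sketch in Python, used only because it is quick to run. re.escape stands in for escaping the parentheses by hand (NSRegularExpression has escapedPatternForString: for the same job), and I use + where the answer uses * so that an empty number is never matched. The names steps_pattern and extract_steps are illustrative.

```python
import re

def steps_pattern(localized_label):
    """Build the pattern around an already-localized label such as "Step(s)"."""
    label = re.escape(localized_label)  # "Step(s)" becomes "Step\(s\)"
    return re.compile(
        # Non-capturing alternation: number before the label, or after it.
        rf"^(?:([0-9,.]+)\s*{label}|{label}\s*([0-9,.]+))$",
        re.MULTILINE,  # ^ and $ match at each line, not just the whole string
    )

def extract_steps(text, localized_label="Step(s)"):
    """Return the step count as a string, or None if there is no such line."""
    match = steps_pattern(localized_label).search(text)
    if match is None:
        return None
    # Only one of the two groups takes part in a match.
    return match.group(1) or match.group(2)

number_first = extract_steps("0.2 Mile(s)\n1,482 Step(s)")
label_first = extract_steps("Step(s) 5.252")
```

The result is still a locale-formatted string such as "1,482"; turning it into a number is a separate step (NSNumberFormatter on iOS).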
I know there are a lot of resources with regex for it. But I could not find the one I want.
My problem is:
I want to remove one-line comments (//) from Obj-C sources, but I don't want to break the code in them. For instance, with this regex: @"//.*" I can remove all comments, but it also corrupts string literals such as:
@"bsdv//sdfsdf"
I played with non-capturing parentheses (?:(\"*\")*+), but without success.
Also I found this expression for Python:
r'(\".*?\"|\'.*?\')|(/\*.*?\*/|//[^\r\n]*$)'
It should cover my case, but I haven't figured out how to make it work with Obj-C.
Please, help me to build proper regex.
UPDATE: Yeah, that's a tough one. I know there are a lot of caveats other than the one I described. I would appreciate it if someone posted a regex that only fixes my issue. Anyway, I'm going to post my own solution, without regex, soon; I hope it will be helpful for anyone else struggling with this problem.
Try this regex:
(?:^|.*;(?!.*")|#(?:define|endif|ifn?def|import|undef|...).*)\s*(//[^\r\n]+$)
Demo
http://regex101.com/r/jT4xC8
Discussion
Besides all the warnings expressed in the comments, I assume that a single-line comment can appear in two distinct cases:
Case 1: Alone on its line preceded or not by blank chars
Case 2: Not Alone on its line preceded or not by blank chars, and other chars.
In the first case, we match the beginning of the line (^ with the /m flag). Then we search for zero or more blank chars (\s*) and finally the single-line comment: //[^\r\n]+$.
In the second case, if there are other chars on the line, they form statements. Any statement is ended by a semicolon ;. So we search for the last statement and its corresponding semicolon with .*;(?!.*"). Then we search for the single-line comment. Those other chars can also be preprocessor statements. In this case, they are introduced by a hash sign #.
One key point is that I assume the code passed to the regex is code that compiles.
There is more
Don't forget also to add some other pre-processor directives that may apply in your case. Check this SO answer: https://stackoverflow.com/a/18014883/363573
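A different angle, closer to the Python expression quoted in the question: match string literals and comments in one pass, and only throw away the comments. This is only a sketch in Python and a simplified variant of that expression, not the regex from this answer. It assumes double-quoted literals with backslash escapes and ignores /* */ blocks and character literals. strip_line_comments is a made-up name.

```python
import re

# Group 1: a double-quoted string literal (backslash escapes allowed).
# Group 2: a // comment running to the end of the line.
TOKEN = re.compile(r'("(?:\\.|[^"\\\r\n])*")|(//[^\r\n]*)')

def strip_line_comments(source):
    """Drop // comments while leaving string literals untouched."""
    # A string literal is put back as-is; a comment is replaced by nothing.
    return TOKEN.sub(lambda m: m.group(1) or "", source)

code = (
    'NSString *s = @"bsdv//sdfsdf"; // trailing comment\n'
    "// whole-line comment\n"
    "int x = 1;"
)
cleaned = strip_line_comments(code)
```

Because the string alternative is tried first at every position, a // inside a literal is consumed as part of the literal and never seen as a comment. The leading @ of an Obj-C literal is simply left in place.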
Here is my problem:
I am trying to filter out html tags from an NSString object.
Most fixes for this simply remove everything falling between a < and a >, as well as those characters themselves. I am trying to figure out a way to remove the "< . . . >" substring ONLY if it does not contain white space or newline characters.
The way I was thinking about doing it looks something like this
while ([source rangeOfString:@"someRegEx" options:NSRegularExpressionSearch].location != NSNotFound) {
//find the range of the substring
//check for newlines/whitespace characters
//replace occurrences of the string with "" if it doesn't have them
}
Firstly, does this seem like a good approach? Secondly, I'm having a lot of problems with figuring out what that regex would look like... does anyone have any ideas what it might look like?
This seems like a fine approach, provided the tags you're looking for really never contain whitespace, as m.buettner points out. The regex would look something like this:
<[^\s]*?>
The [^\s] is a negated character class which matches anything but whitespace characters. The ? makes the * lazy instead of greedy. So this regex in English means "Match a '<', then the smallest possible number of non-whitespace characters, then a '>'."
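To see what that pattern does and does not remove, here is a quick Python sketch (Python only because it is easy to run; the pattern string is the same). strip_simple_tags is a made-up name.

```python
import re

# "<", then as few non-whitespace characters as possible, then ">".
TAG_WITHOUT_WHITESPACE = re.compile(r"<[^\s]*?>")

def strip_simple_tags(source):
    """Remove <...> runs that contain no whitespace or newlines."""
    return TAG_WITHOUT_WHITESPACE.sub("", source)

plain = strip_simple_tags("<p>Hello</p> if a < b and c > d")
with_attribute = strip_simple_tags('<a href="x">link</a>')
```

Note the second case: a tag with attributes contains a space, so the opening tag is left in place while the closing one is removed. That is the caveat about tags containing whitespace.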
This is a helpful page.
Maybe you should consider employing an NSXMLParser, described here.
You get quite a rich set of delegate methods to extract whatever you like from the string.
I'm finding it hard to match strings using NSRegularExpression. Generic alpha characters are not a problem with [a-z] but if I need to match a word like 'import' I'm struggling to make it work. I'm sure I have to escape the word in some manner but I can't find any docs around this. A really basic example would be
{{import "hello"}}
where I want to get hold of the string: hello
edit: to clarify - 'hello' could be any string - it's the bit I want returned
This regular expression matches the text between the "-s in your example:
\{\{import "([^"]+)"\}\}
The match will be stored in the first match group.
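A quick way to check the pattern, sketched in Python; imported_name is a made-up helper name. In an Objective-C string literal each backslash and each double quote in the pattern would additionally need escaping.

```python
import re

# Literal braces are escaped; the quoted text is captured in group 1.
IMPORT = re.compile(r'\{\{import "([^"]+)"\}\}')

def imported_name(text):
    """Return the quoted string from an {{import "..."}} tag, or None."""
    match = IMPORT.search(text)
    return match.group(1) if match else None

name = imported_name('{{import "hello"}}')
```

The word import itself needs no escaping; only the braces do.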
I've made some good progress with my first attempt at a program, but have hit another road block. I'm taking standard output (as a string) from a console CMD window (results of dsquery piped to dsget) and have found small rectangles in the output. I tried using Regex to clean the little bastards but it seems they are related to the _ (underscore), which I need to keep (to return 2000/NT logins). Odd thing is, when I copy the character and paste it into VS2K10 Express it acts like a carriage return??
Any ideas on finding out what these little SOB's are -- and how to remove them?
Going to try using /U or /A CMD switch next..
The square is often just used whenever a character is not displayable. The character could very well be a CR. You can use a Regular Expression to just get normal characters or remove the CR LF characters using string.replace.
You mentioned that you are using the string.replace function, and I am wondering if you are replacing the wrong character or something like that. If all you're trying to do is remove a carriage return, I would skip the regular expressions and stick with string.replace.
Something like this should work...
strInputString = strInputString.Replace(Chr(13), "")
If not, could you post a line or two of code?
On a side note, this might give some other examples....
Character replacement in strings in VB.NET
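To find out what the little squares actually are before removing anything, it can help to list the code points of the non-printable characters. Here is a minimal sketch of that idea in Python (not VB.NET, just to show the logic); both function names are made up.

```python
def describe_control_chars(text):
    """List (index, code point) for every ASCII control character in text."""
    return [
        (index, f"U+{ord(ch):04X}")
        for index, ch in enumerate(text)
        if ord(ch) < 0x20 or ord(ch) == 0x7F
    ]

def strip_carriage_returns(text):
    """Same idea as Replace(Chr(13), ""): drop CR, keep everything else."""
    return text.replace("\r", "")

found = describe_control_chars("DOMAIN\\j_smith\r\n")
cleaned = strip_carriage_returns("DOMAIN\\j_smith\r\n")
```

If the list shows U+000D, the square is a carriage return and the plain replace above is enough; anything else would need its own replacement.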