How should a string be matched with a regular expression in Objective C - objective-c

I'm finding it hard to match strings using NSRegularExpression. Generic alpha characters are not a problem with [a-z] but if I need to match a word like 'import' I'm struggling to make it work. I'm sure I have to escape the word in some manner but I can't find any docs around this. A really basic example would be
{{import "hello"}}
where I want to get hold of the string: hello
edit: to clarify - 'hello' could be any string - it's the bit I want returned

This regular expression matches the text between the double quotes in your example:
\{\{import "([^"]+)"\}\}
The matched text is stored in the first capture group.
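For illustration, here is a minimal sketch of that pattern in Python rather than Objective-C; the helper name extract_import is made up for this example. With NSRegularExpression the same pattern should apply, but inside an Objective-C string literal the backslashes and double quotes need escaping, and the captured text would be read from the match's range at index 1.

```python
import re

# Braces are escaped so they match literally; the group captures
# everything between the double quotes.
IMPORT_PATTERN = re.compile(r'\{\{import "([^"]+)"\}\}')


def extract_import(text):
    """Return the quoted name from an {{import "..."}} tag, or None (hypothetical helper)."""
    match = IMPORT_PATTERN.search(text)
    return match.group(1) if match else None


print(extract_import('{{import "hello"}}'))
```

The literal word import needs no escaping at all; only the braces do.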

Related

REGEX_EXTRACT for specific pattern inside brackets

I'm trying to use REGEX_EXTRACT in SQL to extract certain string patterns inside brackets.
So far I have tried REGEX_EXTRACT(column, r'\[(.*?)\]'), but the problem is that there are multiple brackets in the same cell, and this formula only extracts the contents of the first bracket.
What I'm trying to figure out is how to extract a specific pattern within the brackets. The pattern I'm looking for looks like this: [xx-XX]
where x can be any letter of the alphabet.
Any tips or directions would be greatly appreciated.
This should work if you always have 2 lowercase letters followed by '-' and then followed by 2 uppercase letters:
\[([a-z]{2}-[A-Z]{2})\]
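The question does not say which SQL dialect is in use, so as a rough check of the pattern itself, here is a small Python sketch. The sample cell text and the helper name extract_locales are invented for the example.

```python
import re

# Two lowercase letters, a hyphen, two uppercase letters, all inside [ ].
LOCALE_PATTERN = re.compile(r'\[([a-z]{2}-[A-Z]{2})\]')


def extract_locales(cell):
    """Return every [xx-XX] code found in the cell (hypothetical helper)."""
    # With a single capture group, findall returns the group contents.
    return LOCALE_PATTERN.findall(cell)


sample = "[draft] Title [en-US] and [fr-FR] (see [note 3])"
print(extract_locales(sample))
```

Brackets whose contents do not fit the xx-XX shape, such as [draft], are skipped.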

regex capture middle of url

I'm trying to figure out the base regex to capture the middle of a google url out of a sql database.
For example, a few links:
https://www.google.com/cars/?year=2016&model=dodge+durango&id=1234
https://www.google.com/cars/?year=2014&model=jeep+cherokee+crossover&id=6789
What would be the regex to capture the text to get dodge+durango , or jeep+cherokee+crossover ? (It's alright that the + still be in there.)
My Attempts:
1)
\b[=.]\W\b\w{5}\b[+.]?\w{7}
This clearly does not work: it is hard-coded and would only work for something like the dodge durango example (it would extract "dodge+durango").
2) Using a positive lookahead:
[^+]( ?=&id )
I am not sure how to use this, as it only grabs the one character before the & symbol.
How can I extract a string of (potentially) any length, with any number of + delimiters, between the "model=" and "&id" boundaries?
Seems like you could use regexp_replace and access match groups:
regexp_replace(input, 'model=(.*?)([&\\s]|$)', E'\\1')
From the PostgreSQL documentation:
The regexp_replace function provides substitution of new text for
substrings that match POSIX regular expression patterns. It has the
syntax regexp_replace(source, pattern, replacement [, flags ]). The
source string is returned unchanged if there is no match to the
pattern. If there is a match, the source string is returned with the
replacement string substituted for the matching substring. The
replacement string can contain \n, where n is 1 through 9, to indicate
that the source substring matching the n'th parenthesized
subexpression of the pattern should be inserted, and it can contain \&
to indicate that the substring matching the entire pattern should be
inserted. Write \\ if you need to put a literal backslash in the
replacement text. The flags parameter is an optional text string
containing zero or more single-letter flags that change the function's
behavior. Flag i specifies case-insensitive matching, while flag g
specifies replacement of each matching substring rather than only the
first one.
I may be misunderstanding, but if you want to get the model, just select everything between model= and the ampersand (&).
regexp_matches(input, 'model=([^&]*)')
model=: Match literally
([^&]*): Capture
[^&]*: Anything that isn't an ampersand
*: Unlimited times
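To sanity-check the model=([^&]*) pattern outside the database, here is a short Python sketch run against the two URLs from the question. The helper name extract_model is made up. One caveat worth keeping in mind: the pattern would also match inside a longer parameter name that ends in "model=".

```python
import re

# Capture everything after "model=" up to the next ampersand (or end of string).
MODEL_PATTERN = re.compile(r'model=([^&]*)')


def extract_model(url):
    """Return the raw value of the model parameter, or None (hypothetical helper)."""
    match = MODEL_PATTERN.search(url)
    return match.group(1) if match else None


urls = [
    "https://www.google.com/cars/?year=2016&model=dodge+durango&id=1234",
    "https://www.google.com/cars/?year=2014&model=jeep+cherokee+crossover&id=6789",
]
for url in urls:
    print(extract_model(url))
```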

Objective C parse string for middle chars

This is a bit of a puzzler for me. I have a string that looks like:
fanspd<fanspd>3</fanspd>
doorinprocess<doorinprocess>0</doorinprocess>
timeremaining<timeremaining>0</timeremaining>
macaddr<macaddr>60:CB:FB:99:99:C1</macaddr>
ipaddr<ipaddr>10.0.0.6</ipaddr>
model<model>4.4eWHF</model>
softver: <softver>2.14.2</softver>
interlock1: <interlock1>0</interlock1>
interlock2: <interlock2>0</interlock2>
cfm: <cfm>2200</cfm>
power: <power>120</power>
inside: <house_temp>-99</house_temp>
<DNS1>10.0.0.1</DNS1>
attic: <attic_temp>76</attic_temp>
OA: <oa_temp>-99</oa_temp>
server response: <server_response>Ó£àêEE²ç©þ]kõ «jsÐ</server_response>
DIP Switches: <DIPS>11100</DIPS>
Remote Switch: <switch2>1111</switch2>
Setpoint:<Setpoint>0</Setpoint>
The string includes "\n" characters, so I have split it into corresponding lines that look like
fanspd<fanspd>0</fanspd>
All I really want is the char(s) in the middle of the line. In the above example it would be 0.
I can match everything with a regular expression by doing the following:
(.*)(<[a-z]+>)(.*)(</[a-z]+>)
But what I'd like is something that strips away all the junk and grabs only the middle chars.
(!(.*)(!<[a-z]+>))(.*)(!(</[a-z]+>))
I've tried this and it does not work. I've also thought of doing another [NSString componentsSeparatedByString:@"..."] (with either < or > as the separator), but that would leave me with more parsing yet to do, and I think there should be a way to get just the chars in between the tags with either regular expressions or string comparison or some such way to parse out the
Any suggestions or help would be greatly appreciated.
Thanks
Two things.
Your regular expression does not escape the forward slash.
Your regular expression seems overly complicated for what you are trying to do.
If all you want is the value in the middle, try this:
<[a-z]+>(.*)<\/[a-z]+>
Here's a great tool to play around with:
http://rubular.com
Heck you could probably even get away with:
<[a-z]+>(.*)<\/
EDIT:
I partially figured out your problem: some of the tags partway down contain characters other than a through z (digits, underscores and uppercase letters). So here you go:
<.+>(.*)<\/.+>
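As a quick way to try that last pattern, here is a Python sketch that applies it line by line to a few of the lines from the question. The helper name tag_values is invented, and in Python the forward slash needs no backslash.

```python
import re

# Opening tag, captured value, closing tag. The greedy .+ backtracks to the
# first ">" that still lets the rest of the pattern match on the same line.
TAG_PATTERN = re.compile(r'<.+>(.*)</.+>')


def tag_values(text):
    """Return the value between the tags on each line that has one (hypothetical helper)."""
    values = []
    for line in text.splitlines():
        match = TAG_PATTERN.search(line)
        if match:
            values.append(match.group(1))
    return values


sample = (
    "fanspd<fanspd>3</fanspd>\n"
    "ipaddr<ipaddr>10.0.0.6</ipaddr>\n"
    "inside: <house_temp>-99</house_temp>\n"
    "<DNS1>10.0.0.1</DNS1>"
)
print(tag_values(sample))
```

This assumes one tag pair per line, as in the sample data; with several pairs on one line the greedy quantifiers would span across them.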

Regular expression for extracting a number

I would like to be able to extract a number from within a string formatted as follows:
"<[1085674730]> hello foo1, how are you doing?"
I'm a novice with regular expressions, I only want to be able to extract a number that is enclosed in the greater/less-than and bracket symbols, but I'm not sure how to go about it. I have to match numeric digits only, but I'm not sure what syntax is used for only searching within these symbols.
UPDATE:
Thank you all for your input, and sorry for not being more specific. As I explained to kiamlaluno, I'm using VB.Net as the language for my application. I was wondering why some of the implementations were not working. In fact, the only one that did work was the one described by Matthew Flaschen, but that captures the symbols around the number as well as the number itself. I would like to capture only the number that is encased in the symbols and filter out the symbols themselves.
Use:
<\[(\d+)\]>
This is tested with ECMAScript regex.
It means:
\[ - literal [
( - open capturing group
\d - digit
+ - one or more
) - close capturing group
\] - literal ]
The overall functionality is to capture one or more digits surrounded by the given characters.
Combine Matthew's post with lookarounds (http://www.regular-expressions.info/lookaround.html). This will exclude the prefix and suffix.
(?<=<\[)\d+(?=\]>)
I didn't test this regex but it should be very close to what you need. Double check at the link provided.
Hope this helps!
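To compare the two approaches side by side, here is a small Python sketch (not VB.Net, so treat it only as an illustration of the patterns). The whole match of the capture-group version still includes the symbols, which is probably what the update to the question ran into; the number on its own is in group 1, or is the whole match when lookarounds are used.

```python
import re

subject = "<[1085674730]> hello foo1, how are you doing?"

# Capture-group version: the symbols are part of the match, the digits are group 1.
with_group = re.search(r'<\[(\d+)\]>', subject)

# Lookaround version: the symbols are required but not consumed.
with_lookaround = re.search(r'(?<=<\[)\d+(?=\]>)', subject)

print(with_group.group(0))       # whole match, symbols included
print(with_group.group(1))       # digits only
print(with_lookaround.group(0))  # digits only
```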
$subject = "<[1085674730]> hello foo1, how are you doing?";
preg_match('/<\[(\d+)\]>/', $subject, $matches);
$matches[1] will contain the number you are looking for.
Use:
/<\[([[:digit:]]+)\]>/
If your implementation doesn't support the handy [:digit:] syntax, then use this:
/<\[([\d]+)\]>/
And if your implementation doesn't support the handy \d syntax, then use this:
/<\[([0-9]+)\]>/

How to remove strings contained in a list in VB.NET?

How can I find words like and, or, to, a, no, with, for, etc. in a sentence using VB.NET and remove them? Also, where can I find a full list of words like these?
Note that unless you use Regex word boundaries you risk falling afoul of the Scunthorpe (Sfannythorpe) problem.
string pattern = @"\band\b";
Regex re = new Regex(pattern);
string input = "a band loves and its fans";
string output = re.Replace(input, ""); // "a band loves  its fans" (a doubled space is left behind)
Notice the 'and' in 'band' is untouched.
You can indeed replace your list of words using the .Replace function (as colithium described) ...
myString.Replace("and", "")
Edit:
... but indeed, a nicer way is to use Regular Expressions (as edg suggested) to avoid replacing parts of words.
As your question suggests that you would like to clean up a sentence to keep the meaningful words, you have to do more than just remove two- and three-letter words.
What you need is a list of stop-words:
http://en.wikipedia.org/wiki/Stop_word
A comma-separated list of stop-words for the English language can be found here:
http://www.textfixer.com/resources/common-english-words.txt
The easiest way is:
myString.Replace("and", "")
You'd loop over your word list and have a statement like the above. Google for a list of common English words?
List of English 2 Letter Words
List of English 3 Letter Words
You can match the words and remove them using regular expressions.
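Putting the word-boundary and word-list ideas together, here is a rough sketch in Python rather than VB.NET. The short STOP_WORDS list is just the examples from the question, and the helper name remove_stop_words is made up; a real list would be loaded from one of the stop-word resources mentioned above.

```python
import re

# Example words taken from the question; not a complete stop-word list.
STOP_WORDS = ["and", "or", "to", "a", "no", "with", "for"]

# \b on both sides keeps "and" from matching inside "band";
# re.escape guards against any special characters in the list.
STOP_PATTERN = re.compile(
    r'\b(?:' + '|'.join(map(re.escape, STOP_WORDS)) + r')\b',
    re.IGNORECASE,
)


def remove_stop_words(sentence):
    """Remove whole-word stop words and tidy the leftover spaces (hypothetical helper)."""
    stripped = STOP_PATTERN.sub('', sentence)
    # Splitting and re-joining collapses the gaps left by removed words.
    return ' '.join(stripped.split())


print(remove_stop_words("a band loves and its fans"))
```

Building one alternation from the list means the sentence is scanned once, instead of once per word as with a loop of Replace calls.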