Are negative lookbehinds supported in Impala? - sql

I am using regexp_like in Impala with a negative lookbehind to find a pattern in a string array. I've built the expression as follows against a sample data set.
regexp_like(string_field,'(?<!Hello).+')
result   | string_field
no match | Hello World, Bye World
match    | Cool, Not Cool
no match | Cool, Hello, Bye Bye
Running it yields the following error message.
Invalid regex expression: '(?<=Hello).+'
This negative lookbehind works in Python. Has anyone else come across this? I've tried looking at the documentation but didn't find anything particularly useful.
A better example.
I am trying to detect rows where at least one element of a comma-separated string array is not preceded by a keyword, e.g. 'Hello'. A negative lookaround seems like one of the most elegant solutions for the task at hand.

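Impala's regular expression functions are built on Google's RE2 library (Impala 2.0 and later), which does not support lookaheads or lookbehinds, so the lookbehind has to be spelled out as an alternation. A little clunky, but this works: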
regexp_like(string_field, '(^|,)([^H]|H[^e]|He[^l]|Hel[^l]|Hell[^o])')
See live demo.
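The alternation can be tried outside Impala. The sketch below uses Python's re module purely as a stand-in for Impala's engine, and the sample rows are hypothetical (written without spaces after the commas, because a space after a comma would itself count as a non-H character). It shows how "an element that does not start with Hello" is spelled out one character at a time.

```python
import re

# Pattern from the answer: after the start of the string or a comma,
# accept anything that diverges from "Hello" within its first five characters.
NOT_HELLO = re.compile(r'(^|,)([^H]|H[^e]|He[^l]|Hel[^l]|Hell[^o])')

def has_non_hello_element(value):
    """Return True if the pattern finds an element that diverges from 'Hello'."""
    return NOT_HELLO.search(value) is not None

# Hypothetical sample rows, not taken from the question.
rows = ["Hello World,Hello There", "Cool,Not Cool", "Hello,Cool"]
results = {row: has_non_hello_element(row) for row in rows}
print(results)
```

One edge case to keep in mind: an element at the very end of the input that is a bare prefix of Hello (for example a trailing "Hel") is not detected, because every alternative needs one more character to compare against.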

Related

How can I negate this expression

I have absolutely no clue how to work with regexes. I am less than a beginner.
I want to find any invalid CSS names in a string, so I can exclude them. Looking online I found a way to select the valid names using this:
/-?[_a-zA-Z]+[_a-zA-Z0-9-]*/g
What I want to do is negate this expression, so that only '1999' is matched in this example input:
holding-page single 1999 contact id-12 contact single single
To "negate" an expression, turn it into a negative look ahead:
/(?<!\S)(?!-?[_a-zA-Z]+[_a-zA-Z0-9-]*)\S+(?!\S)/g
See live demo.
What this does is match a complete term, but one that does not match your positive regex.
A "complete term" is matched using (?<!\S)\S+(?!\S), which is \S+ (one or more non-whitespace characters) wrapped in negative lookarounds asserting that no non-whitespace character sits on either side, which prevents matching only part of a term.
Note that "not a non-whitespace" is not the same as "whitespace", because "not a non-whitespace" also matches the start and end of the input, so leading and trailing terms that are invalid will match too.
Your positive regex has been turned into a negative look ahead by enclosing it in (?!...).
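As a quick check of the negated expression, here is a small Python sketch. Python's re module is an assumption here (the question uses JavaScript-style /.../g syntax), and the variable names are made up for illustration; the pattern and input are the ones from the thread.

```python
import re

# Positive pattern for a valid CSS name, as given in the question.
VALID = r'-?[_a-zA-Z]+[_a-zA-Z0-9-]*'
# Whole whitespace-delimited terms that do not start with a valid name.
INVALID_TERM = re.compile(r'(?<!\S)(?!' + VALID + r')\S+(?!\S)')

text = "holding-page single 1999 contact id-12 contact single single"
invalid = INVALID_TERM.findall(text)
print(invalid)  # ['1999']
```

Note that the lookahead only inspects the start of each term, so a term that begins like a valid name but ends in an illegal character (say "ok$") would not be reported by this pattern.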

Objective C parse string for middle chars

This is a bit of a puzzler for me. I have a string that looks like:
fanspd<fanspd>3</fanspd>
doorinprocess<doorinprocess>0</doorinprocess>
timeremaining<timeremaining>0</timeremaining>
macaddr<macaddr>60:CB:FB:99:99:C1</macaddr>
ipaddr<ipaddr>10.0.0.6</ipaddr>
model<model>4.4eWHF</model>
softver: <softver>2.14.2</softver>
interlock1: <interlock1>0</interlock1>
interlock2: <interlock2>0</interlock2>
cfm: <cfm>2200</cfm>
power: <power>120</power>
inside: <house_temp>-99</house_temp>
<DNS1>10.0.0.1</DNS1>
attic: <attic_temp>76</attic_temp>
OA: <oa_temp>-99</oa_temp>
server response: <server_response>Ó£àêEE²ç©þ]kõ «jsÐ</server_response>
DIP Switches: <DIPS>11100</DIPS>
Remote Switch: <switch2>1111</switch2>
Setpoint:<Setpoint>0</Setpoint>
The string includes "\n" newlines, so I have split it into corresponding lines that look like
fanspd<fanspd>0</fanspd>
All I really want is the char(s) in the middle of the line. In the above example it would be 0.
I can match everything with regular expressions by doing the following:
(.*)(<[a-z]+>)(.*)(</[a-z]+>)
But what I'd like is something more that would exclude or strip away or remove all the junk and grab the middle chars.
(!(.*)(!<[a-z]+>))(.*)(!(</[a-z]+>))
I've tried this and it does not work. I've also thought of doing another [NSString componentsSeparatedByString:@"(with either < or >"] call, but that would leave me with more parsing yet to do. I think there should be a way to get just the chars in between the tags with regular expressions, string comparison, or some similar kind of parsing.
Any suggestions or help would be greatly appreciated.
Thanks
Two things.
Your regular expression does not escape the forward slash.
Your regular expression seems overly complicated for what you are trying to do.
If all you want is that lone middle character with regular expressions, try this:
<[a-z]+>(.*)<\/[a-z]+>
Here's a great tool to play around with:
http://rubular.com
Heck you could probably even get away with:
<[a-z]+>(.*)<\/
EDIT:
I figured out part of your problem: some of the tags further down contain characters other than a through z (for example interlock1, house_temp and DNS1). So here you go:
<.+>(.*)<\/.+>
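To see the final pattern at work on individual lines, here is a minimal sketch in Python rather than Objective-C; the helper name tag_value is hypothetical. The forward slash does not need escaping in Python's re module, but the escape is harmless and is kept to mirror the answer.

```python
import re

# Pattern from the answer's EDIT, applied to one line at a time.
TAG_VALUE = re.compile(r'<.+>(.*)<\/.+>')

def tag_value(line):
    """Return the text between the opening and closing tag, or None if absent."""
    match = TAG_VALUE.search(line)
    return match.group(1) if match else None

# A few lines copied from the question's sample string.
lines = [
    "fanspd<fanspd>3</fanspd>",
    "interlock1: <interlock1>0</interlock1>",
    "inside: <house_temp>-99</house_temp>",
    "<DNS1>10.0.0.1</DNS1>",
]
values = [tag_value(line) for line in lines]
print(values)  # ['3', '0', '-99', '10.0.0.1']
```

Because .+ and .* are greedy, this relies on each line holding exactly one tag pair, which is the case once the string has been split on newlines.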

What is the regex for these cases?

What is the regex for these cases:
29000.12345678900, expected result 29000.123456789
29000.000, expected result 29000
29000.00003400, expected result 29000.000034
In short, I want to strip the trailing zeros after the decimal point when no digit 1-9 follows them, and I also want to drop the dot (.) if the number can be considered an integer.
I use this regex
(?:.0*$|0*$)
but it gives me this result:
29123.6 from 29123.6400, 4 is gone from there.
When I tested the regex separately, it works perfectly,
.0*$ gives me 29123 from 29123.0000
0*$ gives me 29123.6423 from 29123.642300
Am I missing something with the combined regex?
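The unescaped . in your combined pattern matches any character, so .0*$ also consumes the 4 in 29123.6400; the dot needs to be escaped as \. to match a literal decimal point. If you think regex is the best way of doing it, you can just use something like this: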
\.?0+$
It works for both cases:
> '12300000.000001130000000'.replace(/\.?0+$/g, '')
"12300000.00000113"
> '12300000.000000000000'.replace(/\.?0+$/g, '')
"12300000"
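The same comparison can be sketched in Python with re.sub (an assumption; the answer above uses JavaScript's replace). It contrasts the question's combined pattern with the escaped version on the inputs from the question; the function name is hypothetical.

```python
import re

def strip_trailing_zeros(number):
    """Apply the answer's pattern: drop trailing zeros and a dot directly before them.

    Assumes the input contains a decimal point.
    """
    return re.sub(r'\.?0+$', '', number)

# The question's pattern: the unescaped dot swallows the 4.
broken = re.sub(r'(?:.0*$|0*$)', '', '29123.6400')
print(broken)  # 29123.6

fixed = [strip_trailing_zeros(n) for n in
         ['29000.12345678900', '29000.000', '29000.00003400', '29123.6400']]
print(fixed)  # ['29000.123456789', '29000', '29000.000034', '29123.64']
```

One caveat: because the dot is optional, an input with no decimal point such as '29000' would be shortened to '29', so this pattern is only safe when a decimal point is known to be present.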
You can use this regex
^\d+(\.\d*[1-9])?
-   -------------
|         |->this would match only if the digits after . end with [1-9]
|
|->^ depicts the start of the string; it is necessary to match the pattern
that solves your problem
try it here
You simply want this:
^\d*(\.?\d*[1-9])?
^\d* means zero or more digits at the start of the string, before the first group.
The () delimit a capturing group.
\.? means a single dot (.) may be present but is optional, e.g. (.)
\d* means zero or more digits, e.g. (1234)
\.?\d* means an optional dot followed by zero or more digits, e.g. (.123)
[1-9] matches a single digit from 1 to 9, excluding 0, e.g. (4)
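Unlike the replace-based approach, this pattern is meant to be matched against the input and the matched text kept. A short Python sketch of that usage, with hypothetical names, run on the question's inputs:

```python
import re

# Pattern from the answer: integer part, then an optional fraction
# that must end in a non-zero digit.
SHORTEST_FORM = re.compile(r'^\d*(\.?\d*[1-9])?')

def shorten(number):
    """Return the leading part of the number matched by the pattern."""
    return SHORTEST_FORM.match(number).group(0)

shortened = [shorten(n) for n in
             ['29000.12345678900', '29000.000', '29000.00003400', '29000']]
print(shortened)  # ['29000.123456789', '29000', '29000.000034', '29000']
```

Since every part of the pattern is optional, match() always succeeds here; for input that is not a number at all it simply returns an empty string.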
I don't know whether Objective-C supports something like the following construct, but in Python you can do it completely without regular expressions using str.rstrip():
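In [1]: def shorten_number(number):
   ...:     # Leave integers such as '29000' untouched; only strip after a decimal point
   ...:     if '.' not in number:
   ...:         return number
   ...:     return number.rstrip('0').rstrip('.')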
In [2]: shorten_number('29000.12345678900')
Out[2]: '29000.123456789'
In [3]: shorten_number('29000.000')
Out[3]: '29000'

How should a string be matched with a regular expression in Objective C

I'm finding it hard to match strings using NSRegularExpression. Generic alpha characters are not a problem with [a-z] but if I need to match a word like 'import' I'm struggling to make it work. I'm sure I have to escape the word in some manner but I can't find any docs around this. A really basic example would be
{{import "hello"}}
where I want to get hold of the string: hello
edit: to clarify - 'hello' could be any string - it's the bit I want returned
This regular expression matches the text between the "-s in your example:
\{\{import "([^"]+)"\}\}
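The match will be stored in the first match group. Note that inside an Objective-C string literal the backslashes and double quotes must themselves be escaped: @"\\{\\{import \"([^\"]+)\"\\}\\}"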
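The pattern itself can be verified independently of NSRegularExpression. The following is a minimal sketch in Python (an assumption, chosen only to exercise the regex; the variable names are illustrative):

```python
import re

# Literal braces are escaped; the capture group takes everything
# between the double quotes.
IMPORT_DIRECTIVE = re.compile(r'\{\{import "([^"]+)"\}\}')

match = IMPORT_DIRECTIVE.search('{{import "hello"}}')
name = match.group(1) if match else None
print(name)  # hello
```

The literal word import needs no escaping at all; only the braces do, because { and } are quantifier syntax in most regex flavours.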

Regular expression for extracting a number

I would like to be able to extract a number from within a string formatted as follows:
"<[1085674730]> hello foo1, how are you doing?"
I'm a novice with regular expressions, I only want to be able to extract a number that is enclosed in the greater/less-than and bracket symbols, but I'm not sure how to go about it. I have to match numeric digits only, but I'm not sure what syntax is used for only searching within these symbols.
UPDATE:
Thank you all for your input, and sorry for not being more specific. As I explained to kiamlaluno, I'm using VB.Net as the language for my application. I was wondering why some of the implementations were not working. In fact, the only one that did work was the one described by Matthew Flaschen. But that captures the symbols around the number as well as the number itself. I would like to only capture the number that is encased in the symbols and filter out the symbols themselves.
Use:
<\[(\d+)\]>
This is tested with ECMAScript regex.
It means:
\[ - literal [
( - open capturing group
\d - digit
+ - one or more
) - close capturing group
\] - literal ]
The overall functionality is to capture one or more digits surrounded by the given characters.
Combine Matthew's post with lookarounds (http://www.regular-expressions.info/lookaround.html). This will exclude the prefix and suffix from the match.
(?<=<\[)\d+(?=\]>)
I didn't test this regex but it should be very close to what you need. Double check at the link provided.
Hope this helps!
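The two approaches so far (capture group versus lookarounds) can be compared side by side. This sketch uses Python rather than VB.Net, so treat it as an illustration of the patterns only; the variable names are made up.

```python
import re

subject = "<[1085674730]> hello foo1, how are you doing?"

# Capture group: the whole match includes the symbols, group 1 is the number.
with_group = re.search(r'<\[(\d+)\]>', subject)
# Lookarounds: the symbols are asserted but not part of the match.
with_lookarounds = re.search(r'(?<=<\[)\d+(?=\]>)', subject)

print(with_group.group(0))        # <[1085674730]>
print(with_group.group(1))        # 1085674730
print(with_lookarounds.group(0))  # 1085674730
```

This also suggests why the capture-group version appeared to return the symbols: the full match (group 0) contains them, while the number alone sits in group 1.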
$subject = "<[1085674730]> hello foo1, how are you doing?";
preg_match('/<\[(\d+)\]>/', $subject, $matches);
$matches[1] will contain the number you are looking for.
Use:
/<\[([[:digit:]]+)\]>/
If your implementation doesn't support the handy [:digit:] syntax, then use this:
/<\[([\d]+)\]>/
And if your implementation doesn't support the handy \d syntax, then use this:
/<\[([0-9]+)\]>/