Regex negative lookahead not excluding required string - sql

Good day
I've been trying to apply a specific exclusion in my regexp_like call within Netezza SQL: regexp_like(trim(upper(a.MRCH_NME)),'\bMAKRO\s?(?!DEBTORS)','i')
Unfortunately I can't get the expression to exclude the "DEBTORS". Would anyone be able to assist me in finding my mistake?
Thank you.

Here is one quick fix to your pattern:
\bMAKRO\s(?!DEBTORS)
Code:
regexp_like(trim(upper(a.MRCH_NME)),'\bMAKRO\s(?!DEBTORS)','i')
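The reason your current pattern allows MAKRO DEBTORS WOODME to pass is that \s? is optional: the engine can match zero whitespace characters after MAKRO, and the lookahead then only asserts that DEBTORS does not immediately follow MAKRO. The next character is a space, so the assertion succeeds.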
Edit:
You could also rewrite your negative lookahead slightly to this:
\bMAKRO(?!\s?DEBTORS)
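To see how the three patterns differ, here is a minimal sketch in Python. It assumes Python's re module behaves like Netezza's regex engine for these constructs, which is not verified here, and the sample names other than MAKRO DEBTORS WOODME are made up for illustration.

```python
import re

# Assumption: Python's `re` stands in for the Netezza regex engine.
# Only the first sample name comes from the question; the others are hypothetical.
names = ["MAKRO DEBTORS WOODME", "MAKRO STORES", "MAKRO"]

patterns = {
    "original": r"\bMAKRO\s?(?!DEBTORS)",   # optional space lets the lookahead be bypassed
    "quick_fix": r"\bMAKRO\s(?!DEBTORS)",   # mandatory space before the lookahead
    "rewritten": r"\bMAKRO(?!\s?DEBTORS)",  # optional space moved inside the lookahead
}

# For each pattern, record whether it finds a match in each name.
results = {
    label: [bool(re.search(p, n, re.IGNORECASE)) for n in names]
    for label, p in patterns.items()
}

for label, flags in results.items():
    print(label, dict(zip(names, flags)))
```

Note that the quick fix requires a whitespace character after MAKRO, so a trimmed value that is just MAKRO no longer matches, while the rewritten lookahead still accepts it.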

Related

How to remove several symbols from a string in BigQuery

I have strings which contain numbers like that:
a20cdac0_19221bdc12022bab3fe05a43df4a7dbe
I need to get only symbols after underscore symbol:
19221bdc12022bab3fe05a43df4a7dbe
Unfortunately, the amount of those symbols is always different, so I can't use just RIGHT function.
I know that probably REGEXP might help, but I can't understand how to use that exactly. Will be very grateful for the help.
Below is for BigQuery Standard SQL (using regexp)
regexp_extract(value, r'_(.*)') regexp_approach
if to apply to sample value from your question
regexp_extract('a20cdac0_19221bdc12022bab3fe05a43df4a7dbe', r'_(.*)') regexp_approach
result is 19221bdc12022bab3fe05a43df4a7dbe
Yet, another regexp option is to use regexp_replace as in below example
regexp_replace(value, r'^.*?_', '')
Note: split is also an option here, unless the value contains more than one _, in which case you get only the part between the first and second _
split(value, '_')[safe_offset(1)]
Also, as you can see, you need safe_offset to prevent an error when _ is absent
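The difference between the three options shows up with more than one underscore or none at all. Here is a rough Python sketch of the same logic; it is not BigQuery code, the helper names are hypothetical, and None stands in for SQL NULL.

```python
import re

def after_underscore_extract(value):
    # Mirrors regexp_extract(value, r'_(.*)'): everything after the first "_".
    m = re.search(r"_(.*)", value)
    return m.group(1) if m else None

def after_underscore_replace(value):
    # Mirrors regexp_replace(value, r'^.*?_', ''): drop the prefix up to the first "_".
    return re.sub(r"^.*?_", "", value)

def after_underscore_split(value):
    # Mirrors split(value, '_')[safe_offset(1)]: second piece, or None if there is none.
    parts = value.split("_")
    return parts[1] if len(parts) > 1 else None

sample = "a20cdac0_19221bdc12022bab3fe05a43df4a7dbe"
for fn in (after_underscore_extract, after_underscore_replace, after_underscore_split):
    print(fn.__name__, fn(sample), fn("a_b_c"), fn("abc"))
```

With no underscore present, the extract and split variants return nothing while the replace variant returns the input unchanged, so pick whichever behaviour suits the data.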
You can use the split function like this
select split('a20cdac0_19221bdc12022bab3fe05a43df4a7dbe','_')[ORDINAL(2)];
https://cloud.google.com/bigquery/docs/reference/standard-sql/string_functions#split

Regex matching sequence of characters

I have a test string such as: The Sun and the Moon together, forever
I want to be able to type a few characters or words and be able to match this string if the characters appear in the correct sequence together, even if there are missing words. For example, the following search word(s) should all match against this string:
The Moon
Sun tog
Tsmoon
The get ever
What regex pattern should I be using for this? I should add that the supplied test strings are going to be dynamic within an app, and so I'd like to be able to use a pattern based on the search string.
From your example Tsmoon you show partial words (T), ignoring case (s, m) and allow anything between each entered character. So as a first attempt you can:
Set the ignore case option
Between each character of the input, insert the regular expression to match zero or more of anything. You can choose whether to match the shortest or longest run.
Try that, reading the documentation for NSRegularExpression if you're stuck, and see how it goes. If you get stuck ask a new question showing your code and the RE constructed and explain what happens/doesn't work as expected.
HTH
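For illustration, the two steps above can be sketched in Python rather than with NSRegularExpression. The function name sequence_pattern is hypothetical, and the sketch assumes the shortest run (.*?) between characters and treats typed spaces like any other character.

```python
import re

def sequence_pattern(query):
    # Escape each typed character so it is matched literally, then allow
    # the shortest possible run of anything between consecutive characters.
    body = ".*?".join(re.escape(ch) for ch in query)
    return re.compile(body, re.IGNORECASE)

text = "The Sun and the Moon together, forever"
queries = ["The Moon", "Sun tog", "Tsmoon", "The get ever"]

# Map each search string from the question to whether it matches the text.
matches = {q: bool(sequence_pattern(q).search(text)) for q in queries}
print(matches)

# Characters in the wrong order should not match.
print(bool(sequence_pattern("Moon Sun").search(text)))
```

Escaping each character matters because the search string comes from user input and may contain regex metacharacters such as . or (.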

Usage of Regular Expression Extractor JMeter?

Using Regular Extractor in JMeter, I need to get the value of "fullBkupUNIXTime" from the below response,
{"fullBackupTimeString":["Mon 10 Apr 2017 14:14:36"],"fullBkupUNIXTime":["1491833676"],"fullBackupDirName":["10_04_2017_0636"]}
I tried with Ref Name as time and
Regular Expression: "fullBkupUNIXTime": "([0-9])" and "(.+?)"
and pass them as input for 2nd request ${time}
Neither of the above two works for me.
Please help me out with this.
First of all: why not just use this thing?
Then, if you are set on continuing your RegExp adventure:
Your first expression is not going to work because you've defined it to match exactly one [0-9] character.
Add the appropriate repetition character, like "fullBkupUNIXTime": "([0-9]+)".
And it generally makes sense to tell the engine to stop at the first, narrowest match too: "fullBkupUNIXTime": "([0-9]+?)"
Next, make sure you're handling any whitespace between the key, the colon, and the value properly. Better to mark it explicitly, if present, with \s
And last but not least: make sure you properly handle multiple lines (if appropriate, of course). Add the (?m) modifier to your expression.
And/or (?im) to make it case-insensitive as well.
[ is a reserved character in regex, so you need to escape it. In your case use:
Regular Expression: fullBkupUNIXTime":\["(\d+)
Template: $1$
Match No.: 1
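As a quick check outside JMeter, the expression can be tried against the sample response in Python. This is only a sketch: it assumes Python's re module treats this pattern the same way as JMeter's regex engine, and group 1 plays the role of the $1$ template.

```python
import re

# Sample response copied from the question.
response = (
    '{"fullBackupTimeString":["Mon 10 Apr 2017 14:14:36"],'
    '"fullBkupUNIXTime":["1491833676"],'
    '"fullBackupDirName":["10_04_2017_0636"]}'
)

# The original attempt expects a space and no "[" after the colon, so it finds nothing.
attempt = re.search(r'"fullBkupUNIXTime": "([0-9])"', response)

# Escaping "[" and allowing one or more digits captures the value.
fixed = re.search(r'fullBkupUNIXTime":\["(\d+)', response)
time_value = fixed.group(1) if fixed else None

print(attempt, time_value)
```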

Find string that is not between specific html tag

I'm being required to use regular expressions to parse HTML. I do realize regular expressions are bad for HTML matching.
I would like to find a specific string and evaluate whether or not it's between two strings.
In this example ® must be immediately between <sup> and </sup>
Example:
<sup>®</sup>
I believe this would involve using negative lookaheads and lookbehinds. My first thought would be:
(?<!<sup>)®(?!<\/sup>)
Unfortunately this fails as I don't believe you can do a lookahead and lookbehind in this combination.
Just using the negative-lookahead does work and is probably good enough for my purposes...
®(?!<\/sup>)
...but I'd like to know if it's possible to combine a lookahead and lookbehind in this way. Or is there another technique I should be using?
Thanks in advance
Your initial regex (i.e. (?<!<sup>)®(?!<\/sup>)) is correct, as demonstrated in the example usage at https://www.debuggex.com/r/WyY9y0Zq2Krz_3Xm
However, it works in Python and PCRE, but not in JavaScript (you can check by choosing each of them in the dropdown). JavaScript engines that predate ES2018 do not have negative lookbehind support.
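A small Python sketch shows the combined lookbehind and lookahead at work. The sample strings other than <sup>®</sup> are invented for illustration.

```python
import re

# Negative lookbehind and negative lookahead around the same character.
# Python's `re` accepts this because the lookbehind "<sup>" has a fixed width.
pattern = re.compile(r"(?<!<sup>)®(?!</sup>)")

# Only the first sample comes from the question; the rest are hypothetical.
samples = ["<sup>®</sup>", "Acme® widgets", "<sup>® widgets", "Acme®</sup>"]

flagged = {s: bool(pattern.search(s)) for s in samples}
print(flagged)
```

As written, the pattern flags a ® only when both the opening tag before it and the closing tag after it are missing; a ® with just one of the two tags is not reported.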

How to filter by dateHour on Google Analytics API?

I'm trying to retrieve results from the Google Analytics API that are between two specific dateHours. I tried to use the following filter:
'filters' => 'ga:dateHour>2013120101;ga:dateHour<2013120501'
But this doesn't work. What do I need to do? (I've used other filters before, and they work just fine).
First off, you're trying to use greater than and less than on a dimension, which doesn't work. Dimension filters only allow ==, !=, =@, !@, =~, and !~.
So, if you want to get ga:dateHour, you'll have to use a regex filter:
ga:dateHour=~^2013120([1-4][0-2][0-9]|500)$;ga:dateHour!~^201312010[0-1]$
EDIT:
I wouldn't hesitate a second to learn RegEx, especially if you're a programmer. Here is a great SO post on learning regular expressions.
So, to break it down:
=~ looking to match the following regex
!~ not looking to match the following regex
(check out GA's filter operators)
^ at the start of a string
2013120 since the entire dateHour range contained this string of numbers, look for that
([1-4][0-2][0-9]|500) matches the three digits after 2013120. The first [1-4] matches the last digit of the day, giving 20131201, 20131202, 20131203, and 20131204; within those strings the next digit must be [0-2] and the last one [0-9], which together cover the hour. Think of each [ ] as a placeholder for a range of digits.
| means or
500 says, we only want 500 and nothing more, so it's very specific.
The whole statement is wrapped in () so we can say: hey, match [1-4][0-2][0-9] OR 500, after 2013120.
Then, we end it with $ to signify the end of the string.
This probably isn't the most concise way to describe how this filter is working, but what I would do is use a website like regexpal or some other regex testing tool, and get the range you'd like to filter and start writing regex.
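If you'd rather check the filter in code than in a web tool, here is a minimal Python sketch. It assumes Python's re module agrees with GA's regex handling for these two patterns, and it compares the regex filter against the numeric range from the question.

```python
import re

# The two halves of the filter: the first must match, the second must not.
include = re.compile(r"^2013120([1-4][0-2][0-9]|500)$")
exclude = re.compile(r"^201312010[0-1]$")

# Every real dateHour (YYYYMMDDHH) from 1 to 5 December 2013.
date_hours = [
    f"201312{day:02d}{hour:02d}" for day in range(1, 6) for hour in range(24)
]

# Values kept by the regex filter versus values strictly inside the wanted range.
by_regex = [dh for dh in date_hours if include.search(dh) and not exclude.search(dh)]
by_range = [dh for dh in date_hours if 2013120101 < int(dh) < 2013120501]

print(by_regex == by_range, by_regex[0], by_regex[-1], len(by_regex))
```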
Best of luck!