BigQuery - Illegal Escape Sequence at REGEXP_REPLACE - google-bigquery

I'm having an issue matching a regular expression in BigQuery.
REGEXP_REPLACE(tc.metadata->>'document_number', '\D', '', 'g') = m.document_number
However, BigQuery doesn't seem to like escape sequences for some reason and I get this error that I can't figure out:
Syntax error: Illegal escape sequence: \D
This code works fine, but BigQuery is unhappy with it and I can't figure out why. Thanks in advance for the help

You need to double escape the character in BigQuery, as the first \ will be consumed when the string literal is parsed.
Try double escaping, e.g. \\D and that should work for you.

In [1], if you scroll down, you can see the escape sequences supported by standard SQL, and \D is not one of them. So, as Ben P says, you need to double escape to get a literal backslash into the pattern. I assume that's what is missing, but if you could elaborate on your question, the answer could be more accurate.
[1] https://cloud.google.com/bigquery/docs/reference/standard-sql/lexical#string_and_bytes_literals
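The two layers of escaping can be illustrated outside BigQuery. Below is a minimal sketch in Python, used only as a stand-in because its string literals treat the backslash in a comparable way (BigQuery itself uses the RE2 engine, and strip_non_digits is a hypothetical helper name, not something from the question):

```python
import re

# The regex engine must receive two characters: a backslash followed by D.
# In a regular string literal the backslash has to be escaped, so the source
# text is '\\D'. A raw literal r'\D' produces exactly the same string.
DOUBLE_ESCAPED = '\\D'
RAW = r'\D'


def strip_non_digits(value: str) -> str:
    # Remove every character that is not a digit.
    return re.sub(DOUBLE_ESCAPED, '', value)


print(DOUBLE_ESCAPED == RAW)
print(strip_non_digits('123.456-78'))
```

In BigQuery the equivalent would presumably be REGEXP_REPLACE(col, r'\D', '') or REGEXP_REPLACE(col, '\\D', ''). Note that BigQuery's REGEXP_REPLACE takes three arguments and already replaces every match, so the trailing 'g' flag from the question would have to go.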

Related

How to include apostrophe in character set for REGEXP_SUBSTR()

The IBM i implementation of regex uses apostrophes (instead of e.g. slashes) to delimit a regex string, i.e.:
... where REGEXP_SUBSTR(MYFIELD,'myregex_expression')
If I try to use an apostrophe inside a [group] within the expression, it always errors - presumably thinking I am giving a closing quote. I have tried:
- escaping it: \'
- doubling it: '' (and tripling)
No joy. I cannot find anything relevant in the IBM SQL manual or by google search.
I really need this to, for instance, allow names like O'Leary.
Thanks to Wiktor Stribizew for the answer in his comment.
There are a couple of "gotchas" for anyone who might land on this question with the same problem. The first is that you have to give the (presumably Unicode) hex value rather than the EBCDIC value that you would use, e.g. in ordinary interactive SQL on the IBM i. So in this case it really is \x27 and not \x7D for an apostrophe. Presumably this is because the REGEXP_ ... functions are working through Unicode even for EBCDIC data.
The second thing is that it would seem that the hex value cannot be the last one in the set. So this works:
^[A-Z0-9_\+\x27-]+ ... etc.
But this doesn't
^[A-Z0-9_\+-\x27]+ ... etc.
I don't know how to highlight text within a code sample, so I draw your attention to the fact that the hyphen is last in the first sample and second-to-last in the second sample.
If anyone knows why it has to not be last, I'd be interested to know. [edit: see Wiktor's answer for the reason]
btw, using double quotes as the string delimiter with an apostrophe in the set didn't work in this context.
A single quote can be defined with the \x27 notation:
^[A-Z0-9_+\x27-]+
          ^^^^
Note that a hyphen placed between two characters in a character class/bracket expression forms a range between those symbols. When you used ^[A-Z0-9_\+-\x27]+ you defined a range from + to ', which is invalid because + (\x2B) comes after ' (\x27) in the Unicode table. Put the hyphen at the very end (or start) of the class to make it a literal hyphen.
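The range problem is not specific to IBM i. Here is a small sketch in Python's re module, used as a stand-in since the IBM i engine cannot be run here. The compiles helper is hypothetical, and the patterns are the two from this thread with a $ anchor added:

```python
import re

# Hyphen last: it is a literal '-', and \x27 is the apostrophe.
GOOD = r"^[A-Z0-9_+\x27-]+$"

# Hyphen between '+' (0x2B) and \x27 (0x27): a descending, invalid range.
BAD = r"^[A-Z0-9_\+-\x27]+$"


def compiles(pattern: str) -> bool:
    # Return True if the regex engine accepts the pattern.
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


print(bool(re.match(GOOD, "O'LEARY")))
print(compiles(GOOD))
print(compiles(BAD))
```

Python rejects the second pattern with a "bad character range" error, which matches the explanation above. Other engines may word the error differently.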

BigQuery - Illegal Escape Sequence

I'm having an issue matching a regular expression in BigQuery. I have the following line of code that tries to identify user agents:
when regexp_contains((cs_user_agent), '^AppleCoreMedia\/1\.(.*)iPod') then "iOS App - iPod"
However, BigQuery doesn't seem to like escape sequences for some reason and I get this error that I can't figure out:
Syntax error: Illegal escape sequence: \/ at [4:63]
This code works fine in a regex validator I use, but BigQuery is unhappy with it and I can't figure out why. Thanks in advance for the help
Use regexp_contains((cs_user_agent), r'^AppleCoreMedia\/1\.(.*)iPod')
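As a side note, the forward slash is not a regex metacharacter, so escaping it is optional. Only the \. actually needs the backslash, which is why a raw literal is still worthwhile. A quick sketch in Python as a stand-in for BigQuery's RE2 engine (the sample user agent and the is_ipod_app helper are made up for illustration):

```python
import re

# Hypothetical sample value; real user agents will differ.
SAMPLE_UA = "AppleCoreMedia/1.0.0.12B466 (iPod; U; CPU OS 8_1_3 like Mac OS X; en_us)"

ESCAPED_SLASH = r'^AppleCoreMedia\/1\.(.*)iPod'
PLAIN_SLASH = r'^AppleCoreMedia/1\.(.*)iPod'


def is_ipod_app(user_agent: str, pattern: str) -> bool:
    # True when the pattern is found anywhere in the user agent string.
    return re.search(pattern, user_agent) is not None


print(is_ipod_app(SAMPLE_UA, ESCAPED_SLASH))
print(is_ipod_app(SAMPLE_UA, PLAIN_SLASH))
```

Both patterns behave the same here, so the \/ can be written as a plain / if that reads better.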

Write regex for pattern like W00001

I am new to Regular Expressions and any help is highly appreciated.
Pattern like W00000,W00001,W00002,W00004
Must begin with W
Each string before comma must be six characters
String can only be repeated four times
Comma in between
Must not begin or end with comma
I tried below pattern and some others, like (^[W]{1}\d{5}){1,4}'), and none of them work correctly:
Select 'X' from dual Where REGEXP_LIKE ('W12342','(^[W]{1}\d{5})(?<!,)$')
My understanding is that the OP is saying the match should fail if the string begins or ends with a comma, not just that the preceding or trailing commas shouldn't match, so anchors are needed. Also, based on the regex he attempted, I infer that a single group, such as W00000, should match. So, I think the regex should be this, if the characters following the W must always be digits:
^W[:digit:]{5}(,W[:digit:]{5}){0,3}$
Or this, if they can be something other than digits:
^W[^,]{5}(,W[^,]{5}){0,3}$
UPDATE:
The OP posted the following comment:
I am on Oracle 11g and [:digit:] doesn't work. When I replace it with [0-9] it then works fine.
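According to the documentation, Oracle 11g conforms to the POSIX regex standard and should be able to use POSIX character classes such as [:digit:]. Note, though, that a POSIX class is only recognized inside a bracket expression, so it has to be written [[:digit:]]; a bare [:digit:] is just a bracket expression matching any one of the characters :, d, i, g and t, which would explain why it didn't work. I also noticed in the docs that Oracle 11g does support Perl-style backslash character class abbreviations, which I didn't think was the case when I originally wrote this answer. In that case, the following should work: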
^W\d{5}(,W\d{5}){0,3}$
Well in that case, you can do this:
(W[^,]{5},){3}W[^,]{5}
If I understood correctly, this should do it!
^W[0-9]{5}(,W[0-9]{5}){0,3}$
One W12345 group, optionally followed by up to three ,W12345 groups.
Edit1: Adding ^$ to fail if there is a comma
Edit2: Fix class, since it fails on Oracle 11g
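To check the anchored pattern against the rules in the question, here is a short sketch in Python (a stand-in for Oracle's REGEXP_LIKE; is_valid is a hypothetical helper name and the test strings are only examples):

```python
import re

# One W + five digits, then zero to three more comma-separated groups.
PATTERN = re.compile(r'^W[0-9]{5}(,W[0-9]{5}){0,3}$')


def is_valid(codes: str) -> bool:
    # True when the whole string is one to four W-codes separated by commas.
    return PATTERN.match(codes) is not None


for candidate in ['W12342',
                  'W00000,W00001,W00002,W00004',
                  'W00000,W00001,W00002,W00004,W00005',
                  ',W00000',
                  'W00000,']:
    print(candidate, is_valid(candidate))
```

The first two strings are accepted. The one with five groups and the ones with a leading or trailing comma are rejected.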

Regular expression for date in selenium ide

I tried to create a regular expression for the date format below, but I can't get it to match. Please help me out.
04-Apr-2013 [10:58:13 GMT+05:30]
This is what I came up with:
\\d{2}-\\w{3}-d{4} [\\d{2}:d{2}:d{2} \\w{3}+\\d{2}:d{2}]
Correct me where I have gone wrong.
Thanks
You have to escape the square brackets, as they have a special meaning in regular expressions, and some of your digit indicators are missing the backslash prefix. I've tested the following regex and it worked for me:
regexp:\d{2}-\w{3}-\d{4} \[\d{2}:\d{2}:\d{2} \w{3}\+\d{2}:\d{2}\]
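The effect of the unescaped brackets can be seen with a quick check. This sketch uses Python's re module as a stand-in for the JavaScript regex engine behind Selenium IDE. The regexp: prefix is Selenium syntax rather than part of the pattern, so it is left out, and the constant names are just for illustration:

```python
import re

SAMPLE = "04-Apr-2013 [10:58:13 GMT+05:30]"

# Brackets and the plus sign escaped, every digit class written as \d.
FIXED = r"\d{2}-\w{3}-\d{4} \[\d{2}:\d{2}:\d{2} \w{3}\+\d{2}:\d{2}\]"

# Same pattern with unescaped brackets: everything between [ and ] is now
# a character class that matches a single character.
UNESCAPED = r"\d{2}-\w{3}-\d{4} [\d{2}:\d{2}:\d{2} \w{3}\+\d{2}:\d{2}]"

print(re.fullmatch(FIXED, SAMPLE) is not None)
print(re.fullmatch(UNESCAPED, SAMPLE) is not None)
```

Only the escaped version matches the whole sample string.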

How do I escape the # character in Cognos?

I have created a pass-through Query Item in Cognos 8 Framework Manager that requires the # character as part of the query. Unfortunately this gets interpreted by Cognos as the opening of a macro.
How do I escape the # (number sign/sharp) character in a Query Item?
There doesn't seem to be any "official" way in the documentation, but this seems to work.
#"#"#
#'#'# works too, unless it appears in a literal SQL string in your query, so it's safer to use #"#"#.
A single backslash \ should escape it; I know it escapes square brackets [].