I've tried to understand the below but don't seem to get the last part of the regular expression which has {1,40}. Overall, I know the pattern tries to match the special characters and something else (the {1,40})
regexp_like(COLUMN,'^['||UNISTR('\0020')||'-'||UNISTR('\0060')||UNISTR('\007B')||UNISTR('\007D')||UNISTR('\007E')||UNISTR('\00C0')||'-'||UNISTR('\00DF')||']'||'{1,40}$')
regexp_like() checks that a string matches the regex provided as second argument.
Your regexp looks like ^[...]{1,40}$.
^ is the beginning of the string and $ is the end, so the entire string must match the regex.
[...] is a character class that lists individual code points and ranges of code points. Every character of the string must belong to that class (any other character is forbidden). You would need to check what they correspond to: unicode.org is your friend. For the first code points:
\0020 space
\0060 grave accent
\007B left curly bracket
Finally, {1,40} is a quantifier: the length of the string must be at least 1 and at most 40 characters.
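To make the effect of the class plus the {1,40} quantifier concrete, here is a minimal Python sketch of the same shape of pattern. It is an analogue, not the Oracle call itself: the names ALLOWED, PATTERN and is_valid are made up for illustration, and Python's re module stands in for regexp_like().

```python
import re

# Character class mirroring the ranges in the question:
# U+0020..U+0060, U+007B, U+007D, U+007E, U+00C0..U+00DF
ALLOWED = "[\u0020-\u0060\u007B\u007D\u007E\u00C0-\u00DF]"

# Same structure as ^[...]{1,40}$ in the question
PATTERN = re.compile("^" + ALLOWED + "{1,40}$")


def is_valid(value: str) -> bool:
    """Return True if every character is in the class and the length is 1..40."""
    # fullmatch is used because Python's "$" would also accept a trailing newline
    return PATTERN.fullmatch(value) is not None


print(is_valid("HELLO WORLD 123"))  # True
print(is_valid("hello"))            # False: lowercase a-z (U+0061..U+007A) is not in the class
print(is_valid(""))                 # False: at least one character is required
print(is_valid("A" * 41))           # False: more than 40 characters
```

Note that lowercase ASCII letters fall between U+0060 and U+007B, so they are rejected by this class.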
I have two types of URL's which I would need to clean, they look like this:
["//xxx.com/se/something?SE_{ifmobile:MB}{ifnotmobile:DT}_A_B_C_D_E_F_G_H"]
["//www.xxx.com/se/car?p_color_car=White?SE_{ifmobile:MB}{ifnotmobile:DT}_A_B_C_D_E_F_G_H"]
The outcome I want is;
SE_{ifmobile:MB}{ifnotmobile:DT}_A_B_C_D_E_F_G_H"
I want to remove the brackets and everything up to SE, the URLS differ so I want to remove:
First URL
["//xxx.com/se/something?
Second URL:
["//www.xxx.com/se/car?p_color_car=White?
I can't get my head around it. I've tried .*\/ but it still keeps strings I don't want, such as:
(1 url) =
something?
(2 url) car?p_color_car=White?
You can use
regexp_replace(FinalUrls, r'.*\?|"\]$', '')
See the regex demo
Details
.*\? - any zero or more chars other than line break chars, as many as possible, and then a ? char
| - or
"\]$ - a "] substring at the end of the string.
Mind the regexp_replace syntax, you can't omit the replacement argument, see reference:
REGEXP_REPLACE(value, regexp, replacement)
Returns a STRING where all substrings of value that match regular
expression regexp are replaced with replacement.
You can use backslashed-escaped digits (\1 to \9) within the
replacement argument to insert text matching the corresponding
parenthesized group in the regexp pattern. Use \0 to refer to the
entire matching text.
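The same pattern can be tried outside the database. The following is a small Python sketch, assuming Python's re.sub behaves closely enough to regexp_replace for this pattern (both replace every match); the function name clean_url is hypothetical.

```python
import re


def clean_url(value: str) -> str:
    """Strip everything up to the last '?' and a trailing '"]'."""
    # .*\?   greedy: consumes up to and including the LAST question mark
    # "\]$   a literal "] at the very end of the string
    return re.sub(r'.*\?|"\]$', "", value)


first = '["//xxx.com/se/something?SE_{ifmobile:MB}{ifnotmobile:DT}_A_B_C_D_E_F_G_H"]'
second = '["//www.xxx.com/se/car?p_color_car=White?SE_{ifmobile:MB}{ifnotmobile:DT}_A_B_C_D_E_F_G_H"]'

print(clean_url(first))   # SE_{ifmobile:MB}{ifnotmobile:DT}_A_B_C_D_E_F_G_H
print(clean_url(second))  # SE_{ifmobile:MB}{ifnotmobile:DT}_A_B_C_D_E_F_G_H
```

Because .* is greedy, the second URL loses both question marks in one match, which is why the earlier attempt with .*\/ stopped too early: it ended at the last slash rather than the last ?.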
I am currently debugging an old script and trying to understand a regex in an SQL query.
What would be the result of this search?
[...] REGEXP '(^| |"|\\()ORDER(-| )[[:digit:]]{3}';
Here is a part by part explanation:
(^| |"|\\(): either the beginning of the string (^), or a space, or a double quote, or an opening parenthesis (this character needs to be escaped because it is meaningful in regexes)
ORDER: the word "ORDER"
(-| ): either a dash or a space
[[:digit:]]{3}: a sequence of 3 consecutive digits (between 0 and 9)
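To see which strings this kind of pattern accepts, here is a rough Python equivalent. Two things are assumptions of the sketch: Python's re does not support the POSIX class [[:digit:]], so [0-9] is used instead, and a raw string needs only one backslash before the parenthesis. The name has_order_ref is hypothetical.

```python
import re

# Start of string, space, double quote, or "(" ; then ORDER ; then "-" or space ; then 3 digits
ORDER_RE = re.compile(r'(^| |"|\()ORDER(-| )[0-9]{3}')


def has_order_ref(text: str) -> bool:
    """Return True if the text contains an ORDER reference as described above."""
    return ORDER_RE.search(text) is not None


print(has_order_ref("ORDER-123"))        # True: at the start of the string
print(has_order_ref("see (ORDER 456)"))  # True: preceded by "("
print(has_order_ref("REORDER-123"))      # False: preceded by a letter
print(has_order_ref("ORDER-12"))         # False: only two digits
```

The pattern has no trailing boundary, so "ORDER-1234" also matches on its first three digits.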
Hi, may I know what the query below means?
REGEXP_REPLACE(number,'[^'' ''-/0-9:-#A-Z''[''-`a-z{-~]', 'xy') ext_number
part 1
In terms of explaining what the function call is doing:
It is a function call that analyses the input string 'number' with a regex (2nd argument) and replaces any parts of the string that match it with a specific string (the 3rd argument, 'xy'). As for the name after the parenthesis I am not sure, but the documentation for the function is here
part 2
Sorry to be writing a question within an answer here but I cannot respond in comments yet (not enough rep)
Does this regex work? Unless SQL uses different syntax this would appear to be a non-functional regex. There are some red flags, e.g.:
The entire regex is wrapped in square brackets, indicating a set of characters, but seems to predominantly hold an expression
There is a range indicator between a single quote and a character (invalid range: if a dash was required in the match it should be escaped with a '\' (backslash))
One set of square brackets is never closed
After some minor tweaks this regex is valid syntax:
^'' ''\-\/0-9:-#A-Z''[''-a-z{-~]`, but does not match anything I can think of, it is important to know what string is being examined/what the context is for the program in order to identify what the regex might be attempting to do
It seems like it is meant to replace all ASCII control characters in the column or variable number with xy.
[] encloses a class of characters. Any character in that class matches. [^] negates that, hence all characters match, that are not in the class.
- is a range operator, e.g. a-z means all characters from a to z, like abc...xyz.
It seems like characters enclosed in ' should be escaped (the second ' is there to escape the ' in the string itself). At least this would make some sense. However, in the docs of the DBMSs I found that have a regexp_replace() function (Postgres, Oracle, DB2, MariaDB, MySQL), I could not find anything indicating this escape mechanism. They all use \, but maybe I missed something? Unfortunately you didn't tag which DBMS you're actually using!
Now if you take an ASCII table you'll see that the ranges in the expression make up all printable characters (counting space as printable) in groups from space to /, 0 to 9, : to #, etc. Actually it might have been shorter to express it as '' ''-~, space to ~.
Given the negation, all these don't match. The ones left are from NUL to US and DEL. These match and get replaced by xy one by one.
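The shortened form suggested above (space to ~, negated) can be tried in Python. This is only a sketch of the idea, not the original query: the function name mask_non_printable is hypothetical, and Python's re.sub stands in for REGEXP_REPLACE.

```python
import re


def mask_non_printable(value: str) -> str:
    """Replace every character outside the printable ASCII range with 'xy'."""
    # [^ -~] = anything NOT between space (0x20) and tilde (0x7E)
    return re.sub(r"[^ -~]", "xy", value)


print(mask_non_printable("plain text!"))   # plain text!
print(mask_non_printable("ab\tc\x7f"))     # abxycxy  (tab and DEL are replaced one by one)
```

One caveat of this sketch: on Unicode input, characters above 0x7E (for example accented letters) are also outside the range and would be replaced too.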
I'm trying to figure out the base regex to capture the middle of a google url out of a sql database.
For example, a few links:
https://www.google.com/cars/?year=2016&model=dodge+durango&id=1234
https://www.google.com/cars/?year=2014&model=jeep+cherokee+crossover&id=6789
What would be the regex to capture the text to get dodge+durango , or jeep+cherokee+crossover ? (It's alright that the + still be in there.)
My Attempts:
1)
\b[=.]\W\b\w{5}\b[+.]?\w{7}
, but this clearly does not work, as it is hard-coded and would only work for something like the dodge durango example (it would extract "dodge+durango").
2) Using a positive lookahead,
[^+]( ?=&id )
but I am not fully sure how to use this, as this only grabs one character behind the & symbol.
How can I extract a string of (potentially) any length with any number of + delimiters between the "model=" and "&id" boundaries?
It seems like you could use regexp_replace and access match groups:
regexp_replace(input, 'model=(.*?)([&\\s]|$)', E'\\1')
From the documentation:
The regexp_replace function provides substitution of new text for
substrings that match POSIX regular expression patterns. It has the
syntax regexp_replace(source, pattern, replacement [, flags ]). The
source string is returned unchanged if there is no match to the
pattern. If there is a match, the source string is returned with the
replacement string substituted for the matching substring. The
replacement string can contain \n, where n is 1 through 9, to indicate
that the source substring matching the n'th parenthesized
subexpression of the pattern should be inserted, and it can contain \&
to indicate that the substring matching the entire pattern should be
inserted. Write \\ if you need to put a literal backslash in the
replacement text. The flags parameter is an optional text string
containing zero or more single-letter flags that change the function's
behavior. Flag i specifies case-insensitive matching, while flag g
specifies replacement of each matching substring rather than only the
first one.
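The group-reference mechanism described in the quoted documentation can be tried out with Python's re.sub, which also accepts \1 in the replacement. One thing worth checking when doing so: a substitution only rewrites the part of the string that matched, so the rest of the URL survives unless the pattern covers the whole string. The sketch below (variable names are hypothetical, and Python stands in for the SQL function) shows both variants.

```python
import re

url = "https://www.google.com/cars/?year=2016&model=dodge+durango&id=1234"

# Pattern covers only "model=...&": the surrounding URL text is kept
partial = re.sub(r"model=(.*?)([&\s]|$)", r"\1", url)

# Pattern covers the whole string: only the captured group remains
whole = re.sub(r"^.*model=(.*?)(?:[&\s]|$).*$", r"\1", url)

print(partial)  # https://www.google.com/cars/?year=2016&dodge+durangoid=1234
print(whole)    # dodge+durango
```

If only the model value is wanted, anchoring the pattern across the entire input (or using a function that returns the match instead of substituting) is likely the safer choice.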
I may be misunderstanding, but if you want to get the model, just select everything between model= and the ampersand (&).
regexp_matches(input, 'model=([^&]*)')
model=: Match literally
([^&]*): Capture
[^&]*: Anything that isn't an ampersand
*: Zero or more times
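The breakdown above can be checked quickly in Python. This is a sketch with a hypothetical helper name (extract_model); re.search with a capture group plays the role of regexp_matches here.

```python
import re
from typing import Optional


def extract_model(url: str) -> Optional[str]:
    """Return the text between 'model=' and the next '&' (or end of string)."""
    match = re.search(r"model=([^&]*)", url)
    return match.group(1) if match else None


print(extract_model("https://www.google.com/cars/?year=2016&model=dodge+durango&id=1234"))
# dodge+durango
print(extract_model("https://www.google.com/cars/?year=2014&model=jeep+cherokee+crossover&id=6789"))
# jeep+cherokee+crossover
```

Because [^&]* stops at the first ampersand, it works for any number of + separated words and also when model= is the last parameter.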
I'm studying regular expressions and cannot figure out what this caret does exactly. I thought that this caret symbol means 'not equal', but in this query below, I am confused:
SELECT REGEXP_REPLACE('San Antonio', '(^[[:alpha:]]+)', 'CITY') TEST
FROM DUAL;
RESULT:
CITY Antonio
'San' should comply with [:alpha:] so I don't understand what the caret function does here.
Carrat (^) also stands for the beginning of the line (and Dollar ($) for its end).
^Hello$ = the word Hello and nothing more
^Hello.* = something that starts with Hello
The negation functionality is within square brackets:
[^0-9] = anything that is not a digit
[^a-zA-Z] = anything that is not an English letter
Caret ^ (please note the correct spelling) means "at the beginning of the string" when it appears outside square brackets, as it does here at the start of the pattern. Only as the first character inside square brackets does it mean negation.
'San' does NOT comply with [:alpha:], because [:alpha:] is a SINGLE alphabetic character. [ ... ] means "matching set" (match exactly ONE SINGLE character of those listed within square brackets). [[:alpha:]] means any single alphabetic character. The + means "one or more" of what precedes it, so 'San' matches [[:alpha:]]+ at the beginning of the string. 'Antonio' also matches, but it is not at the beginning of the string, so it is not replaced. If you didn't have the caret, both words would be replaced with CITY (try it and you will see.)
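The two roles of the caret can be compared side by side in Python. This is an analogue of the Oracle query, not the query itself: Python's re has no [[:alpha:]] class, so [A-Za-z] is assumed as a stand-in, and like Oracle's REGEXP_REPLACE, re.sub replaces every match by default.

```python
import re

# Caret outside brackets: anchor to the beginning of the string
anchored = re.sub(r"^[A-Za-z]+", "CITY", "San Antonio")

# No caret: every run of letters matches, wherever it is
unanchored = re.sub(r"[A-Za-z]+", "CITY", "San Antonio")

# Caret as first character inside brackets: negation (remove all non-digits)
digits_only = re.sub(r"[^0-9]", "", "Room 101, floor 3")

print(anchored)     # CITY Antonio
print(unanchored)   # CITY CITY
print(digits_only)  # 1013
```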