How do I use Find/Replace with a backreference followed by a number?

I want to do a find/replace on strings matching (abcdefg)1, replacing them with \12 (meaning group 1 followed by a literal 2), but it doesn't work. How do I do it?

Add a leading zero to the backreference, so that it is read as \01 (group 1) followed by a literal 2:
\012
References
Example: Regular Expression Matching a Valid Date
How do you get a literal digit after a backreference in the search pattern
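Whether the leading-zero trick works depends on the tool's replacement syntax. As a hedged illustration in Python's re module (not necessarily the tool from the question), \012 does not work there, because \0 starts an octal escape. The unambiguous way to write "group 1, then a literal 2" in Python is \g<1>2:

```python
import re

text = "abcdefg1"
pattern = r"(abcdefg)1"

# \g<1> names group 1 explicitly, so the 2 that follows is a literal digit.
fixed = re.sub(pattern, r"\g<1>2", text)
print(fixed)  # abcdefg2

# In Python's replacement syntax \0 begins an octal escape:
# \012 is octal 12, i.e. a newline, not "group 1 followed by 2".
octal = re.sub(pattern, r"\012", text)
print(repr(octal))  # '\n'

# \12 is read as a reference to group 12, which this pattern does not have.
try:
    re.sub(pattern, r"\12", text)
except re.error as exc:
    print("error:", exc)
```

Other engines spell the unambiguous form differently, so check your tool's documentation before relying on either \012 or \g<1>.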

Related

I am a complete regex novice, so please bear with me. I tried to google this, but haven't found an answer yet.
What would be an appropriate way of writing a regular expression that matches file names starting with a dot, such as .buildpath or .htaccess?
Thanks a lot!
In most regex languages, ^\. or ^[.] will match a leading dot.
The ^ matches the beginning of a string in most languages, so the following matches a leading .:
^\.
You need to append the rest of your file-name expression to it.
Likewise, $ will match the end of a string.
You may need to replace the \ with your language's escape character. Under PowerShell, for example, the regex I use is:
^(\.)+\/
Test cases:
"../NameOfFile.txt" -match '^(\.)+\/'
matches, while
"_./NameOfFile.txt" -match '^(\.)+\/'
does not.
Naturally, you may ask, well what is happening here?
The (\.) matches a literal ., and the + after it repeats that one or more times.
Finally, the \/ requires a forward slash after the dots, as in a relative path such as ../NameOfFile.txt.
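The same pattern can be tried in Python's re module (used here only for illustration; the answer itself is PowerShell/.NET). Note that it requires a / after the leading dots, so it matches relative-path prefixes such as ../ rather than a bare dot-file name like .htaccess:

```python
import re

# One or more literal dots at the very start, then a forward slash.
# \/ is simply an escaped /, accepted by both .NET and Python.
DOT_PREFIX = re.compile(r"^(\.)+\/")

for path in ["../NameOfFile.txt", "./NameOfFile.txt", "_./NameOfFile.txt", ".htaccess"]:
    print(f"{path!r}: {DOT_PREFIX.match(path) is not None}")
```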
It depends a bit on the regular expression library you use, but you can do something like this:
^\.\w+
The ^ anchors the match to the beginning of the string, the \. matches a literal period (since an unescaped . in a regular expression typically matches any character), and \w+ matches 1 or more "word" characters (alphanumeric plus _).
See the perlre documentation for more info on Perl-style regular expressions and their syntax.
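A small Python sketch (Python's re accepts the same Perl-style syntax for this pattern) shows which names ^\.\w+ accepts. Because nothing anchors the end of the string, re.match only needs a matching prefix, so a name like .my-config still counts even though - is not a word character:

```python
import re

# ^ anchors at the start, \. is a literal dot, \w+ is one or more
# word characters (letters, digits, underscore).
DOTFILE = re.compile(r"^\.\w+")

for name in [".buildpath", ".htaccess", ".my-config", "notes.txt", ".", "..."]:
    print(f"{name!r}: {DOTFILE.match(name) is not None}")
```

Add $ (or use re.fullmatch) if every character after the dot must be a word character.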
It depends on what characters are legal in a filename, which depends on the OS and filesystem.
For example, in Windows that would be:
^\.[^<>:"/\\\|\?\*\x00-\x1f]+$
The above expression means:
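Match a string starting with the literal character .
Followed by one or more characters, up to the end of the string, none of which is an invalid file-name character (the negated class excludes < > : " / \ | ? * and the control characters \x00-\x1f)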
I used this as a reference for which characters are disallowed in file names.
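A hedged Python sketch of the Windows-oriented pattern above. It uses re.fullmatch so the whole name must consist of allowed characters (with re.match, Python's $ would also accept a trailing newline). The character list is the one from the answer; the helper name is made up for illustration:

```python
import re

# ^\.      : the name starts with a literal dot.
# [^...]+$ : then one or more characters, none of which is < > : " / \ | ? *
#            or a control character in the range \x00-\x1f.
WINDOWS_DOTFILE = re.compile(r'^\.[^<>:"/\\\|\?\*\x00-\x1f]+$')

def is_windows_dotfile(name: str) -> bool:
    # fullmatch: the entire string must match, so "$" cannot succeed
    # just before a trailing newline and let it through.
    return WINDOWS_DOTFILE.fullmatch(name) is not None

for name in [".htaccess", ".buildpath", ".bad:name", ".", "notes.txt"]:
    print(f"{name!r}: {is_windows_dotfile(name)}")
```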
To match a string starting with a dot in Java, you can write a simple expression (shown here as a Java string literal):
^\\..*
^ means the regular expression is matched from the start of the string
\\. (the regex \. once the string-literal escape is processed) means the string starts with a literal "."
.* means the dot is followed by 0 or more characters

Regex to exclude substring

I want to write a regex that:
passes strings starting with /scripts/my_app/ (/scripts/my_app/xxxx.sh or /scripts/my_app/yyyyy/xxxz.sh)
excludes other strings starting with /scripts/ (/scripts/another_string/xxxx.sh or /scripts/some_string/yyyyy/xxxz.sh)
passes any other strings, like /xxxx/yyyy/zzzzz/qqqq.sh or /aaaa/bbbb/ccc.sh
Something like this ^[(\/scripts\/my_app\/)|(?!\/scripts\/)].* doesn't work.
You can use
^/(?:scripts/my_app/|(?!scripts/)).*
Explanation
^/ Start of string and match /
(?: Non capture group
scripts/my_app/ Match literally
| Or
(?!scripts/) Negative lookahead, assert not scripts/
) Close non capture group
.* Match 0+ times any character
Another option is to use a double negative lookahead:
^/(?!scripts/(?!my_app/)).*
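Both patterns can be checked against the paths from the question. This is a sketch in Python's re module, which supports the same lookahead and non-capturing group syntax; the test data is just the question's example paths:

```python
import re

# Option 1: "scripts/my_app/" OR "anything that does not start with scripts/".
ALTERNATION = re.compile(r"^/(?:scripts/my_app/|(?!scripts/)).*")
# Option 2: reject "scripts/" unless it is followed by "my_app/".
DOUBLE_NEG = re.compile(r"^/(?!scripts/(?!my_app/)).*")

expected = {
    "/scripts/my_app/xxxx.sh": True,
    "/scripts/my_app/yyyyy/xxxz.sh": True,
    "/scripts/another_string/xxxx.sh": False,
    "/scripts/some_string/yyyyy/xxxz.sh": False,
    "/xxxx/yyyy/zzzzz/qqqq.sh": True,
    "/aaaa/bbbb/ccc.sh": True,
}

for path, should_pass in expected.items():
    for pattern in (ALTERNATION, DOUBLE_NEG):
        passed = pattern.match(path) is not None
        assert passed == should_pass, (pattern.pattern, path)
print("all paths classified as expected")
```

The double negative lookahead is shorter, while the alternation version spells out the two allowed cases separately.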

REGEXP_REPLACE URL BIGQUERY

I have two types of URLs that I need to clean; they look like this:
["//xxx.com/se/something?SE_{ifmobile:MB}{ifnotmobile:DT}_A_B_C_D_E_F_G_H"]
["//www.xxx.com/se/car?p_color_car=White?SE_{ifmobile:MB}{ifnotmobile:DT}_A_B_C_D_E_F_G_H"]
The outcome I want is:
SE_{ifmobile:MB}{ifnotmobile:DT}_A_B_C_D_E_F_G_H
I want to remove the brackets and everything up to SE. Because the URLs differ, I want to remove:
First URL:
["//xxx.com/se/something?
Second URL:
["//www.xxx.com/se/car?p_color_car=White?
I can't get my head around it. I've tried .*\/, but it still keeps strings I don't want, such as:
URL 1: something?
URL 2: car?p_color_car=White?
You can use
regexp_replace(FinalUrls, r'.*\?|"\]$', '')
Details
.*\? - zero or more characters other than line break characters, as many as possible, followed by a literal ? character
| - or
"\]$ - a "] substring at the end of the string.
Mind the REGEXP_REPLACE syntax: you can't omit the replacement argument. From the reference:
REGEXP_REPLACE(value, regexp, replacement)
Returns a STRING where all substrings of value that match regular
expression regexp are replaced with replacement.
You can use backslashed-escaped digits (\1 to \9) within the
replacement argument to insert text matching the corresponding
parenthesized group in the regexp pattern. Use \0 to refer to the
entire matching text.
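To see what the pattern removes, here is a sketch using Python's re.sub as a stand-in for BigQuery's REGEXP_REPLACE (both replace every match; the pattern is unchanged). The greedy .*\? runs to the last ? in each URL, and "\]$ strips the trailing quote and bracket:

```python
import re

urls = [
    '["//xxx.com/se/something?SE_{ifmobile:MB}{ifnotmobile:DT}_A_B_C_D_E_F_G_H"]',
    '["//www.xxx.com/se/car?p_color_car=White?SE_{ifmobile:MB}{ifnotmobile:DT}_A_B_C_D_E_F_G_H"]',
]

# Same pattern as the answer: everything up to the last "?", OR a final '"]'.
PATTERN = re.compile(r'.*\?|"\]$')

cleaned = [PATTERN.sub("", url) for url in urls]
for value in cleaned:
    print(value)  # SE_{ifmobile:MB}{ifnotmobile:DT}_A_B_C_D_E_F_G_H
```

Greediness matters for the second URL: a lazy .*? would stop at the first ? and leave p_color_car=White? in the result.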

regex capture middle of url

I'm trying to figure out a basic regex to capture the middle of a Google URL stored in a SQL database.
For example, a few links:
https://www.google.com/cars/?year=2016&model=dodge+durango&id=1234
https://www.google.com/cars/?year=2014&model=jeep+cherokee+crossover&id=6789
What regex would capture dodge+durango or jeep+cherokee+crossover? (It's fine if the + signs remain.)
My attempts:
1)
\b[=.]\W\b\w{5}\b[+.]?\w{7}
This clearly does not work in general: it is hard-coded to the word lengths of the Dodge Durango example (it would only extract "dodge+durango").
2) Using a positive lookahead:
[^+](?=&id)
But I am not sure how to use this, as it only grabs the single character before the & symbol.
How can I extract a string of (potentially) any length, with any number of + delimiters, between the "model=" and "&id" boundaries?
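It seems like you could use regexp_replace and access match groups. Because regexp_replace returns the whole source string with only the matched part replaced, the pattern should match the entire URL, so that only the captured model is left:
regexp_replace(input, '^.*model=([^&]*).*$', E'\\1')
From the PostgreSQL documentation: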
The regexp_replace function provides substitution of new text for
substrings that match POSIX regular expression patterns. It has the
syntax regexp_replace(source, pattern, replacement [, flags ]). The
source string is returned unchanged if there is no match to the
pattern. If there is a match, the source string is returned with the
replacement string substituted for the matching substring. The
replacement string can contain \n, where n is 1 through 9, to indicate
that the source substring matching the n'th parenthesized
subexpression of the pattern should be inserted, and it can contain \&
to indicate that the substring matching the entire pattern should be
inserted. Write \\ if you need to put a literal backslash in the
replacement text. The flags parameter is an optional text string
containing zero or more single-letter flags that change the function's
behavior. Flag i specifies case-insensitive matching, while flag g
specifies replacement of each matching substring rather than only the
first one.
I may be misunderstanding, but if you want to get the model, just select everything between model= and the ampersand (&).
regexp_matches(input, 'model=([^&]*)')
model=: Match literally
([^&]*): Capture
[^&]*: Anything that isn't an ampersand
*: Zero or more times, as many as possible
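A sketch of both ideas in Python's re module, using the URLs from the question (Python stands in for the SQL functions here). The capturing version mirrors the regexp_matches answer. The substitution version shows that a replace only yields the bare model when the pattern spans the entire URL, because any text outside the match is kept:

```python
import re

urls = [
    "https://www.google.com/cars/?year=2016&model=dodge+durango&id=1234",
    "https://www.google.com/cars/?year=2014&model=jeep+cherokee+crossover&id=6789",
]

# Capturing approach: everything after "model=" up to, not including, the next "&".
MODEL = re.compile(r"model=([^&]*)")

# Substitution approach: match the whole string so that only group 1 survives.
WHOLE = re.compile(r"^.*model=([^&]*).*$")

for url in urls:
    m = MODEL.search(url)
    captured = m.group(1) if m else None
    replaced = WHOLE.sub(r"\1", url)
    print(captured, replaced)
```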

Regular Expression to return when invalid character found

I have the following regex that checks for a list of valid characters:
^([a-zA-Z0-9+?/:().,' -]){1,35}$
What I need to do now is search our DB for any existing column values that fail the above regex. I'm using Oracle SQL's REGEXP_LIKE condition.
The problem I have is that I can't seem to negate the above expression so that it returns a value when it finds a character that isn't in the allowed list, e.g.:
"a-valid-filename.xml" => this shouldn't be returned as it's valid.
"an_invalid-filename.xml" => I need to find these i.e. anything with an invalid character.
The obvious answer to me is to define a list of invalid characters... but that could be a long list.
You can match it against the following regex, which uses a [^...] negated character class:
([^a-zA-Z0-9+?/:().,' -])
This will match any single character that is not part of the list of characters that are allowed.
You can negate a character class by inserting a caret as the first character.
Example:
[^y]
The above will match anything that is not y
Try this:
where not regexp_like(col, '^([a-zA-Z0-9+?/:().,'' -]){1,35}$')
or
where regexp_like(col, '[^a-zA-Z0-9+?/:().,'' -]')
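The two WHERE clauses are not quite equivalent. The NOT REGEXP_LIKE form, anchored with {1,35}, also returns values longer than 35 characters, while the negated-class form returns only values that contain a disallowed character. A hedged Python sketch of both checks (Python's re stands in for Oracle's REGEXP_LIKE, and the function names are made up for illustration):

```python
import re

# Whitelist from the question: 1 to 35 allowed characters, whole value.
VALID = re.compile(r"[a-zA-Z0-9+?/:().,' -]{1,35}")
# Negated class: any single character that is NOT in the whitelist.
INVALID_CHAR = re.compile(r"[^a-zA-Z0-9+?/:().,' -]")

def fails_whitelist(value: str) -> bool:
    """Like: WHERE NOT REGEXP_LIKE(col, '^([...]){1,35}$')."""
    return VALID.fullmatch(value) is None

def has_invalid_char(value: str) -> bool:
    """Like: WHERE REGEXP_LIKE(col, '[^...]')."""
    return INVALID_CHAR.search(value) is not None

for s in ["a-valid-filename.xml", "an_invalid-filename.xml", "x" * 36]:
    print(f"{s!r}: fails_whitelist={fails_whitelist(s)}, has_invalid_char={has_invalid_char(s)}")
```

Pick the negated class if you only care about bad characters, and the negated whitelist if the length limit should be enforced too.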