Regex to convert CamelCase to snake case in Redshift - sql

I am trying to convert CamelCase to snake case (or, more generally, to words separated by a delimiter) using a regex in SQL (AWS Redshift). So, something like
regexp_replace(MyString, '([A-Z]+)', '-$1')
except that it must not match at the beginning of the string. Right now,
MyString -> -my-string instead of my-string.
How do I do this?

Match and capture any char before the uppercase letters, and restore it using another backreference in the replacement pattern:
regexp_replace(MyString, '(.)([A-Z]+)', '$1-$2')
See the regex demo.
I assume you already apply LOWER to the result after the regex replacement.
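To see what the pattern does outside Redshift, here is a minimal Python sketch using the standard re module. It is only an illustration: Python is not Redshift's POSIX engine, and the helper name camel_to_delimited is hypothetical. A function replacement is used so that any delimiter is inserted literally.

```python
import re

def camel_to_delimited(s: str, delimiter: str = "-") -> str:
    """Insert `delimiter` before every run of capitals that has a character
    before it, then lowercase the whole string."""
    # (.) must capture the character preceding the uppercase run, so a capital
    # at position 0 cannot match and gets no leading delimiter.
    spaced = re.sub(r"(.)([A-Z]+)",
                    lambda m: m.group(1) + delimiter + m.group(2),
                    s)
    return spaced.lower()

print(camel_to_delimited("MyString"))             # my-string
print(camel_to_delimited("MyStringValue", "_"))   # my_string_value
print(camel_to_delimited("MyHTTPServer"))         # my-httpserver
```

Because [A-Z]+ treats a run of capitals as one unit, acronyms such as HTTP are not split from the following word.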

Related

Trino regexp_replace this character in the beginning but not in the middle Trino [duplicate]

I am a complete regex noob, so please bear with me. I tried to Google this but haven't found an answer yet.
What would be an appropriate regular expression for matching files starting with a dot, such as .buildpath or .htaccess?
Thanks a lot!
In most regex languages, ^\. or ^[.] will match a leading dot.
The ^ matches the beginning of a string in most languages, so the following matches a leading dot. You need to append your filename expression to it.
^\.
Likewise, $ matches the end of a string.
You may need to replace the \ with your language's escape syntax. Under PowerShell, for example, the regex I use is: ^(\.)+\/
Test case:
"../NameOfFile.txt" -match '^(\.)+\/'
works, while
"_./NameOfFile.txt" -match '^(\.)+\/'
does not. (PowerShell single-quoted strings pass backslashes through unchanged, so they do not need to be doubled.)
Naturally, you may ask what is happening here.
The (\.) matches a literal ., and the + after it repeats that group one or more times.
Finally, \/ matches a literal forward slash, so the path must start with one or more dots followed by /, as in a relative path such as ../NameOfFile.txt.
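The same pattern can be checked with Python's re module. This is a sketch: re.search stands in for PowerShell's -match, which likewise reports whether the pattern matches anywhere in the string, and the ^ anchor pins it to the start.

```python
import re

# ^ anchors at the start, (\.)+ requires one or more literal dots,
# and \/ requires a literal forward slash right after them.
LEADING_DOTS_THEN_SLASH = re.compile(r"^(\.)+\/")

for path in ["../NameOfFile.txt", "./NameOfFile.txt", "_./NameOfFile.txt"]:
    print(path, bool(LEADING_DOTS_THEN_SLASH.search(path)))
# ../NameOfFile.txt True
# ./NameOfFile.txt True
# _./NameOfFile.txt False
```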
It depends a bit on the regular expression library you use, but you can do something like this:
^\.\w+
The ^ anchors the match to the beginning of the string, the \. matches a literal period (since an unescaped . in a regular expression typically matches any character), and \w+ matches 1 or more "word" characters (alphanumeric plus _).
See the perlre documentation for more info on Perl-style regular expressions and their syntax.
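A minimal Python sketch of the ^\.\w+ pattern (Python's \w is close to Perl's: letters, digits and underscore, Unicode-aware for str patterns). The helper name is_dotfile is hypothetical.

```python
import re

# ^\.  : the string must begin with a literal period
# \w+  : followed by at least one word character
DOTFILE = re.compile(r"^\.\w+")

def is_dotfile(name: str) -> bool:
    return DOTFILE.search(name) is not None

for name in [".htaccess", ".buildpath", "file.txt", "."]:
    print(name, is_dotfile(name))
```

Since there is no $ anchor, only the start of the name has to fit: ".git-blame" is accepted because ".git" matches.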
It depends on what characters are legal in a filename, which depends on the OS and filesystem.
For example, in Windows that would be:
^\.[^<>:"/\\\|\?\*\x00-\x1f]+$
The above expression means:
Match a string starting with the literal character .
Followed by one or more characters, none of which is in the class of invalid characters (<, >, :, ", /, \, |, ?, * and the control characters \x00-\x1f), up to the end of the string
I used this as reference regarding which chars are disallowed in filenames.
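The same Windows expression in Python, as a sketch. Inside a character class, | ? and * are already literal, so their escapes are dropped here. This only checks the character rules; other Windows restrictions, such as reserved device names like CON, are not covered.

```python
import re

# A leading literal dot, then one or more characters that are NOT
# < > : " / \ | ? * or a control character 0x00-0x1F, up to the end.
WINDOWS_DOTFILE = re.compile(r'^\.[^<>:"/\\|?*\x00-\x1f]+$')

for name in [".htaccess", ".my config", ".bad|name", ".a:b", "."]:
    print(repr(name), bool(WINDOWS_DOTFILE.match(name)))
```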
To match a string starting with a dot in Java, you can write a simple expression (as a Java string literal, so the backslash is doubled):
^\\..*
^ means the regular expression must match from the start of the string
\\. (a regex \. once the string literal is processed) means the string starts with a literal "."
.* means the dot can be followed by zero or more characters
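Java's String.matches succeeds only when the entire string matches, which is why the trailing .* is needed there. A Python sketch of the same idea uses re.fullmatch, which has the same whole-string semantics; in a Python raw string the backslash is written once. The helper name starts_with_dot is hypothetical.

```python
import re

def starts_with_dot(s: str) -> bool:
    # fullmatch (like Java's String.matches) must consume the whole string,
    # so ".*" is needed to cover everything after the leading dot.
    return re.fullmatch(r"^\..*", s) is not None

print(starts_with_dot(".htaccess"))            # True
print(starts_with_dot("htaccess"))             # False
# Without ".*" a whole-string match fails even for a dotfile:
print(re.fullmatch(r"^\.", ".htaccess"))       # None
```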

REGEXP_REPLACE URL BIGQUERY

I have two types of URLs which I need to clean; they look like this:
["//xxx.com/se/something?SE_{ifmobile:MB}{ifnotmobile:DT}_A_B_C_D_E_F_G_H"]
["//www.xxx.com/se/car?p_color_car=White?SE_{ifmobile:MB}{ifnotmobile:DT}_A_B_C_D_E_F_G_H"]
The outcome I want is:
SE_{ifmobile:MB}{ifnotmobile:DT}_A_B_C_D_E_F_G_H
I want to remove the brackets and everything up to SE. Since the URLs differ, I want to remove:
First URL:
["//xxx.com/se/something?
Second URL:
["//www.xxx.com/se/car?p_color_car=White?
I can't get my head around it. I've tried .*\/ but it still keeps strings I don't want, such as:
First URL: something?
Second URL: car?p_color_car=White?
You can use
regexp_replace(FinalUrls, r'.*\?|"\]$', '')
See the regex demo
Details
.*\? - any zero or more chars other than line break chars, as many as possible, and then a ? char
| - or
"\]$ - a "] substring at the end of the string.
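To check the pattern on both sample URLs, here is a Python sketch with re.sub. BigQuery uses RE2 rather than Python's engine, but greedy .*, alternation and $ behave the same way for this particular pattern. The helper name clean_url is hypothetical.

```python
import re

# Either everything up to and including the LAST "?" (greedy .*),
# or a '"]' at the very end of the string. Both are deleted.
PATTERN = re.compile(r'.*\?|"\]$')

def clean_url(raw: str) -> str:
    return PATTERN.sub("", raw)

urls = [
    '["//xxx.com/se/something?SE_{ifmobile:MB}{ifnotmobile:DT}_A_B_C_D_E_F_G_H"]',
    '["//www.xxx.com/se/car?p_color_car=White?SE_{ifmobile:MB}{ifnotmobile:DT}_A_B_C_D_E_F_G_H"]',
]
for u in urls:
    print(clean_url(u))  # SE_{ifmobile:MB}{ifnotmobile:DT}_A_B_C_D_E_F_G_H
```

Greediness is what handles the second URL: .* backtracks only as far as the last ?, so the earlier ?p_color_car=White part is removed as well.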
Mind the REGEXP_REPLACE syntax: you can't omit the replacement argument. See the reference:
REGEXP_REPLACE(value, regexp, replacement)
Returns a STRING where all substrings of value that match regular
expression regexp are replaced with replacement.
You can use backslashed-escaped digits (\1 to \9) within the
replacement argument to insert text matching the corresponding
parenthesized group in the regexp pattern. Use \0 to refer to the
entire matching text.

sql regexp string end with ".0"

I want to check whether a positive number string ends with ".0", so I wrote the following SQL:
select '12310' REGEXP '^[0-9]*\.0$'. However, the result is true. I don't understand why, since I put "\" before "." to escape it.
So I wrote another one, select '1231.0' REGEXP '^[0-9]\d*\.0$', but this time the result is false.
Could anyone tell me the right pattern?
A dot (.) in a regexp has special meaning (any character) and requires escaping if you want a literal dot:
select '12310' REGEXP '^[0-9]*\\.0$';
Result:
false
Use a double backslash to escape special characters in Hive. A single backslash has special meaning in Hive string literals and is used for sequences like \073 (semicolon), \n (newline), \t (tab), etc. This is why regex escapes need a double backslash. Likewise, for the digit class use \\d:
hive> select '12310.0' REGEXP '^\\d*?\\.0$';
OK
true
Also, a dot inside square brackets is already literal and needs no escaping: [.] can be used instead of \\.
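Why the first query returned true can be reproduced with Python's re module. This sketch assumes, as the answer explains, that Hive's string literal swallowed the single backslash, so the pattern actually compiled was ^[0-9]*.0$. Python raw strings pass the backslash through, so r"\." below corresponds to '\\.' in Hive.

```python
import re

# What Hive compiled from '^[0-9]*\.0$': the "." is unescaped and matches any char.
unescaped = re.compile(r"^[0-9]*.0$")
# What '^[0-9]*\\.0$' gives Hive: a real \. that matches only a literal dot.
escaped = re.compile(r"^[0-9]*\.0$")

print(bool(unescaped.search("12310")))    # True: "." matched the "1" before the final "0"
print(bool(escaped.search("12310")))      # False
print(bool(escaped.search("12310.0")))    # True
```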
If you know it is a number string, why not just use:
select ( val like '%.0' )
You need a regular expression if you want to validate that the rest of the string consists of digits. But if you only need to check the last two characters, LIKE is sufficient.
As for your question, . is a wildcard in regular expressions: it matches any character.

Workaround for Impala Regex lookahead and lookbehind

In Hive, the query below works fine, but in Impala it throws an error:
select regexp_replace("foobarbarfoo","bar(?=bar)","<NA>");
WARNINGS: Could not compile regexp pattern: bar(?=bar)
Error: invalid perl operator: (?=
Basically, Impala doesn't support lookahead and lookbehind
https://www.cloudera.com/documentation/enterprise/release-notes/topics/impala_incompatible_changes.html#incompatible_changes_200
Is there a workaround for this today? Maybe use UDF?
Thanks.
Since you are using regexp_replace, match and capture the part of the string you want to keep (the context that must be present) and restore it in the replacement with a backreference. From the Impala regexp_replace reference:
These examples show how you can replace parts of a string matching a pattern with replacement text, which can include backreferences to any () groups in the pattern string. The backreference numbers start at 1, and any \ characters must be escaped as \\.
So, here, you may use
select regexp_replace("foobarbarfoo","bar(bar)","<NA>\\1");
Note that this will not handle consecutive matches, but it works in the current scenario: foobarbarfoo turns into foo<NA>barfoo. (Go's regex engine is also RE2-based, which is why the Go flavor is selected in the regex101.com demo.)
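The difference between the two approaches can be compared in Python, whose re module supports both lookahead and capture groups. This is only a side-by-side illustration; Impala itself still rejects the lookahead form.

```python
import re

text = "foobarbarfoo"

# Lookahead version (the Hive query; RE2/Impala cannot compile it).
with_lookahead = re.sub(r"bar(?=bar)", "<NA>", text)
# RE2-friendly version: consume the context in group 1 and put it back with \1.
with_group = re.sub(r"bar(bar)", r"<NA>\1", text)
print(with_lookahead, with_group)   # foo<NA>barfoo foo<NA>barfoo

# Consecutive matches: the group version has already consumed the second "bar",
# so it cannot be matched again as the start of the next occurrence.
longer = "foobarbarbarfoo"
print(re.sub(r"bar(?=bar)", "<NA>", longer))    # foo<NA><NA>barfoo
print(re.sub(r"bar(bar)", r"<NA>\1", longer))   # foo<NA>barbarfoo
```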

Parse WHERE condition with regular expressions

I was wondering if anyone can help me with a regular expression (not my strongest point) to parse the WHERE part of a SQL statement. I need to extract the column names, in either "column" or "table.column" format. I'm using MySQL and PHP.
For example, parsing:
((table.column_a = '1') OR (table.column_b = '0'))
AND (date_column < '2014-07-03')
AND column_c LIKE '%my search string%'
should yield
table.column_a
table.column_b
date_column
column_c
Edit: clarification - the strings will be parsed in PHP with preg_* functions!
Thank you!
Assuming you are not doing this in SQL, you can use a regex like this:
[A-Za-z._]+(?=[ ]*(?:[<>=]|LIKE))
See regex demo.
This would work in Notepad++ and many languages.
Explanation
[A-Za-z._]+ matches the characters in your word (if you want to allow digits, add 0-9 to the class).
The lookahead (?=[ ]*(?:[<>=]|LIKE)) asserts that what follows is optional spaces (the brackets around the space are optional; they just make it stand out), then either one of the operator characters in the class [<>=] or (|) the word LIKE.
You can add operators inside [<>=], or tag them at the end with another alternation, e.g. |REGEXP
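Here is a quick check of the pattern against the sample WHERE clause, sketched in Python's re (PHP's preg_match_all uses PCRE, which supports the same lookahead syntax). The sample assumes the balanced form of the question's example, with table.column_b in the second comparison.

```python
import re

where = (
    "((table.column_a = '1') OR (table.column_b = '0'))\n"
    "AND (date_column < '2014-07-03')\n"
    "AND column_c LIKE '%my search string%'"
)

# A word made of letters, dots and underscores, followed by optional spaces
# and then a comparison character or LIKE (checked by the lookahead only).
COLUMN = re.compile(r"[A-Za-z._]+(?=[ ]*(?:[<>=]|LIKE))")

print(COLUMN.findall(where))
# ['table.column_a', 'table.column_b', 'date_column', 'column_c']
```

Keywords such as OR and AND are skipped because what follows them is a parenthesis or a column name, not an operator.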
Reference
Lookahead and Lookbehind Zero-Length Assertions
Mastering Lookahead and Lookbehind