Convert string using regex expression

I currently convert it using substring, and it works fine, but I have to convert a lot of them and it will take time.
I was told regex is much more efficient and faster.
Any advice on regex?
I want to convert string1 to string2 using regex:
string1 = '96457fa012456c41bf9200011da2d8fa'
string2='\96\45\7f\a0\12\45\6c\41\bf\92\00\01\1d\a2\d8\fa'
Thank you in advance

This works in Oracle - replacing Oracle's regex implementation with SQL Server's should be straightforward:
select regexp_replace(
'96457fa012456c41bf9200011da2d8fa',
'(..)',
'\\\1')
from dual
Explanation:
we want to match any pair of characters => ".."
we want to "store" the characters we just matched, therefore enclose them in a capturing group => "(..)"
in our replacement string, we want to get the contents of our matching group => "\1"
and we want to add a backslash before each group => "\\\1"
dual is just a dummy table in Oracle
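To check the pattern outside the database, the same substitution can be sketched in Python. This is only an illustration: it assumes Python's `re` module handles `(..)` and `\1` the same way for this input, and the function name `add_backslashes` is made up for the example.

```python
import re


def add_backslashes(hex_string: str) -> str:
    """Insert a backslash before every pair of characters."""
    # (..) captures two characters; in the replacement, \\ is a literal
    # backslash and \1 is the captured pair.
    return re.sub(r"(..)", r"\\\1", hex_string)


print(add_backslashes("96457fa012456c41bf9200011da2d8fa"))
```

Note that a trailing single character (odd-length input) is left untouched, because `(..)` only matches full pairs.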

Related

Remove template text on regexp_replace in Oracle's SQL

I am trying to remove template text like &#x; or &#xx; or &#xxx; from a long string.
Note: x / xx / xxx is a number, the length of the number is unknown, and the column type is CLOB.
For example:
SELECT 'H&#39;ello wor&#177;ld' FROM dual
Desired result:
Hello world
I know that regexp_replace should be used, but how do you use this function to remove this text?

You can use
SELECT REGEXP_REPLACE(col,'&&#\d+;')
FROM t
where
&& doubles the ampersand to escape the substitution character
\d matches a digit, and the following + allows one or more of them
; ends the pattern
Alternatively, use a single ampersand ('&#\d+;') in the pattern. Since the ampersand has a special meaning in Oracle client tools such as SQL*Plus (it introduces a substitution variable), using it can be a bit problematic.
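As a quick way to try the pattern without a database, here is a minimal Python sketch of the same idea. The sample string (with the entities `&#39;` and `&#177;`) and the function name `strip_numeric_entities` are assumptions made for the illustration; Python has no substitution character, so a single ampersand is enough here.

```python
import re


def strip_numeric_entities(text: str) -> str:
    """Remove every numeric entity of the form &#<digits>; from text."""
    # &# followed by one or more digits and a closing semicolon
    return re.sub(r"&#\d+;", "", text)


print(strip_numeric_entities("H&#39;ello wor&#177;ld"))  # Hello world
```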

In case you wanted to remove the entities because you don't know how to replace them by their character values, here is a solution:
UTL_I18N.UNESCAPE_REFERENCE( xmlquery( 'the_double_quoted_original_string' RETURNING content).getStringVal() )
In other words, the original 'H&#39;ello wor&#177;ld' should be passed to XMLQUERY as '"H&#39;ello wor&#177;ld"'.
The result will be 'H'ello wor±ld'.
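For comparison, decoding the entities instead of deleting them can be sketched in Python with the standard library's `html.unescape`. This is not the Oracle call above, only an analogous illustration, and the sample string is an assumed reconstruction of the question's input.

```python
import html

# Numeric character references are replaced by the characters they encode.
decoded = html.unescape("H&#39;ello wor&#177;ld")
print(decoded)
```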

regex extract big query with numeric data

How can I grab the number 2627995 from this string?
"hellotest/2627995?hl=en"
Here is my current regex, but it does not work when I use REGEXP_EXTRACT in BigQuery:
(\/)\d{7,7}
SELECT
REGEXP_EXTRACT(DESC, r"(\/)\d{7,7}")
AS number
FROM
`string table`
The output is only the slash, not the number.
Thank you!!

I think you just want to match all digits coming after the last path separator, before either the start of the query parameter, or the end of the URL.
SELECT REGEXP_EXTRACT(DESC, r"/(\d+)(?:\?|$)") AS number
FROM `string table`

Try this one: r"\/(\d+)"

Your code returns the slash because you captured it (see the parentheses in (\/)\d{7,7}). REGEXP_EXTRACT only returns the captured substring.
Thus, you could just wrap the other part of your regex with the parentheses:
SELECT
REGEXP_EXTRACT(DESC, r"/(\d{7})")
AS number
FROM
`string table`
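The effect of moving the parentheses can be reproduced in a small Python sketch. It assumes that reading group 1 of a match mirrors what REGEXP_EXTRACT returns when the pattern has one capturing group; the variable names are only for illustration.

```python
import re

url = "hellotest/2627995?hl=en"

# Group around the slash: group 1 is just "/".
slash_only = re.search(r"(\/)\d{7,7}", url).group(1)

# Group around the digits: group 1 is the number.
number = re.search(r"/(\d{7})", url).group(1)

print(slash_only, number)
```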
NOTE:
In BigQuery, a regex is written as a string literal, not as a regex literal (which is usually delimited with forward slashes). That is why you do not need to escape the / char: it is not a special regex metacharacter.
{7,7} is equivalent to the limiting quantifier {7}, meaning exactly seven occurrences.
Also, if you are sure the number is at the end of the string or is followed by a query string or fragment, you can refine it as
REGEXP_EXTRACT(DESC, r"/(\d+)(?:[?#]|$)")
where the regex means
/ - a / char
(\d+) - Group 1 (the actual output): one or more digits
(?:[?#]|$) - either ? or # char, or end of string.
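To see how the trailing `(?:[?#]|$)` behaves on different inputs, here is a hedged Python sketch. The helper `extract_id` and the extra sample strings are invented for the example, and Python's `re` is assumed to agree with BigQuery's regex engine on this particular pattern.

```python
import re
from typing import Optional

# A slash, one or more digits (group 1), then ?, # or end of string.
ID_PATTERN = re.compile(r"/(\d+)(?:[?#]|$)")


def extract_id(url: str) -> Optional[str]:
    """Return the digits after a slash, or None if the pattern does not match."""
    match = ID_PATTERN.search(url)
    return match.group(1) if match else None


for sample in ["hellotest/2627995?hl=en", "hellotest/2627995",
               "hellotest/2627995#top", "hellotest/26abc"]:
    print(sample, "->", extract_id(sample))
```

The last sample returns None because the digits are followed by letters rather than by ?, # or the end of the string.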

Snowflake SQL Regex

I am trying to identify a value that is nested in a string using Snowflake's regexp_substr().
The value that I want to access is in quotes:
...
Type:
value: "CategoryA"
...
Edit: This text is nested in a much larger portion of text.
I want to extract CategoryA for all columns using regexp_substr. But I am unsure how.
I have tried:
regexp_substr(col, 'Type\\W+(\\w+)\\W+\\w.+')
and while that gives the portion of the string, I just want what is in quotes and can't figure out how to do so.

You could use regexp_replace() instead:
regexp_replace(col, '(^[^"]*")|("[^"]*$)', '')
The regexp matches either of the following, and replaces the matching parts with the empty string:
(^[^"]*"): everything from the beginning of the string to the first double quote
("[^"]*$): everything from the last double quote to the end of the string
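The two-sided trim can be tried out in Python as a rough sketch. The sample text and the name `strip_outside_quotes` are assumptions for the illustration, and Python's `re` is assumed to treat this pattern like Snowflake does.

```python
import re


def strip_outside_quotes(text: str) -> str:
    """Drop everything up to the first and from the last double quote."""
    return re.sub(r'(^[^"]*")|("[^"]*$)', "", text)


text = 'Type:\n  value: "CategoryA"'
print(strip_outside_quotes(text))
```

One design consequence worth knowing: if the surrounding text contains more than one quoted value, this keeps everything between the first and the last double quote, not just the first value.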

Extract data in parentheses with Amazon-redshift

I have these values in a table:
LEASE THIRD-CCP-MANAGER (AAAA)
THE MANAGEMENT OF A THIRD PARTY (BBBB / AAAA)
I want to extract this information:
AAAA
BBBB/AAAA
That is, I have to look for the pattern and extract what is inside the parentheses.
I'm trying to use the REGEXP_SUBSTR function.
In Amazon Redshift, how do I extract the characters in parentheses?
Thanks

Whoa, that was hard!
Here is the syntax to use:
SELECT REGEXP_SUBSTR('One (Two) Three', '[(](.*)[)]', 1, 1, 'e')
This will return: Two
It appears that escaping brackets with \( doesn't work, but putting them in [(] does work. The 'e' at the end will "Extract a substring using a subexpression".
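The bracket-class trick is not specific to Redshift. A minimal Python sketch of the same pattern follows; the function name `inside_parentheses` is made up, and reading group 1 is assumed to play the role of the 'e' parameter.

```python
import re
from typing import Optional


def inside_parentheses(text: str) -> Optional[str]:
    """Return the text between the first ( and the last ), or None."""
    # [(] and [)] match literal parentheses; (.*) captures what is between.
    match = re.search(r"[(](.*)[)]", text)
    return match.group(1) if match else None


print(inside_parentheses("One (Two) Three"))
print(inside_parentheses("THE MANAGEMENT OF A THIRD PARTY (BBBB / AAAA)"))
```

The second call keeps the spaces around the slash; removing them would be a separate step.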

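Use position to find the indexes of the parentheses ( and ), then take the substring between them:
select
substring('LEASE THIRD-CCP-MANAGER (AAAA)', position('(' in 'LEASE THIRD-CCP-MANAGER (AAAA)') + 1, position(')' in 'LEASE THIRD-CCP-MANAGER (AAAA)') - position('(' in 'LEASE THIRD-CCP-MANAGER (AAAA)') - 1)
Or you can use split_part, once on ( and once on ) so the closing parenthesis is dropped as well:
split_part(split_part('LEASE THIRD-CCP-MANAGER (AAAA)','(',2),')',1)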
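The index-based approach, without any regex, might look like this in Python. It is only a sketch of the idea; `between_parentheses` is a hypothetical helper, and it returns an empty string when either parenthesis is missing.

```python
def between_parentheses(text: str) -> str:
    """Return the text between the first ( and the next ), or '' if absent."""
    start = text.find("(")           # index of the opening parenthesis
    end = text.find(")", start + 1)  # first closing parenthesis after it
    if start == -1 or end == -1:
        return ""
    return text[start + 1:end]


print(between_parentheses("LEASE THIRD-CCP-MANAGER (AAAA)"))
```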

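You’re probably struggling with ( and ) having a special meaning in regular expressions (look up “capturing groups”).
To tell regex that you just mean the characters ( and ) without their special meaning, “escape” them using \. In a Redshift string literal the backslash itself usually has to be doubled:
regexp_substr(yourtable.yourcolumn,'\\(.*\\)')
Note that this returns the match including the parentheses.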

Cut string after first occurrence of a character

I have strings like 'keepme:cutme' or 'string-without-separator' which should become respectively 'keepme' and 'string-without-separator'. Can this be done in PostgreSQL? I tried:
select substring('first:last' from '.+:')
But this leaves the : in and won't work if there is no : in the string.

Use split_part():
SELECT split_part('first:last', ':', 1) AS first_part
Returns the whole string if the delimiter is not there. And it's simple to get the 2nd or 3rd part etc.
Substantially faster than functions using regular expression matching. And since we have a fixed delimiter we don't need the magic of regular expressions.
Related:
Split comma separated column data into additional columns
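The same "split and take the first field" behaviour can be sketched in Python for a quick check of the two inputs from the question. The helper name `first_part` is invented for the example.

```python
def first_part(text: str, delimiter: str = ":") -> str:
    """Return everything before the first delimiter, or the whole string."""
    # Splitting at most once keeps the work small; index 0 always exists.
    return text.split(delimiter, 1)[0]


print(first_part("keepme:cutme"))
print(first_part("string-without-separator"))
```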

regexp_replace() may be overkill for what you need, but it also gives you the additional benefit of regex, for instance if the strings use multiple delimiters.
Example use:
select regexp_replace( 'first:last', E':.*', '');
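The multiple-delimiter case mentioned above can be illustrated with a short Python sketch. The delimiter set `:` and `;` is purely hypothetical, as is the function name.

```python
import re


def cut_after_delimiter(text: str) -> str:
    """Remove the first ':' or ';' and everything after it."""
    # [:;] is an assumed set of delimiters; .* consumes the rest of the line.
    return re.sub(r"[:;].*", "", text)


print(cut_after_delimiter("first:last"))
print(cut_after_delimiter("first;last:more"))
```

Strings without any of the delimiters pass through unchanged, since the pattern never matches.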

To pick everything after the last occurrence of a character instead (SQL Server syntax, using charindex):
select right('first:last', charindex(':', reverse('first:last')) - 1)
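For reference, "everything after the last delimiter" can be sketched in Python by splitting from the right. `last_part` is a made-up name, and unlike the SQL above this version simply returns the whole string when the delimiter is absent.

```python
def last_part(text: str, delimiter: str = ":") -> str:
    """Return everything after the last delimiter, or the whole string."""
    # rsplit with maxsplit=1 splits only at the right-most delimiter.
    return text.rsplit(delimiter, 1)[-1]


print(last_part("first:last"))
```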
