Extract data in parentheses with Amazon-redshift - sql

I have these characters in a table:
LEASE THIRD-CCP-MANAGER (AAAA)
THE MANAGEMENT OF A THIRD PARTY (BBBB / AAAA)
When I extract the information:
AAAA
BBBB/AAAA
That is, I have to look for the pattern and extract what is inside the parenthesis.
I'm trying to use the REGEXP_SUBSTR function.
In amazon redshift, how do I extract the characters in parentheses?
thanks

Whoa, that was hard!
Here is the syntax to use:
SELECT REGEXP_SUBSTR('One (Two) Three', '[(](.*)[)]', 1, 1, 'e')
This will return: Two
It appears that escaping brackets with \( doesn't work, but putting them in [(] does work. The 'e' at the end will "Extract a substring using a subexpression".

use position for finding the index of parenthesis ( and then substring
select
substring(position('(' in 'LEASE THIRD-CCP-MANAGER (AAAA)'),position(')' in 'LEASE THIRD-CCP-MANAGER (AAAA)'))
or you can use split_part
split_part('LEASE THIRD-CCP-MANAGER (AAAA)','(',2)

You’re probably struggling with () meaning something in regular expressions (lookup “back references”)
To tell regex that you just mean the characters ( and ) without their special meaning, “escape” them using \
regexp_substr(yourtable.yourcolumn,'\(.*\)')

Related

Regex to split apart text. Special case for parentheses with spaces in them

I am trying to split a field by delimiter in LookML. This field either follows the format of:
Managers (AE)
Managers (AE - MM)
I was able to split to first case using this
sql: case
when rlike (${user_role_name}, '^.*[\\(\\)].*$') then split_part(${user_role_name}, ' ', -1)
However, I haven't been able to get the 2nd case to do the same. It's in a case statement so I am going to add another when statement, but am not able to figure out the regex for parentheses that contains spaces.
Thanks in advance for the help!
By "split" the string, I think you mean you want to extract the part in parentheses, right?
I would do this using a regex substring method. You didn't mention what warehouse you're using, and the syntax will vary a little, but on snowflake that would look like:
regexp_substr(${user_role_name}, '\\([^)]*\\)')
So, for example, with the inputs you gave:
select regexp_substr('Managers (AE)', '\\([^)]*\\)')
union all
select regexp_substr('Managers (AE - MM)', '\\([^)]*\\)')
result
(AE)
(AE - MM)

Redshift SQL REGEXP_REPLACE function

I have a value which is duplicated from source (can't do anything about that). I have read some examples here https://docs.aws.amazon.com/redshift/latest/dg/REGEXP_REPLACE.html
Example value:
ABC$ABC$
So just trimming anything after the first '€'. I tried this, but I cannot figure out the correct REGEX expression.
REGEXP_REPLACE(value, '€.*\\.$', '')
So just trimming anything after the first '€'.
Why use regex at all? Why not just..
SELECT LEFT(value, CHARINDEX('€', value)-1)
If not all your data has a euro sign, consider WHERE value like '%€%'
Your current regex pattern is including a dot as the final character. Remove it and your approach should work:
SELECT REGEXP_REPLACE(value, '€.*$', '') AS value_out
FROM yourTable;
Or you can take the initial sequence of non-€ characters:
REGEXP_SUBSTR(value, '^[^€]+')

Use REGEXP_SUBSTR to extract string of varied length

I want to extract alphanumeric text of varied length from a string between the second occurrence of a specific characters.
I have tried various forms of substr and regexp_substr but can't seem to get the syntax right. This is for use in Teradata SQL assistant. In the past I would have to create a temp table and use substr twice before trimming down the string to what I need. I want to do it all in one go.
SELECT regexp_substr('Channel:DF GB, Order Num:12345T6, Order Date:01/01/2019, Charge Codes:TAXES,,GBRAX', 'Num\\:+(\\:+)',1,2, ':') as RESULTING_STRING
My desired result is to return ONLY what is between "Num:" and the next "," in this case "12345T6". The length of the order number can vary so it is not a fixed length. When I run my code the actual output is a '?' returned by Teradata. What am I doing wrong?
This seems to work:
SELECT regexp_substr('Channel:DF GB, Order Num:12345T6, Order Date:01/01/2019, Charge Codes:TAXES,,GBRAX', 'Num:(\w*)', 1, 1, NULL, 1) as RESULTING_STRING from dual
Finds Num: and then captures as many word characters (, is not a word char) as there are available. The last parameter - subexpr - specifies which subexpression (aka capture group) you want, without it the whole thing will be matched (Num:12345T6).
Assuming you use Teradata SQL Assistant to query a Teradata system (but why do you tag Oracle then) the RegEx syntax is slightly different (both use a different RegEx dialects):
Teradata's RegExp_Substr doesn't support the subexpression parameter, you can either switch to the (I really don't know why) undocumented RegExp_Substr_gpl
RegExp_Substr_gpl(x, 'Num:([^,]*)', 1, 1, 'i', 1)
or tell the RegEx to forget the previous match using \K:
RegExp_Substr(x, 'Num:\K[^,]*', 1,1, 'i')
You can give a try to the below pattern search !
SELECT REGEXP_REPLACE ((REGEXP_SUBSTR('Channel:DF GB, Order Num:12345T6, Order Date:01/01/2019, Charge Codes:TAXES,,GBRAX', 'Num:[A-Za-z0-9]*',1,1, 'i')),'Num:','',1,1,'i') AS RESULTING_STRING
Regexp_substr pattern search ['Num:[A-Za-z0-9]*'], will first filter out the alphanumeric characters that follow the pattern 'Num:',astriek, helps to find out zero or more occurrences of the specified pattern.
For eg:, in this 'Num:12345T6' will be filtered out of the string provided, also note the last parameter in the regexp_substr is 'i', which ensures case in-specific search.
Lastly, Regexp_replace will replace the pattern 'Num:' from the output of the regexp_substr with an empty string,resulting in a final string as '12345T6'.

replace all occurrences of a sub string between 2 charcters using sql

Input string: ["1189-13627273","89-13706681","118-13708388"]
Expected Output: ["14013627273","14013706681","14013708388"]
What I am trying to achieve is to replace any numbers till the '-' for each item with hard coded text like '140'
SELECT replace(value_to_replace, '-', '140')
FROM (
VALUES ('1189-13627273-77'), ('89-13706681'), ('118-13708388')
) t(value_to_replace);
check this
I found the right way to achieve that using the below regular expression.
SELECT REGEXP_REPLACE (string_to_change, '\\"[0-9]+\\-', '140')
You don't need a regexp for this, it's as easy as concatenation of 140 and the substring from - (or the second part when you split by -)
select '140'||substring('89-13706681' from position('-' in '89-13706681')+1 for 1000)
select '140'||split_part('89-13706681','-',2)
also, it's important to consider if you might have instances that don't contain - and what would be the output in this case
Use regexp_replace(text,text,text) function to do so giving the pattern to match and replacement string.
First argument is the value to be replaced, second is the POSIX regular expression and third is a replacement text.
Example
SELECT regexp_replace('1189-13627273', '.*-', '140');
Output: 14013627273
Sample data set query
SELECT regexp_replace(value_to_replace, '.*-', '140')
FROM (
VALUES ('1189-13627273'), ('89-13706681'), ('118-13708388')
) t(value_to_replace);
Caution! Pattern .*- will replace every character until it finds last occurence of - with text 140.

Cut string after first occurrence of a character

I have strings like 'keepme:cutme' or 'string-without-separator' which should become respectively 'keepme' and 'string-without-separator'. Can this be done in PostgreSQL? I tried:
select substring('first:last' from '.+:')
But this leaves the : in and won't work if there is no : in the string.
Use split_part():
SELECT split_part('first:last', ':', 1) AS first_part
Returns the whole string if the delimiter is not there. And it's simple to get the 2nd or 3rd part etc.
Substantially faster than functions using regular expression matching. And since we have a fixed delimiter we don't need the magic of regular expressions.
Related:
Split comma separated column data into additional columns
regexp_replace() may be overload for what you need, but it also gives the additional benefit of regex. For instance, if strings use multiple delimiters.
Example use:
select regexp_replace( 'first:last', E':.*', '');
SQL Select to pick everything after the last occurrence of a character
select right('first:last', charindex(':', reverse('first:last')) - 1)