Redshift SQL REGEXP_REPLACE function - sql

I have a value which is duplicated from source (can't do anything about that). I have read some examples here https://docs.aws.amazon.com/redshift/latest/dg/REGEXP_REPLACE.html
Example value:
ABC€ABC€
So just trimming anything after the first '€'. I tried this, but I cannot figure out the correct regular expression.
REGEXP_REPLACE(value, '€.*\\.$', '')

So just trimming anything after the first '€'.
Why use regex at all? Why not just:
SELECT LEFT(value, CHARINDEX('€', value)-1)
If not all of your data has a euro sign, add WHERE value LIKE '%€%', because CHARINDEX returns 0 when the character is not found.

Your current regex pattern requires a literal dot (\\.) as the final character, which your values do not have. Remove it and your approach should work:
SELECT REGEXP_REPLACE(value, '€.*$', '') AS value_out
FROM yourTable;

Or you can take the initial sequence of non-€ characters:
REGEXP_SUBSTR(value, '^[^€]+')
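To compare the suggestions above side by side, here is a minimal sketch for Redshift. The CTE name samples and its two rows are made up for illustration, and the SPLIT_PART column is an extra option that is not part of the answers above. The second row is there to show what each expression does when no '€' is present.

```sql
-- Hypothetical sample data: one value with the separator, one without.
WITH samples AS (
    SELECT 'ABC€ABC€' AS value
    UNION ALL
    SELECT 'NOEURO'
)
SELECT value,
       REGEXP_REPLACE(value, '€.*$', '') AS via_regexp_replace,  -- drop from the first € to the end
       REGEXP_SUBSTR(value, '^[^€]+')    AS via_regexp_substr,   -- leading run of non-€ characters
       SPLIT_PART(value, '€', 1)         AS via_split_part       -- first field before the delimiter
FROM samples;
```

For the first row every column should come back as ABC. Unlike the LEFT/CHARINDEX version, none of these needs a WHERE filter to cope with values that have no euro sign.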

Related

Snowflake SQL - Format Phone Number to 10 digits

I have a column with phone numbers in varchar, currently looks something like this. Because there is no consistent format, I don't think substring works.
(956) 444-3399
964-293-4321
(929)293-1234
(919)2991234
How do I remove all brackets, spaces and dashes and have the query return just the digits, in Snowflake? The desired output:
9564443399
9642934321
9292931234
9192991234
You can use regexp_replace() function to achieve this:
REGEXP_REPLACE(yourcolumn, '[^0-9]','')
That will strip out any non-numeric character.
You could use regexp_replace to remove all of the special characters, something like this (inside a bracket expression the parentheses do not need escaping):
select regexp_replace('(956) 444-3399', '[() -]', '')
An alternative using translate:
select translate('(956) 444-3399', '() -', '')
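As a quick check of the two approaches above against all four sample formats from the question, something like the following could be run in Snowflake. The alias t and the column name phone are placeholders, not names from the original question.

```sql
-- The four input formats from the question, as an inline table.
SELECT phone,
       REGEXP_REPLACE(phone, '[^0-9]', '') AS digits_via_regexp,    -- remove everything that is not a digit
       TRANSLATE(phone, '() -', '')        AS digits_via_translate  -- remove only the listed characters
FROM (VALUES ('(956) 444-3399'),
             ('964-293-4321'),
             ('(929)293-1234'),
             ('(919)2991234')) AS t(phone);
```

The two columns should agree for these inputs. They would differ as soon as a value contains a character outside '() -', such as a dot or a plus sign, because TRANSLATE only removes what is listed while the regex removes anything that is not a digit.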

Extract data in parentheses with Amazon-redshift

I have these characters in a table:
LEASE THIRD-CCP-MANAGER (AAAA)
THE MANAGEMENT OF A THIRD PARTY (BBBB / AAAA)
When I extract the information:
AAAA
BBBB/AAAA
That is, I have to look for the pattern and extract what is inside the parenthesis.
I'm trying to use the REGEXP_SUBSTR function.
In amazon redshift, how do I extract the characters in parentheses?
thanks
Whoa, that was hard!
Here is the syntax to use:
SELECT REGEXP_SUBSTR('One (Two) Three', '[(](.*)[)]', 1, 1, 'e')
This will return: Two
It appears that escaping brackets with \( doesn't work, but putting them in [(] does work. The 'e' at the end will "Extract a substring using a subexpression".
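The question's second expected value (BBBB/AAAA) has no spaces around the slash, so the extracted text still needs its blanks removed. A minimal sketch, assuming Redshift and a made-up CTE named samples holding the two rows from the question:

```sql
-- Hypothetical sample rows copied from the question.
WITH samples AS (
    SELECT 'LEASE THIRD-CCP-MANAGER (AAAA)' AS descr
    UNION ALL
    SELECT 'THE MANAGEMENT OF A THIRD PARTY (BBBB / AAAA)'
)
SELECT descr,
       -- 'e' returns the captured group, i.e. the text between the parentheses;
       -- REPLACE then strips the spaces around the slash.
       REPLACE(REGEXP_SUBSTR(descr, '[(](.*)[)]', 1, 1, 'e'), ' ', '') AS inside_parens
FROM samples;
```

Because .* is greedy, a row with several parenthesised groups would return everything from the first ( to the last ).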
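Use position to find the index of each parenthesis, then take the substring between them:
select
substring('LEASE THIRD-CCP-MANAGER (AAAA)', position('(' in 'LEASE THIRD-CCP-MANAGER (AAAA)') + 1, position(')' in 'LEASE THIRD-CCP-MANAGER (AAAA)') - position('(' in 'LEASE THIRD-CCP-MANAGER (AAAA)') - 1)
or you can use split_part twice, once to drop everything up to ( and once to drop the trailing )
split_part(split_part('LEASE THIRD-CCP-MANAGER (AAAA)','(',2),')',1)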
You’re probably struggling with () meaning something in regular expressions (look up “capture groups”)
To tell regex that you just mean the characters ( and ) without their special meaning, “escape” them using \ (doubled inside a Redshift string literal). Note that the match returned here still includes the parentheses themselves.
regexp_substr(yourtable.yourcolumn,'\\(.*\\)')

substr in Oracle from column

What is the syntax for using SUBSTR in Oracle to extract part of a string?
I have "123456789 #073"
I only want what comes after the #
substr (table.col, 17,3)
Is that OK?
Most likely the simplest (and most performant) way of doing this would be to use the base string functions:
SELECT SUBSTR(col, INSTR(col, '#') + 1)
FROM yourTable;
We could also try using REGEXP_REPLACE here:
SELECT REGEXP_REPLACE(col, '.*#(.*)', '\1')
FROM yourTable;
The regex option would in general not perform as well as the first query. The reason for this is that invoking a regex incurs a performance overhead. You might want to consider a regex option if you expect that the string logic might change or get more complicated in the future. Otherwise, go with base string functions wherever possible.
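One detail of the base-function version is worth seeing on data: when a value has no '#', INSTR returns 0 and SUBSTR(col, 1) gives back the whole string. The sketch below, for Oracle, adds a CASE guard that returns NULL instead. The inline rows and the column aliases are invented for illustration.

```sql
-- Hypothetical rows: one with a '#', one without.
SELECT col,
       SUBSTR(col, INSTR(col, '#') + 1) AS after_hash,          -- whole string when '#' is absent
       CASE WHEN INSTR(col, '#') > 0
            THEN SUBSTR(col, INSTR(col, '#') + 1)
       END                              AS after_hash_or_null   -- NULL when '#' is absent
FROM (
    SELECT '123456789 #073' AS col FROM dual
    UNION ALL
    SELECT 'no hash here' FROM dual
);
```

Which of the two behaviours is wanted depends on the data, so this is only one possible way to handle the missing-separator case.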
I think the most direct method might be regexp_substr():
select regexp_substr('123456789 #073', '[^#]+$')
from dual;
The regular expression says: "get me all non-hash characters at the end of the string".
If you happen to know that there are 3 characters and really want the last three characters of the string:
select substr('123456789 #073', -3)
from dual;

Get only Number

How can I ignore special characters and get only the digits from the input string below?
Input: '33-01-616-000'
Output should be 3301616000
Use the REPLACE() function to remove the - characters.
REPLACE(columnname, '-', '')
Or if there can be other non-numeric characters, you can use REGEXP_REPLACE() to remove anything that isn't a number.
REGEXP_REPLACE(columnname, '\D', '')
Standard string functions (like REPLACE, TRANSLATE etc.) are often much faster (one order of magnitude faster) than their regular expression counterparts. Of course, this is only important if you have a lot of data to process, and/or if you don't have that much data but you must process it very frequently.
Here is one way to use TRANSLATE for this problem even if you don't know ahead of time what other characters there may be in the string - besides digits:
TRANSLATE(columnname, '0123456789' || columnname, '0123456789')
This will map 0 to 0, 1 to 1, etc. - and all other characters in the input string columnname to nothing (so they will be simply removed). Note that in the TRANSLATE mapping, only the first occurrence of a character in the second argument matters - any additional mapping (due to the appearance of the same character in the second argument more than once) is ignored.
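Applied to the sample input from the question, the TRANSLATE mapping described above can be tried as a standalone Oracle query. The literal stands in for columnname, and the alias digits_only is just a label chosen here.

```sql
-- Second argument: the ten digits followed by the input itself.
-- Third argument: the ten digits only, so '-' (position 13 in the second
-- argument) has no counterpart and is removed.
SELECT TRANSLATE('33-01-616-000',
                 '0123456789' || '33-01-616-000',
                 '0123456789') AS digits_only
FROM dual;
```

The digits in the third argument are needed because Oracle treats an empty string as NULL, and TRANSLATE returns NULL if any argument is NULL.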
You can also use the REGEXP_REPLACE function. Try the code below:
SELECT REGEXP_REPLACE('33-01-61ASDF6-0**(98)00[],./123', '([^[:digit:]])', NULL)
FROM DUAL;
SELECT regexp_replace('33-01-616-000','[^0-9]') digits_only FROM dual;

Cut string after first occurrence of a character

I have strings like 'keepme:cutme' or 'string-without-separator' which should become respectively 'keepme' and 'string-without-separator'. Can this be done in PostgreSQL? I tried:
select substring('first:last' from '.+:')
But this leaves the : in and won't work if there is no : in the string.
Use split_part():
SELECT split_part('first:last', ':', 1) AS first_part
Returns the whole string if the delimiter is not there. And it's simple to get the 2nd or 3rd part etc.
Substantially faster than functions using regular expression matching. And since we have a fixed delimiter we don't need the magic of regular expressions.
Related:
Split comma separated column data into additional columns
regexp_replace() may be overkill for what you need, but it also gives the additional benefit of regex, for instance if strings use multiple delimiters.
Example use:
select regexp_replace( 'first:last', E':.*', '');
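To see both suggestions handle the two cases from the question, with and without a separator, a small PostgreSQL sketch follows. The alias t and column name s are placeholders.

```sql
-- Both sample strings from the question, as an inline table.
SELECT s,
       split_part(s, ':', 1)        AS via_split_part,  -- text before the first ':'
       regexp_replace(s, ':.*', '') AS via_regexp        -- delete from the first ':' to the end
FROM (VALUES ('keepme:cutme'),
             ('string-without-separator')) AS t(s);
```

Both columns should return keepme for the first row and leave the second row unchanged.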
To pick everything after the last occurrence of a character instead (charindex is SQL Server syntax; in PostgreSQL use strpos):
select right('first:last', strpos(reverse('first:last'), ':') - 1)