Replace positions with characters in Snowflake SQL - sql

I have several columns where I have to replace positions in strings with underscores.
For example:
11 11_modified
XX4RDGCG9DR XX4RDGCG__R
12 12_modified
XX4RDGCG9DRX XX4RDGCG___X
13 13_modified
XX4RDGCG9DRXY XX4RDGCG____Y
Notice that I always need to keep the first 8 characters, but the number of underscores changes depending on the column, and I only need to keep the last character of each string value.
Column 11 has 2 underscores at the 9th and 10th positions, column 12 has 3 underscores at the 9th, 10th, and 11th positions, and column 13 has 4 underscores at the 9th, 10th, 11th, and 12th positions.
How would I do this?

Using CONCAT and string manipulation functions:
SELECT col,
CONCAT(LEFT(col, 8), REPEAT('_', LEN(col)-9), RIGHT(col, 1)) AS modified
FROM tab;
For sample input:
CREATE OR REPLACE TABLE tab
AS
SELECT 'XX4RDGCG9DR' AS col UNION
SELECT 'XX4RDGCG9DRX' UNION
SELECT 'XX4RDGCG9DRXY';
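Output (row order is not guaranteed with UNION):
col modified
XX4RDGCG9DR XX4RDGCG__R
XX4RDGCG9DRX XX4RDGCG___X
XX4RDGCG9DRXY XX4RDGCG____Y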

Another alternative is the INSERT function:
insert( <base_expr>, <pos>, <len>, <insert_expr> )
set str='XX4RDGCG9DRX';
select insert($str,9,len($str)-9,repeat('_',len($str)-9));
Update:
I noticed that both Lukasz's solution and mine return unexpected results if len($str) < 9. To fix that, change the query to the following so that strings that don't qualify remain unchanged. Honestly, I would use a WHERE clause or a CASE expression instead.
set str='XG9DR';
select insert($str,9,greatest(len($str)-9,0),repeat('_',len($str)-9));
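The CASE-expression idea mentioned in the update can be sketched outside the database. The Python below only illustrates the intended logic and is not Snowflake code; the function name mask_middle and the keep parameter are made up for this example. It assumes that "strings that don't qualify" means strings of 9 characters or fewer, which are returned unchanged.

```python
def mask_middle(value: str, keep: int = 8) -> str:
    """Keep the first `keep` characters and the last one; mask the rest with '_'."""
    # Assumption: strings too short to have a middle section are returned as-is,
    # mirroring a CASE expression that only rewrites qualifying rows.
    if len(value) <= keep + 1:
        return value
    return value[:keep] + "_" * (len(value) - keep - 1) + value[-1]


for sample in ["XX4RDGCG9DR", "XX4RDGCG9DRX", "XX4RDGCG9DRXY", "XG9DR"]:
    print(sample, "->", mask_middle(sample))
```

In SQL the same guard would sit in a CASE expression around the CONCAT or INSERT call, so short values pass through untouched.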

Related

REGEXP_EXTRACT value from left between 4th and 5th underscore

I have a string column that contains either 7 or 8 elements that are always separated by underscores:
AAA_BBB_CCC_DDD_EEE_FFF_GGG_HHH
AAA_BBB_CCC_DDD_EEE_FFF_GGG
Values between underscores can be of various length and contain other characters like + as an example
How do I extract only the value between the 4th and 5th underscore? That is, for both of these strings, I would get EEE?
The code I am trying to use is:
SELECT
REGEXP_EXTRACT("AAA_BBB_CCC_DDD_EEE_FFF_GGG_HHH", r'.+_.+_.+_.+_(.+)_.+_.+_.+') AS a
If it is the longer string (ending with HHH), I get the value EEE, but if it is the shorter string, I get null. What am I doing wrong?
The following logic using REGEXP_EXTRACT with a capture group should work here:
SELECT REGEXP_EXTRACT(col, r'^[^_]+_[^_]+_[^_]+_[^_]+_([^_]+)')
FROM yourTable;
An alternative is to split the string into an array and select its fifth element (offset 4, counting from 0):
WITH test AS
(SELECT "AAA_BBB_CCC_DDD_EEE_FFF_GGG_HHH" as letter_group
UNION ALL
SELECT "AAA_BBB_CCC_DDD_EEE_FFF_GGG" as letter_group)
SELECT letter_array[OFFSET(4)] FROM (SELECT SPLIT(letter_group, "_") as letter_array FROM test) T;
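To see why the pattern in the question returns null for the shorter string, it may help to try the patterns outside BigQuery. The sketch below uses Python's re module rather than BigQuery's regex engine, on the assumption that patterns this simple behave the same in both; the function names are made up for illustration. The question's pattern contains seven literal underscores, so it can only match a string with at least seven of them, and the seven-element string has six.

```python
import re

# Pattern from the question: 4 literal underscores before the group, 3 after it.
ORIGINAL = re.compile(r".+_.+_.+_.+_(.+)_.+_.+_.+")
# Anchored pattern from the answer: only the first five elements are described.
ANCHORED = re.compile(r"^[^_]+_[^_]+_[^_]+_[^_]+_([^_]+)")

LONG = "AAA_BBB_CCC_DDD_EEE_FFF_GGG_HHH"
SHORT = "AAA_BBB_CCC_DDD_EEE_FFF_GGG"


def fifth_by_regex(text: str):
    match = ANCHORED.search(text)
    return match.group(1) if match else None


def fifth_by_split(text: str):
    parts = text.split("_")
    # Index 4 is the fifth element when counting from 0.
    return parts[4] if len(parts) > 4 else None


print(ORIGINAL.search(SHORT))                        # no match for 7 elements
print(fifth_by_regex(LONG), fifth_by_regex(SHORT))
print(fifth_by_split(LONG), fifth_by_split(SHORT))
```

Both the anchored pattern and the split approach ignore whatever follows the fifth element, so they do not depend on whether the string has 7 or 8 parts.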

How to identify combination of number and character in SQL

I have a requirement where I have to find number of records in a special pattern in the field ref_id in a table. It's a varchar column. I need to find all the records where 8th, 9th and 10th character are numeric+XX. That is it should be like 2XX or 8XX. I tried using regexp :digit: but no luck. Essentially I am looking for all records where 8th-10th characters are 1XX, 2XX, 3XX… etc
Use REGEXP_LIKE (replace table with your own table name):
SELECT COUNT(*)
FROM table
WHERE REGEXP_LIKE(ref_id,'^.{7}[0-9]XX');
.{7} any seven characters
[0-9] a digit as the 8th character
XX the letter X as the 9th and 10th characters
Or, with the [:digit:] class you mentioned, you can use:
SELECT COUNT(*)
FROM table
WHERE REGEXP_LIKE(ref_id,'^.{7}[[:digit:]]XX');
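The pattern breakdown above can be checked quickly against a plain substring test. This is a Python sketch, not Oracle code: Python's re module has no POSIX [[:digit:]] class, so only the [0-9] form is shown, and the sample ref_id values in the loop are invented for illustration.

```python
import re

PATTERN = re.compile(r"^.{7}[0-9]XX")


def matches_regex(ref_id: str) -> bool:
    return PATTERN.match(ref_id) is not None


def matches_substr(ref_id: str) -> bool:
    # Characters 8 to 10 (1-based) are indexes 7 to 9 in Python.
    part = ref_id[7:10]
    return len(part) == 3 and part[0] in "0123456789" and part[1:] == "XX"


# Hypothetical sample values, not taken from the question.
for ref_id in ["ABCDEFG2XX123", "ABCDEFGAXX123", "ABCDEF2XX1234", "AB"]:
    print(ref_id, matches_regex(ref_id), matches_substr(ref_id))
```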
This can also be achieved using standard non-regex SQL functions:
select * from t where s like '________XX%' -- any 8 characters and then XX
AND translate( substr(s,8,1),'?0123456789','?') is null; --8th one is numeric
No need for a regexp:
select * from mytable where substr(ref_id, 8, 3) in ('0XX','1XX','2XX','3XX','4XX','5XX','6XX','7XX','8XX','9XX')
or
select * from mytable where substr(ref_id, 8, 3) in ('1XX','2XX','3XX','4XX','5XX','6XX','7XX','8XX','9XX')
I don't know if '0XX' is a valid match or not.
Regular expressions tend to be slower than plain string functions.

Extracting specific part of column values in Oracle SQL

I want to extract a specific part of column values.
The target column and its values look like
TEMP_COL
---------------
DESCOL 10MG
TEGRAL 200MG 50S
COLOSPAS 135MG 30S
The resultant column should look like
RESULT_COL
---------------
10MG
200MG
135MG
This can be done using a regular expression:
SELECT regexp_substr(TEMP_COL, '[0-9]+MG')
FROM the_table;
Note that this is case sensitive and it always returns the first match.
I would probably approach this using REGEXP_SUBSTR() rather than base functions, because the structure of the prescription text varies from record to record.
SELECT TRIM(REGEXP_SUBSTR(TEMP_COL, '(\s)(\S*)', 1, 1))
FROM yourTable
The pattern (\s)(\S*) will match a single space followed by any number of non-space characters. This should match the second term in all cases. We use TRIM() to remove a leading space which is matched and returned.
How do you know which part you want to extract, and where it begins and ends? By the whitespace?
If so, you can use SUBSTR to cut the data and INSTR to find the spaces.
Example:
select substr(TEMP_COL, -- string
instr(TEMP_COL, ' ', 1) + 1, -- position just after the first space
instr(TEMP_COL || ' ', ' ', 1, 2) - instr(TEMP_COL, ' ', 1) - 1) -- length up to the next space, or to the end if there is none
from yourTable
Another solution uses REGEXP_SUBSTR (though it may perform worse if you have a lot of rows):
SELECT REGEXP_SUBSTR (TEMP_COL, '(\S*)(\s*)', 1, 2)
FROM yourTable;
Edit: fixed the regular expression to handle values that have no space after the extracted text. Sorry about that.
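The answers above fall into two camps: match the dose by its shape (digits followed by MG) or take the second whitespace-separated term. A small Python sketch, not Oracle code, shows where the two agree and where they would differ. The function names and the row "DESCOL FORTE 10MG" are hypothetical and only serve to show a case where the second term is not the dose.

```python
import re


def dose_by_pattern(text: str):
    """First run of digits immediately followed by 'MG' (case sensitive)."""
    match = re.search(r"[0-9]+MG", text)
    return match.group(0) if match else None


def second_token(text: str):
    """Second whitespace-separated term, whatever it looks like."""
    parts = text.split()
    return parts[1] if len(parts) > 1 else None


rows = ["DESCOL 10MG", "TEGRAL 200MG 50S", "COLOSPAS 135MG 30S", "DESCOL FORTE 10MG"]
for row in rows:
    print(row, "|", dose_by_pattern(row), "|", second_token(row))
```

On the three rows from the question both approaches return the same value; which one is right for real data depends on whether the dose is always the second term.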

Remove last two characters from each database value

I run the following query:
select * from my_temp_table
And get this output:
PNRP1-109/RT
PNRP1-200-16
PNRP1-209/PG
013555366-IT
How can I alter my query to strip the last two characters from each value?
Use the SUBSTR() function.
SELECT SUBSTR(my_column, 1, LENGTH(my_column) - 2) FROM my_table;
Another way using a regular expression:
select regexp_replace('PNRP1-109/RT', '^(.*).{2}$', '\1') from dual;
This replaces the string with group 1 of the regular expression, where group 1 (inside the parentheses) captures everything from the beginning of the line up to, but not including, the 2 characters just before the end of the line.
While not as simple for your example, it is arguably more powerful.
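For comparison, the same two ideas (cut by length, or capture everything except the last two characters) can be sketched in Python. This is an illustration of the logic rather than Oracle code, and the function names are made up. It also shows one difference worth knowing: for a value shorter than two characters the slice yields an empty string, while the regular expression does not match and leaves the value as it is.

```python
import re


def strip_last_two_slice(value: str) -> str:
    # Counterpart of SUBSTR(col, 1, LENGTH(col) - 2).
    return value[:-2]


def strip_last_two_regex(value: str) -> str:
    # Group 1 captures everything before the final two characters.
    return re.sub(r"^(.*).{2}$", r"\1", value)


for value in ["PNRP1-109/RT", "PNRP1-200-16", "PNRP1-209/PG", "013555366-IT"]:
    print(strip_last_two_slice(value), strip_last_two_regex(value))
```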

How to get rightmost 10 places of a string in oracle

I am trying to fetch an id from an oracle table. It's something like TN0001234567890345. What I want is to sort the values according to the right most 10 positions (e.g. 4567890345). I am using Oracle 11g. Is there any function to cut the rightmost 10 places in Oracle SQL?
You can use the SUBSTR function with a negative start position:
select substr('TN0001234567890345',-10) from dual;
Output:
4567890345
codaddict's solution works if your string is known to be at least as long as the length it is to be trimmed to. However, if you could have shorter strings (e.g. trimming to last 10 characters and one of the strings to trim is 'abc') this returns null which is likely not what you want.
Thus, here's the slightly modified version that will take rightmost 10 characters regardless of length as long as they are present:
select substr(colName, -least(length(colName), 10)) from tableName;
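The clamping idea behind LEAST can be written out as a short Python sketch. Python slicing already tolerates short strings, so the clamp is spelled out explicitly here purely to mirror the SQL; the function name rightmost is an assumption made for this example.

```python
def rightmost(value: str, n: int = 10) -> str:
    # Clamp the count to the string length, like LEAST(LENGTH(colName), 10).
    count = min(len(value), n)
    # Slice from an explicit start index so a count of 0 yields an empty string.
    return value[len(value) - count:]


print(rightmost("TN0001234567890345"))
print(rightmost("abc"))
```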
Another, more tedious, way of doing it is to use the REVERSE and SUBSTR functions as shown below:
SELECT REVERSE(SUBSTR(REVERSE('TN0001234567890345'), 1, 10)) FROM DUAL;
The first REVERSE function will return the string 5430987654321000NT.
The SUBSTR function will read our new string 5430987654321000NT from the first character to the tenth character which will return 5430987654.
The last REVERSE function will return our original string minus the first 8 characters i.e. 4567890345
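The three steps described above can be traced one at a time in Python (slicing with [::-1] reverses a string). This is only a walk-through of the logic, not Oracle code.

```python
value = "TN0001234567890345"

reversed_value = value[::-1]      # step 1: reverse the whole string
first_ten = reversed_value[:10]   # step 2: take the first ten characters
result = first_ten[::-1]          # step 3: reverse back to the original order

print(reversed_value)
print(first_ten)
print(result)
```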
SQL> SELECT SUBSTR('00000000123456789', -10) FROM DUAL;
Result: 0123456789
Yeah, this is an old post, but it popped up in the list due to someone editing it for some reason, and I was appalled that a regular expression solution was not included! So here's a solution using regexp_substr in the ORDER BY clause, just as an exercise in futility. The regex looks at the last 10 characters in the string:
with tbl(str) as (
select 'TN0001239567890345' from dual union
select 'TN0001234567890345' from dual
)
select str
from tbl
order by to_number(regexp_substr(str, '.{10}$'));
An assumption is made that the ID part of the string is at least 10 digits.
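The sorting step itself, ordering by the numeric value of the last 10 characters, can be sketched in Python using the two sample IDs from the query above. As in the SQL, this assumes the last 10 characters are all digits; the name sort_key is made up for the example.

```python
ids = ["TN0001239567890345", "TN0001234567890345"]


def sort_key(value: str) -> int:
    # Assumption: the last 10 characters are all digits, as in the sample IDs.
    return int(value[-10:])


ordered = sorted(ids, key=sort_key)
print(ordered)
```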