REGEXP_EXTRACT value from left between 4th and 5th underscore - sql

I have a string column that contains either 7 or 8 elements that are always separated by underscores:
AAA_BBB_CCC_DDD_EEE_FFF_GGG_HHH
AAA_BBB_CCC_DDD_EEE_FFF_GGG
Values between underscores can be of various length and contain other characters like + as an example
How do I extract only the value between the 4th and 5th underscore? That is, for both of these strings, I would get EEE?
The code I am trying to use is:
SELECT
REGEXP_EXTRACT("AAA_BBB_CCC_DDD_EEE_FFF_GGG_HHH", r'.+_.+_.+_.+_(.+)_.+_.+_.+') AS a
If it is the longer string (ending with HHH), I get the value EEE, but if it is the shorter string, I get null. What am I doing wrong?

The following logic using REGEXP_EXTRACT with a capture group should be working here:
SELECT REGEXP_EXTRACT(col, r'^[^_]+_[^_]+_[^_]+_[^_]+_([^_]+)'
FROM yourTable;

An alternative is to split your string into an array, and select the 5th element of it (from 0)
WITH test AS
(SELECT "AAA_BBB_CCC_DDD_EEE_FFF_GGG_HHH" as letter_group
UNION ALL
SELECT "AAA_BBB_CCC_DDD_EEE_FFF_GGG" as letter_group)
SELECT letter_array[OFFSET(5)] FROM (SELECT SPLIT(letter_group, "_") as letter_array FROM test) T;

Related

Extract character between the first two characters

I have a table in BigQuery:
ab_col_jfsfhfd_ggg_sdf
arfd_am_fdsf_fddg_fg
d_fdf_fdddg_ffddd_f
I would like to extract those characters that go right after the first _ character and followed by the second _ character. I want to get the following:
col
am
fdf
I used the following regular expression to extract the characters but it does not work as intended:
^.*\_(\D+)\_.*$
regexp_replace(id,'^.*\\_(\\D+)\\_.*$' , '\\1')
Please help!
If I follow you correctly, you can use split():
(split(col, '_'))[safe_ordinal(2)]
split() turns the string column to an array of values, given a separator (here, we use _). Then we can just grab second array element.
split() is a very simply way of solving this. But regular expressions are also quite simple:
with t as (
select 'ab_col_jfsfhfd_ggg_sdf' as id union all
select 'arfd_am_fdsf_fddg_fg' union all
select 'd_fdf_fdddg_ffddd_f'
)
select id, regexp_extract(id, '[^_]+', 1, 2)
from t;
The logic for the pattern is: "Look for any string of characters that is not an underscore. Then take the second one in the string."
Use regexp_extract:
regexp_extract(id,'^[^_]+_([^_]+)')
See proof
Explanation
--------------------------------------------------------------------------------
^ the beginning of the string
--------------------------------------------------------------------------------
[^_]+ any character except: '_' (1 or more times
(matching the most amount possible))
--------------------------------------------------------------------------------
_ '_'
--------------------------------------------------------------------------------
( group and capture to \1:
--------------------------------------------------------------------------------
[^_]+ any character except: '_' (1 or more
times (matching the most amount
possible))
--------------------------------------------------------------------------------
) end of \1

How to return a substring after a colon and before a whitespace in Oracle SQL

I need to get a substring from a table column that is after a colon and before a whitespace. The length of the substring can vary, but the length of the data before the colon and after the whitespace is constant.
So the data in my table column named "Subject" consists of 5 words, immediately followed by a colon, immediately followed by the substring I need (which can vary in length), followed by a whitespace and a date. The substring I need is a course name. Examples:
Payment Due for Upcoming Course:FIN/370T 11/26/2019
Payment Due for Upcoming Course:BUS/475 11/26/2019
Payment Due for Upcoming Course:ADMIN9/475TG 11/26/2019
I have tried using REGEXP function with REGEXP_SUBSTR(COLUMN_NAME,'[^:]+$') to get everything after the colon, and REGEXP_SUBSTR(COLUMN_NAME, '[^ ]+' , 1 , 5 ) to get data before the last whitespace, but I need to combine them.
I have tried the following:
select
REGEXP_SUBSTR(SUBJECT,'[^:]+$') COURSE_ID
from TABLE
Result:
FIN/370T 11/26/2019
and this:
select
REGEXP_SUBSTR (SUBJECT, '[^ ]+' , 1 , 5 ) COURSE_ID2
from TABLE
Result:
Course:FIN/370T
I need the output to return FIN/370T
In short use:
select regexp_replace(str,'(.*:)(.*)( )(.*)$','\2') as short_course_id
from tab
I prefer regexp_replace, because there are more possibilities to extract part of strings.
If you don't want to mess with regex, you can use a combo of substr and instr.
select
substr(part1,1,instr(part1, ' ',-1,1) ) as course,
part1
from (
select
substr(<your column>,instr(<your column>,':',1,1) +1) as part1
from
<your table>
) t
Fiddle
One option would be
select replace(regexp_substr(str,'[^:]+$'),
regexp_substr(str,'[^:][^ ]+$'),'') as course_id
from tab
Demo
where first regexp_substr() extracts the substring starting from the colon to the end, and the second one from the last space to the end.

replace all occurrences of a sub string between 2 charcters using sql

Input string: ["1189-13627273","89-13706681","118-13708388"]
Expected Output: ["14013627273","14013706681","14013708388"]
What I am trying to achieve is to replace any numbers till the '-' for each item with hard coded text like '140'
SELECT replace(value_to_replace, '-', '140')
FROM (
VALUES ('1189-13627273-77'), ('89-13706681'), ('118-13708388')
) t(value_to_replace);
check this
I found the right way to achieve that using the below regular expression.
SELECT REGEXP_REPLACE (string_to_change, '\\"[0-9]+\\-', '140')
You don't need a regexp for this, it's as easy as concatenation of 140 and the substring from - (or the second part when you split by -)
select '140'||substring('89-13706681' from position('-' in '89-13706681')+1 for 1000)
select '140'||split_part('89-13706681','-',2)
also, it's important to consider if you might have instances that don't contain - and what would be the output in this case
Use regexp_replace(text,text,text) function to do so giving the pattern to match and replacement string.
First argument is the value to be replaced, second is the POSIX regular expression and third is a replacement text.
Example
SELECT regexp_replace('1189-13627273', '.*-', '140');
Output: 14013627273
Sample data set query
SELECT regexp_replace(value_to_replace, '.*-', '140')
FROM (
VALUES ('1189-13627273'), ('89-13706681'), ('118-13708388')
) t(value_to_replace);
Caution! Pattern .*- will replace every character until it finds last occurence of - with text 140.

Remove last two characters from each database value

I run the following query:
select * from my_temp_table
And get this output:
PNRP1-109/RT
PNRP1-200-16
PNRP1-209/PG
013555366-IT
How can I alter my query to strip the last two characters from each value?
Use the SUBSTR() function.
SELECT SUBSTR(my_column, 1, LENGTH(my_column) - 2) FROM my_table;
Another way using a regular expression:
select regexp_replace('PNRP1-109/RT', '^(.*).{2}$', '\1') from dual;
This replaces your string with group 1 from the regular expression, where group 1 (inside of the parens) includes the set of characters after the beginning of the line, not including the 2 characters just before the end of the line.
While not as simple for your example, arguably more powerful.

How to get rightmost 10 places of a string in oracle

I am trying to fetch an id from an oracle table. It's something like TN0001234567890345. What I want is to sort the values according to the right most 10 positions (e.g. 4567890345). I am using Oracle 11g. Is there any function to cut the rightmost 10 places in Oracle SQL?
You can use SUBSTR function as:
select substr('TN0001234567890345',-10) from dual;
Output:
4567890345
codaddict's solution works if your string is known to be at least as long as the length it is to be trimmed to. However, if you could have shorter strings (e.g. trimming to last 10 characters and one of the strings to trim is 'abc') this returns null which is likely not what you want.
Thus, here's the slightly modified version that will take rightmost 10 characters regardless of length as long as they are present:
select substr(colName, -least(length(colName), 10)) from tableName;
Another way of doing it though more tedious. Use the REVERSE and SUBSTR functions as indicated below:
SELECT REVERSE(SUBSTR(REVERSE('TN0001234567890345'), 1, 10)) FROM DUAL;
The first REVERSE function will return the string 5430987654321000NT.
The SUBSTR function will read our new string 5430987654321000NT from the first character to the tenth character which will return 5430987654.
The last REVERSE function will return our original string minus the first 8 characters i.e. 4567890345
SQL> SELECT SUBSTR('00000000123456789', -10) FROM DUAL;
Result: 0123456789
Yeah this is an old post, but it popped up in the list due to someone editing it for some reason and I was appalled that a regular expression solution was not included! So here's a solution using regex_substr in the order by clause just for an exercise in futility. The regex looks at the last 10 characters in the string:
with tbl(str) as (
select 'TN0001239567890345' from dual union
select 'TN0001234567890345' from dual
)
select str
from tbl
order by to_number(regexp_substr(str, '.{10}$'));
An assumption is made that the ID part of the string is at least 10 digits.