how to get regexp_substr for a string - sql

In my table for the rows containing values like
sample>test Y10,
Sample> y21
I want to get a substring like y10,y21 from all rows. May I pls know how to get it. I tried with regexp_substr,Instr but not able to find the solution.

I am supposing that your string from column is devided by a single space .
It will give you last occurances which will be splited by ' ' a space
substr(your_string, 1, instr(yourString,' ') - 1)
OR you can achive this using regexp_substr
regexp_substr(your_String, '[^[:space:]]+', 1, -1 )

Assuming that yxx is always preceded by a space, it should be as easy as doing this:
TRIM(REGEXP_SUBSTR(mycolumn, ' y\d+', 1, 1, 'i'))
The above regular expression will grab y (note that it is case-insensitive, so it will grab Y as well) followed by an indefinite number (one or more) of digits. If you want to grab just two digits, replace \d+ with \d{2}.
Also, please note that it will get the first occurrence only. Getting multiple occurrences is a bit more complicated, but it can still be done.

Related

SnowflakeSQL get me last text after specific char

I'm trying to retrieve last part of string after underscore.. but every record has different number of underscore and I'm not sure how to write it correctly.
Example:
aaa_bb_cccc_dddd_ee - only ee
aaa_bb_cccc_dddd - only dddd
sss_aas_ww_ww_ww_bb - only bb
As you can see there is different number of underscore and I need only last part after last underscore.
I have been playing with regex and split_part but since I don't now how to point to the last _ then it's not working correctly.
My Idea is to start reading string from the right a then you just pick the first one but couldn't find a way how to do it.
It's probably basic thing but I'm struggling with it so please help.
Use REGEXP_SUBSTR:
SELECT col, REGEXP_SUBSTR(col, '_([^_]+)$', 1, 1, 'e', 1) AS col_out
FROM yourTable;
You can make split_part change the direction of split using negative integers
select split_part(col,'_',-1)
from your_table;
You could also just regexp_replace everything up until and including the last underscore
select regexp_replace(col,'.*_','')
from your_table;

How to extract a number part of a field using regex_substr function?

I need to extract the numerical part of values in a column (varchar) if there exists a number in the value.
ColumnA has values like ABC, M365, J344, MCT etc.
I would like to check the entire value from second position and if is a number I would like to extract it, for instance,
a. M365, from 2nd position 365 is a number so I would like to return this substring.
b. M3AB, from 2nd position 3AB is not a number so I would not want to return this substring.
I tried regex_substr('M365', '[0-9]', 2) but this is not how I want and it only returns what is there in the second position but not the entire substring.
This seems to do what you want:
select regexp_substr(substr(x, 2), '^\d+$')
This starts matching the pattern at the second position in the string, requiring that a number start there.
[0-9] only searches for one number. You want to know if they are all numbers, so you need the '+' operator. For more info, visit:
https://www.techonthenet.com/oracle/functions/regexp_substr.php
The following code should work for you.
regex_substr('M365', '[0-9]+', 2)

Use REGEXP_SUBSTR to extract string of varied length

I want to extract alphanumeric text of varied length from a string between the second occurrence of a specific characters.
I have tried various forms of substr and regexp_substr but can't seem to get the syntax right. This is for use in Teradata SQL assistant. In the past I would have to create a temp table and use substr twice before trimming down the string to what I need. I want to do it all in one go.
SELECT regexp_substr('Channel:DF GB, Order Num:12345T6, Order Date:01/01/2019, Charge Codes:TAXES,,GBRAX', 'Num\\:+(\\:+)',1,2, ':') as RESULTING_STRING
My desired result is to return ONLY what is between "Num:" and the next "," in this case "12345T6". The length of the order number can vary so it is not a fixed length. When I run my code the actual output is a '?' returned by Teradata. What am I doing wrong?
This seems to work:
SELECT regexp_substr('Channel:DF GB, Order Num:12345T6, Order Date:01/01/2019, Charge Codes:TAXES,,GBRAX', 'Num:(\w*)', 1, 1, NULL, 1) as RESULTING_STRING from dual
Finds Num: and then captures as many word characters (, is not a word char) as there are available. The last parameter - subexpr - specifies which subexpression (aka capture group) you want, without it the whole thing will be matched (Num:12345T6).
Assuming you use Teradata SQL Assistant to query a Teradata system (but why do you tag Oracle then) the RegEx syntax is slightly different (both use a different RegEx dialects):
Teradata's RegExp_Substr doesn't support the subexpression parameter, you can either switch to the (I really don't know why) undocumented RegExp_Substr_gpl
RegExp_Substr_gpl(x, 'Num:([^,]*)', 1, 1, 'i', 1)
or tell the RegEx to forget the previous match using \K:
RegExp_Substr(x, 'Num:\K[^,]*', 1,1, 'i')
You can give a try to the below pattern search !
SELECT REGEXP_REPLACE ((REGEXP_SUBSTR('Channel:DF GB, Order Num:12345T6, Order Date:01/01/2019, Charge Codes:TAXES,,GBRAX', 'Num:[A-Za-z0-9]*',1,1, 'i')),'Num:','',1,1,'i') AS RESULTING_STRING
Regexp_substr pattern search ['Num:[A-Za-z0-9]*'], will first filter out the alphanumeric characters that follow the pattern 'Num:',astriek, helps to find out zero or more occurrences of the specified pattern.
For eg:, in this 'Num:12345T6' will be filtered out of the string provided, also note the last parameter in the regexp_substr is 'i', which ensures case in-specific search.
Lastly, Regexp_replace will replace the pattern 'Num:' from the output of the regexp_substr with an empty string,resulting in a final string as '12345T6'.

Extracting specific part of column values in Oracle SQL

I want to extract a specific part of column values.
The target column and its values look like
TEMP_COL
---------------
DESCOL 10MG
TEGRAL 200MG 50S
COLOSPAS 135MG 30S
The resultant column should look like
RESULT_COL
---------------
10MG
200MG
135MG
This can be done using a regular expression:
SELECT regexp_substr(TEMP_COL, '[0-9]+MG')
FROM the_table;
Note that this is case sensitive and it always returns the first match.
I would probably approach this using REGEXP_SUBSTR() rather than base functions, because the structure of the prescription text varies from record to record.
SELECT TRIM(REGEXP_SUBSTR(TEMP_COL, '(\s)(\S*)', 1, 1))
FROM yourTable
The pattern (\s)(\S*) will match a single space followed by any number of non-space characters. This should match the second term in all cases. We use TRIM() to remove a leading space which is matched and returned.
how do you know what is the part you want to extract? how do you know where it begins and where it ends? using the white-spaces?
if so, you can use substr for cutting the data and instr for finding the white-spaces.
example:
select substr(tempcol, -- string
instr(tempcol, ' ', 1), -- location of first white-space
instr(tempcol, ' ', 1, 2) - instr(tempcol, ' ', 1)) -- length until next space
from dual
another solution is using regexp_substr (but it might be harder on performance if you have a lot of rows):
SELECT REGEXP_SUBSTR (tempcol, '(\S*)(\s*)', 1, 2)
FROM dual;
edit: fixed the regular expression to include expressions that don't have space after the parsed text. sorry about that.. ;)

How can I extract a substring from a character column without using SUBSTR()?

I have a questions regarding below data.
You clearly can see each EMP_IDENTIFIER has connected with EMP_ID.
So I need to pull only identifier which is 10 characters that will insert another column.
How would I do that?
I did some traditional way, using INSTR, SUBSTR.
I just want to know is there any other way to do it but not using INSTR, SUBSTR.
EMP_ID(VARCHAR2)EMP_IDENTIFIER(VARCHAR2)
62049 62049-2162400111
6394 6394-1368000222
64473 64473-1814702333
61598 61598-0876000444
57452 57452-0336503555
5842 5842-0000070666
75778 75778-0955501777
76021 76021-0546004888
76274 76274-0000454999
73910 73910-0574500122
I am using Oracle 11g.
If you want the second part of the identifier and it is always 10 characters:
select t.*, substr(emp_identifier, -10) as secondpart
from t;
Here is one way:
REGEXP_SUBSTR (EMP_IDENTIFIER, '-(.{10})',1,1,null,1)
That will give the 1st 10 character string that follows a dash ("-") in your string. Thanks to mathguy for the improvement.
Beyond that, you'll have to provide more details on the exact logic for picking out the identifier you want.
Since apparently this is for learning purposes... let's say the assignment was more complicated. Let's say you had a longer input string, and it had several groups separated by -, and the groups could include letters and digits. You know there are at least two groups that are "digits only" and you need to grab the second such "purely numeric" group. Then something like this will work (and there will not be an instr/substr solution):
select regexp_substr(input_str, '(-|^)(\d+)(-|$)', 1, 2, null, 2) from ....
This searches the input string for one or more digits ( \d means any digit, + means one or more occurrences) between a - or the beginning of the string (^ means beginning of the string; (a|b) means match a OR b) and a - or the end of the string ($ means end of the string). It starts searching at the first character (the second argument of the function is 1); it looks for the second occurrence (the argument 2); it doesn't do any special matching such as ignore case (the argument "null" to the function), and when the match is found, return the fragment of the match pattern included in the second set of parentheses (the last argument, 2, to the regexp function). The second fragment is the \d+ - the sequence of digits, without the leading and/or trailing dash -.
This solution will work in your example too, it's just overkill. It will find the right "digits-only" group in something like AS23302-ATX-20032-33900293-CWV20-3499-RA; it will return the second numeric group, 33900293.