Removal of first characters in a string oracle sql - sql

It may be very simple question, but I have run out of ideas.
I would like to remove first 5 characters from string.
Example string will look like:
1Y40K100R
I would like to display only digits that are after '%K' which in this case should give me result of 100R.
Please note that number after 'K' can have different amount of digits. It can be 4 digit number or 2 digit number.

Just use substr():
select substr(col, 6)
This returns all characters starting at the sixth.
There are multiple ways to return all characters after the k. If you know the string has a k, then use instr():
select substr(col, instr(col, 'K') + 1)

You can use regexp_substr
select regexp_substr('1Y40K100R', '(K)(.*)', 1, 1, 'i', 2) from dual
A way without regexp:
select substr('1Y40K100R', instr('1Y40K100R', 'K') +1) from dual
This may appear not so elegant, but it usually performs better than the regexp way.

Related

How do I dynamically extract substring from string?

I’m trying to dynamically extract a substring from a very long URL. For example, I may have the following URLs:
https://www.google.com/ABCDEF Version=“0.0.00.0” GHIJK
https://www.google.com/ABCDEFGH Version=“0.0.0.0” IJKLM
https://www.google.com/ABC Version=“0.0.0.00” 12345
I am trying to extract the version code only (0.0.0.0).
This is what I have so far:
SELECT SUBSTR(col, INSTR(col, ‘Version=“‘)+9)
FROM table
This query returns the following result:
0.0.00.0” GHIJK … (url continues on)
So, I attempt to find “Version” in the link, so I can start from the same position in each row. This works fine, however I’m having a hard time dynamically locating the ending quote (“). I tried using INSTR in the third parameter of my SUBSTR function, like so:
SELECT SUBSTR(col, INSTR(col, ‘Version=“‘)+9, INSTR(col, ‘“‘))
FROM table
I figured that this would find the position of the ending quote, and then use that number for the length, but it returns a strange output. I’ve also used POSITION, CHARINDEX, LENGTH, and LOCATE. None of these functions work in Oracle.
I think maybe when I put +9 after the first INSTR function, it’s setting the query to a fixed position instead of a dynamic one, but I’m not sure how else to remove ‘Version=“‘.
Here's one option (which, actually, selects what's between double quotes - that's version in your example; if there were some other similar substring, you'd get a wrong result).
with test (col) as
(select 'https://www.google.com/ABCDEF Version="0.0.00.0" GHIJK' from dual union all
select 'https://www.google.com/ABCDEFGH Version="0.0.0.0" IJKLM' from dual union all
select 'https://www.google.com/ABC Version="0.0.0.00" 12345' from dual
)
select col,
replace(regexp_substr(col, '".+"'), '"') version
from test;
which results in
https://www.google.com/ABCDEF Version="0.0.00.0" GHIJK 0.0.00.0
https://www.google.com/ABCDEFGH Version="0.0.0.0" IJKLM 0.0.0.0
https://www.google.com/ABC Version="0.0.0.00" 12345 0.0.0.00
You can still use use INSTR to locate the second " in the string, then subtract the location of the first " to get the length that you need to get. Below is an example query:
SELECT col,
SUBSTR (col, INSTR (col, '"') + 1, INSTR (col, '"', 1, 2) - INSTR (col, '"') - 1) version
FROM test;
You can use REGEXP_SUBSTR() with Version=(\d.*\d?) pattern in order to extract the piece between Version=" and "(your quotes are presumed to be regular double quotes " ")
SELECT REGEXP_SUBSTR(url,'Version="(\d.*\d)"',1,1,null,1) AS version
FROM t
where
the third argument(1) is position,
the fourth argument(1) is occurence, and especially important to use the last one as being capture group (1)
indeed using '"(\d.*\d)"' pattern is enough for the
current data set
or
REGEXP_REPLACE() with capture group \2 as
SELECT REGEXP_REPLACE(url,'^(.*Version=")([^"]*).*','\2') AS version
FROM t
Demo

How to get the substring before the last occurrence of a character?

In oracle I have a string that looks like this:
\A\B\C\D
Now I would like to get the substring before the last occurrence of \. I don't know how long the string in total will be, as A, B, C and D can differ in length, and the number of \ might vary as well.
What is the most efficient way to get \A\B\C\?
You could use a combination of instr (with -1) and substr:
with data as (select '\A\B\C\D' str from dual)
select str, substr(str, 1, instr(str, '\', -1))
from data;
You can use regexp_replace():
select regexp_replace(str, '[^\\]+$', '')
Here is a db<>fiddle.
A solution through a Regex function might be
SELECT REGEXP_REPLACE(str, '[^\\]+$')
FROM t
Demo
if it is desired that that last occurrence of the \ character be skipped also, then this might work.
REGEXP_SUBSTR (str,'(.*)(\.*$)',
1,
1,
NULL,
1)
this was learned from this post https://stackoverflow.com/a/63458155/18021603

How to use Regex in Oracle SQL request

I have this complex string in a database, which represents 4 elements separated by delimeters '|'
1944913|200010|157|4*
1944913|200085|157|1.22*
NOTE: both lines of strings exist in the same database row. There could be up to 10 lines of strings in a single cell
I want an oracle sql command that, given either 200010 or 200085, returns either the 1st or second element next to it, like:
Given 200010, returns 157 or returns 4*
Given 200085, returns 157 or returns 1.22
How would I go about doing that?
You can use regexp_substr and instr to get the required results:
select a.*,
CASE WHEN instr(text1,'200085')>0 then REGEXP_SUBSTR (substr(text1,instr(text1,'200085'),500),'[^,|]+', 1, 2)
ELSE NULL END as partA,
CASE WHEN instr(text1,'200085')>0 then REGEXP_SUBSTR (substr(text1,instr(text1,'200085'),500),'[^,|]+', 1, 3)
ELSE NULL END as partB
from tableA a
Simply replace '200085' with the string you are looking for and you should be good to go. Hope this helps.
You can split this into columns easily. So, I think something like this is pretty much what you want:
select regexp_substr(col, '[^|]+', 1, 2) as val_2,
regexp_substr(col, '[^|]+', 1, 3) as val_3,
regexp_substr(col, '[^|]+', 1, 4) as val_4
from t;

How to get file name without extension with using Regular Expressions

I have a field with following values, now i want to extract only those rows with "xyz" in the field value mentioned below, can you please help?
Mydata_xyz_aug21
Mydata2_zzz_aug22
Mydata3_xyz_aug33
One more requirement
I want to extract only "aIBM_MyProjectFile" from following string below, can you please help me with this?
finaldata/mydata/aIBM_MyProjectFile.exe.ld
I've tried this but it didn't work.
select
regexp_substr('FinalProject/MyProject/aIBM_MyProjectFile.exe.ld','([^/]*)[\.]') exp
from dual;
To extract substrings between the first pair of underscores, you need to use
regexp_substr('Mydata_xyz_aug21','_([^_]+)_', 1, 1, NULL, 1)
To get the file name without the extension, you need
regexp_substr('FinalProject/MyProject/aIBM_MyProjectFile.exe.ld','.*/([^.]+)', 1, 1, NULL, 1)
Note that each regex contains a capturing group (a pattern inside (...)) and this value is accessed with the last 1 argument to the regexp_substr function.
The _([^_]+)_ pattern finds the first _, then places 1 or more chars other than _ into Group 1 and then matches another _.
The .*/([^.]+) pattern matches the whole text up to the last /, then captures 1 or more chars other than . into Group 1 using ([^.]+).
For the first requirement, it would suffice to use LIKE, as posted in answer above:
SELECT column
FROM table
WHERE column LIKE '%xyz%';
For your second requirement (extraction) you will have to use REGEXP_SUBSTR function:
SELECT REGEXP_SUBSTR ('FinalProject/MyProject/aIBM_MyProjectFile.exe.ld', '.*/([^.]+)', 1, 1, NULL, 1)
FROM DUAL
I hope it helped!
Another way to do this is to skip regexp completely:
WITH
aset AS
(SELECT 'with_extension.txt' txt FROM DUAL
UNION ALL
SELECT 'without_extension' FROM DUAL)
SELECT CASE
WHEN INSTR (txt, '.', -1) > 0
THEN
SUBSTR (txt, 1, INSTR (txt, '.', -1) - 1)
ELSE
txt
END
txt
FROM aset
The result of this is
with_extension
without_extension
A BIG Caveat where the regexp is better:
My method doesn't handle this case correctly:
\this\is.a\test
So after I have gone to all this effort, stay with the regexp solutions. I'll leave this here so that others may learn from it.

regexp_substr strip text between first forward slash and second one

/abc/required_string/2/ should return abc with regexp_substr
SELECT REGEXP_SUBSTR ('/abc/blah/blah/', '/([a-zA-Z0-9]+)/', 1, 1, NULL, 1) first_val
from dual;
You might try the following:
SELECT TRIM('/' FROM REGEXP_SUBSTR(mycolumn, '^\/([^\/]+)'))
FROM mytable;
This regular expression will match the first occurrence of a pattern starting with / (I habitually escape /s in regular expressions, hence \/ which won't hurt anything) and including any non-/ characters that follow. If there are no such characters then it will return NULL.
Hope this helps.
You can search for /([^/]+)/, which says:
/ forward slash
( start of subexpression (usually called "group" in other languages)
[^/] any character other than forward slash
+ match the preceding expression one or more times
) end of subexpression
/ forward slash
You can use the 6th argument to regexp_substr to select a subexpression.
Here we pass 1 to match only the characters between the /s:
select regexp_substr(txt, '/([^/]+)/', 1, 1, null, 1)
from t1
See it working at SQL Fiddle.
Classic SUBSTR + INSTR offer a simple solution; I know you specified regular expressions, but - consider this too, might work better for a large data volume.
SQL> with test (col) as
2 (select '/abc/required_string/2/' from dual)
3 select substr(col, 2, instr(col, '/', 1, 2) - 2) result
4 from test;
RES
---
abc
SQL>
Here's another way to get the 2nd occurrence of a string of characters followed by a forward slash. It handles the problem if that element happens to be NULL as well. Always expect the unexpected!
Note: If you use the regex form of [^/]+, and that element is NULL it will return "required string" which is NOT what you expect! That form does NOT handle NULL elements. See here for more info: [https://stackoverflow.com/a/31464699/2543416]
with tbl(str) as (
select '/abc/required_string/2/' from dual union all
select '//required_string1/3/' from dual
)
select regexp_substr(str, '(.*?)(/)', 1, 2, null, 1)
from tbl;