Select vowels from a varchar, Oracle PL/SQL - sql

I'm trying to get the count of the vowels contained in a varchar.
I've been searching on Google, with no success so far.
Can anyone give me a hand with this one?

Something like
select length(regexp_replace('andrew','[^AEIOUaeiou]')) as vowels from dual;

If you're using Oracle 11g, you can use the REGEXP_COUNT function to count how many times the pattern matches.
select regexp_count('andrew', '[aeiou]', 1, 'i') as vowels
from dual;
VOWELS
----------
2
The first parameter is the string you want to match, 'andrew'.
The second parameter is the match pattern, in this case [aeiou]. The [] indicates a character list; the parser matches any one of the characters inside this list, regardless of the order in which they are listed.
The third parameter, 1, is the start position indicating the positional index of the string where Oracle should start searching for a match. It's included solely so I can use the fourth parameter.
The fourth parameter is a match parameter, 'i' indicates that I want to do case insensitive matching. This is the reason why the character list is not [aeiouAEIOU].
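As a quick check of the match parameter, here is a small sketch that runs the same count against an upper-case literal, with and without 'i'. The value 'ANDREW' is my own test input, not something from the question.

```sql
-- 'ANDREW' is an illustrative value; only the match parameter differs between the two columns.
select regexp_count('ANDREW', '[aeiou]', 1, 'i') as with_i,     -- case-insensitive: counts A and E
       regexp_count('ANDREW', '[aeiou]')         as without_i   -- case-sensitive: no lower-case vowels to find
from dual;
```

With 'i' the count is 2; without it the count is 0, because the character list only contains lower-case letters.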
If you're using 10g then REGEXP_COUNT doesn't exist. In this case you could use a more exact version of Annjawan's solution with REGEXP_REPLACE.
select length(regexp_replace('andrew','[^aeiou]', '', 1, 0, 'i')) as vowels
from dual;
VOWELS
----------
2
The caret (^) indicates a not, i.e. this replaces every character in the string 'andrew' that is not in the character list [aeiou] with the empty string. The next parameter, once again, is the start position. The fifth parameter, 0, indicates that you want to replace every occurrence of the pattern that matches, and once again I've used the match parameter 'i' to indicate case insensitive matching.
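One edge case worth knowing about with the REGEXP_REPLACE approach: if the string contains no vowels at all, every character is removed, Oracle treats the resulting empty string as NULL, and LENGTH returns NULL rather than 0. A minimal sketch, using 'rhythm' as an assumed test value, that wraps the expression in NVL:

```sql
-- 'rhythm' is an illustrative input with no characters from [aeiou].
select length(regexp_replace('rhythm', '[^aeiou]', '', 1, 0, 'i'))         as raw_length,  -- NULL
       nvl(length(regexp_replace('rhythm', '[^aeiou]', '', 1, 0, 'i')), 0) as vowels       -- 0
from dual;
```

REGEXP_COUNT does not have this problem; it returns 0 when nothing matches.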
Gaurav's answer is incorrect. This is because he has included commas within the character list. Remember that everything within the character list gets matched if it is present. So, if I introduce a comma into your string you'll have 3 "vowels" in your string:
select regexp_count('an,drew','[a,e,i,o,u,A,E,I,O,U]' ) as vowels
from dual;
VOWELS
----------
3
Regular expressions are not simple beasts and I would highly recommend reading the documentation when attempting them.

SELECT length('andrew')
- length(REGEXP_REPLACE('andrew','[a,e,i,o,u,A,E,I,O,U]',''))
FROM DUAL;
Output:2 -- a and e are two vowels here.
If you are using Oracle 11g then
SELECT REGEXP_COUNT('andrew','[a,e,i,o,u,A,E,I,O,U]' ) from dual

Related

How do I extract consonants from a string field?

How do I extract only the consonants from a field in records that contain names?
For example, if I had the following record in the People table:
Field
Value
Name
Richard
How could I extract only the consonants in "Richard" to get "R,c,r,d"?
If you mean "how can I remove all vowels from the input" so that 'Richard' becomes 'Rchrd', then you can use the translate function as Boneist has shown, but with a couple more subtle additions.
First, you can completely remove a character with translate, if it appears in the second argument and it doesn't have a corresponding "translate to" character in the third argument.
Second, alas, if the third (and last) argument to translate is null the function returns null (and the same if the last argument is the empty string; there is a very small number of instances where Oracle does not treat the empty string as null, but this is not one of them). So, to make the whole thing work, you need to add an extra character to both the second and the third argument - a character you do NOT want to remove. It may be anything (it doesn't even need to appear in the input string), just not one of the characters to remove. In the illustration below I use the period character (.) but you can use any other character - just not a vowel.
Pay attention too to upper vs lower case letters. Ending up with:
with
sample_inputs (name) as (
select 'Richard' from dual union all
select 'Aliosha' from dual union all
select 'Ai' from dual union all
select 'Ng' from dual
)
select name, translate(name, '.aeiouAEIOU', '.') as consonants
from sample_inputs
;
NAME CONSONANTS
------- ----------
Richard Rchrd
Aliosha lsh
Ai
Ng Ng
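To see why the extra character is needed, here is a small sketch comparing the two forms side by side on a literal:

```sql
-- Without the dummy character the third argument is the empty string, which Oracle treats as NULL.
select translate('Richard', 'aeiouAEIOU', '')   as without_dummy,  -- NULL
       translate('Richard', '.aeiouAEIOU', '.') as with_dummy      -- Rchrd
from dual;
```

In the second column the period maps to itself and every vowel, having no counterpart in the third argument, is dropped.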
You should be able to string a few REPLACE calls together:
Select replace(replace(Value, 'A', ''), 'E', '') ...etc
You can easily do this with the translate() function, e.g.:
WITH people AS (SELECT 'Name' field, 'Richard' val FROM dual UNION ALL
SELECT 'Name' field, 'Siobhan' val FROM dual)
SELECT field, val, TRANSLATE(val, 'aeiou', ',,,,,') updated_val
FROM people;
FIELD VAL UPDATED_VAL
----- ------- -----------
Name Richard R,ch,rd
Name Siobhan S,,bh,n
The translate function simply takes a list of characters and - based on the second list of characters, which defines the translation - translates the input string.
So in the above example, the a (first character in the first list) becomes a , (first character in the second list), the e (second character in the first list) becomes a , (second character in the second list), etc.
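The example above only lists lower-case vowels, so a name such as 'Aliosha' would keep its leading A. A sketch of one way to cover both cases, assuming you want a comma for every vowel; RPAD is used here only to build a string of ten commas without having to count them by hand:

```sql
-- Ten source characters need ten target characters; rpad(',', 10, ',') yields ',,,,,,,,,,'.
select translate('Aliosha', 'aeiouAEIOU', rpad(',', 10, ',')) as updated_val
from dual;
```

This returns ,l,,sh, for 'Aliosha'.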
N.B. I really, really hope your key-value table is just a made-up example for the situation you're trying to solve, and not an actual production table; in general, key-value tables are a terrible idea in a relational database!

How do I extract data between two strings based on a pattern in Oracle SQL

I want to extract data from a column of type CLOB in Oracle SQL based on a specific pattern. I tried different things with regex, but nothing has worked so far.
Please find below an example of what the data looks like and the expected output.
Sample Data:
I need to extract the part of the CLOB column starting at the word LIST, up to one word before the . (dot)
PS: CLOB can have CR LF / Carriage return within the pattern.
Expected Output:
Here is how I would do this. Note a couple of things:
The output preserves newlines that existed in the input. You didn't say anything about removing them; however, your output doesn't show them. In any case - they can be removed, if needed, but that is an unrelated process.
You say "word" but obviously you are using that in a sense different from the common usage in regular expressions. In regexp, "word characters" are only letters, digits and underscore; yet your "words" include brackets, equal sign, and who knows what else. I interpreted the term "word" to mean any sequence of consecutive non-whitespace characters.
Here is how we can recreate your data. When you ask a question here, this is how you should provide sample data - not as an image that we can't copy and paste in an SQL editor.
CREATE TABLE sample_data( col_a varchar2(20), col_b CLOB );
INSERT INTO sample_data VALUES
('12345', to_clob(
'Created:2/28/2019
Updated:1/19/2021
LIST:[ABC][DEF][GHI]
[LMNO][PQRST]
[Location=BLAH].[City=BLAH]'));
INSERT INTO sample_data VALUES
('12346', to_clob(
'Created:2/28/2019
Updated:1/19/2021
LIST:[ABC][DEF][GHI]
[LMNO][PQRST]
[SOC].[RAW]'));
commit;
Then here is the query and the output. Note that, depending on your interface (in my case: SQL Developer, which uses a SQL*Plus-like interface), you may need to change some settings so that the output is not truncated. In particular, in SQL*Plus, CLOB columns are truncated to 80 characters by default; I had to
set long 100
So - query and output:
select col_a, col_b,
regexp_substr(col_b, '(\s|^)(LIST:[^.]*?)\s+\S+\.', 1, 1, null, 2)
as result
from sample_data
;
COL_A COL_B                          RESULT
----- ------------------------------ ------------------------------
12345 Created:2/28/2019              LIST:[ABC][DEF][GHI]
      Updated:1/19/2021              [LMNO][PQRST]
      LIST:[ABC][DEF][GHI]
      [LMNO][PQRST]
      [Location=BLAH].[City=BLAH]
12346 Created:2/28/2019              LIST:[ABC][DEF][GHI]
      Updated:1/19/2021              [LMNO][PQRST]
      LIST:[ABC][DEF][GHI]
      [LMNO][PQRST]
      [SOC].[RAW]
The regular expression matches a single whitespace character or the beginning of the string ((\s|^)), then the characters LIST: followed by as few consecutive, non-period characters (this will match spaces and newline characters, in particular) as needed to allow a match - which continues with one or more whitespace characters, followed by a single word (string of 1 or more non-whitespace characters) and a literal period (\.).
The expression we must return is enclosed in parentheses, so that we can return it from regexp_substr. Such an expression is called a "capture group". The regexp includes another capture group, (\s|^), out of necessity (alternation), so the capture group we must return is the second in the regexp. This is what the last argument to regexp_substr does: it instructs the function to return the second capture group.
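To try the capture-group argument without creating a table, here is a minimal sketch against a short literal. The input string is made up for illustration and uses CHR(10) to stand in for the line breaks in the CLOB.

```sql
-- Illustrative input: 'x', a newline, 'LIST:[A][B]', a newline, '[C].[D]'
select regexp_substr('x' || chr(10) || 'LIST:[A][B]' || chr(10) || '[C].[D]',
                     '(\s|^)(LIST:[^.]*?)\s+\S+\.',
                     1, 1, null, 2) as result   -- sixth argument 2 = second capture group
from dual;
```

This returns LIST:[A][B]: the first group consumes the newline before LIST:, and the trailing \s+\S+\. consumes the newline, the word [C] and the period, none of which are part of group 2.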
Note a peculiar thing about the period (related to the much more general concept of escaping within bracket expressions): the period must be escaped to represent a literal period, rather than "any character", at the end of the regular expression; however, within the (negated) bracket expression [^.]*?, the period - representing a literal period, not "any character" - is not escaped. Oracle follows the ERE (extended regular expressions) dialect of the POSIX standard, and that standard says that the backslash loses its special meaning within bracket expressions, so escape sequences do not work there. This is different from other regular expression dialects, and confuses a lot of users.
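A small sketch of what that rule means in practice, using a made-up input that contains both a backslash and a period. Inside the brackets the backslash is just another character in the list:

```sql
-- 'a\b.c' is an illustrative value.
select regexp_replace('a\b.c', '[.]',  '#') as period_only,    -- a\b#c
       regexp_replace('a\b.c', '[\.]', '#') as backslash_too   -- a#b#c
from dual;
```

So writing [^\.] instead of [^.] does not escape the period; it excludes backslashes as well.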
One option would be to use nested REPLACE() calls to remove line feed (CHR(10)) and carriage return (CHR(13)) characters, then REGEXP_REPLACE() to extract the substring after LIST: up to the dot, such as
SELECT col_a,
'LIST:'||REGEXP_REPLACE(REPLACE(REPLACE(col_b,CHR(10)),CHR(13)),'(.*LIST:)(\S+)(\..*)','\2') AS result
FROM t;
col_a result
------ -------
12345 LIST:[ABC][DEF][GHI][LMNO][PQRST][Location=BLAH]
12346 LIST:[ABC][DEF][GHI][LMNO][PQRST][SOC]
There may be more efficient ways to do this, but the following seems to work:
First I replace newline characters with spaces using TRANSLATE, then use a regex to find everything between LIST: and the first period (.). Then I remove the final "word" using SUBSTR and INSTR. I've used a subquery to avoid having to repeat the first steps.
SELECT
SubQuery.COL_A,
SUBSTR(SubQuery.WithWordAndDot, 1, INSTR(SubQuery.WithWordAndDot,' ',-1)-1) AS Result
FROM
(
SELECT
COL_A,
REGEXP_SUBSTR(TRANSLATE(COL_B, CHR(10)||CHR(13), ' '),'LIST:[^.]+\.') as WithWordAndDot
FROM MyTable
) SubQuery
;

What is this Oracle regexp matching in this production code?

Here's the code that is in production:
dynamic_sql := q'[ with cte as
select user_id,
user_name
from user_table
where regexp_like (bizz_buzz,'^[^Z][^Y6]]' || q'[') AND
user_code not in ('A','E','I')
order by 1]';
Start at the beginning and search bizz_buzz
Match any one character that is NOT Z
Match any two characters that are not Y6
What's the ']' after the 6?
Then what?
I think that StackOverflow's formatting is causing some of the confusion in the answers. Oracle has a syntax for a string literal, q'[...]', which means that the ... portion is to be interpreted exactly as-is; so for instance it can include single quotes without having to escape each one individually.
But the code formatting here doesn't understand that syntax, so it is treating each single-quote as a string delimiter, which makes the result look different from how Oracle really sees it.
The expression is concatenating two such string literals together. (I'm not sure why - it looks like it would be possible to write this as a single string literal with no issues.) As pointed out in another answer/comment, the resulting SQL string is actually:
with cte as
select user_id,
user_name
from user_table
where regexp_like (bizz_buzz,'^[^Z][^Y6]') AND
user_code not in ('A','E','I')
order by 1
And also as pointed out in another answer, the [^Y6] portion of the regex matches a single character, not two. So this expression should simply match any string whose first character is not 'Z' and whose second character is neither 'Y' nor '6'.
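A possible reason for the split, offered as a guess rather than something the original author confirmed: with [ and ] as the q-quote delimiters, the sequence ]' ends the literal, and the regex itself ends in ]'. Picking a delimiter that doesn't occur before a quote in the text, such as { and }, lets the same fragment be written as one literal. A minimal sketch:

```sql
-- Same WHERE fragment as a single alternative-quoted literal, using { } as delimiters.
select q'{where regexp_like (bizz_buzz,'^[^Z][^Y6]')}' as txt
from dual;
```

This returns where regexp_like (bizz_buzz,'^[^Z][^Y6]') with the inner single quotes intact.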
When it isn't paired with an opening [, ] means... well... itself:
^[^Z][^Y6]]
^ assert position at start of the string
[^Z] match a single character not present in the list below
Z the literal character Z (case sensitive)
[^Y6] match a single character not present in the list below
Y6 a single character in the list Y6 literally (case sensitive)
] matches the character ] literally
Start at the beginning and search bizz_buzz
Match any one character that is NOT Z
Match any one character that is not Y or 6
What's the ']' after the 6? it's a ]
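For the regex that Oracle actually ends up running, '^[^Z][^Y6]' without the trailing ], a quick way to confirm the reading is to test it against a few values. The sample strings below are invented for illustration:

```sql
-- Illustrative values: only strings with a first character other than Z
-- and a second character other than Y or 6 should come back.
with samples (bizz_buzz) as (
  select 'AB1' from dual union all   -- matches
  select 'ZB1' from dual union all   -- first character is Z
  select 'AY1' from dual union all   -- second character is Y
  select 'A61' from dual union all   -- second character is 6
  select 'A'   from dual             -- no second character at all
)
select bizz_buzz
from samples
where regexp_like(bizz_buzz, '^[^Z][^Y6]');
```

Only AB1 is returned. Note that the single-character string fails too, since [^Y6] still has to consume a character.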
I'm afraid I have to post this here as the comment section is inappropriate for the formatting required. After your edit above that shows the entire statement, I ran this to see what the string ends up being:
select q'[ with cte as
select user_id,
user_name
from user_table
where regexp_like (bizz_buzz,'^[^Z][^Y6]]' || q'[') AND
user_code not in ('A','E','I')
order by 1]' txt
from dual;
It ended up yielding this:
with cte as
select user_id,
user_name
from user_table
where regexp_like (bizz_buzz,'^[^Z][^Y6]') AND
user_code not in ('A','E','I')
order by 1
It is apparent now that the closing bracket and quote at the end of the regex belong to the first alternate quote string and not to the regex. This is concatenating 2 alternate quoted strings which is a tad confusing as it sure looked like part of the regex. If anything you are learning the importance of comments for the poor person behind you! Please comment this accordingly when you are done figuring this out. Even include a link to this post.

How to extract group from regular expression in Oracle?

I got this query and want to extract the value between the brackets.
select de_desc, regexp_substr(de_desc, '\[(.+)\]', 1)
from DATABASE
where col_name like '[%]';
It however gives me the value with the brackets such as "[TEST]". I just want "TEST". How do I modify the query to get it?
The third parameter of the REGEXP_SUBSTR function indicates the position in the target string (de_desc in your example) where you want to start searching. Assuming a match is found in the given portion of the string, it doesn't affect what is returned.
In Oracle 11g, there is a sixth parameter to the function, that I think is what you are trying to use, which indicates the capture group that you want returned. An example of proper use would be:
SELECT regexp_substr('abc[def]ghi', '\[(.+)\]', 1,1,NULL,1) from dual;
Where the last parameter 1 indicates the number of the capture group you want returned. Here is a link to the documentation that describes the parameter.
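One thing to watch with the pattern '\[(.+)\]' is that .+ is greedy, so a string with more than one bracketed value gives a longer match than you might expect. A sketch with a made-up input, comparing it to a bracket expression that stops at the first ] and showing how the fourth (occurrence) argument picks a later match:

```sql
-- 'abc[def]ghi[jkl]' is an illustrative value with two bracketed parts.
select regexp_substr('abc[def]ghi[jkl]', '\[(.+)\]',    1, 1, null, 1) as greedy,        -- def]ghi[jkl
       regexp_substr('abc[def]ghi[jkl]', '\[([^]]+)\]', 1, 1, null, 1) as first_group,   -- def
       regexp_substr('abc[def]ghi[jkl]', '\[([^]]+)\]', 1, 2, null, 1) as second_group   -- jkl
from dual;
```

In [^]]+ the ] comes first in the list, which is how a literal ] is written inside a bracket expression.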
10g does not appear to have this option, but in your case you can achieve the same result with:
select substr( match, 2, length(match)-2 ) from (
SELECT regexp_substr('abc[def]ghi', '\[(.+)\]') match FROM dual
);
since you know that a match will have exactly one excess character at the beginning and end. (Alternatively, you could use RTRIM and LTRIM to remove brackets from both ends of the result.)
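The LTRIM/RTRIM alternative mentioned above could look like the following sketch, which also works on 10g:

```sql
-- Strip the leading '[' and the trailing ']' from the matched text.
select rtrim(ltrim(regexp_substr('abc[def]ghi', '\[(.+)\]'), '['), ']') as inner_text   -- def
from dual;
```

Be aware that LTRIM and RTRIM remove every leading or trailing occurrence of the character, so a value like [[x]] would lose both pairs of brackets, whereas the SUBSTR version removes exactly one character from each end.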
You need to do a replace and use a regex pattern that matches the whole string.
select regexp_replace(de_desc, '.*\[(.+)\].*', '\1') from DATABASE;
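A caveat with the REGEXP_REPLACE approach: when the pattern doesn't match, the input comes back unchanged rather than as NULL. A small sketch with two illustrative literals:

```sql
select regexp_replace('abc[def]ghi', '.*\[(.+)\].*', '\1') as with_brackets,     -- def
       regexp_replace('no brackets', '.*\[(.+)\].*', '\1') as without_brackets   -- no brackets
from dual;
```

If rows without brackets are possible, you may want to filter them out first or use REGEXP_SUBSTR, which returns NULL when there is no match.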

How to Select a substring in Oracle SQL up to a specific character?

Say I have a table column that has results like:
ABC_blahblahblah
DEFGH_moreblahblahblah
IJKLMNOP_moremoremoremore
I would like to be able to write a query that selects this column from said table, but only returns the substring up to the Underscore (_) character. For example:
ABC
DEFGH
IJKLMNOP
The SUBSTRING function doesn't seem to be up to the task because it is position-based and the position of the underscore varies.
I thought about the TRIM function (the RTRIM function specifically):
SELECT RTRIM('listofchars' FROM somecolumn)
FROM sometable
But I'm not sure how I'd get this to work since it only seems to remove a certain list/set of characters and I'm really only after the characters leading up to the Underscore character.
Using a combination of SUBSTR, INSTR, and NVL (for strings without an underscore) will return what you want:
SELECT NVL(SUBSTR('ABC_blah', 0, INSTR('ABC_blah', '_')-1), 'ABC_blah') AS output
FROM DUAL
Result:
output
------
ABC
Use:
SELECT NVL(SUBSTR(t.column, 0, INSTR(t.column, '_')-1), t.column) AS output
FROM YOUR_TABLE t
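A self-contained sketch of the same expression over a few rows, including one without an underscore, so the effect of the NVL is visible. The CTE and its values are made up for illustration:

```sql
with samples (col) as (
  select 'ABC_blahblahblah'       from dual union all
  select 'DEFGH_moreblahblahblah' from dual union all
  select 'NOUNDERSCORE'           from dual
)
select col,
       nvl(substr(col, 1, instr(col, '_') - 1), col) as output
from samples;
```

The outputs are ABC, DEFGH and NOUNDERSCORE. For the last row INSTR returns 0, SUBSTR with a length of -1 returns NULL, and NVL falls back to the whole string.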
Reference:
SUBSTR
INSTR
Addendum
If using Oracle10g+, you can use regex via REGEXP_SUBSTR.
This can be done using REGEXP_SUBSTR easily.
Please use
REGEXP_SUBSTR('STRING_EXAMPLE','[^_]+',1,1)
where STRING_EXAMPLE is your string.
Try:
SELECT
REGEXP_SUBSTR('STRING_EXAMPLE','[^_]+',1,1)
from dual
It will solve your problem.
You need to get the position of the first underscore (using INSTR) and then get the part of the string from the 1st character to (pos-1) using SUBSTR.
select 'ABC_blahblahblah' test_string,
instr('ABC_blahblahblah','_',1,1) position_underscore,
substr('ABC_blahblahblah',1,instr('ABC_blahblahblah','_',1,1)-1) result
from dual;
TEST_STRING POSITION_UNDERSCORE RES
---------------- ------------------ ---
ABC_blahblahblah 4 ABC
Instr documentation
Substr documentation
SELECT REGEXP_SUBSTR('STRING_EXAMPLE','[^_]+',1,1) from dual
is the right answer, as posted by user1717270
If you use INSTR, it gives you the position of "_" on the assumption that the string contains one. What if it doesn't? Then INSTR returns 0, SUBSTR is called with a length of 0 - 1 = -1, and the result is NULL.
Example: suppose you want to remove the domain from "host.domain". In some cases you will only have the short name, i.e. "host", and most likely you would still like to print "host". With INSTR and SUBSTR alone you get NULL, because no "." was found. With REGEXP_SUBSTR you get the right answer in both cases:
SELECT REGEXP_SUBSTR('HOST.DOMAIN','[^.]+',1,1) from dual;
HOST
and
SELECT REGEXP_SUBSTR('HOST','[^.]+',1,1) from dual;
HOST
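One case where '[^_]+' may surprise you is a string that starts with the delimiter: the first run of non-delimiter characters is then the text after it. A sketch comparing it with an anchored variant, '^[^_]*', which is my own suggestion rather than something from the answers above:

```sql
with samples (str) as (
  select 'ABC_blah' from dual union all
  select 'HOST'     from dual union all
  select '_blah'    from dual
)
select str,
       regexp_substr(str, '[^_]+', 1, 1) as first_run,         -- ABC, HOST, blah
       regexp_substr(str, '^[^_]*')      as up_to_underscore   -- ABC, HOST, NULL
from samples;
```

Which behaviour is right depends on what you want for a leading underscore; the anchored form returns NULL there because the text before the underscore is empty.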
Another possibility would be the use of REGEXP_SUBSTR.
If the position of the string is not fixed, the SELECT statement below returns the expected output.
Table Structure
ID VARCHAR2(100 BYTE)
CLIENT VARCHAR2(4000 BYTE)
Data:
ID CLIENT
1001 {"clientId":"con-bjp","clientName":"ABC","providerId":"SBS"}
1002 {"IdType":"AccountNo","Id":"XXXXXXXX3521","ToPricingId":"XXXXXXXX3521","clientId":"Test-Cust","clientName":"MFX"}
Requirement - search for the clientId key in the CLIENT column and return the corresponding value. For example, from "clientId":"con-bjp" the expected output is con-bjp.
select CLIENT,substr(substr(CLIENT,instr(CLIENT,'"clientId":"')+length('"clientId":"')),1,instr(substr(CLIENT,instr(CLIENT,'"clientId":"')+length('"clientId":"')),'"',1 )-1) cut_str from TEST_SC;
CLIENT cut_str
----------------------------------------------------------- ----------
{"clientId":"con-bjp","clientName":"ABC","providerId":"SBS"} con-bjp
{"IdType":"AccountNo","Id":"XXXXXXXX3521","ToPricingId":"XXXXXXXX3521","clientId":"Test-Cust","clientName":"MFX"} Test-Cust
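On 11g and later, the same extraction can also be sketched with REGEXP_SUBSTR and a capture group, which is shorter than the nested SUBSTR/INSTR calls. The CTE below just recreates the two sample rows, and this treats the column as plain text rather than parsing it as JSON:

```sql
with test_sc (id, client) as (
  select '1001', '{"clientId":"con-bjp","clientName":"ABC","providerId":"SBS"}' from dual union all
  select '1002', '{"IdType":"AccountNo","Id":"XXXXXXXX3521","ToPricingId":"XXXXXXXX3521","clientId":"Test-Cust","clientName":"MFX"}' from dual
)
select id,
       regexp_substr(client, '"clientId":"([^"]*)"', 1, 1, null, 1) as cut_str   -- group 1 = text between the quotes
from test_sc;
```

This returns con-bjp for 1001 and Test-Cust for 1002, and NULL for any row that has no clientId key.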
Remember this if not all the strings in the column have an underscore
(otherwise the output will be NULL for those rows):
SELECT COALESCE
(SUBSTR("STRING_COLUMN" , 0, INSTR("STRING_COLUMN", '_')-1),
"STRING_COLUMN")
AS OUTPUT FROM YOUR_TABLE
To find a substring within a larger string:
string_value := 'This is String,Please search string';
Then to find the string 'Ple' in string_value we can do:
select substr(string_value,instr(string_value,'Ple'),length('Ple')) from dual;
The result is: Ple
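As a runnable version of the idea, here is a sketch as an anonymous PL/SQL block. The variable name pos and the 'not found' message are my own additions. The explicit check matters because INSTR returns 0 when the substring is absent, and SUBSTR treats a start position of 0 as 1, so without the check you would silently get the first three characters of the string instead.

```sql
set serveroutput on

declare
  string_value varchar2(100) := 'This is String,Please search string';
  pos          pls_integer;   -- hypothetical helper variable
begin
  pos := instr(string_value, 'Ple');
  if pos > 0 then
    -- Found: print the matching part of the string.
    dbms_output.put_line(substr(string_value, pos, length('Ple')));
  else
    dbms_output.put_line('not found');
  end if;
end;
/
```

With the value above this prints Ple.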