How do I extract consonants from a string field? - sql

How do I extract only the consonants from a field in records that contain names?
For example, if I had the following record in the People table:
Field Value
----- -------
Name  Richard
How could I extract only the consonants in "Richard" to get "R,c,r,d"?

If you mean "how can I remove all vowels from the input" so that 'Richard' becomes 'Rchrd', then you can use the translate function as Boneist has shown, but with a couple more subtle additions.
First, you can completely remove a character with translate, if it appears in the second argument and it doesn't have a corresponding "translate to" character in the third argument.
Second, alas, if the third (and last) argument to translate is null the function returns null (and the same if the last argument is the empty string; there is a very small number of instances where Oracle does not treat the empty string as null, but this is not one of them). So, to make the whole thing work, you need to add an extra character to both the second and the third argument - a character you do NOT want to remove. It may be anything (it doesn't even need to appear in the input string), just not one of the characters to remove. In the illustration below I use the period character (.) but you can use any other character - just not a vowel.
Pay attention too to upper vs lower case letters. Ending up with:
with
sample_inputs (name) as (
select 'Richard' from dual union all
select 'Aliosha' from dual union all
select 'Ai' from dual union all
select 'Ng' from dual
)
select name, translate(name, '.aeiouAEIOU', '.') as consonants
from sample_inputs
;
NAME CONSONANTS
------- ----------
Richard Rchrd
Aliosha lsh
Ai
Ng Ng
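If the goal is literally the comma-separated form from the question, one possible extension of the query above is to put a comma after every remaining character and then trim the last comma. This is only a sketch: it assumes regexp_replace is available and that the names contain no commas of their own.

```sql
with
  sample_inputs (name) as (
    select 'Richard' from dual union all
    select 'Ng'      from dual
  )
select name,
       -- step 1: translate removes the vowels ('.' is the dummy character)
       -- step 2: regexp_replace appends a comma after every remaining character
       -- step 3: rtrim drops the trailing comma
       rtrim(
         regexp_replace(translate(name, '.aeiouAEIOU', '.'), '(.)', '\1,'),
         ','
       ) as consonants
from   sample_inputs;
```

For 'Richard' this gives R,c,h,r,d (the h is a consonant too, although the question's example leaves it out), and for 'Ng' it gives N,g.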

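You should be able to chain several replace calls together, one per vowel:
select replace(replace(replace(Value, 'A', ''), 'E', ''), 'I', '') -- ...and so on for the remaining vowels, upper and lower case
from People;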
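Nesting one replace call per vowel gets unwieldy quickly. Where regular expressions are available, a single regexp_replace call can probably do the same job; the sketch below reuses the sample name from the question.

```sql
-- '[aeiou]' is the list of characters to remove
-- 1   = start searching at the first character
-- 0   = replace every occurrence, not just the first
-- 'i' = case-insensitive matching, so 'A' and 'a' are both removed
select regexp_replace('Richard', '[aeiou]', '', 1, 0, 'i') as consonants
from   dual;
```

This returns Rchrd for the sample input.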

You can easily do this with the translate() function, e.g.:
WITH people AS (SELECT 'Name' field, 'Richard' val FROM dual UNION ALL
SELECT 'Name' field, 'Siobhan' val FROM dual)
SELECT field, val, TRANSLATE(val, 'aeiou', ',,,,,') updated_val
FROM people;
FIELD VAL UPDATED_VAL
----- ------- -----------
Name Richard R,ch,rd
Name Siobhan S,,bh,n
The translate function simply takes a list of characters and - based on the second list of characters, which defines the translation - translates the input string.
So in the above example, the a (first character in the first list) becomes a , (first character in the second list), the e (second character in the first list) becomes a , (second character in the second list), etc.
N.B. I really, really hope your key-value table is just a made-up example for the situation you're trying to solve, and not an actual production table; in general, key-value tables are a terrible idea in a relational database!

Related

How to remove leftmost group of numbers from string in Oracle SQL?

I have a string like T_44B56T4 that I'd like to make T_B56T4. I can't use positional logic because the string could instead be TE_2BMT that I'd like to make TE_BMT.
What is the most concise Oracle SQL logic to remove the leftmost grouping on consecutive numbers from the string?
EDIT:
regexp_replace is unavailable, but I have LTRIM, REPLACE, SUBSTR, etc.
Would this fit the bill? I am assuming there are alphanumeric characters, then an underscore, and then the digits you want to remove, followed by anything.
select regexp_replace(s, '^([[:alnum:]]+)_\d*(.*)$', '\1_\2')
from (
select 'T_44B56T4' s from dual union all
select 'TXM_1JK7B' from dual
)
It uses regular expressions with matched groups.
Alphanumeric characters before underscore are matched and stored in first group, then underscore followed by 0-many digits (it will match as many digits as possible) followed by anything else that is stored in second group.
If we have a match, the string will be replaced by content of the first group followed by underscore and content of the second group.
If there is no match, the string will not be changed.
It seems that you must use standard string functions, as regular expression functions are not available to you. (Comment under Gordon Linoff's answer; it would help if you would add the same at the bottom of your original question, marked clearly as EDIT).
Also, it seems that the input will always have at least one underscore, and any digits that must be removed will always be immediately after the first underscore.
If so, here is one way you could solve it:
select s, substr(s, 1, instr(s, '_')) ||
ltrim(substr(s, instr(s, '_') + 1), '0123456789') as result
from (
select 'T_44B56T4' s from dual union all
select 'TXM_1JK7B' from dual union all
select '34_AB3_1D' from dual
)
S RESULT
--------- ------------------
T_44B56T4 T_B56T4
TXM_1JK7B TXM_JK7B
34_AB3_1D 34_AB3_1D
I added one more test string, to show that only digits immediately following the first underscore are removed; any other digits are left unchanged.
Note that this solution would very likely be faster than regexp solutions, too (assuming that matters; sometimes it does, but often it doesn't).
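To see how the pieces of that expression fit together, it may help to select each intermediate value separately. The column aliases below are made up for illustration; the functions and arguments are the same as in the query above.

```sql
select s,
       instr(s, '_')                as underscore_pos, -- position of the first underscore
       substr(s, 1, instr(s, '_'))  as prefix,         -- everything up to and including it
       substr(s, instr(s, '_') + 1) as rest,           -- everything after it
       ltrim(substr(s, instr(s, '_') + 1), '0123456789')
                                    as rest_trimmed    -- leading digits stripped from the rest
from (
  select 'T_44B56T4' s from dual union all
  select '34_AB3_1D'   from dual
);
```

For 'T_44B56T4' the four values are 2, 'T_', '44B56T4' and 'B56T4'. For '34_AB3_1D' they are 3, '34_', 'AB3_1D' and 'AB3_1D', because the part after the first underscore does not start with a digit.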
If I understand correctly, you can use regexp_replace():
select regexp_replace('T_44B56T4', '_[0-9]+', '_')
Here is a db<>fiddle with your two examples.
Note: Your question says the leftmost grouping, but the examples all have the number following an underscore, so the underscore seems to be important.
EDIT:
If you really just want the first string of digits replaced without reference to the underscore:
select regexp_replace(code, '[0-9]+', '', 1, 1)
from (select 'T_44B56T4' as code from dual union all select 'TE_2BMT' from dual ) t
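The last two arguments in that call are the start position and the occurrence. A small sketch of how the occurrence argument changes the result (the column aliases are only for illustration):

```sql
select code,
       regexp_replace(code, '[0-9]+', '', 1, 1) as first_only,  -- occurrence 1: first run of digits
       regexp_replace(code, '[0-9]+', '', 1, 2) as second_only, -- occurrence 2: second run of digits
       regexp_replace(code, '[0-9]+', '', 1, 0) as all_runs     -- occurrence 0: every run of digits
from (
  select 'T_44B56T4' as code from dual union all
  select 'TE_2BMT'           from dual
) t;
```

For 'T_44B56T4' this gives 'T_B56T4', 'T_44BT4' and 'T_BT'. For 'TE_2BMT' it gives 'TE_BMT', 'TE_2BMT' (there is no second run of digits, so nothing changes) and 'TE_BMT'.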

fixed number format with different lengths in Oracle

I need help with an Oracle query.
I have a query:
scenario 1: select to_char('1737388250',what format???) from dual;
expected output: 173,7388250
scenario 2: select to_char('173738825034',what format??) from dual;
expected output: 173,738825034
scenario 3: select to_char('17373882',what format??) from dual;
expected output: 173,73882
I need a query that satisfies all of the above scenarios.
Can someone help, please?
It is possible to get the desired result with a customized format model given to to_char; I show one example below. However, any solution along these lines is just a hack (a solution that should work correctly in all cases, but using features of the language in ways they weren't intended to be used).
Here is one example - this will work if your "inputs" are positive integers greater than 999 (that is: at least four digits).
with
sample_data (num) as (
select 1737388250 from dual union all
select 12338 from dual
)
select num, to_char(num, rpad('fm999G', length(num) + 3, '9')) as formatted
from sample_data
;
NUM FORMATTED
---------- ------------
1737388250 173,7388250
12338 123,38
This assumes comma is the "group separator" in nls_numeric_characters; if it isn't, that can be controlled with the third argument to to_char. Note that the format modifier fm is needed so that no space is prepended to the resulting string; and the +3 in the second argument to rpad accounts for the extra characters in the format model (f, m and G).
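To make the trick less opaque, the generated format model can be selected next to the result. The sketch below also passes the group separator explicitly through the third argument of to_char, so it should not depend on the session settings; the column alias format_model is made up for illustration.

```sql
with
  sample_data (num) as (
    select 1737388250 from dual union all
    select 12338      from dual
  )
select num,
       -- the model is 'fm999G' padded with 9s, one 9 per digit after the third
       rpad('fm999G', length(num) + 3, '9') as format_model,
       to_char(num,
               rpad('fm999G', length(num) + 3, '9'),
               -- decimal character first, group separator second
               'NLS_NUMERIC_CHARACTERS = ''.,''') as formatted
from   sample_data;
```

The format models come out as fm999G9999999 and fm999G99, which produce 173,7388250 and 123,38.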
You can try
select TO_CHAR(1737388250, '999,99999999999') from dual;
Your requirement is different, so you can use substr and concatenation as follows (your_number and your_table are placeholders for your own column and table):
select substr(your_number,1,3)
|| case when your_number >= 1000 then ',' end
|| substr(your_number,4)
from your_table;
Your "number" is enclosed in single-quotes. This makes it a character string, albeit a string of only numeric characters. But a character string, nonetheless. So it makes no sense to pass a character string to TO_CHAR.
Everyone's suggestions gloss over this and use an actual number; notice the lack of single quotes in their code.
You say you always want a comma after the first three "numbers" (characters), which makes no sense numerically or mathematically. So just use SUBSTR and insert the comma:
select substr('123456789',1,3)||','||substr('123456789',4) from dual;
If the source data is actually a number, then pass it to to_char, and wrap that in substr:
select substr(to_char(123456789),1,3)||','||substr(to_char(123456789),4) from dual;

Oracle SQL regexp_substr number extraction behavior

In a sense I've answered my own question, but I'm trying to understand the answer better:
When using regexp_substr (in oracle) to extract the first occurrence of a number (either single or multi digits), how/why do the modifiers * and + impact the results? Why does + provide the behavior I'm looking for and * does not? * is my default usage in most regular expressions so I was surprised it didn't suit my need.
For example, in the following:
select test,
regexp_substr(TEST,'\d') Pattern1,
regexp_substr(TEST,'\d*') Pattern2,
regexp_substr(TEST,'\d+') Pattern3
from (
select '123 W' TEST from dual
union
select 'W 123' TEST from dual
);
the use of regexp_substr(TEST,'\d*') returns null value for the input "W 123" - since 'zero or more' digits exist in the string, I'm confused by this behavior. I'm also confused why it does work on the string '123 W'
my understanding is that * means zero or more occurrences of the element it follows and + means 1 or more occurrence of the preceding element. In the example provided for pattern2 [\d*] why does it successfully capture "123" from "123 W" but it does not take 123 from "W 123" as zero or more occurrences of a digit do exist, they just don't exist in the beginning of the string. Is there additional [implied] logic attached to using *?
Note: I looked around for a while trying to find similar questions that would help me capture the '123' from 'W 123', but the closest I found were variations of regexp_replace, which would not meet my needs.
So the regexp_count indicates there are FOUR substrings that match the \d* pattern.
The third of those is the '123'. The implication is that the first and second are derived from the W and space and what you have is a zero length result that 'consumes' one character of the source string.
select test,
regexp_count(TEST,'\d*') Pattern2_c,
regexp_substr(TEST,'\d*') Pattern2,
regexp_substr(TEST,'\d*',1,1) Pattern2_1,
regexp_substr(TEST,'\d*',1,2) Pattern2_2,
regexp_substr(TEST,'\d*',1,3) Pattern2_3,
regexp_substr(TEST,'\d*',1,4) Pattern2_4
from (select '123 W' TEST from dual
union
select 'W 123' TEST from dual
);
Oracle has a weird thing about zero length strings and null.
The result doesn't "feel" right, but then if you ask a computer deep philosophical questions about how many zero length substrings are contained in a string, I wouldn't bet on any answer.
After thinking through this, it actually makes sense. The pattern \d* says to match any digit zero or more times. The problem here is that the beginning of the string will always match this pattern, because of the zero or more times.
If the string begins with digits, then the match will include those digits, so given 123 W, the pattern matches 123. However, given the string W 123, the pattern also matches at the beginning, but it matches zero characters. This is why you get a NULL result.
This is a general regex thing and not an Oracle thing. You have to be careful with the * quantifier.
Here are two regex fiddle examples to illustrate this, using the string W 123:
\d+ shows 1 match on 123
\d* shows 1 match on nothing (i.e. the beginning of the string)
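Back in Oracle, the same difference can be made visible by wrapping the \d* result in nvl, since the zero-length match comes back as null. A minimal sketch using the two test strings from the question (the '(null)' marker and the column aliases are just for display):

```sql
select test,
       -- \d* is satisfied by zero digits at position 1, so 'W 123' yields null
       nvl(regexp_substr(test, '\d*'), '(null)') as star_result,
       -- \d+ needs at least one digit, so the search moves on until it finds 123
       regexp_substr(test, '\d+')                as plus_result
from (
  select '123 W' test from dual union all
  select 'W 123' test from dual
);
```

Both rows should show 123 in plus_result, while star_result shows 123 for '123 W' and (null) for 'W 123'.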

REGEXP to insert special characters, not remove

How would I put double quotes around the two fields that are missing them? Would I be able to use something like INSTR/SUBSTR/REPLACE in one statement to accomplish it?
string := '"ES26653","ABCBEVERAGES","861526999728",606.32,"2017-01-26","2017-01-27","","",77910467,"DOROTHY","","RAPP","14219 PIERCE STREET, APT1","","OMAHA","NE","68144"';
Expected string := '"ES26653","ABCBEVERAGES","861526999728","606.32","2017-01-26","2017-01-27","","","77910467","DOROTHY","","RAPP","14219 PIERCE STREET, APT1","","OMAHA","NE","68144"';
Please suggest! Thank you.
This answer does not work in this case, because some fields contain commas. I am leaving it in case it helps anyone else.
One rather brute force method for internal fields is:
replace(replace(string, ',', '","'), '""', '"')
This adds double quotes on either side of a comma and then removes double double quotes. You don't need to worry about "". It becomes """" and then back to "".
This can be adapted for the first and last fields as well, but it complicates the expression.
This offering attempts to address a number of end cases:
Addressing issues with first and last fields. Here only the last field is a special case as we look out for the end-of-string $ rather than a comma.
Empty unquoted fields i.e. leading commas, consecutive commas and trailing commas.
Preserving a pair of double quotes within a field representing a single double quote.
The SQL:
WITH orig(str) AS (
SELECT '"ES26653","ABCBEVERAGES","861526999728",606.32,"2017-01-26","2017-01-27","","",77910467,"DOROTHY","","RAPP","14219 PIERCE STREET, APT1","","OMAHA","NE","68144"'
FROM dual
),
rpl_first(str) AS (
SELECT REGEXP_REPLACE(str, '("(([^"]|"")*)"|([^,]*))(,|$)','"\2\4"\5')
FROM orig
)
SELECT REGEXP_REPLACE(str, '"""$','"') fixed_string
FROM rpl_first;
The technique is to find either a quoted field and remember it or a non-quoted field and remember it, terminated by a comma or end-of-string, and remember that too. The answer is then a " followed by one of the fields followed by " and then the terminator.
The quoted field is basically "[^"]*" where [^"] is any character that is not a quote and * means repeated zero or more times. This is complicated by the fact that the not-a-quote character could also be a pair of quotes, so we need an OR construct (|), i.e. "([^"]|"")*". However, we must remember just the field inside the quotes, so add brackets so we can later back-reference just that, i.e. "(([^"]|"")*)".
The unquoted field is simply a non-comma repeated zero or more times where we want to remember it all ([^,]*).
So we want to find either of these, the OR construct again i.e. ("(([^"]|"")*)"|([^,]*)). Followed by the terminator, either a comma or end-of-string, which we want to remember i.e. (,|$).
Now we can replace this with one of the two types of field we found enclosed in quotes followed by the terminator i.e. "\2\4"\5. The number n for the back reference \n is just a matter of counting the open brackets.
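The bracket-counting rule can be checked on a toy pattern before applying it to the full expression. The strings and patterns below are made up purely for illustration; the second column relies on the same behaviour the replacement above depends on, namely that a group which took no part in the match contributes nothing.

```sql
select
  -- groups are numbered by their opening bracket, left to right:
  --   \1 = ((a)(b)) -> 'ab',  \2 = (a) -> 'a',  \3 = (b) -> 'b'
  regexp_replace('ab', '((a)(b))', '[\1][\2][\3]') as nested_groups,
  -- with alternation only one branch matches; the other group is empty
  regexp_replace('x', '(a)|(x)', '[\1][\2]')       as alternation
from dual;
```

The first column should come out as [ab][a][b] and the second as [][x], which is why "\2\4" can safely name both kinds of field at once.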
The second REGEXP_REPLACE is to work around something I suspect is an Oracle bug. If the last field is quoted, then an extra pair of quotes is added to the end of the string. This suggests that the end-of-string is being processed twice when it is parsed, which would be a bug. However, regexp processing is probably done by a standard library routine, so it may be my interpretation of the regexp rules that is at fault. Comments are welcome.
Oracle regexp documentation can be found at Using Regular Expressions in Database Applications.
My thanks to @Gary_W for his template. Here I am keeping the two regexp blocks separate, to separate the bit I can explain from the bit I can't (the bug?).
This method makes 2 passes on the string. First look for a group consisting of a double quote followed by a comma, followed by a character that is not a double quote. Replace it by referring to the pieces with the shorthand for their groups: the first group, '\1', then the missing double quote, then the second group, '\2'. Then do it again, but the other way around. Sure, you could nest the regexp_replace calls and end up with one big ugly statement, but just make it 2 statements for easier maintenance. The guy working on this after you will thank you, and this is ugly enough as it is.
SQL> with orig(str) as (
select '"ES26653","ABCBEVERAGES","861526999728",606.32,"2017-01-26","2017-01-27","","",77910467,"DOROTHY","","RAPP","14219 PIERCE STREET, APT1","","OMAHA","NE","68144"'
from dual
),
rpl_first(str) as (
select regexp_replace(str, '(",)([^"])', '\1"\2')
from orig
)
select regexp_replace(str, '([^"])(,")', '\1"\2') fixed_string
from rpl_first;
FIXED_STRING
--------------------------------------------------------------------------------
"ES26653","ABCBEVERAGES","861526999728","606.32","2017-01-26","2017-01-27","",""
,"77910467","DOROTHY","","RAPP","14219 PIERCE STREET, APT1","","OMAHA","NE","681
44"
SQL>
EDIT: Changed regex's and added a third step to allow for empty, unquoted fields per Unoembre's comment. Good catch! Also added additional test cases. Always expect the unexpected and make sure to add test cases for all data combinations.
SQL> with orig(str) as (
select '"ES26653","ABCBEVERAGES","861526999728",606.32,"2017-01-26","2017-01-27","","",77910467,"DOROTHY","","RAPP","14219 PIERCE STREET, APT1","","OMAHA","NE","68144"'
from dual union
select 'ES26653,"ABCBEVERAGES","861526999728"' from dual union
select '"ES26653","ABCBEVERAGES",861526999728' from dual union
select '1S26653,"ABCBEVERAGES",861526999728' from dual union
select '"ES26653",,861526999728' from dual
),
rpl_empty(str) as (
select regexp_replace(str, ',,', ',"",')
from orig
),
rpl_first(str) as (
select regexp_replace(str, '(",|^)([^"])', '\1"\2')
from rpl_empty
)
select regexp_replace(str, '([^"])(,"|$)', '\1"\2') fixed_string
from rpl_first;
FIXED_STRING
--------------------------------------------------------------------------------
"ES26653","ABCBEVERAGES","861526999728","606.32","2017-01-26","2017-01-27","",""
,"77910467","DOROTHY","","RAPP","14219 PIERCE STREET, APT1","","OMAHA","NE","681
44"
"ES26653","ABCBEVERAGES","861526999728"
"ES26653","","861526999728"
"1S26653","ABCBEVERAGES","861526999728"
"ES26653","ABCBEVERAGES","861526999728"
SQL>

Select vowels from a varchar, Oracle PL/SQL

I'm trying to pull up the count of the vowels contained in a varchar,
I've been looking around in google, no success though.
Can anyone give me a hand with this one?
Something like
select length(regexp_replace('andrew','[^AEIOUaeiou]')) as vowels from dual;
If you're using Oracle 11g, you can use the REGEXP_COUNT function to count how many times the pattern matches.
SQL> select regexp_count('andrew', '[aeiou]', 1, 'i') as vowels
2 from dual;
VOWELS
----------
2
The first parameter is the string you want to match, 'andrew'.
The second parameter is the match pattern, in this case [aeiou]. The [] indicates a character list; the parser matches any and all characters inside this list in any order.
The third parameter, 1, is the start position indicating the positional index of the string where Oracle should start searching for a match. It's included solely so I can use the fourth parameter.
The fourth parameter is a match parameter, 'i' indicates that I want to do case insensitive matching. This is the reason why the character list is not [aeiouAEIOU].
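The effect of the match parameter is easiest to see on an upper-case input. A small sketch, assuming the session uses the default binary sort so that matching without 'i' is case-sensitive:

```sql
select regexp_count('ANDREW', '[aeiou]')         as case_sensitive,   -- no lower-case vowels to find
       regexp_count('ANDREW', '[aeiou]', 1, 'i') as case_insensitive  -- 'A' and 'E' now count
from   dual;
```

Under that assumption the first column is 0 and the second is 2.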
If you're using 10g then REGEXP_COUNT doesn't exist. In this case you could use a more exact version of Annjawan's solution with REGEXP_REPLACE.
SQL> select length(regexp_replace('andrew','[^aeiou]', '', 1, 0, 'i')) as vowels
2 from dual;
VOWELS
----------
2
The caret (^) indicates negation, i.e. this replaces every character in the string 'andrew' that is not in the character list [aeiou] with the empty string. The next parameter, once again, is the start position. The fifth parameter, 0, indicates that you want to replace every occurrence of the pattern, and once again I've used the match parameter 'i' to indicate case-insensitive matching.
Gaurav's answer is incorrect. This is because he has included commas within the character list. Remember that everything within the character list gets matched if it is present. So, if I introduce a comma into your string you'll have 3 "vowels" in your string:
SQL> select regexp_count('an,drew','[a,e,i,o,u,A,E,I,O,U]' ) as vowels
2 from dual;
VOWELS
----------
3
Regular expressions are not simple beasts and I would highly recommend reading the documentation when attempting them.
SELECT length('andrew')
- length(REGEXP_REPLACE('andrew','[a,e,i,o,u,A,E,I,O,U]',''))
FROM DUAL;
Output: 2 (a and e are the two vowels here).
If you are using Oracle 11g, then:
SELECT REGEXP_COUNT('andrew','[a,e,i,o,u,A,E,I,O,U]' ) from dual;
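For versions without REGEXP_COUNT, the length-difference idea can be kept while avoiding two pitfalls: the commas inside the character list, and the null that length returns when every character has been removed. The following is a sketch under those assumptions, not the original poster's query; the sample strings are made up.

```sql
select str,
       length(str)
         -- when the string is all vowels the replace result is null,
         -- so nvl turns the missing length into 0
         - nvl(length(regexp_replace(str, '[aeiou]', '', 1, 0, 'i')), 0) as vowels
from (
  select 'andrew'  str from dual union all
  select 'an,drew' str from dual union all
  select 'aeiou'   str from dual
);
```

This gives 2, 2 and 5. Without the nvl, the last row would come back as null rather than 5.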