Regular expression to detect doubled vowels in Oracle [duplicate]

This question already has answers here:
Reference - What does this regex mean?
(1 answer)
Carets in Regular Expressions
(2 answers)
Have trouble understanding capturing groups and back references
(2 answers)
Closed 5 years ago.
Can anyone explain this regular expression? This query is used in Oracle to return the last names of employees with a double vowel (where last_name contains two adjacent occurrences of the same vowel, a, e, i, o, or u, regardless of case):
SELECT last_name
FROM employees
WHERE REGEXP_LIKE (last_name, '([aeiou])\1', 'i');
The output is:
LAST_NAME
---------------
De Haan
Greenberg
Khoo
Gee
Greene
Lee
Bloom
Feeney

The regex pattern ([aeiou])\1 simply matches the same vowel twice in succession:
([aeiou]) match and capture a single vowel
\1 then match the same vowel we just captured
If you examine the matching last names, you will see that they all have a repeated vowel in some position. By the way, the term \1 is known as a backreference, which refers to a group captured earlier in the pattern.
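The same logic can be tried outside Oracle. Below is a minimal Python sketch that uses the standard `re` module. Its `([aeiou])\1` syntax and `re.IGNORECASE` flag behave like the Oracle pattern with the 'i' match parameter for this purpose. The names King and Kochhar are hypothetical additions that show non-matching rows. The sketch also shows that two *different* adjacent vowels (as in "Sean") do not match.

```python
import re

# Same pattern as the Oracle query: capture one vowel, then require
# the identical character again through the backreference \1.
DOUBLE_VOWEL = re.compile(r"([aeiou])\1", re.IGNORECASE)

def has_double_vowel(name: str) -> bool:
    """Rough Python equivalent of REGEXP_LIKE(name, '([aeiou])\\1', 'i')."""
    return DOUBLE_VOWEL.search(name) is not None

# Hypothetical sample: some names from the output above plus two that should not match.
names = ["De Haan", "Greenberg", "Khoo", "King", "Kochhar", "Bloom", "Feeney"]
print([n for n in names if has_double_vowel(n)])
# ['De Haan', 'Greenberg', 'Khoo', 'Bloom', 'Feeney']
```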

Related

How to extract alphanumeric phrase from a string - if it doesn't exist I wish to flag it

I have a column that can have the following possible values -
ITO26218361281- JANE
SBC28791827135 VATS
SOT21092832917 JOHN DOE
TIM INQ12109283291
JANE DOE 12/15
I only want to extract the 14-character alphanumeric phrase from strings like the ones above. If a record looks like the fifth one (JANE DOE 12/15), I still want that record to exist so that I can call it out as an error. I don't need the exact text to be the same; I just need it to be flagged as an error.
Result expected -
ITO26218361281
SBC28791827135
SOT21092832917
INQ12109283291
JANE DOE 12/15 (or flagged as error)

You can use a pattern to match what you need: 3 letters followed by 11 digits.
Using it in the WHERE clause, you can match all "valid" values:
SELECT *
FROM TableName
WHERE ColumnName like '%[A-Z][A-Z][A-Z][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]%'
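I'm not a master of patterns or regular expressions, so I used a "simple to understand" pattern here. Note that bracket character classes such as [A-Z] inside LIKE are SQL Server syntax. Oracle's LIKE supports only the % and _ wildcards, so in Oracle the equivalent test is REGEXP_LIKE(ColumnName, '[A-Z]{3}[0-9]{11}').
With this query, you can extract the data to another table, or UPDATE a table column with the flag you want.
You can see this query working on sqlfiddle.com.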
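To extract the code rather than just filter on it, you can search for the same "3 letters + 11 digits" pattern and treat "no match" as the error flag. In Oracle, REGEXP_SUBSTR(ColumnName, '[A-Z]{3}[0-9]{11}') plays this role by returning NULL when nothing matches. The Python sketch below illustrates that idea; `extract_code` is a hypothetical helper name.

```python
import re
from typing import Optional

# 3 uppercase letters followed by exactly 11 digits (14 characters in total).
CODE_PATTERN = re.compile(r"[A-Z]{3}[0-9]{11}")

def extract_code(value: str) -> Optional[str]:
    """Return the 14-character code, or None so the row can be flagged."""
    match = CODE_PATTERN.search(value)
    return match.group(0) if match else None

values = [
    "ITO26218361281- JANE",
    "SBC28791827135 VATS",
    "SOT21092832917 JOHN DOE",
    "TIM INQ12109283291",
    "JANE DOE 12/15",
]
for v in values:
    code = extract_code(v)
    print(code if code is not None else f"{v} (flagged as error)")
```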

How to remove the last 3 words from a string in PL/SQL? [duplicate]

This question already has answers here:
Regex - How to replace the last 3 words of a string with PHP
(3 answers)
Closed 4 years ago.
I have strings like these:
Jack & Bauer Limited Company Bristol
Streetfood Limited Company München
Brouse with High Jack UnlimiteD Company London
What I want to have is just the company names like:
Jack & Bauer
Streetfood
Brouse with High Jack
So in every case I have to delete the last 3 words, because the names can consist of many words.
I know I have to use a regexp, but I don't know how.

While you can use regular expressions to do this, you don't have to. This task can be accomplished using a combination of INSTR and SUBSTR:
SELECT SUBSTR(FIELD1, 1, INSTR(FIELD1, ' ', -1, 3)-1) AS NAME
FROM TABLE1
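Here INSTR(FIELD1, ' ', -1, 3) searches backwards from the end of FIELD1 for the third space, and SUBSTR keeps everything before that position.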
Best of luck.
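The INSTR/SUBSTR idea can be mirrored in Python with `str.rfind`, stepping backwards one space at a time. This is a minimal sketch; `drop_last_words` is a hypothetical name. As in Oracle, a string with fewer than three spaces gives no result: INSTR returns 0, and SUBSTR with a length below 1 yields NULL, represented here as None.

```python
from typing import Optional

def drop_last_words(text: str, n: int = 3) -> Optional[str]:
    """Keep everything before the n-th space counted from the end.

    Mirrors SUBSTR(text, 1, INSTR(text, ' ', -1, n) - 1), where a negative
    start position makes INSTR search backwards from the end of the string.
    """
    pos = len(text)
    for _ in range(n):
        pos = text.rfind(" ", 0, pos)  # search only to the left of the last hit
        if pos == -1:
            return None  # Oracle would return NULL here
    return text[:pos]

print(drop_last_words("Jack & Bauer Limited Company Bristol"))  # Jack & Bauer
```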

Here is one method:
select regexp_replace(str, '( [^ ]+){3}$', '')
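The pattern ( [^ ]+){3}$ matches three repetitions of a space followed by a run of non-space characters, anchored at the end of the string. REGEXP_REPLACE replaces that match with an empty string.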
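Python's `re.sub` accepts the same pattern, so the regular-expression approach can be checked with a short sketch (`drop_last_three_words` is a hypothetical helper).

```python
import re

# Three repetitions of "a space followed by non-space characters",
# anchored at the end of the string: the same pattern as the REGEXP_REPLACE answer.
LAST_THREE_WORDS = re.compile(r"( [^ ]+){3}$")

def drop_last_three_words(text: str) -> str:
    return LAST_THREE_WORDS.sub("", text)

for s in ["Jack & Bauer Limited Company Bristol",
          "Streetfood Limited Company München",
          "Brouse with High Jack UnlimiteD Company London"]:
    print(drop_last_three_words(s))
```

Unlike the INSTR/SUBSTR version, a string with fewer than four words comes back unchanged rather than as NULL, because the pattern simply fails to match.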

Need help solving SQL Oracle Counting Characters

We have a large set of data, and the professor is asking us to do the following:
Amy Gray has seven characters in her name. (The space between her first and last name does not count.) J. J. Brown has ten in his name. (The space and periods in J. J. count as characters.) Allison Black-White has eighteen in hers. (The hyphen counts as a character.)
Create a view named A9T4 that will display the size and the total number of students whose combined first and last name has that size. The two column headings should be Name_Size and Students. The rows should be sorted by descending size.
Note: As a simple check of your work, the longest name in A9 has 22 characters and the three shortest names have seven characters.

I used the Oracle DUMP, SUBSTR, and REGEXP_SUBSTR functions to get the count.
http://www.techonthenet.com/oracle/functions/dump.php
http://www.techonthenet.com/oracle/functions/substr.php
http://www.techonthenet.com/oracle/functions/regexp_substr.php
CREATE TABLE SCHEMA1.NAMES (
eval_name VARCHAR2(100 CHAR)
);
insert into SCHEMA1.NAMES values('Amy Gray');
insert into SCHEMA1.NAMES values('J. J. Brown');
insert into SCHEMA1.NAMES values('Allison Black-White');
commit;
select eval_name, REGEXP_SUBSTR(SUBSTR(DUMP(eval_name),11), '^[0-9]*')-1 from SCHEMA1.NAMES;
--returns
Amy Gray 7
J. J. Brown 10
Allison Black-White 18
DUMP('Amy Gray') -- gives us 'Typ=1 Len=8: 65,109,121,32,71,114,97,121'
SUBSTR(DUMP('Amy Gray'),11) -- starts at position eleven, giving us '8: 65,109,121,32,71,114,97,121'
REGEXP_SUBSTR(SUBSTR(dump('Amy Gray'),11), '^[0-9]*') -- gives us '8', all digits from the beginning of the string '^' to the first non-digit, ':'
--and the -1 removes the expected space between the first and last names.
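The assignment also asks for the number of students per name size, sorted by descending size (the Name_Size and Students columns of the view). The Python sketch below shows that grouping. It assumes first and last names are available separately; "Bob Ross" is a hypothetical extra row. `len` counts characters, which corresponds to Oracle's LENGTH. DUMP's Len reports bytes, and those can differ for multibyte characters.

```python
from collections import Counter

def name_size(first: str, last: str) -> int:
    # Periods, hyphens and spaces inside a name all count; only the separator
    # between first and last name is excluded, because it is never added.
    return len(first) + len(last)

def size_report(students):
    """Return (Name_Size, Students) pairs sorted by descending size."""
    counts = Counter(name_size(first, last) for first, last in students)
    return sorted(counts.items(), reverse=True)

students = [("Amy", "Gray"), ("J. J.", "Brown"),
            ("Allison", "Black-White"), ("Bob", "Ross")]
for size, count in size_report(students):
    print(size, count)
```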

Fuzzy text searching in Oracle

I have a large Oracle DB table that contains street names for a whole country, with 600000+ rows. In my application, I take an address string as input and want to check whether specific substrings of this address string match one or more of the street names in the table, so that I can label that address substring as the name of a street.
Clearly, this is a fuzzy text matching problem: there is only a small chance that the substring I query has an exact match among the street names in the DB table, so some kind of fuzzy text matching approach is needed. I am trying to read the Oracle documentation at http://docs.oracle.com/cd/B28359_01/text.111/b28303/query.htm, in which the CONTAINS and CATSEARCH search operators are explained. But these seem to be intended for more complex tasks, like searching for a match for the given string in documents. I just want to do that for a column of a table.
What do you suggest in this case? Does Oracle support this kind of fuzzy text matching query?

UTL_MATCH contains methods for matching strings and comparing their similarity. The edit distance, also known as the Levenshtein distance, might be a good place to start. Since one string is a substring of the other, it may help to compare the edit distance relative to the size of the strings.
--Addresses that are most similar to each substring.
select substring, address, edit_ratio
from
(
    --Rank edit ratios.
    select substring, address, edit_ratio
        ,dense_rank() over (partition by substring order by edit_ratio desc) edit_ratio_rank
    from
    (
        --Calculate edit ratio - edit distance relative to string sizes.
        select
            substring,
            address,
            (length(address) - UTL_MATCH.EDIT_DISTANCE(substring, address))/length(substring) edit_ratio
        from
        (
            --Fake addresses (from http://names.igopaygo.com/street/north_american_address)
            select '526 Burning Hill Big Beaver District of Columbia 20041' address from dual union all
            select '5206 Hidden Rise Whitebead Michigan 48426' address from dual union all
            select '2714 Noble Drive Milk River Michigan 48770' address from dual union all
            select '8325 Grand Wagon Private Sleeping Buffalo Arkansas 72265' address from dual union all
            select '968 Iron Corner Wacker Arkansas 72793' address from dual
        ) addresses
        cross join
        (
            --Address substrings.
            select 'Michigan' substring from dual union all
            select 'Not-So-Hidden Rise' substring from dual union all
            select '123 Fake Street' substring from dual
        )
        order by substring, edit_ratio desc
    )
)
where edit_ratio_rank = 1
order by substring, address;
These results are not great, but hopefully this is at least a good starting point. It should work with any language, but you'll probably still want to combine it with some language- or locale-specific comparison rules.
SUBSTRING           ADDRESS                                                 EDIT_RATIO
------------------  ------------------------------------------------------  ----------
123 Fake Street     526 Burning Hill Big Beaver District of Columbia 20041  0.5333
Michigan            2714 Noble Drive Milk River Michigan 48770              1
Michigan            5206 Hidden Rise Whitebead Michigan 48426               1
Not-So-Hidden Rise  5206 Hidden Rise Whitebead Michigan 48426               0.5
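The query's ranking can be reproduced outside the database with a few lines of Python. The sketch below implements the Levenshtein distance with the classic dynamic-programming table, which stands in for UTL_MATCH.EDIT_DISTANCE. It then applies the same edit-ratio formula and keeps the rank-1 addresses per substring, like dense_rank() = 1. The helper names are hypothetical, and it is not guaranteed to match UTL_MATCH on every edge case.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the classic dynamic-programming table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (or keep)
        prev = cur
    return prev[-1]

def edit_ratio(substring: str, address: str) -> float:
    # Same formula as the query: (length(address) - distance) / length(substring)
    return (len(address) - edit_distance(substring, address)) / len(substring)

def best_matches(substrings, addresses):
    """For each substring, the addresses sharing the highest edit ratio."""
    best = {}
    for s in substrings:
        ratios = {a: edit_ratio(s, a) for a in addresses}
        top = max(ratios.values())
        best[s] = sorted(a for a, r in ratios.items() if r == top)
    return best

addresses = [
    "526 Burning Hill Big Beaver District of Columbia 20041",
    "5206 Hidden Rise Whitebead Michigan 48426",
    "2714 Noble Drive Milk River Michigan 48770",
    "8325 Grand Wagon Private Sleeping Buffalo Arkansas 72265",
    "968 Iron Corner Wacker Arkansas 72793",
]
print(best_matches(["Michigan"], addresses))
```

A substring that appears verbatim inside an address always scores 1. The distance is then just the characters that must be inserted around it, so the numerator equals the substring's length.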

You could make use of the SOUNDEX function available in Oracle databases. SOUNDEX computes a short phonetic signature of a text string (the first letter followed by three digits). This can be used to find strings that sound similar and thus reduce the number of string comparisons.
If SOUNDEX is not suitable for your local language, you can search for a phonetic signature or phonetic matching function that performs better. This function has to be evaluated once per new table entry and once for every query, so it does not need to reside in Oracle.
For example, there is a Turkish variant of SOUNDEX.
To improve matching quality, the street name spelling should be unified as a first step. This could be done by applying a set of rules:
Simplified example rules:
Convert all characters to lowercase
Remove "str." at the end of a name
Remove "drv." at the end of a name
Remove "place" at the end of a name
Remove "ave." at the end of a name
Sort names with multiple words alphabetically
Drop auxiliary words like "of", "and", "the", ...
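The normalization rules and the phonetic signature can be combined before comparing names. The Python sketch below applies the simplified rules above, using only the example suffixes and auxiliary words listed, not a complete rule set. It then computes a standard American Soundex code, the algorithm Oracle's SOUNDEX is based on. `normalize_street` and `soundex` are hypothetical helpers, and they may not reproduce Oracle's output on every edge case.

```python
# Example normalization rules from the list above (not a complete rule set).
SUFFIXES = ("str.", "drv.", "place", "ave.")
STOP_WORDS = {"of", "and", "the"}

# Standard American Soundex digit groups; vowels, h, w and y get no digit.
CODES = {}
for digit, letters in {"1": "bfpv", "2": "cgjkqsxz", "3": "dt",
                       "4": "l", "5": "mn", "6": "r"}.items():
    for letter in letters:
        CODES[letter] = digit

def normalize_street(name: str) -> str:
    name = name.lower().strip()
    for suffix in SUFFIXES:
        if name.endswith(suffix):
            name = name[: -len(suffix)].strip()
            break
    words = [w for w in name.split() if w not in STOP_WORDS]
    return " ".join(sorted(words))  # multi-word names sorted alphabetically

def soundex(word: str) -> str:
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    result = [letters[0].upper()]
    prev = CODES.get(letters[0], "")
    for c in letters[1:]:
        code = CODES.get(c, "")
        if code and code != prev:
            result.append(code)
        # Vowels separate equal codes; h and w do not.
        if c not in "hw":
            prev = code
    return ("".join(result) + "000")[:4]

print(normalize_street("Avenue of the Americas"))  # americas avenue
print(soundex("Robert"), soundex("Rupert"))        # R163 R163
```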

Oracle PL/SQL : report formatting [duplicate]

This question already has answers here:
How to retrieve two columns data in A,B format in Oracle
(4 answers)
Closed 9 years ago.
I have been asked to create a report using PL/SQL to get the names associated with a given city. While this is not in the least bit difficult, I have not seen data presented this way from SQL before. The report needs to be formatted so that the city name appears first, followed by all of the people associated with that city, on a single line.
TEMPE: Rich Allen, Jerry Black, et al..
TUSCON: Bob Adams, Frank Bruce, et al..
I can't recall ever seeing output like this and am a bit stuck on how to present this data.
Any suggestions would be appreciated.

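Select city, WM_CONCAT(namefield)
from tablename
group by City
Use WM_CONCAT (an undocumented function) for versions 10g and earlier.
Use LISTAGG for newer versions (available from 11g Release 2), e.g. LISTAGG(namefield, ', ') WITHIN GROUP (ORDER BY namefield).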
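What GROUP BY plus a string aggregate produces can be pictured with a small Python sketch that groups (city, name) rows and joins the names per city. The rows and the `city_report` helper are hypothetical. Names keep their input order here, whereas in LISTAGG the order is set by the WITHIN GROUP (ORDER BY ...) clause.

```python
from collections import defaultdict

def city_report(rows):
    """Group (city, name) rows into 'CITY: name1, name2' lines, cities sorted."""
    grouped = defaultdict(list)
    for city, name in rows:
        grouped[city].append(name)
    return [f"{city}: {', '.join(names)}" for city, names in sorted(grouped.items())]

# Hypothetical rows based on the example output in the question.
rows = [("TEMPE", "Rich Allen"), ("TUSCON", "Bob Adams"),
        ("TEMPE", "Jerry Black"), ("TUSCON", "Frank Bruce")]
for line in city_report(rows):
    print(line)
```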
