How do I Count the words in a string using regex - sql

I'm trying to count the words in a string using regex in Oracle 10g.
I've been trying this
select *
from books
where REGEXP_LIKE(title, '[ ]{2}');
so that it returns titles with at least 3 words.

INSTR is also a viable option. If a second occurrence of a space exists, the string has at least 3 words.
WITH
books
AS
(SELECT 'Tom Sawyer' title FROM DUAL
UNION ALL
SELECT 'A tale of two cities' FROM DUAL
UNION ALL
SELECT 'The Little Prince' FROM DUAL
UNION ALL
SELECT 'Don Quixote' FROM DUAL)
SELECT title
FROM books
WHERE instr(title, ' ', 1, 2) > 0;
If you do wish to stick with regex, the expression below finds books that have 3 or more words.
WITH
books
AS
(SELECT 'Tom Sawyer' title FROM DUAL
UNION ALL
SELECT 'A tale of two cities' FROM DUAL
UNION ALL
SELECT 'The Little Prince' FROM DUAL
UNION ALL
SELECT 'Don Quixote' FROM DUAL)
SELECT title
FROM books
WHERE REGEXP_LIKE (title, '(\S+\s){2,}');
(Thanks @Littlefoot for the books!)
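For context on why the attempt in the question finds nothing useful: '[ ]{2}' matches two consecutive spaces, not two spaces anywhere in the string. A minimal sketch, using a made-up title with a double space to show the difference:

```sql
WITH
books
AS
(SELECT 'The Little Prince' title FROM DUAL
UNION ALL
-- hypothetical row: two spaces in a row between the words
SELECT 'Tom  Sawyer' FROM DUAL)
SELECT title
FROM books
-- matches only where two space characters are adjacent
WHERE REGEXP_LIKE (title, '[ ]{2}');
```

Only the two-word title with the double space comes back, while the three-word title is skipped.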

REPLACE does the job (with some calculation).
with books as
(select 'Tom Sawyer' title from dual union all
select 'A tale of two cities' from dual union all
select 'The Little Prince' from dual union all
select 'Don Quixote' from dual
)
select title
from books
where length(title) - length(replace(title, ' ', '')) >= 2;
TITLE
--------------------
A tale of two cities
The Little Prince

The approach below is simple and easy to understand (it works on 11g and later).
First, create some sample data:
create table books as
with tab as
(
select 'Tom Sawyer' title from dual
union all
select 'A tale of two cities' from dual
union all
select 'The Little Prince' from dual
union all
select 'The_Little_Prince' from dual
union all
select 'Don Quixote' from dual
union all
select null from dual
)
select title
from tab;
The query below returns the titles that have at least 3 words:
select title
from books
where regexp_count(title, '\w+') > 2
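With the sample data above, this returns 'A tale of two cities' and 'The Little Prince'. 'The_Little_Prince' counts as a single word because \w also matches the underscore, and the NULL title is filtered out.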

Related

How to use regex to select rows where the column has more than two words in oracle

for example:
id  center
1   man
2   some men here
I want to select rows with three or more words, so the output should be:
id  center
2   some men here
I've tried using this: regexp_like(center, '\w{3,}') but it's not giving the expected output.
You can use REGEXP_COUNT to check for more than 2 words:
WITH
some_table (id, center)
AS
(SELECT 1, 'man' FROM DUAL
UNION ALL
SELECT 2, 'some men here' FROM DUAL)
SELECT *
FROM some_table
WHERE REGEXP_COUNT (center, '\w+') > 2;
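For context on why the original attempt returns both rows: \w{3,} asks for three or more consecutive word characters, not three words, so 'man' qualifies on its own. A minimal sketch that puts the two patterns side by side (same illustrative data as above; the column aliases has_3_chars and word_count are made up for this example):

```sql
WITH
some_table (id, center)
AS
(SELECT 1, 'man' FROM DUAL
UNION ALL
SELECT 2, 'some men here' FROM DUAL)
SELECT id,
       center,
       -- 1 when the string contains 3+ consecutive word characters
       CASE WHEN REGEXP_LIKE (center, '\w{3,}') THEN 1 ELSE 0 END AS has_3_chars,
       -- number of runs of word characters, i.e. the word count
       REGEXP_COUNT (center, '\w+') AS word_count
FROM some_table;
```

Both rows get has_3_chars = 1, while word_count is 1 for 'man' and 3 for 'some men here'.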
You could use the regex pattern \w+ \w+ \w+:
SELECT id, center
FROM yourTable
WHERE REGEXP_LIKE(center, '\w+[[:space:]]+\w+[[:space:]]+\w+');
I think this is the regex you are looking for:
regexp_like(center, '((\s|^)\w+(\s|$)?){3,}')
or with a short test:
select * from (
select 'abc' center
from dual
union all
select 'abc def'
from dual
union all
select 'abc def ghi'
from dual
union all
select 'abc def ghi jkl'
from dual
)
where regexp_like(center, '((\s|^)\w+(\s|$)?){3,}')
The pattern reads:
Start of line or whitespace
One or more word characters (letters, digits, or underscore)
Whitespace or end of line, optional
Repeat all of the above at least three times

Convert a series of Number values in Text in Oracle SQL Query

In an Oracle database, I have string values (VARCHAR2) like 1,4,7,8. The numbers stand for 1=car, 2=bus, 3=BB, 4=SB, 5=Ba, 6=PA, 7=HB, and 8=G,
and I want to convert the example above to "car,SB,HB,G" in my query results.
I tried to use DECODE, but it does not work. Please advise how to make it work.
Thanks
Initially, I used the following query:
Select Clientid as C#, vehicletypeExclusions as vehicle from
clients
A sample of the results:
C# Vehicle
20 1,19,20,23,24,7,5
22 1,19,20,23,24,7,5
I also tried the following, which gives me NULL for the vehicles:
Select Clientid as C#, Decode (VEHICLETYPEEXCLUSIONS, '1', 'car',
'3','bus', '5','ba' ,'7','HB', '8','G'
, '9','LED1102', '10','LED1104', '13','LED8-2',
'14','Flip4-12', '17','StAT1003', '19','Taxi-Min', '20','Tax_Sed',
'21','Sup-veh' , '22','T-DATS', '23','T-Mini',
'24','T-WAM') as vehicle_Ex from clients
Here's one option. Read comments within code. Sample data in lines #1 - 13; query begins at line #14.
SQL> with
2 expl (id, name) as
3 (select 1, 'car' from dual union all
4 select 2, 'bus' from dual union all
5 select 3, 'BB' from dual union all
6 select 4, 'SB' from dual union all
7 select 5, 'Ba' from dual union all
8 select 6, 'PA' from dual union all
9 select 7, 'HB' from dual union all
10 select 8, 'G' from dual
11 ),
12 temp (col) as
13 (select '1,4,7,8' from dual),
14 -- split COL to rows
15 spl as
16 (select regexp_substr(col, '[^,]+', 1, level) val,
17 level lvl
18 from temp
19 connect by level <= regexp_count(col, ',') + 1
20 )
21 -- join SPL with EXPL; aggregate the result
22 select listagg(e.name, ',') within group (order by s.lvl) result
23 from expl e join spl s on s.val = e.id;
RESULT
--------------------------------------------------------------------------------
car,SB,HB,G
SQL>
Using the function f_subst from https://stackoverflow.com/a/68537479/429100 :
create or replace
function f_subst(str varchar2, template varchar2, subst sys.odcivarchar2list) return varchar2
as
res varchar2(32767):=str;
begin
for i in 1..subst.count loop
res:=replace(res, replace(template,'%d',i), subst(i));
end loop;
return res;
end;
/
I've replaced ora_name_list_t (nested table) with sys.odcivarchar2list (varray) to make this example easier, but I would suggest creating your own collection, for example: create type varchar2_table as table of varchar2(4000);
Example:
select
f_subst(
'1,4,7,8'
,'%d'
,sys.odcivarchar2list('car','bus','BB','SB','Ba','PA','HB','G')
) s
from dual;
S
----------------------------------------
car,SB,HB,G
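One thing to watch with this approach, given that the sample data in the question contains codes such as 19 and 20: with the '%d' template each substitution is a plain text replace, so multi-digit codes collide with their own digits. A small sketch, assuming f_subst has been created exactly as above:

```sql
-- Every '1' is replaced in the first loop iteration,
-- so the code 19 is turned into 'car9' before it can be looked up.
select
f_subst(
'1,19'
,'%d'
,sys.odcivarchar2list('car','bus')
) s
from dual;
-- returns: car,car9
```

Splitting the list into rows and joining to a lookup table, as the other answers do, avoids this problem.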
Assume you have a lookup table (associating the numeric codes with descriptions) and a table of input strings, which I called sample_inputs in my tests, as shown below:
create table lookup (code, descr) as
select 1, 'car' from dual union all
select 2, 'bus' from dual union all
select 3, 'BB' from dual union all
select 4, 'SB' from dual union all
select 5, 'Ba' from dual union all
select 6, 'PA' from dual union all
select 7, 'HB' from dual union all
select 8, 'G' from dual
;
create table sample_inputs (str) as
select '1,4,7,8' from dual union all
select null from dual union all
select '3' from dual union all
select '5,5,5' from dual union all
select '6,2,8' from dual
;
One strategy for solving your problem is to split the input - slightly modified to make it a JSON array, so that we can use json_table to split it - then join to the lookup table and re-aggregate.
select s.str, l.descr_list
from sample_inputs s cross join lateral
( select listagg(descr, ',') within group (order by ord) as descr_list
from json_table( '[' || str || ']', '$[*]'
columns code number path '$', ord for ordinality)
join lookup l using (code)
) l
;
STR DESCR_LIST
------- ------------------------------
1,4,7,8 car,SB,HB,G
3 BB
5,5,5 Ba,Ba,Ba
6,2,8 PA,bus,G

How to add a space to an existing string in Oracle character functions without using regular expressions

I have a name field in a table, with names inserted without spaces. E.g.: "MarkJones".
Now I want to insert a space between the first and last name of a person within the same column, so that it is displayed as "Mark Jones", using Oracle functions.
I have tried this query
SELECT instr('MarkJones', '%||Upper(*)||%') AS substr1,
SUBSTR('MarkJones', instr('MarkJones', '%lower(*)upper(*)%')) AS substr2,
substr1||' '||substr2
FROM dual
;
However, this query is not working. I want to try it using oracle functions including translate, substr and instr, but no regular expressions.
This approach works for the simple example given, but fails if the name has more than 2 uppercase letters in it. If this is coursework, as I suspect, maybe the requirements are not too demanding about which names to parse. As we all know, parsing names is fraught with heartache and you can never account for 100% of names from all nationalities.
Anyway my approach was to move through the string looking for uppercase letters and if found replace them with a space followed by the letter. I used the ASCII function to test their ascii value to see if they were an uppercase character. The CONNECT BY construct (needed to loop through each character of the string) returns each character in its own row so LISTAGG() was employed to reassemble back into a string and ltrim to remove the leading space.
I suspect if this is coursework it may be using some features you should not be using yet. At least you should get out of this the importance of receiving and/or giving complete specifications!
SQL> with tbl(name) as (
select 'MarkJones' from dual
)
select ltrim(listagg(case
when ascii(substr(name, level, 1)) >= 65 AND
ascii(substr(name, level, 1)) <= 90 THEN
' ' || substr(name, level, 1)
else substr(name, level, 1)
end, '')
within group (order by level)) fixed
from tbl
connect by level <= length(name);
FIXED
------------------------------------
Mark Jones
When you are ready, here's the regexp_replace version anyway :-)
Find and "remember" the 2nd occurrence of an uppercase character then replace it with a space and the "remembered" uppercase character.
SQL> with tbl(name) as (
select 'MarkJones' from dual
)
select regexp_replace(name, '([A-Z])', ' \1', 1, 2) fixed
from tbl;
FIXED
----------
Mark Jones
Not sure we should go against @Alex Poole's advice, but it looks like a homework assignment.
So my idea is to locate the second uppercase letter. It's doable if you create a set of the uppercase letters and evaluate the position of each one in the input string iStr. Then, if you're allowed to use length, you can use this position to build firstName too:
SELECT substr(iStr, 1, length(iStr)-length(substr(iStr, instr(iStr, u)))) firstName
, substr(iStr, instr(iStr, u)) lastName
, substr(iStr, 1, length(iStr)-length(substr(iStr, instr(iStr, u)))) ||' '||
substr(iStr, instr(iStr, u)) BINGO
FROM ( select 'MarkJones' iStr from dual
union all select 'SomeOtherNames' from dual -- 2 u-cases gives 2 different results
union all select 'SomeOtherOols' from dual -- only one result
union all select 'AndJim' from dual
union all select 'JohnLenon' from dual
union all select 'LemingWay' from dual
),
( select 'A' U from dual
union all select 'B' from dual
union all select 'C' from dual
union all select 'D' from dual
union all select 'E' from dual
union all select 'F' from dual
union all select 'G' from dual
union all select 'H' from dual
union all select 'I' from dual
union all select 'J' from dual
union all select 'K' from dual
union all select 'L' from dual
union all select 'M' from dual
union all select 'N' from dual
union all select 'O' from dual
union all select 'P' from dual
union all select 'Q' from dual
union all select 'R' from dual
union all select 'S' from dual
union all select 'T' from dual
union all select 'U' from dual
union all select 'V' from dual
union all select 'W' from dual
union all select 'X' from dual
union all select 'Y' from dual
union all select 'Z' from dual
) upper_cases
where instr(iStr, U) > 1
;

Extract numbers from a string in Informix

There are strings in my table as follows:
select '1. name 1' from dual union all
select '2. name 2' from dual union all
select '11. name 3' from dual union all
select '12. name 4' from dual
I need to extract the first numbers:
1 2 11 12
IBM claims that Informix supports substring_index(). If so:
select substring_index(col, '.', 1)
This doesn't exactly get the first number. It returns the first part of the string before the '.', which appears to be the same thing.

Oracle SQL -- find the values NOT in a table

Take this table WORDS
WORD
Hello
Aardvark
Potato
Dog
Cat
And this list:
('Hello', 'Goodbye', 'Greetings', 'Dog')
How do I return a list of words that AREN'T in the words table, but are in my list?
If I have a table that "contains all possible words", I can do:
SELECT * from ALL_WORDS_TABLE
where word in ('Hello', 'Goodbye', 'Greetings', 'Dog')
and word not in
(SELECT word from WORDS
where word in ('Hello', 'Goodbye', 'Greetings', 'Dog')
);
However I do not have such a table. How else can this be done?
Also, constructing a new table is not an option because I do not have that level of access.
Instead of hard-coding the list values into rows, use the SYS.DBMS_DEBUG_VC2COLL collection type to turn your list into rows, then use the MINUS operator to remove the rows of the first query that also appear in the second query:
select column_value
from table(sys.dbms_debug_vc2coll('Hello', 'Goodbye', 'Greetings', 'Dog'))
minus
select word
from words;
Try this solution:
SELECT
a.word
FROM
(
SELECT 'Hello' word FROM DUAL UNION
SELECT 'Goodbye' word FROM DUAL UNION
SELECT 'Greetings' word FROM DUAL UNION
SELECT 'Dog' word FROM DUAL
) a
LEFT JOIN words t ON t.word = a.word
WHERE
t.word IS NULL
You can turn your list into a view like this:
select 'Hello' as word from dual
union all
select 'Goodbye' from dual
union all
select 'Greetings' from dual
union all
select 'Dog' from dual
Then you can select from that:
select * from
(
select 'Hello' as word from dual
union all
select 'Goodbye' from dual
union all
select 'Greetings' from dual
union all
select 'Dog' from dual
)
where word not in (select word from words);
Possibly not as neat a solution as you might have hoped for...
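One caveat, assuming the word column is nullable (the question does not say): NOT IN returns no rows at all when the subquery yields a NULL. A sketch of the same query written with NOT EXISTS, which is not affected by that; the aliases l and w are only for illustration:

```sql
select l.word
from
(
select 'Hello' as word from dual
union all
select 'Goodbye' from dual
union all
select 'Greetings' from dual
union all
select 'Dog' from dual
) l
-- keep a list entry only when no matching row exists in WORDS
where not exists (select 1 from words w where w.word = l.word);
```

If word is declared NOT NULL, both forms return the same rows.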
You say you don't have sufficient privileges to create tables, so presumably you can't create types either - but if you can find a suitable type "lying around" in your database you can do this:
select * from table (table_of_varchar2_type('Hello','Goodbye','Greetings','Dog'))
where column_value not in (select word from words);
Here table_of_varchar2_type is imagined to be the name of a type that is defined like:
create type table_of_varchar2_type as table of varchar2(100);
One such type you are likely to be able to find is SYS.KU$_VCNT which is a TABLE OF VARCHAR2(4000).