Oracle query to find all occurrences of a charcter in a string

Oracle query to find all occurrences of a charcter in a string - sql

I have to write an Oracle query in toad to find all the occurrences of a character in a string. For example if I'm searching for R in the string SSSRNNSRSSR, it should return positions 4, 8 and 11.
I am new to Oracle and tried this.
select instr(mtr_ctrl_flags, 'R', pos + 1, 1) as pos1
from mer_trans_reject
where pos in ( select instr(mtr_ctrl_flags, 'R', 1, 1) as pos
from mer_trans_reject
);
where mtr_ctrl_flags is the column name. I'm getting an error indicating that pos is an invalid identifier.

Extending GolezTrol's answer you can use regular expressions to significantly reduce the number of recursive queries you do:
select instr('SSSRNNSRSSR','R', 1, level)
from dual
connect by level <= regexp_count('SSSRNNSRSSR', 'R')
REGEXP_COUNT() returns the number of times the pattern matches, in this case the number of times R exists in SSSRNNSRSSR. This limits the level of recursion to the exact number you need to.
INSTR() simply searches for the index of R in your string. level is the depth of the recursion but in this case it's also the level th occurrence of the string as we restricted to the number of recurses required.
If the string you're wanting to pick out is more complicated you could go for regular expressions ans REGEXP_INSTR() as opposed to INSTR() but it will be slower (not by much) and it's unnecessary unless required.
Simple benchmark as requested:
The two CONNECT BY solutions would indicate that using REGEXP_COUNT is 20% quicker on a string of this size.
SQL> set timing on
SQL>
SQL> -- CONNECT BY with REGEX
SQL> declare
2 type t__num is table of number index by binary_integer;
3 t_num t__num;
4 begin
5 for i in 1 .. 100000 loop
6 select instr('SSSRNNSRSSR','R', 1, level)
7 bulk collect into t_num
8 from dual
9 connect by level <= regexp_count('SSSRNNSRSSR', 'R')
10 ;
11 end loop;
12 end;
13 /
PL/SQL procedure successfully completed.
Elapsed: 00:00:03.94
SQL>
SQL> -- CONNECT BY with filter
SQL> declare
2 type t__num is table of number index by binary_integer;
3 t_num t__num;
4 begin
5 for i in 1 .. 100000 loop
6 select pos
7 bulk collect into t_num
8 from ( select substr('SSSRNNSRSSR', level, 1) as character
9 , level as pos
10 from dual t
11 connect by level <= length('SSSRNNSRSSR') )
12 where character = 'R'
13 ;
14 end loop;
15 end;
16 /
PL/SQL procedure successfully completed.
Elapsed: 00:00:04.80
The pipelined table function is a fair bit slower, though it would be interesting to see how it performs over large strings with lots of matches.
SQL> -- PIPELINED TABLE FUNCTION
SQL> declare
2 type t__num is table of number index by binary_integer;
3 t_num t__num;
4 begin
5 for i in 1 .. 100000 loop
6 select *
7 bulk collect into t_num
8 from table(string_indexes('SSSRNNSRSSR','R'))
9 ;
10 end loop;
11 end;
12 /
PL/SQL procedure successfully completed.
Elapsed: 00:00:06.54

This is a solution:
select
pos
from
(select
substr('SSSRNNSRSSR', level, 1) as character,
level as pos
from
dual
connect by
level <= length(t.text))
where
character = 'R'
dual is a built in table that just returns a single row. Very convenient!
connect by lets you build recursive queries. This is often used to generate lists from tree-like data (parent/child relations). It allows you to more or less repeat the query in front of it. And you've got special fields, like level that allows you to check how deeply the recursion went.
In this case, I use it to split the string to characters and return a row for each character. Using level, I can repeat the query and get a character until the end of the string is reached.
Then it is just a matter of returning the pos for all rows containing the character 'R'

To take up a_horse_with_no_name's challenge here is another answer with a pipelined table function.
A pipelined function returns an array, which you can query normally. I would expect that over strings with large numbers of matches this will perform better than the recursive query but as with everything test yourself first.
create type num_array as table of number
/
create function string_indexes (
PSource_String in varchar2
, PSearch_String in varchar2
) return num_array pipelined is
begin
for i in 1 .. length(PSource_String) loop
if substr(PSource_String, i, 1) = PSearch_String then
pipe row(i);
end if;
end loop;
return;
end;
/
Then in order to access it:
select *
from table(string_indexes('SSSRNNSRSSR','R'))
SQL Fiddle

Related

Can we sort the characters of a string in SQL oracle? [duplicate]

I'm looking for a function that would sort chars in varchar2 alphabetically.
Is there something built-in into oracle that I can use or I need to create custom in PL/SQL ?

From an answer at http://forums.oracle.com/forums/thread.jspa?messageID=1791550 this might work, but don't have 10g to test on...
SELECT MIN(permutations)
FROM (SELECT REPLACE (SYS_CONNECT_BY_PATH (n, ','), ',') permutations
FROM (SELECT LEVEL l, SUBSTR ('&col', LEVEL, 1) n
FROM DUAL
CONNECT BY LEVEL <= LENGTH ('&col')) yourtable
CONNECT BY NOCYCLE l != PRIOR l)
WHERE LENGTH (permutations) = LENGTH ('&col')
In the example col is defined in SQL*Plus, but if you make this a function you can pass it in, or could rework it to take a table column directly I suppose.
I'd take that as a start point rather than a solution; the original question was about anagrams so it's designed to find all permutations, so something similar but simplified might be possible. I suspect this doesn't scale very well for large values.

So eventually I went PL/SQL route, because after searching for some time I realized that there is no build-in function that I can use.
Here is what I came up with. Its based on the future of associative array which is that Oracle keeps the keys in sorted order.
create or replace function sort_chars(p_string in varchar2) return varchar deterministic
as
rv varchar2(4000);
ch varchar2(1);
type vcArray is table of varchar(4000) index by varchar2(1);
sorted vcArray;
key varchar2(1);
begin
for i in 1 .. length(p_string)
loop
ch := substr(p_string, i, 1);
if (sorted.exists(ch))
then
sorted(ch) := sorted(ch) || ch;
else
sorted(ch) := ch;
end if;
end loop;
rv := '';
key := sorted.FIRST;
WHILE key IS NOT NULL LOOP
rv := rv || sorted(key);
key := sorted.NEXT(key);
END LOOP;
return rv;
end;
Simple performance test:
set timing on;
create table test_sort_fn as
select t1.object_name || rownum as test from user_objects t1, user_objects t2;
select count(distinct test) from test_sort_fn;
select count (*) from (select sort_chars(test) from test_sort_fn);
Table created.
Elapsed: 00:00:01.32
COUNT(DISTINCTTEST)
-------------------
384400
1 row selected.
Elapsed: 00:00:00.57
COUNT(*)
----------
384400
1 row selected.
Elapsed: 00:00:00.06

You could use the following query:
select listagg(letter)
within group (order by UPPER(letter), ASCII(letter) DESC)
from
(
select regexp_substr('gfedcbaGFEDCBA', '.', level) as letter from dual
connect by regexp_substr('gfedcbaGFEDCBA', '.', level) is not null
);
The subquery splits the string into records (single character each) using regexp_substr, and the outer query merges the records into one string using listagg, after sorting them.
Here you should be careful, because alphabetical sorting depends on your database configuration, as Cine pointed.
In the above example the letters are sorted ascending "alphabetically" and descending by ascii code, which - in my case - results in "aAbBcCdDeEfFgG".
The result in your case may be different.
You may also sort the letters using nlssort - it would give you better control of the sorting order, as you would get independent of your database configuration.
select listagg(letter)
within group (order by nlssort(letter, 'nls_sort=german')
from
(
select regexp_substr('gfedcbaGFEDCBA', '.', level) as letter from dual
connect by regexp_substr('gfedcbaGFEDCBA', '.', level) is not null
);
The query above would give you also "aAbBcCdDeEfFgG", but if you changed "german" to "spanish", you would get "AaBbCcDdEeFfGg" instead.

You should remember that there is no common agreement what "alphabetically" means. It all depends on which country it is, and who is looking at your data and what context it is in.
For instance in DK, there are a large number of different sortings of a,aa,b,c,æ,ø,å
per the alphabet: a,aa,b,c,æ,ø,å
for some dictionary: a,aa,å,b,c,æ,ø
for other dictionaries: a,b,c,æ,ø,aa,å
per Microsoft standard: a,b,c,æ,ø,aa,å
check out http://www.siao2.com/2006/04/27/584439.aspx for more info. Which also happens to be a great blog for issues as these.

Assuming you don't mind having the characters returned 1 per row:
select substr(str, r, 1) X from (
select 'CAB' str,
rownum r
from dual connect by level <= 4000
) where r <= length(str) order by X;
X
=
A
B
C

For people using Oracle 10g, select listagg within group won't work. The accepted answer does work, but it generates every possible permutation of the input string, which results in terrible performance - my Oracle database struggles with an input string only 10 characters long.
Here is another alternative working for Oracle 10g. It's similar to Jeffrey Kemp's answer, only the result isn't splitted into rows:
select replace(wm_concat(ch), ',', '') from (
select substr('CAB', level, 1) ch from dual
connect by level <= length('CAB')
order by ch
);
-- output: 'ABC'
wm_concat simply concatenates records from different rows into a single string, using commas as separator (that's why we are also doing a replace later).
Please note that, if your input string had commas, they will be lost. Also, wm_concat is an undocumented feature, and according to this answer it has been removed in Oracle 12c. Use it only if you're stuck with 10g and don't have a better option (such as listagg, if you can use 11g instead).

From Oracle 12, you can use:
SELECT *
FROM table_name t
CROSS JOIN LATERAL (
SELECT LISTAGG(SUBSTR(t.value, LEVEL, 1), NULL) WITHIN GROUP (
ORDER BY SUBSTR(t.value, LEVEL, 1)
) AS ordered_value
FROM DUAL
CONNECT BY LEVEL <= LENGTH(t.value)
)
Which, for the sample data:
CREATE TABLE table_name (value) AS
SELECT 'ZYX' FROM DUAL UNION ALL
SELECT 'HELLO world' FROM DUAL UNION ALL
SELECT 'aæøåbcæøåaæøå' FROM DUAL;
Outputs:
VALUE
ORDERED_VALUE
ZYX
XYZ
HELLO world
EHLLOdlorw
aæøåbcæøåaæøå
aabcåååæææøøø
If you want to change how the letters are sorted then use NLSSORT:
SELECT *
FROM table_name t
CROSS JOIN LATERAL (
SELECT LISTAGG(SUBSTR(t.value, LEVEL, 1), NULL) WITHIN GROUP (
ORDER BY NLSSORT(SUBSTR(t.value, LEVEL, 1), 'NLS_SORT = BINARY_AI')
) AS ordered_value
FROM DUAL
CONNECT BY LEVEL <= LENGTH(t.value)
)
Outputs:
VALUE
ORDERED_VALUE
ZYX
XYZ
HELLO world
dEHLLlOorw
aæøåbcæøåaæøå
aaåååbcøøøæææ
db<>fiddle here

Oracle SQL Developer Query on recommended password setup

Setting up a random password for user using
select
dbms_random.string('L',2) || dbms_random.string('X',6) || '1!' as deflvrpwd,
'${access_request_cri_acc_cas9}' as ACNTDN
from dual
New requirement
New Hire Details:
Name :John Doe
Region: America
WDID : 876214
WDID Reverse and split
Region in the middle with the letter A replaced with # symbol
Should read if we follow your formula.
= 412#meric#s678
Please suggest attribute are same as mentioned.
Thank You

Here's one option; read comments within code.
SQL> WITH
2 -- sample data
3 test (name, region, wdid)
4 AS
5 (SELECT 'John Doe', 'America', '876214' FROM DUAL),
6 temp
7 AS
8 -- reverse WDID; don't use undocumented REVERSE function
9 -- replace "A" (or "a") with "#" in REGION
10 ( SELECT name,
11 REPLACE (REPLACE (region, 'A', '#'), 'a', '#') new_region,
12 LISTAGG (letter, '') WITHIN GROUP (ORDER BY lvl DESC) new_wdid
13 FROM ( SELECT SUBSTR (wdid, LEVEL, 1) letter,
14 LEVEL lvl,
15 name,
16 region
17 FROM test
18 CONNECT BY LEVEL <= LENGTH (wdid))
19 GROUP BY name, region)
20 -- finally
21 SELECT SUBSTR (new_wdid, 1, 3) || new_region || SUBSTR (new_wdid, 4) AS result
22 FROM temp;
RESULT
--------------------------------------------------------------------------------
412#meric#678
SQL>
I don't know where s in your result comes from (this: 412#meric#s678).

There's a small cost in context switching between SQL and PL/SQL, but this doesn't sound like a high-volume or performance-critical thing, so you might find it cleaner to put the logic in a function:
create or replace function get_password (p_wdid varchar2, p_region varchar2)
return varchar2 as
l_split pls_integer;
l_password varchar2(30);
begin
-- split WDID halfway, but allow for odd lengths
l_split := floor(length(p_wdid)/2);
-- iterate over the WDID in reverse
for i in reverse 1..length(p_wdid) LOOP
-- when we reach the split point, append the modified region
if i = l_split then
l_password := l_password || translate(p_region, 'Aax', '##x');
end if;
-- append each WDID character, in reverse order
l_password := l_password || substr(p_wdid, i, 1);
end loop;
return l_password;
end get_password;
/
The WDID is reversed in a loop, and the modified region is included at the midway point, based on the length of the WDID value.
You can then do:
select get_password('876214', 'America') from dual;
GET_PASSWORD('876214','AMERICA')
--------------------------------
412#meric#678
This also doesn't have the unexplained 's' from the example in your question.
If you can't create a function but are on a recent version of Oracle then you can define an ad hoc function in a CTE:
with
function invert (p_input varchar2) return varchar2 as
l_output varchar2(30);
begin
for i in reverse 1..length(p_input) LOOP
l_output := l_output || substr(p_input, i, 1);
end loop;
return l_output;
end invert;
t (wdid, region) as (
select invert('876214'), translate('America', 'Aax', '##x')
from dual
)
select substr(wdid, 1, floor(length(wdid)/2))
|| region
|| substr(wdid, floor(length(wdid)/2) + 1)
from t;
which gets the same result. (I've called the function invert to avoid confusion with the undocumented reverse function.)
db<>fiddle showing both.

Printing the position of the bigger letter in a list of characters

I am having some logic thinking trouble with this task.
So the task asks to return the position of the first bigger letter in a list of letters.
For example:
ABVD -> 3
BCDG -> 4
CFDE -> 2
This tasks suggests to use lenght, ascii, and named block, function
So this is what I could do so far:
declare
x varchar2(10) :='ABFD';
BEGIN
FOR i in 1..length(x) LOOP
dbms_output.put_line(ASCII(SUBSTR(x, i, 1)));
END LOOP;
END;
My thought was to turn the letters to numbers : 65, 66, 70, 68. The pattern is x + 1 and since the number 70 is not equal 66 + 1, so the program will return the position of that number, which is 3.
Unfortunately I don't know how turn this idea into code. Can you give me some hints/suggestions? Thanks!

In the problem statement you said "... use named block, function."
Your solution is an anonymous procedure. It is not named anywhere (which is why it is called "anonymous"). And it is not a function - it doesn't return anything.
I will let you study the documentation to understand the difference between function and procedure, and how to name a function or procedure. Below I will follow your lead and show how you can modify your code to make it into a workable anonymous procedure. (In the procedure I "print" the final value of ind; when you change this to a function, you should return that value, instead of printing it.)
In the code you posted, you are printing the letters in the input string, one by one. You are not even attempting to define or assign to an integer (the index of the first occurrence of the "highest" letter in the string). That should be done in the DECLARE block. Then we also need to store the highest letter found "so far" (for future comparisons).
The code might look like this:
declare
x varchar2(10) :='ABFD';
ind number := 1;
max_letter char(1) := substr(x, 1, 1);
BEGIN
FOR i in 2..length(x) LOOP
if substr(x, i, 1) > max_letter
then max_letter := substr(x, i, 1);
ind := i;
end if;
END LOOP;
dbms_output.put_line(ind);
END;
/
Note that letters can be compared to each other directly, there is no reason to convert them to numbers.

Pure SQL using model clause
with t(str) as
(select 'ABVD' from dual
union all select 'BCDG' from dual
union all select 'CFDE' from dual)
select str, instr(str, max_chr) ind
from t
model
partition by (rownum rn)
dimension by (1 dummy)
measures (str, chr(1) max_chr)
rules
iterate (4e3) until (substr(str[1], iteration_number + 2, 1) is null)
(max_chr[1] = greatest(max_chr[1], substr(str[1], iteration_number + 1, 1)));
STR IND
---- ----------
ABVD 3
BCDG 4
CFDE 2

License check with REGEXP in PL-SQL Oracle Apex

create or replace trigger "KENTEKEN_CHECK"
after insert or update of kenteken
on auto
for each row
declare
kenteken varchar2;
teller number := 0;
tellerletter number := 0;
tellercijfer number := 0;
begin
kenteken := lower(:NEW.kenteken);
loop
if substr(kenteken, teller, 1) = REGEXP ("[eoiau]") then
raise_application_error (-20502, 'Kenteken kan geen klinkers bevatten.');
elsif substr(kenteken, teller, 1) = REGEXP ("[0987654321]") then
tellercijfer := tellercijfer + 1;
elsif substr(kenteken, teller, 1) = REGEXP ("[qwrtypsdfghjklzxcvbnm]") then
tellerletter := tellerletter + 1;
else raise_application_error (-20502, 'Er is een ongeldig kenteken ingevoerd.');
end if;
teller := teller + 1;
exit when teller = 5;
end loop;
end;
I need to check a license plate (has six characters). It needs at least 2 letters and 2 numbers and the e, a, o, u and i are not allowed. How should I use the REGEXP the right way to check this?
Teller stands for counter
Tellernumber stands for a counter for the numbers
Tellerletter stands for a counter for the letters
Yes, I'm a starter with this so don't blame my style of coding... Never worked with REGEXP before so don't know how to use it.
Hope it's clear.

This could be an approach:
with test(s) as (
select 'axx11a' from dual union all
select 'xxxx1x' from dual union all
select '111111' from dual union all
select 'xx1111' from dual union all
select 'x1x1x1' from dual
)
select s
from test
where regexp_count(s, '[aeiou]') = 0
and regexp_count(s, '[qwrtypsdfghjklzxcvbnm]') >= 2
and regexp_count(s, '[0-9]') >= 2
The regexp_count counts the number of occurrences of the regular expression in the string; you may use it both to check that you have at least 2 characters of a given set and to check that you do not have unwanted characters.
You can rewrite this in different ways, I used the regexp_count to check all the cases to make it clearer.

You can do this with standard string functions (instead of regexp), which should result in improved performance - if that is a consideration.
To test whether a character is present, you can use TRANSLATE to remove that character from the string, and then compare the length of the resulting string to the length of the original. You need a small trick - TRANSLATE will remove (delete) the characters from the FROM list that do not have a correspondent in the TO list, but you can't have an empty TO list (if you do, the result will be the NULL string).
So, something like this:
with test( s ) as (
select 'axx11a' from dual union all
select 'xxxx1x' from dual union all
select '111111' from dual union all
select 'xx1111' from dual union all
select 'x1x1x1' from dual
)
-- end of test data; solution (SQL query) begins below this line
select s
from test
where length(translate(s, '~aeiou' , '~')) = length(s)
and length(translate(s, '~0123456789' , '~')) <= length(s) - 2
and length(translate(s, '~qwrtypsdfghjklzxcvbnm', '~')) <= length(s) - 2
;
S
------
xx1111
x1x1x1
In your procedure you can use these tests separately, in your IF... clauses.

It's important to realize that Oracle supports POSIX Extended Regular Expressions, which is more restricted than PCRE Regexes that you find most places.
Because POSIX ERE does not support so-called lookaheads, I would use these regexes:
.*[A-Z].*[A-Z]
To check if there are two letters (anywhere in the license plate).
.*[0-9].*[0-9]
To check if there are two numbers.
^[A-Z0-9]{6}$
To check if there are vowels I would use NOT REGEX_LIKE() with this regex:
[AEIOU]
To check whether the license plate consists of 6 numbers or letters from the start (^) to the end ($) of the string. This will fail if there are any other characters, or a different number of characters other than exactly 6.

([qwrtypsdfghjklzxcvbnm]{2}\d{2}) ==> dd02
(\d{2}[qwrtypsdfghjklzxcvbnm]{2}) ==> 02dd
([qwrtypsdfghjklzxcvbnm]{2}-\d{2}) ==> dd-02
I think that this will do your job.
Succes!!
Groet

Performance of regexp_replace vs translate in Oracle?

For simple things is it better to use the translate function on the premise that it is less CPU intensive or is regexp_replace the way to go?
This question comes forth from How can I replace brackets to hyphens within Oracle REGEXP_REPLACE function?

I think you're running into simple optimization. The regexp expression is so expensive to compute that the result is cached in the hope that it will be used again in the future. If you actually use distinct strings to convert, you will see that the modest translate is naturally faster because it is its specialized function.
Here's my example, running on 11.1.0.7.0:
SQL> DECLARE
2 TYPE t IS TABLE OF VARCHAR2(4000);
3 l t;
4 l_level NUMBER := 1000;
5 l_time TIMESTAMP;
6 l_char VARCHAR2(4000);
7 BEGIN
8 -- init
9 EXECUTE IMMEDIATE 'ALTER SESSION SET PLSQL_OPTIMIZE_LEVEL=2';
10 SELECT dbms_random.STRING('p', 2000)
11 BULK COLLECT
12 INTO l FROM dual
13 CONNECT BY LEVEL <= l_level;
14 -- regex
15 l_time := systimestamp;
16 FOR i IN 1 .. l.count LOOP
17 l_char := regexp_replace(l(i), '[]()[]', '-', 1, 0);
18 END LOOP;
19 dbms_output.put_line('regex :' || (systimestamp - l_time));
20 -- tranlate
21 l_time := systimestamp;
22 FOR i IN 1 .. l.count LOOP
23 l_char := translate(l(i), '()[]', '----');
24 END LOOP;
25 dbms_output.put_line('translate :' || (systimestamp - l_time));
26 END;
27 /
regex :+000000000 00:00:00.979305000
translate :+000000000 00:00:00.238773000
PL/SQL procedure successfully completed
on 11.2.0.3.0 :
regex :+000000000 00:00:00.617290000
translate :+000000000 00:00:00.138205000
Conclusion: In general I suspect translate will win.

For SQL, I tested this with the following script:
set timing on
select sum(length(x)) from (
select translate('(<FIO>)', '()[]', '----') x
from (
select *
from dual
connect by level <= 2000000
)
);
select sum(length(x)) from (
select regexp_replace('[(<FIO>)]', '[\(\)\[]|\]', '-', 1, 0) x
from (
select *
from dual
connect by level <= 2000000
)
);
and found that the performance of translate and regexp_replace were almost always the same, but it could be that the cost of the other operations is overwhelming the cost of the functions I'm trying to test.
Next, I tried a PL/SQL version:
set timing on
declare
x varchar2(100);
begin
for i in 1..2500000 loop
x := translate('(<FIO>)', '()[]', '----');
end loop;
end;
/
declare
x varchar2(100);
begin
for i in 1..2500000 loop
x := regexp_replace('[(<FIO>)]', '[\(\)\[]|\]', '-', 1, 0);
end loop;
end;
/
Here the translate version takes just under 10 seconds, while the regexp_replace version around 0.2 seconds -- around 2 orders of magnitude faster(!)
Based on this result, I will be using regular expressions much more often in my performance critical code -- both SQL and PL/SQL.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

Oracle query to find all occurrences of a charcter in a string - sql

Related

Can we sort the characters of a string in SQL oracle? [duplicate]

Oracle SQL Developer Query on recommended password setup

Printing the position of the bigger letter in a list of characters

License check with REGEXP in PL-SQL Oracle Apex

Performance of regexp_replace vs translate in Oracle?

Categories

Resources