Oracle regular expression match string from last occurrence - sql

I'm still learning regexp in Oracle and am stuck on the error below. Here is my sample code:
SELECT DISTINCT COALESCE(TO_NUMBER(regexp_substr(USERNAME, '[^.]+', 1, 2)), ID) ID ,
COALESCE(regexp_substr(USERNAME, '[^.]+', 1, 1), USERNAME) AS USERNAME
FROM logs;
ORA-01722: invalid number
01722. 00000 - "invalid number"
*Cause: The specified number was invalid.
*Action: Specify a valid number.
Table Data
Username ID
Ravi.1234 1234
Krishna.12345 12345
Ravi.Krishna.1234567 1234567
R.Krishna.987 987
Ravi.K.567890 567890
R.Krish 123
Ravi 456
Expected Output
ID Username
1234 Ravi
12345 Krishna
1234567 Ravi.Krishna
987 R.Krishna
567890 Ravi.K
How can I rewrite the query to get the required output? Can substr be used instead of regexp, and would it give the desired output? This is for an Oracle database. Thanks in advance.

If I understood your assignment correctly (see my comments under your question), here is how you can do this with standard string functions and conditions:
with
table_data (username, id) as (
select 'Ravi.1234' , '1234' from dual union all
select 'Krishna.12345' , '12345' from dual union all
select 'Ravi.Krishna.1234567', '1234567' from dual union all
select 'R.Krishna.987' , '987' from dual union all
select 'Ravi.K.567890' , '567890' from dual union all
select 'R.Krish' , '123' from dual union all
select 'Ravi' , '456' from dual
)
select id, substr(username, 1, instr(username, '.', -1) - 1) as username
from table_data
where username like '%.' || id
;
ID USERNAME
------- --------------------
1234 Ravi
12345 Krishna
1234567 Ravi.Krishna
987 R.Krishna
567890 Ravi.K
In the LIKE condition in the WHERE clause, % is a wildcard for "any string of any length, including zero"; it must be followed by a literal dot and then by the ID, and together they must make up the whole USERNAME string. In the SELECT list, instr(username, '.', -1) finds the position of the "first" dot in username, but counting from the end and moving left; that is what the minus sign means.
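As a minimal sketch of the backward search (the inline test rows and the column aliases are made up for illustration), the query below shows what instr returns with a negative position, and what substr does with that result when there is no dot at all:

```sql
-- instr with a negative position starts at the end of the string and searches leftwards
select username,
       instr(username, '.', -1)                          as last_dot_pos,
       substr(username, 1, instr(username, '.', -1) - 1) as before_last_dot,
       substr(username, instr(username, '.', -1) + 1)    as after_last_dot
from (
  select 'Ravi.Krishna.1234567' as username from dual union all
  select 'Ravi' from dual
);
```

For 'Ravi.Krishna.1234567' the last dot is at position 13, so the two pieces are 'Ravi.Krishna' and '1234567'. For 'Ravi' instr returns 0, so before_last_dot is NULL (the requested length is less than 1) and after_last_dot is the whole string; this is why the answer above filters such rows out in the WHERE clause.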
With regular expression functions and conditions:
select id, regexp_substr(username, '^(.*)\.' || id || '$', 1, 1, null, 1) as username
from table_data
where regexp_like(username, '\.' || id || '$')
;
The sixth argument to regexp_substr means "the first substring enclosed in parentheses" (first "capture group" is the technical term).

I think REGEXP_REPLACE() would suit your case well, combined with a filter that keeps only the values containing at least one digit. In your current query you try to convert the second portion of each Username string to a number, but not all of those portions are numeric, which is what raises the error. Moreover, you can extract the ID from the Username column itself, so there is no need to hold a separate ID column in your original table.
Thus, consider using
SELECT TO_NUMBER( REGEXP_REPLACE(Username, '[^0-9]+') ) AS ID,
RTRIM( REGEXP_REPLACE(Username, '[^.]+$'),'.') AS "Username"
FROM logs
WHERE REGEXP_LIKE(Username,'[0-9]')
The following option is an alternative to the one above that uses fewer regular expressions:
SELECT TO_NUMBER( SUBSTR( Username, INSTR(Username, '.',-1)+1, LENGTH( Username ) )) AS ID,
SUBSTR( Username, 1, INSTR(Username, '.',-1)-1 ) AS "Username"
FROM logs
WHERE REGEXP_LIKE(Username,'[0-9]')
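To see where ORA-01722 comes from, it may help to look at what the original pattern actually extracts. This is only an illustrative sketch; the inline rows are copied from the question's sample data and the aliases are made up:

```sql
-- The question's query passes second_token to TO_NUMBER;
-- last_token is the piece that really holds the number (when there is one).
select username,
       regexp_substr(username, '[^.]+', 1, 2) as second_token,
       regexp_substr(username, '[^.]+$')      as last_token
from (
  select 'Ravi.1234' as username from dual union all
  select 'Ravi.Krishna.1234567' from dual union all
  select 'R.Krish' from dual
);
```

For 'Ravi.Krishna.1234567' the second token is 'Krishna' and for 'R.Krish' it is 'Krish', so TO_NUMBER fails on those rows. Only 'Ravi.1234' has a numeric second token.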

Related

Scalar function throws error while using in SQL

My question is:
Write a query to display user name and password. The password should be generated by concatenating the first two characters of the user name, the length of the user name and the last three digits of the phone number, with the alias USER_PASSWORD. Sort the results by user name in descending order.
select
name,
concat(substring(name, 1, 2), cast(len(name) as varchar), cast(right(phno, 3) as varchar)) as USER_PASSWORD
from
users
order by
name desc;
I get this error:
cast(len(name) as varchar),
ERROR at line 5: ORA-00906: missing left parenthesis
Thanks
You have five issues:
CONCAT only takes two arguments so you either need CONCAT(a, CONCAT(b, c)) or use the || string concatenation operator a || b || c
CAST requires the data type and length CAST(a AS VARCHAR2(10))
SUBSTRING is not an Oracle function, you want SUBSTR;
LEN is not an Oracle function, you want LENGTH;
RIGHT is not an Oracle function, you want SUBSTR with a negative index.
SELECT name,
concat(
substr(name, 1, 2),
concat(
cast(length(name) as varchar2(10)),
cast(SUBSTR(phno, -3) as varchar2(10))
)
) as USER_PASSWORD
from users
order by name desc;
However, you do not need to explicitly use CAST as you can use an implicit conversion between data types:
SELECT name,
substr(name, 1, 2) || length(name) || SUBSTR(phno, -3) as USER_PASSWORD
from users
order by name desc;
Which, for the sample data:
CREATE TABLE users (name, phno) AS
SELECT 'Benny', '0123111' FROM DUAL UNION ALL
SELECT 'Betty', '4567111' FROM DUAL UNION ALL
SELECT 'Beryl', '2222111' FROM DUAL;
Both output:
| NAME  | USER_PASSWORD |
| ------|---------------|
| Betty | Be5111        |
| Beryl | Be5111        |
| Benny | Be5111        |
Which leads to the final point: don't generate obvious passwords; generate random or pseudo-random ones. Then don't store them as plain text; store them as salted hashes instead.
Concat() is limited to two arguments in Oracle. Use || instead.
with my_data as (
select 'abcdefg' as name, 12345 as phno from dual
)
select
name,
substr(name, 1, 2) ||
length(name) ||
substr(to_char(phno),-3) as user_password
from my_data
| NAME | USER_PASSWORD |
| --------|---------------|
| abcdefg | ab7345 |

SQL: using regexp_substr or regexp_extract, looking for the regex pattern that will only return the string between one character and a space

The row I am trying to parse from is a series of string values separated only by spaces. Sample below:
TX:123 SP:XapZNsyeS INST:456123
I need to use either regexp_substr or regexp_extract to return only values for the string that appears after "TX:" or "SP:", etc. So essentially an expression that only captures the string after a string (e.g. "TX:") and before a space (" ").
Here's one way to split on 2 delimiters. This works on Oracle 12c as you included the Oracle regexp-substr tag. Using a with statement, first set up the original data, then split on a space or the end of the line, then break into name-value pairs.
WITH tbl_original_data(ID, str) AS (
SELECT 1, 'TX:123 SP:XapZNsyeS INST:456123' FROM dual UNION ALL
SELECT 2, 'MI:321 SP:MfeKLgkrJ INST:654321' FROM dual
),
tbl_split_on_space(ID, ELEMENT) AS (
SELECT ID,
REGEXP_SUBSTR(str, '(.*?)( |$)', 1, LEVEL, NULL, 1)
FROM tbl_original_data
CONNECT BY REGEXP_SUBSTR(str, '(.*?)( |$)', 1, LEVEL) IS NOT NULL
AND PRIOR ID = ID
AND PRIOR SYS_GUID() IS NOT NULL
)
--SELECT * FROM tbl_split_on_space;
SELECT ID,
REGEXP_REPLACE(ELEMENT, '^(.*):.*', '\1') NAME,
REGEXP_REPLACE(ELEMENT, '.*:(.*)$', '\1') VALUE
FROM tbl_split_on_space;
ID NAME VALUE
---------- ---------- ----------
1 TX 123
1 SP XapZNsyeS
1 INST 456123
2 MI 321
2 SP MfeKLgkrJ
2 INST 654321
6 rows selected.
EDIT: Realizing this answer is a little more than was asked for, here's a simplified answer to return one element. Don't forget to allow for either a space or the end of the line after the value, in case your element is at the end of the line.
WITH tbl_original_data(ID, str) AS (
SELECT 1, 'TX:123 SP:XapZNsyeS INST:456123' FROM dual
)
SELECT REGEXP_SUBSTR(str, '.*?TX:(.*?)( |$)', 1, 1, NULL, 1) TX_VALUE
FROM tbl_original_data;
TX_VALUE
--------
123
1 row selected.

Update ID value to format XXXXXXXX-X using oracle SQL

Table name: TEST
Column name: ID [VARCHAR(200)]
The format of ID is ‘XXXXXXXX-X’, where ‘X’ is a number from 0 to 9.
Additional operations in case above format is not satisfied:
if the ID consists of 9 digits and there is a double dash between the eighth and ninth digits, the extra dash is removed (e.g. 08452142--6 -> 08452142-6)
if the ID consists of 9 digits and there are spaces and/or other symbols that are neither digits nor letters between the eighth and ninth digits, replace them with a single dash (e.g. 08452142 - . 3 -> 08452142-3)
if the ID consists of 9 digits and starts or ends with symbols that are neither digits nor letters, delete those symbols up to the nearest digit (e.g. 08452142-2.. -> 08452142-2)
if the ID contains only 9 digits, put a dash before the last digit (e.g. 123456789 -> 12345678-9)
I have achieved the necessary format by using the below snippet.
UPDATE TEST
SET ID = (SELECT REGEXP_REPLACE(ID,'^\d{8}-\d{1}$','') AS "ID"
from TEST
WHERE PK = 11
);
What are the possible ways to add transformations as mentioned in points[1-4] above in a single query?
Using REGEXP_REPLACE, I can achieve ID in above format. But in case format is incorrect, and ID needs to be transformed[like removing extra dash, or adding dash in case 9 digits are received] to achieve satisfactory format, how can that be achieved in a single UPDATE query?
In any case, you need to extract the 9 digits from your string in the first step, and then add a hyphen before the last character. For both steps, use the regexp_replace() function:
with test(id) as
(
select '08452142--6' from dual union all
select '08452142 - . 3' from dual union all
select '08452142-2..' from dual union all
select '123456789' from dual union all
select '1234567890' from dual
)
select case when length(regexp_replace(id,'(\D)'))=9 then
regexp_replace(regexp_replace(id,'(\D)'),
'(^[[:digit:]]{8})(.*)([[:digit:]]{1}$)','\1-\3')
end as id
from test;
ID
----------
08452142-6
08452142-3
08452142-2
12345678-9
<null>
You can use the following I think:
UPDATE TEST
SET ID = REGEXP_REPLACE(ID,'^\D*(\d{8})\D*(\d)\D*$','\1-\2')
WHERE REGEXP_LIKE(ID,'^\D*(\d{8})\D*(\d)\D*$')
This way you ignore all non-digit characters and search for an 8-digit number followed by a 1-digit number. Take these two numbers and put a single '-' between them.
This is a little more generous than you might need, but it should work with all the examples you provided.
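Before running the UPDATE, the same pattern can be previewed with a plain SELECT. The sketch below is only an illustration: it reuses the sample IDs from the question as inline rows, and new_id is a made-up alias.

```sql
-- Preview of the transformation; rows that do not match the pattern show NULL
with test(id) as (
  select '08452142--6'    from dual union all
  select '08452142 - . 3' from dual union all
  select '08452142-2..'   from dual union all
  select '123456789'      from dual union all
  select '1234567890'     from dual
)
select id,
       case when regexp_like(id, '^\D*(\d{8})\D*(\d)\D*$')
            then regexp_replace(id, '^\D*(\d{8})\D*(\d)\D*$', '\1-\2')
       end as new_id
from test;
```

The first four rows come out as 08452142-6, 08452142-3, 08452142-2 and 12345678-9. The 10-digit value does not match, so the WHERE clause of the UPDATE would leave it untouched.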
I think you want the first 8 digits, then a hyphen, then the 9th digit:
select ( substr(regexp_replace(id, '[^0-9]', ''), 1, 8) ||
'-' ||
substr(regexp_replace(id, '[^0-9]', ''), 9, 1)
) as id
from test;
I tried an approach based on the suggestion by @BarbarosÖzhan:
with source as (
select '02426467--6' id from dual union all
select '02426467-6' id from dual union all
select '02597718 -- . 3' id from dual union all
select '02597718 --dF5 . 3' id from dual union all
select '00120792-2..' id from dual union all
select '..00120792-2..' id from dual union all
select '123456789' id from dual union all
select '1234567890' id from dual
)
select
case
when regexp_like(id, '\d{8}-\d{1}')
then id
else
case
when regexp_count(id, '\d') = 9
then
case
when
regexp_like(
regexp_replace(
regexp_replace(
id, '(\d{8}-)(-)(\d{1})', '\1\3'
), '(\d{8})([^A-Za-z1-9])(\d{1})', '\1-\3'
)
, '\d{8}-\d{1}')
then
regexp_replace(
regexp_replace(
id, '(\d{8}-)(-)(\d{1})', '\1\3'
), '(\d{8})([^A-Za-z1-9])(\d{1})', '\1-\3'
)
else id
end
else id
end
end id_tr
from source
However, in cases 3 and 4 I cannot get rid of the spaces, dots and letters. I think something is wrong with the logic for the case where the length is more than 9: I end up in the "else id" branch, so the result is the same as the input, without any modifications.
Any suggestions to improve this?

find invalid characters in string

I need a select statement that will show any invalid characters in Customer number field.
A valid customer number starts with the capital letter N followed by 10 digits, each from zero to 9.
Something like,
SELECT (CustomerField, 'N[0-9](10)') <> ''
FROM CustomerTable;
Use regexp_like.
select customerfield
from CustomerTable
where not regexp_like(CustomerField, '^N[0-9]{10}$')
This will show the customerfield values that don't follow the specified pattern.
If you really need to find the invalid characters in the string (and not to just simply find the strings that are invalid) perhaps this more complex query will help. You didn't state in what format you may need the output, so I made up my own. I also created several strings for testing (in particular, it is always important to check that the NULL input is treated correctly).
The column len shows the length of the input, if it's not 11. The length of the empty string (null in Oracle) is shown as 0. The first-nondigit columns refer to characters starting at the SECOND position in the string (ignoring the first character, for which the rules are different and which is checked for validity separately).
with
inputs ( str ) as (
select 'N0123456789' from dual union all
select '' from dual union all
select '02324434323' from dual union all
select 'N02345678' from dual union all
select 'A2140480080' from dual union all
select 'N93049c4995' from dual union all
select 'N4448883333' from dual union all
select 'PAR3993949Z' from dual union all
select 'AN39E' from dual
)
-- end of test data; query begins below this line
select str,
case when regexp_like(str, '^N\d{10}$') then 'valid'
else 'invalid' end as classif,
case when length(str) != 11 then length(str)
when str is null then 0 end as len,
case when substr(str, 1, 1) != 'N'
then substr(str, 1, 1) end as first_char,
regexp_substr(str, '[^0-9]', 2) as first_nondigit,
nullif(regexp_instr( str, '[^0-9]', 2), 0) as first_nondigit_pos
from inputs
;
OUTPUT
STR         CLASSIF   LEN FIRST_CHAR FIRST_NONDIG FIRST_NONDIGIT_POS
----------- ------- ----- ---------- ------------ ------------------
N0123456789 valid
            invalid     0
02324434323 invalid       0
N02345678   invalid     9
A2140480080 invalid       A
N93049c4995 invalid                  c                             7
N4448883333 valid
PAR3993949Z invalid       P          A                             2
AN39E       invalid     5 A          N                             2
9 rows selected.
\d stands for a digit; see "Perl-influenced Extensions in Oracle Regular Expressions" in the Oracle documentation.
The rest of the regular expression elements are listed under "Regular Expression Operator Multilingual Enhancements".
select *
from CustomerTable
where not regexp_like (CustomerField,'^N\d{10}$')

substring, after last occurrence of character?

I need help with this problem:
I have a column named phone_number and I want to query this column to get the string to the right of the last occurrence of '.' for all kinds of numbers in one single sql query.
example #:
515.123.1277
011.44.1345.629268
I need to get 1277 and 629268 respectively.
I have this so far:
select phone_number,
case when length(phone_number) <= 12
then
substr(phone_number,-4)
else
substr (phone_number, -6) end
from employees;
This works for this example, but I want it for all kinds of # formats.
Would be great to get some input.
Thanks
It should be as easy as this regex:
SELECT phone_number, REGEXP_SUBSTR(phone_number, '[^.]*$')
FROM employees;
With the end anchor $, it returns everything after the final . character, that is, the trailing run of characters that are not dots. If the last character is a ., it returns NULL.
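A quick way to check those edge cases is to run the expression against a few inline values. This is a sketch only; the last two rows are hypothetical inputs added to exercise the trailing-dot and no-dot cases:

```sql
-- '[^.]*$' matches the run of non-dot characters at the very end of the string
select phone_number,
       regexp_substr(phone_number, '[^.]*$') as last_part
from (
  select '515.123.1277' as phone_number from dual union all
  select '011.44.1345.629268' from dual union all
  select '515.123.' from dual union all   -- ends with a dot
  select '5151231277' from dual           -- no dot at all
);
```

The first two rows give 1277 and 629268. The value ending in a dot gives NULL, because the only possible match is the empty string, and the value without any dot is returned whole.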
Search for a pattern consisting of a period, [.], followed by digits, \d+, followed by the end of the string, $.
Put the digits in a capture group by placing the pattern, \d+, in parentheses (see below). The group is referenced by the subexpr parameter, 1 (the last parameter).
Here is the solution:
WITH t AS
( SELECT '414.352.3100' p_number FROM dual
UNION ALL
SELECT '515.123.1277' FROM dual
UNION ALL
SELECT '011.44.1345.629268' FROM dual
)
SELECT regexp_substr(t.p_number, '[.](\d+)$', 1, 1, NULL, 1) end_num FROM t;
END_NUM
========================================================================
3100
1277
629268
You can do something like this in oracle:
select regexp_substr(num,'[^\.]+',1,regexp_count(num,'\.')+1) last_number from
(select '515.123.1277' num from dual union all
select '011.44.1345.629268' from dual );
Previous to 11gR2 you can use regexp_replace instead of regexp_count:
select regexp_substr(num,'[^\.]+',1,length(regexp_replace (num , '[^\.]+'))+1) last_number from
(select '515.123.1277' num from dual union all
select '011.44.1345.629268' from dual );
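For comparison, the same result can also be had without any regular expression, using instr with a negative position to locate the last dot. This is an alternative sketch, not part of the answers above, and it reuses their two sample numbers:

```sql
-- instr(num, '.', -1) is the position of the last dot; substr takes everything after it
select num,
       substr(num, instr(num, '.', -1) + 1) as last_number
from (
  select '515.123.1277' num from dual union all
  select '011.44.1345.629268' from dual
);
```

This returns 1277 and 629268. If a value contains no dot, instr returns 0 and the whole string comes back unchanged.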