Escaping characters for use with Oracle's Xmltable - sql

I'm using Xmltable to convert a field of comma-delimited email addresses to a table of values.
WITH
data AS
(
select 1 ID, 'foo&bar@domain.tld,bar@domain.tld' recipients from dual
)
select ID, trim(COLUMN_VALUE) recipient
from data,xmltable(('"'|| REPLACE( recipients , ',', '","') || '"'))
produces an error:
[72000][19112] ORA-19112: error raised during evaluation: XVM-01003:
[XPST0003] Syntax error at '"foo' 1
"foo&bar@domain.tld","bar@domain.tld" - ^
However, when I replace the & with its entity value (&amp;):
WITH
DATA AS
(
select 1 ID, 'foo&bar@domain.tld,bar@domain.tld' recipients from dual
)
select ID
-- &amp; --> &
, replace( trim(COLUMN_VALUE), '&amp;', '&') recipient
from data
-- & --> &amp;
,xmltable(('"'|| REPLACE( replace( recipients, '&','&amp;') , ',', '","') || '"'))
the query works:
ID,RECIPIENT
1,foo&bar@domain.tld
1,bar@domain.tld
I imagine there may be other characters that are valid in an email address but problematic for Xmltable.
Is there a better way to do this?

You could use the built-in dbms_xmlgen.convert() function:
with data (id, recipients) as
(
select 1, 'foo&bar@domain.tld,bar@domain.tld' from dual
)
select d.id, dbms_xmlgen.convert(x.column_value.getstringval(), 1) as recipient
from data d
cross join
xmltable(('"' || replace(dbms_xmlgen.convert(d.recipients, 0), ',', '","') || '"')) x
ID RECIPIENT
---------- ------------------------------
1 foo&bar@domain.tld
1 bar@domain.tld
The inner call dbms_xmlgen.convert(d.recipients, 0) gives you
foo&amp;bar@domain.tld,bar@domain.tld
After that has been modified to have double quotes around each comma-separated value and been split into multiple rows, you end up with column_value as:
foo&amp;bar@domain.tld
bar@domain.tld
so the outer dbms_xmlgen.convert(x.column_value.getstringval(), 1) converts any encoded entities back to their plain versions.
If you do this in a PL/SQL context then you can use dbms_xmlgen.entity_encode and dbms_xmlgen.entity_decode instead of the fixed 0 and 1, but those aren't available in plain SQL.
(There are only five entities to worry about anyway, but you still might as well allow for them all - whether they are valid in email addresses or not - and using a function call like this is maybe going to be less confusing to future maintainers...)
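As a quick way to see what the two flags do, here is a minimal round-trip sketch. The sample string is made up for illustration, and chr(38) stands in for a literal ampersand only so that clients such as SQL*Plus don't treat it as a substitution variable:

```sql
-- Encode (flag 0) and then decode (flag 1) a string containing the
-- five characters that have predefined XML entities: & < > " '
-- The sample value is hypothetical.
with sample (s) as
(
  select 'a' || chr(38) || 'b <c> "d" ''e''' from dual
)
select s                                                 as original,
       dbms_xmlgen.convert(s, 0)                         as encoded,
       dbms_xmlgen.convert(dbms_xmlgen.convert(s, 0), 1) as decoded
from sample;
```

If the decoded column matches the original, the same pair of calls should be safe to wrap around the xmltable() split shown above.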

Related

Scalar function throws error while using in SQL

My question is:
Write a query to display user name and password. Password should be generated by concatenating first two characters of user name , length of the user name and last three numbers in the phone number and give an alias name as USER_PASSWORD. Sort the results based on the user name in descending order.
select
name,
concat(substring(name, 1, 2), cast(len(name) as varchar), cast(right(phno, 3) as varchar)) as USER_PASSWORD
from
users
order by
name desc;
I get this error:
cast(len(name) as varchar),
ERROR at line 5: ORA-00906: missing left parenthesis
Thanks
You have five issues:
CONCAT only takes two arguments, so you either need CONCAT(a, CONCAT(b, c)) or the || string concatenation operator: a || b || c;
CAST requires the data type and length: CAST(a AS VARCHAR2(10));
SUBSTRING is not an Oracle function, you want SUBSTR;
LEN is not an Oracle function, you want LENGTH;
RIGHT is not an Oracle function, you want SUBSTR with a negative index.
SELECT name,
concat(
substr(name, 1, 2),
concat(
cast(length(name) as varchar2(10)),
cast(SUBSTR(phno, -3) as varchar2(10))
)
) as USER_PASSWORD
from users
order by name desc;
However, you do not need to explicitly use CAST as you can use an implicit conversion between data types:
SELECT name,
substr(name, 1, 2) || length(name) || SUBSTR(phno, -3) as USER_PASSWORD
from users
order by name desc;
Which, for the sample data:
CREATE TABLE users (name, phno) AS
SELECT 'Benny', '0123111' FROM DUAL UNION ALL
SELECT 'Betty', '4567111' FROM DUAL UNION ALL
SELECT 'Beryl', '2222111' FROM DUAL;
Both output:
| NAME  | USER_PASSWORD |
| ------|---------------|
| Betty | Be5111 |
| Beryl | Be5111 |
| Benny | Be5111 |
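To see each piece of the expression on its own, here is a small sketch that uses the first sample row as literals (the column aliases are only illustrative, and TO_CHAR is written out to make the number-to-string conversion explicit):

```sql
-- Each building block evaluated separately for 'Benny' / '0123111'
select substr('Benny', 1, 2)   as first_two,     -- Be
       length('Benny')         as name_length,   -- 5
       substr('0123111', -3)   as last_three,    -- 111
       substr('Benny', 1, 2)
         || to_char(length('Benny'))
         || substr('0123111', -3) as user_password  -- Be5111
from dual;
```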
Which leads to the final point: don't generate obvious passwords; generate random or pseudo-random passwords. Then don't store them as plain text; store them as salted hashes instead.
Concat() is limited to two arguments in Oracle. Use || instead.
with my_data as (
select 'abcdefg' as name, 12345 as phno from dual
)
select
name,
substr(name, 1, 2) ||
length(name) ||
substr(to_char(phno),-3) as user_password
from my_data
| NAME | USER_PASSWORD |
| --------|---------------|
| abcdefg | ab7345 |

BigQuery - concatenate ignoring NULL

I'm very new to SQL. I understand in MySQL there's the CONCAT_WS function, but BigQuery doesn't recognise this.
I have a bunch of twenty fields I need to CONCAT into one comma-separated string, but some are NULL, and if one is NULL then the whole result will be NULL. Here's what I have so far:
CONCAT(m.track1, ", ", m.track2))) As Tracks,
I tried this but it returns NULL too:
CONCAT(m.track1, IFNULL(m.track2,CONCAT(", ", m.track2))) As Tracks,
Super grateful for any advice, thank you in advance.
Unfortunately, BigQuery doesn't support concat_ws(). So, one method is string_agg():
select t.*,
(select string_agg(track, ',')
from (select t.track1 as track union all select t.track2) x
) x
from t;
Actually a simpler method uses arrays:
select t.*,
array_to_string([track1, track2], ',') as tracks
from t;
Arrays with NULL values are not supported in result sets, but they can be used for intermediate results.
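A self-contained sketch of the array approach, in case it helps to try it out. The sample rows are made up; the column names follow the question. When ARRAY_TO_STRING is called without its optional third argument, NULL elements and their delimiters are left out of the result:

```sql
-- Hypothetical sample data with NULLs in different positions
with t as (
  select 'Intro' as track1, cast(null as string) as track2, 'Outro' as track3
  union all
  select 'Solo', 'Duet', cast(null as string)
)
select array_to_string([track1, track2, track3], ', ') as tracks
from t;
```

For the twenty columns in the question you would list all of them inside the square brackets.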
I have a bunch of twenty fields I need to CONCAT into one comma-separated string
Assuming these are the only fields in the table, you can use the approach below. It is generic enough to handle any number of columns, whatever their names, without enumerating them explicitly.
select
(select string_agg(col, ', ' order by offset)
from unnest(split(trim(format('%t', (select as struct t.*)), '()'), ', ')) col with offset
where not upper(col) = 'NULL'
) as Tracks
from `project.dataset.table` t
Below is a simplified dummy example for testing the approach:
#standardSQL
with `project.dataset.table` as (
select 1 track1, 2 track2, 3 track3, 4 track4 union all
select 5, null, 7, 8
)
select
(select string_agg(col, ', ' order by offset)
from unnest(split(trim(format('%t', (select as struct t.*)), '()'), ', ')) col with offset
where not upper(col) = 'NULL'
) as Tracks
from `project.dataset.table` t
with output
Tracks
1, 2, 3, 4
5, 7, 8

Find words in a dictionary db by given set of letters in SQL

Suppose we have a table of meaningful words (like a dictionary) and a "set of letters" as input. We want to find all the words in the source table that can be made from the given letters: not every given letter has to be used, but every letter of a result must be in the given set.
Dictionary
----------
in
ink
inn
inbox
Input: unmin
Result
------
in
inn
You can break the string into its letters using a recursive CTE and then make sure that the dictionary only has those letters:
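with cte as (
select left(@input, 1) as letter, stuff(@input, 1, 1, '') as rest
union all
select left(rest, 1), stuff(rest, 1, 1, '')
from cte
where rest <> ''
)
select d.word
from dictionary d join
(select distinct letter from cte) l
on d.word like '%' + l.letter + '%'
group by d.word
-- count how often each distinct input letter occurs in the word;
-- the word qualifies when those occurrences cover all of its characters
having sum(len(d.word) - len(replace(d.word, l.letter, ''))) = len(d.word)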
Here is a db<>fiddle.

Using Oracle REGEXP_SUBSTR to extract uppercase data separated by underscores

sample column data:
Failure on table TOLL_USR_TRXN_HISTORY:
Failure on table DOCUMENT_IMAGES:
Error in CREATE_ACC_STATEMENT() [line 16]
I am looking for a way to extract only the uppercase words (table names) separated by underscores. I want the whole table name, the maximum is 3 underscores and the minimum is 1 underscore. I would like to ignore any capital letters that are initcap.
You can just use regexp_substr():
select regexp_substr(str, '[A-Z_]{3,}', 1, 1, 'c')
from (select 'Failure on table TOLL_USR_TRXN_HISTORY' as str from dual) x;
The pattern says to find substrings with capital letters or underscores, at least 3 characters long. The 1, 1 means start from the first position and return the first match. The 'c' makes the search case-sensitive.
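If you also want to insist on one to three underscores, as the question asks, a tighter pattern is one option. A sketch against the three sample messages (the msgs CTE and its column name are made up for the example):

```sql
-- One run of capitals followed by one to three "_CAPITALS" groups
with msgs (str) as (
  select 'Failure on table TOLL_USR_TRXN_HISTORY:' from dual union all
  select 'Failure on table DOCUMENT_IMAGES:' from dual union all
  select 'Error in CREATE_ACC_STATEMENT() [line 16]' from dual
)
select str,
       regexp_substr(str, '[A-Z]+(_[A-Z]+){1,3}', 1, 1, 'c') as table_name
from msgs;
-- expected table_name values:
--   TOLL_USR_TRXN_HISTORY, DOCUMENT_IMAGES, CREATE_ACC_STATEMENT
```

A lone initial capital such as the F in Failure is skipped because it is not followed by an underscore. Note that a name with more than three underscores would come back truncated after the third group rather than being rejected.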
You can run a SELECT statement like the following against each individual line of your text (Failure on table TOLL_USR_TRXN_HISTORY in the case below):
select regexp_replace(q.word, '[^a-zA-Z0-9_]+', '') as word
from
(
select substr(str,nvl(lag(spc) over (order by lvl),1)+1*sign(lvl-1),
abs(decode(spc,0,length(str),spc)-nvl(lag(spc) over (order by lvl),1))) word,
nvl(lag(spc) over (order by lvl),1) lg
from
(
with tab as
( select 'Failure on table TOLL_USR_TRXN_HISTORY' str from dual )
select instr(str,' ',1,level) spc, str, level lvl
from tab
connect by level <= 10
)
) q
where lg > 0
and upper(regexp_replace(q.word, '[^a-zA-Z0-9_]+', ''))
= regexp_replace(q.word, '[^a-zA-Z0-9_]+', '')
and ( nvl(length(regexp_substr(q.word,'_',1,1)),0)
+ nvl(length(regexp_substr(q.word,'_',1,2)),0)
+ nvl(length(regexp_substr(q.word,'_',1,3)),0)) > 0
and nvl(length(regexp_substr(q.word,'_',1,4)),0) = 0;
An alternative way to get only the table name from the error message below. The query works only if the table name comes at the end of the message, as in this example:
with t as( select 'Failure on table TOLL_USR_TRXN_HISTORY:' as data from dual)
SELECT RTRIM(substr(data,instr(data,' ',-1)+1),':') from t
New query for all messages:
select replace (replace ( 'Failure on table TOLL_USR_TRXN_HISTORY:
Failure on table DOCUMENT_IMAGES:' , 'Failure on table', ' ' ),':',' ') from dual

REGEXP_REPLACE to replace emails in a list except a specific domain

I am a novice with regular expressions. I am trying to remove emails from a list which do not belong to a specific domain.
For example, I have the list of emails below:
John@yahoo.co.in , Jacob@gmail.com, Bob@rediff.com,
Lisa@abc.com, sam@gmail.com , rita@yahoo.com
I need to get only the gmail ids:
Jacob@gmail.com, sam@gmail.com
Please note we may have spaces before the comma delimiters.
Appreciate any help!
This could be a start for you.
SELECT *
FROM ( SELECT REGEXP_SUBSTR (str,
'[[:alnum:]\.\+]+@gmail.com',
1,
LEVEL)
AS SUBSTR
FROM (SELECT ' John@yahoo.co.in , Jacob.foo@gmail.com, Bob@rediff.com,Lisa@abc.com, sam@gmail.com , sam.bar+stackoverflow@gmail.com, rita@yahoo.com, foobar '
AS str
FROM DUAL)
CONNECT BY LEVEL <= LENGTH (REGEXP_REPLACE (str, '[^,]+')) + 1)
WHERE SUBSTR IS NOT NULL ;
I put in a few more examples, but an email checker should comply with the respective RFCs; see Wikipedia for more about them: https://en.wikipedia.org/wiki/Email_address
Inspiration from https://stackoverflow.com/a/17597049/869069
Rather than suppress the emails not matching a particular domain (in your example, gmail.com), you might try getting only those emails that match the domain:
WITH a1 AS (
SELECT 'John@yahoo.co.in , Jacob@gmail.com, Bob@rediff.com,Lisa@abc.com, sam@gmail.com , rita@yahoo.com' AS email_list FROM dual
)
SELECT LISTAGG(TRIM(email), ',') WITHIN GROUP ( ORDER BY priority )
FROM (
SELECT REGEXP_SUBSTR(email_list, '[^,]+@gmail.com', 1, LEVEL, 'i') AS email
, LEVEL AS priority
FROM a1
CONNECT BY LEVEL <= REGEXP_COUNT(email_list, '[^,]+@gmail.com', 1, 'i')
);
That said, Oracle is probably not the best tool for this (do you have these email addresses stored as a list in a table somewhere? If so then @GordonLinoff's comment is apt - fix your data model if you can).
Here's a method using a CTE just for a different take on the problem. First step is to make a CTE "table" that contains the parsed list elements. Then select from that. The CTE regex handles NULL list elements.
with main_tbl(email) as (
select ' John@yahoo.co.in , Jacob.foo@gmail.com, Bob@rediff.com,Lisa@abc.com, sam@gmail.com , sam.bar+stackoverflow@gmail.com, rita@yahoo.com, foobar '
from dual
),
email_list(email_addr) as (
select trim(regexp_substr(email, '(.*?)(,|$)', 1, level, NULL, 1))
from main_tbl
connect by level <= regexp_count(email, ',')+1
)
-- select * from email_list;
select LISTAGG(TRIM(email_addr), ', ') WITHIN GROUP ( ORDER BY email_addr )
from email_list
-- include the @ so that domains merely ending in gmail.com do not match
where lower(email_addr) like '%@gmail.com';