How to include the code in a REPLACE function in Oracle? - sql

User @psaraj12 helped me with a ticket here about finding ASCII characters in a string in my DB with the following code:
with test (col) as (
select
'L.A.D'
from
dual
union all
select
'L.􀈉.D.'
from
dual
)
select
col,
case when max(ascii_of_one_character) >= 65535 then 'NOT OK' else 'OK' end result
from
(
select
col,
substr(col, column_value, 1) one_character,
ascii(
substr(col, column_value, 1)
) ascii_of_one_character
from
test cross
join table(
cast(
multiset(
select
level
from
dual connect by level <= length(col)
) as sys.odcinumberlist
)
)
)
group by
col
having max(ascii_of_one_character) >= 4000000000;
The script looks for characters in a certain range, groups them, and displays them.
Is it possible to include this in a REPLACE statement of a similar sort:
REPLACE(table.column, max(ascii_of_one_character) >= 4000000000, '')
EDIT: As per @flyaround's answer, this is the code I use, changed a little bit:
with test (col) as (
select skunden.name1
from skunden
)
select col
, REGEXP_REPLACE(col, 'max(ascii_of_one_character)>=4000000000', '') as cleaned
, CASE WHEN REGEXP_COUNT(col, 'max(ascii_of_one_character)>=4000000000') > 0 THEN 0 ELSE 1 END as isOk
from test;

Coming back to your original code, because my suggested REGEXP_REPLACE does not work sufficiently well with high surrogates. Your approach is already very effective, so I built on it to arrive at a solution here.
MERGE
INTO skunden
USING (
select
id as innerId,
name as innerName,
case when max(ascii_of_one_character) >= 65535 then 0 else 1 end isOk,
listagg(case when ascii_of_one_character <65535 then one_character end , '') within group (order by rn) as cleaned
from
(
select
id,
name,
substr(name, column_value, 1) one_character,
ascii(
substr(name, column_value, 1)
) ascii_of_one_character
, rownum as rn
from
skunden cross
join table(
cast(
multiset(
select
level
from
dual connect by level <= length(name)
) as sys.odcinumberlist
)
)
)
group by
id, name
having max(ascii_of_one_character) >= 4000000000
)
ON (skunden.id = innerId)
WHEN MATCHED THEN
UPDATE
SET name = cleaned
;
In a MERGE you cannot update a column that is referenced in the ON clause. Therefore you should join on the unique key of your table (I used 'id' in my example).
The resulting value will be 'L..D.' for your example value of 'L.􀈉.D.'
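The filtering idea behind the MERGE above can be sketched outside the database. This is a minimal Python illustration, not the Oracle implementation: it assumes the unwanted characters are exactly those with a code point above U+FFFF, whereas the SQL compares the value returned by ASCII(), which in a UTF-8 database is, as far as I know, the numeric value of the encoded bytes. The function names and the stand-in character U+100209 are hypothetical.

```python
def strip_supplementary(text: str) -> str:
    """Drop every character whose code point lies above U+FFFF (assumed cutoff)."""
    return "".join(ch for ch in text if ord(ch) <= 0xFFFF)


def is_ok(text: str) -> bool:
    """True when the string contains no character above U+FFFF."""
    return all(ord(ch) <= 0xFFFF for ch in text)


# Stand-in for the example value: one supplementary-plane character between the dots.
sample = "L.\U00100209.D."
print(strip_supplementary(sample))
print(is_ok("L.A.D"), is_ok(sample))
```

Like the LISTAGG in the query, the sketch keeps the remaining characters in their original order and only drops the offending ones.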

If I understood your question correctly, you would like to remove characters whose decimal representation is higher than a specified value.
You could try REGEXP_REPLACE for this, like:
with test (col) as (
select
'L.A.D'
from
dual
union all
select
'L.􀈉.D.'
from
dual
)
select col
, REGEXP_REPLACE(col, '[^\u00010000-\u0010FFFF]+$', '') as cleaned
, CASE WHEN REGEXP_COUNT(col, '[^\u00010000-\u0010FFFF]+$') > 0 THEN 0 ELSE 1 END as isOk
from test;

Related

Concatenating clob column values in sql query

I am using this statement in my SQL query to concatenate large CLOB column values, but the output contains extra commas (","), and I am not able to figure out what is going wrong.
SELECT RTRIM(
XMLAGG(
XMLELEMENT(
E,
CASE WHEN UNIQ_ID IN ( SELECT VAL
FROM SOME_TABLE
WHERE VAL_NM = 'SOME_TEXT' )
THEN TABLE1.COL_NAME
ELSE NULL
END,
', '
).EXTRACT('//text()')
ORDER BY TABLE1.UNIQ_ID
).GETCLOBVAL(),
','
) COMBINED_VAL
If you are asking about the trailing commas, then you are concatenating using comma then space so the trailing character is a space and not a comma.
If you are asking about adjacent separators with no value in between then when the WHEN UNIQ_ID IN ( ... ) part of your CASE statement is not matched you will have a NULL value; this is concatenated into the aggregated output and then you will find that you have two adjacent comma-space separators with no text in between.
For example:
WITH test_data ( id, value ) AS (
SELECT 1, 'a' FROM DUAL UNION ALL
SELECT 2, NULL FROM DUAL UNION ALL
SELECT 3, 'b' FROM DUAL
)
SELECT RTRIM(
XMLAGG(
XMLELEMENT(
E,
value,
', '
).EXTRACT('//text()')
ORDER BY id
).GETCLOBVAL(),
','
) AS COMBINED_VAL
FROM test_data;
Outputs:
| COMBINED_VAL |
| :----------- |
| a, , b, |
The trailing comma-space isn't trimmed because the last character is a space, not a comma. The values are a, then NULL, then b, and the NULL is represented as a zero-width substring.
That's pretty easy:
Do not aggregate rows that you don't want in the result. To do that, generate the XMLELEMENT only for the required rows and return NULL for the others.
Put all the characters you want to trim from the result into the second parameter of RTRIM:
SELECT RTRIM(
XMLAGG(
CASE WHEN UNIQ_ID IN ( SELECT VAL
FROM SOME_TABLE
WHERE VAL_NM = 'SOME_TEXT' )
and COL_NAME is not null
THEN XMLELEMENT(
E,
TABLE1.COL_NAME||', '
)
END
ORDER BY TABLE1.UNIQ_ID
).extract('//text()').GETCLOBVAL(),
', '
) COMBINED_VAL
from table1;
Full test case with sample data and results: https://dbfiddle.uk/?rdbms=oracle_11.2&fiddle=452c715247e8edda8735014ff2fb34f4
with
SOME_TABLE(VAL, VAL_NM) as (
select level*2, 'SOME_TEXT' from dual connect by level<=10
)
,TABLE1(UNIQ_ID, COL_NAME) as (
select level UNIQ_ID
, to_clob(level) COL_NAME
from dual
connect by level<=20
)
SELECT RTRIM(
XMLAGG(
CASE WHEN UNIQ_ID IN ( SELECT VAL
FROM SOME_TABLE
WHERE VAL_NM = 'SOME_TEXT' )
and COL_NAME is not null
THEN XMLELEMENT(
E,
TABLE1.COL_NAME||', '
)
END
ORDER BY TABLE1.UNIQ_ID
).extract('//text()').GETCLOBVAL(),
', '
) COMBINED_VAL
from TABLE1;
Results:
COMBINED_VAL
----------------------------------------
2, 4, 6, 8, 10, 12, 14, 16, 18, 20
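The difference between the two approaches above can be sketched in Python. This is only an illustration of the string logic, assuming a NULL behaves like an empty string inside the aggregate; the function names are hypothetical and no XML functions are involved.

```python
def naive_concat(values):
    """Append ', ' to every value, NULLs included, then trim trailing commas only."""
    joined = "".join((v or "") + ", " for v in values)
    # The last character is a space, so stripping ',' alone removes nothing.
    return joined.rstrip(",")


def filtered_concat(values):
    """Skip NULLs before aggregating, then trim both commas and spaces."""
    joined = "".join(v + ", " for v in values if v is not None)
    # A character set of ', ' strips any trailing run of commas and spaces.
    return joined.rstrip(", ")


rows = ["a", None, "b"]
print(repr(naive_concat(rows)))
print(repr(filtered_concat(rows)))
```

One caveat of trimming with a character set: a final value that itself ends in a comma or a space would lose that character as well.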

distinct and sum if like

I have a table as the following
name
-----------
1#apple#1
2#apple#2
3#apple#4
4#box#4
5#box#5
and I want to get the result as:
name
--------------
apple 3
box 2
Thanks in advance for your help
This is what you need.
select
SUBSTRING(
name,
CHARINDEX('#', name) + 1,
LEN(name) - (
CHARINDEX('#', REVERSE(name)) + CHARINDEX('#', name)
)
),
count(1)
from
tbl
group by
SUBSTRING(
name,
CHARINDEX('#', name) + 1,
LEN(name) - (
CHARINDEX('#', REVERSE(name)) + CHARINDEX('#', name)
)
)
If your data does not contain any full stops (or periods depending on your vernacular), and the length of your string is less than 128 characters, then you can use PARSENAME to effectively split your string into parts, and extract the 2nd part:
DECLARE @T TABLE (Val VARCHAR(20));
INSERT @T (Val)
VALUES ('1#apple#1'), ('2#apple#2'), ('3#apple#4'),
('4#box#4'), ('5#box#5');
SELECT Val = PARSENAME(REPLACE(t.Val, '#', '.'), 2),
[Count] = COUNT(*)
FROM @T AS t
GROUP BY PARSENAME(REPLACE(t.Val, '#', '.'), 2);
Otherwise you will need to use CHARINDEX to find the first and last occurrence of # within your string (REVERSE is also needed to get the last position), then use SUBSTRING to extract the text between these positions:
DECLARE @T TABLE (Val VARCHAR(20));
INSERT @T (Val)
VALUES ('1#apple#1'), ('2#apple#2'), ('3#apple#4'),
('4#box#4'), ('5#box#5');
SELECT Val = SUBSTRING(t.Val, x.FirstPosition + 1, x.LastPosition - x.FirstPosition),
[Count] = COUNT(*)
FROM @T AS t
CROSS APPLY
( SELECT CHARINDEX('#', t.Val) ,
LEN(t.Val) - CHARINDEX('#', REVERSE(t.Val))
) AS x (FirstPosition, LastPosition)
GROUP BY SUBSTRING(t.Val, x.FirstPosition + 1, x.LastPosition - x.FirstPosition);
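The first-and-last-delimiter idea can be checked quickly outside the database. The following Python sketch mirrors the CHARINDEX/REVERSE logic with `str.find` and `str.rfind`; the helper name `middle_part` is hypothetical and the rows are the sample values from the question.

```python
from collections import Counter


def middle_part(value: str, sep: str = "#") -> str:
    """Return the text between the first and the last occurrence of sep."""
    first = value.find(sep)   # position of the first delimiter
    last = value.rfind(sep)   # position of the last delimiter
    return value[first + 1:last]


rows = ["1#apple#1", "2#apple#2", "3#apple#4", "4#box#4", "5#box#5"]

# Equivalent of GROUP BY <extracted part> with COUNT(*).
counts = Counter(middle_part(r) for r in rows)
print(dict(counts))
```

Because only the outermost delimiters are used, a value such as `1#a#b#2` keeps its inner `#` and yields `a#b`.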
Use CASE WHEN (the table name tbl is assumed here, as in the first answer):
select case when name like '%apple%' then 'apple'
when name like '%box%' then 'box' end item_name,
count(*)
from tbl
group by case when name like '%apple%' then 'apple'
when name like '%box%' then 'box' end
No DBMS was specified, so here is a PostgreSQL variant. The query uses regular expressions to simplify things a bit.
with t0 as (
select '1#apple#1' as value
union all select '2#apple#2'
union all select '3#apple#4'
union all select '4#box#4'
union all select '5#box#5'
),
trimmed as (
select regexp_replace(value,'[0-9]*#(.+?)#[0-9]*','\1') as name
from t0
)
select name, count(*)
from trimmed
group by name
order by name
Update
For Oracle DBMS, the query stays basically the same:
with t0 as (
select '1#apple#1' as value from dual
union all select '2#apple#2' from dual
union all select '3#apple#4' from dual
union all select '4#box#4' from dual
union all select '5#box#5' from dual
),
trimmed as (
select regexp_replace(value,'[0-9]*#(.+?)#[0-9]*','\1') as name
from t0
)
select name, count(*)
from trimmed
group by name
order by name
NAME | COUNT(*)
:---- | -------:
apple | 3
box | 2
Update
MySQL 8.0
with t0 as (
select '1#apple#1' as value
union all select '2#apple#2'
union all select '3#apple#4'
union all select '4#box#4'
union all select '5#box#5'
),
trimmed as (
select regexp_replace(value,'[0-9]*#(.+?)#[0-9]*','$1') as name
from t0
)
select name, count(*)
from trimmed
group by name
order by name
name | count(*)
:---- | -------:
apple | 3
box | 2
You can use case and group by to do the same.
select new_col , count(new_col)
from
(
select case when col_name like '%apple%' then 'apple'
when col_name like '%box%' then 'box'
else 'others' end new_col
from table_name
)
group by new_col
;

SQL to find upper case words from a column

I have a description column in my table and its values are:
This is a EXAMPLE
This is a TEST
This is a VALUE
I want to display only EXAMPLE, TEST, and VALUE from the description column.
How do I achieve this?
This could be a way:
-- a test case
with test(id, str) as (
select 1, 'This is a EXAMPLE' from dual union all
select 2, 'This is a TEST' from dual union all
select 3, 'This is a VALUE' from dual union all
select 4, 'This IS aN EXAMPLE' from dual
)
-- concatenate the resulting words
select id, listagg(str, ' ') within group (order by pos)
from (
-- tokenize the strings by using the space as a word separator
SELECT id,
trim(regexp_substr(str, '[^ ]+', 1, level)) str,
level as pos
FROM test t
CONNECT BY instr(str, ' ', 1, level - 1) > 0
and prior id = id
and prior sys_guid() is not null
)
-- only get the uppercase words
where regexp_like(str, '^[A-Z]+$')
group by id
The idea is to tokenize every string, then cut off the words that are not made by upper case characters and then concatenate the remaining words.
The result:
1 EXAMPLE
2 TEST
3 VALUE
4 IS EXAMPLE
If you need to handle some other character as an upper case letter, you may edit the where condition to filter for the matching words; for example, with '_':
with test(id, str) as (
select 1, 'This is a EXAMPLE' from dual union all
select 2, 'This is a TEST' from dual union all
select 3, 'This is a VALUE' from dual union all
select 4, 'This IS aN EXAMPLE' from dual union all
select 5, 'This IS AN_EXAMPLE' from dual
)
select id, listagg(str, ' ') within group (order by pos)
from (
SELECT id,
trim(regexp_substr(str, '[^ ]+', 1, level)) str,
level as pos
FROM test t
CONNECT BY instr(str, ' ', 1, level - 1) > 0
and prior id = id
and prior sys_guid() is not null
)
where regexp_like(str, '^[A-Z_]+$')
group by id
gives:
1 EXAMPLE
2 TEST
3 VALUE
4 IS EXAMPLE
5 IS AN_EXAMPLE
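The tokenize, filter, and re-join steps of this answer can be sketched in Python as follows. This is only an illustration of the logic, not a replacement for the SQL; the function name `uppercase_words` and its `extra` parameter (additional characters to treat as uppercase, such as '_') are hypothetical.

```python
import re


def uppercase_words(text: str, extra: str = "") -> str:
    """Keep only the space-separated words made entirely of A-Z plus extra characters."""
    pattern = re.compile(f"[A-Z{re.escape(extra)}]+")
    words = text.split(" ")  # tokenize on the space separator
    # fullmatch plays the role of the ^...$ anchors in regexp_like
    return " ".join(w for w in words if pattern.fullmatch(w))


print(uppercase_words("This IS aN EXAMPLE"))
print(uppercase_words("This IS AN_EXAMPLE", extra="_"))
```

Without the extra underscore, the word `AN_EXAMPLE` is rejected as a whole, which matches how the first WHERE condition behaves.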
Here's another solution. It was inspired by Aleksej's answer.
The idea? Get all the words. Then aggregate only the fully uppercase words into a list.
Sample data:
create table descriptions (ID int, Description varchar2(100));
insert into descriptions (ID, Description)
select 1 as ID, 'foo Foo FOO bar Bar BAR' as Description from dual
union all select 2, 'This is an EXAMPLE TEST Description VALUE' from dual
;
Query:
select id, Description, listagg(word, ',') within group (order by pos) as UpperCaseWords
from (
select
id, Description,
trim(regexp_substr(Description, '\w+', 1, level)) as word,
level as pos
from descriptions t
connect by regexp_instr(Description, '\s+', 1, level - 1) > 0
and prior id = id
and prior sys_guid() is not null
)
where word = upper(word)
group by id, Description
Result:
ID | DESCRIPTION | UPPERCASEWORDS
-- | ----------------------------------------- | ------------------
1 | foo Foo FOO bar Bar BAR | FOO,BAR
2 | This is an EXAMPLE TEST Description VALUE | EXAMPLE,TEST,VALUE
It is possible to achieve this thanks to the REGEXP_REPLACE function:
SELECT REGEXP_REPLACE(my_column, '(^[A-Z]| |[a-z][A-Z]*|[A-Z]*[a-z])', '') AS Result FROM my_table
The regex matches the first uppercase character of the line, every space, and every lowercase letter together with any uppercase letters directly attached to it, and replaces each match with an empty string.
Try this:
SELECT SUBSTR(column_name, INSTR(column_name,' ',-1) + 1)
FROM your_table;
This should do the trick:
SELECT SUBSTR(REGEXP_REPLACE(' ' || REGEXP_REPLACE(description, '(^[A-Z]|[a-z]|[A-Z][a-z]+|[,])', ''), ' +', ' '), 2, 9999) AS only_upper
FROM (
select 'Hey IF you do not know IT, This IS a test of UPPERCASE and IT, with good WILL and faith, Should BE fine to be SHOWN' description
from dual
)
I have added a condition to strip commas; you can add other special characters to remove inside those brackets.
ONLY_UPPER
-----------------------------------
IF IT IS UPPERCASE IT WILL BE SHOWN
This is a function based on some of the regular expression answers.
create or replace function capwords(orig_string varchar2)
return varchar2
as
out_string varchar2(80);
begin
out_string := REGEXP_REPLACE(orig_string, '([a-z][A-Z_]*|[A-Z_]*[a-z])', '');
out_string := REGEXP_REPLACE(trim(out_string), ' +', ' ');
return out_string;
end;
/
Removes strings of upper case letters and underscores that have lower case letters
on either end. Replaces multiple adjacent spaces with one space.
Trims extra spaces off of the ends. Assumes max size of 80 characters.
Slightly edited output:
>select id,str,capwords(str) from test;
ID STR CAPWORDS(STR)
---------- ------------------------------ ------------------
1 This is a EXAMPLE EXAMPLE
2 This is a TEST TEST
3 This is a VALUE VALUE
4 This IS aN EXAMPLE IS EXAMPLE
5 This is WITH_UNDERSCORE WITH_UNDERSCORE
6 ThiS IS aN EXAMPLE IS EXAMPLE
7 thiS IS aN EXAMPLE IS EXAMPLE
8 This IS wiTH_UNDERSCORE IS
If you only need to "display" the result without changing the values in the column then you can use CASE WHEN (in the example Description is the column name):
Select CASE WHEN Description like '%EXAMPLE%' then 'EXAMPLE' WHEN Description like '%TEST%' then 'TEST' WHEN Description like '%VALUE%' then 'VALUE' END From [yourTable]
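Under a case-insensitive collation (the usual default in SQL Server), the conditions are not case sensitive even if you write them all in uppercase; in Oracle, LIKE is case sensitive by default.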
You can add Else '<Value if all conditions are wrong>' before the END in case there are descriptions that don't contain any of the values. The example will return NULL for those cases, and writing ELSE Description will return the original value of that row.
It also works if you need to update. It is simple and practical, easy way out, haha.

regex oracle sql return all capturing groups

I have a regex like
select regexp_substr('some stuff TOTAL_SCORE<518>some stuff OTHER_VALUE<456> foo <after>', 'TOTAL_SCORE<(\d{3})>', 1, 1, NULL, 1) from dual which can return a value for a single capturing group.
How can I instead return all the capturing groups as an additional column? (string concat of results is fine)
select regexp_substr('some stuff TOTAL_SCORE<518> TOTAL_SCORE<123>some stuff OTHER_VALUE<456> foo <after>', 'TOTAL_SCORE<(\d{3})>') from dual
Query 1:
-- Sample data
WITH your_table ( value ) AS (
SELECT 'some stuff TOTAL_SCORE<518>some stuff OTHER_VALUE<456> foo <after>' FROM DUAL
)
-- Query
SELECT REGEXP_REPLACE(
value,
'.*TOTAL_SCORE<(\d{3})>.*OTHER_VALUE<(\d{3})>.*',
'\1,\2'
) As scores
FROM your_table
Output:
SCORES
-------
518,456
Query 2:
-- Sample data
WITH your_table ( value ) AS (
SELECT 'some stuff TOTAL_SCORE<518> TOTAL_SCORE<123> some stuff OTHER_VALUE<456> foo <after>' FROM DUAL
)
-- Query
SELECT l.column_value As scores
FROM your_table t,
TABLE(
CAST(
MULTISET(
SELECT TO_NUMBER(
REGEXP_SUBSTR(
t.value,
'TOTAL_SCORE<(\d{3})>',
1,
LEVEL,
NULL,
1
)
)
FROM DUAL
CONNECT BY LEVEL <= REGEXP_COUNT( t.value, 'TOTAL_SCORE<(\d{3})>' )
) AS SYS.ODCINUMBERLIST
)
) l;
Output:
SCORES
-------
518
123
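For comparison, here is what "return every match of the capturing group" looks like in Python, where `re.findall` plays the role of the CONNECT BY LEVEL loop over REGEXP_SUBSTR occurrences. It is a sketch of the idea only; the function name `all_scores` is hypothetical.

```python
import re


def all_scores(text: str) -> list[int]:
    """Return the first capturing group of every TOTAL_SCORE<ddd> occurrence."""
    return [int(m) for m in re.findall(r"TOTAL_SCORE<(\d{3})>", text)]


value = ("some stuff TOTAL_SCORE<518> TOTAL_SCORE<123> "
         "some stuff OTHER_VALUE<456> foo <after>")
print(all_scores(value))
# String concatenation of the results, as the question allows:
print(",".join(str(s) for s in all_scores(value)))
```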

Create a delimited string from a query in DB2

I am trying to create a delimited string from the results of a query in DB2 on the iSeries (AS/400). I've done this in T-SQL, but can't find a way to do it here.
Here is my code in T-SQL. I'm looking for an equivalent in DB2.
DECLARE @a VARCHAR(1000)
SELECT @a = COALESCE(@a + ', ' + [Description], [Description])
FROM AP.Checkbooks
SELECT @a
If the descriptions in my table look like this:
Desc 1
Desc 2
Desc 3
Then it will return this:
Desc 1, Desc 2, Desc 3
Essentially you're looking for the equivalent of MySQL's GROUP_CONCAT aggregate function in DB2. According to one thread I found, you can mimic this behaviour by going through the XMLAGG function:
create table t1 (num int, color varchar(10));
insert into t1 values (1,'red'), (1,'black'), (2,'red'), (2,'yellow'), (2,'green');
select num,
substr( xmlserialize( xmlagg( xmltext( concat( ', ', color ) ) ) as varchar( 1024 ) ), 3 )
from t1
group by num;
This would return
1 red, black
2 red, yellow, green
(or should, if I'm reading things correctly)
You can do this using common table expressions (CTEs) and recursion.
with
cte1 as
(select description, row_number() over() as row_nbr from checkbooks),
cte2 (list, cnt, cnt_max) AS
(SELECT VARCHAR('', 32000), 0, count(description) FROM cte1
UNION ALL
SELECT
-- No comma before the first description
case when cte2.list = '' THEN RTRIM(CHAR(cte1.description))
else cte2.list || ', ' || RTRIM(CHAR(cte1.description)) end,
cte2.cnt + 1,
cte2.cnt_max
FROM cte1,cte2
WHERE cte1.row_nbr = cte2.cnt + 1 AND cte2.cnt < cte2.cnt_max ),
cte3 as
(select list from cte2
where cte2.cnt = cte2.cnt_max fetch first 1 row only)
select list from cte3;
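The accumulation that the recursive CTE performs row by row can be sketched in Python. This only illustrates the logic (start with an empty list, add one description per step, no comma before the first one); the function name `build_list` is hypothetical and the sample values come from the question.

```python
def build_list(descriptions):
    """Accumulate a comma-delimited string one row at a time, as cte2 does."""
    result = ""  # anchor row: empty list, counter at 0
    for description in descriptions:  # one recursion step per row number
        trimmed = description.rstrip()  # counterpart of RTRIM(CHAR(...))
        # No comma before the first description
        result = trimmed if result == "" else result + ", " + trimmed
    return result


print(build_list(["Desc 1", "Desc 2", "Desc 3"]))
```

In plain Python the same result comes from `", ".join(...)`; the explicit loop is kept only to mirror the recursion.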
I'm trying to do this in OLEDB and from what I understand you can't do this because you can't do anything fancy in SQL for OLEDB like declare variables or create a table. So I guess there is no way.
If you are running DB2 9.7 or higher, you can use the LISTAGG function. Have a look here:
http://pic.dhe.ibm.com/infocenter/db2luw/v9r7/index.jsp?topic=%2Fcom.ibm.db2.luw.sql.ref.doc%2Fdoc%2Fr0058709.html