Replace string with random text - Oracle SQL - sql

I have a table table1 with 1 column - edi_value which is of type CLOB.
These are the entries:
seq edi_message
1 ISA*00* *00* *08*9254110060 *ZZ*123456789 *041216*0805*U*00501*000095071*0*P*>~
GS*AG*5137624388*123456789*20041216*0805*95071*X*005010~
ST*824*021390001*005010X186A1~
2 ISA*00* *00* *08*56789876678 *ZZ*123456789 *041216*0805*U*00501*000095071*0*P*>~
GS*AG*5137624388*123456789*20041216*0805*95071*X*005010~
ST*824*021390001*005010X186A1~
Please note: the number of lines per message varies, from 3 to 500.
What I'm looking for is the following:
Leave the text before the first * on every line unchanged. For example, GS and ST should not change. ONLY the text after the first * should be randomized.
Replace digits [0-9] with random digits, consistently: if 0 is replaced with 1, then it should be 1 throughout.
Replace letters [A-Za-z] with random letters, consistently: if A is replaced with W, then it should be W throughout.
Leave special characters as they are.
Each character or digit should map to exactly ONE random character or digit.
Output can be:
seq edi_message
1 ISA*11* *11* *13*4030111101 *QQ*102030234 *101010*1313*U*11311*111143121*1*V*>~
GS*WE*3122000233*102030234*01101010*1313*43121*X*113111~
ST*300*101241111*113111X130A1~
2 ISA*11* *11* *13*30234320023 *QQ*102030234 *101010*1313*U*11311*111143121*1*V*>~
GS*WE*3122000233*102030234*01101010*1313*43121*X*113111~
ST*300*101241111*113111X130W1~
How can this be achieved in Oracle SQL?

You can use translate with a helper function that generates the shuffled strings (though @LukStorms has a much neater SQL solution for that using LISTAGG), along with a method to tokenise the value into lines and then re-concatenate them (I use a pure SQL method here for demonstration):
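create or replace function f(p_low integer, p_high integer)
return varchar as
  -- Returns the characters chr(p_low)..chr(p_high) in random order.
  r varchar(2000) := '';
  x integer;
begin
  for i in p_low..p_high loop
    -- Pick an insertion point between 0 and the current length of r.
    x := trunc(dbms_random.value(0, nvl(length(r), 0) + 1));
    r := substr(r, 1, x) || chr(i) || substr(r, x + 1);
  end loop;
  return r;
end;
/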
select * from table1;
| EDI_VALUE |
| :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| ISA*00* *00* *08*9254110060 *ZZ*123456789 *041216*0805*U*00501*000095071*0*P*>~<br> GS*AG*5137624388*123456789*20041216*0805*95071*X*005010~<br> ST*824*021390001*005010X186A1~ |
| ISA*00* *00* *08*56789876678 *ZZ*123456789 *041216*0805*U*00501*000095071*0*P*>~<br> GS*AG*5137624388*123456789*20041216*0805*95071*X*005010~<br> ST*824*021390001*005010X186A |
with t as (select f(48,57)||f(65,90) translate_chars from dual)
select (select new_value
from (select substr(sys_connect_by_path(r_line,'
'),2) new_value, connect_by_isleaf isleaf
from (select lvl
, substr(line,1,instr(line,'*')-1)||
translate(substr(line,instr(line,'*'))
,'0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ'
,(select translate_chars from t)) r_line
from (select level lvl
, regexp_substr(edi_value,'^.*$',1,level,'m') line
from (select table1.edi_value from dual)
connect by level <= regexp_count(edi_value,'^.*$',1,'m')))
start with lvl=1 connect by lvl=(prior lvl)+1)
where isleaf=1)
from table1;
| (SELECTNEW_VALUEFROM(SELECTSUBSTR(SYS_CONNECT_BY_PATH(R_LINE,''),2)NEW_VALUE,CONNECT_BY_ISLEAFISLEAFFROM(SELECTLVL,SUBSTR(LINE,1,INSTR(LINE,'*')-1)||TRANSLATE(SUBSTR(LINE,INSTR(LINE,'*')),'0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ',(SELECTTRANSLATE_CHARSFR |
| :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| ISA*66* *66* *67*1935006626 *VV*098532471 *650902*6763*K*66360*666613640*6*P*>~<br> GS*GZ*3084295877*098532471*96650902*6763*13640*I*663606~<br> ST*795*690816660*663606I072G0~ |
| ISA*66* *66* *67*32471742247 *VV*098532471 *650902*6763*K*66360*666613640*6*P*>~<br> GS*GZ*3084295877*098532471*96650902*6763*13640*I*663606~<br> ST*795*690816660*663606I072G |

You can use CTEs with CONNECT BY to generate the strings for the letters and digits.
Then use the ordered and the scrambled versions of those strings in TRANSLATE.
A CROSS APPLY can be used to split the message into parts with a regex.
Only the parts that start with a * are translated.
LISTAGG then glues the parts back together.
WITH
NUMS as
(
select
LISTAGG(n, '') WITHIN GROUP (ORDER BY n) as n_from,
LISTAGG(n, '') WITHIN GROUP (ORDER BY DBMS_RANDOM.VALUE) as n_to
from (select level-1 n from dual connect by level <= 10)
),
LETTERS as
(
select
LISTAGG(c, '') WITHIN GROUP (ORDER BY c) as c_from,
LISTAGG(c, '') WITHIN GROUP (ORDER BY DBMS_RANDOM.VALUE) as c_to
from (select chr(ascii('A')+level-1 ) c from dual connect by level <= 26)
)
SELECT ca.scrambled as scrambled_message
FROM table1 t
CROSS JOIN NUMS
CROSS JOIN LETTERS
CROSS APPLY
(
SELECT LISTAGG(CASE WHEN part like '*%' then translate(part, n_from||c_from, n_to||c_to) else part end, '') WITHIN GROUP (ORDER BY lvl) as scrambled
FROM
(
SELECT
level AS lvl,
REGEXP_SUBSTR(t.edi_message,'[*]\S+|[^*]+',1,level,'m') AS part
FROM dual
CONNECT BY level <= regexp_count(t.edi_message, '[*]\S+|[^*]+')+1
) parts
) ca;
Example output:
SCRAMBLED_MESSAGE
-----------------------------------------------------------------------------------------------------------
ISA*99* *99* *92*3525999959 *PP*950525023 *959595*9292*A*99299*999932909*9*J*>~
GS*WQ*2900555022*950525023*59959595*9292*32909*I*992999~
ST*255*959039999*992999I925V9~
ISA*99* *99* *92*25023205502 *PP*950525023 *959595*9292*A*99299*999932909*9*J*>~
GS*WQ*2900555022*950525023*59959595*9292*32909*I*992999~
ST*255*959039999*992999I925W9~
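The idea shared by both answers (build one shuffled alphabet, then translate only the text after the first * on each line) can be sketched outside the database. This is a minimal Python illustration, not a port of either query; the function names build_mapping and scramble are made up for the example, and like the SQL above it only remaps digits and uppercase letters.

```python
import random
import string

def build_mapping(rng):
    """Return a translation table that permutes 0-9 and A-Z independently."""
    digits = list(string.digits)
    letters = list(string.ascii_uppercase)
    shuffled_digits = digits[:]
    shuffled_letters = letters[:]
    rng.shuffle(shuffled_digits)
    rng.shuffle(shuffled_letters)
    # Digits map only to digits and letters only to letters, one-to-one.
    return str.maketrans("".join(digits + letters),
                         "".join(shuffled_digits + shuffled_letters))

def scramble(message, table):
    """Translate everything after the first '*' on each line."""
    out = []
    for line in message.split("\n"):
        # head is the segment id (ISA, GS, ST, ...) and is left untouched.
        head, star, tail = line.partition("*")
        out.append(head + star + tail.translate(table))
    return "\n".join(out)

rng = random.Random(42)  # fixed seed only so the demo is repeatable
table = build_mapping(rng)
print(scramble("ISA*00*08*9254110060*ZZ~\nGS*AG*5137624388~", table))
```

Because the table is built once and reused for every row, the same input character always maps to the same output character, which is what the question asks for.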

Related

SQL: Divide long text in multiple rows

I would like to divide a long text in multiple rows; there are other questions similar to this one but none of them worked for me.
What I have
ID | Message
----------------------------------
1 | Very looooooooooooooooong text
2 | Short text
What I would like to do is divide that string every n characters
Result if n = 15:
Id | Message
------------------------------------------
1 | Very looooooooo
1 | oooooooong text
2 | Short text
Even better if the split is done at the first space after n character.
I tried with string_split and substring but I cannot find anything that works.
I thought to use something similar to this:
SELECT index, element FROM table, CAST(message AS SUPER) AS element AT index;
But it doesn't take into account the length and I don't like casting a varchar variable into a super.
You can use generate_series() to accomplish this:
select m.*, gs.posn, substring(m.message, gs.posn, 15) as split_message
from messages m
cross join lateral generate_series(1, length(message), 15) gs(posn);
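For readers who want to check the expected chunks outside the database, here is a small Python sketch of the same fixed-width split. The function name split_fixed is hypothetical; positions are 1-based to match substring() in the query above.

```python
def split_fixed(message, width=15):
    """Return (position, chunk) pairs, one per `width` characters."""
    return [(start + 1, message[start:start + width])
            for start in range(0, len(message), width)]

for posn, chunk in split_fixed("Short text that is longer than fifteen"):
    print(posn, repr(chunk))
```

As with generate_series(1, length(message), 15), an empty message produces no rows at all.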
Splitting on spaces after the length is a little trickier. We would have to split the message into words and then figure out how to break them into groups and then reaggregate.
I could not figure out how to split on spaces without recursion. I hope you don't mind that it treats all whitespace as word boundaries:
with recursive by_words as (
select m.*, s.n, s.word, length(s.word) as word_len,
max(s.n) over (partition by m.id) as num_words
from messages m
cross join lateral regexp_split_to_table(m.message, '\s+')
with ordinality as s(word, n)
), rejoin as (
select id, n, array[word] as words, word_len as cum_word_len,
word_len >= 15 as keep
from by_words
where n = 1
union all
select p.id, c.n,
case
when p.cum_word_len >= 15 then array[c.word]
else p.words||c.word
end as words,
case
when p.cum_word_len >= 15 then c.word_len
else p.cum_word_len + c.word_len + 1
end as cum_word_len,
(case
when p.cum_word_len >= 15 then c.word_len
else p.cum_word_len + c.word_len + 1
end >= 15)
or (c.n = c.num_words) as keep
from rejoin p
join by_words c on (c.id, c.n) = (p.id, p.n + 1)
)
select id,
row_number() over (partition by id
order by n) as segnum,
array_to_string(words, ' ') as split_message
from rejoin
where keep
order by 1, 2
;
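The grouping rule the recursive query is aiming for (keep adding words until the segment reaches at least 15 characters, then start a new segment) may be easier to follow as a plain loop. This is a Python sketch of that rule under the same assumption that any whitespace is a word boundary; split_on_words is a made-up name.

```python
def split_on_words(message, min_len=15):
    """Group words into segments that end at the first word boundary
    at or after `min_len` characters."""
    segments, current, cum_len = [], [], 0
    for word in message.split():
        # A joining space is counted for every word except the first.
        cum_len = len(word) if not current else cum_len + 1 + len(word)
        current.append(word)
        if cum_len >= min_len:
            segments.append(" ".join(current))
            current, cum_len = [], 0
    if current:  # the last, shorter segment
        segments.append(" ".join(current))
    return segments

print(split_on_words("Short text"))
```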
Edit to add:
Can you please tell me whether the below works in Redshift?
with gs as (
select generate_series as posn
from generate_series(1, 150000, 15)
)
select *, substring(m.message, gs.posn, 15) as split_message
from messages m
join gs
on gs.posn <= greatest(1, length(m.message))
order by m.id, gs.posn
;
Thanks to @Mike Organek's answer and his help, I found a solution that works with Redshift too.
The problem with Mike's answer on Redshift is that generate_series is not well supported there, so here's a workaround.
with row as (
select t.*, row_number() over () as x
from table t -- big enough table
limit 100
),
result as
(
select (x-1)*15+1 as posn from row --change 15 to a number to split the long text with
)
select * into gs
from result
And then Mike's answer:
select *, substring(m.message from gs.posn for 15) as split_message
from messages m
join gs
on gs.posn <= greatest(1, length(m.message))
order by m.id, gs.posn

Count and order comma separated values

I have the below one column "table" (apologies for the data model, not my fault :():
COL_IN
------
2K, E
E, 2K
O
I would like to obtain the below output, ordered by count descending:
COL_OUT COUNT
----------
K 4
E 2
O 1
COUNT is the name of a built-in aggregate function, so it's not a good column name, even in the final output. I use COUNT_ instead (with an underscore).
Other than that, you can modify the input strings so they become valid JSON arrays, so that you can then use JSON functions to split them. After you split the strings into tokens, it's a simple matter to separate the leading number (if present) from the rest of the string, and to aggregate. NVL in the sum adds 1 for each token without a leading integer.
Including the sample data for testing only (if you have an actual table, remove the WITH clause at the top):
with
tbl (col_in) as (
select '2K, E' from dual union all
select 'E, 2K' from dual union all
select 'O' from dual
)
select ltrim(col, '0123456789') as col_out
, sum(nvl(to_number(regexp_substr(col, '^\d*')), 1)) as count_
from tbl,
json_table('["' || regexp_replace(col_in, ', *', '","') || '"]', '$[*]'
columns col path '$')
group by ltrim(col, '0123456789')
order by count_ desc, col_out
;
COL_OUT COUNT_
------- ------
K 4
E 2
O 1
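To sanity-check the expected totals independently of the JSON trick, the same split, strip-leading-number, and sum steps can be written in a few lines of Python. This is only an illustrative sketch; count_tokens is a hypothetical helper, and a token without a leading integer counts as 1, as in the query above.

```python
import re
from collections import Counter

def count_tokens(rows):
    """Sum the leading integer of each comma-separated token (default 1)."""
    totals = Counter()
    for row in rows:
        for token in row.split(","):
            token = token.strip()
            if not token:
                continue
            number, name = re.match(r"(\d*)(.*)", token).groups()
            totals[name] += int(number) if number else 1
    # Highest count first, then alphabetical, like ORDER BY count_ DESC, col_out.
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))

print(count_tokens(["2K, E", "E, 2K", "O"]))
```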
You can use a hierarchical query like this:
WITH t2 AS
(
SELECT TRIM(REGEXP_SUBSTR(col_in,'[^,]+',1,level)) AS s
FROM t
CONNECT BY level <= REGEXP_COUNT(col_in,',')+1
AND PRIOR SYS_GUID() IS NOT NULL
AND PRIOR col_in = col_in
)
SELECT REGEXP_SUBSTR(s,'[^0-9]+') AS col_out,
SUM(NVL(TO_NUMBER(REGEXP_SUBSTR(s,'^[0-9]+')),1)) AS count
FROM t2
GROUP BY REGEXP_SUBSTR(s,'[^0-9]+')
ORDER BY count DESC
This presumes that all of the data is alphanumeric only (i.e. it contains no special characters such as $, #, !, etc.).

Return five rows of random DNA instead of just one

This is the code I have to create a string of DNA:
prepare dna_length(int) as
with t1 as (
select chr(65) as s
union select chr(67)
union select chr(71)
union select chr(84) )
, t2 as ( select s, row_number() over() as rn from t1)
, t3 as ( select generate_series(1,$1) as i, round(random() * 4 + 0.5) as rn )
, t4 as ( select t2.s from t2 join t3 on (t2.rn=t3.rn))
select array_to_string(array(select s from t4),'') as dna;
execute dna_length(20);
I am trying to figure out how to re-write this to give a table of 5 rows of strings of DNA of length 20 each, instead of just one row. This is for PostgreSQL.
I tried:
CREATE TABLE dna_table(g int, dna text);
INSERT INTO dna_table (1, execute dna_length(20));
But this does not seem to work. I am an absolute beginner. How to do this properly?
PREPARE creates a prepared statement that can only be run "as is", with EXECUTE. If your prepared statement returns one string, then one string is all you get. You can't embed it in other statements such as INSERT.
In your case you may create a function:
create or replace function dna_length(int) returns text as
$$
with t1 as (
select chr(65) as s
union
select chr(67)
union
select chr(71)
union
select chr(84))
, t2 as (select s,
row_number() over () as rn
from t1)
, t3 as (select generate_series(1, $1) as i,
round(random() * 4 + 0.5) as rn)
, t4 as (select t2.s
from t2
join t3 on (t2.rn = t3.rn))
select array_to_string(array(select s from t4), '') as dna
$$ language sql;
And use it in a way like this:
insert into dna_table(g, dna) select generate_series(1,5), dna_length(20)
From the official doc:
PREPARE creates a prepared statement. A prepared statement is a server-side object that can be used to optimize performance. When the PREPARE statement is executed, the specified statement is parsed, analyzed, and rewritten. When an EXECUTE command is subsequently issued, the prepared statement is planned and executed. This division of labor avoids repetitive parse analysis work, while allowing the execution plan to depend on the specific parameter values supplied.
About functions.
This can be much simpler and faster:
SELECT string_agg(CASE ceil(random() * 4)
WHEN 1 THEN 'A'
WHEN 2 THEN 'C'
WHEN 3 THEN 'T'
WHEN 4 THEN 'G'
END, '') AS dna
FROM generate_series(1,100) g -- 100 = 5 rows * 20 nucleotides
GROUP BY g%5;
random() produces a random value in the range 0.0 <= x < 1.0. Multiply by 4 and take the mathematical ceiling with ceil() (cheaper than round()), and you get a random distribution of the numbers 1-4. Convert those to A, C, T, G and aggregate with GROUP BY g%5, where % is the modulo operator.
About string_agg():
Concatenate multiple result rows of one column into one, group by another column
As prepared statement, taking
$1 ... the number of rows
$2 ... the number of nucleotides per row
PREPARE dna_length(int, int) AS
SELECT string_agg(CASE ceil(random() * 4)
WHEN 1 THEN 'A'
WHEN 2 THEN 'C'
WHEN 3 THEN 'T'
WHEN 4 THEN 'G'
END, '') AS dna
FROM generate_series(1, $1 * $2) g
GROUP BY g%$1;
Call:
EXECUTE dna_length(5,20);
Result:
| dna |
| :------------------- |
| ATCTTCGACACGTCGGTACC |
| GTGGCTGCAGATGAACAGAG |
| ACAGCTTAAAACACTAAGCA |
| TCCGGACCTCTCGACCTTGA |
| CGTGCGGAGTACCCTAATTA |
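For comparison, the shape of the result (n rows, each a string of m random nucleotides) can be sketched in Python. Note this swaps the ceil(random() * 4) mapping for random.choice, which picks uniformly from the four letters directly; random_dna_rows is a made-up name for illustration.

```python
import random

def random_dna_rows(n_rows, length, rng=random):
    """Return n_rows strings, each `length` characters drawn from A, C, G, T."""
    return ["".join(rng.choice("ACGT") for _ in range(length))
            for _ in range(n_rows)]

for row in random_dna_rows(5, 20):
    print(row)
```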
If you need it a lot, consider a function instead. See:
What is the difference between a prepared statement and a SQL or PL/pgSQL function, in terms of their purposes?

looping in sql with delimiter

I was wondering how I can loop in SQL.
For example, I have this column:
PARAMETER_VALUE
E,C;S,C;I,X;G,T;S,J;S,F;C,S;
I want to store every value before a comma (,) in one temp column and every value after it, up to the semicolon (;), in another column.
It should continue until there are no more values after a semicolon (;).
Expected Output for Example
COL1 E S I G S S C
COL2 C C X T J F S
etc . . .
You can get this by using regexp_substr() together with a connect by level <= clause:
with t1(PARAMETER_VALUE) as
(
select 'E,C;S,C;I,X;G,T;S,J;S,F;C,S;' from dual
), t2 as
(
select level as rn,
regexp_substr(PARAMETER_VALUE,'([^,]+)',1,level) as str1,
regexp_substr(PARAMETER_VALUE,'([^;]+)',1,level) as str2
from t1
connect by level <= regexp_count(PARAMETER_VALUE,';')
)
select listagg( regexp_substr(str1,'([^;]+$)') ,' ') within group (order by rn) as col1,
listagg( regexp_substr(str2,'([^,]+$)') ,' ') within group (order by rn) as col2
from t2;
COL1          COL2
------------- -------------
E S I G S S C C C X T J F S
Assuming that you need to separate the input into rows, at the ; delimiters, and then into columns at the , delimiter, you could do something like this:
-- WITH clause included to simulate input data. Not part of the solution;
-- use actual table and column names in the SELECT statement below.
with
t1(id, parameter_value) as (
select 1, 'E,C;S,C;I,X;G,T;S,J;S,F;C,S;' from dual union all
select 2, ',U;,;V,V;' from dual union all
select 3, null from dual
)
-- End of simulated input data
select id,
level as ord,
regexp_substr(parameter_value, '(;|^)([^,]*),', 1, level, null, 2) as col1,
regexp_substr(parameter_value, ',([^;]*);' , 1, level, null, 1) as col2
from t1
connect by level <= regexp_count(parameter_value, ';')
and id = prior id
and prior sys_guid() is not null
order by id, ord
;
 ID ORD COL1 COL2
--- --- ---- ----
  1   1 E    C
  1   2 S    C
  1   3 I    X
  1   4 G    T
  1   5 S    J
  1   6 S    F
  1   7 C    S
  2   1      U
  2   2
  2   3 V    V
  3   1
Note - this is not the most efficient way to split the inputs (nothing will be very efficient - the data model, which is in violation of First Normal Form, is the reason). This can be improved using standard instr and substr, but the query will be more complicated, and for that reason, harder to maintain.
I generated more input data, to illustrate a few things. You may have several inputs that must be broken up at the same time; that must be done with care. (Note the additional conditions in CONNECT BY). I also illustrate the handling of NULL - if a comma comes right after a semicolon, that means that the "column 1" part of that pair must be NULL. That is shown in the output.
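The splitting rule described above (rows at each ;, then columns at the ,, with an empty side treated as NULL) can be checked with a short Python sketch. split_pairs is a hypothetical helper, and it assumes every pair is terminated by a semicolon, as in the sample data. Unlike the query, it returns no rows at all for a NULL input rather than one all-NULL row.

```python
def split_pairs(parameter_value):
    """Return (ord, col1, col2) tuples; empty parts become None (NULL)."""
    if not parameter_value:
        return []
    rows = []
    # The trailing ';' leaves one empty piece at the end, which is dropped.
    pairs = parameter_value.split(";")[:-1]
    for position, pair in enumerate(pairs, start=1):
        col1, _, col2 = pair.partition(",")
        rows.append((position, col1 or None, col2 or None))
    return rows

for row in split_pairs(",U;,;V,V;"):
    print(row)
```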

Apply order by in comma separated string in oracle

I have a column in an Oracle table that holds the value below:
select csv_val from my_table where date='09-OCT-18';
output
==================
50,100,25,5000,1000
I want these values in ascending order from a select query, so the output would look like:
output
==================
25,50,100,1000,5000
I tried this link, but looks like it has some restriction on number of digits.
Here, I made you a modified version of the answer you linked to that can handle an arbitrary (hardcoded) number of commas. It's pretty heavy on CTEs. As with most LISTAGG answers, it'll have a 4000-char limit. I also changed your regexp to be able to handle null list entries, based on this answer.
WITH
T (N) AS --TEST DATA
(SELECT '50,100,25,5000,1000' FROM DUAL
UNION
SELECT '25464,89453,15686' FROM DUAL
UNION
SELECT '21561,68547,51612' FROM DUAL
),
nums (x) as -- arbitrary limit of 20, can be changed
(select level from dual connect by level <= 20),
splitstr (N, x, substring) as
(select N, x, regexp_substr(N, '(.*?)(,|$)', 1, x, NULL, 1)
from T
inner join nums on x <= 1 + regexp_count(N, ',')
order by N, x)
select N, listagg(substring, ',') within group (order by to_number(substring)) as sorted_N
from splitstr
group by N
;
Probably it can be improved, but eh...
Based on sample data you posted, relatively simple query would work (you need lines 3 - 7). If data doesn't really look like that, query might need adjustment.
SQL> with my_table (csv_val) as
2 (select '50,100,25,5000,1000' from dual)
3 select listagg(token, ',') within group (order by to_number(token)) result
4 from (select regexp_substr(csv_val, '[^,]+', 1, level) token
5 from my_table
6 connect by level <= regexp_count(csv_val, ',') + 1
7 );
RESULT
-------------------------
25,50,100,1000,5000
SQL>
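The key point in both queries is that the tokens are ordered by to_number(token) rather than as text; sorted as strings, '100' would come before '25'. A minimal Python sketch of the same split, numeric sort, and rejoin steps (sort_csv_numbers is a made-up name, and empty list entries are simply skipped here):

```python
def sort_csv_numbers(csv_val):
    """Split on commas, sort numerically, and join back into one string."""
    tokens = [t for t in csv_val.split(",") if t.strip()]
    return ",".join(sorted(tokens, key=int))

print(sort_csv_numbers("50,100,25,5000,1000"))
```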