Escaping special characters for JSON output - sql

I have a column whose data I want to escape so that it can be used as JSON output. More precisely, I am trying to escape the same characters listed here, but using Oracle 11g: Special Characters and JSON Escaping Rules
I think it can be solved using REGEXP_REPLACE:
SELECT REGEXP_REPLACE(my_column, '("|\\|/)|(' || CHR(9) || ')', '\\\1') FROM my_table;
But I am lost about replacing the other characters (tab, new line, backspace, etc.). In the previous example I know that \1 will match and replace the first group, but I am not sure how to capture the tab in the second group and then replace it with \t. Could somebody give me a hint about how to do the replacement?
I know I can do this:
SELECT REGEXP_REPLACE( REGEXP_REPLACE(my_column, '("|\\|/)', '\\\1'), '(' || CHR(9) || ')', '\t')
FROM my_table;
But I would have to nest like 5 calls to REGEXP_REPLACE, and I suspect I should be able to do it in just one or two calls.
I am aware of other packages and libraries for JSON, but I think this case is simple enough that it can be solved with the functions that Oracle offers out of the box.
Thank you.

Here's a start. Replacing the regular characters is easy enough; it's the control characters that are tricky. This method uses a group consisting of a character class that contains the characters you want to put a backslash in front of. Note that characters inside the class do not need to be escaped, and that | inside a class is a literal pipe rather than alternation, so it is left out. The REGEXP_REPLACE argument of 1 means start at the first position, and the 0 means replace all occurrences found in the source string.
SELECT REGEXP_REPLACE('t/h"is"'||chr(9)||'is a|te\st', '([/\"])', '\\\1', 1, 0) FROM dual;
Replacing the TAB and the line feed is easy enough by wrapping the above in REPLACE calls, but it stinks to have to do this for each control character. Thus, I'm afraid my answer isn't really a full answer for you; it only helps with the regular characters a bit:
SQL> SELECT REPLACE(REPLACE(REGEXP_REPLACE('t/h"is"'||chr(9)||'is
2 a|te\st', '([/\"])', '\\\1', 1, 0), chr(9), '\t'), chr(10), '\n') fixed
3 FROM dual;
FIXED
-------------------------
t\/h\"is\"\tis\na|te\\st
SQL>
SQL>
EDIT: Here's a solution! I don't claim to understand it fully, but basically it creates a translation table that is joined to your string (in the inp_str table). The CONNECT BY LEVEL subquery generates one row per character position, each character is replaced where it has a match in the translation table, and SYS_CONNECT_BY_PATH stitches the characters back together. The backtick is only a throwaway separator that is removed again, so the input itself must not contain one. I modified a solution found here: http://database.developer-works.com/article/14901746/Replace+%28translate%29+one+char+to+many that really doesn't have a great explanation. Hopefully someone here will chime in and explain this fully.
SQL> with trans_tbl(ch_frm, str_to) as (
select '"', '\"' from dual union
select '/', '\/' from dual union
select '\', '\\' from dual union
select chr(8), '\b' from dual union -- BS
select chr(12), '\f' from dual union -- FF
select chr(10), '\n' from dual union -- NL
select chr(13), '\r' from dual union -- CR
select chr(9), '\t' from dual -- HT
),
inp_str as (
select 'No' || chr(12) || 'w is ' || chr(9) || 'the "time" for /all go\od men to '||
chr(8)||'com' || chr(10) || 'e to the aid of their ' || chr(13) || 'country' txt from dual
)
select max(replace(sys_connect_by_path(ch,'`'),'`')) as txt
from (
select lvl
,decode(str_to,null,substr(txt, lvl, 1),str_to) as ch
from inp_str cross join (select level lvl from inp_str connect by level <= length(txt))
left outer join trans_tbl on (ch_frm = substr(txt, lvl, 1))
)
connect by lvl = prior lvl+1
start with lvl = 1;
TXT
------------------------------------------------------------------------------------------
No\fw is \tthe \"time\" for \/all go\\od men to \bcom\ne to the aid of their \rcountry
SQL>
EDIT 8/10/2016 - Make it a function for encapsulation and reusability so you could use it for multiple columns at once:
create or replace function esc_json(string_in varchar2)
return varchar2
is
s_converted varchar2(4000);
BEGIN
with trans_tbl(ch_frm, str_to) as (
select '"', '\"' from dual union
select '/', '\/' from dual union
select '\', '\\' from dual union
select chr(8), '\b' from dual union -- BS
select chr(12), '\f' from dual union -- FF
select chr(10), '\n' from dual union -- NL
select chr(13), '\r' from dual union -- CR
select chr(9), '\t' from dual -- HT
),
inp_str(txt) as (
select string_in from dual
)
select max(replace(sys_connect_by_path(ch,'`'),'`')) as c_text
into s_converted
from (
select lvl
,decode(str_to,null,substr(txt, lvl, 1),str_to) as ch
from inp_str cross join (select level lvl from inp_str connect by level <= length(txt))
left outer join trans_tbl on (ch_frm = substr(txt, lvl, 1))
)
connect by lvl = prior lvl+1
start with lvl = 1;
return s_converted;
end esc_json;
Example to call for multiple columns at once:
select esc_json(column_1), esc_json(column_2)
from your_table;

Inspired by the answer above, I created this simpler "one-liner" function:
create or replace function json_esc (
str IN varchar2
) return varchar2
is
begin
-- Escape the backslash first so that the backslashes added by the later replacements are not doubled.
return REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(str, '\', '\\'), '"', '\"'), chr(8), '\b'), chr(9), '\t'), chr(10), '\n'), chr(12), '\f'), chr(13), '\r');
end;
Please note that neither this nor @Gary_W's answer above escapes all of the control characters that json.org seems to require.
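One possible way to cover the remaining control characters is sketched below. It assumes that json.org's rule applies to every character below U+0020 and that the generic \uXXXX form is acceptable for those without a short escape. The function name json_esc_ctrl is hypothetical, and the sketch expects its input to have already been passed through one of the functions above, because it adds backslashes of its own.

```sql
-- Hypothetical helper (not part of the answers above): run it on a string
-- whose ", \ and named control characters have already been escaped.
create or replace function json_esc_ctrl (
  str in varchar2
) return varchar2
is
  l_out varchar2(32767) := str;
begin
  for i in 0 .. 31 loop
    -- 8, 9, 10, 12 and 13 already have short escapes (\b \t \n \f \r)
    if i not in (8, 9, 10, 12, 13) then
      -- 'fm000x' formats i as four hex digits with leading zeros
      l_out := replace(l_out, chr(i), '\u' || to_char(i, 'fm000x'));
    end if;
  end loop;
  return l_out;
end json_esc_ctrl;
/
```

It could then be chained as json_esc_ctrl(json_esc(my_column)).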

In SQL Server (2016 and later) you can use the STRING_ESCAPE() function, like below:
SELECT
STRING_ESCAPE('['' This is a special / "message" /'']', 'json') AS
escapedJson;

Related

Oracle - String to Double - Dynamic Scale

I am in a situation where various double values are stored in the database as strings.
(This cannot be changed, for other reasons!)
The numbers can have a varying number of digits before and after the decimal separator.
The decimal separator of the stored values is a .
The default decimal separator of the database might change in the future.
Examples:
1.1
111.1
1.111
11.11
1.1111
I now need to select those as numbers to be able to compare for bigger or smaller values, etc.
Therefore I tried to convert the strings to numbers. I found a hint at this answer: click.
Unfortunately using this as a test:
SELECT TO_NUMBER('10.123', TRANSLATE('10.123', ' 1,234.567890', TO_CHAR(9999.9, '9G999D9') || '99999'))
FROM DUAL;
This somehow converts the number to 10123, completely dropping the decimal separator, so this query gives no result (just for verification):
SELECT * FROM(SELECT TO_NUMBER('10.123', TRANSLATE('10.123', ' 1,234.567890', TO_CHAR(9999.9, '9G999D9') || '99999')) AS NUM
FROM DUAL) WHERE NUM < 11;
So I stepped through the single parts to see if I can find an error:
SELECT TO_CHAR(9999.9, '9G999D9') FROM DUAL; -- 9.999,9
SELECT TO_CHAR(9999.9, '9G999D9') || '99999' FROM DUAL; -- 9.999,999999
SELECT TRANSLATE('10.123', ' 1,234.567890', ' 9.999,999999')
FROM DUAL; -- 99,999
SELECT TRANSLATE('10.123', ' 1,234.567890', TO_CHAR(9999.9, '9G999D9') || '99999')
FROM DUAL; -- 99,999
As you can see I get a . as group separator and a , as decimal separator for the database.
I do not understand why it does not correctly convert the number.
Thank you already for any help!
Try using this version of to_number
TO_NUMBER( string1 [, format_mask] [, nls_language])
For example:
SELECT to_number('1.1111','9G990D00000', 'NLS_NUMERIC_CHARACTERS = ''.,''') FROM DUAL
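As a rough sketch of how that form of TO_NUMBER could be used for the comparison in the question: the inline sample values and the width of the format mask are assumptions chosen for illustration. Because the NLS parameter is passed explicitly, the result should not depend on the session's decimal separator.

```sql
-- Sample values stand in for the stored strings (illustrative data only).
WITH stored_values AS (
  SELECT '1.1' AS num_str FROM dual UNION ALL
  SELECT '111.1' FROM dual UNION ALL
  SELECT '1.1111' FROM dual
)
SELECT num_str,
       -- D is the decimal separator named in the third argument ('.')
       TO_NUMBER(num_str, '999999990D9999999999',
                 'NLS_NUMERIC_CHARACTERS = ''.,''') AS num
FROM stored_values
WHERE TO_NUMBER(num_str, '999999990D9999999999',
                'NLS_NUMERIC_CHARACTERS = ''.,''') < 11;
```

The mask only has to be at least as wide as the longest stored value on each side of the separator.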
You can try this,
alter session set NLS_NUMERIC_CHARACTERS = '.,';
WITH INPUT_TEST AS (
SELECT '.' decimal_operator, '1.1' num_in_char from dual
UNION ALL
SELECT '.' decimal_operator, '111.1 ' from dual
UNION ALL
SELECT '.' decimal_operator, '1.111 ' from dual
UNION ALL
SELECT '.' decimal_operator, '11.11 ' from dual
UNION ALL
SELECT '.' decimal_operator, '1.1111' from dual)
SELECT TO_NUMBER(REPLACE(num_in_char, '.', decimal_separator)) to_num
FROM input_test a, (select SUBSTR(value, 1, 1) decimal_separator
from nls_session_parameters
where parameter = 'NLS_NUMERIC_CHARACTERS') b;
TO_NUM
----------
1.1
111.1
1.111
11.11
1.1111
alter session set NLS_NUMERIC_CHARACTERS = ',.';
Run the select statement above again.
TO_NUM
----------
1,1
111,1
1,111
11,11
1,1111

SQL Regular expression to split a column (string) to multiple rows based on delimiter '/n'

I have to split a column into multiple rows. The column stores a string, and it has to be split on the delimiter '/n'.
I have written the query below, but I am not able to specify ^[/n]: every other 'n' in the string is removed as well. Please help me parse the string.
WITH sample AS
( SELECT 101 AS id,
'Name' test,
'3243243242342342/n12131212312/n123131232/n' as attribute_1,
'test value/nneenu not/nhoney' as attribute_2
FROM DUAL
)
-- end of sample data
SELECT id,
test,
regexp_substr(attribute_1,'[^/n]+', 1, column_value),
regexp_substr(attribute_2,'[^/]+', 1, column_value)
FROM sample,
TABLE(
CAST(
MULTISET(SELECT LEVEL
FROM dual
CONNECT BY LEVEL <= LENGTH(attribute_1) - LENGTH(replace(attribute_1, '/n')) + 1
) AS sys.OdciNumberList
)
)
WHERE regexp_substr(attribute_1,'[^/n]+', 1, column_value) IS NOT NULL
/
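If the delimiter is really a line break, you need to use the class [[:cntrl:]].
'[^/n]+' does not do what you want either: it is a character class that excludes the two separate characters / and n, which is why the other n characters in the string are removed as well.
The escape character is '\', and you cannot use [] to "wrap" a multi-character sequence; you need () instead (that is, grouping).
To pick out the parts between line breaks, use [^[:cntrl:]]+ as the second parameter of regexp_substr.
More help: http://psoug.org/snippet/Regular-Expressions--Regexp-Cheat-Sheet_856.htm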
Assumption
/n is supposed to mean \n to match a newline ( strictly [Posix] speaking a LF character (hex x0a) ).
If this assumption is wrong, use (^|/n)(([^/]|/+[^n])+) as your regex and extract the part of interest using regexp_substr(attribute_1,'(^|/n)(([^/]|/+[^n])+)', 1, column_value, '', 2).
Solution
You cannot specify control characters in escape syntax within character classes. Using the posix character class [:cntrl:] works but suffers from the other characters included; for practical purposes, TAB ( #x09 ) might be a nuisance.
However, you can specify all characters in a regex character class composing the pattern string from literals and calls to the chr function:
-- ...
'3243243242342342'||chr(13)||chr(10)||'12131212312'||chr(13)||chr(10)||'123131232'||chr(13)||chr(10) as attribute_1,
'test value'||chr(13)||chr(10)||'neenu not'||chr(13)||chr(10)||'honey' as attribute_2
-- ...
regexp_substr(attribute_1,'[^'||chr(13)||chr(10)||']+', 1, column_value),
regexp_substr(attribute_2,'[^'||chr(13)||chr(10)||']+', 1, column_value)
-- ...
You may want to check out the following test queries in sqlplus (the cr/lfs are part of the literals; copy into a text editor, check that the cr/lfs are preserved, re-insert if not, drop the result in sqlplus):
select regexp_substr('adda
yxcv','[^'||CHR(10)||CHR(13)||']+', 1, 2) from dual;
select regexp_substr('ad'||CHR(9)||'da
yxcv','[^[:cntrl:]]+', 1, 2) from dual;
with test as (select 'ABC' || chr(13) || 'DEF' || chr(13) || 'GHI' || chr(13) || 'JKL' || chr(13) || 'MNO' str from dual)
select regexp_substr (str, '[^' || chr(13) || ']+', 1, rownum) split
from test
connect by level <= length (regexp_replace (str, '[^' || chr(13) || ']+')) + 1
First choice would be to fix the data model as data stored this way is not optimal. At any rate, try this version with some more test data. I tweaked the regex's:
WITH sample AS
( SELECT 101 AS id,
'Name' test,
'3243243242342342/n12131212312/n123131232/n' as attribute_1,
'test value/nneenu not/nhoney' as attribute_2
FROM DUAL
)
-- end of sample data
SELECT id,
test,
regexp_substr(attribute_1,'(.*?)(/n|$)', 1, column_value, NULL, 1),
regexp_substr(attribute_2,'(.*?)(/n|$)', 1, column_value, NULL, 1)
FROM sample,
TABLE(
CAST(
MULTISET(SELECT LEVEL
FROM dual
--CONNECT BY LEVEL <= LENGTH(attribute_1) - LENGTH(replace(attribute_1, '/n')) + 1
-- Counts substrings ending with the delimiter.
CONNECT BY LEVEL <= REGEXP_COUNT(attribute_1, '.*?/n')
) AS sys.OdciNumberList
)
)
WHERE regexp_substr(attribute_1,'(.*?)(/n|$)', 1, column_value, NULL, 1) IS NOT NULL
/
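If the delimiter turns out to be a real line feed rather than the two characters /n, the same (.*?)(delimiter|$) idea can be sketched with chr(10). The sample string is illustrative, and the sketch relies on the default behaviour that . does not match a newline when the 'n' match parameter is not given.

```sql
-- One row per line of a string that uses LF (chr(10)) as delimiter.
SELECT LEVEL AS line_no,
       -- group 1 is the text before the delimiter (or before end of string)
       REGEXP_SUBSTR(str, '(.*?)(' || CHR(10) || '|$)', 1, LEVEL, NULL, 1) AS line_text
FROM (SELECT 'first' || CHR(10) || 'second' || CHR(10) || 'third' AS str FROM dual)
CONNECT BY LEVEL <= REGEXP_COUNT(str, CHR(10)) + 1;
```

Because the source is a single row, the CONNECT BY LEVEL generator does not multiply rows; with several source rows it would need additional conditions.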

using Oracle SQL - regexp_substr to split a record

I need to split the record for column CMD.NUM_MAI which may contain ',' or ';'.
I did this but it gave me an error:
SELECT REGEXP_SUBSTR (expression.num_mai,
'[^;|,]+',
1,
LEVEL)
FROM (SELECT CMD.num_cmd,
(SELECT COMM.com
FROM COMM
WHERE COMM.cod_soc = CMD.cod_soc AND COMM.cod_com = 'URL_DSD')
AS cod_url,
NVL (CONTACT.nom_cta, TIERS.nom_ct1) AS nom_cta,
NVL (CONTACT.num_mai, TIERS.num_mai) AS num_mai,
NVL (CONTACT.num_tel, TIERS.num_tel) AS num_tel,
TO_CHAR (SYSDATE, 'hh24:MI') AS heur_today
FROM CMD, TIERS, CONTACT
WHERE ( (CMD.cod_soc = :CMD_cod_soc)
AND (CMD.cod_eta = :CMD.cod_eta)
AND (CMD.typ_cmd = :CMD.typ_cmd)
AND (CMD.num_cmd = :CMD.num_cmd))
AND (TIERS.cod_soc(+) = CMD.cod_soc)
AND (TIERS.cod_trs(+) = CMD.cod_trs_tra)
AND (TIERS.cod_soc = CONTACT.cod_soc(+))
AND (TIERS.cod_trs = CONTACT.cod_trs(+))
AND (CONTACT.lib_cta(+) = 'EDITION')) experssion
CONNECT BY REGEXP_SUBSTR (expression.num_mai,'[^;|,]+',1,LEVEL)
Error 1:
CONNECT BY needs a condition, but REGEXP_SUBSTR(...) on its own is only an expression. You have to compare it with something.
Try something like:
CONNECT BY REGEXP_SUBSTR (expression.num_mai,'[^;|,]+',1,LEVEL) IS NOT NULL
Error 2:
Your bind variable names are wrong: a bind variable name cannot contain a dot. Use, for example, :CMD_cod_eta instead of :CMD.cod_eta.
Perhaps you meant this:
( (CMD.cod_soc = :CMD_cod_soc)
AND (CMD.cod_eta = :CMD_cod_eta)
AND (CMD.typ_cmd = :CMD_typ_cmd)
AND (CMD.num_cmd = :CMD_num_cmd))
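A stripped-down sketch of the corrected CONNECT BY, run against a single literal instead of the original subquery (the sample addresses are made up). Note that inside a bracket expression | is a literal pipe rather than alternation, so '[^;,]+' is enough to split on ; and ,.

```sql
-- Split one string on ';' or ',' into one row per token.
SELECT REGEXP_SUBSTR(num_mai, '[^;,]+', 1, LEVEL) AS mail
FROM (SELECT 'a@example.com;b@example.com,c@example.com' AS num_mai FROM dual)
CONNECT BY REGEXP_SUBSTR(num_mai, '[^;,]+', 1, LEVEL) IS NOT NULL;
```

This form is only safe when the inner query returns a single row; with several rows the hierarchy would combine them.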
This is a common question. I'd put it into a function, then call it as needed:
CREATE OR REPLACE function fn_split(i_string in varchar2, i_delimiter in varchar2 default ',', b_dedup_tokens in number default 0)
return sys.dbms_debug_vc2coll
as
l_tab sys.dbms_debug_vc2coll;
begin
select regexp_substr(i_string,'[^' || i_delimiter || ']+', 1, level)
bulk collect into l_tab
from dual
connect by regexp_substr(i_string, '[^' || i_delimiter || ']+', 1, level) is not null
order by level;
if (b_dedup_tokens > 0) then
return l_tab multiset union distinct l_tab;
end if;
return l_tab;
end;
/
This returns a sys.dbms_debug_vc2coll, which is a table of varchar2(1000) and a preloaded type owned by SYS (or you could create your own type, using varchar2(4000) perhaps). Here is an example using it, with space, comma, or semicolon as delimiters:
with test_data as (
select 1 as id, 'A;test;test;string' as test_string from dual
union
select 2 as id, 'Another string' as test_string from dual
union
select 3 as id,'A,CSV,string' as test_string from dual
)
select d.*, column_value as token
from test_data d, table(fn_split(test_string, ' ,;', 0));
Output:
ID TEST_STRING        TOKEN
 1 A;test;test;string A
 1 A;test;test;string test
 1 A;test;test;string test
 1 A;test;test;string string
 2 Another string     Another
 2 Another string     string
 3 A,CSV,string       A
 3 A,CSV,string       CSV
 3 A,CSV,string       string
You can pass 1 instead of 0 to fn_split to dedup the tokens (like the repeated "test" token above)

Oracle REGEXP_LIKE to use lookaround or AND ignoring the order

I am trying to match a given list of search parameters appearing anywhere in a given string. The search parameters can be combined with OR or with AND. REGEXP_LIKE with REPLACE works fine for OR (|), but I have not been able to do it for AND. Here is an example of OR:
select 'match' from dual WHERE REGEXP_LIKE('BCR081', REPLACE ('BCR;081', ';', '|')); --works
select 'match' from dual WHERE REGEXP_LIKE('BCR081', '(' || REPLACE ('BCR;081', ';', ').*?(') || ')');
-- Works when they are in order but order shouldn't matter.
select 'match' from dual WHERE REGEXP_LIKE('BCR081', '(' || REPLACE ('081;BCR', ';', ').*?(') || ')'); --I need this to work.
Is there something equivalent to
select 'match' from dual WHERE REGEXP_LIKE('BCR081', REPLACE ('BCR;081', ';', '&'));
Any help is greatly appreciated. I tried (look ahead?):
select 'match' from dual WHERE REGEXP_LIKE('BCR081','(?=' || REPLACE ('081;BCR', ';', ')(?=') || ')');
Note: The above is only an example; we can have anywhere from 1 to 20 search parameters. Also, I can't use the CONTAINS clause, as it throws a "too many results" error.
Oracle regular expressions do not support lookahead, so instead we can tokenise the search string and check the tokens one by one, by virtually generating rows.
with my_data as
(
select 'BCR081' as str , 'BCR;081' as pattern from dual
),
pattern_table as
(
select str, regexp_substr(pattern,'[^;]+',1,level) as pattern
from my_data
connect by level <= regexp_count(pattern,';') + 1
)
SELECT DECODE(
COUNT(DECODE(
INSTR(a.str,b.pattern),
0,
null,
1)
),
COUNT(*), -- every token was found (AND semantics)
'match',
'no match'
) as result
FROM my_data a, pattern_table b
WHERE a.str = b.str
GROUP BY a.str;

How to tokenize a string in Oracle and convert each token to NUMBER to use them in a query as part of IN clause?

Suppose I have a string '1,2,3'
I want to tokenize the string and convert each of the tokens into NUMBER. So the above string will be tokenized into :
1 NUMBER
2 NUMBER
3 NUMBER
The final intention is to use them in a query as part of IN clause as below :
select * from sample where type in (1,2,3) ;
How can I achieve this ? One important point here is the string can have different number of tokens in different situations. So it can be either '1,2,3' or '1,2' or '1,2,3,4' or even '1'.
Please help me out guys.
Thanks in advance.
Please try:
with test as
(
select '1,2,3' str from dual
)
select * from sample
where type in(
select to_number(regexp_substr (str, '[^,]+', 1, level)) split
from test
connect by level <= regexp_count (str, ',') + 1);
Depending on what you are doing, it might be faster to convert the id to a String and look for it in your String. Just add a comma to the beginning and the end of your list.
SELECT id
FROM (SELECT 1 AS id FROM DUAL
UNION
SELECT 2 FROM DUAL
UNION
SELECT 3 FROM DUAL) idtable
WHERE ',' || '1,3,4,5' || ',' LIKE '%,' || idtable.id || ',%'