How to use REGEXP_SUBSTR properly? - sql

Currently in my select statement I have id and value. The value is JSON which looks like this:
{"layerId":"nameOfLayer","layerParams":{some unnecessary data}}
I would like my select to return id and nameOfLayer, so the output would be, for example:
1, layerName
2, layerName2
etc.
The JSON always looks the same, so layerId is always the first key.
Could you tell me how to use REGEXP_SUBSTR properly in my select query, which currently looks like this?
select
id,
value
from
...
where
table1.id = table2.bookmark_id
and ...;

In Oracle 11g, you can extract the layerId using the following regular expression, where js is the name of your JSON column:
regexp_replace(js, '^.*"layerId":"([^"]+).*$', '\1')
This basically extracts the string between double quotes after "layerId":.
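For instance, run against DUAL with the sample document from the question (a minimal sketch):
select regexp_replace(
  '{"layerId":"nameOfLayer","layerParams":{}}',
  '^.*"layerId":"([^"]+).*$',
  '\1'
) as layer_id
from dual;
-- LAYER_ID
-- nameOfLayer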
In more recent versions, you would add a check constraint on the table to ensure that the document is valid JSON, and then use the dot notation to access the object attribute as follows:
create table mytable (
id int primary key,
js varchar2(200),
constraint ensure_js_is_json check (js is json)
);
insert into mytable values (1, '{"layerId":"nameOfLayer","layerParams":{} }');
select id, t.js.layerId from mytable t;
Demo on DB Fiddle:
ID | LAYERID
-: | :----------
1 | nameOfLayer

Don't use regular expressions; use JSON_TABLE or JSON_VALUE to parse JSON:
Oracle 18c Setup:
CREATE TABLE test_data (
id INTEGER,
value VARCHAR2(4000)
);
INSERT INTO test_data ( id, value )
SELECT 1, '{"layerId":"nameOfLayer","layerParams":{"some":"unnecessary data"}}' FROM DUAL UNION ALL
SELECT 2, '{"layerParams":{"layerId":"NOT THIS ONE!"},"layerId":"nameOfLayer"}' FROM DUAL UNION ALL
SELECT 3, '{"layerId":"Name with \"Quotes\"","layerParams":{"layerId":"NOT THIS ONE!"}}' FROM DUAL;
Query 1:
SELECT t.id,
j.layerId
FROM test_data t
CROSS JOIN
JSON_TABLE(
t.value,
'$'
COLUMNS (
layerId VARCHAR2(50) PATH '$.layerId'
)
) j
Query 2:
If you only want a single value you could, alternatively, use JSON_VALUE:
SELECT id,
JSON_VALUE( value, '$.layerId' ) AS layerId
FROM test_data
Output (both queries):
ID | LAYERID
-: | :-----------------
1 | nameOfLayer
2 | nameOfLayer
3 | Name with "Quotes"
Query 3:
You can try regular expressions but they do not always work as expected:
SELECT id,
REPLACE(
REGEXP_SUBSTR( value, '[{,]"layerId":"((\\"|[^"])*)"', 1, 1, NULL, 1 ),
'\"',
'"'
) AS layerID
FROM test_data
Output:
ID | LAYERID
-: | :-----------------
1 | nameOfLayer
2 | NOT THIS ONE!
3 | Name with "Quotes"
So if you can guarantee that no one will ever put JSON into the database with the keys in a different order, then this may work; however, the JSON specification allows key-value pairs to appear in any order, so regular expressions are not a general solution that will parse every JSON string. You should use a proper JSON parser: there are third-party solutions available for Oracle 11g, or you can upgrade to Oracle 12c, which has a native solution.
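For comparison, JSON_VALUE is unaffected by key order. A minimal sketch against the reordered document from row 2 of the test data (12c and later):
select json_value(
  '{"layerParams":{"layerId":"NOT THIS ONE!"},"layerId":"nameOfLayer"}',
  '$.layerId'
) as layer_id
from dual;
-- LAYER_ID
-- nameOfLayer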
db<>fiddle here

I think you can use regexp_substr like this:
regexp_substr(str, '[^"]+',1,2) as layer_id,
regexp_substr(str, '[^"]+',1,4) as layername
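For example, against the sample document from the question (a sketch run on DUAL; with this pattern, occurrence 2 of a non-quote run is the key name and occurrence 4 is its value):
select regexp_substr('{"layerId":"nameOfLayer","layerParams":{}}', '[^"]+', 1, 2) as layer_id,
       regexp_substr('{"layerId":"nameOfLayer","layerParams":{}}', '[^"]+', 1, 4) as layername
from dual;
-- LAYER_ID | LAYERNAME
-- layerId  | nameOfLayer
Note that this simply counts quote-delimited tokens, so it only works while layerId really is the first key, as the question states.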
Db<>fiddle demo
Cheers!!

Related

Oracle 12c Json split

This is the result I am getting in Oracle 12c:
Id | Start Date Range                          | End Date Range
-: | :---------------------------------------- | :----------------------------------------
 1 | [ "2019-01-07","2019-02-17","2019-03-17"] | [ "2019-01-14","2019-02-21","2019-03-21"]
And I want it like this:
Id | Start Date Range | End Date Range
-: | :--------------- | :-------------
 1 | 2019-01-07       | 2019-01-14
 1 | 2019-02-17       | 2019-02-21
 1 | 2019-03-17       | 2019-03-21
Earlier I had asked this question for a single-column split; below is the link:
How to replace special characters and then break line in oracle
But when I add another column, I get a Cartesian product.
You can use json_table to extract the strings from the JSON arrays, presumably as actual dates:
select t.id, s.n, s.start_date, e.end_date
from your_table t
cross apply json_table (
t.start_range, '$[*]'
columns
n for ordinality,
start_date date path '$'
) s
join json_table (
t.end_range, '$[*]'
columns
n for ordinality,
end_date date path '$'
) e
on e.n = s.n
The for ordinality clauses provide an index into each array, and the join then matches up the 'related' array entries.
ID | N | START_DATE | END_DATE
-: | -: | :--------- | :--------
1 | 1 | 07-JAN-19 | 14-JAN-19
1 | 2 | 17-FEB-19 | 21-FEB-19
1 | 3 | 17-MAR-19 | 21-MAR-19
If you want strings rather than dates for some reason, you can just change the data type in the columns clause.
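For example, to keep the raw strings, the date column definition would become varchar2 (a minimal sketch against a literal array):
select j.n, j.start_date
from json_table(
  '["2019-01-07","2019-02-17","2019-03-17"]', '$[*]'
  columns (
    n for ordinality,
    start_date varchar2(10) path '$'
  )
) j;
-- returns the array entries as plain strings rather than DATE values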
db<>fiddle
You can also do it with regexp_substr and connect by, after replacing the [, ] and " characters with empty strings.
Schema and insert statements:
create table testtable(Id int, Start_Date_Range varchar(500), End_Date_Range varchar(500));
insert into testtable values(1 ,'[ "2019-01-07","2019-02-17","2019-03-17"]', '[ "2019-01-14","2019-02-21","2019-03-21"]');
Query:
select distinct id, trim(regexp_substr(replace(replace(replace(Start_Date_Range,'"',''),'[',''),']',''),'[^,]+', 1, level) ) Start_Date_Range,
trim(regexp_substr(replace(replace(replace(end_Date_Range,'"',''),'[',''),']',''),'[^,]+', 1, level) ) End_Date_Range,
level
from testtable
connect by regexp_substr(Start_Date_Range, '[^,]+', 1, level) is not null
order by id, level;
Output:
ID | START_DATE_RANGE | END_DATE_RANGE | LEVEL
-: | :--------------- | :------------- | ----:
 1 | 2019-01-07       | 2019-01-14     | 1
 1 | 2019-02-17       | 2019-02-21     | 2
 1 | 2019-03-17       | 2019-03-21     | 3
db<>fiddle here
In a comment to the OP, I pointed out that the data model is not quite right. The values in the two JSON arrays are related; such data should be encoded in a single array of objects, not two parallel arrays, with each object having two members: a start date and an end date.
To illustrate the data model I am suggesting, I start with a sample input table (with an additional id), use Alex Poole's answer exactly as is to generate the table the OP asked about, and then follow that with JSON generating functions to put the data back into JSON format, to show what I think the input data should look like. (The provider of the JSON string should create a JSON document in this format, rather than sending two separate JSON arrays of strings representing dates.)
What I do not show in the query itself is how to use a single call to JSON_TABLE to split the data from the single array of objects created at the end; that is a lot simpler than the query needed to get the data out of two separate JSON arrays. A sketch of that call follows the output below.
NOTE - this is not really an answer; I wrote it as an answer because it obviously wouldn't fit in a comment.
with
t (id, start_date_range, end_date_range) as (
select 1, '["2019-01-07","2019-02-17","2019-03-17"]',
'["2019-01-14","2019-02-21","2019-03-21"]' from dual union all
select 5, '["2020-04-23","2020-06-15"]',
'["2020-04-30","2020-06-19"]' from dual
)
, shown_as_table(id, n, start_date, end_date) as (
select t.id, s.n, to_char(s.start_date, 'yyyy-mm-dd'),
to_char(e.end_date, 'yyyy-mm-dd')
from t
cross apply json_table (
t.start_date_range, '$[*]'
columns
n for ordinality,
start_date date path '$'
) s
join json_table (
t.end_date_range, '$[*]'
columns
n for ordinality,
end_date date path '$'
) e
on e.n = s.n
)
select id, json_arrayagg(
json_object('start' value start_date, 'end' value end_date)
format json
order by n
) as date_range_array
from shown_as_table
group by id
;
Output:
ID DATE_RANGE_ARRAY
-- -------------------------------------------------------------------------------------------------------------------------------
1 [{"start":"2019-01-07","end":"2019-01-14"},{"start":"2019-02-17","end":"2019-02-21"},{"start":"2019-03-17","end":"2019-03-21"}]
5 [{"start":"2020-04-23","end":"2020-04-30"},{"start":"2020-06-15","end":"2020-06-19"}]
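And the sketch promised above: reading the combined array back out takes a single JSON_TABLE call, assuming the generated document is stored in a hypothetical table date_ranges(id, date_range_array):
select d.id, j.n, j.start_date, j.end_date
from date_ranges d
cross join json_table(
  d.date_range_array, '$[*]'
  columns (
    n for ordinality,
    start_date date path '$."start"',
    end_date date path '$."end"'
  )
) j;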

Oracle Select From JSON Array

I have a table for some 'settings' and in that table I have a record with a json array. It is a simple array, like this:
"['scenario1','scenario2','scenario3']"
I want to use a sub-select statement in a view to pull this information out so I can use it like this:
select * from table where field_scenario in (select ????? from settings_table where this=that)
I have been looking through documentation and googling for this, but for the life of me I can't figure out how to 'pivot' the returned array into individual elements in order to use it.
Oracle 12c I believe, thanks in advance.
Do NOT use regular expressions to parse JSON. Use a proper JSON parser:
select *
from table_name
where field_scenario in (
SELECT j.value
FROM settings_table s
OUTER APPLY (
SELECT value
FROM JSON_TABLE(
s.json,
'$[*]'
COLUMNS(
value VARCHAR2(50) PATH '$'
)
)
) j
)
Which, for the sample data:
CREATE TABLE settings_table ( json CLOB CHECK ( json IS JSON ) );
INSERT INTO settings_table ( json ) VALUES ( '["scenario1","scenario2","scenario3"]');
INSERT INTO settings_table ( json ) VALUES ( '["scenario5"]');
INSERT INTO settings_table ( json ) VALUES ( '["scenario \"quoted\""]');
INSERT INTO settings_table ( json ) VALUES ( '["scenario2,scenario4"]');
CREATE TABLE table_name ( id, field_scenario ) AS
SELECT LEVEL, 'scenario'||LEVEL FROM DUAL CONNECT BY LEVEL <= 6 UNION ALL
SELECT 7, 'scenario "quoted"' FROM DUAL;
Outputs:
ID | FIELD_SCENARIO
-: | :----------------
1 | scenario1
2 | scenario2
3 | scenario3
5 | scenario5
7 | scenario "quoted"
db<>fiddle here
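Since the question mentions using this from a view, the extraction can also be packaged up once (a sketch; the view name settings_scenarios is illustrative):
create or replace view settings_scenarios as
select j.value as scenario
from settings_table s
cross join json_table(
  s.json, '$[*]'
  columns ( value varchar2(50) path '$' )
) j;

select *
from table_name
where field_scenario in (select scenario from settings_scenarios);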

How can I access a JSON value when the key has a dot inside its name (in Oracle)?

How can I get the value for the "b.c" key? I need to extract that 2 in an efficient way.
I just get null:
select js,
json_value(js,'$.a') a_value, -- works correctly and gives 1
json_value(js,'$.b.c') b_c_value, -- doesn't work and gives null
json_query(js,'$.b.c') b_c_query -- I saw this solution somewhere but this also gives null
from (select '{"a": "1","b.c": "2"}' js from dual) -- sample JSON
Put the key in double quotes. So,
json_value(js,'$."b.c"') b_c_value
Full example:
select js,
json_value(js,'$."b.c"') b_c_value
from (select '{"a": "1","b.c": "2"}' js from dual) -- sample JSON
+-----------------------+-----------+
| JS | B_C_VALUE |
+-----------------------+-----------+
| {"a": "1","b.c": "2"} | 2 |
+-----------------------+-----------+
One option would be to use the JSON_TABLE() function, introduced in DB version 12.1.0.2:
WITH t(js) AS
(
SELECT '{"a": "1","b.c": "2"}' FROM dual
)
SELECT a_value, b_c_value
FROM t
CROSS JOIN JSON_TABLE(js, '$'
COLUMNS (
a_value INT PATH '$.a',
b_c_value INT PATH '$."b.c"' ) )
A_VALUE B_C_VALUE
------- ---------
1 2
Demo

(SQLite) Selecting # separated data as multiple rows

I have a table which contains foreign keys concatenated (separated by #) in a single field. I want to transform the data to one row per FK so that I can do a join on the data.
My table looks like this:
ARCHE
id_a | str_ids
The str_ids field contains concatenated FKs as follows: #id1#id2#id4#id7#
(There is a different number of aggregated ids in each row.)
I am not really familiar with SQLite, and I have trouble finding the equivalent. I understand I have to do this with "WITH RECURSIVE", but it seems I can't get the hang of it.
The Oracle equivalent of what I am looking for is as follows:
select
id_a
,trim(regexp_substr(str_ids, '[^#]+', 1, LEVEL)) as id_b
from arche
connect by trim(regexp_substr(str_ids, '[^#]+', 1, LEVEL)) IS NOT NULL
In SQLite you can use a recursive common table expression to solve this. The recursive CTE selects from the original table and splits the strings into subparts, which the main query selects from.
WITH RECURSIVE cte(id, val, etc) AS(
SELECT id_a, '', str_ids FROM arche
UNION ALL
SELECT
id
, SUBSTR(etc, 0, INSTR(etc, '#'))
, SUBSTR(etc, INSTR(etc, '#')+1)
FROM cte
WHERE etc <> ''
)
SELECT id AS id_a, REPLACE(val, 'id', '') AS id_b
FROM cte
WHERE val <> ''
ORDER BY id, val
Here is an example:
Schema (SQLite v3.26)
Query #1
WITH RECURSIVE cte(id, val, etc) AS(
SELECT 1, '', '#id1#id2#id4#id7#'
UNION ALL
SELECT
id
, SUBSTR(etc, 0, INSTR(etc, '#'))
, SUBSTR(etc, INSTR(etc, '#')+1)
FROM cte
WHERE etc <> ''
)
SELECT id AS id_a, val AS id_b
FROM cte
WHERE val <> ''
ORDER BY id, val;
| id_a | id_b |
| ---- | ---- |
| 1 | id1 |
| 1 | id2 |
| 1 | id4 |
| 1 | id7 |
View on DB Fiddle
Notes:
- REGEXP_REPLACE does not exist in SQLite, so I replaced it with REPLACE.
- You need a # at the end of the string for this to work (having two #s is OK, too).
- This is not a very performant approach; if you have lots of rows to process, it might not scale well.
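And a sketch of the join the question is ultimately after, assuming a hypothetical lookup table b(id_b, name) keyed by the split values:
WITH RECURSIVE cte(id, val, etc) AS(
SELECT id_a, '', str_ids FROM arche
UNION ALL
SELECT
id
, SUBSTR(etc, 0, INSTR(etc, '#'))
, SUBSTR(etc, INSTR(etc, '#')+1)
FROM cte
WHERE etc <> ''
)
SELECT c.id AS id_a, b.name
FROM cte c
JOIN b ON b.id_b = c.val
WHERE c.val <> '';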

Convert an array into a Map

I have a table with a column like
[{"key":"e","value":["253","203","204"]},{"key":"st","value":["mi"]},{"key":"k2","value":["1","2"]}]
Which is of the format array<struct<key:string,value:array<string>>>
I want to convert the column into the format below:
{"e":["253","203","204"],"st":["mi"],"k2":["1","2"]}
which is of the type map<string,array<string>>
I have tried exploding the array, but that does not work. Any ideas how I can do this in Hive?
Without the use of external libraries, it's impossible. Refer to Brickhouse or create your own UDAF (a sketch of the Brickhouse route appears at the end of this answer).
Note: the code below provides snippets that reproduce the problem and then solve the closest problem that Hive's built-in functions can handle, i.e. producing map<string,string>, not map<string,array<string>>.
-- reproducing the problem
CREATE TABLE test_table(id INT, input ARRAY<STRUCT<key:STRING,value:ARRAY<STRING>>>);
INSERT INTO TABLE test_table
SELECT
1 AS id,
ARRAY(
named_struct("key","e", "value", ARRAY("253","203","204")),
named_struct("key","st", "value", ARRAY("mi")),
named_struct("key","k2", "value", ARRAY("1", "2"))
) AS input;
SELECT id, input FROM test_table;
+-----+-------------------------------------------------------------------------------------------------------+--+
| id | input |
+-----+-------------------------------------------------------------------------------------------------------+--+
| 1 | [{"key":"e","value":["253","203","204"]},{"key":"st","value":["mi"]},{"key":"k2","value":["1","2"]}] |
+-----+-------------------------------------------------------------------------------------------------------+--+
With exploding and using STRUCT features, we can split the keys and values.
SELECT id, exploded_input.key, exploded_input.value
FROM (
SELECT id, exploded_input
FROM test_table LATERAL VIEW explode(input) d AS exploded_input
) x;
+-----+------+----------------------+--+
| id | key | value |
+-----+------+----------------------+--+
| 1 | e | ["253","203","204"] |
| 1 | st | ["mi"] |
| 1 | k2 | ["1","2"] |
+-----+------+----------------------+--+
The idea is to use your UDAF to "collect" a map while aggregating on id.
What Hive can solve with built in function is generating map<string,string> by converting rows to strings with a special delimiter, aggregate rows via another special delimiter and use str_to_map built-in function on the delimiters to generate map<string, string>.
SELECT
id,
str_to_map(
-- outputs: e:253,203,204#st:mi#k2:1,2 with delimiters between aggregated rows
concat_ws('#', collect_list(list_to_string)),
'#', -- first delimiter
':' -- second delimiter
) mapped_output
FROM (
SELECT
id,
-- outputs 3 rows: (e:253,203,204), (st:mi), (k2:1,2)
CONCAT(exploded_input.key,':' , CONCAT_WS(',', exploded_input.value)) as list_to_string
FROM (
SELECT id, exploded_input
FROM test_table LATERAL VIEW explode(input) d AS exploded_input
) x
) y
GROUP BY id;
Which outputs a string to string map like:
+-----+-------------------------------------------+--+
| id | mapped_output |
+-----+-------------------------------------------+--+
| 1 | {"e":"253,203,204","st":"mi","k2":"1,2"} |
+-----+-------------------------------------------+--+
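For the original map<string,array<string>> shape, here is a hedged sketch of the Brickhouse route mentioned at the start of this answer. It assumes the Brickhouse jar is available (the path below is illustrative) and uses its two-argument collect UDAF, which aggregates key/value pairs into a map:
-- register the Brickhouse collect UDAF (jar path is illustrative)
ADD JAR /path/to/brickhouse.jar;
CREATE TEMPORARY FUNCTION collect AS 'brickhouse.udf.collect.CollectUDAF';

SELECT
id,
collect(exploded_input.key, exploded_input.value) AS mapped_output
FROM test_table LATERAL VIEW explode(input) d AS exploded_input
GROUP BY id;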
with input_set as (
  select array(
           named_struct('key','e','value',array('253','203','204')),
           named_struct('key','st','value',array('mi')),
           named_struct('key','k2','value',array('1','2'))
         ) as input_array
), break_input_set as (
  select y.col_num as y_col_num, y.col_value as y_col_value
  from input_set
  lateral view posexplode(input_set.input_array) y as col_num, col_value
), create_map as (
  select map(y_col_value.key, y_col_value.value) as final_map
  from break_input_set
)
select * from create_map;
If you can post-process the data in JavaScript instead, you can build the object like this:
var Array = [{"key":"e","value":["253","203","204"]},{"key":"st","value":["mi"]},{"key":"k2","value":["1","2"]}];
var obj = {};
for (var i = 0; i < Array.length; i++) {
  obj[Array[i].key] = Array[i].value;
}
obj will be in the required format.