JSON - Flatten Keys and Values in Hive

I have a Hive table with a record whose JSON column value is {"XXX": ["123","456"],"YYY": ["246","135"]} and whose ID is ABC.
I need to flatten it as shown below in a Hive query.
Key | Value | ID
XXX | 123 | ABC
XXX | 456 | ABC
YYY | 246 | ABC
YYY | 135 | ABC

The query below uses get_json_object to extract the array stored under each JSON key, then regexp_replace and split to turn that array string into a Hive array. Exploding the resulting map gives one row per key, and a lateral view with a second explode gives one row per value. The full reproducible example is below:
WITH input_df AS (
    SELECT '{"XXX": ["123","456"],"YYY": ["246","135"]}' my_col
)
SELECT
    t.key,
    kv.kval as value
FROM (
    SELECT
        explode(map(
            'XXX',
            split(regexp_replace(get_json_object(my_col,'$.XXX'),'"|\\[|\\]',''),','),
            'YYY',
            split(regexp_replace(get_json_object(my_col,'$.YYY'),'"|\\[|\\]',''),',')
        ))
    FROM
        input_df
) t LATERAL VIEW explode(t.value) kv as kval
If your table or view is named input_df and your JSON column is my_col, you can use the query below directly:
SELECT
    t.key,
    kv.kval as value
FROM (
    SELECT
        explode(map(
            'XXX',
            split(regexp_replace(get_json_object(my_col,'$.XXX'),'"|\\[|\\]',''),','),
            'YYY',
            split(regexp_replace(get_json_object(my_col,'$.YYY'),'"|\\[|\\]',''),',')
        ))
    FROM
        input_df
) t LATERAL VIEW explode(t.value) kv as kval
Response to updated question 1:
SELECT
    t.key,
    kv.kval as value,
    'ABC' as ID
FROM (
    SELECT
        explode(map(
            'XXX',
            split(regexp_replace(get_json_object(my_col,'$.XXX'),'"|\\[|\\]',''),','),
            'YYY',
            split(regexp_replace(get_json_object(my_col,'$.YYY'),'"|\\[|\\]',''),',')
        ))
    FROM
        input_df
) t LATERAL VIEW explode(t.value) kv as kval
Let me know if this works for you.
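To make the two explode steps easier to follow, here is a minimal Python sketch of the same flattening logic. It is only a model of what the Hive query does, not Hive code: the function name flatten_record and the idea of passing the ID in as an argument are assumptions made for illustration.

```python
import json


def flatten_record(record_id, json_text):
    """Return one (key, value, id) row per array element in the JSON object."""
    parsed = json.loads(json_text)
    rows = []
    # First explode: one entry per JSON key (XXX, YYY).
    for key, values in parsed.items():
        # Second explode (the lateral view): one row per array element.
        for value in values:
            rows.append((key, value, record_id))
    return rows


rows = flatten_record("ABC", '{"XXX": ["123","456"],"YYY": ["246","135"]}')
for row in rows:
    print(row)
# ('XXX', '123', 'ABC')
# ('XXX', '456', 'ABC')
# ('YYY', '246', 'ABC')
# ('YYY', '135', 'ABC')
```

Unlike the query, the sketch parses the JSON properly instead of stripping quotes and brackets with a regular expression, so values containing commas or quotes would survive intact.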

Related

Building a table with dynamic columns from a key-value array in Snowflake

I have the following table:
ID | DATA
1 | [{"key":"Apple", "value":2}, {"key":"Orange", "value":3}]
2 | [{"key":"Apple", "value":5}, {"key":"Orange", "value":4}, {"key":"Cookie", "value":4}]
I'd like to build the following table:
ID | Apple | Orange | Cookie
1 | 2 | 3 |
2 | 5 | 4 | 4
I've tried many combinations of PARSE_JSON and FLATTEN, but none seemed to support this structure.
Sample data:
CREATE OR REPLACE TABLE tab
AS
SELECT 1 ID, PARSE_JSON('[{"key":"Apple", "value":2}, {"key":"Orange", "value":3}]') AS DATA
UNION
SELECT 2, PARSE_JSON('[{"key":"Apple", "value":5}, {"key":"Orange", "value":4}, {"key":"Cookie", "value":4}]');
Step 1 - parse:
SELECT id, s.VALUE:key::TEXT AS key, s.VALUE:value::TEXT AS value
FROM tab
,LATERAL FLATTEN(input=>tab.DATA) s;
Step 2: Pivot
WITH cte AS (
SELECT id, s.VALUE:key::TEXT AS key, s.VALUE:value::TEXT AS value
FROM tab
,LATERAL FLATTEN(input=>tab.DATA) s
)
SELECT *
FROM cte
PIVOT(MAX(value) FOR KEY IN ('Apple', 'Orange', 'Cookie')) AS p;
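The parse and pivot steps can be modelled in a few lines of Python, which also shows the shape of each intermediate result. This is a sketch of the logic only, not Snowflake code. The names table, long_rows and pivoted are made up for illustration, and the values are kept as numbers here whereas the query casts them to TEXT.

```python
import json

# Same sample data as the CREATE TABLE statement, as (ID, DATA) pairs.
table = [
    (1, '[{"key":"Apple", "value":2}, {"key":"Orange", "value":3}]'),
    (2, '[{"key":"Apple", "value":5}, {"key":"Orange", "value":4}, {"key":"Cookie", "value":4}]'),
]

# Step 1 (parse): one (id, key, value) row per array element.
long_rows = [
    (row_id, item["key"], item["value"])
    for row_id, data in table
    for item in json.loads(data)
]

# Step 2 (pivot): one output row per id, one column per listed key.
pivot_keys = ["Apple", "Orange", "Cookie"]
pivoted = {}
for row_id, key, value in long_rows:
    if key not in pivot_keys:
        continue
    cells = pivoted.setdefault(row_id, {k: None for k in pivot_keys})
    # Mirrors MAX(value): keep the largest value seen for this (id, key).
    cells[key] = value if cells[key] is None else max(cells[key], value)

for row_id in sorted(pivoted):
    print(row_id, [pivoted[row_id][k] for k in pivot_keys])
# 1 [2, 3, None]
# 2 [5, 4, 4]
```

As in the query, the list of keys is written out explicitly, so a key that is not listed (or a row that lacks a listed key) shows up as a missing column or an empty cell respectively.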

Split single row value to multiple rows in Snowflake

I have a table where a column holds several values separated by ';'. I would like to split them so that each value gets its own row.
I have tried the SQL statement below.
SELECT DISTINCT COL_NAME FROM "DB"."SCHEMA"."TABLE",
LATERAL FLATTEN(INPUT=>SPLIT(COL_NAME,';'))
But the output is not as expected: the query does nothing to my data.
This can be achieved with the SPLIT_TO_TABLE table function, which splits a string (based on a specified delimiter) and flattens the results into rows:
SELECT *
FROM tab, LATERAL SPLIT_TO_TABLE(column_name, ';')
I was able to resolve this by using the lateral table function like a joined table and selecting its VALUE column.
SELECT DISTINCT A.VALUE AS COL_NAME
FROM "DB"."SCHEMA"."TABLE",
LATERAL SPLIT_TO_TABLE(COL_NAME,';') A
It looks like your data has multiple delimiters. You can use the STRTOK_SPLIT_TO_TABLE function, which accepts multiple delimiters:
WITH data AS (
SELECT *
FROM VALUES
('Greensboro-High Point-Winston-Salem;Norfolk-Portsmouth-Newport News Washington, D.C. Roanoke-Lynchburg Richmond-Petersburg')
v( cities))
select *
from data, lateral strtok_split_to_table(cities, ';-')
order by seq, index;
Your first attempt was very close; you just need to access the output of the FLATTEN instead of its input.
So, using this CTE for data:
WITH fake_data AS (
SELECT *
FROM VALUES
('Greensboro-High Point-Winston-Salem;Norfolk-Portsmouth-Newport News;Washington, D.C.;Roanoke-Lynchburg;Richmond-Petersburg'),
('Knoxville'),
('Knoxville;Memphis;Nashville')
v( COL_NAME)
)
If you alias your tables and select the flattened value, it works:
SELECT DISTINCT f.value::text as col_name
FROM fake_data d,
LATERAL FLATTEN(INPUT=>SPLIT(COL_NAME,';')) f
;
That is what you did in your own answer, but via SPLIT_TO_TABLE:
SELECT DISTINCT f.value as col_name
FROM fake_data d,
TABLE(SPLIT_TO_TABLE(COL_NAME,';')) f
;
STRTOK_SPLIT_TO_TABLE does the same thing:
SELECT DISTINCT f.value as col_name
FROM fake_data d,
TABLE(strtok_split_to_table(COL_NAME,';')) f
;
It can also be done with STRTOK_TO_ARRAY followed by FLATTEN:
SELECT DISTINCT f.value as col_name
FROM fake_data d,
TABLE(FLATTEN(input=>STRTOK_TO_ARRAY(COL_NAME,';'))) f
;
COL_NAME
Greensboro-High Point-Winston-Salem
Norfolk-Portsmouth-Newport News
Washington, D.C.
Roanoke-Lynchburg
Richmond-Petersburg
Knoxville
Memphis
Nashville
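For readers who want to check the logic outside Snowflake, the sketch below models the split-then-select-the-value pattern in Python. The helper names (split_rows, strtok_rows, distinct) are invented for this example. The strtok_rows model assumes that each character of the delimiter string acts as its own delimiter and that empty tokens are dropped, which is my understanding of the STRTOK family rather than something shown in the thread.

```python
import re

fake_data = [
    "Greensboro-High Point-Winston-Salem;Norfolk-Portsmouth-Newport News;"
    "Washington, D.C.;Roanoke-Lynchburg;Richmond-Petersburg",
    "Knoxville",
    "Knoxville;Memphis;Nashville",
]


def split_rows(rows, delimiter):
    """One output value per piece: the 'value' column of the flattened split."""
    return [piece for row in rows for piece in row.split(delimiter)]


def strtok_rows(rows, delimiter_chars):
    """Tokenize on any single character in delimiter_chars, skipping empty tokens."""
    pattern = "[" + re.escape(delimiter_chars) + "]"
    return [tok for row in rows for tok in re.split(pattern, row) if tok]


def distinct(values):
    """Remove duplicates while keeping first-seen order."""
    return list(dict.fromkeys(values))


col_name = distinct(split_rows(fake_data, ";"))
for value in col_name:
    print(value)

# With two delimiter characters, hyphenated names are broken up as well.
print(strtok_rows(["Knoxville;Memphis-Nashville"], ";-"))
# ['Knoxville', 'Memphis', 'Nashville']
```

Selecting the original column instead of the split pieces would simply repeat each input string once per piece, which is why the first attempt appeared to do nothing.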

Oracle Select From JSON Array

I have a table for some 'settings' and in that table I have a record with a json array. It is a simple array, like this:
"['scenario1','scenario2','scenario3']"
I want to use a sub-select statement in a view to pull this information out so I can use it like this:
select * from table where field_scenario in (select ????? from settings_table where this=that)
I have been looking through documentation and googling for this but for the life of me I can't figure out how to 'pivot' the returning array into individual elements in order to use it.
Oracle 12c I believe, thanks in advance.
Do NOT use regular expressions to parse JSON. Use a proper JSON parser:
select *
from table_name
where field_scenario in (
    SELECT j.value
    FROM settings_table s
    OUTER APPLY (
        SELECT value
        FROM JSON_TABLE(
            s.json,
            '$[*]'
            COLUMNS(
                value VARCHAR2(50) PATH '$'
            )
        )
    ) j
)
Which, for the sample data:
CREATE TABLE settings_table ( json CLOB CHECK ( json IS JSON ) );
INSERT INTO settings_table ( json ) VALUES ( '["scenario1","scenario2","scenario3"]');
INSERT INTO settings_table ( json ) VALUES ( '["scenario5"]');
INSERT INTO settings_table ( json ) VALUES ( '["scenario \"quoted\""]');
INSERT INTO settings_table ( json ) VALUES ( '["scenario2,scenario4"]');
CREATE TABLE table_name ( id, field_scenario ) AS
SELECT LEVEL, 'scenario'||LEVEL FROM DUAL CONNECT BY LEVEL <= 6 UNION ALL
SELECT 7, 'scenario "quoted"' FROM DUAL;
Outputs:
ID | FIELD_SCENARIO
-: | :----------------
1 | scenario1
2 | scenario2
3 | scenario3
5 | scenario5
7 | scenario "quoted"
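The sample data was chosen to trip up string-based parsing, and a small Python model makes the difference visible. This is an illustrative sketch, not Oracle code. The "naive" approach shown here (strip brackets, split on commas, strip quotes) is just one assumed example of a non-parser approach.

```python
import json

# The four JSON documents from settings_table.
settings_rows = [
    '["scenario1","scenario2","scenario3"]',
    '["scenario5"]',
    '["scenario \\"quoted\\""]',
    '["scenario2,scenario4"]',
]

# The rows of table_name: ids 1..6 plus the quoted scenario.
table_rows = [(n, f"scenario{n}") for n in range(1, 7)] + [(7, 'scenario "quoted"')]

# Proper parsing: every array element becomes one candidate value.
parsed_values = {value for text in settings_rows for value in json.loads(text)}
parsed_ids = [row_id for row_id, scenario in table_rows if scenario in parsed_values]

# Naive string handling: strip brackets, split on commas, strip quotes.
naive_values = {
    piece.strip('"')
    for text in settings_rows
    for piece in text.strip("[]").split(",")
}
naive_ids = [row_id for row_id, scenario in table_rows if scenario in naive_values]

print(parsed_ids)  # [1, 2, 3, 5, 7]
print(naive_ids)   # [1, 2, 3, 4, 5]
```

The naive version wrongly matches id 4, because it splits the single element "scenario2,scenario4" in two, and it misses id 7, because it does not unescape the embedded quotes.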

SQL select dummy data

I am trying to create some dummy data from a SELECT statement. I can easily create 1 column with 1 dummy value, or 2 columns with 1 dummy value each, but how can I produce 1 column with 2 dummy values (2 rows)?
(No column name)
dummy1
dummy2
SELECT statements that return 1 dummy value per column:
Select 'dummy'
Select 'dummy1','dummy2'
Just another option with one or multiple columns
Single Column
Select *
From (values ('Dummy1')
,('Dummy2')
) A(Dummies)
Returns
Dummies
Dummy1
Dummy2
Multiple Columns
Select *
From (values ('Dummy1',1)
,('Dummy2',2)
) A(Dummies,Value)
Returns
Dummies Value
Dummy1 1
Dummy2 2
You can also use UNION between two SELECT statements (or UNION ALL if duplicate rows should be kept):
SELECT 'dummy1' AS [Dummies]
UNION
SELECT 'dummy2'
This will produce a single column.
Dummies
-------
dummy1
dummy2
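One detail worth checking when building dummy rows this way is that UNION removes duplicate rows while UNION ALL keeps them. The sketch below runs both forms through Python's built-in sqlite3 module. SQLite is used here only as a convenient stand-in engine (an assumption, since the thread does not use it), so the bracketed alias is written without brackets.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Two different values: one column, two rows.
two_rows = conn.execute(
    "SELECT 'dummy1' AS Dummies UNION ALL SELECT 'dummy2'"
).fetchall()

# Two identical values: UNION collapses them, UNION ALL keeps both.
deduplicated = conn.execute(
    "SELECT 'dummy' AS Dummies UNION SELECT 'dummy'"
).fetchall()
kept = conn.execute(
    "SELECT 'dummy' AS Dummies UNION ALL SELECT 'dummy'"
).fetchall()

conn.close()

print(sorted(two_rows))   # [('dummy1',), ('dummy2',)]
print(len(deduplicated))  # 1
print(len(kept))          # 2
```

If the dummy rows may legitimately repeat, UNION ALL is the safer choice.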
Select 'dummy1,dummy2' as dummy
Not sure why you'd want to though...

SQL query DB2 9.7

Is there a built in function for comma separated column values in DB2 SQL?
Example: if 3 rows share the same ID but have three different roles, the roles should be concatenated with commas.
ID | Role
------------
4555 | 2
4555 | 3
4555 | 4
The output should look like the following, per row:
4555 2,3,4
LISTAGG is a new function in DB2 LUW 9.7.
See the example:
create table myTable (id int, category int);
insert into myTable values (1, 1);
insert into myTable values (2, 2);
insert into myTable values (5, 1);
insert into myTable values (3, 1);
insert into myTable values (4, 2);
example: select without any order in grouped column
select category, LISTAGG(id, ', ') as ids from myTable group by category;
result:
CATEGORY IDS
--------- -----
1 1, 5, 3
2 2, 4
example: select with order by clause in grouped column
select
category,
LISTAGG(id, ', ') WITHIN GROUP(ORDER BY id ASC) as ids
from myTable
group by category;
result:
CATEGORY IDS
--------- -----
1 1, 3, 5
2 2, 4
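As a cross-check on the ordered result, here is a short Python sketch of what grouping plus ordered concatenation computes. It models the logic only and is not DB2 code. The function name listagg_by_category is made up for this example.

```python
from collections import defaultdict

# Same rows as the INSERT statements above, as (id, category) pairs.
my_table = [(1, 1), (2, 2), (5, 1), (3, 1), (4, 2)]


def listagg_by_category(rows, separator=", "):
    """Group ids by category and join them in ascending id order."""
    groups = defaultdict(list)
    for row_id, category in rows:
        groups[category].append(row_id)
    return {
        category: separator.join(str(row_id) for row_id in sorted(ids))
        for category, ids in sorted(groups.items())
    }


ids_by_category = listagg_by_category(my_table)
print(ids_by_category)  # {1: '1, 3, 5', 2: '2, 4'}
```

The sorted(ids) call plays the role of WITHIN GROUP (ORDER BY id ASC). Without it, the ids come out in whatever order the rows were read, as in the first example.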
I think with this smaller query, you can do what you want.
This is equivalent of MySQL's GROUP_CONCAT in DB2.
SELECT
NUM,
SUBSTR(xmlserialize(xmlagg(xmltext(CONCAT( ', ',ROLES))) as VARCHAR(1024)), 3) as ROLES
FROM mytable
GROUP BY NUM;
This will output something like:
NUM ROLES
---- -------------
1 111, 333, 555
2 222, 444
assuming your original result was something like this:
NUM ROLES
---- ---------
1 111
2 222
1 333
2 444
1 555
Depending on the DB2 version you have, you can use XML functions to achieve this.
Example table with some data
create table myTable (id int, category int);
insert into myTable values (1, 1);
insert into myTable values (2, 2);
insert into myTable values (3, 1);
insert into myTable values (4, 2);
insert into myTable values (5, 1);
Aggregate results using xml functions
select category,
xmlserialize(XMLAGG(XMLELEMENT(NAME "x", id) ) as varchar(1000)) as ids
from myTable
group by category;
results:
CATEGORY IDS
-------- ------------------------
1 <x>1</x><x>3</x><x>5</x>
2 <x>2</x><x>4</x>
Use replace to make the result look better
select category,
    replace(
        replace(
            replace(
                xmlserialize(XMLAGG(XMLELEMENT(NAME "x", id) ) as varchar(1000))
            , '</x><x>', ',')
        , '<x>', '')
    , '</x>', '') as ids
from myTable
group by category;
Cleaned result
CATEGORY IDS
-------- -----
1 1,3,5
2 2,4
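The order of the three replacements matters, which is easy to see by applying them one at a time. The Python sketch below mirrors the nested REPLACE calls on the serialized string. The function name strip_x_tags is hypothetical.

```python
def strip_x_tags(serialized):
    """Turn '<x>1</x><x>3</x>' into '1,3' with the same three replacements."""
    # Innermost replace: adjacent close/open tags become the separator.
    step1 = serialized.replace("</x><x>", ",")
    # Then drop the single remaining opening tag ...
    step2 = step1.replace("<x>", "")
    # ... and the single remaining closing tag.
    return step2.replace("</x>", "")


print(strip_x_tags("<x>1</x><x>3</x><x>5</x>"))  # 1,3,5
print(strip_x_tags("<x>2</x><x>4</x>"))          # 2,4
```

Replacing the lone tags first would remove the "</x><x>" pairs before they could be turned into commas, leaving the ids run together.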
Just saw a better solution using XMLTEXT instead of XMLELEMENT.
Since DB2 9.7.5 there is a function for that:
LISTAGG(colname, separator)
For more information, see: Using LISTAGG to Turn Rows of Data into a Comma Separated List
My problem was to transpose row fields (CLOB) into a column (VARCHAR) as CSV and use the transposed table for reporting, because transposing in the report layer slows the report down.
One way to go is recursive SQL. You can find many articles about it, but it is difficult and resource-consuming if you want to join all your recursively transposed columns.
I created multiple global temp tables, each storing a single transposed column with one key identifier. Eventually I had 6 temp tables for joining 6 columns, but due to limited resource allocation I wasn't able to bring all the columns together. I opted for the 3 formulas below, and then I only had to run 1 query, which gave me the output in 10 seconds.
I found various articles on using the XML2CLOB function and came across 3 different ways:
REPLACE(VARCHAR(XML2CLOB(XMLAGG(XMLELEMENT(NAME "A",ALIASNAME.ATTRIBUTENAME)))),'', ',') AS TRANSPOSED_OUTPUT
NVL(TRIM(',' FROM REPLACE(REPLACE(REPLACE(CAST(XML2CLOB(XMLAGG(XMLELEMENT(NAME "E", ALIASNAME.ATTRIBUTENAME))) AS VARCHAR(100)),'',' '),'',','), '', 'Nothing')), 'Nothing') as TRANSPOSED_OUTPUT
RTRIM(REPLACE(REPLACE(REPLACE(VARCHAR(XMLSERIALIZE(XMLAGG(XMLELEMENT(NAME "A",ALIASNAME.ATTRIBUTENAME) ORDER BY ALIASNAME.ATTRIBUTENAME) AS CLOB)), '',','),'',''),'','')) AS TRANSPOSED_OUTPUT
Make sure you are casting your "ATTRIBUTENAME" to varchar in a subquery and then calling it here.
Another possibility, with a recursive CTE:
with tablewithrank as (
    select id,
           category,
           rownumber() over(partition by category order by id) as rangid,
           (select count(*) from myTable f2 where f1.category=f2.category) nbidbycategory
    from myTable f1
),
cte (id, category, rangid, nbidbycategory, rangconcat) as (
    select id, category, rangid, nbidbycategory, cast(id as varchar(500))
    from tablewithrank
    where rangid=1
    union all
    select f2.id, f2.category, f2.rangid, f2.nbidbycategory,
           cast(f1.rangconcat as varchar(500)) || ',' || cast(f2.id as varchar(500))
    from cte f1
    inner join tablewithrank f2
        on f1.rangid=f2.rangid -1 and f1.category=f2.category
)
select category, rangconcat as IDS
from cte
where rangid=nbidbycategory
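The recursive query builds each list one rank at a time, which the following Python sketch spells out as a loop. It is a model of the idea under the same sample data, not a translation of DB2 semantics. The function name concat_by_category is invented for illustration.

```python
from itertools import groupby

# Same sample rows as myTable, as (id, category) pairs.
my_table = [(1, 1), (2, 2), (3, 1), (4, 2), (5, 1)]


def concat_by_category(rows):
    """Build the comma-separated id list per category, one rank at a time."""
    result = {}
    # Ordering by (category, id) stands in for the ranking subquery.
    ordered = sorted(rows, key=lambda row: (row[1], row[0]))
    for category, group in groupby(ordered, key=lambda row: row[1]):
        ids = [row_id for row_id, _ in group]
        # Anchor member: start from the row with rank 1.
        concat = str(ids[0])
        # Recursive member: append the id at the next rank to the running string.
        for row_id in ids[1:]:
            concat = concat + "," + str(row_id)
        # Final filter: keep only the string built at the last rank.
        result[category] = concat
    return result


print(concat_by_category(my_table))  # {1: '1,3,5', 2: '2,4'}
```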
Try this (note that GROUP_CONCAT is MySQL syntax, so it only applies if you are on MySQL rather than DB2):
SELECT GROUP_CONCAT( field1, field2, field3 ,field4 SEPARATOR ', ')