Hive: create a JSON array that does not contain duplicates - sql

I want to create an array of JSON objects that does not contain duplicates. I used LATERAL VIEW EXPLODE to break up the initial array, and now I want to group the JSON strings I received and build merged JSON objects based on a key.
For example, if I have:
Col1:
{"key" : ke , "value" : 1 }
{"key" : ke , "value" : 2 }
{"key" : ke1 , "value" : 5 }
I would like to have
{"key" : ke , "value" : 3 }
{"key" : ke1 , "value" : 5 }
Can you help me?

select concat('{"key":"',jt.key,'","value":',sum(jt.value),'}')
from mytable t
lateral view json_tuple(Col1, 'key', 'value') jt as key,value
group by jt.key
;

Related

How to return nested json in one row using JSON_TABLE

I am struggling with the following query. I have JSON and the query is splitting the duration and distance elements over two rows. I need them in one row and cannot see what to change.
The following runs as-is from SQL*Plus etc. (Oracle 20.1):
select *
from json_table(
  '{
     "destination_addresses" : [ "Page St, London SW1P 4BG, UK" ],
     "origin_addresses" : [ "Corbridge Drive, Luton, LU2 9UH, UK" ],
     "rows" : [
       {
         "elements" : [
           {
             "distance" : {
               "text" : "88 km",
               "value" : 87773
             },
             "duration" : {
               "text" : "1 hours 25 mins",
               "value" : 4594
             },
             "status" : "OK"
           }
         ]
       }
     ],
     "status" : "OK"
   }',
  '$' COLUMNS
  (
    origin_addresses      varchar2(1000 char) path '$.origin_addresses[*]',
    destination_addresses varchar2(1000 char) path '$.destination_addresses[*]',
    nested path '$.rows.elements.distance[*]'
      COLUMNS(
        distance_text  varchar2(100) path '$.text',
        distance_value varchar2(100) path '$.value'
      ),
    nested path '$.rows.elements.duration[*]'
      COLUMNS(
        duration_text  varchar2(100) path '$.text',
        duration_value varchar2(100) path '$.value'
      )
  )
);
Mathguy: I don't have influence over the JSON that Google returns, but origin and destinations are arrays; it is just that this search is from A to Z and not A,B,C to X,Y,Z, for example, which would return 9 results instead of 1.
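For what it's worth, the two-row split comes from distance and duration being pulled out by two separate NESTED PATH clauses. One possible restructuring (a sketch, not an answer from the thread) is to use a single nested path over '$.rows[*].elements[*]' and project both objects as sibling columns:

select *
from json_table(
  '{
     "destination_addresses" : [ "Page St, London SW1P 4BG, UK" ],
     "origin_addresses" : [ "Corbridge Drive, Luton, LU2 9UH, UK" ],
     "rows" : [ { "elements" : [ {
       "distance" : { "text" : "88 km", "value" : 87773 },
       "duration" : { "text" : "1 hours 25 mins", "value" : 4594 },
       "status" : "OK"
     } ] } ],
     "status" : "OK"
   }',
  '$' COLUMNS
  (
    origin_addresses      varchar2(1000 char) path '$.origin_addresses[*]',
    destination_addresses varchar2(1000 char) path '$.destination_addresses[*]',
    nested path '$.rows[*].elements[*]'
      COLUMNS(
        distance_text  varchar2(100) path '$.distance.text',
        distance_value varchar2(100) path '$.distance.value',
        duration_text  varchar2(100) path '$.duration.text',
        duration_value varchar2(100) path '$.duration.value'
      )
  )
);

With both objects projected from the same nested path, distance and duration should land on the same output row.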

Transform JSON map to an array in Postgres via update statement

I have a PostgreSQL table called datasource with a jsonb column called data. It has the following structure:
{
"key1":{"param1" : "value1", "param2" : "value2"},
"key2":{"param2_1" : "value2_1", "param2_2" : "value2_2"},
"key3":{"param3_1" : "value3_1", "param3_2" : "value3_2"}
}
Is there any way to write some UPDATE script to transform given JSON to the following:
[
{"key": "key1", "param1" : "value1", "param2" : "value2"},
{"key": "key2", "param2_1" : "value2_1", "param2_2" : "value2_2"},
{"key": "key3", "param3_1" : "value3_1", "param3_2" : "value3_2"}
]
You can unnest the object to rows in a lateral join, then aggregate back into an array:
select d.*, x.*
from datasource d
cross join lateral (
select jsonb_agg(jsonb_build_object('key', j.k) || j.v) new_data
from jsonb_each(d.data) as j(k, v)
) x
Demo on DB Fiddle, with jsonb_pretty() enabled.
If you wanted an update statement:
update datasource d
set data = (
select jsonb_agg(jsonb_build_object('key', j.k) || j.v)
from jsonb_each(d.data) as j(k, v)
)
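If the statement may be re-run, or if some rows could already hold arrays or NULLs, it may be worth guarding it. A minimal sketch of such a guard, using the built-in jsonb_typeof:

update datasource d
set data = (
    select jsonb_agg(jsonb_build_object('key', j.k) || j.v)
    from jsonb_each(d.data) as j(k, v)
)
where jsonb_typeof(d.data) = 'object';  -- skip rows that are not (or no longer) JSON objects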

How to unnest bigquery field that is stored as a string?

I am trying to unnest a field but something is wrong with my query.
Sample data in my table
'1234', '{ "id" : "123" , "items" : [ { "quantity" : 1 , "product" : { "id" : "p1" , "categories" : [ "cat1","cat2","cat3"] }}] }'
There are 2 fields in the dataset: row_id and parts, where parts is a dictionary object with a list of items (categories) in it, but the datatype of parts is string. I would like the output to be an individual row for each category.
This is what I have tried but I am not getting any result back.
#standardSQL
with t as (
select "1234" as row_id, '{ "id" : "123" , "items" : [ { "quantity" : 1 , "product" : { "id" : "p1" , "categories" : [ "cat1","cat2","cat3"] }}] }' as parts
)
select row_id, _categories
from t,
UNNEST(REGEXP_EXTRACT_ALL(JSON_EXTRACT(parts, '$.items'), r'"categories":"(.+?)"')) _categories
expected result
id, _categories
1234, cat1
1234, cat2
1234, cat3
Below is for BigQuery Standard SQL
#standardSQL
WITH t AS (
SELECT "1234" AS row_id, '{ "id" : "123" , "items" : [ { "quantity" : 1 , "product" : { "id" : "p1" , "categories" : [ "cat1","cat2","cat3"] }}] }' AS parts
)
SELECT row_id, REPLACE(_categories, '"', '') _categories
FROM t, UNNEST(SPLIT(REGEXP_EXTRACT(
JSON_EXTRACT(parts, '$.items'),
r'"categories":\[(.+?)]'))
) _categories
and produces the expected result:
Row row_id _categories
1 1234 cat1
2 1234 cat2
3 1234 cat3
Update
The solution above was mostly focused on fixing the regexp used in the extract, but it did not address the more generic case of having multiple products. The solution below addresses that more generic case.
#standardSQL
WITH t AS (
SELECT "1234" AS row_id, '''{ "id" : "123" , "items" : [
{ "quantity" : 1 , "product" : { "id" : "p1" , "categories" : [ "cat1","cat2","cat3"] }},
{ "quantity" : 2 , "product" : { "id" : "p2" , "categories" : [ "cat4","cat5","cat6"] }}
] }''' AS parts
)
SELECT row_id, REPLACE(category, '"', '') category
FROM t, UNNEST(REGEXP_EXTRACT_ALL(parts, r'"categories" : \[(.+?)]')) categories,
UNNEST(SPLIT(categories)) category
with result
Row row_id category
1 1234 cat1
2 1234 cat2
3 1234 cat3
4 1234 cat4
5 1234 cat5
6 1234 cat6
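On BigQuery releases that support JSON_EXTRACT_ARRAY, the regex can be avoided entirely. A sketch (not from the thread) using the same sample data with multiple products:

#standardSQL
WITH t AS (
  SELECT "1234" AS row_id, '''{ "id" : "123" , "items" : [
    { "quantity" : 1 , "product" : { "id" : "p1" , "categories" : [ "cat1","cat2","cat3"] }},
    { "quantity" : 2 , "product" : { "id" : "p2" , "categories" : [ "cat4","cat5","cat6"] }}
  ] }''' AS parts
)
SELECT row_id, JSON_EXTRACT_SCALAR(category, '$') AS category
FROM t,
UNNEST(JSON_EXTRACT_ARRAY(parts, '$.items')) item,          -- one row per item object
UNNEST(JSON_EXTRACT_ARRAY(item, '$.product.categories')) category  -- one row per category

which should return the same row_id / category pairs as the result above.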

JSON Bulk load with Apache Phoenix

I have a problem with loading data from JSON files. How can I load data from JSON files into a table in HBase?
Here is the JSON structure:
{ "_id" : { "$oid" : "53ba5e86eb07565b53374901"} , "_api_method" : "database.getSchools" , "id" : "0" , "date_insert" : "2014-07-07 11:47:02" , "unixdate" : 1404722822 , "city_id" : "1506490" , "response" : [ 1 , { "id" : 354053 , "title" : "шк. Аджамская"}]};
Help me please!
For your JSON format, you cannot use importtsv. I suggest you write a MapReduce job to parse your JSON data and put the data into HBase.
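For reference, once the JSON has been flattened (by the suggested MapReduce job or any external parser), the Phoenix side is plain SQL. A minimal sketch, with a hypothetical table name and column mapping chosen here for illustration only:

-- hypothetical target table; names and types are illustrative, not from the thread
CREATE TABLE IF NOT EXISTS schools (
    oid         VARCHAR NOT NULL PRIMARY KEY,  -- from _id.$oid
    api_method  VARCHAR,                       -- from _api_method
    date_insert VARCHAR,
    unixdate    BIGINT,
    city_id     VARCHAR,
    school_id   INTEGER,                       -- from response[1].id
    title       VARCHAR                        -- from response[1].title
);

-- one UPSERT per parsed JSON document
UPSERT INTO schools (oid, api_method, date_insert, unixdate, city_id, school_id, title)
VALUES ('53ba5e86eb07565b53374901', 'database.getSchools',
        '2014-07-07 11:47:02', 1404722822, '1506490', 354053, 'шк. Аджамская');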

Merging 2 rows in a Pentaho Kettle transformation

My KTR is:
MongoDB Json Input gives the JSON as follows:
{ "_id" : { "$oid" : "525cf3a70fafa305d949ede0"} , "asset" :
"RO2500AS1" , "Salt Rejection" : "82%" , "Salt Passage" : "18%" ,
"Recovery" : "56.33%" , "Concentration Factor" : "2.3" , "status" :
"critical" , "Flow Alarm" : "High Flow"}
And one Table input which returns 2 rows:
In StreamLookUp step, Key to look up is configured as asset = AssetName
My final output is returning 2 JSONs:
{"data":[{"Estimated Cost":"USD
15","AssetName":"RO2500AS1","Description":"Pump
Maintenance","Index":1,"json":"{ \"_id\" : { \"$oid\" :
\"525cf3a70fafa305d949ede0\"} , \"asset\" : \"RO2500AS1\" , \"Salt
Rejection\" : \"82%\" , \"Salt Passage\" : \"18%\" , \"Recovery\" :
\"56.33%\" , \"Concentration Factor\" : \"2.3\" , \"status\" :
\"critical\" , \"Flow Alarm\" : \"High
Flow\"}","Type":"Service","DeadLine":"13 November 2013"}]}
{"data":[{"Estimated Cost":"USD
35","AssetName":"RO2500AS1","Description":"Heat
Sensor","Index":2,"json":"{ \"_id\" : { \"$oid\" :
\"525cf3a70fafa305d949ede0\"} , \"asset\" : \"RO2500AS1\" , \"Salt
Rejection\" : \"82%\" , \"Salt Passage\" : \"18%\" , \"Recovery\" :
\"56.33%\" , \"Concentration Factor\" : \"2.3\" , \"status\" :
\"critical\" , \"Flow Alarm\" : \"High
Flow\"}","Type":"Replacement","DeadLine":"26 November 2013"}]}
I want my final JSON output to be merged and show a result something like:
{"data": [
  {"Estimated Cost":"USD 15", "AssetName":"RO2500AS1", "Description":"Pump Maintenance", "Index":1, "Type":"Service", "DeadLine":"13 November 2013"},
  {"Estimated Cost":"USD 35", "AssetName":"RO2500AS1", "Description":"Heat Sensor", "Index":2, "Type":"Replacement", "DeadLine":"26 November 2013"}
 ],
 "json":{ "_id" : "525cf3a70fafa305d949ede0"} , "asset" : "RO2500AS1" , "Salt Rejection" : "82%" , "Salt Passage" : "18%" , "Recovery" : "56.33%" , "Concentration Factor" : "2.3" , "status" : "critical" , "Flow Alarm" : "High Flow"}
which means merging 2 rows.
Can anybody help, please?
You can use a Merge Join step after the Table input. That will merge the rows from the MySQL output and you will have only one JSON as output.
You would want to use the Merge step for your purpose. Don't forget to sort the input streams.
Note: in this step, rows are expected to be sorted on the specified key fields. When using the Sort step, this works fine. If you sorted the data outside of PDI, you may run into issues with the internal case-sensitive/insensitive flag.