JSON and Teradata - sql

I have the following JSON:
'{"0": false,"1": false,"barring": "BAR_ROAMING"}'
Teradata has a property syntax for JSON that can be used to extract the barring value: F_JSON.barring --> BAR_ROAMING
But for the other 2, which are dynamic keys, how can I extract them?

You can use the JSONExtractValue function:
select JsonCol.JSONExtractValue('$.[0]') as FirstOne
, JsonCol.JSONExtractValue('$.[1]') as SecondOne
from (
select new json('{"0": false,"1": false,"barring": "BAR_ROAMING"}')
) MyJsonData(JsonCol)
https://docs.teradata.com/r/HN9cf0JB0JlWCXaQm6KDvw/aaGwlJOTKsXk4IaU7vsE6g

I ended up using
CREATE TABLE KEY_JSON AS (
SELECT DISTINCT(JSONKeys) J_KEY FROM Json_Keys
(
ON (SELECT JSON FROM JSON_TABLE) USING QUOTES('N'))
AS json_data) WITH DATA;
I then performed a JOIN between my two tables (JSON_TABLE and KEY_JSON) ON JSON LIKE '%' || J_KEY || '%'
and extracted the value using JSONEXTRACT(JSON.'$."||J_KEY)
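The approach above (discover the keys first, then look each one up) can be sanity-checked outside the database. This is a minimal Python sketch, not Teradata code; the helper name extract_dynamic_keys and the choice to treat "barring" as the only known key are assumptions made for illustration.

```python
import json

def extract_dynamic_keys(doc, known_keys=("barring",)):
    """Return {key: value} for every top-level key not listed in known_keys.

    known_keys is a hypothetical parameter: it lists the keys whose names are
    fixed, so everything else is treated as a dynamic key.
    """
    parsed = json.loads(doc)
    return {k: v for k, v in parsed.items() if k not in known_keys}

sample = '{"0": false,"1": false,"barring": "BAR_ROAMING"}'
dynamic = extract_dynamic_keys(sample)
print(dynamic)
```

Parsing the document once and enumerating its keys avoids the substring match, which could also fire when a key name happens to appear inside a value.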

Related

Extract complex json with random key field

I am trying to extract the following JSON into its own rows, like the table below, in a Presto query. The issue is that the key (the AV engine name) is different for each entry, and I am stuck on how to extract and iterate over the keys without knowing their names in advance.
The JSON is the value of a table row:
{
"Bkav":
{
"detected": false,
"result": null
},
"Lionic":
{
"detected": true,
"result": "Trojan.Generic.3611249"
},
...
AV Engine Name   Detected Virus   Result
Bkav             false            null
Lionic           true             Trojan.Generic.3611249
I have tried to use json_extract following the documentation here https://teradata.github.io/presto/docs/141t/functions/json.html but there is no mention of extraction if we don't know the key :( I am trying to find a solution that works in both presto & hive query, is there a common query that is applicable to both?
You can cast your json to map(varchar, json) and process it with unnest to flatten:
-- sample data
WITH dataset (json_str) AS (
VALUES (
'{"Bkav":{"detected": false,"result": null},"Lionic":{"detected": true,"result": "Trojan.Generic.3611249"}}'
)
)
--query
select k "AV Engine Name", json_extract_scalar(v, '$.detected') "Detected Virus", json_extract_scalar(v, '$.result') "Result"
from (
select cast(json_parse(json_str) as map(varchar, json)) as m
from dataset
)
cross join unnest (map_keys(m), map_values(m)) t(k, v)
Output:
AV Engine Name   Detected Virus   Result
Bkav             false
Lionic           true             Trojan.Generic.3611249
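To see what the map-and-unnest step does, the same flattening can be sketched in Python. This is only an illustration of the logic, not Presto or Hive code; the function name flatten_engines is an assumption.

```python
import json

def flatten_engines(json_str):
    """Yield one (engine, detected, result) tuple per top-level key.

    Iterating over the parsed object plays the role of
    unnest(map_keys(m), map_values(m)) in the query.
    """
    for engine, info in json.loads(json_str).items():
        yield engine, info.get("detected"), info.get("result")

sample = '{"Bkav":{"detected": false,"result": null},"Lionic":{"detected": true,"result": "Trojan.Generic.3611249"}}'
rows = list(flatten_engines(sample))
for row in rows:
    print(row)
```

Note that Python keeps the native types (False, None), whereas json_extract_scalar returns text or NULL.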
The Presto query suggested by @Guru works, but for Hive there is no easy way.
I had to extract the JSON, strip some characters and brackets from it with regexp_replace, convert the result to a map with str_to_map, and then repeat the process once more to get the nested values out:
SELECT
av_engine,
str_to_map(regexp_replace(engine_result, '\\}', ''),',', ':') AS output_map
FROM (
SELECT
str_to_map(regexp_replace(regexp_replace(get_json_object(raw_response, '$.scans'), '\"', ''), '\\{',''),'\\},', ':') AS key_val_map
FROM restricted_antispam.abuse_malware_scanning
) AS S
LATERAL VIEW EXPLODE(key_val_map) temp AS av_engine, engine_result

Apply function for all values of an array

I'm using a custom query with a multi-select parameter as Datasource in DataStudio.
I'd like to use the query parameter array in the where clause such as
STARTS_WITH(stringField, @paramArray[1])
AND STARTS_WITH(stringField, @paramArray[2])
AND STARTS_WITH(stringField, @paramArray[3])
…
for all elements of @paramArray.
Below is an example for BigQuery Standard SQL:
SELECT *,
FROM `project.dataset.table`
WHERE (
SELECT LOGICAL_AND(STARTS_WITH(stringField, param))
FROM UNNEST(paramArray) AS param
)
So try the following with @paramArray:
SELECT *,
FROM `project.dataset.table`
WHERE (
SELECT LOGICAL_AND(STARTS_WITH(stringField, param))
FROM UNNEST(@paramArray) AS param
)
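The intent stated in the question, that stringField must start with every element of the parameter array, can be checked with a small Python sketch. It is an illustration only; the function name starts_with_all is an assumption.

```python
def starts_with_all(value, prefixes):
    """Return True when value starts with every prefix in prefixes.

    This mirrors LOGICAL_AND over the unnested array. One difference to keep
    in mind: Python's all() returns True for an empty list, whereas a SQL
    aggregate over zero rows yields NULL, which a WHERE clause treats as false.
    """
    return all(value.startswith(p) for p in prefixes)

print(starts_with_all("abcdef", ["a", "ab", "abc"]))
print(starts_with_all("abcdef", ["a", "b"]))
```

Because every prefix has to match, the check only succeeds when the prefixes are nested within one another, as in the first call.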

PG::InvalidParameterValue: ERROR: cannot extract element from a scalar

I'm fetching data from the JSON column by using the following query.
SELECT id FROM ( SELECT id,JSON_ARRAY_ELEMENTS(shipment_lot::json) AS js2 FROM file_table WHERE ('shipment_lot') = ('shipment_lot') ) q WHERE js2->> 'invoice_number' LIKE ('%" abc1123"%')
My Postgresql version is 9.3
Saved data in JSON column:
[{ "id"=>2981, "lot_number"=>1, "activate"=>true, "invoice_number"=>"abc1123", "price"=>378.0}]
However, I'm getting this error:
ActiveRecord::StatementInvalid (PG::InvalidParameterValue: ERROR: cannot extract element from a scalar:
SELECT id FROM
( SELECT id,JSON_ARRAY_ELEMENTS(shipment_lot::json)
AS js2 FROM file_heaps
WHERE ('shipment_lot') = ('shipment_lot') ) q
WHERE js2->> 'invoice_number' LIKE ('%abc1123%'))
How can I solve this issue?
Your issue is that the stored value is not valid JSON: it uses Ruby hash syntax (=>) between keys and values instead of colons.
If you try casting your example data in Postgres, it will fail:
SELECT ('[{ "id"=>2981, "lot_number"=>1, "activate"=>true, "invoice_number"=>"abc1123", "price"=>378.0}]')::json
This is the JSON formatted correctly:
SELECT ('[{ "id":2981, "lot_number":1, "activate":true, "invoice_number":"abc1123", "price":378.0}]')::json
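The difference between the two strings can also be confirmed with any strict JSON parser. Below is a minimal Python sketch using the two sample values from this thread; the helper name is_valid_json is an assumption.

```python
import json

# The value as stored (Ruby hash syntax) and the corrected JSON from above.
ruby_style = '[{ "id"=>2981, "lot_number"=>1, "activate"=>true, "invoice_number"=>"abc1123", "price"=>378.0}]'
json_style = '[{ "id":2981, "lot_number":1, "activate":true, "invoice_number":"abc1123", "price":378.0}]'

def is_valid_json(text):
    """Return True if text parses as JSON, False otherwise."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False

print(is_valid_json(ruby_style))
print(is_valid_json(json_style))
```

Only the second string parses, so the data should be serialized as real JSON before it is written to the column.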

Regex extract in BigQuery issue

I'm trying to simplify a column in BigQuery by using BigQuery extract on it but I am having a bit of an issue.
Here are two examples of the data I'm extracting from:
dc_pre=CLXk_aigyOMCFQb2dwod4dYCZw;gtm=2wg7f1;gcldc=;gclaw=;gac=UA-5815571-8:;auiddc=;u1=OVERDRFT;u2=undefined;u3=undefined;u4=undefined;u5=SSA;u6=undefined;u7=na;u8=undefined;u9=undefined;u10=undefined;u11=undefined;~oref=https://www.online.bank.co.za/onlineContent/ga_bridge.html
dc_pre=COztt4-tyOMCFcji7Qod440PCw;gtm=2wg7f1;gcldc=;gclaw=;gac=UA-5815571-8:;auiddc=;u1=DDA13;u2=undefined;u3=undefined;u4=undefined;u5=SSA;u6=undefined;u7=na;u8=undefined;u9=undefined;u10=undefined;u11=undefined;~oref=https://www.online.support.co.za/onlineContent/ga_bridge.html
I want to extract the portion between ;u1= and ;u2
Running the following legacy SQL Query
SELECT
Date(Event_Time),
Activity_ID,
REGEXP_EXTRACT(Other_Data, r'(?<=u1=)(.*\n?)(?=;u2)')
FROM
[sprt-data-transfer:dtftv2_sprt.p_activity_166401]
WHERE
Activity_ID in ('8179851')
AND Site_ID_DCM NOT IN ('2134603','2136502','2539719','2136304','2134604','2134602','2136701','2378406')
AND Event_Time BETWEEN 1563746400000000 AND 1563832799000000
I get the error...
Failed to parse regular expression "(?<=u1=)(.*\n?)(?=;u2)": invalid
perl operator: (?<
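And this is where my talent runs out. Is the error caused by my using legacy SQL, or is this an unsupported regex format?
I just tried this and it worked, but with "Standard SQL" enabled. BigQuery's regex functions are built on RE2, which does not support lookaround such as (?<=...), so the pattern below uses a capturing group instead: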
select
other_data,
regexp_extract(other_data, ';u1=(.+?);u2') as some_part
from
unnest([
'dc_pre=CLXk_aigyOMCFQb2dwod4dYCZw;gtm=2wg7f1;gcldc=;gclaw=;gac=UA-5815571-8:;auiddc=;u1=OVERDRFT;u2=undefined;u3=undefined;u4=undefined;u5=SSA;u6=undefined;u7=na;u8=undefined;u9=undefined;u10=undefined;u11=undefined;~oref=https://www.online.bank.co.za/onlineContent/ga_bridge.html',
'dc_pre=COztt4-tyOMCFcji7Qod440PCw;gtm=2wg7f1;gcldc=;gclaw=;gac=UA-5815571-8:;auiddc=;u1=DDA13;u2=undefined;u3=undefined;u4=undefined;u5=SSA;u6=undefined;u7=na;u8=undefined;u9=undefined;u10=undefined;u11=undefined;~oref=https://www.online.support.co.za/onlineContent/ga_bridge.html'
]) as other_data
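The capturing-group pattern can be tried locally before running the query. This is a minimal Python sketch; the helper name extract_u1 is an assumption, and the sample string is a shortened excerpt of the data shown in the question.

```python
import re

# Same pattern as in the query: capture lazily between ";u1=" and ";u2".
U1_PATTERN = re.compile(r";u1=(.+?);u2")

def extract_u1(other_data):
    """Return the text between ';u1=' and ';u2', or None if there is no match."""
    match = U1_PATTERN.search(other_data)
    return match.group(1) if match else None

sample = "gac=UA-5815571-8:;auiddc=;u1=OVERDRFT;u2=undefined;u3=undefined"
print(extract_u1(sample))
```

Python's re module does accept lookbehind, so this sketch only confirms the capturing-group pattern; it does not reproduce the RE2 parse error.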
Not using regex but it still works...
with test as (
select 1 as id, 'dc_pre=CLXk_aigyOMCFQb2dwod4dYCZw;gtm=2wg7f1;gcldc=;gclaw=;gac=UA-5815571-8:;auiddc=;u1=OVERDRFT;u2=undefined;u3=undefined;u4=undefined;u5=SSA;u6=undefined;u7=na;u8=undefined;u9=undefined;u10=undefined;u11=undefined;~oref=https://www.online.bank.co.za/onlineContent/ga_bridge.html' as my_str UNION ALL
select 2 as id, 'dc_pre=COztt4-tyOMCFcji7Qod440PCw;gtm=2wg7f1;gcldc=;gclaw=;gac=UA-5815571-8:;auiddc=;u1=DDA13;u2=undefined;u3=undefined;u4=undefined;u5=SSA;u6=undefined;u7=na;u8=undefined;u9=undefined;u10=undefined;u11=undefined;~oref=https://www.online.support.co.za/onlineContent/ga_bridge.html'
),
temp as (
select
id,
split(my_str,';') as items
from test
),
flattened as (
select
id,
split(i,'=')[SAFE_OFFSET(0)] as left_side,
split(i,'=')[SAFE_OFFSET(1)] as right_side
from temp
left join unnest(items) i
)
select * from flattened
where left_side = 'u1'
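The split-based approach maps onto a few lines of Python, which may help when checking edge cases such as empty values. This is a sketch only; the helper name parse_pairs is an assumption, and the sample string is a shortened excerpt of the data above.

```python
def parse_pairs(my_str):
    """Split 'k1=v1;k2=v2;...' into a dict of key to value.

    partition() splits on the first '=' only, so a value that itself contains
    '=' is kept whole. The SQL version takes the second piece of split(i, '=')
    and would drop anything after a second '='.
    """
    pairs = {}
    for item in my_str.split(";"):
        left, _, right = item.partition("=")
        pairs[left] = right
    return pairs

sample = "gtm=2wg7f1;gcldc=;gclaw=;u1=DDA13;u2=undefined;u5=SSA"
pairs = parse_pairs(sample)
print(pairs["u1"])
```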

Bigquery - json_extract all elements from an array

I'm trying to extract two keys from every JSON object in an array of JSON objects (using legacy SQL).
Currently I am using the JSON extract function:
json_extract(json_column , '$[1].X') AS X,
json_extract(json_column , '$[1].Y') AS Y,
How can I make it run on every object in the JSON array column, and not just [1] (for example)?
An example json:
[
{"blabla":000,"X":1,"blabla":000,"blabla":000,"blabla":000,"Y":"2"},
{"blabla":000,"X":3,"blabla":000,"blabla":000,"blabla":000,"Y":"4"}
]
thanks in advance!
Update 2020: JSON_EXTRACT_ARRAY()
Now BigQuery supports JSON_EXTRACT_ARRAY():
https://cloud.google.com/bigquery/docs/reference/standard-sql/json_functions#json_extract_array
For example, to solve this particular question:
SELECT id
, ARRAY(
SELECT JSON_EXTRACT_SCALAR(x, '$.author.email')
FROM UNNEST(JSON_EXTRACT_ARRAY(payload, "$.commits"))x
) emails
FROM `githubarchive.day.20180830`
WHERE type='PushEvent'
AND id='8188163772'
Previous answer
Let's start with a similar problem - this is not a very convenient way to extract all emails from a json array:
SELECT id
, [ JSON_EXTRACT_SCALAR(JSON_EXTRACT(payload, '$.commits'), '$[0].author.email')
, JSON_EXTRACT_SCALAR(JSON_EXTRACT(payload, '$.commits'), '$[1].author.email')
, JSON_EXTRACT_SCALAR(JSON_EXTRACT(payload, '$.commits'), '$[2].author.email')
, JSON_EXTRACT_SCALAR(JSON_EXTRACT(payload, '$.commits'), '$[3].author.email')
] emails
FROM `githubarchive.day.20180830`
WHERE type='PushEvent'
AND id='8188163772'
The best way we have right now to deal with this is to use some JavaScript in a UDF to split a JSON array into a SQL array:
CREATE TEMP FUNCTION json2array(json STRING)
RETURNS ARRAY<STRING>
LANGUAGE js AS """
return JSON.parse(json).map(x=>JSON.stringify(x));
""";
SELECT * EXCEPT(array_commits),
ARRAY(SELECT JSON_EXTRACT_SCALAR(x, '$.author.email') FROM UNNEST(array_commits) x) emails
FROM (
SELECT id
, json2array(JSON_EXTRACT(payload, '$.commits')) array_commits
FROM `githubarchive.day.20180830`
WHERE type='PushEvent'
AND id='8188163772'
)
May 1st, 2020 Update
A new function, JSON_EXTRACT_ARRAY, has just been added to the list of JSON functions. This function allows you to extract the contents of a JSON document as a string array.
So you can replace the CUSTOM_JSON_EXTRACT UDF used further down with the built-in JSON_EXTRACT_ARRAY function, as in this example:
#standardSQL
SELECT
JSON_EXTRACT_SCALAR(json , '$.X') AS X,
JSON_EXTRACT_SCALAR(json , '$.Y') AS Y
FROM t, UNNEST(JSON_EXTRACT_ARRAY(json_column , '$')) json
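What the query does (split the JSON array into elements, then read X and Y from each element) can be sketched in Python. This is an illustration of the logic only; the function name extract_xy is an assumption, and the sample uses the cleaned-up array from this answer.

```python
import json

def extract_xy(json_column):
    """Return one (X, Y) tuple per element of the JSON array.

    json.loads plus the loop plays the role of UNNEST(JSON_EXTRACT_ARRAY(...)).
    Python keeps the native JSON types, whereas JSON_EXTRACT_SCALAR returns
    strings.
    """
    return [(obj.get("X"), obj.get("Y")) for obj in json.loads(json_column)]

sample = '''
[
{"blabla1":1,"X":1,"blabla2":3,"blabla3":5,"blabla4":7,"Y":"2"},
{"blabla1":2,"X":3,"blabla2":4,"blabla3":6,"blabla4":8,"Y":"4"}
]
'''
rows = extract_xy(sample)
print(rows)
```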
==============
The example below is for BigQuery Standard SQL. It lets you stay close to the standard way of working with JSONPath, with no extra manipulation needed: you simply call the CUSTOM_JSON_EXTRACT(json, json_path) function.
#standardSQL
CREATE TEMPORARY FUNCTION CUSTOM_JSON_EXTRACT(json STRING, json_path STRING)
RETURNS ARRAY<STRING>
LANGUAGE js AS """
return jsonPath(JSON.parse(json), json_path);
"""
OPTIONS (
library="gs://your_bucket/jsonpath-0.8.0.js"
);
WITH t AS (
SELECT '''
[
{"blabla1":1,"X":1,"blabla2":3,"blabla3":5,"blabla4":7,"Y":"2"},
{"blabla1":2,"X":3,"blabla2":4,"blabla3":6,"blabla4":8,"Y":"4"}
]
''' AS json_column
)
SELECT
CUSTOM_JSON_EXTRACT(json_column , '$[*].X') AS X,
CUSTOM_JSON_EXTRACT(json_column , '$[*].Y') AS Y
FROM t
The result will be:
Row  X  Y
1    1  2
     3  4
Note: to overcome BigQuery's current "limitation" for JsonPath, the above solution uses a custom function along with an external library, jsonpath-0.8.0.js, which can be downloaded from https://code.google.com/archive/p/jsonpath/downloads and uploaded to Google Cloud Storage as gs://your_bucket/jsonpath-0.8.0.js
I just re-read Felipe's answer. For his example, the above solution would look like this (just as an FYI):
SELECT
id,
CUSTOM_JSON_EXTRACT(payload, '$.commits[*].author.email') emails
FROM `githubarchive.day.20180830`
WHERE type='PushEvent'
AND id='8188163772'