Handle a partially stringified key-value JSON VARIANT in SQL (Snowflake) - sql

I have a variant object like the below:
{
"build_num": "111",
"city_name": "Paris",
"rawData": "\"sku\":\"AAA\",\"price\":19.98,\"currency\":\"USD\",\"quantity\":1,\"size\":\"\"}",
"country": "France"
}
So you can see that parts of it are regular key-value pairs, like build_num and city_name, but the value of rawData is a stringified version of a JSON object.
I would like to create a variant from this that will look like:
{
"build_num": "111",
"city_name": "Paris",
"rawData": {
"sku":"AAA",
"price":19.98,
"currency":"USD",
"quantity":1}
"country": "France"
}
AND I would like to do this all in SQL (Snowflake) - is that possible?

So, to poke the data in, I have used a CTE and escaped the sub-JSON, so it is in the DB as you describe. I also had to add the missing start-of-object token {, to make rawData valid.
WITH data AS (
    SELECT parse_json(json) AS json
    FROM VALUES
        ('{"build_num": "111","city_name": "Paris","country": "France","rawData": "{\\"sku\\":\\"AAA\\",\\"price\\":19.98,\\"currency\\":\\"USD\\",\\"quantity\\":1,\\"size\\":\\"\\"}"}')
        v(json)
)
SELECT
    json,
    json:rawData AS raw_data,
    parse_json(raw_data) AS sub_json,
    OBJECT_INSERT(json, 'rawData', sub_json, true) AS all_json
FROM data;
This shows the transformation step by step: pull out rawData, parse it via PARSE_JSON, and re-inject the result into the original object via OBJECT_INSERT. Condensed into a single expression:
WITH data AS (
    SELECT parse_json(json) AS json
    FROM VALUES
        ('{"build_num": "111","city_name": "Paris","country": "France","rawData": "{\\"sku\\":\\"AAA\\",\\"price\\":19.98,\\"currency\\":\\"USD\\",\\"quantity\\":1,\\"size\\":\\"\\"}"}')
        v(json)
)
SELECT
    OBJECT_INSERT(json, 'rawData', parse_json(json:rawData), true) AS all_json
FROM data;

TRY_PARSE_JSON will clear off the additional backslashes from the JSON element; below is a reference:
select
column1 as n,
try_parse_json(column1) as v
from
values
(
'{
"build_num": "111",
"city_name": "Paris",
"rawData": "{\"sku\":\"AAA\",\"price\":19.98,\"currency\":\"USD\",\"quantity\":1,\"size\":\"\"}",
"country": "France"}'
) as vals;
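Combining the two answers: TRY_PARSE_JSON drops into the same OBJECT_INSERT pattern and, unlike PARSE_JSON, returns NULL instead of erroring when rawData is not valid JSON. A minimal sketch, assuming the same escaped CTE setup as above:
WITH data AS (
    SELECT parse_json(json) AS json
    FROM VALUES
        ('{"build_num": "111","city_name": "Paris","country": "France","rawData": "{\\"sku\\":\\"AAA\\",\\"price\\":19.98,\\"currency\\":\\"USD\\",\\"quantity\\":1,\\"size\\":\\"\\"}"}')
        v(json)
)
SELECT
    -- TRY_PARSE_JSON yields NULL (instead of an error) for malformed rawData
    OBJECT_INSERT(json, 'rawData', TRY_PARSE_JSON(json:rawData), true) AS all_json
FROM data;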

Related

How to extract a field from an array of JSON objects in AWS Athena?

I have the following JSON data structure in a column in AWS Athena:
[
{
"event_type": "application_state_transition",
"data": {
"event_id": "-3368023833341021830"
}
},
{
"event_type": "application_state_transition",
"data": {
"event_id": "5692882176024811076"
}
}
]
I would like to somehow extract the values of event_id field, e.g. in the form of a list:
["-3368023833341021830", "5692882176024811076"]
(Though I don't insist on exactly this as long as I can get my event IDs.)
I wanted to use the JSON_EXTRACT function and thought it uses the very same syntax as jq. In jq, I can easily get what I want using the following query syntax:
.[].data.event_id
However, in AWS Athena this results in an error, as apparently the syntax is not entirely compatible with jq. Is there an alternative way to achieve the result I want?
JSON_EXTRACT supports quite a limited set of JSON paths. Depending on the Athena engine version, you can either process the column by casting it to an array of maps and working on that array with array functions:
-- sample data
with dataset(json_col) as (
values ('[
{
"event_type": "application_state_transition",
"data": {
"event_id": "-3368023833341021830"
}
},
{
"event_type": "application_state_transition",
"data": {
"event_id": "5692882176024811076"
}
}
]')
)
-- query
select transform(
cast(json_parse(json_col) as array(map(varchar, json))),
m -> json_extract(m['data'], '$.event_id'))
from dataset;
Output:
_col0
["-3368023833341021830", "5692882176024811076"]
Or, on Athena engine version 3, you can try using Trino's json_query:
-- query
select JSON_QUERY(json_col, 'lax $[*].data.event_id' WITH ARRAY WRAPPER)
from dataset;
Note that the return types of the two differ: in the first case you get array(json), and in the second just varchar.
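If you need the json_query result back as an array rather than a varchar, one option (a sketch, not verified against every engine version) is to parse the string and cast it:
-- parse the varchar '["...","..."]' back into an array of strings
select cast(
         json_parse(json_query(json_col, 'lax $[*].data.event_id' WITH ARRAY WRAPPER))
         as array(varchar)) as event_ids
from dataset;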

PostgreSQL Select from an array inside an array

I have a column (locations) populated with something like below.
{
"unique_id": "1",
"address": {
"street_address": "100 Main St",
"city": "Pleasantville",
"state_province": "IL",
"country_code": "US"
}
}
I know how to select "unique_id" with
SELECT locations #>> '{unique_id}'
FROM table;
But how would you select, say, "street_address"?
You have a couple of options; because the JSON operators return JSON you can chain the operators:
SELECT locations -> 'address' -> 'street_address' AS way1 FROM testjson;
Alternatively you can use the operators that accept an array:
SELECT locations #> '{address,street_address}' AS otherway FROM testjson;
See this sql fiddle for a working example or the documentation for further information.
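Note that -> and #> return JSON, so both queries above give you a JSON value (quotes included). If you want the value as plain text, for comparison or display, the double-arrow variants do the last step as text extraction:
SELECT locations -> 'address' ->> 'street_address' AS way1_text,
       locations #>> '{address,street_address}'    AS otherway_text
FROM testjson;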

Returning unknown JSON in a query

Here is my scenario. I have data in a Cosmos DB and I want to return c.this, c.that, etc. as the indexer for Azure Cognitive Search. One field I want to return is JSON of an unknown structure; the one thing I do know about it is that it is flat. However, it is my understanding that the return value for an indexer needs to be known. How, using SQL in a SELECT, would I return all JSON elements in the flat object? Here is an example value I would be querying:
{
"BusinessKey": "SomeKey",
"Source": "flat",
"id": "SomeId",
"attributes": {
"Source": "flat",
"Element": "element",
"SomeOtherElement": "someOtherElement"
}
}
So I would want my select to be maybe something like:
SELECT
c.BusinessKey,
c.Source,
c.id,
-- SOMETHING HERE TO LIST OUT ALL ATTRIBUTES IN THE JSON AS FIELDS IN THE RESULT
And I would want the result to be:
{
"BusinessKey": "SomeKey",
"Source": "flat",
"id": "SomeId",
"attributes": [{"Source":"flat"},{"Element":"element"},{"SomeOtherElement":"someotherelement"}]
}
Currently we are calling ToString on c.attributes, the JSON of unknown structure, but that adds all the escape characters; when we want to search the index we have to add all those escape characters back, and it's getting really unruly.
Is there a way to do this using SQL?
Thanks for any help!
You could use a UDF in Cosmos DB SQL.
UDF code:
function userDefinedFunction(object) {
    // emit each key of the flat object as its own single-key map
    var returnArray = [];
    for (var key in object) {
        var map = {};
        map[key] = object[key];
        returnArray.push(map);
    }
    return returnArray;
}
SQL (assuming the UDF above is registered under the name test):
SELECT
c.BusinessKey,
c.Source,
c.id,
udf.test(c.attributes) as attributes
from c
Output: each attribute comes back as its own single-key object, matching the desired shape above, e.g. "attributes": [{"Source":"flat"},{"Element":"element"},{"SomeOtherElement":"someOtherElement"}]

Can I convert a stringified JSON array back to a BigQuery structure?

I'm trying to take a STRING field that contains a nested JSON structure from a table called my_old_table, extract a nested array called "alerts" from it, then insert it into a column in a new table called my_new_table. The new column is defined as:
ARRAY<STRUCT<cuid STRING, title STRING, created TIMESTAMP>>
I'm using this SQL:
INSERT INTO my_dataset.my_table(
id, alerts)
SELECT id, JSON_EXTRACT(extra, "$.alerts") AS content_alerts
FROM my_dataset.my_old_table
This gives me:
Query column 2 has type STRING which cannot be inserted into column content_alerts, which has type ARRAY<STRUCT<cuid STRING, title STRING, created TIMESTAMP>> at [4:1]
I don't see a way of parsing the extracted string back to a structure... Is there another way to do this?
Edit:
The original value is a json string that looks like this:
{
"id": "bar123",
"value": "Test",
"title": "Test",
"alerts": [
{
"id": "abc123",
"title": "Foo",
"created": "2020-01-17T23:18:59.769908Z"
},
{
"id": "abc124",
"title": "Accepting/Denying Claims",
"created": "2020-01-17T23:18:59.769908Z"
}
]
}
I want to extract $.alerts and insert it into the ARRAY<STRUCT<cuid STRING, title STRING, created TIMESTAMP>> somehow.
Edit #2
To clarify, this reproduces the issue:
CREATE TABLE insights.my_table
(
id string,
alerts ARRAY<STRUCT<cuid STRING, title STRING, created TIMESTAMP>>
);
CREATE TABLE insights.my_old_table
(
id string,
field STRING
);
INSERT INTO insights.my_old_table(id, field)
VALUES("1", "{\"id\": \"bar123\",\"value\": \"Test\",\"title\": \"Test\",\"alerts\":[{\"id\": \"abc123\",\"title\": \"Foo\",\"created\": \"2020-01-17T23:18:59.769908Z\"},{\"id\": \"abc124\",\"title\": \"Accepting/Denying Claims\",\"created\": \"2020-01-17T23:18:59.769908Z\"}]}");
Based on the above setup, I don't know how to extract "alerts" from the STRING field and insert it into the STRUCT field. I thought I could add a JSON PARSE step in there but I don't see any BigQuery feature for that. Or else there would be a way to manipulate JSON as a STRUCT but I don't see that either. As a result, this is as close as I could get:
INSERT INTO insights.my_table(id, alerts)
SELECT id, JSON_EXTRACT(field, "$.alerts") AS alerts FROM insights.my_old_table
I'm sure there's something I'm missing here.
Below is for BigQuery Standard SQL:
#standardSQL
CREATE TEMP FUNCTION JsonToItems(input STRING)
RETURNS ARRAY<STRING>
LANGUAGE js AS """
return JSON.parse(input).map(x=>JSON.stringify(x));
""";
SELECT
JSON_EXTRACT_SCALAR(extra, "$.id") AS id,
ARRAY(
SELECT AS STRUCT
JSON_EXTRACT_SCALAR(alert, "$.id") AS cuid,
JSON_EXTRACT_SCALAR(alert, "$.title") AS title,
TIMESTAMP(JSON_EXTRACT_SCALAR(alert, "$.created")) AS created
FROM UNNEST(JsonToItems(JSON_EXTRACT(extra, "$.alerts"))) alert
) AS alerts,
FROM `project.dataset.my_old_table`
You can test and play with the above using the sample data from your question, as in the example below:
#standardSQL
CREATE TEMP FUNCTION JsonToItems(input STRING)
RETURNS ARRAY<STRING>
LANGUAGE js AS """
return JSON.parse(input).map(x=>JSON.stringify(x));
""";
WITH `project.dataset.my_old_table` AS (
SELECT '''
{
"id": "bar123",
"value": "Test",
"title": "Test",
"alerts": [
{
"id": "abc123",
"title": "Foo",
"created": "2020-01-17T23:18:59.769908Z"
},
{
"id": "abc124",
"title": "Accepting/Denying Claims",
"created": "2020-01-17T23:18:59.769908Z"
}
]
}
''' extra
)
SELECT
JSON_EXTRACT_SCALAR(extra, "$.id") AS id,
ARRAY(
SELECT AS STRUCT
JSON_EXTRACT_SCALAR(alert, "$.id") AS cuid,
JSON_EXTRACT_SCALAR(alert, "$.title") AS title,
TIMESTAMP(JSON_EXTRACT_SCALAR(alert, "$.created")) AS created
FROM UNNEST(JsonToItems(JSON_EXTRACT(extra, "$.alerts"))) alert
) AS alerts,
FROM `project.dataset.my_old_table`
with the result being the sample alerts parsed into the target ARRAY<STRUCT<cuid STRING, title STRING, created TIMESTAMP>> structure.
Obviously, you can then use this in your INSERT INTO my_dataset.my_table statement
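As a side note, on a reasonably recent BigQuery release the JavaScript temp function can be dropped entirely in favor of JSON_EXTRACT_ARRAY (a sketch of the same query, assuming that function is available to you):
#standardSQL
SELECT
  JSON_EXTRACT_SCALAR(extra, "$.id") AS id,
  ARRAY(
    SELECT AS STRUCT
      JSON_EXTRACT_SCALAR(alert, "$.id") AS cuid,
      JSON_EXTRACT_SCALAR(alert, "$.title") AS title,
      TIMESTAMP(JSON_EXTRACT_SCALAR(alert, "$.created")) AS created
    -- JSON_EXTRACT_ARRAY splits the JSON array into one STRING per element
    FROM UNNEST(JSON_EXTRACT_ARRAY(extra, "$.alerts")) alert
  ) AS alerts
FROM `project.dataset.my_old_table`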
You can parse the extracted string back to a BigQuery structure like so:
SELECT STRUCT(ARRAY<STRUCT<cuid STRING, title STRING, created TIMESTAMP>>
[('Rick', 'Scientist', '2020-01-17')]) FROM my_dataset.my_old_table;
I just tried it with your data
I have inserted your data in a BigQuery table:
INSERT INTO dataset.table
VALUES('{"id": "bar123", "value": "Test", "title": "Test", "alerts":
[{ "id": "abc123", "title": "Foo", "created": "2020-01-17T23:18:59.769908Z"},
{"id": "abc124", "title": "Accepting/Denying Claims", "created": "2020-01-17T23:18:59.769908Z"}]}');
and queried it, converting it back to a BigQuery structure:
SELECT STRUCT<cuid STRING, title STRING, created TIMESTAMP>
    ("abc123", "Foo", "2020-01-17T23:18:59.769908Z"),
    ("abc124", "Accepting/Denying Claims", "2020-01-17T23:18:59.769908Z")
FROM blabla.testingjson;
Output:
Row | f0_.cuid | f0_.title | f0_.created
----+----------+-----------+--------------------------------
  1 | abc123   | Foo       | 2020-01-17 23:18:59.769908 UTC

jsonb LIKE query on nested objects in an array

My JSON data looks like this:
[{
"id": 1,
"payload": {
"location": "NY",
"details": [{
"name": "cafe",
"cuisine": "mexican"
},
{
"name": "foody",
"cuisine": "italian"
}
]
}
}, {
"id": 2,
"payload": {
"location": "NY",
"details": [{
"name": "mbar",
"cuisine": "mexican"
},
{
"name": "fdy",
"cuisine": "italian"
}
]
}
}]
Given a text "foo", I want to return all the tuples that contain this substring, but I cannot figure out how to write the query for it.
I followed this related answer but cannot figure out how to do LIKE.
This is what I have working right now:
SELECT r.res->>'name' AS feature_name, d.details::text
FROM restaurants r
   , LATERAL (SELECT ARRAY (
        SELECT * FROM json_populate_recordset(null::foo, r.res #> '{payload, details}')
        )
     ) AS d(details)
WHERE d.details #> '{cafe}';
Instead of passing the whole text of cafe I want to pass ca and get the results that match that text.
Your solution can be simplified some more:
SELECT r.res->>'name' AS feature_name, d.name AS detail_name
FROM restaurants r
, jsonb_populate_recordset(null::foo, r.res #> '{payload, details}') d
WHERE d.name LIKE '%oh%';
Or simpler yet, with jsonb_array_elements(), since you don't actually need the row type (foo) at all in this example:
SELECT r.res->>'name' AS feature_name, d->>'name' AS detail_name
FROM restaurants r
, jsonb_array_elements(r.res #> '{payload, details}') d
WHERE d->>'name' LIKE '%oh%';
db<>fiddle here
But that's not what you asked exactly:
I want to return all the tuples that have this substring.
You are returning all JSON array elements (0-n per base table row), where one particular key ('{payload,details,*,name}') matches (case-sensitively).
And your original question had a nested JSON array on top of this. You removed the outer array for this solution - I did the same.
Depending on your actual requirements the new text search capability of Postgres 10 might be useful.
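For illustration, a minimal sketch of that idea against the same restaurants table: Postgres 10 can build a tsvector directly from jsonb. Full-text search matches whole lexemes rather than arbitrary substrings, but prefix queries such as ca:* do work:
-- Postgres 10+: full-text search over all values in the jsonb
SELECT r.res->>'name' AS feature_name
FROM restaurants r
WHERE to_tsvector('simple', r.res #> '{payload, details}')
      @@ to_tsquery('simple', 'ca:*');  -- prefix match, e.g. matches "cafe"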
I ended up doing this (inspired by this answer: jsonb query with nested objects in an array):
SELECT r.res->>'name' AS feature_name, d.details::text
FROM restaurants r
, LATERAL (
SELECT * FROM json_populate_recordset(null::foo, r.res#>'{payload, details}')
) AS d(details)
WHERE d.details LIKE '%oh%';
Fiddle here - http://sqlfiddle.com/#!15/f2027/5
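If the match should also be case-insensitive, the same simplified query works with ILIKE in place of LIKE (a sketch based on the jsonb_array_elements() form above):
SELECT r.res->>'name' AS feature_name, d->>'name' AS detail_name
FROM restaurants r
   , jsonb_array_elements(r.res #> '{payload, details}') d
WHERE d->>'name' ILIKE '%OH%';  -- ILIKE ignores case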