Can I convert a stringified JSON array back to a BigQuery strucutre?

Can I convert a stringified JSON array back to a BigQuery strucutre? - google-bigquery

I'm trying to take a STRING field that contains a nested JSON structure from a table called my_old_table, extract a nested array called "alerts" from it, then insert it into a column in a new table called my_new_table. The new column is defined as:
ARRAY<STRUCT<cuid STRING, title STRING, created TIMESTAMP>>
I'm using this SQL:
INSERT INTO my_dataset.my_table(
id, alerts)
SELECT id, JSON_EXTRACT(extra, "$.alerts") AS content_alerts
FROM my_dataset.my_old_table
This gives me:
Query column 2 has type STRING which cannot be inserted into column content_alerts, which has type ARRAY<STRUCT<cuid STRING, title STRING, created TIMESTAMP>> at [4:1]
I don't see a way of parsing the extracted string this back to a structure.... Is there another way to do this?
Edit:
The original value is a json string that looks like this:
{
"id": "bar123",
"value": "Test",
"title": "Test",
"alerts": [
{
"id": "abc123",
"title": "Foo",
"created": "2020-01-17T23:18:59.769908Z"
},
{
"id": "abc124",
"title": "Accepting/Denying Claims",
"created": "2020-01-17T23:18:59.769908Z"
}
]
}
I want to extract $.alerts and insert it into the ARRAY<STRUCT<cuid STRING, title STRING, created TIMESTAMP>> somehow.
Edit #2
To clarify, this reproduces the issue:
CREATE TABLE insights.my_table
(
id string,
alerts ARRAY<STRUCT<cuid STRING, title STRING, created TIMESTAMP>>
);
CREATE TABLE insights.my_old_table
(
id string,
field STRING
);
INSERT INTO insights.my_old_table(id, field)
VALUES("1", "{\"id\": \"bar123\",\"value\": \"Test\",\"title\": \"Test\",\"alerts\":[{\"id\": \"abc123\",\"title\": \"Foo\",\"created\": \"2020-01-17T23:18:59.769908Z\"},{\"id\": \"abc124\",\"title\": \"Accepting/Denying Claims\",\"created\": \"2020-01-17T23:18:59.769908Z\"}]}");
Based on the above setup, I don't know how to extract "alerts" from the STRING field and insert it into the STRUCT field. I thought I could add a JSON PARSE step in there but I don't see any BigQuery feature for that. Or else there would be a way to manipulate JSON as a STRUCT but I don't see that either. As a result, this is as close as I could get:
INSERT INTO insights.my_table(id, alerts)
SELECT id, JSON_EXTRACT(field, "$.alerts") AS alerts FROM insights.my_old_table
I'm sure there's something I'm missing here.

Below for BigQuery Standard SQL
#standardSQL
CREATE TEMP FUNCTION JsonToItems(input STRING)
RETURNS ARRAY<STRING>
LANGUAGE js AS """
return JSON.parse(input).map(x=>JSON.stringify(x));
""";
)
SELECT
JSON_EXTRACT_SCALAR(extra, "$.id") AS id,
ARRAY(
SELECT AS STRUCT
JSON_EXTRACT_SCALAR(alert, "$.id") AS cuid,
JSON_EXTRACT_SCALAR(alert, "$.title") AS title,
TIMESTAMP(JSON_EXTRACT_SCALAR(alert, "$.created")) AS created
FROM UNNEST(JsonToItems(JSON_EXTRACT(extra, "$.alerts"))) alert
) AS alerts,
FROM `project.dataset.my_old_table`
You can test, play with above using sample data from your question as in example below
#standardSQL
CREATE TEMP FUNCTION JsonToItems(input STRING)
RETURNS ARRAY<STRING>
LANGUAGE js AS """
return JSON.parse(input).map(x=>JSON.stringify(x));
""";
WITH `project.dataset.my_old_table` AS (
SELECT '''
{
"id": "bar123",
"value": "Test",
"title": "Test",
"alerts": [
{
"id": "abc123",
"title": "Foo",
"created": "2020-01-17T23:18:59.769908Z"
},
{
"id": "abc124",
"title": "Accepting/Denying Claims",
"created": "2020-01-17T23:18:59.769908Z"
}
]
}
''' extra
)
SELECT
JSON_EXTRACT_SCALAR(extra, "$.id") AS id,
ARRAY(
SELECT AS STRUCT
JSON_EXTRACT_SCALAR(alert, "$.id") AS cuid,
JSON_EXTRACT_SCALAR(alert, "$.title") AS title,
TIMESTAMP(JSON_EXTRACT_SCALAR(alert, "$.created")) AS created
FROM UNNEST(JsonToItems(JSON_EXTRACT(extra, "$.alerts"))) alert
) AS alerts,
FROM `project.dataset.my_old_table`
with result
Obviously, you can then use this in your INSERT INTO my_dataset.my_table statement

You can parse the extracted string back to a BigQuery structure like so:
SELECT STRUCT(ARRAY<STRUCT<cuid STRING, title STRING, created TIMESTAMP>>
[('Rick', 'Scientist', '2020-01-17')]) FROM my_dataset.my_old_table;
I just tried it with your data
I have inserted your data in a BigQuery table:
INSERT INTO dataset.table
VALUES('{"id": "bar123", "value": "Test", "title": "Test", "alerts":
[{ "id": "abc123", "title": "Foo", "created": "2020-01-17T23:18:59.769908Z"},
{"id": "abc124", "title": "Accepting/Denying Claims", "created": "2020-01-17T23:18:59.769908Z"}]}');
and queried it, converting it back to a BigQuery structure:
SELECT STRUCT<cuid STRING, title STRING, created TIMESTAMP>("abc123",
"Foo", "2020-01-17T23:18:59.769908Z"),("abc124", "Accepting/Denying
Claims", "2020-01-17T23:18:59.769908Z") FROM blabla.testingjson;
Output:
Row | f0_.cuid | f0_.title | f0_.created
----------------------------------------
1 | abc123 | Foo | 2020-01-17 23:18:59.769908 UTC

Related

How do I use BigQuery DML to transform some fields of a struct nested within an array, within a struct, within an array?

I think this is a more complex version of the question in Update values in struct arrays in BigQuery.
I'm trying to update some of the fields in a struct, where the struct is heavily nested. I'm having trouble creating the SQL to do it. Here's my table schema:
CREATE TABLE `my_dataset.test_data_for_so`
(
date DATE,
hits ARRAY<STRUCT<search STRUCT<query STRING, other_column STRING>, metadata ARRAY<STRUCT<key STRING, value STRING>>>>
);
This is what the schema looks like in the BigQuery GUI after I create the table:
Here's the data I've inserted:
INSERT INTO `my_dataset.test_data_for_so` (date, hits)
VALUES (
CAST('2021-01-01' AS date),
[
STRUCT(
STRUCT<query STRING, other_column STRING>('foo bar', 'foo bar'),
[
STRUCT<key STRING, value STRING>('foo bar', 'foo bar')
]
)
]
)
My goal is to transform the "search.query" and "metadata.value" fields. For example, uppercasing them, leaving every other column (and every other struct field) in the row unchanged.
I'm looking for a solution involving either manually specifying each column in the SQL, or preferably, one where I can only mention the columns/fields I want to transform in the SQL, omitting all other columns/fields. This is a minimal example. The table I'm working on in production has hundreds of columns and fields.
For example, that row, when transformed this way, would change from:
[
{
"date": "2021-01-01",
"hits": [
{
"search": {
"query": "foo bar",
"other_column": "foo bar"
},
"metadata": [
{
"key": "foo bar",
"value": "foo bar"
}
]
}
]
}
]
to:
[
{
"date": "2021-01-01",
"hits": [
{
"search": {
"query": "FOO BAR",
"other_column": "foo bar"
},
"metadata": [
{
"key": "foo bar",
"value": "FOO BAR"
}
]
}
]
}
]

preferably, one where I can only mention the columns/fields I want to transform in the SQL ...
Use below approach - it does exactly what you wish - ONLY those fields that are to be updated are in use, all other (tens or hundreds ...) are preserved as is
update your_table
set hits = array(
select as struct *
replace(
(select as struct * replace (upper(query) as query) from unnest([search])) as search,
array(select as struct * replace(upper(value) as value) from unnest(metadata)) as metadata
)
from unnest(hits)
)
where true;
if applied to sample data in your question - result is

How to convert a JSON field to Tabular format in SQL Query?

I have a Table containing 3 columns
(ID, Content, Date), where the Content column have values in json format as shown below:
{
"Id": "9999",
"Name": "PETERPAN",
"SubContent": [
{
"subcontent1": "ABC",
"subcontent2": "123"
}
[
}
How can I convert it into tabular format using SQL Query?

Use LATERAL FLATTEN to get the key/value pairs as separate rows:
with t as (
select parse_json('{
"Id": "9999",
"Name": "PETERPAN",
"SubContent":
{
"subcontent1": "ABC",
"subcontent2": "123"
}
}') col
)
select col:Id as id, col:Name as name, sc.key, sc.value
from t, lateral flatten( input => col:SubContent ) sc;
The result is
ID NAME KEY VALUE
9999 PETERPAN subcontent1 ABC
9999 PETERPAN subcontent2 123

postgresql filter data from bytea column

I have a table where i am saving data in a column of type bytea, the data is actually a JSON object.
I need to implement a filter on the JSON data.
SELECT cast(job_data::TEXT as jsonb) FROM job_details where job_data ->> "organization" = "ABC";
This query does not work.
The JSON Object looks like
{
"uid": "FdUR4SB0h7",
"Type": "Reference Data Service",
"user": "hk#ss.com",
"SubType": "Reference Data Task",
"_version": 1,
"Frequency": "Once",
"Parameters": "sdfsdfsdfds",
"organization": "ABC",
"StartDateTime": "2020-01-20T10:30:00Z"
}

You need to predicate on the converted column, also, that conversion may not necessarily work depending on encoding. Try something like this:
SELECT
*
FROM
job_details
WHERE
convert_from(job_data, 'UTF-8')::json ->> 'organization' = 'ABC';

How to generate JSON array from multiple rows, then return with values of another table

I am trying to build a query which combines rows of one table into a JSON array, I then want that array to be part of the return.
I know how to do a simple query like
SELECT *
FROM public.template
WHERE id=1
And I have worked out how to produce the JSON array that I want
SELECT array_to_json(array_agg(to_json(fields)))
FROM (
SELECT id, name, format, data
FROM public.field
WHERE template_id = 1
) fields
However, I cannot work out how to combine the two, so that the result is a number of fields from public.template with the output of the second query being one of the returned fields.
I am using PostGreSQL 9.6.6
Edit, as requested more information, a definition of field and template tables and a sample of each queries output.
Currently, I have a JSONB row on the template table which I am using to store an array of fields, but I want to move fields to their own table so that I can more easily enforce a schema on them.
Template table contains:
id
name
data
organisation_id
But I would like to remove data and replace it with the field table which contains:
id
name
format
data
template_id
At the moment the output of the first query is:
{
"id": 1,
"name": "Test Template",
"data": [
{
"id": "1",
"data": null,
"name": "Assigned User",
"format": "String"
},
{
"id": "2",
"data": null,
"name": "Office",
"format": "String"
},
{
"id": "3",
"data": null,
"name": "Department",
"format": "String"
}
],
"id_organisation": 1
}
This output is what I would like to recreate using one query and both tables. The second query outputs this, but I do not know how to merge it into a single query:
[{
"id": 1,
"name": "Assigned User",
"format": "String",
"data": null
},{
"id": 2,
"name": "Office",
"format": "String",
"data": null
},{
"id": 3,
"name": "Department",
"format": "String",
"data": null
}]

The feature you're looking for is json concatenation. You can do that by using the operator ||. It's available since PostgreSQL 9.5
SELECT to_jsonb(template.*) || jsonb_build_object('data', (SELECT to_jsonb(field) WHERE template_id = templates.id)) FROM template

Sorry for poorly phrasing what I was trying to achieve, after hours of Googling I have worked it out and it was a lot more simple than I thought in my ignorance.
SELECT id, name, data
FROM public.template, (
SELECT array_to_json(array_agg(to_json(fields)))
FROM (
SELECT id, name, format, data
FROM public.field
WHERE template_id = 1
) fields
) as data
WHERE id = 1
I wanted the result of the subquery to be a column in the ouput rather than compiling the entire output table as a JSON.

Postgresql SELECTing from JSON column

Assume I am using PG 9.3 and I have a post table with a json column 'meta_data':
Example content of the json column 'meta_data'
{
"content": "this is a post body",
"comments": [
{
"user_id": 1,
"content": "hello"
},
{
"user_id": 2,
"content": "foo"
},
{
"user_id": 3,
"content": "bar"
}
]
}
How can I find all the posts where the user_id = 1 from the comments array from the meta_data column?

I'm almost positive I'm implementing this incorrectly but try this
select *
from posts
where id in (
select id from (
select id,
json_array_elements(meta_data->'comments')->'user_id' as user_id
from posts
) x
where cast(user_id as varchar) = '1'
);
There's probably an array operator like #> that will remove the need for the nested select statements but I can't seem to get it to work right now.
Let me know if this is going down the correct track, I'm sure we could figure it out if required.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

Can I convert a stringified JSON array back to a BigQuery strucutre? - google-bigquery

Related

How do I use BigQuery DML to transform some fields of a struct nested within an array, within a struct, within an array?

How to convert a JSON field to Tabular format in SQL Query?

postgresql filter data from bytea column

How to generate JSON array from multiple rows, then return with values of another table

Postgresql SELECTing from JSON column

Categories

Resources