BigQuery: Convert a record of repeated fields into a repeated record - google-bigquery

I have a BigQuery table represented by this JSON (Record of Repeated)
{
  "createdBy": [
    "foo",
    "foo"
  ],
  "fileName": [
    "bar1",
    "bar2"
  ]
}
that I need to convert to Repeated Record
[
  {
    "createdBy": "foo",
    "fileName": "bar1"
  },
  {
    "createdBy": "foo",
    "fileName": "bar2"
  }
]
To make this conversion, you pair index 0 of every array to build the first object, index 1 for the second object, and so on.
I performed this kind of transformation using a UDF, but due to a BigQuery limitation I'm unable to save a VIEW that performs it:
No support for CREATE TEMPORARY FUNCTION statements inside views
Below is the full statement to generate a sample table and the function:
CREATE TEMP FUNCTION filesObjectArrayToArrayObject(filesJson STRING)
RETURNS ARRAY<STRUCT<createdBy STRING, fileName STRING>>
LANGUAGE js AS """
  function filesObjectArrayToArrayObject_execute(files) {
    var createdBy = files["createdBy"];
    var fileName = files["fileName"];
    var output = [];
    for (var i = 0; i < createdBy.length; i++) {
      output.push({ "createdBy": createdBy[i], "fileName": fileName[i] });
    }
    return output;
  }
  return filesObjectArrayToArrayObject_execute(JSON.parse(filesJson));
""";
WITH sample_table AS (
  SELECT STRUCT<createdBy ARRAY<STRING>, fileName ARRAY<STRING>>(
    ["foo", "foo"],
    ["bar1", "bar2"]
  ) AS files
)
SELECT
  files AS filesOriginal,
  filesObjectArrayToArrayObject(TO_JSON_STRING(files)) AS filesConverted
FROM sample_table
Is there a way to perform the same kind of task using native BigQuery statements?
Please note that:
The real data has more than 2 keys, but the key names are fixed.
The length of the arrays is not fixed; it can be 0, 1, 10, 20, ...

Below is for BigQuery Standard SQL
#standardSQL
WITH sample_table AS (
  SELECT STRUCT<createdBy ARRAY<STRING>, fileName ARRAY<STRING>>(
    ["foo", "foo"],
    ["bar1", "bar2"]
  ) AS files
)
SELECT
  ARRAY(
    SELECT STRUCT(createdBy, fileName)
    FROM t.files.createdBy AS createdBy WITH OFFSET
    JOIN t.files.fileName AS fileName WITH OFFSET
    USING(OFFSET)
  ) AS files
FROM sample_table t
with output:

Row   files.createdBy   files.fileName
1     foo               bar1
      foo               bar2
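Since this version uses only native SQL, it can be saved as a view, which the UDF version could not. A minimal sketch, assuming the real data lives in a table my_dataset.source_table with the same files column (both names here are placeholders):

CREATE OR REPLACE VIEW `my_dataset.files_converted_view` AS
SELECT
  ARRAY(
    SELECT STRUCT(createdBy, fileName)
    FROM t.files.createdBy AS createdBy WITH OFFSET
    JOIN t.files.fileName AS fileName WITH OFFSET
    USING(OFFSET)
  ) AS files
FROM `my_dataset.source_table` t  -- placeholder table name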

Related

How do I use BigQuery DML to transform some fields of a struct nested within an array, within a struct, within an array?

I think this is a more complex version of the question in Update values in struct arrays in BigQuery.
I'm trying to update some of the fields in a struct, where the struct is heavily nested. I'm having trouble creating the SQL to do it. Here's my table schema:
CREATE TABLE `my_dataset.test_data_for_so`
(
  date DATE,
  hits ARRAY<STRUCT<search STRUCT<query STRING, other_column STRING>, metadata ARRAY<STRUCT<key STRING, value STRING>>>>
);
Here's the data I've inserted:
INSERT INTO `my_dataset.test_data_for_so` (date, hits)
VALUES (
  CAST('2021-01-01' AS date),
  [
    STRUCT(
      STRUCT<query STRING, other_column STRING>('foo bar', 'foo bar'),
      [
        STRUCT<key STRING, value STRING>('foo bar', 'foo bar')
      ]
    )
  ]
)
My goal is to transform the "search.query" and "metadata.value" fields, for example by uppercasing them, while leaving every other column (and every other struct field) in the row unchanged.
I'm looking for a solution involving either manually specifying each column in the SQL, or preferably, one where I can only mention the columns/fields I want to transform in the SQL, omitting all other columns/fields. This is a minimal example. The table I'm working on in production has hundreds of columns and fields.
For example, that row, when transformed this way, would change from:
[
  {
    "date": "2021-01-01",
    "hits": [
      {
        "search": {
          "query": "foo bar",
          "other_column": "foo bar"
        },
        "metadata": [
          {
            "key": "foo bar",
            "value": "foo bar"
          }
        ]
      }
    ]
  }
]
to:
[
  {
    "date": "2021-01-01",
    "hits": [
      {
        "search": {
          "query": "FOO BAR",
          "other_column": "foo bar"
        },
        "metadata": [
          {
            "key": "foo bar",
            "value": "FOO BAR"
          }
        ]
      }
    ]
  }
]
preferably, one where I can only mention the columns/fields I want to transform in the SQL ...
Use the approach below - it does exactly what you wish: ONLY the fields to be updated are mentioned; all others (tens or hundreds ...) are preserved as-is.
update your_table
set hits = array(
  select as struct *
  replace(
    (select as struct * replace(upper(query) as query) from unnest([search])) as search,
    array(select as struct * replace(upper(value) as value) from unnest(metadata)) as metadata
  )
  from unnest(hits)
)
where true;
If applied to the sample data in your question, the result is the expected output shown above: search.query and metadata.value are uppercased and everything else is untouched.
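To preview the change before running the DML, the same expression works in a plain SELECT; a sketch against the sample table from the question:

-- dry run: returns the rewritten hits without touching the stored data
select date,
  array(
    select as struct *
    replace(
      (select as struct * replace(upper(query) as query) from unnest([search])) as search,
      array(select as struct * replace(upper(value) as value) from unnest(metadata)) as metadata
    )
    from unnest(hits)
  ) as hits
from `my_dataset.test_data_for_so`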

HiveQL: How to write a query to select and filter records based on nested JSON array values

In our logging database we store custom UI data as a serialized JSON string. I have been using lateral view json_tuple() to traverse the JSON object and extract nested values. However, I need to filter some of my query results based on whether an array of objects contains certain values or not. After doing some digging I think I need to use lateral view explode(), but I am not a HiveQL expert and I'm not sure exactly how to use this in the way I need.
EX: (simplified for clarity and brevity)
// ui_events table schema
eventDate, eventType, eventData
// serialized JSON string stored in eventData
{ foo: { bar: [{ x: 1, y: 0 }, { x: 0, y: 1 }] } }
// HiveQL query
select
eventDate,
x,
y
from ui_events
lateral view json_tuple(eventData, 'foo') as foo
lateral view json_tuple(foo, 'bar') as bar
// <-- how to select only sub-item(s) in bar where x = 0 and y = 1
where
eventType = 'custom'
and // <-- how to only return records where at least 1 `bar` item was found above?
Any help would be greatly appreciated. Thanks!
Read comments in the code. You can filter the dataset as you want:
with my_table as (
  select stack(2, '{ "foo": { "bar": [{ "x": 1, "y": 0 }, { "x": 0, "y": 1 }] } }',
                  '{ "foo": { } }'
  ) as EventData
)
select * from
(
  select --get_json_object returns string, not array.
         --remove outer []
         --and replace delimiter between },{ with ,,,
         --to be able to split array
         regexp_replace(regexp_replace(get_json_object(EventData, '$.foo.bar'), '^\\[|\\]$', ''),
                        '\\},\\{', '},,,{'
         ) bar
  from my_table t
) s --explode array
lateral view explode (split(s.bar, ',,,')) b as bar_element
--get struct elements
lateral view json_tuple(b.bar_element, 'x', 'y') e as x, y
Result:

s.bar                           b.bar_element   e.x   e.y
{"x":1,"y":0},,,{"x":0,"y":1}   {"x":1,"y":0}   1     0
{"x":1,"y":0},,,{"x":0,"y":1}   {"x":0,"y":1}   0     1

Set a JSON object's values from an array in PostgreSQL

I have an array of JSON objects in PostgreSQL, inside data.json, that looks like this
[
{ "10" : [1,2,3,4,5] },
{ "8" : [5,4,3,1,2] },
{ "11" : [1,3,5,4,2] }
]
The data is taken from a select statement
SELECT
  ARRAY((
    SELECT json_build_object(station_id,
      ARRAY((
        SELECT COALESCE((
          SELECT SUM(prodtakt_takt)
          FROM psproductivity.prodtakt
          WHERE prodtakt_start::date = generate_series.shipping_day_sort::date
            AND station_id = prodtakt_station_id
        ), 0)
        FROM generate_series
      ))
    )
    FROM psproductivity.station
    WHERE (SELECT COALESCE(SUM(prodtakt_takt), 0)
           FROM psproductivity.prodtakt
           WHERE station_id = prodtakt_station_id) > 0
  )) AS json, ...
Where generate_series is just a series of dates.
Now I need to take that and turn it into this format of a JSON object
{
  "x": "x",
  "jsondata": {
    "10": [1,2,3,4,5],
    "8": [5,4,3,1,2],
    "11": [1,3,5,4,2]
  }
}
The software I am working on uses c3.js to process the data into graphs, so I have to change to this format. I am thinking that I need to start with something like
json_build_object( 'jsondata',( SELECT FROM json_each(unnest(data.json)) ) )
But I can think of no route with that logic. Adding the x into the JSON object is easy; I am confident I can do that part if I can just reorganize the array.
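One possible route (a sketch, not tested against the real tables) is jsonb_object_agg, which folds the single-key objects into one object. Here the array from the question is inlined as a jsonb literal for illustration; in the real query it would be the value built by the SELECT above:

select jsonb_build_object(
  'x', 'x',
  'jsondata', (
    select jsonb_object_agg(kv.key, kv.value)  -- merge {"10": [...]}, {"8": [...]}, ... into one object
    from jsonb_array_elements(
      '[{ "10" : [1,2,3,4,5] }, { "8" : [5,4,3,1,2] }, { "11" : [1,3,5,4,2] }]'::jsonb
    ) as elem,
    jsonb_each(elem) as kv                     -- unpack each single-key object into (key, value)
  )
) as result;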

Can I convert a stringified JSON array back to a BigQuery structure?

I'm trying to take a STRING field that contains a nested JSON structure from a table called my_old_table, extract a nested array called "alerts" from it, then insert it into a column in a new table called my_new_table. The new column is defined as:
ARRAY<STRUCT<cuid STRING, title STRING, created TIMESTAMP>>
I'm using this SQL:
INSERT INTO my_dataset.my_table (id, alerts)
SELECT id, JSON_EXTRACT(extra, "$.alerts") AS content_alerts
FROM my_dataset.my_old_table
This gives me:
Query column 2 has type STRING which cannot be inserted into column content_alerts, which has type ARRAY<STRUCT<cuid STRING, title STRING, created TIMESTAMP>> at [4:1]
I don't see a way of parsing the extracted string back to a structure... Is there another way to do this?
Edit:
The original value is a json string that looks like this:
{
  "id": "bar123",
  "value": "Test",
  "title": "Test",
  "alerts": [
    {
      "id": "abc123",
      "title": "Foo",
      "created": "2020-01-17T23:18:59.769908Z"
    },
    {
      "id": "abc124",
      "title": "Accepting/Denying Claims",
      "created": "2020-01-17T23:18:59.769908Z"
    }
  ]
}
I want to extract $.alerts and insert it into the ARRAY<STRUCT<cuid STRING, title STRING, created TIMESTAMP>> somehow.
Edit #2
To clarify, this reproduces the issue:
CREATE TABLE insights.my_table
(
  id STRING,
  alerts ARRAY<STRUCT<cuid STRING, title STRING, created TIMESTAMP>>
);
CREATE TABLE insights.my_old_table
(
  id STRING,
  field STRING
);
INSERT INTO insights.my_old_table (id, field)
VALUES("1", "{\"id\": \"bar123\",\"value\": \"Test\",\"title\": \"Test\",\"alerts\":[{\"id\": \"abc123\",\"title\": \"Foo\",\"created\": \"2020-01-17T23:18:59.769908Z\"},{\"id\": \"abc124\",\"title\": \"Accepting/Denying Claims\",\"created\": \"2020-01-17T23:18:59.769908Z\"}]}");
Based on the above setup, I don't know how to extract "alerts" from the STRING field and insert it into the STRUCT field. I thought I could add a JSON parse step, but I don't see any BigQuery feature for that. Nor do I see a way to manipulate JSON as a STRUCT. As a result, this is as close as I could get:
INSERT INTO insights.my_table(id, alerts)
SELECT id, JSON_EXTRACT(field, "$.alerts") AS alerts FROM insights.my_old_table
I'm sure there's something I'm missing here.
Below is for BigQuery Standard SQL
#standardSQL
CREATE TEMP FUNCTION JsonToItems(input STRING)
RETURNS ARRAY<STRING>
LANGUAGE js AS """
  // Split a JSON array string into an array of per-element JSON strings.
  return JSON.parse(input).map(x => JSON.stringify(x));
""";
SELECT
  JSON_EXTRACT_SCALAR(extra, "$.id") AS id,
  ARRAY(
    SELECT AS STRUCT
      JSON_EXTRACT_SCALAR(alert, "$.id") AS cuid,
      JSON_EXTRACT_SCALAR(alert, "$.title") AS title,
      TIMESTAMP(JSON_EXTRACT_SCALAR(alert, "$.created")) AS created
    FROM UNNEST(JsonToItems(JSON_EXTRACT(extra, "$.alerts"))) alert
  ) AS alerts
FROM `project.dataset.my_old_table`
You can test, play with above using sample data from your question as in example below
#standardSQL
CREATE TEMP FUNCTION JsonToItems(input STRING)
RETURNS ARRAY<STRING>
LANGUAGE js AS """
  return JSON.parse(input).map(x => JSON.stringify(x));
""";
WITH `project.dataset.my_old_table` AS (
  SELECT '''
  {
    "id": "bar123",
    "value": "Test",
    "title": "Test",
    "alerts": [
      {
        "id": "abc123",
        "title": "Foo",
        "created": "2020-01-17T23:18:59.769908Z"
      },
      {
        "id": "abc124",
        "title": "Accepting/Denying Claims",
        "created": "2020-01-17T23:18:59.769908Z"
      }
    ]
  }
  ''' AS extra
)
SELECT
  JSON_EXTRACT_SCALAR(extra, "$.id") AS id,
  ARRAY(
    SELECT AS STRUCT
      JSON_EXTRACT_SCALAR(alert, "$.id") AS cuid,
      JSON_EXTRACT_SCALAR(alert, "$.title") AS title,
      TIMESTAMP(JSON_EXTRACT_SCALAR(alert, "$.created")) AS created
    FROM UNNEST(JsonToItems(JSON_EXTRACT(extra, "$.alerts"))) alert
  ) AS alerts
FROM `project.dataset.my_old_table`
with the result being the id and the two alerts parsed into the struct array.
Obviously, you can then use this in your INSERT INTO my_dataset.my_table statement
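As a side note, BigQuery also provides the native JSON_EXTRACT_ARRAY function, which splits a JSON array string without a JS helper; a sketch of the same query using it:

SELECT
  JSON_EXTRACT_SCALAR(extra, "$.id") AS id,
  ARRAY(
    SELECT AS STRUCT
      JSON_EXTRACT_SCALAR(alert, "$.id") AS cuid,
      JSON_EXTRACT_SCALAR(alert, "$.title") AS title,
      TIMESTAMP(JSON_EXTRACT_SCALAR(alert, "$.created")) AS created
    -- JSON_EXTRACT_ARRAY returns ARRAY<STRING>, one JSON string per element
    FROM UNNEST(JSON_EXTRACT_ARRAY(extra, "$.alerts")) alert
  ) AS alerts
FROM `project.dataset.my_old_table`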
You can parse the extracted string back to a BigQuery structure like so:
SELECT STRUCT(ARRAY<STRUCT<cuid STRING, title STRING, created TIMESTAMP>>
[('Rick', 'Scientist', '2020-01-17')]) FROM my_dataset.my_old_table;
I just tried it with your data
I have inserted your data in a BigQuery table:
INSERT INTO dataset.table
VALUES('{"id": "bar123", "value": "Test", "title": "Test", "alerts": [{ "id": "abc123", "title": "Foo", "created": "2020-01-17T23:18:59.769908Z"}, {"id": "abc124", "title": "Accepting/Denying Claims", "created": "2020-01-17T23:18:59.769908Z"}]}');
and queried it, converting it back to a BigQuery structure:
SELECT STRUCT<cuid STRING, title STRING, created TIMESTAMP>("abc123", "Foo", "2020-01-17T23:18:59.769908Z"),
       ("abc124", "Accepting/Denying Claims", "2020-01-17T23:18:59.769908Z")
FROM blabla.testingjson;
Output:
Row | f0_.cuid | f0_.title | f0_.created
----------------------------------------
1 | abc123 | Foo | 2020-01-17 23:18:59.769908 UTC

SQL Server - "for json path" statement does not return more than 2984 lines of JSON string

I'm trying to generate a huge amount of data as a complex, nested JSON string using the "for json path" statement, and I'm using multiple functions to create different parts of this JSON string, as follows:
declare @queue nvarchar(max)

select @queue = (
    select x.ID as layoutID
         , l.Title as layoutName
         , JSON_QUERY(queue_objects(@productID, x.ID)) as [objects]
    from Layouts x
    inner join LayoutLanguages l on l.LayoutID = x.ID
    where x.ID = @layoutid
    group by x.ID, l.Title
    for json path
)

select @queue as JSON
Thus far, JSON would be:
{
  "root": [{
    "layouts": [{
      "layoutID": 5,
      "layoutName": "foo",
      "objects": []
    }]
  }]
}
and the "queue_objects" function then would be called to fill out 'objects' array:
queue_objects
select 0 as objectID
     , case when (select inherited_counter(@layoutID, 0)) > 0 then 'false' else 'true' end as editable
     , JSON_QUERY(queue_properties(p.Table2ID)) as propertyObjects
     , JSON_QUERY('[]') as inherited
from productList p
where p.Table1ID = @productID
group by p.Table2ID
for json path
And then JSON would be:
{
  "root": [{
    "layouts": [{
      "layoutID": 5,
      "layoutName": "foo",
      "objects": [{
        "objectID": 1000,
        "editable": "true",
        "propertyObjects": [],
        "inherited": []
      }, {
        "objectID": 2000,
        "editable": "false",
        "propertyObjects": [],
        "inherited": []
      }]
    }]
  }]
}
Also "inherited_counter" and "queue_properties" functions would be called to fill corresponding keys.
This is just a sample, the code won't work as I'm not putting functions here.
But my question is: is it the functions that simultaneously call each other, makes the server return broken JSON string? or it's the server itself that can't handle JSON strings more than 2984 lines?
EDIT: what I mean by 2984 lines, is that I use beautifier on JSON, the server won't return the string line by line, it returns JSON broken, but after beautifying it happens to be 2984 lines of string.
As I wrote in my comment to the OP, this is probably because SSMS limits how many characters it displays in a column in the results grid. It has no impact on the actual result; the result contains all the data, SSMS just doesn't display it all.
To fix this, you can increase the number of characters SSMS retrieves (Tools > Options > Query Results > SQL Server > Results to Grid > Maximum Characters Retrieved, Non XML data).
I would not recommend that, though - "how long is a piece of string?" - but would instead select the result into an nvarchar(max) variable and PRINT that variable. That should give you the whole text.
Hope this helps!
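A minimal sketch of that approach, reusing the names from the question (note that PRINT itself truncates very long nvarchar values, so for huge JSON you may still need to print it in chunks):

declare @queue nvarchar(max);

select @queue = (
    select x.ID as layoutID
         , l.Title as layoutName
    from Layouts x
    inner join LayoutLanguages l on l.LayoutID = x.ID
    for json path
);

-- PRINT writes to the Messages tab, bypassing the grid's display limit
print @queue;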