How do I use BigQuery DML to transform some fields of a struct nested within an array, within a struct, within an array? - sql

I think this is a more complex version of the question in Update values in struct arrays in BigQuery.
I'm trying to update some of the fields in a struct, where the struct is heavily nested. I'm having trouble creating the SQL to do it. Here's my table schema:
CREATE TABLE `my_dataset.test_data_for_so`
(
date DATE,
hits ARRAY<STRUCT<search STRUCT<query STRING, other_column STRING>, metadata ARRAY<STRUCT<key STRING, value STRING>>>>
);
This is what the schema looks like in the BigQuery GUI after I create the table:
Here's the data I've inserted:
INSERT INTO `my_dataset.test_data_for_so` (date, hits)
VALUES (
CAST('2021-01-01' AS date),
[
STRUCT(
STRUCT<query STRING, other_column STRING>('foo bar', 'foo bar'),
[
STRUCT<key STRING, value STRING>('foo bar', 'foo bar')
]
)
]
)
My goal is to transform the "search.query" and "metadata.value" fields. For example, uppercasing them, leaving every other column (and every other struct field) in the row unchanged.
I'm looking for a solution involving either manually specifying each column in the SQL, or preferably, one where I can only mention the columns/fields I want to transform in the SQL, omitting all other columns/fields. This is a minimal example. The table I'm working on in production has hundreds of columns and fields.
For example, that row, when transformed this way, would change from:
[
{
"date": "2021-01-01",
"hits": [
{
"search": {
"query": "foo bar",
"other_column": "foo bar"
},
"metadata": [
{
"key": "foo bar",
"value": "foo bar"
}
]
}
]
}
]
to:
[
{
"date": "2021-01-01",
"hits": [
{
"search": {
"query": "FOO BAR",
"other_column": "foo bar"
},
"metadata": [
{
"key": "foo bar",
"value": "FOO BAR"
}
]
}
]
}
]

preferably, one where I can only mention the columns/fields I want to transform in the SQL ...
Use below approach - it does exactly what you wish - ONLY those fields that are to be updated are in use, all other (tens or hundreds ...) are preserved as is
update your_table
set hits = array(
select as struct *
replace(
(select as struct * replace (upper(query) as query) from unnest([search])) as search,
array(select as struct * replace(upper(value) as value) from unnest(metadata)) as metadata
)
from unnest(hits)
)
where true;
if applied to sample data in your question - result is

Related

Insert into table using data from JSON value in PostgresQL

I am working on a sensitive migration. The scenario is as follows:
I have a new table that I need to populate with data
There is an existing table, which contains a column (type = json), which contains an array of objects such as:
[
{
"id": 0,
"name": "custom-field-0",
"label": "When is the deadline for a response?",
"type": "Date",
"options": "",
"value": "2020-10-02",
"index": 1
},
{
"id": 1,
"name": "custom-field-1",
"label": "What territory does this relate to?",
"type": "Dropdown",
"options": "UK, DE, SE, DK, BE, NL, IT, FR, ES, AT, CH, NO, US, SG, Other",
"value": " DE",
"index": 2
}
]
I need to essentially map these values in this column to my new table. I have worked with JSON data in PostgresQL before, where I was dealing with a single object in the JSON, but never with arrays of objects and on such a large scale.
So just to summarise, how does someone iterate every row, and every object in an array, and insert that data into a new table?
EDIT
I have been experimenting with some functions, and I found one that seems promising json_array_elements_text or json_array_elements. As this allowed me to add multiple rows to the new table using this array of objects.
However, my issue is that I need to map certain values to the new table.
INSERT INTO form_field_value ("name", "label", "inputType", "options", "form" "workspace")
SELECT <<HERE IS WHERE I NEED TO EXTRACT VALUES FROM THE JSON ARRAY>>, task.form, task.workspace
FROM task;
EDIT 2
I have been playing around some more with the above functions, but reached a slight issue.
INSERT INTO form_field_value ("name", "label", "inputType", "options", "form" "workspace")
SELECT cf ->> 'name',
(cf ->> 'label')
...
FROM jsonb_array_elements(task."customFields") AS t(cf);
My issue lies in the FROM clause, so customFields is the array of objects, but I also need to get the form and workspace attribute from this table too. Plus I a pretty sure that the FROM clause would not work anyway, as it probably will complain about the task."customFields" not being specified or something.
Here is the select statement that uses json_array_elements and a lateral join in the from clause to flatten the data.
select j ->> 'name' as "name", j ->> 'label' as "label",
j ->> 'type' as "inputType", j ->> 'options' as "options", form, workspace
from task
cross join lateral json_array_elements("customFields") as l(j);
The from clause can be less verbose
from task, json_array_elements("customFields") as l(j)
you can try to use json_to_recordset:
select * from json_to_recordset('
[
{
"id": 0,
"name": "custom-field-0",
"label": "When is the deadline for a response?",
"type": "Date",
"options": "",
"value": "2020-10-02",
"index": 1
},
{
"id": 1,
"name": "custom-field-1",
"label": "What territory does this relate to?",
"type": "Dropdown",
"options": "UK, DE, SE, DK, BE, NL, IT, FR, ES, AT, CH, NO, US, SG, Other",
"value": " DE",
"index": 2
}
]
') as x(id int, name text,label text,type text,options text,value text,index int)
for insert record you can use an sql like this:
INSERT INTO form_field_value ("name", "label", "inputType", "options", "form" "workspace")
SELECT name, label, type, options, form, workspace
FROM
task,
json_to_record(task) AS
x (id int, name text,label text,type text,options text,value text,index int)

how to use trino/presto to query redis

I have a simple string and hash stored in redis
get test
"1"
hget htest first
"first hash"
I'm able to see the "table" test, but there are no columns
trino> show columns from redis.default.test;
Column | Type | Extra | Comment
--------+------+-------+---------
(0 rows)
and obviously I can't get result from select
trino> select * from redis.default.test;
Query 20210918_174414_00006_dmp3x failed: line 1:8: SELECT * not allowed from relation
that has no columns
I see in the documentation that I might need to create a table definition file, but I wasn't able to create one that will work.
I had few variations of this, but this is the one for example:
{
"tableName": "test",
"schemaName": "default",
"value": {
"dataFormat": "json",
"fields": [
{
"name": "number",
"mapping": 0,
"type": "INT"
}
]
}
}
any idea what am I doing wrong?
I focused on the string since it's simpler, but I also need to query the hash

Can I convert a stringified JSON array back to a BigQuery strucutre?

I'm trying to take a STRING field that contains a nested JSON structure from a table called my_old_table, extract a nested array called "alerts" from it, then insert it into a column in a new table called my_new_table. The new column is defined as:
ARRAY<STRUCT<cuid STRING, title STRING, created TIMESTAMP>>
I'm using this SQL:
INSERT INTO my_dataset.my_table(
id, alerts)
SELECT id, JSON_EXTRACT(extra, "$.alerts") AS content_alerts
FROM my_dataset.my_old_table
This gives me:
Query column 2 has type STRING which cannot be inserted into column content_alerts, which has type ARRAY<STRUCT<cuid STRING, title STRING, created TIMESTAMP>> at [4:1]
I don't see a way of parsing the extracted string this back to a structure.... Is there another way to do this?
Edit:
The original value is a json string that looks like this:
{
"id": "bar123",
"value": "Test",
"title": "Test",
"alerts": [
{
"id": "abc123",
"title": "Foo",
"created": "2020-01-17T23:18:59.769908Z"
},
{
"id": "abc124",
"title": "Accepting/Denying Claims",
"created": "2020-01-17T23:18:59.769908Z"
}
]
}
I want to extract $.alerts and insert it into the ARRAY<STRUCT<cuid STRING, title STRING, created TIMESTAMP>> somehow.
Edit #2
To clarify, this reproduces the issue:
CREATE TABLE insights.my_table
(
id string,
alerts ARRAY<STRUCT<cuid STRING, title STRING, created TIMESTAMP>>
);
CREATE TABLE insights.my_old_table
(
id string,
field STRING
);
INSERT INTO insights.my_old_table(id, field)
VALUES("1", "{\"id\": \"bar123\",\"value\": \"Test\",\"title\": \"Test\",\"alerts\":[{\"id\": \"abc123\",\"title\": \"Foo\",\"created\": \"2020-01-17T23:18:59.769908Z\"},{\"id\": \"abc124\",\"title\": \"Accepting/Denying Claims\",\"created\": \"2020-01-17T23:18:59.769908Z\"}]}");
Based on the above setup, I don't know how to extract "alerts" from the STRING field and insert it into the STRUCT field. I thought I could add a JSON PARSE step in there but I don't see any BigQuery feature for that. Or else there would be a way to manipulate JSON as a STRUCT but I don't see that either. As a result, this is as close as I could get:
INSERT INTO insights.my_table(id, alerts)
SELECT id, JSON_EXTRACT(field, "$.alerts") AS alerts FROM insights.my_old_table
I'm sure there's something I'm missing here.
Below for BigQuery Standard SQL
#standardSQL
CREATE TEMP FUNCTION JsonToItems(input STRING)
RETURNS ARRAY<STRING>
LANGUAGE js AS """
return JSON.parse(input).map(x=>JSON.stringify(x));
""";
)
SELECT
JSON_EXTRACT_SCALAR(extra, "$.id") AS id,
ARRAY(
SELECT AS STRUCT
JSON_EXTRACT_SCALAR(alert, "$.id") AS cuid,
JSON_EXTRACT_SCALAR(alert, "$.title") AS title,
TIMESTAMP(JSON_EXTRACT_SCALAR(alert, "$.created")) AS created
FROM UNNEST(JsonToItems(JSON_EXTRACT(extra, "$.alerts"))) alert
) AS alerts,
FROM `project.dataset.my_old_table`
You can test, play with above using sample data from your question as in example below
#standardSQL
CREATE TEMP FUNCTION JsonToItems(input STRING)
RETURNS ARRAY<STRING>
LANGUAGE js AS """
return JSON.parse(input).map(x=>JSON.stringify(x));
""";
WITH `project.dataset.my_old_table` AS (
SELECT '''
{
"id": "bar123",
"value": "Test",
"title": "Test",
"alerts": [
{
"id": "abc123",
"title": "Foo",
"created": "2020-01-17T23:18:59.769908Z"
},
{
"id": "abc124",
"title": "Accepting/Denying Claims",
"created": "2020-01-17T23:18:59.769908Z"
}
]
}
''' extra
)
SELECT
JSON_EXTRACT_SCALAR(extra, "$.id") AS id,
ARRAY(
SELECT AS STRUCT
JSON_EXTRACT_SCALAR(alert, "$.id") AS cuid,
JSON_EXTRACT_SCALAR(alert, "$.title") AS title,
TIMESTAMP(JSON_EXTRACT_SCALAR(alert, "$.created")) AS created
FROM UNNEST(JsonToItems(JSON_EXTRACT(extra, "$.alerts"))) alert
) AS alerts,
FROM `project.dataset.my_old_table`
with result
Obviously, you can then use this in your INSERT INTO my_dataset.my_table statement
You can parse the extracted string back to a BigQuery structure like so:
SELECT STRUCT(ARRAY<STRUCT<cuid STRING, title STRING, created TIMESTAMP>>
[('Rick', 'Scientist', '2020-01-17')]) FROM my_dataset.my_old_table;
I just tried it with your data
I have inserted your data in a BigQuery table:
INSERT INTO dataset.table
VALUES('{"id": "bar123", "value": "Test", "title": "Test", "alerts":
[{ "id": "abc123", "title": "Foo", "created": "2020-01-17T23:18:59.769908Z"},
{"id": "abc124", "title": "Accepting/Denying Claims", "created": "2020-01-17T23:18:59.769908Z"}]}');
and queried it, converting it back to a BigQuery structure:
SELECT STRUCT<cuid STRING, title STRING, created TIMESTAMP>("abc123",
"Foo", "2020-01-17T23:18:59.769908Z"),("abc124", "Accepting/Denying
Claims", "2020-01-17T23:18:59.769908Z") FROM blabla.testingjson;
Output:
Row | f0_.cuid | f0_.title | f0_.created
----------------------------------------
1 | abc123 | Foo | 2020-01-17 23:18:59.769908 UTC

How to "zip" multiple nested JSON arrays without using id key?

I'm trying to merge some nested JSON arrays without looking at the id. Currently I'm getting this when I make a GET request to /surveyresponses:
{
"surveys": [
{
"id": 1,
"name": "survey 1",
"isGuest": true,
"house_id": 1
},
{
"id": 2,
"name": "survey 2",
"isGuest": false,
"house_id": 1
},
{
"id": 3,
"name": "survey 3",
"isGuest": true,
"house_id": 2
}
],
"responses": [
{
"question": "what is this anyways?",
"answer": "test 1"
},
{
"question": "why?",
"answer": "test 2"
},
{
"question": "testy?",
"answer": "test 3"
}
]
}
But I would like to get it where each survey has its own question and answers so something like this:
{
"surveys": [
{
"id": 1,
"name": "survey 1",
"isGuest": true,
"house_id": 1
"question": "what is this anyways?",
"answer": "test 1"
}
]
}
Because I'm not going to a specific id I'm not sure how to make the relationship work. This is the current query I have that's producing those results.
export function getSurveyResponse(id: number): QueryBuilder {
return db('surveys')
.join('questions', 'questions.survey_id', '=', 'surveys.id')
.join('questionAnswers', 'questionAnswers.question_id', '=', 'questions.id')
.select('surveys.name', 'questions.question', 'questions.question', 'questionAnswers.answer')
.where({ survey_id: id, question_id: id })
}
Assuming jsonb in current Postgres 10 or 11, this query does the job:
SELECT t.data, to_jsonb(s) AS new_data
FROM t
LEFT JOIN LATERAL (
SELECT jsonb_agg(s || r) AS surveys
FROM (
SELECT jsonb_array_elements(t.data->'surveys') s
, jsonb_array_elements(t.data->'responses') r
) sub
) s ON true;
db<>fiddle here
I unnest both nested JSON arrays in parallel to get the desired behavior of "zipping" both directly. The number of elements in both nested JSON arrays has to match or you need to do more (else you lose data).
This builds on implementation details of how Postgres deals with multiple set-returning functions in a SELECT list to make it short and fast. See:
What is the expected behaviour for multiple set-returning functions in select clause?
One could be more explicit with a ROWS FROM expression, which works properly since Postgres 9.4:
SELECT t.data
, to_jsonb(s) AS new_data
FROM tbl t
LEFT JOIN LATERAL (
SELECT jsonb_agg(s || r) AS surveys
FROM ROWS FROM (jsonb_array_elements(t.data->'surveys')
, jsonb_array_elements(t.data->'responses')) sub(s,r)
) s ON true;
The manual about combining multiple table functions.
Or you could use WITH ORDINALITY to get original order of elements and combine as you wish:
PostgreSQL unnest() with element number

jsonb LIKE query on nested objects in an array

My JSON data looks like this:
[{
"id": 1,
"payload": {
"location": "NY",
"details": [{
"name": "cafe",
"cuisine": "mexican"
},
{
"name": "foody",
"cuisine": "italian"
}
]
}
}, {
"id": 2,
"payload": {
"location": "NY",
"details": [{
"name": "mbar",
"cuisine": "mexican"
},
{
"name": "fdy",
"cuisine": "italian"
}
]
}
}]
given a text "foo" I want to return all the tuples that have this substring. But I cannot figure out how to write the query for the same.
I followed this related answer but cannot figure out how to do LIKE.
This is what I have working right now:
SELECT r.res->>'name' AS feature_name, d.details::text
FROM restaurants r
, LATERAL (SELECT ARRAY (
SELECT * FROM json_populate_recordset(null::foo, r.res#>'{payload,
details}')
)
) AS d(details)
WHERE d.details #> '{cafe}';
Instead of passing the whole text of cafe I want to pass ca and get the results that match that text.
Your solution can be simplified some more:
SELECT r.res->>'name' AS feature_name, d.name AS detail_name
FROM restaurants r
, jsonb_populate_recordset(null::foo, r.res #> '{payload, details}') d
WHERE d.name LIKE '%oh%';
Or simpler, yet, with jsonb_array_elements() since you don't actually need the row type (foo) at all in this example:
SELECT r.res->>'name' AS feature_name, d->>'name' AS detail_name
FROM restaurants r
, jsonb_array_elements(r.res #> '{payload, details}') d
WHERE d->>'name' LIKE '%oh%';
db<>fiddle here
But that's not what you asked exactly:
I want to return all the tuples that have this substring.
You are returning all JSON array elements (0-n per base table row), where one particular key ('{payload,details,*,name}') matches (case-sensitively).
And your original question had a nested JSON array on top of this. You removed the outer array for this solution - I did the same.
Depending on your actual requirements the new text search capability of Postgres 10 might be useful.
I ended up doing this(inspired by this answer - jsonb query with nested objects in an array)
SELECT r.res->>'name' AS feature_name, d.details::text
FROM restaurants r
, LATERAL (
SELECT * FROM json_populate_recordset(null::foo, r.res#>'{payload, details}')
) AS d(details)
WHERE d.details LIKE '%oh%';
Fiddle here - http://sqlfiddle.com/#!15/f2027/5