I'm currently checking out BigQuery, and I want to know if it's possible to add new data to a nested table.
For example, if I have a table like this:
[
{
"name": "name",
"type": "STRING"
},
{
"name": "phone",
"type": "RECORD",
"mode": "REPEATED",
"fields": [
{
"name": "number",
"type": "STRING"
},
{
"name": "type",
"type": "STRING"
}
]
}
]
And then I insert a phone number for the contact John Doe:
INSERT INTO socialdata.phones_examples (name, phone) VALUES ("John Doe", [("555555", "Home")]);
Is there an option to later add another number to the contact?
I know I can update the whole field, but I want to know if there is a way to append new values to the nested table.
When you insert data into BigQuery, the granularity is the level of rows, not elements of the arrays contained within rows. You would want to use a query like this, where you update the relevant row and append to the array:
UPDATE socialdata.phones_examples
SET phone = ARRAY_CONCAT(phone, [("555555", "Home")])
WHERE name = "John Doe"
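To check the result, a query along these lines (a sketch against the example table above) flattens the repeated field so you can see every number per contact:
SELECT name, p.number, p.type
FROM socialdata.phones_examples,
     UNNEST(phone) AS p
WHERE name = "John Doe";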
If you need to update multiple records for several users, you can use the query below:
#standardSQL
UPDATE `socialdata.phones_examples` t
SET phone = ARRAY_CONCAT(phone, [new_phone])
FROM (
SELECT 'John Doe' name, STRUCT<number STRING, type STRING>('123-456-7892', 'work') new_phone UNION ALL
SELECT 'Abc Xyz' , STRUCT('123-456-7893', 'work') new_phone
) u
WHERE t.name = u.name
Or, if those updates are available in a table (for example socialdata.phones_updates):
#standardSQL
UPDATE `socialdata.phones_examples` t
SET phone = ARRAY_CONCAT(phone, [new_phone])
FROM `socialdata.phones_updates` u
WHERE t.name = u.name
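The same update can also be written as a MERGE statement (a sketch, assuming the same socialdata.phones_updates table with a new_phone STRUCT column as above):
#standardSQL
MERGE `socialdata.phones_examples` t
USING `socialdata.phones_updates` u
ON t.name = u.name
WHEN MATCHED THEN
  UPDATE SET phone = ARRAY_CONCAT(t.phone, [u.new_phone])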
I am working on a sensitive migration. The scenario is as follows:
I have a new table that I need to populate with data
There is an existing table containing a column (type = json) that holds an array of objects such as:
[
{
"id": 0,
"name": "custom-field-0",
"label": "When is the deadline for a response?",
"type": "Date",
"options": "",
"value": "2020-10-02",
"index": 1
},
{
"id": 1,
"name": "custom-field-1",
"label": "What territory does this relate to?",
"type": "Dropdown",
"options": "UK, DE, SE, DK, BE, NL, IT, FR, ES, AT, CH, NO, US, SG, Other",
"value": " DE",
"index": 2
}
]
I need to essentially map the values in this column to my new table. I have worked with JSON data in PostgreSQL before, where I was dealing with a single object in the JSON, but never with arrays of objects and on such a large scale.
So just to summarise, how does someone iterate every row, and every object in an array, and insert that data into a new table?
EDIT
I have been experimenting with some functions, and I found two that seem promising: json_array_elements_text and json_array_elements, as these allowed me to add multiple rows to the new table from this array of objects.
However, my issue is that I need to map certain values to the new table.
INSERT INTO form_field_value ("name", "label", "inputType", "options", "form", "workspace")
SELECT <<HERE IS WHERE I NEED TO EXTRACT VALUES FROM THE JSON ARRAY>>, task.form, task.workspace
FROM task;
EDIT 2
I have been playing around some more with the above functions, but reached a slight issue.
INSERT INTO form_field_value ("name", "label", "inputType", "options", "form", "workspace")
SELECT cf ->> 'name',
(cf ->> 'label')
...
FROM jsonb_array_elements(task."customFields") AS t(cf);
My issue lies in the FROM clause: customFields is the array of objects, but I also need to get the form and workspace attributes from this table too. Plus I am pretty sure that the FROM clause would not work anyway, as it will probably complain about task."customFields" not being specified or something.
Here is the select statement that uses json_array_elements and a lateral join in the from clause to flatten the data.
select j ->> 'name' as "name", j ->> 'label' as "label",
j ->> 'type' as "inputType", j ->> 'options' as "options", form, workspace
from task
cross join lateral json_array_elements("customFields") as l(j);
The from clause can be less verbose:
from task, json_array_elements("customFields") as l(j)
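Putting that together with the insert from the question (a sketch, assuming the column names listed there) gives:
INSERT INTO form_field_value ("name", "label", "inputType", "options", "form", "workspace")
SELECT j ->> 'name', j ->> 'label', j ->> 'type', j ->> 'options', form, workspace
FROM task
CROSS JOIN LATERAL json_array_elements("customFields") AS l(j);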
You can try to use json_to_recordset:
select * from json_to_recordset('
[
{
"id": 0,
"name": "custom-field-0",
"label": "When is the deadline for a response?",
"type": "Date",
"options": "",
"value": "2020-10-02",
"index": 1
},
{
"id": 1,
"name": "custom-field-1",
"label": "What territory does this relate to?",
"type": "Dropdown",
"options": "UK, DE, SE, DK, BE, NL, IT, FR, ES, AT, CH, NO, US, SG, Other",
"value": " DE",
"index": 2
}
]
') as x(id int, name text, label text, type text, options text, value text, index int)
For inserting the records you can use SQL like this:
INSERT INTO form_field_value ("name", "label", "inputType", "options", "form", "workspace")
SELECT x.name, x.label, x.type, x.options, form, workspace
FROM
  task,
  json_to_recordset(task."customFields") AS
    x (id int, name text, label text, type text, options text, value text, index int)
I have JSONB data in a Postgres column like this:
{
"Id": "5c6d3210-1def-489b-badd-2bcc4a1cda28",
"Name": "Jane Doe",
"Tags": [
{
"Key": "Project",
"Value": "1004345"
}
]
}
How can I query data where Name contains "Jane" or "Tags.Key" contains "4345"?
I tried this but this only matches the exact "Key" value:
select * from documents where data->'Tags' #> '[{ "Value":"1004345"}]';
You can use a JSON path expression with like_regex (requires PostgreSQL 12 or later):
select *
from documents
where data @@ '$.Tags[*].Value like_regex "4345"'
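The Name condition from the question can be combined the same way (a sketch using the same jsonpath match operator):
SELECT *
FROM documents
WHERE data @@ '$.Name like_regex "Jane"'
   OR data @@ '$.Tags[*].Value like_regex "4345"';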
You can also do it this way, expanding the Tags array with jsonb_array_elements:
select d.*
from documents d
cross join lateral jsonb_array_elements(d.data -> 'Tags') as t(tag)
where t.tag ->> 'Value' like '%4345%';
I am trying to build a query which combines rows of one table into a JSON array, I then want that array to be part of the return.
I know how to do a simple query like
SELECT *
FROM public.template
WHERE id=1
And I have worked out how to produce the JSON array that I want
SELECT array_to_json(array_agg(to_json(fields)))
FROM (
SELECT id, name, format, data
FROM public.field
WHERE template_id = 1
) fields
However, I cannot work out how to combine the two, so that the result is a number of fields from public.template with the output of the second query being one of the returned fields.
I am using PostgreSQL 9.6.6.
Edit: as requested, here is more information: the definitions of the field and template tables and a sample of each query's output.
Currently, I have a JSONB row on the template table which I am using to store an array of fields, but I want to move fields to their own table so that I can more easily enforce a schema on them.
Template table contains:
id
name
data
organisation_id
But I would like to remove data and replace it with the field table which contains:
id
name
format
data
template_id
At the moment the output of the first query is:
{
"id": 1,
"name": "Test Template",
"data": [
{
"id": "1",
"data": null,
"name": "Assigned User",
"format": "String"
},
{
"id": "2",
"data": null,
"name": "Office",
"format": "String"
},
{
"id": "3",
"data": null,
"name": "Department",
"format": "String"
}
],
"id_organisation": 1
}
This output is what I would like to recreate using one query and both tables. The second query outputs this, but I do not know how to merge it into a single query:
[{
"id": 1,
"name": "Assigned User",
"format": "String",
"data": null
},{
"id": 2,
"name": "Office",
"format": "String",
"data": null
},{
"id": 3,
"name": "Department",
"format": "String",
"data": null
}]
The feature you're looking for is jsonb concatenation. You can do that by using the || operator, available since PostgreSQL 9.5:
SELECT to_jsonb(template.*) || jsonb_build_object(
  'data',
  (SELECT jsonb_agg(to_jsonb(field.*)) FROM field WHERE field.template_id = template.id)
)
FROM template
Sorry for poorly phrasing what I was trying to achieve; after hours of Googling I have worked it out, and it was a lot simpler than I thought in my ignorance.
SELECT t.id, t.name, d.data
FROM public.template t, (
    SELECT array_to_json(array_agg(to_json(fields))) AS data
    FROM (
        SELECT id, name, format, data
        FROM public.field
        WHERE template_id = 1
    ) fields
) AS d
WHERE t.id = 1
I wanted the result of the subquery to be a column in the output rather than compiling the entire output table as JSON.
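A correlated subquery generalises this to all templates at once (a sketch against the same assumed tables):
SELECT t.id, t.name,
       (SELECT array_to_json(array_agg(to_json(f)))
        FROM (SELECT id, name, format, data
              FROM public.field
              WHERE template_id = t.id) f) AS data
FROM public.template t;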
I have a Cosmos DB graph where I would like to access the 'name' field in an expression using the string matching function CONTAINS. This is what I tried:
SELECT s.label, s.name FROM s WHERE CONTAINS(LOWER(s.name._value), "cara") AND s.label = "site"
I also tried with a UDF function:
SELECT s.label, s.name FROM s WHERE (s.label = 'site' AND udf.strContains(s.name._value, '/cara/i'))
I don't get any hits or syntax errors from Cosmos DB, even though there should be at least one record in this example. Does anyone have a hint? Thanks in advance. Here is my sample data:
[
{
"label": "site",
"name": [
{
"_value": "0315817 Caracol",
"id": "2e2f000d-2e0a-435a-b472-75d257236558"
}
]
},
{
"label": "site",
"name": [
{
"_value": "0315861 New Times",
"id": "48497172-1734-43d0-9866-51faf9f603ed"
}
]
}
]
I noticed that the name property is an array, not an object. So you need to use JOIN in the SQL:
SELECT s.label, s.name, name._value FROM s
JOIN name IN s.name
WHERE CONTAINS(LOWER(name._value), "cara") AND s.label = "site"
Hope it helps you.
I'm investigating the feasibility of using BigQuery to store sensor data in time series. The intent is to store the data in BQ and process it in Pandas... so far so good... Pandas can interpret a TIMESTAMP field index and create a Series.
An additional requirement is that the data support arbitrary tags as key/value pairs (e.g. job_id=1234, task_id=5678). BigQuery can support this nicely with REPEATED fields of type RECORD:
{"fields":
[
    {
        "mode": "NULLABLE",
        "name": "timestamp",
        "type": "TIMESTAMP"
    },
    {
        "mode": "REPEATED",
        "name": "tag",
        "type": "RECORD",
        "fields":
        [
            {
                "name": "name",
                "type": "STRING"
            },
            {
                "name": "value",
                "type": "STRING"
            }
        ]
    },
    {
        "mode": "NULLABLE",
        "name": "measurement_1",
        "type": "FLOAT"
    },
    {
        "mode": "NULLABLE",
        "name": "measurement_2",
        "type": "FLOAT"
    },
    {
        "mode": "NULLABLE",
        "name": "measurement_3",
        "type": "FLOAT"
    }
]
}
This works great for storing the data, and it even works great for querying if I only need to filter on a single key/value combination:
SELECT measurement_1 FROM measurements
WHERE tag.name = 'job_id' AND tag.value = '1234'
However, I also need to be able to combine sets of tags in query expressions, and I can't seem to make this work. For example, this query returns no results:
SELECT measurement_1 FROM measurements
WHERE tag.name = 'job_id' AND tag.value = '1234'
AND tag.name = 'task_id' AND tag.value = '5678'
Questions: Is it possible to formulate a query to do what I want using this schema? What is the recommended way to attach this type of variable data to an otherwise fixed schema in Big Query?
Thanks for any help or suggestions!
Note: If you're thinking this looks like a great fix for InfluxDB it's because that's what I've been using thus far. The seemingly insurmountable issue is the amount of series cardinality in my data set, so I'm looking for alternatives.
BigQuery Legacy SQL
SELECT measurement_1 FROM measurements
OMIT RECORD IF
SUM((tag.name = 'job_id' AND tag.value = '1234')
OR (tag.name = 'task_id' AND tag.value = '5678')) < 2
BigQuery Standard SQL
SELECT measurement_1 FROM measurements
WHERE (
SELECT COUNT(1) FROM UNNEST(tag)
WHERE ((name = 'job_id' AND value = '1234')
OR (name = 'task_id' AND value = '5678'))
) >= 2
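An equivalent formulation with one EXISTS subquery per tag (a sketch against the same schema) may read more clearly:
SELECT measurement_1 FROM measurements m
WHERE EXISTS (SELECT 1 FROM UNNEST(m.tag) WHERE name = 'job_id' AND value = '1234')
  AND EXISTS (SELECT 1 FROM UNNEST(m.tag) WHERE name = 'task_id' AND value = '5678')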
Repeated fields are a great way of storing data series, collections, etc.
In order to filter out just the value of interest from repeated fields, I would use the following template:
SELECT
MAX( IF( filter criteria, value_to_pull, null)) WITHIN RECORD AS some_name
FROM <table>
In your case it would be the following:
SELECT
MAX(IF(tag.name = 'job_id' AND tag.value = '1234', measurement_1, NULL)) WITHIN RECORD AS job_1234_measurement_1,
MAX(IF(tag.name = 'task_id' AND tag.value = '5678', measurement_1, NULL)) WITHIN RECORD AS task_5678_measurement_1
FROM measurements
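In standard SQL, the same pivot can be sketched with EXISTS over the unnested tags (assuming the schema above):
#standardSQL
SELECT
  IF(EXISTS(SELECT 1 FROM UNNEST(tag) WHERE name = 'job_id' AND value = '1234'),
     measurement_1, NULL) AS job_1234_measurement_1,
  IF(EXISTS(SELECT 1 FROM UNNEST(tag) WHERE name = 'task_id' AND value = '5678'),
     measurement_1, NULL) AS task_5678_measurement_1
FROM measurements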