How to update a field in a nested array in Bigquery? - google-bigquery

I am trying to update a table that has STRUCT(a few fields, ARRAY(STRUCT)).
The field that I need to update is inside the array and I am having trouble with making it work.
Here is the layout of the the two tables:
CREATE TABLE mydatset.orders (
order_id string,
order_time timestamp,
trans STRUCT <
id string,
amount INT64,
accounts ARRAY<STRUCT <
role STRING ,
account_id STRING,
region STRING,
amount INT64> > >
)
CREATE TABLE mydatset.relocations (
account_id string,
region string
)
Trying to update the region of any account in the array accounts if that account exists in the relocations table:
update mydataset.orders a
set trans = (SELECT AS STRUCT trans.* REPLACE(ARRAY(SELECT STRUCT<role STRING, account_id STRING, region STRING, amount INT64>
(cp.role, cp.account_id,
case when cp.account_id = ll.account_id then ll.region else cp.region end ,
cp.amount
)
) as accounts )
from unnest(trans.accounts) cp
left join unnest(relocs.chgs) ll
on cp.account_id = ll.account_id
)
from (select array_agg(struct (account_id, region) ) chgs
from`mydataset.relocations`
) relocs
where true
The syntax works, but the sql doesn't perform the expected update. The account's region in the orders table is not changed after running the above update!
(I have seen BigQuery UPDATE nested array field and this case is slightly different. The array is inside a struct and itself is an array of struct)
Appreciate any help.

Below is for BigQuery Standard SQL
#standardSQL
UPDATE `project.dataset.orders`
SET trans = (SELECT AS STRUCT trans.* REPLACE(
ARRAY(SELECT AS STRUCT x.* REPLACE(IFNULL(y.region, x.region) AS region)
FROM UNNEST(trans.accounts) x
LEFT JOIN UNNEST(relocations) y
USING(account_id)
) AS accounts))
FROM (SELECT ARRAY_AGG(t) relocations FROM `project.dataset.relocations` t)
WHERE TRUE
It is tested with below dummy data
initial dummy data that looks like below
[
{
"order_id": "order_id1",
"order_time": "2019-06-28 01:05:16.346854 UTC",
"trans": {
"id": "id1",
"amount": "1",
"accounts": [
{
"role": "role1",
"account_id": "account_id1",
"region": "region1",
"amount": "11"
},
{
"role": "role2",
"account_id": "account_id2",
"region": "region2",
"amount": "12"
}
]
}
},
{
"order_id": "order_id2",
"order_time": "2019-06-28 01:05:16.346854 UTC",
"trans": {
"id": "id2",
"amount": "1",
"accounts": [
{
"role": "role3",
"account_id": "account_id1",
"region": "region4",
"amount": "13"
},
{
"role": "role4",
"account_id": "account_id3",
"region": "region3",
"amount": "14"
}
]
}
}
]
after applying below adjustments
[
{
"account_id": "account_id1",
"region": "regionA"
},
{
"account_id": "account_id2",
"region": "regionB"
}
]
result is
[
{
"id": "id1",
"amount": "1",
"accounts": [
{
"role": "role1",
"account_id": "account_id1",
"region": "regionA",
"amount": "11"
},
{
"role": "role2",
"account_id": "account_id2",
"region": "regionB",
"amount": "12"
}
]
},
{
"id": "id2",
"amount": "1",
"accounts": [
{
"role": "role3",
"account_id": "account_id1",
"region": "regionA",
"amount": "13"
},
{
"role": "role4",
"account_id": "account_id3",
"region": "region3",
"amount": "14"
}
]
}
]

Related

select node value from json column type

A table I called raw_data with three columns: ID, timestamp, payload, the column paylod is a json type having values such as:
{
"data": {
"author_id": "1461871206425108480",
"created_at": "2022-08-17T23:19:14.000Z",
"geo": {
"coordinates": {
"type": "Point",
"coordinates": [
-0.1094,
51.5141
]
},
"place_id": "3eb2c704fe8a50cb"
},
"id": "1560043605762392066",
"text": " ALWAYS # London, United Kingdom"
},
"matching_rules": [
{
"id": "1560042248007458817",
"tag": "london-paris"
}
]
}
From this I want to select rows where the coordinates is available, such as [-0.1094,51.5141]in this case.
SELECT *
FROM raw_data, json_each(payload)
WHERE json_extract(json_each.value, '$.data.geo.') IS NOT NULL
LIMIT 20;
Nothing was returned.
EDIT
NOT ALL json objects have the coordinates node. For example this value:
{
"data": {
"author_id": "1556031969062010881",
"created_at": "2022-08-18T01:42:21.000Z",
"geo": {
"place_id": "006c6743642cb09c"
},
"id": "1560079621017796609",
"text": "Dear Desperate sister say husband no dey oo."
},
"matching_rules": [
{
"id": "1560077018183630848",
"tag": "kaduna-kano-katsina-dutse-zaria"
}
]
}
The correct path is '$.data.geo.coordinates.coordinates' and there is no need for json_each():
SELECT *
FROM raw_data
WHERE json_extract(payload, '$.data.geo.coordinates.coordinates') IS NOT NULL;
See the demo.

How to extract a value in a JSON table on BigQuery?

I have a JSON table which has over 30.000 rows. There are different rows like this:
JSON_columns
------------
{
"level": 20,
"nickname": "ABCDE",
"mission_name": "take_out_the_trash",
"mission_day": "150",
"duration": "0",
"properties": []
}
{
"nickname": "KLMNP",
"mission_name": "recycle",
"mission_day": "180",
"properties": [{
"key": "bottle",
"value": {
"string_value": "blue_bottle"
}
}, {
"key": "bottleRecycle",
"value": {
"string_value": "true"
}
}, {
"key": "price",
"value": {
"float_value": 21.99
}
}, {
"key": "cost",
"value": {
"float_value": 15.39
}
}]
}
I want to take the sum of costs the table. But firtsly, I want to extract the cost from the table.
I tried the code below. It returns null:
SELECT JSON_VALUE('$.properties[3].value.float_value') AS profit
FROM `missions.missions_study`
WHERE mission_name = "recycle"
My question is, how can I extract the cost values right, and sum them?
Common way to extract cost from your json is like below.
WITH sample_table AS (
SELECT '{"level":20,"nickname":"ABCDE","mission_name":"take_out_the_trash","mission_day":"150","duration":"0","properties":[]}' json
UNION ALL
SELECT '{"nickname":"KLMNP","mission_name":"recycle","mission_day":"180","properties":[{"key":"bottle","value":{"string_value":"blue_bottle"}},{"key":"bottleRecycle","value":{"string_value":"true"}},{"key":"price","value":{"float_value":21.99}},{"key":"cost","value":{"float_value":15.39}}]}' json
)
SELECT SUM(cost) AS total FROM (
SELECT CAST(JSON_VALUE(prop, '$.value.float_value') AS FLOAT64) AS cost
FROM sample_table, UNNEST(JSON_QUERY_ARRAY(json, '$.properties')) prop
WHERE JSON_VALUE(json, '$.mission_name') = 'recycle'
AND JSON_VALUE(prop, '$.key') = 'cost'
);

Select data from Json array MS SQL Server

I have to select data from Json like this:
[
{
"id": 10100,
"externalId": "100000035",
"name": "Test1",
"companyId": 10099,
"phone": "0738003811",
"email": "test#Test.com",
"mainAddress": {
"county": "UK",
"province": "test",
"zipCode": "01234",
"city": "test",
"street": "test",
"gln": "44,37489331;26,21941193",
"country": {
"iso2": "UK",
"iso3": "UK"
}
},
"active": false,
"main": true,
"stores": [
"Test"
],
"attributes": [
{
"attributeId": 1059,
"attributeName": "CH6 name",
"attributeExternalId": null,
"attributeValueId": 74292,
"attributeValueType": "MONO_LINGUAL",
"attributeValueEid": null,
"attributePlainValue": "Unknown"
},
{
"attributeId": 1061,
"attributeName": "BD",
"attributeExternalId": null,
"attributeValueId": 81720,
"attributeValueType": "MONO_LINGUAL",
"attributeValueEid": null,
"attributePlainValue": "Not assigned"
}
],
"daysSinceLastOrder": null
},
{
"id": 62606,
"externalId": "VL_LC_000190",
"name": "Test",
"companyId": 17793,
"phone": "44333424",
"email": "test#email.com",
"mainAddress": {
"firmName": "test",
"county": "test",
"province": "test",
"zipCode": "247555",
"city": "test",
"street": "test",
"gln": "44.8773851;23.9223518",
"country": {
"iso2": "RO",
"iso3": "ROU"
},
"phone": "07547063789"
},
"active": true,
"main": false,
"stores": [
"Valcea"
],
"attributes": [
{
"attributeId": 1042,
"attributeName": "Type of location",
"attributeExternalId": "TYPE_OF_DIVISION",
"attributeValueId": 34506,
"attributeValueType": "MONO_LINGUAL",
"attributeValueEid": "Small OTC (<40mp)",
"attributePlainValue": "Small OTC (<40mp)"
},
{
"attributeId": 17,
"attributeName": "Limit for payment",
"attributeExternalId": "LIMIT_FOR_PAYMENT_IN_DAYS",
"attributeValueId": 59120,
"attributeValueType": "NUMBER",
"attributeValueEid": null,
"attributePlainValue": "28"
}
],
"daysSinceLastOrder": 147
}
]
I know how to select data from simple json object using "FROM OPENJSON",
but now I have to select a
AttributeValueId, AttributeId and AttributeName, attributePlainValue and CompanyId for each Attribute. So I dont know how to select data from attributes array and then how to join to this CompanyId which is one level up.
Maybe someone knows how write this query.
As mentioned by #lptr in the comments:
You need to pass the result of one OPENJSON to another, using CROSS APPLY. You can select a whole JSON object or array as a property, by using the syntax AS JSON
select
t1.companyid,
t2.*
from openjson(#j)
with (
companyId int,
attributes nvarchar(max) as json
) as t1
cross apply openjson(t1.attributes)
with
(
attributeId int,
attributeName nvarchar(100),
attributeValueId nvarchar(100),
attributePlainValue nvarchar(100)
) as t2;
db<>fiddle
For example, you can use code like this.
f1.metaData->"$.identity.customerID" = '.$customerID.'

Problem with using of FOR JSON AUTO in SQL Server

I am using FOR JSON AUTO in SQL server database, to convert my query's result to the JSON format.
in my query, I joined order table to two other tables.
SELECT
orders.[Code], orders.[Total], orders.[Discount],
customer.[Name], customer.[PhoneNumber],
store.[Name], store.[Address]
FROM
Orders orders
INNER JOIN
Customers customer ON (orders.[CustomerID] = customer.[ID])
INNER JOIN
Stores store ON (orders.[StoreID] = store.[ID])
FOR JSON AUTO
Result:
[
{
"Code": "1528",
"Total": 5000,
"Discount": 20,
"customer": [
{
"Name": "Alex",
"PhoneNumber": "(548) 123-5555",
"store": [
{
"Name": "Apple",
"Address": "E. Santa rd"
}
]
}
]
},
{
"Code": "1687",
"Total": 3000,
"Discount": 10,
"customer": [
{
"Name": "John",
"PhoneNumber": "(226) 354-7896",
"store": [
{
"Name": "Sony",
"Address": "W. Atlantic ave"
}
]
}
]
}
]
But it's not correct, because in this scenario customer and store are sibling and they have same parent, and both of them joined with the order table directly, correct JSON must be such as this:
[
{
"Code": "1528",
"Total": 5000,
"Discount": 20,
"customer": [
{
"Name": "Alex",
"PhoneNumber": "(548) 123-5555"
}
],
"store": [
{
"Name": "Apple",
"Address": "E. Santa rd"
}
]
},
{
"Code": "1687",
"Total": 3000,
"Discount": 10,
"customer": [
{
"Name": "John",
"PhoneNumber": "(226) 354-7896"
}
],
"store": [
{
"Name": "Sony",
"Address": "W. Atlantic ave"
}
]
}
]
how can I do that? Are there any option for this in SQL? (I don't want to use inner select.)
If there are one-to-one relationships between Orders and Customer and between Orders and Store then you can make the desired output by using PATH option and dot-separated column names:
SELECT
orders.[Code], orders.[Total], orders.[Discount],
customer.[Name] AS [Customer.Name], customer.[PhoneNumber] AS [Customer.PhoneNumber],
store.[Name] AS [Store.Name], store.[Address] AS [Store.Address]
FROM
Orders orders
INNER JOIN
Customers customer ON (orders.[CustomerID] = customer.[ID])
INNER JOIN
Stores store ON (orders.[StoreID] = store.[ID])
FOR JSON PATH
But if there are one-to-many relationships then you have to use nested queries:
SELECT
orders.[Code], orders.[Total], orders.[Discount],
(SELECT [Name], [PhoneNumber] FROM Customers WHERE Customers.ID=Orders.CustomerID FOR JSON AUTO) AS Customers,
(SELECT [Name], [Address] FROM Stores WHERE Stores.ID=Orders.StoreID FOR JSON AUTO) AS Stores
FROM
Orders orders
FOR JSON AUTO

Extract array from varchar in PrestoSQL

I have a VARCHAR field like this:
[
{
"config": 0,
"type": "0
},
{
"config": x,
"type": "1"
},
{
"config": "",
"type": ""
},
{
"config": [
{
"address": {},
"category": "",
"merchant": {
"data": [
10,12,23
],
"file": 0
},
"range_id": 1,
"shop_id_info": null
}
],
"type": "new"
}
]
And I need to extract merchant data from this. Desirable output is:
10
12
23
Please advise. I keep getting Cannot cast VARCHAR to array/unnest type VARCHAR
You can try using json path $.*.config.*.merchant.data.* but if it does not work for you (as for me in Athena version, where arrays in json path are not supported well) you can cast your json to ARRAY(JSON) and do some manipultaions from there (needed to fix your JSON a little bit):
Test data:
WITH dataset AS (
SELECT * FROM (VALUES
(JSON '[
{
"config": {},
"type": "0"
},
{
"config": "x",
"type": "1"
},
{
"config": "",
"type": ""
},
{
"config": [
{
"address": {},
"category": "",
"merchant": {
"data": [
10,12,23
],
"file": 0
},
"range_id": 1,
"shop_id_info": null
}
],
"type": "new"
}
]')
) AS t (json_value))
And query:
SELECT flatten(
transform(
flatten(
transform(
CAST(json_value AS ARRAY(JSON))
, json_object -> try(CAST(json_extract(json_object, '$.config') AS ARRAY(JSON))))),
json_config -> CAST(json_extract(json_config, '$.merchant.data') as ARRAY(INTEGER))))
FROM dataset
Which will give you array of numbers:
_col0
[10, 12, 23]
And from there you can continue with unnest and so on if needed.