Transform multiple columns to records in BigQuery - sql

I'm trying to convert a flat table into a nested table in BigQuery.
I want to take a row and transform some of its columns into two fields:
key.name
key.value
For example, I'm taking this table:
and I want to convert it to the following structure:

You can define this as an array. I would suggest an array of structs with string fields, so that you have only one array:
select unique_key, cast_number, date,
[struct('block' as key, block as value),
struct('iucr' as key, iucr as value),
struct('primary_type' as key, primary_type as value),
. . .
] as key_values
But for what you specifically ask for:
select unique_key, cast_number, date,
['block', 'iucr', 'primary_type', . . . ] as keys,
[block, iucr, primary_type, . . . ] as values
Note that both queries assume that the values are all strings, because every element of a BigQuery array must have the same type. Cast any column that is not a string, for example with cast(col as string).
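To make the target shape concrete, here is a minimal plain-Python sketch of what the first query produces for one row. The helper name to_key_values and the sample values are made up for illustration; only the column names come from the query above.

```python
# Hypothetical sample row: column names follow the query above,
# the values are invented for illustration.
row = {
    "unique_key": "1",
    "block": "001XX N STATE ST",
    "iucr": "0460",
    "primary_type": "BATTERY",
}


def to_key_values(row, keep, nest):
    """Keep the `keep` columns flat and fold the `nest` columns into a
    list of {"key", "value"} records, casting every value to str."""
    out = {col: row[col] for col in keep}
    out["key_values"] = [{"key": col, "value": str(row[col])} for col in nest]
    return out


nested = to_key_values(row, keep=["unique_key"], nest=["block", "iucr", "primary_type"])
print(nested["key_values"][1])  # {'key': 'iucr', 'value': '0460'}
```

The str() call plays the role of the cast mentioned above: all values in the list end up with the same type.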

Related

extract values inside an array column in amazon athena

I have a table in AWS Athena where the column 'metadata_stopinfo' has the structure that you can see in the image.
I am trying to extract values that are inside that array; however, when I try
SELECT
"json_extract_scalar"(metadata_stopinfo, '$.city')
FROM "table"
I get the following error:
SYNTAX_ERROR: line 2:5: Unexpected parameters (array(row("address" row("addressline" varchar,"city" varchar,"countrycode" varchar,"countrycodeoriginal" varchar,"state" varchar,"zipcode" varchar),"carrierreference" varchar,"contacts" array(row("contacttype" varchar,"email" varchar,"fax" varchar,"mobilephone" varchar,"name" varchar,"officephone" varchar,"userid" varchar)),"containerinfo" array(row("containerid" varchar,"containeridtype" varchar,"equipmentcode" varchar,"equipmenttype" varchar)),"conveyancelinenumber" varchar,"conveyancetype" varchar,"conveyancetypeoriginal" varchar,"dateinfo" row("arrivalestimateddate" varchar,"arrivalestimateddateend" varchar,"arrivalestimatedendoffset" varchar,"arrivalestimatedoffset" varchar,"arrivalrequesteddate" varchar,"deliveryestimateddate" varchar,"deliveryestimateddateend" varchar,"deliveryestimatedendoffset" varchar,"deliveryestimatedoffset" varchar,"deliveryrequesteddate" varchar,"deliveryrequesteddateend" varchar,"deliveryrequestedendoffset" varchar,"deliveryrequestedoffset" varchar,"departureestimateddate" varchar,"departureestimateddateend" varchar,"departureestimatedendoffset" varchar,"departureestimatedoffset" varchar,"departurerequesteddate" varchar,"pickuprequesteddate" varchar,"pickuprequesteddateend" varchar,"pickuprequestedendoffset" varchar,"pickuprequestedoffset" varchar,"pickupestimateddate" varchar,"pickupestimateddateend" varchar,"pickupestimatedendoffset" varchar,"pickupestimatedoffset" varchar),"deliverynotenumber" varchar,"instructions" array(row("customerspecificsubtype" varchar,"header" boolean,"instructionsubtype" varchar,"instructiontype" varchar,"text" varchar)),"locationid" varchar,"partnercarrieraddress" row("addressline" varchar,"city" varchar,"countrycode" varchar,"countrycodeoriginal" varchar,"state" varchar,"zipcode" varchar),"partnercarriercontacts" array(row("contacttype" varchar,"email" varchar,"fax" varchar,"name" varchar,"officephone" varchar)),"partnercarrierid" 
varchar,"partnercarriername" varchar,"partnerid" varchar,"partnername" varchar,"partnertimezone" varchar,"partnertype" varchar,"productquantity" row("number" double,"originalunitofmeasure" varchar,"quantitytype" varchar,"unitofmeasure" varchar),"sequencenumber" bigint,"shipmentidentifier" varchar,"stoptype" varchar,"transportinfo" row("description" varchar,"transportcode" varchar,"transportoriginalcode" varchar),"vesselinfo" row("lloydsnumber" varchar,"shipsradiocallnumber" varchar,"vesselname" varchar,"vesselnumber" varchar,"voyagetripnumber" varchar))), varchar(6)) for function json_extract_scalar. Expected: json_extract_scalar(varchar(x), JsonPath) , json_extract_scalar(json, JsonPath)
My question is: how can I extract values inside the column?
json_extract_scalar, unsurprisingly, works with JSON (note that even if your data were in JSON format, json_extract_scalar(metadata_stopinfo, '$.city') still would not work, because your data is an array), while your column contains arrays of rows, so you need to work with it accordingly. For example, you can use indexes to access elements of an array (in Presto, array indexes start from 1):
SELECT
metadata_stopinfo[1] r
FROM "table"
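And then access the fields. As the documentation puts it:
The fields may be of any SQL type, and are accessed with field reference operator .
According to the type printed in the error message, city is nested inside the address row, so the reference goes through address:
SELECT
metadata_stopinfo[1].address.city city
FROM "table"
Also you can flatten the array with unnest:
SELECT r.address.city
FROM "table",
unnest(metadata_stopinfo) as t(r)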
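The difference between indexing and unnest can be sketched in plain Python. This is only an illustration of the semantics, not Athena code: the table contents and the helper names first_element and unnest are hypothetical, and the nesting of city under address follows the type printed in the error message.

```python
# Hypothetical table: each row's metadata_stopinfo is a list of stops.
table = [
    {"metadata_stopinfo": [{"address": {"city": "Hamburg"}},
                           {"address": {"city": "Rotterdam"}}]},
    {"metadata_stopinfo": [{"address": {"city": "Antwerp"}}]},
]


def first_element(rows):
    # Like metadata_stopinfo[1]: Presto indexes from 1, Python from 0.
    return [row["metadata_stopinfo"][0] for row in rows]


def unnest(rows):
    # Like the unnest join: one output row per array element.
    return [stop for row in rows for stop in row["metadata_stopinfo"]]


print([s["address"]["city"] for s in first_element(table)])  # ['Hamburg', 'Antwerp']
print([s["address"]["city"] for s in unnest(table)])  # ['Hamburg', 'Rotterdam', 'Antwerp']
```

Indexing returns one value per input row, while unnest returns one row per array element, so it is the option to use when every stop matters.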

Extracting JSON returns null (Presto Athena)

I'm working with SQL Presto in Athena and in a table I have a column named "data.input.additional_risk_data.basket" that has a json like this:
[
{
"data.input.additional_risk_data.basket.val.brand":null,
"data.input.additional_risk_data.basket.val.category":null,
"data.input.additional_risk_data.basket.val.item_reference":"26484651",
"data.input.additional_risk_data.basket.val.name":"Nike Force 1",
"data.input.additional_risk_data.basket.val.product_name":null,
"data.input.additional_risk_data.basket.val.published_date":null,
"data.input.additional_risk_data.basket.val.quantity":"1",
"data.input.additional_risk_data.basket.val.size":null,
"data.input.additional_risk_data.basket.val.subCategory":null,
"data.input.additional_risk_data.basket.val.unit_price":769.0,
"data.input.additional_risk_data.basket.val.upc":null,
"data.input.additional_risk_data.basket.val.url":null
}
]
I need to extract some of the data there, for example data.input.additional_risk_data.basket.val.item_reference. I'm not used to working with JSON, but I tried a few things:
json_extract("data.input.additional_risk_data.basket", '$.data.input.additional_risk_data.basket.val.item_reference')
json_extract_scalar("data.input.additional_risk_data.basket", '$.data.input.additional_risk_data.basket.val.item_reference')
They all returned null. I'm wondering what the correct way to get the values from that JSON is.
Thank you!
There are multiple "problems" with your data and your JSON path selector. The keys are not conventional: they contain dots, which a JSON path reads as nesting (and I have not found a way to tell Athena to escape them). Your JSON is also actually an array of JSON objects, not a single object. What you can do is cast the data to an array of maps and process it. For example:
-- sample data
WITH dataset (json_val) AS (
VALUES (json '[
{
"data.input.additional_risk_data.basket.val.brand":null,
"data.input.additional_risk_data.basket.val.category":null,
"data.input.additional_risk_data.basket.val.item_reference":"26484651",
"data.input.additional_risk_data.basket.val.name":"Nike Force 1",
"data.input.additional_risk_data.basket.val.product_name":null,
"data.input.additional_risk_data.basket.val.published_date":null,
"data.input.additional_risk_data.basket.val.quantity":"1",
"data.input.additional_risk_data.basket.val.size":null,
"data.input.additional_risk_data.basket.val.subCategory":null,
"data.input.additional_risk_data.basket.val.unit_price":769.0,
"data.input.additional_risk_data.basket.val.upc":null,
"data.input.additional_risk_data.basket.val.url":null
}
]')
)
--query
select arr[1]['data.input.additional_risk_data.basket.val.item_reference'] item_reference -- or use unnest if there are actually more than 1 element in array expected
from(
select cast(json_val as array(map(varchar, json))) arr
from dataset
)
Output:
item_reference
"26484651"

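Why the dotted path returns null can be sketched in plain Python. This is an illustration only, not Athena code: walk_path is a hypothetical, deliberately naive path walker that treats every dot as a step into a nested object, and the sample is shortened to two keys from the JSON in the question.

```python
import json

# Shortened sample: two of the keys from the question's JSON.
raw = (
    '[{"data.input.additional_risk_data.basket.val.item_reference": "26484651",'
    ' "data.input.additional_risk_data.basket.val.unit_price": 769.0}]'
)
KEY = "data.input.additional_risk_data.basket.val.item_reference"


def walk_path(doc, dotted_path):
    """Naive path walk: every dot means 'descend into a nested object'."""
    current = doc
    for part in dotted_path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None  # same symptom as the null in the question
        current = current[part]
    return current


basket = json.loads(raw)
print(walk_path(basket, KEY))     # None: the top level is an array
print(walk_path(basket[0], KEY))  # None: there is no nested "data" object
print(basket[0][KEY])             # 26484651: the whole string is one literal key
```

The last line mirrors the cast to array(map(varchar, json)) above: index into the array first, then look the key up as one literal string.
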
in snowflake, how to get a list of all values of a certain key out a list of key values

I have a column holding a large semi-structured object. One of its parts is itself a list of key-value objects, which I can get like so:
t.payload:questions_and_answers
which gives:
[{"answer":"yes","position":0,"question":"would you"},
{"answer":"because","position":1,"question":"what"}]
I want to get from that:
yes, because
Any ideas?
Using FLATTEN:
CREATE OR REPLACE TABLE t
AS
SELECT PARSE_JSON('{"questions_and_answers":[{"answer":"yes","position":0,"question":"would you"},
{"answer":"because","position":1,"question":"what"}]}') AS payload;
Query:
SELECT s.value:answer::STRING
FROM t
,TABLE(FLATTEN (input => t.payload, PATH =>'questions_and_answers')) s;
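As a rough plain-Python picture of what this query does (not Snowflake code), the sketch below yields one item per array element and then projects the answer field. The flatten helper is a hypothetical stand-in for FLATTEN with a single-key PATH.

```python
import json

payload = json.loads(
    '{"questions_and_answers": ['
    '{"answer": "yes", "position": 0, "question": "would you"},'
    '{"answer": "because", "position": 1, "question": "what"}]}'
)


def flatten(obj, path):
    """Yield one element per entry of the array stored under `path`."""
    yield from obj[path]


# Equivalent of s.value:answer::STRING, one result per flattened row.
answers = [str(item["answer"]) for item in flatten(payload, "questions_and_answers")]
print(answers)  # ['yes', 'because']
```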
Or if single output is required:
SELECT LISTAGG(s.value:answer::STRING, ', ') AS result
FROM t
,TABLE(FLATTEN (input => t.payload, PATH =>'questions_and_answers')) s;
Output:

How to change sum headers in group_by of a tree view into min?

I need to remove the sum header of some columns that it automatically calculated for me.
I created a new tree view without inheriting any view, and have the fields like so:
. . .
<field name="model">my.purchase.order.line.inherit</field>
. . .
. . .
<field name="product_uom_qty"/>
<field name="price_unit"/>
. . .
<field name="price_subtotal" widget="monetary"/>
. . .
. . .
The main model is from purchase.order.line.
With the answer from CZoellner, my model now looks something like this:
from odoo import fields, models

class purchase_order_line_inherit(models.Model):
    _name = "my.purchase.order.line.inherit"
    _inherit = "purchase.order.line"

    # aggregate this column with min instead of sum when grouping
    product_uom_qty = fields.Float(group_operator="min")
I think the system just calculated them for me, but the point is that I want to change the aggregate from sum to min.
I have seen this, but my fields (as shown earlier) do not have the sum attribute. I have also tried something like this, but the sum headers are still there.
How can I achieve this?
You can define that in the field definition. In your case it would be:
my_field = fields.Float(compute="_compute_my_field", group_operator="min")
The possible operators can be found in the documentation.
group_operator (str) – aggregate function used by read_group() when
grouping on this field.
Supported aggregate functions are:
array_agg : values, including nulls, concatenated into an array
count : number of rows
count_distinct : number of distinct rows
bool_and : true if all values are true, otherwise false
bool_or : true if at least one value is true, otherwise false
max : maximum value of all values
min : minimum value of all values
avg : the average (arithmetic mean) of all values
sum : sum of all values

Postgresql - pick up field from object array to text array

How can I pick out all the id fields from '{"se":[{"id":"123"}, {"id":"456"}]}' and get ["123", "456"]?
I tried the SQL below, but it does not work; the JSON path always needs an index.
select '{"se":[{"id":"123"}, {"id":"456"}]}'::JSONB #> '{se, id}'
With an index I could only get the first one, as text:
select '{"se":[{"id":"123"}, {"id":"456"}]}'::JSONB #> '{se, 0, id}'
That should be done in a few separate steps:
First, take out the 'se' array;
then expand the array items into separate JSON objects;
finally, get the value of the id key from each object.
If you need those ids as a list again, wrap the results in the jsonb_agg function.
SELECT
jsonb_agg(id) id_list
FROM
(SELECT jsonb_array_elements('{"se":[{"id":"123"}, {"id":"456"}]}'::jsonb #> '{se}') -> 'id' AS id) ids
;
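The same steps can be sketched in plain Python with the standard json module. This is only an illustration of the logic, not PostgreSQL code; the variable names are arbitrary, and each line is annotated with the SQL construct it corresponds to.

```python
import json

doc = json.loads('{"se":[{"id":"123"}, {"id":"456"}]}')

se = doc["se"]                        # step 1: #> '{se}' takes out the array
elements = list(se)                   # step 2: jsonb_array_elements expands it
ids = [el["id"] for el in elements]   # step 3: -> 'id' on each element
id_list = json.dumps(ids)             # jsonb_agg collects the values again

print(id_list)  # ["123", "456"]
```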