I have a table in AWS Athena where the column 'metadata_stopinfo' has the structure that you can see in the image.
I am trying to extract values from inside that array. However, when I try:
SELECT
"json_extract_scalar"(metadata_stopinfo, '$.city')
FROM "table"
I get the following error:
SYNTAX_ERROR: line 2:5: Unexpected parameters (array(row("address" row("addressline" varchar,"city" varchar,"countrycode" varchar,"countrycodeoriginal" varchar,"state" varchar,"zipcode" varchar),"carrierreference" varchar,"contacts" array(row("contacttype" varchar,"email" varchar,"fax" varchar,"mobilephone" varchar,"name" varchar,"officephone" varchar,"userid" varchar)),"containerinfo" array(row("containerid" varchar,"containeridtype" varchar,"equipmentcode" varchar,"equipmenttype" varchar)),"conveyancelinenumber" varchar,"conveyancetype" varchar,"conveyancetypeoriginal" varchar,"dateinfo" row("arrivalestimateddate" varchar,"arrivalestimateddateend" varchar,"arrivalestimatedendoffset" varchar,"arrivalestimatedoffset" varchar,"arrivalrequesteddate" varchar,"deliveryestimateddate" varchar,"deliveryestimateddateend" varchar,"deliveryestimatedendoffset" varchar,"deliveryestimatedoffset" varchar,"deliveryrequesteddate" varchar,"deliveryrequesteddateend" varchar,"deliveryrequestedendoffset" varchar,"deliveryrequestedoffset" varchar,"departureestimateddate" varchar,"departureestimateddateend" varchar,"departureestimatedendoffset" varchar,"departureestimatedoffset" varchar,"departurerequesteddate" varchar,"pickuprequesteddate" varchar,"pickuprequesteddateend" varchar,"pickuprequestedendoffset" varchar,"pickuprequestedoffset" varchar,"pickupestimateddate" varchar,"pickupestimateddateend" varchar,"pickupestimatedendoffset" varchar,"pickupestimatedoffset" varchar),"deliverynotenumber" varchar,"instructions" array(row("customerspecificsubtype" varchar,"header" boolean,"instructionsubtype" varchar,"instructiontype" varchar,"text" varchar)),"locationid" varchar,"partnercarrieraddress" row("addressline" varchar,"city" varchar,"countrycode" varchar,"countrycodeoriginal" varchar,"state" varchar,"zipcode" varchar),"partnercarriercontacts" array(row("contacttype" varchar,"email" varchar,"fax" varchar,"name" varchar,"officephone" varchar)),"partnercarrierid" 
varchar,"partnercarriername" varchar,"partnerid" varchar,"partnername" varchar,"partnertimezone" varchar,"partnertype" varchar,"productquantity" row("number" double,"originalunitofmeasure" varchar,"quantitytype" varchar,"unitofmeasure" varchar),"sequencenumber" bigint,"shipmentidentifier" varchar,"stoptype" varchar,"transportinfo" row("description" varchar,"transportcode" varchar,"transportoriginalcode" varchar),"vesselinfo" row("lloydsnumber" varchar,"shipsradiocallnumber" varchar,"vesselname" varchar,"vesselnumber" varchar,"voyagetripnumber" varchar))), varchar(6)) for function json_extract_scalar. Expected: json_extract_scalar(varchar(x), JsonPath) , json_extract_scalar(json, JsonPath)
My question is: how can I extract values from inside the column?
json_extract_scalar unsurprisingly works with JSON (note that even if your data were in JSON format, json_extract_scalar(metadata_stopinfo, '$.city') still would not have worked, because your data is an array), while your column contains arrays of rows, so you need to work with it accordingly. For example, you can use indexes to access elements in the array (in Presto, array indexes start from 1):
SELECT
metadata_stopinfo[1] r
FROM "table"
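And then access the fields of the row. The fields may be of any SQL type and are accessed with the field reference operator (a dot). In the type shown in the error message, city is a field of the nested address row, so the path goes through address:
SELECT
metadata_stopinfo[1].address.city city
FROM "table"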
You can also flatten the array with unnest, which yields one result row per array element:
SELECT r.address.city
FROM "table",
unnest(metadata_stopinfo) as t(r)
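To make the difference between the two approaches concrete, here is a minimal Python sketch that models one value of the column as a list of dicts. It is not Athena code: the sample rows and the helper names subscript and unnest are made up for illustration, and only a few of the real fields are included. It does follow the type from the error message, where city sits inside the address row.

```python
# Hypothetical stand-in for one value of metadata_stopinfo: an array of rows,
# modelled as a list of dicts. Only a few of the real fields are included.
stops = [
    {"address": {"city": "Hamburg", "countrycode": "DE"}, "stoptype": "PICKUP"},
    {"address": {"city": "Lyon", "countrycode": "FR"}, "stoptype": "DELIVERY"},
]


def subscript(array, index):
    """Mimic the SQL subscript operator, where array[1] is the first element."""
    if not 1 <= index <= len(array):
        raise IndexError("array subscript out of bounds")
    return array[index - 1]


def unnest(array):
    """Yield one result row per array element, as unnest does in a FROM clause."""
    yield from array


# Counterpart of: SELECT metadata_stopinfo[1].address.city
first_city = subscript(stops, 1)["address"]["city"]

# Counterpart of: SELECT r.address.city FROM ... unnest(metadata_stopinfo) as t(r)
all_cities = [row["address"]["city"] for row in unnest(stops)]

print(first_city)
print(all_cities)
```

Indexing returns a single value per table row, while unnest multiplies each table row by the number of stops it contains, so pick the one that matches the result shape you need.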
I'm working with Presto SQL in Athena, and in a table I have a column named "data.input.additional_risk_data.basket" that holds JSON like this:
[
{
"data.input.additional_risk_data.basket.val.brand":null,
"data.input.additional_risk_data.basket.val.category":null,
"data.input.additional_risk_data.basket.val.item_reference":"26484651",
"data.input.additional_risk_data.basket.val.name":"Nike Force 1",
"data.input.additional_risk_data.basket.val.product_name":null,
"data.input.additional_risk_data.basket.val.published_date":null,
"data.input.additional_risk_data.basket.val.quantity":"1",
"data.input.additional_risk_data.basket.val.size":null,
"data.input.additional_risk_data.basket.val.subCategory":null,
"data.input.additional_risk_data.basket.val.unit_price":769.0,
"data.input.additional_risk_data.basket.val.upc":null,
"data.input.additional_risk_data.basket.val.url":null
}
]
I need to extract some of the data there, for example data.input.additional_risk_data.basket.val.item_reference. I'm not used to working with JSON, but I tried a few things:
json_extract("data.input.additional_risk_data.basket", '$.data.input.additional_risk_data.basket.val.item_reference')
json_extract_scalar("data.input.additional_risk_data.basket", '$.data.input.additional_risk_data.basket.val.item_reference')
They all returned null. I'm wondering what the correct way is to get the values from that JSON.
Thank you!
There are multiple "problems" with your data and JSON path selector. The keys are not conventional (and I have not found a way to tell Athena to escape them), and your JSON is actually an array of JSON objects. What you can do is cast the data to an array and process it. For example:
-- sample data
WITH dataset (json_val) AS (
VALUES (json '[
{
"data.input.additional_risk_data.basket.val.brand":null,
"data.input.additional_risk_data.basket.val.category":null,
"data.input.additional_risk_data.basket.val.item_reference":"26484651",
"data.input.additional_risk_data.basket.val.name":"Nike Force 1",
"data.input.additional_risk_data.basket.val.product_name":null,
"data.input.additional_risk_data.basket.val.published_date":null,
"data.input.additional_risk_data.basket.val.quantity":"1",
"data.input.additional_risk_data.basket.val.size":null,
"data.input.additional_risk_data.basket.val.subCategory":null,
"data.input.additional_risk_data.basket.val.unit_price":769.0,
"data.input.additional_risk_data.basket.val.upc":null,
"data.input.additional_risk_data.basket.val.url":null
}
]')
)
--query
select arr[1]['data.input.additional_risk_data.basket.val.item_reference'] item_reference -- or use unnest if there are actually more than 1 element in array expected
from(
select cast(json_val as array(map(varchar, json))) arr
from dataset
)
Output:
item_reference
"26484651"
I am trying to load data to ADLS gen2 from Azure SQL DB in json format.
Below is the query I am using to load it in JSON format
select k2.[mandt],k2.[kunnr],
'knb1' = (select [bukrs] as 'bukrs' , [pernr]
from [ods_cdc_sdr].[knb1] k1
where k2.mandt=k1.mandt AND K1.kunnr=K2.kunnr
FOR JSON PATH),
'knvp' =(select knvp.vkorg, vtweg from [ods_cdc_sdr].[knvp] knvp where k2.mandt=knvp.mandt AND knvp.kunnr=K2.kunnr FOR JSON PATH)
from [ods_cdc_sdr].[kna1] k2
group by k2.[mandt],k2.[kunnr]
FOR JSON PATH
For one or two records the data looks fine, but when I try to load 1000 or more records, the JSON gets split into pieces and is no longer in a proper format (example below):
**{"JSON_F52E2B61-18A1-11d1-B105-00805F49916B"**:"[{\"mandt\":\"172\",\"kunnr\":\"\"},{\"mandt\":\"172\",\"kunnr\":\"0000000001\"},{\"mandt\":\"172\",\"kunnr\":\"0000000004\",\"knvp\":[{\"vkorg\":\"FR12\",\"vtweg\":\"01\"},{\"vkorg\":\"FR12\",\"vtweg\":\"01\"},{\"vkorg\":\"FR12\",\"vtweg\":\"01\"},{\"vkorg\":\"FR12\",\"vtweg\":\"01\"},{\"vkorg\":\"FR12\",\"vtweg\":\"01\"},{\"vkorg\":\"FR65\",\"vtweg\":\"01\"},{\"vkorg\":\"FR65\",\"vtweg\":\"01\"},{\"vkorg\":\"FR65\",\"vtweg\":\"01\"},{\"vkorg\":\"FR65\",\"vtweg\":\"01\"}]},{\"mandt\":\"172\",\"kunnr\":\"0000000006\"},{\"mandt\":\"172\",\"kunnr\":\"0000000008\",\"knvp\":[{\"vkorg\":\"FR13\",\"vtweg\":\"04\"},{\"vkorg\":\"FR13\",\"vtweg\":\"04\"},{\"vkorg\":\"FR13\",\"vtweg\":\"04\"},{\"vkorg\":\"FR13\",\"vtweg\":\"04\"}]},{\"mandt\":\"172\",\"kunnr\":\"0000000012\",\"knvp\":[{\"vkorg\":\"FR13\",\"vtweg\":\"04\"},{\"vkorg\":\"FR13\",\"vtweg\":\"04\"},{\"vkorg\":\"FR13\",\"vtweg\":\"04\"},{\"vkorg\":\"FR13\",\"vtweg\":\"04\"}]},{\"mandt\":\"172\",\"kunnr\":\"0000000015\"},{\"mandt\":\"172\",\"kunnr\":\"0000000021\"},{\"mandt\":\"172\",\"kunnr\":\"0000000022\"},{\"mandt\":\"172\",\"kunnr\":\"0000000023\"},{\"mandt\":\"172\",\"kunnr\":\"0000000026\",\"knvp\":[{\"vkorg\":\"IN14\",\"vtweg\":\"01\"},{\"vkorg\":\"IN14\",\"vtweg\":\"01\"},{\"vkorg\":\"IN14\",\"vtweg\":\"01\"},{\"vkorg\":\"IN14\",\"vtweg\":\"01\"}]},{\"mandt\":\"172\",\"kunnr\":\"0000000045\",\"knvp\":[{\"vkorg\":\"FR13\",\"vtweg\":\"04\"},{\"vkorg\":\"FR13\",\"vtweg\":\"04\"},{\"vkorg\":\"FR13\",\"vtweg\":\"04\"},{\"vkorg\":\"FR13\",\"vtweg\":\"04\"}]},{\"mandt\":\"172\",\"kunnr\":\"0000000046\",\"knvp\":[{\"vkorg\":\"FR13\",\"vtweg\":\"04\"},{\"vkorg\":\"FR13\",\"vtweg\":\"04\"},{\"vkorg\":\"FR13\",\"vtweg\":\"04\"},{\"vkorg\":\"FR13\",\"vtweg\":\"04\"}]},{\"mandt\":\"172\",\"kunnr\":\"0000000048\"},{\"mandt\":\"172\",\"kunnr\":\"0000000050\"},{\"mandt\":\"172\",\"kunnr\":\"0000000054\"},{\"mandt\":\"172\",\"kunnr\":\"0000000057\"},{\"mandt\":\"172\",\"kunnr
\":\"0000000058\"},{\"mandt\":\"172\",\"kunnr\":\"0000000060\"},{\"mandt\":\"172\",\"kunnr\":\"0000000065\"},{\"mandt\":\"172\",\"kunnr\":\"0000000085\"},{\"mandt\":\"172\",\"kunnr\":\"0000000086\"},{\"mandt\":\"172\",\"kunnr\":\"0000000089\"},{\"mandt\":\"172\",\"kunnr\":\"0000000090\"},{\"mandt\":\"172\",\"kunnr\":\"0000000092\"},{\"mandt\":\"172\",\"kunnr\":\"0000000106\"},{\"mandt\":\"172\",\"kunnr\":\"0000000124\"},{\"mandt\":\"172\",\"kunnr\":\"0000000129\",\"knvp\":[{\"vkorg\":\"FR40\",\"vtweg\":\"01\"},{\"vkorg\":\"FR40\",\"vtweg\":\"01\"},{\"vkorg\":\"FR40\""}
**{"JSON_F52E2B61-18A1-11d1-B105-00805F49916B"**:",\**"vtweg\":\"01\"},{\"vkorg\":\"FR40\",\"vtweg\":\"01\"}]},{\"mandt\":\"172\",\"kunnr\":\"0000000149\"},{\"mandt\":\"172\",\"kunnr\":\"0000000164\"},{\"mandt\":\"172\",\"kunnr\":\"0000000167\"},{\"mandt\":\"172\",\"kunnr\":\"0000000174\"},{\"mandt\":\"172\",\"kunnr\":\"0000000178\"},{\"mandt\":\"172\",\"kunnr\":\"0000000181\"},{\"mandt\":\"172\",\"kunnr\":\"0000000185\",\"knvp\":[{\"vkorg\":\"FR65\",\"vtweg\":\"01\"},{\"vkorg\":\"FR65\",\"vtweg\":\"01\"},{\"vkorg\":\"FR65\",\"vtweg\":\"01\"},{\"vkorg\":\"FR65\",\"vtweg\":\"01\"}]},{\"mandt\":\"172\",\"kunnr\":\"0000000189\"},{\"mandt\":\"172\",\"kunnr\":\"0000000214\"},{\"mandt\":\"172\",\"kunnr\":\"0000000223\"},{\"mandt\":\"172\",\"kunnr\":\"0000000228\"},{\"mandt\":\"172\",\"kunnr\":\"0000000239\"},{\"mandt\":\"172\",\"kunnr\":\"0000000240\"},{\"mandt\":\"172\",\"kunnr\":\"0000000249\"},{\"mandt\":\"172\",\"kunnr\":\"0000000251\"},{\"mandt\":\"172\",\"kunnr\":\"0000000257\"},{\"mandt\":\"172\",\"kunnr\":\"0000000260\"},{\"mandt\":\"172\",\"kunnr\":\"0000000261\"},{\"mandt\":\"172\",\"kunnr\":\"0000000262\"},{\"mandt\":\"172\",\"kunnr\":\"0000000286\"},{\"mandt\":\"172\",\"kunnr\":\"0000000301\"},{\"mandt\":\"172\",\"kunnr\":\"0000000320\"},{\"mandt\":\"172\",\"kunnr\":\"0000000347\"},{\"mandt\":\"172\",\"kunnr\":\"0000000350\"},{\"mandt\":\"172\",\"kunnr\":\"0000000353\"},{\"mandt\":\"172\",\"kunnr\":\"0000000364\"},{\"mandt\":\"172\",\"kunnr\":\"0000000370\"},{\"mandt\":\"172\",\"kunnr\":\"0000000372\"},{\"mandt\":\"172\",\"kunnr\":\"0000000373\"},{\"mandt\":\"172\",\"kunnr\":\"0000000375\"},{\"mandt\":\"172\",\"kunnr\":\"0000000377\"},{\"mandt\":\"172\",\"kunnr\":\"0000000380\"},{\"mandt\":\"172\",\"kunnr\":\"0000000381\"},{\"mandt\":\"172\",\"kunnr\":\"0000000383\"},{\"mandt\":\"172\",\"kunnr\":\"0000000384\"},{\"mandt\":\"172\",\"kunnr\":\"0000000386\"},{\"mandt\":\"172\",\"kunnr\":\"0000000387\"},{\"mandt\":\"172\",\"kunnr\":\"0000000391\"},{\"mandt\":\"172
\",\"kunnr\":\"0000000393\"},{\"mandt\":\"172\",\"kunnr\":\"0000000396\"},{\"mandt\":\"172\",\"kunnr\":\"0000000397\"},{\"mandt\":\"172\",\"kunnr\":\"0000000408\"},{\"mandt\":\"172\",\"kunnr\":\"0000000416\"},{\"mandt\":\"172\",\"kunnr\":\"0000000421\"},{\"mandt\":\"172\",\"kunnr\":\"0000000424\"},{\"mandt\":\"172\",\"kunnr\":\"0000000425\"},{\"mandt\":\"172\",\"kunnr\":\"0000000428\"},{\"mandt\":\"172\",\"kunnr\":\"0000000443\"},{\"mandt\":\"172\",\"kunnr\":\"0000000447\"},{\"mandt\":\"172\",\"kunnr\":\"0000000453\"},{\"mandt"}
**{"JSON_F52E2B61-18A1-11d1-B105-00805F49916B"**:"\":\"172\",\"kunnr\":\"0000000475\"},{\"mandt\":\"172\",\"kunnr\":\"0000000478\"},{\"mandt\":\"172\\",\"kunnr\":\"2100000001\",\"knvp\":[{\"vkorg\":\"Y200\",\"vtweg\":\"Z1\"},{\"vkorg\":\"Y200\",\"vtweg\":\"Z1\"},{\"vkorg\":\"Y200\",\"vtweg\":\"Z1\"},{\"vkorg\":\"Y200\",\"vtweg\":\"Z1\"}]},{\"mandt\":\"172\",\"kunnr\":\"2100000002\",\"knvp\":[{\"vkorg\":\"Y200\",\"vtweg\":\"Z1\"},{\"vkorg\":\"Y200\",\"vtweg\":\"Z1\"},{\"vkorg\":\"Y200\",\"vtweg\":\"Z1\"},{\"vkorg\":\"Y200\",\"vtweg\":\"Z1\"}]}]"}
Please help: how can I get the entire message in a proper format?
If you just want to save the query result to ADLS in JSON format, you'd better remove FOR JSON PATH. We can use Azure Data Factory to generate the nested JSON instead.
I created a simple test with two tables. Table Entities and table EntitiesEmails are related through the Id and EntitiyId fields. Usually we can use the following SQL to return a nested JSON array, but ADF will automatically add the escape character '\' in front of the double quotes.
SELECT
ent.Id AS 'Id',
ent.Name AS 'Name',
ent.Age AS 'Age',
EMails = (
SELECT
Emails.EntitiyId AS 'Id',
Emails.Email AS 'Email'
FROM EntitiesEmails Emails WHERE Emails.EntitiyId = ent.Id
FOR JSON PATH
)
FROM Entities ent
FOR JSON PATH
I worked out how to use a data flow to generate a nested JSON array. The steps are as follows:
Set source1 to the SQL table Entities .
Set source2 to the SQL table EntitiesEmails .
At Aggregate1 activity, set Group By Aggregate1
Enter the expression collect(Email) under Aggregates. It collects all email addresses into an array.
Data preview is as follows:
Then we can join these two streams at Join1 activity.
Then we can filter out extra columns at Select1 activity.
Then we can sink the result to our json file in ADLS.
Debug output is as follows:
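As a rough illustration of what the Aggregate1, Join1 and Select1 steps compute (this is a plain-Python model, not the actual debug output), here is a minimal sketch. The sample rows are invented, and keeping entities without emails is an assumption, since the join type is not stated above.

```python
import json
from collections import defaultdict

# Hypothetical sample rows for the two source tables.
entities = [
    {"Id": 1, "Name": "Alice", "Age": 30},
    {"Id": 2, "Name": "Bob", "Age": 41},
]
entities_emails = [
    {"EntitiyId": 1, "Email": "alice@example.com"},
    {"EntitiyId": 1, "Email": "a.smith@example.com"},
    {"EntitiyId": 2, "Email": "bob@example.com"},
]

# Aggregate1: group by EntitiyId and collect(Email) into an array.
collected = defaultdict(list)
for row in entities_emails:
    collected[row["EntitiyId"]].append(row["Email"])

# Join1 + Select1: attach the array to its entity and keep only the wanted columns.
# Assumption: entities without any email are kept with an empty array.
nested = [
    {
        "Id": entity["Id"],
        "Name": entity["Name"],
        "Age": entity["Age"],
        "Emails": collected.get(entity["Id"], []),
    }
    for entity in entities
]

# Sink: serialise once, at the end, so the nested array is real JSON
# rather than a pre-built string that would need its quotes escaped.
document = json.dumps(nested, indent=2)
print(document)
```

The point of the design is that the nesting is built from structured values and serialised only at the sink, which is why no escaped quotes appear in the result.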
I have searched the web and forums for how to extract the data from a JSON file, but my script does not work.
I have a problem extracting the list of rate objects. Can someone please help? I cannot find the fault.
{"table":"C","no":"195/C/NBP/2016","tradingDate":"2016-10-06","effectiveDate":"2016-10-07","rates":
[
{"currency":"dolar amerykański","code":"USD","bid":3.8011,"ask":3.8779},
{"currency":"dolar australijski","code":"AUD","bid":2.8768,"ask":2.935},
{"currency":"dolar kanadyjski","code":"CAD","bid":2.8759,"ask":2.9339},
{"currency":"euro","code":"EUR","bid":4.2493,"ask":4.3351},
{"currency":"forint (Węgry)","code":"HUF","bid":0.013927,"ask":0.014209},
{"currency":"frank szwajcarski","code":"CHF","bid":3.8822,"ask":3.9606},
{"currency":"funt szterling","code":"GBP","bid":4.8053,"ask":4.9023},
{"currency":"jen (Japonia)","code":"JPY","bid":0.036558,"ask":0.037296},
{"currency":"korona czeska","code":"CZK","bid":0.1573,"ask":0.1605},
{"currency":"korona duńska","code":"DKK","bid":0.571,"ask":0.5826},
{"currency":"korona norweska","code":"NOK","bid":0.473,"ask":0.4826},
{"currency":"korona szwedzka","code":"SEK","bid":0.4408,"ask":0.4498},
{"currency":"SDR (MFW)","code":"XDR","bid":5.3142,"ask":5.4216}
],
"EventProcessedUtcTime":"2016-10-09T10:48:41.6338718Z","PartitionId":1,"EventEnqueuedUtcTime":"2016-10-09T10:48:42.6170000Z"}
This is my U-SQL script:
@trial =
EXTRACT jsonString string
FROM @"adl://kamilsepin.azuredatalakestore.net/ExchangeRates/2016/10/09/10_0_c60d8b8895b047c896ce67d19df3cdb2.json"
USING Extractors.Text(delimiter:'\b', quoting:false);
@json =
SELECT Microsoft.Analytics.Samples.Formats.Json.JsonFunctions.JsonTuple(jsonString) AS rec
FROM @trial;
@columnized =
SELECT
rec["table"] AS table,
rec["no"] AS no,
rec["tradingDate"] AS tradingDate,
rec["effectiveDate"] AS effectiveDate,
rec["rates"] AS rates
FROM @json;
@rateslist =
SELECT
table, no, tradingDate, effectiveDate,
Microsoft.Analytics.Samples.Formats.Json.JsonFunctions.JsonTuple(rates) AS recl
FROM @columnized;
@selectrates =
SELECT
recl["currency"] AS currency,
recl["code"] AS code,
recl["bid"] AS bid,
recl["ask"] AS ask
FROM @rateslist;
OUTPUT @selectrates
TO "adl://kamilsepin.azuredatalakestore.net/datastreamanalitics/ExchangeRates.tsv"
USING Outputters.Tsv();
You need to look at the structure of your JSON and identify which path inside it you want to map to rows. In your case, you are really only interested in the array in rates, where you want one row per array item.
Thus, you use the JsonExtractor with a JSONPath that gives you one row per array element (e.g., rates[*]) and then project each of its fields.
Here is the code (with slightly changed paths):
REFERENCE ASSEMBLY JSONBlog.[Newtonsoft.Json];
REFERENCE ASSEMBLY JSONBlog.[Microsoft.Analytics.Samples.Formats];
@selectrates =
EXTRACT currency string, code string, bid decimal, ask decimal
FROM @"/Temp/rates.json"
USING new Microsoft.Analytics.Samples.Formats.Json.JsonExtractor("rates[*]");
OUTPUT @selectrates
TO "/Temp/ExchangeRates.tsv"
USING Outputters.Tsv();
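For readers who want to see what "one row per array element" means without running U-SQL, here is a small Python sketch of the same idea. It is only an analogy: the sample is a shortened copy of the file from the question, and the quoting of the resulting TSV is not claimed to match what Outputters.Tsv() produces.

```python
import csv
import io
import json

# Shortened copy of the document from the question.
raw = """{"table":"C","no":"195/C/NBP/2016","tradingDate":"2016-10-06",
"effectiveDate":"2016-10-07","rates":[
{"currency":"euro","code":"EUR","bid":4.2493,"ask":4.3351},
{"currency":"funt szterling","code":"GBP","bid":4.8053,"ask":4.9023}]}"""

document = json.loads(raw)

# The path rates[*] selects every element of the rates array;
# each element becomes one row, and its fields become the columns.
rows = [
    (rate["currency"], rate["code"], rate["bid"], rate["ask"])
    for rate in document["rates"]
]

# Write the rows as tab-separated text into an in-memory buffer.
buffer = io.StringIO()
csv.writer(buffer, delimiter="\t", lineterminator="\n").writerows(rows)
tsv_text = buffer.getvalue()
print(tsv_text, end="")
```

Note that the top-level fields (table, no, tradingDate, effectiveDate) are not part of the rows here, just as they are not in the EXTRACT above, because the path starts at the rates array.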