Export data from SQL to ADLS using ADF as JSON

I am trying to load data from Azure SQL DB into ADLS Gen2 in JSON format.
Below is the query I am using to produce the JSON:
select k2.[mandt],k2.[kunnr],
'knb1' = (select [bukrs] as 'bukrs' , [pernr]
from [ods_cdc_sdr].[knb1] k1
where k2.mandt=k1.mandt AND K1.kunnr=K2.kunnr
FOR JSON PATH),
'knvp' =(select knvp.vkorg, vtweg from [ods_cdc_sdr].[knvp] knvp where k2.mandt=knvp.mandt AND knvp.kunnr=K2.kunnr FOR JSON PATH)
from [ods_cdc_sdr].[kna1] k2
group by k2.[mandt],k2.[kunnr]
FOR JSON PATH
For one or two records the data looks fine, but when I load 1000 or more records the JSON is split across several rows and is no longer in a proper format (example below):
**{"JSON_F52E2B61-18A1-11d1-B105-00805F49916B"**:"[{\"mandt\":\"172\",\"kunnr\":\"\"},{\"mandt\":\"172\",\"kunnr\":\"0000000001\"},{\"mandt\":\"172\",\"kunnr\":\"0000000004\",\"knvp\":[{\"vkorg\":\"FR12\",\"vtweg\":\"01\"},{\"vkorg\":\"FR12\",\"vtweg\":\"01\"},{\"vkorg\":\"FR12\",\"vtweg\":\"01\"},{\"vkorg\":\"FR12\",\"vtweg\":\"01\"},{\"vkorg\":\"FR12\",\"vtweg\":\"01\"},{\"vkorg\":\"FR65\",\"vtweg\":\"01\"},{\"vkorg\":\"FR65\",\"vtweg\":\"01\"},{\"vkorg\":\"FR65\",\"vtweg\":\"01\"},{\"vkorg\":\"FR65\",\"vtweg\":\"01\"}]},{\"mandt\":\"172\",\"kunnr\":\"0000000006\"},{\"mandt\":\"172\",\"kunnr\":\"0000000008\",\"knvp\":[{\"vkorg\":\"FR13\",\"vtweg\":\"04\"},{\"vkorg\":\"FR13\",\"vtweg\":\"04\"},{\"vkorg\":\"FR13\",\"vtweg\":\"04\"},{\"vkorg\":\"FR13\",\"vtweg\":\"04\"}]},{\"mandt\":\"172\",\"kunnr\":\"0000000012\",\"knvp\":[{\"vkorg\":\"FR13\",\"vtweg\":\"04\"},{\"vkorg\":\"FR13\",\"vtweg\":\"04\"},{\"vkorg\":\"FR13\",\"vtweg\":\"04\"},{\"vkorg\":\"FR13\",\"vtweg\":\"04\"}]},{\"mandt\":\"172\",\"kunnr\":\"0000000015\"},{\"mandt\":\"172\",\"kunnr\":\"0000000021\"},{\"mandt\":\"172\",\"kunnr\":\"0000000022\"},{\"mandt\":\"172\",\"kunnr\":\"0000000023\"},{\"mandt\":\"172\",\"kunnr\":\"0000000026\",\"knvp\":[{\"vkorg\":\"IN14\",\"vtweg\":\"01\"},{\"vkorg\":\"IN14\",\"vtweg\":\"01\"},{\"vkorg\":\"IN14\",\"vtweg\":\"01\"},{\"vkorg\":\"IN14\",\"vtweg\":\"01\"}]},{\"mandt\":\"172\",\"kunnr\":\"0000000045\",\"knvp\":[{\"vkorg\":\"FR13\",\"vtweg\":\"04\"},{\"vkorg\":\"FR13\",\"vtweg\":\"04\"},{\"vkorg\":\"FR13\",\"vtweg\":\"04\"},{\"vkorg\":\"FR13\",\"vtweg\":\"04\"}]},{\"mandt\":\"172\",\"kunnr\":\"0000000046\",\"knvp\":[{\"vkorg\":\"FR13\",\"vtweg\":\"04\"},{\"vkorg\":\"FR13\",\"vtweg\":\"04\"},{\"vkorg\":\"FR13\",\"vtweg\":\"04\"},{\"vkorg\":\"FR13\",\"vtweg\":\"04\"}]},{\"mandt\":\"172\",\"kunnr\":\"0000000048\"},{\"mandt\":\"172\",\"kunnr\":\"0000000050\"},{\"mandt\":\"172\",\"kunnr\":\"0000000054\"},{\"mandt\":\"172\",\"kunnr\":\"0000000057\"},{\"mandt\":\"172\",\"kunnr
\":\"0000000058\"},{\"mandt\":\"172\",\"kunnr\":\"0000000060\"},{\"mandt\":\"172\",\"kunnr\":\"0000000065\"},{\"mandt\":\"172\",\"kunnr\":\"0000000085\"},{\"mandt\":\"172\",\"kunnr\":\"0000000086\"},{\"mandt\":\"172\",\"kunnr\":\"0000000089\"},{\"mandt\":\"172\",\"kunnr\":\"0000000090\"},{\"mandt\":\"172\",\"kunnr\":\"0000000092\"},{\"mandt\":\"172\",\"kunnr\":\"0000000106\"},{\"mandt\":\"172\",\"kunnr\":\"0000000124\"},{\"mandt\":\"172\",\"kunnr\":\"0000000129\",\"knvp\":[{\"vkorg\":\"FR40\",\"vtweg\":\"01\"},{\"vkorg\":\"FR40\",\"vtweg\":\"01\"},{\"vkorg\":\"FR40\""}
**{"JSON_F52E2B61-18A1-11d1-B105-00805F49916B"**:",\**"vtweg\":\"01\"},{\"vkorg\":\"FR40\",\"vtweg\":\"01\"}]},{\"mandt\":\"172\",\"kunnr\":\"0000000149\"},{\"mandt\":\"172\",\"kunnr\":\"0000000164\"},{\"mandt\":\"172\",\"kunnr\":\"0000000167\"},{\"mandt\":\"172\",\"kunnr\":\"0000000174\"},{\"mandt\":\"172\",\"kunnr\":\"0000000178\"},{\"mandt\":\"172\",\"kunnr\":\"0000000181\"},{\"mandt\":\"172\",\"kunnr\":\"0000000185\",\"knvp\":[{\"vkorg\":\"FR65\",\"vtweg\":\"01\"},{\"vkorg\":\"FR65\",\"vtweg\":\"01\"},{\"vkorg\":\"FR65\",\"vtweg\":\"01\"},{\"vkorg\":\"FR65\",\"vtweg\":\"01\"}]},{\"mandt\":\"172\",\"kunnr\":\"0000000189\"},{\"mandt\":\"172\",\"kunnr\":\"0000000214\"},{\"mandt\":\"172\",\"kunnr\":\"0000000223\"},{\"mandt\":\"172\",\"kunnr\":\"0000000228\"},{\"mandt\":\"172\",\"kunnr\":\"0000000239\"},{\"mandt\":\"172\",\"kunnr\":\"0000000240\"},{\"mandt\":\"172\",\"kunnr\":\"0000000249\"},{\"mandt\":\"172\",\"kunnr\":\"0000000251\"},{\"mandt\":\"172\",\"kunnr\":\"0000000257\"},{\"mandt\":\"172\",\"kunnr\":\"0000000260\"},{\"mandt\":\"172\",\"kunnr\":\"0000000261\"},{\"mandt\":\"172\",\"kunnr\":\"0000000262\"},{\"mandt\":\"172\",\"kunnr\":\"0000000286\"},{\"mandt\":\"172\",\"kunnr\":\"0000000301\"},{\"mandt\":\"172\",\"kunnr\":\"0000000320\"},{\"mandt\":\"172\",\"kunnr\":\"0000000347\"},{\"mandt\":\"172\",\"kunnr\":\"0000000350\"},{\"mandt\":\"172\",\"kunnr\":\"0000000353\"},{\"mandt\":\"172\",\"kunnr\":\"0000000364\"},{\"mandt\":\"172\",\"kunnr\":\"0000000370\"},{\"mandt\":\"172\",\"kunnr\":\"0000000372\"},{\"mandt\":\"172\",\"kunnr\":\"0000000373\"},{\"mandt\":\"172\",\"kunnr\":\"0000000375\"},{\"mandt\":\"172\",\"kunnr\":\"0000000377\"},{\"mandt\":\"172\",\"kunnr\":\"0000000380\"},{\"mandt\":\"172\",\"kunnr\":\"0000000381\"},{\"mandt\":\"172\",\"kunnr\":\"0000000383\"},{\"mandt\":\"172\",\"kunnr\":\"0000000384\"},{\"mandt\":\"172\",\"kunnr\":\"0000000386\"},{\"mandt\":\"172\",\"kunnr\":\"0000000387\"},{\"mandt\":\"172\",\"kunnr\":\"0000000391\"},{\"mandt\":\"172
\",\"kunnr\":\"0000000393\"},{\"mandt\":\"172\",\"kunnr\":\"0000000396\"},{\"mandt\":\"172\",\"kunnr\":\"0000000397\"},{\"mandt\":\"172\",\"kunnr\":\"0000000408\"},{\"mandt\":\"172\",\"kunnr\":\"0000000416\"},{\"mandt\":\"172\",\"kunnr\":\"0000000421\"},{\"mandt\":\"172\",\"kunnr\":\"0000000424\"},{\"mandt\":\"172\",\"kunnr\":\"0000000425\"},{\"mandt\":\"172\",\"kunnr\":\"0000000428\"},{\"mandt\":\"172\",\"kunnr\":\"0000000443\"},{\"mandt\":\"172\",\"kunnr\":\"0000000447\"},{\"mandt\":\"172\",\"kunnr\":\"0000000453\"},{\"mandt"}
**{"JSON_F52E2B61-18A1-11d1-B105-00805F49916B"**:"\":\"172\",\"kunnr\":\"0000000475\"},{\"mandt\":\"172\",\"kunnr\":\"0000000478\"},{\"mandt\":\"172\\",\"kunnr\":\"2100000001\",\"knvp\":[{\"vkorg\":\"Y200\",\"vtweg\":\"Z1\"},{\"vkorg\":\"Y200\",\"vtweg\":\"Z1\"},{\"vkorg\":\"Y200\",\"vtweg\":\"Z1\"},{\"vkorg\":\"Y200\",\"vtweg\":\"Z1\"}]},{\"mandt\":\"172\",\"kunnr\":\"2100000002\",\"knvp\":[{\"vkorg\":\"Y200\",\"vtweg\":\"Z1\"},{\"vkorg\":\"Y200\",\"vtweg\":\"Z1\"},{\"vkorg\":\"Y200\",\"vtweg\":\"Z1\"},{\"vkorg\":\"Y200\",\"vtweg\":\"Z1\"}]}]"}
Please help: how can I get the entire message in a proper format?

If you just want to save the query result to ADLS in JSON format, it is better to remove FOR JSON PATH and let Azure Data Factory generate the nested JSON.
I created a simple test with two tables. Table Entities and table EntitiesEmails are related through the Id and EntitiyId fields. Usually we can use the following SQL to return a nested JSON array, but ADF automatically adds the escape character '\' in front of the double quotes.
SELECT
ent.Id AS 'Id',
ent.Name AS 'Name',
ent.Age AS 'Age',
EMails = (
SELECT
Emails.EntitiyId AS 'Id',
Emails.Email AS 'Email'
FROM EntitiesEmails Emails WHERE Emails.EntitiyId = ent.Id
FOR JSON PATH
)
FROM Entities ent
FOR JSON PATH
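The split output in the question looks like one JSON document cut into fragments, one per row, all under the same auto-generated column name. If the fragments have already been exported that way, they can be stitched back together downstream. The following is a minimal Python sketch under that assumption; the helper name join_for_json_chunks and the sample rows are hypothetical, and only the column name is taken from the question.

```python
import json

# Column name that appears in the question's output for the unnamed FOR JSON result.
FOR_JSON_COLUMN = "JSON_F52E2B61-18A1-11d1-B105-00805F49916B"


def join_for_json_chunks(rows):
    """Concatenate the fragments in row order, then parse the whole text once."""
    text = "".join(row[FOR_JSON_COLUMN] for row in rows)
    return json.loads(text)


# Hypothetical rows: one JSON array cut mid-token into three fragments.
rows = [
    {FOR_JSON_COLUMN: '[{"mandt":"172","kunnr":"0000000001"},{"man'},
    {FOR_JSON_COLUMN: 'dt":"172","kunnr":"0000000004","knvp":[{"vkorg":"FR12",'},
    {FOR_JSON_COLUMN: '"vtweg":"01"}]}]'},
]

customers = join_for_json_chunks(rows)
print(len(customers), customers[1]["knvp"][0]["vkorg"])
```

No single fragment is valid JSON on its own, so parsing has to wait until every row has been concatenated in its original order.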
I looked into using a data flow to generate the nested JSON array. The steps are as follows:
Set source1 to the SQL table Entities.
Set source2 to the SQL table EntitiesEmails.
In the Aggregate1 transformation, set the Group by column.
Under Aggregates, enter the expression collect(Email). It collects all email addresses of a group into an array.
Data preview is as follows:
Then we can join the two streams in the Join1 transformation.
Then we can drop the extra columns in the Select1 transformation.
Then we can sink the result to our JSON file in ADLS.
Debug output is as follows:
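As a rough illustration of what the data flow steps above compute (this is not the actual ADF debug output), here is a small Python sketch of the same aggregate, join, and select logic. The sample rows and the output column name Emails are hypothetical; only the table and key column names come from the answer.

```python
import json
from collections import defaultdict

# Hypothetical sample rows standing in for the two SQL tables.
entities = [
    {"Id": 1, "Name": "Ann", "Age": 30},
    {"Id": 2, "Name": "Bob", "Age": 41},
]
entities_emails = [
    {"EntitiyId": 1, "Email": "ann@example.com"},
    {"EntitiyId": 1, "Email": "ann@work.example.com"},
    {"EntitiyId": 2, "Email": "bob@example.com"},
]

# Aggregate step: group by EntitiyId and collect the emails into a list,
# which is what collect(Email) does per group.
emails_by_entity = defaultdict(list)
for row in entities_emails:
    emails_by_entity[row["EntitiyId"]].append(row["Email"])

# Join + select step: attach each list to its entity and keep only the wanted columns.
# Entities without emails get an empty list here (left-join behaviour, chosen for the sketch).
nested = [
    {
        "Id": e["Id"],
        "Name": e["Name"],
        "Age": e["Age"],
        "Emails": emails_by_entity.get(e["Id"], []),
    }
    for e in entities
]

print(json.dumps(nested, indent=2))
```

Because the nesting is built from real arrays rather than from a pre-serialized string, the sink writes nested JSON without escaped quotes.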

Related

Transforming JSON data to relational data

I want to display data from SQL Server where the data is stored in JSON format, but when I run the select, the data does not appear:
id   item_pieces_list
0    [{"id":2,"satuan":"BOX","isi":1,"aktif":true},{"id":4,"satuan":"BOX10","isi":1,"aktif":true}]
1    [{"id":0,"satuan":"AMPUL","isi":1,"aktif":"true"},{"id":4,"satuan":"BOX10","isi":5,"aktif":true}]
I've written a query like this, but nothing appears. Can anyone help?
Query :
SELECT id, JSON_Value(item_pieces_list, '$.satuan') AS Name
FROM [cisea.bamedika.co.id-hisys].dbo.medicine_alkes AS medicalkes
Your path is wrong. Your JSON is an array, and you are trying to read it as if it were a flat object:
SELECT id, JSON_Value(item_pieces_list,'$[0].satuan') AS Name
FROM [cisea.bamedika.co.id-hisys].dbo.medicine_alkes
Your original path '$.satuan' would only work if the data were a single object without the [] (array) brackets. Since the value is an array, I changed the path to '$[0].satuan', which retrieves the first element of the array only.
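To see why the index is needed, it can help to parse one of the sample values outside the database. A minimal Python sketch using the first row from the question (the variable names are made up for illustration):

```python
import json

# First sample value from the question's item_pieces_list column.
item_pieces_list = (
    '[{"id":2,"satuan":"BOX","isi":1,"aktif":true},'
    '{"id":4,"satuan":"BOX10","isi":1,"aktif":true}]'
)

pieces = json.loads(item_pieces_list)

# The top level is a list, so there is no "satuan" key to look up directly.
top_level_is_array = isinstance(pieces, list)

# Same element that the path '$[0].satuan' addresses.
first_name = pieces[0]["satuan"]

# Every element, for the case where more than the first one is wanted.
all_names = [piece["satuan"] for piece in pieces]

print(top_level_is_array, first_name, all_names)
```

The path with [0] returns one value per row; getting every element in SQL would require expanding the array into rows rather than indexing into it.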

extract values inside an array column in amazon athena

I have a table in AWS Athena where the column 'metadata_stopinfo' has the structure shown in the image.
I am trying to extract values that are inside that array, however when I try
SELECT
"json_extract_scalar"(metadata_stopinfo, '$.city')
FROM "table"
I get the following error:
SYNTAX_ERROR: line 2:5: Unexpected parameters (array(row("address" row("addressline" varchar,"city" varchar,"countrycode" varchar,"countrycodeoriginal" varchar,"state" varchar,"zipcode" varchar),"carrierreference" varchar,"contacts" array(row("contacttype" varchar,"email" varchar,"fax" varchar,"mobilephone" varchar,"name" varchar,"officephone" varchar,"userid" varchar)),"containerinfo" array(row("containerid" varchar,"containeridtype" varchar,"equipmentcode" varchar,"equipmenttype" varchar)),"conveyancelinenumber" varchar,"conveyancetype" varchar,"conveyancetypeoriginal" varchar,"dateinfo" row("arrivalestimateddate" varchar,"arrivalestimateddateend" varchar,"arrivalestimatedendoffset" varchar,"arrivalestimatedoffset" varchar,"arrivalrequesteddate" varchar,"deliveryestimateddate" varchar,"deliveryestimateddateend" varchar,"deliveryestimatedendoffset" varchar,"deliveryestimatedoffset" varchar,"deliveryrequesteddate" varchar,"deliveryrequesteddateend" varchar,"deliveryrequestedendoffset" varchar,"deliveryrequestedoffset" varchar,"departureestimateddate" varchar,"departureestimateddateend" varchar,"departureestimatedendoffset" varchar,"departureestimatedoffset" varchar,"departurerequesteddate" varchar,"pickuprequesteddate" varchar,"pickuprequesteddateend" varchar,"pickuprequestedendoffset" varchar,"pickuprequestedoffset" varchar,"pickupestimateddate" varchar,"pickupestimateddateend" varchar,"pickupestimatedendoffset" varchar,"pickupestimatedoffset" varchar),"deliverynotenumber" varchar,"instructions" array(row("customerspecificsubtype" varchar,"header" boolean,"instructionsubtype" varchar,"instructiontype" varchar,"text" varchar)),"locationid" varchar,"partnercarrieraddress" row("addressline" varchar,"city" varchar,"countrycode" varchar,"countrycodeoriginal" varchar,"state" varchar,"zipcode" varchar),"partnercarriercontacts" array(row("contacttype" varchar,"email" varchar,"fax" varchar,"name" varchar,"officephone" varchar)),"partnercarrierid" 
varchar,"partnercarriername" varchar,"partnerid" varchar,"partnername" varchar,"partnertimezone" varchar,"partnertype" varchar,"productquantity" row("number" double,"originalunitofmeasure" varchar,"quantitytype" varchar,"unitofmeasure" varchar),"sequencenumber" bigint,"shipmentidentifier" varchar,"stoptype" varchar,"transportinfo" row("description" varchar,"transportcode" varchar,"transportoriginalcode" varchar),"vesselinfo" row("lloydsnumber" varchar,"shipsradiocallnumber" varchar,"vesselname" varchar,"vesselnumber" varchar,"voyagetripnumber" varchar))), varchar(6)) for function json_extract_scalar. Expected: json_extract_scalar(varchar(x), JsonPath) , json_extract_scalar(json, JsonPath)
My question is: how can I extract values from inside the column?
json_extract_scalar works on JSON input (and even if your data were JSON, json_extract_scalar(metadata_stopinfo, '$.city') still would not work, because the top level is an array). Your column is not JSON, though: it contains an array of rows, so you need to work with it as such. For example, you can use an index to access an element of the array (in Presto, array indexes start from 1):
SELECT
metadata_stopinfo[1] r
FROM "table"
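And then access the fields. Row fields may be of any SQL type and are accessed with the field reference operator (.). According to the type printed in the error message, city is a field of the nested address row, so it is reached through address:
SELECT
metadata_stopinfo[1].address.city city
FROM "table"
You can also flatten the array with unnest:
SELECT r.address.city
FROM "table",
unnest(metadata_stopinfo) as t(r)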
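The difference between indexing and unnest can be pictured with plain Python lists. This is only an analogy with hypothetical data; it assumes, as the type in the error message indicates, that city sits inside the address row, and note that Python indexes from 0 where Presto indexes from 1.

```python
# Hypothetical rows: each has an array of stops, each stop has a nested address.
table = [
    {
        "shipment": "A",
        "metadata_stopinfo": [
            {"address": {"city": "Lyon"}},
            {"address": {"city": "Pune"}},
        ],
    },
    {
        "shipment": "B",
        "metadata_stopinfo": [{"address": {"city": "Oslo"}}],
    },
]

# Indexing: one value per table row, taken from the first array element
# (metadata_stopinfo[1] in Presto, [0] here).
first_cities = [row["metadata_stopinfo"][0]["address"]["city"] for row in table]

# Unnest: one output row per array element, so row A contributes two cities.
all_cities = [
    stop["address"]["city"]
    for row in table
    for stop in row["metadata_stopinfo"]
]

print(first_cities)
print(all_cities)
```

Indexing keeps the row count of the table, while unnest multiplies each row by the length of its array.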

Extracting JSON returns null (Presto Athena)

I'm working with SQL Presto in Athena and in a table I have a column named "data.input.additional_risk_data.basket" that has a json like this:
[
{
"data.input.additional_risk_data.basket.val.brand":null,
"data.input.additional_risk_data.basket.val.category":null,
"data.input.additional_risk_data.basket.val.item_reference":"26484651",
"data.input.additional_risk_data.basket.val.name":"Nike Force 1",
"data.input.additional_risk_data.basket.val.product_name":null,
"data.input.additional_risk_data.basket.val.published_date":null,
"data.input.additional_risk_data.basket.val.quantity":"1",
"data.input.additional_risk_data.basket.val.size":null,
"data.input.additional_risk_data.basket.val.subCategory":null,
"data.input.additional_risk_data.basket.val.unit_price":769.0,
"data.input.additional_risk_data.basket.val.upc":null,
"data.input.additional_risk_data.basket.val.url":null
}
]
I need to extract some of the data there, for example data.input.additional_risk_data.basket.val.item_reference. I'm not used to working with jsons but I tried a few things:
json_extract("data.input.additional_risk_data.basket", '$.data.input.additional_risk_data.basket.val.item_reference')
json_extract_scalar("data.input.additional_risk_data.basket", '$.data.input.additional_risk_data.basket.val.item_reference')
They all returned null. I'm wondering what is the correct way to get the values from that json
Thank you!
There are multiple "problems" with your data and your JSON path selector. The keys are not conventional (they contain dots, and I have not found a way to tell Athena to escape them), and your JSON is actually an array of JSON objects. What you can do is cast the data to an array of maps and process that. For example:
-- sample data
WITH dataset (json_val) AS (
VALUES (json '[
{
"data.input.additional_risk_data.basket.val.brand":null,
"data.input.additional_risk_data.basket.val.category":null,
"data.input.additional_risk_data.basket.val.item_reference":"26484651",
"data.input.additional_risk_data.basket.val.name":"Nike Force 1",
"data.input.additional_risk_data.basket.val.product_name":null,
"data.input.additional_risk_data.basket.val.published_date":null,
"data.input.additional_risk_data.basket.val.quantity":"1",
"data.input.additional_risk_data.basket.val.size":null,
"data.input.additional_risk_data.basket.val.subCategory":null,
"data.input.additional_risk_data.basket.val.unit_price":769.0,
"data.input.additional_risk_data.basket.val.upc":null,
"data.input.additional_risk_data.basket.val.url":null
}
]')
)
--query
select arr[1]['data.input.additional_risk_data.basket.val.item_reference'] item_reference -- or use unnest if there are actually more than 1 element in array expected
from(
select cast(json_val as array(map(varchar, json))) arr
from dataset
)
Output:
item_reference
"26484651"
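The reason the original path returned null becomes clearer when the same value is parsed in a general-purpose language: the dots are part of a single flat key, not a chain of nested objects. A small Python sketch with a shortened copy of the sample data (variable names are illustrative):

```python
import json

# Shortened copy of the sample value: an array holding one object with dotted keys.
raw = (
    '[{"data.input.additional_risk_data.basket.val.item_reference":"26484651",'
    '"data.input.additional_risk_data.basket.val.name":"Nike Force 1",'
    '"data.input.additional_risk_data.basket.val.unit_price":769.0}]'
)

basket = json.loads(raw)
key = "data.input.additional_risk_data.basket.val.item_reference"

# Equivalent of arr[1]['...'] in the query: first element, then the full dotted key.
item_reference = basket[0][key]

# There is no nested "data" object, which is what a path like '$.data.input...' looks for.
has_nested_data = "data" in basket[0]

print(item_reference, has_nested_data)
```

The cast to array(map(varchar, json)) plays the same role as json.loads followed by a lookup on the whole key string.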

Extracting data from JSON field in Amazon Redshift

I am trying to extract some data from a JSON field in Redshift.
Given below is a sample view of the data I am working with.
{"fileFormat":"excel","data":{"name":John,"age":24,"dateofbirth":1993,"Class":"Computer Science"}}
I am able to extract data for the first level namely data corresponding to
fileFormat and data as below:
select CONFIGURATION::JSON -> 'fileFormat' from table_name;
I am now trying to extract the information nested under data, such as name, age, and dateofbirth.
You could use Redshift's native function json_extract_path_text
- https://docs.aws.amazon.com/redshift/latest/dg/JSON_EXTRACT_PATH_TEXT.html
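SELECT
    -- each extra argument descends one level into the JSON document
    json_extract_path_text(configuration, 'data', 'name') AS name,
    json_extract_path_text(configuration, 'data', 'age') AS age,
    json_extract_path_text(configuration, 'data', 'dateofbirth') AS dateofbirth
FROM
    yourTable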
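The path arguments behave like a walk through nested objects, one key per level. The sketch below imitates that idea in Python; extract_path is a hypothetical helper, not the Redshift function, and returning None for a missing key is a choice made for this sketch. The sample value quotes "John", which is an assumption, since the unquoted form shown in the question is not valid JSON.

```python
import json


def extract_path(document, *path):
    """Walk a JSON document one key per level; return None if a key is missing."""
    node = json.loads(document)
    for key in path:
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


# Sample value from the question, with "John" quoted so that it parses.
configuration = (
    '{"fileFormat":"excel","data":{"name":"John","age":24,'
    '"dateofbirth":1993,"Class":"Computer Science"}}'
)

print(extract_path(configuration, "fileFormat"))
print(extract_path(configuration, "data", "name"))
print(extract_path(configuration, "data", "age"))
```

Each additional path element goes one level deeper, which mirrors passing 'data' and then 'name' to the function in the query.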

How do I load CSV file to Amazon Athena that contains JSON field

I have a CSV (tab separated) in s3 that needs to be queried on a JSON field.
uid\tname\taddress
1\tmoorthi\t{"rno":123,"code":400111}
2\tkiranp\t{"rno":124,"street":"kemp road"}
How can I query this data in Amazon Athena?
I should be able to query like:
select uid
from table1
where address['street']="kemp road";
You could try using the json_extract() command.
From Extracting Data from JSON - Amazon Athena:
You may have source data containing JSON-encoded strings that you do not necessarily want to deserialize into a table in Athena. In this case, you can still run SQL operations on this data, using the JSON functions available in Presto.
WITH dataset AS (
SELECT '{"name": "Susan Smith",
"org": "engineering",
"projects": [{"name":"project1", "completed":false},
{"name":"project2", "completed":true}]}'
AS blob
)
SELECT
json_extract(blob, '$.name') AS name,
json_extract(blob, '$.projects') AS projects
FROM dataset
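This example shows how json_extract() can be used to extract fields from JSON. For a string comparison, json_extract_scalar() is the better fit, because it returns a varchar rather than a JSON value. Note also that string literals take single quotes; double quotes denote identifiers. Thus, you might be able to do something like:
select uid
from table1
where json_extract_scalar(address, '$.street') = 'kemp road';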
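To check the expected result of such a filter before setting up the table, the sample file can be run through a few lines of Python. This is a sketch that treats the address column as an opaque JSON string, much as the query does; the variable names are made up, and the inline string stands in for the S3 object.

```python
import csv
import io
import json

# The sample file from the question, with real tab characters as separators.
tsv = (
    "uid\tname\taddress\n"
    '1\tmoorthi\t{"rno":123,"code":400111}\n'
    '2\tkiranp\t{"rno":124,"street":"kemp road"}\n'
)

# QUOTE_NONE keeps the double quotes inside the JSON column untouched.
reader = csv.DictReader(io.StringIO(tsv), delimiter="\t", quoting=csv.QUOTE_NONE)

# Keep the uid of every row whose address JSON has street == "kemp road".
# Rows without a "street" key simply do not match.
matching_uids = [
    row["uid"]
    for row in reader
    if json.loads(row["address"]).get("street") == "kemp road"
]

print(matching_uids)
```

The first row has no street key at all, so it drops out of the result, just as a comparison against a null extraction would filter it out in SQL.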