U-sql call data in json array - azure-data-lake

I have browsed the web and forum to download the data from the file json, but my script does not work.
I have a problem with downloading the list of objects of rates. Can someone please help? I can not find fault.
{"table":"C","no":"195/C/NBP/2016","tradingDate":"2016-10-06","effectiveDate":"2016-10-07","rates":
[
{"currency":"dolar amerykański","code":"USD","bid":3.8011,"ask":3.8779},
{"currency":"dolar australijski","code":"AUD","bid":2.8768,"ask":2.935},
{"currency":"dolar kanadyjski","code":"CAD","bid":2.8759,"ask":2.9339},
{"currency":"euro","code":"EUR","bid":4.2493,"ask":4.3351},
{"currency":"forint (Węgry)","code":"HUF","bid":0.013927,"ask":0.014209},
{"currency":"frank szwajcarski","code":"CHF","bid":3.8822,"ask":3.9606},
{"currency":"funt szterling","code":"GBP","bid":4.8053,"ask":4.9023},
{"currency":"jen (Japonia)","code":"JPY","bid":0.036558,"ask":0.037296},
{"currency":"korona czeska","code":"CZK","bid":0.1573,"ask":0.1605},
{"currency":"korona duńska","code":"DKK","bid":0.571,"ask":0.5826},
{"currency":"korona norweska","code":"NOK","bid":0.473,"ask":0.4826},
{"currency":"korona szwedzka","code":"SEK","bid":0.4408,"ask":0.4498},
{"currency":"SDR (MFW)","code":"XDR","bid":5.3142,"ask":5.4216}
],
"EventProcessedUtcTime":"2016-10-09T10:48:41.6338718Z","PartitionId":1,"EventEnqueuedUtcTime":"2016-10-09T10:48:42.6170000Z"}
This is my script in sql.
#trial =
EXTRACT jsonString string
FROM #"adl://kamilsepin.azuredatalakestore.net/ExchangeRates/2016/10/09/10_0_c60d8b8895b047c896ce67d19df3cdb2.json"
USING Extractors.Text(delimiter:'\b', quoting:false);
#json =
SELECT Microsoft.Analytics.Samples.Formats.Json.JsonFunctions.JsonTuple(jsonString) AS rec
FROM #trial;
#columnized =
SELECT
rec["table"]AS table,
rec["no"]AS no,
rec["tradingDate"]AS tradingDate,
rec["effectiveDate"]AS effectiveDate,
rec["rates"]AS rates
FROM #json;
#rateslist =
SELECT
table, no, tradingDate, effectiveDate,
Microsoft.Analytics.Samples.Formats.Json.JsonFunctions.JsonTuple(rates) AS recl
FROM #columnized;
#selectrates =
SELECT
recl["currency"]AS currency,
recl["code"]AS code,
recl["bid"]AS bid,
recl["ask"]AS ask
FROM #rateslist;
OUTPUT #selectrates
TO "adl://kamilsepin.azuredatalakestore.net/datastreamanalitics/ExchangeRates.tsv"
USING Outputters.Tsv();

You need to look at the structure of your JSON and identify, what constitutes your first path inside your JSON that you want to map to correlated rows. In your case, you are really only interested in the array in rates where you want one row per array item.
Thus, you use the JSONExtractor with a JSONPath that gives you one row per array element (e.g., rates[*]) and then project each of its fields.
Here is the code (with slightly changed paths):
REFERENCE ASSEMBLY JSONBlog.[Newtonsoft.Json];
REFERENCE ASSEMBLY JSONBlog.[Microsoft.Analytics.Samples.Formats];
#selectrates =
EXTRACT currency string, code string, bid decimal, ask decimal
FROM #"/Temp/rates.json"
USING new Microsoft.Analytics.Samples.Formats.Json.JsonExtractor("rates[*]");
OUTPUT #selectrates
TO "/Temp/ExchangeRates.tsv"
USING Outputters.Tsv();

Related

BigQuery >> Extract Value for Dictionary Key in JSON Object

I have a string object in a table column with structure:
[{"__food": "true"},{"item": "1"},{"toppings": "true"},{"__discount_amount": "4.95"},{"__original_price": "24.95"}]
How can I extract the value true from the toppings key from this?
I tried turning it into JSON first but json_extract(parse_json(string_object_column), '$.toppings') just returns null
The closest I got was keeping it as a string and doing
json_extract(string_object_column, '$[0]')
Which gets me:
{"toppings":"true"}
Is this doable without unnesting?
You may try and consider below approach using REGEXP_EXTRACT:
SELECT REGEXP_EXTRACT('[{"__food": "true"},{"item": "1"},{"toppings": "true"},{"__discount_amount": "4.95"},{"__original_price": "24.95"}]', r'"toppings": "(\D+)"}') as EXTRACT_TOPPINGS
OUTPUT:
You may just update the REGEX to make it more strict based on your use case.

Transforming JSON data to relational data

I want to display data from SQL Server where the data is in JSON format. But when the select process, the data does not appear:
id
item_pieces_list
0
[{"id":2,"satuan":"BOX","isi":1,"aktif":true},{"id":4,"satuan":"BOX10","isi":1,"aktif":true}]
1
[{"id":0,"satuan":"AMPUL","isi":1,"aktif":"true"},{"id":4,"satuan":"BOX10","isi":5,"aktif":true}]
I've written a query like this, but nothing appears. Can anyone help?
Query :
SELECT id, JSON_Value(item_pieces_list, '$.satuan') AS Name
FROM [cisea.bamedika.co.id-hisys].dbo.medicine_alkes AS medicalkes
Your Path is wrong. Your JSON is an array, and you are trying to retrieve it as a flat object
SELECT id, JSON_Value(item_pieces_list,'$[0].satuan') AS Name
FROM [cisea.bamedika.co.id-hisys].dbo.medicine_alkes
Only in the case of data without the [] (array sign) you could use your original query '$.satuan', but since you are using an array I change it to retrieve only the first element in the array '$[0].satuan'

Presto extract string from array of JSON elements

I am on Presto 0.273 and I have a complex JSON data from which I am trying to extract only specific values.
First, I ran SELECT JSON_EXTRACT(library_data, '.$books') which gets me all the books from a certain library. The problem is this returns an array of JSON objects that look like this:
[{
"book_name":"abc",
"book_size":"453",
"requestor":"27657899462"
"comments":"this is a comment"
}, {
"book_name":"def",
"book_size":"354",
"requestor":"67657496274"
"comments":"this is a comment"
}, ...
]
I would like the code to return just a list of the JSON objects, not an array. My intention is to later be able to loop through the JSON objects to find ones from a specific requester. Currently, when I loop through the given arrays using python, I get a range of errors around this data being a Series, hence trying to extract it properly rather.
I tried this SELECT JSON_EXTRACT(JSON_EXTRACT(data, '$.domains'), '$[0]') but this doesn't work because the index position of the object needed is not known.
I also tried SELECT array_join(array[books], ', ') but getting "Error casting array element to VARCHAR " error.
Can anyone please point me in the right direction?
Cast to array(json):
SELECT CAST(JSON_EXTRACT(library_data, '.$books') as array(json))
Or you can use it in unnest to flatten it to rows:
SELECT *,
js_obj -- will contain single json object
FROM table
CROSS JOIN UNNEST CAST(JSON_EXTRACT(library_data, '.$books') as array(json)) as t(js_obj)

How can I get the last element of an array? SQL Bigquery

I'm working on building a follow-network form Github's available data on Google BigQuery, e.g.: https://bigquery.cloud.google.com/table/githubarchive:day.20210606
The key data is contained in the "payload" field, STRING type. I managed to unnest the data contained in that field and convert it to an array, but how can I get the last element?
Here is what I have so far...
select type,
array(select trim(val) from unnest(split(trim(payload, '[]'))) val) payload
from `githubarchive.day.20210606`
where type = 'MemberEvent'
Which outputs:
How can I get only the last element, "Action":"added"} ?
I know that
select array_reverse(your_array)[offset(0)]
should do the trick, however I'm unsure how to combine that in my code. I've been trying different options without success, for example:
with payload as ( select array(select trim(val) from unnest(split(trim(payload, '[]'))) val) payload from `githubarchive.day.20210606`)
select type, ARRAY_REVERSE(payload)[ORDINAL(1)]
from `githubarchive.day.20210606` where type = 'MemberEvent'
The desired output should look like:
To get last element in array you can use below approach
select array_reverse(your_array)[offset(0)]
I'm unsure how to combine that in my code
select type, array_reverse(array(
select trim(val)
from unnest(split(trim(payload, '[]'))) val
))[offset(0)]
from `githubarchive.day.20210606`
where type = 'MemberEvent'
There is a solution without reversing the array.
SELECT event[OFFSET(ARRAY_LENGTH(event)-1)

Converting xml node string to strip out nodes

I have a table that has a column called RAW DATA of type NVARCHAR MAX, which is a dump from a web service. Here is a sample of 1 data line:
<CourtRecordEventCaseHist>
<eventDate>2008-02-11T06:00:00Z</eventDate>
<eventDate_TZ>-0600</eventDate_TZ>
<histSeqNo>4</histSeqNo>
<countyNo>1</countyNo>
<caseNo>xxxxxx</caseNo>
<eventType>WCCS</eventType>
<descr>Warrant/Capias/Commitment served</descr>
<tag/>
<ctofcNameL/>
<ctofcNameF/>
<ctofcNameM/>
<ctofcSuffix/>
<sealCtofcNameL/>
<sealCtofcNameF/>
<sealCtofcNameM/>
<sealCtofcSuffix/>
<sealCtofcTypeCodeDescr/>
<courtRptrNameL/>
<courtRptrNameF/>
<courtRptrNameM/>
<courtRptrSuffix/>
<dktTxt>Signature bond set</dktTxt>
<eventAmt>0.00</eventAmt>
<isMoneyEnabled>false</isMoneyEnabled>
<courtRecordEventPartyList>
<partyNameF>Name</partyNameF>
<partyNameM>A.</partyNameM>
<partyNameL>xxxx</partyNameL>
<partySuffix/>
<isAddrSealed>false</isAddrSealed>
<isSeal>false</isSeal>
</courtRecordEventPartyList>
</CourtRecordEventCaseHist>
It was suppose to go in a table, with the node names representing the column names. The table it's going to is created, I just need to exract the data from this row to the table. I have 100's of thousands records like this. I was going to copy to a xml file, then import. But there is so much data, I would rather try and do the work within the DB.
Any ideas?
First, create the table with all the required columns.
Then, use your favorite scripting language to load the table! Mine being groovy, here is what I'd do:
def sql = Sql.newInstance(/* SQL connection here*/)
sql.eachRow("select RAW_DATA from TABLE_NAME") { row ->
String xmlData = row."RAW_DATA"
def root = new XmlSlurper().parseText(xmlData)
def date = root.eventDate
def histSeqNo = root.histSeqNo
//Pull out all the data and insert into new table!
}
I did find an answer to this, I'm sure there is more than one way of doing this. But this is what I got to work. Thanks for everyone's help.
SELECT
pref.value('(caseNo/text())[1]', 'varchar(20)') as CaseNumber,
pref.value('(countyNo/text())[1]', 'int') as CountyNumber
FROM
dbo.CaseHistoryRawData_10 CROSS APPLY
RawData.nodes('//CourtRecordEventCaseHist') AS CourtRec(pref)