Extracting nested JSON using json_extract_array: getting null results on populated data fields? - sql

I am attempting to extract the following configuration data from this nested json file (Image of JSON file), using this query:
SELECT
json_extract_scalar(f, '$.affiliationTypes') as affiliation_types
, json_extract_scalar(f, '$.organizationTypes') as organization_types
, json_extract_scalar(f, '$.verificationTypes') as verification_types
, json_extract_scalar(f, '$.notifierIds') as notifier_ids
, json_extract_scalar(f, '$.metadata') as metadata
, json_extract_scalar(body, '$.accountId') as account_id
FROM `my database`
left join unnest(json_extract_array(body, '$.request.config')) as f
LIMIT 100;
but I am getting null results for all of the selected config data. I've even filtered on accounts/requests where I know there is config data, and I still get nothing. Any ideas? (Many of these fields have data; I've just nulled them out for this post.)
As an example, I would like the first selected column to output: STUDENT_PART_TIME, EMPLOYEE, FACULTY, VETERAN, STUDENT_FULL_TIME, ACTIVE_DUTY
For those who don't want to open the image, this is the JSON chunk I'm interested in:
"body": {
"accountId":""
,"activeAffiliationTypes":[]
,"affiliations":[]
,"aggregateState":"OPEN"
,"birthMonth":null
,"birthYear":null
,"customerFacingConclusiveVerificationType":null
,"customerFacingRequestedAffiliationType":""
,"dataSourceErrorCodes":null
,"dataSourceResponseHttpStatusCategory":"UNKNOWN"
,"dataSourceResponseHttpStatusCode":-1
,"emailDomain":null
,"errorCodes":[]
,"errors":[]
,"issuedRewards":[]
,"metadata":{}
,"request": {
"accountId":""
,"active":false
,"assetMap":{}
,"config": {
"affiliationTypes":["STUDENT_PART_TIME","EMPLOYEE","FACULTY","VETERAN","STUDENT_FULL_TIME","ACTIVE_DUTY"]
,"assetTypes":[]
,"consolationRewardIds":[]
,"ignoreVerifierClasses":null
,"locale":"en_US"
,"metadata":{}
,"notifierIds":null
,"organizationTypes":[]
,"rewardIds":[]
,"testMode":false
,"verificationModelClass":""
,"verificationSourceClasses":[]
,"verificationTypes":["AUTHORITATIVE"]
}
,"created":null

The problem is that '$.request.config' points at a JSON object, not a JSON array, so JSON_EXTRACT_ARRAY over it returns NULL (and JSON_EXTRACT_SCALAR likewise returns NULL for non-scalar values such as arrays). The arrays you want (affiliationTypes, verificationTypes, ...) sit one level deeper. For example, to unnest affiliationTypes:
SELECT STRING(c)
FROM `my database`
LEFT JOIN UNNEST(
JSON_QUERY_ARRAY(body, '$.request.config.affiliationTypes')
) c
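The object-versus-array distinction can be sketched outside SQL. A minimal Python sketch with made-up sample data mirroring the body shown above (not BigQuery code, just the same shape):

```python
import json

# Hypothetical sample mirroring the posted body
body = json.loads("""
{
  "request": {
    "config": {
      "affiliationTypes": ["STUDENT_PART_TIME", "EMPLOYEE", "FACULTY"],
      "verificationTypes": ["AUTHORITATIVE"]
    }
  }
}
""")

config = body["request"]["config"]
# config is an object (dict), not an array (list); this is why
# JSON_EXTRACT_ARRAY(body, '$.request.config') yields NULL in BigQuery
print(isinstance(config, list))    # False
# the arrays to UNNEST live one level deeper:
print(config["affiliationTypes"])  # ['STUDENT_PART_TIME', 'EMPLOYEE', 'FACULTY']
```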

Related

Parse Json - CTE & filtering

I need to remove a few records (those that contain 't') in order to parse/flatten the data column. The query in the CTE that creates 'tab' works on its own, but inside the CTE I get the same JSON-parsing error as if I had never tried to filter out the culprit rows.
with tab as (
select * from table
where data like '%t%')
select b.value::string, a.* from tab a,
lateral flatten( input => PARSE_JSON( a.data) ) b;
error:
Error parsing JSON: unknown keyword "test123", pos 8
example data:
Date Data
1-12-12 {id: 13-43}
1-12-14 {id: 43-43}
1-11-14 {test12}
1-11-14 {test2}
1-02-14 {id: 44-43}
It is possible to replace PARSE_JSON(a.data) with TRY_PARSE_JSON(a.data), which will produce NULL instead of an error for invalid input.
More at: TRY_PARSE_JSON
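TRY_PARSE_JSON's behavior can be sketched in Python (a hypothetical analogue, not Snowflake code): parse each row, get None for invalid input, and filter on that instead of pattern-matching the raw text:

```python
import json

def try_parse_json(s):
    """Analogue of Snowflake's TRY_PARSE_JSON: None instead of an error."""
    try:
        return json.loads(s)
    except (json.JSONDecodeError, TypeError):
        return None

# made-up rows, one of which is not valid JSON
rows = ['{"id": "13-43"}', '{test12}', '{"id": "44-43"}']
parsed = [try_parse_json(r) for r in rows]
print(parsed[1] is None)                      # True: invalid row becomes None
valid = [p for p in parsed if p is not None]  # filter instead of erroring
print(len(valid))                             # 2
```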

Stream Analytics query to exclude records

I am successfully joining a reference data stream like this:
TenantInput AS
(
SELECT
Input.userId,
Input.tenantId
FROM
Input
JOIN
Tenants TNTS ON Input.tenantId = TNTS.tenantId
)
Whereas TNTS is a JSON file in a storage blob:
[
{
"tenantId": "t1"
},
{
"tenantId": "t2"
}
]
This works well and the output only contains records for t1 + t2.
On a second output, I would like to have all data except for tenant t1 + t2 and so far I have not found a solution. I tried things like below, but this is not supported.
OtherTenantInput AS
(
SELECT
Input.userId,
Input.tenantId
FROM
Input
WHERE
Input.tenantId NOT IN (SELECT * FROM TNTS)
)
Any ideas are welcome.
How about:
OtherTenantInput AS
(
SELECT
Input.userId,
Input.tenantId
FROM
Input
LEFT JOIN
Tenants TNTS ON Input.tenantId = TNTS.tenantId
WHERE TNTS.tenantId IS NULL
)
This will only output events from Input whose tenantId cannot be found in TNTS.
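The LEFT JOIN plus IS NULL pattern above is the standard anti-join way to express NOT IN against a reference set. A small Python sketch with made-up events illustrates the semantics:

```python
events = [
    {"userId": "u1", "tenantId": "t1"},
    {"userId": "u2", "tenantId": "t3"},
    {"userId": "u3", "tenantId": "t2"},
]
known_tenants = {"t1", "t2"}  # the reference blob's tenantIds

# LEFT JOIN ... WHERE right.tenantId IS NULL == keep rows with no match
other = [e for e in events if e["tenantId"] not in known_tenants]
print(other)  # [{'userId': 'u2', 'tenantId': 't3'}]
```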

Export data from SQL to ADLS using ADF as JSON

I am trying to load data to ADLS gen2 from Azure SQL DB in json format.
Below is the query I am using to load it in JSON format
select k2.[mandt],k2.[kunnr],
'knb1' = (select [bukrs] as 'bukrs' , [pernr]
from [ods_cdc_sdr].[knb1] k1
where k2.mandt=k1.mandt AND K1.kunnr=K2.kunnr
FOR JSON PATH),
'knvp' =(select knvp.vkorg, vtweg from [ods_cdc_sdr].[knvp] knvp where k2.mandt=knvp.mandt AND knvp.kunnr=K2.kunnr FOR JSON PATH)
from [ods_cdc_sdr].[kna1] k2
group by k2.[mandt],k2.[kunnr]
FOR JSON PATH
For one or two records the data looks fine, but when I try to load 1,000 or more records, the JSON gets split across messages and is not in a proper format (below is an example):
**{"JSON_F52E2B61-18A1-11d1-B105-00805F49916B"**:"[{\"mandt\":\"172\",\"kunnr\":\"\"},{\"mandt\":\"172\",\"kunnr\":\"0000000001\"},{\"mandt\":\"172\",\"kunnr\":\"0000000004\",\"knvp\":[{\"vkorg\":\"FR12\",\"vtweg\":\"01\"},{\"vkorg\":\"FR12\",\"vtweg\":\"01\"},{\"vkorg\":\"FR12\",\"vtweg\":\"01\"},{\"vkorg\":\"FR12\",\"vtweg\":\"01\"},{\"vkorg\":\"FR12\",\"vtweg\":\"01\"},{\"vkorg\":\"FR65\",\"vtweg\":\"01\"},{\"vkorg\":\"FR65\",\"vtweg\":\"01\"},{\"vkorg\":\"FR65\",\"vtweg\":\"01\"},{\"vkorg\":\"FR65\",\"vtweg\":\"01\"}]},{\"mandt\":\"172\",\"kunnr\":\"0000000006\"},{\"mandt\":\"172\",\"kunnr\":\"0000000008\",\"knvp\":[{\"vkorg\":\"FR13\",\"vtweg\":\"04\"},{\"vkorg\":\"FR13\",\"vtweg\":\"04\"},{\"vkorg\":\"FR13\",\"vtweg\":\"04\"},{\"vkorg\":\"FR13\",\"vtweg\":\"04\"}]},{\"mandt\":\"172\",\"kunnr\":\"0000000012\",\"knvp\":[{\"vkorg\":\"FR13\",\"vtweg\":\"04\"},{\"vkorg\":\"FR13\",\"vtweg\":\"04\"},{\"vkorg\":\"FR13\",\"vtweg\":\"04\"},{\"vkorg\":\"FR13\",\"vtweg\":\"04\"}]},{\"mandt\":\"172\",\"kunnr\":\"0000000015\"},{\"mandt\":\"172\",\"kunnr\":\"0000000021\"},{\"mandt\":\"172\",\"kunnr\":\"0000000022\"},{\"mandt\":\"172\",\"kunnr\":\"0000000023\"},{\"mandt\":\"172\",\"kunnr\":\"0000000026\",\"knvp\":[{\"vkorg\":\"IN14\",\"vtweg\":\"01\"},{\"vkorg\":\"IN14\",\"vtweg\":\"01\"},{\"vkorg\":\"IN14\",\"vtweg\":\"01\"},{\"vkorg\":\"IN14\",\"vtweg\":\"01\"}]},{\"mandt\":\"172\",\"kunnr\":\"0000000045\",\"knvp\":[{\"vkorg\":\"FR13\",\"vtweg\":\"04\"},{\"vkorg\":\"FR13\",\"vtweg\":\"04\"},{\"vkorg\":\"FR13\",\"vtweg\":\"04\"},{\"vkorg\":\"FR13\",\"vtweg\":\"04\"}]},{\"mandt\":\"172\",\"kunnr\":\"0000000046\",\"knvp\":[{\"vkorg\":\"FR13\",\"vtweg\":\"04\"},{\"vkorg\":\"FR13\",\"vtweg\":\"04\"},{\"vkorg\":\"FR13\",\"vtweg\":\"04\"},{\"vkorg\":\"FR13\",\"vtweg\":\"04\"}]},{\"mandt\":\"172\",\"kunnr\":\"0000000048\"},{\"mandt\":\"172\",\"kunnr\":\"0000000050\"},{\"mandt\":\"172\",\"kunnr\":\"0000000054\"},{\"mandt\":\"172\",\"kunnr\":\"0000000057\"},{\"mandt\":\"172\",\"kunnr
\":\"0000000058\"},{\"mandt\":\"172\",\"kunnr\":\"0000000060\"},{\"mandt\":\"172\",\"kunnr\":\"0000000065\"},{\"mandt\":\"172\",\"kunnr\":\"0000000085\"},{\"mandt\":\"172\",\"kunnr\":\"0000000086\"},{\"mandt\":\"172\",\"kunnr\":\"0000000089\"},{\"mandt\":\"172\",\"kunnr\":\"0000000090\"},{\"mandt\":\"172\",\"kunnr\":\"0000000092\"},{\"mandt\":\"172\",\"kunnr\":\"0000000106\"},{\"mandt\":\"172\",\"kunnr\":\"0000000124\"},{\"mandt\":\"172\",\"kunnr\":\"0000000129\",\"knvp\":[{\"vkorg\":\"FR40\",\"vtweg\":\"01\"},{\"vkorg\":\"FR40\",\"vtweg\":\"01\"},{\"vkorg\":\"FR40\""}
**{"JSON_F52E2B61-18A1-11d1-B105-00805F49916B"**:",\**"vtweg\":\"01\"},{\"vkorg\":\"FR40\",\"vtweg\":\"01\"}]},{\"mandt\":\"172\",\"kunnr\":\"0000000149\"},{\"mandt\":\"172\",\"kunnr\":\"0000000164\"},{\"mandt\":\"172\",\"kunnr\":\"0000000167\"},{\"mandt\":\"172\",\"kunnr\":\"0000000174\"},{\"mandt\":\"172\",\"kunnr\":\"0000000178\"},{\"mandt\":\"172\",\"kunnr\":\"0000000181\"},{\"mandt\":\"172\",\"kunnr\":\"0000000185\",\"knvp\":[{\"vkorg\":\"FR65\",\"vtweg\":\"01\"},{\"vkorg\":\"FR65\",\"vtweg\":\"01\"},{\"vkorg\":\"FR65\",\"vtweg\":\"01\"},{\"vkorg\":\"FR65\",\"vtweg\":\"01\"}]},{\"mandt\":\"172\",\"kunnr\":\"0000000189\"},{\"mandt\":\"172\",\"kunnr\":\"0000000214\"},{\"mandt\":\"172\",\"kunnr\":\"0000000223\"},{\"mandt\":\"172\",\"kunnr\":\"0000000228\"},{\"mandt\":\"172\",\"kunnr\":\"0000000239\"},{\"mandt\":\"172\",\"kunnr\":\"0000000240\"},{\"mandt\":\"172\",\"kunnr\":\"0000000249\"},{\"mandt\":\"172\",\"kunnr\":\"0000000251\"},{\"mandt\":\"172\",\"kunnr\":\"0000000257\"},{\"mandt\":\"172\",\"kunnr\":\"0000000260\"},{\"mandt\":\"172\",\"kunnr\":\"0000000261\"},{\"mandt\":\"172\",\"kunnr\":\"0000000262\"},{\"mandt\":\"172\",\"kunnr\":\"0000000286\"},{\"mandt\":\"172\",\"kunnr\":\"0000000301\"},{\"mandt\":\"172\",\"kunnr\":\"0000000320\"},{\"mandt\":\"172\",\"kunnr\":\"0000000347\"},{\"mandt\":\"172\",\"kunnr\":\"0000000350\"},{\"mandt\":\"172\",\"kunnr\":\"0000000353\"},{\"mandt\":\"172\",\"kunnr\":\"0000000364\"},{\"mandt\":\"172\",\"kunnr\":\"0000000370\"},{\"mandt\":\"172\",\"kunnr\":\"0000000372\"},{\"mandt\":\"172\",\"kunnr\":\"0000000373\"},{\"mandt\":\"172\",\"kunnr\":\"0000000375\"},{\"mandt\":\"172\",\"kunnr\":\"0000000377\"},{\"mandt\":\"172\",\"kunnr\":\"0000000380\"},{\"mandt\":\"172\",\"kunnr\":\"0000000381\"},{\"mandt\":\"172\",\"kunnr\":\"0000000383\"},{\"mandt\":\"172\",\"kunnr\":\"0000000384\"},{\"mandt\":\"172\",\"kunnr\":\"0000000386\"},{\"mandt\":\"172\",\"kunnr\":\"0000000387\"},{\"mandt\":\"172\",\"kunnr\":\"0000000391\"},{\"mandt\":\"172
\",\"kunnr\":\"0000000393\"},{\"mandt\":\"172\",\"kunnr\":\"0000000396\"},{\"mandt\":\"172\",\"kunnr\":\"0000000397\"},{\"mandt\":\"172\",\"kunnr\":\"0000000408\"},{\"mandt\":\"172\",\"kunnr\":\"0000000416\"},{\"mandt\":\"172\",\"kunnr\":\"0000000421\"},{\"mandt\":\"172\",\"kunnr\":\"0000000424\"},{\"mandt\":\"172\",\"kunnr\":\"0000000425\"},{\"mandt\":\"172\",\"kunnr\":\"0000000428\"},{\"mandt\":\"172\",\"kunnr\":\"0000000443\"},{\"mandt\":\"172\",\"kunnr\":\"0000000447\"},{\"mandt\":\"172\",\"kunnr\":\"0000000453\"},{\"mandt"}
**{"JSON_F52E2B61-18A1-11d1-B105-00805F49916B"**:"\":\"172\",\"kunnr\":\"0000000475\"},{\"mandt\":\"172\",\"kunnr\":\"0000000478\"},{\"mandt\":\"172\\",\"kunnr\":\"2100000001\",\"knvp\":[{\"vkorg\":\"Y200\",\"vtweg\":\"Z1\"},{\"vkorg\":\"Y200\",\"vtweg\":\"Z1\"},{\"vkorg\":\"Y200\",\"vtweg\":\"Z1\"},{\"vkorg\":\"Y200\",\"vtweg\":\"Z1\"}]},{\"mandt\":\"172\",\"kunnr\":\"2100000002\",\"knvp\":[{\"vkorg\":\"Y200\",\"vtweg\":\"Z1\"},{\"vkorg\":\"Y200\",\"vtweg\":\"Z1\"},{\"vkorg\":\"Y200\",\"vtweg\":\"Z1\"},{\"vkorg\":\"Y200\",\"vtweg\":\"Z1\"}]}]"}
Please help me understand how I can get the entire message in a proper format.
If you just want to save the query result to ADLS in JSON format, you'd better remove FOR JSON PATH; we can use Azure Data Factory itself to generate nested JSON.
I created a simple test with two tables. Table Entities and table EntitiesEmails are related through the Id and EntitiyId fields. Usually we could use the following SQL to return a nested JSON array, but ADF will automatically add escape characters ('\') to escape the double quotes.
SELECT
ent.Id AS 'Id',
ent.Name AS 'Name',
ent.Age AS 'Age',
EMails = (
SELECT
Emails.EntitiyId AS 'Id',
Emails.Email AS 'Email'
FROM EntitiesEmails Emails WHERE Emails.EntitiyId = ent.Id
FOR JSON PATH
)
FROM Entities ent
FOR JSON PATH
I worked out how to use a Data Flow to generate a nested JSON array, as follows:
Set source1 to the SQL table Entities .
Set source2 to the SQL table EntitiesEmails .
At the Aggregate1 activity, set the Group By column.
In Aggregates, enter the expression collect(Email). It will collect all email addresses into an array.
Data preview is as follows:
Then we can join these two streams at Join1 activity.
Then we can filter out extra columns at Select1 activity.
Then we can sink the result to our json file in ADLS.
Debug output is as follows:
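The Data Flow steps above amount to a group-and-join: collect the child rows into an array per key, then attach that array to each parent row. A Python sketch of the same shape (table and column names taken from the example, data made up):

```python
import json
from collections import defaultdict

entities = [{"Id": 1, "Name": "A", "Age": 30},
            {"Id": 2, "Name": "B", "Age": 40}]
emails = [{"EntitiyId": 1, "Email": "a@example.com"},
          {"EntitiyId": 1, "Email": "a2@example.com"}]

# Aggregate1: group by EntitiyId and collect(Email) into an array
collected = defaultdict(list)
for e in emails:
    collected[e["EntitiyId"]].append(e["Email"])

# Join1 + Select1: attach the collected array to each parent row
nested = [dict(ent, EMails=collected.get(ent["Id"], [])) for ent in entities]

# Sink: newline-delimited JSON, no escaped quotes
print("\n".join(json.dumps(row) for row in nested))
```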

PG::InvalidParameterValue: ERROR: cannot extract element from a scalar

I'm fetching data from the JSON column by using the following query.
SELECT id FROM ( SELECT id,JSON_ARRAY_ELEMENTS(shipment_lot::json) AS js2 FROM file_table WHERE ('shipment_lot') = ('shipment_lot') ) q WHERE js2->> 'invoice_number' LIKE ('%" abc1123"%')
My Postgresql version is 9.3
Saved data in JSON column:
[{ "id"=>2981, "lot_number"=>1, "activate"=>true, "invoice_number"=>"abc1123", "price"=>378.0}]
However, I'm getting this error:
ActiveRecord::StatementInvalid (PG::InvalidParameterValue: ERROR: cannot extract element from a scalar:
SELECT id FROM
( SELECT id,JSON_ARRAY_ELEMENTS(shipment_lot::json)
AS js2 FROM file_heaps
WHERE ('shipment_lot') = ('shipment_lot') ) q
WHERE js2->> 'invoice_number' LIKE ('%abc1123%'))
How can I solve this issue?
Your issue is that you have invalid JSON stored.
If you try running your example data in Postgres, it will not parse:
SELECT ('[{ "id"=>2981, "lot_number"=>1, "activate"=>true, "invoice_number"=>"abc1123", "price"=>378.0}]')::json
This is the JSON formatted correctly:
SELECT ('[{ "id":2981, "lot_number":1, "activate":true, "invoice_number":"abc1123", "price":378.0}]')::json
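The stored value is a Ruby hash dump ("=>" instead of ":"), which is why it is not valid JSON. If fixing the code that writes the column isn't an option, a one-off rewrite can be sketched in Python (a hypothetical cleanup, assuming "=>" never appears inside string values):

```python
import json

# The value as stored: Ruby hash syntax, not JSON
stored = '[{ "id"=>2981, "lot_number"=>1, "activate"=>true, "invoice_number"=>"abc1123", "price"=>378.0}]'

# '=>' is Ruby's hash separator; JSON uses ':'
fixed = stored.replace("=>", ":")
rows = json.loads(fixed)
print(rows[0]["invoice_number"])  # abc1123
```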

bigquery joins on nested repeated

I am having trouble joining on a repeated nested field while still preserving the original row structure in BigQuery.
For my example I'll call the two tables being joined A and B.
Records in table A look something like:
{
"url":"some url",
"repeated_nested": [
{"key":"some key","property":"some property"}
]
}
and records in table B look something like:
{
"key":"some key",
"property2": "another property"
}
I am hoping to find a way to join this data together to generate a row that looks like:
{
"url":"some url",
"repeated_nested": [
{
"key":"some key",
"property":"some property",
"property2":"another property"
}
]
}
The very first query I tried was:
SELECT
url, repeated_nested.key, repeated_nested.property, repeated_nested.property2
FROM A
AS lefttable
LEFT OUTER JOIN B
AS righttable
ON lefttable.key=righttable.key
This doesn't work because BQ can't join on repeated nested fields. There is not a unique identifier for each row. If I were to do a FLATTEN on repeated_nested then I'm not sure how to get the original row put back together correctly.
The data is such that a url will always have the same repeated_nested field with it. Because of that, I was able to make a workaround using a UDF to sort of roll up this repeated nested object into a JSON string and then unroll it again:
SELECT url, repeated_nested.key, repeated_nested.property, repeated_nested.property2
FROM
JS(
(
SELECT basetable.url as url, repeated_nested
FROM A as basetable
LEFT JOIN (
SELECT url, CONCAT("[", GROUP_CONCAT_UNQUOTED(repeated_nested_json, ","), "]") as repeated_nested
FROM
(
SELECT
url,
CONCAT(
'{"key": "', repeated_nested.key, '",',
' "property": "', repeated_nested.property, '",',
' "property2": "', mapping_table.property2, '"',
'}'
)
) as repeated_nested_json
FROM (
SELECT
url, repeated_nested.key, repeated_nested.property
FROM A
GROUP BY url, repeated_nested.key, repeated_nested.property
) as urltable
LEFT OUTER JOIN [SDF.alchemy_to_ric]
AS mapping_table
ON urltable.repeated_nested.key=mapping_table.key
)
GROUP BY url
) as companytable
ON basetable.url = companytable.url
),
// input columns:
url, repeated_nested_json,
// output schema:
"[{'name': 'url', 'type': 'string'},
{'name': 'repeated_nested_json', 'type': 'RECORD', 'mode':'REPEATED', 'fields':
[ { 'name': 'key', 'type':'string' },
{ 'name': 'property', 'type':'string' },
{ 'name': 'property2', 'type':'string' }]
}]",
// UDF:
"function(row, emit) {
parsed_repeated_nested = [];
try {
if ( row.repeated_nested_json != null ) {
parsed_repeated_nested = JSON.parse(row.repeated_nested_json);
}
} catch (ex) { }
emit({
url: row.url,
repeated_nested: parsed_repeated_nested
});
}"
)
This solution works fine for small tables. But the real life tables I'm working with have many more columns than in my example above. When there are other fields in addition to url and repeated_nested_json they all have to be passed through the UDF. When I work with tables that are around the 50 gb range everything is fine. But when I apply the UDF and query to tables that are 500-1000 gb, I get an Internal Server Error from BQ.
In the end I just need all of the data in newline-delimited JSON format in GCS. As a last-ditch effort I tried concatenating all of the fields into a JSON string (so that I only had 1 column) in the hopes that I could export it as CSV and have what I need. However, the export process escaped the double quotes and added double quotes around the JSON string. According to the BQ docs on jobs (https://cloud.google.com/bigquery/docs/reference/v2/jobs) there is a property configuration.query.tableDefinitions.(key).csvOptions.quote that could help me. But I can't figure out how to make it work.
Does anybody have advice on how they have dealt with this sort of situation?
I have never had to do this, but you should be able to use flatten, then join, then use nest to get repeated fields again.
The docs state that BigQuery always flattens query results, but that appears to be false: you can choose to not have results flattened if you set a destination table. You should then be able to export that table as JSON to Storage.
See also this answer for how to get nest to work.
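The flatten → join → re-nest round trip suggested above can be sketched in Python (made-up rows matching the example records, not BigQuery code):

```python
from itertools import groupby

rows_a = [{"url": "some url",
           "repeated_nested": [{"key": "some key", "property": "some property"}]}]
table_b = {"some key": "another property"}  # key -> property2

# FLATTEN: one row per (url, nested element)
flat = [{"url": r["url"], **n} for r in rows_a for n in r["repeated_nested"]]

# JOIN on key
joined = [dict(f, property2=table_b.get(f["key"])) for f in flat]

# NEST: group back by url into a repeated field
joined.sort(key=lambda r: r["url"])
nested = [{"url": url,
           "repeated_nested": [{k: v for k, v in r.items() if k != "url"}
                               for r in grp]}
          for url, grp in groupby(joined, key=lambda r: r["url"])]
print(nested[0]["repeated_nested"][0]["property2"])  # another property
```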
#AndrewBackes - we rolled out some fixes for UDF memory-related issues this week; there are some details on the root cause here https://stackoverflow.com/a/36275462/5265394 and here https://stackoverflow.com/a/35563562/5265394.
The UDF version of your query is now working for me; could you verify on your side?