I have load the entire json file into a STRING column of BigQuery table. Now I am trying to access the keys using JSON_EXTRACT_SCALAR function, but I am getting null result for the child keys which contain special character period(".") within their name.
Here's the snippet of the data:
{"server_received_time":"2019-01-17 15:00:00.482000","app":161,"device_carrier":null,"$schema":12,"city":"Caro","user_id":null,"uuid":"9018","event_time":"2019-01-17 15:00:00.045000","platform":"Web","os_version":"49","vendor_id":711,"processed_time":"2019-01-17 15:00:00.817195","user_creation_time":"2018-11-01 19:16:34.971000","version_name":null,"ip_address":null,"paying":null,"dma":null,"group_properties":{},"user_properties":{"location.radio":"ca","vendor.userTier":"free","vendor.userID":"a989","user.id":"a989","user.tier":"free","location.region":"ca"},"client_upload_time":"2019-01-17 15:00:00.424000","$insert_id":"e8410","event_type":"LOADED","library":"amp\/4.5.2","vendor_attribution_ids":null,"device_type":"Mac","device_manufacturer":null,"start_version":null,"location_lng":null,"server_upload_time":"2019-01-17 15:00:00.493000","event_id":64,"location_lat":null,"os_name":"Chrome","vendor_event_type":null,"device_brand":null,"groups":{},"event_properties":{"content.authenticated":false,"content.subsection1":"regions","custom.DNT":true,"content.subsection2":"ca","referrer.url":"","content.url":"","content.type":"index","content.title":"","custom.cookiesenabled":true,"app.pillar":"feed","content.area":"news","app.name":"oc"},"data":{},"device_id":"","language":"English","device_model":"Mac","country":"","region":"","is_attribution_event":false,"adid":null,"session_id":15,"device_family":"Mac","sample_rate":null,"idfa":null,"client_event_time":"2019-01-17 14:59:59.987000"}
{"server_received_time":"2019-01-17 15:00:00.913000","app":161,"device_carrier":null,"$schema":12,"city":"Fo","user_id":null,"uuid":"9052","event_time":"2019-01-17 15:00:00.566000","platform":"Web","os_version":"71","vendor_id":797,"processed_time":"2019-01-17 15:00:01.301936","user_creation_time":"2019-01-17 15:00:00.566000","version_name":null,"ip_address":null,"paying":null,"dma":"CO","group_properties":{},"user_properties":{"user.tier":"free"},"client_upload_time":"2019-01-17 15:00:00.157000","$insert_id":"69ae","event_type":"START WEB SESSION","library":"amp\/4.5.2","vendor_attribution_ids":null,"device_type":"Android","device_manufacturer":null,"start_version":null,"location_lng":null,"server_upload_time":"2019-01-17 15:00:00.925000","event_id":1,"location_lat":null,"os_name":"Chrome Mobile","vendor_event_type":null,"device_brand":null,"groups":{},"event_properties":{"content.subsection3":"home","content.subsection2":"archives","content.title":"","content.keywords.subject":["Lifestyle\/Recreation and leisure\/Outdoor recreation\/Boating","Lifestyle\/Relationships\/Couples","General news\/Weather","Oddities"],"content.publishedtime":154687,"app.name":"oc","referrer.url":"","content.subsection1":"archives","content.url":"","content.authenticated":false,"content.keywords.location":["Ot"],"content.originaltitle":"","content.type":"story","content.authors":["Archives"],"app.pillar":"feed","content.area":"news","content.id":"1.49","content.updatedtime":1546878600538,"content.keywords.tag":["24 1","boat house","Ot","Rockcliffe","River","m"],"content.keywords.person":["Ber","Shi","Jea","Jean\u00e9tien"]},"data":{"first_event":true},"device_id":"","language":"English","device_model":"Android","country":"","region":"","is_attribution_event":false,"adid":null,"session_id":15477,"device_family":"Android","sample_rate":null,"idfa":null,"client_event_time":"2019-01-17 14:59:59.810000"}
{"server_received_time":"2019-01-17 15:00:00.913000","app":16,"device_carrier":null,"$schema":12,"city":"","user_id":null,"uuid":"905","event_time":"2019-01-17 15:00:00.574000","platform":"Web","os_version":"71","vendor_id":7973,"processed_time":"2019-01-17 15:00:01.301957","user_creation_time":"2019-01-17 15:00:00.566000","version_name":null,"ip_address":null,"paying":null,"dma":"DCO","group_properties":{},"user_properties":{"user.tier":"free"},"client_upload_time":"2019-01-17 15:00:00.157000","$insert_id":"d045","event_type":"LOADED","library":"am-js\/4.5.2","vendor_attribution_ids":null,"device_type":"Android","device_manufacturer":null,"start_version":null,"location_lng":null,"server_upload_time":"2019-01-17 15:00:00.925000","event_id":2,"location_lat":null,"os_name":"Chrome Mobile","vendor_event_type":null,"device_brand":null,"groups":{},"event_properties":{"content.subsection3":"home","content.subsection2":"archives","content.subsection1":"archives","content.keywords.subject":["Lifestyle\/Recreation and leisure\/Outdoor recreation\/Boating","Lifestyle\/Relationships\/Couples","General news\/Weather","Oddities"],"content.type":"story","content.keywords.location":["Ot"],"app.pillar":"feed","app.name":"oc","content.authenticated":false,"custom.DNT":false,"content.id":"1.4","content.keywords.person":["Ber","Shi","Jea","Je\u00e9tien"],"content.title":"","content.url":"","content.originaltitle":"","custom.cookiesenabled":true,"content.authors":["Archives"],"content.publishedtime":1546878600538,"referrer.url":"","content.area":"news","content.updatedtime":1546878600538,"content.keywords.tag":["24 1","boat house","O","Rockcliffe","River","pr"]},"data":{},"device_id":"","language":"English","device_model":"Android","country":"","region":"","is_attribution_event":false,"adid":null,"session_id":1547737199081,"device_family":"Android","sample_rate":null,"idfa":null,"client_event_time":"2019-01-17 14:59:59.818000"}
Here's the sample query against the table:
SELECT
CAST(JSON_EXTRACT_SCALAR(data,'$.uuid')AS INT64) AS uuid_id,
CAST(JSON_EXTRACT_SCALAR(data,'$.event_time') AS TIMESTAMP) AS event_time,
JSON_EXTRACT_SCALAR(data,'$[event_properties].app.name') AS app_name,
JSON_EXTRACT_SCALAR(data,'$[user_properties].user.tier') AS user_tier
FROM
mytable
Above query give null result for app_name & user_tier columns even though data exists for them.
Following the BigQuery JSON function documentation - JSON Functions in Standard SQL
In cases where a JSON key uses invalid JSONPath characters, you can escape those characters using single quotes and brackets, [' '].
and running the query as:
SELECT
CAST(JSON_EXTRACT_SCALAR(data,"$.uuid_id")AS INT64) AS uuid_id,
CAST(JSON_EXTRACT_SCALAR(data,"$.event_time") AS TIMESTAMP) AS event_time,
JSON_EXTRACT_SCALAR(data,"$.event_properties.['app.name']") AS app_name,
JSON_EXTRACT_SCALAR(data,"$.user_properties.['user.tier']") AS user_tier
FROM
mytable
result into following error:
Invalid token in JSONPath at: .['app.name']
Please advise. What am I missing here?
You have an extra . before the [. Use
"$.event_properties['app.name']"
I've got some problems with extracting values from nested json values in column.
I've got a column of data with values that looks almost like nested json, but some of jsons got \ between values and I need to clean them.
JSON looks like this:
{"mopub_json":
"{\"currency\":\"USD\",
\"country\":\"US\",
\"publisher_revenue\":0.01824}
"}
I need to get currency and publisher revenue as different columns and try this:
SET json_serialization_enable TO true;
SET json_serialization_parse_nested_strings TO true;
SELECT
JSON_EXTRACT_PATH_TEXT(column_name, 'mopub_json', 'publisher_revenue') as revenue_mopub,
JSON_EXTRACT_PATH_TEXT(column_name, 'mopub_json', 'currency') as currency_mopub
FROM(
SELECT replace(column_name, "\t", '')
FROM table_name)
I receive the next error:
[Amazon](500310) Invalid operation: column "\t" does not exist in events
When I'm trying this:
SET json_serialization_parse_nested_strings TO true;
SELECT
JSON_EXTRACT_PATH_TEXT(column_name, 'mopub_json', 'publisher_revenue') as revenue_mopub,
JSON_EXTRACT_PATH_TEXT(column_name, 'mopub_json', 'currency') as currency_mopub
FROM(
SELECT replace(column_name, chr(92), '')
FROM table_name)
I receive
Invalid operation: JSON parsing error
When I'm trying to extract values without replacing , I'm receiving empty columns.
Thank you for your help!
So your json isn't valid. JSON doesn't allow multiline text strings but I expect that the issue. Based on your query I think you don't want a single key and string but the whole structure. The reason the that quotes are backslashed is because they are inside a string. The json should look like:
{
"mopub_json": {
"currency": "USD",
"country": "US",
"publisher_revenue": 0.01824
}
}
Then the SQL you have should work.