I am using the following SQL (from another question), which contains temporary functions:
create temp function extract_keys(input string) returns array<string> language js as """
return Object.keys(JSON.parse(input));
""";
create temp function extract_values(input string) returns array<string> language js as """
return Object.values(JSON.parse(input));
""";
create temp function extract_all_leaves(input string) returns string language js as '''
function flattenObj(obj, parent = '', res = {}){
for(let key in obj){
let propName = parent ? parent + '.' + key : key;
if(typeof obj[key] == 'object'){
flattenObj(obj[key], propName, res);
} else {
res[propName] = obj[key];
}
}
return JSON.stringify(res);
}
return flattenObj(JSON.parse(input));
''';
select col || replace(replace(key, 'value', ''), '.', '-') as col, value,
from your_table,
unnest([struct(extract_all_leaves(data) as json)]),
unnest(extract_keys(json)) key with offset
join unnest(extract_values(json)) value with offset
using(offset)
I want to save the above query as a view, but a view cannot include temporary functions, so I planned to define these as persistent user-defined functions that can be called from the view.
When defining the functions, I'm having some trouble getting the input and output types right. Here are the three user-defined functions.
CREATE OR REPLACE FUNCTION `dataset.json_extract_all_leaves`(input STRING)
RETURNS String
LANGUAGE js AS """
function flattenObj(obj, parent = '', res = {}){
for(let key in obj){
let propName = parent ? parent + '.' + key : key;
if(typeof obj[key] == 'object'){
flattenObj(obj[key], propName, res);
} else {
res[propName] = obj[key];
}
}
return JSON.stringify(res);
}
return flattenObj(JSON.parse(input));
"""
CREATE OR REPLACE FUNCTION `dataset.json_extract_keys`(input String)
RETURNS Array<String>
LANGUAGE js AS """
return Object.keys(JSON.parse(input));
"""
And finally:
CREATE OR REPLACE FUNCTION `dataset.json_extract_values`(input STRING)
RETURNS Array<String>
LANGUAGE js AS """
return Object.values(JSON.parse(input));
"""
Those three functions are created successfully, but when I come to use them in this view
WITH extract_all AS (
select
id,
field,
created,
key || replace(replace(key, 'value', ''), '.', '-') as key_name, value,
FROM `dataset.raw_keys_and_values`,
unnest([struct(`dataset.json_extract_all_leaves`(setting_value) as json)]),
unnest(`dataset.json_extract_keys`(json)) key with offset
join unnest(`dataset.json_extract_values`(json)) value with offset
using(offset)
)
SELECT *
FROM
extract_all
This fails with the following error
Error: Multiple errors occurred during the request. Please see the `errors` array for complete details. 1. Failed to coerce output value "{\"value\":true}" to type ARRAY<STRING>
I understand there's a mismatch somewhere between the expected and actual return value of json_extract_values, but I can't work out whether the problem is in the SQL or in the JavaScript UDF.
Revised Answer
I've given the original question another read and contrasted it with some experimentation in my own test dataset.
While I'm unable to reproduce the given error, I did run into related difficulty with the following line:
unnest([struct(`dataset.json_extract_all_leaves`(setting_value) as json)]),
Put simply, the function being called takes a string (presumably a stringified JSON value) and returns a similarly stringified JSON value as its result. Because UNNEST can only be used with arrays, the author wraps the output in [struct( ... )], which may be the issue. To yield the same result as I do below, but using the original functions, I would propose updating the SQL block to the following:
create temp function extract_keys(input string) returns array<string> language js as """
return Object.keys(JSON.parse(input));
""";
create temp function extract_values(input string) returns array<string> language js as """
return Object.values(JSON.parse(input));
""";
create temp function extract_all_leaves(input string) returns string language js as '''
function flattenObj(obj, parent = '', res = {}){
for(let key in obj){
let propName = parent ? parent + '.' + key : key;
if(typeof obj[key] == 'object'){
flattenObj(obj[key], propName, res);
} else {
res[propName] = obj[key];
}
}
return JSON.stringify(res);
}
return flattenObj(JSON.parse(input));
''';
WITH extract_all AS (
select
id,
field,
created,
properties
FROM
UNNEST([
STRUCT<id int, field string, created DATE, properties string>(1, 'michael', DATE(2022, 5, 1), '[[{"name":"Andy","age":7},{"name":"Mark","age":5},{"name":"Courtney","age":6}], [{"name":"Austin","age":8},{"name":"Erik","age":6},{"name":"Michaela","age":6}]]'),
STRUCT<id int, field string, created DATE, properties string>(2, 'sarah', DATE(2022, 5, 2), '[{"name":"Angela","age":9},{"name":"Ryan","age":7},{"name":"Andrew","age":7}]'),
STRUCT<id int, field string, created DATE, properties string>(3, 'rosy', DATE(2022, 5, 3), '[{"name":"Brynn","age":4},{"name":"Cameron","age":3},{"name":"Rebecca","age":5}]')
])
AS myData
)
SELECT
id,
field,
created,
key,
value
FROM (
SELECT
*
FROM extract_all,
UNNEST(extract_keys(extract_all_leaves(properties))) key WITH OFFSET
JOIN UNNEST(extract_values(extract_all_leaves(properties))) value WITH OFFSET
USING(OFFSET)
)
Put simply: remove the extract_all_leaves line with its array wrapping, perform that call inside the offset-joined pair of key and value UNNESTs, and then put all of that in a subquery so you can cleanly pull out just the columns you want.
And to explicitly answer the question asked: I believe the issue is in the SQL, because of the type casting in the offending line and my own inability to get it to pair cleanly with the subsequent UNNEST queries against its output.
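As an aside, the flattening logic inside extract_all_leaves can be exercised standalone in Node (outside BigQuery) to see the shape of the string it returns; the sample input here is mine, not from the original question:

```javascript
// Same flattening logic as the extract_all_leaves UDF body, runnable in
// Node. Nested keys are joined with '.', and the result is a stringified
// flat object.
function flattenObj(obj, parent = '', res = {}) {
  for (let key in obj) {
    let propName = parent ? parent + '.' + key : key;
    if (typeof obj[key] == 'object') {
      flattenObj(obj[key], propName, res);
    } else {
      res[propName] = obj[key];
    }
  }
  return JSON.stringify(res);
}

console.log(flattenObj(JSON.parse('{"a": {"value": true}, "b": 1}')));
// {"a.value":true,"b":1}
```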
Original Answer
I gather that you've got some sort of JSON object in your setting_value field and you're trying to sift out a result that shows the keys and values of that object alongside the other columns in your dataset.
As others mentioned in the comments, it's a bit of a puzzle to figure out precisely why your query isn't working without any sample data, so I'm happy to revisit this if you can provide a record or two to validate against. In lieu of that, I've created some sample records intended to be in the same spirit as what you provided, and here's an end-to-end query that yields my guess at what you're aiming for.
Based on your use of joining by the offset, I'm supposing that you're really just wanting to see all the keys and their values, paired with the other columns. Assuming that's true, I propose using a different JavaScript function that yields an array of all the key/value pairs instead of two separate functions to yield their own arrays. It simplifies the query (and more importantly, works):
create temp function extract_all_leaves(input string) returns string language js as r'''
function flattenObj(obj, parent = '', res = {}){
for(let key in obj){
let propName = parent ? parent + '.' + key : key;
if(typeof obj[key] == 'object'){
flattenObj(obj[key], propName, res);
} else {
res[propName] = obj[key];
}
}
return JSON.stringify(res);
}
return flattenObj(JSON.parse(input));
''';
create temp function extract_key_values(input string) returns array<struct<key string, value string>> language js as r"""
var parsed = JSON.parse(input);
var keys = Object.keys(parsed);
var result = [];
for (var ii = 0; ii < keys.length; ii++) {
var o = {key: keys[ii], value: parsed[keys[ii]]};
result.push(o);
}
return result;
""";
WITH extract_all AS (
select
id,
field,
created,
properties
FROM
UNNEST([
--STRUCT<id int, field string, created DATE, properties string>(1, 'michael', DATE(2022, 5, 1), '[[{"name":"Andy","age":7},{"name":"Mark","age":5},{"name":"Courtney","age":6}], [{"name":"Austin","age":8},{"name":"Erik","age":6},{"name":"Michaela","age":6}]]'),
STRUCT<id int, field string, created DATE, properties string>(2, 'sarah', DATE(2022, 5, 2), '[{"name":"Angela","age":9},{"name":"Ryan","age":7},{"name":"Andrew","age":7}]'),
STRUCT<id int, field string, created DATE, properties string>(3, 'rosy', DATE(2022, 5, 3), '[{"name":"Brynn","age":4},{"name":"Cameron","age":3},{"name":"Rebecca","age":5}]')
])
AS myData
)
SELECT
id,
field,
created,
key,
value
FROM (
SELECT
*
FROM extract_all
CROSS JOIN UNNEST(extract_key_values(extract_all_leaves(properties)))
)
And I believe this yields a result more like what you're seeking:
id | field | created    | key    | value
---|-------|------------|--------|--------
2  | sarah | 2022-05-02 | 0.name | Angela
2  | sarah | 2022-05-02 | 0.age  | 9
2  | sarah | 2022-05-02 | 1.name | Ryan
2  | sarah | 2022-05-02 | 1.age  | 7
2  | sarah | 2022-05-02 | 2.name | Andrew
2  | sarah | 2022-05-02 | 2.age  | 7
3  | rosy  | 2022-05-03 | 0.name | Brynn
3  | rosy  | 2022-05-03 | 0.age  | 4
3  | rosy  | 2022-05-03 | 1.name | Cameron
3  | rosy  | 2022-05-03 | 1.age  | 3
3  | rosy  | 2022-05-03 | 2.name | Rebecca
3  | rosy  | 2022-05-03 | 2.age  | 5
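For reference, the extract_key_values logic can also be run standalone in Node to see the array of {key, value} structs it emits; the sample input here is illustrative (note that in Node the numeric values stay numbers, whereas BigQuery coerces them to STRING per the declared return type):

```javascript
// Same logic as the extract_key_values UDF body: turn a flat JSON object
// into an array of {key, value} pairs, preserving key order.
function extractKeyValues(input) {
  var parsed = JSON.parse(input);
  var keys = Object.keys(parsed);
  var result = [];
  for (var ii = 0; ii < keys.length; ii++) {
    result.push({ key: keys[ii], value: parsed[keys[ii]] });
  }
  return result;
}

console.log(extractKeyValues('{"0.name":"Angela","0.age":9}'));
// [ { key: '0.name', value: 'Angela' }, { key: '0.age', value: 9 } ]
```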
Of course, let me know if this isn't at all in the direction of where you're trying to get to.
Using this JavaScript code, we can remove accents/diacritics from a string:
var originalText = "éàçèñ"
var result = originalText.normalize('NFD').replace(/[\u0300-\u036f]/g, "")
console.log(result) // eacen
If we create a BigQuery UDF with the same code, it does not work (even with doubled backslashes in the escape sequences):
CREATE OR REPLACE FUNCTION project.remove_accent(x STRING)
RETURNS STRING
LANGUAGE js AS """
return x.normalize("NFD").replace(/[\u0300-\u036f]/g, "");
""";
SELECT project.remove_accent("éàçèñ") --"éàçèñ"
Any thoughts on that?
Consider below approach
select originalText,
regexp_replace(normalize(originalText, NFD), r"\pM", '') output
If applied to the sample data in your question, the output is eacen.
You can easily wrap it in a SQL UDF if you wish.
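A minimal sketch of that wrapping, assuming a project and dataset named `project.dataset` (adjust to your own):

```sql
CREATE OR REPLACE FUNCTION `project.dataset.remove_accent`(originalText STRING)
AS (
  -- Decompose to NFD, then strip combining marks (\pM)
  REGEXP_REPLACE(NORMALIZE(originalText, NFD), r"\pM", '')
);

SELECT `project.dataset.remove_accent`("éàçèñ");  -- eacen
```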
I have a procedure on Snowflake which executes the following query:
select
array_size(split($1, ',')) as NO_OF_COL,
split($1, ',') as COLUMNS_ARRAY
from
@mystage/myfile.csv(file_format => 'ONE_COLUMN_FILE_FORMAT')
limit 1;
And the result would be like:
When I run this query in a procedure:
CREATE OR REPLACE PROCEDURE ADD_TEMPORARY_TABLE(TEMP_TABLE_NAME STRING, FILE_FULL_PATH STRING, ONE_COLUMN_FORMAT_FILE STRING, FILE_FORMAT_NAME STRING)
RETURNS variant
LANGUAGE JAVASCRIPT
EXECUTE AS CALLER
AS
$$
try{
var final_result = [];
var nested_obj = {};
var nbr_rows = 0;
var NO_OF_COL = 0;
var COLUMNS_ARRAY = [];
var get_length_and_columns_array = "select array_size(split($1,',')) as NO_OF_COL, "+
"split($1,',') as COLUMNS_ARRAY from "+FILE_FULL_PATH+" "+
"(file_format=>"+ONE_COLUMN_FORMAT_FILE+") limit 1";
var stmt = snowflake.createStatement({sqlText: get_length_and_columns_array});
var array_result = stmt.execute();
array_result.next();
//return array_result.getColumnValue('COLUMNS_ARRAY');
NO_OF_COL = array_result.getColumnValue('NO_OF_COL');
COLUMNS_ARRAY = array_result.getColumnValue('COLUMNS_ARRAY');
return COLUMNS_ARRAY;
}
...
$$;
it returns the following error:
{
"code": 100183,
"message": "Given column name/index does not exist: NO_OF_COL",
"stackTraceTxt": "At ResultSet.getColumnValue, line 16 position 29",
"state": "P0000",
"toString": {}
}
The other issue is that if I keep trying, it will eventually return the desired array, but most of the time it returns this error.
The other issue is if I keep trying, it will return the desired array
If it works one time and not another, my educated guess is that the stored procedure is being called from different schemas.
From the documentation on querying a stage:
( FILE_FORMAT => '<namespace>.<named_file_format>' )
If referencing a file format in the current namespace for your user session, you can omit the single quotes around the format identifier.
In the standalone query, the file format name is wrapped in single quotes:
select
array_size(split($1, ',')) as NO_OF_COL,
split($1, ',') as COLUMNS_ARRAY
from
@mystage/myfile.csv(file_format => 'ONE_COLUMN_FILE_FORMAT')
But in the stored procedure body:
"(file_format=>"+ONE_COLUMN_FORMAT_FILE+") limit 1";
--here the text is appended, but without wrapping with ''
=>
"(file_format=>'"+ONE_COLUMN_FORMAT_FILE+"') limit 1";
Suggestion: always provide the file format as a string wrapped in single quotes, preferably prefixed with the namespace: '<schema_name>.<format_name>'.
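A hedged sketch of the corrected statement construction (runnable outside Snowflake; MY_SCHEMA and the sample values are placeholders for your own names):

```javascript
// Build the query text with the file format name single-quoted and
// schema-qualified, so it resolves regardless of the caller's current schema.
var FILE_FULL_PATH = '@mystage/myfile.csv';
var ONE_COLUMN_FORMAT_FILE = 'ONE_COLUMN_FILE_FORMAT';

var get_length_and_columns_array =
  "select array_size(split($1,',')) as NO_OF_COL, " +
  "split($1,',') as COLUMNS_ARRAY from " + FILE_FULL_PATH + " " +
  "(file_format=>'MY_SCHEMA." + ONE_COLUMN_FORMAT_FILE + "') limit 1";

console.log(get_length_and_columns_array);
```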
I have the code block below for parsing query params using a UDF. It works fine when the value passed to the function is hardcoded, as in the example. But when I try to parse the same value fetched from a table, I get:
An internal error occurred and the request could not be completed. (error code: internalError)
CREATE TEMPORARY FUNCTION parse(queryString STRING) RETURNS ARRAY<STRUCT<key STRING, value STRING>> LANGUAGE js AS
"""
var params = {}
var array = []
// split into key/value pairs
var queries = queryString.split('&');
var ind = 0
// convert the array of strings into an object
for (var i = 0; i < queries.length; i++ ) {
var temp = queries[i].split('=');
if(temp.length < 2) continue;
array[ind++] = { key: temp[0], value: decodeURI(temp[1]) }
}
return array;
""";
select parse('ca_chid=2002810&ca_source=gaw&ca_ace=&ca_nw=g&ca_dev=c&ca_pl=&ca_pos=1t3&ca_agid=32438864366&ca_caid=260997846&ca_adid=151983037851&ca_kwt=florists%20in%20walsall&ca_mt=e&ca_fid=&ca_tid=aud-117534990726:kwd-420175760&ca_lp=9045676&ca_li=&ca_devm=&ca_plt=&ca_sadt=&ca_smid=&ca_spc=&ca_spid=&ca_sco=&ca_sla=&ca_sptid=&ca_ssc=&gclid=CLaDoa6ZrdACFcyRGwodG8IFvQ') as params
--not working
--select parse(page_urlquery) from (
--SELECT page_urlquery FROM `query_param_snapshot` where page_urlquery != '' LIMIT 1
--)
This was also reported on the issue tracker (we are working on a fix). One workaround is to use a SQL function rather than a JavaScript function, e.g.:
CREATE TEMPORARY FUNCTION parse(queryString STRING)
RETURNS ARRAY<STRUCT<key STRING, value STRING>> AS (
(SELECT
ARRAY_AGG(STRUCT(
entry[OFFSET(0)] AS key,
entry[OFFSET(1)] AS value))
FROM (
SELECT SPLIT(pairString, '=') AS entry
FROM UNNEST(SPLIT(queryString, '&')) AS pairString)
)
);
SELECT parse('ca_chid=2002810&ca_source=gaw&ca_ace=&ca_nw=g&ca_dev=c&ca_pl=&ca_pos=1t3&ca_agid=32438864366&ca_caid=260997846&ca_adid=151983037851&ca_kwt=florists%20in%20walsall&ca_mt=e&ca_fid=&ca_tid=aud-117534990726:kwd-420175760&ca_lp=9045676&ca_li=&ca_devm=&ca_plt=&ca_sadt=&ca_smid=&ca_spc=&ca_spid=&ca_sco=&ca_sla=&ca_sptid=&ca_ssc=&gclid=CLaDoa6ZrdACFcyRGwodG8IFvQ') AS params;
I'm converting some SQL code from BigQuery legacy SQL to BigQuery Standard SQL.
I can't seem to find JSON_EXTRACT_SCALAR in BigQuery Standard SQL; is there an equivalent?
Edit: we implemented the JSON functions a while back. You can read about them in the documentation.
Not that I know of, but there is always a workaround.
Let's assume we want to mimic the example from the JSON_EXTRACT_SCALAR documentation:
SELECT JSON_EXTRACT_SCALAR('{"a": ["x", {"b":3}]}', '$.a[1].b') as str
Below code does the same:
CREATE TEMPORARY FUNCTION CUSTOM_JSON_EXTRACT(json STRING)
RETURNS STRING
LANGUAGE js AS """
try { var parsed = JSON.parse(json);
} catch (e) { return null }
return parsed.a[1].b;
""";
SELECT CUSTOM_JSON_EXTRACT('{"a": ["x", {"b":3}]}') AS str
I think this can be a good starting point to experiment with.
See more about scalar UDFs in BigQuery Standard SQL.
Quick update
After a cup of coffee, I decided to complete this "exercise" myself.
Looks like a good short-term solution to me :o)
CREATE TEMPORARY FUNCTION CUSTOM_JSON_EXTRACT(json STRING, json_path STRING)
RETURNS STRING
LANGUAGE js AS """
try { var parsed = JSON.parse(json);
} catch (e) { return null }
return eval(json_path.replace("$", "parsed"));
""";
SELECT
CUSTOM_JSON_EXTRACT('{"a": ["x", {"b":3}]}', '$.a[1].b') AS str1,
CUSTOM_JSON_EXTRACT('{"a": ["x", {"b":3}]}', '$.a[0]') AS str2,
CUSTOM_JSON_EXTRACT('{"a": 1, "b": [4, 5]}', '$.b') AS str3
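The UDF body can also be exercised standalone in Node to illustrate the eval-based path lookup (same inputs as the first two columns in the query above):

```javascript
// Sketch of the CUSTOM_JSON_EXTRACT body: rewrite the JSONPath-style
// expression into a property access against the parsed object, then
// evaluate it. Returns null for unparseable JSON, as in the UDF.
function customJsonExtract(json, jsonPath) {
  try { var parsed = JSON.parse(json); }
  catch (e) { return null; }
  return eval(jsonPath.replace("$", "parsed"));
}

console.log(customJsonExtract('{"a": ["x", {"b":3}]}', '$.a[1].b')); // 3
console.log(customJsonExtract('{"a": ["x", {"b":3}]}', '$.a[0]'));   // x
```

Note that eval is fine for a quick experiment like this, but it will happily execute any expression embedded in the path argument, which is worth keeping in mind.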