BigQuery: "An internal error occurred and the request could not be completed" (Error: 80038528)

I am trying to create a function and I don't understand what the problem is. The function installs correctly when:
I delete the PIVOT
I query the table itself rather than only the UNNEST (i.e. from the table, unnest(a))
CREATE OR REPLACE FUNCTION `dataset.function_naming` (a ARRAY<STRUCT<ROW_ID STRING, KEY STRING, VALUE STRING>>, id_one STRING, id_two STRING, start_date DATE, end_date DATE) RETURNS INT64
AS (
with tmp1 as (
select ROW_ID,X,Y,Z,W
from
(
select prop.ROW_ID,prop.KEY, prop.VALUE
from unnest(a) prop
where prop.KEY in ('X','Y','Z','W')
)
PIVOT
(
MAX(VALUE)
FOR UPPER(KEY) in('X','Y','Z','W')
) as PIVOT
)
select case when X is not null then 1,
when Y is not null then 2,
when Z is not null then 2,
when W is not null then 2
else 0
from tmp1
);
Thanks all.

There are a few minor issues I see in your code:
missing extra parentheses ((...)) around the function body
extra commas (,) within the CASE statement, and a missing END
So, try below
CREATE OR REPLACE FUNCTION `dataset.function_naming` (
a ARRAY<STRUCT<ROW_ID STRING, KEY STRING, VALUE STRING>>,
id_one STRING,
id_two STRING,
start_date DATE,
end_date DATE
) RETURNS INT64
AS ((
with tmp1 as (
select ROW_ID,X,Y,Z,W
from
(
select prop.ROW_ID,prop.KEY, prop.VALUE
from unnest(a) prop
where prop.KEY in ('X','Y','Z','W')
)
PIVOT
(
MAX(VALUE)
FOR UPPER(KEY) in('X','Y','Z','W')
) as PIVOT
)
select case when X is not null then 1
when Y is not null then 2
when Z is not null then 2
when W is not null then 2
else 0
end
from tmp1
));

It seems there is an internal issue when using PIVOT together with UNNEST on the array. You can use the following, which executes the same logic. Also, consider opening a case on the issue tracker as a BigQuery issue with Google Cloud Support.
CREATE OR REPLACE FUNCTION `<dataset>.function_naming` (
a ARRAY<STRUCT<ROW_ID STRING, KEY STRING, VALUE STRING>>,
id_one STRING,
id_two STRING,
start_date DATE,
end_date DATE
) RETURNS INT64
AS (( WITH tmp AS (
SELECT
CASE
WHEN KEY="X" THEN 1
WHEN KEY="Y" THEN 2
WHEN KEY="Z" THEN 2
WHEN KEY="W" THEN 2
ELSE
0
END
teste_column
#-- FROM ( SELECT UPPER(prop.KEY) KEY, MAX(prop.VALUE) VALUE FROM -- following your query pattern, but not really necessary
FROM ( SELECT UPPER(prop.KEY) KEY FROM
UNNEST(a) prop
WHERE
UPPER(key) IN ('X', 'Y', 'Z', 'W')
GROUP BY key )
ORDER BY teste_column DESC LIMIT 1 )
SELECT * FROM tmp
UNION ALL
SELECT 0 teste_column
FROM (SELECT 1)
LEFT JOIN tmp
ON FALSE
WHERE NOT EXISTS ( SELECT 1 FROM tmp)
));
#--- Testing the function:
select `<project>.<dataset>.function_naming`([STRUCT("1" AS ROW_ID, "x" AS KEY, "10" AS VALUE), STRUCT("1" AS ROW_ID, "x" AS KEY, "20" AS VALUE), STRUCT("1" AS ROW_ID, "w" AS KEY, "20" AS VALUE), STRUCT("1" AS ROW_ID, "y" AS KEY, "20" AS VALUE)], "1", "2", "2022-12-10", "2022-12-10")

Related

ARRAY_CONCAT() returns empty array

I am compiling a list of values per user from 2 different columns into a single array, like:
with test as (
select 1 as userId, 'something' as value1, cast(null as string) as value2
union all
select 1 as userId, cast(null as string), cast(null as string)
)
select
userId,
ARRAY_CONCAT(
ARRAY_AGG(distinct value1 ignore nulls ),
ARRAY_AGG(distinct value2 ignore nulls )
) as combo,
from test
group by userId
Everything works fine until ARRAY_AGG(), but then ARRAY_CONCAT() just won't have it and returns an empty array [] whereas I expect it to be ['something'].
I am at a loss as to why this is happening and whether I can force a workaround here.
I am at a loss as to why this is happening ...
ARRAY_CONCAT function returns NULL if any input argument is NULL
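You can see this documented behavior in isolation (a minimal sketch, separate from your data; note that an empty array, unlike a NULL one, does not null out the result):

```sql
SELECT
  ARRAY_CONCAT(['a'], CAST(NULL AS ARRAY<STRING>)) AS with_null,  -- NULL
  ARRAY_CONCAT(['a'], ARRAY<STRING>[]) AS with_empty              -- ['a']
```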
... and whether I can force a workaround here
Use below workaround
select
userid,
array_concat(
ifnull(array_agg(distinct value1 ignore nulls ), []),
ifnull(array_agg(distinct value2 ignore nulls ), [])
) as combo,
from test
group by userid
if applied to the sample data in your question, the output is userId 1 with combo = ["something"]

BigQuery: Get the first non-null value in a window?

I would like to obtain the first non-null, non-"undefined" value in a list of values as part of a window.
Minimal example:
Given the following code:
SELECT
FIRST_VALUE(
CASE WHEN val = "undefined" THEN NULL ELSE val END
IGNORE NULLS
)
OVER (ORDER BY order_key)
AS res
FROM (
SELECT 1 AS order_key, CAST(NULL AS STRING) AS val
UNION ALL
SELECT 2 AS order_key, "undefined" AS val
UNION ALL
SELECT 3 AS order_key, "value" AS val
) base
I'd expect
res
value
value
value
as the result set. Yet, the result given by the above is the following:
res
null
null
value
The documentation states the following:
FIRST_VALUE (value_expression [{RESPECT | IGNORE} NULLS])
Returns the value of the value_expression for the first row in the current window frame.
This function includes NULL values in the calculation unless IGNORE NULLS is present. If IGNORE NULLS is present, the function excludes NULL values from the calculation.
Yet it seems like value_expression is not what is tested for NULLs in this case.
It seems that instead FIRST_VALUE checks NULLs against the source field, not the CASE statement (effectively value_expression in the above).
While the problem can easily be fixed by doing the case as part of the subquery, I'd like to better understand why this is an issue. Why does FIRST_VALUE not ignore the NULLs provided through the CASE statement?
Alternative to the logic above:
If you are willing to remodel your query, instead of using a window function (FIRST_VALUE), the same effect can be achieved via an ARRAY_AGG(expr IGNORE NULLS ORDER BY ordering)[OFFSET(0)]:
SELECT
id,
ARRAY_AGG(
CASE WHEN val = 'undefined' THEN NULL ELSE val END
IGNORE NULLS
ORDER BY order_key
)[OFFSET(0)]
AS res
FROM (
SELECT 1 AS id, 1 AS order_key, CAST(NULL AS STRING) AS val
UNION ALL
SELECT 1 AS id, 2 AS order_key, 'undefined' AS val
UNION ALL
SELECT 1 AS id, 3 AS order_key, "value" AS val
UNION ALL
SELECT 2 AS id, 1 AS order_key, CAST(NULL AS STRING) AS val
UNION ALL
SELECT 2 AS id, 2 AS order_key, 'undefined' AS val
UNION ALL
SELECT 2 AS id, 3 AS order_key, "value" AS val
) base
GROUP BY id
Given an empty record set for the group, ARRAY_AGG(...)[OFFSET(0)] will return NULL
Given a non-empty record set for the group, ARRAY_AGG(...)[OFFSET(0)] will return the first result of value_expression that is non-NULL, ordered by the ORDER BY clause provided.
The only downside (besides maybe performance?) is that you'll need to create a common table expression with this logic and then join it with your table that was using window functions.
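The join-back described above could look like this (a sketch; firsts is a hypothetical CTE name and base stands for your source table; SAFE_OFFSET avoids an error if a group's array comes out empty):

```sql
WITH firsts AS (
  SELECT
    id,
    ARRAY_AGG(
      CASE WHEN val = 'undefined' THEN NULL ELSE val END
      IGNORE NULLS ORDER BY order_key
    )[SAFE_OFFSET(0)] AS res
  FROM base
  GROUP BY id
)
SELECT b.*, f.res
FROM base b
LEFT JOIN firsts f USING (id)
```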
To get the expected result you just need to add DESC to the ORDER BY as below. With descending order, the default window frame (from the start of the partition to the current row) contains the non-NULL "value" row for every row:
SELECT
FIRST_VALUE(
CASE WHEN val = "undefined" THEN NULL ELSE val END
IGNORE NULLS
)
OVER (ORDER BY order_key DESC)
AS res
FROM (
SELECT 1 AS order_key, CAST(NULL AS STRING) AS val UNION ALL
SELECT 2 AS order_key, "undefined" AS val UNION ALL
SELECT 3 AS order_key, "value" AS val
) base
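As an aside, the original ascending query behaves this way because of the default window frame, which runs from the start of the partition to the current row, so the frames of the first two rows contain only NULLs. IGNORE NULLS does apply to the CASE expression. Widening the frame explicitly also returns "value" for every row (a sketch on the same sample data):

```sql
SELECT
  FIRST_VALUE(
    CASE WHEN val = "undefined" THEN NULL ELSE val END
    IGNORE NULLS
  )
  OVER (
    ORDER BY order_key
    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
  ) AS res
FROM (
  SELECT 1 AS order_key, CAST(NULL AS STRING) AS val UNION ALL
  SELECT 2 AS order_key, "undefined" AS val UNION ALL
  SELECT 3 AS order_key, "value" AS val
) base
```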
so the result now is "value" for all three rows

Find and replace pattern inside BigQuery string

Here is my BigQuery table. I am trying to find out the URLs that were displayed but not viewed.
create table dataset.url_visits(ID INT64 ,displayed_url string , viewed_url string);
select * from dataset.url_visits;
ID Displayed_URL Viewed_URL
1 url11,url12 url12
2 url9,url12,url13 url9
3 url1,url2,url3 NULL
In this example, I want to display
ID Displayed_URL Viewed_URL unviewed_URL
1 url11,url12 url12 url11
2 url9,url12,url13 url9 url12,url13
3 url1,url2,url3 NULL url1,url2,url3
Split each string into an array and unnest it. Use a CASE to check whether each displayed item appears in the viewed list, then combine the results back into a string or an array.
Select ID, string_agg(viewing ) as viewed,
string_agg(not_viewing ) as not_viewed,
array_agg(viewing ignore nulls) as viewed_array
from (
Select ID ,
case when display in unnest(split(Viewed_URL)) then display else null end as viewing,
case when display in unnest(split(Viewed_URL)) then null else display end as not_viewing,
from (
Select 1 as ID, "url11,url12" as Displayed_URL, "url12" as Viewed_URL UNION ALL
Select 2, "url9,url12,url13", "url9" UNION ALL
Select 3, "url1,url2,url3", NULL UNION ALL
Select 4, "url9,url12,url13", "url9,url12"
),unnest(split(Displayed_URL)) as display
)
group by 1
Consider below approach
select *, (
select string_agg(url)
from unnest(split(Displayed_URL)) url
where url != ifnull(Viewed_URL, '')
) unviewed_URL
from `project.dataset.table`
if applied to the sample data in your question, the output adds an unviewed_URL column of url11, "url12,url13", and "url1,url2,url3" for IDs 1, 2, and 3
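One caveat (an assumption based on the fourth row of the other answer's sample data): if Viewed_URL can itself be a comma-separated list, the inequality against the whole string no longer filters correctly. A sketch that splits both sides and uses NOT IN UNNEST:

```sql
select *, (
  select string_agg(url)
  from unnest(split(Displayed_URL)) url
  where url not in unnest(split(ifnull(Viewed_URL, '')))
) unviewed_URL
from `project.dataset.table`
```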

How to convert a JSON array to multiple columns in Hive

Example:
there is a JSON array column (type: string) in a Hive table like:
"[{"field":"name", "value":"alice"}, {"field":"age", "value":"14"}......]"
how to convert it into :
name age
alice 14
by hive sql?
I've tried lateral view explode but it's not working.
thanks a lot!
This is a working example of how it can be parsed in Hive. Customize it yourself and debug on real data; see the comments in the code:
with your_table as (
select stack(1,
1,
'[{"field":"name", "value":"alice"}, {"field":"age", "value":"14"}, {"field":"something_else", "value":"somevalue"}]'
) as (id,str) --one row table with id and string with json. Use your table instead of this example
)
select id,
max(case when field_map['field'] = 'name' then field_map['value'] end) as name,
max(case when field_map['field'] = 'age' then field_map['value'] end) as age --do the same for all fields
from
(
select t.id,
t.str as original_string,
str_to_map(regexp_replace(regexp_replace(trim(a.field),', +',','),'\\{|\\}|"','')) field_map --remove extra characters and convert to map
from your_table t
lateral view outer explode(split(regexp_replace(regexp_replace(str,'\\[|\\]',''),'\\},','}|'),'\\|')) a as field --remove [], replace "}," with '}|" and explode
) s
group by id --aggregate in single row
;
Result:
OK
id name age
1 alice 14
One more approach using get_json_object:
with your_table as (
select stack(1,
1,
'[{"field":"name", "value":"alice"}, {"field":"age", "value":"14"}, {"field":"something_else", "value":"somevalue"}]'
) as (id,str) --one row table with id and string with json. Use your table instead of this example
)
select id,
max(case when field = 'name' then value end) as name,
max(case when field = 'age' then value end) as age --do the same for all fields
from
(
select t.id,
get_json_object(trim(a.field),'$.field') field,
get_json_object(trim(a.field),'$.value') value
from your_table t
lateral view outer explode(split(regexp_replace(regexp_replace(str,'\\[|\\]',''),'\\},','}|'),'\\|')) a as field --remove [], replace "}," with '}|" and explode
) s
group by id --aggregate in single row
;
Result:
OK
id name age
1 alice 14

How to select columns of data in BigQuery that have all NULL values

How can I select the columns in a BigQuery table that contain only NULL values? For example:
A B C
NULL 1 NULL
NULL NULL NULL
NULL 2 NULL
NULL 3 NULL
I want to retrieve columns A and C. Please can you help!!
Expanding on my comment on Mikhail's answer, this is what I had in mind. It doesn't require generating a query string, which could be quite long if you have a large number of columns. It compares the count of null values for each column name to the total number of rows in the table to decide if the column should be included in the result.
#standardSQL
WITH `project.dataset.table` AS (
SELECT NULL A, 1 B, NULL C UNION ALL
SELECT NULL, NULL, NULL UNION ALL
SELECT NULL, 2, NULL UNION ALL
SELECT NULL, 3, NULL
)
SELECT null_column
FROM `project.dataset.table` AS t,
UNNEST(REGEXP_EXTRACT_ALL(
TO_JSON_STRING(t),
r'\"([a-zA-Z0-9\_]+)\":null')
) AS null_column
GROUP BY null_column
HAVING COUNT(*) = (SELECT COUNT(*) FROM `project.dataset.table`);
Below is for BigQuery StandardSQL
Simple option:
#standardSQL
WITH `project.dataset.table` AS (
SELECT NULL A, 1 B, NULL C UNION ALL
SELECT NULL, NULL, NULL UNION ALL
SELECT NULL, 2, NULL UNION ALL
SELECT NULL, 3, NULL
)
SELECT COUNT(A) A, COUNT(B) B, COUNT(C) C
FROM `project.dataset.table`
it returns the below, where 0 (zero) indicates that the respective column has all NULLs
A B C
0 3 0
If this is "not enough" - below is a more "sophisticated" version:
#standardSQL
WITH `project.dataset.table` AS (
SELECT NULL A, 1 B, NULL C UNION ALL
SELECT NULL, NULL, NULL UNION ALL
SELECT NULL, 2, NULL UNION ALL
SELECT NULL, 3, NULL
)
SELECT SPLIT(y, ':')[OFFSET(0)] column
FROM (
SELECT REGEXP_REPLACE(TO_JSON_STRING(t), r'[{}"]', '') x
FROM (
SELECT COUNT(A) A, COUNT(B) B, COUNT(C) C
FROM `project.dataset.table`
) t
), UNNEST(SPLIT(x)) y
WHERE CAST(SPLIT(y, ':')[OFFSET(1)] AS INT64) = 0
it returns the result below, listing only the columns with all NULLs
column
A
C
Note: for your real table, just remove the WITH block and replace project.dataset.table with your real table reference
Also, of course, use real column names
My table has around 700 columns..
Below is an example of how you can easily generate the above query for any number of columns.
1. Just run the query below
2. Copy the result - this is the generated query
3. Paste the generated query into the new UI and run it
4. Enjoy (I hope you will) the result :o)
Of course, as usual, replace project.dataset.table with your real table reference
#standardSQL
SELECT
CONCAT('''
SELECT SPLIT(y, ':')[OFFSET(0)] column
FROM (
SELECT REGEXP_REPLACE(TO_JSON_STRING(t), r'[{}"]', '') x
FROM (
SELECT ''', y,
'''
FROM `project.dataset.table`
) t
), UNNEST(SPLIT(x)) y
WHERE CAST(SPLIT(y, ':')[OFFSET(1)] AS INT64) = 0
'''
)
FROM (
SELECT
STRING_AGG(CONCAT('COUNT(', x, ') ', x), ', ') y
FROM (
SELECT REGEXP_EXTRACT_ALL(REGEXP_REPLACE(TO_JSON_STRING(t), r'[{}]', ''), r'"([\w_]+)":') x
FROM `project.dataset.table` t
LIMIT 1
), UNNEST(x) x
)
Note: please pay attention to query cost - both the generation query and the final query itself will do a full scan
You can generate the column list much more cheaply off of the table schema in any client of your choice
To test / play with it, you can use the same dummy data as for the initial queries in my answer
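For example, building that COUNT(...) list from the schema rather than from a table scan could be sketched with INFORMATION_SCHEMA (substitute your real dataset and table name):

```sql
SELECT STRING_AGG(CONCAT('COUNT(', column_name, ') ', column_name), ', ') AS y
FROM `project.dataset`.INFORMATION_SCHEMA.COLUMNS
WHERE table_name = 'table'
```

This reads only metadata, so it avoids the full scan of the generation query above.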