I am compiling a list of values per user from 2 different columns into a single array, like this:
with test as (
select 1 as userId, 'something' as value1, cast(null as string) as value2
union all
select 1 as userId, cast(null as string), cast(null as string)
)
select
userId,
ARRAY_CONCAT(
ARRAY_AGG(distinct value1 ignore nulls ),
ARRAY_AGG(distinct value2 ignore nulls )
) as combo,
from test
group by userId
Everything works fine up to ARRAY_AGG(), but then ARRAY_CONCAT() just won't have it and returns an empty array [], whereas I expect it to be ['something'].
I am at a loss as to why this is happening and whether I can force a workaround here.
I am at a loss as to why this is happening ...
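The ARRAY_CONCAT function returns NULL if any input argument is NULL. In your data value2 is NULL in every row, so ARRAY_AGG(distinct value2 ignore nulls) has nothing left to aggregate and returns NULL rather than an empty array. That NULL makes the whole ARRAY_CONCAT result NULL, and BigQuery displays a NULL array in query results as [].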
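As a rough illustration of that NULL propagation, here is a small Python model. It is not BigQuery code: the helper names are hypothetical, None stands in for SQL NULL, and it assumes (as the result in the question suggests) that ARRAY_AGG(... IGNORE NULLS) yields NULL when every value in the group is NULL.

```python
from typing import List, Optional

# Hypothetical Python model of the BigQuery functions used in the query.
# None stands in for SQL NULL, both for single values and for whole arrays.

def array_agg_distinct_ignore_nulls(values: List[Optional[str]]) -> Optional[List[str]]:
    """Model of ARRAY_AGG(DISTINCT v IGNORE NULLS): NULL if no non-NULL value is left."""
    kept: List[str] = []
    for v in values:
        if v is not None and v not in kept:
            kept.append(v)
    return kept or None

def array_concat(*arrays: Optional[List[str]]) -> Optional[List[str]]:
    """Model of ARRAY_CONCAT: a single NULL argument makes the whole result NULL."""
    if any(a is None for a in arrays):
        return None
    out: List[str] = []
    for a in arrays:
        out.extend(a)
    return out

def ifnull(arr: Optional[List[str]], default: List[str]) -> List[str]:
    """Model of IFNULL(arr, default)."""
    return default if arr is None else arr

# The two rows for userId = 1 from the test CTE.
value1 = ["something", None]
value2 = [None, None]

broken = array_concat(array_agg_distinct_ignore_nulls(value1),
                      array_agg_distinct_ignore_nulls(value2))
fixed = array_concat(ifnull(array_agg_distinct_ignore_nulls(value1), []),
                     ifnull(array_agg_distinct_ignore_nulls(value2), []))
print(broken)  # None
print(fixed)   # ['something']
```

The second call mirrors the IFNULL workaround: replacing each NULL aggregate with [] before concatenating keeps the non-empty part.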
... and whether I can force a workaround here
Use the workaround below
select
userid,
array_concat(
ifnull(array_agg(distinct value1 ignore nulls ), []),
ifnull(array_agg(distinct value2 ignore nulls ), [])
) as combo,
from test
group by userid
If applied to the sample data in your question, the output is
I'm trying to summarize data in a table:
counting total rows
counting values on specific fields
getting the distinct values on specific fields
and, more importantly, I'm struggling with:
getting the count for each field nested in an object
given this data
COL1 COL2
A    0
null 1
B    null
B    null
the expected result from this query would be:
with dummy as (
select 'A' as col1, 0 as col2
union all
select null, 1
union all
select 'B', null
union all
select 'B', null
)
select
count(1) as total
,count(col1) as col1
,array_agg(distinct col1) as dist_col1
--,object_construct(???) as col1_object_count
,count(col2) as col2
,array_agg(distinct col2) as dist_col2
--,object_construct(???) as col2_object_count
from
dummy
TOTAL | COL1 | DIST_COL1 | COL1_OBJECT_COUNT | COL2 | DIST_COL2 | COL2_OBJECT_COUNT
4 | 3 | ["A", "B"] | {"A": 1, "B": 2, null: 1} | 2 | [0, 1] | {0: 1, 1: 1, null: 2}
I've tried several functions inside OBJECT_CONSTRUCT mixed with ARRAY_AGG, but all failed.
OBJECT_CONSTRUCT can work with several columns, but only when given all of them (*); if you try a SELECT statement inside it, it fails.
Another issue is that analytic functions are not easily accepted by the object or array functions in Snowflake.
You could use Snowflake Scripting or Snowpark for this, but here's a solution that is somewhat flexible, so you can apply it to different tables and column sets.
Create test table/view:
Create or Replace View dummy as (
select 'A' as col1, 0 as col2
union all
select null, 1
union all
select 'B', null
union all
select 'B', null
);
Set session variables for the table name and the column names.
set tbname = 'DUMMY';
set colnames = '["COL1", "COL2"]';
Create view that generates the required table_column_summary data:
Create or replace View table_column_summary as
with
-- Create table of required column names
cn as (
select VALUE::VARCHAR CNAME
from table(flatten(input => parse_json($colnames)))
)
-- Convert rows into objects
,ro as (
select
object_construct_keep_null(*) row_object
-- using identifier on session variable to dynamically supply table/view name
from identifier($tbname) )
-- Flatten row objects into key/values
,rof as (
select
key col_name,
ifnull(value,'null')::VARCHAR col_value
from ro, lateral flatten(input => row_object), cn
-- You will only need this filter if you need a subset
-- of columns from the source table/query summarised
where col_name = cn.cname)
-- Get the column value distinct value counts
,cdv as (
select col_name,
col_value,
sum(1) col_value_count
from rof
group by 1,2
)
-- and derive required column level stats and combine with cdv
,cv as (
select
(select count(1) from identifier($tbname)) total,
col_name,
object_construct('COL_COUNT', count(col_value) ,
'COL_DIST', array_agg(distinct col_value),
'COL_OBJECT_COUNT', object_agg(col_value,col_value_count)) col_values
from cdv
group by 1,2)
-- Return result
Select * from cv;
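The same pipeline (row objects, flattened key/value pairs, per-value counts, then per-column statistics) can be modelled in a few lines of Python on the four DUMMY rows, as a way to check the intended output. This is a hypothetical sketch, not Snowflake code: None stands in for SQL NULL and, to match the expected result in the question, it leaves NULL out of COL_COUNT and COL_DIST while still counting it in COL_OBJECT_COUNT.

```python
from collections import Counter
from typing import Any, Dict, List, Optional

# The four rows of the DUMMY view as column-name -> value dicts,
# i.e. what object_construct_keep_null(*) produces for each row.
rows: List[Dict[str, Optional[Any]]] = [
    {"COL1": "A", "COL2": 0},
    {"COL1": None, "COL2": 1},
    {"COL1": "B", "COL2": None},
    {"COL1": "B", "COL2": None},
]

def summarize(rows: List[Dict[str, Optional[Any]]], colnames: List[str]) -> Dict[str, Any]:
    summary: Dict[str, Any] = {"TOTAL": len(rows)}
    for col in colnames:
        # Flatten each row object into (col_name, col_value) pairs and count
        # how often each distinct value occurs (the cdv step).
        value_counts = Counter(row[col] for row in rows)
        # Derive the column-level statistics (the cv step).
        summary[col] = {
            "COL_COUNT": sum(n for v, n in value_counts.items() if v is not None),
            "COL_DIST": sorted(v for v in value_counts if v is not None),
            "COL_OBJECT_COUNT": dict(value_counts),
        }
    return summary

result = summarize(rows, ["COL1", "COL2"])
print(result)
```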
Use this final query if you want a solution that works flexibly with any table/columns provided as input...
Select total, object_agg(col_name, col_values) col_values_obj
From table_column_summary
Group by 1;
Or use this final query if you want the fixed columns output as described in your question...
Select total,
COL1[0]:COL_COUNT COL1,
COL1[0]:COL_DIST DIST_COL1,
COL1[0]:COL_OBJECT_COUNT COL1_OBJECT_COUNT,
COL2[0]:COL_COUNT COL2,
COL2[0]:COL_DIST DIST_COL2,
COL2[0]:COL_OBJECT_COUNT COL2_OBJECT_COUNT
from table_column_summary
PIVOT ( ARRAY_AGG ( col_values )
FOR col_name IN ( 'COL1', 'COL2' ) ) as pt (total, col1, col2);
I want to count how many matching words two paths have (each path is split at the delimiter /) and return a matching array of integers.
Input data will be something like:
I want to add another column, match_count, with an array of integers. For example:
To replicate this case, this is the query I'm working with:
CREATE TEMP FUNCTION HOW_MANY_MATCHES_IN_PATH(src_path ARRAY<STRING>, test_path ARRAY<STRING>) RETURNS ARRAY<INTEGER> AS (
-- WHAT DO I PUT HERE?
);
SELECT
*,
HOW_MANY_MATCHES_IN_PATH(src_path, test_path) as dir_path_match_count
FROM (
SELECT
ARRAY_AGG(x) AS src_path,
ARRAY_AGG(y) as test_path
FROM
UNNEST([
'lib/client/core.js',
'lib/server/core.js'
]) AS x, UNNEST([
'test/server/core.js'
]) as y
)
I've tried working with ARRAY and UNNEST in the HOW_MANY_MATCHES_IN_PATH function, but I either end up with an error or an array of 4 items (in this example)
Consider the approach below
create temp function how_many_matches_in_path(src_path string, test_path string) returns integer as (
(select count(distinct src)
from unnest(split(src_path, '/')) src,
unnest(split(test_path, '/')) test
where src = test)
);
select *,
array( select how_many_matches_in_path(src, test)
from t.src_path src with offset
join t.test_path test with offset
using(offset)
) dir_path_match_count
from your_table t
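To make the semantics of that query explicit, here is a hypothetical Python model: the UDF counts the distinct '/'-separated segments shared by two paths (the position of a segment inside the path does not matter), and the WITH OFFSET join pairs src_path and test_path elements by array position. The function names mirror the SQL ones but are only illustrative.

```python
from typing import List

def how_many_matches_in_path(src_path: str, test_path: str) -> int:
    """Model of count(distinct src) ... where src = test:
    the number of distinct segments that occur in both paths."""
    return len(set(src_path.split("/")) & set(test_path.split("/")))

def dir_path_match_count(src_paths: List[str], test_paths: List[str]) -> List[int]:
    # Pair elements by position, like the WITH OFFSET ... USING(offset) join;
    # zip() stops at the shorter list, as the inner join drops unmatched offsets.
    return [how_many_matches_in_path(s, t) for s, t in zip(src_paths, test_paths)]

counts = dir_path_match_count(
    ["lib/client/core.js", "lib/server/core.js"],
    ["test/server/core.js", "test/server/core.js"],
)
print(counts)  # [1, 2]
```

The first pair shares only core.js, while the second shares both server and core.js.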
To apply it to the sample input data in your question, prepend this CTE to the query above:
with your_table as (
select
['lib/client/core.js', 'lib/server/core.js'] src_path,
['test/server/core.js', 'test/server/core.js'] test_path
)
output is
I am trying to create a function, and I don't understand what the problem is. It is created correctly when:
I delete the PIVOT
I use the table and not only the UNNEST (so from the table, unnest(a))
CREATE OR REPLACE FUNCTION `dataset.function_naming` (a ARRAY<STRUCT<ROW_ID STRING, KEY STRING, VALUE STRING>>, id_one STRING, id_two STRING, start_date DATE, end_date DATE) RETURNS INT64
AS (
with tmp1 as (
select ROW_ID,X,Y,Z,W
from
(
select prop.ROW_ID,prop.KEY, prop.VALUE
from unnest(a) prop
where prop.KEY in ('X','Y','Z','W')
)
PIVOT
(
MAX(VALUE)
FOR UPPER(KEY) in('X','Y','Z','W')
) as PIVOT
)
select case when X is not null then 1,
when Y is not null then 2,
when Z is not null then 2,
when W is not null then 2
else 0
from tmp1
);
Thanks all.
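There are a few minor issues I see in your code:
missing extra (...) around the function body (a query used as a SQL UDF body must be wrapped as a scalar subquery)
extra commas (,) within the CASE expression
missing END to close the CASE expression
So, try the following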
CREATE OR REPLACE FUNCTION `dataset.function_naming` (
a ARRAY<STRUCT<ROW_ID STRING, KEY STRING, VALUE STRING>>,
id_one STRING,
id_two STRING,
start_date DATE,
end_date DATE
) RETURNS INT64
AS ((
with tmp1 as (
select ROW_ID,X,Y,Z,W
from
(
select prop.ROW_ID,prop.KEY, prop.VALUE
from unnest(a) prop
where prop.KEY in ('X','Y','Z','W')
)
PIVOT
(
MAX(VALUE)
FOR UPPER(KEY) in('X','Y','Z','W')
) as PIVOT
)
select case when X is not null then 1
when Y is not null then 2
when Z is not null then 2
when W is not null then 2
else 0
end
from tmp1
));
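It seems there is an internal issue when using PIVOT together with UNNEST on the array. You can use the following, which implements similar logic (note that it keeps the highest code, so it returns 2 rather than 1 when X appears together with Y, Z or W), and also open a case on the issue tracker, as a BigQuery issue, with Google Cloud Support.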
CREATE OR REPLACE FUNCTION `<dataset>.function_naming` (
a ARRAY<STRUCT<ROW_ID STRING, KEY STRING, VALUE STRING>>,
id_one STRING,
id_two STRING,
start_date DATE,
end_date DATE
) RETURNS INT64
AS (( WITH tmp AS (
SELECT
CASE
WHEN KEY="X" THEN 1
WHEN KEY="Y" THEN 2
WHEN KEY="Z" THEN 2
WHEN KEY="W" THEN 2
ELSE
0
END
teste_column
#-- FROM ( SELECT UPPER(prop.KEY) KEY, MAX(prop.VALUE) VALUE FROM -- following your query patern, but not really necessary
FROM ( SELECT UPPER(prop.KEY) KEY FROM
UNNEST(a) prop
WHERE
UPPER(key) IN ('X', 'Y', 'Z', 'W')
GROUP BY key )
ORDER BY teste_column DESC LIMIT 1 )
SELECT * FROM tmp
UNION ALL
SELECT 0 teste_column
FROM (SELECT 1)
LEFT JOIN tmp
ON FALSE
WHERE NOT EXISTS ( SELECT 1 FROM tmp)
));
#--- Testing the function:
select `<project>.<dataset>.function_naming`([STRUCT("1" AS ROW_ID, "x" AS KEY, "10"AS VALUE), STRUCT("1" AS ROW_ID, "x" AS KEY, "20"AS VALUE), STRUCT("1" AS ROW_ID, "w" AS KEY, "20"AS VALUE), STRUCT("1" AS ROW_ID, "y" AS KEY, "20"AS VALUE)], "1", "2", "2022-12-10", "2022-12-10")
I am trying to create a column with a CASE expression, then concatenate the column's values. Here is some example code.
WITH base AS (
SELECT ID, Date, Action, case when Date is null then Action || '**' else Action end Action_with_no_date
FROM <Table_Name>
)
SELECT ID, "array_join"("array_agg"(DISTINCT Action_with_no_date), ', ') Action_with_no_date
FROM base
GROUP BY ID;
Basically, Action_with_no_date displays the concatenation of the Action values for each ID, with the '**' string appended to the values where Date is null.
After I did this, I found an edge case.
If the same Action (e.g. play) is taken twice by one ID, once with a date and once without, the output contains both play and play** for that ID.
However, I want it to display just one play, with **.
Below is the example data for ID = 1
ID Date Action
1 1/2/22 read
1 1/3/22 play
1 NULL play
and expected result for the ID
ID Action_with_no_date
1 read, play**
How should I handle this?
You can calculate the ** suffix per id and action with an analytic max() over a CASE expression: the suffix is '**' if any row for that id and action has a NULL date, and '' otherwise. Then concatenate the suffix with the action.
Demo:
with mytable as (
SELECT * FROM (
VALUES
(1, '1/2/22', 'read'),
(1, '1/3/22', 'play'),
(1, NULL, 'play')
) AS t (id, date, action)
)
select id, array_join(array_agg(DISTINCT action||suffix), ', ')
from
(
select id, date, action,
max(case when date is null then '**' else '' end) over(partition by id, action) as suffix
from mytable
)s
group by id
Result:
1 play**, read
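The two steps of that query can be modelled in plain Python on the same three rows; this is an illustrative sketch only, with None standing in for a NULL date. It keeps values in first-seen order, whereas array_agg without an ORDER BY inside the aggregate does not guarantee any order, which would explain why the demo shows play**, read rather than read, play**.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

rows: List[Tuple[int, Optional[str], str]] = [
    (1, "1/2/22", "read"),
    (1, "1/3/22", "play"),
    (1, None, "play"),
]

# Step 1: max(case when date is null then '**' else '' end)
#         over (partition by id, action).
# '**' > '' as strings, so the max becomes '**' as soon as one row
# in the partition has a NULL date.
suffix: Dict[Tuple[int, str], str] = defaultdict(str)
for id_, date, action in rows:
    flag = "**" if date is None else ""
    suffix[(id_, action)] = max(suffix[(id_, action)], flag)

# Step 2: array_join(array_agg(DISTINCT action || suffix), ', ') ... group by id.
per_id: Dict[int, List[str]] = defaultdict(list)
for id_, _date, action in rows:
    labelled = action + suffix[(id_, action)]
    if labelled not in per_id[id_]:
        per_id[id_].append(labelled)

result = {id_: ", ".join(vals) for id_, vals in per_id.items()}
print(result)  # {1: 'read, play**'}
```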
I would like to obtain the first non-null, non-"undefined" value in a list of values as part of a window.
Minimal example:
Given the following code:
SELECT
FIRST_VALUE(
CASE WHEN val = "undefined" THEN NULL ELSE val END
IGNORE NULLS
)
OVER (ORDER BY order_key)
AS res
FROM (
SELECT 1 AS order_key, CAST(NULL AS STRING) AS val
UNION ALL
SELECT 2 AS order_key, "undefined" AS val
UNION ALL
SELECT 3 AS order_key, "value" AS val
) base
I'd expect
res
value
value
value
as the result set. Yet, the result given by the above is the following:
res
null
null
value
The documentation states the following:
FIRST_VALUE (value_expression [{RESPECT | IGNORE} NULLS])
Returns the value of the value_expression for the first row in the current window frame.
This function includes NULL values in the calculation unless IGNORE NULLS is present. If IGNORE NULLS is present, the function excludes NULL values from the calculation.
Yet it seems like value_expression is not what is tested for NULLs in this case.
It seems that instead FIRST_VALUE checks NULLs against the source field, not the CASE statement (effectively value_expression in the above).
While the problem can easily be fixed by doing the case as part of the subquery, I'd like to better understand why this is an issue. Why does FIRST_VALUE not ignore the NULLs provided through the CASE statement?
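The result is consistent with BigQuery's default window frame: when the OVER clause has an ORDER BY but no explicit frame, the frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, so for rows 1 and 2 no non-NULL value has entered the frame yet. The CASE result is what IGNORE NULLS tests: if only the source column were tested, row 3 would return the NULL produced for "undefined" instead of "value". The hypothetical Python model below evaluates FIRST_VALUE(expr IGNORE NULLS) per row under the default frame, under a whole-partition frame, and in descending order.

```python
from typing import List, Optional

# val for order_key 1, 2, 3 (already in ascending ORDER BY order).
vals: List[Optional[str]] = [None, "undefined", "value"]

# The CASE expression: "undefined" becomes NULL (None).
exprs = [None if v == "undefined" else v for v in vals]

def first_non_null(frame: List[Optional[str]]) -> Optional[str]:
    """FIRST_VALUE(expr IGNORE NULLS) evaluated over a single frame."""
    return next((x for x in frame if x is not None), None)

def first_value_ignore_nulls(exprs: List[Optional[str]],
                             full_frame: bool = False) -> List[Optional[str]]:
    """Evaluate FIRST_VALUE(expr IGNORE NULLS) for every row.

    exprs must already be in ORDER BY order. With full_frame=False the frame is
    the default 'UNBOUNDED PRECEDING to CURRENT ROW' (keys are unique here, so
    RANGE and ROWS coincide); with full_frame=True it is the whole partition.
    """
    return [first_non_null(exprs if full_frame else exprs[: i + 1])
            for i in range(len(exprs))]

print(first_value_ignore_nulls(exprs))                   # [None, None, 'value']
print(first_value_ignore_nulls(exprs, full_frame=True))  # ['value', 'value', 'value']
# ORDER BY order_key DESC: rows arrive as 3, 2, 1.
print(first_value_ignore_nulls(exprs[::-1]))             # ['value', 'value', 'value']
```

In SQL, the whole-partition variant corresponds to giving an explicit frame, for example OVER (ORDER BY order_key ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING).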
Alternative to the logic above:
If you are willing to remodel your query, instead of using a window function (FIRST_VALUE), the same effect can be achieved via an ARRAY_AGG(expr IGNORE NULLS ORDER BY ordering)[OFFSET(0)]:
SELECT
id,
ARRAY_AGG(
CASE WHEN val = 'undefined' THEN NULL ELSE val END
IGNORE NULLS
ORDER BY order_key
)[OFFSET(0)]
AS res
FROM (
SELECT 1 AS id, 1 AS order_key, CAST(NULL AS STRING) AS val
UNION ALL
SELECT 1 AS id, 2 AS order_key, 'undefined' AS val
UNION ALL
SELECT 1 AS id, 3 AS order_key, "value" AS val
UNION ALL
SELECT 2 AS id, 1 AS order_key, CAST(NULL AS STRING) AS val
UNION ALL
SELECT 2 AS id, 2 AS order_key, 'undefined' AS val
UNION ALL
SELECT 2 AS id, 3 AS order_key, "value" AS val
) base
GROUP BY id
Given an empty record set for the group, ARRAY_AGG(...)[OFFSET(0)] will return NULL
Given a non-empty record set for the group, ARRAY_AGG(...)[OFFSET(0)] will return the first result of value_expression that is non-NULL, ordered by the ORDER BY clause provided.
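The grouped variant can likewise be modelled in Python, as a hypothetical sketch of what ARRAY_AGG(expr IGNORE NULLS ORDER BY order_key)[OFFSET(0)] computes per id (None stands in for NULL):

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# (id, order_key, val) rows from the base subquery.
rows: List[Tuple[int, int, Optional[str]]] = [
    (1, 1, None), (1, 2, "undefined"), (1, 3, "value"),
    (2, 1, None), (2, 2, "undefined"), (2, 3, "value"),
]

def first_non_null_per_group(rows: List[Tuple[int, int, Optional[str]]]) -> Dict[int, Optional[str]]:
    groups: Dict[int, List[Tuple[int, Optional[str]]]] = defaultdict(list)
    for id_, order_key, val in rows:
        expr = None if val == "undefined" else val  # the CASE expression
        groups[id_].append((order_key, expr))
    result: Dict[int, Optional[str]] = {}
    for id_, items in groups.items():
        # ARRAY_AGG(expr IGNORE NULLS ORDER BY order_key)
        agg = [expr for _, expr in sorted(items, key=lambda p: p[0]) if expr is not None]
        # [OFFSET(0)]: an aggregate with no non-NULL values gives NULL.
        result[id_] = agg[0] if agg else None
    return result

print(first_non_null_per_group(rows))  # {1: 'value', 2: 'value'}
```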
The only downside (besides, perhaps, performance) is that you'll need to create a common table expression with this logic and then join it with your table that was using window functions.
To get the expected result, you just need to add DESC to the ORDER BY, as below
SELECT
FIRST_VALUE(
CASE WHEN val = "undefined" THEN NULL ELSE val END
IGNORE NULLS
)
OVER (ORDER BY order_key DESC)
AS res
FROM (
SELECT 1 AS order_key, CAST(NULL AS STRING) AS val UNION ALL
SELECT 2 AS order_key, "undefined" AS val UNION ALL
SELECT 3 AS order_key, "value" AS val
) base
so the result now is