Google BigQuery Check If One Array is Superset/Subset of another - sql

Given two arrays in Google BigQuery, I need to figure out whether ALL elements in one array are contained in the other.
As an example, I am trying to complete the following query:
WITH
ArrayA AS (
SELECT
[1, 2, 3] arrA,
UNION ALL
SELECT
[4, 5, 6])
ArrayB AS (
SELECT
[1, 2, 3, 4, 5] arrB)
SELECT
*
FROM
ArrayA
CROSS JOIN
ArrayB
WHERE
<your code goes here>
such that the result looks like
arrA | arrB
[1,2,3] | [1,2,3,4,5]
, since [1,2,3,4,5] is a superset of [1,2,3] but not a superset of [4,5,6].
Many thanks in advance.

You can check for every item in arrA, and then get minimum of it.
If all the items of arrA in arrB, there will be 3 trues, so the minimum will be true.
If at least one of them is not in arrB, there will be 2 true and 1 false, so the minimum will be false.
WITH
ArrayA AS (
SELECT [1, 2, 3] arrA,
UNION ALL
SELECT [4, 5, 6]
),
ArrayB AS (
SELECT [1, 2, 3, 4, 5] arrB
)
SELECT
*,
(
SELECT min(a in UNNEST(arrB))
FROM UNNEST(arrA) as a
) as is_a_in_b
FROM ArrayA
CROSS JOIN ArrayB
You can also make it a function and use it in many places. Sorry for bad naming :)
CREATE TEMP FUNCTION is_array_in_array(subset ARRAY<int64>, main ARRAY<int64>) AS ((SELECT min(a in UNNEST(main)) FROM UNNEST(subset) as a));
WITH
ArrayA AS (
SELECT [1, 2, 3] arrA,
UNION ALL
SELECT [4, 5, 6]
),
ArrayB AS (
SELECT [1, 2, 3, 4, 5] arrB
)
SELECT
*,
is_array_in_array(arrA, arrB) as is_a_in_b
FROM ArrayA
CROSS JOIN ArrayB

I think these conditions do what you want:
WITH
ArrayA AS (
SELECT ARRAY[1, 2, 3] arrA,
UNION ALL
SELECT ARRAY[4, 5, 6]),
ArrayB AS (
SELECT ARRAY[1, 2, 3, 4, 5] arrB)
SELECT *
FROM ArrayA a CROSS JOIN
ArrayB b
WHERE NOT EXISTS (SELECT a_el
FROM UNNEST(a.arrA) a_el LEFT JOIN
UNNEST(b.arrB) b_el
ON a_el = b_el
WHERE b_el IS NULL
) AND
NOT EXISTS (SELECT COUNT(*)
FROM UNNEST(a.arrA) a_el LEFT JOIN
UNNEST(b.arrB) b_el
ON a_el = b_el
HAVING COUNT(*) <> COUNT(b_el)
) ;

Related

check if array is ordered subset postgres

how to check if an array is an ordered subset of another array in PostgreSQL?
[1, 2] ordered_subset [1, 4, 2] -> true
[1, 2] ordered_subset [2, 3, 1, 5] -> false
You can first filter out elements from the second array that do not belong to the first, and then check if the first is a subarray of the second by using generate_series:
select exists (select 1 from generate_series(1, array_length(t1.a3, 1)) v2
where t1.a1 = t1.a3[v2:v2+array_length(t1.a1, 1)-1]) from
(select t.a1, (select array_agg(v) from unnest(t.a2) v
where exists (select 1 from unnest(t.a1) v1 where v1 = v)) a3
from array_inp t
) t1
See fiddle.

Unrecognised name: when using UNNEST WHERE in BigQuery

I’m trying to filter during an unnest in BigQuery as per this blog, but I can’t get the pattern working.
The reproducible example in the blog works really nicely.
SELECT event_name, event_timestamp, user_pseudo_id,
event_params
FROM `firebase-public-project.analytics_153293282.events_20181003`
WHERE event_name = "level_complete_quickplay"
SELECT event_name, event_timestamp, user_pseudo_id,
(SELECT value.int_value FROM UNNEST(event_params)
WHERE key = "value") AS score
FROM `firebase-public-project.analytics_153293282.events_20181003`
WHERE event_name = "level_complete_quickplay"
When I try this on my own table I get the error Unrecognised name:. I've tried to reproduce the error in a toy table, nested_seq.
WITH sequences AS (
SELECT
[0, 1, 1, 2, 3, 5] AS some_numbers
UNION ALL SELECT [2, 4, 8, 16, 32] AS some_numbers
UNION ALL SELECT [5, 10] AS some_numbers
),
-- table containing repeated record
nested_seq AS (
SELECT
some_numbers,
some_numbers[OFFSET(1)] AS offset_1,
some_numbers[ORDINAL(1)] AS ordinal_1
FROM sequences
)
-- transformation to extract single value from array
SELECT *
FROM nested_seq
LEFT JOIN (SELECT
some_numbers
FROM UNNEST(some_numbers)
WHERE some_numbers = 2)
-- Unrecognized name: some_numbers at [21:19]
What I'm expecting is that elements of some_numbers can be extracted so that I can unnest a nested array without increasing the number of rows.
Row
some_numbers
offset_1
ordinal_1
1
2
1
0
2
2
4
2
3
null
10
5
... elements of some_numbers can be extracted so that I can unnest a nested array without increasing the number of rows.
Consider below "fix"
WITH sequences AS (
SELECT
[0, 1, 1, 2, 3, 5] AS some_numbers
UNION ALL SELECT [2, 4, 8, 16, 32] AS some_numbers
UNION ALL SELECT [5, 10] AS some_numbers
)
SELECT # some_numbers,
(SELECT some_number
FROM t.some_numbers some_number
WHERE some_number = 2
) some_number,
some_numbers[OFFSET(1)] AS offset_1,
some_numbers[ORDINAL(1)] AS ordinal_1
FROM sequences t
with output

Presto: cast an integer array to string?

I have the following table:
my_id, my_array
1 , [5, 6, 3]
2 , [1, 5]
3. , [6, 7, 5]
Would it be possible to do a cast such that the output table would be something like:
my_id, my_str
1 , "5,6,3"
2 , "1,5"
3. , "6,7,5"
Or if there is any way I could directly group by my_array would be fine too. Thanks!
Use array_join function
select array_join(my_array,',') my_str
And of course you can group by array. This works:
select max(id) id , my_array
from
(select 1 id, array[5, 6, 3] as my_array) s
group by my_array

How do I add arrays in BigQuery SQL?

I have a UDF which returns a floating point array of the same size for each row of a table. How do I sum values of these arrays ?
In other words, how can I do something like this:
create temp function f(...)
returns array<float64>
...;
select sum(f(column)) from table
As the result of this operation I need to get another array of equal size where
result[i] = sum(over rows) f(row, column)[i]
Here is a function that uses ANY TYPE in order to support summing arrays of FLOAT64, INT64, or NUMERIC along with some sample input:
CREATE TEMP FUNCTION ElementWiseSum(arr1 ANY TYPE, arr2 ANY TYPE) AS (
ARRAY(SELECT x + arr2[OFFSET(off)] FROM UNNEST(arr1) AS x WITH OFFSET off ORDER BY off)
);
SELECT arr1, arr2, ElementWiseSum(arr1, arr2) AS result
FROM (
SELECT [1, 2, 3] AS arr1, [4, 5, 6] AS arr2 UNION ALL
SELECT [7, 8], [9, 10] UNION ALL
SELECT [], [] UNION ALL
SELECT [11, 12, 13, 14, 15], [16, 17, 18, 19, 20]
);
It unnests arr1 using WITH OFFSET, then retrieves the equivalent element from arr2 using this offset, and orders by the offset to ensure that the element order is preserved.
Edit: to sum across rows, you can unnest the arrays, compute sums grouped by the offset of the elements, then reaggregate the sums into a new array:
SELECT
ARRAY_AGG(sum ORDER BY off) AS arr
FROM (
SELECT
off,
SUM(x) AS sum
FROM (
SELECT [1, 2, 3] AS arr UNION ALL
SELECT [7, 8, 9] UNION ALL
SELECT [4, 5, 6] UNION ALL
SELECT [10, 11, 12]
), UNNEST(arr) AS x WITH OFFSET off
GROUP BY off
);
So based on your comment, what you are looking for is the sum the values of all your arrays. This is how you can do it using UNNEST operator
WITH mydata AS (
SELECT [1.4, 1.3, 1.4, 1.1] as myarray
union all
SELECT [1.4, 1.3, 1.4, 1.1] as myarray
union all
SELECT [1.4, 1.3, 1.4, 1.1] as myarray
)
SELECT SUM(eachelement) from mydata, UNNEST(myarray) AS eachelement;
If you have your UDF defined (takes in a your column(s) and returns a float64 array of a pre-determined (or fixed) dimensions), you can use a simplified solution. For example in case of 3-d arrays, something like:
create temp function f(...)
returns array<float64>
...;
with dataset as (
select arr[offset(0)] as col_a, arr[offset(1)] as col_b, arr[offset(2)] as col_c
from (
select f(mycolumn) as arr
from `mydataset.mytable`
)
)
select [sum(col_a), sum(col_b), sum(col_c)] as new_array from dataset
This does not directly answer OP's question, but people landing on this page searching for "How do I add arrays in BigQuery SQL?" might benefit.
(Based on #elliott-brossard answer edit) In case you have 2 arrays, but 1 array includes a struct, you can use the following code to add them together:
WITH mydata AS (
SELECT
[1, 2, 3] AS arr
-- ,[7, 8, 9] AS arr2
,[
STRUCT(7 AS timeOnSite)
,STRUCT(8 AS timeOnSite)
,STRUCT(9 AS timeOnSite)
] AS arr2
)
SELECT
(
SELECT
ARRAY_AGG(sum ORDER BY off) AS arr
FROM (
SELECT
off,
SUM(x) AS sum
FROM (
SELECT arr UNION ALL
-- SELECT arr2
SELECT (SELECT ARRAY_AGG(t.timeOnSite) FROM UNNEST(arr2) AS t)
), UNNEST(arr) AS x WITH OFFSET off
GROUP BY off
)
) AS sum_arrays
FROM
mydata

Exclude matched array elements

How can I exclude matched elements of one array from another?
Postgres code:
a1 := '{1, 2, 5, 15}'::int[];
a2 := '{1, 2, 3, 6, 7, 9, 15}'::int[];
a3 := a2 ??magic_operator?? a1;
In a3 I expect exactly '{3, 6, 7, 9}'
Final Result
My and lad2025 solutions works fine.
Solution with array_position() required PostgreSQL 9.5 and later, executes x3 faster.
It looks like XOR between arrays:
WITH set1 AS
(
SELECT * FROM unnest('{1, 2, 5, 15}'::int[])
), set2 AS
(
SELECT * FROM unnest('{1, 2, 3, 6, 7, 9, 15}'::int[])
), xor AS
(
(SELECT * FROM set1
UNION
SELECT * FROM set2)
EXCEPT
(SELECT * FROM set1
INTERSECT
SELECT * FROM set2)
)
SELECT array_agg(unnest ORDER BY unnest)
FROM xor
Output:
"{3,5,6,7,9}"
How it works:
Unnest both arrays
Calculate SUM
Calculate INTERSECT
From SUM - INTERSECT
Combine to array
Alternatively you could use sum of both minus(except) operations:
(A+B) - (A^B)
<=>
(A-B) + (B-A)
Utilizing FULL JOIN:
WITH set1 AS
(
SELECT *
FROM unnest('{1, 2, 5, 15}'::int[])
), set2 AS
(
SELECT *
FROM unnest('{1, 2, 3, 6, 7, 9, 15}'::int[])
)
SELECT array_agg(COALESCE(s1.unnest, s2.unnest)
ORDER BY COALESCE(s1.unnest, s2.unnest))
FROM set1 s1
FULL JOIN set2 s2
ON s1.unnest = s2.unnest
WHERE s1.unnest IS NULL
OR s2.unnest IS NULL;
EDIT:
If you want only elements from second array that are not is first use simple EXCEPT:
SELECT array_agg(unnest ORDER BY unnest)
FROM (SELECT * FROM unnest('{1, 2, 3, 6, 7, 9, 15}'::int[])
EXCEPT
SELECT * FROM unnest('{1, 2, 5, 15}'::int[])) AS sub
Output:
"{3,6,7,9}"
The additional module intarray provides a simple and fast subtraction operator - for integer arrays, exactly the magic_operator you are looking for:
test=# SELECT '{1, 2, 3, 6, 7, 9, 15}'::int[] - '{1, 2, 5, 15}'::int[] AS result;
?column?
-----------
{3,6,7,9}
You need to install the module once per database:
CREATE EXTENSION intarray;
It also provides special operator classes for indexes:
Postgresql intarray error: undefined symbol: pfree
Note that it only works for:
... null-free arrays of integers.
I found a little similar case and modify.
That SQL solve my case.
with elements (element) as (
select unnest(ARRAY[1, 2, 3, 6, 7, 9, 15])
)
select array_agg(element)
from elements
where array_position(ARRAY[1, 2, 5, 15],element) is null
PostgreSQL 9.5 and later required.