check if array is ordered subset postgres - sql

How can I check whether an array is an ordered subset of another array in PostgreSQL? For example:
[1, 2] ordered_subset [1, 4, 2] -> true
[1, 2] ordered_subset [2, 3, 1, 5] -> false

You can first filter out the elements of the second array that do not appear in the first, and then check whether the first array occurs as a contiguous slice of the filtered result, using generate_series to enumerate the possible start positions:
select exists (
    -- try every start position v2 in the filtered array a3
    select 1 from generate_series(1, array_length(t1.a3, 1)) v2
    where t1.a1 = t1.a3[v2:v2 + array_length(t1.a1, 1) - 1]
)
from (
    -- a3: the elements of a2 that also occur in a1
    select t.a1,
           (select array_agg(v) from unnest(t.a2) v
            where exists (select 1 from unnest(t.a1) v1 where v1 = v)) a3
    from array_inp t
) t1
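The filter-then-match idea can be sketched outside SQL to make the logic easier to follow. The Python below is only an illustration, not part of the original answer, and both function names are hypothetical. The first function mirrors the query's approach. The second shows a plain subsequence check, which may be the better fit if the arrays can contain duplicates.

```python
def is_ordered_subset_contiguous(a1, a2):
    """Mirror the SQL approach: keep only the values of a2 that occur in a1,
    then look for a1 as a contiguous slice of the filtered list."""
    wanted = set(a1)
    a3 = [v for v in a2 if v in wanted]
    n = len(a1)
    return any(a3[i:i + n] == a1 for i in range(len(a3)))


def is_subsequence(a1, a2):
    """Alternative definition: the elements of a1 appear in a2 in the same
    order, with gaps allowed. `v in it` advances the iterator past each match."""
    it = iter(a2)
    return all(v in it for v in a1)


print(is_ordered_subset_contiguous([1, 2], [1, 4, 2]))     # True
print(is_ordered_subset_contiguous([1, 2], [2, 3, 1, 5]))  # False
```

The two definitions agree on the examples from the question but can differ when values repeat: for `[1, 2, 3]` against `[1, 2, 1, 3]`, the contiguous check returns False while the subsequence check returns True.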

Related

How to get the difference between the frequency of two fields?

I have a PostgreSQL database table that contains two columns a and b.
When I query all the entries of the table I get:
{1, 2},
{2, 3},
{2, 3}
So the value:
(1) appeared in field a 1 time and in field b 0 times
(2) appeared in field a 2 times and in field b 1 time
(3) appeared in field a 0 times and in field b 2 times
I want to get the following output:
{1, 1},
{2, 1},
{3, -2}
where the first field is the value stored in the database and the second field is the difference between its frequency in a and its frequency in b.
How can I achieve that?
I first query the database and store the result in query_result.
Then I compute the frequencies of the first and second elements and subtract them (Elixir):
f0 = query_result |> Enum.frequencies_by(&elem(&1, 0))
f1 = query_result |> Enum.frequencies_by(&elem(&1, 1))
(f0 |> Map.keys) ++ (f1 |> Map.keys) |> Enum.uniq |> Enum.into(%{}, fn key -> {key, (f0[key] || 0) - (f1[key] || 0)} end)
I am looking for a simpler way to do this.
Use a single query:
SELECT val, COALESCE(a.ct, 0) - COALESCE(b.ct, 0) AS freq_diff
FROM (
SELECT a AS val, count(*) AS ct
FROM tbl
GROUP BY 1
) a
FULL JOIN (
SELECT b AS val, count(*) AS ct
FROM tbl
GROUP BY 1
) b USING (val);
The query uses FULL [OUTER] JOIN because a value may be missing from either column.
COALESCE replaces the NULL counts produced by the join with 0.
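For comparison with the client-side approach in the question, the same computation can be sketched in Python. This is an illustrative sketch only; `frequency_diff` is a hypothetical name, and the rows are assumed to arrive as (a, b) tuples.

```python
from collections import Counter


def frequency_diff(rows):
    """Return {value: count in column a - count in column b}."""
    count_a = Counter(a for a, _ in rows)
    count_b = Counter(b for _, b in rows)
    # The union of keys plays the role of the FULL JOIN; a Counter returns 0
    # for a missing key, which plays the role of COALESCE(..., 0).
    return {val: count_a[val] - count_b[val]
            for val in sorted(count_a.keys() | count_b.keys())}


print(frequency_diff([(1, 2), (2, 3), (2, 3)]))  # {1: 1, 2: 1, 3: -2}
```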

How to aggregate arrays element by element in BigQuery?

In BigQuery, how can I aggregate arrays element by element?
For instance if I have this table
id | array_value
1 | [1, 2, 3]
2 | [4, 5, 6]
3 | [7, 8, 9]
I want to sum all the arrays element-wise and output [1+4+7, 2+5+8, 3+6+9] = [12, 15, 18]
I can SUM float fields with SELECT SUM(float_field) FROM table but when I try to apply the SUM on an array I get
No matching signature for aggregate function SUM for argument types: ARRAY.
Supported signatures: SUM(INT64); SUM(FLOAT64); SUM(NUMERIC); SUM(BIGNUMERIC) at [1:8]
I have found ARRAY_AGG in the doc but it is not what I want: it just creates an array from values.
I think you want:
select array_agg(sum_val order by idx) as res
from (
select idx, sum(val) as sum_val
from mytable t
cross join unnest(t.array_value) as val with offset as idx
group by idx
) t
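If you instead want the total of each row's array, collected into one array (note that this is a per-row sum, not an element-wise sum):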
select array_agg(sum_val)
from (select (select sum(val)
from unnest(t.array_value) val
) as sum_val
from t
) x
Technically, you can simply refer to the individual values in the arrays using offset(), or safe_offset() in case some arrays might be missing values:
-- example data
with temp as (
select * from unnest([
struct(1 as id, [1, 2, 3] as array_value),
(2, [4,5,6]),
(3, [7,8])
])
)
-- actual query
select
[
SUM( array_value[safe_offset(0)] ),
SUM( array_value[safe_offset(1)] ),
SUM( array_value[safe_offset(2)] )
] as result_array
from temp
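I put the sums in a result array, but you don't have to do that. The last sample array is deliberately missing one value to show that the query doesn't break: safe_offset() returns NULL for the missing position, and SUM ignores NULLs. If you want the query to fail instead, use offset() without the 'safe_' prefix. Note that this approach requires knowing the array length in advance.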
Below is for BigQuery Standard SQL
select array_agg(val order by offset)
from (
select offset, sum(val) as val
from `project.dataset.table` t,
unnest(array_value) as val with offset
group by offset
)
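The group-by-offset idea used in these queries can be sketched in Python to show what happens position by position. This is an illustration under stated assumptions rather than a translation of any one answer; `elementwise_sum` is a hypothetical name, and rows are assumed to be plain lists of numbers.

```python
from collections import defaultdict


def elementwise_sum(rows):
    """Sum arrays position by position, like UNNEST ... WITH OFFSET + GROUP BY offset."""
    totals = defaultdict(int)
    for arr in rows:
        for offset, val in enumerate(arr):
            totals[offset] += val
    # Re-assemble the sums in offset order, like ARRAY_AGG(... ORDER BY offset).
    return [totals[offset] for offset in sorted(totals)]


print(elementwise_sum([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))  # [12, 15, 18]
print(elementwise_sum([[1, 2, 3], [4, 5, 6], [7, 8]]))     # [12, 15, 9]
```

Shorter arrays simply contribute nothing at the positions they lack, which matches how the safe_offset() variant behaves on the ragged sample data.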

How to get all the elements in an array but not in another array in HIVE?

For example, I have two columns of arrays now:
id col1 col2
A [1, 3] [1, 2, 3]
B [2] [1, 2, 3]
what I want is all the elements in col2 but not in col1:
id output
A [2]
B [1, 3]
How can I achieve this?
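Explode the col2 array, use array_contains to check whether each element is in col1, and collect the elements that are not in col1 back into an array. collect_set skips the NULLs produced by the CASE expression and also removes duplicates: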
select t.id,
collect_set(case when array_contains(t.col1, e.elem) then NULL else e.elem end) as result
from my_table t
lateral view explode(t.col2) e as elem
group by t.id
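The explode, filter, and collect steps can be sketched in Python as follows. This is only an illustration; `elements_not_in` is a hypothetical name. One assumption differs from Hive: the sketch keeps elements in order of first appearance, whereas collect_set does not promise any particular order.

```python
def elements_not_in(col1, col2):
    """Return the distinct elements of col2 that do not occur in col1."""
    excluded = set(col1)
    seen = set()
    result = []
    for elem in col2:                 # plays the role of explode(col2)
        if elem not in excluded and elem not in seen:
            seen.add(elem)            # de-duplicate, as collect_set does
            result.append(elem)
    return result


print(elements_not_in([1, 3], [1, 2, 3]))  # [2]
print(elements_not_in([2], [1, 2, 3]))     # [1, 3]
```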

Google BigQuery Check If One Array is Superset/Subset of another

Given two arrays in Google BigQuery, I need to figure out whether ALL elements in one array are contained in the other.
As an example, I am trying to complete the following query:
WITH
ArrayA AS (
SELECT
[1, 2, 3] arrA,
UNION ALL
SELECT
[4, 5, 6]),
ArrayB AS (
SELECT
[1, 2, 3, 4, 5] arrB)
SELECT
*
FROM
ArrayA
CROSS JOIN
ArrayB
WHERE
<your code goes here>
such that the result looks like
arrA | arrB
[1,2,3] | [1,2,3,4,5]
, since [1,2,3,4,5] is a superset of [1,2,3] but not a superset of [4,5,6].
Many thanks in advance.
You can test each item of arrA for membership in arrB and then take the minimum of the results (FALSE sorts before TRUE).
If all the items of arrA are in arrB, every test is true, so the minimum is true.
If at least one of them is not in arrB (for [4, 5, 6] there are 2 trues and 1 false), the minimum is false.
WITH
ArrayA AS (
SELECT [1, 2, 3] arrA,
UNION ALL
SELECT [4, 5, 6]
),
ArrayB AS (
SELECT [1, 2, 3, 4, 5] arrB
)
SELECT
*,
(
SELECT min(a in UNNEST(arrB))
FROM UNNEST(arrA) as a
) as is_a_in_b
FROM ArrayA
CROSS JOIN ArrayB
You can also make it a function and use it in many places. Sorry for bad naming :)
CREATE TEMP FUNCTION is_array_in_array(subset ARRAY<int64>, main ARRAY<int64>) AS ((SELECT min(a in UNNEST(main)) FROM UNNEST(subset) as a));
WITH
ArrayA AS (
SELECT [1, 2, 3] arrA,
UNION ALL
SELECT [4, 5, 6]
),
ArrayB AS (
SELECT [1, 2, 3, 4, 5] arrB
)
SELECT
*,
is_array_in_array(arrA, arrB) as is_a_in_b
FROM ArrayA
CROSS JOIN ArrayB
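The "minimum of the membership tests" trick is equivalent to asking whether all tests are true. A short Python sketch of the same check, reusing the function name from the answer purely for illustration:

```python
def is_array_in_array(subset, main):
    """True when every element of `subset` also occurs in `main`."""
    main_values = set(main)
    # all(...) plays the role of MIN over the boolean membership tests.
    return all(a in main_values for a in subset)


print(is_array_in_array([1, 2, 3], [1, 2, 3, 4, 5]))  # True
print(is_array_in_array([4, 5, 6], [1, 2, 3, 4, 5]))  # False
```

One difference worth keeping in mind: `all()` over an empty list is True in Python, whereas MIN over zero rows yields NULL in SQL, so an empty arrA may need separate handling in the query.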
I think these conditions do what you want:
WITH
ArrayA AS (
SELECT ARRAY[1, 2, 3] arrA,
UNION ALL
SELECT ARRAY[4, 5, 6]),
ArrayB AS (
SELECT ARRAY[1, 2, 3, 4, 5] arrB)
SELECT *
FROM ArrayA a CROSS JOIN
ArrayB b
WHERE NOT EXISTS (SELECT a_el
FROM UNNEST(a.arrA) a_el LEFT JOIN
UNNEST(b.arrB) b_el
ON a_el = b_el
WHERE b_el IS NULL
) AND
NOT EXISTS (SELECT COUNT(*)
FROM UNNEST(a.arrA) a_el LEFT JOIN
UNNEST(b.arrB) b_el
ON a_el = b_el
HAVING COUNT(*) <> COUNT(b_el)
) ;

How do I add arrays in BigQuery SQL?

I have a UDF which returns a floating point array of the same size for each row of a table. How do I sum the values of these arrays?
In other words, how can I do something like this:
create temp function f(...)
returns array<float64>
...;
select sum(f(column)) from table
As the result of this operation I need to get another array of equal size where
result[i] = sum(over rows) f(row, column)[i]
Here is a function that uses ANY TYPE in order to support summing arrays of FLOAT64, INT64, or NUMERIC along with some sample input:
CREATE TEMP FUNCTION ElementWiseSum(arr1 ANY TYPE, arr2 ANY TYPE) AS (
ARRAY(SELECT x + arr2[OFFSET(off)] FROM UNNEST(arr1) AS x WITH OFFSET off ORDER BY off)
);
SELECT arr1, arr2, ElementWiseSum(arr1, arr2) AS result
FROM (
SELECT [1, 2, 3] AS arr1, [4, 5, 6] AS arr2 UNION ALL
SELECT [7, 8], [9, 10] UNION ALL
SELECT [], [] UNION ALL
SELECT [11, 12, 13, 14, 15], [16, 17, 18, 19, 20]
);
It unnests arr1 using WITH OFFSET, then retrieves the equivalent element from arr2 using this offset, and orders by the offset to ensure that the element order is preserved.
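The offset-based pairing described above can be sketched in Python. This is an illustration only; `element_wise_sum` is a hypothetical stand-in for the SQL function, and, as in the SQL version, the result has the length of the first array.

```python
def element_wise_sum(arr1, arr2):
    """Add arr2's element at the same offset to each element of arr1."""
    # enumerate() supplies the offset, like WITH OFFSET. Indexing arr2 raises
    # IndexError if arr2 is shorter than arr1, comparable to OFFSET failing
    # on an out-of-range position.
    return [x + arr2[off] for off, x in enumerate(arr1)]


print(element_wise_sum([1, 2, 3], [4, 5, 6]))  # [5, 7, 9]
print(element_wise_sum([7, 8], [9, 10]))       # [16, 18]
print(element_wise_sum([], []))                # []
```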
Edit: to sum across rows, you can unnest the arrays, compute sums grouped by the offset of the elements, then reaggregate the sums into a new array:
SELECT
ARRAY_AGG(sum ORDER BY off) AS arr
FROM (
SELECT
off,
SUM(x) AS sum
FROM (
SELECT [1, 2, 3] AS arr UNION ALL
SELECT [7, 8, 9] UNION ALL
SELECT [4, 5, 6] UNION ALL
SELECT [10, 11, 12]
), UNNEST(arr) AS x WITH OFFSET off
GROUP BY off
);
So, based on your comment, what you are looking for is the sum of the values of all your arrays. This is how you can do it using the UNNEST operator:
WITH mydata AS (
SELECT [1.4, 1.3, 1.4, 1.1] as myarray
union all
SELECT [1.4, 1.3, 1.4, 1.1] as myarray
union all
SELECT [1.4, 1.3, 1.4, 1.1] as myarray
)
SELECT SUM(eachelement) from mydata, UNNEST(myarray) AS eachelement;
If you have your UDF defined (it takes your column(s) and returns a float64 array of a predetermined, fixed length), you can use a simpler solution. For example, for arrays of length 3, something like:
create temp function f(...)
returns array<float64>
...;
with dataset as (
select arr[offset(0)] as col_a, arr[offset(1)] as col_b, arr[offset(2)] as col_c
from (
select f(mycolumn) as arr
from `mydataset.mytable`
)
)
select [sum(col_a), sum(col_b), sum(col_c)] as new_array from dataset
This does not directly answer OP's question, but people landing on this page searching for "How do I add arrays in BigQuery SQL?" might benefit.
(Based on the edit to @elliott-brossard's answer.) In case you have two arrays, one of which is an array of structs, you can use the following code to add them together:
WITH mydata AS (
SELECT
[1, 2, 3] AS arr
-- ,[7, 8, 9] AS arr2
,[
STRUCT(7 AS timeOnSite)
,STRUCT(8 AS timeOnSite)
,STRUCT(9 AS timeOnSite)
] AS arr2
)
SELECT
(
SELECT
ARRAY_AGG(sum ORDER BY off) AS arr
FROM (
SELECT
off,
SUM(x) AS sum
FROM (
SELECT arr UNION ALL
-- SELECT arr2
SELECT (SELECT ARRAY_AGG(t.timeOnSite) FROM UNNEST(arr2) AS t)
), UNNEST(arr) AS x WITH OFFSET off
GROUP BY off
)
) AS sum_arrays
FROM
mydata