SQL Array with Null - sql

I'm trying to group BigQuery columns using an array like so:
with test as (
select 1 as A, 2 as B
union all
select 3, null
)
select *,
[A,B] as grouped_columns
from test
However, this won't work, since there is a null value in column B row 2.
In fact this won't work either:
select [1, null] as test_array
When reading the documentation on BigQuery though, it says Nulls should be allowed.
In BigQuery, an array is an ordered list consisting of zero or more
values of the same data type. You can construct arrays of simple data
types, such as INT64, and complex data types, such as STRUCTs. The
current exception to this is the ARRAY data type: arrays of arrays are
not supported. Arrays can include NULL values.
There doesn't seem to be any attributes or safe prefix to be used with ARRAY() to handle nulls.
So what is the best approach for this?

Per documentation - for Array type
Currently, BigQuery has two following limitations with respect to NULLs and ARRAYs:
BigQuery raises an error if query result has ARRAYs which contain NULL elements, although such ARRAYs can be used inside the query.
BigQuery translates NULL ARRAY into empty ARRAY in the query result, although inside the query NULL and empty ARRAYs are two distinct values.
So, as of your example - you can use below "trick"
with test as (
select 1 as A, 2 as B union all
select 3, null
)
select *,
array(select cast(el as int64) el
from unnest(split(translate(format('%t', t), '()', ''), ', ')) el
where el != 'NULL'
) as grouped_columns
from test t
above gives below output
Note: above approach does not require explicit referencing to all involved columns!

My current solution---and I'm not a fan of it---is to use a combo of IFNULL(), UNNEST() and ARRAY() like so:
select
*,
array(
select *
from unnest(
[
ifnull(A, ''),
ifnull(B, '')
]
) as grouping
where grouping <> ''
) as grouped_columns
from test

An alternative way, you can replace NULL value to some NON-NULL figures using function IFNULL(null, 0) as given below:-
with test as (
select 1 as A, 2 as B
union all
select 3, IFNULL(null, 0)
)
select *,
[A,B] as grouped_columns
from test

Related

Is there a way of replace null values in a JSON bigquery field?

I have a JSON value like the one below in a certain column of my table:
{"values":[1, 2, null, 4, null]}
What I want is to convert the value in a bigquery ARRAY: ARRAY<INT64>
I tried JSON_VALUE_ARRAY but it throws an error because the final output cannot be anarray with NULLs.
Said that, what should be the correct approach for that?
You can unnest an array with null elements. For building a new array you can provided the flag ignore nulls to remove null values.
with tbl as (select JSON '{"values":[1, 2, null, 4, null]}' as data union all select JSON ' {"values":[ ] }')
select *,
((Select array_agg(x ignore nulls) from unnest(JSON_VALUE_ARRAY (data.values ) ) x))
from tbl

Extract a string in Postgresql and remove null/empty elements

I Need to extract values from string with Postgresql
But for my special scenario - if an element value is null i want to remove it and bring the next element 1 index closer.
e.g.
assume my string is: "a$$b"
If i will use
select string_to_array('a$$b','$')
The result is:
{a,,b}
If Im trying
SELECT unnest(string_to_array('a__b___d_','_')) EXCEPT SELECT ''
It changes the order
1.d
2.a
3.b
order changes which is bad for me.
I have found a other solution with:
select array_remove( string_to_array(a||','||b||','||c,',') , '')
from (
select
split_part('a__b','_',1) a,
split_part('a__b','_',2) b,
split_part('a__b','_',3) c
) inn
Returns
{a,b}
And then from the Array - i need to extract values by index
e.g. Extract(ARRAY,2)
But this one seems to me like an overkill - is there a better or something simpler to use ?
You can use with ordinality to preserve the index information during unnesting:
select a.c
from unnest(string_to_array('a__b___d_','_')) with ordinality as a(c,idx)
where nullif(trim(c), '') is not null
order by idx;
If you want that back as an array:
select array_agg(a.c order by a.idx)
from unnest(string_to_array('a__b___d_','_')) with ordinality as a(c,idx)
where nullif(trim(c), '') is not null;

How to aggragate integers in postgresql?

I have a query that gives list of IDs:
ID
2
3
4
5
6
25
ID is integer.
I want to get that result like that in ARRAY of integers type:
ID
2,3,4,5,6,25
I wrote this query:
select string_agg(ID::text,',')
from A
where .....
I have to convert it to text otherwise it won't work. string_agg expect to get (text,text)
this works fine the thing is that this result should later be used in many places that expect ARRAY of integers.
I tried :
select ('{' || string_agg(ID::text,',') || '}')::integer[]
from A
WHERE ...
which gives: {2,3,4,5,6,25} in type int4 integer[]
but this isn't the correct type... I need the same type as ARRAY.
for example SELECT ARRAY[4,5] gives array integer[]
in simple words I want the result of my query to work with (for example):
select *
from b
where b.ID = ANY (FIRST QUERY RESULT) // aka: = ANY (ARRAY[2,3,4,5,6,25])
this is failing as ANY expect array and it doesn't work with regular integer[], i get an error:
ERROR: operator does not exist: integer = integer[]
note: the result of the query is part of a function and will be saved in a variable for later work. Please don't take it to places where you bypass the problem and offer a solution which won't give the ARRAY of Integers.
EDIT: why does
select *
from b
where b.ID = ANY (array [4,5])
is working. but
select *
from b
where b.ID = ANY(select array_agg(ID) from A where ..... )
doesn't work
select *
from b
where b.ID = ANY(select array_agg(4))
doesn't work either
the error is still:
ERROR: operator does not exist: integer = integer[]
Expression select array_agg(4) returns set of rows (actually set of rows with 1 row). Hence the query
select *
from b
where b.id = any (select array_agg(4)) -- ERROR
tries to compare an integer (b.id) to a value of a row (which has 1 column of type integer[]). It raises an error.
To fix it you should use a subquery which returns integers (not arrays of integers):
select *
from b
where b.id = any (select unnest(array_agg(4)))
Alternatively, you can place the column name of the result of select array_agg(4) as an argument of any, e.g.:
select *
from b
cross join (select array_agg(4)) agg(arr)
where b.id = any (arr)
or
with agg as (
select array_agg(4) as arr)
select *
from b
cross join agg
where b.id = any (arr)
More formally, the first two queries use ANY of the form:
expression operator ANY (subquery)
and the other two use
expression operator ANY (array expression)
like it is described in the documentation: 9.22.4. ANY/SOME
and 9.23.3. ANY/SOME (array).
How about this query? Does this give you the expected result?
SELECT *
FROM b b_out
WHERE EXISTS (SELECT 1
FROM b b_in
WHERE b_out.id = b_in.id
AND b_in.id IN (SELECT <<first query that returns 2,3,4,...>>))
What I've tried to do is to break down the logic of ANY into two separate logical checks in order to achieve the same result.
Hence, ANY would be equivalent with a combination of EXISTS at least one of the values IN your list of values returned by the first SELECT.

Selecting data into a Postgres array

I have the following data:
name id url
John 1 someurl.com
Matt 2 cool.com
Sam 3 stackoverflow.com
How can I write an SQL statement in Postgres to select this data into a multi-dimensional array, i.e.:
{{John, 1, someurl.com}, {Matt, 2, cool.com}, {Sam, 3, stackoverflow.com}}
I've seen this kind of array usage before in Postgres but have no idea how to select data from a table into this array format.
Assuming here that all the columns are of type text.
You cannot use array_agg() to produce multi-dimensional arrays, at least not up to PostgreSQL 9.4.
(But the upcoming Postgres 9.5 ships a new variant of array_agg() that can!)
What you get out of #Matt Ball's query is an array of records (the_table[]).
An array can only hold elements of the same base type. You obviously have number and string types. Convert all columns (that aren't already) to text to make it work.
You can create an aggregate function for this like I demonstrated to you here before.
CREATE AGGREGATE array_agg_mult (anyarray) (
SFUNC = array_cat
,STYPE = anyarray
,INITCOND = '{}'
);
Call:
SELECT array_agg_mult(ARRAY[ARRAY[name, id::text, url]]) AS tbl_mult_arr
FROM tbl;
Note the additional ARRAY[] layer to make it a multidimensional array (2-dimenstional, to be precise).
Instant demo:
WITH tbl(id, txt) AS (
VALUES
(1::int, 'foo'::text)
,(2, 'bar')
,(3, '}b",') -- txt has meta-characters
)
, x AS (
SELECT array_agg_mult(ARRAY[ARRAY[id::text,txt]]) AS t
FROM tbl
)
SELECT *, t[1][3] AS arr_element_1_1, t[3][4] AS arr_element_3_2
FROM x;
You need to use an aggregate function; array_agg should do what you need.
SELECT array_agg(s) FROM (SELECT name, id, url FROM the_table ORDER BY id) AS s;

PostgreSQL two dimensional array intersection

is it possible with postgres tools to intersect two-dimensional arrays?
For instance I have:
id arr
1 {{1,2}, {3,4}, {4,5}, {4,7}}
2 {{4,2}, {7,4}, {8,5}, {9,7}}
...
And I want to get all records that have {4,5} in theirs array, record id=1 here.
If I do something like:
select * from table where arr && '{4,5}'
I'll get both of those example records. It finds all records where are 4 and 5 are anywhere in the array - as if it was looking in fully extracted arrays, like {1,2,3,4,4,5,4,7}.
I thought of using functions from the extension intarray, but:
Many of these operations are only sensible for one-dimensional arrays.
Although they will accept input arrays of more dimensions, the data is
treated as though it were a linear array in storage order.
My other ideas:
Quick and dirty
WITH x(id, arr) AS (VALUES
(1, '{{1,2}, {3,4}, {4,5}, {4,7}}'::int[])
,(2, '{{4,2}, {7,4}, {8,5}, {9,7}}')
)
SELECT *
FROM x
WHERE arr::text LIKE '%{4,5}%';
The simple trick is to transform the array to its text representation and check with LIKE.
With array-functions
WITH x(id, arr) AS (
VALUES
(1, '{{1,2}, {3,4}, {4,5}, {4,7}}'::int[])
,(2, '{{4,2}, {7,4}, {8,5}, {9,7}}')
)
,y AS (
SELECT id, arr, generate_subscripts(arr, 1) AS i
FROM x
)
SELECT id, arr
FROM y
WHERE arr[i:i] = '{{4,5}}'::int[];
Or, the same in different notation, building on your example:
SELECT id, arr
FROM (
SELECT id, arr, generate_subscripts(arr, 1) AS i
FROM tbl
) x
WHERE arr[i:i] = '{{4,5}}'::int[];
This generates the array subscripts for the first dimension. With the help of this set-returning function we check each array-slice. Note the double brackets on the right hand side of the expression to create a two-dimensional array that matches.
If {4,5} can be in arr multiple times, add GROUP BY or a DISTINCT clause to avoid duplicate rows.