Selecting data into a Postgres array

I have the following data:
name  id  url
John  1   someurl.com
Matt  2   cool.com
Sam   3   stackoverflow.com
How can I write an SQL statement in Postgres to select this data into a multi-dimensional array, i.e.:
{{John, 1, someurl.com}, {Matt, 2, cool.com}, {Sam, 3, stackoverflow.com}}
I've seen this kind of array usage before in Postgres but have no idea how to select data from a table into this array format.
Assuming here that all the columns are of type text.

You cannot use array_agg() to produce multi-dimensional arrays, at least not up to PostgreSQL 9.4.
(But the upcoming Postgres 9.5 ships a new variant of array_agg() that can!)
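For the record, on Postgres 9.5 or later the built-in variant covers this directly, so no custom aggregate is needed; a minimal sketch, assuming the same table and columns as below:

```sql
-- Postgres 9.5+: array_agg(anyarray) aggregates arrays into a 2-D array
SELECT array_agg(ARRAY[name, id::text, url]) AS tbl_mult_arr
FROM tbl;
```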
What you get out of @Matt Ball's query is an array of records (the_table[]).
An array can only hold elements of the same base type. You obviously have number and string types. Convert all columns (that aren't already) to text to make it work.
You can create an aggregate function for this like I demonstrated to you here before.
CREATE AGGREGATE array_agg_mult (anyarray) (
SFUNC = array_cat
,STYPE = anyarray
,INITCOND = '{}'
);
Call:
SELECT array_agg_mult(ARRAY[ARRAY[name, id::text, url]]) AS tbl_mult_arr
FROM tbl;
Note the additional ARRAY[] layer to make it a multidimensional array (2-dimensional, to be precise).
Instant demo:
WITH tbl(id, txt) AS (
VALUES
(1::int, 'foo'::text)
,(2, 'bar')
,(3, '}b",') -- txt has meta-characters
)
, x AS (
SELECT array_agg_mult(ARRAY[ARRAY[id::text,txt]]) AS t
FROM tbl
)
SELECT *, t[1][1] AS arr_element_1_1, t[3][2] AS arr_element_3_2
FROM x;

You need to use an aggregate function; array_agg should do what you need.
SELECT array_agg(s) FROM (SELECT name, id, url FROM the_table ORDER BY id) AS s;


SQL Array with Null

I'm trying to group BigQuery columns using an array like so:
with test as (
select 1 as A, 2 as B
union all
select 3, null
)
select *,
[A,B] as grouped_columns
from test
However, this won't work, since there is a null value in column B row 2.
In fact this won't work either:
select [1, null] as test_array
The BigQuery documentation, however, says NULLs should be allowed:
In BigQuery, an array is an ordered list consisting of zero or more
values of the same data type. You can construct arrays of simple data
types, such as INT64, and complex data types, such as STRUCTs. The
current exception to this is the ARRAY data type: arrays of arrays are
not supported. Arrays can include NULL values.
There doesn't seem to be any attribute or safe prefix that can be used with ARRAY() to handle NULLs.
So what is the best approach for this?
Per documentation - for Array type
Currently, BigQuery has two following limitations with respect to NULLs and ARRAYs:
BigQuery raises an error if query result has ARRAYs which contain NULL elements, although such ARRAYs can be used inside the query.
BigQuery translates NULL ARRAY into empty ARRAY in the query result, although inside the query NULL and empty ARRAYs are two distinct values.
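A minimal illustration of both points (standalone queries, assumed to be run in the BigQuery console):

```sql
-- Allowed: the NULL element only exists inside the query
SELECT SUM(x) FROM UNNEST([1, NULL, 2]) AS x;

-- Raises an error: the query result itself would contain a NULL element
SELECT [1, NULL, 2] AS bad_array;
```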
So, for your example, you can use the "trick" below:
with test as (
select 1 as A, 2 as B union all
select 3, null
)
select *,
array(select cast(el as int64) el
from unnest(split(translate(format('%t', t), '()', ''), ', ')) el
where el != 'NULL'
) as grouped_columns
from test t
The query above outputs the grouped columns with the NULL elements removed.
Note: the approach above does not require explicitly referencing all the involved columns!
My current solution---and I'm not a fan of it---is to use a combo of IFNULL(), UNNEST() and ARRAY() like so:
select
*,
array(
select *
from unnest(
[
ifnull(A, ''),
ifnull(B, '')
]
) as grouping
where grouping <> ''
) as grouped_columns
from test
Alternatively, you can replace the NULL value with some non-NULL placeholder using the function IFNULL(null, 0), as given below:
with test as (
select 1 as A, 2 as B
union all
select 3, IFNULL(null, 0)
)
select *,
[A,B] as grouped_columns
from test

How to remove elements of array in PostgreSQL?

Is it possible to remove multiple elements from an array?
Before removing elements, Array1 is:
{1,2,3,4}
Array2 that contains some elements I wish to remove:
{1,4}
And I want to get:
{2,3}
How to operate?
Use unnest() with array_agg(), e.g.:
with cte(array1, array2) as (
values (array[1,2,3,4], array[1,4])
)
select array_agg(elem)
from cte, unnest(array1) elem
where elem <> all(array2);
array_agg
-----------
{2,3}
(1 row)
If you often need this functionality, define the simple function:
create or replace function array_diff(array1 anyarray, array2 anyarray)
returns anyarray language sql immutable as $$
select coalesce(array_agg(elem), '{}')
from unnest(array1) elem
where elem <> all(array2)
$$;
You can use the function for any array, not only int[]:
select array_diff(array['a','b','c','d'], array['a','d']);
array_diff
------------
{b,c}
(1 row)
With some help from this post:
select array_agg(elements) from
(select unnest('{1,2,3,4}'::int[])
except
select unnest('{1,4}'::int[])) t (elements)
Result:
{2,3}
With the intarray extension, you can simply use -:
select '{1,2,3,4}'::int[] - '{1,4}'::int[]
Result:
{2,3}
Online demonstration
You'll need to install the intarray extension if you didn't already. It adds many convenient functions and operators if you're dealing with arrays of integers.
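Enabling the extension is a one-time step per database (appropriate privileges assumed):

```sql
-- One-time setup; after this the int[] operators like - become available
CREATE EXTENSION IF NOT EXISTS intarray;
```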
This answer is the simplest I think:
https://stackoverflow.com/a/6535089/673187
SELECT array(SELECT unnest(:array1) EXCEPT SELECT unnest(:array2));
so you can easily use it in an UPDATE command, when you need to remove some elements from an array column:
UPDATE table1 SET array1_column=(SELECT array(SELECT unnest(array1_column) EXCEPT SELECT unnest('{2, 3}'::int[])));
You can use this function when you are dealing with bigint/int8 numbers and want to maintain the original order:
CREATE OR REPLACE FUNCTION arr_subtract(int8[], int8[])
RETURNS int8[] AS
$func$
SELECT ARRAY(
SELECT a
FROM unnest($1) WITH ORDINALITY x(a, ord)
WHERE a <> ALL ($2)
ORDER BY ord
);
$func$ LANGUAGE sql IMMUTABLE;
I got this solution from the following answer to a similar question: https://stackoverflow.com/a/8584080/1544473
Use array slice notation:
array[<start index>:<end index>]
WITH t(stack, dim) as (
VALUES(ARRAY[1,2,3,4], ARRAY[1,4])
) SELECT stack[dim[1]+1:dim[2]-1] FROM t
(Note that this slices by position, not by value; it only works here because the elements to remove happen to sit at the first and last positions.)

How to get the first field from an anonymous row type in PostgreSQL 9.4?

=# select row(0, 1) ;
row
-------
(0,1)
(1 row)
How to get 0 within the same query? I figured out that the below sort of works, but is there any simpler way?
=# select json_agg(row(0, 1))->0->'f1' ;
?column?
----------
0
(1 row)
No luck with array-like syntax [0].
Thanks!
Your row type is anonymous and therefore you cannot access its elements easily. What you can do is create a TYPE and then cast your anonymous row to that type and access the elements defined in the type:
CREATE TYPE my_row AS (
x integer,
y integer
);
SELECT (row(0,1)::my_row).x;
As Craig Ringer commented on your question, you should avoid producing anonymous rows to begin with if you can help it, and instead type whatever data you use in your data model and queries.
If you just want the first element from any row, convert the row to JSON and select f1...
SELECT row_to_json(row(0,1))->'f1'
Or, if you are always going to have two integers or a strict structure, you can create a temporary table (or type) and a function that selects the first column.
CREATE TABLE tmptable(f1 int, f2 int);
CREATE FUNCTION gettmpf1(tmptable) RETURNS int AS 'SELECT $1.f1' LANGUAGE SQL;
SELECT gettmpf1(ROW(0,1));
Resources:
https://www.postgresql.org/docs/9.2/static/functions-json.html
https://www.postgresql.org/docs/9.2/static/sql-expressions.html
The json solution is very elegant. Just for fun, this is a solution using regexp (much uglier):
WITH r AS (SELECT row('quotes, "commas",
and a line break".',null,null,'"fourth,field"')::text AS r)
--WITH r AS (SELECT row('',null,null,'')::text AS r)
--WITH r AS (SELECT row(0,1)::text AS r)
SELECT CASE WHEN r.r ~ '^\("",' THEN ''
WHEN r.r ~ '^\("' THEN regexp_replace(regexp_replace(regexp_replace(right(r.r, -2), '""', '\"', 'g'), '([^\\])",.*', '\1'), '\\"', '"', 'g')
ELSE (regexp_matches(right(r.r, -1), '^[^,]*'))[1] END
FROM r
When converting a row to text, PostgreSQL uses quoted CSV formatting. I couldn't find any tools for importing quoted CSV into an array, so the above is a crude text manipulation via mostly regular expressions. Maybe someone will find this useful!
With Postgresql 13+, you can just reference individual elements in the row with .fN notation. For your example:
select (row(0, 1)).f1; --> returns 0.
See https://www.postgresql.org/docs/13/sql-expressions.html#SQL-SYNTAX-ROW-CONSTRUCTORS

Vector (array) addition in Postgres

I have a column with numeric[] values which all have the same size. I'd like to take their element-wise average. By this I mean that the average of
{1, 2, 3}, {-1, -2, -3}, and {3, 3, 3}
should be {1, 1, 1}. Also of interest is how to sum these element-wise, although I expect that any solution for one will be a solution for the other.
(NB: The length of the arrays is fixed within a single table, but may vary between tables. So I need a solution which doesn't assume a certain length.)
My initial guess is that I should be using unnest somehow, since unnest applied to a numeric[] column flattens out all the arrays. So I'd like to think that there's a nice way to use this with some sort of windowing function + group by to pick out the individual components of each array and sum them.
-- EXAMPLE DATA
CREATE TABLE A
(vector numeric[])
;
INSERT INTO A
VALUES
('{1, 2, 3}'::numeric[])
,('{-1, -2, -3}'::numeric[])
,('{3, 3, 3}'::numeric[])
;
I've written an extension to do vector addition (and subtraction, multiplication, division, and powers) with fast C functions. You can find it on Github or PGXN.
Given two arrays a and b you can say vec_add(a, b). You can also add either side to a scalar, e.g. vec_add(a, 5).
If you want a SUM aggregate function instead you can find that in aggs_for_vecs, also on PGXN.
Finally if you want to sum up all the elements of a single array, you can use aggs_for_arrays (PGXN).
I discovered a solution on my own which is probably the one I will use.
First, we can define a function for adding two vectors:
CREATE OR REPLACE FUNCTION vec_add(arr1 numeric[], arr2 numeric[])
RETURNS numeric[] AS
$$
SELECT array_agg(result)
FROM (SELECT tuple.val1 + tuple.val2 AS result
FROM (SELECT UNNEST($1) AS val1
,UNNEST($2) AS val2
,generate_subscripts($1, 1) AS ix) tuple
ORDER BY ix) inn;
$$ LANGUAGE SQL IMMUTABLE STRICT;
and a function for multiplying by a constant:
CREATE OR REPLACE FUNCTION vec_mult(arr numeric[], mul numeric)
RETURNS numeric[] AS
$$
SELECT array_agg(result)
FROM (SELECT val * $2 AS result
FROM (SELECT UNNEST($1) AS val
,generate_subscripts($1, 1) as ix) t
ORDER BY ix) inn;
$$ LANGUAGE SQL IMMUTABLE STRICT;
Then we can use the PostgreSQL statement CREATE AGGREGATE to create the vec_sum function directly:
CREATE AGGREGATE vec_sum(numeric[]) (
SFUNC = vec_add
,STYPE = numeric[]
);
And finally, we can find the average as:
SELECT vec_mult(vec_sum(vector), 1.0 / count(vector)) FROM A;
(Note the 1.0: with a plain integer 1 the division would be integer division and truncate to 0.)
from http://www.postgresql.org/message-id/4C2504A3.4090502#wp.pl
select avg(unnested) from (select unnest(vector) as unnested from A) temp;
Edit: I think I now understand the question better.
Here is a possible solution drawing heavily upon: https://stackoverflow.com/a/8767450/3430807 I don't consider it elegant nor am I sure it will perform well:
Test data:
CREATE TABLE A
(vector numeric[], id serial)
;
INSERT INTO A
VALUES
('{1, 2, 3}'::numeric[])
,('{4, 5, 6}'::numeric[])
,('{7, 8, 9}'::numeric[])
;
Query:
select avg(vector[temp.index])
from A as a
join
(select generate_subscripts(vector, 1) as index
, id
from A) as temp on temp.id = a.id
group by temp.index

PostgreSQL two dimensional array intersection

is it possible with postgres tools to intersect two-dimensional arrays?
For instance I have:
id arr
1 {{1,2}, {3,4}, {4,5}, {4,7}}
2 {{4,2}, {7,4}, {8,5}, {9,7}}
...
And I want to get all records that have {4,5} in their array; record id=1 here.
If I do something like:
select * from table where arr && '{4,5}'
I'll get both of those example records. It finds all records where 4 and 5 appear anywhere in the array, as if it were looking at the fully flattened array, like {1,2,3,4,4,5,4,7}.
I thought of using functions from the extension intarray, but:
Many of these operations are only sensible for one-dimensional arrays.
Although they will accept input arrays of more dimensions, the data is
treated as though it were a linear array in storage order.
My other ideas:
Quick and dirty
WITH x(id, arr) AS (VALUES
(1, '{{1,2}, {3,4}, {4,5}, {4,7}}'::int[])
,(2, '{{4,2}, {7,4}, {8,5}, {9,7}}')
)
SELECT *
FROM x
WHERE arr::text LIKE '%{4,5}%';
The simple trick is to transform the array to its text representation and check with LIKE.
With array-functions
WITH x(id, arr) AS (
VALUES
(1, '{{1,2}, {3,4}, {4,5}, {4,7}}'::int[])
,(2, '{{4,2}, {7,4}, {8,5}, {9,7}}')
)
,y AS (
SELECT id, arr, generate_subscripts(arr, 1) AS i
FROM x
)
SELECT id, arr
FROM y
WHERE arr[i:i] = '{{4,5}}'::int[];
Or, the same in different notation, building on your example:
SELECT id, arr
FROM (
SELECT id, arr, generate_subscripts(arr, 1) AS i
FROM tbl
) x
WHERE arr[i:i] = '{{4,5}}'::int[];
This generates the array subscripts for the first dimension. With the help of this set-returning function we check each array-slice. Note the double brackets on the right hand side of the expression to create a two-dimensional array that matches.
If {4,5} can be in arr multiple times, add GROUP BY or a DISTINCT clause to avoid duplicate rows.
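A sketch of that variant, assuming the same tbl as above, for the case where {4,5} may occur more than once per row:

```sql
-- DISTINCT collapses the duplicate rows produced by repeated matches
SELECT DISTINCT id, arr
FROM (
   SELECT id, arr, generate_subscripts(arr, 1) AS i
   FROM   tbl
   ) x
WHERE arr[i:i] = '{{4,5}}'::int[];
```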