PostgreSQL array_agg(INTEGER[]) - sql

Using Postgres 9.5, I want to concaternate integer arrays from a GROUP BY. From the documentation is seems as though array_agg should be able to do this, but I get: ERROR: cannot accumulate arrays of different dimensionality
Using array_dims on my test set I get [1:18], [1:24] and [1:48]. I see this as 3 1-dimensional arrays of different lengths. The result should be a single array with dimension [1:90] What am I missing here?

Continuing from discussion in comments, my personal suggestion is to create aggregate.
CREATE AGGREGATE array_concat_agg(anyarray) (
SFUNC = array_cat,
STYPE = anyarray
);
Then you can do this:
SELECT column1
FROM (VALUES (array[1,2,3]), (array[3,4]), (array[53,43,33,22])) arr;
column1
---------------
{1,2,3}
{3,4}
{53,43,33,22}
(3 rows)
SELECT array_concat_agg(column1)
FROM (VALUES (array[1,2,3]), (array[3,4]), (array[53,43,33,22])) arr;
array_concat_agg
-------------------------
{1,2,3,3,4,53,43,33,22}
(1 row)

Related

Get max on comma separated values in column

How to get max on comma separated values in Original_Ids column and get max value in one column and remaining ids in different column.
|Original_Ids | Max_Id| Remaining_Ids |
|123,534,243,345| 534 | 123,234,345 |
Upadte -
If I already have Max_id and just need below equation?
Remaining_Ids = Original_Ids - Max_id
Thanks
Thanks to the excellent possibilities of array manipulation in Postgres, this could be done relatively easy by converting the string to an array and from there to a set.
Then regular queries on that set are possible. With max() the maximum can be selected and with EXCEPT ALL the maximum can be removed from the set.
A set can then be converted to an array and with array_to_string() and the array can be converted to a delimited string again.
SELECT ids original_ids,
(SELECT max(un.id::integer)
FROM unnest(string_to_array(ids,
',')) un(id)) max_id,
array_to_string(ARRAY((SELECT un.id::integer
FROM unnest(string_to_array(ids,
',')) un(id)
EXCEPT ALL
SELECT max(un.id::integer)
FROM unnest(string_to_array(ids,
',')) un(id))),
',') remaining_ids
FROM elbat;
Another option would have been regexp_split_to_table() which directly produces a set (or regexp_split_to_array() but than we'd had the possible regular expression overhead and still had to convert the array to a set).
But nevertheless you just should (almost) never use delimited lists (nor arrays). Use a table, that's (almost) always the best option.
SQL Fiddle
You can use a window function (https://www.postgresql.org/docs/current/static/tutorial-window.html) to get the max element per unnested array. After that you can reaggregate the elements and remove the calculated max value from the array.
Result:
a max_elem remaining
123,534,243,345 534 123,243,345
3,23,1 23 3,17
42 42
56,123,234,345,345 345 56,123,234
This query needs only one split/unnest as well as only one max calculation.
SELECT
a,
max_elem,
array_remove(array_agg(elements), max_elem) as remaining -- C
FROM (
SELECT
*,
MAX(elements) OVER (PARTITION BY a) as max_elem -- B
FROM (
SELECT
a,
unnest((string_to_array(a, ','))::int[]) as elements -- A
FROM arrays
)s
)s
GROUP BY a, max_elem
A: string_to_array converts the string list into an array. Because the arrays are treated as string arrays you need the cast them into integer arrays by adding ::int[]. The unnest() expands all array elements into own rows.
B: window function MAX gives the maximum value of the single arrays as max_elem
C: array_agg reaggregates the elements through the GROUP BY id. After that array_remove removes the max_elem value from the array.
If you do not like to store them as pure arrays but as string list again you could add array_to_string. But I wouldn't recommend this because your data are integer arrays and not strings. For every further calculation you would need this string cast. A even better way (as already stated by #stickybit) is not to store the elements as arrays but as unnested data. As you can see in nearly every operation should would do the unnest before.
Note:
It would be better to use an ID to adress the columns/arrays instead of the origin string as in SQL Fiddle with IDs
If you install the extension intarray this is quite easy.
First you need to create the extension (you have to be superuser to do that):
create extension intarray;
Then you can do the following:
select original_ids,
original_ids[1] as max_id,
sort(original_ids - original_ids[1]) as remaining_ids
from (
select sort_desc(string_to_array(original_ids,',')::int[]) as original_ids
from bad_design
) t
But you shouldn't be storing comma separated values to begin with

how to convert array elements from string to int in postgres?

I have a column in the table as jsonb [14,21,31]
and I want to get all the rows with selected element eg
SELECT *
FROM t_customers
WHERE tags ?| array['21','14']
but the jsonb elements are in integer format
how do i convert the sql array elements into integer
i tried removing the quotes from the array but it gives an error
A naive solution would be:
t=# with t_customers(tags) as (values('[14,21,31]'::jsonb))
select
tags
, translate(tags::text,'[]','{}')::int[] jsonb_int_to_arr
, translate(tags::text,'[]','{}')::int[] #> array['21','14']::int[] includes
from
t_customers;
tags | jsonb_int_to_arr | includes
--------------+------------------+----------
[14, 21, 31] | {14,21,31} | t
(1 row)
https://www.postgresql.org/docs/current/static/functions-array.html
if you want to cast as array - you should use #> operator to check if contains.
(at first I proposed it because I misunderstood the question - so it goes the opposite way, "turning" jsonb to array and checking if it contains, but now maybe this naive approach is the shortest)
the right approach here probably would be:
t=# with t_customers(tags) as (values('[14,21,31]'::jsonb))
, t as (select tags,jsonb_array_elements(tags) from t_customers)
select jsonb_agg(jsonb_array_elements::text) ?| array['21','14'] tags from t group by tags;
tags
------
t
(1 row)
which is basically "repacking" jsonb array with text representations of integers

Remove n elements from array using start and end index

I have the following table in a Postgres database:
CREATE TABLE test(
id SERIAL NOT NULL,
arr int[] NOT NULL
)
And the array contains about 500k elements.
I would like to know if there is an efficient way to update arr column by removing a set of elements from the array given the start and end index or just the number of "n first elements" to remove.
You can access individual elements or ranges of elements:
If you e.g. want to remove elements 5 to 8, you can do:
select arr[1:4]||arr[9:]
from test;
or as an update:
update test
set arr = arr[1:4]||arr[9:];
To remove the "first n elements", just use the slice after the n+1 element, e.g. to get remove the first 5 elements:
select arr[6:]
from test;
The syntax arr[6:] requires Postgres 9.6 or later, for earlier versions you need
select arr[6:cardinality(arr)]
from test;
cardinality() was introduced in 9.4, if you are using an even older version, you need:
select arr[6:array_lengt(arr,1)]
from test;
You can use slices (see 8.15.3. Accessing Arrays).
create table example
as select array[1,2,3,4,5,6,7,8] arr;
Remove first 3 elements:
select arr[4:8]
from example;
arr
-------------
{4,5,6,7,8}
(1 row)
Remove elements from 4 to 5:
select arr[1:3] || arr[6:8] as arr
from example;
arr
---------------
{1,2,3,6,7,8}
(1 row)
Remove first 5 elements if the length of the array is unknown:
select arr[6:array_length(arr,1)]
from example;
arr
---------
{6,7,8}
(1 row)

How to transfer a column in an array using PostgreSQL, when the columns data type is a composite type?

I'm using PostgreSQL 9.4 and I'm currently trying to transfer a columns values in an array. For "normal" (not user defined) data types I get it to work.
To explain my problem in detail, I made up a minimal example.
Let's assume we define a composite type "compo" and create a table "test_rel" and insert some values. Looks like this and works for me:
CREATE TYPE compo AS(a int, b int);
CREATE TABLE test_rel(t1 compo[],t2 int);
INSERT INTO test_rel VALUES('{"(1,2)"}',3);
INSERT INTO test_rel VALUES('{"(4,5)","(6,7)"}',3);
Next, we try to get an array with column t2's values. The following also works:
SELECT array(SELECT t2 FROM test_rel WHERE t2='3');
Now, we try to do the same stuff with column t1 (the column with the composite type). My problem is now, that the following does'nt work:
SELECT array(SELECT t1 FROM test_rel WHERE t2='3');
ERROR: could not find array type for data type compo[]
Could someone please give me a hint, why the same statement does'nt work with the composite type? I'm not only new to stackoverflow, but also to PostgreSQL and plpgsql. So, please tell me, when I'm doing something the wrong way.
There were some discussion about this in the PostgreSQL mailing list.
Long story short, both
select array(select array_type from ...)
select array_agg(array_type) from ...
represents a concept of array of arrays, which PostgreSQL doesn't support. PostgreSQL supports multidimensional arrays, but they have to be rectangular. F.ex. ARRAY[[0,1],[2,3]] is valid, but ARRAY[[0],[1,2]] is not.
There were some improvement with both the array constructor & the array_agg() function in 9.5.
Now, they explicitly states, that they will accumulate array arguments as a multidimensional array, but only if all of its parts have equal dimensions.
array() constructor: If the subquery's output column is of an array type, the result will be an array of the same type but one higher dimension; in this case all the subquery rows must yield arrays of identical dimensionality, else the result would not be rectangular.
array_agg(any array type): input arrays concatenated into array of one higher dimension (inputs must all have same dimensionality, and cannot be empty or NULL)
For 9.4, you could wrap the array into a row: this way, you could create something, which is almost an array of arrays:
SELECT array(SELECT ROW(t1) FROM test_rel WHERE t2='3');
SELECT array_agg(ROW(t1)) FROM test_rel WHERE t2='3';
Or, you could use a recursive CTE (and an array concatenation) to workaround the problem, like:
with recursive inp(arr) as (
values (array[0,1]), (array[1,2]), (array[2,3])
),
idx(arr, idx) as (
select arr, row_number() over ()
from inp
),
agg(arr, idx) as (
select array[[0, 0]] || arr, idx
from idx
where idx = 1
union all
select agg.arr || idx.arr, idx.idx
from agg
join idx on idx.idx = agg.idx + 1
)
select arr[array_lower(arr, 1) + 1 : array_upper(arr, 1)]
from agg
order by idx desc
limit 1;
But of course this solution is highly dependent of your data ('s dimensions).

Is there something like a zip() function in PostgreSQL that combines two arrays?

I have two array values of the same length in PostgreSQL:
{a,b,c} and {d,e,f}
and I'd like to combine them into
{{a,d},{b,e},{c,f}}
Is there a way to do that?
Postgres 9.5 or later
has array_agg(array expression):
array_agg ( anyarray ) → anyarray
Concatenates all the input arrays into an array of one higher
dimension. (The inputs must all have the same dimensionality, and
cannot be empty or null.)
This is a drop-in replacement for my custom aggregate function array_agg_mult() demonstrated below. It's implemented in C and considerably faster. Use it.
Postgres 9.4
Use the ROWS FROM construct or the updated unnest() which takes multiple arrays to unnest in parallel. Each can have a different length. You get (per documentation):
[...] the number of result rows in this case is that of the largest function
result, with smaller results padded with null values to match.
Use this cleaner and simpler variant:
SELECT ARRAY[a,b] AS ab
FROM unnest('{a,b,c}'::text[]
, '{d,e,f}'::text[]) x(a,b);
Postgres 9.3 or older
Simple zip()
Consider the following demo for Postgres 9.3 or earlier:
SELECT ARRAY[a,b] AS ab
FROM (
SELECT unnest('{a,b,c}'::text[]) AS a
, unnest('{d,e,f}'::text[]) AS b
) x;
Result:
ab
-------
{a,d}
{b,e}
{c,f}
Note that both arrays must have the same number of elements to unnest in parallel, or you get a cross join instead.
You can wrap this into a function, if you want to:
CREATE OR REPLACE FUNCTION zip(anyarray, anyarray)
RETURNS SETOF anyarray LANGUAGE SQL AS
$func$
SELECT ARRAY[a,b] FROM (SELECT unnest($1) AS a, unnest($2) AS b) x;
$func$;
Call:
SELECT zip('{a,b,c}'::text[],'{d,e,f}'::text[]);
Same result.
zip() to multi-dimensional array:
Now, if you want to aggregate that new set of arrays into one 2-dimenstional array, it gets more complicated.
SELECT ARRAY (SELECT ...)
or:
SELECT array_agg(ARRAY[a,b]) AS ab
FROM (
SELECT unnest('{a,b,c}'::text[]) AS a
,unnest('{d,e,f}'::text[]) AS b
) x
or:
SELECT array_agg(ARRAY[ARRAY[a,b]]) AS ab
FROM ...
will all result in the same error message (tested with pg 9.1.5):
ERROR: could not find array type for data type text[]
But there is a way around this, as we worked out under this closely related question.
Create a custom aggregate function:
CREATE AGGREGATE array_agg_mult (anyarray) (
SFUNC = array_cat
, STYPE = anyarray
, INITCOND = '{}'
);
And use it like this:
SELECT array_agg_mult(ARRAY[ARRAY[a,b]]) AS ab
FROM (
SELECT unnest('{a,b,c}'::text[]) AS a
, unnest('{d,e,f}'::text[]) AS b
) x
Result:
{{a,d},{b,e},{c,f}}
Note the additional ARRAY[] layer! Without it and just:
SELECT array_agg_mult(ARRAY[a,b]) AS ab
FROM ...
You get:
{a,d,b,e,c,f}
Which may be useful for other purposes.
Roll another function:
CREATE OR REPLACE FUNCTION zip2(anyarray, anyarray)
RETURNS SETOF anyarray LANGUAGE SQL AS
$func$
SELECT array_agg_mult(ARRAY[ARRAY[a,b]])
FROM (SELECT unnest($1) AS a, unnest($2) AS b) x;
$func$;
Call:
SELECT zip2('{a,b,c}'::text[],'{d,e,f}'::text[]); -- or any other array type
Result:
{{a,d},{b,e},{c,f}}
Here's another approach that's safe for arrays of differing lengths, using the array multi-aggregation mentioned by Erwin:
CREATE OR REPLACE FUNCTION zip(array1 anyarray, array2 anyarray) RETURNS text[]
AS $$
SELECT array_agg_mult(ARRAY[ARRAY[array1[i],array2[i]]])
FROM generate_subscripts(
CASE WHEN array_length(array1,1) >= array_length(array2,1) THEN array1 ELSE array2 END,
1
) AS subscripts(i)
$$ LANGUAGE sql;
regress=> SELECT zip('{a,b,c}'::text[],'{d,e,f}'::text[]);
zip
---------------------
{{a,d},{b,e},{c,f}}
(1 row)
regress=> SELECT zip('{a,b,c}'::text[],'{d,e,f,g}'::text[]);
zip
------------------------------
{{a,d},{b,e},{c,f},{NULL,g}}
(1 row)
regress=> SELECT zip('{a,b,c,z}'::text[],'{d,e,f}'::text[]);
zip
------------------------------
{{a,d},{b,e},{c,f},{z,NULL}}
(1 row)
If you want to chop off the excess rather than null-padding, just change the >= length test to <= instead.
This function does not handle the rather bizarre PostgreSQL feature that arrays may have a stating element other than 1, but in practice nobody actually uses that feature. Eg with a zero-indexed 3-element array:
regress=> SELECT zip('{a,b,c}'::text[], array_fill('z'::text, ARRAY[3], ARRAY[0]));
zip
------------------------
{{a,z},{b,z},{c,NULL}}
(1 row)
wheras Erwin's code does work with such arrays, and even with multi-dimensional arrays (by flattening them) but does not work with arrays of differing length.
Arrays are a bit special in PostgreSQL, they're a little too flexible with multi-dimensional arrays, configurable origin index, etc.
In 9.4 you'll be able to write:
SELECT array_agg_mult(ARRAY[ARRAY[a,b])
FROM unnest(array1) WITH ORDINALITY as (o,a)
NATURAL FULL OUTER JOIN
unnest(array2) WITH ORDINALITY as (o,b);
which will be a lot nicer, especially if an optimisation to scan the functions together rather than doing a sort and join goes in.