How to query an array column in Postgres (array intersection)

I have a table that stores values in an array column, something like this:
id | contents_id
1 | [1, 3, 5]
2 | [1, 2]
3 | [3, 4, 6]
4 | [2, 5]
How can I query with an array, e.g. [1, 2], so that it is matched against the individual values and not against the array as a whole?
If any value of the queried array is found in a row's array, that row should be returned.
If [1, 2] is queried, it must fetch id => 1, 2, 4 from the table above, as those rows contain 1 or 2.

Consider the && (overlap) operator. It is built into PostgreSQL for arrays of any type, and the intarray extension provides its own version for integer arrays with additional index support. An example:
select id from test where ARRAY[1,2] && contents_id;
Though you can query it with the operator, I think it will be better to make a junction table with integer IDs.
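A minimal sketch of what such a junction table could look like. The names test_contents, test_id and content_id are made up for illustration and do not come from the question:

```sql
-- Hypothetical junction table: one row per (row id, content id) pair.
CREATE TABLE test_contents (
    test_id    int NOT NULL,
    content_id int NOT NULL,
    PRIMARY KEY (test_id, content_id)
);

-- A plain b-tree index supports lookups by content id.
CREATE INDEX test_contents_content_id_idx ON test_contents (content_id);

INSERT INTO test_contents (test_id, content_id) VALUES
    (1, 1), (1, 3), (1, 5),
    (2, 1), (2, 2),
    (3, 3), (3, 4), (3, 6),
    (4, 2), (4, 5);

-- Rows that contain 1 or 2.
SELECT DISTINCT test_id
FROM test_contents
WHERE content_id IN (1, 2)
ORDER BY test_id;
-- returns 1, 2, 4 for the sample data
```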

On one-dimensional int arrays the && operator (arrayoverlap) is the fastest, as @LaposhasúAcsa suggested.
So my answer only stands if arrayoverlap is not available, or if you want to work with anything other than one-dimensional integer arrays.
Check UNNEST https://www.postgresql.org/docs/current/static/functions-array.html
CREATE TABLE t45407507 (
id SERIAL PRIMARY KEY
,c int[]
);
insert into t45407507 ( c) values
(ARRAY[1,3,5])
, (ARRAY[1,2])
, (ARRAY[3,4,6])
, (ARRAY[2,5]);
select DISTINCT id from
(SELECT id,unnest(c) as c
from t45407507) x
where x.c in (1,2);
Can be shortened with LATERAL join
select DISTINCT id from
t45407507 x,unnest(c) ec
where ec in (1,2);
The comma (,) in the FROM clause is short notation for CROSS JOIN.
LATERAL is assumed automatically for table functions like unnest().
Rewrite WHERE to use ARRAY as parameter
SELECT DISTINCT id FROM
t45407507 x,unnest(c) ec
WHERE ec = ANY(ARRAY[1,2]);
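For comparison, here is a sketch of the overlap form on the same demo table, together with a GIN index that the default array operator class can use for &&. The index name is an assumption, and whether the planner actually uses the index depends on table size and statistics:

```sql
-- Assumes the table t45407507 (id serial, c int[]) created above.
CREATE INDEX t45407507_c_gin ON t45407507 USING gin (c);

SELECT id
FROM t45407507
WHERE c && ARRAY[1,2]   -- true when the two arrays share at least one element
ORDER BY id;
-- returns 1, 2, 4 for the sample data
```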

Related

Postgres filter rows matching grouped foreign key that includes all values of an array

I've got 3 tables:
create table events (id serial, ...)
create table devices (id serial, ...)
create table event_devices (event_id int, device_id int, ...)
Let's say the data in event_devices looks like this:
event_id | device_id
--------------------
1 | 1
1 | 2
1 | 3
2 | 1
2 | 4
I need to conduct a search for two cases:
filter all events that contain any device in a given list such that
{1, 4} -> (1, 2)
{1, 2, 3} -> (1, 2)
filter all events that contain all devices in a given list such that
{1, 4} -> (2)
{1, 2, 3} -> ()
Let's say the given list is input as an array of ints.
The first case is pretty simple; I can simply use "IN":
with
devices_filter as (
select distinct
event_devices.event_id
from event_devices
where
event_devices.device_id in (select unnest($1::int[]) as device_id)
)
select
events.id as event_id
from events
left outer join devices_filter on
devices_filter.event_id = events.id
where
devices_filter.event_id is not null
But how do I query for the second case? I thought I might need another CTE that groups and aggregates device ids by event id, then performs an intersection and checks that the resulting length equals the length of the input array, but I'm not sure exactly how that would work. I'd also like to avoid any unnecessary grouping, since the event_devices table can be quite large.
Any hints or tips?
If you are passing in an array that has no duplicates, you can use aggregation:
select ed.event_id
from event_devices ed
where ed.device_id = any (:array)
group by ed.event_id
having count(*) = cardinality(:array)
If you need to cast the values, then :array is really $1::int[].
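If the input array may contain duplicates, or event_devices may hold the same pair twice, a variant that counts distinct values on both sides could look like the sketch below. The sample rows are the ones from the question; the literal ARRAY[1, 4, 4] stands in for the $1::int[] parameter, and the CTE name wanted is made up:

```sql
WITH event_devices(event_id, device_id) AS (
    VALUES (1, 1), (1, 2), (1, 3), (2, 1), (2, 4)
),
wanted AS (
    -- De-duplicate the input list once (the 4 is repeated on purpose).
    SELECT DISTINCT device_id
    FROM unnest(ARRAY[1, 4, 4]) AS u(device_id)
)
SELECT ed.event_id
FROM event_devices ed
JOIN wanted w USING (device_id)
GROUP BY ed.event_id
-- An event qualifies when it matched every distinct wanted device.
HAVING count(DISTINCT ed.device_id) = (SELECT count(*) FROM wanted);
-- returns event_id 2 for the sample data
```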

Searching element in jsonb_array_elements(2D array)

I have a table with 2 columns, R(id int, dat jsonb). The jsonb column contains a 2D array [][]. For example:
id| dat
1 | {"name":"a","numbers":[[1,2],[3,4],[5,6],[1,3]]}
2 | {"age":5,"numbers":[[1,1]]}
3 | {"numbers":[[5,6],[6,7]]}
I'm trying to find all the ids that contain a specific number in one of those sub arrays. I used 2 solutions and I want to understand why the first one isn't working :
1)
select * from R
where exists (
select from jsonb_array_elements(R.dat->'numbers')->>0 first,jsonb_array_elements(range.data->'numbers')->>1 second where first::decimal= 1 and second::decimal= 1
);
ERROR: syntax error at or near "->>"
LINE 3: ...t from jsonb_array_elements(R.dat->'numbers')->>0 first,j...
2)
SELECT *
FROM R
WHERE EXISTS (
SELECT FROM jsonb_array_elements(R.dat-> 'numbers') subarray
WHERE (subarray->>0)::decimal = 1 and (subarray->>1)::decimal = 1
);
In addition, I saw that a GIN index doesn't handle this operator, so is there any index that would help here?
Your first query raises an error because you can use only table expressions (not value expressions) in the FROM clause.
You can make the second query a bit simpler:
select *
from r
where exists (
select from jsonb_array_elements(dat->'numbers') subarray
where subarray = '[1,1]'
);
or using the function in a lateral join:
select r.*
from r
cross join jsonb_array_elements(dat->'numbers')
where value = '[1,1]';
There is no index that could support these queries because of the use of jsonb_array_elements().
You may be tempted to use the containment operator @> like this:
select *
from r
where dat->'numbers' @> '[[1,1]]'::jsonb
id | dat
----+------------------------------------------------------------
1 | {"name": "a", "numbers": [[1, 2], [3, 4], [5, 6], [1, 3]]}
2 | {"age": 5, "numbers": [[1, 1]]}
(2 rows)
Unfortunately, as you can see, it does not work the way you might expect: row 1 is returned although it has no [1,1] subarray. The operator is a bit tricky on arrays: array1 @> array2 is true if for each element j of array2 there is an element i in array1 such that i @> j. Here [1, 2] @> [1, 1] is true, because, per the documentation:
the order of array elements is not significant when doing a containment match, and duplicate array elements are effectively considered only once.
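Since containment is true for every row that really has the subarray (plus some false positives), one possible workaround is to use it only as a coarse filter that a GIN expression index can support, and to re-check the survivors exactly. This is a sketch under that assumption, not something from the answers above. The index name is made up, and the planner is free to ignore the index:

```sql
-- Assumes the table r(id int, dat jsonb) from the question.
-- Expression index using the default jsonb operator class, which supports @>.
CREATE INDEX r_numbers_gin ON r USING gin ((dat -> 'numbers'));

SELECT r.*
FROM r
WHERE dat -> 'numbers' @> '[[1,1]]'::jsonb   -- coarse test, may match too much
  AND EXISTS (                               -- exact test on the remaining rows
        SELECT
        FROM jsonb_array_elements(r.dat -> 'numbers') AS sub(elem)
        WHERE sub.elem = '[1,1]'::jsonb
      );
-- for the sample data only id 2 passes both tests
```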

How to get index of an array value in PostgreSQL?

I have a table called pins like this:
id (int) | pin_codes (jsonb)
--------------------------------
1 | [4000, 5000, 6000]
2 | [8500, 8400, 8600]
3 | [2700, 2300, 2980]
Now, I want the row with pin_code 8600 and with its array index. The output must be like this:
pin_codes | index
------------------------------
[8500, 8400, 8600] | 2
If I want the row with pin_code 2700, the output :
pin_codes | index
------------------------------
[2700, 2300, 2980] | 0
What I've tried so far:
SELECT pin_codes FROM pins WHERE pin_codes @> '[8600]'
It only returns the row with the wanted value. I don't know how to get the index of the value in the pin_codes array!
Any help would be greatly appreciated.
P.S:
I'm using PostgreSQL 10
If you were storing the array as a real array not as a json, you could use array_position() to find the (first) index of a given element:
select array_position(array['one', 'two', 'three'], 'two')
returns 2
With some text mangling you can cast the JSON array into a text array:
select array_position(translate(pin_codes::text,'[]','{}')::text[], '8600')
from the_table;
This also allows you to use the ANY operator:
select *
from pins
where '8600' = any(translate(pin_codes::text,'[]','{}')::text[])
The contains operator @> expects arrays on both sides. You could use it to search for two pin codes at a time:
select *
from pins
where translate(pin_codes::text,'[]','{}')::text[] @> array['8600','8400']
Or use the overlaps operator && to find rows with any of multiple elements:
select *
from pins
where translate(pin_codes::text,'[]','{}')::text[] && array['8600','2700']
would return
id | pin_codes
---+-------------------
2 | [8500, 8400, 8600]
3 | [2700, 2300, 2980]
If you do that a lot, it would be more efficient to store the pin_codes as text[] rather than JSON. Then you can also index that column to make searches more efficient.
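A sketch of that layout, using int[] instead of text[] since the sample values are all integers. The table name pins_arr and the index name are assumptions for illustration. array_position() is 1-based, so 1 is subtracted to get the 0-based index the question asks for:

```sql
-- Hypothetical copy of the pins table with a native array column.
CREATE TABLE pins_arr (
    id        int PRIMARY KEY,
    pin_codes int[] NOT NULL
);

INSERT INTO pins_arr (id, pin_codes) VALUES
    (1, ARRAY[4000, 5000, 6000]),
    (2, ARRAY[8500, 8400, 8600]),
    (3, ARRAY[2700, 2300, 2980]);

-- GIN index that can support @> and && on the array column.
CREATE INDEX pins_arr_pin_codes_gin ON pins_arr USING gin (pin_codes);

SELECT id,
       pin_codes,
       array_position(pin_codes, 8600) - 1 AS index   -- convert to 0-based
FROM pins_arr
WHERE pin_codes @> ARRAY[8600];
-- returns id 2 with index 2 for the sample data
```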
Use the function jsonb_array_elements_text() with ordinality:
with my_table(id, pin_codes) as (
values
(1, '[4000, 5000, 6000]'::jsonb),
(2, '[8500, 8400, 8600]'),
(3, '[2700, 2300, 2980]')
)
select id, pin_codes, ordinality- 1 as index
from my_table, jsonb_array_elements_text(pin_codes) with ordinality
where value::int = 8600;
id | pin_codes | index
----+--------------------+-------
2 | [8500, 8400, 8600] | 2
(1 row)
As has been pointed out previously the array_position function is only available in Postgres 9.5 and greater.
Here is custom function that achieves the same, derived from nathansgreen at github.
-- The array_position function was added in Postgres 9.5.
-- For older versions, you can get the same behavior with this function.
create function array_position(arr ANYARRAY, elem ANYELEMENT, pos INTEGER default 1) returns INTEGER
language sql
as $BODY$
select row_number::INTEGER
from (
select unnest, row_number() over ()
from ( select unnest(arr) ) t0
) t1
where row_number >= greatest(1, pos)
and (case when elem is null then unnest is null else unnest = elem end)
limit 1;
$BODY$;
So in this specific case, after creating the function the following worked for me.
SELECT
pin_codes,
array_position(pin_codes, 8600) AS index
FROM pins
WHERE array_position(pin_codes, 8600) IS NOT NULL;
Bear in mind that it only returns the index of the first occurrence of 8600. The pos argument sets the position at which the search starts, so you can use it to find a later occurrence.
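To illustrate the start position, here is a small sketch with the built-in functions (Postgres 9.5 or later) on a made-up array that contains 8600 twice. array_positions() is the built-in that returns every match at once:

```sql
SELECT array_position(ARRAY[8600, 8400, 8600], 8600)     AS first_hit,  -- 1
       array_position(ARRAY[8600, 8400, 8600], 8600, 2)  AS next_hit,   -- 3, search starts at position 2
       array_positions(ARRAY[8600, 8400, 8600], 8600)    AS all_hits;   -- {1,3}
```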
In short, normalize your data structure, or don't do this in SQL. If you want this index of the sub-data element given your current data structure, then do this in your application code (take result, cast to list/array, get index).
Try to unnest the string and assign numbers as follows:
with dat as
(
select 1 id, '8700, 5600, 2300' pins
union all
select 2 id, '2300, 1700, 1000' pins
)
select dat.*, t.rn as index
from
(
select id, t.pins, row_number() over (partition by id) rn
from
(
select id, trim(unnest(string_to_array(pins, ','))) pins from dat
) t
) t
join dat on dat.id = t.id and t.pins = '2300'
If you insist on storing arrays, I'd defer to klin's answer.
As the alternative answer and extension to my comment...don't store SQL data in arrays. 'Normalize' your data in advance and SQL will handle it significantly better. Klin's answer is good, but may suffer for performance as it's outside of what SQL does best.
I'd break the Array prior to storing it. If the number of pincodes is known, then simply having the table pin_id,pin1,pin2,pin3, pinetc... is functional.
If the number of pins is unknown, a first table as pin that stored the pin_id and any info columns related to that pin ID, and then a second table as pin_id, pin_seq,pin_value is also functional (though you may need to pivot this later on to make sense of the data). In this case, select pin_seq where pin_value = 260 would work.

Combine elements of array into different array

I need to split text elements in an array and combine the elements (array_agg) by index into different rows
E.g., input is
'{cat$ball$x... , dog$bat$y...}'::text[]
I need to split each element by '$' and the desired output is:
{cat,dog} - row 1
{ball,bat} - row 2
{x,y} - row 3
...
Sorry for not being clear the first time. I have edited my question. I tried similar options but was unable to figure out how to get it with multiple text elements separated by the '$' symbol.
Exactly two parts per array element (original question)
Use unnest(), split_part() and array_agg():
SELECT array_agg(split_part(t, '$', 1)) AS col1
, array_agg(split_part(t, '$', 2)) AS col2
FROM unnest('{cat$ball, dog$bat}'::text[]) t;
Related:
Split comma separated column data into additional columns
General solution (updated question)
For any number of arrays with any number of elements containing any number of parts.
Demo for a table tbl:
CREATE TABLE tbl (tbl_id int PRIMARY KEY, arr text[]);
INSERT INTO tbl VALUES
(1, '{cat1$ball1, dog2$bat2}') -- 2 parts per array element, 2 elements
, (2, '{cat$ball$x, dog$bat$y}') -- 3 parts ...
, (3, '{a1$b1$c1$d1, a2$b2$c2$d2, a3$b3$c3$d3}'); -- 4 parts, 3 elements
Query:
SELECT tbl_id, idx, array_agg(elem ORDER BY ord) AS pivoted_array
FROM tbl t
, unnest(t.arr) WITH ORDINALITY a1(string, ord)
, unnest(string_to_array(a1.string, '$')) WITH ORDINALITY a2(elem, idx)
GROUP BY tbl_id, idx
ORDER BY tbl_id, idx;
We are looking at two (nested) LATERAL joins here. LATERAL requires Postgres 9.3. Details:
What is the difference between LATERAL and a subquery in PostgreSQL?
WITH ORDINALITY for the first unnest() is up for debate. A simpler query normally works, too. It's just not guaranteed to work according to SQL standards:
SELECT tbl_id, idx, array_agg(elem) AS pivoted_array
FROM tbl t
, unnest(t.arr) string
, unnest(string_to_array(string, '$')) WITH ORDINALITY a2(elem, idx)
GROUP BY tbl_id, idx
ORDER BY tbl_id, idx;
Details:
PostgreSQL unnest() with element number
WITH ORDINALITY requires Postgres 9.4 or later. The same back-patched to Postgres 9.3:
SELECT tbl_id, idx, array_agg(arr2[idx]) AS pivoted_array
FROM tbl t
, LATERAL (
SELECT string_to_array(string, '$') AS arr2 -- convert string to array
FROM unnest(t.arr) string -- unnest org. array
) x
, generate_subscripts(arr2, 1) AS idx -- unnest 2nd array with ord. numbers
GROUP BY tbl_id, idx
ORDER BY tbl_id, idx;
With the demo data above, each query returns (the order of elements inside each array is only guaranteed for the first query):
tbl_id | idx | pivoted_array
--------+-----+---------------
1 | 1 | {cat1,dog2}
1 | 2 | {ball1,bat2}
2 | 1 | {cat,dog}
2 | 2 | {ball,bat}
2 | 3 | {x,y}
3 | 1 | {a1,a2,a3}
3 | 2 | {b1,b2,b3}
3 | 3 | {c1,c2,c3}
3 | 4 | {d1,d2,d3}
SQL Fiddle (still stuck on pg 9.3).
The only requirement for these queries is that the number of parts in elements of the same array is constant. We could even make it work for a varying number of parts using crosstab() with two parameters to fill in NULL values for missing parts, but that's beyond the scope of this question:
PostgreSQL Crosstab Query
A bit messy but you could unnest the array, use regex to separate the text and then aggregate back up again:
with a as (select unnest('{cat$ball, dog$bat}'::_text) some_text),
b as (select regexp_matches(a.some_text, '(^[a-z]*)\$([a-z]*$)') animal_object from a)
select array_agg(animal_object[1]) animal, array_agg(animal_object[2]) a_object
from b
If you're processing multiple records at once you may want to use something like a row number before the unnest so that you have a group by to aggregate back to an array in your final select statement.
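A rough sketch of that idea for several rows, assuming each row already has a key to group by. The table name tbl, the column tbl_id and the second sample row are made up here. Without an explicit ORDER BY inside array_agg() the element order is not guaranteed:

```sql
WITH tbl(tbl_id, arr) AS (
    VALUES (1, '{cat$ball, dog$bat}'::text[]),
           (2, '{fish$net, bird$cage}')
),
a AS (
    -- Keep the row key next to every unnested element.
    SELECT tbl_id, unnest(arr) AS some_text
    FROM tbl
),
b AS (
    SELECT tbl_id,
           regexp_matches(some_text, '^([a-z]*)\$([a-z]*)$') AS animal_object
    FROM a
)
SELECT tbl_id,
       array_agg(animal_object[1]) AS animal,
       array_agg(animal_object[2]) AS a_object
FROM b
GROUP BY tbl_id      -- aggregate back to one row per original row
ORDER BY tbl_id;
```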

Find duplicated values on array column

I have a table with an array column like this:
my_table
id array
-- -----------
1 {1, 3, 4, 5}
2 {19,2, 4, 9}
3 {23,46, 87, 6}
4 {199,24, 93, 6}
As a result, I want to see which values are repeated and in which rows, like this:
value_repeated is_repeated_on
-------------- -----------
4 {1,2}
6 {3,4}
Is it possible? I don't know how to do this. I don't how to start it! I'm lost!
Use unnest to convert the array to rows, and then array_agg to build an array from the ids
It should look something like this:
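SELECT v AS value_repeated, array_agg(id) AS is_repeated_on FROM
(SELECT id, unnest(array_col) AS v FROM my_table) t
GROUP BY v HAVING count(DISTINCT id) > 1
Note that HAVING count(DISTINCT id) > 1 filters out values that appear in only one row. Here array_col stands for your array column (array itself is a reserved word and would need double quotes), and the subquery needs an alias (t).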
The clean way to call a set-returning function like unnest() is in a LATERAL join, available since Postgres 9.3:
SELECT value_repeated, array_agg(id) AS is_repeated_on
FROM my_table
, unnest(array_col) value_repeated
GROUP BY value_repeated
HAVING count(*) > 1
ORDER BY value_repeated; -- optional
About LATERAL:
Call a set-returning function with an array argument multiple times
There is nothing in your question to rule out duplicates within a single array (the same element more than once in the same array, as I#MSoP commented), so it must be count(*), not count(DISTINCT id).
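To see the difference between the two counts, here is a small sketch with made-up rows in which 4 is repeated inside the array of row 1 and 7 is repeated only inside row 3:

```sql
WITH my_table(id, array_col) AS (
    VALUES (1, '{1,3,4,4}'::int[]),
           (2, '{19,2,4,9}'),
           (3, '{7,7}')
)
SELECT value_repeated,
       array_agg(id ORDER BY id) AS is_repeated_on,
       count(*)                  AS occurrences,     -- every occurrence counts
       count(DISTINCT id)        AS distinct_rows    -- rows containing the value
FROM my_table
   , unnest(array_col) value_repeated
GROUP BY value_repeated
HAVING count(*) > 1
ORDER BY value_repeated;
-- 4 is reported with occurrences = 3, distinct_rows = 2
-- 7 is reported with occurrences = 2, distinct_rows = 1;
-- HAVING count(DISTINCT id) > 1 would drop it
```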