How to get index of an array value in PostgreSQL? - sql

I have a table called pins like this:
id (int) | pin_codes (jsonb)
--------------------------------
1 | [4000, 5000, 6000]
2 | [8500, 8400, 8600]
3 | [2700, 2300, 2980]
Now, I want the row with pin_code 8600 and with its array index. The output must be like this:
pin_codes | index
------------------------------
[8500, 8400, 8600] | 2
If I want the row with pin_code 2700, the output should be:
pin_codes | index
------------------------------
[2700, 2300, 2980] | 0
What I've tried so far:
SELECT pin_codes FROM pins WHERE pin_codes @> '[8600]'
It only returns the row with the wanted value. I don't know how to get the index of the value in the pin_codes array!
Any help would be greatly appreciated.
P.S:
I'm using PostgreSQL 10

If you were storing the array as a real array, not as JSON, you could use array_position() to find the (first) index of a given element:
select array_position(array['one', 'two', 'three'], 'two')
returns 2
With some text mangling you can cast the JSON array into a text array:
select array_position(translate(pin_codes::text,'[]','{}')::text[], '8600')
from the_table;
That also allows you to use the ANY operator:
select *
from pins
where '8600' = any(translate(pin_codes::text,'[]','{}')::text[])
The contains operator @> expects arrays on both sides of the operator. You could use it to search for two pin codes at a time:
select *
from pins
where translate(pin_codes::text,'[]','{}')::text[] @> array['8600','8400']
Or use the overlaps operator && to find rows with any of multiple elements:
select *
from pins
where translate(pin_codes::text,'[]','{}')::text[] && array['8600','2700']
This would return:
id | pin_codes
---+-------------------
2 | [8500, 8400, 8600]
3 | [2700, 2300, 2980]
If you do that a lot, it would be more efficient to store the pin_codes as text[] rather than JSON - then you can also index that column to make those searches more efficient.
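For example, a migration along these lines could work (just a sketch, assuming the table and column definitions from the question and that all elements are plain scalars):

ALTER TABLE pins
  ALTER COLUMN pin_codes TYPE text[]
  USING translate(pin_codes::text, '[]', '{}')::text[];

-- a GIN index supports the @>, <@ and && operators used above
CREATE INDEX pins_pin_codes_idx ON pins USING gin (pin_codes);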

Use the function jsonb_array_elements_text() with ordinality.
with my_table(id, pin_codes) as (
  values
    (1, '[4000, 5000, 6000]'::jsonb),
    (2, '[8500, 8400, 8600]'),
    (3, '[2700, 2300, 2980]')
)
select id, pin_codes, ordinality - 1 as index
from my_table, jsonb_array_elements_text(pin_codes) with ordinality
where value::int = 8600;
id | pin_codes | index
----+--------------------+-------
2 | [8500, 8400, 8600] | 2
(1 row)

As has been pointed out previously, the array_position function is only available in Postgres 9.5 and later.
Here is a custom function that achieves the same, derived from nathansgreen at github.
-- The array_position function was added in Postgres 9.5.
-- For older versions, you can get the same behavior with this function.
create function array_position(arr ANYARRAY, elem ANYELEMENT, pos INTEGER default 1) returns INTEGER
language sql
as $BODY$
  select row_number::INTEGER
  from (
    select unnest, row_number() over ()
    from (select unnest(arr)) t0
  ) t1
  where row_number >= greatest(1, pos)
    and (case when elem is null then unnest is null else unnest = elem end)
  limit 1;
$BODY$;
So in this specific case, after creating the function the following worked for me.
SELECT pin_codes,
       array_position(pin_codes, 8600) AS index
FROM pins
WHERE array_position(pin_codes, 8600) IS NOT NULL;
Worth bearing in mind that it will only return the index of the first occurrence of 8600; you can use the pos argument to start the search further in and find whichever occurrence you like.
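For example (a small sketch that works with the built-in array_position() as well as with a compatible custom function, assuming the pin codes are stored as an integer array):

SELECT array_position(ARRAY[8600, 8400, 8600], 8600);    -- returns 1 (first occurrence)
SELECT array_position(ARRAY[8600, 8400, 8600], 8600, 2); -- starts at position 2, returns 3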

In short, normalize your data structure, or don't do this in SQL. If you want the index of the sub-data element given your current data structure, then do it in your application code (take the result, cast it to a list/array, get the index).

Try to unnest the string and assign numbers as follows:
with dat as
(
  select 1 id, '8700, 5600, 2300' pins
  union all
  select 2 id, '2300, 1700, 1000' pins
)
select dat.*, t.rn as index
from
(
  -- note: row_number() without ORDER BY relies on unnest() emitting
  -- elements in array order, which holds in practice but is not guaranteed
  select id, t.pins, row_number() over (partition by id) rn
  from
  (
    select id, trim(unnest(string_to_array(pins, ','))) pins from dat
  ) t
) t
join dat on dat.id = t.id and t.pins = '2300'

If you insist on storing arrays, I'd defer to klin's answer.
As the alternative answer and an extension to my comment: don't store SQL data in arrays. Normalize your data in advance and SQL will handle it significantly better. Klin's answer is good, but may suffer performance-wise, as it's outside of what SQL does best.
I'd break the array apart prior to storing it. If the number of pin codes is known, then simply having the table pin_id, pin1, pin2, pin3, etc. is functional.
If the number of pins is unknown, a first table pin that stores the pin_id and any info columns related to that pin ID, plus a second table with pin_id, pin_seq, pin_value, is also functional (though you may need to pivot this later on to make sense of the data). In that case, select pin_seq ... where pin_value = 8600 would work, as sketched below.
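A sketch of that two-table design (all names here are illustrative):

CREATE TABLE pin (
  pin_id int PRIMARY KEY
  -- ... any info columns related to this pin ID
);

CREATE TABLE pin_code (
  pin_id    int REFERENCES pin,
  pin_seq   int,
  pin_value int,
  PRIMARY KEY (pin_id, pin_seq)
);

-- finding the position of a pin code is then a plain lookup:
SELECT pin_seq FROM pin_code WHERE pin_id = 2 AND pin_value = 8600;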

Related

matching array in Postgres with string manipulation

I was working with the <@ operator and two arrays of strings.
anyarray <@ anyarray → boolean
Every string is formed in this way: ${name}_${number}, and I would like to check if the name part is included and the number is equal or lower than the one in the other array.
['elementOne_10'] <@ ['elementOne_7' , 'elementTwo20'] → true
['elementOne_10'] <@ ['elementOne_17', 'elementTwo20'] → false
What would be an efficient way to do this?
Assuming your sample data elementTwo20 in fact follows your described schema and should be elementTwo_20:
step-by-step demo:db<>fiddle
SELECT
  id
FROM (
  SELECT
    *,
    split_part(u, '_', 1) as name,                           -- 3
    split_part(u, '_', 2)::int as num,
    split_part(compare, '_', 1) as comp_name,
    split_part(compare, '_', 2)::int as comp_num
  FROM
    t,
    unnest(data) u,                                          -- 1
    (SELECT unnest('{elementOne_10}'::text[]) as compare) s  -- 2
) s
GROUP BY id                                                  -- 4
HAVING
  ARRAY_AGG(name) @> ARRAY_AGG(comp_name)                    -- 5
  AND MAX(comp_num) BETWEEN MIN(num) AND MAX(num)
unnest() your array elements into one element per record
JOIN and unnest() your comparision data
split the element strings into their name and num parts
unnest() creates several records per original array, they can be grouped by an identifier (best is an id column)
Filter with your criteria in the HAVING clause: Compare the name parts for example with array operators, for BETWEEN comparing you can use MIN and MAX on the num part.
Note:
As @a_horse_with_no_name correctly mentioned: if possible, think about your database design and normalize it:
Don't store arrays -> You don't need to unnest them on every operation
Relevant data should be kept separated, not concatenated as a string -> You don't need to split them on every operation
id | name | num
---------------------
1 | elementOne | 7
1 | elementTwo | 20
2 | elementOne | 17
2 | elementTwo | 20
This is exactly the result of the inner subquery above. You would have to recreate it every time you need this data. It's better to store the data in this form from the start.
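A sketch of what that could look like (table and column names are assumptions):

CREATE TABLE element (
  id   int,
  name text,
  num  int,
  PRIMARY KEY (id, name)
);

-- the comparison then needs no unnest() or split_part() at all:
SELECT id
FROM element
WHERE name = 'elementOne'
  AND num <= 10;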

Remove duplicate entries from string array column of postgres

I have a PostgreSQL table with a column that holds an array of strings. Some rows contain only unique strings, while others contain duplicates. I want to remove the duplicate strings from each row if they exist.
I have tried some queries but couldn't make it work.
Following is the table:
veh_id | vehicle_types
--------+----------------------------------------
1 | {"byd_tang","volt","viper","laferrari"}
2 | {"volt","viper"}
3 | {"byd_tang","sonata","jaguarxf"}
4 | {"swift","teslax","mirai"}
5 | {"volt","viper"}
6 | {"viper","ferrariff","bmwi8","viper"}
7 | {"ferrariff","viper","viper","volt"}
I am expecting following output:
veh_id | vehicle_types
--------+----------------------------------------
1 | {"byd_tang","volt","viper","laferrari"}
2 | {"volt","viper"}
3 | {"byd_tang","sonata","jaguarxf"}
4 | {"swift","teslax","mirai"}
5 | {"volt","viper"}
6 | {"viper","ferrariff","bmwi8"}
7 | {"ferrariff","viper","volt"}
Since each row's array is independent, a plain correlated subquery with an ARRAY constructor would do the job:
SELECT *, ARRAY(SELECT DISTINCT unnest(vehicle_types)) AS vehicle_types_uni
FROM vehicle;
See:
Why is array_agg() slower than the non-aggregate ARRAY() constructor?
Note that NULL is converted to an empty array ('{}'). We'd need to special-case it, but it is excluded in the UPDATE below anyway.
Fast and simple. But don't use this. You didn't say so, but typically you'd want to preserve the original order of array elements. Your rudimentary sample suggests as much. Use WITH ORDINALITY in the correlated subquery, which becomes a bit more sophisticated:
SELECT *, ARRAY (SELECT v
FROM unnest(vehicle_types) WITH ORDINALITY t(v,ord)
GROUP BY 1
ORDER BY min(ord)
) AS vehicle_types_uni
FROM vehicle;
See:
PostgreSQL unnest() with element number
UPDATE to actually remove dupes:
UPDATE vehicle
SET vehicle_types = ARRAY (
SELECT v
FROM unnest(vehicle_types) WITH ORDINALITY t(v,ord)
GROUP BY 1
ORDER BY min(ord)
)
WHERE cardinality(vehicle_types) > 1 -- optional
AND vehicle_types <> ARRAY (
SELECT v
FROM unnest(vehicle_types) WITH ORDINALITY t(v,ord)
GROUP BY 1
ORDER BY min(ord)
); -- suppress empty updates (optional)
Both added WHERE conditions are optional to improve performance. The 1st one is completely redundant. Each condition also excludes the NULL case. The 2nd one suppresses all empty updates.
See:
How do I (or can I) SELECT DISTINCT on multiple columns?
If you tried to do that without preserving original order, you'd likely update most rows without need, just because the order or elements changed even without dupes.
Requires Postgres 9.4 or later.
db<>fiddle here
I don't claim it's efficient, but something like this might work:
with expanded as (
  select veh_id, unnest(vehicle_types) as vehicle_type
  from vehicles
)
select veh_id, array_agg(distinct vehicle_type)
from expanded
group by veh_id
If you really want to get fancy, you can write a custom function (note that the ANY lookup inside the loop makes it O(n²) in the worst case, not O(n)):
create or replace function unique_array(input_array text[])
returns text[] as $$
DECLARE
  output_array text[];
  i integer;
BEGIN
  output_array := array[]::text[];
  for i in 1..cardinality(input_array) loop
    if not (input_array[i] = any (output_array)) then
      output_array := output_array || input_array[i];
    end if;
  end loop;
  return output_array;
END;
$$
language plpgsql;
Usage example:
select veh_id, unique_array(vehicle_types)
from vehicles

BigQuery - nested json - select where nested item equals

Having the following table in a BigQuery database:
Row | f0_
1 | {"configuration":[{"param1":"value1"},{"param2":[3.0,45]}]}
2 | {"configuration":[{"param1":"value2"},{"param2":[3.0,45]}]}
3 | {"configuration":[{"param1":"value1"},{"param2":[3.0,36]}]}
4 | {"configuration":[{"param1":"value1"},{"param2":[3.0,46]}]}
5 | {"configuration":[{"param1":"value1"},{"param2":[3.0,30]}]}
6 | {"configuration":[{"param1":"value1"}]}
The f0_ column is a pure string.
Is there a way to write a select query where the "param2" value is equal to the [3.0, 45] array, meaning it would only return rows 1 and 2? Preferably it would be great to accomplish it without directly indexing the first element in the "configuration" array, as the order might not be guaranteed.
Below is for BigQuery Standard SQL
#standardSQL
SELECT line
FROM `project.dataset.table`
WHERE REGEXP_EXTRACT(JSON_EXTRACT(line, '$.configuration'), r'{"param2":(.*?)}') = '[3.0,45]'
You can test and play with the above using the sample data from your question, as in the example below
#standardSQL
WITH `project.dataset.table` AS (
SELECT '{"configuration":[{"param1":"value1"},{"param2":[3.0,45]}]}' line UNION ALL
SELECT '{"configuration":[{"param1":"value2"},{"param2":[3.0,45]}]}' UNION ALL
SELECT '{"configuration":[{"param1":"value1"},{"param2":[3.0,36]}]}' UNION ALL
SELECT '{"configuration":[{"param1":"value1"},{"param2":[3.0,46]}]}' UNION ALL
SELECT '{"configuration":[{"param1":"value1"},{"param2":[3.0,30]}]}' UNION ALL
SELECT '{"configuration":[{"param1":"value1"}]}'
)
SELECT line
FROM `project.dataset.table`
WHERE REGEXP_EXTRACT(JSON_EXTRACT(line, '$.configuration'), r'{"param2":(.*?)}') = '[3.0,45]'
with result
Row line
1 {"configuration":[{"param1":"value1"},{"param2":[3.0,45]}]}
2 {"configuration":[{"param1":"value2"},{"param2":[3.0,45]}]}
Preferably it would be great to accomplish it without directly indexing the first element in the "configuration" array, as the order might not be guaranteed.
Note: this solution does not depend on position of "param2" in the configuration array
You can use some of BQ's neat JSON functions as described here.
Based on that, you can locate param2 and check if its value matches what you're looking for. If you aren't sure of the configuration order, you can iterate through the array to find param2, but it's not particularly efficient. I recommend you try to find a way where param2 is always the second field in the array. I was able to get the correct results like so:
SELECT json_text AS correct_configurations
FROM UNNEST([
'{"configuration":[{"param1":"value1"},{"param2":[3.0,45]}]}',
'{"configuration":[{"param1":"value2"},{"param2":[3.0,45]}]}',
'{"configuration":[{"param1":"value1"},{"param2":[3.0,36]}]}',
'{"configuration":[{"param1":"value1"},{"param2":[3.0,46]}]}',
'{"configuration":[{"param1":"value1"},{"param2":[3.0,30]}]}',
'{"configuration":[{"param1":"value1"}]}'
])
AS json_text
WHERE JSON_EXTRACT(json_text, '$.configuration[1].param2') LIKE "[3.0,45]";
Gives a result of:
Row | correct_configurations
1 | {"configuration":[{"param1":"value1"},{"param2":[3.0,45]}]}
2 | {"configuration":[{"param1":"value2"},{"param2":[3.0,45]}]}

Combine elements of array into different array

I need to split text elements in an array and combine the elements (array_agg) by index into different rows
E.g., input is
'{cat$ball$x... , dog$bat$y...}'::text[]
I need to split each element by '$' and the desired output is:
{cat,dog} - row 1
{ball,bat} - row 2
{x,y} - row 3
...
Sorry for not being clear the first time; I have edited my question. I tried similar options but was unable to figure out how to get it working with multiple text elements separated by the '$' symbol.
Exactly two parts per array element (original question)
Use unnest(), split_part() and array_agg():
SELECT array_agg(split_part(t, '$', 1)) AS col1
, array_agg(split_part(t, '$', 2)) AS col2
FROM unnest('{cat$ball, dog$bat}'::text[]) t;
Related:
Split comma separated column data into additional columns
General solution (updated question)
For any number of arrays with any number of elements containing any number of parts.
Demo for a table tbl:
CREATE TABLE tbl (tbl_id int PRIMARY KEY, arr text[]);
INSERT INTO tbl VALUES
(1, '{cat1$ball1, dog2$bat2}') -- 2 parts per array element, 2 elements
, (2, '{cat$ball$x, dog$bat$y}') -- 3 parts ...
, (3, '{a1$b1$c1$d1, a2$b2$c2$d2, a3$b3$c3$d3}'); -- 4 parts, 3 elements
Query:
SELECT tbl_id, idx, array_agg(elem ORDER BY ord) AS pivoted_array
FROM tbl t
, unnest(t.arr) WITH ORDINALITY a1(string, ord)
, unnest(string_to_array(a1.string, '$')) WITH ORDINALITY a2(elem, idx)
GROUP BY tbl_id, idx
ORDER BY tbl_id, idx;
We are looking at two (nested) LATERAL joins here. LATERAL requires Postgres 9.3. Details:
What is the difference between LATERAL and a subquery in PostgreSQL?
WITH ORDINALITY for the first unnest() is up for debate. A simpler query normally works, too. It's just not guaranteed to work according to SQL standards:
SELECT tbl_id, idx, array_agg(elem) AS pivoted_array
FROM tbl t
, unnest(t.arr) string
, unnest(string_to_array(string, '$')) WITH ORDINALITY a2(elem, idx)
GROUP BY tbl_id, idx
ORDER BY tbl_id, idx;
Details:
PostgreSQL unnest() with element number
WITH ORDINALITY requires Postgres 9.4 or later. The same back-patched to Postgres 9.3:
SELECT tbl_id, idx, array_agg(arr2[idx]) AS pivoted_array
FROM tbl t
, LATERAL (
SELECT string_to_array(string, '$') AS arr2 -- convert string to array
FROM unnest(t.arr) string -- unnest org. array
) x
, generate_subscripts(arr2, 1) AS idx -- unnest 2nd array with ord. numbers
GROUP BY tbl_id, idx
ORDER BY tbl_id, idx;
Each query returns:
 tbl_id | idx | pivoted_array
--------+-----+---------------
      1 |   1 | {cat1,dog2}
      1 |   2 | {ball1,bat2}
      2 |   1 | {cat,dog}
      2 |   2 | {ball,bat}
      2 |   3 | {x,y}
      3 |   1 | {a1,a2,a3}
      3 |   2 | {b1,b2,b3}
      3 |   3 | {c1,c2,c3}
      3 |   4 | {d1,d2,d3}
SQL Fiddle (still stuck on pg 9.3).
The only requirement for these queries is that the number of parts in elements of the same array is constant. We could even make it work for a varying number of parts using crosstab() with two parameters to fill in NULL values for missing parts, but that's beyond the scope of this question:
PostgreSQL Crosstab Query
A bit messy, but you could unnest the array, use a regex to separate the text and then aggregate back up again:
with a as (select unnest('{cat$ball, dog$bat}'::_text) some_text),
b as (select regexp_matches(a.some_text, '(^[a-z]*)\$([a-z]*$)') animal_object from a)
select array_agg(animal_object[1]) animal, array_agg(animal_object[2]) a_object
from b
If you're processing multiple records at once you may want to use something like a row number before the unnest so that you have a group by to aggregate back to an array in your final select statement.

Parallel unnest() and sort order in PostgreSQL

I understand that using
SELECT unnest(ARRAY[5,3,9]) as id
without an ORDER BY clause, the order of the result set is not guaranteed. I could for example get:
id
--
3
5
9
But what about the following request:
SELECT
unnest(ARRAY[5,3,9]) as id,
unnest(ARRAY(select generate_series(1, array_length(ARRAY[5,3,9], 1)))) as idx
ORDER BY idx ASC
Is it guaranteed that the 2 unnest() calls (which have the same length) will unroll in parallel and that the index idx will indeed match the position of the item in the array?
I am using PostgreSQL 9.3.3.
Yes, that is a feature of Postgres and parallel unnesting is guaranteed to be in sync (as long as all arrays have the same number of elements).
Postgres 9.4 adds a clean solution for parallel unnest:
Unnest multiple arrays in parallel
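A minimal sketch of that 9.4 syntax (multi-argument unnest is only allowed in the FROM clause; shorter arrays are padded with NULLs):

SELECT *
FROM unnest(ARRAY[5,3,9], ARRAY['a','b','c']) WITH ORDINALITY AS t(id, label, idx);

--  id | label | idx
-- ----+-------+-----
--   5 | a     |   1
--   3 | b     |   2
--   9 | c     |   3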
The order of resulting rows is not guaranteed, though. Actually, with a statement as simple as:
SELECT unnest(ARRAY[5,3,9]) AS id;
the resulting order of rows is "guaranteed", but Postgres does not assert anything. The query optimizer is free to order rows as it sees fit as long as the order is not explicitly defined. This may have side effects in more complex queries.
If the second query in your question is what you actually want (add an index number to unnested array elements), there is a better way with generate_subscripts():
SELECT unnest(ARRAY[5,3,9]) AS id
, generate_subscripts(ARRAY[5,3,9], 1) AS idx
ORDER BY idx;
Details in this related answer:
How to access array internal index with postgreSQL?
You will be interested in WITH ORDINALITY in Postgres 9.4:
PostgreSQL unnest() with element number
Then you can use:
SELECT * FROM unnest(ARRAY[5,3,9]) WITH ORDINALITY tbl(id, idx);
Short answer: No, idx will not match the array positions, when accepting the premise that unnest() output may be randomly ordered.
Demo:
Since the current implementation of unnest actually outputs the rows in the order of the elements, I suggest adding a layer on top of it to simulate a random order:
CREATE FUNCTION unnest_random(anyarray) RETURNS setof anyelement
language sql as
$$ select unnest($1) order by random() $$;
Then check out a few executions of your query with unnest replaced by unnest_random:
SELECT
unnest_random(ARRAY[5,3,9]) as id,
unnest_random(ARRAY(select generate_series(1, array_length(ARRAY[5,3,9], 1)))) as idx
ORDER BY idx ASC
Example of output:
id | idx
----+-----
3 | 1
9 | 2
5 | 3
id=3 is associated with idx=1 but 3 was in 2nd position in the array. It's all wrong.
What's wrong in the query: it assumes that the first unnest will shuffle the elements using the same permutation as the second unnest (permutation in the mathematical sense: the relationship between order in the array and order of the rows). But this assumption contradicts the premise that the output order of unnest is unpredictable to start with.
About this question:
Is it guaranteed that the 2 unnest() calls (which have the same
length) will unroll in parallel
In select unnest(...) X1, unnest(...) X2, with X1 and X2 being of type SETOF something and having the same number of rows, X1 and X2 will be paired in the final output so that the X1 value at row N will face the X2 value at the same row N.
(it's a kind of UNION for columns, as opposed to a cartesian product).
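A minimal illustration of that pairing (two set-returning calls of equal length in the SELECT list):

SELECT unnest(ARRAY[5,3,9])       AS x1
     , unnest(ARRAY['a','b','c']) AS x2;

--  x1 | x2
-- ----+----
--   5 | a
--   3 | b
--   9 | c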
But I wouldn't describe this pairing as unroll in parallel, so I'm not sure this is what you meant.
Anyway this pairing doesn't help with the problem since it happens after the unnest calls have lost the array positions.
An alternative: In this thread from the pgsql-sql mailing list, this function is suggested:
CREATE OR REPLACE FUNCTION unnest_with_ordinality(anyarray, OUT value anyelement, OUT ordinality integer)
RETURNS SETOF record AS
$$
  SELECT $1[i], i
  FROM generate_series(array_lower($1,1), array_upper($1,1)) i;
$$
LANGUAGE sql IMMUTABLE;
Based on this, we can order by the second output column:
select * from unnest_with_ordinality(array[5,3,9]) order by 2;
value | ordinality
-------+------------
5 | 1
3 | 2
9 | 3
With Postgres 9.4 and above: the WITH ORDINALITY clause that can follow set-returning function calls provides this functionality in a generic way.
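It is not limited to unnest(); for example (a quick sketch):

SELECT * FROM generate_series(10, 30, 10) WITH ORDINALITY AS t(val, ord);

--  val | ord
-- -----+-----
--   10 |   1
--   20 |   2
--   30 |   3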