Find duplicated values in an array column - SQL

I have a table with an array column, like this:
my_table
id | array
---+-----------------
 1 | {1, 3, 4, 5}
 2 | {19, 2, 4, 9}
 3 | {23, 46, 87, 6}
 4 | {199, 24, 93, 6}
As a result, I want to know which values are repeated and where, like this:
value_repeated | is_repeated_on
---------------+---------------
             4 | {1,2}
             6 | {3,4}
Is it possible? I don't know how to do this. I don't know how to start! I'm lost!

Use unnest to convert the array elements to rows, and then array_agg to build an array from the ids.
It should look something like this:
SELECT v AS value_repeated, array_agg(id) AS is_repeated_on
FROM (SELECT id, unnest("array") AS v FROM my_table) sub
GROUP BY v
HAVING count(DISTINCT id) > 1;
Note that the derived table needs an alias (sub here), and "array" must be quoted because ARRAY is a reserved word. HAVING count(DISTINCT id) > 1 filters out values that appear in only one row.

The clean way to call a set-returning function like unnest() is in a LATERAL join, available since Postgres 9.3:
SELECT value_repeated, array_agg(id) AS is_repeated_on
FROM my_table
, unnest(array_col) value_repeated
GROUP BY value_repeated
HAVING count(*) > 1
ORDER BY value_repeated; -- optional
About LATERAL:
Call a set-returning function with an array argument multiple times
There is nothing in your question to rule out duplicates (the same element more than once in the same array, like IMSoP commented), so it must be count(*), not count(DISTINCT id).
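Both answers implement the same idea; as a cross-check, here is a minimal Python sketch of the semantics using the question's sample data (a set per value plays the role of count(DISTINCT id); a list would correspond to count(*)):

```python
from collections import defaultdict

# Sample data from the question: id -> array
table = {1: [1, 3, 4, 5], 2: [19, 2, 4, 9], 3: [23, 46, 87, 6], 4: [199, 24, 93, 6]}

seen = defaultdict(set)  # value -> set of ids it appears in
for id_, arr in table.items():
    for v in arr:
        seen[v].add(id_)

# Keep only values present in more than one row (the HAVING clause)
repeated = {v: sorted(ids) for v, ids in seen.items() if len(ids) > 1}
print(repeated)  # {4: [1, 2], 6: [3, 4]}
```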

Related

How to get WHERE to come after GROUP BY in sqlite3

I am working with a dataset and used GROUP BY to get the count of one of the columns. Then I want to keep only the groups with count > 2, but WHERE does not work after GROUP BY. What should I do?
For example, for id_table with a single column id containing the values [2, 3, 2, 3, 4, 2], I want to count each id and find which ones appear more than two times; here the output should be 2, since it appears 3 times. My code is below:
SELECT id
FROM id_table
GROUP BY id
WHERE count(id) > 2;
The error code is: near "WHERE": syntax error
Try this (HAVING instead of WHERE):
SELECT id
FROM id_table
GROUP BY id
HAVING count(id) > 2;
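Since the question is about sqlite3, the fix can be checked directly with Python's built-in sqlite3 module; a minimal sketch with the sample data from the question:

```python
import sqlite3

# Reproduce the question's data in an in-memory database
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE id_table (id INTEGER)")
conn.executemany("INSERT INTO id_table (id) VALUES (?)",
                 [(2,), (3,), (2,), (3,), (4,), (2,)])

# WHERE filters rows before grouping; HAVING filters groups after
# aggregation, which is why the HAVING form works here.
rows = conn.execute(
    "SELECT id FROM id_table GROUP BY id HAVING count(id) > 2"
).fetchall()
print(rows)  # [(2,)]
```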

Postgres filter rows matching grouped foreign key that includes all values of an array

I've got 3 tables:
create table events (id serial, ...)
create table devices (id serial, ...)
create table event_devices (event_id int, device_id int, ...)
Let's say the data in event_devices looks like this:
event_id | device_id
--------------------
1 | 1
1 | 2
1 | 3
2 | 1
2 | 4
I need to conduct a search for two cases:
filter all events that contain any device in a given list such that
{1, 4} -> (1, 2)
{1, 2, 3} -> (1, 2)
filter all events that contain all devices in a given list such that
{1, 4} -> (2)
{1, 2, 3} -> ()
Let's say the given list is input as an array of ints.
The first case is pretty simple; I can simply use "IN":
with devices_filter as (
  select distinct
    event_devices.event_id
  from event_devices
  where
    event_devices.device_id in (select unnest($1::int[]) as device_id)
)
select
  events.id as event_id
from events
left outer join devices_filter on
  devices_filter.event_id = events.id
where
  devices_filter.event_id is not null
But how do I query for the second case? I've thought maybe I need another CTE that groups and aggregates device ids by event id, then performs an intersection and checks that the resulting length equals the length of the input array, but I'm not sure exactly how that would work. I'd also like to avoid any unnecessary grouping, since the event_devices table can be quite large.
Any hints or tips?
If you are passing in an array that has no duplicates, you can use aggregation:
select ed.event_id
from event_devices ed
where ed.device_id = any (:array)
group by ed.event_id
having count(*) = cardinality(:array)
If you need to cast the values, then :array is really $1::int[].
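The HAVING count(*) = cardinality(...) trick is relational division: an event qualifies when the wanted set is a subset of its device set (assuming no duplicate (event_id, device_id) pairs). A minimal Python sketch over the question's sample data covers both cases:

```python
from collections import defaultdict

# event_devices rows from the question: (event_id, device_id)
pairs = [(1, 1), (1, 2), (1, 3), (2, 1), (2, 4)]

by_event = defaultdict(set)
for event_id, device_id in pairs:
    by_event[event_id].add(device_id)

wanted = {1, 4}
# Case 1: events containing ANY wanted device (the IN query)
any_match = sorted(e for e, ds in by_event.items() if ds & wanted)
# Case 2: events containing ALL wanted devices (the HAVING/cardinality query)
all_match = sorted(e for e, ds in by_event.items() if wanted <= ds)
print(any_match, all_match)  # [1, 2] [2]
```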

How to apply lists of filter values on two columns in lock-step?

Let's assume I have the table with below columns and records:
id | shop_id | product_id
---+---------+-----------
 1 |       1 |          1
 2 |       1 |          2
 3 |       2 |          1
 4 |       2 |          3
I want to run single query to get ID 1 and ID 4 records when query looks like this:
ShopProduct.where(shop_id: 1, product_id: 1).where(shop_id: 2, product_id: 3)
The problem is when I try to simplify like this:
ShopProduct.where(shop_id: [1,2], product_id: [1,3])
Then I get three records, not two as expected.
A simple solution for few input pairs: ROW values:
SELECT *
FROM "ShopProduct"
WHERE (shop_id, product_id) IN ((1,1), (2,3));
Related:
SQL syntax term for 'WHERE (col1, col2) < (val1, val2)'
If you have two long arrays you want to process in "lock-step", other forms may be faster / more convenient. Like: unnest two arrays in parallel (in lock-step), then join:
SELECT *
FROM unnest('{1,2}'::int[], '{1,3}'::int[]) t(shop_id, product_id)
JOIN "ShopProduct" USING (shop_id, product_id);
There is an overloaded version of the function unnest() that accepts multiple input arrays. See:
Unnest multiple arrays in parallel
db<>fiddle here
You can achieve it with an OR condition, starting from Rails 5:
ShopProduct.where(shop_id: 1, product_id: 1).or(ShopProduct.where(shop_id: 2, product_id: 3))
Related:
Rails find records using an array of hashes
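For intuition, the ROW-value IN list treats (shop_id, product_id) as a single pair, which is just a tuple-membership test; a Python sketch over the question's rows:

```python
# Rows from the question: (id, shop_id, product_id)
rows = [(1, 1, 1), (2, 1, 2), (3, 2, 1), (4, 2, 3)]

# WHERE (shop_id, product_id) IN ((1,1), (2,3)) compares the columns
# as a pair, unlike two independent IN lists on each column.
pairs = {(1, 1), (2, 3)}
ids = [id_ for id_, shop_id, product_id in rows if (shop_id, product_id) in pairs]
print(ids)  # [1, 4]
```

The failing simplification `where(shop_id: [1,2], product_id: [1,3])` corresponds to testing each column independently, which also matches row 3 (shop 2, product 1).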

Postgres union of queries in loop

I have a table with two columns. Let's call them
array_column and text_column
I'm trying to write a query to find out, for K ranging from 1 to 10, in how many rows the value in text_column appears within the first K elements of array_column.
I'm expecting results like:
k | count
________________
1 | 70
2 | 85
3 | 90
...
I did manage to get these results by simply repeating the query 10 times and uniting the results, which looks like this:
SELECT 1 AS k, count(*) FROM table WHERE array_column[1:1] @> ARRAY[text_column]
UNION ALL
SELECT 2 AS k, count(*) FROM table WHERE array_column[1:2] @> ARRAY[text_column]
UNION ALL
SELECT 3 AS k, count(*) FROM table WHERE array_column[1:3] @> ARRAY[text_column]
...
But that doesn't look like the correct way to do it. What if I wanted a very large range for K?
So my question is, is it possible to perform queries in a loop, and unite the results from each query? Or, if this is not the correct approach to the problem, how would you do it?
Thanks in advance!
You could use array_positions() which returns an array of all positions where the argument was found in the array, e.g.
select t.*,
array_positions(array_column, text_column)
from the_table t;
This returns a different result but is a lot more efficient as you don't need to increase the overall size of the result. To only consider the first ten array elements, just pass a slice to the function:
select t.*,
array_positions(array_column[1:10], text_column)
from the_table t;
To limit the result to only rows that actually contain the value you can use:
select t.*,
array_positions(array_column[1:10], text_column)
from the_table t
where text_column = any(array_column[1:10]);
To get your desired result, you could use unnest() to turn that into rows:
select k, count(*)
from the_table t, unnest(array_positions(array_column[1:10], text_column)) as k
where text_column = any(array_column[1:10])
group by k
order by k;
You can use the generate_series function to generate a table with the expected number of rows with the expected values and then join to it within the query, like so:
SELECT t.k AS k, count(array_column) AS count
FROM table
-- the right join ensures a row for every k; count(array_column) rather than
-- count(*) makes that row report 0 when no records meet the criteria
RIGHT JOIN (SELECT generate_series(1,10) AS k) t
ON array_column[1:t.k] @> ARRAY[text_column]
GROUP BY t.k
This is probably the closest thing to looping over the results without using PL/pgSQL to write an actual loop in a user-defined function.
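The per-K count that both the UNION ALL and the generate_series queries compute can be sketched in Python (illustrative rows, not data from the question):

```python
# Illustrative rows: (array_column, text_column)
rows = [
    (["a", "b", "c"], "a"),
    (["b", "a", "c"], "a"),
    (["b", "c", "a"], "a"),
    (["b", "c", "d"], "a"),
]

# For each K, count rows whose first K elements contain text_column --
# the same thing each branch of the UNION ALL computes.
counts = {k: sum(t in arr[:k] for arr, t in rows) for k in range(1, 4)}
print(counts)  # {1: 1, 2: 2, 3: 3}
```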

Combine elements of array into different array

I need to split the text elements in an array and combine the elements by index (with array_agg) into different rows.
E.g., input is
'{cat$ball$x... , dog$bat$y...}'::text[]
I need to split each element by '$' and the desired output is:
{cat,dog} - row 1
{ball,bat} - row 2
{x,y} - row 3
...
Sorry for not being clear the first time; I have edited my question. I tried similar options but couldn't figure out how to handle multiple text elements separated by the '$' symbol.
Exactly two parts per array element (original question)
Use unnest(), split_part() and array_agg():
SELECT array_agg(split_part(t, '$', 1)) AS col1
, array_agg(split_part(t, '$', 2)) AS col2
FROM unnest('{cat$ball, dog$bat}'::text[]) t;
Related:
Split comma separated column data into additional columns
General solution (updated question)
For any number of arrays with any number of elements containing any number of parts.
Demo for a table tbl:
CREATE TABLE tbl (tbl_id int PRIMARY KEY, arr text[]);
INSERT INTO tbl VALUES
(1, '{cat1$ball1, dog2$bat2}') -- 2 parts per array element, 2 elements
, (2, '{cat$ball$x, dog$bat$y}') -- 3 parts ...
, (3, '{a1$b1$c1$d1, a2$b2$c2$d2, a3$b3$c3$d3}'); -- 4 parts, 3 elements
Query:
SELECT tbl_id, idx, array_agg(elem ORDER BY ord) AS pivoted_array
FROM tbl t
, unnest(t.arr) WITH ORDINALITY a1(string, ord)
, unnest(string_to_array(a1.string, '$')) WITH ORDINALITY a2(elem, idx)
GROUP BY tbl_id, idx
ORDER BY tbl_id, idx;
We are looking at two (nested) LATERAL joins here. LATERAL requires Postgres 9.3. Details:
What is the difference between LATERAL and a subquery in PostgreSQL?
WITH ORDINALITY for the first unnest() is up for debate. A simpler query normally works, too; it's just not guaranteed to work according to the SQL standard:
SELECT tbl_id, idx, array_agg(elem) AS pivoted_array
FROM tbl t
, unnest(t.arr) string
, unnest(string_to_array(string, '$')) WITH ORDINALITY a2(elem, idx)
GROUP BY tbl_id, idx
ORDER BY tbl_id, idx;
Details:
PostgreSQL unnest() with element number
WITH ORDINALITY requires Postgres 9.4 or later. The same back-patched to Postgres 9.3:
SELECT tbl_id, idx, array_agg(arr2[idx]) AS pivoted_array
FROM tbl t
, LATERAL (
SELECT string_to_array(string, '$') AS arr2 -- convert string to array
FROM unnest(t.arr) string -- unnest org. array
) x
, generate_subscripts(arr2, 1) AS idx -- unnest 2nd array with ord. numbers
GROUP BY tbl_id, idx
ORDER BY tbl_id, idx;
Each query returns (element order within pivoted_array is only guaranteed by the first query):
tbl_id | idx | pivoted_array
-------+-----+--------------
     1 |   1 | {cat1,dog2}
     1 |   2 | {ball1,bat2}
     2 |   1 | {cat,dog}
     2 |   2 | {ball,bat}
     2 |   3 | {x,y}
     3 |   1 | {a1,a2,a3}
     3 |   2 | {b1,b2,b3}
     3 |   3 | {c1,c2,c3}
     3 |   4 | {d1,d2,d3}
SQL Fiddle (still stuck on pg 9.3).
The only requirement for these queries is that the number of parts in elements of the same array is constant. We could even make it work for a varying number of parts using crosstab() with two parameters to fill in NULL values for missing parts, but that's beyond the scope of this question:
PostgreSQL Crosstab Query
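For intuition, the pivot all of these queries perform amounts to splitting each element on '$' and zipping the results by position; a Python sketch with the question's sample input:

```python
# The question's input array; split each element on '$', then zip
# by position -- the same reshaping the SQL queries express with
# unnest ... WITH ORDINALITY and array_agg grouped by idx.
arr = ["cat$ball$x", "dog$bat$y"]
pivoted = [list(t) for t in zip(*(s.split("$") for s in arr))]
print(pivoted)  # [['cat', 'dog'], ['ball', 'bat'], ['x', 'y']]
```

Like the SQL version, this assumes every element has the same number of '$'-separated parts; zip silently truncates otherwise.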
A bit messy but you could unnest the array, use regex to separate the text and then aggregate back up again:
with a as (
  select unnest('{cat$ball, dog$bat}'::text[]) as some_text
), b as (
  select regexp_matches(a.some_text, '(^[a-z]*)\$([a-z]*$)') as animal_object
  from a
)
select array_agg(animal_object[1]) as animal, array_agg(animal_object[2]) as a_object
from b;
If you're processing multiple records at once, you may want to add something like a row number before the unnest so that you have a GROUP BY key when aggregating back to arrays in your final SELECT statement.