ARRAY_AGG without duplicates - sql

In a PostgreSQL database I have a table with columns ITEM_ID and PARENT_ITEM_ID.
| ITEM_ID | ITEM_NAME | PARENT_ITEM_ID |
|---------|-----------|----------------|
| 1 | A | 0 |
| 2 | B | 0 |
| 3 | C | 1 |
My task is to take all values from these columns and put them into one array, removing all duplicates at the same time. I started with the following SQL query, but what is the best way to remove the duplicates?
SELECT
    ARRAY_AGG(ITEM_ID || ',' || PARENT_ITEM_ID)
FROM
    ITEMS_RELATIONSHIP
GROUP BY
    ITEM_ID
I want this result:
[1,0,2,3]
Right now I get this:
|{1,0}|
|{2,0}|
|{3,1}|

If you want one array of all item IDs, don't group by item_id. Something like this might be what you want:
select
    array_agg(item_id) as itemlist
from
(
    select item_id from items_relationship
    union
    select parent_item_id from items_relationship
) as allitems;
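If the order of the array matters, PostgreSQL also lets you sort inside the aggregate. A small variant of the same query (a sketch; UNION without ALL already removes the duplicates):

select array_agg(item_id order by item_id) as itemlist   -- sorted, e.g. {0,1,2,3}
from
(
    select item_id from items_relationship
    union   -- UNION (without ALL) already deduplicates
    select parent_item_id from items_relationship
) as allitems;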

Here is one method to get the parent item ids in with the other item ids:
select array_agg(distinct item_id)
from items_relationship ir cross join lateral
(values (ir.item_id), (ir.parent_item_id)) v(item_id);
This unpivots the data using a lateral join and then aggregates.
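For anyone who wants to try both answers, here is a minimal setup matching the sample table above (the schema is assumed; adjust types as needed):

create table items_relationship (
    item_id        int,
    item_name      text,
    parent_item_id int
);
insert into items_relationship values
    (1, 'A', 0),
    (2, 'B', 0),
    (3, 'C', 1);
-- Both queries then return one row containing {0,1,2,3} (element order may vary).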


How to convert JSONB array of pair values to rows and columns?

Given that I have a jsonb column with an array of pair values:
[1001, 1, 1002, 2, 1003, 3]
I want to turn each pair into a row, with each pair's values as columns:
| a | b |
|------|---|
| 1001 | 1 |
| 1002 | 2 |
| 1003 | 3 |
Is something like that even possible in an efficient way?
I found a few inefficient (slow) ways, like using LEAD() or joining the table with the value from the next row, but those queries take ~10 minutes.
DDL:
CREATE TABLE products (
id int not null,
data jsonb not null
);
INSERT INTO products VALUES (1, '[1001, 1, 1002, 2, 1003, 3]');
DB Fiddle: https://www.db-fiddle.com/f/2QnNKmBqxF2FB9XJdJ55SZ/0
Thanks!
This is not an elegant approach from a declarative standpoint, but can you please see whether this performs better for you?
with indexes as (
    select id, generate_series(1, jsonb_array_length(data) / 2) - 1 as idx
    from products
)
select p.id, p.data->>(2 * i.idx) as a, p.data->>(2 * i.idx + 1) as b
from indexes i
join products p on p.id = i.id;
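A side note on the operator used above: with an integer on the right, ->> returns the array element at that zero-based index as text, so a cast is needed to get numbers back. A quick illustration:

select ('[1001, 1, 1002, 2]'::jsonb ->> 0)::int as first_element;  -- returns 1001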
This query
SELECT j.data
FROM products
CROSS JOIN jsonb_array_elements(data) j(data)
should run faster if you just need to unpivot all elements within the query.
Or even drop the columns coming from the products table:
SELECT jsonb_array_elements(data)
FROM products
Alternatively, if you need to return the result like this
| a | b |
|------|---|
| 1001 | 1 |
| 1002 | 2 |
| 1003 | 3 |
as two unpivoted columns, then use:
SELECT MAX(CASE WHEN mod(rn, 2) = 1 THEN data->>(rn - 1)::int END) AS a,
       MAX(CASE WHEN mod(rn, 2) = 0 THEN data->>(rn - 1)::int END) AS b
FROM
(
    SELECT p.data, row_number() over () as rn
    FROM products p
    CROSS JOIN jsonb_array_elements(data) j(data)
) q
GROUP BY ceil(rn / 2::float)
ORDER BY ceil(rn / 2::float)
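Another option (a sketch, not from the original answers): jsonb_array_elements supports WITH ORDINALITY, so the pairs can be grouped by their position in the array. Unlike the row_number() version above, this stays correct when products has more than one row, since it groups per id:

SELECT p.id,
       -- the ::text::int cast matters: max() is not defined for jsonb
       MAX(CASE WHEN (e.ord - 1) % 2 = 0 THEN (e.val)::text::int END) AS a,
       MAX(CASE WHEN (e.ord - 1) % 2 = 1 THEN (e.val)::text::int END) AS b
FROM products p
CROSS JOIN jsonb_array_elements(p.data) WITH ORDINALITY AS e(val, ord)
GROUP BY p.id, (e.ord - 1) / 2          -- integer division pairs up positions 1&2, 3&4, ...
ORDER BY p.id, (e.ord - 1) / 2;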

How to count occurrences in a list column on Postgres?

I've a table with the following structure:
user | medias
----------------------
1 | {ps2,xbox}
1 | {nintendo,ps2}
How do I count the occurrences of each string in an array column?
Expected result:
media | amount
------------------
ps2 | 2
nintendo | 1
xbox | 1
You can unnest the array with a lateral join, then aggregate:
select x.media, count(*) as amount
from mytable t
cross join lateral unnest(t.medias) x(media)
group by x.media
order by amount desc, x.media;
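If you want to test this, a minimal setup matching the sample might look like the following (schema assumed; note that user is a reserved word in PostgreSQL, so it needs quoting as a column name):

create table mytable (
    "user" int,
    medias text[]
);
insert into mytable values
    (1, '{ps2,xbox}'),
    (1, '{nintendo,ps2}');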

Batch SELECT ids in an array while preserving the array order

I have an array of ids that might contain repeats: [1,4,2,1,4,6,7]
Manually, I have to run
SELECT * FROM products WHERE catid='1'
SELECT * FROM products WHERE catid='4'
SELECT * FROM products WHERE catid='2'
SELECT * FROM products WHERE catid='1'
.....
one by one and combine everything afterwards.
Is there a way to do it in a single query while preserving the order of the ids?
So I would get
| id | props |
|----|--------|
| 1 | 1_props|
| 4 | 4_props|
| 2 | 2_props|
| 1 | 1_props|
You can unnest the array and then join against it. The option with ordinality will include the index of the element in the array as a column. That can be used to sort the result:
select p.*
from products p
join unnest(array[1,4,2,1,4,6,7]) with ordinality as t(id, idx) on t.id = p.catid
order by t.idx;
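One caveat worth knowing: a plain join silently drops ids with no matching product. If you want to keep a placeholder row for those, a left join preserves them (a sketch, assuming a props column as in the example output):

select t.id, p.props
from unnest(array[1,4,2,1,4,6,7]) with ordinality as t(id, idx)
left join products p on p.catid = t.id
order by t.idx;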

SQL: List/aggregate all items for one corresponding transaction id

I have the following table in a vertica db:
+-----+------+
| Tid | Item |
+-----+------+
| 1 | A |
| 1 | B |
| 1 | C |
| 2 | B |
| 2 | D |
+-----+------+
And I want to get this table:
+-----+-------+-------+-------+
| Tid | Item1 | Item2 | Item3 |
+-----+-------+-------+-------+
| 1 | A | B | C |
| 2 | B | D | |
+-----+-------+-------+-------+
Keep in mind that I don't know the maximum number of items a transaction id (Tid) can have, and the number of items per Tid is not constant. I tried using join and where but could not get it to work properly. Thanks for the help.
There is no PIVOT ability in Vertica. Columns cannot be defined on the fly as part of the query; you have to specify them explicitly.
There are perhaps other options, such as concatenating the items in an aggregate using a UDX, like the one in this Stack Overflow answer, but that will put them all into a single field. The only other alternative would be to build the pivot on the client side using something like Python. Otherwise, you need a way to generate the column list for your query.
For my example, I am assuming you are dealing with a unique (Tid, Item) set. You may need to modify it to suit your needs.
First you would need to determine the max number of items you need to support:
with Tid_count as (
    select Tid, count(*) cnt
    from mytable
    group by 1
)
select max(cnt)
from Tid_count;
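The same maximum can also be read off in a single statement (a minimal equivalent, assuming the same mytable):

select count(*) as cnt
from mytable
group by Tid
order by cnt desc
limit 1;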
And let's say the most items you had to support was 4; you would then generate SQL to pivot:
with numbered_mytable as (
    select Tid,
           Item,
           row_number() over (partition by Tid order by Item) rn
    from mytable
)
select Tid,
       MAX(decode(rn, 1, Item)) Item1,
       MAX(decode(rn, 2, Item)) Item2,
       MAX(decode(rn, 3, Item)) Item3,
       MAX(decode(rn, 4, Item)) Item4
from numbered_mytable
group by 1
order by 1;
Or if you don't want to generate SQL, but know you'll never have more than X items, you can just create a static form that goes to X.
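A side note for readers who don't use Vertica: decode(rn, 1, Item) is Oracle-style shorthand that returns Item when rn = 1 and NULL otherwise, so each pivot column above is equivalent to a standard CASE expression:

MAX(CASE WHEN rn = 1 THEN Item END) AS Item1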
You can also try this (note: it uses SQL Server's temp-table and PIVOT syntax rather than Vertica's):
Create table #table (id int, Value varchar(1))

insert into #table
select 1, 'A'
union
select 1, 'B'
union
select 1, 'C'
union
select 2, 'B'
union
select 2, 'D'

select id, [1] Item1, [2] Item2, [3] Item3 from
(
    select id, Dense_rank() over (partition by id order by Value) rnk, Value from #table
) d
Pivot
(Min(Value) for rnk in ([1], [2], [3])) p

drop table #table

Select the most common item for each category

Each row in my table belongs to some category, has some value and other data.
I would like to select each category with the most common value for it (doesn't matter which one if there are multiple), ordered by category.
some_table:
+--------+-----+---
|category|value| ...
+--------+-----+---
|   1    |  a  |
|   1    |  a  |
|   1    |  b  |
|   2    |  a  |
|   2    |  b  |
|   2    |  c  |
|   2    |  b  |
|   3    |  a  |
|   3    |  a  |
|   3    |  b  |
|   3    |  b  |
+--------+-----+---

expected result:
+--------+-----+
|category|value|
+--------+-----+
|   1    |  a  |
|   2    |  b  |
|   3    |  a  |   (or b)
+--------+-----+
I have a solution (posting it as an answer) but it seems suboptimal to me. So I'm looking for better solutions.
My table will have up to 10000 rows (possibly, but not likely, beyond that).
I'm planning to use SQLite but I'm not tied to it, so I may reconsider if SQLite can't do this with reasonable performance.
I would be inclined to do this using a correlated subquery:
select distinct category,
       (select value
        from some_table t2
        where t2.category = t.category
        group by value
        order by count(*) desc
        limit 1
       ) as mode_value
from some_table t;
The name for the most common value is "mode" in statistics.
And, if you had a categories table, this would be written as:
select category,
       (select value
        from some_table t2
        where t2.category = c.category
        group by value
        order by count(*) desc
        limit 1
       ) as mode_value
from categories c;
Here is one option, but I think it's slow...
SELECT DISTINCT `category` AS `the_category`, `value`
FROM `some_table`
WHERE `value` = (
    SELECT `value`
    FROM `some_table`
    WHERE `category` = `the_category`
    GROUP BY `value`
    ORDER BY COUNT(`value`) DESC LIMIT 1)
ORDER BY `category`;
If the table has a unique/primary key column, you can replace part of this with WHERE `id` = (SELECT `id` ...); then the LIMIT 1 is not needed.
select category, value, count(*) value_count
from some_table t
group by category, value
order by category, value_count DESC;
This returns the count of each value within each category.
select category, value
from (
    select category, value, count(*) value_count
    from some_table t
    group by category, value
    order by category, value_count desc) sub
group by category
Then we take the first value for each category: the subquery is sorted by count, so the most common value comes first. I am not sure SQLite is guaranteed to keep the first row of each group (I can't test it), but IMHO it should work.
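For what it's worth, since SQLite 3.25 a window function avoids relying on that unspecified first-row behavior entirely. A sketch of that approach:

select category, value
from (
    select category, value,
           row_number() over (partition by category
                              order by count(*) desc) as rn   -- rank values by frequency
    from some_table
    group by category, value
) t
where rn = 1
order by category;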