Handling multiple return values in subquery - sql

I have the following data:
cte
=================
gp_id | m_ids
------|----------
1 | {123}
2 | {432,222}
3 | {123,222}
And a function with a signature like this (which in fact returns not a table but a couple of ids):
FUNCTION foo(m_ids integer[])
RETURNS TABLE (
first_id integer,
second_id integer
)
Now, I've got to iterate over each row and perform some calculations with that function, so I would get something like this:
gp_id | first_id | second_id
------|----------|-----------
1 | 25 | 25
2 | 13 | 24
3 | 25 | 11
To achieve that I tried the following code:
SELECT gp_id,
(
SELECT *
FROM foo(
(
SELECT m_ids
FROM cte c2
WHERE c2.gp_id = c1.gp_id)) limit 1)
FROM cte c1
The problem is in the SELECT * statement. If I use SELECT first_id, everything works well (except for that I have to run two consecutive queries, which I'd like to avoid, obviously), but in the former case I'm getting the error
subquery must return only one column
which is somewhat expected.
So how can I correctly iterate over the table in one single query?

Use the function in a lateral join:
select gp_id, first_id, second_id
from cte,
lateral foo(m_ids);

Related

Select concatenated columns based on criteria list in other table

I have a table1
line
a
b
c
d
e
f
g
h
1
18
2
2
22
0
2
1
2
2
20
2
2
2
0
0
0
2
3
10
2
2
222
0
2
1
2
4
12
2
2
3
0
0
0
0
5
15
2
2
3
0
0
0
0
And a table2
 line
criteria
1
 a,b
2
 b,c,f,h
3
 a,b,e,g,h
4
 c,e
I am using this code to see/select the unique results of concated/joined columns, like concat(c,',',d), concat(b,',',d,',',g) and so on from table1 and is working perfectly:
SELECT DISTINCT(CONCAT(c,',',d))
FROM table1
But, instead of writing manually like concat(c,',',d), I want to refer to table2.criteria to get columns references to be concated/joined from table1 so that i can see the entire unique results against each concated criteria
Tried this, but getting an error:
SELECT DISTINCT(SELECT criteria FROM table2)
FROM table1
ERROR: more than one row returned by a subquery used as an expression
SQL state: 21000
The expected unique result is something like this;
| criteria | result |
| ------------ | ---------- |
| a,b | 15,2 |
| a,b | 10,2 |
| a,b | 20,2 |
| a,b | 12,2 |
| a,b | 18,2 |
| b,c,f,h | 2,2,2,2 |
| b,c,f,h | 2,2,0,2 |
| b,c,f,h | 2,2,0,0 |
| a,b,e,g,h | 20,2,0,0,2 |
| a,b,e,g,h | 12,2,0,0,0 |
| a,b,e,g,h | 15,2,0,0,0 |
| a,b,e,g,h | 10,2,0,1,2 |
| a,b,e,g,h | 18,2,0,1,2 |
| c,e | 2,0 |
SQL does not allow to parameterize identifiers. There are various ways to work around this restriction.
It's unclear from the question, but according to comments you want to concatenate the given pattern for every row in table1.
1. Dynamic SQL
Create a helper function (once!) that concatenates and executes statements dynamically.
Basics:
Define table and column names as arguments in a plpgsql function?
CREATE OR REPLACE FUNCTION f_concat_cols(_cols text)
RETURNS TABLE (result text)
LANGUAGE plpgsql AS
$func$
BEGIN
RETURN QUERY EXECUTE format(
$q$SELECT concat_ws(',', %s) FROM table1 ORDER BY line$q$, _cols);
END
$func$;
It's a set-returning function (a.k.a. "table function"), to return one result row for every row in table1 for each given pattern.
Warning: Converting user input to code like this is a prime opportunity for SQL injection. You must make sure that table1.criteria can only hold valid strings!
To get the full result matrix (with distinct results per row in table2), the query is simple now:
SELECT DISTINCT line AS t2_line, criteria, t1.*
FROM table2, f_concat_cols(criteria) t1
ORDER BY t2_line;
2. Workaround with conversion to JSON
SELECT DISTINCT t2.line AS t2_line, t2.criteria, c.*
FROM table2 t2
CROSS JOIN (SELECT line, to_json(t) AS js FROM table1 t) t1
CROSS JOIN LATERAL (
SELECT string_agg(t1.js->>sub, ',') AS result
FROM unnest(string_to_array(t2.criteria, ',')) sub
) c
ORDER BY t2_line;
After converting rows from t1 to a JSON record, we can access keys (converted from column names) directly.
I unnest the pattern, access each single key, and aggregate the result in LATERAL subquery. See:
What is the difference between a LATERAL JOIN and a subquery in PostgreSQL?
You could encapsulate the logic in a function like in 1., but that's optional in this case.
3. Workaround with conversion to Postgres arrays
SELECT DISTINCT t2.line AS t2_line, t2.criteria, c.*
FROM table2 t2
CROSS JOIN (SELECT line, ARRAY [a,b,c,d,e,f,g,h] AS arr FROM table1 t) t1
CROSS JOIN LATERAL (
SELECT string_agg(t1.arr[idx]::text, ',') AS result
FROM unnest(string_to_array(translate(t2.criteria, 'abcdefgh', '12345678'), ',')::int[]) idx
) c
ORDER BY t2_line;
Similar to the "trick" with JSON, we can avoid dynamic SQL by converting columns to a plain Postgres array. Then project column names to integer array indices. I use translate() for the simple case, but that only works for single letters! Use replace() or regexp_replace() or some other method for longer names.
The rest is like the above.
fiddle - showing all.

sql: filter on created column

Let's say I have a very simple sql table:
id | step
----+------
1 | 1
2 | 2
3 | 3
I'm trying to create a new column based on a simple operation and filter on that new column. The newly created table before the filter should look like this:
id | step | new
----+------+-----
1 | 1 | 10
2 | 2 | 20
3 | 3 | 30
I thought the following query should work:
select id, step, step*10 as new from event where new = 20
But I'm getting the following error:
ERROR: column "new" does not exist
where is processed before select which is why calculated column new is unknown at that time. Use step*10=20 instead.
Or use a sub-query and filter with the alias.
select *
from (select id, step, jsonb_array_elements(payload::jsonb->'sample_id') as new
from event
) e
where new = --somevalue
If you don't like subqueries, you can use a lateral join:
with event as (
select 1 as id, 2 as step
)
select e.id, e.step, v.new
from event e cross join lateral
(values (e.step * 10)) v(new)
where v.new = 20;
This is convenient when you have multiple expressions that depend on each other. Rather than nesting subqueries or CTEs, you can just add them to the FROM clause.

Postgres GROUP BY Array Column

I use postgres & have a table like this :
id | arr
-------------------
1 | [A,B,C]
2 | [C,B,A]
3 | [A,A,B]
4 | [B,A,B]
I created a GROUP BY 'arr' query.
SELECT COUNT(*) AS total, "arr" FROM "table" GROUP BY "arr"
... and the result :
total | arr
-------------------
1 | [A,B,C]
1 | [C,B,A]
1 | [A,A,B]
1 | [B,A,B]
BUT, since [A,B,C] and [C,B,A] have the same elements, so i expected the result should be like this :
total | arr
-------------------
2 | [A,B,C]
2 | [A,A,B]
Did i miss something (in query) or else? Please help me..
You do not need to create a separate function to do this. It can all be done in a single statement:
select array(select unnest(arr) order by 1) as sorted_arr, count(*)
from t
group by sorted_arr;
Here is a rextester.
[A,B,C] and [C,B,A] are different arrays even if they have the same elements they are not in the same position, they will never be grouped by a group by clause, in case you want to make them equivalent you'd need to sort them first.
On this thread you have info abour sorting arrays.
You should do something like:
SELECT COUNT(*) AS total, array_sort("arr") FROM "table" GROUP BY array_sort("arr")
After creating a sort function like the one proposed in there:
CREATE OR REPLACE FUNCTION array_sort (ANYARRAY)
RETURNS ANYARRAY LANGUAGE SQL
AS $$
SELECT ARRAY(SELECT unnest($1) ORDER BY 1)
$$;

Counting the total number of rows with SELECT DISTINCT ON without using a subquery

I have performing some queries using PostgreSQL SELECT DISTINCT ON syntax. I would like to have the query return the total number of rows alongside with every result row.
Assume I have a table my_table like the following:
CREATE TABLE my_table(
id int,
my_field text,
id_reference bigint
);
I then have a couple of values:
id | my_field | id_reference
----+----------+--------------
1 | a | 1
1 | b | 2
2 | a | 3
2 | c | 4
3 | x | 5
Basically my_table contains some versioned data. The id_reference is a reference to a global version of the database. Every change to the database will increase the global version number and changes will always add new rows to the tables (instead of updating/deleting values) and they will insert the new version number.
My goal is to perform a query that will only retrieve the latest values in the table, alongside with the total number of rows.
For example, in the above case I would like to retrieve the following output:
| total | id | my_field | id_reference |
+-------+----+----------+--------------+
| 3 | 1 | b | 2 |
+-------+----+----------+--------------+
| 3 | 2 | c | 4 |
+-------+----+----------+--------------+
| 3 | 3 | x | 5 |
+-------+----+----------+--------------+
My attemp is the following:
select distinct on (id)
count(*) over () as total,
*
from my_table
order by id, id_reference desc
This returns almost the correct output, except that total is the number of rows in my_table instead of being the number of rows of the resulting query:
total | id | my_field | id_reference
-------+----+----------+--------------
5 | 1 | b | 2
5 | 2 | c | 4
5 | 3 | x | 5
(3 rows)
As you can see it has 5 instead of the expected 3.
I can fix this by using a subquery and count as an aggregate function:
with my_values as (
select distinct on (id)
*
from my_table
order by id, id_reference desc
)
select count(*) over (), * from my_values
Which produces my expected output.
My question: is there a way to avoid using this subquery and have something similar to count(*) over () return the result I want?
You are looking at my_table 3 ways:
to find the latest id_reference for each id
to find my_field for the latest id_reference for each id
to count the distinct number of ids in the table
I therefore prefer this solution:
select
c.id_count as total,
a.id,
a.my_field,
b.max_id_reference
from
my_table a
join
(
select
id,
max(id_reference) as max_id_reference
from
my_table
group by
id
) b
on
a.id = b.id and
a.id_reference = b.max_id_reference
join
(
select
count(distinct id) as id_count
from
my_table
) c
on true;
This is a bit longer (especially the long thin way I write SQL) but it makes it clear what is happening. If you come back to it in a few months time (somebody usually does) then it will take less time to understand what is going on.
The "on true" at the end is a deliberate cartesian product because there can only ever be exactly one result from the subquery "c" and you do want a cartesian product with that.
There is nothing necessarily wrong with subqueries.

Get last element of an ordered set in postgresql

I am trying to get the last element of an ordered set, stored in a database table. The ordering is defined by one of the columns in the table. Also the table contains multiple sets, so I want the last one for each of the sets.
As an example consider the following table:
benchmarks=# select id,sorter from aggtest ;
id | sorter
----+--------
1 | 1
3 | 1
5 | 1
2 | 2
7 | 2
4 | 1
6 | 2
(7 rows)
Sorter 1 and 2 define each of the sets, sets are ordered by the id column. To get the last element of each set, I defined an aggregate function:
CREATE FUNCTION public.last_agg ( anyelement, anyelement )
RETURNS anyelement LANGUAGE sql IMMUTABLE STRICT AS $$
SELECT $2;
$$;
CREATE AGGREGATE public.last (
sfunc = public.last_agg,
basetype = anyelement,
stype = anyelement
);
As explained here.
However when I use this I get:
benchmarks=# select last(id),sorter from aggtest group by sorter order by sorter;
last | sorter
------+--------
4 | 1
6 | 2
(2 rows)
However, I want to get (5,1) and (7,2) as these are the last ids (numerically) in the set. Looking at how the aggregate mechanism works, I can see quite well, why the result is not what I want. The items are returned in the order I added them, and then aggregated so that the last one I added is returned.
I tried sorting by ids, so that each group is sorted independently, however that does not work:
benchmarks=# select last(id),sorter from aggtest group by sorter order by sorter,id;
ERROR: column "aggtest.id" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: ...(id),sorter from aggtest group by sorter order by sorter,id;
If I wrap the sorting criteria in another aggregate, I get wrong data again:
benchmarks=# select last(id),sorter from aggtest group by sorter order by sorter,last(id);
last | sorter
------+--------
4 | 1
6 | 2
(2 rows)
Also grouping by id in addition to sorter does not work obviously.
Of course there is an easier way, to get the last (highest) id for each group by using the max aggregate. However, I am not so much interested in the id but as in data associated with it (i.e. in the same row). Hence I do not to sort by id and then aggregate so that the row with the highest id is returned for each group.
What is the best way to accomplish this?
EDIT: Why does max(id) grouped by sorter not work
Assume the following complete table (unsorter represents the additional data I have in the table):
benchmarks=# select * from aggtest ;
id | sorter | unsorter
----+--------+----------
1 | 1 | 1
3 | 1 | 2
5 | 1 | 3
2 | 2 | 4
7 | 2 | 5
4 | 1 | 6
6 | 2 | 7
(7 rows)
I would like to retrieve the lines:
id | sorter | unsorter
----+--------+----------
5 | 1 | 3
7 | 2 | 5
However with max(id) and grouping by sorter I get:
benchmarks=# select max(id),sorter,unsorter from aggtest group by sorter;
ERROR: column "aggtest.unsorter" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: select max(id),sorter,unsorter from aggtest group by sorter;
Using a max(unsorter) obviously does not work either:
benchmarks=# select max(id),sorter,max(unsorter) from aggtest group by sorter;
max | sorter | max
-----+--------+-----
5 | 1 | 6
7 | 2 | 7
(2 rows)
However using distinct (the accepted answer) I get:
benchmarks=# select distinct on (sorter) id,sorter,unsorter from aggtest order by sorter, id desc;
id | sorter | unsorter
----+--------+----------
5 | 1 | 3
7 | 2 | 5
(2 rows)
Which has the correct additional data. The join approach also seems to work, by is slightly slower on the test data.
Why not use a window function:
select id, sorter
from (
select id, sorter,
row_number() over (partition by sorter order by id desc) as rn
from aggtest
) t
where rn = 1;
Or using Postgres distinct on operator which is usually faster:
select distinct on (sorter) id, sorter
from aggtest
order by sorter, id desc
You write:
Of course there is an easier way, to get the last (highest) id for
each group by using the max aggregate. However, I am not so much
interested in the id but as in data associated with it (i.e. in the
same row).
This query will give you the data associated with the highest id of each sorter group.
select a.* from aggtest a
join (
select max(id) max_id, sorter
from aggtest
group by sorter
) b on a.id = b.max_id and a.sorter = b.sorter
select distinct max(id) over (partition by sorter) id,sorter
from aggtest order by 2 asc
returns:
5;1
7;2