How can you expand a "condensed" PostgreSQL row into separate columns? - sql

I have a function which returns a table.
If you run SELECT * FROM some_function(12345) the result is:
object_id | name
----------------
12345 | "B"
If you run SELECT some_function(12345) the result is:
some_function
-------------
(12345,"B")
The problem is that I want the original form (so that I can access individual column values), but have the argument to some_function() come from a column in a table. I can execute SELECT some_function(thing_id) FROM things but this returns:
some_function
-------------
(12345,"B")
(12346,"C")
(12347,"D")
Whereas what I want returned is:
object_id | name
----------------
12345 | "B"
12346 | "C"
12347 | "D"
So how can one "unnest" or "expand" such a condensed row?

9.3 and above: lateral query
In PostgreSQL 9.3 or newer use an implicit lateral query:
SELECT f.* FROM things t, some_function(t.thing_id) f;
Prefer this formulation for all new queries. The above is the standard formulation.
It also works properly with functions that RETURNS TABLE or RETURNS SETOF RECORD as well as funcs with out-params that RETURNS RECORD.
It's shorthand for:
SELECT f.*
FROM things t
CROSS JOIN LATERAL some_function(t.thing_id) f;
Pre-9.3: wildcard expansion (with care)
Prior versions, causes multiple-evaluation of some_function, does not work if some_function returns a set, do not use this:
SELECT (some_function(thing_id)).* FROM things;
Prior versions, avoids multiple-evaluation of some_function using a second layer of indirection. Only use this if you must support quite old PostgreSQL versions.
SELECT (f).*
FROM (
SELECT some_function(thing_id) f
FROM things
) sub(f);
Demo:
Setup:
CREATE FUNCTION some_function(i IN integer, x OUT integer, y OUT text, z OUT text) RETURNS record LANGUAGE plpgsql AS $$
BEGIN
RAISE NOTICE 'evaluated with %',i;
x := i;
y := i::text;
z := 'dummy';
RETURN;
END;
$$;
create table things(thing_id integer);
insert into things(thing_id) values (1),(2),(3);
test run:
demo=> SELECT f.* FROM things t, some_function(t.thing_id) f;
NOTICE: evaluated with 1
NOTICE: evaluated with 2
NOTICE: evaluated with 3
x | y | z
---+---+-------
1 | 1 | dummy
2 | 2 | dummy
3 | 3 | dummy
(3 rows)
demo=> SELECT (some_function(thing_id)).* FROM things;
NOTICE: evaluated with 1
NOTICE: evaluated with 1
NOTICE: evaluated with 1
NOTICE: evaluated with 2
NOTICE: evaluated with 2
NOTICE: evaluated with 2
NOTICE: evaluated with 3
NOTICE: evaluated with 3
NOTICE: evaluated with 3
x | y | z
---+---+-------
1 | 1 | dummy
2 | 2 | dummy
3 | 3 | dummy
(3 rows)
demo=> SELECT (f).*
FROM (
SELECT some_function(thing_id) f
FROM things
) sub(f);
NOTICE: evaluated with 1
NOTICE: evaluated with 2
NOTICE: evaluated with 3
x | y | z
---+---+-------
1 | 1 | dummy
2 | 2 | dummy
3 | 3 | dummy
(3 rows)

SELECT * FROM (SELECT some_function(thing_id) FROM things) x;
The subselect SELECT some_function(thing_id) FROM things returns a row for each record found. The outer select "uncompresses" the row into separate columns.

Related

Select concatenated columns based on criteria list in other table

I have a table1
line
a
b
c
d
e
f
g
h
1
18
2
2
22
0
2
1
2
2
20
2
2
2
0
0
0
2
3
10
2
2
222
0
2
1
2
4
12
2
2
3
0
0
0
0
5
15
2
2
3
0
0
0
0
And a table2
 line
criteria
1
 a,b
2
 b,c,f,h
3
 a,b,e,g,h
4
 c,e
I am using this code to see/select the unique results of concated/joined columns, like concat(c,',',d), concat(b,',',d,',',g) and so on from table1 and is working perfectly:
SELECT DISTINCT(CONCAT(c,',',d))
FROM table1
But, instead of writing manually like concat(c,',',d), I want to refer to table2.criteria to get columns references to be concated/joined from table1 so that i can see the entire unique results against each concated criteria
Tried this, but getting an error:
SELECT DISTINCT(SELECT criteria FROM table2)
FROM table1
ERROR: more than one row returned by a subquery used as an expression
SQL state: 21000
The expected unique result is something like this;
| criteria | result |
| ------------ | ---------- |
| a,b | 15,2 |
| a,b | 10,2 |
| a,b | 20,2 |
| a,b | 12,2 |
| a,b | 18,2 |
| b,c,f,h | 2,2,2,2 |
| b,c,f,h | 2,2,0,2 |
| b,c,f,h | 2,2,0,0 |
| a,b,e,g,h | 20,2,0,0,2 |
| a,b,e,g,h | 12,2,0,0,0 |
| a,b,e,g,h | 15,2,0,0,0 |
| a,b,e,g,h | 10,2,0,1,2 |
| a,b,e,g,h | 18,2,0,1,2 |
| c,e | 2,0 |
SQL does not allow to parameterize identifiers. There are various ways to work around this restriction.
It's unclear from the question, but according to comments you want to concatenate the given pattern for every row in table1.
1. Dynamic SQL
Create a helper function (once!) that concatenates and executes statements dynamically.
Basics:
Define table and column names as arguments in a plpgsql function?
CREATE OR REPLACE FUNCTION f_concat_cols(_cols text)
RETURNS TABLE (result text)
LANGUAGE plpgsql AS
$func$
BEGIN
RETURN QUERY EXECUTE format(
$q$SELECT concat_ws(',', %s) FROM table1 ORDER BY line$q$, _cols);
END
$func$;
It's a set-returning function (a.k.a. "table function"), to return one result row for every row in table1 for each given pattern.
Warning: Converting user input to code like this is a prime opportunity for SQL injection. You must make sure that table1.criteria can only hold valid strings!
To get the full result matrix (with distinct results per row in table2), the query is simple now:
SELECT DISTINCT line AS t2_line, criteria, t1.*
FROM table2, f_concat_cols(criteria) t1
ORDER BY t2_line;
2. Workaround with conversion to JSON
SELECT DISTINCT t2.line AS t2_line, t2.criteria, c.*
FROM table2 t2
CROSS JOIN (SELECT line, to_json(t) AS js FROM table1 t) t1
CROSS JOIN LATERAL (
SELECT string_agg(t1.js->>sub, ',') AS result
FROM unnest(string_to_array(t2.criteria, ',')) sub
) c
ORDER BY t2_line;
After converting rows from t1 to a JSON record, we can access keys (converted from column names) directly.
I unnest the pattern, access each single key, and aggregate the result in LATERAL subquery. See:
What is the difference between a LATERAL JOIN and a subquery in PostgreSQL?
You could encapsulate the logic in a function like in 1., but that's optional in this case.
3. Workaround with conversion to Postgres arrays
SELECT DISTINCT t2.line AS t2_line, t2.criteria, c.*
FROM table2 t2
CROSS JOIN (SELECT line, ARRAY [a,b,c,d,e,f,g,h] AS arr FROM table1 t) t1
CROSS JOIN LATERAL (
SELECT string_agg(t1.arr[idx]::text, ',') AS result
FROM unnest(string_to_array(translate(t2.criteria, 'abcdefgh', '12345678'), ',')::int[]) idx
) c
ORDER BY t2_line;
Similar to the "trick" with JSON, we can avoid dynamic SQL by converting columns to a plain Postgres array. Then project column names to integer array indices. I use translate() for the simple case, but that only works for single letters! Use replace() or regexp_replace() or some other method for longer names.
The rest is like the above.
fiddle - showing all.

Handling multiple return values in subquery

I have the following data:
cte
=================
gp_id | m_ids
------|----------
1 | {123}
2 | {432,222}
3 | {123,222}
And a function with a signature like this (which in fact returns not a table but a couple of ids):
FUNCTION foo(m_ids integer[])
RETURNS TABLE (
first_id integer,
second_id integer
)
Now, I've got to iterate over each row and perform some calculations with that function, so I would get something like this:
gp_id | first_id | second_id
------|----------|-----------
1 | 25 | 25
2 | 13 | 24
3 | 25 | 11
To achieve that I tried the following code:
SELECT gp_id,
(
SELECT *
FROM foo(
(
SELECT m_ids
FROM cte c2
WHERE c2.gp_id = c1.gp_id)) limit 1)
FROM cte c1
The problem is in the SELECT * statement. If I use SELECT first_id, everything works well (except for that I have to run two consecutive queries, which I'd like to avoid, obviously), but in the former case I'm getting the error
subquery must return only one column
which is somewhat expected.
So how can I correctly iterate over the table in one single query?
Use the function in a lateral join:
select gp_id, first_id, second_id
from cte,
lateral foo(m_ids);

select function() in postgresql makes too much calls to function() [duplicate]

This question already has an answer here:
How to avoid multiple function evals with the (func()).* syntax in a query?
(1 answer)
Closed 7 years ago.
Let's assume we have this function:
create or replace function foo(a integer)
returns table (b integer, c integer)
language plpgsql
as $$
begin
raise notice 'foo()';
return query select a*2, a*4;
return query select a*6, a*8;
return query select a*10, a*12;
end;
$$;
The "raise notice 'foo()'" part will be used to know how many time the function is called.
If i call the function this way:
postgres=# SELECT i, foo(i) as bla FROM generate_series(1,3) as i;
NOTICE: foo()
NOTICE: foo()
NOTICE: foo()
i | bla
---+---------
1 | (2,4)
1 | (6,8)
1 | (10,12)
2 | (4,8)
2 | (12,16)
2 | (20,24)
3 | (6,12)
3 | (18,24)
3 | (30,36)
(9 rows)
We can see that, as expected, foo() is called 3 times.
But if i call the function this way (so i actually gets foo() result in different columns):
postgres=# SELECT i, (foo(i)).* FROM generate_series(1,3) as i;
NOTICE: foo()
NOTICE: foo()
NOTICE: foo()
NOTICE: foo()
NOTICE: foo()
NOTICE: foo()
i | b | c
---+----+----
1 | 2 | 4
1 | 6 | 8
1 | 10 | 12
2 | 4 | 8
2 | 12 | 16
2 | 20 | 24
3 | 6 | 12
3 | 18 | 24
3 | 30 | 36
(9 rows)
We can see that foo() is called 6 times. And if foo() was returning 3 columns, it would have been called 9 times. It's pretty clear that foo() is called for every i and every column it returns.
I don't understand why postgres does not make an optimisation here. And this is a problem for me as my (real) foo() may be CPU intensive. Any idea ?
Edit:
Using an "immutable" function or a function that does not return multiple rows gives the same behaviour:
create or replace function foo(a integer)
returns table (b integer, c integer, d integer)
language plpgsql
immutable
as $$
begin
raise notice 'foo';
return query select a*2, a*3, a*4;
end;
$$;
postgres=# select i, (foo(i)).* from generate_series(1,2) as i;
NOTICE: foo
NOTICE: foo
NOTICE: foo
NOTICE: foo
NOTICE: foo
NOTICE: foo
i | b | c | d
---+---+---+---
1 | 2 | 3 | 4
2 | 4 | 6 | 8
(2 rows)
This is a known issue.
SELECT (f(x)).*
is macro-expanded at parse-time into
SELECT (f(x)).a, (f(x)).b, ...
and PostgreSQL doesn't coalesce multiple calls to the same function down to a single call.
To avoid the issue you can wrap it in another layer of subquery so that the macro-expansion occurs on a simple reference to the function's result rather than the function invocation:
select i, (f).*
FROM (
SELECT i, foo(i) f from generate_series(1,2) as i
) x(i, f)
or use a lateral call in the FROM clause, which is preferred for newer versions:
select i, f.*
from generate_series(1,2) as i
CROSS JOIN LATERAL foo(i) f;
The CROSS JOIN LATERAL may be omitted, using legacy comma joins and an implicit lateral join, but I find it considerably clear to include it, especially when you're mixing other join types.
Basically it is reasonable not to call functions that return more than one value (especially functions returning sets) in select clause.
In fact postgres does not make any optimization for such a call.
Place your function in from clause.
SELECT i, f.* FROM generate_series(1,3) as i, foo(i) f;
In the documentation you can find the note (emphasis mine):
Currently, functions returning sets can also be called in the select
list of a query. For each row that the query generates by itself, the
function returning set is invoked, and an output row is generated for
each element of the function's result set. Note, however, that this
capability is deprecated and might be removed in future releases.

How to substring and join with another table with the substring result

I have 2 tables: errorlookup and errors.
errorlookup has 2 columns: codes and description.
The codes are of length 2.
errors has 2 columns id and errorcodes.
The errorcodes are of length 40 meaning they code store 20 error codes for each id.
I need to display all the description associated with the id by substring the errorcodes and matching with code in errorlookup table.
Sample data for errorlookup:
codes:description
12:Invalid
22:Inactive
21:active
Sample data for errors:
id:errorcodes
1:1221
2:2112
3:1222
I cant use LIKE as it would result in too many errors. I want the errorcodes column to be broken down into strings of length 2 and then joined with the errorlookup.
How can it be done?
If you really cannot alter the tables structure, here's another approach:
Create an auxilary numbers table:
CREATE TABLE numbers
( i INT NOT NULL
, PRIMARY KEY (i)
) ;
INSERT INTO numbers VALUES
( 1 ) ;
INSERT INTO numbers VALUES
( 2 ) ;
--- ...
INSERT INTO numbers VALUES
( 100 ) ;
Then you could use this:
SELECT err.id
, err.errorcodes
, num.i
, look.codes
, look.descriptionid
FROM
( SELECT i, 2*i-1 AS pos --- odd numbers
FROM numbers
WHERE i <= 20 --- 20 pairs
) num
CROSS JOIN
errors err
JOIN
errorlookup look
ON look.codes = SUBSTR(err.errorcodes, pos, 2)
ORDER BY
err.errorcodes
, num.i ;
Test at: SQL-Fiddle
ID ERRORCODES I CODES DESCRIPTIONID
1 1221 1 12 Invalid
1 1221 2 21 Active
3 1222 1 12 Invalid
3 1222 2 22 Inactive
2 2112 1 21 Active
2 2112 2 12 Invalid
I think the cleanest solution is to "normalize" your errocodes table using a PL/SQL function. That way you can keep the current (broken) table design, but still access its content as if it was properly normlized.
create type error_code_type as object (id integer, code varchar(2))
/
create or replace type error_table as table of error_code_type
/
create or replace function unnest_errors
return error_table pipelined
is
codes_l integer;
i integer;
one_row error_code_type := error_code_type(null, null);
begin
for err_rec in (select id, errorcodes from errors) loop
codes_l := length(err_rec.errorcodes);
i := 1;
while i < codes_l loop
one_row.id := err_rec.id;
one_row.code := substr(err_rec.errorcodes, i, 2);
pipe row (one_row);
i := i + 2;
end loop;
end loop;
end;
/
Now with this function you can do something like this:
select er.id, er.code, el.description
from table(unnest_errors) er
join errorlookup el on el.codes = er.code;
You can also create a view based on the function to make the statements a bit easier to read:
create or replace view normalized_errorcodes
as
select *
from table(unnest_errors);
Then you can simply reference the view in the real statement.
(I tested this on 11.2 but I believe it should work on 10.x as well)
I think you're on the right track with LIKE. MySQL has an RLIKE function that allows matching by regular expression (I don't know if it's present in Oracle.) You could use errorlookup.code as a pattern to match against errors.errorcodes. The (..)* pattern is used to prevent things like "1213" from matching, for example, "21".
SELECT *
FROM error
JOIN errorlookup
WHERE errorcodes RLIKE CONCAT('^(..)*',code)
ORDER BY id;
+------+----------+------+
| id | errorcode| code |
+------+----------+------+
| 1 | 11 | 11 |
| 2 | 1121 | 11 |
| 2 | 1121 | 21 |
| 3 | 21313245 | 21 |
| 3 | 21313245 | 31 |
| 3 | 21313245 | 32 |
| 4 | 21 | 21 |
+------+----------+------+

Deleting similar columns in SQL

In PostgreSQL 8.3, let's say I have a table called widgets with the following:
id | type | count
--------------------
1 | A | 21
2 | A | 29
3 | C | 4
4 | B | 1
5 | C | 4
6 | C | 3
7 | B | 14
I want to remove duplicates based upon the type column, leaving only those with the highest count column value in the table. The final data would look like this:
id | type | count
--------------------
2 | A | 29
3 | C | 4 /* `id` for this record might be '5' depending on your query */
7 | B | 14
I feel like I'm close, but I can't seem to wrap my head around a query that works to get rid of the duplicate columns.
count is a sql reserve word so it'll have to be escaped somehow. I can't remember the syntax for doing that in Postgres off the top of my head so I just surrounded it with square braces (change it if that isn't correct). In any case, the following should theoretically work (but I didn't actually test it):
delete from widgets where id not in (
select max(w2.id) from widgets as w2 inner join
(select max(w1.[count]) as [count], type from widgets as w1 group by w1.type) as sq
on sq.[count]=w2.[count] and sq.type=w2.type group by w2.[count]
);
There is a slightly simpler answer than Asaph's, with EXISTS SQL operator :
DELETE FROM widgets AS a
WHERE EXISTS
(SELECT * FROM widgets AS b
WHERE (a.type = b.type AND b.count > a.count)
OR (b.id > a.id AND a.type = b.type AND b.count = a.count))
EXISTS operator returns TRUE if the following SQL statement returns at least one record.
According to your requirements, seems to me that this should work:
DELETE
FROM widgets
WHERE type NOT IN
(
SELECT type, MAX(count)
FROM widgets
GROUP BY type
)