Union different tables - sql

The following function joins dynamically different tables.
create or replace function unified_tables() returns table(
1 TEXT
, 2 TEXT
, 3 TEXT
, 4 TEXT
, 5 JSONB
, 6 BIGINT
, 7 BIGINT
, 8 TEXT
, 9 TEXT
, 10 TIMESTAMPTZ
)
as
$$
declare
a record;
begin
for a in select table_schema
from information_schema.tables
where table_name = 'name'
loop
return query
execute format('select %L as source_schema, * from %I.name', a.table_schema, a.table_schema);
end loop;
end;
$$
language plpgsql;
Unfortunately, not all the tables called have all the columns specified in RETURNS TABLE.
Precisely, there are 15 tables (the loop goes over 200+ tables) missing the column 2, two tables missing the column 4, and five tables missing the column 9.
Future tables entering the loop might miss columns as well. I do not have control on the source structure.
How can I keep using the function adding a null value for the missing columns so to maintain the structure defined in the RETURNS TABLE?

You can create a set returning function for this:
create function get_all_pages()
returns table (....)
as
$$
declare
l_info_rec record;
begin
for l_info_rec in select table_schema
from information_schema.tables
where table_name = 'page'
loop
return query
execute format('select %L as source_schema, *
from %I.page', l_info_rec.table_schema, l_info_rec.table_schema);
end loop;
end;
$$
language plpgsql;
Then run:
select *
from get_all_pages();
return query in a PL/pgSQL function doesn't end the function. It simply appends the result of the query to the result of the function.
You can pick any table as the return type, it just serves as a "placeholder" in this case (again: assuming all tables are 100% identical). Alternatively you could use returns table (....) - but that will require you to list all the columns of the table manually.
Note that this will buffer the complete result on the server before the function returns, so this might not be suitable for really large tables.
Another option is to create an event trigger that re-creates a VIEW (that does a UNION ALL) each time a new table is created or an existing one is dropped.

Related

Dynamic query that uses CTE gets "syntax error at end of input"

I have a table that looks like this:
CREATE TABLE label (
hid UUID PRIMARY KEY DEFAULT UUID_GENERATE_V4(),
name TEXT NOT NULL UNIQUE
);
I want to create a function that takes a list of names and inserts multiple rows into the table, ignoring duplicate names, and returns an array of the IDs generated for the rows it inserted.
This works:
CREATE OR REPLACE FUNCTION insert_label(nms TEXT[])
RETURNS UUID[]
AS $$
DECLARE
ids UUID[];
BEGIN
CREATE TEMP TABLE tmp_names(name TEXT);
INSERT INTO tmp_names SELECT UNNEST(nms);
WITH new_names AS (
INSERT INTO label(name)
SELECT tn.name
FROM tmp_names tn
WHERE NOT EXISTS(SELECT 1 FROM label h WHERE h.name = tn.name)
RETURNING hid
)
SELECT ARRAY_AGG(hid) INTO ids
FROM new_names;
DROP TABLE tmp_names;
RETURN ids;
END;
$$ LANGUAGE PLPGSQL;
I have many tables with the exact same columns as the label table, so I would like to have a function that can insert into any of them. I'd like to create a dynamic query to do that. I tried that, but this does not work:
CREATE OR REPLACE FUNCTION insert_label(h_tbl REGCLASS, nms TEXT[])
RETURNS UUID[]
AS $$
DECLARE
ids UUID[];
query_str TEXT;
BEGIN
CREATE TEMP TABLE tmp_names(name TEXT);
INSERT INTO tmp_names SELECT UNNEST(nms);
query_str := FORMAT('WITH new_names AS ( INSERT INTO %1$I(name) SELECT tn.name FROM tmp_names tn WHERE NOT EXISTS(SELECT 1 FROM %1$I h WHERE h.name = tn.name) RETURNING hid)', h_tbl);
EXECUTE query_str;
SELECT ARRAY_AGG(hid) INTO ids FROM new_names;
DROP TABLE tmp_names;
RETURN ids;
END;
$$ LANGUAGE PLPGSQL;
This is the output I get when I run that function:
psql=# select insert_label('label', array['how', 'now', 'brown', 'cow']);
ERROR: syntax error at end of input
LINE 1: ...SELECT 1 FROM label h WHERE h.name = tn.name) RETURNING hid)
^
QUERY: WITH new_names AS ( INSERT INTO label(name) SELECT tn.name FROM tmp_names tn WHERE NOT EXISTS(SELECT 1 FROM label h WHERE h.name = tn.name) RETURNING hid)
CONTEXT: PL/pgSQL function insert_label(regclass,text[]) line 19 at EXECUTE
The query generated by the dynamic SQL looks like it should be exactly the same as the query from static SQL.
I got the function to work by changing the return value from an array of UUIDs to a table of UUIDs and not using CTE:
CREATE OR REPLACE FUNCTION insert_label(h_tbl REGCLASS, nms TEXT[])
RETURNS TABLE (hid UUID)
AS $$
DECLARE
query_str TEXT;
BEGIN
CREATE TEMP TABLE tmp_names(name TEXT);
INSERT INTO tmp_names SELECT UNNEST(nms);
query_str := FORMAT('INSERT INTO %1$I(name) SELECT tn.name FROM tmp_names tn WHERE NOT EXISTS(SELECT 1 FROM %1$I h WHERE h.name = tn.name) RETURNING hid', h_tbl);
RETURN QUERY EXECUTE query_str;
DROP TABLE tmp_names;
RETURN;
END;
$$ LANGUAGE PLPGSQL;
I don't know if one way is better than the other, returning an array of UUIDs or a table of UUIDs, but at least I got it to work one of those ways. Plus, possibly not using a CTE is more efficient, so it may be better to stick with the version that returns a table of UUIDs.
What I would like to know is why the dynamic query did not work when using a CTE. The query it produced looked like it should have worked.
If anyone can let me know what I did wrong, I would appreciate it.
... why the dynamic query did not work when using a CTE. The query it produced looked like it should have worked.
No, it was only the CTE without (required) outer query. (You had SELECT ARRAY_AGG(hid) INTO ids FROM new_names in the static version.)
There are more problems, but just use this query instead:
INSERT INTO label(name)
SELECT unnest(nms)
ON CONFLICT DO NOTHING
RETURNING hid;
label.name is defined UNIQUE NOT NULL, so this simple UPSERT can replace your function insert_label() completely.
It's much simpler and faster. It also defends against possible duplicates from within your input array that you didn't cover, yet. And it's safe under concurrent write load - as opposed to your original, which might run into race conditions. Related:
How to use RETURNING with ON CONFLICT in PostgreSQL?
I would just use the simple query and replace the table name.
But if you still want a dynamic function:
CREATE OR REPLACE FUNCTION insert_label(_tbl regclass, _nms text[])
RETURNS TABLE (hid uuid)
LANGUAGE plpgsql AS
$func$
BEGIN
RETURN QUERY EXECUTE format(
$$
INSERT INTO %s(name)
SELECT unnest($1)
ON CONFLICT DO NOTHING
RETURNING hid
$$, _tbl)
USING _nms;
END
$func$;
If you don't need an array as result, stick with the set (RETURNS TABLE ...). Simpler.
Pass values (_nms) to EXECUTE in a USING clause.
The tablename (_tbl) is type regclass, so the format specifier %I for format() would be wrong. Use %s instead. See:
Table name as a PostgreSQL function parameter

Creating a table with certain columns from another table

Lets say we have a table A with 200 different columns. We want to select columns that contain a certain substring (e.g. the "host" substring in "host_id", "host_name", "average_host_rating"), and create a new table B with only those columns and their data imported from a .csv file.
I tried creating the new table manually, however this is not good practice and i want to improve the code by making it valid and functional even if i add more columns to table A.
Creating the table manually:
SELECT
listings.host_id,
listings.host_url ,
..
..
listings.host_name ,
listings.host_since ,
INTO host_table
FROM listings
WHERE TRUE;
Trying to create the table in a better way:
CREATE TABLE B AS
SELECT *
FROM A
WHERE A::text LIKE '%host%'
I expected it to create table B with every column that contains 'host' in its name however it returned an exact copy of table A (and all its data). I tried different ways and methods of creating new tables, however the problem always was that i could not isolate only the columns with the specified substring ('host').
What could be wrong in my syntax, way of thinking or anything else?
Thanks in advance!
You may create and call a function with parameters. The function will dynamically chose the columns from information_schema.columns.
Note that where false is used because you mentioned data will come from a csv file and not from the original table.
create or replace function
fn_gen_tab_text ( curr_tab_in text,tab_text_in TEXT, new_tab_in text)
RETURNS void AS
$body$
declare v_sql TEXT;
BEGIN
select 'CREATE TABLE %I AS select ' || string_agg(column_name,',')
||' from %I where false'
into v_sql from information_schema.columns
where table_name = curr_tab_in
and column_name like '%'||tab_text_in||'%';
EXECUTE format (v_sql,new_tab_in,curr_tab_in);
END $body$ language plpgsql
Call it as
select fn_gen_tab_text('host_table','host','new_table' );
DEMO

PostgreSQL Function returning result set from dynamic tables names

In my database, I have the standard app tables and backup tables. Eg. for a table "employee", I have a table called "bak_employee". The bak_employee table is a backup of the employee table. I use it to restore the employee table between tests.
I'd figure I can use these "bak_" tables to see the changes that have occurred during the test like this:
SELECT * FROM employee EXCEPT SELECT * FROM bak_employee
This will show me the inserted and updated records. I'll ignore the deleted records for now.
Now, what I would like to do is go through all the tables in my database to see if there's any changes in any of the tables. I was thinking of doing this as a function so it's easy to call over and over. This is what I have so far:
CREATE OR REPLACE FUNCTION public.show_diff()
RETURNS SETOF diff_tables AS
$BODY$
DECLARE
app_tables text;
BEGIN
FOR app_tables IN
SELECT table_name
FROM information_schema.tables
WHERE table_catalog = 'myDatabase'
AND table_schema = 'public'
AND table_name not like 'bak_%' -- exclude existing backup tables
LOOP
-- somehow loop through tables to see what's changed something like:
EXECUTE 'SELECT * FROM ' || app_tables || ' EXCEPT SELECT * FROM bak_' || app_tables;
END LOOP;
RETURN;
END;
$BODY$
LANGUAGE plpgsql;
But obviously this isn't going to return me any useful information. Any help would be appreciated.
You cannot return various well-known row types from the same function in the same call. A cheap fix is to cast each row type to text, so we have a common return type.
CREATE OR REPLACE FUNCTION public.show_diff()
RETURNS SETOF text AS -- text!!
$func$
DECLARE
app_table text;
BEGIN
FOR app_table IN
SELECT table_name
FROM information_schema.tables
WHERE table_catalog = 'myDatabase'
AND table_schema = 'public'
AND table_name NOT LIKE 'bak_%' -- exclude existing backup tables
LOOP
RETURN NEXT ' ';
RETURN NEXT '=== ' || app_table || ' ===';
RETURN QUERY EXECUTE format(
'SELECT x::text FROM (TABLE %I EXCEPT ALL TABLE %I) x'
, app_table, 'bak_' || app_table);
END LOOP;
RETURN;
END
$func$ LANGUAGE plpgsql;
Call:
SELECT * FROM public.show_diff();
I had the test suggested by #a_horse at first, but after your comment I realized that there is no need for this. EXCEPT considers NULL values to be equal and shows all differences.
While being at it, I improved and simplified your solution some more. Use EXCEPT ALL: cheaper and does not run the risk of folding complete duplicates.
Using EXCEPT clause in PostgreSQL
TABLE is just syntactical sugar.
Is there a shortcut for SELECT * FROM in psql?
However, if you have an index on a unique (combination of) column(s), a JOIN like I suggested before should be faster: finding the only possible duplicate via index should be substantially cheaper.
Crucial element is the cast the row type to text (x::text).
You can even make the function work for any table - but never more than one at a time: With a polymorphic parameter type:
Refactor a PL/pgSQL function to return the output of various SELECT queries

nzsql - Converting a subquery into columns for another select

Goal: Use a given subquery's results (a single column with many rows of names) to act as the outer select's selection field.
Currently, my subquery is the following:
SELECT column_name
FROM information_schema.columns
WHERE table_name = 'test_table' AND column_name not in ('colRemove');
What I am doing in this subquery is grabbing all the column names from a table (i.e. test_table) and outputting all except for the column name specified (i.e. colRemove). As stated in the "goal", I want to use this subquery as such:
SELECT (*enter subquery from above here*)
FROM actual_table
WHERE (*enter specific conditions*)
I am working on a Netezza SQL server that is version 7.0.4.4. Ideally, I would like to make the entire query executable in one line, but for now, a working solution would be much appreciated. Thanks!
Note: I do not believe that the SQL extensions has been installed (i.e. arrays), but I will need to double check this.
A year too late, here's the best I can come up with but, as you already noticed, it requires a stored procedure to do the dynamic SQL. The stored proc creates a view with the all the columns from the source table minus the one you want to exclude.
-- Create test data.
CREATE TABLE test (firstcol INTEGER, secondcol INTEGER, thirdcol INTEGER);
INSERT INTO test (firstcol, secondcol, thirdcol) VALUES (1, 2, 3);
INSERT INTO test (firstcol, secondcol, thirdcol) VALUES (4, 5, 6);
-- Install stored procedure.
CREATE OR REPLACE PROCEDURE CreateLimitedView (varchar(ANY), varchar(ANY)) RETURNS BOOLEAN
LANGUAGE NZPLSQL AS
BEGIN_PROC
DECLARE
tableName ALIAS FOR $1;
columnToExclude ALIAS FOR $2;
colRec RECORD;
cols VARCHAR(2000); -- Adjust as needed.
isfirstcol BOOLEAN;
BEGIN
isfirstcol := true;
FOR colRec IN EXECUTE
'SELECT ATTNAME AS NAME FROM _V_RELATION_COLUMN
WHERE
NAME=UPPER('||quote_literal(tableName)||')
AND ATTNAME <> UPPER('||quote_literal(columnToExclude)||')
ORDER BY ATTNUM'
LOOP
IF isfirstcol THEN
cols := colRec.NAME;
ELSE
cols := cols || ', ' || colRec.NAME;
END IF;
isfirstcol := false;
END LOOP;
-- Should really check if 'LimitedView' already exists as a view, table or synonym.
EXECUTE IMMEDIATE 'CREATE OR REPLACE VIEW LimitedView AS SELECT ' || cols || ' FROM ' || quote_ident(tableName);
RETURN true;
END;
END_PROC
;
-- Run the stored proc to create the view.
CALL CreateLimitedView('test', 'secondcol');
-- Select results from the view.
SELECT * FROM limitedView WHERE firstcol = 4;
FIRSTCOL | THIRDCOL
----------+----------
4 | 6
You could have the stored proc return a resultset directly but then you wouldn't be able to filter results with a WHERE clause.

Update multiple columns in a trigger function in plpgsql

Given the following schema:
create table account_type_a (
id SERIAL UNIQUE PRIMARY KEY,
some_column VARCHAR
);
create table account_type_b (
id SERIAL UNIQUE PRIMARY KEY,
some_other_column VARCHAR
);
create view account_type_a view AS select * from account_type_a;
create view account_type_b view AS select * from account_type_b;
I try to create a generic trigger function in plpgsql, which enables updating the view:
create trigger trUpdate instead of UPDATE on account_view_type_a
for each row execute procedure updateAccount();
create trigger trUpdate instead of UPDATE on account_view_type_a
for each row execute procedure updateAccount();
An unsuccessful effort of mine was:
create function updateAccount() returns trigger as $$
declare
target_table varchar := substring(TG_TABLE_NAME from '(.+)_view');
cols varchar;
begin
execute 'select string_agg(column_name,$1) from information_schema.columns
where table_name = $2' using ',', target_table into cols;
execute 'update ' || target_table || ' set (' || cols || ') = select ($1).*
where id = ($1).id' using NEW;
return NULL;
end;
$$ language plpgsql;
The problem is the update statement. I am unable to come up with a syntax that would work here. I have successfully implemented this in PL/Perl, but would be interested in a plpgsql-only solution.
Any ideas?
Update
As #Erwin Brandstetter suggested, here is the code for my PL/Perl solution. I incoporated some of his suggestions.
create function f_tr_up() returns trigger as $$
use strict;
use warnings;
my $target_table = quote_ident($_TD->{'table_name'}) =~ s/^([\w]+)_view$/$1/r;
my $NEW = $_TD->{'new'};
my $cols = join(',', map { quote_ident($_) } keys $NEW);
my $vals = join(',', map { quote_literal($_) } values $NEW);
my $query = sprintf(
"update %s set (%s) = (%s) where id = %d",
$target_table,
$cols,
$vals,
$NEW->{'id'});
spi_exec_query($query);
return;
$$ language plperl;
While #Gary's answer is technically correct, it fails to mention that PostgreSQL does support this form:
UPDATE tbl
SET (col1, col2, ...) = (expression1, expression2, ..)
Read the manual on UPDATE.
It's still tricky to get this done with dynamic SQL. I'll assume a simple case where views consist of the same columns as their underlying tables.
CREATE VIEW tbl_view AS SELECT * FROM tbl;
Problems
The special record NEW is not visible inside EXECUTE. I pass NEW as a single parameter with the USING clause of EXECUTE.
As discussed, UPDATE with list-form needs individual values. I use a subselect to split the record into individual columns:
UPDATE ...
FROM (SELECT ($1).*) x
(Parenthesis around $1 are not optional.) This allows me to simply use two column lists built with string_agg() from the catalog table: one with and one without table qualification.
It's not possible to assign a row value as a whole to individual columns. The manual:
According to the standard, the source value for a parenthesized
sub-list of target column names can be any row-valued expression
yielding the correct number of columns. PostgreSQL only allows the
source value to be a row constructor or a sub-SELECT.
INSERT is implemented simpler. If the structure of view and table are identical we can omit the column definition list. (Can be improved, see below.)
Solution
I made a couple of updates to your approach to make it shine.
Trigger function for UPDATE:
CREATE OR REPLACE FUNCTION f_trg_up()
RETURNS TRIGGER
LANGUAGE plpgsql AS
$func$
DECLARE
_tbl regclass := quote_ident(TG_TABLE_SCHEMA) || '.'
|| quote_ident(substring(TG_TABLE_NAME from '(.+)_view$'));
_cols text;
_vals text;
BEGIN
SELECT INTO _cols, _vals
string_agg(quote_ident(attname), ', ')
, string_agg('x.' || quote_ident(attname), ', ')
FROM pg_attribute
WHERE attrelid = _tbl
AND NOT attisdropped -- no dropped (dead) columns
AND attnum > 0; -- no system columns
EXECUTE format('
UPDATE %s
SET (%s) = (%s)
FROM (SELECT ($1).*) x', _tbl, _cols, _vals)
USING NEW;
RETURN NEW; -- Don't return NULL unless you knwo what you're doing
END
$func$;
Trigger function for INSERT:
CREATE OR REPLACE FUNCTION f_trg_ins()
RETURNS TRIGGER
LANGUAGE plpgsql AS
$func$
DECLARE
_tbl regclass := quote_ident(TG_TABLE_SCHEMA) || '.'
|| quote_ident(substring(TG_TABLE_NAME FROM '(.+)_view$'));
BEGIN
EXECUTE format('INSERT INTO %s SELECT ($1).*', _tbl)
USING NEW;
RETURN NEW; -- Don't return NULL unless you know what you're doing
END
$func$;
Triggers:
CREATE TRIGGER trg_instead_up
INSTEAD OF UPDATE ON a_view
FOR EACH ROW EXECUTE FUNCTION f_trg_up();
CREATE TRIGGER trg_instead_ins
INSTEAD OF INSERT ON a_view
FOR EACH ROW EXECUTE FUNCTION f_trg_ins();
Before Postgres 11 the syntax (oddly) was EXECUTE PROCEDURE instead of EXECUTE FUNCTION - which also still works.
db<>fiddle here - demonstrating INSERT and UPDATE
Old sqlfiddle
Major points
Include the schema name to make the table reference unambiguous. There can be multiple table of the same name in one database with multiple schemas!
Query pg_catalog.pg_attribute instead of information_schema.columns. Less portable, but much faster and allows to use the table-OID.
How to check if a table exists in a given schema
Table names are NOT safe against SQLi when concatenated as strings for dynamic SQL. Escape with quote_ident() or format() or with an object-identifer type. This includes the special trigger function variables TG_TABLE_SCHEMA and TG_TABLE_NAME!
Cast to the object identifier type regclass to assert the table name is valid and get the OID for the catalog look-up.
Optionally use format() to build the dynamic query string safely.
No need for dynamic SQL for the first query on the catalog tables. Faster, simpler.
Use RETURN NEW instead of RETURN NULL in these trigger functions unless you know what you are doing. (NULL would cancel the INSERT for the current row.)
This simple version assumes that every table (and view) has a unique column named id. A more sophisticated version might use the primary key dynamically.
The function for UPDATE allows the columns of view and table to be in any order, as long as the set is the same.
The function for INSERT expects the columns of view and table to be in identical order. If you want to allow arbitrary order, add a column definition list to the INSERT command, just like with UPDATE.
Updated version also covers changes to the id column by using OLD additionally.
Postgresql doesn't support updating multiple columns using the set (col1,col2) = select val1,val2 syntax.
To achieve the same in postgresql you'd use
update target_table
set col1 = d.val1,
col2 = d.val2
from source_table d
where d.id = target_table.id
This is going to make the dynamic query a bit more complex to build as you'll need to iterate the column name list you're using into individual fields. I'd suggest you use array_agg instead of string_agg as an array is easier to process than splitting the string again.
Postgresql UPDATE syntax
documentation on array_agg function