Creating a table with certain columns from another table - sql

Let's say we have a table A with 200 different columns. We want to select the columns whose names contain a certain substring (e.g. "host" in "host_id", "host_name", "average_host_rating") and create a new table B with only those columns, with their data imported from a .csv file.
I tried creating the new table manually, but this is not good practice, and I want the code to remain valid and functional even if I add more columns to table A.
Creating the table manually:
SELECT
    listings.host_id,
    listings.host_url,
    -- ... more host_* columns ...
    listings.host_name,
    listings.host_since
INTO host_table
FROM listings
WHERE TRUE;
Trying to create the table in a better way:
CREATE TABLE B AS
SELECT *
FROM A
WHERE A::text LIKE '%host%'
I expected it to create table B with every column whose name contains 'host', but it returned an exact copy of table A (and all its data). I tried different ways and methods of creating new tables, but the problem was always that I could not isolate only the columns with the specified substring ('host').
What could be wrong in my syntax, way of thinking or anything else?
Thanks in advance!

A WHERE clause filters rows, never columns, which is why your attempt returned every column of A. To pick columns by name pattern you need dynamic SQL: create and call a function with parameters that dynamically chooses the columns from information_schema.columns.
Note that WHERE false is used because you mentioned the data will come from a CSV file and not from the original table.
create or replace function
fn_gen_tab_text (curr_tab_in text, tab_text_in text, new_tab_in text)
returns void as
$body$
declare
    v_sql text;
begin
    select 'CREATE TABLE %I AS SELECT '
           || string_agg(quote_ident(column_name), ',')
           || ' FROM %I WHERE false'
    into v_sql
    from information_schema.columns
    where table_name = curr_tab_in
      and column_name like '%' || tab_text_in || '%';

    execute format(v_sql, new_tab_in, curr_tab_in);
end
$body$ language plpgsql;
Call it as
select fn_gen_tab_text('host_table','host','new_table' );
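Since the new table is created empty (WHERE false), the CSV data can then be loaded with COPY. A minimal sketch, assuming the file's columns match the new table and the path (a placeholder here) is readable by the server:
COPY new_table FROM '/path/to/data.csv' WITH (FORMAT csv, HEADER true);
-- or client-side via psql, if the file lives on your machine:
-- \copy new_table FROM 'data.csv' WITH (FORMAT csv, HEADER true)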

Related

Union different tables

The following function dynamically unions same-named tables from different schemas.
create or replace function unified_tables() returns table(
  col1 TEXT
, col2 TEXT
, col3 TEXT
, col4 TEXT
, col5 JSONB
, col6 BIGINT
, col7 BIGINT
, col8 TEXT
, col9 TEXT
, col10 TIMESTAMPTZ
)
as
$$
declare
a record;
begin
for a in select table_schema
from information_schema.tables
where table_name = 'name'
loop
return query
execute format('select %L as source_schema, * from %I.name', a.table_schema, a.table_schema);
end loop;
end;
$$
language plpgsql;
Unfortunately, not all the tables hit by the loop have all the columns specified in RETURNS TABLE.
Precisely, there are 15 tables (the loop goes over 200+ tables) missing column col2, two tables missing column col4, and five tables missing column col9.
Future tables entering the loop might be missing columns as well, and I have no control over the source structure.
How can I keep using the function while supplying a NULL value for the missing columns, so as to maintain the structure defined in RETURNS TABLE?
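One way to approach this (a sketch using the placeholder column names from above, shortened to two expected columns and assuming text; adjust names and types to the real structure): build the select list per schema from information_schema.columns, substituting NULL for every expected column the table lacks.
create or replace function unified_tables_safe() returns table(
  source_schema TEXT
, col1 TEXT
, col2 TEXT
)
as
$$
declare
    a record;
    v_select text;
begin
    for a in select table_schema
             from information_schema.tables
             where table_name = 'name'
    loop
        -- for each expected column: use it if the table has it, else NULL
        select string_agg(
                   case when exists (
                            select 1
                            from information_schema.columns c
                            where c.table_schema = a.table_schema
                              and c.table_name = 'name'
                              and c.column_name = e.colname)
                        then quote_ident(e.colname)
                        else 'NULL::text'
                   end, ', ' order by e.ord)
        into v_select
        from (values ('col1', 1), ('col2', 2)) as e(colname, ord);

        return query execute format(
            'select %L as source_schema, %s from %I.name',
            a.table_schema, v_select, a.table_schema);
    end loop;
end;
$$
language plpgsql;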
You can create a set returning function for this:
create function get_all_pages()
returns table (....)
as
$$
declare
l_info_rec record;
begin
for l_info_rec in select table_schema
from information_schema.tables
where table_name = 'page'
loop
return query
execute format('select %L as source_schema, *
from %I.page', l_info_rec.table_schema, l_info_rec.table_schema);
end loop;
end;
$$
language plpgsql;
Then run:
select *
from get_all_pages();
return query in a PL/pgSQL function doesn't end the function. It simply appends the result of the query to the result of the function.
Instead of returns table (....), which requires listing all the columns manually, you can also pick any existing table as the return type (returns setof that_table); it just serves as a "placeholder" in this case (again: assuming all tables are 100% identical).
Note that this will buffer the complete result on the server before the function returns, so this might not be suitable for really large tables.
Another option is to create an event trigger that re-creates a VIEW (that does a UNION ALL) each time a new table is created or an existing one is dropped.
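A sketch of that event-trigger idea (view and trigger names are illustrative; note that dropping a member table requires CASCADE, since the view depends on it, after which the trigger rebuilds the view from the remaining tables):
create or replace function rebuild_union_view() returns event_trigger as
$$
declare
    v_sql text;
begin
    -- rebuild the UNION ALL over all schemas currently holding the table
    select string_agg(
               format('select %L as source_schema, * from %I.name',
                      table_schema, table_schema),
               ' union all ')
    into v_sql
    from information_schema.tables
    where table_name = 'name';

    if v_sql is not null then
        execute 'create or replace view all_names as ' || v_sql;
    end if;
end;
$$
language plpgsql;

create event trigger trg_rebuild_union_view
    on ddl_command_end
    when tag in ('CREATE TABLE', 'DROP TABLE')
    execute function rebuild_union_view();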

PL/pgSQL function to return the output of various SELECT queries from different database

I have found this very interesting article: Refactor a PL/pgSQL function to return the output of various SELECT queries
by Erwin Brandstetter, which describes how to return all columns of various tables with only one function:
CREATE OR REPLACE FUNCTION data_of(_table_name anyelement, _where_part text)
RETURNS SETOF anyelement AS
$func$
BEGIN
RETURN QUERY EXECUTE
'SELECT * FROM ' || pg_typeof(_table_name)::text || ' WHERE ' || _where_part;
END
$func$ LANGUAGE plpgsql;
Call:
SELECT * FROM data_of(NULL::tablename,'1=1 LIMIT 1');
This works pretty well. I need a very similar solution, but for getting data from a table on a different database via dblink. That means the call NULL::tablename will fail, since the table does not exist on the database where the call is made. I wonder how to make this work. Every attempt to connect to the other database via dblink inside the function failed to produce the result of NULL::tablename. It seems the polymorphic function needs a polymorphic parameter, which implicitly determines the return type of the function.
I would appreciate very much if anybody could help me.
Thanks a lot
Kind regards
Brian
It seems this request is more difficult to explain than I thought. Here is a second try with a test setup:
Database 1
First we create a test table with some data on database 1:
CREATE TABLE db1_test
(
id integer NOT NULL,
txt text
)
WITH (
OIDS=TRUE
);
INSERT INTO db1_test (id, txt) VALUES(1,'one');
INSERT INTO db1_test (id, txt) VALUES(2,'two');
INSERT INTO db1_test (id, txt) VALUES(3,'three');
Now we create the polymorph function on database 1:
-- create a polymorphic function with a polymorphic parameter "_table_name" on database 1
-- the return type is set implicitly by calling the function "data_of" with the parameter "NULL::[tablename]" and a where part
CREATE OR REPLACE FUNCTION data_of(_table_name anyelement, _where_part text)
RETURNS SETOF anyelement AS
$func$
BEGIN
RETURN QUERY EXECUTE
'SELECT * FROM ' || pg_typeof(_table_name)::text || ' WHERE ' || _where_part;
END
$func$ LANGUAGE plpgsql;
Now we make a test call to check that everything works as expected on database 1:
SELECT * FROM data_of(NULL::db1_test, 'id=2');
It works. Please notice I do NOT specify any columns of the table db1_test. Now we switch over to database 2.
Database 2
Here I need to make exactly the same call to data_of on database 1 as before, but WITHOUT knowing the columns of the selected table at call time. Unfortunately this is not going to work; the only call that works is something like this:
SELECT *
FROM dblink('dbname=[database1] port=[port] user=[user] password=[password]'::text,
            'SELECT * FROM data_of(NULL::db1_test, ''id=2'')'::text)
     AS t1(id integer, txt text);
Conclusion
This call works, but as you can see, I need to specify at least once what all the columns of the table I want to select look like. I am looking for a way to bypass this and make the call WITHOUT knowing all of the columns of the table on database 1.
Final goal
My final goal is to create a function in database 2 which looks like
SELECT * from data_of_dblink('table_name','where_part')
and which internally calls data_of() on database 1, making it possible to select from a table on a different database with a WHERE part as parameter. It should work like a static view, but with the possibility to pass a WHERE part as parameter.
I am extremely open to suggestions.
Thanks a lot
Brian
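For what it's worth, a partial sketch of such a data_of_dblink(): dblink cannot return a truly polymorphic row type, so a row type matching the remote table still has to exist locally once (e.g. a column-identical shell table or composite type named db1_test). Given that, the per-call column definition list can be derived from the local catalog instead of being typed out on every call:
CREATE OR REPLACE FUNCTION data_of_dblink(_table_name anyelement, _where_part text)
RETURNS SETOF anyelement AS
$func$
DECLARE
   _cols text;
BEGIN
   -- build the column definition list from the local shell type's catalog entry
   SELECT string_agg(format('%I %s', a.attname, format_type(a.atttypid, a.atttypmod))
                   , ', ' ORDER BY a.attnum)
   INTO _cols
   FROM pg_attribute a
   JOIN pg_type t ON t.typrelid = a.attrelid
   WHERE t.oid = pg_typeof(_table_name)
     AND a.attnum > 0
     AND NOT a.attisdropped;

   RETURN QUERY EXECUTE format(
      'SELECT * FROM dblink(%L, %L) AS t(%s)'
    , 'dbname=[database1] port=[port] user=[user] password=[password]'
    , format('SELECT * FROM data_of(NULL::%s, %L)', pg_typeof(_table_name), _where_part)
    , _cols);
END
$func$ LANGUAGE plpgsql;

-- call, with the column list now derived automatically:
-- SELECT * FROM data_of_dblink(NULL::db1_test, 'id=2');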

Save stored procedure output to new table without repeating table type

I want to call an existing procedure and store its table-typed OUT parameters to new physical tables, without having to repeat the definitions of the output types when creating the new tables. For example, if the procedure were
CREATE PROCEDURE MYPROC
(IN X INTEGER, OUT Y TABLE(A INTEGER, B DOUBLE, C NVARCHAR(25)))
LANGUAGE SQLSCRIPT AS BEGIN
...
END;
I would want to create a physical table for the output without repeating the (A INTEGER, B DOUBLE, C NVARCHAR(25)) part.
If I already had a table with the structure I want my result to have, I could CREATE TABLE MY_OUTPUT LIKE EXISTING_TABLE, but I don't.
If I already had a named type defined for the procedure's output type, I could create my table based on that type, but I don't.
If it were a subquery instead of a procedure output parameter, I could CREATE TABLE MY_OUTPUT AS (<subquery>), but it's not a subquery, and I don't know how to express it as a subquery. Also, there could be multiple output parameters, and I don't know how you'd make this work with multiple output parameters.
In my specific case, the functions come from the SAP HANA Predictive Analysis Library, so I don't have the option of changing how the functions are defined. Additionally, I suspect that PAL's unusually flexible handling of parameter types might prevent me from using solutions that would work for ordinary SQLScript procedures, but I'm still interested in solutions that would work for regular procedures, even if they fail on PAL.
Is there a way to do this?
It's possible, with limitations, to do this by using a SQLScript anonymous block:
DO BEGIN
CALL MYPROC(5, Y);
CREATE TABLE BLAH AS (SELECT * FROM :Y);
END;
We store the output to a table variable in the anonymous block, then create a physical table with data taken from the table variable. This even works with PAL! It's a lot of typing, though.
The limitation I've found is that the body of an anonymous block can't refer to local temporary tables created outside the anonymous block, so it's awkward to pass local temporary tables to the procedure this way. It's possible to do it anyway by passing the local temporary table as a parameter to the anonymous block itself, but that requires writing out the type of the local temporary table, and we were trying to avoid writing table types manually.
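For illustration, that workaround might look like this (a sketch, not tested against PAL; #MY_INPUT is a hypothetical local temporary table, MYPROC2 a hypothetical procedure with a table-typed IN parameter, and the table type of the block parameter still has to be written out once):
DO (IN im_input TABLE (A INTEGER, B DOUBLE, C NVARCHAR(25)) => #MY_INPUT)
BEGIN
    CALL MYPROC2(:im_input, Y);
    CREATE TABLE BLAH AS (SELECT * FROM :Y);
END;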
As far as I understand, you want to use your database tables as output parameter types.
In my default schema, I have a database table named CITY
I can create a stored procedure as follows using the table as output parameter type
CREATE PROCEDURE MyCityList (
OUT CITYLIST CITY
)
LANGUAGE SQLSCRIPT
AS
BEGIN
CITYLIST = SELECT * FROM CITY;
END;
After the procedure is created, you can execute it as follows:
do
begin
declare myList CITY;
call MyCityList(:myList);
select * from :myList;
end;
Here is the result, where the output data has the format of a database table, namely the CITY table.
I hope this answers your question.
Update after first comment
If the scenario is the opposite, as mentioned in the first comment, you can query the system view PROCEDURE_PARAMETER_COLUMNS and create dynamic SQL statements that generate tables matching the definitions of the procedure's table-type parameters.
Here is the SQL query
select
    parameter_name,
    'CREATE Column Table ' || procedure_name || '_' || parameter_name || ' ( ' ||
    string_agg(
        column_name || ' ' || data_type_name ||
        case when data_type_name = 'INTEGER' then ''
             else '(' || length || ')'
        end
        , ','
    ) || ' );'
from PROCEDURE_PARAMETER_COLUMNS
where schema_name = 'A00077387'
group by procedure_name, parameter_name;
You need to replace the WHERE clause according to your case.
Each line will have such an output
CREATE Column Table LISTCITIESBYCOUNTRYID_CITYLIST ( CITYID INTEGER,NAME NVARCHAR(40) );
The table name is the concatenation of the procedure name and the parameter name.
One last note: some data types (INTEGER, DECIMAL, etc.) require special handling, such as excluding the length or adding the scale; not all of them are handled in this SQL.
I'll try to enhance the query soon and publish an update.
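In the meantime, the CASE expression could be extended along these lines (a sketch; it assumes PROCEDURE_PARAMETER_COLUMNS exposes LENGTH and SCALE, and the type lists are not exhaustive):
case
    when data_type_name in ('INTEGER', 'BIGINT', 'SMALLINT', 'TINYINT',
                            'DOUBLE', 'DATE', 'TIMESTAMP', 'SECONDDATE') then ''
    when data_type_name = 'DECIMAL' then '(' || length || ',' || scale || ')'
    else '(' || length || ')'
end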

Update multiple columns in a trigger function in plpgsql

Given the following schema:
create table account_type_a (
id SERIAL UNIQUE PRIMARY KEY,
some_column VARCHAR
);
create table account_type_b (
id SERIAL UNIQUE PRIMARY KEY,
some_other_column VARCHAR
);
create view account_type_a_view AS select * from account_type_a;
create view account_type_b_view AS select * from account_type_b;
I try to create a generic trigger function in plpgsql, which enables updating the view:
create trigger trUpdate instead of UPDATE on account_type_a_view
for each row execute procedure updateAccount();

create trigger trUpdate instead of UPDATE on account_type_b_view
for each row execute procedure updateAccount();
An unsuccessful effort of mine was:
create function updateAccount() returns trigger as $$
declare
target_table varchar := substring(TG_TABLE_NAME from '(.+)_view');
cols varchar;
begin
execute 'select string_agg(column_name,$1) from information_schema.columns
where table_name = $2' using ',', target_table into cols;
execute 'update ' || target_table || ' set (' || cols || ') = select ($1).*
where id = ($1).id' using NEW;
return NULL;
end;
$$ language plpgsql;
The problem is the update statement. I am unable to come up with a syntax that would work here. I have successfully implemented this in PL/Perl, but would be interested in a plpgsql-only solution.
Any ideas?
Update
As @Erwin Brandstetter suggested, here is the code for my PL/Perl solution. I incorporated some of his suggestions.
create function f_tr_up() returns trigger as $$
use strict;
use warnings;
my $target_table = quote_ident($_TD->{'table_name'}) =~ s/^([\w]+)_view$/$1/r;
my $NEW = $_TD->{'new'};
my $cols = join(',', map { quote_ident($_) } keys %$NEW);
my $vals = join(',', map { quote_literal($_) } values %$NEW);
my $query = sprintf(
"update %s set (%s) = (%s) where id = %d",
$target_table,
$cols,
$vals,
$NEW->{'id'});
spi_exec_query($query);
return;
$$ language plperl;
While @Gary's answer is technically correct, it fails to mention that PostgreSQL does support this form:
UPDATE tbl
SET (col1, col2, ...) = (expression1, expression2, ..)
Read the manual on UPDATE.
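For reference, a minimal self-contained example of that form (made-up table and values):
CREATE TABLE t (id int PRIMARY KEY, col1 text, col2 text);
INSERT INTO t VALUES (1, 'old1', 'old2');

UPDATE t
SET    (col1, col2) = ('new1', 'new2')    -- row constructor
WHERE  id = 1;

UPDATE t
SET    (col1, col2) = (SELECT 'a', 'b')   -- sub-SELECT (PostgreSQL 9.5+)
WHERE  id = 1;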
It's still tricky to get this done with dynamic SQL. I'll assume a simple case where views consist of the same columns as their underlying tables.
CREATE VIEW tbl_view AS SELECT * FROM tbl;
Problems
The special record NEW is not visible inside EXECUTE. I pass NEW as a single parameter with the USING clause of EXECUTE.
As discussed, UPDATE with list-form needs individual values. I use a subselect to split the record into individual columns:
UPDATE ...
FROM (SELECT ($1).*) x
(Parentheses around $1 are not optional.) This allows me to simply use two column lists built with string_agg() from the catalog table: one with and one without table qualification.
It's not possible to assign a row value as a whole to individual columns. The manual:

"According to the standard, the source value for a parenthesized sub-list of target column names can be any row-valued expression yielding the correct number of columns. PostgreSQL only allows the source value to be a row constructor or a sub-SELECT."
INSERT is simpler to implement. If the structure of view and table are identical, we can omit the column definition list. (Can be improved, see below.)
Solution
I made a couple of updates to your approach to make it shine.
Trigger function for UPDATE:
CREATE OR REPLACE FUNCTION f_trg_up()
RETURNS TRIGGER
LANGUAGE plpgsql AS
$func$
DECLARE
_tbl regclass := quote_ident(TG_TABLE_SCHEMA) || '.'
|| quote_ident(substring(TG_TABLE_NAME from '(.+)_view$'));
_cols text;
_vals text;
BEGIN
SELECT INTO _cols, _vals
string_agg(quote_ident(attname), ', ')
, string_agg('x.' || quote_ident(attname), ', ')
FROM pg_attribute
WHERE attrelid = _tbl
AND NOT attisdropped -- no dropped (dead) columns
AND attnum > 0; -- no system columns
EXECUTE format('
UPDATE %s
SET (%s) = (%s)
FROM (SELECT ($1).*) x', _tbl, _cols, _vals)
USING NEW;
RETURN NEW; -- Don't return NULL unless you know what you're doing
END
$func$;
Trigger function for INSERT:
CREATE OR REPLACE FUNCTION f_trg_ins()
RETURNS TRIGGER
LANGUAGE plpgsql AS
$func$
DECLARE
_tbl regclass := quote_ident(TG_TABLE_SCHEMA) || '.'
|| quote_ident(substring(TG_TABLE_NAME FROM '(.+)_view$'));
BEGIN
EXECUTE format('INSERT INTO %s SELECT ($1).*', _tbl)
USING NEW;
RETURN NEW; -- Don't return NULL unless you know what you're doing
END
$func$;
Triggers:
CREATE TRIGGER trg_instead_up
INSTEAD OF UPDATE ON a_view
FOR EACH ROW EXECUTE FUNCTION f_trg_up();
CREATE TRIGGER trg_instead_ins
INSTEAD OF INSERT ON a_view
FOR EACH ROW EXECUTE FUNCTION f_trg_ins();
Before Postgres 11 the syntax (oddly) was EXECUTE PROCEDURE instead of EXECUTE FUNCTION - which also still works.
Major points
Include the schema name to make the table reference unambiguous. There can be multiple tables of the same name in one database across multiple schemas!
Query pg_catalog.pg_attribute instead of information_schema.columns. Less portable, but much faster, and it allows using the table OID.
See also: How to check if a table exists in a given schema
Table names are NOT safe against SQL injection when concatenated as strings for dynamic SQL. Escape with quote_ident() or format() or with an object-identifier type. This includes the special trigger function variables TG_TABLE_SCHEMA and TG_TABLE_NAME!
Cast to the object identifier type regclass to assert the table name is valid and get the OID for the catalog look-up.
Optionally use format() to build the dynamic query string safely.
No need for dynamic SQL for the first query on the catalog tables. Faster, simpler.
Use RETURN NEW instead of RETURN NULL in these trigger functions unless you know what you are doing. (NULL would cancel the INSERT for the current row.)
This simple version assumes that every table (and view) has a unique column named id. A more sophisticated version might look up the primary key dynamically (see the catalog-query sketch after these points).
The function for UPDATE allows the columns of view and table to be in any order, as long as the set is the same.
The function for INSERT expects the columns of view and table to be in identical order. If you want to allow arbitrary order, add a column definition list to the INSERT command, just like with UPDATE.
Updated version also covers changes to the id column by using OLD additionally.
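The catalog-query sketch mentioned above for finding the primary key column(s) instead of assuming id (my_table is a placeholder):
SELECT string_agg(quote_ident(a.attname), ', ')
FROM   pg_index i
JOIN   pg_attribute a ON a.attrelid = i.indrelid
                     AND a.attnum = ANY(i.indkey)
WHERE  i.indrelid = 'my_table'::regclass
AND    i.indisprimary;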
Postgresql doesn't support updating multiple columns using the set (col1,col2) = select val1,val2 syntax.
To achieve the same in postgresql you'd use
update target_table
set col1 = d.val1,
col2 = d.val2
from source_table d
where d.id = target_table.id
This is going to make the dynamic query a bit more complex to build as you'll need to iterate the column name list you're using into individual fields. I'd suggest you use array_agg instead of string_agg as an array is easier to process than splitting the string again.
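A sketch of building such an assignment list dynamically (shown here with string_agg and format() for brevity; target_table is a placeholder, and d matches the FROM alias above):
SELECT string_agg(format('%I = d.%I', column_name, column_name), ', ')
FROM   information_schema.columns
WHERE  table_name = 'target_table';

-- yields e.g.:  col1 = d.val1 ... in the form  col1 = d.col1, col2 = d.col2
-- which can be spliced into the UPDATE ... FROM ... statement above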
See the PostgreSQL documentation on UPDATE syntax and on the array_agg function.

Find out which schema based on table values

My database is separated into schemas based on clients (i.e. each client has their own schema, with the same data structure).
I also happen to have an external action that does not know which schema it should target. It comes from another part of the system that has no concept of clients and does not know which client's data set it is operating in. Before I process it, I have to find out which schema the request needs to target.
To find the right schema, I have to find out which one holds the record R with a particular unique ID (a string).
From my understanding, the following
SET search_path TO schema1,schema2,schema3,...
will only find the table in the first schema of the list that contains it, and will not do a global search.
Is there a way for me to do a global search across all schemas, or am I just going to have to use a for loop and iterate through all of them, one at a time?
You could use inheritance for this. (Be sure to consider the limitations.)
Consider this little demo:
CREATE SCHEMA master; -- no access of others ..
CREATE SEQUENCE master.myseq; -- global sequence for globally unique ids
CREATE table master.tbl (
id int primary key DEFAULT nextval('master.myseq')
, foo text);
CREATE SCHEMA x;
CREATE table x.tbl() INHERITS (master.tbl);
INSERT INTO x.tbl(foo) VALUES ('x');
CREATE SCHEMA y;
CREATE table y.tbl() INHERITS (master.tbl);
INSERT INTO y.tbl(foo) VALUES ('y');
SELECT * FROM x.tbl; -- returns 'x'
SELECT * FROM y.tbl; -- returns 'y'
SELECT * FROM master.tbl; -- returns 'x' and 'y' <-- !!
Now, to actually identify the table a particular row lives in, use the tableoid:
SELECT *, tableoid::regclass AS table_name
FROM master.tbl
WHERE id = 2;
Result:
 id | foo | table_name
----+-----+------------
  2 | y   | y.tbl
You can derive the source schema from the tableoid, best by querying the system catalogs with the tableoid directly. (The displayed name depends on the setting of search_path.)
SELECT n.nspname
FROM master.tbl t
JOIN pg_class c ON c.oid = t.tableoid
JOIN pg_namespace n ON c.relnamespace = n.oid
WHERE t.id = 2;
This is also much faster than looping through many separate tables.
You will have to iterate over all namespaces. You can get a lot of this information from the pg_* system catalogs. In theory, you should be able to resolve the client -> schema mapping at request time without talking to the database so that the first SQL call you make is:
SET search_path = client1,global_schema;
While I think Erwin's solution is probably preferable if you can re-structure your tables, an alternative that doesn't require any schema changes is to write a PL/PgSQL function that scans the tables using dynamic SQL based on the system catalog information.
Given:
CREATE SCHEMA a;
CREATE SCHEMA b;
CREATE TABLE a.testtab ( searchval text );
CREATE TABLE b.testtab (LIKE a.testtab);
INSERT INTO a.testtab(searchval) VALUES ('ham');
INSERT INTO b.testtab(searchval) VALUES ('eggs');
The following PL/PgSQL function searches all schemas containing tables named _tabname for values in _colname equal to _value and returns the first matching schema.
CREATE OR REPLACE FUNCTION find_schema_for_value(_tabname text, _colname text, _value text) RETURNS text AS $$
DECLARE
cur_schema text;
foundval integer;
BEGIN
FOR cur_schema IN
SELECT nspname
FROM pg_class c
INNER JOIN pg_namespace n ON (c.relnamespace = n.oid)
WHERE c.relname = _tabname AND c.relkind = 'r'
LOOP
EXECUTE
format('SELECT 1 FROM %I.%I WHERE %I = $1',
cur_schema, _tabname, _colname
) INTO foundval USING _value;
IF foundval = 1 THEN
RETURN cur_schema;
END IF;
END LOOP;
RETURN NULL;
END;
$$ LANGUAGE plpgsql;
If there are no matches then NULL is returned. If there are multiple matches, the result will be one of them, but no guarantee is made about which one; add an ORDER BY clause to the schema query if you want to return, say, the first in alphabetical order. The function is also trivially modified to return setof text and RETURN NEXT cur_schema if you want to return all the matches (see the sketch after the sample output below).
regress=# SELECT find_schema_for_value('testtab','searchval','ham');
find_schema_for_value
-----------------------
a
(1 row)
regress=# SELECT find_schema_for_value('testtab','searchval','eggs');
find_schema_for_value
-----------------------
b
(1 row)
regress=# SELECT find_schema_for_value('testtab','searchval','bones');
find_schema_for_value
-----------------------
(1 row)
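The set-returning variant mentioned above might look like this (a sketch; same loop, but accumulating every match):
CREATE OR REPLACE FUNCTION find_schemas_for_value(_tabname text, _colname text, _value text) RETURNS SETOF text AS $$
DECLARE
    cur_schema text;
    foundval integer;
BEGIN
    FOR cur_schema IN
        SELECT nspname
        FROM pg_class c
        INNER JOIN pg_namespace n ON (c.relnamespace = n.oid)
        WHERE c.relname = _tabname AND c.relkind = 'r'
        ORDER BY nspname   -- deterministic output order
    LOOP
        EXECUTE
            format('SELECT 1 FROM %I.%I WHERE %I = $1',
                cur_schema, _tabname, _colname
            ) INTO foundval USING _value;
        IF foundval = 1 THEN
            RETURN NEXT cur_schema;   -- emit this schema and keep scanning
        END IF;
    END LOOP;
    RETURN;
END;
$$ LANGUAGE plpgsql;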
By the way, you can re-use the table definitions without inheritance if you want, and you really should. Either use a common composite data type:
CREATE TYPE public.testtab AS ( searchval text );
CREATE TABLE a.testtab OF public.testtab;
CREATE TABLE b.testtab OF public.testtab;
in which case they share the same data type but not any data; or via LIKE:
CREATE TABLE public.testtab ( searchval text );
CREATE TABLE a.testtab (LIKE public.testtab);
CREATE TABLE b.testtab (LIKE public.testtab);
in which case they're completely unconnected to each other after creation.