Extract column values from one table and insert with modifications into another - sql

I have created a PL/pgSQL function that accepts two column names, a "relation", and two table names. It finds distinct rows in one table and inserts them in to a temporary table, deletes any row with a null value, and sets all values of one column to relation. I have the first part of the process using this function.
create or replace function alt_edger(s text, v text, relation text, tbl text, tbl_src text)
returns void
language plpgsql as
$func$
begin
raise notice 's: %, v: %, tbl: %, tbl_src: %', s,v,tbl,tbl_src;
execute ('insert into '||tbl||' ("source", "target") select distinct "'||s||'","'||v||'" from '||tbl_src||'');
execute ('DELETE FROM '||tbl||' WHERE "source" IS null or "target" is null');
end
$func$;
It is executed as follows:
-- create a temporary table and execute the function twice
drop table if exists temp_stack;
create temporary table temp_stack("label" text, "source" text, "target" text, "attr" text, "graph" text);
select alt_edger('x_x', 'y_y', ':associated_with', 'temp_stack','pg_check_table' );
select alt_edger('Document Number', 'x_x', ':documents', 'temp_stack','pg_check_table' );
select * from temp_stack;
Note that I didn't use relation, yet. The INSERT shall also assign relation, but I can't figure out how to make that happen to get something like:
label
source
target
attr
graph
:associated_with
638000
ARAS
:associated_with
202000
JASE
:associated_with
638010
JASE
:associated_with
638000
JASE
:associated_with
202100
JASE
:documents
A
638010
:documents
A
202000
:documents
A
202100
:documents
B
638000
:documents
A
638000
:documents
B
124004
:documents
B
202100
My challenges are:
How to integrate relation in the INSERT? When I try to use VALUES and comma separation I get an "error near select".
How to allow strings starting with ":" in relation? I'm anticipating here, the inclusion of the colon has given me challenges in the past.
How can I do this? Or is there a better approach?
Toy data model:
drop table if exists pg_check_table;
create temporary table pg_check_table("Document Number" text, x_x int, y_y text);
insert into pg_check_table values ('A',202000,'JASE'),
('A',202100,'JASE'),
('A',638010,'JASE'),
('A',Null,'JASE'),
('A',Null,'JASE'),
('A',202100,'JASE'),
('A',638000,'JASE'),
('A',202100,'JASE'),
('B',638000,'JASE'),
('B',202100,null),
('B',638000,'JASE'),
('B',null,'ARAS'),
('B',638000,'ARAS'),
('B',null,'ARAS'),
('B',638000,null),
('B',124004,null);
alter table pg_check_table add row_num serial;
select * from pg_check_table;

-- DROP FUNCTION alt_edger(_s text, _v text, _relation text, _tbl text, _tbl_src text)
CREATE OR REPLACE FUNCTION alt_edger(_s text, _v text, _relation text, _tbl text, _tbl_src text, OUT row_count int)
LANGUAGE plpgsql AS
$func$
DECLARE
_sql text := format(
'INSERT INTO pg_temp.%3$I (label, source, target)
SELECT DISTINCT $1, %1$I, %2$I FROM pg_temp.%4$I
WHERE (%1$I, %2$I) IS NOT NULL'
, _s, _v, _tbl, _tbl_src);
BEGIN
-- RAISE NOTICE '%', _sql; -- debug
EXECUTE _sql USING _relation;
GET DIAGNOSTICS row_count = ROW_COUNT; -- return number of inserted rows
END
$func$;
db<>fiddle here
Most importantly, use format() to concatenate your dynamic SQL commands safely. And use the format specifier %I for identifiers. This way, SQL injection is not possible and identifiers are double-quoted properly - preserving non-standard names like Document Number. That's where your original failed.
We could concatenate _relation as string to be inserted into label, too. But the preferable way to pass values to EXECUTE is with the USING clause. $1 inside the SQL string passed to EXECUTE is a placeholder for the first USING argument. Not to be confused with $1 referencing function parameters in the context of the function body outside EXECUTE! (You can pass any string, leading colon (:) does not matter, the string is not interpreted when done right.)
See:
Format specifier for integer variables in format() for EXECUTE?
Table name as a PostgreSQL function parameter
I replaced the DELETE in your original with a WHERE clause to the SELECT of the INSERT. Don't insert rows in the first place, instead of deleting them again later.
(%1$I, %2$I) IS NOT NULL only qualifies when both values are NOT NULL.
About that:
Check if a Postgres composite field is null/empty
Don't use the prefix "pg_" for your table names. That's what Postgres uses for system tables. Don't mess with those.
I schema-qualify known temporary tables with pg_temp. That's typically optional as the temporary schema comes first in the search_path by default. But that can be changed (maliciously), and then the table name would resolve to any existing regular table of the same name in the search_path. So better safe than sorry. See:
How does the search_path influence identifier resolution and the "current schema"
I made the function return the number of inserted rows. That's totally optional!
Since I do that with an OUT parameter, I am allowed to skip the RETURNS clause. See:
Can I make a plpgsql function return an integer without using a variable?

Related

SQL - omit repeating the table name

Let's say I want to create a new table from an existing table in SQL (postgres). I want the new table to have the same name as the old table but I want it to be in a different schema.
Is there a way to do this without having to repeat the name of the two tables (who share one name?)
Let's say the name of the original table is public.student
CREATE TABLE student(
student_id INT PRIMARY KEY,
last_name VARCHAR(30),
major VARCHAR(30))
Now I want to have the exact table but I want it to be in test.student
I know I would "clone" that table via
CREATE TABLE test.student AS
SELECT *
FROM public.student;
but I would like to write this without having to repeat writing "student".
Is there a way to write a function for this?
I'm quite new to SQL, so I'm thankful for any help! I looked into functions and I wasn't able to make it work.
You could create a procedure (or a function) with dynamic SQL:
CREATE OR REPLACE PROCEDURE foo(_schema text, _table text)
LANGUAGE plpgsql AS
$func$
BEGIN
EXECUTE format('CREATE TABLE %1$I.%2$I AS TABLE public.%2$I'
, _schema, _table);
END
$func$;
Call:
CALL foo('test', 'student');
Note that identifers are case sensitive here!
Be wary of possible SQL injection. format() with the format specifier %I (for identifier) is safe. (nested $1, $2 are ordinal references to format input)
See:
Define table and column names as arguments in a plpgsql function?
Table name as a PostgreSQL function parameter

Using a variable in a Postgres function [duplicate]

It must be simple, but I'm making my first steps into Postgres functions and I can't find anything that works...
I'd like to create a function that will modify a table and / or column and I can't find the right way of specifying my tables and columns as arguments in my function.
Something like:
CREATE OR REPLACE FUNCTION foo(t table)
RETURNS void AS $$
BEGIN
alter table t add column c1 varchar(20);
alter table t add column c2 varchar(20);
alter table t add column c3 varchar(20);
alter table t add column c4 varchar(20);
END;
$$ LANGUAGE PLPGSQL;
select foo(some_table)
In another case, I'd like to have a function that alters a certain column from a certain table:
CREATE OR REPLACE FUNCTION foo(t table, c column)
RETURNS void AS $$
BEGIN
UPDATE t SET c = "This is a test";
END;
$$ LANGUAGE PLPGSQL;
Is it possible to do that?
You must defend against SQL injection whenever you turn user input into code. That includes table and column names coming from system catalogs or from direct user input alike. This way you also prevent trivial exceptions with non-standard identifiers. There are basically three built-in methods:
1. format()
1st query, sanitized:
CREATE OR REPLACE FUNCTION foo(_t text)
RETURNS void
LANGUAGE plpgsql AS
$func$
BEGIN
EXECUTE format('
ALTER TABLE %I ADD COLUMN c1 varchar(20)
, ADD COLUMN c2 varchar(20)', _t);
END
$func$;
format() requires Postgres 9.1 or later. Use it with the %I format specifier.
The table name alone may be ambiguous. You may have to provide the schema name to avoid changing the wrong table by accident. Related:
INSERT with dynamic table name in trigger function
How does the search_path influence identifier resolution and the "current schema"
Aside: adding multiple columns with a single ALTER TABLE command is cheaper.
2. regclass
You can also use a cast to a registered class (regclass) for the special case of existing table names. Optionally schema-qualified. This fails immediately and gracefully for table names that are not be valid and visible to the calling user. The 1st query sanitized with a cast to regclass:
CREATE OR REPLACE FUNCTION foo(_t regclass)
RETURNS void
LANGUAGE plpgsql AS
$func$
BEGIN
EXECUTE 'ALTER TABLE ' || _t || ' ADD COLUMN c1 varchar(20)
, ADD COLUMN c2 varchar(20)';
END
$func$;
Call:
SELECT foo('table_name');
Or:
SELECT foo('my_schema.table_name'::regclass);
Aside: consider using just text instead of varchar(20).
3. quote_ident()
The 2nd query sanitized:
CREATE OR REPLACE FUNCTION foo(_t regclass, _c text)
RETURNS void
LANGUAGE plpgsql AS
$func$
BEGIN
EXECUTE 'UPDATE ' || _t -- sanitized with regclass
|| ' SET ' || quote_ident(_c) || ' = ''This is a test''';
END
$func$;
For multiple concatenations / interpolations, format() is cleaner ...
Related answers:
Table name as a PostgreSQL function parameter
Postgres functions vs prepared queries
Case sensitive!
Be aware that unquoted identifiers are not cast to lower case here. When used as identifier in SQL [Postgres casts to lower case automatically][7]. But here we pass strings for dynamic SQL. When escaped as demonstrated, CaMel-case identifiers (like UserS) will be preserved by doublequoting ("UserS"), just like other non-standard names like "name with space" "SELECT"etc. Hence, names are case sensitive in this context.
My standing advice is to use legal lower case identifiers exclusively and never worry about that.
Aside: single quotes are for values, double quotes are for identifiers. See:
Are PostgreSQL column names case-sensitive?

Disambiguating a function with INSERT INTO "table" FROM "table" RETURNING "id" = "id"

I'm having trouble disambiguating this particular postgres function that inserts on a FROM statement of the target table and returns the newly created unique id:
CREATE OR REPLACE FUNCTION "copyEntry"(OUT "entryId" INTEGER, IN "copyEntryId" INTEGER) RETURNING VOID AS $$
BEGIN
INSERT INTO "entries" ("data") SELECT "data" FROM "entries" WHERE "entryId" = "copyEntryId" RETURNING "entryId" INTO "entryId";
--Other plpgsql code below
END;
$$ LANGUAGE 'plpgsql';
"entryId" INTO "entryId" is ambiguous and I can't seem to find a way to alias the insert table to remove ambiguity. I would like to keep the output parameter to "entryId"
The ambiguity is between the variable and the column. There are two ways of disambiguating variable names: prefixing them based on table and code block names, or renaming them to be unique.
In this case, the parameters are declared in the outermost code block of the function, which is named after the function. So the "entryId" parameter can be referenced as "copyEntry"."entryId". Meanwhile, the column is from the table entries, so can be referenced as entries."entryId".
It may however be more readable to name your variables so that they aren't ambiguous in the first place, perhaps using a prefixing convention, so that your parameter would be "out_entryId".
Name the parameters to the function so you can distinguish them from columns in the table. This is a good programming practice.
Something like this:
CREATE OR REPLACE FUNCTION "copyEntry"(OUT "out_entryId" INTEGER, IN "in_copyEntryId" ) RETURNING VOID AS $$
BEGIN
INSERT INTO "entries"
SELECT "data"
FROM "entries" e
WHERE "entryId" = "in_copyentryid"
RETURNING "entryId" ;
--Other plpgsql code below
END;
The RETURNING clause should return the value from the row being inserted -- which is presumably some sort of default or auto-incremented value. As the documentation describes:
The optional RETURNING clause causes INSERT to compute and return
value(s) based on each row actually inserted. This is primarily useful
for obtaining values that were supplied by defaults, such as a serial
sequence number. However, any expression using the table's columns is
allowed. The syntax of the RETURNING list is identical to that of the
output list of SELECT.
The problem with your code might be the issue that the parameter has the same name as the function.

Delete row if type cast fails

Ok, here is the layout:
I have a bunch of uuid data that is in varchar format. I know uuid is its own type. This is how I got the data. So to verify this which ones are uuid, I take the uuid in type varchar and insert it into a table where the column is uuid. If the insert fails, then it is not a uuid type. My basic question is how to delete the bad uuid if the insert fails. Or, how do I delete out of one table if an insert fails in another table.
My first set of data:
drop table if exists temp1;
drop sequence if exists temp1_id_seq;
CREATE temp table temp1 (id serial, some_value varchar);
INSERT INTO temp1(some_value)
SELECT split_part(name,':',2) FROM branding_resource WHERE name LIKE '%curric%';
create temp table temp2 (id serial, other_value uuid);
CREATE OR REPLACE function verify_uuid() returns varchar AS $$
DECLARE uu RECORD;
BEGIN
FOR uu IN select * from temp1
LOOP
EXECUTE 'INSERT INTO temp2 values ('||uu.id||','''|| uu.some_value||''')';
END LOOP;
END;
$$
LANGUAGE 'plpgsql' ;
select verify_uuid();
When I run this, I get the error
ERROR: invalid input syntax for uuid:
which is what I expect. There are some bad uuids in my data set.
My research led me to Trapping Errors - Exceptions with UPDATE/INSERT in the docs.
Narrowing down to the important part:
BEGIN
FOR uu IN select * from temp1
LOOP
begin
EXECUTE 'INSERT INTO temp2 values ('||uu.id||','''|| uu.some_value||''')';
return;
exception when ??? then delete from temp1 where some_value = uu.some_value;
end;
END LOOP;
END;
I do not know what to put instead of ???. I think it relates to the ERROR: invalid input syntax for uuid:, but I am not sure. I am actually not even sure if this is the right way to go about this?
You can get the SQLSTATE code from psql using VERBOSE mode, e.g:
regress=> \set VERBOSITY verbose
regress=> SELECT 'fred'::uuid;
ERROR: 22P02: invalid input syntax for uuid: "fred"
LINE 1: SELECT 'fred'::uuid;
^
LOCATION: string_to_uuid, uuid.c:129
Here we can see that the SQLSTATE is 22P02. You can use that directly in the exception clause, but it's generally more readable to look it up in the manual to find the text representation. Here, we see that 22P02 is invalid_text_representation.
So you can write exception when invalid_text_representation then ...
#Craig shows a way to identify the SQLSTATE.
You an also use pgAdmin, which shows the SQLSTATE by default:
SELECT some_value::uuid FROM temp1
> ERROR: invalid input syntax for uuid: "-a0eebc999c0b4ef8bb6d6bb9bd380a11"
> SQL state: 22P02
I am going to address the bigger question:
I am actually not even sure if this is the right way to go about this?
Your basic approach is the right way: the 'parking in new york' method (quoting Merlin Moncure in this thread on pgsql-general). But the procedure is needlessly expensive. Probably much faster:
Exclude obviously violating strings immediately.
You should be able to weed out the lion's share of violating strings with a much cheaper regexp test.
Postgres accepts a couple of different formats for UUID in text representation, but as far as I can tell, this character class should covers all valid characters:
'[^A-Fa-f0-9{}-]'
You can probably narrow it down further for your particular brand of UUID representation (Only lower case? No curly braces? No hyphen?).
CREATE TEMP TABLE temp1 (id serial, some_value text);
INSERT INTO temp1 (some_value)
SELECT split_part(name,':',2)
FROM branding_resource
WHERE name LIKE '%curric%'
AND split_part(name,':',2) !~ '[^A-Fa-f0-9{}-]';
"Does not contain illegal characters."
Cast to test the rest
Instead of filling another table, it should be much cheaper to just delete (the now few!) violating rows:
CREATE OR REPLACE function f_kill_bad_uuid()
RETURNS void AS
$func$
DECLARE
rec record;
BEGIN
FOR rec IN
SELECT * FROM temp1
LOOP
BEGIN
PERFORM rec.some_value::uuid; -- no dynamic SQL needed
-- do not RETURN! Keep looping.
RAISE NOTICE 'Good: %', rec.some_value; -- only for demo
EXCEPTION WHEN invalid_text_representation THEN
RAISE NOTICE 'Bad: %', rec.some_value; -- only for demo
DELETE FROM temp1 WHERE some_value = rec.some_value;
END;
END LOOP;
END
$func$ LANGUAGE plpgsql;
No need for dynamic SQL. Just cast. Use PERFORM, since we are not interested in the result. We just want to see if the cast goes through or not.
Not return value. You could count and return the number of excluded rows ...
For a one-time operation you could also use a DO statement.
And do not quote the language name 'plpgsql'. It's an identifier, not a string.
SQL Fiddle.

Update multiple columns in a trigger function in plpgsql

Given the following schema:
create table account_type_a (
id SERIAL UNIQUE PRIMARY KEY,
some_column VARCHAR
);
create table account_type_b (
id SERIAL UNIQUE PRIMARY KEY,
some_other_column VARCHAR
);
create view account_type_a view AS select * from account_type_a;
create view account_type_b view AS select * from account_type_b;
I try to create a generic trigger function in plpgsql, which enables updating the view:
create trigger trUpdate instead of UPDATE on account_view_type_a
for each row execute procedure updateAccount();
create trigger trUpdate instead of UPDATE on account_view_type_a
for each row execute procedure updateAccount();
An unsuccessful effort of mine was:
create function updateAccount() returns trigger as $$
declare
target_table varchar := substring(TG_TABLE_NAME from '(.+)_view');
cols varchar;
begin
execute 'select string_agg(column_name,$1) from information_schema.columns
where table_name = $2' using ',', target_table into cols;
execute 'update ' || target_table || ' set (' || cols || ') = select ($1).*
where id = ($1).id' using NEW;
return NULL;
end;
$$ language plpgsql;
The problem is the update statement. I am unable to come up with a syntax that would work here. I have successfully implemented this in PL/Perl, but would be interested in a plpgsql-only solution.
Any ideas?
Update
As #Erwin Brandstetter suggested, here is the code for my PL/Perl solution. I incoporated some of his suggestions.
create function f_tr_up() returns trigger as $$
use strict;
use warnings;
my $target_table = quote_ident($_TD->{'table_name'}) =~ s/^([\w]+)_view$/$1/r;
my $NEW = $_TD->{'new'};
my $cols = join(',', map { quote_ident($_) } keys $NEW);
my $vals = join(',', map { quote_literal($_) } values $NEW);
my $query = sprintf(
"update %s set (%s) = (%s) where id = %d",
$target_table,
$cols,
$vals,
$NEW->{'id'});
spi_exec_query($query);
return;
$$ language plperl;
While #Gary's answer is technically correct, it fails to mention that PostgreSQL does support this form:
UPDATE tbl
SET (col1, col2, ...) = (expression1, expression2, ..)
Read the manual on UPDATE.
It's still tricky to get this done with dynamic SQL. I'll assume a simple case where views consist of the same columns as their underlying tables.
CREATE VIEW tbl_view AS SELECT * FROM tbl;
Problems
The special record NEW is not visible inside EXECUTE. I pass NEW as a single parameter with the USING clause of EXECUTE.
As discussed, UPDATE with list-form needs individual values. I use a subselect to split the record into individual columns:
UPDATE ...
FROM (SELECT ($1).*) x
(Parenthesis around $1 are not optional.) This allows me to simply use two column lists built with string_agg() from the catalog table: one with and one without table qualification.
It's not possible to assign a row value as a whole to individual columns. The manual:
According to the standard, the source value for a parenthesized
sub-list of target column names can be any row-valued expression
yielding the correct number of columns. PostgreSQL only allows the
source value to be a row constructor or a sub-SELECT.
INSERT is implemented simpler. If the structure of view and table are identical we can omit the column definition list. (Can be improved, see below.)
Solution
I made a couple of updates to your approach to make it shine.
Trigger function for UPDATE:
CREATE OR REPLACE FUNCTION f_trg_up()
RETURNS TRIGGER
LANGUAGE plpgsql AS
$func$
DECLARE
_tbl regclass := quote_ident(TG_TABLE_SCHEMA) || '.'
|| quote_ident(substring(TG_TABLE_NAME from '(.+)_view$'));
_cols text;
_vals text;
BEGIN
SELECT INTO _cols, _vals
string_agg(quote_ident(attname), ', ')
, string_agg('x.' || quote_ident(attname), ', ')
FROM pg_attribute
WHERE attrelid = _tbl
AND NOT attisdropped -- no dropped (dead) columns
AND attnum > 0; -- no system columns
EXECUTE format('
UPDATE %s
SET (%s) = (%s)
FROM (SELECT ($1).*) x', _tbl, _cols, _vals)
USING NEW;
RETURN NEW; -- Don't return NULL unless you knwo what you're doing
END
$func$;
Trigger function for INSERT:
CREATE OR REPLACE FUNCTION f_trg_ins()
RETURNS TRIGGER
LANGUAGE plpgsql AS
$func$
DECLARE
_tbl regclass := quote_ident(TG_TABLE_SCHEMA) || '.'
|| quote_ident(substring(TG_TABLE_NAME FROM '(.+)_view$'));
BEGIN
EXECUTE format('INSERT INTO %s SELECT ($1).*', _tbl)
USING NEW;
RETURN NEW; -- Don't return NULL unless you know what you're doing
END
$func$;
Triggers:
CREATE TRIGGER trg_instead_up
INSTEAD OF UPDATE ON a_view
FOR EACH ROW EXECUTE FUNCTION f_trg_up();
CREATE TRIGGER trg_instead_ins
INSTEAD OF INSERT ON a_view
FOR EACH ROW EXECUTE FUNCTION f_trg_ins();
Before Postgres 11 the syntax (oddly) was EXECUTE PROCEDURE instead of EXECUTE FUNCTION - which also still works.
db<>fiddle here - demonstrating INSERT and UPDATE
Old sqlfiddle
Major points
Include the schema name to make the table reference unambiguous. There can be multiple table of the same name in one database with multiple schemas!
Query pg_catalog.pg_attribute instead of information_schema.columns. Less portable, but much faster and allows to use the table-OID.
How to check if a table exists in a given schema
Table names are NOT safe against SQLi when concatenated as strings for dynamic SQL. Escape with quote_ident() or format() or with an object-identifer type. This includes the special trigger function variables TG_TABLE_SCHEMA and TG_TABLE_NAME!
Cast to the object identifier type regclass to assert the table name is valid and get the OID for the catalog look-up.
Optionally use format() to build the dynamic query string safely.
No need for dynamic SQL for the first query on the catalog tables. Faster, simpler.
Use RETURN NEW instead of RETURN NULL in these trigger functions unless you know what you are doing. (NULL would cancel the INSERT for the current row.)
This simple version assumes that every table (and view) has a unique column named id. A more sophisticated version might use the primary key dynamically.
The function for UPDATE allows the columns of view and table to be in any order, as long as the set is the same.
The function for INSERT expects the columns of view and table to be in identical order. If you want to allow arbitrary order, add a column definition list to the INSERT command, just like with UPDATE.
Updated version also covers changes to the id column by using OLD additionally.
Postgresql doesn't support updating multiple columns using the set (col1,col2) = select val1,val2 syntax.
To achieve the same in postgresql you'd use
update target_table
set col1 = d.val1,
col2 = d.val2
from source_table d
where d.id = target_table.id
This is going to make the dynamic query a bit more complex to build as you'll need to iterate the column name list you're using into individual fields. I'd suggest you use array_agg instead of string_agg as an array is easier to process than splitting the string again.
Postgresql UPDATE syntax
documentation on array_agg function