My table schema:
CREATE TABLE project_sectors(
sector_id int GENERATED ALWAYS AS IDENTITY,
sector_name varchar(256),
project_count int,
PRIMARY KEY (sector_id)
);
And I am trying to insert a row for every table that has a particular column name:
DO $$
DECLARE
t text;
BEGIN
FOR t IN
SELECT table_name FROM information_schema.columns WHERE column_name = 'project_name'
LOOP
RAISE NOTICE 'INSERT METADATA FOR: %', t;
EXECUTE 'INSERT INTO project_sectors VALUES ($1, 0)'
USING t;
end loop;
end
$$ language 'plpgsql';
Once I try to run the query I get:
[42804] ERROR: column "sector_id" is of type integer but expression is of type text Hint: You will need to rewrite or cast the expression. Where: PL/pgSQL function inline_code_block line 9 at EXECUTE
Previously, when the EXECUTE statement was
EXECUTE format('INSERT INTO megaproject_sectors VALUES (''%I'', 0)', t)
I would get the error
ERROR: invalid input syntax for type integer: "railway"
railway is the value of t.
Why is it trying to insert data into the GENERATED ALWAYS column?
Because you are not specifying the target columns in your INSERT statement, Postgres assigns the values to the columns from left to right, so your first value ends up in sector_id.
It is good coding practice to always specify the target columns. As your table name is hardcoded, the dynamic SQL is unnecessary as well:
INSERT INTO project_sectors (sector_name, project_count) VALUES (t, 0);
Note that in other database products, specifying fewer values than the table has columns would result in an error. So in e.g. Oracle your statement would result in "ORA-00947: not enough values".
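If the loop only exists to collect table names, a single set-based INSERT ... SELECT can replace the DO block altogether. This is a sketch, assuming the project_sectors table from the question; railway is a hypothetical demo table with a project_name column. DISTINCT is an addition that collapses table names occurring in more than one schema, and the ::text cast sidesteps the sql_identifier type of information_schema columns.

```sql
-- Table from the question (identity column is generated by Postgres).
CREATE TABLE IF NOT EXISTS project_sectors (
  sector_id     int GENERATED ALWAYS AS IDENTITY,
  sector_name   varchar(256),
  project_count int,
  PRIMARY KEY (sector_id)
);

-- Hypothetical demo table that has the column we are looking for.
CREATE TABLE IF NOT EXISTS railway (project_name text);

-- sector_id is left out of the target list, so Postgres fills it itself.
INSERT INTO project_sectors (sector_name, project_count)
SELECT DISTINCT table_name::text, 0
FROM   information_schema.columns
WHERE  column_name = 'project_name';
```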
I have created a PL/pgSQL function that accepts two column names, a "relation", and two table names. It finds distinct rows in one table and inserts them into a temporary table, deletes any row with a null value, and sets all values of one column to relation. I have the first part of the process working with this function:
create or replace function alt_edger(s text, v text, relation text, tbl text, tbl_src text)
returns void
language plpgsql as
$func$
begin
raise notice 's: %, v: %, tbl: %, tbl_src: %', s,v,tbl,tbl_src;
execute ('insert into '||tbl||' ("source", "target") select distinct "'||s||'","'||v||'" from '||tbl_src||'');
execute ('DELETE FROM '||tbl||' WHERE "source" IS null or "target" is null');
end
$func$;
It is executed as follows:
-- create a temporary table and execute the function twice
drop table if exists temp_stack;
create temporary table temp_stack("label" text, "source" text, "target" text, "attr" text, "graph" text);
select alt_edger('x_x', 'y_y', ':associated_with', 'temp_stack','pg_check_table' );
select alt_edger('Document Number', 'x_x', ':documents', 'temp_stack','pg_check_table' );
select * from temp_stack;
Note that I don't use relation yet. The INSERT should also assign relation, but I can't figure out how to make that happen, to get something like:
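| label            | source | target | attr | graph |
|------------------|--------|--------|------|-------|
| :associated_with | 638000 | ARAS   |      |       |
| :associated_with | 202000 | JASE   |      |       |
| :associated_with | 638010 | JASE   |      |       |
| :associated_with | 638000 | JASE   |      |       |
| :associated_with | 202100 | JASE   |      |       |
| :documents       | A      | 638010 |      |       |
| :documents       | A      | 202000 |      |       |
| :documents       | A      | 202100 |      |       |
| :documents       | B      | 638000 |      |       |
| :documents       | A      | 638000 |      |       |
| :documents       | B      | 124004 |      |       |
| :documents       | B      | 202100 |      |       |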
My challenges are:
How do I integrate relation into the INSERT? When I try to use VALUES with comma separation, I get an "error near select".
How do I allow strings starting with ":" in relation? I'm anticipating trouble here; the colon has given me challenges in the past.
How can I do this? Or is there a better approach?
Toy data model:
drop table if exists pg_check_table;
create temporary table pg_check_table("Document Number" text, x_x int, y_y text);
insert into pg_check_table values ('A',202000,'JASE'),
('A',202100,'JASE'),
('A',638010,'JASE'),
('A',Null,'JASE'),
('A',Null,'JASE'),
('A',202100,'JASE'),
('A',638000,'JASE'),
('A',202100,'JASE'),
('B',638000,'JASE'),
('B',202100,null),
('B',638000,'JASE'),
('B',null,'ARAS'),
('B',638000,'ARAS'),
('B',null,'ARAS'),
('B',638000,null),
('B',124004,null);
alter table pg_check_table add row_num serial;
select * from pg_check_table;
-- DROP FUNCTION alt_edger(_s text, _v text, _relation text, _tbl text, _tbl_src text)
CREATE OR REPLACE FUNCTION alt_edger(_s text, _v text, _relation text, _tbl text, _tbl_src text, OUT row_count int)
LANGUAGE plpgsql AS
$func$
DECLARE
_sql text := format(
'INSERT INTO pg_temp.%3$I (label, source, target)
SELECT DISTINCT $1, %1$I, %2$I FROM pg_temp.%4$I
WHERE (%1$I, %2$I) IS NOT NULL'
, _s, _v, _tbl, _tbl_src);
BEGIN
-- RAISE NOTICE '%', _sql; -- debug
EXECUTE _sql USING _relation;
GET DIAGNOSTICS row_count = ROW_COUNT; -- return number of inserted rows
END
$func$;
Most importantly, use format() to concatenate your dynamic SQL commands safely. And use the format specifier %I for identifiers. This way, SQL injection is not possible and identifiers are double-quoted properly - preserving non-standard names like Document Number. That's where your original failed.
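To see what %I does, the answer's template can be fed to format() directly. This sketch reduces it to its SELECT part and uses the question's names as arguments; $1 is left untouched by format(), since only %-specifiers are substituted.

```sql
-- %I double-quotes only where needed: "Document Number" gets quotes, x_x does not.
SELECT format(
         'SELECT DISTINCT $1, %1$I, %2$I FROM pg_temp.%3$I WHERE (%1$I, %2$I) IS NOT NULL'
       , 'Document Number', 'x_x', 'pg_check_table') AS sql;
-- sql: SELECT DISTINCT $1, "Document Number", x_x FROM pg_temp.pg_check_table WHERE ("Document Number", x_x) IS NOT NULL
```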
We could concatenate _relation as a string to be inserted into label, too. But the preferable way to pass values to EXECUTE is with the USING clause. $1 inside the SQL string passed to EXECUTE is a placeholder for the first USING argument. Not to be confused with $1 referencing function parameters in the context of the function body outside EXECUTE! (You can pass any string; a leading colon (:) does not matter, because a value passed with USING is never interpreted as part of the SQL text.)
See:
Format specifier for integer variables in format() for EXECUTE?
Table name as a PostgreSQL function parameter
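A minimal sketch of passing a colon-prefixed value through USING; the temp table using_demo is hypothetical. The value is bound as data, so the leading colon is stored verbatim and never parsed.

```sql
CREATE TEMP TABLE IF NOT EXISTS using_demo (label text);

DO
$do$
DECLARE
  _relation text := ':associated_with';  -- example value from the question
BEGIN
  -- $1 refers to the first USING argument, not to any function parameter.
  EXECUTE 'INSERT INTO pg_temp.using_demo (label) VALUES ($1)'
  USING _relation;
END
$do$;

SELECT label FROM pg_temp.using_demo;  -- :associated_with
```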
I replaced the DELETE in your original with a WHERE clause on the SELECT of the INSERT. Better not to insert unwanted rows in the first place than to delete them again later.
(%1$I, %2$I) IS NOT NULL only qualifies when both values are NOT NULL.
About that:
Check if a Postgres composite field is null/empty
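The difference between the two row-wise NULL tests can be checked with literal rows, no tables involved. IS NOT NULL on a row requires every field to be non-null, while NOT (row IS NULL) only requires at least one.

```sql
SELECT ROW(1, 'a')  IS NOT NULL    AS both_set,      -- true
       ROW(1, NULL) IS NOT NULL    AS one_null,      -- false
       NOT (ROW(1, NULL) IS NULL)  AS not_all_null;  -- true
```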
Don't use the prefix "pg_" for your table names. That's what Postgres uses for system tables. Don't mess with those.
I schema-qualify known temporary tables with pg_temp. That's typically optional as the temporary schema comes first in the search_path by default. But that can be changed (maliciously), and then the table name would resolve to any existing regular table of the same name in the search_path. So better safe than sorry. See:
How does the search_path influence identifier resolution and the "current schema"
I made the function return the number of inserted rows. That's totally optional!
Since I do that with an OUT parameter, I am allowed to skip the RETURNS clause. See:
Can I make a plpgsql function return an integer without using a variable?
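A stripped-down illustration of the OUT parameter plus GET DIAGNOSTICS pattern; f_demo_insert() and the temp table demo_rows are hypothetical stand-ins for alt_edger() and temp_stack. There is no RETURNS clause, since the OUT parameter defines the result type.

```sql
CREATE OR REPLACE FUNCTION f_demo_insert(OUT row_count int)
  LANGUAGE plpgsql AS
$func$
BEGIN
  CREATE TEMP TABLE IF NOT EXISTS demo_rows (n int);
  INSERT INTO demo_rows SELECT generate_series(1, 3);
  GET DIAGNOSTICS row_count = ROW_COUNT;  -- rows affected by the last command
END
$func$;

SELECT f_demo_insert();  -- 3
```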
I am trying to test my stored procedure in SQL Workbench/J. I get an error when I try to call the stored procedure.
I have created a table to store the result of my stored procedure
CREATE TABLE IF NOT EXISTS ableok
(
name VARCHAR(50) ENCODE lzo
);
This is my stored procedure:
CREATE OR REPLACE PROCEDURE sp_GetDistSchema()
AS '
BEGIN
SELECT table_schema INTO ableok FROM information_schema.tables;
END;
'
LANGUAGE plpgsql;
This is how I call my stored procedure in SQL Workbench/J:
call sp_getdistschema();
Result:
An error occurred when executing the SQL command:
call sp_getdistschema()
[Amazon](500310) Invalid operation: Column "table_schema" has unsupported type "information_schema.sql_identifier".; [SQL State=0A000, DB Errorcode=500310]
1 statement failed.
The SELECT ... INTO structure is used to store a query result into variables. It looks as though you are really just trying to populate the ableok table directly. Try this instead:
Update: When processing the information schema in Redshift/PostgreSQL, you apparently need to convert the column data types using CAST:
CREATE OR REPLACE PROCEDURE sp_GetDistSchema()
AS $$
BEGIN
INSERT INTO ableok SELECT DISTINCT CAST(table_schema AS VARCHAR) FROM information_schema.tables;
END;
$$ LANGUAGE plpgsql;
As @user9601310 mentioned (upvoted), you need to CAST the column data types.
I was scratching my head over this too, even in plain old Postgres, when using the information_schema.
This will 'describe' a table or a view, but won't work unless the query columns are cast to VARCHAR:
CREATE OR REPLACE FUNCTION public.fn_desc(p_tablename VARCHAR)
RETURNS TABLE(vtable_name VARCHAR, vcolumn_name VARCHAR, vdata_type VARCHAR)
LANGUAGE plpgsql
AS $function$
BEGIN
RETURN QUERY
SELECT
table_name::VARCHAR,
column_name::VARCHAR,
data_type::VARCHAR
FROM
information_schema.columns
WHERE
table_name ILIKE p_tablename;
END;
$function$;
SELECT * FROM public.fn_desc('any_table_or_view');
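To see the type that needs casting in plain PostgreSQL, pg_typeof() can be applied to an information_schema column. A sketch; in recent versions sql_identifier is a domain over name, in older ones over character varying, but either way it is not the VARCHAR that RETURNS TABLE declares.

```sql
SELECT pg_typeof(table_name)          AS raw_type,   -- information_schema.sql_identifier
       pg_typeof(table_name::varchar) AS cast_type   -- character varying
FROM   information_schema.tables
LIMIT  1;
```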
I am trying to get the value of a column so that it can be inserted into a table that holds the column name and column value of the inserted row; however, I have not been able to get the value of the column that I need. Normally, I would use value := NEW.column_name, but each table has a unique key column whose name is embedded in the table name itself (I know, that's bad). I already have a way to get the column name I want; getting the NEW value of that column is the problem.
CREATE OR REPLACE FUNCTION trgfn_keyvalue_insert()
RETURNS trigger AS
$BODY$
DECLARE
key_column_value character varying;
key_column_name character varying;
part text;
part_array text[] := array['prefix_','_suffix'];
BEGIN
key_column_name := TG_TABLE_NAME; --parsing the table name to get the desired column name
FOREACH part IN ARRAY part_array LOOP
key_column_name = regexp_replace(key_column_name, part, '');
END loop;
IF TG_OP = 'INSERT' THEN
EXECUTE 'SELECT $1 FROM $2' --This is where I'd like to get the
INTO key_column_value --value of the column
USING key_column_name, NEW;
INSERT INTO inserted_kvp
(table_name, key, value)
VALUES
(TG_TABLE_NAME, key_column_name, key_column_value);
END IF;
RETURN NEW;
END;
$BODY$
LANGUAGE plpgsql VOLATILE;
So, when I INSERT into a table:
CREATE TABLE prefix_kvp1_suffix (id SERIAL, kvp1 CHARACTER VARYING);
CREATE TABLE prefix_kvp2_suffix (id SERIAL, kvp2 CHARACTER VARYING);
INSERT INTO prefix_kvp1_suffix VALUES (1, 'value1');
INSERT INTO prefix_kvp2_suffix VALUES (1, 'value2');
I would like for the inserted_kvp table to have the following:
| table_name         | key  | value  |
|--------------------|------|--------|
| prefix_kvp1_suffix | kvp1 | value1 |
| prefix_kvp2_suffix | kvp2 | value2 |
Instead, I get the following error when inserting:
ERROR: syntax error at or near "$2"
LINE 1: SELECT $1 FROM $2
^
QUERY: SELECT $1 FROM $2
CONTEXT: PL/pgSQL function worldmapkit.trgfn_keyvalue_insert() line 13 at EXECUTE statement
I have tried different variations of getting this value by using EXECUTE format() and a few other ways, but I am still not having any luck. Any help is appreciated!
After much fiddling, I found the answer to my question. The correct syntax for the EXECUTE statement above is:
EXECUTE format('SELECT $1.%I', key_column_name)
INTO key_column_value
USING NEW;
This will get the column value of the NEW record. Hopefully, this will help out someone in a similar situation.
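Putting the answer's EXECUTE into the question's trigger, a self-contained sketch might look like this. It assumes PostgreSQL 11 or later (for EXECUTE FUNCTION), replaces the question's FOREACH loop with a single regexp_replace() call, creates only the first demo table, and guesses the column types of inserted_kvp.

```sql
-- Hypothetical log table; column types are assumptions.
CREATE TABLE IF NOT EXISTS inserted_kvp (table_name text, key text, value text);

CREATE OR REPLACE FUNCTION trgfn_keyvalue_insert()
  RETURNS trigger
  LANGUAGE plpgsql AS
$func$
DECLARE
  -- 'prefix_kvp1_suffix' -> 'kvp1'; assumes the question's naming scheme
  key_column_name  text := regexp_replace(TG_TABLE_NAME, '^prefix_|_suffix$', '', 'g');
  key_column_value text;
BEGIN
  -- %I quotes the column name; $1 is the NEW record passed via USING
  EXECUTE format('SELECT $1.%I', key_column_name)
  INTO  key_column_value
  USING NEW;

  INSERT INTO inserted_kvp (table_name, key, value)
  VALUES (TG_TABLE_NAME, key_column_name, key_column_value);

  RETURN NEW;
END
$func$;

CREATE TABLE IF NOT EXISTS prefix_kvp1_suffix (id serial, kvp1 varchar);

CREATE TRIGGER trg_kvp1_ins
AFTER INSERT ON prefix_kvp1_suffix
FOR EACH ROW EXECUTE FUNCTION trgfn_keyvalue_insert();

INSERT INTO prefix_kvp1_suffix (kvp1) VALUES ('value1');

SELECT * FROM inserted_kvp;  -- prefix_kvp1_suffix | kvp1 | value1
```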
Ok, here is the layout:
I have a bunch of UUID data that is in varchar format. I know uuid is its own type, but this is how I got the data. To verify which values are valid UUIDs, I take each varchar value and insert it into a table whose column is of type uuid. If the insert fails, the value is not a valid UUID. My basic question is how to delete the bad value if the insert fails. Or, more generally: how do I delete from one table if an insert into another table fails?
My first set of data:
drop table if exists temp1;
drop sequence if exists temp1_id_seq;
CREATE temp table temp1 (id serial, some_value varchar);
INSERT INTO temp1(some_value)
SELECT split_part(name,':',2) FROM branding_resource WHERE name LIKE '%curric%';
create temp table temp2 (id serial, other_value uuid);
CREATE OR REPLACE function verify_uuid() returns varchar AS $$
DECLARE uu RECORD;
BEGIN
FOR uu IN select * from temp1
LOOP
EXECUTE 'INSERT INTO temp2 values ('||uu.id||','''|| uu.some_value||''')';
END LOOP;
END;
$$
LANGUAGE 'plpgsql' ;
select verify_uuid();
When I run this, I get the error
ERROR: invalid input syntax for uuid:
which is what I expect. There are some bad uuids in my data set.
My research led me to Trapping Errors - Exceptions with UPDATE/INSERT in the docs.
Narrowing down to the important part:
BEGIN
FOR uu IN select * from temp1
LOOP
begin
EXECUTE 'INSERT INTO temp2 values ('||uu.id||','''|| uu.some_value||''')';
return;
exception when ??? then delete from temp1 where some_value = uu.some_value;
end;
END LOOP;
END;
I do not know what to put instead of ???. I think it relates to the ERROR: invalid input syntax for uuid:, but I am not sure. I am actually not even sure if this is the right way to go about this?
You can get the SQLSTATE code from psql using VERBOSE mode, e.g:
regress=> \set VERBOSITY verbose
regress=> SELECT 'fred'::uuid;
ERROR: 22P02: invalid input syntax for uuid: "fred"
LINE 1: SELECT 'fred'::uuid;
^
LOCATION: string_to_uuid, uuid.c:129
Here we can see that the SQLSTATE is 22P02. You can use that directly in the exception clause, but it's generally more readable to look it up in the manual to find the text representation. Here, we see that 22P02 is invalid_text_representation.
So you can write exception when invalid_text_representation then ...
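For comparison, a minimal DO block trapping the same error by its raw code with WHEN SQLSTATE '22P02', which is equivalent to WHEN invalid_text_representation. The temp table caught is hypothetical and only records what the handler saw.

```sql
CREATE TEMP TABLE IF NOT EXISTS caught (state text);

DO
$do$
BEGIN
  PERFORM 'fred'::uuid;                -- raises 22P02
EXCEPTION WHEN SQLSTATE '22P02' THEN   -- same condition as invalid_text_representation
  INSERT INTO caught VALUES (SQLSTATE);
END
$do$;
```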
@Craig shows a way to identify the SQLSTATE.
You can also use pgAdmin, which shows the SQLSTATE by default:
SELECT some_value::uuid FROM temp1
> ERROR: invalid input syntax for uuid: "-a0eebc999c0b4ef8bb6d6bb9bd380a11"
> SQL state: 22P02
I am going to address the bigger question:
I am actually not even sure if this is the right way to go about this?
Your basic approach is the right way: the 'parking in new york' method (quoting Merlin Moncure in this thread on pgsql-general). But the procedure is needlessly expensive. Probably much faster:
Exclude obviously violating strings immediately.
You should be able to weed out the lion's share of violating strings with a much cheaper regexp test.
Postgres accepts a couple of different formats for UUID in text representation, but as far as I can tell, this negated character class matches every character that cannot appear in any of them:
'[^A-Fa-f0-9{}-]'
You can probably narrow it down further for your particular brand of UUID representation (Only lower case? No curly braces? No hyphen?).
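A quick way to see what the character class lets through, on a few hypothetical sample strings. Note that '12-34' passes the character test but is still not a valid UUID, which is why the cast step remains necessary.

```sql
SELECT v, v !~ '[^A-Fa-f0-9{}-]' AS passes_prefilter
FROM  (VALUES ('a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11')  -- true
            , ('fred')                                  -- false: 'r' is not hex
            , ('12-34')) t(v);                          -- true, but not a UUID
```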
CREATE TEMP TABLE temp1 (id serial, some_value text);
INSERT INTO temp1 (some_value)
SELECT split_part(name,':',2)
FROM branding_resource
WHERE name LIKE '%curric%'
AND split_part(name,':',2) !~ '[^A-Fa-f0-9{}-]';
"Does not contain illegal characters."
Cast to test the rest
Instead of filling another table, it should be much cheaper to just delete (the now few!) violating rows:
CREATE OR REPLACE function f_kill_bad_uuid()
RETURNS void AS
$func$
DECLARE
rec record;
BEGIN
FOR rec IN
SELECT * FROM temp1
LOOP
BEGIN
PERFORM rec.some_value::uuid; -- no dynamic SQL needed
-- do not RETURN! Keep looping.
RAISE NOTICE 'Good: %', rec.some_value; -- only for demo
EXCEPTION WHEN invalid_text_representation THEN
RAISE NOTICE 'Bad: %', rec.some_value; -- only for demo
DELETE FROM temp1 WHERE some_value = rec.some_value;
END;
END LOOP;
END
$func$ LANGUAGE plpgsql;
No need for dynamic SQL. Just cast. Use PERFORM, since we are not interested in the result. We just want to see if the cast goes through or not.
No return value. You could count and return the number of excluded rows ...
For a one-time operation you could also use a DO statement.
And do not quote the language name 'plpgsql'. It's an identifier, not a string.
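As a sketch of the DO variant: the same loop as f_kill_bad_uuid(), without leaving a function behind. The sample rows are hypothetical, and this version deletes the offending row by id rather than by some_value, a small deviation from the function above.

```sql
CREATE TEMP TABLE IF NOT EXISTS temp1 (id serial, some_value text);
INSERT INTO temp1 (some_value)
VALUES ('a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11')
     , ('12-34')                                    -- passes the prefilter, fails the cast
     , ('{a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11}');  -- braces are accepted by uuid input

DO
$do$
DECLARE
  rec record;
BEGIN
  FOR rec IN SELECT * FROM temp1 LOOP
    BEGIN
      PERFORM rec.some_value::uuid;         -- raises 22P02 for malformed input
    EXCEPTION WHEN invalid_text_representation THEN
      DELETE FROM temp1 WHERE id = rec.id;  -- drop just this row
    END;
  END LOOP;
END
$do$;

SELECT some_value FROM temp1;  -- the two well-formed values remain
```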
Given the following schema:
create table account_type_a (
id SERIAL UNIQUE PRIMARY KEY,
some_column VARCHAR
);
create table account_type_b (
id SERIAL UNIQUE PRIMARY KEY,
some_other_column VARCHAR
);
create view account_type_a_view AS select * from account_type_a;
create view account_type_b_view AS select * from account_type_b;
I am trying to create a generic trigger function in PL/pgSQL that makes the views updatable:
create trigger trUpdate instead of UPDATE on account_type_a_view
for each row execute procedure updateAccount();
create trigger trUpdate instead of UPDATE on account_type_b_view
for each row execute procedure updateAccount();
An unsuccessful effort of mine was:
create function updateAccount() returns trigger as $$
declare
target_table varchar := substring(TG_TABLE_NAME from '(.+)_view');
cols varchar;
begin
execute 'select string_agg(column_name,$1) from information_schema.columns
where table_name = $2' using ',', target_table into cols;
execute 'update ' || target_table || ' set (' || cols || ') = select ($1).*
where id = ($1).id' using NEW;
return NULL;
end;
$$ language plpgsql;
The problem is the update statement. I am unable to come up with a syntax that would work here. I have successfully implemented this in PL/Perl, but would be interested in a plpgsql-only solution.
Any ideas?
Update
As @Erwin Brandstetter suggested, here is the code for my PL/Perl solution. I incorporated some of his suggestions.
create function f_tr_up() returns trigger as $$
use strict;
use warnings;
my $target_table = quote_ident($_TD->{'table_name'}) =~ s/^([\w]+)_view$/$1/r;
my $NEW = $_TD->{'new'};
my $cols = join(',', map { quote_ident($_) } keys %$NEW);
my $vals = join(',', map { quote_nullable($_) } values %$NEW);
my $query = sprintf(
"update %s set (%s) = (%s) where id = %d",
$target_table,
$cols,
$vals,
$NEW->{'id'});
spi_exec_query($query);
return;
$$ language plperl;
While @Gary's answer is technically correct, it fails to mention that PostgreSQL does support this form:
UPDATE tbl
SET (col1, col2, ...) = (expression1, expression2, ..)
Read the manual on UPDATE.
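A minimal sketch of both source forms on a hypothetical temp table t_demo: a parenthesized list of expressions, and a sub-SELECT, which the manual excerpt quoted further down also permits.

```sql
CREATE TEMP TABLE t_demo (id int PRIMARY KEY, a text, b int);
INSERT INTO t_demo VALUES (1, 'old', 0);

-- Column list = list of expressions, assigned positionally.
UPDATE t_demo
SET    (a, b) = ('new', 42)
WHERE  id = 1;

-- Column list = sub-SELECT returning one row with matching columns.
UPDATE t_demo
SET    (a, b) = (SELECT 'newer', 43)
WHERE  id = 1;
```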
It's still tricky to get this done with dynamic SQL. I'll assume a simple case where views consist of the same columns as their underlying tables.
CREATE VIEW tbl_view AS SELECT * FROM tbl;
Problems
The special record NEW is not visible inside EXECUTE. I pass NEW as a single parameter with the USING clause of EXECUTE.
As discussed, UPDATE with list-form needs individual values. I use a subselect to split the record into individual columns:
UPDATE ...
FROM (SELECT ($1).*) x
(Parentheses around $1 are not optional.) This allows me to simply use two column lists built with string_agg() from the catalog table: one with and one without table qualification.
It's not possible to assign a row value as a whole to individual columns. The manual:
According to the standard, the source value for a parenthesized
sub-list of target column names can be any row-valued expression
yielding the correct number of columns. PostgreSQL only allows the
source value to be a row constructor or a sub-SELECT.
INSERT is simpler to implement. If the structure of view and table are identical, we can omit the column definition list. (Can be improved, see below.)
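The subselect trick from the second point can be tried outside a trigger. In this sketch a composite variable r stands in for NEW, and the temp tables split_demo and split_out are hypothetical.

```sql
CREATE TEMP TABLE split_demo (id int, name text);
CREATE TEMP TABLE split_out  (id int, name text);

DO
$do$
DECLARE
  r split_demo := ROW(7, 'seven');  -- a composite value, like NEW in a trigger
BEGIN
  -- ($1).* expands the passed record into its individual columns.
  EXECUTE 'INSERT INTO split_out SELECT x.id, x.name FROM (SELECT ($1).*) x'
  USING r;
END
$do$;

SELECT * FROM split_out;  -- 7 | seven
```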
Solution
I made a couple of updates to your approach to make it shine.
Trigger function for UPDATE:
CREATE OR REPLACE FUNCTION f_trg_up()
RETURNS TRIGGER
LANGUAGE plpgsql AS
$func$
DECLARE
_tbl regclass := quote_ident(TG_TABLE_SCHEMA) || '.'
|| quote_ident(substring(TG_TABLE_NAME from '(.+)_view$'));
_cols text;
_vals text;
BEGIN
SELECT INTO _cols, _vals
string_agg(quote_ident(attname), ', ')
, string_agg('x.' || quote_ident(attname), ', ')
FROM pg_attribute
WHERE attrelid = _tbl
AND NOT attisdropped -- no dropped (dead) columns
AND attnum > 0; -- no system columns
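EXECUTE format('
UPDATE %1$s
SET (%2$s) = (%3$s)
FROM (SELECT ($1).*) x
WHERE %1$s.id = ($2).id', _tbl, _cols, _vals)
USING NEW, OLD; -- $1 = NEW (new values), $2 = OLD (locates the row, even if id changed)
RETURN NEW; -- Don't return NULL unless you know what you're doing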
END
$func$;
Trigger function for INSERT:
CREATE OR REPLACE FUNCTION f_trg_ins()
RETURNS TRIGGER
LANGUAGE plpgsql AS
$func$
DECLARE
_tbl regclass := quote_ident(TG_TABLE_SCHEMA) || '.'
|| quote_ident(substring(TG_TABLE_NAME FROM '(.+)_view$'));
BEGIN
EXECUTE format('INSERT INTO %s SELECT ($1).*', _tbl)
USING NEW;
RETURN NEW; -- Don't return NULL unless you know what you're doing
END
$func$;
Triggers:
CREATE TRIGGER trg_instead_up
INSTEAD OF UPDATE ON a_view
FOR EACH ROW EXECUTE FUNCTION f_trg_up();
CREATE TRIGGER trg_instead_ins
INSTEAD OF INSERT ON a_view
FOR EACH ROW EXECUTE FUNCTION f_trg_ins();
Before Postgres 11 the syntax (oddly) was EXECUTE PROCEDURE instead of EXECUTE FUNCTION - which also still works.
Major points
Include the schema name to make the table reference unambiguous. There can be multiple tables of the same name in one database with multiple schemas!
Query pg_catalog.pg_attribute instead of information_schema.columns. Less portable, but much faster, and it allows using the table OID.
How to check if a table exists in a given schema
Table names are NOT safe against SQLi when concatenated as strings for dynamic SQL. Escape with quote_ident() or format() or with an object-identifier type. This includes the special trigger function variables TG_TABLE_SCHEMA and TG_TABLE_NAME!
Cast to the object identifier type regclass to assert the table name is valid and get the OID for the catalog look-up.
Optionally use format() to build the dynamic query string safely.
No need for dynamic SQL for the first query on the catalog tables. Faster, simpler.
Use RETURN NEW instead of RETURN NULL in these trigger functions unless you know what you are doing. (NULL would cancel the INSERT for the current row.)
This simple version assumes that every table (and view) has a unique column named id. A more sophisticated version might use the primary key dynamically.
The function for UPDATE allows the columns of view and table to be in any order, as long as the set is the same.
The function for INSERT expects the columns of view and table to be in identical order. If you want to allow arbitrary order, add a column definition list to the INSERT command, just like with UPDATE.
Updated version also covers changes to the id column by using OLD additionally.
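A small sketch of the catalog lookup with a regclass cast, on a hypothetical temp table with non-standard names. ORDER BY attnum is added only to make the output deterministic; in the trigger function both lists are built in the same aggregate pass, so they line up with each other either way.

```sql
CREATE TEMP TABLE "Mixed Case" (id int, "Some Col" text);

SELECT string_agg(quote_ident(attname), ', ' ORDER BY attnum) AS cols
FROM   pg_attribute
WHERE  attrelid = 'pg_temp."Mixed Case"'::regclass  -- errors if the table does not exist
AND    NOT attisdropped                             -- no dropped (dead) columns
AND    attnum > 0;                                  -- no system columns
-- cols: id, "Some Col"
```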
PostgreSQL doesn't support updating multiple columns using the set (col1,col2) = select val1,val2 syntax.
To achieve the same in PostgreSQL you'd use
update target_table
set col1 = d.val1,
col2 = d.val2
from source_table d
where d.id = target_table.id
This is going to make the dynamic query a bit more complex to build as you'll need to iterate the column name list you're using into individual fields. I'd suggest you use array_agg instead of string_agg as an array is easier to process than splitting the string again.
Postgresql UPDATE syntax
documentation on array_agg function
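Following that suggestion, a sketch of turning an array of column names into the SET list with format(). It assumes the source columns carry the same names as the targets; the names here are hypothetical, and WITH ORDINALITY keeps the original array order.

```sql
SELECT string_agg(format('%1$I = d.%1$I', col), ', ' ORDER BY ord) AS set_list
FROM   unnest(ARRAY['col1', 'Col 2']) WITH ORDINALITY AS u(col, ord);
-- set_list: col1 = d.col1, "Col 2" = d."Col 2"
```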