Delete row if type cast fails - sql

Ok, here is the layout:
I have a bunch of uuid data that is in varchar format. I know uuid is its own type. This is how I got the data. So to verify this which ones are uuid, I take the uuid in type varchar and insert it into a table where the column is uuid. If the insert fails, then it is not a uuid type. My basic question is how to delete the bad uuid if the insert fails. Or, how do I delete out of one table if an insert fails in another table.
My first set of data:
drop table if exists temp1;
drop sequence if exists temp1_id_seq;
CREATE temp table temp1 (id serial, some_value varchar);
INSERT INTO temp1(some_value)
SELECT split_part(name,':',2) FROM branding_resource WHERE name LIKE '%curric%';
create temp table temp2 (id serial, other_value uuid);
CREATE OR REPLACE function verify_uuid() returns varchar AS $$
DECLARE uu RECORD;
BEGIN
FOR uu IN select * from temp1
LOOP
EXECUTE 'INSERT INTO temp2 values ('||uu.id||','''|| uu.some_value||''')';
END LOOP;
END;
$$
LANGUAGE 'plpgsql' ;
select verify_uuid();
When I run this, I get the error
ERROR: invalid input syntax for uuid:
which is what I expect. There are some bad uuids in my data set.
My research led me to Trapping Errors - Exceptions with UPDATE/INSERT in the docs.
Narrowing down to the important part:
BEGIN
FOR uu IN select * from temp1
LOOP
begin
EXECUTE 'INSERT INTO temp2 values ('||uu.id||','''|| uu.some_value||''')';
return;
exception when ??? then delete from temp1 where some_value = uu.some_value;
end;
END LOOP;
END;
I do not know what to put instead of ???. I think it relates to the ERROR: invalid input syntax for uuid:, but I am not sure. I am actually not even sure if this is the right way to go about this?

You can get the SQLSTATE code from psql using VERBOSE mode, e.g:
regress=> \set VERBOSITY verbose
regress=> SELECT 'fred'::uuid;
ERROR: 22P02: invalid input syntax for uuid: "fred"
LINE 1: SELECT 'fred'::uuid;
^
LOCATION: string_to_uuid, uuid.c:129
Here we can see that the SQLSTATE is 22P02. You can use that directly in the exception clause, but it's generally more readable to look it up in the manual to find the text representation. Here, we see that 22P02 is invalid_text_representation.
So you can write exception when invalid_text_representation then ...

#Craig shows a way to identify the SQLSTATE.
You an also use pgAdmin, which shows the SQLSTATE by default:
SELECT some_value::uuid FROM temp1
> ERROR: invalid input syntax for uuid: "-a0eebc999c0b4ef8bb6d6bb9bd380a11"
> SQL state: 22P02
I am going to address the bigger question:
I am actually not even sure if this is the right way to go about this?
Your basic approach is the right way: the 'parking in new york' method (quoting Merlin Moncure in this thread on pgsql-general). But the procedure is needlessly expensive. Probably much faster:
Exclude obviously violating strings immediately.
You should be able to weed out the lion's share of violating strings with a much cheaper regexp test.
Postgres accepts a couple of different formats for UUID in text representation, but as far as I can tell, this character class should covers all valid characters:
'[^A-Fa-f0-9{}-]'
You can probably narrow it down further for your particular brand of UUID representation (Only lower case? No curly braces? No hyphen?).
CREATE TEMP TABLE temp1 (id serial, some_value text);
INSERT INTO temp1 (some_value)
SELECT split_part(name,':',2)
FROM branding_resource
WHERE name LIKE '%curric%'
AND split_part(name,':',2) !~ '[^A-Fa-f0-9{}-]';
"Does not contain illegal characters."
Cast to test the rest
Instead of filling another table, it should be much cheaper to just delete (the now few!) violating rows:
CREATE OR REPLACE function f_kill_bad_uuid()
RETURNS void AS
$func$
DECLARE
rec record;
BEGIN
FOR rec IN
SELECT * FROM temp1
LOOP
BEGIN
PERFORM rec.some_value::uuid; -- no dynamic SQL needed
-- do not RETURN! Keep looping.
RAISE NOTICE 'Good: %', rec.some_value; -- only for demo
EXCEPTION WHEN invalid_text_representation THEN
RAISE NOTICE 'Bad: %', rec.some_value; -- only for demo
DELETE FROM temp1 WHERE some_value = rec.some_value;
END;
END LOOP;
END
$func$ LANGUAGE plpgsql;
No need for dynamic SQL. Just cast. Use PERFORM, since we are not interested in the result. We just want to see if the cast goes through or not.
Not return value. You could count and return the number of excluded rows ...
For a one-time operation you could also use a DO statement.
And do not quote the language name 'plpgsql'. It's an identifier, not a string.
SQL Fiddle.

Related

Extract column values from one table and insert with modifications into another

I have created a PL/pgSQL function that accepts two column names, a "relation", and two table names. It finds distinct rows in one table and inserts them in to a temporary table, deletes any row with a null value, and sets all values of one column to relation. I have the first part of the process using this function.
create or replace function alt_edger(s text, v text, relation text, tbl text, tbl_src text)
returns void
language plpgsql as
$func$
begin
raise notice 's: %, v: %, tbl: %, tbl_src: %', s,v,tbl,tbl_src;
execute ('insert into '||tbl||' ("source", "target") select distinct "'||s||'","'||v||'" from '||tbl_src||'');
execute ('DELETE FROM '||tbl||' WHERE "source" IS null or "target" is null');
end
$func$;
It is executed as follows:
-- create a temporary table and execute the function twice
drop table if exists temp_stack;
create temporary table temp_stack("label" text, "source" text, "target" text, "attr" text, "graph" text);
select alt_edger('x_x', 'y_y', ':associated_with', 'temp_stack','pg_check_table' );
select alt_edger('Document Number', 'x_x', ':documents', 'temp_stack','pg_check_table' );
select * from temp_stack;
Note that I didn't use relation, yet. The INSERT shall also assign relation, but I can't figure out how to make that happen to get something like:
label
source
target
attr
graph
:associated_with
638000
ARAS
:associated_with
202000
JASE
:associated_with
638010
JASE
:associated_with
638000
JASE
:associated_with
202100
JASE
:documents
A
638010
:documents
A
202000
:documents
A
202100
:documents
B
638000
:documents
A
638000
:documents
B
124004
:documents
B
202100
My challenges are:
How to integrate relation in the INSERT? When I try to use VALUES and comma separation I get an "error near select".
How to allow strings starting with ":" in relation? I'm anticipating here, the inclusion of the colon has given me challenges in the past.
How can I do this? Or is there a better approach?
Toy data model:
drop table if exists pg_check_table;
create temporary table pg_check_table("Document Number" text, x_x int, y_y text);
insert into pg_check_table values ('A',202000,'JASE'),
('A',202100,'JASE'),
('A',638010,'JASE'),
('A',Null,'JASE'),
('A',Null,'JASE'),
('A',202100,'JASE'),
('A',638000,'JASE'),
('A',202100,'JASE'),
('B',638000,'JASE'),
('B',202100,null),
('B',638000,'JASE'),
('B',null,'ARAS'),
('B',638000,'ARAS'),
('B',null,'ARAS'),
('B',638000,null),
('B',124004,null);
alter table pg_check_table add row_num serial;
select * from pg_check_table;
-- DROP FUNCTION alt_edger(_s text, _v text, _relation text, _tbl text, _tbl_src text)
CREATE OR REPLACE FUNCTION alt_edger(_s text, _v text, _relation text, _tbl text, _tbl_src text, OUT row_count int)
LANGUAGE plpgsql AS
$func$
DECLARE
_sql text := format(
'INSERT INTO pg_temp.%3$I (label, source, target)
SELECT DISTINCT $1, %1$I, %2$I FROM pg_temp.%4$I
WHERE (%1$I, %2$I) IS NOT NULL'
, _s, _v, _tbl, _tbl_src);
BEGIN
-- RAISE NOTICE '%', _sql; -- debug
EXECUTE _sql USING _relation;
GET DIAGNOSTICS row_count = ROW_COUNT; -- return number of inserted rows
END
$func$;
db<>fiddle here
Most importantly, use format() to concatenate your dynamic SQL commands safely. And use the format specifier %I for identifiers. This way, SQL injection is not possible and identifiers are double-quoted properly - preserving non-standard names like Document Number. That's where your original failed.
We could concatenate _relation as string to be inserted into label, too. But the preferable way to pass values to EXECUTE is with the USING clause. $1 inside the SQL string passed to EXECUTE is a placeholder for the first USING argument. Not to be confused with $1 referencing function parameters in the context of the function body outside EXECUTE! (You can pass any string, leading colon (:) does not matter, the string is not interpreted when done right.)
See:
Format specifier for integer variables in format() for EXECUTE?
Table name as a PostgreSQL function parameter
I replaced the DELETE in your original with a WHERE clause to the SELECT of the INSERT. Don't insert rows in the first place, instead of deleting them again later.
(%1$I, %2$I) IS NOT NULL only qualifies when both values are NOT NULL.
About that:
Check if a Postgres composite field is null/empty
Don't use the prefix "pg_" for your table names. That's what Postgres uses for system tables. Don't mess with those.
I schema-qualify known temporary tables with pg_temp. That's typically optional as the temporary schema comes first in the search_path by default. But that can be changed (maliciously), and then the table name would resolve to any existing regular table of the same name in the search_path. So better safe than sorry. See:
How does the search_path influence identifier resolution and the "current schema"
I made the function return the number of inserted rows. That's totally optional!
Since I do that with an OUT parameter, I am allowed to skip the RETURNS clause. See:
Can I make a plpgsql function return an integer without using a variable?

How to check if an sequence is above a certain number and if not change it in Postgres

I have a problem where my SQL sequence has to be above a certain number to fix a unique constraint error. Now I started to write an if-statement that checks for a certain number and if it's below it should be increased to a certain number. The statement is for Postgres.
I got the separate parts running but the connection over if is throwing an error and I don't know why.
First for selecting the current number:
SELECT nextval('mySequence')
Then to update the number:
SELECT setval('mySequence', targetNumber, true)
The full statement looks something like this in my tries:
IF (SELECT nextval('mySequence') < targetNumber)
THEN (SELECT setval('mySequence', targetNumber, true))
END IF;
and the error is
ERROR: syntax error at »IF«
Can someone explain to me what I did wrong there because the error message isn't giving me much to work with? I would appreciate your help.
Try this:
SELECT setval('mySequence', targetNumber, true)
WHERE (SELECT nextval('mySequence') < targetNumber) is true;
You can use postgres functions if you want to use IF statement.
You can try something like this:
CREATE SEQUENCE seq_test_id_seq;
CREATE TABLE seq_test(
id integer NOT NULL DEFAULT nextval('seq_test_id_seq'),
name VARCHAR
);
CREATE OR REPLACE FUNCTION seq_test_function(target_number bigint)
RETURNS void
LANGUAGE 'plpgsql'
VOLATILE
PARALLEL UNSAFE
COST 100
AS $BODY$
DECLARE
seq_val INTEGER;
BEGIN
SELECT nextval('seq_test_id_seq') INTO seq_val;
RAISE NOTICE 'NEXT SEQUENCE [%]', seq_val;
IF (seq_val < target_number) THEN
SELECT setval('seq_test_id_seq', target_number, true) INTO seq_val;
RAISE NOTICE 'SEQUENCE VALUE MODIFIED [%]', seq_val;
END IF;
END;
$BODY$;
Then call the procedure:
select seq_test_function(10);

query has no destination for result data in a function that has a set of instructions in postgresql

I am trying to automate a set of sentences that I execute several times a day. For this I want to put them in a postgres function and just call the function to execute the sentences consecutively. If everything runs OK then in the end return the SUCCESS value. The following function replicates my idea and the error I am getting when executing the function:
CREATE OR REPLACE FUNCTION createTable() RETURNS int AS $$
BEGIN
DROP TABLE IF EXISTS MY_TABLE;
CREATE TABLE MY_TABLE
(
ID integer
)
WITH (
OIDS=FALSE
);
insert into MY_TABLE values(1);
select * from MY_TABLE;
RETURN 'SUCCESS';
END;
$$ LANGUAGE plpgsql;
Invocation:
select * from createTable();
With my ignorance of postgresql I would expect to obtain the SUCCESS value as a return (If everything runs without errors). But the returned message causes me confusion, isn't it the same as a function in any other programming language? When executing the function I get the following message:
query has no destination for result data Hint: If you want to
discard the results of a SELECT, use PERFORM instead.
query has no destination for result data Hint: If you want to discard the results of a SELECT, use PERFORM instead.
You are getting this error because you do not assign the results to any variable in the function. In a function, you would typically do something like this instead:
select * into var1 from MY_TABLE;
Therefore, your function would look something like this:
CREATE OR REPLACE FUNCTION createTable() RETURNS int AS $$
DECLARE
var1 my_table%ROWTYPE;
BEGIN
DROP TABLE IF EXISTS MY_TABLE;
CREATE TABLE MY_TABLE
(
ID integer
)
WITH (
OIDS=FALSE
);
insert into MY_TABLE values(1);
select * into var1 from MY_TABLE;
<do something with var1>
RETURN 'SUCCESS';
END;
$$ LANGUAGE plpgsql;
Otherwise, if you don't put the results into a variable, then you're likely hoping to achieve some side effect (like advancing a sequence or firing a trigger somehow). In that case, plpgsql expects you to use PERFORM instead of SELECT
Also, BTW your function RETURNS int but at the bottom of your definition you RETURN 'SUCCESS'. SUCCESS is a text type, not an int, so you will eventually get this error once you get past that first error message -- be sure to change it as necessary.

PostgreSQL variable in select and delete statements

The Problem: I have many delete lines in a PostgreSQL script where I am deleting data related to the same item in the database. Example:
delete from <table> where <column>=180;
delete from <anothertable> where <column>=180;
...
delete from <table> where <column>=180;
commit work;
There are about 15 delete statements deleting data that references <column>=180.
I have tried to replace the 180 with a variable so that I only have to change the variable, instead of all the lines in the code (like any good programmer would do). I can't seem to figure out how to do it, and it's not working.
NOTE: I am very much a SQL novice (I rarely use it), so I know there's probably a better way to do this, but please enlighten me on how I can fix this problem.
I have used these answers to try and fix it with no luck: first second third. I've even gone to the official PostgreSQL documentation, with no luck.
This is what I'm trying (these lines are just for testing and not in the actual script):
DO $$
DECLARE
variable INTEGER:
BEGIN
variable := 101;
SELECT * FROM <table> WHERE <column> = variable;
END $$;
I've also tried just delcaring it like this:
DECLARE variable INTEGER := 101;
Whenever I run the script after replacing one of the numbers with a variable this is the error I get:
SQL Error [42601]: ERROR: query has no destination for result data
Hint: If you want to discard the results of a SELECT, use PERFORM instead.
Where: PL/pgSQL function inline_code_block line 6 at SQL statement
Can someone tell me where I'm going wrong? It would be nice to only have to change the number in the variable, instead of in all the lines in the script, and I just can't seem to figure it out.
As #Vao Tsun said, you must define a destination to your SELECT statement. Use PERFORM otherwise:
--Test data
CREATE TEMP TABLE my_table (id, description) AS
VALUES (1, 'test 1'), (2, 'test 2'), (101, 'test 101');
--Example procedure
CREATE OR REPLACE FUNCTION my_procedure(my_arg my_table) RETURNS VOID AS $$
BEGIN
RAISE INFO 'Procedure: %,%', my_arg.id, my_arg.description;
END
$$ LANGUAGE plpgsql;
DO $$
DECLARE
variable INTEGER;
my_record my_table%rowtype;
BEGIN
variable := 101;
--Use your SELECT inside a LOOP to work with result
FOR my_record IN SELECT * FROM my_table WHERE id = variable LOOP
RAISE INFO 'Loop: %,%', my_record.id, my_record.description;
END LOOP;
--Use SELECT to populate a variable.
--In this case you MUST define a destination to your result data
SELECT * INTO STRICT my_record FROM my_table WHERE id = variable;
RAISE INFO 'Select: %,%', my_record.id, my_record.description;
--Use PERFORM instead of SELECT if you want to discard result data
--It's often used to call a procedure
PERFORM my_procedure(t) FROM my_table AS t WHERE id = variable;
END $$;
--DROP FUNCTION my_procedure(my_table);

Input table for PL/pgSQL function

I would like to use a plpgsql function with a table and several columns as input parameter. The idea is to split the table in chunks and do something with each part.
I tried the following function:
CREATE OR REPLACE FUNCTION my_func(Integer)
RETURNS SETOF my_part
AS $$
DECLARE
out my_part;
BEGIN
FOR i IN 0..$1 LOOP
FOR out IN
SELECT * FROM my_func2(SELECT * FROM table1 WHERE id = i)
LOOP
RETURN NEXT out;
END LOOP;
END LOOP;
RETURN;
END;
$$
LANGUAGE plpgsql;
my_func2() is the function that does some work on each smaller part.
CREATE or REPLACE FUNCTION my_func2(table1)
RETURNS SETOF my_part2 AS
$$
BEGIN
RETURN QUERY
SELECT * FROM table1;
END
$$
LANGUAGE plpgsql;
If I run:
SELECT * FROM my_func(99);
I guess I should receive the first 99 IDs processed for each id.
But it says there is an error for the following line:
SELECT * FROM my_func2(select * from table1 where id = i)
The error is:
The subquery is only allowed to return one column
Why does this happen? Is there an easy way to fix this?
There are multiple misconceptions here. Study the basics before you try advanced magic.
Postgres does not have "table variables". You can only pass 1 column or row at a time to a function. Use a temporary table or a refcursor (like commented by #Daniel) to pass a whole table. The syntax is invalid in multiple places, so it's unclear whether that's what you are actually trying.
Even if it is: it would probably be better to process one row at a time or rethink your approach and use a set-based operation (plain SQL) instead of passing cursors.
The data types my_part and my_part2 are undefined in your question. May be a shortcoming of the question or a problem in the test case.
You seem to expect that the table name table1 in the function body of my_func2() refers to the function parameter of the same (type!) name, but this is fundamentally wrong in at least two ways:
You can only pass values. A table name is an identifier, not a value. You would need to build a query string dynamically and execute it with EXECUTE in a plpgsql function. Try a search, many related answers her on SO. Then again, that may also not be what you wanted.
table1 in CREATE or REPLACE FUNCTION my_func2(table1) is a type name, not a parameter name. It means your function expects a value of the type table1. Obviously, you have a table of the same name, so it's supposed to be the associated row type.
The RETURN type of my_func2() must match what you actually return. Since you are returning SELECT * FROM table1, make that RETURNS SETOF table1.
It can just be a simple SQL function.
All of that put together:
CREATE or REPLACE FUNCTION my_func2(_row table1)
RETURNS SETOF table1 AS
'SELECT ($1).*' LANGUAGE sql;
Note the parentheses, which are essential for decomposing a row type. Per documentation:
The parentheses are required here to show that compositecol is a column name not a table name
But there is more ...
Don't use out as variable name, it's a keyword of the CREATE FUNCTION statement.
The syntax of your main query my_func() is more like psudo-code. Too much doesn't add up.
Proof of concept
Demo table:
CREATE TABLE table1(table1_id serial PRIMARY KEY, txt text);
INSERT INTO table1(txt) VALUES ('a'),('b'),('c'),('d'),('e'),('f'),('g');
Helper function:
CREATE or REPLACE FUNCTION my_func2(_row table1)
RETURNS SETOF table1 AS
'SELECT ($1).*' LANGUAGE sql;
Main function:
CREATE OR REPLACE FUNCTION my_func(int)
RETURNS SETOF table1 AS
$func$
DECLARE
rec table1;
BEGIN
FOR i IN 0..$1 LOOP
FOR rec IN
SELECT * FROM table1 WHERE table1_id = i
LOOP
RETURN QUERY
SELECT * FROM my_func2(rec);
END LOOP;
END LOOP;
END
$func$ LANGUAGE plpgsql;
Call:
SELECT * FROM my_func(99);
SQL Fiddle.
But it's really just a a proof of concept. Nothing useful, yet.
As the error log is telling you.. you can return only one column in a subquery, so you have to change it to
SELECT my_func2(SELECT Specific_column_you_need FROM hasval WHERE wid = i)
a possible solution can be that you pass to funct2 the primary key of the table your funct2 needs and then you can obtain the whole table by making the SELECT * inside the function