I'm looking for some tips to get this PostgreSQL plpgsql function to work. I would like to do a couple of regex's on each record and update only one time as doing them one by one takes 15 min. ...
CREATE OR REPLACE FUNCTION clean_column() RETURNS void AS $$
DECLARE
r record;
reg text[] := array[
['search','replace'],
['search','replace'],
['search','replace'],
['search','replace']
];
var text[];
tmp text;
BEGIN
for r in
select column from mytable
loop -- loop over all records
tmp = r;
FOREACH var SLICE 1 IN ARRAY reg
LOOP -- loop over all changes
tmp = regexp_replace(tmp,var[1],var[2]);
END LOOP;
UPDATE mytable SET r = tmp;
end loop;
END
$$ LANGUAGE plpgsql;
... there is a problem with r as it is not assigned. Probably my lack of understanding of how plpgsql works.
Maybe there is another way to make multiple changes to a record field?
Your function would update the row repeatedly, writing a new row version every time. That would still be hugely inefficient.
The point must be to update every row only once. And only if anything actually changes.
CREATE OR REPLACE FUNCTION clean_column(INOUT _text text)
LANGUAGE plpgsql IMMUTABLE STRICT PARALLEL SAFE AS
$func$
DECLARE
_reg CONSTANT text[] := ARRAY[
['search','replace']
, ['search','replace']
, ['search','replace']];
_var text[];
BEGIN
FOREACH _var SLICE 1 IN ARRAY _reg
LOOP
_text := regexp_replace(_text, _var[1], _var[2]);
END LOOP;
END
$func$;
The function does not run the UPDATE itself, just the string processing. Use that function in your UPDATE like so:
UPDATE tbl
SET col = clean_column(col)
WHERE col IS DISTINCT FROM clean_column(col) -- ① !
AND col IS NOT NULL -- ② ?
① Skip updates that would not change anything.
② Skip rows with NULL early (without even evaluating the function). Only relevant if column can be NULL, of course.
Performance will differ by orders of magnitude.
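To make the pattern concrete, here is a minimal, self-contained sketch. The table clean_demo, the function name clean_column_demo, and the two search/replace pairs are made up for illustration. Note the 'g' flag: without it regexp_replace() only replaces the first match, which may or may not be what you want.

```sql
-- Hypothetical demo table; names and data are assumptions, not from the question.
CREATE TEMP TABLE clean_demo (id int, col text);
INSERT INTO clean_demo VALUES (1, 'foo   bar'), (2, 'clean'), (3, NULL);

CREATE OR REPLACE FUNCTION clean_column_demo(INOUT _text text)
  LANGUAGE plpgsql IMMUTABLE STRICT PARALLEL SAFE AS
$func$
DECLARE
   _reg CONSTANT text[] := ARRAY[
        ['\s+' , ' ']      -- collapse runs of whitespace (example pattern)
      , ['^foo', 'baz']];  -- replace a leading "foo" (example pattern)
   _var text[];
BEGIN
   FOREACH _var SLICE 1 IN ARRAY _reg
   LOOP
      -- 'g' replaces every match, not just the first one
      _text := regexp_replace(_text, _var[1], _var[2], 'g');
   END LOOP;
END
$func$;

-- Each row is written at most once, and only if the value actually changes.
UPDATE clean_demo
SET    col = clean_column_demo(col)
WHERE  col IS DISTINCT FROM clean_column_demo(col)
AND    col IS NOT NULL;
```

With this sample data only the row with id = 1 is rewritten; the unchanged row and the NULL row are skipped.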
I'd like to create a DB function that will accept a list of numbers and return a list of numbers. For each item in the list that was passed to the function, it should check some condition and add it to the response list. However, I don't think the way I am trying to do it is really a correct one. What I tried writing is basically some pseudo code here.
CREATE OR REPLACE FUNCTION map_numbers(numbers integer[])
returns integer[]
AS
$BODY$
DECLARE return_list integer[];
FOREACH field IN ARRAY numbers LOOP
CASE
WHEN field = 3 THEN -- add 43 (this was a random thought, but I am basically trying to map a few of the numbers to different values)
END
END LOOP;
RETURN QUERY SELECT * FROM return_list;
$BODY$
LANGUAGE sql VOLATILE
COST 100;
You need to put this into an IF statement. To append a value to an array use ||
CREATE OR REPLACE FUNCTION map_numbers(numbers integer[])
returns integer[]
AS
$BODY$
DECLARE
field integer; --<< the loop variable for FOREACH must be declared
return_list integer[] := '{}'; --<< start with an empty array
BEGIN
FOREACH field IN ARRAY numbers LOOP
if field = 3 then
return_list := return_list || 43;
elsif field = 15 then
return_list := return_list || 42;
else
return_list := return_list || field;
end if;
END LOOP;
return return_list; --<< no SELECT required, just return the variable
END;
$BODY$
LANGUAGE plpgsql --<< you need PL/pgSQL, not SQL for the above
STABLE;
This can also be done using plain SQL rather than PL/pgSQL, which is usually more efficient:
CREATE OR REPLACE FUNCTION map_numbers(numbers integer[])
returns integer[]
AS
$BODY$
select array_agg(field order by idx)
from (
select case
when field = 3 then 43
when field = 15 then 42
else field
end as field,
idx
from unnest(numbers) with ordinality as t(field, idx)
) x;
$BODY$
LANGUAGE sql
STABLE;
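The same mapping can be tried without creating a function at all. This is just a quick check with a made-up input array; WITH ORDINALITY keeps track of the original element positions so array_agg() can restore the order.

```sql
-- Map 3 -> 43 and 15 -> 42, keep everything else, preserve element order.
SELECT array_agg(CASE field
                    WHEN 3  THEN 43
                    WHEN 15 THEN 42
                    ELSE field
                 END ORDER BY idx) AS mapped
FROM   unnest(ARRAY[1, 3, 15, 7]) WITH ORDINALITY AS t(field, idx);
-- mapped: {1,43,42,7}
```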
I have an express.js server running an application and from that server I can access or create "variant_id"s in PostgreSQL (Version 11) by using a stored procedure.
SELECT(get_or_create_variant_id(info_about_variant));
Sometimes I also need to get a bunch of these variant ids back by using a different stored procedure that takes multiple variants and returns multiple ids.
SELECT(get_or_create_variant_ids([info_about_variant, info_about_another_variant]));
What is the best way to generalize getting/creating a single id to doing multiple at once? I'm handling it in a LOOP in my stored procedure, but it feels like I should be able to use a JOIN instead.
CREATE OR REPLACE FUNCTION get_or_create_variant_id(
variant_in VARIANT_TYPE
) RETURNS INT AS $$
DECLARE variant_id_out INTEGER;
BEGIN
-- I'll be changing this to a ON CONFLICT block shortly
SELECT(get_variant_id(variant_in) INTO variant_id_out);
IF (variant_id_out IS NOT NULL) THEN
RETURN variant_id_out;
ELSE
INSERT INTO public.variant (
[some_fields]
)
VALUES (
[some_values]
)
RETURNING variant_id INTO variant_id_out;
RETURN variant_id_out;
END IF;
END;
$$ LANGUAGE plpgsql;
-- What is the best way to avoid a loop here?
CREATE OR REPLACE FUNCTION get_or_create_variant_ids(
variants_in VARIANT_TYPE []
) RETURNS INT [] AS $$
DECLARE variant_ids_out INTEGER [];
DECLARE variants_in_length INTEGER;
DECLARE current_variant_id INTEGER;
BEGIN
SELECT (array_length(variants_in, 1) INTO variants_in_length);
FOR i IN 1..variants_in_length LOOP
SELECT(get_or_create_variant_id(variants_in[i]) INTO current_variant_id);
SELECT(array_append(variant_ids_out, current_variant_id) INTO variant_ids_out);
END LOOP;
RETURN variant_ids_out;
END;
$$ LANGUAGE plpgsql;
-- Everything below is included for completeness, but probably less relevant to my question.
CREATE TYPE variant_type AS (
[lots of info about the variant]
);
CREATE OR REPLACE FUNCTION get_variant_id(
variant_in VARIANT_TYPE
) RETURNS INT AS $$
DECLARE variant_id_out INTEGER;
BEGIN
SELECT variant_id into variant_id_out
FROM public.variant
WHERE
[I want them to]
;
RETURN variant_id_out;
END;
$$ LANGUAGE plpgsql;
You can avoid the explicit loop with built-in array functionality: in this case, the unnest() function and an ARRAY constructor.
CREATE OR REPLACE FUNCTION get_or_create_variant_ids_v2(
variants_in VARIANT_TYPE []
)
RETURNS integer []
LANGUAGE sql AS $$
SELECT ARRAY(
SELECT get_or_create_variant_id(u.v)
FROM unnest(variants_in) AS u(v)
)
$$;
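Since the question mentions moving to ON CONFLICT, a fully set-based variant might look like the sketch below. Everything here is an assumption for illustration: the table variant_demo, its single UNIQUE column name (standing in for the elided fields of VARIANT_TYPE), and the function name. It inserts the missing rows in one statement and then looks up ids for new and pre-existing rows together.

```sql
-- Hypothetical stand-in for public.variant; one UNIQUE column replaces the real fields.
CREATE TABLE IF NOT EXISTS variant_demo (
   variant_id serial PRIMARY KEY
 , name       text NOT NULL UNIQUE
);

CREATE OR REPLACE FUNCTION get_or_create_variant_ids_demo(names text[])
  RETURNS integer[]
  LANGUAGE sql AS
$$
WITH input AS (
   SELECT name, ord
   FROM   unnest(names) WITH ORDINALITY AS t(name, ord)
   )
, ins AS (
   INSERT INTO variant_demo (name)
   SELECT DISTINCT name FROM input
   ON     CONFLICT (name) DO NOTHING   -- existing rows are left alone
   RETURNING variant_id, name
   )
SELECT ARRAY(
   -- new rows come from "ins", pre-existing rows from the table itself
   SELECT COALESCE(i.variant_id, v.variant_id)
   FROM   input
   LEFT   JOIN ins          i USING (name)
   LEFT   JOIN variant_demo v USING (name)
   ORDER  BY input.ord
   );
$$;

SELECT get_or_create_variant_ids_demo(ARRAY['a', 'b', 'a']);
```

One caveat with this sketch: under concurrent writes, a row inserted by another transaction at the same moment can be skipped by ON CONFLICT DO NOTHING and still be invisible to the lookup, which would leave a NULL in the result. If that matters, the call needs a retry or a locking strategy.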
I want to send whole columns to my function, something like min() or max(). Is that possible?
How to check query results row by row? I wrote something like:
CREATE OR REPLACE FUNCTION gowno.kiki(temppp INTEGER ) RETURNS INTEGER AS
$$
DECLARE
val INTEGER := 0;
i tyczka%ROWTYPE;
BEGIN
FOR i IN (SELECT adres FROM tyczka)
LOOP
RETURN CAST(i.adres AS INTEGER);
IF CAST(i.adres AS INTEGER) > val THEN
val = CAST(i.adres_ AS INTEGER);
END IF;
END LOOP;
RETURN val;
END
$$
LANGUAGE 'plpgsql'
For example, I have something like the following table. And I want to calculate the differences between the field in column poziom_wody where id is the same.
The syntax of your function would work like this:
CREATE OR REPLACE FUNCTION foo(temppp int)
RETURNS INTEGER AS
$func$
DECLARE
val int := 0;
i tyczka;
BEGIN
FOR i IN
SELECT * FROM tyczka
LOOP
-- RETURN i.adres::int; -- would exit at the first row, so the comparison below would never run
IF i.adres::int > val THEN
val := i.adres::int;
END IF;
END LOOP;
RETURN val;
END
$func$ LANGUAGE plpgsql;
Depending on your table definition this could be further simplified.
Depending on what you want to achieve exactly, there is probably a superior set-based approach. Looping is typically a measure of last resort in a relational database.
There are occasions where it's the best one, but people coming from procedural languages tend to over-use loops.
Added example
To get the difference between the maximum and minimum poziom_wody with the same id_rzeki:
SELECT max(poziom_wody) - min(poziom_wody) AS diff
FROM tbl
WHERE id_rzeki = 2;
This works for any number of rows.
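If the difference is needed for every id_rzeki at once rather than for a single one, GROUP BY does it in one pass. The table tyczka_demo and its sample rows below are invented for the demo; only the column names are taken from the question.

```sql
-- Hypothetical sample data; column names follow the question.
CREATE TEMP TABLE tyczka_demo (id_rzeki int, poziom_wody int);
INSERT INTO tyczka_demo VALUES
   (1, 10), (1, 25)
 , (2, 7), (2, 3), (2, 12);

-- One result row per id_rzeki, no loop required.
SELECT id_rzeki
     , max(poziom_wody) - min(poziom_wody) AS diff
FROM   tyczka_demo
GROUP  BY id_rzeki
ORDER  BY id_rzeki;
-- id_rzeki 1 -> diff 15, id_rzeki 2 -> diff 9
```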
I have the following plpgsql procedure:
DECLARE
_r record;
point varchar[] := '{}';
i int := 0;
BEGIN
FOR _r IN EXECUTE ' SELECT a.'|| quote_ident(column) || ' AS point,
FROM ' || quote_ident (table) ||' AS a'
LOOP
point[i] = _r;
i = i+1;
END LOOP;
RETURN 'OK';
END;
Its main objective is to traverse a table and store the value from each row in an array. I am still new to plpgsql. Can anyone point out the mistake, as it is giving me an error?
This is the complete syntax. Note that I renamed the parameter column to col_name because column is a reserved word. The same goes for table.
create or replace function foo(col_name text, table_name text)
returns text
as
$body$
DECLARE
_r record;
point character varying[] := '{}';
i int := 0;
BEGIN
FOR _r IN EXECUTE 'SELECT a.'|| quote_ident(col_name) || ' AS pt FROM ' || quote_ident (table_name) ||' AS a'
loop
point[i] = _r;
i = i+1;
END LOOP;
RETURN 'OK';
END;
$body$
language plpgsql;
Although to be honest: I fail to see what you are trying to achieve here.
@a_horse fixes most of the crippling problems with your failed attempt.
However, nobody should use this. The following step-by-step instructions should lead to a sane implementation with modern PostgreSQL.
Phase 1: Remove errors and mischief
Remove the comma after the SELECT list to fix the syntax error.
You start your array with index 0, while the default is to start with 1. Only do this if you really need to; it leads to unexpected results if you operate with array_upper() et al. Start with 1 instead.
Change RETURN type to varchar[] to return the assembled array and make this demo useful.
What we have so far:
CREATE OR REPLACE FUNCTION foo(tbl varchar, col varchar)
RETURNS varchar[] LANGUAGE plpgsql AS
$BODY$
DECLARE
_r record;
points varchar[] := '{}';
i int := 0;
BEGIN
FOR _r IN
EXECUTE 'SELECT a.'|| quote_ident(col) || ' AS pt
FROM ' || quote_ident (tbl) ||' AS a'
LOOP
i = i + 1; -- reversed order to make array start with 1
points[i] = _r;
END LOOP;
RETURN points;
END;
$BODY$;
Phase 2: Remove cruft, make it useful
Use text instead of character varying / varchar for simplicity. Either works, though.
You are selecting a single column, but use a variable of type record. This way a whole record is coerced to text, which includes the surrounding parentheses. That hardly makes any sense. Use a text variable instead. This works for any column if you explicitly cast to text (::text). Any type can be cast to text.
There is no point in initializing the variable point. It can start as NULL here.
Table and column aliases inside EXECUTE are of no use in this case. Dynamically executed SQL has its own scope!
No semicolon (;) needed after final END in a plpgsql function.
It's simpler to just append each value to the array with || .
Almost sane:
CREATE OR REPLACE FUNCTION foo1(tbl text, col text)
RETURNS text[] LANGUAGE plpgsql AS
$func$
DECLARE
point text;
points text[];
BEGIN
FOR point IN
EXECUTE 'SELECT '|| quote_ident(col) || '::text FROM ' || quote_ident(tbl)
LOOP
points = points || point;
END LOOP;
RETURN points;
END
$func$;
Phase 3: Make it shine in modern PL/pgSQL
If you pass a table name as text, you create an ambiguous situation. You can prevent SQL injection just fine with format() or quote_ident(), but this will fail with tables outside your search_path.
Then you need to add schema-qualification, which creates an ambiguous value. 'x.y' could stand for the table name "x.y" or the schema-qualified table name "x"."y". You can't pass "x"."y" since that will be escaped into """x"".""y""". You'd need to either use an additional parameter for the schema name or one parameter of type regclass. A regclass value is automatically quoted as needed when coerced to text and is the elegant solution here.
The new format() is simpler than multiple (or even a single) quote_ident() call.
You did not specify any order. SELECT returns rows in arbitrary order without ORDER BY. This may seem stable, since the result is generally reproducible as long as the underlying table doesn't change. But that's 100% unreliable. You probably want to add some kind of ORDER BY.
Finally, you don't need to loop at all. Use a plain SELECT with an ARRAY constructor.
Use an OUT parameter to further simplify the code.
Proper solution:
CREATE OR REPLACE FUNCTION f_arr(tbl regclass, col text, OUT arr text[])
LANGUAGE plpgsql AS
$func$
BEGIN
EXECUTE format('SELECT ARRAY(SELECT %I::text FROM %s ORDER BY 1)', col, tbl)
INTO arr;
END
$func$;
Call:
SELECT f_arr('myschema.mytbl', 'mycol');
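The quoting difference between a text parameter and a regclass parameter can be seen directly. The schema demo_schema and the table "My Tbl" are hypothetical and only created for this demo; the output of the regclass cast assumes demo_schema is not in your search_path.

```sql
-- Hypothetical schema and table, created only for this demo.
CREATE SCHEMA IF NOT EXISTS demo_schema;
CREATE TABLE IF NOT EXISTS demo_schema."My Tbl" (id int);

-- %I treats the whole string as ONE identifier: the dot ends up inside the quotes.
SELECT format('%I', 'demo_schema.My Tbl');

-- regclass resolves the name to an existing table and, when cast back to text,
-- quotes each part only where needed.
SELECT 'demo_schema."My Tbl"'::regclass::text;

-- With a regclass value, %s is enough; %I still protects the column name.
SELECT format('SELECT ARRAY(SELECT %I::text FROM %s ORDER BY 1)'
            , 'id', 'demo_schema."My Tbl"'::regclass);
```

A side effect of regclass worth knowing: the cast raises an error right away if the table does not exist, so a bad name never reaches the dynamic statement.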
I am just starting out on functions in PostgreSQL, and this is probably pretty basic, but how is this done?
I would like to be able to use the following in a function:
PERFORM id_exists();
IF FOUND THEN
-- Do something
END IF;
where the id_exists() function (to be used with SELECT and PERFORM) is:
CREATE OR REPLACE FUNCTION id_exists() RETURNS int AS $$
DECLARE
my_id int;
BEGIN
SELECT id INTO my_id
FROM tablename LIMIT 1;
RETURN my_id;
END;
$$ LANGUAGE plpgsql;
Currently, even when my_id does not exist in the table, FOUND is true, presumably because a row is still being returned (a null integer)? How can this be re-written so that an integer is returned if found, otherwise nothing at all is?
Your assumption is correct, FOUND is set to TRUE if the last statement returned a row, regardless of the value (may be NULL in your case). Details in the manual here.
Rewrite to, for instance:
IF id_exists() IS NOT NULL THEN
-- Do something
END IF;
Or rewrite the return value of your function with SETOF so it can return multiple rows - or no row! Use RETURN QUERY like I demonstrate. You can use this function in your original setting.
CREATE OR REPLACE FUNCTION id_exists()
RETURNS SETOF int LANGUAGE plpgsql AS
$BODY$
BEGIN
RETURN QUERY
SELECT id
FROM tablename
LIMIT 1;
END;
$BODY$;
Or, even simpler with a language SQL function:
CREATE OR REPLACE FUNCTION id_exists()
RETURNS SETOF int LANGUAGE sql AS
$BODY$
SELECT id
FROM tablename
LIMIT 1;
$BODY$;
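To see the difference in FOUND side by side, here is a small self-contained sketch. The empty table found_demo and both function names are made up for the demo. The scalar function still yields one row (containing NULL), while the SETOF function yields no row at all.

```sql
-- Hypothetical empty table for the demo.
CREATE TEMP TABLE found_demo (id int);

-- Scalar return type: an empty result is turned into a single NULL value.
CREATE OR REPLACE FUNCTION id_scalar_demo() RETURNS int LANGUAGE sql AS
$$SELECT id FROM found_demo LIMIT 1$$;

-- SETOF return type: an empty result stays empty.
CREATE OR REPLACE FUNCTION id_setof_demo() RETURNS SETOF int LANGUAGE sql AS
$$SELECT id FROM found_demo LIMIT 1$$;

DO
$do$
BEGIN
   PERFORM id_scalar_demo();
   RAISE NOTICE 'scalar function: FOUND = %', FOUND;  -- one row (NULL) was produced

   PERFORM id_setof_demo();
   RAISE NOTICE 'SETOF function:  FOUND = %', FOUND;  -- no row was produced
END
$do$;
```

With the table empty, the first notice should report true and the second false. After inserting a row into found_demo, both report true.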