PostgreSQL insert trigger becomes slow when querying the current table

When inserting a lot number into a table, we count the number of times the base number already exists and add a -## to the end of the new number based on that count.
I have stripped out most of the logic (we check for other things as well). I'm also aware of the logic flaw here that would skip -1.
-- Function: stone._lsuniqueid()
-- DROP FUNCTION stone._lsuniqueid();
CREATE OR REPLACE FUNCTION stone._lsuniqueid()
  RETURNS trigger AS
$BODY$
DECLARE
  _count INTEGER;
BEGIN
  -- Obtain the number of occurrences of this new ls_number
  SELECT COUNT(ls_number) INTO _count
  FROM ls
  WHERE ls_number LIKE CAST(NEW.ls_number || '%' AS text);
  -- Allow new ls_numbers to be entered as is, otherwise add "-#{count + 1}"
  -- to the end of the ls_number
  IF _count > 0 THEN
    NEW.ls_number = NEW.ls_number || '-' || CAST(_count + 1 AS text);
  END IF;
  RETURN NEW;
END
$BODY$
LANGUAGE plpgsql;
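The trigger definition itself isn't shown in the question; presumably something along these lines (the trigger name is a guess):

CREATE TRIGGER lsuniqueid
BEFORE INSERT ON ls
FOR EACH ROW
EXECUTE PROCEDURE stone._lsuniqueid();

With the trigger in place, the insert is slow: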
INSERT INTO ls VALUES (NEXTVAL('ls_ls_id_seq'),7285,UPPER('20151012'));
--> Query returned successfully: one row affected, 391 ms execution time.
The count query on its own is plenty fast:
SELECT COUNT(ls_number)
FROM ls
WHERE ls_number LIKE CAST('20151012' || '%' AS text);
--> 19ms
For comparison, I tried a similar trigger but ran the count against a different table with the same number of rows and a similar query time:
SELECT COUNT(lsdetail_id)
FROM lsdetail
WHERE lsdetail_id > 2433308
--> 20ms
Running the same insert, but with the trigger's count querying a different table, returns the result 20 times faster:
INSERT INTO ls VALUES (NEXTVAL('ls_ls_id_seq'),7285,UPPER('20151012'));
--> Query returned successfully: one row affected, 20 ms execution time.
The ls table has about 2.5 million rows.
I've tried a couple of different things, and the issue seems to occur when selecting from the same table I'm inserting into.
I would like to know why this is happening, but I would also be open to a better way to create "sub-lot" numbers.
Thanks!

Found the answer here:
http://www.postgresql.org/message-id/27705.1150381444@sss.pgh.pa.us
Re: How to analyze function performance
"Mindaugas" writes:
Is it possible to somehow analyze function performance? E.g. we are using function cleanup() which obviously takes too much time to execute, but I have problems trying to figure out what is slowing things down.
When I explain analyze the function's lines step by step they show quite acceptable performance.
--
Are you sure you are "explain analyze"ing the same queries the function
is really doing? You have to account for the fact that what plpgsql is
issuing is parameterized queries, and sometimes that limits the
planner's ability to pick a good plan. For instance, if you have
declare x int;
begin
...
for r in select * from foo where key = x loop ...
then what is really getting planned and executed is "select * from foo
where key = $1" --- every plpgsql variable gets replaced by a parameter
symbol "$n". You can model this for EXPLAIN purposes with a prepared
statement:
prepare p1(int) as select * from foo where key = $1;
explain analyze execute p1(42);
If you find out that a particular query really sucks when parameterized,
you can work around this by using EXECUTE to force the query to be
planned afresh on each use with literal constants instead of parameters.
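Applied to the trigger above, that looks like this (a diagnostic sketch; '20151012' is the sample base number from the question):

PREPARE p1(text) AS
SELECT COUNT(ls_number)
FROM ls
WHERE ls_number LIKE $1 || '%';

EXPLAIN ANALYZE EXECUTE p1('20151012');

With the pattern hidden behind $1, the planner cannot turn the LIKE into an index range scan on ls_number, which matches the 391 ms vs 19 ms difference seen above.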
Then I looked into this:
http://www.postgresql.org/docs/9.1/static/plpgsql-statements.html#PLPGSQL-STATEMENTS-EXECUTING-DYN
39.5.4. Executing Dynamic Commands
Oftentimes you will want to generate dynamic commands inside your PL/pgSQL functions, that is, commands that will involve different tables or different data types each time they are executed. PL/pgSQL's normal attempts to cache plans for commands (as discussed in Section 39.10.2) will not work in such scenarios. To handle this sort of problem, the EXECUTE statement is provided:
EXECUTE 'SELECT count(*) FROM mytable WHERE inserted_by = $1 AND inserted <= $2'
INTO c
USING checked_user, checked_date;
--
So in the end it was a matter of updating the count select to this:
EXECUTE 'SELECT COALESCE(COUNT(ls_number), 0) FROM ls WHERE ls_number LIKE $1 || ''%'';'
INTO _count
USING NEW.ls_number;
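For completeness, here is the whole trigger function with that change applied (same logic as before, only the count query differs):

CREATE OR REPLACE FUNCTION stone._lsuniqueid()
  RETURNS trigger AS
$BODY$
DECLARE
  _count INTEGER;
BEGIN
  -- EXECUTE plans the statement afresh on every call, with the actual
  -- value available to the planner, so the LIKE prefix can use an index
  EXECUTE 'SELECT COALESCE(COUNT(ls_number), 0) FROM ls WHERE ls_number LIKE $1 || ''%'';'
  INTO _count
  USING NEW.ls_number;
  IF _count > 0 THEN
    NEW.ls_number = NEW.ls_number || '-' || CAST(_count + 1 AS text);
  END IF;
  RETURN NEW;
END
$BODY$
LANGUAGE plpgsql;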

Related

Statistics rapidly changing within transaction - fixing execution plan

A problem I'm facing (Oracle 11g):
I create a table, let's call it table_xyz, with index (not unique, no primary key).
I create a package with a procedure that will insert, say, 10 million records monthly - it's not a simple "insert into", it's thousands of lines of procedural code, and some of it actually also selects data from table_xyz to calculate what data to insert further.
For example, somewhere within the procedure there is a query like the c_previous cursor in the code attached below.
Now, there is a problem:
When the procedure is run for the first time, all queries on table_xyz will have execution plan based on the moment, when there were 0 records in table_xyz.
So, all queries will effectively full scan table_xyz, instead of starting to use indexes at some point.
This leads to terrible performance and in my case actually, the first run will never finish...
Now, there are three approaches I thought of:
At some point within the transaction, recalculate statistics, for example by running "analyze table / analyze index compute statistics" after the count of records in table_xyz reaches a power of 10.
This is not possible, however, since ANALYZE commits the transaction and I cannot allow that.
At some point within the transaction, recalculate statistics as above, but do it in an autonomous transaction. This does not work however, since the new statistics will not be visible to the main transaction (I tested that).
Just hint all the cursors that use table_xyz with USE_INDEX. This gets the job done, but is ugly and generally frowned upon in the codebase.
Are there any other ways?
I attach some code. It is just an example, please do not try to optimize it by removing the procedure and so on.
create table table_xyz (idx number(10) /*+ Specifically this is NOT a primary key */
,some_value varchar2(10)
);
create index table_xyz_idx on table_xyz (idx);
declare
  cursor idxes is
    select level idx
    from dual d
    connect by level < 100000;
  current_val varchar2(10);

  function calculate_some_value(p_idx number) return varchar2
  is
    cursor c_previous is
      select t.some_value
      from table_xyz t
      where t.idx in (round(p_idx / 2, 0), round(p_idx / 3, 0), round(p_idx / 5, 0))
      order by t.idx desc;
    x varchar2(100);
  begin
    open c_previous;
    fetch c_previous into x;
    close c_previous;
    x := nvl(x, 'XYZ');
    if mod(p_idx, 2) = 0 then
      x := x || '2';
    elsif mod(p_idx, 3) = 0 then
      x := '3' || x;
    elsif mod(p_idx, 5) = 0 then
      x := substr(x, 1, 1) || '5' || substr(x, 2, 2 + mod(p_idx, 7));
    end if;
    x := substr(x, 1, 10);
    return x;
  end calculate_some_value;
begin
  for idx in idxes
  loop
    current_val := calculate_some_value(idx.idx);
    insert into table_xyz(idx, some_value) values (idx.idx, current_val);
  end loop;
end;
Consider taking a look at the DBMS_STATS package.
Option A: use the DBMS_STATS procedures for manually setting table, column, and index statistics (i.e., SET_TABLE_STATS, SET_COLUMN_STATS, and SET_INDEX_STATS, respectively). Then use DBMS_STATS.LOCK_TABLE_STATS to keep your manually set statistics from being overwritten (e.g., by a DBA gathering schema statistics while your table happens to be empty).
Option B: run your procedure as is and then, afterwards, manually gather stats on the table. Then, as above, use DBMS_STATS.LOCK_TABLE_STATS to keep them from being overwritten.
Either way, the idea is to set or gather statistics on your table and then lock them in place.
If you want to get fancier, maybe you could automate this (a sketch follows the list below). E.g.,
At install time, manually set the statistics and lock them for your first run
In your procedure code, at the end, unlock the statistics, gather them, and re-lock them.
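A minimal sketch of Option A, assuming a schema named MYSCHEMA and the table and index from the example; the row counts passed in are purely illustrative:

begin
  -- Pretend the table already holds ~10M rows so the optimizer favours the index
  dbms_stats.set_table_stats(ownname => 'MYSCHEMA', tabname => 'TABLE_XYZ',
                             numrows => 10000000, numblks => 100000);
  dbms_stats.set_index_stats(ownname => 'MYSCHEMA', indname => 'TABLE_XYZ_IDX',
                             numrows => 10000000, numdist => 10000000);
  -- Stop a later gather (e.g. a nightly job) from overwriting the manual values
  dbms_stats.lock_table_stats(ownname => 'MYSCHEMA', tabname => 'TABLE_XYZ');
end;
/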

Trying to create dynamic query strings with PL/PgSQL to make DRY functions in PostgreSQL 9.6

I have tables that contain the same type of data for every year, but the data gathered varies slightly in that they may not have the same fields.
d_abc_2016
d_def_2016
d_ghi_2016
d_jkl_2016
There are certain constants for each table: company_id, employee_id, salary.
However, each one might or might not have the fields that are used to calculate total incentives: bonus, commission, cash_incentives. There are a lot more, but I'm just using these as examples; all are numeric.
I should note at this point, users only have the ability to run SELECT statements.
What I would like to be able to do is this:
Give the user the ability to call the function in a SELECT, specifying their own additional fields alongside the call
Pass the table name into the function, to be used in conditional logic that determines how the query string for the eventual total_incentives calculation is constructed, and also pass the whole row so a ton of arguments don't have to be passed into the function
Basically this:
SELECT employee_id, salary, total_incentives(t, 'd_abc_2016')
FROM d_abc_2016 t;
So the function being called will calculate total_incentives, which is numeric, for that employee_id, and the query also shows their salary. But the user might choose to add other fields to look at.
For the function, because the fields used in the total_incentives function will vary from table to table, I need to create logic to construct the query string dynamically.
CREATE OR REPLACE FUNCTION total_incentives(ANYELEMENT, t text)
RETURNS numeric AS
$$
DECLARE
-- table name lower-cased in case the user typed it in the wrong case
tbl varchar(255) := lower($2);
-- parse out the table code to use in conditional logic
tbl_code varchar(255) := split_part(tbl, '_', 2);
-- the starting point of the query string
base_calc varchar(255) := 'salary + ';
-- query string
query_string varchar(255);
-- have to declare this to put the computation INTO
total_incentives_calc numeric;
BEGIN
IF tbl_code = 'abc' THEN
query_string := base_calc || 'bonus';
ELSIF tbl_code = 'def' THEN
query_string := base_calc || 'bonus + commission';
ELSIF tbl_code = 'ghi' THEN
-- etc...
END IF;
EXECUTE format('SELECT $1 FROM %I', tbl)
INTO total_incentives_calc
USING query_string;
RETURN total_incentives_calc;
END;
$$
LANGUAGE plpgsql;
This results in:
ERROR: invalid input syntax for type numeric: "salary + bonus"
CONTEXT: PL/pgSQL function total_incentives(anyelement,text) line 16 at EXECUTE
It should be returning a set of numeric values, so I changed it to the following:
CREATE OR REPLACE FUNCTION total_incentives(ANYELEMENT, t text)
RETURNS SETOF numeric AS
$$
...
RETURN;
Get the same error.
Figuring that maybe it was a table it was trying to return:
CREATE OR REPLACE FUNCTION total_incentives(ANYELEMENT, t text)
RETURNS TABLE(tot_inc numeric) AS
$$
...
Get the same error.
Really, any variation produces that result. So really not sure how to get this to work.
Look at RETURN QUERY, RETURN NEXT, or RETURN QUERY EXECUTE.
https://www.postgresql.org/docs/9.6/static/plpgsql-control-structures.html
RETURN QUERY won't work because it takes a hard-coded query from what I can tell, which won't take in variables.
RETURN NEXT iterates through each record, which I don't think will be suitable for my needs and seems like it will be really slow... and it also takes a hard-coded query from what I can tell.
RETURN QUERY EXECUTE sounds promising.
-- EXECUTE format('SELECT $1 FROM %I', tbl)
-- INTO total_incentives_calc
-- USING query_string;
RETURN QUERY
EXECUTE format('SELECT $1 FROM %I', tbl)
USING query_string;
And get:
ERROR: structure of query does not match function result type
DETAIL: Returned type character varying does not match expected type numeric in column 1.
CONTEXT: PL/pgSQL function total_incentives(anyelement,text) line 20 at RETURN QUERY
It should be returning numeric.
Lastly, I can get this to work, but it won't be DRY. I'd rather not make a bunch of separate functions for each table with duplicative code. Most of the working examples I have seen have the whole query in the function and are called like so:
SELECT total_incentives(d_abc_2016, 'd_abc_2016');
So any additional columns would have to be specified in the function as:
EXECUTE format('SELECT employee_id...)
Given that users will only be able to run SELECT queries, this really isn't an option. They need to be able to specify any additional columns they want to see inside the query.
I've posted a similar question but was told it was unclear, so hopefully this lengthier version will more clearly explain what I am trying to do.
Column names and table names cannot be passed as query parameters via the USING clause.
Probably lines:
RETURN QUERY
EXECUTE format('SELECT $1 FROM %I', tbl)
USING query_string;
should be:
RETURN QUERY
EXECUTE format('SELECT %s FROM %I', query_string, tbl);
This case is an example of why an overly strict DRY principle is sometimes problematic. If you write the queries directly, your code will be simpler, cleaner, and probably shorter.
Dynamic SQL is one of the last solutions, not the first. Use dynamic SQL only when your code will be significantly shorter with it than without it.
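Putting the fix together, a minimal sketch of the function; the parameter names, the branch contents, and the trick of evaluating the expression against the passed-in row (instead of re-scanning the table) are illustrative assumptions, not the asker's exact design:

CREATE OR REPLACE FUNCTION total_incentives(t anyelement, tbl text)
RETURNS numeric AS
$$
DECLARE
  tbl_code text := split_part(lower(tbl), '_', 2);
  expr text := 'salary + ';
  result numeric;
BEGIN
  IF tbl_code = 'abc' THEN
    expr := expr || 'bonus';
  ELSIF tbl_code = 'def' THEN
    expr := expr || 'bonus + commission';
  ELSE
    RAISE EXCEPTION 'unhandled table: %', tbl;
  END IF;
  -- %s interpolates the column expression as SQL text; the row itself
  -- goes through USING and is expanded, so the table is not scanned again
  EXECUTE format('SELECT %s FROM (SELECT ($1).*) AS s', expr)
  INTO result
  USING t;
  RETURN result;
END;
$$ LANGUAGE plpgsql;

It can then be called exactly as desired, e.g. SELECT employee_id, salary, total_incentives(t, 'd_abc_2016') FROM d_abc_2016 t;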

How to stop a large query if it has too many rows in PostgreSQL using Jdbc?

We run user submitted queries which can potentially return a large result set.
In order to avoid memory issues, we would like to detect these cases and cancel the query. The user then is expected to modify the query.
We already use PreparedStatement#setFetchSize() to scroll the result set and process a large result set incrementally.
However, when the result set is too large, we would like to avoid bringing even the first results over the network, and to avoid as much other unnecessary work as possible on both the client side and the database side.
Doing a SELECT COUNT(*)... beforehand just degrades the performance of the expected case where the queries behave nicely in general.
Is there a way for postgres to tell the expected result set size?
Take a look here.
They are doing an estimate with a database procedure:
CREATE FUNCTION count_estimate(query text) RETURNS integer AS
$func$
DECLARE
  rec record;
  rows integer;
BEGIN
  FOR rec IN EXECUTE 'EXPLAIN ' || query LOOP
    rows := substring(rec."QUERY PLAN" FROM ' rows=([[:digit:]]+)');
    EXIT WHEN rows IS NOT NULL;
  END LOOP;
  RETURN rows;
END
$func$ LANGUAGE plpgsql;
It uses PostgreSQL's EXPLAIN command to estimate the row count the query will return.
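A usage sketch (the table name and threshold are illustrative): ask for the estimate first, and only run the real query when the planner's guess is acceptable.

SELECT count_estimate('SELECT * FROM orders WHERE status = ''open''');
-- If this returns, say, 5000000, reject the query before fetching anything;
-- otherwise run the actual query with setFetchSize() as before.

Keep in mind this is only the planner's estimate; it can be off by a large factor when statistics are stale.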

SQL Oracle - Procedure Syntax (School assignment)

I'm currently learning SQL and I'm having trouble with a procedure of mine. The procedure should calculate the average of a column called 'INDICE_METABO_PAT'. The information I need is spread over 3-4 different tables. Once I have the average calculated, I update a table to set this average on the corresponding entries. Here is the procedure. Note that everything else in my .sql file works: inserts, updates, selects, views, etc.
create or replace Procedure SP_UPDATE_INDICE_METABO_DV (P_NO_ETUDE in number)
is
V_SOMME number := 0;
V_NBPATIENT number := 0;
V_NO_ETUDE number := P_NO_ETUDE;
cursor curseur is
select PATIENT.INDICE_EFFICACITE_METABO_PAT
from ETUDE, DROGUE_VARIANT, ETUDE_PATIENT, PATIENT
where ETUDE.NO_DROGUE = DROGUE_VARIANT.NO_DROGUE
and ETUDE.NO_VAR_GEN = DROGUE_VARIANT.NO_VAR_GEN
and V_NO_ETUDE = ETUDE_PATIENT.NO_ETUDE
and ETUDE_PATIENT.NO_PATIENT = PATIENT.NO_PATIENT;
begin
open curseur;
fetch curseur into V_SOMME;
V_NBPATIENT := V_NBPATIENT + 1;
exit when curseur%NOTFOUND;
update DROGUE_VARIANT
set INDICE_EFFICACITE_METABO_DV = V_SOMME / V_NBPATIENT
where exists(select * from ETUDE, DROGUE_VARIANT, ETUDE_PATIENT, PATIENT
where ETUDE.NO_DROGUE = DROGUE_VARIANT.NO_DROGUE
and ETUDE.NO_VAR_GEN = DROGUE_VARIANT.NO_VAR_GEN
and V_NO_ETUDE = ETUDE_PATIENT.NO_ETUDE
and ETUDE_PATIENT.NO_PATIENT = PATIENT.NO_PATIENT);
end SP_UPDATE_INDICE_METABO_DV;
/
I'm getting an error: "Procedure compiled, error check compiler log".
But I can't open the compiler log, and when my friend opens it, it points to weird places, like my create tables and such.
This is school stuff by the way, so it would be nice if you could give an insight instead of a direct solution.
Thanks a lot in advance for your kind help!
To see the error you can do show errors after your procedure creation statement, or you can query the user_errors or all_errors views.
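For instance, in SQL*Plus or SQL Developer (using the procedure name from the question):

show errors procedure SP_UPDATE_INDICE_METABO_DV

-- or query the view directly:
select line, position, text
from user_errors
where name = 'SP_UPDATE_INDICE_METABO_DV'
order by sequence;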
That will show something like:
LINE/COL ERROR
-------- ------------------------------------------------------------------------
20/4 PLS-00376: illegal EXIT/CONTINUE statement; it must appear inside a loop
20/4 PL/SQL: Statement ignored
You mentioned that when you checked the compiler log, which shows the same information, "it points to weird places". Presumably you're looking at line 20 in your script. But this message is referring to line 20 of the PL/SQL code block, which is the exit when curseur%NOTFOUND; - and that makes sense for this error message.
And as the message also says, and as @ammoQ said in a comment, that statement should be in a loop. If you're trying to manually calculate the average in a procedure as an exercise, instead of using the built-in aggregation functions, then you need to loop over all of the rows from your cursor:
open curseur;
loop
fetch curseur into V_SOMME;
exit when curseur%NOTFOUND;
V_NBPATIENT := V_NBPATIENT + 1;
end loop;
close curseur;
But as you'll quickly realise, you'll end up with the v_somme variable holding the last value retrieved, not the sum of all the values. You need a separate variable to keep track of the sum - fetch each value into a variable and add it to your running total, all within the loop. But as requested, I'm not giving a complete solution.
As you're starting out you should really use ANSI join syntax, not the old from/where syntax you have now. It's a shame that it's still being taught. So your cursor query should be something like:
select PATIENT.INDICE_EFFICACITE_METABO_PAT
from ETUDE_PATIENT
join ETUDE
-- missing on clause !
join DROGUE_VARIANT
on DROGUE_VARIANT.NO_DROGUE = ETUDE.NO_DROGUE
and DROGUE_VARIANT.NO_VAR_GEN = ETUDE.NO_VAR_GEN
join PATIENT
on PATIENT.NO_PATIENT = ETUDE_PATIENT.NO_PATIENT
where ETUDE_PATIENT.NO_ETUDE = P_NO_ETUDE;
... which shows you that you are missing a join condition between ETUDE_PATIENT and ETUDE - it's unlikely you want a cartesian product, and it's much easier to spot that missing join using this syntax than with what you had.
You need to look at your update statement carefully too, particularly the exists clause. That will basically always return true if the cursor found anything, so it will update every row in DROGUE_VARIANT with your calculated average, which presumably isn't what you want.
There is no correlation between the rows in the table you're updating and the subquery in that clause - the DROGUE_VARIANT in the subquery is independent of the DROGUE_VARIANT you're updating. By which I mean, it's the same table, obviously; but the update and the subquery are looking at the table separately and so are looking at different rows. It also has the same missing join condition as the cursor query.
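As a generic illustration of what "correlated" means here (deliberately not the assignment's answer; the SET value is a placeholder showing only the shape), the subquery has to reference the row being updated:

update DROGUE_VARIANT dv
set INDICE_EFFICACITE_METABO_DV = 0  -- placeholder value, shape only
where exists (select 1
              from ETUDE e
              where e.NO_DROGUE = dv.NO_DROGUE
              and e.NO_VAR_GEN = dv.NO_VAR_GEN);

Because the subquery mentions dv, it is evaluated per row of the table being updated, so only matching rows are touched.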

How would you enforce DRY (Don't Repeat Yourself) in a SQL script?

I'm changing a database (Oracle) with a script containing a few updates that look like:
UPDATE customer
SET status = REPLACE(status, 'X_Y', 'xy')
WHERE status LIKE '%X_Y%'
AND category_id IN
(SELECT id
FROM category
WHERE code = 'ABC');
UPDATE customer
SET status = REPLACE(status, 'X_Z', 'xz')
WHERE status LIKE '%X_Z%'
AND category_id IN
(SELECT id
FROM category
WHERE code = 'ABC');
-- More updates looking the same...
In this case, how would you enforce DRY (Don't Repeat Yourself)?
I'm particularly interested in solving the following two recurring problems:
Define a function, available from this script only, to extract the subquery SELECT id FROM category WHERE code = 'ABC'
Create a set of replace rules (that could look like {"X_Y": "xy", "X_Z": "xz", ...} in a popular programming language) and then iterate a single update query over it.
Thanks!
I would reduce it to a single query:
UPDATE customer
SET status = REPLACE(REPLACE(status, 'X_Y', 'xy'), 'X_Z', 'xz')
WHERE REGEXP_LIKE(status, 'X_[YZ]')
AND category_id IN
(SELECT id
FROM category
WHERE code = 'ABC');
First of all, remember that scripting is not the same thing as programming, and you don't have to adhere to DRY principles. Scripts like this one are usually one-offs, not a program to be maintained over a long time.
But you could use PL/SQL to do this:
declare
  type str_tab is table of varchar2(30) index by binary_integer;
  from_tab str_tab;
  to_tab str_tab;
begin
  from_tab(1) := 'X_Y';
  from_tab(2) := 'X_Z';
  to_tab(1) := 'xy';
  to_tab(2) := 'xz';
  for i in 1..from_tab.count loop
    UPDATE customer
    SET status = REPLACE(status, from_tab(i), to_tab(i))
    WHERE status LIKE '%' || from_tab(i) || '%'
    AND category_id IN
      (SELECT id
       FROM category
       WHERE code = 'ABC');
  end loop;
end;
Pretty straightforward, unless I'm missing something.
UPDATE customer
SET status = REPLACE(REPLACE(status,'X_Y','xy'),'X_Z','xz')
WHERE ( status LIKE '%X_Y%' Or status LIKE '%X_Z%')
AND category_id IN
(SELECT id
FROM category
WHERE code = 'ABC');
Write a script that takes parameters and call it multiple times. (I'm assuming you're using SQL*Plus to run the script.)
replace_in_status.sql:
UPDATE customer
SET status = REPLACE(status, UPPER('&1'), '&2')
WHERE status LIKE '%' ||UPPER('&1')|| '%'
AND category_id IN
(SELECT id
FROM category
WHERE code = 'ABC');
Calling script:
@replace_in_status X_Y xy
@replace_in_status X_Z xz
Okay, a shot from the hip here, take it easy on my syntax :-)
Would an approach like this help:
DECLARE
  v_sql1 VARCHAR2(1000);
  v_sql2 VARCHAR2(2000);
  TYPE T_Rules IS RECORD (srch VARCHAR2(100), repl VARCHAR2(100));
  TYPE T_RuleTab IS TABLE OF T_Rules INDEX BY BINARY_INTEGER;
  v_rules T_RuleTab;
  FUNCTION get_subquery RETURN VARCHAR2 IS
  BEGIN
    RETURN '(SELECT id FROM category WHERE code = ''ABC'')';
  END;
BEGIN
  v_sql1 := 'UPDATE customer SET status = REPLACE(status, :1, :2) WHERE status LIKE ''%'' || :3 || ''%'' AND category_id IN ';
  v_rules(1).srch := 'X_Y'; v_rules(1).repl := 'xy';
  v_rules(2).srch := 'X_Z'; v_rules(2).repl := 'xz';
  FOR i IN 1..v_rules.COUNT LOOP
    v_sql2 := v_sql1 || get_subquery();
    EXECUTE IMMEDIATE v_sql2 USING v_rules(i).srch, v_rules(i).repl, v_rules(i).srch;
  END LOOP;
END;
You could replace the PL/SQL table with a real table and run a cursor over it, but this addresses your second requirement.
Obviously some work is left on get_subquery, your first requirement ;-)
EDIT
Dang! I forgot to mention you need to be careful with that replace string in your WHERE clause - underscores are a single-character wildcard in Oracle's LIKE...
Depending on how important the script is, I would:
Just copy and paste and modify, or
Write a script in another programming language that has better ways to resolve the duplication.
For the replace rules, you could create a temporary table, fill it with the rules, and then join against it.
If the subquery is always the same, you have also solved the first problem by using a join.
I've seen a few approaches to this:
Use string buffers to assemble the SQL dynamically, either in PL/SQL or in your programming language.
Use a framework such as iBATIS which lets you reuse and extend fragments of SQL that are stored in XML files.
Using an ORM framework circumvents this issue by working with objects rather than directly with the SQL.
Depending on your language and the problem at hand, using a framework and extending it to do what you want may be the best approach.
The solution suggested by soulmerge is the simplest, and therefore the best, one - you just need to nest the calls to replace(). I just want to add that the condition
status like '%tagada%'
is useless. replace() changes nothing if the searched string is not found, so you can safely apply it to all rows. And since a condition that searches for a string buried in the middle of another string cannot make use of any index you have, it's useless as a filtering condition.
Your only filtering condition is the one on category_id ...
Which brings up one point that justifies why soulmerge's solution is best: iterating over all the changes is a bad idea. Suppose the filter on category_id is only moderately selective; odds are that Oracle will choose to scan the table. Do you really want to scan the table once per rule when you can make all the changes in a single pass?
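Following that reasoning, the whole script collapses to a single pass - the statement soulmerge proposed, minus the now-redundant LIKE filter:

UPDATE customer
SET status = REPLACE(REPLACE(status, 'X_Y', 'xy'), 'X_Z', 'xz')
WHERE category_id IN
  (SELECT id
   FROM category
   WHERE code = 'ABC');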