Is there a way in PostgreSQL to abort execution of COUNT(*) statement and return its current result?
I would like to run:
SELECT COUNT(*) FROM table WHERE something=x;
Some queries complete in almost no time, but some take quite a lot of time. I would like to have:
if the statement completes within the time limit, it returns the final result,
else it aborts execution but returns the current (partial) result.
It would be nice to get an exit status as well (whether it finished execution or was aborted).
I found statement_timeout setting, but it doesn't return any result, just aborts.
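For example (tbl and the filter are placeholders), with statement_timeout the query is simply cancelled with an error and no partial count is returned:
SET statement_timeout = '1s';
SELECT COUNT(*) FROM tbl WHERE something = x;
-- ERROR:  canceling statement due to statement timeout
RESET statement_timeout;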
You can easily instruct Postgres to count up to a given LIMIT - a maximum number of rows, not an elapsed time:
SELECT count(*)
FROM (
SELECT 1 FROM tbl
WHERE something = 'x'
LIMIT 100000 -- stop counting at 100k
) sub;
If count() takes a very long time, you either have huge tables or some other problem with your setup. Either way, an estimated count may be good enough for your purpose:
Fast way to discover the row count of a table in PostgreSQL
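For example, the planner's estimate can be read directly (a sketch; the number is only as fresh as the last VACUUM / ANALYZE / autovacuum run):
SELECT reltuples::bigint AS estimate
FROM   pg_class
WHERE  oid = 'tbl'::regclass;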
It is not possible per se to stop counting after a maximum elapsed time. You could partition the count with the above technique and check the elapsed time after every step. But this adds a lot of overhead. Skipping rows with OFFSET is not that much cheaper than counting them. I don't think I would use it. Just as proof of concept:
DO
$do$
DECLARE
   _partition bigint      := 100000;  -- size of count partition
   _timeout   timestamptz := clock_timestamp() + interval '1s';  -- max time allowed
   _round     int         := 0;
   _round_ct  bigint;
BEGIN
   LOOP
      SELECT count(*)
      FROM  (
         SELECT 1 FROM tbl
         WHERE  something = 'x'
         LIMIT  _partition
         OFFSET _partition * _round
         ) sub
      INTO  _round_ct;

      IF _round_ct < _partition THEN
         RAISE NOTICE 'count: %; status: complete', _partition * _round + _round_ct;
         RETURN;
      ELSIF clock_timestamp() > _timeout THEN
         RAISE NOTICE 'count: %; status: timeout', _partition * _round + _round_ct;
         RETURN;
      END IF;

      _round := _round + 1;
   END LOOP;
END
$do$;
You could wrap this in a plpgsql function and pass parameters. Even make it work for any given table / column with EXECUTE ...
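A minimal sketch of such a wrapper, assuming the filter column is text (the function name and the returned complete flag are my own, not from the original):
CREATE OR REPLACE FUNCTION count_with_timeout(_tbl       regclass
                                            , _col       text
                                            , _value     text
                                            , _partition bigint   DEFAULT 100000
                                            , _max_time  interval DEFAULT interval '1s')
  RETURNS TABLE (row_count bigint, complete boolean)
  LANGUAGE plpgsql AS
$func$
DECLARE
   _timeout  timestamptz := clock_timestamp() + _max_time;
   _round    int := 0;
   _round_ct bigint;
BEGIN
   LOOP
      -- count one partition of rows; identifiers are injected safely, the value is a bind parameter
      EXECUTE format(
         'SELECT count(*) FROM (SELECT 1 FROM %s WHERE %I = $1 LIMIT $2 OFFSET $3) sub'
       , _tbl, _col)
      INTO  _round_ct
      USING _value, _partition, _partition * _round;

      IF _round_ct < _partition THEN
         RETURN QUERY SELECT _partition * _round + _round_ct, true;   -- finished counting
         RETURN;
      ELSIF clock_timestamp() > _timeout THEN
         RETURN QUERY SELECT _partition * _round + _round_ct, false;  -- gave up at the time limit
         RETURN;
      END IF;
      _round := _round + 1;
   END LOOP;
END
$func$;
Call it like: SELECT * FROM count_with_timeout('tbl', 'something', 'x');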
If you have an ID column with few gaps, the technique would make a lot more sense. You could partition by ID with a lot less overhead ...
I don't believe you will ever get a result set with a count until the query completes and makes it visible to the end user, i.e. you. Such are the fundamental rules of an ACID database: by initiating a SELECT command you're asking for a snapshot of the number of rows at that moment in time.
You would probably be better off looking at the issue from another angle: investigate why some queries take a long time by running EXPLAIN on them and examining the results.
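For example (table and filter are placeholders):
EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*) FROM tbl WHERE something = 'x';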
Related
I have a query that is trying to find the max value of a column after grouping over two other columns:
select address
, token_id
, max(input_tx_time) as last_tx_time
from processed.token_utxo
where input_tx_time < date_trunc('day', current_timestamp)
group by address, token_id
limit X
The table has ~330M rows and an index over all three columns, i.e.:
create index idx_token_utxo on processed.token_utxo using btree (address, token_id, input_tx_time);
Running the query with a limit of 1,000,000 results in a query plan that uses the index and completes in ~1 min: https://explain.depesz.com/s/112B
However running the query with a limit of 10,000,000 results in a query plan that does not appear to use the index and completes in ~30 min: https://explain.depesz.com/s/zTkZ
If I run the query without a limit it errors out with "temp file reached max size", so this is more than just an issue of speed. I'm assuming the problem is that the index itself is large at 20 GB (the address column is a fairly long string), so it can't be loaded into memory or something like that. Would love to know what is actually going on here as well as how I can resolve this issue. Thanks in advance!
Running on PostgreSQL 13.7 with 24 GB RAM, 4 vCPU and SSD
Flags:
random_page_cost: 1.1
work_mem: 42598
default_statistics_target: 500
Your faster plan is using 642,780 heap fetches to fetch 2,414,121 rows, which means about 1/4 of the table is not marked as all-visible. If you vacuum this table more aggressively, it should make that plan actually faster, and should also make it estimated to be faster. The latter is probably more important, as the main problem you are complaining about is plan choice, not performance of the correct plan. Making the estimate look better means this plan is more likely to be chosen.
I'm surprised to see you had already lowered random_page_cost. Based on the plan estimates, I'd assumed it was still 4 until I went back and read more closely. What is effective_cache_size set to? How about temp_file_limit?
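If you want to try that, a sketch of the vacuum side (the scale factors are illustrative, not recommendations):
-- one-off: refresh the visibility map for the table
VACUUM (ANALYZE) processed.token_utxo;
-- ongoing: make autovacuum trigger sooner for this table
ALTER TABLE processed.token_utxo
  SET (autovacuum_vacuum_scale_factor = 0.02,
       autovacuum_analyze_scale_factor = 0.02);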
I resolved this by using PL/pgSQL to iteratively process chunks of the table and update the max input_tx_time after each chunk. This seems like a really good way to process large tables and it might help others. For example:
create or replace function processed.token_x_address_update_last_tx_time_fn(max_token_utxo_id bigint) returns void as
$$
declare
    token_utxo_id bigint;
begin
    -- Loop through all token_utxos 1 million at a time
    for token_utxo_id in 0..max_token_utxo_id by 1000000
    loop
        update processed.token_x_address_checkpoint as c
        set last_tx_time = u.last_tx_time
        from (
            select address
                 , token_id
                 , max(input_tx_time) as last_tx_time
            from processed.token_utxo
            where token_utxo.id > token_utxo_id
              and token_utxo.id <= token_utxo_id + 1000000
            group by address, token_id
        ) as u
        where c.address = u.address
          and c.token_id = u.token_id
          and c.last_tx_time < u.last_tx_time
        ;
    end loop;

    -- Process remainder token_utxos
    update processed.token_x_address_checkpoint as c
    set last_tx_time = u.last_tx_time
    from (
        select address
             , token_id
             , max(input_tx_time) as last_tx_time
        from processed.token_utxo
        where token_utxo.id > floor(max_token_utxo_id / 1e6) * 1e6
          and token_utxo.id <= max_token_utxo_id
        group by address, token_id
    ) as u
    where c.address = u.address
      and c.token_id = u.token_id
      and c.last_tx_time < u.last_tx_time
    ;
end;
$$
language plpgsql;
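Assuming id is a dense, increasing key, the call might look like this (a sketch):
SELECT processed.token_x_address_update_last_tx_time_fn(
         (SELECT max(id) FROM processed.token_utxo));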
In a similar fashion to how
select * from mytable where rownum <= 1000;
will give me the first 1000 rows of results of a query, is there a way to
select * from mytable where runtime <= 1000;
which would return the results obtained in the first 1000 <time units> of running the query?
Oracle does not support this, at least not in an easy sense like your example.
One blog post I found limits the execution time of users in a dedicated resource consumer group. The authors created a special group for those users and defined a resource plan, which they called LIMIT_EXEC_TIME, for it. Their code is reproduced below for reference:
set serverout on size 5555
--
-- first remove an existing active plan
ALTER SYSTEM SET RESOURCE_MANAGER_PLAN ='';
--
-- delete any existing plan or group
-- we have to create a pending area first
exec dbms_resource_manager.clear_pending_area();
exec dbms_resource_manager.create_pending_area();
exec dbms_resource_manager.DELETE_PLAN ('LIMIT_EXEC_TIME');
exec dbms_resource_manager.DELETE_CONSUMER_GROUP ('GROUP_WITH_LIMITED_EXEC_TIME');
exec DBMS_RESOURCE_MANAGER.VALIDATE_PENDING_AREA;
exec DBMS_RESOURCE_MANAGER.SUBMIT_PENDING_AREA();
begin
  dbms_resource_manager.create_pending_area();
  --
  -- we need a consumer group that maps to the desired oracle user:
  dbms_resource_manager.create_consumer_group(
    CONSUMER_GROUP => 'GROUP_WITH_LIMITED_EXEC_TIME',
    COMMENT        => 'This is the consumer group that has limited execution time per statement'
  );
  dbms_resource_manager.set_consumer_group_mapping(
    attribute      => DBMS_RESOURCE_MANAGER.ORACLE_USER,
    value          => 'PYTHIAN',
    consumer_group => 'GROUP_WITH_LIMITED_EXEC_TIME'
  );
  -- and we need a resource plan:
  dbms_resource_manager.create_plan(
    PLAN    => 'LIMIT_EXEC_TIME',
    COMMENT => 'Kill statement after exceeding total execution time'
  );
  -- now let's create a plan directive for that special user group
  -- the plan will cancel the current SQL if it runs longer than SWITCH_TIME seconds (15 here)
  dbms_resource_manager.create_plan_directive(
    PLAN             => 'LIMIT_EXEC_TIME',
    GROUP_OR_SUBPLAN => 'GROUP_WITH_LIMITED_EXEC_TIME',
    COMMENT          => 'Kill statement after exceeding total execution time',
    SWITCH_GROUP     => 'CANCEL_SQL',
    SWITCH_TIME      => 15,
    SWITCH_ESTIMATE  => false
  );
  dbms_resource_manager.create_plan_directive(
    PLAN             => 'LIMIT_EXEC_TIME',
    GROUP_OR_SUBPLAN => 'OTHER_GROUPS',
    COMMENT          => 'leave others alone',
    CPU_P1           => 100
  );
  DBMS_RESOURCE_MANAGER.VALIDATE_PENDING_AREA;
  DBMS_RESOURCE_MANAGER.SUBMIT_PENDING_AREA();
end;
/
exec dbms_resource_manager_privs.grant_switch_consumer_group('PYTHIAN','GROUP_WITH_LIMITED_EXEC_TIME',false);
exec dbms_resource_manager.set_initial_consumer_group('PYTHIAN','GROUP_WITH_LIMITED_EXEC_TIME');
select * from DBA_RSRC_CONSUMER_GROUPS;
select * from DBA_RSRC_GROUP_MAPPINGS;
select * from DBA_RSRC_PLANS;
select * from DBA_RSRC_PLAN_DIRECTIVES;
-- to enable it:
ALTER SYSTEM SET RESOURCE_MANAGER_PLAN ='LIMIT_EXEC_TIME';
SELECT se.sid sess_id, co.name consumer_group,
se.state, se.consumed_cpu_time cpu_time, se.cpu_wait_time, se.queued_time
FROM v$rsrc_session_info se, v$rsrc_consumer_group co
WHERE se.current_consumer_group_id = co.id;
select username,resource_CONSUMER_GROUP,count(*) from v$session group by username,resource_CONSUMER_GROUP;
Partial Results
Queries can return partial results, but the query will also throw the exception "ORA-00040: active time limit exceeded - call aborted", which the client must ignore.
This can be simulated with a function that does a lot of CPU work:
create or replace function sleep_cpu return number authid current_user is
  v_loop number := 0;
begin
  for i in 1 .. 10000000 loop
    v_loop := v_loop + 1;
  end loop;
  return v_loop;
end;
/
SQL*Plus can demonstrate a client able to read partial results:
SQL> set timing on
SQL> select sleep_cpu()
2 from dual
3 connect by level <= 100;
SLEEP_CPU()
-----------
10000000
10000000
10000000
10000000
10000000
10000000
10000000
10000000
10000000
10000000
10000000
10000000
10000000
10000000
10000000
ERROR:
ORA-00040: active time limit exceeded - call aborted
15 rows selected.
Elapsed: 00:00:08.52
SQL>
Note the Elapsed time in this example is 8 seconds. I set the timeout to 5 seconds; this demonstrates that it's hard to get good precision.
CPU Time, Not Elapsed Time
Resource manager only counts CPU time, not elapsed time. This is despite what the documentation says. One of the comments in the Pythian article suggests this behavior can be changed with an ALTER SYSTEM SET EVENT = '10720 trace name context forever, level 16384' scope=spfile; (and a restart), but that didn't work for me.
For example, create this function:
create or replace function sleep_no_cpu return number authid current_user is
begin
  execute immediate 'begin dbms_lock.sleep(1); end;';
  return 1;
end;
/
This SELECT will run for the whole 100 seconds, because it uses almost no CPU time:
select sleep_no_cpu()
from dual
connect by level <= 100;
Assume we have two tables, named Tb1 and Tb2, and we are going to move data from one to the other. Tb1 is the main source of the data and Tb2 is the destination. This operation has 3 parts.
In the first part we validate all rows in Tb1 and check whether they are correct. For example, a national security code must have exactly 10 digits, or a real customer must have a valid birth date; according to these validation rules, 28 different validation methods and error codes have been defined. During validation, every spoiled row's description and status is updated to a new state.
Part 2 fixes the rows' problems and part 3 moves them to Tb2.
For instance, this row says that it has 4 different errors:
-- Tb1.desc=6,8,14,16
-- Tb1.sts=0
A correct row of data:
-- Tb1.desc=Null
-- Tb1.sts=1
I have been working on the first part recently and have come up with a solution which works fine, but it is too slow: it takes exactly 31 minutes to validate 100,000 rows. In a real situation we are going to validate more than 2 million records, so despite all its functionality it is effectively useless.
Let's take a look at my package:
procedure Val_primary IS
begin
  Open X_CUSTOMER;
  Loop
    fetch X_CUSTOMER bulk collect into CUSTOMER_RECORD;
    EXIT WHEN X_CUSTOMER%notfound;
    For i in CUSTOMER_RECORD.first..CUSTOMER_RECORD.last loop
      Val_CTYP(CUSTOMER_RECORD(i).XCUSTYP);
      Val_BRNCH(CUSTOMER_RECORD(i).XBRNCH);
      --Rest of the validations ...
      UptDate_Val(CUSTOMER_RECORD(i).Xrownum);
    end loop;
    CUSTOMER_RECORD.delete;
  End loop;
  Close X_CUSTOMER;
end Val_primary;
Inside a validation procedure :
procedure Val_CTYP(customer_type IN number) IS
Begin
  IF (customer_type < 1 or customer_type > 3) then
    RW_FINAL_STATUS := 0;
    FINAL_ERR_DSC := Concat(FINAL_ERR_DSC, ERR_INVALID_CTYP);
  End If;
End Val_CTYP;
Inside the update procedure :
procedure UptDate_Val(rownumb IN number) IS
begin
  update tb1
     set tb1.xstst = RW_FINAL_STATUS,
         tb1.xdesc = FINAL_ERR_DSC
   where xc1customer.xrownum = rownumb;
  RW_FINAL_STATUS := 1;
  FINAL_ERR_DSC := null;
end UptDate_Val;
Is there any way to reduce the execution time?
It must finish in less than 20 minutes for more than 2 million records.
Maybe each validation check could be a case expression within an inline view, and you could concatenate them etc in the enclosing query, giving you a single SQL statement that could drive an update. Something along the lines of:
select xxx, yyy, zzz -- whatever columns you need from xc1customer
, errors -- concatenation of all error codes that apply
, case when errors is not null then 0 else 1 end as status
from ( select xxx, yyy, zzz
, trim(ltrim(val_ctyp||' ') || ltrim(val_abc||' ') || ltrim(val_xyz||' ') || etc...) as errors
from ( select c.xxx, c.yyy, c.zzz
, case when customer_type < 1 or customer_type > 3 then err_invalid_ctyp end as val_ctyp
, case ... end as val_abc
, case ... end as val_xyz
from xc1customer c
)
);
Sticking with the procedural approach, the slow part seems to be the single-row update. There is no advantage to bulk-collecting all 2 million rows into session memory only to apply 2 million individual updates. The quick fix would be to add a LIMIT clause to the bulk collect (and move the exit to the bottom of the loop, where it should be), have your validation procedures set a value in the array instead of updating the table, and batch the updates into one FORALL per loop iteration.
You can be a bit freer with passing records and arrays in and out of procedures rather than having everything a global variable, as passing by reference means there is no performance overhead.
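A sketch of that batched version (names follow the question; the batch size and the DBMS_SQL collection types used for the scratch arrays are my own choices):
procedure val_primary is
   c_limit constant pls_integer := 1000;
   status_tab  dbms_sql.number_table;
   desc_tab    dbms_sql.varchar2_table;
   rownum_tab  dbms_sql.number_table;
begin
   open x_customer;
   loop
      fetch x_customer bulk collect into customer_record limit c_limit;
      exit when customer_record.count = 0;

      for i in 1 .. customer_record.count loop
         rw_final_status := 1;    -- reset per row, as UptDate_Val did
         final_err_dsc   := null;
         val_ctyp(customer_record(i).xcustyp);
         val_brnch(customer_record(i).xbrnch);
         -- rest of the validations ...
         status_tab(i) := rw_final_status;
         desc_tab(i)   := final_err_dsc;
         rownum_tab(i) := customer_record(i).xrownum;
      end loop;

      -- one bulk update per batch instead of one update per row
      forall i in 1 .. customer_record.count
         update tb1
            set xstst = status_tab(i),
                xdesc = desc_tab(i)
          where xrownum = rownum_tab(i);

      exit when customer_record.count < c_limit;
   end loop;
   close x_customer;
end val_primary;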
There are two potential lines of attack.
Specific implementation. Collections are read into session memory. This is usually quite small compared to global memory allocation. Reading 100000 longish rows into session memory is a bad idea and can cause performance issues. So breaking up the process into smaller chunks (say 1000 rows) will most likely improve throughput.
General implementation. What is the point of the tripartite process? Updating Table1 with some error flags is an expensive activity. A more efficient approach would be to apply the fixes to the data in the collection and apply that to Table2. You can write a log record if you need to track what changes are made.
Applying these suggestions, you'd end up with a single procedure that looks a bit like this:
procedure one_and_only is
begin
   open x_customer;
   << tab_loop >>
   loop
      fetch x_customer bulk collect into customer_record
         limit 1000;
      exit when customer_record.count() = 0;

      << rec_loop >>
      for i in customer_record.first..customer_record.last loop
         val_and_fix_ctyp(customer_record(i).xcustyp);
         val_and_fix_brnch(customer_record(i).xbrnch);
         -- rest of the validations ...
      end loop rec_loop;

      -- apply the cleaned data to target table
      forall j in 1..customer_record.count()
         insert into table_2
         values customer_record(j);
   end loop tab_loop;
   close x_customer;
end one_and_only;
Note that this approach requires the customer_record collection to match the projection of the target table. Also, don't use %notfound to test for end of the cursor unless you can guarantee the total number of read records is an exact multiple of the LIMIT number.
We have an UPDATE in production (below) which processes more or less the same number of rows each day but with drastically different runtimes. Some days the query finishes in 2 minutes, while on other days it might take 20 minutes. Per my analysis of the AWR data, the culprit is I/O wait time: whenever the query slows down, the cache hit ratio drops due to increased physical reads.
The outline of the query itself is below:
update /*+ nologging parallel ( a 12 ) */ huge_table1 a
set col = 1
where col1 > 'A'
and col2 < 'B'
and exists ( select /*+ parallel ( b 12 ) */ 1
from huge_table2 b
where b.col3 = a.col3 );
huge_table1 and huge_table2 contain about 100 million rows each, and the execution statistics are below:
Day EXECUTIONS ELAPSED_TIME_S_1EXEC CPU_TIME_S_1EXEC IOWAIT_S_1EXEC ROWS_PROCESSED_1EXEC BUFFER_GETS_1EXEC DISK_READS_1EXEC DIRECT_WRITES_1EXEC
------- ----------- -------------------- ---------------- -------------- -------------------- ----------------- ----------------- -------------------
1 1 133.055 69.110 23.325 2178085.000 3430367.000 90522.000 42561.000
2 1 123.580 65.020 20.282 2179404.000 3341566.000 86614.000 38925.000
3 1 1212.762 72.800 1105.084 1982658.000 3131695.000 268260.000 38446.000
4 1 1085.773 59.600 996.642 1965309.000 2954480.000 200612.000 26790.000
As seen above, the LIO has remained almost the same in each case, although the elapsed time increased on the 3rd and 4th days due to increased I/O waits, which, if my assumption is correct, were caused by an increase in PIO. Per Tom Kyte, tuning should focus on reducing LIO rather than PIO, and as LIO drops, so will PIO. But in this case the LIO has been constant throughout, while the PIO has varied significantly.
My question - What tuning strategy could be adopted here?
I would:
-> Check the execution plan for both cases.
-> Check the I/O subsystem's health.
-> Monitor the server while this runs and make sure the I/O subsystem is not saturated by another process.
Also, what kind of I/O leads the read events? Sequential, parallel, scattered? That gives you a lead on the strategy the plan is following to perform the update.
Is the buffer cache being resized? A small, cold buffer cache that gets resized during this big execution could mean blocks have to be read into the buffer cache before they can be updated.
These are some ideas based on the data you showed... please let us know what comes out!
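For the first point, one way to see whether the statement got a different plan on the slow days is to compare plan hash values in v$sql (the sql_id is a placeholder you would look up first):
select sql_id, plan_hash_value, executions,
       round(elapsed_time / 1e6) as elapsed_s,
       disk_reads, buffer_gets
from   v$sql
where  sql_id = '&sql_id';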
Recently I had a problem with a huge update. I found a good solution based on a parallel pipelined function, which decreased the update time significantly.
My proposition is not exactly what you asked for, but maybe this approach can give you short and stable run times from day to day:
Create a collection type:
CREATE type test_num_arr AS TABLE of INTEGER;
/
Create the updating pipelined function (you can of course adjust it):
create or replace FUNCTION test_parallel_update (
   test_cur IN SYS_REFCURSOR
)
RETURN test_num_arr
PARALLEL_ENABLE (PARTITION test_cur BY ANY)
PIPELINED
IS
   PRAGMA AUTONOMOUS_TRANSACTION;

   test_rec  HUGE_TABLE1%ROWTYPE;
   TYPE num_tab_t IS TABLE OF NUMBER(38);
   pk_tab    NUM_TAB_T;
   cnt       INTEGER := 0;
BEGIN
   LOOP
      FETCH test_cur BULK COLLECT INTO pk_tab LIMIT 1000;
      EXIT WHEN pk_tab.COUNT() = 0;

      FORALL i IN pk_tab.FIRST .. pk_tab.LAST
         UPDATE HUGE_TABLE1 a
            SET col = 1
          WHERE col1 > 'A'
            AND col2 < 'B'
            AND EXISTS ( SELECT 1
                         FROM   huge_table2 b
                         WHERE  b.col3 = a.col3 )
            AND a.id = pk_tab(i);

      cnt := cnt + pk_tab.COUNT;
   END LOOP;

   CLOSE test_cur;
   COMMIT;
   PIPE ROW(cnt);
   RETURN;
END;
Lastly, run your update:
SELECT * FROM TABLE(test_parallel_update(CURSOR(SELECT id FROM huge_table1)));
Approach based on:
http://www.orafaq.com/node/2450
To answer your question about the strategy: you should of course focus on LIO. Row access in the buffer cache is much faster than disk operations.
Regarding your problem: on the first days the execution time is good, and on the later days it is not. If you have indexes on the join columns (b.col3 = a.col3) and there is a lot of insertion into the tables, the statistics may be out of date, so your query can no longer use the indexes effectively and reads more blocks. That would match the increase in disk reads shown in your execution statistics.
In that case it would be necessary to run:
EXEC DBMS_STATS.gather_table_stats(schema, table_name);
You should gather statistics periodically with the scheduler, depending on how much your data changes.
During the day you could gather just index statistics with:
DBMS_STATS.GATHER_INDEX_STATS
And in the evening:
DBMS_STATS.GATHER_TABLE_STATS
which gathers table and column (and index) statistics.
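For example, the calls might look like this (the schema, table and index names are placeholders, not taken from your system):
EXEC DBMS_STATS.GATHER_INDEX_STATS(ownname => 'APP_SCHEMA', indname => 'HUGE_TABLE1_IDX');
EXEC DBMS_STATS.GATHER_TABLE_STATS(ownname => 'APP_SCHEMA', tabname => 'HUGE_TABLE1', cascade => TRUE);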
Beyond that, there is also the option of changing the data model: on large volumes, partitioned tables are a good approach to reducing I/O.
Hoping this can help.
As bubooal says, we can't help you without the execution plans and the table structures of the two tables. Could you give us those two pieces of information?
Maybe partitioning could help you reduce I/O.
Another possibility is to keep the two tables in your cache. It seems that the number of buffer gets is the same, so when the query hangs it's because your tables are no longer in the buffer cache. For that you could size db_keep_cache_size and pin your tables (or the relevant partitions) in that cache.
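For illustration only (the pool size is a placeholder; adjust to your system):
-- size the KEEP pool, then assign the tables to it
ALTER SYSTEM SET db_keep_cache_size = 4G SCOPE = BOTH;
ALTER TABLE huge_table1 STORAGE (BUFFER_POOL KEEP);
ALTER TABLE huge_table2 STORAGE (BUFFER_POOL KEEP);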
I have a table tmp_drop_ids with one column, id, and 3.3 million entries. I want to iterate over the table, doing something with every 200 entries. I have this code:
LIMIT = 200
for offset in xrange(0, drop_count+LIMIT, LIMIT):
print "Making tmp table with ids %s to %s/%s" % (offset, offset+LIMIT, drop_count)
query = """DROP TABLE IF EXISTS tmp_cur_drop_ids; CREATE TABLE tmp_cur_drop_ids AS
SELECT id FROM tmp_drop_ids ORDER BY id OFFSET %s LIMIT %s;""" % (offset, LIMIT)
cursor.execute(query)
This runs fine at first (~0.15 s to generate the tmp table), but it slows down occasionally; e.g. around 300k tickets it started taking 11-12 seconds to generate this tmp table, and again around 400k. It basically seems unreliable.
I will use those ids in other queries so I figured the best place to have them was in a tmp table. Is there any better way to iterate through results like this?
Use a cursor instead. Using OFFSET and LIMIT is pretty expensive, because Postgres has to execute the query, then process and skip OFFSET rows. OFFSET works like "skip rows", and that is expensive.
cursor documentation
A cursor allows iteration over a single query.
BEGIN
DECLARE C CURSOR FOR SELECT * FROM big_table;
FETCH 300 FROM C; -- get 300 rows
FETCH 300 FROM C; -- get 300 rows
...
COMMIT;
You can probably use a server-side cursor without an explicit DECLARE statement, relying on psycopg's built-in support (see the section about server-side cursors in its documentation).
If your ids are indexed, you can use LIMIT together with ">", for example in Python-like pseudocode:
limit = 200
max_processed_id = -1
query("create table tmp_cur_drop_ids(id int)")
while true:
    query("truncate tmp_cur_drop_ids")
    query("insert into tmp_cur_drop_ids(id)" \
          + " select id from tmp_drop_ids" \
          + " where id>%d order by id limit %d" % (max_processed_id, limit))
    max_processed_id = query("select max(id) from tmp_cur_drop_ids")
    if max_processed_id == None:
        break
    process_tmp_cur_drop_ids()
query("drop table tmp_cur_drop_ids")
This way Postgres can use index for your query.