Oracle multiple criteria dynamic SQL

I have a multiple-criteria search function where the user can enter or select different criteria to find results. Every criterion is optional, so any field value could be null. The PL/SQL backend processes each criterion value to construct a dynamic SQL statement.
Currently I process it the way shown below, but it is hard to debug and maintain.
jo := json_object_t(p_payload);
v_country   := jo.get_String('IAINST_NATN_CODE');
v_region    := jo.get_String('IAINST_REGN_CODE');
v_rank_code := jo.get_String('RANK_CODE');
v_year      := jo.get_String('RANK_YEAR');

v_sql := 'select * from IAVW_INST_JSON_TABLE i
          where ((:1 is null) or (i.IAINST_NATN_CODE = :1))
            and ((:2 is null) or (i.IAINST_REGN_CODE = :2))
            and ((:3 is null) or (i.RANK_CODE = :3))
            and ((:4 is null) or (i.RANK_YEAR = :4))';

OPEN c FOR v_sql
USING v_country,   v_country,   --1
      v_region,    v_region,    --2
      v_rank_code, v_rank_code, --3
      v_year,      v_year;      --4
RETURN c;
Any good advice on how to improve this?

I would only change the structure of the clauses to be like:
AND i.IAINST_REGN_CODE = NVL(:2, i.IAINST_REGN_CODE)
This way you avoid the OR and still won't interfere with any indexing there may be. Apart from that, your code looks fine (even without my suggestion).
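For illustration, the whole statement with that change might look like this (a sketch only; note that each bind now appears once, so the USING clause shrinks accordingly, and be aware that col = NVL(:b, col) does not match rows where the column itself is null):

v_sql := 'select * from IAVW_INST_JSON_TABLE i
          where i.IAINST_NATN_CODE = nvl(:1, i.IAINST_NATN_CODE)
            and i.IAINST_REGN_CODE = nvl(:2, i.IAINST_REGN_CODE)
            and i.RANK_CODE = nvl(:3, i.RANK_CODE)
            and i.RANK_YEAR = nvl(:4, i.RANK_YEAR)';
OPEN c FOR v_sql USING v_country, v_region, v_rank_code, v_year;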

After searching related posts, here is a summary:
In my scenario, the table contains around 5K rows, so
WHERE NVL(mycolumn,'NULL') = NVL(searchvalue,'NULL')
can simplify my dynamic SQL.
But if the table contains a large amount of data, the above approach is not efficient (there is a per-row cost to apply NVL to the column), so use the query below instead:
where ((MYCOLUMN = SEARCHVALUE) OR (MYCOLUMN is NULL and SEARCHVALUE is NULL))
For details, see this post: Determine Oracle null == null
https://asktom.oracle.com/pls/apex/f?p=100:11:0::::P11_QUESTION_ID:7806711400346248708

For parameters referencing non-nullable columns you can use
and t.somecol = nvl(:b1,t.somecol)
For this, the parser/optimiser will typically generate an execution plan with a union-all and a filter, so that the most efficient approach is used depending on whether :b1 is null or not (subject to database version, indexing, stats etc.).
select * from bigtable t where t.product_type = nvl(:b1,t.product_type)
---------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost | Time |
---------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 6000000 | 486000000 | 5679 | 00:00:01 |
| 1 | PX COORDINATOR | | | | | |
| 2 | PX SEND QC (RANDOM) | :TQ10000 | 6000000 | 486000000 | 5679 | 00:00:01 |
| 3 | VIEW | VW_ORE_1B35BA0F | 6000000 | 486000000 | 5679 | 00:00:01 |
| 4 | UNION-ALL | | | | | |
| * 5 | FILTER | | | | | |
| 6 | PX BLOCK ITERATOR | | 1200000 | 145200000 | 2840 | 00:00:01 |
| * 7 | TABLE ACCESS FULL | BIGTABLE | 1200000 | 145200000 | 2840 | 00:00:01 |
| * 8 | FILTER | | | | | |
| 9 | PX BLOCK ITERATOR | | 4800000 | 580800000 | 2840 | 00:00:01 |
| 10 | TABLE ACCESS FULL | BIGTABLE | 4800000 | 580800000 | 2840 | 00:00:01 |
---------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
------------------------------------------
* 5 - filter(:B1 IS NOT NULL)
* 7 - filter("T"."PRODUCT_TYPE"=:B1)
* 8 - filter(:B1 IS NULL)
However, the optimizer obviously can't keep extending this approach by generating union-alls for every possible combination of an arbitrarily large number of bind variables.
select * from bigtable t
where t.product_type = nvl(:b1,t.product_type)
and t.in_stock = nvl(:b2,t.in_stock)
and t.discounted = nvl(:b3,t.discounted)
---------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost | Time |
---------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 594 | 48114 | 5699 | 00:00:01 |
| 1 | PX COORDINATOR | | | | | |
| 2 | PX SEND QC (RANDOM) | :TQ10000 | 594 | 48114 | 5699 | 00:00:01 |
| 3 | VIEW | VW_ORE_1B35BA0F | 594 | 48114 | 5699 | 00:00:01 |
| 4 | UNION-ALL | | | | | |
| * 5 | FILTER | | | | | |
| 6 | PX BLOCK ITERATOR | | 119 | 14399 | 2844 | 00:00:01 |
| * 7 | TABLE ACCESS FULL | BIGTABLE | 119 | 14399 | 2844 | 00:00:01 |
| * 8 | FILTER | | | | | |
| 9 | PX BLOCK ITERATOR | | 475 | 57475 | 2854 | 00:00:01 |
| * 10 | TABLE ACCESS FULL | BIGTABLE | 475 | 57475 | 2854 | 00:00:01 |
---------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
------------------------------------------
* 5 - filter(:B1 IS NOT NULL)
* 7 - filter("T"."PRODUCT_TYPE"=:B1 AND "T"."IN_STOCK"=NVL(:B2,"T"."IN_STOCK") AND "T"."DISCOUNTED"=NVL(:B3,"T"."DISCOUNTED") AND (NVL(:B2,"T"."IN_STOCK")='Y' OR NVL(:B2,"T"."IN_STOCK")='N') AND
(NVL(:B3,"T"."DISCOUNTED")='Y' OR NVL(:B3,"T"."DISCOUNTED")='N'))
* 8 - filter(:B1 IS NULL)
* 10 - filter("T"."IN_STOCK"=NVL(:B2,"T"."IN_STOCK") AND "T"."DISCOUNTED"=NVL(:B3,"T"."DISCOUNTED") AND (NVL(:B2,"T"."IN_STOCK")='Y' OR NVL(:B2,"T"."IN_STOCK")='N') AND (NVL(:B3,"T"."DISCOUNTED")='Y'
OR NVL(:B3,"T"."DISCOUNTED")='N'))
The classic Tom Kyte/Bryn Llewellyn approach is to generate different SQL depending on whether each parameter is null or not null, while still binding each parameter exactly once. This will produce multiple different cursors, but at most 2^n for n parameters, and it's neat and efficient. The idea is that for each parameter value, you generate either
where t.column = :b1
if :b1 has a value, or else
where (1=1 or :b1 is null)
if it's null. You could logically skip the 1=1 part, but it takes advantage of some short-circuiting logic in the Oracle SQL parser that means it won't evaluate the or condition at all because it knows there is no need. For example,
select dummy from dual where 1=1 or sqrt(-1) > 1/0;
which returns 'X' without evaluating the impossible sqrt(-1) or 1/0 expressions.
Using this approach, your SQL would be generated as something like this:
v_sql := '
select * from iavw_inst_json_table i
where (1=1 or i.iainst_natn_code = :1)
and i.iainst_regn_code = :2
and i.rank_code = :3
and (1=1 or i.rank_year = :4)
';
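All four placeholders are still referenced exactly once, so the OPEN binds each value a single time (a sketch reusing the variables from the question):

OPEN c FOR v_sql USING v_country, v_region, v_rank_code, v_year;
RETURN c;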
You could use a procedure to generate the parameter handling SQL:
declare
    l_report_sql   clob := 'select * from bigtable t where 1=1';
    l_product_type bigtable.product_type%type;
    l_in_stock     bigtable.in_stock%type := 'Y';
    l_discounted   bigtable.discounted%type := 'N';

    procedure apply_bind
        ( p_bind#         in number
        , p_column_name   in varchar2
        , p_value_is_null in boolean
        , p_sql           in out clob )
    is
    begin
        p_sql := p_sql || chr(10) || 'and ' ||
            case
                when p_value_is_null then '(1=1 or :'||p_bind#||' is null)'
                else p_column_name||' = :'||p_bind#
            end;
    end;
begin
    apply_bind(1, 't.product_type', l_product_type is null, l_report_sql);
    apply_bind(2, 't.in_stock', l_in_stock is null, l_report_sql);
    apply_bind(3, 't.discounted', l_discounted is null, l_report_sql);
    dbms_output.put_line(l_report_sql);
    open :results for l_report_sql using l_product_type, l_in_stock, l_discounted;
end;
My example gives:
select * from bigtable t where 1=1
and (1=1 or :1 is null)
and t.in_stock = :2
and t.discounted = :3
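Adapted to the original question, the calls might look something like this (a hedged sketch reusing the question's variable names, with v_sql assumed to start as the bare select plus a where 1=1):

v_sql := 'select * from iavw_inst_json_table i where 1=1';
apply_bind(1, 'i.iainst_natn_code', v_country is null, v_sql);
apply_bind(2, 'i.iainst_regn_code', v_region is null, v_sql);
apply_bind(3, 'i.rank_code', v_rank_code is null, v_sql);
apply_bind(4, 'i.rank_year', v_year is null, v_sql);
OPEN c FOR v_sql USING v_country, v_region, v_rank_code, v_year;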

Related

Performance problem with QUERY using BIND variables and OR condition in Oracle 12.2

I am having a hard time understanding why the Oracle CBO behaves the way it does when a bind variable is part of an OR condition.
My environment
Oracle 12.2 over Red Hat Linux 7
Note: I am just providing a simplification of the query where the problem is located.
$ sqlplus / as sysdba
SQL*Plus: Release 12.2.0.1.0 Production on Thu Jun 10 15:40:07 2021
Copyright (c) 1982, 2016, Oracle. All rights reserved.
Connected to:
Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production
SQL> #test.sql
SQL> var loanIds varchar2(4000);
SQL> exec :loanIds := '100000018330,100000031448,100000013477,100000023115,100000022550,100000183669,100000247514,100000048198,100000268289';
PL/SQL procedure successfully completed.
Elapsed: 00:00:00.00
SQL> SELECT
2 whs.* ,
3 count(*) over () AS TOTAL
4 FROM ALFAMVS.WHS_LOANS whs
5 WHERE
6 ( nvl(:loanIds,'XX') = 'XX' or
7 loanid IN (select regexp_substr(NVL(:loanIds,''),'[^,]+', 1, level) from dual
8 connect by level <= regexp_count(:loanIds,'[^,]+'))
9 )
10 ;
7 rows selected.
Elapsed: 00:00:18.72
Execution Plan
----------------------------------------------------------
Plan hash value: 2980809427
------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 6729 | 6748K| 2621 (1)| 00:00:01 |
| 1 | WINDOW BUFFER | | 6729 | 6748K| 2621 (1)| 00:00:01 |
|* 2 | FILTER | | | | | |
| 3 | TABLE ACCESS FULL | WHS_LOANS | 113K| 110M| 2621 (1)| 00:00:01 |
|* 4 | FILTER | | | | | |
|* 5 | CONNECT BY WITHOUT FILTERING (UNIQUE)| | | | | |
| 6 | FAST DUAL | | 1 | | 2 (0)| 00:00:01 |
------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter(NVL(:LOANIDS,'XX')='XX' OR EXISTS (SELECT 0 FROM "DUAL" "DUAL" WHERE
SYS_OP_C2C( REGEXP_SUBSTR (NVL(:LOANIDS,''),'[^,]+',1,LEVEL))=:B1 CONNECT BY LEVEL<=
REGEXP_COUNT (:LOANIDS,'[^,]+')))
4 - filter(SYS_OP_C2C( REGEXP_SUBSTR (NVL(:LOANIDS,''),'[^,]+',1,LEVEL))=:B1)
5 - filter(LEVEL<= REGEXP_COUNT (:LOANIDS,'[^,]+'))
Statistics
----------------------------------------------------------
288 recursive calls
630 db block gets
9913 consistent gets
1 physical reads
118724 redo size
13564 bytes sent via SQL*Net to client
608 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
113003 sorts (memory)
0 sorts (disk)
7 rows processed
SQL> set autotrace off
SQL> select count(*) from ALFAMVS.WHS_LOANS ;
COUNT(*)
----------
113095
1 row selected.
Elapsed: 00:00:00.14
KEY POINTS
I know that if I replace the OR expression with two selects combined by UNION ALL, it works perfectly. The problem is that I have a lot of conditions written the same way, so UNION ALL is not a solution in my case.
The table has statistics up to date calculated with FOR ALL COLUMNS SIZE AUTO and with ESTIMATE PERCENT 10%.
Dynamic SQL is not a solution in my case, because the query is called through third-party software that uses a web API to convert the result to JSON.
I was able to rephrase the regular expression with connect by level in a way that now takes 19 seconds. Before it was taking 40 seconds.
The table has only 113K records and no indexes.
The query has 20 conditions of this kind, all written in the same way, as the screen in the web app that triggers the query by the API allows the user to use any combination of parameters or none at all.
If I remove the expression NVL(:loanIds,'XX') = 'XX' OR, the query takes 0.01 seconds. Why is this OR expression with binds giving the optimizer such a headache?
-- UPDATE --
I want to thank @Alex Poole for his suggestions, and share that the third alternative (removing the regular expressions) has worked like a charm. It would be great to understand why, though. You have my most sincere gratitude. I had used those constructs for a while and never had this problem. The suggestion to use regexp_like was also better than the original regexp_substr with connect by level, but still far slower than the version that uses no regular expressions at all.
Original query
7 rows selected.
Elapsed: 00:00:36.29
New query
7 rows selected.
Elapsed: 00:00:00.58
Once the EXISTS disappeared from the internal predicate, the query runs as fast as hell.
Thank you all for your comments !
From the execution plan, the optimiser is, for some reason, re-evaluating the hierarchical query for every row in your table, and then using exists() to see if that row's ID is in the result. It isn't clear why the or is causing that; it might be something to raise with Oracle.
From experimenting I can see three ways to at least partially work around the problem - though I'm sure there are others. The first is to move the CSV expansion to a CTE and then force that to materialize with a hint:
WITH loanIds_cte (loanId) as (
select /*+ materialize */ regexp_substr(:loanIds,'[^,]+', 1, level)
from dual
connect by level <= regexp_count(:loanIds,'[^,]+')
)
SELECT
whs.* ,
count(*) over () AS TOTAL
FROM WHS_LOANS whs
WHERE
( :loanIds is null or
loanid IN (select loanId from loanIds_cte)
)
;
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------
Plan hash value: 3226738189
--------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1102 | 9918 | 11 (0)| 00:00:01 |
| 1 | TEMP TABLE TRANSFORMATION | | | | | |
| 2 | LOAD AS SELECT | SYS_TEMP_0FD9FD2A6_198A2E1A | | | | |
|* 3 | CONNECT BY WITHOUT FILTERING| | | | | |
| 4 | FAST DUAL | | 1 | | 2 (0)| 00:00:01 |
| 5 | WINDOW BUFFER | | 1102 | 9918 | 9 (0)| 00:00:01 |
|* 6 | FILTER | | | | | |
| 7 | TABLE ACCESS FULL | WHS_LOANS | 11300 | 99K| 9 (0)| 00:00:01 |
|* 8 | VIEW | | 1 | 2002 | 2 (0)| 00:00:01 |
| 9 | TABLE ACCESS FULL | SYS_TEMP_0FD9FD2A6_198A2E1A | 1 | 2002 | 2 (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - filter(LEVEL<= REGEXP_COUNT (:LOANIDS,'[^,]+'))
6 - filter(:LOANIDS IS NULL OR EXISTS (SELECT 0 FROM (SELECT /*+ CACHE_TEMP_TABLE ("T1") */ "C0"
"LOANID" FROM "SYS"."SYS_TEMP_0FD9FD2A6_198A2E1A" "T1") "LOANIDS_CTE" WHERE SYS_OP_C2C("LOANID")=:B1))
8 - filter(SYS_OP_C2C("LOANID")=:B1)
That still does the odd transformation to exists(), but at least now it is querying the materialized CTE, so the connect by query is only evaluated once.
Or you could compare each loanId value with the full string using a regular expression:
SELECT
whs.* ,
count(*) over () AS TOTAL
FROM WHS_LOANS whs
WHERE
( :loanIds is null or
regexp_like(:loanIds, '(^|,)' || loanId || '(,|$)')
)
;
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------
Plan hash value: 1622376598
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1102 | 9918 | 9 (0)| 00:00:01 |
| 1 | WINDOW BUFFER | | 1102 | 9918 | 9 (0)| 00:00:01 |
|* 2 | TABLE ACCESS FULL| WHS_LOANS | 1102 | 9918 | 9 (0)| 00:00:01 |
--------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter(:LOANIDS IS NULL OR REGEXP_LIKE
(:LOANIDS,SYS_OP_C2C(U'(^|,)'||"LOANID"||U'(,|$)')))
which is slower than the CTE in my testing, because regular expressions are still expensive and you're doing 113k of them (still, better than 2 x 113k x number-of-elements of them).
Or you can avoid regular expressions and use several separate comparisons:
SELECT
whs.* ,
count(*) over () AS TOTAL
FROM WHS_LOANS whs
WHERE
( :loanIds is null or
:loanIds like loanId || ',%' or
:loanIds like '%,' || loanId or
:loanIds like '%,' || loanId || ',%'
)
;
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------
Plan hash value: 1622376598
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2096 | 18864 | 9 (0)| 00:00:01 |
| 1 | WINDOW BUFFER | | 2096 | 18864 | 9 (0)| 00:00:01 |
|* 2 | TABLE ACCESS FULL| WHS_LOANS | 2096 | 18864 | 9 (0)| 00:00:01 |
--------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter(:LOANIDS IS NULL OR :LOANIDS LIKE
SYS_OP_C2C("LOANID"||U',%') OR :LOANIDS LIKE
SYS_OP_C2C(U'%,'||"LOANID") OR :LOANIDS LIKE
SYS_OP_C2C(U'%,'||"LOANID"||U',%'))
which is fastest of those three options in my limited testing. But there may well be better and faster approaches.
Not really relevant, but you seem to be running this as SYS, which isn't a good idea even if the data is in another schema. Your loanId column appears to be nvarchar2 (from the SYS_OP_C2C calls), which seems odd for something that could possibly be a number and in any case only seems likely to hold ASCII characters. NVL(:loanIds,'') doesn't do anything, since null and the empty string are the same in Oracle. And nvl(:loanIds,'XX') = 'XX' can be written as :loanIds is null, which avoids magic values.

Oracle SQL Execution Plan Difference

I have the below query and I'm generating the execution plan for it:
(EDIT: I'm using SQL Developer).
EXPLAIN PLAN
FOR
WITH aux AS (
SELECT
*
FROM
cdc.uap_fkkvkp#rbip
WHERE
ezawe = 'D'
)
SELECT /* FULL(a) FULL(b) */
-- COUNT(1)
b.zzpayment_plan,
b.vkont,
b.gpart,
a.opbel,
a.opupw,
a.opupk,
a.opupz,
a.blart,
a.betrw
FROM
cdc.uap_dfkkop#rbip a
JOIN aux b ON b.vkont = a.vkont
WHERE
a.augst IS NULL
AND a.xanza IS NULL
AND a.stakz IS NULL
AND a.augrs IS NULL
AND a.abwtp IS NULL;
SELECT
*
FROM
TABLE ( dbms_xplan.display );
It gives the below plan:
Plan hash value: 289441478
-----------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 4404K| 634M| | 6752K (1)| 00:04:24 |
|* 1 | HASH JOIN | | 4404K| 634M| 291M| 6752K (1)| 00:04:24 |
|* 2 | TABLE ACCESS STORAGE FULL | UAP_FKKVKP | 4487K| 239M| | 559K (1)| 00:00:22 |
|* 3 | TABLE ACCESS BY INDEX ROWID BATCHED| UAP_DFKKOP | 4404K| 399M| | 6160K (1)| 00:04:01 |
| 4 | BITMAP CONVERSION TO ROWIDS | | | | | | |
| 5 | BITMAP AND | | | | | | |
|* 6 | BITMAP INDEX SINGLE VALUE | UAP_DFKKOP_AUGST_2 | | | | | |
|* 7 | BITMAP INDEX SINGLE VALUE | UAP_DFKKOP_NEW_STAKZ_2 | | | | | |
|* 8 | BITMAP INDEX SINGLE VALUE | UAP_DFKKOP_AUGRS_2 | | | | | |
-----------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("UAP_FKKVKP"."VKONT"="A"."VKONT")
2 - storage("EZAWE"=U'D')
filter("EZAWE"=U'D')
3 - filter("A"."XANZA" IS NULL AND "A"."ABWTP" IS NULL)
6 - access("A"."AUGST" IS NULL)
7 - access("A"."STAKZ" IS NULL)
8 - access("A"."AUGRS" IS NULL)
Note
-----
- dynamic statistics used: dynamic sampling (level=2)
- 1 Sql Plan Directive used for this statement
However, if I generate the plan for the same code but with COUNT(1) uncommented (and the other select-list columns commented out), the below shows; apart from the extra SORT AGGREGATE step, it is essentially the same as the previous execution plan:
Plan hash value: 2732266276
------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 66 | | 6736K (1)| 00:04:24 |
| 1 | SORT AGGREGATE | | 1 | 66 | | | |
|* 2 | HASH JOIN | | 4404K| 277M| 171M| 6736K (1)| 00:04:24 |
|* 3 | TABLE ACCESS STORAGE FULL | UAP_FKKVKP | 4487K| 119M| | 559K (1)| 00:00:22 |
|* 4 | TABLE ACCESS BY INDEX ROWID BATCHED| UAP_DFKKOP | 4404K| 159M| | 6160K (1)| 00:04:01 |
| 5 | BITMAP CONVERSION TO ROWIDS | | | | | | |
| 6 | BITMAP AND | | | | | | |
|* 7 | BITMAP INDEX SINGLE VALUE | UAP_DFKKOP_AUGST_2 | | | | | |
|* 8 | BITMAP INDEX SINGLE VALUE | UAP_DFKKOP_NEW_STAKZ_2 | | | | | |
|* 9 | BITMAP INDEX SINGLE VALUE | UAP_DFKKOP_AUGRS_2 | | | | | |
------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("UAP_FKKVKP"."VKONT"="A"."VKONT")
3 - storage("EZAWE"=U'D')
filter("EZAWE"=U'D')
4 - filter("A"."XANZA" IS NULL AND "A"."ABWTP" IS NULL)
7 - access("A"."AUGST" IS NULL)
8 - access("A"."STAKZ" IS NULL)
9 - access("A"."AUGRS" IS NULL)
Note
-----
- dynamic statistics used: dynamic sampling (level=2)
- 1 Sql Plan Directive used for this statement
If I try to execute both queries, the first one takes about 3 secs while the second one takes almost 8 mins.
Why is there such a difference between the two and why is it not captured in the execution plan?
Is my full table scan here correctly applied? If not, what would be the best hint?
What would be the best (up to date) book/documentation/online tutorials to upgrade my SQL performance skills? So far, I've seen that Oracle has a dev gym for performance and there are also some books like Advanced Oracle SQL Tuning which looks interesting.
With EXPLAIN there is no real execution. You will get far more detail from a real execution after setting statistics_level to ALL at session level.
So run it in SQL*Plus:
set serverout off
alter session set statistics_level=all;
<execute the query here without explain>
select * from table(dbms_xplan.display_cursor(null,null,'RUNSTATS_LAST'));
If you provide two such plans instead of EXPLAIN PLAN it will be easier for your readers.
Short description of the important columns (Tanel Poder's site is good):
Starts - number of execution of the step
A-Rows - actual number of rows received from this step
Buffers - number of reads from RAM - you have to decrease this value
Reads - number of reads from disk
Writes - number of writes to disk - different from 0 - mainly for SELECTs when TEMP is used
This method for plan usage is shown for example here : https://tanelpoder.com/files/Oracle_SQL_Plan_Execution.pdf
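If you can't change statistics_level for the session, an alternative (assuming you can edit the statement) is the gather_plan_statistics hint, which collects the same row-source statistics for just that one statement; 'ALLSTATS LAST' is the documented format option:

select /*+ gather_plan_statistics */ whs.*, count(*) over () as total
from alfamvs.whs_loans whs
where ... ;

select * from table(dbms_xplan.display_cursor(null, null, 'ALLSTATS LAST'));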

Query Optimization - subselect in Left Join

I'm working on optimizing a SQL query, and I found a particular set of lines that appears to be killing my query's performance:
LEFT JOIN anothertable lastweek
AND lastweek.date>= (SELECT MAX(table.date)-7 max_date_lweek
FROM table table
WHERE table.id= lastweek.id)
AND lastweek.date< (SELECT MAX(table.date) max_date_lweek
FROM table table
WHERE table.id= lastweek.id)
I'm working on a way of optimizing these lines, but I'm stumped. If anyone has any ideas, please let me know!
-----------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost | Time |
-----------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1908654 | 145057704 | 720461 | 00:00:29 |
| * 1 | HASH JOIN RIGHT OUTER | | 1908654 | 145057704 | 720461 | 00:00:29 |
| 2 | VIEW | VW_DCL_880D8DA3 | 427487 | 7694766 | 716616 | 00:00:28 |
| * 3 | HASH JOIN | | 427487 | 39328804 | 716616 | 00:00:28 |
| 4 | VIEW | VW_SQ_2 | 7174144 | 193701888 | 278845 | 00:00:11 |
| 5 | HASH GROUP BY | | 7174144 | 294139904 | 278845 | 00:00:11 |
| 6 | TABLE ACCESS STORAGE FULL | TASK | 170994691 | 7010782331 | 65987 | 00:00:03 |
| * 7 | HASH JOIN | | 8549735 | 555732775 | 429294 | 00:00:17 |
| 8 | VIEW | VW_SQ_1 | 7174144 | 172179456 | 278845 | 00:00:11 |
| 9 | HASH GROUP BY | | 7174144 | 294139904 | 278845 | 00:00:11 |
| 10 | TABLE ACCESS STORAGE FULL | TASK | 170994691 | 7010782331 | 65987 | 00:00:03 |
| 11 | TABLE ACCESS STORAGE FULL | TASK | 170994691 | 7010782331 | 65987 | 00:00:03 |
| * 12 | TABLE ACCESS STORAGE FULL | TASK | 1908654 | 110701932 | 2520 | 00:00:01 |
-----------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
------------------------------------------
* 1 - access("SYS_ID"(+)="TASK"."PARENT")
* 3 - access("ITEM_2"="TASK_LWEEK"."SYS_ID")
* 3 - filter("TASK_LWEEK"."SNAPSHOT_DATE"<"MAX_DATE_LWEEK")
* 7 - access("ITEM_1"="TASK_LWEEK"."SYS_ID")
* 7 - filter("TASK_LWEEK"."SNAPSHOT_DATE">=INTERNAL_FUNCTION("MAX_DATE_LWEEK"))
* 12 - storage("TASK"."CLOSED_AT" IS NULL OR "TASK"."CLOSED_AT">=TRUNC(SYSDATE#!)-15)
* 12 - filter("TASK"."CLOSED_AT" IS NULL OR "TASK"."CLOSED_AT">=TRUNC(SYSDATE#!)-15)
Well, you are not even showing the select. Since the select runs on Exadata (TABLE ACCESS STORAGE FULL), perhaps you need to ask yourself why you need to access the same table four times.
You access the main table TASK, with 170994691 rows (based on the CBO's estimation), four times (plan lines 6, 10, 11, 12). I don't know whether the statistics are up to date or whether optimizer dynamic sampling kicked in due to a lack of good statistics.
A solution could be to use WITH to generate intermediate results that you need several times in your outer query:
with my_set as
(SELECT MAX(table.date)-7 max_date_lweek,
        MAX(table.date) as max_date,
        id
 FROM table
 GROUP BY id)
select
.......................
from ...
left join anothertable lastweek on ( ........ )
left join my_set on ( anothertable.id = my_set.id )
where
lastweek.date >= my_set.max_date_lweek
and
lastweek.date < my_set.max_date
Please, take in account that you did not provide the query, so I am guessing a lot of things.
Since complete information is not available, I will suggest: you are using the same subquery twice, so why not use a CTE, such as
with CTE_example as (
    SELECT MAX(table.date) AS max_date,
           MAX(table.date) - 7 AS max_date_lweek,
           id
    FROM table
    GROUP BY id
)
Looking at your explain plan, the only table being accessed is TASK. From that, I infer that the tables in your example: ANOTHERTABLE and TABLE are actually the same table and that, therefore, you are trying to get the last week of data that exists in that table for each id value.
If all that is true, it should be much faster to use an analytic function to get the max date value for each id and then limit based on that.
Here is an example of what I mean. Note I use "dte" instead of "date", to remove confusion with the reserved word "date".
LEFT JOIN ( SELECT lastweek.*,
max(dte) OVER ( PARTITION BY id ) max_date
FROM anothertable lastweek ) lastweek
ON 1=1 -- whatever other join conditions you have, seemingly omitted from your post
AND lastweek.dte >= lastweek.max_date - 7;
Again, this only works if I am correct in thinking that table and anothertable are actually the same table.

How to use an object to optimise a query

I need to optimise this query by creating an object. But I don't know how to do it, and I don't understand why using an object can optimise this query in this case.
I have a WINE table: (I cannot change the data type in this case)
CREATE TABLE wine (
vintage NUMBER(4) NOT NULL,
wine_no SMALLINT NOT NULL,
vid CHAR(08) NOT NULL,
cid CHAR(06) NOT NULL,
pctalc NUMBER(4, 2),
price NUMBER(6, 2),
grade CHAR(01) NOT NULL,
wname CHAR(40) NOT NULL,
comments CHAR(200) NOT NULL
);
I tried to create an object by following this link: https://docs.oracle.com/cd/B19306_01/appdev.102/b14261/objects.htm
but I don't know whether that is the right track or how to implement it.
This is the query I need to optimise:
SELECT w.wname,
SUM(w.price) sold_total
FROM wine w
GROUP BY w.wname;
This is my explain plan; I would like the query to run faster:
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 4045097665
-------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 45 | 32 (4)| 00:00:01 |
| 1 | HASH GROUP BY | | 1 | 45 | 32 (4)| 00:00:01 |
| 2 | TABLE ACCESS FULL| WINE | 1500 | 67500 | 31 (0)| 00:00:01 |
-------------------------------------------------------------------------------
9 rows selected.
Any thoughts?
Is there any way to optimise this query (without changing data types)?
Could someone help me and teach me? Thanks a lot!
This is the query I need to optimise:
SELECT w.wname,
SUM(w.price) sold_total
FROM wine w
GROUP BY w.wname;
How do you expect Oracle to tell you the total price for every single distinct value of WNAME without reading every row in the table and adding everything up?
Answer: it can't. It's a great database, but it's not magic.
Now, what you can do is give Oracle something else to read instead to get the answer... something smaller than the whole table.
Option 1 - Covering Index
The easy way to do this is to make a so-called "covering" index on the table. A "covering" index is one that contains all of the columns that you use in your query, so that Oracle can use the index instead of the table. E.g.,
CREATE INDEX wine_sum_n1 ON wine (wname, price);
However, in your case, your table rows are not very wide. So, a covering index won't be that much smaller than the actual table. It would help though and it is a very easy approach.
Option 2 - Materialized View with ON QUERY COMPUTATION
Another way to give Oracle a smaller thing to read is to pre-compute all the sums in a materialized view. This is always problematic, because any DML changes to your table will cause the materialized view to become stale and you'll lose the performance benefits unless and until something refreshes it.
(Oracle has an ON COMMIT REFRESH option that avoids this problem, but that has several dangers and limitations. I avoid it for having been burned in the past, but it's still worth reading up on).
Oracle 12.2 introduced a really cool option for materialized views called ON QUERY COMPUTATION. This feature allows Oracle to still use materialized views, even if they are stale, by joining in data from the materialized view log. It could be a good option for you, so I'll give a full example, below.
-- Setup
DROP TABLE wine;
DROP MATERIALIZED VIEW wine_name_sum_mv;
CREATE TABLE wine (
vintage NUMBER(4) NOT NULL,
wine_no SMALLINT NOT NULL,
vid CHAR(08) NOT NULL,
cid CHAR(06) NOT NULL,
pctalc NUMBER(4, 2),
price NUMBER(6, 2),
grade CHAR(01) NOT NULL,
wname CHAR(40) NOT NULL,
comments CHAR(200) NOT NULL
);
INSERT INTO wine
SELECT mod(rownum,10000) vintage,
rownum wine_No,
'xxxxxxxx' vid,
'yyyyyy' cid,
0 pctalc,
50 price,
'z' grade,
'WINE #' || mod(rownum,100) wname,
'made up data for wine' comments
FROM DUAL
CONNECT BY ROWNUM <= 100000;
COMMIT;
CREATE MATERIALIZED VIEW LOG ON wine
WITH ROWID
(wname, price)
INCLUDING NEW VALUES;
CREATE MATERIALIZED VIEW wine_name_sum_mv
REFRESH FAST ON DEMAND
ENABLE QUERY REWRITE
ENABLE ON QUERY COMPUTATION
AS
SELECT w.wname,
sum(w.price) sold_total
FROM wine w
GROUP BY w.wname;
-- Verify material view is being used
EXPLAIN PLAN
SET STATEMENT_ID = 'MMCP001' FOR
SELECT w.wname,
SUM(w.price) sold_total
FROM wine w
GROUP BY w.wname;
-------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 100 | 4400 | 3 (0)| 00:00:01 |
| 1 | MAT_VIEW REWRITE ACCESS FULL| WINE_NAME_SUM_MV | 100 | 4400 | 3 (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------
-- Run the INSERT again to change the underlying table
INSERT INTO wine
SELECT mod(rownum,10000) vintage,
rownum wine_No,
'xxxxxxxx' vid,
'yyyyyy' cid,
0 pctalc,
50 price,
'z' grade,
'WINE #' || mod(rownum,100) wname,
'made up data for wine' comments
FROM DUAL
CONNECT BY ROWNUM <= 100000;
-- Verify whether material view is still being used
EXPLAIN PLAN
SET STATEMENT_ID = 'MMCP001' FOR
SELECT w.wname,
SUM(w.price) sold_total
FROM wine w
GROUP BY w.wname;
--------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 210 | 11550 | 30 (14)| 00:00:01 |
| 1 | VIEW | | 210 | 11550 | 30 (14)| 00:00:01 |
| 2 | UNION-ALL | | | | | |
|* 3 | VIEW | VW_FOJ_0 | 100 | 5800 | 10 (10)| 00:00:01 |
|* 4 | HASH JOIN FULL OUTER | | 100 | 2500 | 10 (10)| 00:00:01 |
| 5 | VIEW | | 10 | 80 | 7 (15)| 00:00:01 |
| 6 | HASH GROUP BY | | 10 | 640 | 7 (15)| 00:00:01 |
|* 7 | TABLE ACCESS FULL | MLOG$_WINE | 1000 | 64000 | 6 (0)| 00:00:01 |
| 8 | VIEW | | 100 | 1700 | 3 (0)| 00:00:01 |
| 9 | MAT_VIEW ACCESS FULL | WINE_NAME_SUM_MV | 100 | 4400 | 3 (0)| 00:00:01 |
|* 10 | VIEW | VW_FOJ_1 | 100 | 7100 | 10 (10)| 00:00:01 |
|* 11 | HASH JOIN FULL OUTER | | 100 | 3700 | 10 (10)| 00:00:01 |
| 12 | VIEW | | 10 | 300 | 7 (15)| 00:00:01 |
| 13 | HASH GROUP BY | | 10 | 640 | 7 (15)| 00:00:01 |
|* 14 | TABLE ACCESS FULL | MLOG$_WINE | 1000 | 64000 | 6 (0)| 00:00:01 |
| 15 | VIEW | | 100 | 700 | 3 (0)| 00:00:01 |
| 16 | MAT_VIEW ACCESS FULL | WINE_NAME_SUM_MV | 100 | 4400 | 3 (0)| 00:00:01 |
| 17 | MERGE JOIN | | 10 | 1150 | 10 (20)| 00:00:01 |
| 18 | MAT_VIEW ACCESS BY INDEX ROWID| WINE_NAME_SUM_MV | 100 | 4400 | 2 (0)| 00:00:01 |
| 19 | INDEX FULL SCAN | I_SNAP$_WINE_NAME_SUM_MV | 100 | | 1 (0)| 00:00:01 |
|* 20 | SORT JOIN | | 10 | 710 | 8 (25)| 00:00:01 |
| 21 | VIEW | | 10 | 710 | 7 (15)| 00:00:01 |
| 22 | HASH GROUP BY | | 10 | 640 | 7 (15)| 00:00:01 |
|* 23 | TABLE ACCESS FULL | MLOG$_WINE | 1000 | 64000 | 6 (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - filter("AV$0"."OJ_MARK" IS NULL)
4 - access(SYS_OP_MAP_NONNULL("SNA$0"."WNAME")=SYS_OP_MAP_NONNULL("AV$0"."GB0"))
7 - filter("MAS$"."SNAPTIME$$">TO_DATE(' 2019-09-19 15:02:46', 'syyyy-mm-dd hh24:mi:ss'))
10 - filter("SNA$0"."SNA_OJ_MARK" IS NULL)
11 - access(SYS_OP_MAP_NONNULL("SNA$0"."WNAME")=SYS_OP_MAP_NONNULL("AV$0"."GB0"))
14 - filter("MAS$"."SNAPTIME$$">TO_DATE(' 2019-09-19 15:02:46', 'syyyy-mm-dd hh24:mi:ss'))
20 - access(SYS_OP_MAP_NONNULL("WNAME")=SYS_OP_MAP_NONNULL("AV$0"."GB0"))
filter(SYS_OP_MAP_NONNULL("WNAME")=SYS_OP_MAP_NONNULL("AV$0"."GB0"))
23 - filter("MAS$"."SNAPTIME$$">TO_DATE(' 2019-09-19 15:02:46', 'syyyy-mm-dd hh24:mi:ss'))
What this is showing is that Oracle still benefits a lot from the materialized view. ON QUERY COMPUTATION seems like a really cool feature that gets us around many of the historical drawbacks of materialized views. DISCLOSURE: I have not used it yet in Production code. There may be pitfalls!
Also, you still want to refresh your materialized views periodically. The more data there is in the materialized view logs, the less ON QUERY COMPUTATION will help you.
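For example, a periodic fast refresh applies and empties the materialized view log (a minimal sketch; method => 'F' requests a fast/incremental refresh):

BEGIN
    DBMS_MVIEW.REFRESH(list => 'WINE_NAME_SUM_MV', method => 'F');
END;
/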
Creating a PL/SQL Object type won't do anything to make your query faster.
Here's the plan for your query on a 19c database, no data, no stats, no indexes.
SQL_ID 703yw7hub9rq2, child number 0
-------------------------------------
SELECT w.wname, SUM(w.price) sold_total FROM wine w GROUP BY
w.wname
Plan hash value: 385313506
--------------------------------------------
| Id | Operation | Name | E-Rows |
--------------------------------------------
| 0 | SELECT STATEMENT | | |
| 1 | HASH GROUP BY | | 1 |
| 2 | TABLE ACCESS FULL| WINE | 1 |
--------------------------------------------
Note
-----
- dynamic statistics used: dynamic sampling (level=2)
- Warning: basic plan statistics not available. These are only collected when:
* hint 'gather_plan_statistics' is used for the statement or
* parameter 'statistics_level' is set to 'ALL', at session or system level
For better help on your question, describe your performance problem. Show us the Execution Plan of your problematic SQL. Tell us about your STATS and any indexes you have.
General design feedback: I think what you want for your text columns, such as COMMENTS, is VARCHAR2, not CHAR.
CHAR(8) will always take up 8 bytes (for single-byte data), even for strings of length 1, 2, 3..7. VARCHAR2 only stores the data as entered.
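A quick way to see the difference (a made-up demo table, purely for illustration):

CREATE TABLE pad_demo (c CHAR(8), v VARCHAR2(8));
INSERT INTO pad_demo VALUES ('abc', 'abc');

-- CHAR blank-pads to the declared length; VARCHAR2 keeps what was entered.
SELECT LENGTH(c) AS char_len,    -- 8
       LENGTH(v) AS varchar_len  -- 3
FROM pad_demo;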

Trying to optimize a *random* query in Oracle SQL

I need to optimize a procedure in Oracle SQL, mainly using indexes. This is the statement:
CREATE OR REPLACE PROCEDURE DEL_OBS(cuantos number) IS
begin
    FOR i IN (SELECT *
              FROM (SELECT * FROM observations ORDER BY DBMS_RANDOM.VALUE)
              WHERE ROWNUM <= cuantos)
    LOOP
        DELETE FROM OBSERVATIONS
        WHERE nplate = i.nplate
          AND odatetime = i.odatetime;
    END LOOP;
end del_obs;
My plan was to create an index related to rownum, since that is what appears to be used to do the deletes, but I don't know if it is going to be worth it. The problem with this procedure is that its randomness causes a lot of consistent gets. Can anyone help me with this?? Thanks :)
Note: I cannot change the code, only make improvements afterwards
Use the ROWID pseudo-column to filter the rows:
CREATE OR REPLACE PROCEDURE DEL_OBS(
cuantos number
)
IS
BEGIN
DELETE FROM OBSERVATIONS
WHERE ROWID IN (
SELECT rid
FROM (
SELECT ROWID AS rid
FROM observations
ORDER BY DBMS_RANDOM.VALUE
)
WHERE ROWNUM <= cuantos
);
END del_obs;
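For example, a hypothetical anonymous-block call deleting 100 random rows:

BEGIN
    del_obs(100);
END;
/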
If you have an index on the table then it can use an index fast full scan:
SQL Fiddle
Oracle 11g R2 Schema Setup:
CREATE TABLE table_name ( id ) AS
SELECT LEVEL FROM DUAL CONNECT BY LEVEL <= 50000;
Query 1: No Index:
DELETE FROM table_name
WHERE ROWID IN (
SELECT rid
FROM (
SELECT ROWID AS rid
FROM table_name
ORDER BY DBMS_RANDOM.VALUE
)
WHERE ROWNUM <= 10000
)
Execution Plan:
----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost | Time |
----------------------------------------------------------------------------------------
| 0 | DELETE STATEMENT | | 1 | 24 | 123 | 00:00:02 |
| 1 | DELETE | TABLE_NAME | | | | |
| 2 | NESTED LOOPS | | 1 | 24 | 123 | 00:00:02 |
| 3 | VIEW | VW_NSO_1 | 10000 | 120000 | 121 | 00:00:02 |
| 4 | SORT UNIQUE | | 1 | 120000 | | |
| * 5 | COUNT STOPKEY | | | | | |
| 6 | VIEW | | 19974 | 239688 | 121 | 00:00:02 |
| * 7 | SORT ORDER BY STOPKEY | | 19974 | 239688 | 121 | 00:00:02 |
| 8 | TABLE ACCESS FULL | TABLE_NAME | 19974 | 239688 | 25 | 00:00:01 |
| 9 | TABLE ACCESS BY USER ROWID | TABLE_NAME | 1 | 12 | 1 | 00:00:01 |
----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
------------------------------------------
* 5 - filter(ROWNUM<=10000)
* 7 - filter(ROWNUM<=10000)
Query 2: Add an index:
ALTER TABLE table_name ADD CONSTRAINT tn__id__pk PRIMARY KEY ( id )
Query 3: With the index:
DELETE FROM table_name
WHERE ROWID IN (
SELECT rid
FROM (
SELECT ROWID AS rid
FROM table_name
ORDER BY DBMS_RANDOM.VALUE
)
WHERE ROWNUM <= 10000
)
Execution Plan:
---------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost | Time |
---------------------------------------------------------------------------------------
| 0 | DELETE STATEMENT | | 1 | 37 | 13 | 00:00:01 |
| 1 | DELETE | TABLE_NAME | | | | |
| 2 | NESTED LOOPS | | 1 | 37 | 13 | 00:00:01 |
| 3 | VIEW | VW_NSO_1 | 9968 | 119616 | 11 | 00:00:01 |
| 4 | SORT UNIQUE | | 1 | 119616 | | |
| * 5 | COUNT STOPKEY | | | | | |
| 6 | VIEW | | 9968 | 119616 | 11 | 00:00:01 |
| * 7 | SORT ORDER BY STOPKEY | | 9968 | 119616 | 11 | 00:00:01 |
| 8 | INDEX FAST FULL SCAN | TN__ID__PK | 9968 | 119616 | 9 | 00:00:01 |
| 9 | TABLE ACCESS BY USER ROWID | TABLE_NAME | 1 | 25 | 1 | 00:00:01 |
---------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
------------------------------------------
* 5 - filter(ROWNUM<=10000)
* 7 - filter(ROWNUM<=10000)
If you cannot do it in a single SQL statement using ROWID then you can rewrite your existing procedure to use exactly the same queries but with the FORALL statement:
CREATE OR REPLACE PROCEDURE DEL_OBS(cuantos number)
IS
    TYPE obs_tab IS TABLE OF observations%ROWTYPE;
    l_obs obs_tab; -- collection variable to hold the sampled rows
begin
    SELECT *
    BULK COLLECT INTO l_obs
    FROM (
        SELECT * FROM observations ORDER BY DBMS_RANDOM.VALUE
    )
    WHERE ROWNUM <= cuantos;

    FORALL i IN 1 .. l_obs.COUNT
        DELETE FROM OBSERVATIONS
        WHERE nplate = l_obs(i).nplate
          AND odatetime = l_obs(i).odatetime;
END del_obs;
What you definitely need is an index on OBSERVATIONS to allow the DELETE to use index access:
CREATE INDEX cuantos ON OBSERVATIONS(nplate, odatetime);
The execution of the procedure will lead to one FULL TABLE SCAN of the OBSERVATIONS table and to one INDEX ACCESS for each deleted record.
For a limited number of deleted records it will behave similarly to the set-based DELETE proposed in the other answer; for a larger number of deleted records the elapsed time will scale linearly with the number of deletes.
For a non-trivial number of deleted records you must assume that the index is not completely in the buffer pool, so lots of disc access will be required. You'll end up with approximately 100 deleted rows per second.
In other words, to delete 100K rows it will take ca. 1/4 hour.
To delete 1M rows you need about 2 3/4 hours.
You can see that at this scale the first part of the task - the FULL SCAN of your table - is negligible; it takes only a few minutes. The only way to get an acceptable response time in this case is to switch the logic to a single DELETE statement, as proposed in other answers.
This behaviour is also known by the rule "Row by Row is Slow by Slow" (i.e. processing in a loop works fine, but only with a limited number of records).
You can do this using a single delete statement:
delete from observations o
where (o.nplate, o.odatetime) in (select nplate, odatetime
                                  from (select o2.nplate, o2.odatetime
                                        from observations o2
                                        order by DBMS_RANDOM.VALUE
                                       ) o2
                                  where rownum <= cuantos
                                 );
This is often faster than executing a separate delete for each row.
Try this. It was tested on MSSQL; hopefully it will also work on Oracle. Please report back with the status.
CREATE OR REPLACE PROCEDURE DEL_OBS(cuantos number) IS
begin
DELETE OBSERVATIONS FROM OBSERVATIONS
join (select * from OBSERVATIONS ORDER BY VALUE ) as i on
nplate=i.nplate AND
odatetime=i.odatetime AND
i.ROWNUM<=cuantos;
End DEL_OBS;
Since you say that nplate and odatetime are the primary key of observations, I am guessing the problem is here:
SELECT * FROM (
SELECT *
FROM observations
ORDER BY DBMS_RANDOM.VALUE)
WHERE ROWNUM<=cuantos;
There is no way to prevent that from performing a full scan of observations, plus a lot of sorting if that's a big table.
You need to change the code that runs. By far, the easiest way to change the code is to change the source code and recompile it.
However, there are ways to change the code that executes without changing the source code. Here are two:
(1) Use fine-grained access control (the DBMS_RLS package) to add a policy that detects whether you are in this procedure and, if so, adds a predicate to the observations table like this:
AND rowid IN
( SELECT obs_sample.rowid
FROM observations sample (0.05) obs_sample)
(2) Use DBMS_ADVANCED_REWRITE to rewrite your query changing:
FROM observations
.. to ..
FROM observations SAMPLE (0.05)
Using the text of your query in the re-write policy should prevent it from affecting other queries against the observations table.
Neither of these is easy (at all), but both can be worth a try if you are really stuck.
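For option (2), a hedged sketch of the DBMS_ADVANCED_REWRITE call (the equivalence name is made up; validate must be FALSE here because the sampled statement does not return identical results, and query_rewrite_integrity must be set to trusted for the rewrite to apply):

BEGIN
    SYS.DBMS_ADVANCED_REWRITE.DECLARE_REWRITE_EQUIVALENCE(
        name             => 'OBS_RANDOM_SAMPLE',  -- hypothetical name
        source_stmt      => 'SELECT * FROM observations ORDER BY DBMS_RANDOM.VALUE',
        destination_stmt => 'SELECT * FROM observations SAMPLE (0.05) ORDER BY DBMS_RANDOM.VALUE',
        validate         => FALSE,
        rewrite_mode     => 'TEXT_MATCH');
END;
/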