Sequence of query execution - sql

When there is a correlated query, what is the sequence of execution?
Ex:
select
p.productNo,
(
select count(distinct concat(bom.detailpart,bom.groupname))
from dl_MBOM bom
where bom.DetailPart=p.ProductNo
) cnt1
from dm_product p

The execution plan will vary by database vendor. For Oracle, here is a similar query and the corresponding execution plan.
select dname,
( select count( distinct job )
from emp e
where e.deptno = d.deptno
) x
from dept d
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 6 (100)| |
| 1 | SORT GROUP BY | | 1 | 11 | | |
|* 2 | TABLE ACCESS FULL| EMP | 5 | 55 | 2 (0)| 00:00:01 |
| 3 | TABLE ACCESS FULL | DEPT | 4 | 52 | 2 (0)| 00:00:01 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("E"."DEPTNO"=:B1)

While it seems likely that the DBMS reads dm_product row by row and, for each row, looks up the matching value in dl_MBOM, this doesn't necessarily happen.
With an SQL query you mainly tell the DBMS what to do, not how to do it. If the DBMS decides it is better to build a join instead and work on that, it is free to do so.
Short answer: the sequence of execution is not determined. (In many DBMS, however, you can look at the query's execution plan to see how it is actually executed.)
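For illustration only, here is a minimal sketch of the kind of join rewrite the optimizer is free to choose instead of row-by-row lookups (the actual transformation, if any, depends on the version and the statistics), using the emp/dept example above:
select d.dname, nvl(x.cnt, 0) as x
from dept d
left join ( select deptno, count(distinct job) as cnt
            from emp
            group by deptno ) x
on x.deptno = d.deptno;
This returns the same result as the scalar-subquery form: departments with no matching rows get a count of 0.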

Related

Performance impact of multiple aggregate functions in HAVING clause

I am wondering if there is a difference between the two queries below.
I am looking for a general answer explaining how the optimizer treats each of these queries. There is an index on t.id.
The version of Oracle is 11g.
select t.id, sum(t.amount)
from transaction t
group by t.id
having sum(t.amount) between -0.009 and 0.009
select t.id, sum(t.amount)
from transaction t
group by t.id
having sum(t.amount) >= -0.009 and sum(t.amount)<= 0.009
In an aggregation query, most of the work involves moving the data around. There is some overhead for aggregations, but it is usually pretty small.
Also, the SQL compiler can decide to re-use aggregated expressions. Just because you use sum(amount) twice in the query doesn't mean that it gets computed twice.
Some aggregate functions are more expensive -- especially on strings or with distinct. You can always test queries to see if there is much impact, but in general you should worry about whether your logic is correct, not about how many times you use aggregate functions.
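For example, a quick way to test the relative cost yourself (a sketch only; the table and column names are taken from the question):
set timing on
-- cheap additive aggregate
select t.id, sum(t.amount) from transaction t group by t.id;
-- distinct aggregate: each group must de-duplicate its values before aggregating
select t.id, count(distinct t.amount) from transaction t group by t.id;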
If you want to observe basic information about the steps the CBO chooses for executing a SQL statement, use EXPLAIN PLAN.
Example
EXPLAIN PLAN SET STATEMENT_ID = 'jara1' into plan_table FOR
select DEPARTMENT_ID, sum(salary)
from HR.employees
group by DEPARTMENT_ID
having sum(salary) between 5000 and 10000
;
--
SELECT * FROM table(DBMS_XPLAN.DISPLAY('plan_table', 'jara1','ALL'));
The query returns
Plan hash value: 244580604
---------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 7 | 4 (25)| 00:00:01 |
|* 1 | FILTER | | | | | |
| 2 | HASH GROUP BY | | 1 | 7 | 4 (25)| 00:00:01 |
| 3 | TABLE ACCESS FULL| EMPLOYEES | 107 | 749 | 3 (0)| 00:00:01 |
---------------------------------------------------------------------------------
Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
1 - SEL$1
3 - SEL$1 / EMPLOYEES#SEL$1
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(SUM("SALARY")>=5000 AND SUM("SALARY")<=10000)
Column Projection Information (identified by operation id):
-----------------------------------------------------------
1 - (rowset=256) "DEPARTMENT_ID"[NUMBER,22], SUM("SALARY")[22]
2 - (#keys=1; rowset=256) "DEPARTMENT_ID"[NUMBER,22],
SUM("SALARY")[22]
3 - (rowset=256) "SALARY"[NUMBER,22], "DEPARTMENT_ID"[NUMBER,22]
So first of all you see that a TABLE ACCESS FULL is performed (line 3 of the plan), so your index assumption is not correct.
As pointed out in the other answer, you see the between is translated into two predicates connected with and (filter on plan line 1).
But most important for your question is the Column Projection: you see that sum(SALARY) is calculated on plan line 2 (the HASH GROUP BY operation) and passed to plan line 1 (the FILTER), in both cases only once (one column with length 22).
So don't worry about multiple calculations.
There is absolutely no difference between the two queries. between is just syntactic sugar; the parser immediately transforms the between condition into the two inequalities, combined with the and operator. This is done even before the optimizer sees the query. (Note that in this context the distinction between the parsing and the optimization stages is meaningful, even though programmers often think of them as a single step.)
Trivial example:
SQL> set autotrace traceonly explain
SQL> select deptno, sum(sal) as sum_sal
2 from scott.emp
3 group by deptno
4 having sum(sal) between 10000 and 20000
5 ;
Execution Plan
----------------------------------------------------------
Plan hash value: 2138686577
----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 7 | 4 (25)| 00:00:01 |
|* 1 | FILTER | | | | | |
| 2 | HASH GROUP BY | | 1 | 7 | 4 (25)| 00:00:01 |
| 3 | TABLE ACCESS FULL| EMP | 14 | 98 | 3 (0)| 00:00:01 |
----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(SUM("SAL")>=10000 AND SUM("SAL")<=20000)
The "index on..." thing that you mention has nothing to do with the question.
Another fun way to test this:
with function expand_sql_text(text_in varchar2)
return varchar2
as
text_out long;
begin
dbms_utility.expand_sql_text(text_in, text_out);
return text_out;
end expand_sql_text;
select expand_sql_text(
'select * from dual where 2 between 1 and 3'
) as text_out
from dual
/
TEXT_OUT
------------------------------------------------------------------------------------------------------------------------------------------------------------
SELECT "A1"."DUMMY" "DUMMY" FROM "SYS"."DUAL" "A1" WHERE 2>=1 AND 2<=3
1 row selected.
In your original question, the second predicate was
having sum(t.amount) > -0.009 and sum(t.amount) < 0.009
which is not the same as the between version, because between is inclusive of both endpoints.
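A trivial check of the boundary behaviour makes the difference visible:
select case when 0.009 between -0.009 and 0.009 then 'in' else 'out' end as inclusive_check,
       case when 0.009 > -0.009 and 0.009 < 0.009 then 'in' else 'out' end as strict_check
from dual;
-- returns: in | out -- the boundary value satisfies between but not the strict inequalities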
In SQL generally, filter predicates against simple literals do not normally lead to any significant performance overhead. In a group by clause, the fact that the predicate is applied after aggregation reduces any overhead even further.

Performance problem with QUERY using BIND variables and OR condition in Oracle 12.2

I am having a hard time understanding why the Oracle CBO is behaving the way it does when a bind variable is part of an OR condition.
My environment
Oracle 12.2 over Red Hat Linux 7
Note: I am just providing a simplification of the query where the problem is located.
$ sqlplus / as sysdba
SQL*Plus: Release 12.2.0.1.0 Production on Thu Jun 10 15:40:07 2021
Copyright (c) 1982, 2016, Oracle. All rights reserved.
Connected to:
Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production
SQL> #test.sql
SQL> var loanIds varchar2(4000);
SQL> exec :loanIds := '100000018330,100000031448,100000013477,100000023115,100000022550,100000183669,100000247514,100000048198,100000268289';
PL/SQL procedure successfully completed.
Elapsed: 00:00:00.00
SQL> SELECT
2 whs.* ,
3 count(*) over () AS TOTAL
4 FROM ALFAMVS.WHS_LOANS whs
5 WHERE
6 ( nvl(:loanIds,'XX') = 'XX' or
7 loanid IN (select regexp_substr(NVL(:loanIds,''),'[^,]+', 1, level) from dual
8 connect by level <= regexp_count(:loanIds,'[^,]+'))
9 )
10 ;
7 rows selected.
Elapsed: 00:00:18.72
Execution Plan
----------------------------------------------------------
Plan hash value: 2980809427
------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 6729 | 6748K| 2621 (1)| 00:00:01 |
| 1 | WINDOW BUFFER | | 6729 | 6748K| 2621 (1)| 00:00:01 |
|* 2 | FILTER | | | | | |
| 3 | TABLE ACCESS FULL | WHS_LOANS | 113K| 110M| 2621 (1)| 00:00:01 |
|* 4 | FILTER | | | | | |
|* 5 | CONNECT BY WITHOUT FILTERING (UNIQUE)| | | | | |
| 6 | FAST DUAL | | 1 | | 2 (0)| 00:00:01 |
------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter(NVL(:LOANIDS,'XX')='XX' OR EXISTS (SELECT 0 FROM "DUAL" "DUAL" WHERE
SYS_OP_C2C( REGEXP_SUBSTR (NVL(:LOANIDS,''),'[^,]+',1,LEVEL))=:B1 CONNECT BY LEVEL<=
REGEXP_COUNT (:LOANIDS,'[^,]+')))
4 - filter(SYS_OP_C2C( REGEXP_SUBSTR (NVL(:LOANIDS,''),'[^,]+',1,LEVEL))=:B1)
5 - filter(LEVEL<= REGEXP_COUNT (:LOANIDS,'[^,]+'))
Statistics
----------------------------------------------------------
288 recursive calls
630 db block gets
9913 consistent gets
1 physical reads
118724 redo size
13564 bytes sent via SQL*Net to client
608 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
113003 sorts (memory)
0 sorts (disk)
7 rows processed
SQL> set autotrace off
SQL> select count(*) from ALFAMVS.WHS_LOANS ;
COUNT(*)
----------
113095
1 row selected.
Elapsed: 00:00:00.14
KEY POINTS
I do know that if I replace the OR expression with two selects combined by UNION ALL, it works perfectly (see the sketch after this list). The problem is that I have a lot of conditions written this way, so UNION ALL is not a solution in my case.
The table has up-to-date statistics, calculated with FOR ALL COLUMNS SIZE AUTO and ESTIMATE PERCENT 10%.
Dynamic SQL is not a solution in my case, because the query is called through third-party software that uses a web API to convert the result to JSON.
I was able to rephrase the regular expression with connect by level in a way that now takes 19 seconds. Before, it was taking 40 seconds.
The table has only 113K records and no indexes.
The query has 20 conditions of this kind, all written the same way, because the web-app screen that triggers the query through the API allows the user to supply any combination of parameters or none at all.
If I remove the expression NVL(:loanIds,'XX') = 'XX' OR, the query takes 0.01 seconds. Why is this OR expression with binds giving the optimizer such a headache?
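For reference, this is roughly what the UNION ALL rewrite mentioned in the first key point looks like for a single optional parameter (a sketch only; with 20 optional parameters the number of branches explodes, which is why it is not viable here):
SELECT q.*, count(*) over () AS TOTAL
FROM (
  SELECT whs.*
  FROM ALFAMVS.WHS_LOANS whs
  WHERE :loanIds IS NULL
  UNION ALL
  SELECT whs.*
  FROM ALFAMVS.WHS_LOANS whs
  WHERE :loanIds IS NOT NULL
  AND loanid IN (select regexp_substr(:loanIds,'[^,]+', 1, level)
                 from dual
                 connect by level <= regexp_count(:loanIds,'[^,]+'))
) q;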
-- UPDATE --
I want to thank @Alex Poole for his suggestions and share that the third alternative (removing the regular expressions) worked like a charm. It would be great to understand why, though. You have my most sincere gratitude. I had used those regular expressions for a while and never had this problem. Also, the suggestion to use regexp_like was even better than the original regexp_substr with connect by level, but still far slower than the version with no regular expressions at all.
Original query
7 rows selected.
Elapsed: 00:00:36.29
New query
7 rows selected.
Elapsed: 00:00:00.58
Once the EXISTS disappeared from the internal predicate, the query became blazingly fast.
Thank you all for your comments!
From the execution plan the optimiser is, for some reason, re-evaluating the hierarchical query for every row in your table, and then using exists() to see if that row's ID is in the result. It isn't clear why the or is causing that. It might be something to raise with Oracle.
From experimenting I can see three ways to at least partially work around the problem - though I'm sure there are others. The first is to move the CSV expansion to a CTE and then force that to materialize with a hint:
WITH loanIds_cte (loanId) as (
select /*+ materialize */ regexp_substr(:loanIds,'[^,]+', 1, level)
from dual
connect by level <= regexp_count(:loanIds,'[^,]+')
)
SELECT
whs.* ,
count(*) over () AS TOTAL
FROM WHS_LOANS whs
WHERE
( :loanIds is null or
loanid IN (select loanId from loanIds_cte)
)
;
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------
Plan hash value: 3226738189
--------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1102 | 9918 | 11 (0)| 00:00:01 |
| 1 | TEMP TABLE TRANSFORMATION | | | | | |
| 2 | LOAD AS SELECT | SYS_TEMP_0FD9FD2A6_198A2E1A | | | | |
|* 3 | CONNECT BY WITHOUT FILTERING| | | | | |
| 4 | FAST DUAL | | 1 | | 2 (0)| 00:00:01 |
| 5 | WINDOW BUFFER | | 1102 | 9918 | 9 (0)| 00:00:01 |
|* 6 | FILTER | | | | | |
| 7 | TABLE ACCESS FULL | WHS_LOANS | 11300 | 99K| 9 (0)| 00:00:01 |
|* 8 | VIEW | | 1 | 2002 | 2 (0)| 00:00:01 |
| 9 | TABLE ACCESS FULL | SYS_TEMP_0FD9FD2A6_198A2E1A | 1 | 2002 | 2 (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - filter(LEVEL<= REGEXP_COUNT (:LOANIDS,'[^,]+'))
6 - filter(:LOANIDS IS NULL OR EXISTS (SELECT 0 FROM (SELECT /*+ CACHE_TEMP_TABLE ("T1") */ "C0"
"LOANID" FROM "SYS"."SYS_TEMP_0FD9FD2A6_198A2E1A" "T1") "LOANIDS_CTE" WHERE SYS_OP_C2C("LOANID")=:B1))
8 - filter(SYS_OP_C2C("LOANID")=:B1)
That still does the odd transformation to exists(), but at least now it is querying the materialized CTE, so the connect by query is only evaluated once.
Or you could compare each loanId value with the full string using a regular expression:
SELECT
whs.* ,
count(*) over () AS TOTAL
FROM WHS_LOANS whs
WHERE
( :loanIds is null or
regexp_like(:loanIds, '(^|,)' || loanId || '(,|$)')
)
;
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------
Plan hash value: 1622376598
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1102 | 9918 | 9 (0)| 00:00:01 |
| 1 | WINDOW BUFFER | | 1102 | 9918 | 9 (0)| 00:00:01 |
|* 2 | TABLE ACCESS FULL| WHS_LOANS | 1102 | 9918 | 9 (0)| 00:00:01 |
--------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter(:LOANIDS IS NULL OR REGEXP_LIKE
(:LOANIDS,SYS_OP_C2C(U'(^|,)'||"LOANID"||U'(,|$)')))
which is slower than the CTE in my testing, because regular expressions are still expensive and you're evaluating 113k of them (still better than 2 x 113k x number-of-elements of them).
Or you can avoid regular expressions and use several separate comparisons:
SELECT
whs.* ,
count(*) over () AS TOTAL
FROM WHS_LOANS whs
WHERE
( :loanIds is null or
:loanIds like loanId || ',%' or
:loanIds like '%,' || loanId or
:loanIds like '%,' || loanId || ',%'
)
;
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------
Plan hash value: 1622376598
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2096 | 18864 | 9 (0)| 00:00:01 |
| 1 | WINDOW BUFFER | | 2096 | 18864 | 9 (0)| 00:00:01 |
|* 2 | TABLE ACCESS FULL| WHS_LOANS | 2096 | 18864 | 9 (0)| 00:00:01 |
--------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter(:LOANIDS IS NULL OR :LOANIDS LIKE
SYS_OP_C2C("LOANID"||U',%') OR :LOANIDS LIKE
SYS_OP_C2C(U'%,'||"LOANID") OR :LOANIDS LIKE
SYS_OP_C2C(U'%,'||"LOANID"||U',%'))
which is the fastest of the three options in my limited testing. But there may well be better and faster approaches.
Not really relevant, but you seem to be running this as SYS, which isn't a good idea even if the data is in another schema. Your loanId column appears to be nvarchar2 (from the SYS_OP_C2C calls), which seems odd for something that could possibly be a number, and which in any case only seems likely to hold ASCII characters. NVL(:loanIds,'') doesn't do anything, since null and the empty string are the same in Oracle. And nvl(:loanIds,'XX') = 'XX' can be written as :loanIds is null, which avoids magic values.

Trying to optimize a *random* query in Oracle SQL

I need to optimize a procedure in Oracle SQL, mainly using indexes. This is the statement:
CREATE OR REPLACE PROCEDURE DEL_OBS(cuantos number) IS
begin
FOR I IN (SELECT * FROM (SELECT * FROM observations ORDER BY DBMS_RANDOM.VALUE) WHERE ROWNUM <= cuantos)
LOOP
DELETE FROM OBSERVATIONS WHERE nplate=i.nplate AND odatetime=i.odatetime;
END LOOP;
end del_obs;
My plan was to create an index related to rownum, since that is what appears to be used for the deletes, but I don't know whether it will be worth it. The problem with this procedure is that its randomness causes a lot of consistent gets. Can anyone help me with this? Thanks :)
Note: I cannot change the code, only make improvements afterwards
Use the ROWID pseudo-column to filter the rows:
CREATE OR REPLACE PROCEDURE DEL_OBS(
cuantos number
)
IS
BEGIN
DELETE FROM OBSERVATIONS
WHERE ROWID IN (
SELECT rid
FROM (
SELECT ROWID AS rid
FROM observations
ORDER BY DBMS_RANDOM.VALUE
)
WHERE ROWNUM <= cuantos
);
END del_obs;
If you have an index on the table then it can use an index fast full scan:
SQL Fiddle
Oracle 11g R2 Schema Setup:
CREATE TABLE table_name ( id ) AS
SELECT LEVEL FROM DUAL CONNECT BY LEVEL <= 50000;
Query 1: No Index:
DELETE FROM table_name
WHERE ROWID IN (
SELECT rid
FROM (
SELECT ROWID AS rid
FROM table_name
ORDER BY DBMS_RANDOM.VALUE
)
WHERE ROWNUM <= 10000
)
Execution Plan:
----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost | Time |
----------------------------------------------------------------------------------------
| 0 | DELETE STATEMENT | | 1 | 24 | 123 | 00:00:02 |
| 1 | DELETE | TABLE_NAME | | | | |
| 2 | NESTED LOOPS | | 1 | 24 | 123 | 00:00:02 |
| 3 | VIEW | VW_NSO_1 | 10000 | 120000 | 121 | 00:00:02 |
| 4 | SORT UNIQUE | | 1 | 120000 | | |
| * 5 | COUNT STOPKEY | | | | | |
| 6 | VIEW | | 19974 | 239688 | 121 | 00:00:02 |
| * 7 | SORT ORDER BY STOPKEY | | 19974 | 239688 | 121 | 00:00:02 |
| 8 | TABLE ACCESS FULL | TABLE_NAME | 19974 | 239688 | 25 | 00:00:01 |
| 9 | TABLE ACCESS BY USER ROWID | TABLE_NAME | 1 | 12 | 1 | 00:00:01 |
----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
------------------------------------------
* 5 - filter(ROWNUM<=10000)
* 7 - filter(ROWNUM<=10000)
Query 2: Add an index:
ALTER TABLE table_name ADD CONSTRAINT tn__id__pk PRIMARY KEY ( id )
Query 3: With the index:
DELETE FROM table_name
WHERE ROWID IN (
SELECT rid
FROM (
SELECT ROWID AS rid
FROM table_name
ORDER BY DBMS_RANDOM.VALUE
)
WHERE ROWNUM <= 10000
)
Execution Plan:
---------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost | Time |
---------------------------------------------------------------------------------------
| 0 | DELETE STATEMENT | | 1 | 37 | 13 | 00:00:01 |
| 1 | DELETE | TABLE_NAME | | | | |
| 2 | NESTED LOOPS | | 1 | 37 | 13 | 00:00:01 |
| 3 | VIEW | VW_NSO_1 | 9968 | 119616 | 11 | 00:00:01 |
| 4 | SORT UNIQUE | | 1 | 119616 | | |
| * 5 | COUNT STOPKEY | | | | | |
| 6 | VIEW | | 9968 | 119616 | 11 | 00:00:01 |
| * 7 | SORT ORDER BY STOPKEY | | 9968 | 119616 | 11 | 00:00:01 |
| 8 | INDEX FAST FULL SCAN | TN__ID__PK | 9968 | 119616 | 9 | 00:00:01 |
| 9 | TABLE ACCESS BY USER ROWID | TABLE_NAME | 1 | 25 | 1 | 00:00:01 |
---------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
------------------------------------------
* 5 - filter(ROWNUM<=10000)
* 7 - filter(ROWNUM<=10000)
If you cannot do it in a single SQL statement using ROWID, then you can rewrite your existing procedure to use exactly the same queries but with the FORALL statement:
CREATE OR REPLACE PROCEDURE DEL_OBS(cuantos number)
IS
TYPE obs_tab_t IS TABLE OF observations%ROWTYPE;
obs_tab obs_tab_t; -- collection variable to bulk collect into
begin
SELECT *
BULK COLLECT INTO obs_tab
FROM (
SELECT * FROM observations ORDER BY DBMS_RANDOM.VALUE
)
WHERE ROWNUM<=cuantos;
FORALL i IN 1 .. obs_tab.COUNT
DELETE FROM OBSERVATIONS
WHERE nplate = obs_tab(i).nplate
AND odatetime = obs_tab(i).odatetime;
END del_obs;
What you definitively need is an index on OBSERVATIONS to allow the DELETE to use an index access.
CREATE INDEX cuantos ON OBSERVATIONS(nplate, odatetime);
The execution of the procedure will lead to one FULL TABLE SCAN of the OBSERVATIONS table and to one INDEX ACCESS for each deleted record.
For a limited number of deleted records it will behave similarly to the set-based DELETE proposed in the other answer; for larger numbers of deleted records the elapsed time will scale linearly with the number of deletes.
For a non-trivial number of deleted records you must assume that the index is not completely in the buffer pool and lots of disk access will be required, so you'll end up with approximately 100 deleted rows per second.
In other words, to delete 100K rows it will take about a quarter of an hour (100,000 rows / 100 rows per second = 1,000 seconds).
To delete 1M rows you need about 2 3/4 hours.
You see that at this scale the first part of the task, the FULL SCAN of your table, is negligible; it will take only a few minutes. The only way to get an acceptable response time in this case is to switch the logic to a single DELETE statement, as proposed in the other answers.
This behavior is also known by the rule "row by row is slow by slow" (i.e. processing in a loop works fine, but only with a limited number of records).
You can do this using a single delete statement:
delete from observations o
where (o.nplate, o.odatetime) in (select nplate, odatetime
                                  from (select o2.nplate, o2.odatetime
                                        from observations o2
                                        order by DBMS_RANDOM.VALUE
                                       ) o2
                                  where rownum <= cuantos
                                 );
This is often much faster than executing a separate delete for each row.
Try this. It was tested on MSSQL; hopefully something similar works on Oracle. Please report back.
CREATE PROCEDURE DEL_OBS (@cuantos int) AS
BEGIN
DELETE o
FROM OBSERVATIONS o
JOIN (SELECT TOP (@cuantos) nplate, odatetime
      FROM OBSERVATIONS
      ORDER BY NEWID()) i
  ON o.nplate = i.nplate
 AND o.odatetime = i.odatetime;
END;
Since you say that nplate and odatetime are the primary key of observations, I am guessing the problem is here:
SELECT * FROM (
SELECT *
FROM observations
ORDER BY DBMS_RANDOM.VALUE)
WHERE ROWNUM<=cuantos;
There is no way to prevent that from performing a full scan of observations, plus a lot of sorting if that's a big table.
You need to change the code that runs. By far, the easiest way to change the code is to change the source code and recompile it.
However, there are ways to change the code that executes without changing the source code. Here are two:
(1) Use fine-grained access control (the DBMS_RLS package) to add a policy that detects whether you are in this procedure and, if so, adds a predicate to the observations table like this:
AND rowid IN
( SELECT obs_sample.rowid
FROM observations sample (0.05) obs_sample)
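For what it's worth, a hedged sketch of how such a policy might be wired up with DBMS_RLS (all schema, function, and policy names below are illustrative, and the check for being inside the procedure is omitted):
CREATE OR REPLACE FUNCTION obs_sample_predicate(
  p_schema IN VARCHAR2,
  p_object IN VARCHAR2
) RETURN VARCHAR2
IS
BEGIN
  -- predicate text appended to queries against OBSERVATIONS
  RETURN 'rowid IN (SELECT obs_sample.rowid '
      || 'FROM observations SAMPLE (0.05) obs_sample)';
END;
/
BEGIN
  DBMS_RLS.ADD_POLICY(
    object_schema   => 'APP',               -- illustrative schema
    object_name     => 'OBSERVATIONS',
    policy_name     => 'OBS_SAMPLE_POLICY',
    function_schema => 'APP',
    policy_function => 'OBS_SAMPLE_PREDICATE',
    statement_types => 'SELECT');
END;
/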
(2) Use DBMS_ADVANCED_REWRITE to rewrite your query changing:
FROM observations
.. to ..
FROM observations SAMPLE (0.05)
Using the text of your query in the re-write policy should prevent it from affecting other queries against the observations table.
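And a hedged sketch of declaring the rewrite for option (2) (the equivalence name and statement text are illustrative; validate is disabled because the two statements do not return identical results):
BEGIN
  SYS.DBMS_ADVANCED_REWRITE.DECLARE_REWRITE_EQUIVALENCE(
    name             => 'OBS_SAMPLE_REWRITE',
    source_stmt      => 'SELECT * FROM observations',
    destination_stmt => 'SELECT * FROM observations SAMPLE (0.05)',
    validate         => FALSE);
END;
/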
Neither of these is easy (at all), but they can be worth a try if you are really stuck.

How to find a missing value from another table

Imagine if I had two tables like this:
Table Name: user
| id | user_id | password |
Table Name: permissions
| id | admin | write | delete | transfer |
And populated the tables with this:
Inserting into user table:
- 0, joshsh, asdf01
- 1, jakesh, asdf02
- 2, annsh, asdf03
- 3, lamsh, asdf04
Inserting into permissions table:
- 0, yes, yes, yes, yes
- 1, yes, yes, yes, yes
- 2, no, yes, yes, yes
And I didn't add the 4th row into the permissions table. How would I write a query to find which ids I forgot (in case it was a big database)?
There are many ways to get the same result; here are two:
With a NOT EXISTS:
SQL> select u.*
2 from user_ u
3 where not exists (
4 select 1
5 from permissions p
6 where u.id = p.id
7 );
ID USER_ID PASSWORD
---------- ---------- ----------
3 lamsh asdf04
Execution Plan
----------------------------------------------------------
Plan hash value: 3342498783
----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 4 | 160 | 7 (15)| 00:00:01 |
|* 1 | HASH JOIN ANTI | | 4 | 160 | 7 (15)| 00:00:01 |
| 2 | TABLE ACCESS FULL| USER_ | 4 | 108 | 3 (0)| 00:00:01 |
| 3 | TABLE ACCESS FULL| PERMISSIONS | 3 | 39 | 3 (0)| 00:00:01 |
----------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("U"."ID"="P"."ID")
Note
-----
- dynamic sampling used for this statement (level=2)
And with an outer join:
SQL> select u.*
2 from user_ u
3 left outer join permissions p
4 on (u.id = p.id)
5 where p.id is null;
ID USER_ID PASSWORD
---------- ---------- ----------
3 lamsh asdf04
Execution Plan
----------------------------------------------------------
Plan hash value: 3342498783
----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 4 | 160 | 7 (15)| 00:00:01 |
|* 1 | HASH JOIN ANTI | | 4 | 160 | 7 (15)| 00:00:01 |
| 2 | TABLE ACCESS FULL| USER_ | 4 | 108 | 3 (0)| 00:00:01 |
| 3 | TABLE ACCESS FULL| PERMISSIONS | 3 | 39 | 3 (0)| 00:00:01 |
----------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("U"."ID"="P"."ID")
Note
-----
- dynamic sampling used for this statement (level=2)
Notice that Oracle produced the same execution plan for both statements, given that the test was made with two very small tables with no indexes and no statistics.
Also, consider that there are many ways to get the same result; performance strongly depends on your data, stats, indexes, ...
P.S. I used USER_ instead of USER, to avoid the clash with the USER reserved word.
Assuming that id is the join column between the permissions and user tables (and that id is the primary/unique key in both), here are a couple of solutions:
select id from user
minus
select id from permissions;
or
select * from user
where id not in (select id from permissions);
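One caution on the NOT IN form, assuming permissions.id could be nullable: if the subquery returns any NULL, NOT IN matches nothing, so it is safer to exclude NULLs explicitly:
select * from user
where id not in (select id from permissions where id is not null);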
Comparing count(*) on both tables will tell you whether they differ, and an alert can be raised when they do (for example via a SQL Server Agent job), though that alone does not tell you which rows are missing.
Or you can do this:
select *
from user u
full outer join permissions p
on u.id = p.id
where u.id is null or p.id is null
The best way is to use a LEFT JOIN with an IS NULL check in the WHERE clause; it's elegant and efficient:
SELECT u.*
FROM user u
LEFT JOIN permissions p ON u.id = p.id
WHERE p.id IS NULL

Oracle intermediate join table size

When we join more than 2 tables, Oracle (or for that matter any database) decides to join 2 tables and uses the result to join with the subsequent tables. Is there a way to identify the intermediate join size? I am particularly interested in Oracle. One solution I know is to use Autotrace in SQL Developer, which has the column LAST_OUTPUT_ROWS. But for queries executed through PL/SQL and other means, does Oracle record the intermediate join size in some table?
I am asking this because recently we had a problem when someone dropped the statistics and failed to regenerate them, and when we traced through we found that Oracle formed an intermediate result of 180 million rows before arriving at the final result of 6 rows, and the query was quite slow.
Oracle can materialize the intermediate results of a table join in the temporary segment set for your session.
Since it's a one-off table that is deleted after the query is complete, its statistics are not stored.
However, you can estimate its size by building a plan for the query and looking at the ROWS column of the appropriate operation:
EXPLAIN PLAN FOR
WITH q AS
(
SELECT /*+ MATERIALIZE */
e1.value AS val1, e2.value AS val2
FROM t_odd e1, t_odd e2
)
SELECT COUNT(*)
FROM q;

SELECT *
FROM TABLE(DBMS_XPLAN.display());
Plan hash value: 3705384459
---------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 43G (5)|999:59:59 |
| 1 | TEMP TABLE TRANSFORMATION | | | | | |
| 2 | LOAD AS SELECT | | | | | |
| 3 | MERGE JOIN CARTESIAN | | 100T| 909T| 42G (3)|999:59:59 |
| 4 | TABLE ACCESS FULL | T_ODD | 10M| 47M| 4206 (3)| 00:00:51 |
| 5 | BUFFER SORT | | 10M| 47M| 42G (3)|999:59:59 |
| 6 | TABLE ACCESS FULL | T_ODD | 10M| 47M| 4204 (3)| 00:00:51 |
| 7 | SORT AGGREGATE | | 1 | | | |
| 8 | VIEW | | 100T| | 1729M (62)|999:59:59 |
| 9 | TABLE ACCESS FULL | SYS_TEMP_0FD9D6604_2660595 | 100T| 909T| 1729M (62)|999:59:59 |
---------------------------------------------------------------------------------------------------------
Here, the materialized table is called SYS_TEMP_0FD9D6604_2660595 and the estimated record count is 100T (100,000,000,000,000 records).
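If you need actual rather than estimated intermediate row counts, one option (a sketch; emp and dept stand in for your real tables) is to run the query with the gather_plan_statistics hint and then read the A-Rows column from DBMS_XPLAN.DISPLAY_CURSOR:
SELECT /*+ gather_plan_statistics */ COUNT(*)
FROM dept d
JOIN emp e ON e.deptno = d.deptno;

SELECT *
FROM TABLE(DBMS_XPLAN.display_cursor(format => 'ALLSTATS LAST'));
-- the A-Rows column shows the actual rows produced by each plan step, including intermediate joins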