Let's imagine that the table my_table is divided into 1000 partitions, as in the following example:
P1, P2, P3, ... , P997, P998, P999, P1000
Partitions are organized by dates, mostly a partition per day. E.g.:
P0 < 01/01/2000 => Contains around 472M records
P1 = 01/01/2000 => Contains around 15k records
P2 = 02/01/2000 => Contains around 15k records
P3 = 03/01/2000 => Contains around 15k records
... = ../../.... => Contains around ... records
P997 = 07/04/2000 => Contains around 15k records
P998 = 08/04/2000 => Contains around 15k records
P999 = 09/04/2000 => Contains around 15k records
P1000 = 10/04/2000 => Contains around 15k records
Please notice that P0 holds dates < 01/01/2000, NOT =
CURRENT SITUATION:
When looking for a specific record without knowing the date, I am doing a:
SELECT * FROM my_schema.my_table WHERE ... ;
But this takes too much time (30 s) because it includes P0.
IMPOSSIBLE SOLUTION:
So the best idea would be to execute an SQL query such as:
SELECT * FROM my_schema.my_table PARTITION (P42) WHERE ... ;
But we never know which partition the record is in, nor the date associated with the partition. And of course we won't loop over all partitions one by one.
BAD SOLUTION:
I could be clever by doing 5 by 5:
SELECT * FROM my_schema.my_table PARTITION (P40,P41,P42,P43,P44) WHERE ... ;
However, same issue as above: I won't loop over all partitions, even 5 by 5.
LESS BAD SOLUTION:
Nor will I do (excluding P0 from the list):
SELECT * FROM my_schema.my_table PARTITION (P1,P2,...,P99,P100) WHERE ... ;
The query would be too long, and I would have to compute the list of partition names for each request, since it would not always start at P1 or end at P100 (each day some partitions are dropped and some are created).
CLEVER SOLUTION (but does it exist?):
How can I do something like this?
SELECT * FROM my_schema.my_table NOT IN PARTITION(P0) WHERE ... ;
or
SELECT * FROM my_schema.my_table PARTITION(*,-P0) WHERE ... ;
or
SELECT * FROM my_schema.my_table LESS PARTITION(P0) WHERE ... ;
or
SELECT * FROM my_schema.my_table EXCLUDE PARTITION(P0) WHERE ... ;
Is there any way to do that?
I mean, a way to select all partitions except one or some of them?
Note: I don't know in advance the value of dateOfSale. Inside the table, we have something like
CREATE TABLE my_table
(
recordID NUMBER(16) NOT NULL, --not primary
dateOfSale DATE NOT NULL, --unknown
....
<other fields>
)
Before you answer, read the following:
Index usage: yes, it is already optimized, but remember, we do not know the partitioning date
No, we won't drop records in P0; we need to keep them for at least a few years (3, 5, and sometimes 10, according to each country's laws).
We can "split" P0 into several partitions, but that won't solve the issue with a global SELECT
We cannot move that data into a new table; we need it to be kept in this table since we have multiple services and programs performing selects on it. We would have to edit too much code to add a query against a second table in every service and back-end.
We cannot add a WHERE date > 2019 clause and index the date field, for multiple reasons that would take too much time to explain here.
The query below (two queries in a UNION ALL, but I only want 1 row) will stop as soon as a row is found. We do not need to go into the second part of the UNION ALL if we get a row in the first.
SQL> select * from
2 ( select x
3 from t1
4 where x = :b1
5 union all
6 select x
7 from t2
8 where x = :b1
9 )
10 where rownum = 1
11 /
See https://connor-mcdonald.com/golden-oldies/first-match-written-15-10-2007/ for a simple proof of this.
I'm assuming that most of the time, the record you are interested in is in your most recent, smaller partitions. In the absence of any other information to home in on the right partition, you could do
select * from
( select ...
from tab
where trans_dt >= DATE'2000-01-01'
and record_id = :my_record
union all
select ...
from tab
where trans_dt < DATE'2000-01-01'
and record_id = :my_record
)
where rownum = 1
which will only scan the big partition if we fall through and don't find it anywhere else.
But your problem does seem to be screaming out for an index to avoid all this work
Let's simplify your partitioned table as follows
CREATE TABLE tab
( trans_dt DATE
)
PARTITION BY RANGE (trans_dt)
( PARTITION p0 VALUES LESS THAN (DATE'2000-01-01')
, PARTITION p1 VALUES LESS THAN (DATE'2000-01-02')
, PARTITION p2 VALUES LESS THAN (DATE'2000-01-03')
, PARTITION p3 VALUES LESS THAN (DATE'2000-01-04')
);
If you want to skip your large partition P0 in a query, you simply (as this is the first partition) constrain the partition key with trans_dt >= DATE'2000-01-01'.
You will need two predicates combined with OR to skip a partition in the middle.
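For instance, to skip only the middle partition p2 of the sample table above (p2 holds dates in [2000-01-02, 2000-01-03)), a sketch of the two OR-combined predicates could look like this:

```sql
-- Sketch: skip only the middle partition p2 by excluding its date
-- range with two predicates on the partition key combined with OR.
select *
from tab
where trans_dt < DATE'2000-01-02'    -- covers p0 and p1
   or trans_dt >= DATE'2000-01-03';  -- covers p3
```

The execution plan should then show two PARTITION RANGE row sources (or OR-expansion), neither of which touches p2.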
The query
select * from tab
where trans_dt >= DATE'2000-01-01';
Checking the execution plan, you see the expected behaviour in Pstart = 2 (i.e. the 1st partition is pruned).
---------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
---------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 9 | 2 (0)| 00:00:01 | | |
| 1 | PARTITION RANGE ITERATOR | | 1 | 9 | 2 (0)| 00:00:01 | 2 | 4 |
| 2 | TABLE ACCESS STORAGE FULL| TAB | 1 | 9 | 2 (0)| 00:00:01 | 2 | 4 |
---------------------------------------------------------------------------------------------------
Remember, if you scan a partitioned table without constraining the partition key, you will have to scan all partitions.
If you know that most of the query results are in the recent and small partitions, simply scan them in the first query
select * from tab
where trans_dt >= DATE'2000-01-01' and <your filter>
and only if you fail to get the row scan the large partition
select * from tab
where trans_dt < DATE'2000-01-01' and <your filter>
You will get a much better response time on average if the assumption is true that the queries mostly refer to recent data.
Although there is no syntax to exclude a specific partition, you can build a pipelined table function that dynamically builds a query that uses every partition except for one.
The table function builds a query like the one below. The function uses the data dictionary view USER_TAB_PARTITIONS to get the partition names to build the SQL, uses dynamic SQL to execute the query, and then pipes the results back to the caller.
select * from my_table partition (P1) union all
select * from my_table partition (P2) union all
...
select * from my_table partition (P1000);
Sample schema
CREATE TABLE my_table
(
recordID NUMBER(16) NOT NULL, --not primary
dateOfSale DATE NOT NULL, --unknown
a NUMBER
)
partition by range (dateOfSale)
(
partition p0 values less than (date '2000-01-01'),
partition p1 values less than (date '2000-01-02'),
partition p2 values less than (date '2000-01-03')
);
insert into my_table
select 1,date '1999-12-31',1 from dual union all
select 2,date '2000-01-01',1 from dual union all
select 3,date '2000-01-02',1 from dual;
commit;
Package and function
create or replace package my_table_pkg is
type my_table_nt is table of my_table%rowtype;
function get_everything_but_p0 return my_table_nt pipelined;
end;
/
create or replace package body my_table_pkg is
function get_everything_but_p0 return my_table_nt pipelined is
v_sql clob;
v_results my_table_nt;
v_cursor sys_refcursor;
begin
--Build SQL that references all partitions.
for partitions in
(
select partition_name
from user_tab_partitions
where table_name = 'MY_TABLE'
and partition_name <> 'P0'
) loop
v_sql := v_sql || chr(10) || 'union all select * from my_table ' ||
'partition (' || partitions.partition_name || ')';
end loop;
v_sql := substr(v_sql, 12);
--Print the query for debugging:
dbms_output.put_line(v_sql);
--Gather the results in batches and pipe them out.
open v_cursor for v_sql;
loop
fetch v_cursor bulk collect into v_results limit 100;
exit when v_results.count = 0;
for i in 1 .. v_results.count loop
pipe row (v_results(i));
end loop;
end loop;
close v_cursor;
end;
end;
/
The package uses 12c's ability to define types in package specifications. If you build this in 11g or below, you'll need to create SQL types instead. This package only works for one table, but if necessary there are ways to create functions that work with any table (using Oracle data cartridge or 18c's polymorphic table functions).
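For reference, on 11g a sketch of the standalone SQL types that would replace the package-level type might look like this (the type names and the column list are illustrative and must mirror the table's actual columns):

```sql
-- Hypothetical SQL object and collection types for 11g and below,
-- mirroring the three columns of the sample my_table.
create or replace type my_table_row as object
( recordID   number(16),
  dateOfSale date,
  a          number
);
/
create or replace type my_table_nt as table of my_table_row;
/
```

The pipelined function would then return my_table_nt and pipe my_table_row objects instead of %rowtype records.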
Sample query
SQL> select * from table(my_table_pkg.get_everything_but_p0);
RECORDID DATEOFSAL A
---------- --------- ----------
2 01-JAN-00 1
3 02-JAN-00 1
Performance
This function should perform almost as well as the clever solution you were looking for. There will be overhead because the rows get passed through PL/SQL. But most importantly, the function builds a SQL statement that partition prunes away the large P0 partition.
One possible issue with this function is that the optimizer has no visibility inside it and can't create a good row cardinality estimate. If you use the function as part of another large SQL statement, be aware that the optimizer will blindly guess that the function returns 8168 rows. That bad cardinality guess may lead to a bad execution plan.
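If that becomes a problem, one possible workaround (a sketch only; the CARDINALITY hint is undocumented, and 15000 is just an assumed estimate) is to tell the optimizer roughly how many rows to expect:

```sql
-- Sketch: override the optimizer's default 8168-row guess for the
-- table function with an assumed estimate of 15000 rows.
select /*+ cardinality(t 15000) */ *
from table(my_table_pkg.get_everything_but_p0) t;
```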
Related
I am trying to get unique values from multiple columns, but since the data structure is an array I can't directly do DISTINCT on all columns. I am using UNNEST() for each column and performing a UNION ALL per column.
My idea is to create a UDF so that I can simply give the column name each time instead of performing the select every time.
I would like to replace this query with a UDF, since there are many feature columns and I need to do many UNION ALLs.
SELECT DISTINCT user_log as unique_value,
'user_log' as feature
FROM `my_table`
left join UNNEST(user_Log) AS user_log
union all
SELECT DISTINCT page_name as unique_value,
'user_login_page_name' as feature
FROM `my_table`
left join UNNEST(PageName) AS page_name
order by feature;
My UDF
CREATE TEMP FUNCTION get_uniques(feature_name ARRAY<STRING>, feature STRING)
AS (
(SELECT DISTINCT feature as unique_value,
'feature' as feature
FROM `my_table`
left join UNNEST(feature_name) AS feature));
SELECT get_uniques(user_Log, log_feature);
However, the UDF to select the column doesn't really work and gives the error:
Scalar subquery cannot have more than one column unless using SELECT AS STRUCT to build STRUCT values; failed to parse CREATE [TEMP] FUNCTION statement at [8:1]
There is probably a better way of doing this. Appreciate your help.
Reading what you are trying to achieve, which is:
My idea is to create a UDF so that I can simply give the column name each time instead of performing the select every time.
One approach could be to use format in combination with execute immediate to create your custom query and get the desired output.
The example below shows the function using format to return a custom query, and execute immediate to retrieve the final query output. I'm using a public data set so you can also try it out on your side:
CREATE TEMP FUNCTION GetUniqueValues(table_name STRING, col_name STRING, nest_col_name STRING)
AS (format("SELECT DISTINCT %s.%s as unique_val,'%s' as featured FROM %s ", col_name,nest_col_name,col_name,table_name));
EXECUTE IMMEDIATE (
select CONCAT(
(SELECT GetUniqueValues('bigquery-public-data.github_repos.commits','Author','name'))
,' union all '
,(SELECT GetUniqueValues('bigquery-public-data.github_repos.commits','Committer','name'))
,' limit 100'))
Output
Row | unique_val | featured
1 | Sergio Garcia Murillo | Committer
2 | klimek | Committer
3 | marclaporte#gmail.com | Committer
4 | acoul | Committer
5 | knghtbrd | Committer
... | ... | ...
100 | Gustavo Narea | Committer
I have an Oracle database with a column called SESID whose data type is CHAR(8 BYTE). We have an index set up on this column; however, when I look at the execution plan, we appear not to be using the index. The simple query I would be using is
SELECT * FROM TestTable WHERE SESID = 12345
Looking at the execution plan, it is not using the index because it has to do a TO_NUMBER call on the SESID column, which prevents Oracle from considering the index in the query plan.
Here is the execution plan information which details this:
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(TO_NUMBER("SESID")=12345)
My question is this: is there any way to change the query so that it treats the number 12345 as a character string? My intuition told me that this might work:
SELECT * FROM TestTable WHERE SESID = '12345'
But it obviously did not... Does anybody know how I could do this?
I'm using the standard OracleClient provided in .NET 4 to connect to the oracle DB and run the query.
SQL Fiddle
Oracle 11g R2 Schema Setup:
CREATE TABLE tbl ( SESID CHAR(8 BYTE) );
INSERT INTO tbl VALUES ( '1' );
INSERT INTO tbl VALUES ( '12' );
INSERT INTO tbl VALUES ( '123' );
INSERT INTO tbl VALUES ( '1234' );
INSERT INTO tbl VALUES ( '12345' );
INSERT INTO tbl VALUES ( '123456' );
INSERT INTO tbl VALUES ( '1234567' );
INSERT INTO tbl VALUES ( '12345678' );
Query 1:
A CHAR column will right pad the string with space characters. You can see this with the following query:
SELECT SESID, LENGTH( SESID ), LENGTH( TRIM( SESID ) )
FROM tbl
Results:
| SESID | LENGTH(SESID) | LENGTH(TRIM(SESID)) |
|----------|---------------|---------------------|
| 1 | 8 | 1 |
| 12 | 8 | 2 |
| 123 | 8 | 3 |
| 1234 | 8 | 4 |
| 12345 | 8 | 5 |
| 123456 | 8 | 6 |
| 1234567 | 8 | 7 |
| 12345678 | 8 | 8 |
Query 2:
This query explicitly converts the number to a character string:
SELECT SESID
FROM tbl
WHERE SESID = TO_CHAR( 12345 )
Results:
However, the SESID you want to match is '12345   ' (padded with three trailing spaces) and does not equal '12345' (without the trailing spaces), so no rows are returned.
Query 3:
Instead you can ensure that the number is padded to the correct length when it is converted to a character string:
SELECT SESID
FROM tbl
WHERE SESID = RPAD( TO_CHAR( 12345 ), 8, ' ' )
Results:
| SESID |
|----------|
| 12345 |
An Alternative Solution
Change the column definition from CHAR(8 BYTE) to VARCHAR2(8) then you can use query 2 without issues.
SELECT * FROM TestTable WHERE SESID = '12345'
or
SELECT * FROM TestTable WHERE SESID = TO_CHAR( 12345 )
Oracle is slightly odd in that, if you compare a literal to a column that requires an implicit type conversion, it will always try to convert the column instead of the literal.
It's worth watching out for this on every SQL statement, and adding an explicit conversion on the literal to make sure that you don't get caught out.
Having said this, having an index and a SQL statement that could use one does not guarantee that Oracle will use it - there also needs to be enough data in the table for Oracle to think an index will be of use.
If you want to find out why you need to do a LOT of reading up on the "cost based optimizer"
Aside:
If you find that you can't do a conversion on the literal for some reason (e.g. the SQL is generated by a library over which you have no control) then you can create a function-based index.
I.e. an index that is based on the conversion that will take place.
E.g. CREATE INDEX test_table_index ON test_table ( TO_NUMBER( sesid ) );
This would only be possible if all data in sesid is numeric.
Which raises the point that when the column is converted it may convert more data than you intend and sometimes it is not possible.
E.g. you perform your original SELECT on a table that contains a mix of numeric and non-numeric data. Since no index exists that supports the select, Oracle will need to do a full table scan. It therefore needs to look at every record in TestTable and convert sesid to a number. But it can't for the non-numeric values, and so it throws an exception (ORA-01722: invalid number) even though you didn't want the non-numeric record.
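A sketch of that failure mode, reusing the tbl example from above with one non-numeric row added:

```sql
-- Add a non-numeric value to the CHAR column.
INSERT INTO tbl VALUES ( 'ABC' );

-- The implicit TO_NUMBER(SESID) now hits 'ABC' during the full scan
-- and raises ORA-01722: invalid number, even though this row would
-- never have matched anyway.
SELECT * FROM tbl WHERE SESID = 12345;
```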
Final Aside:
Object names in Oracle are not case sensitive (unless you use "quotes" to state that they should be) so general practice is to use identifiers with underscores instead of camel case so that they are easier to read when output by Oracle.
E.g. test_table rather than TestTable
Does the short-circuit evaluation described in the documentation for CASE and COALESCE() apply to sequences when used in SQL? This does not appear to be happening.
The Oracle documentation on CASE states that:
Oracle Database uses short-circuit evaluation. For a simple CASE expression... Oracle never evaluates a comparison_expr if a previous comparison_expr is equal to expr. For a searched CASE expression, the database... never evaluates a condition if the previous condition was true.
Similarly for COALESCE() the documentation states that:
Oracle Database uses short-circuit evaluation. The database evaluates each expr value and determines whether it is NULL, rather than evaluating all of the expr values before determining whether any of them is NULL.
When calling a sequence from SQL this does not appear to be the case; as you can see no short circuiting occurs and the sequence is incremented.
SQL> create sequence tmp_test_seq start with 1 increment by 1;
SQL> select tmp_test_seq.nextval from dual;
NEXTVAL
----------
1
SQL> select tmp_test_seq.currval from dual;
CURRVAL
----------
1
SQL> select coalesce(1, tmp_test_seq.nextval) from dual;
COALESCE(1,TMP_TEST_SEQ.NEXTVAL)
--------------------------------
1
SQL> select tmp_test_seq.currval from dual;
CURRVAL
----------
2
SQL> select case when 1 = 1 then 1 else tmp_test_seq.nextval end as s from dual;
S
----------
1
SQL> select tmp_test_seq.currval from dual;
CURRVAL
----------
3
SQL Fiddle.
However, when calling from PL/SQL the sequence is not incremented:
SQL> create sequence tmp_test_seq start with 1 increment by 1;
SQL> declare
2 i number;
3 begin
4 i := tmp_test_seq.nextval;
5 dbms_output.put_line(tmp_test_seq.currval);
6 i := coalesce(1, tmp_test_seq.nextval);
7 dbms_output.put_line(i);
8 dbms_output.put_line(tmp_test_seq.currval);
9 i := case when 1 = 1 then 1 else tmp_test_seq.nextval end;
10 dbms_output.put_line(i);
11 dbms_output.put_line(tmp_test_seq.currval);
12 end;
13 /
1
1
1
1
1
SQL> select tmp_test_seq.nextval from dual;
NEXTVAL
----------
2
Calling the sequence in SQL from PL/SQL gives the same results as plain SQL:
SQL> create sequence tmp_test_seq start with 1 increment by 1;
SQL> declare
2 i number;
3 begin
4 select tmp_test_seq.nextval into i from dual;
5 dbms_output.put_line(tmp_test_seq.currval);
6 select coalesce(1, tmp_test_seq.nextval) into i from dual;
7 dbms_output.put_line(i);
8 dbms_output.put_line(tmp_test_seq.currval);
9 select case when 1 = 1 then 1 else tmp_test_seq.nextval end into i
10 from dual;
11 dbms_output.put_line(i);
12 dbms_output.put_line(tmp_test_seq.currval);
13 end;
14 /
1
1
2
1
3
There doesn't seem to be anything in the documentation about this; not in the Administrator's Guide on managing sequences, the SQL Language Reference on sequence pseudocolumns, the PL/SQL Language Reference on CURRVAL and NEXTVAL, or the Database Concepts overview of sequences.
Does the short-circuit evaluation of CASE and COALESCE() occur for sequences when used in SQL? Is this documented?
We're on 11.2.0.3.5 if it's of interest.
For PL/SQL Oracle assures that it will use short-circuit evaluation:
When evaluating a logical expression, PL/SQL uses short-circuit
evaluation. That is, PL/SQL stops evaluating the expression as soon as
it can determine the result. Therefore, you can write expressions that
might otherwise cause errors.
From: 2 PL/SQL Language Fundamentals
When you use nextval in SQL code, we have a different situation.
First of all we have to keep in mind that currval and nextval are pseudocolumns:
A pseudocolumn behaves like a table column, but is not actually stored
in the table. You can select from pseudocolumns, but you cannot
insert, update, or delete their values. A pseudocolumn is also similar
to a function without arguments (please refer to Chapter 5,
"Functions"). However, functions without arguments typically return the
same value for every row in the result set, whereas pseudocolumns
typically return a different value for each row.
From: 3 Pseudocolumns.
The question now is: why does Oracle evaluate nextval? Is this behaviour stated somewhere?
Within a single SQL statement containing a reference to NEXTVAL,
Oracle increments the sequence once:
For each row returned by the outer query block of a SELECT statement. Such a query block can appear in the following places:
A top-level SELECT statement
An INSERT ... SELECT statement (either single-table or multitable). For a multitable insert, the reference to NEXTVAL must
appear in the VALUES clause, and the sequence is updated once for
each row returned by the subquery, even though NEXTVAL may be
referenced in multiple branches of the multitable insert.
A CREATE TABLE ... AS SELECT statement
A CREATE MATERIALIZED VIEW ... AS SELECT statement
For each row updated in an UPDATE statement
For each INSERT statement containing a VALUES clause
For each row merged by a MERGE statement. The reference to NEXTVAL can appear in the merge_insert_clause or the merge_update_clause or
both. The NEXTVAL value is incremented for each row updated and for
each row inserted, even if the sequence number is not actually used in
the update or insert operation. If NEXTVAL is specified more than once
in any of these locations, then the sequence is incremented once for
each row and returns the same value for all occurrences of NEXTVAL for
that row.
From: Sequence Pseudocolumns
Your case is clearly "1. A top-level SELECT statement". This doesn't mean that the short-circuit logic is not in place, only that nextval is always evaluated.
If you are interested in the short-circuit logic, then it's better to remove nextval from the equation.
A query like this doesn't evaluate the subquery:
select 6 c
from dual
where 'a' = 'a' or 'a' = (select dummy from dual)
But if we try to do something similar with coalesce or case, we will see that the Oracle optimizer decides to execute the subqueries:
select 6 c
from dual
where 'a' = coalesce('a', (select dummy from dual) )
I created annotated tests in this demo in SQLFiddle to show this.
It looks like Oracle applies the short-circuit logic only with an OR condition, but with coalesce and case it has to evaluate all branches.
I think your first tests in PL/SQL show that coalesce and case use short-circuit logic in PL/SQL, as Oracle states. Your second test, calling the sequence in SQL statements, shows that in that case nextval is evaluated anyway, even if the result is not used, and Oracle also documents that.
Putting the two things together looks a bit odd, because the coalesce and case behaviour seems really inconsistent to me too, but we also have to keep in mind that the implementation of that logic is implementation dependent (here is my source).
An explanation of why short-circuit evaluation does not apply to sequences might be the following. What is a sequence? Putting internals aside, it's a combination of a sequence definition (a record in the seq$ data dictionary table) and some internal SGA component. It's not a function, and it might be considered, although the documentation does not state it directly (but the execution plan does), a row source. Every time a sequence is referenced directly in the select list of a query, it has to be evaluated by the optimizer when it searches for an optimal execution plan. During the process of forming that plan, the sequence gets incremented if the nextval pseudocolumn is referenced:
SQL> create sequence seq1;
Sequence created
Here is our sequence:
SQL> select o.obj#
2 , o.name
3 , s.increment$
4 , s.minvalue
5 , s.maxvalue
6 , s.cache
7 from sys.seq$ s
8 join sys.obj$ o
9 on (o.obj# = s.obj#)
10 where o.name = 'SEQ1'
11 ;
OBJ# NAME INCREMENT$ MINVALUE MAXVALUE CACHE
---------- ------- ---------- ---------- ---------- ----------
94442 SEQ1 1 1 1E28 20
Let's trace the query below, and also take a look at its execution plan:
SQL> ALTER SESSION SET EVENTS '10046 trace name context forever, level 4';
Session altered
SQL> select case
2 when 1 = 1 then 1
3 when 2 = 1 then seq1.nextval
4 end as res
5 from dual;
RES
----------
1
/* sequence got incremented by 1 */
SQL> select seq1.currval from dual;
CURRVAL
----------
3
Trace file information:
STAT #1016171528 id=1 cnt=1 pid=0 pos=1 obj=94442 op='SEQUENCE SEQ1 ...
STAT #1016171528 id=2 cnt=1 pid=1 pos=1 obj=0 op='FAST DUAL ...
CLOSE #1016171528:c=0,e=12,dep=0,type=0,tim=12896600071500 /* close the cursor */
The execution plan will show us basically the same:
SQL> explain plan for select case
2 when 1 = 1 then 1
3 else seq1.nextval
4 end
5 from dual
6 /
Explained
Executed in 0 seconds
SQL> select * from table(dbms_xplan.display());
PLAN_TABLE_OUTPUT
---------------------------------------------------------------
Plan hash value: 51561390
-----------------------------------------------------------------
| Id | Operation | Name | Rows | Cost (%CPU)| Time |
-----------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 2 (0)| 00:00:01 |
| 1 | SEQUENCE | SEQ1 | | | |
| 2 | FAST DUAL | | 1 | 2 (0)| 00:00:01 |
-----------------------------------------------------------------
9 rows selected
Executed in 0.172 seconds
In terms of evaluation, referencing a sequence directly in a query is roughly the same as including a correlated sub-query. That correlated sub-query will always be evaluated by the optimizer:
SQL> explain plan for select case
2 when 1 = 1 then 1
3 when 2 = 1 then (select 1
4 from dual)
5 end as res
6 from dual;
Explained
Executed in 0 seconds
SQL> select * from table(dbms_xplan.display());
PLAN_TABLE_OUTPUT
-----------------------------------------------------------------
Plan hash value: 1317351201
-----------------------------------------------------------------
| Id | Operation | Name | Rows | Cost (%CPU)| Time |
-----------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 4 (0)| 00:00:01 |
| 1 | FAST DUAL | | 1 | 2 (0)| 00:00:01 |
| 2 | FAST DUAL | | 1 | 2 (0)| 00:00:01 |
-----------------------------------------------------------------
9 rows selected
Executed in 0.063 seconds
We can see that dual table has been included in the execution plan twice.
The analogy with a sub-query was made in a rush; there are more differences than similarities, of course, and sequences are absolutely different mechanisms. But sequences are viewed by the optimizer as a row source, and as long as it doesn't see the nextval pseudocolumn of a sequence directly referenced in the select list of a top-level query, it won't evaluate the sequence; otherwise the sequence will be incremented, whether or not short-circuit evaluation logic is being used. The PL/SQL engine, obviously, has had a different way to access a sequence value since Oracle 11gR1. It should be noted that in versions prior to 11gR1 we had to write a query to reference a sequence in a PL/SQL block, which the PL/SQL engine sent directly to the SQL engine.
The answer to the question of why a sequence gets incremented while the optimizer generates an execution plan lies in the internal implementation of sequences.
I spent almost a day on it now and it seems like I am doing something wrong.
OK, here is the relation:
document_urls( doc_id , url_id)
What I want to do is build a sort of graph that will show all the children that have been generated from a document through one of its URLs.
Example:
select * from document_urls where doc_id=1
doc_id url_id
1 2
1 3
If I select all the documents with url_id = 2 or 3, I will find
select * from document_urls where url_id=2 or url_id=3
doc_id url_id
1 2
1 3
2 3
Now I do the same exercise with document 2, since we covered all links of document 1, and so forth.
Here is my recursive query:
WITH RECURSIVE generate_links(document_id,url_id) as(
select document_id,url_id from document_urls where document_id=1
UNION ALL
select du.document_id,du.url_id from generate_links gl,document_urls du
where gl.url_id=du.url_id
)
SELECT * FROM generate_links GROUP BY url_id,document_id limit 10;
I take it you want to move your where document_id=1 into the lower part of the query.
Be wary about doing so, however, because a recursive query does not inject the constraint into the WITH statement. In other words, it'll actually seq scan your whole table, recursively build every possibility, and only then filter out the rows you don't need.
You'll be better off with an SQL function in practice, i.e. something like this:
create or replace function gen_links(int) returns table (doc_id int, url_id int) as $$
WITH RECURSIVE generate_links(document_id,url_id) as(
select document_id,url_id from document_urls where document_id=$1
UNION ALL
select du.document_id,du.url_id from generate_links gl,document_urls du
where gl.url_id=du.url_id
)
SELECT * FROM generate_links GROUP BY url_id,document_id;
$$ language sql stable;
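Usage is then a plain select, with the constraint applied inside the recursive WITH:

```sql
-- Fetch every (doc_id, url_id) pair reachable from document 1.
select * from gen_links(1);
```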
Say I have a sample table:
id_pk value
------------
1 a
2 b
3 c
And I have a sample PL/SQL block, which has a query that currently selects a single row into an array:
declare
type t_table is table of myTable%rowtype;
n_RequiredId myTable.id_pk%type := 1;
t_Output t_table := t_table();
begin
select m.id_pk, m.value
bulk collect into t_Output
from myTable m
where m.id_pk = n_RequiredId;
end;
What I need to do is implement the ability to select a single row into an array, as shown in the block above, OR to select all rows into an array if n_RequiredId, which is actually a user-input parameter, is set to null.
And, the question is, what's the best practice to handle such situation?
I can think of modifying where clause of my query to something like this:
where m.id_pk = nvl(n_RequiredId, m.id_pk);
but I suppose that's going to slow down the query when the parameter isn't null, and I remember Kyte said something really bad about this approach.
I can also think of implementing the following PL/SQL logic:
if n_RequiredId is null then
select m.id_pk, m.value bulk collect into t_Output from myTable m;
else
select m.id_pk, m.value bulk collect
into t_Output
from myTable m
where m.id_pk = n_RequiredId;
end if;
But this would become too complex if I encounter more than one parameter of this kind.
What would you advise me?
Yes, but note that all of the following:
WHERE m.id_pk = NVL(n_RequiredId, m.id_pk);
WHERE m.id_pk = COALESCE(n_RequiredId, m.id_pk);
WHERE (n_RequiredId IS NULL OR m.id_pk = n_RequiredId);
...are not sargable. They will work, but perform the worst of the available options.
If you only have one parameter, the IF/ELSE and separate, tailored statements are a better alternative.
The next option after that is dynamic SQL. But coding dynamic SQL is useless if you carry over the non-sargable predicates from the first example. Dynamic SQL allows you to tailor the query while accommodating numerous paths. But it also risks SQL injection, so it should be performed behind parameterized queries (preferably within stored procedures/functions in packages).
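As a minimal sketch of that idea for the single optional parameter in the question (assuming the myTable definition from above), the predicate is appended only when needed and the value is always passed as a bind variable, never concatenated:

```sql
declare
  type t_table is table of myTable%rowtype;
  n_RequiredId myTable.id_pk%type := 1;  -- may be null
  t_Output     t_table;
  v_sql        varchar2(200) := 'select m.id_pk, m.value from myTable m';
begin
  if n_RequiredId is null then
    execute immediate v_sql bulk collect into t_Output;
  else
    -- Sargable predicate; the value is a bind variable, not a literal.
    execute immediate v_sql || ' where m.id_pk = :id'
      bulk collect into t_Output using n_RequiredId;
  end if;
end;
/
```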
OMG_Ponies' and Rob van Wijk's answers are entirely correct, this is just supplemental.
There's a nice trick to make it easy to use bind variables and still use dynamic SQL. If you put all of the binds in a with clause at the beginning, you can always bind the same set of variables, whether or not you're going to use them.
For instance, say you have three parameters, representing a date range and an ID. If you want to just search on the ID, you could put the query together like this:
with parameters as (
select :start_date as start_date,
:end_date as end_date,
:search_id as search_id
from dual)
select *
from your_table
inner join parameters
on parameters.search_id = your_table.id;
On the other hand, if you need to search on the ID and date range, it could look like this:
with parameters as (
    select :start_date as start_date,
           :end_date   as end_date,
           :search_id  as search_id
      from dual)
select *
  from your_table
 inner join parameters
    on parameters.search_id = your_table.id
   and your_table.create_date between parameters.start_date
                                  and parameters.end_date;
This may seem like a roundabout way of handling this, but the end result is that no matter how complicated your dynamic SQL gets, as long as it only needs those three parameters, the PL/SQL call is always something like:
execute immediate v_SQL using v_start_date, v_end_date, v_search_id;
In my experience it's better to make the SQL construction slightly more complicated in order to ensure that there's only one line where it actually gets executed.
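As an illustration, a hypothetical build-up of that pattern (v_sql, the bind values, and the ref cursor are invented names; the optional predicate is appended conditionally, yet the bind list never changes):

```sql
declare
    c_results    sys_refcursor;
    v_sql        varchar2(4000);
    v_start_date date;           -- hypothetical bind values
    v_end_date   date;
    v_search_id  number := 42;
begin
    v_sql := 'with parameters as (
                  select :start_date as start_date,
                         :end_date   as end_date,
                         :search_id  as search_id
                    from dual)
              select *
                from your_table
               inner join parameters
                  on parameters.search_id = your_table.id';

    -- Append optional predicates as needed; the binds stay the same.
    if v_start_date is not null then
        v_sql := v_sql || ' and your_table.create_date
                              between parameters.start_date
                                  and parameters.end_date';
    end if;

    -- No matter which branches fired, execution is this one line:
    open c_results for v_sql using v_start_date, v_end_date, v_search_id;
end;
```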
The NVL approach will usually work fine. The optimizer recognizes this pattern and builds a dynamic plan: it uses an index when a single value is supplied and a full table scan when the parameter is NULL.
Sample table and data
drop table myTable;
create table myTable(
    id_pk number,
    value varchar2(100),
    constraint myTable_pk primary key (id_pk)
);
insert into myTable select level, level from dual connect by level <= 100000;
commit;
Execute with different predicates
--Execute predicates that return one row if the ID is set, or all rows if ID is null.
declare
    type t_table is table of myTable%rowtype;
    n_RequiredId myTable.id_pk%type := 1;
    t_Output     t_table := t_table();
begin
    select /*+ SO_QUERY_1 */ m.id_pk, m.value
      bulk collect into t_Output
      from myTable m
     where m.id_pk = nvl(n_RequiredId, m.id_pk);

    select /*+ SO_QUERY_2 */ m.id_pk, m.value
      bulk collect into t_Output
      from myTable m
     where m.id_pk = coalesce(n_RequiredId, m.id_pk);

    select /*+ SO_QUERY_3 */ m.id_pk, m.value
      bulk collect into t_Output
      from myTable m
     where (n_RequiredId is null or m.id_pk = n_RequiredId);
end;
/
Get execution plans
select sql_id, child_number
  from gv$sql
 where lower(sql_text) like '%so_query_%'
   and sql_text not like '%QUINE%'
   and sql_text not like 'declare%';
select * from table(dbms_xplan.display_cursor(sql_id => '76ucq3bkgt0qa', cursor_child_no => 1, format => 'basic'));
select * from table(dbms_xplan.display_cursor(sql_id => '4vxf8yy5xd6qv', cursor_child_no => 1, format => 'basic'));
select * from table(dbms_xplan.display_cursor(sql_id => '457ypz0jpk3np', cursor_child_no => 1, format => 'basic'));
Bad plans for COALESCE and IS NULL OR
EXPLAINED SQL STATEMENT:
------------------------
SELECT /*+ SO_QUERY_2 */ M.ID_PK, M.VALUE FROM MYTABLE M WHERE M.ID_PK
= COALESCE(:B1 , M.ID_PK)
Plan hash value: 1229213413
-------------------------------------
| Id | Operation | Name |
-------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | TABLE ACCESS FULL| MYTABLE |
-------------------------------------
EXPLAINED SQL STATEMENT:
------------------------
SELECT /*+ SO_QUERY_3 */ M.ID_PK, M.VALUE FROM MYTABLE M WHERE (:B1 IS
NULL OR M.ID_PK = :B1 )
Plan hash value: 1229213413
-------------------------------------
| Id | Operation | Name |
-------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | TABLE ACCESS FULL| MYTABLE |
-------------------------------------
Good plan for NVL
The FILTER operations allow the optimizer to choose a different plan at run time, depending on the input values.
EXPLAINED SQL STATEMENT:
------------------------
SELECT /*+ SO_QUERY_1 */ M.ID_PK, M.VALUE FROM MYTABLE M WHERE M.ID_PK
= NVL(:B1 , M.ID_PK)
Plan hash value: 730481884
----------------------------------------------------
| Id | Operation | Name |
----------------------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | CONCATENATION | |
| 2 | FILTER | |
| 3 | TABLE ACCESS FULL | MYTABLE |
| 4 | FILTER | |
| 5 | TABLE ACCESS BY INDEX ROWID| MYTABLE |
| 6 | INDEX UNIQUE SCAN | MYTABLE_PK |
----------------------------------------------------
Warnings
FILTER operations and this NVL trick are not well documented. I'm not sure which version introduced these features, but they work in 11g. I've had problems getting the FILTER to work correctly with some complicated queries, but for simple queries like these it is reliable.