Why is Oracle SQL Optimizer ignoring index predicate for this view?

I'm trying to optimize a set of stored procedures that go against many tables, including this view. The view is as such:
We have TBL_A (id, hist_date, hist_type, other_columns) with two types of rows: hist_type 'O' vs. hist_type 'N'. The view joins TBL_A to itself and transposes the N rows against the corresponding O rows. If no N row exists for an O row, the O row's values are repeated. Like so:
CREATE OR REPLACE FORCE VIEW V_A (id, hist_date, hist_type, other_columns_o, other_columns_n) AS
select
    o.id, o.hist_date, o.hist_type,
    o.other_columns as other_columns_o,
    case when n.id is not null then n.other_columns else o.other_columns end as other_columns_n
from
    TBL_A o left outer join TBL_A n
    on o.id = n.id and o.hist_date = n.hist_date and n.hist_type = 'N'
where o.hist_type = 'O';
TBL_A has a unique index on: (id, hist_date, hist_type). It also has a unique index on: (hist_date, id, hist_type) and this is the primary key.
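For reference, a DDL sketch of the indexes just described (the index and constraint names here are made up):

-- unique index on (id, hist_date, hist_type); name is hypothetical
CREATE UNIQUE INDEX tbl_a_ux_id_date_type ON tbl_a (id, hist_date, hist_type);
-- primary key on (hist_date, id, hist_type), backed by its own unique index
ALTER TABLE tbl_a ADD CONSTRAINT tbl_a_pk PRIMARY KEY (hist_date, id, hist_type);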
The following query is at issue (in a stored proc, with x declared as TYPE_TABLE_OF_NUMBER):
select b.id BULK COLLECT into x from TBL_B b where b.parent_id = input_id;
select v.id from v_a v
where v.id in (select column_value from table(x))
and v.hist_date = input_date
and v.status_new = 'CLOSED';
This query ignores the index on the id column when accessing TBL_A and instead does a range scan using the date to pick up all the rows for that date, then filters that set using the values from the array. However, if I simply give the ids as a literal list of numbers, the optimizer uses the index just fine:
select v.id from v_a v
where v.id in (123, 234, 345, 456, 567, 678, 789)
and v.hist_date = input_date
and v.status_new = 'CLOSED';
The problem also doesn't exist when going against TBL_A directly (and I have a workaround that does that, but it's not ideal). Is there a way to get the optimizer to first retrieve the array values and use them as predicates when accessing the table? Or a good way to restructure the view to achieve this?

Oracle does not use the index because it assumes select column_value from table(x) returns 8168 rows.
Indexes are faster for retrieving small amounts of data. At some point it's faster to scan the whole table than repeatedly walk the index tree.
Estimating the cardinality of a regular SQL statement is difficult enough; creating an accurate estimate for procedural code is almost impossible. I don't know exactly where 8168 comes from, although it is commonly said to be the default guess for an 8 KB block size. Table functions are normally used with pipelined functions in data warehouses, where a sorta-large number makes sense.
Dynamic sampling can generate a more accurate estimate and likely generate a plan that will use the index.
Here's an example of a bad cardinality estimate:
create or replace type type_table_of_number as table of number;
explain plan for
select * from table(type_table_of_number(1,2,3,4,5,6,7));
select * from table(dbms_xplan.display(format => '-cost -bytes'));
Plan hash value: 1748000095
-------------------------------------------------------------------------
| Id | Operation | Name | Rows | Time |
-------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 8168 | 00:00:01 |
| 1 | COLLECTION ITERATOR CONSTRUCTOR FETCH| | 8168 | 00:00:01 |
-------------------------------------------------------------------------
Here's how to fix it:
explain plan for select /*+ dynamic_sampling(2) */ *
from table(type_table_of_number(1,2,3,4,5,6,7));
select * from table(dbms_xplan.display(format => '-cost -bytes'));
Plan hash value: 1748000095
-------------------------------------------------------------------------
| Id | Operation | Name | Rows | Time |
-------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 7 | 00:00:01 |
| 1 | COLLECTION ITERATOR CONSTRUCTOR FETCH| | 7 | 00:00:01 |
-------------------------------------------------------------------------
Note
-----
- dynamic statistics used: dynamic sampling (level=2)
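Applied to the original statement, the hint would look something like this (a sketch reusing the view, collection, and variables from the question):

select /*+ dynamic_sampling(2) */ v.id
from v_a v
where v.id in (select column_value from table(x))
and v.hist_date = input_date
and v.status_new = 'CLOSED';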

Related

Self-Join with Natural Join

What is the difference between
select * from degreeprogram NATURAL JOIN degreeprogram ;
and
select * from degreeprogram d1 NATURAL JOIN degreeprogram d2;
in oracle?
I expected them to return the same result set, but they do not. The second query does what I expect: it joins the two relations on their same-named attributes and so returns the same tuples as stored in degreeprogram. The first query, however, confuses me: each tuple occurs several times in the result set. What join condition is used here?
Thank you
NATURAL JOIN means join the two tables based on all columns having the same name in both tables.
I imagine that for each column in your table, Oracle is internally writing a condition like:
degreeprogram.column1 = degreeprogram.column1
(which you would not be able to write yourself due to ORA-00918 column ambiguously defined error)
And then, I imagine, Oracle is optimizing that away to just
degreeprogram.column1 is not null
So, you're not exactly getting a CROSS JOIN of your table with itself -- only a CROSS JOIN of those rows having no null columns.
UPDATE: Since this was the selected answer, I will just add from Thorsten Kettner's answer that this behavior is probably a bug on Oracle's part. In 18c, Oracle behaves properly and returns an ORA-00918 error when you try to NATURAL JOIN a table to itself.
The difference between the two statements is that the second explicitly defines a self join on the table, whereas in the first the optimizer has to work out what you really want. On my database, the first statement performs a Cartesian merge join and is not optimized at all, while the second statement has a better explain plan, using a single full table access with index scanning.
I'd call this a bug. This query:
select * from degreeprogram d1 NATURAL JOIN degreeprogram d2;
translates to
select col1, col2, ... -- all columns
from degreeprogram d1
join degreeprogram d2 using (col1, col2, ...)
and gives you all rows from the table where all columns are not null (because using(col) never matches nulls).
This query, however:
select * from degreeprogram NATURAL JOIN degreeprogram;
is invalid according to standard SQL, because every table must have a unique name or alias in a query. Oracle lets this pass, but in doing so it should still do something to keep the two table instances apart (e.g. internally create an alias for each). It obviously doesn't, and simply multiplies the result by the number of rows in the table. A bug.
A so-called natural join instructs the database to:
1. Find all column names common to both tables (in this case, degreeprogram and degreeprogram, which of course have the same columns).
2. Generate a join condition for each pair of matching column names, in the form table1.column1 = table2.column1 (in this case, there will be one for every column in degreeprogram).
Therefore a query like this
select count(*) from demo natural join demo;
will be transformed into
select count(*) from demo, demo where demo.x = demo.x;
I checked this by creating a table with one column and two rows:
create table demo (x integer);
insert into demo values (1);
insert into demo values (2);
commit;
and then tracing the session:
SQL> alter session set tracefile_identifier='demo_trace';
Session altered.
SQL> alter session set events 'trace [SQL_Compiler.*]';
Session altered.
SQL> select /* nj test */ count(*) from demo natural join demo;
COUNT(*)
----------
4
1 row selected.
SQL> alter session set events 'trace [SQL_Compiler.*] off';
Session altered.
Then in twelve_ora_6196_demo_trace.trc I found this line:
Final query after transformations:******* UNPARSED QUERY IS *******
SELECT COUNT(*) "COUNT(*)" FROM "WILLIAM"."DEMO" "DEMO","WILLIAM"."DEMO" "DEMO" WHERE "DEMO"."X"="DEMO"."X"
and a few lines later:
try to generate single-table filter predicates from ORs for query block SEL$58A6D7F6 (#0)
finally: "DEMO"."X" IS NOT NULL
(This is merely an optimisation on top of the generated query above, as column X is nullable but the join allows the optimiser to infer that only non-null values are required. It doesn't replace the joins.)
Hence the execution plan:
-----------------------------------------+-----------------------------------+
| Id | Operation | Name | Rows | Bytes | Cost | Time |
-----------------------------------------+-----------------------------------+
| 0 | SELECT STATEMENT | | | | 7 | |
| 1 | SORT AGGREGATE | | 1 | 13 | | |
| 2 | MERGE JOIN CARTESIAN | | 4 | 52 | 7 | 00:00:01 |
| 3 | TABLE ACCESS FULL | DEMO | 2 | 26 | 3 | 00:00:01 |
| 4 | BUFFER SORT | | 2 | | 4 | 00:00:01 |
| 5 | TABLE ACCESS FULL | DEMO | 2 | | 2 | 00:00:01 |
-----------------------------------------+-----------------------------------+
Query Block Name / Object Alias(identified by operation id):
------------------------------------------------------------
1 - SEL$58A6D7F6
3 - SEL$58A6D7F6 / DEMO_0001#SEL$1
5 - SEL$58A6D7F6 / DEMO_0002#SEL$1
------------------------------------------------------------
Predicate Information:
----------------------
3 - filter("DEMO"."X" IS NOT NULL)
Alternatively, let's see what dbms_utility.expand_sql_text does with it. I'm not quite sure what to make of this given the trace file above, but it shows a similar expansion taking place:
SQL> var result varchar2(1000)
SQL> exec dbms_utility.expand_sql_text('select count(*) from demo natural join demo', :result)
PL/SQL procedure successfully completed.
RESULT
----------------------------------------------------------------------------------------------------------------------------------
SELECT COUNT(*) "COUNT(*)" FROM (SELECT "A2"."X" "X" FROM "WILLIAM"."DEMO" "A3","WILLIAM"."DEMO" "A2" WHERE "A2"."X"="A2"."X") "A1"
Lesson: NATURAL JOIN is evil. Everybody knows this.
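If you want the usual semantics, spell the self join out with aliases and an explicit condition, as in this sketch (program_id is a hypothetical key column; substitute your own):

select d1.*
from degreeprogram d1
join degreeprogram d2 on d1.program_id = d2.program_id;  -- program_id is hypothetical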

Improve join query in Oracle

I have a query which takes 17 seconds to execute. I have applied indexes on FIPS, STR_DT, and END_DT, but it's still slow. Any suggestions on how I can improve the performance?
My query:
SELECT /*+ALL_ROWS*/ K_LF_SVA_VA.NEXTVAL VAL_REC_ID, a.REC_ID,
b.VID,
1 VA_SEQ,
51 VA_VALUE_DATATYPE,
b.VALUE VAL_NUM,
SYSDATE CREATED_DATE,
SYSDATE UPDATED_DATE
FROM CTY_REC a JOIN FIPS_CONS b
ON a.FIPS=b.FIPS AND a.STR_DT=b.STR_DT AND a.END_DT=b.END_DT;
DESC CTY_REC;
Name                 Null      Type
-------------------  --------  -------------
REC_ID                         NUMBER(38)
DATA_SOURCE_DATE               DATE
STR_DT                         DATE
END_DT                         DATE
VID_RECSET_ID                  NUMBER
VID_VALSET_ID                  NUMBER
FIPS                           VARCHAR2(255)

DESC FIPS_CONS;
Name           Null      Type
-------------  --------  -------------
STR_DT                   DATE
END_DT                   DATE
FIPS                     VARCHAR2(255)
VARIABLE                 VARCHAR2(515)
VALUE                    NUMBER
VID            NOT NULL  NUMBER
Explain Plan:
Plan hash value: 919279614
--------------------------------------------------------------
| Id | Operation | Name |
--------------------------------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | SEQUENCE | K_VAL |
| 2 | HASH JOIN | |
| 3 | TABLE ACCESS FULL| CTY_REC |
| 4 | TABLE ACCESS FULL| FIPS_CONS |
--------------------------------------------------------------
I have added description of tables and explain plan for my query.
On the face of it, and without information on the configuration of the sequence you're using, the number of rows in each table, and the total number of rows projected from the query, it's possible that the execution plan you have is the most efficient one for returning all rows.
The optimiser clearly thinks that the indexes will not benefit performance, and this is often more likely when you optimise for all rows, not first rows. Index-based access is single block and one row at a time, so can be inherently slower than multiblock full scans on a per-block basis.
The hash join that Oracle is using is an extremely efficient way of joining data sets. Unless the hashed table is so large that it spills to disk, the total cost is only slightly more than full scans of the two tables. We need more detailed statistics on the execution to be able to tell if the hashed table is spilling to disk, and if it is the solution may just be modified memory management, not indexes.
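One standard way to get those statistics is to run the join with the gather_plan_statistics hint and then pull the actual execution figures from the cursor cache; a Used-Tmp column in the output indicates the hash join spilled to disk. A sketch, using count(*) just to probe the join itself:

select /*+ gather_plan_statistics */ count(*)
from cty_rec a join fips_cons b
on a.fips = b.fips and a.str_dt = b.str_dt and a.end_dt = b.end_dt;

select * from table(dbms_xplan.display_cursor(null, null, 'ALLSTATS LAST'));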
What might also hold up your SQL execution is calling that sequence, if the sequence's cache value is very low and the number of records is high. More info required on that -- if you need to generate a sequential identifier for each row then you could use ROWNUM.
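If the sequence does turn out to be the bottleneck, increasing its cache is a one-liner (the value 1000 here is just an example; size it to your insert volume):

ALTER SEQUENCE K_LF_SVA_VA CACHE 1000;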
This is basically your query:
SELECT . . .
FROM CTY_REC a JOIN
FIPS_CONS b
ON a.FIPS = b.FIPS AND a.STR_DT = b.STR_DT AND a.END_DT = b.END_DT;
You want a composite index on (FIPS, STR_DT, END_DT), perhaps on both tables:
create index idx_cty_rec_3 on cty_rec(FIPS, STR_DT, END_DT);
create index idx_fips_cons_3 on fips_cons(FIPS, STR_DT, END_DT);
Actually, only one is probably necessary but having both gives the optimizer more choices for improving the query.
You should have at least these two indexes on the table:
CTY_REC(FIPS, STR_DT, END_DT)
FIPS_CONS(FIPS, STR_DT, END_DT)
which can still be sped up with covering indexes instead:
CTY_REC(FIPS, STR_DT, END_DT, REC_ID)
FIPS_CONS(FIPS, STR_DT, END_DT, VALUE, VID)
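As DDL, those covering indexes would look something like this (the index names are made up):

CREATE INDEX idx_cty_rec_cover ON CTY_REC (FIPS, STR_DT, END_DT, REC_ID);
CREATE INDEX idx_fips_cons_cover ON FIPS_CONS (FIPS, STR_DT, END_DT, VALUE, VID);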
If you wish to drive the optimizer to use the indexes, replace /*+ ALL_ROWS */ with /*+ FIRST_ROWS */.
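So the hinted statement would read (same query as in the question, only the hint swapped):

SELECT /*+ FIRST_ROWS */ K_LF_SVA_VA.NEXTVAL VAL_REC_ID, a.REC_ID,
b.VID,
1 VA_SEQ,
51 VA_VALUE_DATATYPE,
b.VALUE VAL_NUM,
SYSDATE CREATED_DATE,
SYSDATE UPDATED_DATE
FROM CTY_REC a JOIN FIPS_CONS b
ON a.FIPS=b.FIPS AND a.STR_DT=b.STR_DT AND a.END_DT=b.END_DT;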

Why do WHERE and HAVING exist as separate clauses in SQL?

I understand the distinction between WHERE and HAVING in a SQL query, but I don't see why they are separate clauses. Couldn't they be combined into a single clause that could handle both aggregated and non-aggregated data?
Here's the rule. If a condition refers to an aggregate function, put that condition in the HAVING clause. Otherwise, use the WHERE clause.
Here's another rule: HAVING is normally used together with GROUP BY. (Strictly speaking, SQL allows HAVING without GROUP BY, in which case the whole result set is treated as a single group, but the typical use is with grouping.)
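A minimal illustration of the rule (table and column names here are hypothetical):

SELECT dept_id, SUM(salary) AS total_salary
FROM emp
WHERE salary > 0              -- row-level condition: applied before grouping
GROUP BY dept_id
HAVING SUM(salary) > 100000;  -- aggregate condition: applied after grouping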
The main difference is that WHERE cannot be applied to a grouped item (such as SUM(number)), whereas HAVING can. The reason is that WHERE is evaluated before the grouping and HAVING after the grouping is done.
Another difference (in MySQL) is that a WHERE condition must reference a column of a table, while a HAVING condition can reference either a column or a select-list alias.
Here's the difference:
SELECT `value` v FROM `table` WHERE `v`>5;
Error #1054 - Unknown column 'v' in 'where clause'
SELECT `value` v FROM `table` HAVING `v`>5; -- Get 5 rows
A WHERE condition must reference a column in a table, but a HAVING condition can reference either a column or an alias. This is because the WHERE clause filters data before the select list is evaluated, while the HAVING clause filters it afterwards.
So, when a table has many rows, putting the conditions in the WHERE clause is more efficient.
Try EXPLAIN to see the key difference:
EXPLAIN SELECT `value` v FROM `table` WHERE `value`>5;
+----+-------------+-------+-------+---------------+-------+---------+------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+-------+---------+------+------+--------------------------+
| 1 | SIMPLE | table | range | value | value | 4 | NULL | 5 | Using where; Using index |
+----+-------------+-------+-------+---------------+-------+---------+------+------+--------------------------+
EXPLAIN SELECT `value` v FROM `table` having `value`>5;
+----+-------------+-------+-------+---------------+-------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+-------+---------+------+------+-------------+
| 1 | SIMPLE | table | index | NULL | value | 4 | NULL | 10 | Using index |
+----+-------------+-------+-------+---------------+-------+---------+------+------+-------------+
You can see that both the WHERE and the HAVING versions use the index, but the row counts differ: the WHERE version examines only the 5 matching rows, while the HAVING version reads all 10 and filters afterwards.
So both clauses are needed, especially when we need grouping plus additional filters.
This question seems to stem from a misunderstanding: WHERE and HAVING each see only part of the information needed to fully process a query.
Consider the following SQL:
drop table if exists foo;

create table foo (
    ID int,
    bar int
);

insert into foo values (1, 1);

select now() as d, bar as b
from foo
where bar = 1 and d <= now()   -- fails: the alias d does not exist yet in WHERE
having bar = 1 and ID = 1      -- fails: ID was not selected, so HAVING cannot see it
;
In the where clause, d is not available because the selected items have not been processed to create it yet.
In the having clause ID has been discarded because it was not selected. In aggregate queries ID may not even have meaning in context of multiple rows combined into one. ID may also be meaningless when joining different tables into a single result.
Could it be done? Sure, but on the back-end it'd do the same as it does now, because you have to aggregate something before you can filter based on that aggregation. Ultimately that's the reason, it's a logical separation of different processes. Why waste resources aggregating records you could have filtered with a WHERE?
The question could only be fully answered by the designer since it asks intent. But the implication is that both clauses do the same thing only against aggregated vs. non-aggregated data. That's not true. "The HAVING clause is typically used together with the GROUP BY clause to filter the results of aggregate values. However, HAVING can be specified without GROUP BY."
As I understand it, the important thing is that "The HAVING clause specifies additional filters that are applied after the WHERE clause filters."
http://technet.microsoft.com/en-us/library/ms179270(v=sql.105).aspx

How can I optimize this query?

I have the following query:
SELECT `masters_tp`.*, `masters_cp`.`cp` as cp, `masters_cp`.`punti` as punti
FROM (`masters_tp`)
LEFT JOIN `masters_cp` ON `masters_cp`.`nickname` = `masters_tp`.`nickname`
WHERE `masters_tp`.`stake` = 'report_A'
AND `masters_cp`.`stake` = 'report_A'
ORDER BY `masters_tp`.`tp` DESC, `masters_cp`.`punti` DESC
LIMIT 400;
Is there something wrong with this query that could affect the server memory?
Here is the output of EXPLAIN
+----+-------------+------------+------+---------------+------+---------+------+-------+-----------------------------------------------+
| id | select_type | table      | type | possible_keys | key  | key_len | ref  | rows  | Extra                                         |
+----+-------------+------------+------+---------------+------+---------+------+-------+-----------------------------------------------+
|  1 | SIMPLE      | masters_cp | ALL  | NULL          | NULL | NULL    | NULL |  8943 | Using where; Using temporary; Using filesort  |
|  1 | SIMPLE      | masters_tp | ALL  | NULL          | NULL | NULL    | NULL | 12693 | Using where                                   |
+----+-------------+------------+------+---------------+------+---------+------+-------+-----------------------------------------------+
Run the same query prefixed with EXPLAIN and add the output to your question - this will show what indexes you are using and the number of rows being analyzed.
You can see from your EXPLAIN output that no indexes are being used, and it's having to look at thousands of rows to get your result. Try adding an index on the columns used to perform the join, e.g. nickname and stake:
ALTER TABLE masters_tp ADD INDEX(nickname),ADD INDEX(stake);
ALTER TABLE masters_cp ADD INDEX(nickname),ADD INDEX(stake);
(I've assumed the columns might have duplicated values, if not, use UNIQUE rather than INDEX). See the MySQL manual for more information.
Replace the "masters_tp.* " bit by explicitly naming only the fields from that table you actually need. Even if you need them all, name them all.
There's actually no reason to do a left join here: your WHERE condition on masters_cp.stake discards any rows the outer join would have preserved. Try this:
SELECT
`masters_tp`.*,
`masters_cp`.`cp` as cp,
`masters_cp`.`punti` as punti
FROM
`masters_tp`
INNER JOIN `masters_cp` ON
`masters_tp`.`stake` = `masters_cp`.`stake`
and `masters_tp`.`nickname` = `masters_cp`.`nickname`
WHERE
`masters_tp`.`stake` = 'report_A'
ORDER BY
`masters_tp`.`tp` DESC,
`masters_cp`.`punti` DESC
LIMIT 400;
Inner joins tend to be faster than left joins. The query can use the predicates (i.e. the WHERE clause) to limit the number of rows that have to be joined, which means the database potentially handles far fewer rows, and that obviously speeds things up.
Additionally, make sure you have a non-clustered index on stake and nickname (in that order).
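In MySQL that composite index could be created like so (the index name is made up):

ALTER TABLE masters_cp ADD INDEX idx_stake_nickname (stake, nickname);
ALTER TABLE masters_tp ADD INDEX idx_stake_nickname (stake, nickname);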
It is a simple query; I think everything is OK with it. You could try adding indexes on the 'stake' fields or lowering the limit.

How to use a function-based index on a column that contains NULLs in Oracle 10+?

Let's just say you have a table in Oracle:
CREATE TABLE person (
id NUMBER PRIMARY KEY,
given_names VARCHAR2(50),
surname VARCHAR2(50)
);
with these function-based indices:
CREATE INDEX idx_person_upper_given_names ON person (UPPER(given_names));
CREATE INDEX idx_person_upper_surname ON person (UPPER(surname));
Now, given_names has no NULL values, but for argument's sake surname does. If I do this:
SELECT * FROM person WHERE UPPER(given_names) LIKE 'P%'
the explain plan tells me it's using the index, but change it to:
SELECT * FROM person WHERE UPPER(surname) LIKE 'P%'
and it doesn't. The Oracle docs say a function-based index will only be used when several conditions are met, one of which is ensuring there are no NULL values, since NULLs aren't indexed.
I've tried these queries:
SELECT * FROM person WHERE UPPER(surname) LIKE 'P%' AND UPPER(surname) IS NOT NULL
and
SELECT * FROM person WHERE UPPER(surname) LIKE 'P%' AND surname IS NOT NULL
In the latter case I even added an index on surname itself, but no matter what I try it uses a full table scan. Assuming I can't get rid of the NULL values, how do I get this query to use the index on UPPER(surname)?
The index can be used, though the optimiser may have chosen not to use it for your particular example:
SQL> create table my_objects
2 as select object_id, object_name
3 from all_objects;
Table created.
SQL> select count(*) from my_objects;
COUNT(*)
----------
83783
SQL> alter table my_objects modify object_name null;
Table altered.
SQL> update my_objects
2 set object_name=null
3 where object_name like 'T%';
1305 rows updated.
SQL> create index my_objects_name on my_objects (lower(object_name));
Index created.
SQL> set autotrace traceonly
SQL> select * from my_objects
2 where lower(object_name) like 'emp%';
29 rows selected.
Execution Plan
----------------------------------------------------------
------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)|
------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 17 | 510 | 355 (1)|
| 1 | TABLE ACCESS BY INDEX ROWID| MY_OBJECTS | 17 | 510 | 355 (1)|
|* 2 | INDEX RANGE SCAN | MY_OBJECTS_NAME | 671 | | 6 (0)|
------------------------------------------------------------------------------------
The documentation you read was presumably pointing out that, just like any other index, all-null keys are not stored in the index.
In your example as originally posted, the index DDL had a pasting mistake - it would have thrown an error as written - so I'm assuming that wasn't the actual code you tried.
I tried it with
CREATE INDEX idx_person_upper_surname ON person (UPPER(surname));
SELECT * FROM person WHERE UPPER(surname) LIKE 'P%';
and it produced the expected query plan:
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=1 Card=1 Bytes=67)
1 0 TABLE ACCESS (BY INDEX ROWID) OF 'PERSON' (TABLE) (Cost=1
Card=1 Bytes=67)
2 1 INDEX (RANGE SCAN) OF 'IDX_PERSON_UPPER_SURNAME' (INDEX)
(Cost=1 Card=1)
To answer your question, yes it should work. Try double checking that you do have the second index created correctly.
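One quick way to double-check is to query the data dictionary for the index expressions (these are standard Oracle views):

SELECT index_name, column_expression
FROM user_ind_expressions
WHERE table_name = 'PERSON';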
Also try an explicit hint:
SELECT /*+INDEX(PERSON IDX_PERSON_UPPER_SURNAME)*/ *
FROM person
WHERE UPPER(surname) LIKE 'P%';
If that works, but only with the hint, then it is likely related to CBO statistics gone wrong, or CBO related init parameters.
Are you sure you want the index to be used? Full table scans are not bad. Depending on the size of the table, it might be more efficient to do a table scan than use an index. It also depends on the density and distribution of the data, which is why statistics are gathered. The cost based optimizer can usually be trusted to make the right choice. Unless you have a specific performance problem, I wouldn't worry too much about it.
Oracle will still use a function-based index on a column that contains NULLs - I think you misinterpreted the documentation.
You do need to build an NVL into the indexed expression if you want the index to cover the NULL rows, though.
Something like...
create index idx_person_upper_surname on person (nvl(upper(surname),'N/A'));
You can then query using the index with
select * from person where nvl(upper(surname),'N/A') = 'PIERPOINT'
Although this is all a bit ugly. Since most people have surnames, perhaps a NOT NULL constraint would be more appropriate :-).
You can circumvent the problem of null values being unindexed in this or other situations by also indexing based on a literal value:
CREATE INDEX idx_person_upper_surname ON person (UPPER(surname),0);
This allows you to use the index for such queries as:
Select *
From person
Where UPPER(surname) is null;
This query would normally not use an index, except with a bitmap index or an index that also includes a non-nullable real column other than surname.