Oracle picks wrong index when joining table - sql

I have a long query, but similar to the short version here:
select * from table_a a
left join table_b b on
b.id = a.id and
b.name = 'CONSTANT';
There are two indexes on table_b, one on id and one on name; idx_id has fewer duplicates, while idx_name has a lot of duplicates. table_b is quite large (20M+ records), and the join is taking 10+ minutes.
A simple explain plan shows a lot of memory use on the join, and it shows that Oracle uses the index on name rather than the one on id.
How can I solve this? How can I force the use of the idx_id index?
I was thinking of moving b.name = 'CONSTANT' into the WHERE clause, but this is a left join, and a WHERE filter on b.name would effectively turn it into an inner join, dropping table_a records that have no match.
Update: explain plans added below. Sorry, I cannot paste the whole plan.
Explain plan with b.name='CONSTANT':
Explain plan when commenting b.name clause:

Add an optimizer hint to your query.
Without knowing your 'long' query, it's difficult to say whether Oracle is really picking the wrong index, or whether your reasoning that the index with fewer duplicates must therefore be quicker for this query is correct.
The syntax to add a hint is:
select /*+ index(table_name index_name) */ * from ....;
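Applied to the query above, it might look like this (index name taken from the question; note that when the table has an alias, the hint must reference the alias, not the table name):
select /*+ index(b idx_id) */ * from table_a a
left join table_b b on
b.id = a.id and
b.name = 'CONSTANT';
This is a sketch, not a guarantee of a faster plan; check the new explain plan afterwards to confirm the hint is actually honoured.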

What is the size of TABLE_A relative to TABLE_B? It wouldn't make sense to use the ID index unless TABLE_A had significantly fewer rows than TABLE_B.
Index range scans are generally only useful when they access a small percentage of the rows in a table. Oracle reads the index one-block-at-a-time, and then still has to pull the relevant row from the table. If the index isn't very selective, that process can be slower than a multi-block full table scan.
Also, it might help if you can post the full explain plan using this text format:
explain plan for select ... ;
select * from table(dbms_xplan.display);

Does the index still exist in these situations?

I have some questions about indexes.
First, if I use an indexed column in a WITH clause, does this column still work as an indexed column in the main query?
For example,
WITH TEST AS (
SELECT EMP_ID
FROM EMP_MASTER
)
SELECT *
FROM TEST
WHERE EMP_ID >= '2000'
'EMP_ID' in the 'EMP_MASTER' table is the PK, and the index for EMP_MASTER consists of EMP_ID.
In this situation, does an 'Index Scan' happen in the main query?
Second, if I join two tables and then use the indexed column from each table in the WHERE clause, does an 'Index Scan' happen?
For example,
SELECT *
FROM A, B
WHERE A.COL1 = B.COL1 AND
A.COL1 > 200 AND
B.COL1 > 100
The index for table A consists of 'COL1', and the index for table B consists of 'COL1'.
In this situation, does an 'Index Scan' happen on each table before the join?
I'd really appreciate any advice.
First, SQL is a declarative language, not a procedural language. That is, a SQL query describes the result set, not the specific processing. The SQL engine uses the optimizer to determine the best execution plan to generate the result set.
Second, Oracle has a reasonable optimizer.
Hence, Oracle will probably use the indexes in these situations. However, you should look at the execution plan to see what Oracle really does.
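For example, with the EMP_MASTER query from the question, you can check the plan like this (standard EXPLAIN PLAN / DBMS_XPLAN usage):
EXPLAIN PLAN FOR
WITH TEST AS (
SELECT EMP_ID
FROM EMP_MASTER
)
SELECT *
FROM TEST
WHERE EMP_ID >= '2000';
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
If the optimizer decides the PK index is worthwhile, you will see an index operation (e.g. INDEX RANGE SCAN) in the output rather than a full table scan.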
First, if I use an indexed column in a WITH clause, does this column still work as an indexed column in the main query?
Yes. A CTE (the WITH part) is a query just like any other - and if a query references a physical table column used by an index then the engine will use the index if it thinks it's a good idea.
In this situation, does an 'Index Scan' happen in the main query?
We can't tell from the limited information you've provided. An engine will scan or seek an index based on its heuristics about the distribution of data in the index (e.g. STATISTICS objects) and other information it has, such as cached query execution plans.
In this situation, does an 'Index Scan' happen on each table before the join?
As it's a range query, it probably would make sense for an engine to use an index scan rather than an index seek - but it also could do a table-scan and ignore the index if the index isn't selective and specific enough. Also factor in query flags to force reading non-committed data (e.g. for performance and to avoid locking).

How to index the attributes of the associated table in Oracle?

In order to optimize the following query, add an index:
SELECT SUPPLIER.COMPANY_NAME, SUPPLIER.CITY
FROM PRODUCT JOIN SUPPLIER
ON PRODUCT.SUPPLIER_NAME = SUPPLIER.COMPANY_NAME;
The statement I wrote is as follows:
EXPLAIN PLAN FOR SELECT PRODUCT.SUPPLIER_NAME, SUPPLIER.COMPANY_NAME FROM PRODUCT,SUPPLIER;
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
CREATE INDEX PS_IDX_SC ON PRODUCT,SUPPLIER(PRODUCT.) ;
EXPLAIN PLAN FOR SELECT PRODUCT.SUPPLIER_NAME, SUPPLIER.COMPANY_NAME FROM PRODUCT JOIN SUPPLIER;
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
DROP INDEX PS_IDX_SC;
How should I write the statement on line 45? Thanks.
You cannot create an index on two tables.
You need to create two separate indexes, as follows:
CREATE INDEX PS_IDX_PS ON PRODUCT(SUPPLIER_NAME) ;
CREATE INDEX PS_IDX_SC ON SUPPLIER(COMPANY_NAME) ;
Let me try to answer your question in a different way, with a short overview of what indexes are for and why sometimes they are not the answer. You are joining two tables on a condition, but without any filtering. When you need to analyse a performance issue and you think an index is the answer, think it through a bit more.
In your specific case, the join has no filter, and your query selects only two columns: supplier_name from the product table and company_name from the supplier table. However, what is the join condition here? I guess that company_name and supplier_name are the same, but it does not make much sense to retrieve the same value from both tables, if you ask me.
Original query
SQL> SELECT PRODUCT.SUPPLIER_NAME, SUPPLIER.COMPANY_NAME FROM PRODUCT JOIN SUPPLIER;
Rewrite query
SQL> SELECT PRODUCT.SUPPLIER_NAME, SUPPLIER.COMPANY_NAME FROM PRODUCT JOIN SUPPLIER
on PRODUCT.SUPPLIER_NAME = SUPPLIER.COMPANY_NAME;
Always try to write the join condition; it makes the query more readable. In your case you could create two indexes, one on each table, as @Tejash has shown above, but let me explain something else.
If your SQL query only retrieves the columns present in the index, Oracle probably will use the indexes to access the data. In this case, accessing by index will be faster than by table because the indexes are smaller than the tables.
However, if your SQL query retrieves more columns than the ones contained in the indexes (for example, product_name), then it is worth checking whether the indexes still make the query faster when there is no filter. In that case Oracle would probably use a method called TABLE ACCESS BY INDEX ROWID: it accesses the index to retrieve the rowid, then goes to the table to get the data using that rowid. When more columns are involved, and the tables are big enough, I bet a full table scan is faster than access by index.
My advice: gather statistics on both tables using DBMS_STATS. And if you have Oracle 11g or higher, which you most probably do, you might want to use invisible indexes to verify the performance of those queries without affecting your environment; then, when you are sure, you can make them visible.
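To gather the statistics, a minimal sketch (assuming the tables live in your own schema) would be:
SQL> EXEC DBMS_STATS.GATHER_TABLE_STATS(USER, 'PRODUCT');
SQL> EXEC DBMS_STATS.GATHER_TABLE_STATS(USER, 'SUPPLIER');
Then create the indexes as invisible: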
SQL> CREATE INDEX IDX_PRO_SUP ON PRODUCT(SUPPLIER_NAME) INVISIBLE;
SQL> CREATE INDEX IDX_SUP_COM ON SUPPLIER(COMPANY_NAME) INVISIBLE;
To see how the indexes would behave, enable them for your own session and check the explain plan:
SQL> ALTER SESSION SET OPTIMIZER_USE_INVISIBLE_INDEXES=TRUE;
SQL> EXPLAIN PLAN FOR SELECT PRODUCT.SUPPLIER_NAME, SUPPLIER.COMPANY_NAME
FROM PRODUCT JOIN SUPPLIER ON PRODUCT.SUPPLIER_NAME = SUPPLIER.COMPANY_NAME;
SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
Then when you are sure those indexes work as you expect:
SQL> ALTER INDEX IDX_PRO_SUP VISIBLE;
SQL> ALTER INDEX IDX_SUP_COM VISIBLE;
Hope it helps.
Best regards

Tuning for Oracle query

I need to increase performance of this query:
select t.*,
       (select max(1)
          from schema1.table_a t1
         where 1 = 1
           and to_date(t.misdate, 'YYYYMMDD') between t1.startdateref and t1.enddateref
           and sysdate between t1.startdatevalue and t1.enddatevalue
           and t1.idpma = t.idpm)
  from schema2.table_b t
Any Ideas?
Thanks
Well you don't have any filtering condition on table_b. This means the best plan includes a full table scan on table_b. This would be optimal.
Having said that, now you need to focus on table_a. That one should be accessed using index range scans on either:
idpma, then by startdateref,
or idpma, then by startdatevalue.
Yes, it's one or the other. For Oracle's cost-based optimizer (CBO) to pick the best plan, you'll need to add the following indexes:
create index ix1 on schema1.table_a (idpma, startdateref);
create index ix2 on schema1.table_a (idpma, startdatevalue);
Try with these and see how it works.

How should tables be indexed to optimise this Oracle SELECT query?

I've got the following query in Oracle10g:
select *
from DATA_TABLE DT,
LOOKUP_TABLE_A LTA,
LOOKUP_TABLE_B LTB
where DT.COL_A = LTA.COL_A (+)
and DT.COL_B = LTA.COL_B (+)
and LTA.COL_C = LTB.COL_C
and LTA.COL_B = LTB.COL_B
and ( DT.REF_TXT = :refTxt or DT.ALT_REF_TXT = :refTxt )
and DT.CREATED_DATE between :startDate and :endDate
And was wondering whether you've got any hints for optimising the query.
Currently I've got the following indices:
IDX1 on DATA_TABLE (REF_TXT, CREATED_DATE)
IDX2 on DATA_TABLE (ALT_REF_TXT, CREATED_DATE)
LOOKUP_A_PK on LOOKUP_TABLE_A (COL_A, COL_B)
LOOKUP_A_IDX1 on LOOKUP_TABLE_A (COL_C, COL_B)
LOOKUP_B_PK on LOOKUP_TABLE_B (COL_C, COL_B)
Note, the LOOKUP tables are very small (<200 rows).
EDIT:
Explain plan:
Query Plan
SELECT STATEMENT Cost = 8
  FILTER
    NESTED LOOPS
      NESTED LOOPS
        TABLE ACCESS BY INDEX ROWID DATA_TABLE
          BITMAP CONVERSION TO ROWIDS
            BITMAP OR
              BITMAP CONVERSION FROM ROWIDS
                SORT ORDER BY
                  INDEX RANGE SCAN IDX1
              BITMAP CONVERSION FROM ROWIDS
                SORT ORDER BY
                  INDEX RANGE SCAN IDX2
        TABLE ACCESS BY INDEX ROWID LOOKUP_TABLE_A
          INDEX UNIQUE SCAN LOOKUP_A_PK
      TABLE ACCESS BY INDEX ROWID LOOKUP_TABLE_B
        INDEX UNIQUE SCAN LOOKUP_B_PK
EDIT2:
The data looks like this:
There will be 10000s of distinct REF_TXT values, with 10s-100s of CREATED_DATEs for each. ALT_REF_TXT will mostly be NULL, but there are going to be 100s-1000s of rows where it differs from REF_TXT.
EDIT3: Fixed what ALT_REF_TXT actually contains.
The execution plan you're currently getting looks pretty good. There's no obvious improvement to be made.
As others have noted, you have some outer join indicators, but then you essentially prevent the outer join by requiring equality on other columns of the two outer tables. As you can see from the execution plan, no outer join is happening. If you don't want an outer join, remove the (+) operators; they're just confusing the issue. If you do want an outer join, rewrite the query as shown by @Dems.
If you're unhappy with the current performance, I would suggest running the query with the gather_plan_statistics hint, then using DBMS_XPLAN.DISPLAY_CURSOR(?,?,'ALLSTATS LAST') to view the actual execution statistics. This will show the elapsed time attributed to each step in the execution plan.
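For example (the two NULL arguments to DISPLAY_CURSOR are the standard way to mean "the last statement run in this session"):
select /*+ gather_plan_statistics */ * from ... ;  -- your original query, with the hint added
select * from table(dbms_xplan.display_cursor(null, null, 'ALLSTATS LAST'));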
You might get some benefit from converting one or both of the lookup tables into index-organized tables.
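For example, LOOKUP_TABLE_B could be rebuilt as an index-organized table (the column types here are guesses; the primary key must cover the join columns):
CREATE TABLE LOOKUP_TABLE_B_IOT (
  COL_C NUMBER,
  COL_B NUMBER,
  CONSTRAINT LOOKUP_B_IOT_PK PRIMARY KEY (COL_C, COL_B)
) ORGANIZATION INDEX;
With fewer than 200 rows, the whole table then lives inside its primary-key index, so lookups avoid the separate table access.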
Your 2 index range scans on IDX1 and IDX2 will produce at most 100 rows, so your BITMAP CONVERSION TO ROWIDS will produce at most 200 rows. And from there on, it's only indexed access by rowids, leading to a likely sub-second execution. So are you really experiencing performance problems? If so, how long does it take exactly?
If you are experiencing performance problems, then please follow Dave Costa's advice and get the real plan, because in that case it's likely that you are getting a different plan at runtime, possibly due to certain bind variable values or different optimizer environment settings.
Regards,
Rob.
This is one of those cases where it makes very little sense to try to optimize the DBMS performance without knowing what your data means.
Do you have many, many distinct CREATED_DATE values and a few rows in your DT for each date? If so you want an index on CREATED_DATE, as it will be the primary way for the DBMS to reject rows it doesn't want to process.
On the other hand, do you have only a handful of dates, and many distinct values of REF_TXT or ALT_REF_TXT? In that case you probably have the correct compound index choices.
The presence of OR in your query complicates things greatly, and throws most guesswork out the window. You must look at EXPLAIN PLAN to see what's going on.
If you have tens of millions of distinct REF_TXT and ALT_REF_TXT values, you may want to consider denormalizing this schema.
Edit.
Thanks for the additional info. Your explain plan contains no smoking guns that I can see. Some things to try next if you're not happy with performance yet.
Flip the order of the columns in your compound indexes on your data tables. Maybe that will get you simpler index range scans instead of all the bitmap monkey business.
Exchange your SELECT * for the names of the columns you actually need in the query resultset. That's good programming practice in any case, and it MAY allow the optimizer to avoid some work.
If things are still too slow, try recasting this as a UNION of two queries rather than using OR. That MAY allow the alt_ref_txt part of your query, which is made a little more complex by all the NULL values in that column, to be optimized separately.
This may be the query you want, using more up-to-date syntax.
(And without inner joins breaking outer joins)
select
    *
from
    DATA_TABLE DT
    left outer join
    (
        LOOKUP_TABLE_A LTA
        inner join
        LOOKUP_TABLE_B LTB
            on  LTA.COL_C = LTB.COL_C
            and LTA.COL_B = LTB.COL_B
    )
        on  DT.COL_A = LTA.COL_A
        and DT.COL_B = LTA.COL_B
where
    ( DT.REF_TXT = :refTxt or DT.ALT_REF_TXT = :refTxt )
    and DT.CREATED_DATE between :startDate and :endDate
INDEXes that I'd have are...
LOOKUP_TABLE_A (COL_A, COL_B)
LOOKUP_TABLE_B (COL_B, COL_C)
DATA_TABLE (REF_TXT, CREATED_DATE)
DATA_TABLE (ALT_REF_TXT, CREATED_DATE)
Note: The first condition in the WHERE clause contains an OR that will likely prevent effective use of the indexes. In such cases I have seen performance benefits in UNIONing two queries together...
<your query>
where
DT.REF_TXT = :refTxt
and DT.CREATED_DATE between :startDate and :endDate
UNION
<your query>
where
DT.ALT_REF_TXT = :refTxt
and DT.CREATED_DATE between :startDate and :endDate
Provide the output of this query with "set autot trace"; let's see how many blocks it is pulling. The explain plan looks good, so it should be very fast. If you need more, denormalize the lookup table info into DT. It violates 3rd normal form, but it will make your query faster by eliminating the joins. In a situation where milliseconds count, everything is in buffers, and you need that query to run 1000 times/second, it can help by driving down the number of blocks looked at per row. It is the ultimate way to boost read performance, but it complicates your app (and ruins your lovely ER diagram).

Performance of updating one big table using values from one small table

First, I know that the sql statement to update table_a using values from table_b is in the form of:
Oracle:
UPDATE table_a
SET (col1, col2) = (SELECT cola, colb
                    FROM table_b
                    WHERE table_a.key = table_b.key)
WHERE EXISTS (SELECT *
              FROM table_b
              WHERE table_a.key = table_b.key)
MySQL:
UPDATE table_a
INNER JOIN table_b ON table_a.key = table_b.key
SET table_a.col1 = table_b.cola,
table_a.col2 = table_b.colb
What I understand is the database engine will go through records in table_a and update them with values from matching records in table_b.
So, if I have 10 millions records in table_a and only 10 records in table_b:
Does that mean that the engine will do 10 million iterations through table_a just to update 10 records? Are Oracle/MySQL/etc. smart enough to do only 10 iterations through table_b?
Is there a way to force the engine to actually iterate through records in table_b instead of table_a to do the update? Is there an alternative syntax for the sql statement?
Assume that table_a.key and table_b.key are indexed.
Either engine should be smart enough to optimize the query based on the fact that there are only ten rows in table_b. How the engine determines what to do is based on factors like indexes and statistics.
If the "key" column is the primary key and/or is indexed, the engine will have to do very little work to run this query. It will basically already sort of "know" where the matching rows are, and look them up very quickly. It won't have to "iterate" at all.
If there is no index on the key column, the engine will have to do a "table scan" (roughly the equivalent of "iterate") to find the right values and match them up. This means it will have to scan through 10 million rows.
Do a little reading on what's called an Execution Plan. This is basically an explanation of what work the engine had to do in order to run your query (some databases show it in text only, some have the option of seeing it graphically). Learning how to interpret an Execution Plan will give you great insight into adding indexes to your tables and optimizing your queries.
Look these up if they don't work (it's been a while), but it's something like:
In MySQL, put the word "EXPLAIN" in front of your SELECT statement
In Oracle, run "SET AUTOTRACE ON" before you run your SELECT statement
I think the first (Oracle) query would be better written with a JOIN instead of a WHERE EXISTS. The engine may be smart enough to optimize it properly either way. Once you get the hang of interpreting an execution plan, you can run it both ways and see for yourself. :)
Okay, I know answering your own question is usually frowned upon, but I already accepted another answer and won't unaccept it, so, meh, here it is...
I've discovered a much better alternative that I'd like to share with anyone who encounters the same scenario: the MERGE statement.
Apparently, newer Oracle versions introduced this MERGE statement, and it simply blows me away! Not only is the performance much better in most cases, the syntax is so simple and makes so much sense that I feel stupid for having used the UPDATE statement! Here it comes...
MERGE INTO table_a
USING table_b
ON (table_a.key = table_b.key)
WHEN MATCHED THEN UPDATE SET
table_a.col1 = table_b.cola,
table_a.col2 = table_b.colb;
And what's more, I can also extend the statement to include an INSERT action for when table_a has no matching record for some records in table_b:
MERGE INTO table_a
USING table_b
ON (table_a.key = table_b.key)
WHEN MATCHED THEN UPDATE SET
table_a.col1 = table_b.cola,
table_a.col2 = table_b.colb
WHEN NOT MATCHED THEN INSERT
(key, col1, col2)
VALUES (table_b.key, table_b.cola, table_b.colb);
This new statement type made my day the day I discovered it :)