Join Performance Issue in Oracle

Join Performance Issue in Oracle - sql

We have are having 2 tables.
Table - XYZ - > Having over 189 M Records
Table - ABC - > Having only 1098 records.
Our join query is some what like
select a.a, a.b, a.c
from xyz a , ABC r
where a.d = r.d
and a.sub not like '0%'
and ((a.eff_dat < sysdate) or (a.eff_date is null))
This is how our query is performing. In any way can it be optmised to perform faster.
Apart from the not like, can you suggest me any other method.
In the explain plan I have seen that it is taking the 189 M as itrator and checking with the 1098 records which is taking more time.
I swapped the tables after the from Key word but also it did not work.
Tried leading hint, which also not servered the purpose.
Also a.d column is an indexed one which is also used in the hint.
Please do suggest any methods for optimisation.

When you have multiple predicates on a table such as:
a.sub not like '0%'
and ((a.eff_dat < sysdate) or (a.eff_date is null))
... it is rather unlikely that the optimiser will accurately estimate the cardinality of the result set unless you use dynamic sampling, so check the explain plan to see whether:
Dynamic sampling is being invoked.
The estimation of the cardinalities are correct.
If the predicates are not very selective -- if they do not eliminate something in the order of 90% or the rows in the table -- then it is unlikely that an index will be of help in finding the rows, and a full scan (with partition pruning if the table is partitioned in a way that supports that) is likely to be the best access path.
I'd be reasonably sure that if there is a foreign key between the tables (ie. that all values of a.d exist in r.d) then the best access path is going to be a full scan of XYZ with a hash join to ABC.
By the way, you mention hints but do not include them in the question. It's also unhelpful to hide the purpose of the tables with fake names, as the names often give valuable clues about the type of data and distribution of values within the data sets.

It would seem most of the cost would be in the (presumed) full table scan on the large table. I would suggest rewriting your WHERE condition as follows:
SELECT * FROM XYZ A
WHERE SUBSTR(A.SUB, 1, 1) <> '0'
AND NVL(A.EFF_DAT, TO_DATE('01-01-0001', 'MM-DD-YYYY')) < SYSDATE ;
And then create a function index that includes all the relevant columns:
CREATE INDEX IX_XYZ1 ON
XYZ(NVL(EFF_DAT, TO_DATE('01-01-0001', 'MM-DD-YYYY')), SUB, D);
Make sure the new index is being picked up by the cost-based optimizer, by checking the execution plan.
LIKE, NOT LIKE and the OR operand are some of the worst things you can use in a WHERE condition.

Related

Does index still exists in these situations?

I have some questions about index.
First, if I use a index column in WITH clause, dose this column still works as index column in main query?
For example,
WITH TEST AS (
SELECT EMP_ID
FROM EMP_MASTER
)
SELECT *
FROM TEST
WHERE EMP_ID >= '2000'
'EMP_ID' in 'EMP_MASTER' table is PK and index for EMP_MASTER consists of EMP_ID.
In this situation, Does 'Index Scan' happen in main query?
Second, if I join two tables and then use two index columns from each table in WHERE, does 'Index Scan' happen?
For example,
SELECT *
FROM A, B
WHERE A.COL1 = B.COL1 AND
A.COL1 > 200 AND
B.COL1 > 100
Index for table A consists of 'COL1' and index for table B consists of 'COL1'.
In this situation, does 'Index Scan' happen in each table before table join?
If you give me some proper advice, I really appreciate that.

First, SQL is a declarative language, not a procedural language. That is, a SQL query describes the result set, not the specific processing. The SQL engine uses the optimizer to determine the best execution plan to generate the result set.
Second, Oracle has a reasonable optimizer.
Hence, Oracle will probably use the indexes in these situations. However, you should look at the execution plan to see what Oracle really does.

First, if I use a index column in WITH clause, dose this column still works as index column in main query?
Yes. A CTE (the WITH part) is a query just like any other - and if a query references a physical table column used by an index then the engine will use the index if it thinks it's a good idea.
In this situation, Does 'Index Scan' happen in main query?
We can't tell from the limitated information you've provided. An engine will scan or seek an index based on its heuristics about the distribution of data in the index (e.g. STATISTICS objects) and other information it has, such as cached query execution plans.
In this situation, does 'Index Scan' happen in each table before table join?
As it's a range query, it probably would make sense for an engine to use an index scan rather than an index seek - but it also could do a table-scan and ignore the index if the index isn't selective and specific enough. Also factor in query flags to force reading non-committed data (e.g. for performance and to avoid locking).

how to improve the performance on this type of Query?

I have query that I am searching for a range of user accounts but every time I pass query I will be using multiple id number's first 5 digits and based on that I will searching. I wanted to know is there any other way to re-write this query for user Id range when we use more than 10 userIDs to search? Is there going to be huge performance hit with this kind of search in query?
example:
select A.col1,B.col2,B.col3
from table1 A,table2 B
where A.col2=B.col2
and (B.col_id like '12345%'
OR B.col_id like '47474%'
OR B.col_id like '59598%');
I am using Oracle11g.

Actually it is not important how many UserIDs you will pass to the query. The most considerable part is what is selectivity of your query. In other words: how many rows will return your query and how many rows are there in your tables. If the number of returned rows is relatively small then it is good idea to create an index on column B.col_id. There is also nothing bad in using OR condition. Basically each OR will add one more INDEX RANGE SCAN to the execution plan with final CONCATENATION (but you'd rather check your actual plan to be sure). If the total cost of all that operations are lower than full table scan then Oracle CBO will use your index. In other case if you select >=20-30% of data at once then full table scan is most likely to happen and you should even less be worried about OR because all data will be read and comparing each value with your multiple conditions won't add much overhead.

Generally the use of LIKE makes it impossible for Oracle to use indexes.
If the query is going to be reused, consider creating a synthetic column with the first 5 characters of COL_ID. Put a non-unique index on it. Put your search keys in a table and join that to that column.
There may be a way to do it with a functional index on the first 5 characters.

I don't know if the performance will be better or not, but another way to write this is with a union:
select A.col1,B.col2,B.col3
from table1 A,table2 B
where A.col2=B.col2
and (A.col_id like '12345%')
union all
select A.col1,B.col2,B.col3
from table1 A,table2 B
where A.col2=B.col2
and (A.col_id like '47474%') -- you did mean A.col_id again, right?
union all
select A.col1,B.col2,B.col3
from table1 A,table2 B
where A.col2=B.col2
and (A.col_id like '59598%'); -- and A.col_id here too?

Oracle: How can I find tablespace fragmentation?

I've a JOIN beween two tables. It's really really slow and I can't find why.
The query takes hours in a PRODUCTION environment on a very big Client.
Can you ask me what you need to understand why it doesn't work well?
I can add indexes, partition the table, etc. It's Oracle 10g.
I expect a few thousand record. Because of the following condition:
f.eif_campo1 != c.fornitura AND and f.field29 = 'New'
Infact it should be always verified for all 18 million records
SELECT c.id_messaggio
,f.campo1
,c.f
FROM
flows c,
tab f
WHERE
f.field198 = c.id_messaggio
AND f.extra_id = c.extra_id
and f.field1 != c.ExampleF
and f.field29 = 'New'
and c.processtype in ('Example1')
and c.flag_ann = 'N';
Selectivity for the following record expressed as number of distinct values:
COUNT (DISTINCT extra_id) =>17*10^6,
COUNT (DISTINCT (extra_id || field20)) =>17*10^6,
COUNT (DISTINCT field198) =>36*10^6,
COUNT (DISTINCT (field19 || field20)) =>45*10^6,
COUNT (DISTINCT (field1)) =>18*10^6,
COUNT (DISTINCT (field20)) =>47
This is the execution plan [See large image][1]
![enter image description here][2]
Extra details:
I have relaxed one contition to see how many records are taken. 300 thousand.
![enter image description here][7]
--03:57 mins with parallel execution /*+ parallel(c 8) parallel(f 24) */
--395.358 rows
SELECT count(1)
FROM
flows c,
flet f
WHERE
f.field19 = c.id_messaggio
AND f.extra_id = c.extra_id
and f.field20 = 'ExampleF'
and c.process_type in ('ExampleP')
and c.flag_ann = 'N';

Your explain plan shows the following.
The database uses an index to retrieve rows from ENI_FLUSSI_HUB where
flh_tipo_processo_cod in ('VT','VOLTURA_ENI','CC')
It then winnows the rows
where flh_flag_ann = 'N'
This produces a result set which is used to access
rows from ETL_ELAB_INTERF_FLAT on the basis of f.idde_identif_dati_ext_id =
c.idde_identif_dati_ext_id
Finally those rows are filtered on the basis of the
remaining parts of the WHERE clause.
Now, the starting point is a good one if flh_tipo_processo_cod is a selective
column: that is, if it contains hundreds of different values, or if the values in
your list are relatively rare. It might even be a good path of the flag column
identifies relatively few columns with a value of 'N'. So you need to understand
both the distribution of your data - how many distinct values you have - and its
skew - which values appear very often or hardly at all. The overall
performance suggests that the distribution and/or skew of the
flh_tipo_processo_cod and flh_flag_ann columns is not good.
So what can you do? One approach is to follow Ben's suggestion, and use full
table scans. If you have an Enterprise Edition licence and plenty of CPU capacity
you could try parallel query to improve things. That might still be too slow, or it might be too disruptive for other users.
An alternative approach would be to use better indexes. A composite index on
eni_flussi_hub(flh_tipo_processo_cod,flh_flag_ann,idde_identif_dati_ext_id,
flh_fornitura,flh_id_messaggio) would avoid the need to read that table. Whether
this would be a new index or a replacement for ENI_FLK_IDX3 depends on the other
activity against the table. You might be able to benefit from index compression.
All the columns in the query projection are referenced in the WHERE clause. So
you could also use a composite index on the other table to avoid table reads. Agsin you need to understand the distribution and skew of the data. But you should probably lead with the least selective columns. Something like etl_elab_interf_flat(etl_elab_interf_flat,eif_campo200,dde_identif_dati_ext_id,eif_campo1,eif_campo198). Probably this is a new index. It's unlikely you would want to replace ETL_EIF_FK_IDX4 with this (especially if that really is an index on a foreign key constraint)..
Of course these are just guesses on my part. Tuning is a science and to do it properly requires lots of data. Use the Wait Interface to investigate where the database is spending its time. Use the 10053 event to understand why the Optimizer makes the choices it does. But above all, don't implement partitioning unless you really know the ramifications.

The simple answer seems to be your explain plan. You're accessing both tables by index rowid. Whilst to select a single row you cannot - to my knowledge - get faster, in your case you're selecting a lot more than a single row.
This means that for every single row you, you're going into both tables one row at a time, which when you're looking a significant proportion of a table or index is not what you want to do.
My suggestion would be to force a full scan of one or both of your tables. Try to use the smaller as a driver first:
SELECT /*+ full(c) */ c.flh_id_messaggio
, f.eif_campo1
, c.f
FROM flows c,
JOIN flet f
ON f.field19 = c.flh_id_messaggio
AND f.extra_id = c.extra_id
AND f.field1 <> c.f
WHERE ...
But you may have to change /*+ full(c) */ to /*+ full(c) full(f) */.
Your indexes seem to be separate column indexes as well. For this, and if possible, I would have indexes on:
flows of id_messaggio, extra_id, f
and on flet of field19, extra_id, field1.
This will only really matter if you do not use as full scan. Or, if you have everything you're returning and selecting is in one index.

How should tables be indexed to optimise this Oracle SELECT query?

I've got the following query in Oracle10g:
select *
from DATA_TABLE DT,
LOOKUP_TABLE_A LTA,
LOOKUP_TABLE_B LTB
where DT.COL_A = LTA.COL_A (+)
and DT.COL_B = LTA.COL_B (+)
and LTA.COL_C = LTB.COL_C
and LTA.COL_B = LTB.COL_B
and ( DT.REF_TXT = :refTxt or DT.ALT_REF_TXT = :refTxt )
and DT.CREATED_DATE between :startDate and :endDate
And was wondering whether you've got any hints for optimising the query.
Currently I've got the following indices:
IDX1 on DATA_TABLE (REF_TXT, CREATED_DATE)
IDX2 on DATA_TABLE (ALT_REF_TXT, CREATED_DATE)
LOOKUP_A_PK on LOOKUP_TABLE_A (COL_A, COL_B)
LOOKUP_A_IDX1 on LOOKUP_TABLE_A (COL_C, COL_B)
LOOKUP_B_PK on LOOKUP_TABLE_B (COL_C, COL_B)
Note, the LOOKUP tables are very small (<200 rows).
EDIT:
Explain plan:
Query Plan
SELECT STATEMENT Cost = 8
FILTER
NESTED LOOPS
NESTED LOOPS
TABLE ACCESS BY INDEX ROWID DATA_TABLE
BITMAP CONVERSION TO ROWIDS
BITMAP OR
BITMAP CONVERSION FROM ROWIDS
SORT ORDER BY
INDEX RANGE SCAN IDX1
BITMAP CONVERSION FROM ROWIDS
SORT ORDER BY
INDEX RANGE SCAN IDX2
TABLE ACCESS BY INDEX ROWID LOOKUP_TABLE_A
INDEX UNIQUE SCAN LOOKUP_A_PK
TABLE ACCESS BY INDEX ROWID LOOKUP_TABLE_B
INDEX UNIQUE SCAN LOOKUP_B_PK
EDIT2:
The data looks like this:
There will be 10000s of distinct REF_TXT, which 10-100s of CREATED_DTs for each. ALT_REF_TXT will mostly NULL but there are going to be 100s-1000s which it will be different from REF_TXT.
EDIT3: Fixed what ALT_REF_TXT actually contains.

The execution plan you're currently getting looks pretty good. There's no obvious improvement to be made.
As other have noted, you have some outer join indicators, but then you essentially prevent the outer join by requiring equality on other columns in the two outer tables. As you can see from the execution plan, no outer join is happening. If you don't want an outer join, remove the (+) operators, they're just confusing the issue. If you do want an outer join, rewrite the query as shown by #Dems.
If you're unhappy with the current performance, I would suggest running the query with the gather_plan_statistics hint, then using DBMS_XPLAN.DISPLAY_CURSOR(?,?,'ALLSTATS LAST') to view the actual execution statistics. This will show the elapsed time attributed to each step in the execution plan.
You might get some benefit from converting one or both of the lookup tables into index-organized tables.

Your 2 index range scans on IDX1 and IDX2 will produce at most 100 rows, so your BITMAP CONVERSION TO ROWIDS will produce at most 200 rows. And from there on, it's only indexed access by rowids, leading to a likely sub-second execution. So are you really experiencing performance problems? If so, how long does it take exactly?
If you are experiencing performance problems, then please follow Dave Costa's advice and get the real plan, because in that case it's likely that you are using another plan runtime, possibly due to certain bind variable values or different optimizer environment settings.
Regards,
Rob.

This is one of those cases where it makes very little sense to try to optimize the DBMS performance without knowing what your data means.
Do you have many, many distinct CREATED_DATE values and a few rows in your DT for each date? If so you want an index on CREATED_DATE, as it will be the primary way for the DBMS to reject columns it doesn't want to process.
On the other hand, do you have only a handful of dates, and many distinct values of REF_TXT or ALT_REF_TXT? In that case you probably have the correct compound index choices.
The presence of OR in your query complicates things greatly, and throws most guesswork out the window. You must look at EXPLAIN PLAN to see what's going on.
If you have tens of millions of distinct REF_TXT and ALT_REF_TXT values, you may want to consider denormalizing this schema.
Edit.
Thanks for the additional info. Your explain plan contains no smoking guns that I can see. Some things to try next if you're not happy with performance yet.
Flip the order of the columns in your compound indexes on your data tables. Maybe that will get you simpler index range scans instead of all the bitmap monkey business.
Exchange your SELECT * for the names of the columns you actually need in the query resultset. That's good programming practice in any case, and it MAY allow the optimizer to avoid some work.
If things are still too slow, try recasting this as a UNION of two queries rather than using OR. That MAY allow the alt_ref_txt part of your query, which is made a little more complex by all the NULL values in that column, to be optimized separately.

This may be the query you want using a more upto date syntax.
(And without inner joins breaking outer joins)
select
*
from
DATA_TABLE DT
left outer join
(
LOOKUP_TABLE_A LTA
inner join
LOOKUP_TABLE_B LTB
on LTA.COL_C = LTB.COL_C
and LTA.COL_B = LTB.COL_B
)
on DT.COL_A = LTA.COL_A
and DT.COL_B = LTA.COL_B
where
( DT.REF_TXT = :refTxt or DT.ALT_REF_TXT = :refTxt )
and DT.CREATED_DATE between :startDate and :endDate
INDEXes that I'd have are...
LOOKUP_TABLE_A (COL_A, COL_B)
LOOKUP_TABLE_B (COL_B, COL_C)
DATA_TABLE (REF_TXT, CREATED_DATE)
DATA_TABLE (ALT_REF_TXT, CREATED_DATE)
Note: The first condition in the WHERE clause about contains an OR that will likely frag the use of INDEXes. In such case I have seen performance benefits in UNIONing two queries together...
<your query>
where
DT.REF_TXT = :refTxt
and DT.CREATED_DATE between :startDate and :endDate
UNION
<your query>
where
DT.ALT_REF_TXT = :refTxt
and DT.CREATED_DATE between :startDate and :endDate

Provide output of this query with "set autot trace". Let's see how many blocks it is pulling. Explain plan looks good, it should be very fast. If you need more, denormalize the lookup table info into DT. Violates 3rd normal form, but it will make your query faster by eliminating the joins. In a situation where milliseconds counts, everything is in buffers, and you need that query to run 1000 times/second, it can help by driving down the number of blocks looked at per row. It is the ultimate way to boost read performance, but complicates your app (and ruins your lovely ER diagram).

Oracle SQL query - unexpected query plan

I have a very simple query that's giving me unexpected results. Hints on where to troubleshoot it would be welcome.
Simplified, the query is:
SELECT Obs.obsDate,
Obs.obsValue,
ObsHead.name
FROM ml.Obs Obs
JOIN ml.ObsHead ObsHead ON ObsHead.hdId = Obs.hdId
WHERE obs.hdId IN (53, 54)
This gives me a query cost of: 963. However, if I change the query to:
SELECT Obs.obsDate,
Obs.obsValue,
ObsHead.name
FROM ml.Obs Obs
JOIN ml.ObsHead ObsHead ON ObsHead.hdId = Obs.hdId
WHERE ObsHead.name IN ('BP SYSTOLIC', 'BP DIASTOLIC')
Although it (should) return the same data, the estimated cost shoots up to 17688. Where is the problem here likely to lie? Thanks.
Edit: The query plan says that the index on ObsHead.Name is being used for a range scan, and the table access on ObsHead only costs 4. There's another index on Obs.hdId that's being used for a range scan costing 94: it's the Nested Loops join between the tables that jumps up to 17K.

As has already been stated, the plan's cost is not intended for comparing two different queries, only for comparing different paths for the same query.
This is only a guess, but in this case, the cardinality field of the plan might be more useful to you. If the index on OBSHEAD is not unique and the statistics were gathered using an estimate, then the optimizer may not know exactly how many rows to expect when querying that table. The cardinality will tell you whether this is true or not (ideally, you'll be seeing a cardinality of 2 for OBSHEAD).
Another suggestion is to check the statistics on OBS. It seems likely that is a table that grows frequently, in which case, January 28th is not recent enough to have gathered the statistics. Assuming monitoring is turned on for this table, the queries below can tell you if the statistics are stale and need to be refreshed.
select owner, table_name, last_analyzed, stale_stats
from all_tab_statistics
where owner = 'ML' and table_name = 'OBS';
select owner, index_name, last_analyzed, stale_stats
from all_ind_statistics
where owner = 'ML' and table_name = 'OBS';

There is probably an index on hdId (which there is if it's the primary key, which I suspect is the case) and not on name which means that the second query will have to do a full table scan.

Costs are only useful for comparing different plans for one query; they're not so useful for comparing different queries.
You need to look at the plans and compare them in terms of the actions they perform.
I suspect the actual performance of these queries will be similar - however it would be interesting to know whether the first query uses a hash join, which might help things if the percentage of records in obs that are matched is significant.

I find the costs supplied by the optimizer to be interesting but not particularly useful. The best way I've found to compare queries is to run them and see how they perform relative to one another.
Share and enjoy.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas