Oracle SQL query - unexpected query plan - sql

I have a very simple query that's giving me unexpected results. Hints on where to troubleshoot it would be welcome.
Simplified, the query is:
SELECT Obs.obsDate,
Obs.obsValue,
ObsHead.name
FROM ml.Obs Obs
JOIN ml.ObsHead ObsHead ON ObsHead.hdId = Obs.hdId
WHERE obs.hdId IN (53, 54)
This gives me a query cost of: 963. However, if I change the query to:
SELECT Obs.obsDate,
Obs.obsValue,
ObsHead.name
FROM ml.Obs Obs
JOIN ml.ObsHead ObsHead ON ObsHead.hdId = Obs.hdId
WHERE ObsHead.name IN ('BP SYSTOLIC', 'BP DIASTOLIC')
Although it (should) return the same data, the estimated cost shoots up to 17688. Where is the problem here likely to lie? Thanks.
Edit: The query plan says that the index on ObsHead.Name is being used for a range scan, and the table access on ObsHead only costs 4. There's another index on Obs.hdId that's being used for a range scan costing 94: it's the Nested Loops join between the tables that jumps up to 17K.

As has already been stated, the plan's cost is not intended for comparing two different queries, only for comparing different paths for the same query.
This is only a guess, but in this case, the cardinality field of the plan might be more useful to you. If the index on OBSHEAD is not unique and the statistics were gathered using an estimate, then the optimizer may not know exactly how many rows to expect when querying that table. The cardinality will tell you whether this is true or not (ideally, you'll be seeing a cardinality of 2 for OBSHEAD).
Another suggestion is to check the statistics on OBS. It seems likely that is a table that grows frequently, in which case, January 28th is not recent enough to have gathered the statistics. Assuming monitoring is turned on for this table, the queries below can tell you if the statistics are stale and need to be refreshed.
select owner, table_name, last_analyzed, stale_stats
from all_tab_statistics
where owner = 'ML' and table_name = 'OBS';
select owner, index_name, last_analyzed, stale_stats
from all_ind_statistics
where owner = 'ML' and table_name = 'OBS';

There is probably an index on hdId (which there is if it's the primary key, which I suspect is the case) and not on name which means that the second query will have to do a full table scan.

Costs are only useful for comparing different plans for one query; they're not so useful for comparing different queries.
You need to look at the plans and compare them in terms of the actions they perform.
I suspect the actual performance of these queries will be similar - however it would be interesting to know whether the first query uses a hash join, which might help things if the percentage of records in obs that are matched is significant.

I find the costs supplied by the optimizer to be interesting but not particularly useful. The best way I've found to compare queries is to run them and see how they perform relative to one another.
Share and enjoy.

Related

Is there performance impact when Non-Aggregate SQL functions are used in a SELECTed Column?

We have a report that uses a long and complex query that has the SELECT statement like below:
SELECT
NVL(nazwawystawcy,'BRAK') supplier_name,
NVL(AdresDostawcy,'BRAK') supplier_address,
NVL(NrDostawcy,'BRAK') supplier_registration,
DowodZakupu document_number,
DataZakupu document_issue_date,
DataWplywu document_recording_date,
trx_id,
KodKrajuNadaniaTIN country_code,
DokumentZakupu document_type_code,
payment_split MPP,
box_number box_number,
box_amount box_amount,
box_type box_type,
display_order display_order
...
FROM table1 t1
,table2 t2
....
We recently made modifications to this Query and just modified the 3rd SELECTed column to add a REGEXP_LIKE
SELECT
NVL(nazwawystawcy,'BRAK') supplier_name,
NVL(AdresDostawcy,'BRAK') supplier_address,
--NVL(NrDostawcy,'BRAK') supplier_registration,
Case When (NrDostawcy is not null and regexp_like(substr(NrDostawcy,1,2),'^[a-zA-Z]*$')) Then substr(NrDostawcy,3) else NVL(NrDostawcy,'BRAK') End supplier_registration,
DowodZakupu document_number,
DataZakupu document_issue_date,
DataWplywu document_recording_date,
trx_id,
KodKrajuNadaniaTIN country_code,
DokumentZakupu document_type_code,
payment_split MPP,
box_number box_number,
box_amount box_amount,
box_type box_type,
display_order display_order
...
FROM table1 t1
,table2 t2
....
I checked the Explain Plans of both queries and they turned out to have the same Plan hash value.
Does this mean there's no impact on performance if i use Seeded, non-aggregate, SQL Functions in SELECTed columns?
I believe there is an impact in performance if they're used in the WHERE clause, but i wasn't sure if the same applies to the SELECTed columns.
Apologies in advance as i can't provide the exact query since it's propietary and is very long and complex.
I also don't think I can create a good enough sample that would match the Explain plan of actual query as it joins over 10 tables, with thousand rows of data.
Thank you!
Since you are running this query on Oracle here's my advice. Run the query with Oracle hint /*+ gather_plan_statistics */. Run it with the first query without regex and with the regex. Then find this query in sharedpool (v$sql). The hint will give you the exact buffer gets, physical reads an also time spent in each step of the plan. With that data you can analyze in details how much more time query with regex needed to execute. I advice you, that you do this on data that returns you more than lets say 10k rows. In this way the difference should be seen (if you run this with 100 rows no difference will be seen).
The execution plan is the same as it needs to query exactly the same data from the same tables. You should also see the amount of data (logical IO) unchanged.
What will not be the same however is the execution time, as the regexp_like will consume more CPU, even if you see the logical IO unchanged.
Note that if you changed the selected columns, the execution plan could change as if all selected columns were part of an index, the optimizer might skip the table access and read the data from an index only.
it depends upon the query and the IO's being done to get the data. Sometimes you can try creating a Oracle Function based index, you may see some improvements.
Check this link, it could help you.
https://jeffkemponoracle.com/2007/11/will-oracle-use-my-regexp-function-based-index/
thanks

SQL query in dire need of optimization

I have this query, which works fine, except it takes a couple of minutes to load. I need help optimizing it so it runs faster and I don't know where to start:
SELECT
job_header.job,
job_header.suffix,
job_header.customer,
job_header.description,
job_header.comments_1,
job_header.date_due,
job_header.part,
job_header.customer_po,
job_header.date_closed,
job_header.flag_hold,
job_header.code_sort,
wo_user_flds.user_7,
wo_user_flds.user_3,
wo_user_flds.user_6,
wo_user_flds.user_5,
wo_user_flds.user_2,
quote_lines.user_2 as serialNo,
quote_lines.user_3 as unit,
quote_lines.user_4 as package
FROM job_header
LEFT JOIN wo_user_flds ON
(job_header.job = wo_user_flds.job) AND
(job_header.suffix = wo_user_flds.suffix)
LEFT JOIN quote_lines ON
(job_header.part = quote_lines.part)
WHERE job_header.date_closed = '000000'
AND LENGTH(job_header.job) > 5;
More information that might be of use:
Only the columns found in the select are the columns I need.
My query returns roughly 400 records.
Job_Header table has 97 columns and 6,300 records.
Wo_User_Flds table has 12 columns and 1,100 records.
Quote_Lines table has 198 columns and 46,000 records.
I could speculate on what I think I need to do, but I'm really just guessing at this point. I looked at similar questions and lot of talk of 'indexes', so I checked and these tables do have some indexes...if that helps? Thanks in advance.
[EDIT]
Thanks for the quick responses guys, really appreciate it. I'm going to look into everything everyone said, but here is the ddl for these tables: http://paste.ubuntu.com/13247664/
[EDIT 2]
My query takes 1 minute to load. My expectations may not be realistic in how much faster it can be. I might have to resort to breaking up the query into more than one and then just assemble the data on the client.
Without any other info you'd need an index on job_header on either (job, date_closed) or (date_closed, job). But post the indexes on the table e.g. sp_helpindex or better still the create index script (right click on the index in SSMS and script the index)
First be sure you have indexes on columns where you JOIN tables and your "WHERE clause column". In this case, you should have indexes on these columns:
--Table job_header indexes, beside unique index
job_header.job
job_header.suffix
job_header.part = quote_lines.part
job_header.date_closed
--Table wo_users_flds indexes, beside unique index
wo_user_flds.job
wo_user_flds.suffix
Then, avoid using UDFs (functions, like LENGHT, CAST, concatenation etc.). But in this case, you can leave LENGTH there. So your query would be same, only your indexes would improve query execution plan drastically.
Also, use execution plan to see where you have INDEX_SCAN and INDEX_SEEK. If you have INDEX_SCAN somewhere, it should be sign that you need index on that column.
This would be for start.

Why does my query in oracle take longer when I use temporary tables?

I am facing a peculiar issue when using an inner query in ORACLE DB. I am fetching data from a table which is having huge number of records.
The query I am using contains an inner query.
When I provide the values directly in the inner query it is much
faster.
But when I use exactly the same values from another (temporary) table
by either inner query or JOIN, it takes too longer.
Below is the query:
Faster performance
SELECT assembly_item_id menuItemId,
location_id restId,
bill_sequence_id,
bill_config_id
FROM zil_ibat_resolve_bmi_ai_max_v
WHERE assembly_item_id = 8321
AND location_id IN (82, 85, 116, .........)
Low in performance when used select query in inner section
Without JOIN
SELECT assembly_item_id menuItemId,
location_id restId,
bill_sequence_id,
bill_config_id
FROM zil_ibat_resolve_bmi_ai_max_v
WHERE assembly_item_id = 8321
AND location_id IN (SELECT temp_id FROM global_temp_ids)
With JOIN
SELECT assembly_item_id menuItemId, location_id restId, bill_sequence_id, bill_config_id
FROM zil_ibat_resolve_bmi_ai_max_v t1
join global_temp_ids t2
on t1.location_id = t2.temp_id
WHERE t1.assembly_item_id = 8321
Note: zil_ibat_resolve_bmi_ai_max_v is a view.
What is wrong with this query? Why is it taking so much time when I query table instead of putting the IDs directly in the inner section? Is there an alternate for this?
Explain Plan
usedSelectQueryInInnerSection.png
usedJoin
enterNumbersInInnerQuery
The second and third query are slow because of the NESTED LOOP join between the view results and the temporary table. Changing it to a HASH join, perhaps through better optimizer statistics or a USE_HASH hint, should speed up the query.
Problem
This part at the top of the execution plan:
NESTED LOOPS
zil_ibat_resolve_bmi_ai_max_v
global_temp_ids
is similar to this pseudo-code:
for each row of zil_ibat_resolve_bmi_ai_max_v
search index of global_temp_ids
Based on the images the execution plan for the view does not change between queries, that part must be relatively fast. And the look-up of the temporary table uses a unique index search, that must also be fast. But it is only fast to do it once. And we can tell from the the Cardinality 1 that the Oracle optimizer thinks it will only execute the inner part of the join once.
NESTED LOOPs are great when joining a small number of rows. HASH JOINs work much better when joining a large number of rows.
Solutions
There are many ways to change the join method, here are the two to try first:
1. Gather statistics. Better optimizer statistics will improve the cardinality estimates, which will usually improve execution plans. There are many ways to gather stats but usually the default settings are the best. In this case they can be gathered by running a procedure like this: exec dbms_stats.gather_schema_stats('SMART'); Repeat that for the schemas ZILADMIN and XCBAIRAG. If the statistics were missing or stale it would also be a good idea to investigate why the default statistics gathering job did not run.
2. Hint. Hints should generally be avoided in production code but they can still at least be helpful to diagnose the problem. Run the query with the hint SELECT /*+ USE_HASH(t1 t2) */ ... and see if that improves things. If that works you can either keep the hint or consider using some other form of plan management. For example, a SQL Profile may solve this and other problems in a cleaner way. Check with other developers or DBAs to find out what types of plan management features are common in your system.

Speeding up aggregations for a large table in Oracle

I am trying to see how to improve performance for aggregation queries in an Oracle database. The system is used to run financial series simulations.
Here is the simplified set-up:
The first table table1 has the following columns
date | id | value
It is read-only, has about 100 million rows and is indexed on id, date
The second table table2 is generated by the application according to user input, is relatively small (300K rows) and has this layout:
id | start_date | end_date | factor
After the second table is generated, I need to compute totals as follows:
select date, sum(value * nvl(factor,1)) as total
from table1
left join table2 on table1.id = table2.id
and table1.date between table2.start_date and table2.end_date group by date
My issue is that this is slow, taking up to 20-30 minutes if the second table is particularly large. Is there a generic way to speed this up, perhaps trading off storage space and execution time, ideally, to achieve something running in under a minute?
I am not a database expert and have been reading Oracle performance tuning docs but was not able to find anything appropriate for this. The most promising idea I found were OLAP cubes but I understand this would help only if my second table was fixed and I simply needed to apply different filters on the data.
First, to provide any real insight, you'd need to determine the execution plan that Oracle is producing for the slow query.
You say the second table is ~300K rows - yes that's small compared to 100M but since you have a range condition in the join between the two tables, it's hard to say how many rows from table1 are likely to be accessed in any given execution of the query. If a large proportion of the table is accessed, but the query optimizer doesn't recognize that, the index may actually be hurting instead of helping.
You might benefit from re-organizing table1 as an index-organized table, since you already have an index that covers most of the columns. But all I can say from the information so far is that it might help, but it might not.
Apart from indexes, Also try below. My two cents!
Try running this Query with PARALLEL option employing multiple processors. /*+ PARALLEL(table1,4) */ .
NVL has been done for million of rows, and this will be an impact
to some extent, any way data can be organised?
When you know the date in Advance, probably you divide this Query
into two chunks, by fetching the ids in TABLE2 using the start
date and end date. And issue a JOIN it to TABLE1 using a
view or temp table. By this we use the index (with id as
leading edge) optimally
Thanks!

Oracle: How can I find tablespace fragmentation?

I've a JOIN beween two tables. It's really really slow and I can't find why.
The query takes hours in a PRODUCTION environment on a very big Client.
Can you ask me what you need to understand why it doesn't work well?
I can add indexes, partition the table, etc. It's Oracle 10g.
I expect a few thousand record. Because of the following condition:
f.eif_campo1 != c.fornitura AND and f.field29 = 'New'
Infact it should be always verified for all 18 million records
SELECT c.id_messaggio
,f.campo1
,c.f
FROM
flows c,
tab f
WHERE
f.field198 = c.id_messaggio
AND f.extra_id = c.extra_id
and f.field1 != c.ExampleF
and f.field29 = 'New'
and c.processtype in ('Example1')
and c.flag_ann = 'N';
Selectivity for the following record expressed as number of distinct values:
COUNT (DISTINCT extra_id) =>17*10^6,
COUNT (DISTINCT (extra_id || field20)) =>17*10^6,
COUNT (DISTINCT field198) =>36*10^6,
COUNT (DISTINCT (field19 || field20)) =>45*10^6,
COUNT (DISTINCT (field1)) =>18*10^6,
COUNT (DISTINCT (field20)) =>47
This is the execution plan [See large image][1]
![enter image description here][2]
Extra details:
I have relaxed one contition to see how many records are taken. 300 thousand.
![enter image description here][7]
--03:57 mins with parallel execution /*+ parallel(c 8) parallel(f 24) */
--395.358 rows
SELECT count(1)
FROM
flows c,
flet f
WHERE
f.field19 = c.id_messaggio
AND f.extra_id = c.extra_id
and f.field20 = 'ExampleF'
and c.process_type in ('ExampleP')
and c.flag_ann = 'N';
Your explain plan shows the following.
The database uses an index to retrieve rows from ENI_FLUSSI_HUB where
flh_tipo_processo_cod in ('VT','VOLTURA_ENI','CC')
It then winnows the rows
where flh_flag_ann = 'N'
This produces a result set which is used to access
rows from ETL_ELAB_INTERF_FLAT on the basis of f.idde_identif_dati_ext_id =
c.idde_identif_dati_ext_id
Finally those rows are filtered on the basis of the
remaining parts of the WHERE clause.
Now, the starting point is a good one if flh_tipo_processo_cod is a selective
column: that is, if it contains hundreds of different values, or if the values in
your list are relatively rare. It might even be a good path of the flag column
identifies relatively few columns with a value of 'N'. So you need to understand
both the distribution of your data - how many distinct values you have - and its
skew - which values appear very often or hardly at all. The overall
performance suggests that the distribution and/or skew of the
flh_tipo_processo_cod and flh_flag_ann columns is not good.
So what can you do? One approach is to follow Ben's suggestion, and use full
table scans. If you have an Enterprise Edition licence and plenty of CPU capacity
you could try parallel query to improve things. That might still be too slow, or it might be too disruptive for other users.
An alternative approach would be to use better indexes. A composite index on
eni_flussi_hub(flh_tipo_processo_cod,flh_flag_ann,idde_identif_dati_ext_id,
flh_fornitura,flh_id_messaggio) would avoid the need to read that table. Whether
this would be a new index or a replacement for ENI_FLK_IDX3 depends on the other
activity against the table. You might be able to benefit from index compression.
All the columns in the query projection are referenced in the WHERE clause. So
you could also use a composite index on the other table to avoid table reads. Agsin you need to understand the distribution and skew of the data. But you should probably lead with the least selective columns. Something like etl_elab_interf_flat(etl_elab_interf_flat,eif_campo200,dde_identif_dati_ext_id,eif_campo1,eif_campo198). Probably this is a new index. It's unlikely you would want to replace ETL_EIF_FK_IDX4 with this (especially if that really is an index on a foreign key constraint)..
Of course these are just guesses on my part. Tuning is a science and to do it properly requires lots of data. Use the Wait Interface to investigate where the database is spending its time. Use the 10053 event to understand why the Optimizer makes the choices it does. But above all, don't implement partitioning unless you really know the ramifications.
The simple answer seems to be your explain plan. You're accessing both tables by index rowid. Whilst to select a single row you cannot - to my knowledge - get faster, in your case you're selecting a lot more than a single row.
This means that for every single row you, you're going into both tables one row at a time, which when you're looking a significant proportion of a table or index is not what you want to do.
My suggestion would be to force a full scan of one or both of your tables. Try to use the smaller as a driver first:
SELECT /*+ full(c) */ c.flh_id_messaggio
, f.eif_campo1
, c.f
FROM flows c,
JOIN flet f
ON f.field19 = c.flh_id_messaggio
AND f.extra_id = c.extra_id
AND f.field1 <> c.f
WHERE ...
But you may have to change /*+ full(c) */ to /*+ full(c) full(f) */.
Your indexes seem to be separate column indexes as well. For this, and if possible, I would have indexes on:
flows of id_messaggio, extra_id, f
and on flet of field19, extra_id, field1.
This will only really matter if you do not use as full scan. Or, if you have everything you're returning and selecting is in one index.