Friends,
While executing a WHERE clause in Oracle SQL, suppose I have:
UPDATE schema1.TBL_SCHEMA1_PROCESS_FEED F
SET F.TBL_SCHEMA1_PROCESS_LINE_ID = V_LINE_ID,
F.TBL_SCHEMA1_PROCESS_LINE_TYPE_ID = V_LINE_TYPE_ID,
F.TBL_SCHEMA1_PROCESS_LINE_SUB_TYPE_ID = V_SUB_TYPE_ID
WHERE F.CURR_DATE = V_CURR_DATE
AND F.NEXT_DATE = V_NEXT_BUSINESS_DATE OR F.NEXT_DATE IS NULL;
How can this code be optimized for the condition
F.NEXT_DATE = V_NEXT_BUSINESS_DATE OR F.NEXT_DATE IS NULL
Is that your actual where clause? Do you mean it to be:
WHERE F.CURR_DATE = V_CURR_DATE
AND ( F.NEXT_DATE = V_NEXT_BUSINESS_DATE
OR F.NEXT_DATE IS NULL )
If so then you need an index, unique if possible, on curr_date.
If you're not satisfied that this provides a large enough improvement in execution time, then think about extending it to (curr_date, next_date). Don't create a larger index if you don't need to.
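As a sketch (the index names here are invented for illustration):
CREATE INDEX feed_curr_date_ix ON schema1.TBL_SCHEMA1_PROCESS_FEED (CURR_DATE);
-- or, extended (you would drop the single-column index first):
CREATE INDEX feed_curr_next_ix ON schema1.TBL_SCHEMA1_PROCESS_FEED (CURR_DATE, NEXT_DATE);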
You might also consider changing your conditions slightly, though I doubt it would make much, if any, difference.
WHERE F.CURR_DATE = V_CURR_DATE
AND NVL(F.NEXT_DATE, V_NEXT_BUSINESS_DATE) = V_NEXT_BUSINESS_DATE
The best possible option is to update using the ROWID. Without a lot more information it's impossible to know whether you're in a situation where this is possible, but as the ROWID is a unique address in the table, it is always quicker than an index when updating a single row. If you're collecting data from this table to populate your variables before writing back to the table, then this would be possible.
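A minimal sketch of that pattern, assuming the WHERE clause identifies exactly one row and that the V_* variables from your code are declared and populated as in the original block:
DECLARE
  v_rid ROWID;
  -- V_CURR_DATE, V_NEXT_BUSINESS_DATE, V_LINE_ID etc. are assumed declared/populated here
BEGIN
  -- fetch the row's physical address (and any data you need) in one pass
  SELECT ROWID
    INTO v_rid
    FROM schema1.TBL_SCHEMA1_PROCESS_FEED F
   WHERE F.CURR_DATE = V_CURR_DATE
     AND ( F.NEXT_DATE = V_NEXT_BUSINESS_DATE OR F.NEXT_DATE IS NULL );
  -- write back by rowid: no index or table access path needed
  UPDATE schema1.TBL_SCHEMA1_PROCESS_FEED F
     SET F.TBL_SCHEMA1_PROCESS_LINE_ID = V_LINE_ID,
         F.TBL_SCHEMA1_PROCESS_LINE_TYPE_ID = V_LINE_TYPE_ID,
         F.TBL_SCHEMA1_PROCESS_LINE_SUB_TYPE_ID = V_SUB_TYPE_ID
   WHERE ROWID = v_rid;
END;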
Are those your actual schema and table names? If they are, then why not think about choosing something more descriptive?
I have the following query that takes too much time to execute.
How can I optimize it?
Update Fact_IU_Lead
set
Fact_IU_Lead.Latitude_Point_Vente = Adr.Latitude,
Fact_IU_Lead.Longitude_Point_Vente = Adr.Longitude
FROM Dim_IU_PointVente
INNER JOIN
Data_I_Adresse AS Adr ON Dim_IU_PointVente.Code_Point_Vente = Adr.Code_Point_Vente
INNER JOIN
Fact_IU_Lead ON Dim_IU_PointVente.Code_Point_Vente = Fact_IU_Lead.Code_Point_Vente
WHERE
Latitude_Point_Vente is null
or Longitude_Point_Vente is null and Adr.[Error]=0
A couple of things I would look at here to help:
How many records are on each table? If it's millions, then you may need to cycle through them.
Are the columns you're joining on or filtering on indexed on each table? If not, add them in (see the sketch after this list)! Typically a huge speed difference at little cost.
Are the columns you're joining on stored as text instead of geo-spatial? I've had much better performance out of geo-spatial data types in this scenario. Just make sure your SRIDs are the same across tables.
Are the columns you're updating indexed, or is the table that's being updated heavy with indexes? Tons of indexes on a large table can be great for looking things up, but kills update/insert speeds.
Take a look at those first.
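For the indexing point, a hedged sketch (the index names are invented; the column choices simply mirror the joins and filters in your query):
CREATE INDEX IX_PointVente_Code ON Dim_IU_PointVente (Code_Point_Vente);
CREATE INDEX IX_Adresse_Code ON Data_I_Adresse (Code_Point_Vente) INCLUDE (Latitude, Longitude, [Error]);
CREATE INDEX IX_Lead_Code ON Fact_IU_Lead (Code_Point_Vente);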
I've also done some light cleanup of your code with regard to aliases.
Also, take a look at the WHERE clauses below and choose one of them (the second is commented out).
When you mix ANDs and ORs, the best thing you can ever do is add parentheses.
At a minimum, you'll have zero question regarding your thoughts when you wrote it.
At most, you'll know that SQL is executing your logic correctly.
UPDATE fl --update via the alias defined in the FROM clause below
SET
Latitude_Point_Vente = adr.Latitude --Note the table prefix is removed
, Longitude_Point_Vente = adr.Longitude --Note the table prefix is removed
FROM Dim_IU_PointVente as pv --Added alias
INNER JOIN
Data_I_Adresse AS adr ON pv.Code_Point_Vente = adr.Code_Point_Vente --carried alias
INNER JOIN
Fact_IU_Lead as fl ON pv.Code_Point_Vente = fl.Code_Point_Vente --added/carried alias
WHERE
(fl.Latitude_Point_Vente is null or fl.Longitude_Point_Vente is null) and adr.[Error] = 0 --option one for WHERE change
--fl.Latitude_Point_Vente is null or (fl.Longitude_Point_Vente is null and adr.[Error] = 0) --option two for WHERE change
Joins are usually expensive. The best approach in your case may be to place the update into a stored procedure, split your update into selects, and use a transaction to keep everything consistent (if needed).
Hope this answer points you in the right direction :)
I came across this curious case by chance.
Environment:
Oracle 12.2.2
Two tables involved
Number of rows: 16 million
As far as I know, and as reported here (Oracle / PLSQL: EXISTS Condition), the use of WHERE EXISTS is in general less performant than the alternatives.
In my case, however, when updating one table's columns with values from another on a join condition, the version with the EXISTS ran in about 12-13 seconds without issues (I only did some checks, as I really do not know all the content of the table):
update fdm_auftrag ou
set (ou.e_hr,ou.e_budget) = ( select b.e_hr,b.e_budget
from fdm_budget_auftrag b
where b.fk_column1 = ou.fk_column1
and b.fk_column2 = ou.fk_column2
and b.fk_col3 = ou.fk_col3 )
where exists ( select b.e_hr,b.e_budget
from fdm_budget_auftrag b
where b.fk_column1 = ou.fk_column1
and b.fk_column2 = ou.fk_column2
and b.fk_col3 = ou.fk_col3 );
Without the EXISTS, instead, it took so long that I eventually interrupted it.
I am just guessing: since the condition in the EXISTS is evaluated as a boolean, once the engine finds at least one matching row it has to touch the database less, but I am not sure about that.
Is this guess correct? Does anyone have a clearer explanation?
The where clause is limiting the number of rows being updated.
Fewer updated rows means that the update query runs faster. There is a lot of overhead to updating a row, including stashing away information for roll-back purposes.
I am assuming that you are updating relatively few rows in a much larger table. If the where clause is selecting most of the rows, then there might be no performance difference.
And, finally, the two queries are not identical. Without the WHERE clause, unmatched rows will have their columns assigned NULL.
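To make that concrete, here is the same statement with the EXISTS removed (nothing else changed): every row of fdm_auftrag is updated, and rows with no match in fdm_budget_auftrag get NULL in both columns.
update fdm_auftrag ou
set (ou.e_hr, ou.e_budget) = ( select b.e_hr, b.e_budget
                               from fdm_budget_auftrag b
                               where b.fk_column1 = ou.fk_column1
                               and b.fk_column2 = ou.fk_column2
                               and b.fk_col3 = ou.fk_col3 );
-- unmatched rows now have e_hr = NULL and e_budget = NULL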
I have a simple query that selects all records in a table that match the id criteria provided.
SELECT * FROM table WHERE id = x or id = y or id = z;
It works fine; however, I have over 50 IDs that need to be included in the WHERE clause. So when it comes to performance, would it be better to use a WHERE ... IN clause rather than OR? Or is there a better way to execute this that I am totally overlooking?
Thank you
PostgreSQL behavior can be checked with EXPLAIN (ANALYZE, BUFFERS).
This is the only way to understand what database is doing.
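For example (a sketch; substitute your real table name and ids):
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM mytable WHERE id IN (1, 2, 3);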
In case your list grows big, you can try joining with a VALUES construct instead.
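A sketch of that approach (the table name and id values are placeholders):
SELECT t.*
FROM mytable t
JOIN ( VALUES (1), (2), (3) ) AS ids(id)
  ON t.id = ids.id;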
Please check these:
https://dba.stackexchange.com/questions/91247/optimizing-a-postgres-query-with-a-large-in
https://www.datadoghq.com/blog/100x-faster-postgres-performance-by-changing-1-line/
I'm trying to optimise a select (a cursor in PL/SQL code, actually) that includes a PL/SQL function, e.g.:
select * from mytable t,mytable2 t2...
where t.thing = 'XXX'
... lots more joins and sql predicate on various columns
and myplsqlfunction(t.val) = 'X'
myplsqlfunction() is very expensive, but it is only applicable to a manageably small subset of the rows that match the other conditions.
The problem is that Oracle appears to be evaluating myplsqlfunction() on more data than is ideal.
My evidence for this is that if I recast the above as either
select * from (
select * from mytable t,mytable2 t2...
where t.thing = 'XXX'
... lots more joins and sql predicate on various columns
) where myplsqlfunction(val) = 'X'
or recast in PL/SQL as:
begin
for t in ( select * from mytable t,mytable2 t2...
where t.thing = 'XXX'
... lots more joins and sql predicate on various columns ) loop
if myplsqlfunction(t.val) = 'X' then
-- process the desired subset
end if;
end loop;
end;
performance is an order of magnitude better.
I am resigned to restructuring the offending code to use one of the two idioms above, but I would be delighted if there were a simpler way to get the Oracle optimizer to do this for me.
You could specify a bunch of hints to force a particular plan. But that would almost assuredly be more of a pain than restructuring the code.
I would expect that what you really want to do is to associate non-default statistics with the function. If you tell Oracle that the function is less selective than the optimizer is guessing or (more likely) if you provide high values for the CPU or I/O cost of the function, you'll cause the optimizer to try to call the function as few times as possible. The oracle-developer.net article walks through how to pick reasonably correct values for the cost (or going a step beyond that how to make those statistics change over time as the cost of the function call changes). You can probably fix your immediate problem by setting crazy-high costs but you probably want to go to the effort of setting accurate values so that you're giving the optimizer the most accurate information possible. Setting costs way too high or way too low tends to cause some set of queries to do something stupid.
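A minimal sketch of associating default statistics with the function (the cost and selectivity figures are invented placeholders; derive realistic values as the article describes):
-- costs are (cpu, io, network); selectivity is a percentage
ASSOCIATE STATISTICS WITH FUNCTIONS myplsqlfunction
  DEFAULT COST (100000, 100, 0)
  DEFAULT SELECTIVITY 1;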
You can use a WITH clause to first evaluate all your join conditions and get a manageable subset of data, then apply the PL/SQL function to that subset. It all depends on the volume, but you can still try this. Let me know if you hit any issues.
You can use CTE like:
WITH X as
( select /*+ MATERIALIZE */ * from mytable t,mytable2 t2...
where t.thing = 'XXX'
... lots more joins and sql predicate on various columns
)
SELECT * FROM X
where myplsqlfunction(val) = 'X';
Note the MATERIALIZE hint. CTEs can be either inlined or materialized (into the TEMP tablespace).
Another option would be to use the NO_PUSH_PRED hint. This is generally a better solution (it avoids materializing the subquery), but it requires some tweaking.
PS: you should not execute another SQL statement inside myplsqlfunction. That SQL might see data added after your query started, and you might get surprising results.
You can also declare your function as RESULT_CACHE, to make Oracle remember the function's return values; this is applicable when the number of possible parameter values is reasonably small.
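A sketch of such a declaration (the signature is a guess, since the function's real parameters weren't shown):
CREATE OR REPLACE FUNCTION myplsqlfunction (p_val IN VARCHAR2)
  RETURN VARCHAR2
  RESULT_CACHE
IS
BEGIN
  -- the original expensive logic goes here; this body is a placeholder
  RETURN 'X';
END myplsqlfunction;
/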
Probably the best solution is to associate the stats, as Justin describes.
I have a JOIN between two tables. It's really, really slow and I can't find out why.
The query takes hours in a PRODUCTION environment at a very big client.
Ask me for whatever you need in order to understand why it doesn't perform well.
I can add indexes, partition the table, etc. It's Oracle 10g.
I expect a few thousand records, because of the following condition:
f.eif_campo1 != c.fornitura AND f.field29 = 'New'
In fact, it should always be satisfied for all 18 million records.
SELECT c.id_messaggio
,f.campo1
,c.f
FROM
flows c,
tab f
WHERE
f.field198 = c.id_messaggio
AND f.extra_id = c.extra_id
and f.field1 != c.ExampleF
and f.field29 = 'New'
and c.processtype in ('Example1')
and c.flag_ann = 'N';
Selectivity for the following columns, expressed as the number of distinct values:
COUNT (DISTINCT extra_id) =>17*10^6,
COUNT (DISTINCT (extra_id || field20)) =>17*10^6,
COUNT (DISTINCT field198) =>36*10^6,
COUNT (DISTINCT (field19 || field20)) =>45*10^6,
COUNT (DISTINCT (field1)) =>18*10^6,
COUNT (DISTINCT (field20)) =>47
This is the execution plan: [execution plan screenshot]
Extra details:
I have relaxed one condition to see how many records are returned: 300 thousand.
--03:57 mins with parallel execution /*+ parallel(c 8) parallel(f 24) */
--395,358 rows
SELECT count(1)
FROM
flows c,
flet f
WHERE
f.field19 = c.id_messaggio
AND f.extra_id = c.extra_id
and f.field20 = 'ExampleF'
and c.process_type in ('ExampleP')
and c.flag_ann = 'N';
Your explain plan shows the following. The database uses an index to retrieve rows from ENI_FLUSSI_HUB where flh_tipo_processo_cod in ('VT','VOLTURA_ENI','CC'). It then winnows the rows where flh_flag_ann = 'N'. This produces a result set which is used to access rows from ETL_ELAB_INTERF_FLAT on the basis of f.idde_identif_dati_ext_id = c.idde_identif_dati_ext_id. Finally those rows are filtered on the basis of the remaining parts of the WHERE clause.
Now, the starting point is a good one if flh_tipo_processo_cod is a selective column: that is, if it contains hundreds of different values, or if the values in your list are relatively rare. It might even be a good path if the flag column identifies relatively few rows with a value of 'N'. So you need to understand both the distribution of your data (how many distinct values you have) and its skew (which values appear very often or hardly at all). The overall performance suggests that the distribution and/or skew of the flh_tipo_processo_cod and flh_flag_ann columns is not good.
So what can you do? One approach is to follow Ben's suggestion and use full table scans. If you have an Enterprise Edition licence and plenty of CPU capacity, you could try parallel query to improve things. That might still be too slow, or it might be too disruptive for other users.
An alternative approach would be to use better indexes. A composite index on eni_flussi_hub(flh_tipo_processo_cod, flh_flag_ann, idde_identif_dati_ext_id, flh_fornitura, flh_id_messaggio) would avoid the need to read that table. Whether this would be a new index or a replacement for ENI_FLK_IDX3 depends on the other activity against the table. You might be able to benefit from index compression.
All the columns in the query projection are referenced in the WHERE clause, so you could also use a composite index on the other table to avoid table reads. Again, you need to understand the distribution and skew of the data, but you should probably lead with the least selective columns. Something like etl_elab_interf_flat(etl_elab_interf_flat, eif_campo200, dde_identif_dati_ext_id, eif_campo1, eif_campo198). Probably this is a new index; it's unlikely you would want to replace ETL_EIF_FK_IDX4 with this (especially if that really is an index on a foreign key constraint).
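As a sketch, the first of those indexes (the index name is invented; COMPRESS is the index compression mentioned above, and the prefix length would need testing):
CREATE INDEX eni_flussi_hub_cov_ix ON eni_flussi_hub
  (flh_tipo_processo_cod, flh_flag_ann, idde_identif_dati_ext_id,
   flh_fornitura, flh_id_messaggio)
  COMPRESS 2;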
Of course these are just guesses on my part. Tuning is a science and to do it properly requires lots of data. Use the Wait Interface to investigate where the database is spending its time. Use the 10053 event to understand why the Optimizer makes the choices it does. But above all, don't implement partitioning unless you really know the ramifications.
The simple answer seems to be in your explain plan. You're accessing both tables by index ROWID. While, to my knowledge, you cannot get faster when selecting a single row, in your case you're selecting a lot more than a single row.
This means that for every single row, you're going into both tables one row at a time, which, when you're looking at a significant proportion of a table or index, is not what you want to do.
My suggestion would be to force a full scan of one or both of your tables. Try to use the smaller one as the driver first:
SELECT /*+ full(c) */ c.flh_id_messaggio
     , f.eif_campo1
     , c.f
  FROM flows c
  JOIN flet f
    ON f.field19 = c.flh_id_messaggio
   AND f.extra_id = c.extra_id
   AND f.field1 <> c.f
 WHERE ...
But you may have to change /*+ full(c) */ to /*+ full(c) full(f) */.
Your indexes seem to be single-column indexes as well. For this, and if possible, I would have composite indexes on flows(id_messaggio, extra_id, f) and on flet(field19, extra_id, field1).
This will only really matter if you do not use a full scan, or if everything you're returning and selecting is contained in one index.
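As a sketch (index names invented, columns as suggested above):
CREATE INDEX flows_ix ON flows (id_messaggio, extra_id, f);
CREATE INDEX flet_ix ON flet (field19, extra_id, field1);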