I have a query that gets a list of records and traces the genealogy of each record, but it runs forever. Can anyone help me improve its performance?
WITH root_nodes AS (
    SELECT DISTINCT dlot.dim_lot_key AS lot_key, facility, lot
    FROM pedwroot.dim_lot dlot
    JOIN AT_LOT a
        ON (a.at_lot = dlot.lot AND a.at_facility = dlot.facility)
    WHERE (dlot.has_test_lpt = 'Y' OR dlot.has_post_test_lpt = 'Y')
      AND a.at_facility = 'MLA'
),
upstream_genealogy AS (
    SELECT /*+ INDEX(fact_link_lot IX_R_FLLOT_DLOT_2) */ DISTINCT
        CONNECT_BY_ROOT fllot.dst_lot_key AS root_lot_key,
        fllot.src_lot_key
    FROM pedwroot.fact_link_lot fllot
    CONNECT BY NOCYCLE PRIOR fllot.src_lot_key = fllot.dst_lot_key
    START WITH fllot.dst_lot_key IN (SELECT lot_key FROM root_nodes)
),
at_lst AS (
    SELECT *
    FROM pedwroot.dim_lot dlot_lst
    JOIN upstream_genealogy upgen
        ON (upgen.src_lot_key = dlot_lst.dim_lot_key)
    WHERE dlot_lst.has_assembly_lpt = 'Y'
)
SELECT DISTINCT
    dlot_root.lot AS AT_LOT,
    dlot_root.facility AS AT_FACILITY,
    dfac_root.common_name AS AT_SITE,
    dlot_root.LTC AS AT_LTC,
    al.lot AS AT_LST,
    dlot_src.lot AS FAB_LOT,
    dlot_src.facility AS FAB_FACILITY,
    dfac_src.common_name AS FAB_SITE
FROM upstream_genealogy upgen
JOIN at_lst al
    ON (upgen.root_lot_key = al.root_lot_key)
JOIN pedwroot.dim_lot dlot_src
    ON (upgen.src_lot_key = dlot_src.dim_lot_key)
JOIN pedwroot.dim_lot dlot_root
    ON (upgen.root_lot_key = dlot_root.dim_lot_key)
JOIN pedwroot.fact_lot flot
    ON (dlot_root.dim_lot_key = flot.lot_key)
JOIN pedwroot.dim_facility dfac_root
    ON (flot.facility_key = dfac_root.dim_facility_key)
JOIN pedwroot.dim_facility dfac_src
    ON (flot.fab_facility_key = dfac_src.dim_facility_key)
WHERE dlot_src.has_fab_lpt = 'Y';
Below is the explain plan of this query
The sudden change in cardinality from 11 million to 1 looks like a problem. Remove tables and predicates from the query until you find out exactly what is causing that poor estimate.
Most of the time these issues are caused by bad statistics, so try gathering stats for all the related tables. (I can think of dozens of other potential problems, but it's probably not worth guessing until you can shrink the problem a little.)
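For example, a minimal sketch of refreshing statistics on the tables this query touches, assuming the PEDWROOT schema owns them and you have the privileges to gather stats (the AT_LOT table's owner isn't shown, so gather its stats in whatever schema it actually lives in):
BEGIN
    -- refresh optimizer statistics for the tables referenced by the query
    DBMS_STATS.GATHER_TABLE_STATS(ownname => 'PEDWROOT', tabname => 'DIM_LOT');
    DBMS_STATS.GATHER_TABLE_STATS(ownname => 'PEDWROOT', tabname => 'FACT_LINK_LOT');
    DBMS_STATS.GATHER_TABLE_STATS(ownname => 'PEDWROOT', tabname => 'FACT_LOT');
    DBMS_STATS.GATHER_TABLE_STATS(ownname => 'PEDWROOT', tabname => 'DIM_FACILITY');
END;
/
Then re-check the explain plan to see whether the 11-million-to-1 estimate moves.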
UPDATE: Changed the title. The previous title was "Does UNION instead of OR always speed up queries?"
Here is my query. The question concerns the second-to-last line, with the OR:
SELECT distinct bigUnionQuery.customer
FROM ((SELECT buyer.customer
FROM membership_vw buyer
JOIN account_vw account
ON account.buyer = buyer.id
WHERE account.closedate >= 'some_date')
UNION
(SELECT joint.customer
FROM entity_vw joint
JOIN transactorassociation_vw assoc
ON assoc.associatedentity = joint.id
JOIN account_vw account
ON account.buyer = assoc.entity
WHERE assoc.account is null and account.closedate >= 'some_date')
UNION
(SELECT joint.customer
FROM entity_vw joint
JOIN transactorassociation_vw assoc
ON assoc.associatedentity = joint.id
JOIN account_vw account
ON account.id = assoc.account
WHERE account.closedate >= '2021-02-11 00:30:22.339'))
AS bigUnionQuery
JOIN entity_vw
ON entity_vw.customer = bigUnionQuery.customer OR entity_vw.id = bigUnionQuery.customer
WHERE entity_vw.lastmodifieddate >= 'some_date';
The original query doesn't have the OR in the second-to-last line. Adding the OR here has slowed down the query. I'm wondering if there is a way to use UNION here to speed it up.
I tried something like this (pseudocode):
bigUnionQuery bq join entity_vw e on e.customer = bq.customer
union
bigUnionQuery bq join entity_vw e on e.id = bq.customer
But that slowed down the query even more, probably because bigUnionQuery is itself a large, slow query, and running it twice in the UNION is not the right approach. What would be the right way to use UNION here, or is it always going to be faster with OR?
Does UNION instead of OR always speed up queries? In some cases it does; I think it depends on your indexes too. I have worked on tables with a million records, and my queries usually get faster when I use UNION instead of OR or AND.
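As a hedged sketch of one way to try it here (assuming your engine supports CTEs; the CTE name big_union is made up), write bigUnionQuery once and UNION the two join conditions instead of OR-ing them:
WITH big_union AS (
    SELECT buyer.customer
    FROM membership_vw buyer
    JOIN account_vw account ON account.buyer = buyer.id
    WHERE account.closedate >= 'some_date'
    UNION
    SELECT joint.customer
    FROM entity_vw joint
    JOIN transactorassociation_vw assoc ON assoc.associatedentity = joint.id
    JOIN account_vw account ON account.buyer = assoc.entity
    WHERE assoc.account is null and account.closedate >= 'some_date'
    UNION
    SELECT joint.customer
    FROM entity_vw joint
    JOIN transactorassociation_vw assoc ON assoc.associatedentity = joint.id
    JOIN account_vw account ON account.id = assoc.account
    WHERE account.closedate >= '2021-02-11 00:30:22.339'
)
SELECT bq.customer
FROM big_union bq
JOIN entity_vw e ON e.customer = bq.customer
WHERE e.lastmodifieddate >= 'some_date'
UNION
SELECT bq.customer
FROM big_union bq
JOIN entity_vw e ON e.id = bq.customer
WHERE e.lastmodifieddate >= 'some_date';
Whether this beats the OR depends on whether the planner evaluates the CTE once or inlines it into both branches; if it gets inlined (so the expensive part runs twice, as in your pseudocode attempt), materializing it first, for example into a temp table or with PostgreSQL 12+'s AS MATERIALIZED, is the usual workaround.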
I am dealing with a monster query (~800 lines) on Oracle 11, and it is consuming expensive resources.
The main problem is a table mouvement with about 18 million rows, on which I have roughly 30 left joins like these:
LEFT JOIN mouvement mracct_ad1
ON mracct_ad1.code_portefeuille = t.code_portefeuille
AND mracct_ad1.statut_ligne = 'PROPRE'
AND substr(mracct_ad1.code_valeur,1,4) = 'MRAC'
AND mracct_ad1.code_transaction = t.code_transaction
LEFT JOIN mouvement mracct_zias
ON mracct_zias.code_portefeuille = t.code_portefeuille
AND mracct_zias.statut_ligne = 'PROPRE'
AND substr(mracct_zias.code_valeur,1,4) = 'PRAC'
AND mracct_zias.code_transaction = t.code_transaction
LEFT JOIN mouvement mracct_zixs
ON mracct_zixs.code_portefeuille = t.code_portefeuille
AND mracct_zixs.statut_ligne = 'XROPRE'
AND substr(mracct_zixs.code_valeur,1,4) = 'MRAT'
AND mracct_zixs.code_transaction = t.code_transaction
Is there some way I can get rid of the left joins (a union join, for example) to make the query faster and consume less? Should I be looking at the execution plan or something?
Just a note on performance. Usually you want to "rephrase" conditions like:
AND substr(mracct_ad1.code_valeur,1,4) = 'MRAC'
In simple words, expressions on the left side of the equality prevent the best usage of indexes and may push the SQL optimizer toward a less-than-optimal plan. The database engine ends up doing more work than is really needed, and the query will be [much] slower. In extreme cases it can even decide to use a full table scan. In this case you can rephrase it as:
AND mracct_ad1.code_valeur like 'MRAC%'
or:
AND mracct_ad1.code_valeur >= 'MRAC' AND mracct_ad1.code_valeur < 'MRAD'
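If you'd rather keep the substr() predicate exactly as written, Oracle can also index the expression itself with a function-based index. A hedged sketch, assuming you are free to add indexes to mouvement (the index name here is made up):
-- hypothetical index name; indexes the same expression the predicate uses
CREATE INDEX ix_mouvement_code_valeur_4 ON mouvement (substr(code_valeur, 1, 4));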
I am guessing so. Your code sample doesn't make much sense, but you can probably do conditional aggregation:
left join
(select m.code_portefeuille, m.code_transaction,
max(case when m.statut_ligne = 'PROPRE' and m.code_valeur like 'MRAC%' then ? end) as ad1,
max(case when m.statut_ligne = 'PROPRE' and m.code_valeur like 'PRAC%' then ? end) as zia,
. . . -- for all the rest of the joins as well
from mouvement m
group by m.code_portefeuille, m.code_transaction
) m
on m.code_portefeuille = t.code_portefeuille and m.code_transaction = t.code_transaction
You can probably replace all 30 joins with a single join to the aggregated table.
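A minimal sketch of what the outer query might then look like (the driving table behind the alias t isn't shown in the question, so your_driving_table is a placeholder, and code_valeur stands in for whichever mouvement column the original ? placeholders select):
SELECT t.code_portefeuille,
       t.code_transaction,
       m.ad1,   -- value formerly read from the mracct_ad1 left join
       m.zia    -- value formerly read from the mracct_zias left join
FROM your_driving_table t
LEFT JOIN (
    SELECT m.code_portefeuille, m.code_transaction,
           max(case when m.statut_ligne = 'PROPRE' and m.code_valeur like 'MRAC%' then m.code_valeur end) as ad1,
           max(case when m.statut_ligne = 'PROPRE' and m.code_valeur like 'PRAC%' then m.code_valeur end) as zia
    FROM mouvement m
    GROUP BY m.code_portefeuille, m.code_transaction
) m
ON m.code_portefeuille = t.code_portefeuille
AND m.code_transaction = t.code_transaction;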
This basic query throws a System.OutOfMemoryException error after I join job_price_line to job_price_hdr.
Would creating a temp table speed up this query? I'm not understanding the other explanations I have read on this topic. Thanks!
select
oe_line.qty_invoiced,
invoice_hdr.invoice_no,
invoice_hdr.invoice_date,
invoice_line.unit_price,
invoice_line.item_desc,
invoice_line.customer_part_number,
invoice_line.pricing_unit,
invoice_hdr.ship_to_id,
invoice_hdr.po_no,
invoice_hdr.ship_to_id,
invoice_line.item_id,
invoice_hdr.customer_id,
job_price_hdr.contract_no,
job_price_hdr.cancelled,
job_price_line.line_no,
invoice_hdr.sales_location_id
from invoice_hdr
join invoice_line on invoice_line.invoice_no = invoice_hdr.invoice_no
join oe_line on oe_line.order_no = invoice_hdr.order_no
join job_price_hdr on job_price_hdr.corp_address_id = invoice_hdr.corp_address_id
join job_price_line on job_price_line.job_price_hdr_uid = job_price_hdr.job_price_hdr_uid
where invoice_hdr.invoice_date between ('2016-05-02') and ('2016-05-03')
and job_price_hdr.cancelled = 'N'
and invoice_hdr.sales_location_id = '200'
No amount of speed is going to solve an out-of-memory exception. It looks like your last join has heavily multiplied the number of records you're returning. Try replacing your field list with count(*) to see how many records you're getting back first.
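For example, a quick sketch of that check, keeping your joins and filters exactly as they are and only swapping the select list:
select count(*)
from invoice_hdr
join invoice_line on invoice_line.invoice_no = invoice_hdr.invoice_no
join oe_line on oe_line.order_no = invoice_hdr.order_no
join job_price_hdr on job_price_hdr.corp_address_id = invoice_hdr.corp_address_id
join job_price_line on job_price_line.job_price_hdr_uid = job_price_hdr.job_price_hdr_uid
where invoice_hdr.invoice_date between ('2016-05-02') and ('2016-05-03')
and job_price_hdr.cancelled = 'N'
and invoice_hdr.sales_location_id = '200'
If that count is far larger than the number of invoice lines you expect for those two days, the join to job_price_hdr/job_price_line is fanning out rows and needs a more selective join condition.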
I have a DB2 query as follows:
SELECT DISTINCT RETAILMASTERFILE.DOIDCD AS "RETAILMASTERFILE_DOIDCD",
RETAILMASTERFILE.COCOMO AS "RETAILMASTERFILE_COCOMO",
#XENOS.CUSTREF AS "XENOS_CUSTREF",
#XENOS.ADDUDT AS "XENOS_ADDUDT",
#XENOS.ADUPDD AS "XENOS_ADUPDD",
#XENOS.ADUPDT AS "XENOS_ADUPDT",
#XENOS.ADSTAT AS "XENOS_ADSTAT"
FROM RETAILMASTERFILE INNER JOIN
#XENOS ON RETAILMASTERFILE.DOCOMP = #XENOS.ADCOMP
AND RETAILMASTERFILE.COCOMO = #XENOS.ADDELN
WHERE (RETAILMASTERFILE.DOIDCD = 'CUST008')
AND (RETAILMASTERFILE.COCOMO = '345126032')
AND (RETAILMASTERFILE.DOCOMP = 'LONDON')
The problem is that #XENOS.ADUPDT may not be unique, which gives me unwanted duplicate records.
Is there any way I can exclude this from consideration? Everything I've tried so far, with my limited knowledge and crude understanding of GROUP BY, has broken my query.
Use GROUP BY instead:
SELECT RETAILMASTERFILE.DOIDCD AS "RETAILMASTERFILE_DOIDCD",
RETAILMASTERFILE.COCOMO AS "RETAILMASTERFILE_COCOMO",
#XENOS.CUSTREF AS "XENOS_CUSTREF",
#XENOS.ADDUDT AS "XENOS_ADDUDT",
#XENOS.ADUPDD AS "XENOS_ADUPDD",
MAX(#XENOS.ADUPDT) AS "XENOS_ADUPDT",
#XENOS.ADSTAT AS "XENOS_ADSTAT"
FROM RETAILMASTERFILE INNER JOIN
#XENOS
ON RETAILMASTERFILE.DOCOMP = #XENOS.ADCOMP AND
RETAILMASTERFILE.COCOMO = #XENOS.ADDELN
WHERE (RETAILMASTERFILE.DOIDCD = 'CUST008') AND (RETAILMASTERFILE.COCOMO = '345126032') AND
(RETAILMASTERFILE.DOCOMP = 'LONDON')
GROUP BY RETAILMASTERFILE.DOIDCD,
RETAILMASTERFILE.COCOMO,
#XENOS.CUSTREF,
#XENOS.ADDUDT,
#XENOS.ADUPDD,
#XENOS.ADSTAT;
These two SQL statements return equal results, but the first one is much slower than the second:
SELECT leading.email, kstatus.name, contacts.status
FROM clients
JOIN clients_leading ON clients.id_client = clients_leading.id_client
JOIN leading ON clients_leading.id_leading = leading.id_leading
JOIN contacts ON contacts.id_k_p = clients_leading.id_group
JOIN kstatus on contacts.status = kstatus.id_kstatus
WHERE (clients.email = 'some_email' OR clients.email1 = 'some_email')
ORDER BY contacts.date DESC;
SELECT leading.email, kstatus.name, contacts.status
FROM (
SELECT *
FROM clients
WHERE (clients.email = 'some_email' OR clients.email1 = 'some_email')
)
AS clients
JOIN clients_leading ON clients.id_client = clients_leading.id_client
JOIN leading ON clients_leading.id_leading = leading.id_leading
JOIN contacts ON contacts.id_k_p = clients_leading.id_group
JOIN kstatus on contacts.status = kstatus.id_kstatus
ORDER BY contacts.date DESC;
But I'm wondering why that is. It looks like in the first statement the joins are done first and then the WHERE clause is applied, while in the second it is just the opposite. But will it behave the same way on all DB engines (I tested it on MySQL)?
I was expecting the DB engine to optimize queries like the first one by applying the WHERE clause first and then doing the joins.
There are a lot of different reasons this could be (keying etc.), but you can look at MySQL's EXPLAIN command to see how the statements are being executed. If you can run that and it is still a mystery, post the output.
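For example, prefixing the slow statement with EXPLAIN shows MySQL's chosen join order, the indexes it considered, and its row estimates:
EXPLAIN
SELECT leading.email, kstatus.name, contacts.status
FROM clients
JOIN clients_leading ON clients.id_client = clients_leading.id_client
JOIN leading ON clients_leading.id_leading = leading.id_leading
JOIN contacts ON contacts.id_k_p = clients_leading.id_group
JOIN kstatus ON contacts.status = kstatus.id_kstatus
WHERE (clients.email = 'some_email' OR clients.email1 = 'some_email')
ORDER BY contacts.date DESC;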
You can always replace a join with a nested query... It's often faster, but a lot messier...