I have a complex business logic that requires me to perform a 2-levels nested query. The queries are generated by Django's ORM. At the bottom of the question I'll provide the queries as-is as well as a full EXPLAIN suitable to be viewed with PEV2, but in order to help readers understand better the question, I'll start with a more conceptual explanation.
This is how a very naive description of what we're doing looks like:
some_ids = get_id_based_on_some_conditions(*conditions*)
some_other_ids = get_some_other_ids_based_on_some_conditions_and_filtering_by_some_ids(*other_conditions*, some_ids)
results = get_results_based_on_even_more_conditions_and_filtering_by_some_other_ids(*another_set_of_conditions*, some_other_ids)
Translating the following pseudo-code to actual SQL using subqueries is quite easy. A straightforward translation becomes into the following pseudo-query:
select
foo,
bar
from
t1,
t2
where
condition1 = something and
condition2 in ( <---- first level subquery
select
id
from
t3
where
condition3 = another_something and
condition4 in ( <---- second level subquery
select
another_id
from
t4
where
condition5 = something_something and
condition6 = another_something_something
)
)
Since the query is taking a considerable amount of time (~0.6s) given the number of rows that it returns (a little bit over 9.000), I thought that it might help replacing the second level subquery with an inner join.
That, in fact, made the query even slower (now at ~1.7s). So I thought that maybe the planner didn't correctly understood what would happen with a subquery with an inner join inside and made some serious miscalculations / overestimations / underestimations, so I replaced the first level subquery with more inner joins, which led to even poorer results (now at ~10s).
I have been analysing the EXPLAINS of the queries for hours, and I can't figure out why using inner joins makes everything slower. I also don't know how to tell if my (currently) best query is actually the best I can get or if there are things that I'm not doing and that might speed it up.
So, the questions that I have are:
why are the inner joins slower than subqueries?
how can I tell if I'm doing everything possible in order to squeeze the maximum performance out of my database or if I'm missing something?
Actual queries and EXPLAINS as-is:
Query with 2-levels subqueries:
SELECT DISTINCT
"phdrug_phdrug"."id",
"phdrug_phdrug"."uuid",
"phdrug_phdrug"."default_description",
"phdrug_phdrug"."alternative_description",
"phdrug_phdrug"."ean",
"phdrug_phdrug"."mirror_ean",
"phdrug_phdrug"."parent_ean",
"phdrug_phdrug"."reg_num",
"phdrug_phdrug"."medika_code",
"phdrug_phdrug"."atc_iv",
"phdrug_phdrug"."product_type",
"phdrug_phdrug"."fraction",
"phdrug_phdrug"."active",
"phdrug_phdrug"."loyal",
"phdrug_phdrug"."patent",
"phdrug_phdrug"."chronics",
"phdrug_phdrug"."recipe",
"phdrug_phdrug"."deal",
"phdrug_phdrug"."specialized",
"phdrug_phdrug"."armored",
"phdrug_phdrug"."top_hight_speciality",
"phdrug_phdrug"."top_generic",
"phdrug_phdrug"."hight_speciality",
"phdrug_phdrug"."temp_8_15",
"phdrug_phdrug"."temp_15_25",
"phdrug_phdrug"."temp_2_8",
"phdrug_phdrug"."temp_less_15",
"phdrug_phdrug"."new",
"phdrug_phdrug"."mdk_internal_code",
"phdrug_phdrug"."mdk_single_id",
"phdrug_phdrug"."mdk_object_id",
"phdrug_phdrug"."is_from_mdk_db",
"phdrug_phdrug"."top",
"phdrug_phdrug"."laboratory_name",
"phdrug_phdrug"."laboratory_alternative_name",
"phdrug_phdrug"."imported",
"phdrug_phdrug"."imported_country",
"phdrug_phdrug"."laboratory_id",
"phdrug_phdrug"."specialty",
"phdrug_phdrug"."dimension_id",
"phdrug_phdrug"."featured",
"phdrug_phdrug"."top_ae_rank",
"phdrug_phdrug"."top_farma_rank"
FROM
"phdrug_phdrug"
INNER JOIN "monetary_drugprice" ON ( "phdrug_phdrug"."id" = "monetary_drugprice"."drug_id" )
INNER JOIN "phdrug_phdrugpicture" ON ( "phdrug_phdrug"."id" = "phdrug_phdrugpicture"."drug_id" )
WHERE
(
"monetary_drugprice"."id" IN (
SELECT
V0."id"
FROM
"monetary_drugprice" V0
WHERE
(
V0."pricelist_id" IN (
SELECT DISTINCT ON
( U0."id" ) U0."id"
FROM
"monetary_pricelist" U0
INNER JOIN "monetary_pricelistdestinations" U1 ON ( U0."id" = U1."pricelist_id" )
INNER JOIN "organization_organization" U2 ON ( U0."manager_id" = U2."id" )
INNER JOIN "courier_carrier_pricelists" U3 ON ( U0."id" = U3."pricelist_id" )
INNER JOIN "courier_carrier" U4 ON ( U3."carrier_id" = U4."id" )
INNER JOIN "courier_carrierdelivery" U5 ON ( U4."id" = U5."carrier_id" )
INNER JOIN "monetary_pricelistcountry" U6 ON ( U0."id" = U6."pricelist_id" )
WHERE
(
(
U0."expires" = FALSE
OR (
U0."expires" = TRUE
AND ( U0."datestart" AT TIME ZONE'UTC' ) :: DATE <= '2020-05-01'
AND ( U0."dateend" AT TIME ZONE'UTC' ) :: DATE >= '2020-05-01'
)
)
AND U0."active" = TRUE
AND U1."to_public" = TRUE
AND U2."organization_type" = 2
AND (
U5."dst_country" = 'MX'
OR U5."ignore_country_filter" = TRUE
)
AND U6."country" = 'MX'
AND U2."active" = TRUE
)
)
AND V0."stock" > 0
)
)
AND "phdrug_phdrug"."active" = TRUE
AND "phdrug_phdrugpicture"."is_main" = TRUE
)
ORDER BY
"phdrug_phdrug"."id" ASC,
"phdrug_phdrug"."default_description" ASC
Full explain: https://pastebin.com/jDy3FyKp
Query with 1-level subquery:
SELECT DISTINCT
"phdrug_phdrug"."id",
"phdrug_phdrug"."uuid",
"phdrug_phdrug"."default_description",
"phdrug_phdrug"."alternative_description",
"phdrug_phdrug"."ean",
"phdrug_phdrug"."mirror_ean",
"phdrug_phdrug"."parent_ean",
"phdrug_phdrug"."reg_num",
"phdrug_phdrug"."medika_code",
"phdrug_phdrug"."atc_iv",
"phdrug_phdrug"."product_type",
"phdrug_phdrug"."fraction",
"phdrug_phdrug"."active",
"phdrug_phdrug"."loyal",
"phdrug_phdrug"."patent",
"phdrug_phdrug"."chronics",
"phdrug_phdrug"."recipe",
"phdrug_phdrug"."deal",
"phdrug_phdrug"."specialized",
"phdrug_phdrug"."armored",
"phdrug_phdrug"."top_hight_speciality",
"phdrug_phdrug"."top_generic",
"phdrug_phdrug"."hight_speciality",
"phdrug_phdrug"."temp_8_15",
"phdrug_phdrug"."temp_15_25",
"phdrug_phdrug"."temp_2_8",
"phdrug_phdrug"."temp_less_15",
"phdrug_phdrug"."new",
"phdrug_phdrug"."mdk_internal_code",
"phdrug_phdrug"."mdk_single_id",
"phdrug_phdrug"."mdk_object_id",
"phdrug_phdrug"."is_from_mdk_db",
"phdrug_phdrug"."top",
"phdrug_phdrug"."laboratory_name",
"phdrug_phdrug"."laboratory_alternative_name",
"phdrug_phdrug"."imported",
"phdrug_phdrug"."imported_country",
"phdrug_phdrug"."laboratory_id",
"phdrug_phdrug"."specialty",
"phdrug_phdrug"."dimension_id",
"phdrug_phdrug"."featured",
"phdrug_phdrug"."top_ae_rank",
"phdrug_phdrug"."top_farma_rank"
FROM
"phdrug_phdrug"
INNER JOIN "monetary_drugprice" ON ( "phdrug_phdrug"."id" = "monetary_drugprice"."drug_id" )
INNER JOIN "phdrug_phdrugpicture" ON ( "phdrug_phdrug"."id" = "phdrug_phdrugpicture"."drug_id" )
WHERE
(
"monetary_drugprice"."id" IN (
SELECT
U0."id"
FROM
"monetary_drugprice" U0
INNER JOIN "monetary_pricelist" U1 ON ( U0."pricelist_id" = U1."id" )
INNER JOIN "monetary_pricelistdestinations" U2 ON ( U1."id" = U2."pricelist_id" )
INNER JOIN "organization_organization" U3 ON ( U1."manager_id" = U3."id" )
INNER JOIN "courier_carrier_pricelists" U4 ON ( U1."id" = U4."pricelist_id" )
INNER JOIN "courier_carrier" U5 ON ( U4."carrier_id" = U5."id" )
INNER JOIN "courier_carrierdelivery" U6 ON ( U5."id" = U6."carrier_id" )
INNER JOIN "monetary_pricelistcountry" U7 ON ( U1."id" = U7."pricelist_id" )
WHERE
(
(
U1."expires" = FALSE
OR (
U1."expires" = TRUE
AND ( U1."datestart" AT TIME ZONE'UTC' ) :: DATE <= '2020-05-01'
AND ( U1."dateend" AT TIME ZONE'UTC' ) :: DATE >= '2020-05-01'
)
)
AND U1."active" = TRUE
AND U2."to_public" = TRUE
AND U3."organization_type" = 2
AND (
U6."dst_country" = 'MX'
OR U6."ignore_country_filter" = TRUE
)
AND U7."country" = 'MX'
AND U3."active" = TRUE
AND U0."stock" > 0
)
)
AND "phdrug_phdrug"."active" = TRUE
AND "phdrug_phdrugpicture"."is_main" = TRUE
)
ORDER BY
"phdrug_phdrug"."id" ASC,
"phdrug_phdrug"."default_description" ASC
Full explain: https://pastebin.com/NidTZMxY
Query with only inner joins:
SELECT DISTINCT
"phdrug_phdrug"."id",
"phdrug_phdrug"."uuid",
"phdrug_phdrug"."default_description",
"phdrug_phdrug"."alternative_description",
"phdrug_phdrug"."ean",
"phdrug_phdrug"."mirror_ean",
"phdrug_phdrug"."parent_ean",
"phdrug_phdrug"."reg_num",
"phdrug_phdrug"."medika_code",
"phdrug_phdrug"."atc_iv",
"phdrug_phdrug"."product_type",
"phdrug_phdrug"."fraction",
"phdrug_phdrug"."active",
"phdrug_phdrug"."loyal",
"phdrug_phdrug"."patent",
"phdrug_phdrug"."chronics",
"phdrug_phdrug"."recipe",
"phdrug_phdrug"."deal",
"phdrug_phdrug"."specialized",
"phdrug_phdrug"."armored",
"phdrug_phdrug"."top_hight_speciality",
"phdrug_phdrug"."top_generic",
"phdrug_phdrug"."hight_speciality",
"phdrug_phdrug"."temp_8_15",
"phdrug_phdrug"."temp_15_25",
"phdrug_phdrug"."temp_2_8",
"phdrug_phdrug"."temp_less_15",
"phdrug_phdrug"."new",
"phdrug_phdrug"."mdk_internal_code",
"phdrug_phdrug"."mdk_single_id",
"phdrug_phdrug"."mdk_object_id",
"phdrug_phdrug"."is_from_mdk_db",
"phdrug_phdrug"."top",
"phdrug_phdrug"."laboratory_name",
"phdrug_phdrug"."laboratory_alternative_name",
"phdrug_phdrug"."imported",
"phdrug_phdrug"."imported_country",
"phdrug_phdrug"."laboratory_id",
"phdrug_phdrug"."specialty",
"phdrug_phdrug"."dimension_id",
"phdrug_phdrug"."featured",
"phdrug_phdrug"."top_ae_rank",
"phdrug_phdrug"."top_farma_rank"
FROM
"phdrug_phdrug"
INNER JOIN "monetary_drugprice" ON ( "phdrug_phdrug"."id" = "monetary_drugprice"."drug_id" )
INNER JOIN "monetary_pricelist" ON ( "monetary_drugprice"."pricelist_id" = "monetary_pricelist"."id" )
INNER JOIN "monetary_pricelistdestinations" ON ( "monetary_pricelist"."id" = "monetary_pricelistdestinations"."pricelist_id" )
INNER JOIN "organization_organization" ON ( "monetary_pricelist"."manager_id" = "organization_organization"."id" )
INNER JOIN "courier_carrier_pricelists" ON ( "monetary_pricelist"."id" = "courier_carrier_pricelists"."pricelist_id" )
INNER JOIN "courier_carrier" ON ( "courier_carrier_pricelists"."carrier_id" = "courier_carrier"."id" )
INNER JOIN "courier_carrierdelivery" ON ( "courier_carrier"."id" = "courier_carrierdelivery"."carrier_id" )
INNER JOIN "monetary_pricelistcountry" ON ( "monetary_pricelist"."id" = "monetary_pricelistcountry"."pricelist_id" )
INNER JOIN "phdrug_phdrugpicture" ON ( "phdrug_phdrug"."id" = "phdrug_phdrugpicture"."drug_id" )
WHERE
(
(
"monetary_pricelist"."expires" = FALSE
OR (
"monetary_pricelist"."expires" = TRUE
AND ( "monetary_pricelist"."datestart" AT TIME ZONE'UTC' ) :: DATE <= '2020-05-01'
AND ( "monetary_pricelist"."dateend" AT TIME ZONE'UTC' ) :: DATE >= '2020-05-01'
)
)
AND "monetary_pricelist"."active" = TRUE
AND "monetary_pricelistdestinations"."to_public" = TRUE
AND "organization_organization"."organization_type" = 2
AND (
"courier_carrierdelivery"."dst_country" = 'MX'
OR "courier_carrierdelivery"."ignore_country_filter" = TRUE
)
AND "monetary_pricelistcountry"."country" = 'MX'
AND "organization_organization"."active" = TRUE
AND "monetary_drugprice"."stock" > 0
AND "phdrug_phdrug"."active" = TRUE
AND "phdrug_phdrugpicture"."is_main" = TRUE
)
ORDER BY
"phdrug_phdrug"."id" ASC,
"phdrug_phdrug"."default_description" ASC
Full explain: https://pastebin.com/DaVztBuV
Troubleshooting at this level is difficult without seeing the database structure. I've had to write two different versions of the same script because the environments were different at the different sites.
Make sure the tables are properly indexed.
Make your subscript as part of the FROM statement and not the WHERE statement, unless its part of the IN clause.
Select *
from Table1 t1
left outer join (Select * from Table2) t2 on t1.field = t2.field
If its a large pull and/or large heavy used tables, then using temp tables will speed it as well. But it looks like your script is smaller and this is over kill.
We have an error in production, luckily have a manual solution for this, however I have to run below two queries every morning to fix the error. This is so manual, I want to automate this and combine two queries into one. However we only have this error in production and not in DEV or QA, if I mess up with the combining query that will end up a chaos, So I need your expertise.
1st Query brings project numbers
select id, ugenProjectNumber
from unifier_uxpecai
where (pecaiChecklistNumber = 0 or pecaiChecklistNumber is null)
or (pecaiChecklistItemNumber = 0 or pecaiChecklistItemNumber is null)
2nd Query fix broken links between action items and list items, I manually put 1st query results unique project numbers into second query and run the second query per each unique project numbers.
update unifier_uxpecai pai
set (pai.pecaiChecklistNumber, pai.pecaiChecklistItemNumber) =
(
select pcl.id, pcli.id
from unifier_uxpecl pcl
inner join unifier_uxpecl_lineitem pcli on pcli.uuu_tab_id = 0 and
pcli.record_id = pcl.id
where pcl.ugenProjectNumber = 'GL-16-161010-143502'
and pcli.pecItemActionItemBPC = pai.id
)
where exists
(
select pcli.pecItemActionItemBPC
from unifier_uxpecl pcl
inner join unifier_uxpecl_lineitem pcli on pcli.uuu_tab_id = 0 and
pcli.record_id = pcl.id
where pcl.ugenProjectNumber = 'GL-16-161010-143502'
and pcli.pecItemActionItemBPC = pai.id
)
and (pai.pecaiChecklistNumber = 0 or pai.pecaiChecklistItemNumber = 0)
You can incorporate the logic into the queries:
update unifier_uxpecai pai
set (pai.pecaiChecklistNumber, pai.pecaiChecklistItemNumber) =
(select pcl.id, pcli.id
from unifier_uxpecl pcl join
unifier_uxpecl_lineitem pcli
on pcli.uuu_tab_id = 0 and pcli.record_id = pcl.id
where pcl.ugenProjectNumber in (select ugenProjectNumber
from unifier_uxpecai
where (pecaiChecklistNumber = 0 or pecaiChecklistNumber is null) or
(pecaiChecklistItemNumber = 0 or pecaiChecklistItemNumber is null
) and
pcli.pecItemActionItemBPC = pai.id
)
where exists
(
select pcli.pecItemActionItemBPC
from unifier_uxpecl pcl join
unifier_uxpecl_lineitem pcli
on pcli.uuu_tab_id = 0 and
pcli.record_id = pcl.id
where pcl.ugenProjectNumber in (select ugenProjectNumber
from unifier_uxpecai
where (pecaiChecklistNumber = 0 or pecaiChecklistNumber is null) or
(pecaiChecklistItemNumber = 0 or pecaiChecklistItemNumber is null
) and
pcli.pecItemActionItemBPC = pai.id
) and
(pai.pecaiChecklistNumber = 0 or pai.pecaiChecklistItemNumber = 0)
I am trying to add a table adapter to one of my datasets. The dataset has oracle as its datasource. When I put in the query (which works elsewhere) I get this error. The query has left outer joins in it. If i go to query builder, it then adds stuff like { oj to the query. I knew there was an issue with previous visual studio versions, but I thought they were fixed. How can I get this to work.
SELECT PERS.PERS_ID,
PERS.PERS_LAST_NM,
PERS.PERS_FIRST_NM,
PERS.PERS_MIDDLE_INIT,
PERS.PERS_MIDDLE_INIT,
PERS.PERS_PREFERRED_NM,
PERS.PERS_EMPLOYEE_NO,
PERS.PERS_EMPLOYEE_TYP,
PERS.PERS_STAT,
PERS.PERS_LENT_CD AS PERS_ENTITY,
PERS.PERS_TYP,
PERS.PERS_MSTP_ID,
PERS.PERS_BLDG_ID,
PERS.PERS_TITLE_INIT,
PERS.PERS_PERS_ID,
PERS.PERS_COCC_ID,
PERS.PERS_IORG_CD,
PERS.PERS_CORPORATE_ID,
CONCAT('(', CONCAT(PERS.PERS_PHONE_AREA_CD, CONCAT(')',CONCAT(PERS.PERS_PHONE_EXCH_NO,CONCAT('-',PERS.PERS_PHONE_EXTN_NO))))) AS PERS_PHONE_NO,
UPPER(EMAIL.ELID_LONG_USERID) AS EMPL_EMAIL,
UPPER(USERID.ELID_USERID) AS EMPL_USERID
FROM CDAS.TDWHPERS PERS
LEFT OUTER JOIN (SELECT ELID_PERS_ID,
ELID_LONG_USERID
FROM CDAS.TDWHELID
WHERE ( ELID_EICT_CD = '0010' )
AND ( ELID_OPEN_IND = 'Y' )) EMAIL
ON PERS.PERS_ID = EMAIL.ELID_PERS_ID
LEFT OUTER JOIN (SELECT ELID_PERS_ID,
ELID_USERID
FROM CDAS.TDWHELID TDWHELID_1
WHERE ( ELID_EICT_CD = '9100' )
AND ( ELID_OPEN_IND = 'Y' )) USERID
ON PERS.PERS_ID = USERID.ELID_PERS_ID
WHERE ( PERS.PERS_STAT <> 'INACTIVE' )
AND ( PERS.PERS_STAT <> 'UNKNOWN' )
ORDER BY PERS.PERS_LAST_NM
This Code does not work:
SELECT
(
SELECT [T_Licence].[isInstalled]
FROM [T_Licence]
WHERE [T_Licence].[System] = [T_System].[ID]
AND [T_Licence].[Software] = 750
) AS [IsInstalled] ,*
FROM [T_System]
WHERE [IsInstalled] = 1
I have to do it this way, but this makes the whole code so complicated. I really dont want that:
SELECT
(
SELECT [T_Licence].[isInstalled]
FROM [wf_subj_all].[T_Licence]
WHERE [T_Licence].[System] = [T_System].[ID]
AND [T_Licence].[Software] = 750
) AS [IsInstalled] ,*
FROM [wf_subj_it].[T_System]
WHERE
(
SELECT
(
SELECT [T_Licence].[isInstalled]
FROM [wf_subj_all].[T_Licence]
WHERE [T_Licence].[System] = [T_System].[ID]
AND [T_Licence].[Software] = 750
)
) = 1
Is there any way to do it like shown in the first code snippet?
So that the code stays somehow readeble.
thx very much
Try this one -
SELECT *
FROM wf_subj_it.T_System s
CROSS APPLY (
SELECT /*TOP(1)*/ t.isInstalled
FROM wf_subj_all.T_Licence t
WHERE t.[System] = s.ID
AND t.Software = 750
) t
WHERE t.isInstalled = 1
Just wrap the query with an outer select and it should work.
SELECT *
FROM
(
SELECT
(
SELECT [T_Licence].[isInstalled]
FROM [T_Licence]
WHERE [T_Licence].[System] = [T_System].[ID]
AND [T_Licence].[Software] = 750
) AS [IsInstalled], *
FROM [T_System]
) As tbl1
WHERE [IsInstalled] = 1
The assumptions I made for my answer:
You're trying to select the systems (table T_System) which have software with id=750 installed
The table T_License contains the installed information
There is a 1:n relation between T_System and T_License: T_License may contain 0, 1 or more records per Sytem value...
but the combination System plus Software is unique
I think this will work
SELECT l.[isInstalled], s.*
FROM [wf_subj_it].[T_System] AS s
INNER JOIN [wf_subj_it].[T_License] AS l
ON l.[System] = s.[ID]
AND l.[Software] = 750
AND l.isInstalled = 1
I'm trying to figure out where I could add an index or modify an existing one to make the query go faster.
Main problem is that I cannot change the query itself, it's generated by Business Objects.
The strange thing is that if I change the WHERE clause to use another table, then the query is really fast!
The query is the following:
SELECT Year(listingperformanceindicator.date),
Month(listingperformanceindicator.date),
categoryflattened.parentname,
categoryflattened.groupname,
categoryflattened.categoryname,
Count(listingperformanceindicator.listingid),
( Count(listingperformanceindicator.listingid) ) / 30,
Sum(listingperformanceindicator.listingviews),
CASE
WHEN ( ( Count(listingperformanceindicator.listingid) ) / 30 ) = 0 THEN
0
ELSE ( Sum(listingperformanceindicator.listingviews) ) /
( ( Count(listingperformanceindicator.listingid) ) / 30 )
END
FROM listingperformanceindicator
RIGHT OUTER JOIN categorytolisting
ON ( categorytolisting.siteid = listingperformanceindicator.siteid
AND categorytolisting.listingid = listingperformanceindicator.listingid )
RIGHT OUTER JOIN categoryflattened
ON ( categoryflattened.siteid = categorytolisting.siteid
AND ( categoryflattened.parentid = categorytolisting.categoryid
OR categoryflattened.categoryid = categorytolisting.categoryid
OR categoryflattened.groupid = categorytolisting.categoryid ) )
RIGHT OUTER JOIN site ON (site.id=categoryflattened.siteId)
WHERE --listingperformanceindicator.siteid = 'DED29E78-17B0-423B-A1D1-67E2F3CA864D' --THIS IS REALLY FAST :/
site.id = 'DED29E78-17B0-423B-A1D1-67E2F3CA864D' --THIS IS REALLY SLOW! :(
AND categoryflattened.languagecode ='SE'
GROUP BY Year(listingperformanceindicator.date),
Month(listingperformanceindicator.date),
categoryflattened.parentname,
categoryflattened.groupname,
categoryflattened.categoryname
I've tried looking at the execution plan but it makes no sense to me :(
Here they are:
fast: http://imageshack.us/a/img211/5755/fastquery.png
slow: http://imageshack.us/a/img823/7227/slowquery.png
Any suggestion appreciated!
Thanks!