Mysql optimization for select query with IN() clause inside where clause (explain output given)

Mysql optimization for select query with IN() clause inside where clause (explain output given) - optimization

I have this query:-
SELECT SUM(DISTINCT( ttagrels.id_tag IN ( 1816, 2642, 1906, 1398,
2436, 2940, 1973, 2791, 1389 ) )) AS
key_1_total_matches,
IF(( od.id_od > 0 ), COUNT(DISTINCT( od.id_od )), 0) AS
tutor_popularity,
td.*,
u.*
FROM tutor_details AS td
JOIN users AS u
ON u.id_user = td.id_user
JOIN all_tag_relations AS ttagrels
ON td.id_tutor = ttagrels.id_tutor
LEFT JOIN learning_packs AS lp
ON ttagrels.id_lp = lp.id_lp
LEFT JOIN learning_packs_categories AS lpc
ON lpc.id_lp_cat = lp.id_lp_cat
LEFT JOIN learning_packs_categories AS lpcp
ON lpcp.id_lp_cat = lpc.id_parent
LEFT JOIN learning_pack_content AS lpct
ON ( lp.id_lp = lpct.id_lp )
LEFT JOIN webclasses AS wc
ON ttagrels.id_wc = wc.id_wc
LEFT JOIN learning_packs_categories AS wcc
ON wcc.id_lp_cat = wc.id_wp_cat
LEFT JOIN learning_packs_categories AS wccp
ON wccp.id_lp_cat = wcc.id_parent
LEFT JOIN order_details AS od
ON td.id_tutor = od.id_author
LEFT JOIN orders AS o
ON od.id_order = o.id_order
WHERE ( u.country = 'IE'
OR u.country IN ( 'INT' ) )
AND u.status = 1
AND CASE
WHEN ( lp.id_lp > 0 ) THEN lp.id_status = 1
AND lp.published = 1
AND lpcp.status = 1
AND ( lpcp.country_code = 'IE'
OR lpcp.country_code IN ( 'INT' )
)
ELSE 1
END
AND CASE
WHEN ( wc.id_wc > 0 ) THEN wc.wc_api_status = 1
AND wc.id_status = 1
AND wc.wc_type = 0
AND
wc.class_date > '2010-06-16 11:44:40'
AND wccp.status = 1
AND ( wccp.country_code = 'IE'
OR wccp.country_code IN ( 'INT' )
)
ELSE 1
END
AND CASE
WHEN ( od.id_od > 0 ) THEN od.id_author = td.id_tutor
AND o.order_status = 'paid'
AND CASE
WHEN ( od.id_wc > 0 ) THEN od.can_attend_class = 1
ELSE 1
END
ELSE 1
END
AND ( ttagrels.id_tag IN ( 1816, 2642, 1906, 1398,
2436, 2940, 1973, 2791, 1389 ) )
GROUP BY td.id_tutor
HAVING key_1_total_matches = 1
ORDER BY tutor_popularity DESC,
u.surname ASC,
u.name ASC
LIMIT 0, 20
The numbers inside the IN() are actually ids of another table called Tags which matched the search keywords entered by users. In this example the user has searched "class".
See the explain output of this query here:-
http://www.test.examvillage.com/Screenshot.png
The time taken by this query is 0.0536 sec
But, if the number of values in the ttagrels.id_tag in () increases (as user enters more search keywords), the execution time rises to around 1-5 seconds and more.For example, if user searched "class Available to tutors and students 3 times a day"
the execution time is 4.2226 sec. The explain query output for this query contains 2513 under rows.
There are a total of 6,152 records in the All_Tag_Relations table. Is any further optimization possible?

I don't know your databse, so I cannot give a definitive answer.
I notice in your query the Distinct keyword twice SUM(DISTINCT( and COUNT(DISTINCT(
This implies that your select statement returns several rows with the same id_tag or id_od. This means that your intermediate result before grouping contains to many rows. You must narrow down your where clauses to limit the rows.
Insert count(*) in your query, and observe the number of rows. If it is larger than expected, this explains your problem.

Related

Counting and grouping NULL and non NULL values with count results in separate columns

Really stumped on this one. I'm trying to figure out how i can get my SQL query shown below to do a count of null and not null aircraft ID's with a table of results that has two count columns, one for NULL aircraft IDs and another for Not NULL aircraft IDs and is grouped by operator, so it looks something like this:
SELECT DISTINCT org.organization "operator",
ah.aircraft_registration_country "country",
ah.aircraft_registration_region "region",
acl.aircraft_master_series "aircraft type",
ah.publish_date "publish date",
f.aircraft_id "aircraft_id"
FROM ((((("flights"."tracked_utilization" f
left join "pond_dataops_analysis"."latest_aircraft" a
ON ( a.aircraft_id = f.aircraft_id ))
left join fleets.aircraft_all_history_latest ah
ON ( ( ( ah.aircraft_id = f.aircraft_id )
AND ( Coalesce(f.actual_runway_departure_time_local,
actual_gate_departure_time_local,
published_gate_departure_time_local) >=
ah.start_event_date ) )
AND ( Coalesce(f.actual_runway_departure_time_local,
actual_gate_departure_time_local,
published_gate_departure_time_local) <
ah.end_event_date ) ))
left join fleets.organizations_latest org
ON ( org.organization_id = ah.operator_organization_id )))
left join fleets.aircraft_usage_history_latest ash
ON ( ( ( ( ash.aircraft_id = f.aircraft_id )
AND ( start_event_date >= ash.usage_start_date ) )
AND ( start_event_date < ash.usage_end_date ) )
AND ( aircraft_usage_classification = 'Primary' ) )
left join fleets.aircraft_configuration_history_latest accl
ON ash.aircraft_id = accl.aircraft_id
left join fleets.aircraft_configurations_latest acl
ON accl.aircraft_configuration_id = acl.aircraft_configuration_id
)
WHERE (((( f.flight_departure_date > ( "Now"() - interval '90' day ) ))))
Not sure how to do a 'count/group by' so that the query can show what i'm after.
Regards,
Mark

Something like this:
select
x, y, z,
sum( case when aircraft_id is null then 1 else 0 end ) as null_cnt,
sum( case when aircraft_id is null then 0 else 1 end ) as notnull_cnt
from
(inline subquery)
group by
x, y, z
FWIW, you don't need all those parentheses in your query, they are unnecessary and more confusing than helpful. They do have their place in some cases, especially when dealing with "OR" conditions, but for this query they are completely superfluous:
FROM
"flights"."tracked_utilization" f
left join "pond_dataops_analysis"."latest_aircraft" a
ON a.aircraft_id = f.aircraft_id
left join fleets.aircraft_all_history_latest ah
ON ah.aircraft_id = f.aircraft_id
AND Coalesce(f.actual_runway_departure_time_local, actual_gate_departure_time_local, published_gate_departure_time_local) >= ah.start_event_date
AND Coalesce(f.actual_runway_departure_time_local, actual_gate_departure_time_local, published_gate_departure_time_local) < ah.end_event_date
left join fleets.organizations_latest org
ON org.organization_id = ah.operator_organization_id
left join fleets.aircraft_usage_history_latest ash
ON ash.aircraft_id = f.aircraft_id
AND start_event_date >= ash.usage_start_date
AND start_event_date < ash.usage_end_date
AND aircraft_usage_classification = 'Primary'
left join fleets.aircraft_configuration_history_latest accl
ON ash.aircraft_id = accl.aircraft_id
left join fleets.aircraft_configurations_latest acl
ON accl.aircraft_configuration_id = acl.aircraft_configuration_id
WHERE
f.flight_departure_date > "Now"() - interval '90' day

Sum of resulting set of rows in SQL

I've got the following query:
SELECT DISTINCT CU.permit_id, CU.month, /*CU.year,*/ M.material_id, M.material_name, /*MC.chemical_id, C.chemical_name,
C.precursor_organic_compound, C.non_precursor_organic_compound,*/
/*MC.chemical_percentage,*/
POC_emissions =
CASE
WHEN (C.precursor_organic_compound = 'true')
THEN (CU.chemical_usage_lbs / CU.material_density) * M.VOC
ELSE 0
END,
NON_POC_emissions =
CASE
WHEN (C.non_precursor_organic_compound = 'true')
THEN CU.chemical_usage_lbs * (MC.chemical_percentage / 100)
ELSE 0
END
FROM material M
LEFT OUTER JOIN material_chemical MC ON MC.material_id = M.material_id
LEFT OUTER JOIN chemical_usage CU ON CU.material_id = MC.material_id
LEFT OUTER JOIN chemical C ON C.chemical_id = MC.chemical_id
WHERE (CU.month >=1 AND CU.month <= 2)
AND CU.year = 2013
AND M.material_id = 52
--AND CU.permit_id = 2118
--GROUP BY CU.permit_id, M.material_id, M.material_name, CU.month, MC.chemical_id, MC.chemical_id, C.chemical_name, C.precursor_organic_compound, C.non_precursor_organic_compound
--ORDER BY C.chemical_name ASC
Which returns:
But what I need is to return one row per month per material adding up the values of POC per month and NON_POC per month.
So, I should end up with something like:
Month material_id material_name POC NON_POC
1 52 Krylon... 0.107581 0.074108687
2 52 Krylon... 0.143437 0.0988125
I tried using SUM but it sums up the same result multiple times:
SELECT /*DISTINCT*/ CU.permit_id, CU.month, /*CU.year,*/ M.material_id, M.material_name, /*MC.chemical_id, C.chemical_name,
C.precursor_organic_compound, C.non_precursor_organic_compound,*/
--MC.chemical_percentage,
POC_emissions = SUM(
CASE
WHEN (C.precursor_organic_compound = 'true')
THEN (CU.chemical_usage_lbs / CU.material_density) * M.VOC
ELSE 0
END),
NON_POC_emissions = SUM(
CASE
WHEN (C.non_precursor_organic_compound = 'true')
THEN CU.chemical_usage_lbs * (MC.chemical_percentage / 100)
ELSE 0
END)
FROM material M
LEFT OUTER JOIN material_chemical MC ON MC.material_id = M.material_id
LEFT OUTER JOIN chemical_usage CU ON CU.material_id = MC.material_id
LEFT OUTER JOIN chemical C ON C.chemical_id = MC.chemical_id
WHERE M.material_id = 52
--AND CU.permit_id = 187
AND (CU.month >=1 AND CU.month <= 2)
AND CU.year = 2013
GROUP BY CU.permit_id, M.material_id, M.material_name, CU.month/*, CU.year, MC.chemical_id, C.chemical_name, C.precursor_organic_compound, C.non_precursor_organic_compound*/
--ORDER BY C.chemical_name ASC

The first query has a DISTINCT clause. What is the output without the DISTINCT clause. I suspect you have more rows than shows in your screenshot.
Regardless, you could try something like this to get the desired result.
select permit_id, month, material_id, material_name,
sum(poc_emissions), sum(non_poc_emissions)
from (
SELECT DISTINCT CU.permit_id, CU.month, M.material_id, M.material_name,
POC_emissions =
CASE
WHEN (C.precursor_organic_compound = 'true')
THEN (CU.chemical_usage_lbs / CU.material_density) * M.VOC
ELSE 0
END,
NON_POC_emissions =
CASE
WHEN (C.non_precursor_organic_compound = 'true')
THEN CU.chemical_usage_lbs * (MC.chemical_percentage / 100)
ELSE 0
END
FROM material M
LEFT OUTER JOIN material_chemical MC ON MC.material_id = M.material_id
LEFT OUTER JOIN chemical_usage CU ON CU.material_id = MC.material_id
LEFT OUTER JOIN chemical C ON C.chemical_id = MC.chemical_id
WHERE (CU.month >=1 AND CU.month <= 2)
AND CU.year = 2013
AND M.material_id = 52
) main
group by permit_id, month, material_id, material_name
Explanation
Since the results you retrieved by doing a DISTINCT was consider source-of-truth, I created an in-memory table by making it a sub-query. However, this subquery must have a name of some kind...whatever name. I gave it a name main. Subqueries look like this:
select ... from (sub-query) <give-it-a-table-name>
Simple Example:
select * from (select userid, username from user) user_temp
Advanced Example:
select * from (select userid, username from user) user_temp
inner join (select userid, sum(debits) as totaldebits from debittable) debit
on debit.userid = user_temp.userid
Notice how user_temp alias for the subquery can be used as if the sub-query was a real table.

Use above query in subquery and group by (month) and select sum(POC_emissions) and sum(NON_POC_emissions )

How can I optimize the SQL query?

I have a query an SQL query as follows, can anybody suggest any optimization for this; I think most of the effort is being done for the Union operation - is there anything else can be done to get the same result ?
Basically I wanna query first portion of the UNION and if for each record there is no result then the second portion need to be run. Please help.
:
SET dateformat dmy;
WITH incidentcategory
AS (
SELECT 1 ord, i.IncidentId, rl.Description Category FROM incident i
JOIN IncidentLikelihood l ON i.IncidentId = l.IncidentId
JOIN IncidentSeverity s ON i.IncidentId = s.IncidentId
JOIN LikelihoodSeverity ls ON l.LikelihoodId = ls.LikelihoodId AND s.SeverityId = ls.SeverityId
JOIN RiskLevel rl ON ls.RiskLevelId = rl.riskLevelId
UNION
SELECT 2 ord, i.incidentid,
rl.description Category
FROM incident i
JOIN incidentreportlikelihood l
ON i.incidentid = l.incidentid
JOIN incidentreportseverity s
ON i.incidentid = s.incidentid
JOIN likelihoodseverity ls
ON l.likelihoodid = ls.likelihoodid
AND s.severityid = ls.severityid
JOIN risklevel rl
ON ls.risklevelid = rl.risklevelid
) ,
ic AS (
SELECT ROW_NUMBER() OVER (PARTITION BY i.IncidentId ORDER BY (CASE WHEN incidentTime IS NULL THEN GETDATE() ELSE incidentTime END) DESC,ord ASC) rn,
i.incidentid,
dbo.Incidentdescription(i.incidentid, '',
'',
'', '')
IncidentDescription,
dbo.Dateconverttimezonecompanyid(closedtime,
i.companyid)
ClosedTime,
incidenttime,
incidentno,
Isnull(c.category, '')
Category,
opencorrectiveactions,
reportcompleted,
Isnull(classificationcompleted, 0)
ClassificationCompleted,
Cast (( CASE
WHEN closedtime IS NULL THEN 0
ELSE 1
END ) AS BIT)
IncidentClosed,
Cast (( CASE
WHEN investigatorfinishedtime IS NULL THEN 0
ELSE 1
END ) AS BIT)
InvestigationFinished,
Cast (( CASE
WHEN investigationcompletetime IS NULL THEN 0
ELSE 1
END ) AS BIT)
InvestigationComplete,
Cast (( CASE
WHEN investigatorassignedtime IS NULL THEN 0
ELSE 1
END ) AS BIT)
InvestigatorAssigned,
Cast (( CASE
WHEN (SELECT Count(*)
FROM incidentinvestigator
WHERE incidentid = i.incidentid
AND personid = 1588
AND tablename = 'AdminLevels') = 0
THEN 0
ELSE 1
END ) AS BIT)
IncidentInvestigator,
(SELECT dbo.Strconcat(osname)
FROM (SELECT TOP 10 osname
FROM incidentlocation l
JOIN organisationstructure o
ON l.locationid = o.osid
WHERE incidentid = i.incidentid
ORDER BY l.locorder) loc)
Location,
Isnull((SELECT TOP 1 teamleader
FROM incidentinvestigator
WHERE personid = 1588
AND tablename = 'AdminLevels'
AND incidentid = i.incidentid), 0)
TeamLeader,
incidentstatus,
incidentstatussearch
FROM incident i
LEFT OUTER JOIN incidentcategory c
ON i.incidentid = c.incidentid
WHERE i.isdeleted = 0
AND i.companyid = 158
AND incidentno <> 0
--AND reportcompleted = 1
--AND investigatorassignedtime IS NOT NULL
--AND investigatorfinishedtime IS NULL
--AND closedtime IS NULL
),
ic2 AS (
SELECT * FROM ic WHERE rn=1
)
SELECT * FROM ic2
--WHERE rownumber >= 0
-- AND rownumber < 0 + 10
--WHERE ic2.incidentid in(53327,53538)
--WHERE ic2.incidentid = 53338
ORDER BY incidentid DESC
Following is the execution plan I got:
https://www.dropbox.com/s/50dcpelr1ag4blp/Execution_Plan.sqlplan?dl=0

There are several issues:
1) use UNION ALL instead of UNION ALL to avoid the additional operation to aggregate the data.
2) try to modify the numerous function calls (e.g. dbo.Incidentdescription() ) to be an in-lie table valued function so you can reference it using CROSS APPLY or OUTER APPLY. Especially, if those functions referencing a table again.
3) move the subqueries from the SELECT part of the query to the FROM part using CROSS APPLY or OUTER APPLY again.
4) after the above is done, check the execution plan again for any missing indexes. Also, run the query with STATISTICS TIME, IO on to verify that the number of times a table
is referenced is correct (sometimes the execution plan put you in the wrong direction, especially if function calls are involved)...

Since the first inner query produces rows with ord=1 and the second produces rows with ord=2, you should use UNION ALL instead of UNION. UNION will filter out equal rows and since you will never get equal rows it is more efficient to use UNION ALL.
Also, rewrite your query to not use the WITH construct. I've had very bad experiences with this. Just use regular derived tables instead. In the case the query is still abnormally slow, try to serialize some derived tables to a temporary table and query the temporary table instead.

Try alternate approach by removing
(SELECT dbo.Strconcat(osname)
FROM (SELECT TOP 10 osname
FROM incidentlocation l
JOIN organisationstructure o
ON l.locationid = o.osid
WHERE incidentid = i.incidentid
ORDER BY l.locorder) loc)
Location,
Isnull((SELECT TOP 1 teamleader
FROM incidentinvestigator
WHERE personid = 1588
AND tablename = 'AdminLevels'
AND incidentid = i.incidentid), 0)
TeamLeader
from the SELECT. Avoid using complex functions/sub-queries in select.

(ORDER BY CASE WHEN) ordering by subquery

I need to order my results by int column ascending, but I want to get only rows with numbers (0...10000) but default ordering gives me rows with null values for this column before numbers. I googled solution which set rows with null into the end of ordering (after all numbers) it looks like
SELECT ProductName
FROM Products
ORDER BY
CASE WHEN Position is null THEN 1 ELSE 0 END,
Position
So I my query looks like:
SELECT c.CompanyId, c.CompanyName, c.CompanyCategoryId, cc.CompanyCategoryName, c.HQCountryISO, c.CrunchBaseUrl,c.AngelListUrl,
(SELECT MAX(mf.NumLikes) FROM MeasurementFacebook mf
JOIN FacebookAccount f ON f.CompanyId = c.CompanyId
WHERE f.FacebookAccountId in (mf.FacebookAccountId)) as Likes,
(SELECT MAX(mt.NumFollowers) FROM MeasurementTwitter mt
JOIN TwitterAccount t ON t.CompanyId = c.CompanyId
WHERE t.TwitterAccountId in (mt.TwitterAccountId)) as Followers,
(SELECT MAX(ma.AlexaRanking) FROM MeasurementAlexa ma
JOIN Website w ON w.CompanyId = c.CompanyId
WHERE w.WebsiteId in (ma.WebsiteId)) as AlexaRank
FROM Company c
JOIN CompanyCategory cc ON c.CompanyCategoryId = cc.CompanyCategoryId
WHERE c.HQCountryISO = 'FRA'
ORDER BY CASE WHEN AlexaRank IS NULL THEN 1 ELSE 0 END, AlexaRank
OFFSET 0 ROWS FETCH NEXT 10 ROWS ONLY
As you can see, AlexaRank is the result of third subquery, and I want to order result by this column. But I have an error which says:
Msg 207, Level 16, State 1, Line 14
Invalid column name 'AlexaRank'.
What I'm doing wrong? Thanks

While you can use an alias in the ORDER BY clause, you can't use an alias in an expression, easiest solution is to plop it in a cte/subquery:
;WITH cte AS (SELECT c.CompanyId
, c.CompanyName
, c.CompanyCategoryId
, cc.CompanyCategoryName
, c.HQCountryISO
, c.CrunchBaseUrl
,c.AngelListUrl
,(SELECT MAX(mf.NumLikes)
FROM MeasurementFacebook mf
JOIN FacebookAccount f ON f.CompanyId = c.CompanyId
WHERE f.FacebookAccountId in (mf.FacebookAccountId)) as Likes
,(SELECT MAX(mt.NumFollowers)
FROM MeasurementTwitter mt
JOIN TwitterAccount t ON t.CompanyId = c.CompanyId
WHERE t.TwitterAccountId in (mt.TwitterAccountId)) as Followers
,(SELECT MAX(ma.AlexaRanking)
FROM MeasurementAlexa ma
JOIN Website w ON w.CompanyId = c.CompanyId
WHERE w.WebsiteId in (ma.WebsiteId)) as AlexaRank
FROM Company c
JOIN CompanyCategory cc ON c.CompanyCategoryId = cc.CompanyCategoryId
WHERE c.HQCountryISO = 'FRA')
SELECT *
FROM cte
ORDER BY CASE WHEN AlexaRank IS NULL THEN 1 ELSE 0 END, AlexaRank
OFFSET 0 ROWS FETCH NEXT 10 ROWS ONLY

Very inefficient code but you could do something like the following. Basically wrap your initial query in a common table expression so you don't need to rewrite your 3rd sub-select in your order by.
SELECT * FROM (
SELECT c.companyid,
c.companyname,
c.companycategoryid,
cc.companycategoryname,
c.hqcountryiso,
c.crunchbaseurl,
c.angellisturl,
(SELECT Max(mf.numlikes)
FROM measurementfacebook mf
JOIN facebookaccount f
ON f.companyid = c.companyid
WHERE f.facebookaccountid IN ( mf.facebookaccountid )) AS Likes,
(SELECT Max(mt.numfollowers)
FROM measurementtwitter mt
JOIN twitteraccount t
ON t.companyid = c.companyid
WHERE t.twitteraccountid IN ( mt.twitteraccountid )) AS Followers,
(SELECT Max(ma.alexaranking)
FROM measurementalexa ma
JOIN website w
ON w.companyid = c.companyid
WHERE w.websiteid IN ( ma.websiteid )) AS AlexaRank
FROM company c
JOIN companycategory cc
ON c.companycategoryid = cc.companycategoryid
WHERE c.hqcountryiso = 'FRA' ) Q
ORDER BY CASE
WHEN Q.AlexaRank IS NULL THEN 1
ELSE 0
END,
Q.AlexaRank

Query for logistic regression, multiple where exists

A logistic regression is a composed of a uniquely identifying number, followed by multiple binary variables (always 1 or 0) based on whether or not a person meets certain criteria. Below I have a query that lists several of these binary conditions. With only four such criteria the query takes a little longer to run than what I would think. Is there a more efficient approach than below? Note. tblicd is a large table lookup table with text representations of 15k+ rows. The query makes no real sense, just a proof of concept. I have the proper indexes on my composite keys.
select patient.patientid
,case when exists
(
select c.patientid from tblclaims as c
inner join patient as p on p.patientid=c.patientid
and c.admissiondate = p.admissiondate
and c.dischargedate = p.dischargedate
where patient.patientid = p.patientid
group by c.patientid
having count(*) > 1000
)
then '1' else '0'
end as moreThan1000
,case when exists
(
select c.patientid from tblclaims as c
inner join patient as p on p.patientid=c.patientid
and c.admissiondate = p.admissiondate
and c.dischargedate = p.dischargedate
where patient.patientid = p.patientid
group by c.patientid
having count(*) > 1500
)
then '1' else '0'
end as moreThan1500
,case when exists
(
select distinct picd.patientid from patienticd as picd
inner join patient as p on p.patientid= picd.patientid
and picd.admissiondate = p.admissiondate
and picd.dischargedate = p.dischargedate
inner join tblicd as t on t.icd_id = picd.icd_id
where t.descrip like '%diabetes%' and patient.patientid = picd.patientid
)
then '1' else '0'
end as diabetes
,case when exists
(
select r.patientid, count(*) from patient as r
where r.patientid = patient.patientid
group by r.patientid
having count(*) >1
)
then '1' else '0'
end
from patient
order by moreThan1000 desc

I would start by using subqueries in the from clause:
select q.patientid, moreThan1000, moreThan1500,
(case when d.patientid is not null then 1 else 0 end),
(case when pc.patientid is not null then 1 else 0 end)
from patient p left outer join
(select c.patientid,
(case when count(*) > 1000 then 1 else 0 end) as moreThan1000,
(case when count(*) > 1500 then 1 else 0 end) as moreThan1500
from tblclaims as c inner join
patient as p
on p.patientid=c.patientid and
c.admissiondate = p.admissiondate and
c.dischargedate = p.dischargedate
group by c.patientid
) q
on p.patientid = q.patientid left outer join
(select distinct picd.patientid
from patienticd as picd inner join
patient as p
on p.patientid= picd.patientid and
picd.admissiondate = p.admissiondate and
picd.dischargedate = p.dischargedate inner join
tblicd as t
on t.icd_id = picd.icd_id
where t.descrip like '%diabetes%'
) d
on p.patientid = d.patientid left outer join
(select r.patientid, count(*) as cnt
from patient as r
group by r.patientid
having count(*) >1
) pc
on p.patientid = pc.patientid
order by 2 desc
You can then probably simplify these subqueries more by combining them (for instance "p" and "pc" on the outer query can be combined into one). However, without the correlated subqueries, SQL Server should find it easier to optimize the queries.

Example of left joins as requested...
SELECT
patientid,
ISNULL(CondA.ConditionA,0) as IsConditionA,
ISNULL(CondB.ConditionB,0) as IsConditionB,
....
FROM
patient
LEFT JOIN
(SELECT DISTINCT patientid, 1 as ConditionA from ... where ... ) CondA
ON patient.patientid = CondA.patientID
LEFT JOIN
(SELECT DISTINCT patientid, 1 as ConditionB from ... where ... ) CondB
ON patient.patientid = CondB.patientID
If your Condition queries only return a maximum one row, you can simplify them down to
(SELECT patientid, 1 as ConditionA from ... where ... ) CondA

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

Mysql optimization for select query with IN() clause inside where clause (explain output given) - optimization

Related

Counting and grouping NULL and non NULL values with count results in separate columns

Sum of resulting set of rows in SQL

How can I optimize the SQL query?

(ORDER BY CASE WHEN) ordering by subquery

Query for logistic regression, multiple where exists

Categories

Resources