Counting and grouping NULL and non NULL values with count results in separate columns - sql

Really stumped on this one. I'm trying to figure out how i can get my SQL query shown below to do a count of null and not null aircraft ID's with a table of results that has two count columns, one for NULL aircraft IDs and another for Not NULL aircraft IDs and is grouped by operator, so it looks something like this:
SELECT DISTINCT org.organization "operator",
ah.aircraft_registration_country "country",
ah.aircraft_registration_region "region",
acl.aircraft_master_series "aircraft type",
ah.publish_date "publish date",
f.aircraft_id "aircraft_id"
FROM ((((("flights"."tracked_utilization" f
left join "pond_dataops_analysis"."latest_aircraft" a
ON ( a.aircraft_id = f.aircraft_id ))
left join fleets.aircraft_all_history_latest ah
ON ( ( ( ah.aircraft_id = f.aircraft_id )
AND ( Coalesce(f.actual_runway_departure_time_local,
actual_gate_departure_time_local,
published_gate_departure_time_local) >=
ah.start_event_date ) )
AND ( Coalesce(f.actual_runway_departure_time_local,
actual_gate_departure_time_local,
published_gate_departure_time_local) <
ah.end_event_date ) ))
left join fleets.organizations_latest org
ON ( org.organization_id = ah.operator_organization_id )))
left join fleets.aircraft_usage_history_latest ash
ON ( ( ( ( ash.aircraft_id = f.aircraft_id )
AND ( start_event_date >= ash.usage_start_date ) )
AND ( start_event_date < ash.usage_end_date ) )
AND ( aircraft_usage_classification = 'Primary' ) )
left join fleets.aircraft_configuration_history_latest accl
ON ash.aircraft_id = accl.aircraft_id
left join fleets.aircraft_configurations_latest acl
ON accl.aircraft_configuration_id = acl.aircraft_configuration_id
)
WHERE (((( f.flight_departure_date > ( "Now"() - interval '90' day ) ))))
Not sure how to do a 'count/group by' so that the query can show what i'm after.
Regards,
Mark

Something like this:
select
x, y, z,
sum( case when aircraft_id is null then 1 else 0 end ) as null_cnt,
sum( case when aircraft_id is null then 0 else 1 end ) as notnull_cnt
from
(inline subquery)
group by
x, y, z
FWIW, you don't need all those parentheses in your query, they are unnecessary and more confusing than helpful. They do have their place in some cases, especially when dealing with "OR" conditions, but for this query they are completely superfluous:
FROM
"flights"."tracked_utilization" f
left join "pond_dataops_analysis"."latest_aircraft" a
ON a.aircraft_id = f.aircraft_id
left join fleets.aircraft_all_history_latest ah
ON ah.aircraft_id = f.aircraft_id
AND Coalesce(f.actual_runway_departure_time_local, actual_gate_departure_time_local, published_gate_departure_time_local) >= ah.start_event_date
AND Coalesce(f.actual_runway_departure_time_local, actual_gate_departure_time_local, published_gate_departure_time_local) < ah.end_event_date
left join fleets.organizations_latest org
ON org.organization_id = ah.operator_organization_id
left join fleets.aircraft_usage_history_latest ash
ON ash.aircraft_id = f.aircraft_id
AND start_event_date >= ash.usage_start_date
AND start_event_date < ash.usage_end_date
AND aircraft_usage_classification = 'Primary'
left join fleets.aircraft_configuration_history_latest accl
ON ash.aircraft_id = accl.aircraft_id
left join fleets.aircraft_configurations_latest acl
ON accl.aircraft_configuration_id = acl.aircraft_configuration_id
WHERE
f.flight_departure_date > "Now"() - interval '90' day

Related

How to calculate the z score after joining 3 tables in MySQL

I have joined three tables A, B, D using this query,
SELECT [A].ID, [A].Surname, [A].[Given Name], [B].[Pre-U Grade], [D ].[Total Score], [B].[score]
FROM ([A] LEFT JOIN [D] ON [A].ID = [D].[Student ID]) INNER JOIN [B-Results] ON [A].ID = [B].ID
WHERE ((([B].[Pre-U Grade])=IsNumeric([B]![Pre-U Grade])) AND (([D].[Total Score]) Is Not Null) AND (([A].Status) Not In ("REJECTED","OFFERED","WITHDRAWN"))) OR ((([B].[Pre-U Grade])>"0") AND (([D].[Total Score]) Is Not Null) AND (([A].Status) Not In ("REJECTED","OFFERED","WITHDRAWN")))
ORDER BY [D].[Date] DESC;
After joining the tables, the z-score for the 3 numerical columns needs to be calculated.
I came across this example
Calculating Z-Score for each row in MySQL? (simple)
but i didnt know how to use the code given for my problem statement. Can someone kindly help me with this?
SELECT
(pre-u_grade - AVG(pre-u_grade))/STD(pre-u_grade) z_pre-u_grade,
(total_score- AVG(total_score))/STD(total_score) z_total_score,
(score- AVG(score))/STD(score) z_score,
(SELECT
a.id,
a.surname,
a.given_name,
pre-u_grade,
total_score,
score
FROM
a
LEFT JOIN
d
ON
a.id = d.student id)
INNER JOIN
b.results
ON
a.id = b.id
WHERE
(
( b.pre-u_grade = ISNUMERIC(b ! pre-u_grade)
AND d.total score IS NOT NULL
AND a.status NOT IN ( "rejected", "offered", "withdrawn) )
OR
( b.pre-u_grade > 0
AND d.total score ) IS NOT NULL
AND a.status NOT IN ( "rejected", "offered", "withdrawn" ) )
)
ORDER BY
d.date DESC) result;
Try this.

Select sql statement for Mysql

I have this bit of sql
SELECT pd.id, pd.title, pd.fname, pd.lname, pd.DOB, pd.phone_home, pd.phone_biz, pd.phone_contact, pd.phone_cell, pd.sex, pd.email, pd.pid, f.form_id, ldf1.field_value AS autoleaxis, ldf2.field_value AS autolecyl
FROM patient_data pd
LEFT JOIN forms f ON pd.pid = f.pid
LEFT JOIN `lbf_data` ldf1 ON ldf1.form_id = f.form_id
LEFT JOIN `lbf_data` ldf2 ON ldf1.form_id = ldf2.form_id
WHERE (
f.form_name = 'Opthalmology'
AND ldf1.field_id = 'autoleaxis'
AND ldf2.field_id = 'autolecyl'
)
OR (
f.form_id IS NULL
AND ldf1.field_id IS NULL
AND ldf2.field_id IS NULL
)
ORDER BY pd.pid
Which is for a MySQL Database. It works ok except it is returning more than one record per person in some cases as they have more than one form_id. How do I restrict it so that it only returns the records for highest form_id that the person has or form_id set to null if there isn't one. Thanks
You can use subquery in LEFT JOIN, as with:
SELECT *
FROM patient_data pd
LEFT JOIN
(
select max(form_id) as form_id, pid, form_name
from forms as f2
) as f
ON pd.pid = f.pid
LEFT JOIN `lbf_data` ldf1 ON ldf1.form_id = f.form_id
LEFT JOIN `lbf_data` ldf2 ON ldf1.form_id = ldf2.form_id
WHERE (
f.form_name = 'Opthalmology'
AND ldf1.field_id = 'autoleaxis'
AND ldf2.field_id = 'autolecyl'
)
OR (
f.form_id IS NULL
AND ldf1.field_id IS NULL
AND ldf2.field_id IS NULL
)
ORDER BY pd.pid
You can test in here

Sum of resulting set of rows in SQL

I've got the following query:
SELECT DISTINCT CU.permit_id, CU.month, /*CU.year,*/ M.material_id, M.material_name, /*MC.chemical_id, C.chemical_name,
C.precursor_organic_compound, C.non_precursor_organic_compound,*/
/*MC.chemical_percentage,*/
POC_emissions =
CASE
WHEN (C.precursor_organic_compound = 'true')
THEN (CU.chemical_usage_lbs / CU.material_density) * M.VOC
ELSE 0
END,
NON_POC_emissions =
CASE
WHEN (C.non_precursor_organic_compound = 'true')
THEN CU.chemical_usage_lbs * (MC.chemical_percentage / 100)
ELSE 0
END
FROM material M
LEFT OUTER JOIN material_chemical MC ON MC.material_id = M.material_id
LEFT OUTER JOIN chemical_usage CU ON CU.material_id = MC.material_id
LEFT OUTER JOIN chemical C ON C.chemical_id = MC.chemical_id
WHERE (CU.month >=1 AND CU.month <= 2)
AND CU.year = 2013
AND M.material_id = 52
--AND CU.permit_id = 2118
--GROUP BY CU.permit_id, M.material_id, M.material_name, CU.month, MC.chemical_id, MC.chemical_id, C.chemical_name, C.precursor_organic_compound, C.non_precursor_organic_compound
--ORDER BY C.chemical_name ASC
Which returns:
But what I need is to return one row per month per material adding up the values of POC per month and NON_POC per month.
So, I should end up with something like:
Month material_id material_name POC NON_POC
1 52 Krylon... 0.107581 0.074108687
2 52 Krylon... 0.143437 0.0988125
I tried using SUM but it sums up the same result multiple times:
SELECT /*DISTINCT*/ CU.permit_id, CU.month, /*CU.year,*/ M.material_id, M.material_name, /*MC.chemical_id, C.chemical_name,
C.precursor_organic_compound, C.non_precursor_organic_compound,*/
--MC.chemical_percentage,
POC_emissions = SUM(
CASE
WHEN (C.precursor_organic_compound = 'true')
THEN (CU.chemical_usage_lbs / CU.material_density) * M.VOC
ELSE 0
END),
NON_POC_emissions = SUM(
CASE
WHEN (C.non_precursor_organic_compound = 'true')
THEN CU.chemical_usage_lbs * (MC.chemical_percentage / 100)
ELSE 0
END)
FROM material M
LEFT OUTER JOIN material_chemical MC ON MC.material_id = M.material_id
LEFT OUTER JOIN chemical_usage CU ON CU.material_id = MC.material_id
LEFT OUTER JOIN chemical C ON C.chemical_id = MC.chemical_id
WHERE M.material_id = 52
--AND CU.permit_id = 187
AND (CU.month >=1 AND CU.month <= 2)
AND CU.year = 2013
GROUP BY CU.permit_id, M.material_id, M.material_name, CU.month/*, CU.year, MC.chemical_id, C.chemical_name, C.precursor_organic_compound, C.non_precursor_organic_compound*/
--ORDER BY C.chemical_name ASC
The first query has a DISTINCT clause. What is the output without the DISTINCT clause. I suspect you have more rows than shows in your screenshot.
Regardless, you could try something like this to get the desired result.
select permit_id, month, material_id, material_name,
sum(poc_emissions), sum(non_poc_emissions)
from (
SELECT DISTINCT CU.permit_id, CU.month, M.material_id, M.material_name,
POC_emissions =
CASE
WHEN (C.precursor_organic_compound = 'true')
THEN (CU.chemical_usage_lbs / CU.material_density) * M.VOC
ELSE 0
END,
NON_POC_emissions =
CASE
WHEN (C.non_precursor_organic_compound = 'true')
THEN CU.chemical_usage_lbs * (MC.chemical_percentage / 100)
ELSE 0
END
FROM material M
LEFT OUTER JOIN material_chemical MC ON MC.material_id = M.material_id
LEFT OUTER JOIN chemical_usage CU ON CU.material_id = MC.material_id
LEFT OUTER JOIN chemical C ON C.chemical_id = MC.chemical_id
WHERE (CU.month >=1 AND CU.month <= 2)
AND CU.year = 2013
AND M.material_id = 52
) main
group by permit_id, month, material_id, material_name
Explanation
Since the results you retrieved by doing a DISTINCT was consider source-of-truth, I created an in-memory table by making it a sub-query. However, this subquery must have a name of some kind...whatever name. I gave it a name main. Subqueries look like this:
select ... from (sub-query) <give-it-a-table-name>
Simple Example:
select * from (select userid, username from user) user_temp
Advanced Example:
select * from (select userid, username from user) user_temp
inner join (select userid, sum(debits) as totaldebits from debittable) debit
on debit.userid = user_temp.userid
Notice how user_temp alias for the subquery can be used as if the sub-query was a real table.
Use above query in subquery and group by (month) and select sum(POC_emissions) and sum(NON_POC_emissions )

Query for logistic regression, multiple where exists

A logistic regression is a composed of a uniquely identifying number, followed by multiple binary variables (always 1 or 0) based on whether or not a person meets certain criteria. Below I have a query that lists several of these binary conditions. With only four such criteria the query takes a little longer to run than what I would think. Is there a more efficient approach than below? Note. tblicd is a large table lookup table with text representations of 15k+ rows. The query makes no real sense, just a proof of concept. I have the proper indexes on my composite keys.
select patient.patientid
,case when exists
(
select c.patientid from tblclaims as c
inner join patient as p on p.patientid=c.patientid
and c.admissiondate = p.admissiondate
and c.dischargedate = p.dischargedate
where patient.patientid = p.patientid
group by c.patientid
having count(*) > 1000
)
then '1' else '0'
end as moreThan1000
,case when exists
(
select c.patientid from tblclaims as c
inner join patient as p on p.patientid=c.patientid
and c.admissiondate = p.admissiondate
and c.dischargedate = p.dischargedate
where patient.patientid = p.patientid
group by c.patientid
having count(*) > 1500
)
then '1' else '0'
end as moreThan1500
,case when exists
(
select distinct picd.patientid from patienticd as picd
inner join patient as p on p.patientid= picd.patientid
and picd.admissiondate = p.admissiondate
and picd.dischargedate = p.dischargedate
inner join tblicd as t on t.icd_id = picd.icd_id
where t.descrip like '%diabetes%' and patient.patientid = picd.patientid
)
then '1' else '0'
end as diabetes
,case when exists
(
select r.patientid, count(*) from patient as r
where r.patientid = patient.patientid
group by r.patientid
having count(*) >1
)
then '1' else '0'
end
from patient
order by moreThan1000 desc
I would start by using subqueries in the from clause:
select q.patientid, moreThan1000, moreThan1500,
(case when d.patientid is not null then 1 else 0 end),
(case when pc.patientid is not null then 1 else 0 end)
from patient p left outer join
(select c.patientid,
(case when count(*) > 1000 then 1 else 0 end) as moreThan1000,
(case when count(*) > 1500 then 1 else 0 end) as moreThan1500
from tblclaims as c inner join
patient as p
on p.patientid=c.patientid and
c.admissiondate = p.admissiondate and
c.dischargedate = p.dischargedate
group by c.patientid
) q
on p.patientid = q.patientid left outer join
(select distinct picd.patientid
from patienticd as picd inner join
patient as p
on p.patientid= picd.patientid and
picd.admissiondate = p.admissiondate and
picd.dischargedate = p.dischargedate inner join
tblicd as t
on t.icd_id = picd.icd_id
where t.descrip like '%diabetes%'
) d
on p.patientid = d.patientid left outer join
(select r.patientid, count(*) as cnt
from patient as r
group by r.patientid
having count(*) >1
) pc
on p.patientid = pc.patientid
order by 2 desc
You can then probably simplify these subqueries more by combining them (for instance "p" and "pc" on the outer query can be combined into one). However, without the correlated subqueries, SQL Server should find it easier to optimize the queries.
Example of left joins as requested...
SELECT
patientid,
ISNULL(CondA.ConditionA,0) as IsConditionA,
ISNULL(CondB.ConditionB,0) as IsConditionB,
....
FROM
patient
LEFT JOIN
(SELECT DISTINCT patientid, 1 as ConditionA from ... where ... ) CondA
ON patient.patientid = CondA.patientID
LEFT JOIN
(SELECT DISTINCT patientid, 1 as ConditionB from ... where ... ) CondB
ON patient.patientid = CondB.patientID
If your Condition queries only return a maximum one row, you can simplify them down to
(SELECT patientid, 1 as ConditionA from ... where ... ) CondA

Mysql optimization for select query with IN() clause inside where clause (explain output given)

I have this query:-
SELECT SUM(DISTINCT( ttagrels.id_tag IN ( 1816, 2642, 1906, 1398,
2436, 2940, 1973, 2791, 1389 ) )) AS
key_1_total_matches,
IF(( od.id_od > 0 ), COUNT(DISTINCT( od.id_od )), 0) AS
tutor_popularity,
td.*,
u.*
FROM tutor_details AS td
JOIN users AS u
ON u.id_user = td.id_user
JOIN all_tag_relations AS ttagrels
ON td.id_tutor = ttagrels.id_tutor
LEFT JOIN learning_packs AS lp
ON ttagrels.id_lp = lp.id_lp
LEFT JOIN learning_packs_categories AS lpc
ON lpc.id_lp_cat = lp.id_lp_cat
LEFT JOIN learning_packs_categories AS lpcp
ON lpcp.id_lp_cat = lpc.id_parent
LEFT JOIN learning_pack_content AS lpct
ON ( lp.id_lp = lpct.id_lp )
LEFT JOIN webclasses AS wc
ON ttagrels.id_wc = wc.id_wc
LEFT JOIN learning_packs_categories AS wcc
ON wcc.id_lp_cat = wc.id_wp_cat
LEFT JOIN learning_packs_categories AS wccp
ON wccp.id_lp_cat = wcc.id_parent
LEFT JOIN order_details AS od
ON td.id_tutor = od.id_author
LEFT JOIN orders AS o
ON od.id_order = o.id_order
WHERE ( u.country = 'IE'
OR u.country IN ( 'INT' ) )
AND u.status = 1
AND CASE
WHEN ( lp.id_lp > 0 ) THEN lp.id_status = 1
AND lp.published = 1
AND lpcp.status = 1
AND ( lpcp.country_code = 'IE'
OR lpcp.country_code IN ( 'INT' )
)
ELSE 1
END
AND CASE
WHEN ( wc.id_wc > 0 ) THEN wc.wc_api_status = 1
AND wc.id_status = 1
AND wc.wc_type = 0
AND
wc.class_date > '2010-06-16 11:44:40'
AND wccp.status = 1
AND ( wccp.country_code = 'IE'
OR wccp.country_code IN ( 'INT' )
)
ELSE 1
END
AND CASE
WHEN ( od.id_od > 0 ) THEN od.id_author = td.id_tutor
AND o.order_status = 'paid'
AND CASE
WHEN ( od.id_wc > 0 ) THEN od.can_attend_class = 1
ELSE 1
END
ELSE 1
END
AND ( ttagrels.id_tag IN ( 1816, 2642, 1906, 1398,
2436, 2940, 1973, 2791, 1389 ) )
GROUP BY td.id_tutor
HAVING key_1_total_matches = 1
ORDER BY tutor_popularity DESC,
u.surname ASC,
u.name ASC
LIMIT 0, 20
The numbers inside the IN() are actually ids of another table called Tags which matched the search keywords entered by users. In this example the user has searched "class".
See the explain output of this query here:-
http://www.test.examvillage.com/Screenshot.png
The time taken by this query is 0.0536 sec
But, if the number of values in the ttagrels.id_tag in () increases (as user enters more search keywords), the execution time rises to around 1-5 seconds and more.For example, if user searched "class Available to tutors and students 3 times a day"
the execution time is 4.2226 sec. The explain query output for this query contains 2513 under rows.
There are a total of 6,152 records in the All_Tag_Relations table. Is any further optimization possible?
I don't know your databse, so I cannot give a definitive answer.
I notice in your query the Distinct keyword twice SUM(DISTINCT( and COUNT(DISTINCT(
This implies that your select statement returns several rows with the same id_tag or id_od. This means that your intermediate result before grouping contains to many rows. You must narrow down your where clauses to limit the rows.
Insert count(*) in your query, and observe the number of rows. If it is larger than expected, this explains your problem.