How to calculate the z-score after joining 3 tables in MySQL

I have joined three tables, A, B and D, using this query:
SELECT [A].ID, [A].Surname, [A].[Given Name], [B].[Pre-U Grade], [D].[Total Score], [B].[score]
FROM ([A] LEFT JOIN [D] ON [A].ID = [D].[Student ID]) INNER JOIN [B-Results] ON [A].ID = [B].ID
WHERE ((([B].[Pre-U Grade])=IsNumeric([B]![Pre-U Grade])) AND (([D].[Total Score]) Is Not Null) AND (([A].Status) Not In ("REJECTED","OFFERED","WITHDRAWN"))) OR ((([B].[Pre-U Grade])>"0") AND (([D].[Total Score]) Is Not Null) AND (([A].Status) Not In ("REJECTED","OFFERED","WITHDRAWN")))
ORDER BY [D].[Date] DESC;
After joining the tables, I need to calculate the z-score for each of the 3 numerical columns.
I came across this example:
Calculating Z-Score for each row in MySQL? (simple)
but I didn't know how to apply the code given there to my problem. Can someone kindly help me with this?

SELECT
result.*,
-- AVG() and STD() are used as window functions so that every row is kept
(pre_u_grade - AVG(pre_u_grade) OVER ()) / STD(pre_u_grade) OVER () AS z_pre_u_grade,
(total_score - AVG(total_score) OVER ()) / STD(total_score) OVER () AS z_total_score,
(score - AVG(score) OVER ()) / STD(score) OVER () AS z_score
FROM
(SELECT
a.id,
a.surname,
a.given_name,
b.pre_u_grade,
d.total_score,
b.score
FROM
a
LEFT JOIN
d
ON
a.id = d.student_id
INNER JOIN
b
ON
a.id = b.id
WHERE
-- ISNUMERIC() does not exist in MySQL, so the numeric check is reduced to "> 0" here
b.pre_u_grade > 0
AND d.total_score IS NOT NULL
AND a.status NOT IN ('REJECTED', 'OFFERED', 'WITHDRAWN')
) result;
Try this. It assumes MySQL 8.0 or later (for the OVER () clauses) and that the columns have been renamed with underscores, since names such as pre-u_grade or total score would otherwise need backtick quoting.
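To sanity-check what such a query should return, here is a minimal Python sketch using the built-in sqlite3 and statistics modules. SQLite has no STD() function, so the z-scores are computed in Python with statistics.pstdev, the population standard deviation, which is what MySQL's STD() returns. The table layout, column names and sample rows are invented for illustration and are not taken from the question.

```python
import sqlite3
from statistics import mean, pstdev

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a (id INTEGER, surname TEXT, status TEXT);
CREATE TABLE b (id INTEGER, pre_u_grade REAL, score REAL);
CREATE TABLE d (student_id INTEGER, total_score REAL);
INSERT INTO a VALUES (1,'Lee','PENDING'),(2,'Tan','PENDING'),
                     (3,'Ng','PENDING'),(4,'Lim','REJECTED');
INSERT INTO b VALUES (1,2,10),(2,4,20),(3,6,30),(4,8,40);
INSERT INTO d VALUES (1,50),(2,60),(3,70),(4,80);
""")

# Same join and filters as the answer, on hypothetical underscore column names.
rows = conn.execute("""
SELECT a.id, b.pre_u_grade, d.total_score, b.score
FROM a
LEFT JOIN d ON a.id = d.student_id
INNER JOIN b ON a.id = b.id
WHERE b.pre_u_grade > 0
  AND d.total_score IS NOT NULL
  AND a.status NOT IN ('REJECTED', 'OFFERED', 'WITHDRAWN')
ORDER BY a.id
""").fetchall()

def z_scores(values):
    # (x - mean) / population standard deviation; fails if all values are equal
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

z_grade = z_scores([r[1] for r in rows])
z_total = z_scores([r[2] for r in rows])
z_score = z_scores([r[3] for r in rows])

for r, zg, zt, zs in zip(rows, z_grade, z_total, z_score):
    print(r[0], round(zg, 3), round(zt, 3), round(zs, 3))
```

The rejected student (id 4) is filtered out before the mean and standard deviation are taken, so the z-scores describe only the rows the query returns.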

Related

Counting and grouping NULL and non NULL values with count results in separate columns

Really stumped on this one. I'm trying to figure out how I can get the SQL query shown below to count NULL and non-NULL aircraft IDs, producing a result table with two count columns (one for NULL aircraft IDs and another for non-NULL aircraft IDs) grouped by operator. The current query is:
SELECT DISTINCT org.organization "operator",
ah.aircraft_registration_country "country",
ah.aircraft_registration_region "region",
acl.aircraft_master_series "aircraft type",
ah.publish_date "publish date",
f.aircraft_id "aircraft_id"
FROM ((((("flights"."tracked_utilization" f
left join "pond_dataops_analysis"."latest_aircraft" a
ON ( a.aircraft_id = f.aircraft_id ))
left join fleets.aircraft_all_history_latest ah
ON ( ( ( ah.aircraft_id = f.aircraft_id )
AND ( Coalesce(f.actual_runway_departure_time_local,
actual_gate_departure_time_local,
published_gate_departure_time_local) >=
ah.start_event_date ) )
AND ( Coalesce(f.actual_runway_departure_time_local,
actual_gate_departure_time_local,
published_gate_departure_time_local) <
ah.end_event_date ) ))
left join fleets.organizations_latest org
ON ( org.organization_id = ah.operator_organization_id )))
left join fleets.aircraft_usage_history_latest ash
ON ( ( ( ( ash.aircraft_id = f.aircraft_id )
AND ( start_event_date >= ash.usage_start_date ) )
AND ( start_event_date < ash.usage_end_date ) )
AND ( aircraft_usage_classification = 'Primary' ) )
left join fleets.aircraft_configuration_history_latest accl
ON ash.aircraft_id = accl.aircraft_id
left join fleets.aircraft_configurations_latest acl
ON accl.aircraft_configuration_id = acl.aircraft_configuration_id
)
WHERE (((( f.flight_departure_date > ( "Now"() - interval '90' day ) ))))
Not sure how to do a 'count/group by' so that the query shows what I'm after.
Regards,
Mark
Something like this:
select
x, y, z,
sum( case when aircraft_id is null then 1 else 0 end ) as null_cnt,
sum( case when aircraft_id is null then 0 else 1 end ) as notnull_cnt
from
(inline subquery)
group by
x, y, z
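As a small runnable illustration of the conditional-sum pattern above, here is a Python sketch using the built-in sqlite3 module. The flights table and its rows are made up for the example and stand in for the inline subquery.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE flights (operator TEXT, aircraft_id INTEGER);
INSERT INTO flights VALUES
  ('Alpha Air', 101), ('Alpha Air', NULL), ('Alpha Air', 102),
  ('Beta Jet', NULL), ('Beta Jet', NULL);
""")

# One row per operator, with NULL and non-NULL aircraft IDs counted separately.
counts = conn.execute("""
SELECT operator,
       SUM(CASE WHEN aircraft_id IS NULL THEN 1 ELSE 0 END) AS null_cnt,
       SUM(CASE WHEN aircraft_id IS NULL THEN 0 ELSE 1 END) AS notnull_cnt
FROM flights
GROUP BY operator
ORDER BY operator
""").fetchall()

print(counts)  # [('Alpha Air', 1, 2), ('Beta Jet', 2, 0)]
```

Since COUNT(column) skips NULLs, COUNT(aircraft_id) would give the same value as notnull_cnt.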
FWIW, you don't need all those parentheses in your query; they are more confusing than helpful. They have their place in some cases, especially around "OR" conditions, but in this query they are completely superfluous:
FROM
"flights"."tracked_utilization" f
left join "pond_dataops_analysis"."latest_aircraft" a
ON a.aircraft_id = f.aircraft_id
left join fleets.aircraft_all_history_latest ah
ON ah.aircraft_id = f.aircraft_id
AND Coalesce(f.actual_runway_departure_time_local, actual_gate_departure_time_local, published_gate_departure_time_local) >= ah.start_event_date
AND Coalesce(f.actual_runway_departure_time_local, actual_gate_departure_time_local, published_gate_departure_time_local) < ah.end_event_date
left join fleets.organizations_latest org
ON org.organization_id = ah.operator_organization_id
left join fleets.aircraft_usage_history_latest ash
ON ash.aircraft_id = f.aircraft_id
AND start_event_date >= ash.usage_start_date
AND start_event_date < ash.usage_end_date
AND aircraft_usage_classification = 'Primary'
left join fleets.aircraft_configuration_history_latest accl
ON ash.aircraft_id = accl.aircraft_id
left join fleets.aircraft_configurations_latest acl
ON accl.aircraft_configuration_id = acl.aircraft_configuration_id
WHERE
f.flight_departure_date > "Now"() - interval '90' day

SQL:Last date of Join table

Tables:
msub -> id, receive
msublist -> id, sub_id, item_id, qty
I tried this:
select a.sub_id
, a.item_id
, a.qty
, b.id
, b.receive_date
from msublist a
join (select x.id
, x.receive_date
from msub x
where x.receive_date = (select max(x1.receive_date)
from msub x1
where x1.id = x.id)) b
on (a.sub_id = b.id)
order by a.item_id,b.receive_date desc
It does not work. I want to show the latest receive date for each item_id.
Please try the following...
SELECT msublist.sub_id AS msub_id,
msublist.item_id AS item_id,
msublist.qty AS qty,
msublist.id AS msublist_id,
mostRecentDate AS receive_date
FROM msub
JOIN msublist ON msub.id = msublist.sub_id
JOIN ( SELECT item_id,
MAX( receive_date ) AS mostRecentDate
FROM msub
JOIN msublist ON msub.id = msublist.sub_id
GROUP BY item_id
) AS mostRecentDateFinder ON msublist.item_id = mostRecentDateFinder.item_id
AND msub.receive_date = mostRecentDateFinder.mostRecentDate
ORDER BY item_id;
This statement starts with a subquery that performs an INNER JOIN between msub and msublist, then groups the results by item_id. Then for each group (i.e. for each item_id value) it finds the maximum value of the corresponding receive_date values.
The resulting list then has an INNER JOIN performed between it and the dataset that results from joining msub and msublist on their common value in such a way that only those rows from the msub / msublist dataset that have a matching combination of item_id and mostRecentDate are retained.
The resulting dataset is then sorted by the value of item_id.
Finally, the desired fields are then returned.
If you have any questions or comments, then please feel free to post a Comment accordingly.
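The approach described above can be tried on a few rows with this Python sketch, which uses the built-in sqlite3 module. The sample data is invented, and dates are stored as ISO-8601 text so that MAX() orders them correctly in SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE msub (id INTEGER, receive_date TEXT);
CREATE TABLE msublist (id INTEGER, sub_id INTEGER, item_id INTEGER, qty INTEGER);
INSERT INTO msub VALUES (1,'2017-01-05'),(2,'2017-02-10'),(3,'2017-03-15');
INSERT INTO msublist VALUES (10,1,500,4),(11,2,500,7),(12,2,600,1),(13,3,600,9);
""")

# The derived table finds the latest receive_date per item_id; joining back
# keeps only the msub / msublist rows that carry that date.
latest = conn.execute("""
SELECT msublist.item_id, msublist.sub_id, msublist.qty, msub.receive_date
FROM msub
JOIN msublist ON msub.id = msublist.sub_id
JOIN (SELECT msublist.item_id AS item_id,
             MAX(msub.receive_date) AS mostRecentDate
      FROM msub
      JOIN msublist ON msub.id = msublist.sub_id
      GROUP BY msublist.item_id) AS finder
  ON msublist.item_id = finder.item_id
 AND msub.receive_date = finder.mostRecentDate
ORDER BY msublist.item_id
""").fetchall()

print(latest)  # [(500, 2, 7, '2017-02-10'), (600, 3, 9, '2017-03-15')]
```

If two receipts for the same item share the latest date, both rows are returned.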
Try this:-
select a.sub_id
, a.item_id
, a.qty
, b.id
, b.receive_date
from msublist a
inner join
(
Select a.sub_id,b.id,a.item_id, max(b.receive_date) as receive_date
from
msublist a
inner join
msub b
on a.sub_id=b.id
group by a.sub_id,b.id,a.item_id
) b
on a.sub_id=b.sub_id and a.item_id=b.item_id
Let me know if you have any questions
Try this
select a.sub_id
, a.item_id
, a.qty
, b.id
, b.receive_date
from msublist a
join (select x.id
, x.receive_date
from msub x
where x.receive_date = (select max(x1.receive_date)
from msub x1
)) b
on a.sub_id = b.id
order by a.item_id,b.receive_date desc
Is this the one you are looking for?

Window function issue - max over partition

I am trying to rewrite SQL statements like the one below (with many subqueries) into a more efficient form using outer joins and max/count/... over partition. The old statement:
select a.ID,
(select max(b.valA) from something b where a.ID = b.ID_T and b.status != 0),
(select max(b.valB) from something b where a.ID = b.ID_T and b.status != 0),
(select max(b.valC) from something b where a.ID = b.ID_T and b.status != 0),
(select max(b.valD) from something b where a.ID = b.ID_T)
from tool a;
What is important here is that max(b.valD) has a different condition. At first I didn't notice this difference and wrote something like this:
select distinct a.ID,
max(b.valA) over (partition by b.ID_T),
max(b.valB) over (partition by b.ID_T),
max(b.valC) over (partition by b.ID_T),
max(b.valD) over (partition by b.ID_T)
from tool a,
(select * from something
where status != 0) b
where a.ID = b.ID_T(+);
Can I apply the b.status != 0 condition somewhere inside max over partition? Or would it be better to add a third table to the join, like this:
select distinct a.ID,
max(b.valA) over (partition by b.ID_T),
max(b.valB) over (partition by b.ID_T),
max(b.valC) over (partition by b.ID_T),
max(c.valD) over (partition by c.ID_T)
from tool a,
(select * from something
where status != 0) b,
something c
where a.ID = b.ID_T(+)
and a.ID = c.ID_T(+);
The issue is with selecting and joining millions of rows; my example is just a simplification of my query. Could anyone help me achieve more efficient SQL?
You could try to do this using CASE:
select a.ID,
max(CASE WHEN b.status != 0 THEN b.valA END),
max(CASE WHEN b.status != 0 THEN b.valB END),
max(CASE WHEN b.status != 0 THEN b.valC END),
max(b.valD)
from tool a
left join something b ON( b.ID_T = a.ID )
group by a.ID;
Note that I replaced your implicit join by the "new" join-syntax for better readability.
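A quick way to see the CASE-inside-MAX idea at work is this Python sketch with the built-in sqlite3 module. It uses status != 0 inside the CASE so that it matches the filter in the question's subqueries. The sample rows are invented, and only valA and valD are kept to stay short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tool (ID INTEGER);
CREATE TABLE something (ID_T INTEGER, status INTEGER, valA INTEGER, valD INTEGER);
INSERT INTO tool VALUES (1),(2),(3);
INSERT INTO something VALUES (1,0,100,7),(1,1,10,5),(1,2,20,3),(2,0,50,9);
""")

# valA only counts when status != 0; valD counts for every joined row.
result = conn.execute("""
SELECT a.ID,
       MAX(CASE WHEN b.status != 0 THEN b.valA END) AS max_a,
       MAX(b.valD) AS max_d
FROM tool a
LEFT JOIN something b ON b.ID_T = a.ID
GROUP BY a.ID
ORDER BY a.ID
""").fetchall()

print(result)  # [(1, 20, 7), (2, None, 9), (3, None, None)]
```

A CASE without ELSE yields NULL for non-matching rows, and MAX ignores NULLs, so tool 1 reports 20 rather than the 100 that belongs to a status 0 row.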
One more way is to use JOIN and group by subquery:
select a.ID,
b.MAX_A,
b.MAX_B,
b.MAX_C,
b2.MAX_D
from tool a
LEFT JOIN
(
SELECT ID_T,max(valA) MAX_A, max(valB) MAX_B, max(valC) MAX_C
FROM something
WHERE status != 0
GROUP BY ID_T
) b
ON a.ID=b.ID_T
LEFT JOIN
(
SELECT ID_T, max(valD) MAX_D
FROM something
GROUP BY ID_T
) b2
ON a.ID=b2.ID_T

MSSQL Inner Join on Concatenated Column

I'm not a DBA, so please don't yell at me. I'm trying to do an inner join and GROUP BY using a concatenated column. The ON clause is producing a syntax error. I do not have access to the original tables and am trying to normalize this into another table; I know it's ugly. I'm not overly worried about performance, it just needs to work. I can't use functions either.
SELECT DISTINCT A.[carrier_code],[carrier_name], [carrier_grouping], A.[collector_name], [dataset_loaded], [docnum], [envoy_payer_id], [loc], [market], [master_payor_grouping], [plan_class], [plan_name], A.[resp_ins],A.[resp_ind], A.[resp_payor_grouping], A.[Resp_Plan_Type], A.[rspphone], A.[state], A.[sys],A.[resp_ins]+A.[resp_payor_grouping]+A.[carrier_code]+A.[state]+A.[Collector_Name] as ExtId
FROM [Table1] A
INNER JOIN
(SELECT [resp_ins]+[resp_payor_grouping]+[carrier_code]+[state]+[Collector_Name] as Extid
FROM [Table1]
WHERE [resp_ind] = 'Insurance'
GROUP BY [resp_ins]+[resp_payor_grouping]+[carrier_code]+[state]+[Collector_Name]) B
ON A.[resp_ins]+A.[resp_payor_grouping]+A.[carrier_code]+A.[state]+A.[Collector_Name] = B.[resp_ins]+B.[resp_payor_grouping]+B.[carrier_code]+B.[state]+B.[Collector_Name];
The expressions in my ON and GROUP BY clauses will eventually be the primary key in the new table.
Your alias B doesn't have the columns you reference. It has just one column, Extid.
SELECT DISTINCT A.[carrier_code],[carrier_name], [carrier_grouping], A.[collector_name], [dataset_loaded], [docnum], [envoy_payer_id], [loc], [market], [master_payor_grouping], [plan_class], [plan_name], A.[resp_ins],A.[resp_ind], A.[resp_payor_grouping], A.[Resp_Plan_Type], A.[rspphone], A.[state], A.[sys],A.[resp_ins]+A.[resp_payor_grouping]+A.[carrier_code]+A.[state]+A.[Collector_Name] as ExtId
FROM [Table1] A
INNER JOIN
(SELECT [resp_ins]+[resp_payor_grouping]+[carrier_code]+[state]+[Collector_Name] as Extid
FROM [Table1]
WHERE [resp_ind] = 'Insurance'
GROUP BY [resp_ins]+[resp_payor_grouping]+[carrier_code]+[state]+[Collector_Name]) B
ON A.[resp_ins]+A.[resp_payor_grouping]+A.[carrier_code]+A.[state]+A.[Collector_Name] = B.Extid;
Try this. I didn't put all the columns in the result; you can add the rest yourself.
select A.*
from
(
select [carrier_code],[carrier_name], [sys],[resp_ins]+[resp_payor_grouping]+[carrier_code]+[state]+[Collector_Name] as ExtId
FROM [Table1]
) A
inner join
(
select distinct Extid
from
(
SELECT [resp_ins]+[resp_payor_grouping]+[carrier_code]+[state]+[Collector_Name] as ExtId
FROM [Table1]
WHERE [resp_ind] = 'Insurance'
) ins
) B on (A.ExtId = B.ExtId)
You don't need to concatenate the values - you can GROUP BY and JOIN on multiple columns.
SELECT DISTINCT
...
FROM
[Table1] A
INNER JOIN
(
SELECT
[resp_ins],
[resp_payor_grouping],
[carrier_code],
[state],
[Collector_Name]
FROM
[Table1]
WHERE
[resp_ind] = 'Insurance'
GROUP BY
[resp_ins],
[resp_payor_grouping],
[carrier_code],
[state],
[Collector_Name]
) B
ON
(
A.[resp_ins] = B.[resp_ins]
Or
(A.[resp_ins] Is Null And B.[resp_ins] Is Null)
)
And
(
A.[resp_payor_grouping] = B.[resp_payor_grouping]
Or
(A.[resp_payor_grouping] Is Null And B.[resp_payor_grouping] Is Null)
)
And
(
A.[carrier_code] = B.[carrier_code]
Or
(A.[carrier_code] Is Null And B.[carrier_code] Is Null)
)
And
(
A.[state] = B.[state]
Or
(A.[state] Is Null And B.[state] Is Null)
)
And
(
A.[Collector_Name] = B.[Collector_Name]
Or
(A.[Collector_Name] Is Null And B.[Collector_Name] Is Null)
)
;
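The reason for the "Or (... Is Null And ... Is Null)" branches in the join above is that NULL = NULL does not evaluate to true, so a plain equality join silently drops rows whose key columns are NULL. Here is a minimal Python sketch with the built-in sqlite3 module, using an invented two-column key and made-up rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (resp_ins TEXT, state TEXT, resp_ind TEXT);
INSERT INTO t VALUES ('AETNA','TX','Insurance'),
                     ('CIGNA',NULL,'Insurance'),
                     ('SELF','TX','Patient');
""")

# Same shape as the answer: join the table to its own grouped key columns.
base = """
SELECT A.resp_ins, A.state
FROM t A
INNER JOIN (SELECT resp_ins, state FROM t
            WHERE resp_ind = 'Insurance'
            GROUP BY resp_ins, state) B
  ON A.resp_ins = B.resp_ins AND {state_cond}
ORDER BY A.resp_ins
"""

plain = conn.execute(base.format(state_cond="A.state = B.state")).fetchall()
null_safe = conn.execute(base.format(
    state_cond="(A.state = B.state OR (A.state IS NULL AND B.state IS NULL))"
)).fetchall()

print(plain)      # [('AETNA', 'TX')]
print(null_safe)  # [('AETNA', 'TX'), ('CIGNA', None)]
```

If none of the key columns can be NULL, the simpler equality conditions are enough.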

Inner join that ignores singlets

I have to do a self join on a table. I am trying to return several columns showing how many of each type of drug test were performed on the same day (MM/DD/YYYY), limited to days on which at least two tests were done and at least one of them had a result code of 'UN'.
I am joining other tables to get the information as below. The problem is I do not quite understand how to exclude someone who has a single result row in which they did have a 'UN' result on a day but did not have any other tests that day.
Query Results (Columns)
County, DrugTestID, ID, Name, CollectionDate, DrugTestType, Results, Count(DrugTestType)
I have several rows for ID 12345, which are correct. But ID 12346 has a single row with a count of 1. They had a result of 'UN' on this day but did not have any other tests that day. I want to exclude this.
I tried the following query:
select
c.desc as 'County',
dt.pid as 'PID',
dt.id as 'DrugTestID',
p.id as 'ID',
bio.FullName as 'Participant',
CONVERT(varchar, dt.CollectionDate, 101) as 'CollectionDate',
dtt.desc as 'Drug Test Type',
dt.result as Result,
COUNT(dt.dru_drug_test_type) as 'Count Of Test Type'
from
dbo.Test as dt with (nolock)
join dbo.History as h on dt.pid = h.id
join dbo.Participant as p on h.pid = p.id
join BioData as bio on bio.id = p.id
join County as c with (nolock) on p.CountyCode = c.code
join DrugTestType as dtt with (nolock) on dt.DrugTestType = dtt.code
inner join
(
select distinct
dt2.pid,
CONVERT(varchar, dt2.CollectionDate, 101) as 'CollectionDate'
from
dbo.DrugTest as dt2 with (nolock)
join dbo.History as h2 on dt2.pid = h2.id
join dbo.Participant as p2 on h2.pid = p2.id
where
dt2.result = 'UN'
and dt2.CollectionDate between '11-01-2011' and '10-31-2012'
and p2.DrugCourtType = 'AD'
) as derived
on dt.pid = derived.pid
and convert(varchar, dt.CollectionDate, 101) = convert(varchar, derived.CollectionDate, 101)
group by
c.desc, dt.pid, p.id, dt.id, bio.fullname, dt.CollectionDate, dtt.desc, dt.result
order by
c.desc ASC, Participant ASC, dt.CollectionDate ASC
This is a little complicated because your query has a separate row for each test. You need to use window/analytic functions to get the information you want. These let you calculate aggregate functions but put the values on each row.
The following query starts with your query. It then calculates the number of UN results on each date for each participant and the total number of tests. It applies the appropriate filter to get what you want:
with base as (<your query here>)
select b.*
from (select b.*,
sum(isUN) over (partition by Participant, CollectionDate) as NumUNs,
count(*) over (partition by Participant, CollectionDate) as NumTests
from (select b.*,
(case when result = 'UN' then 1 else 0 end) as IsUN
from base b
) b
) b
where NumUNs <> 1 or NumTests <> 1
Without the with clause or window functions, you can create a particularly ugly query to do the same thing:
select b.*
from (<your query>) b join
(select Participant, CollectionDate, count(*) as NumTests,
sum(case when result = 'UN' then 1 else 0 end) as NumUNs
from (<your query>) b
group by Participant, CollectionDate
) bsum
on b.Participant = bsum.Participant and
b.CollectionDate = bsum.CollectionDate
where NumUNs <> 1 or NumTests <> 1
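The join-to-aggregate version can be tried on a few rows with this Python sketch, which uses the built-in sqlite3 module. The tests table stands in for <your query> and its rows are invented. The filter here is written directly as the requirement from the question (at least two tests and at least one 'UN' on the day) rather than in the <> form used above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tests (Participant TEXT, CollectionDate TEXT,
                    DrugTestType TEXT, Result TEXT);
INSERT INTO tests VALUES
  ('12345','2012-03-01','THC','UN'),
  ('12345','2012-03-01','ALC','NEG'),
  ('12346','2012-03-01','THC','UN');
""")

# bsum holds per-participant, per-day totals; joining back keeps the detail rows.
kept = conn.execute("""
SELECT b.Participant, b.CollectionDate, b.DrugTestType, b.Result
FROM tests b
JOIN (SELECT Participant, CollectionDate,
             COUNT(*) AS NumTests,
             SUM(CASE WHEN Result = 'UN' THEN 1 ELSE 0 END) AS NumUNs
      FROM tests
      GROUP BY Participant, CollectionDate) bsum
  ON b.Participant = bsum.Participant
 AND b.CollectionDate = bsum.CollectionDate
WHERE bsum.NumTests >= 2 AND bsum.NumUNs >= 1
ORDER BY b.Participant, b.DrugTestType
""").fetchall()

print(kept)
```

Participant 12346 has a single 'UN' test on the day and is dropped, while both rows for 12345 are kept.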
If I understand the problem, the basic pattern for this sort of query is simply to include negating or exclusionary conditions in your join, i.e., a self-join where column A matches but columns B and C do not:
select
[columns]
from
table t1
join table t2 on (
t1.NonPkId = t2.NonPkId
and t1.PkId != t2.PkId
and t1.category != t2.category
)
Put the conditions in the WHERE clause if it benchmarks better:
select
[columns]
from
table t1
join table t2 on (
t1.NonPkId = t2.NonPkId
)
where
t1.PkId != t2.PkId
and t1.category != t2.category
And it's often easiest to start with the self-join, treating it as a "base table" on which to join all related information:
select
[columns]
from
(select
[columns]
from
table t1
join table t2 on (
t1.NonPkId = t2.NonPkId
)
where
t1.PkId != t2.PkId
and t1.category != t2.category
) bt
join [othertable] on (<whatever>)
join [othertable] on (<whatever>)
join [othertable] on (<whatever>)
This can allow you to focus on getting that self-join right, without interference from other tables.
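As a concrete instance of the self-join pattern above, here is a minimal Python sketch with the built-in sqlite3 module. The table t, its columns (named after the placeholders PkId, NonPkId and category) and its rows are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (PkId INTEGER PRIMARY KEY, NonPkId INTEGER, category TEXT);
INSERT INTO t VALUES (1,10,'x'),(2,10,'y'),(3,20,'x'),(4,30,'x'),(5,30,'x');
""")

# Pair rows that share NonPkId but are different rows with different categories.
pairs = conn.execute("""
SELECT t1.PkId, t2.PkId
FROM t t1
JOIN t t2
  ON t1.NonPkId = t2.NonPkId
 AND t1.PkId != t2.PkId
 AND t1.category != t2.category
ORDER BY t1.PkId
""").fetchall()

print(pairs)  # [(1, 2), (2, 1)]
```

Row 3 has no partner at all, so it drops out of the inner join; rows 4 and 5 share a NonPkId but also share a category, so the exclusion condition removes them.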