Left join not returning record with null - sql

Here's the structure of my database (used to collect info from a survey):
question_responses->answer_choices->subquestions->questions->survey
| ^
V |
submissions->response_group<----------------------------survey_deployment
where A->B means A has the PK of B
So, I want to get the responses of a particular user; let's say that his submission pk is 1. I had written the query
select question_responses.answer_text
from submissions join question_responses on (question_responses.submission_pk = submission.pk)
join answer_choices on (question_responses.answer_choices_pk = answer_choices.pk)
join subquestions on (answer_choices.subquestions_pk = subquestions.pk)
right join questions on (subquestions.questions_pk = questions.pk)
where submissions.pk = 1
order by questions.order
It is possible to not answer questions. In such a case, there is no question_responses.pk recorded. I would still like to be able to account for skipping, so I need this record to be returned, even if there is no response. I thought that the right join would have accounted for this record, but apparently not.
Any help is greatly appreciated.

Because you have submissions.pk = 1 in the where clause this effectively turns your outer join into an inner join, since any missing submission will have a pk of null. and null = 1 is not true.
You could rewrite your query using a LEFT JOIN, and selecting from questions:
select question_responses.answer_text
from questions
left join (subquestions
inner join answer_choices
on answer_choices.subquestions_pk = subquestions.pk
inner join question_responses
on question_responses.answer_choices_pk = answer_choices.pk
inner join submissions
on question_responses.submission_pk = submission.pk)
on subquestions.questions_pk = questions.pk
and submissions.pk = 1
order by questions.order;
This is similar to doing something like:
select subquery.answer_text
from questions
left join
( select question_responses.answer_text, subquestions.questions_pk
from subquestions
inner join answer_choices
on answer_choices.subquestions_pk = subquestions.pk
inner join question_responses
on question_responses.answer_choices_pk = answer_choices.pk
inner join submissions
on question_responses.submission_pk = submission.pk
where submissions.pk = 1
) subquery
on subquery.questions_pk = questions.pk
order by questions.order;
but depending on your dbms the former may perform better as it has no derived tables.

Check out this, does it help?
SELECT question_responses.answer_text
FROM questions LEFT JOIN subquestions on (subquestions.questions_pk = questions.pk)
LEFT JOIN answer_choices on (answer_choices.subquestions_pk = subquestions.pk)
LEFT JOIN question_responses on (question_responses.answer_choices_pk = answer_choices.pk)
LEFT JOIN submissions on (question_responses.submission_pk = submission.pk)
WHERE submissions.pk = 1
ORDER BY questions.order

Related

Speed up SQL query performance with nested queries

Could anyone help me speed this query up? It currently take 17 minutes to run but does return the correct data and it populates a subform in MS Access. Functions in the rest of the VBA are declared as long to try to speed up more.
Here's the full query:
SELECT lots of things
FROM (((((((((((((((ngstest
INNER JOIN patients
ON ngstest.internalpatientid = patients.internalpatientid)
INNER JOIN referral
ON ngstest.referralid = referral.referralid)
INNER JOIN checker
ON ngstest.bookby = checker.check1id)
INNER JOIN ngspanel
ON ngstest.ngspanelid = ngspanel.ngspanelid)
LEFT JOIN ngspanel AS ngspanel_1
ON ngstest.ngspanelid_b = ngspanel_1.ngspanelid)
INNER JOIN status
ON ngstest.statusid = status.statusid)
INNER JOIN dbo_patient_table
ON patients.patientid = dbo_patient_table.patienttrustid)
LEFT JOIN dna
ON ngstest.dna = dna.dnanumber)
INNER JOIN status AS status_1
ON patients.s_statusoverall = status_1.statusid)
LEFT JOIN gw_gendertable
ON dbo_patient_table.genderid = gw_gendertable.genderid)
LEFT JOIN ngswesbatch
ON ngstest.wesbatch = ngswesbatch.ngswesbatchid)
LEFT JOIN checker AS checker_1
ON ngstest.check1id = checker_1.check1id)
LEFT JOIN checker AS checker_2
ON ngstest.check2id = checker_2.check1id)
LEFT JOIN checker AS checker_3
ON ngstest.check3id = checker_3.check1id)
LEFT JOIN ngspanel AS ngspanel_2
ON ngstest.ngspanelid_c = ngspanel_2.ngspanelid)
LEFT JOIN checker AS checker_4
ON ngstest.check4id = checker_4.check1id
WHERE ((ngstest.referralid IN
(SELECT referralid FROM referral
WHERE grouptypeid = 14)
AND ngstest.ngstestid IN
(SELECT ngstest.ngstestid
FROM ngsanalysis
INNER JOIN ngstest
ON ngsanalysis.ngstestid = ngstest.ngstestid
WHERE ngsanalysis.pedigree = 3302) )
AND status.statusid = 1202218800)
ORDER BY ngstest.priority,
ngstest.daterequested;
The two nested queries are strings from elsewhere in the code so are called in the vba as " & includereferralls & " And " & ParentsStatusesFilter & "
They are:
ParentsStatusesFilter = "NGSTest.NGSTestID in
(SELECT NGSTest.NGSTestID
FROM NGSAnalysis
INNER JOIN NGSTest
ON NGSAnalysis.NGSTestID = NGSTest.NGSTestID
WHERE NGSAnalysis.Pedigree IN (3302,3303,3304)"
And
includereferrals = "NGSTest.ReferralID
(SELECT referralid FROM referral WHERE referral.grouptypeid = 14)"
The query needs to remain readable (and therefore editable) so can't use things like Distinct, Group By or contain any Unions. Have tried Exists instead of In for the nested queries but that stops it from actually filtering the results.
WHERE EXISTS (SELECT NGSTest.NGSTestID
FROM NGSAnalysis
INNER JOIN NGSTest
ON NGSAnalysis.NGSTestID = NGSTest.NGSTestID
WHERE NGSAnalysis.Pedigree IN (3302,3303,3304)
So the exist clause you have there isn't tied to the outer query which would run similar to just added 1 = 1 to the where clause. I took your where clause and converted it. It should look something like this...
WHERE EXISTS (
SELECT referralid
FROM referral
WHERE grouptypeid = 14 AND ngstest.referralid = referral.referralid)
AND EXISTS (
SELECT ngsanalysis.ngstestid
FROM ngsanalysis
WHERE ngsanalysis.pedigree IN (3302,3303,3304) AND ngstest.ngstestid = ngsanalysis.ngstestid
)
AND status.statusid = 1202218800
Adding exists will speed it up a bit, but the the bulk of the slowness is the left joins. Access does not handle the left joins as well as SQL Server does. Change all your joins to inner joins and you will see the query runs very fast. This is obviously not ideal since some relationships are optional. What I have done to get around this is add a default record that replaces a null relationship.
Here is what that looks like for you: In the checker table you could add a record that represents a null value. So put a record into the checker table with check1id of -1 or 0. Then default check1id, check2id, check3id on ngstest to -1 or 0. You will need to do that type of thing for all tables you need to left join on.

SQL-Get and replace minimum value of a query's field

I have the following query in Postgresql :
SELECT mq.nombre,sa.nombre,COUNT(DISTINCT(ae.numero_serie)),SUM(im.fin-im.inicio),MIN(pz.fecha_inicio)
FROM item_metraje AS im LEFT JOIN articulo_especificado AS ae ON (im.id_articulo_especificado = ae.id)
LEFT JOIN articulo AS a ON (ae.id_articulo = a.id)
LEFT JOIN serie_articulo AS sa ON (a.id_serie_articulo = sa.id)
LEFT JOIN reporte_perforacion AS rp ON (rp.id = im.id_reporte_perforacion)
LEFT JOIN pozo AS pz ON (pz.id=rp.id_pozo) LEFT JOIN proyecto AS p ON (p.id=pz.id_proyecto)
LEFT JOIN maquina_perforacion AS mq ON (mq.id = pz.id_maquina)
WHERE p.id = 2 GROUP BY mq.nombre,sa.nombre
and the result is :
However I want to put the minimum date for the rows that have the same value of the field 'nombre', for example for the value 'JM04' the three rows must have the date 2015-01-25 because it is the minimum value of the three rows.
Excuse me for my English and thanks for all.
You can use Window functions for this purpose. MIN(pz.fecha_inicio) over (partition by mq.nombre).
Therefore the final query is,
SELECT z.nombre1,z.nombre2,z.count,z.sum ,MIN(z.date) over (partition by z.nombre1) from
(SELECT mq.nombre as nombre1 ,sa.nombre as nombre2,COUNT(DISTINCT(ae.numero_serie)) as count,SUM(im.fin-im.inicio) as sum ,pz.fecha_inicio as date
FROM item_metraje AS im LEFT JOIN articulo_especificado AS ae ON (im.id_articulo_especificado = ae.id)
LEFT JOIN articulo AS a ON (ae.id_articulo = a.id)
LEFT JOIN serie_articulo AS sa ON (a.id_serie_articulo = sa.id)
LEFT JOIN reporte_perforacion AS rp ON (rp.id = im.id_reporte_perforacion)
LEFT JOIN pozo AS pz ON (pz.id=rp.id_pozo) LEFT JOIN proyecto AS p ON (p.id=pz.id_proyecto)
LEFT JOIN maquina_perforacion AS mq ON (mq.id = pz.id_maquina)
WHERE p.id = 2 GROUP BY mq.nombre,sa.nombre)z
You can modify this with the help of order by or having clauses inside window function as you want. I tried this with my own data set. Hope this helps.

LEFT JOIN ON COALESCE(a, b, c) - very strange behavior

I have encountered very strange behavior of my query and I wasted a lot of time to understand what causes it, in vane. So I am asking for your help.
SELECT count(*) FROM main_table
LEFT JOIN front_table ON front_table.pk = main_table.fk_front_table
LEFT JOIN info_table ON info_table.pk = front_table.fk_info_table
LEFT JOIN key_table ON key_table.pk = COALESCE(info_table.fk_key_table, front_table.fk_key_table_1, front_table.fk_key_table_2)
LEFT JOIN side_table ON side_table.fk_front_table = front_table.pk
WHERE side_table.pk = (SELECT MAX(pk) FROM side_table WHERE fk_front_table = front_table.pk)
OR side_table.pk IS NULL
Seems like a simple join query, with coalesce, I've used this technique before(not too many times) and it worked right.
In this query I don't ever get nulls for side_table.pk. If I remove coalesce or just don't use key_table, then the query returns rows with many null side_table.pk, but if I add coalesce join, I can't get those nulls.
It seems key_table and side_table don't have anything in common, but the result is so weird.
Also, when I don't use side_table and WHERE clause, the count(*) result with coalesce and without differs, but I can't see any pattern in rows missing, it seems random!
Real query:
SELECT ECHANGE.EXC_AUTO_KEY, STOCK_RESERVATIONS.STR_AUTO_KEY FROM EXCHANGE
LEFT JOIN WO_BOM ON WO_BOM.WOB_AUTO_KEY = EXCHANGE.WOB_AUTO_KEY
LEFT JOIN VIEW_WO_SUB ON VIEW_WO_SUB.WOO_AUTO_KEY = WO_BOM.WOO_AUTO_KEY
LEFT JOIN STOCK stock3 ON stock3.STM_AUTO_KEY = EXCHANGE.STM_AUTO_KEY
LEFT JOIN STOCK stock2 ON stock2.STM_AUTO_KEY = EXCHANGE.ORIG_STM
LEFT JOIN CONSIGNMENT_CODES con2 ON con2.CNC_AUTO_KEY = stock2.CNC_AUTO_KEY
LEFT JOIN CONSIGNMENT_CODES con3 ON con3.CNC_AUTO_KEY = stock3.CNC_AUTO_KEY
LEFT JOIN CI_UTL ON CI_UTL.CUT_AUTO_KEY = EXCHANGE.CUT_AUTO_KEY
LEFT JOIN PART_CONDITION_CODES pcc2 ON pcc2.PCC_AUTO_KEY = stock2.PCC_AUTO_KEY
LEFT JOIN PART_CONDITION_CODES pcc3 ON pcc3.PCC_AUTO_KEY = stock3.PCC_AUTO_KEY
LEFT JOIN STOCK_RESERVATIONS ON STOCK_RESERVATIONS.STM_AUTO_KEY = stock3.STM_AUTO_KEY
LEFT JOIN WAREHOUSE wh2 ON wh2.WHS_AUTO_KEY = stock2.WHS_ORIGINAL
LEFT JOIN SM_HISTORY ON (SM_HISTORY.STM_AUTO_KEY = EXCHANGE.ORIG_STM AND SM_HISTORY.WOB_REF = EXCHANGE.WOB_AUTO_KEY)
LEFT JOIN RC_DETAIL ON stock3.RCD_AUTO_KEY = RC_DETAIL.RCD_AUTO_KEY
LEFT JOIN RC_HEADER ON RC_HEADER.RCH_AUTO_KEY = RC_DETAIL.RCH_AUTO_KEY
LEFT JOIN WAREHOUSE wh3 ON wh3.WHS_AUTO_KEY = COALESCE(RC_DETAIL.WHS_AUTO_KEY, stock3.WHS_ORIGINAL, stock3.WHS_AUTO_KEY)
WHERE STOCK_RESERVATIONS.STR_AUTO_KEY = (SELECT MAX(STR_AUTO_KEY) FROM STOCK_RESERVATIONS WHERE STM_AUTO_KEY = stock3.STM_AUTO_KEY)
OR STOCK_RESERVATIONS.STR_AUTO_KEY IS NULL
Removing LEFT JOIN WAREHOUSE wh3 gives me about unique EXC_AUTO_KEY values with a lot of NULL STR_AUTO_KEY, while leaving this row removes all NULL STR_AUTO_KEY.
I recreated simple tables with numbers with the same structure and query works without any problems o.0
I have a feeling COALESCE is acting as a REQUIRED flag for the joined table, hence shooting the LEFT JOIN to become an INNER JOIN.
Try this:
SELECT COUNT(*)
FROM main_table
LEFT JOIN front_table ON front_table.pk = main_table.fk_front_table
LEFT JOIN info_table ON info_table.pk = front_table.fk_info_table
LEFT JOIN key_table ON key_table.pk = NVL(info_table.fk_key_table, NVL(front_table.fk_key_table_1, front_table.fk_key_table_2))
LEFT JOIN (SELECT fk_, MAX(pk) as pk FROM side_table GROUP BY fk_) st ON st.fk_ = front_table.pk
NVL might behave just the same though...
I undertood what was the problem (not entirely though): there is a LEFT JOIN VIEW_WO_SUB in original query, 3rd line. It causes this query to act in a weird way.
When I replaced the view with the other table which contained the information I needed, the query started returning right results.
Basically, with this view join, NVL, COALESCE or CASE join with combination of certain arguments did not work along with OR clause in WHERE subquery, all rest was fine. ALthough, I could get the query to work with this view join, by changing the order of joined tables, I had to place table participating in where subquery to the bottom.

How to improve the performance of a SQL query even after adding indexes?

I am trying to execute the following sql query but it takes 22 seconds to execute. the number of returned items is 554192. I need to make this faster and have already put indexes in all the tables involved.
SELECT mc.name AS MediaName,
lcc.name AS Country,
i.overridedate AS Date,
oi.rating,
bl1.firstname + ' ' + bl1.surname AS Byline,
b.id BatchNo,
i.numinbatch ItemNumberInBatch,
bah.changedatutc AS BatchDate,
pri.code AS IssueNo,
pri.name AS Issue,
lm.neptunemessageid AS MessageNo,
lmt.name AS MessageType,
bl2.firstname + ' ' + bl2.surname AS SourceFullName,
lst.name AS SourceTypeDesc
FROM profiles P
INNER JOIN profileresults PR
ON P.id = PR.profileid
INNER JOIN items i
ON PR.itemid = I.id
INNER JOIN batches b
ON b.id = i.batchid
INNER JOIN itemorganisations oi
ON i.id = oi.itemid
INNER JOIN lookup_mediachannels mc
ON i.mediachannelid = mc.id
LEFT OUTER JOIN lookup_cities lc
ON lc.id = mc.cityid
LEFT OUTER JOIN lookup_countries lcc
ON lcc.id = mc.countryid
LEFT OUTER JOIN itembylines ib
ON ib.itemid = i.id
LEFT OUTER JOIN bylines bl1
ON bl1.id = ib.bylineid
LEFT OUTER JOIN batchactionhistory bah
ON b.id = bah.batchid
INNER JOIN itemorganisationissues ioi
ON ioi.itemorganisationid = oi.id
INNER JOIN projectissues pri
ON pri.id = ioi.issueid
LEFT OUTER JOIN itemorganisationmessages iom
ON iom.itemorganisationid = oi.id
LEFT OUTER JOIN lookup_messages lm
ON iom.messageid = lm.id
LEFT OUTER JOIN lookup_messagetypes lmt
ON lmt.id = lm.messagetypeid
LEFT OUTER JOIN itemorganisationsources ios
ON ios.itemorganisationid = oi.id
LEFT OUTER JOIN bylines bl2
ON bl2.id = ios.bylineid
LEFT OUTER JOIN lookup_sourcetypes lst
ON lst.id = ios.sourcetypeid
WHERE p.id = #profileID
AND b.statusid IN ( 6, 7 )
AND bah.batchactionid = 6
AND i.statusid = 2
AND i.isrelevant = 1
when looking at the execution plan I can see an step which is costing 42%. Is there any way I could get this to a lower threshold or any way that I can improve the performance of the whole query.
Remove the profiles table as it is not needed and change the WHERE clause to
WHERE PR.profileid = #profileID
You have a left outer join on the batchactionhistory table but also have a condition in your WHERE clause which turns it back into an inner join. Change you code to this:
LEFT OUTER JOIN batchactionhistory bah
ON b.id = bah.batchid
AND bah.batchactionid = 6
You don't need the batches table as it is used to join other tables which could be joined directly and to show the id in you SELECT which is also available in other tables. Make the following changes:
i.batchidid AS BatchNo,
LEFT OUTER JOIN batchactionhistory bah
ON i.batchidid = bah.batchid
Are any of the fields that are used in joins or the WHERE clause from tables that contain large amounts of data but are not indexed. If so try adding an index on at time to the largest table.
Do you need every field in the result - if you could loose one or to you maybe could reduce the number of tables further.
First, if this is not a stored procedure, make it one. That's a lot of text for sql server to complile.
Next, my experience is that "worst practices" are occasionally a good idea. Specifically, I have been able to improve performance by splitting large queries into a couple or three small ones and assembling the results.
If this query is associated with a .net, coldfusion, java, etc application, you might be able to do the split/re-assemble in your application code. If not, a temporary table might come in handy.

SQL joins "going up" two tables

I'm trying to create a moderately complex query with joins:
SELECT `history`.`id`,
`parts`.`type_id`,
`serialized_parts`.`serial`,
`history_actions`.`action`,
`history`.`date_added`
FROM `history_actions`, `history`
LEFT OUTER JOIN `parts` ON `parts`.`id` = `history`.`part_id`
LEFT OUTER JOIN `serialized_parts` ON `serialized_parts`.`parts_id` = `history`.`part_id`
WHERE `history_actions`.`id` = `history`.`action_id`
AND `history`.`unit_id` = '1'
ORDER BY `history`.`id` DESC
I'd like to replace `parts`.`type_id` in the SELECT statement with `part_list`.`name` where the relationship I need to enforce between the two tables is `part_list`.`id` = `parts`.`type_id`. Also I have to use joins because in some cases `history`.`part_id` may be NULL which obviously isn't a valid part id. How would I modify the query to do this?
Here is some sample date as requested:
history table:
(source: ianburris.com)
serialized_parts table:
(source: ianburris.com)
parts table:
(source: ianburris.com)
part_list table:
(source: ianburris.com)
And what I want to see is:
id name serial action date_added
4 Battery 567 added 2010-05-19 10:42:51
3 Antenna Board 345 added 2010-05-19 10:42:51
2 Main Board 123 added 2010-05-19 10:42:51
1 NULL NULL created 2010-05-19 10:42:51
This would at least be on the right track...
If you're looking to NOT show any parts with an invalid ID, simply change the LEFT JOINs to INNER JOINs (they will restrict NULL values)
SELECT `history`.`id`
, `parts`.`type_id`
, `part_list`.`name`
, `serialized_parts`.`serial`
, `history_actions`.`action`
, `history`.`date_added`
FROM `history_actions`
INNER JOIN `history` ON `history`.`action_id` = `history_actions`.`id`
LEFT JOIN `parts` ON `parts`.`id` = `history`.`part_id`
LEFT JOIN `serialized_parts` ON `serialized_parts`.`parts_id` = `history`.`part_id`
LEFT JOIN `part_list` ON `part_list`.`id` = `parts`.`type_id`
WHERE `history`.`unit_id` = '1'
ORDER BY `history`.`id` DESC
Boy, these backticks make my eyes hurt.
SELECT
h.id,
p.type_id,
pl.name,
sp.serial,
ha.action,
h.date_added
FROM
history h
INNER JOIN history_actions ha ON ha.id = h.action_id
LEFT JOIN parts p ON p.id = h.part_id
LEFT JOIN serialized_parts sp ON sp.parts_id = h.part_id
LEFT JOIN part_list pl ON pl.id = p.type_id
WHERE
h.unit_id = '1'
ORDER BY
history.id DESC