Postgresql - Having trouble performing left joins on multiple tables - sql

I'm in the process of moving some Mysql queries over to Postgresql and I ran across this one that doesn't work.
select (tons of stuff)
from trip_publication
left join trip_collection AS "tc" on
tc.id = tp.collection_id
left join
trip_author ta1, (dies here)
trip_person tp1,
trip_institution tai1,
trip_location tail1,
trip_rank tr1
ON
tp.id = ta1.publication_id
AND tp1.id = ta1.person_id
AND ta1.order = 1
AND tai1.id = ta1.institution_id
AND tail1.id = tai1.location_id
AND ta1.rank_id = tr1.id
The query seems to be dying on the "trip_author ta1" line, where I marked it above. The actual error message is:
syntax error at or near ","
LINE 77: (trip_author ta1, trip_person tp1, ...
I went through the docs, and it seems to be correct. What exactly am I doing wrong here? Any feedback would be much appreciated.

I don't know postgres, but in regular SQL you would need to a series of LEFT JOIN statements rather than your comma syntax. You seemed to have started this then stopped after the first two.
Something like:
SELECT * FROM
table1
LEFT JOIN table2 ON match1
LEFT JOIN table3 ON match2
WHERE otherFilters
The alternative is the older SQL syntax of:
SELECT cols
FROM table1, table2, table3
WHERE match AND match2 AND otherFilters
There's a couple of other smaller errors in your SQL, like the fact you forgot your tp alias on your first table, and have tried including a where clause (ta1.order = 1) as a joining constraint.
I think this is what you are after:
select (tons of stuff)
from trip_publication tp
left join trip_collection AS "tc" on tc.id = tp.collection_id
left join trip_author ta1 on ta1.publication_id = tp.id
left join trip_person tp1 on tp1.id = ta1.person_id
left join trip_institution tai1 on tai1.id = ta1.institution_id
left join trip_location tail1 on tail1.id = tai1.location_id
left join trip_rank tr1 on tr1.id = ta1.rank_id
where ta1.order = 1

Your left joins are one per table you are joining
left join trip_author ta1 on ....
left join trip_person tp1 on ....
left join trip_institution on ...
...and so on

Related

Postgresql - Conditional Join if data exist

My current query show the data from the table called "Buque" and has some references from another tables. The problem is when i execute the query it never shows the result because it consumes too much memory i guess.
The current query i have
select buq.buq_codigo, tbu.tbu_codigo, tbu.tbu_nombre, pai.pai_codigo, pai.pai_nombre,
pue.pto_codigo, pue.pto_nombre, lin.lin_codigo, lin.lin_nombre, tra.tra_codigo,
tra.tra_nombre, buq.buq_nombre, buq.buq_des, buq.num_trb, buq.num_eslora,
buq.max_tons, buq.reg_lloyd, buq.buq_codigo1, buq.codigo_omi,
case buq.buq_estado when 'A' then 'Activo' else 'Inactivo' end as buq_estado
from publico.mae_buque as buq, publico.mae_tipbuque as tbu, publico.mae_pais as pai,
publico.mae_puerto as pue, publico.mae_linea as lin, publico.mae_trafico as tra
where buq.tbu_codigo = tbu.tbu_codigo or
buq.pai_codigo = pai.pai_codigo or
buq.pto_codigo = pue.pto_codigo or
buq.lin_codigo = lin.lin_codigo or
buq.tra_codigo = tra.tra_codigo
I also tried with inner joins but the problem is it returns me the data that meets the conditions of the joins. In other words, if the join has data to compare, returns the name, if not, show the null data.
The query must return me 611 records, with inner joins returns 68 records.
Concerning your desired result, use left outer joins, which fill up any non-existing rows of the right hand side table with null-values;
Concerning the out of memory issue, note that you used or to connect your tables; this actually leads to the fact that almost every record of the involved tables is connected to almost every other record (almost a cross join / cartesian product); This can get very large if you connect 6 tables...
select buq.buq_codigo, tbu.tbu_codigo, tbu.tbu_nombre, pai.pai_codigo, pai.pai_nombre,
pue.pto_codigo, pue.pto_nombre, lin.lin_codigo, lin.lin_nombre, tra.tra_codigo,
tra.tra_nombre, buq.buq_nombre, buq.buq_des, buq.num_trb, buq.num_eslora,
buq.max_tons, buq.reg_lloyd, buq.buq_codigo1, buq.codigo_omi,
case buq.buq_estado when 'A' then 'Activo' else 'Inactivo' end as buq_estado
from publico.mae_buque as buq
left outer join publico.mae_tipbuque as tbu on buq.tbu_codigo = tbu.tbu_codigo
left outer join publico.mae_pais as pai on (buq.pai_codigo = pai.pai_codigo)
left outer join publico.mae_puerto as pue on (buq.pto_codigo = pue.pto_codigo)
left outer join publico.mae_linea as lin on (buq.lin_codigo = lin.lin_codigo)
left outer join publico.mae_trafico as tra on (buq.tra_codigo = tra.tra_codigo)
You have to use left outer join:
select *
from
publico.mae_buque as buq
left outer join publico.mae_tipbuque as tbu on (buq.tbu_codigo = tbu.tbu_codigo)
left outer join publico.mae_pais as pai on (buq.pai_codigo = pai.pai_codigo)
left outer join publico.mae_puerto as pue on (buq.pto_codigo = pue.pto_codigo )
left outer join publico.mae_linea as lin on (buq.lin_codigo = lin.lin_codigo)
left outer join publico.mae_trafico as tra on (buq.tra_codigo = tra.tra_codigo);

LEFT JOIN ON COALESCE(a, b, c) - very strange behavior

I have encountered very strange behavior of my query and I wasted a lot of time to understand what causes it, in vane. So I am asking for your help.
SELECT count(*) FROM main_table
LEFT JOIN front_table ON front_table.pk = main_table.fk_front_table
LEFT JOIN info_table ON info_table.pk = front_table.fk_info_table
LEFT JOIN key_table ON key_table.pk = COALESCE(info_table.fk_key_table, front_table.fk_key_table_1, front_table.fk_key_table_2)
LEFT JOIN side_table ON side_table.fk_front_table = front_table.pk
WHERE side_table.pk = (SELECT MAX(pk) FROM side_table WHERE fk_front_table = front_table.pk)
OR side_table.pk IS NULL
Seems like a simple join query, with coalesce, I've used this technique before(not too many times) and it worked right.
In this query I don't ever get nulls for side_table.pk. If I remove coalesce or just don't use key_table, then the query returns rows with many null side_table.pk, but if I add coalesce join, I can't get those nulls.
It seems key_table and side_table don't have anything in common, but the result is so weird.
Also, when I don't use side_table and WHERE clause, the count(*) result with coalesce and without differs, but I can't see any pattern in rows missing, it seems random!
Real query:
SELECT ECHANGE.EXC_AUTO_KEY, STOCK_RESERVATIONS.STR_AUTO_KEY FROM EXCHANGE
LEFT JOIN WO_BOM ON WO_BOM.WOB_AUTO_KEY = EXCHANGE.WOB_AUTO_KEY
LEFT JOIN VIEW_WO_SUB ON VIEW_WO_SUB.WOO_AUTO_KEY = WO_BOM.WOO_AUTO_KEY
LEFT JOIN STOCK stock3 ON stock3.STM_AUTO_KEY = EXCHANGE.STM_AUTO_KEY
LEFT JOIN STOCK stock2 ON stock2.STM_AUTO_KEY = EXCHANGE.ORIG_STM
LEFT JOIN CONSIGNMENT_CODES con2 ON con2.CNC_AUTO_KEY = stock2.CNC_AUTO_KEY
LEFT JOIN CONSIGNMENT_CODES con3 ON con3.CNC_AUTO_KEY = stock3.CNC_AUTO_KEY
LEFT JOIN CI_UTL ON CI_UTL.CUT_AUTO_KEY = EXCHANGE.CUT_AUTO_KEY
LEFT JOIN PART_CONDITION_CODES pcc2 ON pcc2.PCC_AUTO_KEY = stock2.PCC_AUTO_KEY
LEFT JOIN PART_CONDITION_CODES pcc3 ON pcc3.PCC_AUTO_KEY = stock3.PCC_AUTO_KEY
LEFT JOIN STOCK_RESERVATIONS ON STOCK_RESERVATIONS.STM_AUTO_KEY = stock3.STM_AUTO_KEY
LEFT JOIN WAREHOUSE wh2 ON wh2.WHS_AUTO_KEY = stock2.WHS_ORIGINAL
LEFT JOIN SM_HISTORY ON (SM_HISTORY.STM_AUTO_KEY = EXCHANGE.ORIG_STM AND SM_HISTORY.WOB_REF = EXCHANGE.WOB_AUTO_KEY)
LEFT JOIN RC_DETAIL ON stock3.RCD_AUTO_KEY = RC_DETAIL.RCD_AUTO_KEY
LEFT JOIN RC_HEADER ON RC_HEADER.RCH_AUTO_KEY = RC_DETAIL.RCH_AUTO_KEY
LEFT JOIN WAREHOUSE wh3 ON wh3.WHS_AUTO_KEY = COALESCE(RC_DETAIL.WHS_AUTO_KEY, stock3.WHS_ORIGINAL, stock3.WHS_AUTO_KEY)
WHERE STOCK_RESERVATIONS.STR_AUTO_KEY = (SELECT MAX(STR_AUTO_KEY) FROM STOCK_RESERVATIONS WHERE STM_AUTO_KEY = stock3.STM_AUTO_KEY)
OR STOCK_RESERVATIONS.STR_AUTO_KEY IS NULL
Removing LEFT JOIN WAREHOUSE wh3 gives me about unique EXC_AUTO_KEY values with a lot of NULL STR_AUTO_KEY, while leaving this row removes all NULL STR_AUTO_KEY.
I recreated simple tables with numbers with the same structure and query works without any problems o.0
I have a feeling COALESCE is acting as a REQUIRED flag for the joined table, hence shooting the LEFT JOIN to become an INNER JOIN.
Try this:
SELECT COUNT(*)
FROM main_table
LEFT JOIN front_table ON front_table.pk = main_table.fk_front_table
LEFT JOIN info_table ON info_table.pk = front_table.fk_info_table
LEFT JOIN key_table ON key_table.pk = NVL(info_table.fk_key_table, NVL(front_table.fk_key_table_1, front_table.fk_key_table_2))
LEFT JOIN (SELECT fk_, MAX(pk) as pk FROM side_table GROUP BY fk_) st ON st.fk_ = front_table.pk
NVL might behave just the same though...
I undertood what was the problem (not entirely though): there is a LEFT JOIN VIEW_WO_SUB in original query, 3rd line. It causes this query to act in a weird way.
When I replaced the view with the other table which contained the information I needed, the query started returning right results.
Basically, with this view join, NVL, COALESCE or CASE join with combination of certain arguments did not work along with OR clause in WHERE subquery, all rest was fine. ALthough, I could get the query to work with this view join, by changing the order of joined tables, I had to place table participating in where subquery to the bottom.

syntax error in from clause in access

Below is my query and it says syntax error in from clause whereas it is perfectly working in SQL.After the error 'AS' is highlighted
SELECT
Table1.*,
emp_details_full1.*
FROM Table1
LEFT JOIN
((SELECT
iss_personal_detail.Specialization,
iss_personal_detail.New_rank,
iss_personal_detail.Induction_tr,
iss_personal_detail.Title,
iss_personal_detail.f_name,
iss_personal_detail.m_name,
iss_personal_detail.l_name,
iss_personal_detail.Father_Hus_Name,
iss_personal_detail.Category,
iss_personal_detail.Community,
iss_personal_detail.SEX,
iss_personal_detail.source_recruit,
iss_personal_detail.Pay_Parity,
iss_personal_detail.[Date_Pay_Parity],
iss_personal_detail.UPSC_Rank,
iss_personal_detail.dob,
iss_personal_detail.doj_govt,
iss_personal_detail.DOA_ISS,
iss_personal_detail.Batch,
iss_personal_detail.Year_of_Exam,
iss_personal_detail.Native_Distt,
iss_personal_detail.Native_State,
iss_personal_detail.[Highest Qualification],
iss_personal_detail.Languages_Known,
iss_personal_detail.Mother_Toung,
iss_personal_detail.Marital_Status,
iss_personal_detail.E_mail_ID,
iss_personal_detail.retire_reason,
iss_personal_detail.title_m,
Present_Posting.*,
ISS_MINISTRY_CODE_LIST.*,
ISS_DEPARTMENT_CODE_LIST.*,
ISS_CITY_CODE_LIST.*,
Desig_Code.*,
Grade_Code.Grade_code
FROM ISS_CITY_CODE_LIST
INNER JOIN( Grade_Code
INNER JOIN (Desig_Code
INNER JOIN (((iss_personal_detail
INNER JOIN Present_Posting
ON iss_personal_detail.OID = Present_Posting.OID)
INNER JOIN ISS_MINISTRY_CODE_LIST
ON Present_Posting.ministry = ISS_MINISTRY_CODE_LIST.MINISTRY_CODE)
INNER JOIN ISS_DEPARTMENT_CODE_LIST
ON Present_Posting.department = ISS_DEPARTMENT_CODE_LIST.DEPARTMENT_CODE)
ON Desig_Code.Code = Present_Posting.designation)
ON Grade_Code.Grade_code = Present_Posting.Grade)
ON ISS_CITY_CODE_LIST.city_code=Present_Posting.office_city
)) AS emp_details_full1 ON
(emp_details_full1.DEPARTMENT_CODE=Table1.department) AND
(emp_details_full1.MINISTRY_CODE=Table1.ministry) AND
(emp_details_full1.city_code=Table1.city) AND
(emp_details_full1.Grade_Code=Table1.grade)
WHERE Table1.grade='02';
The first thing I would do is take the inner select and create a view from it.
This will provide a simple, easy to read, easy to debug sql code.
CREATE VIEW emp_details_full1
AS
SELECT iss_personal_detail.Specialization, iss_personal_detail.New_rank,
iss_personal_detail.Induction_tr, iss_personal_detail.Title,
iss_personal_detail.f_name, iss_personal_detail.m_name,
iss_personal_detail.l_name, iss_personal_detail.Father_Hus_Name,
iss_personal_detail.Category, iss_personal_detail.Community,
iss_personal_detail.SEX, iss_personal_detail.source_recruit,
iss_personal_detail.Pay_Parity, iss_personal_detail.[Date_Pay_ Parity],
iss_personal_detail.UPSC_Rank, iss_personal_detail.dob,
iss_personal_detail.doj_govt, iss_personal_detail.DOA_ISS,
iss_personal_detail.Batch, iss_personal_detail.Year_of_Exam,
iss_personal_detail.Native_Distt, iss_personal_detail.Native_State,
iss_personal_detail.[Highest Qualification],
iss_personal_detail.Languages_Known,
iss_personal_detail.Mother_Toung, iss_personal_detail.Marital_Status,
iss_personal_detail.E_mail_ID, iss_personal_detail.retire_reason,
iss_personal_detail.title_m, Present_Posting., ISS_MINISTRY_CODE_LIST.,
ISS_DEPARTMENT_CODE_LIST., ISS_CITY_CODE_LIST., Desig_Code.*,
Grade_Code.Grade_code
FROM ISS_CITY_CODE_LIST INNER JOIN( Grade_Code INNER JOIN (Desig_Code INNER JOIN
(((iss_personal_detail INNER JOIN Present_Posting ON iss_personal_detail.OID =
Present_Posting.OID) INNER JOIN ISS_MINISTRY_CODE_LIST
ON Present_Posting.ministry = ISS_MINISTRY_CODE_LIST.MINISTRY_CODE)
INNER JOIN ISS_DEPARTMENT_CODE_LIST ON
Present_Posting.department = ISS_DEPARTMENT_CODE_LIST.DEPARTMENT_CODE) ON
Desig_Code.Code = Present_Posting.designation) ON Grade_Code.Grade_code =
Present_Posting.Grade)
ON ISS_CITY_CODE_LIST.city_code=Present_Posting.office_city
Then the rest of the sql would look like this:
SELECT Table1.,emp_details_full1.
FROM Table1 LEFT JOIN emp_details_full1
ON (emp_details_full1.DEPARTMENT_CODE=Table1.department) AND
(emp_details_full1.MINISTRY_CODE=Table1.ministry) AND
(emp_details_full1.city_code=Table1.city) AND
(emp_details_full1.Grade_Code=Table1.grade) WHERE Table1.grade='02';
Now, if you look closely, you can see that there is a closing bracket missing between the end of the ON clause and the start of the WHERE clause.
So to fix that:
SELECT Table1.,emp_details_full1.
FROM Table1 LEFT JOIN emp_details_full1
ON (emp_details_full1.DEPARTMENT_CODE=Table1.department) AND
(emp_details_full1.MINISTRY_CODE=Table1.ministry) AND
(emp_details_full1.city_code=Table1.city) AND
(emp_details_full1.Grade_Code=Table1.grade)) WHERE Table1.grade='02';
Isn't that much easier to work with?

Using left join and inner join in the same query

Below is my query using a left join that works as expected. What I want to do is add another table filter this query ever further but having trouble doing so. I will call this new table table_3 and want to add where table_3.rwykey = runways_updatable.rwykey. Any help would be very much appreciated.
SELECT *
FROM RUNWAYS_UPDATABLE
LEFT JOIN TURN_UPDATABLE
ON RUNWAYS_UPDATABLE.RWYKEY = TURN_UPDATABLE.RWYKEY
WHERE RUNWAYS_UPDATABLE.ICAO = 'ICAO'
AND (RUNWAYS_UPDATABLE.TORA > 4000 OR LDA > 0)
AND (TURN_UPDATABLE.AIRLINE_CODE IS NULL OR TURN_UPDATABLE.AIRLINE_CODE = ''
OR TURN_UPDATABLE.AIRLINE_CODE = '')
'*************EDIT To CLARIFY *****************
Here is the other statement that inner join i would like to use and I would like to combine these 2 statements.
SELECT *
FROM RUNWAYS_UPDATABLE A, RUNWAYS_TABLE B
WHERE A.RWYKEY = B.RWYKEY
'***What I have so far as advice taken below, but getting syntax error
SELECT RUNWAYS_UPDATABLE.*, TURN_UPDATABLE.*, AIRPORT_RUNWAYS_SELECTED.*
FROM RUNWAYS_UPDATABLE
INNER JOIN AIRPORT_RUNWAYS_SELECTED
ON RUNWAYS_UPDATABLE.RWYKEY = AIRPORT_RUNWAYS_SELECTED.RWYKEY
LEFT JOIN TURN_UPDATABLE
ON RUNWAYS_UPDATABLE.RWYKEY = TURN_UPDATABLE.RWYKEY
NOTE: If i comment out the inner join and leave the left join or vice versa, it works but when I have both of joins in the query, thats when im getting the syntax error.
I always come across this question when searching for how to make LEFT JOIN depend on a further INNER JOIN. Here is an example for what I am searching when I am searching for "using LEFT JOIN and INNER JOIN in the same query":
SELECT *
FROM foo f1
LEFT JOIN (bar b1
INNER JOIN baz b2 ON b2.id = b1.baz_id
) ON
b1.id = f1.bar_id
In this example, b1 will only be included if b2 is also found.
Remember that filtering a right-side table in left join should be done in join itself.
select *
from table1
left join table2
on table1.FK_table2 = table2.id
and table2.class = 'HIGH'
I finally figured it out. Thanks for all your help!!!
SELECT * FROM
(AIRPORT_RUNWAYS_SELECTED
INNER JOIN RUNWAYS_UPDATABLE
ON AIRPORT_RUNWAYS_SELECTED.RWYKEY = RUNWAYS_UPDATABLE.RWYKEY)
LEFT JOIN TURN_UPDATABLE ON RUNWAYS_UPDATABLE.RWYKEY = TURN_UPDATABLE.RWYKEY
Add your INNER_JOIN before your LEFT JOIN:
SELECT *
FROM runways_updatable ru
INNER JOIN table_3 t3 ON ru.rwykey = t3.rwykey
LEFT JOIN turn_updatable tu
ON ru.rwykey = tu.rwykey
AND (tu.airline_code IS NULL OR tu.airline_code = '' OR tu.airline_code = '')
WHERE ru.icao = 'ICAO'
AND (ru.tora > 4000 OR ru.lda > 0)
If you LEFT JOIN before your INNER JOIN, then you will not get results from table_3 if there is no matching row in turn_updatable. It's possible this is what you want, but since your join condition for table_3 only references runways_updatable, I would assume that you want a result from table_3, even if there isn't a matching row in turn_updatable.
EDIT:
As #NikolaMarkovinović pointed out, you should filter your LEFT JOIN in the join condition itself, as you see above. Otherwise, you will not get results from the left-side table (runways_updatable) if that condition isn't met in the right-side table (turn_updatable).
EDIT 2: OP mentioned this is actually Access, and not MySQL
In Access, perhaps it's a difference in the table aliases. Try this instead:
SELECT [ru].*, [tu].*, [ars].*
FROM [runways_updatable] AS [ru]
INNER JOIN [airport_runways_selected] AS [ars] ON [ru].rwykey = [ars].rwykey
LEFT JOIN [turn_updatable] AS [tu]
ON [ru].rwykey = [tu].rwykey
AND ([tu].airline_code IS NULL OR [tu].airline_code = '' OR [tu].airline_code = '')
WHERE [ru].icao = 'ICAO'
AND ([ru].tora > 4000 OR [ru].lda > 0)
If it is just an inner join that you want to add, then do this. You can add as many joins as you want in the same query. Please update your answer if this is not what you want, though
SELECT *
FROM RUNWAYS_UPDATABLE
LEFT JOIN TURN_UPDATABLE
ON RUNWAYS_UPDATABLE.RWYKEY = TURN_UPDATABLE.RWYKEY
INNER JOIN table_3
ON table_3.rwykey = runways_updatable.rwykey
WHERE RUNWAYS_UPDATABLE.ICAO = 'ICAO'
AND (RUNWAYS_UPDATABLE.TORA > 4000 OR LDA > 0)
AND (TURN_UPDATABLE.AIRLINE_CODE IS NULL OR TURN_UPDATABLE.AIRLINE_CODE = ''
OR TURN_UPDATABLE.AIRLINE_CODE = '')
I am not really sure what you want. But maybe something like this:
SELECT RUNWAYS_UPDATABLE.*, TURN_UPDATABLE.*
FROM RUNWAYS_UPDATABLE
JOIN table_3
ON table_3.rwykey = runways_updatable.rwykey
LEFT JOIN TURN_UPDATABLE
ON RUNWAYS_UPDATABLE.RWYKEY = TURN_UPDATABLE.RWYKEY
WHERE RUNWAYS_UPDATABLE.ICAO = 'ICAO'
AND (RUNWAYS_UPDATABLE.TORA > 4000 OR LDA > 0)
AND (TURN_UPDATABLE.AIRLINE_CODE IS NULL OR TURN_UPDATABLE.AIRLINE_CODE = ''
OR TURN_UPDATABLE.AIRLINE_CODE = '')
For Postgres, query planner does not guarantee order of execution of join. To Guarantee one can use #Gajus solution but the problem arises if there are Where condition for inner join table's column(s). Either one would to require to carefully add the where clauses in the respective Join condition or otherwise it is better to use subquery the inner join part, and left join the output.

Super Slow Query - sped up, but not perfect... Please help

I posted a query yesterday (see here) that was horrible (took over a minute to run, resulting in 18,215 records):
SELECT DISTINCT
dbo.contacts_link_emails.Email, dbo.contacts.ContactID, dbo.contacts.First AS ContactFirstName, dbo.contacts.Last AS ContactLastName, dbo.contacts.InstitutionID,
dbo.institutionswithzipcodesadditional.CountyID, dbo.institutionswithzipcodesadditional.StateID, dbo.institutionswithzipcodesadditional.DistrictID
FROM
dbo.contacts_def_jobfunctions AS contacts_def_jobfunctions_3
INNER JOIN
dbo.contacts
INNER JOIN
dbo.contacts_link_emails
ON dbo.contacts.ContactID = dbo.contacts_link_emails.ContactID
ON contacts_def_jobfunctions_3.JobID = dbo.contacts.JobTitle
INNER JOIN
dbo.institutionswithzipcodesadditional
ON dbo.contacts.InstitutionID = dbo.institutionswithzipcodesadditional.InstitutionID
LEFT OUTER JOIN
dbo.contacts_def_jobfunctions
INNER JOIN
dbo.contacts_link_jobfunctions
ON dbo.contacts_def_jobfunctions.JobID = dbo.contacts_link_jobfunctions.JobID
ON dbo.contacts.ContactID = dbo.contacts_link_jobfunctions.ContactID
WHERE
(dbo.contacts.JobTitle IN
(SELECT JobID
FROM dbo.contacts_def_jobfunctions AS contacts_def_jobfunctions_1
WHERE (ParentJobID <> '1841')))
AND
(dbo.contacts_link_emails.Email NOT IN
(SELECT EmailAddress
FROM dbo.newsletterremovelist))
OR
(dbo.contacts_link_jobfunctions.JobID IN
(SELECT JobID
FROM dbo.contacts_def_jobfunctions AS contacts_def_jobfunctions_2
WHERE (ParentJobID <> '1841')))
AND
(dbo.contacts_link_emails.Email NOT IN
(SELECT EmailAddress
FROM dbo.newsletterremovelist AS newsletterremovelist))
ORDER BY EMAIL
With a lot of coaching and research, I've tuned it up to the following:
SELECT contacts.ContactID,
contacts.InstitutionID,
contacts.First,
contacts.Last,
institutionswithzipcodesadditional.CountyID,
institutionswithzipcodesadditional.StateID,
institutionswithzipcodesadditional.DistrictID
FROM contacts
INNER JOIN contacts_link_emails ON
contacts.ContactID = contacts_link_emails.ContactID
INNER JOIN institutionswithzipcodesadditional ON
contacts.InstitutionID = institutionswithzipcodesadditional.InstitutionID
WHERE
(contacts.ContactID IN
(SELECT contacts_2.ContactID
FROM contacts AS contacts_2
INNER JOIN contacts_link_emails AS contacts_link_emails_2 ON
contacts_2.ContactID = contacts_link_emails_2.ContactID
LEFT OUTER JOIN contacts_def_jobfunctions ON
contacts_2.JobTitle = contacts_def_jobfunctions.JobID
RIGHT OUTER JOIN newsletterremovelist ON
contacts_link_emails_2.Email = newsletterremovelist.EmailAddress
WHERE (contacts_def_jobfunctions.ParentJobID <> 1841)
GROUP BY contacts_2.ContactID
UNION
SELECT contacts_1.ContactID
FROM contacts_link_jobfunctions
INNER JOIN contacts_def_jobfunctions AS contacts_def_jobfunctions_1 ON
contacts_link_jobfunctions.JobID = contacts_def_jobfunctions_1.JobID
AND contacts_def_jobfunctions_1.ParentJobID <> 1841
INNER JOIN contacts AS contacts_1 ON
contacts_link_jobfunctions.ContactID = contacts_1.ContactID
INNER JOIN contacts_link_emails AS contacts_link_emails_1 ON
contacts_link_emails_1.ContactID = contacts_1.ContactID
LEFT OUTER JOIN newsletterremovelist AS newsletterremovelist_1 ON
contacts_link_emails_1.Email = newsletterremovelist_1.EmailAddress
GROUP BY contacts_1.ContactID))
While this query is now super fast (about 3 seconds), I've blown part of the logic somewhere - it only returns 14,863 rows (instead of the 18,215 rows that I believe is accurate).
The results seem near correct. I'm working to discover what data might be missing in the result set.
Can you please coach me through whatever I've done wrong here?
Thanks,
Russell Schutte
The main problem with your original query was that you had two extra joins just to introduce duplicates and then a DISTINCT to get rid of them.
Use this:
SELECT cle.Email,
c.ContactID,
c.First AS ContactFirstName,
c.Last AS ContactLastName,
c.InstitutionID,
izip.CountyID,
izip.StateID,
izip.DistrictID
FROM dbo.contacts c
INNER JOIN
dbo.institutionswithzipcodesadditional izip
ON izip.InstitutionID = c.InstitutionID
INNER JOIN
dbo.contacts_link_emails cle
ON cle.ContactID = c.ContactID
WHERE cle.Email NOT IN
(
SELECT EmailAddress
FROM dbo.newsletterremovelist
)
AND EXISTS
(
SELECT NULL
FROM dbo.contacts_def_jobfunctions cdj
WHERE cdj.JobId = c.JobTitle
AND cdj.ParentJobId <> '1841'
UNION ALL
SELECT NULL
FROM dbo.contacts_link_jobfunctions clj
JOIN dbo.contacts_def_jobfunctions cdj
ON cdj.JobID = clj.JobID
WHERE clj.ContactID = c.ContactID
AND cdj.ParentJobId <> '1841'
)
ORDER BY
email
Create the following indexes:
newsletterremovelist (EmailAddress)
contacts_link_jobfunctions (ContactID, JobID)
contacts_def_jobfunctions (JobID)
Do you get the same results when you do:
SELECT count(*)
FROM
dbo.contacts_def_jobfunctions AS contacts_def_jobfunctions_3
INNER JOIN
dbo.contacts
INNER JOIN
dbo.contacts_link_emails
ON dbo.contacts.ContactID = dbo.contacts_link_emails.ContactID
ON contacts_def_jobfunctions_3.JobID = dbo.contacts.JobTitle
SELECT COUNT(*)
FROM
contacts
INNER JOIN contacts_link_jobfunctions
ON contacts.ContactID = contacts_link_jobfunctions.ContactID
INNER JOIN contacts_link_emails
ON contacts.ContactID = contacts_link_emails.ContactID
If so keep adding each join conditon on until you don't get the same results and you will see where your mistake was. If all the joins are the same, then look at the where clauses. But I will be surprised if it isn't in the first join because the syntax you have orginally won't even work on SQL Server and it is pretty nonstandard SQL and may have been incorrect all along but no one knew.
Alternatively, pick a few of the records that are returned in the orginal but not the revised. Track them through the tables one at a time to see if you can find why the second query filters them out.
I'm not directly sure what is wrong, but when I run in to this situation, the first thing I do is start removing variables.
So, comment out the where clause. How many rows are returned?
If you get back the 11,604 rows then you've isolated the problems to the joins. Work though the joins, commenting each one out (remove the associated columns too) and figure out how many rows are eliminated.
As you do this, aim to find what is causing the desired rows to be eliminated. Once isolated, consider the join differences between the first query and the second query.
In looking at the first query, you could probably just modify that to eliminate any INs and instead do a EXISTS instead.
Consider your indexes as well. Any thing in the where or join clauses should probably be indexed.