SQL join not returning the correct data - sql

I'm trying to build some test cases in SQL. I'm joining several tables with a left join, which, if my understanding is correct, should return both the matching and the non-matching rows. But when I try to get the rows that have no matching ID, nothing is returned. Here is the code:
select
E.ID_EVENEMENT,
t1.dateRecente as date_Star,
virage_c.no_contr,
t2.id as id_Univers,
t2.dateRecente as date_Univers,
t3.NUMERO_DOSSIER_STAR,
t3.dateRecente as Date_factcan,
t3.dateLimite,
t2.ID_PERSONNE_UNIVERS,
t1.ID_PERSONNE_STAR
FROM
STAR.EVENEMENT e
left join VIRAGE.CONTRAT virage_c
on e.NO_CONTRAT_OFFICIEL = to_char(virage_c.no_contr)
inner join
( SELECT
e.ID_EVENEMENT,
ep.ID_EVEN_INDIVIDU as ID_PERSONNE_STAR,
GREATEST( e.dt_creation,
NVL(e.DT_MODIF_STA_ELI, TO_DATE(1,'j')),
NVL(max(nt.dt_maj), TO_DATE(1,'j')),
NVL(max(ser.DT_CREATION), TO_DATE(1,'j')),
NVL(max(SER.DT_MAJ), TO_DATE(1,'j')),
NVL(max(aut.dt_transmis), TO_DATE(1,'j')),
NVL(max(AUT.DT_CREATION), TO_DATE(1,'j')) ) dateRecente
FROM
STAR.EVENEMENT e
left join STAR.Note nt
on e.ID_EVENEMENT = nt.ID_EVEN
left join STAR.SERVICE ser
on e.ID_EVENEMENT = ser.ID_EVEN
left join STAR.AUTORISATION aut
on ser.id_service = aut.id_service
left join STAR.DOCUMENT doc
on e.ID_EVENEMENT = doc.ID_EVENEMENT
left join STAR.ETAT ett
on e.ID_EVENEMENT = ett.ID_EVENEMENT
left join STAR.EVENEMENT_PARTICIPANT ep
on e.ID_EVENEMENT = ep.ID_EVENEMENT
GROUP BY
e.ID_EVENEMENT,
e.dt_creation,
e.DT_MODIF_STA_ELI,
ep.ID_EVEN_INDIVIDU ) t1
on t1.ID_EVENEMENT = E.ID_EVENEMENT
left JOIN
( SELECT
sf.STAREVENTNUMBER,
c.id,
par.ID as ID_PERSONNE_UNIVERS,
GREATEST( c.UPDATEDATE,
max(NVL(ca.UPDATEDATE, TO_DATE(1, 'J'))),
max(NVL(bo.UPDATEDATE, TO_DATE(1, 'J'))),
max(NVL(a.UPDATEDATE, TO_DATE(1, 'J'))),
max(NVL(p.UPDATEDATE, TO_DATE(1, 'J'))),
max(NVL(p.RELEASEDATE, TO_DATE(1, 'J'))),
max(NVL(p.PAYMENTDATE, TO_DATE(1, 'J'))) ) dateRecente
FROM
CV_CLAIMS_TRAVEL.STAR_FILE sf
join CV_CLAIMS_TRAVEL.CLAIM c
on sf.claimid = c.id
left join CV_CLAIMS_TRAVEL.BENEFIT_OPTION bo
on c.id = BO.CLAIMID
left join CV_CLAIMS_TRAVEL.ADJUDICATION a
on bo.id = a.BENEFITOPTIONID
left join CV_CLAIMS_TRAVEL.CLAIM_ACTIVITY ca
on c.id = CA.CLAIMID
left join CV_CLAIMS_TRAVEL.CLAIM_RELATIONSHIP cr
on c.id = CR.CLAIMID
left join CV_CLAIMS_TRAVEL.PAYEE pa
on cr.id = PA.CLAIMRELATIONSHIPID
left join CV_CLAIMS_TRAVEL.PAYMENT p
on pa.id = P.PAYEEID
left join CV_CLAIMS_TRAVEL.PARTY par
on par.ID = sf.PARTYID
WHERE
c.PRIMARYSTATUSLID in ('CLAIM_PRIMARY_STATUS:0000000003','CLAIM_PRIMARY_STATUS:0000000001')
OR ( c.PRIMARYSTATUSLID = 'CLAIM_PRIMARY_STATUS:0000000004'
AND bo.BENEFITOPTIONSTATUSLID in ('BENEFIT_OPTION_STATUS:0000000010',
'BENEFIT_OPTION_STATUS:0000000060',
'BENEFIT_OPTION_STATUS:0000000030')
)
group by
sf.STAREVENTNUMBER,
c.id,
c.UPDATEDATE,
par.ID ) t2
on e.ID_EVENEMENT = t2.STAREVENTNUMBER
LEFT JOIN
( SELECT DISTINCT
fact.NUMERO_DOSSIER_STAR,
GREATEST( to_date(fact.VSTDTCHG,'yyyymmdd'),
to_date(fact.VSTDICHK,'yyyymmdd'),
to_date(decode(fact.VSTDIFIN,0,19000101,
decode(substr(fact.VSTDIFIN,5),
'0230', substr(fact.VSTDIFIN,1,4)
|| '0301',fact.VSTDIFIN)),'yyyymmdd') ) DateRecente,
DECODE( virage_cont.id_cont,
null,add_months(sysdate,-7*12),
add_months(sysdate,-15*12)) dateLimite
FROM
FACTCAN.XC4DSAV fact
LEFT JOIN VIRAGE.CONTRAT virage_cont
on fact.VSTNOCNT_VIRAGE = virage_cont.NO_CONTR
where
fact.NUMERO_DOSSIER_STAR is not null ) t3
on e.ID_EVENEMENT = t3.NUMERO_DOSSIER_STAR
WHERE
t1.dateRecente < add_months(sysdate, -7*12)
AND t2.dateRecente < add_months(sysdate, -7*12)
AND virage_c.id_cont is null
AND t2.ID_PERSONNE_UNIVERS is not null
AND t1.ID_PERSONNE_STAR is null
AND t3.dateRecente < t3.dateLimite
FETCH
FIRST 1000 ROWS ONLY;
When I try to get the results for an event where either t1.ID_PERSONNE_STAR or t2.ID_PERSONNE_UNIVERS is null, the query returns nothing when it should in fact return some data. The "is not null" test works as intended, though. Any ideas?

I updated the readability of the query. Also, as noted by others, you have to be careful about the WHERE clause. The issue you are probably running into is that the WHERE conditions on T1, T2 and T3 apply to left-joined tables. Move them up into those joins. For example,
where you have
left join (rest of the left-join subquery with alias ) t1
on t1.ID_EVENEMENT = E.ID_EVENEMENT
change to
on t1.ID_EVENEMENT = E.ID_EVENEMENT
AND t1.dateRecente < add_months(sysdate, -7*12)
This way, the date requirement is part of the LEFT JOIN. Having it in the WHERE clause instead discards every row where T1 is NULL, which effectively turns the LEFT JOIN into an INNER JOIN.
You can leave the T1 "IS NULL" test in the WHERE clause, because there you are explicitly expecting no matching record.
Check for similar situations in your other left joins: move the criteria to the JOIN/ON clause and remove them from the WHERE. The WHERE clause should contain the "IS NULL" tests for the final expectation.
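To make the difference concrete, here is a minimal sketch using Python's built-in sqlite3 module. The tables event and detail, their columns and the cutoff date are invented for illustration and are not the schema from the question; the point is only to compare the same date filter placed in WHERE and in ON.

```python
import sqlite3

def run(sql):
    """Run one query against a tiny throwaway in-memory database."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE event (id INTEGER PRIMARY KEY);
        CREATE TABLE detail (event_id INTEGER, recent_date TEXT);
        INSERT INTO event VALUES (1), (2), (3);
        -- event 1 has an old detail, event 2 a recent one, event 3 none
        INSERT INTO detail VALUES (1, '2010-01-01'), (2, '2024-01-01');
    """)
    rows = con.execute(sql).fetchall()
    con.close()
    return rows

# Filter in WHERE: rows with a NULL detail fail the comparison and vanish,
# so the LEFT JOIN behaves like an INNER JOIN.
FILTER_IN_WHERE = """
    SELECT e.id, d.recent_date
    FROM event e
    LEFT JOIN detail d ON d.event_id = e.id
    WHERE d.recent_date < '2015-01-01'
    ORDER BY e.id
"""

# Filter in ON: every event is kept; details that fail the filter become NULL.
FILTER_IN_ON = """
    SELECT e.id, d.recent_date
    FROM event e
    LEFT JOIN detail d ON d.event_id = e.id
                      AND d.recent_date < '2015-01-01'
    ORDER BY e.id
"""

# Filter in ON plus an IS NULL test in WHERE: only the events without
# a qualifying detail remain.
ONLY_UNMATCHED = """
    SELECT e.id
    FROM event e
    LEFT JOIN detail d ON d.event_id = e.id
                      AND d.recent_date < '2015-01-01'
    WHERE d.event_id IS NULL
    ORDER BY e.id
"""

print(run(FILTER_IN_WHERE))
print(run(FILTER_IN_ON))
print(run(ONLY_UNMATCHED))
```

The third query is the pattern described above: the criteria live in the ON clause and the WHERE clause holds only the IS NULL test.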

Related

why selecting particular columns from same table slows down query performance significantly?

I have a SELECT statement that queries columns from tblQuotes. When I select the columns a.ProducerCompositeCommission and a.CompanyCompositeCommission, the query spins forever.
The execution plans with and without those columns are IDENTICAL!
If I comment them out, it returns the result in 1 second.
SELECT
a.stateid risk_state1,
-- these columns slow down performance
a.ProducerCompositeCommission,
a.CompanyCompositeCommission,
GETDATE() runDate
FROM
tblQuotes a
INNER JOIN
lstlines l ON a.LineGUID = l.LineGUID
INNER JOIN
tblSubmissionGroup tsg ON tsg.SubmissionGroupGUID = a.SubmissionGroupGuid
INNER JOIN
tblUsers u ON u.UserGuid = tsg.UnderwriterUserGuid
INNER JOIN
tblUsers u2 ON u2.UserGuid = a.UnderwriterUserGuid
LEFT OUTER JOIN
tblFin_Invoices tfi ON tfi.QuoteID = a.QuoteID AND tfi.failed <> 1
INNER JOIN
lstPolicyTypes lpt ON lpt.policytypeid = a.policytypeid
INNER JOIN
tblproducercontacts prodC ON prodC.producercontactguid = a.producercontactguid
INNER JOIN
tblProducerLocations pl ON pl.producerlocationguid = prodc.producerlocationguid
INNER JOIN
tblproducers prod ON prod.ProducerGUID = pl.ProducerGUID
LEFT OUTER JOIN
Catalytic_tbl_Model_Analysis aia ON aia.ImsControl = a.controlno
AND aia.analysisid = (SELECT TOP 1 tma2.analysisid
FROM Catalytic_tbl_Model_Analysis tma2
WHERE tma2.imscontrol = a.controlno)
LEFT OUTER JOIN
Catalytic_tbl_RDR_Analysis rdr ON rdr.ImsControl = a.controlno
AND rdr.analysisid = (SELECT TOP 1 tma2.analysisid
FROM Catalytic_tbl_RDR_Analysis tma2
WHERE tma2.imscontrol = a.controlno)
LEFT OUTER JOIN
tblProducerContacts mnged ON mnged.producercontactguid = ProdC.ManagedBy
LEFT OUTER JOIN
lstQuoteStatusReasons r1 ON r1.id = a.QuoteStatusReasonID
WHERE
l.LineName = 'EARTHQUAKE'
AND CAST(a.EffectiveDate AS DATE) >= CAST('2017-01-01' AS DATE)
AND CAST(a.EffectiveDate AS DATE) <= CAST('2017-12-31' AS DATE)
ORDER BY
a.effectiveDate
The execution plan can be found here:
https://www.brentozar.com/pastetheplan/?id=rJawDkTx-
I ran sp_help and this is what I see:
What exactly is wrong with those columns?
I don't use them in a JOIN or anything. Why such behaviour?
Table Size:
Indexes on table tblQuotes

Problems with Sql query join

I am struggling with a SQL query. I want to include the sum from another table.
SELECT DISTINCT
tblProject.CompanyID,
tblCompany.Name,
tblCompany.AvtalsKund,
tblProject.ProjectName,
tblProject.Estimate,
tblProject.ProjectStart,
tblProject.Deadline,
CONVERT(VARCHAR(8), tblProject.Deadline, 2) AS [YY.MM.DD] ,
tblProject.PreOffered,
tblProject.ProjectType,
tblProjectType.ProjType,
tblOrdered.FirstName + + tblOrdered.LastName as OrderedFullName,
tblProject.ProjectID,
tblProject.RegDate,
tblProject.ProjectNr,
tblProject.ProjectNr
FROM tblProject
INNER JOIN tblCompany ON tblProject.CompanyID = tblCompany.CompanyID
---> INNER JOIN (SELECT tblTimeRecord.ProjectID, SUM(CONVERT(float,replace([Hours],',','') ))
FROM tblTimeRecord group by tblTimeRecord.ProjectID) as b
ON b.ProjectID = tblProject.ProjectID
INNER JOIN tblTimeRecord ON tblTimeRecord.ProjectID = tblProject.ProjectID
INNER JOIN tblProjectType ON tblProject.ProjectType = tblProjectType.ProjTypeID
LEFT OUTER JOIN tblOrdered ON tblProject.OrderedBy = tblOrdered.OrderedID
LEFT OUTER JOIN tblRel_WorkerProject ON tblProject.ProjectID = tblRel_WorkerProject.ProjectID
LEFT OUTER JOIN tblPerson ON tblPerson.PersonID = tblRel_WorkerProject.WorkerID
LEFT OUTER JOIN tblRel_StatusWorkerProject ON tblProject.ProjectID = tblRel_StatusWorkerProject.ProjectID
I want to include this sum block from the table tblTimeRecord.
I get the sum of the time reports with this code:
SELECT tblTimeRecord.ProjectID,
SUM(CONVERT(float,replace([Hours],',','') ))
FROM tblTimeRecord where ProjectID=1312 group by tblTimeRecord.ProjectID
I guess I should do it in a join?
Got it working with this:
SELECT DISTINCT
tblProject.ProjectID,
Summa,
tblProject.CompanyID,
tblCompany.Name,
tblCompany.AvtalsKund,
tblProject.ProjectName,
tblProject.Estimate,
tblProject.ProjectStart,
tblProject.Deadline,
CONVERT(VARCHAR(8), tblProject.Deadline, 2) AS [YY.MM.DD] ,
tblProject.PreOffered,
tblProject.ProjectType,
tblProjectType.ProjType,
tblOrdered.FirstName + + tblOrdered.LastName as OrderedFullName,
tblProject.ProjectID,
tblProject.RegDate,
tblProject.ProjectNr,
tblProject.ProjectNr
FROM tblProject
INNER JOIN tblCompany ON tblProject.CompanyID = tblCompany.CompanyID
INNER JOIN (SELECT tblTimeRecord.ProjectID, SUM(CONVERT(float,replace([Hours],',','') )) as Summa FROM tblTimeRecord group by tblTimeRecord.ProjectID) as b
ON b.ProjectID = tblProject.ProjectID
INNER JOIN tblTimeRecord ON tblTimeRecord.ProjectID = tblProject.ProjectID
INNER JOIN tblProjectType ON tblProject.ProjectType = tblProjectType.ProjTypeID
LEFT OUTER JOIN tblOrdered ON tblProject.OrderedBy = tblOrdered.OrderedID
LEFT OUTER JOIN tblRel_WorkerProject ON tblProject.ProjectID = tblRel_WorkerProject.ProjectID
LEFT OUTER JOIN tblPerson ON tblPerson.PersonID = tblRel_WorkerProject.WorkerID
LEFT OUTER JOIN tblRel_StatusWorkerProject ON tblProject.ProjectID = tblRel_StatusWorkerProject.ProjectID
There are two ways to do this.
You can use a WITH clause to create the aggregate table then join this to the main query.
Or do it this way:
SELECT m.BLAH
,m.FOO
,x.AMOUNT
FROM MAINTABLE m
LEFT JOIN
(
SELECT FOO
,SUM(AMOUNT) as AMOUNT
FROM OTHERTABLE
GROUP BY FOO
) x
ON m.FOO = x.FOO
I prefer the second way.
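Both approaches can be tried side by side. The sketch below uses Python's built-in sqlite3 module with two made-up tables, project and time_record, standing in for tblProject and tblTimeRecord; the column names and data are assumptions for illustration only.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE project (project_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE time_record (project_id INTEGER, hours REAL);
    INSERT INTO project VALUES (1, 'Alpha'), (2, 'Beta');
    -- three time records for Alpha, none for Beta
    INSERT INTO time_record VALUES (1, 2.5), (1, 4.0), (1, 1.5);
""")

# Way 1: aggregate inside a derived table, then join it.
DERIVED = """
    SELECT p.name, x.total_hours
    FROM project p
    LEFT JOIN (
        SELECT project_id, SUM(hours) AS total_hours
        FROM time_record
        GROUP BY project_id
    ) x ON x.project_id = p.project_id
    ORDER BY p.project_id
"""

# Way 2: the same aggregate expressed as a WITH clause (CTE).
CTE = """
    WITH totals AS (
        SELECT project_id, SUM(hours) AS total_hours
        FROM time_record
        GROUP BY project_id
    )
    SELECT p.name, t.total_hours
    FROM project p
    LEFT JOIN totals t ON t.project_id = p.project_id
    ORDER BY p.project_id
"""

derived_rows = con.execute(DERIVED).fetchall()
cte_rows = con.execute(CTE).fetchall()
print(derived_rows)
print(cte_rows)
```

Because the aggregate already yields one row per project, each project appears once without DISTINCT. If the raw time-record table is also joined directly, every project row repeats once per time record, so that extra join is worth checking when a query needs DISTINCT to look right.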

How can I remove a sub-query from a NOT EXISTS contained in an INNER JOIN?

Due to rules set by third-party software, I need to remove the sub-query from the following code:
SELECT
1
FROM
factAttempt fact
INNER JOIN dimActivity act ON act.ID = fact.ActivityID
INNER JOIN dimUser emp ON emp.ID = fact.UserID
INNER JOIN Iwc_Usr IUser ON IUser.Usr_empFK = emp.EmpFK
INNER JOIN dimActivity class ON (
(class.ActivityFK = act.PrntActFK)
OR (
NOT EXISTS (
SELECT 1
FROM TBL_TMX_activity act1
WHERE act1.PrntActFK = Class.ActivityFK
)
AND
Class.ActivityFK = act.ActivityFK
)
)
AND class.ActivityName = act.ActivityName
I have tried using a Boolean (bit) scalar variable to replace it, but although it runs, the wrong results are returned. Since I don't know much SQL, I haven't been able to find anything else yet.
I'm using Microsoft SQL Server 2012, if that's useful.
Thanks for the help.
You can move the table from the EXISTS subquery into a LEFT JOIN. The equivalent of NOT EXISTS is then checking that a field from the right side of the join is NULL.
SELECT
1
FROM
factAttempt fact
INNER JOIN dimActivity act ON act.ID = fact.ActivityID
INNER JOIN dimUser emp ON emp.ID = fact.UserID
INNER JOIN Iwc_Usr IUser ON IUser.Usr_empFK = emp.EmpFK
INNER JOIN dimActivity class
LEFT JOIN TBL_TMX_activity act1
ON act1.PrntActFK = class.ActivityFK
ON
(
(
class.ActivityFK = act.PrntActFK
OR
(act1.PrntActFK IS NULL -- equivalent of NOT EXISTS
AND Class.ActivityFK = act.ActivityFK)
)
AND class.ActivityName = act.ActivityName
)
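The equivalence can be checked on a small example. This is a sketch with Python's built-in sqlite3 module and a single invented table, activity(id, parent_id), rather than the tables from the question; it compares NOT EXISTS with the LEFT JOIN plus IS NULL form for "activities that have no children".

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE activity (id INTEGER PRIMARY KEY, parent_id INTEGER);
    -- activity 1 is the parent of 2 and 3; activities 2, 3 and 4 have no children
    INSERT INTO activity VALUES (1, NULL), (2, 1), (3, 1), (4, NULL);
""")

# Original style: correlated NOT EXISTS subquery.
NOT_EXISTS = """
    SELECT a.id
    FROM activity a
    WHERE NOT EXISTS (SELECT 1 FROM activity c WHERE c.parent_id = a.id)
    ORDER BY a.id
"""

# Subquery-free style: LEFT JOIN the child rows and keep the parents for
# which no child was found. The tested column must never be NULL in a real
# match; here c.id is the primary key.
LEFT_JOIN_NULL = """
    SELECT a.id
    FROM activity a
    LEFT JOIN activity c ON c.parent_id = a.id
    WHERE c.id IS NULL
    ORDER BY a.id
"""

not_exists_rows = con.execute(NOT_EXISTS).fetchall()
left_join_rows = con.execute(LEFT_JOIN_NULL).fetchall()
print(not_exists_rows, left_join_rows)
```

One thing to verify on real data: when the IS NULL test is only one branch of an OR, as in the query above, the other branch can be true while the left join finds several child rows, and the result then contains repeated rows that the NOT EXISTS version would not produce.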

SQL - Join issue

I have the following query:
SELECT rt.ID, rt.Name, rt.Rate, rt.Colour, vtb.ID AS 'vtbID', vtb.Value, rt.StdID
FROM Rates AS rt
LEFT OUTER JOIN VehicleTypeCostsBreakdown AS vtb ON rt.ID = vtb.RateID
LEFT OUTER JOIN VehicleTypeCostsDepots AS vtd ON vtd.ID = vtb.VehicleTypeDepotID AND vtd.DepotID = #DepotID AND vtd.VehicleTypeID = #VehicleTypeID
Basically, I want to select all rates from the Rates table, but if any reference to a rate exists in the 'vtd' table with parameters that match #DepotID and #VehicleTypeID, I want to bring back the Value for that. If there is no such reference, I want the 'vtb.Value' selection to be blank.
With the SQL above, it seems to always return a value for 'vtb.Value', even if the parameters are null. Am I missing something?
Try it this way. Basically, you'll LEFT JOIN to the derived table formed by the INNER JOIN between VehicleTypeCostsBreakdown and VehicleTypeCostsDepots. The INNER JOIN will only match when all of your conditions are true.
SELECT rt.ID, rt.Name, rt.Rate, rt.Colour, vtb.ID AS 'vtbID', vtb.Value, rt.StdID
FROM Rates AS rt
LEFT OUTER JOIN VehicleTypeCostsBreakdown AS vtb
INNER JOIN VehicleTypeCostsDepots AS vtd
ON vtd.ID = vtb.VehicleTypeDepotID
AND vtd.DepotID = #DepotID
AND vtd.VehicleTypeID = #VehicleTypeID
ON rt.ID = vtb.RateID
Try:
SELECT rt.ID, rt.Name, rt.Rate, rt.Colour, vtb.ID AS 'vtbID', vtb.Value, rt.StdID
FROM Rates AS rt
LEFT OUTER JOIN (SELECT b.ID, b.Value, b.RateID
FROM VehicleTypeCostsBreakdown AS b
JOIN VehicleTypeCostsDepots AS d
ON d.ID = b.VehicleTypeDepotID AND
d.DepotID = #DepotID AND
d.VehicleTypeID = #VehicleTypeID)
AS vtb ON rt.ID = vtb.RateID
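The derived-table form can be tried with a small sketch in Python's built-in sqlite3 module. The tables rates, depots and breakdown, their columns and the sample values are simplified stand-ins invented for illustration, and the two parameters are passed as sqlite3 `?` placeholders.

```python
import sqlite3

def rates_for(depot_id, vehicle_type_id):
    """Return every rate, with a value only when a breakdown row exists
    for the given depot and vehicle type (hypothetical schema)."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE rates (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE depots (id INTEGER PRIMARY KEY,
                             depot_id INTEGER, vehicle_type_id INTEGER);
        CREATE TABLE breakdown (id INTEGER PRIMARY KEY, rate_id INTEGER,
                                depot_row_id INTEGER, value REAL);
        INSERT INTO rates VALUES (1, 'Fuel'), (2, 'Tyres');
        INSERT INTO depots VALUES (10, 100, 7), (11, 200, 7);
        INSERT INTO breakdown VALUES (1, 1, 10, 5.0), (2, 1, 11, 9.0),
                                     (3, 2, 11, 3.0);
    """)
    rows = con.execute("""
        SELECT rt.name, vtb.value
        FROM rates rt
        LEFT JOIN (
            -- the inner join keeps only breakdown rows for the requested depot
            SELECT b.rate_id, b.value
            FROM breakdown b
            JOIN depots d ON d.id = b.depot_row_id
                         AND d.depot_id = ?
                         AND d.vehicle_type_id = ?
        ) vtb ON vtb.rate_id = rt.id
        ORDER BY rt.id
    """, (depot_id, vehicle_type_id)).fetchall()
    con.close()
    return rows

print(rates_for(100, 7))
print(rates_for(999, 7))
```

Every rate is always returned, and the value is NULL whenever no breakdown row belongs to the requested depot and vehicle type, which is the behaviour asked for in the question.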
Try this:
SELECT rt.ID, rt.Name, rt.Rate, rt.Colour, vtb.ID AS 'vtbID', vtb.Value, rt.StdID
FROM Rates AS rt
LEFT JOIN VehicleTypeCostsBreakdown AS vtb ON rt.ID = vtb.RateID
LEFT JOIN VehicleTypeCostsDepots AS vtd ON vtd.ID = vtb.VehicleTypeDepotID
WHERE vtd.ID IS NULL OR (vtd.DepotID = #DepotID AND vtd.VehicleTypeID = #VehicleTypeID)
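You don't need to specify that the LEFT JOIN is an OUTER JOIN; the OUTER keyword is optional. I prefer to keep filter conditions in WHERE rather than in the ON section of a JOIN, but note that with an outer join the two are not interchangeable, which is why the WHERE above also accepts vtd.ID IS NULL.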

Help to optimize a Query

I need to optimize this query; please see the commented line:
SELECT p.NUM_PROCESSO,
p.NUM_PROC_JUD,
p.Num_Proc_Jud_Antigo1,
p.Num_Proc_Jud_Antigo2,
p.Num_Proc_Jud_Novo,
a.assunto,
su.subassunto,
u.UNIDADE,
s.SERVIDOR,
dvj.data_vinc,
p.TIPO,
c.DESC_CLASSIF
FROM processo p
LEFT OUTER JOIN assunto a
ON a.cod_assunto = p.cod_assunto
LEFT OUTER JOIN subassunto su
ON su.cod_subassunto = p.cod_subassunto
LEFT OUTER JOIN Distrib_VincJud dvj
ON dvj.num_processo = p.num_processo
LEFT OUTER JOIN servidor s
ON S.COD_SERVIDOR = dvj.COD_SERVIDOR
LEFT OUTER JOIN unidade u
ON u.COD_UNIDADE = s.COD_UNIDADE
LEFT OUTER JOIN Classif_Processo c
ON C.COD_CLASSIF = p.COD_CLASSIF
WHERE p.TIPO = 'J'
AND p.NUM_PROCESSO NOT IN (SELECT d.num_processo
FROM distribuicao d
WHERE d.COD_SERVIDOR in ( '0', '000' )
AND d.num_distribuicao IN
(SELECT MAX(num_distribuicao)
FROM Distribuicao
GROUP BY num_processo)
--this subquery returns 100k rows and consumes all the CPU:
AND dvj.id_vinc IN
(SELECT MAX(id_vinc)
FROM Distrib_VincJud
where ativo = '1'
GROUP BY num_processo))
AND p.NUM_PROCESSO NOT IN (SELECT num_processo
FROM Anexos)
AND s.ATIVO = 1
My horrible solution at the moment: http://pastebin.com/C4PHNsSc
What I would do is convert the IN and NOT IN into joins:
SELECT p.NUM_PROCESSO, p.NUM_PROC_JUD, p.Num_Proc_Jud_Antigo1,
p.Num_Proc_Jud_Antigo2, p.Num_Proc_Jud_Novo, a.assunto,
su.subassunto, u.UNIDADE, s.SERVIDOR, dvj.data_vinc, p.TIPO,
c.DESC_CLASSIF
FROM
processo p
INNER JOIN (
SELECT p.num_processo,
CASE WHEN dvj.id_vinc IS NOT NULL
AND d.num_distribuicao IS NOT NULL
OR a.num_processo IS NOT NULL THEN
1
ELSE
0
END exclude
FROM
processo p
LEFT JOIN Anexos a
ON p.num_processo = a.num_processo
LEFT JOIN (
SELECT num_processo,
MAX(num_distribuicao) AS max_distribuicao
FROM Distribuicao
GROUP BY num_processo
) md ON p.num_processo = md.num_processo
LEFT JOIN (
SELECT num_processo, MAX(id_vinc) AS max_vinc
FROM Distrib_VincJud
WHERE ativo = '1'
GROUP BY num_processo
) mv on p.num_processo = mv.num_processo
LEFT JOIN distribuicao d
ON p.num_processo = d.num_processo
AND md.max_distribuicao = d.num_distribuicao
LEFT JOIN Distrib_VincJud dvj
ON p.num_processo = dvj.num_processo
AND mv.max_vinc = dvj.id_vinc
WHERE d.COD_SERVIDOR in ('0', '000')
) IncExc
ON p.num_processo = IncExc.num_processo
LEFT OUTER JOIN assunto a
ON a.cod_assunto = p.cod_assunto
LEFT OUTER JOIN subassunto su
ON su.cod_subassunto = p.cod_subassunto
LEFT OUTER JOIN Distrib_VincJud dvj
ON dvj.num_processo = p.num_processo
LEFT OUTER JOIN servidor s
ON S.COD_SERVIDOR = dvj.COD_SERVIDOR
LEFT OUTER JOIN unidade u
ON u.COD_UNIDADE = s.COD_UNIDADE
LEFT OUTER JOIN Classif_Processo c
ON C.COD_CLASSIF = p.COD_CLASSIF
WHERE
p.TIPO = 'J'
AND IncExc.exclude = 0
AND s.ATIVO = 1
This part
AND p.NUM_PROCESSO NOT IN (
SELECT d.num_processo FROM distribuicao d
WHERE d.COD_SERVIDOR in ('0','000')
AND d.num_distribuicao IN (
SELECT MAX(num_distribuicao) FROM Distribuicao GROUP BY num_processo
)
)
and this part
AND p.NUM_PROCESSO NOT IN (
SELECT num_processo FROM Anexos
)
are going to be your biggest bottlenecks in the query as you've got nested subqueries in there.
You also have a few of these:
SELECT MAX(id_vinc) FROM Distrib_VincJud where ativo = '1' GROUP BY num_processo
SELECT MAX(num_distribuicao) FROM Distribuicao GROUP BY num_processo
You might gain a few more seconds by running these as separate queries and storing the results.
In fact, you might do well to have a separate table with these NOT IN(...) values that gets updated upon every insert to your database. It all depends on how often you run each query.
Have you tried running your Query Optimizer on these?
Separate out the subqueries and then do a join,
i.e. first find all the num_processo values that you are excluding in one query. Then left join the processo table to that result on the num_processo field and keep only the rows where the exclusion set's num_processo field is null.
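A minimal sketch of that idea, using Python's built-in sqlite3 module: the table and column names follow the question, but the schema is reduced to two tables and the data is invented. It assumes num_distribuicao is unique across the whole table, and it builds the exclusion set with WITH clauses, which is only one of several ways to stage it.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE processo (num_processo INTEGER PRIMARY KEY);
    CREATE TABLE distribuicao (num_distribuicao INTEGER PRIMARY KEY,
                               num_processo INTEGER, cod_servidor TEXT);
    INSERT INTO processo VALUES (1), (2), (3);
    -- processo 1: latest distribution went to server '0'  -> excluded
    -- processo 2: latest distribution went to server '42' -> kept
    -- processo 3: never distributed                        -> kept
    INSERT INTO distribuicao VALUES
        (1, 1, '42'), (2, 1, '0'),
        (3, 2, '0'),  (4, 2, '42');
""")

# Step 1: latest distribution per processo.
# Step 2: the exclusion set (latest distribution sent to server '0' or '000').
# Step 3: anti-join, keeping processo rows with no match in the exclusion set.
ANTI_JOIN = """
    WITH latest AS (
        SELECT num_processo, MAX(num_distribuicao) AS max_dist
        FROM distribuicao
        GROUP BY num_processo
    ),
    excluded AS (
        SELECT d.num_processo
        FROM distribuicao d
        JOIN latest l ON l.num_processo = d.num_processo
                     AND l.max_dist = d.num_distribuicao
        WHERE d.cod_servidor IN ('0', '000')
    )
    SELECT p.num_processo
    FROM processo p
    LEFT JOIN excluded x ON x.num_processo = p.num_processo
    WHERE x.num_processo IS NULL
    ORDER BY p.num_processo
"""

# The nested NOT IN form from the question, for comparison.
NOT_IN = """
    SELECT p.num_processo
    FROM processo p
    WHERE p.num_processo NOT IN (
        SELECT d.num_processo
        FROM distribuicao d
        WHERE d.cod_servidor IN ('0', '000')
          AND d.num_distribuicao IN (SELECT MAX(num_distribuicao)
                                     FROM distribuicao
                                     GROUP BY num_processo)
    )
    ORDER BY p.num_processo
"""

anti_join_rows = con.execute(ANTI_JOIN).fetchall()
not_in_rows = con.execute(NOT_IN).fetchall()
print(anti_join_rows, not_in_rows)
```

Both forms return the same rows on this data. Whether the join form is actually faster depends on the database engine, the indexes and the data volume, so it is worth comparing execution plans before committing to it.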
Edit:
What's the relationship between the tables distribuicao and distrib_vincJud?
This line is killing your performance:
AND dvj.id_vinc IN
( SELECT MAX(id_vinc)
FROM Distrib_VincJud
where ativo = '1'
GROUP BY num_processo
)
It is a subquery inside a subquery that then references a joined table from outside the subquery?