Combine results without using intersect in SQL SERVER 2010 - sql

Is there any way to restructure the query below without using INTERSECT? The INTERSECT is causing very slow performance. Please suggest an alternative.
SELECT DISTINCT( PFM.id ) AS PfmFolderFK
FROM cm.pfmfolder PFM WITH(nolock)
INNER JOIN cm.pfmfoldermstipmap PFMMST WITH(nolock)
ON PFMMST.pfmfolderfk = PFM.id
INNER JOIN cm.mstip MST WITH(nolock)
ON MST.id = PFMMST.mstipfk
WHERE MST.registrycode = #RegistryCode
AND PFM.deletedby IS NULL
AND PFM.deleteddate IS NULL
INTERSECT
SELECT DISTINCT( FMAP.pfmfolderfk ) AS PfmFolderFK
FROM cm.mstip MIP WITH(nolock)
INNER JOIN cm.pfmfoldermstipmap FMAP WITH(nolock)
ON MIP.id = FMAP.mstipfk
AND MIP.registrycode = #RegistryCode
AND MIP.deletedby IS NULL
AND MIP.deletedate IS NULL

INTERSECT returns only the values that appear in both result sets. The two queries are quite similar here, so I think you just need an additional join in the first query to complete the logic without INTERSECT:
SELECT DISTINCT( PFM.id ) AS PfmFolderFK
FROM cm.pfmfolder PFM WITH(nolock) INNER JOIN
cm.pfmfoldermstipmap PFMMST WITH(nolock)
ON PFMMST.pfmfolderfk = PFM.id INNER JOIN
cm.mstip MST WITH(nolock)
ON MST.id = PFMMST.mstipfk INNER JOIN
cm.pfmfoldermstipmap FMAP WITH(nolock)
ON MST.id = FMAP.mstipfk AND
MST.registrycode = #RegistryCode AND
MST.deletedby IS NULL AND
MST.deletedate IS NULL
WHERE MST.registrycode = #RegistryCode AND
PFM.deletedby IS NULL AND
PFM.deleteddate IS NULL;

How about this? I think you only need the first part of your query, with two conditions added to it.
SELECT DISTINCT FMAP.pfmfolderfk AS PfmFolderFK
FROM cm.mstip MST WITH(nolock)
INNER JOIN cm.pfmfoldermstipmap FMAP WITH(nolock) ON MST.id = FMAP.mstipfk
INNER JOIN cm.pfmfolder PFM WITH(nolock) ON FMAP.pfmfolderfk = PFM.id
WHERE MST.registrycode = #RegistryCode
AND PFM.deletedby IS NULL
AND PFM.deleteddate IS NULL
AND MST.deletedby IS NULL
AND MST.deletedate IS NULL

Queries with DISTINCT are always dubious ;-) You are looking for pfmfolders for which certain records exist in pfmfoldermstipmap and mstip, so use EXISTS for that. Because INTERSECT keeps only the ids returned by both queries, you are actually looking for all pfmfolders that are not deleted themselves and that are mapped to a non-deleted mstip with the given registry code:
select id
from cm.pfmfolder
where exists
(
select *
from cm.pfmfoldermstipmap
inner join cm.mstip on mstip.id = pfmfoldermstipmap.mstipfk
and pfmfoldermstipmap.pfmfolderfk = pfmfolder.id
and mstip.registrycode = #registrycode
and pfmfolder.deletedby is null and pfmfolder.deleteddate is null
and mstip.deletedby is null and mstip.deletedate is null
);
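To check that an EXISTS rewrite really returns the same ids as the INTERSECT, it can help to run both on a tiny data set. The sketch below uses SQLite through Python's sqlite3 module as a stand-in for SQL Server; the table contents and the `folder_ids` helper are invented for illustration, and the NOLOCK hints are dropped because SQLite has no equivalent.

```python
import sqlite3

# Toy schema loosely modelled on the question; all rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE pfmfolder (id INTEGER PRIMARY KEY, deletedby TEXT, deleteddate TEXT);
CREATE TABLE mstip (id INTEGER PRIMARY KEY, registrycode TEXT,
                    deletedby TEXT, deletedate TEXT);
CREATE TABLE pfmfoldermstipmap (pfmfolderfk INTEGER, mstipfk INTEGER);
INSERT INTO pfmfolder VALUES (1, NULL, NULL), (2, 'bob', '2020-01-01'),
                             (3, NULL, NULL), (4, NULL, NULL);
INSERT INTO mstip VALUES (10, 'R1', NULL, NULL), (11, 'R1', 'amy', '2020-02-02'),
                         (12, 'R2', NULL, NULL);
INSERT INTO pfmfoldermstipmap VALUES (1, 10), (1, 11), (2, 10), (3, 11), (4, 12);
""")

# Same shape as the original query: two SELECTs combined with INTERSECT.
intersect_sql = """
SELECT pfm.id
FROM pfmfolder pfm
JOIN pfmfoldermstipmap m ON m.pfmfolderfk = pfm.id
JOIN mstip mst ON mst.id = m.mstipfk
WHERE mst.registrycode = :code
  AND pfm.deletedby IS NULL AND pfm.deleteddate IS NULL
INTERSECT
SELECT m.pfmfolderfk
FROM mstip mip
JOIN pfmfoldermstipmap m ON mip.id = m.mstipfk
WHERE mip.registrycode = :code
  AND mip.deletedby IS NULL AND mip.deletedate IS NULL
"""

# Single pass: the folder is live AND a live mstip with the code is mapped to it.
exists_sql = """
SELECT pfm.id
FROM pfmfolder pfm
WHERE pfm.deletedby IS NULL AND pfm.deleteddate IS NULL
  AND EXISTS (SELECT 1
              FROM pfmfoldermstipmap m
              JOIN mstip mst ON mst.id = m.mstipfk
              WHERE m.pfmfolderfk = pfm.id
                AND mst.registrycode = :code
                AND mst.deletedby IS NULL AND mst.deletedate IS NULL)
"""

def folder_ids(sql, code):
    """Run a query with the given registry code and return the sorted ids."""
    return sorted(row[0] for row in conn.execute(sql, {"code": code}))

intersect_ids = folder_ids(intersect_sql, "R1")
exists_ids = folder_ids(exists_sql, "R1")
print(intersect_ids, exists_ids)
```

Folder 3 is the interesting case: it is not deleted, but its only mstip is, so both formulations drop it. Whether the EXISTS form is actually faster on SQL Server depends on the indexes and data, so compare the execution plans.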

Related

How to create distinct count from queries with several tables

I am trying to create one single query that will give me a distinct count for both the ActivityID and the CommentID. My query in MS Access looks like this:
SELECT
tbl_Category.Category, Count(tbl_Activity.ActivityID) AS CountOfActivityID,
Count(tbl_Comments.CommentID) AS CountOfCommentID
FROM tbl_Category LEFT JOIN
(tbl_Activity LEFT JOIN tbl_Comments ON
tbl_Activity.ActivityID = tbl_Comments.ActivityID) ON
tbl_Category.CategoryID = tbl_Activity.CategoryID
WHERE
(((tbl_Activity.UnitID)=5) AND ((tbl_Comments.PeriodID)=1))
GROUP BY
tbl_Category.Category;
I know the answer must somehow include SELECT DISTINCT but am not able to get it to work. Do I need to create multiple subqueries?
This is really painful in MS Access. I think the following does what you want to do:
SELECT ca.Category, ca.num_activities, cco.num_comments
FROM (SELECT caa.Category, COUNT(*) as num_activities
FROM (SELECT DISTINCT c.Category, a.ActivityID
FROM (tbl_Category as c INNER JOIN
tbl_Activity as a
ON c.CategoryID = a.CategoryID
) INNER JOIN
tbl_Comments as co
ON a.ActivityID = co.ActivityID
WHERE a.UnitID = 5 AND co.PeriodID = 1
) as caa
GROUP BY caa.Category
) as ca LEFT JOIN
(SELECT cc.Category, COUNT(*) as num_comments
FROM (SELECT DISTINCT c.Category, co.CommentID
FROM (tbl_Category as c INNER JOIN
tbl_Activity as a
ON c.CategoryID = a.CategoryID
) INNER JOIN
tbl_Comments as co
ON a.ActivityID = co.ActivityID
WHERE a.UnitID = 5 AND co.PeriodID = 1
) as cc
GROUP BY cc.Category
) as cco
ON cco.Category = ca.Category;
Note that your LEFT JOINs are superfluous because the WHERE clause turns them into INNER JOINs. This adjusts the logic for that purpose. The filtering is also very tricky, because it uses both tables, requiring that both subqueries have both JOINs.
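You can use DISTINCT inside COUNT. Note that this works in SQL Server and most other databases, but Access SQL itself does not support COUNT(DISTINCT ...), so in Access you need the subquery approach instead: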
SELECT
tbl_Category.Category, Count(DISTINCT tbl_Activity.ActivityID) AS CountOfActivityID,
Count(DISTINCT tbl_Comments.CommentID) AS CountOfCommentID
FROM tbl_Category LEFT JOIN
(tbl_Activity LEFT JOIN tbl_Comments ON
tbl_Activity.ActivityID = tbl_Comments.ActivityID) ON
tbl_Category.CategoryID = tbl_Activity.CategoryID
WHERE
(((tbl_Activity.UnitID)=5) AND ((tbl_Comments.PeriodID)=1))
GROUP BY
tbl_Category.Category;
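The difference between the plain counts, COUNT(DISTINCT ...), and the nested SELECT DISTINCT workaround is easiest to see on a few rows. This is a minimal sketch using SQLite via Python's sqlite3 module rather than Access; the sample rows are invented, and the table and column names are taken from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tbl_Category (CategoryID INTEGER PRIMARY KEY, Category TEXT);
CREATE TABLE tbl_Activity (ActivityID INTEGER PRIMARY KEY, CategoryID INTEGER,
                           UnitID INTEGER);
CREATE TABLE tbl_Comments (CommentID INTEGER PRIMARY KEY, ActivityID INTEGER,
                           PeriodID INTEGER);
INSERT INTO tbl_Category VALUES (1, 'Safety'), (2, 'Training');
INSERT INTO tbl_Activity VALUES (1, 1, 5), (2, 1, 5), (3, 2, 5), (4, 2, 6);
INSERT INTO tbl_Comments VALUES (100, 1, 1), (101, 1, 1), (102, 2, 1),
                                (103, 3, 1), (104, 3, 2), (105, 4, 1);
""")

# Shared join and filter; the WHERE clause makes inner joins sufficient.
base = """
FROM tbl_Category c
JOIN tbl_Activity a ON c.CategoryID = a.CategoryID
JOIN tbl_Comments co ON a.ActivityID = co.ActivityID
WHERE a.UnitID = 5 AND co.PeriodID = 1
"""

# Plain COUNT counts joined rows, so an activity with two comments counts twice.
naive = conn.execute(
    "SELECT c.Category, COUNT(a.ActivityID), COUNT(co.CommentID)"
    + base + "GROUP BY c.Category ORDER BY c.Category").fetchall()

# COUNT(DISTINCT ...) counts each id once per category.
count_distinct = conn.execute(
    "SELECT c.Category, COUNT(DISTINCT a.ActivityID), COUNT(DISTINCT co.CommentID)"
    + base + "GROUP BY c.Category ORDER BY c.Category").fetchall()

# Workaround for engines without COUNT(DISTINCT): de-duplicate first, then count.
nested = conn.execute(f"""
SELECT x.Category, x.num_activities, y.num_comments
FROM (SELECT d.Category, COUNT(*) AS num_activities
      FROM (SELECT DISTINCT c.Category, a.ActivityID {base}) AS d
      GROUP BY d.Category) AS x
JOIN (SELECT d.Category, COUNT(*) AS num_comments
      FROM (SELECT DISTINCT c.Category, co.CommentID {base}) AS d
      GROUP BY d.Category) AS y ON y.Category = x.Category
ORDER BY x.Category
""").fetchall()

print(naive)
print(count_distinct)
print(nested)
```

In this data the 'Safety' category has two activities but three joined rows, which is why the plain count of activities comes out too high.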

Best Join Strategy/Indexes for SQL Server

What is the best join strategy/indexes for this query:
SELECT
kwk.*, an.AuftragDatum, an.AbgabeDatum, an.BezahltDatum, an.AuftragStatus
FROM
KundenWerbenKunden kwk
INNER JOIN
Auftrag an ON an.AuftragNummer = kwk.AuftragNummer
WHERE
kwk.Deleted = 0
Table KundenWerbenKunden has 103950 rows, 103646 of which have Deleted = 0.
Table Auftrag has 3826552 rows.
In my real query I make some more joins:
INNER JOIN
Filiale fn WITH (NOLOCK) ON an.FilialeID = fn.FilialeID
INNER JOIN
Kunde kn ON an.KundeID = kn.KundeID
OUTER APPLY
(SELECT DISTINCT KSKNr
FROM KdZuordnung
WHERE KundeID = kn.KundeID) zn
LEFT JOIN
Anrede ann WITH (NOLOCK) ON kn.Anrede = ann.Anrede
INNER JOIN
AuftragArt aa WITH (NOLOCK) ON an.AuftragArtID = aa.AuftragArtID
INNER JOIN
AuftragGrund ag WITH (NOLOCK) ON an.AuftragGrundID = ag.AuftragGrundID
INNER JOIN
AuftragType at WITH (NOLOCK) ON an.AuftragTypeID = at.AuftragTypeID
For this query:
SELECT *
FROM KundenWerbenKunden kwk INNER JOIN
Auftrag an
ON an.AuftragNummer = kwk.AuftragNummer
WHERE kwk.Geloescht = 0;
Not knowing anything about the distribution of Geloescht (the column shown as Deleted in the question), I would first try indexes on KundenWerbenKunden(Geloescht, AuftragNummer) and Auftrag(AuftragNummer).
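As a rough illustration of the two suggested indexes, here is a sketch that creates them on a toy copy of the tables. It runs on SQLite through Python's sqlite3 module, so the query plan it prints says nothing about what SQL Server would choose; the index names, the reduced column set, and the rows are all assumptions, and the filter column uses the question's name, Deleted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE KundenWerbenKunden (ID INTEGER PRIMARY KEY, AuftragNummer TEXT,
                                 Deleted INTEGER);
CREATE TABLE Auftrag (AuftragNummer TEXT, AuftragStatus TEXT);
INSERT INTO KundenWerbenKunden VALUES (1, 'A1', 0), (2, 'A2', 1), (3, 'A3', 0);
INSERT INTO Auftrag VALUES ('A1', 'open'), ('A2', 'paid'), ('A3', 'paid'),
                           ('A4', 'open');
-- Filter column first, then the join column.
CREATE INDEX ix_kwk_deleted_auftrag ON KundenWerbenKunden (Deleted, AuftragNummer);
-- Lets the join look up Auftrag rows by order number.
CREATE INDEX ix_auftrag_nummer ON Auftrag (AuftragNummer);
""")

query = """
SELECT kwk.ID, an.AuftragNummer, an.AuftragStatus
FROM KundenWerbenKunden kwk
JOIN Auftrag an ON an.AuftragNummer = kwk.AuftragNummer
WHERE kwk.Deleted = 0
ORDER BY kwk.ID
"""

rows = conn.execute(query).fetchall()

# The last column of each EXPLAIN QUERY PLAN row is the human-readable step.
plan = [step[-1] for step in conn.execute("EXPLAIN QUERY PLAN " + query)]

index_names = sorted(
    r[0] for r in conn.execute("SELECT name FROM sqlite_master WHERE type = 'index'"))

print(rows)
print(plan)
```

On SQL Server the equivalent check is to compare the actual execution plans before and after adding the indexes. With almost every row having Deleted = 0, the leading filter column is not very selective, so the index on Auftrag(AuftragNummer) is likely to matter more.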

Sum from the different tables Sql server

I have a couple of tables that store amounts, and I want to group by customer and get the sums. The reason for the multiple tables is NHibernate discriminators.
I am using UNION ALL and it works, but the query is very big.
I am using the following query:
SELECT CustomerAccountNumber,
vc.CustomerName,
SUM(PermAmount) AS PermAmount,
SUM(FreetextAmount) AS FreetextAmount,
(SUM(PermAmount) + SUM(FreetextAmount)) AS TotalAmountByCustomer
FROM
(
SELECT pp.CustomerAccountNumber,
pl.Amount AS PermAmount,
0 AS FreetextAmount
FROM dbo.PermanentPlacementTransactionLine pl
INNER JOIN dbo.TransactionLine tl ON pl.TransactionLineId = tl.Id
INNER JOIN dbo.PermanentPlacement pp ON pl.PermanentPlacementId = pp.Id
WHERE tl.CurrentStatus = 1
GROUP BY pp.CustomerAccountNumber,
pl.Amount,
tl.Id
UNION ALL
SELECT ft.CustomerAccountNumber,
0 AS PermAmount,
ft.Amount AS FreetextAmount
FROM dbo.FreeTextTransactionLine fttl
INNER JOIN dbo.TransactionLine tl ON fttl.TransactionLineId = tl.Id
INNER JOIN dbo.[FreeText] ft ON fttl.FreeTextId = ft.Id
WHERE tl.CurrentStatus = 1
GROUP BY ft.CustomerAccountNumber,
ft.Amount,
tl.Id
) WIPSummary
INNER JOIN dbo.vw_Customer vc ON WIPSummary.CustomerAccountNumber = vc.CustomerAccount
GROUP BY CustomerAccountNumber,
vc.CustomerName;
Is there any elegant way of displaying the amounts in separate columns?
I could use PARTITION BY if it were the same table and I wanted to display it row by row.
Try this query; it is easier to understand and probably faster than yours.
I assume that the values are unique in your view.
WITH cte_a
AS (SELECT pp.customeraccountnumber
,Sum(pl.amount) AS PermAmount
,0 AS FreetextAmount
FROM dbo.permanentplacementtransactionline pl
INNER JOIN dbo.transactionline tl
ON pl.transactionlineid = tl.id
INNER JOIN dbo.permanentplacement pp
ON pl.permanentplacementid = pp.id
WHERE tl.currentstatus = 1
GROUP BY pp.customeraccountnumber),
cte_b
AS (SELECT ft.customeraccountnumber
,0 AS PermAmount
,Sum(ft.amount) AS FreetextAmount
FROM dbo.freetexttransactionline fttl
INNER JOIN dbo.transactionline tl
ON fttl.transactionlineid = tl.id
INNER JOIN dbo.[freetext] ft
ON fttl.freetextid = ft.id
WHERE tl.currentstatus = 1
GROUP BY ft.customeraccountnumber)
SELECT vc.customeraccount AS CustomerAccountNumber
,vc.customername
,Isnull(A.permamount, 0) AS PermAmount
,Isnull(B.freetextamount, 0) AS FreetextAmount
,Isnull(A.permamount, 0)
+ Isnull(B.freetextamount, 0) AS TotalAmountByCustomer
FROM dbo.vw_customer vc
LEFT JOIN cte_a a
ON vc.customeraccount = A.customeraccountnumber
LEFT JOIN cte_b b
ON vc.customeraccount = B.customeraccountnumber
Without table structures and sample data, that is the best I can do to help you.
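The idea in the CTE version, aggregating each source table on its own and only then joining the totals to the customer list, can be tried on a cut-down schema. The sketch below uses SQLite via Python's sqlite3 module; the table names `customer`, `perm_line`, and `freetext_line` and their rows are hypothetical simplifications of the tables in the question, and COALESCE stands in for ISNULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (customer_account TEXT PRIMARY KEY, customer_name TEXT);
CREATE TABLE perm_line (customer_account TEXT, amount INTEGER);
CREATE TABLE freetext_line (customer_account TEXT, amount INTEGER);
INSERT INTO customer VALUES ('C1', 'Acme'), ('C2', 'Bolt'), ('C3', 'Core');
INSERT INTO perm_line VALUES ('C1', 100), ('C1', 50), ('C2', 20);
INSERT INTO freetext_line VALUES ('C1', 5), ('C3', 7), ('C3', 3);
""")

# Each CTE yields one row per customer, so the joins cannot multiply rows.
totals = conn.execute("""
WITH perm AS (
    SELECT customer_account, SUM(amount) AS perm_amount
    FROM perm_line
    GROUP BY customer_account),
ft AS (
    SELECT customer_account, SUM(amount) AS freetext_amount
    FROM freetext_line
    GROUP BY customer_account)
SELECT c.customer_account,
       c.customer_name,
       COALESCE(p.perm_amount, 0),
       COALESCE(f.freetext_amount, 0),
       COALESCE(p.perm_amount, 0) + COALESCE(f.freetext_amount, 0)
FROM customer c
LEFT JOIN perm p ON p.customer_account = c.customer_account
LEFT JOIN ft f ON f.customer_account = c.customer_account
ORDER BY c.customer_account
""").fetchall()

for row in totals:
    print(row)
```

Because the customer table drives the LEFT JOINs, customers with no amounts at all would also appear with zeros, unlike in the UNION ALL version; add a filter if that is not wanted.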

How can I remove a sub-query from a NOT EXISTS contained in an INNER JOIN?

Due to rules set forth by third-party software, I need to remove the sub-query from the following code:
SELECT
1
FROM
factAttempt fact
INNER JOIN dimActivity act ON act.ID = fact.ActivityID
INNER JOIN dimUser emp ON emp.ID = fact.UserID
INNER JOIN Iwc_Usr IUser ON IUser.Usr_empFK = emp.EmpFK
INNER JOIN dimActivity class ON (
(class.ActivityFK = act.PrntActFK)
OR (
NOT EXISTS (
SELECT 1
FROM TBL_TMX_activity act1
WHERE act1.PrntActFK = Class.ActivityFK
)
AND
Class.ActivityFK = act.ActivityFK
)
)
AND class.ActivityName = act.ActivityName
I have tried replacing it with a Boolean (bit) scalar variable, but while the query runs, it returns the wrong results. Since I don't know much SQL, I haven't been able to find anything else yet.
I'm using Microsoft SQL Server 2012, if that's useful.
Thanks for the help.
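You can move the table from the EXISTS into a LEFT JOIN. The equivalent of NOT EXISTS is then checking that a column from the right side of the join is NULL. Note that where the left join does find matches, it returns one row per matching act1 row, so you may need DISTINCT if a class can have several children.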
SELECT
1
FROM
factAttempt fact
INNER JOIN dimActivity act ON act.ID = fact.ActivityID
INNER JOIN dimUser emp ON emp.ID = fact.UserID
INNER JOIN Iwc_Usr IUser ON IUser.Usr_empFK = emp.EmpFK
INNER JOIN dimActivity class
LEFT JOIN TBL_TMX_activity act1
ON act1.PrntActFK = class.ActivityFK
ON
(
(
class.ActivityFK = act.PrntActFK
OR
(act1.PrntActFK IS NULL -- equivalent of NOT EXISTS
AND Class.ActivityFK = act.ActivityFK)
)
AND class.ActivityName = act.ActivityName
)
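The NOT EXISTS to LEFT JOIN ... IS NULL rewrite can be checked on a small self-referencing table. This is a sketch on SQLite via Python's sqlite3 module; the `activity` table and its rows are hypothetical and much simpler than the tables in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE activity (id INTEGER PRIMARY KEY, parent_id INTEGER);
INSERT INTO activity VALUES (1, NULL), (2, 1), (3, 1), (4, NULL);
""")

# Activities that have no children, written with NOT EXISTS.
not_exists = [r[0] for r in conn.execute("""
SELECT a.id
FROM activity a
WHERE NOT EXISTS (SELECT 1 FROM activity child WHERE child.parent_id = a.id)
ORDER BY a.id""")]

# The same set as an anti-join: unmatched left rows carry NULLs on the right.
left_join = [r[0] for r in conn.execute("""
SELECT a.id
FROM activity a
LEFT JOIN activity child ON child.parent_id = a.id
WHERE child.id IS NULL
ORDER BY a.id""")]

# Without the IS NULL filter, activity 1 comes back once per child row.
rows_for_parent = conn.execute("""
SELECT COUNT(*)
FROM activity a
LEFT JOIN activity child ON child.parent_id = a.id
WHERE a.id = 1""").fetchone()[0]

print(not_exists, left_join, rows_for_parent)
```

The last query shows why duplicates can appear when the NULL check sits inside an OR, as in the rewritten join: rows that satisfy the other branch are repeated for every matching child.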

Help to optimize a Query

I need to optimize this query; please see the commented line:
SELECT p.NUM_PROCESSO,
p.NUM_PROC_JUD,
p.Num_Proc_Jud_Antigo1,
p.Num_Proc_Jud_Antigo2,
p.Num_Proc_Jud_Novo,
a.assunto,
su.subassunto,
u.UNIDADE,
s.SERVIDOR,
dvj.data_vinc,
p.TIPO,
c.DESC_CLASSIF
FROM processo p
LEFT OUTER JOIN assunto a
ON a.cod_assunto = p.cod_assunto
LEFT OUTER JOIN subassunto su
ON su.cod_subassunto = p.cod_subassunto
LEFT OUTER JOIN Distrib_VincJud dvj
ON dvj.num_processo = p.num_processo
LEFT OUTER JOIN servidor s
ON S.COD_SERVIDOR = dvj.COD_SERVIDOR
LEFT OUTER JOIN unidade u
ON u.COD_UNIDADE = s.COD_UNIDADE
LEFT OUTER JOIN Classif_Processo c
ON C.COD_CLASSIF = p.COD_CLASSIF
WHERE p.TIPO = 'J'
AND p.NUM_PROCESSO NOT IN (SELECT d.num_processo
FROM distribuicao d
WHERE d.COD_SERVIDOR in ( '0', '000' )
AND d.num_distribuicao IN
(SELECT MAX(num_distribuicao)
FROM Distribuicao
GROUP BY num_processo)
--this subquery returns 100k rows and consumes all the CPU:
AND dvj.id_vinc IN
(SELECT MAX(id_vinc)
FROM Distrib_VincJud
where ativo = '1'
GROUP BY num_processo))
AND p.NUM_PROCESSO NOT IN (SELECT num_processo
FROM Anexos)
AND s.ATIVO = 1
My horrible solution at the moment: http://pastebin.com/C4PHNsSc
What I would do is convert the IN and NOT IN into joins:
SELECT p.NUM_PROCESSO, p.NUM_PROC_JUD, p.Num_Proc_Jud_Antigo1,
p.Num_Proc_Jud_Antigo2, p.Num_Proc_Jud_Novo, a.assunto,
su.subassunto, u.UNIDADE, s.SERVIDOR, dvj.data_vinc, p.TIPO,
c.DESC_CLASSIF
FROM
processo p
INNER JOIN (
SELECT p.num_processo,
CASE WHEN dvj.id_vinc IS NOT NULL
AND d.num_distribuicao IS NOT NULL
OR a.num_processo IS NOT NULL THEN
1
ELSE
0
END exclude
FROM
processo p
LEFT JOIN Anexos a
ON p.num_processo = a.num_processo
LEFT JOIN (
SELECT num_processo,
MAX(num_distribuicao) AS max_distribuicao
FROM Distribuicao
GROUP BY num_processo
) md ON p.num_processo = md.num_processo
LEFT JOIN (
SELECT num_processo, MAX(id_vinc) AS max_vinc
FROM Distrib_VincJud
WHERE ativo = '1'
GROUP BY num_processo
) mv on p.num_processo = mv.num_processo
LEFT JOIN distribuicao d
ON p.num_processo = d.num_processo
AND md.max_distribuicao = d.num_distribuicao
-- filter here rather than in WHERE, so the LEFT JOIN is not turned into an INNER JOIN
AND d.COD_SERVIDOR in ('0', '000')
LEFT JOIN Distrib_VincJud dvj
ON p.num_processo = dvj.num_processo
AND mv.max_vinc = dvj.id_vinc
) IncExc
ON p.num_processo = IncExc.num_processo
LEFT OUTER JOIN assunto a
ON a.cod_assunto = p.cod_assunto
LEFT OUTER JOIN subassunto su
ON su.cod_subassunto = p.cod_subassunto
LEFT OUTER JOIN Distrib_VincJud dvj
ON dvj.num_processo = p.num_processo
LEFT OUTER JOIN servidor s
ON S.COD_SERVIDOR = dvj.COD_SERVIDOR
LEFT OUTER JOIN unidade u
ON u.COD_UNIDADE = s.COD_UNIDADE
LEFT OUTER JOIN Classif_Processo c
ON C.COD_CLASSIF = p.COD_CLASSIF
WHERE
p.TIPO = 'J'
AND IncExc.exclude = 0
AND s.ATIVO = 1
This part
AND p.NUM_PROCESSO NOT IN (
SELECT d.num_processo FROM distribuicao d
WHERE d.COD_SERVIDOR in ('0','000')
AND d.num_distribuicao IN (
SELECT MAX(num_distribuicao) FROM Distribuicao GROUP BY num_processo
)
)
and this part
AND p.NUM_PROCESSO NOT IN (
SELECT num_processo FROM Anexos
)
are going to be your biggest bottlenecks in the query as you've got nested subqueries in there.
You also have a few of these:
SELECT MAX(id_vinc) FROM Distrib_VincJud where ativo = '1' GROUP BY num_processo
SELECT MAX(num_distribuicao) FROM Distribuicao GROUP BY num_processo
You might gain a few more seconds by running these as separate queries and storing their results.
In fact, you might do well to have a separate table with these NOT IN(...) values that gets updated upon every insert to your database. It all depends on how often you run each query.
Have you tried running your Query Optimizer on these?
Separate out the subqueries and then do a join,
i.e. first find all the num_processo values that you are excluding in one query. Then left join the processo table to that result on the num_processo field and keep only the rows where the exclusion set's num_processo is null.
Edit:
What's the relationship between the tables distribuicao and distrib_vincJud?
This line is killing your performance...
AND dvj.id_vinc IN
( SELECT MAX(id_vinc)
FROM Distrib_VincJud
where ativo = '1'
GROUP BY num_processo
)
A subquery within a subquery that then references a joined table outside of the subquery?
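The suggestion to collect the excluded num_processo values first and then anti-join against them could look roughly like this. The sketch runs on SQLite via Python's sqlite3 module with invented rows, keeps only the columns needed for the exclusion logic, and leaves out the Distrib_VincJud condition, so it is an illustration of the pattern rather than a drop-in replacement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE processo (num_processo INTEGER PRIMARY KEY, tipo TEXT);
CREATE TABLE distribuicao (num_distribuicao INTEGER PRIMARY KEY,
                           num_processo INTEGER, cod_servidor TEXT);
CREATE TABLE anexos (num_processo INTEGER);
INSERT INTO processo VALUES (1, 'J'), (2, 'J'), (3, 'J'), (4, 'A'), (5, 'J');
INSERT INTO distribuicao VALUES (10, 1, '0'), (11, 1, '55'), (12, 2, '55'),
                                (13, 2, '000'), (14, 3, '7'), (15, 5, '9');
INSERT INTO anexos VALUES (5);
""")

# Original style: two NOT IN subqueries, one of them nested.
not_in = [r[0] for r in conn.execute("""
SELECT p.num_processo
FROM processo p
WHERE p.tipo = 'J'
  AND p.num_processo NOT IN (
        SELECT d.num_processo
        FROM distribuicao d
        WHERE d.cod_servidor IN ('0', '000')
          AND d.num_distribuicao IN (SELECT MAX(num_distribuicao)
                                     FROM distribuicao
                                     GROUP BY num_processo))
  AND p.num_processo NOT IN (SELECT num_processo FROM anexos)
ORDER BY p.num_processo""")]

# Anti-join style: build the exclusion set once, then keep the unmatched rows.
anti_join = [r[0] for r in conn.execute("""
WITH latest AS (
    SELECT num_processo, MAX(num_distribuicao) AS max_dist
    FROM distribuicao
    GROUP BY num_processo),
excluded AS (
    SELECT d.num_processo
    FROM distribuicao d
    JOIN latest l ON l.num_processo = d.num_processo
                 AND l.max_dist = d.num_distribuicao
    WHERE d.cod_servidor IN ('0', '000')
    UNION
    SELECT num_processo FROM anexos)
SELECT p.num_processo
FROM processo p
LEFT JOIN excluded e ON e.num_processo = p.num_processo
WHERE p.tipo = 'J' AND e.num_processo IS NULL
ORDER BY p.num_processo""")]

print(not_in, anti_join)
```

One behavioural difference to keep in mind: NOT IN returns no rows at all if the subquery produces a NULL, whereas the anti-join simply never matches a NULL. On SQL Server, the exclusion set could also be materialised into a temporary table if it is reused.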