Complex join quick by itself, slow as a join - sql

I'm trying to help a coworker with a large complex query. One of the joins returns the data correctly but performs poorly. If we run the join by itself, it returns in seconds. As part of the larger query, it takes forever if it ever works at all.
Can someone provide some pointers on how to optimize this? Would sub selects perform better than joins?
LEFT JOIN (SELECT cip.CaseID, dispo_sub2.CaseTitle, DATEPART(YEAR,MAX(cip.DispoDt)) DispoYr, cip.DispoDt, cip.DispoCode, cip.DispoDesc, LTRIM(rea.Reasons) Reasons, cip.CountInvPersAddDt
FROM
(SELECT dispo_sub.CaseID, dispo_sub.CaseTitle, dispo_sub.DispoDt, MAX(cip_sub2.CountInvPersAddDt) LastAddDt
FROM
(SELECT cip_sub.CaseID, c.casetitle, MAX(cip_sub.DispoDt) DispoDt
FROM jw50_CountInvPers cip_sub
INNER JOIN jw50_Case c on c.CaseID = cip_sub.CaseID
WHERE c.CaseTypeCode = 'TY706'
AND cip_sub.DispoDt IS NOT NULL
AND cip_sub.DispoCode IS NOT NULL
GROUP BY cip_sub.CaseID, c.CaseTitle) dispo_sub
INNER JOIN jw50_CountInvPers cip_sub2 on cip_sub2.CaseID = dispo_sub.CaseID and cip_sub2.DispoDt = dispo_sub.DispoDt
GROUP BY dispo_sub.CaseID, dispo_sub.CaseTitle, dispo_sub.DispoDt) dispo_sub2
INNER JOIN jw50_CountInvPers cip on cip.CaseID = dispo_sub2.CaseID AND cip.DispoDt = dispo_sub2.DispoDt AND cip.CountInvPersAddDt = dispo_sub2.LastAddDt
LEFT JOIN (SELECT
c_sub.CaseID,
STUFF((SELECT '; ' + ConditionTypeDesc
FROM jw50_Condition con
WHERE con.CaseID = c_sub.CaseId
ORDER BY ConditionTypeDesc
FOR XML PATH('')), 1, 1, '') [Reasons]
FROM jw50_Case c_sub
GROUP BY c_sub.CaseID) rea on rea.CaseID = dispo_sub2.CaseID
GROUP BY cip.CaseID, dispo_sub2.CaseTitle, cip.DispoDt, cip.DispoCode, cip.DispoDesc, rea.Reasons, cip.CountInvPersAddDt) dispo on dispo.CaseID = c.CaseID

Related

SQL Cross Tab Query

Need help figuring out how to do a cross-tabulated report within one query. There are 3-4 tables involved but the users table may not need to be included in the query since we just need a count.
I have put together a screenshot of the table schema and data as an example which can be seen below:
What I need it to return is a query result that looks like:
So I can make a report that looks like:
I've tried to do cursor loops as it's the only way I can do it with my basic knowledge, but it's way too slow.
One particular report I'm trying to generate contains 32 rows and 64 columns with about 70,000 answers, so it's all about the performance of getting it down to one query and fast as possible.
I understand this may depend on indexes and so on but if someone could help me figure out how I could get this done in 1 query (with multiple joins?), that would be awesome!
Thanks!
SELECT MIN(ro.OptionText) RowOptionText, MIN(co.OptionText) RowOptionText, COUNT(ca.AnswerID) AnswerCount
FROM tblQuestions rq
CROSS JOIN tblQuestions cq
JOIN tblOptions ro ON rq.QuestionID = ro.QuestionID
JOIN tblOptions co ON cq.QuestionID = co.QuestionID
LEFT JOIN tblAnswers ra ON ra.OptionID = ro.OptionID
LEFT JOIN tblAnswers ca ON ca.OptionID = co.OptionID AND ca.UserID = ra.UserID
WHERE rq.questionText = 'Gender'
AND cq.questionText = 'How happy are you?'
GROUP BY ro.OptionID, co.OptionID
ORDER BY ro.OptionID, co.OptionID
This should be at least close to what you asked for. Turning this into a pivot will require dynamic SQL as SQL Server requires you to specify the actual value that will be pivoted into a column.
We cross join the questions and limit the results from each of those question references to the single question for the row values and column values respectively. Then we join the option values to the respective question reference. We use LEFT JOIN for the answers in case the user didn't respond to all of the questions. And we join the answers by UserID so that we match the row question and column question for each user. The MIN on the option text is because we grouped and ordered by OptionID to match your sequencing shown.
EDIT: Here's a SQLFiddle
For what it's worth, your query is complicated because you are using the Entity-Attribute-Value design pattern. Quite a few SQL Server experts consider that pattern to be problematic and to be avoided if possible. For instance see https://www.simple-talk.com/sql/t-sql-programming/avoiding-the-eav-of-destruction/.
EDIT 2: Since you accepted my answer, here's the dynamic SQL pivot solution :) SQLFiddle
DECLARE #SqlCmd NVARCHAR(MAX)
SELECT #SqlCmd = N'SELECT RowOptionText, ' + STUFF(
(SELECT ', ' + QUOTENAME(o.OptionID) + ' AS ' + QUOTENAME(o.OptionText)
FROM tblOptions o
WHERE o.QuestionID = cq.QuestionID
FOR XML PATH ('')), 1, 2, '') + ', RowTotal AS [Row Total]
FROM (
SELECT ro.OptionID RowOptionID, ro.OptionText RowOptionText, co.OptionID ColOptionID,
ca.UserID, COUNT(ca.UserID) OVER (PARTITION BY ra.OptionID) AS RowTotal
FROM tblOptions ro
JOIN tblOptions co ON ro.QuestionID = ' + CAST(rq.QuestionID AS VARCHAR(10)) +
' AND co.QuestionID = ' + CAST(cq.QuestionID AS VARCHAR(10)) + '
LEFT JOIN tblAnswers ra ON ra.OptionID = ro.OptionID
LEFT JOIN tblAnswers ca ON ca.OptionID = co.OptionID AND ca.UserID = ra.UserID
UNION ALL
SELECT 999999, ''Column Total'' RowOptionText, co.OptionID ColOptionID,
ca.UserID, COUNT(ca.UserID) OVER () AS RowTotal
FROM tblOptions ro
JOIN tblOptions co ON ro.QuestionID = ' + CAST(rq.QuestionID AS VARCHAR(10)) +
' AND co.QuestionID = ' + CAST(cq.QuestionID AS VARCHAR(10)) + '
LEFT JOIN tblAnswers ra ON ra.OptionID = ro.OptionID
LEFT JOIN tblAnswers ca ON ca.OptionID = co.OptionID AND ca.UserID = ra.UserID
) t
PIVOT (COUNT(UserID) FOR ColOptionID IN (' + STUFF(
(SELECT ', ' + QUOTENAME(o.OptionID)
FROM tblOptions o
WHERE o.QuestionID = cq.QuestionID
FOR XML PATH ('')), 1, 2, '') + ')) p
ORDER BY RowOptionID'
FROM tblQuestions rq
CROSS JOIN tblQuestions cq
WHERE rq.questionText = 'Gender'
AND cq.questionText = 'How happy are you?'
EXEC sp_executesql #SqlCmd
I think I see the problem. I know you can't modify the schema, but you need a conceptual table for the crosstab information such as which questionID is the rowHeader and which is the colHeader. You can create it in an external data source and join with the existing source or simply hard-code the table values in your sql.
you need to have 2 instances of the question/option/answer relations, one for each rowHeader and colHeader for each crosstab. Those 2 relations are joined by the userID.
this version has your outer joins:
sqlFiddle
this version doesn't have the crossTab table, just the row and col questionIDs hard-coded:
sqlFiddleNoTbl
The following piece of mess works with no hard-coded values but fails to show the rows where the count is 0.
This might however still work for your report.
;with stepone as(
SELECT
RANK() OVER(PARTITION BY a.UserId ORDER BY o.QuestionID) AS [temprank]
, o.QuestionID AS [QID1]
, o.OptionID AS [OID1]
, same.QuestionID
, same.OptionID
, a.UserId AS [IDUser]
, same.UserId
FROM
tblAnswers a
INNER JOIN
tblOptions o
ON a.OptionID = o.OptionID
INNER JOIN
tblQuestions q
ON o.QuestionID = q.QuestionID
INNER JOIN
(
SELECT
a.AnswerID
, a.OptionID
, a.UserId
, o.QuestionID
FROM
tblAnswers a
INNER JOIN
tblOptions o
ON a.OptionID = o.OptionID
) same
ON a.UserId = same.UserId AND a.AnswerID <> same.AnswerID
)
, stepthree AS(
SELECT
t.QID1, t.OID1, t.QuestionID, t.OptionID
, COUNT(UserId) AS myCount
FROM
stepone t
WHERE t.temprank = 1
GROUP BY
t.QID1, t.OID1, t.QuestionID, t.OptionID
)
SELECT
o1.OptionText AS [RowTest]
, o2.OptionText AS [ColumnText]
, t.myCount AS [Count]
FROM
stepthree t
INNER JOIN tblOptions o1
ON t.OID1 = o1.OptionID
INNER JOIN tblOptions o2
ON t.OptionID = o2.OptionID
ORDER BY t.OID1
Hope it helps, I enjoyed trying to do it.

How to optimize SQL query which returns millions of records

All the example used below is currently for only one Salesorganization and we may have more different salesOrganization number later in future.
I have 6 tables which has millions of records. This tables are populated by execution of SSIS package.
select count(*) from tmp_materials --11,02,032
select count(*) from tbl_VendorLogoData --20,41,501
select count(*) from TBL_Image_EDV --4,44,063
select count(*) from TBL_EXTPRODUCTATTRIBUTES_EDV -- 2,06,15,572
select count(*) from TBL_Accessories_EDV --10,11,568
select count(*) from TBL_SimilarSku --64,10,408
I have a stored procedure which is used to select distinct records from these tables
SELECT DISTINCT 'MD' AS [COMPANYCD]
, MAIN.MATERIAL AS [MATERIAL]
, ISNULL(UPPER(IMG.[lowprovider]),'') AS [LOW_IMAGE]
, ISNULL(UPPER(IMG.[midprovider]), '') AS [MID_IMAGE]
, ISNULL(UPPER(IMG.[highprovider]), '') AS [HIGH_IMAGE]
, ISNULL(UPPER(VS.isLogo),'') AS [VENDOR_LOGO]
, ISNULL(UPPER(DS.PROVIDER), '') AS [DATASHEET]
, ISNULL(ACC.AccessoryMaterial, '') AS [OPT_ACC]
, ISNULL(UPPER(ACC.Provider), '') AS [OPT_ACC_PROVIDER]
, ISNULL(SS.Similarsku, '') AS [SIMILAR_SKU]
, ISNULL(UPPER(SS.ProviderName), '') AS [SIMILAR_SKU_PROVIDER]
FROM tmp_materials MAIN WITH (NOLOCK)
LEFT OUTER JOIN TBL_Image_EDV IMG WITH (NOLOCK) ON MAIN.MATERIAL = IMG.MATERIAL AND MAIN.salesOrg = IMG.SalesOrganization
LEFT OUTER JOIN TBL_EXTPRODUCTATTRIBUTES_EDV DS WITH (NOLOCK) ON MAIN.MATERIAL = DS.SKUNBR AND MAIN.salesOrg = DS.SalesOrganization
LEFT OUTER JOIN TBL_Accessories_EDV ACC WITH (NOLOCK) ON MAIN.MATERIAL = ACC.ParentSKU AND MAIN.salesOrg = DS.SalesOrganization
LEFT OUTER JOIN TBL_SimilarSku SS WITH (NOLOCK) ON MAIN.MATERIAL = SS.ParentSKU AND MAIN.salesOrg = DS.SalesOrganization
LEFT OUTER JOIN tbl_VendorLogoData VS WITH (NOLOCK) ON MAIN.MATERIAL = VS.SKU AND MAIN.salesOrg = VS.Salesorganization
WHERE MAIN.salesOrg = #SALESORGANIZATION
AND (CASE WHEN IMG.MATERIAL IS NULL AND DS.SKUNBR IS NULL AND ACC.ParentSKU IS NULL AND SS.ParentSKU IS NULL AND VS.SKU IS NULL
THEN 0 ELSE 1 END) = 1
AND DS.Provider <> 'novalue'
AND SS.RecordIdentifier like '%##%'
AND ACC.RecordIdentifier like '%##%'
AND ACC.accessorySku LIKE '%##%'
The parameter for these procedure is #SALESORGANIZATION This is used to populate a report. I am running this for multiple salesorganization values. But for one of the salesorganization it is taking more than 5 hours to generate the data.
It seems i will need to write a loop, but finding it difficult to proceed with multiple joins any suggestions?
Please advice how can i optimize this query? Thanks for your assitance.
There you have an execution plan file SQL Execution Plan
Try to include more condition to improve performance.
In all the LEFT OUTER JOIN checks include your parameter directly and try to avoid JOINDE tables using there
e.g
LEFT OUTER JOIN TBL_Image_EDV IMG WITH (NOLOCK) ON
MAIN.MATERIAL = IMG.MATERIAL
AND MAIN.salesOrg = IMG.SalesOrganization
AND IMG.SalesOrganization=#SALESORGANIZATION
Add conditions in the WHERE
AND SS.RecordIdentifier like '%##%'
should be
(SS.id IS NOT NULL AND SS.RecordIdentifier like '%##%')
Thus you first cut the rows with no relation
UPDATE
One more idea
Try
INNER JOIN (select *
from TBL_Accessories_EDV
where RecordIdentifier like '%##%'
AND accessorySku LIKE '%##%') ACC WITH (NOLOCK)
ON MAIN.MATERIAL = ACC.ParentSKU AND MAIN.salesOrg = DS.SalesOrganization
Just to restrict amout of records you filter out later

How to improve the performance of a SQL query even after adding indexes?

I am trying to execute the following sql query but it takes 22 seconds to execute. the number of returned items is 554192. I need to make this faster and have already put indexes in all the tables involved.
SELECT mc.name AS MediaName,
lcc.name AS Country,
i.overridedate AS Date,
oi.rating,
bl1.firstname + ' ' + bl1.surname AS Byline,
b.id BatchNo,
i.numinbatch ItemNumberInBatch,
bah.changedatutc AS BatchDate,
pri.code AS IssueNo,
pri.name AS Issue,
lm.neptunemessageid AS MessageNo,
lmt.name AS MessageType,
bl2.firstname + ' ' + bl2.surname AS SourceFullName,
lst.name AS SourceTypeDesc
FROM profiles P
INNER JOIN profileresults PR
ON P.id = PR.profileid
INNER JOIN items i
ON PR.itemid = I.id
INNER JOIN batches b
ON b.id = i.batchid
INNER JOIN itemorganisations oi
ON i.id = oi.itemid
INNER JOIN lookup_mediachannels mc
ON i.mediachannelid = mc.id
LEFT OUTER JOIN lookup_cities lc
ON lc.id = mc.cityid
LEFT OUTER JOIN lookup_countries lcc
ON lcc.id = mc.countryid
LEFT OUTER JOIN itembylines ib
ON ib.itemid = i.id
LEFT OUTER JOIN bylines bl1
ON bl1.id = ib.bylineid
LEFT OUTER JOIN batchactionhistory bah
ON b.id = bah.batchid
INNER JOIN itemorganisationissues ioi
ON ioi.itemorganisationid = oi.id
INNER JOIN projectissues pri
ON pri.id = ioi.issueid
LEFT OUTER JOIN itemorganisationmessages iom
ON iom.itemorganisationid = oi.id
LEFT OUTER JOIN lookup_messages lm
ON iom.messageid = lm.id
LEFT OUTER JOIN lookup_messagetypes lmt
ON lmt.id = lm.messagetypeid
LEFT OUTER JOIN itemorganisationsources ios
ON ios.itemorganisationid = oi.id
LEFT OUTER JOIN bylines bl2
ON bl2.id = ios.bylineid
LEFT OUTER JOIN lookup_sourcetypes lst
ON lst.id = ios.sourcetypeid
WHERE p.id = #profileID
AND b.statusid IN ( 6, 7 )
AND bah.batchactionid = 6
AND i.statusid = 2
AND i.isrelevant = 1
when looking at the execution plan I can see an step which is costing 42%. Is there any way I could get this to a lower threshold or any way that I can improve the performance of the whole query.
Remove the profiles table as it is not needed and change the WHERE clause to
WHERE PR.profileid = #profileID
You have a left outer join on the batchactionhistory table but also have a condition in your WHERE clause which turns it back into an inner join. Change you code to this:
LEFT OUTER JOIN batchactionhistory bah
ON b.id = bah.batchid
AND bah.batchactionid = 6
You don't need the batches table as it is used to join other tables which could be joined directly and to show the id in you SELECT which is also available in other tables. Make the following changes:
i.batchidid AS BatchNo,
LEFT OUTER JOIN batchactionhistory bah
ON i.batchidid = bah.batchid
Are any of the fields that are used in joins or the WHERE clause from tables that contain large amounts of data but are not indexed. If so try adding an index on at time to the largest table.
Do you need every field in the result - if you could loose one or to you maybe could reduce the number of tables further.
First, if this is not a stored procedure, make it one. That's a lot of text for sql server to complile.
Next, my experience is that "worst practices" are occasionally a good idea. Specifically, I have been able to improve performance by splitting large queries into a couple or three small ones and assembling the results.
If this query is associated with a .net, coldfusion, java, etc application, you might be able to do the split/re-assemble in your application code. If not, a temporary table might come in handy.

Query Performance too Slow

Im having performance issues with this query. If I remove the status column it runs very fast but adding the subquery in the column section delays way too much the query 1.02 min. How can I modify this query so it runs fast getting the desired data.
The reason I put that subquery there its because I needed the status for the latest activity, some activities have null status so I have to ignore them.
Establishments: 6.5k rows -
EstablishmentActivities: 70k rows -
Status: 2 (Active, Inactive)
SELECT DISTINCT
est.id,
est.trackingNumber,
est.NAME AS 'establishment',
actTypes.NAME AS 'activity',
(
SELECT stat3.NAME
FROM SACPAN_EstablishmentActivities eact3
INNER JOIN SACPAN_ActivityTypes at3
ON eact3.activityType_FK = at3.code
INNER JOIN SACPAN_Status stat3
ON stat3.id = at3.status_FK
WHERE eact3.establishment_FK = est.id
AND eact3.rowCreatedDT = (
SELECT MAX(est4.rowCreatedDT)
FROM SACPAN_EstablishmentActivities est4
INNER JOIN SACPAN_ActivityTypes at4
ON est4.establishment_fk = est.id
AND est4.activityType_FK = at4.code
WHERE est4.establishment_fk = est.id
AND at4.status_FK IS NOT NULL
)
AND at3.status_FK IS NOT NULL
) AS 'status',
est.authorizationNumber,
reg.NAME AS 'region',
mun.NAME AS 'municipality',
ISNULL(usr.NAME, '') + ISNULL(+ ' ' + usr.lastName, '')
AS 'created',
ISNULL(usr2.NAME, '') + ISNULL(+ ' ' + usr2.lastName, '')
AS 'updated',
est.rowCreatedDT,
est.rowUpdatedDT,
CASE WHEN est.rowUpdatedDT >= est.rowCreatedDT
THEN est.rowUpdatedDT
ELSE est.rowCreatedDT
END AS 'LatestCreatedOrModified'
FROM SACPAN_Establishments est
INNER JOIN SACPAN_EstablishmentActivities eact
ON est.id = eact.establishment_FK
INNER JOIN SACPAN_ActivityTypes actTypes
ON eact.activityType_FK = actTypes.code
INNER JOIN SACPAN_Regions reg
ON est.region_FK = reg.code --
INNER JOIN SACPAN_Municipalities mun
ON est.municipality_FK = mun.code
INNER JOIN SACPAN_ContactEstablishments ce
ON ce.establishment_FK = est.id
INNER JOIN SACPAN_Contacts con
ON ce.contact_FK = con.id
--JOIN SACPAN_Status stat ON stat.id = actTypes.status_FK
INNER JOIN SACPAN_Users usr
ON usr.id = est.rowCreatedBy_FK
LEFT JOIN SACPAN_Users usr2
ON usr2.id = est.rowUpdatedBy_FK
WHERE (con.ssn = #ssn OR #ssn = '*')
AND eact.rowCreatedDT = (
SELECT MAX(eact2.rowCreatedDT)
FROM SACPAN_EstablishmentActivities eact2
WHERE eact2.establishment_FK = est.id
)
--AND est.id = 6266
ORDER BY 'LatestCreatedOrModified' DESC
Try moving that 'activiy' query to a Left Join and see if it speeds it up.
I solved the problem by creating a temporary table and creating an index to it, this removed the need of the slow subquery in the select statement. Then I join the temp table as I do with normal tables.
Thanks to all.

Convert inner join to SQL Subquery

my SQL Query is as follow:
i m using inner join in that query...
i want to modify that into subquery
select distinct Auditdata.ID,
ns.ProviderMaster_ID as CDRComment
from Auditdata AuditData
inner join AuditMaster am
on am.ID = AuditData.AuditMaster_ID
inner join HomeCircleMaster hcm
on hcm.Ori_CircleMaster_ID = am.CircleMaster_ID and hcm.Ori_ServiceTypeMaster_ID = 1 and hcm.Dest_ServiceTypeMaster_ID = 1 inner join AuditTaggingMaster atm
on atm.AuditMaster_ID = am.ID
inner join NoSeriesMaster ns on
( ns.CircleMaster_ID = am.CircleMaster_ID or ns.CircleMaster_ID = hcm.Dest_CircleMaster_ID) and ns.ProviderMaster_ID <> am.ProviderMaster_ID and ns.ServiceTypeMaster_ID = 1
inner join ProviderMaster_CallTypeMaster pm_ctm
on pm_ctm.ProviderMaster_ID = am.ProviderMaster_ID and pm_ctm.CallTypeMaster_ID = 101 and pm_ctm.CallTypeTagValue =AuditData.CallTypeTag INNER JOIN NoSeriesMaster_Prefix PD ON AuditData.CallTo like PD.PrefixNo + '%' AND AuditData.calltolen = PD.PrefixLen + PD.AfterPrefixLen AND PD.PrefixNo + ns.NoSeries = LEFT(AuditData.CallTo, NoSeriesLen + PD.PrefixLen) where AuditData.TATCallType is null and AuditData.AuditMaster_ID = 74 AND PD.PrefixType = 'SMS'
Please help me
thanx
I would try to post the query beautified. Otherwise it is hard to help. Try to explain also what are you trying to do so. There are 6 Inner joins. Which one would you like to change.
Anyway. Your question seems to be an quite similar to This one. Please don't post questions twice.
Apart from that It seems that you are trying to optimize this query. Why don't you just ask who to make it faster? In my humble opinion using subquerys will make everything worse.
If you need help optimizing we would need more information like the table structure, indexes, number of rows, etc...
select distinct Auditdata.ID,
ns.ProviderMaster_ID as CDRComment
from Auditdata AuditData
inner join AuditMaster am on am.ID = AuditData.AuditMaster_ID
inner join HomeCircleMaster hcm on hcm.Ori_CircleMaster_ID = am.CircleMaster_ID
and hcm.Ori_ServiceTypeMaster_ID = 1
and hcm.Dest_ServiceTypeMaster_ID = 1
inner join AuditTaggingMaster atm on atm.AuditMaster_ID = am.ID
inner join NoSeriesMaster ns on ( ns.CircleMaster_ID = am.CircleMaster_ID
or ns.CircleMaster_ID = hcm.Dest_CircleMaster_ID)
and ns.ProviderMaster_ID <> am.ProviderMaster_ID
and ns.ServiceTypeMaster_ID = 1
inner join ProviderMaster_CallTypeMaster pm_ctm on pm_ctm.ProviderMaster_ID = am.ProviderMaster_ID
and pm_ctm.CallTypeMaster_ID = 101
and pm_ctm.CallTypeTagValue =AuditData.CallTypeTag
INNER JOIN NoSeriesMaster_Prefix PD ON AuditData.CallTo like PD.PrefixNo + '%'
AND AuditData.calltolen = PD.PrefixLen + PD.AfterPrefixLen
AND PD.PrefixNo + ns.NoSeries = LEFT(AuditData.CallTo, NoSeriesLen + PD.PrefixLen)
where AuditData.TATCallType is null
and AuditData.AuditMaster_ID = 74
AND PD.PrefixType = 'SMS