Using nested queries vs CTEs and performance impact

I've been using SQL for a while and would like to know whether, and how, it would make sense to convert a script containing a lot of CTEs into a regular nested script. I'm using:
WITH cte_account_pricelevelid
AS (SELECT a.accountid,
a.pricelevelid
FROM companypricelist a
JOIN(SELECT accountid
FROM crm_accountbase
WHERE defaultpricelevelid IS NULL) b
ON a.accountid = b.accountid),
totals
AS (SELECT a.accountid,
a.pricelevelid
FROM companypricelist a
JOIN(SELECT accountid
FROM crm_accountbase
WHERE defaultpricelevelid IS NULL) b
ON a.accountid = b.accountid),
totalsgrouped
AS (SELECT pricelevelid,
Count(*) counts
FROM totals
GROUP BY pricelevelid),
final
AS (SELECT cte.accountid,
cte.pricelevelid,
frequency.counts
FROM cte_account_pricelevelid cte
CROSS JOIN totalsgrouped frequency
WHERE cte.pricelevelid = frequency.pricelevelid),
mycolumns
AS (SELECT b.accountid,
b.pricelevelid,
b.counts
FROM (SELECT accountid
FROM crm_accountbase
WHERE defaultpricelevelid IS NULL) a
JOIN final b
ON a.accountid = b.accountid),
e
AS (SELECT *,
Row_number()
OVER (
partition BY accountid
ORDER BY counts, pricelevelid ) AS Recency
FROM mycolumns),
cte_result
AS (SELECT accountid,
pricelevelid
FROM e
WHERE recency = 1)
SELECT a.accountid,
a.defaultpricelevelid,
b.pricelevelid
FROM crm_accountbase a
JOIN cte_result b
ON a.accountid = b.accountid
I feel that it is silly to keep running the same query within my CTEs:
SELECT accountid
FROM crm_accountbase
WHERE defaultpricelevelid IS NULL
But I don't know how to get around it. I suppose that I could convert this into something like the following, but I don't know if there would be any performance gain:
select * from (select * from(select * from(...
Is there an opportunity to tremendously improve performance by converting this into a nested SQL or just simplifying the CTE? If so, would you kindly get me started?

If accountid is indexed, this join may not be able to use that index,
because the derived table has no index:
SELECT a.accountid, a.pricelevelid
FROM companypricelist a
JOIN (SELECT accountid
FROM crm_accountbase
WHERE defaultpricelevelid IS NULL) b
ON a.accountid = b.accountid
This version can use an index on accountid.
Even if there is no index on accountid, it will likely be faster:
SELECT a.accountid, a.pricelevelid
FROM companypricelist a
JOIN crm_accountbase b
ON a.accountid = b.accountid
AND b.defaultpricelevelid IS NULL
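A minimal sketch for checking that the two join forms above return the same rows before comparing their execution plans. It uses an in-memory SQLite database with invented sample data (both are assumptions for portability; the thread itself targets SQL Server), so it says nothing about which form the SQL Server optimizer prefers.

```python
import sqlite3

# In-memory database with made-up rows; table and column names follow the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE crm_accountbase (accountid INTEGER PRIMARY KEY, defaultpricelevelid INTEGER);
    CREATE TABLE companypricelist (accountid INTEGER, pricelevelid INTEGER);
    INSERT INTO crm_accountbase VALUES (1, NULL), (2, 10), (3, NULL);
    INSERT INTO companypricelist VALUES (1, 100), (1, 200), (2, 100), (3, 200);
""")

# Form 1: join against a derived table that pre-filters the accounts.
derived_sql = """
    SELECT a.accountid, a.pricelevelid
    FROM companypricelist a
    JOIN (SELECT accountid
          FROM crm_accountbase
          WHERE defaultpricelevelid IS NULL) b
      ON a.accountid = b.accountid
    ORDER BY a.accountid, a.pricelevelid
"""

# Form 2: join the base table directly and move the filter into the ON clause.
direct_sql = """
    SELECT a.accountid, a.pricelevelid
    FROM companypricelist a
    JOIN crm_accountbase b
      ON a.accountid = b.accountid
     AND b.defaultpricelevelid IS NULL
    ORDER BY a.accountid, a.pricelevelid
"""

derived_rows = conn.execute(derived_sql).fetchall()
direct_rows = conn.execute(direct_sql).fetchall()
print(derived_rows == direct_rows)
```

Once the results match, the actual execution plans on the real server are the thing to compare; the equality check only guards against changing the result while refactoring.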

I tried to simplify it a bit. Mind running it and letting me know the results?
WITH
totals
AS (SELECT a.accountid,
a.pricelevelid
FROM companypricelist a
JOIN(SELECT accountid
FROM crm_accountbase
WHERE defaultpricelevelid IS NULL) b
ON a.accountid = b.accountid),
totalsgrouped
AS (SELECT pricelevelid,
Count(*) counts
FROM totals
GROUP BY pricelevelid),
final
AS (SELECT cte.accountid,
cte.pricelevelid,
frequency.counts
FROM totals cte
CROSS JOIN totalsgrouped frequency
WHERE cte.pricelevelid = frequency.pricelevelid),
mycolumns
AS (SELECT b.accountid,
b.pricelevelid,
b.counts
FROM final b
JOIN totals a
ON a.accountid = b.accountid),
e
AS (SELECT *,
Row_number()
OVER (
partition BY accountid
ORDER BY counts, pricelevelid ) AS Recency
FROM mycolumns)
SELECT a.accountid,
a.defaultpricelevelid,
b.pricelevelid
FROM crm_accountbase a
JOIN e b
ON a.accountid = b.accountid
WHERE b.recency = 1

Related

How to display distinct values based on MAX date in report builder?

I'm quite new to SQL and I hope you can help me.
I'm trying to retrieve unique values from my table based on the latest date where specific users are selected.
This is the data:
Raw Data
And this is what I'm looking to achieve:
Desired Data
I tried to write 2 queries but unfortunately:
My 1st query would display duplicated rows for each company:
SELECT DISTINCT FilteredAppointment.regardingobjectidname
,FilteredAppointment.owneridname
,FilteredAppointment.subject
,MAX(FilteredAppointment.scheduledstart) as Date
,FilteredAppointment.location
,FilteredCcx_member.ccx_mnemonic
FROM FilteredAppointment
INNER JOIN FilteredAccount ON FilteredAppointment.regardingobjectid = FilteredAccount.accountid
INNER JOIN FilteredCcx_member ON FilteredAccount.accountid = FilteredCcx_member.ccx_accountid
WHERE FilteredAppointment.statecodename != N'Canceled'
AND FilteredAppointment.owneridname IN (N'User1', N'User2', N'User3')
GROUP BY FilteredAppointment.regardingobjectidname
,FilteredAppointment.owneridname
,FilteredAppointment.subject
,FilteredAppointment.scheduledstart
,FilteredAppointment.location
,FilteredCcx_member.ccx_mnemonic
ORDER BY FilteredAppointment.regardingobjectidname
And my 2nd query would display one row only:
SELECT DISTINCT FilteredAppointment.regardingobjectidname
,FilteredAppointment.owneridname
,FilteredAppointment.subject
,FilteredAppointment.scheduledstart
,FilteredAppointment.location
,FilteredCcx_member.ccx_mnemonic
FROM FilteredAppointment
INNER JOIN FilteredAccount ON FilteredAppointment.regardingobjectid = FilteredAccount.accountid
INNER JOIN FilteredCcx_member ON FilteredAccount.accountid = FilteredCcx_member.ccx_accountid
WHERE FilteredAppointment.scheduledstart = (SELECT MAX(FilteredAppointment.scheduledstart)
FROM FilteredAppointment
WHERE FilteredAppointment.regardingobjectidname = FilteredAppointment.regardingobjectidname)
AND FilteredAppointment.statecodename != N'Canceled'
AND FilteredAppointment.owneridname IN (N'User1', N'User2', N'User3')
GROUP BY FilteredAppointment.regardingobjectidname
,FilteredAppointment.owneridname
,FilteredAppointment.subject
,FilteredAppointment.scheduledstart
,FilteredAppointment.location
,FilteredCcx_member.ccx_mnemonic
ORDER BY FilteredAppointment.regardingobjectidname

Try this:
SELECT distinct a.date, a.company, a.companyID, a.User, a.Location, a.topic
FROM tablename a
inner join
(
Select company, companyID, User, max(date) as recent_date
from
tablename
group by company, companyID, User
) b
on a.date=b.recent_date and a.company=b.company and a.companyID=b.companyID
and a.User=b.User;

I managed to solve the issue - Thank you for the help again!
WITH apptmts AS (
SELECT TOP 1 WITH TIES fa.scheduledstart, fa.location, fa.regardingobjectidname, mem.ccx_mnemonic, fa.owneridname, fa.subject
FROM FilteredAppointment fa
JOIN FilteredAccount acc ON fa.regardingobjectid = acc.accountid
JOIN FilteredCcx_member mem ON acc.accountid = mem.ccx_accountid
WHERE fa.statecodename != N'Canceled'
AND fa.owneridname IN (N'User1', N'User2', N'User3')
ORDER BY ROW_NUMBER() OVER(PARTITION BY fa.regardingobjectidname ORDER BY fa.scheduledstart DESC)
)
SELECT * FROM apptmts
ORDER BY scheduledstart DESC
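TOP 1 WITH TIES is specific to SQL Server. The same "latest row per company" idea can be sketched by ranking rows in a CTE and filtering on the rank in the outer query. The sketch below runs on an in-memory SQLite database (3.25 or later for window functions); the `appointments` table, its columns, and its rows are hypothetical stand-ins for the CRM views in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, simplified stand-in for the FilteredAppointment view.
    CREATE TABLE appointments (company TEXT, owner TEXT, topic TEXT, scheduled TEXT);
    INSERT INTO appointments VALUES
        ('Acme', 'User1', 'Kickoff', '2017-01-10'),
        ('Acme', 'User2', 'Review',  '2017-03-05'),
        ('Bolt', 'User1', 'Demo',    '2017-02-01'),
        ('Bolt', 'User3', 'Renewal', '2017-01-15');
""")

# Number the rows of each company from newest to oldest, then keep row 1.
latest_rows = conn.execute("""
    WITH ranked AS (
        SELECT company, owner, topic, scheduled,
               ROW_NUMBER() OVER (PARTITION BY company
                                  ORDER BY scheduled DESC) AS rn
        FROM appointments
    )
    SELECT company, owner, topic, scheduled
    FROM ranked
    WHERE rn = 1
    ORDER BY scheduled DESC
""").fetchall()

for row in latest_rows:
    print(row)
```

Unlike TOP 1 WITH TIES, ROW_NUMBER() breaks ties arbitrarily, so a company with two appointments at the same latest timestamp yields one row here; RANK() would keep both.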

Limiting result sets by future date - SQL

The query below produces a record for each entry in the SP_ScheduleEvent table.
SELECT m.MaterialId, m.MaterialTitle, se.EventDateTime, c.ChannelName
FROM GB_Material m
LEFT OUTER JOIN SP_ScheduleEvent se on se.MaterialName = m.MaterialName
INNER JOIN SP_Schedule s on s.ScheduleID = se.ScheduleID
INNER JOIN GB_Channel c on c.ChannelID = s.ChannelID
WHERE LOWER(m.MaterialName) like '%foo%' OR LOWER(m.MaterialTitle) like '%foo%'
I want to limit the result set to the nearest future EventDateTime.
So per material name I would like to see one EventDateTime, which should be the nearest future date relative to the current time.
Lastly, a record may not exist in the SP_ScheduleEvent table for a particular material name, in which case NULL should be returned for the EventDateTime column.
SQLFiddle
How would I go about doing this?

First, your LEFT JOIN is immaterial, because the subsequent joins turn it into an INNER JOIN. Either use LEFT JOIN throughout the FROM clause or switch to INNER JOIN.
I think you can use ROW_NUMBER():
SELECT t.*
FROM (SELECT m.MaterialId, m.MaterialName, m.MaterialTitle, se.EventDateTime,
ROW_NUMBER() over (PARTITION BY m.MaterialId ORDER BY se.EventDateTime ASC) as seqnum -- ASC: the earliest future date is the nearest one
FROM GB_Material m INNER JOIN
SP_ScheduleEvent se
on se.MaterialName = m.MaterialName INNER JOIN
SP_Schedule s
on s.ScheduleID = se.ScheduleID INNER JOIN
GB_Channel c
on c.ChannelID = s.ChannelID
WHERE se.EventDateTime > getdate() AND
(LOWER(m.MaterialName) like '%foo%' OR LOWER(m.MaterialTitle) like '%foo%')
) t
WHERE seqnum = 1
ORDER BY t.EventDateTime;

Use the ROW_NUMBER() function:
WITH cte AS (
SELECT m.MaterialId, m.MaterialTitle, se.EventDateTime, c.ChannelName,
ROW_NUMBER() OVER (PARTITION BY m.MaterialId ORDER BY EventDateTime ASC) AS rn
FROM GB_Material m
LEFT OUTER JOIN SP_ScheduleEvent se on se.MaterialName = m.MaterialName
AND se.EventDateTime > GETDATE() -- filter in the ON clause so materials without future events still return a NULL row
LEFT OUTER JOIN SP_Schedule s on s.ScheduleID = se.ScheduleID
LEFT OUTER JOIN GB_Channel c on c.ChannelID = s.ChannelID
WHERE LOWER(m.MaterialName) like '%foo%' OR LOWER(m.MaterialTitle) like '%foo%'
)
SELECT * FROM cte
WHERE rn=1
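The effect of filtering the date inside the LEFT JOIN's ON clause, rather than in WHERE, can be checked with a small sketch. It runs on an in-memory SQLite database (3.25 or later); the `material` and `schedule_event` tables, their rows, and the fixed `now` value are hypothetical simplifications of the schema in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, simplified versions of GB_Material and SP_ScheduleEvent.
    CREATE TABLE material (material_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE schedule_event (name TEXT, event_time TEXT);
    INSERT INTO material VALUES (1, 'foo-a'), (2, 'foo-b');
    INSERT INTO schedule_event VALUES
        ('foo-a', '2020-01-01 09:00'),
        ('foo-a', '2020-03-01 09:00'),
        ('foo-a', '2020-06-01 09:00');
""")

# A fixed "current time" keeps the example deterministic.
now = '2020-02-01 00:00'

nearest_rows = conn.execute("""
    WITH ranked AS (
        SELECT m.material_id, m.name, se.event_time,
               ROW_NUMBER() OVER (PARTITION BY m.material_id
                                  ORDER BY se.event_time ASC) AS rn
        FROM material m
        LEFT JOIN schedule_event se
               ON se.name = m.name
              AND se.event_time > ?   -- date filter lives in the ON clause
    )
    SELECT material_id, name, event_time
    FROM ranked
    WHERE rn = 1
    ORDER BY material_id
""", (now,)).fetchall()

for row in nearest_rows:
    print(row)
```

Material 2 has no events at all and still appears, with `None` for the event time. Moving the date condition into WHERE would drop that row, because a NULL event time never satisfies the comparison.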

SQL inner join on alias (for XML)

I have a running query where I need to expand the XML hierarchy.
The existing query does this (WORKING):
select a.fields, (select c.fields from c),
(select d.fields from d), (select e.fields from e)
from a
--REPAIR ORDERS, PARTS, LABOR, NARRATIVE
I need to create another level at b (this is the job order for repair orders, aliased as bb):
--REPAIR ORDERS, JOB ID (JOB ID/PARTS, JOB ID/LABOR, JOB ID/NARRATIVE)
select a.fields, (select b.fields, (select c.fields from c),
(select d.fields from d), (select e.fields from e) from b)
bb
from a
So here's the code (this inner join is killing me).
(Also, think of REPAIR NARRATIVES as C; once I get this going I need to add D and E.)
It's the INNER JOIN at the comment lines that is stopping me:
declare @OEMDEALERCODE nvarchar(20),@SDate smalldatetime,@EDate smalldatetime,@DMxServiceROJobStatus_ReadyToInvoice int
SET @SDate = '01/01/2013'
SET @EDate = '12/31/2013'
SET @DMxServiceROJobStatus_ReadyToInvoice = dbo.[fn_DMxSysGetEnumItemValue](N'DMxServiceROJobStatus', N'ReadyToInvoice')
-- JobId hierarchy
select ff.QualifyingROX, ff.JobId, ff.JobName,
(
---------------------------------------------------------------------------------------------------
SELECT DISTINCT --REPAIR NARRATIVE
Concern, Cause, Correction, CauseMore, ConcernMore, CorrectionMore
FROM
(
SELECT DISTINCT
TOP (100) PERCENT dbo.DMXDEALERINFORMATIONTABLE.OEMDEALERCODE,
dbo.DMXSERVICEROTABLE.ROID,
dbo.DMXSERVICEROJOB.JOBID,
dbo.DMXSERVICEROJOB.STATUS,
DMXSERVICECCCSTATEMENT_1.TEXT CONCERN,
dbo.DMXSERVICECCCSTATEMENT.TEXT CAUSE,
DMXSERVICECCCSTATEMENT_2.TEXT CORRECTION,
dbo.DMXSERVICEROJOB.CUSTOMCAUSETEXT CAUSEMORE,
dbo.DMXSERVICEROJOB.CUSTOMCONCERNTEXT CONCERNMORE,
dbo.DMXSERVICEROJOB.CUSTOMCORRECTIONTEXT CORRECTIONMORE,
DMXSERVICECCCSTATEMENT_2.RECVERSION Expr5,
MAX(dbo.DMXSERVICEROJOB.RECVERSION) Expr4,
MAX(dbo.DMXSERVICECCCSTATEMENT.RECVERSION) Expr3,
MAX(DMXSERVICECCCSTATEMENT_1.RECVERSION) Expr1,
MAX(DMXSERVICECCCSTATEMENT_2.RECVERSION) Expr2
FROM dbo.DMXSERVICEROJOB (NOLOCK) INNER JOIN
dbo.DMXDEALERINFORMATIONTABLE (NOLOCK) INNER JOIN
dbo.DMXSERVICEROTABLE (NOLOCK) ON dbo.DMXDEALERINFORMATIONTABLE.PARTITION = dbo.DMXSERVICEROTABLE.PARTITION ON
dbo.DMXSERVICEROJOB.ROTABLEREF = dbo.DMXSERVICEROTABLE.RECID LEFT OUTER JOIN
dbo.DMXSERVICECCCSTATEMENT DMXSERVICECCCSTATEMENT_2 ON
dbo.DMXSERVICEROJOB.CORRECTIONREF = DMXSERVICECCCSTATEMENT_2.RECID LEFT OUTER JOIN
dbo.DMXSERVICECCCSTATEMENT ON dbo.DMXSERVICEROJOB.CAUSEREF = dbo.DMXSERVICECCCSTATEMENT.RECID LEFT OUTER JOIN
dbo.DMXSERVICECCCSTATEMENT DMXSERVICECCCSTATEMENT_1 ON
dbo.DMXSERVICEROJOB.CONCERNREF = DMXSERVICECCCSTATEMENT_1.RECID
GROUP BY dbo.DMXDEALERINFORMATIONTABLE.OEMDEALERCODE, dbo.DMXSERVICEROTABLE.ROID, dbo.DMXSERVICEROJOB.JOBID, dbo.DMXSERVICEROJOB.STATUS, DMXSERVICECCCSTATEMENT_1.TEXT, dbo.DMXSERVICECCCSTATEMENT.TEXT,
DMXSERVICECCCSTATEMENT_2.TEXT, dbo.DMXSERVICEROJOB.CUSTOMCAUSETEXT, dbo.DMXSERVICEROJOB.CUSTOMCONCERNTEXT,
dbo.DMXSERVICEROJOB.CUSTOMCORRECTIONTEXT, DMXSERVICECCCSTATEMENT_2.RECID, DMXSERVICECCCSTATEMENT_2.PARTITION,
dbo.DMXSERVICECCCSTATEMENT.RECVERSION, dbo.DMXSERVICECCCSTATEMENT.PARTITION, DMXSERVICECCCSTATEMENT_1.PARTITION,
dbo.DMXSERVICEROJOB.RECVERSION, dbo.DMXSERVICEROJOB.RECID, dbo.DMXSERVICEROJOB.PARTITION,
DMXSERVICECCCSTATEMENT_1.RECVERSION, DMXSERVICECCCSTATEMENT_1.RECID, dbo.DMXSERVICECCCSTATEMENT.RECID,
DMXSERVICECCCSTATEMENT_2.RECVERSION
having dbo.DMXDEALERINFORMATIONTABLE.OEMDEALERCODE = @OEMDEALERCODE
--and dbo.DMXSERVICEROTABLE.ROID = ff.QualifyingROX
and dbo.DMXSERVICEROJOB.STATUS=@DMxServiceROJobStatus_ReadyToInvoice
ORDER BY Expr4 DESC, Expr3 DESC, Expr1 DESC, Expr2 DESC
) cc
inner join ff on cc.ROID = ff.QualifyingROX
---------------------------------------------------------------------------------------------------
)
FROM
(
SELECT DISTINCT --REPAIR NARRATIVE
ee.JobId, ee.JobName, ee.QualifyingROX
from
(
SELECT DISTINCT TOP (100) PERCENT
dbo.DMXSERVICEROTABLE.ROID QualifyingROX,
dbo.DMXSERVICEROJOB.JOBID JobId,
MAX(DISTINCT dbo.DMXSERVICEROJOB.NAME) JobName
FROM dbo.DMXSERVICEROTABLE (nolock) INNER JOIN dbo.DMXSERVICEROJOB (NOLOCK) ON dbo.DMXSERVICEROTABLE.RECID = dbo.DMXSERVICEROJOB.ROTABLEREF
GROUP BY dbo.DMXSERVICEROTABLE.ROID, dbo.DMXSERVICEROJOB.JOBID
ORDER BY QualifyingROX, dbo.DMXSERVICEROJOB.JOBID
) ee
) ff
for XML PATH ('JobDetail'), ROOT ('Jobs'), TYPE

counting items that match within select statement

We have a stored procedure that executes against some fairly large tables and while joining to a larger table it is also keeping a tally of how many records match the corresponding batch_id. What I am trying to figure out is can I improve this with a function for the count or some other means? Trying to get rid of the nested SELECT COUNT(*) statement. The CCTransactions table is 1.4 million rows and the BatchItems is 6.6 million rows.
SELECT a.ItemAuthID, a.FeeAuthID, a.Batch_ID, a.ItemAuthCode,
a.FeeAuthCode, b.Amount, b.Fee,
(SELECT COUNT(*) FROM BatchItems WHERE Batch_ID = a.Batch_ID) AS BatchCount,
ItemBillDate, FeeBillDate, b.AccountNumber,
b.Itemcode, ItemAuthToken, FeeAuthToken,
cc.ItemMerchant, cc.FeeMerchant
FROM CCTransactions a WITH(NOLOCK)
INNER JOIN BatchItems b WITH(NOLOCK)
ON a.Batch_ID = b.Batch_ID
INNER JOIN CCConfig cc WITH(NOLOCK)
ON a.ClientCode = cc.ClientCode
WHERE ((ItemAuthCode > '' AND ItemBillDate IS NULL)
OR (FeeAuthCode > '' AND FeeBillDate IS NULL))
AND TransactionDate BETWEEN DATEADD(d,-7,GETDATE())
AND convert(char(20),getdate(),101) + ' ' + @Cutoff
ORDER BY TransactionDate

If your DBMS supports windowed aggregate functions, you can rewrite it to
COUNT(*) OVER (PARTITION BY Batch_ID)
Of course, this only counts the rows per Batch_ID that the SELECT actually returns. If the inner join results in fewer rows, it's not the correct number.
In that case it might be more efficient (depending on your DBMS) to rewrite the scalar subquery as a join:
SELECT a.ItemAuthID, a.FeeAuthID, a.Batch_ID, a.ItemAuthCode,
a.FeeAuthCode, b.Amount, b.Fee,
dt.BatchCount,
ItemBillDate, FeeBillDate, b.AccountNumber,
b.Itemcode, ItemAuthToken, FeeAuthToken,
cc.ItemMerchant, cc.FeeMerchant
FROM CCTransactions a WITH(NOLOCK)
INNER JOIN BatchItems b WITH(NOLOCK)
ON a.Batch_ID = b.Batch_ID
INNER JOIN CCConfig cc WITH(NOLOCK)
ON a.ClientCode = cc.ClientCode
INNER JOIN
(
SELECT Batch_ID, COUNT(*) AS BatchCount
FROM BatchItems
GROUP BY Batch_ID
) AS dt ON a.Batch_ID = dt.Batch_ID
WHERE ((ItemAuthCode > '' AND ItemBillDate IS NULL)
OR (FeeAuthCode > '' AND FeeBillDate IS NULL))
AND TransactionDate BETWEEN DATEADD(d,-7,GETDATE())
AND convert(CHAR(20),getdate(),101) + ' ' + @Cutoff
ORDER BY TransactionDate
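The difference between the windowed count and the pre-aggregated join can be seen side by side in a small sketch. It runs on an in-memory SQLite database (3.25 or later); the `cc_transactions` and `batch_items` tables, their rows, and the `amount > 0` filter are hypothetical and only stand in for the filtering that the real query's joins and WHERE clause perform.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, cut-down versions of CCTransactions and BatchItems.
    CREATE TABLE cc_transactions (txn_id INTEGER, batch_id INTEGER);
    CREATE TABLE batch_items (batch_id INTEGER, amount INTEGER);
    INSERT INTO cc_transactions VALUES (10, 1), (20, 2);
    INSERT INTO batch_items VALUES (1, 5), (1, 0), (1, 7), (2, 3);
""")

count_rows = conn.execute("""
    SELECT t.txn_id, b.amount,
           -- counts only the rows that survive the join and the WHERE filter
           COUNT(*) OVER (PARTITION BY t.batch_id) AS window_count,
           -- counts every item in the batch, independent of the outer filter
           dt.batch_count
    FROM cc_transactions t
    JOIN batch_items b ON t.batch_id = b.batch_id
    JOIN (SELECT batch_id, COUNT(*) AS batch_count
          FROM batch_items
          GROUP BY batch_id) dt ON t.batch_id = dt.batch_id
    WHERE b.amount > 0
    ORDER BY t.txn_id, b.amount
""").fetchall()

for row in count_rows:
    print(row)
```

Batch 1 has three items, but one is filtered out, so the windowed count reports 2 while the pre-aggregated count still reports 3. Which number is wanted decides which rewrite is appropriate.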

Inner join that ignore singlets

I have to do a self join on a table. I am trying to return a list of several columns to see how many of each type of drug test were performed on the same day (MM/DD/YYYY) on which at least two tests were done and at least one of them had a result code of 'UN'.
I am joining other tables to get the information as below. The problem is I do not quite understand how to exclude someone who has a single result row in which they did have a 'UN' result on a day but did not have any other tests that day.
Query Results (Columns)
County, DrugTestID, ID, Name, CollectionDate, DrugTestType, Results, Count(DrugTestType)
I have several rows for ID 12345, which are correct. But ID 12346 has a single row with a count of 1. They had a result of 'UN' on this day but they did not have any other tests that day. I want to exclude this.
I tried the following query
select
c.desc as 'County',
dt.pid as 'PID',
dt.id as 'DrugTestID',
p.id as 'ID',
bio.FullName as 'Participant',
CONVERT(varchar, dt.CollectionDate, 101) as 'CollectionDate',
dtt.desc as 'Drug Test Type',
dt.result as Result,
COUNT(dt.dru_drug_test_type) as 'Count Of Test Type'
from
dbo.Test as dt with (nolock)
join dbo.History as h on dt.pid = h.id
join dbo.Participant as p on h.pid = p.id
join BioData as bio on bio.id = p.id
join County as c with (nolock) on p.CountyCode = c.code
join DrugTestType as dtt with (nolock) on dt.DrugTestType = dtt.code
inner join
(
select distinct
dt2.pid,
CONVERT(varchar, dt2.CollectionDate, 101) as 'CollectionDate'
from
dbo.DrugTest as dt2 with (nolock)
join dbo.History as h2 on dt2.pid = h2.id
join dbo.Participant as p2 on h2.pid = p2.id
where
dt2.result = 'UN'
and dt2.CollectionDate between '11-01-2011' and '10-31-2012'
and p2.DrugCourtType = 'AD'
) as derived
on dt.pid = derived.pid
and convert(varchar, dt.CollectionDate, 101) = convert(varchar, derived.CollectionDate, 101)
group by
c.desc, dt.pid, p.id, dt.id, bio.fullname, dt.CollectionDate, dtt.desc, dt.result
order by
c.desc ASC, Participant ASC, dt.CollectionDate ASC

This is a little complicated because your query has a separate row for each test. You need to use window/analytic functions to get the information you want. These allow you to calculate aggregate functions but put the values on each row.
The following query starts with your query. It then calculates the number of UN results on each date for each participant and the total number of tests. It applies the appropriate filter to get what you want:
with base as (<your query here>)
select b.*
from (select b.*,
sum(isUN) over (partition by Participant, CollectionDate) as NumUNs,
count(*) over (partition by Participant, CollectionDate) as NumTests
from (select b.*,
(case when result = 'UN' then 1 else 0 end) as IsUN
from base b
) b
) b
where NumUNs <> 1 or NumTests <> 1
Without the with clause or window functions, you can create a particularly ugly query to do the same thing:
select b.*
from (<your query>) b join
(select Participant, CollectionDate, count(*) as NumTests,
sum(case when result = 'UN' then 1 else 0 end) as NumUNs
from (<your query>) b
group by Participant, CollectionDate
) bsum
on b.Participant = bsum.Participant and
b.CollectionDate = bsum.CollectionDate
where NumUNs <> 1 or NumTests <> 1
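A minimal sketch of the window-function approach on an in-memory SQLite database (3.25 or later). The `tests` table and its rows are hypothetical, and the filter is written directly from the requirement stated in the question (at least two tests that day, at least one of them 'UN') instead of the `<>` form above, since this toy table is not pre-filtered to days with a 'UN' result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical flattened result of the asker's base query.
    CREATE TABLE tests (participant TEXT, collection_date TEXT, result TEXT);
    INSERT INTO tests VALUES
        ('P1', '2012-01-05', 'UN'),
        ('P1', '2012-01-05', 'OK'),
        ('P2', '2012-01-05', 'UN'),   -- single test that day: should be excluded
        ('P3', '2012-01-05', 'OK'),
        ('P3', '2012-01-05', 'OK');   -- two tests but no 'UN': should be excluded
""")

kept_rows = conn.execute("""
    SELECT participant, collection_date, result
    FROM (SELECT t.*,
                 COUNT(*) OVER (PARTITION BY participant, collection_date) AS num_tests,
                 SUM(CASE WHEN result = 'UN' THEN 1 ELSE 0 END)
                     OVER (PARTITION BY participant, collection_date) AS num_uns
          FROM tests t) x
    WHERE num_tests >= 2 AND num_uns >= 1
    ORDER BY participant, collection_date, result
""").fetchall()

for row in kept_rows:
    print(row)
```

Only the two P1 rows remain: every test for the day is kept, not just the 'UN' one, which matches the per-test rows the question wants to list.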

If I understand the problem, the basic pattern for this sort of query is simply to include negating or exclusionary conditions in your join, i.e., a self-join where column A matches but columns B and C do not:
select
[columns]
from
table t1
join table t2 on (
t1.NonPkId = t2.NonPkId
and t1.PkId != t2.PkId
and t1.category != t2.category
)
Put the conditions in the WHERE clause if it benchmarks better:
select
[columns]
from
table t1
join table t2 on (
t1.NonPkId = t2.NonPkId
)
where
t1.PkId != t2.PkId
and t1.category != t2.category
And it's often easiest to start with the self-join, treating it as a "base table" on which to join all related information:
select
[columns]
from
(select
[columns]
from
table t1
join table t2 on (
t1.NonPkId = t2.NonPkId
)
where
t1.PkId != t2.PkId
and t1.category != t2.category
) bt
join [othertable] on (<whatever>)
join [othertable] on (<whatever>)
join [othertable] on (<whatever>)
This can allow you to focus on getting that self-join right, without interference from other tables.
