Changing comparator in WHERE clause has catastrophic results on query performance

I have a monster query that I'm running against a SQL Server 2005 database, and it is behaving very strangely. I have two conditions in the WHERE clause of the outermost SELECT, each comparing a field to a constant date. When the constant dates are either identical (down to the second) or their date parts are not equal, the query runs in under 2 seconds. When the date parts are the same but the time parts are different, the query takes around 7 minutes to complete. Specifically, having a WHERE clause of
WHERE
d.date >= '2011-11-07 00:00:00' AND
d.date <= '2011-11-08 11:59:59'
works well and as expected. Changing the WHERE clause to
WHERE
d.date >= '2011-11-07 00:00:00' AND
d.date <= '2011-11-07 11:59:59'
causes the query to take many minutes.
I also noticed that when I turned off the index on the Agent_Hours table, the bad case (both constants on the same date) drops to 25 seconds. That is still far longer than when the dates are different, but not by as much.
Below is the full query for reference (the WHERE clause in question is at the very end):
SELECT
s.transaction_id AS 'transaction',
s.created_on AS transaction_date,
s.first_name + ' ' + s.Last_Name AS customer_name,
a.name AS agent_name,
a.phantom AS phantom,
a.team AS agent_team,
a.id AS agent_number,
h.hours,
h2.hours_today,
d.*
FROM
(SELECT
agents.first_name + ' ' + agents.last_name AS name,
agents.id AS id,
agents.phantom AS phantom,
transient.value AS team,
transient.start_date AS team_start_date,
transient.end_date AS team_end_date
FROM
Agents.dbo.Agent_Static AS agents
JOIN
Agents.dbo.Agent_Transient AS transient
ON transient.agent = agents.id
WHERE
transient.field = 'team') AS a
LEFT JOIN Agents.dbo.Agent_Daily AS d
ON d.agent = a.id
LEFT JOIN (SELECT
agent_hours.agent AS agent,
dates.date AS date,
CAST(COUNT(*) AS FLOAT) / 4 AS hours
FROM
Agents.dbo.Agent_Hours AS agent_hours
JOIN
(SELECT
DISTINCT CONVERT(
VARCHAR(10),
hour_worked,
101)
AS date
FROM
Agents.dbo.Agent_Hours) AS dates
ON dates.date = CONVERT(
VARCHAR(10),
agent_hours.hour_worked,
101)
WHERE
(status = 'Phone' OR
status = 'Meeting')
GROUP BY
agent_hours.agent,
dates.date) AS h
ON h.agent = a.id AND
h.date = d.date
LEFT JOIN (SELECT
agent_hours.agent AS agent,
dates.date AS date,
CAST(COUNT(*) AS FLOAT) / 4 AS hours_today
FROM
Agents.dbo.Agent_Hours AS agent_hours
JOIN
(SELECT
DISTINCT CONVERT(
VARCHAR(10),
hour_worked,
101)
AS date
FROM
Agents.dbo.Agent_Hours) AS dates
ON dates.date = CONVERT(
VARCHAR(10),
agent_hours.hour_worked,
101)
WHERE
(status = 'Phone' OR
status = 'Meeting') AND
CONVERT(
VARCHAR(10),
CAST('11/09/2011 13:01' AS DATETIME),
101) = CONVERT(
VARCHAR(10),
agent_hours.hour_worked,
101) AND
CONVERT(
VARCHAR(10),
CAST('11/09/2011 13:01' AS DATETIME),
114) > CONVERT(
VARCHAR(10),
agent_hours.hour_worked,
114)
GROUP BY
agent_hours.agent,
dates.date) AS h2
ON h2.agent = a.id AND
h2.date = d.date
LEFT JOIN sale_transactions AS s
ON a.id = s.agent_hermes_id AND
s.created_on >= a.team_start_date AND
s.created_on <= a.team_end_date AND
CONVERT(
VARCHAR(10),
d.date,
101) = CONVERT(
VARCHAR(10),
s.created_on,
101)
LEFT JOIN sold_phrases AS p
ON s.Transaction_ID = p.transaction_id
WHERE
d.date >= '2011-11-07 00:00:00' AND
d.date <= '2011-11-07 11:59:59'

As a general rule, always post your exact table definitions, including all indexes, when asking about performance problems in SQL.
I cannot see any difference between the two cases, but considering your explanation, this is what likely happens: the cardinality estimates for the date range may cross the index tipping point, so you get wildly different execution plans. Such issues are best addressed by using plan guides, see Optimizing Queries in Deployed Applications by Using Plan Guides. You should be able to confirm whether the problem is indeed the plan, see Displaying Graphical Execution Plans (SQL Server Management Studio).

This may be a micro-optimization, but have you considered changing the way you get the date part of a datetime to DATEADD(dd, 0, DATEDIFF(dd, 0, datetime_format))? It is usually faster than the CONVERT function.
SELECT
s.transaction_id AS 'transaction',
s.created_on AS transaction_date,
s.first_name + ' ' + s.Last_Name AS customer_name,
a.name AS agent_name,
a.phantom AS phantom,
a.team AS agent_team,
a.id AS agent_number,
h.hours,
h2.hours_today,
d.*
FROM (SELECT
agents.first_name + ' ' + agents.last_name AS name,
agents.id AS id,
agents.phantom AS phantom,
transient.value AS team,
transient.start_date AS team_start_date,
transient.end_date AS team_end_date
FROM
Agents.dbo.Agent_Static AS agents
JOIN
Agents.dbo.Agent_Transient AS transient
ON transient.agent = agents.id
WHERE
transient.field = 'team'
) AS a
LEFT JOIN Agents.dbo.Agent_Daily AS d ON d.agent = a.id
LEFT JOIN (
SELECT
agent_hours.agent AS agent,
dates.date AS date,
COUNT(*) / 4.0 AS hours
FROM Agents.dbo.Agent_Hours AS agent_hours
JOIN (
SELECT DATEADD(dd, 0, DATEDIFF(dd, 0, hour_worked)) as date
FROM Agents.dbo.Agent_Hours GROUP BY DATEADD(dd, 0, DATEDIFF(dd, 0, hour_worked))
) AS dates ON dates.date = DATEADD(dd, 0, DATEDIFF(dd, 0, agent_hours.hour_worked))
WHERE (status = 'Phone' OR status = 'Meeting')
GROUP BY agent_hours.agent, dates.date
) AS h ON h.agent = a.id AND h.date = d.date
LEFT JOIN (
SELECT
agent_hours.agent AS agent,
dates.date AS date,
COUNT(*) / 4.0 AS hours_today
FROM Agents.dbo.Agent_Hours AS agent_hours
JOIN (
SELECT DATEADD(dd, 0, DATEDIFF(dd, 0, hour_worked)) as date
FROM Agents.dbo.Agent_Hours GROUP BY DATEADD(dd, 0, DATEDIFF(dd, 0, hour_worked))
) AS dates ON dates.date = DATEADD(dd, 0, DATEDIFF(dd, 0, agent_hours.hour_worked))
WHERE
(status = 'Phone' OR status = 'Meeting') AND
agent_hours.hour_worked >=
DATEADD(dd, 0, DATEDIFF(dd, 0, CAST('11/09/2011 13:01' AS DATETIME)))
AND
agent_hours.hour_worked <
CAST('11/09/2011 13:01' AS DATETIME)
GROUP BY agent_hours.agent, dates.date
) AS h2 ON h2.agent = a.id AND h2.date = d.date
LEFT JOIN sale_transactions AS s
ON a.id = s.agent_hermes_id AND
s.created_on >= a.team_start_date AND
s.created_on <= a.team_end_date AND
DATEADD(dd, 0, DATEDIFF(dd, 0, d.date))
=
DATEADD(dd, 0, DATEDIFF(dd, 0, s.created_on))
LEFT JOIN sold_phrases AS p
ON s.Transaction_ID = p.transaction_id
WHERE
d.date >= '2011-11-07 00:00:00' AND
d.date <= '2011-11-07 11:59:59'
More important (as Remus Rusanu already wrote) are the indexes. Execute both queries, check which indexes the faster query uses, and force SQL Server to always use them. You can do this with the WITH (INDEX(index_name)) table hint.
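
To make the DATEADD(dd, 0, DATEDIFF(dd, 0, x)) idiom above less opaque, here is a minimal Python sketch of the arithmetic it performs. It assumes the integer 0 converts to 1900-01-01 as a DATETIME (SQL Server's documented base date); the helper names `datediff_days`, `truncate_to_day` and `same_day_range` are made up for illustration and are not part of any library.

```python
from datetime import datetime, timedelta

# Assumption: in SQL Server the integer 0 converts to this DATETIME value.
SQL_SERVER_ZERO = datetime(1900, 1, 1)


def datediff_days(start, end):
    """Mimic DATEDIFF(dd, start, end): the number of day boundaries crossed."""
    return (end.date() - start.date()).days


def truncate_to_day(value):
    """Mimic DATEADD(dd, 0, DATEDIFF(dd, 0, value)): midnight of the same day."""
    whole_days = datediff_days(SQL_SERVER_ZERO, value)
    return SQL_SERVER_ZERO + timedelta(days=whole_days)


def same_day_range(value):
    """Half-open range [midnight, next midnight) covering the day of `value`."""
    start = truncate_to_day(value)
    return start, start + timedelta(days=1)


stamp = datetime(2011, 11, 9, 13, 1)
print(truncate_to_day(stamp))   # 2011-11-09 00:00:00
print(same_day_range(stamp))
```

The half-open range is the form the rewritten h2 subquery uses: it compares hour_worked directly against two computed bounds instead of wrapping the column in a conversion on every row.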

Related

GROUP BY & SUM of values with missing MONTHS

I have gone through a lot of examples and joined couple of them in order to come down to the following statement;
DECLARE @StartDate SMALLDATETIME, @EndDate SMALLDATETIME;
SELECT @StartDate = '20170930', @EndDate = '20180930';
;WITH d(d) AS
(
SELECT DATEADD(MONTH, n, DATEADD(MONTH, DATEDIFF(MONTH, 0, @StartDate), 0))
FROM ( SELECT TOP (DATEDIFF(MONTH, @StartDate, @EndDate) + 1)
n = ROW_NUMBER() OVER (ORDER BY [object_id]) - 1
FROM sys.all_objects ORDER BY [object_id] ) AS n
)
SELECT
[Period] = CONVERT(VARCHAR(4), YEAR(d.d)) + '-' + CONVERT(VARCHAR(2), MONTH(d.d)),
QtyTotal = ISNULL(SUM(o.QEXIT),0)
FROM d LEFT OUTER JOIN VE_STOCKTRANS AS o
ON o.TRANSDATE >= d.d
AND o.TRANSDATE < DATEADD(MONTH, 1, d.d)
WHERE STOCKID = 6000 AND TRANSTYPE = 3553
GROUP BY d.d
ORDER BY d.d;
I need to get the total sales quantity of an item for the past year. If the item does not have any sales for a particular month, 0 should be displayed next to that month. The above query does what is required as long as no WHERE clause is provided. As soon as I add the WHERE clause to get the data for a specific product, the months with no sales disappear.
I would be grateful if an experienced SQL developer can show me the right direction on this.
Thanks

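You need to move the condition to ON. A WHERE filter on the outer-joined table discards the NULL-extended rows, which effectively turns the LEFT JOIN into an inner join: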
-- ...
SELECT
[Period] = CONVERT(VARCHAR(4),YEAR(d.d)) +'-'+ CONVERT(VARCHAR(2), MONTH(d.d)),
QtyTotal = ISNULL(SUM(o.QEXIT),0)
FROM d LEFT OUTER JOIN VE_STOCKTRANS AS o
ON o.TRANSDATE >= d.d
AND o.TRANSDATE < DATEADD(MONTH, 1, d.d)
AND STOCKID = 6000 AND TRANSTYPE = 3553 -- here
GROUP BY d.d
ORDER BY d.d;
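
The difference between filtering in WHERE and filtering in ON can be reproduced on a toy dataset. The sketch below uses Python's built-in sqlite3 module as a stand-in for SQL Server (so IFNULL and date(..., '+1 month') replace ISNULL and DATEADD); the `months` and `stocktrans` tables and their rows are invented for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE months (month_start TEXT PRIMARY KEY);
    CREATE TABLE stocktrans (transdate TEXT, stockid INTEGER, qexit INTEGER);
    INSERT INTO months VALUES ('2018-01-01'), ('2018-02-01'), ('2018-03-01');
    INSERT INTO stocktrans VALUES
        ('2018-01-15', 6000, 5),
        ('2018-01-20', 6000, 2),
        ('2018-02-10', 7000, 9),   -- a different product: February has no sales of 6000
        ('2018-03-05', 6000, 4);
""")

# Filter in WHERE: rows where t.stockid is NULL or another product are
# thrown away after the join, so February vanishes from the result.
FILTER_IN_WHERE = """
    SELECT m.month_start, IFNULL(SUM(t.qexit), 0)
    FROM months AS m
    LEFT JOIN stocktrans AS t
           ON t.transdate >= m.month_start
          AND t.transdate < date(m.month_start, '+1 month')
    WHERE t.stockid = 6000
    GROUP BY m.month_start
    ORDER BY m.month_start
"""

# Filter in ON: non-matching transactions are simply not joined, and the
# month row survives with NULLs, which IFNULL turns into 0.
FILTER_IN_ON = """
    SELECT m.month_start, IFNULL(SUM(t.qexit), 0)
    FROM months AS m
    LEFT JOIN stocktrans AS t
           ON t.transdate >= m.month_start
          AND t.transdate < date(m.month_start, '+1 month')
          AND t.stockid = 6000
    GROUP BY m.month_start
    ORDER BY m.month_start
"""

rows_where = conn.execute(FILTER_IN_WHERE).fetchall()
rows_on = conn.execute(FILTER_IN_ON).fetchall()
print(rows_where)  # February is missing
print(rows_on)     # February is present with a total of 0
```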

A more generic approach is to apply the filter before you join.
;WITH d(d) AS
(
SELECT DATEADD(MONTH, n, DATEADD(MONTH, DATEDIFF(MONTH, 0, @StartDate), 0))
FROM ( SELECT TOP (DATEDIFF(MONTH, @StartDate, @EndDate) + 1)
n = ROW_NUMBER() OVER (ORDER BY [object_id]) - 1
FROM sys.all_objects ORDER BY [object_id] ) AS n
),
o AS
(
SELECT *
FROM VE_STOCKTRANS
WHERE STOCKID = 6000
AND TRANSTYPE = 3553
)
SELECT
[Period] = CONVERT(VARCHAR(4), YEAR(d.d)) + '-' + CONVERT(VARCHAR(2), MONTH(d.d)),
QtyTotal = ISNULL(SUM(o.QEXIT),0)
FROM
d
LEFT OUTER JOIN
o
ON o.TRANSDATE >= d.d
AND o.TRANSDATE < DATEADD(MONTH, 1, d.d)
GROUP BY
d.d
ORDER BY
d.d;
It's not strictly necessary here, as you've seen in the other answer. When doing FULL OUTER JOIN or other complex queries, however, it can be extremely helpful to filter your sources in one scope and join in a separate scope.
(I always filter my sources, I hate lumpy ketchup.)

Multiple columns are specified in an aggregated expression containing an outer reference TSQL

I have the following query:
SELECT
FileNumber,
dbo.GetLocalDateTimeFunc(SentDate) AS SentDate
INTO #tmp1
FROM FileMain f
JOIN FileActions fa ON f.FileID = fa.FileID
WHERE ActionDefID = 15 AND SentDate IS NOT NULL
SELECT
FileNumber,
dbo.GetLocalDateTimeFunc(ReceivedDate) AS ReceivedDate
INTO #tmp2
FROM FileMain f
JOIN FileActions fa ON f.FileID = fa.FileID
WHERE ActionDefID = 23 AND ReceivedDate IS NOT NULL
SELECT DISTINCT
o.Name AS Company, fm.FileNumber, pc.Name as Client,
p.State, c.County, t1.SentDate, t2.ReceivedDate,
(SELECT sum(case
when dateadd(day, datediff(day, 0, t1.SentDate), 0) = dateadd(day, datediff(day, 0, t2.ReceivedDate), 0) then
datediff(second, t1.SentDate, t2.ReceivedDate)
when [DATE] = dateadd(day, datediff(day, 0, t1.SentDate), 0) then
case
when t1.SentDate > [DATE] + begin_time then datediff(second, t1.SentDate, [DATE] + end_time)
else duration
end
when [DATE] = dateadd(day, datediff(day, 0, t2.ReceivedDate), 0) then
case
when t2.ReceivedDate < [DATE] + end_time then datediff(second, [DATE] + begin_time, t2.ReceivedDate)
else duration
end
else duration
end
)
/ 60.0 / 60.0
FROM F_TABLE_DATE(t1.SentDate, t2.ReceivedDate) d
INNER JOIN Unisource_Calendar c ON d.WEEKDAY_NAME_LONG = c.day_name)
FROM Office o
JOIN PartnerCompany pc ON o.OfficeID = pc.OfficeID
JOIN FileMain fm ON o.OfficeID = fm.OfficeID AND pc.PartnerCompanyID = fm.ClientID
JOIN Property p ON p.FileID = fm.FileID
JOIN County c ON p.CountyID = c.CountyID
JOIN FileActions fa ON fm.FileID = fa.FileID
JOIN #tmp1 t1 ON t1.FileNumber = fm.FileNumber
JOIN #tmp2 t2 ON t2.FileNumber = fm.FileNumber
WHERE p.State IN ('AR', 'CA', 'CO', 'DE', 'DC', 'FL', 'GA', 'IL', 'IN', 'IA', 'KS', 'KY', 'LA', 'MD', 'MA', 'MI', 'MN', 'MS', 'MO', 'MT', 'NE', 'NJ', 'NV', 'NH', 'NY', 'NC', 'ND', 'OH', 'OK', 'PA', 'RI', 'SC', 'TN', 'TX', 'VA', 'WV', 'WI')
ORDER BY SentDate, FileNumber DESC
I'm getting the following error on my subquery:
Multiple columns are specified in an aggregated expression containing an outer reference. If an expression being aggregated contains an outer reference, then that outer reference must be the only column referenced in the expression.
Does anybody know how to fix this?
Or if someone has a function that can calculate datetime differences while excluding business hours and weekends that would help also. Thanks!

I would recommend simplifying your code with CTEs for a start (listing ALL the tables makes it hard to give a precise answer). You could also try using your SUM aggregate as a window function with a PARTITION BY expression. That would probably help avoid the problem you mentioned.

From what I can glean, the table function F_Table_Date is returning DATE or DATETIME rows for each day between the two parameters, and the UnisourceCalendar is likely a list of work days (to allow for holidays as you mentioned). If this is the case, and UnisourceCalendar also returns a DATE or DATETIME column, consider this for your subquery:
SELECT (COUNT(*) * 60*60*24)
+ (
SELECT COUNT(*)
FROM UnisourceCalendar
WHERE [DATE] = CAST(CONVERT(VARCHAR,t1.SentDate+1,112) AS DATETIME)
)*DATEDIFF(SS,t1.SentDate,CAST(CONVERT(VARCHAR,t1.SentDate+1,112) AS DATETIME))
+ (
SELECT COUNT(*)
FROM UnisourceCalendar
WHERE [DATE] = CAST(CONVERT(VARCHAR,t1.SentDate+1,112) AS DATETIME)
)*DATEDIFF(SS,CAST(CONVERT(VARCHAR,t2.ReceivedDate,112) AS DATETIME),t2.ReceivedDate)
FROM UnisourceCalendar C
WHERE C.[DATE] > t1.SentDate AND C.[DATE] < t2.ReceivedDate
GROUP BY t1.SentDate, t2.ReceivedDate
What's at play here:
Presuming 1 row per business day from UnisourceCalendar, any other join is superfluous.
A count is all that's needed, then.
Converting a date with style 112 and casting it back strips the time and yields midnight. That lets us get the seconds from the sent date to the next midnight, and from the previous midnight to the received date, but only if each date is in the Unisource calendar (multiply by the count: if 0, no seconds are added; if 1, the extra seconds are added).
The output presumes that you will be dividing the result down to hours outside the subquery, as you do now.
Complicated? Sure, but it should output the results you're looking for in relatively short order.

Tough T-SQL To Left Join?

I've got a table of ExchangeRates that have a countryid and an exchangeratedate something to this effect:
ExchangeRateID Country ToUSD ExchangeRateDate
1 Euro .7400 2/14/2011
2 JAP 80.1900 2/14/2011
3 Euro .7700 7/20/2011
Notice there can be the same country with a different rate based on the date...so for instance above Euro was .7400 on 2/14/2011 and now is .7700 7/20/2011.
I have another table of line items that lists items by country. In this table each line item has a date associated with it. Each line item should use the exchange rate that matches its country and date. So, using the above data, if I had a line item with country Euro on 2/16/2011, it should use the Euro value for 2/14/2011 and not the value for 7/20/2011, because of the date (condition er.ExchangeRateDate <= erli.LineItemDate). This would work if I only had one item in the table, but imagine I had a line item date of 8/1/2011: then that condition (er.ExchangeRateDate <= erli.LineItemDate) would return multiple rows, hence my query would fail...
SELECT
er.ExchangeRateID,
er.CountryID AS Expr1,
er.ExchangeRateDate,
er.ToUSD,
erli.ExpenseReportLineItemID,
erli.ExpenseReportID,
erli.LineItemDate
FROM
dbo.ExpenseReportLineItem AS erli
LEFT JOIN
dbo.ExchangeRate AS er
ON er.CountryID = erli.CountryID
AND DATEADD(d, DATEDIFF(d, 0, er.ExchangeRateDate), 0) <= DATEADD(d, DATEDIFF(d, 0,
erli.LineItemDate), 0)
WHERE (erli.ExpenseReportID = 196)
The issue with this left join is that the condition matches every exchange rate dated on or before the line item date, so it returns many records. I would have to restrict it somehow, but I don't know how.
The LineItem tables has multiple records and each record could have its own CountryID:
Item Country ParentID LineItemDate
Line Item 1 Euro 1 2/14/2011
Line Item 2 US 1 2/14/2011
Line Item3 Euro 1 2/15/2011
So there are three records for ParentID (ExpenseReportID) = 1. So then I take those records and join the ExchangeRate table where the Country in my line item table = the country of the exchange rate table (that part is easy) BUT the second condition I have to do is the:
AND DATEADD(d, DATEDIFF(d, 0, er.ExchangeRateDate), 0) <= DATEADD(d, DATEDIFF(d, 0,
erli.LineItemDate), 0)
But here is where the issue is because that will return multiple rows from my exchange rate table because euro is listed twice.

I may be missing something here, but as I understand it the "dumb" solution to your problem is to use a ROW_NUMBER function and an outer filter with your existing "returns too many entries" query (this can also be done with a CTE, but I prefer the derived table syntax for simple cases like this):
SELECT *
FROM (
SELECT
er.ExchangeRateID,
er.CountryID AS Expr1,
er.ExchangeRateDate,
er.ToUSD,
erli.ExpenseReportLineItemID,
erli.ExpenseReportID,
erli.LineItemDate,
ROW_NUMBER() OVER (PARTITION BY ExpenseReportID, ExpenseReportLineItemID ORDER BY ExchangeRateDate DESC) AS ExchangeRateOrderID
FROM dbo.ExpenseReportLineItem AS erli
LEFT JOIN dbo.ExchangeRate AS er
ON er.CountryID = erli.CountryID
AND DATEADD(d, DATEDIFF(d, 0, er.ExchangeRateDate), 0)
<= DATEADD(d, DATEDIFF(d, 0, erli.LineItemDate), 0)
WHERE (erli.ExpenseReportID = 196)
--For reasonable performance, it would be VERY nice to put a filter
-- on how far back the exchange rates can go here:
--AND er.ExchangeRateDate > DateAdd(Day, -7, GetDate())
) As FullData
WHERE ExchangeRateOrderID = 1
Sorry if I misunderstood, otherwise hope this helps!

It would make your life a lot easier if you could add an additional column to your ExchangeRates table called (something like)
ExchangeRateToDate
A separate process could update the previous entry when a new one was added.
Then, you could just query for LineItemDate >= ExchangeRateDate and <= ExchangeRateToDate
(treating the last one, presumably with a null ExchangeRateToDate, as a special case).

I would build an in-memory table (a CTE) from the ExchangeRate table that carries ExchangeRateDate From and To columns.
All that's left to do after this is to join this CTE in your query instead of your ExchangeRate table and add a condition that the date is between the from/to dates.
SQL Statement
;WITH er AS (
SELECT rn = ROW_NUMBER() OVER (PARTITION BY er1.ExchangeRateID ORDER BY er2.ExchangeRateDate DESC)
, er1.ExchangeRateID
, er1.Country
, ExchangeRateDateFrom = ISNULL(DATEADD(d, 1, er2.ExchangeRateDate), 0)
, ExchangeRateDateTo = er1.ExchangeRateDate
, er1.ToUSD
FROM @ExchangeRate er1
LEFT OUTER JOIN @ExchangeRate er2
ON er1.Country = er2.Country
AND er1.ExchangeRateDate >= er2.ExchangeRateDate
AND er1.ExchangeRateID > er2.ExchangeRateID
)
SELECT er.ExchangeRateID,
er.CountryID AS Expr1,
er.ExchangeRateDateTo,
er.ToUSD,
erli.ExpenseReportLineItemID,
erli.ExpenseReportID,
erli.LineItemDate
FROM dbo.ExpenseReportLineItem AS erli
LEFT JOIN er ON er.CountryID = erli.CountryID
AND DATEADD(d, DATEDIFF(d, 0, er.ExchangeRateDateTo), 0) <= DATEADD(d, DATEDIFF(d, 0, erli.LineItemDate), 0)
AND DATEADD(d, DATEDIFF(d, 0, er.ExchangeRateDateFrom), 0) >= DATEADD(d, DATEDIFF(d, 0, erli.LineItemDate), 0)
WHERE (erli.ExpenseReportID = 196)
and er.rn = 1
Test script
DECLARE @ExchangeRate TABLE (
ExchangeRateID INTEGER
, Country VARCHAR(32)
, ToUSD FLOAT
, ExchangeRateDate DATETIME
)
INSERT INTO @ExchangeRate
VALUES (1, 'Euro', 0.7400, '02/14/2011')
, (2, 'JAP', 80.1900, '02/14/2011')
, (3, 'Euro', 0.7700, '07/20/2011')
, (4, 'Euro', 0.7800, '07/25/2011')
;WITH er AS (
SELECT rn = ROW_NUMBER() OVER (PARTITION BY er1.ExchangeRateID ORDER BY er2.ExchangeRateDate DESC)
, er1.ExchangeRateID
, er1.Country
, ExchangeRateDateFrom = ISNULL(DATEADD(d, 1, er2.ExchangeRateDate), 0)
, ExchangeRateDateTo = er1.ExchangeRateDate
, ToUSD = er1.ToUSD
FROM @ExchangeRate er1
LEFT OUTER JOIN @ExchangeRate er2
ON er1.Country = er2.Country
AND er1.ExchangeRateDate >= er2.ExchangeRateDate
AND er1.ExchangeRateID > er2.ExchangeRateID
)
SELECT *
FROM er
WHERE rn = 1

Perhaps you can try using a table expression to get to your TOP 1 and then JOIN to the table expression. Does that make sense? Hope this helps.

This can be solved by using one or more CTEs. This earlier SO question should have the needed building blocks :
How can you use SQL to return values for a specified date or closest date < specified date?
Note that you have to modify this to your own schema, and also filter out results that are closer but in the future.
I hope this helps, but if not enough then I'm sure I can post a more detailed answer.

If I understand correctly what you want to do, you could use an OUTER APPLY to get the latest exchange rate.
select *
from ExpenseReportLineItem erli
outer apply (select top 1 *
from ExchangeRates as er1
where er1.Country = erli.Country and
er1.ExchangeRateDate <= erli.LineItemDate
order by er1.ExchangeRateDate desc) as er
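
The "latest rate on or before the line item date" rule can be tried out on the sample data from the question. The sketch below is an approximation only: SQLite has no OUTER APPLY or TOP, so it joins on a correlated subquery with ORDER BY ... LIMIT 1 instead, run through Python's sqlite3 module. The table and column names are simplified stand-ins, the third line item is invented to show the no-match case, and the join assumes the rate id is unique.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE exchange_rate (id INTEGER, country TEXT, to_usd REAL, rate_date TEXT);
    CREATE TABLE line_item (item TEXT, country TEXT, item_date TEXT);
    INSERT INTO exchange_rate VALUES
        (1, 'Euro', 0.74,  '2011-02-14'),
        (2, 'JAP',  80.19, '2011-02-14'),
        (3, 'Euro', 0.77,  '2011-07-20');
    INSERT INTO line_item VALUES
        ('Line Item 1', 'Euro', '2011-02-16'),
        ('Line Item 2', 'Euro', '2011-08-01'),
        ('Line Item 3', 'US',   '2011-02-14');   -- no rate exists for this country
""")

# For every line item, pick the single newest rate of the same country
# dated on or before the line item. The subquery yields NULL when no
# rate qualifies, so the LEFT JOIN keeps the line item with NULL columns.
LATEST_RATE = """
    SELECT li.item, er.id, er.to_usd
    FROM line_item AS li
    LEFT JOIN exchange_rate AS er
           ON er.id = (SELECT r.id
                       FROM exchange_rate AS r
                       WHERE r.country = li.country
                         AND r.rate_date <= li.item_date
                       ORDER BY r.rate_date DESC
                       LIMIT 1)
    ORDER BY li.item
"""

rows = conn.execute(LATEST_RATE).fetchall()
for row in rows:
    print(row)
```

Each line item comes back exactly once: the 2/16/2011 item gets the 2/14/2011 rate, the 8/1/2011 item gets the 7/20/2011 rate, and the item without any rate keeps NULLs.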

You can use this as a correlated subquery that will give you a table with the most recent exchange values for a given date (indicated in a comment):
SELECT *
FROM er
INNER JOIN
(
SELECT CountryID, MAX(ExchangeRateDate) AS ExchangeRateDate
FROM er
WHERE ExchangeRateDate <= '9/1/2011'
-- the above is the date you will need to correlate with the main query...
GROUP BY CountryID
) iq
ON iq.CountryID = er.CountryID AND er.ExchangeRateDate = iq.ExchangeRateDate
So the full query should look something like this:
SELECT
iq2.ExchangeRateID,
iq2.CountryID AS Expr1,
iq2.ExchangeRateDate,
iq2.ToUSD,
erli.ExpenseReportLineItemID,
erli.ExpenseReportID,
erli.LineItemDate
FROM dbo.ExpenseReportLineItem AS erli
LEFT JOIN
(
SELECT *
FROM ExchangeRate er
INNER JOIN
(
SELECT CountryID, MAX(ExchangeRateDate) AS ExchangeRateDate
FROM ExchangeRate er
WHERE ExchangeRateDate <= erli.LineItemDate
-- the above is where the correlation occurs...
GROUP BY CountryID
) iq
ON iq.CountryID = er.CountryID AND er.ExchangeRateDate = iq.ExchangeRateDate
) iq2
ON iq2.CountryID = erli.CountryID
AND DATEADD(d, DATEDIFF(d, 0, iq2.ExchangeRateDate), 0) <= DATEADD(d, DATEDIFF(d, 0, erli.LineItemDate), 0)
WHERE (erli.ExpenseReportID = 196)

why does adding the where statement to this sql make it run so much slower?

I have inherited a stored procedure and am having problems with it: it takes a very long time to run (around 3 minutes). I have played around with it, and without the WHERE clause it takes only 12 seconds to run. None of the tables it references have a lot of data in them. Can anybody see any reason why adding the main WHERE clause below makes it take so much longer?
ALTER Procedure [dbo].[MissingReadingsReport] @SiteID INT,
@FormID INT,
@StartDate Varchar(8),
@EndDate Varchar(8)
As
If @EndDate > GetDate()
Set @EndDate = Convert(Varchar(8), GetDate(), 112)
Select Dt.FormID,
DT.FormDAte,
DT.Frequency,
Dt.DayOfWeek,
DT.NumberOfRecords,
Dt.FormName,
dt.OrgDesc,
Dt.CDesc
FROM (Select MeterForms.FormID,
MeterForms.FormName,
MeterForms.SiteID,
MeterForms.Frequency,
DateTable.FormDate,
tblOrganisation.OrgDesc,
CDesc = ( COMPANY.OrgDesc ),
DayOfWeek = CASE Frequency
WHEN 'Day' THEN DatePart(dw, DateTable.FormDate)
WHEN 'WEEK' THEN
DatePart(dw, MeterForms.FormDate)
END,
NumberOfRecords = CASE Frequency
WHEN 'Day' THEN (Select TOP 1 RecordID
FROM MeterReadings
Where
MeterReadings.FormDate =
DateTable.FormDate
And MeterReadings.FormID =
MeterForms.FormID
Order By RecordID DESC)
WHEN 'WEEK' THEN (Select TOP 1 ( FormDate )
FROM MeterReadings
Where
MeterReadings.FormDate >=
DateAdd(d
, -4,
DateTable.FormDate)
And MeterReadings.FormDate
<=
DateAdd(d, 3,
DateTable.FormDate)
AND MeterReadings.FormID =
MeterForms.FormID)
END
FROM MeterForms
INNER JOIN DateTable
ON MeterForms.FormDate <= DateTable.FormDate
INNER JOIN tblOrganisation
ON MeterForms.SiteID = tblOrganisation.pkOrgId
INNER JOIN tblOrganisation COMPANY
ON tblOrganisation.fkOrgID = COMPANY.pkOrgID
/*this is what makes the query run slowly*/
Where DateTable.FormDAte >= @StartDAte
AND DateTable.FormDate <= @EndDate
AND MeterForms.SiteID = ISNULL(@SiteID, MeterForms.SiteID)
AND MeterForms.FormID = IsNull(@FormID, MeterForms.FormID)
AND MeterForms.FormID > 0)DT
Where ( Frequency = 'Day'
And dt.NumberofRecords IS NULL )
OR ( ( Frequency = 'Week'
AND DayOfWeek = DATEPART (dw, Dt.FormDate) )
AND ( FormDate <> NumberOfRecords
OR dt.NumberofRecords IS NULL ) )
Order By FormID

Based on what you've already mentioned, it looks like the tables are properly indexed for the columns in the join conditions but not for the columns in the WHERE clause.
If you're not willing to change the query, it may be worth looking into indexes on the WHERE clause columns, especially those wrapped in the NULL check.

Try replacing your select with this:
FROM
(select siteid, formid, formdate from meterforms
where siteid = isnull(@siteid, siteid) and
meterforms.formid = isnull(@formid, formid) and formid >0
) MeterForms
INNER JOIN
(select formdate from datetable where formdate >= @startdate and formdate <= @enddate) DateTable
ON MeterForms.FormDate <= DateTable.FormDate
INNER JOIN tblOrganisation
ON MeterForms.SiteID = tblOrganisation.pkOrgId
INNER JOIN tblOrganisation COMPANY
ON tblOrganisation.fkOrgID = COMPANY.pkOrgID
/*this is what makes the query run slowly*/
)DT

I would be willing to bet that if you moved the Meterforms where clauses up to the from statement:
FROM (select [columns] from MeterForms WHERE SiteID= ISNULL [etc] ) MF
INNER JOIN [etc]
It would be faster, as the filtering would occur before the join. Also, because your INNER JOIN on DateTable uses <=, leaving the date-range filter down in the WHERE clause may make the join return more than you'd like ... try moving that range filter up into a subselect as well.
Have you run an execution plan on this yet to see where the bottleneck is?

Random suggestion, coming from an Oracle background:
What happens if you rewrite the following:
AND MeterForms.SiteID = ISNULL(@SiteID, MeterForms.SiteID)
AND MeterForms.FormID = IsNull(@FormID, MeterForms.FormID)
...to
AND (@SiteID is null or MeterForms.SiteID = @SiteID)
AND (@FormID is null or MeterForms.FormID = @FormID)
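
One thing worth checking before swapping the two predicate forms is that they are not strictly equivalent when the column itself can be NULL. The sketch below shows only that difference in results; it says nothing about how SQL Server would plan either form. It uses Python's sqlite3 module with IFNULL standing in for T-SQL's ISNULL, and the `meter_forms` table, its rows and the `form_ids` helper are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE meter_forms (form_id INTEGER, site_id INTEGER);
    INSERT INTO meter_forms VALUES (1, 10), (2, 20), (3, NULL);
""")

# Original style: compare the column with itself when the parameter is NULL.
ISNULL_STYLE = """
    SELECT form_id FROM meter_forms
    WHERE site_id = IFNULL(:site_id, site_id)
    ORDER BY form_id
"""

# Suggested style: skip the comparison entirely when the parameter is NULL.
OR_STYLE = """
    SELECT form_id FROM meter_forms
    WHERE (:site_id IS NULL OR site_id = :site_id)
    ORDER BY form_id
"""


def form_ids(sql, site_id):
    """Run one of the two queries and return the matching form ids."""
    return [row[0] for row in conn.execute(sql, {"site_id": site_id})]


# With a concrete parameter both forms agree.
print(form_ids(ISNULL_STYLE, 10), form_ids(OR_STYLE, 10))

# With a NULL parameter the first form drops the row whose site_id is NULL,
# because NULL = NULL is not true; the second form keeps it.
print(form_ids(ISNULL_STYLE, None), form_ids(OR_STYLE, None))
```

If SiteID and FormID are declared NOT NULL, the two forms return the same rows and the rewrite is purely a question of which plan the optimizer picks.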

How to output only one max value from this query in SQL?

Yesterday Thomas helped me a lot by providing exactly the query I wanted. Now I need a variant of it, and hope someone can help me out.
I want it to output only one row, namely a max value - but it has to build on the algorithm in the following query:
WITH Calendar AS (SELECT CAST(@StartDate AS datetime) AS Date
UNION ALL
SELECT DATEADD(d, 1, Date) AS Expr1
FROM Calendar AS Calendar_1
WHERE (DATEADD(d, 1, Date) < @EndDate))
SELECT C.Date, C2.Country, COALESCE (SUM(R.[Amount of people per day needed]), 0) AS [Allocated testers]
FROM Calendar AS C CROSS JOIN
Country AS C2 LEFT OUTER JOIN
Requests AS R ON C.Date BETWEEN R.[Start date] AND R.[End date] AND R.CountryID = C2.CountryID
WHERE (C2.Country = @Country)
GROUP BY C.Date, C2.Country OPTION (MAXRECURSION 0)
The output from above will be like:
Date Country Allocated testers
06/01/2010 Chile 3
06/02/2010 Chile 4
06/03/2010 Chile 0
06/04/2010 Chile 0
06/05/2010 Chile 19
but what I need right now is
Allocated testers
19
that is - only one column - one row - the max value itself... (for the (via parameters (that already exists)) selected period of dates and country)

Use ORDER BY and LIMIT:
ORDER BY [Allocated testers] DESC LIMIT 1
EDITED
LIMIT does not exist in SQL Server, so use ORDER BY and TOP instead:
SELECT TOP 1 .... ORDER BY [Allocated testers] DESC

WITH Calendar
AS (
SELECT
CAST(@StartDate AS datetime) AS Date
UNION ALL
SELECT
DATEADD(d, 1, Date) AS Expr1
FROM
Calendar AS Calendar_1
WHERE
( DATEADD(d, 1, Date) < @EndDate )
)
SELECT TOP 1 *
FROM
(
SELECT
C.Date
,C2.Country
,COALESCE(SUM(R.[Amount of people per day needed]), 0) AS [Allocated testers]
FROM
Calendar AS C
CROSS JOIN Country AS C2
LEFT OUTER JOIN Requests AS R
ON C.Date BETWEEN R.[Start date] AND R.[End date]
AND R.CountryID = C2.CountryID
WHERE
( C2.Country = @Country )
GROUP BY
C.Date
,C2.Country
) lst
ORDER BY lst.[Allocated testers] DESC
OPTION
( MAXRECURSION 0 )

Full example following the discussion in @Salil's answer:
WITH Calendar AS (SELECT CAST(@StartDate AS datetime) AS Date
UNION ALL
SELECT DATEADD(d, 1, Date) AS Expr1
FROM Calendar AS Calendar_1
WHERE (DATEADD(d, 1, Date) < @EndDate))
SELECT TOP 1 C.Date, C2.Country, COALESCE (SUM(R.[Amount of people per day needed]), 0) AS [Allocated testers]
FROM Calendar AS C CROSS JOIN
Country AS C2 LEFT OUTER JOIN
Requests AS R ON C.Date BETWEEN R.[Start date] AND R.[End date] AND R.CountryID = C2.CountryID
WHERE (C2.Country = @Country)
GROUP BY C.Date, C2.Country
ORDER BY 3 DESC
OPTION (MAXRECURSION 0)
The ORDER BY 3 means order by the 3rd column in the SELECT list, so if you remove the first two columns, change this accordingly.
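
If only the single number is wanted, another option is to wrap the per-day totals in a derived table or CTE and take MAX over it. The sketch below tries that idea with Python's sqlite3 module rather than SQL Server, so the recursive calendar uses SQLite's WITH RECURSIVE and date(..., '+1 day'). The country dimension is left out for brevity, and the `requests` rows are invented so that the per-day totals match the sample output in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE requests (start_date TEXT, end_date TEXT, people INTEGER);
    INSERT INTO requests VALUES
        ('2010-06-01', '2010-06-02', 3),
        ('2010-06-02', '2010-06-02', 1),
        ('2010-06-05', '2010-06-05', 19);
""")

# calendar: one row per day in [start_date, end_date)
# daily:    allocated people per day, 0 for days without requests
# final:    the single peak value across the whole period
MAX_DAILY = """
    WITH RECURSIVE calendar(day) AS (
        SELECT :start_date
        UNION ALL
        SELECT date(day, '+1 day') FROM calendar
        WHERE date(day, '+1 day') < :end_date
    ),
    daily AS (
        SELECT c.day, COALESCE(SUM(r.people), 0) AS allocated
        FROM calendar AS c
        LEFT JOIN requests AS r
               ON c.day BETWEEN r.start_date AND r.end_date
        GROUP BY c.day
    )
    SELECT MAX(allocated) FROM daily
"""

params = {"start_date": "2010-06-01", "end_date": "2010-06-06"}
(peak,) = conn.execute(MAX_DAILY, params).fetchone()
print(peak)  # 19
```

Compared with TOP 1 ... ORDER BY, this returns exactly one column and one row without having to trim the SELECT list.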
