Horrible SQL Server performance when capturing result in variable

I'm using SQL Server 2012.
When I run this query...
select
count(*)
from
MembershipStatusHistory msh
join
gym.Account a on msh.AccountID = a.AccountID
join
gym.MembershipType mt on a.MembershipTypeID = mt.MembershipTypeID
join
MemberTypeGroups mtg on mt.MemberTypeGroupID = mtg.MemberTypeGroupID
where
mtg.MemberTypeGroupID IN (1,2)
and msh.NewMembershipStatus = 'Cancelled'
and year(msh.ChangeDate) = year(getdate())
and month(msh.ChangeDate) = month(getdate())
and day(msh.ChangeDate) = day(getdate())
...it returns almost instantly. Great. Now, when I run the same exact query like this:
declare @CancellationsToday int
SET @CancellationsToday = (
select
count(*)
from MembershipStatusHistory msh
join gym.Account a
on msh.AccountID = a.AccountID
join gym.MembershipType mt
on a.MembershipTypeID = mt.MembershipTypeID
join MemberTypeGroups mtg
on mt.MemberTypeGroupID = mtg.MemberTypeGroupID
where mtg.MemberTypeGroupID IN (1,2)
and msh.NewMembershipStatus = 'Cancelled'
and year(msh.ChangeDate) = year(getdate())
and month(msh.ChangeDate) = month(getdate())
and day(msh.ChangeDate) = day(getdate())
)
...it takes 1.5 MINUTES to return. Consistently, every time.
What the **** is going on? I have to use a variable because I need to sum the result later on in my stored proc. I am storing the results of other queries in the same proc and they are fast. I am stumped.
Here is the execution plan from the SLOW query:
And here is the execution plan from the FAST query:
I'll be honest, I don't know what these execution plans mean or what I need to correct.

Very strange but try something like this....
declare @CancellationsToday int;
select @CancellationsToday = count(*)
from MembershipStatusHistory msh
join gym.Account a
on msh.AccountID = a.AccountID
join gym.MembershipType mt
on a.MembershipTypeID = mt.MembershipTypeID
join MemberTypeGroups mtg
on mt.MemberTypeGroupID = mtg.MemberTypeGroupID
where mtg.MemberTypeGroupID IN (1,2)
and msh.NewMembershipStatus = 'Cancelled'
and year(msh.ChangeDate) = year(getdate())
and month(msh.ChangeDate) = month(getdate())
and day(msh.ChangeDate) = day(getdate())

Mmmm strange, try this:
SELECT @CancellationsToday = COUNT(*) FROM ......
Another thing worth mentioning: don't wrap columns in functions in the WHERE clause, because that usually keeps SQL Server from using an index on them.
I think you have only the date in msh.ChangeDate, so make a variable with today's date like this:
DATEADD(dd, 0, DATEDIFF(dd, 0, GETDATE()))
and use that in the WHERE clause.
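A minimal sketch of that suggestion, assuming msh.ChangeDate is a datetime (or date) column. The variable names @Today and @Tomorrow are made up for illustration. Comparing the bare column against a half-open range generally lets the optimizer use an index on ChangeDate, which the year()/month()/day() wrappers tend to prevent:

```sql
-- Midnight today and midnight tomorrow, computed once up front.
DECLARE @Today datetime = DATEADD(dd, 0, DATEDIFF(dd, 0, GETDATE()));
DECLARE @Tomorrow datetime = DATEADD(dd, 1, @Today);
DECLARE @CancellationsToday int;

SELECT @CancellationsToday = COUNT(*)
FROM MembershipStatusHistory msh
JOIN gym.Account a ON msh.AccountID = a.AccountID
JOIN gym.MembershipType mt ON a.MembershipTypeID = mt.MembershipTypeID
JOIN MemberTypeGroups mtg ON mt.MemberTypeGroupID = mtg.MemberTypeGroupID
WHERE mtg.MemberTypeGroupID IN (1, 2)
  AND msh.NewMembershipStatus = 'Cancelled'
  AND msh.ChangeDate >= @Today      -- no function wrapped around the column
  AND msh.ChangeDate <  @Tomorrow;  -- half-open range covers all of today
```

If ChangeDate really holds only the date, the same range still matches exactly today's rows.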

You need to look at the execution plans for both queries in SQL Server Management Studio to understand what's going on and why. There may be an index you can add that will fix things, or the plan itself may tell you what's going sideways and how to fix it. Without that info, it's hard to know what to say here.
As I commented above, adjusting your where clause to get rid of the six function calls and just compare the "date" portion of the database column with a constant variable should help some.
Another minor suggestion would be to be explicit about INNER JOIN if that's what you want... always specify exactly the type of join you want (INNER JOIN, LEFT OUTER JOIN, CROSS JOIN, etc.) instead of just 'join'. It makes things more clear.

Related

Need help in optimizing sql query

I am new to SQL and have written the query below to fetch the required results. However, the query takes ages to run. Any help with optimizing it would be great.
Below is the SQL query I am using:
SELECT
Date_trunc('week',a.pair_date) as pair_week,
a.used_code,
a.used_name,
b.line,
b.channel,
count(
case when b.sku = c.sku then used_code else null end
)
from
a
left join b on a.ma_number = b.ma_number
and (a.imei = b.set_id or a.imei = b.repair_imei
)
left join c on a.used_code = c.code
group by 1,2,3,4,5
I would rewrite the query as:
select Date_trunc('week',a.pair_date) as pair_week,
a.used_code, a.used_name, b.line, b.channel,
count(*) filter (where b.sku = c.sku)
from a left join
b
on a.ma_number = b.ma_number and
a.imei in ( b.set_id, b.repair_imei ) left join
c
on a.used_code = c.code
group by 1,2,3,4,5;
For this query, you want indexes on b(ma_number, set_id, repair_imei) and c(code, sku). However, this doesn't leave much scope for optimization.
There might be some other possibilities, depending on the tables. For instance, or/in in the on clause is usually a bad sign -- but it is unclear what your intention really is.
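The indexes suggested in this answer could be created roughly as follows. This is only a sketch in PostgreSQL syntax: the index names are made up, and a, b and c are the placeholder table names from the question.

```sql
-- Supports the join from a to b on ma_number; set_id and repair_imei
-- are included so the imei comparison can be checked from the index.
CREATE INDEX idx_b_ma_number_set_repair ON b (ma_number, set_id, repair_imei);

-- Supports the join from a to c on code; sku is included so that
-- b.sku = c.sku can be evaluated without visiting the table.
CREATE INDEX idx_c_code_sku ON c (code, sku);
```

Whether the planner actually uses them depends on the data, so it is worth comparing EXPLAIN output before and after.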

How to optimize query to reduce execution time

The ORDER BY clause and the BETWEEN comparison on the datetime column make my query slow, even though I have indexed the datetime column.
SELECT TOP(1)
@PeriodStart = DATEADD(SECOND, 1, dbo.tbl_WPT_AttendanceLog.ATDateTime)
FROM
dbo.tbl_WPT_EmployeeMachineLink
INNER JOIN
dbo.tbl_WPT_Machine ON dbo.tbl_WPT_EmployeeMachineLink.FK_tbl_WPT_Machine_ID = dbo.tbl_WPT_Machine.ID
RIGHT OUTER JOIN
dbo.tbl_WPT_AttendanceLog ON dbo.tbl_WPT_EmployeeMachineLink.FK_tbl_WPT_Machine_ID = dbo.tbl_WPT_AttendanceLog.FK_tbl_WPT_Machine_ID
AND dbo.tbl_WPT_EmployeeMachineLink.MachineEnrollmentNo = dbo.tbl_WPT_AttendanceLog.ATEnrollmentNo
WHERE
(dbo.tbl_WPT_EmployeeMachineLink.FK_tbl_WPT_Employee_ID = @EmpID)
AND (dbo.tbl_WPT_AttendanceLog.ATDateTime BETWEEN @ShiftEndPreviousInstance AND @ShiftStart)
AND dbo.tbl_WPT_AttendanceLog.ATInOutMode in (1,2,5)
OR (dbo.tbl_WPT_AttendanceLog.ATDateTime BETWEEN @ShiftEndPreviousInstance AND @ShiftStart)
AND (dbo.tbl_WPT_AttendanceLog.FK_tbl_WPT_Employee_ID = @EmpID)
AND dbo.tbl_WPT_AttendanceLog.ATInOutMode in (1,2,5)
ORDER BY
dbo.tbl_WPT_AttendanceLog.ATDateTime DESC
It looks like you're trying to get an employee's info from multiple sources (EmployeeMachineLink and AttendanceLog). Is that correct? If so, I think you just need to clean up the WHERE clause logic:
SELECT TOP(1)
@PeriodStart = DATEADD(SECOND, 1, dbo.tbl_WPT_AttendanceLog.ATDateTime)
FROM dbo.tbl_WPT_EmployeeMachineLink eml
INNER JOIN dbo.tbl_WPT_Machine ON eml.FK_tbl_WPT_Machine_ID = dbo.tbl_WPT_Machine.ID
RIGHT OUTER JOIN dbo.tbl_WPT_AttendanceLog ON eml.FK_tbl_WPT_Machine_ID = dbo.tbl_WPT_AttendanceLog.FK_tbl_WPT_Machine_ID
AND eml.MachineEnrollmentNo = dbo.tbl_WPT_AttendanceLog.ATEnrollmentNo
WHERE (
eml.FK_tbl_WPT_Employee_ID = @EmpID OR
dbo.tbl_WPT_AttendanceLog.FK_tbl_WPT_Employee_ID = @EmpID
)
AND (dbo.tbl_WPT_AttendanceLog.ATDateTime BETWEEN @ShiftEndPreviousInstance AND @ShiftStart)
AND dbo.tbl_WPT_AttendanceLog.ATInOutMode IN (1,2,5)
ORDER BY dbo.tbl_WPT_AttendanceLog.ATDateTime DESC
Changes
- added table alias eml for readability
- removed duplicate reference to dbo.tbl_WPT_AttendanceLog.ATInOutMode IN (1,2,5)
- removed duplicate BETWEEN ... AND ... reference
- grouped OR conditions together
You have to be careful when mixing OR with AND without using parentheses. Otherwise that will lead to unexpected results and possibly poor performance.
Let me know if that helps.
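To illustrate the precedence point with a self-contained example (constants only, no tables needed): AND is evaluated before OR, so an unparenthesized mix can mean something different from what was intended.

```sql
-- Without parentheses this is read as (1 = 0 AND 1 = 1) OR 1 = 1.
SELECT
    CASE WHEN 1 = 0 AND 1 = 1 OR 1 = 1 THEN 'true' ELSE 'false' END AS without_parentheses,
    CASE WHEN 1 = 0 AND (1 = 1 OR 1 = 1) THEN 'true' ELSE 'false' END AS with_parentheses;
-- without_parentheses = 'true', with_parentheses = 'false'
```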

Improve performance of a sql query

I am looking for some tips/tricks to improve performance of a stored procedure with multiple SELECT statements inserting into a table. All objects I am joining on are already indexed.
I believe the reason this stored procedure takes almost an hour to run is because there are multiple SELECT statements using following two views: rvw_FinancialLineItemValues and rvw_FinancialLineItems
Also, each SELECT statement uses specific hard-coded values for AccountNumber, LineItemTypeID, and few other field values coming from two views mentioned above.
Would it improve performance if I create a temporary table, which gets ALL data needed for these SELECT statements at once and then using this temporary table in my join instead?
Are there any other ways to improve performance and manageability?
SELECT
@scenarioid,
@portfolioid,
pa.Id,
pa.ExternalID,
(select value from fn_split(i.AccountNumber,'.') where id = 1),
ac.[Description],
cl.Name,
NullIf((select value from fn_split(i.AccountNumber,'.') where id = 2),''),
NullIf((select value from fn_split(i.AccountNumber,'.') where id = 3),''),
ty.Name,
v.[Date],
cast(SUM(v.Amount) as decimal(13,2)),
GETDATE()
FROM rvw_FinancialLineItems i
INNER JOIN rvw_Scenarios sc
ON i.ScenarioId = sc.Id
AND sc.Id = @scenarioid
AND sc.PortfolioId = @portfolioid
INNER JOIN #pa AS pa
ON i.PropertyAssetID = pa.Id
INNER JOIN rvw_FinancialLineItemValues v
ON i.ScenarioId = v.ScenarioId
AND i.PropertyAssetID = v.PropertyAssetID
AND i.Id = v.FinancialLineItemId
AND ((i.BusinessEntityTypeId = 11
AND i.LineItemTypeId = 3002)
OR (i.LineItemTypeId IN (2005, 2010, 2003, 2125, 2209, 5012, 6001)
AND i.ModeledEntityKey = 1))
AND i.AccountNumber not in ('401ZZ','403ZZ')
AND i.AccountNumber not in ('401XX')
AND i.AccountNumber not in ('40310','41110','42010','41510','40190','40110') -- exclude lease-level revenues selected below
AND v.[Date] BETWEEN @fromdate AND
CASE
WHEN pa.AnalysisEnd < @todate THEN pa.AnalysisEnd
ELSE @todate
END
AND v.ResultSet IN (0, 4)
INNER JOIN rvw_Portfolios po
ON po.Id = @portfolioid
INNER JOIN Accounts ac
ON po.ChartOfAccountId = ac.ChartOfAccountId
AND i.AccountNumber = ac.AccountNumber
AND ac.HasSubAccounts = 0
INNER JOIN fn_LookupClassTypes() cl
ON ac.ClassTypeId = cl.Id
INNER JOIN LineItemTypes ty
ON ac.LineItemTypeId = ty.Id
LEFT JOIN OtherRevenues r
ON i.PropertyAssetID = r.PropertyAssetID
AND i.AccountNumber = r.AccountID
AND v.[Date] BETWEEN r.[Begin] AND r.[End]
WHERE (r.IsMemo IS NULL
OR r.IsMemo = 0)
GROUP BY pa.AnalysisBegin
,pa.Id
,pa.ExternalID
,i.AccountNumber
,ac.[Description]
,cl.Name
,ty.Name
,v.[Date]
HAVING SUM(v.amount) <> 0
You should run your query with SET SHOWPLAN_ALL ON, or save the execution plan from Management Studio, and look for inefficiencies.
There are some resources online that help in analyzing the results, such as:
http://www.sqlservercentral.com/articles/books/65831/
See also How do I obtain a Query Execution Plan?
First thing, which fn_split() UDF are you using? If you are not using the table-Valued inline UDF, then this is notoriously slow.
Second, is the UDF fn_LookupClassTypes() an inline table valued UDF? If not, convert it to an inline Table-Valued UDF.
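For reference, an inline table-valued function is one whose body is a single RETURN (SELECT ...), with no BEGIN/END block and no table variable declared in the RETURNS clause, so the optimizer can expand it into the calling query much like a view. A sketch of the shape, where the source table dbo.ClassTypes, its columns, and the function name are assumptions and not taken from the question:

```sql
-- Hypothetical: dbo.ClassTypes and its Id/Name columns are assumed for illustration.
CREATE FUNCTION dbo.fn_LookupClassTypes_Inline()
RETURNS TABLE
AS
RETURN
(
    SELECT ct.Id, ct.Name
    FROM dbo.ClassTypes AS ct
);
```

A multi-statement version would instead declare RETURNS @result TABLE (...) and fill it inside BEGIN ... END, which is the form that tends to perform poorly in joins.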
Last, your SQL query had some redundancies. Try this and see what it does.
SELECT @scenarioid, @portfolioid, pa.Id, pa.ExternalID,
(select value from fn_split(i.AccountNumber,'.')
where id = 1), ac.[Description], cl.Name,
NullIf((select value from fn_split(i.AccountNumber,'.')
where id = 2),''),
NullIf((select value from fn_split(i.AccountNumber,'.')
where id = 3),''), ty.Name, v.[Date],
cast(SUM(v.Amount) as decimal(13,2)), GETDATE()
FROM rvw_FinancialLineItems i
JOIN rvw_Scenarios sc ON sc.Id = i.ScenarioId
JOIN #pa AS pa ON pa.Id = i.PropertyAssetID
JOIN rvw_FinancialLineItemValues v
ON v.ScenarioId = i.ScenarioId
AND v.PropertyAssetID = i.PropertyAssetID
AND v.FinancialLineItemId = i.Id
JOIN rvw_Portfolios po ON po.Id = sc.portfolioid
JOIN Accounts ac
ON ac.ChartOfAccountId = po.ChartOfAccountId
AND ac.AccountNumber = i.AccountNumber
JOIN fn_LookupClassTypes() cl On cl.Id = ac.ClassTypeId
JOIN LineItemTypes ty On ty.Id = ac.LineItemTypeId
Left JOIN OtherRevenues r
ON r.PropertyAssetID = i.PropertyAssetID
AND r.AccountID = i.AccountNumber
AND v.[Date] BETWEEN r.[Begin] AND r.[End]
WHERE i.ScenarioId = @scenarioid
and ac.HasSubAccounts = 0
and sc.PortfolioId = @portfolioid
and IsNull(r.IsMemo, 0) = 0
and v.ResultSet In (0, 4)
and i.AccountNumber not in
('401XX', '401ZZ','403ZZ','40310','41110',
'42010','41510','40190','40110')
and v.[Date] BETWEEN @fromdate AND
CASE WHEN pa.AnalysisEnd < @todate
THEN pa.AnalysisEnd ELSE @todate END
and ((i.LineItemTypeId = 3002 and i.BusinessEntityTypeId = 11) OR
(i.ModeledEntityKey = 1 and i.LineItemTypeId IN
(2005, 2010, 2003, 2125, 2209, 5012, 6001)))
GROUP BY pa.AnalysisBegin,pa.Id, pa.ExternalID, i.AccountNumber,
ac.[Description],cl.Name,ty.Name,v.[Date]
HAVING SUM(v.amount) <> 0
I would look to the following first. What are the wait types relevant to your stored procedure here? Are you seeing a lot of disk io time? Are things being done in memory? Maybe there's network latency pulling that much information.
Next, what does the plan look like for the procedure, where does it show all the work is being done?
The views definitely could be an issue as you mentioned. You could maybe have pre-processed tables so you don't have to do as many joins. Specifically the joins where you are seeing the most amount of CPU spent.
Correlated subqueries are generally slow and are best avoided when performance matters. Use fn_split to populate a temp table, index it if need be, and then join to it to get the values you need. You might need to join multiple times for the different values; without knowing the data, I am having a hard time visualizing it.
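One way this could look, as a sketch only. It assumes fn_split returns the columns id and value (as the question's usage implies); the temp table #AccountParts, the index name, and the PartN column names are made up for illustration.

```sql
-- Split each distinct account number once and keep the parts in a temp table.
SELECT an.AccountNumber,
       MAX(CASE WHEN s.id = 1 THEN s.value END) AS Part1,
       MAX(CASE WHEN s.id = 2 THEN s.value END) AS Part2,
       MAX(CASE WHEN s.id = 3 THEN s.value END) AS Part3
INTO #AccountParts
FROM (SELECT DISTINCT AccountNumber FROM rvw_FinancialLineItems) AS an
CROSS APPLY fn_split(an.AccountNumber, '.') AS s
GROUP BY an.AccountNumber;

CREATE CLUSTERED INDEX IX_AccountParts ON #AccountParts (AccountNumber);

-- In the main query, join once instead of calling fn_split three times per row:
--   LEFT JOIN #AccountParts ap ON ap.AccountNumber = i.AccountNumber
-- and select ap.Part1, NULLIF(ap.Part2, ''), NULLIF(ap.Part3, '').
```

Pivoting the parts into one row per account number avoids having to join the temp table three times.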
It is also not good for performance to use OR. Use UNION ALL in a derived table instead.
Since you have all those conditions on the view rvw_FinancialLineItems, yes it might work to pull those out to a temp table and then index the temp table.
You might also check whether using the views is even a good idea. Views often join to many tables that you aren't getting data from, and are thus less performant than querying only the tables you actually need. This is especially true if your organization was dumb enough to make views that call views.

How to access separate database from SQL Server stored procedure

I am trying to run the below code but run into the error:
Multi-part Identifier could not be bound
I think it is due to trying to access a database table from a separate database but it is on the same server. Any ideas?
SELECT DISTINCT
@ActiveStudents2 = COUNT([ActivityHistory].[dbo].[tblActivityCounts].[id])
FROM
dbo.tblSchools
INNER JOIN
dbo.tblStudentSchool ON dbo.tblSchools.schoolid = dbo.tblStudentSchool.schoolid
INNER JOIN
dbo.tblStudentPersonal ON dbo.tblStudentSchool.id = dbo.tblStudentPersonal.id
WHERE
dbo.tblStudentSchool.schoolid IN (@tempschoolid)
AND tblStudentSchool.graduationyear IN (SELECT Items
FROM FN_Split(@gradyears, ','))
AND ([ActivityHistory].[dbo].[tblActivityCounts].[datetimechanged] >= @datefrom
AND [ActivityHistory].[dbo].[tblActivityCounts].[datetimechanged] <= @dateto)
The error occurs when I try to access the tblActivityCounts table in the Activity History database which is a separate database. I even try running this as the sa and it doesn't work. There aren't any spelling errors. Any help is appreciated. Thank you!
You are not joining on ActivityHistory table. That's why the query doesn't know from where to access tblActivityCounts table.
As Aleem said, you're not joining in the other table. I'd also recommend taking advantage of table aliases to make your code just a bit more readable... Makes it much easier to figure out what you were doing when you come back to it later.
Depending on how tblActivityCounts relates to the other tables, you would do something like this:
SELECT @ActiveStudents2 = COUNT(Activity.id)
FROM dbo.tblSchools as Schools
INNER JOIN dbo.tblStudentSchool as Students
ON Schools.schoolid = Students.schoolid
INNER JOIN dbo.tblStudentPersonal as Personel
ON Students.id = Personel.id
INNER JOIN ActivityHistory.dbo.tblActivityCounts as Activity
ON Activity.studentid = Students.id -- Depending on how tblActivityCounts relates to the other tables
WHERE Students.schoolid IN (@tempschoolid)
AND Students.graduationyear IN (SELECT Items FROM FN_Split(@gradyears, ','))
AND Activity.datetimechanged >= @datefrom
AND Activity.datetimechanged <= @dateto
Your query is wrong. This line:
SELECT DISTINCT
@ActiveStudents2 =
COUNT([ActivityHistory].[dbo].[tblActivityCounts].[id])
Will retrieve all Ids from the tblActivityCounts table and count them, but has no reference in the rest of your query. You have to JOIN with this table and reference it to make your count. Also, I would recommend you use aliases.
Something like this:
SELECT DISTINCT
@ActiveStudents2 = COUNT(ah.[id])
FROM dbo.tblSchools s
INNER JOIN dbo.tblStudentSchool ss ON s.schoolid = ss.schoolid
INNER JOIN dbo.tblStudentPersonal sp ON ss.id = sp.id
INNER JOIN [ActivityHistory].[dbo].[tblActivityCounts] ah ON s.acIdentifier = ah.acIdentifier
WHERE ss.schoolid IN (@tempschoolid)
AND ss.graduationyear IN (SELECT Items
FROM FN_Split(@gradyears, ','))
AND (ah.[datetimechanged] >= @datefrom
AND ah.[datetimechanged] <= @dateto)

Left Outer Join in SQL Server 2014

We are currently upgrading to SQL Server 2014; I have a join that runs fine in SQL Server 2008 R2 but returns duplicates in SQL Server 2014. The issue appears to be with the predicate AND L2.ACCOUNTING_PERIOD = RG.PERIOD_TO for if I change it to anything but 4, I do not get the duplicates. The query is returning those values in Accounting Period 4 twice. This query gets account balances for all the previous Accounting Periods so in this case it returns values for Accounting Periods 0, 1, 2 and 3 correctly but then duplicates the values from Period 4.
SELECT
A.ACCOUNT,
SUM(A.POSTED_TRAN_AMT),
SUM(A.POSTED_BASE_AMT),
SUM(A.POSTED_TOTAL_AMT)
FROM
PS_LEDGER A
LEFT JOIN PS_GL_ACCOUNT_TBL B
ON B.SETID = 'LTSHR'
LEFT OUTER JOIN PS_LEDGER L2
ON A.BUSINESS_UNIT = L2.BUSINESS_UNIT
AND A.LEDGER = L2.LEDGER
AND A.ACCOUNT = L2.ACCOUNT
AND A.ALTACCT = L2.ALTACCT
AND A.DEPTID = L2.DEPTID
AND A.PROJECT_ID = L2.PROJECT_ID
AND A.DATE_CODE = L2.DATE_CODE
AND A.BOOK_CODE = L2.BOOK_CODE
AND A.GL_ADJUST_TYPE = L2.GL_ADJUST_TYPE
AND A.CURRENCY_CD = L2.CURRENCY_CD
AND A.STATISTICS_CODE = L2.STATISTICS_CODE
AND A.FISCAL_YEAR = L2.FISCAL_YEAR
AND A.ACCOUNTING_PERIOD = L2.ACCOUNTING_PERIOD
AND L2.ACCOUNTING_PERIOD = RG.PERIOD_TO
WHERE
A.BUSINESS_UNIT = 'UK001'
AND A.LEDGER = 'LOCAL'
AND A.FISCAL_YEAR = 2015
AND ( (A.ACCOUNTING_PERIOD BETWEEN 1 and 4
AND B.ACCOUNT_TYPE IN ('E','R') )
OR
(A.ACCOUNTING_PERIOD BETWEEN 0 and 4
AND B.ACCOUNT_TYPE IN ('A','L','Q') ) )
AND A.STATISTICS_CODE = ' '
AND A.ACCOUNT = '21101'
AND A.CURRENCY_CD <> ' '
AND A.CURRENCY_CD = 'GBP'
AND B.SETID='LTSHR'
AND B.ACCOUNT=A.ACCOUNT
AND B.SETID = SETID
AND B.EFFDT=(SELECT MAX(EFFDT) FROM PS_GL_ACCOUNT_TBL WHERE SETID='LTSHR' AND WHERE ACCOUNT=B.ACCOUNT AND EFFDT<='2015-01-31 00:00:00.000')
GROUP BY A.ACCOUNT
ORDER BY A.ACCOUNT
I'm inclined to suspect that you have simplified your original query too much to reflect the real problem, but I'm going to answer the question as posed, in light of the comments on it to this point.
Since your query does not in fact select anything derived from table L2, nor do any other predicates rely on anything from that table, the only thing accomplished by (left) joining it is to duplicate rows of the pre-aggregation results where more than one satisfies the join condition for the same L2 row. That seems unlikely to be what you want, especially with that particular join being a self join, so I don't see any reason not to remove it altogether. Dollars to doughnuts, that solves the duplication problem.
I'm also going to suggest removing the correlated subquery in the WHERE clause in favor of joining an inline view, since you already join the base table for the subquery anyway. This particular inline view uses the window function version of MAX() instead of the aggregate function version. Ideally, it would directly select only the rows with the target EFFDT values, but it cannot do so without being rather more complicated, which is exactly what I am trying to avoid. The resulting query therefore filters EFFDT externally, as the original did, but without a correlated subquery.
I furthermore removed a few redundant predicates and rewrote one of the messier ones to a somewhat nicer equivalent. And I reordered the predicates in a way that seems more logical to me.
Additionally, since you are filtering on a specific value of A.ACCOUNT, it is pointless (but not wrong) to ORDER BY that column, so I have removed that clause to make the query simpler and clearer. The GROUP BY stays, because SQL Server rejects a non-aggregated column in the select list when there is no GROUP BY.
Here's what I came up with:
SELECT
A.ACCOUNT,
SUM(A.POSTED_TRAN_AMT),
SUM(A.POSTED_BASE_AMT),
SUM(A.POSTED_TOTAL_AMT)
FROM
PS_LEDGER A
INNER JOIN (
SELECT
*,
MAX(EFFDT) OVER (PARTITION BY ACCOUNT) AS MAX_EFFDT
FROM PS_GL_ACCOUNT_TBL
WHERE
EFFDT <= '2015-01-31 00:00:00.000'
AND SETID = 'LTSHR'
) B
ON B.ACCOUNT=A.ACCOUNT
WHERE
A.ACCOUNT = '21101'
AND A.BUSINESS_UNIT = 'UK001'
AND A.LEDGER = 'LOCAL'
AND A.FISCAL_YEAR = 2015
AND A.CURRENCY_CD = 'GBP'
AND A.STATISTICS_CODE = ' '
AND B.EFFDT = B.MAX_EFFDT
-- The lower bound depends on the account type; any other type yields NULL and is filtered out.
AND A.ACCOUNTING_PERIOD BETWEEN
CASE
WHEN B.ACCOUNT_TYPE IN ('E','R') THEN 1
WHEN B.ACCOUNT_TYPE IN ('A','L','Q') THEN 0
END
AND 4
GROUP BY A.ACCOUNT