Same query, same server, different database: query runs super slow - SQL

I wrote a view in a database. The view takes 0 seconds to run when called from one database and 2.5 minutes when called from another.
I have created a video that best describes this problem. Watch it here: https://youtu.be/jEqI2bUyelQ
I tried dropping and re-creating the view.
I compared the query execution plans; they are different when the view is run from one database vs. the other.
I looked into the query itself and noticed that if you remove the WHERE clause, the performance is regained and it takes the same amount of time from both databases.
The expected result is that it should take 0 seconds to run no matter which database the view is called from.
Here is the SQL script:
SELECT
    cus.MacolaCustNo,
    dsp.cmp_code,
    COUNT(DISTINCT dsp.item_no) AS InventoryOnDisplay,
    (SELECT MAX(dsp.LastSynchronizationDate)
     FROM Hinkley.dbo.vw_HH_next_Capture_date) AS UpdatedDate,
    CASE
        WHEN DATEADD(DAY, 90, ISNULL(MAX(dsp.LastSynchronizationDate), '1/1/1900')) >=
             (SELECT MAX(dsp.LastSynchronizationDate)
              FROM Hinkley.dbo.vw_HH_next_Capture_date)
            THEN 'Compliant'
        WHEN DATEADD(DAY, 90, ISNULL(MAX(dsp.LastSynchronizationDate), '1/1/1900')) <=
             (SELECT MAX(dsp.LastSynchronizationDate)
              FROM Hinkley.dbo.vw_HH_next_Capture_date)
             AND DATEADD(DAY, 90, ISNULL(MAX(dsp.LastSynchronizationDate), '1/1/1900')) >= GETDATE()
            THEN 'Warning'
        ELSE 'Non-Compliant'
    END AS Inventory_Status
FROM
    Hinkley.dbo.HLIINVDSP_SQL dsp WITH (NOLOCK)
INNER JOIN
    [DATA].dbo.vw_HLI_Customer cus WITH (NOLOCK)
    ON cus.CusNo = dsp.cmp_code
WHERE
    cus.cust_showroom = 1
    AND cus.active_y = 1
GROUP BY cus.MacolaCustNo, dsp.cmp_code
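Since the plans differ only by calling database, one thing worth checking is whether the two calling databases run at different compatibility levels; that alone can switch cardinality estimators and produce different plans for the same statement. A minimal diagnostic sketch (the database names are taken from the query above):
-- Compare compatibility levels of the two calling databases; a mismatch
-- can change the plan chosen for the same cross-database view.
SELECT name, compatibility_level
FROM sys.databases
WHERE name IN ('DATA', 'Hinkley');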

Simplify SQL query to Interbase DB

I have a WPF application whose task is to pull data from an InterBase DB. Note that this DB is located on a remote network device. Also, the Firebird ADO.NET data provider is used.
One of my queries looks like:
SELECT
T1.ind_st,
T2.ttt,
T2.tdtdtd,
sumr
FROM ((SELECT ind_st,
Sum(r) AS sumR
FROM (SELECT ind_st,
rrr AS r
FROM srok_tel
WHERE date_ch = '23.07.2018 0:00:00'
AND srok_ch = '18'
AND ind_st >= 33049
AND ind_st <= 34717
UNION
SELECT ind_st,
-rrr AS r
FROM srok_tel
WHERE date_ch = '23.07.2018 0:00:00'
AND srok_ch = '12'
AND ind_st >= 33049
AND ind_st <= 34717
UNION
SELECT ind_st,
rrr AS r
FROM srok_tel
WHERE date_ch = '24.07.2018 0:00:00'
AND srok_ch IN ( 6, 12 )
AND ind_st >= 33049
AND ind_st <= 34717)
GROUP BY ind_st) T1
JOIN (SELECT ind_st,
ttt,
tdtdtd
FROM srok_tel
WHERE date_ch = '24.07.2018 0:00:00'
AND srok_ch = '12'
AND ind_st >= 33049
AND ind_st <= 34717) T2
ON T1.ind_st = T2.ind_st)
Yes, it's heavy, hard to read at first glance, and probably written the wrong way, but my task is to pull all the data with one query, and I am NOT an SQL pro.
The target table (SROK_TEL), from which the data is selected, contains approximately 10^7 rows. Query run time is about 90 seconds, which is significantly more than I wish to see.
Any suggestions on how to make this query work faster?
UPDATE1: At luisarcher's request I've added the query plan (hope that's exactly what he asked for):
PLAN JOIN (SORT ((T1 SROK_TEL NATURAL)
PLAN (T1 SROK_TEL NATURAL)
PLAN (T1 SROK_TEL NATURAL)), T2 SROK_TEL INDEX (PK_SROK_TEL))
I've had an issue like yours not long ago, so I'll share some tips that apply to your situation:
1) If you don't mind having duplicates, you can use UNION ALL instead of UNION; UNION has to deduplicate the combined rows, which typically costs an extra sort.
2) Restrict the data you use. This one is important; I got about a 90% reduction in execution time by correctly removing data I don't need from the query (more specific WHERE clauses, not selecting useless data).
3) Check if you can add an index to your table SROK_TEL (see the sketch below).
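A minimal sketch of tips 1 and 3 combined; the index name and column order are assumptions based on the WHERE clauses above, not on the real schema:
-- Hypothetical index matching the filters used in every branch:
-- equality columns first, then the range column.
CREATE INDEX IDX_SROK_TEL_DATE_SROK_IND
    ON SROK_TEL (DATE_CH, SROK_CH, IND_ST);

-- The innermost union rewritten with UNION ALL, which keeps duplicates
-- but skips the deduplication step that UNION performs:
SELECT ind_st, rrr AS r
FROM srok_tel
WHERE date_ch = '23.07.2018 0:00:00' AND srok_ch = '18'
  AND ind_st BETWEEN 33049 AND 34717
UNION ALL
SELECT ind_st, -rrr AS r
FROM srok_tel
WHERE date_ch = '23.07.2018 0:00:00' AND srok_ch = '12'
  AND ind_st BETWEEN 33049 AND 34717
UNION ALL
SELECT ind_st, rrr AS r
FROM srok_tel
WHERE date_ch = '24.07.2018 0:00:00' AND srok_ch IN (6, 12)
  AND ind_st BETWEEN 33049 AND 34717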

SQL Query - combine 2 rows into 1 row

I have the following query (a view) below in SQL Server. The query produces a result set that is needed to populate a grid. However, a new requirement has come up where the users would like to see the data on one row in our app. The tblTasks table can produce 1 or 2 rows. The issue arises when there are two rows that have the same job_number but different fldProjectContextId values (1 or 31). I need to get the MechApprovalOut and ElecApprovalOut columns on one row instead of two.
I've tried restructuring the query using a CTE and OVER/PARTITION BY and haven't been able to get the results I need.
SELECT TOP (100) PERCENT
CAST(dbo.Job_Control.job_number AS int) AS Job_Number,
dbo.tblTasks.fldSalesOrder, dbo.tblTaskCategories.fldTaskCategoryName,
dbo.Job_Control.Dwg_Sent, dbo.Job_Control.Approval_done,
dbo.Job_Control.fldElecDwgSent, dbo.Job_Control.fldElecApprovalDone,
CASE WHEN DATEDIFF(day, dbo.Job_Control.Dwg_Sent, GETDATE()) > 14
AND dbo.Job_Control.Approval_done IS NULL
AND dbo.tblProjectContext.fldProjectContextID = 1
THEN 1 ELSE 0
END AS MechApprovalOut,
CASE WHEN DATEDIFF(day, dbo.Job_Control.fldElecDwgSent, GETDATE()) > 14
AND dbo.Job_Control.fldElecApprovalDone IS NULL
AND dbo.tblProjectContext.fldProjectContextID = 31
THEN 1 ELSE 0
END AS ElecApprovalOut,
dbo.tblProjectContext.fldProjectContextName,
dbo.tblProjectContext.fldProjectContextId, dbo.Job_Control.Drawing_Info,
dbo.Job_Control.fldElectricalAppDwg
FROM dbo.tblTaskCategories
INNER JOIN dbo.tblTasks
ON dbo.tblTaskCategories.fldTaskCategoryId = dbo.tblTasks.fldTaskCategoryId
INNER JOIN dbo.Job_Control
ON dbo.tblTasks.fldSalesOrder = dbo.Job_Control.job_number
INNER JOIN dbo.tblProjectContext
ON dbo.tblTaskCategories.fldProjectContextId = dbo.tblProjectContext.fldProjectContextId
WHERE (dbo.tblTaskCategories.fldTaskCategoryName = N'Approval'
OR dbo.tblTaskCategories.fldTaskCategoryName = N'Re-Approval')
AND (CASE WHEN DATEDIFF(day, dbo.Job_Control.Dwg_Sent, GETDATE()) > 14
AND dbo.Job_Control.Approval_done IS NULL
AND dbo.tblProjectContext.fldProjectContextID = 1
THEN 1 ELSE 0
END = 1)
OR (dbo.tblTaskCategories.fldTaskCategoryName = N'Approval'
OR dbo.tblTaskCategories.fldTaskCategoryName = N'Re-Approval')
AND (CASE WHEN DATEDIFF(day, dbo.Job_Control.fldElecDwgSent, GETDATE()) > 14
AND dbo.Job_Control.fldElecApprovalDone IS NULL
AND dbo.tblProjectContext.fldProjectContextID = 31
THEN 1 ELSE 0
END = 1)
ORDER BY dbo.Job_Control.job_number, dbo.tblTaskCategories.fldProjectContextId
The above query gives me the following result set:
I've created a workaround via code (which I don't like, but it works for now) where I've used code to populate a "temp" table the way I need the data displayed, that is, one record when there are duplicate job numbers, so that the MechApprovalOut and ElecApprovalOut columns are on one row (see the first record in the following screenshot).
Example:
With the desired result set and one row per job_number, this is how the form looks with the data and how I am using the result set.
Any help restructuring my query to combine duplicate rows with the same job number, so that the MechApprovalOut and ElecApprovalOut columns are on one row, is greatly appreciated! I'd much prefer to use a view in SQL than code in the app to populate a temp table.
Thanks,
Jimmy
What I would do is LEFT JOIN the main table to itself at the beginning of the query, matching on Job Number and Sales Order, such that the left side of the join is only looking at Approval task categories and the right side of the join is only looking at Re-Approval task categories. Then I would make extensive use of the COALESCE() function to select data from the correct side of the join for use later on and in the select clause. This may also be the piece you were missing to make a CTE work.
There is probably also a solution that uses a ranking/windowing function (maybe not RANK itself, but something in that category) along with the PARTITION BY clause. However, as those are fairly new to SQL Server, I haven't used them enough personally to be comfortable writing an example solution for you without direct access to the data to play with, and it would still take me more time to get right than I can devote to this right now. Maybe this paragraph will motivate someone else to do that work.
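As a simpler alternative, here is a minimal sketch using conditional aggregation rather than the self-join described above. It assumes the original query has been saved as a view under the hypothetical name vw_ApprovalStatus, with at most one Mech (context 1) and one Elec (context 31) row per job:
-- Collapse the two context rows into one row per job.
-- vw_ApprovalStatus is a made-up name for the question's view.
SELECT
    Job_Number,
    MAX(fldSalesOrder)   AS fldSalesOrder,
    MAX(MechApprovalOut) AS MechApprovalOut, -- 1 if the context-1 row flagged it
    MAX(ElecApprovalOut) AS ElecApprovalOut  -- 1 if the context-31 row flagged it
FROM dbo.vw_ApprovalStatus
GROUP BY Job_Number;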

How can I optimize this SQL query? (Solarwinds Orion)

I'm very new to SQL and still learning. I'm using a reporting tool called Solarwinds Orion, and I'm honestly not sure how specific the query I have written is to the program, so if there's anything in the query that's confusing, let me know and I'll try to figure out whether it's specific to the program or not.
The problem with the query I'm running is that it times out after a very long time (maybe an hour). The database I'm using is huge. Unfortunately I don't really know how huge, but I've been told it's huge.
Is there anything I am doing wrong that would have a huge performance impact?
SELECT TOP 10000
Nodes.Caption AS NodeName,
NetflowApplicationSummary.AppName AS Application_Name,
SUM(NetflowApplicationSummary.TotalBytes) AS SUM_of_Bytes_Transferred,
AVG(Case OutBandwidth
When 0 Then 0
Else (NetflowApplicationSummary.TotalBytes/OutBandwidth) * 100
End) AS TEST_PERCENT
FROM
((NetflowApplicationSummary
INNER JOIN Nodes ON (NetflowApplicationSummary.NodeID = Nodes.NodeID))
INNER JOIN InterfaceTraffic ON (Nodes.NodeID = InterfaceTraffic.InterfaceID))
INNER JOIN Interfaces ON (Nodes.NodeID = Interfaces.NodeID)
WHERE
( InterfaceTraffic.DateTime > (GetDate()-30) )
AND
(Nodes.WANCircuit = 1)
GROUP BY Nodes.Caption, NetflowApplicationSummary.AppName
EDIT: I ran COUNT() on each of my tables with the below result.
SELECT COUNT(*) FROM NetflowApplicationSummary; -- 50,671,011
SELECT COUNT(*) FROM Nodes;                     -- 898
SELECT COUNT(*) FROM InterfaceTraffic;          -- 18,000,166
SELECT COUNT(*) FROM Interfaces;                -- 3,938
-- Total: 68,676,013
I really have no idea if 68 million items is a huge database to be honest.
A couple of notes:
The INNER JOIN operator is associative, so get rid of those parentheses in the FROM clause and let the optimizer figure out the best join order.
Computing getdate() - 30 inside the predicate can get in the optimizer's way; store the value in a local variable up front and compare to that.
The resulting SQL should look like this:
DECLARE @Date as datetime = getdate() - 30;
SELECT TOP 10000
Nodes.Caption AS NodeName,
NetflowApplicationSummary.AppName AS Application_Name,
SUM(NetflowApplicationSummary.TotalBytes) AS SUM_of_Bytes_Transferred,
AVG(Case OutBandwidth
When 0 Then 0
Else (NetflowApplicationSummary.TotalBytes/OutBandwidth) * 100
End) AS TEST_PERCENT
FROM NetflowApplicationSummary
INNER JOIN Nodes ON NetflowApplicationSummary.NodeID = Nodes.NodeID
INNER JOIN InterfaceTraffic ON Nodes.NodeID = InterfaceTraffic.InterfaceID
INNER JOIN Interfaces ON Nodes.NodeID = Interfaces.NodeID
WHERE InterfaceTraffic.DateTime > @Date
AND Nodes.WANCircuit = 1
GROUP BY Nodes.Caption, NetflowApplicationSummary.AppName
Also, make sure you have an index on table InterfaceTraffic with DateTime as its leading column. If one doesn't exist, you may need to pay the one-time cost of creating it.
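A minimal sketch of such an index; the index name and the INCLUDE column are assumptions, not taken from the question:
-- Hypothetical index: DateTime leads so the 30-day range predicate can
-- seek; InterfaceID is included to support the join without a lookup.
CREATE INDEX IX_InterfaceTraffic_DateTime
    ON InterfaceTraffic (DateTime)
    INCLUDE (InterfaceID);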
If this doesn't help, then you may need to post the execution plan where it can be inspected.
Out of interest, also perform a COUNT(*) on all four tables and post the results, just so members here can make their own assessment of how big your database really is. It is amazing how many non-technical people still think a 1 or 10 GB database is huge, while I run that easily on my workstation!

Incorrect Results using SUM()

I am not sure where I have gone wrong. I am trying to count the number of hours and endpoints for a company, per agreement. My SUM() results, however, are wildly off.
Here is my code:
SELECT v_rpt_Company.Company_Name, v_rpt_Service.agreement_name,
COUNT(DISTINCT v_rpt_Service.TicketNbr) AS tickets,
SUM(ISNULL(v_rpt_Service.Hours_Agreement, 0)) AS hours,
SUM(ISNULL(AGR_Detail.AGD_Qty, 0)) AS endpoints
FROM AGR_Header
INNER JOIN v_rpt_Service
ON AGR_Header.AGR_Header_RecID = v_rpt_Service.AGR_Header_RecID
INNER JOIN v_rpt_Company
ON v_rpt_Service.company_recid = v_rpt_Company.Company_RecID AND
AGR_Header.Company_RecID = v_rpt_Company.Company_RecID
INNER JOIN AGR_Detail
ON AGR_Header.AGR_Header_RecID = AGR_Detail.AGR_Header_RecID
WHERE
(v_rpt_Service.date_entered >= DATEADD(day, - 30, GETDATE()))
AND
(v_rpt_Company.Company_RecID =
CASE
WHEN @Company <> -1 THEN @Company
ELSE v_rpt_Company.Company_RecID
END)
AND
(v_rpt_Service.AGR_Header_RecID =
CASE
WHEN @Agreement <> -1 THEN @Agreement
ELSE v_rpt_Service.AGR_Header_RecID
END)
GROUP BY v_rpt_Company.Company_Name, v_rpt_Service.agreement_name
ORDER BY v_rpt_Company.Company_Name, v_rpt_Service.agreement_name
To debug cases like this, you should set up a test database, probably on your development computer, where you can drop all tables at any time, rebuild the schema, and load test data with specific test cases.
Use tools like DbUnit for this. That way, you can be sure that the SQL works as expected, even when the requirements change.
If your test cases work but the result in the production DB fails, copy part of the production DB into your development DB and create a new test case.
The second part of your WHERE clause isn't going to work as planned:
(v_rpt_Company.Company_RecID =
CASE
WHEN @Company <> -1 THEN @Company
ELSE v_rpt_Company.Company_RecID
END)
the ELSE part will match the current record when @Company = -1, essentially making the condition always true.
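For clarity, the CASE comparison above is equivalent to this plain OR predicate (a sketch of the same logic, assuming Company_RecID is non-null):
-- -1 acts as "no filter"; any other value filters to that company.
(@Company = -1 OR v_rpt_Company.Company_RecID = @Company)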
SELECT v_rpt_Company.Company_Name, v_rpt_Service.agreement_name,
    COUNT(DISTINCT v_rpt_Service.TicketNbr) AS tickets,
    SUM(ISNULL(v_rpt_Service.Hours_Actual, 0)) AS hours,
    (SUM(ISNULL(AGR_Detail.AGD_Qty, 0)) / COUNT(DISTINCT v_rpt_Service.TicketNbr)) AS endpoints
That ended up being the correct result. Thank you DavidFaber.

SQL Query optimization

I have some questions about my query. I call this stored procedure on my first page, so it is important to me that it is optimized well.
I do a select with some basic WHERE expressions, then I filter the results with expressions passed into the stored procedure.
Selecting the top n rows also matters to me: eventually it will search through millions of rows (though I only have hundreds so far) and then do some paging on my website.
Select top (@NumberOfRows)
...
from(
SELECT
row_number() OVER (ORDER BY tblEventOpen.TicketAt, tblEvent.EventName, tblEventDetail.TimeStart) as RowNumber
, ...
FROM --[...some inner join logic...]
WHERE
(tblEventOpen.isValid = 1) AND (tblEvent.isValid = 1) and
(tblCondition_ResellerDetail.ResellerID = 1) AND
(tblEventOpen.TicketAt >= GETDATE()) AND
(GETDATE() BETWEEN
DATEADD(minute, (tblEventDetail.TimeStart - 60 * tblCondition_ResellerDetail.StartTime) , tblEventOpen.TicketAt)
AND DATEADD(minute, (tblEventDetail.TimeStart - 60 * tblCondition_ResellerDetail.EndTime) , tblEventOpen.TicketAt))
) as t1
where RowNumber >= (@PageNumber - 1) * @NumberOfRows and
(@city = '' or @city is null or city like @city) and
(@At is null or @At = At) and
(@TimeStartInMinute = -1 or @TimeStartInMinute = TimeStartInMinute) and
(@EventName = '' or EventName like @EventName) and
(@CategoryID = -1 or @CategoryID = CategoryID) and
(@EventID is null or @EventID = EventID) and
(@DetailID is null or @DetailID = DetailID)
ORDER BY RowNumber
I'm worried about this part:
(GETDATE() BETWEEN
DATEADD(minute, (tblEventDetail.TimeStart - 60 * tblCondition_ResellerDetail.StartTime) , tblEventOpen.TicketAt)
AND DATEADD(minute, (tblEventDetail.TimeStart - 60 * tblCondition_ResellerDetail.EndTime) , tblEventOpen.TicketAt))
How does table t1 execute? I mean, after I put some WHERE expressions after t1 (line 17 and onward), does it filter items after t1 has executed? For example, if I filter the result to 10 rows by RowNumber, does that mean the inner (...) as t1 select only returns 10 items, or does it select all items and then my outer select takes 10 of them?
I want to filter my results by some optional parameters, so I put something like @DetailID is null or @DetailID = DetailID; is that a good way?
Is there anything else I should consider to make it faster (more optimized)?
My comments on your query:
You're correct, you should worry about the condition "GETDATE() BETWEEN ...". Comparing a value against a function involving more than one table will most likely scan the entire search space. Simplify the condition or, if possible, add a computed column for such a function.
Put all conditions except "RowNumber >= ..." in the inner query.
It's okay to put optional conditions the way you do. I do it too :-)
Make sure you have at least one index for each column used in the WHERE clause, with that column as the first column of the index, followed by the primary key (see the sketch after these notes). It would be better if your primary key is clustered.
Well, these are based on my own experience. They may or may not be applicable to your situation.
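One possible index following that last tip, as a minimal sketch; the index name is made up, and the column choice is an assumption from the WHERE clause above rather than from the real schema:
-- Hypothetical example: TicketAt (used in the WHERE clause) as the
-- leading column, so the TicketAt >= GETDATE() predicate can seek.
CREATE INDEX IX_tblEventOpen_TicketAt
    ON tblEventOpen (TicketAt);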
[UPDATE] Here's the complete query
Select top (@NumberOfRows)
...
from(
SELECT
row_number() OVER (ORDER BY tblEventOpen.TicketAt, tblEvent.EventName, tblEventDetail.TimeStart) as RowNumber
, ...
FROM --[...some inner join logic...]
WHERE
(tblEventOpen.isValid = 1) AND (tblEvent.isValid = 1) and
(tblCondition_ResellerDetail.ResellerID = 1) AND
(tblEventOpen.TicketAt >= GETDATE()) AND
(GETDATE() BETWEEN
DATEADD(minute, (tblEventDetail.TimeStart - 60 * tblCondition_ResellerDetail.StartTime) , tblEventOpen.TicketAt)
AND DATEADD(minute, (tblEventDetail.TimeStart - 60 * tblCondition_ResellerDetail.EndTime) , tblEventOpen.TicketAt)) and
(@city = '' or @city is null or city like @city) and
(@At is null or @At = At) and
(@TimeStartInMinute = -1 or @TimeStartInMinute = TimeStartInMinute) and
(@EventName = '' or EventName like @EventName) and
(@CategoryID = -1 or @CategoryID = CategoryID) and
(@EventID is null or @EventID = EventID) and
(@DetailID is null or @DetailID = DetailID)
) as t1
where RowNumber >= (@PageNumber - 1) * @NumberOfRows
ORDER BY RowNumber
Whilst you can seek advice on your query, it is better to learn how to optimise it yourself.
You need to view the execution plan, identify the bottlenecks and then see if there is anything that can be done to make an improvement.
In SSMS you can click "Query" ---> "Include Actual Execution Plan" before you run your query. Ctrl+M is the keyboard shortcut.
Then execute your query. SSMS will create a new tab in the results pane, which shows you how the SQL engine executed your query; you can hover over each node for more information. The cost % will be particularly interesting, allowing you to see the most expensive parts of your query.
It's difficult to advise you any more without that execution plan, which is why a number of people commented on your question. Your schema and indexes change how the query is executed, so it's not something that someone can accurately replicate in their own environment without scripts for tables, indexes, etc. Even then, statistics could be out of date and other problems could arise.
You can also execute SET STATISTICS PROFILE ON to get a textual view of the plan (which may be useful when asking for help).
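A minimal usage sketch; the SELECT line is only a placeholder for your actual query:
-- Emit a textual execution plan alongside the query results.
SET STATISTICS PROFILE ON;
SELECT TOP (10) EventName FROM tblEvent; -- placeholder for your real query
SET STATISTICS PROFILE OFF;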
There are a number of articles that can help you fix the bottlenecks, or post another question for more advice.
http://msdn.microsoft.com/en-us/library/ms178071.aspx
SQL Server Query Plan Analysis
Execution Plan Basics