Optimize SQL Query to avoid Hash Match (Aggregate) - sql

I have a SQL query that takes 7 minutes+ to return results. I'm trying to optimize as much as possible and the Execution plan loses 82% of the time on a Hash Match (Aggregate). I've done some searching and looks like using an "EXISTS" would help to resolve, but I haven't figured out the syntax of the query to make it work. Here's the query:
select dbo.Server.Name,
dbo.DiskSpace.Drive,
AVG(dbo.DiskSpace.FreeSpace) as 'Free Disk Space',
AVG(dbo.Processor.PercentUsed) as 'CPU % Used',
AVG(dbo.Memory.PercentUtilized) as '% Mem Used'
from Server
join dbo.DiskSpace on dbo.Server.ID=DiskSpace.ServerID
join dbo.Processor on dbo.Server.ID=Processor.ServerID
join dbo.Memory on dbo.Server.ID=dbo.Memory.ServerID
where
dbo.Processor.ProcessorNum='_Total'
and dbo.Processor.Datetm>DATEADD(DAY,-(1),(CONVERT (date, GETDATE())))
and ( dbo.Server.Name='qp-ratking'
or dbo.Server.Name='qp-hyper2012'
or dbo.Server.Name='qp-hyped'
or dbo.Server.Name='qp-lichking')
Group By dbo.server.name, Dbo.DiskSpace.Drive
Order By Dbo.Server.Name, dbo.DiskSpace.Drive;
How do I reduce/eliminate the joins using EXISTS? Or if there is a better way to optimize, I'm up for that too. Thanks

A co-worker broke down the query and pulled out data in smaller chunks so there wasn't as much processing of the data returned by the joins. It cut it down to less than 1 second return. New Query:
WITH tempDiskSpace AS
(
SELECT dbo.Server.Name
,dbo.DiskSpace.Drive
,AVG(dbo.DiskSpace.FreeSpace) AS 'Free Disk Space'
FROM dbo.DiskSpace
LEFT JOIN dbo.Server ON dbo.DiskSpace.ServerID=Server.ID
WHERE dbo.DiskSpace.Datetm>DATEADD(DAY,-(1),(CONVERT (date, GETDATE())))
AND (dbo.Server.Name='qp-ratking'
OR dbo.Server.Name='qp-hyper2012'
OR dbo.Server.Name='qp-hyped'
OR dbo.Server.Name='qp-lichking')
GROUP BY Name, Drive
)
,tempProcessor
AS
(
SELECT dbo.Server.Name
,AVG(dbo.Processor.PercentUsed) AS 'CPU % Used'
FROM dbo.Processor
LEFT JOIN dbo.Server ON dbo.Processor.ServerID=Server.ID
WHERE dbo.Processor.Datetm>DATEADD(DAY,-(1),(CONVERT (date, GETDATE())))
AND dbo.Processor.ProcessorNum='_Total'
AND (dbo.Server.Name='qp-ratking'
OR dbo.Server.Name='qp-hyper2012'
OR dbo.Server.Name='qp-hyped'
OR dbo.Server.Name='qp-lichking')
GROUP BY Name
)
,tempMemory
AS
(
SELECT dbo.Server.Name
,AVG(dbo.Memory.PercentUtilized) as '% Mem Used'
FROM dbo.Memory
LEFT JOIN dbo.Server ON dbo.Memory.ServerID=Server.ID
WHERE dbo.Memory.Datetm>DATEADD(DAY,-(1),(CONVERT (date, GETDATE())))
AND (dbo.Server.Name='qp-ratking'
OR dbo.Server.Name='qp-hyper2012'
OR dbo.Server.Name='qp-hyped'
OR dbo.Server.Name='qp-lichking')
GROUP BY Name
)
SELECT tempDiskSpace.Name, tempDiskSpace.Drive, tempDiskSpace.[Free Disk Space], tempProcessor.[CPU % Used], tempMemory.[% Mem Used]
FROM tempDiskSpace
LEFT JOIN tempProcessor ON tempDiskSpace.Name=tempProcessor.Name
LEFT JOIN tempMemory ON tempDiskSpace.Name=tempMemory.Name
ORDER BY Name, Drive;
Thanks for all the suggestions.
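A likely reason the rewrite is so much faster: the original query joins three child tables (DiskSpace, Processor, Memory) to Server on the same key, so every disk row is paired with every processor row and every memory row for that server before the aggregate runs. The sketch below reproduces that effect on toy data using Python's built-in sqlite3 module. The table contents are made up for illustration, the Datetm and ProcessorNum filters are left out, and SQLite is only a stand-in for SQL Server here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Server    (ID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE DiskSpace (ServerID INTEGER, Drive TEXT, FreeSpace REAL);
    CREATE TABLE Processor (ServerID INTEGER, PercentUsed REAL);
    CREATE TABLE Memory    (ServerID INTEGER, PercentUtilized REAL);
    INSERT INTO Server VALUES (1, 'qp-ratking');
""")
# Hypothetical samples: 3 disk rows, 4 processor rows, 5 memory rows for one server.
conn.executemany("INSERT INTO DiskSpace VALUES (1, 'C', ?)",
                 [(10.0,), (20.0,), (30.0,)])
conn.executemany("INSERT INTO Processor VALUES (1, ?)",
                 [(5.0,), (15.0,), (25.0,), (35.0,)])
conn.executemany("INSERT INTO Memory VALUES (1, ?)",
                 [(40.0,), (50.0,), (60.0,), (70.0,), (80.0,)])

# Joining all three child tables directly multiplies the rows: 3 * 4 * 5.
joined_rows = conn.execute("""
    SELECT COUNT(*)
    FROM Server s
    JOIN DiskSpace d ON d.ServerID = s.ID
    JOIN Processor p ON p.ServerID = s.ID
    JOIN Memory    m ON m.ServerID = s.ID
""").fetchone()[0]

# Aggregating each child table first leaves one row per key before the join.
preaggregated = conn.execute("""
    WITH d AS (SELECT ServerID, Drive, AVG(FreeSpace) AS FreeSpace
               FROM DiskSpace GROUP BY ServerID, Drive),
         p AS (SELECT ServerID, AVG(PercentUsed) AS CpuUsed
               FROM Processor GROUP BY ServerID),
         m AS (SELECT ServerID, AVG(PercentUtilized) AS MemUsed
               FROM Memory GROUP BY ServerID)
    SELECT s.Name, d.Drive, d.FreeSpace, p.CpuUsed, m.MemUsed
    FROM Server s
    JOIN d ON d.ServerID = s.ID
    JOIN p ON p.ServerID = s.ID
    JOIN m ON m.ServerID = s.ID
""").fetchall()

print(joined_rows)     # rows the aggregate has to process in the direct join
print(preaggregated)   # one row per server and drive
```

With a day of samples per table rather than a handful, the direct join can produce a very large intermediate result, which would be consistent with the Hash Match (Aggregate) dominating the plan.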

At the very least I'd start with getting rid of all those OR clauses.
AND (dbo.Server.Name='qp-ratking'
OR dbo.Server.Name='qp-hyper2012'
OR dbo.Server.Name='qp-hyped'
OR dbo.Server.Name='qp-lichking')
and replace with
AND dbo.Server.Name in ('qp-ratking','qp-hyper2012','qp-hyped','qp-lichking')
I'm not sure about converting everything to CTEs though. You can't index CTEs, and I have yet to come across an occasion where CTEs outperform a regular query. Your initial query seemed well formed apart from the overuse of OR mentioned above, so I'd look at indexes next.

I would start by checking the indexes. Are all the keys used in the join defined as primary keys? Or do they at least have indexes?
Then, additional indexes on Processor and Server might help:
create index idx_Processor_ProcessorNum_Datetm_ServerId on Processor(ProcessorNum, Datetm, ServerID);
create index idx_Server_Name_ID on Server(Name, ID);

The statement looks reasonably well structured, and I do not see huge scope for optimization, provided that prerequisites such as the following are addressed:
Check index fragmentation and ensure all indexes are maintained.
Check that statistics are up to date.
If dirty reads are acceptable, then it is worth considering WITH (NOLOCK) on the tables.
If the query allows declaring variables, then moving the DATEADD out of the filter, as below, can be beneficial.
Hope this helps.
-- Assuming Variables can be declared see the script below.
-- I made a few changes per my coding standard only to help me read better.
DECLARE @dt_Yesterdate DATE
SET @dt_Yesterdate = DATEADD(DAY, -(1), CONVERT (DATE, GETDATE()))
SELECT s.Name,
ds.Drive,
AVG(ds.FreeSpace) AS 'Free Disk Space',
AVG(P.PercentUsed) AS 'CPU % Used',
AVG(m.PercentUtilized) AS '% Mem Used'
FROM Server s
JOIN dbo.DiskSpace AS ds
ON s.ID = ds.ServerID
JOIN dbo.Processor AS p
ON s.ID = p.ServerID
JOIN dbo.Memory AS m
ON s.ID = m.ServerID
WHERE P.ProcessorNum = '_Total'
AND P.Datetm > @dt_Yesterdate
AND s.Name IN ('qp-ratking', 'qp-hyper2012', 'qp-hyped','qp-lichking')
GROUP BY s.name, ds.Drive
ORDER BY s.Name, ds.Drive;

Related

If transaction within date range, then return customer name (and not all the transactions!)

This code is taking a significant amount of time to run. It's returning every single transaction within the date range but I just need to know if the customer has had at least one transaction, then include the CustomerID, CustomerName, Type, Sign, ReportingName.
I think I need to GROUP BY 'CustomerID' but again only if there was a transaction within the date range. And of course, I'm sure there is an optimal way to execute the below TSQL because it's quite slow at present.
Thanks in advance for any help!
SELECT [ABC].[dbo].[vwPrimary].[RelatedNameId] AS CustomerID
,[ABC].[dbo].[vwPrimary].[RelatedName] AS CustomerName
,[AFGPurchase].[IvL].[TaxTreatment].[ParticluarType] AS Type
,[AFGPurchase].[IvL].[Product].[Sign] AS [Sign]
,[AFGPurchase].[IvL].[Product].[ReportingName] AS ReportingName
,[AFGPurchase].[IvL].[Transaction].[EffectiveDate] AS 'Date'
FROM (((([AFGPurchase].[IvL].[Account]
INNER JOIN [AFGPurchase].[IvL].[Position] ON [AFGPurchase].[IvL].[Account].[AccountId] = [AFGPurchase].[IvL].[Position].[AccountId])
INNER JOIN [AFGPurchase].[IvL].[Product] ON [AFGPurchase].[IvL].[Position].[ProductID] = [AFGPurchase].[IvL].[Product].[ProductId])
INNER JOIN [ABC].[dbo].[vwPrimary] ON [AFGPurchase].[IvL].[Account].[ReportingEntityId] = [ABC].[dbo].[vwPrimary].[RelatedNameId])
INNER JOIN [AFGPurchase].[IvL].[TaxTreatment] ON [AFGPurchase].[IvL].[Account].[TaxTreatmentId] = [AFGPurchase].[IvL].[TaxTreatment].[TaxTreatmentId])
INNER JOIN [AFGPurchase].[IvL].[Transaction] ON [AFGPurchase].[IvL].[Position].[PositionId] = [AFGPurchase].[IvL].[Transaction].[PositionId]
WHERE ((([AFGPurchase].[IvL].[TaxTreatment].[RegistrationType]) LIKE 'NON%')
AND (([AFGPurchase].[IvL].[Product].[Sign])='XYZ2')
AND (([AFGPurchase].[IvL].[Position].[Quantity])<>0)
AND (([AFGPurchase].[IvL].[Transaction].[EffectiveDate]) between '2021-12-31' and '2022-12-31'))
Check your indexes for fragmentation to speed up your query, and make sure the relevant indexes actually exist.
If you just need one result, use TOP 1:
SELECT TOP 1 [ABC].[dbo].[vwPrimary].[RelatedNameId] AS CustomerID
,[ABC].[dbo].[vwPrimary].[RelatedName] AS CustomerName
,[AFGPurchase].[IvL].[TaxTreatment].[ParticluarType] AS Type
,[AFGPurchase].[IvL].[Product].[Sign] AS [Sign]
,[AFGPurchase].[IvL].[Product].[ReportingName] AS ReportingName
,[AFGPurchase].[IvL].[Transaction].[EffectiveDate] AS 'Date'
FROM (((([AFGPurchase].[IvL].[Account]
INNER JOIN [AFGPurchase].[IvL].[Position] ON [AFGPurchase].[IvL].[Account].[AccountId] = [AFGPurchase].[IvL].[Position].[AccountId])
INNER JOIN [AFGPurchase].[IvL].[Product] ON [AFGPurchase].[IvL].[Position].[ProductID] = [AFGPurchase].[IvL].[Product].[ProductId])
INNER JOIN [ABC].[dbo].[vwPrimary] ON [AFGPurchase].[IvL].[Account].[ReportingEntityId] = [ABC].[dbo].[vwPrimary].[RelatedNameId])
INNER JOIN [AFGPurchase].[IvL].[TaxTreatment] ON [AFGPurchase].[IvL].[Account].[TaxTreatmentId] = [AFGPurchase].[IvL].[TaxTreatment].[TaxTreatmentId])
INNER JOIN [AFGPurchase].[IvL].[Transaction] ON [AFGPurchase].[IvL].[Position].[PositionId] = [AFGPurchase].[IvL].[Transaction].[PositionId]
WHERE ((([AFGPurchase].[IvL].[TaxTreatment].[RegistrationType]) LIKE 'NON%')
AND (([AFGPurchase].[IvL].[Product].[Sign])='XYZ2')
AND (([AFGPurchase].[IvL].[Position].[Quantity])<>0)
AND (([AFGPurchase].[IvL].[Transaction].[EffectiveDate]) between '2021-12-31' and '2022-12-31'))
If you only need to check for the existence of a row, and not actually get any data from it then use EXISTS() rather than INNER JOIN, e.g.
SELECT vpr.[RelatedNameId] AS CustomerID
,vpr.[RelatedName] AS CustomerName
,tt.[ParticluarType] AS Type
,prd.[Sign]
,prd.ReportingName
-- EffectiveDate is not selected: tr is only visible inside the EXISTS subquery
FROM [AFGPurchase].[IvL].[Account] AS acc
INNER JOIN [AFGPurchase].[IvL].[Position] AS pos ON acc.[AccountId] = pos.[AccountId]
INNER JOIN [AFGPurchase].[IvL].[Product] AS prd ON pos.[ProductID] = prd.[ProductId]
INNER JOIN [ABC].[dbo].[vwPrimary] AS vpr ON acc.[ReportingEntityId] = vpr.[RelatedNameId]
INNER JOIN [AFGPurchase].[IvL].[TaxTreatment] AS tt ON acc.[TaxTreatmentId] = tt.[TaxTreatmentId]
WHERE tt.[RegistrationType] LIKE 'NON%'
AND prd.[Sign]='XYZ2'
AND pos.[Quantity]<>0
AND EXISTS
( SELECT 1
FROM [AFGPurchase].[IvL].[Transaction] AS tr
WHERE tr.[PositionId] = pos.[PositionId]
AND tr.[EffectiveDate] BETWEEN '2021-12-31' AND '2022-12-31'
);
N.B. I have added in table aliases and removed all the unnecessary parentheses for readability - you may disagree that it is more readable, but I would expect that most people would agree
This may not offer any performance benefit over simply grouping by the columns you are selecting and keeping your joins as they are. SQL is, after all, a declarative language where you tell the engine what you want, not how to get it, so you may find that the two plans are the same because you are requesting the same result. Using EXISTS does have the advantage of being more semantically tied to what you are trying to do, though, so it gives the optimiser the best chance of getting to the right plan. If you are still having performance issues, then you may need to inspect the execution plan and see if it suggests any indexes.
Finally, if you are really still using SQL Server 2008 then you really need to start thinking about your upgrade path. It has been completely unsupported for over 3 years now.
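To make the difference concrete, here is a minimal sketch using Python's built-in sqlite3 module. The Customer and Txn tables and their rows are hypothetical stand-ins for the real schema, and SQLite stands in for SQL Server. It only shows how the result shape differs: the join returns one row per matching transaction, while EXISTS returns each customer at most once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customer (CustomerID INTEGER PRIMARY KEY, CustomerName TEXT);
    CREATE TABLE Txn (CustomerID INTEGER, EffectiveDate TEXT);
    INSERT INTO Customer VALUES (1, 'Alpha'), (2, 'Beta'), (3, 'Gamma');
    INSERT INTO Txn VALUES
        (1, '2022-01-15'), (1, '2022-03-02'), (1, '2022-07-19'),
        (2, '2020-05-05'),
        (3, '2022-12-01');
""")

# INNER JOIN: one output row for every transaction inside the date range.
join_rows = conn.execute("""
    SELECT c.CustomerID, c.CustomerName
    FROM Customer c
    JOIN Txn t ON t.CustomerID = c.CustomerID
    WHERE t.EffectiveDate BETWEEN '2021-12-31' AND '2022-12-31'
    ORDER BY c.CustomerID
""").fetchall()

# EXISTS: a customer qualifies once, however many transactions match.
exists_rows = conn.execute("""
    SELECT c.CustomerID, c.CustomerName
    FROM Customer c
    WHERE EXISTS (SELECT 1
                  FROM Txn t
                  WHERE t.CustomerID = c.CustomerID
                    AND t.EffectiveDate BETWEEN '2021-12-31' AND '2022-12-31')
    ORDER BY c.CustomerID
""").fetchall()

print(len(join_rows))  # customer 1 appears three times, customer 3 once
print(exists_rows)     # each qualifying customer exactly once
```

In the real query the outer rows are still one per position, so a customer with several qualifying positions could still repeat; a DISTINCT or GROUP BY on the selected columns may be needed on top of the EXISTS.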

When I try to get MIN value of manufactured parts I get (Only one expression ... when the subquery is not introduced with EXISTS)

I am trying to get the MIN value of manufactured parts, grouped by project.
This is my query:
SELECT
proinfo.ProjectN
,ProjShipp.[Parts]
,ProjShipp.Qty AS 'Qty Total'
,Sum(DailyProduction.Quantity) AS 'Qty Manufactured'
,(SELECT DailySumPoteau.IdProject, MIN(DailySumPoteau.DailySum)
FROM (SELECT PShipp.IdProject, SUM(DailyWelding.Quantity) DailySum
FROM DailyWeldingPaintProduction DailyWelding
INNER JOIN ProjectShipping PShipp ON PShipp.id=DailyWelding.FK_idPartShip
WHERE PShipp.id=ProjShipp.id
GROUP BY PShipp.id,PShipp.IdProject)DailySumPoteau
GROUP BY DailySumPoteau.IdProject ) AS 'Qt Pole'
FROM [dbo].[DailyWeldingPaintProduction] DailyProduction
INNER join ProjectShipping ProjShipp on ProjShipp.id=DailyProduction.FK_idPartShip
inner join ProjectInfo proinfo on proinfo.id=IdProject
GROUP By proinfo.id
,proinfo.ProjectN
,ProjShipp.[Parts]
,ProjShipp.Qty
,ProjShipp.[Designation]
,ProjShipp.id
I have three tables:
01 - ProjectInfo: it stores information about the project:
02 - ProjectShipping: it stores information about the parts and it has ProjectInfoId as foreign key:
03 - DailyWeldingPaintProduction: it stores information about daily production and it has ProjectShippingId as foreign key:
but when I run it I get this error:
Msg 116, Level 16, State 1, Line 13
Only one expression can be specified in the select list when the subquery is not introduced with EXISTS.
How can I solve this problem?
From your target results, I suspect that you want a window MIN(). Assuming that your query works and generates the correct results when the subquery is removed (that is, leaving the Qt Pole column aside), that would be:
SELECT pi.ProjectN, ps.[Parts], ps.Qty AS QtyTotal,
SUM(dp.Quantity) AS QtyManufactured,
MIN(SUM(dp.Quantity)) OVER(PARTITION BY pi.ProjectN) AS QtPole,
ps.Designation
FROM [dbo].[DailyWeldingPaintProduction] dp
INNER join ProjectShipping ps on ps.id=dp.FK_idPartShip
INNER join ProjectInfo pi on pi.id=IdProject
GROUP BY pi.id, pi.ProjectN, ps.[Parts], ps.Qty, ps.Designation, ps.id
Side note: don't use single quotes for identifiers; they should be reserved for literal strings only. Use the proper quoting character for your database (in SQL Server: square brackets) - or better yet, use identifiers that do not require being quoted.
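As a sanity check of the MIN(SUM(...)) OVER(...) pattern, here is a small sketch using Python's built-in sqlite3 module. The single Production table and its rows are hypothetical and stand in for the joined tables. It assumes the bundled SQLite is 3.25 or newer, since that is when window functions were added. The inner SUM is computed per GROUP BY row first, and the window MIN then picks the smallest of those sums within each project.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Production (Project TEXT, Part TEXT, Quantity INTEGER);
    INSERT INTO Production VALUES
        ('P1', 'A', 2), ('P1', 'A', 3),
        ('P1', 'B', 1), ('P1', 'B', 1),
        ('P2', 'C', 7);
""")

# SUM runs once per (Project, Part) group; the window MIN then runs over
# those grouped sums, partitioned by project.
rows = conn.execute("""
    SELECT Project,
           Part,
           SUM(Quantity) AS QtyManufactured,
           MIN(SUM(Quantity)) OVER (PARTITION BY Project) AS QtPole
    FROM Production
    GROUP BY Project, Part
    ORDER BY Project, Part
""").fetchall()

for row in rows:
    print(row)
```

Every part row of a project carries the same QtPole value, which is the smallest per-part total in that project.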
Formulating the query in the way you have done is not necessarily the best solution. As the other solution mentions, the best method in this instance is probably to use a window function / OVER. But since this can depend on indexes, and also to understand what went wrong, I will give you the way to fix the original query.
The issue with your query is that it has a correlated subquery in the SELECT list that returns two columns. What you are trying to do can be done in RDBMSs that support row constructors; unfortunately, SQL Server is not one of them.
What you are trying to get at here is to get a whole resultset per row of the table.
The correct syntax for your query is to APPLY the resultset of the subquery for every row. You can CROSS APPLY in this instance because you are guaranteed a result anyway due to the correlation:
SELECT
proinfo.ProjectN
,ProjShipp.[Parts]
,ProjShipp.Qty AS 'Qty Total'
,Sum(DailyProduction.Quantity) AS 'Qty Manufactured'
,QtPole.IdProject
,QtPole.MinDailySum
FROM [dbo].[DailyWeldingPaintProduction] DailyProduction
INNER join ProjectShipping ProjShipp on ProjShipp.id=DailyProduction.FK_idPartShip
inner join ProjectInfo proinfo on proinfo.id=ProjShipp.IdProject
CROSS APPLY (
SELECT DailySumPoteau.IdProject, MIN(DailySumPoteau.DailySum) MinDailySum
FROM (SELECT DailyWelding.FK_idPartShip IdProject, SUM(DailyWelding.Quantity) DailySum
FROM DailyWeldingPaintProduction DailyWelding
WHERE DailyWelding.FK_idPartShip=ProjShipp.id
GROUP BY DailyWelding.FK_idPartShip) DailySumPoteau
GROUP BY DailySumPoteau.IdProject
) AS QtPole
GROUP By proinfo.id
,proinfo.ProjectN
,ProjShipp.[Parts]
,ProjShipp.Qty
,ProjShipp.[Designation]
,ProjShipp.id
,QtPole.IdProject
,QtPole.MinDailySum
I have taken the liberty of cleaning up the subquery by removing the unnecessary ProjectShipping reference. Note that the addition of grouping columns here does not matter because of the correlation to ProjShipp.Id
Note also that depending on indexes and density and such like, it may be better to formulate the subquery as a JOIN instead, with the correlation on the outside in the ON. You would need to modify the grouping in that case.

Select SQL View Slow with table alias

I am baffled as to why selecting my SQL View is so slow when using a table alias (25 seconds) but runs so much faster when the alias is removed (2 seconds)
-this query takes 25 seconds.
SELECT [Extent1].[Id] AS [Id],
[Extent1].[ProjectId] AS [ProjectId],
[Extent1].[ProjectWorkOrderId] AS [ProjectWorkOrderId],
[Extent1].[Project] AS [Project],
[Extent1].[SubcontractorId] AS [SubcontractorId],
[Extent1].[Subcontractor] AS [Subcontractor],
[Extent1].[ValuationNumber] AS [ValuationNumber],
[Extent1].[WorksOrderName] AS [WorksOrderName],
[Extent1].[NewGross],
[Extent1].[CumulativeGross],
[Extent1].[CreateByName] AS [CreateByName],
[Extent1].[CreateDate] AS [CreateDate],
[Extent1].[FinalDateForPayment] AS [FinalDateForPayment],
[Extent1].[CreateByEmail] AS [CreateByEmail],
[Extent1].[Deleted] AS [Deleted],
[Extent1].[ValuationStatusCategoryId] AS [ValuationStatusCategoryId]
FROM [dbo].[ValuationsTotal] AS [Extent1]
-this query takes 2 seconds.
SELECT [Id],
[ProjectId],
[Project],
[SubcontractorId],
[Subcontractor],
[NewGross],
[ProjectWorkOrderId],
[ValuationNumber],
[WorksOrderName],
[CreateByName],
[CreateDate],
[CreateByEmail],
[Deleted],
[ValuationStatusCategoryId],
[FinalDateForPayment],
[CumulativeGross]
FROM [dbo].[ValuationsTotal]
this is my SQL View code -
WITH ValuationTotalsTemp(Id, ProjectId, Project, SubcontractorId, Subcontractor, WorksOrderName, NewGross, ProjectWorkOrderId, ValuationNumber, CreateByName, CreateDate, CreateByEmail, Deleted, ValuationStatusCategoryId, FinalDateForPayment)
AS (SELECT vi.ValuationId AS Id,
v.ProjectId,
p.NAME,
b.Id AS Expr1,
b.NAME AS Expr2,
wo.OrderNumber,
SUM(vi.ValuationQuantity * pbc.BudgetRate) AS 'NewGross',
sa.ProjectWorkOrderId,
v.ValuationNumber,
up.FirstName + ' ' + up.LastName AS Expr3,
v.CreateDate,
up.Email,
v.Deleted,
v.ValuationStatusCategoryId,
sa.FinalDateForPayment
FROM dbo.ValuationItems AS vi
INNER JOIN dbo.ProjectBudgetCosts AS pbc
ON vi.ProjectBudgetCostId = pbc.Id
INNER JOIN dbo.Valuations AS v
ON vi.ValuationId = v.Id
INNER JOIN dbo.ProjectSubcontractorApplications AS sa
ON sa.Id = v.ProjectSubcontractorApplicationId
INNER JOIN dbo.Projects AS p
ON p.Id = v.ProjectId
INNER JOIN dbo.ProjectWorkOrders AS wo
ON wo.Id = sa.ProjectWorkOrderId
INNER JOIN dbo.ProjectSubcontractors AS sub
ON sub.Id = wo.ProjectSubcontractorId
INNER JOIN dbo.Businesses AS b
ON b.Id = sub.BusinessId
INNER JOIN dbo.UserProfile AS up
ON up.Id = v.CreateBy
WHERE ( vi.Deleted = 0 )
AND ( v.Deleted = 0 )
GROUP BY vi.ValuationId,
v.ProjectId,
p.NAME,
b.Id,
b.NAME,
wo.OrderNumber,
sa.ProjectWorkOrderId,
v.ValuationNumber,
up.FirstName + ' ' + up.LastName,
v.CreateDate,
up.Email,
v.Deleted,
v.ValuationStatusCategoryId,
sa.FinalDateForPayment)
SELECT Id,
ProjectId,
Project,
SubcontractorId,
Subcontractor,
NewGross,
ProjectWorkOrderId,
ValuationNumber,
WorksOrderName,
CreateByName,
CreateDate,
CreateByEmail,
Deleted,
ValuationStatusCategoryId,
FinalDateForPayment,
(SELECT SUM(NewGross) AS Expr1
FROM ValuationTotalsTemp AS tt
WHERE ( ProjectWorkOrderId = t.ProjectWorkOrderId )
AND ( t.ValuationNumber >= ValuationNumber )
GROUP BY ProjectWorkOrderId) AS CumulativeGross
FROM ValuationTotalsTemp AS t
Any ideas why this is?
The SQL query runs with table alias as this is generated from Entity Framework so I have no way of changing this. I will need to modify my SQL view to be able to handle the table alias without affecting performance.
The execution plans are very different.
The slow one has a part that leaps out as being problematic. It estimates a single row will be input to a nested loops join and result in a single scan of ValuationItems. In practice it ends up performing more than 1,000 such scans.
[Screenshots of the estimated and actual execution plans]
SQL Server 2014 introduced a new cardinality estimator. Your fast plan is using it, which is shown in the XML as CardinalityEstimationModelVersion="120". Your slow plan isn't (CardinalityEstimationModelVersion="70").
So it looks as though in this case the assumptions used by the new estimator give you a better plan.
The reason for the difference is probably that the fast one is running cross-database (it references [ProbeProduction].[dbo].[ValuationsTotal]), and presumably the database you are executing it from has the SQL Server 2014 compatibility level, so it automatically gets the new cardinality estimator.
The slow one is executing in the context of ProbeProduction itself, and I assume the compatibility level of that database must be lower than 2014, so you are defaulting to the legacy cardinality estimator.
You can use OPTION (QUERYTRACEON 2312) to get the slow query to use the new cardinality estimator (changing the database compatibility mode to globally alter the behaviour shouldn't be done without careful testing of existing queries as it can cause regressions as well as improvements).
Alternatively you could just try and tune the query working within the limits of the legacy CE. Perhaps adding join hints to encourage it to use something more akin to the faster plan.
The two queries are different (column order!). It is reasonable to assume the first query uses an index and is therefore much faster. I doubt it has anything to do with the aliasing.
For grins, would you take out the WHERE and give this a try?
It might be doing a bunch of loop joins and filtering at the end.
This might get it to filter up front:
FROM dbo.ValuationItems AS vi
INNER JOIN dbo.Valuations AS v
ON vi.ValuationId = v.Id
AND vi.Deleted = 0
AND v.Deleted = 0
-- other joins
-- NO where
If you have a lot of loop joins going on then try inner hash join (on all)

How can I optimize this SQL query? (Solarwinds Orion)

I'm very new to SQL, and still learning. I'm using a reporting tool called Solarwinds Orion, and I'm honestly not sure how specific the query I have written is to the program, so if there's anything in the query that's confusing, let me know and I'll try to figure out if it's specific to the program or not.
The problem with the query I'm running is that it times out after a very long time (maybe an hour) of running. The database I'm using is huge. Unfortunately I don't really know how huge, but I've been told it's huge.
Is there anything I am doing wrong that would have a huge performance impact?
SELECT TOP 10000
Nodes.Caption AS NodeName,
NetflowApplicationSummary.AppName AS Application_Name,
SUM(NetflowApplicationSummary.TotalBytes) AS SUM_of_Bytes_Transferred,
AVG(Case OutBandwidth
When 0 Then 0
Else (NetflowApplicationSummary.TotalBytes/OutBandwidth) * 100
End) AS TEST_PERCENT
FROM
((NetflowApplicationSummary
INNER JOIN Nodes ON (NetflowApplicationSummary.NodeID = Nodes.NodeID))
INNER JOIN InterfaceTraffic ON (Nodes.NodeID = InterfaceTraffic.InterfaceID))
INNER JOIN Interfaces ON (Nodes.NodeID = Interfaces.NodeID)
WHERE
( InterfaceTraffic.DateTime > (GetDate()-30) )
AND
(Nodes.WANCircuit = 1)
GROUP BY Nodes.Caption, NetflowApplicationSummary.AppName
EDIT: I ran COUNT() on each of my tables with the below result.
SELECT COUNT(*) FROM NetflowApplicationSummary # 50671011
SELECT COUNT(*) FROM Nodes # 898
SELECT COUNT(*) FROM InterfaceTraffic # 18000166
SELECT COUNT(*) FROM Interfaces # 3938
# Total : 68,676,013
I really have no idea if 68 million items is a huge database to be honest.
A couple of notes:
The INNER JOIN operator is associative, so get rid of those parentheses in the FROM clause and let the optimizer figure out the best join order.
You may have an implied cursor from the getdate() function being called for every row. Store the value in a local variable and compare to that.
The resulting SQL should look like this:
DECLARE @Date as datetime = getdate() - 30;
SELECT TOP 10000
Nodes.Caption AS NodeName,
NetflowApplicationSummary.AppName AS Application_Name,
SUM(NetflowApplicationSummary.TotalBytes) AS SUM_of_Bytes_Transferred,
AVG(Case OutBandwidth
When 0 Then 0
Else (NetflowApplicationSummary.TotalBytes/OutBandwidth) * 100
End) AS TEST_PERCENT
FROM NetflowApplicationSummary
INNER JOIN Nodes ON NetflowApplicationSummary.NodeID = Nodes.NodeID
INNER JOIN InterfaceTraffic ON Nodes.NodeID = InterfaceTraffic.InterfaceID
INNER JOIN Interfaces ON Nodes.NodeID = Interfaces.NodeID
WHERE InterfaceTraffic.DateTime > @Date
AND Nodes.WANCircuit = 1
GROUP BY Nodes.Caption, NetflowApplicationSummary.AppName
Also, make sure you have an index on table InterfaceTraffic with a leading field of DateTime. If this doesn't exist you may need to pay the penalty of a first time creation of it.
If this doesn't help, then you may need to post the execution plan where it can be inspected.
Out of interest, also perform a count() on all four tables and post that result, just so members here can make their own assessment of how big your database really is. It is amazing how many non-technical people still think a 1 or 10 GB database is huge, while I run that easily on my workstation!

SQL query hangs on join

I've written an SQL query that produces a report of some stats for each Year-Week-Mine-Product.
It works exactly as desired except for one thing: trn.wid_date isn't the correct date to be using.
I should be using td.datetime_act_comp_dump. When I replace trn.wid_date with td.datetime_act_comp_dump, it doesn't give me any errors but seems to just hang indefinitely. I let it go for a while yesterday and it came back with "ORA-01652: unable to extend temp segment by 128 in tablespace TEMP", though I haven't seen that error since.
I don't understand what might be causing that, considering that I'm able to successfully return MAX(td.datetime_act_comp_dump) in the query below.
select to_char(trn.wid_date, 'IYYY') as dump_year,
to_char(trn.wid_date-7/24, 'IW') as dump_week,
SUBSTR(trn.train_control_id,1,2) as Mine,
vcon.product_type_code as Product,
COUNT(DISTINCT trn.train_control_id) as Trains,
COUNT(1) as Wagons,
MIN(trn.wid_date) as Min_WID_Hrs,
MAX(trn.wid_date) as Max_WID_Hrs,
MIN(td.datetime_act_comp_dump) as Min_Fin_Dump,
MAX(td.datetime_act_comp_dump) as Max_Fin_Dump,
ROUND(SUM(con.weight_total-con.empty_weight_total),0) as Tot_Tonnes,
ROUND(AVG(con.weight_total-con.empty_weight_total),2) as Avg_Tonnes,
ROUND(MIN(con.weight_total-con.empty_weight_total),2) as Minimum,
ROUND(PERCENTILE_DISC(0.99) WITHIN GROUP (ORDER BY (con.weight_total-con.empty_weight_total) DESC),2) as "1st"
from widsys.consist con
INNER JOIN widsys.train trn
USING (train_record_id)
INNER JOIN tpps.train_details td
ON trn.train_tpps_id||trn.mine_code = td.train_id||td.mine_code
INNER JOIN widsys.v_consist_ore_detail vcon
USING (consist_id)
where trn.direction = 'N'
and to_char(trn.wid_date, 'IYYY') = 2009
and to_char(trn.wid_date-7/24, 'IW') = 25
group by to_char(trn.wid_date, 'IYYY'),
to_char(trn.wid_date-7/24, 'IW'),
SUBSTR(trn.train_control_id,1,2),
vcon.product_type_code
order by to_char(trn.wid_date-7/24, 'IW') DESC
Just in order to troubleshoot, I've tried removing everything to do with vcon from the query above and replacing trn.wid_date with td.datetime_act_comp_dump. The effect is that it only reports on Year-Week-Mine rather than Year-Week-Mine-Product (see query below).
This new query actually executes rather than just hanging, but it returns a few odd results and isn't sufficient, since it doesn't break things down by Product.
select to_char(td.datetime_act_comp_dump, 'IYYY') as dump_year,
to_char(td.datetime_act_comp_dump-7/24, 'IW') as dump_week,
SUBSTR(trn.train_control_id,1,2) as Mine,
--vcon.product_type_code as Product,
COUNT(DISTINCT trn.train_control_id) as Trains,
COUNT(1) as Wagons,
MIN(trn.wid_date) as Min_WID_Hrs,
MAX(trn.wid_date) as Max_WID_Hrs,
MIN(td.datetime_act_comp_dump) as Min_Fin_Dump,
MAX(td.datetime_act_comp_dump) as Max_Fin_Dump,
ROUND(SUM(con.weight_total-con.empty_weight_total),0) as Tot_Tonnes,
ROUND(AVG(con.weight_total-con.empty_weight_total),2) as Avg_Tonnes,
ROUND(MIN(con.weight_total-con.empty_weight_total),2) as Minimum,
ROUND(PERCENTILE_DISC(0.99) WITHIN GROUP (ORDER BY (con.weight_total-con.empty_weight_total) DESC),2) as "1st"
from widsys.consist con
INNER JOIN widsys.train trn
USING (train_record_id)
INNER JOIN tpps.train_details td
ON trn.train_tpps_id||trn.mine_code = td.train_id||td.mine_code
--INNER JOIN widsys.v_consist_ore_detail vcon
--USING (consist_id)
where trn.direction = 'N'
and to_char(td.datetime_act_comp_dump, 'IYYY') = 2009
and to_char(td.datetime_act_comp_dump-7/24, 'IW') = 25
group by to_char(td.datetime_act_comp_dump, 'IYYY'),
to_char(td.datetime_act_comp_dump-7/24, 'IW'),
SUBSTR(trn.train_control_id,1,2)
--vcon.product_type_code
order by to_char(td.datetime_act_comp_dump-7/24, 'IW') DESC
Any advice on what might be going wrong?
Cheers,
Tommy
The only thing that I can think of without more information is that the datetime_act_comp_dump column of train_details isn't indexed and wid_date is. This sounds like a pretty normal performance issue where something is not indexed or the train and train_details tables are dramatically different sizes and your join is blowing up.
I'm not sure which DB you are using, but you might want to figure out how to run the query execution plan profiler and see what the difference between the two execution plans is. I suspect that the answer is going to be something structural, or maybe that the concatenation in the join condition is causing some DB-specific problems.
I managed to get it to run muuuuuuuch faster by creating one subquery for the widsys tables and one for the tpps tables, then doing an implicit inner join on the two columns instead of concatenating them.
SELECT blah FROM (widsys subquery) w, (tpps subquery) t WHERE w.mine_code = t.mine_code AND w.train_tpps_id = t.train_id
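Besides speed, joining on the two columns separately is also safer than joining on the concatenation, because different pairs of values can concatenate to the same string. A minimal sketch with Python's built-in sqlite3 module, using made-up rows and simplified versions of the two tables (SQLite stands in for Oracle here; both use || for concatenation):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE train         (train_tpps_id TEXT, mine_code TEXT);
    CREATE TABLE train_details (train_id TEXT,      mine_code TEXT);
    -- '12' || '3' and '1' || '23' both give '123', yet they are different keys.
    INSERT INTO train         VALUES ('12', '3'), ('7', 'AB');
    INSERT INTO train_details VALUES ('1', '23'), ('7', 'AB');
""")

# Join on the concatenated key, as in the original query.
concat_matches = conn.execute("""
    SELECT COUNT(*)
    FROM train trn
    JOIN train_details td
      ON trn.train_tpps_id || trn.mine_code = td.train_id || td.mine_code
""").fetchone()[0]

# Join on each column separately.
column_matches = conn.execute("""
    SELECT COUNT(*)
    FROM train trn
    JOIN train_details td
      ON trn.train_tpps_id = td.train_id
     AND trn.mine_code = td.mine_code
""").fetchone()[0]

print(concat_matches)  # includes the false '123' = '123' match
print(column_matches)  # only the genuine ('7', 'AB') match
```

A join on a concatenated expression also generally cannot use ordinary indexes on the individual columns, which may be part of why the two-column version ran so much faster.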