Select SQL View Slow with table alias

Select SQL View Slow with table alias - sql

I am baffled as to why selecting my SQL View is so slow when using a table alias (25 seconds) but runs so much faster when the alias is removed (2 seconds)
-this query takes 25 seconds.
SELECT [Extent1].[Id] AS [Id],
[Extent1].[ProjectId] AS [ProjectId],
[Extent1].[ProjectWorkOrderId] AS [ProjectWorkOrderId],
[Extent1].[Project] AS [Project],
[Extent1].[SubcontractorId] AS [SubcontractorId],
[Extent1].[Subcontractor] AS [Subcontractor],
[Extent1].[ValuationNumber] AS [ValuationNumber],
[Extent1].[WorksOrderName] AS [WorksOrderName],
[Extent1].[NewGross],
[Extent1].[CumulativeGross],
[Extent1].[CreateByName] AS [CreateByName],
[Extent1].[CreateDate] AS [CreateDate],
[Extent1].[FinalDateForPayment] AS [FinalDateForPayment],
[Extent1].[CreateByEmail] AS [CreateByEmail],
[Extent1].[Deleted] AS [Deleted],
[Extent1].[ValuationStatusCategoryId] AS [ValuationStatusCategoryId]
FROM [dbo].[ValuationsTotal] AS [Extent1]
-this query takes 2 seconds.
SELECT [Id],
[ProjectId],
[Project],
[SubcontractorId],
[Subcontractor],
[NewGross],
[ProjectWorkOrderId],
[ValuationNumber],
[WorksOrderName],
[CreateByName],
[CreateDate],
[CreateByEmail],
[Deleted],
[ValuationStatusCategoryId],
[FinalDateForPayment],
[CumulativeGross]
FROM [dbo].[ValuationsTotal]
this is my SQL View code -
WITH ValuationTotalsTemp(Id, ProjectId, Project, SubcontractorId, Subcontractor, WorksOrderName, NewGross, ProjectWorkOrderId, ValuationNumber, CreateByName, CreateDate, CreateByEmail, Deleted, ValuationStatusCategoryId, FinalDateForPayment)
AS (SELECT vi.ValuationId AS Id,
v.ProjectId,
p.NAME,
b.Id AS Expr1,
b.NAME AS Expr2,
wo.OrderNumber,
SUM(vi.ValuationQuantity * pbc.BudgetRate) AS 'NewGross',
sa.ProjectWorkOrderId,
v.ValuationNumber,
up.FirstName + ' ' + up.LastName AS Expr3,
v.CreateDate,
up.Email,
v.Deleted,
v.ValuationStatusCategoryId,
sa.FinalDateForPayment
FROM dbo.ValuationItems AS vi
INNER JOIN dbo.ProjectBudgetCosts AS pbc
ON vi.ProjectBudgetCostId = pbc.Id
INNER JOIN dbo.Valuations AS v
ON vi.ValuationId = v.Id
INNER JOIN dbo.ProjectSubcontractorApplications AS sa
ON sa.Id = v.ProjectSubcontractorApplicationId
INNER JOIN dbo.Projects AS p
ON p.Id = v.ProjectId
INNER JOIN dbo.ProjectWorkOrders AS wo
ON wo.Id = sa.ProjectWorkOrderId
INNER JOIN dbo.ProjectSubcontractors AS sub
ON sub.Id = wo.ProjectSubcontractorId
INNER JOIN dbo.Businesses AS b
ON b.Id = sub.BusinessId
INNER JOIN dbo.UserProfile AS up
ON up.Id = v.CreateBy
WHERE ( vi.Deleted = 0 )
AND ( v.Deleted = 0 )
GROUP BY vi.ValuationId,
v.ProjectId,
p.NAME,
b.Id,
b.NAME,
wo.OrderNumber,
sa.ProjectWorkOrderId,
v.ValuationNumber,
up.FirstName + ' ' + up.LastName,
v.CreateDate,
up.Email,
v.Deleted,
v.ValuationStatusCategoryId,
sa.FinalDateForPayment)
SELECT Id,
ProjectId,
Project,
SubcontractorId,
Subcontractor,
NewGross,
ProjectWorkOrderId,
ValuationNumber,
WorksOrderName,
CreateByName,
CreateDate,
CreateByEmail,
Deleted,
ValuationStatusCategoryId,
FinalDateForPayment,
(SELECT SUM(NewGross) AS Expr1
FROM ValuationTotalsTemp AS tt
WHERE ( ProjectWorkOrderId = t.ProjectWorkOrderId )
AND ( t.ValuationNumber >= ValuationNumber )
GROUP BY ProjectWorkOrderId) AS CumulativeGross
FROM ValuationTotalsTemp AS t
Any ideas why this is?
The SQL query runs with table alias as this is generated from Entity Framework so I have no way of changing this. I will need to modify my SQL view to be able to handle the table alias without affecting performance.

The execution plans are very different.
The slow one has a part that leaps out as being problematic. It estimates a single row will be input to a nested loops join and result in a single scan of ValuationItems. In practice it ends up performing more than 1,000 such scans.
Estimated
Actual
SQL Server 2014 introduced a new cardinality estimator. Your fast plan is using it. This is shown in the XML as CardinalityEstimationModelVersion="120" Your slow plan isn't (CardinalityEstimationModelVersion="70").
So it looks as though in this case the assumptions used by the new estimator give you a better plan.
The reason for the difference is probably as the fast one is running cross database (references [ProbeProduction].[dbo].[ValuationsTotal]) and presumably the database you are executing it from has compatility level of 2014 so automatically gets the new CardinalityEstimator.
The slow one is executing in the context of ProbeProduction itself and I assume the compatibility level of that database must be < 2014 - so you are defaulting to the legacy cardinality estimator.
You can use OPTION (QUERYTRACEON 2312) to get the slow query to use the new cardinality estimator (changing the database compatibility mode to globally alter the behaviour shouldn't be done without careful testing of existing queries as it can cause regressions as well as improvements).
Alternatively you could just try and tune the query working within the limits of the legacy CE. Perhaps adding join hints to encourage it to use something more akin to the faster plan.

The two queries are different (column order!). It is reasonable to assume the first query uses an index and is therefore much faster. I doubt it has anything to do with the aliassing.

For grins would take out the where and give this a try?
I might be doing a bunch of loop joins and filtering at the end
This might get it to filter up front
FROM dbo.ValuationItems AS vi
INNER JOIN dbo.Valuations AS v
ON vi.ValuationId = v.Id
AND vi.Deleted = 0
AND v.Deleted = 0
-- other joins
-- NO where
If you have a lot of loop joins going on then try inner hash join (on all)

Related

If transaction within date range, then return customer name (and not all the transactions!)

This code is taking a significant amount of time to run. It's returning every single transaction within the date range but I just need to know if the customer has had at least one transaction, then include the CustomerID, CustomerName, Type, Sign, ReportingName.
I think I need to GROUP BY 'CustomerID' but again only if there was a transaction within the date range. And of course, I'm sure there is an optimal way to execute the below TSQL because it's quite slow at present.
Thanks in advance for any help!
SELECT [ABC].[dbo].[vwPrimary].[RelatedNameId] AS CustomerID
,[ABC].[dbo].[vwPrimary].[RelatedName] AS CustomerName
,[AFGPurchase].[IvL].[TaxTreatment].[ParticluarType] AS Type
,[AFGPurchase].[IvL].[Product].[Sign] AS [Sign]
,[AFGPurchase].[IvL].[Product].[ReportingName] AS ReportingName
,[AFGPurchase].[IvL].[Transaction].[EffectiveDate] AS 'Date'
FROM (((([AFGPurchase].[IvL].[Account]
INNER JOIN [AFGPurchase].[IvL].[Position] ON [AFGPurchase].[IvL].[Account].[AccountId] = [AFGPurchase].[IvL].[Position].[AccountId])
INNER JOIN [AFGPurchase].[IvL].[Product] ON [AFGPurchase].[IvL].[Position].[ProductID] = [AFGPurchase].[IvL].[Product].[ProductId])
INNER JOIN [ABC].[dbo].[vwPrimary] ON [AFGPurchase].[IvL].[Account].[ReportingEntityId] = [ABC].[dbo].[vwPrimary].[RelatedNameId])
INNER JOIN [AFGPurchase].[IvL].[TaxTreatment] ON [AFGPurchase].[IvL].[Account].[TaxTreatmentId] = [AFGPurchase].[IvL].[TaxTreatment].[TaxTreatmentId])
INNER JOIN [AFGPurchase].[IvL].[Transaction] ON [AFGPurchase].[IvL].[Position].[PositionId] = [AFGPurchase].[IvL].[Transaction].[PositionId]
WHERE ((([AFGPurchase].[IvL].[TaxTreatment].[RegistrationType]) LIKE 'NON%')
AND (([AFGPurchase].[IvL].[Product].[Sign])='XYZ2')
AND (([AFGPurchase].[IvL].[Position].[Quantity])<>0)
AND (([AFGPurchase].[IvL].[Transaction].[EffectiveDate]) between '2021-12-31' and '2022-12-31'))

Check your indexes on fragmentation, to speed up your query. And make sure you have indexes.
If you just need one result, just TOP 1
SELECT TOP 1 [ABC].[dbo].[vwPrimary].[RelatedNameId] AS CustomerID
,[ABC].[dbo].[vwPrimary].[RelatedName] AS CustomerName
,[AFGPurchase].[IvL].[TaxTreatment].[ParticluarType] AS Type
,[AFGPurchase].[IvL].[Product].[Sign] AS [Sign]
,[AFGPurchase].[IvL].[Product].[ReportingName] AS ReportingName
,[AFGPurchase].[IvL].[Transaction].[EffectiveDate] AS 'Date'
FROM (((([AFGPurchase].[IvL].[Account]
INNER JOIN [AFGPurchase].[IvL].[Position] ON [AFGPurchase].[IvL].[Account].[AccountId] = [AFGPurchase].[IvL].[Position].[AccountId])
INNER JOIN [AFGPurchase].[IvL].[Product] ON [AFGPurchase].[IvL].[Position].[ProductID] = [AFGPurchase].[IvL].[Product].[ProductId])
INNER JOIN [ABC].[dbo].[vwPrimary] ON [AFGPurchase].[IvL].[Account].[ReportingEntityId] = [ABC].[dbo].[vwPrimary].[RelatedNameId])
INNER JOIN [AFGPurchase].[IvL].[TaxTreatment] ON [AFGPurchase].[IvL].[Account].[TaxTreatmentId] = [AFGPurchase].[IvL].[TaxTreatment].[TaxTreatmentId])
INNER JOIN [AFGPurchase].[IvL].[Transaction] ON [AFGPurchase].[IvL].[Position].[PositionId] = [AFGPurchase].[IvL].[Transaction].[PositionId]
WHERE ((([AFGPurchase].[IvL].[TaxTreatment].[RegistrationType]) LIKE 'NON%')
AND (([AFGPurchase].[IvL].[Product].[Sign])='XYZ2')
AND (([AFGPurchase].[IvL].[Position].[Quantity])<>0)
AND (([AFGPurchase].[IvL].[Transaction].[EffectiveDate]) between '2021-12-31' and '2022-12-31'))

If you only need to check for the existence of a row, and not actually get any data from it then use EXISTS() rather than INNER JOIN, e.g.
SELECT vpr.[RelatedNameId] AS CustomerID
,vpr.[RelatedName] AS CustomerName
,tt.[ParticluarType] AS Type
,prd.[Sign]
,prd.ReportingName
,tr.[EffectiveDate] AS [Date]
FROM [AFGPurchase].[IvL].[Account] AS acc
INNER JOIN [AFGPurchase].[IvL].[Position] AS pos ON acc.[AccountId] = pos.[AccountId]
INNER JOIN [AFGPurchase].[IvL].[Product] AS prd ON pos.[ProductID] = prd.[ProductId]
INNER JOIN [ABC].[dbo].[vwPrimary] AS vpr ON acc.[ReportingEntityId] = vpr.[RelatedNameId]
INNER JOIN [AFGPurchase].[IvL].[TaxTreatment] AS tt ON acc.[TaxTreatmentId] = tt.[TaxTreatmentId]
WHERE tt.[RegistrationType] LIKE 'NON%'
AND prd.[Sign]='XYZ2'
AND pos.[Quantity]<>0
AND EXISTS
( SELECT 1
FROM [AFGPurchase].[IvL].[Transaction] AS tr
WHERE tr.[PositionId] = pos.[PositionId]
AND tr.[EffectiveDate] BETWEEN '2021-12-31' AND '2022-12-31'
);
N.B. I have added in table aliases and removed all the unnecessary parentheses for readability - you may disagree that it is more readable, but I would expect that most people would agree
This may not offer any performance benefits over simply grouping by the columns you are selecting and keeping your joins as they are - SQL is after all a declarative language where you tell the engine what you want, not how to get it. So you may find that the two plans are the same because you are requesting the same result. Using EXISTS does have the advance of being more semantically tied to what you are trying to do though, so gives the optimiser the best chance of getting to the right plan. If you are still having performance issues, then you may need to inspect the execution plan, and see if it suggests any indexes.
Finally, if you are really still using SQL Server 2008 then you really need to start thinking about your upgrade path. It has been completely unsupported for over 3 years now.

script optimization in sql

I am running simple select script, which inner join with other 3 table . all the tables are big ( lots of data ) its taking around 20 sec to run. want to optimized it.
I tried to used nolock , but not much deference
SELECT RR.ReportID,
RR.RequestFormat,
RRP.SequenceNumber,
RRP.ParameterName,
RRP.ParameterValue
CASE WHEN RP.ParameterLabelOvrrd IS NULL THEN P.ParameterLabel ELSE .ParameterLabelOvrrd END AS ParameterLabelChosen,
RRP.ParameterValueEntered
FROM ReportRequestParameters AS RRP WITH (NOLOCK)
INNER JOIN ReportRequests AS RR WITH (NOLOCK) ON RRP.RequestID = RR.RequestID
INNER JOIN ReportParameter AS RP WITH (NOLOCK) ON RP.ReportID = RR.ReportID
AND RP.SequenceNumber = RRP.SequenceNumber
INNER JOIN Parameter AS P WITH (NOLOCK) ON P.ParameterID = RP.ParameterID
WHERE RRP.RequestID = '2226765'
ORDER BY SequenceNumber;
Please advice.

This is your query:
SELECT RR.ReportID, RR.RequestFormat, RRP.SequenceNumber,
RRP.ParameterName, RRP.ParameterValue
COALESCE(RP.ParameterLabelOvrrd, P.ParameterLabel) as ParameterLabelChosen,
RRP.ParameterValueEntered
FROM ReportRequestParameters RRP JOIN
ReportRequests RR
ON RRP.RequestID = RR.RequestID JOIN
ReportParameter RP
ON RP.ReportID = RR.ReportID AND
RP.SequenceNumber = RRP.SequenceNumber JOIN
Parameter P
ON P.ParameterID = RP.ParameterID
WHERE RRP.RequestID = 2226765
ORDER BY RRP.SequenceNumber;
I have removed the single quotes on 2226765, assuming that the id is a number. Mixing types can impede the optimizer.
Then, I recommend an index on ReportRequestParameters(RequestID, SequenceNumber). I assume the other tables have indexes on the appropriate columns, but these are:
ReportRequests(RequestID, ReportID, SequenceNumber)
ReportParameter(ReportID, SequenceNumber, ParameterID)
Parameter(ParameterID)
I strongly advise you not to use nolock, unless you know what you are doing. Aaron Bertrand has a good blog post on this subject.

I would suggest running with the execution plan turned on and see if SSMS can advise you on additional indexing.
Other than that your query looks straight-forward, nothing code wise that is going to help make it faster, other than perhaps getting rid of the case statement and definitely getting rid of the NOLOCK statements.

How to optimize a query that uses a multi statement table valued function

so I have a query that is taking several seconds to run.
SELECT it.invoiceID, SUM(xgtpp.total + ws.expense) AS invoice_total
FROM Invoices_Timesheets it (NOLOCK)
INNER JOIN Timesheets_WorkSegments tws (NOLOCK)
INNER JOIN WorkSegments ws (NOLOCK) ON (tws.worksegmentID = ws.ID)
CROSS APPLY (
SELECT gtpp.worksegmentID, SUM(gtpp.pay_per_shift) AS total
FROM dbo.fnGetTotalPerProject(ws.projectID) gtpp
WHERE (gtpp.worksegmentID = tws.worksegmentID)
GROUP BY gtpp.worksegmentID
) xgtpp
ON (it.timesheetID = tws.timesheetID)
WHERE it.invoiceID = 37
GROUP BY it.invoiceID
The tables used are:
[Invoices]
ID,companyID,userID,projectID,insertDate,submitDate,viewDate,tax
[Invoices_Timesheets]
ID,timesheetID,invoiceID
[WorkSegments]
ID,companyID,userID,projectID,insertDate,startTime,endTime,break,poa,deleteDate,expense
[Timesheets_WorkSegments]
ID,timesheetID,worksegmentID
The UDF dbo.fnGetTotalPerProject() accepts only one parameter projectID
When I replace ws.projectID inside the UDF with a static value the performance is incredible, but as soon as I make it use ws.projectID the performance slows down badly.
This query is a sub-query of a larger one, but it is definitely the bottle neck.

I ended up re-writing the dbo.fnGetTotalPerProject to be per timesheet rather than per project. My next goal is to optimize it per work segment.

Different execution time for same query - SQL Server 2008 R2

I run this query in SqlServer 2008 R2, it take about 6 seconds and return around 8000 records. OrderItemView is a view and DocumentStationHistory is a table.
SELECT o.Number, dsh.DateSend AS Expr1
FROM OrderItemView AS o INNER JOIN
DocumentStationHistory AS dsh ON dsh.DocumentStationHistoryId =
(SELECT TOP (1) DocumentStationHistoryId
FROM DocumentStationHistory AS dsh2
WHERE (o.DocumentStationId = ToStationId) AND
(DocumentId = o.id)
ORDER BY DateSend DESC)
WHERE (o.DocumentStationId = 10)
But when I run the same query with o.DocumentStationId = 8 where clause, it return around 200 records but take about 90 seconds!
Is there any idea that where is the problem?

I suppose the index is the issue, But not for o.DocumentStationId but all the fields that are joined using the field o.DocumentStationId.
try to see how your inner query is working by checking the execution plan.
that would need some performance tuning.
Also, try using index for ToStationId and DateSend. also see if you can modify inner query.
Other than these i dont see any suggestions.
Also post you execution plan

I rebuilt the index on o.DocumentStationId and the problem solved.

Try the following query. Also check if there is any index on DocumentStationId, ToStationId and DocumentId. if not create them
SELECT o.Number, dsh.DateSend AS Expr1
FROM OrderItemView AS o
OUTER APPLY
(SELECT TOP (1) DateSend
FROM DocumentStationHistory
WHERE (o.DocumentStationId = ToStationId) AND (DocumentId = o.id)
ORDER BY DateSend DESC) AS dsh
WHERE (o.DocumentStationId = 10)

Timeout running SQL query

I'm trying to using the aggregation features of the django ORM to run a query on a MSSQL 2008R2 database, but I keep getting a timeout error. The query (generated by django) which fails is below. I've tried running it directs the SQL management studio and it works, but takes 3.5 min
It does look it's aggregating over a bunch of fields which it doesn't need to, but I wouldn't have though that should really cause it to take that long. The database isn't that big either, auth_user has 9 records, ticket_ticket has 1210, and ticket_watchers has 1876. Is there something I'm missing?
SELECT
[auth_user].[id],
[auth_user].[password],
[auth_user].[last_login],
[auth_user].[is_superuser],
[auth_user].[username],
[auth_user].[first_name],
[auth_user].[last_name],
[auth_user].[email],
[auth_user].[is_staff],
[auth_user].[is_active],
[auth_user].[date_joined],
COUNT([tickets_ticket].[id]) AS [tickets_captured__count],
COUNT(T3.[id]) AS [assigned_tickets__count],
COUNT([tickets_ticket_watchers].[ticket_id]) AS [tickets_watched__count]
FROM
[auth_user]
LEFT OUTER JOIN [tickets_ticket] ON ([auth_user].[id] = [tickets_ticket].[capturer_id])
LEFT OUTER JOIN [tickets_ticket] T3 ON ([auth_user].[id] = T3.[responsible_id])
LEFT OUTER JOIN [tickets_ticket_watchers] ON ([auth_user].[id] = [tickets_ticket_watchers].[user_id])
GROUP BY
[auth_user].[id],
[auth_user].[password],
[auth_user].[last_login],
[auth_user].[is_superuser],
[auth_user].[username],
[auth_user].[first_name],
[auth_user].[last_name],
[auth_user].[email],
[auth_user].[is_staff],
[auth_user].[is_active],
[auth_user].[date_joined]
HAVING
(COUNT([tickets_ticket].[id]) > 0 OR COUNT(T3.[id]) > 0 )
EDIT:
Here are the relevant indexes (excluding those not used in the query):
auth_user.id (PK)
auth_user.username (Unique)
tickets_ticket.id (PK)
tickets_ticket.capturer_id
tickets_ticket.responsible_id
tickets_ticket_watchers.id (PK)
tickets_ticket_watchers.user_id
tickets_ticket_watchers.ticket_id
EDIT 2:
After a bit of experimentation, I've found that the following query is the smallest that results in the slow execution:
SELECT
COUNT([tickets_ticket].[id]) AS [tickets_captured__count],
COUNT(T3.[id]) AS [assigned_tickets__count],
COUNT([tickets_ticket_watchers].[ticket_id]) AS [tickets_watched__count]
FROM
[auth_user]
LEFT OUTER JOIN [tickets_ticket] ON ([auth_user].[id] = [tickets_ticket].[capturer_id])
LEFT OUTER JOIN [tickets_ticket] T3 ON ([auth_user].[id] = T3.[responsible_id])
LEFT OUTER JOIN [tickets_ticket_watchers] ON ([auth_user].[id] = [tickets_ticket_watchers].[user_id])
GROUP BY
[auth_user].[id]
The weird thing is that if I comment out any two lines in the above, it runs in less that 1s, but it doesn't seem to matter which lines I remove (although obviously I can't remove a join without also removing the relevant SELECT line).
EDIT 3:
The python code which generated this is:
User.objects.annotate(
Count('tickets_captured'),
Count('assigned_tickets'),
Count('tickets_watched')
)
A look at the execution plan shows that SQL Server is first doing a cross-join on all the table, resulting in about 280 million rows, and 6Gb of data. I assume that this is where the problem lies, but why is it happening?

SQL Server is doing exactly what it was asked to do. Unfortunately, Django is not generating the right query for what you want. It looks like you need to count distinct, instead of just count: Django annotate() multiple times causes wrong answers
As for why the query works that way: The query says to join the four tables together. So say an author has 2 captured tickets, 3 assigned tickets, and 4 watched tickets, the join will return 2*3*4 tickets, one for each combination of tickets. The distinct part will remove all the duplicates.

what about this?
SELECT auth_user.*,
C1.tickets_captured__count
C2.assigned_tickets__count
C3.tickets_watched__count
FROM
auth_user
LEFT JOIN
( SELECT capturer_id, COUNT(*) AS tickets_captured__count
FROM tickets_ticket GROUP BY capturer_id ) AS C1 ON auth_user.id = C1.capturer_id
LEFT JOIN
( SELECT responsible_id, COUNT(*) AS assigned_tickets__count
FROM tickets_ticket GROUP BY responsible_id ) AS C2 ON auth_user.id = C2.responsible_id
LEFT JOIN
( SELECT user_id, COUNT(*) AS tickets_watched__count
FROM tickets_ticket_watchers GROUP BY user_id ) AS C3 ON auth_user.id = C3.user_id
WHERE C1.tickets_captured__count > 0 OR C2.assigned_tickets__count > 0
--WHERE C1.tickets_captured__count is not null OR C2.assigned_tickets__count is not null -- also works (I think with beter performance)

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas