Improve performance of a SQL query

I am looking for tips and tricks to improve the performance of a stored procedure with multiple SELECT statements that insert into a table. All the objects I join on are already indexed.
I believe the reason this stored procedure takes almost an hour to run is that there are multiple SELECT statements using the following two views: rvw_FinancialLineItemValues and rvw_FinancialLineItems.
Also, each SELECT statement uses specific hard-coded values for AccountNumber, LineItemTypeID, and a few other fields coming from the two views mentioned above.
Would it improve performance if I created a temporary table that gets ALL the data needed for these SELECT statements at once, and then used that temporary table in my joins instead?
Are there any other ways to improve performance and manageability?
SELECT
@scenarioid,
@portfolioid,
pa.Id,
pa.ExternalID,
(select value from fn_split(i.AccountNumber,'.') where id = 1),
ac.[Description],
cl.Name,
NullIf((select value from fn_split(i.AccountNumber,'.') where id = 2),''),
NullIf((select value from fn_split(i.AccountNumber,'.') where id = 3),''),
ty.Name,
v.[Date],
cast(SUM(v.Amount) as decimal(13,2)),
GETDATE()
FROM rvw_FinancialLineItems i
INNER JOIN rvw_Scenarios sc
ON i.ScenarioId = sc.Id
AND sc.Id = @scenarioid
AND sc.PortfolioId = @portfolioid
INNER JOIN #pa AS pa
ON i.PropertyAssetID = pa.Id
INNER JOIN rvw_FinancialLineItemValues v
ON i.ScenarioId = v.ScenarioId
AND i.PropertyAssetID = v.PropertyAssetID
AND i.Id = v.FinancialLineItemId
AND ((i.BusinessEntityTypeId = 11
AND i.LineItemTypeId = 3002)
OR (i.LineItemTypeId IN (2005, 2010, 2003, 2125, 2209, 5012, 6001)
AND i.ModeledEntityKey = 1))
AND i.AccountNumber not in ('401ZZ','403ZZ')
AND i.AccountNumber not in ('401XX')
AND i.AccountNumber not in ('40310','41110','42010','41510','40190','40110') -- exclude lease-level revenues selected below
AND v.[Date] BETWEEN @fromdate AND
CASE
WHEN pa.AnalysisEnd < @todate THEN pa.AnalysisEnd
ELSE @todate
END
AND v.ResultSet IN (0, 4)
INNER JOIN rvw_Portfolios po
ON po.Id = @portfolioid
INNER JOIN Accounts ac
ON po.ChartOfAccountId = ac.ChartOfAccountId
AND i.AccountNumber = ac.AccountNumber
AND ac.HasSubAccounts = 0
INNER JOIN fn_LookupClassTypes() cl
ON ac.ClassTypeId = cl.Id
INNER JOIN LineItemTypes ty
ON ac.LineItemTypeId = ty.Id
LEFT JOIN OtherRevenues r
ON i.PropertyAssetID = r.PropertyAssetID
AND i.AccountNumber = r.AccountID
AND v.[Date] BETWEEN r.[Begin] AND r.[End]
WHERE (r.IsMemo IS NULL
OR r.IsMemo = 0)
GROUP BY pa.AnalysisBegin
,pa.Id
,pa.ExternalID
,i.AccountNumber
,ac.[Description]
,cl.Name
,ty.Name
,v.[Date]
HAVING SUM(v.amount) <> 0

You should run your query with SET SHOWPLAN_ALL ON, or capture the execution plan in Management Studio, and look for inefficiencies.
There are some resources online that help with analyzing the results, such as:
http://www.sqlservercentral.com/articles/books/65831/
See also How do I obtain a Query Execution Plan?
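For example, a minimal way to capture the estimated plan from a query window (a sketch assuming SQL Server; the procedure name is a placeholder, and the SET statement must be alone in its batch):
-- Return plan rows instead of executing the batch
SET SHOWPLAN_ALL ON;
GO
EXEC dbo.YourStoredProcedure @scenarioid = 1, @portfolioid = 1; -- placeholder call
GO
SET SHOWPLAN_ALL OFF;
GO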

First, which fn_split() UDF are you using? If it is not an inline table-valued UDF, this is notoriously slow.
Second, is the UDF fn_LookupClassTypes() an inline table-valued UDF? If not, convert it to an inline table-valued UDF.
Last, your SQL query had some redundancies. Try this and see what it does.
SELECT @scenarioid, @portfolioid, pa.Id, pa.ExternalID,
(select value from fn_split(i.AccountNumber,'.')
where id = 1), ac.[Description], cl.Name,
NullIf((select value from fn_split(i.AccountNumber,'.')
where id = 2),''),
NullIf((select value from fn_split(i.AccountNumber,'.')
where id = 3),''), ty.Name, v.[Date],
cast(SUM(v.Amount) as decimal(13,2)), GETDATE()
FROM rvw_FinancialLineItems i
JOIN rvw_Scenarios sc ON sc.Id = i.ScenarioId
JOIN #pa AS pa ON pa.Id = i.PropertyAssetID
JOIN rvw_FinancialLineItemValues v
ON v.ScenarioId = i.ScenarioId
AND v.PropertyAssetID = i.PropertyAssetID
AND v.FinancialLineItemId = i.Id
JOIN rvw_Portfolios po ON po.Id = sc.portfolioid
JOIN Accounts ac
ON ac.ChartOfAccountId = po.ChartOfAccountId
AND ac.AccountNumber = i.AccountNumber
JOIN fn_LookupClassTypes() cl On cl.Id = ac.ClassTypeId
JOIN LineItemTypes ty On ty.Id = ac.LineItemTypeId
Left JOIN OtherRevenues r
ON r.PropertyAssetID = i.PropertyAssetID
AND r.AccountID = i.AccountNumber
AND v.[Date] BETWEEN r.[Begin] AND r.[End]
WHERE i.ScenarioId = @scenarioid
and ac.HasSubAccounts = 0
and sc.PortfolioId = @portfolioid
and IsNull(r.IsMemo, 0) = 0
and v.ResultSet In (0, 4)
and i.AccountNumber not in
('401XX', '401ZZ','403ZZ','40310','41110',
'42010','41510','40190','40110')
and v.[Date] BETWEEN @fromdate AND
CASE WHEN pa.AnalysisEnd < @todate
THEN pa.AnalysisEnd ELSE @todate END
and ((i.LineItemTypeId = 3002 and i.BusinessEntityTypeId = 11) OR
(i.ModeledEntityKey = 1 and i.LineItemTypeId IN
(2005, 2010, 2003, 2125, 2209, 5012, 6001)))
GROUP BY pa.AnalysisBegin,pa.Id, pa.ExternalID, i.AccountNumber,
ac.[Description],cl.Name,ty.Name,v.[Date]
HAVING SUM(v.amount) <> 0

I would look at the following first. What are the wait types relevant to your stored procedure? Are you seeing a lot of disk IO time? Are things being done in memory? Maybe there's network latency from pulling that much information.
Next, what does the plan look like for the procedure? Where does it show all the work being done?
The views definitely could be an issue, as you mentioned. You could maintain pre-processed tables so you don't have to do as many joins, specifically for the joins where you see the most CPU spent.

Correlated subqueries are generally slow and should never be used when you are aiming for performance. Use fn_split to populate a temp table, index it if need be, and then join to it to get the values you need. You might need to join multiple times for different values; without actually knowing the data I am having a hard time visualizing it.
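A sketch of that approach (assuming fn_split is a table-valued function returning the (id, value) pairs used above):
-- Split each distinct AccountNumber once, instead of running
-- a correlated subquery per output row
SELECT d.AccountNumber, s.id, s.value
INTO #AccountParts
FROM (SELECT DISTINCT AccountNumber FROM rvw_FinancialLineItems) d
CROSS APPLY fn_split(d.AccountNumber, '.') s;

CREATE CLUSTERED INDEX IX_AccountParts ON #AccountParts (AccountNumber, id);

-- Then join once per needed part, e.g.:
-- JOIN #AccountParts ap1 ON ap1.AccountNumber = i.AccountNumber AND ap1.id = 1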
It is also bad for performance to use OR. Use UNION ALL in a derived table instead.
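For example, a sketch of the OR'd filter from the query above as two UNION ALL branches (the two predicates are mutually exclusive, so no duplicate rows are introduced):
-- Each branch has a simple, seekable predicate
SELECT Id, ScenarioId, PropertyAssetID, AccountNumber
FROM rvw_FinancialLineItems
WHERE BusinessEntityTypeId = 11 AND LineItemTypeId = 3002
UNION ALL
SELECT Id, ScenarioId, PropertyAssetID, AccountNumber
FROM rvw_FinancialLineItems
WHERE ModeledEntityKey = 1
  AND LineItemTypeId IN (2005, 2010, 2003, 2125, 2209, 5012, 6001)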
Since you have all those conditions on the view rvw_FinancialLineItems, yes, it might help to pull those rows out to a temp table and then index the temp table.
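A minimal sketch of that, reusing the filters from the original query:
-- Materialize the filtered view rows once, then index the temp table
SELECT Id, ScenarioId, PropertyAssetID, AccountNumber,
       LineItemTypeId, BusinessEntityTypeId, ModeledEntityKey
INTO #LineItems
FROM rvw_FinancialLineItems
WHERE ScenarioId = @scenarioid
  AND AccountNumber NOT IN ('401XX','401ZZ','403ZZ','40310','41110',
                            '42010','41510','40190','40110');

CREATE CLUSTERED INDEX IX_LineItems ON #LineItems (PropertyAssetID, Id);

-- The main SELECT then joins to #LineItems instead of the view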
You might also check whether using the views is even a good idea. Often views join to many tables that you aren't getting data from and are thus less performant than querying only the tables you actually need. This is especially true if your organization was dumb enough to make views that call views.

Related

How do I improve my code for better/faster performance?

I have this INSERT setup where I want the following to be filtered and inserted into a table. The 'SplitOut' table has more than 30,000,000 rows. This statement takes more than 2 hours for a single project, and I have 100 projects like this.
My initial plan was to insert everything at once, but because it took more than 20 to execute I had to split it by project, and even then the performance was very low. I was planning on using CROSS APPLY but wasn't really sure how it would apply in my case. Any suggestions to improve the performance are appreciated.
Below is the code I have now. Thank you!
insert into DimQuestion (ResponderKey, ProjectID, qid, Question, QuestionType,AttributeID,
Attribute, ProductID, ProductCode, ProductName, AnswerCode, AnswerLabel)
SELECT distinct RKey
,a.ProID
,a.qid
,c.QID + ' - ' + c.[Ql] as Question
,c.[Type] as QType
,a.AttributeID
,e.Attribute
,a.ProductID
,d.ProductCode
,d.ProductName
,a.Answers
,'AnswerLabel' = case when a.qid not in ('Q2','QA','QA1','QA2','QA5','QA6','QA7','QA8','QF1','QF2','QF2c','QF2a','QF5',
'QF6','QF7','QF8','QF9','QF10','QX5','QX12')
then datamap.[Answer Label]
when a.qid = 'Q2'
then f.AnswerLabel
when a.qid in ('QA','QA1','QA2','QA5','QA6','QA7','QA8','QF1','QF2','QF2c','QF2a','QF5',
'QF6','QF7','QF8','QF9','QF10','QX5','QX12')
then a.Answers END
FROM [SplitOut] a
INNER join [DimResponder] b on a.responseid = b.ResponseID and a.ProjectID = b.ProjectID
INNER join Question_List c on a.qid = c.qid
left outer JOIN Data_Map datamap ON a.QID = datamap.QID and a.answers = datamap.[answer code]
left outer join DimProduct d on a.ProductID = d.ProductTypeCode and a.ProjectID = d.ProjectID
left outer join DimAttribute e on e.projectid = 0 and a.AttributeID = e.AttributeCode
left outer join Q2AnswerData f on a.QID = f.QID and a.Answers = f.AnswerCode and a.AttributeID = f.VariableID
where a.columnNames not like '%open%'
and a.ColumnNames not like '%seg%'
and a.columnnames not like '%rot%'
and a.Answers not like ''
and datamap.Project not in ('Project 0')
and a.ProjectID in (1,2,3,4,5,6,7,8,9,10)
Make sure columns in the ON clauses are indexed.
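For example (a sketch; the index names are made up, and whether existing keys already cover these columns is unknown):
-- Support the equi-joins in the ON clauses
CREATE INDEX IX_DimResponder_Response ON DimResponder (ResponseID, ProjectID);
CREATE INDEX IX_QuestionList_QID ON Question_List (qid);
CREATE INDEX IX_DimProduct_TypeProject ON DimProduct (ProductTypeCode, ProjectID);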
Just looking at the SQL, without the table counts and explain plan, here's something to try. Use a subquery for the SPLITOUT table. You say that table has 30,000,000 rows, and most of your WHERE clause qualifiers are against it, so I'm using a subquery to reduce the number of rows before the joins. The way it's coded in your version, SPLITOUT may be joined to the other tables before the where clause is applied. I agree with the comments that LIKE clauses are bad: they don't use an index, so you're most likely table scanning your 30,000,000-row table.
I also made the call to DATAMAP a subquery because it's a left join with a qualifier in the where clause. If there's no row, the qualifier will fail when you may have wanted it to succeed.
Run the subquery on SPLITOUT by itself. Tune it first. Create a composite index on splitout.projectID, answers, and columnNames. If the optimizer uses it for projectID, the likes on columnNames may be index scanned. Once the SPLITOUT subquery is tuned, add the other joins in one at a time.
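A sketch of that composite index:
-- The leading ProjectID column supports a seek; the LIKE tests on
-- columnNames can then be evaluated against the narrower index rows
CREATE INDEX IX_SplitOut_Filter
    ON [SplitOut] (ProjectID, Answers, columnNames);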
Try to remove the DISTINCT with cleaner joins; the optimizer has to sort to do a DISTINCT, which is costly.
Don't use LIKE and IN when you don't need to. Use = and <> when possible.
I wouldn't use a cross join for a query such as this.
insert into DimQuestion (ResponderKey, ProjectID, qid, Question, QuestionType,AttributeID, Attribute, ProductID, ProductCode, ProductName, AnswerCode, AnswerLabel)
SELECT distinct RKey
,a.ProID
,a.qid
,c.QID + ' - ' + c.[Ql] as Question
,c.[Type] as QType
,a.AttributeID
,e.Attribute
,a.ProductID
,d.ProductCode
,d.ProductName
,a.Answers
,'AnswerLabel' = case when a.qid = 'Q2' then f.AnswerLabel
when a.qid not in ('Q2','QA','QA1','QA2','QA5','QA6','QA7','QA8','QF1','QF2','QF2c','QF2a','QF5',
'QF6','QF7','QF8','QF9','QF10','QX5','QX12')
then datamap.[Answer Label]
when a.qid in ('QA','QA1','QA2','QA5','QA6','QA7','QA8','QF1','QF2','QF2c','QF2a','QF5',
'QF6','QF7','QF8','QF9','QF10','QX5','QX12')
then a.Answers
end
FROM (select a.responseID, a.ProjectID, a.ProID, a.qid, a.AttributeID, a.ProductID, a.Answers
from [SplitOut] a
where a.ProjectID in (1,2,3,4,5,6,7,8,9,10)
and a.Answers <> '' -- don't use LIKE when equality or inequality will work; note NOT and <> do not use the index
and a.columnNames not like '%open%' and a.ColumnNames not like '%seg%' and a.columnnames not like '%rot%' -- very bad, won't use an index; consider creating a category or code column to identify these values
) a
inner join [DimResponder] b on a.responseid = b.ResponseID and a.ProjectID = b.ProjectID
inner join Question_List c on a.qid = c.qid
left outer join (select QID, [Answer Code], [Answer Label] from Data_Map where Project <> 'Project 0') datamap ON a.QID = datamap.QID and a.answers = datamap.[Answer Code]
left outer join DimProduct d on a.ProductID = d.ProductTypeCode and a.ProjectID = d.ProjectID
left outer join DimAttribute e on e.projectid = 0 and a.AttributeID = e.AttributeCode
left outer join Q2AnswerData f on a.QID = f.QID and a.Answers = f.AnswerCode and a.AttributeID = f.VariableID
Inner and outer joins tend to be expensive (I used to do a lot of SQL years ago and they were expensive then). You may be able to do it as several insert statements, which may be a lot faster, especially if you can reduce or get rid of the inner/outer joins.
Making some temp tables to pre-process some of the data may also help; see the sketch below.
Also, providing a list of the indexes would really help. Indexes can make the difference between minutes and hours.
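A small sketch of the temp-table idea, using the table and column names from the question:
-- Stage the filtered SplitOut rows once, then run the insert(s) from the stage
SELECT responseID, ProjectID, ProID, qid, AttributeID, ProductID, Answers
INTO #SplitOutStage
FROM [SplitOut]
WHERE ProjectID IN (1,2,3,4,5,6,7,8,9,10)
  AND Answers <> '';

CREATE INDEX IX_SplitOutStage ON #SplitOutStage (qid, Answers);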

Stored Procedure Optimisation

I have written the following SQL stored procedure, and (I think) because of all the select commands it's really slow now that the database has been populated with lots of data. Is there a way to optimise it so that it runs much quicker? Currently, in an Azure S0 DB, it takes around 1:40 min to process. Here's the stored procedure:
USE [cmb2SQL]
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE PROCEDURE [dbo].[spStockReport] @StockReportId as INT
AS
select
ProductId,
QtyCounted,
LastStockCount,
Purchases,
UnitRetailPrice,
CostPrice,
GrossProfit,
Consumed,
(Consumed * UnitRetailPrice) as ValueOfSales,
(QtyCounted * CostPrice) as StockOnHand,
StockCountId
from (
select
ProductId,
QtyCounted,
LastStockCount,
Purchases,
UnitRetailPrice,
CostPrice,
GrossProfit,
(LastStockCount + Purchases) - QtyCounted as Consumed,
StockCountId
from (
select
distinct
sci.StockCountItem_Product as ProductId,
(Select ISNULL(SUM(Qty), 0) as tmpQty from
(Select Qty from stockcountitems
join stockcounts on stockcountitems.stockcountitem_stockcount = stockcounts.id
where stockcountitem_Product = p.Id and stockcountitem_stockcount = sc.id and stockcounts.stockcount_pub = sc.StockCount_Pub
) as data
) as QtyCounted,
(Select ISNULL(SUM(Qty), 0) as LastStockCount from
(Select Qty from StockCountItems as sci
join StockCounts on sci.StockCountItem_StockCount = StockCounts.Id
join Products on sci.StockCountItem_Product = Products.id
where sci.StockCountItem_Product = p.id and sci.stockcountitem_stockcount =
(select top 1 stockcounts.id from stockcounts
join stockcountitems on stockcounts.id = stockcountitem_stockcount
where stockcountitems.stockcountitem_product = p.id and stockcounts.id < sc.id and StockCounts.StockCount_Pub = sc.StockCount_Pub
order by stockcounts.id desc)
) as data
) as LastStockCount,
(Select ISNULL(SUM(Qty * CaseSize), 0) as Purchased from
(select Qty, Products.CaseSize from StockPurchaseItems
join Products on stockpurchaseitems.stockpurchaseitem_product = products.id
join StockPurchases on stockpurchaseitem_stockpurchase = stockpurchases.id
join Periods on stockpurchases.stockpurchase_period = periods.id
where Products.id = p.Id and StockPurchases.StockPurchase_Period = sc.StockCount_Period and StockPurchases.StockPurchase_Pub = sc.StockCount_Pub) as data
) as Purchases,
sci.RetailPrice as UnitRetailPrice,
sci.CostPrice,
(select top 1 GrossProfit from Pub_Products where Pub_Products.Pub_Product_Product = p.id and Pub_Products.Pub_Product_Pub = sc.StockCount_Pub) as GrossProfit,
sc.Id as StockCountId
from StockCountItems as sci
join StockCounts as sc on sci.StockCountItem_StockCount = sc.Id
join Pubs on sc.StockCount_Pub = pubs.Id
join Periods as pd on sc.StockCount_Period = pd.Id
join Products as p on sci.StockCountItem_Product = p.Id
join Pub_Products as pp on p.Id = pp.Pub_Product_Product
Where StockCountItem_StockCount = @StockReportId and pp.Pub_Product_Pub = sc.StockCount_Pub
Group By sci.CostPrice, sci.StockCountItem_Product, sci.Qty, sc.Id, p.Id, sc.StockCount_Period, pd.Id, sci.RetailPrice, pp.CountPrice, sc.StockCount_Pub
) as data
) as final
GO
As requested, here is the execution plan in XML (I had to upload it to tinyupload as it exceeds the message character length):
execusionplan.xml
Schema:
Row Counts:
Table row_count
StockPurchaseItems 57511
Products 3116
StockCountItems 60949
StockPurchases 6494
StockCounts 240
Periods 30
Pub_Products 5694
Pubs 7
Without getting into a query rewrite (that's the most expensive option and probably the last thing you should do), try these steps first, one by one, and measure the impact: time, execution plan, and SET STATISTICS IO ON output. Create a baseline for these metrics first. Stop when you achieve acceptable performance.
First of all, update statistics on the relevant tables; I see some of the estimates are way off. Check the exec plan for estimated vs actual rows - any better now?
Create indexes on StockPurchaseItems(StockPurchaseItem_Product) and on StockCountItems(StockCountItem_Product, StockCountItem_StockCount). Check the execution plan: did the optimizer consider using the indexes at all?
Add (INCLUDE) the other referenced columns to those two indexes in order to cover the query, as in the sketch below. Are they used now?
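A sketch of those two indexes widened to cover the query (the INCLUDE lists are assumptions; take them from the columns the query actually touches):
CREATE INDEX IX_SPI_Product
    ON StockPurchaseItems (StockPurchaseItem_Product)
    INCLUDE (Qty, StockPurchaseItem_StockPurchase);

CREATE INDEX IX_SCI_Product_StockCount
    ON StockCountItems (StockCountItem_Product, StockCountItem_StockCount)
    INCLUDE (Qty, RetailPrice, CostPrice);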
If none of the above helps, consider breaking the query into smaller ones. It would be nice to have some real data to experiment with to be more specific.
That "select distinct" smells real bad; are you sure the joins are all OK?

Improve query in stored procedure

I need to improve this query inside a stored procedure; its performance with lots of data broke the application. Is there any way to make it faster?
I need to collect certain columns from several tables to build a dashboard in my web app; other columns I collect in another query and join in my controller through classes.
EDIT: this query is part of a dynamic transaction, so the target database can change.
SELECT
a.fact_num as cotizacion, a.comentario, m.co_cli, k.cli_des,
m.co_ven, l.ven_des, m.fec_emis, m.fec_venc, m.campo8,
a.reng_num, a.co_art, g.art_des, a.co_alma, b.fact_num as pedido,
c.fact_num as factura, d.fact_num as despacho, e.cob_num as cobro,
f.fec_venc as fecha_venc, f.fec_emis as fecha_pedido,
h.odp_num as ord_produccion, h.co_ced as cedula, i.req_num as requisicion,
j.ent_num as cierre
FROM
reng_cac a
LEFT JOIN
cotiz_c m ON a.fact_num = m.fact_num
LEFT JOIN
reng_ped b ON a.co_art = b.co_art AND a.fact_num = b.num_doc AND b.tipo_doc = 'T'
LEFT JOIN
pedidos f ON b.fact_num = f.fact_num
LEFT JOIN
reng_fac c ON b.fact_num = c.num_doc AND a.co_art = c.co_art AND c.tipo_doc = 'P'
LEFT JOIN
reng_ndd d ON c.fact_num = d.num_doc AND a.co_art = d.co_art AND d.tipo_doc = 'F'
LEFT JOIN
reng_cob e ON c.fact_num = e.doc_num AND e.tp_doc_cob = 'FACT'
LEFT JOIN
art g ON a.co_art = g.co_art
LEFT JOIN
spodp h ON b.fact_num = h.doc_ori AND b.co_art = h.co_art
LEFT JOIN
spreqalm i ON h.odp_num = i.odp_num
LEFT JOIN
spcierre j ON h.odp_num = j.odp_num
LEFT JOIN
clientes k ON m.co_cli = k.co_cli
LEFT JOIN
vendedor l ON m.co_ven = l.co_ven
WHERE
a.fact_num BETWEEN '0' AND '999999999'
AND m.fec_emis BETWEEN '01/01/2012' AND '30/06/2012'
AND m.co_cli BETWEEN '' AND 'þþþþþþþþþþþþþþþþþþþþþþþþþþþþþþ'
ORDER BY
a.fact_num, a.reng_num ASC
As Nick commented, you always need to run your query and look at its execution plan. There are some points you need to consider. For instance, having a lot of LEFT JOINs will reduce performance; try to see if you can use INNER JOINs where possible (see the sketch below). Check whether you have proper indexes. If you attach your execution plan to your question, you will get more helpful answers.
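For instance, the WHERE clause above filters on m.fec_emis and m.co_cli, which already discards any NULL rows the LEFT JOIN to cotiz_c would preserve, so that join can safely become an INNER JOIN:
FROM reng_cac a
INNER JOIN cotiz_c m
    ON a.fact_num = m.fact_num -- the WHERE predicates on m.* already exclude unmatched rows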

Retrieve additional rows if bit flag is true

I have a large stored procedure that is used to return results for a dialog with many selections. I have a new criteria to get "extra" rows if a particular bit column is set to true. The current setup looks like this:
SELECT
CustomerID,
FirstName,
LastName,
...
FROM HumongousQuery hq
LEFT JOIN (
-- New Query Text
) newSubQuery nsq ON hq.CustomerID = nsq.CustomerID
I have the first half of the new query:
SELECT DISTINCT
c.CustomerID,
pp.ProjectID,
ep.ProductID
FROM Customers c
JOIN Evaluations e (NOLOCK)
ON c.CustomerID = e.CustomerID
JOIN EvaluationProducts ep (NOLOCK)
ON e.EvaluationID = ep.EvaluationID
JOIN ProjectProducts pp (NOLOCK)
ON ep.ProductID = pp.ProductID
JOIN Projects p
ON pp.ProjectID = p.ProjectID
WHERE
c.EmployeeID = @EmployeeID
AND e.CurrentStepID = 5
AND p.IsComplete = 0
The Projects table has a bit column, AllowIndirectCustomers, which tells me that this project can use additional customers when the value is true. As far as I can tell, most of the different SQL constructs are geared towards adding additional columns to the result set. I tried different permutations of the UNION command with no luck. Normally I would turn to a table-valued function, but I haven't been able to make it work for this scenario.
This one has been a stumper for me. Any ideas?
So basically, you're looking to negate the need to match pp.ProjectID = p.ProjectID when the flag is set. You can do that right in the JOIN criteria:
JOIN Projects p
ON pp.ProjectID = p.ProjectID OR p.AllowIndirectCustomers = 1
Depending on the complexity of your tables, this might not work out too easily, but you could do a case statement on your bit column. Something like this:
select table1.id, table1.value,
case table1.flag
when 1 then
table2.value
else null
end as secondvalue
from table1
left join table2 on table1.id = table2.id
Here's a SQL Fiddle demo

Strange performance issue with SELECT (SUBQUERY)

I have a stored procedure that has been having some issues lately, and I finally narrowed it down to one SELECT. The problem is I cannot figure out exactly what is killing the performance of this one query. I rewrote it, but I am not sure the rewrite returns exactly the same data.
Original Query:
SELECT
@userId, p.job, p.charge_code, p.code
, (SELECT SUM(b.total) FROM dbo.[backorder w/total] b WHERE b.ponumber = p.ponumber AND b.code = p.code)
, ISNULL(jm.markup, 0)
, (SELECT SUM(b.TOTAL_TAX) FROM dbo.[backorder w/total] b WHERE b.ponumber = p.ponumber AND b.code = p.code)
, p.ponumber
, p.billable
, p.[date]
FROM dbo.PO p
INNER JOIN dbo.JobCostFilter jcf
ON p.job = jcf.jobno AND p.charge_code = jcf.chargecode AND jcf.userno = @userId
LEFT JOIN dbo.JobMarkup jm
ON jm.jobno = p.job
AND jm.code = p.code
LEFT JOIN dbo.[Working Codes] wc
ON p.code = wc.code
INNER JOIN dbo.JOBFILE j
ON j.JOB_NO = p.job
WHERE (wc.brcode <> 4 OR @BmtDb = 0)
GROUP BY p.job, p.charge_code, p.code, p.ponumber, p.billable, p.[date], jm.markup, wc.brcode
This query will practically never finish running. It actually times out for some larger jobs we have.
And if I change the 2 subqueries in the select to read like joins instead:
SELECT
@userid, p.job, p.charge_code, p.code
, (SELECT SUM(b.TOTAL))
, ISNULL(jm.markup, 0)
, (SELECT SUM(b.TOTAL_TAX))
, p.ponumber, p.billable, p.[date]
FROM dbo.PO p
INNER JOIN dbo.JobCostFilter jcf
ON p.job = jcf.jobno AND p.charge_code = jcf.chargecode AND jcf.userno = 11190030
INNER JOIN [BACKORDER W/TOTAL] b
ON P.PONUMBER = b.ponumber AND P.code = b.code
LEFT JOIN dbo.JobMarkup jm
ON jm.jobno = p.job
AND jm.code = p.code
LEFT JOIN dbo.[Working Codes] wc
ON p.code = wc.code
INNER JOIN dbo.JOBFILE j
ON j.JOB_NO = p.job
WHERE (wc.brcode <> 4 OR @BmtDb = 0)
GROUP BY p.job, p.charge_code, p.code, p.ponumber, p.billable, p.[date], jm.markup, wc.brcode
The data comes out looking very nearly identical to me (though there are thousands of lines overall so I could be wrong), and it runs very quickly.
Any ideas are appreciated.
Performance
In the second query you have fewer logical reads because the table [backorder w/total] is scanned only once. In the first query, the two separate subqueries are processed independently and the table is scanned twice, although both subqueries have the same predicates.
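You can verify this by comparing the IO statistics of the two versions (a minimal sketch):
SET STATISTICS IO ON;
-- run the first query, then the second; in the Messages output, the scan
-- count reported for [backorder w/total] should drop from 2 to 1
SET STATISTICS IO OFF;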
Correctness
If you want to check whether two queries return the same result set, you can use the EXCEPT operator. If both statements:
First SELECT Query...
EXCEPT
Second SELECT Query...
and
Second SELECT Query...
EXCEPT
First SELECT Query...
return an empty set, the result sets are identical.
In terms of correctness, you are inner joining [BACKORDER W/TOTAL] in the second query, so rows for which the first query's subqueries return NULL (no matching b rows) would be missing from the second query's results.
For performance, the optimizer is heuristic: it will sometimes use spectacularly bad query plans, and even minimal changes can lead to a completely different plan. Your best chance is to compare the two query plans and see what causes the difference.