Performance when adding a parameter to where clause - sql

I'm trying to understand this behaviour. My below statement takes nearly half an hour to complete. However when I replace the parameter #IsGazEnabled (in the case statement of the where clause at the bottom) with a value of 1, then it takes just a second.
Looking at the estimated execution plan, when using the parameter and when it takes 30 minutes, most of the cost (92%) lies with a Nested Loop (Left Anti Semi Join). there also seems to be some Parallelism going on. I'm only just learning about Execution plans and I'm left a bit confused. Very different to the plan produced when not using the parameter.
So how does having a parameter compared to not having one make such a difference to the execution plan and performance?
declare #IsGazEnabled tinyint;
set #IsGazEnabled = 1;
select 'CT Ref: ' + accountreference + ' - Not synced due to missing property ref ' + t.PropertyReference
from CTaxAccountTemp t
where not exists (
select *
from ccaddress a
left join w2addresscrossref x on x.UPRN = a.UPRN
and x.appcode in (
select w2source
from GazSourceConfig
where GazSource in (
select GazSource
from GazSourceConfig
where W2Source = 'CTAX'
)
union all select 'URB'
)
where t.PropertyReference = case #IsGazEnabled when 1 then x.PropertyReference else a.PropertyReference end
);

This can occur because SQL Server (the query optimizer) uses the value of the provided parameter when creating the initial execution plan. If the values in some of your tables are not evenly distributed, the created plan may work really well for certain values of the parameter, but really poorly for others. This is generally referred to a parameter sniffing. You can get around this using query hints (OPTIMIZE FOR X) or recompiling the stored procedure before each run (WITH RECOMPILE). You should read up on these options thoroughly before implementing as they both have side effects.
See a couple articles on Brent Ozar's site for more details, here and here.

I think you should rethink this query. Try and avoid the NOT EXISTS() for starters - as thats generally quite inefficient (I usually prefer a LEFT JOIN in these instances - and a corresponding WHERE x IS NULL - the x being something in the right hand side)
The main cause of woe for you though is likely to be the CASE based WHERE - as that is now causing the inner query to be evaluated for EVERY ROW!. I think you'd be better left joining both sets of disqualifying criteria, but include the parameter in the join conditions - and then check that there is nothing on the right hand side of either of the 2 left joined criteria
Heres how I think it could be rewritten:
declare #IsGazEnabled tinyint;
set #IsGazEnabled = 1;
select 'CT Ref: ' + accountreference + ' - Not synced due to missing property ref ' + t.PropertyReference
from CTaxAccountTemp t
left join ccaddress a2 ON t.PropertyReference = a2.PropertyReference and #IsGazEnabled = 0
left join
(
ccaddress a
join w2addresscrossref x on x.UPRN = a.UPRN
and x.appcode in ( -- could make this a join for efficiency....
select w2source
from GazSourceConfig
where GazSource in (
select GazSource
from GazSourceConfig
where W2Source = 'CTAX'
)
union all select 'URB'
)
) ON t.PropertyReference = x.PropertyReference AND and #IsGazEnabled = 1
WHERE
a2.PropertyReference IS NULL
AND x.PropertyReference IS NULL
;

Related

Runtime of stored procedure with temporary tables varies intermittently

We are facing a performance issue while executing a stored procedure. It usually takes between 10-15 minutes to run, but sometimes it takes up to more than 30 minutes to execute.
We captured visualize plan execute files for the Normal run and Long run cases.
By checking the visualized plan we came to know that, one particular Insert block of code takes extra time in the long run. And by checking
"EXPLAIN PLAN FOR SQL PLAN CACHE ENTRY <plan_id> "
the table we found that the order of execution differs in the long run.
This is the block which takes extra time to run sometimes.
INSERT INTO #TMP_DATI_SALDI_LORDI_BASE (
"COD_SCENARIO","COD_PERIODO","COD_CONTO","COD_DEST1","COD_DEST2","COD_DEST3","COD_DEST4","COD_DEST5"
,"IMPORTO","COD_VALUTA","IMPORTO_VALUTA_ORIGINARIA","COD_VALUTA_ORIGINARIA","NOTE"
)
( SELECT
SCEN_P.SCENARIO
,SCEN_P.PERIOD
,ACCOUT_ADJ.ATTRIBUTO1 AS "COD_CONTO"
,DATAS_rev.COD_DEST1
,DATAS_rev.COD_DEST2
,DATAS_rev.COD_DEST3
,__typed_NString__($1, 50)
,'RPT_NON'
,SUM(
CASE WHEN INFO.INCOT = 'FOB' THEN
CASE ACCOUT_rev.ATTRIBUTO1 WHEN 'CalcInsurance' THEN
0
ELSE
DATAS_rev.IMPORTO
END
ELSE
DATAS_rev.IMPORTO
END
* (DATAS_ADJ.IMPORTO - DATAS.IMPORTO)
)
,DATAS_rev.COD_VALUTA
,SUM(
CASE WHEN INFO.INCOT = 'FOB' THEN
CASE ACCOUT_rev.ATTRIBUTO1 WHEN 'CalcInsurance' THEN
0
ELSE
DATAS_rev.IMPORTO_VALUTA_ORIGINARIA
END
ELSE
DATAS_rev.IMPORTO_VALUTA_ORIGINARIA
END
* (DATAS_ADJ.IMPORTO_VALUTA_ORIGINARIA - DATAS.IMPORTO_VALUTA_ORIGINARIA)
)
,DATAS_rev.COD_VALUTA_ORIGINARIA
,'CPM_SP_CACL_FY_E3 Parts Option ADJ'
FROM #TMP_TAGERT_SCEN_P SCEN_P
INNER JOIN #TMP_DATI_SALDI_LORDI_BASE DATAS_rev
ON DATAS_rev.COD_SCENARIO = SCEN_P.SCENARIO
AND DATAS_rev.COD_PERIODO = SCEN_P.PERIOD
AND LEFT(DATAS_rev.COD_DEST3, 1) = 'O'
INNER JOIN CONTO ACCOUT_rev
ON ACCOUT_rev.COD_CONTO = DATAS_rev.COD_CONTO
AND ACCOUT_rev.ATTRIBUTO1 IN ('CalcFOB','CalcInsurance') --FOB,Insurance(Ocean freight is Nothing by Option)
INNER JOIN #DSL DATAS
ON DATAS.COD_SCENARIO = 'LAUNCH'
AND DATAS.COD_PERIODO = 12
AND DATAS.COD_DEST1 = 'NC'
AND DATAS.COD_DEST2 = 'NC'
AND DATAS.COD_DEST3 = 'F001'
AND DATAS.COD_DEST4 = DATAS_rev.COD_DEST4
AND DATAS.COD_DEST5 = 'INP'
INNER JOIN CONTO ACCOUT
ON ACCOUT.COD_CONTO = DATAS.COD_CONTO
AND ACCOUT.ATTRIBUTO2 = 'E3'
INNER JOIN CONTO ACCOUT_ADJ
ON ACCOUT_ADJ.ATTRIBUTO3 = DATAS.COD_CONTO
AND ACCOUT_ADJ.ATTRIBUTO2 = 'HE3'
INNER JOIN #DSL DATAS_ADJ
ON LEFT(DATAS_ADJ.COD_SCENARIO,4) = LEFT(SCEN_P.SCENARIO,4)
AND DATAS_ADJ.COD_PERIODO = 12
AND DATAS_ADJ.COD_DEST1 = DATAS.COD_DEST1
AND DATAS_ADJ.COD_DEST2 = DATAS.COD_DEST2
AND DATAS_ADJ.COD_DEST3 = DATAS.COD_DEST3
AND DATAS_ADJ.COD_DEST4 = DATAS.COD_DEST4
AND DATAS_ADJ.COD_DEST5 = DATAS.COD_DEST5
AND DATAS_ADJ.COD_CONTO = ACCOUT_ADJ.COD_CONTO
LEFT OUTER JOIN #TMP_KDPWT_INCOTERMS INFO
ON INFO.P_CODE = DATAS.COD_DEST4
GROUP BY
SCEN_P.SCENARIO,SCEN_P.PERIOD,ACCOUT_ADJ.ATTRIBUTO1,DATAS_rev.COD_DEST1,DATAS_rev.COD_DEST2
,DATAS_rev.COD_DEST3, DATAS.COD_DEST4,DATAS_rev.COD_VALUTA,DATAS_rev.COD_VALUTA_ORIGINARIA,INFO.INCOT
)
I will share the order of execution details also for normal and long run case.
Could someone please help us to overcome this issue? And also we don't know how to fix the order of the join execution. Is there any way to fix the join order execution, Please guide us.
Thanks in advance
Vinothkumar
Without a lot more detailed information, there is no way to tell exactly why your INSERT statement shows this alternating runtime behaviour.
Based on my experience, such an analysis can take quite some time and there are only few people available that are capable to perform it. If you can get someone like that to look at this, make sure to understand and learn.
What I can tell from the information shared is this
using temporary tables to structure a multi-stage data flow is the wrong thing to do on SAP HANA. Instead, use table variables in SQLScript.
if you insist on using the temporary tables, make them at least column tables; this will allow to avoid a need for some internal data materialisation.
when using joins make sure that the joined columns are of the same data type. The explain plan is full of TO_INT(), TO_DECIMAL(), and other conversion functions. Those take time, memory, and make it hard for the optimiser(s) to estimate cardinalities.
as the statement uses a lot of temporary tables, the different join orders can easily result from different volumes of data that was present when the SQL was parsed, prepared and optimised. One option to avoid this is to have HANA ignore any cached plans for the statement. The documentation has the HINTS for that.
And that is about what I can say about this with the available information.

Select SQL View Slow with table alias

I am baffled as to why selecting my SQL View is so slow when using a table alias (25 seconds) but runs so much faster when the alias is removed (2 seconds)
-this query takes 25 seconds.
SELECT [Extent1].[Id] AS [Id],
[Extent1].[ProjectId] AS [ProjectId],
[Extent1].[ProjectWorkOrderId] AS [ProjectWorkOrderId],
[Extent1].[Project] AS [Project],
[Extent1].[SubcontractorId] AS [SubcontractorId],
[Extent1].[Subcontractor] AS [Subcontractor],
[Extent1].[ValuationNumber] AS [ValuationNumber],
[Extent1].[WorksOrderName] AS [WorksOrderName],
[Extent1].[NewGross],
[Extent1].[CumulativeGross],
[Extent1].[CreateByName] AS [CreateByName],
[Extent1].[CreateDate] AS [CreateDate],
[Extent1].[FinalDateForPayment] AS [FinalDateForPayment],
[Extent1].[CreateByEmail] AS [CreateByEmail],
[Extent1].[Deleted] AS [Deleted],
[Extent1].[ValuationStatusCategoryId] AS [ValuationStatusCategoryId]
FROM [dbo].[ValuationsTotal] AS [Extent1]
-this query takes 2 seconds.
SELECT [Id],
[ProjectId],
[Project],
[SubcontractorId],
[Subcontractor],
[NewGross],
[ProjectWorkOrderId],
[ValuationNumber],
[WorksOrderName],
[CreateByName],
[CreateDate],
[CreateByEmail],
[Deleted],
[ValuationStatusCategoryId],
[FinalDateForPayment],
[CumulativeGross]
FROM [dbo].[ValuationsTotal]
this is my SQL View code -
WITH ValuationTotalsTemp(Id, ProjectId, Project, SubcontractorId, Subcontractor, WorksOrderName, NewGross, ProjectWorkOrderId, ValuationNumber, CreateByName, CreateDate, CreateByEmail, Deleted, ValuationStatusCategoryId, FinalDateForPayment)
AS (SELECT vi.ValuationId AS Id,
v.ProjectId,
p.NAME,
b.Id AS Expr1,
b.NAME AS Expr2,
wo.OrderNumber,
SUM(vi.ValuationQuantity * pbc.BudgetRate) AS 'NewGross',
sa.ProjectWorkOrderId,
v.ValuationNumber,
up.FirstName + ' ' + up.LastName AS Expr3,
v.CreateDate,
up.Email,
v.Deleted,
v.ValuationStatusCategoryId,
sa.FinalDateForPayment
FROM dbo.ValuationItems AS vi
INNER JOIN dbo.ProjectBudgetCosts AS pbc
ON vi.ProjectBudgetCostId = pbc.Id
INNER JOIN dbo.Valuations AS v
ON vi.ValuationId = v.Id
INNER JOIN dbo.ProjectSubcontractorApplications AS sa
ON sa.Id = v.ProjectSubcontractorApplicationId
INNER JOIN dbo.Projects AS p
ON p.Id = v.ProjectId
INNER JOIN dbo.ProjectWorkOrders AS wo
ON wo.Id = sa.ProjectWorkOrderId
INNER JOIN dbo.ProjectSubcontractors AS sub
ON sub.Id = wo.ProjectSubcontractorId
INNER JOIN dbo.Businesses AS b
ON b.Id = sub.BusinessId
INNER JOIN dbo.UserProfile AS up
ON up.Id = v.CreateBy
WHERE ( vi.Deleted = 0 )
AND ( v.Deleted = 0 )
GROUP BY vi.ValuationId,
v.ProjectId,
p.NAME,
b.Id,
b.NAME,
wo.OrderNumber,
sa.ProjectWorkOrderId,
v.ValuationNumber,
up.FirstName + ' ' + up.LastName,
v.CreateDate,
up.Email,
v.Deleted,
v.ValuationStatusCategoryId,
sa.FinalDateForPayment)
SELECT Id,
ProjectId,
Project,
SubcontractorId,
Subcontractor,
NewGross,
ProjectWorkOrderId,
ValuationNumber,
WorksOrderName,
CreateByName,
CreateDate,
CreateByEmail,
Deleted,
ValuationStatusCategoryId,
FinalDateForPayment,
(SELECT SUM(NewGross) AS Expr1
FROM ValuationTotalsTemp AS tt
WHERE ( ProjectWorkOrderId = t.ProjectWorkOrderId )
AND ( t.ValuationNumber >= ValuationNumber )
GROUP BY ProjectWorkOrderId) AS CumulativeGross
FROM ValuationTotalsTemp AS t
Any ideas why this is?
The SQL query runs with table alias as this is generated from Entity Framework so I have no way of changing this. I will need to modify my SQL view to be able to handle the table alias without affecting performance.
The execution plans are very different.
The slow one has a part that leaps out as being problematic. It estimates a single row will be input to a nested loops join and result in a single scan of ValuationItems. In practice it ends up performing more than 1,000 such scans.
Estimated
Actual
SQL Server 2014 introduced a new cardinality estimator. Your fast plan is using it. This is shown in the XML as CardinalityEstimationModelVersion="120" Your slow plan isn't (CardinalityEstimationModelVersion="70").
So it looks as though in this case the assumptions used by the new estimator give you a better plan.
The reason for the difference is probably as the fast one is running cross database (references [ProbeProduction].[dbo].[ValuationsTotal]) and presumably the database you are executing it from has compatility level of 2014 so automatically gets the new CardinalityEstimator.
The slow one is executing in the context of ProbeProduction itself and I assume the compatibility level of that database must be < 2014 - so you are defaulting to the legacy cardinality estimator.
You can use OPTION (QUERYTRACEON 2312) to get the slow query to use the new cardinality estimator (changing the database compatibility mode to globally alter the behaviour shouldn't be done without careful testing of existing queries as it can cause regressions as well as improvements).
Alternatively you could just try and tune the query working within the limits of the legacy CE. Perhaps adding join hints to encourage it to use something more akin to the faster plan.
The two queries are different (column order!). It is reasonable to assume the first query uses an index and is therefore much faster. I doubt it has anything to do with the aliassing.
For grins would take out the where and give this a try?
I might be doing a bunch of loop joins and filtering at the end
This might get it to filter up front
FROM dbo.ValuationItems AS vi
INNER JOIN dbo.Valuations AS v
ON vi.ValuationId = v.Id
AND vi.Deleted = 0
AND v.Deleted = 0
-- other joins
-- NO where
If you have a lot of loop joins going on then try inner hash join (on all)

Sqlite subquery with multiple sums

I have a sqlite database and I need to perform some arithmetic in my sql to get the final amount. The thing is I am not certain how to go and achieve this, I'm pretty certain from what I've been reading its a subquery which I need to group by lineID (unique id in my db)
The fields I want to include in the calculation below are
el.material_cost1 * el.material_qty1
el.material_cost2 * el.material_qty2
el.material_cost3 * el.material_qty3
The query I currently have is below. It returns the value of my costs apart from the fields missing above. As I store the material costs individually, I can't work out how to do a subquery within my query to get the desired result.
SELECT sum(el.enquiry_cost1) + sum(el.enquiry_cost2) + sum(el.enquiry_cost3)
FROM estimate e
LEFT JOIN estimate_line el
ON e.estimateID=el.estimateID
WHERE e.projectID=7 AND el.optional='false'
Use of a LEFT JOIN instead of an [INNER] JOIN is pointless, as your WHERE condition filters out any rows that could have differed.
I think you're making this harder than it needs to be. In particular, nothing in your description makes me think you need a subquery. Instead, it looks like this query would be close to what you're after:
SELECT
sum(el.enquiry_cost1)
+ sum(el.enquiry_cost2)
+ sum(el.enquiry_cost3)
+ sum(el.material_cost1 * el.material_qty1)
+ sum(el.material_cost2 * el.material_qty2)
+ sum(el.material_cost3 * el.material_qty3)
AS total_costs,
FROM
estimate e
JOIN estimate_line el
ON e.estimateID = el.estimateID
WHERE e.projectID = 7 AND el.optional = 'false'

Query taking too long - Optimization

I am having an issue with the following query returning results a bit too slow and I suspect I am missing something basic. My initial guess is the 'CASE' statement is taking too long to process its result on the underlying data. But it could be something in the derived tables as well.
The question is, how can I speed this up? Are there any glaring errors in the way I am pulling the data? Am I running into a sorting or looping issues somewhere? The query runs for about 40 seconds, which seems quite long. C# is my primary expertise, SQL is a work in progress.
Note I am not asking "write my code" or "fix my code". Just for a pointer in the right direction, I can't seem to figure out where the slow down occurs. Each derived table runs very quickly (less than a second) by themselves, the joins seem correct and the result set is returning exactly what I need. It's just too slow and I'm sure there are better SQL scripter's out there ;) Any tips would be greatly appreciated!
SELECT
hdr.taker
, hdr.order_no
, hdr.po_no as display_po
, cust.customer_name
, hdr.customer_id
, 'INCORRECT-LARGE ORDER' + CASE
WHEN (ext_price_calc >= 600.01 and ext_price_calc <= 800) and fee_price.unit_price <> round(ext_price_calc * -.01,2)
THEN '-1%: $' + cast(cast(ext_price_calc * -.01 as decimal(18,2)) as varchar(255))
WHEN ext_price_calc >= 800.01 and ext_price_calc <= 1000 and fee_price.unit_price <> round(ext_price_calc * -.02,2)
THEN '-2%: $' + cast(cast(ext_price_calc * -.02 as decimal(18,2)) as varchar(255))
WHEN ext_price_calc > 1000 and fee_price.unit_price <> round(ext_price_calc * -.03,2)
THEN '-3%: $' + cast(cast(ext_price_calc * -.03 as decimal(18,2)) as varchar(255))
ELSE
'OK'
END AS Status
FROM
(myDb_view_oe_hdr hdr
LEFT OUTER JOIN myDb_view_customer cust
ON hdr.customer_id = cust.customer_id)
LEFT OUTER JOIN wpd_view_sales_territory_by_customer territory
ON cust.customer_id = territory.customer_id
LEFT OUTER JOIN
(select
order_no,
SUM(ext_price_calc) as ext_price_calc
from
(select
hdr.order_no,
line.item_id,
(line.qty_ordered - isnull(qty_canceled,0)) * unit_price as ext_price_calc
from myDb_view_oe_hdr hdr
left outer join myDb_view_oe_line line
on hdr.order_no = line.order_no
where
line.delete_flag = 'N'
AND line.cancel_flag = 'N'
AND hdr.projected_order = 'N'
AND hdr.delete_flag = 'N'
AND hdr.cancel_flag = 'N'
AND line.item_id not in ('LARGE-ORDER-1%','LARGE-ORDER-2%', 'LARGE-ORDER-3%', 'FUEL','NET-FUEL', 'CONVENIENCE-FEE')) as line
group by order_no) as order_total
on hdr.order_no = order_total.order_no
LEFT OUTER JOIN
(select
order_no,
count(order_no) as convenience_count
from oe_line with (nolock)
left outer join inv_mast inv with (nolock)
on oe_line.inv_mast_uid = inv.inv_mast_uid
where inv.item_id in ('LARGE-ORDER-1%','LARGE-ORDER-2%', 'LARGE-ORDER-3%')
and oe_line.delete_flag <> 'Y'
group by order_no) as fee_count
on hdr.order_no = fee_count.order_no
INNER JOIN
(select
order_no,
unit_price
from oe_line line with (nolock)
where line.inv_mast_uid in (select inv_mast_uid from inv_mast with (nolock) where item_id in ('LARGE-ORDER-1%','LARGE-ORDER-2%', 'LARGE-ORDER-3%'))) as fee_price
ON fee_count.order_no = fee_price.order_no
WHERE
hdr.projected_order = 'N'
AND hdr.cancel_flag = 'N'
AND hdr.delete_flag = 'N'
AND hdr.completed = 'N'
AND territory.territory_id = ‘CUSTOMERTERRITORY’
AND ext_price_calc > 600.00
AND hdr.carrier_id <> '100004'
AND fee_count.convenience_count is not null
AND CASE
WHEN (ext_price_calc >= 600.01 and ext_price_calc <= 800) and fee_price.unit_price <> round(ext_price_calc * -.01,2)
THEN '-1%: $' + cast(cast(ext_price_calc * -.01 as decimal(18,2)) as varchar(255))
WHEN ext_price_calc >= 800.01 and ext_price_calc <= 1000 and fee_price.unit_price <> round(ext_price_calc * -.02,2)
THEN '-2%: $' + cast(cast(ext_price_calc * -.02 as decimal(18,2)) as varchar(255))
WHEN ext_price_calc > 1000 and fee_price.unit_price <> round(ext_price_calc * -.03,2)
THEN '-3%: $' + cast(cast(ext_price_calc * -.03 as decimal(18,2)) as varchar(255))
ELSE
'OK' END <> 'OK'
Just as a clue to the right direction for optimization:
When you do an OUTER JOIN to a query with calculated columns, you are guaranteeing not only a full table scan, but that those calculations must be performed against every row in the joined table. It appears that you can actually do your join to oe_line without the column calculations (i.e. by filtering ext_price_calc to a specific range).
You don't need to do most of the subqueries that are in your query--the master query can be recrafted to use regular table join syntax. Joins to subqueries containing subqueries presents a challenge to the SQL optimizer that it may not be able to meet. But by using regular joins, the optimizer has a much better chance at identifying more efficient query strategies.
You don't tag which SQL engine you're using. Every database has proprietary extensions that may allow for speedier or more efficient queries. It would be easier to provide useful feedback if you indicated whether you were using MySQL, SQL Server, Oracle, etc.
Regardless of the database you're using, reviewing the query plan is always a good place to start. This will tell you where most of the I/O and time in your query is being spent.
Just on general principle, make sure your statistics are up-to-date.
It's may not be solvable by any of us without the real stuff to test with.
IF that's the case and nobody else posts the answer, I can still help. Here is how to trouble shoot it.
(1) take joins and pieces out one by one.
(2) this will cause errors. Remove or fake the references to get rid of them.
(3) see how that works.
(4) Put items back before you try taking something else out
(5) keep track...
(6) also be aware where a removal of something might drastically reduce the result set.
You might find you're missing an index or some other smoking gun.
I was having the same problem and I was able to solve it by indexing one of the tables and setting a primary key.
I strongly suspect that the problem lies in the number of joins you're doing. A lot of databases do joins basically by systemically checking all possible combinations of the various tables as being valid - so if you're joinging table A and B on column C, and A looks like:
Name:C
Fred:1
Alice:2
Betty:3
While B looks like:
C:Pet
1:Alligator
2:Lion
3:T-Rex
When you do the join, it checks all 9 possibilities:
Fred:1:1:Alligator
Fred:1:2:Lion
Fred:1:3:T-Rex
Alice:2:1:Alligator
Alice:2:2:Lion
Alice:2:3:T-Rex
Betty:3:1:Alligator
Betty:3:2:Lion
Betty:3:3:T-Rex
And goes through and deletes the non-matching ones:
Fred:1:1:Alligator
Alice:2:2:Lion
Betty:3:3:T-Rex
... which means with three entries in each table, it creates nine temporary records, sorts through them all, and deletes six of them ... all before it actually sorts through the results for what you're after (so if you are looking for Betty's Pet, you only want one row on that final result).
... and you're doing how many joins and sub-queries?

Slow SQL query when joining tables

This query is very very slow and i'm not sure where I'm going wrong to cause it to be so slow.
I'm guessing it's something to do with the flight_prices table
because if I remove that join it goes from 16 seconds to less than one.
SELECT * FROM OPENQUERY(mybook,
'SELECT wb.booking_ref
FROM web_bookings wb
LEFT JOIN prod_info pi ON wb.location = pi.location
LEFT JOIN flight_prices fp ON fp.dest_date = pi.dest_airport + '' '' + wb.sort_date
WHERE fp.dest_cheapest = ''Y''
AND wb.inc_flights = ''Y''
AND wb.customer = ''12345'' ')
Any ideas how I can speed up this join??
You're unlikely to get any indexing on flight_prices.dest_date to be used as you're not actually joining to another column which makes it hard for the optimiser.
If you can change the schema I'd make it so flight_prices.dest_date was split into two columns dest_airport and dest_Date as it appears to be currently a composite of airport and date. If you did that you could then join like this
fp.dest_date = wb.sort_date and fp.dest_airport = pi.dest_airport
Try EXPLAIN PLAN and see what your database comes back with.
If you see TABLE SCAN, you might need to add indexes.
That second JOIN looks rather odd to me. I'd wonder if that could be rewritten.
Your statement reformatted gives me this
SELECT wb.booking_ref
FROM web_bookings wb
LEFT JOIN prod_info pi ON wb.location = pi.location
LEFT JOIN flight_prices fp ON fp.dest_date = pi.dest_airport + ' ' + wb.sort_date
WHERE fp.dest_cheapest = 'Y'
AND wb.inc_flights = 'Y'
AND wb.customer = '12345'
I would make sure that following fields have indexes
dest_cheapest
dest_date
location
customer, inc_flights, booking_ref (covering index)