Slow SQL query when joining tables

This query is very slow, and I'm not sure what I'm doing wrong to cause it.
I'm guessing it's something to do with the flight_prices table
because if I remove that join it goes from 16 seconds to less than one.
SELECT * FROM OPENQUERY(mybook,
'SELECT wb.booking_ref
FROM web_bookings wb
LEFT JOIN prod_info pi ON wb.location = pi.location
LEFT JOIN flight_prices fp ON fp.dest_date = pi.dest_airport + '' '' + wb.sort_date
WHERE fp.dest_cheapest = ''Y''
AND wb.inc_flights = ''Y''
AND wb.customer = ''12345'' ')
Any ideas how I can speed up this join??

An index on flight_prices.dest_date is unlikely to be used, because you're not joining to another column but to an expression built by concatenating two columns, which makes it hard for the optimiser.
If you can change the schema, I'd split flight_prices.dest_date into two columns, dest_airport and dest_date, since it currently appears to be a composite of airport and date. You could then join like this:
fp.dest_date = wb.sort_date and fp.dest_airport = pi.dest_airport
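A minimal sketch of the full query after such a split, assuming the schema can be changed. The index name is hypothetical, and the new dest_date column is assumed to hold values in the same format as web_bookings.sort_date.

```sql
-- Hypothetical: flight_prices now stores dest_airport and dest_date separately.
-- The index name and column order are assumptions, not part of the original schema.
CREATE INDEX ix_flight_prices_airport_date
    ON flight_prices (dest_airport, dest_date, dest_cheapest);

SELECT wb.booking_ref
FROM web_bookings wb
LEFT JOIN prod_info pi
       ON wb.location = pi.location
LEFT JOIN flight_prices fp
       ON fp.dest_airport = pi.dest_airport   -- plain column comparison
      AND fp.dest_date    = wb.sort_date      -- plain column comparison
WHERE fp.dest_cheapest = 'Y'
  AND wb.inc_flights   = 'Y'
  AND wb.customer      = '12345';
```

Both join conditions now compare one column with another, so an index on the flight_prices columns has a chance of being used.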

Try EXPLAIN PLAN and see what your database comes back with.
If you see TABLE SCAN, you might need to add indexes.
That second JOIN looks rather odd to me. I'd wonder if that could be rewritten.

Your statement reformatted gives me this
SELECT wb.booking_ref
FROM web_bookings wb
LEFT JOIN prod_info pi ON wb.location = pi.location
LEFT JOIN flight_prices fp ON fp.dest_date = pi.dest_airport + ' ' + wb.sort_date
WHERE fp.dest_cheapest = 'Y'
AND wb.inc_flights = 'Y'
AND wb.customer = '12345'
I would make sure that the following fields have indexes:
dest_cheapest
dest_date
location
customer, inc_flights, booking_ref (covering index)
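As a rough sketch, the list above could translate into statements like these. All index names are hypothetical, and putting the location index on prod_info is an assumption, since the answer does not say which table it means.

```sql
-- Hypothetical index names; adjust to the naming convention of the remote database.
CREATE INDEX ix_fp_dest_cheapest ON flight_prices (dest_cheapest);
CREATE INDEX ix_fp_dest_date     ON flight_prices (dest_date);
CREATE INDEX ix_pi_location      ON prod_info (location);

-- Composite index for the filter on web_bookings, with booking_ref carried along
-- so the selected column can be read from the index.
CREATE INDEX ix_wb_customer_flights
    ON web_bookings (customer, inc_flights, booking_ref);
```

The query also reads wb.location and wb.sort_date for its joins, so those columns may need to be added to the last index before it actually covers the query.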

Related

Oracle complex query with multiple joins on same table

I am dealing with a monster query (~800 lines) on Oracle 11, and it's consuming a lot of resources.
The main problem here is a table mouvement with about 18 million rows, which the query left joins about 30 times.
LEFT JOIN mouvement mracct_ad1
ON mracct_ad1.code_portefeuille = t.code_portefeuille
AND mracct_ad1.statut_ligne = 'PROPRE'
AND substr(mracct_ad1.code_valeur,1,4) = 'MRAC'
AND mracct_ad1.code_transaction = t.code_transaction
LEFT JOIN mouvement mracct_zias
ON mracct_zias.code_portefeuille = t.code_portefeuille
AND mracct_zias.statut_ligne = 'PROPRE'
AND substr(mracct_zias.code_valeur,1,4) = 'PRAC'
AND mracct_zias.code_transaction = t.code_transaction
LEFT JOIN mouvement mracct_zixs
ON mracct_zias.code_portefeuille = t.code_portefeuille
AND mracct_zias.statut_ligne = 'XROPRE'
AND substr(mracct_zias.code_valeur,1,4) = 'MRAT'
AND mracct_zias.code_transaction = t.code_transaction
Is there some way to get rid of the left joins (a union, for example) to make the query faster and less resource-hungry? Should I look at the execution plan or something else?
Just a note on performance. Usually you want to "rephrase" conditions like:
AND substr(mracct_ad1.code_valeur,1,4) = 'MRAC'
In simple words, expressions that wrap the column on the left side of the equality will prevent the best usage of indexes and may push the SQL optimizer toward a less than optimal plan. The database engine will end up doing more work than is really needed, and the query will be [much] slower. In extreme cases it can even decide to use a Full Table Scan. In this case you can rephrase it as:
AND mracct_ad1.code_valeur like 'MRAC%'
or:
AND mracct_ad1.code_valeur >= 'MRAC' AND mracct_ad1.code_valeur < 'MRAD'
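A sketch of one rephrased join together with an index it could use. The index name and column order are assumptions, and transactions_src is a hypothetical stand-in for whatever table the alias t refers to in the real query.

```sql
-- Hypothetical composite index: equality columns first, the prefix-searched column last.
CREATE INDEX ix_mouvement_join
    ON mouvement (code_portefeuille, code_transaction, statut_ligne, code_valeur);

-- transactions_src is a placeholder name for the table behind alias t.
SELECT t.code_portefeuille,
       t.code_transaction,
       mracct_ad1.code_valeur
FROM transactions_src t
LEFT JOIN mouvement mracct_ad1
       ON mracct_ad1.code_portefeuille = t.code_portefeuille
      AND mracct_ad1.code_transaction  = t.code_transaction
      AND mracct_ad1.statut_ligne      = 'PROPRE'
      AND mracct_ad1.code_valeur LIKE 'MRAC%';  -- bare column, no substr()
```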
I am guessing so. Your code sample doesn't make much sense, but you can probably do conditional aggregation:
left join
(select m.code_portefeuille, m.code_transaction,
max(case when m.statut_ligne = 'PROPRE' and m.code_valeur like 'MRAC%' then ? end) as ad1,
max(case when m.statut_ligne = 'PROPRE' and m.code_valeur like 'PRAC%' then ? end) as zia,
. . . -- for all the rest of the joins as well
from mouvement m
group by m.code_portefeuille, m.code_transaction
) m
on m.code_portefeuille = t.code_portefeuille and m.code_transaction = t.code_transaction
You can probably replace all 30 joins with a single join to the aggregated table.

Best way to optimize this SQL Query

I have to optimize this query and I am really in a hurry here. The following query searches by client. The input value is written as RIF.keyvaluechar LIKE 'V%10553790 ' because some old records in the database store the ID padded with zeros: V0012345678 where it should have been V12345678, since that is the maximum number of characters the ID can have. I know 12345678 should have been stored as a number and the V as a char and then compared, but that's another issue.
Anyway, the query is this one:
SELECT DISTINCT idata.itemnum AS [ID],
LTRIM(RTRIM(ISNULL(CONTRATO.keyvaluechar,'N/A'))) AS [Contrato],
idata.datestored AS [Fecha],
NUMERO.keyvaluesmall AS [Numero],
TIPO.keyvaluechar AS [Tipo],
LTRIM(RTRIM(ISNULL(LC.lifecyclename,'N/A'))) AS [Flujo],
LTRIM(RTRIM(ISNULL(LC.lcnum,-1))) AS [FlujoID],
LTRIM(RTRIM(ISNULL(LCS.statename,'N/A'))) AS [Cola],
LTRIM(RTRIM(ISNULL(LCS.statenum,-1))) AS [ColaID],
CASE
WHEN PC.NombreProceso IN('PTD','PV2','PV3') THEN 1
ELSE 0
END AS [Portada]
FROM OnBase.hsi.itemdata idata WITH (NOLOCK)
INNER JOIN OnBase.hsi.keyitem109 TIPO WITH (NOLOCK) ON TIPO.itemnum = idata.itemnum
INNER JOIN OnBase.hsi.keyitem113 NUMERO WITH (NOLOCK) ON NUMERO.itemnum = idata.itemnum
LEFT JOIN OnBase.hsi.keyitem132 CONTRATO WITH (NOLOCK) ON CONTRATO.itemnum = idata.itemnum
LEFT JOIN OnBase.hsi.keyitem114 CLIENTE WITH (NOLOCK) ON CLIENTE.itemnum = idata.itemnum
LEFT JOIN OnBase.hsi.keyitem111 RIF WITH (NOLOCK) ON RIF.itemnum = idata.itemnum
INNER JOIN OnBase.hsi.doctype DOC WITH (NOLOCK) ON DOC.itemtypenum = idata.itemtypenum
INNER JOIN BD_WorkFlow.dbo.BBVA_ProcesosConfig PC WITH (NOLOCK) ON PC.ID_Documento = idata.itemtypenum
LEFT JOIN Onbase.hsi.itemlc ILC WITH (NOLOCK) ON ILC.itemnum = idata.itemnum
LEFT JOIN Onbase.hsi.lcstate LCS WITH (NOLOCK) ON LCS.statenum = ILC.statenum
LEFT JOIN Onbase.hsi.lifecycle LC WITH (NOLOCK) ON LC.lcnum = ILC.lcnum
WHERE PC.NombreProceso <> 'XXX' AND
PC.NombreProceso NOT IN('PTD','PV2','PV3') AND
TIPO.keyvaluechar = 'CCD' AND
RIF.keyvaluechar LIKE 'V%10553790 '
As you can see, it is written this way so that it finds V0012345678 or V12345678, but I don't feel this is the right way or the best optimization, although I am no expert in databases.
Anyway, I've thought about something like this instead of the last line:
AND LEFT(RIF.Keyvaluechar, 1) = 'V'
AND SUBSTRING(RIF.Keyvaluechar, 2, LEN(RIF.Keyvaluechar)) = '12345678'
What do you guys think? Is there any other better way to improve upon this?
First, your query has a logic problem. You have this:
LEFT JOIN OnBase.hsi.keyitem111 RIF WITH(NOLOCK) ON RIF.itemnum = idata.itemnum
and then this in your where clause:
AND RIF.keyvaluechar LIKE 'V%10553790 '
Putting that filter in your where clause effectively changes your left join to an inner join. To fix this, move the filter to the join.
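A trimmed-down sketch of what moving the filter to the join looks like, using only two of the tables from the question. The selected columns are chosen for illustration.

```sql
-- The filter sits in the ON clause, so rows from itemdata without a matching
-- RIF key are kept, with NULL in the RIF columns.
SELECT idata.itemnum    AS [ID],
       RIF.keyvaluechar AS [RIF]
FROM OnBase.hsi.itemdata idata
LEFT JOIN OnBase.hsi.keyitem111 RIF
       ON RIF.itemnum = idata.itemnum
      AND RIF.keyvaluechar LIKE 'V%10553790 ';
```

If only documents that actually belong to that client are wanted, writing the join as an INNER JOIN states that intent directly.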
In terms of optimizing it, I assume that means to make it run faster. What you were thinking about will probably slow things down because you are filtering on function results instead of fields. A better approach, no matter how much of a hurry you are in, is to look at the indexes in your database and try to filter on those. In fact, it might be appropriate to add new ones.
Is the Keyvaluechar always a number from the second character onwards, and do you want to treat it as a number (that is, ignore leading zeros)? If so, you could try adding a persisted computed column defined as convert(int, SUBSTRING (Keyvaluechar, 2, 10)) to the table, indexing it, and using it as the search criterion. I would assume that should help a lot.
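A minimal sketch of that idea for SQL Server, assuming every value really is numeric after the first character and fits in an int (the question says the number has at most 8 digits). The column name rif_number and the index name are hypothetical, and altering a vendor table may not be permitted in every installation.

```sql
USE OnBase;
GO  -- GO is a batch separator for SSMS/sqlcmd, not a T-SQL statement

-- Hypothetical column name. The conversion fails if any row holds non-numeric text.
ALTER TABLE hsi.keyitem111
    ADD rif_number AS CONVERT(int, SUBSTRING(keyvaluechar, 2, 10)) PERSISTED;
GO

-- Hypothetical index name; itemnum is included because it is the join column.
CREATE INDEX ix_keyitem111_rif_number
    ON hsi.keyitem111 (rif_number) INCLUDE (itemnum);
GO

-- The search then compares plain integers, so V0010553790 and V10553790 both match.
SELECT itemnum
FROM hsi.keyitem111
WHERE rif_number = 10553790;
```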
In addition to that, looking at statistics IO output might be a good idea too, to see what table is actually responsible for the biggest I/O amounts.
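For reference, this is roughly how the I/O statistics can be switched on around a query in SQL Server. The query shown is a cut-down version of the one in the question, used only as a placeholder.

```sql
SET STATISTICS IO ON;

-- Placeholder: run the real slow query here instead.
SELECT idata.itemnum
FROM OnBase.hsi.itemdata idata
LEFT JOIN OnBase.hsi.keyitem111 RIF
       ON RIF.itemnum = idata.itemnum
WHERE RIF.keyvaluechar LIKE 'V%10553790 ';

SET STATISTICS IO OFF;
```

The messages output then lists logical and physical reads per table, which shows which table accounts for most of the I/O.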
Just a note, I hope you also know the problems using NOLOCK can cause you.

Query taking too long - Optimization

I am having an issue with the following query returning results a bit too slow and I suspect I am missing something basic. My initial guess is the 'CASE' statement is taking too long to process its result on the underlying data. But it could be something in the derived tables as well.
The question is, how can I speed this up? Are there any glaring errors in the way I am pulling the data? Am I running into a sorting or looping issue somewhere? The query runs for about 40 seconds, which seems quite long. C# is my primary expertise, SQL is a work in progress.
Note I am not asking "write my code" or "fix my code". Just for a pointer in the right direction, as I can't seem to figure out where the slowdown occurs. Each derived table runs very quickly (less than a second) by itself, the joins seem correct and the result set is returning exactly what I need. It's just too slow and I'm sure there are better SQL scripters out there ;) Any tips would be greatly appreciated!
SELECT
hdr.taker
, hdr.order_no
, hdr.po_no as display_po
, cust.customer_name
, hdr.customer_id
, 'INCORRECT-LARGE ORDER' + CASE
WHEN (ext_price_calc >= 600.01 and ext_price_calc <= 800) and fee_price.unit_price <> round(ext_price_calc * -.01,2)
THEN '-1%: $' + cast(cast(ext_price_calc * -.01 as decimal(18,2)) as varchar(255))
WHEN ext_price_calc >= 800.01 and ext_price_calc <= 1000 and fee_price.unit_price <> round(ext_price_calc * -.02,2)
THEN '-2%: $' + cast(cast(ext_price_calc * -.02 as decimal(18,2)) as varchar(255))
WHEN ext_price_calc > 1000 and fee_price.unit_price <> round(ext_price_calc * -.03,2)
THEN '-3%: $' + cast(cast(ext_price_calc * -.03 as decimal(18,2)) as varchar(255))
ELSE
'OK'
END AS Status
FROM
(myDb_view_oe_hdr hdr
LEFT OUTER JOIN myDb_view_customer cust
ON hdr.customer_id = cust.customer_id)
LEFT OUTER JOIN wpd_view_sales_territory_by_customer territory
ON cust.customer_id = territory.customer_id
LEFT OUTER JOIN
(select
order_no,
SUM(ext_price_calc) as ext_price_calc
from
(select
hdr.order_no,
line.item_id,
(line.qty_ordered - isnull(qty_canceled,0)) * unit_price as ext_price_calc
from myDb_view_oe_hdr hdr
left outer join myDb_view_oe_line line
on hdr.order_no = line.order_no
where
line.delete_flag = 'N'
AND line.cancel_flag = 'N'
AND hdr.projected_order = 'N'
AND hdr.delete_flag = 'N'
AND hdr.cancel_flag = 'N'
AND line.item_id not in ('LARGE-ORDER-1%','LARGE-ORDER-2%', 'LARGE-ORDER-3%', 'FUEL','NET-FUEL', 'CONVENIENCE-FEE')) as line
group by order_no) as order_total
on hdr.order_no = order_total.order_no
LEFT OUTER JOIN
(select
order_no,
count(order_no) as convenience_count
from oe_line with (nolock)
left outer join inv_mast inv with (nolock)
on oe_line.inv_mast_uid = inv.inv_mast_uid
where inv.item_id in ('LARGE-ORDER-1%','LARGE-ORDER-2%', 'LARGE-ORDER-3%')
and oe_line.delete_flag <> 'Y'
group by order_no) as fee_count
on hdr.order_no = fee_count.order_no
INNER JOIN
(select
order_no,
unit_price
from oe_line line with (nolock)
where line.inv_mast_uid in (select inv_mast_uid from inv_mast with (nolock) where item_id in ('LARGE-ORDER-1%','LARGE-ORDER-2%', 'LARGE-ORDER-3%'))) as fee_price
ON fee_count.order_no = fee_price.order_no
WHERE
hdr.projected_order = 'N'
AND hdr.cancel_flag = 'N'
AND hdr.delete_flag = 'N'
AND hdr.completed = 'N'
AND territory.territory_id = 'CUSTOMERTERRITORY'
AND ext_price_calc > 600.00
AND hdr.carrier_id <> '100004'
AND fee_count.convenience_count is not null
AND CASE
WHEN (ext_price_calc >= 600.01 and ext_price_calc <= 800) and fee_price.unit_price <> round(ext_price_calc * -.01,2)
THEN '-1%: $' + cast(cast(ext_price_calc * -.01 as decimal(18,2)) as varchar(255))
WHEN ext_price_calc >= 800.01 and ext_price_calc <= 1000 and fee_price.unit_price <> round(ext_price_calc * -.02,2)
THEN '-2%: $' + cast(cast(ext_price_calc * -.02 as decimal(18,2)) as varchar(255))
WHEN ext_price_calc > 1000 and fee_price.unit_price <> round(ext_price_calc * -.03,2)
THEN '-3%: $' + cast(cast(ext_price_calc * -.03 as decimal(18,2)) as varchar(255))
ELSE
'OK' END <> 'OK'
Just as a clue to the right direction for optimization:
When you do an OUTER JOIN to a query with calculated columns, you are guaranteeing not only a full table scan, but that those calculations must be performed against every row in the joined table. It appears that you can actually do your join to oe_line without the column calculations (i.e. by filtering ext_price_calc to a specific range).
You don't need to do most of the subqueries that are in your query; the master query can be recrafted to use regular table join syntax. Joins to subqueries containing subqueries present a challenge to the SQL optimizer that it may not be able to meet. But by using regular joins, the optimizer has a much better chance at identifying more efficient query strategies.
You don't tag which SQL engine you're using. Every database has proprietary extensions that may allow for speedier or more efficient queries. It would be easier to provide useful feedback if you indicated whether you were using MySQL, SQL Server, Oracle, etc.
Regardless of the database you're using, reviewing the query plan is always a good place to start. This will tell you where most of the I/O and time in your query is being spent.
Just on general principle, make sure your statistics are up-to-date.
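A short sketch of refreshing statistics, assuming the engine is SQL Server (the WITH (NOLOCK) hints in the question suggest it) and that the base tables live in the dbo schema, which is an assumption.

```sql
-- Refresh statistics on the two base tables the derived tables read from.
UPDATE STATISTICS dbo.oe_line  WITH FULLSCAN;
UPDATE STATISTICS dbo.inv_mast WITH FULLSCAN;

-- Alternatively, refresh statistics for every table in the current database.
EXEC sys.sp_updatestats;
```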
It may not be solvable by any of us without the real data to test with.
If that's the case and nobody else posts the answer, I can still help. Here is how to troubleshoot it:
(1) Take joins and pieces out one by one.
(2) This will cause errors. Remove or fake the references to get rid of them.
(3) See how that works.
(4) Put items back before you try taking something else out.
(5) Keep track...
(6) Also be aware of where removing something might drastically reduce the result set.
You might find you're missing an index or some other smoking gun.
I was having the same problem and I was able to solve it by indexing one of the tables and setting a primary key.
I strongly suspect that the problem lies in the number of joins you're doing. A lot of databases do joins basically by systematically checking all possible combinations of rows from the various tables for validity. So if you're joining tables A and B on column C, and A looks like:
Name:C
Fred:1
Alice:2
Betty:3
While B looks like:
C:Pet
1:Alligator
2:Lion
3:T-Rex
When you do the join, it checks all 9 possibilities:
Fred:1:1:Alligator
Fred:1:2:Lion
Fred:1:3:T-Rex
Alice:2:1:Alligator
Alice:2:2:Lion
Alice:2:3:T-Rex
Betty:3:1:Alligator
Betty:3:2:Lion
Betty:3:3:T-Rex
And goes through and deletes the non-matching ones:
Fred:1:1:Alligator
Alice:2:2:Lion
Betty:3:3:T-Rex
... which means with three entries in each table, it creates nine temporary records, sorts through them all, and deletes six of them ... all before it actually sorts through the results for what you're after (so if you are looking for Betty's Pet, you only want one row on that final result).
... and you're doing how many joins and sub-queries?
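The A and B example above can be reproduced with a few statements. This is only an illustration of the logical model described in this answer; the column types are assumptions, and many engines avoid building the full cross product when an index or a hash join is available.

```sql
CREATE TABLE A (Name varchar(20), C int);
CREATE TABLE B (C int, Pet varchar(20));

INSERT INTO A (Name, C) VALUES ('Fred', 1), ('Alice', 2), ('Betty', 3);
INSERT INTO B (C, Pet) VALUES (1, 'Alligator'), (2, 'Lion'), (3, 'T-Rex');

-- Every combination of a row from A with a row from B: 9 rows.
SELECT A.Name, A.C AS a_c, B.C AS b_c, B.Pet
FROM A
CROSS JOIN B;

-- Only the combinations where the C values match: 3 rows.
SELECT A.Name, A.C, B.Pet
FROM A
INNER JOIN B ON A.C = B.C;

-- Betty's pet: a single row, T-Rex.
SELECT B.Pet
FROM A
INNER JOIN B ON A.C = B.C
WHERE A.Name = 'Betty';
```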

Slow Query with Dynamic WHERE Clause

I know the query below is not the best, but right now it has to do the job:
FROM dbo.CE_Summons_ext0 s with (nolock)
INNER JOIN dbo.CE_Fines_ext0 f with (nolock)
ON (f.ref_no = s.ref_no AND f.doc_type = s.doc_type)
INNER JOIN dbo.CE_charge_status c with (nolock)
ON f.status = c.status_no
INNER JOIN dbo.CE_COURT_DESC crt_desc with (nolock)
ON crt_desc.COURT = s.COURT
INNER JOIN dbo.CE_CntParms_ext0 param with (nolock)
ON param.REF_NO = s.ref_no
INNER JOIN dbo.CE_Court_result crt_result with (nolock)
ON crt_result.COURT_RESULT = param.COURT_RESULT
WHERE s.SUMMONS_NO = isnull(nullif(@sms_summons_no, ''), s.SUMMONS_NO)
AND s.ref_no = isnull(nullif(@scp_ref_no,''), s.ref_no)
AND s.COURT = isnull(nullif(@sms_court,'') , s.COURT)
-- AND f.STREET1 = isnull(nullif(@street1,''), f.STREET1)
-- AND f.acc_name = isnull(nullif(@offender_name,''), f.acc_name)
-- AND f.id_no = isnull(nullif(@offender_id,''), f.id_no)
-- AND f.acc_name = isnull(nullif(@owner_name,''), f.acc_name)
-- AND f.id_no = isnull(nullif(@owner_id,''), f.id_no)
END
On the WHERE clause if I uncomment the last conditions it runs horribly slow. What am I doing wrong?
This looks as if you are determining the where clause based on the values of parameters. The most effective way to do this in terms of performance is usually to build the query dynamically using dynamic SQL so that you do not have to use functions in the where clause.
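A minimal sketch of the dynamic SQL approach using sp_executesql, assuming the search values arrive as T-SQL variables. The variable types, the sample values and the select list are assumptions, since the original procedure header and SELECT clause are not shown, and only two of the joins and two of the filters are reproduced.

```sql
-- Assumed declarations; the real procedure defines its own parameters and types.
DECLARE @sms_summons_no varchar(50)  = '',
        @street1        varchar(100) = 'MAIN ST';

DECLARE @sql nvarchar(max) = N'
SELECT s.ref_no   -- placeholder select list
FROM dbo.CE_Summons_ext0 s
INNER JOIN dbo.CE_Fines_ext0 f
        ON f.ref_no = s.ref_no AND f.doc_type = s.doc_type
WHERE 1 = 1';

-- Append a predicate only when the caller supplied a value.
IF NULLIF(@sms_summons_no, '') IS NOT NULL
    SET @sql += N' AND s.SUMMONS_NO = @sms_summons_no';

IF NULLIF(@street1, '') IS NOT NULL
    SET @sql += N' AND f.STREET1 = @street1';

-- Values are passed as parameters rather than concatenated into the string.
EXEC sys.sp_executesql
     @sql,
     N'@sms_summons_no varchar(50), @street1 varchar(100)',
     @sms_summons_no = @sms_summons_no,
     @street1        = @street1;
```

Each distinct combination of supplied filters produces its own statement text, so the optimizer can plan each one against plain column comparisons.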
Try replacing with this:
(@Street1 IS NULL OR @Street1 = '' OR f.STREET1 = @Street1)
When you're executing the query (in both cases) go to the Query menu in SSMS and select Include Actual Execution Plan. That will tell you the specific parts of the query that are slow and why.
I realize this isn't a direct answer to your question, but it is a very useful means by which you can research and solve your own issue as well as learn how to better diagnose slow queries.

Help converting multi DB SQL Query to LINQ

I have the below SQL query, which returns what I expect, and I would like to accomplish the same thing in LINQ, but so far my results have been less than spectacular. The major hurdle, as I see it, is that the data is coming from 3 separate DBs. I was able to accomplish this in LINQ but it is extremely slow, see here.
So, without further ado, here it is, with the hardcoded Guid() being the only exception as that gets passed in:
SELECT en.ClientID, p.LastName, p.FirstName, NurseName = dm2.FirstName + ' ' + dm2.LastName, SocialWorkerName = dm.FirstName + ' ' + dm.LastName, en.EnrollmentDate, en.DisenrollmentDate, ESWorkerName = sw.FirstName + ' ' + sw.LastName, sw.Phone
FROM CMO.dbo.tblCMOEnrollment en
LEFT OUTER JOIN CMO.dbo.tblSupportWorker sw
ON en.EconomicSupportWorkerID = sw.SupportWorkerID
INNER JOIN Connect.dbo.tblPerson p
ON en.ClientID = p.PersonID
LEFT OUTER JOIN aspnetdb.dbo.tblDemographics dm
ON en.CMOSocialWorkerID = dm.UserID
LEFT OUTER JOIN aspnetdb.dbo.tblDemographics dm2
ON en.CMONurseID = dm2.UserID
WHERE (en.CMOSocialWorkerID = '060632EE-BE09-4057-B17B-2D0190D0FF74'
OR
en.CMONurseID = '060632EE-BE09-4057-B17B-2D0190D0FF74')
AND (en.DisenrollmentDate IS NULL
OR
en.DisenrollmentDate > GetDate())
ORDER BY en.DisenrollmentDate, p.LastName
Since you want to issue 1 query, you should only use 1 datacontext. Add views to one of the databases to represent the other databases' tables, then add it all to one LinqToSqlClasses file.
If you can't modify any of the three databases, create a fourth database with views to the other three.
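A sketch of the view approach, assuming the views are created in the CMO database and that all three databases sit on the same SQL Server instance. The view names are hypothetical; the same statements would work in a separate fourth database.

```sql
USE CMO;
GO  -- GO is a batch separator for SSMS/sqlcmd; CREATE VIEW must start its own batch

-- Hypothetical view exposing the person table from the Connect database.
CREATE VIEW dbo.vwPerson AS
    SELECT PersonID, LastName, FirstName
    FROM Connect.dbo.tblPerson;
GO

-- Hypothetical view exposing the demographics table from the aspnetdb database.
CREATE VIEW dbo.vwDemographics AS
    SELECT UserID, FirstName, LastName
    FROM aspnetdb.dbo.tblDemographics;
GO
```

With the views in place, a single data context over CMO can see tblCMOEnrollment, tblSupportWorker and both views, so the whole join can be sent to the server as one query.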