How to prevent a filter from causing a query to run much slower

How to prevent a filter from causing a query to run much slower - sql

I have the following piece of code which runs quickly (<1s):
SELECT
[Policy].[Value] AS [PolicyId]
,[Person].[Value] AS [PersonId]
,[Person].[Index] AS [PersonIndex]
FROM
[dbo].[View] AS [Policy]
INNER JOIN [dbo].[ViewPerson] AS [Person] WITH(INDEX([Index])) ON ([Policy].[CollectionId] = [Person].[CollectionId]
AND [Person].[Name] = 'PersonId' AND [Policy].[Name] = 'PolicyId')
WHERE
[Policy].[CollectionId] = 10003
-- AND [Policy].[Value] = [Person].[Value]
This will return 2 rows from my database. When I comment out the last line to apply a stronger filter it returns only 1 row from my database, but will take much longer to run (~20s).
Is there a method to reduce the time this query takes to run when a filter is applied to it? Ideally I'd like it to run at the same speed as the original.

You were told in comments, that forcing the engine to use a special index is - in most cases - not the best idea. The engine is pretty good in finding the best plan and it will work best if you let it go its own route.
Secondly you were told already, that the execution plan is the best place to start. As we do not see any details, the following is pure guessing:
If I get this correctly, your query will use CollectionId to filter for one given id (just very few Policy rows). For these rows, the JOIN on a VIEW (we have no idea, what is behind here!) tries to link person rows.
The filter should work against a very reduced set.
Your observations let me assume, that the second line in WHERE is dealing with a much larger set. I'm pretty sure, that the filter for CollectionId=10003 pulls after the other filter... The execution plan will show the details...
What you can do:
Take away the index hint
Try to add the second line in the WHERE with AND to the ON-clause of the JOIN.
Something along this:
SELECT
[Policy].[Value] AS [PolicyId]
,[Person].[Value] AS [PersonId]
,[Person].[Index] AS [PersonIndex]
FROM
[dbo].[View] AS [Policy]
INNER JOIN [dbo].[ViewPerson] AS [Person] ON ([Policy].[CollectionId] = [Person].[CollectionId]
AND [Person].[Name] = 'PersonId'
AND [Policy].[Name] = 'PolicyId'
AND [Policy].[Value] = [Person].[Value])
WHERE
[Policy].[CollectionId] = 10003;

Related

Order by in subquery behaving differently than native sql query?

So I am honestly a little puzzled by this!
I have a query that returns a set of transactions that contain both repair costs and an odometer reading at the time of repair on the master level. To get an accurate Cost per mile reading I need to do a subquery to get both the first meter reading between a starting date and an end date, and an ending meter.
(select top 1 wf2.ro_num
from wotrans wotr2
left join wofile wf2
on wotr2.rop_ro_num = wf2.ro_num
and wotr2.rop_fac = wf2.ro_fac
where wotr.rop_veh_num = wotr2.rop_veh_num
and wotr.rop_veh_facility = wotr2.rop_veh_facility
AND ((#sdate = '01/01/1900 00:00:00' and wotr2.rop_tran_date = 0)
OR ([dbo].[udf_RTA_ConvertDateInt](#sdate) <= wotr2.rop_tran_date
AND [dbo].[udf_RTA_ConvertDateInt](#edate) >= wotr2.rop_tran_date))
order by wotr2.rop_tran_date asc) as highMeter
The reason I have the tables aliased as xx2 is because those tables are also used in the main query, and I don't want these to interact with each other except to pull the correct vehicle number and facility.
Basically when I run the main query it returns a value that is not correct; it returns the one that is second(keep in mind that the first and second have the same date.) But when I take the subquery and just copy and paste it into it's own query and run it, it returns the correct value.
I do have a work around for this, but I am just curious as to why this happening. I have searched quite a bit and found not much(other than the fact that people don't like order bys in subqueries). Talking to one of my friends that also does quite a bit of SQL scripting, it looks to us as if the subquery is ordering differently than the subquery by itsself when you have multiple values that are the same for the order by(i.e. 10 dates of 08/05/2016).
Any ideas would be helpful!
Like I said I have a work around that works in this one case, but don't know yet if it will work on a larger dataset.
Let me know if you want more code.

Oracle view join to table very weird slow issue

I have a table order, which is very straightforward, it is storing order data.
I have a view, which is storing currency pair and currency rate. The view is created as below:
create or replace view view_currency_rate as (
select c.* from currency_rate c, (
select curr_from, curr_to, max(rate_date) max_rate_date from currency_rate
where system_rate > 0
group by curr_from, curr_to) r
where c.curr_from = r.curr_from
and c.curr_to = r.curr_to
and c.rate_date = r.max_rate_date
and c.system_rate > 0
);
nothing fancy here, this view populate the latest currency rate (curr_from -> curr_to) from the currency_rate table.
When I do as below, it populate 80k row (all data) because I have plenty of records in order table. And the time spent is less than 5 seconds.
First Query:
select * from
VIEW_CURRENCY_RATE c, order a
where
c.curr_from = A.CURRENCY;
I want to add in more filter, so I thought it could be faster, so I added this:
Second Query:
select * from
VIEW_CURRENCY_RATE c, order a
where
a.id = 'xxxx'
and c.curr_from = A.CURRENCY;
And now it run over 1 minute! I totally have no idea what happen to this. I thought it would be some oracle optimizer goes wrong, so I try to find another way, think of just the 80K data can be populated quite fast, so I try to get the data from it, so I nested the SQL as below:
select * from (
select * from
VIEW_CURRENCY_RATE c, order a
where
c.curr_from = A.CURRENCY
)
where id = 'xxxx';
It run damn slow as well! I running out of idea, can anyone explain what happen to my script?
Updated on 6-Sep-2016
After I know how to 'explain plan', I capture the screen:
Fist query (fast one with 80K data):
Second query (slow one):
The slow one totally break the view and form a new SQL! This is super weird that how can Oracle optimize this like that?

It seems problem relates to the plan of second query. because it uses of nest loops inplace of hash joint.
at first check if _hash_join_enable is true if it isn't true change it to true. if it is true there are some problem with oracle optimizer. for test it use of USE_HASH(tab2 tab1) hint.
Regards
mohsen

I am using Mike solution, I re-write the script, and it is running fast now, although the root cause is not determined, probably due to the oracle optimizer algorithm working in different way that I expect.

Access query speed differs

I have a local access database and in it a query which takes values from a form to populate a drop down menu. The weird (to me) thing is that with most options this query is quick (blink of an eye), but with a few options it's very slow (>10 seconds).
What the query is does is a follows: It populates a dropdown menu to record animals seen at a specific sighting, but only those animals which have not been recorded at that specific sighting yet (to avoid duplicate entries).
SELECT DISTINCT tblAnimals.AnimalID, tblAnimals.Nickname, tblAnimals.Species
FROM tblSightings INNER JOIN (tblAnimals INNER JOIN tblAnimalsatSighting ON tblAnimals.AnimalID = tblAnimalsatSighting.AnimalID) ON tblSightings.SightingID = tblAnimalsatSighting.SightingID
WHERE (((tblAnimals.Species)=[form]![Species]) AND ((tblAnimals.CurrentGroup)=[form]![AnimalGroup2]) AND ((tblAnimals.[Dead?])=False) AND ((Exists (select tblAnimalsatSighting.AnimalID FROM tblAnimalsatSighting WHERE tblAnimals.AnimalID = tblAnimalsatSighting.AnimalID AND tblAnimalsatSighting.SightingID = [form]![SightingID]))=False));
It performs well for all groups of 2 of the 4 possible species, for 1 species it performs well for 4 of the 5 groups, but not for the last group, and for the last species it performs very slowly for both groups. Anybody an idea what can be the cause of this kind of behavior? Is it problems with the query? Or duplicate entries in the tables which can cause this? I don't think it's duplicates in the tables, I've checked that, and there are some, but they appear both for groups where there are problems and where there aren't. Could I re-write the query so it performs faster?

As noted in our comments above, you confirmed that the extra joins were not really need and were in fact going to limit the results to animal that had already had a sighting. Those joins would also likely contribute to a slowdown.
I know that Access probably added most of the parentheses automatically but I've removed them and converted the subquery to a not exists form that's a lot more readable.
SELECT tblAnimals.AnimalID, tblAnimals.Nickname, tblAnimals.Species
FROM tblAnimals
WHERE
tblAnimals.Species = [form]![Species]
AND tblAnimals.CurrentGroup = [form]![AnimalGroup2]
AND tblAnimals.[Dead?] = False
AND NOT EXISTS (
SELECT tblAnimalsatSighting.AnimalID
FROM tblAnimalsatSighting
WHERE
tblAnimals.AnimalID = tblAnimalsatSighting.AnimalID
AND tblAnimalsatSighting.SightingID = [form]![SightingID]
);

Additional condition in MS SQL query is executing 2 min (compared to original 1 sec)

I have one illogical problem that I just can't figure out.
I am doing a complex query. After I did a little change it began executing over 2 minutes instead of one second. Can someone explain to me how is this even possible? What could be the background of this?
First query
DECLARE #CRUISE_ID int = 10001890 --:CRUISE_ID
SELECT
/* ... */
FROM Cruise_Itinerary with(nolock)
INNER JOIN Cruise with(nolock) ON Cruise_Itinerary.CRUISE_ID = Cruise.CRUISE_ID
AND (Cruise.CRUISE_ID = #CRUISE_ID)
/* ... */
Second query
DECLARE #CRUISE_ID int = 10001890 --:CRUISE_ID
SELECT
/* ... */
FROM Cruise_Itinerary with(nolock)
INNER JOIN Cruise with(nolock) ON Cruise_Itinerary.CRUISE_ID = Cruise.CRUISE_ID
AND (#CRUISE_ID is null OR Cruise.CRUISE_ID = #CRUISE_ID)
/* ... */
The first query executes in one second but the second one takes over 2 minutes to execute. I just don't get it. What is a difference between
AND (10001890 is null OR Cruise.CRUISE_ID = 10001890)
and
AND (#CRUISE_ID is null OR Cruise.CRUISE_ID = #CRUISE_ID)?
Variable #CRUISE_ID has no other occurrences is the entire query.
Edit: I figured it out with help of my colleagues and you guys.
Here is a good explanation what is going on:
http://sqlinthewild.co.za/index.php/2009/03/19/catch-all-queries/
The optimal plan differs completely depending on what parameters are passed. The optimizer can't tell that and it plays safe. It creates plans that will always work. That’s (one of the reasons) why in the first example it was an index scan, not an index seek.
We can see it from the execution plan of the second query that the index scan happened at the end of plan. I checked. It takes over 2 minutes to execute if I remove this the whole condition.

Firstly, the logic in your query seems contradictory. You are essentially saying "If x and (x or y)". We (humans) might think along the lines of:
Given that x (Cruise.CRUISE_ID = #CRUISE_ID) in this instance must be true to meet the AND logic, the second condition (#CRUISE_ID is null OR Cruise.CRUISE_ID = #CRUISE_ID) can be ignored. So ensure that the x is true as the starting point for one's calculations.
The SQL query optimiser however clearly decides that the query plan must try to ensure that both sides of the AND must be met and thus rationalises it something along the lines of:
With just condition 1 the plan can start by performing a (clustered?) INDEX SEEK on the Cruise table on the basis of the (presumably indexed) CruiseID. When you add in condition 3 the optimiser can no longer perform this seek as another predicate (#CruiseID is null) must be taken account of (#CRUISE_ID is null OR Cruise.CRUISE_ID = #CRUISE_ID). Therefore the whole of the Cruise_Itinerary table has to be scanned (there are no other indexed columns it can use), then the join onto Cruise performed before the various conditions are checked as part of the join.
Essentially it is doing what you are asking - if the value is NULL then everything must be returned with predictably devastating consequences for performance. You would be better off using an IF...ELSE block to ensure that the query plan is optimised for both possible options (#CruiseID is null/ is not null).

Multiple Joins + Lots of Data Optimization

I am working on a massive join at work and have very limited resources in terms of being able to add indexes and such as well as what I can do in the query itself due to the environment (i.e. I can only select data, no variables or table creations allowed). I have read somewhere that a subquery will automatically index the result, is this true? Also for my major join tables (3) each has ~140K rows. I have to join 2 extra tables to ensure filtering is correct. I have the query listed below which I currently have criteria on the JOIN clause. Another question is if I move my criteria to a where clause either in or out of the subquery will it benefit?
SELECT *
FROM (SELECT NULL AS A1,
DFS_ROHEADER.TECHID,
DFS_ROHEADER.RONUMBER,
DFS_ROHEADER.CUSTOMERNUMBER,
DFS_CUSTOMER.BNAME,
DFS_ROHEADER.UNITNUMBER,
DFS_ROHEADER.MILEAGE,
DFS_ROHEADER.OPENEDDATE,
DFS_ROHEADER.CLOSEDDATE,
DFS_ROHEADER.STATUS,
DFS_ROHEADER.PONUMBER,
DFS_TECH.REGION,
DFS_TECH.RSM,
DFS_ROPART.PARTID,
CONVERT(NVARCHAR(max), DFS_RODETAIL.STORY) AS STORY
FROM DFS_ROHEADER
LEFT JOIN DFS_CUSTOMER
ON DFS_ROHEADER.CUSTOMERNUMBER = DFS_CUSTOMER.CUST_NO
LEFT JOIN DFS_TECH
ON DFS_ROHEADER.TECHID = DFS_TECH.TECHID
INNER JOIN DFS_RODETAIL
ON DFS_ROHEADER.RONUMBER = DFS_RODETAIL.RONUMBER
INNER JOIN DFS_ROPART
ON DFS_RODETAIL.RONUMBER = DFS_ROPART.RONUMBER
AND DFS_RODETAIL.LINENUMBER = DFS_ROPART.LINENUMBER
AND DFS_ROHEADER.RONUMBER LIKE '%$FF_RONumber%'
AND DFS_ROHEADER.UNITNUMBER LIKE '%$FF_UnitNumber%'
AND DFS_ROHEADER.PONUMBER LIKE '%$FF_PONumber%'
AND ( DFS_CUSTOMER.BNAME LIKE '%$FF_Customer%'
OR DFS_CUSTOMER.BNAME IS NULL )
AND DFS_ROHEADER.TECHID LIKE '%$FF_TechID%'
AND DFS_ROHEADER.CLOSEDDATE BETWEEN
FF_ClosedBegin AND FF_ClosedEnd
AND ( DFS_TECH.REGION LIKE '%$FilterRegion%'
OR DFS_TECH.REGION IS NULL )
AND ( DFS_TECH.RSM LIKE '%$FF_RSM%'
OR DFS_TECH.RSM IS NULL )
AND DFS_RODETAIL.STORY LIKE '%$FF_Story%'
AND DFS_ROPART.PARTID LIKE '%$FF_PartID%'
WHERE DFS_ROHEADER.DELETED_BY < 0
AND DFS_RODETAIL.DELETED_BY < 0
AND DFS_ROPART.DELETED_BY < 0) T
ORDER BY T.RONUMBER
This query works; however, it can take forever to run, and can timeout. I have other queries that also run in the environment and I will take whatever you can give me in terms of suggestions and apply it to those. I am using SQLServer 2000, Thanks for the help.
EDIT:
Execution Plan:
https://dl.dropboxusercontent.com/u/99733863/ExecutionPlan.sqlplan
UPDATE:
I have come to the conclusion the environment I'm working in is the cause of the problem. My query works as intended and is not slow at all (1 sec. for 18,000 rows). As stated in the comments I have to fill grids with limited flexibility and I believe that these grids fill by first filling a temporary grid with the SQL statement and then copying row by row into the desired grid. There is a good chance that this is the cause of my issues. Thanks for the help.

I have come to the conclusion the environment I'm working in is the cause of the problem. My query works as intended and is not slow at all (1 sec. for 18,000 rows). As stated in the comments I have to fill grids with limited flexibility and I believe that these grids fill by first filling a temporary grid with the SQL statement and then copying row by row into the desired grid. There is a good chance that this is the cause of my issues. Thanks for the help everyone.

My 2 cents here.. In general LIKE is not very well optimized. In your case you also seem to be using LIKE with '%value%'. In that case the query optimizer has to scan the entire index. At a minimum I would see if there is a way to avoid using this.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas