SQL Query Performance with case statement

I have a simple SELECT that is running very slowly, and I have narrowed it down to one particular WHERE condition.
I am not sure whether you need to see the whole query, or whether you will be able to help me understand why the CASE is affecting performance so much. I feel like I have found the problem, but can't seem to resolve it. I've worked with CASE expressions before and have never run into such huge performance issues.
For this particular example, the declaration is as follows: DECLARE @lastInvOnly AS int = 0
The problem WHERE condition follows, and it runs for about 20 seconds:
AND ird.inventorydate = CASE WHEN @lastinvonly = 0 THEN
    -- get the last reported inventory in respect to the specified parameter
    (SELECT MAX(ird2.inventorydate)
     FROM irdate ird2
     WHERE ird2.ris = r.ris AND
           ird2.generateddata != 'g' AND
           ird2.inventorydate <= @inventorydate)
END
Removing the CASE makes it run in 1 second, which is a HUGE difference. I can't understand why.
AND ird.inventorydate =
    (SELECT MAX(ird2.inventorydate)
     FROM irdate ird2
     WHERE ird2.ris = r.ris AND
           ird2.generateddata != 'g' AND
           ird2.inventorydate <= @inventorydate)

This should almost certainly be a derived table that you join to instead. Sub-selects tend to perform poorly, and when used conditionally, even worse. Try this instead:
INNER JOIN (
    select
        ris
        ,max(inventorydate) AS [MaxInvDate]
    from irdate
    where generateddata != 'g'
        and inventorydate <= @inventorydate
    GROUP BY ris
) AS MaxInvDate ON MaxInvDate.ris = r.ris
    and ird.inventorydate = MaxInvDate.MaxInvDate
    and @lastinvonly = 0
I'm not 100% positive this logically works with the whole query, as your question only provides a small part. Note that the and @lastinvonly = 0 in the join condition preserves the original behavior: when @lastinvonly is not 0, the CASE returned NULL and the equality matched no rows, and here the INNER JOIN likewise matches no rows.

I can't tell for sure without seeing an execution plan, but the branch in your filter is likely the cause of the performance problem. Theoretically, the optimizer can take the version without the CASE and apply an optimization that transforms the subquery in your filter into a join; when the CASE expression is added, this optimization is no longer possible, and the subquery is executed for every row. You can refactor the code to help the optimizer out; something like this should work:
outer apply (
    -- putting @lastinvonly = 0 inside the apply lets the optimizer treat it
    -- as a startup filter rather than branching per row
    select max(ird2.inventorydate) as maxinventorydate
    from irdate ird2
    where ird2.ris = r.ris
        and ird2.generateddata <> 'g'
        and ird2.inventorydate <= @inventorydate
        and @lastinvonly = 0
) as ird2
where ird.inventorydate = ird2.maxinventorydate

How to prevent a filter from causing a query to run much slower

I have the following piece of code which runs quickly (<1s):
SELECT
[Policy].[Value] AS [PolicyId]
,[Person].[Value] AS [PersonId]
,[Person].[Index] AS [PersonIndex]
FROM
[dbo].[View] AS [Policy]
INNER JOIN [dbo].[ViewPerson] AS [Person] WITH(INDEX([Index])) ON ([Policy].[CollectionId] = [Person].[CollectionId]
AND [Person].[Name] = 'PersonId' AND [Policy].[Name] = 'PolicyId')
WHERE
[Policy].[CollectionId] = 10003
-- AND [Policy].[Value] = [Person].[Value]
This returns 2 rows from my database. When I uncomment the last line to apply a stronger filter, it returns only 1 row, but takes much longer to run (~20s).
Is there a method to reduce the time this query takes when the filter is applied? Ideally I'd like it to run at the same speed as the original.
You were told in the comments that forcing the engine to use a specific index is, in most cases, not the best idea. The engine is pretty good at finding the best plan, and it will work best if you let it go its own route.
Secondly, you were told already that the execution plan is the best place to start. As we do not see any details, the following is pure guessing:
If I get this correctly, your query uses CollectionId to filter for one given id (just very few Policy rows). For these rows, the JOIN on a view (we have no idea what is behind it!) tries to link person rows.
The filter should work against a very reduced set.
Your observations let me assume that the second line in the WHERE is dealing with a much larger set. I'm pretty sure that the filter for CollectionId=10003 is applied after the other filter... The execution plan will show the details.
What you can do:
Take away the index hint.
Try to move the second line of the WHERE into the ON clause of the JOIN with AND (for an INNER JOIN the two placements are logically equivalent, but they can give the optimizer a different starting point).
Something along these lines:
SELECT
[Policy].[Value] AS [PolicyId]
,[Person].[Value] AS [PersonId]
,[Person].[Index] AS [PersonIndex]
FROM
[dbo].[View] AS [Policy]
INNER JOIN [dbo].[ViewPerson] AS [Person] ON ([Policy].[CollectionId] = [Person].[CollectionId]
AND [Person].[Name] = 'PersonId'
AND [Policy].[Name] = 'PolicyId'
AND [Policy].[Value] = [Person].[Value])
WHERE
[Policy].[CollectionId] = 10003;

Oracle view join to table very weird slow issue

I have a table order, which is very straightforward; it stores order data.
I have a view that stores currency pairs and currency rates. The view is created as below:
create or replace view view_currency_rate as (
select c.* from currency_rate c, (
select curr_from, curr_to, max(rate_date) max_rate_date from currency_rate
where system_rate > 0
group by curr_from, curr_to) r
where c.curr_from = r.curr_from
and c.curr_to = r.curr_to
and c.rate_date = r.max_rate_date
and c.system_rate > 0
);
Nothing fancy here; this view returns the latest currency rate (curr_from -> curr_to) from the currency_rate table.
When I run the query below, it returns 80k rows (all the data), because I have plenty of records in the order table, and it takes less than 5 seconds.
First Query:
select * from
VIEW_CURRENCY_RATE c, order a
where
c.curr_from = A.CURRENCY;
I wanted to add one more filter, thinking the query could get faster, so I added this:
Second Query:
select * from
VIEW_CURRENCY_RATE c, order a
where
a.id = 'xxxx'
and c.curr_from = A.CURRENCY;
And now it runs for over 1 minute! I have no idea what is happening here. I thought the Oracle optimizer might be going wrong, so I tried another way: since the 80K rows can be fetched quite fast, I tried to select from that result by nesting the SQL as below:
select * from (
select * from
VIEW_CURRENCY_RATE c, order a
where
c.curr_from = A.CURRENCY
)
where id = 'xxxx';
It runs damn slow as well! I am running out of ideas; can anyone explain what is happening with my script?
Updated on 6-Sep-2016
After I learned how to use 'explain plan', I captured the plans (screenshots not reproduced here):
First query (fast one, with 80K rows):
Second query (slow one):
The slow one totally breaks the view apart and forms a new SQL statement! It is super weird; how can Oracle optimize it like that?
It seems the problem relates to the plan of the second query, because it uses nested loops in place of a hash join.
First, check whether _hash_join_enabled is true; if it isn't, change it to true. If it is true, there is some problem with the Oracle optimizer; to test, use the USE_HASH(tab2 tab1) hint.
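For example, here is a sketch of the hint applied to the second query from the question (the hint arguments are the table aliases; this is a diagnostic test, not a permanent fix):
select /*+ USE_HASH(c a) */ *
from view_currency_rate c, order a
where a.id = 'xxxx'
and c.curr_from = a.currency;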
I am using Mike's solution; I rewrote the script, and it is running fast now, although the root cause was never determined. It is probably due to the Oracle optimizer's algorithm working in a different way than I expect.

Are SQL subqueries within CASE WHEN run once per query, or once for each row?

Basically, is the code below efficient (given that I cannot use @ variables in MonetDB), or will this call the subqueries more than once each?
CREATE VIEW sys.share26cuts_2007 (peorglopnr,share26cuts_2007) AS (
SELECT peorglopnr, CASE WHEN share26_2007 < (SELECT QUANTILE(share26_2007,0.25) FROM sys.share26_2007) THEN 1
WHEN share26_2007 < (SELECT QUANTILE(share26_2007,0.5) FROM sys.share26_2007) THEN 2
WHEN share26_2007 < (SELECT QUANTILE(share26_2007,0.75) FROM sys.share26_2007) THEN 3
ELSE 4 END AS share26cuts_2007
FROM sys.share26_2007
);
I would rather not use a user-defined function either, though this came up in other questions.
As e.g. GoatCO commented on the question, this is probably better avoided. The SET command that MonetDB supports can be used with a SELECT, as in the code below. The remaining question is why all the quantiles are zero when my data surely is not (I also got division-by-zero errors before using NULLIF). I show more of the code now.
CREATE VIEW sys.over26_2007 (personlopnr,peorglopnr,loneink,below26_loneink) AS (
SELECT personlopnr,peorglopnr,loneink, CASE WHEN fodelsear < 1981 THEN 0 ELSE loneink END AS below26_loneink
FROM sys.ds_chocker_lev_lisaindivid_2007
);
SELECT COUNT(*) FROM over26_2007;
CREATE VIEW sys.share26_2007 (peorglopnr,share26_2007) AS (
SELECT peorglopnr, SUM(below26_loneink)/NULLIF(SUM(loneink),0)
FROM sys.over26_2007
GROUP BY peorglopnr
);
SELECT COUNT(*) FROM share26_2007;
DECLARE firstq double;
SET firstq = (SELECT QUANTILE(share26_2007,0.25) FROM sys.share26_2007);
SELECT firstq;
DECLARE secondq double;
SET secondq = (SELECT QUANTILE(share26_2007,0.5) FROM sys.share26_2007);
SELECT secondq;
DECLARE thirdq double;
SET thirdq = (SELECT QUANTILE(share26_2007,0.75) FROM sys.share26_2007);
SELECT thirdq;
CREATE VIEW sys.share26cuts_2007 (peorglopnr,share26cuts_2007) AS (
SELECT peorglopnr, CASE WHEN share26_2007 < firstq THEN 1
WHEN share26_2007 < secondq THEN 2
WHEN share26_2007 < thirdq THEN 3
ELSE 4 END AS share26cuts_2007
FROM sys.share26_2007
);
SELECT COUNT(*) FROM share26cuts_2007;
About inspecting plans, MonetDB supports the following (see the example after this list):
PLAN, to see the logical plan;
EXPLAIN, to see the physical plan in terms of MAL instructions;
TRACE, the same as EXPLAIN, but it actually executes the MAL plan and returns statistics for all instructions.
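For example, using the view from the question:
PLAN SELECT peorglopnr, share26cuts_2007 FROM sys.share26cuts_2007;
EXPLAIN SELECT peorglopnr, share26cuts_2007 FROM sys.share26cuts_2007;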
About your question on repeating the subqueries: in principle nothing will be repeated, and you will not need to take care of it explicitly.
That's because the default optimization pipeline includes the commonTerms optimizer. Your SQL will be translated into a sequence of MAL instructions, with duplicate calls. MAL is designed to be simple: many short instruction calls, a bit like assembly, which operate on columns, not rows (hence, don't apply the same reasoning you would use for SQL Server when you think about execution efficiency). This makes it easier to run some optimizations on it. The commonTerms optimizer will detect the duplicate calls and reuse all results that it can. This is done per column. So you should really be able to run your query and be happy.
However, I said in principle. Not all cases will be detected (though most will), plus some limitations have been introduced on purpose. For example, the search space for detecting duplicates is a window (too small for my taste; I have removed it altogether in my installations) over the whole MAL plan: if the duplicate instruction is too far down the plan, it won't be detected. This was done for efficiency. In your case, that single query isn't that big, but if it is part of a longer chain of views, then all these views will compile into a large MAL plan, which might make commonTerms less effective; it really depends on the actual queries.
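As for the quantiles coming out as zero: one possibility (an assumption on my part, since the column types are not shown) is that loneink and below26_loneink are integer columns, in which case SUM(below26_loneink)/SUM(loneink) is integer division and truncates to zero. Casting the numerator would avoid that, e.g. a revised version of the share view:
CREATE VIEW sys.share26_2007 (peorglopnr, share26_2007) AS (
SELECT peorglopnr, CAST(SUM(below26_loneink) AS DOUBLE) / NULLIF(SUM(loneink), 0)
FROM sys.over26_2007
GROUP BY peorglopnr
);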

sql server group by with an alias

I am new to SQL Server, transitioning from MySQL.
I have a complicated CASE statement that I would like to group on; it has 6 WHENs and an ELSE, and is likely to get larger. To be able to run it, I need to copy the statement into the GROUP BY each time there is a modification. In MySQL I would just group by the column number. Is there any workaround for this? It is making the code very ugly.
Is there going to be a performance penalty in creating a subquery for my CASE and then just grouping on the result field? It seems like trying to make the code more elegant will cause the query to use more resources.
Thanks
Below is the field I am grouping on. As I make a modification to the field for more edge cases, I need to change code in up to 3 places. That makes for some very ugly code, and I need no extra help doing that myself.
dz_code = case
when isnull(dz.dz_code,'N/A') in ('GAB', 'MAB', 'N/A') and dc.howdidyouhear = 'Television' then 'Television'
when isnull(dz.dz_code,'N/A') in ('GAB', 'MAB', 'N/A') and dc.howdidyouhear in ('Other', 'N/A') then 'Other'
WHEN dz.dz_code = 'irs,irs' THEN 'irs'
when dz.dz_code like '%SDE%' THEN 'SDE'
when dz.dz_code like 'referral,' then REPLACE(dz.dz_code, 'referral','')
when charindex(',',dz.dz_code) = 4 then left(dz.dz_code,3)
else
dz.dz_code
END,
Maybe you can wrap the query in a subquery and use the alias in the select and the group by. It looks a little bulky in this example, but if you've got more complex case switches, or more than one of them, then this solution will probably be much smaller and more readable.
select
CaseField
from
(select
case when 1 = 2 then
3
else 4 end as CaseField
from
YourTable t) c
group by
CaseField
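Applied to the dz_code expression from the question, it could look like the sketch below (the FROM clause and join are assumed, since the question does not show them):
select
dz_code
from
(select
case
when isnull(dz.dz_code,'N/A') in ('GAB', 'MAB', 'N/A') and dc.howdidyouhear = 'Television' then 'Television'
when isnull(dz.dz_code,'N/A') in ('GAB', 'MAB', 'N/A') and dc.howdidyouhear in ('Other', 'N/A') then 'Other'
when dz.dz_code = 'irs,irs' then 'irs'
when dz.dz_code like '%SDE%' then 'SDE'
when dz.dz_code like 'referral,' then replace(dz.dz_code, 'referral','')
when charindex(',',dz.dz_code) = 4 then left(dz.dz_code,3)
else dz.dz_code
end as dz_code
from dz -- table name assumed; replace with your actual FROM clause
join dc on dc.dz_id = dz.dz_id -- hypothetical join condition
) c
group by
dz_code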

Multiple Joins + Lots of Data Optimization

I am working on a massive join at work and have very limited resources in terms of being able to add indexes and such, as well as what I can do in the query itself due to the environment (i.e. I can only select data; no variables or table creation allowed). I have read somewhere that a subquery will automatically index the result; is this true? Also, my major join tables (3 of them) each have ~140K rows. I have to join 2 extra tables to ensure the filtering is correct. The query is listed below; I currently have the criteria in the JOIN clauses. Another question: if I move my criteria to a WHERE clause, either in or out of the subquery, will it help?
SELECT *
FROM (SELECT NULL AS A1,
DFS_ROHEADER.TECHID,
DFS_ROHEADER.RONUMBER,
DFS_ROHEADER.CUSTOMERNUMBER,
DFS_CUSTOMER.BNAME,
DFS_ROHEADER.UNITNUMBER,
DFS_ROHEADER.MILEAGE,
DFS_ROHEADER.OPENEDDATE,
DFS_ROHEADER.CLOSEDDATE,
DFS_ROHEADER.STATUS,
DFS_ROHEADER.PONUMBER,
DFS_TECH.REGION,
DFS_TECH.RSM,
DFS_ROPART.PARTID,
CONVERT(NVARCHAR(max), DFS_RODETAIL.STORY) AS STORY
FROM DFS_ROHEADER
LEFT JOIN DFS_CUSTOMER
ON DFS_ROHEADER.CUSTOMERNUMBER = DFS_CUSTOMER.CUST_NO
LEFT JOIN DFS_TECH
ON DFS_ROHEADER.TECHID = DFS_TECH.TECHID
INNER JOIN DFS_RODETAIL
ON DFS_ROHEADER.RONUMBER = DFS_RODETAIL.RONUMBER
INNER JOIN DFS_ROPART
ON DFS_RODETAIL.RONUMBER = DFS_ROPART.RONUMBER
AND DFS_RODETAIL.LINENUMBER = DFS_ROPART.LINENUMBER
AND DFS_ROHEADER.RONUMBER LIKE '%$FF_RONumber%'
AND DFS_ROHEADER.UNITNUMBER LIKE '%$FF_UnitNumber%'
AND DFS_ROHEADER.PONUMBER LIKE '%$FF_PONumber%'
AND ( DFS_CUSTOMER.BNAME LIKE '%$FF_Customer%'
OR DFS_CUSTOMER.BNAME IS NULL )
AND DFS_ROHEADER.TECHID LIKE '%$FF_TechID%'
AND DFS_ROHEADER.CLOSEDDATE BETWEEN
FF_ClosedBegin AND FF_ClosedEnd
AND ( DFS_TECH.REGION LIKE '%$FilterRegion%'
OR DFS_TECH.REGION IS NULL )
AND ( DFS_TECH.RSM LIKE '%$FF_RSM%'
OR DFS_TECH.RSM IS NULL )
AND DFS_RODETAIL.STORY LIKE '%$FF_Story%'
AND DFS_ROPART.PARTID LIKE '%$FF_PartID%'
WHERE DFS_ROHEADER.DELETED_BY < 0
AND DFS_RODETAIL.DELETED_BY < 0
AND DFS_ROPART.DELETED_BY < 0) T
ORDER BY T.RONUMBER
This query works; however, it can take forever to run and can time out. I have other queries that also run in this environment, and I will take whatever suggestions you can give me and apply them to those as well. I am using SQL Server 2000. Thanks for the help.
EDIT:
Execution Plan:
https://dl.dropboxusercontent.com/u/99733863/ExecutionPlan.sqlplan
UPDATE:
I have come to the conclusion that the environment I'm working in is the cause of the problem. My query works as intended and is not slow at all (1 sec. for 18,000 rows). As stated in the comments, I have to fill grids with limited flexibility, and I believe these grids are filled by first loading a temporary grid from the SQL statement and then copying row by row into the desired grid. There is a good chance that this is the cause of my issues. Thanks for the help.
My 2 cents here: in general, LIKE is not very well optimized, and you are also using LIKE with '%value%'. With a leading wildcard the query optimizer cannot seek, so it has to scan the entire index. At a minimum, I would see if there is a way to avoid that.
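For instance (illustrative only; the literal is a made-up value):
-- Leading wildcard: no seek is possible, so the whole index or table is scanned.
SELECT RONUMBER FROM DFS_ROHEADER WHERE RONUMBER LIKE '%1234%'
-- Prefix-only pattern: an index on RONUMBER (if one exists) can be used for a seek.
SELECT RONUMBER FROM DFS_ROHEADER WHERE RONUMBER LIKE '1234%'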