Performance tuning a DECODE() statement in a WHERE clause - sql

I rewrote a query to reduce the time it takes to pick the records, but I can see that the DECODE line still needs tuning, as its cost is high. Can someone let me know whether the same query can be rewritten without the DECODE? The purpose of removing it would be to let the query use the index on the v_id column.
What I have tried so far:
Tried creating a function-based index (knowing that bind variables cannot be used), and it failed.
Tried using an OR condition, but the optimizer would not use the index. Any suggestion would be of great help.
The query is given below:
SELECT SUM(NVL(dd.amt,0))
FROM db,dd
WHERE db.id = dd.dsba_id
AND dd.nd_id = xxxxxxx
AND dd.a_id = 'xxxxx-xx'
AND DECODE (db.v_id , xxxxxxxxx, 'COMPLETE' , db.code ) = 'COMPLETE'
AND db.datet BETWEEN TRUNC ( SYSDATE , 'YEAR' ) AND SYSDATE;

I would suggest writing the code as:
SELECT SUM(dd.amt)
FROM db JOIN
dd
ON db.id = dd.dsba_id
WHERE dd.nd_id = xxxxxxx AND
dd.a_id = 'xxxxx-xx' AND
(db.v_id = xxxxxxxxx OR db.code = 'COMPLETE') AND
db.datet >= trunc(sysdate, 'YEAR');
For this query, I would recommend indexes on:
dd(nd_id, a_id, dsba_id)
db(id, datet, v_id, code)
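A minimal DDL sketch for those indexes, assuming the table and column names from the question (the index names are invented here):
CREATE INDEX dd_ndid_aid_dsbaid ON dd (nd_id, a_id, dsba_id);
CREATE INDEX db_id_datet_vid_code ON db (id, datet, v_id, code);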
The changes to the above query:
Never use commas in the FROM clause. Always use proper, explicit, standard, readable JOIN syntax. (This does not affect performance, however.)
decode() is rather hard to follow. A simple boolean OR is equivalent.
BETWEEN is unnecessary assuming that datet is not in the future.
SUM(NVL()) is not needed, because NULL values are ignored by SUM(). If you are concerned about a NULL result, I would suggest COALESCE(SUM(dd.amt), 0).
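For example, a minimal variant of the rewritten query's select list (the predicates stay as above):
SELECT COALESCE(SUM(dd.amt), 0)   -- 0 instead of NULL when no rows match
FROM db JOIN
     dd
     ON db.id = dd.dsba_id
WHERE ...   -- same predicates as in the rewritten query above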

Related

Is there any option to replace NVL in where clause for parameter

I have been using NVL in my WHERE clause and it worked well until now. But in the case where the column has a NULL value and the parameter is also NULL, it doesn't return any rows.
select * from Table
where
f_date BETWEEN NVL(:F_DATE_FROM,F_DATE) AND NVL(:F_DATE_TO,F_DATE)
AND op_code = NVL(:CODE, OP_CODE)
AND T_CBC = NVL(:TO_CBC,T_CBC)
order by fiscal_date desc
I updated the query as below, and it returns all the records as expected. However, it takes far too long to execute: the original query took 1.5 minutes and the new query takes 7 minutes. Is there any way to fine-tune the query below?
select * from Table
where
f_date BETWEEN NVL(:F_DATE_FROM,F_DATE) AND NVL(:F_DATE_TO,F_DATE)
AND (OP_CODE = :CODE or :CODE is null)
AND (T_CBC = :TO_CBC or :TO_CBC is null)
order by fiscal_date desc
Sure:
WHERE
(f_date >= :F_DATE_FROM OR :F_DATE_FROM IS NULL) AND
(f_date <= :F_DATE_TO OR :F_DATE_TO IS NULL) AND
...
though I'm not sure how much of a performance improvement it'll realize. If your question is about performance specifically, ask one that includes a query plan.
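A minimal sketch of the whole query with that pattern applied, assuming the same columns and bind variables as in the question:
select * from Table
where
    (f_date >= :F_DATE_FROM OR :F_DATE_FROM IS NULL)
AND (f_date <= :F_DATE_TO OR :F_DATE_TO IS NULL)
AND (op_code = :CODE OR :CODE IS NULL)
AND (T_CBC = :TO_CBC OR :TO_CBC IS NULL)
order by fiscal_date desc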

Include NULL values in a designated field in my where clause on SSRS

I am learning SQL, so be gentle. If I have designated a specific role in my WHERE clause, it is only pulling those cases where that role is populated. How can I also include the NULL values, or those roles that are blank?
Here is the where clause now:
WHERE (dbo.vcases.lawtype = 'My Cases') AND
(dbo.vcase_parties_people.role_sk = 4001) AND
(v1.role_sk = 3940) AND
(v1.report_ind = 'Y') AND
(v2.role_sk = 3939) AND
(v2.report_ind = 'Y') AND
(dbo.vcases.case_type NOT IN ('Case type 1', 'Case type 2'))
The COALESCE() expression in SQL is useful for substituting a default value when NULL is encountered for a given column or expression. When the query optimizer encounters a COALESCE() call, it will internally rewrite that expression to an equivalent CASE...WHEN expression. In your sample query, WHERE (COALESCE(v1.role_sk, 3940) = 3940) would operate (and optimize) the same as WHERE (CASE WHEN v1.role_sk IS NOT NULL THEN v1.role_sk ELSE 3940 END = 3940).
Since your example specifically involves a condition in the WHERE clause, you may want to use an OR operation, which could optimize better than a COALESCE() expression: WHERE (v1.role_sk = 3940 OR v1.role_sk IS NULL).
This is also assuming that any joins in your query aren't filtering out rows whose role_sk column is NULL.
You might edit your code as follows:
WHERE (dbo.vcases.lawtype = 'My Cases') AND
(dbo.vcase_parties_people.role_sk = 4001) AND
(v1.role_sk = 3940 OR v1.role_sk IS NULL) AND
(v1.report_ind = 'Y') AND
(v2.role_sk = 3939) AND
(v2.report_ind = 'Y') AND
(dbo.vcases.case_type NOT IN ('Case type 1', 'Case type 2'))
The use of the COALESCE function has been suggested, but a good rule of thumb in SQL is to avoid functions in the WHERE clause, because they reduce the effectiveness of the table's indexes. Functions applied to columns in the WHERE clause often cause index scans instead of the more efficient index seeks.
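A rough illustration of the two forms, assuming the table behind the v1 alias has an index on role_sk:
-- Function wrapped around the indexed column: typically an index scan
WHERE COALESCE(v1.role_sk, 3940) = 3940
-- Bare column: the optimizer can seek on role_sk = 3940 and on role_sk IS NULL
WHERE (v1.role_sk = 3940 OR v1.role_sk IS NULL)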

BigQuery - Is it possible to use an IF statement within a FROM?

I would like to do the following
FROM if(... = ...,
table_date_range(mytable, timestamp('2017-01-01'), timestamp('2017-01-17')),
table_date_range(mytable, timestamp('2016-01-01'), timestamp('2016-01-17'))
)
Is this kind of operation allowed in BigQuery?
You can do this using a condition on _TABLE_SUFFIX in standard SQL. For example,
SELECT *
FROM `my-dataset.mytable_*`
WHERE IF(condition,
_TABLE_SUFFIX BETWEEN '20170101' AND '20170117',
_TABLE_SUFFIX BETWEEN '20160101' AND '20160117');
One thing to keep in mind is that since the matching table suffixes are probably determined dynamically (based on something in your table) you will be charged for a full table scan.
For BigQuery Legacy SQL (which the code in your question looks more like), you can use the TABLE_QUERY table wildcard function to achieve this.
See example below:
SELECT
...
FROM
TABLE_QUERY([mydataset],
"CASE WHEN ... = ...
THEN REPLACE(table_id, 'mytable_', '') BETWEEN '20170101' AND '20170117'
ELSE REPLACE(table_id, 'mytable_', '') BETWEEN '20160101' AND '20160117'
END")
or , with IF()
SELECT
...
FROM
TABLE_QUERY([mydataset],
"IF(... = ..., REPLACE(table_id, 'mytable_', '') BETWEEN '20170101' AND '20170117',
REPLACE(table_id, 'mytable_', '') BETWEEN '20160101' AND '20160117')
")
In the meantime, when possible, consider migrating to BigQuery Standard SQL.

optional parameter checking in where clauses

I have a bunch of report parameters, and as a result my criteria checking first tests whether a parameter value is null and, if not, compares it with a column value.
(@dateStart IS NULL OR @dateStart <= BELGE.AccDate)
AND (@dateEnd IS NULL OR @dateEnd >= BELGE.AccDate)
AND (@CompanyId IS NULL OR @CompanyId = hrktlr.CompanyId)
AND ((@onKayitlarDahil = 1 and hrktlr.StatusCode in ('M', 'O'))
OR (@onKayitlarDahil = 0 AND hrktlr.StatusCode = 'M'))
AND (@BizPartnerId IS NULL or CK.BizPartnerId = @BizPartnerId)
AND (@BizPartnerKodStart is null or @BizPartnerKodStart = '' or @BizPartnerKodStart <= CK.BizPartnerKod)
AND (@BizPartnerKodEnd is null or @BizPartnerKodEnd = '' or @BizPartnerKodEnd >= CK.BizPartnerKod)
AND (@BizPartnerType is null or @BizPartnerType = CK.BizPartnerType)
This is great for a maintainable SQL query, but the problem is that the SQL query optimizer prepares itself for the worst case, I guess, and index usage is bad. For example, when I pass in BizPartnerId (and thus avoid the BizPartnerId IS NULL check), the query runs about 100 times faster.
So if I keep going with this approach, are there any pointers you can recommend to help the query planner improve query performance?
Any viable alternatives to optional parameter checking?
To stop SQL Server from saving a suboptimal query plan you can use the WITH RECOMPILE option. The query plan will be recalculated each time you run the query.
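A minimal sketch of the statement-level form of the same idea, assuming SQL Server and the parameters from the question (OPTION (RECOMPILE) is the per-statement counterpart of creating the procedure WITH RECOMPILE):
SELECT ...
FROM   ...   -- same joins as in the report query
WHERE  (@BizPartnerId IS NULL OR CK.BizPartnerId = @BizPartnerId)
       -- ... the remaining optional-parameter predicates ...
OPTION (RECOMPILE);   -- compile a fresh plan using the actual parameter values on each run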

The speed of 'extract(year from sysdate)'

I have two queries:
with tmp as (
select asy.aim_student_id, ast.aim_test, asq.response
from aim_student_test ast
join aim_student_qst asq on (asq.aps_yr = ast.aps_yr and asq.aim_test = ast.aim_test and asq.aim_id = ast.aim_id)
join aim_student_yr asy on (asy.aps_yr = ast.aps_yr and asy.aim_student_yr_id = ast.aim_student_yr_id)
where asq.aps_yr = '2012'
and asq.qst_num = 1)
select aim_student_id, aim_test, response
from tmp
where response is null
-- execution-time: 0.032 seconds
define this_year = extract(year from sysdate)
with tmp as (
select asy.aim_student_id, ast.aim_test, asq.response
from aim_student_test ast
join aim_student_qst asq on (asq.aps_yr = ast.aps_yr and asq.aim_test = ast.aim_test and asq.aim_id = ast.aim_id)
join aim_student_yr asy on (asy.aps_yr = ast.aps_yr and asy.aim_student_yr_id = ast.aim_student_yr_id)
where asq.aps_yr = &this_year
and asq.qst_num = 1)
select aim_student_id, aim_test, response
from tmp
where response is null
-- execution-time: 82.202 seconds
The only difference is that in one I use '2012' and in the other I use extract(year from sysdate).
I can only imagine that Oracle is computing extract(year from sysdate) for EVERY record it checks, and that I just can't figure out how to make it compute this once and use it as a variable. Searching has not returned me the answers I seek... so I come to the magicians of SO.com. HOW do I properly use
extract(year from sysdate)
as a variable?
Using &this_year in the query causes a substitution of the string extract(year from sysdate), so the second query actually has:
where asq.aps_yr = extract(year from sysdate)
which you can see from the second explain plan. That in itself probably isn't the problem; what's possibly slowing it down is that doing this is changing the plan from an index range scan to an index skip scan against aim_student_qstp1. The real difference is that in the fast version you're comparing asq.aps_yr to a string ('2012'), in the second it's a number (2012), and - as also shown in the explain plan - this is causing it to do to_number(asq.aps_yr) which is stopping the index being used as you expect.
You could fix this in your code by making it:
where asq.aps_yr = to_char(&this_year)
If you want to calculate it once before the query runs and then use it as a variable, there are at least two ways (in SQL*Plus/SQL Developer). Sticking with substitution variables you can use the column commands instead of the define:
column tmp_this_year new_value this_year
set termout off
select extract(year from sysdate) as tmp_this_year from dual;
set termout on
set verify off
... which makes &this_year=2012 (the termout changes just make the actual retrieval invisible, and the verify stops it telling you when it uses the substitution; both so you don't get extra output in your script), and change your query to have:
where asq.aps_yr = '&this_year'
... so the value is treated as a string, making a to_char() unnecessary.
Or you can use bind variables:
var this_year varchar2(4);
set feedback off;
exec :this_year := extract(year from sysdate);
... and then your query has:
where asq.aps_yr = :this_year
Note that in this case you don't need the quotes because the bind variable is defined as a string already - there's an implicit conversion in the exec that sets it.
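As an aside, not from the answers above: since aps_yr is a string, a hedged alternative is to compute the year directly in the query with TO_CHAR, which keeps the comparison string-to-string (so no implicit TO_NUMBER on the indexed column) and uses a single SYSDATE evaluation per execution:
where asq.aps_yr = to_char(sysdate, 'YYYY')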
I doubt the difference is due to extracting the year from the date. I'm pretty sure Oracle would only be extracting the year once, since it is using a variable in the second case.
The difference is due to the execution path used by the query. You would need to post the execution plans to really see the difference. Using an explicit constant gives the optimizer more information for choosing an optimal query plan.
For instance, if the data is partitioned by year, then with a constant year, Oracle can determine which partition has the data. In the second case, Oracle might not recognize the value as a constant, and require reading all data partitions. This is just an example of what might happen -- I'm not sure what Oracle does in this case.