I'm a bit stuck with a stored procedure that is executing really slowly. The stored procedure basically contains a query that uses an incoming parameter (in_id) and is opened in a cursor like this:
open tmp_cursor for
select col1, col2, col3
from table1 tab
where ((in_id is null) or (tab.id = in_id)); -- tab.id is the PK
When I get an execution plan for the SQL query separately with a predefined value, I get good results, with the query using an index. However, when I call the procedure from my application, I see that no index is being used and the table gets a full scan, thus giving slow performance.
If I remove the first part of the WHERE clause, "(in_id is null)", the performance from the application is fast again.
How come the index isn't used during the call from my application (in_id is passed in)?
I have answered a similar question here https://stackoverflow.com/a/26633820/3989608
Some facts about NULL values and indexes:
Entirely NULL keys are not entered into a ‘normal’ B*Tree in Oracle
Therefore, if you have a concatenated index on say C1 and C2, then you will likely find NULL values in it – since you could have a row where C1 is NULL but C2 is NOT NULL – that key value will be in the index.
Part of Thomas Kyte's demonstration of the same point:
ops$tkyte@ORA9IR2> create table t
2 as
3 select object_id, owner, object_name
4 from dba_objects;
Table created.
ops$tkyte@ORA9IR2> alter table t modify (owner NOT NULL);
Table altered.
ops$tkyte@ORA9IR2> create index t_idx on t(object_id,owner);
Index created.
ops$tkyte@ORA9IR2> desc t
Name                    Null?    Type
----------------------- -------- ----------------
OBJECT_ID                        NUMBER
OWNER                   NOT NULL VARCHAR2(30)
OBJECT_NAME                      VARCHAR2(128)
ops$tkyte@ORA9IR2> exec dbms_stats.gather_table_stats(user,'T');
PL/SQL procedure successfully completed.
Well, that index can certainly be used to satisfy "IS NULL" when applied to OBJECT_ID:
ops$tkyte@ORA9IR2> set autotrace traceonly explain
ops$tkyte@ORA9IR2> select * from t where object_id is null;
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=3 Card=1 Bytes=34)
1 0 TABLE ACCESS (BY INDEX ROWID) OF 'T' (Cost=3 Card=1 Bytes=34)
2 1 INDEX (RANGE SCAN) OF 'T_IDX' (NON-UNIQUE) (Cost=2 Card=1)
In fact – even if the table did not have any NOT NULL columns, or we didn’t want/need to have a concatenated index involving OWNER – there is a transparent way to find the NULL OBJECT_ID values rather easily:
ops$tkyte@ORA9IR2> drop index t_idx;
Index dropped.
ops$tkyte@ORA9IR2> create index t_idx_new on t(object_id,0);
Index created.
ops$tkyte@ORA9IR2> set autotrace traceonly explain
ops$tkyte@ORA9IR2> select * from t where object_id is null;
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=3 Card=1 Bytes=34)
1 0 TABLE ACCESS (BY INDEX ROWID) OF 'T' (Cost=3 Card=1 Bytes=34)
2 1 INDEX (RANGE SCAN) OF 'T_IDX_NEW' (NON-UNIQUE) (Cost=2 Card=1)
Source: "Something about nothing" by Thomas Kyte
Assuming that in_id is a query parameter - not a column name:
The query can have only ONE execution plan, regardless of the input. If you pass parameter in_id as NULL, it is supposed to return ALL rows. If you pass a non-NULL in_id, it should return only a single PK value.
So Oracle chooses the "worst possible" execution plan to deal with the "worst possible" scenario. Such "generic" queries are the road to hell. Simply split the query into two.
select col1, col2, col3
from table1 tab;
This will use a FULL table scan, which is the best way to get all the rows.
select col1, col2, col3
from table1 tab
where tab.id = in_id; -- tab.id is the PK
This will use a UNIQUE index scan, which is the best way to get a single indexed row.
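Inside the procedure, that split might look roughly like this (a sketch reusing the cursor and parameter names from the question):

if in_id is null then
  open tmp_cursor for
    select col1, col2, col3
    from table1 tab;            -- all rows: the full scan is appropriate here
else
  open tmp_cursor for
    select col1, col2, col3
    from table1 tab
    where tab.id = in_id;       -- single row: unique index scan on the PK
end if;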
select col1, col2, col3 from table1 tab where (tab.id = nvl(in_id,tab.id));
This may help, or you may use the Oracle USE_CONCAT hint.
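For reference, the hint is written as an optimizer comment inside the statement; a sketch against the original query (whether the OR branches are actually expanded into a concatenation remains the optimizer's decision):

open tmp_cursor for
  select /*+ USE_CONCAT */ col1, col2, col3
  from table1 tab
  where ((in_id is null) or (tab.id = in_id));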
Related
I have a query
Select Column1, column2.... column30
From Table
Where (@var IS NULL OR Col1 = @var)
This can get resolved to a clustered index scan where every row in the table gets checked against the condition @var IS NULL, and the index set up on the column Col1 is disregarded. Is there a way to rewrite this query to enable the optimizer to use the index on Col1? Thank you!
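One common rewrite, also suggested further down this page for a similar procedure, is to keep the catch-all predicate but request a fresh plan per execution; a sketch using the names above:

Select Column1, Column2 -- ..., Column30
From [Table]
Where (@var IS NULL OR Col1 = @var)
OPTION (RECOMPILE);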
According to the SQLite documentation on rowid, the data for rowid tables is stored in a B-tree. I've been considering using a hash of my data as the rowid. Since this means I'd be inserting rows with rowids that are not ordered like the default implementation of rowid, how will this impact INSERT and SELECT performance, in addition to the layout of data in my table?
If I insert a row which has a large rowid because it’s a hash and then a row with a smaller rowid what will the table layout look like?
It would depend upon how you do it.
If you do not define an alias for the rowid column and a VACUUM takes place, then the rowid values will likely be messed up (as they may/will be re-assigned).
e.g. :-
DROP TABLE IF EXISTS tablex;
CREATE TABLE IF NOT EXISTS tablex (data TEXT);
INSERT INTO tablex (rowid,data) VALUES(82356476978,'fred'),(55,'mary');
SELECT rowid AS therowid,* FROM tablex;
VACUUM;
SELECT rowid AS therowid,* FROM tablex;
This results in one set of rowid values before the VACUUM and (likely) re-assigned values afterwards (the result screenshots are not reproduced here).
If an alias is defined, the VACUUM shouldn't be an issue and, as above, it's fine to supply your own rowid values.
Of course you have to adhere to the rules: as long as the values are unique integers that are not greater than 9223372036854775807 or less than -9223372036854775808, it should be fine. Other values would result in a datatype mismatch error.
I don't believe there would be much of an impact upon performance; there could possibly even be an improvement, as there may well be free space in the leaves, reducing the need for a more costly split.
e.g. the following :-
DROP TABLE IF EXISTS tabley;
CREATE TABLE IF NOT EXISTS tabley (myrowidalias INTEGER PRIMARY KEY ,data TEXT);
INSERT INTO tabley VALUES(9223372036854775807,'fred'),(-9223372036854775808,'Mary'),(55,'Sue');
SELECT rowid AS therowid,* FROM tabley;
VACUUM;
SELECT rowid AS therowid,* FROM tabley;
-- INSERT INTO tabley VALUES(9223372036854775808,'Sarah'); -- Datatype mismatch
INSERT INTO tabley VALUES(-9223372036854775809,'Bob'); -- Datatype mismatch
SELECT rowid AS therowid,* FROM tabley; -- not run due to above error
This results in the three rows (note the rowid being retrieved both via rowid and via its alias), identical output after the VACUUM, and the following message for the out-of-range insert (the result screenshots are not reproduced here):
-- INSERT INTO tabley VALUES(9223372036854775808,'Sarah');
INSERT INTO tabley VALUES(-9223372036854775809,'Bob')
> datatype mismatch
> Time: 0s
I am using Oracle SQL Developer. We are loading tables with data and I need to validate whether all the tables are populated and whether there are any columns that are completely null (all the rows are null for that column).
For the tables, I am clicking each table, looking at the Data tab to find whether it is populated, and then looking through each of the columns using filters to figure out whether there are any completely null columns. I am wondering if there is a faster way to do this.
Thanks,
Suresh
You're in luck - there's a fast and easy way to get this information using optimizer statistics.
After a large data load the statistics should be gathered anyway. Counting NULLs is something the statistics gathering already does. With the default settings since 11g, Oracle will count the number of NULLs 100% accurately. (But remember that the number will only reflect that one point in time. If you add data later, the statistics must be re-gathered to get newer results.)
Sample schema
create table test1(a number); --Has non-null values.
create table test2(b number); --Has NULL only.
create table test3(c number); --Has no rows.
insert into test1 values(1);
insert into test1 values(2);
insert into test2 values(null);
commit;
Gather stats and run a query
begin
dbms_stats.gather_schema_stats(user);
end;
/
select table_name, column_name, num_distinct, num_nulls
from user_tab_columns
where table_name in ('TEST1', 'TEST2', 'TEST3');
Using the NUM_DISTINCT and NUM_NULLS you can tell if the column has non-NULLs (num_distinct > 0), NULL only (num_distinct = 0 and num_nulls > 0), or no rows (num_distinct = 0 and num_nulls = 0).
TABLE_NAME COLUMN_NAME NUM_DISTINCT NUM_NULLS
---------- ----------- ------------ ---------
TEST1      A                      2         0
TEST2      B                      0         1
TEST3      C                      0         0
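If you'd rather have the query spell out the classification instead of reading it off the numbers, the same rules can be wrapped in a CASE expression; a small sketch against the same dictionary view:

select table_name, column_name,
       case
           when num_distinct > 0 then 'has non-NULL values'
           when num_nulls > 0 then 'NULL only'
           else 'no rows'
       end as column_status
from user_tab_columns
where table_name in ('TEST1', 'TEST2', 'TEST3');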
Certainly. Write a SQL script that:
Enumerates all of the tables
Enumerates the columns within the tables
Determines a count of rows in the table
Iterates over each column and counts how many rows are NULL in that column.
If the number of rows for the column that are null is equal to the number of rows in the table, you've found what you're looking for.
Here's how to do just one column in one table; if the COUNT comes back as anything higher than 0, it means there is data in it.
SELECT COUNT(<column_name>)
FROM <table_name>
WHERE <column_name> IS NOT NULL;
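If you would rather not repeat that for every column by hand, you could generate the statements from the data dictionary and then run the generated output; a sketch using USER_TAB_COLUMNS:

select 'select count(' || column_name || ') as non_null_rows from ' || table_name || ';' as check_sql
from user_tab_columns
order by table_name, column_id;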
This query returns what you want:
select table_name,column_name,nullable,num_distinct,num_nulls from all_tab_columns
where owner='SCHEMA_NAME'
and num_distinct is null
order by column_id;
You can use the script below to get the empty columns in a table:
SELECT column_name
FROM all_tab_cols
where table_name in (<table>)
and avg_col_len = 0;
I am having serious performance issues when using a nested loop in a WHERE clause.
When I run the below code as is, it takes several minutes. The trick is I'm using the WHERE clause to pull ALL data if the report_id is NULL, but only certain report_id's if I set them in the parameter string.
The function [fn_Parse_List] turns a VARCHAR string such as '123,456,789' into a table where each row is each number in integer form, which is then used in the IN clause.
When I run the code below with report_id = '456' (the commented-out portion), the code takes seconds, but passing the temporary table and using the SELECT statement in the WHERE clause kills it.
alter procedure dbo.p_revenue
(@report_id varchar(max) = NULL)
as
select cast(value as int) Report_ID
into #report_ID_Temp
from [fn_Parse_List] (@report_id)
SELECT *
FROM BIGTABLE a
where @report_id is null
or a.report_id in (select Report_ID from #report_ID_Temp)
--Where @report_id is null or a.report_id in (456)
exec p_revenue @report_id = '456'
Is there a way to optimize this? I tried a JOIN with the table #report_ID_Temp, but it still takes just as long and doesn't work when the report_id is NULL.
You're breaking three different rules.
If you want two query plans, you need two queries: OR does not give you two query plans. IF does.
If you have a temporary table, make sure it has a primary key and any appropriate indexes. In your case, you need an ALTER TABLE statement to add the primary key clustered index, or you can CREATE TABLE to declare the structure in the first place (see the sketch below).
If you think fn_Parse_List is a good idea, you haven't read enough Sommarskog.
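A sketch of how the first two points might be combined in the procedure (the fn_Parse_List call is kept only to stay close to the question; this is an illustration, not a drop-in rewrite):

alter procedure dbo.p_revenue
    (@report_id varchar(max) = NULL)
as
begin
    if @report_id is null
    begin
        -- One plan for the "all rows" case: a scan is the right plan here.
        select * from BIGTABLE;
    end
    else
    begin
        -- Declare the temp table explicitly and give it a clustered primary key.
        create table #report_ID_Temp (Report_ID int primary key);

        insert into #report_ID_Temp (Report_ID)
        select distinct cast(value as int)   -- distinct protects the primary key
        from dbo.fn_Parse_List(@report_id);

        -- A second, separate plan: the join lets the index on report_id be used.
        select a.*
        from BIGTABLE a
        join #report_ID_Temp t on t.Report_ID = a.report_id;
    end
end
GO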
If I were to write the Stored Procedure for your case, I would use a Table Valued Parameter (TVP) instead of passing multiple values as a comma-separated string.
Something like the following:
-- Create a type for the TVP
CREATE TYPE REPORT_IDS_PAR AS TABLE(
report_id INT
);
GO
-- Use the TVP type instead of VARCHAR
CREATE PROCEDURE dbo.revenue
@report_ids REPORT_IDS_PAR READONLY
AS
BEGIN
SET NOCOUNT ON;
IF NOT EXISTS(SELECT 1 FROM @report_ids)
SELECT
*
FROM
BIGTABLE;
ELSE
SELECT
*
FROM
@report_ids AS ids
INNER JOIN BIGTABLE AS bt ON
bt.report_id=ids.report_id;
-- OPTION(RECOMPILE) -- see remark below
END
GO
-- Execute the Stored Procedure
DECLARE @ids REPORT_IDS_PAR;
-- Empty table for all rows:
EXEC dbo.revenue @ids;
-- Specific report_id's for specific rows:
INSERT INTO @ids(report_id) VALUES(123),(456),(789);
EXEC dbo.revenue @ids;
GO
If you run this procedure with a TVP with a lot of rows or a wildly varying number of rows, I suggest you add the option OPTION(RECOMPILE) to the query.
I see 2 possible things that could help improve performance. Depends on which part is taking the longest. First off, SELECT INTO is a single threaded operation until SQL Server 2014. If this is taking a long time, create an explicitly defined temp table with CREATE TABLE. Secondly, depending on the number of records inserted into the temp table, you probably need an index on the Report_ID column. That can all be done in the body of the stored procedure. If you do end up using an explicitly defined temp table, I would create the index after the data is loaded.
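A minimal sketch of that last suggestion, reusing the names from the question (the index name is arbitrary), with the index created only after the load:

CREATE TABLE #report_ID_Temp (Report_ID INT NOT NULL);

INSERT INTO #report_ID_Temp (Report_ID)
SELECT CAST(value AS INT)
FROM dbo.fn_Parse_List(@report_id);

-- Build the index after the data is loaded, so the insert itself stays cheap.
CREATE CLUSTERED INDEX IX_report_ID_Temp ON #report_ID_Temp (Report_ID);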
If that doesn't help, first check that the report_id column on the BIGTABLE is indexed. Then try splitting the select into 2 and combining with a UNION ALL like this:
ALTER PROCEDURE dbo.p_revenue
(
@report_id VARCHAR(MAX) = NULL
)
AS
SELECT CAST(value AS INT) Report_ID
INTO #report_ID_Temp
FROM fn_Parse_List(@report_id);
SELECT *
FROM BIGTABLE
WHERE @report_id IS NULL
UNION ALL
SELECT *
FROM BIGTABLE a
WHERE a.report_id IN ( SELECT Report_ID
FROM #report_ID_Temp );
GO
EXEC p_revenue @report_id = '456';
Are you saying I should have two queries, one where it pulls everything if the report_id doesn't exist and one where there is a list of report_ids?
Yes, yes, yes. The fact that it somehow works when you enter the numbers directly distracts you from the core problem. You need a table scan when @report_id is null and an index seek when it is not, and you cannot have both in one execution plan. The performance would inevitably have to suffer, one way or another.
I would prefer not to, as the table I'm pulling from is actually a view with 800 lines and an additional parameter not shown above.
I do not see where the problem is; SELECT * FROM BIGTABLE and SELECT * FROM BIGVIEW seem the same. If you need parameters, you can use an inline table valued function. If you have more parameters with variable selectivity like @report_id, I guess you would end up with dynamic SQL anyway, sooner or later.
UNION ALL as proposed by @db_brad would help, but one of those subqueries is executed even when there is no need for it.
As a quick patch you can append OPTION(RECOMPILE) to the SELECT and have a table scan one time and an index seek the other time, but recompiling every time would induce nontrivial overhead.
I'm using a stored procedure to fetch data and I need to filter it dynamically. For example, if I don't want to fetch some data whose id is 5, 10 or 12, I'm sending it as a string to the procedure and converting it to a table via a user-defined function. But I must consider performance, so here is an example:
Solution 1:
SELECT *
FROM Customers
WHERE CustomerID NOT IN (SELECT Value
FROM dbo.func_ConvertListToTable('4,6,5,1,2,3,9,222',','));
Solution 2:
CREATE TABLE #tempTable (Value NVARCHAR(4000));
INSERT INTO #tempTable
SELECT Value FROM dbo.func_ConvertListToTable('4,6,5,1,2,3,9,222',',')
SELECT *
FROM BusinessAds
WHERE AdID NOT IN (SELECT Value FROM #tempTable)
DROP TABLE #tempTable
Which solution is better for performance?
You would probably be better off creating the #temp table with a clustered index and appropriate datatype
CREATE TABLE #tempTable (Value int primary key);
INSERT INTO #tempTable
SELECT DISTINCT Value
FROM dbo.func_ConvertListToTable('4,6,5,1,2,3,9,222',',')
You can also put a clustered index on the table returned by the TVF itself (see the sketch below).
As for which is better: with the TVF used directly, SQL Server will always assume that it returns 1 row, rather than recompiling after the #temp table is populated, so you would need to consider whether this assumption might cause sub-optimal query plans for the case that the list is large.
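The body of func_ConvertListToTable isn't shown in the question, so the following is purely illustrative (hypothetical split logic); the point is that the clustered index is declared as the PRIMARY KEY of the RETURNS table:

CREATE FUNCTION dbo.func_ConvertListToTable
(
    @list      NVARCHAR(4000),
    @delimiter NCHAR(1)
)
RETURNS @result TABLE (Value INT PRIMARY KEY) -- PRIMARY KEY = clustered index on the returned table
AS
BEGIN
    -- Naive split loop; assumes the list contains distinct integer values.
    DECLARE @s NVARCHAR(4000) = @list;
    DECLARE @pos INT = CHARINDEX(@delimiter, @s);
    WHILE @pos > 0
    BEGIN
        INSERT INTO @result (Value) VALUES (CAST(LEFT(@s, @pos - 1) AS INT));
        SET @s   = SUBSTRING(@s, @pos + 1, LEN(@s));
        SET @pos = CHARINDEX(@delimiter, @s);
    END;
    IF LEN(@s) > 0
        INSERT INTO @result (Value) VALUES (CAST(@s AS INT));
    RETURN;
END;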