Indexed query along with non-indexed conditions - SQL

The following is a query that I'm executing.
select col1, col2 from table1 where col1 in (select table2.col1 from table2) and col2 = 'ABC' ;
In table1, an index is available on col2; no index is available on col1.
I have around 4 million records for 'ABC' in table1.
The total size of the table1 is around 50 million.
Table2 is smaller, around 1 million rows. (No indexes are available on this table.)
The query takes a long time to complete. If I remove the "in (select table2.col1 from table2)" condition, the query behaves the way an indexed query should.
My question is,
If we have an indexed column being used in the where clause and we include a condition on a non-indexed column (specifically an IN condition), is there a possibility of a performance hit? The explain plan for the query does not give any hint of a non-indexed fetch.
Also, does the order of the conditions matter?
i.e. if I put the indexed condition before the non-indexed condition, will Oracle apply the non-indexed condition only to the subset already chosen?
Thanks in advance.

The order of your predicates does not matter. The optimizer determines that.
It's not as simple as "an index is always better", so the optimizer tries to evaluate the "selectivity" of each predicate and then determine the "best" order.
If one predicate is determined to match a very small portion of the table, and an index exists, it is likely that indexed access will be used. In your case, I don't think an index will help you unless the rows are physically sorted (on disk) by col2.
If you can share the execution plans, we could probably help you.
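If you are on Oracle, a quick way to capture the plan is EXPLAIN PLAN plus DBMS_XPLAN; a minimal sketch, using the table and column names from the question:

EXPLAIN PLAN FOR
  SELECT col1, col2
  FROM table1
  WHERE col1 IN (SELECT table2.col1 FROM table2)
    AND col2 = 'ABC';

-- Display the plan that was just explained
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

Posting that output (estimated rows, access paths, join method) would make it much easier to see why the IN subquery changes the plan.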

Related

SQL: index on "order by" column when a query also has a lot of "where" predicates

Suppose I have this sql query:
select * from my_table
where col1 = 'abc' and col2 = 'qwe' and ... --e.g. 10 predicates or more
order by my_date desc
Will an index on the my_date column alone even be used by the DB? Will it improve performance somehow?
I'm more interested in Postgres.
The PostgreSQL optimizer will use the index if it thinks that that is cheaper than fetching the rows that match the WHERE condition and sorting them.
This will probably be the case if:
there are many such rows, and sorting would be more expensive than the index scan
there are no indexes to support the WHERE condition
Without a LIMIT, the chances of using the single-column index to provide the order here are pretty low. Indeed, I can't contrive a situation that does so without monkeying around with enable_sort or enable_seqscan.
Even with a LIMIT, after applying 10 equality conditions it will be pretty unusual for the expected number of rows left over to be high enough to make the index appear to be worthwhile.
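As a hedged illustration (the index and column names come from the question), you can see which strategy the planner picks by comparing the plans with and without a LIMIT:

-- Hypothetical single-column index on the ORDER BY column
CREATE INDEX my_table_my_date_idx ON my_table (my_date);

-- With a LIMIT the planner may walk the index backwards and filter on the fly;
-- without one it will usually filter first and sort the surviving rows.
EXPLAIN
SELECT * FROM my_table
WHERE col1 = 'abc' AND col2 = 'qwe'
ORDER BY my_date DESC
LIMIT 10;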

SQL 'case when' vs 'where' efficiency

Which is more efficient:
Select SUM(case when col2=2 then col1 Else 0 End) From myTable
OR
Select SUM(Col1) From myTable where col2=2
Or are they the same speed?
Definitely, the second one should be faster. This is because of the concept of "access". Access refers to the amount of data the query needs to retrieve in order to produce the result, and it has a big impact on the operators the database engine's optimizer decides to include in the execution plan.
Save for some exceptions, the first query needs to access all the table's rows and then compute the result, including rows that have nothing to do with the CASE condition.
The second query only refers to the specific rows needed to compute the result. Therefore, it has the potential to be faster. For that to materialize, the presence of indexes is crucial. For example:
create index ix1 on myTable (col2);
In this case it will only access the subset of rows that match the filtering predicate col2 = 2.
The second is more efficient:
It would generally process fewer rows (assuming there are non-"2" values), because rows would be ignored before the aggregation function is called.
It allows the optimizer to take advantage of indexes.
It allows the optimizer to take advantage of table partitions.
Under some circumstances, they might appear to take the same amount of time, particularly on small tables.
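One way to verify this on your own data is to create the index from the answer above and compare the plans of the two statements; a rough sketch (EXPLAIN syntax varies by engine, this is the PostgreSQL/MySQL flavor):

CREATE INDEX ix1 ON myTable (col2);

-- Filter-then-aggregate: can use the index on col2
EXPLAIN SELECT SUM(col1) FROM myTable WHERE col2 = 2;

-- Aggregate over every row: typically a full scan
EXPLAIN SELECT SUM(CASE WHEN col2 = 2 THEN col1 ELSE 0 END) FROM myTable;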

DB2 query using zero value in IN CLAUSE is causing table scan, index on column is ignored

SELECT * FROM TABLE1 WHERE COL1 in( 597966104, 597966100);
SELECT * FROM TABLE1 WHERE COL1 in( 0, 597966100)
In the above 2 queries, the first query uses the index created on COL1, but the second query does not use the index. The only difference between the two queries is that zero (0) is used in the IN clause of the second query. Why is the zero causing the index to be ignored? This leads to a table scan and slows down the query. Is there any solution to this problem? Any help on this issue is welcome and appreciated. The database used is DB2.
DB2 has a cost-based optimizer. It tries to figure out the best access plan and uses its statistics and configuration to determine it.
In your case, the number of rows with col1 = 0 could really matter. For example, when col1 = 0 for 40% of your data, it could be cheaper to do the table scan.
If you want to figure out more details, explain the query and you will see how the data is accessed and how many rows the optimizer estimates for the result set.
Make sure you have the correct and up-to-date statistics by running runstats for the table(s) as this will be the most important source of information for the optimizer.
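A hedged sketch of those two steps on DB2 LUW (the schema name is a placeholder; RUNSTATS is typically run from the command line processor):

-- Refresh optimizer statistics for the table
RUNSTATS ON TABLE MYSCHEMA.TABLE1 WITH DISTRIBUTION AND INDEXES ALL;

-- Capture the access plan for the slow statement (requires the explain tables)
EXPLAIN PLAN FOR
  SELECT * FROM TABLE1 WHERE COL1 IN (0, 597966100);
-- Then format the explain tables, e.g. with the db2exfmt utility, to see whether
-- an index scan or a table scan was chosen and what row counts were estimated.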

Optimize Oracle SELECT on large dataset

I am new to Oracle (working on 11gR2). I have a table TABLE with something like ~10 million records in it, and this pretty simple query:
SELECT t.col1, t.col2, t.col3, t.col4, t.col5, t.col6, t.col7, t.col8, t.col9, t.col10
FROM TABLE t
WHERE t.col1 = val1
AND t.col11 = val2
AND t.col12 = val3
AND t.col13 = val4
The query currently takes about 30s to 1 min.
My question is: how can I improve performance? After a lot of research, I am aware of the classic ways to improve performance, but I have some problems:
Partitioning: I can't really, the table is used in another project and it would be too impactful. Plus, it only delays the problem, given the number of rows inserted into the table every day.
Add an index: The thing is, the columns used in the WHERE clause are not the ones returned by the query (except for one). Thus, I have not been able to find an appropriate index yet. As far as I know, setting an index on 12~13 columns does not make a lot of sense (or does it?).
Materialized views: I must say I have never used them, but I understand the maintenance cost is pretty high, and my table is updated quite often.
I think the best way to do this would be to add an appropriate index, but I can't find the right columns on which it should be created.
An index makes sense provided that your query results in a small percentage of all rows. You would create one index on all four columns used in the WHERE clause.
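A minimal sketch of such a composite index, using the column names from the question (the index name is made up, and TABLE stands in for your real table name; put the most selective column first if you know it):

CREATE INDEX ix_table_where ON TABLE (col1, col11, col12, col13);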
If too many records match, then a full table scan will be done. You may be able to speed this up by having this done in parallel threads using the PARALLEL hint:
SELECT /*+parallel(t,4)*/
t.col1, t.col2, t.col3, t.col4, t.col5, t.col6, t.col7, t.col8, t.col9, t.col10
FROM TABLE t
WHERE t.col1 = val1 AND t.col11 = val2 AND t.col12 = val3 AND t.col13 = val4;
A table with 10 million records is quite a small table. You just need to create an appropriate index. Which columns to index depends on their content. For example, if you have a column that contains only "1" and "0", or "yes" and "no", you shouldn't index it on its own. The more distinct values a column contains, the more effective an index on it is. You can also create an index on two or three (or more) columns, or a function-based index (in this case the index contains the results of your SQL function, not the column values). You can also create more than one index on a table.
And in any case, if your query selects more than 20-30% of all table records, an index will not help.
Also, you said that the table is used by others. In that case, you need to cooperate with them to avoid duplicating indexes.
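As a hedged illustration of the kinds of indexes mentioned above (the table, column, and index names are made up):

-- Composite index on two reasonably selective columns
CREATE INDEX ix_t_multi ON my_table (col_a, col_b);

-- Function-based index: stores the result of the expression, so a predicate
-- like WHERE UPPER(col_c) = 'ABC' can use it
CREATE INDEX ix_t_upper ON my_table (UPPER(col_c));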
Indexes on each of the columns referenced in the WHERE clause will help performance of a query against a table with a large number of rows, where you are seeking a small subset, even if the columns in the WHERE clause are not returned in the SELECT column list.
The downside of course is that indexes impede insert/update performance. So when loading the table with large numbers of records, you might need to disable/drop the indexes prior to loading and then re-create/enable them again afterwards.
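A hedged Oracle sketch of that load pattern (the index name is a placeholder):

ALTER INDEX ix_table_where UNUSABLE;   -- before the bulk load
-- ... load the data ...
ALTER INDEX ix_table_where REBUILD;    -- afterwards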

Do indexes work with group functions in Oracle?

I am running the following query.
SELECT Table_1.Field_1,
Table_1.Field_2,
SUM(Table_1.Field_5) BALANCE_AMOUNT
FROM Table_1, Table_2
WHERE Table_1.Field_3 NOT IN (1, 3)
AND Table_2.Field_2 <> 2
AND Table_2.Field_3 = 'Y'
AND Table_1.Field_1 = Table_2.Field_1
AND Table_1.Field_4 = '31-oct-2011'
GROUP BY Table_1.Field_1, Table_1.Field_2;
I have created an index on columns (Field_1, Field_2, Field_3, Field_4) of Table_1, but the index is not getting used.
If I remove SUM(Table_1.Field_5) from the select clause, then the index is used.
I am confused whether the optimizer is not using this index or whether it is because of the SUM() function I have used in the query.
Please share your explanation.
When you remove the SUM you also remove field_5 from the query. All the data needed to answer the query can then be found in the index, which may be quicker than scanning the table. If you added field_5 to the index the query with SUM might use the index.
If your query is returning a large percentage of the table's rows, Oracle may decide that doing a full table scan is cheaper than "hopping" between the index and the table's heap (to get the values of Table_1.Field_5).
Try adding Table_1.Field_5 to the index (thus covering the whole query with the index) and see if this helps.
See "Index-Only Scan: Avoiding Table Access" at Use The Index Luke for a conceptual explanation of what is going on.
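A hedged sketch of that covering index (the index name is made up), so the whole query can be answered from the index without touching the table:

CREATE INDEX ix_t1_covering
  ON Table_1 (Field_1, Field_2, Field_3, Field_4, Field_5);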
As you mentioned, the presence of the summation function results in the index being overlooked.
There are function based indexes:
A function-based index includes columns that are either transformed by a function, such as the UPPER function, or included in an expression, such as col1 + col2.
Defining a function-based index on the transformed column or expression allows that data to be returned using the index when that function or expression is used in a WHERE clause or an ORDER BY clause. Therefore, a function-based index can be beneficial when frequently-executed SQL statements include transformed columns, or columns in expressions, in a WHERE or ORDER BY clause.
However, as with everything, function-based indexes have their restrictions:
Expressions in a function-based index cannot contain any aggregate functions. The expressions must reference only columns in a row in the table.
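To illustrate that restriction with a hedged sketch (the index names are made up):

-- Allowed: the expression only references columns of a single row
CREATE INDEX ix_t1_expr ON Table_1 (Field_1 + Field_2);

-- Not allowed: aggregate functions cannot appear in a function-based index,
-- so Oracle rejects a definition like the following
-- CREATE INDEX ix_t1_sum ON Table_1 (SUM(Field_5));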
Though I see some good answers here, a couple of important points are being missed:
SELECT Table_1.Field_1,
Table_1.Field_2,
SUM(Table_1.Field_5) BALANCE_AMOUNT
FROM Table_1, Table_2
WHERE Table_1.Field_3 NOT IN (1, 3)
AND Table_2.Field_2 <> 2
AND Table_2.Field_3 = 'Y'
AND Table_1.Field_1 = Table_2.Field_1
AND Table_1.Field_4 = '31-oct-2011'
GROUP BY Table_1.Field_1, Table_1.Field_2;
Saying that having SUM(Table_1.Field_5) in the select clause causes the index not to be used is not correct. Your index on (Field_1, Field_2, Field_3, Field_4) can still be used. But there are problems with your index and SQL query.
Since your index is only on (Field_1, Field_2, Field_3, Field_4), even if the index gets used the DB will have to access the actual table rows to fetch Field_5 for the SUM. Which option is cost-effective depends entirely on the execution plan charted out by the SQL optimizer: if it figures out that a full table scan costs less than using the index, it will ignore the index. That said, here are the probable problems with your index:
As others have stated, you could simply add Field_5 to the index so that there is no need for a separate table access.
The column order of your index matters a great deal for performance. For example, in your case, if you order it as (Field_4, Field_1, Field_2, Field_3) it will be quicker, since you have an equality predicate on Field_4 (Table_1.Field_4 = '31-oct-2011'). Think of it this way:
Table_1.Field_4 = '31-oct-2011' gives you fewer rows to choose the final result from than Table_1.Field_3 NOT IN (1, 3) does. Things might change since you are doing a join; it's always best to look at the execution plan and design your index/SQL accordingly.
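Putting those two points together, a hedged sketch of a reordered index (the name is made up; Field_4 leads because of the equality predicate, and Field_5 is appended so the SUM needs no table access):

CREATE INDEX ix_t1_f4_lead
  ON Table_1 (Field_4, Field_1, Field_2, Field_3, Field_5);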