So I have a table with two indexes:
Index_1: Column_A, Column_B
Index_2: Column_A, Column_B, Column_C
I am running a select query:
select * from table
where (Column_A, Column_B, Column_C)
in (('1','2','3'), ('4','5','6'), ...);
When using "EXPLAIN PLAN for" in SQL developer. It seems to be using the first index instead of the second, despite the second matching the values in my query?
Why is this? and is it hindering my optimal performance?
Expanding on my comment: although we can't analyze Oracle's query planning without knowing anything about the data or seeing the actual plan, the three-column index is not necessarily better suited to your query than the two-column index, at least if the base table has additional columns (which you are selecting) beyond those three.
Oracle is going to need to read the base table anyway to get the other columns. Supposing that the values in column_C are not too correlated with the values in column_A and column_B, the three-column index will be a lot larger than the two-column index. Using the two-column index may therefore involve reading fewer blocks overall, especially if that index is relatively selective.
Oracle has a very good query planner. If it has good table statistics to work with then it is probably choosing a good plan. Even without good statistics it will probably do a fine job for a query as simple as yours.
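If you suspect stale statistics, you can refresh them yourself. A minimal sketch, assuming the table lives in schema MY_SCHEMA and is called MY_TABLE (both names are placeholders):
BEGIN
  -- Refresh optimizer statistics so the planner can judge index selectivity.
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname => 'MY_SCHEMA',
    tabname => 'MY_TABLE',
    cascade => TRUE  -- also gather statistics on the table's indexes
  );
END;
/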
I had a similar problem: per the explain plan, Oracle used an index with 2 columns, even though my query selects 20 columns from the table and the where clause filters on 5 columns, as below:
from tab1
where col1 = 'A'
and col2 = 'b'
and col3 = 'c'
and col4 = 'd'
and col5 >= 10
Index1: col1, col2
Index2: col1, col2, col3, col4, col5
If I add a hint to use index2, the query executes much faster than with index1. What can be done so that Oracle chooses index2 on its own?
I tried to ensure statistics are gathered, but the optimizer still picks index1 as the best to use.
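For reference, the hint described above might look like this (a sketch; the index name index2 comes from the description above, and the alias t is an assumption):
select /*+ INDEX(t index2) */ *
from tab1 t
where col1 = 'A'
and col2 = 'b'
and col3 = 'c'
and col4 = 'd'
and col5 >= 10;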
In MS SQL Server 2016, I have a view that contains a query that looks something like this:
SELECT Col1, Col2, MAX(Col3) FROM TableA
WHERE Col1 = ???
GROUP BY Col1, Col2
I use this view all over the place, and as a result, most of my queries seem very inefficient, having to do an Index Scan on TableA on almost every SELECT in my system.
What I'd like to know is whether there is a way to store that MAX(Col3) so that it is computed during INSERTs and UPDATEs instead of during SELECTs.
Here are some thoughts and why I rejected them:
I don't think I can use a clustered index on the view, because SQL Server does not allow "MAX(Col3)" in an indexed view.
I don't see how I can use a filtered view.
I don't want to use triggers.
I would start with an index on tableA(col1, col2, col3), sketched below.
Depending on the size of the table and the number of matches on col1, this should be quite reasonable performance-wise.
Otherwise, you might need a summary table that is maintained using triggers.
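A minimal version of that index (the name is an assumption); with it, SQL Server can compute MAX(Col3) for each (Col1, Col2) group from an index seek instead of scanning the table:
-- Hypothetical name; the key columns are the view's GROUP BY columns plus the aggregated one.
CREATE INDEX ix_TableA_Col1_Col2_Col3
ON TableA (Col1, Col2, Col3);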
I have a table with ~250 columns and 10M rows. I am selecting 3 columns, with a where clause on an indexed column using an IN query. There are 2500 ids in the IN clause and the output is limited to 1000 rows; here's the rough query:
select col1, col2, col3 from table1 where col4 in (1, 2, 3, 4, etc) limit 1000;
This query takes much longer than I expected, ~1s. On an indexed integer column with only 2500 items to match, it seems like this should go faster, but maybe my assumption is incorrect. Here is the explain:
http://explain.depesz.com/s/HpL9
For simplicity I did not paste all 2500 ids into the EXPLAIN, so ignore the fact that there are only 3 in it. Is there anything I am missing here?
It looks like you're pushing the limits of select x where y IN (...) style queries: you have a very large table and a large set of values to search on.
Depending on the type of index (I'm guessing a B+tree), this kind of query can be inefficient. B+tree indexes do well with general-purpose range matching and inserts, while performing worse on single-value lookups, and your query is doing ~2500 single-value lookups on that index.
You have a few options to deal with this...
Use hash indexes (these perform much better on single-value lookups).
Help out the query optimizer by adding range-based constraints: take the 2500 values, find the min and max, and append where x_id > min_val and x_id < max_val to the query (see the sketch below).
Run the query in parallel if you have multiple db backends: break the 2500 constraints up into, say, 100 groups, run all the queries at once, and collect the results. It works better if you group the constraints by value.
The first option is certainly easier, but it comes at the price of making your inserts and deletes slower.
The second does not suffer from this, and you don't even need to limit it to one min/max group: you could create N groups with N min and max constraints. Test different groupings and see what works.
The last option is by far the best performing, of course.
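A sketch of the second option; the bounds 0 and 10000 are hypothetical stand-ins for the actual min and max of the 2500 ids:
select col1, col2, col3
from table1
where col4 > 0 and col4 < 10000  -- assumed min/max of the id list
and col4 in (1, 2, 3, 4 /* ...the rest of the ids... */)
limit 1000;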
Your query is equivalent to:
select col1, col2, col3
from table1
where
col4 = 1
OR col4 = 2
OR col4 = 3
OR col4 = 4
... repeat 2500 times ...
which is equivalent to:
select col1, col2, col3
from table1
where col4 = 1
UNION
select col1, col2, col3
from table1
where col4 = 2
UNION
select col1, col2, col3
from table1
where col4 = 3
... repeat 2500 times ...
Basically, it means that the index on a table with 10M rows is searched 2500 times. On top of that, if col4 is not unique, then each search is a scan that can return many rows, and the 2500 intermediate result sets then have to be combined together.
The server doesn't know that the 2500 IDs listed in the IN clause do not repeat, and it doesn't know that they are already sorted. So it has little choice but to do 2500 independent index seeks, remember the intermediate results somewhere (such as an implicit temp table), and then combine them together.
If you had a separate table table_with_ids with the list of 2500 IDs, which had a primary or unique key on ID, then the server would know that they are unique and they are sorted.
Your query would be something like this:
select col1, col2, col3
from
table_with_ids
inner join table1 on table_with_ids.id = table1.col4
The server may be able to perform such a join more efficiently.
I would test the performance using pre-populated (temp) table of 2500 IDs and compare it to the original. If the difference is significant, you can investigate further.
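A minimal sketch of that test in Postgres, assuming integer ids (the table and column names follow the question):
create temporary table table_with_ids (id integer primary key);
insert into table_with_ids (id) values (1), (2), (3) /* ...all 2500 ids... */;
analyze table_with_ids;  -- give the planner statistics on the id list

select t.col1, t.col2, t.col3
from table_with_ids ids
inner join table1 t on ids.id = t.col4
limit 1000;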
Actually, I'd start with running this simple query:
select col1, col2, col3
from table1
where
col4 = 1
and measure the time it takes to run. You can't get better than this, so you'll have a lower bound and a clear indication of what you can and can't achieve. Then maybe change it to where col4 in (1,2) and see how things change.
One more way to somewhat improve performance is to have an index not just on col4, but on (col4, col1, col2, col3). It would still be one index, just on several columns. (In SQL Server I would have col1, col2, col3 "included" in the index on col4 rather than made part of the key, to keep the index smaller; Postgres gained a comparable INCLUDE clause in version 11.) In this case the server should be able to retrieve all the data it needs from the index itself, without doing additional look-ups in the main table. This makes it a so-called "covering" index.
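Sketches of both variants (the index names are assumptions):
-- Multicolumn index that covers the query:
create index ix_table1_col4_cover on table1 (col4, col1, col2, col3);
-- PostgreSQL 11+ equivalent, keeping the extra columns out of the key:
create index ix_table1_col4_incl on table1 (col4) include (col1, col2, col3);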
Trying to understand what "covering a query" means, with a specific example.
If I have a table with say 3 columns:
Col1 Col2 Col3
And I put an index on Col1 and Col2
is "covering a query" determine by the columns selected in the SELECT or the columns in the the WHERE Clause?
Thus :
1) select Col1, Col2 from MyTable where Col3=XXX
2) Select Col3 from MyTable where Col1=xxx and Col2=yyy
3) Select Col1, Col2 from MyTable where Col1=xxx and Col2=yyy
Which of these three is truly "covered"?
Only the third example is covered. To be covered, a query must be fully satisfied from the index. Your first example produces results that are entirely within the index, but it needs information that is not part of the index to complete, and so is not covered. To match your first example, you need an index that lists Col3 first.
One important feature of indexes is the ability to include a set of columns in the index without actually indexing those columns. An example index for your table might look like this:
CREATE INDEX [ix_MyTable] ON [MyTable]
(
[Col1] ASC,
[Col2] ASC
)
INCLUDE ( [Col3])
Now samples 2 and 3 are both covered. Sample 1 is still not covered, because the index is still not useful for the WHERE clause.
Why INCLUDE Col3, rather than just listing it with the others? It's important to remember that as you add indexes or make them more complex, operations that change data using those indexes will require more and more work, because each change will also require updating the indexes. If you include a column in an index, without actually indexing it, an update to that column still needs to go back and update the index as well, so that the data in the index is accurate... but it doesn't also need to re-order the index based on the new value. So this saves some work for our database server. To put it another way, if a column will only be in the select list, and not in the where clause, you might get a small performance benefit by including it in an index to get the benefit of covering a query from the index, without actually indexing on the column.
It is not just the WHERE clause and the SELECT list; a GROUP BY clause also needs its columns to be covered by the index for it to be a covering index. Basically, to be covering, an index needs to contain all the columns used in the query for a given table. However, if the columns are not in the right order, the index won't be used.
If the column order in the index is (col1, col2, col3), then the index can't be used for query one, since you are filtering by col3. Think of it like a phone book sorted by last name, then first name, then middle initial. Finding everyone with the last name Smith is easy; finding everyone with the first name John isn't helped by the sorting, and you have to read the whole phone book. The same goes for the index: finding a col1 value is easy, and finding a col1 value and then col2 values is fine, but finding just col3 or just col2 is not helped by the index.
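To make the analogy concrete, a sketch using the table from the question (the index name is an assumption):
create index ix_MyTable_c1_c2_c3 on MyTable (Col1, Col2, Col3);
-- Helped by the index: the leading columns appear in the WHERE clause.
select Col3 from MyTable where Col1 = 'xxx' and Col2 = 'yyy';
-- Not helped: filtering on Col3 alone is like searching the phone book by middle initial.
select Col1, Col2 from MyTable where Col3 = 'XXX';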
The following is a query that I'm executing.
select col1, col2 from table1 where col1 in (select table2.col1 from table2) and col2 = 'ABC' ;
In table1, an index is available on col2; no index is available on col1.
I have around 4 million records for 'ABC' in table1.
The total size of table1 is around 50 million rows.
table2 is smaller, around 1 million rows. (No indexes are available on this table.)
The query takes a very long time to complete. If I remove the "in (select table2.col1 from table2)" condition, the query behaves the way an indexed query should.
My question is,
When an indexed column is used in the where clause and we add a condition on a non-indexed column (specifically an IN condition), is there a possibility of a performance hit? The explain plan for the query does not give any hint of a non-index fetch.
Also, does the order of the conditions matter?
i.e. if I put the indexed clause before the non-indexed clause, will Oracle apply the non-indexed clause only to the subset already chosen?
Thanks in advance.
The order of your predicates does not matter. The optimizer determines that.
It's not as simple as "an index is always better", so the optimizer tries to evaluate the "selectivity" of each predicate and then determines the "best" order.
If one predicate is determined to match a very small portion of the table, and an index exists, it is likely that indexed access will be used. In your case, I don't think an index will help you, unless the rows are physically sorted (on disk) by col2.
If you can share the execution plans, we could probably help you.
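One way to capture the actual execution plan in Oracle, as opposed to the estimated one, is the GATHER_PLAN_STATISTICS hint plus DBMS_XPLAN (a sketch; the query is the one from the question):
select /*+ GATHER_PLAN_STATISTICS */ col1, col2
from table1
where col1 in (select table2.col1 from table2)
and col2 = 'ABC';

-- Then, in the same session, show the plan with actual row counts:
select * from table(dbms_xplan.display_cursor(format => 'ALLSTATS LAST'));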
I am trying to figure out a way to optimize the below SQL query:
select * from SOME_TABLE
where (col1 = 123 and col2 = 'abc')
or (col1 = 234 and col2 = 'cdf')
or (col1 = 755 and col2 = 'cvd') -- I have around 2000 OR'd pairs like this in a single query
Currently this query takes a long time to execute, so is there anyway to make this query run faster?
Create a lookup table, e.g. CREATE TABLE lookup (col1 INT, col2 VARCHAR(3), PRIMARY KEY (col1, col2), KEY (col2)) in MySQL (add ORGANIZATION INDEX for an index-organized table in Oracle), or whatever fits your needs.
Make sure you have indexes on your original table (col1 and col2).
Populate the lookup table with your 2000 combinations (a sketch follows the query below).
Now query:
SELECT
mytable.*
FROM mytable
INNER JOIN lookup ON mytable.col1=lookup.col1 AND mytable.col2=lookup.col2
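The population step might look like this, using the pairs from the question (multi-row VALUES works in MySQL and Postgres; older Oracle versions need INSERT ALL or separate inserts):
INSERT INTO lookup (col1, col2) VALUES
(123, 'abc'),
(234, 'cdf'),
(755, 'cvd');  -- ...and the remaining ~2000 pairs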
Difficult to say without seeing the query plan, but I'd imagine this resolves to a full table scan (FTS) with a lot of CPU spent on the OR logic.
If the general pattern is col1 = x and col2 = y, then try creating a table with your 2000 pairs and joining against it instead. If the 2000 pairs come from other tables, factor the select statement that retrieves them straight into your SELECT statement here.
Also make sure you've got all your unique and NOT NULL constraints in place, as that will make a difference. Consider an index on (col1, col2), though don't be surprised if the optimizer doesn't use it.
Not sure if that's going to do the trick, but post more details if not.
Select only the columns you need, not all of them (*)... but surely you know that.
And if you have more than 2000 ORs in your SQL statement, maybe it's time to change the approach! If you tell us a bit more about your database, we can help you better.