I am joining two tables, and I have an index on each join column, but Oracle still chooses a full table scan on both tables instead of an index scan.
The query is:
SELECT * FROM Table1, Table2 WHERE Tab1_col = Tab2_col
The execution plan shows a full table scan on both tables.
An Azure SQL Server database has a simple table with three columns, all of which are part of the clustered index. A select query that filters on two of those columns and returns the third uses an Index Scan instead of an Index Seek. A total of 500k records are scanned, CPU usage is high, and execution is slow.
Clustered Index
Are there any reasons to not use Index Seek? Should I create the non-clustered index as suggested?
I have a patient table with a few columns, and a clustered index on column ID and a non-clustered index on column birth.
create clustered index CI_patient on dbo.patient (ID)
create nonclustered index NCI_patient on dbo.patient (birth)
Here are my queries:
select * from patient
select ID from patient
select birth from patient
Looking at the execution plan, the first query is a 'clustered index scan' (which is understandable because the table is a clustered table), and the third one is an 'index scan nonclustered' (which is also understandable because this column has a nonclustered index).
My question is: why is the second one an 'index scan nonclustered'? The ID column has a clustered index, so shouldn't that be a clustered index scan? Any thoughts on this?
Basically, your second query wants to get all ID values from the table (no WHERE clause or anything).
SQL Server can do this two ways:
a clustered index scan - basically a full table scan that reads all the data from all rows and extracts the ID from each row. This would work, but it loads the WHOLE table, row by row
a scan across the non-clustered index, because each non-clustered index also includes the clustering column(s) at its leaf level. Since this index is much smaller than the full table, SQL Server needs to load fewer data pages and can therefore provide the answer - all ID values from all rows - faster than with a full table scan (clustered index scan)
The cost-based optimizer in SQL Server just picks the more efficient route to get the answer to the question you've asked with your second query.
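The same idea can be explored outside SQL Server. The following is a minimal sketch using Python's built-in sqlite3 module; it is an analogy, not SQL Server itself. SQLite's ordinary tables also store the primary key in every secondary index entry, so a query for only the ID can in principle be answered from the smaller index. The name column and the sample rows are made up for illustration, and the plan text printed depends on the SQLite version.

```python
import sqlite3

# In-memory database; the "name" column and the rows are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE patient (ID INTEGER PRIMARY KEY, name TEXT, birth TEXT);
    CREATE INDEX NCI_patient ON patient (birth);
    INSERT INTO patient (ID, name, birth) VALUES
        (1, 'Ann', '1980-01-01'),
        (2, 'Bob', '1975-06-15'),
        (3, 'Cy',  '1990-12-31');
""")

def plan(sql):
    """Return the human-readable 'detail' column of EXPLAIN QUERY PLAN."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

# The three queries from the question; compare which structure each one reads.
for query in ("SELECT * FROM patient",
              "SELECT ID FROM patient",
              "SELECT birth FROM patient"):
    print(query, "->", plan(query))

# Whichever structure is scanned, the result set is the same.
ids = sorted(row[0] for row in conn.execute("SELECT ID FROM patient"))
print(ids)
```

If the plan for the second query mentions NCI_patient, the optimizer has made the same choice described above: it reads the smaller secondary index because that index already carries every ID.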
There are the following scenarios:
Use PG to execute the query as follows:
Select count(*) from t where DATETIME >'2018-07-27 10.12.12.000000' and DATETIME < '2018-07-28 10.12.12.000000'
It returns 22 indexes with rapid execution.
The query condition has "="
Select count(*) from t where DATETIME >='2018-07-27 10.12.12.000000' and DATETIME <= '2018-07-28 10.12.12.000000'
It returns 22 indexes but takes 20 s.
I find that the query without “=” chooses an index scan, whereas the query with “=” partly uses a table scan.
According to your question:
The current indexing mechanism is that the optimizer matches the first available index: the query selects the first index that was created, so the choice of index depends on the order in which the indexes were created. When an index is available, the query takes the index scan first.
Make sure that the nodes in each data group contain the index; otherwise the data nodes without the index will fall back to a table scan.
Run analyze to optimize the query. Analyze is a new feature of SequoiaDB v3.0. It is mainly used to analyze collections and index data, collect statistical information, and provide an optimal query algorithm that decides between an index scan and a table scan. For specific usage of analyze, see: http://doc.sequoiadb.com/cn/index-cat_id-1496923440-edition_id-300
View the access plan with find.explain() to see the query cost.
Let's say your table has three columns:
time (integer)
name (varchar)
other_column (varchar)
and you have two indexes:
CREATE INDEX index_time ON my_table (time);
CREATE INDEX index_name ON my_table (name);
In this case, does it make any difference if I create a new index based on both time and name? i.e.:
CREATE INDEX index_name_and_time ON my_table (name,time);
With regard to overall performance, the three indexes may be overkill and have a detrimental effect when inserting, as there are then three indexes to maintain, along with the extra memory/space utilisation.
However, the first factor would be to ascertain if the indexes would actually be utilised which depends upon what queries are to be run.
From a brief play with the following code, which you could use as the basis to explore more fully (EXPLAIN QUERY PLAN your_query being a tool to use):-
DROP TABLE IF EXISTS my_table;
DROP INDEX IF EXISTS index_time;
DROP INDEX IF EXISTS index_name;
DROP INDEX IF EXISTS index_name_and_time;
CREATE TABLE IF NOT EXISTS my_table (time INTEGER, name TEXT, other TEXT);
CREATE INDEX IF NOT EXISTS index_time ON my_table (time); -- INDEX 1
-- CREATE INDEX IF NOT EXISTS index_name ON my_table (name); -- INDEX 2
-- CREATE INDEX index_name_and_time ON my_table (name,time); -- INDEX 3
EXPLAIN QUERY PLAN
SELECT * FROM my_table; -- QUERY 1
-- EXPLAIN QUERY PLAN
-- SELECT time, name, other FROM my_table; -- QUERY 2
-- EXPLAIN QUERY PLAN
-- SELECT time, name, other FROM my_table ORDER BY time, name; -- QUERY 3
-- EXPLAIN QUERY PLAN
-- SELECT time, name, other FROM my_table ORDER BY name, time; -- QUERY 4
The following results can be obtained :-
First two Queries, no advantage, just disadvantage.
Having no indexes through to having all 3 makes no difference to the first 2 queries (basically the same). None use any of the indexes when 0,1,2 or 3 indexes are available. They use SCAN TABLE my_table
The 3rd Query
Without any indexes then SCAN TABLE my_table and USE TEMP B-TREE FOR ORDER BY
With just the first index SCAN TABLE my_table USING INDEX index_time and USE TEMP B-TREE FOR RIGHT PART OF ORDER BY.
With the 1st and 2nd SCAN TABLE my_table USING INDEX index_time and USE TEMP B-TREE FOR RIGHT PART OF ORDER BY.
With just the 2nd SCAN TABLE my_table and USE TEMP B-TREE FOR ORDER BY
With all 3 SCAN TABLE my_table USING INDEX index_time and USE TEMP B-TREE FOR RIGHT PART OF ORDER BY
With just the 3rd SCAN TABLE my_table and USE TEMP B-TREE FOR ORDER BY
The 4th query
Without any SCAN TABLE my_table and USE TEMP B-TREE FOR ORDER BY
With 1 SCAN TABLE my_table and USE TEMP B-TREE FOR ORDER BY
With 1 and 2 SCAN TABLE my_table USING INDEX index_name and USE TEMP B-TREE FOR RIGHT PART OF ORDER BY
With 2 SCAN TABLE my_table USING INDEX index_name and USE TEMP B-TREE FOR RIGHT PART OF ORDER BY
With 1,2 and 3 SCAN TABLE my_table USING INDEX index_name_and_time
With just 3 SCAN TABLE my_table USING INDEX index_name_and_time
Of course this is not factoring in timings, as the tables are empty. The code above could easily be adapted to include data so that timings can be measured. Note that you may also want to consider effects other than running queries, such as insertions and deletions, which would alter the indexes.
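Rather than commenting indexes and queries in and out by hand, the combinations can be generated automatically. This is a rough sketch using Python's built-in sqlite3 module, reusing the index and query definitions from the script above; the helper name plans_for and the dictionary layout are my own choices, and the exact plan wording differs between SQLite versions (for example SCAN TABLE my_table versus SCAN my_table).

```python
import itertools
import sqlite3

# Index and query definitions taken from the script above.
INDEXES = {
    1: "CREATE INDEX index_time ON my_table (time)",
    2: "CREATE INDEX index_name ON my_table (name)",
    3: "CREATE INDEX index_name_and_time ON my_table (name, time)",
}
QUERIES = {
    3: "SELECT time, name, other FROM my_table ORDER BY time, name",
    4: "SELECT time, name, other FROM my_table ORDER BY name, time",
}

def plans_for(index_ids):
    """Build a fresh empty table with the given indexes and return each query's plan."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE my_table (time INTEGER, name TEXT, other TEXT)")
    for index_id in index_ids:
        conn.execute(INDEXES[index_id])
    result = {}
    for query_id, sql in QUERIES.items():
        rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
        result[query_id] = "; ".join(row[3] for row in rows)
    conn.close()
    return result

# Every subset of the three indexes: none, each alone, each pair, all three.
combos = [c for n in range(len(INDEXES) + 1)
          for c in itertools.combinations(INDEXES, n)]
all_plans = {combo: plans_for(combo) for combo in combos}

for combo, plans in all_plans.items():
    print("indexes:", combo or "none")
    for query_id, detail in plans.items():
        print("  query", query_id, "->", detail)
```

To move from plans to timings, the same loop could insert rows before running each query and wrap the execution in time.perf_counter() calls.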
The Answer - It depends.
So at least from an index utilisation point of view, it's quite clear that whether an index is useful depends upon the queries used.
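The (name) index is redundant once the third index, (name, time), exists, because a query that filters or sorts on name alone can use the leading column of (name, time).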
You should probably drop the (name) index and just include (name, time) and (time) -- if those are the indexes that you think you need.
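The redundancy can be checked with a quick experiment. This is a small sketch in Python's built-in sqlite3 module, assuming only the composite index exists; the literal values 'alice' and 42 are placeholders, and the plan text varies by SQLite version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (time INTEGER, name TEXT, other_column TEXT)")
# Only the composite index is created; there is no separate index on (name) or (time).
conn.execute("CREATE INDEX index_name_and_time ON my_table (name, time)")

def plan(sql):
    """Return the EXPLAIN QUERY PLAN detail text for a query as one string."""
    return "; ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

# Filtering on the leading column (name) can use the composite index.
by_name = plan("SELECT * FROM my_table WHERE name = 'alice'")
# Filtering on the second column alone (time) cannot use it for a lookup here.
by_time = plan("SELECT * FROM my_table WHERE time = 42")

print("by name:", by_name)
print("by time:", by_time)
```

The lookup by name goes through index_name_and_time even though no (name) index exists, while the lookup by time does not, which is why a separate (time) index is still worth keeping if you query on time alone.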
How does SQL actually run?
For example, if I want to find a row with row_id=123, will SQL query search row by row from the top of memory?
This is a topic of query optimization. Briefly speaking, based on your query, the database system first tries to generate and optimize a query plan that possibly has optimal performance, then executes that plan.
For selections like row_id = 123, the actual query plan depends on whether you have an index or not. If you do not, a table scan will be used to examine the table row by row. But if you do have an index on row_id, the database can skip most of the rows by using the index. In this case, the DB will not search row by row.
If you're running PostgreSQL or MySQL, you can use
EXPLAIN SELECT * FROM table WHERE row_id = 123;
to see the query plan generated by your system.
For an example table,
CREATE TABLE test(row_id INT); -- without index
COPY test FROM '/home/user/test.csv'; -- 40,000 rows
The EXPLAIN SELECT * FROM test WHERE row_id = 123 outputs:
QUERY PLAN
------------------------------------------------------
Seq Scan on test (cost=0.00..677.00 rows=5 width=4)
Filter: (row_id = 123)
(2 rows)
which means the database will do a sequential scan on the whole table and find the rows with row_id = 123.
However, if you create an index on the column row_id:
CREATE INDEX test_idx ON test(row_id);
then the same EXPLAIN will tell us that the database will use an index scan to avoid going through the whole table:
QUERY PLAN
--------------------------------------------------------------------------
Index Only Scan using test_idx on test (cost=0.00..8.34 rows=5 width=4)
Index Cond: (row_id = 123)
(2 rows)
You can also use EXPLAIN ANALYZE to see actual performance of your SQL queries. On my machine, the total runtimes for sequential scan and index scan are 14.738 ms and 0.171 ms, respectively.
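If you do not have a PostgreSQL server at hand, the same before/after comparison can be tried with Python's built-in sqlite3 module. This is a sketch under a few assumptions: SQLite stands in for PostgreSQL, so the plan wording differs (SCAN and SEARCH instead of Seq Scan and Index Only Scan), and the 40,000 rows are generated in a loop instead of being loaded from a CSV file.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (row_id INT)")  # no index yet
# Generate 40,000 rows in place of the COPY from a CSV file.
conn.executemany("INSERT INTO test (row_id) VALUES (?)",
                 ((i,) for i in range(40000)))

QUERY = "SELECT * FROM test WHERE row_id = 123"

def plan():
    """Return SQLite's plan description for QUERY as one string."""
    return "; ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY))

plan_without_index = plan()   # the whole table has to be scanned

conn.execute("CREATE INDEX test_idx ON test (row_id)")
plan_with_index = plan()      # the index can be searched instead

rows = conn.execute(QUERY).fetchall()

print("without index:", plan_without_index)
print("with index:   ", plan_with_index)
print("result:", rows)
```

The query returns the same row in both cases; only the access path changes once test_idx exists.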
For details of query optimization, refer to Chapters 15 and 16 of Database Systems: The Complete Book.