sqlite3 DB doesn't use index unless vacuumed - indexing

Performing VACUUM on my DB significantly improves query performance. While trying to determine why, I found that sqlite3 isn't using the index on the DB in its original state; the plan shows just a generic SCAN TABLE:
QUERY PLAN
|--SCAN TABLE data <--- no Index
|--USE TEMP B-TREE FOR GROUP BY
`--USE TEMP B-TREE FOR ORDER BY
After performing VACUUM, the QUERY PLAN shows a SEARCH USING INDEX, as it should:
QUERY PLAN
|--SEARCH TABLE data USING INDEX index_name (name=?)
|--USE TEMP B-TREE FOR GROUP BY
`--USE TEMP B-TREE FOR ORDER BY
How can I determine why the index isn't being used before the vacuum operation?
I have the EXPLAIN results as well, but I'm not sure they'd be useful. They are clearly different: the original, non-vacuumed DB performs a Rewind/loop, whereas the vacuumed DB does an OpenRead on the index.
Thank you,
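One way to start investigating is to look at the plan together with the planner statistics stored in sqlite_stat1 (which only exists after ANALYZE has been run). The following is a minimal sketch using Python's built-in sqlite3 module, not a diagnosis of this particular DB: the table layout, the sample rows, and the query are assumptions made up for illustration, and only the table name data and the index name index_name come from the plans above.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the 'detail' column of EXPLAIN QUERY PLAN as a list of strings."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]  # detail is the last column


def planner_stats(conn):
    """Return the rows of sqlite_stat1, or [] if ANALYZE has never been run."""
    exists = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type='table' AND name='sqlite_stat1'"
    ).fetchone()
    if not exists:
        return []
    return conn.execute("SELECT tbl, idx, stat FROM sqlite_stat1").fetchall()


# Hypothetical schema and data; only 'data' and 'index_name' come from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (name TEXT, value INTEGER)")
conn.execute("CREATE INDEX index_name ON data (name)")
conn.executemany(
    "INSERT INTO data VALUES (?, ?)",
    [(f"n{i % 50}", i) for i in range(1000)],
)
conn.commit()

# Hypothetical query shaped like the plans in the question.
sql = "SELECT value, COUNT(*) FROM data WHERE name = ? GROUP BY value ORDER BY value"

plan_before = query_plan(conn, sql, ("n1",))
stats_before = planner_stats(conn)   # no statistics yet

conn.execute("ANALYZE")              # refresh the planner statistics
stats_after = planner_stats(conn)
plan_after = query_plan(conn, sql, ("n1",))

for line in plan_after:
    print(line)
```

Running the same two helpers against a copy of the original DB and against the vacuumed copy shows whether the stored statistics differ between them, which is one possible reason for the planner to choose differently.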

Related

How avoid using group by SQL ORACLE

If I include a GROUP BY clause in my SQL statement, my index is not used.
Here is the code:
The GROUP BY clause is not preventing index access. Indexes aren't used because there's no filtering on the table. Without any WHERE clause, the entire table must be read so there's no point in using an index. (Indexes are faster for retrieving a small percentage of rows from a table, but full table scans are faster for retrieving a large percentage of rows.)
A covering index might help. If you build an index that contains all the columns used in the statement, Oracle could read from that index like it's a skinny version of the table.
For example:
create index purchage_idx on purchase(purchaseno, servicetype, paymenttype, gst);
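The covering-index idea can be tried out quickly with Python's built-in sqlite3 module. Note the assumptions: this uses SQLite as a stand-in, not Oracle, so the plan text and optimizer behaviour differ; the GROUP BY query is hypothetical because the original statement is not shown; and the column types and the index name purchase_cover_idx are made up. It only illustrates that an index containing every column the statement needs can be read instead of the table.

```python
import sqlite3


def plan(conn, sql):
    """Return the 'detail' column of EXPLAIN QUERY PLAN as a list of strings."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


conn = sqlite3.connect(":memory:")
# Column names follow the answer above; types and the extra column are assumed.
conn.execute(
    """CREATE TABLE purchase (
           purchaseno  INTEGER,
           receiptno   TEXT,
           servicetype TEXT,
           paymenttype TEXT,
           gst         REAL,
           notes       TEXT
       )"""
)

# Hypothetical query with GROUP BY and no WHERE clause.
query = (
    "SELECT servicetype, paymenttype, SUM(gst) "
    "FROM purchase GROUP BY servicetype, paymenttype"
)

plan_without = plan(conn, query)   # full table scan plus a temp b-tree

# Index that contains every column the query touches.
conn.execute(
    "CREATE INDEX purchase_cover_idx ON purchase (servicetype, paymenttype, gst)"
)
plan_with = plan(conn, query)      # reads the index instead of the table

print(plan_without)
print(plan_with)
```

In Oracle the equivalent check is an EXPLAIN PLAN before and after creating the index.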
If you really want to force use of an index in this case you can use a function-based index. For example, if we go to dbfiddle.uk, set up a table, and run an EXPLAIN PLAN on your first query we get:
OPERATION         OPTIONS   OBJECT_NAME  OBJECT_TYPE  OPTIMIZER
SELECT STATEMENT                                      ALL ROWS
FILTER
HASH              GROUP BY
TABLE ACCESS      FULL      PURCHASE     TABLE
Now we add a function-based index:
CREATE INDEX IX_PURCHASE_1 ON PURCHASE(SUBSTR(RECEIPTNO,1,3))
then run another EXPLAIN PLAN, and here's what we get:
OPERATION         OPTIONS          OBJECT_NAME    OBJECT_TYPE  OPTIMIZER
SELECT STATEMENT                                               ALL ROWS
FILTER
SORT              GROUP BY NOSORT
INDEX             FULL SCAN        IX_PURCHASE_1  INDEX        ANALYZED
So you can encourage index usage here by creating the appropriate index.

How to choose index scan or table scan when query in SequoiaDB?

I have the following two scenarios.
Using PG to execute this query:
Select count(*) from t where DATETIME >'2018-07-27 10.12.12.000000' and DATETIME < '2018-07-28 10.12.12.000000'
It returns 22 indexes and executes quickly.
The same query, but with "=" in the conditions:
Select count(*) from t where DATETIME >='2018-07-27 10.12.12.000000' and DATETIME <= '2018-07-28 10.12.12.000000'
It returns 22 indexes but takes 20 s.
I find that the query without "=" chooses an index scan, whereas the query with "=" partly chooses a table scan.
According to your question:
The current indexing mechanism is that the optimizer matches the first available index: the query picks the first usable index in creation order, so which index is chosen depends on the order in which the indexes were created. When a usable index exists, the query prefers an index scan.
Make sure that the nodes on each data group contain the index, otherwise the unindexed data nodes will take the table scan.
Run analyze to optimize the query. Analyze is a new feature in SequoiaDB v3.0. It analyzes collections and index data and collects statistics, which the optimizer uses to decide between an index scan and a table scan. For usage details, see: http://doc.sequoiadb.com/cn/index-cat_id-1496923440-edition_id-300
View the access plan with find.explain() to check the query cost.

Optimize the Clustered Index Scan into Clustered Index Seek

Here is my scenario: I have a table with 40 columns and I have to select all of its data (including all columns). I have created a clustered index on the table, and the plan shows a Clustered Index Scan when fetching the full data set.
I know that without any filter or join key, SQL Server will choose a Clustered Index Scan instead of a Clustered Index Seek. But I want to optimize the execution plan by turning the Clustered Index Scan into a Clustered Index Seek. Is there any way to achieve this?
Below is the screenshot of the execution plan:
Something is not quite right in the question / request, because what you are asking for will perform badly. I suspect it comes from misunderstanding what a clustered index is.
The clustered index, which is perhaps better described as a clustered table, is the table of data. It's not separate from the table; it is the table. If the data in the table is already ordered by ITEM ID, then the scan is the most efficient access method for your query (especially given the select *), and you do not want a seek in this scenario at all. I don't believe that is your scenario, though, because of the sort operator.
If the clustered table is ordered based on another field, then you would need an additional non-clustered index to provide the correct order. You would then try to force a plan which was a non-clustered index scan, nested loop to a clustered index seek. That can be achieved using query hints, most likely an INNER LOOP JOIN would cause the seek - but a FORCESEEK also exists which can be used.
Performance-wise, this second option is never going to win: you are in effect looking at the tipping point notion (https://www.sqlskills.com/blogs/kimberly/the-tipping-point-query-answers/).
Well, I was trying to achieve the same thing: I wanted an index seek instead of an index scan on my TOP query.
SELECT TOP 5 id FROM mytable
Here is the execution plan being shown for the query:
I even tried the OFFSET ... FETCH NEXT approach; the plan was the same.
To avoid an index scan, I included a fake primary key filter like below:
SELECT TOP 5 id FROM mytable where id != 0
I know I won't have a 0 value in my primary key, so I added this filter to the TOP query, which resulted in an index seek instead of an index scan:
The query plan comparison shows similar operation costs for the index seek and the index scan here. But I think getting an index seek this way makes the DB perform an extra operation, because it has to check whether each id is 0 or not, which is entirely unnecessary if we just want the top few records.

Why NonClustered index scan faster than Clustered Index scan?

As far as I know, heap tables are tables without a clustered index and have no physical order.
I have a heap table "scan" with 120k rows and I am using this select:
SELECT id FROM scan
If I create a non-clustered index for the column "id", I get 223 physical reads.
If I remove the non-clustered index and alter the table to make "id" my primary key (and so my clustered index), I get 515 physical reads.
If the clustered index table is something like this picture:
Why does a Clustered Index Scan work like a table scan (or worse, in the case of retrieving all rows)? Why isn't it using the "clustered index table", which has fewer blocks and already has the ID that I need?
SQL Server indices are b-trees. A non-clustered index just contains the indexed columns, with the leaf nodes of the b-tree being pointers to the appropriate data page. A clustered index is different: its leaf nodes are the data pages themselves, and the clustered index's b-tree becomes the backing store for the table; the heap ceases to exist for the table.
Your non-clustered index contains a single, presumably integer column. It's a small, compact index to start with. Your query select id from scan has a covering index: the query can be satisfied just by examining the index, which is what is happening. If, however, your query included columns not in the index, assuming the optimizer elected to use the non-clustered index, an additional lookup would be required to fetch the data pages required, either from the clustering index or from the heap.
To understand what's going on, you need to examine the execution plan selected by the optimizer:
See Displaying Graphical Execution Plans
See Red Gate's SQL Server Execution Plans, by Grant Fritchey
A clustered index is generally about as big as the same data in a heap would be (assuming the same page fullness). It should need only slightly more reads than a heap would, because of the additional B-tree levels.
A CI cannot be smaller than a heap would be. I don't see why you would think that. Most of the size of a partition (be it a heap or a tree) is in the data.
Note that fewer physical reads do not necessarily translate to a faster query. Random IO can be 100x slower than sequential IO.
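The size argument can be made concrete with some back-of-the-envelope arithmetic on the numbers from the question. This is a rough sketch under loud assumptions: it treats each physical read as one full 8 KB page with a 96-byte header, ignores the non-leaf B-tree levels and read-ahead, and folds all per-row overhead into a single "effective bytes per row" figure. The function names are made up for illustration.

```python
import math

PAGE_BYTES = 8192                                # SQL Server page size
PAGE_HEADER_BYTES = 96                           # fixed page header
USABLE_BYTES = PAGE_BYTES - PAGE_HEADER_BYTES    # 8096 bytes left for rows


def implied_bytes_per_row(pages, rows):
    """Effective bytes per row if `rows` rows exactly fill `pages` pages."""
    return pages * USABLE_BYTES / rows


def pages_needed(rows, bytes_per_row):
    """Pages required to store `rows` rows of `bytes_per_row` effective bytes."""
    rows_per_page = USABLE_BYTES // bytes_per_row
    return math.ceil(rows / rows_per_page)


ROWS = 120_000  # row count from the question

# Assumption: 223 and 515 physical reads correspond to 223 and 515 full pages.
index_row_bytes = implied_bytes_per_row(223, ROWS)   # non-clustered index on id
table_row_bytes = implied_bytes_per_row(515, ROWS)   # clustered index (the table)

print(f"non-clustered index: ~{index_row_bytes:.1f} bytes per row")
print(f"clustered index:     ~{table_row_bytes:.1f} bytes per row")
print(f"size ratio:          ~{table_row_bytes / index_row_bytes:.2f}x")
```

Under these assumptions each clustered index row takes a bit more than twice the space of a non-clustered index entry, simply because it carries every column of the table rather than just id plus a row locator.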
When to use Clustered Index-
Query Considerations:
1) Return a range of values by using operators such as BETWEEN, >, >=, <, and <=
2) Return large result sets
3) Use JOIN clauses; typically these are foreign key columns
4) Use ORDER BY, or GROUP BY clauses. An index on the columns specified in the ORDER BY or GROUP BY clause may remove the need for the Database Engine to sort the data, because the rows are already sorted. This improves query performance.
Column Considerations :
Consider columns that have one or more of the following attributes:
1) Are unique or contain many distinct values
2) Defined as IDENTITY because the column is guaranteed to be unique within the table
3) Used frequently to sort the data retrieved from a table
Clustered indexes are not a good choice for the following attributes:
1) Columns that undergo frequent changes
2) Wide keys
When to use Nonclustered Index-
Query Considerations:
1) Use JOIN or GROUP BY clauses. Create multiple nonclustered indexes on columns involved in join and grouping operations, and a clustered index on any foreign key columns.
2) Queries that do not return large result sets
3) Contain columns frequently involved in search conditions of a query, such as WHERE clause, that return exact matches
Column Considerations :
Consider columns that have one or more of the following attributes:
1) Cover the query. For more information, see Index with Included Columns
2) Lots of distinct values, such as a combination of last name and first name, if a clustered index is used for other columns
3) Used frequently to sort the data retrieved from a table
Database Considerations:
1) Databases or tables with low update requirements, but large volumes of data can benefit from many nonclustered indexes to improve query performance.
2) Online Transaction Processing applications and databases that contain heavily updated tables should avoid over-indexing. Additionally, indexes should be narrow, that is, with as few columns as possible.
If you really want to compare them, try running this before the queries:
DBCC DROPCLEANBUFFERS
Physical reads don't mean the same as logical reads when optimizing a query.

Is PARTITION RANGE ALL in your explain plan bad?

Here's my explain plan:
SELECT STATEMENT, GOAL = ALL_ROWS 244492 4525870 235345240
SORT ORDER BY 244492 4525870 235345240
**PARTITION RANGE ALL** 207633 4525870 235345240
INDEX FAST FULL SCAN MCT MCT_PLANNED_CT_PK 207633 4525870 235345240
Just wondering if this is the best optimized plan for querying huge partitioned tables.
Using Oracle10g
PARTITION RANGE ALL just means that the predicates could not be used to perform any partition pruning, or that the alternative (scanning the table blocks instead of using a fast full scan on the index) was estimated to be more expensive overall.
If you can change the predicate to limit the affected rows to a small subset of the partitions, the database will be able to skip whole partitions when querying the table.
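To illustrate what pruning means, here is a toy model in Python. It is a conceptual sketch only, not how Oracle implements it: the function prune_partitions and the partition bounds are made up, with each bound playing the role of an exclusive VALUES LESS THAN limit of a range partition.

```python
from bisect import bisect_right


def prune_partitions(upper_bounds, low, high):
    """Return the indices of the range partitions that can hold keys in [low, high].

    upper_bounds[i] is the exclusive upper bound of partition i
    (sorted ascending), so partition i holds keys k with
    upper_bounds[i - 1] <= k < upper_bounds[i].
    """
    first = bisect_right(upper_bounds, low)    # first partition whose bound is > low
    last = bisect_right(upper_bounds, high)    # partition that would contain high
    last = min(last, len(upper_bounds) - 1)    # clamp to the last existing partition
    return list(range(first, last + 1))


# Hypothetical table with four partitions: <100, <200, <300, <400.
bounds = [100, 200, 300, 400]

# A predicate on the partition key lets whole partitions be skipped.
print(prune_partitions(bounds, 150, 250))

# No usable predicate: every partition must be visited (PARTITION RANGE ALL).
print(prune_partitions(bounds, float("-inf"), float("inf")))
```

With a predicate on the partitioning key only the matching partitions are touched; without one, the scan has to cover all of them, which is what PARTITION RANGE ALL reports.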