Fast table indexing for range lookup - sql

I have a Postgres table with about 4.5 million rows. The columns are basically just
low BIGINT,
high BIGINT,
data1,
data2,
...
When you query this table, you have a long integer, and want to find the data corresponding to the range between low and high that includes that value. What would be the best way to index this table for fast lookups?

A multi-column index with reversed sort order:
CREATE INDEX tbl_low_high_idx on tbl(low, high DESC);
This way, the index can be scanned forward to where low is high enough, then take all rows until high is too low - all in one scan. That's the main reason why sort order is implemented for indexes to begin with: to combine different sort orders in a multi-column index with different order. Basically, a b-tree index can be traversed in both directions at practically the same speed, so ASC / DESC would hardly be needed for single-column indexes.
You may also be interested in the new range types of PostgreSQL 9.2. Can be indexed with a GiST index like this:
CREATE INDEX tbl_idx ON tbl USING gist (low_high);
Should make a query of this form perform very fast:
SELECT * FROM tbl WHERE my_value <# low_high;
<# being the "element is contained by" operator.

Related

How to choose index scan or table scan when query in SequoiaDB?

There are the following scenarios:
Use PG to execute the query as follows:
Select count(*) from t where DATETIME >'2018-07-27 10.12.12.000000' and DATETIME < '2018-07-28 10.12.12.000000'
It returns 22 indexes with rapid execution.
The query condition has "="
Select count(*) from t where DATETIME >='2018-07-27 10.12.12.000000' and DATETIME <= '2018-07-28 10.12.12.000000'
It return 22 indexes which cost 20s.
I find that the query without “=” choose index scan, however, the query with “=” partly choose table scan.
According to your question:
The current indexing mechanism is that the optimizer matches the first available index, which means that the query will first select the first index created, and the choice of index depends on the order in which the index is created. In the case of an index, the query will take the index scan first.
Make sure that the nodes on each data group contain the index, otherwise the unindexed data nodes will take the table scan.
Execute analyze optimization query. Analyze is a new feature of SequoiaDB v3.0. It is mainly used to analyze collections, index data, and collect statistical information, and provide an optimal query algorithm to determine either index or table scan. Analyze specific usage reference: http://doc.sequoiadb.com/cn/index-cat_id-1496923440-edition_id-300
View the access plan by find.explain() to view the query cost

Index Decreases Number of Rows Read; No performance Gain

I created a non-clustered, non-unique index on a column (date) on a large table (16 million rows), but am getting very similar query speeds when compared to the exact same query that's being forced to not use any indexes.
Query 1 (uses index):
SELECT *
FROM testtable
WHERE date BETWEEN '01/01/2017' AND '03/01/2017'
ORDER BY date
Query 2 (no index):
SELECT *
FROM testtable WITH(INDEX(0))
WHERE date BETWEEN '01/01/2017' AND '03/01/2017'
ORDER BY date
Both queries take the same amount of time to run, and return the same result. When looking at the Execution plan for each, Query 1's number of rows read is
~ 4 million rows, where as Query 2 is reading 106 million rows. It appears that the index is working, but I'm not gaining any performance benefits from it.
Any ideas as to why this is, or how to increase my query speed in this case would be much appreciated.
Create Indexes with Included Columns: Cover index
This topic describes how to add included (or nonkey) columns to extend the functionality of nonclustered indexes in SQL Server by using SQL Server Management Studio or Transact-SQL. By including nonkey columns, you can create nonclustered indexes that cover more queries. This is because the nonkey columns have the following benefits:
They can be data types not allowed as index key columns.
They are not considered by the Database Engine when calculating the
number of index key columns or index key size.
An index with nonkey columns can significantly improve query performance when all columns in the query are included in the index either as key or nonkey columns. Performance gains are achieved because the query optimizer can locate all the column values within the index; table or clustered index data is not accessed resulting in fewer disk I/O operations.
CREATE NONCLUSTERED INDEX IX_your_index_name
ON testtable (date)
INCLUDE (col1,col2,col3);
GO
You need to build an index around the need of your query - this quick and free video course should bring you up to speed really quick.
https://www.brentozar.com/archive/2016/10/think-like-engine-class-now-free-open-source/

Erratic indexed query performance in PostgreSQL

Need help regarding performance of a query in PostgreSQL. It seems to relate to the indexes.
This query:
Filters according to type
Orders by timestamp, ascending:
SELECT * FROM the_table WHERE type = 'some_type' ORDER BY timestamp LIMIT 20
The Indexes:
CREATE INDEX the_table_timestamp_index ON the_table(timestamp);
CREATE INDEX the_table_type_index ON the_table(type);
The values of the type field are only ever one of about 11 different strings.
The problem is that the query seems to execute in O(log n) time, taking only a few milliseconds most times except for some values of type which take on the order of several minutes to run.
In these example queries, the first takes only a few milliseconds to run while the second takes over 30 minutes:
SELECT * FROM the_table WHERE type = 'goq' ORDER BY timestamp LIMIT 20
SELECT * FROM the_table WHERE type = 'csp' ORDER BY timestamp LIMIT 20
I suspect, with about 90% certainty, that the indexes we have are not the right ones. I think, after reading this similar question about index performance, that most likely what we need is a composite index, over type and timestamp.
The query plans that I have run are here:
Expected performance, type-specific index (i.e. new index with the type = 'csq' in the WHERE clause).
Slowest, problematic case, indexes as described above.
Fast case, same indexes as above.
Thanks very much for your help! Any pointers will be really appreciated!
The indexes can be used either for the where clause or the order by clause. With the index thetable(type, timestamp), then the same index can be used for both.
My guess is that Postgres is deciding which index to use based on statistics it gathers. When it uses the index for the where and then attempts a sort, you get really bad performance.
This is just a guess, but it is worth creating the above index to see if that fixes the performance problems.
The explain outputs all use the timestamp index. That is probably because the cardinality of the type column is too low so a scan on an index on that column is as expensive as a table scan.
The composite index to be created should be:
create index comp_index on the_table ("timestamp", type)
In that order.

Full table scan occured even when index exists?

We have a sql query as follows
select * from Table where date < '20091010'
However when we look at the query plan, we see
The type of query is SELECT.
FROM TABLE
Worktable1.
Nested iteration.
Table Scan.
Forward scan.
Positioning at start of table.
Using I/O Size 32 Kbytes for data pages.
With MRU Buffer Replacement Strategy for data pages.
which seems to suggest that a full table scan is done. Why is the index not used?
If the majority of your dates are found by applying < '20091010' then the index may well be overlooked in favour of a table scan. What is your distribution of dates within that table? What is the cardinality? Is the index used if you only select date rather than select *?
Unless the index is covering *, the optimizer realizes that a table scan is probably more efficient than an index seek/scan and bookmark lookup. What's the expected selectivity of the date range? Do you have a primary key defined?

What is a Covered Index?

I've just heard the term covered index in some database discussion - what does it mean?
A covering index is an index that contains all of, and possibly more, the columns you need for your query.
For instance, this:
SELECT *
FROM tablename
WHERE criteria
will typically use indexes to speed up the resolution of which rows to retrieve using criteria, but then it will go to the full table to retrieve the rows.
However, if the index contained the columns column1, column2 and column3, then this sql:
SELECT column1, column2
FROM tablename
WHERE criteria
and, provided that particular index could be used to speed up the resolution of which rows to retrieve, the index already contains the values of the columns you're interested in, so it won't have to go to the table to retrieve the rows, but can produce the results directly from the index.
This can also be used if you see that a typical query uses 1-2 columns to resolve which rows, and then typically adds another 1-2 columns, it could be beneficial to append those extra columns (if they're the same all over) to the index, so that the query processor can get everything from the index itself.
Here's an article: Index Covering Boosts SQL Server Query Performance on the subject.
Covering index is just an ordinary index. It's called "covering" if it can satisfy query without necessity to analyze data.
example:
CREATE TABLE MyTable
(
ID INT IDENTITY PRIMARY KEY,
Foo INT
)
CREATE NONCLUSTERED INDEX index1 ON MyTable(ID, Foo)
SELECT ID, Foo FROM MyTable -- All requested data are covered by index
This is one of the fastest methods to retrieve data from SQL server.
Covering indexes are indexes which "cover" all columns needed from a specific table, removing the need to access the physical table at all for a given query/ operation.
Since the index contains the desired columns (or a superset of them), table access can be replaced with an index lookup or scan -- which is generally much faster.
Columns to cover:
parameterized or static conditions; columns restricted by a parameterized or constant condition.
join columns; columns dynamically used for joining
selected columns; to answer selected values.
While covering indexes can often provide good benefit for retrieval, they do add somewhat to insert/ update overhead; due to the need to write extra or larger index rows on every update.
Covering indexes for Joined Queries
Covering indexes are probably most valuable as a performance technique for joined queries. This is because joined queries are more costly & more likely then single-table retrievals to suffer high cost performance problems.
in a joined query, covering indexes should be considered per-table.
each 'covering index' removes a physical table access from the plan & replaces it with index-only access.
investigate the plan costs & experiment with which tables are most worthwhile to replace by a covering index.
by this means, the multiplicative cost of large join plans can be significantly reduced.
For example:
select oi.title, c.name, c.address
from porderitem poi
join porder po on po.id = poi.fk_order
join customer c on c.id = po.fk_customer
where po.orderdate > ? and po.status = 'SHIPPING';
create index porder_custitem on porder (orderdate, id, status, fk_customer);
See:
http://literatejava.com/sql/covering-indexes-query-optimization/
Lets say you have a simple table with the below columns, you have only indexed Id here:
Id (Int), Telephone_Number (Int), Name (VARCHAR), Address (VARCHAR)
Imagine you have to run the below query and check whether its using index, and whether performing efficiently without I/O calls or not. Remember, you have only created an index on Id.
SELECT Id FROM mytable WHERE Telephone_Number = '55442233';
When you check for performance on this query you will be dissappointed, since Telephone_Number is not indexed this needs to fetch rows from table using I/O calls. So, this is not a covering indexed since there is some column in query which is not indexed, which leads to frequent I/O calls.
To make it a covered index you need to create a composite index on (Id, Telephone_Number).
For more details, please refer to this blog:
https://www.percona.com/blog/2006/11/23/covering-index-and-prefix-indexes/