This is an excerpt from Tom Kyte's book:
"We’re using a SELECT COUNT(*) FROM T query (or something similar)
and we have a B*Tree index on table T. However, the optimizer is full
scanning the table, rather than counting the (much smaller) index
entries. In this case, the index is probably on a set of columns that
can contain Nulls. Since a totally Null index entry would never be
made, the count of rows in the index will not be the count of rows in
the table. Here the optimizer is doing the right thing—it would get
the wrong answer if it used the index to count rows."
As far as I know, indexes come into the picture when we use a WHERE clause. Why would an index matter in the above scenario? Before countering him I wanted to know the facts.
"As far as i know indexes comes in picture when you used where clause. "
That's one use case for indexes, when we want quick access to rows identified by specific values of indexed column(s). But there are other uses.
Counting rows is one. To count the number of rows in a table Oracle actually has to count each row (because statistics may not be fresh enough), which means literally reading each block of storage and counting the rows in each block. Potentially that's a lot of reads.
However, an index on a NOT NULL column also has an entry for each row of the table. Indexes are much smaller than tables (typically only one column) so an Index block contains many more entries than a Table block. Consequently Oracle has to read far fewer Index blocks to get the count of rows than scanning the table would require. Reading fewer blocks is faster than reading more blocks.
This isn't true if the table only has indexes on nullable columns. Oracle doesn't index null values (unless the index is a composite index and at least one column is populated) so a count of the entries in an index couldn't guarantee to be the actual count of the table's rows.
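To make the counting argument concrete, here is a toy Python model (not Oracle code) of an index that, as described above, creates no entry for a row whose key columns are all null. The `build_index` helper and the sample rows are made up for illustration.

```python
# Toy model of a B-tree index that omits rows whose key columns are all NULL.
# build_index and the sample data are hypothetical; this is not Oracle code.

def build_index(rows, key_columns):
    """Return (key, row_number) entries, skipping keys that are entirely None."""
    entries = []
    for row_number, row in enumerate(rows):
        key = tuple(row[c] for c in key_columns)
        if all(value is None for value in key):
            continue  # a totally null key gets no index entry
        entries.append((key, row_number))
    return entries

# Four rows with two nullable columns, "a" and "b".
rows = [
    {"a": 1, "b": "x"},
    {"a": None, "b": "y"},
    {"a": None, "b": None},
    {"a": 2, "b": None},
]

single = build_index(rows, ["a"])          # models an index on (a)
composite = build_index(rows, ["a", "b"])  # models an index on (a, b)

print("table rows:", len(rows))
print("entries in index on (a):", len(single))
print("entries in index on (a, b):", len(composite))
```

In this model the table has four rows, the single-column index has two entries and the composite index has three, so counting entries in either index would undercount the table.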
Another common use case for reading indexes is to satisfy a SELECT statement where all the columns in a projection are in one index, and the index also services any WHERE conditions.
Oracle Database does not store entirely NULL keys in a B-tree index; see the documentation:
Oracle Database does not index table rows in which all key columns are
null, except for bitmap indexes or when the cluster key column value
is null.
Because of this, if the index has been created on a column that may contain null values, the database cannot use this index in a query like: SELECT COUNT(*) FROM T. Even when the column does not contain any NULLs, the optimizer doesn't know this unless the column has been marked as NOT NULL.
According to the documentation - FAST FULL INDEX SCAN
Fast Full Index Scan
A fast full index scan is a full index scan in
which the database accesses the data in the index itself without
accessing the table, and the database reads the index blocks in no
particular order.
Fast full index scans are an alternative to a full table scan when
both of the following conditions are met:
The index must contain all columns needed for the query.
A row containing all nulls must not appear in the query result set.
For this result to be guaranteed, at least one column in the index
must have either:
A NOT NULL constraint
A predicate applied to the column that prevents nulls from being
considered in the query result set
So if you know that the indexed column cannot contain NULL values, then mark this column as NOT NULL using ALTER TABLE table_name MODIFY column_name column_type NOT NULL; and the database will use that index in the query: SELECT COUNT(*) FROM T
If the column can have nulls and cannot be marked as NOT NULL, then use the solution from @Gordon Linoff's answer.
You can force the indexing of NULL values by including a constant in the index:
create index t_table_col on t(col, 0);
The 0 is a constant expression that is never NULL.
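The effect of the constant can be sketched with a small Python model (again, not Oracle code): a key gets an index entry only if at least one of its parts is non-null, so appending a constant guarantees an entry for every row. The `has_entry` helper and the sample values are hypothetical.

```python
# Toy model: appending a constant to the index key means no key is entirely NULL.
# This mimics the idea behind an index on (col, 0); it is not Oracle code.

def has_entry(key):
    """A key is indexed only if at least one of its parts is not None."""
    return any(part is not None for part in key)

col_values = [10, None, 20, None, None]  # a nullable column, five rows

# Models an index on (col): rows with a null col get no entry.
plain_entries = [(v,) for v in col_values if has_entry((v,))]

# Models an index on (col, 0): the constant 0 is never None.
constant_entries = [(v, 0) for v in col_values if has_entry((v, 0))]

print("rows:", len(col_values))
print("entries without constant:", len(plain_entries))
print("entries with constant:", len(constant_entries))
```

With the constant in the key, the number of entries matches the number of rows, which is what lets such an index stand in for the table when counting.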
I need to create a unique index in HANA on nullable columns. I need to exclude NULL values from the index.
In SQL Server I can create such an index with this syntax:
CREATE UNIQUE NONCLUSTERED INDEX [MyTableIX_] ON [dbo].[MyTable]
(
[MyField1] ASC,
[MyField2] ASC,
[MyField3] ASC
)
WHERE ([MyField1] IS NOT NULL AND [MyField2] IS NOT NULL AND [MyField3] IS NOT NULL)
How can I obtain the same result in HANA?
AFAIK this is not possible, as a UNIQUE index requires that all of the entries are unique at the time the index is created, and will prevent records being added which would create duplicate entries in the index. (The documentation explains this.)
Most Database systems work this way- unique means unique.
However, if your table is a column store (most are in HANA) then do you really need to create this index? The Column store optimises the table for retrieval of data (which is why in HANA generally reads are so much faster than writes) so for retrieval the use of index may not make any significant difference.
If you want to enforce uniqueness you could implement a trigger on the table instead which would abort the insert or update if it finds any records which conflict.
The "Filtered Index" syntax for MS SQL Server is intended to optimise retrieval for a particular subset of records in the table, so that when the filter applies an index can be used which does not have to cover all rows of the table, resulting in a shorter index and a (hopefully) faster query.
Given that for column store tables (most tables in HANA) every field is effectively indexed the need for optimised indexes for subsets of the table is reduced (probably to zero, depending on the data schema and values).
In Oracle, if I create a composite index on 2 columns, in which situations will this index be used to search for a record?
a) If my query has a WHERE clause which involves first column
e.g. WHERE first_column = 'John'
b) If my query has a WHERE clause which involves second column
e.g. WHERE second_column = 'Sharma'
c) Either a or b
d) Both a and b
e) Not specifically these 2 columns but it could be any column in the WHERE clause.
f) Only column a or both columns a and b
I happen to think that MySQL does a pretty good job of describing how composite indexes are used. The documentation is here.
The basic idea is that the index would normally be used in the following circumstances:
When the where condition is an equality on col1 (col1 = value).
When the where condition is an inequality or in on col1 (col1 in (list), col1 < value)
When the where condition is an equality on col1 and col2, connected by an and (col1 = val1 and col2 = val2)
When the where condition is an equality on col1 and an inequality or in on col2.
Any of the above four cases where additional columns are used with additional conditions on other columns, connected by an and.
In addition, the index would normally be used if col1 and col2 are the only columns referenced in the query. This is called a covering index, and -- assuming there are other columns in the table -- it is faster to read the index than the original table because the index is smaller.
Oracle has a pretty smart optimizer, so it might also use the index in some related circumstances, for instance when col1 uses an in condition along with a condition on col2.
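The leading-column rule can be observed with a small experiment. The sketch below uses SQLite through Python's standard `sqlite3` module purely as a convenient stand-in; its planner is not Oracle's or MySQL's, and the table `t`, the index name and the `plan` helper are made up for the example.

```python
import sqlite3

# SQLite is used here only as a stand-in; Oracle and MySQL planners differ.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (col1 TEXT, col2 TEXT, col3 TEXT)")
conn.execute("CREATE INDEX idx_t_col1_col2 ON t (col1, col2)")

def plan(sql):
    """Return the EXPLAIN QUERY PLAN detail text for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)

# Condition on the leading column only.
leading = plan("SELECT * FROM t WHERE col1 = 'John'")
# Equality on both columns, connected by AND.
both = plan("SELECT * FROM t WHERE col1 = 'John' AND col2 = 'Sharma'")
# Condition on the second column only.
second_only = plan("SELECT * FROM t WHERE col2 = 'Sharma'")

for label, text in [("col1", leading), ("col1+col2", both), ("col2", second_only)]:
    print(label, "->", text)
```

In SQLite the first two plans search the composite index, while the query on `col2` alone falls back to scanning the table. Oracle may behave differently in the last case, so treat this as an illustration of the general rule rather than a prediction for any particular database.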
In general, a condition will not qualify for an index if the column is an argument to a function. So, these clauses would not use a basic index:
where month(col1) = 3
where trunc(col1) = trunc(sysdate)
where abs(col1) < 1
Oracle supports function-based indexes, so if these constructs are actually important, you can create an index on month(col1), trunc(col1), or abs(col1).
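The same effect can be reproduced in SQLite, which supports indexes on expressions. This is only a sketch under that substitution, not Oracle syntax or behaviour; the table, index names and `plan` helper are hypothetical.

```python
import sqlite3

# SQLite stand-in for the idea of a function-based index; not Oracle code.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (col1 INTEGER, col2 TEXT)")
conn.execute("CREATE INDEX idx_t_col1 ON t (col1)")  # a basic index on col1

def plan(sql):
    """Return the EXPLAIN QUERY PLAN detail text for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)

query = "SELECT * FROM t WHERE abs(col1) < 1"

# With only the basic index, the function call hides col1 from the index.
before = plan(query)

# An index on the expression itself matches the predicate.
conn.execute("CREATE INDEX idx_t_abs_col1 ON t (abs(col1))")
after = plan(query)

print("before:", before)
print("after: ", after)
```

Here the first plan does not mention `idx_t_col1`, and the second one uses the expression index once it exists.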
Also, or tends to make the use of indexes less likely.
d) Both a or b
If the leading column is used, Oracle will likely use a regular index range scan and just ignore the unused columns.
If a non-leading column is used, Oracle can use an index skip scan. In practice a skip scan is not used very often.
There are two completely different questions here: when can Oracle use an index and when will Oracle use an index. The above explains that Oracle can use an index in either case, and you can test that out with a hint: /*+ index(table_name index_name) */.
Determining when Oracle will use an index is much trickier. Oracle uses multi-block reads for full table scans and fast full index scans, and uses single-block reads for other index scans. This means a full table scan is more efficient when reading a larger percent of the data. But there are a lot of factors involved: the percentage of data, how big is the index, system statistics that tell Oracle how fast single- and multi-block IO are, the number of distinct values (especially important for choosing a skip scan), index clustering factor (how ordered is the table by the index columns), etc.
The optimizer will use indexes in several scenarios. Even if not "perfect".
Optimally, if you are querying using the first columns in the index, then the index will be used. Even if you're referencing only the first column, it will still use the index if the optimizer deems it filters out enough data.
If the indexed columns aren't answering the query requirement (for instance only referencing the second column in the where clause), the optimizer could still use the index for a full (table) index scan, if it holds all of the data required, because the index is smaller than the full table.
In your example, if you are only querying from that table, and you only have that one index, (a) will use the index, (b) will use it if you are only querying columns in the index, while the table itself has more.
If you have other indexes, or join other tables, then that could affect the explain plan completely.
Check out http://docs.oracle.com/cd/B19306_01/server.102/b14231/indexes.htm
I have a table with a nonclustered index1 on ID1 and ID2, in that order.
Select count(distinct(id1)) from table
returns 1
and Select count(distinct(id2)) from table has all the values of the table.
The queries against that table use ... where id1= XX and id2 = XX
Could it make any performance improvement if I switch the order of the fields of index1 ?
I know it SHOULD be better but maybe: is it indifferent because id1 has only 1 value?
If I understand correctly, you are comparing these two statements:
where id1= XX and id2 = XX
where id2 = XX and id1 = XX
Under most circumstances, this would use either an index on table(id1, id2) or table(id2, id1). The order of the comparisons in the where (or on) clauses has no impact on which indexes can be used.
Whether you should include a column that has only a single value in the unique index is a different matter. There is a minor performance effect to having a more complex index -- the tree structure has to store more bytes for each key. However, the query:
select count(distinct id2)
from table
where id1 = xx and id2 = xx
will actually run faster with a composite index than with a singleton index table(id2). The reason is that the composite index can be used to entirely satisfy the query (in the jargon, it is a "covering index for the query"). The singleton index would need to look up the value of id1 in the table data, which requires extra processing.
The order you define the columns in your Index matters. If your column ID1 will always only have 1 value, then there is no point in putting it into the index, unless you are using it in a Covering Index in a Non-Clustered Index (meaning an Index not the physical ordering of the Table itself). In general, your first column defined in your Index should be the column with the most Varying Values that you need to search through. Visualize it this way, if you had a table of 1 million rows, and the first Column in your Index only had 1 (or small number) of varying values, then would that Index help you in finding the rows you want among the 1 million? Or would it be better to have ID2 first, which would be more efficient for the search, and which would be more frequently used, is what you have to ask yourself. Below is also more info on your question.
SQL Server Clustered Index - Order of Index Question
If you are using a Non-Clustered index, it may appear not to make a difference if the first Column in your Index holds all the same values. However, it does matter, because a Non-Clustered Index is stored on a number of Pages. The more entries you can store on a Page, the faster you can search. If you include a Column which adds no value to the Search, then the same Index will require more Pages, meaning more Pages to flip through and longer Lookups. It also means less room to add new entries to an existing Page during Inserts when the index is updated, causing more Page Splits. So there are side effects to the decision to add a Column of only 1 value to the Index. If the original purpose of adding that Column was to "cover" retrieved values in common selects, then you can instead use Included Columns in your Index, which has the added benefit of not reordering your Index while still acting like a Covering Index.
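The page argument above is simple arithmetic, sketched here in Python. Every number in it (usable bytes per page, bytes per index entry) is an illustrative assumption rather than a measured SQL Server value; only the 1 million row count comes from the thought experiment earlier in this answer.

```python
import math

# All figures below are illustrative assumptions, not measured SQL Server values.
USABLE_PAGE_BYTES = 8000   # assumed usable space per index page
ROWS = 1_000_000           # table size from the thought experiment
NARROW_ENTRY_BYTES = 8     # assumed entry size for an index on (ID2)
WIDE_ENTRY_BYTES = 12      # assumed entry size for an index on (ID1, ID2)

def leaf_pages(rows, entry_bytes, page_bytes=USABLE_PAGE_BYTES):
    """Pages needed if each page holds as many whole entries as fit."""
    entries_per_page = page_bytes // entry_bytes
    return math.ceil(rows / entries_per_page)

narrow_pages = leaf_pages(ROWS, NARROW_ENTRY_BYTES)
wide_pages = leaf_pages(ROWS, WIDE_ENTRY_BYTES)

print("pages for narrow key:", narrow_pages)
print("pages for wide key:  ", wide_pages)
```

Under these assumed sizes the wider key needs roughly half again as many pages, even though the extra column never helps to distinguish one row from another.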
I am not a DB guy, but I need to create tables and do CRUD operations on them. I am confused about whether I should create an index on all columns by default
or not. Here is the understanding I rely on when creating indexes.
Index basically contains the memory location range ( starting memory location where first value is stored to end memory location where last value is
stored). So when we insert any value in table index for column needs to be updated as it has got one more value but update of column
value wont have any impact on index value. Right? So bottom line is when my column is used in join between two tables we should consider
creating index on column used in join but all other columns can be skipped because if we create index on them it will involve extra cost of
updating index value when new value is inserted in column.Right?
Consider this scenario where table mytable contains three columns, i.e. col1, col2, col3. Now we fire this query
select col1,col2 from mytable
Now there are two cases here. In the first case we create an index on col1 and col2. In the second case we don't create any index. As per my understanding
case 1 will be faster than case 2 because in case 1 Oracle can quickly find the column's memory location. So here I have not used any join columns but
the index is still helping. So should I consider creating an index here or not?
What if in the same scenario above if we fire
select * from mytable
instead of
select col1,col2 from mytable
Will index help here?
Don't create Indexes in every column! It will slow things down on insert/delete/update operations.
As a simple reminder, you can create an index on columns that are common in WHERE, ORDER BY and GROUP BY clauses. You may consider adding an index on columns that are used to relate other tables (through a JOIN, for example).
Example:
SELECT col1,col2,col3 FROM my_table WHERE col2=1
Here, creating an index on col2 would help this query a lot.
Also, consider index selectivity. Simply put, create indexes on columns that have a "big domain", i.e. IDs, names, etc. Don't create them on Male/Female columns.
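One common way to put a number on "big domain" is the ratio of distinct values to rows. The sketch below computes it in Python for made-up sample rows; the `selectivity` helper, the data and this particular definition are assumptions for illustration.

```python
# Selectivity taken here as distinct values / total rows (an assumed definition).
# The sample rows are invented for the example.

def selectivity(rows, column):
    """Fraction of rows that carry a distinct value in the given column."""
    distinct_values = {row[column] for row in rows}
    return len(distinct_values) / len(rows)

rows = [
    {"id": 1, "name": "Ann", "gender": "F"},
    {"id": 2, "name": "Bob", "gender": "M"},
    {"id": 3, "name": "Cid", "gender": "M"},
    {"id": 4, "name": "Dee", "gender": "F"},
    {"id": 5, "name": "Eve", "gender": "F"},
    {"id": 6, "name": "Fay", "gender": "F"},
]

for column in ("id", "name", "gender"):
    print(column, round(selectivity(rows, column), 2))
```

A value near 1 (as for `id`) means an equality lookup narrows the search to very few rows; a value near 0 (as for `gender` on a large table) means each value still matches a large share of the table, so an index on it helps little.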
but update of column value wont have any impact on index value. Right?
No. Updating an indexed column will have an impact. The Oracle 11g performance manual states that:
UPDATE statements that modify indexed columns and INSERT and DELETE
statements that modify indexed tables take longer than if there were
no index. Such SQL statements must modify data in indexes and data in
tables. They also create additional undo and redo.
So bottom line is when my column is used in join between two tables we should consider creating index on column used in join but all other columns can be skipped because if we create index on them it will involve extra cost of updating index value when new value is inserted in column. Right?
Not just Inserts but any other Data Manipulation Language statement.
Consider this scenario . . . Will index help here?
With regards to this last paragraph, why not build some test cases with representative data volumes so that you prove or disprove your assumptions about which columns you should index?
In the specific scenario you give, there is no WHERE clause, so a table scan is going to be used or the index scan will be used, but you're only dropping one column, so the performance might not be that different. In the second scenario, the index shouldn't be used, since it isn't covering and there is no WHERE clause. If there were a WHERE clause, the index could allow the filtering to reduce the number of rows which need to be looked up to get the missing column.
Oracle has a number of different table types, including heap and index-organized tables.
If an index is covering, it is more likely to be used, especially when selective. But note that an index organized table is not better than a covering index on a heap when there are constraints in the WHERE clause and far fewer columns in the covering index than in the base table.
Creating indexes with more columns than are actually used only helps if they are more likely to make the index covering, but adding all the columns would be similar to an index organized table. Note that Oracle does not have the equivalent of SQL Server's INCLUDE (COLUMN) which can be used to make indexes more covering (it's effectively making an additional clustered index of only a subset of the columns - useful if you want an index to be unique but also add some data which you don't want to be considered in the uniqueness but helps to make it covering for more queries)
You need to look at your plans and then determine if indexes will help things. And then look at the plans afterwards to see if they made a difference.
I've just heard the term covered index in some database discussion - what does it mean?
A covering index is an index that contains all of, and possibly more, the columns you need for your query.
For instance, this:
SELECT *
FROM tablename
WHERE criteria
will typically use indexes to speed up the resolution of which rows to retrieve using criteria, but then it will go to the full table to retrieve the rows.
However, if the index contained the columns column1, column2 and column3, then this sql:
SELECT column1, column2
FROM tablename
WHERE criteria
can be answered from the index alone. Provided that particular index could be used to speed up the resolution of which rows to retrieve, the index already contains the values of the columns you're interested in, so it won't have to go to the table to retrieve the rows, but can produce the results directly from the index.
This can also be used if you see that a typical query uses 1-2 columns to resolve which rows, and then typically adds another 1-2 columns, it could be beneficial to append those extra columns (if they're the same all over) to the index, so that the query processor can get everything from the index itself.
Here's an article: Index Covering Boosts SQL Server Query Performance on the subject.
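The difference between the two queries above can be seen in a query plan. The following sketch uses SQLite via Python's `sqlite3` module as a stand-in (other databases word their plans differently); the extra `column4`, the index name and the concrete predicate `column1 = 42` are assumptions made so the example runs.

```python
import sqlite3

# SQLite stand-in; column4 and the predicate are invented so the example runs.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tablename (column1 INTEGER, column2 TEXT, column3 TEXT, column4 TEXT)"
)
conn.execute("CREATE INDEX idx_c1_c2_c3 ON tablename (column1, column2, column3)")

def plan(sql):
    """Return the EXPLAIN QUERY PLAN detail text for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)

# Every column the query needs is in the index.
covered = plan("SELECT column1, column2 FROM tablename WHERE column1 = 42")

# SELECT * also needs column4, which the index does not hold.
not_covered = plan("SELECT * FROM tablename WHERE column1 = 42")

print("covered:    ", covered)
print("not covered:", not_covered)
```

SQLite reports the first query as using a covering index, while the second uses the same index only to locate rows and then visits the table for the remaining column.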
A covering index is just an ordinary index. It's called "covering" if it can satisfy a query without having to read the table data.
example:
CREATE TABLE MyTable
(
ID INT IDENTITY PRIMARY KEY,
Foo INT
)
CREATE NONCLUSTERED INDEX index1 ON MyTable(ID, Foo)
SELECT ID, Foo FROM MyTable -- All requested data are covered by index
This is one of the fastest methods to retrieve data from SQL server.
Covering indexes are indexes which "cover" all columns needed from a specific table, removing the need to access the physical table at all for a given query/ operation.
Since the index contains the desired columns (or a superset of them), table access can be replaced with an index lookup or scan -- which is generally much faster.
Columns to cover:
parameterized or static conditions; columns restricted by a parameterized or constant condition.
join columns; columns dynamically used for joining
selected columns; to answer selected values.
While covering indexes can often provide good benefit for retrieval, they do add somewhat to insert/ update overhead; due to the need to write extra or larger index rows on every update.
Covering indexes for Joined Queries
Covering indexes are probably most valuable as a performance technique for joined queries. This is because joined queries are more costly & more likely than single-table retrievals to suffer high-cost performance problems.
in a joined query, covering indexes should be considered per-table.
each 'covering index' removes a physical table access from the plan & replaces it with index-only access.
investigate the plan costs & experiment with which tables are most worthwhile to replace by a covering index.
by this means, the multiplicative cost of large join plans can be significantly reduced.
For example:
select poi.title, c.name, c.address
from porderitem poi
join porder po on po.id = poi.fk_order
join customer c on c.id = po.fk_customer
where po.orderdate > ? and po.status = 'SHIPPING';
create index porder_custitem on porder (orderdate, id, status, fk_customer);
See:
http://literatejava.com/sql/covering-indexes-query-optimization/
Let's say you have a simple table with the columns below, and you have indexed only Id:
Id (Int), Telephone_Number (Int), Name (VARCHAR), Address (VARCHAR)
Imagine you have to run the query below and check whether it's using an index, and whether it performs efficiently without extra I/O calls. Remember, you have only created an index on Id.
SELECT Id FROM mytable WHERE Telephone_Number = '55442233';
When you check the performance of this query you will be disappointed: since Telephone_Number is not indexed, the rows have to be fetched from the table using I/O calls. So this is not a covering index, since a column in the query is not indexed, which leads to frequent I/O calls.
To make it a covered index you need to create a composite index on (Id, Telephone_Number).
For more details, please refer to this blog:
https://www.percona.com/blog/2006/11/23/covering-index-and-prefix-indexes/