Exclude NULL value from UNIQUE INDEX in HANA - hana

I Need to create a Unique Index in HANA with nullable column. I need to exclude NULL value from Index.
In SQL SERVER I can create an Index with this sintax:
CREATE UNIQUE NONCLUSTERED INDEX [MyTableIX_] ON [dbo].[MyTable]
(
[MyField1] ASC,
[MyField2] ASC,
[MyField3] ASC
)
WHERE ([MyField1] IS NOT NULL AND [MyField2] IS NOT NULL AND [MyField3] IS NOT NULL)
How can obtain the same result in HANA?

AFAIK This is not possible as a UNIQUE index requires that all of the entries are unique at the time the index is created, and will prevent records being added which would create duplicate entires in the index. (The documentation explains this)
Most Database systems work this way- unique means unique.
However, if your table is a column store (most are in HANA) then do you really need to create this index? The Column store optimises the table for retrieval of data (which is why in HANA generally reads are so much faster than writes) so for retrieval the use of index may not make any significant difference.
If you want to enforce uniqueness you could implement a trigger on the table instead which would abort the insert or update if it finds any records which conflict.
The "Filtered Index" syntax for MS SQL Server is intended to optimised retrieval for a particular subset of records in the table so that when the filter applies an index can be used which does not have to cover all rows of the table - resulting in a shorter index and a (hopefully) faster query.
Given that for column store tables (most tables in HANA) every field is effectively indexed the need for optimised indexes for subsets of the table is reduced (probably to zero, depending on the data schema and values).

Related

What INCLUDE() function does when creating index in MS SQL Server?

What is the difference between creating an index using INCLUDE function vs not?
What would be the difference between the following two indexes?
CREATE NONCLUSTERED INDEX SomeName ON SomeTable (
ColumnA
,ColumnB
,ColumnC
,ColumnD
) INCLUDE (
ColumnE
,ColumnF
,ColumnG
)
vs
CREATE INDEX SomeName ON SomeTable (
ColumnA
,ColumnB
,ColumnC
,ColumnD
,ColumnE
,ColumnF
,ColumnG
)
The INCLUDE clause adds the data at the lowest/leaf level, rather than in the index tree. This makes the index smaller because it's not part of the tree.
INCLUDE columns are not key columns in the index, so they are not ordered. This means it isn't really useful for predicates, sorting etc.. However, it may be useful if you have a residual lookup in a few rows from the key columns.
INCLUDE columns are not key columns in the index, so they are not ordered. This makes them not typically useful for JOINs or sorting. And because they are not key columns, they don't sit in the whole B-tree structure like key columns
By adding Include (or nonkey)columns, you can create nonclustered indexes that cover more queries. This is because the nonkey columns have the following benefits:
They can be data types not allowed as index key columns.
They are not considered by the Database Engine when calculating the number of index key columns or index key size.
An index with Included columns can significantly improve query performance when all columns in the query are included in the index either as key or nonkey columns. Performance gains are achieved because the query optimizer can locate all the column values within the index; table or clustered index data is not accessed resulting in fewer disk I/O operations.
For more info refer Microsoft docs: Create Indexes with Included Columns
When an execution plan uses an index, it has access to all the columns in the index. If all the columns from a given table are in the index, there is no need to refer to the original data pages. Eliminating that data page lookup is a gain in efficiency.
However, including columns in indexes has overhead for the indexing structure itself (this is in addition to duplicating the values).
The INCLUDE keyword allows for column values to be in the index, without incurring the overhead of the additional indexing structure. The purpose is to resolve queries without having to look up the column information on the original data pages.

Index is not getting used

This is excerpt from Tom Kyte's book.
"We’re using a SELECT COUNT(*) FROM T query (or something similar)
and we have a B*Tree index on table T. However, the optimizer is full
scanning the table, rather than counting the (much smaller) index
entries. In this case, the index is probably on a set of columns that
can contain Nulls. Since a totally Null index entry would never be
made, the count of rows in the index will not be the count of rows in
the table. Here the optimizer is doing the right thing—it would get
the wrong answer if it used the index to count rows."
As far as I know indexes come into picture when we use a WHERE clause. Why index come in the above scenario? Before countering him I wanted to know the facts.
"As far as i know indexes comes in picture when you used where clause. "
That's one use case for indexes, when we want quick access to rows identified by specific values of indexed column(s). But there are other uses.
Counting rows is one. To count the number of rows in a table Oracle actually has to count each row (because statistics may not be fresh enough), which means literally reading each block of storage and counting the rows in each block. Potentially that's a lot of reads.
However, an index on a NOT NULL column also has an entry for each row of the table. Indexes are much smaller than tables (typically only one column) so an Index block contains many more entries than a Table block. Consequently Oracle has to read far fewer Index blocks to get the count of rows than scanning the table would require. Reading fewer blocks is faster than reading more blocks.
This isn't true if the table only has indexes on nullable columns. Oracle doesn't index null values (unless the index is a composite index and at least one column is populated) so a count of the entries in an index couldn't guarantee to be the actual count of the table's rows.
Another common use case for reading indexes is to satisfy a SELECT statement where all the columns in a projection are in one index, and the index also services any WHERE conditions.
Oracle Database does not store NULLs in the B-tree index, see the documentation
Oracle Database does not index table rows in which all key columns are
null, except for bitmap indexes or when the cluster key column value
is null.
Because of this, if the index has been created on a column that may contain null values, the database cannot use this index in a query like: SELECT COUNT(*) FROM T. Even when the column does not contain any NULLs, the optimizer doesn't know this unless the column has ben marked as NOT NULL.
According to the documentation - FAST FULL INDEX SCAN
Fast Full Index Scan
A fast full index scan is a full index scan in
which the database accesses the data in the index itself without
accessing the table, and the database reads the index blocks in no
particular order.
Fast full index scans are an alternative to a full table scan when
both of the following conditions are met:
The index must contain all columns needed for the query.
A row containing all nulls must not appear in the query result set.
For this result to be guaranteed, at least one column in the index
must have either:
A NOT NULL constraint
A predicate applied to the column that prevents nulls from being
considered in the query result set
So if you know that the indexed column cannot contain NULL values, then mark this column as NOT NULL using ALTER TABLE table_name MODIFY column_name column_type NOT NULL; and the database will use that index in the query: SELECT COUNT(*) FROM T
If the colum can have nulls, and cannot be marked as NOT NULL, then use a solution from #Gordon Linoff's answer.
You can force the indexing of NULL values by including a constant in the index:
create index t_table_col on t(col, 0);
The 1 is a constant expression that is never NULL.

Decision when to create Index on table column in database?

I am not db guy. But I need to create tables and do CRUD operations on them. I get confused should I create the index on all columns by default
or not? Here is my understanding which I consider while creating index.
Index basically contains the memory location range ( starting memory location where first value is stored to end memory location where last value is
stored). So when we insert any value in table index for column needs to be updated as it has got one more value but update of column
value wont have any impact on index value. Right? So bottom line is when my column is used in join between two tables we should consider
creating index on column used in join but all other columns can be skipped because if we create index on them it will involve extra cost of
updating index value when new value is inserted in column.Right?
Consider this scenario where table mytable contains two three columns i.e col1,col2,col3. Now we fire this query
select col1,col2 from mytable
Now there are two cases here. In first case we create the index on col1 and col2. In second case we don't create any index.** As per my understanding
case 1 will be faster than case2 because in case 1 we oracle can quickly find column memory location. So here I have not used any join columns but
still index is helping here. So should I consider creating index here or not?**
What if in the same scenario above if we fire
select * from mytable
instead of
select col1,col2 from mytable
Will index help here?
Don't create Indexes in every column! It will slow things down on insert/delete/update operations.
As a simple reminder, you can create an index in columns that are common in WHERE, ORDER BY and GROUP BY clauses. You may consider adding an index in colums that are used to relate other tables (through a JOIN, for example)
Example:
SELECT col1,col2,col3 FROM my_table WHERE col2=1
Here, creating an index on col2 would help this query a lot.
Also, consider index selectivity. Simply put, create index on values that has a "big domain", i.e. Ids, names, etc. Don't create them on Male/Female columns.
but update of column value wont have any impact on index value. Right?
No. Updating an indexed column will have an impact. The Oracle 11g performance manual states that:
UPDATE statements that modify indexed columns and INSERT and DELETE
statements that modify indexed tables take longer than if there were
no index. Such SQL statements must modify data in indexes and data in
tables. They also create additional undo and redo.
So bottom line is when my column is used in join between two tables we should consider creating index on column used in join but all other columns can be skipped because if we create index on them it will involve extra cost of updating index value when new value is inserted in column. Right?
Not just Inserts but any other Data Manipulation Language statement.
Consider this scenario . . . Will index help here?
With regards to this last paragraph, why not build some test cases with representative data volumes so that you prove or disprove your assumptions about which columns you should index?
In the specific scenario you give, there is no WHERE clause, so a table scan is going to be used or the index scan will be used, but you're only dropping one column, so the performance might not be that different. In the second scenario, the index shouldn't be used, since it isn't covering and there is no WHERE clause. If there were a WHERE clause, the index could allow the filtering to reduce the number of rows which need to be looked up to get the missing column.
Oracle has a number of different tables, including heap or index organized tables.
If an index is covering, it is more likely to be used, especially when selective. But note that an index organized table is not better than a covering index on a heap when there are constraints in the WHERE clause and far fewer columns in the covering index than in the base table.
Creating indexes with more columns than are actually used only helps if they are more likely to make the index covering, but adding all the columns would be similar to an index organized table. Note that Oracle does not have the equivalent of SQL Server's INCLUDE (COLUMN) which can be used to make indexes more covering (it's effectively making an additional clustered index of only a subset of the columns - useful if you want an index to be unique but also add some data which you don't want to be considered in the uniqueness but helps to make it covering for more queries)
You need to look at your plans and then determine if indexes will help things. And then look at the plans afterwards to see if they made a difference.

SQL non-clustered index

I have a table that maps a user's permissions to a given object. So, it is essentially a join table to 3 different tables. (Object, User, and Permission)
The values of each row will always be unique for all 3 columns, but not any 2.
I need to create a non-clustered index. I want to put the index on the foreign keys to the object and user, but I am wondering if I should put it on all 3 columns.
"The values of each row will always be unique for all 3 columns"
You might be interested to know that SQL Server unique constraints are implemented as indexes. So if you have (or want) a constraint backing up that unique-claim of yours, you already have an index on all 3.
CREATE UNIQUE NONCLUSTERED INDEX idx_unique_perms ON UserPermissions
(
ObjectId ASC,
UserId ASC,
PermissionID ASC
)
If you make one, just remember to order your columns for high selectivity.
If you have some doubts, formulate the query(ies) you intend to execute against these tables, and run the SSMS Query Tuning Wizard. That should help you get started in the right direction.
One thing to consider is the number of rows in these three tables. If the row counts will be small, it might not even be worthwhile adding indexes. A table scan would probably be done anyway.

What is a Covered Index?

I've just heard the term covered index in some database discussion - what does it mean?
A covering index is an index that contains all of, and possibly more, the columns you need for your query.
For instance, this:
SELECT *
FROM tablename
WHERE criteria
will typically use indexes to speed up the resolution of which rows to retrieve using criteria, but then it will go to the full table to retrieve the rows.
However, if the index contained the columns column1, column2 and column3, then this sql:
SELECT column1, column2
FROM tablename
WHERE criteria
and, provided that particular index could be used to speed up the resolution of which rows to retrieve, the index already contains the values of the columns you're interested in, so it won't have to go to the table to retrieve the rows, but can produce the results directly from the index.
This can also be used if you see that a typical query uses 1-2 columns to resolve which rows, and then typically adds another 1-2 columns, it could be beneficial to append those extra columns (if they're the same all over) to the index, so that the query processor can get everything from the index itself.
Here's an article: Index Covering Boosts SQL Server Query Performance on the subject.
Covering index is just an ordinary index. It's called "covering" if it can satisfy query without necessity to analyze data.
example:
CREATE TABLE MyTable
(
ID INT IDENTITY PRIMARY KEY,
Foo INT
)
CREATE NONCLUSTERED INDEX index1 ON MyTable(ID, Foo)
SELECT ID, Foo FROM MyTable -- All requested data are covered by index
This is one of the fastest methods to retrieve data from SQL server.
Covering indexes are indexes which "cover" all columns needed from a specific table, removing the need to access the physical table at all for a given query/ operation.
Since the index contains the desired columns (or a superset of them), table access can be replaced with an index lookup or scan -- which is generally much faster.
Columns to cover:
parameterized or static conditions; columns restricted by a parameterized or constant condition.
join columns; columns dynamically used for joining
selected columns; to answer selected values.
While covering indexes can often provide good benefit for retrieval, they do add somewhat to insert/ update overhead; due to the need to write extra or larger index rows on every update.
Covering indexes for Joined Queries
Covering indexes are probably most valuable as a performance technique for joined queries. This is because joined queries are more costly & more likely then single-table retrievals to suffer high cost performance problems.
in a joined query, covering indexes should be considered per-table.
each 'covering index' removes a physical table access from the plan & replaces it with index-only access.
investigate the plan costs & experiment with which tables are most worthwhile to replace by a covering index.
by this means, the multiplicative cost of large join plans can be significantly reduced.
For example:
select oi.title, c.name, c.address
from porderitem poi
join porder po on po.id = poi.fk_order
join customer c on c.id = po.fk_customer
where po.orderdate > ? and po.status = 'SHIPPING';
create index porder_custitem on porder (orderdate, id, status, fk_customer);
See:
http://literatejava.com/sql/covering-indexes-query-optimization/
Lets say you have a simple table with the below columns, you have only indexed Id here:
Id (Int), Telephone_Number (Int), Name (VARCHAR), Address (VARCHAR)
Imagine you have to run the below query and check whether its using index, and whether performing efficiently without I/O calls or not. Remember, you have only created an index on Id.
SELECT Id FROM mytable WHERE Telephone_Number = '55442233';
When you check for performance on this query you will be dissappointed, since Telephone_Number is not indexed this needs to fetch rows from table using I/O calls. So, this is not a covering indexed since there is some column in query which is not indexed, which leads to frequent I/O calls.
To make it a covered index you need to create a composite index on (Id, Telephone_Number).
For more details, please refer to this blog:
https://www.percona.com/blog/2006/11/23/covering-index-and-prefix-indexes/