Deciding when to create an index on a table column in a database

I am not a DB guy, but I need to create tables and do CRUD operations on them. I am confused about whether I should create an index on all columns by default
or not. Here is my understanding, which I rely on when creating an index.
An index basically contains the memory location range (from the starting memory location where the first value is stored to the end memory location where the last value is
stored). So when we insert a value into the table, the index for the column needs to be updated because it has one more value, but an update of a column
value won't have any impact on the index. Right? So the bottom line is that when a column is used in a join between two tables, we should consider
creating an index on it, but all other columns can be skipped, because an index on them would add the extra cost of
updating the index when a new value is inserted into the column. Right?
Consider this scenario where the table mytable contains three columns, i.e. col1, col2, col3. Now we fire this query:
select col1,col2 from mytable
Now there are two cases here. In the first case we create an index on col1 and col2. In the second case we don't create any index. As per my understanding,
case 1 will be faster than case 2 because in case 1 Oracle can quickly find the column's memory location. So here I have not used any join columns, but
the index is still helping. So should I consider creating an index here or not?
What if, in the same scenario, we fire
select * from mytable
instead of
select col1,col2 from mytable
Will index help here?

Don't create indexes on every column! It will slow down insert/delete/update operations.
As a simple rule of thumb, create an index on columns that are common in WHERE, ORDER BY and GROUP BY clauses. You may also consider adding an index on columns that are used to relate other tables (through a JOIN, for example).
Example:
SELECT col1,col2,col3 FROM my_table WHERE col2=1
Here, creating an index on col2 would help this query a lot.
Also, consider index selectivity. Simply put, create indexes on columns whose values have a "big domain", i.e. IDs, names, etc. Don't create them on Male/Female columns.
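To see the effect of indexing the WHERE column, here is a minimal sketch using Python's built-in sqlite3 module. SQLite is only a stand-in here (the question is about Oracle, whose optimizer and plan output differ), and the index name idx_my_table_col2 and the sample data are made up for illustration.

```python
import sqlite3

def plan(conn, sql):
    """Return the 'detail' strings of SQLite's EXPLAIN QUERY PLAN output."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (col1 INTEGER, col2 INTEGER, col3 INTEGER)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?)",
    [(i, i % 100, i * 2) for i in range(1000)],  # illustrative data only
)

query = "SELECT col1, col2, col3 FROM my_table WHERE col2 = 1"

# Plan before the index exists: the whole table has to be scanned.
plan_without_index = plan(conn, query)

# Hypothetical index name; col2 is the column used in the WHERE clause.
conn.execute("CREATE INDEX idx_my_table_col2 ON my_table (col2)")
plan_with_index = plan(conn, query)

print("without index:", plan_without_index)
print("with index:   ", plan_with_index)
```

The second plan should mention the index, while the first one is a plain scan. On Oracle you would check the same thing with EXPLAIN PLAN.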

but update of column value wont have any impact on index value. Right?
No. Updating an indexed column will have an impact. The Oracle 11g performance manual states that:
UPDATE statements that modify indexed columns and INSERT and DELETE
statements that modify indexed tables take longer than if there were
no index. Such SQL statements must modify data in indexes and data in
tables. They also create additional undo and redo.
So bottom line is when my column is used in join between two tables we should consider creating index on column used in join but all other columns can be skipped because if we create index on them it will involve extra cost of updating index value when new value is inserted in column. Right?
Not just Inserts but any other Data Manipulation Language statement.
Consider this scenario . . . Will index help here?
With regards to this last paragraph, why not build some test cases with representative data volumes so that you prove or disprove your assumptions about which columns you should index?
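A small test case along these lines might start like the sketch below. It uses SQLite through Python's sqlite3 module purely as an assumption for illustration (the numbers on Oracle will differ), and it only measures storage: each index is a separate structure that takes space and has to be maintained by every INSERT, DELETE and UPDATE of an indexed column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (col1 INTEGER, col2 INTEGER, col3 INTEGER)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [(i, i % 50, i * 3) for i in range(10000)],  # illustrative data volume
)
conn.commit()

def page_count(connection):
    """Number of database pages currently in use (SQLite-specific pragma)."""
    return connection.execute("PRAGMA page_count").fetchone()[0]

pages_table_only = page_count(conn)

# Index every column "by default", as the question proposes.
for column in ("col1", "col2", "col3"):
    conn.execute(f"CREATE INDEX idx_{column} ON mytable ({column})")
conn.commit()

pages_with_indexes = page_count(conn)

print("pages, table only:  ", pages_table_only)
print("pages, with indexes:", pages_with_indexes)
```

For a real decision you would load representative data volumes and time your actual queries and DML with and without each candidate index.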

In the specific scenario you give, there is no WHERE clause, so either a full table scan or a full index scan will be used. Since the index only drops one column, the performance might not be that different. In the second scenario, the index shouldn't be used, since it isn't covering and there is no WHERE clause. If there were a WHERE clause, the index could allow the filtering to reduce the number of rows which need to be looked up to get the missing column.
Oracle has a number of different table types, including heap-organized and index-organized tables.
If an index is covering, it is more likely to be used, especially when selective. But note that an index organized table is not better than a covering index on a heap when there are constraints in the WHERE clause and far fewer columns in the covering index than in the base table.
Creating indexes with more columns than are actually used only helps if they are more likely to make the index covering, but adding all the columns would be similar to an index organized table. Note that Oracle does not have the equivalent of SQL Server's INCLUDE (COLUMN) which can be used to make indexes more covering (it's effectively making an additional clustered index of only a subset of the columns - useful if you want an index to be unique but also add some data which you don't want to be considered in the uniqueness but helps to make it covering for more queries)
You need to look at your plans and then determine if indexes will help things. And then look at the plans afterwards to see if they made a difference.

Related

Does PostgreSQL use all available indexes to run a query faster?

We are structuring a project where some tables will have many records, and we intend to use 4 numeric foreign keys and 1 numeric primary key. Our assumption is that if we create an index for each foreign key, in addition to the default index of the primary key, the Postgres planner would use all the indexes (5 in total) to perform the query.
95% of the time the queries would be providing at least the 4 foreign keys.
Would each index be used to position the search faster in the sequential section of records?
Would having 4 indexes increase the speed of the query or would it suffice with a single index of the parent level (branch_id)?
Thank you for your time and experience.
example: if all foreign keys have an index
SELECT * FROM products WHERE
account_id=1 AND
organization_id=2 AND
business_id=3 AND
branch_id=4 AND
product_id=5;
example: if I only indicate the id of the primary key
SELECT * FROM products WHERE product_id=5;
If all 4 columns are specified by equality, it is possible to combine the single-column indexes using BitmapAnd. However, this would be less efficient than using one multi-column index on all four columns.
Since that will apparently be a very common query, it would make sense to have that multi-column index.
Usually you will want to index each foreign key column. Otherwise, if you want to delete an organization, for example, it would need to scan the whole table to verify that no records were still referencing it. Whichever column is the first one in the multi-column index will not need to also have a single-column index for it. But the other 3 which are not first probably still need their own indexes.
Indexes are (predominantly) used when filtering or joining tables, so whether the indexes you are proposing are useful is entirely dependent on the SQL you are running and whether the query optimiser determines that using an index would be beneficial.
For example, if you ran SELECT * FROM TABLE then none of the indexes would be used.
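I can't comment on PostgreSQL specifically, but most DBMSs automatically create an index when you define a primary key, and some (MySQL's InnoDB, for example) also do so for foreign keys, so you may get some of these indexes anyway, regardless of any performance tuning you are trying to implement. Check what your database actually created before relying on this.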
Update
Having individual indexes on each column is not going to help much with the query you've provided; the optimiser will most likely use only one of them, probably the PK. A compound index on multiple columns would help, but the more columns you add to the index, the more restrictive the pattern of queries it will benefit.
Say you have 3 columns A, B, C and include them all in WHERE clause, then having a compound index of A+B+C would be highly beneficial.
If you keep this index but your WHERE clause only has columns A, B it will still benefit significantly as the query can still use the A+B subset of the index.
If your WHERE clause only has columns A, C then it would benefit only slightly, as it would select all records from the index that start with the A value, but then would have to filter them to find the subset with the C value.
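The left-prefix behaviour described above can be checked with a quick sketch. This one assumes SQLite (via Python's sqlite3 module) rather than PostgreSQL, so the plan text is SQLite's, and the table t, the index idx_abc and the extra payload column are hypothetical names. In the plan output, the columns listed in parentheses are the ones the index is actually searched on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# payload is an extra column so that SELECT * cannot be answered from the index alone.
conn.execute("CREATE TABLE t (a INTEGER, b INTEGER, c INTEGER, payload TEXT)")
conn.execute("CREATE INDEX idx_abc ON t (a, b, c)")

def plan(where):
    """Return SQLite's query plan for SELECT * FROM t with the given WHERE clause."""
    rows = conn.execute("EXPLAIN QUERY PLAN SELECT * FROM t WHERE " + where)
    return " | ".join(row[3] for row in rows)

plan_abc = plan("a = 1 AND b = 2 AND c = 3")  # all three key columns usable
plan_ab = plan("a = 1 AND b = 2")             # leading A+B prefix usable
plan_ac = plan("a = 1 AND c = 3")             # only A usable; C is filtered afterwards
plan_c = plan("c = 3")                        # no leading column, index not searched

for label, text in [("A,B,C", plan_abc), ("A,B", plan_ab), ("A,C", plan_ac), ("C", plan_c)]:
    print(label, "->", text)
```

In PostgreSQL the equivalent check is EXPLAIN on the real query against representative data.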

What does the INCLUDE clause do when creating an index in MS SQL Server?

What is the difference between creating an index with the INCLUDE clause and without it?
What would be the difference between the following two indexes?
CREATE NONCLUSTERED INDEX SomeName ON SomeTable (
ColumnA
,ColumnB
,ColumnC
,ColumnD
) INCLUDE (
ColumnE
,ColumnF
,ColumnG
)
vs
CREATE INDEX SomeName ON SomeTable (
ColumnA
,ColumnB
,ColumnC
,ColumnD
,ColumnE
,ColumnF
,ColumnG
)
The INCLUDE clause adds the data at the lowest/leaf level, rather than in the index tree. This makes the index smaller because it's not part of the tree.
INCLUDE columns are not key columns in the index, so they are not ordered. This means they are not typically useful for predicates, JOINs or sorting. And because they are not key columns, they are stored only at the leaf level instead of sitting in the whole B-tree structure like key columns. However, they may be useful if you have a residual lookup in a few rows from the key columns.
By adding Include (or nonkey)columns, you can create nonclustered indexes that cover more queries. This is because the nonkey columns have the following benefits:
They can be data types not allowed as index key columns.
They are not considered by the Database Engine when calculating the number of index key columns or index key size.
An index with Included columns can significantly improve query performance when all columns in the query are included in the index either as key or nonkey columns. Performance gains are achieved because the query optimizer can locate all the column values within the index; table or clustered index data is not accessed resulting in fewer disk I/O operations.
For more information, refer to the Microsoft docs: Create Indexes with Included Columns
When an execution plan uses an index, it has access to all the columns in the index. If all the columns from a given table are in the index, there is no need to refer to the original data pages. Eliminating that data page lookup is a gain in efficiency.
However, including columns in indexes has overhead for the indexing structure itself (this is in addition to duplicating the values).
The INCLUDE keyword allows for column values to be in the index, without incurring the overhead of the additional indexing structure. The purpose is to resolve queries without having to look up the column information on the original data pages.

Index is not getting used

This is an excerpt from Tom Kyte's book.
"We’re using a SELECT COUNT(*) FROM T query (or something similar)
and we have a B*Tree index on table T. However, the optimizer is full
scanning the table, rather than counting the (much smaller) index
entries. In this case, the index is probably on a set of columns that
can contain Nulls. Since a totally Null index entry would never be
made, the count of rows in the index will not be the count of rows in
the table. Here the optimizer is doing the right thing—it would get
the wrong answer if it used the index to count rows."
As far as I know, indexes come into the picture when we use a WHERE clause. Why does an index come into play in the above scenario? Before countering him I wanted to know the facts.
"As far as I know, indexes come into the picture when we use a WHERE clause."
That's one use case for indexes, when we want quick access to rows identified by specific values of indexed column(s). But there are other uses.
Counting rows is one. To count the number of rows in a table Oracle actually has to count each row (because statistics may not be fresh enough), which means literally reading each block of storage and counting the rows in each block. Potentially that's a lot of reads.
However, an index on a NOT NULL column also has an entry for each row of the table. Indexes are much smaller than tables (typically only one column) so an Index block contains many more entries than a Table block. Consequently Oracle has to read far fewer Index blocks to get the count of rows than scanning the table would require. Reading fewer blocks is faster than reading more blocks.
This isn't true if the table only has indexes on nullable columns. Oracle doesn't index null values (unless the index is a composite index and at least one column is populated) so a count of the entries in an index couldn't guarantee to be the actual count of the table's rows.
Another common use case for reading indexes is to satisfy a SELECT statement where all the columns in a projection are in one index, and the index also services any WHERE conditions.
Oracle Database does not store NULLs in the B-tree index. See the documentation:
Oracle Database does not index table rows in which all key columns are
null, except for bitmap indexes or when the cluster key column value
is null.
Because of this, if the index has been created on a column that may contain null values, the database cannot use this index in a query like: SELECT COUNT(*) FROM T. Even when the column does not contain any NULLs, the optimizer doesn't know this unless the column has been marked as NOT NULL.
According to the documentation - FAST FULL INDEX SCAN
Fast Full Index Scan
A fast full index scan is a full index scan in
which the database accesses the data in the index itself without
accessing the table, and the database reads the index blocks in no
particular order.
Fast full index scans are an alternative to a full table scan when
both of the following conditions are met:
The index must contain all columns needed for the query.
A row containing all nulls must not appear in the query result set.
For this result to be guaranteed, at least one column in the index
must have either:
A NOT NULL constraint
A predicate applied to the column that prevents nulls from being
considered in the query result set
So if you know that the indexed column cannot contain NULL values, then mark this column as NOT NULL using ALTER TABLE table_name MODIFY column_name column_type NOT NULL; and the database will use that index in the query: SELECT COUNT(*) FROM T
If the column can have nulls, and cannot be marked as NOT NULL, then use the solution from @Gordon Linoff's answer.
You can force the indexing of NULL values by including a constant in the index:
create index t_table_col on t(col, 0);
The 0 is a constant expression that is never NULL.
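Why the constant helps can be illustrated with a toy model in plain Python. This is not how Oracle stores anything internally. It only mimics the documented rule that a B-tree entry is skipped when all key columns are NULL, and all names in it are made up.

```python
# Toy table: (rowid, col) pairs, where None stands for NULL.
rows = [(1, "x"), (2, None), (3, "y"), (4, None)]

def is_all_null(key):
    """A key is skipped by the toy index only if every part of it is NULL."""
    return all(part is None for part in key)

def build_index(table_rows, key_of):
    """Build a list of (key, rowid) entries, skipping all-NULL keys."""
    return [(key_of(col), rowid) for rowid, col in table_rows
            if not is_all_null(key_of(col))]

# Index on (col): rows with a NULL col get no entry.
index_on_col = build_index(rows, lambda col: (col,))

# Index on (col, 0): the constant 0 is never NULL, so every row gets an entry.
index_on_col_and_constant = build_index(rows, lambda col: (col, 0))

print("rows in table:          ", len(rows))
print("entries in (col) index: ", len(index_on_col))
print("entries in (col, 0):    ", len(index_on_col_and_constant))
```

Only the second index has one entry per table row, which is what makes it safe to count rows from the index instead of the table.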

Adding fields to optimize MySQL queries

I have a MySQL table with 3 fields:
Location
Variable
Value
I frequently use the following query:
SELECT *
FROM Table
WHERE Location = '$Location'
AND Variable = '$Variable'
ORDER BY Location, Variable
I have over a million rows in my table and queries are somewhat slow. Would it increase query speed if I added a field VariableLocation, which is the Variable and the Location combined? I would be able to change the query to:
SELECT *
FROM Table
WHERE VariableLocation = '$Location$Variable'
ORDER BY VariableLocation
I would add a covering index for the columns location and variable:
ALTER TABLE tablename
ADD INDEX (variable, location);
...though if the variable & location pairs are unique, they should be the primary key.
Combining the columns will likely cause more grief than it's worth. For example, if you need to pull out records by location or variable only, you'd have to substring the values in a subquery.
Try adding an index which covers the two fields. You should still get a performance boost, and your data stays understandable: with a combined column, it would look as if the two values belong together, when you only combined them to get performance.
I would advise against combining the fields. Instead, create an index that covers both fields in the same order as your ORDER BY clause:
ALTER TABLE tablename ADD INDEX (location, variable);
Combined indices and keys are only used in queries that involve all fields of the index or a subset of these fields read from left to right. Or in other words: If you use location in a WHERE condition, this index would be used, but ordering by variable would not use the index.
When trying to optimize queries, the EXPLAIN command is quite helpful: EXPLAIN in mysql docs
Correction update, courtesy of @paxdiablo:
A column in the table will make no difference. All you need is an index over both columns and the MySQL engine will use that. Adding a column in the table is actually worse than that since it breaks 3NF and wastes space. See http://dev.mysql.com/doc/refman/5.0/en/mysql-indexes.html which states: SELECT * FROM tbl_name WHERE col1=val1 AND col2=val2; If a multiple-column index exists on col1 and col2, the appropriate rows can be fetched directly.
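As a rough check of this advice, the sketch below uses SQLite through Python's sqlite3 module instead of MySQL (so the plan output is SQLite's, not what MySQL's EXPLAIN prints). The table name readings and the index name idx_location_variable are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (location TEXT, variable TEXT, value REAL)")
# One composite index instead of an extra concatenated VariableLocation column.
conn.execute("CREATE INDEX idx_location_variable ON readings (location, variable)")

def plan(sql, params):
    """Return SQLite's query plan for a parameterized statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params)
    return " | ".join(row[3] for row in rows)

# The frequent query from the question, with bound parameters.
plan_both = plan(
    "SELECT * FROM readings WHERE location = ? AND variable = ? "
    "ORDER BY location, variable",
    ("Berlin", "temperature"),
)

# A lookup by location only still works with the same index,
# which a concatenated column would not support directly.
plan_location_only = plan(
    "SELECT * FROM readings WHERE location = ?", ("Berlin",)
)

print(plan_both)
print(plan_location_only)
```

Both plans should search idx_location_variable: the first on both columns, the second on the leading location column only.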

What is a Covered Index?

I've just heard the term covered index in some database discussion - what does it mean?
A covering index is an index that contains all of, and possibly more, the columns you need for your query.
For instance, this:
SELECT *
FROM tablename
WHERE criteria
will typically use indexes to speed up the resolution of which rows to retrieve using criteria, but then it will go to the full table to retrieve the rows.
However, if the index contained the columns column1, column2 and column3, then this sql:
SELECT column1, column2
FROM tablename
WHERE criteria
and, provided that particular index could be used to speed up the resolution of which rows to retrieve, the index already contains the values of the columns you're interested in, so it won't have to go to the table to retrieve the rows, but can produce the results directly from the index.
This can also be used the other way around: if you see that a typical query uses 1-2 columns to resolve which rows to return, and then typically adds another 1-2 columns, it could be beneficial to append those extra columns (if they're the same all over) to the index, so that the query processor can get everything from the index itself.
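The difference between the two queries above can be made visible with a small sketch. It assumes SQLite (via Python's sqlite3 module), which labels index-only access as "USING COVERING INDEX" in its plan output. Other engines report this differently. The column4 column, the index name idx_cover and the concrete WHERE condition are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tablename ("
    "column1 INTEGER, column2 INTEGER, column3 INTEGER, column4 TEXT)"
)
# The index holds column1..column3; column4 lives only in the table.
conn.execute("CREATE INDEX idx_cover ON tablename (column1, column2, column3)")

def plan(sql):
    """Return SQLite's query plan for a statement as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql)
    return " | ".join(row[3] for row in rows)

# Every requested column is in the index: no table access is needed.
plan_covered = plan("SELECT column1, column2 FROM tablename WHERE column1 = 5")

# SELECT * also needs column4, so each match requires a table lookup.
plan_not_covered = plan("SELECT * FROM tablename WHERE column1 = 5")

print(plan_covered)
print(plan_not_covered)
```

Both queries use the index to find the rows, but only the first is answered from the index alone.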
Here's an article: Index Covering Boosts SQL Server Query Performance on the subject.
A covering index is just an ordinary index. It's called "covering" if it can satisfy a query without needing to read the table data.
example:
CREATE TABLE MyTable
(
ID INT IDENTITY PRIMARY KEY,
Foo INT
)
CREATE NONCLUSTERED INDEX index1 ON MyTable(ID, Foo)
SELECT ID, Foo FROM MyTable -- All requested data are covered by index
This is one of the fastest methods to retrieve data from SQL Server.
Covering indexes are indexes which "cover" all columns needed from a specific table, removing the need to access the physical table at all for a given query/ operation.
Since the index contains the desired columns (or a superset of them), table access can be replaced with an index lookup or scan -- which is generally much faster.
Columns to cover:
parameterized or static conditions; columns restricted by a parameterized or constant condition.
join columns; columns dynamically used for joining
selected columns; to answer selected values.
While covering indexes can often provide good benefit for retrieval, they do add somewhat to insert/ update overhead; due to the need to write extra or larger index rows on every update.
Covering indexes for Joined Queries
Covering indexes are probably most valuable as a performance technique for joined queries. This is because joined queries are more costly and more likely than single-table retrievals to suffer high-cost performance problems.
in a joined query, covering indexes should be considered per-table.
each 'covering index' removes a physical table access from the plan & replaces it with index-only access.
investigate the plan costs & experiment with which tables are most worthwhile to replace by a covering index.
by this means, the multiplicative cost of large join plans can be significantly reduced.
For example:
select poi.title, c.name, c.address
from porderitem poi
join porder po on po.id = poi.fk_order
join customer c on c.id = po.fk_customer
where po.orderdate > ? and po.status = 'SHIPPING';
create index porder_custitem on porder (orderdate, id, status, fk_customer);
See:
http://literatejava.com/sql/covering-indexes-query-optimization/
Let's say you have a simple table with the columns below, and you have only indexed Id:
Id (Int), Telephone_Number (Int), Name (VARCHAR), Address (VARCHAR)
Imagine you have to run the query below and check whether it uses an index, and whether it performs efficiently without I/O calls. Remember, you have only created an index on Id.
SELECT Id FROM mytable WHERE Telephone_Number = '55442233';
When you check the performance of this query you will be disappointed: since Telephone_Number is not indexed, the rows have to be fetched from the table using I/O calls. So this is not a covering index, because a column in the query is not indexed, which leads to frequent I/O calls.
To make it a covered index you need to create a composite index on (Id, Telephone_Number). Putting Telephone_Number first, as in (Telephone_Number, Id), additionally lets the WHERE clause search the index on its leading column instead of scanning it.
For more details, please refer to this blog:
https://www.percona.com/blog/2006/11/23/covering-index-and-prefix-indexes/