I am trying to learn about optimizing databases and queries. I have a test table Objednavka, that has a foreign key attribute ODIS. In this database, queries like
SELECT * FROM Objednavka WHERE ODIS = 123
are frequent, so I created an index like this
CREATE NONCLUSTERED INDEX Objednavka_ODIS_index ON Objednavka (ODIS)
I then looked at the plan of the query I mentioned and this is what I see:
Can someone please explain why do I have the Index Seek and Key Lookup operations performed in parallel, then joined using Nested Loops? From what I learned, I thought that the Index Seek should be performed first, so that the engine finds the location of the row that contains (the indexed) ODIS attribute in the index, and then it should retrieve the whole data using Key Lookup, when it already knows the location, or am I wrong?
non clustered index has the PK columns added to it automatically by SQL Server. The index seek is to seek the index b-tree for the value(s) you provided for the indexed column(s) and the result is the PK values (or RIDs if your table is a heap). If all the columns you queried are part of the indexed columns, which is called a covering index, you query is done because SQL server can get all the information from the b-tree. If you need to return other columns not in the index, SQL server needs to find the data of the record using the PK, which is the key-lookup part.
Related
I'm having this issue in Oracle 11g R2. Table containing not null column which is indexed with non unique index. The index is not containing other columns.
Then I assumed that if I query distinct values of the column from the table, it would use index to get different values of the column (sounds logical to me). However at least explain plan is telling me it's doing full table scan. Also it took some time so probably the plan was not changed during run time. Optimizer index hint didn't helped.
I tried to search answer for this but no luck. Is there way to get values stored in index or somehow query the table without "touching" the table at all (like multi column index joins can)?
Thanks!
EDIT: This was about Oracle EBS gl_balances table and gl_balances_n2 index. I got answer and this changed the explain plan:
select /*+ index_ffs(gl gl_balances_n2) */
distinct gl.period_name
from gl_balances gl;
It may not be more efficient to scan the index than to scan the table -- don't forget that the index segment also contains branch nodes, and each index entry has to contain a ROWID of about 16 bytes (if memory serves).
So a "fast full index scan", which is the plan you're looking to get, may not be as fast as a full table scan. (You'd use an index_ffs() hint for that, by the way.)
edit: It be possible to use a more exotic method
Maintaining your own list by periodically querying the table using DBMS_Scheduler.
A materialized view. Complete refresh on demand might be adequate, though barely better than just periodically querying the data and maintaining your own unique list.
Making the index compressed, though that would only be of value for longish index keys.
A bitmap index -- not for a concurrently modified table though.
Tabe1 has around 10 Lack records (1 Million) and does not contain any primary key. Retrieving the data by using SELECT command ( With a specific WHERE condition) is taking large amount of time. Can we reduce the time of retrieval by adding a primary key to the table or do we need to follow any other ways to do the same. Kindly help me.
A primary key does not have a direct affect on performance. But indirectly, it does. This is because when you add a primary key to a table, SQL Server creates a unique index (clustered by default) that is used to enforce entity integrity. But you can create your own unique indexes on a table. So, strictly speaking, a primary index does not affect performance, but the index used by the primary key does.
WHEN SHOULD PRIMARY KEY BE USED?
Primary key is needed for referring to a specific record.
To make your SELECTs run fast you should consider adding an index on an appropriate columns you're using in your WHERE.
E.g. to speed-up SELECT * FROM "Customers" WHERE "State" = 'CA' one should create an index on State column.
Primarykey will not help if you don't have Primarykey in where cause.
If you would like to make you quesry faster, you can create non-cluster index on columns in where cause. You may want include columns on top of your index(it depend on your select cause)
The SQL optimizer will seek on your indexs that will make your query faster.
(but you should think about when data adding in your table. Insert operation might takes time if you create index on many columns.)
It depends on the SELECT statement, and the size of each row in the table, the number of rows in the table, and whether you are retrieving all the data in each row or only a small subset of the data (and if a subset, whether the data columns that are needed are all present in a single index), and on whether the rows must be sorted.
If all the columns of all the rows in the table must be returned, then you can't speed things up by adding an index. If, on the other hand, you are only trying to retrieve a tiny fraction of the rows, then providing appropriate indexes on the columns involved in the filter conditions will greatly improve the performance of the query. If you are selecting all, or most, of the rows but only selecting a few of the columns, then if all those columns are present in a single index and there are no conditions on columns not in the index, an index can help.
Without a lot more information, it is hard to be more specific. There are whole books written on the subject, including:
Relational Database Index Design and the Optimizers
One way you can do it is to create indexes on your table. It's always better to create a primary key, which creates a unique index that by default will reduce the retrieval time .........
The optimizer chooses an index scan if the index columns are referenced in the SELECT statement and if the optimizer estimates that an index scan will be faster than a table scan. Index files generally are smaller and require less time to read than an entire table, particularly as tables grow larger. In addition, the entire index may not need to be scanned. The predicates that are applied to the index reduce the number of rows to be read from the data pages.
Read more: Advantages of using indexes in database?
Also, when is it appropriate to use one?
An index is used to speed up searching in the database. MySQL has some good documentation on the subject (which is relevant for other SQL servers as well):
http://dev.mysql.com/doc/refman/5.0/en/mysql-indexes.html
An index can be used to efficiently find all rows matching some column in your query and then walk through only that subset of the table to find exact matches. If you don't have indexes on any column in the WHERE clause, the SQL server has to walk through the whole table and check every row to see if it matches, which may be a slow operation on big tables.
The index can also be a UNIQUE index, which means that you cannot have duplicate values in that column, or a PRIMARY KEY which in some storage engines defines where in the database file the value is stored.
In MySQL you can use EXPLAIN in front of your SELECT statement to see if your query will make use of any index. This is a good start for troubleshooting performance problems. Read more here:
http://dev.mysql.com/doc/refman/5.0/en/explain.html
A clustered index is like the contents of a phone book. You can open the book at 'Hilditch, David' and find all the information for all of the 'Hilditch's right next to each other. Here the keys for the clustered index are (lastname, firstname).
This makes clustered indexes great for retrieving lots of data based on range based queries since all the data is located next to each other.
Since the clustered index is actually related to how the data is stored, there is only one of them possible per table (although you can cheat to simulate multiple clustered indexes).
A non-clustered index is different in that you can have many of them and they then point at the data in the clustered index. You could have e.g. a non-clustered index at the back of a phone book which is keyed on (town, address)
Imagine if you had to search through the phone book for all the people who live in 'London' - with only the clustered index you would have to search every single item in the phone book since the key on the clustered index is on (lastname, firstname) and as a result the people living in London are scattered randomly throughout the index.
If you have a non-clustered index on (town) then these queries can be performed much more quickly.
An index is used to speed up the performance of queries. It does this by reducing the number of database data pages that have to be visited/scanned.
In SQL Server, a clustered index determines the physical order of data in a table. There can be only one clustered index per table (the clustered index IS the table). All other indexes on a table are termed non-clustered.
SQL Server Index Basics
SQL Server Indexes: The Basics
SQL Server Indexes
Index Basics
Index (wiki)
Indexes are all about finding data quickly.
Indexes in a database are analogous to indexes that you find in a book. If a book has an index, and I ask you to find a chapter in that book, you can quickly find that with the help of the index. On the other hand, if the book does not have an index, you will have to spend more time looking for the chapter by looking at every page from the start to the end of the book.
In a similar fashion, indexes in a database can help queries find data quickly. If you are new to indexes, the following videos, can be very useful. In fact, I have learned a lot from them.
Index Basics
Clustered and Non-Clustered Indexes
Unique and Non-Unique Indexes
Advantages and disadvantages of indexes
Well in general index is a B-tree. There are two types of indexes: clustered and nonclustered.
Clustered index creates a physical order of rows (it can be only one and in most cases it is also a primary key - if you create primary key on table you create clustered index on this table also).
Nonclustered index is also a binary tree but it doesn't create a physical order of rows. So the leaf nodes of nonclustered index contain PK (if it exists) or row index.
Indexes are used to increase the speed of search. Because the complexity is of O(log N). Indexes is very large and interesting topic. I can say that creating indexes on large database is some kind of art sometimes.
INDEXES - to find data easily
UNIQUE INDEX - duplicate values are not allowed
Syntax for INDEX
CREATE INDEX INDEX_NAME ON TABLE_NAME(COLUMN);
Syntax for UNIQUE INDEX
CREATE UNIQUE INDEX INDEX_NAME ON TABLE_NAME(COLUMN);
First we need to understand how normal (without indexing) query runs. It basically traverse each rows one by one and when it finds the data it returns. Refer the following image. (This image has been taken from this video.)
So suppose query is to find 50 , it will have to read 49 records as a linear search.
Refer the following image. (This image has been taken from this video)
When we apply indexing, the query will quickly find out the data without reading each one of them just by eliminating half of the data in each traversal like a binary search. The mysql indexes are stored as B-tree where all the data are in leaf node.
INDEX is a performance optimization technique that speeds up the data retrieval process. It is a persistent data structure that is associated with a Table (or View) in order to increase performance during retrieving the data from that table (or View).
Index based search is applied more particularly when your queries include WHERE filter. Otherwise, i.e, a query without WHERE-filter selects whole data and process. Searching whole table without INDEX is called Table-scan.
You will find exact information for Sql-Indexes in clear and reliable way:
follow these links:
For cocnept-wise understanding:
http://dotnetauthorities.blogspot.in/2013/12/Microsoft-SQL-Server-Training-Online-Learning-Classes-INDEX-Overview-and-Optimizations.html
For implementation-wise understanding:
http://dotnetauthorities.blogspot.in/2013/12/Microsoft-SQL-Server-Training-Online-Learning-Classes-INDEX-Creation-Deletetion-Optimizations.html
If you're using SQL Server, one of the best resources is its own Books Online that comes with the install! It's the 1st place I would refer to for ANY SQL Server related topics.
If it's practical "how should I do this?" kind of questions, then StackOverflow would be a better place to ask.
Also, I haven't been back for a while but sqlservercentral.com used to be one of the top SQL Server related sites out there.
An index is used for several different reasons. The main reason is to speed up querying so that you can get rows or sort rows faster. Another reason is to define a primary-key or unique index which will guarantee that no other columns have the same values.
So, How indexing actually works?
Well, first off, the database table does not reorder itself when we put index on a column to optimize the query performance.
An index is a data structure, (most commonly its B-tree {Its balanced tree, not binary tree}) that stores the value for a specific column in a table.
The major advantage of B-tree is that the data in it is sortable. Along with it, B-Tree data structure is time efficient and operations such as searching, insertion, deletion can be done in logarithmic time.
So the index would look like this -
Here for each column, it would be mapped with a database internal identifier (pointer) which points to the exact location of the row. And, now if we run the same query.
Visual Representation of the Query execution
So, indexing just cuts down the time complexity from o(n) to o(log n).
A detailed info - https://pankajtanwar.in/blog/what-is-the-sorting-algorithm-behind-order-by-query-in-mysql
INDEX is not part of SQL. INDEX creates a Balanced Tree on physical level to accelerate CRUD.
SQL is a language which describe the Conceptual Level Schema and External Level Schema. SQL doesn't describe Physical Level Schema.
The statement which creates an INDEX is defined by DBMS, not by SQL standard.
An index is an on-disk structure associated with a table or view that speeds retrieval of rows from the table or view. An index contains keys built from one or more columns in the table or view. These keys are stored in a structure (B-tree) that enables SQL Server to find the row or rows associated with the key values quickly and efficiently.
Indexes are automatically created when PRIMARY KEY and UNIQUE constraints are defined on table columns. For example, when you create a table with a UNIQUE constraint, Database Engine automatically creates a nonclustered index.
If you configure a PRIMARY KEY, Database Engine automatically creates a clustered index, unless a clustered index already exists. When you try to enforce a PRIMARY KEY constraint on an existing table and a clustered index already exists on that table, SQL Server enforces the primary key using a nonclustered index.
Please refer to this for more information about indexes (clustered and non clustered):
https://learn.microsoft.com/en-us/sql/relational-databases/indexes/clustered-and-nonclustered-indexes-described?view=sql-server-ver15
Hope this helps!
I have created an indexed view:
CREATE VIEW LogValueTexts WITH SCHEMABINDING AS
SELECT ISNULL(LRVS_SLOG_ID*256+LRVS_IDX,0) AS ID,LRVS_VALUE AS Value
FROM dbo.LRVS_LogRecordedValues WHERE LEN(LRVS_VALUE)>4
CREATE UNIQUE CLUSTERED INDEX IX_LogValueTexts ON LogValueTexts (ID)
On SQL 2005 Standard SP3 it takes forever to populate a full-text index on that view because the full-text indexing executes the following query for every row in the view:
SELECT COLUMN FULLTEXTALL FROM[dbo].[LogValueTexts] WHERE COLUMN FULLTEXTKEY = #p1
I assume that COLUMN FULLTEXTALL and COLUMN FULLTEXTKEY are actually Value and ID, but that's what SQL Server Profiler shows. The problem is that the query plan uses a clustered index scan over about 11M rows/1GB of data because it doesn't use the index on the view.
I have tried creating a plan guide for that query, but since it's not a standard T-SQL query it doesn't allow it (Incorrect syntax near the keyword 'FULLTEXTKEY').
Is there a way to get this full-text index to work other than:
upgrading to SQL 2008 (or SQL 2005 Enterprise) where it works fine.
creating a unique ID and a covering index on the underlying table.
Upgrading would require downtime on the server and probably new SQL Server licences while creating the unique ID and a covering index would waste a lot of space because only a subset of the 11M rows needs full-text indexing (LRVS_VALUE is often NULL or has a very short text value).
I dont know your data; why cant you put the full text index on the original table? You could add the calculated column into your table structure. That way you wouldnt have the index rebuild operation ( i think that is the cause of your scan)
If you can't do that then next easiest change is likely to create an lookup table populated with an sp or a triggerthat way you can change the table's indexing so they make sense for your query.
Your final option (and one you would need to spend some time on to get right) would be using partitioned tables. You could have a partition that covers the data filtered on the view. Full text index the entire table; your query at run time would hit the partitioned table with the relevant data in it.
Can you explain the concepts of, and relationship between, Covering Indexes and Covered Queries in Microsoft's SQL Server?
A covering index is one which can satisfy all requested columns in a query without performing a further lookup into the clustered index.
There is no such thing as a covering query.
Have a look at this Simple-Talk article: Using Covering Indexes to Improve Query Performance.
If all the columns requested in the select list of query, are available in the index, then the query engine doesn't have to lookup the table again which can significantly increase the performance of the query. Since all the requested columns are available with in the index, the index is covering the query. So, the query is called a covering query and the index is a covering index.
A clustered index can always cover a query, if the columns in the select list are from the same table.
The following links can be helpful, if you are new to index concepts:
Excellent video on advantages and disadvantages of index and covering
queries and indexes.
Indexes in SQL Server
A Covering Index is a Non-Clustered index. Both Clustered and Non-Clustered indexes use B-Tree data structure to improve the search for data, the difference is that in the leaves of a Clustered Index a whole record (i.e. row) is stored physically right there!, but this is not the case for Non-Clustered indexes. The following examples illustrate it:
Example: I have a table with three columns: ID, Fname and Lname.
However, for a Non-Clustered index, there are two possibilities: either the table already has a Clustered index or it doesn't:
As the two diagrams show, such Non-Clustered indexes do not provide a good performance, because they cannot find the favorite value (i.e. Lname) solely from the B-Tree. Instead they have to do an extra Look Up step (either Key or RID look up) to find the value of Lname. And, this is where covered index comes to the screen. Here, the Non-Clustered index on ID coveres the value of Lname right next to it in the leaves of the B-Tree and there is no need for any type of look up anymore.
A covered query is a query where all the columns in the query's result set are pulled from non-clustered indexes.
A query is made into a covered query by the judicious arrangement of indexes.
A covered query is often more performant than a non-covered query in part because non-clustered indexes have more rows per page than clustered indexes or heap indexes, so fewer pages need to be brought into memory in order to satisfy the query. They have more rows per page because only part of the table row is part of the index row.
A covering index is an index which is used in a covered query. There is no such thing as an index which, in and of itself, is a covering index. An index may be a covering index with respect to query A, while at the same time not being a covering index with respect to query B.
Here's an article in devx.com that says:
Creating a non-clustered index that contains all the columns used in a SQL query, a technique called index covering
I can only suppose that a covered query is a query that has an index that covers all the columns in its returned recordset. One caveat - the index and query would have to be built as to allow the SQL server to actually infer from the query that the index is useful.
For example, a join of a table on itself might not benefit from such an index (depending on the intelligence of the SQL query execution planner):
PersonID ParentID Name
1 NULL Abe
2 NULL Bob
3 1 Carl
4 2 Dave
Let's assume there's an index on PersonID,ParentID,Name - this would be a covering index for a query like:
SELECT PersonID, ParentID, Name FROM MyTable
But a query like this:
SELECT PersonID, Name FROM MyTable LEFT JOIN MyTable T ON T.PersonID=MyTable.ParentID
Probably wouldn't benifit so much, even though all of the columns are in the index. Why? Because you're not really telling it that you want to use the triple index of PersonID,ParentID,Name.
Instead, you're building a condition based on two columns - PersonID and ParentID (which leaves out Name) and then you're asking for all the records, with the columns PersonID, Name. Actually, depending on implementation, the index might help the latter part. But for the first part, you're better off having other indexes.
A covering query is on where all the predicates can be matched using the indices on the underlying tables.
This is the first step towards improving the performance of the sql under consideration.
a covering index is the one which gives every required column and in which SQL server don't have hop back to the clustered index to find any column. This is achieved using non-clustered index and using INCLUDE option to cover columns.
Non-key columns can be included only in non-clustered indexes. Columns can’t be defined in both the key column and the INCLUDE list. Column names can’t be repeated in the INCLUDE list. Non-key columns can be dropped from a table only after the non-key index is dropped first. Please see details here
When I simply recalled that a Clustered Index consists of a key-ordered non-heap list of ALL the columns in the defined table, the lights went on for me. The word "cluster", then, refers to the fact that there is a "cluster" of all the columns, like a cluster of fish in that "hot spot". If there is no index covering the column containing the sought value (the right side of the equation), then the execution plan uses a Clustered Index Seek into the Clustered Index's representation of the requested column because it does not find the requested column in any other "covering" index. The missing will cause a Clustered Index Seek operator in the proposed Execution Plan, where the sought value is within a column inside the ordered list represented by the Clustered Index.
So, one solution is to create a non-clustered index that has the column containing the requested value inside the index. In this way, there is no need to reference the Clustered Index, and the Optimizer should be able to hook that index in the Execution Plan with no hint. If, however, there is a Predicate naming the single column clustering key and an argument to a scalar value on the clustering key, the Clustered Index Seek Operator will still be used, even if there is already a covering index on a second column in the table without an index.
Page 178, High Performance MySQL, 3rd Edition
An index that contains (or "covers") all the data needed to satisfy a query is called a covering index.
When you issue a query that is covered by an index (an indexed-covered query), you'll see "Using Index" in the Extra column in EXPLAIN.