Clustered Index Vs Non-Clustered Index Usage - sql

I am new to query optimization in T-SQL and I am a bit confused with one of the implementations.
The scenario has been explained here: I have this table (Table A) on which regular inserts are happening, no updates - only inserts as data is being moved to another table (Table B) based on a filter on a particular column in Table A (Col-1).
Two columns in Table A which I am focusing on are Col-1 (identity column) and Col-2 (nvarchar(20) -- and has duplicates).
Col-2 is on which I am filtering my records when moving my data from Table A to Table B.
Should I be defining a clustered index on Col-1 and a nonclustered index on Col-2, since I am filtering on Col-2; or should I only define a nonclustered index on Col-2 to speed up query performance?
Or should I keep the table as Heap and only define nonclustered index on the Col-2.
Moreover, would defining a clustered index and storing the table as a B-Tree degrade performance as we are appending data into Table -A weekly through inserts.
Thanks for the help.

As many here have said, it's hard to say definitively what will be the best solution without testing. However, you say that you are filtering by col2 before choosing to move data. Depending on what percentage of those records are moved, I would suggest starting with clustering on the unique col1. Then create a non-clustered index on col2. One advantage of the non-clustered index is that you can make it a filtered index with a WHERE clause. So, for example, if only 10% of your records have a col2 value from a few choices that you care about, the index 'WHERE col2 IN (val, val2, val3) will be 10x smaller and therefore faster to access.
If you go this route, make sure the WHERE clause in your SELECT matches the WHERE clause you specify on the index.

Related

Adding specific index to SQL Server table to improve performance

I have a slow query on a table.
SELECT (some columns)
FROM Table
This table has an ID (integer, identity (1,1)) primary index which is the only index on this table.
The query has a WHERE clause:
WHERE Field05 <> 1
AND (Field01 LIKE '%something%' OR Field02 LIKE '%something%' OR
Field03 LIKE'%something%' OR Field04 LIKE'%something%')
Field05 is bit, not null
Field01 is NVarchar(255)
Field02 is NVarchar(255)
Field03 is Nchar(11)
Field04 is Varchar(50)
The execution plan shows a "Clustered index scan" resulting in a slow execution.
I tried adding indexes:
CREATE NONCLUSTERED INDEX IX_Aziende_RagSoc ON dbo.Aziende (Field01);
CREATE NONCLUSTERED INDEX IX_Aziende_Nome ON dbo.Aziende (Field02);
CREATE NONCLUSTERED INDEX IX_Aziende_PIVA ON dbo.Aziende (Field03);
CREATE NONCLUSTERED INDEX IX_Aziende_CodFisc ON dbo.Aziende (Field04);
CREATE NONCLUSTERED INDEX IX_Aziende_Eliminata ON dbo.Aziende (Field05);
Same performances, and again, the execution plan shows a "Clustered index scan"
I removed these 5 indexes and added only ONE index:
CREATE NONCLUSTERED INDEX IX_Aziende_Ricerca
ON Aziende (Field05)
INCLUDE (Field01, Field02, Field03, Field04)
Same performances, but in this situation the execution plan changes.
Is more complex but always slow.
I removed this index and added a different index:
CREATE NONCLUSTERED INDEX IX_Aziende_Ricerca
ON Aziende (Field05,Field01,Field02,Field03,Field04)
Same performances, in this situation the execution plan remains like in the previous situation.
The execution is always slow.
I have no other ideas ... someone can help?
This is too long for a comment.
First, you should use Field05 = 0 rather than Field05 <> 1. Equality is both easier to read and better for the optimizer. It won't make a difference in this particular case, unless you have a clustered index starting with Field05 or if almost all values are 1 (that is, the 0 is highly selective).
Second, in general, you can only optimize string pattern matching using a full text index. This in turn has other limitations, such as looking for words or prefixes (but not starting with wildcards).
The one exception is if "something" is a constant. In that case, you could add persisted computed columns with indexes to capture whether the value is present in these columns. However, I'm guessing that "something" is not constant.
That leaves you with full text indexes or with reconsidering your data model. Perhaps you are storing things in strings -- like lists of tags -- that should really be in a separate table.
Just to chime in with a few comments.
SQL Server tends to Table Scan Even if an index is present unless it thinks the Searched field Has a Cardinality of less than 1%. With this in mind there is never going to be any value in a index on a Bit field. (cardinality 50%!)
One option you might consider is to create a Filtered Index (WHERE Field05 = 0) Then you can include your other fields in this index.
Note this will only help you if you are not selecting any other columns from the table.
Can you check what proportion of your data has Field5=0 ?- If this is small (eg under 10%) then a filtered index might help.
I can't see any way that you can avoid a scan of some sort though - The best you can get is probably an Index scan.
Another option (essentially the same thing!) is to create a schema bound indexed view with all the columns you need and with the field5=0 filter hardcoded into the view.
Again - Unless you are certain that the Selected Column list is going to be a tiny proportion of the columns in the table then SQL will probably be faster with a table scan. If you were only ever selecting a handful of columns from a a very wide table then an index covering these columns might help as even though it will still be a scan - there will be more rows per page than scanning the full table.
So in summary - If you can guarantee a small subset of the table cols will be selected
AND field5 = 0 represents a minority of your rows in the table then a filtered index with Includes can be of value.
EG
CREATE NONCLUSTERED INDEX ix ON dbo.Aziende(ID) INCLUDE (Field01,Field02,Field03,Field04, [other cols used by select]) WHERE (field5=0)
Good Luck!
After a lot of fight I forgot the idea of adding an index.
Nothing changes with index.
I changed the C# code that builds the query, and now I try to understand the meaning of the "something" parameter received from the function.
If it is of type 1, then I build a WHERE on Field01
If it is of type 2, then I build a WHERE on Field02
If it is of type 3, then I build a WHERE on Field03
If it is of type 4, then I build a WHERE on Field04
This way, execution times becomes 1/4 of before.
Curstomers are satisfied.

In Oracle, if I make a composite index on 2 columns, then in which situation this index will be used to search the record?

In Oracle, if I make a composite index on 2 columns, then in which situation this index will be used to search the record ?
a) If my query has a WHERE clause which involves first column
e.g. WHERE first_column = 'John'
b) If my query has a WHERE clause which involves second column
e.g. WHERE second_column = 'Sharma'
c) Either a or b
d) Both a and b
e) Not specifically these 2 columns but it could be any column in the WHERE clause.
f) Only column a or both columns a and b
I happen to think that MySQL does a pretty good job of describing how composite indexes are used. The documentation is here.
The basic idea is that the index would normally be used in the following circumstances:
When the where condition is an equality on col1 (col1 = value).
When the where condition is an inequality or in on col1 (col1 in (list), col1 < value)
When the where condition is an equality on col1 and col2, connected by an and (col1 = val1 and col2 = val2)
When the where condition is an equality on col1 and an inequality or in on col2.
Any of the above four cases where additional columns are used with additional conditions on other columns, connected by an and.
In addition, the index would normally be used if col1 and col2 are the only columns referenced in the query. This is called a covering index, and -- assuming there are other columns in the table -- it is faster to read the index than the original table because the index is smaller.
Oracle has a pretty smart optimizer, so it might also use the index in some related circumstances, for instance when col1 uses an in condition along with a condition on col2.
In general, a condition will not qualify for an index if the column is an argument to a function. So, these clauses would not use a basic index:
where month(col1) = 3
where trunc(col1) = trunc(sysdate)
where abs(col1) < 1
Oracle supports functional indexes, so if these constructs are actually important, you can create an index on month(col1), trunc(col1), or abs(col1).
Also, or tends to make the use of indexes less likely.
d) Both a or b
If the leading column is used, Oracle will likely use a regular index range scan and just ignore the unused columns.
If a non-leading column is used, Oracle can use an index skip scan. In practice a skip scan is not used very often.
There are two completely different questions here: when can Oracle use an index and when will Oracle use an index. The above explains that Oracle can use an index in either case, and you can test that out with a hint: /*+ index(table_name index_name) */.
Determining when Oracle will use an index is much trickier. Oracle uses multi-block reads for full table scans and fast full index scans, and uses single-block reads for other index scans. This means a full table scan is more efficient when reading a larger percent of the data. But there are a lot of factors involved: the percentage of data, how big is the index, system statistics that tell Oracle how fast single- and multi-block IO are, the number of distinct values (especially important for choosing a skip scan), index clustering factor (how ordered is the table by the index columns), etc.
The optimizer will use indexes in several scenarios. Even if not "perfect".
Optimaly, if you are querying using the first columns in the index, then the index will be used. Even if you're referencing only the first column, then it will still use the index if the optimizer deems it filters out enough data.
If the indexed columns aren't answering the query requirement (for instance only referencing the second column in the where clause), the optimizer could still use the index for a full (table) index scan, if it holds all of the data required, because the index is smaller than the full table.
In your example, if you are only querying from that table, and you only have that one index, (a) will use the index, (b) will use it if you are only querying columns in the index, while the table itself has more.
If you have other indexes, or join other tables, then that could affect the explain plan compeltely.
Check out http://docs.oracle.com/cd/B19306_01/server.102/b14231/indexes.htm

Optimizing index update speed during bulk insert into table

When inserting large sets of data into a table (from another table, in no particular order), how do you optimize a multi-column index so that the index is updated in the fastest possible way?
Assume the index is never used in any SELECT, DELETE or UPDATE query.* Assume also that the distinct counts for the columns as follows (for example):
COLUMN | DISTINCT COUNT
col1 | 634
col2 | 9,923
col3 | 2,357
col4 | 3
* Reason for not using the index in selecting data is this is a primary key index or a unique constraint index. The index is in place so that inserts violating the constraint should fail.
I have read that the most selective column should come first. Is that correct, and is the index then to be created as follows?
(col2, col3, col1, col4)
If that is wrong, how do you determine the best order for column in an index which will only see bulk INSERTs into the corresponding table? The goal is to speed up the updating of the index during the bulk INSERT.
The quickest way is to DROP INDEX, then do the bulk inserts and CREATE INDEX when you are done inserting.
The proper structure of the index does not have so much to do with the distribution of values in the columns but with the retrieval strategies, presumably for UPDATE and DELETE only, and then specifically when you do partial filtering on some but not all always all columns of the index. Those more frequent filters should come first in your index columns. But you probably want to reconsider your indexing strategy more radically if this is the case: it may be better to have two or more indexes to match your typical retrieval strategies.
Ignoring your call for ignorance: why would you not apply the index to SELECT statements? Indexes are useful only for selecting subsets of data from your tables, whether that is for SELECT or a qualified UPDATE or DELETE. There is no functional difference for using indexes in any of these three operations.
Addendum after comments from OP: Indexes are useful for many purposes but their maintenance is relatively expensive, where "relatively" becomes "impossibly" very quickly with increasing table size. In your case you have to compare every record from your source table with every record in your destination table, or O(m*n) order. That is unworkable with tables of a large size, even with an index. Your best bet is to drop the index, do the inserts, create an index which is not unique, find and delete all duplicates, drop the index, finally create a new unique index.
The order of the columns is not terribly important for a uniqueness enforcement purposes. But it would be unusual for a unique index to not also happen to be useful for some queries, and so I would order the columns to take advantage of that.
For bulk inserting into this index rapidly, I'd try inserting in index order. So add an order by (col2, col3, col1, col4) to the select part of your insert. This leads to more efficient IO.

SQL Server multiple index order optimization

I have a table with a nonclustered index1 on ID1 and ID2, in that order.
Select count(distinct(id1)) from table
returns 1
and Select count(distinct(id2)) from table has all the values of the table.
The querys to that table uses ... where id1= XX and id2 = XX
Could it make any performance improvement if I switch the order of the fields of index1 ?
I know it SHOULD be better but maybe: is it indifferent because id1 has only 1 value?
If I understand correctly, you are comparing these two statements:
where id1= XX and id2 = XX
Under most circumstances, this would use either an index on table(id1, id2) or table(id2, id1). The order of the comparisons in the where (or on) clauses has no impact on which indexes can be used.
Whether you should include a column that has only a single value in the unique index is a different matter. There is a minor performance effect to having a more complex index -- the tree structure has to store more bytes for each key. However, the query:
select count(distinct id2)
from table
where id1 = xx and idx = xx
will actually run faster with a composite index than with a singleton index table(id2). The reason is that the composite index can be used to entirely satisfy the query (in the jargon, it is a "covering index for the query"). The singleton index would need to look up the value of id1 in the table data, which requires extra processing.
The order you define the columns in your Index matters. If your column ID1 will always only have 1 value, then there is no point in putting it into the index, unless you are using it in a Covering Index in a Non-Clustered Index (meaning an Index not the physical ordering of the Table itself). In general, your first column defined in your Index should be the column with the most Varying Values that you need to search through. Visualize it this way, if you had a table of 1 million rows, and the first Column in your Index only had 1 (or small number) of varying values, then would that Index help you in finding the rows you want among the 1 million? Or would it be better to have ID2 first, which would be more efficient for the search, and which would be more frequently used, is what you have to ask yourself. Below is also more info on your question.
SQL Server Clustered Index - Order of Index Question
If you are using a Non-Clustered index, it may appear to not make a Different if your first Column in your Index is all the same values. However it does matter, the reason being is a Non-Clustered Index is stored on a number of Pages. The more entries you can store on a Page which helps you search faster the better. If you include a Column on a Page which adds no value to the Search, then it will requires the same Index to span more Pages. Meaning more Pages to flip through and Longer Lookups. It also means less Room to add new entries to an Existing Page during Inserts when the index is updated, causing more Page Splits. So there are side effects to the decision to add a Column of only 1 value to the Index. If you are using the Column to "cover" retrieved values in common selects, then you can also use Included Columns in your Index, which has the added benefit of not reordering your Index and yet acts like a Covered Index. If that was the intended purpose originally for adding a Column which only has 1 value.

Decision when to create Index on table column in database?

I am not db guy. But I need to create tables and do CRUD operations on them. I get confused should I create the index on all columns by default
or not? Here is my understanding which I consider while creating index.
Index basically contains the memory location range ( starting memory location where first value is stored to end memory location where last value is
stored). So when we insert any value in table index for column needs to be updated as it has got one more value but update of column
value wont have any impact on index value. Right? So bottom line is when my column is used in join between two tables we should consider
creating index on column used in join but all other columns can be skipped because if we create index on them it will involve extra cost of
updating index value when new value is inserted in column.Right?
Consider this scenario where table mytable contains two three columns i.e col1,col2,col3. Now we fire this query
select col1,col2 from mytable
Now there are two cases here. In first case we create the index on col1 and col2. In second case we don't create any index.** As per my understanding
case 1 will be faster than case2 because in case 1 we oracle can quickly find column memory location. So here I have not used any join columns but
still index is helping here. So should I consider creating index here or not?**
What if in the same scenario above if we fire
select * from mytable
instead of
select col1,col2 from mytable
Will index help here?
Don't create Indexes in every column! It will slow things down on insert/delete/update operations.
As a simple reminder, you can create an index in columns that are common in WHERE, ORDER BY and GROUP BY clauses. You may consider adding an index in colums that are used to relate other tables (through a JOIN, for example)
Example:
SELECT col1,col2,col3 FROM my_table WHERE col2=1
Here, creating an index on col2 would help this query a lot.
Also, consider index selectivity. Simply put, create index on values that has a "big domain", i.e. Ids, names, etc. Don't create them on Male/Female columns.
but update of column value wont have any impact on index value. Right?
No. Updating an indexed column will have an impact. The Oracle 11g performance manual states that:
UPDATE statements that modify indexed columns and INSERT and DELETE
statements that modify indexed tables take longer than if there were
no index. Such SQL statements must modify data in indexes and data in
tables. They also create additional undo and redo.
So bottom line is when my column is used in join between two tables we should consider creating index on column used in join but all other columns can be skipped because if we create index on them it will involve extra cost of updating index value when new value is inserted in column. Right?
Not just Inserts but any other Data Manipulation Language statement.
Consider this scenario . . . Will index help here?
With regards to this last paragraph, why not build some test cases with representative data volumes so that you prove or disprove your assumptions about which columns you should index?
In the specific scenario you give, there is no WHERE clause, so a table scan is going to be used or the index scan will be used, but you're only dropping one column, so the performance might not be that different. In the second scenario, the index shouldn't be used, since it isn't covering and there is no WHERE clause. If there were a WHERE clause, the index could allow the filtering to reduce the number of rows which need to be looked up to get the missing column.
Oracle has a number of different tables, including heap or index organized tables.
If an index is covering, it is more likely to be used, especially when selective. But note that an index organized table is not better than a covering index on a heap when there are constraints in the WHERE clause and far fewer columns in the covering index than in the base table.
Creating indexes with more columns than are actually used only helps if they are more likely to make the index covering, but adding all the columns would be similar to an index organized table. Note that Oracle does not have the equivalent of SQL Server's INCLUDE (COLUMN) which can be used to make indexes more covering (it's effectively making an additional clustered index of only a subset of the columns - useful if you want an index to be unique but also add some data which you don't want to be considered in the uniqueness but helps to make it covering for more queries)
You need to look at your plans and then determine if indexes will help things. And then look at the plans afterwards to see if they made a difference.