One column index vs 2 column index - sql

I want to execute the next delete query:
delete from MyTable where userId = 5
What of the below indexed will have better performance on this query, or will they both perform the same?
All the mentioned fields in here are BigInt
CREATE INDEX MyTable_UserId_UserBalance_index ON Main.dbo.MyTable (UserId, UserBalance);
CREATE INDEX MyTable_UserId_index ON Main.dbo.MyTable (UserId);

For that particular query they should perform roughly the same. Since you're looking up the first item in the compound index, finding the records should be the same as if there were another index on the single column.

I'm surprised at many of the answers here - for the purpose of deletion, there won't be any benefit using either one of these indexes over the other. For if the record exists in either index, you'll need to find it and remove it.
The purpose of indexes is for reads, not deleting data. If you're trying to read data, you would ask a question like this one, as one index has the potential to return the data quicker than the other. For deletes, you need to delete from all indexes, including the NC indexes.
It seems as if some enlightenment into the world of indexes is being called for.
Some great (free) documentation from well-known DBA - Brent Ozar: https://www.brentozar.com/archive/2016/10/think-like-engine-class-now-free-open-source/

CREATE INDEX MyTable_UserId_index ON Main.dbo.MyTable (UserId);

Both the indexes would work the same way unless the order is reversed for below index.
From:
CREATE INDEX MyTable_UserId_UserBalance_index ON Main.dbo.MyTable (UserId, UserBalance);
To:
CREATE INDEX MyTable_UserId_UserBalance_index ON Main.dbo.MyTable (UserBalance, UserId);
In second case, server may not see this index as useful as UserId is at second level.
Also, why would you create two indexes with same column in it? If you know that your table will be queried frequently with both UserId and UserBalance then, probably it is best to create an index with both columns in it.
Again, just make sure which column gets utilized the most.

Related

How to build an index that is best for this SQL?

I'm querying database by a simple SQL like:
SELECT DISTINCT label FROM Document WHERE folderId=123
I need a best index for it.
There are two way to implement this.
(1) create two indexes
CREATE INDEX .. ON Document (label)
CREATE INDEX .. ON Document (folderId)
(2) create a composite index:
CREATE INDEX .. ON Document (folderId, label)
According to the implementation of the database index. Which method is more reasonable? Thank you.
Your second index -- the composite index -- is the best index for this query:
SELECT DISTINCT label
FROM Document
WHERE folderId = 123;
First, the index covers the query so the data pages do not need to be accessed (in most databases).
The index works because the engine can seek to the records with the identified folderId. The information on label can be pulled. Most databases should use the index for pulling the distinct labels as well.
The answer depends also on the used dbms. I'm not sure if all dbms support the use of multiple indexes.
I should think the combined index (folderid, label) is the best solution in your case as it is possible to build your result set solely by using index data. So it doesn't even need to access the real data.
It can even be a strategy to add extra columns to an index so a query which is called a lot can be answered by only accessing the index.

Why can't I simply add an index that includes all columns?

I have a table in SQL Server database which I want to be able to search and retrieve data from as fast as possible. I don't care about how long time it takes to insert into the table, I am only interested in the speed at which I can get data.
The problem is the table is accessed with 20 or more different types of queries. This makes it a tedious task to add an index specially designed for each query. I'm considering instead simply adding an index that includes ALL columns of the table. It's not something you would normally do in "good" database design, so I'm assuming there is some good reason why I shouldn't do it.
Can anyone tell me why I shouldn't do this?
UPDATE: I forgot to mention, I also don't care about the size of my database. It's OK that it means my database size will grow larger than it needed to
First of all, an index in SQL Server can only have at most 900 bytes in its index entry. That alone makes it impossible to have an index with all columns.
Most of all: such an index makes no sense at all. What are you trying to achieve??
Consider this: if you have an index on (LastName, FirstName, Street, City), that index will not be able to be used to speed up queries on
FirstName alone
City
Street
That index would be useful for searches on
(LastName), or
(LastName, FirstName), or
(LastName, FirstName, Street), or
(LastName, FirstName, Street, City)
but really nothing else - certainly not if you search for just Street or just City!
The order of the columns in your index makes quite a difference, and the query optimizer can't just use any column somewhere in the middle of an index for lookups.
Consider your phone book: it's order probably by LastName, FirstName, maybe Street. So does that indexing help you find all "Joe's" in your city? All people living on "Main Street" ?? No - you can lookup by LastName first - then you get more specific inside that set of data. Just having an index over everything doesn't help speed up searching for all columns at all.
If you want to be able to search by Street - you need to add a separate index on (Street) (and possibly another column or two that make sense).
If you want to be able to search by Occupation or whatever else - you need another specific index for that.
Just because your column exists in an index doesn't mean that'll speed up all searches for that column!
The main rule is: use as few indices as possible - too many indices can be even worse for a system than having no indices at all.... build your system, monitor its performance, and find those queries that cost the most - then optimize these, e.g. by adding indices.
Don't just blindly index every column just because you can - this is a guarantee for lousy system performance - any index also requires maintenance and upkeep, so the more indices you have, the more your INSERT, UPDATE and DELETE operations will suffer (get slower) since all those indices need to be updated.
You are having a fundamental misunderstanding how indexes work.
Read this explanation "how multi-column indexes work".
The next question you might have is why not creating one index per column--but that's also a dead-end if you try to reach top select performance.
You might feel that it is a tedious task, but I would say it's a required task to index carefully. Sloppy indexing strikes back, as in this example.
Note: I am strongly convinced that proper indexing pays off and I know that many people are having the very same questions you have. That's why I'm writing a the a free book about it. The links above refer the pages that might help you to answer your question. However, you might also want to read it from the beginning.
...if you add an index that contains all columns, and a query was actually able to use that index, it would scan it in the order of the primary key. Which means hitting nearly every record. Average search time would be O(n/2).. the same as hitting the actual database.
You need to read a bit lot about indexes.
It might help if you consider an index on a table to be a bit like a Dictionary in C#.
var nameIndex = new Dictionary<String, List<int>>();
That means that the name column is indexed, and will return a list of primary keys.
var nameOccupationIndex = new Dictionary<String, List<Dictionary<String, List<int>>>>();
That means that the name column + occupation columns are indexed. Now imagine the index contained 10 different columns, nested so far deep it contains every single row in your table.
This isn't exactly how it works mind you. But it should give you an idea of how indexes could work if implemented in C#. What you need to do is create indexes based on one or two keys that are queried on extensively, so that the index is more useful than scanning the entire table.
If this is a data warehouse type operation where queries are highly optimized for READ queries, and if you have 20 ways of dissecting the data, e.g.
WHERE clause involves..
Q1: status, type, customer
Q2: price, customer, band
Q3: sale_month, band, type, status
Q4: customer
etc
And you absolutely have plenty of fast storage space to burn, then by all means create an index for EVERY single column, separately. So a 20-column table will have 20 indexes, one for each individual column. I could probably say to ignore bit columns or low cardinality columns, but since we're going so far, why bother (with that admonition). They will just sit there and churn the WRITE time, but if you don't care about that part of the picture, then we're all good.
Analyze your 20 queries, and if you have hot queries (the hottest ones) that still won't go any faster, plan it using SSMS (press Ctrl-L) with one query in the query window. It will tell you what index can help that queries - just create it; create them all, fully remembering that this adds again to the write cost, backup file size, db maintenance time etc.
I think the questioner is asking
'why can't I make an index like':
create index index_name
on table_name
(
*
)
The problems with that have been addressed.
But given it sounds like they are using MS sql server.
It's useful to understand that you can include nonkey columns in an index so they the values of those columns are available for retrieval from the index, but not to be used as selection criteria :
create index index_name
on table_name
(
foreign_key
)
include (a,b,c,d) -- every column except foreign key
I created two tables with a million identical rows
I indexed table A like this
create nonclustered index index_name_A
on A
(
foreign_key -- this is a guid
)
and table B like this
create nonclustered index index_name_B
on B
(
foreign_key -- this is a guid
)
include (id,a,b,c,d) -- ( every key except foreign key)
no surprise, table A was slightly faster to insert to.
but when I and ran these this queries
select * from A where foreign_key = #guid
select * from B where foreign_key = #guid
On table A, sql server didn't even use the index, it did a table scan, and complained about a missing index including id,a,b,c,d
On table B, the query was over 50 times faster with much less io
forcing the query on A to use the index didn't make it any faster
select * from A where foreign_key = #guid
select * from A with (index(index_name_A)) where foreign_key = #guid
I'm considering instead simply adding an index that includes ALL columns of the table.
This is always a bad idea. Indexes in database is not some sort of pixie dust that works magically. You have to analyze your queries and according to what and how is being queried - append indexes.
It is not as simple as "add everything to index and have a nap"
I see only long and complicated answers here so I thought I should give the simplest answer possible.
You cannot add an entire table, or all its columns, to an index because that just duplicates the table.
In simple terms, an index is just another table with selected data ordered in the order you normally expect to query it in, and a pointer to the row on disk where the rest of the data lives.
So, a level of indirection exists. You have a partial copy of a table in an preordered manner (both on disk and in RAM, assuming the index is not fragmented), which is faster to query for the columns defined in the index only, while the rest of the columns can be fetched without having to scan the disk for them, because the index contains a reference to the correct position on disk where the rest of the data is for each row.
1) size, an index essentially builds a copy of the data in that column some easily searchable structure, like a binary tree (I don't know SQL Server specifcs).
2) You mentioned speed, index structures are slower to add to.
That index would just be identical to your table (possibly sorted in another order).
It won't speed up your queries.

Do any databases allow multiple indexes on the same table to be created simultaneously?

I'm pretty sure this can't be done in Oracle, but I'd love to be proved wrong...
Say I have a huge table in with lots of columns and I want to create indexes on a dozen or so columns. Using Oracle, I'd fire off several sequential create index statements and go off and boil the kettle.
Each create index needs to scan through every row in the table to form the index.
i.e. 10 indexes = 10 full scans.
You'd think an obvious optimisation would be to scan the table once and index the 10 columns at the same time. Wouldn't you?
create indexes on mytable (
ix_mytable_cola (cola),
ix_mytable_colb (colb),
ix_mytable_colc (colc)
);
So obvious that there must be a great reason why it's not there.
Any ideas?
I could fire off each create index simultaneously in separate sessions and hope the database buffer cache saved the day, but seems like a long shot.
EDIT
I didn't get a definitive answer so I asked the same question on Oracle-L:
http://www.freelists.org/post/oracle-l/Creating-multiple-indexes
General consensus was that it isn't available but would perhaps be a useful feature. Most useful response was from David Aldridge who suggested that if the create index statements were all kicked off at the same time then Oracle would 'do the right thing'.
I don't believe it's possible in Oracle, or any other DBMS. However in Oracle you can speed up index creation using options like PARALLEL and NOLOGGING.
PARALLEL allows you to parallelize the processing onto N other CPUS.
NOLOGGING forgoes writing to the redo log (which may not be for you).
CREATE INDEX some_idx
ON a_table(col1, col2, col3)
PARALLEL 3
NOLOGGING;
The answer is no for Oracle and according to my research it is also no for DB2. I doubt any others have that feature.
In your example, you had mutiple single-column indexes, so the following suggestion does not apply here. But I wanted to point it out anyway since it IS an example of how to cut down on index creation time in certain cases.
When the table rows are physically sorted in the same order as the index you are building, you may specify the "NOSORT" option in your create index statement. That way Oracle does not have to sort the rows during the index creation (which is a killer when the sort spills to disk).
This is typically useful when you create an empty table (or CTAS), insert the rows in some specific order (enforced by order by), and then directly create an index using the same column order as your order by statement.
Noticed this blog post by David Aldridge from March 2008...
http://oraclesponge.wordpress.com/2008/03/26/how-to-create-multiple-indexes-in-a-single-ddl-statement/

SQL2005 indexes

Was scanning through a SQL2005 database and saw the following two indexes for a table:
**PK_CLUSTERED_INDEX**
USER_ID
COMPANY_ID
DEPARTMENT_ID
**NON-unique_NON-clustered_INDEX**
USER_ID
COMPANY_ID
My initial thought is, drop the last index since the PK_CLUSTERED_INDEX already contains those columns, correct order and sort. Do the last index provide any gains at all?
If searching by User_ID, or User_ID and Company_ID columns, both indexes could fulfil that.
However, only the PK index would then be ideal if the Department_Id is also queried on in addition to those 2 fields.
If a query filters on User_ID and Company_ID, and needs to return other data columns then the PK index is still best as it has all the data there to hand. Whereas the nonclustered index doesn't so would probably need a Key Lookup to pull out the extra fields which is not as efficient.
It looks redundant to me, so I'd definitely be considering removing it.
To see whether an index is actually being used/get a feel for the level of usage, you can run one of the various index usage stats scripts out there. A good example is here.
In this case, drop the index, given it's non-unique, I would bet the optimizer never hits it, the first index is more unique and doesn't involve a row lookup after finding a match.
First one's all around better, you're not losing anything by dropping the second.
I would drop the NON-unique_NON-clustered_INDEX index, it is redundant and not needed.

Two single-column indexes vs one two-column index in MySQL?

I'm faced with the following and I'm not sure what's best practice.
Consider the following table (which will get large):
id PK | giver_id FK | recipient_id FK | date
I'm using InnoDB and from what I understand, it creates indices automatically for the two foreign key columns. However, I'll also be doing lots of queries where I need to match a particular combination of:
SELECT...WHERE giver_id = x AND recipient_id = t.
Each such combination will be unique in the table.
Is there any benefit from adding an two-column index over these columns, or would the two individual indexes in theory be sufficient / the same?
If you have two single column indexes, only one of them will be used in your example.
If you have an index with two columns, the query might be faster (you should measure). A two column index can also be used as a single column index, but only for the column listed first.
Sometimes it can be useful to have an index on (A,B) and another index on (B). This makes queries using either or both of the columns fast, but of course uses also more disk space.
When choosing the indexes, you also need to consider the effect on inserting, deleting and updating. More indexes = slower updates.
A covering index like:
ALTER TABLE your_table ADD INDEX (giver_id, recipient_id);
...would mean that the index could be used if a query referred to giver_id, or a combination of giver_id and recipient_id. Mind that index criteria is leftmost based - a query referring to only recipient_id would not be able to use the covering index in the statement I provided.
Please note that some older MySQL versions can only use one index per SELECT so a covering index would be the best means of optimizing your queries.
If one of the foreign key indexes is already very selective, then the database engine should use that one for the query you specified. Most database engines use some kind of heuristic to be able to choose the optimal index in that situation. If neither index is highly selective by itself, it probably does make sense to add the index built on both keys since you say you will use that type of query a lot.
Another thing to consider is if you can eliminate the PK field in this table and define the primary key index on the giver_id and recipient_id fields. You said that the combination is unique, so that would possibly work (given a lot of other conditions that only you can answer). Typically, though, I think the added complexity that adds is not worth the hassle.
Another thing to consider is that the performance characteristics of both approaches will be based on the size and cardinality of the dataset. You may find that the 2-column index only becomes noticing more performant at a certain dataset size threshold, or the exact opposite. Nothing can substitute for performance metrics for your exact scenario.