Two single-column indexes vs one two-column index in MySQL?

I'm faced with the following and I'm not sure what's best practice.
Consider the following table (which will get large):
id PK | giver_id FK | recipient_id FK | date
I'm using InnoDB and from what I understand, it creates indices automatically for the two foreign key columns. However, I'll also be doing lots of queries where I need to match a particular combination of:
SELECT...WHERE giver_id = x AND recipient_id = t.
Each such combination will be unique in the table.
Is there any benefit from adding a two-column index over these columns, or would the two individual indexes in theory be sufficient / the same?

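If you have two single-column indexes, usually only one of them will be used in your example. (MySQL can sometimes combine both with an index merge intersection, but that is generally slower than a single two-column index.)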
If you have an index with two columns, the query might be faster (you should measure). A two column index can also be used as a single column index, but only for the column listed first.
Sometimes it can be useful to have an index on (A,B) and another index on (B). This makes queries using either or both of the columns fast, but of course also uses more disk space.
When choosing the indexes, you also need to consider the effect on inserting, deleting and updating. More indexes = slower updates.
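You can check this behaviour with a query planner. The sketch below uses SQLite, through Python's built-in sqlite3 module, as a stand-in for MySQL. That is an assumption: SQLite's planner is simpler, has no index merge for AND, and words its plans differently. The table gifts and the index names are hypothetical and mirror the question's schema.

```python
import sqlite3

SCHEMA = """
CREATE TABLE gifts (
    id INTEGER PRIMARY KEY,
    giver_id INTEGER NOT NULL,
    recipient_id INTEGER NOT NULL,
    "date" TEXT
);
"""


def make_db(*index_ddl: str) -> sqlite3.Connection:
    """Create an empty in-memory gifts table plus the given indexes."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    for ddl in index_ddl:
        conn.execute(ddl)
    return conn


def plan(conn: sqlite3.Connection, sql: str) -> str:
    """Return SQLite's EXPLAIN QUERY PLAN detail text as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column = detail text


both_sql = "SELECT * FROM gifts WHERE giver_id = 1 AND recipient_id = 2"

# Two single-column indexes, like the ones InnoDB creates for the foreign keys.
single = make_db(
    "CREATE INDEX idx_giver ON gifts (giver_id)",
    "CREATE INDEX idx_recipient ON gifts (recipient_id)",
)
print(plan(single, both_sql))  # only one of the two indexes is searched

# One two-column index instead.
composite = make_db("CREATE INDEX idx_giver_recipient ON gifts (giver_id, recipient_id)")
print(plan(composite, both_sql))  # both columns used in the seek
print(plan(composite, "SELECT * FROM gifts WHERE giver_id = 1"))  # leftmost column: still a seek
print(plan(composite, "SELECT * FROM gifts WHERE recipient_id = 2"))  # not leftmost: full scan
```

On MySQL, running EXPLAIN on the same SELECT statements shows the chosen index in the key column.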

A composite (multi-column) index like:
ALTER TABLE your_table ADD INDEX (giver_id, recipient_id);
...would mean that the index could be used if a query referred to giver_id, or to a combination of giver_id and recipient_id. Mind that index matching is leftmost-prefix based: a query referring only to recipient_id would not be able to use the composite index in the statement I provided.
Please note that some older MySQL versions can only use one index per table in a query, so a composite index would be the best means of optimizing your queries.
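Strictly speaking, a covering index is one that contains every column a query needs, so the table rows never have to be read. The small sketch below again uses SQLite via Python's sqlite3 as a stand-in for MySQL, with hypothetical table and index names. It shows the same (giver_id, recipient_id) index covering one query but not another:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE gifts (
    id INTEGER PRIMARY KEY,
    giver_id INTEGER NOT NULL,
    recipient_id INTEGER NOT NULL,
    "date" TEXT
);
CREATE INDEX idx_giver_recipient ON gifts (giver_id, recipient_id);
""")


def plan(conn: sqlite3.Connection, sql: str) -> str:
    """Return SQLite's EXPLAIN QUERY PLAN detail text as one string."""
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


# Every referenced column is in the index, so the table itself is never read.
covered = plan(conn, "SELECT recipient_id FROM gifts WHERE giver_id = 1")
# "date" is not in the index, so each match also needs a lookup in the table.
not_covered = plan(conn, 'SELECT "date" FROM gifts WHERE giver_id = 1 AND recipient_id = 2')
print(covered)      # ... USING COVERING INDEX idx_giver_recipient ...
print(not_covered)  # ... USING INDEX idx_giver_recipient ...
```

In InnoDB, every secondary index also stores the primary key, so a query selecting only id, giver_id and recipient_id would be covered too. MySQL's EXPLAIN reports this as "Using index" in the Extra column.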

If one of the foreign key indexes is already very selective, then the database engine should use that one for the query you specified. Most database engines use some kind of heuristic to be able to choose the optimal index in that situation. If neither index is highly selective by itself, it probably does make sense to add the index built on both keys since you say you will use that type of query a lot.
Another thing to consider is whether you can eliminate the PK field in this table and define the primary key index on the giver_id and recipient_id fields. You said that the combination is unique, so that could work (given a lot of other conditions that only you can answer). Typically, though, I think the added complexity is not worth the hassle.
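Here is a minimal sketch of the composite-primary-key alternative, with SQLite (Python's sqlite3) standing in for MySQL. WITHOUT ROWID makes SQLite store rows in primary-key order, which is only loosely comparable to InnoDB's clustered primary key. The table name and sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# No surrogate id: the (giver_id, recipient_id) pair itself is the primary key.
conn.execute("""
CREATE TABLE gifts (
    giver_id INTEGER NOT NULL,
    recipient_id INTEGER NOT NULL,
    "date" TEXT,
    PRIMARY KEY (giver_id, recipient_id)
) WITHOUT ROWID
""")
conn.execute("INSERT INTO gifts VALUES (1, 2, '2024-01-01')")  # made-up sample row


def insert_duplicate() -> bool:
    """Try to insert a second (1, 2) row; return True if the key rejected it."""
    try:
        conn.execute("INSERT INTO gifts VALUES (1, 2, '2024-02-01')")
    except sqlite3.IntegrityError:
        return True
    return False


rejected = insert_duplicate()
lookup_plan = " | ".join(
    row[-1]
    for row in conn.execute(
        "EXPLAIN QUERY PLAN SELECT * FROM gifts WHERE giver_id = 1 AND recipient_id = 2"
    )
)
print(rejected)     # True: uniqueness is enforced by the key itself
print(lookup_plan)  # a seek on both key columns, no secondary index needed
```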

Another thing to consider is that the performance characteristics of both approaches will depend on the size and cardinality of the dataset. You may find that the two-column index only becomes noticeably more performant at a certain dataset size threshold, or the exact opposite. Nothing can substitute for performance metrics for your exact scenario.
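Below is a rough measurement harness in that spirit. It is a sketch under stated assumptions: SQLite in memory instead of MySQL, a synthetic gifts table with uniformly random (hypothetical) IDs, and wall-clock timing of point lookups. Only numbers from your own data and server are meaningful.

```python
import random
import sqlite3
import time


def time_lookups(with_composite: bool, n_rows: int = 20_000,
                 n_ids: int = 500, n_lookups: int = 2_000):
    """Time point lookups on (giver_id, recipient_id); return (seconds, rows_found)."""
    rng = random.Random(42)  # fixed seed: both variants get identical data and probes
    pairs = sorted({(rng.randrange(n_ids), rng.randrange(n_ids)) for _ in range(n_rows)})

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE gifts (id INTEGER PRIMARY KEY, giver_id INTEGER, recipient_id INTEGER)")
    conn.executemany("INSERT INTO gifts (giver_id, recipient_id) VALUES (?, ?)", pairs)
    # The two single-column indexes are always present, as with InnoDB's FK indexes.
    conn.execute("CREATE INDEX idx_giver ON gifts (giver_id)")
    conn.execute("CREATE INDEX idx_recipient ON gifts (recipient_id)")
    if with_composite:
        conn.execute("CREATE INDEX idx_giver_recipient ON gifts (giver_id, recipient_id)")
    conn.execute("ANALYZE")  # collect statistics for the planner

    probes = rng.sample(pairs, n_lookups)  # existing pairs, so every probe finds one row
    found = 0
    start = time.perf_counter()
    for giver, recipient in probes:
        found += len(conn.execute(
            "SELECT id FROM gifts WHERE giver_id = ? AND recipient_id = ?",
            (giver, recipient),
        ).fetchall())
    elapsed = time.perf_counter() - start
    conn.close()
    return elapsed, found


t_single, n_single = time_lookups(with_composite=False)
t_composite, n_composite = time_lookups(with_composite=True)
print(f"single-column indexes only: {t_single:.4f}s ({n_single} rows)")
print(f"plus composite index:       {t_composite:.4f}s ({n_composite} rows)")
```

Varying n_rows and n_ids shows whether the gap changes with size and cardinality.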

Related

Is unique composite index as effective as non-composite for queries on first column?

I have a table with a B-tree index on column A (non-unique). Now I want to check the uniqueness of the (A, B) combination when inserting, so I want to add a unique composite index on (A, B).
Should I drop the existing non-composite index (as I have read, queries in most cases use a single index)?
Will unique composite index be as effective as non-unique non-composite one for queries only on column A?
If you have a lot of queries going for column A only in a where clause, then most likely you should keep the index on column A in addition to the new one.
The number of queries that would use the index and the difference in query cost are the two most important criteria for deciding whether to keep the index. Because both depend on many factors, such as the amount of data in the table and the query parameters, you can, as Frank Heikens' comment says, use EXPLAIN ANALYZE to check important queries with and without the index to confirm your hypothesis.
There is a very small probability it would make sense to keep both indexes. If the unique index is almost never exercised (because you never do inserts or non-HOT updates, or queries that benefit from both columns) and you have precisely the right amount of memory and memory usage patterns, then it is possible for the single-column index to be small enough to stay in cache while the composite would not be.
But most likely what would happen is that the composite index would be used at least enough of the time that both indexes would be fighting with each other for cache space, making it overall less effective.
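The thread above concerns PostgreSQL. As a stand-in, here is a SQLite sketch (Python's sqlite3, hypothetical table t and index uq_a_b). It shows that a unique composite index on (a, b) both rejects duplicate pairs and can be searched on a alone. It says nothing about the cache-size trade-off, which only measurement on the real system can settle.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (id INTEGER PRIMARY KEY, a INTEGER NOT NULL, b INTEGER NOT NULL, payload TEXT);
CREATE UNIQUE INDEX uq_a_b ON t (a, b);
""")


def plan(conn: sqlite3.Connection, sql: str) -> str:
    """Return SQLite's EXPLAIN QUERY PLAN detail text as one string."""
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


# The leading column alone is enough for an index seek.
a_only = plan(conn, "SELECT * FROM t WHERE a = 7")

# The same index enforces uniqueness of the (a, b) pair.
conn.execute("INSERT INTO t (a, b, payload) VALUES (7, 1, 'x')")
try:
    conn.execute("INSERT INTO t (a, b, payload) VALUES (7, 1, 'y')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(a_only)              # SEARCH ... USING INDEX uq_a_b (a=?)
print(duplicate_rejected)  # True
```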

Apache Cassandra. Advantage and disadvantage of Secondary Index

I have read that the secondary index in Cassandra is quite a useless feature. Indeed, it makes writing to the DB much slower, you can only look up values by an exact match on the index, and you need to send requests to all servers in the cluster to find a value by index. Can anyone tell me about a benefit that would be a reason to use a secondary index?
Querying becomes more flexible when you add secondary indexes to table columns. You can add indexed columns to the WHERE clause of a SELECT.
When to use secondary indexes
You want to query on a column that isn't the primary key and isn't part of a composite key. The column you want to query on has few unique values. For example, a column Town is a good choice for secondary indexing because lots of people will be from the same town; date of birth, however, will not be such a good choice.
When to avoid secondary indexes
Avoid secondary indexes on columns that contain a high count of unique values and will therefore produce few results per lookup.
As always, check out the documentation:
About Indexes in Cassandra
FAQ for Secondary Indexes

Issue with big tables (no primary key available)

Table1 has around 10 lakh (1 million) records and does not contain a primary key. Retrieving the data with a SELECT command (with a specific WHERE condition) takes a large amount of time. Can we reduce the retrieval time by adding a primary key to the table, or do we need to do something else? Kindly help me.
A primary key does not have a direct effect on performance. But indirectly, it does. This is because when you add a primary key to a table, SQL Server creates a unique index (clustered by default) that is used to enforce entity integrity. But you can create your own unique indexes on a table. So, strictly speaking, a primary key does not affect performance, but the index used by the primary key does.
WHEN SHOULD A PRIMARY KEY BE USED?
A primary key is needed to refer to a specific record.
To make your SELECTs run fast, you should consider adding an index on the appropriate columns used in your WHERE clause.
E.g. to speed up SELECT * FROM "Customers" WHERE "State" = 'CA', one should create an index on the State column.
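A quick way to see the effect of that index is to compare the query plan before and after creating it. The sketch below uses SQLite via Python's sqlite3, although the original question is about SQL Server. The index name is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "Customers" (id INTEGER PRIMARY KEY, name TEXT, "State" TEXT)')


def plan(conn: sqlite3.Connection, sql: str) -> str:
    """Return SQLite's EXPLAIN QUERY PLAN detail text as one string."""
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


query = """SELECT * FROM "Customers" WHERE "State" = 'CA'"""
before = plan(conn, query)  # no usable index: every row is scanned
conn.execute('CREATE INDEX idx_customers_state ON "Customers" ("State")')
after = plan(conn, query)   # seek into the index, then fetch only matching rows
print(before)
print(after)
```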
A primary key will not help if the primary key column is not in your WHERE clause.
If you would like to make your query faster, you can create a nonclustered index on the columns in your WHERE clause. You may also want to add included columns to the index (it depends on your SELECT clause).
The SQL Server optimizer can then seek on your indexes, which will make your query faster.
(But you should also think about data being added to your table: inserts may take longer if you create indexes on many columns.)
It depends on the SELECT statement, and the size of each row in the table, the number of rows in the table, and whether you are retrieving all the data in each row or only a small subset of the data (and if a subset, whether the data columns that are needed are all present in a single index), and on whether the rows must be sorted.
If all the columns of all the rows in the table must be returned, then you can't speed things up by adding an index. If, on the other hand, you are only trying to retrieve a tiny fraction of the rows, then providing appropriate indexes on the columns involved in the filter conditions will greatly improve the performance of the query. If you are selecting all, or most, of the rows but only selecting a few of the columns, then if all those columns are present in a single index and there are no conditions on columns not in the index, an index can help.
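The "all rows, few columns" case can be illustrated with SQLite standing in for the database in question (Python's sqlite3, with a hypothetical customers table and idx_state_city index). Selecting every column forces a plain table scan. Selecting only indexed columns lets the planner scan the smaller index instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (
    id INTEGER PRIMARY KEY, name TEXT, email TEXT, notes TEXT, state TEXT, city TEXT
);
CREATE INDEX idx_state_city ON customers (state, city);
""")


def plan(conn: sqlite3.Connection, sql: str) -> str:
    """Return SQLite's EXPLAIN QUERY PLAN detail text as one string."""
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


# All rows, all columns: the index cannot help, the table is read in full.
all_rows_all_cols = plan(conn, "SELECT * FROM customers")
# All rows, but only columns present in one index: the narrower index is scanned.
all_rows_few_cols = plan(conn, "SELECT state, city FROM customers")
print(all_rows_all_cols)
print(all_rows_few_cols)
```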
Without a lot more information, it is hard to be more specific. There are whole books written on the subject, including:
Relational Database Index Design and the Optimizers
One way you can do it is to create indexes on your table. It's always better to create a primary key, which creates a unique index that will by default reduce the retrieval time.
The optimizer chooses an index scan if the index columns are referenced in the SELECT statement and if the optimizer estimates that an index scan will be faster than a table scan. Index files generally are smaller and require less time to read than an entire table, particularly as tables grow larger. In addition, the entire index may not need to be scanned. The predicates that are applied to the index reduce the number of rows to be read from the data pages.
Read more: Advantages of using indexes in database?

When I use something like UNIQUE KEY(element1, element2), how does it work internally?

If I say INDEX(element1), INDEX(element2), does it use much less space than UNIQUE KEY(element1, element2)?
I know what they do is different. My understanding is that UNIQUE KEY(element1, element2) ensures that there are no duplicates, i.e. no two rows with the same values in both columns. Is this correct?
Does it still index both keys individually?
But is this expensive in terms of disk space and checking to create such an index?
Maybe it's better not to have it if it's not critical that there are no duplicates?
An INDEX(a,b) uses less space than two indexes INDEX(a) and INDEX(b), because each index consists of (a part of) that column and the primary key. But read the note below about the functional difference between these indices.
Correct. A UNIQUE KEY ensures that no two rows have the same values for the columns in that key.
A UNIQUE INDEX is also an INDEX and can be used for searching. A special example of a UNIQUE KEY is the PRIMARY KEY.
Indexes do take up space on the disk, depending on your Storage Engine. If your application is write-heavy (like a logging table), sometimes it might be better to not have an index. Most tables are probably read-heavy though.
From a logical point of view, if it's not critical that there are no duplicates, don't add the index.
Edit: elaboration on pst's comment:
If you have INDEX(A), INDEX(B) and INDEX(A,B), your INDEX(A) is redundant. Drop it.
But INDEX(A,B) does not cover queries that only search on B, you need an INDEX(B) for that.
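Here is a rough sketch of how such redundant indexes could be spotted automatically, using SQLite and its PRAGMA index_list / index_info. MySQL exposes similar information through information_schema.STATISTICS. The function redundant_indexes is hypothetical and only a heuristic: it compares plain column lists.

```python
import sqlite3


def redundant_indexes(conn: sqlite3.Connection, table: str) -> list:
    """Return non-unique indexes whose columns are a leftmost prefix of a longer index.

    Heuristic only: ignores partial and expression indexes, collations and sort
    order, and never flags UNIQUE indexes, which enforce a constraint.
    Assumes `table` and the index names are trusted identifiers.
    """
    columns, unique = {}, {}
    for row in conn.execute(f"PRAGMA index_list('{table}')"):
        name, is_unique = row[1], row[2]
        info = conn.execute(f"PRAGMA index_info('{name}')").fetchall()
        # index_info rows are (seqno, cid, column_name); sort by position in the index.
        columns[name] = [col for _, _, col in sorted(info)]
        unique[name] = bool(is_unique)

    redundant = []
    for name, cols in columns.items():
        if unique[name]:
            continue
        for other, other_cols in columns.items():
            if other != name and len(other_cols) > len(cols) and other_cols[:len(cols)] == cols:
                redundant.append(name)
                break
    return sorted(redundant)


demo = sqlite3.connect(":memory:")
demo.executescript("""
CREATE TABLE t (a INTEGER, b INTEGER, c INTEGER);
CREATE INDEX idx_a ON t (a);
CREATE INDEX idx_b ON t (b);
CREATE INDEX idx_a_b ON t (a, b);
""")
print(redundant_indexes(demo, "t"))  # ['idx_a']
```

idx_b is not flagged: (a, b) cannot serve queries on b alone, matching the point above.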
You can argue that INDEX(A) and INDEX(B) together can be combined by MySQL's INDEX MERGE to act like INDEX(A,B). This leaves you the choice between
1. INDEX(A,B) and INDEX(B)
2. INDEX(A) and INDEX(B)
Solution 2 will take less disk space, that is true. But read this Very Nice MySQLPerformanceBlog article about INDEX MERGE, which comes to this conclusion:
As a summary: Use multi column indexes is typically best idea if you use AND between such columns in where clause. Index merge does helps performance but it is far from performance of combined index in this case. In case you're using OR between columns - single column indexes are required for index merge to work and combined indexes can't be used for such queries.
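The AND/OR distinction in that conclusion also shows up in SQLite's planner. SQLite has an OR optimization loosely analogous to MySQL's index merge union, though the two engines are not equivalent; this is an assumption made for illustration. Table t and the index names are hypothetical.

```python
import sqlite3


def make_db(*index_ddl: str) -> sqlite3.Connection:
    """Create an empty in-memory table t plus the given indexes."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, a INTEGER, b INTEGER, c TEXT)")
    for ddl in index_ddl:
        conn.execute(ddl)
    return conn


def plan(conn: sqlite3.Connection, sql: str) -> str:
    """Return SQLite's EXPLAIN QUERY PLAN detail text as one string."""
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


separate = make_db("CREATE INDEX idx_a ON t (a)", "CREATE INDEX idx_b ON t (b)")
combined = make_db("CREATE INDEX idx_a_b ON t (a, b)")

or_sql = "SELECT * FROM t WHERE a = 1 OR b = 2"
and_sql = "SELECT * FROM t WHERE a = 1 AND b = 2"

print(plan(separate, or_sql))   # one index search per OR branch
print(plan(combined, or_sql))   # b = 2 alone cannot use (a, b): full scan
print(plan(combined, and_sql))  # a single seek on both columns
```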

How to know when to use indexes and which type?

I've searched a bit and didn't see any similar question, so here goes.
How do you know when to put an index in a table? How do you decide which columns to include in the index? When should a clustered index be used?
Can an index ever slow down the performance of select statements? How many indexes is too many and how big of a table do you need for it to benefit from an index?
EDIT:
What about column data types? Is it ok to have an index on a varchar or datetime?
Well, the first question is easy:
When should a clustered index be used?
Always. Period. Except for a very few, rare, edge cases. A clustered index makes a table faster, for every operation. YES! It does. See Kim Tripp's excellent The Clustered Index Debate continues for background info. She also mentions her main criteria for a clustered index:
narrow
static (never changes)
unique
if at all possible: ever-increasing
INT IDENTITY fulfills this perfectly; GUIDs do not. See GUID's as Primary Key for extensive background info.
Why narrow? Because the clustering key is added to each and every row of each and every non-clustered index on the same table (in order to be able to actually look up the data row, if needed). You don't want to have VARCHAR(200) in your clustering key...
Why unique? See above: the clustering key is the item and mechanism that SQL Server uses to uniquely find a data row. It has to be unique. If you pick a non-unique clustering key, SQL Server itself will add a 4-byte uniqueifier to rows with duplicate key values. Be careful of that!
Next: non-clustered indices. Basically there's one rule: any foreign key in a child table referencing another table should be indexed, it'll speed up JOINs and other operations.
Furthermore, any queries that have WHERE clauses are good candidates; pick first those that are executed a lot. Put indices on columns that show up in WHERE clauses and in ORDER BY clauses.
Next: measure your system, check the DMVs (dynamic management views) for hints about unused or missing indices, and tweak your system over and over again. It's an ongoing process; you'll never be done! See here for info on those two DMVs (missing and unused indices).
Another word of warning: with a truckload of indices, you can make any SELECT query go really really fast. But at the same time, INSERTs, UPDATEs and DELETEs which have to update all the indices involved might suffer. If you only ever SELECT - go nuts! Otherwise, it's a fine and delicate balancing act. You can always tweak a single query beyond belief - but the rest of your system might suffer in doing so. Don't over-index your database! Put a few good indices in place, check and observe how the system behaves, and then maybe add another one or two, and again: observe how the total system performance is affected by that.
The rule of thumb is: index the primary key (implied, and clustered by default) and each foreign key column.
There is more, but you could do worse than using SQL Server's missing index DMVs.
An index may slow down a SELECT if the optimiser makes a bad choice, and it is possible to have too many. Too many will slow writes, but it's also possible to have overlapping indexes.
Answering the ones I can, I would say that every table, no matter how small, will always benefit from at least one index, as there has to be at least one way in which you are interested in looking up the data; otherwise, why store it?
A general rule for adding indexes would be if you need to find data in the table using a particular field, or set of fields. This leads on to how many indexes are too many: generally, the more indexes you have, the slower inserts and updates will be, as they also have to modify the indexes, but it all depends on how you use your data. If you need fast inserts, then don't use too many. In reporting ("read-only") data stores you can have a number of them to make all your lookups faster.
Unfortunately there is no one rule to guide you on the number or type of indexes to use, although the query optimiser of your chosen DB can give hints based on the queries you are executing.
As for clustered indexes, they are the ace card you only get to play once, so choose carefully. It's worth calculating the selectivity of the field you are thinking of putting it on, as it would be wasted on something like a boolean field (contrived example), where the selectivity of the data is very low.
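Selectivity is commonly estimated as the number of distinct values divided by the number of rows. Below is a minimal sketch of that calculation with SQLite via Python's sqlite3. The orders table and its columns are made up, and COUNT(DISTINCT ...) ignores NULLs.

```python
import sqlite3


def selectivity(conn: sqlite3.Connection, table: str, column: str) -> float:
    """Distinct values / total rows: near 1.0 is highly selective, near 0 is not.

    Assumes `table` and `column` are trusted identifiers (they are interpolated).
    """
    distinct, total = conn.execute(
        f'SELECT COUNT(DISTINCT "{column}"), COUNT(*) FROM "{table}"'
    ).fetchone()
    return distinct / total if total else 0.0


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, is_paid INTEGER, customer_ref TEXT)")
conn.executemany(
    "INSERT INTO orders (is_paid, customer_ref) VALUES (?, ?)",
    [(i % 2, f"C{i}") for i in range(1000)],  # boolean-like column vs unique column
)
print(selectivity(conn, "orders", "is_paid"))       # 0.002
print(selectivity(conn, "orders", "customer_ref"))  # 1.0
```

The boolean-like column scores close to 0, which is why it makes a poor candidate.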
This is really a very involved question, though a good starting place would be to index any column that you will filter results on. E.g. if you often break products into groups by sale price, index the sale_price column of the products table to improve scan times for that query, etc.
If you are querying based on the value in a column, you probably want to index that column.
For example:
SELECT a,b,c FROM MyTable WHERE x = 1
You would want an index on X.
Generally, I add indexes for columns which are frequently queried, and I add compound indexes when I'm querying on more than one column.
Indexes won't hurt the performance of a SELECT, but they may slow down INSERTs (or UPDATEs) if you have too many indexed columns per table.
As a rule of thumb - start off by adding indexes when you find yourself saying WHERE a = 123 (in this case, an index for "a").
You should use an index on columns that you use for selection and ordering - i.e. the WHERE and ORDER BY clauses.
Indexes can slow down select statements if there are many of them and you are using WHERE and ORDER BY on columns that have not been indexed.
As for table size, several thousand rows and upwards would start showing real benefits from index usage.
Having said that, there are automated tools to do this, and SQL Server has a Database Tuning Advisor that will help with this.
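The ORDER BY point can be checked in a query plan. When the index matches both the WHERE filter and the sort order, no separate sort step is needed. The sketch below uses SQLite through Python's sqlite3 with a hypothetical orders table and index names. SQL Server's plans would show a Sort operator instead.

```python
import sqlite3


def make_db(index_ddl: str) -> sqlite3.Connection:
    """Create an empty in-memory orders table with one index."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, "
        "created_at TEXT, total REAL)"
    )
    conn.execute(index_ddl)
    return conn


def plan(conn: sqlite3.Connection, sql: str) -> str:
    """Return SQLite's EXPLAIN QUERY PLAN detail text as one string."""
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


sql = "SELECT * FROM orders WHERE customer_id = 42 ORDER BY created_at"
where_only = make_db("CREATE INDEX idx_customer ON orders (customer_id)")
where_and_order = make_db("CREATE INDEX idx_customer_created ON orders (customer_id, created_at)")

print(plan(where_only, sql))       # index seek, then an extra sort step
print(plan(where_and_order, sql))  # index seek returns rows already in order
```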