I have multiple tables storing call data, and they all have the same clustered index: start_time (DATETIME). The base table is "calls", and I also have a "calls_participants" and a "calls_other_data". All tables also have a call_id CHAR(36) column which identifies a call, so it should of course be indexed.
I am going to store a lot of rows (1 billion) and would like to save on storage and maintenance costs if possible, so my idea is to index the call_id column only on the base table; the other tables would then have no index apart from the CLUSTERED start_time index.
Then if I had to access a row in the calls_other_data table based on the call_id, I would write something like this:
SELECT cod.some_column
FROM calls_other_data cod
WHERE cod.start_time = (SELECT start_time
                        FROM calls
                        WHERE call_id = '36-chars-unique-value')
  AND cod.call_id = '36-chars-unique-value'
I would say the performance of this query is about the same as if there were an index on calls_other_data.call_id, since the calls.call_id index can be used the same way: the start_time value is included automatically, so SQL Server has to perform the same steps:
Index seek on (either table).call_id to get the start_time
Clustered index seek on calls_other_data.start_time
I have just never read about such a design and would like to hear other people's opinions about it :) Are you aware of any drawbacks?
Obviously, if a row is missing from the calls table, then it will be hard to find the corresponding rows in the other tables, but I do not mind that.
Thanks :)
I see what you're trying to get at. calls_other_data would still carry both a call_id column and a start_time column, just like the calls table, but the calls_other_data.call_id column would not be indexed, because indexes come with a storage cost. That seems to be your thinking.
Something to note here is that since your clustered index is not unique on any of your tables, SQL Server will make it unique behind the scenes by adding a hidden 4-byte value called a uniqueifier to rows with duplicate key values. So you already have extra storage here that you may not have considered, making your attempts to "optimize" storage somewhat moot.
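If eliminating that hidden cost matters to you, one option is to make the clustered index unique yourself. A minimal sketch, using the table names from the question (whether this is a net win depends on your row counts, since CHAR(36) is far wider than the 4-byte uniqueifier, which is only added to duplicates):

-- Widening the clustered key to (start_time, call_id) removes the
-- uniqueifier and also lets your lookup seek on both values directly.
CREATE UNIQUE CLUSTERED INDEX CIX_calls_other_data
    ON calls_other_data (start_time, call_id);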
I would advise against the approach. Storage is cheap, unique indexes are of great help to the optimizer, and indexing foreign key columns (or foreign-key-like columns, if you don't actually enforce referential integrity) is a good rule of thumb.
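Concretely, the conventional design being recommended here would look something like this sketch (the index names are made up; the tables come from the question):

-- One narrow nonclustered index per secondary table. The clustered key
-- (start_time) is carried in each nonclustered index automatically.
CREATE NONCLUSTERED INDEX IX_calls_participants_call_id
    ON calls_participants (call_id);

CREATE NONCLUSTERED INDEX IX_calls_other_data_call_id
    ON calls_other_data (call_id);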
I want to execute the following delete query:
delete from MyTable where userId = 5
Which of the indexes below will give better performance for this query, or will they both perform the same?
All the fields mentioned here are BIGINT.
CREATE INDEX MyTable_UserId_UserBalance_index ON Main.dbo.MyTable (UserId, UserBalance);
CREATE INDEX MyTable_UserId_index ON Main.dbo.MyTable (UserId);
For that particular query they should perform roughly the same. Since you're looking up the leading column of the compound index, finding the records should take the same work as with the single-column index.
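If you want to verify that yourself, here is a quick sketch (table name and predicate taken from the question; run it once with each index in place):

-- Both variants should report an index seek on UserId and similar reads.
SET STATISTICS IO ON;

BEGIN TRANSACTION;
DELETE FROM Main.dbo.MyTable WHERE UserId = 5;
ROLLBACK TRANSACTION;  -- undo the delete so the test is repeatable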
I'm surprised at many of the answers here - for the purpose of deletion, there won't be any benefit to using either one of these indexes over the other. If the record exists in an index, you'll need to find it and remove it either way.
The purpose of indexes is reads, not deleting data. If you're trying to read data, you would ask a question like this one, since one index has the potential to return the data quicker than the other. For deletes, you need to delete the row from all indexes, including the nonclustered ones.
It seems as if some enlightenment into the world of indexes is being called for.
Some great (free) documentation from well-known DBA - Brent Ozar: https://www.brentozar.com/archive/2016/10/think-like-engine-class-now-free-open-source/
Both indexes would work the same way for this query, unless the column order of the compound index were reversed:
From:
CREATE INDEX MyTable_UserId_UserBalance_index ON Main.dbo.MyTable (UserId, UserBalance);
To:
CREATE INDEX MyTable_UserId_UserBalance_index ON Main.dbo.MyTable (UserBalance, UserId);
In the second case, the server may not see this index as useful, since UserId is no longer the leading column.
Also, why would you create two indexes with the same column in them? If you know that your table will be queried frequently on both UserId and UserBalance, then it is probably best to create a single index with both columns in it.
Again, just make sure which column gets used the most, and lead with that one.
I want to order my table Jogadores by the Total value, which I can achieve by writing this:
ALTER TABLE `Jogadores` ORDER BY `Total` DESC ;
My question is how do I use this as a trigger every time I edit a Total value or insert a new row?
I'm using phpMyAdmin
Thanks
I'm not sure I understand what you are trying to accomplish. I am going to assume you want your table sorted in that fashion when you retrieve it. If that is true, you can sort it upon retrieval rather than altering the table.
SELECT *
FROM Jogadores
ORDER BY Total DESC;
Why? That's not usually how SQL databases work.
What if some totals are the same? What should the secondary sort be?
It's also usually not a good idea to store a total in your operational DB. Is this a data warehouse?
It would probably be easier to add an index on total and create a view to order by that column.
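Something like this sketch, assuming the names from the question (the index and view names are made up):

-- Index Total so the ORDER BY can use the index instead of sorting each time.
CREATE INDEX idx_jogadores_total ON Jogadores (Total);

-- A view that always presents the rows in the desired order.
CREATE VIEW jogadores_by_total AS
SELECT * FROM Jogadores ORDER BY Total DESC;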
After reading the new comments, what you want is not doable. See this answer.
Original content:
As others have said, the physical order of the rows in the table doesn't matter since you can always sort when querying. If you're thinking of clustered indexes, then all you need to do is define your primary key properly.
From the MySQL glossary's definition of "clustered index": The InnoDB term for a primary key index. InnoDB table storage is organized based on the values of the primary key columns, to speed up queries and sorts involving the primary key columns. For best performance, choose the primary key columns carefully based on the most performance-critical queries. Because modifying the columns of the clustered index is an expensive operation, choose primary columns that are rarely or never updated.
So a clustered index would achieve what you want, but it is probably not what you actually need. Just so you know, the speedup from a clustered index is practically zero if you're dealing with fewer than a million or so rows (rough estimate).
I have been given as an assignment the following queries, and have to optimize them by creating indexes:
a) SELECT EmployeeID FROM Employee WHERE Name='John' AND Surname='Brown'
b) SELECT EmployeeID FROM Employee WHERE Salary=1300
c) SELECT EmployeeID FROM Employee WHERE Salary BETWEEN 1000 AND 1500
d) SELECT EmployeeID FROM Employee WHERE Salary+Bonus>1500
from a table Employee:
EmployeeID,
Name,
Surname,
Salary,
Bonus
I've stated that for the first one (a) a compound index would be best, a clustered index for the second one (b), a partitioned index for the third one (c), and some kind of clustered index for (d). I am not sure about my choices - could you please verify them and correct me, as I am new to this? P.S. The indexes should preferably be for Oracle. Thanks in advance.
for the first one (a) a compound index would be best
On what columns? Surname + Name, or Name + Surname? The order can matter. In this case it likely doesn't matter at all, but normally you want to consider the entire application and think about how you will commonly be doing lookups. If you have another query that looks up by surname alone, for example, you would want to put the surname column first in the index, so that this one index can serve both queries. Over-indexing can be almost as bad for performance as under-indexing.
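For example, a minimal sketch (the index name is invented; Surname leads for the reason just given):

CREATE INDEX emp_surname_name_i ON Employee (Surname, Name);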
a clustered index for the second one (b)
Again, you need to consider the entire table/application when choosing your indexes. You can only have one clustered index on a table. It's highly likely that your one clustered index will need to be on the EmployeeID column. Even if we don't see any queries using it here, that's the most common need. A regular index on Salary is probably good enough here.
a partitioned index for the third one (c)
A regular index on Salary will likely be good enough. The database will be able to go to the first matching record, and then "walk the index" until it no longer matches. But it depends on the table size: if the table is huge (into the tens or hundreds of millions of rows), partitioning can make sense, usually on the table itself. I don't know many businesses that have tens of millions of employees, though. Again, one thing we want to do is avoid over-indexing, so reusing the same index from (b) is good.
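So a single plain index can serve both (b) and (c). A sketch, with an invented name:

-- The equality predicate in (b) and the BETWEEN range in (c) both become
-- an index seek followed by a short walk along the leaf level.
CREATE INDEX emp_salary_i ON Employee (Salary);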
some kind of clustered index for (d)
It depends on the database engine and version, but it's unlikely any index by itself will help this query. The reason is that expressions like this are very often not sargable, meaning the query optimizer won't be smart enough to know whether the index can be used. What you can do is create a computed column (a virtual column, in Oracle terms) and put an index on that column.
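In Oracle, that might look like the following sketch (the column and index names are made up):

-- Materialize the expression as a virtual column, then index it.
ALTER TABLE Employee ADD (TotalPay AS (Salary + Bonus));
CREATE INDEX emp_totalpay_i ON Employee (TotalPay);
-- A query with WHERE Salary + Bonus > 1500 can now use this index.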
In all cases, since you're only requesting the EmployeeID column, you want to add EmployeeID to the index, but not as a key column - just INCLUDE the column with the index. This way, the database can fulfill your query entirely from the index alone, without needing to go back to the table. The reason for merely including the column, rather than keying on it, is to keep INSERT/UPDATE statements fast by avoiding unnecessary index key maintenance.
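For example, for query (b), in SQL Server syntax (INCLUDE is SQL Server specific; in Oracle you would typically append EmployeeID as a trailing key column instead):

-- Covering index sketch: EmployeeID is stored at the leaf level only, so
-- the query never has to touch the base table.
CREATE INDEX emp_salary_incl_i ON Employee (Salary) INCLUDE (EmployeeID);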
For d) a function based index (FBI) would be appropriate:
CREATE INDEX emp_i3 ON Employee (Salary+Bonus);
I have a table in SQL Server database which I want to be able to search and retrieve data from as fast as possible. I don't care about how long time it takes to insert into the table, I am only interested in the speed at which I can get data.
The problem is the table is accessed with 20 or more different types of queries. This makes it a tedious task to add an index specially designed for each query. I'm considering instead simply adding an index that includes ALL columns of the table. It's not something you would normally do in "good" database design, so I'm assuming there is some good reason why I shouldn't do it.
Can anyone tell me why I shouldn't do this?
UPDATE: I forgot to mention, I also don't care about the size of my database. It's OK if that means my database grows larger than it needs to.
First of all, an index key in SQL Server can be at most 900 bytes. That alone makes it impossible to have an index with all columns.
Most of all: such an index makes no sense at all. What are you trying to achieve??
Consider this: if you have an index on (LastName, FirstName, Street, City), that index cannot be used to speed up queries on
FirstName alone
City
Street
That index would be useful for searches on
(LastName), or
(LastName, FirstName), or
(LastName, FirstName, Street), or
(LastName, FirstName, Street, City)
but really nothing else - certainly not if you search for just Street or just City!
The order of the columns in your index makes quite a difference, and the query optimizer can't just use any column somewhere in the middle of an index for lookups.
Consider your phone book: it's ordered by LastName, then FirstName, maybe Street. Does that indexing help you find all the "Joes" in your city? All the people living on "Main Street"? No - you have to look up by LastName first; only then do you get more specific within that set of data. Just having an index over everything does not speed up searching on every column.
If you want to be able to search by Street - you need to add a separate index on (Street) (and possibly another column or two that make sense).
If you want to be able to search by Occupation or whatever else - you need another specific index for that.
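In other words, something like this sketch (table and column names invented for the example):

-- Targeted indexes, each serving a specific search pattern, instead of
-- one wide index over every column.
CREATE NONCLUSTERED INDEX IX_People_Street ON dbo.People (Street, City);
CREATE NONCLUSTERED INDEX IX_People_Occupation ON dbo.People (Occupation);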
Just because your column exists in an index doesn't mean that'll speed up all searches for that column!
The main rule is: use as few indices as possible - too many indices can be even worse for a system than having no indices at all. Build your system, monitor its performance, and find the queries that cost the most - then optimize these, e.g. by adding indices.
Don't just blindly index every column just because you can - that is a guarantee for lousy system performance: every index requires maintenance and upkeep, so the more indices you have, the more your INSERT, UPDATE and DELETE operations will suffer (get slower), since all those indices need to be updated.
You have a fundamental misunderstanding of how indexes work.
Read this explanation "how multi-column indexes work".
The next question you might have is: why not create one index per column? But that's also a dead end if you're trying to reach top SELECT performance.
You might feel that it is a tedious task, but I would say it's a required task to index carefully. Sloppy indexing strikes back, as in this example.
Note: I am strongly convinced that proper indexing pays off, and I know that many people have the very same questions you have. That's why I'm writing a free book about it. The links above refer to the pages that might help you answer your question. However, you might also want to read it from the beginning.
...if you add an index that contains all columns, and a query were actually able to use that index, it would scan it in the order of the primary key - which means hitting nearly every record. The average search would touch half the rows, the same as scanning the actual table.
You need to read a lot more about indexes.
It might help if you consider an index on a table to be a bit like a Dictionary in C#.
var nameIndex = new Dictionary<String, List<int>>();
That means that the name column is indexed, and will return a list of primary keys.
var nameOccupationIndex = new Dictionary<String, Dictionary<String, List<int>>>(); // name -> occupation -> primary keys
That means that the name and occupation columns are indexed together: the outer dictionary maps a name to an inner dictionary, which maps an occupation to a list of primary keys. Now imagine an index containing 10 different columns, nested 10 levels deep, covering every single row in your table.
This isn't exactly how it works, mind you, but it should give you an idea of how indexes could work if implemented in C#. What you need to do is create indexes based on the one or two keys that are queried on extensively, so that the index is more useful than scanning the entire table.
If this is a data warehouse type operation where queries are highly optimized for READ queries, and if you have 20 ways of dissecting the data, e.g.
WHERE clause involves..
Q1: status, type, customer
Q2: price, customer, band
Q3: sale_month, band, type, status
Q4: customer
etc
And you absolutely have plenty of fast storage space to burn, then by all means create an index for EVERY single column, separately. So a 20-column table will have 20 indexes, one for each individual column. I could probably say to ignore bit columns or low-cardinality columns, but since we're going this far, why bother with that admonition. They will just sit there and add to the WRITE time, but if you don't care about that part of the picture, then we're all good.
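That is, something along these lines (hypothetical table name; columns taken from the queries above):

-- One single-column index per column: fine for read-mostly warehouse
-- workloads, expensive for anything write-heavy.
CREATE INDEX IX_sales_status   ON dbo.sales (status);
CREATE INDEX IX_sales_type     ON dbo.sales (type);
CREATE INDEX IX_sales_customer ON dbo.sales (customer);
CREATE INDEX IX_sales_price    ON dbo.sales (price);
-- ...and so on for the remaining columns.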
Analyze your 20 queries, and if you have hot queries (the hottest ones) that still won't go any faster, plan them in SSMS (press Ctrl-L with one query in the query window). It will tell you what index can help that query - just create it; create them all, fully remembering that this again adds to the write cost, backup file size, db maintenance time, etc.
I think the questioner is asking
'why can't I make an index like':
create index index_name
on table_name
(
*
)
The problems with that have been addressed.
But given that it sounds like they are using MS SQL Server, it's useful to understand that you can include nonkey columns in an index, so that the values of those columns are available for retrieval from the index but cannot be used as selection criteria:
create index index_name
on table_name
(
foreign_key
)
include (a,b,c,d) -- every column except foreign key
I created two tables with a million identical rows
I indexed table A like this
create nonclustered index index_name_A
on A
(
foreign_key -- this is a guid
)
and table B like this
create nonclustered index index_name_B
on B
(
foreign_key -- this is a guid
)
include (id,a,b,c,d) -- every column except the foreign key
No surprise: table A was slightly faster to insert into.
But when I ran these queries:
select * from A where foreign_key = #guid
select * from B where foreign_key = #guid
On table A, SQL Server didn't even use the index - it did a table scan, and complained about a missing index including id, a, b, c, d.
On table B, the query was over 50 times faster, with much less I/O.
Forcing the query on A to use the index didn't make it any faster:
select * from A where foreign_key = #guid
select * from A with (index(index_name_A)) where foreign_key = #guid
I'm considering instead simply adding an index that includes ALL columns of the table.
This is always a bad idea. Indexes in a database are not some sort of pixie dust that works magically. You have to analyze your queries and, according to what is being queried and how, add the appropriate indexes.
It is not as simple as "add everything to an index and have a nap".
I see only long and complicated answers here so I thought I should give the simplest answer possible.
You cannot add an entire table, or all its columns, to an index because that just duplicates the table.
In simple terms, an index is just another table with selected data ordered in the order you normally expect to query it in, and a pointer to the row on disk where the rest of the data lives.
So, a level of indirection exists. You have a partial copy of the table in a preordered manner (both on disk and in RAM, assuming the index is not fragmented), which is fast to query for the columns defined in the index, while the rest of the columns can be fetched without having to scan the disk for them, because the index contains a reference to the correct position on disk where the rest of the data lives for each row.
1) Size: an index essentially builds a copy of the data in that column in some easily searchable structure, like a binary tree (I don't know the SQL Server specifics).
2) You mentioned speed: index structures are slower to add to.
That index would just be identical to your table (possibly sorted in another order).
It won't speed up your queries.