resource busy while rebuilding an index - sql

There is a table T with column a:
CREATE TABLE T (
id_t integer not null,
text varchar2(100),
a integer
)
/
ALTER TABLE T ADD CONSTRAINT PK_T PRIMARY KEY (ID_T)
/
Index was created like this:
CREATE INDEX IDX_T$A ON T(a);
Also there's such a check constraint:
ALTER TABLE T ADD CHECK (a is null or a = 1);
Most of the records in T have a null value of a, so a query using the index works really fast as long as the index is in a consistent state and its statistics are up to date.
But the problem is that the values of a change really frequently for some rows (some rows get a null value, some get 1), so I need to rebuild the index, let's say, every hour.
However, the job that tries to rebuild the index quite often fails with an exception:
ORA-00054: resource busy and acquire with NOWAIT specified
Can anybody help me deal with this issue?

An index rebuild is not needed in most cases. Of course, a newly created index is efficient and its efficiency decreases over time, but this degradation stops after a while - it simply converges to some level.
If you really need to optimize the index, try the less invasive DDL command ALTER INDEX ... SHRINK SPACE COMPACT.
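For example, with the index from the question (note that SHRINK SPACE requires an ASSM tablespace):
ALTER INDEX IDX_T$A SHRINK SPACE COMPACT;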
PS: I would also recommend using a smaller block size (4K or 8K) for your tablespace storage.

Have you tried adding "ONLINE" to that index rebuild statement?
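That is, something like this (note that an online rebuild requires the Enterprise Edition):
ALTER INDEX IDX_T$A REBUILD ONLINE;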
Edit: If an online rebuild is not available then you might look at a fast refresh on commit materialised view to store the rowids or primary keys of rows that have a 1 for column A.
Start with a look at the documentation:
http://docs.oracle.com/cd/B28359_01/server.111/b28326/repmview.htm
http://docs.oracle.com/cd/B28359_01/server.111/b28286/statements_6002.htm#SQLRF01302
You'd create a materialised view log on the table, and then a materialised view.
Think in particular about the resource requirements for this: changes to the master table require a change vector to be written to the materialised view log, which is effectively an additional insert for every change. Then the changes have to be propagated to another table (the materialised view storage table) with additional queries. It is by no means a low-impact option.
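A minimal sketch of the idea, assuming table T from the question (untested - the materialised view name is illustrative, and the fast-refresh restrictions in the documentation above apply):
CREATE MATERIALIZED VIEW LOG ON T
WITH PRIMARY KEY, ROWID (a)
INCLUDING NEW VALUES;

CREATE MATERIALIZED VIEW MV_T_A1
REFRESH FAST ON COMMIT
AS SELECT rowid AS rid, id_t FROM T WHERE a = 1;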

Rebuilding for Performance
Most Oracle experts are skeptical of frequently rebuilding indexes. For example, a quick glance at the presentation Rebuilding the Truth will show you that indexes do not behave in the naive way many people assume they do.
One of the relevant points in that presentation is "fully deleted blocks are recycled and are not generally problematic". If your values completely change, then your index should not grow infinitely large. Although your indexes are used in a non-typical way, that behavior is probably a good thing.
Here's a quick example. Create 1 million rows and index 100 of them.
--Create table, constraints, and index.
CREATE TABLE T
(
id_t integer primary key,
text varchar2(100),
a integer check (a is null or a = 1)
);
CREATE INDEX IDX_T$A ON T(a);
--Insert 1M rows, with 100 "1"s.
insert into t
select level, level, case when mod(level, 10000) = 0 then 1 else null end
from dual connect by level <= 1000000;
commit;
--Initial sizes:
select segment_name, bytes/1024/1024 MB
from dba_segments
where segment_name in ('T', 'IDX_T$A');
SEGMENT_NAME   MB
T              19
IDX_T$A        0.0625
Now completely shuffle the index rows around 1000 times.
--Move the 1s around 1000 times. Takes about 6 minutes.
begin
for i in 9000 .. 10000 loop
update t
set a = case when mod(id_t, i) = 0 then 1 else null end
--Don't update if the value is the same
where nvl(a,-1) <> nvl(case when mod(id_t,i) = 0 then 1 else null end,-1);
commit;
end loop;
end;
/
The index segment size is still the same.
--The index size is the same.
select segment_name, bytes/1024/1024 MB
from dba_segments
where segment_name in ('T', 'IDX_T$A');
SEGMENT_NAME   MB
T              19
IDX_T$A        0.0625
Rebuilding for Statistics
It's good to worry about the statistics of objects whose data changes so dramatically. But again, although your system is unusual, it may work fine with the default Oracle behavior. Although the rows indexed may completely change, the relevant statistics may stay the same. If there are always 100 rows indexed, the number of rows, blocks, and distinctness will stay the same.
Perhaps the clustering factor will significantly change, if the 100 rows shift from being completely random to being very close to each other. But even that may not matter. If there are millions of rows, but only 100 indexed, the optimizer's decision will probably be the same regardless of the clustering factor. Reading 1 block (awesome clustering factor) or reading 100 blocks (worst-case clustering factor) will still look much better than doing a full table scan of millions of rows.
But statistics are complicated; I'm surely over-simplifying things. If you need to keep your statistics a specific way, you may want to lock them. Unfortunately you can't lock just an index, but you can lock the table and its dependent indexes.
begin
dbms_stats.lock_table_stats(ownname => user, tabname => 'T');
end;
/
Rebuilding anyway
If a rebuild is still necessary, @Robe Eleckers' idea to retry should work. Although instead of catching the exception, it would be easier to set DDL_LOCK_TIMEOUT.
alter session set ddl_lock_timeout = 500;
The session will still need to get an exclusive lock on the table, but this will make it much easier to find the right window of opportunity.

Since the field in question has very low cardinality I would suggest using a bitmap index and skipping the rebuilds altogether.
CREATE BITMAP INDEX IDX_T$A ON T(a);
Note (as mentioned in comments): transactional performance is very low for bitmap indexes so this would only work well if there are very few overlapping transactions doing updates to the table.

Related

Oracle indexes. "DISTINCT_KEYS" vs "NUM_ROWS". Do I need a NONUNIQUE index?

I have a table with a lot of indexes. I noticed that on one of them "DISTINCT_KEYS" is almost the same as "NUM_ROWS". Is such an index needed?
Or maybe it is better to remove it because:
it takes up space in the database.
when adding data to the table, it unnecessarily slows down index maintenance.
What do you think? Will deleting this index slow down queries that use this column?
Is such an index needed?
All you can tell from statistics like DISTINCT_KEYS and NUM_ROWS (and other statistics like histograms) is whether an index might be useful. An index is only truly "needed" if it is actually being used by queries in your system. (See ALTER INDEX ... MONITORING USAGE command)
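A sketch of how usage monitoring works (the index name is a placeholder):
ALTER INDEX my_index MONITORING USAGE;
-- ...let the workload run for a while, then check:
SELECT index_name, monitoring, used FROM v$object_usage WHERE index_name = 'MY_INDEX';
ALTER INDEX my_index NOMONITORING USAGE;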
An index having DISTINCT_KEYS that is almost equal to NUM_ROWS certainly might be useful. In fact, it would be much more natural to suspect an index to be useless if DISTINCT_KEYS was a very low percentage of NUM_ROWS.
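For reference, the statistics being discussed can be read straight from USER_INDEXES (the table name is a placeholder):
SELECT index_name, distinct_keys, num_rows
FROM user_indexes
WHERE table_name = 'MY_TABLE';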
Suppose you have a query:
SELECT column_x
FROM table_y
WHERE column_z = :some_value
Suppose the index on column_z shows DISTINCT_KEYS = 999999 and NUM_ROWS = 1000000.
That means, on average, each distinct key has (very) slightly more than one row. That makes the index very selective and very useful. When our query runs, we will use the index to pull out only one row of the table very quickly.
Suppose, instead, the index on column_z shows DISTINCT_KEYS = 2 and NUM_ROWS = 1000000. Now each distinct key has an average of 500,000 rows. This index is worthless because we would have to read half of the blocks from the index and then still probably wind up reading at least half of the blocks from the table (probably way more than half). Worse, these reads are all single-block reads. It would be way, way faster for Oracle to ignore the index and do a full table scan -- fewer blocks in total to read, and all the reads are multi-block reads (e.g., 8 at a time).
For completeness, I'll point out that an index with DISTINCT_KEYS = 2 and NUM_ROWS = 1000000 could still be useful if the data is very skewed. That is, for example, if one distinct key had 999,000 rows and the other distinct key had only 1,000 rows. The index would be useful for finding the rows of that other (smaller) distinct key. Oracle gathers histograms as part of its statistics to keep track of which columns have skewed data and, if so, how many rows there are for each distinct key. (Over-simplification).
TL;DR It's very likely a good index and no more likely to be "unneeded" than any other index in your system.

Copying timestamp columns within a Postgres table

I have a table with about 30 million rows in a Postgres 9.4 db. This table has 6 columns: the primary key id, two text columns, one boolean, and two timestamps. There are indices on one of the text columns and, obviously, the primary key.
I want to copy the values in the first timestamp column, call it timestamp_a into the second timestamp column, call it timestamp_b. To do this, I ran the following query:
UPDATE my_table SET timestamp_b = timestamp_a;
This worked, but it took an hour and 15 minutes to complete, which seems like a really long time considering, as far as I know, it's just copying values from one column to the next.
I ran EXPLAIN on the query and nothing seemed particularly inefficient. I then used pgtune to modify my config file, most notably it increased the shared_buffers, work_mem, and maintenance_work_mem.
I re-ran the query and it took essentially the same amount of time, actually slightly longer (an hour and 20 mins).
What else can I do to improve the speed of this update? Is this just how long it takes to write 30 million timestamps into postgres? For context I'm running this on a macbook pro, osx, quadcore, 16 gigs of ram.
The reason this is slow is that internally PostgreSQL doesn't update the field in place. It actually writes new row versions with the new values. This usually takes a similar time to inserting that many rows.
If there are indexes on any column this can further slow the update down, even if they're not on the columns being updated, because PostgreSQL has to write a new row and write new index entries pointing to that row. HOT updates can help and will do so automatically if available, but that generally only helps if the table is subject to lots of small updates. It's also disabled if any of the fields being updated are indexed.
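If you want to see whether HOT updates are happening at all, you can compare the update counters in pg_stat_user_tables (assuming the table is named my_table):
SELECT n_tup_upd, n_tup_hot_upd
FROM pg_stat_user_tables
WHERE relname = 'my_table';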
Since you're basically rewriting the table, if you don't mind locking out all concurrent users while you do it you can do it faster with:
BEGIN
DROP all indexes
UPDATE the table
CREATE all indexes again
COMMIT
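A concrete sketch of that pattern (the index name and column are placeholders; in PostgreSQL, DROP INDEX and CREATE INDEX are transactional, so the whole block rolls back on failure):
BEGIN;
DROP INDEX my_table_text_idx;
UPDATE my_table SET timestamp_b = timestamp_a;
CREATE INDEX my_table_text_idx ON my_table (text_column);
COMMIT;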
PostgreSQL also has an optimisation for writes to tables that've just been TRUNCATEd, but to benefit from that you'd have to copy the data to a temp table, then TRUNCATE and copy it back. So there's no benefit.
@Craig mentioned an optimization for COPY after TRUNCATE: Postgres can skip WAL entries because if the transaction fails, nobody will ever have seen the new table anyway.
The same optimization is true for tables created with CREATE TABLE AS:
What causes large INSERT to slow down and disk usage to explode?
Details are missing in your description, but if you can afford to write a new table (no concurrent transactions get in the way, no dependencies), then the fastest way might be (except if you have big TOAST table entries - basically big columns):
BEGIN;
LOCK TABLE my_table IN SHARE MODE; -- only for concurrent access
SET LOCAL work_mem = '???? MB'; -- just for this transaction
CREATE TABLE my_table2 AS
SELECT ..., timestamp_a, timestamp_a AS timestamp_b
-- columns in order, timestamp_a overwrites timestamp_b
FROM my_table
ORDER BY ??; -- optionally cluster the table while you're at it.
DROP TABLE my_table;
ALTER TABLE my_table2 RENAME TO my_table;
ALTER TABLE my_table
ADD CONSTRAINT my_table_id_pk PRIMARY KEY (id);
-- more constraints, indices, triggers?
-- recreate views etc. if any
COMMIT;
The additional benefit: a pristine (optionally clustered) table without bloat. Related:
Best way to populate a new column in a large table?

DELETE query taking a long time to execute - how to optimise? [duplicate]

I have a rather big table named FTPLog with around 3 million records. I wanted to add a mechanism to delete old logs, but the delete command takes a long time. I found that deleting from the clustered index takes a long time.
DECLARE @MaxFTPLogId as bigint
SELECT @MaxFTPLogId = Max(FTPLogId) FROM FTPLog WHERE LogTime <= DATEADD(day, -10 , GETDATE())
PRINT @MaxFTPLogId
DELETE FROM FTPLog WHERE FTPLogId <= @MaxFTPLogId
I want to know how can I improve performance of deleting?
It might be slow because a large delete generates a big transaction log. Try to delete it in chunks, like:
WHILE 1 = 1
BEGIN
DELETE TOP (256) FROM FTPLog WHERE FTPLogId <= @MaxFTPLogId
IF @@ROWCOUNT = 0
BREAK
END
This generates smaller transactions. And it mitigates locking issues by creating breathing space for other processes.
You might also look into partitioned tables. These potentially allow you to purge old entries by dropping an entire partition.
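A hedged sketch of the partitioned approach (assumes FTPLog is partitioned on LogTime and that a nonpartitioned archive table with an identical structure exists on the same filegroup):
ALTER TABLE FTPLog SWITCH PARTITION 1 TO FTPLog_Archive;
TRUNCATE TABLE FTPLog_Archive;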
Since it's a log table, there is no need to make it clustered.
It's unlikely that you will search it on Id.
Alter your PRIMARY KEY so that it's nonclustered. The table will then use HEAP storage, which is faster for DML:
ALTER TABLE FTPLog DROP CONSTRAINT Primary_Key_Name
ALTER TABLE FTPLog ADD CONSTRAINT Primary_Key_Name PRIMARY KEY NONCLUSTERED (FTPLogId)
and just issue:
DECLARE @MaxFTPLogTime AS datetime
SELECT @MaxFTPLogTime = DATEADD(day, -10 , GETDATE())
PRINT @MaxFTPLogTime
DELETE FROM FTPLog WHERE LogTime <= @MaxFTPLogTime
Check the density of your table (use the command DBCC SHOWCONTIG to check density).
Scan Density [Best Count:Actual Count] should be close to 100% and Logical Scan Fragmentation should be close to 0% for best performance. If they are not, rebuild and defragment the indexes of that table to improve query performance.
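For example (note that DBCC SHOWCONTIG is deprecated in later versions in favour of sys.dm_db_index_physical_stats):
DBCC SHOWCONTIG ('FTPLog') WITH ALL_INDEXES;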
I assume that this table is not only huge in terms of number of rows, but also heavily used for logging new entries while you try to clean it up.
Andomar's suggestion should help, but I would try to clean up when there are no inserts going on.
Alternative: when you write logs, you probably do not care much about transaction isolation. Therefore I would change the transaction isolation level for the code/processes that write the log entries, so that you may avoid creating a huge tempdb (by the way, check whether tempdb grows a lot during this DELETE operation).
Also, I think that deletions from the clustered index should not really be slower than from a non-clustered one: you are still physically deleting rows. Rebuilding the index afterwards may take time, though.

Optimize Oracle Between Date Statement

I have an Oracle SQL query that selects entries for the current day, like so:
SELECT [fields]
FROM MY_TABLE T
WHERE T.EVT_END BETWEEN TRUNC(SYSDATE)
AND TRUNC(SYSDATE) + 86399/86400
AND T.TYPE = 123
The EVT_END field is of type DATE and T.TYPE is a NUMBER(15,0).
I'm sure that, as the table grows (and time goes on), the date constraint will narrow the result set by a much larger factor than the type constraint, since there is a very limited number of types.
So the basic question arising is, what's the best index to choose to make the selection on the current date faster. I especially wonder what the advantages and disadvantages of a functional index on TRUNC(T.EVT_END) to a normal index on T.EVT_END would be. When using a functional index the query would look something like that:
SELECT [fields]
FROM MY_TABLE T
WHERE TRUNC(T.EVT_END) = TRUNC(SYSDATE)
AND T.TYPE = 123
Because other queries use the mentioned date constraints without the additional type selection (or maybe with some other fields), multicolumn indexes wouldn't help me a lot.
Thanks, I'd appreciate your hints.
Your index should be TYPE, EVT_END.
CREATE INDEX PIndex
ON MY_TABLE (TYPE, EVT_END)
The optimizer plan will first go through this index to find the TYPE=123 section. Then under TYPE=123, it will have the EVT_END timestamps sorted, so it can search the b-tree for the first date in the range and go through the dates sequentially until a date falls outside the range.
Based on the query above, the functional index will provide no value. For a functional index to be used, the predicate in the query would need to be written as follows:
SELECT [fields]
FROM MY_TABLE T
WHERE TRUNC(T.EVT_END) BETWEEN TRUNC(SYSDATE) AND TRUNC(SYSDATE) + 86399/86400
AND T.TYPE = 123
Otherwise the functional index on the column EVT_END is ignored. It would be better to have a normal index on the EVT_END date: for a functional index to be used, the left-hand side of the condition must match the declaration of the functional index. I would probably write the query as:
SELECT [fields]
FROM MY_TABLE T
WHERE T.EVT_END BETWEEN TRUNC(SYSDATE) AND TRUNC(SYSDATE+1)
AND T.TYPE = 123
And I would create the following index:
CREATE INDEX bla on MY_TABLE( EVT_END )
This is assuming you are trying to find the events that ended within a day.
Results
If your index is cached, a function-based index performs best. If your index is not cached, a compressed function-based index performs best.
Below are the relative times generated by my test code. Lower is better. You cannot compare the numbers between cached and non-cached, they are totally different tests.
                 In cache   Not in cache
Regular            120          139
FBI                100          138
Compressed FBI     126          100
I'm not sure why the FBI performs better than the regular index. (Although it's probably related to what you said about equality predicates versus range. You can see that the regular index has an extra "FILTER" step in its explain plan.) The compressed FBI has some additional overhead to uncompress the blocks. This small amount of extra CPU time is relevant when everything is already in memory, and CPU waits are most important. But when nothing is cached, and IO is more important, the reduced space of the compressed FBI helps a lot.
Assumptions
There seems to be a lot of confusion about this question. The way I read it, you only care about this one specific query, and you want to know whether a function-based index or a regular index will be faster.
I assume you do not care about other queries that may benefit from this index, additional time spent to maintain the index, if the developers remember to use it, or whether or not the optimizer chooses the index. (If the optimizer doesn't choose the index, which I think is unlikely, you can add a hint.) Let me know if any of these assumptions are wrong.
Code
--Create tables. 1 = regular, 2 = FBI, 3 = Compressed FBI
create table my_table1(evt_end date, type number) nologging;
create table my_table2(evt_end date, type number) nologging;
create table my_table3(evt_end date, type number) nologging;
--Create 1K days, each with 100K values
begin
for i in 1 .. 1000 loop
insert /*+ append */ into my_table1
select sysdate + i - 500 + (level * interval '1' second), 1
from dual connect by level <= 100000;
commit;
end loop;
end;
/
insert /*+ append */ into my_table2 select * from my_table1;
insert /*+ append */ into my_table3 select * from my_table1;
--Create indexes
create index my_table1_idx on my_table1(evt_end);
create index my_table2_idx on my_table2(trunc(evt_end));
create index my_table3_idx on my_table3(trunc(evt_end)) compress;
--Gather statistics
begin
dbms_stats.gather_table_stats(user, 'MY_TABLE1');
dbms_stats.gather_table_stats(user, 'MY_TABLE2');
dbms_stats.gather_table_stats(user, 'MY_TABLE3');
end;
/
--Get the segment size.
--This shows the main advantage of a compressed FBI, the lower space.
select segment_name, bytes/1024/1024/1024 GB
from dba_segments
where segment_name like 'MY_TABLE__IDX'
order by segment_name;
SEGMENT_NAME    GB
MY_TABLE1_IDX   2.0595703125
MY_TABLE2_IDX   2.0478515625
MY_TABLE3_IDX   1.1923828125
--Test block.
--Uncomment different lines to generate 6 different test cases.
--Regular, Function-based, and Function-based compressed. Both cached and not-cached.
declare
v_count number;
v_start_time number;
v_total_time number := 0;
begin
--Uncomment two lines to test the server when it's "cold", and nothing is cached.
--for i in 1 .. 10 loop
--execute immediate 'alter system flush buffer_cache';
--Uncomment one line to test the server when it's "hot", and everything is cached.
for i in 1 .. 1000 loop
v_start_time := dbms_utility.get_time;
SELECT COUNT(*)
INTO V_COUNT
--#1: Regular
FROM MY_TABLE1 T
WHERE T.EVT_END BETWEEN TRUNC(SYSDATE) AND TRUNC(SYSDATE) + 86399/86400;
--#2: Function-based
--FROM MY_TABLE2 T
--WHERE TRUNC(T.EVT_END) = TRUNC(SYSDATE);
--#3: Compressed function-based
--FROM MY_TABLE3 T
--WHERE TRUNC(T.EVT_END) = TRUNC(SYSDATE);
v_total_time := v_total_time + (dbms_utility.get_time - v_start_time);
end loop;
dbms_output.put_line('Seconds: '||v_total_time/100);
end;
/
Test Methodology
I ran each block at least 5 times, alternated between run types (in case something was running on my machine only part of the time), threw out the high and the low run times, and averaged them. The code above does not include all that logic, since it would take up 90% of this answer.
Other Things to Consider
There are still many other things to consider. My code assumes the data is inserted in a very index-friendly order. Things will be totally different if this is not true, as compression may not help at all.
Probably the best solution to this problem is to avoid it completely with partitioning. For reading the same amount of data, a full table scan is much faster than an index read because it uses multi-block IO. But there are some downsides to partitioning, like the large amount of money required to buy the option, and extra maintenance tasks: creating partitions ahead of time or using interval partitioning (which has some other weird issues), gathering stats, deferred segment creation, etc.
Ultimately, you will need to test this yourself. But remember that testing even such a simple choice is difficult. You need realistic data, realistic tests, and a realistic environment. Realistic data is much harder than it sounds. With indexes, you cannot simply copy the data and build the indexes at once. create table my_table1 as select * from and create index ... will create a different index than if you create the table and perform a bunch of inserts and deletes in a specific order.
@S1lence:
I believe a considerable amount of thought went into this question, and I took time over my answer, as I don't like posting guesses.
I would like to share what I found on the web about this choice between a normal index on a date column and an FBI.
Based on my understanding of the link below, if you are going to use the TRUNC function for sure, then you can strike out the option of a normal index, as this consulting site says:
Even though the column may have an index, the trunc built-in function will invalidate the index, causing sub-optimal execution with unnecessary I/O.
I suppose that settles it: you have to go with an FBI if you are going to use TRUNC for sure. Please let me know if my reply makes sense.
Oracle SQL Tuning with function-based indexes
Cheers,
Lakshmanan C.
The decision over whether or not to use a function-based index should be driven by how you plan to write your queries. If all your queries against the date column will be in the form TRUNC(EVT_END), then you should use the FBI. However, in general it will be better to create an index on just EVT_END, for the following reasons:
It will be more reusable. If you ever have queries checking particular times of the day then you can't use TRUNC.
There will be more distinct keys in the index using just the date. If you have 1,000 different times inserted during a day, EVT_END will have 1,000 distinct keys, whereas TRUNC(EVT_END) will only have 1 (this assumes that you're storing the time component and not just midnight for all the dates - in the latter case both will have 1 distinct key per day). This matters because the more distinct values an index has, the higher the selectivity of the index and the more likely it is to be used by the optimizer (see this). A quick check of this point is sketched after this list.
The clustering factor is likely to be different, but in the case of using trunc it's more likely to go up, not down as stated in some other comments. This is because the clustering factor represents how closely the order of the values in the index match the physical storage of the data. If all your data is inserted in date order then a plain index will have the same order as the physical data. However, with TRUNC all times on a single day will map to the same value, so the order of rows in the index may be completely different to the physical data. Again, this means the trunc index is less likely to be used. This will entirely depend on your database's insertion/deletion patterns however.
Developers are more likely (in my experience) to write queries where TRUNC isn't applied to the column. Whether this holds true for you will depend upon your developers and the quality controls you have around deployed SQL.
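A quick check of the distinct-keys point above, assuming MY_TABLE from the question:
SELECT COUNT(DISTINCT evt_end) AS plain_keys,
COUNT(DISTINCT TRUNC(evt_end)) AS trunc_keys
FROM my_table;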
Personally, I would go with Marlin's answer of TYPE, EVT_END as a first pass. You need to test this in your environment, however, and see how it affects this query and all others using the TYPE and EVT_END columns.

Using more than one index per table is dangerous?

In a former company I worked at, the rule of thumb was that a table should have no more than one index (allowing the odd exception, such as certain parent tables that hold references to nearly all other tables and are thus updated very frequently).
The idea being that indexes often cost as much or more to maintain than they gain. Note that this question is different to indexed-view-vs-indexes-on-table, as the motivation is not only reporting.
Is this true? Is this index-purism worth it?
In your career do you generally avoid using indexes?
What are the general large-scale recommendations regarding indexes?
Currently and at the last company we use SQL Server, so any product specific guidelines are welcome too.
You need to create exactly as many indexes as you need to create. No more, no less. It is as simple as that.
Everybody "knows" that an index will slow down DML statements on a table. But for some reason very few people actually bother to test just how "slow" it becomes in their context. Sometimes I get the impression that people think that adding another index will add several seconds to each inserted row, making it a game changing business tradeoff that some fictive hotshot user should decide in a board room.
I'd like to share an example that I just created on my 2 year old pc, using a standard MySQL installation. I know you tagged the question SQL Server, but the example should be easily converted. I insert 1,000,000 rows into three tables. One table without indexes, one table with one index and one table with nine indexes.
drop table numbers;
drop table one_million_rows;
drop table one_million_one_index;
drop table one_million_nine_index;
/*
|| Create a dummy table to assist in generating rows
*/
create table numbers(n int);
insert into numbers(n) values(0),(1),(2),(3),(4),(5),(6),(7),(8),(9);
/*
|| Create a table consisting of 1,000,000 consecutive integers
*/
create table one_million_rows as
select d1.n + (d2.n * 10)
+ (d3.n * 100)
+ (d4.n * 1000)
+ (d5.n * 10000)
+ (d6.n * 100000) as n
from numbers d1
,numbers d2
,numbers d3
,numbers d4
,numbers d5
,numbers d6;
/*
|| Create an empty table with 9 integer columns.
|| One column will be indexed
*/
create table one_million_one_index(
c1 int, c2 int, c3 int
,c4 int, c5 int, c6 int
,c7 int, c8 int, c9 int
,index(c1)
);
/*
|| Create an empty table with 9 integer columns.
|| All nine columns will be indexed
*/
create table one_million_nine_index(
c1 int, c2 int, c3 int
,c4 int, c5 int, c6 int
,c7 int, c8 int, c9 int
,index(c1), index(c2), index(c3)
,index(c4), index(c5), index(c6)
,index(c7), index(c8), index(c9)
);
/*
|| Insert 1,000,000 rows in the table with one index
*/
insert into one_million_one_index(c1,c2,c3,c4,c5,c6,c7,c8,c9)
select n, n, n, n, n, n, n, n, n
from one_million_rows;
/*
|| Insert 1,000,000 rows in the table with nine indexes
*/
insert into one_million_nine_index(c1,c2,c3,c4,c5,c6,c7,c8,c9)
select n, n, n, n, n, n, n, n, n
from one_million_rows;
My timings are:
1m rows into table without indexes: 0.45 seconds
1m rows into table with 1 index: 1.5 seconds
1m rows into table with 9 indexes: 6.98 seconds
I'm better with SQL than statistics and math, but I'd like to think that:
Adding 8 indexes to my table added (6.98 - 1.5) = 5.48 seconds in total. Each index would then have contributed 0.685 seconds (5.48 / 8) over all 1,000,000 rows. That would mean that the added overhead per row per index was 0.000000685 seconds. SOMEBODY CALL THE BOARD OF DIRECTORS!
In conclusion, I'd like to say that the above test case doesn't prove a thing. It just shows that tonight I was able to insert 1,000,000 consecutive integers into a table in a single-user environment. Your results will be different.
That is utterly ridiculous. First, you need multiple indexes in order to perform correctly. For instance, if you have a primary key, you automatically have an index. That means you can't index anything else under the rule you described. So if you don't index foreign keys, joins will be slow, and if you don't index fields used in the where clause, queries will still be slow. Yes, you can have too many indexes, as they take extra time for inserts, updates, and deletes, but more than one is not dangerous; it is a requirement for a system that performs well. And I have found that users tolerate a longer time to insert better than they tolerate a longer time to query.
Now the exception might be a system that takes thousands of readings per second from some automated equipment. This is a database that generally doesn't have indexes, to speed up inserts. But usually these types of databases are also not used for reading; the data is instead transferred daily to a reporting database, which is indexed.
Yes, definitely - too many indexes on a table can be worse than no indexes at all. However, I don't think there's any good in having the "at most one index per table" rule.
For SQL Server, my rule is:
index any foreign key fields - this helps JOINs and is beneficial to other queries, too
index any other fields when it makes sense, e.g. when lots of intensive queries can benefit from it
Finding the right mix of indices - weighing the pros of speeding up queries vs. the cons of additional overhead on INSERT, UPDATE, DELETE - is not an exact science - it's more about know-how, experience, measuring, measuring, and measuring again.
Any fixed rule is bound to be more counterproductive than anything else.
The best content on indexing comes from Kimberly Tripp - the Queen of Indexing - see her blog posts here.
Unless you like very slow reads, you should have indexes. Don't go overboard, but don't be afraid of being liberal about them either. EVERY FK should be indexed. You're going to do a lookup on each of these columns during inserts to other tables to make sure the references are set. The index helps. Indexed columns are also used often in joins and selects.
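Bear in mind that SQL Server does not index foreign keys automatically, so this is done by hand (table and column names here are illustrative):
ALTER TABLE OrderLine
ADD CONSTRAINT FK_OrderLine_Orders FOREIGN KEY (OrderId) REFERENCES Orders (OrderId);
CREATE INDEX IX_OrderLine_OrderId ON OrderLine (OrderId);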
We have some tables that are inserted into rarely, with millions of records. Some of these tables are also quite wide. It's not uncommon for these tables to have 15+ indexes. For other tables with heavy inserting and low reads we might have only a handful of indexes - but one index per table is crazy.
Updating an index is once per insert (per index). Speed gain is for every select. So if you update infrequently and read often, then the extra work may be well worth it.
If you do different selects (meaning the columns you filter on are different), then maintaining an index for each type of query is very useful. Provided you have a limited set of columns that you query often.
But the usual advice holds: if you want to know which is fastest: profile!
You should of course be careful not to create too many indexes per table, but only ever using a single index per table is not a useful level.
How many indexes to use depends on how the table is used. A table that is updated often would generally have fewer indexes than one that is read much more often than it's updated.
We have some tables that are updated regularly by a job every two minutes, but they are read often by queries that vary a lot, so they have several indexes. One table, for example, has 24 indexes.
So much depends on your schema and the queries that you normally run. For example: if you normally need to select more than 60% of the rows of your table, indexes won't help you, and it will be cheaper to table scan than to index scan and then look up rows. Focused queries that select a small number of rows in different parts of the table, or which are used for joins in queries, will probably benefit from indexes. The right index in the right place can make or break a feature.
Indexes take space, so making too many indexes on a table can be counterproductive for the same reasons listed above. Scanning 5 indexes and then performing row lookups may be much more expensive than simply table scanning.
Good design is the synthesis of knowing when to normalise and when not to.
If you frequently join on a particular column, check the IO plan with the index and without. As a general rule I avoid tables with more than 20 columns; this is often a sign that the data should be normalised. More than about 5 indexes on a table and you may be using more space for the indexes than for the main table - be sure that is worth it. These rules are only the lightest of guidance, and so much depends on how the data will be used in queries and what your data update profile looks like.
Experiment with your query plans to see how your solution improves or degrades with an index.
Every table must have a PK, which is indexed of course (generally a clustered one), then every FK should be indexed as well.
Finally you may want to index fields on which you often sort, if their data is well differentiated: for a field with only 5 possible values in a table with 1 million records, an index will not be of great benefit.
I tend to be minimalistic with indexes until the db starts being well filled, and ... slower. It is easy to identify the bottlenecks and add just the right indexes at that point.
Optimizing the retrieval with indexes must be carefully designed to reflect actual query patterns. Surely, for a table with Primary Key, you will have at least one clustered index (that's how data is actually stored), then any additional indexes are taking advantage of the layout of the data (clustered index).
After analyzing queries that execute against the table, you want to design an index(s) that cover them. That may mean building one or more indexes but that heavily depends on the queries themselves. That decision cannot be made just by looking at column statistics only.
For tables that mostly receive inserts, e.g. ETL staging tables, you might not create primary keys at all, or you might drop the indexes before a load and re-create them afterwards if the data changes too quickly.
I personally would be scared to step into an environment that has a hard-coded rule for the ratio of indexes per table.