index value or create new column for dimensions? - sql

Main Tables have 5-25 columns and 1,000,000-100,000,000 rows, most of columns are FK to tables with less then 100 values like 'unique char(6)' and many views that perform calculations, joins and grouping need to show the user values like that original 'unique char(5)'.
In this case it is better to create tinyint identity(1,1) index column for each dimension table and join them to result of view or create and use indexes on original values like that char(6)?

You should test to see which works better.
However, it is quite reasonable to assume that dimension tables using tinyint would benefit performance, because they would significantly reduce the size of each row -- from 5 or 6 bytes to 1. This means that your large tables would fit on many fewer pages, reducing I/O time.
Balanced against this is the cost of the joins. However, the reference tables would easily fit into memory. And by making the lookup key the primary key on the table, the lookups would also be much faster.

Related

Does PostgreSQL use all available indexes to run a query faster?

We are structuring a project where some tables will have many records, and we intend to use 4 numeric foreign keys and 1 numeric primary, our assumption is that if we create an index for each foreign key and the default index of the primary key, the postgres planning would use all the starts (5 in total) to perform the query.
95% of the time the queries would be providing at least the 4 foreign keys.
Would each index be used to position the search faster in the sequential section of records?
Would having 4 indexes increase the speed of the query or would it suffice with a single index of the parent level (branch_id)?
Thank you for your time and experience.
example: if all foreign keys have an index
SELECT * FROM products WHERE
account_d=1 AND
organization_id=2 AND
business_id=3 AND
branch_id=4 AND
product_id=5;
example: if I only indicate the id of the primary key
SELECT * FROM products WHERE product_id=5;
If all 4 columns are specified by equality, it is possible to combine the single-column indexes using BitmapAnd. However, this would be less efficient than using one multi-column index on all four columns.
Since that will apparently be a very common query, it would make sense to have that multi-column index.
Usually you will want to index each foreign key column. Otherwise, if you want to delete an organization, for example, it would need to scan the whole table to verify that no records were still referencing it. Whichever column is the first one in the multi-column index will not need to also have a single-column index for it. But the other 3 which are not first probably still need their own indexes.
Indexes are (predominantly) used when filtering or joining tables, so whether the indexes you are proposing are useful is entirely dependent on the SQL you are running and whether the query optimiser determines that using an index would be beneficial.
For example, if you ran SELECT * FROM TABLE then none of the indexes would be used.
I can’t comment on Postgresql specifically but many/most DBMSs automatically create indexes when you define PKs/FKs - so you will get the indexes anyway, regardless of any performance tuning you are trying to implement
Update
Having individual indexes on each column is not going to help with the query you’ve provided, the optimiser will only use one of them, probably the PK. A compound index on multiple columns would help, but the more columns you add to the index, the more restrictive the pattern of queries it will benefit.
Say you have 3 columns A, B, C and include them all in WHERE clause, then having a compound index of A+B+C would be highly beneficial.
If you keep this index but your WHERE clause only has columns A, B it will still benefit significantly as the query can still use the A+B subset of the index.
If your WHERE clause only has columns A,C then it would benefit only slightly, as it would select all records from the index that start with the A value - but then would have to filter them to find the subset with the C value

What sort of Index for 'AND' columns?

I have a table to store people and want to select where the person is not marked as "deleted". I have a clustered primary key on the ID column (PersonID).
The 'Deleted' column is a DATETIME, nullable, and is populated when deleted.
My query looks like this:
SELECT *
FROM dbo.Person
WHERE PersonID = 100
AND Deleted IS NULL
This table can grow to around 40,000 people.
Should I have an index that covers the Deleted flag as well?
I may also query things like:
SELECT *
FROM Task t
INNER JOIN Person p
ON p.PersonID = t.PersonID
AND p.Deleted IS NULL
WHERE t.TaskTypeId = 5
AND t.Deleted IS NULL
Task table estimate is about 1.5 million rows.
I think I need one that covers both the pk and the deleted flag on both tables? i.e. on (Task.TaskId, Task.Deleted) and (Person.PersonID, Person.Deleted)?
Reasons for me investigating an index rethink, is due to a number of deadlock occurring in complex procedures. I'd like to reduce the number of rows locked on selects/writes/updates, as well as get a performance gain.
Since you are using SQL Server 2008, the fastest querying might well be using a filtered index. In this Deleted column whose type is DATETIME and nullable, you could try something like this index:
CREATE NONCLUSTERED INDEX Filtered_Deleted_Index
ON dbo.Person(Deleted)
WHERE Deleted IS NOT NULL
This will get you the smallest valid set in both use cases you listed above (for querying dbo.Person and also joining with Tasks).
Your instinct is (generally speaking) sound - an index that contains all columns needed for the query is called a covering index, which in this case would require:
CREATE INDEX Person_PersonID_Deleted ON Person(PersonID, Deleted);
You are unlikely to get much performance benefit on index lookup by adding the Deleted column, since searching for null is (usually) ignored, but having these indexes means that accessing the table can be bypassed entirely for Person.
You could also try creating:
CREATE INDEX Task_TaskTypeId_Deleted ON Task(TaskTypeId, Deleted);
which will avoid accessing Task rows that are marked as "deleted", and Task would then only accessed for non-deleted rows. However, if most of your Tasks are not deleted, I wouldn't bother with this index.
It's worth trying out various combinations of index(es) to see which combination gives the best result.
Since the primary key is PersonID, adding another index with extra columns after PersonID will not improve the "selectability" of the index, although is may prevent the need to lookup the record by rowid for filtering on deleted. With only 3% records filtered, that's nothing, so don't create another index on Person.
As for the Task table, it very much depends on the selectability of TaskTypeId = 5 AND Deleted IS NULL, i.e. how many records match the criteria. In general, a sequential search (full table scan) is faster than an index scan with row lookup if more than 20% of the records are selected. For very larger tables where the data is very distributed (e.g. physically every 10th record is selected), the performance threshold is below 10%.
So, if more than 10-20% of Task records are type 5, and only 3% of records are deleted, no index will improve performance, because the fastest access plan is likely a merge join of two full table scans.

SQL Database Design - Single column table - SELECT efficiency?

I'm putting together a database which I want to be very efficient for SELECT queries as all the data in the database will be created once and multiple read-only queries run on that data.
I have multiple tables (~20) and each have a composite primary key which is made up of a combination of Time (int) and either ProductID (int) or ServiceID (int) depending on the table.
I understand to maximize read/SELECT efficiency I should generally de-normalize the data to prevent expensive table joins.
So considering that, if I want to optimize read performance should I
have 3 single-column tables containing all the possible Time, ProductID and ServiceID values. Then have these as a foreign key in each of the tables.
keep all the 20 tables completely independent to optimize SELECT performance.
The fastest SELECT statement is an index SEEK from a single table.
If you only care about SELECT performance, and don't have to worry about writing new data to the tables, then design your tables around your expected queries, so that all the data you need for each query can be found in one table, and that table has an index on the expected search arguments.

Joining 100 tables

Assume that I have a main table which has 100 columns referencing (as foreign keys) to some 100 tables (containing primary keys).
The whole pack of information requires joining those 100 tables. And it is definitely a performance issue to join such a number of tables. Hopefully, we can expect that any user would like to request a bunch of data containing values from not more than some 5-7 tables (out of those 100) in queries that put conditions (in WHERE part of the query) on the fields from some 3-4 tables (out of those 100). Different queries have different combinations of the tables used to produce "SELECT" part of query and to put conditions in "WHERE". But, again, every SELECT would require some 5-7 tables and every WHERE would requre some 3-4 tables (definitely, the list of tables used to produce SELECT may overlap with the list of tables used to put conditions in WHERE).
I can write a VIEW with the underlying code joining all those 100 tables. Then I can write the mentioned above SQL-queries to this VIEW. But in this case it is a big issue for me how to instruct SQL Server that (despite the explicit instructions in the code to join all those 100 tables) only some 11 tables should be joined (11 tables are enough to be joined to produce SELECT outcome and take into account WHERE conditions).
Another approach may be to create a "feature" that converts the following "fake" code
SELECT field1, field2, field3 FROM TheFakeTable WHERE field1=12 and field4=5
into the following "real" code:
SELECT T1.field1, T2.field2, T3.field3 FROM TheRealMainTable
join T1 on ....
join T2 on ....
join T3 on ....
join T4 on ....
WHERE T1.field1=12 and T4.field4=5
From grammatical point of view, it is not a problem even to allow any mixed combinations of this "TheFakeTable-mechanism" with real tables and constructions. The real problem here is how to realize this "feature" technically. I can create a function which takes the "fake" code as an input and produces the "real" code. But it is not convenient because it requires using dynamic SQL tools evrywhere where this "TheFakeTable-mechanism" appears. A fantasy-land solution is to extend the gramma of the SQL-language in my Management Studio to allow writing such a fake code and then automatically converting this code into the real one before sending to the server.
My questions are:
whether SQl Server can be instructed shomehow (or to be genius enouh) to join only 11 tables instead of 100 in the VIEW described above?
If I decide to create this "TheFakeTable-mechanism" feature, what would be the best form for the technical realization of this feature?
Thanks to everyone for every comment!
PS
The structure with 100 tables arises from the following question that I asked here:
Normalizing an extremely big table
The SQL Server optimizer does contain logic to remove redundant joins, but there are restrictions, and the joins have to be provably redundant. To summarize, a join can have four effects:
It can add extra columns (from the joined table)
It can add extra rows (the joined table may match a source row more than once)
It can remove rows (the joined table may not have a match)
It can introduce NULLs (for a RIGHT or FULL JOIN)
To successfully remove a redundant join, the query (or view) must account for all four possibilities. When this is done, correctly, the effect can be astonishing. For example:
USE AdventureWorks2012;
GO
CREATE VIEW dbo.ComplexView
AS
SELECT
pc.ProductCategoryID, pc.Name AS CatName,
ps.ProductSubcategoryID, ps.Name AS SubCatName,
p.ProductID, p.Name AS ProductName,
p.Color, p.ListPrice, p.ReorderPoint,
pm.Name AS ModelName, pm.ModifiedDate
FROM Production.ProductCategory AS pc
FULL JOIN Production.ProductSubcategory AS ps ON
ps.ProductCategoryID = pc.ProductCategoryID
FULL JOIN Production.Product AS p ON
p.ProductSubcategoryID = ps.ProductSubcategoryID
FULL JOIN Production.ProductModel AS pm ON
pm.ProductModelID = p.ProductModelID
The optimizer can successfully simplify the following query:
SELECT
c.ProductID,
c.ProductName
FROM dbo.ComplexView AS c
WHERE
c.ProductName LIKE N'G%';
To:
Rob Farley wrote about these ideas in depth in the original MVP Deep Dives book, and there is a recording of him presenting on the topic at SQLBits.
The main restrictions are that foreign key relationships must be based on a single key to contribute to the simplification process, and compilation time for the queries against such a view may become quite long particularly as the number of joins increases. It could be quite a challenge to write a 100-table view that gets all the semantics exactly correct. I would be inclined to find an alternative solution, perhaps using dynamic SQL.
That said, the particular qualities of your denormalized table may mean the view is quite simple to assemble, requiring only enforced FOREIGN KEYs non-NULLable referenced columns, and appropriate UNIQUE constraints to make this solution work as you would hope, without the overhead of 100 physical join operators in the plan.
Example
Using ten tables rather than a hundred:
-- Referenced tables
CREATE TABLE dbo.Ref01 (col01 tinyint PRIMARY KEY, item varchar(50) NOT NULL UNIQUE);
CREATE TABLE dbo.Ref02 (col02 tinyint PRIMARY KEY, item varchar(50) NOT NULL UNIQUE);
CREATE TABLE dbo.Ref03 (col03 tinyint PRIMARY KEY, item varchar(50) NOT NULL UNIQUE);
CREATE TABLE dbo.Ref04 (col04 tinyint PRIMARY KEY, item varchar(50) NOT NULL UNIQUE);
CREATE TABLE dbo.Ref05 (col05 tinyint PRIMARY KEY, item varchar(50) NOT NULL UNIQUE);
CREATE TABLE dbo.Ref06 (col06 tinyint PRIMARY KEY, item varchar(50) NOT NULL UNIQUE);
CREATE TABLE dbo.Ref07 (col07 tinyint PRIMARY KEY, item varchar(50) NOT NULL UNIQUE);
CREATE TABLE dbo.Ref08 (col08 tinyint PRIMARY KEY, item varchar(50) NOT NULL UNIQUE);
CREATE TABLE dbo.Ref09 (col09 tinyint PRIMARY KEY, item varchar(50) NOT NULL UNIQUE);
CREATE TABLE dbo.Ref10 (col10 tinyint PRIMARY KEY, item varchar(50) NOT NULL UNIQUE);
The parent table definition (with page-compression):
CREATE TABLE dbo.Normalized
(
pk integer IDENTITY NOT NULL,
col01 tinyint NOT NULL REFERENCES dbo.Ref01,
col02 tinyint NOT NULL REFERENCES dbo.Ref02,
col03 tinyint NOT NULL REFERENCES dbo.Ref03,
col04 tinyint NOT NULL REFERENCES dbo.Ref04,
col05 tinyint NOT NULL REFERENCES dbo.Ref05,
col06 tinyint NOT NULL REFERENCES dbo.Ref06,
col07 tinyint NOT NULL REFERENCES dbo.Ref07,
col08 tinyint NOT NULL REFERENCES dbo.Ref08,
col09 tinyint NOT NULL REFERENCES dbo.Ref09,
col10 tinyint NOT NULL REFERENCES dbo.Ref10,
CONSTRAINT PK_Normalized
PRIMARY KEY CLUSTERED (pk)
WITH (DATA_COMPRESSION = PAGE)
);
The view:
CREATE VIEW dbo.Denormalized
WITH SCHEMABINDING AS
SELECT
item01 = r01.item,
item02 = r02.item,
item03 = r03.item,
item04 = r04.item,
item05 = r05.item,
item06 = r06.item,
item07 = r07.item,
item08 = r08.item,
item09 = r09.item,
item10 = r10.item
FROM dbo.Normalized AS n
JOIN dbo.Ref01 AS r01 ON r01.col01 = n.col01
JOIN dbo.Ref02 AS r02 ON r02.col02 = n.col02
JOIN dbo.Ref03 AS r03 ON r03.col03 = n.col03
JOIN dbo.Ref04 AS r04 ON r04.col04 = n.col04
JOIN dbo.Ref05 AS r05 ON r05.col05 = n.col05
JOIN dbo.Ref06 AS r06 ON r06.col06 = n.col06
JOIN dbo.Ref07 AS r07 ON r07.col07 = n.col07
JOIN dbo.Ref08 AS r08 ON r08.col08 = n.col08
JOIN dbo.Ref09 AS r09 ON r09.col09 = n.col09
JOIN dbo.Ref10 AS r10 ON r10.col10 = n.col10;
Hack the statistics to make the optimizer think the table is very large:
UPDATE STATISTICS dbo.Normalized WITH ROWCOUNT = 100000000, PAGECOUNT = 5000000;
Example user query:
SELECT
d.item06,
d.item07
FROM dbo.Denormalized AS d
WHERE
d.item08 = 'Banana'
AND d.item01 = 'Green';
Gives us this execution plan:
The scan of the Normalized table looks bad, but both Bloom-filter bitmaps are applied during the scan by the storage engine (so rows that cannot match do not even surface as far as the query processor). This may be enough to give acceptable performance in your case, and certainly better than scanning the original table with its overflowing columns.
If you are able to upgrade to SQL Server 2012 Enterprise at some stage, you have another option: creating a column-store index on the Normalized table:
CREATE NONCLUSTERED COLUMNSTORE INDEX cs
ON dbo.Normalized (col01,col02,col03,col04,col05,col06,col07,col08,col09,col10);
The execution plan is:
That probably looks worse to you, but column storage provides exceptional compression, and the whole execution plan runs in Batch Mode with filters for all the contributing columns. If the server has adequate threads and memory available, this alternative could really fly.
Ultimately, I'm not sure this normalization is the correct approach considering the number of tables and the chances of getting a poor execution plan or requiring excessive compilation time. I would probably correct the schema of the denormalized table first (proper data types and so on), possibly apply data compression...the usual things.
If the data truly belongs in a star-schema, it probably needs more design work than just splitting off repeating data elements into separate tables.
Why do you think joining 100 tables would be a performance issue?
If all the keys are primary keys, then all the joins will use indexes. The only question, then, is whether the indexes fit into memory. If they fit in memory, performance is probably not an issue at all.
You should try the query with the 100 joins before making such a statement.
Furthermore, based on the original question, the reference tables have just a few values in them. The tables themselves fit on a single page, plus another page for the index. This is 200 pages, which would occupy at most a few megabytes of your page cache. Don't worry about the optimizations, create the view, and if you have performance problems then think about the next steps. Don't presuppose performance problems.
ELABORATION:
This has received a lot of comments. Let me explain why this idea may not be as crazy as it sounds.
First, I am assuming that all the joins are done through primary key indexes, and that the indexes fit into memory.
The 100 keys on the page occupy 400 bytes. Let's say that the original strings are, on average 40 bytes each. These would have occupied 4,000 bytes on the page, so we have a savings. In fact, about 2 records would fit on a page in the previous scheme. About 20 fit on a page with the keys.
So, to read the records with the keys is about 10 times faster in terms of I/O than reading the original records. With the assumptions about the small number of values, the indexes and original data fit into memory.
How long does it take to read 20 records? The old way required reading 10 pages. With the keys, there is one page read and 100*20 index lookups (with perhaps an additional lookup to get the value). Depending on the system, the 2,000 index lookups may be faster -- even much faster -- than the additional 9 page I/Os. The point I want to make is that this is a reasonable situation. It may or may not happen on a particular system, but it is not way crazy.
This is a bit oversimplified. SQL Server doesn't actually read pages one-at-a-time. I think they are read in groups of 4 (and there might be look-ahead reads when doing a full-table scan). On the flip side, though, in most cases, a table-scan query is going to be more I/O bound than processor bound, so there are spare processor cycles for looking up values in reference tables.
In fact, using the keys could result in faster reading of the table than not using them, because spare processing cycles would be used for the lookups ("spare" in the sense that processing power is available when reading). In fact, the table with the keys might be small enough to fit into available cache, greatly improving performance of more complex queries.
The actual performance depends on lots of factors, such as the length of the strings, the original table (is it larger than available cache?), the ability of the underlying hardware to do I/O reads and processing at the same time, and the dependence on the query optimizer to do the joins correctly.
My original point was that assuming a priori that the 100 joins are a bad thing is not correct. The assumption needs to be tested, and using the keys might even give a boost to performance.
If your data doesn't change much, you may benefit from creating an Indexed View, which basically materializes the view.
If the data changes often, it may not be a good option, as the server has to maintain the indexed view for each change in the underlying tables of the view.
Here's a good blog post that describes it a bit better.
From the blog:
CREATE VIEW dbo.vw_SalesByProduct_Indexed
WITH SCHEMABINDING
AS
SELECT
Product,
COUNT_BIG(*) AS ProductCount,
SUM(ISNULL(SalePrice,0)) AS TotalSales
FROM dbo.SalesHistory
GROUP BY Product
GO
The script below creates the index on our view:
CREATE UNIQUE CLUSTERED INDEX idx_SalesView ON vw_SalesByProduct_Indexed(Product)
To show that an index has been created on the view and that it does
take up space in the database, run the following script to find out
how many rows are in the clustered index and how much space the view
takes up.
EXECUTE sp_spaceused 'vw_SalesByProduct_Indexed'
The SELECT statement below is the same statement as before, except
this time it performs a clustered index seek, which is typically very
fast.
SELECT
Product, TotalSales, ProductCount
FROM vw_SalesByProduct_Indexed
WHERE Product = 'Computer'

Using more than one index per table is dangerous?

In a former company I worked at, the rule of thumb was that a table should have no more than one index (allowing the odd exception, and certain parent-tables holding references to nearly all other tables and thus are updated very frequently).
The idea being that often, indexes cost the same or more to uphold than they gain. Note that this question is different to indexed-view-vs-indexes-on-table as the motivation is not only reporting.
Is this true? Is this index-purism worth it?
In your career do you generally avoid using indexes?
What are the general large-scale recommendations regarding indexes?
Currently and at the last company we use SQL Server, so any product specific guidelines are welcome too.
You need to create exactly as many indexes as you need to create. No more, no less. It is as simple as that.
Everybody "knows" that an index will slow down DML statements on a table. But for some reason very few people actually bother to test just how "slow" it becomes in their context. Sometimes I get the impression that people think that adding another index will add several seconds to each inserted row, making it a game changing business tradeoff that some fictive hotshot user should decide in a board room.
I'd like to share an example that I just created on my 2 year old pc, using a standard MySQL installation. I know you tagged the question SQL Server, but the example should be easily converted. I insert 1,000,000 rows into three tables. One table without indexes, one table with one index and one table with nine indexes.
drop table numbers;
drop table one_million_rows;
drop table one_million_one_index;
drop table one_million_nine_index;
/*
|| Create a dummy table to assist in generating rows
*/
create table numbers(n int);
insert into numbers(n) values(0),(1),(2),(3),(4),(5),(6),(7),(8),(9);
/*
|| Create a table consisting of 1,000,000 consecutive integers
*/
create table one_million_rows as
select d1.n + (d2.n * 10)
+ (d3.n * 100)
+ (d4.n * 1000)
+ (d5.n * 10000)
+ (d6.n * 100000) as n
from numbers d1
,numbers d2
,numbers d3
,numbers d4
,numbers d5
,numbers d6;
/*
|| Create an empty table with 9 integer columns.
|| One column will be indexed
*/
create table one_million_one_index(
c1 int, c2 int, c3 int
,c4 int, c5 int, c6 int
,c7 int, c8 int, c9 int
,index(c1)
);
/*
|| Create an empty table with 9 integer columns.
|| All nine columns will be indexed
*/
create table one_million_nine_index(
c1 int, c2 int, c3 int
,c4 int, c5 int, c6 int
,c7 int, c8 int, c9 int
,index(c1), index(c2), index(c3)
,index(c4), index(c5), index(c6)
,index(c7), index(c8), index(c9)
);
/*
|| Insert 1,000,000 rows in the table with one index
*/
insert into one_million_one_index(c1,c2,c3,c4,c5,c6,c7,c8,c9)
select n, n, n, n, n, n, n, n, n
from one_million_rows;
/*
|| Insert 1,000,000 rows in the table with nine indexes
*/
insert into one_million_nine_index(c1,c2,c3,c4,c5,c6,c7,c8,c9)
select n, n, n, n, n, n, n, n, n
from one_million_rows;
My timings are:
1m rows into table without indexes: 0,45 seconds
1m rows into table with 1 index: 1,5 seconds
1m rows into table with 9 indexes: 6,98 seconds
I'm better with SQL than statistics and math, but I'd like to think that:
Adding 8 indexes to my table, added (6,98-1,5) 5,48 seconds in total. Each index would then have contributed 0,685 seconds (5,48 / 8) for all 1,000,000 rows. That would mean that the added overhead per row per index would have been 0,000000685 seconds. SOMEBODY CALL THE BOARD OF DIRECTORS!
In conclusion, I'd like to say that the above test case doesn't prove a shit. It just shows that tonight, I was able to insert 1,000,000 consecutive integers into in a table in a single user environment. Your results will be different.
That is utterly ridiculous. First, you need multiple indexes in order to perfom correctly. For instance, if you have a primary key, you automatically have an index. that means you can't index anything else with the rule you described. So if you don't index foreign keys, joins will be slow and if you don't index fields used in the where clause, queries will still be slow. Yes you can have too many indexes as they do take extra time to insert and update and delete records, but no more than one is not dangerous, it is a requirement to have a system that performs well. And I have found that users tolerate a longer time to insert better than they tolerate a longer time to query.
Now the exception might be for a system that takes thousands of readings per second from some automated equipment. This is a database that generally doesn't have indexes to speed inserts. But usually these types of databases are also not used for reading, the data is transferred instead daily to a reporting database which is indexed.
Yes, definitely - too many indexes on a table can be worse than no indexes at all. However, I don't think there's any good in having the "at most one index per table" rule.
For SQL Server, my rule is:
index any foreign key fields - this helps JOINs and is beneficial to other queries, too
index any other fields when it makes sense, e.g. when lots of intensive queries can benefit from it
Finding the right mix of indices - weighing the pros of speeding up queries vs. the cons of additional overhead on INSERT, UPDATE, DELETE - is not an exact science - it's more about know-how, experience, measuring, measuring, and measuring again.
Any fixed rule is bound to be more contraproductive than anything else.....
The best content on indexing comes from Kimberly Tripp - the Queen of Indexing - see her blog posts here.
Unless you like very slow reads, you should have indexes. Don't go overboard, but don't be afraid of being liberal about them either. EVERY FK should be indexed. You're going to do a look up each of these columns on inserts to other tables to make sure the references are set. The index helps. As well as the fact that indexed columns are used often in joins and selects.
We have some tables that are inserted into rarely, with millions of records. Some of these tables are also quite wide. It's not uncommon for these tables to have 15+ indexes. Other tables with heavy inserting and low reads we might only have a handful of indexes- but one index per table is crazy.
Updating an index is once per insert (per index). Speed gain is for every select. So if you update infrequently and read often, then the extra work may be well worth it.
If you do different selects (meaning the columns you filter on are different), then maintaining an index for each type of query is very useful. Provided you have a limited set of columns that you query often.
But the usual advice holds: if you want to know which is fastest: profile!
You should of course be careful not to create too many indexes per table, but only ever using a single index per table is not a useful level.
How many indexes to use depends on how the table is used. A table that is updated often would generally have less indexes than one that is read much more often than it's updated.
We have some tables that are updated regularly by a job every two minutes, but they are read often by queries that vary a lot, so they have several indexes. One table for example have 24 indexes.
So much depends on your schema and the queries that you normally run. For example: if you normally need to select above 60% of the rows of your table, indexes won't help you and it will be cheaper to table scan than to index scan and then lookup rows. Focused queries that select a small number of rows in different parts of the table or which are used for joins in queries will probably benefit from indexes. The right index in the right place can make or break a feature.
Indexes take space so making too many indexes on a table can be counter productive for the same reasons listed above. Scanning 5 indexes and then performing row lookups may be much more expensive than simply table scanning.
Good design is the synthesis about about knowing when to normalise and when not to.
If you frequently join on a particular column, check the IO plan with the index and without. As a general rule I avoid tables with more than 20 columns. This is often a sign that the data should be normalised. More than about 5 indexes on a table and you may be using more space for the indexes than the main table, be sure that is worth it. These rules are only the lightest of guidance and so much depends on how the data will be used in queries and what your data update profile looks like.
Experiment with your query plans to see how your solution improves or degrades with an index.
Every table must have a PK, which is indexed of course (generally a clustered one), then every FK should be indexed as well.
Finally you may want to index fields on which you often sort on, if their data is well differenciated: for a field with only 5 possible values in a table with 1 million records, an index will not be of a great benefit.
I tend to be minimalistic with indexes, until the db starts beeing well filled, and ...slower. It is easy to identify the bottlenecks and add just the right the indexes at that point.
Optimizing the retrieval with indexes must be carefully designed to reflect actual query patterns. Surely, for a table with Primary Key, you will have at least one clustered index (that's how data is actually stored), then any additional indexes are taking advantage of the layout of the data (clustered index).
After analyzing queries that execute against the table, you want to design an index(s) that cover them. That may mean building one or more indexes but that heavily depends on the queries themselves. That decision cannot be made just by looking at column statistics only.
For tables where it's mostly inserts, i.e. ETL tables or something, then you should not create Primary Keys, or actually drop indexes and re-create them if data changes too quickly or drop/recreated entirely.
I personally would be scared to step into an environment that has a hard-coded rule of indexes per table ratio.