I have added an index for a table in sql server 2008. I would like to know how much impact the index has on the table and if the index is useful or not.
Thanks.
The best way to tell is to look at execution plans for queries run against the table.
You can look at index usage DMVs but they only tell you how many queries used the index. Whether that is a one-row seek or a 10 million row scan, there's no difference in the recorded stats.
Did you make any measurements prior to making the change? If you took a baseline measurement before modifying the system, run the same baseline again and compare the results.
Two things the DMVs for indexes can tell us are how much the index is used to satisfy queries and how much it costs to keep the index up to date. The smaller the ratio of index usage to index updates gets, the more carefully the DBA should consider whether the index is needed.
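As a sketch of that usage-to-updates comparison (the query runs against whatever database you are connected to; all names come from the system views), the reads and writes per index can be pulled from sys.dm_db_index_usage_stats:

```sql
-- Reads vs. writes per index in the current database.
-- user_seeks/user_scans/user_lookups count uses by queries;
-- user_updates counts the maintenance cost of keeping the index current.
-- Note: these counters reset when the SQL Server instance restarts.
SELECT  OBJECT_NAME(s.object_id)                      AS table_name,
        i.name                                        AS index_name,
        s.user_seeks + s.user_scans + s.user_lookups  AS reads,
        s.user_updates                                AS writes
FROM    sys.dm_db_index_usage_stats AS s
JOIN    sys.indexes AS i
          ON i.object_id = s.object_id
         AND i.index_id  = s.index_id
WHERE   s.database_id = DB_ID()
ORDER BY s.user_updates DESC;
```

An index with many writes and few reads is a candidate for that closer look.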
I heard this question during a job interview, and the interviewer said yes. My question is: why? Could someone give an example of an index that makes a search slower instead of faster?
Yes, it can.
An additional index adds possible execution plans for a query, if applicable. The Postgres query planner estimates costs for a variety of possible plans and the cheapest estimate wins. Since those are estimates, actual query performance can always deviate. A chosen query plan using your new index can turn out to be slower than another plan without it.
If your server is configured properly (cost and resource settings, current column statistics, ...), this outcome is unlikely, but still possible. It can happen for almost any query, and is more likely for complex queries. Some types of queries are notoriously hard to estimate.
Related:
Keep PostgreSQL from sometimes choosing a bad query plan
Also, indexes always add write cost, so if your database is write-heavy and the machine is already saturated, more indexes can bring overall performance down.
A trivial example would be on a table with very few rows.
An index search has to load the index into memory and then look up the original data. If a table has only a few rows, they probably fit onto one data page, so a full table scan requires loading just one page.
Any index search (on a cold cache) requires loading two pages -- one for the index and one for the data. That can be (significantly) slower than just scanning the rows on a single page.
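A minimal sketch of this in PostgreSQL (the table and data are made up): even with a primary-key index available, the planner will typically choose a sequential scan for a table this small.

```sql
-- Five rows easily fit on a single page.
CREATE TABLE tiny (id int PRIMARY KEY, val text);
INSERT INTO tiny SELECT g, 'row ' || g FROM generate_series(1, 5) AS g;
ANALYZE tiny;

-- Typically shows "Seq Scan on tiny": reading one heap page is cheaper
-- than reading an index page and then the heap page.
EXPLAIN SELECT * FROM tiny WHERE id = 3;
```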
On a large table, if the "search" returns a significant proportion of the rows in the table, then an index search ends up fetching the rows in an order different from how they are stored. If the data pages do not fit in memory, you get a situation called thrashing, which means there is a high probability that each new row is a cache miss.
When creating indexes on PostgreSQL tables, prefixing an SQL command with EXPLAIN ANALYZE shows which indexes it uses.
For example:
EXPLAIN ANALYZE SELECT A,B,C FROM MY_TABLE WHERE C=123;
Returns:
Seq Scan on public.my_table (cost=...) <- No index, BAD
And, after creating the index, it would return:
Index Scan using my_index_name on public.my_table (cost=...) <- Index, GOOD
However, for some queries that used the same index on a table with a few hundred records, it didn't make any difference. Reading through the documentation, the recommendation is to either run ANALYZE or have the autovacuum daemon on, so that the database knows the sizes of its tables and can choose query plans properly.
Is this absolutely necessary in a production environment? In other words, will PostgreSQL use the index when it makes sense to, without running ANALYZE or VACUUM as an extra task?
Short answer: just run autovacuum. Long answer: yes, it is necessary, because statistics can get out of date.
Let's talk about indexes and how/when PostgreSQL decides to use them.
PostgreSQL gets a query in, parses it, and then begins the planning process. How are we going to scan the tables? How are we going to join them and in what order? These are not trivial decisions and trying to find the generally best ways to do things typically means that PostgreSQL needs to know something about the tables.
The first thing to note is that indexes are not always a win. No plan ever beats a sequential scan through a one-page table, and even a 5 page table will almost always be faster with a sequential scan than an index scan. So PostgreSQL cannot safely decide to "use all available indexes."
So the way PostgreSQL decides whether to use an index is to check statistics. These go out of date, which is why you want autovacuum updating them. Your table has a few hundred records and the statistics were probably out of date. If PostgreSQL cannot tell that the index is a win, it won't use it. A few hundred records is approaching "an index might help" territory, depending on how selective the index is at weeding out records.
In your large table, there was probably no question based on existing statistics that the index would help. In your smaller table, there probably was a question and it got answered one way based on the stats it had, and a different way based on newer stats.
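A quick way to see this effect (reusing the query shape from the question above): refresh the statistics by hand, then re-check the plan.

```sql
-- Update the planner's statistics for one table right away,
-- instead of waiting for autovacuum to get around to it.
ANALYZE my_table;

-- With fresh statistics the planner can judge whether the index pays off.
EXPLAIN ANALYZE SELECT a, b, c FROM my_table WHERE c = 123;
```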
I'm running a query that takes 2 seconds but should perform better, so I looked at the execution plan in SQL Server Management Studio and found a step whose cost is 70%.
Then I right-clicked on the item and found an option that says "Missing Index Details"; after clicking it, a query with a recommendation was generated:
/*
Missing Index Details from SQLQuery15.sql - (local).application_prod (appprod (58))
The Query Processor estimates that implementing the following index could improve the query cost by 68.8518%.
*/
/*
USE [application_prod]
GO
CREATE NONCLUSTERED INDEX [<Name of Missing Index, sysname,>]
ON [dbo].[cloud_document] ([isactivedocument])
INCLUDE ([objectuid])
GO
*/
So my question is: what exactly happens if I execute that query? Will it affect my database? Are there any drawbacks or side effects to applying it?
Thanks a lot; appreciated in advance.
Running the query will create an index on the specified table (cloud_document).
This should improve read performance and query time.
It also affects the performance of INSERT/UPDATE/DELETE statements, as the indexes need to be maintained during those statements.
The decision of whether to index, how many indexes to create, and what each index consists of is more an art than an exact science.
The actual maintenance of indexes (defragmenting and updating statistics) is something that can be automated, but it should be left alone until you have a better understanding of what indexes are and what they do.
I would recommend that you start reading some documentation on indexing.
You might start with Stairway to SQL Server Indexes.
The literal meaning is that it's telling you to build an index on isactivedocument of the table [dbo].[cloud_document], from which I assume you are using isactivedocument as a filter condition and selecting the column objectuid. Something like:
select objectuid, ... from [dbo].[cloud_document] where isactivedocument = 0/1
Note that "Clustered Index Scan (70%)" doesn't mean there is a problem with the clustered index. It means you have no index on isactivedocument, so the SQL engine has to scan the clustered index to find what you want; that's why the step is so expensive. Once you create an index on isactivedocument, check the plan again, and you'll see the node become an "Index Seek", which is a much faster way to find what you want.
Usually, if your database load mainly comes from queries, a new index doesn't do much harm to your system. Go ahead and create the index, but keep the number of indexes as low as possible.
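If you do run the suggested script, replace the placeholder with a real index name first. A sketch, using a hypothetical name that follows a common convention:

```sql
USE [application_prod];
GO
-- Hypothetical name following an IX_<table>_<column> convention.
CREATE NONCLUSTERED INDEX [IX_cloud_document_isactivedocument]
ON [dbo].[cloud_document] ([isactivedocument])
INCLUDE ([objectuid]);  -- covering column: the SELECT needs no key lookup
GO
```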
While writing complex SQL queries, how do we ensure that we are using proper indexes and avoiding full table scans? I do it by making sure I only join on columns that have indexes (primary key, unique key, etc.). Is this enough?
Ask the database for the execution plan for your query, and proceed from there.
Don't forget to index the columns that appear in your where clause as well.
Look at the execution plan of the query to see how the query optimizer thinks things must be retrieved. The plan is generally based on the statistics on the tables, the selectivity of the indices and the order of the joins. Note that the optimizer can decide that performing a full table scan is 'cheaper' than index lookup.
Other things to look for:
- Avoid subqueries if possible.
- Minimize the use of OR predicates in the WHERE clause.
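On the OR point, a sketch with hypothetical table and column names: an OR across two different columns often forces a scan, while splitting it into a UNION lets each branch use its own index.

```sql
-- May scan the whole table even if both columns are indexed.
SELECT * FROM orders WHERE customer_id = 42 OR salesperson_id = 7;

-- Each branch can seek on its own index; UNION removes rows matching both.
SELECT * FROM orders WHERE customer_id = 42
UNION
SELECT * FROM orders WHERE salesperson_id = 7;
```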
It is hard to say what the best indexing strategy is, because different strategies suit different situations. Still, there are a couple of things you should know about indexes.
Indexes SOMETIMES increase performance on SELECT statements and ALWAYS decrease performance on INSERT and UPDATE.
To index a table on a field, it is not necessary to make that field a key. Also, real-life indexes almost always include several fields.
Don't create any indexes for "future purposes" if your performance is satisfactory, even if you have no indexes at all.
Always analyze the execution plan when tuning indexes. Don't be afraid to experiment.
Also, a table scan is not always a bad thing.
That is all from me.
Use the Database Tuning Advisor (SQL Server) to analyze your query. It will suggest the indexes needed to tune your query's performance.
I have created a script to find the selectivity of each column in every table,
where selectivity = distinct values / total number of rows.
In some tables with fewer than 100 rows, the selectivity of a column is more than 50%. So, are those columns eligible for an index?
Or can you tell me how many rows, at minimum, a table needs before creating an index makes sense?
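For reference, the selectivity figure described above can be computed with a query like this (table and column names are placeholders):

```sql
-- Selectivity = distinct values / total number of rows.
-- Multiply by 1.0 to force decimal rather than integer division.
SELECT COUNT(DISTINCT some_column) * 1.0 / COUNT(*) AS selectivity
FROM   some_table;
```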
I think I understand what you are trying to accomplish by calculating a "selectivity" value for your data, but you cannot apply the rule blindly.
In fact, for certain queries the "selectivity" value might be really low and an index will still be very beneficial. For example:
Assume an "inbox" table with millions of rows, where each row has a boolean "read" field. The number of distinct values over the number of rows is then really low. But if most items are read most of the time, finding the unread items with an index on this field will be very efficient.
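In SQL Server, this case can be handled with a filtered index, which indexes only the rows you actually search for (the table and column names here are hypothetical):

```sql
-- Index only the unread rows: lookups for unread items stay fast
-- even though the read flag's overall selectivity is terrible.
CREATE NONCLUSTERED INDEX IX_inbox_unread
ON dbo.inbox (received_at)
WHERE is_read = 0;
```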
Creating indexes comes at a cost. Although you get the benefit on reads, you pay for it on writes and in disk usage.
I would rather recommend that you profile your queries and index accordingly. You can also look at the data from sys.dm_db_missing_index_group_stats and other dynamic management views that give you insight into index usage (or missing indexes).
You can create an index on a table with 0 rows, 1 row, or 100 million rows. You can create an index where every column has the same value or all unique values.
So yes, you can create an index. The real question is whether you should, and no tool is going to tell you that, because indexes can also be multi-column and the answer depends on what queries you run. Creating indexes is done when performance-tuning queries, or preemptively when you know the queries you'll be writing will use them.
Every index comes with a cost in terms of space and time required to do updates, inserts and deletes. You don't want to be creating them spuriously so you're really going to have to do this by hand, not as a result of a script to see how unique the value of a column is.
A general rule of thumb says that if you have a very large table (over 1 million rows), you should only use an index if a WHERE clause based on that index selects at most something in the neighborhood of 1-2% of the data.
If you have a "gender" column and roughly 50% of values are "male" and roughly 50% "female", then having an index on that really doesn't give you much - SQL Server and most other RDBMS will most likely still do a full table scan in this case, since on average, they'd have to scan at least half the table anyway, so the "detour" by using an index first and then looking up the actual full data based on that index value is just not worth it.
An index is excellent if you have something like unique keys (customer numbers), or a value that is quite selective. An index is not without cost: it uses up disk space, it needs to be maintained, and it will slightly slow down all operations besides SELECT. So tread carefully; it's not the best idea to just blindly index everything. Having too few indices is bad, but having too many, and the wrong ones, can be even worse! :-) Nobody ever claimed getting your indices right was easy.... :-)
But there's definitely help out there - the best source I know are Kimberly Tripp's excellent blog posts on SQL Server indexing (and many other topics).
Marc