I was reading Ian Stirk's article "Uncover Hidden Data to Optimize Application Performance" in MSDN Magazine, and now I am wondering whether table index creation could be automated, as Google App Engine does for its BigTable.
Is there any tool or SQL Server feature that automates table index creation?
No, as far as I know, there's no feature in SQL Server that enables automatic table index creation.
I wouldn't think it a good idea, either, because getting the right indexes in place depends on a multitude of factors, hardly any of which can truly be automated.
Sure - you can place a primary key on any column called "ID" - until you run into a case where you need a primary key on something else....
Sure, it makes sense to index foreign key columns in the child table - but sometimes, the added overhead for INSERTs can more than offset the gains of having the index.
Getting the right indexes in place is just way too dependent on your actual usage, on a lot of dynamic usage parameters, and on design decisions on your part, so I'd be surprised if any automated solution really worked all that well...
Marc
I am not aware of any tools; the best way to create indexes is to actually check the queries and their execution plans manually. I don't think an automated tool will ever be as good as a few good DBAs analyzing the data together with the Profiler.
But if you feel like giving it a shot yourself, I recommend that you start by looking at the performance views (the dynamic management views) in SQL Server.
Start with the function sys.dm_db_missing_index_columns (it takes an index_handle, which you can get from sys.dm_db_missing_index_details); it should give you a hint of which columns could benefit from being indexed.
sys.dm_db_index_usage_stats can show you which indexes are never used or could be optimized.
sys.dm_exec_cached_plans, sys.dm_exec_query_plan, sys.dm_exec_query_stats and sys.dm_exec_sql_text show you which queries were executed and how they performed; together with the information from the other views, you could probably work out which ones need more work.
Actually, I vaguely recall a wizard that can help you analyze performance from a Profiler trace (I believe this is the Database Engine Tuning Advisor). It probably won't run automatically, but it might be possible to put it in a maintenance plan.
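A minimal sketch of the "never used" check, assuming you have already pulled rows from sys.dm_db_index_usage_stats (joined to sys.indexes for the names) into Python. The `IndexUsage` class, the `unused_index_candidates` helper and the sample numbers are hypothetical; only the four counter names come from the view. Treat the result as a list of indexes to review rather than to drop automatically, since these counters are reset when the SQL Server instance restarts.

```python
from dataclasses import dataclass


@dataclass
class IndexUsage:
    # Hypothetical container mirroring a few columns of sys.dm_db_index_usage_stats.
    table: str
    index: str
    user_seeks: int
    user_scans: int
    user_lookups: int
    user_updates: int

    @property
    def reads(self) -> int:
        # Every way a query can read through the index.
        return self.user_seeks + self.user_scans + self.user_lookups


def unused_index_candidates(rows: list[IndexUsage]) -> list[IndexUsage]:
    """Indexes that are maintained on writes but never read, most-updated first."""
    candidates = [r for r in rows if r.reads == 0 and r.user_updates > 0]
    return sorted(candidates, key=lambda r: r.user_updates, reverse=True)


# Made-up sample rows, for illustration only.
sample = [
    IndexUsage("Orders", "IX_Orders_CustomerId", 5120, 3, 0, 900),
    IndexUsage("Orders", "IX_Orders_Notes", 0, 0, 0, 900),
    IndexUsage("Customers", "IX_Customers_Fax", 0, 0, 0, 40),
]
for r in unused_index_candidates(sample):
    print(r.table, r.index, r.user_updates)
```

Sorting by user_updates puts first the indexes that cost the most write work for no read benefit.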
I'm glad you enjoyed my article!
In my new book about SQL Server performance via DMVs (www.manning.com/stirk), chapter 10 has a script that allows you to automatically create indexes... there's also a script to remove unused indexes, and you can see how these could be used together.
Thanks
Ian
I know it's an old thread, but I thought someone may find this interesting:
http://blogs.msdn.com/b/queryoptteam/archive/2006/06/01/613516.aspx
Even though it sounds awesome, I urge caution: indexes can cripple a system as much as they can make it fly, so only use this if you know what you are doing!
I have gone through some documentation on the net, and using hints is mostly discouraged. I still have doubts about this. Can hints really be useful in production, especially when the same query is used by hundreds of different customers?
Are hints only useful when we know the number of records present in the tables? I am using the leading hint in my query, and it gives faster results when the data is very large, but performance is not as good when fewer records are fetched.
This answer by David is very good, but I would appreciate it if someone clarified this in more detail.
Most hints are a way of communicating our intent to the optimizer. For instance, the leading hint you mention means "join the tables in this order". Why is this necessary? Often it's because the optimal join order is not obvious, because the query is badly written or the database statistics are inaccurate.
So one use of hints such as leading is to figure out the best execution path, then to figure out why the database doesn't choose that plan without the hint. Does gathering fresh statistics solve the problem? Does rewriting the FROM clause solve the problem? If so, we can remove the hints and deploy the naked SQL.
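SQLite has no Oracle-style hint comments, but its documented handling of CROSS JOIN plays a role similar to leading: the left-hand table of a CROSS JOIN is always placed in the outer loop. A sketch, with a hypothetical customers/orders schema, that pins each join order and prints the plans so you can compare them (the first row of EXPLAIN QUERY PLAN is the outer loop):

```python
import sqlite3


def query_plan(conn: sqlite3.Connection, sql: str) -> list[str]:
    """Return the 'detail' column of EXPLAIN QUERY PLAN, outer loop first."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    CREATE INDEX idx_orders_customer ON orders (customer_id);
    """
)

# The left-hand table of each CROSS JOIN becomes the outer loop,
# so these two statements force opposite join orders.
customers_first = query_plan(
    conn,
    "SELECT * FROM customers CROSS JOIN orders WHERE orders.customer_id = customers.id",
)
orders_first = query_plan(
    conn,
    "SELECT * FROM orders CROSS JOIN customers WHERE orders.customer_id = customers.id",
)
print(customers_first)
print(orders_first)
```

As with leading, the interesting part is not the forced plan itself but comparing it with what the optimizer picks unaided, and then asking why it differs.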
Sometimes we cannot resolve this conundrum and have to keep the hints in Production. However, this should be a rare exception. Oracle have had lots of very clever people working on the Cost-Based Optimizer for many years, so its decisions are usually better than ours.
But there are other hints we would not blink to see in Production. append is often crucial for tuning bulk inserts. driving_site can be vital in tuning distributed queries.
Conversely, other hints are almost always abused. Yes parallel, I'm talking about you. Blindly adding /*+ parallel (t23, 16) */ will probably not make your query run sixteen times faster, and not infrequently will result in slower retrieval than single-threaded execution.
So, in short, there is no universally applicable advice about when we should use hints. The key things are:
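One rough way to see why the speedup falls short, assuming a simple Amdahl's-law model (my framing, not part of the answer): if only a fraction p of the work can run in parallel, n parallel servers give at most 1 / ((1 - p) + p / n). The model ignores the coordination overhead of parallel execution, which is what can push a parallel plan below single-threaded speed.

```python
def amdahl_speedup(parallel_fraction: float, degree: int) -> float:
    """Upper bound on speedup when only part of the work runs in parallel."""
    if not 0.0 <= parallel_fraction <= 1.0 or degree < 1:
        raise ValueError("parallel_fraction must be in [0, 1] and degree >= 1")
    serial_part = 1.0 - parallel_fraction
    return 1.0 / (serial_part + parallel_fraction / degree)


for p in (0.5, 0.9, 0.99):
    print(f"p={p}: degree 16 gives at most {amdahl_speedup(p, 16):.1f}x")
```

With 90% of the work parallelizable, degree 16 caps out at about 6.4x before any overhead is counted.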
understand how the database works, and especially how the cost-based optimizer works;
understand what each hint does;
test hinted queries in a proper tuning environment with Production-equivalent data.
Obviously the best place to start is the Oracle documentation. However, if you feel like spending some money, Jonathan Lewis's book on the Cost-Based Optimizer is the best investment you could make.
I couldn't just rephrase it, so I will paste it here
(it's a brief explanation of "When Not To Use Hints" that I had bookmarked):
In summary, don’t use hints when
What the hint does is poorly understood, which is of course not limited to the (ab)use of hints;
You have not looked at the root cause of bad SQL code and thus not yet tapped into the vast expertise and experience of your DBA in tuning the database;
Your statistics are out of date, and you can refresh the statistics more frequently or even fix the statistics to a representative state;
You do not intend to check the correctness of the hints in your statements on a regular basis, which means that, when statistics change, the hint may be woefully inadequate;
You have no intention of documenting the use of hints anyway.
Source link here.
I would summarize this as: using hints is not only a last resort but also a sign of missing knowledge about the root cause of the issue. The CBO (Cost-Based Optimizer) does an excellent job if you just ensure some basics for it. Those include:
1. Fresh statistics
1.1. Index statistics
1.2. Table statistics
1.3. Histograms
2. Correct JOIN conditions and INDEX utilization
3. Correct Database settings
This article here is worth reading:
Top 10 Reasons for poor Oracle performance
Presented by none other than Mr. Donald Burleson.
Cheers
In general, hints should be used only in exceptional cases. I know of the following situations where they make sense:
Workaround for Oracle bugs
Example: once, for a SELECT statement, I got the error ORA-01795: maximum number of expressions in a list is 1000, although the query did not contain an IN expression at all.
The problem was that the queried table contains more than 1000 (sub)partitions and Oracle transformed my query. Using the (undocumented) hint NO_EXPAND_TABLE solved the issue.
Data warehouse application while staging
While staging, your data can change significantly in ways the table/index statistics do not reflect, because statistics are gathered only once a week by default. If you know your data structure, hints can be useful because they are faster than running DBMS_STATS.GATHER_TABLE_STATS(...) manually all the time in between your operations. On the other hand, you can run DBMS_STATS.GATHER_TABLE_STATS() even for single columns, which might be the better approach.
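SQLite has no DBMS_STATS, but the idea of refreshing statistics right after a bulk load can be sketched with its ANALYZE command (table and index names are hypothetical). ANALYZE stores per-index statistics in sqlite_stat1, where the stat string starts with the row count followed by the average number of rows sharing each distinct value of the leading indexed column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staging (id INTEGER PRIMARY KEY, batch INTEGER, payload TEXT)")
conn.execute("CREATE INDEX idx_staging_batch ON staging (batch)")

# Simulate a bulk load: 1000 rows spread evenly over 10 batch values.
conn.executemany(
    "INSERT INTO staging (batch, payload) VALUES (?, ?)",
    [(i % 10, f"row {i}") for i in range(1000)],
)
conn.commit()

# Refresh optimizer statistics after the load, before the next staging step.
conn.execute("ANALYZE")

for tbl, idx, stat in conn.execute("SELECT tbl, idx, stat FROM sqlite_stat1"):
    print(tbl, idx, stat)
```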
Online Application Upgrade Hints
From Oracle documentation:
The CHANGE_DUPKEY_ERROR_INDEX, IGNORE_ROW_ON_DUPKEY_INDEX, and RETRY_ON_ROW_CHANGE hints are unlike other hints in that they have a semantic effect. The general philosophy explained in "Hints" does not apply for these three hints.
I'm trying to optimize my site and found this nice little Django doc:
Database Access Optimization, which suggests profiling followed by indexing and the selection of proper fields as the starting point for database optimization.
Normally, the Django docs explain things pretty well, even things that more experienced programmers might consider "obvious". Not so in this case. Without any explanation of indexing, the doc goes on to say:
We will assume you have done the obvious things above.
Uhhh. Wait! What the heck is indexing?
Obviously I can figure out what indexing is via google, my question is: what is it that I need to know as far as database stuff goes in order to create a scalable website? What should I be aware of about the Django framework specifically? What other "obvious" things ought I know? Where can I learn them?
I'm looking to get pointed in a direction here. I don't need to learn anything and everything about SQL, I just want to be informed enough to build my app the right way.
Thanks in advance!
I encourage you to read all that the other answers suggest and whatever else you can find on the subject, because it's all good information to know and will make you a better programmer.
That said, one of the nice things about Django and similar frameworks is that, for the most part, you don't have to know what's going on behind the scenes in the DB. Django automatically adds indexes where they are clearly needed (primary keys, foreign keys, unique fields). Whether to add more depends on the use cases of your app: if you continually query on one particular field, you should make sure that field is indexed. It might be already (if it's a foreign key, primary key, etc.), but other arbitrary fields typically aren't.
There are also various optimizations that are specific to each database backend. Django can't do much here because its goal is to remain database-independent. So, whether you're using PostgreSQL, MySQL or something else, read about optimizations and best practices for that particular database.
Database design and database normalization (see Wikipedia: http://en.wikipedia.org/wiki/Database_design and http://en.wikipedia.org/wiki/Database_normalization) are two very important concepts in addition to indexing.
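A small SQLite sketch of the point above (the `users` table and `email` column are hypothetical): a lookup by primary key is already indexed, while a lookup by an arbitrary column scans the whole table until you add an index. In a Django model you would normally request that index with `db_index=True` on the field or an entry in `Meta.indexes` rather than writing CREATE INDEX by hand.

```python
import sqlite3


def query_plan(conn: sqlite3.Connection, sql: str) -> str:
    """Join the 'detail' column of EXPLAIN QUERY PLAN into one string."""
    return " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")

by_pk = query_plan(conn, "SELECT * FROM users WHERE id = 42")
by_email_before = query_plan(conn, "SELECT * FROM users WHERE email = 'a@example.com'")

# The field you keep filtering on gets its own index.
conn.execute("CREATE INDEX idx_users_email ON users (email)")
by_email_after = query_plan(conn, "SELECT * FROM users WHERE email = 'a@example.com'")

print(by_pk)            # a SEARCH using the integer primary key
print(by_email_before)  # a full SCAN of users
print(by_email_after)   # a SEARCH using idx_users_email
```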
In addition to these, having a basic understanding of your database of choice is necessary. Being able to add users, set permissions, and create a database are key things that you should know.
Learning how to back up your data is also crucial.
The list keeps getting longer: you should also be aware of the database relationships that Django handles for you (OneToOne, ManyToMany, ManyToOne). https://docs.djangoproject.com/en/dev/topics/db/models/
The performance impact of JOINs shouldn't be ignored. Accessing model attributes in Django is very easy, but some foreign-key relationships can have huge performance impacts, and that is worth considering too.
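The cost described above often shows up as the "N+1 queries" pattern: touching a foreign-key attribute inside a loop issues one extra query per row. In Django, `select_related()` (for ForeignKey and OneToOne relations) asks the ORM to fetch the related rows with a JOIN instead. The sketch reproduces both access patterns with plain sqlite3 and counts statements; the schema and the `CountingDB` helper are hypothetical, not Django internals.

```python
import sqlite3


class CountingDB:
    """Thin wrapper that counts every SQL statement it executes."""

    def __init__(self) -> None:
        self.conn = sqlite3.connect(":memory:")
        self.queries = 0

    def fetch(self, sql: str, params: tuple = ()) -> list[tuple]:
        self.queries += 1
        return self.conn.execute(sql, params).fetchall()


db = CountingDB()
db.conn.executescript(
    """
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT,
                       author_id INTEGER REFERENCES author (id));
    INSERT INTO author (id, name) VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO book (title, author_id) VALUES ('A', 1), ('B', 2), ('C', 1), ('D', 2);
    """
)

# Pattern 1: one query for the books, then one more per book for its author
# (what looping over book.author does without select_related).
db.queries = 0
lazy = []
for _id, title, author_id in db.fetch("SELECT id, title, author_id FROM book ORDER BY id"):
    (name,) = db.fetch("SELECT name FROM author WHERE id = ?", (author_id,))[0]
    lazy.append((title, name))
lazy_queries = db.queries

# Pattern 2: a single JOIN fetches books and authors together.
db.queries = 0
joined = db.fetch(
    "SELECT book.title, author.name FROM book JOIN author ON author.id = book.author_id "
    "ORDER BY book.id"
)
joined_queries = db.queries

print(lazy_queries, joined_queries)  # 5 vs 1 for four books
```

Both patterns return the same rows; only the number of round trips differs, and that gap grows linearly with the number of books.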
Once you have a basic understanding of these things you should be at a pretty good starting point for creating a non-trivial django app!
Wikipedia has a nice article about database indexes. They are similar(ish) to the index in a book, i.e. they let you (the computer) find things faster because you just look in the index (probably a very bad analogy :-)
As for performance, there are many things you can do. Presumably, because it is a very detailed subject in itself and is particular to each RDBMS, it would be distracting or irrelevant for the Django docs to go into great detail. The best thing is really to search for performance tips for your particular RDBMS. There are some general tips, such as indexing and limiting queries to return only the required data.
I think one of the main things is good design: sticking as much as possible to normal form and, in general, actually taking your database into consideration before programming your models etc. (which you clearly seem to be doing). Naming conventions are also a big plus; remember, explicit is better than implicit :-)
To summarise:
Learn/understand the fundamentals such as the relational model
Decide on a naming convention
Design your database perhaps using an ERM tool
Prefer surrogate IDs
Use the correct data type of minimum possible size
Use indexes appropriately and don't over index
Avoid unnecessary queries and over-querying
Prioritise security and stability over raw performance
Once you have an up-and-running database, 'tune' it by analysing/profiling settings, queries, design etc.
Back up and archive regularly (e.g. with cron)
Hang out here :-)
If required advance into replication (master/slave - django supports this quite well too)
Consider upgrading your hardware
Don't get too hung up about it
Can a DBA find performance issues just from reading TSQL code? Is a DBA expected to have that capability?
A DBA should understand TSQL, and be able to identify issues with it.
Having said that, performance of a query is only partly caused by the TSQL itself.
It also depends hugely on the indexes, triggers, table structure, and architecture of the machine which is hosting the SQL Server.
So TSQL is just one factor which a DBA has to consider.
A good DBA can detect problems using nothing more than tea leaves and chicken entrails.
Seriously though, a hallmark of a good DBA is not so much to be able to detect the problems, but to be able to detect possible problems, based on this sort of static analysis.
By way of example, a good DBA can look at:
select * from tbl where col1 = 7 order by col2;
and immediately figure out that:
the user of that query should either use every column they ask for, or they should ask for less.
the table may need indexes on the col1 and col2 columns.
Now, those two potential problem areas aren't enough to decide whether there are problems. To find that out for sure, the DBA would need to examine the database schema and statistics. But static analysis should be a good start.
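To check the second point on a real engine, here is a sketch using SQLite's EXPLAIN QUERY PLAN rather than SQL Server (the extra column and the index name are assumptions). Without an index the query scans tbl and sorts through a temporary B-tree; with a composite index on (col1, col2) it can seek to col1 = 7 and read the rows already ordered by col2.

```python
import sqlite3


def plan_details(conn: sqlite3.Connection, sql: str) -> list[str]:
    """Return the 'detail' column of EXPLAIN QUERY PLAN."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


QUERY = "SELECT * FROM tbl WHERE col1 = 7 ORDER BY col2"

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tbl (id INTEGER PRIMARY KEY, col1 INTEGER, col2 INTEGER, col3 TEXT)"
)

before = plan_details(conn, QUERY)  # full scan plus a separate sort step

# One composite index serves both the filter (col1) and the ordering (col2).
conn.execute("CREATE INDEX idx_tbl_col1_col2 ON tbl (col1, col2)")
after = plan_details(conn, QUERY)

print(before)
print(after)
```

A single index on (col1, col2) is chosen here over two separate indexes because one seek can satisfy both the WHERE and the ORDER BY.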
So, in answer to your actual question, no, they generally don't find problems from only reading TSQL (or any SQL), but they should be able to use that information to target their efforts.
No. She would at least need to have the table schema and index definitions to be able to start guessing, and for an educated opinion, she would want to see performance profiling information from a live database.
There may be a few general "you just cannot do that" mistakes she could catch by just looking at the queries, but for most performance problems it all depends on configuration and context.
I need to test index performance for some tables in my database.
After I run my query with or without indexes, I always run this:
SELECT * FROM sys.dm_exec_query_optimizer_info;
And I receive details about my query.
My problem is that, when using sys.dm_exec_query_optimizer, the details for my query are always changing, which makes them difficult to understand.
What is the best solution?
Do you know any way or best practices?
You have to learn what the query optimizer is telling you. That the data changes is good; it means that things behave differently depending on whether you have indexes or not. However, there is no standardization on how optimizer information is presented - each DBMS does it differently. If you are going to interpret the data, you must understand it.
Looking at the query plan is important. Ultimately, so too is measuring the actual performance. It depends in part on why you are looking at the indexing at all. If there's a perceived performance problem that you are addressing, then clearly you need to ensure that the problem is resolved by the index or indexes you add. You also need to ensure that the cost of the added indexes on maintenance operations (insert, delete, update) is not intolerable, i.e. that you have not added too many indexes. You may also need to consider disk space usage: is it OK to commit so much disk space to so many indexes?
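If you want a number rather than a plan, one option in SQLite is to count virtual-machine instructions with `Connection.set_progress_handler`, which calls back every N instructions. It is only a proxy for real cost (it ignores I/O and caching), and the `measure_steps` helper and `events` schema are hypothetical, but it makes both sides of the trade-off above visible: the index makes the lookup cheaper and the inserts more expensive.

```python
import sqlite3
from collections.abc import Callable


def measure_steps(conn: sqlite3.Connection, work: Callable[[sqlite3.Connection], None]) -> int:
    """Approximate number of VM instructions executed while running work(conn)."""
    ticks = 0

    def on_progress() -> int:
        nonlocal ticks
        ticks += 1
        return 0  # returning 0 lets the statement continue

    conn.set_progress_handler(on_progress, 1)  # call back on every instruction
    try:
        work(conn)
    finally:
        conn.set_progress_handler(None, 1)  # remove the handler again
    return ticks


def make_db(with_index: bool) -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, user_id INTEGER, kind TEXT)")
    if with_index:
        conn.execute("CREATE INDEX idx_events_user ON events (user_id)")
    return conn


def bulk_insert(conn: sqlite3.Connection) -> None:
    conn.executemany(
        "INSERT INTO events (user_id, kind) VALUES (?, ?)",
        [(i % 200, "click") for i in range(2000)],
    )
    conn.commit()


def lookup(conn: sqlite3.Connection) -> None:
    conn.execute("SELECT count(*) FROM events WHERE user_id = 123").fetchone()


results = {}
for with_index in (False, True):
    conn = make_db(with_index)
    insert_cost = measure_steps(conn, bulk_insert)
    lookup_cost = measure_steps(conn, lookup)
    results[with_index] = (insert_cost, lookup_cost)
    print(f"index={with_index}: insert ~{insert_cost} steps, lookup ~{lookup_cost} steps")
```

On a real system you would still measure wall-clock time under a realistic load; the instruction count is just a repeatable way to see the direction of each effect.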
Without more specific information about your DBMS or the particular queries, it is hard to give more specific advice.
I've found a number of resources that talk about tuning the database server, but I haven't found much on the tuning of the individual queries.
For instance, in Oracle, I might try adding hints to ignore indexes or to use sort-merge vs. correlated joins, but I can't find much on tuning Postgres other than using explicit joins and recommendations when bulk loading tables.
Do any such guides exist so I can focus on tuning the most run and/or underperforming queries, hopefully without adversely affecting the currently well-performing queries?
I'd even be happy to find something that compared how certain types of queries performed relative to other databases, so I had a better clue of what sort of things to avoid.
update:
I should've mentioned that I took all of the Oracle DBA classes, along with their data modeling and SQL tuning classes, back in the 8i days... so I know about 'EXPLAIN', but that tells you more about what's going wrong with the query than how to make it better. (E.g., are 'where var=1 or var=2' and 'where var in (1,2)' considered the same when generating an execution plan? What if I'm doing it with 10 permutations? When are multi-column indexes used? Are there ways to get the planner to optimize for fastest start vs. fastest finish? What sort of 'gotchas' might I run into when moving from MySQL, Oracle or some other RDBMS?)
I could write any complex query dozens if not hundreds of ways, and I'm hoping to not have to try them all and find which one works best through trial and error. I've already found that 'SELECT count(*)' won't use an index, but 'SELECT count(primary_key)' will ... maybe a 'PostgreSQL for experienced SQL users' sort of document that explained sorts of queries to avoid, and how best to re-write them, or how to get the planner to handle them better.
update 2:
I found a Comparison of different SQL Implementations, which covers PostgreSQL, DB2, MS-SQL, MySQL, Oracle and Informix and explains whether and how you can do various things, and the gotchas involved. Its references section links to Oracle / SQL Server / DB2 / Mckoi / MySQL Database Equivalents (which is what its title suggests) and to the wikibook SQL Dialects Reference, which covers whatever people contribute (including some DB2, SQLite, MySQL, PostgreSQL, Firebird, Virtuoso, Oracle, MS-SQL, Ingres, and Linter).
As for badly performing queries: run EXPLAIN ANALYZE and read the output.
You can paste EXPLAIN ANALYZE output into a site like explain.depesz.com; it will help you find the parts that really take the most time.
There is a nice online tool that takes the output of EXPLAIN ANALYZE, and graphically shows you critical parts (e.g. wrong estimates, hot spots, etc)
http://explain.depesz.com/help
Btw, I think posted queries become public, and the "previous explains" link has been hit by spambots.
http://www.postgresql.org/docs/current/static/indexes-examine.html
You can give hints of a sort: SET enable_indexscan TO false; makes PostgreSQL try not to use index scans for the rest of the session (bitmap index scans have their own setting, enable_bitmapscan).
To address your point: unfortunately, pretty much the only way to tune a query in Postgres is to tune the database underlying it. In Oracle, you can set all of those options on a query-by-query basis and trump the optimizer's plan in the process, but in Postgres you're pretty much at the mercy of the optimizer, for good and ill.
The PGAdmin3 tool includes a graphical explain tool for breaking down how a query is handled. It is especially helpful for showing where table scans occur.
The best I've seen are here: http://wiki.postgresql.org/wiki/Using_EXPLAIN, but the latest PDF there is from 2008, so there may be something more recent. I'm interested to hear other users' answers.
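SQLite's closest counterpart works per table reference rather than per session: the NOT INDEXED clause forbids index use for that table in that one statement, and INDEXED BY forces a named index. A sketch with a hypothetical items table, comparing the default plan with the NOT INDEXED one:

```python
import sqlite3


def plan_details(conn: sqlite3.Connection, sql: str) -> list[str]:
    """Return the 'detail' column of EXPLAIN QUERY PLAN."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, sku TEXT, price REAL)")
conn.execute("CREATE INDEX idx_items_sku ON items (sku)")

default_plan = plan_details(conn, "SELECT * FROM items WHERE sku = 'X1'")
# NOT INDEXED applies only to this table reference in this statement.
no_index_plan = plan_details(conn, "SELECT * FROM items NOT INDEXED WHERE sku = 'X1'")

print(default_plan)   # uses idx_items_sku
print(no_index_plan)  # falls back to a full scan
```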
Also, something's brewing in the contrib packages: http://www.sai.msu.su/~megera/wiki/plantuner