SQL Profile and DTA - sql

Here is the scenario:
I have run a trace for few hours during maximum server load using the three events (never mind this) that DTA looks for. I then stop and feed this profiler load to DTA. It does its tuning work and gives me feedback on what indexes need to be put.
Here is the question:
Several (upto 15) different indexes recommendations are for single table. If I let DTA do its work does this mean several indexes are going to be created for this? Is this not going to be a problem?

DTA does a reasonable in many circumstances, but it doesn't always make the optimum recommendations. Overlaps are quite common, as is the duplication of the clustered index as a non-clustered index(!).
If you want to do this more accurately by hand: MS SQL Server 2008 - How Can I Log and Find the Most Expensive Queries?

Optimisation of SQL server is complex and depends heavily on the data in your database. The only real way to determine what affect changes will have is to perform performance and load testing against your database using representative data (preferably a backup of your live database)
That said, 15 indexes seems like a lot to me - a large number of indexes may have a detremental affect on the speed of writes against that table. DTA has probably taken each query run against that table individually and come up with the optimum indexes for each query. You will probably find that its possible to recuce the number of indexes by creating indexes suitable for multiple queries - this might mean that some queries are slightly slower that with all 15 indexes, however the chances are that you will be able to get 99% of the improvement.

Without knowing any more about the tables, queries or indexes, the answer has to be "it depends"...
You need to take the output from the DTA as a starting point. If you look at the recommended indexes, you may find that there is some overlap between them in order to reduce the number of indexes.

15 indexes does seem like a lot for a single table. I would look at the actual queries themselves that are run against it.
Look for places where changes to the query structure will all them to work against either existing indexes or a small subset of the proposed indexes.

Related

How do I improve performance when querying on a column that changes frequently in SQL Azure using LINQ to SQL

I have an SQL Azure database, and one of the tables contains over 400k objects. One of the columns in this table is a count of the number of times that the object has been downloaded.
I have several queries that include this particular column (call it timesdownloaded), sorted descending, in order to find the results.
Here's an example query in LINQ to SQL (I'm writing all this in C# .NET):
var query = from t in db.tablename
where t.textcolumn.StartsWith(searchfield)
orderby t.timesdownloaded descending
select t.textcolumn;
// grab the first 5
var items = query.Take(5);
This query called perhaps 90 times per minute on average.
Objects are downloaded perhaps 10 times per minute on average, so this timesdownloaded column is updated that frequently.
As you can imagine, any index involving the timesdownloaded column gets over 30% fragmented in a matter of hours. I have implemented an index maintenance plan that checks and rebuilds these indexes when necessary every few hours. This helps, but of course adds spikes in query response times whenever the indexes are rebuilt which I would like to avoid or minimize.
I have tried a variety of indexing schemes.
The best performing indexes are covering indexes that include both the textcolumn and timesdownloaded columns. When these indexes are rebuilt, the queries are amazingly quick of course.
However, these indexes fragment badly and I end up with pretty frequent delay spikes due to rebuilding indexes and other factors that I don't understand.
I have also tried simply not indexing the timesdownloaded column. This seems to perform more consistently overall, though slower of course. And when I check on the SQL query execution plan, it seems to be pretty inconsistent in how SQL tries to optimize this query. Of course it ends up with a log of logical reads as it has to fetch the timesdownloaded column from the table and not an organized index. So this isn't optimal.
What I'm trying to figure out is if I am fundamentally missing something in how I have configured or manage this database.
I'm no SQL expert, and I've yet to find a good answer for how to do this.
I've seen some suggestions that Stored Procedures could help, but I don't understand why and haven't tried to get those going with LINQ just yet.
As commented below, I have considered caching but haven't taken that step yet either.
For some context, this query is a part of a search suggestion feature. So it is called frequently with many different search terms.
Any suggestions would be appreciated!
Based on the comments to my question and further testing, I ended up using an Azure Table to cache my results. This is working really well and I get a lot of hits off of my cache and many fewer SQL queries. The overall performance of my API is much better now.
I did try using Azure In Role Caching, but that method doesn't appear to work well for my needs. It ended up using too much memory (no matter how I configured it, which I don't understand), swapping to disk like crazy and brought my little Small instances to their knees. I don't want to pay more at the moment, so Tables it is.
Thanks for the suggestions!

Is there a downside to adding numerous indexes to tables?

I am creating a new DB and was wondering if there was any downside to adding numerous indexes to tables that I think may require one.
If I create an index but end up not utilizing it will it cause any issues?
Indexes make it faster to search tables, but longer to write to. Having unused indexes will end come causing some unnecessary slow down.
Each Index :
Takes some place, on disk, and in RAM
Takes some time to update, each time you insert/update/delete a row
Which means you should only define indexes that are useful for your application's needs : too many indexes will slow down the writes more than you'll gain from the reads.
Yes.
You should only add those indexes, that are necessary.
An index requires extra space, and, when inserting / updating / deleting records, the DBMS needs to update those indexes as well. So, this means that it takes more time to update/add/delete a record, since the DBMS has to do some extra administration.
adding numerous indexes to tables that
I think may require one.
You should only add those indexes for which you're sure that they're necessary.
To determine the columns where you could put indexes on, you could:
add indexes to columns that are
foreign keys
add indexes to columns that are often used in where clauses
add indexes to columns that are used in order by clauses.
Another -and perhaps better- approach, is to use SQL Profiler:
use SQL Profiler to trace your
application / database for a while
save the trace results
use the trace results in the Index Tuning Wizard, which will tell you which indexes you should create, what columns should be in each index, and it will also tell you the order of those columns for the index.
Indexes cause an increase in database size and the amount of time to insert/update/delete records. Try not to add them unless you know you will use them.
Having an index means INSERTs and UPDATEs take a bit longer. If you have too many indexes, then the benefit of faster search times can become not worth the extra INSERT and UPDATE time.
Yes; having an index makes selects faster but potentially makes inserts slower, as the indexes must be updated for each insert. If your application does a lot of writing and not much reading (e.g. an audit log) you should avoid extra indexing.
update and insert cost more as indexes need to be updated as well
more space used
Don't create any extra indices at the begining. Wait until you've at least partly developed the system so you can have an idea about the usage of the table. Generate query plans to see what gets queried (and how, and the performance costs) and THEN add new indices as needed.
Do not index blindly! Take a look at your data to see which columns are actually being used in SELECT predicates and index those.
Also consider that indexes take room. Sometimes a lot of room. I have seen databases where the indexing data far outweighed the raw data in sheer volume.
Extra space, Extra time to insert like everyone has said.
Also, you should be certain of your indexes and your design because sometimes indexes can actually slow down queries if the query optimizer chooses the wrong index. This is uncommon but can happen if by optimizing an index for a particular join and causing another join to actually become slower. I don't know about SQL Server but you'll find lots of tricks around for hinting the mySQL optimizer to build queries in specific ways to get around this.
DBA's get payed a lot of money to know about weird gotcha's like this with indexes(among other things) so yeah, there are downsides to adding lots of indexes so be careful. Lean heavily on your query profiler and don't just throw indexes blindly at problems.
Take a look at the columns used in your where clauses, look at columns used in joins if any.
Generally the very simplest rule of thumb.
Extraneous indexes as pointed out before will slow down your DML statements and are generally not recommended. Ideally, what I have done is finish the entire module and during your unit testing phase, ensure that you can do load analysis on the module and then check whether or not you are seeing slow downs and after analyzing where the slow downs are, add indexes.
I think it's been answered already, but basically indexes slow down inserts/updates as the index is updated when a new record is inserted (or an existing one updated).
Space is also a consideration, both memory and disk.
So for databases where a large amount of transactions are occurring, this will definitely have a noticeable performance impact (which is why performance tuning includes adding and removing indexes to optimize certain activities performed against the database).
Good luck

Sql Query Optimization

I am not so proficient in TSql as of now (writing since last 4/5 months) but I have written many queries. Although I have given the outputs, sometimes I feel that the queries are not so optimized. I searched in google and found lot of stuffs about query optimization, and they ask to look into the query plan(actual & estimated) for the performance improvisation.
As I already said that I am very new to writing queries so it is becoming difficult for me to grasp those solutions. But I need to learn query optimization.
Can any body help me out initially how and where should I start from?
Searching in internet reveals that, SEEK is better than SCAN(May it be index or Table). How can I achieve a seek over a scan?
Then they says that ORDER BY clause i.e. sorting is more costly. Then what is the work around? How can I write effective query?
Can anybody explain me, with some examples, which kind of query is better over what and in what situation?
Edited
Dear All,
You all have answered and that will help me a lot. But what I intend to say is that, you all have practised a lot for becoming an expert. Once upon a time, I guess you all were like what I am now.So my humble request is how you all started for writing optimised query.I know that patience is needed and I will devote that.
I apologise for any wrong statement of mine.
Thanks in advance
Articles discussing Query Optimization issues are often very factual and useful, but as you found out they can be hard to follow. It is a bit like when someone is trying to learn the basics rules of baseball, and all the sports commentary he/she finds on the subject is rife with acronyms and strategic details about the benefits of sacrificing someone at bat, and other "inside baseball" trivia...
So you need to learn the basics first:
the structure(s) of the database storage
indexes' structure, the clustered and non clustered kind, the multi column indexes
the concept of covering a query
the selectivity of a particular column
the disadvantage of indexes when it comes to CRUD operations
the basic subtasks/strategies of a query: table or index scan, index seek, sorting, inner-outer merge etc.
the log file, the data recovery model.
The following links apply to MS SQL Server. If that is not the DBMS you are using you can try and find similar material for the system of your choice. In fact, so long as you realize that the implementation may vary, it may be useful to peruse the MS documention.
MS SQL storage structures
MS SQL pages and extents
Then as you started doing, learn the way to read query plans (even if not in fully understand at first), and all this should bring you to a level where you start to make sense of the more advanced books or articles on the topic. I do not know of tutorials for Query Plans on the Internet (though I'm quite sure they exist...), but the following methodology may be of use: Start with simple queries, review the query plan (if possible in a graphic fashion), start recognizing the most common elements: Table Scan, Index Seek, Sort, nested loops... Read the detailed properties of these instances: estimated nb of rows, cost percentage etc. When you find a new element that you do not know/understand, use this keyword to find details on the internet. Also: experiment a lot.
Finally you should remember that while the way the query is written and the set of indexes etc. provided cover a great part of optimization needs, there are other sources of optmization, for example the way hardware is put to use (a basic example is how by having the data file and the log file on separate physical disks, we can greatly improve CRUD performance).
Searching in internet reveals that,
SEEK is better than SCAN(May it be
index or Table). How can I achieve a
seek over a scan?
Add the necessary index -- if the incremental costs on INSERT and UPDATE (and extra storage) are an overall win to speed up the seeking in your queries.
Then they says that ORDER BY clause
i.e. sorting is more costly. Then what
is the work around? How can I write
effective query?
Add the necessary index -- if the incremental costs on INSERT and UPDATE (and extra storage) are an overall win to speed up the ordering in your queries.
Can anybody explain me, with some
examples, which kind of query is
better over what and in what
situation?
You already pointed out a couple of specific questions -- and the answers were nearly identical. What good would it do to add another six?
Run benchmark queries over representative artificial data sets (must resemble what you're planning to have in production -- if you have small toy-sized tables the query plans will not be representative nor meaningful), try with and without the index that appear to be suggested by the different query plans, measure performance; rinse, repeat.
It takes 10,000 hours of practice to be good at anything. Optimizing DB schemas, indices, queries, etc, is no exception;-).
ORDER BY is a necessary evil - there's no way around it.
Refer to this question for solving index seek, scan and bookmark/key lookups. And this site is very good for optimization techniques...
Always ensure that you have indexes on your tables. Not too many and not too few.
Using sql server 2005, apply included columns in these indexes, they help for lookups.
Order by is costly, if not required, why sort a data table if it is not required.
Always filter as early as possible, if you reduce the number of joins, function calls etc, as early as possible, you reduce time taken over all
avoid cursors if you can
use temp tables/ table vars for
filtering where possible
remote queries will cost you
queries with sub
selects in the where clause can be
hurtfull
table functions can be costly if not
filtered
as always, there is no hard rule, and things should be taken on a per query basis.
Always create the query as understandle/readable as possible, and optimize when needed.
EDIT to comment question:
Temp tables can be used when you require to add indexes on the temp table (you cannot add indexes on var tables, except the pk). I mostly use var tables when i can, and only have the required fields in them as such
DECLARE #Table TABLE(
FundID PRIMARY KEY
)
i would use this to fill my fund group ids instead of having a join to tables that are less optimized.
I read a couple of articles the other day and to my surprise found that var tables are actually created in the tempdb
link text
Also, i have heard, and found that table UDFs can seems like a "black box" to the query planner. Once again, we tend to move the selects from the table functions into table vars, and then join on these var tables. But as mentioned earlier, write the code first, then optimize when you find bottle necks.
I have found that CTEs can be usefull, but also, that when the level of recursion grows, that it can be very slow...

Sql optimization Techniques

I want to know optimization techniques for databases that has nearly 80,000 records,
list of possibilities for optimizing
i am using for my mobile project in android platform
i use sqlite,i takes lot of time to retreive the data
Thanks
Well, with only 80,000 records and assuming your database is well designed and normalized, just adding indexes on the columns that you frequently use in your WHERE or ORDER BY clauses should be sufficient.
There are other more sophisticated techniques you can use (such as denormalizing certain tables, partitioning, etc.) but those normally only start to come into play when you have millions of records to deal with.
ETA:
I see you updated the question to mention that this is on a mobile platform - that could change things a bit.
Assuming you can't pare down the data set at all, one thing you might be able to do would be to try to partition the database a bit. The idea here is to take your one large table and split it into several smaller identical tables that each hold a subset of the data.
Which of those tables a given row would go into would depend on how you choose to partition it. For example, if you had a "customer_id" field that could range from 0 to 10,000 you might put customers 0 - 2500 in table1, 2,500 - 5,000 in table2, etc. splitting the one large table into 4 smaller ones. You would then have logic in your app that would figure out which table (or tables) to query to retrieve a given record.
You would want to partition your data in such a way that you generally only need to query one of the partitions at a time. Exactly how you would partition the data would depend on what fields you have and how you are using them, but the general idea is the same.
Create indexes
Delete indexes
Normalize
DeNormalize
80k rows isn't many rows these days. Clever index(es) with queries that utlise these indexes will serve you right.
Learn how to display query execution maps, then learn to understand what they mean, then optimize your indices, tables, queries accordingly.
Such a wide topic, which does depend on what you want to optimise for. But the basics:
indexes. A good indexing strategy is important, indexing the right columns that are frequently queried on/ordered by is important. However, the more indexes you add, the slower your INSERTs and UPDATEs will be so there is a trade-off.
maintenance. Keep indexes defragged and statistics up to date
optimised queries. Identify queries that are slow (using profiler/built-in information available from SQL 2005 onwards) and see if they could be written more efficiently (e.g. avoid CURSORs, used set-based operations where possible
parameterisation/SPs. Use parameterised SQL to query the db instead of adhoc SQL with hardcoded search values. This will allow better execution plan caching and reuse.
start with a normalised database schema, and then de-normalise if appropriate to improve performance
80,000 records is not much so I'll stop there (large dbs, with millions of data rows, I'd have suggested partitioning the data)
You really have to be more specific with respect to what you want to do. What is your mix of operations? What is your table structure? The generic advice is to use indices as appropriate but you aren't going to get much help with such a generic question.
Also, 80,000 records is nothing. It is a moderate-sized table and should not make any decent database break a sweat.
First of all, indexes are really a necessity if you want a well-performing database.
Besides that, though, the techniques depend on what you need to optimize for: Size, speed, memory, etc?
One thing that is worth knowing is that using a function in the where statement on the indexed field will cause the index not to be used.
Example (Oracle):
SELECT indexed_text FROM your_table WHERE upper(indexed_text) = 'UPPERCASE TEXT';

How do you optimize tables for specific queries?

What are the patterns you use to determine the frequent queries?
How do you select the optimization factors?
What are the types of changes one can make?
This is a nice question, if rather broad (and none the worse for that).
If I understand you, then you're asking how to attack the problem of optimisation starting from scratch.
The first question to ask is: "is there a performance problem?"
If there is no problem, then you're done. This is often the case. Nice.
On the other hand...
Determine Frequent Queries
Logging will get you your frequent queries.
If you're using some kind of data access layer, then it might be simple to add code to log all queries.
It is also a good idea to log when the query was executed and how long each query takes. This can give you an idea of where the problems are.
Also, ask the users which bits annoy them. If a slow response doesn't annoy the user, then it doesn't matter.
Select the optimization factors?
(I may be misunderstanding this part of the question)
You're looking for any patterns in the queries / response times.
These will typically be queries over large tables or queries which join many tables in a single query. ... but if you log response times, you can be guided by those.
Types of changes one can make?
You're specifically asking about optimising tables.
Here are some of the things you can look for:
Denormalisation. This brings several tables together into one wider table, so in stead of your query joining several tables together, you can just read one table. This is a very common and powerful technique. NB. I advise keeping the original normalised tables and building the denormalised table in addition - this way, you're not throwing anything away. How you keep it up to date is another question. You might use triggers on the underlying tables, or run a refresh process periodically.
Normalisation. This is not often considered to be an optimisation process, but it is in 2 cases:
updates. Normalisation makes updates much faster because each update is the smallest it can be (you are updating the smallest - in terms of columns and rows - possible table. This is almost the very definition of normalisation.
Querying a denormalised table to get information which exists on a much smaller (fewer rows) table may be causing a problem. In this case, store the normalised table as well as the denormalised one (see above).
Horizontal partitionning. This means making tables smaller by putting some rows in another, identical table. A common use case is to have all of this month's rows in table ThisMonthSales, and all older rows in table OldSales, where both tables have an identical schema. If most queries are for recent data, this strategy can mean that 99% of all queries are only looking at 1% of the data - a huge performance win.
Vertical partitionning. This is Chopping fields off a table and putting them in a new table which is joinned back to the main table by the primary key. This can be useful for very wide tables (e.g. with dozens of fields), and may possibly help if tables are sparsely populated.
Indeces. I'm not sure if your quesion covers these, but there are plenty of other answers on SO concerning the use of indeces. A good way to find a case for an index is: find a slow query. look at the query plan and find a table scan. Index fields on that table so as to remove the table scan. I can write more on this if required - leave a comment.
You might also like my post on this.
That's difficult to answer without knowing which system you're talking about.
In Oracle, for example, the Enterprise Manager lets you see which queries took up the most time, lets you compare different execution profiles, and lets you analyze queries over a block of time so that you don't add an index that's going to help one query at the expense of every other one you run.
Your question is a bit vague. Which DB platform?
If we are talking about SQL Server:
Use the Dynamic Management Views. Use SQL Profiler. Install the SP2 and the performance dashboard reports.
After determining the most costly queries (i.e. number of times run x cost one one query), examine their execution plans, and look at the sizes of the tables involved, and whether they are predominately Read or Write, or a mixture of both.
If the system is under your full control (apps. and DB) you can often re-write queries that are badly formed (quite a common occurrance), such as deep correlated sub-queries which can often be re-written as derived table joins with a little thought. Otherwise, you options are to create covering non-clustered indexes and ensure that statistics are kept up to date.
For MySQL there is a feature called log slow queries
The rest is based on what kind of data you have and how it is setup.
In SQL server you can use trace to find out how your query is performing. Use ctrl + k or l
For example if u see full table scan happening in a table with large number of records then it probably is not a good query.
A more specific question will definitely fetch you better answers.
If your table is predominantly read, place a clustered index on the table.
My experience is with mainly DB2 and a smattering of Oracle in the early days.
If your DBMS is any good, it will have the ability to collect stats on specific queries and explain the plan it used for extracting the data.
For example, if you have a table (x) with two columns (date and diskusage) and only have an index on date, the query:
select diskusage from x where date = '2008-01-01'
will be very efficient since it can use the index. On the other hand, the query
select date from x where diskusage > 90
would not be so efficient. In the former case, the "explain plan" would tell you that it could use the index. In the latter, it would have said that it had to do a table scan to get the rows (that's basically looking at every row to see if it matches).
Really intelligent DBMS' may also explain what you should do to improve the performance (add an index on diskusage in this case).
As to how to see what queries are being run, you can either collect that from the DBMS (if it allows it) or force everyone to do their queries through stored procedures so that the DBA control what the queries are - that's their job, keeping the DB running efficiently.
indices on PKs and FKs and one thing that always helps PARTITIONING...
1. What are the patterns you use to determine the frequent queries?
Depends on what level you are dealing with the database. If you're a DBA or a have access to the tools, db's like Oracle allow you to run jobs and generate stats/reports over a specified period of time. If you're a developer writing an application against a db, you can just do performance profiling within your app.
2. How do you select the optimization factors?
I try and get a general feel for how the table is being used and the data it contains. I go about with the following questions.
Is it going to be updated a ton and on what fields do updates occur?
Does it have columns with low cardinality?
Is it worth indexing? (tables that are very small can be slowed down if accessed by an index)
How much maintenance/headache is it worth to have it run faster?
Ratio of updates/inserts vs queries?
etc.
3. What are the types of changes one can make?
-- If using Oracle, keep statistics up to date! =)
-- Normalization/De-Normalization either one can improve performance depending on the usage of the table. I almost always normalize and then only if I can in no other practical way make the query faster will de-normalize. A nice way to denormalize for queries and when your situation allows it is to keep the real tables normalized and create a denormalized "table" with a materialized view.
-- Index judiciously. Too many can be bad on many levels. BitMap indexes are great in Oracle as long as you're not updating the column frequently and that column has a low cardinality.
-- Using Index organized tables.
-- Partitioned and sub-partitioned tables and indexes
-- Use stored procedures to reduce round trips by applications, increase security, and enable query optimization without affecting users.
-- Pin tables in memory if appropriate (accessed a lot and fairly small)
-- Device partitioning between index and table database files.
..... the list goes on. =)
Hope this is helpful for you.