I did a DB audit to sort out some performance problems one customer has in a part of our software. I used the profiler to look for the queries that were taking too much time. Minutes later I saw the SELECTs that were taking so long to execute (up to minutes). As usual, I took those SELECTs and ran them in Management Studio to look for missing or bad indexes (execution plan).
Then the shock: they are blazing fast (milliseconds) and use good indexes.
My next guess was a locking problem, but to my surprise the SELECT has the NOLOCK hint on both tables...
Network problems don't seem to be the hiccup here, since I get good times from different clients on other SELECTs (and the slow SELECTs come from different clients).
Just to be sure, I maintained the indexes on the two inner-joined tables used in this SELECT, without any success. Other SELECTs against those tables don't show those horrible times.
Edit: so my not-so-clear question is: what steps should I take to investigate the problem further?
Several questions to answer:
Is your test environment the same as the production (customer) environment?
What are the differences between the test and production environments (e.g. DB stats)?
Do you know of concurrent processes running at the customer site?
What about the table volumes (number of records)?
Are the key definitions and indexes OK?
There are many more steps you could take, but start with these; one concrete check is sketched below.
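Given that the same statements are fast in Management Studio but slow from the application, one useful check is to compare the session-level SET options of the application's connection with those of your Management Studio session; a mismatch (ARITHABORT is the classic culprit) means the two get different cached plans for what looks like the same statement. A minimal sketch, assuming SQL Server 2005 or later and VIEW SERVER STATE permission:

-- Compare SET options per connection; the application's rows will usually show a
-- different program_name than Management Studio.
SELECT session_id, program_name, arithabort, ansi_nulls, quoted_identifier
FROM sys.dm_exec_sessions
WHERE is_user_process = 1;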
I am working with very large tables containing hundreds of millions of rows, and I am measuring the performance of some queries using SQL Developer. I found an option called Unshared SQL Worksheet which allows me to execute many queries at the same time. Executing many queries at the same time suits me, especially since some queries or procedures take hours to execute.
My question is: does executing many queries at the same time affect performance? (By performance I mean the duration of execution of the queries.)
Every query costs something to execute. That's just physics. Your database has a fixed amount of resources - CPU, memory, I/O, temp disk - to service queries (let's leave elastic cloud instances out of the picture). Every query that is executing simultaneously is asking for resources from that fixed pot. Potentially, if you run too many queries at the same time you will run into resource contention, which will affect the performance of individual queries.
Note the word "potentially". Whether you run into an actual problem depends on many things: what resources your queries need, how efficiently your queries have been written, how much resource your database server has available, and how efficiently it has been configured to support multiple users (and whether the DBA has implemented profiles to manage resource usage). So, as with almost every database tuning question, the answer is "it depends".
This is true even for queries that hit massive tables such as you describe. Although, if you have queries that you know will take hours to run, you might wish to tune them as a matter of priority.
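As a rough illustration of that fixed pot, you can check how many sessions are actively competing and what they are waiting on. A minimal sketch, assuming an Oracle database (since SQL Developer is in use) and the privileges to query v$session:

-- Count user sessions by state and wait class; many ACTIVE sessions stacked up
-- on the same wait class is a hint of resource contention.
SELECT status, wait_class, COUNT(*) AS session_count
FROM v$session
WHERE type = 'USER'
GROUP BY status, wait_class
ORDER BY session_count DESC;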
I have two identical SQL databases that contain nearly the same records in each of their tables. The only difference between them is that one lives on my local machine and the other is in Azure. Yet, after investigating a performance issue, I found that the two databases produce different execution plans for some of the queries. To give you an example, here is a simple query that takes approximately 1 second to run.
SELECT COUNT(*) FROM Deposits
INNER JOIN households ON households.id = deposits.HouseholdId
WHERE CashierId = 'd89c8029-4808-4cea-b505-efd8279dc66d'
It is obvious that the inner join can be omitted, as it doesn't contribute to the end result. Indeed, it is omitted on my local machine, but that is not the case on Azure. Here are two pictures visualizing the execution plans for my local machine and Azure, respectively.
Just to give you some background on what happened: everything worked perfectly until I scaled my Azure database down to Basic (5 DTUs). Afterwards, some queries became extremely slow and I had no idea why. I scaled the DB instance up again but saw no improvement. Then I rebuilt the indexes and noticed that if I rebuild them in the correct order, the queries once again perform as expected. Yet I have absolutely no idea why I need to rebuild them in a specific order and, even less, how to determine the correct order.
Now I have issues with virtually all queries related to the Deposits table. I tried rebuilding the indexes but saw no improvement whatsoever. I suspect it has something to do with the PK index, but I am not quite sure. The table contains approximately 300k rows.
Your databases may have the same schemas and approximately the same records, but it's tough to make them truly identical. Are you sure yours are? For a start, compare the engine builds:
SELECT SERVERPROPERTY(N'ProductVersion');
What about the hardware they are running on? CPU? Memory? Disks? I mean, it's Azure, right? It's hard to know what actual server hardware you are using, and SQL Server's query optimizer will adjust for hardware differences.
Additionally, even if the hardware and software were identical, the simple fact that the databases are used differently can give them different statistics. The first time you run a query, it is evaluated and optimized using statistics; all subsequent calls of that query reuse that initially cached plan. Tables change over time - they grow. The shape of the data changes, meaning an old cached query plan can eventually fall out of favor. Certain things can reshape the data and trigger a change in statistics, which in turn invalidates the cached plan - such as rebuilding your indexes.
Try this. To force a fresh query plan on each statement, add an
OPTION (RECOMPILE)
statement to the bottom of your queries. Does that help or stabilize performance?
Furthermore - this is a stretch - but can I assume that you aren't running that exact same query over and over? It makes more sense that you haven't hardcoded that GUID, and that a query plan has really been created for something that takes @CashierId as a parameter. If so, your existing query plan could be a victim of parameter sniffing, where the plan you're pulling was optimized for one specific GUID and does poorly when you pass in anything else.
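Applied to the query from the question, that looks like this (the hint simply tells SQL Server to build a fresh plan for this statement every time instead of reusing the cached one):

SELECT COUNT(*) FROM Deposits
INNER JOIN households ON households.id = deposits.HouseholdId
WHERE CashierId = 'd89c8029-4808-4cea-b505-efd8279dc66d'
OPTION (RECOMPILE);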
For more info about what that statement does, have a look here. For more understanding of why it's hard to have identical databases, take a look here and here.
Good luck! Hope you can get it sorted.
I have a database with tables that grow every day. I cannot predict which tables are going to grow and which are not, as I'm not the one putting the data into them.
Is there a way to find tables that need indexes at a particular point in time? Is there a way, in SQL Server, to be notified if a database needs tuning on certain tables?
This is a product we have deployed at different client locations, and we cannot go onto their servers every time to check whether they have a performance issue. What I had in mind is something that can notify me if there are performance issues on certain tables, so that as new patches go out to the clients we can add these indexes or tuned queries.
After reading Insertion of data after creating index on empty table or creating unique index after inserting data on oracle?, I'm not willing to create indexes while installing the databases or while the tables have few rows or are empty.
As I understand it, we should not create indexes on smaller tables, as that can hurt write performance.
This is only a real concern if you're bulk loading or otherwise generating a hundred million records each day and write performance is a problem. Indexes do increase write times because they have to be updated when data is written, but unless you're running on a potato or under very high load, it's unlikely to be a problem. You'd know it was a problem before you encountered it.
If we're talking about small tables (less than 100 pages), then it's much more likely that indexes won't be useful because the data set is so small, but you shouldn't be concerned about impacting write performance.
Overall, your application should ship with indexes that support the queries you expect to be run, based on your unit testing and staging. You will need feedback from your customers or clients, but until you really know how people use their data, you're going to have to make a best guess.
The general question of "How do I know what indexes I need when I don't know what queries will be run?" is better suited to DBA Stack Exchange. Briefly, you'll need to use dynamic management views for that. The three missing index dynamic views can be used for this. The example query given isn't horrible:
SELECT mig.*, statement AS table_name,
column_id, column_name, column_usage
FROM sys.dm_db_missing_index_details AS mid
CROSS APPLY sys.dm_db_missing_index_columns (mid.index_handle)
INNER JOIN sys.dm_db_missing_index_groups AS mig
ON mig.index_handle = mid.index_handle
ORDER BY mig.index_group_handle, mig.index_handle, column_id;
You shouldn't just blindly follow what this view says, however. It's a good lead on what to look at, but you have to consider the column order and the queries actually being run to be sure.
You should also monitor index usage statistics and examine how much, and in what way, indexes are used compared to how often they have to be updated. Indexes that are updated a million times a day but used only once or twice should be considered for removal.
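A minimal sketch of that check, using sys.dm_db_index_usage_stats for the current database (remember these counters reset when the instance restarts):

-- Reads vs. writes per index; indexes with many writes and almost no reads are
-- candidates for review.
SELECT OBJECT_NAME(s.object_id) AS table_name,
       i.name AS index_name,
       s.user_seeks + s.user_scans + s.user_lookups AS reads,
       s.user_updates AS writes
FROM sys.dm_db_index_usage_stats AS s
INNER JOIN sys.indexes AS i
    ON i.object_id = s.object_id AND i.index_id = s.index_id
WHERE s.database_id = DB_ID()
ORDER BY writes DESC, reads ASC;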
You will also want to monitor query stats to look for queries that run for a long time. This may be poor development on the part of your client, but it can also be a sign of design problems.
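For long-running queries, sys.dm_exec_query_stats is one starting point. A hedged sketch (it only covers statements still in the plan cache since the last restart):

-- Top statements by average elapsed time (microseconds).
SELECT TOP (20)
       qs.total_elapsed_time / qs.execution_count AS avg_elapsed_us,
       qs.execution_count,
       SUBSTRING(st.text, (qs.statement_start_offset / 2) + 1,
                 ((CASE qs.statement_end_offset
                       WHEN -1 THEN DATALENGTH(st.text)
                       ELSE qs.statement_end_offset END
                   - qs.statement_start_offset) / 2) + 1) AS statement_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY avg_elapsed_us DESC;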
This is not even a comprehensive overview of things to look for, however. There's a lot to database maintenance and operations. That's why DBAs make a good living. This is just the tip of the iceberg. Just the tip for indexes, even.
What I'd do, if you want to maintain this, is consider asking your customers to allow you to send feedback for performance analysis. Set up a broker that monitors the management views and sends compiled and sanitized information back to you. You'll need to be very careful about what you send, because you don't want to be sending actual customer data, of course.
Keep in mind that dynamic management views typically reset when the instance does, so the results will not typically represent the entire lifespan of the database.
OK, so I work for a company that sells a web product with an MS SQL Server back end (it can be any version; we've just changed our requirements to 2008+ now that 2005 is out of extended support). All databases are owned by the company that purchases the product, but we have VPN access and a tech support department to deal with any issues. One part of my role is to act as third-line support for SQL issues.
When performance is a concern, one of the usual checks is unused/missing indexes. We've got the usual standard indexes, but depending on which modules a company uses, and how it uses the system, it will require different indexes (there's an accounting module and a document management module, amongst others). With hundreds of customers, it's not possible to remote onto each one on a regular basis to carry out optimisation work. I'm wondering if anybody else in my position has considered a scheduled task that could drop and create indexes when needed?
I've got concerns (obviously). Any changes this procedure makes would also be stored in a table with full details of the change and a timestamp. I'd need this to be bullet-proof; I can't send something out into the wild if it may cause issues. I'm thinking of an overnight or (probably) weekly task.
Dropping Indexes:
Would require the server to have been up for a minimum amount of time to ensure all relevant server statistics are up to date (say two weeks or one month).
Only drop unused indexes on tables that are being actively used (indexes on unused parts of the system aren't a concern).
Log it.
This won't highlight duplicate indexes (that will have to be manual), just the quick wins (unused indexes with writes).
Creating Indexes
Only look for missing indexes with a value above a certain threshold.
Would have to check whether any similar existing index could be modified to cover the requirement. This could work on a ranking (check that all indexed fields are the same, then score the included fields to see whether additional ones would be needed).
Limit the number of indexes created (say 5 per week) to ensure it doesn't get carried away and create a bunch at once. This should help it focus on only the most important indexes.
Log it.
This would need to be dynamic, as we've got customers on different versions of the system with different usage patterns.
Just to clarify: I'm not expecting anybody to code this, it's more a question about the feasibility of, and concerns about, a task like this.
Edit: I've put a bounty on this to gather some further opinions and to get feedback from anybody who may have tried this before. I'll award it to the answer with the most upvotes by the time the bounty ends.
I can't recommend what you're contemplating, but you might be able to simplify your life by gathering the inputs to your contemplated program and making them available to clients and the support team.
If the problem were as simple as you suppose, surely the server itself or the tuning advisor would have solved it by now. You're making at least one unwarranted assumption:
require the server to be up for a minimum amount of time to ensure all relevant server statistics are up to date.
Table statistics are only as good as the last time they were updated after a significant change. Uptime guarantees nothing after a TRUNCATE TABLE or a bulk insert.
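Rather than relying on uptime, you can check directly when each statistics object was last updated. A small sketch for the current database:

-- Statistics age per table; stale entries after large data changes are the ones to worry about.
SELECT OBJECT_NAME(s.object_id) AS table_name,
       s.name AS stats_name,
       STATS_DATE(s.object_id, s.stats_id) AS last_updated
FROM sys.stats AS s
WHERE OBJECTPROPERTY(s.object_id, 'IsUserTable') = 1
ORDER BY last_updated;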
This won't highlight duplicate indexes
But that's something you can do in a single query using the system tables. (It would be disappointing if the tuning gadget didn't help with those.) You could similarly look for overlapping indexes, such as for columns {a,b} and {a}; the second won't be useful unless {b} is selective and there are queries that don't mention {b}.
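A sketch of that single query, here reduced to finding index pairs on the same table that share the same leading key column (which catches both exact duplicates and the {a,b} vs {a} overlap):

WITH index_keys AS (
    SELECT i.object_id, i.index_id, i.name AS index_name, c.name AS column_name
    FROM sys.indexes AS i
    INNER JOIN sys.index_columns AS ic
        ON ic.object_id = i.object_id AND ic.index_id = i.index_id
    INNER JOIN sys.columns AS c
        ON c.object_id = ic.object_id AND c.column_id = ic.column_id
    WHERE ic.key_ordinal = 1                    -- leading key column only
      AND OBJECTPROPERTY(i.object_id, 'IsUserTable') = 1
)
SELECT OBJECT_NAME(a.object_id) AS table_name,
       a.index_name,
       b.index_name AS overlapping_index
FROM index_keys AS a
INNER JOIN index_keys AS b
    ON a.object_id = b.object_id
   AND a.index_id < b.index_id                  -- report each pair once
   AND a.column_name = b.column_name;

This only flags candidates; you still have to compare the full key and included columns before dropping anything.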
To look for new indexes, I would be tempted to try to instrument query use frequency and automate the analysis of query plan output. If you can identify frequently used, long-running queries and map their physical operations (table scan, hash join, etc.) onto the tables and existing indexes, you would have good input for adding and removing indexes. But you have to allow for the infrequently run quarterly report that, without its otherwise unused index, would take days to complete.
I must tell you that when I did this kind of analysis some years ago, I was disappointed to learn that most of the problem children were awful queries, usually prompted by awful table design. No index will help the SQL mule. Hopefully that will not be your experience.
An aspect you didn't touch on that might be just as important is machine capacity. You might look into gathering, say, hourly snapshots of SQL Server stats, like disk queue depth and paging. Hardly a server exists that can't be improved with more RAM, and sometimes that's really the best answer.
The SQL Server Database Engine Tuning Advisor is worth a check: https://msdn.microsoft.com/en-us/library/ms186232.aspx
Another way could be to gather performance data - start here: https://www.experts-exchange.com/articles/17780/Monitoring-table-level-activity-in-a-SQL-Server-database-by-using-T-SQL.html - and generate indexes based on the performance table data.
Check this too: https://msdn.microsoft.com/en-us/library/dn817826.aspx
I have a stored proc that processes a large amount of data (about 5 million rows in this example). The performance varies wildly. I've seen the process run in as little as 15 minutes and I've seen it run for as long as 4 hours.
For maintenance, and in order to verify that the logic and processing are correct, we have the SP broken up into sections:
1. TRUNCATE and populate a work table (indexed) that we can verify later with automated testing tools.
2. Join several tables together (including some of these work tables) to produce another work table.
3. Repeat 1 and/or 2 until a final output is produced.
My concern is that this is a single SP, so it gets an execution plan when it is first run (even WITH RECOMPILE). But at that time, the work tables (permanent tables in a Work schema) are empty.
I am concerned that, regardless of the indexing scheme, the execution plan will be poor.
I am considering breaking up the SP and calling separate SPs from within it, so that they can take advantage of a re-evaluated execution plan after the data in the work tables has been built. I have also seen references to using EXEC to run dynamic SQL, which obviously might get a RECOMPILE as well.
I'm still trying to get SHOWPLAN permissions, so I'm flying quite blind.
Are you able to determine whether there are any locking problems? Are you running the SP in sufficiently small transactions?
Breaking it up into subprocedures should have no benefit.
Somebody should be concerned about your productivity, working without basic optimization resources. That suggests there may be other possible unseen issues as well.
Grab the free copy of "Dissecting SQL Server Execution Plans" from the link below; maybe you can pick up a tip or two from it that will give you some idea of what's really going on under the hood of your SP.
http://dbalink.wordpress.com/2008/08/08/dissecting-sql-server-execution-plans-free-ebook/
Are you sure that the variability you're seeing is caused by "bad" execution plans? That may be one cause, but there may be a number of other reasons:
"other" load on the db machine
when using different data, there may be "easy" and "hard" data
issues with having to allocate more memory/file storage
...
Have you tried running the SP with the same data a few times?
Also, in order to figure out what is causing the runtime variability, I'd try to do some detailed measuring to pin the problem down to a specific section of the code. (The easiest way would be to insert some log calls at various points in the SP.) Then try to explain why that section is slow (other than "5 million rows" ;-)) and figure out a way to make it faster.
For now, I think there are a few questions to answer before going down the "splitting up the SP" route.
You're right, it is quite difficult for you to get a clear picture of what is happening behind the scenes until you can get the "actual" execution plans from several executions of your overall process.
One point to consider, perhaps: are your work tables physical or temporary tables? If they are physical, you will get a performance gain by inserting new data into a new table without an index (i.e. a heap), which you can then build an index on after all the data has been inserted.
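A hedged sketch of that pattern; Work.Staging, its columns, and dbo.SourceData are hypothetical names used only to illustrate the load-then-index order:

-- Start from a heap: remove the index if a previous run left it behind.
IF EXISTS (SELECT 1 FROM sys.indexes
           WHERE name = 'IX_Staging_CustomerId'
             AND object_id = OBJECT_ID('Work.Staging'))
    DROP INDEX IX_Staging_CustomerId ON Work.Staging;

TRUNCATE TABLE Work.Staging;

-- Bulk insert into the unindexed heap.
INSERT INTO Work.Staging (CustomerId, Amount)
SELECT CustomerId, Amount
FROM dbo.SourceData;

-- Build the index once, after the data is in place.
CREATE NONCLUSTERED INDEX IX_Staging_CustomerId
    ON Work.Staging (CustomerId);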
Also, what is the purpose of your process? It sounds like you are moving quite a bit of data around, in which case you may wish to consider partitioning. You can switch data in and out of your main table with relative ease.
Hope what I have detailed is clear but please feel free to pose further questions.
Cheers, John
In several cases I've seen, this level of variation in execution times / query plans comes down to statistics. I would recommend some tests where you update statistics on the tables you are using just before the process runs. This will both force SQL Server to re-evaluate the execution plan and, I suspect, give you more consistent results. Additionally, you may do well to see whether the differences in execution time correlate with re-indexing jobs run by your DBAs. Perhaps you could also gather some index health statistics before each run.
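A minimal sketch of that pre-run step; the table names below are hypothetical placeholders for the work and source tables in your process:

-- Refresh statistics just before the main run so plans are built on current data.
UPDATE STATISTICS Work.Staging WITH FULLSCAN;
UPDATE STATISTICS dbo.SourceData WITH FULLSCAN;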
If not, as other answerers have suggested, you are more likely suffering from locking and/or contention issues.
Good luck with it.
The only thing I can think of that an execution plan would get wrong when there's no data is to err on the side of a table scan instead of an index, since table scans are super fast when the whole table fits into memory. Are there other negatives you're actually observing, or are you sure they are happening because there's no data when the execution plan is created?
You can force the use of an index in your query...
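For example, with a table hint; the table, index, and column names here are hypothetical and only show the syntax (use hints sparingly, since they keep overriding the optimizer even after statistics improve):

DECLARE @BatchId int = 42;

SELECT r.Id, r.Amount
FROM Work.Results AS r WITH (INDEX (IX_Results_BatchId))  -- force this index
WHERE r.BatchId = @BatchId
OPTION (RECOMPILE);  -- optionally also re-plan with current statistics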
Seems to me like you might be going down the wrong path.
Is this an infeed or outfeed of some sort, or are you creating a report? If it is a feed, I would suggest changing the process to use SSIS, which should be able to move 5 million records very quickly.