Fixing a slow-running SQL query

I have been asked this in many interviews:
What is the first step to do if somebody complains that a query is running slowly?
I say that I run sp_who2 'active' and check the running queries to see which one is taking the most resources and whether there is any locking, blocking, or deadlocking going on.
Can somebody please provide me their feedback on this? Is this the best answer or is there a better approach?
Thanks!

This is one of my interview questions that I've given for years. Keep in mind that I do not use it as a yes/no question; I use it to gauge how deep their SQL Server knowledge goes and whether they're server-focused or code-focused.
Your answer went towards how to find which query is running slow, and possibly how to examine server resources for reasons why it's suddenly running slow. Based on your answer, I would start to label you as an operational DBA type. These are exactly the steps that an operational DBA performs when they get the call that the server is suddenly running slow. That's fine if that's what I'm interviewing for and that's what you're looking for. I might dig further into what your steps would be to resolve the issue once you find deadlocks, for example, but I wouldn't expect people to be able to go very deep. If it's not a deadlock or blocking, better answers here would be to capture the execution plan and check for stale statistics. It's also possible that parameter sniffing is going on, so a stored proc may need to be "recompiled". Those are the typical problems I see DBAs running into. I don't interview DBAs often, so maybe other people have deeper questions here.
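For example, the sort of quick checks I'd expect to hear about for stale statistics or parameter sniffing look roughly like this (the object names here are just placeholders, not anything from the original question):

-- Refresh statistics on the table the slow query hits
UPDATE STATISTICS dbo.Orders WITH FULLSCAN;

-- Force a suspect stored procedure to get a fresh plan on its next run
EXEC sp_recompile 'dbo.usp_GetOrderReport';

-- Or, inside the proc itself, sidestep parameter sniffing for one statement
SELECT o.OrderID, o.OrderDate
FROM dbo.Orders AS o
WHERE o.CustomerID = @CustomerID
OPTION (RECOMPILE);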
If the interview is for a developer job however, then I would expect the answer more to make an assumption that we've already located which query is running slowly, and that it's reproducible. I'll even go ahead and state as much if needed. The things that a developer has control over are different than what the operational DBA has control over, so I would expect the developer to start looking at the code.
People will often recommend looking at the execution plan at this point, and therefore recommend it as a good answer. I'll explain a little later why I don't necessarily agree that this is the best first step. If the interviewee does happen to mention the execution plan at this point, however, my follow-up questions would be to ask what they're looking for in the execution plan. The most common answer would be to look for table scans instead of seeks, possibly showing signs of a missing index. The answers that show me more experience working with execution plans have to do with looking for the steps with the highest percentage of the whole and/or looking for thick lines.
I find a lot of query tuning efforts go astray when starting with the execution plans, and solutions get hacky because the people tuning the queries don't know what they want the execution plan to look like, just that they don't like the one they have. They'll then try to focus on the seemingly worst-performing step, adding indexes, query hints, etc., when it may turn out that, because of some other step, the entire execution plan is flipped upside down and they're tuning the wrong piece. If, for example, you have three tables joined together on foreign keys, and the third table is missing an index, SQL Server may decide that the next best plan is to walk the tables in the opposite direction because primary key indexes exist there. The side effect may be that it looks like the first table is the one with the problem when really it's the third table.
The way I go about tuning a query, and therefore what I prefer to hear as an answer, is to look at the code and get a feel for what it is trying to do and how I would expect the joins to flow. I start breaking the query into pieces, starting with the first table. Keep in mind that I'm using the term "first" loosely here, to mean the table that I want SQL Server to start in. That is not necessarily the first table listed; it is, however, typically the smallest table, especially with the "where" applied. I will then slowly add in the additional tables one by one to see if I can find where the query turns south. The cause is typically a missing index, a lack of sargability, cardinality that is too low, or stale statistics. If you as the interviewee use those exact terms in context, you're going to ace this question no matter who is interviewing you.
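To make that concrete, a rough sketch of the piece-by-piece approach with made-up tables (these names are purely illustrative) looks like this: start from the table you want SQL Server to start in, then add one join at a time, checking the runtime and row counts at each step.

-- Step 1: just the "first" (smallest, most filtered) table
SELECT c.CustomerID
FROM dbo.Customers AS c
WHERE c.Region = 'West';

-- Step 2: add the next table and watch where the runtime turns south
SELECT c.CustomerID, o.OrderID
FROM dbo.Customers AS c
JOIN dbo.Orders    AS o ON o.CustomerID = c.CustomerID
WHERE c.Region = 'West';

-- Step 3: and so on, one table at a time
SELECT c.CustomerID, o.OrderID, d.ProductID
FROM dbo.Customers    AS c
JOIN dbo.Orders       AS o ON o.CustomerID = c.CustomerID
JOIN dbo.OrderDetails AS d ON d.OrderID    = o.OrderID
WHERE c.Region = 'West';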
Also, once you have an expectation of how you want the joins to flow, now is a good time to compare your expectations with the actual execution plan. This is how you can tell if a plan has flipped on you.
If I was answering the question, or tuning an actual query, I would also add that I like to get row counts on the tables and to look at the selectivity of all columns in the joins and "where" clauses. I also like to actually look at the data. Sometimes problems just aren't obvious from the code but become obvious when you see some of the data.
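As a sketch of what I mean by row counts and selectivity (table and column names are placeholders):

-- How big is the table, and how selective are the join/filter columns?
SELECT COUNT(*)                   AS total_rows,
       COUNT(DISTINCT CustomerID) AS distinct_customers,
       COUNT(DISTINCT Status)     AS distinct_statuses
FROM dbo.Orders;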

I can't really say which is the best answer, but I'd answer: analyze the Actual Execution Plan. That should be a basis to check for performance issues.
There is plenty of information to be found on the internet about analyzing Execution Plans. I suggest you check it out.

Use SQL Profiler. Configure the settings you need, run your stored procedure, and check which statement takes the longest. Execute those statements separately and get their execution plans. Check for missing indexes and the join order (join smaller tables first). You can also try staging intermediate results in temp tables before joining.

I guess I'm a coder based on Bruce's interview model, but I'm currently working with a slow query problem that led me here. We're using NHibernate as our ORM, plus some poor technology I've never seen before that doesn't take lazy loading into account when it talks to NHibernate. As such, the slow query is slow because it's in fact a horrible query, joining every table it can (the generated query fills two pages of the screen). When we rewrote the same query using LINQ, all of those joins went away.
No matter what role you're in, I think "is this the right query?" needs to be the number one question. Even as a DBA, looking at the query, you might recommend changing it if it's a bad one. Focusing on the query plan, indexes, and other optimization fine-tuning should be secondary to making sure you're optimizing what you actually want. I like Bruce's answer for this focus.


BigQuery - how to think about query optimisation when coming from a SQL Server background

I have a background that includes SQL Server and Informix database query optimisation (non big-data). I'm confident in how to maximise database performance on those systems. I've recently been working with BigQuery and big data (about 9+ months), and optimisation doesn't seem to work the same way. I've done some research and read some articles on optimisation, but I still need to better understand the basics of how to optimise on BigQuery.
In SQL Server/Informix, a lot of the time I would introduce a column index to speed up reads. BigQuery doesn't have indexes, so I've mainly been using clustering. When I've done benchmarking after introducing a cluster for a column that I thought should make a difference, I didn't see any significant change. I'm also not seeing a difference when I switch on query caching. This could be an unfortunate coincidence with the queries I've tried, or a mistaken perception; however, with SQL Server/SQLite/Informix I'm used to consistently seeing immediate, significant improvement. Am I misunderstanding clustering (I know it's not exactly like an index, but I'm expecting it to work in a similar type of way), or could it just be that I've somehow been 'unlucky' with the optimisations?
And this is where the real point is. There's almost no such thing as being 'unlucky' with optimisation, but in a traditional RDBMS I would look at the execution plan and know exactly what I need to do to optimise, and find out exactly what's going on. With BigQuery, I can get the 'execution details', but it really isn't telling me much (at least that I can understand) about how to optimise, or how the query really breaks down.
Do I need a significantly different way of thinking about BigQuery? Or does it work in ways similar to an RDBMS, where I can consciously make the first JOINs eliminate as many records as possible, use 'where' clauses that focus on indexed columns, and so on?
I feel I haven't got the control to optimise like in a RDBMS, but I'm sure I'm missing a major point (or a few points!). What are the major strategies I should be looking at for BigQuery optimisation, and how can I understand exactly what's going on with queries? If anyone has any links to good documentation that would be fantastic - I'm yet to read something that makes me think "Aha, now I get it!".
It is absolutely a paradigm shift in how you think. You're right: you have hardly any control over execution, and you'll eventually come to appreciate that. You do have control over architecture, and that's where a lot of your wins will be. (As others mentioned in the comments, the documentation is definitely helpful too.)
I've personally found that premature optimization is one of the biggest issues in BigQuery—often the things you do trying to make a query faster actually have a negative impact, because things like table scans are well optimized and there are internals that you can impact (like restructuring a query in a way that seems more optimal, but forces additional shuffles to disk for parallelization).
Some of the biggest areas where our team HAS seen greatly improved performance are as follows:
Use semi-normalized (nested/repeated) schema when possible. By using nested STRUCT/ARRAY types in your schema, you ensure that the data is colocated with the parent record. You can basically think of these as tables within tables. The use of CROSS JOIN UNNEST() takes a little getting used to, but eliminating those joins makes a big difference (especially on large results).
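As a sketch, assuming an orders table with a repeated items field (a purely illustrative schema), the nested data can be flattened on demand instead of joined from a separate table:

-- items is an ARRAY<STRUCT<sku STRING, qty INT64>> stored inside each order row
SELECT
  o.order_id,
  i.sku,
  i.qty
FROM `my_project.my_dataset.orders` AS o
CROSS JOIN UNNEST(o.items) AS i
WHERE i.qty > 1;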
Use partitioning/clustering on large datasets when possible. I know you mentioned this; just make sure that you're pruning what you can using _PARTITIONTIME when possible, and also using clustering keys that make sense for your data. Keep in mind that clustering basically sorts the storage order of the data, meaning that the optimizer knows it doesn't have to continue scanning once the criteria have been satisfied (so it doesn't help as much on low-cardinality values).
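A sketch of what that pruning looks like in practice (table and column names are made up):

-- Only scan the last 7 days of partitions, then let clustering on customer_id
-- limit how much of each partition is actually read
SELECT event_type, COUNT(*) AS events
FROM `my_project.my_dataset.events`
WHERE _PARTITIONTIME >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
  AND customer_id = 'C-1234'
GROUP BY event_type;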
Use analytic window functions when possible. They're very well optimized, and you'll find that BigQuery's implementation is very mature. Often you can eliminate grouping this way, or filter out more of your data earlier in the process. Keep in mind that sometimes filtering data earlier, in derived tables or Common Table Expressions (CTEs/named WITH queries), can make a more deeply nested query perform better than trying to do everything in one flat layer.
Keep in mind that results for views and Common Table Expressions (CTEs/named WITH queries) aren't materialized during execution. If you use a CTE multiple times, it will be executed multiple times. If you join the same view multiple times, it will be executed multiple times. This was hard for members of our team who came from the world of materialized views (although it looks like something is in the works for that in the BQ world, since there's an unused materializedView property showing in the API).
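One hedged workaround, using BigQuery scripting with illustrative names, is to land an expensive intermediate result in a temp table once instead of referencing the same CTE or view several times:

-- Executed once and reused; a CTE referenced twice would run twice
CREATE TEMP TABLE recent_orders AS
SELECT customer_id, SUM(total) AS total_spend
FROM `my_project.my_dataset.orders`
WHERE order_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
GROUP BY customer_id;

SELECT * FROM recent_orders WHERE total_spend > 1000;
SELECT COUNT(*) FROM recent_orders;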
Know how the query cache works. Unlike some platforms, the cache only stores the output of the outermost query, not its component parts. Because of this, only an identical query against unmodified tables/views will use the cache—and it will typically only persist for 24 hours. Note that if you use non-deterministic functions like NOW() and a host of other things, the results are non-cacheable. See details under the Limitations and Exceptions sections of the docs.
Materialize your own copies of expensive tables. We do this a lot, and use scheduled queries and scripts (API and CLI) to normalize and save a native table copy of our data. This allows very efficient processing and fast responses from our client dashboards as well as our own reporting queries. It's a pain, but it works well.
Hopefully that will give you some ideas, but also feel free to post queries on SO in the future that you're having a hard time optimizing. Folks around here are pretty helpful when you let them know what your data looks like and what you've already tried.
Good luck!

When to use hints in an Oracle query [duplicate]

I have gone through some documentation on the net, and using hints is mostly discouraged. I still have doubts about this. Can hints really be useful in production, especially when the same query is used by hundreds of different customers?
Are hints only useful when we know the number of records present in the tables? I am using LEADING in my query, and it gives faster results when the data is very large, but performance is not that great when fewer records are fetched.
This answer by David is very good, but I would appreciate it if someone clarified this in more detail.
Most hints are a way of communicating our intent to the optimizer. For instance, the LEADING hint you mention means "join the tables in this order". Why is this necessary? Often it's because the optimal join order is not obvious, because the query is badly written or the database statistics are inaccurate.
So one use of hints such as LEADING is to figure out the best execution path, then to figure out why the database doesn't choose that plan without the hint. Does gathering fresh statistics solve the problem? Does rewriting the FROM clause solve the problem? If so, we can remove the hints and deploy the naked SQL.
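As a hedged illustration with invented tables, the diagnostic pattern is to pin the join order, confirm that the resulting plan is the one you want, and then work out why the optimizer doesn't pick it unaided:

-- Tell the optimizer to start with the small, heavily filtered table
SELECT /*+ LEADING(c o) USE_NL(o) */
       c.customer_id, o.order_id
FROM   customers c
JOIN   orders    o ON o.customer_id = c.customer_id
WHERE  c.region = 'WEST';

-- Then ask: do fresh statistics or a rewritten FROM clause make the hint unnecessary?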
Sometimes we cannot resolve this conundrum and have to keep the hints in production. However, this should be a rare exception. Oracle has had lots of very clever people working on the Cost-Based Optimizer for many years, so its decisions are usually better than ours.
But there are other hints we would not blink at seeing in production. APPEND is often crucial for tuning bulk inserts. DRIVING_SITE can be vital for tuning distributed queries.
Conversely, other hints are almost always abused. Yes, PARALLEL, I'm talking about you. Blindly adding /*+ parallel (t23, 16) */ will probably not make your query run sixteen times faster, and not infrequently will result in slower retrieval than single-threaded execution.
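For illustration (the table names are made up), the contrast looks something like this:

-- APPEND: direct-path insert for a bulk load, often a legitimate production hint
INSERT /*+ APPEND */ INTO sales_history
SELECT * FROM sales_staging;

-- PARALLEL: easy to abuse; measure before and after rather than assuming
-- a degree of 16 buys a 16x speedup
SELECT /*+ PARALLEL(s, 4) */ COUNT(*)
FROM sales_history s;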
So, in short, there is no universally applicable advice to when we should use hints. The key things are:
understand how the database works, and especially how the cost-based optimizer works;
understand what each hint does;
test hinted queries in a proper tuning environment with Production-equivalent data.
Obviously the best place to start is the Oracle documentation. However, if you feel like spending some money, Jonathan Lewis's book on the Cost-Based Optimizer is the best investment you could make.
I couldn't just rephrase it, so I will paste it here (it's a brief explanation of "When Not To Use Hints" that I had bookmarked):
In summary, don’t use hints when
What the hint does is poorly understood, which is of course not limited to the (ab)use of hints;
You have not looked at the root cause of bad SQL code and thus not yet tapped into the vast expertise and experience of your DBA in tuning the database;
Your statistics are out of date, and you can refresh the statistics more frequently or even fix the statistics to a representative state;
You do not intend to check the correctness of the hints in your statements on a regular basis, which means that, when statistics change, the hint may be woefully inadequate;
You have no intention of documenting the use of hints anyway.
Source link here.
I can summarize this as: the use of hints is not only a last resort, it also often reflects a lack of understanding of the root cause of the issue. The CBO (Cost-Based Optimizer) does an excellent job if you just ensure some basics for it. Those include:
1. Fresh statistics
1.1. Index statistics
1.2. Table statistics
1.3. Histograms
2. Correct JOIN conditions and INDEX utilization
3. Correct database settings
This article here is worth reading:
Top 10 Reasons for poor Oracle performance
Presented by none other than Mr. Donald Burleson.
Cheers
In general, hints should be used only in exceptional cases. I know the following situations where they make sense:
Workaround for Oracle bugs
Example: Once, for a SELECT statement, I got the error ORA-01795: maximum number of expressions in a list is 1000, although the query did not contain an IN expression at all.
The problem was that the queried table contained more than 1000 (sub-)partitions and Oracle applied a transformation to my query. Using the (undocumented) hint NO_EXPAND_TABLE solved the issue.
Data warehouse applications during staging
While staging, you can have significant changes to your data that the table/index statistics are not aware of, since statistics are gathered only once a week by default. If you know your data structure, hints can be useful because they are faster than manually running DBMS_STATS.GATHER_TABLE_STATS(...) all the time between your operations. On the other hand, you can run DBMS_STATS.GATHER_TABLE_STATS() even for single columns, which might be the better approach.
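For reference, gathering statistics on just the staging table, or even a single column, between steps looks roughly like this (the schema, table, and column names are placeholders):

BEGIN
  -- Whole staging table
  DBMS_STATS.GATHER_TABLE_STATS(ownname => 'STAGE', tabname => 'STG_ORDERS');

  -- Or only the column whose data distribution just changed
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname    => 'STAGE',
    tabname    => 'STG_ORDERS',
    method_opt => 'FOR COLUMNS load_date SIZE AUTO');
END;
/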
Online Application Upgrade Hints
From Oracle documentation:
The CHANGE_DUPKEY_ERROR_INDEX, IGNORE_ROW_ON_DUPKEY_INDEX, and RETRY_ON_ROW_CHANGE hints are unlike other hints in that they have a semantic effect. The general philosophy explained in "Hints" does not apply for these three hints.
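A minimal sketch of one of these semantic hints (the table and index names are invented, assuming a unique index dest_pk on dest.id):

-- Rows that would violate the unique index are silently skipped
-- instead of raising ORA-00001
INSERT /*+ IGNORE_ROW_ON_DUPKEY_INDEX(dest, dest_pk) */
INTO dest (id, payload)
SELECT id, payload FROM staging;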

Is there such a thing as a query being too big?

I don't have much experience with SQL. Most of the queries I have written have been very small. Whenever I see a very large query, I always kind of assume it needs to be optimized. But is this true? Or are there situations where really large queries are just what's needed?
By the way, when I say large queries I mean queries of 1000+ characters.
Yes, any statement, method, or even query can be "too big".
The problem is actually defining what "too big" really is.
If you can't sit down and figure out what the query does in a relatively short amount of time, it's probably best to break it up into smaller chunks.
I always like to look at things from a maintenance standpoint. If the query is hard to understand now, what if you have to debug something in it?
Just because you see a large query doesn't mean it needs to be changed or optimized, but if it's too complicated for its own good, then you might want to consider refactoring.
Just as in other languages, you can't determine the efficiency of a query based on a character count. Also, 1000 characters isn't what I would call "large", especially when you use good table/column names, aliases that make sense, etc.
If you're not comfortable enough with SQL to be able to eyeball the design merits of a particular query, run it through a profiler and examine the execution plan. That'll give you a good idea of the problems, if any, the code in question will suffer from.
My rule of thumb is this: write the best, tightest, simplest code you can, and optimize where needed - i.e., where you see a performance bottleneck or where (as frequently happens) you slap yourself in the head and say "D'OH!" at about three in the morning on vacation.
Summary: code well, and optimize where needed.
As Robert said, if you can't easily tell what the query is doing, it probably needs to be simplified.
If you are used to writing simple stuff, you may not realize how complex getting the information for a complex report might be. Yes, queries can get long and complicated and still perform well for what they are being asked to do. Often the techniques used to performance-tune something may make the code look more complicated to those less familiar with advanced querying techniques. What counts is how long it takes to execute and whether it returns the correct data, not how many characters it has.
When I see a complex query, my first thought is: does it return what the developer really intended to return (you'd be surprised at how often the answer is no)? Then I look to see if it could be performance tuned. Yes, there are many badly written long queries out there, but there are also as many or more that do what they are intended to do about as fast as it can be done without a major database redesign or faster hardware.
I'd suggest that it's not the characters that should measure the size/complexity of the query.
I'd boil it down to:
what's the goal of the query?
does it use set-based logic?
does it re-use any components?
does it JOIN improperly/poorly?
what are the performance implications?
maintainability concerns - is it written so that another developer can grok its intentions?
Where I work we've created stored procedures that exceed 1000 characters. I can't really say it was NECESSARY but sometimes haste wins out over efficiency (most notably when a quick fix is necessary for a client).
Having said that ... if given the time, I would attempt to make a query as small and efficient as it can get without it being overly confusing. I've used nested stored procedures and/or functions to make things a little clearer as well.
The number of characters does not mean that a query needs to be optimized - it is what you see within those characters that does.
Things like subqueries on top of subqueries are something I would review. I'd review JOINs as well, but it shouldn't take long, comparing against the ERD, to know if there's an unnecessary JOIN - the first thing I'd look at would be tables that are joined but not used in the output, which is fine if they are link/corollary/etc. tables.

Any SQL database: When is it better to fetch a whole table instead of querying for particular rows?

I have a table that contains maybe 10k to 100k rows and I need varying sets of up to 1 or 2 thousand rows, but often enough a lot less. I want these queries to be as fast as possible and I would like to know which approach is generally smarter:
Always query for exactly the rows I need with a WHERE clause that's different all the time.
Load the whole table into a cache in memory inside my app and search there, syncing the cache regularly
Always query the whole table (without WHERE clause), let the SQL server handle the cache (it's always the same query so it can cache the result) and filter the output as needed
I'd like to be agnostic of a specific DB engine for now.
With 10K to 100K rows, number 1 is the clear winner to me. If it were <1K I might say keep it cached in the application, but with this many rows, let the DB do what it was designed to do. With the proper indexes, number 1 is the best bet.
If you were pulling the same set of data over and over each time, then caching the results might be a better bet too, but when you are going to have a different WHERE all the time, it's best to let the DB take care of it.
Like I said though, just make sure you index well on all the appropriate fields.
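For example (the column names here are hypothetical), a composite index on the columns the varying WHERE clause filters on is what makes option 1 cheap:

-- Supports WHERE customer_id = ? AND status = ? without scanning the whole table
CREATE INDEX ix_orders_customer_status
    ON orders (customer_id, status);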
Seems to me that a system that was designed for rapid searching, slicing, and dicing of information is going to be a lot faster at it than the average developers' code. On the other hand, some factors that you don't mention include the location or potential location of the database server in relation to the application - returning large data sets over slower networks would certainly tip the scales in favor of the "grab it all and search locally" option. I think that, in the 'general' case, I'd recommend querying for just what you want, but that in special circumstances, other options may be better.
I firmly believe option 1 should be preferred in an initial situation.
When you encounter performance problems, you can look at how you could optimize it using caching. ("Premature optimization is the root of all evil", as Knuth famously put it.)
Also, remember that if you choose option 3, you'll be sending the complete table contents over the network as well. That also has an impact on performance.
In my experience it is best to query for what you want and let the database figure out the best way to do it. You can examine the query plan to see if you have any bottlenecks that could be helped by indexes as well.
First of all, let us dismiss #2. Searching tables is a database server's reason for existence, and it will almost certainly do a better job of it than any ad hoc search you cook up.
For #3, you just say "filter the output as needed" without saying where that filtering is being done. If it's done in the application code, then, as with #2, you have the same problem.
Databases were created specifically to handle this exact problem. They are very good at it. Let them do it.
The only reason to use anything other than option 1 is if the WHERE clause itself is huge (i.e. if your WHERE clause identifies each row individually, e.g. WHERE id = 3 or id = 4 or id = 32 or ...).
Is anything else changing your data? The point about letting the SQL engine optimally slice and dice is a good one. But it would be surprising if you were working with a database and do not have the possibility of "someone else" changing the data. If changes can be made elsewhere, you certainly want to re-query frequently.
Trust that the SQL server will do a better job of both caching and filtering than you can afford to do yourself (unless performance testing shows otherwise.)
Note that I said "afford to do" not just "do". You may very well be able to do it better but you are being paid (presumably) to provide functionality not caching.
Ask yourself this... Is spending time writing cache management code helping you fulfil your requirements document?
If you do this:
SELECT * FROM users;
MySQL first has to work out the full list of columns in the table before it can bring back the data you asked for.
doing
SELECT id, email, password FROM users;
MySQL can go straight to the data, since the fields are explicit.
About limits: it's always best to query only the rows you will need, no more and no less. More data means more time to move it.

Refactoring "extreme" SQL queries

I have a business user who tried his hand at writing his own SQL query for a report of project statistics (e.g. number of tasks, milestones, etc.). The query starts off declaring a temp table of 80+ columns. There are then almost 70 UPDATE statements to the temp table over almost 500 lines of code that each contain their own little set of business rules. It finishes with a SELECT * from the temp table.
Due to time constraints and 'other factors', this was rushed into production and now my team is stuck with supporting it. Performance is appalling, although thanks to some tidy up it's fairly easy to read and understand (although the code smell is nasty).
What are some key areas we should be looking at to make this faster and follow good practice?
First off, if this is not causing a business problem, then leave it alone. Wait until it becomes a problem, then fix everything.
When you do decide to fix it, check whether there is one statement causing most of your speed issues ... isolate and fix it.
If the speed issue is over all the statements, and you can combine it all into a single SELECT, this will probably save you time. I once converted a proc like this (not as many updates) to a SELECT and the time to run it went from over 3 minutes to under 3 seconds (no shit ... I couldn't believe it). By the way, don't attempt this if some of the data is coming from a linked server.
If you don't want to or can't do that for whatever reason, then you might want to adjust the existing proc. Here are some of the things I would look at:
If you are creating indexes on the temp table, wait until after your initial INSERT to populate it.
Adjust your initial INSERT to populate as many of the columns as possible. There are probably some updates you can eliminate by doing this.
Index the temp table before running your updates, but do not create indexes on any of the columns targeted by the update statements until after they're updated.
Group your updates if your table(s) and groupings allow for it. 70 updates is quite a few for only 80 columns, so it sounds like there may be an opportunity to do this (see the sketch below).
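A rough sketch of that ordering, with invented table and column names:

-- 1. Populate as many columns as possible in one initial insert, with no indexes yet
SELECT p.ProjectID,
       p.ProjectName,
       COUNT(t.TaskID) AS TaskCount,
       CAST(0 AS INT)  AS MilestoneCount,
       CAST(0 AS INT)  AS OpenIssues
INTO   #ProjectStats
FROM   dbo.Projects p
LEFT JOIN dbo.Tasks t ON t.ProjectID = p.ProjectID
GROUP BY p.ProjectID, p.ProjectName;

-- 2. Index the columns the later updates will join or filter on
CREATE INDEX IX_ProjectStats_ProjectID ON #ProjectStats (ProjectID);

-- 3. Group related updates so one pass over the table sets several columns
UPDATE ps
SET    ps.MilestoneCount = m.MilestoneCount,
       ps.OpenIssues     = m.OpenIssues
FROM   #ProjectStats ps
JOIN  (SELECT ProjectID,
              COUNT(*) AS MilestoneCount,
              SUM(CASE WHEN Status = 'Open' THEN 1 ELSE 0 END) AS OpenIssues
       FROM dbo.Milestones
       GROUP BY ProjectID) m ON m.ProjectID = ps.ProjectID;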
Good luck
First thing I would do is check to make sure there is an active index maintenance job being run periodically. If not, get all existing indexes rebuilt or if not possible at least get statistics updated.
Second thing I would do is set up a trace (as described here) and find out which statements are causing the highest number of reads.
Then I would run in SSMS with 'show actual execution plan' and tally the results with the trace. From this you should be able to work out whether there are missing indexes that could improve performance.
Just like any refactoring, make sure you have an automated way to verify your refactorings after each change (you can write this yourself using queries which check the development output against a known good baseline). That way, you are always matching the known good data. This will give you a high degree of confidence in the correctness of your approach when you enter the phase where you are deciding whether to switch over to your new version of the process and want to run side by side for a few iterations to ensure correctness.
I also like to log all the test batches and the run times of the processes within the batch, so I can tell if some particular process within the batch was adversely affected at some point in time. I can get average times for processes and see trends of improvement or spot potential problems. This also lets me identify the low-hanging fruit within the batch where I can make the most improvement.
"There are then almost 70 UPDATE statements to the temp table over almost 500 lines of code that each contain their own little set of business rules. It finishes with a SELECT * from the temp table."
Actually, this sounds like it can be followed and understood quite well: each update statement does one thing to the table with a specific purpose and set of business rules. I think that maintaining a procedure of 500 lines of code with one or a couple of select statements that do "everything", built with 15 or so joins and case statements scattered all over the place, is a lot harder to maintain, although it would perform better.
It's a bit of a dilemma with SQL that writing clear and concise code (using multiple updates, creating functions, etc.) always seems to have a big negative impact on performance. Trying to do everything at once, which is considered bad practice in other programming languages, seems to be the very core of set-oriented languages.
If this is a report-generating stored procedure, how often is it being run? If it only needs to run once a day and runs during the night, how much of an issue is the performance?
If it is an issue, I'd recommend being careful in your choice to rewrite it, because there is a chance that you could muck up your figures.
Also it sounds like the sort of thing that should be pulled out into an SSIS package building up a new permanent table with the results so it only has to be run once.
Hope this makes sense
One thing you could try is to replace the temp table with a table variable. There are times when this is faster and times when it is not, you will have to just try it and see.
Look at the 70 update statements. Is it possible to combine any of them? If the person writing it did not use CASE statements, it might be possible to do fewer statements.
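For example (the business rule here is invented), two single-column updates can often collapse into one pass with a CASE:

-- Before: two passes over the temp table
UPDATE #stats SET risk_level = 'High' WHERE open_tasks > 50;
UPDATE #stats SET risk_level = 'Low'  WHERE open_tasks <= 50;

-- After: one pass, one CASE
UPDATE #stats
SET risk_level = CASE WHEN open_tasks > 50 THEN 'High' ELSE 'Low' END;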
Other obvious things to look at - eliminate any cursors, change any subqueries to joins to tables or derived tables.
Rewrite, perhaps. One hardware solution would be to make sure your database temp table goes on a 'fast' drive, perhaps a solid state disk (SSD), or can be managed entirely in memory.
My guess is this 'solution' was developed by someone with a grasp of, and a dependency upon, spreadsheets; someone who may not be very savvy about 'normalized' databases: how to construct and populate tables to retain data for reporting purposes, something that BI (Business Intelligence) software can handle with sophistication while remaining adaptable.
You didn't say 'where' the update process is being run. Is it being run as a SQL script from a separate computer (desktop) against the server where the data is? That approach can create significant bottlenecks and overhead. If so, consider running the entire update process directly on the server as a local job, as a compiled stored procedure, bypassing the network and (multiple) cursor-management overhead. It could have a scheduled time to run and a controlled priority, completing in off-peak business data usage hours.
Evaluate how often 'commit' statements are really needed for the sequence of update statements...saving on a bunch of commit lines could notably improve the overall update time. There may be a couple of settings in the database client driver software which can make a notable difference.
Can the queries used for update conditions be factored out as static 'views' which in turn can be shared across multiple update statements? Views can keep frequently accessed data/query rows in memory. There may also be tuning to be done in determining how much update data can be pending before a commit is optimal.
It might be worth evaluating whether triggers could replace the batch update sequence. You don't say how many tables the data comes from; that might help with decision making. I don't know if you have the option of adding triggers to the database tables from which the data is gathered, but if so, adding a few triggers to a number of tables wouldn't really degrade overall system performance much and might save a big wad of time on that update process. You could try replacing the update statements one at a time with triggers and see if the results are the same as before: create a similar temp table, based on the same update process, then carefully test whether triggers feeding updates to the temp table could replace individual update statements. Perhaps you have a sort of 'data warehouse' application; if so, it might be useful to review how to set up a 'star' schema of tables to retain summarized business data for reporting.
Creating a comprehensive, cached 'view' that is refreshed via the queries once per day might be another approach to explore.
Well, since the only thing you've told us about this stored procedure is that it has an 80+ column temp table, the only thing I can recommend is to remove that table and rewrite the rest to remove the need for it.
You should get a tool that lets you see the explain plan of every query your app will run. It is the best bang for the buck on a SQL-heavy app for performance increases, provided you read and react to what the explain plan is telling you. If you are on Oracle, what we used to use was TOAD, by Quest I think. It was a great tool.
I would recommend looking at the tables involved, the end result, and starting from scratch to see if the query can be done in a more efficient manner. Keep the query to verify that the new one is working exactly the same as the old one, but try to forget all methods used to obtain the end result.
I would rewrite it from scratch.
You say that you understand what it is supposed to do, so it should not be that difficult. And I bet that the requirements for that piece of code will keep changing, so if you do not rewrite it now, you may end up maintaining some ugly monster.