Can anyone explain the situations in which we need to make use of temporary tables in stored procedures?
There are many cases where a complex join can really trip up the optimizer and make it do very expensive things. Sometimes the easiest way to calm the optimizer down is to break the complex query into smaller parts. You'll find a lot of misinformation out there about using a @table variable instead of a #temp table because table variables always live in memory - this is a myth, don't believe it.
You may also find this worthwhile if you have an outlier query that would really benefit from a different index that is not on the base table, and you are not permitted (or it may be detrimental) to add that index to the base table (it may be an alternate clustered index, for example). A way to get around that would be to put the data in a #temp table (it may be a limited subset of the base table, acting like a filtered index), create the alternate index on the #temp table, and run the join against the #temp table. This is especially true if the data filtered into the #temp table is going to be used multiple times.
There are also times when you need to make many updates against some data, but you don't want to update the base table multiple times. You may have multiple things you need to do against a variety of other data that can't be done in one query. It can be more efficient to put the affected data into a #temp table, perform your series of calculations / modifications, then update back to the base table once instead of n times. If you use a transaction here against the base tables you could be locking them from your users for an extended period of time.
Another example is if you are using linked servers and the join across servers turns out to be very expensive. Instead you can stuff the remote data into a local #temp table first, create indexes on it locally, and run the query locally.
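A sketch of that linked-server pattern, with a hypothetical linked server REMOTESRV and invented table and column names:

```sql
-- Pull only the rows we need from the remote server into a local #temp table...
SELECT OrderID, CustomerID, OrderDate
INTO #RecentOrders
FROM REMOTESRV.Sales.dbo.Orders
WHERE OrderDate >= DATEADD(MONTH, -1, GETDATE());

-- ...index it locally so the join can seek instead of paying for a remote join...
CREATE CLUSTERED INDEX IX_RecentOrders ON #RecentOrders (CustomerID);

-- ...then run the query entirely on the local server.
SELECT c.Name, o.OrderID, o.OrderDate
FROM dbo.Customers AS c
INNER JOIN #RecentOrders AS o ON o.CustomerID = c.CustomerID;

DROP TABLE #RecentOrders;
```

The same shape (filter into #temp, index it, join against it) also covers the alternate-index and multiple-reuse cases described above.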
Related
Does it take more time to create a table using select as statement or to just run the select statement? Is the time difference too large or can it be neglected?
For example between
create table a
as select * from b
where c = d;
and
select * from b
where c = d;
which one should run faster and can the time difference be neglected?
Creating the table will take more time. There is more overhead. If you look at the metadata for your database, you will find lots of tables or views that contain information about tables. In particular, the table names and column names need to be stored.
That said, the data processing effort is pretty similar. However, there might be overhead in storing the result set in permanent storage rather than in the data structures needed for a result set. In fact, the result set may never need to be stored "on disk" (i.e. permanently). But with a table creation, that is needed.
Depending on the database, the two queries might also be optimized differently. The SELECT query might be optimized to return the first row as fast as possible; the CREATE query might be optimized to return all rows as fast as possible. Also, the SELECT might just look faster if your database and interface start returning rows as soon as they appear.
I should point out that under most circumstances, the overhead might not really be noticeable. But, you can get other errors with the create table statement that you would not get with just the select. For instance, the table might already exist. Or duplicate column names might pose a problem (although some databases don't allow duplicate column names in result sets either).
I have been tasked with replacing a costly stored procedure which performs calculations across 10 - 15 tables, some of which contain many millions of rows. The plan is to pre-stage the many computations and store the results in separate tables to speed up reads.
Having quickly created these new tables and inserted all of the necessary pre-staged data as a test case, the execution time of getting the same results is vastly improved, as you would expect.
My question is, what is the best practice for keeping these new separate tables up to date?
A procedure which runs at a specific interval could do it, but there is a requirement for the data to be live.
A trigger on each table could do it, but that seems very costly, and could cause slow-downs everywhere else that uses these tables.
Are there other alternatives?
Have you considered Indexed Views for this? As long as you meet the criteria for creating Indexed Views (no self joins etc) it may well be a good solution.
The downside of Indexed Views is that when the data in the underlying tables changes (delete, update, insert), the indexed view has to be recalculated. This can slow down those operations in certain circumstances, so you have to be careful. I've put some links to documentation below:
https://www.brentozar.com/archive/2013/11/what-you-can-and-cant-do-with-indexed-views/
https://msdn.microsoft.com/en-GB/library/ms191432.aspx
https://technet.microsoft.com/en-GB/library/ms187864(v=sql.105).aspx
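A minimal sketch of what an indexed view looks like; dbo.Sales and its columns are invented for illustration:

```sql
-- The view must be schema-bound to be indexable, and grouped
-- indexed views must include COUNT_BIG(*).
CREATE VIEW dbo.vSalesTotals
WITH SCHEMABINDING
AS
SELECT ProductID,
       SUM(Amount)  AS TotalAmount,
       COUNT_BIG(*) AS RowCnt
FROM dbo.Sales
GROUP BY ProductID;
GO
-- The unique clustered index is what materializes the view:
-- after this, the totals are stored and kept live by the engine.
CREATE UNIQUE CLUSTERED INDEX IX_vSalesTotals
    ON dbo.vSalesTotals (ProductID);
```

Once the clustered index exists, the engine maintains the aggregates on every insert/update/delete against dbo.Sales, which is exactly where the write-side cost mentioned above comes from.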
what is the best practice for keeping these new separate tables up to date?
The answer is: it depends. Depends on what?
1. How frequently you will use those computed values.
2. What data latency is acceptable.
We have the same kind of reporting, where we store computed values in separate tables and use them in reports. In our case we run these stored procedures just before sending the reports out through SQL Server Agent.
Consider using an A/B table solution. Place a generic view over the _A version of the table (CREATE VIEW MY_TABLE AS SELECT * FROM MY_TABLE_A). Then rebuild the _B version, and switch the view to point at it (ALTER VIEW MY_TABLE AS SELECT * FROM MY_TABLE_B). It takes twice as much space, but it gives you the opportunity to rebuild your tables without downtime.
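A sketch of the switch step, assuming two copies MY_TABLE_A/MY_TABLE_B and a hypothetical rebuild query:

```sql
-- Rebuild the inactive copy while readers keep using the view.
TRUNCATE TABLE dbo.MY_TABLE_B;
INSERT INTO dbo.MY_TABLE_B
SELECT ProductID, SUM(Amount) AS TotalAmount   -- expensive pre-staging work
FROM dbo.Sales                                 -- (invented source table)
GROUP BY ProductID;
GO
-- Repointing the view is a quick metadata-only change, so there is no downtime.
ALTER VIEW dbo.MY_TABLE AS SELECT * FROM dbo.MY_TABLE_B;
```

The next refresh cycle does the same thing in reverse, rebuilding _A and switching the view back.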
I've got a table that can contain a variety of different types and fields, and I've got a table definitions table that tells me which field contains which data. I need to select things from that table, so currently I build up a dynamic select statement based on what's in that table definitions table and select it all into a temp table, then work from that.
The actual amount of data I'm selecting is quite big, over 5 million records. I'm wondering if a temp table is really the best way to go around doing this.
Are there other more efficient options of doing what I need to do?
If your data is static, cache the results of the most popular queries, preferably on the application server, or do multidimensional modeling (cubes). That is really the "more efficient option".
Temp tables, table variables, table data types... in any case you will be using tempdb, so if you want to optimize your queries, try to optimize tempdb storage (after checking I/O statistics). You can also create indexes on your temp tables.
You can use table variables to achieve this.
If you are using the same structure in multiple queries, you can go for user-defined table types as well.
http://technet.microsoft.com/en-us/library/ms188927.aspx
http://technet.microsoft.com/en-us/library/bb522526(v=sql.105).aspx
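A short sketch of a user-defined table type, with illustrative names, showing how the same structure can be declared once and reused:

```sql
-- Define the shape once...
CREATE TYPE dbo.FieldList AS TABLE
(
    FieldName  SYSNAME        NOT NULL,
    FieldValue NVARCHAR(400)  NULL
);
GO
-- ...then declare table variables of that type wherever it's needed.
DECLARE @Fields dbo.FieldList;

INSERT INTO @Fields (FieldName, FieldValue)
VALUES (N'Color', N'Red'),
       (N'Size',  N'L');

SELECT FieldName, FieldValue FROM @Fields;
```

Table types can also be passed to stored procedures as table-valued parameters, which is handy when several procedures share the same working structure.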
For my company I am redesigning some stored procedures. The original procedures are using lots of permanent tables which are filled during the execution of procedure and at the end, the values are deleted. The number of rows can extend from 100 to 50,000 rows for calculation of aggregations.
My question is, will there be severe performance issues if I replace those tables with temp tables ? Is it feasible to use temp tables ?
It depends on how often you're using them, how long the processing takes, and whether you are concurrently reading data from the tables while writing.
If you use a temp table, it won't be sitting around consuming indexing and caching resources while it's not in use, so you should save a slight bit of resources there. However, you will incur the overhead of creating and destroying the temp tables.
I would re-examine how your queries function in the procedures and consider employing in-procedure cursor operations instead of loading everything into tables and deleting it afterwards.
However, databases are for storing and retrieving information. I would shy away from using permanent tables for routine temp work and stick with temp tables.
Overall performance shouldn't suffer with the use case you described in your question.
Hope this helps,
Jeffrey Kevin Pry
Yes, it's certainly feasible. You may want to check whether the permanent tables have any indexes on them to speed up joins and so on.
I agree with Jeffrey. It always depends.
Since you're using Sql Server 2008 you might have a look at table variables.
They should be lighter than TEMP tables.
I define a User Defined Function which returns a table variable like this:
CREATE FUNCTION dbo.ufd_GetUsers ( @UserCode INT )
RETURNS @UsersTemp TABLE
(
    UserCode INT NOT NULL,
    RoleCode INT NOT NULL
)
AS
BEGIN
    INSERT @UsersTemp
    SELECT
        dbo.UsersRoles.Code, Roles.Code
    FROM
        dbo.UsersRoles
    INNER JOIN
        dbo.UsersRolesRelations ON dbo.UsersRoles.Code = dbo.UsersRolesRelations.UserCode
    INNER JOIN
        dbo.UsersRoles Roles ON dbo.UsersRolesRelations.RoleCode = Roles.Code
    WHERE dbo.UsersRoles.Code = @UserCode

    INSERT @UsersTemp VALUES(@UserCode, @UserCode)

    RETURN
END
A big question is: can more than one person run one of these stored procedures at a time? I regularly see these kinds of tables carried over from old single-user databases (or from programmers who couldn't do subqueries or much of anything beyond SELECT * FROM). What happens if more than one user tries to run the same procedure? What happens if it crashes midway through - does the table get cleaned up? With temp tables or table variables you have the ability to properly scope the table to just the current connection.
Definitely use a temporary table, especially since you've alluded to the fact that its purpose is to assist with calculations and aggregates. If you used a table inside one of your database's schemas all that work is going to be logged - written, backed up, and so on. Using a temporary table eliminates that overhead for data that in the end you probably don't care about.
You might actually save some time from the fact that you can drop the temp tables at the end instead of deleting rows (you said you have multiple users, so you have to delete rather than truncate). Deleting is a logged operation and can add considerable time to the process. If the permanent tables are indexed, then create the temp tables and index them as well. I would bet you would see an increase in performance, unless your tempdb is close to running out of space.
Table variables also might work, but they can't be indexed and they are generally only faster for smaller datasets. So you might try a combination: temp tables for the things that will be large enough to benefit from indexing, and table variables for the smaller items.
An advantage of using temp tables and table variables is that you guarantee that one user's process won't interfere with another user's process. You say they currently have a way to identify which records belong to which process, but all it takes is one bug being introduced to break that when using permanent tables. Permanent tables for temporary processing are a very risky choice. Temp tables and table variables can never see data from someone else's process and thus are a far safer choice.
Table variables are normally the way to go.
SQL2K and below can have significant performance bottlenecks if there are many temp tables being manipulated - the issue is the blocking DDL on the system tables.
Sql2005 is better, but table vars avoid the whole issue by not using those system tables at all, so can perform without inter-user locking issues (except those involved with the source data).
The issue then is that table vars only persist within scope, so if there is genuinely a large amount of data that needs to be processed repeatedly and persisted over a (relatively) long duration, then 'static' work tables may actually be faster - but they do need a user key of some sort and regular cleaning. A last resort, really.
Sometimes we can write a query with both derived table and temporary table. my question is that which one is better? why?
Derived table is a logical construct.
It may be stored in tempdb, built at runtime by re-evaluating the underlying statement each time it is accessed, or even optimized away entirely.
Temporary table is a physical construct. It is a table in tempdb that is created and populated with the values.
Which one is better depends on the query they are used in, the statement that is used to derive a table, and many other factors.
For instance, CTE (common table expressions) in SQL Server can (and most probably will) be reevaluated each time they are used. This query:
WITH q (uuid) AS
(
SELECT NEWID()
)
SELECT *
FROM q
UNION ALL
SELECT *
FROM q
will most probably yield two different NEWID()'s.
In this case, a temporary table should be used since it guarantees that its values persist.
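For example, materializing the result into a #temp table pins the value down, because both SELECTs then read the same stored row:

```sql
-- NEWID() is evaluated once, at population time...
SELECT NEWID() AS uuid
INTO #q;

-- ...so both branches return the identical stored value.
SELECT * FROM #q
UNION ALL
SELECT * FROM #q;

DROP TABLE #q;
```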
On the other hand, this query:
SELECT *
FROM (
SELECT *, ROW_NUMBER() OVER (ORDER BY id) AS rn
FROM master
) q
WHERE rn BETWEEN 80 AND 100
is better with a derived table, because using a temporary table will require fetching all values from master, while this solution will just scan the first 100 records using the index on id.
It depends on the circumstances.
Advantages of derived tables:
A derived table is part of a larger, single query, and will be optimized in the context of the rest of the query. This can be an advantage, if the query optimization helps performance (it usually does, with some exceptions). Example: if you populate a temp table, then consume the results in a second query, you are in effect tying the database engine to one execution method (run the first query in its entirety, save the whole result, run the second query) where with a derived table the optimizer might be able to find a faster execution method or access path.
A derived table only "exists" in terms of the query execution plan - it's purely a logical construct. There really is no table.
Advantages of temp tables
The table "exists" - that is, it's materialized as a table, at least in memory, which contains the result set and can be reused.
In some cases, performance can be improved or blocking reduced when you have to perform some elaborate transformation on the data - for example, if you want to fetch a 'snapshot' set of rows out of a base table that is busy, and then do some complicated calculation on that set, there can be less contention if you get the rows out of the base table and unlock it as quickly as possible, then do the work independently. In some cases the overhead of a real temp table is small relative to the advantage in concurrency.
I want to add an anecdote here, as it leads me to advise the opposite of the accepted answer. I agree with the thinking presented in the accepted answer, but it is mostly theoretical. My experience has led me to recommend temp tables over derived tables, common table expressions, and table-valued functions. We used derived tables and common table expressions extensively, with much success, based on thoughts consistent with the accepted answer - until we started dealing with larger result sets and/or more complex queries. Then we found that the optimizer did not optimize well with the derived table or CTE.
I looked at an example today that ran for 10:15. I inserted the results from the derived table into a temp table and joined the temp table in the main query and the total time dropped to 0:03. Usually when we see a big performance problem we can quickly address it this way. For this reason I recommend temp tables unless your query is relatively simple and you are certain it will not be processing large data sets.
The big difference is that you can put constraints, including a primary key, on a temporary table. For big result sets (I mean millions of records) you can sometimes get better performance with a temporary table. I have a key query that needs 5 joins (each join happens to be similar). Performance was OK with 2 joins, but on the third, performance went bad and the query plan went crazy. Even with hints I could not correct the query plan. I tried restructuring the joins as derived tables and had the same performance issues. With temporary tables I can create a primary key (and sort on the PK when I populate the table). Once SQL could join the 5 tables and use the PK, performance went from minutes to seconds. I wish SQL supported constraints on derived tables and CTEs (even if only a PK).
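A sketch of that approach, with invented table and column names:

```sql
-- A derived table or CTE can't carry a constraint; a #temp table can.
CREATE TABLE #Staged
(
    ID     INT            NOT NULL PRIMARY KEY,  -- the PK the optimizer can use
    Amount DECIMAL(18, 2) NOT NULL
);

-- Populate in PK order, as described above.
INSERT INTO #Staged (ID, Amount)
SELECT ID, Amount
FROM dbo.BigSource
ORDER BY ID;

-- Subsequent joins can now seek on the primary key instead of
-- re-deriving and scanning the intermediate result each time.
SELECT s.ID, s.Amount, d.SomeColumn
FROM #Staged AS s
INNER JOIN dbo.Detail AS d ON d.ID = s.ID;

DROP TABLE #Staged;
```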