That is, using temporary tables with some initial unique data and then populating them one or several fields at a time. Sometimes it makes code seem more readable, but it also leads to procedural-style thinking, and it's slower than using derived tables or other methods. Is it discouraged in industry?
It would be a bad practice if all set-based operations were a) implemented and b) implemented efficiently in all engines.
However, for some tasks (like emulating LAG and LEAD in SQL Server, long insert chains with cascading auto-generated ids across several tables, etc.), temp tables or table variables are a nice solution.
You should also note that temporary tables are very often created and dropped by the engine itself, for operations involving Using temporary in MySQL, spools in SQL Server, etc.
So each time you create a temp table you should ask yourself a question:
Do I create a temp table because I don't know a set-based way, or because I know a set-based way but the server (or optimizer) does not?
If the answer is "I know but the optimizer does not", then create the table. The optimizer would do the same if it could.
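For instance, here is a rough sketch of the LAG emulation mentioned above for SQL Server versions that lack the window function: load ordered rows into a temp table with an IDENTITY column, then self-join on the offset. The table and column names (dbo.Trades, TradeDate, Price) are just placeholders.

CREATE TABLE #ordered
(
    rn        INT IDENTITY(1,1) PRIMARY KEY,
    TradeDate DATE,
    Price     MONEY
)

INSERT INTO #ordered (TradeDate, Price)
SELECT TradeDate, Price
FROM dbo.Trades
ORDER BY TradeDate   -- the IDENTITY values follow this ORDER BY

-- prev.Price is what LAG(Price) OVER (ORDER BY TradeDate) would return
SELECT cur.TradeDate, cur.Price, prev.Price AS PrevPrice
FROM #ordered AS cur
LEFT JOIN #ordered AS prev ON prev.rn = cur.rn - 1

DROP TABLE #ordered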
I have been tasked with replacing a costly stored procedure which performs calculations across 10 - 15 tables, some of which contain many millions of rows. The plan is to pre-stage the many computations and store the results in separate tables for speeding reading.
Having quickly created these new tables and inserted all of the necessary pre-staged data as a test case, the execution time of getting the same results is vastly improved, as you would expect.
My question is, what is the best practice for keeping these new separate tables up to date?
A procedure which runs at a specific interval could do it, but there is a requirement for the data to be live.
A trigger on each table could do it, but that seems very costly, and could cause slow-downs everywhere else that uses these tables.
Are there other alternatives?
Have you considered Indexed Views for this? As long as you meet the criteria for creating Indexed Views (no self joins etc) it may well be a good solution.
The downside of Indexed Views is that when the data in the underlying tables is changed (delete, update, insert), the indexed view has to be recalculated. This can slow down these types of operations in certain circumstances, so you have to be careful. I've put some links to documentation below:
https://www.brentozar.com/archive/2013/11/what-you-can-and-cant-do-with-indexed-views/
https://msdn.microsoft.com/en-GB/library/ms191432.aspx
https://technet.microsoft.com/en-GB/library/ms187864(v=sql.105).aspx
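To give an idea of what that looks like, here is a minimal sketch of an Indexed View, assuming hypothetical dbo.Orders and dbo.OrderLines tables; the usual restrictions apply (SCHEMABINDING, two-part names, COUNT_BIG(*) when aggregating, non-nullable expressions in SUM, no outer or self joins, and so on).

CREATE VIEW dbo.vOrderTotals
WITH SCHEMABINDING
AS
SELECT o.CustomerId,
       COUNT_BIG(*) AS LineCount,
       SUM(ol.Quantity * ol.UnitPrice) AS OrderTotal
FROM dbo.Orders o
INNER JOIN dbo.OrderLines ol ON ol.OrderId = o.OrderId
GROUP BY o.CustomerId
GO

-- The unique clustered index is what materializes the view; from then on the
-- engine maintains it automatically as the base tables change
CREATE UNIQUE CLUSTERED INDEX IX_vOrderTotals ON dbo.vOrderTotals (CustomerId)
GO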
what is the best practice for keeping these new separate tables up to date?
The answer is: it depends. Depends on what?
1. How frequently you will use those computed values
2. What the acceptable data latency is
We too have the same kind of reporting, where we store computed values in separate tables and use them in reports. In our case we run these SPs before sending the reports through SQL Server Agent.
Consider using an A/B table solution. Place a generic view over the _A version of the table (CREATE VIEW MY_TABLE AS SELECT * FROM MY_TABLE_A). Then you rebuild the _B version and switch the view over to it (ALTER VIEW MY_TABLE AS SELECT * FROM MY_TABLE_B). It takes twice as much space for processing, but it gives you the opportunity to rebuild your tables without downtime.
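Something along these lines, where MY_TABLE_A, MY_TABLE_B and the columns involved are placeholders:

-- Initially the view points at the A copy
CREATE VIEW dbo.MY_TABLE AS SELECT * FROM dbo.MY_TABLE_A
GO

-- Rebuild the B copy in the background while readers keep using the view
TRUNCATE TABLE dbo.MY_TABLE_B
INSERT INTO dbo.MY_TABLE_B (CustomerId, OrderTotal)
SELECT CustomerId, SUM(Amount)
FROM dbo.Orders
GROUP BY CustomerId
GO

-- Repoint the view; readers now see the fresh data with no downtime
ALTER VIEW dbo.MY_TABLE AS SELECT * FROM dbo.MY_TABLE_B
GO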
I have one select in my stored procedure that returns 4000+ rows. I was going to put this in a temp table to work off the data later in the procedure.
I also have various other selects that only return 100-300 rows. I was going to make these table variables, again to work off later in the procedure.
Is it ok to use temp tables and table variables in the same procedure, or will this cause any performance issues?
Yes it is ok.
As for programming practice, I would prefer to see one type or the other (and I lean toward table variables) if I'm reading a stored procedure. However, if you have a good reason for using one or the other, such as needing an index on a temp table or using it for a SELECT INTO, then go ahead.
This is where you need to look for a full set of options sommarskog.se - share_data
Being able to add various indexes to temp tables is a particular reason I'll sometimes choose temporary tables.
To avoid hitting temp db continuously, and if indexes are not required, then I'll use table variables.
Quite often now I use lots of CTEs that work together and avoid using any sort of materialized tables.
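When I do need materialized tables, mixing the two in one procedure might look roughly like this (all object names here are invented): the larger working set goes into a temp table that gets indexed, and the small lookup set into a table variable.

CREATE PROCEDURE dbo.usp_SummariseOrders
AS
BEGIN
    -- Larger working set: a temp table, so it can be indexed after loading
    CREATE TABLE #orders
    (
        OrderId    INT PRIMARY KEY,
        CustomerId INT,
        Amount     MONEY
    )

    INSERT INTO #orders (OrderId, CustomerId, Amount)
    SELECT OrderId, CustomerId, Amount FROM dbo.Orders

    CREATE INDEX IX_orders_customer ON #orders (CustomerId)

    -- Small lookup set: a table variable is enough
    DECLARE @vipCustomers TABLE (CustomerId INT PRIMARY KEY)

    INSERT INTO @vipCustomers (CustomerId)
    SELECT CustomerId FROM dbo.Customers WHERE IsVip = 1

    SELECT o.CustomerId, SUM(o.Amount) AS Total
    FROM #orders o
    INNER JOIN @vipCustomers v ON v.CustomerId = o.CustomerId
    GROUP BY o.CustomerId
END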
Classic answer - "it depends!"
I think there are many factors here that we don't know, such as your company's resources, your time-constraints, etc.
Generally speaking, it is fine to use temp tables for this purpose. And 100-300 rows (mentioned in the selects) - that's peanuts. No worries.
For my company I am redesigning some stored procedures. The original procedures use lots of permanent tables which are filled during the execution of the procedure, and at the end the values are deleted. The number of rows can range from 100 to 50,000 for the calculation of aggregations.
My question is, will there be severe performance issues if I replace those tables with temp tables? Is it feasible to use temp tables?
It depends on how often you're using them, how long the processing takes, and whether you are concurrently accessing data from the tables while writing.
If you use a temp table, it won't be sitting around waiting for indexing and caching while it's not in use. So it should save an ever so slight bit of resources there. However, you will incur overhead with the temp tables (i.e. creating and destroying).
I would re-examine how your queries function in the procedures and consider employing more in-procedure CURSOR operations instead of loading everything into tables and deleting them.
However, databases are for storing information and retrieving information. I would shy away from using permanent tables for routine temp work and stick with the temp tables.
With the use case you specified in your question, there shouldn't be any effect on overall performance.
Hope this helps,
Jeffrey Kevin Pry
Yes, it's certainly feasible. You may want to check whether the permanent tables have any indexing on them to speed up joins and so on.
I agree with Jeffrey. It always depends.
Since you're using SQL Server 2008 you might have a look at table variables.
They should be lighter than TEMP tables.
I define a User Defined Function which returns a table variable like this:
CREATE FUNCTION dbo.ufd_GetUsers ( @UserCode INT )
RETURNS @UsersTemp TABLE
(
    UserCode INT NOT NULL,
    RoleCode INT NOT NULL
)
AS
BEGIN
    -- Collect the roles linked to the given user code
    INSERT @UsersTemp
    SELECT
        dbo.UsersRoles.Code, Roles.Code
    FROM
        dbo.UsersRoles
    INNER JOIN
        dbo.UsersRolesRelations ON dbo.UsersRoles.Code = dbo.UsersRolesRelations.UserCode
    INNER JOIN
        dbo.UsersRoles Roles ON dbo.UsersRolesRelations.RoleCode = Roles.Code
    WHERE dbo.UsersRoles.Code = @UserCode

    -- Also include the user code itself as a role
    INSERT @UsersTemp VALUES(@UserCode, @UserCode)

    RETURN
END
A big question is, can more than one person run one of these stored procedures at a time? I regularly see this kind of table carried over from old single-user databases (or from programmers who couldn't do subqueries or much of anything beyond SELECT * FROM). What happens if more than one user tries to run the same procedure? What happens if it crashes midway through - does the table get cleaned up? With temp tables or table variables you have the ability to properly scope the table to just the current connection.
Definitely use a temporary table, especially since you've alluded to the fact that its purpose is to assist with calculations and aggregates. If you used a table inside one of your database's schemas all that work is going to be logged - written, backed up, and so on. Using a temporary table eliminates that overhead for data that in the end you probably don't care about.
You actually might save some time from the fact that you can drop the temp tables at the end instead of deleting rows (you said you have multiple users, so you have to delete rather than truncate). Deleting is a logged operation and can add considerable time to the process. If the permanent tables are indexed, then create the temp tables and index them as well. I would bet you would see an increase in performance unless your tempdb is close to running out of space.
Table variables also might work, but they can't be indexed and they are generally only faster for smaller datasets. So you might try a combination of temp tables for the things that will be large enough to benefit from indexing and table variables for the smaller items.
An advantage of using temp tables and table variables is that you guarantee that one user's process won't interfere with another user's process. You say they currently have a way to identify which records belong to which process, but all it takes is one bug being introduced to break that when using permanent tables. Permanent tables for temporary processing are a very risky choice. Temp tables and table variables can never see the data from someone else's process and thus are far safer as a choice.
Table variables are normally the way to go.
SQL2K and below can have significant performance bottlenecks if there are many temp tables being manipulated - the issue is the blocking DDL on the system tables.
SQL 2005 is better, but table vars avoid the whole issue by not using those system tables at all, so they can perform without inter-user locking issues (except those involved with the source data).
The issue is then that table vars only persist within scope, so if there is genuinely a large amount of data that needs to be processed repeatedly and needs to be persisted over a (relatively) long duration, then 'static' work tables may actually be faster - but they do need a user key of some sort and regular cleaning. A last resort, really.
In SQL Server stored procedures, when should you use temporary tables and when should you use cursors? Which is the best option performance-wise?
If at all possible, avoid cursors like the plague. SQL Server is set-based - anything you need to do in an RBAR (row-by-agonizing-row) fashion will be slow and sluggish, and goes against the basic principles of how SQL works.
Your question is very vague - based on that information, we cannot really tell what you're trying to do. But the main recommendation remains: whenever possible (and it's possible in the vast majority of cases), use set-based operations - SELECT, UPDATE, INSERT and joins - don't force your procedural thinking onto SQL Server - that's not the best way to go.
So if you can use set-based operations to fill and use your temporary tables, I would prefer that method over cursors every time.
Cursors work row-by-row and are extremely poor performers. They can in almost all cases be replaced by better set-based code (not normally temp tables though)
Temp tables can be fine or bad depending on the data amount and what you are doing with them. They are not generally a replacement for a cursor.
Suggest you read this:
http://wiki.lessthandot.com/index.php/Cursors_and_How_to_Avoid_Them
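As an illustration of the difference (with invented table and column names), compare the row-by-row cursor with the single set-based statement that does the same work:

-- Cursor version: row-by-agonizing-row
DECLARE @lineId INT, @qty INT, @price MONEY
DECLARE line_cursor CURSOR FOR
    SELECT OrderLineId, Quantity, UnitPrice FROM dbo.OrderLines
OPEN line_cursor
FETCH NEXT FROM line_cursor INTO @lineId, @qty, @price
WHILE @@FETCH_STATUS = 0
BEGIN
    UPDATE dbo.OrderLines SET LineTotal = @qty * @price WHERE OrderLineId = @lineId
    FETCH NEXT FROM line_cursor INTO @lineId, @qty, @price
END
CLOSE line_cursor
DEALLOCATE line_cursor

-- Set-based version: one statement, same result, far faster
UPDATE dbo.OrderLines SET LineTotal = Quantity * UnitPrice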
I believe SARAVAN originally made the comparison between cursors and temp tables because many times you are confronted with a situation where a temp table with an identity column and a @counter variable can be used to scroll/navigate through a data set much like you would with a cursor.
In my experience, using the temp table (or table variable) scenario can help me get the job done 95% of the time and is faster than the typically slow cursor.
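For reference, that pattern looks roughly like this (the table and column names are made up); it behaves like a cursor but usually performs better:

CREATE TABLE #work
(
    RowNum     INT IDENTITY(1,1) PRIMARY KEY,
    CustomerId INT
)

INSERT INTO #work (CustomerId)
SELECT CustomerId FROM dbo.Customers

DECLARE @counter INT = 1, @max INT, @customerId INT
SELECT @max = MAX(RowNum) FROM #work

WHILE @counter <= @max
BEGIN
    SELECT @customerId = CustomerId FROM #work WHERE RowNum = @counter

    -- per-row work with @customerId goes here

    SET @counter = @counter + 1
END

DROP TABLE #work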
I am creating a table to summarize data that is gathered from about 8 or so queries that have very light logic/WHERE clauses and all select against different tables.
I was wondering what the best option would be to fetch the summarized data:
One query with multiple JOINS to gather all relevant information
A stored proc that encapsulates the logic and maybe executes the 8 queries and does the "joining" in some other way? This seems more modular and maintainable to me...but I'm not sure.
I am using SQL Server 2008 for this. Any suggestions?
If you can, then use the usual SQL methods; databases are optimized to run them. This "joining in some other way" would probably require the use of a cursor, which slows everything down. Just let the database do its job. If you need more performance, then you should examine the execution plan and do what has to be done there (e.g. adding indexes).
Databases are pretty good at figuring out the optimal way of executing SQL. It is what they are designed to do. Using stored procedures to load the data in chunks and combining it yourself will be more complex to write, and likely to be less efficient than letting the database just do it for you.
If you are concerned about reusing a complex query in multiple places, consider creating a view of it instead.
Depending on the size of the tables, joining 8 of them could be pretty hairy. I would try it that way first - as others have said, the db is pretty good at figuring this stuff out. If the performance is not as good as you would like, I would try a stored proc which creates a table variable (or a temp table) and inserts the data from each of the 8 tables separately. Then you can return the contents of the table variable to your app.
This method also makes it a little easier to add the 9th, 10th, etc tables in the future. And it gives you an easy way to do any processing you may need on the summarized data before returning it to your app.
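A loose sketch of that approach, with dbo.Sales, dbo.Returns and the columns standing in for your real tables:

CREATE PROCEDURE dbo.usp_GetSummary
AS
BEGIN
    DECLARE @summary TABLE
    (
        Source     VARCHAR(20),
        CustomerId INT,
        Amount     MONEY
    )

    -- One lightweight insert per source table; a 9th or 10th table is just
    -- another INSERT ... SELECT
    INSERT INTO @summary (Source, CustomerId, Amount)
    SELECT 'Sales', CustomerId, Amount FROM dbo.Sales

    INSERT INTO @summary (Source, CustomerId, Amount)
    SELECT 'Returns', CustomerId, -Amount FROM dbo.Returns

    -- Any extra processing of the summarized data before returning it to the app
    SELECT CustomerId, SUM(Amount) AS NetAmount
    FROM @summary
    GROUP BY CustomerId
END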