is sharing temp tables across procedures considered bad form? - sql

I have 2 procedures. One that builds a temp table and another (or several others) that use a temp table that's created in the first proc. Is this considered bad form? I'm walking into an existing codebase here and it uses this pattern a lot. It certainly bothers me but I can't say that it's explicitly a bad idea. I just find it to be an annoying pattern -- something smells rotten but I can't say what. I'd prefer to build the table locally and then fill it with an exec. But that requires procs that only return one table, which is unusual in the existing codebase.
Do the gurus of SQL shy away from this sort of temp table sharing? If so, why? I'm trying to form a considered opinion about this and would like some input.
Would a view be a viable alternative?
What about performance? Would it be better to build a #table locally or build a #table in the "fill" proc?
There's a good discussion of all methods here: http://www.sommarskog.se/share_data.html

As a programming paradigm it is ugly because it involves passing out-of-band parameters ('context parameters') that are not explicitly called out in the procedure signatures. This creates hidden dependencies and leads to spaghetti code.
But in the specific context of SQL there is simply no alternative: SQL works with data sets, and you cannot pass those data sets back and forth as procedure parameters. You have a few alternatives:
Pass through the client. Too obviously not a real option.
XML or delimited strings to represent result sets. Not a serious option by any stretch; they may give good 'programming' semantics (i.e. Law of Demeter conformance), but they fall apart badly when performance comes into play.
Shared permanent tables (whether in tempdb or the application database), as Raj suggests below. You lose the automated maintenance of #temp tables (cleanup when the reference count drops to 0), you have to be prepared for creation race conditions, and they can grow large for no reason (they are no longer partitioned by session into separate rowsets the way #temp tables are).
@table variables. They're scoped to the declaration context (i.e. the procedure) and cannot be passed back and forth between procedures. I also ran into some nasty problems with them under memory pressure.

The biggest problem with sharing temp tables is that it introduces external dependencies into a procedure that may not be apparent at first glance. Say you have some procedure p1 that calls a procedure p2, and a temp table #t1 is used to pass information between the two. If you want to run p2 in isolation to see what it does, you have to create a harness that defines #t1 before you can run it. Using temp tables this way in T-SQL is the equivalent of using global variables in other languages -- not recommended, but sometimes unavoidable.
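To make the hidden dependency concrete, here is a minimal sketch of the pattern and the harness it forces on you (procedure and table names are hypothetical):

-- p2 depends on #t1 existing, but nothing in its signature says so.
CREATE PROCEDURE p2
AS
BEGIN
    SELECT id, val FROM #t1;   -- hidden dependency on the caller's temp table
END
GO

CREATE PROCEDURE p1
AS
BEGIN
    CREATE TABLE #t1 (id int PRIMARY KEY, val varchar(10));
    INSERT INTO #t1 VALUES (1, 'a');
    EXEC p2;   -- works: temp tables are visible to nested scopes
END
GO

-- To test p2 in isolation, you must first build the harness by hand:
CREATE TABLE #t1 (id int PRIMARY KEY, val varchar(10));
INSERT INTO #t1 VALUES (1, 'a');
EXEC p2;
DROP TABLE #t1;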
SQL Server 2008 now has table-valued parameters but Microsoft chose to make them read-only in this release. Still, it means you don't have to use temp tables in some scenarios where you had to before.
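For comparison, a minimal table-valued parameter sketch (the type, table, and column names here are made up for illustration); the dependency is now explicit in the procedure signature, and READONLY is required in SQL Server 2008:

CREATE TYPE dbo.IdList AS TABLE (id int NOT NULL PRIMARY KEY);
GO

CREATE PROCEDURE dbo.GetOrders
    @ids dbo.IdList READONLY   -- read-only in SQL Server 2008
AS
BEGIN
    SELECT o.*
    FROM dbo.Orders o
    INNER JOIN @ids i ON o.OrderId = i.id;
END
GO

-- The caller fills a variable of the type and passes it explicitly:
DECLARE @ids dbo.IdList;
INSERT INTO @ids VALUES (1), (2), (3);
EXEC dbo.GetOrders @ids = @ids;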
My advice is, use them if you have to, but document their use thoroughly. If you have some proc that depends on a temp table to run, call this out in a comment.

Sometimes this is the only way to go. If you need to do this, then DOCUMENT, DOCUMENT, DOCUMENT the facts. Here is one way to make it clear: put the table definition as a comment in the parameter section...
CREATE PROCEDURE xyz
(
 @param1 int --REQUIRED, what it does
,@param2 char(1) --OPTIONAL, what it does
,@param3 varchar(25) --OPTIONAL, what it does
--this temp table is required and must be created in the calling procedure
--#TempXyz (RowID int not null primary key
-- ,DataValue varchar(10) not null
-- ,DateValue datetime null
-- )
)
Also document in the calling procedure where the temp table is created....
--** THIS TEMP TABLE IS PASSED BETWEEN STORED PROCEDURES **
--** ALL CHANGES MUST TAKE THIS INTO CONSIDERATION!!! **
CREATE TABLE #TempXyz
(RowID int not null primary key
,DataValue varchar(10) not null
,DateValue datetime null
)

I've encountered this same exact problem. What I can say from personal experience:
If there's a way to avoid using a #temp table across multiple procedures, use it. #temp tables are easy to lose track of and can easily grow tempdb if you're not careful.
In some cases, this method is unavoidable (my classic example is certain reporting functionality that builds data differently based on report configuration). If managed carefully, I believe that in these situations it is acceptable.

Sharing temp tables is not a bad idea in itself, but we need to ensure that the two procs do not manipulate the same data at the same time, which could lead to dirty read/write scenarios. Having a single, centralized table that the procs work on also helps keep the data in sync.
Regarding performance: it is true that having multiple procs share the same temp data can decrease performance, but at the same time, having an individual table per proc will increase memory consumption.

I have used this approach in a few cases, but I always declare the temp table with a fully qualified name rather than as a #tablename:
CREATE TABLE [tempdb].[dbo].[tablename]
With this approach, it is easy to keep track of the temp table.
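To be clear about the trade-off, a small sketch: such a table is a regular table that merely lives in tempdb, so unlike a #temp table it is visible to every session and is not cleaned up automatically.

-- A regular table that happens to live in tempdb:
CREATE TABLE [tempdb].[dbo].[tablename] (id int NOT NULL);
-- Visible to all sessions; survives until dropped (or the server restarts):
DROP TABLE [tempdb].[dbo].[tablename];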
Raj

Another technique I've seen to avoid using temp tables is so-called "SPID-keyed" tables. Instead of a temp table, you define a regular table with the following properties:
CREATE TABLE spid_table
(
    id INT IDENTITY(1, 1) NOT NULL,
    -- Your columns here
    spid INT NOT NULL DEFAULT @@SPID,
    PRIMARY KEY (id)
);
In your procedures, you would have code like the following:
SELECT @var = some_value
FROM spid_table
WHERE -- some condition
    AND spid = @@SPID;
and at the end of processing:
DELETE FROM spid_table
WHERE spid = @@SPID;
The big disadvantage of this is that the table uses the same recovery model as the rest of the database, so all these transient inserts, updates and deletes are being logged. The only real advantage is that the dependency is more apparent than with a temp table.

I guess it is OK if the second proc checks for the existence of the temp table and moves forward if it exists. It can also raise an error if the temp table doesn't exist, asking the user to run proc1 first.
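A minimal sketch of that existence check (reusing the #TempXyz definition from the earlier answer):

-- At the top of the second proc: fail fast if the caller didn't create the table.
IF OBJECT_ID('tempdb..#TempXyz') IS NULL
BEGIN
    RAISERROR('#TempXyz does not exist; run proc1 first.', 16, 1);
    RETURN;
END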
Why is the work divided in 2 stored procs? Can't things be done in 1 proc?

Related

Any downside to using a view to make sure stored procedures get all the columns they need?

Let me start by stating that when writing SELECT statements in a stored procedure or elsewhere in application code, I ALWAYS specify columns explicitly rather than using SELECT *.
That said, I have a case where I have many stored procedures that need exactly the same columns back because they are used to hydrate the same models on the client. In an effort to make the code less brittle and less prone to forgetting to update a stored procedure with a new column, I am thinking of creating a view and selecting from it in my stored procedures using SELECT *.
To help clarify, here are examples of what the stored procedures might be named:
Entity_GetById
Entity_GetByX
Entity_GetForY
-- and on and on...
So in each of these stored procedures, I would have
SELECT *
FROM EntityView
WHERE...[criteria based on the stored procedure]
I'm wondering if there is any cost to doing this. I understand the pitfalls of SELECT * FROM Table, but selecting from a view that exists specifically to define the needed columns seems to mitigate them.
Are there any other reasons I wouldn't want to use this approach?
Personally, I don't see a problem with this approach.
However, there is a range of opinions on the use of select * in production code, which generally discourages it for the outermost query (of course, it is fine in subqueries).
You do want to make sure that the stored procedures are recompiled if there is any change to either the view or to the underlying tables supplying the view. You probably want to use schemabinding to ensure that (some) changes to the underlying tables do not inadvertently affect the view(s).
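A sketch of what that could look like (the Entity table and its columns are hypothetical here); SCHEMABINDING forces an explicit column list and prevents the underlying table from being altered out from under the view:

CREATE VIEW dbo.EntityView
WITH SCHEMABINDING
AS
SELECT e.Id, e.Name, e.CreatedOn   -- schemabound views must list columns explicitly
FROM dbo.Entity e;                 -- and must use two-part names
GO

CREATE PROCEDURE dbo.Entity_GetById
    @Id int
AS
BEGIN
    SELECT *
    FROM dbo.EntityView
    WHERE Id = @Id;
END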
I don't know your system, but have you checked whether using a view affects performance?
SELECT * from the view makes sense, but does the view just select specific columns from one table?
If not, then look carefully into performance.
If I remember correctly, in MS SQL a stored procedure can return a recordset.
If I'm right, you could wrap the various queries into a kind of sub-query stored procedure and have one main procedure which selects specific columns -- that way compilation should fail if you miss something.
Even better would be to have stored procedures which query by various parameters and return only primary keys (as a recordset or in a temporary table), and one main procedure which fetches all the required columns based on the returned primary keys.
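A rough sketch of that last idea (all object names hypothetical; note it relies on the shared-temp-table pattern discussed at the top of this page, so the same documentation caveats apply):

-- Each search proc returns only primary keys into a caller-created temp table:
CREATE PROCEDURE dbo.Entity_FindByX
    @x int
AS
BEGIN
    INSERT INTO #EntityKeys (Id)
    SELECT Id FROM dbo.Entity WHERE X = @x;
END
GO

-- One main proc owns the single explicit column list:
CREATE PROCEDURE dbo.Entity_Fetch
AS
BEGIN
    SELECT e.Id, e.Name, e.CreatedOn
    FROM dbo.Entity e
    INNER JOIN #EntityKeys k ON e.Id = k.Id;
END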

Ok to use temp tables and table variable in the same stored procedure?

I have one select in my stored procedure that returns 4,000+ rows. I was going to put this in a temp table to work off the data later in the procedure.
I also have various other selects that only return 100-300 rows. I was going to make these table variables, again to work off later in the procedure.
Is it ok to use temp tables and table variables in the same procedure, or will this cause any performance issues?
Yes, it is OK.
As for programming practice, I would prefer to see one type or the other (and I lean toward table variables) when reading a stored procedure. However, if you have a good reason for using one or the other -- such as needing an index on a temp table, or using it as a SELECT INTO target -- then go ahead.
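For what it's worth, mixing the two is syntactically unremarkable -- a minimal sketch under the row counts from the question (table and column names are made up):

CREATE PROCEDURE dbo.ProcessData
AS
BEGIN
    -- Larger set (4,000+ rows): a temp table, so it can be indexed.
    CREATE TABLE #BigSet (Id int PRIMARY KEY, Amount money);
    INSERT INTO #BigSet (Id, Amount)
    SELECT Id, Amount FROM dbo.Orders;
    CREATE INDEX IX_BigSet_Amount ON #BigSet (Amount);

    -- Smaller sets (100-300 rows): table variables.
    DECLARE @SmallSet TABLE (Id int NOT NULL PRIMARY KEY);
    INSERT INTO @SmallSet (Id)
    SELECT TOP (300) Id FROM dbo.Customers;

    -- Work off both later in the procedure:
    SELECT b.Id, b.Amount
    FROM #BigSet b
    INNER JOIN @SmallSet s ON b.Id = s.Id;
END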
This is where you need to look for a full set of options: sommarskog.se - share_data
Being able to add various indexes to temp tables is a particular reason I'll sometimes choose temporary tables.
To avoid hitting tempdb continuously, and if indexes are not required, I'll use table variables.
Quite often now I use lots of CTEs that work together and avoid using any sort of materialized tables.
Classic answer - "it depends!"
I think there are many factors here that we don't know, such as your company's resources, your time-constraints, etc.
Generally speaking, it is fine to use temp tables for this purpose. And 100-300 rows (mentioned in the selects) -- that's peanuts. No worries.

Temp table or permanent tables?

For my company I am redesigning some stored procedures. The original procedures use lots of permanent tables which are filled during the execution of the procedure, with the values deleted at the end. The number of rows can range from 100 to 50,000 for the aggregation calculations.
My question is: will there be severe performance issues if I replace those tables with temp tables? Is it feasible to use temp tables?
It depends on how often you're using them, how long the processing takes, and whether you are concurrently accessing data from the tables while writing.
If you use a temp table, it won't be sitting around waiting for indexing and caching while it's not in use. So it should save an ever so slight bit of resources there. However, you will incur overhead with the temp tables (i.e. creating and destroying).
I would re-examine how your queries function in the procedures and consider employing more in-procedure CURSOR operations instead of loading everything into tables and deleting them.
However, databases are for storing information and retrieving information. I would shy away from using permanent tables for routine temp work and stick with the temp tables.
Overall performance shouldn't be affected by the use case you specified in your question.
Hope this helps,
Jeffrey Kevin Pry
Yes, it's certainly feasible. You may want to check whether the permanent tables have any indexing on them to speed up joins and so on.
I agree with Jeffrey. It always depends.
Since you're using SQL Server 2008, you might have a look at table variables.
They should be lighter than TEMP tables.
I define a User Defined Function which returns a table variable like this:
CREATE FUNCTION dbo.ufd_GetUsers ( @UserCode INT )
RETURNS @UsersTemp TABLE
(
    UserCode INT NOT NULL,
    RoleCode INT NOT NULL
)
AS
BEGIN
    INSERT @UsersTemp
    SELECT
        dbo.UsersRoles.Code, Roles.Code
    FROM
        dbo.UsersRoles
    INNER JOIN
        dbo.UsersRolesRelations ON dbo.UsersRoles.Code = dbo.UsersRolesRelations.UserCode
    INNER JOIN
        dbo.UsersRoles Roles ON dbo.UsersRolesRelations.RoleCode = Roles.Code
    WHERE dbo.UsersRoles.Code = @UserCode

    INSERT @UsersTemp VALUES(@UserCode, @UserCode)

    RETURN
END
A big question is: can more than one person run one of these stored procedures at a time? I regularly see these kinds of tables carried over from old single-user databases (or from programmers who couldn't do subqueries or much of anything beyond SELECT * FROM). What happens if more than one user tries to run the same procedure? What happens if it crashes midway through -- does the table get cleaned up? With temp tables or table variables you have the ability to properly scope the table to just the current connection.
Definitely use a temporary table, especially since you've alluded to the fact that its purpose is to assist with calculations and aggregates. If you used a table inside one of your database's schemas all that work is going to be logged - written, backed up, and so on. Using a temporary table eliminates that overhead for data that in the end you probably don't care about.
You actually might save some time from the fact that you can drop the temp tables at the end instead of deleting rows (you said you have multiple users, so you have to delete rather than truncate). Deleting is a logged operation and can add considerable time to the process. If the permanent tables are indexed, then create the temp tables and index them as well. I would bet you would see an increase in performance, unless your tempdb is close to out of space.
Table variables also might work, but they can't be indexed and they are generally only faster for smaller datasets. So you might try a combination: temp tables for the things that will be large enough to benefit from indexing, and table variables for the smaller items.
An advantage of using temp tables and table variables is that you guarantee that one user's process won't interfere with another user's. You say they currently have a way to identify which records belong to which process, but all it takes is one bug being introduced to break that when using permanent tables. Permanent tables for temporary processing are a very risky choice. Temp tables and table variables can never see the data from someone else's process, and are thus a far safer choice.
Table variables are normally the way to go.
SQL2K and below can have significant performance bottlenecks if there are many temp tables being manipulated - the issue is the blocking DDL on the system tables.
SQL 2005 is better, but table vars avoid the whole issue by not using those system tables at all, so they can perform without inter-user locking issues (except those involved with the source data).
The issue then is that table vars only persist within scope, so if there is genuinely a large amount of data that needs to be processed repeatedly and persisted over a (relatively) long duration, then 'static' work tables may actually be faster -- they just need a user key of some sort and regular cleaning. A last resort, really.

Using a temporary table to replace a WHERE IN clause

I've got the user entering a list of values that I need to query for in a table. The list could be potentially very large, and the length isn't known at compile time. Rather than using WHERE ... IN (...) I was thinking it would be more efficient to use a temporary table and execute a join against it. I read this suggestion in another SO question (can't find it at the moment, but will edit when I do).
The gist is something like:
CREATE TEMP TABLE my_temp_table (name varchar(160) NOT NULL PRIMARY KEY);
INSERT INTO my_temp_table VALUES ('hello');
INSERT INTO my_temp_table VALUES ('world');
-- ... etc
SELECT f.* FROM foo f INNER JOIN my_temp_table t ON f.name = t.name;
DROP TABLE my_temp_table;
If I have two of these going at the same time, would I not get an error if Thread 2 tries to create the TEMP table after Thread 1?
Should I randomly generate a name for the TEMP table instead?
Or, if I wrap the whole thing in a transaction, will the naming conflict go away?
This is Postgresql 8.2.
Thanks!
There is no need to worry about the conflict.
The pg_temp schema is session specific. If you've a concurrent statement in a separate session, it'll use a different schema (even if you see it as having the same name).
Two notes, however:
Every time you create temporary objects, rows for the temporary schema and the objects themselves are written to the system catalog. This can lead to catalog clutter if done frequently.
Thus, for small sets / frequent uses, it's usually better to stick to an IN or a WITH statement (both of which Postgres copes with quite well). It's also occasionally useful to "trick" the planner into using whichever plan you're seeking with an immutable set-returning function.
In the event you decide to actually use temporary tables, it's usually better to index and analyze them once you've filled them up. Otherwise you're doing little more than writing a WITH statement.
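Concretely, for the temp-table route that might look like this (continuing the my_temp_table example from the question):

-- After filling the temp table, refresh statistics before the big join:
ANALYZE my_temp_table;
-- (and CREATE INDEX here for any additional join columns; the name column is
-- already indexed by its PRIMARY KEY constraint)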
Consider using a WITH query instead: http://www.postgresql.org/docs/9.0/interactive/queries-with.html
It also creates a temporary table, which is destroyed when the query / transaction finishes, so I believe there should be no concurrency conflicts.
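A minimal sketch of that suggestion against the question's example (note that WITH requires PostgreSQL 8.4 or later, so it would not help on the asker's 8.2; a plain IN list or a VALUES list in FROM works there):

WITH my_values (name) AS (
    VALUES ('hello'), ('world')
)
SELECT f.*
FROM foo f
INNER JOIN my_values t ON f.name = t.name;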

Which are the best Variables in stored procedures

I'm often dealing with interfaces between two systems for data import or export, and therefore I'm programming some T-SQL procedures. It's often necessary to use some variables inside these procedures to hold some values or single records.
The last time, I set up some temp tables, e.g. one named #tmpGlobals and another named #tmpOutput. The names don't matter, but I eliminated declaring variables like @MainID int and so on.
Is this a good idea? Is it a performance issue?
As Alexander suggests, it really depends. I won't draw hard lines in the sand about number of rows, because it can also depend on the data types and hence the size of each row. Where one will make more sense than the other in your environment can depend on several factors aside from just the size of the data, including access patterns, sensitivity of performance to recompiles, your hardware, etc.
There is a common misconception that @table variables are only in memory, do not incur I/O, do not use tempdb, etc. While in certain isolated cases some of this is true, it is not something you can or should rely on.
Some other limitations of @table variables that may prevent your use of them, even for small data sets:
cannot index (other than primary key / unique constraint declarations on creation)
no statistics are maintained (unlike #temp tables)
cannot ALTER
cannot use as INSERT EXEC target in SQL Server 2005 (this restriction was lifted in 2008)
cannot use as SELECT INTO target
cannot truncate
can't use an alias type in definition
no parallelism
not visible to nested procs (unlike #temp tables)
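A quick sketch of the nested-scope difference from the last point (procedure names are hypothetical):

CREATE PROCEDURE dbo.Inner_Proc
AS
BEGIN
    SELECT COUNT(*) FROM #Outer;    -- works: #temp tables are visible to nested procs
    -- A @table variable declared in the caller is simply not in scope here.
END
GO

CREATE PROCEDURE dbo.Outer_Proc
AS
BEGIN
    CREATE TABLE #Outer (id int);
    DECLARE @Outer TABLE (id int);
    EXEC dbo.Inner_Proc;            -- sees #Outer, cannot see @Outer
END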
It really depends on the amount of data. If you're using under 100 records, then DECLARE @MainID or the like is probably better, since it's a smaller amount of data. Anything over 100 records, though, and you should definitely use #tmpGlobals or similar, since it's better for memory management on the SQL server.
EDIT: It's not bad to use #tmpGlobals for smaller sets; there's just not much of a performance loss or gain compared to DECLARE @MainID. You will see a performance gain when using #tmpGlobals instead of DECLARE @MainID on a high number of records.
In general, you should choose the reverse if possible. It depends on whether you need to store a set of items or just result values.
Scoped variables, aside from table variables, are relatively cheap. Things that fit into typed variables that aren't tables operate faster than storing them as single rows in a table.
Table variables and temp tables tend to be quite expensive. They may require space in tempdb and also offer no optimizations by default. In addition, table variables should be avoided for large sets of data. When processing large sets, you can apply indexes and define primary keys on temp tables if you wish, but you cannot do this for table variables. Finally, temp tables need cleanup before exiting scope.
For parameters, table variables are useful for return sets from functions. Temp tables cannot be returned from functions. Depending on the task at hand, use of functions may make it easier to encapsulate specific portions of work. You may find that some stored procedures are doing work that is better suited to functions, especially if you are reusing but not modifying the results.
Finally, if you just need one-time storage of results in the middle of stored-procedure work, try CTEs. These usually beat out both table variables and temp tables, as SQL server can make better decisions on how to store this data for you. Also, as a matter of syntax, it may make your declarations more legible.
Using Common Table Expressions @ MSDN
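For illustration, a one-time intermediate result expressed as a CTE rather than a table variable or temp table (the staging table and columns are made up):

WITH NewRows AS (
    SELECT Id, Payload
    FROM dbo.ImportStaging
    WHERE Status = 'NEW'
)
SELECT n.Id, n.Payload
FROM NewRows n;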
edit: (regarding temp tables)
Local temp tables go away when the query session ends, which can be an indeterminate amount of time away in the future. Global temp tables don't go away until the connection is closed and no other users are using the table, which can be even longer. In either case, it is best to drop temp tables (as no longer needed) on exit of a procedure to avoid tying up memory and other resources.
CTEs can be used to avert this, in many cases, because they are only local to the location where they are declared. They automatically are marked for cleanup once the stored procedure or function of their scope exits.