I have to compute a value involving data from several tables. I was wondering if using a stored procedure with cursors would offer a performance advantage compared to reading the data into a dataset (using simple select stored procedures) and then looping through the records. The dataset is not large: it consists of 6 tables, each with about 10 records, mainly GUIDs, several nvarchar(100) fields, a float column, and an nvarchar(max).
That would probably depend on the size of the dataset you are retrieving (the larger the set, the more sense it may make to perform the work inside SQL Server instead of passing it around), but I tend to think that if you are looking to perform computations, do it in your code and away from your stored procedures. If you need to use cursors to pull the data together, so be it, but I think you should shy away from using them for calculations and other non-retrieval work.
Edit: This answer to another related question gives some pros and cons of cursors vs. looping. It would seem to conflict with my previous assertion (read above) about scaling: it suggests that the larger the dataset gets, the more you will probably want to move the work into your code instead of the stored procedure.
An alternative to a cursor:
declare @table table (RowId int identity(1,1), Fields int)
declare @count int
declare @i int

insert into @table (Fields)
select Fields
from [Table]

select @count = count(*) from @table
set @i = 1

while (@i <= @count)
begin
    -- fetch row @i via its RowId, then do whatever you need to do
    set @i = @i + 1
end
Cursors should be faster, but if you have a lot of users running this it will eat up your server's resources. Bear in mind that you have a more powerful coding language when writing loops in .NET rather than SQL.
There are very few occasions where a cursor cannot be replaced with standard set-based SQL. If you are doing this operation on the server, you may be able to use a set-based operation. Any more details on what you are doing?
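For instance, a computation like the one in the question can often be collapsed into a single set-based statement. A minimal sketch, with made-up table and column names:

-- A join plus an aggregate replaces the row-by-row loop;
-- the names below are placeholders, not from the question.
SELECT SUM(a.FloatValue * b.Weight) AS ComputedValue
FROM dbo.TableA AS a
JOIN dbo.TableB AS b ON b.TableA_ID = a.ID;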
If you do decide to use a cursor, bear in mind that a FAST_FORWARD read-only cursor will give you the best performance, and make sure that you use the DEALLOCATE statement to release it. See here for cursor tips.
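A minimal sketch of that pattern, querying a system view so it runs anywhere; the per-row work is a placeholder:

DECLARE @id int;

-- LOCAL FAST_FORWARD: forward-only, read-only, the cheapest cursor type.
DECLARE c CURSOR LOCAL FAST_FORWARD FOR
    SELECT object_id FROM sys.objects;

OPEN c;
FETCH NEXT FROM c INTO @id;
WHILE @@FETCH_STATUS = 0
BEGIN
    -- per-row work goes here
    FETCH NEXT FROM c INTO @id;
END
CLOSE c;
DEALLOCATE c;  -- releases the cursor's resources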
Cursors should be faster (unless you're doing something weird in SQL and not in ADO.NET).
That said, I've often found that cursors can be eliminated with a little bit of legwork. What's the procedure you need to do?
Cheers,
Eric
I have several linked servers and I want to insert a value into each of them. On my first try, the INSERT using a CURSOR took far too long: it ran for about 17 hours. I was curious about those INSERT queries, so I checked a single line of my INSERT query using Display Estimated Execution Plan, and it showed a cost of 46% on Remote Insert and 54% on Constant Scan.
Below are the code snippets I worked with before:
DECLARE @Linked_Servers varchar(100)

DECLARE CSR_STAGGING CURSOR FOR
SELECT [Linked_Servers]
FROM MyTable_Contain_Lists_of_Linked_Server

OPEN CSR_STAGGING
FETCH NEXT FROM CSR_STAGGING INTO @Linked_Servers
WHILE @@FETCH_STATUS = 0
BEGIN
    BEGIN TRY
        EXEC('
        INSERT INTO ['+@Linked_Servers+'].[DB].[Schema].[Table] VALUES (''bla'',''bla'',''bla'')
        ')
    END TRY
    BEGIN CATCH
        DECLARE @ERRORMSG as varchar(8000)
        SET @ERRORMSG = ERROR_MESSAGE()
    END CATCH
    FETCH NEXT FROM CSR_STAGGING INTO @Linked_Servers
END
CLOSE CSR_STAGGING
DEALLOCATE CSR_STAGGING
Also, below is a figure of how I checked the estimated execution plan of my query. I checked only the INSERT query, not all the queries.
What is the best practice for getting the best performance from a remote INSERT?
You can try the approach below, but I think the difference will be negligible. I recall that when I read up on the different approaches to doing inserts across linked servers, most of the standard approaches were basically on par with each other, though it's been a while since I looked that up, so don't quote me.
It will also require some light rewriting due to the obvious differences in approach (assuming you are able to rewrite at all). The dynamic SQL required to do this might be tricky, though, as I am not entirely sure you can call OPENQUERY within dynamic SQL (I should know this, but I've never needed to).
However, if you can use this approach, the main benefit is that the WHERE clause gets the destination schema without selecting any data (because 1 will never equal 0).
INSERT OPENQUERY (
    [your-server-name],
    'SELECT
        somecolumn
        , anothercolumn
    FROM destinationTable
    WHERE 1=0'
    -- this will help reduce the scan, as it gets the schema
    -- details without having to select any data
)
SELECT
    somecolumn
    , anothercolumn
FROM sourceTable
Another approach you could take is to build an insert proc on the destination server/DB. Then you just call the proc, sending the params over. While this is a little more work and introduces more objects to maintain, it adds simplicity to your process and potentially reduces I/O when sending things across the linked server, not to mention it might save on the CPU cost of your Constant Scans as well. I think it's a cleaner approach than trying to optimize linked-server behavior.
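A rough sketch of that approach, assuming a hypothetical dbo.InsertTable proc already exists on the destination database (and that the linked server has its RPC Out option enabled):

-- On the destination server/DB (hypothetical names):
-- CREATE PROCEDURE dbo.InsertTable @col1 nvarchar(100), @col2 nvarchar(100)
-- AS INSERT INTO dbo.[Table] (col1, col2) VALUES (@col1, @col2);

-- From the source server, call it with four-part naming;
-- only the parameter values cross the wire, not a query string.
EXEC [LinkedServerName].[DB].[dbo].[InsertTable]
    @col1 = N'bla',
    @col2 = N'bla';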
I often write SQL scripts that have several repetitive lines in the WHERE clause to eliminate records.
For instance:
SELECT *
FROM tblAll
WHERE Field1 NOT LIKE '%AA%'
AND Field1 NOT LIKE '%BB%'
AND Field1 NOT LIKE '%00'
It would be less prone to mistakes if I didn't have to add these lines each time. How can I create a function that would let me do this instead?
SELECT *
FROM tblAll
WHERE Field1 NOT LIKE Function
This is what I have currently:
CREATE FUNCTION [dbo].[ExcludeField1]
(
    @Field1 varchar(max),
    @Date datetime
)
RETURNS VARCHAR(15) AS
BEGIN
    SELECT TOP 1 @Field1 = Field1, @Date = Date
    FROM tblAll A
    WHERE A.Date = @Date
    AND Field1 NOT LIKE '%AA%'
    AND Field1 NOT LIKE 'BB%'
    AND Field1 NOT LIKE '%00%'
    ORDER BY LEN(A.Field1) ASC;
    RETURN @Field1
END
GO
But I feel I'm missing something vital. The function only provides what I consider to be valid values for Field1. So my future scripts should be:
Field1 = @Field1
What's not right?
What you are trying to do may not be the best approach.
You are trying to short circuit the speed of the code by adding in something that makes it easier to program.
If you have large tables, this could be disastrous. If you have small tables, it's still cumbersome.
You may find it better (and more readable) to create a temp table that executes just your repetitive stuff first, and then reference that smaller temp table in your other code. This makes it more readable for future people finding your code AND reduces the size (and complexity) of the code that follows. Depending on the type of SQL you are running, you may even be able to put your temp-table code into a sub proc that can be called before each SQL query that needs it.
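A minimal sketch of that temp-table approach, reusing the filters from the question (Field2 in the second query is just an illustrative column):

-- Materialize the pre-filtered rows once...
SELECT *
INTO #tblFiltered
FROM tblAll
WHERE Field1 NOT LIKE '%AA%'
  AND Field1 NOT LIKE '%BB%'
  AND Field1 NOT LIKE '%00';

-- ...then every later query works from the smaller set
-- without repeating the NOT LIKE snippets.
SELECT *
FROM #tblFiltered
WHERE Field2 = 'some value';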
Still, you'll get better performance just adding the extra code to each query that needs it. That way SQL Server can decide the fastest way to grab your data and send it back to you (even if the SQL code is much bulkier and you have to repeat the same snippets in lots of different queries).
However, if you insist on being able to insert code snippets that don't change, you could try building the SQL code in a different language (like Visual Basic, C, or VBA) where you can create a string of code. Then you connect to your SQL database, send the generated SQL over, and have it executed. This gives you the luxury of building all kinds of Frankenstein code (and could even let you build interactive front ends so users can add all kinds of different parameters). But beware... there are all kinds of nasty dragons for you to play with in those caves.
Personally, I'd go with temp tables if your goal is to simplify and bring some order to your repetitive code.
Hope that helps a bit. :)
EDIT
Oooops, let's not forget dynamic SQL (if your version of SQL has it):
DECLARE @sql nvarchar(max);
DECLARE @snippet nvarchar(max);

set @snippet =
    'Field1 NOT LIKE ''%AA%''
    AND Field1 NOT LIKE ''%BB%''
    AND Field1 NOT LIKE ''%00''';

set @sql =
    'SELECT
        Field1
        ,Field2
    FROM tblAll
    WHERE '
    + @snippet +
    ' ORDER BY
        Field1;';

exec(@sql);
This is actually the best way for you to control repetitive code. It allows you to lock it down in one variable that you can insert into larger/many SQL statements. That way, if you need to make future changes to the snippet, you only need to make the change in one place instead of having to hunt down all the queries that have the code.
But it also gives the ability for the SQL engine to create the fastest code for returning your data.
Remember... function calls can be very expensive for processing. Avoid them if possible.
I have worked on SQL stored procedures and I have noticed that many people use two different approaches.
First, to use plain select queries, i.e. something like:
Select * from TableA where colA = 10 order by colA
Second, to do the same by constructing the query, i.e. like:
Declare @sqlstring varchar(100)
Declare @sqlwhereclause varchar(100)
Declare @sqlorderby varchar(100)
Set @sqlstring = 'Select * from TableA '
Set @sqlwhereclause = 'where colA = 10 '
Set @sqlorderby = 'order by colA'
Set @sqlstring = @sqlstring + @sqlwhereclause + @sqlorderby
exec (@sqlstring)
Now, I know both work fine. But, the second method I mentioned is a little annoying to maintain.
I want to know which one is better? Is there any specific reason one would resort to one method over the other? Any benefits of one method over other?
Use the first one. This will allow a query plan to be cached properly, apart from being the way you are supposed to work with SQL.
The second one is open to SQL Injection attacks, apart from the other issues.
With the dynamic SQL you will not get compile time checking, so it may fail only when invoked (the sooner you know about incorrect syntax, the better).
And, you noted yourself, the maintenance burden is also higher.
The second method has the obvious drawback of not being syntax-checked at compile time. It does, however, allow a dynamic ORDER BY clause, which the first does not. I recommend that you always use the first example unless you have a very good reason to make the query dynamic. And, as @Oded has already pointed out, be sure to guard yourself against SQL injection if you do go for the second approach.
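To illustrate both points, here is a hedged sketch of a dynamic ORDER BY done with sp_executesql; the sort column is validated against a whitelist (colB is made up here) so raw input never reaches the string:

DECLARE @orderBy sysname;
DECLARE @sql nvarchar(200);
SET @orderBy = 'colA';  -- imagine this comes from the caller

-- Whitelist the sort column instead of concatenating raw input.
IF @orderBy NOT IN ('colA', 'colB')
    SET @orderBy = 'colA';

SET @sql = N'Select * from TableA where colA = @p order by '
         + QUOTENAME(@orderBy);

-- sp_executesql keeps the filter parameterized, which also
-- lets the plan be reused across calls.
EXEC sp_executesql @sql, N'@p int', @p = 10;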
I don't have a fully comprehensive answer for you, but I can tell you right now that the latter method is much more difficult to work with when importing the stored procedure as a function in an ORM. Since the SQL is constructed dynamically, you have to manually create any type-classes that are returned from the stored procedure and aren't directly correlated to entities in your model.
With that in mind, there are times when you simply can't avoid constructing a SQL statement, especially when WHERE clauses and joins depend on the parameters passed in. In my experience, stored procs that build large EXEC statements with highly variable joins and WHERE clauses are trying to do too many things. In these situations, I would recommend you keep the Single Responsibility Principle in mind.
Executing dynamic SQL inside a stored procedure reduces the value of using stored procedures to just a saved query container. Stored procedures are mostly beneficial in that the query execution plan (a very costly operation) is compiled and stored in memory the first time the procedure is executed. This means that every subsequent execution of the procedure bypasses the query plan calculations and jumps right to the data retrieval portion of the operation.
Also, allowing a stored procedure to take an executable query string as a parameter is dangerous. Anyone with execute permission granted on the procedure could potentially wreak havoc on the rest of the database.
I have about half a dozen generic, but fairly complex stored procedures and functions that I would like to use in a more generic fashion.
Ideally I'd like to be able to pass the table name as a parameter to the procedure, as currently it is hard coded.
The research I have done suggests I need to convert all the existing SQL within my procedures to dynamic SQL in order to splice in the dynamic table name from the parameter. I was wondering, however, if there is an easier way to reference the table.
For example:
SELECT * FROM @MyTable WHERE...
If so, how do I set the @MyTable variable from the table name?
I am using SQL Server 2005.
Dynamic SQL is the only way to do this, but I'd reconsider the architecture of your application if it requires this. SQL isn't very good at "generalized" code. It works best when it's designed and coded to do individual tasks.
Selecting from TableA is not the same as selecting from TableB, even if the select statements look the same. There may be different indexes, different table sizes, data distribution, etc.
You could generate your individual stored procedures, which is a common approach. Have a code generator that creates the various select stored procedures for the tables that you need. Each table would have its own SP(s), which you could then link into your application.
I've written these kinds of generators in T-SQL, but you could easily do it with most programming languages. It's pretty basic stuff.
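For illustration, such a generator can be as simple as a query over the catalog views. A minimal sketch (the usp_Select_ naming is made up):

-- Emits a CREATE PROCEDURE script for each user table;
-- you would review and execute the generated text separately.
SELECT 'CREATE PROCEDURE dbo.usp_Select_' + t.name + CHAR(13) +
       'AS' + CHAR(13) +
       'SELECT * FROM ' + QUOTENAME(s.name) + '.' + QUOTENAME(t.name) + ';'
FROM sys.tables AS t
JOIN sys.schemas AS s ON s.schema_id = t.schema_id;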
Just to add one more thing since Scott E brought up ORMs... you should also be able to use these stored procedures with most sophisticated ORMs.
You'd have to use dynamic sql. But don't do that! You're better off using an ORM.
EXEC(N'SELECT * from ' + @MyTable + N' WHERE ... ')
You can use dynamic SQL, but check that the object exists first, unless you can 100% trust the source of that parameter. It's likely that there will be a performance hit, as SQL Server won't be able to re-use the same execution plan for different parameters.
IF OBJECT_ID(@tablename, N'U') IS NOT NULL
BEGIN
    --dynamic sql
END
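Putting that together, a minimal sketch (the table name is a made-up example; QUOTENAME guards the concatenation):

DECLARE @tablename sysname;
DECLARE @sql nvarchar(max);
SET @tablename = N'MyTable';  -- hypothetical bare table name in dbo

IF OBJECT_ID(N'dbo.' + @tablename, N'U') IS NOT NULL
BEGIN
    SET @sql = N'SELECT * FROM dbo.' + QUOTENAME(@tablename) + N';';
    EXEC sp_executesql @sql;
END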
ALTER procedure [dbo].[test](@table_name varchar(max))
AS
BEGIN
    declare @tablename varchar(max) = @table_name;
    declare @statement varchar(max);
    set @statement = 'Select * from ' + @tablename;
    execute (@statement);
END
It's not hard to find developers who think cursors are gauche but I am wondering how to solve the following problem without one:
Let's say I have a proc called uspStudentDelete that takes as a parameter #StudentID.
uspStudentDelete applies a bunch of cascading soft delete logic, marking a flag on tables like "classes", "grades", and so on as inactive. uspStudentDelete is well vetted and has worked for some time.
What would be the best way to run uspStudentDelete on the results of a query (e.g. select studentid from students where ... ) in TSQL?
That's exactly what cursors are intended for:
declare c cursor local for <your query here>
declare @id int

open c
fetch next from c into @id
while @@fetch_status = 0
begin
    exec uspStudentDelete @id
    fetch next from c into @id
end
close c
deallocate c
Most people who rail against cursors think you should do this in a proper client, like a C# desktop application.
The best solution is to write a set-based proc to handle the delete (try running this through a cursor to delete 10,000 records and you'll see why), or to add the set-based code to the current proc with a parameter that tells it whether to run the set-based or single-record part of the proc (this at least keeps it together for maintenance purposes).
In SQL Server 2008 you can use a table variable as an input parameter. If you rewrite the proc to be set-based, you can keep the same logic and run it whether the proc is sent one record or ten thousand. You may need a batch process in there, though, to avoid deleting millions of records in one go and locking up the tables for hours. Of course, if you do this you will also need to adjust how the current sp is being called.
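A hedged sketch of that table-valued-parameter approach (SQL Server 2008+); the type name, proc name, and IsActive flag are made up, and the body shows just one of the cascading soft-delete updates:

-- A table type to carry the set of student IDs.
CREATE TYPE dbo.StudentIdList AS TABLE (StudentID int NOT NULL PRIMARY KEY);
GO

CREATE PROCEDURE dbo.uspStudentDeleteSet
    @StudentIDs dbo.StudentIdList READONLY
AS
BEGIN
    -- One set-based statement per table replaces the per-row proc calls;
    -- the real proc would repeat this for grades, and so on.
    UPDATE c
    SET c.IsActive = 0
    FROM dbo.classes AS c
    JOIN @StudentIDs AS s ON s.StudentID = c.StudentID;
END
GO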