I have several linked servers and I want to insert a value into each of them. On my first attempt, the INSERT using a CURSOR took far too long: it ran for about 17 hours. I was curious about those INSERT queries, so I checked a single INSERT statement with Display Estimated Execution Plan. It showed a cost of 46% on Remote Insert and 54% on Constant Scan.
Below is the code I used:
DECLARE @Linked_Servers varchar(100)
DECLARE CSR_STAGGING CURSOR FOR
    SELECT [Linked_Servers]
    FROM MyTable_Contain_Lists_of_Linked_Server
OPEN CSR_STAGGING
FETCH NEXT FROM CSR_STAGGING INTO @Linked_Servers
WHILE @@FETCH_STATUS = 0
BEGIN
    BEGIN TRY
        EXEC('
        INSERT INTO ['+@Linked_Servers+'].[DB].[Schema].[Table] VALUES (''bla'',''bla'',''bla'')
        ')
    END TRY
    BEGIN CATCH
        DECLARE @ERRORMSG as varchar(8000)
        SET @ERRORMSG = ERROR_MESSAGE()
    END CATCH
    FETCH NEXT FROM CSR_STAGGING INTO @Linked_Servers
END
CLOSE CSR_STAGGING
DEALLOCATE CSR_STAGGING
Also, below is a figure showing how I checked the estimated execution plan of my query.
I checked only the INSERT query, not all of the queries.
What is the best practice for getting the best performance out of a Remote Insert?
You can try this, but I think the improvement will be negligible. I do recall, from reading about the different approaches to inserting across linked servers, that most of the standard approaches were basically on par with each other, though it's been a while since I looked that up, so do not quote me.
It will also require some light rewriting because of the obvious differences in approach (assuming you are able to do so at all). The dynamic SQL required might be tricky, though, as I am not entirely sure whether you can call OPENQUERY within dynamic SQL (I should know this, but I've never needed to).
However, if you can use this approach, the main benefit is that the WHERE clause gets the destination schema without having to select any data (because 1 will never equal 0).
INSERT OPENQUERY (
    [your-server-name],
    'SELECT
        somecolumn
        , anotherColumn
    FROM destinationTable
    WHERE 1=0'
    -- this will help reduce the scan as it will
    -- get schema details without having to select data
)
SELECT
    somecolumn
    , anotherColumn
FROM sourceTable
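On the open question above: OPENQUERY can be called from dynamic SQL, and because it only accepts a literal linked server name, dynamic SQL is the usual way to vary the server per iteration. A minimal sketch, assuming the destination is the [DB].[Schema].[Table] from the question and that it has three columns named Col1, Col2, Col3 (those column names are placeholders, not from the original):

```sql
-- Sketch only: Col1..Col3 are hypothetical column names.
DECLARE @Linked_Servers sysname = N'your-server-name';
DECLARE @sql nvarchar(max);

-- OPENQUERY needs a literal server name, so the statement is assembled as text.
-- QUOTENAME brackets the name and escapes any closing bracket inside it.
SET @sql = N'
INSERT OPENQUERY (' + QUOTENAME(@Linked_Servers) + N',
    ''SELECT Col1, Col2, Col3 FROM [DB].[Schema].[Table] WHERE 1 = 0'')
VALUES (''bla'', ''bla'', ''bla'');';

EXEC sys.sp_executesql @sql;
```

Inside the cursor loop, the SET and EXEC lines would replace the existing EXEC('...') call, with @Linked_Servers filled by FETCH.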
Another approach you could take is to build an insert proc on the destination server/DB. Then you just call the proc and send the params over. Yes, this is a little more work and introduces more objects to maintain, but it adds simplicity to your process and potentially reduces I/O when sending things across the linked servers, and it might save the CPU cost of your constant scans as well. I think it's probably a cleaner approach than trying to optimize linked server behavior.
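A rough sketch of that idea follows. The procedure name usp_InsertRow, the column names, and the varchar(50) types are all hypothetical, and it assumes the linked server has the "rpc out" option enabled so that remote procedure calls are allowed.

```sql
-- On each destination server, in the target database (run once).
-- usp_InsertRow and Col1..Col3 are hypothetical names.
CREATE PROCEDURE [Schema].[usp_InsertRow]
    @Col1 varchar(50),
    @Col2 varchar(50),
    @Col3 varchar(50)
AS
BEGIN
    SET NOCOUNT ON;
    INSERT INTO [Schema].[Table] (Col1, Col2, Col3)
    VALUES (@Col1, @Col2, @Col3);
END
GO

-- On the calling server: the server name varies, so the four-part name
-- is built dynamically while the values travel as real parameters.
DECLARE @Linked_Servers sysname = N'your-server-name';
DECLARE @sql nvarchar(max) =
    N'EXEC ' + QUOTENAME(@Linked_Servers)
    + N'.[DB].[Schema].[usp_InsertRow] @Col1, @Col2, @Col3;';

EXEC sys.sp_executesql
    @sql,
    N'@Col1 varchar(50), @Col2 varchar(50), @Col3 varchar(50)',
    @Col1 = 'bla', @Col2 = 'bla', @Col3 = 'bla';
```

Passing the values as parameters rather than concatenating them keeps the quoting simple and avoids injection through the data.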
Related
I want to insert the results of a stored procedure into a temp table using OPENROWSET. However, the issue I run into is I'm not able to pass parameters to my stored procedure.
This is my stored procedure:
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
ALTER PROCEDURE [dbo].[N_spRetrieveStatement]
    @PeopleCodeId nvarchar(10),
    @StatementNumber int
AS
SET NOCOUNT ON
DECLARE @PersonId int
SELECT @PersonId = [dbo].[fnGetPersonId](@PeopleCodeId)
SELECT *
INTO #tempSpRetrieveStatement
FROM OPENROWSET('SQLNCLI', 'Server=PCPRODDB01;Trusted_Connection=yes;',
'EXEC Campus.dbo.spRetrieveStatement @StatementNumber, @PersonId');
--2577, 15084
SELECT *
FROM #tempSpRetrieveStatement;
OPENROWSET will not allow you to execute a procedure with input parameters. You have to use INSERT/EXEC.
-- #tempSpRetrieveStatement must already exist with matching columns
INSERT INTO #tempSpRetrieveStatement(Col1, Col2,...)
EXEC PCPRODDB01.Campus.dbo.spRetrieveStatement @StatementNumber, @PersonId
Create and test a LinkedServer for PCPRODDB01 before running the above command.
The root of your problem is that you don't actually have parameters inside your statement that you're transmitting to the remote server you're connecting to, given the code sample you provided. Even if it was the very same machine you were connecting to, they'd be in different processes, and the other process doesn't have access to your session variables.
LinkedServer was mentioned as an option, and my understanding is that's the preferred option. However in practice that's not always available due to local quirks in tech or organizational constraints. It happens.
But there is a way to do this.
It's hiding in plain sight.
You need to pass literals into the string that will be executed on the other server, right?
So, you start by building the string that will do that.
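SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
ALTER PROCEDURE [dbo].[N_spRetrieveStatement]
    @PeopleCodeId nvarchar(10),
    @StatementNumber int
AS
SET NOCOUNT ON
DECLARE
    @PersonId INT,
    @TempSQL VARCHAR(4000) = '';
SELECT @PersonId = [dbo].[fnGetPersonId](@PeopleCodeId);
-- Build the remote call with the values embedded as literals.
-- EXEC takes its arguments without parentheses; the quotes are doubled twice
-- because this string will itself be nested inside another string literal.
SET @TempSQL =
    'EXEC Campus.dbo.spRetrieveStatement ''''' +
    FORMAT(@StatementNumber,'D') +''''', ''''' +
    FORMAT(@PersonId,'D') + '''''';
--2577, 15084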
Note the seemingly excessive number of quotes. That's not a mistake -- that's foreshadowing. Because, yes, OPENROWSET hates taking variables as parameters. It, too, only wants literals. So, how do we give OPENROWSET what it needs?
We create a string that is the entire statement, no variables of any kind. And we execute that.
SET @TempSQL =
    'SELECT * INTO #tempSpRetrieveStatement ' +
    'FROM OPENROWSET(''SQLNCLI'', ''Server=PCPRODDB01;Trusted_Connection=yes;'', ''' + @TempSQL + '''); ' +
    -- a temp table created inside EXEC() is dropped when that batch ends,
    -- so it has to be read back inside the same dynamic batch
    'SELECT * FROM #tempSpRetrieveStatement;';
EXEC (@TempSQL);
And that's it! Pretty simple except for counting your escaped quotes, right?
Now... This is almost beyond the scope of the question you asked, but it is a 'gotcha' I've experienced in executing stored procedures in another machine via OPENROWSET. You're obviously used to using temp tables. This will fail if the stored procedure you're calling is creating temp tables or doing a few other things that -- in a nutshell -- inspire the terror of ambiguity into your SQL server. It doesn't like ambiguity. If that's the case, you'll see a message like this:
"Msg 11514, Level 16, State 1, Procedure sp_describe_first_result_set, Line 1
The metadata could not be determined because statement '…your remote EXEC statement here…' in procedure '…name of your local stored procedure here…' contains dynamic SQL. Consider using the WITH RESULT SETS clause to explicitly describe the result set."
So, what's up with that?
You don't just get data back with OPENROWSET. The local and remote servers have a short conversation about what exactly the local server is going to expect from the remote server (so it can optimize receiving and processing it as it comes in -- something that's extremely important for large rowsets). Starting with SQL Server 2012, sp_describe_first_result_set is the newer procedure for this, and normally it executes quickly without you noticing it. It's just that its powers of divination aren't unlimited. Namely, it doesn't know how to get the type and name information regarding temp tables (and probably a few other things it can't do -- PIVOT in a select statement is probably right out).
I specifically wanted to be sure to point this out because of your reply regarding your hesitation about using LinkedServer. In fact, the very same reasons you're hesitant are likely to render that error message's suggestion completely useless -- you can't even predict what columns you're getting and in what order until you've got them.
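For the cases where the result shape is fixed and known, the error message's suggestion can be sketched roughly like this. The column names and types in the WITH RESULT SETS clause are hypothetical; they would have to match, in order, what the remote procedure actually returns. The literal arguments are the sample values from the comments above.

```sql
-- Sketch only: StatementNumber/Amount and their types are hypothetical and
-- must match the remote procedure's real result set.
SELECT *
FROM OPENROWSET('SQLNCLI', 'Server=PCPRODDB01;Trusted_Connection=yes;',
    'EXEC Campus.dbo.spRetrieveStatement 2577, 15084
     WITH RESULT SETS ((StatementNumber int, Amount money))');
```

Declaring the shape up front means the server no longer has to infer the metadata, which is exactly the step that fails with temp tables or dynamic SQL.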
I think what you're doing will work if, say, you're just branching upstream based on conditional statements and are executing one of several potential SELECT statements. I think it will work if you're just not confident that you can depend on the upstream component being fixed and are trying to ensure that even if it varies, this procedure doesn't have to because it's very generic.
But if, on the other hand, you're facing a situation in which you literally cannot guarantee that SQL Server can predict the columns, you're likely going to have to force some changes in the stored procedure you're calling to insist that it's stable. You might, for instance, work out how to ensure all possible fields are always present by using CASE expressions rather than any PIVOT. You might create a session table that's dedicated to housing what you need to SELECT just long enough to do that, then DELETE the contents back out of there. You might change the way in which you transmit your data such that it's basically gone through the equivalent of UNPIVOT. And after all that extra work, maybe it'll be just a matter of preference whether you use LinkedServer or OPENROWSET to port the data across.
So that's the answer to the literal question you asked, and one of the limits on what you can do with the answer.
I've just written something to insert 10000 rows into a table for the purposes of load testing.
The data in each of the rows is the same and uninteresting.
I did it like this:
DECLARE @i int = 0
WHILE @i < 10000 BEGIN
    exec blah.CreateBasicRow ;
    SET @i = @i + 1
END
All CreateBasicRow does is fill out the NOT NULL columns with something valid.
It turns out this is very slow and it even seems to hang occasionally! What are my alternatives? Would it be better to write something to generate a long file with all the data repeated with fewer insert clauses? Are there any other options?
Update
A constraint is that this needs to be in a form that sqlcmd can deal with - our database versioning process produces sql files to be run by sqlcmd. So I could generate a patch file with the data in a different form but I couldn't use a different tool to insert the data.
You can speed this exact code up by wrapping a transaction around the loop. That way SQL Server does not have to harden the log to disk on each iteration (possibly multiple times depending on how often you issue a DML statement in that proc).
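As a sketch, the loop from the question with a single transaction around it (nothing else changed) would look like this:

```sql
DECLARE @i int = 0;

-- One transaction for the whole loop: the log is hardened at COMMIT
-- instead of once per inserted row.
BEGIN TRANSACTION;

WHILE @i < 10000
BEGIN
    EXEC blah.CreateBasicRow;
    SET @i = @i + 1;
END

COMMIT TRANSACTION;
```

This stays a plain T-SQL batch, so it can still be run from a sqlcmd script.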
That said, the fastest way to go is to insert all records at once. Something like
insert into Target
select someComputedColumns
from Numbers n
WHERE n.ID <= 10000
This should execute in <<1sec for typical cases. It breaks the encapsulation of using that procedure, though.
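If no Numbers table exists, one can be generated on the fly. A minimal sketch, assuming a hypothetical target table dbo.Target with a single NOT NULL column SomeColumn (both names are placeholders):

```sql
-- Sketch only: dbo.Target and SomeColumn are hypothetical.
-- Cross-joining a system catalog view with itself yields far more than
-- 10,000 rows; ROW_NUMBER turns them into 1..10000.
WITH Numbers AS (
    SELECT TOP (10000)
           ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS ID
    FROM sys.all_objects AS a
    CROSS JOIN sys.all_objects AS b
)
INSERT INTO dbo.Target (SomeColumn)
SELECT 'filler'
FROM Numbers;
```

This is a single statement in a .sql file, so it fits the sqlcmd constraint mentioned in the update.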
It's not hard to find developers who think cursors are gauche but I am wondering how to solve the following problem without one:
Let's say I have a proc called uspStudentDelete that takes as a parameter @StudentID.
uspStudentDelete applies a bunch of cascading soft delete logic, marking a flag on tables like "classes", "grades", and so on as inactive. uspStudentDelete is well vetted and has worked for some time.
What would be the best way to run uspStudentDelete on the results of a query (e.g. select studentid from students where ... ) in TSQL?
That's exactly what cursors are intended for:
declare c cursor local for <your query here>
declare @id int
open c
fetch next from c into @id
while @@fetch_status = 0
begin
    exec uspStudentDelete @id
    fetch next from c into @id
end
close c
deallocate c
Most people who rail against cursors think you should do this in a proper client, like a C# desktop application.
The best solution is to write a set-based proc to handle the delete (try running this through a cursor to delete 10,000 records and you'll see why), or to add the set-based code to the current proc with a parameter that says whether to run the set-based or the single-record part of the proc (this at least keeps it together for maintenance purposes).
In SQL Server 2008 you can use a table variable as an input variable (a table-valued parameter). If you rewrite the proc to be set-based, you can have the same logic and run it whether the caller sends in one record or ten thousand. You may need a batch process in there, though, to avoid deleting millions of records in one go and locking up the tables for hours. Of course, if you do this you will also need to adjust how the current sp is being called.
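A minimal sketch of the table-valued parameter approach. Every object name here other than uspStudentDelete's tables "classes" and "grades" is hypothetical, as are the IsActive and IsWithdrawn columns; the real cascading logic would replace the two UPDATE statements.

```sql
-- Hypothetical table type holding the IDs to process.
CREATE TYPE dbo.StudentIdList AS TABLE (StudentID int NOT NULL PRIMARY KEY);
GO

-- Hypothetical set-based version of the soft delete.
CREATE PROCEDURE dbo.uspStudentDeleteSet
    @StudentIDs dbo.StudentIdList READONLY   -- TVPs must be READONLY
AS
BEGIN
    SET NOCOUNT ON;

    UPDATE c
    SET    c.IsActive = 0
    FROM   dbo.classes AS c
    JOIN   @StudentIDs AS s ON s.StudentID = c.StudentID;

    UPDATE g
    SET    g.IsActive = 0
    FROM   dbo.grades AS g
    JOIN   @StudentIDs AS s ON s.StudentID = g.StudentID;
END
GO

-- Caller: fill the list from any query, then make one call.
DECLARE @ids dbo.StudentIdList;

INSERT INTO @ids (StudentID)
SELECT studentid
FROM   dbo.students
WHERE  IsWithdrawn = 1;   -- hypothetical filter

EXEC dbo.uspStudentDeleteSet @StudentIDs = @ids;
```

The same procedure handles one student or ten thousand, since a single ID is just a one-row list.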
I'm working on a procedure that will update a large number of items on a remote server, using records from a local database. Here's the pseudocode.
CREATE PROCEDURE UpdateRemoteServer
pre-processing
get cursor with ID's of records to be updated
while on cursor
process the item
No matter how much we optimize it, the routine is going to take a while, so we don't want the whole thing to be processed as a single transaction. The items are flagged after being processed, so it should be possible to pick up where we left off if the process is interrupted.
Wrapping the contents of the loop ("process the item") in a begin/commit tran does not do the trick... it seems that the whole statement
EXEC UpdateRemoteServer
is treated as a single transaction. How can I make each item process as a complete, separate transaction?
Note that I would love to run these as "non-transacted updates", but that option is only available (so far as I know) in 2008.
EXEC procedure does not create a transaction. A very simple test will show this:
create procedure usp_foo
as
begin
select @@trancount;
end
go
exec usp_foo;
The @@trancount inside usp_foo is 0, so the EXEC statement does not start an implicit transaction. If a transaction is already open when you enter UpdateRemoteServer, it means somebody else started it; I can't say who.
That being said, using remote servers and DTC to update items is going to perform quite badly. Is the other server also at least SQL Server 2005? Maybe you can queue the update requests and use messaging between the local and remote server, and have the remote server perform the updates based on the info in the message. It would perform significantly better because both servers only have to deal with local transactions, and you get much better availability thanks to the loose coupling of queued messaging.
Updated
Cursors actually don't start transactions. Typical cursor-based batch processing iterates with a cursor and groups the updates into transactions of a certain size. This is fairly common for overnight jobs, as it allows for better performance (log flush throughput due to larger transaction size) and jobs can be interrupted and resumed without losing everything. A simplified version of a batch processing loop typically looks like this:
create procedure usp_UpdateRemoteServer
as
begin
    declare @id int, @batch int;
    set nocount on;
    set @batch = 0;
    declare crsFoo cursor
        forward_only static read_only
    for
        select object_id
        from sys.objects;
    open crsFoo;
    begin transaction
    fetch next from crsFoo into @id ;
    while @@fetch_status = 0
    begin
        -- process here
        declare @transactionId int;
        SELECT @transactionId = transaction_id
        FROM sys.dm_tran_current_transaction;
        print @transactionId;
        set @batch = @batch + 1
        if @batch > 10
        begin
            commit;
            print @@trancount;
            set @batch = 0;
            begin transaction;
        end
        fetch next from crsFoo into @id ;
    end
    commit;
    close crsFoo;
    deallocate crsFoo;
end
go
exec usp_UpdateRemoteServer;
I omitted the error handling part (begin try/begin catch) and the fancy @@fetch_status checks (static cursors actually don't need them anyway). This demo code shows that several different transactions are started during the run (different transaction IDs). Batches often also set a transaction savepoint at each item processed so they can safely skip an item that causes an exception, using a pattern similar to the one in my link, but this does not apply to distributed transactions since savepoints and DTC don't mix.
EDIT: as pointed out by Remus below, cursors do NOT open a transaction by default; thus, this is not the answer to the question posed by the OP. I still think there are better options than a cursor, but that doesn't answer the question.
Stu
ORIGINAL ANSWER:
The specific symptom you describe is due to the fact that a cursor opens a transaction by default, therefore no matter how you work it, you're gonna have a long-running transaction as long as you are using a cursor (unless you avoid locks altogether, which is another bad idea).
As others are pointing out, cursors SUCK. You don't need them for 99.9999% of the time.
You really have two options if you want to do this at the database level with SQL Server:
Use SSIS to perform your operation; very fast, but may not be available to you in your particular flavor of SQL Server.
Because you're dealing with remote servers, and you're worried about connectivity, you may have to use a looping mechanism, so use WHILE instead and commit batches at a time. Although WHILE has many of the same issues as a cursor (looping still sucks in SQL), you avoid creating the outer transaction.
Stu
Are you running this only from within SQL Server, or from an app? If from an app, get the list to be processed, then loop in the app and process only the subsets as required.
Then the transaction should be handled by your app, and should only lock the items being updated/pages the items are in.
NEVER process one item at a time in a loop when you are doing transactional work. You can loop through records processing groups of them, but never ever do one record at a time. Do set-based inserts instead and your performance will change from hours to minutes or even seconds. If you are using a cursor to insert, update, or delete and it isn't handling at least 1000 rows in each statement (not one at a time), you are doing the wrong thing. Cursors are an extremely poor practice for such things.
Just an idea ..
Only process a few items when the procedure is called (e.g. only get the TOP 10 items to process)
Process those
Hopefully, this will be the end of the transaction.
Then write a wrapper that calls the procedure as long as there is more work to do (either use a simple count(..) to see if there are items, or have the procedure return true to indicate that there is more work to do).
Don't know if this works, but maybe the idea is helpful.
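A rough sketch of that idea, assuming a hypothetical work table dbo.WorkItems with a Processed flag (the question says items are flagged after processing, but the table, column, and procedure names here are invented for illustration, and the UPDATE stands in for the real per-item work):

```sql
-- Hypothetical batch procedure: handles at most 10 items per call,
-- each call in its own short transaction.
CREATE PROCEDURE dbo.usp_ProcessNextBatch
    @MoreWork bit OUTPUT
AS
BEGIN
    SET NOCOUNT ON;

    BEGIN TRANSACTION;

    -- Stand-in for "process the item": flag up to 10 unprocessed rows.
    UPDATE TOP (10) dbo.WorkItems
    SET    Processed = 1
    WHERE  Processed = 0;

    COMMIT TRANSACTION;

    -- Tell the caller whether anything is left.
    SET @MoreWork = CASE WHEN EXISTS (SELECT 1 FROM dbo.WorkItems
                                      WHERE Processed = 0)
                         THEN 1 ELSE 0 END;
END
GO

-- Wrapper: keep calling until the procedure reports no more work.
DECLARE @more bit = 1;

WHILE @more = 1
BEGIN
    EXEC dbo.usp_ProcessNextBatch @MoreWork = @more OUTPUT;
END
```

Because each call commits on its own, an interruption loses at most the batch in flight, and a rerun picks up from the remaining unflagged rows.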
I have to compute a value involving data from several tables. I was wondering if using a stored procedure with cursors would offer a performance advantage compared to reading the data into a dataset (using simple select stored procedures) and then looping through the records? The dataset is not large: it consists of 6 tables, each with about 10 records, mainly GUIDs, several nvarchar(100) fields, a float column, and an nvarchar(max).
That would probably depend on the dataset you may be retrieving back (the larger the set, the more logical it may be to perform inside SQL Server instead of passing it around), but I tend to think that if you are looking to perform computations, do it in your code and away from your stored procedures. If you need to use cursors to pull the data together, so be it, but using them to do calculations and other non-retrieval functions I think should be shied away from.
Edit: This Answer to another related question will give some pros and cons to cursors vs. looping. This answer would seem to conflict with my previous assertion (read above) about scaling. Seems to suggest that the larger you get, the more you will probably want to move it off to your code instead of in the stored procedure.
alternative to a cursor
declare @table table (Fields int)
declare @count int
declare @i int = 1
insert into @table (Fields)
select Fields
from [Table]
select @count = count(*) from @table
while (@i <= @count)
begin
    --whatever you need to do
    set @i = @i + 1
end
Cursors should be faster, but if you have a lot of users running this it will eat up your server resources. Bear in mind that you have a more powerful coding language when writing loops in .NET rather than SQL.
There are very few occasions where a cursor cannot be replaced using standard set based SQL. If you are doing this operation on the server you may be able to use a set based operation. Any more details on what you are doing?
If you do decide to use a cursor bear in mind that a FAST_FORWARD read only cursor will give you the best performance, and make sure that you use the deallocate statement to release it. See here for cursor tips
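For reference, a minimal sketch of a FAST_FORWARD cursor with the matching cleanup. It reads from sys.objects only so that it runs anywhere; the real query and the per-row work are assumptions left to the reader.

```sql
DECLARE @id int;

-- LOCAL limits the cursor to this batch; FAST_FORWARD makes it
-- forward-only and read-only with performance optimizations enabled.
DECLARE c CURSOR LOCAL FAST_FORWARD FOR
    SELECT object_id
    FROM sys.objects;

OPEN c;
FETCH NEXT FROM c INTO @id;

WHILE @@FETCH_STATUS = 0
BEGIN
    -- per-row work goes here
    FETCH NEXT FROM c INTO @id;
END

CLOSE c;        -- releases the current result set and locks
DEALLOCATE c;   -- removes the cursor definition and frees its resources
```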
Cursors should be faster (unless you're doing something weird in SQL and not in ADO.NET).
That said, I've often found that cursors can be eliminated with a little bit of legwork. What's the procedure you need to do?
Cheers,
Eric