SQL Server transactional replication republication

We have a transactional replication setup where the subscriber is also a publisher to a second set of subscribers. I think this is because of the slow link between the primary publisher and the subscriber. The subscriber publishes the same set of articles to multiple local subscribers.
One problem we have is that when the primary publisher/subscriber setup needs to be reinitialized, we have to remove the second publisher/subscriber setup first. Otherwise we get errors about dropping tables: the initialization process can't drop them because they are being used for replication by the second setup.
Maybe this is the way it has to be done but I'm wondering if there is a better way. Looking for any tips or suggestions.
Thanks,
Kevin

Maybe. The procedure to add an article (sp_addarticle) takes a parameter @pre_creation_cmd that specifies what to do before creating the article. The default is "drop", but it can also be "none" (do nothing), "delete" (delete all data in the destination table), or "truncate" (truncate the destination table). In your case, I'd choose "delete", since you can't truncate a replicated table either.
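A minimal sketch of what that looks like when the article is added at the repeater; the publication and article names are placeholders, and this is untested against your topology:
exec sp_addarticle
    @publication = N'LocalPub',
    @article = N'MyTable',
    @source_owner = N'dbo',
    @source_object = N'MyTable',
    @pre_creation_cmd = N'delete';
-- For an article that already exists, sp_changearticle can set the same property:
exec sp_changearticle
    @publication = N'LocalPub',
    @article = N'MyTable',
    @property = N'pre_creation_cmd',
    @value = N'delete';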
But I have to say that if it were me, I wouldn't do that either. I'd make my re-init script a sqlcmd script that looks something like:
:connect $(REPEATER_INSTANCE)
use [$(REPEATER_DB)];
declare arts cursor for
select p.name as pub, a.name as art
from sysarticles as a
join syspublications as p
on a.pubid = p.pubid;
open arts;
declare @p sysname, @a sysname;
while(1=1)
begin
fetch next from arts into @p, @a
if (@@fetch_status <> 0)
break;
exec sp_droparticle @publication = @p, @article = @a;
end
close arts;
deallocate arts;
:connect $(PUBLISHER)
use [$(PUBLISHER_DB)];
--whatever script you use to create the publication here
Note: that's completely untested (I don't have replication set up at home), but should be pretty close.
Lastly (and rhetorically), why are you re-initializing so often? That should be a rare event. If it's not, you may have a configuration issue (e.g. if you're constantly lagging behind so far that you exceed the distributor retention, increase the distributor retention).
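For the retention change, something along these lines on the distributor is the usual approach (the 96-hour value is only an example; untested here):
exec sp_changedistributiondb
    @database = N'distribution',
    @property = N'max_distretention',
    @value = 96;  -- maximum distribution retention, in hours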


Should I use sp_getapplock to prevent multiple instances of a stored procedure that conditionally inserts?

Hear me out! I know this use case sounds suspect, but...
I have a stored procedure which checks a table (effectively a cache) for data for a given requested ID. If it doesn't find any data for that ID, or deems it out of date, it executes a second stored procedure which will pull data from a separate database (using dynamic SQL, source DB name is based on the requested ID) and insert it into the local table. It then selects from this table.
If the data is in the table, everything returns quickly (milliseconds), but if it needs to be brought in from the other database, it takes about 10 seconds. We're seeing race conditions where two concurrent instances check the local cache, see something is missing, and queue up sequential ingestions of the remote data into the cache. To avoid double-insertion, the cache-populating procedure clears whatever is already there for the ID, but that just means the first instance of the procedure can end up selecting no rows, because the second instance deleted the just-inserted records before re-inserting them itself.
I think I want to put a lock around the entire procedure (checking the cache, potentially populating the cache, selecting from the cache) - although I'm open to other solutions. I think the overall caching approach has to remain on-demand though, the remote databases come and go by the hundreds, and we only want to cache the ones actually requested by reporting as-needed.
BEGIN TRANSACTION;
BEGIN TRY
-- Take out a lock intended to prevent anyone else modifying the cache while we're reading and potentially modifying it
EXEC sp_getapplock @Resource = '[private].[cache_entries]', @LockOwner = 'Transaction', @LockMode = 'Exclusive', @LockTimeout = 120000;
-- Invoke a stored procedure that ingests any required data that is not already cached
EXEC [private].populate_cache @required_dbs;
-- CALCULATIONS
-- ... SELECT FROM [private].cache_entries
COMMIT TRANSACTION; -- Free the lock
END TRY
BEGIN CATCH --Ensure we release our lock on failure
ROLLBACK TRANSACTION;
THROW
END CATCH;
The alternative to sp_getapplock is to use locking hints within your transaction. Both are reasonable approaches. Locking hints can be complex, but they protect the target object itself rather than a single code path, so they are sometimes necessary. sp_getapplock (with Transaction as the lock owner) is simple and reliable.
You can do this without sp_getapplock, which tends to inhibit concurrency a lot.
The way to do this is to continue doing your checks within a transaction, but to apply a HOLDLOCK hint as well as an UPDLOCK hint.
HOLDLOCK (i.e. the SERIALIZABLE isolation level) places a lock not only on the ID you specify but also on the absence of such data; in other words, it prevents anyone else from inserting rows for that ID.
You must use both of these hints, and have an index that matches that SELECT, otherwise you could run into major blocking and deadlocking problems due to full table scans.
Also, you don't need a CATCH and ROLLBACK. Just use SET XACT_ABORT ON;, which ensures a rollback in the event of any error.
SET XACT_ABORT ON; -- always have this set
BEGIN TRANSACTION;
DECLARE @SomeData nvarchar(100) = (
SELECT ce.SomeColumn
FROM [private].cache_entries ce WITH (HOLDLOCK, UPDLOCK)
WHERE ce.SomeCondition = 1
);
IF @SomeData IS NULL
BEGIN
-- Invoke a stored procedure that ingests any required data that is not already cached
EXEC [private].populate_cache @required_dbs;
END
-- CALCULATIONS
-- ... SELECT FROM [private].cache_entries
COMMIT TRANSACTION; -- Free the lock
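To go with the index requirement mentioned above, an index along these lines keeps the range lock narrow; the column names are taken from the placeholder query, so adjust them to your real schema:
CREATE INDEX IX_cache_entries_SomeCondition
    ON [private].cache_entries (SomeCondition)
    INCLUDE (SomeColumn);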

SQL Server: Intermittent Invalid object name 'dbo.computed_view1'. Could not use view or function 'dbo.view2' because of binding errors

I'm using SQL Server 2008 R2.
I have a view; let's call it view1. This view is complex and slow. It cannot be made into an indexed view because it uses left joins and various other trickery. As such, we created a stored procedure which basically:
obtains an exclusive lock
selects * into computed_view1_tmp from view1; (slow)
creates indexes on the above computed table (slow)
renames computed_view1 to computed_view1_todelete; and does the same for its indexes (assumed fast)
renames computed_view1_tmp to computed_view1; and does the same for its indexes (assumed fast)
drops the table computed_view1_todelete (slow)
releases the lock.
We run this procedure when we know we're changing the data in our web application. We then have other views, such as view2 using computed_view1 instead of view1.
Once in a while, we get:
Invalid object name 'dbo.computed_view1'. Could not use view or
function 'dbo.view2' because of binding errors.
I assume this is because we're trying to access dbo.computed_view1 at the same time as it's being renamed. I assume this is a very short period, but the frequency I am seeing this error in my logs makes me wonder if something else might be at play. I'm getting the error many times per day on a site with about a dozen users active throughout the day.
In development, this procedure takes about five seconds given the amount of data in the view. Renaming is instantaneous. In production, it must be taking longer but I don't understand why. I once saw the procedure fail to obtain the exclusive lock within 90 seconds.
Any thoughts on how to fix or a better solution?
Edit: Extra notes on my locking - maybe I'm not doing this right:
BEGIN TRANSACTION
DECLARE @result int
EXEC @result = sp_getapplock @Resource = 'lock_computed_view1', @LockMode = 'Exclusive', @LockTimeout = 90
IF @result NOT IN ( 0, 1 ) -- Only successful return codes
BEGIN
PRINT @result
RAISERROR ( 'Lock failed to acquire...', 16, 1 )
END
ELSE
BEGIN
-- rest of the magic
END
EXEC @result = sp_releaseapplock @Resource = 'lock_computed_view1'
COMMIT TRANSACTION
If your locking and transaction scope are right, I would expect other transactions to wait and never see the view missing. This might be a SQL Server idiosyncrasy that I don't know about.
It is often possible to do without dynamic DDL. Here are two ways to do it:
TRUNCATE the computed table and insert into it. This takes an exclusive lock automatically. No need to rename. All of this is atomic and supports rollback.
Use a staging table with the same schema and work on that; so far, no service interruption at all. Then use ALTER TABLE ... SWITCH to swap the staging table with the production table (see the sketch below). This is quick and atomic, and it does not require Enterprise Edition.
With these approaches the problem is solved by just not renaming.
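A rough sketch of the second approach using the table names from the question; it's untested and assumes the staging and "_old" tables already exist with an identical schema, identical indexes, and the same filegroup:
-- Load the staging table while the production table stays online
TRUNCATE TABLE dbo.computed_view1_staging;
INSERT INTO dbo.computed_view1_staging SELECT * FROM dbo.view1;
-- Swap: both SWITCH statements are metadata-only and effectively instant
BEGIN TRANSACTION;
ALTER TABLE dbo.computed_view1 SWITCH TO dbo.computed_view1_old;      -- _old must be empty
ALTER TABLE dbo.computed_view1_staging SWITCH TO dbo.computed_view1;  -- target is now empty
COMMIT TRANSACTION;
-- Clean out the previous contents outside the critical section
TRUNCATE TABLE dbo.computed_view1_old;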

How do I change a subscriber schema?

I am currently working on a prototype to allow a client to update a subscriber database schema so that they can eventually change the subscriber to match the new version of their application then switch over to that database when they deploy the front end code.
My hope was that I could issue schema changes (for example, change a column data type) while keeping the replication stored procedures up to date to properly convert any replicated data. While the subscriber might hold locks on the big tables being updated, it could then just queue up the changes from the publisher instead of causing locking issues with the still-running application. I hope I'm explaining this well enough...
Here's what I tried:
BEGIN TRANSACTION
BEGIN TRY
UPDATE dbo.Big_Table SET some_string = REPLACE(some_string, ',', '')
ALTER TABLE dbo.Big_Table ALTER COLUMN some_string INT
DECLARE @sql VARCHAR(MAX)
SET @sql = 'create procedure [dbo].[sp_MSins_dboBig_Table]
@c1 bigint,
@c2 varchar(20),
@c3 varchar(30)
as
begin
declare @c2_new int
set @c2_new = cast(replace(@c2, '','', '''') as int)
insert into [dbo].[Big_Table] (
[my_id],
[some_string],
[another_string]
)
values (
@c1,
@c2_new,
@c3
)
end -- '
EXEC(@sql)
COMMIT TRANSACTION
END TRY
BEGIN CATCH
ROLLBACK TRANSACTION
END CATCH
This specific script would change a VARCHAR column that contains numeric data into an INT while at the same time removing any commas that might be included in a number like "1,325".
The problem is, this causes blocking at the publisher as well. I've seen references to pausing replication, but none of them have concrete steps to follow (I don't have a lot of replication experience). It's typically, "turn off some jobs".
I also saw a page on switching updating modes, but I think that only applies to update-able subscribers.
Any suggestions on how to handle this situation?
How do you have replication set up now if the publisher and subscriber schema don't match? This is a really "interesting" setup. In general, messing with the default procedures is going to cause a headache. Because they're system maintained, your new version could be overwritten at any point (though in practice, this wouldn't happen). If you don't have an updating subscriber, the subscriber should be treated as read-only lest you break replication. Perhaps I'm missing something, though.

Finding Caller of SQL Function

There's a SQL function that I'd like to remove from a SQL Server 2005 database, but first I'd like to make sure that there's no one calling it. I've used the "View Dependencies" feature to remove any reference to it from the database. However, there may be web applications or SSIS packages using it.
My idea was to have the function insert a record in an audit table every time it was called. However, this will be of limited value unless I also know the caller. Is there any way to determine who called the function?
You can call extended stored procedures from a function.
Some examples are:
xp_cmdshell
xp_regwrite
xp_logevent
If you had the correct permissions, theoretically you could call an extended stored procedure from your function and store information like APP_NAME() and ORIGINAL_LOGIN() in a flat file or a registry key.
Another option is to build an extended stored procedure from scratch.
If all this is too much trouble, I'd follow the earlier recommendation of SQL Profiler or server-side tracing.
An example of using an extended stored procedure is below. This uses xp_logevent to log every instance of the function call in the Windows application log.
One caveat of this method is that if the function is applied to a column in a SELECT query, it will be called for every row that is returned. That means there is a possibility you could quickly fill up the log.
Code:
USE [master]
GO
/* A security risk but will get the job done easily */
GRANT EXECUTE ON xp_logevent TO PUBLIC
GO
/* Test database */
USE [Sandbox]
GO
/* Test function which always returns 1 */
CREATE FUNCTION ufx_Function() RETURNS INT
AS
BEGIN
DECLARE
@msg VARCHAR(4000),
@login SYSNAME,
@app SYSNAME
/* Gather critical information */
SET @login = ORIGINAL_LOGIN()
SET @app = APP_NAME()
SET @msg = 'The function ufx_Function was executed by '
+ @login + ' using the application ' + @app
/* Log this event */
EXEC master.dbo.xp_logevent 60000, @msg, warning
/* Resume normal function */
RETURN 1
END
GO
/* Test */
SELECT dbo.ufx_Function()
This depends on your current security model. We use connection pooling with one SQL account per application, i.e. each application has its own account to connect to the database. If that's the case for you, you could run a SQL Profiler session to find the caller of that function; whichever account is calling the function will relate directly to one application.
This works for us with the way we handle SQL traffic; I hope it does the same for you.
try this to search the code:
--declare and set a value of @SearchValue to be your function name
SELECT DISTINCT
s.name+'.'+o.name AS Object_Name,o.type_desc
FROM sys.sql_modules m
INNER JOIN sys.objects o ON m.object_id=o.object_id
INNER JOIN sys.schemas s ON o.schema_id=s.schema_id
WHERE m.definition Like '%'+@SearchValue+'%'
ORDER BY 1
to find the caller at run time, you might try using CONTEXT_INFO
--in the code chain doing the suspected function call:
DECLARE @CONTEXT_INFO varbinary(128)
,@Info varchar(128)
SET @Info='????'
SET @CONTEXT_INFO =CONVERT(varbinary(128),'InfoForFunction='+ISNULL(@Info,'')+REPLICATE(' ',128))
SET CONTEXT_INFO @CONTEXT_INFO
--after the suspected function call
SET CONTEXT_INFO 0x0 --reset CONTEXT_INFO
--here is the portion to put in the function:
DECLARE @Info varchar(128)
,@sCONTEXT_INFO varchar(128)
SET @sCONTEXT_INFO=CONVERT(varchar(128),CONTEXT_INFO())
IF LEFT(@sCONTEXT_INFO,16)='InfoForFunction=' --the marker 'InfoForFunction=' is 16 characters long
BEGIN
SET @Info=RIGHT(RTRIM(@sCONTEXT_INFO),LEN(RTRIM(@sCONTEXT_INFO))-16)
END
--use the @Info
SELECT @Info,@sCONTEXT_INFO
if you put different values in @CONTEXT_INFO in various places, you can narrow down who is calling the function, and refine the value until you find it.
You can try using APP_NAME() and USER_NAME(). It won't give you specifics (like an SSIS package name), but it might help.
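For reference, a quick way to see the kind of values those functions give you (a scalar function can't write them to a table itself, which is why the xp_logevent and CONTEXT_INFO workarounds above exist):
SELECT APP_NAME()       AS calling_application,
       USER_NAME()      AS database_user,
       ORIGINAL_LOGIN() AS login_name,
       HOST_NAME()      AS client_host;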
This will help you find whether it is being called anywhere in your database:
select object_name(id) from sys.syscomments where text like '%<FunctionName>%'
Another far less elegant way is to grep -R [functionname] * through your source code. This may or may not be workable depending on the amount of code.
This has the advantage of working even if that part of the code only gets used very infrequently, which would be a big problem with your audit table idea.
You could run a trace in the profiler to see if that function is called for a week (or whatever you consider a safe window).
I think that you might also be able to use OPENROWSET to call an SP which logs to a table if you enable ad-hoc queries.
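Roughly, that loopback call would look like the sketch below; the provider, connection string, and procedure name are made up for illustration, the logging procedure has to return a result set for OPENROWSET to accept it, and "Ad Hoc Distributed Queries" must be enabled on the server, so treat it as untested:
SELECT *
FROM OPENROWSET('SQLNCLI',
                'Server=(local);Trusted_Connection=yes;',
                'EXEC dbo.usp_log_function_call');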

Transactions within loop within stored procedure

I'm working on a procedure that will update a large number of items on a remote server, using records from a local database. Here's the pseudocode.
CREATE PROCEDURE UpdateRemoteServer
pre-processing
get cursor with ID's of records to be updated
while on cursor
process the item
No matter how much we optimize it, the routine is going to take a while, so we don't want the whole thing to be processed as a single transaction. The items are flagged after being processed, so it should be possible to pick up where we left off if the process is interrupted.
Wrapping the contents of the loop ("process the item") in a begin/commit tran does not do the trick... it seems that the whole statement
EXEC UpdateRemoteServer
is treated as a single transaction. How can I make each item process as a complete, separate transaction?
Note that I would love to run these as "non-transacted updates", but that option is only available (so far as I know) in 2008.
EXEC procedure does not create a transaction. A very simple test will show this:
create procedure usp_foo
as
begin
select @@trancount;
end
go
exec usp_foo;
The @@trancount inside usp_foo is 0, so the EXEC statement does not start an implicit transaction. If you have a transaction started when entering UpdateRemoteServer, it means somebody started that transaction; I can't say who.
That being said, using remote servers and DTC to update items is going to perform quite badly. Is the other server also at least SQL Server 2005? Maybe you can queue the update requests and use messaging between the local and remote servers, having the remote server perform the updates based on the info in the messages. It would perform significantly better because both servers only have to deal with local transactions, and you get much better availability due to the loose coupling of queued messaging.
Updated
Cursors actually don't start transactions. Typical cursor-based batch processing iterates with a cursor and groups the updates into transactions of a certain size. This is fairly common for overnight jobs, as it allows for better performance (log-flush throughput due to larger transaction size) and jobs can be interrupted and resumed without losing everything. A simplified version of such a batch-processing loop typically looks like this:
create procedure usp_UpdateRemoteServer
as
begin
declare @id int, @batch int;
set nocount on;
set @batch = 0;
declare crsFoo cursor
forward_only static read_only
for
select object_id
from sys.objects;
open crsFoo;
begin transaction
fetch next from crsFoo into @id ;
while @@fetch_status = 0
begin
-- process here
declare @transactionId int;
SELECT @transactionId = transaction_id
FROM sys.dm_tran_current_transaction;
print @transactionId;
set @batch = @batch + 1
if @batch > 10
begin
commit;
print @@trancount;
set @batch = 0;
begin transaction;
end
fetch next from crsFoo into @id ;
end
commit;
close crsFoo;
deallocate crsFoo;
end
go
exec usp_UpdateRemoteServer;
I omitted the error handling part (begin try/begin catch) and the fancy @@fetch_status checks (static cursors actually don't need them anyway). This demo code shows that during the run there are several different transactions started (different transaction IDs). Batches often also set a transaction savepoint for each item processed so they can safely skip an item that causes an exception, using a pattern similar to the one in my link, but this does not apply to distributed transactions since savepoints and DTC don't mix.
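For completeness, the per-item savepoint pattern mentioned there looks roughly like this around the processing of a single item (dbo.ProcessItem is a hypothetical stand-in for the real work; as noted, this doesn't work with distributed transactions):
-- inside the loop, while a transaction is open
SAVE TRANSACTION item_sp;
BEGIN TRY
    EXEC dbo.ProcessItem @id;   -- hypothetical per-item work
END TRY
BEGIN CATCH
    -- undo only this item's work and carry on with the rest of the batch
    ROLLBACK TRANSACTION item_sp;
END CATCH;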
EDIT: as pointed out by Remus below, cursors do NOT open a transaction by default; thus, this is not the answer to the question posed by the OP. I still think there are better options than a cursor, but that doesn't answer the question.
Stu
ORIGINAL ANSWER:
The specific symptom you describe is due to the fact that a cursor opens a transaction by default, therefore no matter how you work it, you're gonna have a long-running transaction as long as you are using a cursor (unless you avoid locks altogether, which is another bad idea).
As others are pointing out, cursors SUCK. You don't need them for 99.9999% of the time.
You really have two options if you want to do this at the database level with SQL Server:
Use SSIS to perform your operation; very fast, but may not be available to you in your particular flavor of SQL Server.
Because you're dealing with remote servers, and you're worried about connectivity, you may have to use a looping mechanism, so use a WHILE loop instead and commit a batch at a time. Although WHILE has many of the same issues as a cursor (looping still sucks in SQL), you avoid creating the outer transaction.
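A minimal sketch of that WHILE-based batching (the table, flag column, and batch size are placeholders; the UPDATE stands in for whatever per-batch work you actually do):
DECLARE @rows int;
SET @rows = 1;
WHILE @rows > 0
BEGIN
    BEGIN TRANSACTION;
    -- process (and flag) up to 1000 unprocessed items in one set-based statement
    UPDATE TOP (1000) t
    SET    processed = 1
    FROM   dbo.items_to_update AS t
    WHERE  t.processed = 0;
    SET @rows = @@ROWCOUNT;
    COMMIT TRANSACTION;   -- each batch is its own transaction
END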
Stu
Are you running this only from within SQL Server, or from an app? If the latter, get the list to be processed, then loop in the app to process only the subsets required.
The transactions would then be handled by your app, and would only lock the items being updated (or the pages they are in).
NEVER process one item at a time in a loop when you are doing transactional work. You can loop through records processing groups of them, but never ever do one record at a time. Do set-based inserts instead and your performance will change from hours to minutes or even seconds. If you are using a cursor to insert, update, or delete and it isn't handling at least 1000 rows in each statement (not one at a time), you are doing the wrong thing. Cursors are an extremely poor practice for this kind of work.
Just an idea ..
Only process a few items when the procedure is called (e.g. only get the TOP 10 items to process)
Process those
Hopefully, this will be the end of the transaction.
Then write a wrapper that calls the procedure as long as there is more work to do (either use a simple COUNT(..) to see if there are items left, or have the procedure return a flag indicating that there is more work to do).
Don't know if this works, but maybe the idea is helpful.
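If it helps, the wrapper could be as simple as the loop below (the procedure and table names are placeholders for whatever your "more work to do" check really is):
WHILE EXISTS (SELECT 1 FROM dbo.items_to_update WHERE processed = 0)
BEGIN
    -- processes and flags the next TOP (10) unprocessed items,
    -- committing its own small transaction on each call
    EXEC dbo.UpdateRemoteServerBatch;
END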