SQL 2005 Transactional Replication: Behavior during snapshot processing?

So, I've got SQL (2005) Transactional Replication generally working well with a single publisher and single (read-only) subscriber. Data changes and updates flow perfectly, with about 5 second latency, which is just fine.
My one nagging problem, that I've spent a couple days trying to solve (and Googling everywhere for answers) is that new sprocs/tables/etc. do not get propagated to the read-only subscriber, even though I've added them as "articles" to the "publication". The publication has "transmit schema changes" set to ON, and stored procedures are set to transfer their definitions. But, for some reason, they don't.
My "snapshot agent" process is set to NOT SCHEDULED. (In other words, it only happens once, when I initiate it manually.) Should I be putting this on a schedule to enable the transfer of new or modified tables and sprocs?
I thought the mere act of adding the object as an article to the publication would do it, but it's still not sending it unless I do a snapshot. The WAN connecting these is totally fast and reliable, so that's not the issue, and table-data-updates transfer relatively fast and flawlessly.
While I could put my snapshot agent on a schedule, does this have any real-time production impacts for users of the main publication database or the read-only copy? (My site currently gets 4+ million unique-users a month, so I'd like to have minimal disruption...) Thanks!

Transactional replication only reads the DML (Data Manipulation Language) statements from the transaction log of the source (publication) database and distributes them to the subscribers.
New tables and stored procedures are not replicated to the subscriber automatically. Schema changes in this particular context, although I have to admit it is a little unclear in some of the Books Online documentation, refer to the existing schema, i.e. if you were to add a column to an existing published table, this change would be propagated to the subscribers.
For clarification, here is a Microsoft article that details the schema changes that you can make:
http://msdn.microsoft.com/en-us/library/ms151870(SQL.90).aspx
I hope this helps. Replication is a big subject area so please let me know if I can be of further assistance.
Oh yes, you are correct, if you add new articles to your publication you will need to create an updated snapshot.
Cheers,

Related

Adding table(s) to replicated DB

I recently added two new tables to a db that is currently being transactionally replicated. Short of dropping and recreating the entire publication, is there a way to quickly add these two new tables to the existing publication? Will I have to take an entirely new snapshot? I only ask because this is a production db and can't be stopped until nighttime; lockups will cause major issues.
Thanks - Travis

DISCLAIMER: All replication has the potential to lock entire databases as it reads the entire log. Changes should be thoroughly tested outside of Production and implemented off hours.
For basic transactional replication, you can use sp_addarticle and sp_addsubscription for each table without affecting the existing subscriptions. If you initialized the current subscription with sp_addsubscription @article = 'all' (the default), it may not let you add additional articles, in which case you will have to drop the existing subscriptions or create a new publication.
You won't necessarily have to take a snapshot for the existing subscriptions even if you do have to drop them, but you take responsibility for keeping the data in sync. You should use triggers or other methods to lock down changes before dropping subscriptions, and recreate them using sp_addsubscription @sync_type = 'replication support only'. If all subscriptions are created this way, a snapshot will not be generated. If only the new articles are subscribed with @sync_type = 'automatic', then only those articles will be present in the new snapshot. Afterwards, you should verify data integrity between publisher and subscriber.
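
As a rough illustration of the sequence described above, here is a small Python helper that only renders the T-SQL calls as text; it does not connect to any server. The procedure names (sp_addarticle, sp_addsubscription, sp_startpublication_snapshot) and the sync types are the documented ones, but the publication, table, subscriber and database names are made-up placeholders, and a real publication may need further parameters. Treat it as a sketch to adapt and test outside of Production, not as a finished script.

```python
def tsql_literal(value):
    """Render a Python string as a T-SQL Unicode string literal."""
    return "N'" + value.replace("'", "''") + "'"


def add_article_script(publication, table, subscriber, destination_db,
                       sync_type="automatic", owner="dbo"):
    """Build T-SQL that adds one table to an existing publication.

    All argument values are hypothetical placeholders supplied by the caller.
    """
    pub = tsql_literal(publication)
    art = tsql_literal(table)
    statements = [
        # Register the table as an article of the existing publication.
        "EXEC sp_addarticle @publication = {0}, @article = {1}, "
        "@source_owner = {2}, @source_object = {1};".format(
            pub, art, tsql_literal(owner)),
        # Subscribe the existing subscriber to the new article only.
        "EXEC sp_addsubscription @publication = {0}, @article = {1}, "
        "@subscriber = {2}, @destination_db = {3}, @sync_type = {4};".format(
            pub, art, tsql_literal(subscriber),
            tsql_literal(destination_db), tsql_literal(sync_type)),
    ]
    if sync_type == "automatic":
        # Only an 'automatic' subscription needs the Snapshot Agent to run.
        statements.append(
            "EXEC sp_startpublication_snapshot @publication = {0};".format(pub))
    return "\n".join(statements)


# Hypothetical names, for illustration only.
print(add_article_script("SalesPub", "Orders", "REPORTSRV", "SalesCopy"))
```

Passing sync_type="replication support only" leaves out the snapshot call, which matches the case above where you keep the data in sync yourself.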

Daily Data subset of main database

I have a large DB (500 GB). One of our customers wants a daily snapshot of only his data. He only has a 3 Mb connection, and I suspect that is the max! What is the most effective method I could use?
1. Views that are updated, but it wants the underlying tables.
2. Replication. I don't know much about this.
3. Alternative method.

I would use merge replication, which allows you to initialize a subscriber without using a snapshot. You will have to initialize replication on the subscriber from a backup. You can possibly do the same with transactional replication, but it just never worked quite the same for me. YMMV. When it breaks (etc.) you will have to be prepared to ship a new backup and start over (hacking replication sometimes works, but don't count on it). Changing database structures is also a pain once they are in replication. I have seen ~500 GB with less than 3 Mbit work in production, though not without proper planning and preparation (and grey hair).
I have used transactional replication with a read-only subscription, where I invoked the distribution with a batch file on a task schedule once a day. Transactional replication did not feel as maintained (by MS) or as stable as merge replication, though the data integrity with transactional was more consistent.
I have not tried transaction log shipping, but that might also be an option.
(p.s. notice I didn't say "If it breaks")
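
To see why initializing from a shipped backup matters here, a quick back-of-the-envelope calculation helps. This is only a sketch: it assumes decimal gigabytes, a link that really delivers 3 Mbit/s the whole time, and no compression or protocol overhead. The 2 GB daily subset is an invented figure for comparison.

```python
def transfer_days(size_gb, link_mbit_per_s, efficiency=1.0):
    """Days needed to push size_gb (decimal gigabytes) over the link.

    efficiency is the fraction of the nominal bandwidth actually usable
    (1.0 means the full link, which is optimistic).
    """
    bits = size_gb * 1e9 * 8
    seconds = bits / (link_mbit_per_s * 1e6 * efficiency)
    return seconds / 86400


full_db_days = transfer_days(500, 3)        # the whole 500 GB database
subset_hours = transfer_days(2, 3) * 24     # hypothetical 2 GB of one customer's data
print(round(full_db_days, 1), "days;", round(subset_hours, 1), "hours")
```

Under those assumptions the full database would need roughly 15 days on the wire, while a couple of gigabytes fits into a nightly window, so only the initial load has to travel some other way.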

Creating tables in SQL Server 2005 master DB

I am adding a monitoring script to check the size of my DB files so I can deliver a weekly report which shows each files size and how much it grew over the last week. In order to get the growth, I was simply going to log a record into a table each week with each DB's size, then compare to the previous week's results. The only trick is where to keep that table. What are the trade-offs in using the master DB instead of just creating a new DB to hold these logs? (I'm assuming there will be other monitors we will add in the future)

The main reason is that master is not calibrated for additional load: it is not installed on an IO system with proper capacity planning, it is hard to move to a new IO location, its maintenance plan takes backups and log backups only as frequently as needed for a very low volume of activity, and its initial size and growth rate are planned as if no changes are expected. Another reason against it is that in many troubleshooting scenarios you would want a copy of the database to inspect, but for that you'd have to attach a new master to your instance. These are the main reasons why adding objects to master is discouraged. Also, many admins understandably prefer an application to use its own database so it can be properly accounted for, and ultimately easily uninstalled.
Similar problems exist for msdb, but if push comes to shove it would be better to store app data in msdb rather than master, since the former is an ordinary database (despite the widespread belief that it is a system database, it actually is not).

The Master DB is a system database that belongs to SQL Server. It should not be used for any other purposes. Create your own DB to hold your logs.

I would refrain from putting anything in master, it could be overwritten/recreated on an upgrade.
I have put a DBA only ServerInfo database on each server for uses like this, as well as any application specific environmental things (things that differ between prod and test and dev).
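
For what it's worth, the weekly log from the question needs very little: one table and one self-join. The sketch below uses Python's built-in sqlite3 purely as a stand-in for a dedicated DBA database so that it runs anywhere; the table name, column names and sample figures are all invented. On SQL Server the samples would come from something like sys.master_files, whose size column is counted in 8 KB pages.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for a dedicated DBA database
conn.execute("""
    CREATE TABLE FileSizeLog (
        logged_on TEXT NOT NULL,   -- ISO date of the weekly sample
        db_name   TEXT NOT NULL,
        file_name TEXT NOT NULL,
        size_mb   REAL NOT NULL,
        PRIMARY KEY (logged_on, db_name, file_name)
    )
""")


def log_sizes(logged_on, samples):
    """Store one weekly sample; samples is a list of (db, file, size_mb)."""
    conn.executemany(
        "INSERT INTO FileSizeLog VALUES (?, ?, ?, ?)",
        [(logged_on, db, name, mb) for db, name, mb in samples])


def weekly_growth(this_week, last_week):
    """Return (db_name, file_name, size_mb, growth_mb) rows for the report."""
    return conn.execute("""
        SELECT cur.db_name, cur.file_name, cur.size_mb,
               cur.size_mb - prev.size_mb AS growth_mb
        FROM FileSizeLog AS cur
        LEFT JOIN FileSizeLog AS prev
               ON prev.db_name = cur.db_name
              AND prev.file_name = cur.file_name
              AND prev.logged_on = ?
        WHERE cur.logged_on = ?
        ORDER BY cur.db_name, cur.file_name
    """, (last_week, this_week)).fetchall()


# Invented sample data for two consecutive weeks.
log_sizes("2009-06-01", [("Sales", "Sales.mdf", 1000.0), ("Sales", "Sales_log.ldf", 200.0)])
log_sizes("2009-06-08", [("Sales", "Sales.mdf", 1150.0), ("Sales", "Sales_log.ldf", 200.0)])
report = weekly_growth("2009-06-08", "2009-06-01")
for row in report:
    print(row)
```

The LEFT JOIN keeps files that have no sample from the previous week; their growth simply comes back as NULL (None in Python).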

You should add a separate database for the logging. There is no guarantee that your objects in the master database will not break the next patch of SQL Server if you leave them in there.
And Microsoft itself advises you not to do it.
http://msdn.microsoft.com/en-us/library/ms187837.aspx

Using Sql Server Replication

We are using Replication and seem to be having endless problems with it. It seems to shut down for unknown reasons. It needs to be shut down to remove a column and only starts back up half the time. Does anyone have any advice on how to properly use replication, or some alternatives to it?
Edit:
We are using SQL Server 2005. We cannot use database mirroring as we use the other database for reporting. As far as I am aware, you cannot query a mirrored database.

If you need just a couple of tables from your DB for reports, replication is more useful, but you can also set up log shipping with the secondary server in STANDBY mode (especially if you need a significant part of your data for reports); then you can run reports on the secondary server. You just have to remember that log shipping will interfere with transaction log backups, so you have to use the same folder with log backup files for both processes.

I would think the combination of database mirroring and database snapshots will solve your issues.
First, database mirroring is very easy to set up and I have never had any problems with it (I have been using it for the past 4+ years).
Second, creating a database snapshot on your failover server will allow you to run reports. You can set up a SQL Agent job to drop and re-create the snapshot at whatever interval you find acceptable.
Of course this all depends on whether you need your reports to run on real-time data or if they can be delayed somewhat.
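
A job step for that drop-and-re-create cycle could look roughly like what the helper below prints. The Python only builds the T-SQL text so the shape is easy to see; the database name, snapshot name, logical file name and path are placeholders, and a real source database needs one entry for every data file it has.

```python
def bracket(name):
    """Quote an identifier T-SQL style, doubling any closing bracket."""
    return "[" + name.replace("]", "]]") + "]"


def refresh_snapshot_script(source_db, snapshot_name, data_files):
    """Build T-SQL that drops and re-creates a database snapshot.

    data_files maps each logical data file name of the source database to
    the path of the sparse file backing it (all values are placeholders).
    """
    file_specs = ",\n    ".join(
        "(NAME = {0}, FILENAME = N'{1}')".format(
            bracket(logical), path.replace("'", "''"))
        for logical, path in data_files.items())
    return "\n".join([
        # Drop the previous snapshot if one exists.
        "IF DB_ID(N'{0}') IS NOT NULL".format(snapshot_name.replace("'", "''")),
        "    DROP DATABASE {0};".format(bracket(snapshot_name)),
        # Re-create it from the current state of the source (mirror) database.
        "CREATE DATABASE {0} ON".format(bracket(snapshot_name)),
        "    " + file_specs,
        "AS SNAPSHOT OF {0};".format(bracket(source_db)),
    ])


# Hypothetical names and path, for illustration only.
print(refresh_snapshot_script(
    "Sales", "Sales_Report", {"Sales_Data": "D:\\Snap\\Sales_Data.ss"}))
```

Reports would then connect to the snapshot name (Sales_Report in this made-up example), and they see the data as of the last refresh.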

Here is a list of the problems that I have had to resolve to get replication working:
1) The replication sometimes lies to me and tells me this, even when it's working fine:
"The server 'Bob' is not a Subscriber. (.Net SqlClient Data Provider)" I have tried to re-initialise it, thinking that it was broken, and it never was...
2) It can take a little while to restart itself, especially if your remote DB is on the other side of the planet, which it is in my case. If you are on a slow network connection, or it is not 100% reliable, then you can have problems. Also, the jobs which restart the process can sometimes take a while to run, which delays things further.
3) Some changes require full re-initialisation, which involves sending a new snapshot out. If you don't have your permissions quite right, so that you can re-initialise manually but it doesn't happen automatically, then this can be another reason for problems.

We have SQL transactional replication which runs perfectly happily. You seem to say that it is when you are making schema changes to the publisher that you get problems. Each time we do a schema change we drop the publication, the subscription and the subscription database, do the change, then rebuild it all. We can do this because we can tolerate the time it takes to re-apply the snapshot. There are ways to apply schema changes to the publication and have them propagate to the subscriber. Take a look at sp_register_custom_scripting. We have made this work once, so I can give some more information about it if you need.
As @Jason says, you can report from a mirrored database by using a snapshot. Beware that the snapshot will take up space and cause more work for the mirror server, although how much space will depend on how much data is changing and how big your original database is. We do use a snapshot on a mirrored database for occasional reports because our entire database is not replicated.

log shipping http://msdn.microsoft.com/en-us/library/ms187103.aspx

What version of SQL Server are you using?
We're using replication now for a particular solution, and it seems to just work, day in, day out.
I would examine your event logs and SQL Server logs to see if you can determine why it is shutting down, and why it doesn't start up.
Are you possibly patching the servers, or are you having network errors?
The alternatives to replication are log shipping, or database mirroring.
I personally prefer Database Mirroring, but it really depends what you're trying to do, as some of these aren't appropriate for certain situations.

We also have used SQL transactional replication. We had the same pains with updating schema, which requires dropping the publication on all servers, performing the updates, and then reinitializing replication, and hoping for the best. Sometimes it would not initialize, or a node would fall behind and we'd get little warning for it. A few times we even lost all the stored procedure execute permissions causing pretty much total failure on the websites.
We have a rather large database so reinitialization could take quite some time, meaning all updates had to be done at 2am on Sunday - not exactly when we're awake and alert and able to use all our faculties to deal with a problem that might arise.
We are ditching replication in favor of failover clustering on SQL 2008, but it can still be done all the way back to SQL 2000.
http://technet.microsoft.com/en-us/library/cc917693.aspx

Performance questions for SQL Cache Dependency

I'm working on a project where we are thinking of using SQLCacheDependency with SQL Server 2005/2008 and we are wondering how this will affect the performance of the system.
So we are wondering about the following questions
Can the number of SQLCacheDependency objects (query notifications) have a negative effect on SQL Server performance, i.e. on insert, update and delete operations on the affected tables?
What effect (performance-wise) would, for example, 50000 different query notifications on a single table have in SQL Server 2005/2008 on insertion and deletion on that table?
Are there any recommendations on how to use SQLCacheDependencies? Any official dos and don'ts? We have found some information on the internet but haven't found information on performance implications.
If there is anyone here that has some answers to these questions that would be great.

SQL cache dependency using the polling mechanism should not be a load on the SQL Server or the application server.
Let's look at the steps needed for SqlCacheDependency to work and analyze them:
1. The database is enabled for SqlCacheDependency.
2. A table, say 'Employee', is enabled for SqlCacheDependency (this can be any number of tables).
3. Web.config is updated to enable SqlCacheDependency.
4. The page where you are using SQL cache dependency is configured.
That's it.
Internally:
Step 1 creates a table 'AspNet_SqlCacheTablesForChangeNotification' in the database, which will store the names of the tables (here 'Employee') for which SqlCacheDependency is enabled, and adds some stored procedures as well.
Step 2 inserts an 'Employee' entry into the 'AspNet_SqlCacheTablesForChangeNotification' table. It also creates an insert/update/delete trigger on the 'Employee' table.
Step 3 enables the application for SqlCacheDependency by providing the connection string and poll time.
Whenever there is a change in the 'Employee' table, the trigger fires, which in turn updates the 'AspNet_SqlCacheTablesForChangeNotification' table.
The application then polls the database, say every 5000 ms, and checks for any changes to the 'AspNet_SqlCacheTablesForChangeNotification' table. If there are any changes, the respective cache entries are removed from memory.
The result is the benefit of caching combined with fresh data (the data can be at most 5 seconds stale). The polling is handled by a background process, which should not be a performance hurdle because, as you can see from the list above, none of the tasks is CPU-demanding.
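
The mechanism described above can be imitated in a few lines, which makes it easier to see how little work each poll does. This is a simulation only, not the ASP.NET implementation: it uses Python's built-in sqlite3 instead of SQL Server, and the ChangeNotification table, the trigger names and the PolledCache class are all invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employee (id INTEGER PRIMARY KEY, name TEXT);

    -- One row per watched table; change_id is bumped on every modification.
    CREATE TABLE ChangeNotification (
        table_name TEXT PRIMARY KEY,
        change_id  INTEGER NOT NULL
    );
    INSERT INTO ChangeNotification VALUES ('Employee', 0);

    CREATE TRIGGER Employee_ins AFTER INSERT ON Employee BEGIN
        UPDATE ChangeNotification SET change_id = change_id + 1
        WHERE table_name = 'Employee';
    END;
    CREATE TRIGGER Employee_upd AFTER UPDATE ON Employee BEGIN
        UPDATE ChangeNotification SET change_id = change_id + 1
        WHERE table_name = 'Employee';
    END;
    CREATE TRIGGER Employee_del AFTER DELETE ON Employee BEGIN
        UPDATE ChangeNotification SET change_id = change_id + 1
        WHERE table_name = 'Employee';
    END;
""")


class PolledCache:
    """Cache whose entries are dropped when the watched table changes."""

    def __init__(self, conn, table_name):
        self.conn = conn
        self.table_name = table_name
        self.items = {}
        self.seen_change_id = self._current_change_id()

    def _current_change_id(self):
        row = self.conn.execute(
            "SELECT change_id FROM ChangeNotification WHERE table_name = ?",
            (self.table_name,)).fetchone()
        return row[0]

    def poll(self):
        """Call once per poll interval; returns True if the cache was cleared."""
        current = self._current_change_id()
        if current == self.seen_change_id:
            return False
        self.items.clear()
        self.seen_change_id = current
        return True


cache = PolledCache(conn, "Employee")
cache.items["all_employees"] = []
print(cache.poll())   # nothing changed since the cache was filled
conn.execute("INSERT INTO Employee (name) VALUES ('Ann')")
print(cache.poll())   # the trigger bumped change_id, so the cache is cleared
```

Each poll is a single-row lookup by primary key, and each data change costs one extra single-row update inside the trigger, which is the reason the polling approach is described above as cheap.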

SQLCacheDependency is implemented as an indexed view, and every time the table is modified this view's index gets changed. So many views (SQLCacheDependency objects) on the same table mean quite a perf hit for modifications. However, if you have 1 view (SQLCacheDependency object) per table you should have no problems.
The cache-changed notification is async and is triggered when the server has resources.

You're right, not much information on this is provided but there's a phrase related to your question in this page http://msdn.microsoft.com/en-us/library/ms178604%28VS.80%29.aspx
"The database operations associated with SQL cache dependency are simple and therefore do not incur a heavy processing cost on the server."
Hope this helps you although your question is a little bit old already.

This page appears to have some good info on setup and on which technique to use (granted, I did just skim it).

All I can provide is anecdotal evidence for performance, but we use SqlCacheDependency as a sort of "messaging solution" for a large enterprise application that processes on the order of ten thousand messages per hour.
The basic architecture is that our company uses Perforce for source control and we have a "subscription service" that receives messages from a trigger webservice call that gets called on every p4 commit and inserts a record into a SQL database. Our application has the dependency set up to send subscription notifications for every changelist that affects a branch or path that you are monitoring.
The performance is fine. Trigger runs on the order of 200ms and we have never had a complaint about the latency of relaying the messages to end users.
As always, your mileage may vary.
