SQL Server replication: changing data on the subscriber side only

We are replicating a very large database to a data warehouse server (DELETE is off) via publisher/subscriber. The data warehouse team wants to change some of the data on their subscriber side by setting a few non-key columns (name, address, etc.) to NULL.
I have tried searching via Google etc. but can't find any information on whether this is possible without breaking replication or causing unknown issues.
We do NOT want these changes propagated back to the publisher.
Anybody know what impact this will have?
I could modify the sp_MS* stored procedures, but I reckon that on reinitialisation of the replication they could be overwritten by the standard Microsoft ones again. I don't really want to do this as it seems a messy solution.
I can test the idea out, but even if it worked with no issues I'd be concerned that some unknown factor would cause an issue at some point.
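For reference, the change itself would just be an ordinary local UPDATE run at the subscriber, something like this sketch (table and column names are hypothetical):

    -- Run at the subscriber only; names are made up.
    -- In a standard one-way transactional subscription this never flows
    -- back to the publisher, but a publisher-side update to these rows
    -- (or a reinitialisation) would overwrite the NULLs.
    UPDATE dbo.Customer
    SET Name    = NULL,
        Address = NULL;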

Related

Change Tracking w/o an Application to Sync with

I work as part of a two-man DBA team running SQL Server 2008 R2, with me being somewhat of an accidental DBA. We recently had an issue where a small table we hardly ever use ended up getting truncated. Both of us swear we didn't do it, but it happened nonetheless.
To avoid this situation in the future, we're interested in implementing change tracking. It's not really necessary for us to preserve the data that was changed, so we decided against using change data capture.
With that said, the things I'm reading about change tracking seem to be more about using it to synchronize data with an application rather than simply recording all the changes. Can I use change tracking to simply keep a list of all the changes made in the last 6 months or something? Once I enable it for each database in the SQL Server GUI, where is the info stored? Any other info you may have on implementing this correctly would be great.
Thanks!
From the documentation for change tracking:
"The values of the primary key column are the only information from the tracked table that is recorded with the change information. These values identify the rows that have been changed. To obtain the latest data for those rows, an application can use the primary key column values to join the source table with the tracked table."
So, if you're going to try to use this as a mechanism to somehow recover accidental deletions, I think you'll find it lacking. But don't take my word for it. Set up the following test:
In a test environment, set up a dummy table and enable change tracking on it.
Insert some data into the dummy table.
Delete the data.
Attempt to recover the data using change tracking.
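To make that test concrete, here is a minimal T-SQL sketch of it (database and table names are invented):

    -- Assumes a scratch database named TestDB
    ALTER DATABASE TestDB
    SET CHANGE_TRACKING = ON (CHANGE_RETENTION = 2 DAYS, AUTO_CLEANUP = ON);

    USE TestDB;
    CREATE TABLE dbo.Dummy (Id int PRIMARY KEY, Payload varchar(50));
    ALTER TABLE dbo.Dummy ENABLE CHANGE_TRACKING;

    INSERT INTO dbo.Dummy (Id, Payload) VALUES (1, 'some data');
    DELETE FROM dbo.Dummy;

    -- Only the primary key and the operation come back;
    -- the deleted Payload value is not recoverable from here.
    SELECT CT.Id, CT.SYS_CHANGE_OPERATION
    FROM CHANGETABLE(CHANGES dbo.Dummy, 0) AS CT;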
Although you've already dismissed CDC as an option, it sounds more in line with what you're after. CDC does keep track of non-primary-key columns, so if someone makes an accidental data modification, CDC will have captured all of the values in the affected row(s). It has the added benefit of not allowing table truncation because of how it's implemented (it uses the replication log reader).
Additionally, you can configure the CDC cleanup job to automatically purge data after any amount of time you want (it sounds like 6 months is your retention period, which is completely doable).
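A rough sketch of both steps, with invented names (note that the cleanup job's retention is specified in minutes):

    USE TestDB;
    EXEC sys.sp_cdc_enable_db;

    EXEC sys.sp_cdc_enable_table
        @source_schema = N'dbo',
        @source_name   = N'Dummy',
        @role_name     = NULL;    -- no gating role in this sketch

    -- Approximately 6 months: 180 days * 24 hours * 60 minutes
    EXEC sys.sp_cdc_change_job
        @job_type  = N'cleanup',
        @retention = 259200;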

Strategy for keeping separate Databases in Sync

I have a NoSQL database that we are using for data processing, as it serves my application faster than SQL can. I'm treating the NoSQL database almost like a cache of information, with SQL being the authority on the data and the NoSQL store being updated with changes. Right now this is done through our application: when a request for a change comes in, it is made in the SQL database and in the NoSQL database. This fails at times, as sometimes the NoSQL update fails or other situations cause the NoSQL database to get out of sync.
I could do a batch update every X minutes; however, there is a lot of information in the data stores, and it would take hours to ensure they are in sync. We have some timestamps we can use to diff what has changed, but this is not always accurate.
What are some recommended strategies for keeping a secondary data store (a database cache) in sync with my main store?
I know I've done this with messaging in the past - specifically JMS with ActiveMQ. I would send the updates to a NoSQL store (Mongo) by using a queue. That way messages could accumulate in the queue, and if the connection to the NoSQL store ever got severed, it could pick up where it left off.
It worked really well because ActiveMQ was really stable and simple to work with.
I've always seen this done with diffs like you mentioned. You introduce date fields all over and then keep track of the latest sync. The nice thing about this approach is that it easily allows you to replay transactions by modifying the last sync date.
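A minimal sketch of that kind of diff pull, assuming each table carries a LastModified column and the last successful sync time is persisted somewhere (all names here are hypothetical):

    DECLARE @LastSync datetime = '20240101';  -- would be read from a sync-state table

    -- Pull only rows changed since the last sync; replaying history
    -- is just a matter of moving @LastSync backwards.
    SELECT Id, Name, Address, LastModified
    FROM dbo.Customer
    WHERE LastModified > @LastSync
    ORDER BY LastModified;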
One last piece of advice ... write good tools around pumping data from point A to point B (in this case SQL to NoSQL). I wrote several tools to bulk load the NoSQL store from SQL at my last job and it made life easy if anything got really out of sync. Between scripts and bulk loading processes, I could always recover.

Creating tables in SQL Server 2005 master DB

I am adding a monitoring script to check the size of my DB files so I can deliver a weekly report which shows each file's size and how much it grew over the last week. In order to get the growth, I was simply going to log a record into a table each week with each DB's size, then compare to the previous week's results. The only trick is where to keep that table. What are the trade-offs in using the master DB instead of just creating a new DB to hold these logs? (I'm assuming there will be other monitors we will add in the future.)
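For reference, the logging piece itself is small; something like this sketch is what I had in mind (the DBAdmin database and all object names are hypothetical):

    USE DBAdmin;

    CREATE TABLE dbo.FileSizeHistory (
        CaptureDate  datetime NOT NULL DEFAULT GETDATE(),
        DatabaseName sysname  NOT NULL,
        LogicalName  sysname  NOT NULL,
        SizeMB       int      NOT NULL
    );

    -- Weekly capture: size is reported in 8 KB pages
    INSERT INTO dbo.FileSizeHistory (DatabaseName, LogicalName, SizeMB)
    SELECT DB_NAME(database_id), name, size * 8 / 1024
    FROM sys.master_files;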
The main reason is that master is not calibrated for additional load: it is not installed on an IO system chosen with proper capacity planning, it is hard to move to a new IO location, its maintenance plan takes backups and log backups only as frequently as its very low volume of activity warrants, and its initial size and growth rate are planned as if no changes are expected. Another reason against it is that in many troubleshooting scenarios you would want a copy of the database to inspect, but you cannot simply attach a second master to your instance. These are the main reasons why adding objects to master is discouraged. Also, many admins understandably prefer an application to use its own database so it can be properly accounted for and, ultimately, easily uninstalled.
Similar problems exist for msdb, but if push comes to shove it would be better to store app data in msdb rather than master, since the former is an ordinary database (despite the widespread belief that it is a system database, it actually is not).
The Master DB is a system database that belongs to SQL Server. It should not be used for any other purposes. Create your own DB to hold your logs.
I would refrain from putting anything in master; it could be overwritten/recreated on an upgrade.
I have put a DBA-only ServerInfo database on each server for uses like this, as well as any application-specific environmental things (things that differ between prod, test, and dev).
You should add a separate database for the logging. It is not guaranteed that objects you leave in the master database will survive the next SQL Server patch.
And Microsoft itself advises you not to do it:
http://msdn.microsoft.com/en-us/library/ms187837.aspx

Using SQL Server Replication

We are using replication and seem to be having endless problems with it. It shuts down for unknown reasons, it needs to be shut down to remove a column, and it only starts back up half the time. Does anyone have any advice on how to use replication properly, or some alternatives to it?
Edit:
We are using SQL Server 2005. We cannot use database mirroring, as we use the other database for reporting. As far as I am aware, you cannot query a mirrored database.
If you need just a couple of tables from your DB for reports, replication is more useful, but you can also set up log shipping with the secondary server in STANDBY mode (especially if you need a significant part of your data for reports); then you can run reports on the secondary server. Just remember that log shipping will interfere with transaction log backups, so you have to use the same folder of log backup files for both processes.
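The readable-secondary piece is the STANDBY option on the log restores; roughly (file paths and names invented):

    -- Applied on the secondary for each shipped log backup
    RESTORE LOG MyDB
    FROM DISK = N'D:\LogShip\MyDB_20240101.trn'
    WITH STANDBY = N'D:\LogShip\MyDB_undo.dat';  -- leaves the DB read-only between restores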
I would think the combination of database mirroring and database snapshots will solve your issues.
First, database mirroring is very easy to set up and I have never had any problems with it (I've been using it for the past 4+ years).
Second, creating a database snapshot on your failover server will allow you to run reports. You can set up a SQL Agent job to drop and re-create the snapshot on whatever interval you like.
Of course this is all dependent on if you need your reports to run on real-time data or if they can be delayed somewhat.
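For illustration, the snapshot part might look like this (database, logical file name, and path are invented; the logical name must match the source database's data file, and on these versions snapshots require Enterprise Edition):

    CREATE DATABASE MyDB_Snapshot
    ON (NAME = MyDB_Data, FILENAME = N'D:\Snapshots\MyDB_Snapshot.ss')
    AS SNAPSHOT OF MyDB;

    -- The agent job would DROP DATABASE MyDB_Snapshot and re-run the
    -- CREATE above on whatever schedule you pick.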
Here is a list of the problems that I have had to resolve to get replication working:
1) Replication sometimes lies to me and reports the following even when it's working fine:
"The server 'Bob' is not a Subscriber. (.Net SqlClient Data Provider)"
I have re-initialised it thinking it was broken, when it never was...
2) It can take a little while to restart itself, especially if your remote DB is on the other side of the planet, which it is in my case. If you are on a slow network connection, or it is not 100% reliable, then you can have problems. Also, the jobs which restart the process can sometimes take a while to run, which also delays things further.
3) Some changes require full re-initialisation, which involves sending out a new snapshot. If your permissions aren't quite right, you may find you can re-initialise manually but it doesn't happen automatically, which can be another source of problems.
We have a SQL transactional replication which runs perfectly happily. You seem to say that it is when you are making schema changes to the publisher that you get problems. Each time we make a schema change we drop the publication, the subscription, and the subscription database, make the change, then rebuild it all. We can do this because we can tolerate the time it takes to re-apply the snapshot. There are ways to apply schema changes to the publication and have them propagate to the subscriber; take a look at sp_register_custom_scripting. We have made this work once, so I can give you some more information about it if you need.
As @Jason says, you can report from a mirrored database by using a snapshot. Beware that the snapshot will take up space and cause more work for the mirror server, although how much space will depend on how much data is changing and how big your original database is. We use a snapshot on a mirrored database for occasional reports because our entire database is not replicated.
Log shipping: http://msdn.microsoft.com/en-us/library/ms187103.aspx
What version of SQL Server are you using?
We're using replication now for a particular solution, and it seems to just work, day in, day out.
I would examine your event logs and SQL Server logs to see if you can determine why it is shutting down and why it doesn't start up.
Are you possibly patching the servers, or are you having network errors?
The alternatives to replication are log shipping, or database mirroring.
I personally prefer Database Mirroring, but it really depends what you're trying to do, as some of these aren't appropriate for certain situations.
We have also used SQL transactional replication. We had the same pains with updating the schema, which required dropping the publication on all servers, performing the updates, and then reinitializing replication and hoping for the best. Sometimes it would not initialize, or a node would fall behind and we'd get little warning of it. A few times we even lost all the stored procedure execute permissions, causing pretty much total failure on the websites.
We have a rather large database so reinitialization could take quite some time, meaning all updates had to be done at 2am on Sunday - not exactly when we're awake and alert and able to use all our faculties to deal with a problem that might arise.
We are ditching replication in favor of failover clustering on SQL 2008, but it can still be done all the way back to SQL 2000.
http://technet.microsoft.com/en-us/library/cc917693.aspx

SQL 2005 Transactional Replication: Behavior during snapshot processing?

So, I've got SQL (2005) Transactional Replication generally working well with a single publisher and single (read-only) subscriber. Data changes and updates flow perfectly, with about 5 second latency, which is just fine.
My one nagging problem, that I've spent a couple days trying to solve (and Googling everywhere for answers) is that new sprocs/tables/etc. do not get propagated to the read-only subscriber, even though I've added them as "articles" to the "publication". The publication has "transmit schema changes" set to ON, and stored procedures are set to transfer their definitions. But, for some reason, they don't.
My "snapshot agent" process is set to NOT SCHEDULED. (In other words, it only happens once, when I initiate it manually.) Should I be putting this on a schedule to enable the transfer of new or modified tables and sprocs?
I thought the mere act of adding the object as an article to the publication would do it, but it's still not sending it unless I do a snapshot. The WAN connecting these is totally fast and reliable, so that's not the issue, and table-data-updates transfer relatively fast and flawlessly.
While I could put my snapshot agent on a schedule, does this have any real-time production impact for users of the main publication database or the read-only copy? (My site currently gets 4+ million unique users a month, so I'd like to have minimal disruption...) Thanks!
Transactional replication only distributes (and subsequently publishes) the DML (Data Manipulation Language) statements from the transaction log of the source (publication) database.
New tables and stored procedures are not replicated to the subscriber. "Schema changes" in this particular context, although I have to admit it is a little unclear in some of the Books Online documentation, refer to the existing schema, i.e. if you were to add a column to an already-published table, that change would be propagated to the subscribers.
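For example, a change like this to an already-published table (names invented) is the kind that propagates when the publication's option to replicate schema changes is on:

    ALTER TABLE dbo.Orders
    ADD Notes nvarchar(255) NULL;  -- replicated to subscribers as a DDL change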
For clarification, here is a Microsoft article that details the schema changes that you can make:
http://msdn.microsoft.com/en-us/library/ms151870(SQL.90).aspx
I hope this helps. Replication is a big subject area so please let me know if I can be of further assistance.
Oh yes, you are correct: if you add new articles to your publication you will need to create an updated snapshot.
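Roughly, adding the article and pushing it out looks like this sketch (publication and table names are made up; with immediate_sync off, the Snapshot Agent typically snapshots just the new article rather than the whole publication):

    EXEC sp_addarticle
        @publication   = N'MyPublication',
        @article       = N'NewTable',
        @source_owner  = N'dbo',
        @source_object = N'NewTable';

    -- Kick off the Snapshot Agent for this publication
    EXEC sp_startpublication_snapshot @publication = N'MyPublication';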
Cheers,