I recently added two new tables to a database that is currently being transactionally replicated. Short of dropping and recreating the entire publication, is there a way to quickly add these two new tables to the existing publication? Will I have to take an entirely new snapshot? I ask because this is a production database that can't be stopped until nighttime, and lockups will cause major issues.
Thanks - Travis
DISCLAIMER: All replication has the potential to lock entire databases as it reads the entire log. Changes should be thoroughly tested outside of Production and implemented off hours.
For basic transactional replication, you can use sp_addarticle and sp_addsubscription for each table without affecting the existing subscriptions. If you initialized the current subscription with sp_addsubscription @article = 'all' (the default), it may not let you add additional articles, in which case you will have to drop the existing subscriptions or create a new publication.
You won't necessarily have to take a snapshot for the existing subscriptions even if you do have to drop them, but you take responsibility for keeping the data in sync. You should use triggers or other methods to lock down changes before dropping subscriptions, and recreate them using sp_addsubscription @sync_type = 'replication support only'. If all subscriptions are created this way, a snapshot will not be generated. If only the new articles are subscribed with @sync_type = 'automatic', then only those articles will be present in the new snapshot. Afterwards, you should verify data integrity between publisher and subscriber.
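For the final verification step, here is a minimal sketch in Python of one way to compare a table on both sides. It assumes you have already fetched the rows from the publisher and the subscriber with the same driver, so that values have the same Python types on both sides. The function names table_fingerprint and tables_match are hypothetical and not part of SQL Server. It compares a row count plus an order-independent digest, so row order does not matter.

```python
import hashlib
from typing import Iterable, Tuple


def table_fingerprint(rows: Iterable[tuple]) -> Tuple[int, str]:
    """Return (row_count, order-independent digest) for an iterable of rows.

    Hypothetical helper: each row is hashed on its repr(), and the hashes are
    summed modulo 2**256 so that the result does not depend on row order.
    """
    count = 0
    combined = 0
    for row in rows:
        digest = hashlib.sha256(repr(row).encode("utf-8")).digest()
        combined = (combined + int.from_bytes(digest, "big")) % (1 << 256)
        count += 1
    return count, f"{combined:064x}"


def tables_match(publisher_rows: Iterable[tuple],
                 subscriber_rows: Iterable[tuple]) -> bool:
    """True when both sides have the same rows, regardless of order."""
    return table_fingerprint(publisher_rows) == table_fingerprint(subscriber_rows)
```

Run it per table while changes are locked down, otherwise rows still in flight will show up as false mismatches.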
What is the best approach to keep Production, dev and test environments in sync?
We have a Master Data Services database in our Development, Test and Production environments. Data is being entered into Production and we need to keep our test and development servers in sync. I couldn't find documentation on how to handle this.
I am not sure if this process is correct.
For moving updated data from Development we are following this process:
Create a second version of the model, make the changes in it, and then deploy the second version to test and prod.
Can we use the same process from Production to test and Development to keep them in sync?
Thanks
Two options come to mind:
Snapshot replication
Snapshot replication distributes data exactly as it appears at a specific moment in time and does not monitor for updates to the data. When synchronization occurs, the entire snapshot is generated and sent to Subscribers.
Log shipping
SQL Server Log shipping allows you to automatically send transaction log backups from a primary database on a primary server instance to one or more secondary databases on separate secondary server instances. The transaction log backups are applied to each of the secondary databases individually.
MDS has a tool called MDSModelDeploy. You can create a package with all business rules, schema and data, ship it over to another machine, and then:
clone the model (preserving keys, etc.)
update the model
More information here
I want to stream some time series data into BigQuery with insertAll but only retain the last 3 months (say) to avoid unbounded storage costs. The usual answer is to save each day of data into a separate table but AFAICT this would require each such table to be created in advance. I intend to stream data directly from unsecured clients authorized with a token that only has bigquery.insertdata scope, so they wouldn't be able to create the daily tables themselves. The only solution I can think of would be to run a secure daily cron job to create the tables -- not ideal, especially since if it misfires data will be dropped until the table is created.
Another approach would be to stream data into a single table and use table decorators to control query costs as the table grows. (I expect all queries to be for specific time ranges so the decorators should be pretty effective here.) However, there's no way to delete old data from the table, so storage costs will become unsustainable after a while. I can't figure out any way to "copy and truncate" the table atomically either, so that I can partition old data into daily tables without losing rows being streamed at that time.
Any ideas on how to solve this? Bonus points if your solution lets me re-aggregate old data into temporally coarser rows to retain more history for the same storage cost. Thanks.
Edit: just realized this is a partial duplicate of Bigquery event streaming and table creation.
If you look at the streaming API discovery document, there's a curious new experimental field called "templateSuffix", with a very relevant description.
I'd also point out that no official documentation has been released, so special care should probably go into using this field -- especially in a production setting. Experimental fields could possibly have bugs etc. Things I could think to be careful of off the top of my head are:
Modifying the schema of the base table in non-backwards-compatible ways.
Modifying the schema of a created table directly in a way that is incompatible with the base table.
Streaming to a created table directly and via this suffix -- row insert ids might not apply across boundaries.
Performing operations on the created table while it's actively being streamed to.
And I'm sure other things. Anyway, just thought I'd point that out. I'm sure official documentation will be much more thorough.
Most of us are doing the same thing as you described.
But we don't use a cron job, as we create tables in advance for 1 year, or on some projects for 5 years. You may wonder why we do so, and when.
We do this whenever the schema is changed by us, the developers. We do a deploy and run a script that takes care of the schema changes for old/existing tables; the script also deletes all the empty future tables and simply recreates them. We didn't complicate our lives with a cron job, as we know the exact moment the schema changes (the deploy), and there is no disadvantage to creating tables in advance for such a long period. On SaaS systems we also do this per tenant, when a user is created or closes their account.
This way we don't need a cron job; we just need to know that the deploy has to do this additional step when the schema changes.
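As a rough illustration of the pre-creation idea, here is a small Python sketch that only computes table names; it makes no BigQuery calls. The base name "events", the _YYYYMMDD suffix and both function names are assumptions for the example, and the retention check reflects the "last 3 months" requirement from the question rather than anything in this answer.

```python
from datetime import date, datetime, timedelta
from typing import Iterable, List


def daily_table_names(base: str, start: date, days: int) -> List[str]:
    """Names such as events_20240131 for `days` consecutive days from `start`."""
    return [
        f"{base}_{(start + timedelta(days=i)).strftime('%Y%m%d')}"
        for i in range(days)
    ]


def expired_table_names(names: Iterable[str], today: date,
                        retention_days: int) -> List[str]:
    """Tables whose date suffix is older than the retention window."""
    cutoff = today - timedelta(days=retention_days)
    expired = []
    for name in names:
        suffix = name.rsplit("_", 1)[-1]
        table_day = datetime.strptime(suffix, "%Y%m%d").date()
        if table_day < cutoff:
            expired.append(name)
    return expired
```

A deploy script could feed daily_table_names("events", date.today(), 365) into whatever creates the tables, and periodically pass the existing names to expired_table_names to decide what to delete.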
As for not losing streaming inserts while you do maintenance on your tables, you need to address that in your business logic at the application level. You probably have some sort of message queue, like Beanstalkd, that queues all the rows into a tube so that a worker can later push them to BigQuery. You may already have this to cover the case where the BigQuery API responds with an error and you need to retry; it's easy to do with a simple message queue. So you would rely on this retry phase when you stop or rename a table for a while. The streaming insert will fail, most probably because the table is not ready for streaming inserts, e.g. it has been temporarily renamed to do some ETL work.
If you don't have this retry phase you should consider adding it, as it not only helps with retrying failed BigQuery calls, but also allows you to have a maintenance window.
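The retry phase can be sketched roughly as follows in Python. This is an in-memory stand-in, not Beanstalkd and not the BigQuery client: the send callback is a hypothetical placeholder for whatever performs the streaming insert and returns True on success, and drain_with_retry is a made-up name.

```python
from collections import deque
from typing import Callable, Iterable, List


def drain_with_retry(rows: Iterable[dict],
                     send: Callable[[dict], bool],
                     max_attempts: int = 3) -> List[dict]:
    """Try to send every row, requeueing failures at the back of the queue.

    Returns the rows that still failed after `max_attempts` tries, so the
    caller can park them somewhere instead of dropping them.
    A real worker would also wait (back off) between attempts; that is
    omitted here to keep the sketch short.
    """
    queue = deque((row, 0) for row in rows)
    dead = []
    while queue:
        row, attempts = queue.popleft()
        if send(row):
            continue  # delivered, nothing more to do for this row
        attempts += 1
        if attempts >= max_attempts:
            dead.append(row)
        else:
            queue.append((row, attempts))
    return dead
```

With a persistent queue in place of the deque, rows that fail while a table is renamed for ETL work simply stay queued until the table is back.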
You've already solved it by partitioning. If table creation is an issue, have an hourly cron job in App Engine that verifies that today's and tomorrow's tables always exist.
Very likely App Engine won't go over the free quotas, and it has a 99.95% SLO for uptime, so the cron job is very unlikely to go down.
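The check such a cron job performs could look something like this Python sketch. It assumes the job has already listed the existing table names by some other means; the _YYYYMMDD naming scheme and the function name missing_daily_tables are hypothetical.

```python
from datetime import date, timedelta
from typing import Iterable, List


def missing_daily_tables(existing: Iterable[str], base: str,
                         today: date) -> List[str]:
    """Return the names of today's and tomorrow's tables that do not exist yet."""
    wanted = [
        f"{base}_{day.strftime('%Y%m%d')}"
        for day in (today, today + timedelta(days=1))
    ]
    have = set(existing)
    return [name for name in wanted if name not in have]
```

Because the job runs hourly and also covers tomorrow, a single missed run leaves many later runs to create the table before any data needs it.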
I have a large database (500 GB). One of our customers wants a daily snapshot of only his data. He only has a 3 Mb connection, and I suspect that is the maximum! What is the most effective method I could use?
1. Views that are updated, but the customer wants the underlying tables.
2. Replication; I don't know much about this.
3. Alternative method.
Consider merge replication, which allows you to initialize a subscriber without using a snapshot. You will have to initialize replication on the subscriber from a backup. You can possibly do the same with transactional replication, but it just never worked quite the same for me. YMMV. When it breaks (etc.) you will have to be prepared to ship a new backup and start over (hacking replication sometimes works, but don't count on it). Changing database structures is also a pain once in replication. I have seen ~500Gb with less than 3Mbit work in production, though not without proper planning and preparation (and grey hair).
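To see why the initial load has to arrive as a shipped backup and only the changes go over the wire, here is a back-of-the-envelope calculation in Python. It assumes decimal units (1 GB = 10^9 bytes, 3 Mbit/s = 3 x 10^6 bits per second), a fully saturated link and no protocol overhead, so real figures will be worse. Both function names are made up for the example.

```python
SECONDS_PER_DAY = 86_400


def transfer_seconds(size_bytes: float, link_bits_per_second: float) -> float:
    """Time needed to push `size_bytes` through the link at full speed."""
    return size_bytes * 8 / link_bits_per_second


def bytes_per_day(link_bits_per_second: float) -> float:
    """Upper bound on how much data the link can carry in 24 hours."""
    return link_bits_per_second * SECONDS_PER_DAY / 8


full_copy_days = transfer_seconds(500e9, 3e6) / SECONDS_PER_DAY
daily_budget_gb = bytes_per_day(3e6) / 1e9
```

Under these assumptions a full 500 GB copy takes roughly 15.4 days, and the link can move at most about 32.4 GB per day, so the daily changes for this customer must stay well below that.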
I have used transactional replication with a read-only subscription, where I invoked the distribution with a batch file on a task schedule once a day. Transactional replication did not feel as well maintained (by MS) or as stable as merge replication, though the data integrity with transactional was more consistent.
I have not tried transaction log shipping, but that might also be an option.
(p.s. notice I didn't say "If it breaks")
I am adding a monitoring script to check the size of my DB files so I can deliver a weekly report which shows each file's size and how much it grew over the last week. In order to get the growth, I was simply going to log a record into a table each week with each DB's size, then compare to the previous week's results. The only trick is where to keep that table. What are the trade-offs in using the master DB instead of just creating a new DB to hold these logs? (I'm assuming there will be other monitors we will add in the future.)
The main reason is that master is not calibrated for additional load: it is not installed on an I/O system with proper capacity planning, it is hard to move to a new I/O location, its maintenance plan takes backups and log backups only as frequently as needed for a very low volume of activity, and its initial size and growth rate are planned as if no changes are expected. Another reason against it is that in many troubleshooting scenarios you would want a copy of the database to inspect, but for master you'd have to attach a new master to your instance. These are the main reasons why adding objects to master is discouraged. Also, many admins understandably prefer an application to use its own database so it can be properly accounted for and, ultimately, easily uninstalled.
Similar problems exist for msdb, but if push comes to shove it would be better to store app data in msdb rather than master, since the former is an ordinary database (despite the widespread belief that it is a system database, it actually is not).
The Master DB is a system database that belongs to SQL Server. It should not be used for any other purposes. Create your own DB to hold your logs.
I would refrain from putting anything in master; it could be overwritten/recreated on an upgrade.
I have put a DBA-only ServerInfo database on each server for uses like this, as well as for any application-specific environmental things (things that differ between prod and test and dev).
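Wherever the log table ends up living, the weekly comparison described in the question is simple. Here is a sketch in Python, assuming each weekly sample has been loaded into a dict mapping a file name to its size in bytes; the function name weekly_growth and the dict layout are hypothetical.

```python
from typing import Dict, Optional, Tuple


def weekly_growth(previous: Dict[str, int],
                  current: Dict[str, int]) -> Dict[str, Tuple[int, Optional[int]]]:
    """Map each current file to (size, growth since last week).

    Growth is None for files that had no entry last week, so new files are
    reported as new rather than as a huge jump from zero.
    """
    report = {}
    for name, size in current.items():
        before = previous.get(name)
        growth = None if before is None else size - before
        report[name] = (size, growth)
    return report
```

Files that shrank show a negative growth value, and files that disappeared since last week are simply absent from the report.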
You should add a separate database for the logging. It is not guaranteed that the next SQL Server patch won't break the master database if you leave your objects in there.
And Microsoft itself advises you not to do it.
http://msdn.microsoft.com/en-us/library/ms187837.aspx
So, I've got SQL (2005) Transactional Replication generally working well with a single publisher and single (read-only) subscriber. Data changes and updates flow perfectly, with about 5 second latency, which is just fine.
My one nagging problem, that I've spent a couple days trying to solve (and Googling everywhere for answers) is that new sprocs/tables/etc. do not get propagated to the read-only subscriber, even though I've added them as "articles" to the "publication". The publication has "transmit schema changes" set to ON, and stored procedures are set to transfer their definitions. But, for some reason, they don't.
My "snapshot agent" process is set to NOT SCHEDULED. (In other words, it only happens once, when I initiate it manually.) Should I be putting this on a schedule to enable the transfer of new or modified tables and sprocs?
I thought the mere act of adding the object as an article to the publication would do it, but it's still not sending it unless I do a snapshot. The WAN connecting these is totally fast and reliable, so that's not the issue, and table-data-updates transfer relatively fast and flawlessly.
While I could put my snapshot agent on a schedule, does this have any real-time production impacts for users of the main publication database or the read-only copy? (My site currently gets 4+ million unique-users a month, so I'd like to have minimal disruption...) Thanks!
Transactional replication only distributes (and then subsequently publishes) the DML (Data Manipulation Language) statements from the transaction log of the source (publication) database.
New tables and stored procedures are not replicated to the subscriber. Schema changes in this particular context, although I have to admit it is a little unclear in some of the Books Online documentation, refer to the existing schema, i.e. if you were to add a column to an existing table, this change would be propagated to the subscribers.
For clarification here is a Microsoft article that details the schema changes that you can make.
http://msdn.microsoft.com/en-us/library/ms151870(SQL.90).aspx
I hope this helps. Replication is a big subject area so please let me know if I can be of further assistance.
Oh yes, you are correct, if you add new articles to your publication you will need to create an updated snapshot.
Cheers,