Avoiding a complete resync of Azure's SQL Data Sync - sql

So as a I understand it, if you have an outstanding sync error for more than 40 days Azure's SQL Data Sync forces you to do a fresh upload of your entire database in order to get the service working again.
I'm wondering if there is a way to avoid this as my internet is quite slow and a complete re-sync would take half a week to complete. Is there a way to trick the system into thinking that the error was not present for 40 days and resume it's differential back up after the error has a been corrected?

Unfortunately this is correct. See the MSDN topics http://msdn.microsoft.com/en-us/library/hh667321.aspx#bkmk_databaseoutofdatestatus (for an out-of-date database) and http://msdn.microsoft.com/en-us/library/hh667321.aspx#bkmk_sgoutofdatestatus (for an out-of-date sync group).

If it is a database that is out-of-date, you're better off deleting the database then creating it anew, but leaving it empty of data, and adding the empty database to your sync group. If you merely remove then re-add the populated database to the sync group, on the first sync Data Sync will treat every row as a conflict that needs to be resolved (even if the data is unchanged in the row) which adds a lot of time to the initial sync (and may add costs too). See the topic http://msdn.microsoft.com/en-us/library/hh667328.aspx#InitialSync.

Related

RavenDB taking forever to show updates

I'm starting to assess our company using RavenDB for storing some stuff that doesn't really belong in a relational database (we're traditionally a SQL Server shop). I installed RavenDB locally on my machine, created a database, added a document. Nice!
Being a DBA, I decided to see how backups/restores work. I backed up my database, deleted it, then restored it from the backup. After refreshing my admin screen, I saw my database. I clicked on it, and got a message that the database doesn't exist.
After a couple hours, I tried again. Still doesn't exist. A full day later, I walk into work, and try again. This time the database works. I've had similar situations with updating documents. The update seems to take anywhere between 1 second - several hours to show an update...
Is this normal for RavenDB?? Am I completely misconfigured?? I run SQL Server on my local machine and it's lightning-fast, so I can't imagine updating a single document could take that long. As-is, I can't imagine recommending we use RavenDB for anything.
Are you querying using indexes or getting documents by ID? Documents should be updated immediately (ACID). If indexes are slow to update (check their status using RavenDB Studio), it could be a configuration problem or something external like an anti-virus software can cause them to update slowly.
Apparently, at least for the document-update latency, the default for caching in queries is enabled, so I was getting cached results.
Jeffery,
No, that isn't normal by a long short. You should be able to immediately see what was changed.
Note that certain AV products will interfere with the HTTP pipeline and can affect RavenDB's usage. The studio will also auto update things only every 5 seconds (to reduce UI jitter), but that is about it.
Restoring a database (from the same machine), should take only as long as it take to copy the files (pure I/O bound operation).
If this is from another machine using a different version of Windows, we might need to run a check on the file, which can take a bit of time, but that doesn't sound like your scenario

Change Tracking w/o an Application to Sync with

I work as part of a two man DBA team running SQL Server 2008 R2 with me being somewhat of an accidental DBA. We recently had an issue where a small table we hardly ever use ended up getting truncated. Both of us swear we didn't do it, but it happened nonetheless.
To avoid the situation in the future, we're interested in implementing change tracking. It's not really necessary for us to preserve the data that was changed so we decided against using change data capture.
With that said, the things I'm reading about change tracking seem to be more about using it to synchronize data with an application rather than simply recording all the changes. Can I use change tracking to simply keep a list of all the changes made in the last 6 months or something? Once I enable it for each database in the SQL Server GUI, where is the info stored? Any other info you may have on implementing this correctly would be great.
Thanks!
From the documentation for change tracking:
The values of the primary key column is only information from the
tracked table that is recorded with the change information. These
values identify the rows that have been changed. To obtain the latest
data for those rows, an application can use the primary key column
values to join the source table with the tracked table.
So, if you're going to try to use this as a mechanism to somehow recover accidental deletions, I think you'll find it lacking. But don't take my word for it. Set up the following test:
In a test environment, set up a dummy table and enable change tracking on it.
Insert some data into the dummy table.
Delete the data
Recover the data using change tracking.
Despite having already dismissed CDC as an option, it sounds more in line with what you're after. CDC does keep track of non-primary key columns so if someone does a data modification accidentally, CDC will keep track of all of the values in the row(s) affected. It has the added benefit of not allowing table truncation because of how it's implemented (it uses the replication log reader).
Additionally, you can configure the CDC cleanup job to automatically purge data after any any amount of time you want (it sounds like 6 months is your retention period, which is completely doable).

Programmatically purge document deletion

I've a database with an agent that periodically delete (via Java agent, "removePermanently" method) all documents in a view and re-create them.
After some month, i've noticed that database size is considerably increased.
Showing database information through this command
sh database <dbpath>
it results that i've a lot of deleted documents (i suppose they are deletion stubs)
Document Type Live Deleted
Documents 1,922 817,378
Compacting database, 80% space was recovered.
Is there a way to programmatically delete stubs definitively to avoid "database explosion"? Or, is there a way to correctly manage this scenario (deletion and creation of documents)?
Don't delete the documents! Re-use them. That's the best answer. Seriously. Take the existing documents, clear the fields and set Form := "Obsolete". Modify the selection formula for all your views by appending & Form != "Obsolete" Create a new hidden view called "Obsolete" with selection formula Form = "Obsolete", and instead of creating new documents, change your code to go to the Obsolete view, grab an available document and set new field values (including changing the Form field). Only create new documents if there are not enough available in the Obsolete view. Any performance that you lose by doing this, which really should be minimal with the number of documents that you seem to have, will be more than offset by what you will gain by avoiding the growth and fragmentation of the NSF file that you are creating by doing all the deletions and creating new documents.
If, however, there's no possible way for you to do that -- maybe some third party tool that is outside of your control is creating the documents -- then it's important to know if the database you are talking about is replicated. If it is replicated, then you must be very careful because purging deletion stubs before all replicas are brought up to date will cause deleted documents to "come back to life" if a replica that has been off-line since before the delete occurs comes back on-line.
If the database is not replicated at all, or is reliably replicated across all replicas quickly, then you can reduce the purge interval. Go to the Replication Settings dialog, find the checkbox labeled "Remove documents not modified in the last __ days". Do not check the box, but enter a small number into the number of days. The purge interval for deletion stubs will be set to 1/3 of this number. So if you set it to 3 the effect will be that stubs are kept for 1 day and then purged, giving you 24 hours to assure that all replicas are up to date. If you need more, set the interval higher, maintaining the 3x multiple as needed. If a server is down for an extended period of time (longer than your purge interval), then adjust your operations procedures so that you will be sure to disable replication of the database to that server before it comes back on line and the replica can be deleted and recreated. Be aware, though, that user replicas pose the same problem, and it's not really possible to control or be aware of user replicas that might go off-line for longer than the purge interval. In any case, remember: do not check the box. To reduce the purge interval for deletion stubs only, just reduce the number.
Apart from this, the only way to programmatically delete deletion stubs requires use of the Notes C API. It's possible to call the required routines from LotusScript, but in my experience once the total number of stubs plus documents gets too high you will likely run into an error and may have to create and deploy a new non-replica copy of the database to get past it. You can find code along with my explanation in the answer to this previous question.
I have to second Richard's recommendation to reuse documents. I recently had a similar project, and started the way you did with deleting everything and importing half a million records every night. Deletion stubs and the growth of the FT index quickly became problems, eating up huge amounts of disk space and slowing performance significantly. I tried to manage the deletion stubs, but I was clearly going against the grain of Domino's architecture.
I read Richard's suggestion here, and adopted that approach. Here's what I did:
1) create 2 views based on form - one for 'active' records, and another for 'inactive' records
2) start the agent by setting autoupdate = false for both views
3) use stampall("form", "inactive") to change all fo the active records to inactive
4) manually refresh the 2 views using notesview.refresh()
5) start importing data. for each record, pull a document out of the pool of inactive records (by walking the 'inactive' view)
6) if if run out of inactive records in the pool, create new ones
7) when import is complete, manually refresh the views again
8) use db.createftindex(0, true) to re-create the FT index
the code is really not that complex, and it runs in about the same amount of time, if not faster, than my original approach.
Thanks Richard!
Also, look at the advanced db properties - several things there that will help optimize the db.
It sounds like you are "refreshing" the contents of the database by periodically deleting all the documents and creating new ones from some other source. Cut that out. If the data are in the Notes database already, leave the document alone. What you're doing is very inefficient.

Creating tables in SQL Server 2005 master DB

I am adding a monitoring script to check the size of my DB files so I can deliver a weekly report which shows each files size and how much it grew over the last week. In order to get the growth, I was simply going to log a record into a table each week with each DB's size, then compare to the previous week's results. The only trick is where to keep that table. What are the trade-offs in using the master DB instead of just creating a new DB to hold these logs? (I'm assuming there will be other monitors we will add in the future)
The main reason is that master is not calibrated for additional load: it is not installed on IO system with proper capacity planning, is hard to move around to new IO location, it's maintenance plan takes backups and log backups are as frequent as needed for a very low volume of activity, its initial size and growth rate are planned as if no changes are expected. Another reason against it is that many troubleshooting scenarios you would want a copy of the database to inspect, but you'd have to attach a new master to your instance. These are the main reasons why adding objects to master is discouraged. Also many admins understandably prefer an application to use it's own database so it can be properly accounted for, and ultimately easily uninstalled.
Similar problems exist for msdb, but if push comes to shove it would be better to store app data in msdb rather than master since the former is an ordinary database (despite widespread believe that is system, is actually not).
The Master DB is a system database that belongs to SQL Server. It should not be used for any other purposes. Create your own DB to hold your logs.
I would refrain from putting anything in master, it could be overwritten/recreated on an upgrade.
I have put a DBA only ServerInfo database on each server for uses like this, as well as any application specific environmental things (things that differ between prod and test and dev).
You should add a separat database for the logging. It is not garanteed that the master database is not breaking the next patch of sql server if you leave your objects in there.
And microsoft itself does advise you to not do it.
http://msdn.microsoft.com/en-us/library/ms187837.aspx

Recommended approach how to modify schema of a production SQL database?

Say there is a database with 100+ tables and a major feature is added, which requires 20 of existing tables to be modified and 30 more added. The changes were done over a long time (6 months) by multiple developers on the development database. Let's assume the changes do not make any existing production data invalid (e.g. there are default values/nulls allowed on added columns, there are no new relations or constraints that could not be fulfilled).
What is the easiest way to publish these changes in schema to the production database? Preferably, without shutting the database down for an extended amount of time.
Write a T-SQL script that performs the needed changes. Test it on a copy of your production database (restore from a recent backup to get the copy). Fix the inevitable mistakes that the test will discover. Repeat until script works perfectly.
Then, when it's time for the actual migration: lock the DB so only admins can log in. Take a backup. Run the script. Verify results. Put DB back online.
The longest part will be the backup, but you'd be crazy not to do it. You should know how long backups take, the overall process won't take much longer than that, so that's how long your downtime will need to be. The middle of the night works well for most businesses.
There is no generic answer on how to make 'changes' without downtime. The answer really depends from case to case, based on exactly what are the changes. Some changes have no impact on down time (eg. adding new tables), some changes have minimal impact (eg. adding columns to existing tables with no data size change, like a new nullable column that doe snot increase the null bitmap size) and other changes will wreck havoc on down time (any operation that will change data size will force and index rebuild and lock the table for the duration). Some changes are impossible to apply without *significant * downtime. I know of cases when the changes were applies in parallel: a copy of the database is created, replication is set up to keep it current, then the copy is changed and kept in sync, finally operations are moved to the modified copy that becomes the master database. There is a presentation at PASS 2009 given by Michelle Ufford that mentions how godaddy gone through such a change that lasted weeks.
But, at a lesser scale, you must apply the changes through a well tested script, and measure the impact on the test evaluation.
But the real question is: is this going to be the last changes you ever make to the schema? Finally, you have discovered the perfect schema for the application and the production database will never change? Congratulation, once you pull this off, you can go to rest. But realistically, you will face the very same problem in 6 months. the real problem is your development process, with developers and making changes from SSMS or from VS Server Explored straight into the database. Your development process must make a conscious effort to adopt a schema change strategy based on versioning and T-SQL scripts, like the one described in Version Control and your Database.
Use a tool to create a diff script and run it during a maintenance window. I use RedGate SQL Compare for this and have been very happy with it.
I've been using dbdeploy successfully for a couple of years now. It allows your devs to create small sql change deltas which can then be applied against your database. The changes are tracked by a changelog table within database so that it knows what to apply.
http://dbdeploy.com/