SyncFramework 2.1 updates & deletes do not seem to apply properly - wcf

I'm synchronizing SQL Server 2008 with ~6 SQL Server 2008 Express clients (everything R2, I believe), using SyncOrchestrator, specifically using http://code.msdn.microsoft.com/windowsdesktop/Database-SyncSQL-Server-e97d1208 as a base with slight modifications. To my knowledge this means all connections are peers or nodes.
I have 2 scopes. One is download only and the other is upload only. The download-only scope is riddled with identity columns, primarily because I didn't know any better and couldn't wrap my head around introducing GUIDs as the PK on the client side. It doesn't matter much, as all clients should have exact replicas of about 8 or so tables, and these machines don't touch this data in any way, only read it.
The upload-only scope uses GUIDs, as fortunately I can control that portion of the database, and there would be no way 10 clients all using the same identity seed could sync back to the server properly. Both scopes use the default provisioning with bulk inserts and the whole 9 yards, so there shouldn't be anything I'm doing on the provisioning end to screw this up.
I initially set everything up without PerformPostRestoreFixup, and the initial database was manually synchronized with insert statements from the host. This seemed fine, but no updates or deletes ever seemed to be applied. You can safely ignore this (mentioned only for historical accuracy and to prove my ineptness), as I then used VS2010 Database Projects to rebuild the database down to schema only and synchronized. I then used the steps outlined here (http://social.microsoft.com/Forums/br/syncdevdiscussions/thread/9ac6d1a1-1565-4b82-a8d8-3d4a9ff5d07b) (sync, backup, restore, call PerformPostRestoreFixup, sync on x clients), and on my dev box where I'm setting all this up I could see updates and deletes just fine. It's when I deploy this to the x clients that I'm not seeing a mirror of the database as I think I should.
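For anyone following along, the fixup step itself is small. A minimal sketch, assuming Sync Framework 2.1 and a placeholder connection string, run once against the freshly restored client database before its first sync:

    using System.Data.SqlClient;
    using Microsoft.Synchronization.Data.SqlServer;

    // Placeholder connection string for the restored client database.
    using (var clientConn = new SqlConnection(@"Data Source=.\SQLEXPRESS;Initial Catalog=ClientDb;Integrated Security=True"))
    {
        var storeRestore = new SqlSyncStoreRestore(clientConn);
        storeRestore.PerformPostRestoreFixup(); // rewrites the sync metadata so the restored copy acts as its own replica
    }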
The initial sync will complain and try to synchronize all records again; I believe this is expected. In the ApplyChangeFailed event on the client I set everything other than DbConflictType.ErrorsOccurred to ApplyAction.RetryWithForceWrite. This may be a source of problems, as I initially thought this was what should be done to force the change down to the client. I want the server to always win in this scenario, but during tracing I always see the phrase "Local wins" during the bulk insert/update calls. It's possible I'm seeing the error before the re-apply happens, but it's awkward to follow.
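For clarity, the handler amounts to something like this (a sketch; clientProvider stands in for the client-side SqlSyncProvider from the MSDN sample, types from Microsoft.Synchronization.Data):

    // Force every non-error conflict to re-apply the incoming (server) change.
    clientProvider.ApplyChangeFailed += (sender, e) =>
    {
        if (e.Conflict.Type != DbConflictType.ErrorsOccurred)
        {
            e.Action = ApplyAction.RetryWithForceWrite; // remote (server) should win
        }
    };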
The only problem I seem to be having is with the download-only scope. The initial client database is about a week old now, and if I use the PerformPostRestoreFixup steps I don't see any of the updates that were applied between then and now, as I think I should. It's as if SyncFx almost prefers a blank database on the client side to kick off the initial sync; after that, all the updates seem to apply just fine with no ApplyChangeFailed events firing.
If anyone has seen this before or has a clue where to go, I would greatly appreciate it. My brain is fried from trying to determine what's going on. My last-ditch effort will be to deploy blank databases to all the clients and have them start the sync. I've had no issues with this on the dev side, but I can only test one other client to know whether that will do anything different. Aside from that, I don't know what to do other than keep doing manual syncs, which would defeat the purpose entirely. I thought PerformPostRestoreFixup would alleviate the issue entirely, but I seem to be having the same problems with or without it, or perhaps I'm not looking at what I need to be.
Thanks

I wanted to report and close the entry with my findings.
When I would deploy a previously configured client database, I'd often get ApplyChangeFailed events in the form of this log:
"[05:30:41 PM] - ApplyChange Failed: TableName: , Stage: ApplyingInserts, ConflictType: LocalInsertRemoteInsert, Action: RetryWithForceWrite"
This is what I thought would be expected, as it tried to re-insert data that was already there. RetryWithForceWrite should have turned the insert into an update, but I found the data was not being updated with what was sent down.
Once I started each client with a completely blank database and provisioned locally, all of these errors went away. It's as if every client expects a unique id that only it can set. I was also using x64 builds versus x86, which may or may not have a bearing on the results. I wish I could determine exactly what happened, but it seems that, when in doubt and whenever possible, starting from absolute zero and letting sync fill in the data is your safest option.
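For anyone repeating the blank-database approach, a hedged sketch of the local provisioning step (the scope name and connections are placeholders; Sync Framework 2.1 API):

    using Microsoft.Synchronization.Data;           // DbSyncScopeDescription
    using Microsoft.Synchronization.Data.SqlServer; // SqlSyncDescriptionBuilder, SqlSyncScopeProvisioning

    // Pull the scope definition from the server and apply it to the empty client DB.
    DbSyncScopeDescription scopeDesc =
        SqlSyncDescriptionBuilder.GetDescriptionForScope("DownloadScope", serverConn);

    var clientProvision = new SqlSyncScopeProvisioning(clientConn, scopeDesc);
    if (!clientProvision.ScopeExists("DownloadScope"))
    {
        clientProvision.Apply(); // creates base tables, tracking tables, triggers and procedures
    }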

Related

RavenDB taking forever to show updates

I'm starting to assess whether our company should use RavenDB for storing some data that doesn't really belong in a relational database (we're traditionally a SQL Server shop). I installed RavenDB locally on my machine, created a database, and added a document. Nice!
Being a DBA, I decided to see how backups/restores work. I backed up my database, deleted it, then restored it from the backup. After refreshing my admin screen, I saw my database. I clicked on it, and got a message that the database doesn't exist.
After a couple of hours, I tried again. Still doesn't exist. A full day later, I walk into work and try again. This time the database works. I've had similar situations with updating documents. An update seems to take anywhere from one second to several hours to show up...
Is this normal for RavenDB?? Am I completely misconfigured?? I run SQL Server on my local machine and it's lightning-fast, so I can't imagine updating a single document could take that long. As-is, I can't imagine recommending we use RavenDB for anything.
Are you querying using indexes or getting documents by ID? Documents should be updated immediately (ACID). If indexes are slow to update (check their status using RavenDB Studio), it could be a configuration problem, or something external, like anti-virus software, could be causing them to update slowly.
Apparently, at least for the document-update latency, query caching is enabled by default, so I was getting cached results.
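For anyone else hitting this, two hedged ways to take the client-side cache out of the picture in the RavenDB .NET client (Member is a placeholder type, and the method names should be checked against your client version):

    using System.Linq;

    // Per-query: bypass the HTTP cache and wait for the index to catch up.
    var results = session.Query<Member>()
        .Customize(x => x.NoCaching())
        .Customize(x => x.WaitForNonStaleResultsAsOfNow())
        .ToList();

    // Store-wide, while diagnosing: turn the request cache off entirely.
    documentStore.MaxNumberOfCachedRequests = 0;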
Jeffery,
No, that isn't normal by a long shot. You should be able to see what was changed immediately.
Note that certain AV products will interfere with the HTTP pipeline and can affect RavenDB's usage. The Studio will also auto-update things only every 5 seconds (to reduce UI jitter), but that is about it.
Restoring a database (on the same machine) should take only as long as it takes to copy the files (a pure I/O-bound operation).
If this is from another machine using a different version of Windows, we might need to run a check on the file, which can take a bit of time, but that doesn't sound like your scenario.

Peer-to-peer replication fails due to missing sp_MS* stored procedures

We've had a two-node peer-to-peer (P2P) replication setup running for almost a year, but for some reason it was recently marked as inactive, forcing us to rebuild it. Unfortunately, no matter how I try to rebuild it I keep getting the same error.
Basically, once P2P is configured it instantly fails because it can't find some auto-created “sp_MS[upd|ins|del]*******” stored procedures. While doing some investigation (and selecting directly from sys.procedures) I found that these stored procedures are only created when the DB becomes a subscriber as a P2P node, similar to transactional replication. As every DB in P2P is both a publisher and a subscriber, the fact that these were missing is strange, especially considering it clearly succeeds in creating them on one of the DBs but not the other. It would appear that one of the DBs does not get set up as a subscriber, but I receive no error or indication of this (in fact, Management Studio does show two subscriptions). I tried pretty much everything I could think of (changing which DB is configured first, completely disabling distribution and re-creating it, etc.), but still no luck.
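For reference, the check amounted to something like the following, run against each node (a sketch; the connection string is a placeholder, and an empty result on a node suggests that node was never set up as a subscriber):

    using System;
    using System.Data.SqlClient;

    const string sql =
        "SELECT name FROM sys.procedures " +
        "WHERE name LIKE 'sp_MSins%' OR name LIKE 'sp_MSupd%' OR name LIKE 'sp_MSdel%'";

    using (var conn = new SqlConnection(@"Data Source=NODE1;Initial Catalog=MyDb;Integrated Security=True"))
    using (var cmd = new SqlCommand(sql, conn))
    {
        conn.Open();
        using (var reader = cmd.ExecuteReader())
            while (reader.Read())
                Console.WriteLine(reader.GetString(0)); // auto-generated replication procs
    }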
Has anyone else run into this, or does anyone have any suggestions as to what I should look at trying next?
Thanks in advance!
The issue is that the database the P2P backup was taken from was modified before the backup was restored on the new node. This happened even when the option specifying that a change has happened was used, and it is likely a bug. The "solution" is to ensure that the original DB is not modified at all until replication is back up and running.

HA Database configuration that avoids split-brain issues?

I am looking for a (SQL/RDB) database setup that works something like this:
I will have 3+ databases in an active/active/active configuration
prior to doing any insert, the database will communicate with at least a majority of the others, such that they all either insert at the same time or roll back (transaction)
this way I can write to and read from any of the databases, and always get the same results (as long as the field wasn't updated very recently)
note: this is for a use case that will be very read-heavy and have few writes (and delay on the writes is an OK situation)
does anything like this exist? I see all sorts of database HA configurations, but most of them suggest writing to a primary node or having a passive backup
alternatively I could set up a custom application and have each application talk to exactly 1 database to achieve a similar result, but I was hoping something like this would already exist
So my question is: does something like this exist? If not, are there any technical/architectural reasons why not?
P.S. - I will NOT be using a SAN where all databases can store/access the same data
edit: more clarification on what I am looking for:
1. I have no database picked out yet, but I am more familiar with MySQL / SQL Server / Oracle, so I would have a minor inclination towards one of those
2. If a majority of the nodes are down (or a single node can't communicate with the collective), then I expect all writes from that node to fail, and accept that it may provide old/outdated information
failure / recovery scenario expectations:
1. A node goes down: it will query and get updates from the other nodes when it comes back up
2. A node loses connection with the collective: it will serve potentially old data to read requests, and refuse any writes
3. A node is in disagreement with the data stored in others: majority rule
4. majority rule does not work: go with whoever has the latest data (although this really shouldn't happen)
5. The entries are insert/update/read only, i.e. there will be no deletes (except manual ones, of course), so I shouldn't need to worry about an update after a delete; however, in that case I would choose to delete the record and ignore the update
6. Any other scenarios I missed?
update: the closest thing I see to what I am looking for seems to be a quorum + 2 DBs; however, I would prefer 3 DBs instead, so that I can query any of them at any time (to further distribute the reads, and also to keep another copy of the data)
You need to define "very recently". In most environments with replication for inserts, all the databases will have the same data within a few seconds of an insert (and a few seconds seems pessimistic).
An alternative approach is a "read-from-one/write-to-all" approach. In this case, reads are spread through the system. Writes are then sent to all nodes by the application (or a common layer that the application uses).
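To make that concrete, here is a deliberately naive sketch of such an application-level write layer, where a write only counts as committed if a majority of nodes accept it. All names are placeholders, and a real system would need two-phase commit or a consensus protocol to survive partial failures safely:

    using System.Collections.Generic;
    using System.Data.SqlClient;

    static bool WriteToAll(IReadOnlyList<string> nodeConnectionStrings, string sql)
    {
        int successes = 0;
        foreach (string cs in nodeConnectionStrings)
        {
            try
            {
                using (var conn = new SqlConnection(cs))
                using (var cmd = new SqlCommand(sql, conn))
                {
                    conn.Open();
                    cmd.ExecuteNonQuery();
                    successes++;
                }
            }
            catch (SqlException)
            {
                // Node down or unreachable: tolerable as long as a majority accepts.
            }
        }
        return successes > nodeConnectionStrings.Count / 2; // majority rule
    }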
Remember, though, that the issue with database replication is not how it works when it works. The issue is how it recovers when it fails and even how failures are identified. You need to decide what happens when nodes go down, how they recover lost transactions, how you decide that nodes are really synchronized. I would suggest that you peruse the documentation of the database that you are actually using and understand the replication mechanisms provided by that platform.

AX 2009 Code Propagation with Load Balancing

I'm curious how AX 2009 handles code propagation when operating in a load balanced environment.
We have recently converted our AX server infrastructure from a single AOS instance to 3 AOS instances, one of which is a dedicated load balancer (effectively 2 user-facing servers). All share the same application files and database. Since then, one user has had trouble receiving code updates made to the system. The changes generally take a few days before that user can see them, and they don't seem to arrive all at once.
For example, a value was added to an enum field, and this user was not able to see it on a form where it was used (though others connected to the same instance could). Now, the user can see the value in the dropdown as expected, but when connected to one of the instances it will not flow onto a report as it should. When connected to the other instance it works fine, and for any other user connected to either instance it works properly.
I'm not certain this is related to the infrastructure changes, but it does seem odd that only one user is experiencing it. My understanding was that with this setup, code changes would propagate across the servers either immediately (since they share the application files) or at least in a reasonable amount of time (<1 day). Is this correct, or have I been misinformed?
As your cache problems seem to be per user, go learn about AUC files.
The files are stored on the client computer and can be tricky to keep in sync. There are other problems as well.
Start AX with a script, and have the script delete the AUC file before starting AX; a sketch follows.
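A hedged sketch of such a launcher (the Ax32.exe path is a placeholder for your installation; on Vista/7 the AUC files normally sit directly under %LocalAppData% as ax_*.auc):

    using System;
    using System.Diagnostics;
    using System.IO;

    class LaunchAx
    {
        static void Main()
        {
            // Delete the stale client-side AOT cache before starting the client.
            string localAppData = Environment.GetFolderPath(Environment.SpecialFolder.LocalApplicationData);
            foreach (string auc in Directory.GetFiles(localAppData, "ax_*.auc"))
                File.Delete(auc);

            Process.Start(@"C:\Program Files (x86)\Microsoft Dynamics AX\50\Client\Bin\Ax32.exe");
        }
    }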
There is no cache coherency between AOS instances: import an XPO on one AOS server, and it is not visible on the other. You will either have to flush the cache manually or restart the other AOS. The simplest thing is to import on each server; this is especially true for labels, as importing on each server is the only way I know of to bring labels into sync.
I am sort of curious about this as well, but what I do know is that if a user has access to the AOT (is a member of the admin group, or of a group with developer access), the client will cache AOT elements more aggressively than it would without developer access.
Elements (like an enum) might be cached at the client level, but also at the AOS level. Restarting the AOS service flushes that service's memory, forcing it to reload elements upon restart.
I guess what I am suggesting is that you make sure the element is not cached client-side. Either restart the client, or run "Refresh AOD" from the developer tools menu. If that doesn't help, try restarting the AOS the client connects to, and see if that helps.
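If you script the AOS restart, it could look something like this (the service name is a placeholder; AX 2009 AOS services are usually named along the lines of AOS50$01):

    using System.ServiceProcess; // reference System.ServiceProcess.dll

    // Bounce the AOS service to flush its server-side element cache.
    using (var aos = new ServiceController("AOS50$01"))
    {
        aos.Stop();
        aos.WaitForStatus(ServiceControllerStatus.Stopped);
        aos.Start();
        aos.WaitForStatus(ServiceControllerStatus.Running);
    }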
I think it is safe to say that if you want to be absolutely sure every user has the most recent "copy" of any element, you should not develop on the application files shared by all of these services, but rather develop in an environment with a single AOS. When you need to move things to production, take down all the AOSes in production and move the changes over while the system is down.
In such cases it is often difficult to find the exact cause for a specific case.
I try to follow some best practices to avoid such situations:
- Use a separate environment for development
- Deploy code changes using layer files, not XPOs
- When deploying, stop all AOSs, deploy the files, delete the index files in the application directory, start one AOS, compile, sync the DB, then start the other AOSs (or even shut down all of them and start again)
- Try to have the latest kernel versions on the AOSs and clients

SQL Server Express Idle Mode Partial Data Returns?

I'm attempting to help our network engineers troubleshoot a situation for one of our clients. This client purchased a point-of-sale system from quite literally a "mom-and-pop" vendor, and said vendor recommended SQL Server Express 2005 as the back-end database to save the client from having to incur extra licensing fees. (Please don't get me started on that!)
We didn't write the app, and because it's a commercial app, we have no source code available. (Not that it would help us if we did; the thing was built in PowerBuilder, so we don't have tooling for it.) The app does none of its own logging, that we can ascertain. All we have to go on is SQL Server Express's own logging.
In the application, an end user swipes a membership card. Occasionally (a few times a day), the swipe will not return data from the database. The message on screen will say, "Member 123 not found." (The member numbers are actually six digits, "000123.") A rescan immediately afterward returns the member data correctly.
We've eliminated the scanner itself as a source of issues -- it routinely scans the full six-digit number. A scan of SQL Server Express's log indicates that it is coming back online from being idle, often at the point of the scan (but also at several other times per day). (Idle mode is explained here.)
I understand that allocating/deallocating RAM the way SQL Express does is a time-consuming process, especially if we're talking about hundreds of megabytes at a time -- which appears to be the case.
What we're not sure of is whether or not we're getting back partial data, or if the app is simply failing to connect to the database and displaying a generic error message. Since everything is so opaque, and the client is (for obvious reasons) unwilling to pay us to sit in their facility for 8 hours or so to physically see it happen (perhaps with network monitoring/packet sniffing tools), we're kind of at a loss.
At this point, our recommendation is that the client upgrade to SQL Server 2005 Workgroup Edition, with 5 CALs. But that doesn't completely sit well with me as the solution to this issue, because I'm reasonably certain that no SQL Server ever returns partial data -- if you can't connect, you can't connect. (That said, I still recommend it because it's a solution to a number of their other issues!)
I don't have much experience with Express. (I never use it for anything but local development, and there only at home; I certainly never recommend it to my clients.)
My question to those who might have experience with Express is, have you ever seen an instance of SQL Express return partial data, without the app itself being the cause of it? Specifically, have you seen this behavior when returning from idle mode?
(For what it's worth, we're inclined to believe that the app is failing to connect and merely displaying a generic error message, lopping off leading zeroes on the member ID when it does. That seems the most reasonable answer -- a third question might be, do you guys concur with that assessment?)
I've never heard of or experienced SQL Server Express returning partial data. It's essentially the same code base as the full SQL Server.
It is more likely that the application is experiencing a timeout (which defaults to 30 seconds) due to SQL Server Express going idle. The application probably receives a timeout that it does not expect and does not handle well.
The problem and possible solutions are discussed in this forum thread: http://social.msdn.microsoft.com/forums/en-US/sqlexpress/thread/a8fbf8d6-9949-47a5-a32b-50f8131f1127/
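To illustrate the failure mode (this is not the vendor's code, which we can't see): a hedged sketch of a lookup that tolerates the wake-from-idle delay instead of failing the first scan. The connection string, table, and column names are all placeholders:

    using System.Data.SqlClient;

    static string LookupMember(string memberId)
    {
        const string cs = @"Data Source=.\SQLEXPRESS;Initial Catalog=PosDb;Integrated Security=True";
        for (int attempt = 0; attempt < 2; attempt++)
        {
            try
            {
                using (var conn = new SqlConnection(cs))
                using (var cmd = new SqlCommand("SELECT Name FROM Members WHERE MemberId = @id", conn))
                {
                    cmd.Parameters.AddWithValue("@id", memberId); // pass as string to keep leading zeros
                    cmd.CommandTimeout = 60; // default is 30s; give the idle instance time to wake
                    conn.Open();
                    return (string)cmd.ExecuteScalar();
                }
            }
            catch (SqlException) when (attempt == 0)
            {
                // Likely the wake-from-idle timeout; retry once before reporting "not found".
            }
        }
        return null;
    }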
I suspect you have a connection string that looks like this:
Data Source=.\SQLEXPRESS; Integrated Security=True;AttachDbFilename=|DataDirectory|\myDatabase.mdf;User Instance=True
From the referenced thread:
This connection string will cause an initial connection to the main instance (.\SQLEXPRESS) and then instruct the main instance to spawn a new instance of SQL Server under the user's context and attach the database specified to that new User Instance. The User Instance is a completely separate running instance of SQL Server from the main instance; it is unique to the user and will be shut down when there are no longer any connections to it.
This is totally different than attaching a database to the main instance, which stays running at all times unless you've manually shut it down. If your question is about the main instance going into an idle state, then your question is not unique to SQL Express, and you should ask it in the Database Engine forum. I believe all editions of SQL Server have an idle state, and the other forum would be where you can find out how to affect that behavior.
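If the User Instance does turn out to be the culprit, one hedged fix is to attach the database to the main instance once (via Management Studio) and point the app at it directly, so there is no per-user instance to spin up. A placeholder connection string for that setup:

    // Connect to the always-running main instance instead of spawning
    // a User Instance on the first connection.
    const string cs = @"Data Source=.\SQLEXPRESS;Initial Catalog=myDatabase;Integrated Security=True";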