Synchronizing with Corp and being a host using SymmetricDS - replication

I have a question regarding SymmetricDS and trying to get a project to work.
So far I've managed to get Corp to sync with the Stores and the Stores with Corp.
This is working as intended: changes in a Store are replicated to Corp, which broadcasts the changes back to the other Stores.
Corp <--> Store_PC01
Corp <--> Store_PC02
Corp <--> Store_PC03
Corp <--> Store_PC04
This is all fine. However, I'm looking to go beyond this structure: I want the computers within a store to sync with each other. I'm thinking about making one PC a host as well, which will only sync with the computers in the same network.
[Diagrams of the proposed in-store layout were not preserved.]
Is this possible with SymmetricDS? I'm thinking of having 3 engine files. One will communicate with Corp, the second will be the PC's self-host, and the third will be the client that connects to the other PCs' self-hosts in the same store. Is this logic correct? Do you recommend I change anything?
The reason I'm doing this is that the connection with Corp might be lost, as the internet can be somewhat unstable, so I need the computers in the same network to sync with each other.

Yes, it's possible, as long as the synchronization graph is a tree, i.e. each node has only one parent to which it talks, and there is one root node (without a parent) that serves as the reference configuration point. For each level in the graph you will need to introduce at least one new node group and define sync rules, W (wait for pull) or P (push), between it and the node group representing its parent level.
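The tree requirement can be checked mechanically before any configuration is written. The following is a minimal sketch in Python, independent of SymmetricDS itself; the node names (`corp`, `store1_host`, and so on) and the `is_tree` helper are hypothetical and only illustrate the "one parent per node, exactly one root" rule.

```python
def is_tree(parent_of):
    """Return True if the child -> parent mapping describes one rooted tree.

    parent_of maps every node name to its parent's name, or to None for the root.
    A dict already guarantees that each node has at most one parent.
    """
    roots = [node for node, parent in parent_of.items() if parent is None]
    if len(roots) != 1:
        return False  # zero roots or several roots
    for node in parent_of:
        seen = set()
        current = node
        # Walk up towards the root; fail on a cycle or an unknown parent.
        while current is not None:
            if current in seen or current not in parent_of:
                return False
            seen.add(current)
            current = parent_of[current]
    return True


# Hypothetical layout: one PC per store acts as the in-store host.
topology = {
    "corp": None,
    "store1_host": "corp",
    "store1_pc02": "store1_host",
    "store1_pc03": "store1_host",
    "store2_host": "corp",
}
print(is_tree(topology))
```

In this layout the store host is the only node in its store that talks to Corp, which keeps every node at a single parent.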

Akka.NET: Restrict child actor creation in akka.net cluster to a single machine

We have a particular scenario in our application: all the child actors in this application deal with huge volumes of data (around 50-200 MB).
Due to this, we have decided to create the child actors on the same machine (worker process) on which the parent actor was created.
Currently, this is achieved through the use of roles. We also use the .NET memory cache to transfer the data (several MB) between child actors.
Question: Is it OK to turn off clustering in the child actors to achieve the result we are expecting?
Edit: To be more specific, I have explained our application setup in detail below.
The whole process happens inside an Akka.NET cluster of around 5 machines.
Worker processes (which contain both parent and child actors) are deployed on each of those machines.
Both parent and child actors are cluster-enabled in this setup.
When we found out about the network overhead caused by distributing the child actors across machines, we decided to restrict child actor creation to the machine that received the primary request, and to distribute only the parent actors across machines.
When we approached an Akka.NET expert with this problem, we were advised to use "Roles" in order to restrict child actor creation to a single machine in a cluster system (e.g., Worker1Child, Worker2Child instead of a single "Child" role).
Question (contd.): I just want to know whether simply disabling the cluster option in the child actors will achieve the same result, and whether it is a best practice to do so.
Please advise.
Sounds to me like you've been using a clustered pool router to remotely deploy worker actors across the cluster - you didn't explicitly mention this in your description, but that's what it sounds like.
It also sounds like what you're really trying to do here is take advantage of local affinity: have child worker actors for the same entities all work together inside the same process.
Here's what I would recommend:
Have all worker actors created as children of parents, locally, inside the same process, but either using something like the child-per-entity pattern or a LOCAL pool router.
Distribute work between the worker nodes using a clustered group router, using roles etc.
Any of the work in that high volume workload should all flow directly from parent to children, without needing to round-trip back and forth between the rest of the cluster.
Given the information that you've provided here, this is as close to a "general" answer as I can provide - hope you find it helpful!

Working with patient/customer data outside of the office

Background
I am a developer who works for a health care organization. We build a variety of business apps, most of which contain PHI (Protected Health Information). We work on laptops in-house and occasionally have the option to work from home. Something we are discussing, though, is how to handle the data stored on our laptops when we are working out of the office.
Although we have passwords and our laptops are encrypted, that still doesn't seem like enough to protect the data. Here is what I mean. We are a small five-person team. When we work on a task, we each work locally against our own database on our own laptop. When the change is done, we commit to svn and publish to a test server. Our concern is that my local database is sometimes a copy of production, so that I can test against real data. That local database could contain thousands of PHI records. This is obviously a major concern when we take our laptops out of the building: if my laptop were stolen, I would be putting thousands of patients' health information at risk. Not something we want to do.
My Question
How do developers work, as a best practice, with regard to patient data safety? What if the data were financial instead? Either way, how do people work with patient/customer data locally?
Is it fair to say that sometimes you just don't have the ability to connect to a database behind a firewall, or is that just negligence? Even if I keep the database internal, I still have project code on my laptop. Is that bad too?
• Should I have fake data?
• Should all data be on an internal machine that you connect to?
• Should I only connect in to a machine that is internal?
I can’t imagine that is what people do all the time.
We are discussing this as a team and would love to hear your feedback in regards to "how do you or anyone work as a remote developer".
Thanks

Synchronizing client-server databases

I'm looking for some general strategies for synchronizing data on a central server with client applications that are not always online.
In my particular case, I have an Android phone application with an SQLite database and a PHP web application with a MySQL database.
Users will be able to add and edit information in the phone application and in the web application. I need to make sure that changes made in one place are reflected everywhere, even when the phone is not able to communicate with the server immediately.
I am not concerned with how to transfer data from the phone to the server or vice versa. I'm mentioning my particular technologies only because I cannot use, for example, the replication features available to MySQL.
I know that the client-server data synchronization problem has been around for a long, long time and would like information - articles, books, advice, etc - about patterns for handling the problem. I'd like to know about general strategies for dealing with synchronization to compare strengths, weaknesses and trade-offs.
The first thing you have to decide is a general policy about which side is considered "authoritative" in case of conflicting changes.
I.e.: suppose Record #125 is changed on the server on January 5th at 10pm and the same record is changed on one of the phones (let's call it Client A) on January 5th at 11pm.
Last synch was on Jan 3rd. Then the user reconnects on, say, January 8th.
Identifying what needs to be changed is "easy" in the sense that both the client and the server know the date of the last synch, so anything created or updated (see below for more on this) since the last synch needs to be reconciled.
So, suppose that the only changed record is #125.
You either decide that one of the two automatically "wins" and overwrites the other, or you need to support a reconcile phase where a user can decide which version (server or client) is the correct one, overwriting the other.
This decision is extremely important and you must weigh the "role" of the clients, especially if conflicts can arise not only between client and server but also between different clients changing the same record(s).
[Assuming that #125 can be modified by a second client (Client B) there is a chance that Client B, which hasn't synched yet, will provide yet another version of the same record, making the previous conflict resolution moot]
Regarding the "created or updated" point above... how can you properly identify a record if it originated on one of the clients (assuming this makes sense in your problem domain)?
Let's suppose your app manages a list of business contacts. If Client A says you have to add a newly created John Smith, and the server has a John Smith created yesterday by Client D... do you create two records because you cannot be certain that they aren't different persons? Will you ask the user to reconcile this conflict too?
Do clients have "ownership" of a subset of data? I.e. if Client B is set up to be the "authority" on data for Area #5, can Client A modify/create records for Area #5 or not? (This would make some conflict resolution easier, but may prove unfeasible for your situation.)
To sum it up the main problems are:
How to define "identity", considering that detached clients may not have accessed the server before creating a new record.
The previous situation, no matter how sophisticated the solution, may result in data duplication, so you must foresee how to periodically resolve duplicates and how to inform the clients that what they considered "Record #675" has actually been merged with/superseded by Record #543.
Decide whether conflicts will be resolved by fiat (e.g. "The server version always trumps the client's if the former has been updated since the last synch") or by manual intervention.
In case of fiat, especially if you decide that the client takes precedence, you must also decide how to deal with other, not-yet-synched clients that may have more changes coming.
The previous items don't take into account the granularity of your data (in order to make things simpler to describe). Suffice it to say that instead of reasoning at the "Record" level, as in my example, you may find it more appropriate to record changes at the field level, or to work on a set of records (e.g. Person record + Address record + Contacts record) at a time, treating their aggregate as a sort of "Meta Record".
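To make the "by fiat versus manual" choice concrete, here is a minimal Python sketch of record-level reconciliation. The `reconcile` function, the record layout (a dict with an `updated_at` field), the policy names and the year used in the dates are all assumptions made for illustration; the sketch deliberately ignores the multi-client and identity problems listed above.

```python
from datetime import datetime


def reconcile(server_rec, client_rec, last_sync, policy="server_wins"):
    """Choose which version of one record survives a sync.

    Returns (winner, needs_review). needs_review is True only when both sides
    changed since last_sync and the policy is 'manual'.
    """
    if policy not in ("server_wins", "client_wins", "manual"):
        raise ValueError(f"unknown policy: {policy}")
    server_changed = server_rec["updated_at"] > last_sync
    client_changed = client_rec["updated_at"] > last_sync
    if server_changed and client_changed:
        # True conflict: both sides edited the record since the last sync.
        if policy == "manual":
            return None, True
        winner = server_rec if policy == "server_wins" else client_rec
        return winner, False
    if client_changed:
        return client_rec, False
    return server_rec, False


# Record #125 from the example: last sync on Jan 3rd, server edit on Jan 5th
# at 10pm, Client A edit on Jan 5th at 11pm (the year is arbitrary).
last_sync = datetime(2024, 1, 3)
server = {"id": 125, "name": "server edit", "updated_at": datetime(2024, 1, 5, 22, 0)}
client = {"id": 125, "name": "client edit", "updated_at": datetime(2024, 1, 5, 23, 0)}

winner, needs_review = reconcile(server, client, last_sync, policy="server_wins")
print(winner["name"], needs_review)
```

Comparing timestamps taken on different devices assumes reasonably synchronized clocks; a per-record version counter issued by the server would avoid that assumption.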
Bibliography:
More on this, of course, on Wikipedia.
A simple synchronization algorithm by the author of Vdirsyncer
OBJC article on data synch
SyncML®: Synchronizing and Managing Your Mobile Data (Book on O'Reilly Safari)
Conflict-free Replicated Data Types
Optimistic Replication YASUSHI SAITO (HP Laboratories) and MARC SHAPIRO (Microsoft Research Ltd.) - ACM Computing Surveys, Vol. V, No. N, 3 2005.
Alexander Traud, Juergen Nagler-Ihlein, Frank Kargl, and Michael Weber. 2008. Cyclic Data Synchronization through Reusing SyncML. In Proceedings of the The Ninth International Conference on Mobile Data Management (MDM '08). IEEE Computer Society, Washington, DC, USA, 165-172. DOI=10.1109/MDM.2008.10 http://dx.doi.org/10.1109/MDM.2008.10
Lam, F., Lam, N., and Wong, R. 2002. Efficient synchronization for mobile XML data. In Proceedings of the Eleventh international Conference on information and Knowledge Management (McLean, Virginia, USA, November 04 - 09, 2002). CIKM '02. ACM, New York, NY, 153-160. DOI= http://doi.acm.org/10.1145/584792.584820
Cunha, P. R. and Maibaum, T. S. 1981. Resource = abstract data type + synchronization - A methodology for message oriented programming -. In Proceedings of the 5th international Conference on Software Engineering (San Diego, California, United States, March 09 - 12, 1981). International Conference on Software Engineering. IEEE Press, Piscataway, NJ, 263-272.
(The last three are from the ACM digital library, no idea if you are a member or if you can get those through other channels).
From the Dr.Dobbs site:
Creating Apps with SQL Server CE and SQL RDA by Bill Wagner May 19, 2004 (Best practices for designing an application for both the desktop and mobile PC - Windows/.NET)
From arxiv.org:
A Conflict-Free Replicated JSON Datatype - the paper describes a JSON CRDT implementation (Conflict-free replicated datatypes - CRDTs - are a family of data structures that support concurrent modification and that guarantee convergence of such concurrent updates).
I would recommend that you have a timestamp column in every table and, every time you insert or update, set the timestamp value of each affected row. Then iterate over all tables, checking whether each row's timestamp is newer than the one in the destination database. If it's newer, check whether you have to insert or update.
Observation 1: be aware of physical deletes, since the rows are deleted from the source DB and you have to do the same on the server DB. You can solve this by avoiding physical deletes or by logging every delete in a table with timestamps, something like this: DeletedRows = (id, table_name, pk_column, pk_column_value, timestamp). You then read all the new rows of the DeletedRows table and execute a delete on the server using table_name, pk_column and pk_column_value.
Observation 2: be aware of foreign keys (FKs), since inserting data into a table that's related to another table could fail. You should deactivate every FK before data synchronization.
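The timestamp-plus-delete-log approach can be sketched with Python's built-in `sqlite3` module. This is an illustration under simplifying assumptions, not a complete sync engine: there is a single hypothetical `contacts` table, timestamps are plain integers, the delete log is a reduced version of the DeletedRows table above (it omits `pk_column`), and foreign keys are not modeled.

```python
import sqlite3

SCHEMA = """
CREATE TABLE contacts (id INTEGER PRIMARY KEY, name TEXT, updated_at INTEGER);
CREATE TABLE deleted_rows (table_name TEXT, pk_value INTEGER, deleted_at INTEGER);
"""


def sync(source, dest, last_sync):
    """Copy rows changed after last_sync from source to dest, then replay deletes."""
    changed = source.execute(
        "SELECT id, name, updated_at FROM contacts WHERE updated_at > ?",
        (last_sync,),
    ).fetchall()
    for row_id, name, updated_at in changed:
        existing = dest.execute(
            "SELECT updated_at FROM contacts WHERE id = ?", (row_id,)
        ).fetchone()
        if existing is None:
            # Row is unknown at the destination: insert it.
            dest.execute(
                "INSERT INTO contacts VALUES (?, ?, ?)", (row_id, name, updated_at)
            )
        elif updated_at > existing[0]:
            # Row exists but the source copy is newer: update it.
            dest.execute(
                "UPDATE contacts SET name = ?, updated_at = ? WHERE id = ?",
                (name, updated_at, row_id),
            )
    deletes = source.execute(
        "SELECT pk_value FROM deleted_rows "
        "WHERE table_name = 'contacts' AND deleted_at > ?",
        (last_sync,),
    ).fetchall()
    for (pk_value,) in deletes:
        dest.execute("DELETE FROM contacts WHERE id = ?", (pk_value,))
    dest.commit()


source = sqlite3.connect(":memory:")
dest = sqlite3.connect(":memory:")
for db in (source, dest):
    db.executescript(SCHEMA)

dest.execute("INSERT INTO contacts VALUES (1, 'Ann', 10)")
dest.execute("INSERT INTO contacts VALUES (2, 'Bob', 10)")
source.execute("INSERT INTO contacts VALUES (1, 'Ann Lee', 20)")  # edited since last sync
source.execute("INSERT INTO contacts VALUES (3, 'Cy', 25)")  # created since last sync
source.execute("INSERT INTO deleted_rows VALUES ('contacts', 2, 30)")  # deleted since last sync

sync(source, dest, last_sync=15)
rows = dest.execute("SELECT id, name FROM contacts ORDER BY id").fetchall()
print(rows)  # [(1, 'Ann Lee'), (3, 'Cy')]
```

This only handles one direction (source to destination); a two-way sync would run it both ways and needs a conflict policy for rows changed on both sides.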
If anyone is dealing with similar design issue and needs to synchronize changes across multiple Android devices I recommend checking Google Cloud Messaging for Android (GCM).
I am working on one solution where changes done on one client must be propagated to other clients. And I just implemented a proof of concept implementation (server & client) and it works like a charm.
Basically, each client sends delta changes to the server. E.g. resource id ABCD1234 has changed from value 100 to 99.
The server validates these delta changes against its database and either approves the change (the client is in sync) and updates its database, or rejects the change (the client is out of sync).
If the change is approved, the server then notifies the other clients (excluding the one that sent the delta change) via GCM by sending a multicast message carrying the same delta change. The clients process this message and update their databases.
The cool thing is that these changes are propagated almost instantaneously if those devices are online, and I do not need to implement any polling mechanism on those clients.
Keep in mind that if a device is offline too long and there are more than 100 messages waiting in the GCM queue for delivery, GCM will discard those messages and send a special message when the device gets back online. In that case the client must do a full sync with the server.
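The approve/reject step can be sketched as a small in-memory model in Python. The `DeltaServer` class and its method are hypothetical, and comparing the client's "old" value with the server's current value is just one plausible way to validate a delta; the GCM notification itself is not modeled.

```python
class DeltaServer:
    """Toy in-memory server that validates deltas of the form (id, old, new)."""

    def __init__(self, values):
        self.values = dict(values)

    def apply_delta(self, resource_id, old_value, new_value):
        """Approve the delta only if the client's old value matches the server's."""
        if self.values.get(resource_id) != old_value:
            return False  # client is out of sync and should do a full sync
        self.values[resource_id] = new_value
        return True  # approved; a real server would now notify the other clients


server = DeltaServer({"ABCD1234": 100})
first = server.apply_delta("ABCD1234", 100, 99)  # client saw 100, so it is in sync
second = server.apply_delta("ABCD1234", 100, 98)  # stale client still believes 100
print(first, second, server.values["ABCD1234"])
```

A rejected client would fall back to a full sync, the same recovery path as when GCM discards queued messages.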
Check also this tutorial to get started with the GCM client implementation.
This answer is for developers who are using the Xamarin framework (see https://stackoverflow.com/questions/40156342/sync-online-offline-data).
A very simple way to achieve this with the Xamarin framework is to use Azure's Offline Data Sync, as it allows you to push and pull data from the server on demand. Read operations are done locally, and write operations are pushed on demand. If the network connection breaks, the write operations are queued until the connection is restored, then executed.
The implementation is rather simple:
1) Create a Mobile App in the Azure portal (you can try it for free here https://tryappservice.azure.com/)
2) Connect your client to the Mobile App.
https://azure.microsoft.com/en-us/documentation/articles/app-service-mobile-xamarin-forms-get-started/
3) The code to set up your local repository:
const string path = "localrepository.db";
// Create our Azure mobile app client
this.MobileService = new MobileServiceClient("the api address as setup on Mobile app services in azure");
// Set up our local SQLite store
var repository = new MobileServiceSQLiteStore(path);
// Define a Foo table in the local store
repository.DefineTable<Foo>();
// Initialize repository synchronisation
await this.MobileService.SyncContext.InitializeAsync(repository);
var fooTable = this.MobileService.GetSyncTable<Foo>();
4) Then push and pull your data to make sure you have the latest changes:
await this.MobileService.SyncContext.PushAsync();
await fooTable.PullAsync("allFoos", fooTable.CreateQuery());
https://azure.microsoft.com/en-us/documentation/articles/app-service-mobile-xamarin-forms-get-started-offline-data/
I suggest you also take a look at SymmetricDS. It is a database replication tool that offers an Android client for SQLite, so you can use it to synchronize your client and server databases. I also suggest having a separate database on the server for each client. Trying to hold the data of all users in one MySQL database is not always the best idea, especially if the user data is going to grow fast.
Let's call it the CUDR sync problem (I don't like CRUD, because Create/Update/Delete are writes and should be grouped together).
The problem may also be looked at from a write-offline-first or a write-online-first perspective. The write-offline-first approach has a problem with unique identifier conflicts, and also needs multiple network calls for the same transaction, increasing risk (or cost)...
I personally find the write-online-first approach easier to manage (the server becomes the single source of truth from which everything else is synced). The write-online-first approach means not letting users write offline first: the local (offline) write happens only after the online write returns an OK response.
The user may read offline first and, as soon as the network is available, get the data from the server, update the local database and then update the UI...
One way to avoid the unique identifier conflict would be to use a combination of unique user id + table name or table id + row id (generated by SQLite), together with a "synced" boolean flag column. Registration still has to be done online first, to obtain the unique user id from which all other ids are generated. Unsynchronized clocks will also be an issue here, as someone mentioned above.
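The composite identifier idea can be sketched in a few lines of Python. The `make_global_id` helper and its `user:table:row` format are hypothetical; the only point is that two users who both create local row 1 in the same table end up with different global ids.

```python
def make_global_id(user_id, table_name, local_row_id):
    """Combine a server-issued user id with the local table name and row id."""
    return f"{user_id}:{table_name}:{local_row_id}"


# Both users created row 1 in their local 'contacts' table while offline.
id_a = make_global_id("user-17", "contacts", 1)
id_b = make_global_id("user-42", "contacts", 1)
print(id_a, id_b, id_a != id_b)
```

If the same user can write from several devices, the user id alone is not enough and some per-device component would be needed as well.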

Cache Regions in Velocity/AppFabric using WCF

I have a service-based architecture where a web farm full of ASP clients hits an application server farm of WCF services. Obviously, all the database access is done by the WCF services. Now I would like to cache my frequently used objects retrieved from the database, using Velocity at the service tier level. I am considering making each physical application server part of the cache cluster as well.
According to the Velocity documentation, if I use regions, objects are stored on a single host only. I actually wouldn't have any problem if each host kept its own cache, provided that I could somehow synchronize them.
So my questions are
If I create one region on one host is it also created on another one?
When I clear a cache region, is it cleared on one host only?
If I subscribe to a region level notification on all the hosts, can I catch events of one host on another one?
In this scenario should I use regions at all or stay away from them?
I hope my questions are clear. Actually, I am more interested in a solution to my problem than in answers to my questions.
Yes, you are right in reading the docs: a region exists on only one host.
" I actually wouldn't have any problem if each host kept it's own cache provided that I could somehow synchronize them."
When you say synchronize, do you mean when HA is enabled? Velocity would actually take care of that, if that's what you meant.
For the questions:
1. No.
2. Yes
3. Notifications are sent to the client, so I am not sure whether there is any way to send notifications to another host.
4. Regions give you search capabilities but take away HA. In your case, you could use the advantages of HA.
Having regions does not necessarily mean that you don't have HA. If you create your own cache (and don't use the 'default' one), you can create it with Secondaries = 1 (HA on).
Now let's say you have 4 cache hosts. When you define a region, it will have both a primary and a secondary host, so each action on the region is applied on both.
Shany
Named caches distribute across participating nodes. Named regions live on a single node. Regions can be HA, but they cannot take full advantage of distributed cache scaling, as their object load does not distribute across participating nodes in the cluster. Also, using named caches with HA requires three nodes minimum, rather than two nodes if you used the "default" cache only.

Functional Server Naming Conventions

I've seen "The Coolest Server Names," and I've seen another smaller-ish question related to mine, which was unfortunately closed.
It's a serious question though, as I'm on an internal applications dev team that manages the apps on a couple dozen servers. The networking folks typically don't care what we call the servers as long as they know about 'em, so we can come up with whatever conventions.
The apps the servers deal with can be home-grown custom apps, or they can be larger vendor ones like SharePoint. They can be:
In multiple networking environments that can't speak to each other (think firewalled-off external servers versus intranet-esque servers)
In different physical locations (California office versus New York, etc.)
In multiple deployment tiers (production, staging, testing, dev)
Have one or many functions (web server, DB server, mail server, app server)
Load-balanced or not
Standby (for disaster recovery purposes) or primary
Whew! Think it's even possible to come up with a convention that can address all of these aspects, or significant ones? It'd be nice to hear a server name (or DNS entry for it) and be able to immediately know what it does, and it works for getting new guys up to speed as well. "sharepoint-IPC-1 is down" could be parsed into "the internal production SharePoint web server in the California datacenter that's the first node in the load balancing is down!"...but that seems overly complicated at first glance.
Another thing in the back of my mind is that an old mail relay server is getting decommissioned, which means we have to scour through a lot of old apps to repoint hardcoded server values (I know... :).
Here are some general guidelines I try to abide by, based on mistakes I've made in the past.
Never base your machine names on...
Hardware: Machines get swapped out all the time, and you don't want to have to do too much work if you change from an IBM server, to a Sun server, to a Dell server.
Location: Equipment and even entire server rooms can be moved based on business requirements or technical issues.
Intended use: As your product evolves, so too may the intended use of each server. A machine named "dbsrv" that eventually acts as a file server too is confusing.
Owner: The person who "owns" the equipment (an employee) can change, due to firings, layoffs, and moves within the company.
Subnet: As I said before, labs can move, and so can subnets. One of the main goals of DNS is to free you from being tied to a specific IP address, so why tie yourself down needlessly?
Now, some suggestions for the situation you described...
Machines spread across a region: This is what subdomains are for in DNS. You could have "west.company.com" and "east.company.com".
Have one or many functions: Don't name them based on intended use. If you name them based on some large collection of names (Greek gods, for example), you will eventually intuitively know that zeus.east means your master database server and apollo.west is your backup database server. Worst case, look it up in a spreadsheet.
Load-balanced or not: You can take two approaches. You could have a unique name per node behind the load balancer, or you could do something like athena-1.east, athena-2.east, etc. Either way, a load balancer will (hopefully) free you from worrying too much about what each node is named.
Standby or not: This doesn't sound like a criterion that should have an impact on the machine name.
What I'm essentially saying is:
Separate your equipment into different regional subdomains
Choose a naming scheme with plenty of names (Greek gods in this example)
Don't base the names on any of the criteria I mentioned above (intended use, location, etc.)
Trying to do anything more than that will be more trouble than it's worth.
I know that it's tempting to assign names to servers that describe their functions and other similar attributes, and in a perfect world that would work. In practice, I have found that after a while these things get messed up: the functions and other parameters of the servers change (as the requirements of the business change), so the names no longer reflect reality.
I think you should assign the servers unique names that say nothing about their function or other parameters, and keep some sort of (up-to-date) list detailing those things so that your people can look them up. That's what we do here.
The other extreme is using IP addresses only, or names based on IP addresses, which can also lead to disaster if you ever have to change your IP addresses.