Using an ESB system to replicate data among databases - replication

I work in a small supermarket chain (4 stores). Each store has its own local database which contains information about each product, prices, and the transactions that have occurred in that store. In addition, each store needs to replicate this information back and forth with a central location.
Right now we are using something called SQLRemote, which is a feature of Sybase's SQL Anywhere database. It works, but sometimes fails and is difficult to manage. To its credit, SQLRemote wasn't actually designed for this type of scenario, so it could be said that we are using it incorrectly.
I was thinking that an ESB system such as Mule (or ChainBuilder, which seems easier to set up) might be a good alternative to SQLRemote. I understand that these systems can detect when changes occur in the database (i.e. when records are added, modified or deleted), and can be set up to deliver a message in a transaction.
Would this be a viable solution to my scenario?
Best regards,
Edgard

Yeah, I am sure Mule should be able to do this.
However, I work for a company which provides Fuse ESB, which is built on Apache projects such as Apache ServiceMix, Apache ActiveMQ, Apache Camel and Apache CXF.
We have a user story about a very big retailer in the US which uses Fuse ESB to integrate their stores, warehouses and so on:
http://fusesource.com/collateral/17
Fuse ESB
http://fusesource.com/products/enterprise-servicemix/
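To make that a bit more concrete, here is a rough sketch of what such an integration could look like as an Apache Camel route (Java DSL), Camel being one of the projects Fuse ESB is built on. The queue name, JSON payload shape and SQL statement are assumptions for illustration, not part of any real deployment:

```java
import org.apache.camel.builder.RouteBuilder;
import org.apache.camel.model.dataformat.JsonLibrary;

// Minimal sketch of a store-to-central-database sync route in Camel's Java DSL.
// Queue, table and column names are made up for illustration.
public class StoreSyncRoute extends RouteBuilder {
    @Override
    public void configure() {
        from("activemq:queue:store.price.changes")   // change events published by each store
            .transacted()                            // consume and write within one transaction
            .unmarshal().json(JsonLibrary.Jackson)   // assume JSON bodies like {"sku":"...","price":...}
            .to("sql:update product set price = :#price where sku = :#sku"); // apply to the central DB
    }
}
```

With a transaction manager configured, transacted() makes the consume-and-write step atomic, which lines up with the "deliver a message in a transaction" requirement in the question.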

Yes, Mule can support this scenario, though it might be overkill. There are targeted database replication solutions out there. The advantage of Mule would be its ability to handle failure and other scenarios where you need the workflow to adapt based on what is happening. This allows you to build a very robust solution.
Mule flows could be a very good choice to address this problem. It's a new feature of Mule 3 designed for orchestrating integrations like this.

Related

Can MuleSoft API/Anypoint copy data from one database's table to another database's table without any additional step (any custom code)?

Goal:
I have two SQL Server databases (DB-A and DB-B) located on two different servers in the same network.
DB-A has a table T1, and I want to copy data from DB-A's table T1 (source) to DB-B's table T2 (destination). This DB sync should take place any time a record in T1 is added, updated or deleted.
Please note: all DB-to-DB data sync options are out of consideration; I must use a MuleSoft API for this job.
Background:
I am new to MuleSoft and its products. I am told the MuleSoft platform can help with building and managing APIs.
I explored the web for MuleSoft's offerings, and there are many articles (linked below) suggesting that MuleSoft itself can read from one DB table and write to another DB table (using DB connectors etc.).
Questions:
Is it possible for MuleSoft itself to get this data sync job done without us writing our own MuleSoft API invoker or MuleSoft API consumer (to trigger the MuleSoft API from one end, or to receive data from the MuleSoft API on the other end and write it to the DB table)?
What are the key steps to get this data transfer working? Any reference that shows a step-by-step journey to achieving this goal would be a huge help.
Links:
https://help.mulesoft.com/s/question/0D52T00004mXXGDSA4/copy-data-from-one-oracle-table-to-another-oracle-table
https://help.mulesoft.com/s/question/0D52T00004mXStnSAG/select-insert-data-from-one-database-to-another
https://help.mulesoft.com/s/question/0D72T000003rpJJSAY/detail
First, let's clarify the terminology, since the question mixes several concepts in a confusing way. MuleSoft is a company that has several products that may apply. A "MuleSoft API" would really be an API created by MuleSoft; since you are clearly talking about APIs created by you or your organization, that would be an incorrect description. What you are talking about are really Mule applications, which are applications that are deployed to and executed in a Mule runtime. Mule applications may implement your APIs, or they may implement integrations. After all, Mule originally was an ESB product used to integrate other systems, before REST APIs were a thing. You can deploy Mule applications to Anypoint Platform, specifically to the CloudHub component of the platform, or to an on-prem instance of the Mule runtime.
In any case, a Mule application is perfectly capable of implementing APIs, integrations, or both. There is no need for it to implement an API or call another API if that is not what you want. You do need to trigger the flow somehow: by reading directly from the database to find new rows, with a scheduler that executes a query at a given time, with an HTTP request, or even by having an API listening for requests to trigger the flow.
As an example, the application can use the <db:listener> source of the Database connector to start the flow by fetching rows. You need to take care of the watermark column configuration so that only new rows are detected. See the documentation at https://docs.mulesoft.com/db-connector/1.13/database-documentation#listener for details.
Alternatively you can trigger the flow in another way and just use a select operation.
After that use DataWeave to transform the records as needed. Then use insert or update operations.
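If it helps to see the underlying idea outside of Mule, here is a rough plain-JDBC sketch of the watermark-based polling that the <db:listener> source performs for you. The connection URLs, table and column names (T1, T2, last_modified) are assumptions:

```java
import java.sql.*;

// Plain-JDBC sketch of watermark-based change detection and copy from T1 to T2.
// In a real Mule app the <db:listener> source and Database connector operations
// handle this; the URLs, credentials and column names here are assumptions.
public class WatermarkSync {
    public static void main(String[] args) throws SQLException {
        Timestamp watermark = Timestamp.valueOf("1970-01-01 00:00:00"); // persist this between runs

        try (Connection src = DriverManager.getConnection("jdbc:sqlserver://db-a;databaseName=DBA", "user", "pass");
             Connection dst = DriverManager.getConnection("jdbc:sqlserver://db-b;databaseName=DBB", "user", "pass")) {

            PreparedStatement select = src.prepareStatement(
                    "SELECT id, name, price, last_modified FROM T1 WHERE last_modified > ?");
            select.setTimestamp(1, watermark);

            PreparedStatement insert = dst.prepareStatement(
                    "INSERT INTO T2 (id, name, price) VALUES (?, ?, ?)"); // or an update/merge for existing ids

            try (ResultSet rs = select.executeQuery()) {
                while (rs.next()) {
                    insert.setLong(1, rs.getLong("id"));
                    insert.setString(2, rs.getString("name"));
                    insert.setBigDecimal(3, rs.getBigDecimal("price"));
                    insert.executeUpdate();

                    Timestamp modified = rs.getTimestamp("last_modified");
                    if (modified.after(watermark)) {
                        watermark = modified; // advance the watermark as rows are processed
                    }
                }
            }
        }
        // Store the new watermark somewhere durable and repeat on a schedule.
    }
}
```

Note that watermark-based polling only detects inserts and updates; catching deletes (which the question also asks for) needs something extra, such as soft-delete flags or database triggers.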
There are examples in the documentation that can help you get started. If you are not familiar with Mule, you should start by reading the documentation and doing some training until you pick up the concepts.

Organizing services data flow / EIP

Say I have around 1000 VMs with different services running on them, built with different technologies like Python, .NET and Java, and different middleware like RabbitMQ, Redis, etc.
How can I dynamically handle the interactions between the services and provide scalability?
For example, say I have Service A which pushes data to RabbitMQ, then the data is processed by Service B, which fetches additional data from Service C. You see, in the end I have a decentralized system which is pulling data from somewhere and pushing it somewhere else... a total mess! Scale it up to 2000 microservices, omg XD.
The moment I change one thing, a lot of other systems are affected.
Do you know of something, maybe like an ESB, where I can couple two services together with a message transform adapter in the middle, and where I can change dependencies at runtime? For example, so that the stream no longer ends in service F but in service G?
I think microservices are a good idea because they can be stateless, can scale, and can easily be deployed as containers. But I don't know a good tool/program for managing the data flow. RabbitMQ on its own doesn't support enough enterprise integration patterns. Do you have any advice?
How can I dynamically handle the interactions -
See if an existing EIP pattern solves your problem of implementing the logistics.
Depending on how your design shapes up, you may need to use distributed lock management.
Or maybe your application is simple enough to use a Consul K/V store as a semaphore and a simple Mosquitto topic-based bus.
Provide scalability
What is the solution you are trying to scale? AMQP, Consul and "microservices" are in themselves very scalable and distributed.
However, to scale your thought process and devops, you need to find a way to see things as patterns that help you split the problem and tackle the complexity.
Do you know something maybe like an ESB where I can couple two services together with a message transform adapter in the middle of it and I can change dependencies at runtime?
Read up on EIP. ESBs are just one of the many ways you can solve your problem. RTFM, and get some perspective.
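To make the "message transform adapter" idea concrete, here is a minimal sketch using the RabbitMQ Java client: consume from the queue one service writes to, transform the payload, and publish to the queue the next service reads from. The queue names and the transform are placeholders:

```java
import com.rabbitmq.client.*;
import java.nio.charset.StandardCharsets;

// Minimal "message translator" sitting between service A's output queue and
// service B's input queue. Queue names and the transform are placeholders.
public class TranslatorAdapter {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost");
        Connection conn = factory.newConnection();
        Channel channel = conn.createChannel();

        channel.queueDeclare("service.a.out", true, false, false, null);
        channel.queueDeclare("service.b.in", true, false, false, null);

        DeliverCallback onMessage = (tag, delivery) -> {
            String payload = new String(delivery.getBody(), StandardCharsets.UTF_8);
            // Transform the message format here; re-routing the stream to a
            // different destination is just a change to the output queue name.
            String translated = payload.toUpperCase();
            channel.basicPublish("", "service.b.in", null,
                    translated.getBytes(StandardCharsets.UTF_8));
            channel.basicAck(delivery.getEnvelope().getDeliveryTag(), false);
        };
        channel.basicConsume("service.a.out", false, onMessage, tag -> { });
    }
}
```

The adapter is stateless, so it can be deployed and scaled like any other container, and swapping the downstream from service F to service G only touches the adapter, not the services themselves.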
But I don't know a good tool/program for managing the data flow.
Ask yourself whether your problem is really about distributed workflow management, or whether a data pipeline is what you are actually looking for.
Look at Spark, Storm, Luigi and Airflow - they each have a different purpose - but you will know what to do with them if you manage to read up on everything else in this post ;)

Can HL7 2.x only be used for receiving messages or also to pull data?

I am quite new in the HL7 field and not a developer, so sorry if my question might seem to be too obvious.
We want to develop an app for a hospital which visualises performance and patient-flow data by aggregating data from other hospital applications. Our app will visualise both real-time data and historic data. During talks with the head of IT I got confused; he explained that I need to:
Develop an HL7 listener (like Mirth) which can receive messages from other applications that communicate via the HL7 2.x standard, in order to catch real-time data, and after that arrange to migrate historic data from those applications via SQL queries. Sounds pretty logical, though I'm not sure he's an expert, since he had no idea what an API was and knew nothing about FHIR.
My questions are:
1. What triggers an application to send an HL7 2.x message to other applications when, for instance, someone changes the status of a patient? Is it programmed to automatically send a message out with each change to a record? So, assuming all applications do this as standard, do I just need a listener like Mirth to catch those messages and migrate them into my own database?
2. Can't I use the HL7 2.x standard to pull info out of a database via a query? Meaning, can it be used for two-way communication: I send a query, and the application sends me the data back in an HL7 message? Meaning I could also use it to pull historic data from another database?
3. What kind of difference would using the FHIR standard make in this situation? I believe it can definitely be used to pull information from another database. But would it in fact make a difference compared with the approach the tech guy is advising, which is migrating historic data to my own database and then just catching new changes by receiving HL7 2.x messages?
4. Would it be advisable to use a FHIR RESTful API to pull/receive info from applications which still use the HL7 2.x standard, for both historic and real-time changes? Would this be a faster way of integrating, or is it better to use the old-fashioned way the tech guy advises?
I am very keen to know more about this, since I want to put together a strategy which is future-proof and won't cost months of integration time every time we move to a new hospital.
Thanks for your help guys!
1. It depends on the application. Most only send data, and it's configurable when and why.
2. No, you use HL7 v2 to pull data out of an application, not a database - if, that is, the application supports it. Many (most?) don't. And you can only do what the application allows.
3. FHIR would be a lot easier to use, but it's still settling, and you'll have trouble finding applications that offer a FHIR interface this year. You'll have to talk to potential customers to find out whether it's possible. By the way, FHIR can do what v2 can in this regard - both pull and push.
4. It's always advisable to use FHIR - if you can. Mostly, though, you'll have to use v2, because that's what's on offer.
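To illustrate why FHIR makes the "pull" side simpler: a query is just a REST call over HTTP. Below is a minimal sketch using Java's built-in HTTP client against a hypothetical FHIR base URL; whether a given hospital system actually exposes such an endpoint (and how it is secured) is exactly what you would need to confirm:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Minimal sketch of pulling data via a FHIR RESTful search.
// The base URL is hypothetical; real servers also require authentication.
public class FhirPullExample {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();

        // Search for Patient resources updated since a given date (historic catch-up),
        // returned as a FHIR Bundle in JSON.
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("https://hospital.example.org/fhir/Patient?_lastUpdated=gt2023-01-01"))
                .header("Accept", "application/fhir+json")
                .GET()
                .build();

        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode());
        System.out.println(response.body()); // a Bundle of matching Patient resources
    }
}
```

The rough HL7 v2 equivalent would be sending a query message (e.g. QBP/QRY) to the application itself, which, as the answer above notes, many applications simply don't support.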

How can I use NServiceBus with a database instead of MSMQ

Is it possible to use NServiceBus with a database as the queue storage instead of MSMQ? If so, how can I get started and what are the pros and cons of using a database instead of MSMQ?
If you want to use something other than MSMQ you'll have to plug in your own ITransport. I would take a look at the NSB Contrib project on GitHub; there is an implementation of ITransport for SQL Server Service Broker (messaging).
The cons I see for using a database include cost and maintenance overhead. MSMQ comes with the OS for free, and most admins have the skills to maintain it. Once you bring in a database, you have to pay for it and find someone to maintain it. This starts out OK, but once you get into multiple environments and things like clustering, licensing gets out of control.

Application Level Replication Technologies

I am building out a solution that will be deployed in multiple data centers in multiple regions around the world, with each data center having a replicated copy of data actively updated in each region. I will have a combination of multiple databases and file systems in each data center, the state of which must be kept consistent (within a data center). These multiple repositories will be fronted by a SOA service tier.
I can tolerate some latency in the replication, and need to allow for regions to be off-line, and then catch up later.
Given the multiple back-end repositories of data, I can't easily rely on independent replication solutions for each one to maintain a consistent state. I am thus led to implementing replication at the application layer, by replicating the SOA requests in some manner. I'll need to make sure that replication loops don't occur, and that last-writer conflicts are sorted out correctly.
In your experience, what is the best pattern for solving this problem, and are there good products (free or otherwise) that should be investigated?
Lotus/Domino is your answer. I've been working with it for ten years and it's exactly what you need. It may not be trendy (a perception that I would challenge), but it's powerful, adaptable and very secure. The latest version, R8, is the best yet.
You should definitely consider IBM Lotus Domino. A Lotus Notes database can replicate between sites on a predefined schedule. Replication in Notes/Domino is definitely a very powerful feature and enables full replication of data between sites. Even if a server is unavailable, the next time it connects it will simply replicate and get back in sync.
As for the SOA service tier, you could then use Domino Designer to write a web service. Since Notes/Domino 7.5.x (I believe), Domino has been able to provide and consume web services.
As others have advised, I will also recommend Lotus Notes/Domino. 8.5 really is a very powerful application development platform.
You don't give enough specifics to be certain of your needs, but I think you should check out SQL Server merge replication. It allows for asynchronous replication of multiple databases with full conflict resolution. You will need to designate a global master and all the other databases will replicate with that one, but all the database instances are fully functional (read/write), so you can schedule replication at whatever intervals suit you. If any region goes offline, it can catch up later with no issues; if the master goes offline, everyone will work independently until replication can resume.
I would be interested to know of other solutions this flexible (apart from Lotus Notes/Domino of course which is not very trendy these days).
I think that your answer is going to have to be based on a pub/sub architecture. I am assuming that you have reliable messaging between your data centers, so that you can rely on published updates eventually being received. If all of your access to the data repositories is via services, you can add an event notification to the orchestration of each of your update services that notifies all interested data centers of the event. Ideally, the master database is the only one that sends out these updates; in that case you can exclude routing the notifications to the node that generated them in the first place, thus avoiding update loops.
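As a sketch of the loop-avoidance part, assuming a JMS broker such as ActiveMQ carries the update notifications between data centers: each event is tagged with an "origin" property, and every data center subscribes with a message selector that filters out its own events. The broker URL, topic name and property name are assumptions:

```java
import javax.jms.*;
import org.apache.activemq.ActiveMQConnectionFactory;

// Sketch of pub/sub replication events with an "origin" property so that a
// data center never re-applies (or re-publishes) its own updates.
// Broker URL, topic and property names are assumptions.
public class ReplicationEvents {
    private static final String DC = "dc-east"; // this data center's identifier

    public static void main(String[] args) throws JMSException {
        ConnectionFactory factory = new ActiveMQConnectionFactory("tcp://broker:61616");
        Connection connection = factory.createConnection();
        connection.start();
        Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
        Topic topic = session.createTopic("replication.updates");

        // Publisher side: tag every update event with its origin.
        MessageProducer producer = session.createProducer(topic);
        TextMessage event = session.createTextMessage("{\"entity\":\"order\",\"id\":42,\"op\":\"update\"}");
        event.setStringProperty("origin", DC);
        producer.send(event);

        // Subscriber side: a selector excludes events this data center produced,
        // which breaks the replication loop.
        MessageConsumer consumer = session.createConsumer(topic, "origin <> '" + DC + "'");
        consumer.setMessageListener(message -> {
            try {
                System.out.println("Applying remote update: " + ((TextMessage) message).getText());
            } catch (JMSException e) {
                e.printStackTrace();
            }
        });
    }
}
```

Last-writer conflicts still have to be handled when the events are applied, for example by carrying a timestamp or version in each event and discarding stale updates.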