Keeping GemFire in Sync with a Database

We are developing an application that makes use of deep object models that are stored in GemFire. Clients will access a REST service to perform cache operations instead of accessing the cache via the GemFire client. We are also planning to have a database underneath GemFire as the system of record, making use of asynchronous write-behind to do the DB updates and inserts.
In this scenario it seems to be a non-trivial problem to guarantee that an insert or update into GemFire will result in a successful insert or update in the DB without setting up elaborate server-side validation (essentially the constraints on the GemFire operation would have to match the DB operation constraints). We cannot bubble the DB insert or update success/failure back to the client without making the DB call synchronous with the GemFire operation, which would obviously defeat the purpose of using GemFire for low-latency client operations.
How have other GemFire adopters who are using write-behind solved the problem of keeping the DB in sync with the GemFire data fabric?

We generally recommend implementing validations in GemFire in a CacheWriter, since by definition it is called before any modification occurs on the cache.
http://gemfire.docs.gopivotal.com/javadocs/japi/com/gemstone/gemfire/cache/CacheWriter.html
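A minimal sketch of such a writer (the Order value type and its fields are placeholders for your own domain objects, not part of the GemFire API):
import com.gemstone.gemfire.cache.CacheWriterException;
import com.gemstone.gemfire.cache.EntryEvent;
import com.gemstone.gemfire.cache.util.CacheWriterAdapter;
public class OrderValidationWriter extends CacheWriterAdapter<String, Order> {
    @Override
    public void beforeCreate(EntryEvent<String, Order> event) throws CacheWriterException {
        validate(event.getNewValue());
    }
    @Override
    public void beforeUpdate(EntryEvent<String, Order> event) throws CacheWriterException {
        validate(event.getNewValue());
    }
    // Mirror the DB constraints you care about here; throwing aborts the cache operation,
    // so the client gets a synchronous failure instead of a silent write-behind error later.
    private void validate(Order order) {
        if (order.getCustomerId() == null) {
            throw new CacheWriterException("customerId is required");
        }
        if (order.getAmount() < 0) {
            throw new CacheWriterException("amount must be non-negative");
        }
    }
}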
That said, a pattern I saw at a customer was to receive data on region "A" and implement basic validations there, accepting the data. The data is then copied to the DB using write-behind as you describe, batched through an AsyncEventListener. In the try/catch around the insertion, if any error occurs they store that record in another region, which has its own listener for the write-behind, non-batched, so you can actually see which record failed and decide what to do accordingly.
Something like:
data -> CacheWriter basic checks -> region A -> AEL (batch 500~N events) -> DB
if an error occurs, copy to region B -> listener persists individual records to the DB ->
take some action on the failed record
In their case, they had an interface to manually clear pending records on region B.
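A rough sketch of the batched listener with the error-region fallback, assuming the same placeholder Order type, a JDBC DataSource, and illustrative region and table names:
import com.gemstone.gemfire.cache.CacheFactory;
import com.gemstone.gemfire.cache.Region;
import com.gemstone.gemfire.cache.asyncqueue.AsyncEvent;
import com.gemstone.gemfire.cache.asyncqueue.AsyncEventListener;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.util.List;
import javax.sql.DataSource;
public class OrderWriteBehindListener implements AsyncEventListener {
    private final DataSource dataSource;
    public OrderWriteBehindListener(DataSource dataSource) {
        this.dataSource = dataSource;
    }
    @Override
    public boolean processEvents(List<AsyncEvent> events) {
        // "Region B": failed records are parked here and retried one by one by its own listener.
        Region<Object, Object> failedRegion = CacheFactory.getAnyInstance().getRegion("ordersFailed");
        try (Connection con = dataSource.getConnection();
             PreparedStatement ps = con.prepareStatement("INSERT INTO orders(id, payload) VALUES (?, ?)")) {
            for (AsyncEvent event : events) {
                try {
                    ps.setObject(1, event.getKey());
                    // toString() stands in for real column mapping of your domain object.
                    ps.setString(2, String.valueOf(event.getDeserializedValue()));
                    ps.executeUpdate();
                } catch (Exception e) {
                    failedRegion.put(event.getKey(), event.getDeserializedValue());
                }
            }
        } catch (Exception e) {
            return false; // the whole batch could not be processed, ask GemFire to redeliver it
        }
        return true;
    }
    @Override
    public void close() {
    }
}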
Hope that helps; if not, maybe you can provide more details about your use case...
Cheers.

You can register an AsyncEventListener and update your DB with batch updates. For more info:
http://pubs.vmware.com/vfabric53/topic/com.vmware.vfabric.gemfire.7.0/developing/events/implementing_write_behind_event_handler.html
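For illustration, a hedged sketch of registering such a listener through an AsyncEventQueue and attaching it to a region (the queue and region names and the batch size are arbitrary choices):
import com.gemstone.gemfire.cache.Cache;
import com.gemstone.gemfire.cache.Region;
import com.gemstone.gemfire.cache.RegionShortcut;
import com.gemstone.gemfire.cache.asyncqueue.AsyncEventListener;
import com.gemstone.gemfire.cache.asyncqueue.AsyncEventQueue;
public class WriteBehindSetup {
    // Attach the listener to a region via an async event queue so that puts on the region
    // are batched and handed to the listener off the client's critical path.
    public static Region<String, Object> setup(Cache cache, AsyncEventListener listener) {
        AsyncEventQueue queue = cache.createAsyncEventQueueFactory()
                .setBatchSize(500)      // hand the listener up to 500 events per callback
                .setPersistent(true)    // let the queue survive member restarts
                .create("ordersWriteBehind", listener);
        return cache.<String, Object>createRegionFactory(RegionShortcut.PARTITION)
                .addAsyncEventQueueId(queue.getId())
                .create("orders");
    }
}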

GemFire version 8 and above solves your problem of accessing cache objects through a REST service:
GemFire REST

Related

Can I have a Redis master in one microservice domain and a Redis slave used by a different microservice as a way of sharing data?

I have three microservices that need to communicate with each other. Microservice-1 is in charge of the data and the database (it writes to and reads from it). I will add a Redis cache store for Microservice-1 to cache data there. I want to put a Redis slave next to the other two microservices to reduce communication with the actual microservice if the data is already in the cache store. Since all updates to the data have to go through Microservice-1, and it will always update the cache, Redis replication will make sure the other two microservices get the updates too. Of course, if the data is not in the cache, they will call Microservice-1 for the data, which will update the cache.
Am I missing something with this approach?
This will definitely work in the "sunny day" case.
But sometimes there are storms, and in storms there's a chance of losing cache coherency (i.e. the DB and Redis disagree on the data).
For example, let's say that you have Microservice-1 update the DB and then update Redis. What happens if there's a crash between updating the DB and updating Redis?
On the other hand, what if you reverse the ordering (update Redis and then the DB)? Now Redis could be updated and not the DB.
Neither of these is insurmountable, but absent a means of having a transaction which ensures that either both Redis and the DB are updated or neither is, there will always be a time window where the change is in one but not the other. In that situation, it's probably worth embracing eventual consistency (e.g. periodically scan the DB and update Redis with recently updated records).
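A minimal sketch of that reconciliation pass, assuming plain JDBC plus a Jedis client; the "customers" table, its updated_at column, and the key naming are illustrative:
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Timestamp;
import redis.clients.jedis.Jedis;
public class RedisReconciler {
    public void syncRecentUpdates(Connection db, Jedis jedis, Timestamp since) throws SQLException {
        String sql = "SELECT id, payload FROM customers WHERE updated_at > ?";
        try (PreparedStatement ps = db.prepareStatement(sql)) {
            ps.setTimestamp(1, since);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    // The DB is authoritative, so overwrite whatever the cache currently holds.
                    jedis.set("customer:" + rs.getLong("id"), rs.getString("payload"));
                }
            }
        }
    }
}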
As an elaboration on that, a Command Query Responsibility Segregation with Event Sourcing (CQRS/ES) approach may prove useful: Microservice-1 gets split into two services, one which takes commands (requests to update) and another which handles queries. Instead of updating a row in a DB, the command service now appends (in a typical DB, an INSERT) an event which describes what changed. The query service can subscribe to those events and update Redis. Other microservices can also subscribe to the stream of events and update their own views (which can be remixed in any way they want) of Microservice-1's state.
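And a compact sketch of the CQRS/ES variant under the same assumptions (JDBC plus Jedis; the customer_events table and the key format are illustrative):
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import redis.clients.jedis.Jedis;
public class CustomerEventStore {
    // Command side: append an event describing the change instead of updating a row in place.
    public void append(Connection db, long customerId, String eventType, String payload) throws SQLException {
        String sql = "INSERT INTO customer_events(customer_id, event_type, payload) VALUES (?, ?, ?)";
        try (PreparedStatement ps = db.prepareStatement(sql)) {
            ps.setLong(1, customerId);
            ps.setString(2, eventType);
            ps.setString(3, payload);
            ps.executeUpdate();
        }
    }
    // Query side: fold events newer than the last processed id into the Redis view.
    // Returns the new high-water mark so the caller can persist it between runs.
    public long project(Connection db, Jedis jedis, long lastSeenId) throws SQLException {
        String sql = "SELECT id, customer_id, payload FROM customer_events WHERE id > ? ORDER BY id";
        try (PreparedStatement ps = db.prepareStatement(sql)) {
            ps.setLong(1, lastSeenId);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    jedis.set("customer:" + rs.getLong("customer_id"), rs.getString("payload"));
                    lastSeenId = rs.getLong("id");
                }
            }
        }
        return lastSeenId;
    }
}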

When writing to Azure table storage we sometimes see behavior

When writing to Azure table storage we sometimes see behavior that looks like the following situation:
We send an update request (the update is received and queued for actual processing in Azure)
We receive a 200 OK result on the update request
We send a request for data
We get data from before the update (undesirable situation)
We "wait a bit"
We send another request for data
We get data from after the update
When Azure is busy, the update seems to take a while, which becomes a problem if we query the updated data immediately (eventual consistency).
Are the above assumed inner workings of Azure correct?
If so, what are best practices for getting up-to-date data directly after an update?
I'm afraid that the situation is fairly normal. As we know, the CAP theorem has influenced many data systems. Please refer to this detailed document.
The situation you described suggests that Azure table storage favors high availability, which guarantees that the service can be accessed by users at all times. However, this has an impact on consistency, and the data accessed by users may not be up to date.
You could look into the Cosmos DB Table API; it supports 5 consistency levels, from Strong to Eventual.
If you are concerned about real-time data, you could set the level to Strong.

Updating OpenFlow group table bucket list in OpenDaylight

I have a mininet (v2.2.2) network with openvswitch (v2.5.2), controlled by OpenDaylight Carbon. My application is an OpenDaylight karaf feature.
The application creates a flow (for multicasts) to a group table (type=all) and adds/removes buckets as needed.
To add/remove buckets, I first check if there is an existing group table:
InstanceIdentifier<Group> groupIid = InstanceIdentifier.builder(Nodes.class)
        .child(Node.class, new NodeKey(NodId))
        .augmentation(FlowCapableNode.class)
        .child(Group.class, grpKey)
        .build();
ReadOnlyTransaction roTx = dataBroker.newReadOnlyTransaction();
Future<Optional<Group>> futOptGrp = roTx.read(LogicalDatastoreType.OPERATIONAL, groupIid);
If it doesn't find the group table, it is created (SalGroupService.addGroup()). If it does find the group table, it is updated (SalGroupService.updateGroup()).
The problem is that it takes some time after the RPC call add/updateGroup() to see the changes in the data model. Waiting for the Future<RPCResult<?>> doesn't guarantee that the data model has the same state as the device.
So, how do I read the group table and bucket list from the data model and make sure that I am indeed reading the same state as the current state of the device?
I know that
Add/UpdateGroupInputBuilder has setTransactionUri()
DataBroker gives transaction to read/write
you should use transaction chaining
But I cannot figure out how to combine these.
Thank you
EDIT: Or do I have to use write transactions instead of RPC calls?
I dropped using RPC calls for writing flows and switched to writes to the config datastore. It still takes some time for the changes to appear in the actual device and in the operational datastore, but that is OK as long as I use the config datastore for both reads and writes.
However, I have to keep in mind that it is not guaranteed that changes to the config datastore will always make it to the actual device. My flows are not that complicated, in the sense that conflicts are unlikely to happen. Still, I will probably check consistency between the operational and configuration datastores.
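For reference, a trimmed sketch of the config-datastore write I mean, reusing the groupIid from above (the built Group object grp is assumed, and error handling is minimal):
WriteTransaction wTx = dataBroker.newWriteOnlyTransaction();
// Put the desired group (with its bucket list) into the CONFIGURATION datastore; the
// OpenFlow plugin pushes it to the switch and reflects the outcome in OPERATIONAL.
wTx.put(LogicalDatastoreType.CONFIGURATION, groupIid, grp, true); // true = create missing parents
CheckedFuture<Void, TransactionCommitFailedException> future = wTx.submit();
Futures.addCallback(future, new FutureCallback<Void>() {
    @Override
    public void onSuccess(Void result) {
        // Config commit accepted; the device and the operational datastore may still lag behind.
    }
    @Override
    public void onFailure(Throwable t) {
        // Commit failed: log and/or retry.
    }
});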

MFP 8.0 adapter cache

I am using MFP 8.0, and there is a requirement to implement a cache at the adapter level.
Whenever the MFP server starts, we want to load the whole database into the cache until the server restarts again.
Then, whenever a user hits a transaction or adapter procedure that calls the database, it must read from the cache instead of calling the database.
Adapters support read-only and transactional access modes to back-end systems.
Adapters are Maven projects that contain server-side code implemented in either Java or JavaScript. Adapters are used to perform any necessary server-side logic, and to transfer and retrieve information from back-end systems to client applications and cloud services.
JSONStore is an optional client-side API providing a lightweight, document-oriented storage system. JSONStore enables persistent storage of JSON documents. Documents in an application are available in JSONStore even when the device that is running the application is offline. This persistent, always-available storage can be useful to give users access to documents when, for example, there is no network connection available in the device.
From your description, assuming you are talking about some custom DB where your data is stored, you need to implement the caching logic yourself.
Adapters have two classes, <AdapterName>Application.java and <AdapterName>Resource.java. <AdapterName>Application.java contains the lifecycle methods init() and destroy().
You should put your custom code for loading data from your DB into the cache in the init() method, and also take care of clearing it in destroy().
Then, during transactional access (which hits <AdapterName>Resource.java), you refer to the cache you have already created, as in the sketch below.
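A rough sketch of that idea with a plain static holder (the CacheHolder class, the DataSource, and the "products" table are illustrative, not MFP APIs; the load/clear calls belong in the generated <AdapterName>Application's init() and destroy()):
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import javax.sql.DataSource;
public class CacheHolder {
    private static final Map<String, String> CACHE = new ConcurrentHashMap<>();
    // Call from <AdapterName>Application.init(): load the table into memory once at startup.
    public static void load(DataSource ds) throws Exception {
        try (Connection con = ds.getConnection();
             PreparedStatement ps = con.prepareStatement("SELECT id, payload FROM products");
             ResultSet rs = ps.executeQuery()) {
            while (rs.next()) {
                CACHE.put(rs.getString("id"), rs.getString("payload"));
            }
        }
    }
    // Call from the resource class instead of querying the DB on every request.
    public static String get(String id) {
        return CACHE.get(id);
    }
    // Call from <AdapterName>Application.destroy().
    public static void clear() {
        CACHE.clear();
    }
}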
Your requirement, however, may not be ideal for heavily loaded systems. You need to consider that:
a) Your adapter initialization is delayed, and any badly written code can also break it. An adapter isn't available to service requests until it has been initialized. In a clustered environment, the adapter load in all cluster members will be delayed depending on the amount of data you are loading. Any client request intended for this adapter will get a runtime exception until the initialization is complete.
b) Holding the cache in memory means that much heap space is used up. If your DB keeps growing, this adversely affects adapter initialization and heap usage.
c) You are in charge of keeping the data up to date and also cleaning it up after use.
To summarize, while it is possible, it is not recommended. While this may work for a very small data set, it cannot scale well. Adapters are designed to provide transactional access to data/back-end systems, and you should use them the way they were designed to be used.

Why does Quartz Scheduler (JobStoreCMT) require the use of two datasources?

I found this answer:
1. Long answer to Quartz requiring two data sources; however, if you want an even deeper answer, I believe I'll need to dig into the source code or do more research:
a. JobStoreCMT relies upon transactions being managed by the application which is using Quartz. A JTA transaction must be in progress before attempting to schedule (or unschedule) jobs/triggers. This allows the "work" of scheduling to be part of the application's "larger" transaction. JobStoreCMT actually requires the use of two datasources - one that has its connections' transactions managed by the application server (via JTA) and one datasource whose connections do not participate in global (JTA) transactions. JobStoreCMT is appropriate when applications are using JTA transactions (such as via EJB Session Beans) to perform their work. (Ref: http://quartz-scheduler.org/documentation/quartz-1.x/configuration/ConfigJobStoreCMT)
However, there is believed to be a conflict with a non-transactional driver in our particular application. Does anyone know if Quartz (JobStoreCMT) can work with just a transactional data source?
Does anyone know if Quartz (JobStoreCMT) can work with just a transactional data source?
No, you must have a datasource of each type. Invocations on the API by the client application use the connections that are XA-capable, so that the work joins the application's transaction. Work done by the scheduler's internal threads uses the non-XA connections.
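For illustration, a hedged sketch of such a configuration built programmatically; the JNDI names jdbc/QuartzXADS and jdbc/QuartzNonXADS are placeholders for your application server's XA and non-XA datasources:
import java.util.Properties;
import org.quartz.Scheduler;
import org.quartz.SchedulerFactory;
import org.quartz.impl.StdSchedulerFactory;
public class QuartzCmtSetup {
    public static Scheduler createScheduler() throws Exception {
        Properties props = new Properties();
        props.setProperty("org.quartz.scheduler.instanceName", "CmtScheduler");
        props.setProperty("org.quartz.threadPool.class", "org.quartz.simpl.SimpleThreadPool");
        props.setProperty("org.quartz.threadPool.threadCount", "5");
        props.setProperty("org.quartz.jobStore.class", "org.quartz.impl.jdbcjobstore.JobStoreCMT");
        props.setProperty("org.quartz.jobStore.driverDelegateClass", "org.quartz.impl.jdbcjobstore.StdJDBCDelegate");
        // Connections used when the application schedules/unschedules work: XA, joins the JTA transaction.
        props.setProperty("org.quartz.jobStore.dataSource", "managedDS");
        props.setProperty("org.quartz.dataSource.managedDS.jndiURL", "jdbc/QuartzXADS");
        // Connections used by the scheduler's own worker threads: plain, non-JTA.
        props.setProperty("org.quartz.jobStore.nonManagedTXDataSource", "nonManagedDS");
        props.setProperty("org.quartz.dataSource.nonManagedDS.jndiURL", "jdbc/QuartzNonXADS");
        SchedulerFactory factory = new StdSchedulerFactory(props);
        return factory.getScheduler();
    }
}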