Controlling GemFire cache updates in the background

I will be implementing a Java program that acts as a GemFire client. The program will continuously process records that it receives on its port from a remote program. Each record will be processed using the static data cached with my program. The cache may get updated behind the scenes when the data is changed on the GemFire server. Processing one record may take a few seconds, so I run the risk of processing half the record with static data from before the change and the rest of the record with static data from after the change. Is there a way I can tell GemFire not to apply updates to the local client cache until I am done processing the ongoing record?
Regards,
Yash

Consider this approach: use a continuous query ("SELECT *") instead of event registration. A CQ does not update the client region the way a subscription does. Make your client region LOCAL. After receiving the CQ event on the client, execute your long-running process and then put the value you received from the CQ into your client region. Decoupling the client and server in this way allows your client to run long-running processes.
Alternatively, if you must have the client cache proxied with the server as an absolute requirement, keep the interest registration AND register a CQ. Ignore the event callback from the subscription and handle your long-running process in the event callback from the CQ.
The following is from page 683 of http://gemfire.docs.pivotal.io/pdf/pivotal-gemfire-ug.pdf:
CQs do not update the client region. This is in contrast to other server-to-client messaging like the updates sent to satisfy interest registration and responses to get requests from the client's Pool.
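Here is a minimal sketch of the first approach, assuming an Apache Geode / GemFire 9+ client (older GemFire releases use com.gemstone.gemfire packages instead of org.apache.geode). The locator address, region name, and the recordLock coordination are illustrative, not part of the answer above:

```java
import org.apache.geode.cache.Region;
import org.apache.geode.cache.client.ClientCache;
import org.apache.geode.cache.client.ClientCacheFactory;
import org.apache.geode.cache.client.ClientRegionShortcut;
import org.apache.geode.cache.query.CqAttributesFactory;
import org.apache.geode.cache.query.CqEvent;
import org.apache.geode.cache.query.CqQuery;
import org.apache.geode.cache.util.CqListenerAdapter;

public class StaticDataClient {
    // Held while a record is being processed, so CQ events cannot
    // swap the static data mid-record.
    static final Object recordLock = new Object();

    public static void main(String[] args) throws Exception {
        ClientCache cache = new ClientCacheFactory()
            .addPoolLocator("locator-host", 10334)   // hypothetical locator
            .setPoolSubscriptionEnabled(true)        // needed for CQ event delivery
            .create();

        // LOCAL region: the server never pushes entries into it directly.
        Region<String, Object> staticData = cache
            .<String, Object>createClientRegionFactory(ClientRegionShortcut.LOCAL)
            .create("static-data");                  // hypothetical region name

        CqAttributesFactory caf = new CqAttributesFactory();
        caf.addCqListener(new CqListenerAdapter() {
            @Override
            public void onEvent(CqEvent e) {
                synchronized (recordLock) {          // wait for the in-flight record
                    staticData.put((String) e.getKey(), e.getNewValue());
                }
            }
        });

        CqQuery cq = cache.getQueryService()
            .newCq("staticDataCq", "SELECT * FROM /static-data", caf.create());
        cq.executeWithInitialResults();

        // The record-processing loop would hold the same lock:
        // synchronized (recordLock) { process(record, staticData); }
    }
}
```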

Related

Apache Ignite's Continuous Queries event handler group & sequencing

We are trying to use the Continuous Query feature of Ignite, but we are facing an issue handling the resulting events. Below is our problem statement:
We have defined a continuous query with a remote filter for a cache and shared the filter definition with the Thick Client.
We are running multiple replicas of the Thin Client in a Kubernetes cluster.
Now the problem is that each instance of the Thin Client running in the k8s cluster has registered the remote filter, so each instance receives the event and tries to process the data in parallel. This results in duplicate processing, or even data being overwritten in my store.
Is there any way to form a consumer group and ensure that only one instance of the Thin Client receives the notification and processes the data?
My Thick Client and Thin Clients are in .NET.
I couldn't find any details in the Ignite documentation:
https://ignite.apache.org/docs/latest/key-value-api/continuous-queries
Here, each thin client starts its own continuous query, so by design each client receives its own copy of every event to consume. If you want to route an event to a specific client, you need to start only one continuous query and distribute its events to your apps as you see fit.
Take a look at Ignite messaging to see whether it fits your use case.
Also check out the distributed Queue/Set, which have unique delivery guarantees.
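The question's clients are .NET, but the pattern is easiest to show with the Java API (the Ignite.NET thick-client API is analogous); the cache and queue names here are made up. The sketch starts a single continuous query on one designated node and hands each event to exactly one consumer through an IgniteQueue:

```java
import javax.cache.event.CacheEntryEvent;
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.IgniteQueue;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.query.ContinuousQuery;
import org.apache.ignite.configuration.CollectionConfiguration;

public class SingleConsumerCq {
    public static void main(String[] args) {
        Ignite ignite = Ignition.start();
        IgniteCache<Integer, String> cache = ignite.getOrCreateCache("data");

        // Distributed queue: take() hands each item to exactly one consumer,
        // no matter how many replicas are running.
        IgniteQueue<String> work =
            ignite.queue("work-queue", 0 /* unbounded */, new CollectionConfiguration());

        // Start ONE continuous query (on a single designated node) and push
        // events into the queue instead of processing them in every replica.
        ContinuousQuery<Integer, String> qry = new ContinuousQuery<>();
        qry.setLocalListener(events -> {
            for (CacheEntryEvent<? extends Integer, ? extends String> e : events)
                work.put(e.getKey() + "=" + e.getValue());
        });
        cache.query(qry);

        // Every replica runs this loop; each item is consumed by exactly one.
        while (true)
            System.out.println("Processing " + work.take());
    }
}
```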

Can RabbitMQ (or similar message queuing system) be used to single thread requests per user?

The issue is we have some modern web applications that are integrated with a legacy system that was never designed to support multiple concurrent requests from a single user. Basically there are certain types of requests that the legacy system can only handle one-at-a-time from a single user. It can handle multiple concurrent requests coming from different users, but for technical reasons cannot handle multiple from a single user. In these situations, the user's first request will complete successfully, but any subsequent requests from that same user that come in while the first request is still executing will fail.
Because our apps are ajax enabled, multi-tab/multi-browser friendly, and just the fact that there are multiple apps - there are certain scenarios where a user could wind up having more than one of these types of requests being sent to the legacy system at the same time.
I'm trying to determine if something like RabbitMQ could be positioned in front of the legacy system and leveraged to single-thread requests per user/IP. The thinking being that the web apps would send all requests to MQ, and they'd stack into per-user queues and pass on to the legacy system one at a time.
I don't know if there would be concerns about the potential number of queues this could create; we have a user base of approximately 4,000.
And I know we could somewhat address this in the web apps individually, but since there are multiple apps that would mean duplicating logic across them, and you'd still have the potential for two different apps to fire off concurrent requests.
Any feedback would be appreciated. Thanks.
I'm not sure a unique queue per user will work, since each queue would need a backend worker process listening on it, and those workers would have to be created dynamically.
Below is one option, but it has a potential performance bottleneck: a single backend process would handle all requests sequentially. You could use multiple worker processes, but then you wouldn't know whether one had completed before another, causing a race condition if your app requires a specific sequence of actions.
You could simply put all transactions (from all users) into a single queue and have a backend process pull from that queue and service each request. If a response needs to go back to the user once the request has been serviced, the worker process can respond on a separate queue with a correlationId that is used to route the response data back to the correct user.
I've done this before with ExpressJS apps where the following flow would happen:
1. The user/process/ajax makes a request.
2. Express takes the payload from the request object and sends it to a RabbitMQ queue with a unique correlationId (e.g. a UUID).
3. Express then takes the response object and stores it in a responseStore object, with the correlationId as the key.
4. Meanwhile, a backend worker process pulls the item from the queue, does some work, and then sends a message to a different response queue with the same correlationId.
5. The ExpressJS application has a connection to the response queue; when it receives a message, it takes the correlationId from the response and looks for a response object stored under the same correlationId in the responseStore. If it finds one, it takes the payload from the message and does something like response.send(payload) or response.json(payload).
To do this, you should also store the creation time of each response object in the responseStore alongside the object, and have a separate process that checks the responseStore and cleans up old response objects after a certain timeout, in case the backend process fails to complete.
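The answer describes an Express app, but the correlation pattern itself is language-neutral. Here is a minimal sketch of it in Java with the com.rabbitmq.client API, where the queue names and payload are made up and a map of CompletableFutures plays the role of the responseStore:

```java
import com.rabbitmq.client.AMQP;
import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;
import com.rabbitmq.client.DeliverCallback;

import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

public class RpcGateway {
    // correlationId -> pending response (the "responseStore" from the answer)
    private static final Map<String, CompletableFuture<String>> pending =
        new ConcurrentHashMap<>();

    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost");
        try (Connection conn = factory.newConnection()) {
            Channel ch = conn.createChannel();
            ch.queueDeclare("requests", true, false, false, null);
            ch.queueDeclare("responses", true, false, false, null);

            // Resolve the matching pending future as each response arrives.
            DeliverCallback onResponse = (tag, delivery) -> {
                String corrId = delivery.getProperties().getCorrelationId();
                CompletableFuture<String> f = pending.remove(corrId);
                if (f != null)
                    f.complete(new String(delivery.getBody(), StandardCharsets.UTF_8));
            };
            ch.basicConsume("responses", true, onResponse, tag -> {});

            // Publish a request tagged with a fresh correlationId.
            String corrId = UUID.randomUUID().toString();
            CompletableFuture<String> reply = new CompletableFuture<>();
            pending.put(corrId, reply);
            AMQP.BasicProperties props = new AMQP.BasicProperties.Builder()
                .correlationId(corrId)
                .replyTo("responses")
                .build();
            ch.basicPublish("", "requests", props,
                "payload".getBytes(StandardCharsets.UTF_8));

            System.out.println("Got: " + reply.get()); // in a web app: response.send(...)
        }
    }
}
```

Using reply.get(30, TimeUnit.SECONDS) and removing the entry from the map on timeout gives you the cleanup mechanism the answer describes.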
Look here for more info on RPC with RabbitMQ:
https://www.rabbitmq.com/tutorials/tutorial-six-javascript.html
Hope this helps.

GemFire cache pre-heat completion

I would like to have one server and a few clients. The server will be my own Java application that uses CacheFactory. I will read all my static data from a database and populate the cache even before it is requested by any client. While the cache is being populated on the server, it will also be spreading to all clients connected to the server. Once the cache population is complete, I would like to give a green signal to all clients to start requesting data. Is there something I need to do so that the server sends an event to the clients, or so that the clients generate an event signalling the completion of cache pre-heating?
Thanks,
Yash
One way to accomplish this would be to create a region on both the server and the client (say, /server-ready) for notification only. The client registers interest in all keys in this region and also registers a CacheListener for it.
When the server is done loading data, put an entry into the server-ready region. The event will be sent to the client and the afterCreate() method on the CacheListener will be invoked, which can serve as the notification to your clients that the server has finished populating data.
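A minimal client-side sketch of this, again assuming Geode/GemFire 9+ package names and a made-up locator address; the /server-ready region name follows the answer:

```java
import java.util.concurrent.CountDownLatch;
import org.apache.geode.cache.EntryEvent;
import org.apache.geode.cache.Region;
import org.apache.geode.cache.client.ClientCache;
import org.apache.geode.cache.client.ClientCacheFactory;
import org.apache.geode.cache.client.ClientRegionShortcut;
import org.apache.geode.cache.util.CacheListenerAdapter;

public class ReadySignalClient {
    public static void main(String[] args) throws Exception {
        CountDownLatch serverReady = new CountDownLatch(1);

        ClientCache cache = new ClientCacheFactory()
            .addPoolLocator("locator-host", 10334)   // hypothetical locator
            .setPoolSubscriptionEnabled(true)        // needed to receive events
            .create();

        Region<String, String> ready = cache
            .<String, String>createClientRegionFactory(ClientRegionShortcut.CACHING_PROXY)
            .addCacheListener(new CacheListenerAdapter<String, String>() {
                @Override
                public void afterCreate(EntryEvent<String, String> event) {
                    serverReady.countDown();         // green signal from the server
                }
            })
            .create("server-ready");

        ready.registerInterest("ALL_KEYS");          // receive server-side puts

        serverReady.await();                         // block until pre-heat completes
        // ... start requesting data ...
    }
}
```

On the server side, once loading finishes, a single put into the region (e.g. serverReadyRegion.put("ready", "true")) fires afterCreate() on every subscribed client.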

WCF service writes log only if client receives results

I'm working on a WCF service to help our new code interoperate with a legacy system. The process goes like this:
Client calls the service with a request for the legacy system.
Service writes the request into a database.
Legacy system services request from the DB in its own time and writes results back into the DB (updating a status flag to say results are ready).
Client retrieves results by calling a second service method, which polls the DB until the ready flag is set.
Just before returning the results, the service updates the status flag to "client has results", so that the related DB rows can be deleted.
My concern is the race condition at the last step. I can see this happening:
Service updates status to client has results.
Client times out after waiting for the service to poll the DB.
Service tries to return results. Hilarity ensues.
One way to solve this would be to have three service calls instead of two: the second call retrieves results, and the last one is an explicit acknowledgement by the client that it has them. I'd like to know whether there is a way which doesn't impose this extra "protocol" burden on the client though.
I've looked briefly into using transactions in WCF, and it sounds like they might be able to do what I need. The client (optionally) starts a transaction, flows it to the service, which uses it if it's there, and commits it when done. This seems as if it implicitly does the "third call".
Does this idea have any merit? Any disadvantages that you can see? Are there any other avenues I could explore?
Using transaction flow is possible, but flowing a transaction in a polling scenario (on each poll call) is terrible architecture. What you generally need is transaction flow for the real read operation, where the service modifies the record and returns the data to the client. The client then commits the transaction, which commits the changes performed by the service.
Using transactional processing places some additional requirements on your service and clients.
Another approach can be transactional MSMQ:
Client calls the service with a request for the legacy system = client sends a message to the service's queue
Service writes the request into a database = service processes the message from its queue
Legacy system services request from the DB in its own time and writes results back into the DB (updating a status flag to say results are ready).
Service polls the database and places messages on the correct client queues. Placing the message and modifying the database records run in a single transaction.
Client processes incoming message
A transactional queue allows transactional reading (the message is removed from the queue only if the transaction is committed) and writing (the message is added to the queue only if the transaction is committed). That allows deleting the records before the client reads the message, because the message remains in the queue until the client successfully reads it (or until it times out, and even then it can be moved to an error queue).
In both cases you should think about clients who will consume the service. Transaction flowing can be interoperable but not every web service stack supports it. MSMQ is not interoperable.
Why not reduce the likelihood of the client timing out by doing this instead:
1. Client calls the service with a request for the legacy system.
2. Service writes the request into a database.
3. Legacy system services the request from the DB in its own time and writes the results back into the DB (updating a status flag to say the results are ready).
4. Client calls a service to find out whether the results are ready. NB no polling: it returns an immediate yes or no.
5. If the results are NOT ready, the client waits a bit and then goes back to step 4.
6. If the results ARE ready, the client calls the service to retrieve the results. The service can update the status to "client has results" at that point.
By doing this, the client won't be waiting for the service call in step 4 to return for a prolonged period, and the chances of a timeout should be minimal.
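The service in question is WCF, but the handshake is easy to see in a language-neutral sketch. The Java below assumes a hypothetical client proxy exposing the three operations described above:

```java
// Hypothetical client proxy for the three operations described above.
interface LegacyServiceClient {
    String submitRequest(String payload);   // returns a request id
    boolean isReady(String requestId);      // immediate yes/no, no server-side polling
    String getResults(String requestId);    // marks status "client has results"
}

public class PollingClient {
    public static String fetch(LegacyServiceClient svc, String payload)
            throws InterruptedException {
        String id = svc.submitRequest(payload);
        // Short, client-driven polls: no single call is long enough to time out.
        while (!svc.isReady(id)) {
            Thread.sleep(2000);             // "waits a bit" from step 5
        }
        return svc.getResults(id);          // step 6
    }
}
```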
However, you're never going to be 100% certain that the client has received the results unless the client makes a final service call to say so. (What if, for example, the client dies after making the very last request?)

Is it possible to have asynchronous processing

I have a requirement where I need to send continuous updates to my clients. The client is a browser in this case. We have some data which updates every second; once a client connects to our server, we maintain a persistent connection and keep pushing data to the client.
I am looking for suggestions of this implementation at the server end. Basically what I need is this:
1. client connects to server. I maintain the socket and metadata about the socket. The metadata records which updates need to be sent to this client
2. server process now waits for new client connections
3. another process has the list of all the open sockets, goes through each of them, and sends the updates if required.
Can we do something like this in an Apache module:
1. An Apache process gets the new connection. It maintains the state for the connection, keeps that state in some global memory, and returns to the root process to signify that it is done, so that the root process can accept new connections
2. Although the Apache process has returned status to the root process, it continues executing in parallel, going through its global store and sending updates to the client, if any.
So can an Apache process do these things:
1. Have more than one connection associated with it?
2. Asynchronously wait for new connections while at the same time processing the previous connections?
This is a complicated and inefficient model of updating. Your server will try to update clients that have already closed down, and it has to maintain all that client data and metadata (last update time, etc.).
Usually, continuous updates are done with Ajax in a polling model. The client has a JavaScript timer that, when it fires, hits a service that provides the updated data. The client continues to get updates at regular intervals without your having to write an Apache module.
Would this model work for your scenario?
More reasons to opt for polling instead of push: Periodic_Refresh
With a little patch to resume a SUSPENDED mpm_event connection, I've got an asynchronous Apache module working. With this you can do the improved polling:
1. JavaScript connects to Apache and asks for an update;
2. if there are no updates, then instead of answering immediately the module uses SUSPENDED;
3. some time later, when an update or a timeout happens, a callback fires;
4. the callback gives the update (or a "no updates" message) to the client and resumes the connection;
5. the client goes back to step 1, repeating the poll, which with Keep-Alive will reuse the same connection.
That way the number of round trips between the client and the server is decreased and the client receives the update immediately. (This is known as Comet, or reverse Ajax, AFAIK.)
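The module itself is Apache C code and the patch isn't shown here, but the client side of that loop is simple to sketch. The Java below uses java.net.http (Java 11+) against a made-up /updates endpoint:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

public class LongPollClient {
    public static void main(String[] args) throws Exception {
        // One client = one kept-alive connection, reused across polls.
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create("http://localhost/updates"))  // made-up endpoint
            .timeout(Duration.ofSeconds(60))              // server holds (SUSPENDs) the request
            .build();

        while (true) {
            // The server answers only when an update (or its timeout) arrives.
            HttpResponse<String> resp =
                client.send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println("Update: " + resp.body());
            // Loop immediately: Keep-Alive reuses the same connection.
        }
    }
}
```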