I have two different threads one of which put information into Gemfire region and another one read and remove same information from the same Gemfire region.
The problem is:
If the second thread is busy the first thread continued to put information into Gemfire region.
What is needed - block first thread if some limit of items in region was reached.
It is possible to implement this manualy but may be same mechanism already exists into Gemfire?
GemFire does not have any kind of blocking queue abstractions so, you are right, you would have to implement this manually. However, I would also ask why you feel the need to be able to throttle GemFire in this way? In case you are concerned about memory consumption you can always configure your regions to overflow to disk either when a specific entry count threshold is reached or when a heap threshold is reached.
Dont think gemfire have an option to block put operations but it have option to overflow to disk.
But if you still want to go with blocking option. You can use a CacheListener on your region and can do a size operation and block/sleep operation. Put operation will be blocked till listener is complete.
And you can reuse/remove/add listener component easily.
Related
I am running managed Instance groups whose overall c.p.u is always below 30% but if i check instances individually then i found some are running at 70 above and others are running as low as 15 percent.
Keep in mind that Managed Instance Groups don't take into account individual instances as whether a machine should be removed from the pool or not. GCP's MIGs keep a running average of the last 10 minutes of activity of all instances in the group and use that metric to determine scaling decisions. You can find more details here.
Identifying instances with lower CPU usage than the group doesn't seem like the right goal here, instead I would suggest focusing on why some machines have 15% usage and others have 70%. How is work distributed to your instances, are you using the correct strategies for load balancing for your workload?
Maybe your applications have specific endpoints that cause large amounts of CPU usage while the majority of them are basic CRUD operations, having one machine generating a report and displaying higher usage is fine. If all instances render HTML pages from templates and return the results one machine performing much less work than the others is a distribution issue. Maybe you're using a RPS algorithm when you want a CPU utilization one.
In your use case, the best option is to create an Alert notification that will alert you when an instance goes over the desired CPU usage. Once you receive the notification, you will then be able to manually delete the VM instance. As it is part of the Managed Instance group, the VM instance will automatically recreate.
I have attached an article on how to create an Alert notification here.
There is no metric within Stackdriver that will call the GCE API to delete a VM instance .
There is currently no such automation in place. It should't be too difficult to implement it yourself though. You can write a small script that would run on all your machines (started from Cron or something) that monitors CPU usage. If it decides it is too low, the instance can delete itself from the MIG (you can use e.g. gcloud compute instance-groups managed delete-instances --instances ).
My company uses IBM MQ's Multi-Instance Queues right now. We would like to replicate those queues to a different Data Center over the WAN for Disaster Recover purposes. I'm skeptical it will work simply due to all the message traffic and even a slight delay will cause the Queues to fail.
What is the technical reason why this will not work?
Are you talking about storage replication? If so are you planning to use synchronous or asynchronous replication?
Asynch will not cause any delay on the replicating end but there will be some amount of delay before the receiving end receives data depending on network distance. Your storage team should be able to tell you how many seconds the async replication delay could be.
With synch the data is sent over the network by the replicating end storage array and a confirmation comes back over the network before the the storage array returns to the OS that the write was successful. To be usable the two arrays have to be with in 6ms of each other. This type of replication adds a delay to each write equal to the network ms.
MQ application can batch messages into single units of work to improve performance with sync replication is in place, but this will slow down persistent message performance.
Define "Slight delay" in your statement?
Async replication will cause a delay and RPO will not be zero. Your storage team can advise on RPO value. If that is not acceptable, asynch replication is not an option for you.
Although it's pragmatic choice from cost and distance standpoint but could cause duplicate or missing transactions.
For synch replication, the distance in data-centers is limited. (Apart from hit on performance on Primary DC). Check with your storage team on the distance limit.
I'm trying to configure Gemfire/Geode in order to have an async event queue with parallel=true on a replicated region. However, I'm getting the following exception at startup:
com.gemstone.gemfire.internal.cache.wan.AsyncEventQueueConfigurationException: Parallel Async Event Queue myQueue can not be used with replicated region /myRegion
This (i.e. to prevent parallel queues on replicated regions) seems to be a design decision, but I can't understand why it is the case.
I have read all the documentation I've been able to find (primarily http://gemfire.docs.pivotal.io/docs-gemfire/latest/reference/book_intro.html and related docs),
and searched any kind of reference to this exception on the internet, but I didn't find any clear explanation on why I can't have an event listener on each member hosting a replicated region.
My conclusion is that I must be missing some fundamental concept about replicated regions and/or parallel queues, but since I can't find the appropriate documentation
on my own, I'm asking for an explanation and/or pointers to the right resources to read.
Thanks in advance.
EDIT : Let me put the question into context.
I have an external system sending data to my application using REST services, which are load balanced between nodes in order to maximize performance. Each of the nodes hosts the same regions (let's say 3, named A B and C). The data travels through all those regions (A to B to C) and is processed along the way. This means that region A hosts data that has just been received, region B data that has been partially processed and region C hosts data whose processing is complete.
I am using event listeners to process data and move it from region to region, and in case of the listener for region C, to export it to another external system.
All the listeners must (and I repeat, must) be transactional.
I also need horizontal scalability (i.e. adding nodes on the fly to increase throughput) and the maximum amount of data replication that can be possibily achieved.
Moreover, I want to run all of the nodes with the same gemfire configuration.
I have already tried to use partitioned regions, but they are not fit to my needs for a bunch of reasons that I won't explain here for the sake of brevity (just trust me, it is not currently possible).
So I thought that having all the nodes host the replicated regions could be the way, but I need all of them to be able to process events independently and perform region synchronization afterwards in an active/active scenario. It is my understanding that this requires event queues to be parallel, but it does not seem possible (by design).
So the (updated) question(s) are :
Is this scenario even possible? And if it is, how can I achieve it?
Any explanation and/or documentation, example, resource or anything else is more than welcome.
Again, thanks in advance.
An AsyncEventQueue is used to write data that arrives in GemFire to some other data store. You would ideally want to do this only once. Since the content of the replicated region is same on all the members of the system, you only need a Async event listener on one member, hence parallel=true is not supported.
For Partitioned regions, if you only had one member that hosts the AsyncQueue, then every single put to a partitioned region will also be routed through that member. This introduces a single point of contention in the system. The solution to this problem was introduction of parallel AsyncQueues, so that events on each member are only queued up locally in that member.
GemFire also supports CacheListeners, which are invoked on each member even for replicated regions, however, they are synchronous. You can introduce a thread pool in your CacheListener to get the same functionality.
I am trying to get my head around an issue I have recently encountered and I hope someone will be able to point me in the most reasonable direction of solving it.
I am using Riak KV store and working on CRDT data, where I have some sort of counter inside each CRDT item stored in database.
I have a rabbitmq queue, where each message is a request to increase or decrease a certain amount of aforementioned counters.
Finally, I have a group of service-workers, that listens on the queue, and for each request try to change the amount of counters accordingly.
The issue I have is as follows: While a single worker is processing a request, it may get stuck for a while on a write operation to database – let’s say on a second change of counters out of three. It’s connection with rabbitmq gets lost (timeout) so the message-request gets back on to the queue (I cannot afford to miss one). Then it is picked up by second worker, which begins all processing anew. However, the first worker finishes its work, and as a results I have processed a single message twice.
I can split those increments into single actions, but this still leaves me with dilemma – can still change value of counter twice, if some worker gets stuck on a write operation for a long period.
I do not have possibility of making Riak KV CRDT writes work faster, nor can I accept missing out a message-request. I need to implement some means of checking whether a request was already processed before.
My initial thoughts were to use some alternative, quick KV store for storing rabbitMQ message ID if they are being processed. That way other workers could tell whether they are not starting to process a message that is already parsed elsewhere.
I could use any help and pointers to materials I can read.
You can't have "exactly one delivery" semantic. You can reduce double-sent messages or missed deliveries, so it's up to you to decide which misbehavior is the least inconvenient.
First of all are you sure it's the CRDTs that are too slow ? Are you using simple counters or counters inside maps ? In my experience they are quite fast, although slower than kv. You could try:
- having simple CRDTs (no maps), and more CRDTs objects, to lower their stress( can you split the counters in two ?)
- not using CRDTs but using good old sibling resolution on client side on simple key/values.
- accumulate the count updates orders and apply them in batch, but then you're accepting an increase in latency so it's equivalent to increasing the timeout.
Can you provide some metrics? Like how long the updates take, what numbers you'd expect, if it's as slow when you have few updates or many updates, etc
I've found different zookeeper definitions across multiple resources. Maybe some of them are taken out of context, but look at them pls:
A canonical example of Zookeeper usage is distributed-memory computation...
ZooKeeper is an open source Apache™ project that provides a centralized infrastructure and services that enable synchronization across a cluster.
Apache ZooKeeper is an open source file application program interface (API) that allows distributed processes in large systems to synchronize with each other so that all clients making requests receive consistent data.
I've worked with Redis and Hazelcast, that would be easier for me to understand Zookeeper by comparing it with them.
Could you please compare Zookeeper with in-memory-data-grids and Redis?
If distributed-memory computation, how does zookeeper differ from in-memory-data-grids?
If synchronization across cluster, than how does it differs from all other in-memory storages? The same in-memory-data-grids also provide cluster-wide locks. Redis also has some kind of transactions.
If it's only about in-memory consistent data, than there are other alternatives. Imdg allow you to achieve the same, don't they?
https://zookeeper.apache.org/doc/current/zookeeperOver.html
By default, Zookeeper replicates all your data to every node and lets clients watch the data for changes. Changes are sent very quickly (within a bounded amount of time) to clients. You can also create "ephemeral nodes", which are deleted within a specified time if a client disconnects. ZooKeeper is highly optimized for reads, while writes are very slow (since they generally are sent to every client as soon as the write takes place). Finally, the maximum size of a "file" (znode) in Zookeeper is 1MB, but typically they'll be single strings.
Taken together, this means that zookeeper is not meant to store for much data, and definitely not a cache. Instead, it's for managing heartbeats/knowing what servers are online, storing/updating configuration, and possibly message passing (though if you have large #s of messages or high throughput demands, something like RabbitMQ will be much better for this task).
Basically, ZooKeeper (and Curator, which is built on it) helps in handling the mechanics of clustering -- heartbeats, distributing updates/configuration, distributed locks, etc.
It's not really comparable to Redis, but for the specific questions...
It doesn't support any computation and for most data sets, won't be able to store the data with any performance.
It's replicated to all nodes in the cluster (there's nothing like Redis clustering where the data can be distributed). All messages are processed atomically in full and are sequenced, so there's no real transactions. It can be USED to implement cluster-wide locks for your services (it's very good at that in fact), and tehre are a lot of locking primitives on the znodes themselves to control which nodes access them.
Sure, but ZooKeeper fills a niche. It's a tool for making a distributed applications play nice with multiple instances, not for storing/sharing large amounts of data. Compared to using an IMDG for this purpose, Zookeeper will be faster, manages heartbeats and synchronization in a predictable way (with a lot of APIs for making this part easy), and has a "push" paradigm instead of "pull" so nodes are notified very quickly of changes.
The quotation from the linked question...
A canonical example of Zookeeper usage is distributed-memory computation
... is, IMO, a bit misleading. You would use it to orchestrate the computation, not provide the data. For example, let's say you had to process rows 1-100 of a table. You might put 10 ZK nodes up, with names like "1-10", "11-20", "21-30", etc. Client applications would be notified of this change automatically by ZK, and the first one would grab "1-10" and set an ephemeral node clients/192.168.77.66/processing/rows_1_10
The next application would see this and go for the next group to process. The actual data to compute would be stored elsewhere (ie Redis, SQL database, etc). If the node failed partway through the computation, another node could see this (after 30-60 seconds) and pick up the job again.
I'd say the canonical example of ZooKeeper is leader election, though. Let's say you have 3 nodes -- one is master and the other 2 are slaves. If the master goes down, a slave node must become the new leader. This type of thing is perfect for ZK.
Consistency Guarantees
ZooKeeper is a high performance, scalable service. Both reads and write operations are designed to be fast, though reads are faster than writes. The reason for this is that in the case of reads, ZooKeeper can serve older data, which in turn is due to ZooKeeper's consistency guarantees:
Sequential Consistency
Updates from a client will be applied in the order that they were sent.
Atomicity
Updates either succeed or fail -- there are no partial results.
Single System Image
A client will see the same view of the service regardless of the server that it connects to.
Reliability
Once an update has been applied, it will persist from that time forward until a client overwrites the update. This guarantee has two corollaries:
If a client gets a successful return code, the update will have been applied. On some failures (communication errors, timeouts, etc) the client will not know if the update has applied or not. We take steps to minimize the failures, but the only guarantee is only present with successful return codes. (This is called the monotonicity condition in Paxos.)
Any updates that are seen by the client, through a read request or successful update, will never be rolled back when recovering from server failures.
Timeliness
The clients view of the system is guaranteed to be up-to-date within a certain time bound. (On the order of tens of seconds.) Either system changes will be seen by a client within this bound, or the client will detect a service outage.