Is there any way to achieve fairness with Hazelcast distributed locks?
It doesn't seem to be supported at the moment.
Please advise.
Thank you
Hazelcast distributed ILocks do not support fairness, as stated in the docs. Blocked operations are put in a wait set and picked up randomly, so locking can be quite unfair in some situations.
Implementing fairness with distributed locks would decrease performance greatly. Even if it would satisfy your use-case, it might not meet your performance requirements.
In most situations, a Hazelcast EntryProcessor achieves what an ILock would offer. It uses a FIFO-based work queue, so processor requests going to the same partition are guaranteed to run in FIFO order.
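To make the FIFO point concrete, here is a toy Python model of the idea. It is not Hazelcast's API: every key hashes to a partition, and each partition has a single worker thread draining its own FIFO queue. The names `PartitionedExecutor`, `submit`, and `wait` are made up for illustration, and CRC32 is only an assumed stand-in for the real partitioning function.

```python
import queue
import threading
import zlib


class PartitionedExecutor:
    """Toy model: one FIFO queue and one worker thread per partition."""

    def __init__(self, partition_count=4):
        # key -> value; a given key is only ever touched by its partition's worker
        self.store = {}
        self.queues = [queue.Queue() for _ in range(partition_count)]
        for q in self.queues:
            threading.Thread(target=self._run, args=(q,), daemon=True).start()

    def _partition_for(self, key):
        # Assumed partitioning scheme: CRC32 of the key modulo partition count.
        return zlib.crc32(key.encode("utf-8")) % len(self.queues)

    def _run(self, q):
        # A single worker per queue means requests run one at a time, in FIFO order.
        while True:
            key, fn = q.get()
            self.store[key] = fn(self.store.get(key))
            q.task_done()

    def submit(self, key, fn):
        """Enqueue fn(old_value) -> new_value for the partition that owns key."""
        self.queues[self._partition_for(key)].put((key, fn))

    def wait(self):
        """Block until every queued request has been processed."""
        for q in self.queues:
            q.join()


executor = PartitionedExecutor()
for i in range(5):
    # All five requests target the same key, hence the same partition queue.
    executor.submit("account-1", lambda old, i=i: (old or []) + [i])
executor.wait()
result = executor.store["account-1"]
print(result)  # [0, 1, 2, 3, 4]
```

Because requests for one key never run concurrently, no explicit lock is needed, and the order of execution matches the order of submission.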
Hazelcast has a variety of distributed data structures. I am sure with the right combination of usage, you can achieve fairness for your use case.
I am evaluating the best approach to distributed locking. The out-of-the-box (OOB) reentrant locking support in Ignite is tied to the thread that acquires the lock. We need locking and unlocking that are not tied to the same thread/process: one thread/process can lock and another thread/process can unlock. While evaluating alternatives, we came across two options:
Use semaphores - This works but is extremely slow, so we cannot use it.
Do manual locking - Use a separate cache where we add entries to lock and remove entries to unlock. But there are too many cases to handle, such as nodes going down during put operations, which other node gets through, and so on.
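For reference, the manual-locking option can be sketched roughly as follows. This is an in-memory Python stand-in, not Ignite code: `LockCache`, `try_lock`, and `unlock` are hypothetical names, the internal mutex models the atomic put-if-absent a real cache would have to provide, and the lease timeout is one assumed way to cope with a lock holder that dies before unlocking.

```python
import threading
import time
import uuid


class LockCache:
    """In-memory stand-in for a distributed cache with atomic put-if-absent."""

    def __init__(self):
        self._entries = {}              # lock name -> (token, lease expiry)
        self._mutex = threading.Lock()  # models the cache's per-key atomicity

    def try_lock(self, name, lease_seconds, now=None):
        """Return an ownership token, or None if the lock is currently held."""
        # `now` can be injected to make lease expiry easy to test.
        now = time.monotonic() if now is None else now
        with self._mutex:
            entry = self._entries.get(name)
            if entry is not None and entry[1] > now:
                return None  # held, and the lease has not expired yet
            token = uuid.uuid4().hex
            self._entries[name] = (token, now + lease_seconds)
            return token

    def unlock(self, name, token):
        """Release the lock from any thread that presents the matching token."""
        with self._mutex:
            entry = self._entries.get(name)
            if entry is None or entry[0] != token:
                return False
            del self._entries[name]
            return True


cache = LockCache()
token = cache.try_lock("order-42", lease_seconds=30)
second_attempt = cache.try_lock("order-42", lease_seconds=30)  # None: already held

released = []

def release_from_other_thread():
    # Ownership follows the token, not the thread that acquired the lock.
    released.append(cache.unlock("order-42", token))

worker = threading.Thread(target=release_from_other_thread)
worker.start()
worker.join()
```

The token is what makes cross-thread unlocking possible. The lease only bounds how long a crashed holder can block others; it does not by itself cover every failure case listed above.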
Just wanted to check whether there are other performant out-of-the-box options that we might have overlooked.
TIA
Redis can be scaled using replicas and shards. However:
replicas scale only reads, but can provide HA
shards scale both reads and writes, and have the added benefit of requiring less memory than adding a replica.
Based on these facts, if I'm not interested in HA does it make sense to always use shards and not replicas since I get the benefit of scaling both reads and writes, with a smaller memory footprint (and lower costs)?
Yes, you can.
Regarding HA, make sure you define and understand how the application behaves if a shard becomes unavailable (data loss, service unavailable, ...).
As for replica reads, it is hard to tell without information about your application, but most of the time a single Redis instance (shard) is enough to handle a lot of load. A very rough rule of thumb is that a shard can handle 25 GB of data and 25,000 operations/second with sub-millisecond latency without any problem. Obviously this depends on the type of operations, data, and commands you are running; it could be a lot more ops/sec if you only do basic SET/GET.
Usually, when the load exceeds this, we use clustering to distribute it.
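As a back-of-the-envelope illustration of that rule of thumb, here is a small Python helper. The per-shard limits are the rough figures quoted above, not measured values, and `estimate_shards` is a made-up name; a real sizing exercise should be based on your own benchmarks.

```python
import math

# Rough per-shard limits taken from the rule of thumb above (not hard limits).
MAX_GB_PER_SHARD = 25
MAX_OPS_PER_SHARD = 25_000


def estimate_shards(dataset_gb, ops_per_second):
    """Return the smallest shard count that satisfies both rough limits."""
    by_memory = math.ceil(dataset_gb / MAX_GB_PER_SHARD)
    by_throughput = math.ceil(ops_per_second / MAX_OPS_PER_SHARD)
    return max(1, by_memory, by_throughput)


print(estimate_shards(10, 8_000))   # 1: a single shard is within both limits
print(estimate_shards(60, 40_000))  # 3: memory is the limiting factor here
```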
So before going down the replica-read route (which I try to avoid as much as possible), take a look at your application and run some benchmarks on a single shard. You will probably see that everything is OK, at least from the workload point of view, but you will have a SPOF if you do not replicate.
Suppose in your web application you need to do a number of redis calls to render a page, like, getting a bunch of user hashes. To speed this up you could wrap up your redis commands in a MULTI/EXEC section, thus using pipelining, so that you avoid doing many round-trips. But you also want to shard your data, because you have lots of it and/or you want to distribute writes. Then pipelining wouldn't work, because different keys would potentially live on different nodes, unless you have a clear idea of the data layout of your application and shard based on roles rather than using a hash function. So, what are the best practices to shard data across different servers without compromising performance too much due to many servers being contacted to complete a "conceptually unique" job? I believe the answer depends on the web application one is developing, and I'll eventually run some tests, but it'd be helpful to know how others have coped with the trade-offs I mentioned.
MULTI/EXEC and pipelining are two different things. You can do MULTI/EXEC without any pipelining and vice versa.
If you want to shard and pipeline at the same time, you need to group the operations to pipeline per Redis instance, and then use pipelining for each instance.
Here is a simple example using Ruby: https://gist.github.com/2587593
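The grouping step can also be sketched in a few lines of Python. This is not a translation of the gist, just an illustration under assumptions: `shard_for` uses CRC32 modulo the shard count as a stand-in for whatever sharding function your client uses, and commands are plain tuples whose second element is the key.

```python
import zlib
from collections import defaultdict


def shard_for(key, shard_count):
    # Assumed sharding function; substitute the one your client really uses.
    return zlib.crc32(key.encode("utf-8")) % shard_count


def group_by_shard(commands, shard_count):
    """Group (command, key, *args) tuples into one batch per shard.

    Each entry keeps its original position so that replies can later be
    put back in the order the caller issued the commands.
    """
    batches = defaultdict(list)
    for position, command in enumerate(commands):
        key = command[1]
        batches[shard_for(key, shard_count)].append((position, command))
    return dict(batches)


commands = [("HGETALL", f"user:{i}") for i in range(6)]
batches = group_by_shard(commands, shard_count=3)
for shard, batch in sorted(batches.items()):
    print(shard, batch)
```

Each batch can then be sent as a single pipeline to its instance, so the number of round-trips is bounded by the number of shards rather than the number of keys.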
One way to further improve performance is to parallelize the traffic on the Redis instances once the operations have been grouped (i.e. you group the operations, you send them to all instances in parallel, you wait for the answers from all instances).
This is a bit more complex, because an asynchronous, non-blocking client is required. For maximum performance, C/C++ should be used on the client side. This can be easily implemented by using hiredis plus the event loop of your choice.
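The fan-out pattern itself (send to all instances in parallel, then wait for all answers) looks roughly like this in Python with `asyncio`. It only shows the control flow, not the performance of a C client: `send_pipeline` is a hypothetical stand-in that simulates a network round-trip instead of talking to Redis, and the batch layout of (position, key) pairs is an assumption.

```python
import asyncio


async def send_pipeline(shard_id, batch):
    """Hypothetical stand-in for one pipelined round-trip to one instance."""
    await asyncio.sleep(0.01)  # simulated network latency
    return [(position, f"reply from shard {shard_id} for {key}")
            for position, key in batch]


async def fetch_all(batches):
    # Start one pipeline per shard and wait for all of them together.
    per_shard = await asyncio.gather(
        *(send_pipeline(shard_id, batch) for shard_id, batch in batches.items())
    )
    # Positions are unique, so sorting restores the caller's original order.
    merged = sorted(reply for replies in per_shard for reply in replies)
    return [reply for _, reply in merged]


# shard id -> list of (original position, key)
batches = {0: [(0, "user:1"), (2, "user:3")], 1: [(1, "user:2")]}
replies = asyncio.run(fetch_all(batches))
print(replies)
```

With this shape, total latency is roughly that of the slowest shard rather than the sum over all shards.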
I am wondering whether every service can also be made highly available.
I want to use the Redis and ActiveMQ services and avoid a single point of failure. I also need to keep writing data to the Redis and ActiveMQ servers.
I found many articles about MySQL high availability, but only a few about other database solutions, so my question is: is there a common high availability solution suite that works for many products?
High availability is one of the principles in the CAP theorem, and many NoSQL database systems favor availability at the expense of data consistency. Replication is often used to achieve high availability for reads, but write availability may depend on the type of replication being used. Take a look at the current Redis replication docs or the upcoming Redis Cluster presentation for more information on this.
Greetings,
I'm evaluating some components for a multi-data center distributed system. We're going to be using message queues (via either RabbitMQ or Qpid) so agents can make asynchronous requests to other agents without worrying about addressing, routing, load balancing or retransmission.
In many cases, the agents will be interacting with components that were not designed for highly concurrent access, so locking and cross-agent coordination will be needed to avoid race conditions. Also, we'd like the system to automatically respond to agent or data center failures.
With the above use cases in mind, ZooKeeper seemed like it might be a good fit. But I'm wondering if trying to use both ZK and message queuing is overkill. It seems like what Zookeeper does could be accomplished by my own cluster manager using AMQP messaging, but that would be hard to get really right. On the other hand, I've seen some examples where ZooKeeper was used to implement message queuing, but I think RabbitMQ/Qpid are a more natural fit for that.
Has anyone out there used a combination like this?
Thanks in advance,
-Chris
Coming into this late, but maybe it will be of some use. The primary consideration should be the performance characteristics of your system. ZooKeeper, like you said, is more than capable of implementing a task distribution system using a distributed queue, but ZK is currently more optimized for reads than for writes (this only comes into play in the range of thousands of ops per second). If your throughput needs are lower than this, then using just ZK to implement your system would reduce the number of runtime components and make it simpler. Of course, you should always run your own performance tests before deciding.
Distributed coordination is really hard to get right, so I would definitely recommend using ZooKeeper for that and not rolling your own.
Not quite sure what exactly ZooKeeper is, but I guess that using a component from Apache (if it fits your needs well) is preferable to managing things like distributed synchronization and group services on your own. You could of course hire a team of developers especially for that purpose, but that doesn't guarantee you a better implementation.
I guess it would be implemented as a separate component anyway, because doing it any other way could add a lot of complexity and slow down the workflow; so the preference for ZooKeeper or anything similar is kind of obvious (to me).
And surely, unless you're in the global optimization phase of your project workflow, I guess it would be better to use RabbitMQ or similar (I would even stress that, because implementations of AMQP, especially commercial ones, would be more reliable than anything you'd come up with).
So I would go for both, carefully choosing the appropriate third-party products, but using only as many of them as needed. And that's just my opinion; thanks for reading :)