In Apache Ignite, how to control on which node a cache is created

In Ignite, how can I control on which node a cache is created? If I need to guarantee that a cache is created on all nodes, how can I do that?
Will the following code create the cache on all nodes or just on some of them?
ignite.cluster().forServers().ignite().createCache("myCache")
Thanks.

By default Ignite creates a cache on the entire set of server nodes.
However, it's possible to control that behaviour: there's a mechanism called a node filter that chooses the subset of nodes that store a cache's data.
What I'm trying to say here is that a cache is created everywhere even by the call:
ignite.getOrCreateCache("myCache")
To make your compute call collocated, you could use affinityCall. More detailed info can be found here. Example (this lambda is going to be executed on a node that stores the key myKey):
ignite.compute().affinityCall("myCache", myKey, () -> {
    // do something
    return "something";
});
Another option is to specify a subset of nodes (maybe even just one node) for your computation. Something like this (this lambda is going to be executed on the node with ID nodeId):
ignite.compute(ignite.cluster().forNodeId(nodeId)).call(() -> {
    // do something
    return "something";
});

In short, to have a cache on all nodes you need to configure the REPLICATED cache mode. The default is the PARTITIONED mode, which means the data will be spread evenly across the cluster nodes.
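For example, a minimal sketch of configuring a replicated cache through CacheConfiguration (the cache name and value types are just placeholders):

import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.CacheMode;
import org.apache.ignite.configuration.CacheConfiguration;

public class ReplicatedCacheExample {
    public static void main(String[] args) {
        try (Ignite ignite = Ignition.start()) {
            // REPLICATED keeps a full copy of the data on every server node.
            // The default, CacheMode.PARTITIONED, spreads the data across
            // the nodes instead.
            CacheConfiguration<Integer, String> cfg = new CacheConfiguration<>("myCache");
            cfg.setCacheMode(CacheMode.REPLICATED);

            ignite.getOrCreateCache(cfg).put(1, "value");
        }
    }
}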
I think configuring node filters is the easiest way of adjusting the default behavior: you can tell Ignite which nodes should not keep the data, based on some user-defined node attributes. Please be aware that you should have a good reason for changing the default distribution and understand the trade-offs.
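A hedged sketch of such a node filter, assuming a made-up user attribute named cache.data that marks the nodes allowed to keep the data:

import java.util.Collections;
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.cluster.ClusterNode;
import org.apache.ignite.configuration.CacheConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.lang.IgnitePredicate;

public class NodeFilterExample {
    // Accepts only nodes started with the user attribute cache.data=true.
    // The filter class must be available on all nodes.
    static class DataNodeFilter implements IgnitePredicate<ClusterNode> {
        @Override public boolean apply(ClusterNode node) {
            return Boolean.TRUE.equals(node.attribute("cache.data"));
        }
    }

    public static void main(String[] args) {
        IgniteConfiguration igniteCfg = new IgniteConfiguration()
            .setUserAttributes(Collections.singletonMap("cache.data", true));

        try (Ignite ignite = Ignition.start(igniteCfg)) {
            CacheConfiguration<Integer, String> cacheCfg = new CacheConfiguration<>("myCache");
            cacheCfg.setNodeFilter(new DataNodeFilter());

            ignite.getOrCreateCache(cacheCfg);
        }
    }
}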

Related

Using IgniteSet as part of dynamic remote filter for Ignite's Continuous Query

Has anyone ever tried using IgniteSet or similar data structures for setting remote filters on a continuous query? There is not much documentation on how IgniteSet works, hence this question. Basically, my use case is as follows:
I have a distributed cache implemented using Ignite. A user is interested in real-time updates from my cache based on some criteria, and I will have more than one user subscribing to these updates. Rather than run n continuous queries, I intend to run one continuous query for n users, with the remote filter backed by some distributed data structure.
I think IgniteSet could work for me, but I am not sure how it will affect the performance of my app server in production, since I am not entirely sure how IgniteSet works (due to the minimal documentation on this topic). Basically, if I need to update the IgniteSet data structure, will it be dynamically updated on all remote nodes as well, and will this mean I start receiving updates for the filter that might now evaluate to true on those remote nodes?
qry.setRemoteFilterFactory(new Factory<CacheEntryEventFilter<PersonKey, Person>>() {
    @Override public CacheEntryEventFilter<PersonKey, Person> create() {
        return new CacheEntryEventFilter<PersonKey, Person>() {
            @Override public boolean evaluate(CacheEntryEvent<? extends PersonKey, ? extends Person> e) {
                // IgniteSet maintained outside of filter
                return igniteSet.contains(e.getKey().getCity());
            }
        };
    }
});
Sorry if I am missing something obvious here.
Any help would be greatly appreciated!
IgniteSet is backed by a cache and, like all Ignite caches, is designed so that all nodes see updates as soon as they are available.
See https://ignite.apache.org/docs/latest/data-structures/queue-and-set for the configuration settings.
The design you are proposing is subject to race conditions: a consumer of the continuous query could come in before the appropriate writer has had a chance to update the given IgniteSet.
Use appropriate synchronization mechanisms to work out all edge conditions. Examples/Descriptions here: https://ignite.apache.org/features/datastructures.html
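For what it's worth, here is a hedged sketch of resolving the set on the remote node inside the filter, rather than capturing a reference in a field (the set name cityFilter and the default CollectionConfiguration are assumptions; PersonKey, Person, and qry come from the snippet in the question):

import javax.cache.configuration.Factory;
import javax.cache.event.CacheEntryEvent;
import javax.cache.event.CacheEntryEventFilter;
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteSet;
import org.apache.ignite.configuration.CollectionConfiguration;
import org.apache.ignite.resources.IgniteInstanceResource;

qry.setRemoteFilterFactory((Factory<CacheEntryEventFilter<PersonKey, Person>>) () ->
    new CacheEntryEventFilter<PersonKey, Person>() {
        // Injected with the local Ignite instance on whichever node runs the filter.
        @IgniteInstanceResource
        private Ignite ignite;

        @Override public boolean evaluate(CacheEntryEvent<? extends PersonKey, ? extends Person> e) {
            // Gets (or creates) the distributed set locally; updates made to it
            // anywhere in the cluster are visible here.
            IgniteSet<String> cities = ignite.set("cityFilter", new CollectionConfiguration());
            return cities.contains(e.getKey().getCity());
        }
    });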

Redisson local cache use

I have two questions regarding the Redisson client:
Does Redisson support automatic synchronization of the local cache with the remote Redis cache (when the remote cache data changes or is invalidated)?
I understand that Redisson supports data partitioning only in the pro edition, but isn't that feature already supported out of the box by Redis cluster mode? Am I missing something here?
Answering your questions:
RLocalCachedMap has two synchronization strategies:
INVALIDATE - Used by default. Invalidates the cache entry across all RLocalCachedMap instances on a map entry change.
UPDATE - Updates the cache entry across all RLocalCachedMap instances on a map entry change.
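For illustration, a sketch of picking a strategy when creating the map (the map name, address, and types are placeholders):

import org.redisson.Redisson;
import org.redisson.api.LocalCachedMapOptions;
import org.redisson.api.LocalCachedMapOptions.SyncStrategy;
import org.redisson.api.RLocalCachedMap;
import org.redisson.api.RedissonClient;
import org.redisson.config.Config;

public class LocalCacheExample {
    public static void main(String[] args) {
        Config config = new Config();
        config.useSingleServer().setAddress("redis://127.0.0.1:6379");
        RedissonClient redisson = Redisson.create(config);

        // INVALIDATE (the default) drops the local entry on a remote change;
        // UPDATE pushes the new value to every local cache instead.
        LocalCachedMapOptions<String, String> options =
            LocalCachedMapOptions.<String, String>defaults()
                .syncStrategy(SyncStrategy.UPDATE);

        RLocalCachedMap<String, String> map =
            redisson.getLocalCachedMap("myMap", options);
        map.put("key", "value"); // propagated to Redis and to the other local caches

        redisson.shutdown();
    }
}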
Right, all Redisson objects also work in cluster mode. Each object is tied to a single Redis node, and its content always remains on that node only; it is not distributed. If your object can't fit on a single Redis node, you need to use the data partitioning feature, which evenly distributes the object's content across multiple Redis nodes in the cluster.
Re: "local cache truely local" -- I think you can just use a java Map, initially populate it with a RMap contents then from then on just serve your requests from the 'truely local' map in memory.

Redis Cluster minimal configuration

I'm currently using a Redis master-slaves configuration with HAProxy for WordPress to have high availability. This configuration is nice and works perfectly (I'm able to take any server down for maintenance without downtime). The problem with this configuration is that only one Redis server gets all the traffic while the others just wait in case that server dies, so for a very high-load webpage this can be a problem, and adding more servers is not a solution because only one will ever be the master.
With this in mind, I'm wondering if I can just use a Redis Cluster to allow reads/writes on all nodes, but I'm not really sure whether it will work with my setup.
My setup is limited to three nodes most of the time, and I've read in some places that the minimal Redis Cluster setup is three nodes, though six are recommended. That is reasonable, because such a setup keeps slave nodes that become masters if their master dies, so no data is lost. But what happens if the data doesn't matter? I mean, in my setup the data is just cached objects, so if one doesn't exist it simply gets created again. So which is it:
The data will be lost (don't care), and the other nodes will get the objects from the clients again to serve them on later requests (as happens if I flush the data).
The nodes will answer that the data doesn't exist and will refuse to cache it, because the object would have to be on another node that is dead.
Does anyone know?
Thanks!!
When a master dies, the Redis cluster goes into a down state, and any command involving a key served by the failed instance will fail.
This may differ from some other distributed software, because Redis Cluster is not a system in which every master holds all the data. In fact, the key space is horizontally partitioned and each key is served by only one master.
This is mentioned in the specification:
The key space is split into 16384 slots...
a single hash slot will be served by a single node...
The base algorithm used to map keys to hash slots is the following:
HASH_SLOT = CRC16(key) mod 16384
When you set up a cluster, you assign each node a set of slots, and each slot can only be served by one node. If a node dies, you lose the slots on that node unless a slave fails over to serve them, and any command involving keys mapped to those slots will fail.
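To make the mapping concrete, here is a small sketch of the slot computation from the specification, using the CRC16 (XMODEM) variant Redis Cluster uses (hash-tag handling is omitted for brevity):

import java.nio.charset.StandardCharsets;

public class HashSlot {
    // CRC16-CCITT (XMODEM): polynomial 0x1021, initial value 0, MSB first.
    static int crc16(byte[] bytes) {
        int crc = 0;
        for (byte b : bytes) {
            crc ^= (b & 0xFF) << 8;
            for (int i = 0; i < 8; i++)
                crc = (crc & 0x8000) != 0 ? (crc << 1) ^ 0x1021 : crc << 1;
            crc &= 0xFFFF;
        }
        return crc;
    }

    public static void main(String[] args) {
        String key = "user:1000";
        int slot = crc16(key.getBytes(StandardCharsets.UTF_8)) % 16384;
        System.out.println(key + " -> slot " + slot); // always in [0, 16383]
    }
}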

How to set up an Akka.NET cluster when I do not really need persistence?

I have a fairly simple Akka.NET system that tracks in-memory state, but contains only derived data. So any actor can, on startup, load its up-to-date state from a backend database and then start receiving messages and keep its state from there. So I can just let actors fail and restart the process whenever I want; it will rebuild itself.
But... I would like to run across multiple nodes (mostly for the memory requirements) and I'd like to increase/decrease the number of nodes according to demand. Also for releasing a new version without downtime.
What would be the most lightweight (in terms of Persistence) setup of clustering to achieve this? Can you run Clustering without Persistence?
This is not a single question, so let me answer them one by one:
So I can just let actors fail and restart the process whenever I want - yes, but keep in mind that a hard reset of the process is a lot more expensive than a graceful shutdown. In distributed systems, if your node is going down, it's better for it to communicate that to the rest of the nodes beforehand than to require them to detect the dead node - that detection is part of node failure handling and can take some time (even close to a minute).
I'd like to increase/decrease the number of nodes according to demand - this is standard cluster behavior. In the case of Akka.NET, depending on which feature set you are going to use, you may sometimes need to specify an upper bound on the cluster size.
Also for releasing a new version without downtime. - most of the cluster features can be scoped to a set of particular nodes using so-called roles. Each node can have its own set of roles, which can be used to describe what services it provides and to detect whether other nodes have the required capabilities. For that reason you can use roles for things like versioning.
Can you run Clustering without Persistence? - yes, and this is the default configuration (in Akka, cluster nodes don't need any form of persistent backend in order to work).
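As a rough sketch, a minimal HOCON configuration for such a persistence-free cluster node with a role might look like this (the system name, host, port, and role name are placeholders):

akka {
    actor.provider = cluster
    remote.dot-netty.tcp {
        hostname = "127.0.0.1"
        port = 8081
    }
    cluster {
        # The first node seeds itself; note there is no persistence section at all.
        seed-nodes = ["akka.tcp://MySystem@127.0.0.1:8081"]
        roles = ["tracker-v2"]
    }
}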

Implementing Cuckoo Filter on multiple nodes in Redis

I'm trying to implement a cuckoo filter in Redis. What I have so far works fine, except that it inserts all the values on a single node even when working on a cluster.
In order to implement it on multiple nodes, I'm thinking of directing different elements to different nodes using some hash function. Is there any command or function call in Redis that allows forcing an element onto a particular node using its key or number, or even onto a particular slot?
For reference, this is the implementation of the cuckoo filter I have so far.
As an aside, is there any existing implementation of Cuckoo Filter or Bloom Filter on distributed nodes in Redis that I can refer to?
This page explains how Redis Cluster works and how redis-cli behaves when used in cluster mode. Other clients handle operations in cluster mode better, but the basic functionality of redis-cli should work for simple tests.
If you check the code of the other data structures (for example, hash or set) that come with Redis, you'll notice they contain no code to deal with cluster mode. That is handled by the code in cluster.c and should be orthogonal to your implementation. Are you sure you have correctly configured the cluster and redis-cli?
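On the "forcing elements onto a particular slot" part of the question: Redis Cluster hash tags (the {...} portion of a key) are the standard way to pin related keys to one slot, since only the tag is hashed. A small sketch, assuming the Jedis client's JedisClusterCRC16 utility (the key names are made up):

import redis.clients.jedis.util.JedisClusterCRC16;

public class SlotDemo {
    public static void main(String[] args) {
        // Keys sharing a {hash tag} map to the same slot, hence the same node;
        // keys with different tags are spread across the cluster.
        String a = "cuckoo:{shard1}:bucket:1";
        String b = "cuckoo:{shard1}:bucket:2";
        String c = "cuckoo:{shard2}:bucket:1";

        System.out.println(JedisClusterCRC16.getSlot(a)); // same slot as b
        System.out.println(JedisClusterCRC16.getSlot(b));
        System.out.println(JedisClusterCRC16.getSlot(c)); // usually a different slot
    }
}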