Apache Ignite grid gets auto activated when persistence is disabled

We have an Apache Ignite grid with persistence enabled, but we are facing issues with persistence: the grid often hangs due to locking during checkpointing.
We now want to move to a non-persistent grid, but the problem is that a non-persistent grid is auto-activated right from the start, which we don't want. If the grid is auto-activated, it doesn't give us time to do some initial checks before it starts processing tasks.
Is there any way to achieve this, either in the form of an initial delay or by starting the grid in an inactive state?

Yes, the desired behaviour is totally achievable.
You can specify it as part of your IgniteConfiguration via the property that controls the initial cluster state. Possible options are:
ClusterState.INACTIVE
ClusterState.ACTIVE
ClusterState.ACTIVE_READ_ONLY
Please note that this property must be consistent across the whole cluster.
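As an illustration only, here is a minimal sketch of how this could look, assuming a recent Ignite version (2.9 or later) where the property is exposed as IgniteConfiguration.setClusterStateOnStart(); the node then joins inactive and you activate the cluster yourself once your checks have passed:

import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.cluster.ClusterState;
import org.apache.ignite.configuration.IgniteConfiguration;

public class InactiveStartExample {
    public static void main(String[] args) {
        IgniteConfiguration cfg = new IgniteConfiguration();

        // Keep the in-memory cluster inactive on start instead of auto-activating it.
        cfg.setClusterStateOnStart(ClusterState.INACTIVE);

        Ignite ignite = Ignition.start(cfg);

        // ... run your initial checks here ...

        // Activate manually once the checks have passed.
        ignite.cluster().state(ClusterState.ACTIVE);
    }
}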

Related

What happens when all baseline nodes stop or disconnect from network except one?

We have 3 Ignite server nodes in 3 different server farms, fully replicated, persistence enabled, and all servers are baseline nodes. It happens that if 2 server nodes fail (node or connection crash, or slow connection), the remaining one also performs a shutdown, perhaps guessing it has been disconnected from the network.
Is it possible to make the surviving node not to shutdown?
Is it possible to adjust some timeout to avoid disconnections from slow networks or nodes?
I cannot find any hint in the documentation.
To avoid the problem I have to run only one server node (which is what we tried to avoid by using Ignite...).
You can try to customize StopNodeOrHaltFailureHandler with SEGMENTATION added to its ignoredFailureTypes.
But in this case, if all 3 nodes are segmented and remain alive, keep in mind that the cluster may enter a split-brain state.
To decide which node should be used for cache operations, you can add a TopologyValidator https://ignite.apache.org/releases/latest/javadoc/org/apache/ignite/configuration/TopologyValidator.html to the cache configuration and use node attributes to decide which nodes are allowed.
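A rough Java sketch of both ideas; the cache name and the "primary" node attribute are placeholders you would pick yourself, and note that setting ignoredFailureTypes this way replaces the handler's default ignored set:

import java.util.Collections;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.CacheConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.failure.FailureType;
import org.apache.ignite.failure.StopNodeOrHaltFailureHandler;

public class SegmentationConfigExample {
    public static void main(String[] args) {
        IgniteConfiguration cfg = new IgniteConfiguration();

        // Keep a segmented node alive: ignore SEGMENTATION instead of stopping/halting.
        StopNodeOrHaltFailureHandler failureHnd = new StopNodeOrHaltFailureHandler();
        failureHnd.setIgnoredFailureTypes(Collections.singleton(FailureType.SEGMENTATION));
        cfg.setFailureHandler(failureHnd);

        // Only allow cache operations while the visible topology contains a node
        // carrying the (hypothetical) "primary" attribute, which you would set on
        // one node via cfg.setUserAttributes(...). This limits split-brain damage.
        CacheConfiguration<Object, Object> cacheCfg = new CacheConfiguration<>("myCache");
        cacheCfg.setTopologyValidator(nodes ->
            nodes.stream().anyMatch(n -> "true".equals(n.attribute("primary"))));
        cfg.setCacheConfiguration(cacheCfg);

        Ignition.start(cfg);
    }
}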

Akka.net / Cluster - How to "Heal" the topology when the leader dies?

I set up a basic test topology with Petabridge Lighthouse and two simple test actors that communicate with each other. This works well so far, but there is one problem: Lighthouse (or the underlying Akka.Cluster) makes one of my actors the leader, and when not shutting the node down gracefully (e.g. when something crashes badly or I simply hit "Stop" in VS) the Lighthouse is not usable any more. Tons of exceptions scroll by and it must be restarted.
Is it possible to configure Akka.Cluster .net in a way that the rest of the topology elects a new leader and carries on?
There are 2 things to point out here. One is that if there's a serious risk of your lighthouse node going down, you should probably have more than one -
the akka.cluster.seed-nodes setting can take multiple addresses; the only requirement here is that all nodes, including the lighthouses, must have them specified in the same order. This way, if one lighthouse goes down, another one can still take over its role.
The other thing is that when a node becomes unreachable (either because the process crashed or the network connection is unavailable), by default the Akka.NET cluster won't down that node. You need to tell it how it should behave when such a thing happens:
At any point you can configure your own implementation of the IDowningProvider interface, which will be triggered after a certain period of node inactivity is reached. Then you can manually decide what to do. To use it, add the fully qualified type name to the following setting: akka.cluster.downing-provider = "MyNamespace.MyDowningProvider, MyAssembly". An example downing provider implementation can be seen here.
You can specify akka.cluster.auto-down-unreachable-after = 10s (or another time value) to give an unreachable node some time to become reachable again - if it doesn't before the timeout triggers, it will be kicked out of the cluster. The only risk here is when a cluster split brain happens: under certain situations a network failure between machines can split your cluster in two, and if that happens with auto-down set up, the two halves of the cluster may consider each other dead. In this case you could end up having two separate clusters instead of one.
Starting from the next release (Akka.Cluster 1.3.3) a new Split Brain Resolver feature will be available. It will allow you to configure more advanced strategies on how to behave in case of network partitions and machine crashes.
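For reference, a small HOCON sketch of the settings mentioned above; the actor system name, host names, port and type names are placeholders, and keep the split-brain caveat in mind before enabling auto-down:

akka {
  cluster {
    # Same list, in the same order, on every node - including the lighthouses themselves.
    seed-nodes = [
      "akka.tcp://MySystem@lighthouse-1:4053",
      "akka.tcp://MySystem@lighthouse-2:4053"
    ]

    # Option 1: plug in your own downing provider.
    # downing-provider = "MyNamespace.MyDowningProvider, MyAssembly"

    # Option 2: automatically down nodes that stay unreachable for 10 seconds.
    auto-down-unreachable-after = 10s
  }
}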

How to set up an Akka.NET cluster when I do not really need persistence?

I have a fairly simple Akka.NET system that tracks in-memory state, but contains only derived data. So any actor can, on startup, load its up-to-date state from a backend database and then start receiving messages and keep its state from there. So I can just let actors fail and restart the process whenever I want. It will rebuild itself.
But... I would like to run across multiple nodes (mostly for the memory requirements) and I'd like to increase/decrease the number of nodes according to demand. Also for releasing a new version without downtime.
What would be the most lightweight (in terms of Persistence) setup of clustering to achieve this? Can you run Clustering without Persistence?
This is not a single question, so let me answer the parts one by one:
So I can just let actors fail and restart the process whenever I want - yes, but keep in mind that a hard reset of the process is a lot more expensive than a graceful shutdown. In distributed systems, if your node is going down, it's better for it to communicate that to the rest of the nodes beforehand than to require them to detect the dead node - that detection is part of node failure detection and can take some time (even sub-minute).
I'd like to increase/decrease the number of nodes according to demand - this is standard cluster behavior. In the case of Akka.NET, depending on which feature set you are going to use, you may sometimes need to specify an upper bound on the cluster size.
Also for releasing a new version without downtime. - most of the cluster features can be scoped to a particular set of nodes using so-called roles. Each node can have its own set of roles, which can be used to describe what services it provides and to detect whether other nodes have the required capabilities. For that reason you can use roles for things like versioning (see the configuration sketch below).
Can you run Clustering without Persistence? - yes, and this is the default configuration (in Akka, cluster nodes don't need to use any form of persistent backend to work).
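As a small illustration of the roles idea mentioned above, each node's configuration could simply tag it with role names of your choosing (the names below are hypothetical); other nodes can then check for those roles before handing it work:

akka {
  cluster {
    # Hypothetical role names: one describing the service, one the deployed version.
    roles = ["state-tracker", "v2"]
  }
}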

Google App Engine automatically updating memcache

So here's the problem: I've created a database model. When I create the model, a = Model(args), and then perform a.put(), GAE seems to automatically update the memcache, because all the data seems up-to-date even without me hitting the database. Logging the number of elements in the cache also shows the correct number of elements. But I'm not manually updating the cache. How do I prevent this? Cheers.
You can set policy functions:
Automatic caching is convenient for most applications but maybe your application is unusual and you want to turn off automatic caching for some or all entities. You can control the behavior of the caches by setting policy functions.
Memcache Policy
That's for NDB. You don't say what language/DB you are using but I'm sure it's all similar.

How to ensure Hazelcast migration is finished

Consider the following scenario.
There are 2 Hazelcast nodes. One is stopped, another is running under quite heavy load.
Now, the second node comes up. The application starts up and its Hazelcast instance hooks up to the first. Hazelcast starts data repartitioning. For 2 nodes, it essentially means
that each entry in IMap gets copied to the new node and two nodes are assigned to be master/backup arbitrarily.
PROBLEM:
If the first node is brought down during this process, and the replication is not done completely, part of the IMap contents and ITopic subscriptions may be lost.
QUESTION:
How to ensure that the repartitioning process has finished, and it is safe to turn off the first node?
(The whole setup is made to enable software updates without downtime, while preserving current application state).
I tried using getPartitionService().addMigrationListener(...) but the listener does not seem to be hooked up to the complete migration process. Instead, I get tens to hundreds of migrationStarted()/migrationCompleted() calls, one for each chunk of the replication.
1- When you gracefully shut down the first node, the shutdown process should wait (block) until the data is safely backed up:
hazelcastInstance.getLifecycleService().shutdown();
2- If you use Hazelcast Management Center, it shows the number of ongoing migration/repartitioning operations on the home screen.
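To make point 1 concrete, here is a minimal Java sketch, assuming Hazelcast 3.x-style APIs; the isClusterSafe() check is an extra precaution on top of the graceful shutdown described above, not something it requires:

import java.util.concurrent.TimeUnit;

import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;

public class GracefulShutdownExample {
    public static void main(String[] args) throws InterruptedException {
        HazelcastInstance hz = Hazelcast.newHazelcastInstance();

        // Optionally wait until every partition has a backup on another member.
        while (!hz.getPartitionService().isClusterSafe()) {
            TimeUnit.SECONDS.sleep(1);
        }

        // Graceful shutdown: blocks until this member's data is safely backed up
        // on the remaining members, then leaves the cluster.
        hz.getLifecycleService().shutdown();
    }
}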