Change initial token to vnode in Cassandra 2.1.7 - datastax

Currently I am running a 4-node Cassandra cluster that uses manually assigned initial tokens.
It has not been repaired for a long time because repairs kept failing with GC problems.
Now I want to switch from initial_token to vnodes (num_tokens).
What should I do?
1. Should I run 'nodetool repair' first and then switch from initial_token to vnodes?
2. Should I switch from initial_token to vnodes first and then run 'nodetool repair'?
3. Do I need to run 'nodetool repair' on every node, or only on one node?

You cannot directly convert single-token nodes to vnodes. However, you can
add another datacenter that is already configured with vnodes (num_tokens in
cassandra.yaml) and let Cassandra's automatic mechanisms distribute the existing
data onto the new nodes. This method has the least impact on performance.
See: Existing cluster to vNodes
nodetool repair will have to be run on each node, and running it with the -pr (primary range) option will be helpful:
nodetool repair -pr

Related

Is there a way to evict vernemq cached auth_on_register, auth_on_publish, auth_on_subscribe hook data from memory

Vernemq build : 1.10.4.1+build.76.ref4f0bbab
Erlang Version : 22
As per the VerneMQ documentation, the hook data is stored in an in-memory cache and is not actively disposed of.
We have around 360k clients distributed over a cluster of 8 nodes.
The client ID, username and password are fixed for 320k of the clients, whereas the remaining 40k clients keep changing. These 40k clients also subscribe and publish to at most 3 topics. The clients tend to disconnect and reconnect to any node in the cluster once a day, due to which the hook data gets cached on all the nodes and memory usage grows. The memory keeps increasing on a daily basis, and the usage curve has not flattened.
Issue: I fear that at some point we will run into OOM errors and the nodes may go down.
I tried clearing memory with the drop_caches echo commands (1, 2 and 3), but only the buffer/page cache was cleared; the hook data was not.
Is there a way to clear or evict the hook data from memory?
Since VerneMQ is written in Erlang, the hook data is stored in built-in term storage (ETS, Erlang Term Storage). ETS tables can hold very large quantities of data in an Erlang runtime system with constant access time. Find more details here: https://erlang.org/doc/man/ets.html
Note: only the owner process can delete the table or all the objects in it, and here the owner process is the VerneMQ broker itself.
To answer my own question, below are the code changes made to the VerneMQ source code in order to evict/delete all cached objects (it is mandatory to build VerneMQ from source).
There is a command, ./vmq-admin webhooks cache show --reset, which resets (deletes all objects from) the vmq_webhooks_stats ETS table, i.e. it resets the cache hits, entries and misses.
This command is defined in vmq_webhooks_cli.erl, in the cache_stats_cmd() function.
Just replace the vmq_webhooks_cache:reset_stats() call with vmq_webhooks_cache:purge_all().
Build the source and start the broker with the updated changes.
On invoking ./vmq-admin webhooks cache show --reset, both the hook data and the stats will be deleted.
This change helped me solve the OOM issue, which we did eventually run into after some time.

The infinispan cluster node expiration result is inconsistent with the official documentation description

When I tested the expiration behaviour of a clustered Infinispan cache, I found that when an entry reached its maximum idle time, the node did not fetch the last access time of the entry from the other nodes in the cluster, but directly invalidated its own copy of the entry. For example: I started two nodes, A and B, and set the maximum idle time of the cache to 10s. At the beginning of the test I sent a request to node A, which read a record from the database and wrote it to the cache; node A then replicated the cached entry to node B. At 5s I accessed the cache entry on node A, and after 10s I accessed it on node B. I found that the cache entry on node B had already expired: node B re-read the record from the database, wrote it to the cache and replicated it to the other nodes, instead of treating the cached entry as still valid.
Why is this different from the description in the documentation? http://infinispan.org/docs/stable/user_guide/user_guide.html#expiration_details
I configured the cache expiration as follows:
Configuration c = new ConfigurationBuilder()
        .expiration().enableReaper().wakeUpInterval(50000L).maxIdle(10000L).build();
It sounds like you are using an older version of Infinispan. Cluster-wide max-idle expiration wasn't introduced until 9.3, in https://issues.jboss.org/browse/ISPN-9003. If the issue still persists with 9.3 or newer, you can log a bug at https://issues.jboss.org/projects/ISPN.
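For reference, a minimal sketch of how the same max-idle expiration might be configured on Infinispan 9.3 or newer, where the last-access time is coordinated cluster-wide; the cache mode, cache name and the default clustered transport used below are illustrative assumptions, not taken from the question:

import java.util.concurrent.TimeUnit;

import org.infinispan.Cache;
import org.infinispan.configuration.cache.CacheMode;
import org.infinispan.configuration.cache.Configuration;
import org.infinispan.configuration.cache.ConfigurationBuilder;
import org.infinispan.configuration.global.GlobalConfigurationBuilder;
import org.infinispan.manager.DefaultCacheManager;

public class MaxIdleExample {
    public static void main(String[] args) {
        // Clustered cache manager; transport defaults are assumptions.
        DefaultCacheManager manager = new DefaultCacheManager(
                GlobalConfigurationBuilder.defaultClusteredBuilder().build());

        // Replicated cache whose entries expire after 10s of idle time.
        // On 9.3+ the last-access time is checked cluster-wide before expiring.
        Configuration cfg = new ConfigurationBuilder()
                .clustering().cacheMode(CacheMode.REPL_SYNC)
                .expiration()
                    .maxIdle(10, TimeUnit.SECONDS)
                    .enableReaper()
                    .wakeUpInterval(50, TimeUnit.SECONDS)
                .build();

        manager.defineConfiguration("records", cfg);
        Cache<String, String> cache = manager.getCache("records");
        cache.put("row-1", "value");
    }
}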

How to make sure initial replication is completed for a new node (Apache Ignite)?

Here is a use case:
I have version 1 of a web app deployed.
It uses a couple of Ignite-powered distributed Maps, Sets and other data structures (configured for replication).
I'm going to deploy v2 of this application and, once the data is replicated, shut down v1 of the app and re-route users (using nginx) to the new instance (v2).
I can see that Ignite on v1 and v2 can discover each other and automatically perform replication of data structures.
My intention: I don't want to shut down the 1st instance (v1) before all data has been replicated to the 2nd instance (v2).
The question is: how do I know when the initial replication is complete? Is there an event that fires in such cases, or maybe some other way to accomplish this?
If you configure your caches to use synchronous rebalancing [1], the second node will not complete its start process until rebalancing is finished. This way you can guarantee that all the data has been replicated to the second node (assuming, of course, that you're using fully replicated caches).
[1] https://apacheignite.readme.io/docs/rebalancing#section-rebalance-modes
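A minimal sketch of what that might look like with the Ignite Java API; the cache name, key/value types and the sample data are illustrative assumptions:

import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.CacheMode;
import org.apache.ignite.cache.CacheRebalanceMode;
import org.apache.ignite.configuration.CacheConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

public class SyncRebalanceExample {
    public static void main(String[] args) {
        // Fully replicated cache with synchronous rebalancing.
        CacheConfiguration<String, String> cacheCfg =
                new CacheConfiguration<>("appCache");   // cache name is an assumption
        cacheCfg.setCacheMode(CacheMode.REPLICATED);
        cacheCfg.setRebalanceMode(CacheRebalanceMode.SYNC);

        IgniteConfiguration cfg = new IgniteConfiguration();
        cfg.setCacheConfiguration(cacheCfg);

        Ignite ignite = Ignition.start(cfg);
        ignite.cache("appCache").put("key", "value");
    }
}

With SYNC mode, as the answer notes, the joining node does not complete its start process for this cache until rebalancing has finished, which is exactly the signal you are after.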

Aerospike cluster rebalancing causing errors

When adding a new node to an Aerospike cluster, a rebalance happens for the new node. For large data sets this takes time, and some requests to the new node fail until the rebalance is complete. The only solution I could figure out is to retry the request until it gets the data.
Is there a better way?
I don't think it is possible to keep the node out of the cluster for requests until it's done replicating, because it is also the master for one of the partitions.
If you are performing batch reads, there is an improvement in 3.6.0. While the cluster is in flux, if the client directs a read transaction to Node_A but the partition containing the record has been moved to Node_B, Node_A proxies the request to Node_B.
Is that what you are doing?
You should not be in a position where the client cannot connect to the cluster, or it cannot complete a transaction.
I know that SO frowns on this, but can you provide more detail about the failures? What kinds of transactions are you performing? What versions are you using?
I hope this helps,
-DM
Requests shouldn't be failing; the new node will proxy to the node that currently has the data.
Prior to Aerospike 3.6.0, batch read requests were the exception. I suspect this is your problem.
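If client-side retries remain the interim workaround, they can at least be centralised in the read policy rather than hand-rolled around every call. A minimal sketch with the Aerospike Java client, where the host, namespace, set, key and retry values are illustrative assumptions:

import com.aerospike.client.AerospikeClient;
import com.aerospike.client.Key;
import com.aerospike.client.Record;
import com.aerospike.client.policy.Policy;

public class RetryReadExample {
    public static void main(String[] args) {
        // Host and port are assumptions.
        AerospikeClient client = new AerospikeClient("127.0.0.1", 3000);

        // Ask the client to retry reads that fail while partitions are migrating.
        Policy readPolicy = new Policy();
        readPolicy.maxRetries = 3;            // number of retries, illustrative value
        readPolicy.sleepBetweenRetries = 50;  // milliseconds between retries, illustrative

        Key key = new Key("test", "users", "user-1");  // namespace/set/key are assumptions
        Record record = client.get(readPolicy, key);
        System.out.println(record);

        client.close();
    }
}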

Couchbase node failure

My understanding could be amiss here. As I understand it, Couchbase uses a smart client to automatically select which node to write to or read from in a cluster. What I DON'T understand is, when this data is written/read, is it also immediately written to all other nodes? If so, in the event of a node failure, how does Couchbase know to use a different node from the one that was 'marked as the master' for the current operation/key? Do you lose data in the event that one of your nodes fails?
This sentence from the Couchbase Server Manual gives me the impression that you do lose data (which would make Couchbase unsuitable for high availability requirements):
With fewer larger nodes, in case of a node failure the impact to the
application will be greater
Thank you in advance for your time :)
By default, when data is written into Couchbase, the client returns success as soon as the data has been written to one node's memory. After that, Couchbase persists it to disk and performs replication.
If you want to ensure that data is persisted to disk, most client libraries have functions that allow you to do that. With the help of those functions you can also ensure that data has been replicated to another node. This mechanism is called observe.
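For illustration, a minimal sketch of requesting that durability with the Couchbase Java SDK (2.x style); the connection details, bucket name, document and durability levels are assumptions, and older SDK generations exposed the same idea through an explicit observe call:

import com.couchbase.client.java.Bucket;
import com.couchbase.client.java.Cluster;
import com.couchbase.client.java.CouchbaseCluster;
import com.couchbase.client.java.PersistTo;
import com.couchbase.client.java.ReplicateTo;
import com.couchbase.client.java.document.JsonDocument;
import com.couchbase.client.java.document.json.JsonObject;

public class DurableWriteExample {
    public static void main(String[] args) {
        // Connection details and bucket name are assumptions.
        Cluster cluster = CouchbaseCluster.create("127.0.0.1");
        Bucket bucket = cluster.openBucket("default");

        JsonDocument doc = JsonDocument.create("user::1",
                JsonObject.create().put("name", "alice"));

        // Block until the write is persisted to disk on the active node
        // and replicated to at least one replica node.
        bucket.upsert(doc, PersistTo.MASTER, ReplicateTo.ONE);

        cluster.disconnect();
    }
}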
When one node goes down, it should be failed over. Couchbase Server can do that automatically when the auto-failover timeout is set in the server settings. I.e. if you have a 3-node cluster, the stored data has 2 replicas and one node goes down, you will not lose data. If a second node fails you will still not lose all the data - it will be available on the last node.
If a node that was the master for a key goes down and is failed over, another live node becomes the master. In your client you point to all the servers in the cluster, so if it is unable to retrieve data from one node, it tries to get it from another.
Also, if you only have 2 nodes at your disposal, you can install 2 separate Couchbase Servers, configure XDCR (cross datacenter replication) between them, and check server availability yourself with HA proxies or something similar. That way you get a single IP to connect to (the proxy's), which will automatically serve data from a live server.
Couchbase is indeed a good system for HA requirements.
Let me explain in a few sentences how it works. Suppose you have a 5-node cluster. The application, using the client API/SDK, is always aware of the topology of the cluster (and of any change in the topology).
When you set/get a document in the cluster, the client API uses the same algorithm as the server to choose which node the document should be written to. So the client selects the node using a CRC32 hash and writes to that node. Then, asynchronously, the cluster copies 1 or more replicas to the other nodes (depending on your configuration).
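As a purely conceptual illustration of that key-to-node mapping (the real SDK uses a vBucket map published by the cluster; the vBucket count and the simple modulo arithmetic below are assumptions made for the sketch):

import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

public class KeyMappingSketch {
    public static void main(String[] args) {
        String[] nodes = {"nodeA", "nodeB", "nodeC", "nodeD", "nodeE"}; // the 5-node cluster
        int numVBuckets = 1024;  // typical Couchbase vBucket count

        String key = "user::42";

        // Hash the document key with CRC32, map it to a vBucket,
        // then map the vBucket to the node currently active for it.
        CRC32 crc = new CRC32();
        crc.update(key.getBytes(StandardCharsets.UTF_8));
        int vBucket = (int) (crc.getValue() % numVBuckets);  // simplified mapping
        String activeNode = nodes[vBucket % nodes.length];   // stand-in for the real vBucket map

        System.out.printf("key=%s -> vBucket=%d -> active node=%s%n", key, vBucket, activeNode);
    }
}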
Couchbase has only one active copy of a document at any time, so it is easy to stay consistent: the application gets and sets this active copy.
In case of failure the server has some work to do. Once the failure is discovered (automatically or by a monitoring system), a failover occurs. This means that the replicas are promoted to active and it is now possible to work as before. Usually you then rebalance the cluster to distribute the data properly again.
The sentence you are quoting simply says that the fewer, larger nodes you have, the bigger the impact will be in case of a failure/rebalance, since the same number of requests will have to be routed to a smaller number of nodes. Fortunately, you do not lose data ;)
You can find some very detailed information about this way of working on the Couchbase CTO's blog:
http://damienkatz.net/2013/05/dynamo_sure_works_hard.html
Note: I work as a developer evangelist at Couchbase.