We want to integrate Apache Ignite into our application. Our application is deployed on multiple servers on our customers' networks and is able to update itself.
The problem is that while the application is updating, we have servers running different versions at the same time. The already-updated servers rejoin the cluster and receive a copy of the data based on the old classes. As the remaining servers are updated, the data gets handed around, and we end up with a cluster of updated servers holding old data.
My goal is to form a new, empty cluster with the updated servers. I already read that there are multiple options to do this, such as changing the multicast group or the localPort. But I need a solution that works for both multicast and direct IP connections and doesn't change the ports; changing the ports can be a problem because of firewall restrictions.
I wonder if it is somehow possible to filter the nodes provided by the ipFinder and check whether they run the same version.
I think you have several options:
You can deactivate your cluster before the update and activate it again only when the last app instance has been updated (https://apacheignite.readme.io/docs/baseline-topology#section-cluster-activation-tool).
Destroy the caches and create them again once the last app instance has been updated.
You can set a node attribute (the app version, for example) and a cache node filter, so that the new version uses a new cache which stores data only on nodes with attribute "version 4", while the previous one stores data only on nodes with "version 3". Once all instances have been updated there will be no nodes with attribute "version 3" left, and only the caches for the new version will survive, storing data on the "version 4" nodes (see the sketch after this list).
( https://ignite.apache.org/releases/latest/javadoc/org/apache/ignite/configuration/CacheConfiguration.html#getNodeFilter-- )
However, in this case, you need to use new cache names for each new app version.
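For reference, here is a minimal sketch of the node-attribute/node-filter option, assuming a user attribute named "app.version" and version-suffixed cache names (both names are illustrative choices, not anything Ignite mandates):

```java
import java.util.Collections;

import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.CacheConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

public class VersionedCacheStartup {
    // Hypothetical version constant; in a real setup read it from the build/manifest.
    private static final String APP_VERSION = "4";

    public static void main(String[] args) {
        IgniteConfiguration cfg = new IgniteConfiguration();
        // Tag this node with the application version it was built with.
        cfg.setUserAttributes(Collections.singletonMap("app.version", APP_VERSION));

        Ignite ignite = Ignition.start(cfg);

        // The cache name includes the version, so each app version gets its own cache.
        CacheConfiguration<Integer, String> cacheCfg =
            new CacheConfiguration<>("data-v" + APP_VERSION);
        // Keep cache data only on nodes that carry the matching version attribute.
        cacheCfg.setNodeFilter(node -> APP_VERSION.equals(node.attribute("app.version")));

        ignite.getOrCreateCache(cacheCfg);
    }
}
```

Once every node has been restarted with the new version, no node matches the old filter any more, so the old cache has nowhere left to keep its data.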
I am working on a Mulesoft application which I have deployed on Mule servers on two different physical machines. The servers are bound together to form a cluster.
In clustering mode, the servers share a common distributed memory, such that if one machine goes down, the other machine takes over the tasks of the first. So they maintain a common distributed memory between them.
Is there any way to configure the size of the common distributed memory the cluster leverages?
As traffic grows and more applications are added, I guess there will be a need to raise that memory threshold for the cluster.
Or, if not, do we ever have to modify the memory volume that a Mulesoft cluster uses at all?
Please help me out.
Thanks
In clustered scenarios all object stores are replaced with clustered object stores. Clustered object stores use the shared memory grid created by the clustering code to persist information (meaning that there is no file-system-level persistence). In case of an outage of one node, the other nodes in the cluster remain active and keep the object store information in the shared memory grid, which makes persistence in the file system unnecessary.
Additionally, since object stores use the name of the application as part of the storage information, if you want to keep them across re-deployments, the newly deployed application must have the same name as the previous one. Please see the scenarios below as a reference:
Scenario a:
1. Current application name: test
2. New application name: test
- Object store values will be preserved from 1 to 2.
Scenario b:
1. Current application name: test-v1
2. New application name: test-v2
- Object store values will not be preserved from 1 to 2.
Note: prior to Mule 3.5.0, the in-memory store was the default. As of Mule 3.5.0, the persistent store is the default.
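As a rough illustration of what this means in practice, here is a sketch of a Java component that reads and writes a named object store through the Mule 3 API (the store name "my-counters" and the counter logic are made up for the example; in a cluster the store returned here is the clustered one, and its contents survive a redeploy only if the application keeps the same name):

```java
import java.io.Serializable;

import org.mule.api.MuleEventContext;
import org.mule.api.lifecycle.Callable;
import org.mule.api.store.ObjectStore;
import org.mule.api.store.ObjectStoreManager;

public class CounterComponent implements Callable {

    @Override
    public Object onCall(MuleEventContext eventContext) throws Exception {
        // Look up the object store manager from the registry.
        ObjectStoreManager manager = eventContext.getMuleContext()
            .getRegistry().lookupObject(ObjectStoreManager.class);

        // "my-counters" is just an example store name; true = persistent store.
        ObjectStore<Serializable> store = manager.getObjectStore("my-counters", true);

        Serializable key = "hits";
        long hits = 0L;
        if (store.contains(key)) {
            hits = (Long) store.retrieve(key);
            store.remove(key); // store() fails if the key already exists
        }
        store.store(key, hits + 1);

        return hits + 1;
    }
}
```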
Mulesoft clusters are active-active, so you do not need to worry about which server has to do the work: when one server is down, another one takes over. The memory the cluster uses is essentially the JVM memory consumption of each node.
I am doing database scaling using PostgreSQL.
Currently I am using pg_shard for scaling and am able to do sharding and replication. I have tested the example mentioned in the README file of pg_shard.
But I need to dynamically scale the cluster as new machines are added or old ones are retired. I am using Google Cloud VMs to set up the database, so once one VM is filled with data I want to set up a new instance with the same configuration.
That is, if the current machine has 4 GB and runs out of memory, it should create one more 4 GB VM and the next entries should go there.
I have gone through http://slideplayer.com/slide/4896815/ and after reading it I understood that this is possible, but the steps are not mentioned anywhere.
How to achieve this using pg_shard?
I got the answer myself.
We can use CitusDB for this.
CitusDB is installed with an extension called "shard_rebalancer", which helps you to move the shards around when new nodes are added to the cluster. For this, you need to follow the installation instructions for CitusDB.
In this documentation, you can find the related information about the shard rebalancer functions (i.e., rebalance_table_shards and replicate_table_shards).
In simpler words, you must follow these steps (a sketch of the final step follows the list):
Add CitusDB node(s) to the cluster
Add the IPs (or host names) to pg_worker_list.conf
Reload the master node configuration, so that the master becomes aware of the new worker node(s)
Run "SELECT rebalance_table_shards('tablename')" on the master node.
Looking through the Infinispan getting started guide it states [When in replication mode]
Infinispan only replicates data to nodes which are already in the
cluster. If a node is added to the cluster after an entry is added, it
won’t be replicated there.
I read this as: any cluster member will always be ignorant of any data that existed in the cluster before it became a cluster member.
Is there a way to force Infinispan to replicate all existing data to a new cluster member?
I see two options currently but I'm hoping I can just get Infinispan to do the work.
Use a distributed cache and live with the increase in access times inherent in the model, but this at least leaves Infinispan to handle its own state.
Create a Listener to listen for a new cache member joining and iterate through the existing data, pushing it into the new member. Unfortunately this would in effect cause every entry to replicate out to the existing cluster members again. I don't think this option will fly.
This information sounds misleading/outdated. When a node joins a cluster, a rebalance (state transfer) process is initiated, and if you query for that data during the rebalance, before it has been delivered to the joining node, the entry is fetched via a remote RPC.
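In other words, with a replicated cache and state transfer enabled, a newly joining node does receive the existing entries. Here is a minimal programmatic sketch, assuming a recent Infinispan version (builder method names can differ slightly between releases):

```java
import org.infinispan.Cache;
import org.infinispan.configuration.cache.CacheMode;
import org.infinispan.configuration.cache.Configuration;
import org.infinispan.configuration.cache.ConfigurationBuilder;
import org.infinispan.configuration.global.GlobalConfigurationBuilder;
import org.infinispan.manager.DefaultCacheManager;

public class ReplicatedCacheNode {
    public static void main(String[] args) {
        // Synchronously replicated cache; fetchInMemoryState makes a joining
        // node pull the entries that already exist in the cluster.
        Configuration cacheConfig = new ConfigurationBuilder()
            .clustering()
                .cacheMode(CacheMode.REPL_SYNC)
                .stateTransfer()
                    .fetchInMemoryState(true)
            .build();

        DefaultCacheManager manager = new DefaultCacheManager(
            GlobalConfigurationBuilder.defaultClusteredBuilder().build());
        manager.defineConfiguration("replicated", cacheConfig);

        Cache<String, String> cache = manager.getCache("replicated");
        cache.put("hello", "world"); // visible on nodes that join later, too
    }
}
```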
My understanding could be amiss here. As I understand it, Couchbase uses a smart client to automatically select which node to write to or read from in a cluster. What I DON'T understand is, when this data is written/read, is it also immediately written to all other nodes? If so, in the event of a node failure, how does Couchbase know to use a different node from the one that was 'marked as the master' for the current operation/key? Do you lose data in the event that one of your nodes fails?
This sentence from the Couchbase Server Manual gives me the impression that you do lose data (which would make Couchbase unsuitable for high availability requirements):
With fewer larger nodes, in case of a node failure the impact to the
application will be greater
Thank you in advance for your time :)
By default, when data is written into Couchbase, the client returns success just after the data has been written to one node's memory. After that, Couchbase saves it to disk and performs replication.
If you want to ensure that data has been persisted to disk, most client libraries have functions that allow you to do that. With the help of those functions you can also ensure that the data has been replicated to another node. This mechanism is called observe.
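As a sketch of what that looks like with the (older) Couchbase Java SDK 1.x, assuming a local single-node setup (connection details, bucket, key and value are all placeholders):

```java
import java.net.URI;
import java.util.Arrays;
import java.util.List;

import com.couchbase.client.CouchbaseClient;
import net.spy.memcached.PersistTo;
import net.spy.memcached.ReplicateTo;

public class DurableWrite {
    public static void main(String[] args) throws Exception {
        // Placeholder cluster address and bucket; adjust for your environment.
        List<URI> nodes = Arrays.asList(URI.create("http://127.0.0.1:8091/pools"));
        CouchbaseClient client = new CouchbaseClient(nodes, "default", "");

        // Block until the value is persisted on the active node's disk
        // and replicated to at least one replica (observe-based durability).
        boolean ok = client.set("user::42", 0, "{\"name\":\"alice\"}",
                                PersistTo.MASTER, ReplicateTo.ONE).get();

        System.out.println("durable write succeeded: " + ok);
        client.shutdown();
    }
}
```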
When one node goes down it should be failed over. Couchbase Server can do that automatically when the auto-failover timeout is set in the server settings. I.e., if you have a 3-node cluster, the stored data has 2 replicas, and one node goes down, you will not lose data. If a second node fails you will still not lose all data; it will be available on the last node.
If a node that was the master for some data goes down and is failed over, another live node becomes the master. In your client you point to all servers in the cluster, so if it is unable to retrieve data from one node, it tries to get it from another.
Also, if you have 2 nodes at your disposal, you can install 2 separate Couchbase servers, configure XDCR (cross datacenter replication) between them, and check server availability with HA proxies or something similar. That way you get a single IP to connect to (the proxy's IP), which will automatically serve data from the live server.
Fortunately, Couchbase is a good system for HA requirements.
Let me explain in a few sentences how it works. Suppose you have a 5-node cluster. The application, using the client API/SDK, is always aware of the topology of the cluster (and of any change in the topology).
When you set/get a document in the cluster, the client API uses the same algorithm as the server to choose the node the document should be written to: the client selects the node using a CRC32 hash of the key and writes to that node. Then, asynchronously, the cluster copies 1 or more replicas to the other nodes (depending on your configuration).
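As a rough, simplified illustration of that key-to-node mapping (a real client consults the vBucket map pushed by the cluster rather than taking a plain modulo over the server list, and the exact hash details vary; server names here are placeholders):

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

public class VBucketMapping {
    // Illustrative only: Couchbase clients hash the key with CRC32 and map it
    // to one of (by default) 1024 vBuckets; each vBucket has exactly one
    // active node plus zero or more replica nodes in the cluster map.
    static int vbucketFor(String key, int numVBuckets) {
        CRC32 crc = new CRC32();
        crc.update(key.getBytes(StandardCharsets.UTF_8));
        return (int) (crc.getValue() % numVBuckets);
    }

    public static void main(String[] args) {
        String[] servers = {"node-a", "node-b", "node-c", "node-d", "node-e"};
        int vbucket = vbucketFor("user::42", 1024);
        // A real client looks up the active node for this vBucket in the
        // cluster map; vbucket % servers.length is a simplification.
        System.out.println("key 'user::42' -> vBucket " + vbucket
            + " -> active node " + servers[vbucket % servers.length]);
    }
}
```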
Couchbase has only 1 active copy of a document at a time, so it is easy to stay consistent: the application gets and sets against this active copy.
In case of failure, the server has some work to do. Once the failure is discovered (automatically or by a monitoring system), a "failover" occurs. This means that the replicas are promoted to active, and it is now possible to work as before. Usually you then rebalance the nodes to balance the cluster properly.
The sentence you are quoting simply says that the fewer nodes you have, the bigger the impact of a failure/rebalance will be, since you have to route the same number of requests to a smaller number of nodes. Fortunately you do not lose data ;)
You can find some very detailed information about this way of working on Couchbase CTO blog:
http://damienkatz.net/2013/05/dynamo_sure_works_hard.html
Note: I am working as developer evangelist at Couchbase
I have a service-based architecture where a web farm full of ASP clients hits an application server farm of WCF services. Obviously all the database access is done by the WCF services. Now I would like to cache my frequently used database-retrieved objects using Velocity at the service tier. I am considering making each physical application server also part of the cache cluster.
According to the Velocity documentation, if I use regions, objects are stored only on a single host. I actually wouldn't have any problem if each host kept its own cache, provided that I could somehow synchronize them.
So my questions are
If I create one region on one host is it also created on another one?
When I clear a cache region, is it cleared on one host only?
If I subscribe to a region level notification on all the hosts, can I catch events of one host on another one?
In this scenario should I use regions at all or stay away from them?
I hope my questions are clear. Actually I am more interested in a solution to my problem than in answers to my questions.
Yes, you are reading the docs right: a region will exist only on one host.
" I actually wouldn't have any problem if each host kept it's own cache provided that I could somehow synchronize them."
When you say synchronize, do you mean when HA is enabled? Velocity would actually take care of that, if that's what you meant.
For the questions:
1. No.
2. Yes
3. Notifications will be sent to the client, so I am not sure if there is any way to send notifications to another host.
4. Regions give you search capabilities but take HA away from you. In your case, you could use the advantages of HA.
Having regions does not necessarily mean that you don't have HA. If you create your own cache (and don't use the 'default' one) you can create it with Secondaries = 1 (HA on).
Now let's say you have 4 cache hosts; when you define a region, it will have both a primary and a secondary host, so each action on the region will be applied on both.
Shany
Named caches distribute across participating nodes. Named regions live on a single node. Regions can be HA, but they cannot take full advantage of distributed cache scaling, as their object load does not distribute across participating nodes in the cluster. Also, using named caches with HA requires three nodes minimum, rather than two nodes if you used the "default" cache only.