GemFire: start distributed system - gemfire

I want to start a distributed system over wan, but I got an exception when trying to start a remote member
I proceed like that :
From server1:
I start a locator (locator1)
I start a server (server1)
From another distant server:
I connect to the cluster with: connect --locator=#ip[port]
I start another server but I got this exception exception
click to view the Exception screen-shot when trying to start a remote member
How can I correct this exception?
And how can I start a member that belongs to a remote cluster in GemFire?

The image doesn't help much, it just says that the member couldn't join the distributed system after 64077ms (probably due to a slow connection but can't be sure looking only at the stack trace), you need to look at the locator and server logs to check why the member couldn't join.
That said, if the cluster is composed of multiple members distributed across geographical distances with slower connections, it's better to use WAN replication instead of a single cluster. Id' recommend a quick reading through Topologies and Communication.
Cheers.

Related

Akka.net / Cluster - How to "Heal" the topology when the leader dies?

I set up a basic test topology with Petabridge Lighthouse and two simple test actors that communicate with each other. This works well so far, but there is one problem: Lighthouse (or the underlying Akka.Cluster) makes one of my actors the leader, and when not shutting the node down gracefully (e.g. when something crashes badly or I simply hit "Stop" in VS) the Lighthouse is not usable any more. Tons of exceptions scroll by and it must be restarted.
Is it possible to configure Akka.Cluster .net in a way that the rest of the topology elects a new leader and carries on?
There are 2 things to point here. One is that if you have a serious risk of your lighthouse node going down, you probably should have more that one -
akka.cluster.seed-nodes setting can take multiple addresses, the only requirement here is that all nodes, including lighthouses, must have them specified in the same order. This way if one lighthouse is going down, another one still can take its role.
Other thing is that when a node becomes unreachable (either because the process crashed on network connection is unavailable), by default akka.net cluster won't down that node. You need to tell it, how it should behave, when such thing happens:
At any point you can configure your own IDowningProvider interface, that will be triggered after certain period of node inactivity will be reached. Then you can manually decide what to do. To use it add fully qualified type name to followin setting: akka.cluster.downing-provider = "MyNamespace.MyDowningProvider, MyAssembly". Example downing provider implementation can be seen here.
You can specify akka.cluster.auto-down-unreachable-after = 10s (or other time value) to specify some timeout given for an unreachable node to join - if it won't join before the timeout triggers, it will be kicked out from the cluster. Only risk here is when cluster split brain happens: under certain situations a network failure between machines can split your cluster in two, if that happens with auto-down set up, two halves of the cluster may consider each other dead. In this case you could end up having two separate clusters instead of one.
Starting from the next release (Akka.Cluster 1.3.3) a new Split Brain Resolver feature will be available. It will allow you to configure more advanced strategies on how to behave in case of network partitions and machine crashes.

JDBC Connection Pooling in a Tomcat Cluster Environment

I'm relatively very new to this, but I have a Tomcat cluster set up (using mod_proxy from httpd) with session replication (separate redis server) for fault-tolerance.
I have a couple of questions about this setup:
My application (spring/hibernate) has a different database per user. So the problem here is that the data source (using spring along with hibernate for persistence) is created at Tomcat level. Thus, whatever connection pooling I do will be at server level.
As per the cluster configuration the Tomcat instances will create their own Connection Pool.
I'd like to know if connection pooling is possible at a cluster level using Tomcat i.e. is there a way to make sure that all the servers in the cluster are using the shared Connection Pool?
I do not want to configure a DataSource on every Tomcat instance because of performance issues. Before the cluster setup, the application was deployed on a single server and the DataSource was configured such that it allowed only a few (50) connections in a connection pool per DataSource.
Now in a clustered environment, I cannot afford to create or split those number of connections on every Tomcat, and also dynamic registration of nodes will create further problems. I'd also like to know is there some alternative solution to this problem if connection pooling is not possible or inefficient?
I'm going to handle your questions in reverse order, since the second one is more simple.
Database connection pooling in Tomcat cannot be configured cluster-wide: you have to configure a separate pool for each node in the cluster. But this doesn't have to be bad news... there's nothing wrong with configuring a node to have 5 or 10 or 100 connections in the connection pool on each node.
It's true, you might end up with a situation where you have too many users connecting to the database at a single time which overwhelms your database, but that could also happen with a single node as well. There isn't anything conceptually different about multiple-nodes that wouldn't also be true for a single node.
the key is to make sure that your cluster balances users appropriately so that you don't have a limit of e.g. 5 database connections per node, but 100 users end up on one node while the other nodes only have 5 users per node. In that case, the popular node (100 users) will have to share those 5 connections while on the other nodes, each user gets a connection all to themselves.
Back to your first item, which is more complicated. If you have a separate database per user, then connection-pooling is an impossible thing to accomplish because you will absolutely have to establish a new connection for every user every time. Those connections aren't poolable, at least not without being quite careful about it. It sounds like you have an architectural issue that you might have to solve before you can identify a technical solution to that issue.

Membase caching pattern when one server in cluster is inaccessible

I have an application that runs a single Membase server (1.7.1.1) that I use to cache data I'd otherwise fetch from our central SQL Server DB. I have one default bucket associated to the Membase server, and follow the traditional data-fetching pattern of:
When specific data is requested, lookup the relevant key in Membase
If data is returned, use it.
If no data is returned, fetch data from the DB
Store the newly returned data in Membase
I am looking to add an additional server to my default cluster, and rebalance the keys. (I also have replication enabled for one additional server).
In this scenario, I am curious as to how I can use the current pattern (or modify it) to make sure that I am not getting data out of sync when one of my two servers goes down in either an auto-failover or manual failover scenario.
From my understanding, if one server goes down (call it Server A), during the period that it is down but still attached to the cluster, there will be a cache key miss (if the active key is associated to Server A, not Server B). In that case, in the data-fetching pattern above, I would get no data returned and fetch straight from SQL Server. But, when I attempt to store the data back to my Membase cluster, will it store the data in Server B and remap that key to Server B on the next fetch?
I understand that once I mark Server A as "failed over", Server B's replica key will become the active one, but I am unclear about how to handle the intermittent situation when Server A is inaccessible but not yet marked as failed over.
Any help is greatly appreciated!
That's a pretty old version. But several things to clarify.
If you are performing caching you are probably using a memcached bucket, and in this case there is no replica.
Nodes are always considered attached to the cluster until they are explicitly removed by administrative action (autofailover attempts to automate this administrative action for you by attempting to remove the node from the cluster if it's determined to be down for n amount of time).
If the server is down (but not failed over), you will not get a "Cache Miss" per se, but some other kind of connectivity error from your client. Many older memcached clients do not make this distinction and simply return a NULL, False, or similar value for any kind of failure. I suggest you use a proper Couchbase client for your application which should help differentiate between the two.
As far as Couchbase is concerned, data routing for any kind of operation remains the same. So if you were not able to reach the item on Server A. because it was not available, you will encounter this same issue upon attempting to store it back again. In other words, if you tried to get data from Server A and it was down, attempting to store data to Server A will fail in the exact same way, unless the server was failed over between the last fetch and the current storage attempt -- in which case the client will determine this and route the request to the appropriate server.
In "newer" versions of Couchbase (> 2.x) there is a special get-from-replica command available for use with couchbase (or membase)-style buckets which allow you to explicitly read information from a replica node. Note that you still cannot write to such a node, though.
Your overall strategy seems very sane for a cache; except that you need to understand that if a node is unavailable, then a certain percentage of your data will be unavailable (for both reads and writes) until the node is either brought back up again or failed over. There is no

Couchbase node failure

My understanding could be amiss here. As I understand it, Couchbase uses a smart client to automatically select which node to write to or read from in a cluster. What I DON'T understand is, when this data is written/read, is it also immediately written to all other nodes? If so, in the event of a node failure, how does Couchbase know to use a different node from the one that was 'marked as the master' for the current operation/key? Do you lose data in the event that one of your nodes fails?
This sentence from the Couchbase Server Manual gives me the impression that you do lose data (which would make Couchbase unsuitable for high availability requirements):
With fewer larger nodes, in case of a node failure the impact to the
application will be greater
Thank you in advance for your time :)
By default when data is written into couchbase client returns success just after that data is written to one node's memory. After that couchbase save it to disk and does replication.
If you want to ensure that data is persisted to disk in most client libs there is functions that allow you to do that. With help of those functions you can also enshure that data is replicated to another node. This function is called observe.
When one node goes down, it should be failovered. Couchbase server could do that automatically when Auto failover timeout is set in server settings. I.e. if you have 3 nodes cluster and stored data has 2 replicas and one node goes down, you'll not lose data. If the second node fails you'll also not lose all data - it will be available on last node.
If one node that was Master goes down and failover - other alive node becames Master. In your client you point to all servers in cluster, so if it unable to retreive data from one node, it tries to get it from another.
Also if you have 2 nodes in your disposal you can install 2 separate couchbase servers and configure XDCR (cross datacenter replication) and manually check servers availability with HA proxies or something else. In that way you'll get only one ip to connect (proxy's ip) which will automatically get data from alive server.
Hopefully Couchbase is a good system for HA systems.
Let me explain in few sentence how it works, suppose you have a 5 nodes cluster. The applications, using the Client API/SDK, is always aware of the topology of the cluster (and any change in the topology).
When you set/get a document in the cluster the Client API uses the same algorithm than the server, to chose on which node it should be written. So the client select using a CRC32 hash the node, write on this node. Then asynchronously the cluster will copy 1 or more replicas to the other nodes (depending of your configuration).
Couchbase has only 1 active copy of a document at the time. So it is easy to be consistent. So the applications get and set from this active document.
In case of failure, the server has some work to do, once the failure is discovered (automatically or by a monitoring system), a "fail over" occurs. This means that the replicas are promoted as active and it is know possible to work like before. Usually you do a rebalance of the node to balance the cluster properly.
The sentence you are commenting is simply to say that the less number of node you have, the bigger will be the impact in case of failure/rebalance, since you will have to route the same number of request to a smaller number of nodes. Hopefully you do not lose data ;)
You can find some very detailed information about this way of working on Couchbase CTO blog:
http://damienkatz.net/2013/05/dynamo_sure_works_hard.html
Note: I am working as developer evangelist at Couchbase

Is there a log file of running processes in Server Advantage

My name is Josue
I need your help with this:
Is there any way to audit or monitor the server processes that connect to the
Advantage Database Server?
Is there a log of running processes?
Thank's
There is no existing log of processes that use Advantage Database Server. Because it is a client/server architecture, there is no mechanism that I am aware of that can easily associate a connection on the server to a specific process.
However, it would be possible to use the system procedure sp_mgGetConnectedUsers() to obtain some of this information. It might be possible to use it to obtain the information you are looking for at a given point in time (a snapshot).
The output of that procedure includes three fields that you might be interested in. The Address column gives the address of the machine that connected to Advantage. It is typically the IP address of the client application. But it can also be of the form "IPC Connection N", which indicates that it is using shared memory for communications; this means that the client process is running on the same machine as the server.
The TSAddress column might also be of interest. If the connection is made by a client that is running through terminal services (e.g., a remote desktop), then that column contains the IP address of the client machine. If you are interested in knowing processes that originate from the server machine itself, then you would need this field to differentiate between those and clients that connected through terminal services.
The other column of potential interest would be ApplicationID. By default, that field contains the process name (e.g., the executable) of the client application. This could help identify the actual process. It is not guaranteed, though. The application itself can change that value through mechanisms such as sp_SetApplicationID.