Load balance sessions vs requests

I have a virtual data layer cluster set up behind a Netscaler load balancer. The virtual data layer dispatches client queries to different data sources and returns the data to the client.
In Netscaler, my VIP gives each node in the cluster (we have 2 nodes) the same weight of "1". The idea is to send the same number of queries to each node. The issue is that on some days one node is underloaded with queries, and the next day that same node can be overloaded. I checked the virtual layer logs and noticed that the SESSIONS are balanced, but different sessions can contain different numbers of queries. So Netscaler sends an equal number of sessions to each node, but one server may end up processing more queries than the other. What we need is to balance the queries, not the sessions. So my question is: is there a way to differentiate between sessions and requests (or queries) in the Netscaler settings?
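The gap between balancing sessions and balancing queries can be shown with a small simulation. This Python sketch is a toy model, not Netscaler configuration: it assumes two nodes, plain round-robin as a simplification of the real load-balancing method, and a hypothetical workload in which a few sessions carry far more queries than the rest. Per-session round-robin keeps the session counts equal while the query counts drift apart; per-query round-robin keeps the query counts within one of each other.

```python
import random
from itertools import cycle
from typing import List


def balance_sessions(session_sizes: List[int], nodes: int = 2) -> List[int]:
    """Round-robin whole sessions: every query in a session lands on one node."""
    load = [0] * nodes
    next_node = cycle(range(nodes))
    for queries in session_sizes:
        load[next(next_node)] += queries
    return load


def balance_requests(session_sizes: List[int], nodes: int = 2) -> List[int]:
    """Round-robin individual queries, regardless of which session sent them."""
    load = [0] * nodes
    next_node = cycle(range(nodes))
    for queries in session_sizes:
        for _ in range(queries):
            load[next(next_node)] += 1
    return load


if __name__ == "__main__":
    rng = random.Random(7)
    # Hypothetical workload: 100 sessions, mostly short, a few very long.
    sessions = [rng.choice([1, 2, 3, 500]) for _ in range(100)]
    print("queries per node, per-session balancing:", balance_sessions(sessions))
    print("queries per node, per-request balancing:", balance_requests(sessions))
```

In general, a load balancer that only sees TCP connections can balance those connections, not the individual queries multiplexed inside them, which is why the session counts and query counts diverge in this model.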

Related

Regarding cluster configuration in Ignite

Let's say I have two server nodes in one data center, DC1, and two more server nodes in another data center, DC2. There is some network delay between the two data centers. I'm running SQL SELECT statements on replicated caches, and those caches use the FULL_SYNC write synchronization mode.
At any given time we have active client nodes in only one DC, not both. Let's say we have two clients in DC1, so there are 6 nodes in total (2 client nodes and 2 server nodes in DC1, and 2 server nodes in DC2).
Our use case is as follows:
The 2 clients should query only the 2 server nodes in DC1, not the other 2 servers in DC2.
All cache queries should be FULL_SYNC between the 2 server nodes in DC1, while DC1-DC2 should be done in ASYNC mode.
One doubt I have: if I list only the IPs of X and Y as server nodes in the client node's DiscoverySpi, would the queries always reach X and Y, even though the entire topology contains X, Y and Z as server nodes?
Could someone please suggest a solution for this?
Note: I saw GridGain's cluster-to-cluster replication capability, but it is only available in the paid version. I am looking for a solution in the community edition.
Would the queries always reach X and Y if only their IPs are listed in the client node's DiscoverySpi, even though the entire topology contains X, Y and Z as server nodes?
No. DiscoverySpi is used only to connect to the cluster; after that, the client node works with all nodes in the cluster.
Can all cache queries be FULL_SYNC between the 2 server nodes in DC1 while DC1-DC2 is done in ASYNC mode?
No, this is not possible: only one synchronization mode can be used for a given cache in the cluster.
Can the 2 clients query only the 2 server nodes in DC1 and not the other 2 servers in DC2?
Not for cache operations, but you can do this for compute operations: you can send a job to a specific node in DC1 that holds a primary or backup copy, and the job will read the local partition. However, compute adds some overhead compared to plain cache operations if it is used only to get entries.
So, as you mentioned, the best option here is Data Center Replication, which is available as part of GridGain, because your requirements effectively call for 2 separate clusters.

Ignite C++, Server-client cluster load balancing performance issue

I have 2 nodes on which I'm trying to run 4 Ignite servers (2 per node) and 16 Ignite clients (8 per node). I am using REPLICATED cache mode. I can see that the load on the cluster is not distributed evenly across all servers.
My intention in having 2 servers per node is to split the load of the 8 local clients across the local servers, with the servers using write-behind to replicate the data across all servers.
But I notice that only one server is taking the load, running at 200% CPU, while the other 3 servers run at only around 20% CPU. How can I set up the cluster to distribute the client load evenly across all servers? Thanks in advance.
"I'm generating load by inserting the same value 1 million times and trying to get the value using the same key."
Here is your problem. The same key is always stored on the same Ignite node, as determined by the affinity function (see https://apacheignite.readme.io/docs/data-grid), so only one node takes the read and write load.
You should use a wide range of keys instead.
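To see why a single key pins all the load on one server, here is a simplified Python model of affinity-based placement: each key is hashed to one of a fixed number of partitions, and each partition gets a primary node via rendezvous (highest-random-weight) hashing, loosely modeled on Ignite's rendezvous-based default. The partition count, node names, and MD5-based hash are assumptions for illustration; this is not Ignite's code, and it models primary ownership only.

```python
import hashlib
from collections import Counter
from typing import Iterable, List

PARTITIONS = 1024  # assumption: illustrative partition count


def stable_hash(value: str) -> int:
    # Python's built-in hash() is salted per process, so use a stable digest.
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")


def partition_for(key: str) -> int:
    """The same key always maps to the same partition."""
    return stable_hash(key) % PARTITIONS


def primary_node(partition: int, nodes: List[str]) -> str:
    # Rendezvous hashing: score every node for this partition; the highest score wins.
    return max(nodes, key=lambda node: stable_hash(f"{node}:{partition}"))


def load_by_node(keys: Iterable[str], nodes: List[str]) -> Counter:
    """Count how many operations each node would serve as primary."""
    return Counter(primary_node(partition_for(k), nodes) for k in keys)


if __name__ == "__main__":
    servers = ["server-1", "server-2", "server-3", "server-4"]
    print("one key:  ", load_by_node(["same-key"] * 10_000, servers))
    print("many keys:", load_by_node([f"key-{i}" for i in range(10_000)], servers))
```

Repeating one key only ever exercises one primary node, while a wide range of keys spreads across all of them.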

JDBC Connection Pooling in a Tomcat Cluster Environment

I'm relatively new to this, but I have a Tomcat cluster set up (using mod_proxy from httpd) with session replication (to a separate Redis server) for fault tolerance.
I have a couple of questions about this setup:
My application (Spring/Hibernate) uses a different database per user. The problem is that the data source (Spring with Hibernate for persistence) is created at the Tomcat level, so whatever connection pooling I do will be at the server level.
With the cluster configuration, each Tomcat instance will create its own connection pool.
I'd like to know whether connection pooling is possible at the cluster level with Tomcat, i.e. is there a way to make all the servers in the cluster use a shared connection pool?
I do not want to configure a DataSource on every Tomcat instance because of performance issues. Before the cluster setup, the application was deployed on a single server, and the DataSource was configured to allow only a few (50) connections in the connection pool per DataSource.
Now, in a clustered environment, I cannot afford to create or split that number of connections on every Tomcat instance, and dynamic registration of nodes will create further problems. I'd also like to know whether there is an alternative solution if connection pooling is not possible or is inefficient.
I'm going to handle your questions in reverse order, since the second one is simpler.
Database connection pooling in Tomcat cannot be configured cluster-wide: you have to configure a separate pool for each node in the cluster. But this doesn't have to be bad news; there's nothing wrong with configuring each node to have 5, 10, or 100 connections in its pool.
It's true that you might end up with too many users connecting to the database at the same time, which overwhelms the database, but that could happen with a single node as well. There isn't anything conceptually different about multiple nodes that wouldn't also be true for a single node.
The key is to make sure your cluster balances users appropriately. Suppose you have a limit of, e.g., 5 database connections per node, but 100 users end up on one node while each of the other nodes has only 5 users. In that case, the 100 users on the popular node have to share those 5 connections, while on the other nodes each user gets a connection all to themselves.
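The per-node sizing above is simple arithmetic. This sketch assumes the database should see no more than the 50 connections mentioned in the question in total, and splits that budget evenly across nodes; the function names are hypothetical. It also reproduces the skew example: 100 users sharing 5 connections means 20 users per connection, versus 1 on a lightly loaded node.

```python
def per_node_pool_size(db_connection_budget: int, nodes: int) -> int:
    """Split a fixed database connection budget evenly across cluster nodes."""
    if nodes <= 0:
        raise ValueError("nodes must be positive")
    # Floor division: the total across all nodes never exceeds the budget.
    return db_connection_budget // nodes


def users_per_connection(users_on_node: int, pool_size: int) -> float:
    """Average number of concurrent users competing for each pooled connection."""
    return users_on_node / pool_size


if __name__ == "__main__":
    budget = 50  # the per-DataSource limit mentioned in the question
    for nodes in (1, 2, 3, 4):
        size = per_node_pool_size(budget, nodes)
        print(f"{nodes} node(s): {size} connections each, {size * nodes} total")
    # Skewed balancing from the answer, with 5 connections per node.
    print(users_per_connection(100, 5))  # popular node: 20 users per connection
    print(users_per_connection(5, 5))    # quiet node: 1 user per connection
```

When nodes register dynamically, the per-node size has to be recomputed whenever the node count changes, which is the concern raised in the question.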
Back to your first item, which is more complicated. If you have a separate database per user, then connection pooling is impossible to accomplish, because you will absolutely have to establish a new connection for every user every time. Those connections aren't poolable, at least not without being quite careful about it. It sounds like you have an architectural issue that you might have to solve before you can find a technical solution to it.

Uneven cache hits

I have integrated twemproxy into the web layer, and I have 6 ElastiCache nodes (1 master, 5 read replicas). All replicas have the same keys and are otherwise identical, but one replica gets far more cache hits than the others, and I get the same result on every load test I run. A separate data engine writes to the master of this cluster, and the remaining 5 replicas sync from it. So I am using twemproxy only to read data from ElastiCache, not for sharding. My simple question is: why am I getting 90% of the hits on a single read replica? Shouldn't it distribute the hits evenly among all read replicas?
Thank you in advance
As I recall, twemproxy hashes everything. This means it will try to split keys among the masters you give it. If you have one master, it hashes everything to one server, so as far as twemproxy is concerned there is only one server eligible to answer queries. As such, it isn't helping you in this case.
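Key-hash routing is deterministic per key, which is the opposite of what read balancing needs. This toy Python model contrasts hash routing (every read of a key goes to the same server) with the round-robin routing a TCP load balancer typically provides. It uses MD5 modulo the server count and hypothetical replica names; it does not reproduce twemproxy's actual hash functions or distribution modes. With a load test dominated by one hot key, hash routing sends at least 90% of reads to a single replica, while round-robin spreads them evenly.

```python
import hashlib
from collections import Counter
from itertools import cycle
from typing import Iterable, List

REPLICAS = ["replica-1", "replica-2", "replica-3", "replica-4", "replica-5"]


def hash_route(key: str, servers: List[str]) -> str:
    """Pick a server from the key alone: the same key always hits the same server."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return servers[int.from_bytes(digest[:4], "big") % len(servers)]


def simulate_hash_routing(keys: Iterable[str], servers: List[str]) -> Counter:
    return Counter(hash_route(k, servers) for k in keys)


def simulate_round_robin(keys: Iterable[str], servers: List[str]) -> Counter:
    """Ignore the key and rotate through the servers, like a round-robin balancer."""
    rotation = cycle(servers)
    return Counter(next(rotation) for _ in keys)


if __name__ == "__main__":
    # Hypothetical load test: 900 reads of one hot key plus 100 distinct keys.
    reads = ["user:42"] * 900 + [f"key:{i}" for i in range(100)]
    print("hash routing:", simulate_hash_routing(reads, REPLICAS))
    print("round robin: ", simulate_round_robin(reads, REPLICAS))
```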
If you want a single endpoint that distributes reads across a bank of identical slaves, you will need to put a TCP load balancer in front of the slaves and have your application talk to the load balancer's IP:port. Common software options are Nginx and HAProxy. On AWS you could use their load balancer, but you could run into various resource limits outside your control there. Pretty much any hardware load balancer would work as well, though that is difficult if not impossible on AWS.
Which load balancer to use is dependent on your (or your personnel's) comfort and knowledge level with each option.

Couchbase node failure

My understanding could be amiss here. As I understand it, Couchbase uses a smart client to automatically select which node to write to or read from in a cluster. What I DON'T understand is, when this data is written/read, is it also immediately written to all other nodes? If so, in the event of a node failure, how does Couchbase know to use a different node from the one that was 'marked as the master' for the current operation/key? Do you lose data in the event that one of your nodes fails?
This sentence from the Couchbase Server Manual gives me the impression that you do lose data (which would make Couchbase unsuitable for high availability requirements):
"With fewer larger nodes, in case of a node failure the impact to the application will be greater"
Thank you in advance for your time :)
By default, when data is written to Couchbase, the client returns success as soon as the data is written to one node's memory. After that, Couchbase saves it to disk and replicates it.
If you want to ensure that data is persisted to disk, most client libraries have functions that let you do that. With these functions you can also ensure that data has been replicated to another node. This function is called observe.
When a node goes down, it should be failed over. Couchbase Server can do that automatically when the auto-failover timeout is set in the server settings. For example, if you have a 3-node cluster, stored data has 2 replicas, and one node goes down, you won't lose data. If a second node fails, you still won't lose all the data: it will be available on the last node.
If the node that was the master goes down and is failed over, another live node becomes the master. In your client you point to all servers in the cluster, so if it is unable to retrieve data from one node, it tries to get it from another.
Also, if you have 2 nodes at your disposal, you can install 2 separate Couchbase servers, configure XDCR (cross datacenter replication), and check server availability yourself with HA proxies or something similar. That way you get a single IP to connect to (the proxy's IP), which automatically gets data from a live server.
Fortunately, Couchbase is a good fit for HA systems.
Let me explain in a few sentences how it works. Suppose you have a 5-node cluster. The application, using the Client API/SDK, is always aware of the cluster topology (and of any change to it).
When you set/get a document in the cluster, the Client API uses the same algorithm as the server to choose the node the document should be written to. The client selects the node using a CRC32 hash and writes to that node. Then the cluster asynchronously copies 1 or more replicas to other nodes (depending on your configuration).
Couchbase has only 1 active copy of a document at a time, which makes it easy to stay consistent. The applications get and set from this active copy.
In case of failure, the server has some work to do. Once the failure is discovered (automatically or by a monitoring system), a "fail over" occurs: the replicas are promoted to active, and it is now possible to work as before. Usually you then run a rebalance to balance the cluster properly.
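The flow described above (hash the key, find the node holding the single active copy, promote a replica on failover) can be sketched in Python. This is a simplified model, not Couchbase client code: the 1024 vBucket count, the CRC32 bit selection, the round-robin vBucket layout, and the single replica per vBucket are all assumptions for illustration. In a real cluster the server supplies the vBucket map to the client.

```python
import zlib
from dataclasses import dataclass
from typing import List, Optional

NUM_VBUCKETS = 1024  # assumption: a common vBucket count


def vbucket_for(key: str) -> int:
    """Map a key to a vBucket with CRC32 (assumed bit selection, see lead-in)."""
    crc = zlib.crc32(key.encode("utf-8"))
    return ((crc >> 16) & 0x7FFF) % NUM_VBUCKETS


@dataclass
class VBucketMap:
    active: List[str]             # active[vb]: node holding the only active copy
    replica: List[Optional[str]]  # replica[vb]: node holding a replica, if any

    @classmethod
    def build(cls, nodes: List[str]) -> "VBucketMap":
        # Hypothetical layout: vBuckets round-robin, replica on the next node.
        n = len(nodes)
        active = [nodes[vb % n] for vb in range(NUM_VBUCKETS)]
        replica: List[Optional[str]] = [nodes[(vb + 1) % n] for vb in range(NUM_VBUCKETS)]
        return cls(active, replica)

    def node_for(self, key: str) -> str:
        # Every client computes the same vBucket, so all agree on the target node.
        return self.active[vbucket_for(key)]

    def fail_over(self, dead: str) -> None:
        for vb in range(NUM_VBUCKETS):
            if self.active[vb] == dead:
                promoted = self.replica[vb]
                if promoted is None:
                    raise RuntimeError(f"vBucket {vb} has no replica left")
                self.active[vb] = promoted  # replica becomes the active copy
                self.replica[vb] = None     # no replica until a rebalance
            elif self.replica[vb] == dead:
                self.replica[vb] = None


if __name__ == "__main__":
    cluster = VBucketMap.build([f"node-{i}" for i in range(5)])
    owner = cluster.node_for("doc::1")
    cluster.fail_over(owner)
    print(owner, "->", cluster.node_for("doc::1"))
```

The `None` replicas left after a failover are what the rebalance step is meant to restore.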
The sentence you quoted simply says that the fewer nodes you have, the bigger the impact of a failure/rebalance, since you will have to route the same number of requests to a smaller number of nodes. Fortunately, you do not lose data ;)
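The point about fewer nodes is plain arithmetic: if N nodes share the load evenly and one fails, each survivor carries N/(N-1) of its previous load. This short sketch, with hypothetical function names, computes it: losing 1 of 2 nodes doubles the load on the survivor (+100%), while losing 1 of 5 nodes adds 25%.

```python
def load_per_survivor(total_requests: float, nodes: int, failed: int = 1) -> float:
    """Requests each remaining node must absorb after `failed` nodes drop out."""
    survivors = nodes - failed
    if survivors <= 0:
        raise ValueError("at least one node must survive")
    return total_requests / survivors


def load_increase_pct(nodes: int, failed: int = 1) -> float:
    """Percentage increase in per-node load caused by the failure."""
    if nodes - failed <= 0:
        raise ValueError("at least one node must survive")
    return (nodes / (nodes - failed) - 1) * 100


if __name__ == "__main__":
    for n in (2, 3, 5, 10):
        print(f"{n} nodes: +{load_increase_pct(n):.0f}% per surviving node")
```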
You can find very detailed information about how this works on the blog of Couchbase's CTO:
http://damienkatz.net/2013/05/dynamo_sure_works_hard.html
Note: I am working as developer evangelist at Couchbase