NLB Concurrency - WCF

I have 3 servers in an NLB cluster and have deployed a stateless, heavy WCF service to stress-test the cluster.
The configuration is: Port 80, Multicast, Affinity: None.
Then I fire 32 requests in parallel from my workstation at the cluster; the total time to complete the 32 requests is about 35 seconds.
I tried turning 2 of the servers off and running the test again, and the result was also about... 35 seconds!
Watching Task Manager on all 3 servers at the same time, I noticed that the requests were processed sequentially: while 1 server was processing a request, the other 2 sat idle. I thought the requests would be processed in parallel on all 3 servers.
I cannot figure out what happened. Did I configure them wrong?
Does anyone have an explanation for this?

According to the NLB documentation:
If your Affinity is set to Single or Class C, requests from a single IP should be routed to a single host in the cluster. But if Affinity is set to None, requests should be distributed across all the hosts.
In reality, I see the same problem you are seeing. We have multiple servers in a cluster with affinity set to None, and all requests from a single IP still get routed to the same host. I am still looking for answers.

Finally, I found the answer myself.
The NLB cluster decides which host serves a request based on the client's IP; at any given time, only 1 server is assigned to serve all the requests coming from 1 client IP.
All my requests were sent in parallel but from only 1 IP; therefore, only 1 server was assigned to serve all of them.
When I send requests from 2 or 3 clients, I can see the other servers start working in parallel.
That is just the way NLB works.
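As a rough illustration, here is a small TypeScript sketch of the kind of parallel test described above (the cluster URL and payload are hypothetical placeholders, not the actual service); since every request leaves the same workstation, they all share one source IP, and per the explanation above NLB keeps sending them to the same host. To exercise all hosts, the same script would need to run from several client machines.
// Hypothetical client-side test: 32 parallel calls to the cluster's virtual IP/DNS name.
const CLUSTER_URL = "http://nlb-cluster.example.com/service"; // placeholder

async function callService(i: number): Promise<number> {
  const started = Date.now();
  // Placeholder request standing in for the real WCF call.
  await fetch(CLUSTER_URL, { method: "POST", body: JSON.stringify({ job: i }) });
  return Date.now() - started;
}

async function main() {
  // 32 parallel requests, all leaving this workstation from the same source IP.
  const durations = await Promise.all(Array.from({ length: 32 }, (_, i) => callService(i)));
  console.log("per-request duration (ms):", durations);
}

main().catch(console.error);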

Related

How do Cloud Run instances perceive multiple requests from HTTP2 connections?

HTTP/2 has this multiplexing feature.
From this answer we get that:
Put simply, multiplexing allows your Browser to fire off multiple requests at once on the same connection and receive the requests back in any order.
Let's say I split my app into 50 small bundled files, to take advantage of the multiplex communication.
My server is an express app hosted in a Cloud Run instance.
Here is what Cloud Run says about concurrency:
By default Cloud Run container instances can receive many requests at the same time (up to a maximum of 250).
So, if 5 users hit my app at the same time, does it mean that my instance will be max'ed out for a brief moment?
Because each browser (from the 5 users) will make 50 requests (for the 50 small bundled files), resulting in a total of 250.
Does the fact that the multiplexed traffic occurs over the same connection change anything? How does it work?
Does it mean that my Cloud Run instance will perceive 5 connections while my Express server perceives 250 requests? I think I'm confused about what a "request" means from these 2 perspectives (the Cloud Run instance and the Express server).
A "request" is :
the establishment of the connexion between the server and the client (the browser here)
The data transfert
The connexion close.
With streaming capacity of HTTP2 and websocket, the connexion can takes minutes (and up to 1 hour) and you can send data through the channel as you want. 1 connexion = 1 request, 5 connexions = 5 requests.
But keep in mind that keeping this connexion open and processing data in it consume resources on your backend and you can't have dozens of connexion that actively send/receive data, you will saturate your instance.
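If you want to observe what your Express app actually sees, a small in-flight counter gives a rough picture you can compare against Cloud Run's reported concurrency; this is only a sketch, with a placeholder route standing in for your bundled files (Cloud Run does inject the PORT environment variable).
// Minimal sketch: count requests that are being handled at the same time by this process.
import express from "express";

const app = express();
let inFlight = 0;

app.use((req, res, next) => {
  inFlight++;
  console.log(`in-flight requests: ${inFlight}`);
  res.on("finish", () => { inFlight--; });
  next();
});

// Placeholder route standing in for one of the 50 small bundled files.
app.get("/bundle/:name", (req, res) => {
  res.type("application/javascript").send(`// contents of ${req.params.name}`);
});

const port = Number(process.env.PORT) || 8080; // Cloud Run sets PORT
app.listen(port, () => console.log(`listening on ${port}`));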

How to scale out haproxy node itself?

I have one application with the architecture below:
users ----> HAProxy load balancer (TCP connections) -> Application server 1
                                                    -> Application server 2
                                                    -> Application server 3
Now I am able to scale the application servers out and in. But due to the high number of TCP connections (around 10,000), my HAProxy load balancer node's RAM gets full, so I am not able to accept any more TCP connections. So I need to add one more node for HAProxy itself. My question is: how do I scale out the HAProxy node itself, or is there any other solution for this?
I am deploying the application on AWS. Any solution is highly appreciated.
Thanks,
Mayank
Use AWS Route53 and create CNAMEs that point to your haproxy instances.
For example:
Create a CNAME haproxy.example.com pointing to haproxy instance1.
Create a second CNAME haproxy.example.com pointing to haproxy instance2.
Both of your CNAMEs should use some sort of routing policy. The simplest is probably round robin, which simply rotates over your list of CNAMEs: when you look up haproxy.example.com you will get the addresses of both instances, and the order of the IPs returned will change with each request. This will distribute load evenly between your two instances.
There are many more options depending on your needs: http://docs.aws.amazon.com/Route53/latest/DeveloperGuide/routing-policy.html
Overall this is pretty easy to configure.
A few side notes: you can also set up health checks and route traffic to the remaining healthy instance(s) if needed. If you're running at capacity with two instances, you might want to add a few more to be able to cope with an instance failing.
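If you prefer to script the DNS setup rather than use the console, a rough sketch with the AWS SDK for JavaScript (v3) might look like this; the region, hosted zone ID, record name, and HAProxy hostnames are placeholders, and it uses weighted records with equal weights to approximate the even rotation described above.
import { Route53Client, ChangeResourceRecordSetsCommand } from "@aws-sdk/client-route-53";

const client = new Route53Client({ region: "us-east-1" }); // placeholder region

// One weighted CNAME record per HAProxy instance; SetIdentifier distinguishes the
// records that share the same name, and equal weights split traffic evenly.
const record = (setId: string, target: string) => ({
  Action: "UPSERT" as const,
  ResourceRecordSet: {
    Name: "haproxy.example.com",
    Type: "CNAME" as const,
    SetIdentifier: setId,
    Weight: 50,
    TTL: 60,
    ResourceRecords: [{ Value: target }],
  },
});

async function main() {
  await client.send(new ChangeResourceRecordSetsCommand({
    HostedZoneId: "Z0000000000000", // placeholder hosted zone ID
    ChangeBatch: {
      Changes: [
        record("haproxy-1", "haproxy1.example.internal"), // placeholder hostnames
        record("haproxy-2", "haproxy2.example.internal"),
      ],
    },
  }));
}

main().catch(console.error);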

Kubernetes cluster internal load balancing

Playing a bit with Kubernetes (v1.3.2), I'm checking its ability to load-balance calls inside the cluster (3 on-premise CentOS 7 VMs).
If I understand the 'Virtual IPs and service proxies' section of http://kubernetes.io/docs/user-guide/services/ correctly, and as I see in my tests, the load balancing is per node (VM). I.e., if I have a cluster of 3 VMs and deploy a service with 6 pods (2 per VM), the load balancing will only be between the pods on the same VM, which is somewhat disappointing.
At least this is what I see in my tests: calling the service from within the cluster using the service's ClusterIP load-balances only between the 2 pods that reside on the same VM the call was sent from.
(BTW, the same goes when calling the service from outside the cluster (using NodePort): the request load-balances between the 2 pods that reside on the VM whose IP address was the request's target.)
Is the above correct?
If yes, how can I make internal cluster calls load-balance across all 6 replicas? (Must I employ a load balancer like nginx for this?)
No, the statement is not correct. The load balancing should be across nodes (VMs). This demo demonstrates it: I ran it on a k8s cluster with 3 nodes on GCE. It first creates a service with 5 backend pods, then SSHes into one GCE node and hits the service's ClusterIP, and the traffic is load-balanced to all 5 pods.
I see you have another question, "not unique ip per pod", open; it seems you hadn't set up your cluster network properly, which might have caused what you observed.
In your case, each node will be running a copy of the service and will load-balance across the nodes.
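One way to check this yourself, along the lines of that demo, is to hit the ClusterIP repeatedly from inside the cluster and tally which backend answered. The sketch below assumes a placeholder ClusterIP:port and that each pod's response identifies the pod (e.g. an echo server returning its hostname); neither is from the question.
const SERVICE_URL = "http://10.0.0.10:8080/"; // placeholder ClusterIP:port of the service

async function main() {
  const hits = new Map<string, number>();
  for (let i = 0; i < 100; i++) {
    // Assumes the backend's response body identifies the pod that answered.
    const who = (await (await fetch(SERVICE_URL)).text()).trim();
    hits.set(who, (hits.get(who) ?? 0) + 1);
  }
  // If load balancing crosses nodes, you should see all replicas here, not just the local ones.
  console.log(Object.fromEntries(hits));
}

main().catch(console.error);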

Using Load Balancers To Terminate TLS For TCP/IP Connections

I am writing a TCP/IP server that handles persistent connections. I'll be using TLS to secure the communication and have a question about how to do this:
Currently I have a load balancer (AWS ELB) in front of a single server. In order for the load balancer to do the TLS termination for the duration of the connection it must hold on to the connection and forward the plain text to the application behind it.
client ---tls---> Load Balancer ---plain text---> App Server
This works great. Yay! My concern is that I'll need a load balancer in front of every app server because, presumably, the number of connections the load balancer can handle is the same as the number of connections the app server can handle (assuming the same OS and NIC). This means that if I had 1 load balancer and 2 app servers, I could wind up in a situation where the load balancer is at full capacity while each app server is at half capacity. To avoid this problem I'd have to create a 1-to-1 relationship between load balancers and app servers.
I'd prefer the app server not to have to do the TLS termination because, well, why reinvent the wheel? Are there better methods than a 1-to-1 relationship between load balancer and app server to avoid the capacity issue mentioned above?
There are two probable flaws in your presumption.
The first is the assumption that your application server will experience the same amount of load for a given number of connections as the load balancer. Unless your application server is extremely well written, it seems reasonable that it would run out of CPU or memory, or encounter other scaling issues, before it reached the theoretical maximum of ~64K concurrent connections IPv4 can handle on a given IP address. If that's really true, then great: well done.
The second issue is that a single load balancer from ELB is not necessarily a single machine. A single ELB launches a hidden virtual machine in each availability zone where you've attached the ELB to a subnet, regardless of the number of instances attached, and the number of ELB nodes scales up automatically as load increases. (If I remember right, I've seen as many as 8 nodes running at the same time for a single ELB.) Presumably the instance class of those ELB nodes could change, too, but that's not a facet that's well documented. There's no separate charge for these machines, as they are included in the ELB price, so as they scale up, the monthly cost for the ELB doesn't change... but provisioning qty = 1 ELB does not mean you get only 1 ELB node.
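A quick way to see that an ELB is more than one machine is to resolve its DNS name and count how many addresses come back, typically one or more per availability zone in use; a small sketch, with a placeholder ELB hostname that is not from the question:
import { resolve4 } from "node:dns/promises";

const ELB_HOSTNAME = "my-elb-1234567890.us-east-1.elb.amazonaws.com"; // placeholder ELB DNS name

async function main() {
  // Each returned address corresponds to an ELB node currently serving the load balancer.
  const addresses = await resolve4(ELB_HOSTNAME);
  console.log(`${addresses.length} ELB node address(es):`, addresses);
}

main().catch(console.error);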

Configuring Flume with multiple HTTP sources in a cluster

How can I configure Apache Flume to listen on multiple HTTP sources in a cluster with multiple Flume agents?
My flume agent is configured as follows:
agent1.sources.httpSource_1.type = http
...
agent1.sources.httpSource_1.port = 8081
agent1.sources.httpSource_2.type = http
...
agent1.sources.httpSource_2.port = 8082
agent1.sources.httpSource_3.type = http
...
agent1.sources.httpSource_3.port = 8083
Let's assume I have 5 servers in my cluster. Which address should I send my REST/POST HTTP message to in order to reach all 5 of my servers?
For example, if I send an HTTP POST message to <server_dns_1>:8081, then only agent1 will process it, if I understand correctly.
How can I use all of my cluster servers, and which address should I send my HTTP requests to?
Cantroid, the way you have configured Flume, only one agent (agent1) will be run. This agent will internally run 5 listening threads.
That being said, there is no way a single HTTP POST can send a message to all 5 listening threads (or 5 agents, if you finally split your single agent into 5), unless you use some load-balancing software or some "broadcasting" magic at the network level (I'm not an expert on that).
Nevertheless, if the reason for having 5 listening ports is that you want to perform 5 different data treatments, then you can create a single agent listening on a single HTTP port and then create 5 different channels, each drained by a different sink. The key point in this architecture is that the default channel selector is the replicating one, i.e. a copy of the same event will be put into all 5 channels by the single listening source.
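For example, a single-source layout along those lines might look like this (channel and sink names, the memory channels, and the logger sinks are illustrative placeholders; only 3 channels are shown for brevity, and you would add more channels and sinks the same way for 5 treatments):
agent1.sources = httpSource_1
agent1.channels = ch1 ch2 ch3
agent1.sinks = sink1 sink2 sink3

agent1.sources.httpSource_1.type = http
agent1.sources.httpSource_1.port = 8081
# replicating is the default selector; stated explicitly for clarity
agent1.sources.httpSource_1.selector.type = replicating
agent1.sources.httpSource_1.channels = ch1 ch2 ch3

agent1.channels.ch1.type = memory
agent1.channels.ch2.type = memory
agent1.channels.ch3.type = memory

# each sink applies its own treatment; logger sinks are placeholders
agent1.sinks.sink1.type = logger
agent1.sinks.sink1.channel = ch1
agent1.sinks.sink2.type = logger
agent1.sinks.sink2.channel = ch2
agent1.sinks.sink3.type = logger
agent1.sinks.sink3.channel = ch3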