How can I configure Apache Flume to listen to multiple HTTP sources in a cluster with multiple flume agents?
My flume agent is configured as follows:
agent1.sources.httpSource_1.type = http
...
agent1.sources.httpSource_1.port = 8081
agent1.sources.httpSource_2.type = http
...
agent1.sources.httpSource_2.port = 8082
agent1.sources.httpSource_3.type = http
...
agent1.sources.httpSource_3.port = 8083
Let's assume I have 5 servers in my cluster. Which address should I send my REST or HTTP POST message to in order to reach all 5 of my servers?
For example, if I send an HTTP POST message to <server_dns_1>:8081, then only agent1 will process it, if I understand correctly.
How can I use all of my cluster servers, and which address should I send my HTTP requests to?
Cantroid, the way you have configured Flume, only one agent (agent1) will run. This agent will internally run 5 listening threads, one per configured HTTP source.
That said, there is no way a single HTTP POST can reach all 5 listening threads (or all 5 agents, if you finally split your single agent into 5), unless you use load-balancing software or some kind of "broadcasting" magic at the network level (I'm not an expert on that).
Nevertheless, if the reason for having 5 listening ports is that you want to perform 5 different data treatments, then you can create a single agent listening on a single HTTP port and create 5 different channels, each drained by a different sink. The key point of this architecture is that the default channel selector is the replicating one, i.e. the single listening source puts a copy of each event into all 5 channels. A sketch of that configuration is shown below.
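A minimal sketch of that fan-out configuration, assuming the agent is named agent1 and using memory channels and logger sinks purely as placeholders (only the first channel/sink pair is spelled out):
agent1.sources = httpSource
agent1.channels = ch1 ch2 ch3 ch4 ch5
agent1.sinks = sink1 sink2 sink3 sink4 sink5
# One HTTP source listening on a single port
agent1.sources.httpSource.type = http
agent1.sources.httpSource.port = 8081
# Replicating is the default selector; shown explicitly for clarity
agent1.sources.httpSource.selector.type = replicating
agent1.sources.httpSource.channels = ch1 ch2 ch3 ch4 ch5
# First channel/sink pair; repeat for ch2..ch5 and sink2..sink5
agent1.channels.ch1.type = memory
agent1.sinks.sink1.type = logger
agent1.sinks.sink1.channel = ch1
Each sink then applies its own data treatment to its private copy of the event stream.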
HTTP/2 has this multiplexing feature.
From this answer we get that:
Put simply, multiplexing allows your Browser to fire off multiple requests at once on the same connection and receive the requests back in any order.
Let's say I split my app into 50 small bundled files to take advantage of the multiplexed communication.
My server is an Express app hosted on a Cloud Run instance.
Here is what Cloud Run says about concurrency:
By default Cloud Run container instances can receive many requests at the same time (up to a maximum of 250).
So, if 5 users hit my app at the same time, does it mean that my instance will be maxed out for a brief moment?
Because each browser (from the 5 users) will make 50 requests (for the 50 small bundled files), resulting in a total of 250.
Does the fact that multiplexed traffic occurs over the same connection change anything? How does it work?
Does it mean that Cloud Run will perceive 5 connections while my Express server perceives 250 requests? I think I'm confused about what "request" means from these 2 perspectives (the Cloud Run instance and the Express server).
A "request" is :
the establishment of the connexion between the server and the client (the browser here)
The data transfert
The connexion close.
With streaming capacity of HTTP2 and websocket, the connexion can takes minutes (and up to 1 hour) and you can send data through the channel as you want. 1 connexion = 1 request, 5 connexions = 5 requests.
But keep in mind that keeping this connexion open and processing data in it consume resources on your backend and you can't have dozens of connexion that actively send/receive data, you will saturate your instance.
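To make the multiplexing behavior concrete, here is a minimal sketch using Java's built-in HTTP client (Java 11+); the URL pattern and the count of 50 requests are illustrative. All requests are fired concurrently and, when the server supports HTTP/2, they are multiplexed as streams over a single connection:
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class MultiplexDemo {
    public static void main(String[] args) {
        // Prefer HTTP/2; the client falls back to HTTP/1.1 if the server lacks support.
        HttpClient client = HttpClient.newBuilder()
                .version(HttpClient.Version.HTTP_2)
                .build();

        // Fire 50 requests at once; over HTTP/2 they share one TCP connection.
        List<CompletableFuture<HttpResponse<String>>> futures = IntStream.range(0, 50)
                .mapToObj(i -> HttpRequest.newBuilder(
                        URI.create("https://example.com/bundle-" + i + ".js")).build())
                .map(req -> client.sendAsync(req, HttpResponse.BodyHandlers.ofString()))
                .collect(Collectors.toList());

        // Responses can complete in any order; join() waits for all of them.
        futures.forEach(CompletableFuture::join);
    }
}
From the server's point of view those are still 50 separate requests; only the connection count drops to 1.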
I'm facing difficulty finding a solution where my ActiveMQ listener code listens for messages from multiple brokers. For example: we have 4 brokers (1, 2, 3, 4) which serve messages to consumers hosted on 4 servers (A, B, C, D). ConsumerA should listen for response messages from brokers 1, 2, 3 & 4. If it finds a message, consumerA should pick it up and process it. If consumerA is down for any reason, consumerB should listen to all 4 brokers.
Configuring the failover transport in the following way doesn't help me achieve the above design.
activemq.broker.url=failover:(tcp://localhost:61716,tcp://localhost:61717,tcp://localhost:61718,tcp://localhost:61719)?randomize=false&timeout=5000&maxReconnectAttempts=3
With the above URI configuration my listener code only listens to the broker on port 61716, and if a message is available on another broker, say on port 61717, it is not able to pick it up and process it. Any help would be really appreciated.
P.S.: Is there any example of one consumer listening to multiple brokers at the same time?
As I wasn't finding a solution from ActiveMQ for one consumer listening to multiple brokers, we implemented a workaround: we create multiple listener beans, each pointing to one specific broker URL. That way we point to 4 URLs from the same server and from the same listener configuration file; a sketch is shown below.
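A minimal sketch of that multiple-bean approach using Spring's DefaultMessageListenerContainer; the queue name is illustrative and only two of the four brokers are spelled out:
import javax.jms.MessageListener;
import org.apache.activemq.ActiveMQConnectionFactory;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.jms.listener.DefaultMessageListenerContainer;

@Configuration
public class MultiBrokerListenerConfig {

    // The same listener logic is shared by all containers.
    private final MessageListener listener = message -> {
        // process the message here
    };

    // One container per broker; repeat for ports 61718 and 61719.
    @Bean
    public DefaultMessageListenerContainer brokerOneContainer() {
        return container("tcp://localhost:61716");
    }

    @Bean
    public DefaultMessageListenerContainer brokerTwoContainer() {
        return container("tcp://localhost:61717");
    }

    private DefaultMessageListenerContainer container(String brokerUrl) {
        DefaultMessageListenerContainer c = new DefaultMessageListenerContainer();
        c.setConnectionFactory(new ActiveMQConnectionFactory(brokerUrl));
        c.setDestinationName("RESPONSE.QUEUE"); // illustrative queue name
        c.setMessageListener(listener);
        return c;
    }
}
Each container maintains its own connection, so a message landing on any of the brokers is picked up by the same consumer logic.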
We are working on a Mule HA cluster PoC with two separate server nodes. We were able to create a cluster. We developed a small dummy application with an HTTP endpoint and a reliability-pattern implementation, which loops for a period and prints a value. When we deploy the application to the Mule HA cluster, it deploys successfully and an application log file is generated on both servers, but it runs on only one server. In the application we can point the HTTP endpoint to only one server IP. Could anyone please clarify the following queries?
1) In our case, why is the application running on only one server (whichever server's IP the endpoint points to is the one that executes)?
2) Will the Mule HA cluster create a virtual IP?
3) If not, which IP do we need to configure in the application for HTTP endpoints?
4) Do we need a load balancer for HTTP-based endpoint requests? If so, which IP needs to be configured for the HTTP endpoint in the application, given that we don't have a virtual IP for the Mule HA cluster?
Really appreciate any help on this.
Environment: Mule EE ESB v3.4.2 & private cloud.
1) You are seeing one server processing requests because you are sending them to the same server each time.
2) Mule HA will not create a virtual IP.
3/4) You need to place a load balancer in front of the Mule nodes in order to distribute the load when using HTTP inbound endpoints. You do not need to decide which IP to place in the HTTP connector within the application; the load balancer will route each request to one of the nodes.
Creating a Mule cluster just allows your Mule applications to share information through shared memory (the VM transport and the Object Store) and makes polling endpoints poll from only a single node. In the case of HTTP, the application will listen on each of the nodes, but you need to put a load balancer in front of your Mule nodes to distribute the load; a sketch is shown below. I recommend you read the High Availability documentation. But the more important question is: why do you need a cluster at all? You could have two separate Mule servers with your application deployed on both and have a load balancer send requests to them.
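As a concrete illustration, a minimal nginx load-balancer sketch; the node addresses and port are hypothetical placeholders for the two Mule nodes:
upstream mule_nodes {
    # Hypothetical addresses of the two Mule HA nodes
    server 10.0.0.11:8081;
    server 10.0.0.12:8081;
}

server {
    listen 80;

    location / {
        # Round-robin by default: requests alternate across the Mule nodes
        proxy_pass http://mule_nodes;
    }
}
Clients then send their HTTP requests to the load balancer's address rather than to either node's IP.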
I have an HTTP server application that receives requests from HTTP clients and puts them into a Redis list for processing. Another process listening on this list picks up the requests, processes them, and finally puts the response into another Redis queue to be consumed by the HTTP server.
The sequence is like this:
(1) Http Client ==> Web app
(2) Web App ==> Redis Request Queue (List data structure)
(3) Processor ==> consumes requests using multiple threads and processes them
(4) Processor ==> puts to a Redis Response Queue (List data structure)
(5) Web App ==> has to pick the response from the response queue and deliver it to HTTP clients
Given the above scenario, if multiple threads on the HTTP server are queueing the messages to Redis, is there any established pattern to ensure responses are correctly picked up and sent back to the HTTP clients using the right session?
I am planning to use Redis (or maybe RabbitMQ or ZeroMQ) for the producer/consumer because I want to scale horizontally and configure many consumers spread across several nodes.
Thanks for pointing me to a reasonable approach on this.
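One established pattern for this is a correlation ID with a per-request reply key: the web app tags each request with a unique ID and blocks on a reply list dedicated to that ID, so each response reaches exactly the thread (and hence the HTTP client) that issued the request. A minimal sketch using the Jedis client; the key names and the 30-second timeout are illustrative:
import java.util.List;
import java.util.UUID;
import redis.clients.jedis.Jedis;

public class RedisRpcClient {

    // Called from an HTTP handler thread; blocks until the processor replies.
    public String call(String requestBody) {
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            String correlationId = UUID.randomUUID().toString();

            // Push the request, tagged with its correlation ID, onto the shared queue.
            jedis.lpush("request.queue", correlationId + "|" + requestBody);

            // Block (up to 30s) on a reply key unique to this request.
            // The processor must push its response to "reply." + correlationId.
            List<String> reply = jedis.brpop(30, "reply." + correlationId);
            return reply == null ? null : reply.get(1); // brpop returns [key, value]
        }
    }
}
Because the reply key is unique per request, any number of web-server threads and processor nodes can run concurrently without responses getting crossed.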
I have 3 servers in an NLB cluster and deployed a stateless, heavy-running WCF service to stress the cluster.
The configuration is: Port 80, Multicast, Affinity: None.
Then I fire 32 requests IN PARALLEL from my workstation at the cluster; the total time to complete the 32 requests is about 35 seconds.
I tried turning 2 servers off and running the test again, and the final result was also about... 35 seconds!?
Looking at Task Manager on the 3 servers at the same time, I noticed that the requests were processed sequentially: while 1 server was processing a request, the other 2 were idle. I thought the requests would be processed in parallel across all 3 servers.
I cannot figure out what happened. Did I configure them wrong?
Does anyone have an explanation for this?
According to the NLB documentation:
If your Affinity is set to Single or Class C, requests from a single IP should be routed to a single host in the cluster. But if you have Affinity set to None, requests should be distributed across all the hosts.
In reality, I see the same problem you are seeing. We have multiple servers in a cluster with affinity set to None, and all requests from a single IP still get routed to the same host. I am still looking for answers.
Finally, I found the answer myself.
The NLB cluster decides which host serves a request based on the client's IP; at any given time only 1 server is assigned to serve all the requests from 1 client IP.
All my requests were sent in parallel but from only 1 IP; therefore, only 1 server was assigned to serve all of them.
When I sent requests from 2 or 3 clients, I saw the other servers begin working in parallel.
That is just the way NLB works.