How many boost::asio::ip::tcp::acceptor objects do I need to accept connections in multiple threads?

I use a pool of ten threads for acceptors (listeners), which accept connections and hand off their processing to tasks in other threads. In this case, do I need one acceptor object or several?
{
    namespace ba = boost::asio;

    ba::io_service io_service_acceptors;
    ba::io_service::work work_acceptors(io_service_acceptors);

    // Do I need a single acceptor, or one per thread, like "thr_grp_acceptors"?
    // std::vector<ba::ip::tcp::acceptor> acpt_grp_acceptors;
    ba::ip::tcp::acceptor acceptor(io_service_acceptors,
        ba::ip::tcp::endpoint(ba::ip::tcp::v4(), port));

    std::vector<boost::thread> thr_grp_acceptors;
    for (size_t i = 0; i < 10; ++i)
        thr_grp_acceptors.emplace_back(
            boost::bind(&ba::io_service::run, &io_service_acceptors));

    acceptor.async_accept(...);
}

Your 2nd and subsequent acceptors will not be able to bind() to the port, so no, you cannot use many acceptors on the same port. You can, however, have many async_accept operations outstanding at a time on a single acceptor, and their handlers can fire on different threads. That said, in my application I have only one active async_accept at a time, and it is enough even under heavy load.
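To illustrate the second point, here is a minimal sketch: a single acceptor with several outstanding async_accept operations, whose handlers fire on whichever thread happens to be running io_service (the port and handler body are placeholders):

#include <boost/asio.hpp>
#include <boost/bind.hpp>
#include <boost/thread.hpp>
#include <memory>
#include <vector>

namespace ba = boost::asio;

void start_accept(ba::ip::tcp::acceptor& acceptor) {
    auto socket = std::make_shared<ba::ip::tcp::socket>(acceptor.get_io_service());
    acceptor.async_accept(*socket,
        [&acceptor, socket](const boost::system::error_code& ec) {
            if (!ec) {
                // hand the connected socket off to the worker threads here
            }
            start_accept(acceptor); // re-arm so accepts keep flowing
        });
}

int main() {
    ba::io_service io_service;
    ba::ip::tcp::acceptor acceptor(io_service,
        ba::ip::tcp::endpoint(ba::ip::tcp::v4(), 8080));

    start_accept(acceptor); // one pending accept is usually enough...
    start_accept(acceptor); // ...but several can be outstanding at once

    std::vector<boost::thread> threads;
    for (std::size_t i = 0; i < 10; ++i)
        threads.emplace_back(boost::bind(&ba::io_service::run, &io_service));
    for (std::size_t i = 0; i < threads.size(); ++i)
        threads[i].join();
}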

Related

Call `recvfrom` multiple times from different tasks on the same server socket

I am developing a system in which I have to communicate with 18 different subsystems.
All 18 subsystems are UDP clients; I have created a UDP server.
I'm using recvfrom to receive data from these 18 subsystems.
char buf[1000];
int buf_len = 1000;
int sockfd;
int bytes_read;
struct sockaddr_in client_addr;
socklen_t sock_addr_size = sizeof(client_addr);
// Code to create socket
// Code to configure socket
// Code to bind socket
FOREVER
{
    bytes_read = recvfrom(sockfd, (void *)buf, buf_len, 0,
                          (struct sockaddr *)&client_addr, &sock_addr_size);
    // Spawn new task to process data
}
I have three options to process the received data:
1. Process the data immediately after receiving it. This approach is not feasible, as it would increase the latency of message processing and the system would lose its deterministic hard real-time capabilities.
2. Spawn a new task after receiving new data. This new task would process the incoming data and forward the result to the appropriate task that consumes it.
3. Create multiple tasks, each running recvfrom on the same socket; each would process data immediately after receiving it and forward the result to the appropriate consumer task.
I am more inclined towards option 3. I wish to know whether VxWorks allows calling recvfrom multiple times from different (disjoint) tasks on the same server socket, or whether that would cause complications.
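For what it's worth, here is a minimal sketch of option 3, using POSIX threads as a stand-in for VxWorks tasks (on the target you would spawn tasks instead; names are illustrative). Each worker blocks in recvfrom() on the shared socket, and each datagram is delivered to exactly one waiting caller:

#include <pthread.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

#define NUM_WORKERS 3

static int sockfd; /* bound UDP server socket, shared by all workers */

static void *worker(void *arg)
{
    char buf[1000];
    struct sockaddr_in client_addr;
    socklen_t addr_len;

    for (;;) {
        addr_len = sizeof(client_addr);
        ssize_t n = recvfrom(sockfd, buf, sizeof(buf), 0,
                             (struct sockaddr *)&client_addr, &addr_len);
        if (n > 0) {
            /* process the datagram, then forward it to the consumer task */
        }
    }
    return NULL;
}

int main(void)
{
    /* ... create, configure, and bind sockfd as in the question ... */
    pthread_t tids[NUM_WORKERS];
    for (int i = 0; i < NUM_WORKERS; ++i)
        pthread_create(&tids[i], NULL, worker, NULL);
    for (int i = 0; i < NUM_WORKERS; ++i)
        pthread_join(tids[i], NULL);
    return 0;
}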

Why is Alamofire using dispatch_sync() when creating a dataTask?

The code below is from source code of Alamofire
let queue = dispatch_queue_create(nil, DISPATCH_QUEUE_SERIAL)
public func request(URLRequest: URLRequestConvertible) -> Request {
    var dataTask: NSURLSessionDataTask!
    dispatch_sync(queue) {
        dataTask = self.session.dataTaskWithRequest(URLRequest.URLRequest)
    }
    let request = Request(session: session, task: dataTask)
    self.delegate[request.delegate.task] = request.delegate
    if startRequestsImmediately {
        request.resume()
    }
    return request
}
It seems that every time it creates a dataTask, it dispatches that creation onto a serial queue. Does this measure protect the program from some kind of multithreading trap?
I can't figure out what the difference would be without that queue.
The reason why we implemented that check is due to Alamofire Issue #393. We were seeing duplicate task identifiers without the serial queue when creating data and upload tasks in parallel from multiple threads. It appears that Apple has a thread safety issue when incrementing the task identifiers. Therefore in Alamofire, we eliminate the issue by creating the tasks on a serial queue.
Cheers. 🍻
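Boiled down, the fix simply funnels all task creation through one serial queue. A minimal sketch of the same idea in isolation (Swift 2 era GCD API, matching the code above; names are illustrative):

let creationQueue = dispatch_queue_create("task.creation", DISPATCH_QUEUE_SERIAL)

func makeDataTask(session: NSURLSession, request: NSURLRequest) -> NSURLSessionDataTask {
    var task: NSURLSessionDataTask!
    // serialize creation so taskIdentifier values are assigned one at a time
    dispatch_sync(creationQueue) {
        task = session.dataTaskWithRequest(request)
    }
    return task
}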

Execute multiple downloads and wait for all to complete

I am currently working on an API service that allows one or more users to download one or more items from an S3 bucket and returns the contents to the user. While the downloading itself is fine, the time taken to download several files is roughly 100-150 ms multiplied by the number of files.
I have tried a few approaches to speed this up: parallelStream() instead of stream() (which, considering the number of simultaneous downloads, runs a serious risk of exhausting threads), as well as CompletableFutures, and even creating an ExecutorService, doing the downloads, then shutting down the pool. Typically I would only want a few concurrent tasks, e.g. 5 at a time per request, to try to cut down on the number of active threads.
I have tried integrating Spring @Cacheable to store the downloaded files in Redis (the files are read-only). While this certainly cuts down response times (several ms to retrieve files compared to 100-150 ms), the benefit only applies once the file has previously been retrieved.
What is the best way to handle waiting on multiple async tasks to finish and then collect the results, considering that I don't want (and probably can't) have hundreds of threads opening HTTP connections and downloading all at once?
You're right to be concerned about tying up the common fork/join pool used by default in parallel streams, since I believe it is used for other things like sort operations outside of the Stream API. Rather than saturating the common fork/join pool with an I/O-bound parallel stream, you can create your own fork/join pool for the Stream. See this question to find out how to create an ad hoc ForkJoinPool with the size you want and run a parallel stream in it.
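In outline, that approach looks like this (a minimal sketch; s3Paths, mys3Downloader, and MAX_THREADS_FOR_DOWNLOADS are placeholders from the question, and imports are omitted as in the other snippets):

ForkJoinPool downloadPool = new ForkJoinPool(MAX_THREADS_FOR_DOWNLOADS);
try {
    // a task submitted to the pool runs its parallel stream inside that pool
    List<Path> results = downloadPool.submit(() ->
            s3Paths.parallelStream()
                   .map(s3Path -> mys3Downloader.downloadAndGetPath(s3Path))
                   .collect(Collectors.toList()))
        .get(); // blocks until the whole parallel stream has finished
} catch (InterruptedException | ExecutionException e) {
    e.printStackTrace();
} finally {
    downloadPool.shutdown();
}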
You could also create an ExecutorService with a fixed-size thread pool that would also be independent of the common fork/join pool, and would throttle the requests, using only the threads in the pool. It also lets you specify the number of threads to dedicate:
ExecutorService executor = Executors.newFixedThreadPool(MAX_THREADS_FOR_DOWNLOADS);
try {
    List<CompletableFuture<Path>> downloadTasks = s3Paths
        .stream()
        .map(s3Path -> CompletableFuture.supplyAsync(
            () -> mys3Downloader.downloadAndGetPath(s3Path), executor))
        .collect(Collectors.toList());

    // at this point, all requests are enqueued, and threads will be assigned as they become available
    executor.shutdown(); // stops accepting new tasks, does not interrupt threads,
                         // items in the queue will still get threads when available

    // wait for all downloads to complete
    CompletableFuture.allOf(downloadTasks.toArray(new CompletableFuture[0])).join();

    // at this point, all downloads are finished,
    // so it's safe to shut down the executor completely
} catch (CompletionException e) { // join() wraps task failures in CompletionException
    e.printStackTrace();
} finally {
    executor.shutdownNow(); // important to call this when you're done with the executor.
}
Following @Hank D's lead, you can encapsulate the creation of the executor service to ensure that you do, indeed, call ExecutorService::shutdownNow after using it:
private static <VALUE> VALUE execute(
    final int nThreads,
    final Function<ExecutorService, VALUE> function
) {
    final ExecutorService executorService = Executors.newFixedThreadPool(nThreads);
    try {
        return function.apply(executorService);
    } finally {
        executorService.shutdownNow(); // important to call this when you're done with the executor service.
    }
}
public static void main(final String... arguments) {
    // define variables
    final List<CompletableFuture<Path>> downloadTasks = execute(
        MAX_THREADS_FOR_DOWNLOADS,
        executor -> {
            final List<CompletableFuture<Path>> tasks = s3Paths
                .stream()
                .map(s3Path -> CompletableFuture.supplyAsync(
                    () -> mys3Downloader.downloadAndGetPath(s3Path),
                    executor
                ))
                .collect(Collectors.toList());
            // wait for completion here: shutdownNow() in execute() would
            // otherwise interrupt downloads that are still in flight
            CompletableFuture.allOf(tasks.toArray(new CompletableFuture[0])).join();
            return tasks;
        }
    );
    // use downloadTasks
}

Jedis getResource() is taking a lot of time

I am trying to use Redis Sentinel to get/set keys in Redis, and I was stress testing my setup with about 2000 concurrent requests.
I used Sentinel to put a single key on Redis and then executed 1000 concurrent get requests.
But the underlying Jedis getResource() call used by my Sentinel pool is blocking (pool size is 500), and the overall average response time I am achieving is around 500 ms, while my target was about 10 ms.
Here is a sample from a jvisualvm snapshot:
Method                                                                                  % time    Time (ms)       Invocations
redis.clients.jedis.JedisSentinelPool.getResource()                                     98.02     40,845,232.6    4,779
redis.clients.jedis.BinaryJedis.get()                                                   1.69      703,981.4       141
org.apache.catalina.core.ApplicationFilterChain.doFilter()                              0.13      53,424.0        6,875
org.springframework.core.serializer.support.DeserializingConverter.convert()            0.05      19,287.5        4
redis.clients.jedis.JedisSentinelPool.returnResource()                                  0.04      18,520.3        4
org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept()   0.04      14,808.5        11,430
Can anyone help me debug this issue further?
Here is the implementation of getResource() from the JedisSentinelPool sources (Jedis 2.6.2):
@Override
public Jedis getResource() {
    while (true) {
        Jedis jedis = super.getResource();
        jedis.setDataSource(this);

        // get a reference because it can change concurrently
        final HostAndPort master = currentHostMaster;
        final HostAndPort connection = new HostAndPort(jedis.getClient().getHost(),
            jedis.getClient().getPort());

        if (master.equals(connection)) {
            // connected to the correct master
            return jedis;
        } else {
            returnBrokenResource(jedis);
        }
    }
}
Note the while (true) and the returnBrokenResource(jedis): it means that it grabs an arbitrary Jedis resource from the pool, checks whether it is connected to the current master, and retries if it is not. It is a dirty check and also a blocking call.
The super.getResource() call refers to the traditional JedisPool implementation, which is based on Apache Commons Pool (2.0). It does a lot of work to get an object from the pool, and I think it even repairs failed connections. With a lot of contention on your pool, as is likely in your stress test, it can take a long time to get a resource, only to find that it is not connected to the correct master, so you call it again, adding contention, further slowing resource acquisition, and so on.
You should check all the Jedis instances in your pool to see whether there are a lot of 'bad' connections.
Maybe you should give up using a common pool for your stress test (only create Jedis instances manually connected to the correct node, and close them nicely), or set up multiple pools to mitigate the cost of scanning through "dirty" unchecked Jedis resources.
Also, with a pool of 500 Jedis instances you can't serve 1000 concurrent queries without contention; you need at least 1000. Both suggestions are sketched below.
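To make those suggestions concrete, here is a minimal sketch of both (assumes Jedis 2.x; the host, port, key, and sentinels set are placeholders):

// Option A: no shared pool; each stress-test worker owns one connection
// to the current master and closes it nicely when done
Jedis jedis = new Jedis("master-host", 6379);
try {
    jedis.get("mykey");
} finally {
    jedis.close();
}

// Option B: keep the pool, but size it for the concurrency you need
JedisPoolConfig config = new JedisPoolConfig();
config.setMaxTotal(1000); // at least as many instances as concurrent requests
JedisSentinelPool pool = new JedisSentinelPool("mymaster", sentinels, config); // sentinels: Set<String> of "host:port"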

Ridiculously slow simultaneous publish/consume rate with RabbitMQ

I'm evaluating RabbitMQ, and while my general impression (of AMQP as such, and also of RabbitMQ) is positive, I'm not very impressed by the results.
I'm attempting to publish and consume messages simultaneously and have achieved very poor message rates. I have a durable direct exchange, which is bound to a durable queue, and I publish persistent messages to that exchange. The average size of a message body is about 1000 bytes.
My publishing happens roughly as follows:
AMQP.BasicProperties.Builder bldr = new AMQP.BasicProperties.Builder();
ConnectionFactory factory = new ConnectionFactory();
factory.setUsername("guest");
factory.setPassword("guest");
factory.setVirtualHost("/");
factory.setHost("my-host");
factory.setPort(5672);
Connection conn = null;
Channel channel = null;
ObjectMapper mapper = new ObjectMapper(); //com.fasterxml.jackson.databind.ObjectMapper
try {
    conn = factory.newConnection();
    channel = conn.createChannel();
    channel.confirmSelect();
} catch (IOException e) {}

for (Message m : messageList) { // the size of messageList happens to be 9945
    try {
        channel.basicPublish("exchange", "", bldr.deliveryMode(2).contentType("application/json").build(), mapper.writeValueAsBytes(m));
    } catch (Exception e) {}
}

try {
    channel.waitForConfirms();
    channel.close();
    conn.close();
} catch (Exception e1) {}
And consuming messages from the bound queue happens like so:
AMQP.BasicProperties.Builder bldr = new AMQP.BasicProperties.Builder();
ConnectionFactory factory = new ConnectionFactory();
factory.setUsername("guest");
factory.setPassword("guest");
factory.setVirtualHost("/");
factory.setHost("my-host");
factory.setPort(5672);
Connection conn = null;
Channel channel = null;
try {
    conn = factory.newConnection();
    channel = conn.createChannel();
    channel.basicQos(100);
    while (true) {
        GetResponse r = channel.basicGet("rawDataQueue", false);
        if (r != null)
            channel.basicAck(r.getEnvelope().getDeliveryTag(), false);
    }
} catch (IOException e) {}
The problem is that when the message publisher (or several of them) and the consumer (or several of them) run simultaneously, the publishers appear to run at full throttle: the RabbitMQ management web interface shows a publish rate of, say, 2-3K messages per second, but a consume rate of only 0.5-3 messages per second per consumer. When the publishers finish, I get a consume rate of roughly 300-600 messages per second per consumer. Without setting the QoS prefetch value in the Java client it is a little less; setting it to 100 or 250 gives a bit more.
When experimenting with throttling the consumers somewhat, I have managed to achieve simultaneous rates of roughly 400 published and 50 consumed messages per second, which is better, but only marginally.
There is a RabbitMQ blog entry claiming that queues are fastest when they're empty, which may well be true, but slowing the consumption rate to a crawl when a few thousand persistent messages are sitting in the queue is still rather unacceptable.
Higher QoS prefetch values may help a bit, but IMHO they are not a solution as such.
What, if anything, can be done to achieve reasonable throughput (2 consumed messages per consumer per second is not reasonable in any circumstance)? This is a simple one exchange, one binding, one queue setup; should I expect further performance degradation with more complicated configurations? Searching the internet turns up suggestions to drop durability, but I'm afraid that is not an option in my case. I'd be very happy if somebody pointed out that I'm being stupid and that there is an evident and straightforward solution of some kind :)
Have you tried the autoAck option? It should improve your performance, since it is much faster than getting messages one by one and acking them. Increasing the prefetch count should make it even better, too.
Also, what is the size of the messages you are sending and consuming, including headers? Are you experiencing any flow control in the broker?
Another question: are you creating a connection and channel every time you send/get a message? If so, that's wrong. You should create a connection once and use one channel per thread (probably in a thread-local fashion, as sketched below) to send and receive messages; you can have multiple channels per connection. There is no official documentation on this, but articles and forums suggest it is the best practice for performance.
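For example, here is a minimal sketch of the one-connection, channel-per-thread idea (Java 8; the factory is configured as in the question):

final Connection conn = factory.newConnection();
final ThreadLocal<Channel> channels = ThreadLocal.withInitial(() -> {
    try {
        return conn.createChannel(); // one channel per thread, reused for all its messages
    } catch (IOException e) {
        throw new RuntimeException(e);
    }
});
// publisher/consumer threads then call channels.get() instead of opening connections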
Lastly, have you considered using basicConsume instead of basicGet? It should also be faster.
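A minimal sketch of what that could look like (queue name from the question; DefaultConsumer is the RabbitMQ Java client's convenience base class):

boolean autoAck = true; // no per-message ack round trip
channel.basicConsume("rawDataQueue", autoAck, new DefaultConsumer(channel) {
    @Override
    public void handleDelivery(String consumerTag, Envelope envelope,
                               AMQP.BasicProperties properties, byte[] body) throws IOException {
        // messages are pushed to this callback instead of being polled
        // one by one with basicGet
    }
});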
Based on my experience, I have been able to run a cluster sending and consuming at rates around 20000 messages per second with non-persistent messages. I guess that if you are using durable and persistent messages the performance would decrease a little, but not 10x.
If sleep is used, the operating system could schedule your process into the next time slot, which can cause a significant performance decrease.