using multiple io_service objects - boost-asio

My application listens for and processes messages from both internet sockets and Unix domain sockets. Now I need to add SSL to the internet sockets. I was using a single io_service object for all the sockets in the application, but it seems I now need separate io_service objects for the network sockets and the Unix domain sockets. I don't have any threads in my application, and I use async_send, async_receive and async_accept to process data and connections. Please point me to any examples using multiple io_service objects with async handlers.

The question leaves some uncertainty as to whether multiple io_service objects are required. I could not locate anything in the reference documentation, or in the overviews for SSL and UNIX Domain Sockets, that mandates separate io_service objects. Regardless, here are a few options:
Single io_service:
Try to use a single io_service.
If you do not have a direct handle to the io_service object, but you have a handle to a Boost.Asio I/O object, such as a socket, then a handle to the associated io_service object can be obtained by calling socket.get_io_service().
Use a thread per io_service:
If multiple io_service objects are required, then dedicate a thread to each io_service. This approach is used in Boost.Asio's HTTP Server 2 example.
boost::asio::io_service service1;
boost::asio::io_service service2;
boost::thread_group threads;
threads.create_thread(boost::bind(&boost::asio::io_service::run, &service1));
service2.run();
threads.join_all();
One consequence of this approach is that it may require the application to make thread-safety guarantees. For example, if service1 and service2 both have completion handlers that invoke message_processor.process(), then message_processor.process() needs to either be thread-safe or be called in a thread-safe manner.
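As a minimal sketch of the "thread-safe" option, assuming a hypothetical MessageProcessor class (the answer only names message_processor.process()), a mutex can serialize calls that arrive from two threads. Plain standard-library threads stand in here for the threads running service1 and service2; no Boost.Asio code is involved:

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <string>
#include <thread>

// Hypothetical processor shared by handlers that run on two different threads.
class MessageProcessor {
public:
    // The lock serializes callers, so concurrent updates are not lost.
    void process(const std::string& message) {
        std::lock_guard<std::mutex> lock(mutex_);
        ++processed_;
        bytes_ += message.size();
    }

    std::size_t processed() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return processed_;
    }

private:
    mutable std::mutex mutex_;
    std::size_t processed_ = 0;
    std::size_t bytes_ = 0;
};

// Stand-in for two event loops: each thread calls process() per_thread times
// and the function returns the total number of processed messages.
std::size_t run_two_callers(std::size_t per_thread) {
    MessageProcessor processor;
    auto caller = [&processor, per_thread] {
        for (std::size_t i = 0; i < per_thread; ++i) {
            processor.process("ping");
        }
    };
    std::thread first(caller);
    std::thread second(caller);
    first.join();
    second.join();
    // With the mutex in place, no increments are lost.
    assert(processor.processed() == 2 * per_thread);
    return processor.processed();
}
```

In a real application the two callers would be completion handlers rather than loops, but the locking inside process() would look the same.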
Poll io_service:
io_service provides non-blocking alternatives to run(). Whereas io_service::run() will block until all work has finished, io_service::poll() will run handlers that are ready to run and will not block. This allows for a single thread to execute the event loop on multiple io_service objects:
while (!service1.stopped() &&
       !service2.stopped())
{
    std::size_t ran = 0;
    ran += service1.poll();
    ran += service2.poll();
    // If no handlers ran, then sleep.
    if (0 == ran)
    {
        boost::this_thread::sleep_for(boost::chrono::seconds(1));
    }
}
To prevent a tight-busy loop when there are no ready-to-run handlers, it may be worth adding in a sleep. Be aware that this sleep may introduce latency in the overall handling of events.
Transfer handlers to a single io_service:
One interesting approach is to use a strand to transfer completion handlers to a single io_service. This allows for a thread per io_service, while preventing the need to have the application make thread-safety guarantees, as all completion handlers will post through a single service, whose event loop is only being processed by a single thread.
boost::asio::io_service service1;
boost::asio::io_service service2;
// strand2 will be used by service2 to post handlers to service1.
boost::asio::strand strand2(service1);
boost::asio::io_service::work work2(service2);
socket.async_read_some(buffer, strand2.wrap(read_some_handler));
boost::thread_group threads;
threads.create_thread(boost::bind(&boost::asio::io_service::run, &service1));
service2.run();
threads.join_all();
This approach does have some consequences:
It requires handlers that are intended to be run by the main io_service to be wrapped via strand::wrap().
The asynchronous chain now runs through two io_services, creating an additional level of complexity. It is important to account for the case where the secondary io_service no longer has work, causing its run() to return.
It is common for an asynchronous chain to stay within the same io_service. The service then never runs out of work, as each completion handler posts additional work onto the io_service.
|    .---------------------------------------------.
V    V                                             |
read_some_handler()                                |
{                                                  |
  socket.async_read_some(..., read_some_handler) --'
}
On the other hand, when a strand is used to transfer work to another io_service, the wrapped handler is invoked within service2, causing it to post the completion handler into service1. If the wrapped handler was the only work in service2, then service2 no longer has work, causing service2.run() to return.
        service1                      service2
====================================================
        .----------------- wrapped(read_some_handler)
        |                            .
        V                            .
 read_some_handler                NO WORK
        |                            .
        |                            .
        '----------------> wrapped(read_some_handler)
To account for this, the example code uses an io_service::work for service2 so that run() remains blocked until explicitly told to stop().

Looks like you are writing a server and not a client. Don't know if this helps, but I am using ASIO to communicate with 6 servers from my client. It uses TCP/IP with SSL/TLS. You can find a link to the code here
You should be able to use just one io_service object with multiple socket objects. But, if you decide that you really want to have multiple io_service objects, then it should be fairly easy to do so. In my class, the io_service object is static. So, just remove the static keyword along with the logic in the constructor that only creates one instance of the io_service object. Depending on the number of connections expected for your server, you would probably be better off using a thread pool dedicated to handling socket I/O rather than creating a thread for each new socket connection.

Related

Reactive Redis (Lettuce) always publishing to single thread

I'm using Spring Webflux (with spring-reactor-netty) 2.1.0.RC1 and Lettuce 5.1.1.RELEASE.
When I invoke any Redis operation using the Reactive Lettuce API the execution always switches to the same individual thread (lettuce-nioEventLoop-4-1).
That is leading to poor performance since all the execution is getting bottlenecked in that single thread.
I know I could use publishOn every time I call Redis to switch to another thread, but that is error prone and still not optimal.
Is there any way to improve that? I see that Lettuce provides the ClientResources class to customize the Thread allocation but I could not find any way to integrate that with Spring webflux.
Besides, wouldn't the current behaviour be dangerous for a careless developer? Maybe the defaults should be tuned a little. I suppose the ideal scenario would be if Lettuce could just reuse the same event loop from webflux.
I'm adding this spring boot single class snippet that can be used to reproduce what I'm describing:
@SpringBootApplication
public class ReactiveApplication {
    public static void main(String[] args) {
        SpringApplication.run(ReactiveApplication.class, args);
    }
}
@Controller
class TestController {
    private final RedisReactiveCommands<String, String> redis = RedisClient.create("redis://localhost:6379").connect().reactive();
    @RequestMapping("/test")
    public Mono<Void> test() {
        return redis.exists("key")
            .doOnSubscribe(subscription -> System.out.println("\nonSubscribe called on thread " + Thread.currentThread().getName()))
            .doOnNext(aLong -> System.out.println("onNext called on thread " + Thread.currentThread().getName()))
            .then();
    }
}
If I keep calling the /test endpoint I get the following output:
onSubscribe called on thread reactor-http-nio-2
onNext called on thread lettuce-nioEventLoop-4-1
onSubscribe called on thread reactor-http-nio-3
onNext called on thread lettuce-nioEventLoop-4-1
onSubscribe called on thread reactor-http-nio-4
onNext called on thread lettuce-nioEventLoop-4-1
That's an excellent question!
The TL;DR;
Lettuce always publishes using the I/O thread that is bound to the netty channel. This may or may not be suitable for your workload.
The Longer Read
Redis is single-threaded, so it makes sense to keep a single TCP connection. Netty's threading model is that all I/O work is handled by the EventLoop thread that is bound to the channel. Because of this constellation, you receive all reactive signals on the same thread. It makes sense to benchmark the impact using various reactive sequences with various options.
A different usage scheme (i.e. using pooled connections) is something that changes directly the observed results as pooling uses different connections and so notifications are received on different threads.
Another alternative could be to provide an ExecutorService just for response signals (data, error, completion). In some scenarios, the cost of context switching can be neglected because congestion in the I/O thread is removed. In other scenarios, the context switching cost might be more notable.
You can already observe the same behavior with WebFlux: Every incoming connection is a new connection, and so it's handled by a different inbound EventLoop thread. Reusing the same EventLoop thread for outbound notification (that one, that was used for inbound notifications) happens quite late when writing the HTTP response to the channel.
This duality of responsibilities (completing a command, performing I/O) can experience some gravity towards a more computation-heavy workload which drags performance out of I/O.
Additional resources:
Investigate on response thread switching #905.

Is it possible to dictate use of RPC callback threads?

I am working on a bug related to an unmanaged MTA COM object. The object has Lock and Unlock methods and uses a mutex that requires the same thread that called Lock to call Unlock.
The problem is that when Lock and Unlock are called from a managed STA thread (using COM interop), the calls come into the COM object on a RPC callback thread but the callback thread that is used is not always the same for both calls. When it is not the same, the Unlock call fails because it can't unlock the mutex.
In other words:
Managed STA thread 1 -> RPC callback (thread 11) -> Lock
Managed STA thread 1 -> RPC callback (thread 12) -> Unlock -> Error
I am trying to evaluate all possible solutions before making any decisions on a fix. As such, I am trying to find out:
1) Is there a way to prevent an RPC callback thread from being used in the first place? In my testing, if I make the calls to the object from an unmanaged STA thread, the calls seem to come in on the calling thread itself. What is different when the call is coming from .NET that necessitates the use of an RPC callback thread? Is there any way to prevent RPC callbacks from being used (other than using an MTA calling thread)?
2) If not, is there a way to force a consistent RPC callback thread to be used from the same managed STA thread?
This is by design for a free-threaded server. COM takes your word for it and allows stubs to use arbitrary RPC threads. You cannot make any assumptions about the thread identity, the RPC thread is picked from a pool and is recycled. Unfortunately it often picks the same one when the calls are sequenced so it will look like it works fine at first. But trouble starts as soon as more than one concurrent server call is made. There is no option to make it selective, a free-threaded server promises to not care. Nor could that work well in practice, it would either scale horribly or induce deadlock.
You therefore cannot use a mutex to implement locking, it has thread affinity. A semaphore is a good choice.
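To illustrate why a semaphore fits, here is a minimal sketch in portable C++. It is not the asker's COM code, and BinarySemaphore and lock_and_unlock_on_different_threads are hypothetical names chosen for this example. Unlike a mutex, a semaphore has no owning thread, so one thread may acquire it and a different thread may release it:

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical binary semaphore: no thread owns it, so release() may be
// called from a different thread than the one that called acquire().
class BinarySemaphore {
public:
    void acquire() {
        std::unique_lock<std::mutex> guard(mutex_);
        available_cv_.wait(guard, [this] { return available_; });
        available_ = false;
    }

    bool try_acquire() {
        std::lock_guard<std::mutex> guard(mutex_);
        if (!available_) {
            return false;
        }
        available_ = false;
        return true;
    }

    void release() {
        {
            std::lock_guard<std::mutex> guard(mutex_);
            available_ = true;
        }
        available_cv_.notify_one();
    }

private:
    std::mutex mutex_;  // guards available_ only; locked and unlocked within one call
    std::condition_variable available_cv_;
    bool available_ = true;
};

// Mimics Lock arriving on one RPC thread and Unlock arriving on another.
bool lock_and_unlock_on_different_threads() {
    BinarySemaphore sem;
    std::thread lock_thread([&sem] { sem.acquire(); });
    lock_thread.join();
    bool held = !sem.try_acquire();
    assert(held);  // still held even though the locking thread has exited
    std::thread unlock_thread([&sem] { sem.release(); });
    unlock_thread.join();
    return sem.try_acquire();  // true: a different thread released it
}
```

The internal std::mutex is only ever locked and unlocked inside a single call, so it never crosses threads; only the logical lock does.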

Alternatives to dispatch_get_current_queue() for completion blocks in iOS 6?

I have a method that accepts a block and a completion block. The first block should run in the background, while the completion block should run in whatever queue the method was called.
For the latter I always used dispatch_get_current_queue(), but it seems like it's deprecated in iOS 6 or higher. What should I use instead?
The pattern of "run on whatever queue the caller was on" is appealing, but ultimately not a great idea. That queue could be a low priority queue, the main queue, or some other queue with odd properties.
My favorite approach to this is to say "the completion block runs on an implementation defined queue with these properties: x, y, z", and let the block dispatch to a particular queue if the caller wants more control than that. A typical set of properties to specify would be something like "serial, non-reentrant, and async with respect to any other application-visible queue".
** EDIT **
Catfish_Man put an example in the comments below, I'm just adding it to his answer.
- (void) aMethodWithCompletionBlock:(dispatch_block_t)completionHandler
{
    dispatch_async(self.workQueue, ^{
        [self doSomeWork];
        dispatch_async(self.callbackQueue, completionHandler);
    });
}
This is fundamentally the wrong approach for the API you are describing to take. If an API accepts a block and a completion block to run, the following facts need to be true:
The "block to run" should be run on an internal queue, e.g. a queue which is private to the API and hence entirely under that API's control. The only exception to this is if the API specifically declares that the block will be run on the main queue or one of the global concurrent queues.
The completion block should always be expressed as a tuple (queue, block) unless the same assumptions as for #1 hold true, e.g. the completion block will be run on a known global queue. The completion block should furthermore be dispatched async on the passed-in queue.
These are not just stylistic points, they're entirely necessary if your API is to be safe from deadlocks or other edge-case behavior that WILL otherwise hang you from the nearest tree someday. :-)
The other answers are great, but for me the answer is structural. I have a method like this that's on a Singleton:
- (void) dispatchOnHighPriorityNonMainQueue:(simplest_block)block forceAsync:(BOOL)forceAsync {
    if (forceAsync || [NSThread isMainThread])
        dispatch_async_on_high_priority_queue(block);
    else
        block();
}
which has two dependencies, which are:
static void dispatch_async_on_high_priority_queue(dispatch_block_t block) {
dispatch_async(dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_HIGH, 0), block);
}
and
typedef void (^simplest_block)(void); // also could use dispatch_block_t
That way I centralize my calls to dispatch on the other thread.
You should be careful about your use of dispatch_get_current_queue in the first place. From the header file:
Recommended for debugging and logging purposes only:
The code
must not make any assumptions about the queue returned, unless it
is one of the global queues or a queue the code has itself created.
The code must not assume that synchronous execution onto a queue is
safe from deadlock if that queue is not the one returned by
dispatch_get_current_queue().
You could do either one of two things:
Keep a reference to the queue you originally posted on (if you created it via dispatch_queue_create), and use that from then on.
Use system defined queues via dispatch_get_global_queue, and keep a track of which one you're using.
Effectively whilst previously relying on the system to keep track of the queue you are on, you are going to have to do it yourself.
Apple has deprecated dispatch_get_current_queue(), but left a hole in another place, so we are still able to get the current dispatch queue:
if let currentDispatch = OperationQueue.current?.underlyingQueue {
print(currentDispatch)
// Do stuff
}
This works for main queue at least.
Note, that underlyingQueue property is available since iOS 8.
If you need to perform the completion block in the original queue, you also may use OperationQueue directly, I mean without GCD.
For those who still need to compare queues, you can compare them by their label or their specifics.
Check this https://stackoverflow.com/a/23220741/1531141
This is a me too answer. So I will talk about our use case.
We have a services layer and the UI layer (among other layers). The services layer runs tasks in the background. (Data manipulation tasks, CoreData tasks, Network calls etc). The service layer has a couple operation queues to satisfy the needs of the UI layer.
The UI layer relies on the services layer to do its work and then run a success completion block. This block can have UIKit code in it. A simple use case is to get all messages from the server and reload the collection view.
Here we guarantee that the blocks passed into the services layer are dispatched on the queue on which the service was invoked. Since dispatch_get_current_queue is deprecated, we use NSOperationQueue.currentQueue to get the caller's current queue. An important note on this property:
Calling this method from outside the context of a running operation
typically results in nil being returned.
Since we always invoke our services on a known queue (Our custom queues and Main queue) this works well for us. We do have cases where serviceA can call serviceB which can call serviceC. Since we control where the first service call is being made from, we know the rest of the services will follow the same rules.
So NSOperationQueue.currentQueue will always return one of our Queues or the MainQueue.

WCF client causes server to hang until connection fault

The below text is an effort to expand and add color to this question:
How do I prevent a misbehaving client from taking down the entire service?
I have essentially this scenario: a WCF service is up and running with a client callback that uses straightforward, simple one-way communication, not very different from this one:
public interface IMyClientContract
{
[OperationContract(IsOneWay = true)]
void SomethingChanged(simpleObject myObj);
}
I'm calling this method potentially thousands of times a second from the service to what will eventually be about 50 concurrently connected clients, with as low latency as possible (<15 ms would be nice). This works fine until I set a breakpoint in one of the client apps connected to the server. After maybe 2-5 seconds the service hangs, and none of the other clients receive any data for about 30 seconds or so, until the service registers a connection fault event and disconnects the offending client. After this, all the other clients continue on their merry way receiving messages.
I've done research on serviceThrottling, concurrency tweaking, setting threadpool minimum threads, WCF secret sauces and the whole 9 yards, but at the end of the day this article MSDN - WCF essentials, One-Way Calls, Callbacks and Events describes exactly the issue I'm having without really making a recommendation.
The third solution that allows the service to safely call back to the client is to have the callback contract operations configured as one-way operations. Doing so enables the service to call back even when concurrency is set to single-threaded, because there will not be any reply message to contend for the lock.
but earlier in the article it describes the issue I'm seeing, only from a client perspective
When one-way calls reach the service, they may not be dispatched all at once and may be queued up on the service side to be dispatched one at a time, all according to the service configured concurrency mode behavior and session mode. How many messages (whether one-way or request-reply) the service is willing to queue up is a product of the configured channel and the reliability mode. If the number of queued messages has exceeded the queue's capacity, then the client will block, even when issuing a one-way call
I can only assume that the reverse is true, the number of queued messages to the client has exceeded the queue capacity and the threadpool is now filled with threads attempting to call this client that are now all blocked.
What is the right way to handle this? Should I research a way to check how many messages are queued at the service communication layer per client and abort their connections after a certain limit is reached?
It almost seems that if the WCF service itself is blocking on a queue filling up then all the async / oneway / fire-and-forget strategies I could ever implement inside the service will still get blocked whenever one client's queue gets full.
Don't know much about the client callbacks, but it sounds similar to generic wcf code blocking issues. I often solve these problems by spawning a BackgroundWorker, and performing the client call in the thread. During that time, the main thread counts how long the child thread is taking. If the child has not finished in a few milliseconds, the main thread just moves on and abandons the thread (it eventually dies by itself, so no memory leak). This is basically what Mr.Graves suggests with the phrase "fire-and-forget".
Update:
I implemented a fire-and-forget setup to call the client's callback channel, and the server no longer blocks once the buffer to the client fills up.
MyEvent is an event with a delegate that matches one of the methods defined in the WCF client contract, when they connect I'm essentially adding the callback to the event
MyEvent += OperationContext.Current.GetCallbackChannel<IFancyClientContract>().SomethingChanged
etc... and then to send this data to all clients, I'm doing the following
// serialize using protobuf
using (var ms = new MemoryStream())
{
    ProtoBuf.Serializer.Serialize(ms, new SpecialDataTransferObject(inputData));
    // ToArray() copies only the bytes written; GetBuffer() may include unused trailing capacity.
    byte[] data = ms.ToArray();
    Parallel.ForEach(MyEvent.GetInvocationList(), p => ThreadUtil.FireAndForget(p, data));
}
In the ThreadUtil class I made essentially the following change to the code defined in the fire-and-forget article:
static void InvokeWrappedDelegate(Delegate d, object[] args)
{
    try
    {
        d.DynamicInvoke(args);
    }
    catch (Exception ex)
    {
        //THIS will eventually throw once the client's WCF callback channel has filled up and timed out, and it will throw once for every single time you ever tried sending them a payload, so do some smarter logging here!!
        Console.WriteLine("Error calling client, attempting to disconnect.");
        try
        {
            MyService.SingletonServiceController.TerminateClientChannelByHashcode(d.Target.GetHashCode());//this is an IContextChannel object, kept in a dictionary of active connections, cross referenced by hashcode just for this exact occasion
        }
        catch (Exception ex2)
        {
            Console.WriteLine("Attempt to disconnect client failed: " + ex2.ToString());
        }
    }
}
I don't have any good ideas how to go and kill all the pending packets the server is still waiting to see if they'll get delivered on. Once I get the first exception I should in theory be able to go and terminate all the other requests in some queue somewhere, but this setup is functional and meets the objectives.

boost::asio timeouts example - writing data is expensive

boost::asio provides an example of how to use the library to implement asynchronous timeouts: the client sends periodic heartbeat messages to the server, which echoes each heartbeat back to the client. Failure to respond within N seconds causes a disconnect. See boost_asio/example/timeouts/server.cpp
The pattern outlined in these examples would be a good starting point for part of a project i will be working on shortly, but for one wrinkle:
in addition to heartbeats, both client and server need to send messages to each other.
The timeouts example pushes heartbeat echo messages onto a queue, and a subsequent timeout causes an asynchronous handler for the timeout to actually write the data to the socket.
Introducing data for the socket to write cannot be done on the thread running io_service, because it is blocked on run(). run_one() doesn't help: you still block until there is a handler to run, and you introduce the complexity of managing work for the io_service.
In asio, asynchronous handlers - writes to the socket being one of them - are called on the thread running io_service.
Therefore, to introduce messages randomly, data to be sent is pushed onto a queue from a thread other than the io_service thread, which implies protecting the queue and notification timer with a mutex. There are then two mutexes per message, one for pushing the data to the queue, and one for the handler which dequeues the data for write to socket.
This is actually a more general question than asio timeouts alone: is there a pattern, when the io_service thread is blocked on run(), by which data can be asynchronously written to the socket without taking two mutexes per message?
The following things could be of interest: boost::asio strands are a mechanism for synchronising handlers. You only need them, though, if you are calling io_service::run from multiple threads, AFAIK.
Also useful is the io_service::post method, which allows you to execute code on the thread that has invoked io_service::run.