Reactive Redis (Lettuce) always publishing to single thread - spring-webflux

I'm using Spring WebFlux (with spring-reactor-netty) 2.1.0.RC1 and Lettuce 5.1.1.RELEASE.
When I invoke any Redis operation using the Reactive Lettuce API the execution always switches to the same individual thread (lettuce-nioEventLoop-4-1).
That is leading to poor performance since all the execution is getting bottlenecked in that single thread.
I know I could use publishOn every time I call Redis to switch to another thread, but that is error prone and still not optimal.
Is there any way to improve this? I see that Lettuce provides the ClientResources class to customize thread allocation, but I could not find any way to integrate that with Spring WebFlux.
Besides, wouldn't the current behaviour be dangerous for a careless developer? Maybe the defaults should be tuned a little. I suppose the ideal scenario would be if Lettuce could just reuse the event loop from WebFlux.
I'm adding this single-class Spring Boot snippet that can be used to reproduce what I'm describing:
import io.lettuce.core.RedisClient;
import io.lettuce.core.api.reactive.RedisReactiveCommands;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;
import reactor.core.publisher.Mono;

@SpringBootApplication
public class ReactiveApplication {

    public static void main(String[] args) {
        SpringApplication.run(ReactiveApplication.class, args);
    }
}

@RestController
class TestController {

    private final RedisReactiveCommands<String, String> redis =
            RedisClient.create("redis://localhost:6379").connect().reactive();

    @RequestMapping("/test")
    public Mono<Void> test() {
        return redis.exists("key")
                .doOnSubscribe(subscription -> System.out.println("\nonSubscribe called on thread "
                        + Thread.currentThread().getName()))
                .doOnNext(aLong -> System.out.println("onNext called on thread "
                        + Thread.currentThread().getName()))
                .then();
    }
}
If I keep calling the /test endpoint I get the following output:
onSubscribe called on thread reactor-http-nio-2
onNext called on thread lettuce-nioEventLoop-4-1
onSubscribe called on thread reactor-http-nio-3
onNext called on thread lettuce-nioEventLoop-4-1
onSubscribe called on thread reactor-http-nio-4
onNext called on thread lettuce-nioEventLoop-4-1

That's an excellent question!
The TL;DR:
Lettuce always publishes using the I/O thread that is bound to the netty channel. This may or may not be suitable for your workload.
The Longer Read
Redis is single-threaded, so it makes sense to keep a single TCP connection. Netty's threading model dictates that all I/O work for a channel is handled by the EventLoop thread bound to that channel. Because of this arrangement, you receive all reactive signals on the same thread. It makes sense to benchmark the impact using various reactive sequences with various options.
A different usage scheme (i.e. using pooled connections) directly changes the observed results, as pooling uses different connections, so notifications are received on different threads.
Another alternative could be to provide an ExecutorService just for response signals (data, error, completion). In some scenarios, the cost of context switching can be neglected because it removes congestion from the I/O thread. In other scenarios, the context-switching cost might be more notable.
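A minimal sketch of that alternative with Project Reactor's publishOn (the pool size and names here are illustrative, not a recommendation):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import reactor.core.scheduler.Scheduler;
import reactor.core.scheduler.Schedulers;

// Illustrative: a dedicated pool that receives the response signals,
// so downstream work no longer runs on the lettuce-nioEventLoop thread.
ExecutorService executor = Executors.newFixedThreadPool(4);
Scheduler redisScheduler = Schedulers.fromExecutorService(executor);

redis.exists("key")
        .publishOn(redisScheduler) // onNext/onComplete/onError now arrive on the pool
        .doOnNext(count -> System.out.println(
                "onNext called on thread " + Thread.currentThread().getName()))
        .then();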
You can already observe the same behavior with WebFlux: every incoming connection is a new connection, and so it's handled by a different inbound EventLoop thread. Reusing the same EventLoop thread for outbound notification (the one that was used for inbound notifications) happens quite late, when the HTTP response is written to the channel.
This duality of responsibilities (completing a command, performing I/O) can tip towards a more computation-heavy workload, which drags performance away from I/O.
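If you do want to customize the thread allocation, ClientResources can be passed when creating the client. A minimal sketch, assuming the Lettuce 5.x builder API (the pool sizes are illustrative; note that a single connection remains bound to one channel, hence one event loop thread):

import io.lettuce.core.RedisClient;
import io.lettuce.core.resource.ClientResources;
import io.lettuce.core.resource.DefaultClientResources;

// Sketch: sizing Lettuce's thread pools. This spreads work across
// connections/channels, but each individual connection is still pinned
// to a single event loop thread.
ClientResources resources = DefaultClientResources.builder()
        .ioThreadPoolSize(4)           // netty event loop threads for I/O
        .computationThreadPoolSize(4)  // threads for computation tasks
        .build();

RedisClient client = RedisClient.create(resources, "redis://localhost:6379");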
Additional resources:
Investigate on response thread switching #905.

Related

Spring Reactor WebClient: how does it achieve non-blocking?

Basic question: how does Spring Reactor's WebClient achieve non-blocking behaviour compared to RestTemplate? Doesn't it have to block somewhere after it has dispatched the request to the external service (for example)? HTTP is synchronous by nature, right? So the calling application has to wait for the response?
How does the thread know which context to use when reacting to the response from the service?
There are several separate questions here.
How are I/O operations managed?
What's the threading model behind this runtime?
How does the application deal with the request/response model behind HTTP?
In the case of WebClient and Project Reactor, the Netty event loop is used to queue/dispatch/process events. Each read/write operation is done in a non-blocking manner, meaning that no thread sits waiting for an I/O operation to complete. In this model, concurrency is not achieved through thread pools; instead, a small number of threads process units of work, and those threads should never block.
From a pure HTTP standpoint (i.e. if you were capturing the HTTP packets on the network), you'd see no big difference between a RestTemplate call and a WebClient call. The HTTP transport itself doesn't support the backpressure concept. So the client still has to wait for the response - the difference is that the application using that WebClient won't waste resources waiting for the operation to complete - it will use them to process other events.
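To make this concrete, here is a minimal sketch (the base URL and endpoint are illustrative): the subscribing thread returns immediately, and the response body is processed later on an event loop thread.

import org.springframework.web.reactive.function.client.WebClient;
import reactor.core.publisher.Mono;

WebClient client = WebClient.create("https://example.org");

Mono<String> body = client.get()
        .uri("/resource")
        .retrieve()
        .bodyToMono(String.class);

// subscribe() registers callbacks and returns immediately; no thread sits
// waiting for the HTTP response. The doOnNext callback runs later, on a
// reactor-http-nio event loop thread, once the response arrives.
body.doOnNext(s -> System.out.println(
            "received on " + Thread.currentThread().getName()))
    .subscribe();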
For more information on that, please check out the reactive programming introduction in the Reactor reference documentation and this talk given by Rossen Stoyanchev that explains things well if you're used to the typical Servlet container model.

How can we test the Apache Commons Pool evict functionality

I am trying to use the Apache Commons Pool library to implement object pooling for objects that are expensive to create in my application. For resource pooling I have used the library's GenericObjectPool class, which is the default implementation provided by the API. In order to ensure that we do not end up with several idle objects in memory, I set the minEvictableIdleTimeMillis and timeBetweenEvictionRunsMillis properties to 30 minutes.
As I understood from other questions, blogs and the API documentation, these properties trigger a separate thread that evicts idle objects from the pool.
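Roughly, the configuration looks like this (a minimal sketch against the commons-pool 1.x API from the Javadoc linked below; MyPoolableFactory stands in for a real PoolableObjectFactory):

import org.apache.commons.pool.impl.GenericObjectPool;

// MyPoolableFactory is hypothetical; it creates the expensive objects.
GenericObjectPool pool = new GenericObjectPool(new MyPoolableFactory());

// Objects idle for longer than 30 minutes become eligible for eviction...
pool.setMinEvictableIdleTimeMillis(30L * 60 * 1000);
// ...and a positive interval here starts the background evictor thread,
// which runs every 30 minutes.
pool.setTimeBetweenEvictionRunsMillis(30L * 60 * 1000);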
Could someone tell me whether this has any adverse impact on application performance, and whether there is any way to test that the eviction thread actually executes?
The library comes with a performance disclaimer for when the evictor is enabled:
Eviction runs contend with client threads for access to objects in the pool, so if they run too frequently performance issues may result.
Reference: https://commons.apache.org/proper/commons-pool/api-1.6/org/apache/commons/pool/impl/GenericObjectPool.html
However, we have a high-TPS system running eviction every second, and we don't see much of a performance bottleneck.
As far as the eviction thread runs are concerned, you can override the evict() method in your implementation of GenericObjectPool and add a log line:
@Override
public void evict() throws Exception {
    // Log each run so you can verify that the evictor thread actually fires.
    System.out.println("Eviction run starting on " + Thread.currentThread().getName());
    super.evict();
}

How to customize Akka Http Client execution context

When calling singleRequest, how can one customize the execution context that is used by the connection pool?
I took a brief look at the code, and a call to singleRequest results in a message being sent to the PoolMasterActor, which in turn sends a message to the pool interface actor.
Is each connection blocking or non-blocking?
Which context is used for the connection pool? (I want to make sure that my HTTP requests don't block all the threads)
If you check out the singleRequest signature, it requires an implicit Materializer (and therefore an ActorSystem and its dispatchers) to run the underlying HTTP infrastructure, which is based on Akka Streams. More on how materializers spawn threads under the hood can be found in the docs and this blog post.
Going back to your questions:
The whole Akka-HTTP infrastructure is inherently non-blocking (as it's based on Akka Streams - which adheres to the Reactive Streams spec and is based on Akka Actors).
The threading used by the singleRequest call is inherited from the ActorSystem dispatcher used down the line. Unless you do anything specific, you will end up using your system's default dispatcher. This is a reasonable choice in many cases when you are writing an Akka HTTP client.
In case you really need your materializer to use a custom dispatcher, you can achieve this by customizing your ActorMaterializerSettings, e.g.
implicit val materializer = ActorMaterializer(
  ActorMaterializerSettings(actorSystem).withDispatcher("my-custom-dispatcher")
)

using multiple io_service objects

I have an application that listens for and processes messages from both internet sockets and UNIX domain sockets. Now I need to add SSL to the internet sockets. I was using a single io_service object for all the sockets in the application; it seems I now need to add separate io_service objects for network sockets and UNIX domain sockets. I don't have any threads in my application, and I use async_send, async_receive and async_accept to process data and connections. Please point me to any examples using multiple io_service objects with async handlers.
The question has a degree of uncertainty as to whether multiple io_service objects are actually required. I could not locate anything in the reference documentation, or in the overview for SSL and UNIX domain sockets, that mandates separate io_service objects. Regardless, here are a few options:
Single io_service:
Try to use a single io_service.
If you do not have a direct handle to the io_service object, but you have a handle to a Boost.Asio I/O object, such as a socket, then a handle to the associated io_service object can be obtained by calling socket.get_io_service().
Use a thread per io_service:
If multiple io_service objects are required, then dedicate a thread to each io_service. This approach is used in Boost.Asio's HTTP Server 2 example.
boost::asio::io_service service1;
boost::asio::io_service service2;
boost::thread_group threads;
threads.create_thread(boost::bind(&boost::asio::io_service::run, &service1));
service2.run();
threads.join_all();
One consequence of this approach is that it may require thread-safety guarantees to be made by the application. For example, if service1 and service2 both have completion handlers that invoke message_processor.process(), then message_processor.process() needs to either be thread-safe or be called in a thread-safe manner.
Poll io_service:
io_service provides non-blocking alternatives to run(). Whereas io_service::run() blocks until all work has finished, io_service::poll() runs handlers that are ready to run and does not block. This allows a single thread to execute the event loop on multiple io_service objects:
while (!service1.stopped() &&
       !service2.stopped())
{
    std::size_t ran = 0;
    ran += service1.poll();
    ran += service2.poll();

    // If no handlers ran, then sleep.
    if (0 == ran)
    {
        boost::this_thread::sleep_for(boost::chrono::seconds(1));
    }
}
To prevent a tight busy-loop when there are no ready-to-run handlers, it may be worth adding a sleep. Be aware that this sleep may introduce latency in the overall handling of events.
Transfer handlers to a single io_service:
One interesting approach is to use a strand to transfer completion handlers to a single io_service. This allows a thread per io_service, while avoiding the need for the application to make thread-safety guarantees, as all completion handlers will post through a single service whose event loop is processed by only a single thread.
boost::asio::io_service service1;
boost::asio::io_service service2;
// strand2 will be used by service2 to post handlers to service1.
boost::asio::strand strand2(service1);
boost::asio::io_service::work work2(service2);
socket.async_read_some(buffer, strand2.wrap(read_some_handler));
boost::thread_group threads;
threads.create_thread(boost::bind(&boost::asio::io_service::run, &service1));
service2.run();
threads.join_all();
This approach does have some consequences:
It requires handlers that are intended to be run by the main io_service to be wrapped via strand::wrap().
The asynchronous chain now runs through two io_services, creating an additional level of complexity. It is important to account for the case where the secondary io_service no longer has work, causing its run() to return.
It is common for asynchronous chains to occur within the same io_service. Thus, the service never runs out of work, as a completion handler will post additional work onto the io_service:
 |   .-----------------------------------------------.
 V   V                                               |
read_some_handler()                                  |
{                                                    |
    socket.async_read_some(..., read_some_handler) --'
}
On the other hand, when a strand is used to transfer work to another io_service, the wrapped handler is invoked within service2, causing it to post the completion handler into service1. If the wrapped handler was the only work in service2, then service2 no longer has work, causing service2.run() to return:
service1                           service2
=======================================================
    .----------------------------- wrapped(read_some_handler)
    |                                       .
    V                                       .
read_some_handler                        NO WORK
    |                                       .
    |                                       .
    '----------------------------> wrapped(read_some_handler)
To account for this, the example code uses an io_service::work for service2 so that run() remains blocked until explicitly told to stop().
It looks like you are writing a server and not a client. I don't know if this helps, but I am using ASIO to communicate with 6 servers from my client over TCP/IP with SSL/TLS. You can find a link to the code here.
You should be able to use just one io_service object with multiple socket objects. But if you decide that you really want multiple io_service objects, it should be fairly easy to do. In my class, the io_service object is static; just remove the static keyword, along with the logic in the constructor that creates only one instance of the io_service object. Depending on the number of connections expected for your server, you would probably be better off using a thread pool dedicated to handling socket I/O rather than creating a thread for each new socket connection.

WCF client causes server to hang until connection fault

The below text is an effort to expand and add color to this question:
How do I prevent a misbehaving client from taking down the entire service?
I have essentially this scenario: a WCF service is up and running, with a client callback using straightforward, simple one-way communication, not very different from this one:
public interface IMyClientContract
{
    [OperationContract(IsOneWay = true)]
    void SomethingChanged(simpleObject myObj);
}
I'm calling this method potentially thousands of times a second from the service to what will eventually be about 50 concurrently connected clients, with as low latency as possible (<15 ms would be nice). This works fine until I set a breakpoint on one of the client apps connected to the server. After maybe 2-5 seconds the service hangs, and none of the other clients receive any data for about 30 seconds or so, until the service registers a connection fault event and disconnects the offending client. After this, all the other clients continue on their merry way receiving messages.
I've done research on serviceThrottling, concurrency tweaking, setting ThreadPool minimum threads, WCF secret sauces and the whole nine yards, but at the end of the day this article, MSDN - WCF Essentials: One-Way Calls, Callbacks and Events, describes exactly the issue I'm having without really making a recommendation.
The third solution that allows the service to safely call back to the client is to have the callback contract operations configured as one-way operations. Doing so enables the service to call back even when concurrency is set to single-threaded, because there will not be any reply message to contend for the lock.
but earlier in the article it describes the issue I'm seeing, only from a client perspective:
When one-way calls reach the service, they may not be dispatched all at once and may be queued up on the service side to be dispatched one at a time, all according to the service configured concurrency mode behavior and session mode. How many messages (whether one-way or request-reply) the service is willing to queue up is a product of the configured channel and the reliability mode. If the number of queued messages has exceeded the queue's capacity, then the client will block, even when issuing a one-way call
I can only assume that the reverse is true: the number of messages queued for the client has exceeded the queue's capacity, and the thread pool is now filled with threads attempting to call this client, all of which are now blocked.
What is the right way to handle this? Should I research a way to check how many messages are queued at the service communication layer per client and abort their connections after a certain limit is reached?
It almost seems that if the WCF service itself blocks on a queue filling up, then any async / one-way / fire-and-forget strategy I could ever implement inside the service will still get blocked whenever one client's queue gets full.
I don't know much about client callbacks, but this sounds similar to generic WCF blocking issues. I often solve these problems by spawning a BackgroundWorker and performing the client call in that thread. Meanwhile, the main thread counts how long the child thread is taking. If the child has not finished within a few milliseconds, the main thread just moves on and abandons the thread (it eventually dies by itself, so there is no memory leak). This is basically what Mr. Graves suggests with the phrase "fire-and-forget".
Update:
I implemented a fire-and-forget setup to call the clients' callback channels, and the server no longer blocks once the buffer to a client fills up.
MyEvent is an event with a delegate that matches one of the methods defined in the WCF client contract; when clients connect, I'm essentially adding their callback to the event:
MyEvent += OperationContext.Current.GetCallbackChannel<IFancyClientContract>().SomethingChanged;
etc., and then to send this data to all clients, I'm doing the following:
// serialize using protobuf
using (var ms = new MemoryStream())
{
    ProtoBuf.Serializer.Serialize(ms, new SpecialDataTransferObject(inputData));
    // Note: GetBuffer() returns the internal buffer, which may include unused
    // trailing bytes; ms.ToArray() would copy exactly the written data.
    byte[] data = ms.GetBuffer();
    Parallel.ForEach(MyEvent.GetInvocationList(), p => ThreadUtil.FireAndForget(p, data));
}
In the ThreadUtil class I made essentially the following change to the code defined in the fire-and-forget article:
static void InvokeWrappedDelegate(Delegate d, object[] args)
{
    try
    {
        d.DynamicInvoke(args);
    }
    catch (Exception ex)
    {
        // THIS will eventually throw once the client's WCF callback channel has
        // filled up and timed out, and it will throw once for every single time
        // you ever tried sending them a payload, so do some smarter logging here!!
        Console.WriteLine("Error calling client, attempting to disconnect.");
        try
        {
            // The target is an IContextChannel object, kept in a dictionary of
            // active connections, cross-referenced by hashcode just for this
            // exact occasion.
            MyService.SingletonServiceController.TerminateClientChannelByHashcode(d.Target.GetHashCode());
        }
        catch (Exception ex2)
        {
            Console.WriteLine("Attempt to disconnect client failed: " + ex2.ToString());
        }
    }
}
I don't have any good ideas on how to go and kill all the pending messages the server is still waiting to deliver. Once I get the first exception, I should in theory be able to go and terminate all the other requests in whatever queue they are sitting in, but this setup is functional and meets the objectives.