Akka.NET: Should I specify "split brain resolver" configuration for Lighthouse/seed nodes?

I have an application that uses the Akka.NET cluster feature. The people who wrote the code have left the company, so I am trying to understand it while we plan a deployment.
The cluster has 2 types of nodes:
QueueServicer: supports sharding; only these nodes should participate in sharding.
Lighthouse: just seed nodes, nothing else.
Lighthouse: 2 nodes
QueueServicer: 3 nodes
One of the QueueServicer nodes is unable to join the cluster: both Lighthouse nodes refuse its connection. It has been retrying constantly for about 5 days without ever succeeding, and the node never dies either. Its CPU and memory usage are high, garbage collection takes long hours, and a filtered search of the log shows no queue processor actors running on it. The log for this node shows the following.
{"timestamp":"2021-09-08T22:26:59.025Z", "logger":"Akka.Event.DummyClassForStringSources", "message":Tried to associate with unreachable remote address [akka.tcp://myapp#lighthouse-1:7892]. Address is now gated for 5000 ms, all messages to this address will be delivered to dead letters. Reason: [Association failed with akka.tcp://myapp#lighthouse-1:7892] Caused by: [System.AggregateException: One or more errors occurred. (Connection refused akka.tcp://myapp#lighthouse-1:7892) ---> Akka.Remote.Transport.InvalidAssociationException: Connection refused akka.tcp://myapp#lighthouse-1:7892 at Akka.Remote.Transport.DotNetty.TcpTransport.AssociateInternal(Address remoteAddress) at Akka.Remote.Transport.DotNetty.DotNettyTransport.Associate(Address remoteAddress) --- End of inner exception stack trace --- at System.Threading.Tasks.Task1.GetResultCore(Boolean waitCompletionNotification) at Akka.Remote.Transport.ProtocolStateActor.<>c.<InitializeFSM>b__12_18(Task1 result) at System.Threading.Tasks.ContinuationResultTaskFromResultTask`2.InnerInvoke() at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state)
{"timestamp":"2021-09-08T22:26:59.025Z", "logger":"Akka.Event.DummyClassForStringSources", "message":Tried to associate with unreachable remote address [akka.tcp://myapp#lighthouse-0:7892]. Address is now gated for 5000 ms, all messages to this address will be delivered to dead letters. Reason: [Association failed with akka.tcp://myapp#lighthouse-0:7892] Caused by: [System.AggregateException: One or more errors occurred. (Connection refused akka.tcp://myapp#lighthouse-0:7892) ---> Akka.Remote.Transport.InvalidAssociationException: Connection refused akka.tcp://myapp#lighthouse-0:7892 at Akka.Remote.Transport.DotNetty.TcpTransport.AssociateInternal(Address remoteAddress) at Akka.Remote.Transport.DotNetty.DotNettyTransport.Associate(Address remoteAddress) --- End of inner exception stack trace --- at System.Threading.Tasks.Task1.GetResultCore(Boolean waitCompletionNotification) at Akka.Remote.Transport.ProtocolStateActor.<>c.<InitializeFSM>b__12_18(Task1 result) at System.Threading.Tasks.ContinuationResultTaskFromResultTask`2.InnerInvoke() at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state)
There are other "Now supervising", "Stopping" "Started" logs which I am omitting here.
Can you please verify if the HCON config is correct for split brain resolver and Sharding?
I think LightHouse/SeeNodes should not have the sharding configuration specified. I think it is a mistake.
I also think, split brain resolver configuration might be wrong in LightHouse/SeedNodes and should not be specified for seed nodes.
I appreciate your help.
Here is the HOCON for QueueServicer Trimmed
akka {
    loggers = ["Akka.Logger.log4net.Log4NetLogger, Akka.Logger.log4net"]
    log-config-on-start = on
    loglevel = "DEBUG"
    actor {
        provider = cluster
        serializers {
            hyperion = "Akka.Serialization.HyperionSerializer, Akka.Serialization.Hyperion"
        }
        serialization-bindings {
            "System.Object" = hyperion
        }
    }
    remote {
        dot-netty.tcp {
            ….
        }
    }
    cluster {
        seed-nodes = ["akka.tcp://myapp@lighthouse-0:7892", "akka.tcp://myapp@lighthouse-1:7892"]
        roles = ["QueueProcessor"]
        sharding {
            role = "QueueProcessor"
            state-store-mode = ddata
            remember-entities = true
            passivate-idle-entity-after = off
        }
        downing-provider-class = "Akka.Cluster.SplitBrainResolver, Akka.Cluster"
        split-brain-resolver {
            active-strategy = keep-majority
            stable-after = 20s
            keep-majority {
                role = "QueueProcessor"
            }
        }
        down-removal-margin = 20s
    }
    extensions = ["Akka.Cluster.Tools.PublishSubscribe.DistributedPubSubExtensionProvider,Akka.Cluster.Tools"]
}
Here is the HOCON for Lighthouse:
akka {
    loggers = ["Akka.Logger.log4net.Log4NetLogger, Akka.Logger.log4net"]
    log-config-on-start = on
    loglevel = "DEBUG"
    actor {
        provider = cluster
        serializers {
            hyperion = "Akka.Serialization.HyperionSerializer, Akka.Serialization.Hyperion"
        }
        serialization-bindings {
            "System.Object" = hyperion
        }
    }
    remote {
        dot-netty.tcp {
            …
        }
    }
    cluster {
        seed-nodes = ["akka.tcp://myapp@lighthouse-0:7892", "akka.tcp://myapp@lighthouse-1:7892"]
        roles = ["lighthouse"]
        sharding {
            role = "lighthouse"
            state-store-mode = ddata
            remember-entities = true
            passivate-idle-entity-after = off
        }
        downing-provider-class = "Akka.Cluster.SplitBrainResolver, Akka.Cluster"
        split-brain-resolver {
            active-strategy = keep-oldest
            stable-after = 30s
            keep-oldest {
                down-if-alone = on
                role = "lighthouse"
            }
        }
    }
}

I meant to reply to this sooner.
Here is your problem: you're using two different split brain resolver configurations, one for the QueueServicer nodes and one for Lighthouse. How your cluster resolves a partition is therefore going to be quite different depending upon which node is the leader of each half of the cluster.
I would stick with a simple keep-majority strategy and use it uniformly on all nodes throughout the cluster; we're very likely going to enable this by default in Akka.NET v1.5.
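For illustration, here is a minimal sketch of what that uniform configuration could look like; every node, Lighthouse and QueueServicer alike, would carry the same block (the strategy and timings below simply mirror your existing QueueServicer config, they are not tuned recommendations):
akka.cluster {
    downing-provider-class = "Akka.Cluster.SplitBrainResolver, Akka.Cluster"
    split-brain-resolver {
        active-strategy = keep-majority
        stable-after = 20s
        # no role restriction, so the majority is computed over all members
        # and every node resolves a partition the same way
    }
    down-removal-margin = 20s
}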
If you have any questions, please feel free to reach out to us: https://petabridge.com/

Related

Redis Reactive Streams Subscriber Thread Hangs

I am trying to use Spring Boot Redis Reactive Streams to subscribe to a stream as a listener. When data is inserted into the stream, the listener passes it to the client over a gRPC stream. I keep a pointer, which is just a Redis value tracking the last record delivered to the client, so that if the client reconnects after some time I can resume delivery from the last record sent up to the current data. The thread gets blocked randomly, and I hit a timeout when I set the pointer in Redis: the template.opsForValue().set(pointerKey, msg.getId().toString()).block(Duration.ofSeconds(5)) call is where the thread blocks.
Please let me know if anything is wrong in the code below. If I post 10 records to the stream, I get the error after receiving 5 records.
Code
public void subscribe() {
    String channelId = this.streamRequest.getTopic();
    String identifier = this.streamRequest.getIdentifier();
    boolean isNew = this.streamRequest.getNew();
    String pointerKey = channelId + "_" + identifier + "_pointer";
    StreamOffset<String> stringStreamOffset = StreamOffset.fromStart(channelId);
    if (isNew) {
        // the client wants to read data from the start, so remove the pointer
        template.opsForValue().delete(pointerKey).block();
    } else {
        String id = template.opsForValue().get(pointerKey).block();
        stringStreamOffset = id != null ? StreamOffset.create(channelId, ReadOffset.from(id)) : StreamOffset.fromStart(channelId);
    }
    logger.info("[SC] subscribed {}", this.streamRequest);
    Flux<ObjectRecord<String, String>> receiver = this.streamReceive.receive(stringStreamOffset);
    disposable = receiver.subscribe(msg -> {
        logger.info("Processing message {}", msg.getValue());
        String value = msg.getValue();
        StreamResponse streamResponse = StreamResponse.newBuilder().setData(value).build();
        try {
            logger.info("[SC] posting data to the grpc client topic {}", this.streamRequest);
            this.responseObserver.onNext(streamResponse);
            logger.info("[SC] Successfully posted data to the grpc client {}", this.streamRequest);
            logger.info("[SC] Updating pointer {}", pointerKey);
            template.opsForValue().set(pointerKey, msg.getId().toString())
                    .block(Duration.ofSeconds(5));
            logger.info("[SC] pointer update completed {}", pointerKey);
        } catch (Exception ex) {
            logger.error("Error:{}", ex.getMessage());
            this.responseObserver.onError(ex.getCause());
            close();
        }
    });
}
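From the thread dump below, the block() call runs on the Lettuce NIO event loop thread (lettuce-nioEventLoop-4-1), i.e. the same thread that would have to process the SET response it is waiting for. A non-blocking variant I could presumably use instead (only a sketch, assuming template is a ReactiveStringRedisTemplate) would chain the pointer update into the pipeline rather than blocking:
disposable = receiver.concatMap(msg -> {
    StreamResponse streamResponse = StreamResponse.newBuilder().setData(msg.getValue()).build();
    this.responseObserver.onNext(streamResponse);
    // update the pointer asynchronously; concatMap preserves per-message ordering
    return template.opsForValue().set(pointerKey, msg.getId().toString()).thenReturn(msg);
}).subscribe(
    msg -> logger.info("[SC] pointer update completed {}", pointerKey),
    ex -> {
        logger.error("Error:{}", ex.getMessage());
        this.responseObserver.onError(ex);
        close();
    });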
Error:
Name: lettuce-nioEventLoop-4-1
State: TIMED_WAITING on java.util.concurrent.CountDownLatch$Sync@3c9ebf7a
Total blocked: 2 Total waited: 60
Stack trace:
java.base@17.0.2/jdk.internal.misc.Unsafe.park(Native Method)
java.base@17.0.2/java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:252)
java.base@17.0.2/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:717)
java.base@17.0.2/java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1074)
java.base@17.0.2/java.util.concurrent.CountDownLatch.await(CountDownLatch.java:276)
app//reactor.core.publisher.BlockingSingleSubscriber.blockingGet(BlockingSingleSubscriber.java:121)
app//reactor.core.publisher.Mono.block(Mono.java:1731)
app//ai.jiffy.message.publisher.ws.StreamConnection.setPointer(StreamConnection.java:68)
app//ai.jiffy.message.publisher.ws.StreamConnection.lambda$new$0(StreamConnection.java:54)
app//ai.jiffy.message.publisher.ws.StreamConnection$$Lambda$1318/0x0000000801553610.accept(Unknown Source)
app//reactor.core.publisher.LambdaSubscriber.onNext(LambdaSubscriber.java:160)
app//reactor.core.publisher.FluxCreate$BufferAsyncSink.drain(FluxCreate.java:793)
app//reactor.core.publisher.FluxCreate$BufferAsyncSink.next(FluxCreate.java:718)
app//reactor.core.publisher.FluxCreate$SerializedFluxSink.next(FluxCreate.java:154)
app//org.springframework.data.redis.stream.DefaultStreamReceiver$StreamSubscription.onStreamMessage(DefaultStreamReceiver.java:398)
app//org.springframework.data.redis.stream.DefaultStreamReceiver$StreamSubscription.access$300(DefaultStreamReceiver.java:210)
app//org.springframework.data.redis.stream.DefaultStreamReceiver$StreamSubscription$1.onNext(DefaultStreamReceiver.java:360)
app//org.springframework.data.redis.stream.DefaultStreamReceiver$StreamSubscription$1.onNext(DefaultStreamReceiver.java:351)
app//reactor.core.publisher.FluxOnErrorResume$ResumeSubscriber.onNext(FluxOnErrorResume.java:79)
app//reactor.core.publisher.FluxMap$MapSubscriber.onNext(FluxMap.java:120)
app//reactor.core.publisher.FluxOnErrorResume$ResumeSubscriber.onNext(FluxOnErrorResume.java:79)
app//reactor.core.publisher.FluxUsingWhen$UsingWhenSubscriber.onNext(FluxUsingWhen.java:345)
app//reactor.core.publisher.MonoFlatMapMany$FlatMapManyInner.onNext(MonoFlatMapMany.java:250)
app//reactor.core.publisher.FluxOnErrorResume$ResumeSubscriber.onNext(FluxOnErrorResume.java:79)
app//reactor.core.publisher.MonoFlatMapMany$FlatMapManyInner.onNext(MonoFlatMapMany.java:250)
app//reactor.core.publisher.FluxMap$MapSubscriber.onNext(FluxMap.java:120)
app//io.lettuce.core.RedisPublisher$ImmediateSubscriber.onNext(RedisPublisher.java:886)
app//io.lettuce.core.RedisPublisher$RedisSubscription.onNext(RedisPublisher.java:291)
app//io.lettuce.core.output.StreamingOutput$Subscriber.onNext(StreamingOutput.java:64)
app//io.lettuce.core.output.StreamReadOutput.complete(StreamReadOutput.java:110)
app//io.lettuce.core.protocol.RedisStateMachine.doDecode(RedisStateMachine.java:343)
app//io.lettuce.core.protocol.RedisStateMachine.decode(RedisStateMachine.java:295)
app//io.lettuce.core.protocol.CommandHandler.decode(CommandHandler.java:841)
app//io.lettuce.core.protocol.CommandHandler.decode0(CommandHandler.java:792)
app//io.lettuce.core.protocol.CommandHandler.decode(CommandHandler.java:766)
app//io.lettuce.core.protocol.CommandHandler.decode(CommandHandler.java:658)
app//io.lettuce.core.protocol.CommandHandler.channelRead(CommandHandler.java:598)
app//io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
app//io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
app//io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
app//io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
app//io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
app//io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
app//io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
app//io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166)
app//io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:722)
app//io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:658)
app//io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:584)
app//io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:496)
app//io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986)
app//io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
app//io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
java.base@17.0.2/java.lang.Thread.run(Thread.java:833)
Thanks in advance.

Got NPE after integrating play framework with play-redis

After integrating play-redis (https://github.com/KarelCemus/play-redis) with the Play framework, I get an error when a request comes in:
[20211204 23:20:48.350][HttpErrorHandler.scala:272:onServerError][E] Error while handling error
java.lang.NullPointerException: null
at play.api.http.HttpErrorHandlerExceptions$.convertToPlayException(HttpErrorHandler.scala:377)
at play.api.http.HttpErrorHandlerExceptions$.throwableToUsefulException(HttpErrorHandler.scala:367)
at play.api.http.DefaultHttpErrorHandler.onServerError(HttpErrorHandler.scala:264)
at play.core.server.Server$$anonfun$handleErrors$1$1.applyOrElse(Server.scala:109)
at play.core.server.Server$$anonfun$handleErrors$1$1.applyOrElse(Server.scala:105)
at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:35)
at play.core.server.Server$.getHandlerFor(Server.scala:129)
at play.core.server.AkkaHttpServer.handleRequest(AkkaHttpServer.scala:317)
at play.core.server.AkkaHttpServer.$anonfun$createServerBinding$1(AkkaHttpServer.scala:224)
at akka.stream.impl.fusing.MapAsync$$anon$30.onPush(Ops.scala:1297)
at akka.stream.impl.fusing.GraphInterpreter.processPush(GraphInterpreter.scala:541)
at akka.stream.impl.fusing.GraphInterpreter.processEvent(GraphInterpreter.scala:495)
at akka.stream.impl.fusing.GraphInterpreter.execute(GraphInterpreter.scala:390)
at akka.stream.impl.fusing.GraphInterpreterShell.runBatch(ActorGraphInterpreter.scala:625)
at akka.stream.impl.fusing.GraphInterpreterShell$AsyncInput.execute(ActorGraphInterpreter.scala:502)
at akka.stream.impl.fusing.GraphInterpreterShell.processEvent(ActorGraphInterpreter.scala:600)
at akka.stream.impl.fusing.ActorGraphInterpreter.akka$stream$impl$fusing$ActorGraphInterpreter$$processEvent(ActorGraphInterpreter.scala:775)
at akka.stream.impl.fusing.ActorGraphInterpreter$$anonfun$receive$1.applyOrElse(ActorGraphInterpreter.scala:790)
at akka.actor.Actor.aroundReceive(Actor.scala:537)
at akka.actor.Actor.aroundReceive$(Actor.scala:535)
at akka.stream.impl.fusing.ActorGraphInterpreter.aroundReceive(ActorGraphInterpreter.scala:691)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:579)
at akka.actor.ActorCell.invoke(ActorCell.scala:547)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:270)
at akka.dispatch.Mailbox.run(Mailbox.scala:231)
at akka.dispatch.Mailbox.exec(Mailbox.scala:243)
at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290)
at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1020)
at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1656)
at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1594)
at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:183)
I am sure the cause must be play-redis, because the app runs smoothly without it. In particular, I use a custom implementation of the configuration provider, since I need to get the IP and port by calling the REST API of a name service.
@Singleton
class CustomRedisInstance @Inject() (
    config: Configuration,
    polarisExtensionService: PolarisExtensionService,
    @NamedCache("redisConnection") redisConnectionCache: AsyncCacheApi)(implicit
    asyncExecutionContext: AsyncExecutionContext)
    extends RedisStandalone
    with RedisDelegatingSettings {

  val pathPrefix = "play.cache.redis"
  def name = "play"

  private def defaultSettings =
    RedisSettings.load(
      // this should always be "play.cache.redis"
      // as it is the root of the configuration with all defaults
      config.underlying,
      "play.cache.redis")

  def settings: RedisSettings = {
    RedisSettings
      .withFallback(defaultSettings)
      .load(
        // this is the path to the actual configuration of the instance
        //
        // in case of named caches, this could be, e.g., "play.cache.redis.instances.my-cache"
        //
        // in that case, the name of the cache is "my-cache" and has to be considered in
        // the bindings in the CustomCacheModule (instead of "play", which is used now)
        config.underlying,
        "play.cache.redis")
  }

  def host: String = {
    val connectionInfoFuture = getConnectionInfoFromPolaris
    Try(Await.result(connectionInfoFuture, 10.seconds)) match {
      case Success(extractedVal) => extractedVal.host
      case Failure(_) => config.get[String](s"$pathPrefix.host")
    }
  }

  def port: Int = {
    val connectionInfoFuture = getConnectionInfoFromPolaris
    Try(Await.result(connectionInfoFuture, 10.seconds)) match {
      case Success(extractedVal) => extractedVal.port
      case Failure(_) => config.get[Int](s"$pathPrefix.port")
    }
  }

  def database: Option[Int] = Some(config.get[Int](s"$pathPrefix.database"))
  def password: Option[String] = Some(config.get[String](s"$pathPrefix.password"))
}
But play-redis itself produces no error logs. After all this hard work of reading the manual and examples, it turns out that I should switch to Jedis or Lettuce? Hopeless now.
The reason is that I want to use Redis together with Caffeine, which causes a collision; as the documentation says, I need to rename the default-cache to redis in application.conf:
play.modules.enabled += play.api.cache.redis.RedisCacheModule
# provide additional configuration in the custom module
play.modules.enabled += services.CustomCacheModule
play.cache.redis {
# do not bind default unqualified APIs
bind-default: false
# name of the instance in simple configuration,
# i.e., not located under `instances` key
# but directly under 'play.cache.redis'
default-cache: "redis"
source = custom
host = 127.0.0.1
# redis server: port
port = 6380
# redis server: database number (optional)
database = 0
# authentication password (optional)
password = "#########"
refresh-minute = 10
}
So in the CustomCacheModule, the input param of NamedCacheImpl needs to change from play to redis.
class CustomCacheModule extends AbstractModule {
  override def configure(): Unit = {
    // NamedCacheImpl's input used to be "play"
    bind(classOf[RedisInstance]).annotatedWith(new NamedCacheImpl("redis")).to(classOf[CustomRedisInstance])
    ()
  }
}

What is the correct use of consumer groups in Spring Cloud Stream/Data Flow and RabbitMQ?

A follow-up to this:
one SCDF source, 2 processors but only 1 processes each item
The 2 processors (del-1 and del-2) in the picture are receiving the same data within milliseconds of each other. I'm trying to rig this so that del-2 never receives the same item as del-1 and vice versa. So obviously I've got something configured incorrectly, but I'm not sure where.
My processor has the following application.properties
spring.application.name=${vcap.application.name:sample-processor}
info.app.name=@project.artifactId@
info.app.description=@project.description@
info.app.version=@project.version@
management.endpoints.web.exposure.include=health,info,bindings
spring.autoconfigure.exclude=org.springframework.boot.autoconfigure.security.servlet.SecurityAutoConfiguration
spring.cloud.stream.bindings.input.group=input
Is "spring.cloud.stream.bindings.input.group" specified correctly?
Here's the processor code:
@Transformer(inputChannel = Processor.INPUT, outputChannel = Processor.OUTPUT)
public Object transform(String inputStr) throws InterruptedException {
    ApplicationLog log = new ApplicationLog(this, "timerMessageSource");
    String message = " I AM [" + inputStr + "] AND I HAVE BEEN PROCESSED!!!!!!!";
    log.info("SampleProcessor.transform() incoming inputStr=" + inputStr);
    return message;
}
Is the @Transformer annotation the proper way to link this bit of code with "spring.cloud.stream.bindings.input.group" from application.properties? Are there any other annotations necessary?
Here's my source:
private String format = "EEEEE dd MMMMM yyyy HH:mm:ss.SSSZ";

@Bean
@InboundChannelAdapter(value = Source.OUTPUT, poller = @Poller(fixedDelay = "1000", maxMessagesPerPoll = "1"))
public MessageSource<String> timerMessageSource() {
    ApplicationLog log = new ApplicationLog(this, "timerMessageSource");
    String message = new SimpleDateFormat(format).format(new Date());
    log.info("SampleSource.timeMessageSource() message=[" + message + "]");
    return () -> new GenericMessage<>(new SimpleDateFormat(format).format(new Date()));
}
I'm confused about "value = Source.OUTPUT". Does this mean my processor needs to be named differently?
Is the inclusion of @Poller causing me a problem somehow?
This is how I define the 2 processor streams (del-1 and del-2) in the SCDF shell:
stream create del-1 --definition ":split > processor-that-does-everything-sleeps5 --spring.cloud.stream.bindings.applicationMetrics.destination=metrics > :merge"
stream create del-2 --definition ":split > processor-that-does-everything-sleeps5 --spring.cloud.stream.bindings.applicationMetrics.destination=metrics > :merge"
Do I need to do anything differently there?
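One point of context on where the group property comes in: in Spring Cloud Stream, consumers compete for messages only when they bind to the same destination with the same group; two separate streams reading one named destination otherwise end up in distinct consumer groups, and RabbitMQ delivers a copy to each. A sketch of pinning both deployments to a single group via deployment properties (the app name matches the definitions above; the group name is a placeholder):
stream deploy del-1 --properties "app.processor-that-does-everything-sleeps5.spring.cloud.stream.bindings.input.group=shared-group"
stream deploy del-2 --properties "app.processor-that-does-everything-sleeps5.spring.cloud.stream.bindings.input.group=shared-group"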
All of this is running in Docker/K8s.
RabbitMQ is given by bitnami/rabbitmq:3.7.2-r1 and is configured with the following props:
RABBITMQ_USERNAME: user
RABBITMQ_PASSWORD: <redacted>
RABBITMQ_ERL_COOKIE: <redacted>
RABBITMQ_NODE_PORT_NUMBER: 5672
RABBITMQ_NODE_TYPE: stats
RABBITMQ_NODE_NAME: rabbit@localhost
RABBITMQ_CLUSTER_NODE_NAME:
RABBITMQ_DEFAULT_VHOST: /
RABBITMQ_MANAGER_PORT_NUMBER: 15672
RABBITMQ_DISK_FREE_LIMIT: "6GiB"
Are any other environment variables necessary?

Recover connection when a new master is elected in the cluster

I have a Redis cluster with 3 nodes: one master and two slaves holding replicas of the master. When I kill the master instance, Redis Sentinel promotes another node to master, and it starts to accept writes.
During my tests I noticed that once the new master is promoted, the first operation against Redis with SE.Redis fails with:
StackExchange.Redis.RedisConnectionException: SocketFailure on GET
---> System.IO.IOException: Unable to read data from the transport connection: An existing connection was forcibly closed by the remote host.
---> System.Net.Sockets.SocketException: An existing connection was forcibly closed by the remote host
To avoid it, I've implemented retry logic as below. Is there any better alternative?
private RedisValue RedisGet(string key)
{
    return RedisOperation(() =>
    {
        RedisKey redisKey = key;
        RedisValue redisValue = connection.StringGet(redisKey);
        return redisValue;
    });
}

private T RedisOperation<T>(Func<T> act)
{
    int timeToSleepBeforeRetryInMilliseconds = 20;
    DateTime startTime = DateTime.Now;
    while (true)
    {
        try
        {
            return act();
        }
        catch (Exception e)
        {
            Debug.WriteLine("Failed to perform REDIS OP");
            TimeSpan passedTime = DateTime.Now - startTime;
            if (this.retryTimeout < passedTime)
            {
                Debug.WriteLine("ABORTING re-try to REDIS OP");
                throw;
            }
            else
            {
                int remainingTimeout = (int)(this.retryTimeout.TotalMilliseconds - passedTime.TotalMilliseconds);
                // if the remaining time is shorter than the sleep interval,
                // wait only that long and then give it one last try
                if (remainingTimeout < timeToSleepBeforeRetryInMilliseconds)
                {
                    timeToSleepBeforeRetryInMilliseconds = remainingTimeout;
                }
            }
            Debug.WriteLine("Sleeping " + timeToSleepBeforeRetryInMilliseconds + " before next try");
            System.Threading.Thread.Sleep(timeToSleepBeforeRetryInMilliseconds);
        }
    }
}
TL;DR: don't use Sentinel with StackExchange.Redis, as Sentinel support is still not implemented in this client library.
See https://github.com/StackExchange/StackExchange.Redis/labels/sentinel for all the open issues; there is also a pretty good PR that has been open for about a year now.
That being said, I also had a relatively good experience with retries, but I would never use that approach in production, as it is not reliable at all.
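One workaround people commonly fall back on (a sketch only; the endpoints below are placeholders, and this is not real Sentinel support) is to list every node of the replica set as an endpoint and let the multiplexer rediscover the master after a failover, rather than pointing the client at Sentinel:
var options = new ConfigurationOptions
{
    // placeholder endpoints: list every master/replica candidate here
    EndPoints = { "redis-node-1:6379", "redis-node-2:6379", "redis-node-3:6379" },
    // keep retrying in the background instead of throwing on first failure
    AbortOnConnectFail = false
};
var muxer = ConnectionMultiplexer.Connect(options);
IDatabase db = muxer.GetDatabase();
This doesn't remove the need for some retry handling around the failover window; it only shortens it.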

Cluster sharding client not connecting with host

After recent investigation and a Stack Overflow question, I realise that cluster sharding is a better option than a cluster-consistent-hash-router. But I am having trouble getting a two-process cluster going.
One process is the Seed and the other is the Client. The Seed node seems to continuously throw dead letter messages (see the end of this question).
The Seed HOCON follows:
akka {
    loglevel = "INFO"
    actor {
        provider = "Akka.Cluster.ClusterActorRefProvider, Akka.Cluster"
        serializers {
            wire = "Akka.Serialization.WireSerializer, Akka.Serialization.Wire"
        }
        serialization-bindings {
            "System.Object" = wire
        }
    }
    remote {
        dot-netty.tcp {
            hostname = "127.0.0.1"
            port = 5000
        }
    }
    persistence {
        journal {
            plugin = "akka.persistence.journal.sql-server"
            sql-server {
                class = "Akka.Persistence.SqlServer.Journal.SqlServerJournal, Akka.Persistence.SqlServer"
                schema-name = dbo
                auto-initialize = on
                connection-string = "Data Source=localhost;Integrated Security=True;MultipleActiveResultSets=True;Initial Catalog=ClusterExperiment01"
                plugin-dispatcher = "akka.actor.default-dispatcher"
                connection-timeout = 30s
                table-name = EventJournal
                timestamp-provider = "Akka.Persistence.Sql.Common.Journal.DefaultTimestampProvider, Akka.Persistence.Sql.Common"
                metadata-table-name = Metadata
            }
        }
        sharding {
            connection-string = "Data Source=localhost;Integrated Security=True;MultipleActiveResultSets=True;Initial Catalog=ClusterExperiment01"
            auto-initialize = on
            plugin-dispatcher = "akka.actor.default-dispatcher"
            class = "Akka.Persistence.SqlServer.Journal.SqlServerJournal, Akka.Persistence.SqlServer"
            connection-timeout = 30s
            schema-name = dbo
            table-name = ShardingJournal
            timestamp-provider = "Akka.Persistence.Sql.Common.Journal.DefaultTimestampProvider, Akka.Persistence.Sql.Common"
            metadata-table-name = ShardingMetadata
        }
    }
    snapshot-store {
        sharding {
            class = "Akka.Persistence.SqlServer.Snapshot.SqlServerSnapshotStore, Akka.Persistence.SqlServer"
            plugin-dispatcher = "akka.actor.default-dispatcher"
            connection-string = "Data Source=localhost;Integrated Security=True;MultipleActiveResultSets=True;Initial Catalog=ClusterExperiment01"
            connection-timeout = 30s
            schema-name = dbo
            table-name = ShardingSnapshotStore
            auto-initialize = on
        }
    }
    cluster {
        seed-nodes = ["akka.tcp://my-cluster-system@127.0.0.1:5000"]
        roles = ["Seed"]
        sharding {
            journal-plugin-id = "akka.persistence.sharding"
            snapshot-plugin-id = "akka.snapshot-store.sharding"
        }
    }
}
I have a method that essentially turns the above into a Config like so:
var config = NodeConfig.Create(/* HOCON above */).WithFallback(ClusterSingletonManager.DefaultConfig());
Without the "WithFallback" I get a null reference exception out of the config generation.
And then generates the system like so:
var system = ActorSystem.Create("my-cluster-system", config);
The client creates its system in the same manner, and its HOCON is almost identical aside from:
{
    remote {
        dot-netty.tcp {
            hostname = "127.0.0.1"
            port = 5001
        }
    }
    cluster {
        seed-nodes = ["akka.tcp://my-cluster-system@127.0.0.1:5000"]
        roles = ["Client"]
        role.["Seed"].min-nr-of-members = 1
        sharding {
            journal-plugin-id = "akka.persistence.sharding"
            snapshot-plugin-id = "akka.snapshot-store.sharding"
        }
    }
}
The Seed node creates the sharding like so:
ClusterSharding.Get(system).Start(
    typeName: "company-router",
    entityProps: Props.Create(() => new CompanyDeliveryActor()),
    settings: ClusterShardingSettings.Create(system),
    messageExtractor: new RouteExtractor(100)
);
And the client creates a sharding proxy like so:
ClusterSharding.Get(system).StartProxy(
    typeName: "company-router",
    role: "Seed",
    messageExtractor: new RouteExtractor(100));
The RouteExtractor is:
public class RouteExtractor : HashCodeMessageExtractor
{
    public RouteExtractor(int maxNumberOfShards) : base(maxNumberOfShards)
    {
    }

    public override string EntityId(object message) => (message as IHasRouting)?.Company?.VolumeId.ToString();

    public override object EntityMessage(object message) => message;
}
In this scenario the VolumeId is always the same (just for the experiment's sake).
Both processes come to life, but the Seed keeps throwing this error to the log:
[INFO][7/05/2017 9:00:58 AM][Thread 0003][akka://my-cluster-system/user/sharding/company-routerCoordinator/singleton/coordinator] Message Register from akka.tcp://my-cluster-system@127.0.0.1:5000/user/sharding/company-router to akka://my-cluster-system/user/sharding/company-routerCoordinator/singleton/coordinator was not delivered. 4 dead letters encountered.
P.S. I am not using Lighthouse.
From a quick look: you're starting a cluster sharding proxy on your client node and telling it that the sharded nodes are those with the Seed role. This doesn't match the cluster sharding definition on the seed node, where you haven't specified any role.
Since there is no role to limit it, cluster sharding on the seed node will treat all nodes in the cluster as perfectly capable of hosting sharded actors, including the client node, which doesn't have (non-proxy) cluster sharding instantiated on it.
This may not be the only issue, but you could either host cluster sharding on all of your nodes, or use ClusterShardingSettings.Create(system).WithRole("Seed") to limit your shards to a specific subset of nodes (those having the Seed role) in the cluster.
Thanks Horusiath, that's fixed it:
return sharding.Start(
    typeName: "company-router",
    entityProps: Props.Create(() => new CompanyDeliveryActor()),
    settings: ClusterShardingSettings.Create(system).WithRole("Seed"),
    messageExtractor: new RouteExtractor(100)
);
The clustered shard is now communicating between the 2 processes. Thanks very much for that bit.