I want to test a function that at some point asks an EventSourcedBehaviorWithEnforcedReplies via ClusterSharding.
I prepare the ClusterSharding like:
ClusterSharding.get(testKit.system()).init(
Entity.of(
ENTITY_TYPE_KEY,
entityContext -> new Entity(entityContext.getEntityId())));
The function sends a command:
CompletionStage<ActorAnswer> promisedAnswer = sharding
.entityRefFor(ENTITY_TYPE_KEY, identifier)
.ask(CommandToExecute::new, ASK_TIMEOUT);
The CompletionStage never completes...
What am I missing?
After some research:
1) Check that your application-test.conf contains:
akka {
actor {
provider = cluster
}
}
2) The cluster needs to be formed by having the node join itself (a self-join) - the easiest way I found was to do it programmatically:
Cluster cluster = Cluster.get(testKit.system());
cluster.manager().tell(Join.create(cluster.selfMember().address()));
3) Then we can talk about cluster sharding :)
ClusterSharding clusterSharding = ClusterSharding.get(testKit.system());
clusterSharding.init(
    Entity.of(ENTITY_TYPE_KEY, entityContext -> new Entity(entityContext.getEntityId())));
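Putting 1)-3) together, a minimal test setup could look roughly like this (just a sketch - it reuses the ENTITY_TYPE_KEY, Entity, CommandToExecute and ASK_TIMEOUT names from the question, and assumes application-test.conf is on the test classpath):
// imports assumed: akka.actor.testkit.typed.javadsl.ActorTestKit,
// akka.cluster.typed.Cluster, akka.cluster.typed.Join,
// akka.cluster.sharding.typed.javadsl.ClusterSharding and Entity
ActorTestKit testKit = ActorTestKit.create(); // picks up application-test.conf (provider = cluster)

// 2) form a single-node cluster by self-joining
Cluster cluster = Cluster.get(testKit.system());
cluster.manager().tell(Join.create(cluster.selfMember().address()));

// 3) init sharding, then the ask from the question should complete
// (the join is asynchronous, so use a generous ASK_TIMEOUT)
ClusterSharding sharding = ClusterSharding.get(testKit.system());
sharding.init(Entity.of(ENTITY_TYPE_KEY,
    entityContext -> new Entity(entityContext.getEntityId())));

CompletionStage<ActorAnswer> promisedAnswer = sharding
    .entityRefFor(ENTITY_TYPE_KEY, "some-entity-id")
    .ask(CommandToExecute::new, ASK_TIMEOUT);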
I am using the Confluent parallel-consumer in order to achieve fast writes into different data stores. I implemented my code and everything worked just fine locally with Docker.
Once I started several hosts with several consumers (all with the same group id), I noticed that only one of the nodes (processes) was really consuming data. The topic I am reading from has 24 partitions and I have 3 different nodes, so I expected Kafka to split the work between them.
Here are parts of my code:
fun buildConsumer(config: KafkaConsumerConfig): KafkaConsumer<String, JsonObject> {
val props = Properties()
props[ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG] = config.kafkaBootstrapServers
props[ConsumerConfig.AUTO_OFFSET_RESET_CONFIG] = "earliest"
props[ConsumerConfig.GROUP_ID_CONFIG] = "myGroup"
// Auto commit must be false in parallel consumer
props[ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG] = false
props[ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG] = StringDeserializer::class.java.name
props[ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG] = JsonObjectDeSerializer::class.java.name
val consumer = KafkaConsumer<String, JsonObject>(props)
return consumer
}
private fun createReactParallelConsumer(): ReactorProcessor<String, JsonObject> {
val options = ParallelConsumerOptions.builder<String, JsonObject>()
.ordering(ParallelConsumerOptions.ProcessingOrder.KEY)
.maxConcurrency(10)
.batchSize(1)
.consumer(buildConsumer(kafkaConsumerConfig))
.build()
return ReactorProcessor(options)
}
And my main code:
pConsumer = createReactParallelConsumer()
pConsumer.subscribe(UniLists.of(kafkaConsumerConfig.kafkaTopic))
pConsumer.react { context ->
batchProcessor.processBatch(context)
}
Would appreciate any advice
We hit an issue that was fixed in version 0.5.2.4: https://github.com/confluentinc/parallel-consumer/issues/409
The parallel consumer kept old, unfinished offsets. Since our consumer was slow (for many different reasons), we reached the end of the retention period (with the earliest strategy), so every time we restarted the consumer it scanned all of those incompatible offsets, which it never truncated - that was the bug. The fix was simply updating the version from 0.5.2.3 to 0.5.2.4.
I'm working on a project that uses a Redis Sentinel through ServiceStack. When the project was set up, the original developer used Redis both for caching and for maintaining a series of queues that power the logic of the system. Due to performance issues, we are planning to spin up a new Redis Sentinel box and split the functionality, with the caching done on one server and the queuing done on another.
I was able to make a couple of small changes to a local instance to split it between two servers by using the RedisClient and the PooledClient:
container.Register<IRedisClientsManager>(c => new RedisManagerPool(redCon, poolConfig));
container.Register<PooledRedisClientManager>(c => new PooledRedisClientManager(redCon2Test));
container.Register(c => c.Resolve<IRedisClientsManager>().GetClient());
container.Register(c => c.Resolve<PooledRedisClientManager>().GetClient());
// REDIS CACHE
container.Register(c => c.Resolve<PooledRedisClientManager>().GetCacheClient());
// SESSION
container.Register(c => new SessionFactory(c.Resolve<ICacheClient>()));
// REDIS MQ
container.Register<IMessageService>(c => new RedisMqServer(c.Resolve<IRedisClientsManager>())
{
DisablePriorityQueues = true,
DisablePublishingResponses = true,
RetryCount = 2
});
container.Register(q => q.Resolve<IMessageService>().MessageFactory);
this.RegisterHandlers(container.Resolve<IMessageService>() as RedisMqServer);
The problem, though, is that I don't have Redis Sentinel set up on the machine I'm using, and when I tried to drop a Sentinel connection in as a pooled Redis connection, I received compilation errors on the second registration. It will let me cast it as a PooledRedisClientManager, but I wasn't sure whether Pooled and Sentinel would even play well together to begin with:
if (useSentinel)
{
var hosts = redCon.Split(',');
var sentinel = new RedisSentinel(hosts, masterName)
{
RedisManagerFactory = CreateRedisManager,
ScanForOtherSentinels = false,
SentinelWorkerConnectTimeoutMs = 150,
OnWorkerError = OnWorkerError,
OnFailover = OnSentinelFailover,
OnSentinelMessageReceived = (x, y) => Log.Debug($"MSG: {x} DETAIL: {y}")
};
container.Register(c => sentinel.Start());
var hosts2 = redCon.Split(',');
var sentinel2 = new RedisSentinel(hosts2, masterName)
{
RedisManagerFactory = CreatePooledRedisClientManager,
ScanForOtherSentinels = false,
SentinelWorkerConnectTimeoutMs = 150,
OnWorkerError = OnWorkerError,
OnFailover = OnSentinelFailover,
OnSentinelMessageReceived = (x, y) => Log.Debug($"MSG: {x} DETAIL: {y}")
};
container.Register<PooledRedisClientManager>(c => sentinel2.Start());
}
But honestly, I'm not sure this is even the correct way to go about this. Should I even be using the Pooled manager at all? Is there a good way to register two different Redis Sentinel servers in the container and split them in the way I am attempting?
ServiceStack only allows one IRedisClientsManager implementation per AppHost. If you're using RedisSentinel, its .Start() method will return a pre-configured PooledRedisClientManager that utilizes the RedisSentinel configuration.
If you wanted RedisMqServer to use a different RedisSentinel cluster, you should avoid duplicating Redis registrations in the IOC and just configure it directly with RedisMqServer, e.g.:
container.Register<IMessageService>(c => new RedisMqServer(sentinel2.Start())
{
DisablePriorityQueues = true,
DisablePublishingResponses = true,
RetryCount = 2
});
However, given that RedisSentinel typically requires 6 nodes for a minimal highly available configuration, it seems counterproductive to double the required infrastructure just to have a separate Sentinel cluster for RedisMQ, especially when the load of using Redis as a message transport should be negligible compared to the compute resources needed to process the messages. What's the MQ throughput? You should verify that the load on the Redis servers is actually the bottleneck, as that's very unlikely.
I would recommend avoiding this duplicated complexity and using a different MQ server instead: see if Background MQ is an option, where MQ requests are executed in memory in background threads. If you need a distributed MQ, look at Rabbit MQ, which is purpose-built for the task and would require a lot less maintenance than trying to manage two separate RedisSentinel cluster configurations.
I'm relatively new to Apache Ignite. I'm using Ignite compute to distribute tasks to nodes. My goal is a task dispatcher that produces tasks and submits these only to nodes that are "free". One node can only do one task at a time. If all nodes have a task running, the dispatcher shall wait for the next node to become available and then submit the next task.
I can implement this with a queue and async Callables; however, I wonder whether Ignite has a built-in class that does something like this. I'm not sure the class ComputeTaskSplitAdapter is what I need to look at, as I don't fully understand its purpose.
Any help appreciated.
Server nodes can join and leave the cluster while tasks are distributed.
Tasks can take different amounts of time on the nodes, and as soon as a server finishes a task it should get the next one.
Here's my node code:
JobStealingCollisionSpi spi = new JobStealingCollisionSpi();
spi.setActiveJobsThreshold(1);
IgniteConfiguration cfg = new IgniteConfiguration();
cfg.setCollisionSpi(spi);
Ignition.start(cfg);
And this is my job distribution code (for testing):
JobStealingCollisionSpi spi = new JobStealingCollisionSpi();
spi.setActiveJobsThreshold(1);
IgniteConfiguration cfg = new IgniteConfiguration();
cfg.setCollisionSpi(spi);
Ignition.setClientMode(true);
Ignite ignite = Ignition.start(cfg);
for (int i = 0; i < 10; i++)
{
ignite.compute().runAsync(new IgniteRunnable()
{
@Override
public void run()
{
System.out.print("Sleeping...");
try
{
Thread.sleep(10000);
} catch (InterruptedException e)
{
e.printStackTrace();
}
System.out.println("Done.");
}
});
}
Yes, Apache Ignite has direct support for it. Please take a look at the One-at-a-Time section in the Job Scheduling documentation: https://apacheignite.readme.io/docs/job-scheduling#section-one-at-a-time
Note that every server has its own waiting queue and servers will move to the next job in their queue immediately after they are done with a previous job.
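If it helps, my reading of that section is that it boils down to configuring FifoQueueCollisionSpi with the parallel jobs number set to 1 on the server nodes, roughly like this (sketch only):
// Sketch: each server node executes at most one job at a time;
// the rest wait in that node's FIFO queue.
FifoQueueCollisionSpi fifoSpi = new FifoQueueCollisionSpi();
fifoSpi.setParallelJobsNumber(1);

IgniteConfiguration serverCfg = new IgniteConfiguration();
serverCfg.setCollisionSpi(fifoSpi);
Ignition.start(serverCfg);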
If you would like even more aggressive scheduling, then you can take a look at Job-Stealing scheduling here: https://ignite.apache.org/releases/latest/javadoc/org/apache/ignite/spi/collision/jobstealing/JobStealingCollisionSpi.html
With Job Stealing enabled, servers will steal jobs from the job queues of other servers once their own queue becomes empty. Most of the parameters are configurable.
We are using the multi-site WAN configuration. We have two clusters across geographical distances in North America and Europe.
Context: Cluster 1 has two members, A and B, that are both gateway senders. Cluster 2 has two members, C and D, that are both gateway receivers. When member A in cluster 1 starts, it reads data from the database and loads it into the GemFire cache, which gets sent to cluster 2. Everything so far is good.
Problem: If both members in cluster 2 are restarted at the same time, they lose all the GemFire regions/data. At that point, we could restart member A in cluster 1; it would again load data from the DB and push it to cluster 2. But we would prefer to avoid restarting member A, and we would like to do this without persisting to disk.
Is there a solution where if cluster 2 is restarted, it can request a full copy of data from cluster 1?
Not sure if it's possible, but could we somehow set up peer-to-peer for the gateway receivers in cluster 2 (on top of WAN), so they would be updated automatically upon restart?
Thanks
Getting a full copy of data over WAN is not supported at this time. What you could do instead is run a function on all members of cluster 1 that simply iterates over all the data and puts it back into the region, i.e. something like:
public void execute(FunctionContext context) {
RegionFunctionContext ctx = (RegionFunctionContext)context;
Region localData = PartitionRegionHelper.getLocalDataForContext(ctx);
for (Object key : localData.keySet()) {
Object val = localData.get(key);
localData.put(key, val);
}
}
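To trigger it, you can execute the function over the region from any member or client of cluster 1. A rough sketch (TouchEntriesFunction is just a hypothetical name for the class containing the execute method above, and cache is your existing GemFire cache):
// Sketch: runs the function on all members hosting the region; re-putting
// each entry causes it to be queued on the gateway senders again.
Region<Object, Object> region = cache.getRegion("yourRegionName");
FunctionService.onRegion(region).execute(new TouchEntriesFunction());
// (the function should either send a result via context.getResultSender()
//  or override hasResult() to return false)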
I'd like to use an Ignite cluster to warm a PARTITIONED cache from an existing database. The existing database is not partitioned and is expensive to scan, so I'd like to perform a single scan when the cache is created by the cluster. Once the job completes, the result would be a cache containing all data from the existing database, partitioned and evenly distributed across the cluster.
How do you implement a job that runs when a cache is created by Ignite?
Ignite integrates with underlying stores via CacheStore [1] implementations. Refer to [2] for details about your particular use case.
[1] https://apacheignite.readme.io/docs/persistent-store
[2] https://apacheignite.readme.io/docs/data-loading
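In short, [1] describes implementing a CacheStore whose loadCache() is invoked on every node when you call IgniteCache#loadCache(); since the existing database is expensive to scan, the data-streamer approach from [2], driven from a single node, may be the closer fit for a one-off single scan. A rough sketch (the cache name, key/value types and the DB-scan helper are placeholders):
// Sketch: a single node scans the database once; the streamer routes each
// entry to the node that owns its partition. Assumes the cache already exists
// (e.g. via ignite.getOrCreateCache("personCache")).
try (IgniteDataStreamer<Long, Person> streamer = ignite.dataStreamer("personCache")) {
    for (Person p : scanExistingDatabase()) {   // hypothetical single-scan DB helper
        streamer.addData(p.getId(), p);
    }
}   // close() flushes the remaining buffered entries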
You can create a Service that runs once on cluster start and then cancels itself. It can use a cache to store state, so it will not run if it's deployed in the cluster a second time.
The following abstract Service runs executeOnce once per cluster the first time it's deployed after cluster start:
abstract class ExecuteOnceService extends Service {
val ExecuteOnceCacheName = "_execute_once_service"
val config = new CacheConfiguration[String, java.lang.Boolean](ExecuteOnceCacheName)
.setCacheMode(CacheMode.PARTITIONED)
.setAtomicityMode(CacheAtomicityMode.TRANSACTIONAL)
@IgniteInstanceResource
var ignite: Ignite = _
override def execute(ctx: ServiceContext): Unit = {
val cache = ignite.getOrCreateCache(config)
val executed = cache.getAndPutIfAbsent(ctx.name(), java.lang.Boolean.TRUE)
if (executed != java.lang.Boolean.TRUE) executeOnce(ctx)
ignite.services().cancel(ctx.name())
}
def executeOnce(ctx: ServiceContext): Unit
}
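A concrete subclass only needs to implement executeOnce with the cache-warming logic; it can then be deployed once, for example as a cluster singleton (Java sketch, with WarmUpService as a hypothetical subclass):
// Sketch: deploy the one-shot service; the marker cache above ensures
// executeOnce() is not run again even if the service is re-deployed later.
Ignite ignite = Ignition.start();
ignite.services().deployClusterSingleton("cache-warm-up", new WarmUpService());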