Apache Ignite - Distributed Queue and Executors

I am planning to use Apache Ignite's distributed queue.
I am using Ignite with a Spring Boot application, so on bootup I add 20 names to a queue. But since there are 3 servers in the cluster, the same 20 names get added 3 times. I want to add them to the queue only once.
Ignite ignite = Ignition.ignite();
IgniteQueue<String> queue = ignite.queue(
    "queueName", // Queue name.
    0,           // Queue capacity. 0 for unbounded queue.
    null         // Collection configuration.
);
Distributed executors will poll from the queue and run the task. Each executor is expected to poll, run the task, and then add the same name back to the queue, achieving round-robin behavior.
Only one executor should be running a given task at any point in time, even though there are multiple servers in the cluster.
Any suggestions?

You can deploy an Ignite cluster singleton service (https://apacheignite.readme.io/docs/cluster-singletons), which will fill the queue exactly once. Alternatively, you can add the data only from the coordinator node (the oldest node in the cluster) by checking ignite.cluster().forOldest().node().isLocal().
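A minimal sketch of the coordinator-node approach, reusing the queue name from the question; namesToProcess is a placeholder for the 20 names:

Ignite ignite = Ignition.ignite();
// Only the oldest node in the cluster populates the queue, so the names are added once.
if (ignite.cluster().forOldest().node().isLocal()) {
    // A non-null CollectionConfiguration creates the queue if it does not exist yet.
    IgniteQueue<String> queue = ignite.queue("queueName", 0, new CollectionConfiguration());
    for (String name : namesToProcess) // namesToProcess: placeholder list of the 20 names
        queue.add(name);
}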

I fixed the duplicate cache loading issue at bootup this way:
final IgniteAtomicLong cacheLoadCnt = ignite.atomicLong(cacheName + "Cnt", 0, true);
if (cacheLoadCnt.get() == 0) {
    loadCache();
    cacheLoadCnt.addAndGet(1);
}
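Note that get() followed by addAndGet() is a check-then-act sequence: two nodes booting at the same moment can both read 0 and both load the cache. A compareAndSet on the same atomic closes that window (a sketch reusing cacheName and loadCache() from the snippet above):

final IgniteAtomicLong cacheLoadCnt = ignite.atomicLong(cacheName + "Cnt", 0, true);
// compareAndSet is atomic cluster-wide, so exactly one node wins the 0 -> 1 transition.
if (cacheLoadCnt.compareAndSet(0, 1)) {
    loadCache();
}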


Is it possible for only the Redis master instances to handle all reads and writes, with the slaves used only for failover?

My work system consists of Spring web applications that use Redis as a transaction counter and conditionally block transaction requests.
The transaction is as follows:
Check whether or not the data exists. (HGET)
If it doesn't, save a new entry with count 0 and set an expiration time. (HSET, EXPIRE)
Increase the count value. (INCRBY)
If the increased count reaches a configured limit, mark the transaction as 'blocked'. (HSET)
The limit value is set by my company's business policy.
These read and write operations are issued back to back, one immediately after another; the sequence is sketched in code below.
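For illustration only, here is the sequence in Jedis-style calls (a sketch; the key, field names, TTL, and limit are all assumptions, and the increment is assumed to be HINCRBY since the data lives in a hash):

// key: e.g. "tx:" + transactionId (placeholder)
void countTransaction(Jedis jedis, String key, long limit) {
    if (jedis.hget(key, "count") == null) {      // HGET: does the counter exist yet?
        jedis.hset(key, "count", "0");           // HSET: create it with count 0
        jedis.expire(key, 3600);                 // EXPIRE: assumed 1-hour TTL
    }
    long count = jedis.hincrBy(key, "count", 1); // HINCRBY: increase the count
    if (count >= limit)                          // limit comes from business policy
        jedis.hset(key, "blocked", "true");      // HSET: block further transactions
}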
Currently, I use one Redis instance on one machine (only a master, no replicas).
I want Redis HA, so I need slave instances, but at the same time I want all reads and writes to go only to the master instances because of slave data replication latency.
After some research, I found that it is a good idea to have a proxy server for Redis HA. However, with a proxy, it seems impossible to have only the master instances receive requests and the slaves serve purely as failover.
Is it possible?
Thanks in advance.
What you need is Redis Sentinel.
With Redis Sentinel, you can get the master address from sentinel, and read/write with master. If master is down, Redis Sentinel will do failover, and elect a new master. Then you can get the address of the new master from sentinel.
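With Spring Data Redis on Lettuce, a Sentinel setup might look like this (a sketch; the master name "mymaster" and the sentinel addresses are assumptions):

RedisSentinelConfiguration sentinelConfig = new RedisSentinelConfiguration()
    .master("mymaster")           // assumed name of the monitored master
    .sentinel("127.0.0.1", 26379) // assumed sentinel addresses
    .sentinel("127.0.0.1", 26380);
// Sentinel hands out the current master; by default reads and writes both go to it.
LettuceConnectionFactory factory = new LettuceConnectionFactory(sentinelConfig);
factory.afterPropertiesSet();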
Since you're going to use Lettuce as the Redis cluster driver, you should set the read preference to the master, and things should work fine. Sample code might look like this:
LettuceClientConfiguration lettuceClientConfiguration =
    LettuceClientConfiguration.builder().readFrom(ReadFrom.MASTER).build();
RedisClusterConfiguration redisClusterConfiguration = new RedisClusterConfiguration();
List<RedisNode> redisNodes = new ArrayList<>();
redisNodes.add(new RedisNode("127.0.0.1", 9000));
redisNodes.add(new RedisNode("127.0.0.1", 9001));
redisNodes.add(new RedisNode("127.0.0.1", 9002));
redisNodes.add(new RedisNode("127.0.0.1", 9003));
redisNodes.add(new RedisNode("127.0.0.1", 9004));
redisNodes.add(new RedisNode("127.0.0.1", 9005));
redisClusterConfiguration.setClusterNodes(redisNodes);
LettuceConnectionFactory lettuceConnectionFactory =
    new LettuceConnectionFactory(redisClusterConfiguration, lettuceClientConfiguration);
lettuceConnectionFactory.afterPropertiesSet();
See in action at Redis Cluster Configuration
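For completeness, the factory can then back a template; a brief usage sketch:

StringRedisTemplate template = new StringRedisTemplate(lettuceConnectionFactory);
template.opsForValue().set("key", "value");       // writes always hit the master
String value = template.opsForValue().get("key"); // reads also hit the master per ReadFrom.MASTER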

Inconsistent behavior of Quartz2 scheduler in Apache Camel

I have an Apache Camel project that uses Quartz2 as the scheduler. The requirement is to make it a cluster. The code is deployed to WebLogic 12c, and Quartz is configured as per many samples with clustering enabled.
This is my properties file (without the datasource)
org.quartz.scheduler.instanceName = MyScheduler
org.quartz.scheduler.instanceId = AUTO
org.quartz.scheduler.skipUpdateCheck = true
org.quartz.scheduler.jobFactory.class = org.quartz.simpl.SimpleJobFactory
org.quartz.threadPool.class = org.quartz.simpl.SimpleThreadPool
org.quartz.threadPool.threadCount = 10
org.quartz.threadPool.threadPriority = 5
org.quartz.jobStore.misfireThreshold = 60000
org.quartz.jobStore.class=org.quartz.impl.jdbcjobstore.JobStoreTX
org.quartz.jobStore.driverDelegateClass=org.quartz.impl.jdbcjobstore.oracle.OracleDelegate
org.quartz.jobStore.useProperties=true
org.quartz.JobBuilder.requestRecovery=true
org.quartz.jobStore.isClustered = true
org.quartz.jobStore.clusterCheckinInterval = 20000
When I deploy and start both nodes, I see that the QRTZ_SCHEDULER_STATE table has an extra entry for one of the nodes:
MyScheduler-routerContext server_node21567108546690
MyScheduler-routerContext-1 server_node11565896495100
MyScheduler-routerContext-1 server_node11567108547295
And I am guessing that, because of that, one node is triggered only once in a while, while the other node is triggered all the time (so occasionally both nodes are invoked at the same time).
I have tried a clean restart of the WebLogic nodes, but the issue is still there.
This is what my route(s) look like:
from("quartz2://provRegGroup/createUsersTrigger?cron={{create_users_cron}}&job.name=createUsersJob")
    .routeId("createUsersRB")
    .log("**** starting check for create users");
// where
//   create_users_cron=0+0,5,10,15,20,25,30,35,40,45,50,55+*+*+*+?
// expecting one node to be called by the scheduler at a time..
I figured out what caused the issue. Apparently there were orphaned WebLogic processes running on one (or even both) of the nodes; this would be a question for our tech archs as to why it was such a mess. ps showed two WebLogic servers running on a node: one that I had started recently and one that had been there for, say, a month.
Expecting this will never happen in the production environment, I assume the issue has been resolved.

Camel-rabbitmq: Consuming from multiple rabbitmq queues in a single camel consumer

I have the following scenario:
There are 3 RabbitMQ queues to which producers push their messages based on the priority of the message (myqueue_high, myqueue_medium, myqueue_low).
I want to have a single consumer which can pull from these queues in order of priority, i.e. it keeps pulling from the high queue as long as messages are there; otherwise it pulls from medium; if medium is also empty, it pulls from low.
How do I achieve this? Do I need to write a custom component?
It would be easier to put all the messages into one queue but with different priorities. That way, the priority sorting would be done in the broker and the Camel consumer would get the messages already sorted by priority. However, RabbitMQ queues are FIFO and did not support priority handling at the time of writing.
Solution 1
Camel allows you to reorganise messages based on some comparator using a Resequencer (https://camel.apache.org/resequencer.html):
from("rabbitmq://hostname[:port]/myqueue_high")
.setHeader("priority", constant(9))
.to("direct:messageProcessing");
from("rabbitmq://hostname[:port]/myqueue_medium")
.setHeader("priority", constant(5))
.to("direct:messageProcessing");
from("rabbitmq://hostname[:port]/myqueue_low")
.setHeader("priority", constant(1))
.to("direct:messageProcessing");
// sort by priority by allowing duplicates (message can have same priority)
// and use reverse ordering so 9 is first output (most important), and 0 is last
// (of course we could have set the priority the other way around, but this way
// we keep align with the JMS specification...)
// use batch mode and fire every 3th second
from("direct:messageProcessing")
.resequence(header("priority")).batch().timeout(3000).allowDuplicates().reverse()
.to("mock:result");
That way, all incoming messages are routed to the same sub-route (direct:messageProcessing), where the messages are reordered according to the priority header set by the incoming routes.
Solution 2
Use SEDA with a prioritization queue:
final PriorityBlockingQueueFactory<Exchange> priorityQueueFactory =
    new PriorityBlockingQueueFactory<Exchange>();
priorityQueueFactory.setComparator(new Comparator<Exchange>() {
    @Override
    public int compare(final Exchange exchange1, final Exchange exchange2) {
        final Integer prio1 = (Integer) exchange1.getIn().getHeader("priority");
        final Integer prio2 = (Integer) exchange2.getIn().getHeader("priority");
        return -prio1.compareTo(prio2); // 9 has higher priority than 0
    }
});
final SimpleRegistry registry = new SimpleRegistry();
registry.put("priorityQueueFactory", priorityQueueFactory);
final ModelCamelContext context = new DefaultCamelContext(registry);
// configure and start your context here...
The route definition:
from("rabbitmq://hostname[:port]/myqueue_high")
.setHeader("priority", constant(9))
.to("seda:priority?queueFactory=#priorityQueueFactory"); // reference queue in registry
from("rabbitmq://hostname[:port]/myqueue_medium")
.setHeader("priority", constant(5))
.to("seda:priority?queueFactory=#priorityQueueFactory");
from("rabbitmq://hostname[:port]/myqueue_low")
.setHeader("priority", constant(1))
.to("seda:priority?queueFactory=#priorityQueueFactory");
from("seda:priority")
.to("direct:messageProcessing");
Solution 3
Use JMS, such as Camel's ActiveMQ component, instead of SEDA if you need persistence in case of failures. Just forward the incoming messages from RabbitMQ to a JMS destination, setting the JMSPriority header; a sketch follows.
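A sketch of that forwarding, assuming an ActiveMQ component registered as "activemq" and a placeholder queue name prio.inbox; note that camel-jms only propagates an explicit JMSPriority header when preserveMessageQos=true is set on the producer endpoint:

from("rabbitmq://hostname[:port]/myqueue_high")
    .setHeader("JMSPriority", constant(9)) // JMS priorities range from 0 to 9
    .to("activemq:queue:prio.inbox?preserveMessageQos=true");

from("rabbitmq://hostname[:port]/myqueue_low")
    .setHeader("JMSPriority", constant(1))
    .to("activemq:queue:prio.inbox?preserveMessageQos=true");

// consume in priority order (the broker must have message priority support enabled)
from("activemq:queue:prio.inbox")
    .to("direct:messageProcessing");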
Solution 4
Skip RabbitMQ entirely and just use a JMS broker, such as ActiveMQ, that supports prioritization.

Celery with RabbitMQ creates multiple result queues

I have installed Celery with RabbitMQ.
The problem is that for every result that is returned, Celery creates a queue in RabbitMQ named after the task's ID, under the celeryresults exchange.
I still want to have results, but in ONE queue.
My celeryconfig:
from datetime import timedelta

BROKER_URL = 'amqp://'
CELERY_RESULT_BACKEND = 'amqp'
#CELERY_IGNORE_RESULT = True
CELERY_TASK_SERIALIZER = 'json'
CELERY_RESULT_SERIALIZER = 'json'
CELERY_ACCEPT_CONTENT = ['json', 'application/json']
CELERY_TIMEZONE = 'Europe/Oslo'
CELERY_ENABLE_UTC = True

from celery.schedules import crontab

CELERYBEAT_SCHEDULE = {
    'every-minute': {
        'task': 'tasks.remote',
        'schedule': timedelta(seconds=30),
        'args': (),
    },
}
Is that possible? How?
Thanks!
The amqp backend creates a new queue for each task. Alternatively, there is a newer rpc backend which keeps results in a single queue.
http://docs.celeryproject.org/en/master/whatsnew-3.1.html#new-rpc-result-backend
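Switching to it is a one-line config change (a sketch; the rest of the config stays as above):

CELERY_RESULT_BACKEND = 'rpc://'  # results flow back over one queue per client instead of one queue per task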
Nothing unusual.
That is how Celery works when we use amqp as the result backend. It creates a new temporary queue for every result, corresponding to each task the worker consumes.
If you are not interested in the result, you can try the CELERY_IGNORE_RESULT = True setting.
If you do want to store the result, then I would recommend using a different result backend, like Redis.
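For example (a sketch; the Redis URL is an assumption):

CELERY_RESULT_BACKEND = 'redis://localhost:6379/0'  # assumed local Redis instance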
You say you want Celery to keep the results on one queue. Now, to answer your question, let me ask you one:
How do you expect each producer to check for its relevant result without reading every single message off the queue to find the one it needs/wants?
In essence, what you want is a database of key-value pairs so that the lookup is O(1). The only way to do that with a queue broker is to create one queue for each "pair".
I understand that having many GUID queues is not neat or pretty, but it's conceptually the only way to do it on a messaging broker.
This solution won't keep all the results to ONE queue, but it will at least clean up the extra queues right when you're done with them.
If you use Redis as your backend, when you're done with a result that has created an errant queue, run result.forget(). This will cause both the result and the queue for the result to disappear. This can help you manage the number of queues you have, and prevent OOM issues.
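A brief usage sketch (some_task is a placeholder):

res = some_task.delay()      # some_task: placeholder task
value = res.get(timeout=10)  # consume the result once you have it
res.forget()                 # drop the stored result so it stops taking up space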

Why does celery add thousands of queues to rabbitmq that seem to persist long after the tasks complete?

I am using Celery with a RabbitMQ backend. It is producing thousands of queues in RabbitMQ with 0 or 1 items in them, like this:
$ sudo rabbitmqctl list_queues
Listing queues ...
c2e9b4beefc7468ea7c9005009a57e1d 1
1162a89dd72840b19fbe9151c63a4eaa 0
07638a97896744a190f8131c3ba063de 0
b34f8d6d7402408c92c77ff93cdd7cf8 1
f388839917ff4afa9338ef81c28aad75 0
8b898d0c7c7e4be4aa8007b38ccc00ea 1
3fb4be51aaaa4ac097af535301084b01 1
This seems inefficient, and furthermore I have observed that these queues persist long after processing is finished.
I have found the task that appears to be doing this:
@celery.task(ignore_result=True)
def write_pages(page_generator):
    g = group(render_page.s(page) for page in page_generator)
    res = g.apply_async()
    for rendered_page in res:
        print rendered_page  # TODO: print to file
It seems that because these tasks are being called in a group, they are being thrown into the queue but never released. However, I am clearly consuming the results (as I can see them being printed when I iterate through res), so I do not understand why those tasks persist in the queue.
Additionally, I am wondering if the large number of queues being created is an indication that I am doing something wrong.
Thanks for any help with this!
Celery with the AMQP backend will store task tombstones (results) in an AMQP queue named with the task ID that produced the result. These queues will persist even after the results are drained.
A couple of recommendations:
Apply ignore_result=True to every task you can. Don't depend on results from other tasks.
Switch to a different backend (perhaps Redis -- it's more efficient anyway): http://docs.celeryproject.org/en/latest/userguide/tasks.html
Use CELERY_TASK_RESULT_EXPIRES (or, on 4.1, CELERY_RESULT_EXPIRES) to have a periodic cleanup task remove old data from RabbitMQ; see the sketch below.
http://docs.celeryproject.org/en/master/userguide/configuration.html#std:setting-result_expires
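A sketch of the expiry setting (the pre-4.0 name; one day is an assumed value):

from datetime import timedelta

# The celery.backend_cleanup beat task deletes results older than this,
# which on the amqp backend also removes their per-result queues.
CELERY_TASK_RESULT_EXPIRES = timedelta(days=1)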