Keda RabbitMQ - Keda not spawning additional jobs when queue has few messages - keda

I have a Keda Scaledjob configured to spawn 1 job per message having the state 'ready' in RabbitMQ.
It has a max replica count set to 70.
Observed:
When there are many messages in the queue, let's say 300, Keda correctly creates new jobs to reach the max replica count limit => So there are 70 running jobs each consuming 1 message from the queue.
When there are few messages in the queue, let's say 1 Ready and 1 Unacked, Keda refuses to create a new job even if there's enough resources in the cluster.
It's like waiting until the current running job finishes to spawn a new job.
Here's my Keda configuration :
---
# Reference - https://keda.sh/docs/2.0/concepts/scaling-jobs/
apiVersion: keda.sh/v1alpha1
kind: ScaledJob
metadata:
name: scaledjob-puppeteer
labels:
environment: development
app: puppeteer-display
spec:
jobTargetRef:
parallelism: 1 # [max number of desired pods](https://kubernetes.io/docs/concepts/workloads/controllers/jobs-run-to-completion/#controlling-parallelism)
completions: 1 # [desired number of successfully finished pods](https://kubernetes.io/docs/concepts/workloads/controllers/jobs-run-to-completion/#controlling-parallelism)
activeDeadlineSeconds: 7200 # (2 hours) Specifies the duration in seconds relative to the startTime that the job may be active before the system tries to terminate it; value must be positive integer
backoffLimit: 2 # Specifies the number of retries before marking this job failed. Defaults to 6
template:
spec:
volumes:
...
containers:
...
pollingInterval: 10
successfulJobsHistoryLimit: 0
failedJobsHistoryLimit: 0
maxReplicaCount: 75
triggers:
- type: rabbitmq
metadata:
protocol: amqp
queueName: tasks
mode: QueueLength
value: "1"
authenticationRef:
name: keda-trigger-auth-rabbitmq-conn
---
How to make Keda to create a job whenever the queue has >= 1 message ?
Edit: It seems like it waits for at least 1 hour before creating the new job.

The problem seems to be the missing scalingStrategy setting. You can add following configuration:
scalingStrategy:
strategy: accurate
The accurate setting is used when you consume messages from your queue instead of locking the messages. This is often used in other message queues.
For reference you can look into https://keda.sh/docs/2.7/concepts/scaling-jobs/
You can find further information about the scaling strategies in the details section.

Related

Airflow 2.0 - Task stuck in queued state even though pool slots available

python dependencies:
apache-airflow==2.1.3
redis==3.5.3
celery==4.4.7
airflow config:
core.parallelism = 1000
core.dag_concurrency = 400
core.max_active_runs_per_dag = 1
celery.worker_concurrency = 6
celery.broker_url = redis
celery.result_backend = mysql
Facing this issue after upgrade to airflow 2.0
Not all tasks stuck in "queued" state, only 1 task stuck at a time (out of 100+).
Sometimes tasks stuck in "running" state also.
This issue happens once in 1-2 days.
This happens even though pool slots available to use (Slots = 128, Running slots = 0, Queued slots = 1)
As per worker logs (for both stuck tasks and normal tasks):
INFO/MainProcess] Received task: airflow.executors.celery_executor.execute_command[066398d7-98e0-4f17-bd2b-93ae36438577]
INFO/ForkPoolWorker-6] Executing command in Celery: ['airflow', 'tasks', 'run', 'dag_name', 'task01', '2021-10-26T14:00:00+00:00', '--local', '--pool', 'default_pool', '--subdir', '/dag_dir/dag.py']
I see below log for all normal working tasks, this is missing for the stuck tasks:
WARNING/ForkPoolWorker-1] Running <TaskInstance: dag.task02 2021-10-26T14:00:00+00:00 [queued]> on host worker02.localdomain

How to set up a mirrored queue that will work when the master node goes down?

I have a cluster of two rabbitmq servers in my dev environment and I want to make it so that the queue and all of the messages will be available when the original master goes down.
I made a durable queue on a durable exchange with the following attributes:
ha-mode : all
ha-sync-mode : automatic
x-queue-master-locator : min-masters
I also published a persistant message to the queue.
When I bring down the host that is the master for the queue the state changes to down. I expected that ha-mode all would copy the queue and its messages to all nodes, that ha-sync-mode would keep the nodes synced, and that x-queue-master-locator would move the queue to the other node or in production to the node with the least queues. How do I set up a queue so that I can achieve this?
Edit(More info):
Server info:
rmq: 3.7.17
Erlang: 22.0.7
My config for both nodes:
vm_memory_high_watermark.relative = 0.65
vm_memory_high_watermark_paging_ratio = 0.8
disk_free_limit.relative = 2.0
channel_max = 32
num_acceptors.tcp = 20
num_acceptors.ssl = 0
handshake_timeout = 10000
frame_max = 160000
mirroring_sync_batch_size = 1024
background_gc_enabled = true
background_gc_target_interval = 300000
These attributes mean nothing when you create a queue with them. You need to create a Policy that adds these attributes to the queue.

Apache Ignite - Distibuted Queue and Executors

I am planning to use Apache Ignite Distributed Queue.
I am using Ignite with a spring boot application. So, on bootup, I will be adding 20 names in a queue. But, since there are 3 servers in a cluster, the same 20 names gets added 3 times. But, i want to add them only once in the queue.
Ignite ignite = Ignition.ignite();
IgniteQueue<String> queue = ignite.queue(
"queueName", // Queue name.
0, // Queue capacity. 0 for unbounded queue.
null // Collection configuration.
);
Distributed executors, will be able to poll from the queue and run the task. Here, the executor is expected to poll, run the task and then add the same name to the queue. Trying to achieve round robin here.
Only one executor should be running the same task at any point of time, though there are multiple servers in a cluster.
Any suggestion for this.
You can launch ignite cluster singleton service https://apacheignite.readme.io/docs/cluster-singletons which will fill data to queue. Also you can adding data from coordinator node (oldest node in cluster) ignite.cluster().forOldest().node().isLocal()
I fixed bootup time duplicate cache loading issue this way:
final IgniteAtomicLong cacheLoadCnt = ignite.atomicLong(cacheName + "Cnt", 0, true);
if (cacheLoadCnt.get() == 0) {
loadCache();
cacheLoadCnt.addAndGet(1);
}

Dataflow's BigQuery inserter thread pool exhausted

I'm using Dataflow to write data into BigQuery.
When the volume gets big and after some time, I get this error from Dataflow:
{
metadata: {
severity: "ERROR"
projectId: "[...]"
serviceName: "dataflow.googleapis.com"
region: "us-east1-d"
labels: {…}
timestamp: "2016-08-19T06:39:54.492Z"
projectNumber: "[...]"
}
insertId: "[...]"
log: "dataflow.googleapis.com/worker"
structPayload: {
message: "Uncaught exception: "
work: "[...]"
thread: "46"
worker: "[...]-08180915-7f04-harness-jv7y"
exception: "java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.FutureTask#1a1680f rejected from java.util.concurrent.ThreadPoolExecutor#b11a8a1[Shutting down, pool size = 100, active threads = 100, queued tasks = 2316, completed tasks = 1192]
at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2047)
at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:823)
at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1369)
at java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:134)
at java.util.concurrent.Executors$DelegatedExecutorService.submit(Executors.java:681)
at com.google.cloud.dataflow.sdk.util.BigQueryTableInserter.insertAll(BigQueryTableInserter.java:218)
at com.google.cloud.dataflow.sdk.io.BigQueryIO$StreamingWriteFn.flushRows(BigQueryIO.java:2155)
at com.google.cloud.dataflow.sdk.io.BigQueryIO$StreamingWriteFn.finishBundle(BigQueryIO.java:2113)
at com.google.cloud.dataflow.sdk.util.DoFnRunnerBase.finishBundle(DoFnRunnerBase.java:158)
at com.google.cloud.dataflow.sdk.runners.worker.SimpleParDoFn.finishBundle(SimpleParDoFn.java:196)
at com.google.cloud.dataflow.sdk.runners.worker.ForwardingParDoFn.finishBundle(ForwardingParDoFn.java:47)
at com.google.cloud.dataflow.sdk.util.common.worker.ParDoOperation.finish(ParDoOperation.java:62)
at com.google.cloud.dataflow.sdk.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:79)
at com.google.cloud.dataflow.sdk.runners.worker.StreamingDataflowWorker.process(StreamingDataflowWorker.java:657)
at com.google.cloud.dataflow.sdk.runners.worker.StreamingDataflowWorker.access$500(StreamingDataflowWorker.java:86)
at com.google.cloud.dataflow.sdk.runners.worker.StreamingDataflowWorker$6.run(StreamingDataflowWorker.java:483)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)"
logger: "com.google.cloud.dataflow.sdk.runners.worker.StreamingDataflowWorker"
stage: "F10"
job: "[...]"
}
}
It looks like I'm exhausting the thread pool defined in BigQueryTableInserter.java:84. This thread pool has an hardcoded size of 100 threads and cannot be configured.
My questions are:
How could I avoid this error?
Am I doing something wrong?
Shouldn't the pool size be configurable? How can 100 threads be the perfect fit for all needs and machine types?
Here's a bit of context of my usage:
I'm using Dataflow in streaming mode, reading from Kafka using KafkaIO.java
"After some time" is a few hours, (less than 12h)
I'm using 36 workers of type n1-standard-4
I'm reading around 180k messages/s from Kafka (about 130MB/s of network input to my workers)
Messages are grouped together, outputting around 7k messages/s into BigQuery
Dataflow workers are in the us-east1-d zone, BigQuery dataset location is US
You aren't doing anything wrong, though you may need more resources, depending on how long volume stays high.
The streaming BigQueryIO write does some basic batching of inserts by data size and row count. If I understand your numbers correctly, your rows are large enough that each is being submitted to BigQuery in its own request.
It seems that the thread pool for inserts should install ThreadPoolExecutor.CallerRunsPolicy which causes the caller to block and run jobs synchronously when they exceed the capacity of the executor. I've posted PR #393. This will convert the work queue overflow into pipeline backlog as all the processing threads block.
At this point, the issue is standard:
If the backlog is temporary, you'll catch up once volume decreases.
If the backlog grows without bound, then of course it will not solve the issue and you will need to apply more resources. The signs should be the same as any other backlog.
Another point to be aware of is that around 250 rows/second per thread this will exceed the BigQuery quota of 100k updates/second for a table (such failures will be retried, so you might get past them anyhow). If I understand your numbers correctly, you are far from this.

RabbitMQ - Scheduled Queue - Dead Letter Queue - Good practise

we have setup some workflow environment with Rabbit.
It solves our needs but I like to know if it is also good practise to do it like we do for scheduled tasks.
Scheduling means no mission critical 100% adjusted time. So if a job should be retried after 60 seconds, it does mean 60+ seconds, depends on when the queue is handled.
I have created one Q_WAIT and made some headers to transport settings.
Lets do it like:
Worker is running subscribed on Q_ACTION
If the action missed (e.g. smtp server not reachable)
-> (Re-)Publish the message to Q_WAIT and set properties.headers["scheduled"] = time + 60seconds
Another process loops every 15 seconds through all messages in Q_WAIT by method pop() and NOT by subscribed
q_WAIT.pop(:ack => true) do |delivery_info,properties,body|...
if (properties.headers["scheduled"] has reached its time)
-> (Re-)Publish the message back to Q_ACTION
ack(message)
after each loop, the connection is closed so that the NOT (Re-)Published are left in Q_WAIT because they were not acknowledged.
Can someone confirm this as a working (good) practise.
Sure you can use looping process like described in original question.
Also, you can utilize Time-To-Live Extension with Dead Letter Exchanges extension.
First, specify x-dead-letter-exchange Q_WAIT queue argument equal to current exchange and x-dead-letter-routing-key equal to routing key that Q_ACTION bound.
Then set x-message-ttl queue argument set or set message expires property during publishing if you need custom per-message ttl (which is not best practice though while there are some well-known caveats, but it works too).
In this case your messages will be dead-lettered from Q_WAIT to Q_ACTION right after their ttl expires without any additional consumers, which is more reliable and stable.
Note, if you need advanced re-publish logic (change message body, properties) you need additional queue (say Q_PRE_ACTION) to consume messages from, change them and then publish to target queue (say Q_ACTION).
As mentioned here in comments I tried that feature of x-dead-letter-exchange and it worked for most requirements. One question / missunderstandig is TTL-PER-MESSAGE option.
Please look on the example here. From my understanding:
the DLQ has a timeout of 10 seconds
so first message will be available on subscriber 10 seconds after publishing.
the second message is posted 1 second after the first with a message-ttl (expiration) of 3 seconds
I would expect the second message should be prounounced after 3 seconds from publishing and before first message.
But it did not work like that, both are available after 10 seconds.
Q: Shouldn't the message expiration overrule the DLQ ttl?
#!/usr/bin/env ruby
# encoding: utf-8
require 'bunny'
B = Bunny.new ENV['CLOUDAMQP_URL']
B.start
DELAYED_QUEUE='work.later'
DESTINATION_QUEUE='work.now'
def publish
ch = B.create_channel
# declare a queue with the DELAYED_QUEUE name
q = ch.queue(DELAYED_QUEUE, :durable => true, arguments: {
# set the dead-letter exchange to the default queue
'x-dead-letter-exchange' => '',
# when the message expires, set change the routing key into the destination queue name
'x-dead-letter-routing-key' => DESTINATION_QUEUE,
# the time in milliseconds to keep the message in the queue
'x-message-ttl' => 10000,
})
# publish to the default exchange with the the delayed queue name as routing key,
# so that the message ends up in the newly declared delayed queue
ch.basic_publish('message content 1 ' + Time.now.strftime("%H-%M-%S"), "", DELAYED_QUEUE, :persistent => true)
puts "#{Time.now}: Published the message 1"
# wait moment before next publish
sleep 1.0
# puts this with a shorter ttl
ch.basic_publish('message content 2 ' + Time.now.strftime("%H-%M-%S"), "", DELAYED_QUEUE, :persistent => true, :expiration => "3000")
puts "#{Time.now}: Published the message 2"
ch.close
end
def subscribe
ch = B.create_channel
# declare the destination queue
q = ch.queue DESTINATION_QUEUE, durable: true
q.subscribe do |delivery, headers, body|
puts "#{Time.now}: Got the message: #{body}"
end
end
subscribe()
publish()
sleep