I've skimmed through the docs but haven't found anything yet.
Is it possible the retrieve the current position of a given job within Sidekiq's queue?
My use case is 2 workers over the default queue with hundreds of tasks of variable complexity. So if for example I submit a new task now and the queue size is say 10, how can I determine the position of my task after 5 minutes?
Thanks.
You can't without a linear scan of the entire queue; the position will change millisecond by millisecond as jobs are fetched and executed.
Sidekiq::Queue.new("default").each do |job|
p job
end
Related
I need to start a job every 30 minutes, but before a new job is being started I want the old but same job being terminated. This is to make sure the job always fetches the newest data file which is constantly being updated.
Right now I'm using the BlockingScheduler paired with my own condition to stop the job (stop job if processed 1k data etc.), I was wondering if APScheduler supports this "only 1 job at the same time and stop old one before new one" behavior natively
I've read the docs but I think the closest is still the default behavior which equals max_instances=1, this just prevents new jobs firing before the old job finishes, which is not what I'm looking for.
Any help is appreciated. Thanks!
After further research I came to a conclusion that this is not supported natively in APScheduler, but by inspired by
Get number of active instances for BackgroundScheduler jobs
, I modified the answer into a working way of detecting the number of current running instances of the same job, so when you have a infinite loop/long task executing, and you want the new instance to replace the old instance, you can add something like
if(scheduler._executors['default']._instances['set_an_id_you_like'] > 1):
# if multiple instances break loop/return
return
and this is what should look like when you start:
scheduler = BlockingScheduler(timezone='Asia/Taipei')
scheduler.add_job(main,'cron', minute='*/30', max_instances=3, next_run_time=datetime.now(),\
id='set_an_id_you_like')
scheduler.start()
but like the answer in the link, please refrain from doing this if someday there's a native way to do this, currently I'm using APScheduler 3.10
This method at least doesn't rely on calculating time.now() or datetime.datetime.now() in every iteration to check if the time has passed compared when the loop started. In my case since my job runs every 30 minutes, I didn't want to calculate deltatime so this is what I went for, hope this hacky method helped someone that googled for a few days to come here.
I have a simple job queue on Redis where new jobs are pushed with RPUSH and consumed with BLPOP. The jobs are stringified JSON objects that have an id field among other things (the json string is parsed by the workers).
Each job takes some time to do, so there can be a meaningful wait time. I'd like to be able to find a job's current position in the queue, so that I can give an update to whatever is waiting on that job. That is, be able to do something like "your current position is 300... 250... 200... 100... 10... your job is now being processed".
It can be assumed that the list may grow long but never too long, i.e. possibly 1000 entries but not 1 million.
After looking through the docs a bit, it seems like this is maybe easier said than done. A possible naive solution seems to be to just loop through the list until the element is found. Are there any performance issues with calling LINDEX a couple hundred times at a time like that?
Would appreciate any suggestions on other ways this can be done (or confirmation that LINDEX is the only way). The whole structure (even the usage of a list, or addition of some helper map/list) can be changed if needed, only requirement is that it run on Redis.
You can use a sorted set and a counter to more elegantly solve the problem.
Push a job
Call INCR counter to get a counter.
Use the counter as score of the job, and call ZADD jobs counter job-name
Pop a job
Call BZPOPMIN jobs to get the first unprocessed job.
Get job position
Call ZRANK jobs job-name to get the rank of the job, e.g. the current position of the job.
What is the purpose of setting the loop count? Is it just depend on how many times i want to run the test? Or it has other purpose of test with different loop count? Will it affect the final test result?
"If you give loop count as 2 then every request two times to the server"
I found this online, but i don't understand what it means.
Based on my understanding, the loop count set to 2 because of i want to repeat the test twice only. After the first test end, then the threads in first round test in dead before the second test starts. Then the new thread group will send the request to the server. Why "every request two times to the server"?
The loop count means each thread of your thread group will run the steps inside the loop twice if iteration is set to 2
The thread will start based on delay and rampup and not related to this setting
If your server has concurrent users limit, for example 100, and you want to execute more, as 600, you can set loop count as 6 and execute 600 requests with given server limits
It's the number of times for each JMeter thread (virtual user) to execute Samplers inside the Thread Group
Each JMeter thread executes Samplers upside down (or according to the Logic Controllers) so if there are no more Samplers to execute the thread will shut down. And it might be the case you won't be able to achieve the desired concurrency because some threads have already finished execution and some haven't been yet started like it's described in the JMeter Test Results: Why the Actual Users Number is Lower than Expected so you might want to increase the number of iterations or even set it to "Infinite" and control the test duration using "Duration" section of the Thread Group or Runtime Controller
I am trying to write streaming data into elasticsearch with apache-nifi.putElasticSearch processor,
PutElasticSearch has property named "Batch Size", when I set this value to 1 all events are written to elasticsearch ASAP.
But such a low "batch size" obviously not working when the load is high. So in order to have a reasonable throughput I need to set it to 1000.
My question is, does PutElasticSearch waits till the batch size of events available. If yes it can wait hours when there are 999 events waiting on processor.
I am searching to understand how logstash doing same job on elasticsearch output plugin. There may be some flushing logic implemented based on time ( if events are waiting ~2 sec flush events to elasticsearch )..
You have any idea?
Edit: I just found logstash implemented this https://www.elastic.co/guide/en/logstash/current/plugins-outputs-elasticsearch.html#plugins-outputs-elasticsearch-idle_flush_time :)
How can I do same functionality on nifi
According to the code the batch size parameter is a maximum number of FlowFiles from the incoming queue.
For example in case of value batch size = 1000:
1/ if incoming queue contains 1001 flow files - only 1000 will be taken in one transaction.
2/ if incoming queue contains 999 flow files - 999 will be taken in one transaction.
And everything will be processed as soon as there is something in the incoming queue and there are available threads in nifi.
references:
PutElasticsearch.java
ProcessSession.java
today I created a small batch of 20 categorization HITs with the name Grammatical or Ungrammatical using the web UI. Can you tell me the easiest way to manage this batch so that I can reduce its time allotted to 15 minutes from 1 hour and remove also remove the categorization of masters. This is a very simple task that's set to auto-approve within 1 hour, and I am fine with that. I just need to make it more lucrative for people to attempt this at the penny rate.
You need to register a new HITType with the relevant properties (reduced time and no masters qualification) and then perform a ChangeHITTypeOfHIT operation on all of the HITs in the batch.
API documentation here: http://docs.aws.amazon.com/AWSMechTurk/latest/AWSMturkAPI/ApiReference_ChangeHITTypeOfHITOperation.html