Check the current position of an element in a Redis list

I have a simple job queue on Redis where new jobs are pushed with RPUSH and consumed with BLPOP. The jobs are stringified JSON objects that have an id field among other things (the json string is parsed by the workers).
Each job takes some time to do, so there can be a meaningful wait time. I'd like to be able to find a job's current position in the queue, so that I can give an update to whatever is waiting on that job. That is, be able to do something like "your current position is 300... 250... 200... 100... 10... your job is now being processed".
It can be assumed that the list may grow long but never too long, i.e. possibly 1000 entries but not 1 million.
After looking through the docs a bit, it seems like this is maybe easier said than done. A possible naive solution seems to be to just loop through the list until the element is found. Are there any performance issues with calling LINDEX a couple hundred times at a time like that?
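For reference, a rough sketch of that naive scan (assuming redis-py; the queue key is a placeholder and the id field matches the JSON described above):

import json
import redis

r = redis.Redis()  # assumed connection

def queue_position(queue_key, job_id):
    """Naive scan: walk the list with LINDEX until the job's id matches."""
    length = r.llen(queue_key)
    for i in range(length):
        raw = r.lindex(queue_key, i)
        if raw is None:  # the list shrank while we were scanning
            break
        if json.loads(raw).get('id') == job_id:
            return i  # 0 = next job to be consumed by BLPOP
    return None  # not found (already being processed, or done)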
Would appreciate any suggestions on other ways this can be done (or confirmation that LINDEX is the only way). The whole structure (even the usage of a list, or addition of some helper map/list) can be changed if needed, only requirement is that it run on Redis.

You can use a sorted set and a counter to solve the problem more elegantly.
Push a job
Call INCR counter to get a new counter value.
Use that value as the job's score and call ZADD jobs counter job-name.
Pop a job
Call BZPOPMIN jobs to get the first unprocessed job.
Get job position
Call ZRANK jobs job-name to get the rank of the job, i.e. its current position in the queue.
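A minimal sketch of this scheme with redis-py (the key names jobs and counter follow the commands above; error handling omitted):

import redis

r = redis.Redis()  # assumed connection

def push_job(job_id):
    # INCR gives a monotonically increasing score, so ZADD keeps FIFO order.
    score = r.incr('counter')
    r.zadd('jobs', {job_id: score})

def pop_job(timeout=0):
    # BZPOPMIN (Redis >= 5.0) blocks until the lowest-score job is available.
    popped = r.bzpopmin('jobs', timeout=timeout)  # (key, member, score) or None
    return popped[1].decode() if popped else None

def job_position(job_id):
    # ZRANK is 0-based; None means the job has already been popped.
    rank = r.zrank('jobs', job_id)
    return None if rank is None else rank + 1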

Related

Python APScheduler stop jobs before starting a new one

I need to start a job every 30 minutes, but before a new job starts I want the old instance of the same job to be terminated. This is to make sure the job always fetches the newest data file, which is constantly being updated.
Right now I'm using the BlockingScheduler paired with my own condition to stop the job (stop the job once it has processed 1k data, etc.), and I was wondering if APScheduler natively supports this "only one job at a time, and stop the old one before starting the new one" behavior.
I've read the docs, but I think the closest is still the default behavior, which equals max_instances=1; this just prevents new jobs from firing before the old job finishes, which is not what I'm looking for.
Any help is appreciated. Thanks!
After further research I came to the conclusion that this is not supported natively in APScheduler, but, inspired by
Get number of active instances for BackgroundScheduler jobs
, I modified the answer into a working way of detecting the number of currently running instances of the same job. So when you have an infinite loop/long task executing and you want the new instance to replace the old one, you can add something like
if scheduler._executors['default']._instances['set_an_id_you_like'] > 1:
    # multiple instances running: break the loop / return
    return
and this is what it should look like when you start the scheduler:
scheduler = BlockingScheduler(timezone='Asia/Taipei')
scheduler.add_job(main, 'cron', minute='*/30', max_instances=3,
                  next_run_time=datetime.now(), id='set_an_id_you_like')
scheduler.start()
But as the answer in the link says, please refrain from doing this once there's a native way to do it; currently I'm using APScheduler 3.10.
This method at least doesn't rely on calling time.time() or datetime.datetime.now() in every iteration to check how much time has passed since the loop started. In my case, since my job runs every 30 minutes, I didn't want to calculate a time delta, so this is what I went with. I hope this hacky method helps someone who has googled for a few days to end up here.
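Putting the two snippets together, a self-contained sketch of the workaround might look like this (it leans on APScheduler's private _executors/_instances internals, as described above; main's body and fetch_and_process_next_chunk are placeholders):

from datetime import datetime
from apscheduler.schedulers.blocking import BlockingScheduler

scheduler = BlockingScheduler(timezone='Asia/Taipei')

def main():
    while True:
        # A newer instance of this same job has started: stop this one.
        if scheduler._executors['default']._instances['set_an_id_you_like'] > 1:
            return
        fetch_and_process_next_chunk()  # placeholder for the real work

scheduler.add_job(main, 'cron', minute='*/30', max_instances=3,
                  next_run_time=datetime.now(), id='set_an_id_you_like')
scheduler.start()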

Retrieve Queue position in sidekiq

I've skimmed through the docs but haven't found anything yet.
Is it possible to retrieve the current position of a given job within Sidekiq's queue?
My use case is 2 workers over the default queue with hundreds of tasks of variable complexity. So if, for example, I submit a new task now and the queue size is, say, 10, how can I determine the position of my task after 5 minutes?
Thanks.
You can't without a linear scan of the entire queue; the position will change millisecond by millisecond as jobs are fetched and executed.
Sidekiq::Queue.new("default").each do |job|
  p job
end

Negamax: what to do with "partial" results after canceling a search?

I'm implementing negamax with alpha/beta pruning and a transposition table, based on the pseudocode here, with roughly this algorithm:
NegaMax():
1. Transposition Table lookup
2. Loop through moves
2a. **Bail if I'm out of time**
2b. Make move, call -NegaMax, undo move
2c. Update bestvalue and alpha/beta if appropriate
3. Transposition table store/update
4. Return bestvalue
I'm also using iterative deepening, calling NegaMax with progressively higher depths.
My question is: when I determine I've run out of time (step 2a, at the beginning of the move loop), what is the right thing to do? Do I bail immediately (not updating the transposition table), or do I just break the loop (saving whatever partial work I've done)?
Currently, I return null at that point, signifying that the search was canceled before "completing" that node (whether by trying every move or the alpha/beta cut). The null gets propagated up and up the stack, and each node on the way up bails by return, so step 3 never runs.
Essentially, I only store values in the TT if the node "completed". The scenario I keep seeing with the iterative deepening:
I get through depths 1-5 really quick, so the TT has a depth = 5, type = Exact entry.
The depth = 6 search is taking a long time, so I bail.
I ultimately return the best move in the transposition table, which is the move I found during the depth = 5 search. The problem is, if I start a new depth = 6 search, it feels like I'm starting it from scratch. However, if I save whatever partial results I found, I worry that I'll have corrupted my TT, potentially by overwriting the completed depth = 5 entry with an incomplete depth = 6 entry.
If the search wasn't completed, the score is inaccurate and should likely not be added to the TT. If you have a best move from the previous ply and it is still best and the score hasn't dropped significantly, you might play that.
On the other hand, if at depth 6 you discover that the opponent has a mate in 3 (oops!) or could win your queen, you might have to spend even more time to try to resolve that.
That would leave you with less time for the remaining moves (if any...), but it might be better to be slightly short on time than to get mated with plenty of time remaining. :-)
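For illustration, a rough Python sketch of the structure under discussion: the time check bails by raising an exception (equivalent to the null propagation described in the question), so step 3 only ever stores fully searched nodes. The State methods (key, legal_moves, make, undo, evaluate, is_terminal) are hypothetical placeholders.

import time

class SearchTimeout(Exception):
    """Raised when the allotted search time runs out."""

def negamax(state, depth, alpha, beta, color, tt, deadline):
    # 2a. Bail if out of time: the exception propagates up without touching
    # the TT, so only fully searched nodes are ever stored (step 3).
    if time.monotonic() > deadline:
        raise SearchTimeout()

    # 1. Transposition table lookup (entry holds depth, flag, value, move).
    entry = tt.get(state.key())
    if entry is not None and entry['depth'] >= depth:
        if entry['flag'] == 'EXACT':
            return entry['value'], entry['move']
        if entry['flag'] == 'LOWER':
            alpha = max(alpha, entry['value'])
        elif entry['flag'] == 'UPPER':
            beta = min(beta, entry['value'])
        if alpha >= beta:
            return entry['value'], entry['move']

    if depth == 0 or state.is_terminal():
        return color * state.evaluate(), None

    best_value, best_move = float('-inf'), None
    original_alpha = alpha

    # 2. Loop through moves.
    for move in state.legal_moves():
        state.make(move)                                 # 2b. make move
        value, _ = negamax(state, depth - 1, -beta, -alpha, -color, tt, deadline)
        value = -value
        state.undo(move)                                 # 2b. undo move

        if value > best_value:                           # 2c. update bestvalue
            best_value, best_move = value, move
        alpha = max(alpha, value)
        if alpha >= beta:
            break                                        # beta cutoff

    # 3. Store/update the TT only for nodes whose move loop completed.
    if best_value <= original_alpha:
        flag = 'UPPER'
    elif best_value >= beta:
        flag = 'LOWER'
    else:
        flag = 'EXACT'
    tt[state.key()] = {'depth': depth, 'flag': flag,
                       'value': best_value, 'move': best_move}
    return best_value, best_move                         # 4. return bestvalue

def iterative_deepening(state, max_depth, time_limit=5.0):
    tt, deadline, best = {}, time.monotonic() + time_limit, None
    for depth in range(1, max_depth + 1):
        try:
            best = negamax(state, depth, float('-inf'), float('inf'), 1, tt, deadline)
        except SearchTimeout:
            break  # keep the result of the last fully completed depth
    return best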

Is there any option to use redis.expire more elastically?

I've got a quick, simple question.
Assume that if the server receives 10 messages from a user within 10 minutes, the server sends a push email.
At first I thought it would be very simple using Redis:
incr("foo"), expire("foo", 60*10)
and in Java, handle the occurrence count like below:
if (Integer.parseInt(jedis.get("foo")) >= 10) { sendEmail(); jedis.del("foo"); }
But imagine the user sends one message in the first minute and 8 messages in the 10th minute.
Then the key expires, and the user sends another 3 messages in the next minute.
The Redis key will be created again with value 3, which will not trigger sendEmail(), even though the user actually sent 11 messages within 2 minutes.
We're going to use Redis, and we don't want to store message receive times in Redis.
Is there any solution?
So, there are two ways of solving this: one optimizes for space and the other for speed (though really the speed difference should be marginal).
Optimizing for Space:
Keep up to 9 different counters: foo1 ... foo9. Basically, we keep one counter for each of the up to 9 messages that can arrive before we email the user, and let each one expire as it hits the 10-minute mark. This works like a circular queue. Now do this (in Python for simplicity, assuming we have a Redis connection called r):
new_created = False
for i in range(1, 10):
    var_name = 'foo%d' % i
    # Create the first free counter slot (once per message) with a 10-minute TTL.
    if not (new_created or r.exists(var_name)):
        r.set(var_name, 0)
        r.expire(var_name, 600)
        new_created = True
    if not r.exists(var_name):
        continue
    r.incr(var_name, 1)
    if int(r.get(var_name)) >= 10:
        send_email(user)
        r.delete(var_name)
If you go with this approach, put the above logic in a Lua script instead of the example Python, and it should be quite fast. Since you'll be storing at most 9 counters per user, it'll also be quite space efficient.
Optimizing for Speed:
Keep one Redis Sorted Set per user. Every time a user sends a message, add an entry to their sorted set with a score equal to the timestamp and an arbitrary member value. Then just do a ZCOUNT(key, now - 10 minutes, now) and send an email if that's at least 10, followed by ZREMRANGEBYSCORE(key, -inf, now - 10 minutes) to trim entries older than the window. I know you said you didn't want to keep timestamps in Redis, but IMO this is a better solution, and you're going to have to hold some variant of timestamps somewhere no matter what.
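A minimal sketch of this sliding-window approach with redis-py (the msgs:<user> key name and the random member are illustrative choices):

import time
import uuid
import redis

r = redis.Redis()  # assumed connection

def record_message(user_id, window=600, threshold=10):
    """Returns True when the user has sent `threshold` messages within `window` seconds."""
    key = 'msgs:%s' % user_id
    now = time.time()
    # Score is the timestamp; a random member avoids collapsing two
    # messages that arrive with the exact same timestamp.
    r.zadd(key, {uuid.uuid4().hex: now})
    # Trim entries that fell out of the window, then count what's left.
    r.zremrangebyscore(key, '-inf', now - window)
    return r.zcard(key) >= threshold

Wrapping the add/trim/count sequence in a Lua script or MULTI block would make it atomic if several app servers share the same Redis.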
Personally I'd go with the latter approach because the space differences are probably not that big, and the code can be done quickly in pure Redis, but up to you.

Redis unique increment

I am trying to implement a scoring system on Redis. I have no experience with it whatsoever.
What my app should be doing is increasing a value ONLY if the user has not already voted, so I was thinking of something like this:
INCR voteme
but only if this has not already been increased by this user, so I wanted to do the following:
SET voteme:voterip 1
so that I could then count the elements. The problem is, I think this is not doable in Redis, and I have to think of another approach.
Any ideas?
EXTRA question:
I want to make this data persistent by writing the resulting count (e.g. 24) to the corresponding user in MongoDB. Some pseudocode would be of great help.
I would not store a counter, but rather a set containing all the users who have already voted.
Let's suppose a vote is organized for user 1. Each time a user X votes for user 1, you can execute:
SADD user:1:votes X
The number of votes for user 1 can be easily retrieved:
SCARD user:1:votes
Now if you need to keep this count in sync with another store, you can execute (still supposing user X votes for user 1):
MULTI
SADD user:1:votes X
SCARD user:1:votes
EXEC
The trick is that the SADD command returns the number of items effectively added to the set. If the item already exists, it returns 0. So it is quite easy to run this MULTI/EXEC block, check the result of SADD, get the cardinality of the set (the number of votes), and push the cardinality to another store only if the set has been altered by the transaction.
This way, you keep the counter up-to-date in your persistent store (in real time), while filtering useless voting events.
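For the MongoDB part of the question, a rough sketch with redis-py and pymongo (the database, collection, and field names are made up for illustration):

import redis
from pymongo import MongoClient

r = redis.Redis()
users = MongoClient().mydb.users  # hypothetical database/collection

def vote(candidate_id, voter_id):
    key = 'user:%s:votes' % candidate_id
    # MULTI/EXEC: SADD tells us whether the voter is new, SCARD gives the total.
    pipe = r.pipeline(transaction=True)
    pipe.sadd(key, voter_id)
    pipe.scard(key)
    added, total = pipe.execute()
    # Only push to MongoDB when the set was actually altered (a new voter).
    if added == 1:
        users.update_one({'_id': candidate_id},
                         {'$set': {'vote_count': total}},
                         upsert=True)
    return total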