I wanna implement a "rpushnx" function that:
if the key exists, do nothing. else
rpush strings to the list.
it is a multi-thread environment.
Currently, the code snippets are just like this:
if (!redisClient.exists(db, key)) {
    synchronized (MyClass.class) {
        if (!redisClient.exists(db, key)) redisClient.rpush(db, key, list);
    }
}
But I think it is a little bit trivial.
Is there some nicer way to have it done ?
Many thanks in advance.
Yes, there is a better way. Your solution may work if you have a single multi-threaded application server, but it will not work in a distributed system with several application servers. Plus, it requires 3 roundtrips when the key does not exist.
You will be better served by leveraging a Lua script. Redis always executes Lua scripts atomically.
For instance:
eval "if redis.call( 'EXISTS', KEYS[1] ) == 0 then redis.call( 'RPUSH', KEYS[1], unpack(ARGV) ) end" 1 mykey val1 val2 val3 val4
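From application code, the same script can be sent through a client library. The following is a minimal sketch, assuming a redis-py style client whose `eval(script, numkeys, *keys_and_args)` method forwards to EVAL; the helper name `rpushnx` is made up for illustration, and the script is a slight variant of the one above that also returns a value so the caller can tell whether anything was pushed.

```python
# Variant of the script above: returns the new list length, or 0 if the key already existed.
RPUSHNX_LUA = """
if redis.call('EXISTS', KEYS[1]) == 0 then
    return redis.call('RPUSH', KEYS[1], unpack(ARGV))
end
return 0
"""


def rpushnx(client, key, *values):
    """Push values onto the list at key only if key does not exist yet.

    `client` is assumed to expose eval(script, numkeys, *keys_and_args),
    as redis-py clients do. The check and the push run in one script,
    so Redis executes them atomically in a single round trip.
    """
    if not values:
        # RPUSH needs at least one value; fail early instead of erroring on the server.
        raise ValueError("rpushnx needs at least one value")
    return client.eval(RPUSHNX_LUA, 1, key, *values)
```

Because the atomicity comes from the server, no `synchronized` block is needed and the call behaves the same with several application servers.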
I'm building a cache implementation in Java using Redisson. I want to use it to cache a numerical value. So I'm using getAtomicLong() like so:
RAtomicLong userNumber = redissonClient.getAtomicLong("my-key");
long value = userNumber.get();
However, the docs aren't very descriptive about what happens here, and so I have a few questions:
Assume that "my-key" does not yet exist in the cache. What does getAtomicLong() return?
If "my-key" does not exist, what does userNumber.get() return?
I wrote a silly Java example program and did some experimenting, so let me answer both the question you actually asked and the question I think you were trying to ask.
Assume that "my-key" does not yet exist in the cache. What does getAtomicLong() return?
An instance of RAtomicLong, because you might do something with the key/value in the future (like incrementAndGet or something).
If "my-key" does not exist, what does userNumber.get() return?
Zero. Yes, not null, yes not an exception, just 0. This was reasonably surprising in my test program.
The really interesting part about the Redisson API is that it leans hard on the atomic stuff - great if you could be updating a value from multiple threads - but seems to not document the simplest use case: I want to read or write from Redis and it's not a number / not atomic / my threads or data are structured in such a way that they won't clobber each other.
That seems to be what Redisson's RBucket stuff is for.
That will return a null if the object is not in Redis.
RBucket<String> back = client.getBucket("foo");
String value = back.get();
if (value == null) {
    System.out.println("NOPE, NULL");
} else {
    System.out.println(value);
}
I really wish that would have been documented better - looking back I see "bucket" as a container for stuff, but for a while I assumed it meant some advanced Redis pattern and not "a holder for a value of a generic type".
(If you really do want the fancy stuff, there's an excellent Baeldung article on it.)
When I execute a transaction (MULTI/EXEC) via SE.Redis, does it hit the server multiple times? For example,
ITransaction tran = Database.CreateTransaction();
tran.AddCondition(Condition.HashExists(cacheKey, oldKey));
HashEntry hashEntry = GetHashEntry(newKeyValuePair);
Task fieldDeleteTask = tran.HashDeleteAsync(cacheKey, oldKey);
Task hashSetTask = tran.HashSetAsync(cacheKey, new[] { hashEntry });
if (await tran.ExecuteAsync())
{
    await fieldDeleteTask;
    await hashSetTask;
}
Here I am executing two tasks in the transaction. Does this mean I hit the server 4 times? 1 for MULTI, 1 for delete, 1 for set, 1 for exec? Or is SE.Redis smart enough to buffer the tasks in local memory and send everything in one shot when we call ExecuteAsync?
It has to send multiple commands, but it doesn't pay latency costs per command; specifically, when you call Execute[Async] (and not before) it issues a pipeline (all together, not waiting for replies) of:
WATCH cacheKey // observes any competing changes to cacheKey
HEXISTS cacheKey oldKey // see if the existing field exists
MULTI // starts the transacted commands
HDEL cacheKey oldKey // delete the existing field
HSET cacheKey newField newValue // assign the new field
then it pays latency costs to get the result from the HEXISTS, because only when that is known can it decide whether to proceed with the transaction (issuing EXEC and checking the result - which can be negative if the WATCH detects a conflict), or whether to throw everything away (DISCARD).
So; either way 6 commands are going to be issued, but in terms of latency: you're paying for 2 round trips due to the need for a decision point before the final EXEC/DISCARD. In many cases, though, this can itself be further masked by the reality that the result of HEXISTS could already be on the way back to you before we've even got as far as checking, especially if you have any non-trivial bandwidth, for example a large newValue.
However! As a general rule: anything you can do with redis MULTI/EXEC: can be done faster, more reliably, and with fewer bugs, by using a Lua script instead. It looks like what we're actually trying to do here is:
for the hash cacheKey, if (and only if) the field oldField exists: remove oldField and set newField to newValue
We can do this very simply in Lua, because Lua scripts are executed at the server from start to finish without interruption from competing connections. This means that we don't need to worry about things like atomicity i.e. other connections changing data that we're making decisions with. So:
var success = (bool)await db.ScriptEvaluateAsync(@"
    if redis.call('hdel', KEYS[1], ARGV[1]) == 1 then
        redis.call('hset', KEYS[1], ARGV[2], ARGV[3])
        return true
    else
        return false
    end
", new RedisKey[] { cacheKey }, new RedisValue[] { oldField, newField, newValue });
The verbatim string literal here is our Lua script, noting that we don't need to do a separate HEXISTS/HDEL any more - we can make our decision based on the result of the HDEL. Behind the scenes, the library performs SCRIPT LOAD operations as needed, so: if you are doing this lots of times, it doesn't need to send the script itself over the network more than once.
From the perspective of the client: you are now only paying a single latency fee, and we're not sending the same things repeatedly (the original code sent cacheKey four times, and oldKey twice).
(a note on the choice of KEYS vs ARGV: the distinction between keys and values is important for routing purposes, in particular on sharded environments such as redis-cluster; sharding is done based on the key, and the only key here is cacheKey; the field identifiers in hashes do not impact sharding, so for the purpose of routing they are values, not keys - and as such, you should convey them via ARGV, not KEYS; this won't impact you on redis-server, but on redis-cluster this difference is very important, as if you get it wrong: the server will most-likely reject your script, thinking that you are attempting a cross-slot operation; multi-key commands on redis-cluster are only supported when all the keys are on the same slot, usually achieved via "hash tags")
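To make the routing point concrete, here is a rough sketch of how redis-cluster maps a key to one of its 16384 slots, including the "hash tag" rule. The function name `key_slot` and the sample keys are made up for illustration; the sketch assumes the CRC16 (XMODEM) variant from the cluster specification, which Python's `binascii.crc_hqx` with an initial value of 0 computes.

```python
import binascii


def key_slot(key: str) -> int:
    """Return the redis-cluster hash slot (0..16383) for a key.

    If the key contains a non-empty {...} section, only the text between
    the first '{' and the next '}' is hashed (the "hash tag").
    """
    data = key.encode("utf-8")
    start = data.find(b"{")
    if start != -1:
        end = data.find(b"}", start + 1)
        # An empty tag such as "{}" is ignored and the whole key is hashed.
        if end != -1 and end != start + 1:
            data = data[start + 1:end]
    return binascii.crc_hqx(data, 0) % 16384


# Keys that share a hash tag land on the same slot, so a script may touch both.
same = key_slot("{user:1}:profile") == key_slot("{user:1}:cart")
```

Hash field names such as oldField never go through this function, which is why they belong in ARGV rather than KEYS.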
Is there a good way to pop members from a Redis Sorted Set, just like LPOP does for a List?
What I figured out for popping a message from a Redis Sorted Set is using ZRANGE + ZREM; however, it is not thread-safe and needs a distributed lock when multiple threads on different hosts access the set at the same time.
Please suggest a better way to pop members from the Sorted Set, if there is one.
In Redis 5.0 or above, you can use [B]ZPOP{MIN|MAX} key [count] for this scenario.
The MIN version takes the item(s) with the lowest scores; MAX takes the item(s) with the highest scores. count defaults to 1, and the B prefix blocks until the data is available.
ZPOPMIN
ZPOPMAX
BZPOPMIN
BZPOPMAX
You can write a Lua script to do the job: wrap ZRANGE and ZREM in a single Lua script. Redis ensures that the Lua script runs atomically.
local key = KEYS[1]
local result = redis.call('ZRANGE', key, 0, 0)
local member = result[1]
if member then
    redis.call('ZREM', key, member)
    return member
else
    return nil
end
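To see why running the read and the remove as one step matters, here is a small in-memory model, not a Redis client. The class `TinySortedSet` is hypothetical; its lock stands in for Redis executing a script (or ZPOPMIN) without interruption from other connections.

```python
import threading


class TinySortedSet:
    """In-memory stand-in for one Redis sorted set (illustration only)."""

    def __init__(self):
        self._scores = {}
        # The lock plays the role of Redis running one command or script at a time.
        self._lock = threading.Lock()

    def zadd(self, member, score):
        with self._lock:
            self._scores[member] = score

    def pop_min(self):
        """Find the lowest-scored member and remove it as one indivisible step."""
        with self._lock:
            if not self._scores:
                return None
            # Ties on score are broken by member name, as in a sorted set.
            member = min(self._scores, key=lambda m: (self._scores[m], m))
            del self._scores[member]
            return member


def drain(zset, out, out_lock):
    """Worker: keep popping until the set is empty, recording what was popped."""
    while True:
        member = zset.pop_min()
        if member is None:
            return
        with out_lock:
            out.append(member)
```

With several threads running `drain` against the same set, every member is handed out exactly once. If the lookup and the delete were two separate steps, two workers could read the same lowest member before either removed it.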
Is it safe, to share an array between promises like I did it in the following code?
#!/usr/bin/env perl6
use v6;
sub my_sub ( $string, $len ) {
    my ( $s, $l );
    if $string.chars > $len {
        $s = $string.substr( 0, $len );
        $l = $len;
    }
    else {
        $s = $string;
        $l = $s.chars;
    }
    return $s, $l;
}
my @orig = <length substring character subroutine control elements now promise>;
my $len = 7;
my @copy;
my @length;
my $cores = 4;
my $p = @orig.elems div $cores;
my @vb = ( 0..^$cores ).map: { [ $p * $_, $p * ( $_ + 1 ) ] };
@vb[@vb.end][1] = @orig.elems;
my @promise;
for @vb -> $r {
    @promise.push: start {
        for $r[0]..^$r[1] -> $i {
            ( @copy[$i], @length[$i] ) = my_sub( @orig[$i], $len );
        }
    };
}
await @promise;
It depends how you define "array" and "share". So far as array goes, there are two cases that need to be considered separately:
Fixed size arrays (declared my @a[$size]); this includes multi-dimensional arrays with fixed dimensions (such as my @a[$xs, $ys]). These have the interesting property that the memory backing them never has to be resized.
Dynamic arrays (declared my @a), which grow on demand. These are, under the hood, actually using a number of chunks of memory over time as they grow.
So far as sharing goes, there are three cases:
The case where multiple threads touch the array over its lifetime, but only one can ever be touching it at a time, due to some concurrency control mechanism or the overall program structure. In this case the arrays are never shared in the sense of "concurrent operations using the arrays", so there's no possibility to have a data race.
The read-only, non-lazy case. This is where multiple concurrent operations access a non-lazy array, but only to read it.
The read/write case (including when reads actually cause a write because the array has been assigned something that demands lazy evaluation; note this can never happen for fixed size arrays, as they are never lazy).
Then we can summarize the safety as follows:
                     | Fixed size     | Variable size |
---------------------+----------------+---------------+
Read-only, non-lazy  | Safe           | Safe          |
Read/write or lazy   | Safe *         | Not safe      |
The * indicating the caveat that while it's safe from Perl 6's point of view, you of course have to make sure you're not doing conflicting things with the same indices.
So in summary, fixed size arrays you can safely share and assign to elements of from different threads "no problem" (but beware false sharing, which might make you pay a heavy performance penalty for doing so). For dynamic arrays, it is only safe if they will only be read from during the period they are being shared, and even then if they're not lazy (though given array assignment is mostly eager, you're not likely to hit that situation by accident). Writing, even to different elements, risks data loss, crashes, or other bad behavior due to the growing operation.
So, considering the original example, we see my @copy; and my @length; are dynamic arrays, so we must not write to them in concurrent operations. The code does exactly that, so it is not safe.
The other posts already here do a decent job of pointing in better directions, but none nailed the gory details.
Just have the code that is marked with the start statement prefix return the values so that Perl 6 can handle the synchronization for you. Which is the whole point of that feature.
Then you can wait for all of the Promises, and get all of the results using an await statement.
my @promise = do for @vb -> $r {
    start
      do # to have the 「for」 block return its values
        for $r[0]..^$r[1] -> $i {
            $i, my_sub( @orig[$i], $len )
        }
}
my @results = await @promise;
for @results -> ($i,$copy,$len) {
    @copy[$i] = $copy;
    @length[$i] = $len;
}
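The same "return the values, assemble them in one place" pattern can be sketched outside Perl 6. Below is a Python analogue, offered only as an illustration of the idea and not as a translation of the runtime behaviour; the names `clip` and `clip_all` are made up, and `clip` mirrors what my_sub does.

```python
from concurrent.futures import ThreadPoolExecutor


def clip(string, max_len):
    """Return the string cut to at most max_len characters, plus its length."""
    s = string[:max_len]
    return s, len(s)


def clip_all(words, max_len, workers=4):
    """Run clip on every word in a thread pool and collect the results.

    Workers only return values; nothing is written to a shared list from
    inside a task. pool.map yields results in input order, so the two
    output lists are built by a single thread afterwards.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda w: clip(w, max_len), words))
    copies = [s for s, _ in results]
    lengths = [n for _, n in results]
    return copies, lengths


words = "length substring character subroutine control elements now promise".split()
copies, lengths = clip_all(words, 7)
```

The only shared state is the read-only input list, which falls in the "read-only, non-lazy" row of the table above.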
The start statement prefix is only sort-of tangentially related to parallelism.
When you use it you are saying, “I don't need these results right now, but probably will later”.
That is the reason it returns a Promise (asynchrony), and not a Thread (concurrency)
The runtime is allowed to delay actually running that code until you finally ask for the results, and even then it could just do all of them sequentially in the same thread.
If the implementation actually did that, it could result in something like a deadlock if you instead poll the Promise by continually calling its .status method, waiting for it to change from Planned to Kept or Broken, and only then ask for its result.
This is part of the reason the default scheduler will start working on the code behind any Promise as soon as it has spare threads.
I recommend watching jnthn's talk “Parallelism, Concurrency, and Asynchrony in Perl 6” (slides).
This answer reflects my understanding of the situation on MoarVM; I'm not sure what the state of the art is on the JVM backend (or the JavaScript backend, fwiw).
Reading a scalar from several threads can be done safely.
Modifying a scalar from several threads can be done without having to fear for a segfault, but you may miss updates:
$ perl6 -e 'my $i = 0; await do for ^10 { start { $i++ for ^10000 } }; say $i'
46785
The same applies to more complex data structures like arrays (e.g. missing values being pushed) and hashes (missing keys being added).
So, if you don't mind missing updates, changing shared data structures from several threads should work. If you do mind missing updates, which I think is what you generally want, you should look at setting up your algorithm in a different way, as suggested by @Zoffix Znet and @raiph.
No.
Seriously. Other answers seem to make too many assumptions about the implementation, none of which are tested by the spec.
Is there a command in redis where I can set a default value for a key if it does not exist?
For example if get hello returns (nil) I would like to default it to world. But if the key hello already exists, I would like to return this value.
You can do it with a Lua script:
local value = redis.call("GET", KEYS[1])
if (not value) then
redis.call("SET", KEYS[1], ARGV[1])
return ARGV[1]
end
return value
Save this as script.lua and call it like this:
$ redis-cli eval "$(cat script.lua)" 1 myKey defaultValue
You could also use SETNX to put the default value and then do a normal GET.
A simple Get should be the fastest thing to do
// practically Get would suffer a miss only the first time
value = Get key
if value not nil
return value
// Get did not find key => setNX
saved = SetNX key default
if saved
return default
// SetNX discovered key value was already set by someone else => Get it
return Get key
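The pseudocode above could look roughly like this in application code. This is a sketch, assuming a redis-py style client that exposes `get(key)` (returning `None` on a miss) and `setnx(key, value)` (truthy only when the key was newly set); the function name `get_or_default` is made up for illustration.

```python
def get_or_default(client, key, default):
    """Return the value stored at key, storing and returning default on a miss.

    `client` is assumed to behave like a redis-py client: get() returns
    None for a missing key and setnx() is truthy only if it set the key.
    """
    # Common case: the key exists and one GET is enough.
    value = client.get(key)
    if value is not None:
        return value
    # Miss: try to claim the key. SETNX is atomic, so only one caller wins.
    if client.setnx(key, default):
        return default
    # Someone else set the key between our GET and SETNX; read their value.
    # Note: this can still be None if the key was deleted in the meantime.
    return client.get(key)
```

With a real client, returned values may come back as bytes rather than str unless the client is configured to decode responses.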
While it is possible, as you can see in the other answers, be aware that depending on your use case it may be faster to make 2 requests instead, for two reasons. First, a script forces you to compute the default value up front, which is what you are trying to avoid with a cache in the first place. Second, you send the script AND the computed data with each request, which slows down traffic, whereas if your data stays the same for a while you only send a simple GET request.
This is why a get-or-set is better if implemented in your app or at the driver level. I don't know the language you use, but here is a pseudo code version:
function get_or_set(key, callback_function)
  value_str = REDIS.get(key)
  if value_str.nil?
    value = callback_function(REDIS) # pass REDIS to the anonymous function just in case
    REDIS.set(key, pack(value)) # pack could be JSON or MsgPack
    return value
  else
    return unpack(value_str)
  end
end
This is simplified because it only takes care of strings (packing makes sure the stored values are strings) and no expiry is implemented, but you get the idea.
While it sounds overkill, it is how you get better performance. Otherwise it definitely would be a Redis feature already.