Redis: using members of a sorted list to delete external keys

With SORT you can sort a set and fetch external keys based on the sorted elements.
For example:
Suppose the external values are stored under keys that follow the pattern itemkey:<somestring>,
and a sorted list holds the <somestring> members. Then the command SORT <list key> BY nosort GET itemkey:* returns the values of the referenced keys.
I would like to go through the sorted list and delete these individual keys, but it appears that SORT <key> BY nosort DEL itemkey:* is not supported.
Any suggestions on how to get the list of values stored in a set and then delete the external keys?
Obviously I can do this with two commands, first getting the list of values and then iterating through the list and calling DEL for each, but this is not desirable because I require an atomic operation.

To get an atomic operation you can use either a transaction (MULTI/EXEC) or a Redis Lua script. I went with a script: Redis runs the entire script before it processes the next request, and the script can use the reply of one command (the list of members) as input to the next, which is not possible inside MULTI/EXEC, where replies only arrive at EXEC.
In the snippet below, loadScript stores the script on the Redis side (SCRIPT LOAD), so it does not have to be sent with every call; the SHA1 digest returned by loadScript is then passed to Jedis's evalsha.
Using Scala (Jedis is a Java library, hence the .asJava):
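import scala.jdk.CollectionConverters._ // Scala 2.13+; on 2.12 use scala.collection.JavaConverters._
// client is a redis.clients.jedis.Jedis instance
// Deletes every "index:recipe:<member>" key for the members of KEYS[1], then KEYS[1] itself.
// BY nosort stops SORT from trying to parse non-numeric members as numbers.
val scriptClearIndexRecipe = """local names = redis.call('SORT', KEYS[1], 'BY', 'nosort');
  | for i, k in ipairs(names) do
  |   redis.call('DEL', "index:recipe:"..k)
  | end;
  | redis.call('DEL', KEYS[1]);
  | return 0;""".stripMargin
// SCRIPT LOAD caches the script on the server and returns its SHA1 digest
def loadScript(script: String): String = client.scriptLoad(script)
// EVALSHA runs the cached script identified by that digest
def eval(luaSHA: String, keys: List[String], args: List[String]): AnyRef =
  client.evalsha(luaSHA, keys.asJava, args.asJava)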
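A Python variant of the same idea, as a minimal sketch: it assumes a redis-py client, whose `register_script` returns a callable that runs the script with EVALSHA and loads it first if the server does not have it cached (the same load-once idea as loadScript above). The key prefix is passed in ARGV instead of being hard-coded, and `CLEAR_INDEX_LUA` and `clear_index` are hypothetical names. Because the script builds the deleted key names itself rather than receiving them in KEYS, it targets a single Redis instance, not Redis Cluster.

```python
# Lua script: delete every key "<prefix><member>" for the members of KEYS[1],
# then delete the index key itself. Redis runs the whole script atomically.
CLEAR_INDEX_LUA = """
local members = redis.call('SORT', KEYS[1], 'BY', 'nosort')
for _, member in ipairs(members) do
  redis.call('DEL', ARGV[1] .. member)
end
redis.call('DEL', KEYS[1])
return #members
"""


def clear_index(client, index_key, prefix):
    """Atomically delete the keys referenced by index_key, then index_key.

    `client` is assumed to be a redis.Redis instance (anything exposing
    register_script). Returns the number of members that were processed.
    """
    script = client.register_script(CLEAR_INDEX_LUA)
    return script(keys=[index_key], args=[prefix])
```

For the recipe index above, the call would be `clear_index(client, index_key, "index:recipe:")`.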

Related

Is there a command in Redis for HASH data structure similar to MGET?

I need to get, in one call, all fields for a set of known Redis hash keys. For string keys I've used MGET:
MGET key [key ...]
Available since 1.0.0.
Time complexity: O(N) where N is the number of keys to retrieve.
Returns the values of all specified keys. For every key that does not hold a string value or does not exist, the special value nil is returned. Because of this, the operation never fails.
HMGET only returns fields from one key. I need all fields for many keys.
There is no command like that. Hash commands work within a single hash: HMGET returns the requested fields of one hash (and HGETALL all of its fields). There is no way to read the fields of multiple hashes at once.
However, you can issue one HGETALL (or HMGET) per hash to get all the fields, and pipeline these commands so they are sent in one go.
Option 1
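Example implementation in Java with Jedis (jedis is a connected Jedis instance):
import java.util.Map;
import redis.clients.jedis.Pipeline;
import redis.clients.jedis.Response;
Pipeline p = jedis.pipelined();
Response<Map<String, String>> h1 = p.hgetAll("key1");
Response<Map<String, String>> h2 = p.hgetAll("key2");
Response<Map<String, String>> h3 = p.hgetAll("key3");
p.sync(); // sends the queued commands; h1.get(), h2.get(), h3.get() are usable afterwards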
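The same pipelining pattern in Python, as a minimal sketch assuming a redis-py client; `hgetall_many` is a hypothetical helper name. With `transaction=False` the commands are only batched into one round trip, not wrapped in MULTI/EXEC.

```python
def hgetall_many(client, keys):
    """Fetch every field of several hashes in one round trip.

    `client` is assumed to be a redis.Redis instance. Returns a dict mapping
    each hash key to its field/value dict (an empty dict for a missing key).
    """
    keys = list(keys)  # iterated twice: once to queue, once to zip
    pipe = client.pipeline(transaction=False)  # batching only, no MULTI/EXEC
    for key in keys:
        pipe.hgetall(key)  # queued locally, nothing sent yet
    replies = pipe.execute()  # one round trip; replies come back in command order
    return dict(zip(keys, replies))
```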
Option 2
Another option is to write a Lua script and call it with EVAL:
-- Note: KEYS scans the whole keyspace and blocks the server; fine for a one-off, not for production
local array = {}
local keys = redis.call('KEYS', '<your pattern>')
for _, key in ipairs(keys) do
  local val = redis.call('HGETALL', key)
  array[#array + 1] = val
end
return array
Call the Lua script:
redis-cli EVAL "$(cat test.lua)" 0
1) 1) "field1"
   2) "val"
2) 1) "field1"
   2) "val"
   3) "field2"
   4) "val2"
As noted in another answer, there's no built-in way, but there are more workarounds besides pipelining or a transaction.
Option 1: use a Lua script (e.g. EVAL "..." 3 myhash1 myhash2 myhash3 myfield)
-- Returns the value of field ARGV[1] from every hash passed in KEYS
local r = {}
for i, k in ipairs(KEYS) do
  r[i] = redis.call('HGET', k, ARGV[1])
end
return r
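Calling such a script from Python could look like this; a minimal sketch assuming redis-py, where `register_script` returns a callable that runs the script with EVALSHA. `HGET_MANY_LUA` and `hget_many` are hypothetical names, and the hash names are passed as KEYS and the field as ARGV[1], exactly like the EVAL call above.

```python
# Same logic as the Lua script above: read one field from every hash in KEYS.
HGET_MANY_LUA = """
local r = {}
for i, k in ipairs(KEYS) do
  r[i] = redis.call('HGET', k, ARGV[1])
end
return r
"""


def hget_many(client, hash_keys, field):
    """Return [value or None, ...] for `field` in each hash, in one EVALSHA.

    `client` is assumed to be a redis.Redis instance.
    """
    script = client.register_script(HGET_MANY_LUA)
    return script(keys=list(hash_keys), args=[field])
```

`hget_many(client, ["myhash1", "myhash2", "myhash3"], "myfield")` corresponds to the `EVAL "..." 3 myhash1 myhash2 myhash3 myfield` call above.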
Option 2: write a Redis module
Out of scope for an answer :)

Redis: How to increment hash key when adding data?

I'm iterating through data and dumping some of it into a Redis DB. Here's an example:
hmset id:1 username "bsmith1" department "accounting"
How can I increment the unique ID on the fly and then use it in the next hmset command? This seems like an obvious question, but I can't quite find the answer.
Use another key, a String, for storing the last ID. Before calling HMSET, call INCR on that key to obtain the next ID. Wrap the two commands in a MULTI/EXEC block or a Lua script to ensure the atomicity of the transaction.
As Itamar mentions, you can store your index/counter in a separate key. In this example I've named that key index.
Python 3
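import redis
KEY_INDEX = 'index'
r = redis.from_url(host)  # host is your Redis URL, e.g. 'redis://localhost:6379/0'
def store_user(user):
    # INCR creates the key if it doesn't exist and atomically returns the new value.
    # Use that return value directly: reading the key back with GET could return
    # a value that another client has already incremented in the meantime.
    int_index = r.incr(KEY_INDEX, 1)
    result = r.set('user::%d' % int_index, user)
    ...
Note that user::<index> is an arbitrary key chosen by me. You can use whatever you want.
If you have multiple machines writing to the same DB, using the value returned by INCR (rather than a separate GET) keeps every ID unique. If the INCR and the write must also happen as one atomic step, use a Lua script.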
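To allocate the ID and write the hash in one atomic step, as suggested above with MULTI/EXEC or a Lua script: inside MULTI/EXEC the INCR reply is only returned at EXEC, so the new ID cannot be used in the same transaction, while a script can use it immediately. A minimal sketch assuming redis-py and Redis 4.0+ (for HSET with several field/value pairs); `ADD_WITH_ID_LUA` and `add_with_id` are hypothetical names, and the `id:` prefix follows the question's `id:1` key.

```python
# KEYS[1] = counter key, ARGV[1] = hash key prefix, ARGV[2..] = field, value, ...
# INCR allocates the ID and HSET writes the hash in one atomic server-side step.
ADD_WITH_ID_LUA = """
local id = redis.call('INCR', KEYS[1])
redis.call('HSET', ARGV[1] .. id, unpack(ARGV, 2))
return id
"""


def add_with_id(client, counter_key, prefix, fields):
    """Store `fields` (a non-empty dict) under prefix + next ID; return the ID.

    `client` is assumed to be a redis.Redis instance. The hash key is built
    inside the script, so this targets a single instance, not Redis Cluster.
    """
    if not fields:
        raise ValueError("fields must not be empty")
    flat = [item for pair in fields.items() for item in pair]  # f1, v1, f2, v2, ...
    script = client.register_script(ADD_WITH_ID_LUA)
    return script(keys=[counter_key], args=[prefix, *flat])
```

For the question's data: `add_with_id(r, "index", "id:", {"username": "bsmith1", "department": "accounting"})`.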

Deleting a complete folder in Redis

How can I delete a complete folder in Redis, using the DEL command or in C# with StackExchange.Redis?
SomeFolder1:SomeSubfolder1:somekey1
SomeFolder1:SomeSubfolder1:somekey2
SomeFolder1:SomeSubfolder2:somekey1
How can I delete SomeFolder1:SomeSubfolder1 so that the remaining keys are
SomeFolder1:SomeSubfolder2:somekey1
There's no such thing as a "folder" in Redis, and no command that deletes keys by pattern. Options:
use a separate database: Redis has a configurable number of numbered databases (16 by default; the default database is 0), and you can discard a whole database in a single operation with FLUSHDB, so keep all of this associated data in one database and you're sorted
use SCAN to iterate over the matching keys, issuing a DEL for each match
note that SCAN is a server-level command (IServer in StackExchange.Redis, not IDatabase), so you need:
const int dbIndex = 0;
var server = muxer.GetServer(...);
var db = muxer.GetDatabase(dbIndex);
// server.Keys uses SCAN when the server supports it, fetching pageSize keys per round trip
foreach (var key in server.Keys(dbIndex, "SomeFolder1:SomeSubfolder1:*",
    pageSize: 500))
{
    db.KeyDelete(key, flags: CommandFlags.FireAndForget);
}
FireAndForget lets you ignore the individual replies, which means you aren't bound by latency and you avoid TPL overheads.
Fundamentally, though: Redis is not meant to be used like this; if you find yourself scanning keys, you are doing something wrong. A more typical implementation would use a hash to store the folder (with each key/value pair inside the hash being the contents); deleting the hash is then one operation. Alternatively, use a set to store the keys of the items inside each logical folder.
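For comparison, the SCAN-and-delete loop in Python; a minimal sketch assuming redis-py, whose `scan_iter` wraps SCAN with MATCH and COUNT, and whose `delete` sends one DEL for several keys. `delete_by_prefix` is a hypothetical name. Like the C# loop, this is not atomic: keys created while the scan runs may or may not be deleted.

```python
def delete_by_prefix(client, prefix, batch_size=500):
    """Delete every key starting with `prefix` via SCAN; return how many were removed.

    `client` is assumed to be a redis.Redis instance. `prefix` is used as a
    glob pattern, so characters such as * ? [ inside it must be escaped.
    """
    deleted = 0
    batch = []
    for key in client.scan_iter(match=prefix + "*", count=batch_size):
        batch.append(key)
        if len(batch) == batch_size:
            deleted += client.delete(*batch)  # DEL k1 k2 ... returns the count removed
            batch.clear()
    if batch:
        deleted += client.delete(*batch)  # remaining partial batch
    return deleted
```

For the question's keys: `delete_by_prefix(client, "SomeFolder1:SomeSubfolder1:")`.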
Hash approach:
hash = SomeFolder1:SomeSubfolder1
    key = somekey1, value = ...
    key = somekey2, value = ...
hash = SomeFolder1:SomeSubfolder2
    key = somekey1, value = ...
Set approach:
set: SomeFolder1:SomeSubfolder1
    somekey1
    somekey2
set: SomeFolder1:SomeSubfolder2
    somekey1
string: SomeFolder1:SomeSubfolder1:somekey1
    value
string: SomeFolder1:SomeSubfolder1:somekey2
    value
string: SomeFolder1:SomeSubfolder2:somekey1
    value
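A minimal sketch of the set approach with redis-py: each write also records the item name in the folder's set, and deleting a folder reads that set and removes everything with one DEL. `folder_put`, `delete_folder` and `_text` are hypothetical helpers. The read-then-delete in `delete_folder` is two round trips, so if other clients may add items concurrently it would need a Lua script or WATCH/MULTI to be atomic.

```python
def _text(value):
    """redis-py returns bytes unless decode_responses=True; normalize to str."""
    return value.decode() if isinstance(value, bytes) else value


def folder_put(client, folder, name, value):
    """Write folder:name and record name in the folder's index set."""
    pipe = client.pipeline()  # MULTI/EXEC: both writes are executed together
    pipe.set(f"{folder}:{name}", value)
    pipe.sadd(folder, name)
    pipe.execute()


def delete_folder(client, folder):
    """Delete every key recorded in the folder's set, plus the set itself."""
    names = client.smembers(folder)
    keys = [f"{folder}:{_text(n)}" for n in names]
    return client.delete(folder, *keys)  # number of keys actually removed
```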

How to retrieve all hash values from a list in Redis?

In Redis, to store an array of objects we should use a hash for each object and add its key to a list:
HMSET concept:unique_id name "concept"
...
LPUSH concepts concept:unique_id
...
I want to retrieve all hash values (or objects) in the list, but the list contains only hash keys, so a two-step command is necessary, right? This is how I'm doing it in Python:
def get_concepts():
    keys = r.lrange("concepts", 0, -1)
    pipe = r.pipeline()
    for key in keys:
        pipe.hgetall(key)
    return pipe.execute()  # one dict per hash, in list order
Is it necessary to iterate and fetch each individual item? Can it be more optimized?
You can use the SORT command to do this:
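SORT concepts BY nosort GET *->name GET *->some_key
Here * is replaced by each element of the list. Because the list stores full keys (concept:unique_id), the pattern is *->field; if the list held only the IDs (unique_id), you would write concept:*->name instead.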
Add LIMIT offset count for pagination.
Note that you have to enumerate each field in the hash (each field you want to fetch).
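From Python this could look as follows; a minimal sketch assuming redis-py's `sort` method (its `by`, `get`, `start` and `num` parameters map to BY, GET and LIMIT) and a list holding full hash keys as in the question, hence the `*->field` patterns. SORT returns the GET values as one flat list, all fields of the first element, then all fields of the next, so the helper regroups them. `fetch_concepts` is a hypothetical name; fields missing from a hash come back as None.

```python
def fetch_concepts(client, fields, list_key="concepts", start=None, num=None):
    """Return one {field: value} dict per hash key stored in `list_key`.

    Runs SORT <list_key> BY nosort GET *->f1 GET *->f2 ... [LIMIT start num].
    `client` is assumed to be a redis.Redis instance; pass start and num
    together for pagination.
    """
    fields = list(fields)
    flat = client.sort(list_key, start=start, num=num, by="nosort",
                       get=[f"*->{field}" for field in fields])
    n = len(fields)
    # Regroup the flat reply into chunks of n values, one chunk per list element.
    return [dict(zip(fields, flat[i:i + n])) for i in range(0, len(flat), n)]
```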
Another option is the EVAL command (new in Redis 2.6), which executes a Lua script on the Redis server: it can do the same iteration you are doing, but server side.
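A minimal sketch of that server-side variant, assuming redis-py's `register_script`; `LIST_HGETALL_LUA` and `get_concepts_server_side` are hypothetical names. The script reads the list with LRANGE and calls HGETALL for each element, so the whole fetch is one round trip. The hash keys are read from the list rather than passed in KEYS, so this suits a single Redis instance rather than Redis Cluster.

```python
# KEYS[1] = list of hash keys. Returns one flat [field, value, ...] array per hash.
LIST_HGETALL_LUA = """
local result = {}
for i, key in ipairs(redis.call('LRANGE', KEYS[1], 0, -1)) do
  result[i] = redis.call('HGETALL', key)
end
return result
"""


def get_concepts_server_side(client, list_key="concepts"):
    """Fetch all hashes referenced by list_key with one EVALSHA round trip.

    `client` is assumed to be a redis.Redis instance.
    """
    script = client.register_script(LIST_HGETALL_LUA)
    rows = script(keys=[list_key])
    # Convert each flat [f1, v1, f2, v2, ...] reply into a dict.
    return [dict(zip(row[0::2], row[1::2])) for row in rows]
```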

Redis sorted sets and best way to store uids

I have data consisting of user_ids and tags of these user ids.
The user_ids occur multiple times and have a pre-specified number of tags (500), though that might change in the future. What must be stored is the user_id, their tags and each tag's count.
Later I want to easily find the tags with the top scores, etc. Every time a tag appears, its count is incremented.
My implementation in redis is done using sorted sets
every user_id is a sorted set
key is user_id and is a hex number
works like this:
zincrby user_id:x 1 "tag0"
zincrby user_id:x 1 "tag499"
zincrby user_id:y 1 "tag3"
and so on
Keeping in mind that I want to get the tags with the highest score, is there a better way?
The second issue is that right now I'm using "keys *" to retrieve these keys for client-side manipulation, which I know is not meant for production systems.
Plus, to avoid memory problems it would be great to iterate through a specified number of keys at a time (in the range of 10000). I know the keys have to be stored in memory, but they don't follow a specific pattern that would allow partial retrieval, which would let me avoid the "zmalloc" error (4GB 64-bit Debian server).
The keys number around 20 million.
Any thoughts?
My first point would be to note that 4 GB is tight for storing 20M sorted sets. A quick test shows that 20M users, each of them with 20 tags, would take about 8 GB on a 64-bit box (and that already accounts for the sorted set ziplist memory optimizations provided with Redis 2.4; don't even try this with earlier versions).
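Dividing out those figures shows why 4 GB is tight (8 GB and 20M users with 20 tags each are the measured numbers above; the rest is arithmetic):

```latex
\frac{8 \times 10^{9}\ \text{bytes}}{2 \times 10^{7}\ \text{users}} \approx 400\ \text{bytes/user},
\qquad
\frac{400\ \text{bytes/user}}{20\ \text{tags/user}} = 20\ \text{bytes per tag entry},
\qquad
\frac{4 \times 10^{9}\ \text{bytes}}{400\ \text{bytes/user}} = 10^{7}\ \text{users}
```

So a 4 GB server holds only about 10M such users, half of the 20M needed, before counting any index keys.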
Sorted sets are the ideal data structure to support your use case. I would use them exactly as you described.
As you pointed out, KEYS should not be used to iterate over keys; it is rather meant as a debug command. To support key iteration, you need to add a data structure that provides this access path. The only structures in Redis which can support iteration are the list and the sorted set (through the range methods). However, they tend to transform O(n) iteration algorithms into O(n^2) (for list), or O(nlogn) (for zset). A list is also a poor choice to store keys since it will be difficult to maintain it as keys are added/removed.
A more efficient solution is to add an index composed of regular sets. You need to use a hash function to associate a specific user with a bucket, and add the user id to the set corresponding to this bucket. If the user IDs are numeric values, a simple modulo function is enough. If they are not, a simple string hashing function will do the trick.
So to support iteration on user:1000, user:2000 and user:1001, let's choose a modulo 1000 function. user:1000 and user:2000 will be put in bucket index:0 while user:1001 will be put in bucket index:1.
So on top of the zsets, we now have the following keys:
index:0 => set[ 1000, 2000 ]
index:1 => set[ 1001 ]
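The bucket choice itself can be sketched as a small function. This is a minimal sketch under stated assumptions: numeric IDs use the modulo from the example, and non-numeric IDs (such as the hex user IDs in the question) use `zlib.crc32` as the "simple string hashing function". Any hash that is stable across processes would do; Python's built-in `hash()` is randomized per process for strings, so it would not. `bucket_for` and `index_key` are hypothetical names.

```python
import zlib

NBUCKETS = 1000  # same bucket count as in the example above


def bucket_for(user_id, nbuckets=NBUCKETS):
    """Return the bucket number N (as in index:N) for a user id."""
    if isinstance(user_id, int):
        return user_id % nbuckets
    # crc32 is stable across processes and machines, unlike hash() on str.
    return zlib.crc32(str(user_id).encode("utf-8")) % nbuckets


def index_key(user_id, nbuckets=NBUCKETS):
    """Name of the set that should contain this user id."""
    return "index:%d" % bucket_for(user_id, nbuckets)
```

With these definitions, user 1000 and user 2000 land in `index:0` and user 1001 in `index:1`, matching the layout above.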
In the sets, the key prefix is not needed; dropping it allows Redis to optimize memory consumption by serializing the sets as long as they are kept small enough (the integer set optimization proposed by Sripathi Krishnan).
The global iteration consists of a simple loop over the buckets from 0 to 1000 (excluded). For each bucket, the SMEMBERS command is applied to retrieve the corresponding set, and the client can then iterate over the individual items.
Here is an example in Python:
#!/usr/bin/env python3
# ----------------------------------------------------
import random
import redis
POOL = redis.ConnectionPool(host='localhost', port=6379, db=0)
NUSERS = 10000
NTAGS = 500
NBUCKETS = 1000
# ----------------------------------------------------
# Fill redis with some random data
def fill(r):
    p = r.pipeline()
    # Create only 10000 users for this example
    for id in range(0, NUSERS):
        user = "user:%d" % id
        # Add the user in the index: a simple modulo is used to hash the user id
        # and put it in the correct bucket
        p.sadd("index:%d" % (id % NBUCKETS), id)
        # Add 20 random tags (tag:0 .. tag:NTAGS-1) to the user
        for x in range(0, 20):
            tag = "tag:%d" % random.randrange(NTAGS)
            p.zincrby(user, 1, tag)  # redis-py 3.x+ argument order: name, amount, member
        # Flush the pipeline every 1000 users
        if id % 1000 == 0:
            p.execute()
            print(id)
    # Flush one last time
    p.execute()
# ----------------------------------------------------
# Iterate on all the users and display their 5 highest ranked tags
def iterate(r):
    # Iterate on the buckets of the key index
    # The range depends on the function used to hash the user id
    for x in range(0, NBUCKETS):
        # Iterate on the users in this bucket
        for id in r.smembers("index:%d" % x):
            user = "user:%d" % int(id)
            print(user, r.zrevrangebyscore(user, "+inf", "-inf", 0, 5, True))
# ----------------------------------------------------
# Main function
def main():
    r = redis.Redis(connection_pool=POOL)
    r.flushall()  # WARNING: erases every database on this server
    m = r.info()["used_memory"]
    fill(r)
    info = r.info()
    print("Keys: ", info["db0"]["keys"])
    print("Memory: ", info["used_memory"] - m)
    iterate(r)
# ----------------------------------------------------
if __name__ == "__main__":
    main()
By tweaking the constants, you can also use this program to evaluate the global memory consumption of this data structure.
IMO this strategy is simple and efficient, because it offers O(1) complexity to add/remove users, and true O(n) complexity to iterate over all items. The only downside is that the key iteration order is random.