Ensuring uniqueness for a sorted set in Redis

I am trying to store media objects in Redis so that I can retrieve them by time range. I chose a sorted set for this, and I am adding elements like:
zAdd: key: media:552672 score: 1355264694
zAdd: key: media:552672 score: 1355248565
zAdd: key: media:552672 score: 1355209157
zAdd: key: media:552672 score: 1355208992
zAdd: key: media:552672 score: 1355208888
zAdd: key: media:552672 score: 1355208815
The key is unique to the id of the location where the media was taken, the score is the creation time of the media object, and the value (the member) is a json_encode of the media object.
When I retrieve with zRevRangeByScore, there are occasionally duplicate entries. I'm essentially using Redis as a buffer in front of an external API. If users make the same API call twice within X seconds, I return the results from the cache. Otherwise I add them to the cache without checking whether they already exist, since by definition a set does not contain duplicates.
Possible known issues:
If a media object's attributes change between cache writes, it shows up as a duplicate.
Is there a better way to store this type of data without doing checks on the redis client side?
TLDR;
What is the best way to store and retrieve objects in Redis where you can select a range of objects by timestamp and ensure that they are unique?

Let's make sure we're talking about the same things, so here is the terminology for Redis sorted sets:
ZADD key score member [score] [member]
summary: Add one or more members to a sorted set, or update its score if it already exists
key - the 'name' of the sorted set
score - the score (in our case a timestamp)
member - the string the score is associated with
A sorted set has many members, each with a score
It sounds like you are using a JSON-encoded string of the object as the member. The member is what is unique in a sorted set. As you say, if the object changes, it is added to the sorted set as a new member. That is probably not what you want.
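To see why a changed attribute produces a duplicate, here is a minimal plain-Python model (no Redis needed). A dict stands in for a sorted set and maps member to score, the way ZADD does. The media fields and the `m1` id are hypothetical. `sort_keys=True` matters too: the same object encoded with a different key order would also become a different member.

```python
import json


def zadd(zset: dict, score: float, member: str) -> int:
    """Model ZADD for one member: 1 if the member is new, 0 if only its score was updated."""
    is_new = member not in zset
    zset[member] = score
    return int(is_new)


media = {"id": "m1", "caption": "sunset"}  # hypothetical media object

# Member = the JSON string itself.
by_json = {}
zadd(by_json, 1355264694, json.dumps(media, sort_keys=True))
zadd(by_json, 1355264694, json.dumps(media, sort_keys=True))  # identical string: no new member
media["caption"] = "sunset (edited)"
zadd(by_json, 1355264694, json.dumps(media, sort_keys=True))  # different string: second member

# Member = a stable id ("pointer") instead.
by_id = {}
zadd(by_id, 1355264694, "media:" + media["id"])
zadd(by_id, 1355264694, "media:" + media["id"])

print(len(by_json), len(by_id))  # 2 1
```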
A sorted set is the Redis way to store data by timestamp, but the member that is stored in the set is usually a 'pointer' to another key in Redis.
From your description I think you want this data structure:
A sorted set storing all media by created timestamp
A string or hash for each unique media
I recommend storing the media objects in a hash as this allows more flexibility.
Example:
# add some members to our sorted set
redis 127.0.0.1:6379> ZADD media 1000 media:1 1003 media:2 1001 media:3
(integer) 3
# create hashes for our members
redis 127.0.0.1:6379> HMSET media:1 id 1 name "media one" content "content string for one"
OK
redis 127.0.0.1:6379> HMSET media:2 id 2 name "media two" content "content string for two"
OK
redis 127.0.0.1:6379> HMSET media:3 id 3 name "media three" content "content string for three"
OK
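The same layout written from application code might look like the sketch below. It assumes the redis-py client (3.5 or later, for `hset(..., mapping=...)`). `store_media` and its `created` argument are hypothetical names. The hash and its sorted-set entry go out in one MULTI/EXEC pipeline. Storing the same id again overwrites the hash and updates the score instead of adding a duplicate member.

```python
def media_key(media_id) -> str:
    """Key of the hash holding one media object, e.g. media:1 (convention from the example above)."""
    return f"media:{media_id}"


def store_media(client, media: dict, created: int, index_key: str = "media") -> None:
    """Write a media hash plus its pointer in the time-ordered sorted set.

    `client` is assumed to be a redis-py client, e.g. redis.Redis(decode_responses=True).
    """
    key = media_key(media["id"])
    pipe = client.pipeline()  # redis-py wraps pipelines in MULTI/EXEC by default
    # HSET with several fields (Redis >= 4.0); values are stored as strings.
    pipe.hset(key, mapping={field: str(value) for field, value in media.items()})
    # The member is the stable key, so re-adding the same media only updates its score.
    pipe.zadd(index_key, {key: created})
    pipe.execute()


# Usage (needs a running Redis server):
# import redis
# r = redis.Redis(decode_responses=True)
# store_media(r, {"id": 1, "name": "media one", "content": "content string for one"}, 1000)
```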
There are two ways to retrieve data stored like this. If you need members within a specific timestamp range (e.g. the last 7 days), use ZREVRANGEBYSCORE to retrieve the members, then loop through them and fetch each hash with HGETALL or similar. See the Redis documentation on pipelining for how to do that loop in a single round trip to the server.
redis 127.0.0.1:6379> ZREVRANGEBYSCORE media +inf -inf
1) "media:2"
2) "media:3"
3) "media:1"
redis 127.0.0.1:6379> HGETALL media:2
1) "id"
2) "2"
3) "name"
4) "media two"
5) "content"
6) "content string for two"
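Here is a sketch of the range query plus pipelined HGETALL described above. It again assumes redis-py with `decode_responses=True`, and `fetch_media_between` and `last_days_bounds` are hypothetical helpers. The pipeline uses `transaction=False` because the reads only need to be batched into one round trip, not wrapped in MULTI/EXEC.

```python
import time

DAY_SECONDS = 86400


def last_days_bounds(days: int, now=None):
    """(max, min) score bounds covering the last `days` days, for a newest-first query."""
    now = int(time.time()) if now is None else now
    return now, now - days * DAY_SECONDS


def fetch_media_between(client, max_score, min_score, index_key="media"):
    """ZREVRANGEBYSCORE for the member keys, then one HGETALL per key in a single round trip."""
    keys = client.zrevrangebyscore(index_key, max_score, min_score)
    pipe = client.pipeline(transaction=False)  # plain pipelining, no MULTI/EXEC needed
    for key in keys:
        pipe.hgetall(key)
    hashes = pipe.execute()
    # HGETALL returns an empty dict for a missing key (e.g. a hash deleted while its
    # pointer stayed in the sorted set); skip those entries.
    return [(key, h) for key, h in zip(keys, hashes) if h]


# Usage (needs a running Redis server):
# hi, lo = last_days_bounds(7)
# recent = fetch_media_between(r, hi, lo)
```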
If you only want to get the last n members (or eg: 10th most recent to 100th most recent) you can use SORT to get items. See the sort documentation for syntax and how to retrieve different hash fields, limit the results and other options.
redis 127.0.0.1:6379> SORT media BY nosort GET # GET *->name GET *->content DESC
1) "media:2"
2) "media two"
3) "content string for two"
4) "media:3"
5) "media three"
6) "content string for three"
7) "media:1"
8) "media one"
9) "content string for one"
NB: sorting a sorted set in its score order (BY nosort) only works from Redis 2.6 onward.
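From redis-py, the same SORT call might look like this sketch. `latest_media` and `get_patterns` are hypothetical helpers. redis-py's `sort()` takes `by`, `get` and `desc`, plus `start`/`num` for LIMIT (both or neither). With `groups=True`, it regroups the flat reply into one tuple per member.

```python
def get_patterns(fields):
    """GET patterns for SORT: '#' returns the member itself, '*->field' reads a field of the hash it names."""
    return ["#"] + [f"*->{field}" for field in fields]


def latest_media(client, start=0, num=10, fields=("name", "content"), index_key="media"):
    """Newest-first page of media, one tuple per member: (key, field1, field2, ...).

    `client` is assumed to be a redis-py client. BY nosort on a sorted set keeps
    score order (Redis >= 2.6), and desc=True makes it newest first.
    """
    return client.sort(
        index_key,
        start=start,
        num=num,
        by="nosort",
        get=get_patterns(fields),
        desc=True,
        groups=True,
    )
```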
If you plan on getting media for the last day, week, month, etc., I would recommend using a separate sorted set for each one and using ZREMRANGEBYSCORE to remove old members. You can then just use SORT on these sorted sets to retrieve the data.
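Here is a sketch of that rolling-window cleanup, assuming redis-py. The key names `media:last_day` and `media:last_week` and the helper names are hypothetical. In ZREMRANGEBYSCORE, a leading `(` makes a bound exclusive, so only members strictly older than the cutoff are removed.

```python
DAY_SECONDS = 86400
# Hypothetical per-window sorted sets and their lengths in seconds.
WINDOWS = {"media:last_day": DAY_SECONDS, "media:last_week": 7 * DAY_SECONDS}


def cutoff_bound(now: int, window_seconds: int) -> str:
    """Exclusive max bound for ZREMRANGEBYSCORE: everything older than now - window."""
    return f"({now - window_seconds}"


def add_and_trim(client, member: str, created: int, now: int) -> None:
    """Add a member to every window set and drop entries that fell out of each window."""
    pipe = client.pipeline()
    for key, window in WINDOWS.items():
        pipe.zadd(key, {member: created})
        pipe.zremrangebyscore(key, "-inf", cutoff_bound(now, window))
    pipe.execute()
```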

Related

Redis HGETALL showing empty array even if LRANGE shows it has elements

When I try to store a list as the value of a hash field, HGETALL returns the response shown in the steps below.
Reproduction steps below:
Connect to redis-cli, then:
HSET user1 user2user1 []
LPUSH user2user1 "test1"
LPUSH user2user1 "test2"
LRANGE user2user1 0 -1 #> shows ["test2", "test1"]
HGETALL user1 #> shows "user2user1" "[]"
I expected HGETALL to return the whole array with all its elements. I checked the documentation but could not find anything about storing a list with HSET.
Reference: https://redis.io/commands/hset/
You are using two different keys:
a hash named user1
a list named user2user1
You are pushing onto the list, so the hash is unchanged.
Moreover, hash fields store strings. There is no push operation for them, as you can see in the documentation you linked. When you run HSET user1 user2user1 [] you are setting the field user2user1 of the hash user1 to the string "[]".
If user2user1 is already a unique key name, the hash adds nothing in this simple setup. You can read the list directly with LRANGE.
HSET stores field-value pairs.
Redis does not link keys for you: the hash and the list stay independent.
You may want to look at the RedisJSON module, where you can store arrays.
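To keep the list inside the hash without a module, one option is to serialise it to a JSON string, because a hash field can only hold a string. This sketch assumes redis-py, and the helper names are hypothetical. Appending becomes a read-modify-write in your application, which is not atomic on its own. When items are pushed often, a separate list key is usually the better fit.

```python
import json


def encode_list(items) -> str:
    """Hash fields hold strings, so store the list as a JSON array."""
    return json.dumps(items)


def decode_list(raw):
    """Inverse of encode_list; a missing field (None) reads as an empty list."""
    return [] if raw is None else json.loads(raw)


def push_to_field(client, hash_key, field, item):
    """Prepend an item like LPUSH would. Read-modify-write: not atomic without WATCH/MULTI or Lua."""
    items = decode_list(client.hget(hash_key, field))
    items.insert(0, item)
    client.hset(hash_key, field, encode_list(items))


# Usage (needs a running Redis server):
# r = redis.Redis(decode_responses=True)
# push_to_field(r, "user1", "user2user1", "test1")
# push_to_field(r, "user1", "user2user1", "test2")
# r.hget("user1", "user2user1")  # '["test2", "test1"]'
```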

Sorted sets vs hash in Redis?

I have the following information that I need to store in Redis:
url => {title, author, email}
Each URL has a title, an author and an email.
I need to ensure that the information is not duplicated in the store.
I am thinking of using sorted sets like this:
ZADD links_urls url "title"
ZADD links_author url "author"
ZADD links_email url "email"
What do you think about this? Am I wrong?
This is not the correct way to use a sorted set. You are using url as a score. However, scores must be numeric (they define the sort order).
If I understand your constraint correctly, each url is unique. If that is the case, I would use a hash to store everything.
I would use the url as the hash field, and then concatenate or JSON-encode the other fields into the value, like this:
HSET links <url> '<title>::<author>::<email>'
This ensures constant time lookup and amortized constant time insertion.
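Here is a sketch of that layout, assuming redis-py. `save_link` and `load_link` are hypothetical helpers. JSON is used rather than `::` concatenation, so a title that itself contains `::` cannot corrupt the record. In place of HSET, it uses HSETNX, which writes the field only if the url is not yet in the hash. An existing entry is therefore never overwritten. Use plain HSET if updates should replace the stored value.

```python
import json


def encode_link(title: str, author: str, email: str) -> str:
    """One JSON string per url; robust even if a field contains '::'."""
    return json.dumps({"title": title, "author": author, "email": email})


def decode_link(raw):
    """Inverse of encode_link; None (url not stored) stays None."""
    return None if raw is None else json.loads(raw)


def save_link(client, url, title, author, email, key="links"):
    """HSETNX: only set the field if this url is not stored yet."""
    return client.hsetnx(key, url, encode_link(title, author, email))


def load_link(client, url, key="links"):
    return decode_link(client.hget(key, url))
```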

How to model this hash with values as list in redis?

I have to store data in Redis, in the following form:
{
"KEY" : {
"k1":["v1", "v2", "v3"],
"k2":["v4", "v5"],
"k3":["v1", "v2"]
},
"KEY1" :{
"k1":["v11", "v2"],
"k2":["v4", "v15", "v3"],
"k3":["v12", "v2"]
}
}
According to the documentation, a hash cannot have a list as a value. What is the best way to model this? The lists are built one value at a time, so I need to append or add to them. Should there be a different database for each top-level key? Or should there be separate Redis instances, so that the top-level key identifies the database or instance, and the mid-level key is then used as the key to load the values of a list or set?
Redis is quite flexible regarding how you can structure your data, but here is a possible approach.
Since hash values have to be strings, they can be keys referring to lists. So you can create a list of values under a top-level key:
redis> lpush list_value:k1 v1
(integer) 1
redis> lpush list_value:k1 v2
(integer) 2
redis> lpush list_value:k1 v3
(integer) 3
Then you set the key referring to that list as a value of a hash:
redis> hset KEY k1 list_value:k1
(integer) 1
To fetch your list of values, you first get the key where those values are stored:
redis> hget KEY k1
"list_value:k1"
Then you use that key to retrieve the list of values:
redis> lrange list_value:k1 0 -1
1) "v3"
2) "v2"
3) "v1"
You should use namespaces (typically based on colon separators) to name the keys pointing to your lists of values in order to avoid clashes with the keys of your hashes.
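Following that advice, this sketch puts the top-level key into the list key's name as well. It assumes redis-py, and the helper names are hypothetical. In the question, both KEY and KEY1 have a `k1` field, so a bare `list_value:k1` would be shared between them. RPUSH is used here so the list keeps insertion order. LPUSH, as above, returns the newest value first.

```python
def list_key(top_key: str, field: str) -> str:
    """Namespaced list key, e.g. list_value:KEY:k1, distinct for KEY and KEY1."""
    return f"list_value:{top_key}:{field}"


def append_value(client, top_key: str, field: str, value: str) -> None:
    """Append one value to the list for (top_key, field) and record the pointer in the hash."""
    key = list_key(top_key, field)
    pipe = client.pipeline()  # MULTI/EXEC so the list and the pointer change together
    pipe.rpush(key, value)
    pipe.hset(top_key, field, key)  # setting the same pointer again is harmless
    pipe.execute()


def read_values(client, top_key: str, field: str) -> list:
    """HGET the pointer, then LRANGE the whole list; [] if the field does not exist."""
    key = client.hget(top_key, field)
    return [] if key is None else client.lrange(key, 0, -1)
```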

Implementing following stream

I am developing a photo-sharing app with a follow system: whoever follows user X should see X's photos in their following feed.
I am storing my data in Redis as follows:
sadd redis_key+user_id photo_id
set redis_key+photo_id+data data_of_photo
sadd redis_key+follow+user_id follower_id
Now I want to get all the photo_ids of followers directly, without looping.
This is a simple fan-out problem, which you cannot easily do with Redis directly.
You can do it with Lua, but you WILL block Redis while the script runs.
I have an open source project which does the same thing but I do it in code as someone creates a new post. I would imagine this is just like a new photo.
https://github.com/pjuu/pjuu/blob/master/pjuu/posts/backend.py#L252
I use sorted sets though and use the unix timestamp as the score so they are always in order.
When user1 creates a new photo, you look up the list of their followers. If you are using a sorted set you can get this via:
followers = zrange followers:user1 0 -1
then simply loop over all entries in that list:
for follower in followers: zadd feed:<follower> <timestamp> <photo_id>
This way the new post is pushed out to every user who follows user1.
If you want this done on the fly at read time, the bad news is that you need relational data and a way to query on values, which Redis can't do. That calls for SQL, Mongo, Couch, etc.
This is only pseudo code as you did not mention which language you use.
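As one concrete rendering of that pseudo code, here is a Python sketch assuming redis-py with `decode_responses=True`. The `followers:<user>` and `feed:<user>` key names follow the pattern of the pseudo code, and `fan_out_photo` is a hypothetical helper. Batching the ZADDs in a pipeline keeps each fan-out to one round trip.

```python
import time


def feed_key(user_id: str) -> str:
    return f"feed:{user_id}"


def fan_out_photo(client, author_id: str, photo_id: str, ts=None) -> int:
    """Push photo_id into the feed of every follower of author_id; returns the follower count."""
    ts = int(time.time()) if ts is None else ts
    followers = client.zrange(f"followers:{author_id}", 0, -1)
    pipe = client.pipeline(transaction=False)  # batching only; each ZADD is independent
    for follower in followers:
        pipe.zadd(feed_key(follower), {photo_id: ts})
    pipe.execute()
    return len(followers)


# Reading a user's feed, newest first:
# client.zrevrange(feed_key("user2"), 0, 19)
```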
EDIT: Since the question asks for this to be done on the Redis side, here is a Lua version:
local followers = redis.call('zrange', KEYS[1], 0, -1)
for key, value in pairs(followers) do
redis.call('zadd', 'items:'..value, ARGV[1], ARGV[2])
end
return true
The script takes one key, the sorted set of the user's followers, plus two arguments: a zset score and a member. It adds that member with that score to the items:<follower> sorted set of every follower. You will need to change it to suit your exact needs. If you want to use plain sets, you will need SMEMBERS (or SSCAN for large sets) instead of ZRANGE. Zsets are easier, though, and keep the feed in order.
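To call that script from Python, redis-py's `register_script` returns a callable. The callable runs EVALSHA and loads the script if Redis does not have it cached yet. In this sketch (assuming redis-py), the variant uses `ipairs` and returns the number of followers instead of `true`. Like the original, it writes to `items:*` keys that are not passed in KEYS. That works on a single Redis instance but not in Redis Cluster.

```python
FAN_OUT_LUA = """
local followers = redis.call('zrange', KEYS[1], 0, -1)
for _, follower in ipairs(followers) do
  redis.call('zadd', 'items:' .. follower, ARGV[1], ARGV[2])
end
return #followers
"""


def make_fan_out(client):
    """Register the script once; call the result as fan_out(keys=[...], args=[score, member])."""
    return client.register_script(FAN_OUT_LUA)


# Usage (needs a running Redis server):
# fan_out = make_fan_out(r)
# fan_out(keys=["followers:user1"], args=[int(time.time()), "photo:42"])
```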

Add Redis expire to a whole bunch of namespaced keys?

Say I have a namespaced key for user + id:
lastMessages
isNice attribute
So it goes like this:
>lpush user:111:lastMessages a
>lpush user:111:lastMessages b
>lpush user:111:lastMessages c
ok
Let's add the isNice prop:
>set user:111:isNice 1
So, let's see all keys for 111:
> keys user:111*
Result:
1) "user:111:isNice"
2) "user:111:lastMessages"
OK, BUT:
I want to expire the namespaced entry as a whole, so that when the timeout hits, all the keys go at once. I don't want to start managing each namespaced key and its remaining time, because not all props are added at the same time, but I want all props to die at the same time.
Question:
Does this mean that I have to set an expire on each namespaced key?
If not, what is the correct way of doing it?
Yes: the way you have it set up, these are all just separate keys. You can think of the namespace as an understanding you have with all the people who will access the Redis store:
Okay guys, here's the deal. We're all going to use keys that look like this:
user:{user_id}:lastMessages
That way, we all understand where to look to get user number 325's last messages.
But really, there's nothing shared between user:111:lastMessages and user:111:isNice.
The fix
A way you can do what you're describing is using a hash. You will create a hash whose key is user:111 and then add fields lastMessages and isNice.
> hset user:111 lastMessages "you are my friend!"
> hset user:111 isNice true
> expire user:111 1000
Or, all at once,
> hmset user:111 lastMessages "you are my friend!" isNice true
> expire user:111 1000
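Here is the same hash-plus-expire from application code, as a sketch assuming redis-py 3.5 or later (for `hset(..., mapping=...)`). HMSET is deprecated since Redis 4.0, where HSET accepts several field-value pairs. The helper name is hypothetical. Sending HSET and EXPIRE in one MULTI/EXEC pipeline means other clients never see the hash without its TTL. Calling it again resets the TTL. Use EXPIREAT with a fixed deadline if later writes should not extend it.

```python
def user_key(user_id) -> str:
    return f"user:{user_id}"


def save_user(client, user_id, fields: dict, ttl_seconds: int = 1000) -> None:
    """Write fields of user:<id> and (re)set one TTL for the whole hash."""
    key = user_key(user_id)
    pipe = client.pipeline()  # MULTI/EXEC: both commands run together
    pipe.hset(key, mapping=fields)
    pipe.expire(key, ttl_seconds)
    pipe.execute()


# Usage (needs a running Redis server):
# save_user(r, 111, {"lastMessages": "you are my friend!", "isNice": "true"})
```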
Here is a page describing redis' data types. Scroll down to where it says 'Hashes' for more information.
Edit
Ah, I hadn't noticed you were using a list.
If you don't have too many messages (under 20, say), you could serialize them into JSON and store them as one string. That's not a very good solution though.
The cleanest way might just be to set an expire on each of the two keys.
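This sketch of the "two expires" approach also copes with props added at different times. It computes one absolute deadline per user and applies it with EXPIREAT to each key as that key is written. Every key then dies at the same moment, however late it was created. It assumes redis-py, and the helper names are hypothetical. A plain SET on a key clears its TTL (unless KEEPTTL is used), so reapply EXPIREAT after such writes. LPUSH leaves an existing TTL alone.

```python
def user_prop_keys(user_id, props):
    """Keys under the user:<id>: namespace, e.g. user:111:isNice."""
    return [f"user:{user_id}:{prop}" for prop in props]


def expire_all_at(client, keys, deadline_unix: int):
    """Give every key the same absolute expiry; a falsy result means that key did not exist."""
    pipe = client.pipeline()
    for key in keys:
        pipe.expireat(key, deadline_unix)
    return pipe.execute()


# Usage (needs a running Redis server):
# import time
# deadline = int(time.time()) + 1000  # decide once per user, e.g. when the first prop is written
# r.lpush("user:111:lastMessages", "c")
# r.set("user:111:isNice", 1)
# expire_all_at(r, user_prop_keys(111, ["lastMessages", "isNice"]), deadline)
```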