Sorted sets vs hash in Redis?

I have the following information that I need to store in Redis:
url => {title, author, email}
Each URL has a title, an author, and an email.
I need to ensure that this information is not duplicated in the store.
I am thinking of using sorted sets like this:
ZADD links_urls url "title"
ZADD links_author url "author"
ZADD links_email url "email"
What do you think about this? Am I wrong?

This is not the correct way to use a sorted set. You are using url as a score. However, scores must be numeric (they define the sort order).
If I understand your constraint correctly, each url is unique. If that is the case, I would use a hash to store everything.
I would keep a single hash, use the url as the field name, and concatenate or JSON-encode the title, author, and email into the value, like this:
HSET links <url> '<title>::<author>::<email>'
This ensures constant time lookup and amortized constant time insertion.
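A minimal Python sketch of the JSON-encoded variant, in case it helps. The helper names (save_link, load_link) and the FakeHashClient class are made up for illustration; the fake only mimics the hset/hget call shape of the redis-py client so the example runs without a server. With a real client you would pass that in instead.

```python
import json


class FakeHashClient:
    """In-memory stand-in mimicking redis-py's hset/hget (illustration only)."""

    def __init__(self):
        self._hashes = {}

    def hset(self, name, key, value):
        fields = self._hashes.setdefault(name, {})
        is_new = key not in fields
        fields[key] = value
        # Redis reports 1 for a newly created field and 0 for an overwrite.
        return 1 if is_new else 0

    def hget(self, name, key):
        return self._hashes.get(name, {}).get(key)


def save_link(client, url, title, author, email):
    """Store one link in the `links` hash, using the URL as the field name."""
    payload = json.dumps({"title": title, "author": author, "email": email})
    return client.hset("links", url, payload)


def load_link(client, url):
    """Return the decoded fields for a URL, or None if the URL is unknown."""
    raw = client.hget("links", url)
    return None if raw is None else json.loads(raw)


client = FakeHashClient()
save_link(client, "http://example.com/a", "Title", "Alice", "alice@example.com")
# Saving the same URL again overwrites the field rather than adding a duplicate.
save_link(client, "http://example.com/a", "New title", "Alice", "alice@example.com")
print(load_link(client, "http://example.com/a"))
```

Because the URL is the field name, uniqueness comes for free: a second write for the same URL replaces the old value.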

How to tag a key in REDIS so later I can remove all keys that match this tag?

Today we save data like this:
$redisClient->set($uniquePageID, $data);
and read it back like this:
$redisClient->get($uniquePageID);
But now we need to remove data based on a userID, so we need something like this:
$redisClient->set($uniquePageID, $data)->tag($userID);
so that we can remove all the keys related to that userID only, for example:
$redisClient->tagDel($userID);
Can Redis do something like that?
Thanks
There's no built-in way to do that. Instead, you need to tag these pages by yourself:
When setting a page-data pair, also put the page id into a SET of the corresponding user.
When you want to remove all pages of a given user, scan the SET of the user to get the page ids of this user, and delete these pages.
When scanning the SET, you can use either the SMEMBERS or the SSCAN command, depending on the size of the SET. If it's a big SET, prefer SSCAN to avoid blocking Redis for a long time.
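The steps above could look roughly like this in Python. This is only a sketch: the key layout user:<id>:pages and the function names are my own assumptions, and FakeRedis is an in-memory stand-in that copies the set/get/sadd/smembers/delete call shapes of redis-py so the example runs without a server. It uses SMEMBERS; for a big SET you would iterate with SSCAN instead.

```python
class FakeRedis:
    """In-memory stand-in mimicking a few redis-py calls (illustration only)."""

    def __init__(self):
        self._data = {}

    def set(self, name, value):
        self._data[name] = value
        return True

    def get(self, name):
        return self._data.get(name)

    def sadd(self, name, *values):
        members = self._data.setdefault(name, set())
        before = len(members)
        members.update(values)
        return len(members) - before

    def smembers(self, name):
        return set(self._data.get(name, set()))

    def delete(self, *names):
        # Returns how many of the given keys actually existed.
        return sum(1 for n in names if self._data.pop(n, None) is not None)


def user_pages_key(user_id):
    """Hypothetical key naming for the per-user 'tag' SET."""
    return f"user:{user_id}:pages"


def save_page(client, user_id, page_id, data):
    """Store the page and record its id in the owning user's SET."""
    client.set(page_id, data)
    client.sadd(user_pages_key(user_id), page_id)


def delete_user_pages(client, user_id):
    """Delete every page recorded for the user, then the SET itself."""
    tag_key = user_pages_key(user_id)
    page_ids = client.smembers(tag_key)
    if page_ids:
        client.delete(*page_ids)
    client.delete(tag_key)
    return len(page_ids)
```

Note that the two writes in save_page are not atomic here; with a real server you would probably wrap them in a MULTI/EXEC transaction or a pipeline.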
I used HSET and HDEL to store and delete items like this:
$this->client = new Predis\Client(array...);
$this->client->hset($key, $tag, $value);
$this->client->hdel($key, $tags);
If you want to delete the whole key, regardless of tag or value, you can use DEL, which works with any data type, including hashes:
$this->client->del($key);

Implementing following stream

I am developing a photo-sharing app with a follow system: whoever follows user X sees user X's photos in their following feed.
I am storing my data in Redis as follows:
sadd redis_key+user_id photo_id
set redis_key+photo_id+data data_of_photo
sadd redis_key+follow+user_id follower_id
Now I want to get all the photo_ids of the followers directly, without looping.
This is a simple fan-out problem, which you cannot easily do with Redis directly.
You can do it with Lua, but you will block Redis while the script runs.
I have an open source project which does the same thing but I do it in code as someone creates a new post. I would imagine this is just like a new photo.
https://github.com/pjuu/pjuu/blob/master/pjuu/posts/backend.py#L252
I use sorted sets though and use the unix timestamp as the score so they are always in order.
When user1 creates a new photo, you look up the list of their followers. If you are using a sorted set you can get this via:
followers = zrange followers:user1 0 -1
then simply loop over all the entries in that list:
for follower in followers: zadd feed:<follower> <timestamp> <photo_id>
This way the new post is pushed out to all users that follow user1.
If you want this done on the fly, then bad news: you would need relational data and a way to query by value, which Redis can't do. You would have to use SQL, Mongo, Couch, etc.
This is only pseudo code as you did not mention which language you use.
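For what it's worth, here is the same fan-out-on-write idea as a small runnable Python sketch. The function names and the FakeSortedSets class are invented for the example; the fake copies the zadd(name, mapping) and zrange(name, start, end) call shapes of redis-py and only supports end=-1 or a non-negative end. With a real redis-py client I am assuming it was created with decode_responses=True, so that follower ids come back as strings.

```python
class FakeSortedSets:
    """In-memory stand-in mimicking redis-py's zadd/zrange (illustration only)."""

    def __init__(self):
        self._zsets = {}

    def zadd(self, name, mapping):
        zset = self._zsets.setdefault(name, {})
        added = sum(1 for member in mapping if member not in zset)
        zset.update(mapping)
        return added

    def zrange(self, name, start, end):
        zset = self._zsets.get(name, {})
        # Order by score, then by member, like a Redis sorted set.
        ordered = sorted(zset, key=lambda m: (zset[m], m))
        stop = len(ordered) if end == -1 else end + 1
        return ordered[start:stop]


def follow(client, follower_id, followed_id, timestamp):
    """Record that follower_id follows followed_id."""
    client.zadd(f"followers:{followed_id}", {follower_id: timestamp})


def publish_photo(client, author_id, photo_id, timestamp):
    """Push a new photo id into the feed of every follower of the author."""
    followers = client.zrange(f"followers:{author_id}", 0, -1)
    for follower in followers:
        client.zadd(f"feed:{follower}", {photo_id: timestamp})
    return len(followers)


client = FakeSortedSets()
follow(client, "user2", "user1", 1)
publish_photo(client, "user1", "photo:9", 100)
print(client.zrange("feed:user2", 0, -1))
```

Reading a feed is then a single range query on feed:<user>, with no loop at read time.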
EDIT: As per the question, this is to be done on the Redis side:
-- KEYS[1]: the followers zset of the posting user
-- ARGV[1], ARGV[2]: score and member to add to each follower's items zset
local followers = redis.call('zrange', KEYS[1], 0, -1)
for _, follower in ipairs(followers) do
  redis.call('zadd', 'items:' .. follower, ARGV[1], ARGV[2])
end
return true
The script takes the key of the user's followers zset as KEYS[1] and iterates over it. ARGV[1] and ARGV[2] are the zset score and member, which are added to the items zset of each follower. You will need to change it to suit your exact needs. If you want to use sets you will need to use sscan or something similar. Zsets are easier though, and they stay in order.

Add Redis expire to whole bunch of namespaced key?

Say I have namespaced keys for a user id, holding:
lastMessages
isNice attribute
So it goes like this:
>lpush user:111:lastMessages a
>lpush user:111:lastMessages b
>lpush user:111:lastMessages c
ok
let's add the isNice prop:
>set user:111:isNice 1
so : let's see all keys for 111 :
> keys user:111*
result :
1) "user:111:isNice"
2) "user:111:lastMessages"
OK, but
I want to expire the namespaced entry as a whole, so that on timeout all the keys go at once. I don't want to start managing each namespaced key and its remaining time, because not all props are added at the same time, but I want all props to die at the same time.
Question:
Does this mean that I have to set an expire for each namespaced key?
If not, what is the correct way of doing it?
Yes, the way you have it set up, these are all just separate keys. You can think of the namespace as an understanding you have with all the people who will access the Redis store:
Okay guys, here's the deal. We're all going to use keys that look like this:
user:{user_id}:lastMessages
That way, we all understand where to look to get user number 325's last messages.
But really, there's nothing shared between user:111:lastMessages and user:111:isNice.
The fix
A way you can do what you're describing is using a hash. You will create a hash whose key is user:111 and then add fields lastMessages and isNice.
> hset user:111 lastMessages "you are my friend!"
> hset user:111 isNice true
> expire user:111 1000
Or, all at once,
> hmset user:111 lastMessages "you are my friend!" isNice true
> expire user:111 1000
The Redis documentation has a page describing its data types; see the 'Hashes' section for more information.
Edit
Ah, I hadn't noticed you were using a list.
If you don't have too many messages (under 20, say), you could serialize them into JSON and store them as one string. That's not a very good solution though.
The cleanest way might just be to set two expires.
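A rough Python sketch of setting the expires together. Note one deliberate variation that is mine, not from the answer above: it uses EXPIREAT with a single shared absolute deadline instead of EXPIRE, so keys created at different times still die at the same moment. The function name and the FakeExpiringClient class are hypothetical; the fake only records expireat(name, when) calls in the shape redis-py uses.

```python
import time


class FakeExpiringClient:
    """In-memory stand-in that records redis-py-style expireat calls."""

    def __init__(self):
        self.deadlines = {}

    def expireat(self, name, when):
        self.deadlines[name] = when
        return True


def expire_user_keys(client, user_id, deadline, fields=("lastMessages", "isNice")):
    """Give every namespaced key of one user the same absolute expiry time."""
    keys = [f"user:{user_id}:{field}" for field in fields]
    for key in keys:
        client.expireat(key, deadline)
    return keys


client = FakeExpiringClient()
deadline = int(time.time()) + 1000  # Unix timestamp, 1000 seconds from now
expire_user_keys(client, 111, deadline)
```

If you store the deadline somewhere (or derive it from a fixed rule), any key added later for the same user can be given the same deadline.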

Ensuring Uniqueness for a sorted set in redis

I am trying to store media objects and have them retrievable by a certain time range through redis. I have chosen a sorted set data type to do this. I am adding elements like:
zAdd: key: media:552672 score: 1355264694
zAdd: key: media:552672 score: 1355248565
zAdd: key: media:552672 score: 1355209157
zAdd: key: media:552672 score: 1355208992
zAdd: key: media:552672 score: 1355208888
zAdd: key: media:552672 score: 1355208815
Where key is unique to the location id the media was taken at, and the score is the creation time of the media object. The value is a JSON-encoded string of the media object.
When I retrieve using zRevRangeByScore, there are occasionally duplicate entries. I'm essentially using Redis as a buffer in front of an external API: if users make the same API call twice within X seconds, I retrieve the results from the cache; otherwise I add them to the cache without checking whether they already exist, relying on the definition of a set not containing duplicates.
Possible known issues:
If an attribute of the media object changes between cache writes, it will show up as a duplicate
Is there a better way to store this type of data without doing checks on the Redis client side?
TL;DR
What is the best way to store and retrieve objects in Redis where you can select a range of objects by timestamp and ensure that they are unique?
Let's make sure we're talking about the same things, so here is the terminology for Redis sorted sets:
ZADD key score member [score member ...]
summary: Add one or more members to a sorted set, or update its score if it already exists
key - the 'name' of the sorted set
score - the score (in our case a timestamp)
member - the string the score is associated with
A sorted set has many members, each with a score
It sounds like you are using a JSON-encoded string of the object as the member. The member is what is unique in a sorted set. As you say, if the object changes it will be added as a new member to the sorted set. That is probably not what you want.
A sorted set is the Redis way to store data by timestamp, but the member that is stored in the set is usually a 'pointer' to another key in Redis.
From your description I think you want this data structure:
A sorted set storing all media by created timestamp
A string or hash for each unique media
I recommend storing the media objects in a hash as this allows more flexibility.
Example:
# add some members to our sorted set
redis 127.0.0.1:6379> ZADD media 1000 media:1 1003 media:2 1001 media:3
(integer) 3
# create hashes for our members
redis 127.0.0.1:6379> HMSET media:1 id 1 name "media one" content "content string for one"
OK
redis 127.0.0.1:6379> HMSET media:2 id 2 name "media two" content "content string for two"
OK
redis 127.0.0.1:6379> HMSET media:3 id 3 name "media three" content "content string for three"
OK
There are two ways to retrieve data stored in this way. If you need to get members within specific timestamp ranges (eg: last 7 days) you will have to use ZREVRANGEBYSCORE to retrieve the members, then loop through those to fetch each hash with HGETALL or similar. See pipelining to see how you can do the loop with one call to the server.
redis 127.0.0.1:6379> ZREVRANGEBYSCORE media +inf -inf
1) "media:2"
2) "media:3"
3) "media:1"
redis 127.0.0.1:6379> HGETALL media:2
1) "id"
2) "2"
3) "name"
4) "media two"
5) "content"
6) "content string for two"
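The same layout (a sorted set of 'pointers' plus one hash per media object) could be wrapped like this in Python. It is a sketch under assumptions: save_media and media_between are names I made up, and FakeMediaStore is an in-memory stand-in that copies the call shapes of redis-py's zadd, zrevrangebyscore, hset and hgetall so it runs without a server. It fetches the hashes one by one; with a real client you would likely batch them through a pipeline.

```python
class FakeMediaStore:
    """In-memory stand-in mimicking a few redis-py calls (illustration only)."""

    def __init__(self):
        self._zsets = {}
        self._hashes = {}

    def zadd(self, name, mapping):
        zset = self._zsets.setdefault(name, {})
        added = sum(1 for member in mapping if member not in zset)
        zset.update(mapping)
        return added

    def zrevrangebyscore(self, name, max_score, min_score):
        zset = self._zsets.get(name, {})
        hits = [m for m, s in zset.items() if min_score <= s <= max_score]
        return sorted(hits, key=lambda m: (zset[m], m), reverse=True)

    def hset(self, name, key=None, value=None, mapping=None):
        fields = self._hashes.setdefault(name, {})
        if key is not None:
            fields[key] = value
        if mapping:
            fields.update(mapping)

    def hgetall(self, name):
        return dict(self._hashes.get(name, {}))


def save_media(client, media_id, created_at, fields):
    """Write the object to its own hash and index its key by creation time."""
    key = f"media:{media_id}"
    client.hset(key, mapping=fields)
    # The member is the stable key, so a changed object never adds a duplicate.
    client.zadd("media", {key: created_at})
    return key


def media_between(client, newest, oldest):
    """Return the media hashes created in [oldest, newest], newest first."""
    keys = client.zrevrangebyscore("media", newest, oldest)
    return [client.hgetall(key) for key in keys]
```

Since the sorted-set member is media:<id> rather than the serialized object, saving an edited object again updates its hash and leaves a single member in the set.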
If you only want to get the last n members (or eg: 10th most recent to 100th most recent) you can use SORT to get items. See the sort documentation for syntax and how to retrieve different hash fields, limit the results and other options.
redis 127.0.0.1:6379> SORT media BY nosort GET # GET *->name GET *->content DESC
1) "media:2"
2) "media two"
3) "content string for two"
4) "media:3"
5) "media three"
6) "content string for three"
7) "media:1"
8) "media one"
9) "content string for one"
NB: sorting a sorted set by score (BY nosort) only works from Redis 2.6.
If you plan on getting media for the last day, week, month, etc., I would recommend using a separate sorted set for each one and using ZREMRANGEBYSCORE to remove old members. You can then just use SORT on these sorted sets to retrieve the data.

How to implement our own UID in Lucene?

I wish to create an index with, let's say, the following fields:
UID
title
owner
content
Of these, I don't want UID to be searchable (like metadata).
I want the UID to behave like a docID, so that I can use it when I want to delete or update a document.
Is this possible? How can I do it?
You could mark it as non-searchable by adding it with Store.YES and Index.NO, but that won't allow easy updating or removal by that field. You'll need to index the field to allow replacing the document (using IndexWriter.UpdateDocument(Term, Document), where term = new Term("UID", "...")), so you need to use either Index.ANALYZED with a KeywordAnalyzer, or Index.NOT_ANALYZED. You can also use the FieldCache if you have a single-valued field, which a primary key usually is. However, indexing the field does make it searchable.
Summary:
Store.NO (It can be retrieved using the FieldCache or a TermsEnum)
Index.NOT_ANALYZED (The complete value will be indexed as a term, including any whitespace)