I'm currently researching highly scalable website architectures. Nearly all of the articles I've read say that Redis is a very good choice for a timeline (Facebook- or Twitter-like) architecture. Suppose I'm building a new social network and want to keep the last 500 feeds of each user in Redis. What happens when a user deletes a feed that is among those last 500? I couldn't find any information about updating a Redis list item. If Redis has no such feature, how can it be a very good choice?
The LSET command lets you update a list item: https://redis.io/commands/lset. To delete a list item by value, there is also LREM.
Instead of a list, a better Redis structure would be a sorted set, assuming you want to maintain the ordering of the last 500 feeds.
To add a new entry, use ZADD user_id:feeds time_in_epoch feed_id. The time_in_epoch value is the score, which keeps the feeds ordered by time.
To delete a feed for a user, use ZREM user_id:feeds feed_id.
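A minimal sketch of the sorted-set timeline described above, assuming a redis-py style client (its zadd, zremrangebyrank, zrem and zrevrange methods). The key pattern follows the answer; the function names, the MAX_FEEDS constant and the trimming step after each insert are illustrative assumptions, not something the answer spells out.

```python
import time

MAX_FEEDS = 500  # cap taken from the question; adjust as needed


def feeds_key(user_id):
    # Key pattern from the answer: <user_id>:feeds
    return f"{user_id}:feeds"


def add_feed(client, user_id, feed_id, ts=None):
    """Add a feed with its timestamp as score, then keep only the newest MAX_FEEDS."""
    key = feeds_key(user_id)
    client.zadd(key, {feed_id: ts if ts is not None else time.time()})
    # Ranks ascend with score, so ranks 0 .. -(MAX_FEEDS + 1) are the oldest extras.
    client.zremrangebyrank(key, 0, -(MAX_FEEDS + 1))


def delete_feed(client, user_id, feed_id):
    """Remove one feed by id, wherever it sits in the timeline."""
    return client.zrem(feeds_key(user_id), feed_id)


def latest_feeds(client, user_id, count=20):
    """Return the newest `count` feed ids, newest first."""
    return client.zrevrange(feeds_key(user_id), 0, count - 1)
```

With redis-py installed, `client` would be something like `redis.Redis(decode_responses=True)`. Deleting a feed is a single ZREM, so there is no need to rewrite the rest of the timeline.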
I'm writing an interface to query paginated data from an HBase table. I query the data by several conditions, but it is very slow. My rowkey looks like this: 12345678:yyyy-mm-dd, that is, an 8-digit random number followed by a date. I tried caching all rowkeys in Redis and paginating there, but then it is difficult to query the data by the other conditions.
I also considered designing a secondary index in HBase. I discussed it with colleagues, and they think a secondary index is hard to maintain.
So, can anyone give me some ideas?
First, AFAIK the random number + date rowkey pattern may lead to hotspotting once you scale to large data volumes.
Regarding pagination:
I'd suggest Solr + HBase; if you are using Cloudera, that is Cloudera Search. It gave good performance in our case when querying 100 rows per page, and we populated an AngularJS dashboard through a web service call.
Most importantly, you can move back and forth between pages without any issues.
The diagram below describes that.
To achieve this, you need to create Solr collections (from the HBase data), which you can then query with the SolrJ API.
HBase alone with the scan API doesn't work for quick queries.
Apart from that, please see my answer to the following question, which gives more insight and implementation details:
How to achieve pagination in HBase?
An HBase-only solution could be Hindex (a coprocessor-based solution).
The link explains it in more detail.
Hindex architecture:
In HBase, to achieve good read performance you want your data retrieved by a small number of gets (requests for a single row) or a small scan (a request over a range of rows). HBase stores your data sorted by key, so the most important idea is to come up with a row key that allows this.
Your key seems to contain only a random integer and a date, so I assume your queries paginate over records marked with a time.
The first idea is that in a typical pagination scenario you access just one page at a time and navigate from page 1 to page 2 to page 3, and so on. If you want to paginate over all records for the date 2015-08-16, you could use a scan of 50 rows with start key '\0:2015-08-16' (as it is smaller than any row key for 2015-08-16) to retrieve the first page. After retrieving the first page you have its last key, say '12345:2015-08-16'. You can use it (or 12346:2015-08-16) as the start key of another 50-row scan to retrieve page 2, and so on. With this approach each page is fetched quickly, as a single scan with a fixed number of returned rows. You can pass the last row key of a page as a parameter to the paging API, or put it in Redis so that the next paging API call finds it there.
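The sequential access pattern can be sketched in Python against an in-memory sorted list that stands in for the HBase table (HBase also keeps row keys in sorted order). The `scan` helper is a hypothetical stand-in for a real scan with a start row and a row limit, not an HBase API, and appending "\0" to the last key is one assumed way of making the next start key exclusive.

```python
from bisect import bisect_left

PAGE_SIZE = 50


def scan(sorted_keys, start_key, limit):
    """Stand-in for an HBase scan: up to `limit` keys >= start_key, in key order."""
    i = bisect_left(sorted_keys, start_key)
    return sorted_keys[i:i + limit]


def next_page(sorted_keys, last_key=None, page_size=PAGE_SIZE):
    """Fetch the page after `last_key`, or the first page when last_key is None."""
    # Appending "\0" gives the smallest key strictly greater than last_key,
    # so the last row of the previous page is not returned again.
    start = "" if last_key is None else last_key + "\0"
    return scan(sorted_keys, start, page_size)
```

A caller would fetch `page1 = next_page(keys)` and then `page2 = next_page(keys, last_key=page1[-1])`. Each call touches at most one page of rows, regardless of how deep into the table it is.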
All this works perfectly well until some user comes in and clicks directly on page 100, or clicks on page 5 while on page 2. In such a scenario you can use a similar scan that skips nSkippedPages * 50 rows. This will not be as fast as sequential access, but it is not the usual usage pattern. You can then use Redis to cache the last row of each page result in a structure like pageNumber -> rowKey. When the next user comes and clicks on page 100, they will see the same performance as in the usual page 1, page 2, page 3 scenario.
To make things faster for users who click on page 99 the first time, you could write a separate daemon that retrieves every 50th row and puts the result in Redis as a page index. Launch it every 10-15 minutes and accept that your page index is at most 10-15 minutes stale.
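A rough sketch of such a page index, again using an in-memory sorted list in place of the table. It stores the first row key of each page rather than the last one, which is an assumption made here to keep the lookup simple; the function names are illustrative. In practice the resulting dict would be written to Redis, for example as a hash.

```python
from bisect import bisect_left


def build_page_index(sorted_keys, page_size=50):
    """Map 1-based page number -> first row key of that page (every page_size-th key)."""
    return {
        page: sorted_keys[i]
        for page, i in enumerate(range(0, len(sorted_keys), page_size), start=1)
    }


def fetch_page(sorted_keys, page_index, page, page_size=50):
    """Jump straight to `page` through the index instead of skipping rows."""
    start_key = page_index.get(page)
    if start_key is None:
        return []  # page lies beyond the indexed range
    i = bisect_left(sorted_keys, start_key)
    return sorted_keys[i:i + page_size]
```

Because the index is rebuilt periodically, rows inserted between rebuilds can shift page boundaries slightly until the next run.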
You can also design a separate API that preloads row keys for a batch of N pages (say about 100 pages; it could be async, i.e. it doesn't wait for the preload to complete). All it does is a scan with KeyOnlyFilter and 50*N results, and then it picks out the row key for each page. So it accepts a row key and populates Redis with a row key cache for N pages. Then when a user lands on the first page you fetch the row keys of the first 100 pages for them, so when they click a page link shown on the page, that page's start row key is already available. With the right preload batch size you could approach your required latency.
The limit can be implemented using Scan.setLimit() (in HBase versions that provide it) or using PageFilter.
The "skip nPages * 50 rows" and especially the "output every 50th row" functionality seem trickier. For the latter you may end up performing a full scan that retrieves the keys, or writing a MapReduce job to do it. For the former it is not clear how to do it without sending rows over the network, since the request can be distributed across several regions.
If you are looking for secondary indexes that are maintained in HBase, there are several open source options (Splice Machine, Lily, etc.). You can do index lookups in a few milliseconds.
I am trying to make a notification system with Redis rather than using MySQL which is what I use for the rest of the system. The reason for this is that I don't really need to save that much data so it can be saved in memory and I want it to be lightweight and fast.
The notifications will be kept temporarily. What I mean by that is that I do not want to save all notifications, but more like 50 latest unseen notifications for each user. So first thing I thought about was to use a linked list with a capped length of 50.
I would need to save this information for the notification:
postId
commentId
type
time
userId
username
image
So perhaps a JSON serialized string like this:
{"postId":1,"commentId":10,"type":1,"time":1462960058,"userId":2,"username":"Alexander","image":"ntfpRrgx.png"}
The notifications would be output like this on the client side:
Alexander commented on your post.
Alexander replied to your comment.
Where the type determines what kind of notification it is. I can handle "type" checks client side and output the notification format accordingly. But here is the part I am having difficulty with.
1) I need to be able to save the notifications in an ordered way so that I know which notification is newest.
2) I need to be able to know when a notification has been seen, so that it is no longer registered as unseen.
3) I need to have a count of unseen notifications that I can show to the user. If the user clicks on a notification, I need to mark it as seen and decrement the count of unseen notifications.
4) I need to be able to mark all notifications as seen if the user wishes to do that.
5) I need to be able to get a subset of the notifications, whether seen or unseen, like an offset and limit on MySQL. For example, the user sees the newest 5 notifications, but he could click a next button and see the next 5, and the next 5 and so on.
I have no idea how to do all of this on Redis.
The key for the list or set could be user:1:notification. I know a list is sorted, and we can add and remove from the head and tail. But how do I achieve all these points?
1: You can use Redis sorted set (zset) operations, with the timestamp as the score and the event id (or the entire event JSON) as the member.
ZADD my-set-key timestamp event-id
Then, to get a page of the newest items, use the ZREVRANGE command. If you choose the event id as the member, you need an additional structure to store the event fields. I would recommend a hash: HSET eventid field value.
2: You can remove an item by member (event-id):
ZREM my-set-key event-id
3: Assuming your zset only keeps unseen items, you can use ZCARD to get the size of the set:
ZCARD my-set-key
4: You can remove the entire set in one shot using
DEL my-set-key
5: You can paginate using zrange/zrevrange:
ZREVRANGE my-set-key start-position to-position
If you need to keep both seen and unseen items, then you need an extra zset to which you only add, and from which you don't remove items once they are seen.
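Putting the five points together, here is a minimal sketch assuming a redis-py style client (hset with `mapping=`, zadd, zrem, zcard, zrevrange, zremrangebyrank, delete, hgetall). The key names, the function names and the cap of 50 are assumptions for illustration; the two zsets are the "unseen" set and the add-only set from the answer.

```python
def _keys(user_id):
    base = f"user:{user_id}:notifications"
    return f"{base}:all", f"{base}:unseen"


def add_notification(client, user_id, event_id, fields, ts, cap=50):
    """Store the event fields in a hash and index the id in both zsets by time."""
    all_key, unseen_key = _keys(user_id)
    client.hset(f"notification:{event_id}", mapping=fields)
    for key in (all_key, unseen_key):
        client.zadd(key, {event_id: ts})
        # Keep only the newest `cap` entries; the lowest ranks are the oldest.
        client.zremrangebyrank(key, 0, -(cap + 1))


def mark_seen(client, user_id, event_id):
    """Point 2: drop one id from the unseen zset."""
    client.zrem(_keys(user_id)[1], event_id)


def mark_all_seen(client, user_id):
    """Point 4: delete the whole unseen zset."""
    client.delete(_keys(user_id)[1])


def unseen_count(client, user_id):
    """Point 3: the unseen count is the cardinality of the unseen zset."""
    return client.zcard(_keys(user_id)[1])


def get_page(client, user_id, offset=0, limit=5):
    """Point 5: newest-first page of (event_id, fields), seen or unseen."""
    all_key, _ = _keys(user_id)
    ids = client.zrevrange(all_key, offset, offset + limit - 1)
    return [(i, client.hgetall(f"notification:{i}")) for i in ids]
```

This sketch does not delete the hashes of notifications that fall off the end of the capped zsets; a real implementation would need to clean those up, for example with an expiry on each hash.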
New to Redis. Need some help.
USE CASE:
I have thousands of leaderboards. They have usernames in them with appropriate scores. A user can belong to one or more leaderboards. I need an efficient way to get a particular user's rank in every leaderboard they belong to, preferably sorted by rank and with pagination. A typical user will belong to hundreds of leaderboards.
AS FAR AS I GOT:
I keep a set for each user containing boards he belongs to. To get a user's ranks I get his set of boards then zrank each board in the set and then order it by rank in my code. This seems very inefficient and does not support pagination.
I have been reading and brainstorming and I am stuck. What I need is something like:
user1:boards (a,c,e)
board:a (user1,user23,user5)
board:b (user2,user7,user12)
board:c (user2,user1,user42)
board:d (user36,user4,user9)
board:e (user6,user19,user1)
SORT user1:boards BY board:*->user1
Similar to sorting by hash fields except -> in this case means the sorted set score of the member provided. Would there be any performance improvement if such a feature existed? Or would it be the same as pipelining all the zranks?
Thanks.
To make your reads efficient, you just have to make a minor change to your writes.
Currently you are storing user boards in a set; store them in a sorted set instead. Let's call it user_boards_sorted_set.
So whenever you increase the score of user 1 in a leaderboard sorted set (board1, for example), you run a ZRANK for user 1 on board1, and that rank becomes the score of board1 in user 1's user_boards_sorted_set.
This way user_boards_sorted_set always contains all the boards the user belongs to, and the score of each entry is the user's rank in that particular leaderboard. Run a ZRANGE on user_boards_sorted_set and you will have the user's ranks in all the leaderboards, sorted by rank.
UPDATE: Based on feedback in the comments, and an incorrect assumption made in the answer above.
Another good way would be to use Lua scripting to get the individual board rankings by running ZRANK on all the boards the user belongs to and sorting the result in Lua itself. This gives a significant performance gain, because all the ZRANK calls and the sorting are done on the server side, which reduces network transfers.
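For comparison, here is a sketch of the pipelined variant the question mentions, not the Lua script itself: all rank lookups are sent in one round trip and the sorting and pagination happen in the client. It assumes a redis-py style client created with decode_responses=True, the key names from the question (`<user>:boards`, `board:<name>`), and ZREVRANK so that rank 0 is the highest score; the function name is illustrative.

```python
def user_ranks(client, user, offset=0, limit=10):
    """Return [(board, rank), ...] for `user`, best rank first, one page at a time."""
    boards = sorted(client.smembers(f"{user}:boards"))

    # Queue one ZREVRANK per board and send them all in a single round trip.
    pipe = client.pipeline(transaction=False)
    for board in boards:
        pipe.zrevrank(f"board:{board}", user)
    ranks = pipe.execute()

    # Skip boards where the user has no entry (ZREVRANK returns None there).
    ranked = sorted(
        (rank, board) for board, rank in zip(boards, ranks) if rank is not None
    )
    return [(board, rank) for rank, board in ranked[offset:offset + limit]]
```

Every call still looks up the rank in all of the user's boards, so pagination only trims the output. A Lua script would do the same work on the server and return just the requested page.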
Node.js & Redis:
I have a LIST (users:waiting) storing a queue of users waiting to join games.
I have a SORTED SET (games:waiting) of games waiting for users. This is updated by the servers every 30s with a new date. This way I can ensure that if a server crashes, the game is no longer used. If the server is running and fills up, it'll remove itself from the sorted set.
Each game has a SET (game:id:users) containing the users that are in it. Each game can accept no more than 6 players.
Multiple servers are using BRPOP to pick up users from the LIST (users:waiting).
Once a server has a user id, it gets the waiting games ids, then proceeds to run SCARD on their game:id:users SET. If the result of this is less than 6, it adds them to the set.
The problem:
If multiple servers are doing this at once, we could end up with more than 6 users being added to a set at a time. For example if one server requests SCARD and immediately after another runs SADD, the number in the set will have increased but the first server won't know.
Is there any way of preventing this?
You need transactions, which Redis supports: http://redis.io/topics/transactions
In your case in particular, you want to pay attention to the WATCH command: http://redis.io/topics/transactions#cas
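The question uses Node.js; the same check-and-set idea is sketched here in Python, assuming redis-py's `transaction(func, *watches, value_from_callable=True)` helper, which WATCHes the given keys, runs the callable, and retries it if a watched key changed before EXEC. The function name `try_join` and the `max_players` parameter are illustrative; the key name follows the question.

```python
def try_join(client, game_id, user_id, max_players=6):
    """Add user_id to the game's set only if it has fewer than max_players members."""
    key = f"game:{game_id}:users"

    def attempt(pipe):
        # The key is WATCHed at this point, so this read executes immediately.
        if pipe.scard(key) >= max_players:
            return False  # game is full, nothing gets queued
        # Everything after multi() is queued and applied atomically on EXEC.
        # If another server changed the set in between, EXEC fails and the
        # helper calls attempt() again with a fresh SCARD.
        pipe.multi()
        pipe.sadd(key, user_id)
        return True

    return client.transaction(attempt, key, value_from_callable=True)
```

A server would call `try_join` for each waiting game id until one returns True. Because the SCARD and the SADD are tied together by WATCH, two servers cannot both take the sixth slot.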
Some articles are written in several parts.
For example, I got these articles from IBM developerWorks:
Distributed data processing with Hadoop, Part 1: Getting started
Distributed data processing with Hadoop, Part 2: Going further
Distributed data processing with Hadoop, Part 3: Application development
I will index these three articles separately. When someone searches for certain keywords, it is possible that Part 3 is at the top of the hits while Part 1 is 32nd. Therefore, if I list the results page by page, Part 1 and Part 3 will be displayed on different pages.
How can I make sure that the hit documents in the same series are displayed together?
I guess in SQL, we can use "group by".
I believe what you are asking for is Field Collapsing, which is currently a trunk feature in Solr, and will be incorporated into the next Solr version.
If you want to roll your own, one possible way to do this is:
Add a "series id" field to each document that is a member of a series. You will have to ensure that this gets incremented for every new series.
Make an initial query to Lucene, and get a hit list.
For each hit, check to see if it has a series id; if it does, make another query by the series id in order to retrieve all the members of the series.
An alternative is to store the ids of all the series members in a field inside each member's document.
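The roll-your-own steps above can be sketched in Python as follows. Hits are modelled as plain dicts, and `fetch_series` is a hypothetical callable standing in for the second query by series id; neither is a Lucene or Solr API.

```python
def expand_series(hits, fetch_series):
    """Group a ranked hit list so that members of a series are displayed together.

    hits: ranked list of dicts, each with an "id" and an optional "series_id".
    fetch_series: callable(series_id) -> all documents of that series, in order.
    Returns a list of groups, ordered by the best-ranked hit of each group.
    """
    seen_series = set()
    groups = []
    for hit in hits:
        series_id = hit.get("series_id")
        if series_id is None:
            # A stand-alone document forms its own group.
            groups.append([hit])
        elif series_id not in seen_series:
            # First hit of this series: pull in every member with a second query.
            seen_series.add(series_id)
            groups.append(fetch_series(series_id))
        # Later hits of a series that was already expanded are skipped.
    return groups
```

Paginating over the returned groups rather than over individual hits keeps Part 1 and Part 3 on the same page, at the cost of one extra query per distinct series in the hit list.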