There are many accounts, each receiving events (data points with timestamps) stored in real time. I discovered that it is a good idea to store events using a sorted set. I tried to store events for multiple accounts in a single sorted set, but then couldn't figure out how to filter events by account id.
Is it a good idea to create a separate sorted set for each account (> 1000 accounts)?
Questions:
How long will you keep these events in memory?
Won't your number of accounts grow?
Are you sure you will have enough memory?
... but yes, you should definitely create a sorted set for each account; that's the state of the art when using Redis.
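A minimal redis-py sketch of that layout, assuming a key pattern like account:<id>:events and millisecond timestamps as scores (both are illustrative choices, not something from the question):

import time
import redis

r = redis.Redis(decode_responses=True)

def record_event(account_id, event_payload):
    # One sorted set per account; the timestamp is the score, so time-range
    # queries come for free. Members must be unique, so include an event id
    # in the payload if identical payloads can repeat.
    r.zadd(f"account:{account_id}:events", {event_payload: time.time() * 1000})

def events_between(account_id, start_ms, end_ms):
    # Filtering "by account" is now just a matter of picking the right key.
    return r.zrangebyscore(f"account:{account_id}:events", start_ms, end_ms)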
However, if it's all about real-time events (storage and retrieval), you may want to try a database like InfluxDB, which provides a powerful SQL-like query system. It seems a better fit for your problem.
Let's say we want to design a gaming platform for chess and our user base is ~50M. Whenever a player comes to our platform to play chess, we assign them a random player with almost the same rating as an opponent. Every player has one of 4 rating levels [Easy, Medium, Hard, Expert].
My approach was to keep all users in the Redis cache (assume all users/bots are live and waiting for an opponent), so we are keeping data in the below format:
"chess:easy" : [u1, u2, u3]
"chess:medium" : [u4, u5, u6]
so when a user comes, I will remove a user from the cache and assign them as the opponent.
For example: u7 (easy) wants to play a chess game, so his opponent will be u1 (easy).
But won't this create a problem for concurrent requests, since we read and then remove from the Redis list, which will be blocking?
Can anyone suggest a better approach with or without cache?
But won't this create a problem for concurrent requests, since we read and then remove from the Redis list, which will be blocking?
No, because Redis is single-threaded when it performs writes. But, assuming the 50M users are equally distributed across the chess levels, you would get 4 lists, one per difficulty, each around 12.5 million entries long.
Manipulating the list could add a lot of complexity, because [LPOP][1] only lets you pop from the head of the list; selecting and removing a specific user requires LREM, which has O(N) complexity, and a user deep in the list may wait a long time before getting an opponent.
I think you should use a HASH data structure and split the users across different databases:
db0: Easy
db1: Medium
db2: Hard
db3: Expert
This way, you can store each user that wants to play with HSET <USERID> status "ready to play", then use the RANDOMKEY command to select an opponent and delete it with HDEL <KEY RETURNED BY RANDOMKEY> status.
Doing so, you will execute only O(1) commands, providing a fast and reliable matching system that you can optimize further with the Redis pipelining feature.
NB: if you instead use one hash per difficulty level and add the users as multiple fields of that hash, you lose this pattern: RANDOMKEY picks a random key, not a random field, so selecting and deleting an opponent is no longer a simple O(1) operation!
[1]: https://redis.io/commands/lpop
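A rough redis-py sketch of that pattern, following the database layout above (the race between RANDOMKEY and HDEL under concurrent matchers is not handled here, and key/field names are only illustrative):

import redis

# One connection per difficulty level, i.e. per Redis database.
LEVEL_DB = {"easy": 0, "medium": 1, "hard": 2, "expert": 3}
conns = {level: redis.Redis(db=db, decode_responses=True) for level, db in LEVEL_DB.items()}

def enter_queue(user_id, level):
    # HSET <USERID> status "ready to play"
    conns[level].hset(user_id, "status", "ready to play")

def find_opponent(level):
    r = conns[level]
    opponent = r.randomkey()        # random waiting user in this difficulty's database
    if opponent is not None:
        r.hdel(opponent, "status")  # take them out of the waiting pool
    return opponent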
Can anyone suggest a better approach with or without cache?
There are many algorithms you may use such as:
Stable marriage problem
Exact cover
but it depends on the user experience you want to implement for your clients.
You can also explore the gamedev.stackexchange.com community.
I've been trying to make a replay system. Basically, when a player moves, the system saves their data (movements, location, animation, etc.) into a JSON file. By the end of the recording, the JSON file may be over 50 MB. I want to save this data into Redis with an expiry (24-48 hours).
My questions are:
Is it bad to save objects over 50 MB in Redis with an expiry?
How many entries of over 50 MB can Redis handle without performance loss?
If players make 500 recordings in 48 hours, could that be bad for Redis?
How many milliseconds does it take to fetch 50 MB of data from Redis on an average VDS/VPS?
Storing a large object (in terms of size) is not good practice. You may read about it here. One of the problems is the network: you need to send a 50 MB payload to the Redis server in a single call. Also, if you save everything as one big object, then to retrieve or update it (a single field, element, etc.) you have to pull 50 MB back from the server, parse it just to reach that single field, update it, and send the whole thing back. That's a serious problem in terms of network.
Instead of Redis strings, you may prefer sorted sets or lists, depending on your use case. If you are going to store the events with timestamps and fetch the range of events between two timestamps, then sorted sets may be an ideal solution for you. They are good for pagination etc. One crucial drawback is that adding a new element has O(log(N)) complexity.
Lists may also be a good fit for your case. You can use LPUSH/RPUSH to add new events to your list, and since Redis lists are implemented as linked lists, adding an element to either the beginning or the end of the list is the same cost, O(1), which is great.
Whenever an event happens, you call either ZADD or RPUSH/LPUSH to send it to Redis. If you need to query the events later, you can use commands such as ZRANGEBYSCORE or LRANGE, depending on your choice.
While designing your keys, you may use an identifier such as the user id, just like you mentioned in the comments. With lists/sorted sets you will not have the problems you would have with strings, but choosing which one is most suitable for you depends on your read/write patterns and business rules.
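As an illustration, a sketch of the sorted-set option with redis-py, storing each movement sample as its own member under a per-recording key (the key pattern and the 48-hour TTL are assumptions, not something from the question):

import json
import time
import redis

r = redis.Redis(decode_responses=True)
TTL = 48 * 3600  # seconds

def save_sample(user_id, replay_id, sample):
    key = f"replay:{user_id}:{replay_id}"
    # Timestamp as score; only the small sample is serialized, never a 50 MB blob.
    # Include a sequence number in the sample if identical samples can repeat,
    # since sorted-set members must be unique.
    r.zadd(key, {json.dumps(sample): time.time()})
    r.expire(key, TTL)

def load_range(user_id, replay_id, start_ts, end_ts):
    key = f"replay:{user_id}:{replay_id}"
    return [json.loads(s) for s in r.zrangebyscore(key, start_ts, end_ts)]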
Here are some useful links to read:
Redis data types intro
Redis data types
Redis labs documentation about data types
Background
We're probably going to use BigQuery to store our immutable business events so that we can replay them later to other services. I'm thinking that one approach would be to essentially just store each event as a blob (with some metadata). In order to replay them easily it would of course be nice to maintain a global order of our events and just persist each event to the same table in BigQuery. We probably have something like 10 events per second (which is nowhere near the limit of 100000 messages per second).
Questions
1. Would it be OK to simply persist all events in the same table?
2. Would it perhaps be better to shard messages into different tables (perhaps based on event type, topic or date)?
3. If (2), is it possible to join/scan through multiple tables sorted by time so that it's possible to replay events in the same order?
If your primary usage scenario is to store events and then replay them, there is no reason to split different event types into different tables, especially since each event is an opaque blob. Keeping them all in the same table has the small benefit of letting you do analysis by event type and other metadata.
Sharding by day makes sense, especially if you will mostly be looking at the most recent data - this will help you keep the BigQuery query costs down.
But I am worried about your requirement of replaying events in order. There is no clustered index in BigQuery, so every time you need to replay your events you will have to use "ORDER BY timestamp" in your query, and that only scales to a relatively small amount of data (tens of megabytes). So if you want to replay a lot of events, this design won't work for you.
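To make the cost concern concrete, this is roughly what a replay query would look like with the google-cloud-bigquery client, assuming a table partitioned by DATE(event_timestamp); the project, dataset, table and column names are made up:

from google.cloud import bigquery

client = bigquery.Client()

sql = """
    SELECT event_id, event_type, payload, event_timestamp
    FROM `my-project.events.business_events`
    WHERE DATE(event_timestamp) BETWEEN @start AND @end  -- prunes partitions, keeps scan costs down
    ORDER BY event_timestamp                             -- no clustered index: this sort is the bottleneck
"""
job = client.query(
    sql,
    job_config=bigquery.QueryJobConfig(
        query_parameters=[
            bigquery.ScalarQueryParameter("start", "DATE", "2020-01-01"),
            bigquery.ScalarQueryParameter("end", "DATE", "2020-01-07"),
        ]
    ),
)
for row in job.result():
    print(row.event_id, row.event_timestamp)  # replay the event here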
I prefer to create a table per event type and store the time in the event table; you can join tables using relationships (primary/foreign keys). Since events are stored on a time basis, you can replay them as well.
Points you must remember:
Immutable business events give you concurrency: once an event has been accepted and committed, it becomes unalterable and can be copied everywhere.
The only way to “undo” an event is to add a compensating event on top, like a negative transaction in accounting.
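For example (a hypothetical "points awarded" event, just to illustrate the idea), the mistake is never edited; a compensating event is appended instead:

events = [
    {"type": "PointsAwarded", "user": "u1", "amount": 100},
    # Wrong amount? Do not edit the event above; append a compensating one.
    {"type": "PointsAwarded", "user": "u1", "amount": -100},
]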
Hope it's useful to you.
I am building a site that allows users to view articles and perform some activities on them (vote, comment, ...). I am using MySQL as the main storage. In order to improve performance, I am considering using Redis (4.x) to handle some view activities such as top/hot articles...
I am going to use one sorted set, called topArticleSortedSet, to store top articles, and this set will be updated frequently, every time a user votes or comments on a certain article.
Each user will log in and follow some topics, and I need to filter and display articles from the topArticleSortedSet based on the topics each user follows.
There is of course scroll paging as well.
For those reasons, I intend to create one topArticleSortedSet for each user, so that each user has an independent list. But I don't know if this is best practice, because there might be millions of logged-in users accessing my site (which would mean millions of sets, each holding around 1,000 article items).
Can anyone give me some advice please?
I think you should keep to one Set, and filter it for each user, instead of having a Set per user. Here is why:
My understanding is that the Set has to be updated each time someone reads an article (probably by incrementing a counter).
Let's say you have n users, each one reading p articles per day. So you have to update the Set n*p times a day.
In the "single" set option, you will need to update just one set when there is an article read. So it makes a total of n*p updates. In the "one set per user" architecture, you will need to do n*p*n updates, which is much bigger.
Of course, filtering a single Set will take some time, longer than reading a Set built for one user. But on average, I would guess it takes much less time than n update operations. Basically, you need to know which is faster: filtering one Set or updating n Sets?
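To make the "one Set, filtered per user" idea concrete, here is a rough redis-py sketch; the key names (topic:<name>:articles, user:<id>:topics) and the ZINTERSTORE weight trick are my own assumptions, not something from the question:

import redis

r = redis.Redis(decode_responses=True)

def record_vote(article_id):
    # Every vote/comment bumps the article's score in the single global sorted set.
    r.zincrby("topArticleSortedSet", 1, article_id)

def top_articles_for_user(user_id, page=0, page_size=20):
    topics = r.smembers(f"user:{user_id}:topics")
    if not topics:
        return r.zrevrange("topArticleSortedSet", page * page_size, (page + 1) * page_size - 1)
    union_key = f"tmp:{user_id}:articles"
    view_key = f"tmp:{user_id}:top"
    # Union of all article ids from the topics this user follows.
    r.sunionstore(union_key, [f"topic:{t}:articles" for t in topics])
    # Intersect with the global sorted set; weight 0 keeps only the global scores.
    r.zinterstore(view_key, {"topArticleSortedSet": 1, union_key: 0})
    page_items = r.zrevrange(view_key, page * page_size, (page + 1) * page_size - 1)
    r.delete(union_key, view_key)
    return page_items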
I'm implementing a Leaderboard in my Django web app and don't know the best way to do it. Currently, I'm just using SQL to order my users and, from that, build a Leaderboard; however, this creates two main problems:
Performance is shocking. I've only tried scaling it to a few hundred users, but I can tell calculating rankings is slow, and heavy caching is annoying since I need users to see their ranking as soon as they are added to the Leaderboard.
It's near-impossible to tell a user what position they are without performing the whole Leaderboard calculation again.
I haven't deployed yet, but I estimate about 5% updates to the Leaderboard vs 95% reads of it (probably more, actually). So my latest idea is to recalculate the Leaderboard each time a user is added, storing a position field I can easily sort by, with no need to recalculate just to display a user's ranking.
However, could this be a problem if multiple users are committing at the same time? Will locking be enough, or will the rankings get messed up? Additionally, I plan to put this on a separate database solely for these leaderboards - which one is best? I hear good things about Redis...
Any better ways to solve this problem? (anyone know how SO makes their leaderboards?)
I've written a number of leaderboard libraries that could help you out here. The one of immediate use is python-leaderboard, which is based on the reference leaderboard Ruby gem. Using Redis sorted sets, your leaderboard will be ranked in real time, and there is a specific section on the leaderboard page about performance metrics for inserting a large number of members at once. You can expect to rank 1 million members in around 30 seconds if you're pipelining writes.
If you're worried about the data changing too often in real time, you could run Redis in a master-slave configuration and have the leaderboards pull data from the slave, which would only poll the master periodically.
Hope this helps!
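For reference, this is roughly what the underlying sorted-set operations look like with plain redis-py, independent of any library (the key name and scoring are illustrative):

import redis

r = redis.Redis(decode_responses=True)
BOARD = "leaderboard"

def set_score(user_id, score):
    # O(log N) insert/update in the sorted set.
    r.zadd(BOARD, {user_id: score})

def rank_of(user_id):
    # ZREVRANK is 0-based, so +1 gives a human-friendly position.
    rank = r.zrevrank(BOARD, user_id)
    return None if rank is None else rank + 1

def top(n=10):
    # Highest scores first, with scores attached.
    return r.zrevrange(BOARD, 0, n - 1, withscores=True)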
You will appreciate the concept of sorted sets in Redis.
Don't miss the paragraph which describes your problem :D
Make a table that stores the user id and user score. Just pull the leaderboard using
ORDER BY user_score DESC
and join the Main table for the User name or whatever else you need.
Unless the total number of tests is a variable in your equation, the calculation from your ranking system should stay the same for each user, so just update individual entries.