Design a campaign management system to distribute 6M items to users who want to get a free item. The campaign lasts for 10 minutes. Each user must get only 1 item.
My approach:
I am more interested in discussing the implementation/algorithm part: basically decrementing a counter and flagging a user ID atomically. Since we are looking at a peak of 100k requests per second, I think a distributed cache would be better suited here, maybe Redis.
So we are looking at transactions to do the decrement and flag the user ID.
Any suggestions on whether this seems the correct approach, or your thoughts on better approaches?
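For illustration, here is a minimal sketch of that idea using a Redis Lua script (executed atomically) via redis-py; the key names `campaign:remaining` and `campaign:served` are my own assumptions, not part of the question:

```python
import redis

# Atomic "claim one item" script: runs as a single Lua script, so nothing can
# interleave between the duplicate check, the decrement and the user flag.
CLAIM_SCRIPT = """
if redis.call('SISMEMBER', KEYS[2], ARGV[1]) == 1 then
    return -1                         -- user already got an item
end
local left = tonumber(redis.call('GET', KEYS[1]) or '0')
if left <= 0 then
    return 0                          -- campaign exhausted
end
redis.call('DECR', KEYS[1])           -- one item fewer
redis.call('SADD', KEYS[2], ARGV[1])  -- flag this user as served
return 1                              -- success
"""

r = redis.Redis()
claim = r.register_script(CLAIM_SCRIPT)

def claim_item(user_id: str) -> int:
    # KEYS[1] = remaining counter, KEYS[2] = set of served users (assumed names)
    return claim(keys=["campaign:remaining", "campaign:served"], args=[user_id])

# Usage: r.set("campaign:remaining", 6_000_000) once, then call claim_item(uid)
# per request; 1 = item granted, 0 = sold out, -1 = duplicate request.
```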
Related
Let's say we want to design a gaming platform for chess and our user base is ~50M. Whenever a player comes to our platform to play chess, we assign a random player with almost the same rating as an opponent. Every player has one of 4 rating levels [Easy, Medium, Hard, Expert].
My approach was to keep all users in a Redis cache (assume all users/bots are live and waiting for an opponent), so we are keeping data in the below format:
"chess:easy" : [u1, u2, u3]
"chess:medium" : [u4, u5, u6]
So when a user comes, I will remove a user from the cache and assign them.
For example: if u7 (easy) wants to play a chess game, then his opponent will be u1 (easy).
But won't this create a problem for concurrent requests, since we read and then remove from the Redis list, which will be blocking?
Can anyone suggest a better approach with or without cache?
> But won't this create a problem for concurrent requests, since we read and then remove from the Redis list, which will be blocking?
No, because Redis is single-threaded when it performs writes. But, assuming the 50M users are equally distributed across the chess levels, you would get 4 lists, one per difficulty, each 12.5 million entries long.
Manipulating a list that large can get expensive: [LPOP][1] only takes the user at the head of the list, and removing a specific user from the middle of the list (LREM) has O(N) complexity, so the last user may wait a long time before getting an opponent.
I think you should aim to use the HASH data structure instead and split the users across different Redis databases:
db0: Easy
db1: Medium
db2: Hard
db3: Expert
In this way, you can store each user that wants to play with HSET <USERID> status "ready to play", then use the RANDOMKEY command to select an opponent and delete it with HDEL <KEY RETURNED BY RANDOMKEY> status.
Doing so, you will execute O(1) commands only, providing a fast and reliable matching system that you can optimize further with the Redis pipelining feature.
NB: if you instead create one hash per difficulty level and add the users as multiple fields of that hash, you will hit O(N) complexity due to the HDEL command!
[1]: https://redis.io/commands/lpop
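A minimal sketch of that matching flow with redis-py, assuming database 0 holds the Easy pool and that each waiting player is a tiny hash keyed by user ID (the key layout just illustrates the answer above, it is not a fixed API):

```python
import redis

# db=0 would be the "Easy" pool in the multi-database layout described above.
easy_pool = redis.Redis(db=0)

def enter_queue(user_id: str) -> None:
    # One tiny hash per waiting user; the key is the user ID itself.
    easy_pool.hset(user_id, "status", "ready to play")

def match_opponent(user_id: str):
    # RANDOMKEY picks a random waiting user in O(1).
    opponent = easy_pool.randomkey()
    if opponent is None or opponent.decode() == user_id:
        return None  # nobody (else) is waiting right now
    # Deleting the only field also removes the key, taking the user out of the pool.
    easy_pool.hdel(opponent, "status")
    return opponent.decode()

# Usage: enter_queue("u1"); match_opponent("u7")  -> "u1"
```

Note that RANDOMKEY followed by HDEL is two separate commands, so in a real system you would wrap them in a Lua script or MULTI/EXEC so that two concurrent requests cannot claim the same opponent.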
> Can anyone suggest a better approach with or without cache?
There are many algorithms you may use, such as:
Stable marriage problem
Exact cover
but it depends on the user experience you want to implement for your clients.
You can explore the gamedev.stackexchange.com community.
On the data extracts page, Yodlee describes best practices for using getRefreshedUserItems, but I think there are a few more details that should be shared:
Is the 1-minute recommendation just in place to mitigate having to deal with large amounts of returned data? Is it within reason to only poll for refreshed accounts every 5 minutes instead?
Say I do set up my process to retrieve refreshed items every 5 minutes as previously described, but my process fails to run during one of the iterations. If I leave it alone, does that mean there are a few items whose refresh I will have failed to pick up for 24 hours? If so, how are others handling this? Recording the timestamp of each successful call to getRefreshedUserItems, or perhaps iterating their local cache of financial institutions that haven't been synced in more than 24 hours and retrieving updates for those as a one-off call? Or something else?
The main reason for keeping the interval at 1 minute is the high number of refreshes: at the current point in time you may not have a high number of users, but in the future it may grow.
Coming to your question about handling failure cases: say one of your jobs fails to fetch the items for a particular run (the duration passed in the request). You can keep a record of all such failed requests, and have a follow-up job every hour that re-issues the request for all the failed durations. This way you won't miss any items and will keep your data in sync with Yodlee.
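A rough sketch of that bookkeeping, assuming a hypothetical `fetch_refreshed_items(start, end)` wrapper around the getRefreshedUserItems call (the real request/response shape is defined by Yodlee and not shown here):

```python
import sqlite3

# Local log of polling windows; any window that did not complete gets retried.
db = sqlite3.connect("poll_windows.db")
db.execute("CREATE TABLE IF NOT EXISTS windows (win_start REAL, win_end REAL, ok INTEGER)")

def fetch_refreshed_items(win_start: float, win_end: float) -> None:
    """Hypothetical wrapper around Yodlee's getRefreshedUserItems for [win_start, win_end)."""
    raise NotImplementedError

def poll(win_start: float, win_end: float) -> None:
    try:
        fetch_refreshed_items(win_start, win_end)
        db.execute("INSERT INTO windows VALUES (?, ?, 1)", (win_start, win_end))
    except Exception:
        # Record the failed window so the hourly job can replay it.
        db.execute("INSERT INTO windows VALUES (?, ?, 0)", (win_start, win_end))
    db.commit()

def retry_failed_windows() -> None:
    # Hourly follow-up job: re-request every window that previously failed.
    failed = db.execute(
        "SELECT win_start, win_end FROM windows WHERE ok = 0"
    ).fetchall()
    db.execute("DELETE FROM windows WHERE ok = 0")
    db.commit()
    for win_start, win_end in failed:
        poll(win_start, win_end)
```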
I am building a site that allows users to view articles and do some activities on them (vote, comment, ...). I am using MySQL as the main storage. In order to improve performance, I am considering using Redis (4.x) to handle some activities such as top/hot articles.
I am going to use one sorted set, called topArticleSortedSet, to store top articles, and this set will be updated frequently every time a user votes or comments on a certain article.
Each user will log in and follow some topics, and I also need to filter and display articles in the topArticleSortedSet based on the topics a user follows.
There is of course scroll paging as well.
For those reasons, I intend to create one topArticleSortedSet for each user, so that each user has an independent list. But I don't know if this is best practice, because there might be millions of logged-in users accessing my site (which would mean millions of sets of around 1000 article items each).
Can anyone give me some advice please?
I think you should keep to one Set, and filter it for each user, instead of having a Set per user. Here is why:
My understanding is that the Set has to be updated each time someone reads an article (incrementing a counter, probably).
Let's say you have n users, each one reading p articles per day. So you have to update the Set n*p times a day.
In the "single" set option, you will need to update just one set when there is an article read. So it makes a total of n*p updates. In the "one set per user" architecture, you will need to do n*p*n updates, which is much bigger.
Of course, filtering a single Set will take you some time, longer than accessing a Set designed for one user. But on average, I guess it would take you much less time than n operations. Basically, you need to know which is faster: filtering one Set or updating n Sets ?
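As an illustration of the single-set approach, here is a small sketch with redis-py, assuming each article carries a topic tag and the user's followed topics live in a plain set (the key names topArticles, article:<id> and following:<user> are made up for the example):

```python
import redis

r = redis.Redis()

def record_vote(article_id: str) -> None:
    # One shared sorted set: the score is the article's vote/comment count.
    r.zincrby("topArticles", 1, article_id)

def top_articles_for(user_id: str, limit: int = 20):
    followed = {t.decode() for t in r.smembers(f"following:{user_id}")}
    results = []
    # Walk the shared set from the highest score down, filtering per user.
    for raw_id in r.zrevrange("topArticles", 0, 999):
        article_id = raw_id.decode()
        topic = r.hget(f"article:{article_id}", "topic")
        if topic and topic.decode() in followed:
            results.append(article_id)
        if len(results) == limit:
            break
    return results
```

In production you would pipeline the per-article HGET lookups, but the write cost stays at one ZINCRBY per vote, which is the point of the single-set design.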
I'm implementing a leaderboard in my Django web app and don't know the best way to do it. Currently, I'm just using SQL to order my users and, from that, build a leaderboard. However, this creates two main problems:
Performance is shocking. I've only tried scaling it to a few hundred users, but I can already tell that calculating rankings is slow, and heavy caching is annoying since I need users to see their ranking as soon as they are added to the leaderboard.
It's near-impossible to tell a user what position they are in without performing the whole leaderboard calculation again.
I haven't deployed yet, but I estimate about 5% updates to the leaderboard vs 95% reads (probably more, actually). So my latest idea is to recalculate the leaderboard each time a user is added, storing a position field I can easily sort by, with no need to recalculate just to display a user's ranking.
However, could this be a problem if multiple users are committing at the same time: will locking be enough, or will the rankings get messed up? Additionally, I plan to put this on a separate database solely for these leaderboards; which is best? I hear good things about Redis...
Any better ways to solve this problem? (Anyone know how SO makes their leaderboards?)
I've written a number of leaderboard libraries that would help you out there. The one of immediate use is python-leaderboard, which is based on the reference implementation, the leaderboard Ruby gem. Using Redis sorted sets, your leaderboard will be ranked in real time, and there is a specific section on the leaderboard page with performance metrics for inserting a large number of members into a leaderboard at once. You can expect to rank 1 million members in around 30 seconds if you're pipelining writes.
If you're worried about the data changing too often in real-time, you could operate Redis in a master-slave configuration and have the leaderboards pull data from the slave, which would only poll periodically from the master.
Hope this helps!
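If you'd rather not pull in a library, the underlying Redis primitive is a sorted set; here is a minimal sketch with plain redis-py (the key name and score semantics are just for illustration, not python-leaderboard's API):

```python
import redis

r = redis.Redis()
BOARD = "leaderboard:global"  # assumed key name

def set_score(user_id: str, score: float) -> None:
    # ZADD keeps the set ordered by score automatically.
    r.zadd(BOARD, {user_id: score})

def rank_of(user_id: str):
    # ZREVRANK is O(log N); returns a 0-based rank with the highest score first.
    rank = r.zrevrank(BOARD, user_id)
    return None if rank is None else rank + 1

def top(n: int = 10):
    # Highest n scores, with their values.
    return r.zrevrange(BOARD, 0, n - 1, withscores=True)

# Usage: set_score("alice", 4200); rank_of("alice") -> 1; top(3)
```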
You will appreciate the concept of sorted sets in Redis.
Don't miss the paragraph which describes your problem :D
Make a table that stores user ID and user score. Just pull the leaderboard using
ORDER BY user_score DESC
and join the main user table for the user name or whatever else you need.
Unless the total number of tests is a variable in your equation, the calculation from your ranking system should stay the same for each user, so just update individual entries.
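In a Django app, that query plus a per-user rank can be expressed with a window function; a sketch assuming a hypothetical Profile model with a user_score field (the model and field names are not from the question):

```python
from django.db.models import F, Window
from django.db.models.functions import Rank

from myapp.models import Profile  # hypothetical model with a user_score field

def leaderboard():
    # Equivalent of ORDER BY user_score DESC, with a computed rank column.
    return Profile.objects.annotate(
        rank=Window(expression=Rank(), order_by=F("user_score").desc())
    ).order_by("-user_score")

def rank_of(user_id: int) -> int:
    # Rank = 1 + number of users with a strictly higher score.
    me = Profile.objects.get(pk=user_id)
    return Profile.objects.filter(user_score__gt=me.user_score).count() + 1
```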
Can you please give me a database design suggestion?
I want to sell tickets for events, but the problem is that the database can become a bottleneck when many users want to buy tickets for the same event simultaneously.
If I keep a counter of tickets left for each event, there will be many updates on this field (locking), but it will be easy to find out how many tickets are left.
If I generate tickets for each event in advance, it will be hard to know how many tickets are left.
Maybe it would be better if each event used a separate database (if the requests for this event are expected to be high)?
Maybe the reservation also has to be an asynchronous operation?
Do I have to use a relational database (MySQL, Postgres) or a non-relational database (MongoDB)?
I'm planning to use AWS EC2 servers, so I can run more servers if I need them.
I heard that "relational databases don't scale", but I think I need one because it has the transactions and data consistency I will need when working with a fixed number of tickets. Am I right or not?
Do you know of any resources on the internet for this kind of topic?
If you sell 100,000 tickets in 5 minutes, you need a database that can handle at least 333 transactions per second. Almost any RDBMS on recent hardware can handle this amount of traffic.
Unless you have a suboptimal database schema and/or SQL, but that's another problem.
First things first: when it comes to selling stuff (ecommerce), you really do need transactional support. This basically excludes NoSQL solutions like MongoDB or Cassandra.
So you must use a database that supports transactions. MySQL does, but not in every storage engine: make sure to use InnoDB and not MyISAM.
Of course, many popular databases support transactions, so it's up to you which one to choose.
Why transactions? Because you need to complete a bunch of database updates and you must be sure that they all succeed as one atomic operation. For example (see the sketch after this list):
1) Make sure a ticket is available.
2) Reduce the number of available tickets by one.
3) Process the credit card, get approval.
4) Record the purchase details in the database.
If any of the operations fails, you must roll back the previous updates. For example, if the credit card is declined, you should roll back the decrement of available tickets.
And the database will lock the affected rows (or tables) for you, so there is no chance that between step 1 and step 2 someone else tries to purchase a ticket while the count of available tickets has not yet been decreased. Without that lock, a situation would be possible where only 1 ticket is left but it is sold to 2 people, because the second purchase started between step 1 and step 2 of the first transaction.
It's essential that you understand this before you start programming an ecommerce project.
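A minimal sketch of steps 1, 2 and 4 as one MySQL/InnoDB transaction using mysql-connector-python; the table and column names (events, tickets_left, purchases) are assumptions for the example, and the card charge of step 3 is left out since it would normally happen via a payment provider:

```python
import mysql.connector

def buy_ticket(event_id: int, user_id: int) -> bool:
    conn = mysql.connector.connect(user="shop", password="...", database="shop")
    try:
        conn.start_transaction()
        cur = conn.cursor()
        # Step 1: check availability; FOR UPDATE takes a row lock so no other
        # transaction can read-then-decrement this counter concurrently.
        cur.execute(
            "SELECT tickets_left FROM events WHERE id = %s FOR UPDATE", (event_id,)
        )
        row = cur.fetchone()
        if row is None or row[0] <= 0:
            conn.rollback()
            return False
        # Step 2: decrement the counter.
        cur.execute(
            "UPDATE events SET tickets_left = tickets_left - 1 WHERE id = %s",
            (event_id,),
        )
        # Step 3 (charging the card) is omitted; if it failed, conn.rollback()
        # would undo the decrement exactly as described above.
        # Step 4: record the purchase.
        cur.execute(
            "INSERT INTO purchases (event_id, user_id) VALUES (%s, %s)",
            (event_id, user_id),
        )
        conn.commit()
        return True
    except Exception:
        conn.rollback()
        raise
    finally:
        conn.close()
```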
Check out this question regarding releasing inventory.
I don't think you'll run into the limits of a relational database system. You need one that handles transactions, however. As I recommended to the poster in the referenced question, you should be able to handle reserved tickets that affect inventory vs tickets on orders where the purchaser bails before the transaction is completed.
Your question seems broader than database design.
First of all, a relational database will scale perfectly well for this. You may want to consider a web services layer which will provide the actual ticket brokering to the end users. Here you will be able to manage things in a cached manner, independent of the actual database design. However, you need to think through the appropriate steps for data insertion, update, and select in order to optimize your performance.
The first step would be to construct a well-normalized relational model to hold your information.
Second, build a web service interface to interact with the data model.
Then put that behind a user interface and stress test it with many simultaneous transactions.
My bet is that you will then need to rework your web services layer iteratively until you are happy, but your database (well normalized) will not be causing you any bottleneck issues.