Current database-structure improvements & implementing badge system - sql

I'm making a website that has logos that need to be guessed. Currently this is my db setup:
Users(user_id, etc)
Logos(logo_id, img, company, level)
Guess(guess_id, user_id, logo_id, guess, guess_count, guessed, time)
When a user makes a guess, it's sent with an AJAX request. This request runs 2 queries: one to retrieve the company data (from Logos) and one to insert/update the new guess in the DB (Guess).
On every page load I also need the total number of guesses and how many logos there are per level. This requires 2 queries: one that counts the Logos per level, and one that counts the correct guesses (guessed = 1) per level from Guess.
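For illustration, the two page-load counts (logos per level, and correct guesses per level) can be folded into one query with a LEFT JOIN. This is a minimal sketch using Python's built-in sqlite3. It assumes the column names above and that Guess holds at most one row per (user_id, logo_id), which guess_count suggests but the question doesn't state. The function name is hypothetical:

```python
import sqlite3

def level_progress(conn, user_id):
    """Return {level: (logos_in_level, logos_guessed_by_user)} using one query."""
    rows = conn.execute(
        """
        SELECT l.level,
               COUNT(*)          AS logos,    -- every logo in the level
               COUNT(g.guess_id) AS guessed   -- NULLs from the LEFT JOIN are not counted
        FROM Logos AS l
        LEFT JOIN Guess AS g
               ON g.logo_id = l.logo_id
              AND g.user_id = ?
              AND g.guessed = 1
        GROUP BY l.level
        ORDER BY l.level
        """,
        (user_id,),
    ).fetchall()
    return {level: (logos, guessed) for level, logos, guessed in rows}
```

The total number of correct guesses is then just the sum of the second values.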
Now I want to implement some kind of badge system, like the one here on SO. Reading through some other questions, I saw that it might be better to have a separate table containing the total number of guesses and such, so that it takes the same resources whether a user has 10 guesses or 10000. I didn't do this for several reasons:
It requires an extra query in my AJAX call, which I'd like to keep as short as possible
Page reloads shouldn't happen that frequently, so those queries shouldn't take too long
I wouldn't know how to count the total number of guesses per level, unless the table looked like AmountOfGuesses(id, user_id, level, counter), but then it would take more resources depending on the number of levels you've unlocked.
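The AmountOfGuesses(id, user_id, level, counter) idea from the last point could look roughly like this. It's a sketch in Python's sqlite3. Here a composite primary key on (user_id, level) replaces the id column, which is an assumption, so each user has at most one counter row per unlocked level. Reading the counts then costs the same whether the user has 10 guesses or 10000; only the number of unlocked levels matters. The function names are hypothetical:

```python
import sqlite3

COUNTER_SCHEMA = """
CREATE TABLE AmountOfGuesses (
    user_id INTEGER NOT NULL,
    level   INTEGER NOT NULL,
    counter INTEGER NOT NULL DEFAULT 0,
    PRIMARY KEY (user_id, level)   -- one counter row per user and level
);
"""

def record_correct_guess(conn, user_id, level):
    """Bump the user's counter for this level, creating the row on first use."""
    with conn:  # both statements commit together
        conn.execute(
            "INSERT OR IGNORE INTO AmountOfGuesses (user_id, level, counter) VALUES (?, ?, 0)",
            (user_id, level),
        )
        conn.execute(
            "UPDATE AmountOfGuesses SET counter = counter + 1 WHERE user_id = ? AND level = ?",
            (user_id, level),
        )

def counts_per_level(conn, user_id):
    """Return {level: correct_guesses} for one user via the primary-key index."""
    return dict(conn.execute(
        "SELECT level, counter FROM AmountOfGuesses WHERE user_id = ? ORDER BY level",
        (user_id,),
    ).fetchall())
```

The insert-then-update pair is used instead of a dialect-specific upsert so the same two statements work on most databases. In MySQL, for example, INSERT ... ON DUPLICATE KEY UPDATE would do it in one statement.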
As for the badge system, I know the conditions should only be checked when they can change, e.g. when a user submits an answer. Of course, this requires yet another query every time an answer is submitted, namely to check the total number of answers the user has. Depending on that number, the badge should then be assigned. For the badges, I was thinking about a table structure like this:
Badges( badge_id, name, description, etc)
BadgeAssigned( user_id, badge_id, time )
Does this structure seem good for badges?
Is the current structure of the rest of my database good, or should it be adjusted?
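Here is a sketch of the Badges / BadgeAssigned structure, with the check run after each submitted answer. It uses Python's sqlite3. The threshold column (number of correct guesses needed) is a hypothetical stand-in for the "etc" in Badges. The composite primary key on BadgeAssigned is an assumption that stops a badge from being assigned twice:

```python
import sqlite3

BADGE_SCHEMA = """
CREATE TABLE Badges (
    badge_id    INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    description TEXT,
    threshold   INTEGER NOT NULL          -- hypothetical: correct guesses required
);
CREATE TABLE BadgeAssigned (
    user_id  INTEGER NOT NULL,
    badge_id INTEGER NOT NULL REFERENCES Badges(badge_id),
    time     TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (user_id, badge_id)       -- each badge at most once per user
);
"""

def award_badges(conn, user_id, correct_guesses):
    """Assign every badge whose threshold is now met; return the newly assigned ids."""
    with conn:
        before = {b for (b,) in conn.execute(
            "SELECT badge_id FROM BadgeAssigned WHERE user_id = ?", (user_id,))}
        # Already-held badges hit the primary key and are silently skipped.
        conn.execute(
            """
            INSERT OR IGNORE INTO BadgeAssigned (user_id, badge_id)
            SELECT ?, badge_id FROM Badges WHERE threshold <= ?
            """,
            (user_id, correct_guesses),
        )
        after = {b for (b,) in conn.execute(
            "SELECT badge_id FROM BadgeAssigned WHERE user_id = ?", (user_id,))}
    return sorted(after - before)
```

The correct_guesses value could come from a counter table or a COUNT query. Either way, the badge check itself is a single INSERT ... SELECT.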

Related

Reducing database load from consecutive queries

I have an application which calls the database multiple times to achieve one simple goal.
A little information about this application: in short, it scrapes data from a webpage and stores specific information from that page in a database. The important fields are player name, position, kill points and class.
Player name: may change or stay the same from one day to the next
Position: multiple players can share the same position
Kill points: may increase or stay the same each day
Class: a character has only 2 possibilities, e.g. A can change to B or remain A (and vice versa), but can never become C, D, E or F
The player name can change on any day, and the position can also change depending on how much the kill points have increased since the last update. This brings me back to the goal: search the database day by day, from the current date back as far as 2021-02-22, starting at the most recent entry for a player name and stepping back one day at a time to check whether that player name is still the same or has changed.
The main reference for detecting a change is the kill points: as the days go on, this number either stays exactly the same or increases; it can never decrease.
So now onto the implementation of this application.
The first query finds the most recent entry for the player name:
SELECT TOP(1) * FROM [changes] WHERE [CharacterName]=@charname AND [Territory]=@territory AND [Archived]=0 ORDER BY [Recorded] DESC
It then checks the previous day's entries with the following query:
SELECT TOP(1) * FROM [changes] WHERE [Territory]=@territory AND [CharacterName]=@charname AND [Recorded]=@searchdate AND ([Class] LIKE '%{Class}%' OR [Class] LIKE '%{GetOpposite(Class)}%') AND [Archived]=0
If no results are found, it then tries to find an alternative name with the following query:
SELECT TOP(5) * FROM [changes] WHERE [Kills] <= @kills AND [Recorded]='{Data.Recorded.AddDays(-1):yyyy-MM-dd}' AND [Territory]=@territory AND [Mode]=@mode AND ([Class] LIKE @original OR [Class] LIKE @opposite) AND [Archived]=0 ORDER BY [Kills] DESC
The aim of the query above is to get the 5 entries that are the closest possible matches, and then cross-reference each of them with the day ahead:
SELECT COUNT(*) FROM [changes] WHERE [CharacterName]=@CharacterName AND [Territory]=@Territory AND [Recorded]=@SearchedDate AND [Archived]=0
When checking the day ahead, if a candidate's character name is not found there, it is considered to be the old player name for this character. If all 5 candidates are found in the day ahead, the name is considered to be new to the table.
From the date this application started running up to today, this adds up to over 400 individual queries against the database to achieve one goal.
It is also worth noting that this table grows by 14,400 to 14,500 rows every day.
The overall question: is it possible to combine all these queries into fewer calls to the database, reducing the number of queries and improving performance?
What you can do to improve performance will be based on what parts of the application stack you can manipulate. Things to try:
Store Less Data - Database retrieval speed depends largely on how well the database is ordered/normalized and how much data needs to be searched for each query. Keeping a cache of previously scraped pages and only storing data when something has changed between the current scrape and the last one would guarantee fewer redundant requests to the DB.
Separate specific classes of data - Separating data into dedicated tables would allow you to query a specific table for a specific character, etc... effectively removing one where clause.
Reduce time between queries - Fewer incoming concurrent requests means less resource contention and faster responses to earlier requests.
Use another data structure - The only reason you're using TOP() is that you need the data ordered in a specific way (most recent, etc.). If you used an in-code data structure that keeps the data ordered and still easy to query, you could offload some SQL requests to that structure instead of the DB.
The suggestions above are not exhaustive, but what you do to improve performance is largely a function of what in the application stack you have the ability to modify.
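As an illustration of the "use another data structure" suggestion, here is a sketch that loads the rows once and does the question's day-by-day back-tracking in memory. It uses Python's sqlite3 so it can run standalone. The table and column names follow the queries in the question. Several things are assumptions: filtering everything by Territory and Mode, storing Recorded as an ISO date string, and passing the original and opposite classes as a set. The function names are hypothetical. With roughly 14,500 new rows a day, you would probably restrict the date range or columns rather than load everything:

```python
import sqlite3
from collections import defaultdict
from datetime import date, timedelta

def load_days(conn, territory, mode):
    """One query instead of one per day: {day: {name: (kills, class)}}."""
    by_day = defaultdict(dict)
    rows = conn.execute(
        "SELECT CharacterName, Recorded, Kills, Class FROM changes "
        "WHERE Territory = ? AND Mode = ? AND Archived = 0",
        (territory, mode),
    )
    for name, recorded, kills, cls in rows:
        by_day[date.fromisoformat(recorded)][name] = (kills, cls)
    return by_day

def name_history(by_day, name, allowed_classes, earliest=date(2021, 2, 22)):
    """Names this character has used, newest first, walking back one day at a time."""
    names = [name]
    day = max(d for d, entries in by_day.items() if name in entries)  # most recent entry
    while day > earliest:
        prev = day - timedelta(days=1)
        kills = by_day[day][name][0]
        prev_entries = by_day.get(prev, {})
        if name in prev_entries and prev_entries[name][1] in allowed_classes:
            day = prev  # same name the day before: keep walking
            continue
        # Up to 5 closest candidates; kills never decrease, so only kills <= today's.
        candidates = sorted(
            ((k, n) for n, (k, c) in prev_entries.items()
             if k <= kills and c in allowed_classes),
            reverse=True,
        )[:5]
        # A candidate missing from the day ahead is taken as the old name.
        old = next((n for _, n in candidates if n not in by_day[day]), None)
        if old is None:
            break  # every candidate still exists the day ahead: the name is new
        names.append(old)
        name, day = old, prev
    return names
```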

Users updating same row at the same time SQL Server

I want to create a SQL Server table that has Department and Maximum Capacity columns (assume a capacity of 10 for this scenario). When users add themselves to a department, the system checks the current assignment count in the department (assume 9 for this scenario) and compares it to the maximum. If it is below the maximum, they are added.
The issue is this: what if two users submit at the same time, and when the code retrieves the current assignment count it is 9 for both? One user updates the row first, so now it's 10, but the other user had already retrieved the previous value (9) before the update, so both pass the comparison and we end up with 11 users in the department.
Is this even possible and how can one solve it?
The answer to your problem lies in understanding "Database Concurrency" and then choosing the correct solution to your specific scenario.
It's too large a topic to cover in a single SO answer, so I would recommend doing some reading and coming back with specific questions.
However in simple form you either block the assignments out to the first person who tries to obtain them (pessimistic locking), or you throw an error after someone tries to assign over the limit (optimistic locking).
In the pessimistic case you then need ways to unblock them if the user fails to complete the transaction, e.g. a timeout. A bit like a ticket booking website that says "These tickets are being held for you for the next 10 minutes, you must complete your booking within that time else you may lose them".
And when you're down to the last few positions you are going to be turning everyone after the first away... no other way around it if you require this level of locking. (Well you could then create a waiting list, but that's another issue in itself).
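One common way to get the "reject anyone over the limit" behavior without a separate read is to make the check and the increment a single conditional UPDATE, then look at how many rows it changed. Below is a sketch using Python's sqlite3. The assigned_count column, the Assignment table and the function names are assumptions, not part of the question. In SQL Server the equivalent would be the same UPDATE ... WHERE assigned_count < max_capacity followed by a check of @@ROWCOUNT. Because concurrent updates to the same row are serialized, two simultaneous UPDATEs cannot both see 9:

```python
import sqlite3

def make_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Department (
            department_id  INTEGER PRIMARY KEY,
            max_capacity   INTEGER NOT NULL,
            assigned_count INTEGER NOT NULL DEFAULT 0   -- assumed denormalized counter
        );
        CREATE TABLE Assignment (
            user_id       INTEGER NOT NULL,
            department_id INTEGER NOT NULL REFERENCES Department(department_id),
            PRIMARY KEY (user_id, department_id)
        );
    """)
    return conn

def try_join(conn, user_id, department_id):
    """Return True if the user was added, False if the department is full."""
    with conn:  # one transaction: the increment and the insert succeed or fail together
        cur = conn.execute(
            """
            UPDATE Department
               SET assigned_count = assigned_count + 1
             WHERE department_id = ? AND assigned_count < max_capacity
            """,
            (department_id,),
        )
        if cur.rowcount == 0:
            return False  # the check failed inside the same statement: no seat left
        conn.execute(
            "INSERT INTO Assignment (user_id, department_id) VALUES (?, ?)",
            (user_id, department_id),
        )
    return True
```

With this approach the count can't pass the maximum. The cost is that users past the limit get a failure, which matches the optimistic case described above.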

SQL complicated query with joins

I have a problem with one query.
Let me explain what I want:
For the sake of brevity, let's say that I have three tables:
-Offers
-Ratings
-Users
What I want is an SQL query that does the following:
I want Offers to be listed with all their fields plus an additional temporary column, NOT stored anywhere, called AverageUserScore.
AverageUserScore is computed by taking all offers belonging to a particular user, then all ratings belonging to those offers, and averaging those ratings.
I need this query for a Ruby on Rails application. In the browser, the application shows all offers of other users, with AverageUserScore at the very end, as the last column.
Associations:
Offer has many ratings
Offer belongs to user
Rating belongs to offer
User has many offers
Assumptions made:
You actually have a numeric column (of any type that SQL's AVG is fine with) in your Rating model. I'm using a column ratings.rating in my examples.
AverageUserScore is unconventional, so average_user_score is better.
You don't mind not getting users that have no offers: average rating is not clearly defined for them anyway.
You don't deviate from Rails' conventions far enough to have a primary key other than id.
Displaying offers for each user is a straightforward task: in a loop of @users.each do |user|, you can do user.offers.each do |offer| and be set. The only problem here is that it will execute a separate query for every user. Not good.
The "fetching offers" part is the standard N+1 countermeasure, shown even in the Rails guides:
@users = User.includes(:offers).all
The interesting part here is only getting the averages.
For that I'm going to use Arel. It's already part of Rails, ActiveRecord is built on top of it, so you don't need to install anything extra.
You should be able to do a join like this:
User.joins(offers: :ratings)
And this won't get you anything interesting (apart from filtering users that have no offers). Inside though, you'll get a huge set of every rating joined with its corresponding offer and that offer's user. Since we're taking averages per-user we need to group by users.id, effectively making one entry per one users.id value. That is, one per user. A list of users, yes!
Let's stop for a second and make some assignments to make Arel-related code prettier. In fact, we only need two:
users = User.arel_table
ratings = Rating.arel_table
Okay. So. We need to get a list of users (all fields), and for each user fetch an average value seen on his offers' ratings' rating field. So let's compose these SQL expressions:
# users.*
user_fields = users[Arel.star] # Arel.star is a portable SQL "wildcard"
# AVG(ratings.rating) AS average_user_score
average_user_score = ratings[:rating].average.as('average_user_score')
All set. Ready for the final query:
User.includes(:offers) # N+1 counteraction
.joins(offers: :ratings) # dat join
.select(user_fields, average_user_score) # fields we need
.group(users[:id]) # grouping to only get one row per user
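To see what the aggregation does, here is roughly the SQL shape the Arel query builds, run against a tiny SQLite database from Python. The table and column names follow the Rails conventions assumed above (ratings.rating, offers.user_id, ratings.offer_id). The exact SQL Rails generates will differ in quoting and column lists, and the function name is hypothetical:

```python
import sqlite3

# Approximate SQL shape: users.* plus one aggregate, one row per user.
AVERAGE_USER_SCORE_SQL = """
SELECT users.*, AVG(ratings.rating) AS average_user_score
FROM users
INNER JOIN offers  ON offers.user_id  = users.id
INNER JOIN ratings ON ratings.offer_id = offers.id
GROUP BY users.id
"""

def average_user_scores(conn):
    """Map user id -> average of all ratings on that user's offers."""
    cur = conn.execute(AVERAGE_USER_SCORE_SQL)
    columns = [d[0] for d in cur.description]
    id_col = columns.index("id")
    avg_col = columns.index("average_user_score")
    return {row[id_col]: row[avg_col] for row in cur}
```

Because AVG runs over every joined rating row, each rating counts once. An offer with many ratings therefore weighs more than an offer with a single rating. Users without offers drop out because of the inner joins, as assumed above.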

How to keep a list of 'used' data per user

I'm currently working on a project in MongoDB where I want to get a random sampling of new products from the DB. But my problem is not MongoDB specific, I think it's a general database question.
The scenario:
Let's say we have a collection (or table) of products. And we also have a collection (or table) of users. Every time a user logs in, they are presented with 10 products. These products are selected randomly from the collection/table. Easy enough, but the catch is that every time the user logs in, they must be presented with 10 products that they have NEVER SEEN BEFORE. The two obvious ways that I can think of solving this problem are:
Every user begins with their own private list of all products. Each time they get one of these products, the product is removed from their private list. The result is that the next time products are chosen from this previously trimmed list, it already contains only new items.
Every user has a private list of previously viewed products. When a user logs in, the application selects 10 random products from the master list and compares the id of each against the user's list of previously viewed products. If an item appears on that list, the application throws it away and selects a new one, iterating until there are 10 new items, which it then adds to the previously viewed list for next time.
The problem with #1 is that it seems like a tremendous waste. You would basically be duplicating the list data for n users. Also, removing/adding items in the system would be a nightmare, since it would have to iterate through all users. #2 seems preferable, but it too has issues. You could end up making a lot of extra and unnecessary calls to the DB in order to guarantee 10 new products. As a user goes through more and more products, there are fewer new ones to choose from, so the chance of having to throw one away and fetch a new one from the DB greatly increases.
Is there an alternative solution? My first and primary concern is performance. I will give up disk space in order to optimize performance.
1. Those 2 ways are a complete waste of both primary and secondary memory.
2. You want to show never-before-seen products, but is this a real must? If you have a lot of products, 10 random ones have a high chance of being unique.
3. You could list 10 random products; even though it's not as easy as in MySQL, it's still less complicated than 1 and 2.
If you don't care how random the sequence of id's is you could do this:
Create a single randomized table of just product id's and a sequential integer surrogate key column. Start each customer at a random point in the list on first login and cycle through the list ordered by that key. If you reach the end, start again from the top.
The customer record would contain a single value for the last product they saw (the surrogate from the randomized list, not the actual id). You'd then pull the next ten on login and do a single update to the customer. It wouldn't really be random, of course. But this kind of table-seed strategy is how a lot of simpler pseudo-random number generators work.
The only problem I see is if your product list grows more quickly than your users log in. Then they'd never see the portions of the list that come before wherever they started. Even so, with a large list of products and very active users, this should scale much better than storing everything they've seen. So if it doesn't matter that products appear in a set pseudo-random sequence, this might be a good fit for you.
Edit:
If you stored the first record they started with as well, you could still generate the list of all things seen. It would be everything between that value and last viewed.
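Here is a sketch of the randomized-table idea, including the edit. The shuffled table has a sequential surrogate key, and each customer stores where they started (first_seq) and the last position they saw (last_seq). The table, column and function names are hypothetical. It's written with Python's sqlite3 rather than MongoDB so it can run standalone, and it assumes n is no larger than the product list:

```python
import random
import sqlite3

def build_shuffled(conn, product_ids, seed=None):
    """Create the randomized list: product ids in random order with a sequential key."""
    ids = list(product_ids)
    random.Random(seed).shuffle(ids)
    conn.execute("CREATE TABLE shuffled (seq INTEGER PRIMARY KEY, product_id INTEGER NOT NULL)")
    conn.executemany("INSERT INTO shuffled (seq, product_id) VALUES (?, ?)",
                     enumerate(ids, start=1))
    # Per customer: where they started and the last position (not product id) they saw.
    conn.execute("CREATE TABLE customer (customer_id INTEGER PRIMARY KEY,"
                 " first_seq INTEGER NOT NULL, last_seq INTEGER NOT NULL)")
    conn.commit()

def next_products(conn, customer_id, n=10):
    """Return the next n product ids for this customer, wrapping at the end of the list."""
    with conn:
        row = conn.execute("SELECT last_seq FROM customer WHERE customer_id = ?",
                           (customer_id,)).fetchone()
        if row is None:  # first login: start at a random point in the list
            (total,) = conn.execute("SELECT COUNT(*) FROM shuffled").fetchone()
            start = random.randint(1, total)
            last = start - 1
            conn.execute("INSERT INTO customer VALUES (?, ?, ?)", (customer_id, start, last))
        else:
            (last,) = row
        picked = conn.execute("SELECT seq, product_id FROM shuffled WHERE seq > ? "
                              "ORDER BY seq LIMIT ?", (last, n)).fetchall()
        if len(picked) < n:  # reached the end: continue from the top
            picked += conn.execute("SELECT seq, product_id FROM shuffled "
                                   "ORDER BY seq LIMIT ?", (n - len(picked),)).fetchall()
        # A single update to the customer record per login
        conn.execute("UPDATE customer SET last_seq = ? WHERE customer_id = ?",
                     (picked[-1][0], customer_id))
    return [product_id for _, product_id in picked]
```

Each login is then one indexed range read plus one small update, no matter how many products the customer has already seen.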
How about this: create a collection prodUser that holds just the id of the product and the list of customer ids (the customers who have seen that product):
{
prodID : 1,
userID : []
}
When a customer logs in, you find 10 prodIDs that have not been assigned to that user:
db.prodUser.find({
userID : {
$nin : [yourUser]
}
}).limit(10)
(For some reason $not is not working. I don't have time to figure out why; if you do, please let me know.) After showing the person their products, you can update the corresponding prodUser documents. To work around Mongo's inability to pick random elements, you can insert the documents in random order and just find the first 10.
Everything should work really fast.
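To make the prodUser idea concrete without a MongoDB server, here is a plain-Python model of it. This is not pymongo code, and the function names are hypothetical. Each dict mirrors one prodUser document. The list comprehension roughly mirrors the $nin query limited to 10, although MongoDB's natural order is not guaranteed to be insertion order. Appending the user mirrors the update you'd run afterwards (in MongoDB, e.g. $addToSet on userID):

```python
import random

def make_prod_user(product_ids, seed=None):
    """One document per product, inserted in random order (the answer's tip)."""
    ids = list(product_ids)
    random.Random(seed).shuffle(ids)
    return [{"prodID": pid, "userID": []} for pid in ids]

def show_new_products(prod_user, user, n=10):
    """Pick up to n products `user` has not seen yet and mark them as seen."""
    # Roughly find({userID: {$nin: [user]}}).limit(n), in list order
    unseen = [doc for doc in prod_user if user not in doc["userID"]][:n]
    # The follow-up update that records the user on each shown product
    for doc in unseen:
        doc["userID"].append(user)
    return [doc["prodID"] for doc in unseen]
```

One trade-off: every product document accumulates the ids of all users who have seen it, so document size grows with the number of users.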

Recurring Orders

Hi everyone, I'm working on a school project: I chose to create an e-commerce system that can process recurring orders. This is my final project; I'll be graduating in May with an associate's degree in computer science.
Keep in mind this is nowhere near a final solution; it's basically a jumping-off point for this database design.
A little background on the business processes.
- Customer will order a product, and will specify during checkout whether it is a one time order or a weekly/monthly order.
- Customer will specify a location in which to pick up their order (this location is specific only to the order)
- If the value of the order is > 25.00, it is accepted; otherwise it is rejected.
- This will populate the orders_test and order_products_test tables respectively
The person on the back end will have a daily delivery report generated from these two tables.
They will be able to print it off, and it will list which items go to which location,
based on the following criteria:
date_of_next_scheduled_delivery = current date
remaining_deliveries > 0
Once they are satisfied with the delivery list they will press "Process Deliveries" button.
This will adjust the order_products_test table as follows
Subtract 1 from remaining_deliveries
Insert current date into date_of_last_delivery_processed
Based on delivery_frequency (i.e. once, weekly, monthly) it will change the date_of_next_scheduled_delivery
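The "Process Deliveries" step above could be a single UPDATE. Here is a sketch with Python's sqlite3 using the column names from the question. The 'once' / 'weekly' / 'monthly' values of delivery_frequency are assumed from the description. The status column is ignored here; a real version would likely also require status = 'active'. SQLite's date() function does the date arithmetic, and other databases have their own equivalents (e.g. DATEADD in SQL Server):

```python
import sqlite3

PROCESS_DELIVERIES_SQL = """
UPDATE order_products_test
   SET remaining_deliveries = remaining_deliveries - 1,
       date_of_last_delivery_processed = :today,
       date_of_next_scheduled_delivery = CASE delivery_frequency
           WHEN 'weekly'  THEN date(date_of_next_scheduled_delivery, '+7 days')
           WHEN 'monthly' THEN date(date_of_next_scheduled_delivery, '+1 month')
           ELSE date_of_next_scheduled_delivery   -- 'once': nothing further to schedule
       END
 WHERE date_of_next_scheduled_delivery = :today
   AND remaining_deliveries > 0
"""

def process_deliveries(conn, today):
    """Apply the 'Process Deliveries' step for `today` (ISO date string); return rows changed."""
    with conn:  # all due rows are updated in one transaction
        return conn.execute(PROCESS_DELIVERIES_SQL, {"today": today}).rowcount
```

Note that SQLite's '+1 month' rolls day-of-month overflow forward. A delivery scheduled for January 31 would therefore move to early March rather than to the last day of February.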
Status values in the order_products_test table can be active, hold, canceled, or expired.
I just would like some opinions if I am approaching this correctly or if I should scratch this approach and start over again.
A few thoughts, though not necessarily complete (there's a lot to your question, but hopefully these points help):
I don't think you need to keep track of remaining deliveries. You only have 2 options - a one time order, or a recurring order. In both cases, there's no sense in calculating remaining deliveries. It's never leveraged.
In terms of tracking the next delivery date, you can just keep track of the day of the order. If it's recurring -- monthly or weekly, regardless -- everything is calculable from that first date. Most DB systems (MySQL, SQL Server, Oracle, etc) support more than enough date computation flexibility so that you can calculate this on the fly, as opposed to maintaining such a known schedule.
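The point that everything is calculable from the order date could look like this in application code. This is a sketch, assuming the order date is the first delivery. It also assumes that a monthly delivery on, say, the 31st falls on the last day of shorter months, a business rule the question doesn't specify. The function names are hypothetical. The database date functions mentioned above would do the same job in SQL:

```python
import calendar
from datetime import date, timedelta

def add_months(d, months):
    """Shift d by whole months, clamping to the last day of shorter months (assumed rule)."""
    index = d.year * 12 + (d.month - 1) + months
    year, month0 = divmod(index, 12)
    month = month0 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)

def next_delivery(order_date, frequency, today):
    """First delivery on or after `today`, derived only from the order date."""
    if frequency == "once":
        return order_date if order_date >= today else None
    if frequency == "weekly":
        if today <= order_date:
            return order_date
        weeks = -(-(today - order_date).days // 7)  # ceiling division
        return order_date + timedelta(weeks=weeks)
    if frequency == "monthly":
        months = 0
        # Always offset from the original date so a 31st stays on month ends.
        while add_months(order_date, months) < today:
            months += 1
        return add_months(order_date, months)
    raise ValueError(f"unknown frequency: {frequency!r}")
```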
If the delivery location is only specific to the order, I see no use in creating a separate table for it -- it's functionally dependent on the order, you should keep it in the same table as the order. For most e-commerce systems, this is not the case because they tend to associate a list of delivery locations with accounts, which they prompt you about when you order more than once (e.g., Amazon).
Given the above, I bet you can just get away with 2 of your 4 tables above -- Account and Order. But again, if delivery locations are associated with Accounts, I would indeed break that out. (but your question above doesn't suggest that)
Do not name your tables with a "_test" suffix -- it's confusing.