Count relevance/weighted arithmetic mean SQL - sql

I have a table of movies and a table of reviews.
In my app, I want to show the top 10 movies of any genre.
I clearly cannot sort movies by rating alone, since some movies have only a single 5-star review, so only irrelevant movies would be recommended to users.
Currently I fetch the top 100 movies in the genre from the DB sorted by number of reviews, sort that list by rating on the server, and only then display the top 10.
That kind of works, but the solution is impractical in cases such as review bombing, and moreover the purpose of a top 10 list is to recommend the most relevant movies.
My idea was to add a relevance column to the movies table, but I have no clue how to compute it:
(amount of 5 star reviews * 5 ) + (amount of 4 star reviews * 4 ) and so on - no
(amount of 5 star reviews * 1 ) + (amount of 4 star reviews * 0.8) + ... + (amount of 0 star reviews * 0.1) - no
total amount of reviews / avgrating - no
((amount of 5 star reviews * 5 ) + (amount of 4 star reviews * 4 ) and so on) / amount of reviews total - maybe, but I'm not sure how to handle 0 star reviews
Moreover, the rating is not an arbitrary real number. A user can only give scores such as 5, 4.5, 4, etc. But what about the situation where users can rate movies with 5, 4.9, 4.8 ... 0.1?
So, what is a better way to do this?
[Upd] I think that instead of dividing something we should multiply averagerating and reviews from movies in order to compute the relevance (averagerating and reviews are already updated automatically on each insert/delete/update). We should also try to normalize the product.
In this situation movies with 100 reviews of 5 and an averagerating of 5 won't beat movies with an averagerating of 3.8 but 57k reviews, and the problem of review bombing will also be solved.
Can anyone confirm my guess?

I agree with @NickW that this is more of a statistics question than a programming question, but I'll try to answer it anyway.
If you want to account for both average rating and number of ratings, a straightforward method is to multiply the two. This gives you the sum of all ratings, but, as @qwezxc789 notes, this does not account for the number of zero ratings. Another strategy could be a linear combination of avgrating and reviews. Collinearity shouldn't be an issue because neither variable depends on the other. You could even play around with the linear coefficients to change the relative contribution of each variable. This solution easily generalizes to n independent variables.
Let x_i be predictor i and w_i its weight, 1 ≤ i ≤ n, with w_1 + ... + w_n = 1 (or any other constant, but why not use 1?). Then relevance = w_1 * x_1 + ... + w_n * x_n.
You can add this value as a new relevance column in the movies table using the following SQL. I use two equally weighted predictors: avgrating and reviews.
-- computed column: the expression reads columns of the same row, so no subquery is needed
ALTER TABLE movies ADD [relevance] AS
(0.5 * avgrating + 0.5 * reviews)
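One catch with equal weights: avgrating lives on a 0 to 5 scale while reviews can run into the tens of thousands, so the raw sum is dominated by reviews. Below is a minimal sketch of the same linear combination with both predictors first scaled to 0..1. It assumes SQLite driven from Python's sqlite3 purely so that it can be run as-is; the three rows, the weights and the scaling by MAX(reviews) are illustrative assumptions, not part of the original answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movies (title TEXT, avgrating REAL, reviews INTEGER)")
# Made-up rows mirroring the scenario in the question (5.0 with 100 reviews vs 3.8 with 57k).
conn.executemany(
    "INSERT INTO movies VALUES (?, ?, ?)",
    [("Niche Gem", 5.0, 100), ("Crowd Pleaser", 3.8, 57000), ("Bombed", 1.2, 30000)],
)

W_RATING, W_REVIEWS = 0.5, 0.5   # weights sum to 1; tune to taste
MAX_RATING = 5.0                 # assumed upper bound of the rating scale

top10 = conn.execute(
    """
    SELECT title,
           ? * (avgrating / ?)
         + ? * (reviews * 1.0 / (SELECT MAX(reviews) FROM movies)) AS relevance
    FROM movies
    ORDER BY relevance DESC
    LIMIT 10
    """,
    (W_RATING, MAX_RATING, W_REVIEWS),
).fetchall()

for title, relevance in top10:
    print(title, round(relevance, 3))
```

With these rows the 3.8-rated movie with 57k reviews comes out ahead of the 5.0-rated movie with 100 reviews, which is the behaviour the question asks for. Shifting weight toward W_RATING moves the balance back toward the rating.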

Related

SQL- How to get value from a different table without joining to use on a equation?

I have a table with the total user count of a company by division, and another table with the total spend on several different products. I want to find the per capita spend for each product by dividing the total spend on that product by the total user count across all divisions.
Division  User Count
A         10
B         20
C         20
D         50
Total     100
Product table:
Product  Total Spend
Apple    670
Orange   580
Grapes   640
Tomato   1050
End result should be:
Product  Per Capita Spend
Apple    6.7
Orange   5.8
Grapes   6.4
Tomato   10.5
Since there is nothing common among these tables to join, I need a way to get the total of the column of User count and use it in an equation in the query. It has to be dynamic so that even if the user count changes it will reflect on the per capita spend.
I'm using Zoho Analytics to do my online queries.
You can just aggregate before joining:
select p.*, p.spend * 1.0 / d.user_count
from product p cross join
(select sum(user_count) as user_count
from divisions
) d;
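To check the cross join against the numbers in the question, here is a small sketch using SQLite through Python's sqlite3 (an assumption made only for runnability; Zoho Analytics syntax may differ). The table and column names follow the answer's query. The "Total" row is deliberately not stored, since SUM(user_count) would otherwise count every user twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE divisions (division TEXT, user_count INTEGER);
    CREATE TABLE product (product TEXT, spend REAL);
    INSERT INTO divisions VALUES ('A', 10), ('B', 20), ('C', 20), ('D', 50);
    INSERT INTO product VALUES ('Apple', 670), ('Orange', 580),
                               ('Grapes', 640), ('Tomato', 1050);
""")

# The subquery yields exactly one row, so the cross join attaches the
# grand total of users to every product row.
per_capita = conn.execute("""
    SELECT p.product, p.spend * 1.0 / d.user_count AS per_capita_spend
    FROM product p
    CROSS JOIN (SELECT SUM(user_count) AS user_count FROM divisions) d
""").fetchall()

for product, value in per_capita:
    print(product, round(value, 2))
```

Because the total is computed inside the query, changing any division's user count changes the per capita figures on the next run.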

How to link the relational datatables and divide the sum of 3 columns from one table by 1 column in another table using SQL SELECT query

I'm creating a database for Formula 1 drivers/teams. The idea is to display the cost comparisons of team budgets vs team points and driver salaries, to see the effective cost per point for the top 10 teams.
E.g from 2015 info
Team Mercedes
Income received = euro 467.4m (sponsors 122m, partners 212.40m, tv 133m).
Points Scored = 703 (total from both drivers: Hamilton 381, Rosberg 322).
Effective Cost Per Point = 664,864.86 (467.4m / 703)
I have 3 datatables: One each for the team, the drivers, and a table to join tables 1&2 together, to then create the correct SELECT queries:
Table 'team': teamid, teamname, sponsors, partners, tv, total
Table 'driver': driverid, drivername, salary, points
Table 'driverteam': teamid, driverid, totalbudget, totalpoints
I first need to get the sum of the sponsors,partners and tv so I have created the following SELECT statement:
$sql = "SELECT teamname, sum(sponsors+partners+tv) as total FROM team ORDER BY total DESC LIMIT 10";
I know how to get the salary and points from a driver:
$sql = "SELECT drivername, salary FROM driver ORDER BY salary DESC LIMIT 10";
$sql = "SELECT drivername, points FROM driver ORDER BY points DESC LIMIT 10";
Seems ok, but now comes my problem: how do I get the SUM from table 1 and divide it by the points that the drivers have scored in table 2?
My limited brain says I need an INNER JOIN using table 3 to bring tables 1 & 2 together before performing the necessary divisions etc.
As there are 2 drivers per team, I need to divide the team's total budget by each driver's points, as well as by both drivers' combined points.
What SELECT queries do I need to achieve this?
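A minimal sketch of the join described above, run against SQLite via Python's sqlite3 (an assumption for runnability; the question appears to use PHP with another database). Only the columns needed for the calculation are kept, and driverteam is reduced to a pure link table. Note that SUM(sponsors+partners+tv) without a GROUP BY collapses all teams into a single row; adding the three columns per row is enough to get each team's budget.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE team (teamid INTEGER PRIMARY KEY, teamname TEXT,
                       sponsors REAL, partners REAL, tv REAL);
    CREATE TABLE driver (driverid INTEGER PRIMARY KEY, drivername TEXT,
                         points INTEGER);
    CREATE TABLE driverteam (teamid INTEGER, driverid INTEGER);

    -- 2015 Mercedes figures quoted in the question, in euros
    INSERT INTO team VALUES (1, 'Mercedes', 122000000, 212400000, 133000000);
    INSERT INTO driver VALUES (1, 'Hamilton', 381), (2, 'Rosberg', 322);
    INSERT INTO driverteam VALUES (1, 1), (1, 2);
""")

# Team budget divided by the combined points of its drivers.
per_team = conn.execute("""
    SELECT t.teamname,
           t.sponsors + t.partners + t.tv AS budget,
           SUM(d.points) AS points,
           (t.sponsors + t.partners + t.tv) / SUM(d.points) AS cost_per_point
    FROM team t
    JOIN driverteam dt ON dt.teamid = t.teamid
    JOIN driver d ON d.driverid = dt.driverid
    GROUP BY t.teamid, t.teamname, t.sponsors, t.partners, t.tv
    ORDER BY cost_per_point
    LIMIT 10
""").fetchall()

# Team budget divided by each individual driver's points.
per_driver = conn.execute("""
    SELECT d.drivername,
           (t.sponsors + t.partners + t.tv) / d.points AS cost_per_point
    FROM team t
    JOIN driverteam dt ON dt.teamid = t.teamid
    JOIN driver d ON d.driverid = dt.driverid
    WHERE d.points > 0  -- skip drivers without points to avoid dividing by zero
""").fetchall()

print(per_team)
print(per_driver)
```

With this layout the totalbudget and totalpoints columns in driverteam are not needed, since both values can be derived at query time.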

Compare and retrieve information using sub-queries

I am new to SQL and I got stuck at this problem.
There are three separate tables needed for this problem, with relevant information as follows
copies table          rentalrates table     movies table
movienum  rentalcode  rentalcode  rate      movienum  title   yearreleased
1000      D           D           10        1000      Matrix  2001
...       D           WN          12        ...       ...     ...
...       WN          WL          15        ...       ...     ...
So I am required to display the output of "the title and year released of the movie that has the lowest rental rate" using sub queries, and "order by" is not allowed here.
final output like
title yearreleased rate
matrix 2001 10
My trouble is that I don't really know how to compare the rates and select the movies with the lowest rate.
Any help or hint is extremely appreciated :)
thanks a lot!
This query:
select min(rate) from rentalrates
will yield the minimum rental rate. To go one step further, this query:
select distinct m.title as title,   -- distinct: a movie with several copies is listed once
       m.yearreleased as yearreleased,
       r.rate as rate
from copies as c,
     rentalrates as r,
     movies as m
where c.movienum = m.movienum
  and r.rentalcode = c.rentalcode
  and r.rate = (select min(rate) from rentalrates);  -- a scalar subquery must be parenthesized
will display any movies (title, year, rate) where the rate is the lowest rate in the database.
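The subquery approach can be tried out with the sample rows from the question. This is a sketch using SQLite through Python's sqlite3 (assumed only for runnability); the second movie, 'Placeholder Film', and the exact rows in copies are hypothetical filler so that there is something to filter out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE copies (movienum INTEGER, rentalcode TEXT);
    CREATE TABLE rentalrates (rentalcode TEXT, rate INTEGER);
    CREATE TABLE movies (movienum INTEGER, title TEXT, yearreleased INTEGER);

    INSERT INTO rentalrates VALUES ('D', 10), ('WN', 12), ('WL', 15);
    INSERT INTO movies VALUES (1000, 'Matrix', 2001),
                              (1001, 'Placeholder Film', 1999);
    -- two copies of Matrix at the cheapest rate, one copy of the other film
    INSERT INTO copies VALUES (1000, 'D'), (1000, 'D'), (1001, 'WN');
""")

# No ORDER BY: the scalar subquery finds the lowest rate and the outer
# query keeps only copies rented at that rate.
cheapest = conn.execute("""
    SELECT DISTINCT m.title, m.yearreleased, r.rate
    FROM copies c
    JOIN movies m ON m.movienum = c.movienum
    JOIN rentalrates r ON r.rentalcode = c.rentalcode
    WHERE r.rate = (SELECT MIN(rate) FROM rentalrates)
""").fetchall()

print(cheapest)  # [('Matrix', 2001, 10)]
```

If no copy is actually rented at the cheapest rate in rentalrates, this returns no rows; taking the minimum over the joined copies instead would handle that case.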

Calculate sum of one row and divide by sum of another row. Oracle view/query

This is my first question on here so bear with me. I have two tables in my Oracle database as following:
modules with fields:
module_code eg. INF211
module_title eg. Information technology
credits eg.20
module_progress with fields:
student_id eg. STU1
module_code eg. INF211
module_year eg. 1
module_percent eg. 65
Each student takes 5 modules a year.
This is what I want to do, all in one query/view if possible:
Find sum of module percents for a particular student
Find sum of all credits for each module with regards to their module percents.
Divide sum of module percents by sum of credits and multiply by 100 to give me an average grade.
Can this be done?
SELECT student_id,
       -- credit-weighted mean; module_percent is already on a 0-100 scale, so no extra * 100
       SUM(credits * module_percent) / SUM(credits) AS average_grade
FROM module_progress mp
JOIN modules m
  ON m.module_code = mp.module_code
GROUP BY
  student_id
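As a quick sanity check of the credit-weighted average, here is a sketch in SQLite via Python's sqlite3 (assumed for runnability; the question targets Oracle). INF211 and STU1 come from the question, while module 'XYZ100' and its numbers are hypothetical. The weighted mean of 65% on 20 credits and 80% on 10 credits is (20*65 + 10*80) / 30 = 70.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE modules (module_code TEXT PRIMARY KEY, module_title TEXT,
                          credits INTEGER);
    CREATE TABLE module_progress (student_id TEXT, module_code TEXT,
                                  module_year INTEGER, module_percent REAL);

    INSERT INTO modules VALUES ('INF211', 'Information technology', 20),
                               ('XYZ100', 'Hypothetical module', 10);
    INSERT INTO module_progress VALUES ('STU1', 'INF211', 1, 65),
                                       ('STU1', 'XYZ100', 1, 80);
""")

# Each percentage is weighted by its module's credits, then divided by the
# total credits the student took.
grades = dict(conn.execute("""
    SELECT mp.student_id,
           SUM(m.credits * mp.module_percent) / SUM(m.credits) AS average_grade
    FROM module_progress mp
    JOIN modules m ON m.module_code = mp.module_code
    GROUP BY mp.student_id
""").fetchall())

print(grades)  # {'STU1': 70.0}
```

The result stays on the same 0 to 100 scale as module_percent, so no further multiplication is needed.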

Getting player rank from database

I have a RuneScape private server, which stores the player scores in a database.
The highscores load the player's scores and put them into a table.
But now comes the harder part I can't fix:
I want to display the rank of the player. Like: 'Attack level: 44, ranked 12'. So it has to find the rank the user has.
How can I get this to work? I googled for 2 days now, I did not find anything.
I don't know if there's a way to achieve this using the same query.
You could make another query like:
pos = select count(*) + 1 from players where attack > 44
This query counts the players ranked above someone. The "plus one" part makes the rank start at 1 (because the top player has no one ranked above him).
For example, if the table is:
id attack
0 35
1 22
2 121
3 76
pos(3) = 1 (only player 2 is ranked above) + 1 = 2
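The example table above can be checked with a short sketch in SQLite via Python's sqlite3 (assumed for runnability; the server's actual database is not stated). The helper name attack_rank is hypothetical. Instead of hard-coding 44, the player's own attack level is looked up in a subquery.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE players (id INTEGER PRIMARY KEY, attack INTEGER)")
conn.executemany(
    "INSERT INTO players VALUES (?, ?)",
    [(0, 35), (1, 22), (2, 121), (3, 76)],
)

def attack_rank(player_id):
    """Rank = number of players with a strictly higher attack level, plus one."""
    row = conn.execute(
        """
        SELECT COUNT(*) + 1
        FROM players
        WHERE attack > (SELECT attack FROM players WHERE id = ?)
        """,
        (player_id,),
    ).fetchone()
    return row[0]

print(attack_rank(3))  # 2
print(attack_rank(2))  # 1
```

Players with equal attack levels share a rank under this scheme, and an unknown id yields 1 because the subquery is NULL, so the caller should check that the player exists first.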
You can create a view (probably) that shows every player's score. Something along these lines might work.
create view player_scores as
select player_id, sum(score) as total_score
from scores
group by player_id
That will give you one row per player, with their total score. Having that view, the rank is simple.
select count(*) + 1
from player_scores
where total_score > (select total_score from player_scores where player_id = 1)
That query counts the players having a higher score than player_id = 1 and adds one, which gives that player's rank.
Of course, if you know your player's score before you run the query, you can pass that score as a parameter. That will run a lot faster as long as the column is indexed.