Oracle Database Query Design - sql

I am currently working on a program that is supposed to predict the outcomes of a 1v1 contest. I have given each player their own Elo score and am collecting all sorts of data in order to predict who the winner will be.
For each fighter, I want to collect the current average Elo of the people they are defeating as well as the current average Elo of the people who are defeating them. Below are some sample data and explanations to help you better understand the data structure.
The first picture shows the basic stats view, V_FIGHT_REVIEW, which simplifies my fights table for stats collection. FID is the unique fight ID and identifies the fight. PID is the player ID and identifies each unique player. The WINNER column holds the PID of the fight's winner, so if PID is not equal to WINNER, that player did not win the fight.
The second picture shows the PLAYERS table. The PID column on the left identifies each player, and the ELO column on the right holds that player's current Elo.
To rephrase the question: I am having trouble producing, for each player, the current average Elo of the players they have defeated and the current average Elo of the players who have defeated them. These averages should change as their opponents win and lose fights. The output should be similar to this:
PID | AVG_ELO_DEF | AVG_ELO_DEF_BY

I am 99% sure there is a better way to do this, but here is the answer I came up with.
I created a new view with this query:
-- the two views below refer to this view as v_avg_elo
create view v_avg_elo as
select w.fid, w.pid winner, l.pid loser, w.elo winner_elo, l.elo loser_elo
from (select * from v_fight_review where pid = winner) w
join (select * from v_fight_review where pid <> winner) l
  on w.fid = l.fid;
This query gets the Elo of the winner and of the loser of every fight.
I then created two more views: one for the average Elo of the opponents each player has defeated, and one for the average Elo of the opponents who have defeated them. Their code is below.
--gets the average elo of opponents that they can beat
create view v_avg_elo_winning as
select p.pid, round(avg(vae.loser_elo), 0) elo
from players p, v_avg_elo vae
where p.pid = vae.winner
group by pid
order by pid;
--gets the average of all of the people that can beat them
create view v_avg_elo_losing as
select p.pid, round(avg(vae.winner_elo), 0) elo
from players p, v_avg_elo vae
where p.pid = vae.loser
group by pid
order by pid;
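As a possible simplification (a sketch, not tested against Oracle), both averages can come from one query: pair each fight row with the opponent's row in the same fight, look up the opponent's current ELO in PLAYERS, and put a CASE inside AVG so each column only averages won (or lost) fights. To make it runnable, the sketch executes the query with SQLite through Python's sqlite3 module; the table fight_review is a hypothetical stand-in for V_FIGHT_REVIEW (assumed to hold exactly two rows per fight), and all data is made up. The SQL only uses CASE, AVG and ROUND, so it should carry over to Oracle largely unchanged.

```python
import sqlite3

# Hypothetical stand-ins for V_FIGHT_REVIEW (one row per fighter per fight)
# and PLAYERS (current Elo per player).
SETUP = """
CREATE TABLE players (pid INTEGER PRIMARY KEY, elo INTEGER);
CREATE TABLE fight_review (fid INTEGER, pid INTEGER, winner INTEGER);
INSERT INTO players VALUES (1, 1600), (2, 1400), (3, 1500);
INSERT INTO fight_review VALUES
    (1, 1, 1), (1, 2, 1),   -- fight 1: player 1 beats player 2
    (2, 1, 3), (2, 3, 3),   -- fight 2: player 3 beats player 1
    (3, 2, 3), (3, 3, 3);   -- fight 3: player 3 beats player 2
"""

# For each row "me", join the opponent's row in the same fight and look up the
# opponent's *current* Elo. AVG ignores the NULLs produced by CASE, so each
# average only covers the fights that "me" won (or lost).
QUERY = """
SELECT me.pid,
       ROUND(AVG(CASE WHEN me.winner =  me.pid THEN p.elo END)) AS avg_elo_def,
       ROUND(AVG(CASE WHEN me.winner <> me.pid THEN p.elo END)) AS avg_elo_def_by
FROM fight_review me
JOIN fight_review opp ON opp.fid = me.fid AND opp.pid <> me.pid
JOIN players p        ON p.pid = opp.pid
GROUP BY me.pid
ORDER BY me.pid
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)  # (pid, avg_elo_def, avg_elo_def_by)
```

Because the Elo is read from the players table at query time, the averages follow the opponents' current ratings. A player with no wins (or no losses) gets NULL in that column, and players with no fights at all do not appear.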
Suggestions are always welcome.

Related

SUM() acting weird when I select from another table

So when I run the following query:
SELECT master.playerID, master.nameFirst, master.nameLast, SUM(managers.G) AS games, SUM(managers.W) AS wins
FROM master, managers
WHERE managers.playerID = master.playerID
AND managers.playerID = 'lemonbo01'
GROUP BY managers.playerID;
I get the appropriate sum of games and sum of wins. But the moment I include another table, in this instance the pitching table, like this:
SELECT master.playerID, master.nameFirst, master.nameLast, SUM(managers.G) AS games, SUM(managers.W) AS wins
FROM master, managers, pitching
WHERE managers.playerID = master.playerID
AND managers.playerID = 'lemonbo01'
GROUP BY managers.playerID;
Although I'm not changing anything about the query except selecting from one more table, the wins and games change to absurd numbers. What exactly is causing this?
Thanks in advance.
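In the second query, pitching is listed in FROM without any join condition, so every managers row that passes the WHERE clause is paired with every row of pitching (a Cartesian product), and each SUM counts each managers row once per pitching row. The sketch below reproduces this with SQLite via Python's sqlite3; the tables are trimmed to the columns the queries use, and all rows and numbers are hypothetical. It also shows one common fix: aggregate pitching on its own and join the one-row-per-player result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE master   (playerID TEXT, nameFirst TEXT, nameLast TEXT);
CREATE TABLE managers (playerID TEXT, G INTEGER, W INTEGER);
CREATE TABLE pitching (playerID TEXT, W INTEGER);
INSERT INTO master   VALUES ('lemonbo01', 'Bob', 'Lemon');
INSERT INTO managers VALUES ('lemonbo01', 100, 50), ('lemonbo01', 60, 30);
INSERT INTO pitching VALUES ('lemonbo01', 20), ('lemonbo01', 18), ('otherpl01', 5);
""")

# Without pitching: each managers row is counted once.
correct = conn.execute("""
    SELECT SUM(mg.G), SUM(mg.W)
    FROM master m JOIN managers mg ON mg.playerID = m.playerID
    WHERE m.playerID = 'lemonbo01'
""").fetchone()

# pitching added with no join condition: each managers row is repeated once
# per pitching row (3 here), so both sums come out 3 times too large.
inflated = conn.execute("""
    SELECT SUM(mg.G), SUM(mg.W)
    FROM master m, managers mg, pitching
    WHERE mg.playerID = m.playerID AND mg.playerID = 'lemonbo01'
""").fetchone()

# Fix: aggregate pitching on its own, then join the one-row-per-player result.
with_pitching = conn.execute("""
    SELECT SUM(mg.G), SUM(mg.W), p.pitching_wins
    FROM master m
    JOIN managers mg ON mg.playerID = m.playerID
    LEFT JOIN (SELECT playerID, SUM(W) AS pitching_wins
               FROM pitching GROUP BY playerID) p ON p.playerID = m.playerID
    WHERE m.playerID = 'lemonbo01'
    GROUP BY m.playerID, p.pitching_wins
""").fetchone()

print(correct, inflated, with_pitching)
```

Simply adding a condition such as pitching.playerID = managers.playerID would still inflate the sums (by a factor of 2 in this sample data), because the player has two pitching rows; that is why pitching is aggregated before the join.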

Getting win % from a table that holds "Winners" and "Losers"

I am working with an SQLite database where I store matches/fights between players like so:
matchId[int] winner[text] loser[text]
I have written queries that count how many times a player has won a fight and how many fights a player has lost. Is there a way to compute the win % directly in SQL, or do I have to calculate it elsewhere? Calculating it elsewhere is no problem, but I got curious whether and how it can be done purely in SQL.
What I am trying to achieve is basically:
SELECT winner, COUNT(winner) as Wins FROM Fights GROUP BY winner
divided by
SELECT loser, COUNT(loser) as Losses FROM Fights GROUP BY loser;
for each player, which in this table is either a "winner" or a "loser". I also have a table (Players) that holds all these players as "player" that could be utilized to make this work.
You can use union all and aggregation:
select player, avg(win) as win_ratio
from (
select winner as player, 1.0 as win from fights
union all
select loser, 0 from fights
) t
group by player
This gives you, for each player who participated in at least one fight, a decimal number between 0 and 1 that represents the win ratio.
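To illustrate, here is the answer's query run against a small in-memory SQLite table from Python; the fights rows are hypothetical. The ORDER BY is added only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fights (matchId INTEGER, winner TEXT, loser TEXT);
INSERT INTO fights VALUES (1, 'A', 'B'), (2, 'A', 'C'), (3, 'B', 'A');
""")

# Each fight contributes a 1.0 row for its winner and a 0 row for its loser,
# so the average per player is wins / (wins + losses).
rows = conn.execute("""
SELECT player, AVG(win) AS win_ratio
FROM (
    SELECT winner AS player, 1.0 AS win FROM fights
    UNION ALL
    SELECT loser, 0 FROM fights
) t
GROUP BY player
ORDER BY player
""").fetchall()

for player, ratio in rows:
    print(f"{player}: {ratio:.1%}")
```

Here A won 2 of 3 fights, B 1 of 2 and C 0 of 1; multiplying the ratio by 100 in SQL (or formatting it, as above) gives the percentage.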

How to select in SQL

I have 2 tables I am working with. One is movie_score, which contains id, name, and score. The other is movie_cast, which contains mid, cid, and name; mid is the movie ID and cid is the cast ID. The problem I must solve is as follows:
Find the top 10 (distinct) cast members who have the highest average movie scores. The output list must be sorted by score (from high to low) and then by cast name in alphabetical order if/when they have the same average score. The search must NOT include: (a) movies with scores lower than 50 AND (b) cast members who have appeared in fewer than 3 movies (again, only counting appearances in movies with scores of at least 50). (Expected output: cid, cname, average score)
I have tried to put the command together but so far this is all I was able to get:
SELECT DISTINCT movie_cast.cid, movie_cast.cname FROM movie_score INNER JOIN movie_cast ON movie_score.id=movie_cast.mid ORDER BY cname LIMIT 10;
movie-name-score.txt goes with movie_score:
Example of .txt file
9,"Star Wars: Episode III - Revenge of the Sith 3D",80
24214,"The Chronicles of Narnia: The Lion, The Witch and The Wardrobe",76
1789,"War of the Worlds",74
10009,"Star Wars: Episode II - Attack of the Clones 3D",67
771238285,"Warm Bodies",-1
770785616,"World War Z",-1
771303871,"War Witch",89
771323601,"War of the Worlds the True Story",-1
movie-cast.txt goes with movie_cast:
Example:
9,162652153,"Hayden Christensen"
9,162652152,"Ewan McGregor"
9,418638213,"Kenny Baker"
9,548155708,"Graeme Blundell"
9,358317901,"Jeremy Bulloch"
9,178810494,"Anthony Daniels"
9,770726713,"Oliver Ford Davies"
9,162652156,"Samuel L. Jackson"
9,162655731,"James Earl Jones"
I expect to have an output something like:
162655731,"James Earl Jones",average score of the movies they have been in
Does anyone know the best way to create this command?
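One possible approach (a sketch, not the only way): drop movies scoring below 50 in WHERE, group by cast member, keep only those with at least 3 remaining movies via HAVING, then sort by the average descending and the name ascending. It is shown running against SQLite from Python with hypothetical rows, and it assumes the cast name column is cname, as in the attempted query. The -1 scores in the sample file are excluded automatically by the >= 50 filter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE movie_score (id INTEGER, name TEXT, score INTEGER);
CREATE TABLE movie_cast  (mid INTEGER, cid INTEGER, cname TEXT);
INSERT INTO movie_score VALUES
    (1, 'M1', 80), (2, 'M2', 70), (3, 'M3', 60), (4, 'M4', 40), (5, 'M5', 90);
INSERT INTO movie_cast VALUES
    (1, 10, 'Alice'), (2, 10, 'Alice'), (3, 10, 'Alice'),
    (1, 20, 'Bob'), (2, 20, 'Bob'), (3, 20, 'Bob'), (4, 20, 'Bob'),
    (2, 30, 'Carol'), (3, 30, 'Carol'), (5, 30, 'Carol'),
    (1, 40, 'Dan'), (4, 40, 'Dan'), (5, 40, 'Dan');
""")

rows = conn.execute("""
SELECT mc.cid, mc.cname, ROUND(AVG(ms.score), 2) AS avg_score
FROM movie_cast mc
JOIN movie_score ms ON ms.id = mc.mid
WHERE ms.score >= 50                -- (a) ignore movies scoring below 50
GROUP BY mc.cid, mc.cname
HAVING COUNT(DISTINCT mc.mid) >= 3  -- (b) at least 3 qualifying movies
ORDER BY AVG(ms.score) DESC, mc.cname ASC
LIMIT 10
""").fetchall()

for row in rows:
    print(row)
```

Sorting on AVG(ms.score) rather than the rounded column keeps rounding from creating artificial ties. In this sample, Dan is dropped because only 2 of his movies score at least 50, and Alice and Bob tie at 70 and are ordered by name.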

Query, SQL, Ruby on Rails

I am working on a project and ran into a problem writing the best query for it.
I will start by presenting the problem and the solution I found.
We have the following ERD structure:
A Player has many Scores, and a Score has many Handicap Results.
We have a many to many relationship between Handicap and League.
At some point my app runs a calculation that takes all players from a Customer or League and, for each score of each player, creates a handicap result for each of the Handicaps that the Club / League has.
HandicapResults: value, score_id, handicap_id
Handicaps: game_type(string), league_ids (association between the handicap and leagues)
Score: league_id, player_id, game_type(string), play_at (date), round_id (integer)
On the customers#show action I want to display for all handicaps and players the result.
The result is the LAST SCORE PLAYED, SORTED BY ROUND_ID DESC AND PLAY_AT DESC. From this score we take the handicap_result corresponding to the customer's handicaps.
The solution I found for one player would be:
The only problem is that this runs one SELECT for each player displayed in the view. I would like to write a single SELECT that returns the values for a collection of players (player_ids).
At the moment the sql returns the handicap_result corresponding to the last score played (sorted_by round_id desc and play_at desc) where score.league_id included in handicap.league_ids and score.game_type = handicap.game_type.
I would like to create a method that takes player_ids, handicaps, ... (all the information the query needs) as parameters
and returns the following:
Ex:
player_ids: [1, 2, 3]
handicaps: [handicap_1, handicap_2]
# and return something like:
{
#player_id 1: { handicap_1.id: value, handicap_2.id: value }
.........
#player_id 5: { handicap_1.id: value, handicap_2.id: value }
}
# the value is the handicap_result where handicap_result.handicap_id == handicap_1.id / handicap_2.id and for the corresponding score
Hopefully I described the problem clearly enough for people to understand. I really hope someone can help me write a query that runs once and returns the values for a given collection of players.
Thank you and have a nice day!
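A possible single-query approach, sketched under assumptions and not tested against the real app: rank each player's qualifying scores per handicap with ROW_NUMBER() (round_id DESC, then play_at DESC), keep rank 1, and fold the rows into the nested hash described above. The table names follow Rails naming conventions but are assumptions, as is the handicaps_leagues join table for the Handicap/League many-to-many; the sample rows are hypothetical. It also assumes only scores that actually have a handicap_result for that handicap should be ranked. The wrapper is Python against SQLite (window functions need SQLite 3.25 or later) purely so it can run as-is; the SQL string is the part that would be reused from Rails, and PostgreSQL and MySQL 8 also support ROW_NUMBER().

```python
import sqlite3
from collections import defaultdict

# Hypothetical tables mirroring the models in the question.
SCHEMA = """
CREATE TABLE scores (id INTEGER PRIMARY KEY, player_id INTEGER, league_id INTEGER,
                     game_type TEXT, play_at TEXT, round_id INTEGER);
CREATE TABLE handicaps (id INTEGER PRIMARY KEY, game_type TEXT);
CREATE TABLE handicaps_leagues (handicap_id INTEGER, league_id INTEGER);
CREATE TABLE handicap_results (id INTEGER PRIMARY KEY, score_id INTEGER,
                               handicap_id INTEGER, value REAL);
INSERT INTO handicaps VALUES (1, 'type_a'), (2, 'type_b');
INSERT INTO handicaps_leagues VALUES (1, 100), (2, 100);
INSERT INTO scores VALUES
    (1, 1, 100, 'type_a', '2024-01-01', 1),
    (2, 1, 100, 'type_a', '2024-01-08', 2),
    (3, 1, 100, 'type_b', '2024-01-08', 2),
    (4, 2, 100, 'type_a', '2024-01-01', 1),
    (5, 2, 200, 'type_a', '2024-02-01', 3);  -- league 200 is not linked to any handicap
INSERT INTO handicap_results VALUES
    (1, 1, 1, 10.0), (2, 2, 1, 12.5), (3, 3, 2, 7.0), (4, 4, 1, 9.0), (5, 5, 1, 99.0);
"""

def latest_handicap_values(conn, player_ids, handicap_ids):
    """Return {player_id: {handicap_id: value}} using a single query."""
    p_marks = ",".join("?" * len(player_ids))
    h_marks = ",".join("?" * len(handicap_ids))
    sql = f"""
    WITH ranked AS (
        SELECT s.player_id, hr.handicap_id, hr.value,
               ROW_NUMBER() OVER (
                   PARTITION BY s.player_id, hr.handicap_id
                   ORDER BY s.round_id DESC, s.play_at DESC) AS rn
        FROM scores s
        JOIN handicaps h ON h.game_type = s.game_type
        JOIN handicaps_leagues hl
          ON hl.handicap_id = h.id AND hl.league_id = s.league_id
        JOIN handicap_results hr
          ON hr.score_id = s.id AND hr.handicap_id = h.id
        WHERE s.player_id IN ({p_marks}) AND h.id IN ({h_marks})
    )
    SELECT player_id, handicap_id, value FROM ranked WHERE rn = 1
    """
    result = defaultdict(dict)
    for player_id, handicap_id, value in conn.execute(sql, [*player_ids, *handicap_ids]):
        result[player_id][handicap_id] = value
    return dict(result)

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
print(latest_handicap_values(conn, [1, 2], [1, 2]))
```

Only the placeholders change with the number of players and handicaps, so one round trip serves the whole page regardless of how many players are shown.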

How do I use the MAX function over three tables?

So, I have a problem with a SQL Query.
It's about getting weather data for German cities. I'm using PostgreSQL and have 4 tables: staedte (the cities, with primary key loc_id), gehoert_zu (contains the city key and the key of the weather station closest to that city (stations_id)), wettermessung (contains all the weather measurements and the station's key) and wetterstation (contains the station key and location).
Here is what the tables look like:
wetterstation
s_id[PK] standort lon lat hoehe
----------------------------------------
10224 Bremen 53.05 8.8 4
wettermessung
stations_id[PK] datum[PK] max_temp_2m ......
----------------------------------------------------
10224 2013-3-24 -0.4
staedte
loc_id[PK] name lat lon
-------------------------------
15 Asch 48.4 9.8
gehoert_zu
loc_id[PK] stations_id[PK]
-----------------------------
15 10224
What I'm trying to do is get the name of the city with the (for example) highest temperature in a specified period (a whole month or a single day). Since the weather data is bound to a station, I actually need to get the station's ID and then just choose one of the cities assigned to that station. A possible question would be: "In which city was it hottest in June?" Say the highest measured temperature was at station 10224; as a result I want to get the city Asch. This is what I have so far:
SELECT name, MAX (max_temp_2m)
FROM wettermessung, staedte, gehoert_zu
WHERE wettermessung.stations_id = gehoert_zu.stations_id
AND gehoert_zu.loc_id = staedte.loc_id
AND wettermessung.datum BETWEEN '2012-8-1' AND '2012-12-1'
GROUP BY name
ORDER BY MAX (max_temp_2m) DESC
LIMIT 1
There are two problems with the results:
1) It's taking waaaay too long. The tables are not that big (the cities table has about 70k entries), but it needs between 1 and 7 minutes to finish (depending on the time span).
2) It ALWAYS produces the same city, and I'm pretty sure it's not the right one either.
I hope I managed to explain my problem clearly enough, and I'd be happy for any kind of help. Thanks in advance! :D
If you want to get the max temperature per city, use this statement:
SELECT * FROM (
SELECT gz.loc_id, MAX(max_temp_2m) as temperature
FROM wettermessung as wm
INNER JOIN gehoert_zu as gz
ON wm.stations_id = gz.stations_id
WHERE wm.datum BETWEEN '2012-8-1' AND '2012-12-1'
GROUP BY gz.loc_id) as subselect
INNER JOIN staedte as std
ON std.loc_id = subselect.loc_id
ORDER BY subselect.temperature DESC
Use this statement to get the city with the highest temperature (only 1 city):
SELECT * FROM(
SELECT name, MAX(max_temp_2m) as temp
FROM wettermessung as wm
INNER JOIN gehoert_zu as gz
ON wm.stations_id = gz.stations_id
INNER JOIN staedte as std
ON gz.loc_id = std.loc_id
WHERE wm.datum BETWEEN '2012-8-1' AND '2012-12-1'
GROUP BY name
ORDER BY MAX(max_temp_2m) DESC
LIMIT 1) as subselect
ORDER BY temp desc
LIMIT 1
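Prefer explicit joins (INNER JOIN, LEFT JOIN, RIGHT JOIN with an ON clause) over comma-separated table lists. For inner joins the planner generally produces the same plan either way, so the benefit is mainly readability: every join condition sits next to its table, and a missing one is easy to spot.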
This is a general example of how to get the item with the highest, lowest, biggest, smallest, whatever value. You can adjust it to your particular situation.
select fred, barney, wilma
from bedrock join
(select fred, max(dino) maxdino
from bedrock
where whatever
group by fred ) flinstone on bedrock.fred = flinstone.fred
where dino = maxdino
and other conditions
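Applied to the weather data, the template might look like the sketch below, which finds each station's hottest day(s): the derived table computes the maximum per station, and the join plus WHERE keeps only the rows that reach it, so ties return more than one row. It runs against SQLite from Python; the measurements are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE wettermessung (stations_id INTEGER, datum TEXT, max_temp_2m REAL);
INSERT INTO wettermessung VALUES
    (10224, '2012-08-01', 24.0), (10224, '2012-08-02', 27.5),
    (10400, '2012-08-01', 30.5), (10400, '2012-08-02', 30.5),
    (10400, '2012-08-03', 22.0);
""")

# bedrock -> wettermessung, fred -> stations_id, dino -> max_temp_2m
rows = conn.execute("""
    SELECT w.stations_id, w.datum, w.max_temp_2m
    FROM wettermessung w
    JOIN (SELECT stations_id, MAX(max_temp_2m) AS max_temp
          FROM wettermessung
          GROUP BY stations_id) m ON w.stations_id = m.stations_id
    WHERE w.max_temp_2m = m.max_temp
    ORDER BY w.stations_id, w.datum
""").fetchall()

for row in rows:
    print(row)
```

Station 10400 reaches its maximum on two days, so it appears twice.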
I propose you use a consistent naming convention. Singular terms for tables holding a single item per row is a good convention. Your only table breaking this is staedte; it should be stadt.
I also suggest using station_id consistently instead of mixing s_id and stations_id.
Building on these premises, for your question:
... get the name of the city with the ... highest temperature at a specified date
SELECT s.name, w.max_temp_2m
FROM (
SELECT station_id, max_temp_2m
FROM wettermessung
WHERE datum >= '2012-8-1'::date
AND datum < '2012-12-1'::date -- exclude upper border
ORDER BY max_temp_2m DESC, station_id -- id as tie breaker
LIMIT 1
) w
JOIN gehoert_zu g USING (station_id) -- assuming normalized names
JOIN stadt s USING (loc_id)
Use explicit JOIN conditions for better readability and maintenance.
Use table aliases to simplify your query.
Use x >= a AND x < b to include the lower border and exclude the upper border, which is the common use case.
Pick the station with the highest temperature first, before you join to the other tables to retrieve the city name. Much simpler and faster.
You did not specify what to do when multiple "wettermessungen" tie on max_temp_2m in the given time frame. I added station_id as tiebreaker, meaning the station with the lowest id will be picked consistently if there are multiple qualifying stations.
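The approach above can be tried with the question's original names (stations_id, staedte) in SQLite from Python; all rows below are hypothetical. Two adaptations are assumptions of this sketch: SQLite has no date type, so dates are zero-padded ISO strings compared as text (in PostgreSQL keep the ::date casts); and because one station can be the closest station for several cities, a final ORDER BY name LIMIT 1 picks just one of them, as the question asks.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE wettermessung (stations_id INTEGER, datum TEXT, max_temp_2m REAL);
CREATE TABLE gehoert_zu    (loc_id INTEGER, stations_id INTEGER);
CREATE TABLE staedte       (loc_id INTEGER PRIMARY KEY, name TEXT);
INSERT INTO wettermessung VALUES
    (10224, '2012-09-10', 25.5), (10224, '2012-10-01', 18.0),
    (10400, '2012-08-15', 31.5),
    (10400, '2013-06-01', 35.0),   -- outside the 2012 period
    (10500, '2012-12-01', 40.0);   -- on the excluded upper border
INSERT INTO gehoert_zu VALUES (15, 10224), (16, 10400), (17, 10400), (18, 10500);
INSERT INTO staedte VALUES (15, 'Asch'), (16, 'Bonn'), (17, 'Aachen'), (18, 'Kiel');
""")

def hottest_city(conn, start, end):
    """City assigned to the station with the highest max_temp_2m in [start, end)."""
    return conn.execute("""
        SELECT s.name, w.max_temp_2m
        FROM (SELECT stations_id, max_temp_2m
              FROM wettermessung
              WHERE datum >= ? AND datum < ?          -- half-open period
              ORDER BY max_temp_2m DESC, stations_id  -- lowest id breaks ties
              LIMIT 1) w
        JOIN gehoert_zu g ON g.stations_id = w.stations_id
        JOIN staedte s    ON s.loc_id = g.loc_id
        ORDER BY s.name   -- a station may serve several cities; take one
        LIMIT 1
    """, (start, end)).fetchone()

print(hottest_city(conn, "2012-08-01", "2012-12-01"))
```

Only one measurement row survives the inner LIMIT 1, so the joins to gehoert_zu and staedte touch just that station instead of every measurement in the period.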