I am working with an SQLite database where I store matches/fights between players like so:
matchId[int] winner[text] loser[text]
I have written one query that counts how many fights a player has won and another that counts how many fights a player has lost. But is there a way to write this in SQL so that I get the win% directly from the database, or do I have to calculate that elsewhere? Calculating it elsewhere is no problem, but I got intrigued to figure out if/how it can be done purely in SQL.
What I am trying to achieve is basically:
SELECT winner, COUNT(winner) as Wins FROM Fights GROUP BY winner
divided by
SELECT loser, COUNT(loser) as Losses FROM Fights GROUP BY loser;
for each player, which in this table is either a "winner" or a "loser". I also have a table (Players) that holds all these players as "player" that could be utilized to make this work.
You can use union all and aggregation:
select player, avg(win) as win_ratio
from (
select winner as player, 1.0 as win from fights
union all
select loser, 0 from fights
) t
group by player
This gives you, for each player that participated in at least one fight, a decimal number between 0 and 1 that represents the win ratio.
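To bring in the Players table mentioned in the question, here is a minimal sketch using Python's built-in sqlite3 module. The sample rows are made up, the Players column is assumed to be named player, and scaling to a percentage is an addition to the query above. The LEFT JOIN keeps players who have no fights yet; they get NULL rather than 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Fights (matchId INTEGER PRIMARY KEY, winner TEXT, loser TEXT);
    CREATE TABLE Players (player TEXT PRIMARY KEY);
    -- made-up sample data; 'dan' has not fought yet
    INSERT INTO Players VALUES ('ann'), ('bob'), ('cat'), ('dan');
    INSERT INTO Fights (winner, loser) VALUES
        ('ann', 'bob'), ('ann', 'cat'), ('bob', 'ann'), ('cat', 'bob');
""")

rows = conn.execute("""
    SELECT p.player, ROUND(100.0 * AVG(t.win), 1) AS win_pct
    FROM Players p
    LEFT JOIN (
        SELECT winner AS player, 1.0 AS win FROM Fights
        UNION ALL
        SELECT loser, 0.0 FROM Fights
    ) t ON t.player = p.player
    GROUP BY p.player
    ORDER BY p.player
""").fetchall()

for player, win_pct in rows:
    print(player, win_pct)
```

With this data ann has 2 wins out of 3 fights, bob 1 out of 3, cat 1 out of 2, and dan comes back as NULL (None in Python).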
So when I run the following query:
SELECT master.playerID, master.nameFirst, master.nameLast, SUM(managers.G) AS games, SUM(managers.W) AS wins
FROM master, managers
WHERE managers.playerID = master.playerID
AND managers.playerID = 'lemonbo01'
GROUP BY managers.playerID;
I get the appropriate sum of games and sum of wins. But the moment I include another table, in this instance the pitching table, like this:
SELECT master.playerID, master.nameFirst, master.nameLast, SUM(managers.G) AS games, SUM(managers.W) AS wins
FROM master, managers, pitching
WHERE managers.playerID = master.playerID
AND managers.playerID = 'lemonbo01'
GROUP BY managers.playerID;
Although I'm not changing anything about the query except selecting from one more table, the wins and games change to absurd numbers. What exactly is causing this?
Thanks in advance.
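One likely cause, sketched with Python's sqlite3 module: pitching is listed in FROM with no join condition, so every matching managers row is paired with every row of pitching before the SUMs run. The numbers below are made up for illustration and the tables are cut down to the columns the query touches.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE master (playerID TEXT, nameFirst TEXT, nameLast TEXT);
    CREATE TABLE managers (playerID TEXT, G INTEGER, W INTEGER);
    CREATE TABLE pitching (playerID TEXT, W INTEGER);
    -- made-up numbers, for illustration only
    INSERT INTO master VALUES ('lemonbo01', 'Bob', 'Lemon');
    INSERT INTO managers VALUES ('lemonbo01', 100, 60), ('lemonbo01', 50, 20);
    INSERT INTO pitching VALUES ('a', 1), ('b', 2), ('c', 3);
""")

query = """
    SELECT SUM(managers.G) AS games, SUM(managers.W) AS wins
    FROM {tables}
    WHERE managers.playerID = master.playerID
      AND managers.playerID = 'lemonbo01'
    GROUP BY managers.playerID
"""
two_tables = conn.execute(query.format(tables="master, managers")).fetchone()
three_tables = conn.execute(
    query.format(tables="master, managers, pitching")).fetchone()
pitching_rows = conn.execute("SELECT COUNT(*) FROM pitching").fetchone()[0]

print(two_tables)    # (150, 80)
print(three_tables)  # (450, 240): each sum multiplied by the 3 pitching rows
```

Adding a condition such as pitching.playerID = master.playerID narrows the pairing, but the sums are still repeated once per matching pitching row, so aggregating each table separately before joining is usually the safer route.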
I've created a procedure that predicts College football game lines by using the variables #Team1 and #Team2. In the current setup, these teams are entered manually.
For example: #Team1 = 'Ohio St.', #Team2 = 'Southern Miss.'
Then, the procedure runs through a series of calculations on stats comparisons, strength of schedule, etc. to produce the hypothetical game line (in this case, Ohio St. -39.)
Here's where I need your help: I'm trying to turn this line prediction system into a ranking system, ranking each team from greatest to worst. I'd like to take each team in my Team table and put it through this calculation with each possible matchup. Then, rank the teams based on who has the biggest advantage over every team that given week, vs. who has the least advantage.
Any ideas? I've toyed around with the idea of turning the calculation into a function and pass the values through that way, but not sure where to start.
Thanks!
Apologies for the made-up column names, but the following should do what you want if you convert your proc to a function that takes the two team names as arguments:
Select a.Name as Team1
, b.Name as Team2
, fn_GetStats(a.Name, b.Name)
from TeamsList a
inner join TeamsList b
on a.Name > b.Name --To avoid duplicate rows
order by 3 desc
The join will create a list of all possible unique combinations (e.g. TeamB and TeamA, but not also TeamA and TeamB or TeamA and TeamA).
Assuming the proc outputs just a single value right now, this seems like the easiest solution. You could also do the same join and then loop through your proc with the results, instead.
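To get from pairwise lines to a ranking, one option is to average each team's predicted margin over all of its opponents. The sketch below uses Python's sqlite3 module; predicted_margin and the RATINGS numbers are hypothetical stand-ins for the real line calculation, and 'Team C' is an invented third team. Note the join uses <> instead of >, so every team meets every other team from its own side.

```python
import sqlite3

# Hypothetical ratings standing in for the real stats-based calculation.
RATINGS = {"Ohio St.": 30.0, "Southern Miss.": -9.0, "Team C": 0.0}

def predicted_margin(team1, team2):
    """Stand-in for the line prediction: positive means team1 is favored."""
    return RATINGS[team1] - RATINGS[team2]

conn = sqlite3.connect(":memory:")
# Expose the Python function to SQL under the same name, taking 2 arguments.
conn.create_function("predicted_margin", 2, predicted_margin)
conn.execute("CREATE TABLE TeamsList (Name TEXT PRIMARY KEY)")
conn.executemany("INSERT INTO TeamsList VALUES (?)", [(n,) for n in RATINGS])

rows = conn.execute("""
    SELECT a.Name, AVG(predicted_margin(a.Name, b.Name)) AS avg_margin
    FROM TeamsList a
    JOIN TeamsList b ON a.Name <> b.Name   -- every opponent, both directions
    GROUP BY a.Name
    ORDER BY avg_margin DESC
""").fetchall()

for rank, (name, avg_margin) in enumerate(rows, start=1):
    print(rank, name, avg_margin)
```

Averaging is only one possible scoring rule; summing the margins or counting the matchups a team is favored in would plug into the same query.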
I am new to SQL and am having trouble setting up this query. I have two tables, one which holds info about the teams, named TEAMS which looks like this:
TEAMS
Name|City|Attendance
Jets| NY| 50
...
And the other which holds info about the games played, named GAMES:
GAMES
Home|Visitors|Date |Result
Jets| Broncos| 1/1/2012| Tie
...
For this specific query I need to find each team that had one or more home games and give the name of the team, the number of wins, the number of losses, and the number of ties. I'm having trouble figuring out how to combine the data. I have made several queries that individually find the number of losses, wins and ties, but I don't know how to join them properly or whether that is even the right approach. Thanks!
This should get you pretty close. Without understanding your data fully I can't give you a perfect working query, but at least you can see what the join might look like.
SELECT TeamName, SUM(SWITCH(Result = 'Win', 1)) AS Wins, SUM(SWITCH(Result = 'Tie', 1)) AS Ties, SUM(SWITCH(Result = 'Loss', 1)) AS Loss
FROM Teams INNER JOIN Games ON (Teams.TeamName = Games.Home OR Teams.TeamName = Games.Visitors)
GROUP BY TeamName
HAVING MAX(SWITCH(Teams.TeamName = Games.Home, 1)) = 1;
It'd be better database design to have IDs instead of team names in the games table. Also, with a description like "Tie", "Win" or "Loss", I wasn't sure which team the result refers to (a tie is obviously easy). Right now the query just takes whatever is in that column for both teams, which I'm sure is incorrect, but it should be a small change to fix it.
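Here is one way that small change could look, as a sketch in Python's sqlite3 module using CASE in place of SWITCH. It assumes, and this is only an assumption, that Result is recorded from the home team's point of view, so it gets flipped for the visitors. The sample games and the GameDate column name are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Games (Home TEXT, Visitors TEXT, GameDate TEXT, Result TEXT);
    -- made-up games; Result is ASSUMED to describe the home team
    INSERT INTO Games VALUES
        ('Jets',    'Broncos', '2012-01-01', 'Tie'),
        ('Jets',    'Bills',   '2012-01-08', 'Win'),
        ('Broncos', 'Jets',    '2012-01-15', 'Win');
""")

rows = conn.execute("""
    WITH per_team AS (
        SELECT Home AS team, Result AS outcome, 1 AS is_home FROM Games
        UNION ALL
        -- same game seen from the visitors' side: flip Win and Loss
        SELECT Visitors,
               CASE Result WHEN 'Win' THEN 'Loss'
                           WHEN 'Loss' THEN 'Win'
                           ELSE 'Tie' END,
               0
        FROM Games
    )
    SELECT team,
           SUM(CASE WHEN outcome = 'Win'  THEN 1 ELSE 0 END) AS wins,
           SUM(CASE WHEN outcome = 'Loss' THEN 1 ELSE 0 END) AS losses,
           SUM(CASE WHEN outcome = 'Tie'  THEN 1 ELSE 0 END) AS ties
    FROM per_team
    GROUP BY team
    HAVING MAX(is_home) = 1        -- only teams with at least one home game
    ORDER BY team
""").fetchall()

for row in rows:
    print(row)
```

The Bills never play at home in this sample, so they drop out; the Jets and Broncos are counted across both their home and away games.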
I am currently working on a program that is supposed to predict the outcomes of a 1v1 contest. I have given each player their own elo score and am collecting all sorts of data in order to predict who the winner would be.
For each fighter, I want to collect the current average elo of people that they are defeating as well as the current average elo of people that are defeating them. Below is some sample data and explanations in order to help you better understand the data structure.
The picture above shows the basic stats view, V_FIGHT_REVIEW that simplifies my fights table for stats collection. FID is the unique fight id and identifies the fight. PID is the player id and identifies each unique player. The WINNER column represents the winner of the fight. So if PID is not equal to WINNER, that player did not win the fight.
This picture represents the PLAYERS table. To the left you will recognize the PID for each player. To the right you will see the column named ELO.
To rephrase the question, I am having trouble figuring out how I can produce the current average elo of each player they have defeated and the current average elo of each player that has defeated them. These average elos should change as their opponents win/lose fights. The output should be similar to below:
PID | AVG_ELO_DEF | AVG_ELO_DEF_BY
I am 99% sure there is a better way to do this but here is the answer I came up with.
I created a new view, v_avg_elo (the name the later views refer to), with this query:
select w.fid, w.pid winner, l.pid loser, w.elo winner_elo, l.elo loser_elo
from (select * from v_fight_review where pid = winner) w,
(select * from v_fight_review where pid <> winner) l
where w.fid = l.fid;
This query gets the elos of every winner and loser for each fight.
I then created two other views. One view holds, for each player, the average elo of the opponents they have defeated. The other holds the average elo of the opponents that have defeated them. Their code is below.
--gets the average elo of opponents that they have beaten
create view v_avg_elo_winning as
select p.pid, round(avg(vae.loser_elo), 0) elo
from players p, v_avg_elo vae
where p.pid = vae.winner
group by pid
order by pid;
--gets the average elo of all of the people that have beaten them
create view v_avg_elo_losing as
select p.pid, round(avg(vae.winner_elo), 0) elo
from players p, v_avg_elo vae
where p.pid = vae.loser
group by pid
order by pid;
Suggestions are always welcome.
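One possible simplification, sketched in Python's sqlite3 module: both averages can come out of a single query by joining each player's fight rows to the opponent's row in the same fight. V_FIGHT_REVIEW is modeled here as a plain table with just the FID, PID and WINNER columns described in the question, and the elo values are made up. Because it reads PLAYERS.ELO at query time, the averages follow the opponents' current elo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE players (pid INTEGER PRIMARY KEY, elo INTEGER);
    -- stand-in for the V_FIGHT_REVIEW view: one row per player per fight
    CREATE TABLE v_fight_review (fid INTEGER, pid INTEGER, winner INTEGER);
    -- made-up sample data
    INSERT INTO players VALUES (1, 1500), (2, 1400), (3, 1600);
    INSERT INTO v_fight_review VALUES
        (1, 1, 1), (1, 2, 1),   -- fight 1: player 1 beats player 2
        (2, 1, 3), (2, 3, 3),   -- fight 2: player 3 beats player 1
        (3, 2, 2), (3, 3, 2);   -- fight 3: player 2 beats player 3
""")

rows = conn.execute("""
    SELECT p.pid,
           AVG(CASE WHEN me.winner =  me.pid THEN opp_p.elo END) AS avg_elo_def,
           AVG(CASE WHEN me.winner <> me.pid THEN opp_p.elo END) AS avg_elo_def_by
    FROM players p
    LEFT JOIN v_fight_review me  ON me.pid = p.pid
    LEFT JOIN v_fight_review opp ON opp.fid = me.fid AND opp.pid <> me.pid
    LEFT JOIN players opp_p      ON opp_p.pid = opp.pid
    GROUP BY p.pid
    ORDER BY p.pid
""").fetchall()

for row in rows:
    print(row)
```

AVG skips NULLs, so each CASE expression only averages the opponents on the relevant side, and a player with no wins or no losses gets NULL in that column.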
The question is whether the query described below can be done without recourse to procedural logic, that is, can it be handled by SQL and a CTE and a windowing function alone? I'm using SQL Server 2012 but the question is not limited to that engine.
Suppose we have a national database of music teachers with 250,000 rows:
teacherName, address, city, state, zipcode, geolocation, primaryInstrument
where the geolocation column is a geography::point datatype with an optimally tessellated index.
A user wants the five closest guitar teachers to their location. A query using a windowing function performs well enough if we pick some arbitrary distance cutoff, say 50 miles, so that we are not selecting all 250,000 rows and then ranking them by distance and taking the closest 5.
But that arbitrary 50-mile radius cutoff might not always succeed in encompassing 5 teachers, if, for example, the user picks an instrument from a different culture, such as sitar or oud or balalaika; there might not be five teachers of such instruments within 50 miles of her location.
Also, now imagine we have a query where a conservatory of music has sent us a list of 250 singers, who are students who have been accepted to the school for the upcoming year, and they want us to send them the five closest voice coaches for each person on the list, so that those students can arrange to get some coaching before they arrive on campus. We have to scan the teachers database 250 times (i.e. scan the geolocation index) because those students all live at different places around the country.
So, I was wondering, is it possible, for that latter query involving a list of 250 student locations, to write a recursive query where the radius begins small, at 10 miles, say, and then increases by 10 miles with each iteration, until either a maximum radius of 100 miles has been reached or the required five (5) teachers have been found? And can it be done only for those students who have yet to be matched with the required 5 teachers?
I'm thinking it cannot be done with SQL alone, and must be done with looping and a temporary table--but maybe that's because I haven't figured out how to do it with SQL alone.
P.S. The primaryInstrument column could reduce the size of the set ranked by distance too but for the sake of this question forget about that.
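For reference, the looping version of the expanding-radius idea can be sketched in plain Python. This is not SQL and not a definitive implementation: closest_teachers, miles_between and the sample coordinates are all hypothetical, and the brute-force scan stands in for an indexed radius query. Only students who are still unmatched are searched again at each larger radius.

```python
import math

def miles_between(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles (haversine formula)."""
    earth_radius_miles = 3958.8
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2)
    return 2 * earth_radius_miles * math.asin(math.sqrt(a))

def closest_teachers(students, teachers, wanted=5, step=10, max_radius=100):
    """Grow the search radius by `step` miles until each student has
    `wanted` teachers or `max_radius` is reached."""
    matches = {}
    pending = dict(students)          # name -> (lat, lon), not yet matched
    radius = step
    while pending and radius <= max_radius:
        last_round = radius + step > max_radius
        for name, (lat, lon) in list(pending.items()):
            found = sorted(
                (dist, teacher)
                for teacher, (tlat, tlon) in teachers.items()
                for dist in [miles_between(lat, lon, tlat, tlon)]
                if dist <= radius
            )
            if len(found) >= wanted or last_round:
                matches[name] = [teacher for _, teacher in found[:wanted]]
                del pending[name]     # matched students are not searched again
        radius += step
    return matches

# Made-up coordinates: three teachers near s1, one far away, none near s2.
teachers = {"A": (40.1, -75.0), "B": (40.2, -75.0),
            "C": (40.5, -75.0), "D": (42.0, -75.0)}
students = {"s1": (40.0, -75.0), "s2": (10.0, 10.0)}
result = closest_teachers(students, teachers, wanted=2)
print(result)
```

Here s1 is settled on the second pass (20 miles) and never searched again, while s2 runs out to the 100-mile cap and ends up with an empty list.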
EDIT: Here's an example query. The SINGER (submitted) dataset contains a column with the arbitrary radius to limit the geo-results to a smaller subset, but as stated above, that radius may define a circle (whose centerpoint is the student's geolocation) which might not encompass the required number of teachers. Sometimes the supplied datasets contain thousands of addresses, not merely a few hundred.
select TEACHERSRANKEDBYDISTANCE.* from
(
select STUDENTSANDTEACHERSINRADIUS.*,
rowpos = row_number()
over(partition by
STUDENTSANDTEACHERSINRADIUS.zipcode+STUDENTSANDTEACHERSINRADIUS.streetaddress
order by DistanceInMiles)
from
(
select
SINGER.name,
SINGER.streetaddress,
SINGER.city,
SINGER.state,
SINGER.zipcode,
TEACHERS.name as TEACHERname,
TEACHERS.streetaddress as TEACHERaddress,
TEACHERS.city as TEACHERcity,
TEACHERS.state as TEACHERstate,
TEACHERS.zipcode as TEACHERzip,
TEACHERS.teacherid,
geography::Point(SINGER.lat, SINGER.lon, 4326).STDistance(TEACHERS.geolocation)
/ (1.6 * 1000) as DistanceInMiles
from
SINGER left join TEACHERS
on
( TEACHERS.geolocation).STDistance( geography::Point(SINGER.lat, SINGER.lon, 4326))
< (SINGER.radius * (1.6 * 1000 ))
and TEACHERS.primaryInstrument='voice'
) as STUDENTSANDTEACHERSINRADIUS
) as TEACHERSRANKEDBYDISTANCE
where rowpos < 6 -- closest 5 is an arbitrary requirement given to us
If you just need the closest 5 teachers regardless of radius, I think you could write something like this. Each student will appear in up to 5 rows of the result, one per teacher; I don't know exactly what output you want to get.
select
S.name,
S.streetaddress,
S.city,
S.state,
S.zipcode,
T.name as TEACHERname,
T.streetaddress as TEACHERaddress,
T.city as TEACHERcity,
T.state as TEACHERstate,
T.zipcode as TEACHERzip,
T.teacherid,
T.geolocation.STDistance(geography::Point(S.lat, S.lon, 4326))
/ (1.6 * 1000) as DistanceInMiles
from SINGER as S
outer apply (
select top 5 TT.*
from TEACHERS as TT
where TT.primaryInstrument='voice'
order by TT.geolocation.STDistance(geography::Point(S.lat, S.lon, 4326)) asc
) as T