SQL query to get random unused combination - sql

Background:
I want to create a database that can run a tournament of 1 vs 1 matchups. It needs to keep track of who won and lost each matchup and any comments about that matchup as well as decide the next unique matchup randomly.
Rules:
There are x number of players. Each player will eventually play every other player once, in effect covering all possible unique combinations of players.
Database Tables (with Sample data):
DECLARE @Players TABLE (
ID INT PRIMARY KEY IDENTITY,
Name VARCHAR(50)
)
ID Name
-- -----
1 Alex
2 Bob
3 Chris
4 Dave
DECLARE @Matches TABLE (
ID INT PRIMARY KEY IDENTITY,
WinnerId INT,
LoserId INT
)
ID WinnerId LoserId
-- -------- -------
1 1 2
2 4 2
3 3 1
DECLARE @Comments TABLE (
ID INT PRIMARY KEY IDENTITY,
MatchId INT,
Comment VARCHAR(MAX)
)
ID MatchId Comment
-- ------- ------------------------------
1 2 That was a close one.
2 3 I did not expect that outcome.
Problem:
How can I efficiently query to get a single random match up that has not yet occurred?
The major problem is that the number of players can and will grow over time. Right now in my example data I only have 4 players, which leaves 6 possible matches.
Alex,Bob
Alex,Chris
Alex,Dave
Bob,Chris
Bob,Dave
Chris,Dave
With so few players, it would be enough to keep grabbing 2 random numbers that correspond to player IDs and then check the Matches table to see whether that matchup has already occurred. If it has, get 2 more and repeat the process. If it hasn't, use it as the next matchup. However, with 10,000 players there would be 49,995,000 possible matchups (10,000 × 9,999 / 2), and this would simply become too slow.
Can anyone point me in the right direction for a more efficient query? I am open to changes in the database design if that would help make things more efficient as well.

If you make an outer join between every possible pairing and those that have been played, then filter out the ones that have been played, you're left with pairings that have not yet been played. Selecting a random one is then a trivial case of ordering:
SELECT p1.Name, p2.Name FROM
Players p1
JOIN Players p2 ON (
p1.ID < p2.ID
)
LEFT JOIN Matches ON (
(WinnerId = p1.ID AND LoserId = p2.ID)
OR (WinnerId = p2.ID AND LoserId = p1.ID)
)
WHERE Matches.ID IS NULL
ORDER BY RAND()
LIMIT 1;
EDIT
As noted by ypercube below, the above LIMIT syntax is MySQL specific. You may need to use instead the appropriate syntax for your SQL implementation - let us know what it is and someone can advise, if required. I know that in Microsoft SQL Server one uses TOP and in Oracle ROWNUM, but otherwise your Googling is probably as good as mine. :)
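For anyone who wants to try the anti-join without setting up a server, here is a minimal sketch that runs the same query against the sample data in an in-memory SQLite database from Python. SQLite is only an assumption for the demo (it spells the random function RANDOM() rather than RAND()); the join logic is unchanged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Players (ID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Matches (ID INTEGER PRIMARY KEY, WinnerId INT, LoserId INT);
    INSERT INTO Players (ID, Name) VALUES (1,'Alex'),(2,'Bob'),(3,'Chris'),(4,'Dave');
    INSERT INTO Matches (WinnerId, LoserId) VALUES (1,2),(4,2),(3,1);
""")

# Every pairing (p1.ID < p2.ID) with no row in Matches in either direction.
UNPLAYED = """
    SELECT p1.Name, p2.Name
    FROM Players p1
    JOIN Players p2 ON p1.ID < p2.ID
    LEFT JOIN Matches m
      ON (m.WinnerId = p1.ID AND m.LoserId = p2.ID)
      OR (m.WinnerId = p2.ID AND m.LoserId = p1.ID)
    WHERE m.ID IS NULL
"""

unplayed = sorted(conn.execute(UNPLAYED).fetchall())
next_match = conn.execute(UNPLAYED + " ORDER BY RANDOM() LIMIT 1").fetchone()
print(unplayed)    # the three pairings that have not been played yet
print(next_match)  # one of them, chosen at random
```

Note that ordering by a random value still has to build every unplayed pairing before it can pick one, so this is simple rather than cheap for very large player counts.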

I am wondering why you need to pick 2 players at random. How about generating the whole list of possible matches up front, with a WinnerId column that starts out empty? For the next match, just pick the first row that has no WinnerId set.
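A rough sketch of that idea, again using SQLite from Python purely for illustration. The Schedule table and the next_match / record_result helpers are hypothetical names, not something from the question. Inserting the pairings in random order means that "first row without a winner" is already a random unplayed match.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Players (ID INTEGER PRIMARY KEY, Name TEXT);
    INSERT INTO Players (ID, Name) VALUES (1,'Alex'),(2,'Bob'),(3,'Chris'),(4,'Dave');
    -- Hypothetical table: one row per pairing, WinnerId stays NULL until played.
    CREATE TABLE Schedule (
        ID INTEGER PRIMARY KEY,
        PlayerA INT,
        PlayerB INT,
        WinnerId INT
    );
    -- Generate every pairing once, shuffled, so ID order is the play order.
    INSERT INTO Schedule (PlayerA, PlayerB)
    SELECT p1.ID, p2.ID
    FROM Players p1 JOIN Players p2 ON p1.ID < p2.ID
    ORDER BY RANDOM();
""")

def next_match(conn):
    """Return (ID, PlayerA, PlayerB) of the first unplayed pairing, or None."""
    return conn.execute(
        "SELECT ID, PlayerA, PlayerB FROM Schedule "
        "WHERE WinnerId IS NULL ORDER BY ID LIMIT 1"
    ).fetchone()

def record_result(conn, match_id, winner_id):
    """Store the winner, which also marks the pairing as played."""
    conn.execute("UPDATE Schedule SET WinnerId = ? WHERE ID = ?",
                 (winner_id, match_id))

total = conn.execute("SELECT COUNT(*) FROM Schedule").fetchone()[0]
first = next_match(conn)
record_result(conn, first[0], first[1])
second = next_match(conn)
```

The trade-off is storage: 10,000 players means storing about fifty million rows up front, and every new player adds a batch of new pairings that would need to be mixed into the remaining schedule.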

Although the data set is large, the LIMIT keyword stops further processing as soon as a single row is returned. One possibility might be to use a query like the one below to return the next match.
SELECT *
FROM Players p1, Players p2
WHERE p1.ID <> p2.ID
AND (p1.ID, p2.ID) NOT IN (SELECT WinnerID, LoserID FROM Matches)
AND (p2.ID, p1.ID) NOT IN (SELECT WinnerID, LoserID FROM Matches)
LIMIT 1

For your problem, you want A) to consider all 2-element subsets of players B) in a randomized order.
For A, other answers are suggesting using SQL joins with various conditions. A less database-intensive solution if you really need to handle 10,000 players might be to use an efficient combination generating algorithm. I found a previous answer listing some from TAOCP vol. 4 here. For the 2 element subset case, a simple double nested loop over the player ids in lexicographical sequence would be fine:
for player_a in 1..num_players:
    for player_b in player_a+1..num_players:
        handle a vs. b
For part B, you could use a second table mapping players 1..n to a shuffling of the integers 1..n. Keep this shuffled mapping around until you're done with the tournament process. You can use the Knuth-Fisher-Yates shuffle.
To keep track of where you are in an instance of this problem, you'll probably want to save the combination generator's state to the database regularly. This would probably be faster than figuring out where you are in the sequence from the original tables alone.
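Parts A and B together might look something like the sketch below. It is only an illustration under some assumptions: the function names are made up, the "saved state" is reduced to a seed plus a count of matchups already handed out, and the shuffle relies on Python's random.shuffle (which CPython implements as a Fisher-Yates shuffle).

```python
import random
from itertools import combinations, islice

def shuffled_matchups(player_ids, seed):
    """Yield every unordered pair of players exactly once.

    The players are relabelled by a seeded shuffle, then pairs are produced
    in lexicographic order over the shuffled list, like the nested loop above.
    """
    rng = random.Random(seed)   # storing the seed lets us rebuild the mapping
    order = list(player_ids)
    rng.shuffle(order)
    yield from combinations(order, 2)

def resume(player_ids, seed, already_played):
    """Continue a saved tournament by skipping the pairs already handed out."""
    return islice(shuffled_matchups(player_ids, seed), already_played, None)

pairs = list(shuffled_matchups([1, 2, 3, 4], seed=7))
rest = list(resume([1, 2, 3, 4], seed=7, already_played=2))
print(len(pairs), len(rest))
```

One caveat: the players are shuffled, but the pair order is not. Whoever lands first in the shuffled list plays all of their matches before anyone else's remaining matches come up, so this is less random than picking a uniformly random unplayed pair each time.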
As you mention, handling 10,000 players in matchups this way results in nearly fifty million matchups to handle. You might consider a tournament structure that doesn't require every player to compete against each other player. For example, if A beats B and B beats C, then you might not have to consider whether A beats C. If applicable in your scenario, that sort of shortcut could save a lot of time.

Related

Writing SQL query to find ranking

I'm trying to determine for a given person how many people have a better score than they do, and group it by the different teams they belong to. So, in the tables below, I'm grabbing the list of team_id from the team_person table where the person_id matches the person I care about. That will get me all of the teams I belong to.
Then I need to know each person_id that is in any team I belong to so that I can find out what their maximum score is from the performances table.
Once I have that, I finally want to determine, for each team_id, how many people on that team have a better score than I do, where better is simply defined as having a larger value.
I've gotten way beyond my abilities with SQL at this point. What I have so far, which seems to get me the maximum score for all the people I care about, (basically everything but my final "by team" requirement) is this:
SELECT person_id, MAX(score) m
FROM performances
WHERE category_id = 7 AND person_id IN (
-- Find all the people on the teams I belong to
SELECT DISTINCT person_id
FROM team_person
WHERE team_id IN (
-- Find all the teams that I belong to
SELECT DISTINCT team_id
FROM team_person
WHERE person_id = 2
)
)
GROUP BY person_id
ORDER BY 2 DESC
My two relevant tables are defined like so, and I'm using psql 9.1.15
Table "public.team_person"
Column | Type | Modifiers
------------+--------------------------+-------------------------------------------------------------
ident | integer | not null default nextval('team_person_ident_seq'::regclass)
team_id | integer | not null
person_id | integer | not null
*chop extraneous columns*
Indexes:
"team_person_pkey" PRIMARY KEY, btree (ident)
"teamPersonUnique" UNIQUE CONSTRAINT, btree (team_id, person_id)
Foreign-key constraints:
"team_person_person_id_fkey" FOREIGN KEY (person_id) REFERENCES person(ident) ON DELETE CASCADE
"team_person_team_id_fkey" FOREIGN KEY (team_id) REFERENCES team(ident) ON DELETE CASCADE
Referenced by:
TABLE "roster" CONSTRAINT "roster_team_person_id_fkey" FOREIGN KEY (team_person_id) REFERENCES team_person(ident) ON DELETE SET NULL
Triggers:
update_team_person_modified BEFORE INSERT OR UPDATE ON team_person FOR EACH ROW EXECUTE PROCEDURE update_modified_column()
Table "public.performances"
Column | Type | Modifiers
-------------+--------------------------+--------------------------------------------------------------
ident | bigint | not null default nextval('performances_ident_seq'::regclass)
category_id | integer | not null
person_id | integer | not null
score | real | not null
*chop extraneous columns*
Indexes:
"performances_pkey" PRIMARY KEY, btree (ident)
Foreign-key constraints:
"performances_category_id_fkey" FOREIGN KEY (category_id) REFERENCES performance_categories(ident) ON DELETE CASCADE
"performances_person_id_fkey" FOREIGN KEY (person_id) REFERENCES person(ident) ON DELETE CASCADE
First, state just the problem, without assumptions about how to get to the solution. You've done that fairly well:
determine for a given person how many people have a better score than they do, and group it by the different teams they belong to.
but I'd rephrase a bit:
For each team a given person is a member of, how many people in that team have a better score than the subject person?
I don't know about you, but it suddenly seems simpler now. Take the team table, left outer join team_person and filter for teams we're a member of, left outer join performances to find games we played with that team, left outer join team_person again to get other people who're members of each team, left outer join performances, filter out teams the subject person isn't a member of, group and aggregate.
It's underspecified for some corner cases (like a team where you're the only member, or a team where you didn't play a game), but eh, whatever.
Problems:
There's no team table. Since you don't care about anything in the team table, you can omit it from the join and just use team_person as the join root.
Your team_person table is defective, by the way. It should have a UNIQUE constraint on (team_id, person_id). Or, better, that should be the primary key. It doesn't actually matter for this query because duplicate team memberships won't change the result, but it's bad data modelling. You can't be a member of a team more than once.
performances should also have a column identifying the particular game or whatever. Since you haven't shown one, I'm going to assume you mean that you're looking for people who, in any game, performed better than the subject person at least once, in that game or another game. If you actually want to find people who did better in a particular game then you need a suitable key on performances.
Fatal problem: performances is also missing a column linking the performance to the team. This makes it impossible to properly solve the problem because you can't get performances by a given person on a given team. I'm going to assume there is in fact a team_id on performances and you just left it out.
So, allowing for the above issues, I'd first acquire the data with a big join, then group and aggregate it. This join will give us, for each team we played in, for each of our performances, for each other player, for each of their other performances, one row with all the relevant information. You can then compare performances and aggregate.
The below is totally untested, since you didn't provide sample data and you chopped important parts out of your schema (or the schema is defective), but I'd try something like:
SELECT
my_performances.team_id,
-- Find how many distinct people scored better than us at least once,
-- no matter how many times or in which game.
COUNT(distinct other_team_person.person_id)
-- Start the join with our team memberships and how we scored in each.
-- If we didn't play any games for this team don't produce a result row
-- for it, so INNER JOIN.
FROM team_person my_team_person
INNER JOIN performances my_performances ON
(my_performances.person_id = my_team_person.person_id
AND my_performances.team_id = my_team_person.team_id)
-- Other members of teams we're also a member of, skipping
-- ourselves. An `INNER JOIN` is fine here because we know
-- a team with only ourselves as a member isn't interesting
-- and we might as well skip it.
INNER JOIN team_person other_team_person ON (
my_team_person.team_id = other_team_person.team_id
AND my_team_person.person_id <> other_team_person.person_id)
-- How each of those people performed in each team they're in
-- (because of previous filter, only considers teams we're in too).
-- INNER JOIN because if they never played they can't beat us.
INNER JOIN performances other_performances ON (
other_team_person.person_id = other_performances.person_id
AND other_team_person.team_id = other_performances.team_id)
-- Make sure `my_team_person` is only teams we're a member of
WHERE my_team_person.person_id = $1
-- Also discard rows where the other person didn't do better than us
AND my_performances.score < other_performances.score
-- Emit one row per team we're a member of
GROUP BY my_performances.team_id;
If you want to show teams where you never played and teams where you're the only player, you'll need to change some INNER JOINs to LEFT OUTER JOINs.
If you want to compare to find people who beat you only within a given game, you're going to need an extra column on performances, then an extra term in the join on other_performances to restrict it to only matching in the same game as my_performances.
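Since the query above is untested, here is a small sketch that runs the same joins on made-up data in SQLite from Python. The sample rows are invented, and the team_id column on performances is the assumption discussed above (the posted schema does not show it). Person 2 is the subject.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE team_person (team_id INT, person_id INT,
                              UNIQUE (team_id, person_id));
    -- ASSUMPTION: performances carries a team_id column.
    CREATE TABLE performances (team_id INT, person_id INT, score REAL);
    INSERT INTO team_person VALUES (1,2),(1,3),(1,4),(1,6),(2,2),(2,5);
    INSERT INTO performances VALUES
        (1,2,10),(1,3,15),(1,3,8),(1,4,9),(1,6,12),
        (2,2,20),(2,5,5);
""")

QUERY = """
    SELECT my_performances.team_id,
           COUNT(DISTINCT other_team_person.person_id)
    FROM team_person my_team_person
    INNER JOIN performances my_performances
        ON my_performances.person_id = my_team_person.person_id
       AND my_performances.team_id = my_team_person.team_id
    INNER JOIN team_person other_team_person
        ON my_team_person.team_id = other_team_person.team_id
       AND my_team_person.person_id <> other_team_person.person_id
    INNER JOIN performances other_performances
        ON other_team_person.person_id = other_performances.person_id
       AND other_team_person.team_id = other_performances.team_id
    WHERE my_team_person.person_id = ?
      AND my_performances.score < other_performances.score
    GROUP BY my_performances.team_id
"""

result = conn.execute(QUERY, (2,)).fetchall()
print(result)
```

On this data, team 1 reports two people (3 and 6) who beat person 2 at least once, and team 2 produces no row at all because nobody there scored higher. That is the INNER JOIN behaviour mentioned above; a LEFT OUTER JOIN variant would be needed to get a zero for team 2.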

Confusion over relationship in a tennis tournament context

I'm making a simple tennis database that features:
Tournaments (Mens/womens singles, Mens/womens doubles, Mixed doubles)
Players
Results
My Results table looked something like
ResultID
DatePlayed
Player1ID (became ParticipantID1)
Player1Score
Player2ID (became ParticipantID2)
Player2Score
I then realised that a player could play in more than one tournament so I needed another table, Participant (or could be called team, or couple).
ParticipantID
PlayerID
TournamentID
With all three IDs forming a composite key.
Now Joe (ID: 1) and Tom (ID: 2) can be doubles partners, but Joe can still play in the Singles:
ParticipantID PlayerID TournamentID
------------- -------- ------------
1             1        1
1             2        1
2             1        2
Now the problem I have with this is that auto-increment/identity would have to be turned off for the Participant table, meaning more coding work would have to be done (when creating and validating records, etc.).
The second problem I have with this is that all the singles players data would be repeating, as they would be stored in the Player table and also the Participant table.
I could have another Results table, i.e. DoubleResults, but I feel this isn't necessary.
Is this the only way it can be done or has my mind just gone blank?
A tournament will have many matches. Therefore you need a one to many relationship between tournaments and matches.
A match will have a type (singles, doubles, etc). So your match table has to include a type id.
You will need a many to many relationship between teams and matches. A team can have either one or two players, depending on the type of match. At this point, it comes down to how much detail you want to store. It might be simply good enough to store the winning team id in this table.
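One possible reading of that design, sketched as SQLite tables created from Python. Every table and column name here is a guess made for illustration (TennisMatch, Team, TeamPlayer and MatchTeam do not appear in the question), and storing a score per team per match is just one of the detail levels mentioned above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Player     (PlayerID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
    CREATE TABLE Tournament (TournamentID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
    CREATE TABLE MatchType  (MatchTypeID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
    -- One tournament has many matches; each match has a type.
    CREATE TABLE TennisMatch (
        MatchID      INTEGER PRIMARY KEY,
        TournamentID INT NOT NULL REFERENCES Tournament(TournamentID),
        MatchTypeID  INT NOT NULL REFERENCES MatchType(MatchTypeID),
        DatePlayed   TEXT
    );
    -- A team is one player (singles) or two players (doubles).
    CREATE TABLE Team (TeamID INTEGER PRIMARY KEY);
    CREATE TABLE TeamPlayer (
        TeamID   INT NOT NULL REFERENCES Team(TeamID),
        PlayerID INT NOT NULL REFERENCES Player(PlayerID),
        PRIMARY KEY (TeamID, PlayerID)
    );
    -- Many-to-many link between teams and matches, carrying the score.
    CREATE TABLE MatchTeam (
        MatchID INT NOT NULL REFERENCES TennisMatch(MatchID),
        TeamID  INT NOT NULL REFERENCES Team(TeamID),
        Score   INT,
        PRIMARY KEY (MatchID, TeamID)
    );
    INSERT INTO Player VALUES (1,'Joe'),(2,'Tom');
    INSERT INTO Team VALUES (1),(2);
    -- Team 1 is the Joe/Tom doubles pair; team 2 is Joe on his own.
    INSERT INTO TeamPlayer VALUES (1,1),(1,2),(2,1);
""")

joe_teams = [row[0] for row in conn.execute(
    "SELECT TeamID FROM TeamPlayer WHERE PlayerID = 1 ORDER BY TeamID")]
team_sizes = dict(conn.execute(
    "SELECT TeamID, COUNT(*) FROM TeamPlayer GROUP BY TeamID"))
print(joe_teams, team_sizes)
```

With this layout Team can keep an ordinary auto-generated key, and a singles player is simply a team of one, so a single results structure covers both singles and doubles.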

SQL JOIN returning multiple rows when I only want one row

I am having a slow brain day...
The tables I am joining:
Policy_Office:
PolicyNumber OfficeCode
1 A
2 B
3 C
4 D
5 A
Office_Info:
OfficeCode AgentCode OfficeName
A 123 Acme
A 456 Acme
A 789 Acme
B 111 Ace
B 222 Ace
B 333 Ace
... ... ....
I want to perform a search to return all policies that are affiliated with an office name. For example, if I search for "Acme", I should get two policies: 1 & 5.
My current query looks like this:
SELECT
*
FROM
Policy_Office P
INNER JOIN Office_Info O ON P.OfficeCode = O.OfficeCode
WHERE
O.OfficeName = 'Acme'
But this query returns multiple rows, which I know is because there are multiple matches from the second table.
How do I write the query to only return two rows?
SELECT DISTINCT a.PolicyNumber
FROM Policy_Office a
INNER JOIN Office_Info b
ON a.OfficeCode = b.OfficeCode
WHERE b.officeName = 'Acme'
To further gain more knowledge about joins, kindly visit the link below:
Visual Representation of SQL Joins
A simple join returns every matching combination of rows from the two tables. Policy_Office has 2 rows with office code A and Office_Info has 3, so searching for Acme returns 2 × 3 = 6 rows. If you only want the policy number, select it with DISTINCT.
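To see the row multiplication for yourself, here is a small sketch that loads the sample data into an in-memory SQLite database from Python (SQLite is just a convenient stand-in) and compares the plain join with the DISTINCT version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Policy_Office (PolicyNumber INT, OfficeCode TEXT);
    CREATE TABLE Office_Info (OfficeCode TEXT, AgentCode INT, OfficeName TEXT);
    INSERT INTO Policy_Office VALUES (1,'A'),(2,'B'),(3,'C'),(4,'D'),(5,'A');
    INSERT INTO Office_Info VALUES
        ('A',123,'Acme'),('A',456,'Acme'),('A',789,'Acme'),
        ('B',111,'Ace'),('B',222,'Ace'),('B',333,'Ace');
""")

JOIN = """
    FROM Policy_Office a
    INNER JOIN Office_Info b ON a.OfficeCode = b.OfficeCode
    WHERE b.OfficeName = 'Acme'
"""

# Plain join: each of the 2 policies pairs with each of the 3 agent rows.
joined_rows = conn.execute("SELECT * " + JOIN).fetchall()

# DISTINCT on the policy number collapses the repeats.
policies = [row[0] for row in conn.execute(
    "SELECT DISTINCT a.PolicyNumber " + JOIN + " ORDER BY a.PolicyNumber")]

print(len(joined_rows), policies)
```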
(using MS-Sqlserver)
I know this thread is 10 years old, but I don't like distinct (in my head it means that the engine gathers all possible data, computes every selected row in each record into a hash and adds it to a tree ordered by that hash; I may be wrong, but it seems inefficient).
Instead, I use CTE and the function row_number(). The solution may very well be a much slower approach, but it's pretty, easy to maintain and I like it:
Given are a person table and a telephone table tied together with a foreign key (in the telephone table). This construct means that a person can have several numbers, but I only want the first, so that each person appears only once in the result set (I ought to be able to concatenate multiple telephone numbers into one string (pivot, I think), but that's another issue).
; -- don't forget this one!
with telephonenumbers
as
(
select [id]
, [person_id]
, [number]
, row_number() over (partition by [person_id] order by [activestart] desc) as rowno
from [dbo].[telephone]
where ([activeuntil] is null or [activeuntil] > getdate())
)
select p.[id]
,p.[name]
,t.[number]
from [dbo].[person] p
left join telephonenumbers t on t.person_id = p.id
and t.rowno = 1
This does the trick (in fact the last line does), and the syntax is readable and easy to extend. The example is simple, but in large scripts that join tables left and right (literally), it is hard to keep unwanted duplicates out of the result, and hard to identify which tables create them. CTEs work great for me.
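The same pattern can be tried out in SQLite from Python, as a rough sketch. Assumptions: SQLite 3.25 or newer (needed for window functions), date('now') standing in for getdate(), dates stored as ISO text, and invented sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE telephone (
        id INTEGER PRIMARY KEY, person_id INT, number TEXT,
        activestart TEXT, activeuntil TEXT
    );
    INSERT INTO person VALUES (1,'Ann'),(2,'Ben');
    INSERT INTO telephone VALUES
        (1, 1, '555-0100', '2020-01-01', NULL),
        (2, 1, '555-0101', '2022-06-01', NULL),
        (3, 1, '555-0102', '2023-01-01', '2023-12-31');  -- already expired
""")

QUERY = """
    WITH telephonenumbers AS (
        SELECT id, person_id, number,
               ROW_NUMBER() OVER (PARTITION BY person_id
                                  ORDER BY activestart DESC) AS rowno
        FROM telephone
        WHERE activeuntil IS NULL OR activeuntil > date('now')
    )
    SELECT p.name, t.number
    FROM person p
    LEFT JOIN telephonenumbers t
           ON t.person_id = p.id
          AND t.rowno = 1        -- keep only the newest active number
    ORDER BY p.id
"""

rows = conn.execute(QUERY).fetchall()
print(rows)
```

Ann has two active numbers but appears once, with the most recently started one; Ben has no number and still appears, with a NULL, thanks to the left join.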

SQL Merging Associated Records

Let's say we have a database with a table that has many other associated tables. If you diagrammed the database, this would be the table at the center with many foreign key relationships spiraling out of it.
To make it more concrete, let's say the two records in this central table are Initech and Contoso. Initech and Contoso are both associated with many other records in associated tables like Employees, AccountingTransactions, etc. Let's say the two merged (Initech bought Contoso) and from a data standpoint, it really is as simple as merging all the records. What's the easiest way to take all of Contoso's related records, make them point to Initech and then delete Contoso?
UPDATE with CASCADE comes tantalizingly close, but it obviously can't work without turning off constraints and then turning them back on (yuck).
Is there a nice generic way to do this without hunting down every single linked table and migrating them one by one? This has to be a common requirement. It's come up in two places in this project and can be summed up with: Entity A needs to control everything Entity B currently controls. How can I make it happen?
Before Merge:
Companies
ID Name
1 Contoso
2 Initech
Employees
ID Name CompanyId
1 Bob 1
2 Ted 2
After Merge:
Companies
ID Name
2 Initech
Employees
ID Name CompanyId
1 Bob 2
2 Ted 2
All my attempts at searching only turned up questions about merging separate databases... so sorry if this has been asked before.
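In MySQL (and most other databases), one UPDATE per referencing table does it. Avoid combining them into a single multi-table UPDATE with OR conditions, which would also rewrite rows that belong to other companies:
UPDATE Employees SET CompanyId = 2 WHERE CompanyId = 1;
UPDATE Cars SET CompanyId = 2 WHERE CompanyId = 1;
UPDATE OtherEntity SET CompanyId = 2 WHERE CompanyId = 1;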
Succinctly, no; there isn't a generic way to do it.
Consider your sample database with tables Companies, Employees, Departments, and AccountingTransactions.
You need to delete one of the company records (because after the merger, you will only record the current state of affairs).
You need to alter the employee records to change the employing company. However, it is quite possible that there is an employee number N in both companies, and one of those (presumably Contoso's) will have to be assigned a new employee number.
You probably face the problem that department 1 in Contoso's data is Engineering, but in Initech's it is Finance. So you need to worry about how you are going to map the department numbers between the two companies, and then you face the problem of assigning Contoso's employees to Initech's departments.
For the historical accounting transactions, you probably have to keep Contoso's historical accounting records in Contoso's name, while some (of the most recent) transactions will need to be migrated to Initech's name. So maybe you won't be deleting the Contoso record from the table of companies after all, but you won't be able to use it to identify any new records.
These are just a small sampling of the reasons why such mappings cannot readily be automated.
No, there's no simple generic way of merging rows and cascading those changes throughout your system. You can script it all - which may be the best way, depending on your scenario - or devise a workaround.
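If you do go the scripting route, the database's own foreign-key metadata can at least find the linked tables for you. The sketch below shows the idea in SQLite from Python, using PRAGMA foreign_key_list; merge_rows is a hypothetical helper, and it assumes single-column foreign keys that point at a column named ID. Other engines expose similar metadata through their catalog views. It only does the mechanical repointing and does nothing about key collisions or business rules like the ones described above.

```python
import sqlite3

def merge_rows(conn, table, keep_id, drop_id):
    """Repoint every foreign key that references `table`, then delete drop_id."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")]
    with conn:  # one transaction: commit on success, roll back on error
        for child in tables:
            # foreign_key_list rows: (id, seq, referenced table, from, to, ...)
            fks = conn.execute(f'PRAGMA foreign_key_list("{child}")').fetchall()
            for fk in fks:
                if fk[2].lower() == table.lower():
                    column = fk[3]
                    conn.execute(
                        f'UPDATE "{child}" SET "{column}" = ? WHERE "{column}" = ?',
                        (keep_id, drop_id))
        conn.execute(f'DELETE FROM "{table}" WHERE ID = ?', (drop_id,))

conn = sqlite3.connect(":memory:")
conn.executescript("""
    PRAGMA foreign_keys = ON;
    CREATE TABLE Companies (ID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Employees (ID INTEGER PRIMARY KEY, Name TEXT,
                            CompanyId INT REFERENCES Companies(ID));
    INSERT INTO Companies VALUES (1,'Contoso'),(2,'Initech');
    INSERT INTO Employees VALUES (1,'Bob',1),(2,'Ted',2);
""")

merge_rows(conn, "Companies", keep_id=2, drop_id=1)
companies = conn.execute("SELECT ID, Name FROM Companies").fetchall()
employees = conn.execute(
    "SELECT ID, Name, CompanyId FROM Employees ORDER BY ID").fetchall()
print(companies, employees)
```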
One workaround might be to implement a parenting pattern on your central table (or abstract it to another table). You would then end up with something like
Companies
ID ParentID Name
1 2 Contoso
2 null Initech
or
Companies
ID ParentID Name
1 3 Contoso
2 3 Initech
3 null MegaInitech
and all your queries that join onto this central Companies table now check ID and ParentID;
SELECT *
FROM Employees
WHERE CompanyId IN (SELECT ID FROM Companies WHERE ID = @id OR ParentID = @id)
Abstract this away to a view or function
CREATE FUNCTION fn_IsMemberOf
(
@companyId INT,
@parentId INT
)
RETURNS BIT
AS
BEGIN
DECLARE @result BIT = 0
SELECT @result = 1 FROM Companies
WHERE ID = @companyId
AND COALESCE(ParentID, ID) = @parentId
RETURN @result
END
-- 2 = Initech, the parent in the first example. Scalar functions must be
-- called with their schema name; dbo is assumed here.
SELECT *
FROM Employees
WHERE dbo.fn_IsMemberOf(CompanyId, 2) = 1
(haven't tested this but you get the idea)

Modelling database for a small soccer league

The database is quite simple. Below there is a part of a schema relevant to this question
ROUND (round_id, round_number)
TEAM (team_id, team_name)
MATCH (match_id, match_date, round_id)
OUTCOME (team_id, match_id, score)
I have a problem with the query to retrieve data for all matches played. The simple query below of course gives two rows for every match played.
select *
from round r
inner join match m on m.round_id = r.round_id
inner join outcome o on o.match_id = m.match_id
inner join team t on t.team_id = o.team_id
How should I write a query to have the match data in one row?
Or maybe should I redesign the database - drop the OUTCOME table and modify the MATCH table to look like this:
MATCH (match_id, match_date, team_away, team_home, score_away, score_home)?
You can almost generate the suggested change from the original tables using a self join on outcome table:
select o1.team_id team_id_1,
o2.team_id team_id_2,
o1.score score_1,
o2.score score_2,
o1.match_id match_id
from outcome o1
inner join outcome o2 on o1.match_id = o2.match_id and o1.team_id < o2.team_id
Of course, the information for home and away are not possible to generate, so your suggested alternative approach might be better after all. Also, take note of the condition o1.team_id < o2.team_id, which gets rid of the redundant symmetric match data (actually it gets rid of the same outcome row being joined with itself as well, which can be seen as the more important aspect).
In any case, using this select as part of your join, you can generate one row per match.
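As a quick check of the self-join, here is a sketch that runs it in SQLite from Python with a couple of invented teams and scores, and joins team twice to pick up both names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE team (team_id INTEGER PRIMARY KEY, team_name TEXT);
    CREATE TABLE outcome (team_id INT, match_id INT, score INT);
    INSERT INTO team VALUES (1,'Lions'),(2,'Tigers'),(3,'Bears');
    INSERT INTO outcome VALUES (1,100,2),(2,100,1),(2,101,0),(3,101,3);
""")

QUERY = """
    SELECT o1.match_id,
           t1.team_name, o1.score,
           t2.team_name, o2.score
    FROM outcome o1
    INNER JOIN outcome o2
            ON o1.match_id = o2.match_id
           AND o1.team_id < o2.team_id   -- one row per match, no self-pairs
    INNER JOIN team t1 ON t1.team_id = o1.team_id
    INNER JOIN team t2 ON t2.team_id = o2.team_id
    ORDER BY o1.match_id
"""

matches = conn.execute(QUERY).fetchall()
print(matches)
```

Which team ends up on the left is decided purely by team_id, which is why home and away cannot be recovered from this model.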
You fetch two rows for every match played, but team_id and team_name differ between them:
- one for the home team
- one for the away team
So your query is correct.
Using the match table as you describe captures the logic of a game simply and naturally and additionally shows home and away teams which your initial model does not.
You might want to add the round id as a foreign key to round table and perhaps a flag to indicate a match abandoned situation.
Drop outcome. It shouldn't be a separate table, because you have exactly one outcome per match.
You may want to consider how to handle matches that are cancelled; perhaps the scores are null?