Large Numbers Of Columns In Database - sql

I have been researching this issue and still have not been able to reach a satisfactory decision.
This question came closest but still does not really help my situation.
Large Number of Columns in MySQL Database
I am creating a "who would win in a fight" site to settle the long-standing Batman vs Superman style arguments, where users can vote on who they think would win.
Users will have the option to submit a "fighter" to the website, who will then be randomly matched against every other fighter for future users to vote on.
Obviously, I want to keep statistics on all of the matchups to display to the users.
I will have a table named, let's say, FIGHTERS. It will store info like the primary key, name and description, but not fight results.
As for storing the fight results, I can see two options.
Option A: Create a table for each fighter to count the number of winning votes they have vs every other fighter's primary key.
Option B: Create one large votes table with an equal number of columns and rows, indexed by the primary keys of the fighters. Then, for example, to get the stats for fighter1 vs fighter4, I would query row 1 (fighter1, PK1), column 4 (fighter4, PK4) to get the number of fighter1 wins vs fighter4, and then query row 4 (fighter4, PK4), column 1 to get the number of fighter4 wins vs fighter1. This table would obviously get very large when hundreds (thousands?) of fighters are added.
(Hope that was not too confusing!)
So I guess my question is: is it better to have hundreds of small tables (which will all need columns and rows added whenever a new fighter is added), or to have one large table?
I'm totally 50/50 on this, so any advice or other ways I could achieve this would be most appreciated.
Thanks in advance.
EDIT: Sorry for leaving this out. The voting I had in mind would work as a count of the overall votes for each fighter in favour of winning the fight vs each other fighter.

Following clarification I would consider
CREATE TABLE FightResults
(
Fighter1Id INT REFERENCES FIGHTERS(FighterId),
Fighter2Id INT REFERENCES FIGHTERS(FighterId),
Fighter1Votes INT,
Fighter2Votes INT,
CHECK (Fighter1Id < Fighter2Id ),
PRIMARY KEY (Fighter1Id,Fighter2Id)
);
You have a row for each matchup. Gorilla vs Shark, Lion vs Tiger etc. The check and PK constraints ensure the same matchup isn't represented more than once.
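As a rough illustration of how votes could be recorded against this table, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for the real database. The helper names `record_vote` and `matchup_votes`, the `Name` column and the sample fighters are assumptions made for the example. The key point is that the two ids are sorted before touching the table, so the CHECK constraint is always satisfied and each matchup maps to exactly one row.

```python
import sqlite3

# In-memory SQLite database as a stand-in; the DDL mirrors FightResults above.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE FIGHTERS (
    FighterId INTEGER PRIMARY KEY,
    Name      TEXT NOT NULL          -- hypothetical column
);
CREATE TABLE FightResults (
    Fighter1Id    INT NOT NULL REFERENCES FIGHTERS(FighterId),
    Fighter2Id    INT NOT NULL REFERENCES FIGHTERS(FighterId),
    Fighter1Votes INT NOT NULL DEFAULT 0,
    Fighter2Votes INT NOT NULL DEFAULT 0,
    CHECK (Fighter1Id < Fighter2Id),
    PRIMARY KEY (Fighter1Id, Fighter2Id)
);
INSERT INTO FIGHTERS (FighterId, Name) VALUES (1, 'Gorilla'), (4, 'Shark');
""")


def record_vote(conn, winner_id, loser_id):
    """Add one vote for winner_id in its matchup against loser_id."""
    if winner_id == loser_id:
        raise ValueError("a fighter cannot fight itself")
    # Canonical ordering: the smaller id always goes into Fighter1Id.
    low, high = sorted((winner_id, loser_id))
    # The column name comes from this fixed pair, never from user input.
    column = "Fighter1Votes" if winner_id == low else "Fighter2Votes"
    with conn:  # one transaction: create the row if missing, then increment
        conn.execute(
            "INSERT OR IGNORE INTO FightResults (Fighter1Id, Fighter2Id) "
            "VALUES (?, ?)",
            (low, high),
        )
        conn.execute(
            f"UPDATE FightResults SET {column} = {column} + 1 "
            "WHERE Fighter1Id = ? AND Fighter2Id = ?",
            (low, high),
        )


def matchup_votes(conn, fighter_a, fighter_b):
    """Return (votes for fighter_a, votes for fighter_b)."""
    low, high = sorted((fighter_a, fighter_b))
    row = conn.execute(
        "SELECT Fighter1Votes, Fighter2Votes FROM FightResults "
        "WHERE Fighter1Id = ? AND Fighter2Id = ?",
        (low, high),
    ).fetchone()
    if row is None:
        return (0, 0)
    return (row[0], row[1]) if fighter_a == low else (row[1], row[0])


# Two votes for the shark (id 4), one for the gorilla (id 1).
record_vote(conn, 4, 1)
record_vote(conn, 4, 1)
record_vote(conn, 1, 4)
print(matchup_votes(conn, 1, 4))
```

Whichever order the caller passes the two fighters in, the same single row is read or updated, which is what keeps the table at one row per matchup rather than a row-by-column grid.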
This does assume that the fights will have a fixed number of participants at two. If this isn't the case then a more flexible schema is
CREATE TABLE Fight
(
FightId INT PRIMARY KEY
/*Other columns with fight metadata go here, each preceded by a comma*/
);
CREATE TABLE FightResult
(
FightId INT REFERENCES Fight(FightId),
FighterId INT REFERENCES FIGHTERS(FighterId),
Votes INT,
PRIMARY KEY (FightId,FighterId)
);
But this does add quite possibly unnecessary complexity to your queries.
You may also want to prevent multiple votes on the same contest by the same user. In that case you might use something like (assuming two fighters per contest again)
CREATE TABLE Fights
(
FightId INT PRIMARY KEY,
Fighter1Id INT REFERENCES FIGHTERS(FighterId),
Fighter2Id INT REFERENCES FIGHTERS(FighterId),
CHECK (Fighter1Id < Fighter2Id )
);
CREATE TABLE Votes
(
FightId INT REFERENCES Fights(FightId),
UserId INT REFERENCES Users(UserId),
Vote INT CHECK (Vote IN (1,2)),
PRIMARY KEY (FightId,UserId)
);
You might still keep denormalised vote totals around for performance reasons.
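To make the one-vote-per-user idea concrete, here is a minimal sketch using Python's sqlite3 module as a stand-in database. The helpers `cast_vote` and `vote_totals` are hypothetical names, and the foreign keys to Fights and Users are left out only to keep the example short. It shows the composite primary key rejecting a second vote by the same user, and the totals being derived with an aggregate rather than stored.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Votes (
    FightId INT NOT NULL,   -- would reference Fights(FightId)
    UserId  INT NOT NULL,   -- would reference Users(UserId)
    Vote    INT NOT NULL CHECK (Vote IN (1, 2)),
    PRIMARY KEY (FightId, UserId)
);
""")


def cast_vote(conn, fight_id, user_id, vote):
    """Insert a vote; return False if a constraint rejects it."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO Votes (FightId, UserId, Vote) VALUES (?, ?, ?)",
                (fight_id, user_id, vote),
            )
        return True
    except sqlite3.IntegrityError:
        # Duplicate (FightId, UserId) or a Vote outside (1, 2).
        return False


def vote_totals(conn, fight_id):
    """Return (votes for fighter 1, votes for fighter 2) for one fight."""
    row = conn.execute(
        "SELECT COALESCE(SUM(CASE WHEN Vote = 1 THEN 1 ELSE 0 END), 0), "
        "       COALESCE(SUM(CASE WHEN Vote = 2 THEN 1 ELSE 0 END), 0) "
        "FROM Votes WHERE FightId = ?",
        (fight_id,),
    ).fetchone()
    return (row[0], row[1])


first = cast_vote(conn, 1, 10, 1)    # user 10 votes for fighter 1
repeat = cast_vote(conn, 1, 10, 2)   # same user, same fight: rejected
other = cast_vote(conn, 1, 11, 2)    # a different user votes for fighter 2
print(first, repeat, other, vote_totals(conn, 1))
```

If the aggregate ever becomes too slow, the same numbers could be cached in a totals column, at the cost of keeping the two in sync.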

The solution is to create 2 tables:
Fighters with FighterId (primary key) and all the other data.
FightResult with FightResultId (primary key), FighterId1, FighterId2 and FightResult. The two FighterIdX columns are foreign keys to Fighters.
This will make it easy to query and add votes and will keep it simple and easy to understand.
You can also add info like which user voted for a fight (foreign keys to users) to the second table if you like.

Related

Vote based approval system in DBMS

I am trying to create a system that auto-approves proposed posts by popular vote. I am currently evaluating which portions of this I can do within the DBMS, and whether doing so is sensible. I use PostgreSQL and I can go up to the newest version if that helps.
My database structure would look somewhat like this:
CREATE TYPE state AS ENUM ('write', 'vote');
CREATE TABLE post
(
id SERIAL NOT NULL CONSTRAINT post_pkey PRIMARY KEY,
title VARCHAR(100),
state state
);
CREATE TABLE proposal
(
id SERIAL NOT NULL CONSTRAINT proposal_pkey PRIMARY KEY,
post_id INTEGER CONSTRAINT proposal_post_fkey REFERENCES post,
text TEXT
);
CREATE TABLE accepted
(
id SERIAL NOT NULL CONSTRAINT accepted_pkey PRIMARY KEY,
post_id INTEGER CONSTRAINT accepted_post_fkey REFERENCES post,
text TEXT
);
CREATE TABLE vote
(
proposal_id INTEGER CONSTRAINT vote_proposal_fkey REFERENCES proposal,
user_id INTEGER,
PRIMARY KEY(proposal_id, user_id)
);
Here you can find a SQLFiddle for that layout
My goal is now to build one or more queries that copy the proposal to accepted and set the state to write according to these rules:
If the number of votes for a proposal is higher than a threshold n
If there is exactly one proposal with the highest vote count
My question is now whether this can sensibly be done within PostgreSQL or should be done outside and also if you could point me into the right direction for the queries.
Assuming the votes are inserted one-by-one and not in bulk, you can do what you want using a trigger on the vote table.
Basically, when a vote is inserted, the trigger would:
Calculate the total number of votes for the proposal.
Check if the total meets the threshold. If so, then do what you want for "accept".
Personally, I would tweak your design. Instead of an accepted table, I would simply have a flag in proposal. I would also modify proposal to store the total number of votes. And I'm not clear on whether proposals are competing against each other; that is, if one is accepted, does that preclude voting on others?
Also, you are unclear on what happens if votes are "removed" from a proposal.
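The check such a trigger would perform can be sketched as follows. This is not a PostgreSQL trigger: it is a minimal stand-in written with Python's sqlite3 module that runs the same steps from application code inside one transaction. The function name `add_vote`, the `THRESHOLD` value, the sample rows and the choice to ignore votes once a post has left the vote state are all assumptions for the example; the table layout follows the question, with the enum replaced by a CHECK constraint.

```python
import sqlite3

THRESHOLD = 2  # hypothetical value for n: a proposal needs more votes than this

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE post (
    id INTEGER PRIMARY KEY,
    title TEXT,
    state TEXT CHECK (state IN ('write', 'vote'))
);
CREATE TABLE proposal (
    id INTEGER PRIMARY KEY,
    post_id INTEGER REFERENCES post,
    text TEXT
);
CREATE TABLE accepted (
    id INTEGER PRIMARY KEY,
    post_id INTEGER REFERENCES post,
    text TEXT
);
CREATE TABLE vote (
    proposal_id INTEGER REFERENCES proposal,
    user_id INTEGER,
    PRIMARY KEY (proposal_id, user_id)
);
INSERT INTO post (id, title, state) VALUES (1, 'Example post', 'vote');
INSERT INTO proposal (id, post_id, text) VALUES (1, 1, 'first'), (2, 1, 'second');
""")


def add_vote(conn, proposal_id, user_id):
    """Record a vote, then accept the leading proposal if the rules allow.

    Returns True when this vote caused a proposal to be accepted.
    """
    with conn:  # the insert and any acceptance commit or roll back together
        conn.execute(
            "INSERT INTO vote (proposal_id, user_id) VALUES (?, ?)",
            (proposal_id, user_id),
        )
        post_id, state = conn.execute(
            "SELECT p.post_id, post.state FROM proposal p "
            "JOIN post ON post.id = p.post_id WHERE p.id = ?",
            (proposal_id,),
        ).fetchone()
        if state != "vote":
            return False  # assumption: nothing more to decide for this post
        # The two highest vote counts among this post's proposals.
        top = conn.execute(
            "SELECT p.id, COUNT(v.user_id) AS votes "
            "FROM proposal p LEFT JOIN vote v ON v.proposal_id = p.id "
            "WHERE p.post_id = ? GROUP BY p.id ORDER BY votes DESC LIMIT 2",
            (post_id,),
        ).fetchall()
        best_id, best_votes = top[0]
        tied = len(top) > 1 and top[1][1] == best_votes
        if best_votes <= THRESHOLD or tied:
            return False
        conn.execute(
            "INSERT INTO accepted (post_id, text) "
            "SELECT post_id, text FROM proposal WHERE id = ?",
            (best_id,),
        )
        conn.execute("UPDATE post SET state = 'write' WHERE id = ?", (post_id,))
        return True


results = [
    add_vote(conn, 1, 100),
    add_vote(conn, 2, 100),
    add_vote(conn, 1, 101),
    add_vote(conn, 1, 102),
]
print(results)
```

In PostgreSQL the body of `add_vote` after the insert would move into a trigger function fired after each insert on vote; the two rules themselves stay the same.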

Should I index a foreign key that is going to be updated often

I am trying to create a relational database for a library, in which there are two tables: users and books. The relationship is one-to-many: a user has many books, and each book is owned by only one user. I was thinking that the book table should have a foreign key column that references the user id.
However I encountered a problem if I want to get all of the books of a given user.
The only option is to query the books whose user id equals the given user id, using a join.
But if there are many books, that will take a lot of time.
So one may suggest indexing the foreign key column with a non-clustered index. However, a book-user combination will be updated often: you don't keep a book for more than one day in this library. And I have read that updating an indexed column often is not best practice.
So what should I do? What is the best solution for this case?
For the best performance when querying in both directions, add a middle table that stores the relationships. Both the user and book tables should have a unique index on their id.
The middle table: borrowing_table
It has the columns user_id and book_id. Because it stores the ids of both users and books, you can query it by user_id to find which books a given user has borrowed, and by book_id to quickly find the user who has a given book.
You should have an index on book_id.
Your concern about "frequent" updates just doesn't apply in a library setting. Libraries work on the time frames of days and weeks. Databases work on the timeframes of milliseconds, seconds, and minutes. What might seem frequent in a library is rather rare from the perspective of a database.
That said, I would suggest an intermediate table, not because you have a 1-n relationship at any given point in time. Instead, you have a time-tiled relationship. So:
create table UserBooks (
UserBookId int, -- serial, auto_increment, identity, generated always
UserId int references Users(UserId),
BookId int references Books(BookId),
FromDate datetime,
ToDate datetime,
DueDate datetime,
OverdueFees numeric(20, 4)
. . .
);
In other words, "borrowing" deserves to be an entity in itself, because there is more information than just the book and the user.
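A minimal sketch of how such a table might be used, again with Python's sqlite3 module standing in for the real database. The helper names, the convention that ToDate stays NULL while a book is out, the text dates and the index name are assumptions for the example, and the foreign keys and fee column are omitted for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE UserBooks (
    UserBookId INTEGER PRIMARY KEY,
    UserId     INT NOT NULL,    -- would reference Users(UserId)
    BookId     INT NOT NULL,    -- would reference Books(BookId)
    FromDate   TEXT NOT NULL,
    ToDate     TEXT,            -- assumption: NULL while the book is still out
    DueDate    TEXT
);
-- Supports "which books does this user hold right now?"
CREATE INDEX idx_userbooks_user ON UserBooks (UserId, ToDate);
""")


def borrow(conn, user_id, book_id, from_date, due_date):
    """Open a new borrowing period for the book."""
    with conn:
        conn.execute(
            "INSERT INTO UserBooks (UserId, BookId, FromDate, DueDate) "
            "VALUES (?, ?, ?, ?)",
            (user_id, book_id, from_date, due_date),
        )


def give_back(conn, user_id, book_id, to_date):
    """Close the open borrowing period instead of deleting the row."""
    with conn:
        conn.execute(
            "UPDATE UserBooks SET ToDate = ? "
            "WHERE UserId = ? AND BookId = ? AND ToDate IS NULL",
            (to_date, user_id, book_id),
        )


def books_held_by(conn, user_id):
    """Return the ids of the books the user currently holds."""
    rows = conn.execute(
        "SELECT BookId FROM UserBooks "
        "WHERE UserId = ? AND ToDate IS NULL ORDER BY BookId",
        (user_id,),
    ).fetchall()
    return [book_id for (book_id,) in rows]


borrow(conn, 1, 10, "2024-01-01", "2024-01-02")
borrow(conn, 1, 11, "2024-01-01", "2024-01-02")
give_back(conn, 1, 10, "2024-01-02")
print(books_held_by(conn, 1))
```

Returning a book only fills in ToDate, so the borrowing history stays available and the indexed UserId column itself is never rewritten.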

Choosing indexes and primary keys for performance

I am new to database design and I am having a lot of trouble designing a PostgreSQL database for a combat game.
In this game, players fight each other, gaining resources to buy better weapons and armors. Combats will be recorded for future review, and the number of combats is expected to grow rapidly: for example, 1k players fighting 1k rounds will produce 500k records.
Game interactivity is limited to spending points to upgrade the fighter's equipment and abilities. Combats are resolved by the machine.
Details:
A specific type of weapon or armor can only be possessed once by each fighter.
Fighters will almost exclusively be searched by id.
I will often need to search for the pieces of equipment (weapons and/or armor) possessed by a specific fighter, but I do not expect to search for the fighters who possess a specific type of weapon.
Combats will often be searched by winner or loser.
Two given fighters can fight multiple times on different dates, so the tuple winner-loser is not unique in the combats table.
fighters table contains a lot of columns that will be often retrieved all at the same time (I create two objects of class "Fighter" with all the related information anytime a combat begins)
This is my current design:
CREATE TABLE IF NOT EXISTS weapons (
id serial PRIMARY KEY,
*** Game stuff ***
);
CREATE TABLE IF NOT EXISTS armors (
id serial PRIMARY KEY,
*** Game stuff ***
);
CREATE TABLE IF NOT EXISTS fighters (
id serial PRIMARY KEY,
preferred_weapon INT references weapons(id),
preferred_armor INT references armors(id),
*** Game stuff ***
);
CREATE TABLE IF NOT EXISTS combats (
id serial PRIMARY KEY,
winner INT references fighters(id),
loser INT references fighters(id),
*** Game stuff ***
);
CREATE TABLE IF NOT EXISTS fighters_weapons (
fighter INT NOT NULL references fighters(id),
weapon INT NOT NULL references weapons(id),
PRIMARY KEY(fighter, weapon)
);
CREATE TABLE IF NOT EXISTS fighters_armors (
fighter INT NOT NULL references fighters(id),
armor INT NOT NULL references armors(id),
PRIMARY KEY(fighter, armor)
);
My questions are:
Do you think my design is well suited?
I have seen a lot of example databases containing an id column as primary key on every table. Is there any reason for that? Should I do that instead of the multiple column primary keys I am using on fighters_weapons and fighters_armors?
PostgreSQL creates indexes automatically for each primary key, but there are several tables which I do not expect to search by it (e.g. combats). Should I remove the index for performance? PostgreSQL complains about an existing constraint.
As I will search fighters_weapons and fighters_armors by fighter, as well as combats by winner and loser, do you think I should create indexes for all of these columns on these tables?
Any performance improvement advice? The most used operations will be: insert and query fighters, query equipment for a given fighter and insert combats.
Thanks a lot :)
To address your explicit questions:
2) It can be preferable to use a "natural" value as a primary key, i.e. not a serial id, if one exists. In cases where you are unlikely to use a serial id as an identifier, I would say it's slightly better not to add it.
3) Unless you intend to insert many rows very quickly into the combats table, it probably won't hurt you too much to have the index on the id column.
4) Creating an index on {fighter} is not necessary if the index {fighter, weapon} exists, and similarly creating an index on {fighter} is not necessary if the index {fighter, armor} exists. In general, you don't benefit from creating an index that is the prefix of another multi-column index. Separately, creating {winner} and {loser} indexes on combats seems like a good idea given the access pattern you've described.
5) Beyond table design, there are a few database tuning parameters that you might want to set if you've installed the database yourself. If an experienced database administrator has set up the database, he/she has probably already done this for you.
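Point 4 above can be checked by asking the database for its query plan. The sketch below does this with Python's sqlite3 module and SQLite's EXPLAIN QUERY PLAN, purely as a runnable stand-in: the question is about PostgreSQL, where the equivalent tool is EXPLAIN and the planner's decisions may differ, but the leading-column behaviour of a multi-column B-tree index is the same idea. The `plan` helper and the index names are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fighters (id INTEGER PRIMARY KEY);
CREATE TABLE weapons  (id INTEGER PRIMARY KEY);
CREATE TABLE fighters_weapons (
    fighter INT NOT NULL REFERENCES fighters(id),
    weapon  INT NOT NULL REFERENCES weapons(id),
    PRIMARY KEY (fighter, weapon)   -- its index already leads with fighter
);
CREATE TABLE combats (
    id     INTEGER PRIMARY KEY,
    winner INT REFERENCES fighters(id),
    loser  INT REFERENCES fighters(id)
);
-- Not covered by any primary key, so these need their own indexes.
CREATE INDEX idx_combats_winner ON combats (winner);
CREATE INDEX idx_combats_loser  ON combats (loser);
""")


def plan(conn, sql, params=()):
    """Return SQLite's query plan for sql as a single string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each row holds the human-readable plan step.
    return " | ".join(row[-1] for row in rows)


weapons_plan = plan(
    conn, "SELECT weapon FROM fighters_weapons WHERE fighter = ?", (1,)
)
winner_plan = plan(conn, "SELECT id FROM combats WHERE winner = ?", (1,))
print(weapons_plan)
print(winner_plan)
```

The first plan should show the lookup by fighter being served by the index behind the composite primary key, with no separate index on fighter; the second should show the explicit winner index in use.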

Identity column separate from composite primary key

I have a table representing soccer matches:
Date
Opponent
I feel {Date,Opponent} is the primary key because in this table there can never be more than one opponent per date. The problem is that when I create foreign key constraints in other tables, I have to include both Date and Opponent columns in the other tables:
Soccer game statistics table:
Date
Opponent
Event (Goal scored, yellow card etc)
Ideally I would like to have:
Soccer matches table:
ID
Date
Opponent
Soccer match statistics table:
SoccerMatchID
Event (Goal scored, yellow card etc)
where SoccerMatch.ID is a unique ID (but not the primary key) and {Date,Opponent} is still the primary key.
The problem is SQL Server doesn't seem to let me define ID as a unique identity whilst {Date,Opponent} is the primary key. When I go to the properties for ID, the part signalling unique identifying is grayed-out with "No".
(I assume everyone agrees I should try to achieve the above as it's a better design?)
I think most people don't use the graphical designer to do this, as it's the graphical designer that's preventing it, not SQL Server. Try running DDL in a query window:
ALTER TABLE dbo.YourTable ADD ID INT IDENTITY(1,1);
GO
CREATE UNIQUE INDEX yt_id ON dbo.YourTable(ID);
GO
Now you can reference this column in other tables no problem:
CREATE TABLE dbo.SomeOtherTable
(
MatchID INT FOREIGN KEY REFERENCES dbo.YourTable(ID)
);
That said, I find the column name ID completely useless. If it's a MatchID, why not call it MatchID everywhere it appears in the schema? Yes it's redundant in the PK table but IMHO consistency throughout the model is more important.
For that matter, why is your table called SoccerMatch? Do you have other kinds of matches? I would think it would be Matches with a unique ID = MatchID. That way if you later have different types of matches you don't have to create a new table for each sport - just add a type column of some sort. If you only ever have soccer, then SoccerMatch is kind of redundant, no?
Also I would suggest that the key and unique index be the other way around. If you're not planning to use the multi-column key for external reference then it is more intuitive, at least to me, to make the PK the thing you do reference in other tables. So I would say:
CREATE TABLE dbo.Matches
(
MatchID INT IDENTITY(1,1),
EventDate DATE, -- Date is also a terrible name and it's reserved
Opponent <? data type ?> / FK reference?
);
ALTER TABLE dbo.Matches ADD CONSTRAINT PK_Matches
PRIMARY KEY (MatchID);
ALTER TABLE dbo.Matches ADD CONSTRAINT UQ_Date_Opponent
UNIQUE (EventDate, Opponent);
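To see the two constraints working together, here is a minimal sketch that uses Python's sqlite3 module as a stand-in for SQL Server, so IDENTITY becomes AUTOINCREMENT and the dates are stored as text. The MatchEvents table, the helper names and the sample data are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Matches (
    MatchID   INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
    EventDate TEXT NOT NULL,
    Opponent  TEXT NOT NULL,
    CONSTRAINT UQ_Date_Opponent UNIQUE (EventDate, Opponent)
);
CREATE TABLE MatchEvents (                        -- hypothetical child table
    MatchEventID INTEGER PRIMARY KEY,
    MatchID      INT NOT NULL REFERENCES Matches(MatchID),
    Event        TEXT NOT NULL
);
""")


def add_match(conn, event_date, opponent):
    """Insert a match and return its generated MatchID."""
    with conn:
        cur = conn.execute(
            "INSERT INTO Matches (EventDate, Opponent) VALUES (?, ?)",
            (event_date, opponent),
        )
    return cur.lastrowid


def add_event(conn, match_id, event):
    """Attach an event to a match using only the single-column key."""
    with conn:
        conn.execute(
            "INSERT INTO MatchEvents (MatchID, Event) VALUES (?, ?)",
            (match_id, event),
        )


match_id = add_match(conn, "2024-05-01", "Rovers")
add_event(conn, match_id, "Goal scored")

try:
    add_match(conn, "2024-05-01", "Rovers")   # same date and opponent again
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
print(match_id, duplicate_rejected)
```

The child table carries one integer column instead of the date and opponent pair, while the unique constraint still enforces the original business rule.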

Is the following acceptable foreign key usage

I have the following database, the first table users is a table containing my users, userid is a primary key.
The next is my results table, now for each user, there can be a result with an id and it can be against an exam. Is it ok in this scenario to use "id" as a primary key and "userid" as a foreign key? Is there a better way I could model this scenario?
These then link to the corresponding exams...
I would probably not have userid as a varchar. I would have that as an int as well.
So the user table is like this:
userId int
userName varchar
firstName varchar
lastName varchar
And then the foreign key in the results table would be an int. Like this:
userId int
result varchar
id int
examid INT
Because if you are planning on JOINing the tables together, JOINing on a varchar is not as fast as JOINing on an INT.
EDIT
That depends on how much data you are planning to store, because there is a minimal chance that GUIDs are not unique (see: Simple proof that GUID is not unique). If I were designing this database I would go with an int, because a GUID feels like overkill for a userid.
Provided that each user/exam will only ever produce one result, then you could create a composite key using the userid and exam columns in the results table.
Personally though, i'd go with the arbitrary id field approach as I don't like having to pass in several values to reference records. But that's just me :).
Also, the exam field in the results table should also be a foreign key.
Another way of doing this could be to abstract the Grade Levels from the Exam, and make the Exam a unique entity (and primary key) on its own table. So this would make a Grade Levels table (pkey1 = A, pkey2 = B, etc) where the grade acts as the foreign key in your second table, thus removing an entire field.
You could also normalize out another level and make a table for Subjects, which would be the foreign key for a dedicated Exam Code table. You can have ENG101, ENG102, etc. for exams, and the same for the other exam codes for the subject. The benefit of this is that your exams, subjects, students and grade levels are maintained as unique entities. The primary and foreign keys of each are evident, and maintenance stays simple with room to scale up.
You could consider using composite keys, but this is a nice and simple way to start, and you can merge tables for indexing and compacting as required.
Please make sure you first understand Normal Forms before actually normalizing your schema.