Vote based approval system in DBMS - sql

I am trying to create a system that auto-approves proposed posts by popular vote. I am currently evaluating which parts of this I can do within the DBMS, and whether doing so is sensible. I use PostgreSQL and can move to the newest version if that helps.
My database structure would look somewhat like this:
CREATE TYPE state AS ENUM ('write', 'vote');
CREATE TABLE post
(
id SERIAL NOT NULL CONSTRAINT post_pkey PRIMARY KEY,
title VARCHAR(100),
state state
);
CREATE TABLE proposal
(
id SERIAL NOT NULL CONSTRAINT proposal_pkey PRIMARY KEY,
post_id INTEGER CONSTRAINT proposal_post_fkey REFERENCES post,
text TEXT
);
CREATE TABLE accepted
(
id SERIAL NOT NULL CONSTRAINT accepted_pkey PRIMARY KEY,
post_id INTEGER CONSTRAINT accepted_post_fkey REFERENCES post,
text TEXT
);
CREATE TABLE vote
(
proposal_id INTEGER CONSTRAINT vote_proposal_fkey REFERENCES proposal,
user_id INTEGER,
PRIMARY KEY (proposal_id, user_id)
);
Here you can find a SQLFiddle for that layout
My goal is now to build one or more queries to copy the proposal to accepted and set the state to write according to these rules:
If the number of votes for a proposal is higher than a threshold n
If there is exactly one proposal with the highest vote count
My question is now whether this can sensibly be done within PostgreSQL or should be done outside, and also whether you could point me in the right direction for the queries.

Assuming the votes are inserted one-by-one and not in bulk, then you can do what you want using a trigger on votes.
Basically, when a vote is received, the trigger would:
Calculate the total number of votes for the proposal.
Check if the total meets the threshold. If so, then do what you want for "accept".
Personally, I would tweak your design. Instead of an accepted table, I would simply have a flag in proposals. I would also modify proposals to have the total number of votes. And, I'm not clear on whether proposals are competing against each other -- that is, if one is accepted, does that preclude voting on others?
Also, you are unclear on what happens if votes are "removed" from a proposal.
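The trigger approach described above could look roughly like the sketch below. It is not the answerer's code: it runs on SQLite through Python's sqlite3 module so that it is self-contained, it assumes a threshold of n = 2, it treats "highest vote count" as per post, and the trigger name vote_auto_accept is made up for illustration. In PostgreSQL the same checks would live in a PL/pgSQL trigger function attached AFTER INSERT ON vote.

```python
import sqlite3

THRESHOLD = 2  # assumed value of n: a proposal needs MORE than this many votes

SCHEMA = """
CREATE TABLE post (id INTEGER PRIMARY KEY, title TEXT,
                   state TEXT CHECK (state IN ('write', 'vote')));
CREATE TABLE proposal (id INTEGER PRIMARY KEY, post_id INTEGER REFERENCES post, text TEXT);
CREATE TABLE accepted (id INTEGER PRIMARY KEY, post_id INTEGER REFERENCES post, text TEXT);
CREATE TABLE vote (proposal_id INTEGER REFERENCES proposal, user_id INTEGER,
                   PRIMARY KEY (proposal_id, user_id));

-- Hypothetical trigger: evaluated after every inserted vote row.
CREATE TRIGGER vote_auto_accept AFTER INSERT ON vote
WHEN
    -- the post is still open for voting
    (SELECT state FROM post
      WHERE id = (SELECT post_id FROM proposal WHERE id = NEW.proposal_id)) = 'vote'
    -- rule 1: this proposal now has more than n votes
    AND (SELECT COUNT(*) FROM vote WHERE proposal_id = NEW.proposal_id) > {n}
    -- rule 2: no other proposal for the same post has as many (or more) votes
    AND NOT EXISTS (
        SELECT 1 FROM proposal p
         WHERE p.post_id = (SELECT post_id FROM proposal WHERE id = NEW.proposal_id)
           AND p.id <> NEW.proposal_id
           AND (SELECT COUNT(*) FROM vote v WHERE v.proposal_id = p.id)
               >= (SELECT COUNT(*) FROM vote WHERE proposal_id = NEW.proposal_id))
BEGIN
    INSERT INTO accepted (post_id, text)
        SELECT post_id, text FROM proposal WHERE id = NEW.proposal_id;
    UPDATE post SET state = 'write'
     WHERE id = (SELECT post_id FROM proposal WHERE id = NEW.proposal_id);
END;
"""


def run_demo():
    """Insert votes one by one and report the post state and accepted rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA.format(n=THRESHOLD))
    conn.execute("INSERT INTO post (id, title, state) VALUES (1, 'Demo', 'vote')")
    conn.executemany("INSERT INTO proposal (id, post_id, text) VALUES (?, 1, ?)",
                     [(1, "first draft"), (2, "second draft")])
    # Proposal 2 is the first to exceed the threshold while being the sole leader.
    for proposal_id, user_id in [(1, 10), (1, 11), (2, 10), (2, 11), (2, 12), (1, 12)]:
        conn.execute("INSERT INTO vote (proposal_id, user_id) VALUES (?, ?)",
                     (proposal_id, user_id))
    state = conn.execute("SELECT state FROM post WHERE id = 1").fetchone()[0]
    accepted = conn.execute("SELECT post_id, text FROM accepted").fetchall()
    return state, accepted
```

The state check in the WHEN clause is what stops later votes from accepting a second proposal once the post has moved to write. Removed votes are not handled here, which is the open question raised above.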

Related

Restrict the number of entries in a relation based on conditions across several relations

I am using PostgreSQL and am trying to restrict the number of concurrent loans that a student can have. To do this, I have created a CTE that selects all unreturned loans grouped by StudentID, and counts the number of unreturned loans for each StudentID. Then, I am attempting to create a check constraint that uses that CTE to restrict the number of concurrent loans that a student can have to 7 at most.
The below code does not work because it is syntactically invalid, but hopefully it can communicate what I am trying to achieve. Does anyone know how I could implement my desired restriction on loans?
CREATE TABLE loan (
id SERIAL PRIMARY KEY,
copy_id INTEGER REFERENCES media_copies (copy_id),
account_id INT REFERENCES account (id),
loan_date DATE NOT NULL,
expiry_date DATE NOT NULL,
return_date DATE,
WITH currentStudentLoans (student_id, current_loans) AS
(
SELECT account_id, COUNT(*)
FROM loan
WHERE account_id IN (SELECT id FROM student)
AND return_date IS NULL
GROUP BY account_id
)
CONSTRAINT max_student_concurrent_loans CHECK(
(SELECT current_loans FROM currentStudentLoans) BETWEEN 0 AND 7
)
);
For additional (and optional) context, I include an ER diagram of my database schema.
You cannot do this using an in-line CTE like this. You have several choices.
The first is a UDF and check constraint. Essentially, the logic in the CTE is put in a UDF and then a check constraint validates the data.
The second is a trigger to do the check on this table. However, that is tricky because the counts are on the same table.
The third is storing the total number in another table -- probably accounts -- and keeping it up-to-date for inserts, updates, and deletes on this table. Keeping that value up-to-date requires triggers on loans. You can then put the check constraint on accounts.
I'm not sure which solution fits best in your overall schema. The first is closest to what you are doing now. The third "publishes" the count, so it is a bit clearer what is going on.
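As an illustration of the second option (a trigger on the loan table), here is a minimal sketch. It is an assumption-laden stand-in rather than PostgreSQL code: it runs on SQLite through Python's sqlite3 module, trims the loan table to the columns the rule needs, and only guards INSERT (an UPDATE that clears return_date would need its own trigger). In PostgreSQL, concurrent transactions could each pass such a check, so some form of locking would also be needed.

```python
import sqlite3

MAX_LOANS = 7  # limit taken from the question

SCHEMA = """
CREATE TABLE student (id INTEGER PRIMARY KEY);
CREATE TABLE loan (
    id          INTEGER PRIMARY KEY,
    account_id  INTEGER NOT NULL,
    loan_date   TEXT NOT NULL,
    return_date TEXT              -- NULL while the loan is still open
);

-- Reject a new open loan when the student already holds the maximum.
CREATE TRIGGER max_student_concurrent_loans BEFORE INSERT ON loan
WHEN NEW.return_date IS NULL
     AND NEW.account_id IN (SELECT id FROM student)
     AND (SELECT COUNT(*) FROM loan
           WHERE account_id = NEW.account_id AND return_date IS NULL) >= {limit}
BEGIN
    SELECT RAISE(ABORT, 'student already has the maximum number of open loans');
END;
"""


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA.format(limit=MAX_LOANS))
    conn.execute("INSERT INTO student (id) VALUES (1)")  # account 1 is a student
    return conn


def open_loans_granted(conn, account_id, attempts):
    """Try to open `attempts` loans and return how many were accepted."""
    granted = 0
    for _ in range(attempts):
        try:
            conn.execute(
                "INSERT INTO loan (account_id, loan_date) VALUES (?, '2024-01-01')",
                (account_id,))
            granted += 1
        except sqlite3.DatabaseError:
            pass  # the trigger aborted this insert
    return granted
```

Calling open_loans_granted(make_db(), 1, 10) stops accepting loans once the student has seven open ones, while an account that is not in student is not limited.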

Large Numbers Of Columns In Database

I have been doing some research into this issue and still have not been able to make up a satisfactory decision.
This question came closest but still does not really help my situation.
Large Number of Columns in MySQL Database
I am basically creating a "who would win in a fight" site to settle the long-standing Batman vs Superman style arguments, where users can vote on who they think would win.
Users will have the option to submit a "fighter" to the website who will then be randomly matched to every other fighter for future users to vote on.
I want to obviously keep statistics on all of the match ups to display to the users.
Now I will have a table named, let's say, FIGHTERS. This will store info like primary key, name and description, but not fight results.
As for storing the fight results, I can see two options.
Option A: Create a table for each fighter to count the number of winning votes they have vs every other fighter's primary key.
Option B: Create one large votes table which would have an equal number of columns and rows indexed by the primary keys of the fighters. Then, for example, to get the stats for fighter1 vs fighter4 I would query row 1 (fighter1 PK1), column 4 (for fighter 4 PK4) to get the number of fighter 1 wins vs fighter 4, and then repeat but query row 4 (PK4 for fighter 4), column 1 to get fighter 4 wins vs fighter1. This table would obviously get very large when hundreds (thousands?) of fighters are added.
(Hope that was not too confusing!)
So I guess my question is: is it better to have hundreds of small tables (which will all need to have columns and rows added when a new fighter is added), or to have one large table?
I'm totally 50/50 on this, so any advice or other ways I could achieve this would be most appreciated.
Thanks in advance.
EDIT: Sorry for leaving this out. The voting I had in mind would work basically as a count of overall votes for each fighter in favour of winning the fight vs each other fighter.
Following clarification I would consider
CREATE TABLE FightResults
(
Fighter1Id INT REFERENCES FIGHTERS(FighterId),
Fighter2Id INT REFERENCES FIGHTERS(FighterId),
Fighter1Votes INT,
Fighter2Votes INT,
CHECK (Fighter1Id < Fighter2Id ),
PRIMARY KEY (Fighter1Id,Fighter2Id)
)
You have a row for each matchup. Gorilla vs Shark, Lion vs Tiger etc. The check and PK constraints ensure the same matchup isn't represented more than once.
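Because of the Fighter1Id < Fighter2Id check, code that records a vote has to put the pair in order before touching the row. A minimal sketch of that, assuming SQLite via Python's sqlite3 module and a hypothetical helper named record_vote (the vote columns are given a default of 0 here, which the original table does not have):

```python
import sqlite3

SCHEMA = """
CREATE TABLE FIGHTERS (FighterId INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE FightResults (
    Fighter1Id    INT REFERENCES FIGHTERS(FighterId),
    Fighter2Id    INT REFERENCES FIGHTERS(FighterId),
    Fighter1Votes INT NOT NULL DEFAULT 0,
    Fighter2Votes INT NOT NULL DEFAULT 0,
    CHECK (Fighter1Id < Fighter2Id),
    PRIMARY KEY (Fighter1Id, Fighter2Id)
);
"""


def record_vote(conn, winner_id, loser_id):
    """Add one vote for winner_id in the matchup between the two fighters."""
    if winner_id == loser_id:
        raise ValueError("a fighter cannot be matched against itself")
    # Order the pair so it satisfies the CHECK and always hits the same row.
    low, high = sorted((winner_id, loser_id))
    # Create the matchup row on first use; an existing row is left alone.
    conn.execute(
        "INSERT OR IGNORE INTO FightResults (Fighter1Id, Fighter2Id) VALUES (?, ?)",
        (low, high))
    # The column name comes from a fixed pair of literals, never from user input.
    column = "Fighter1Votes" if winner_id == low else "Fighter2Votes"
    conn.execute(
        f"UPDATE FightResults SET {column} = {column} + 1 "
        "WHERE Fighter1Id = ? AND Fighter2Id = ?",
        (low, high))
```

With this helper, record_vote(conn, 4, 1) and record_vote(conn, 1, 4) both update the single row (1, 4), so fighter1 vs fighter4 and fighter4 vs fighter1 can never drift apart.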
This does assume that the fights will have a fixed number of participants at two. If this isn't the case then a more flexible schema is
CREATE TABLE Fight
(
FightId INT PRIMARY KEY
/* , other columns with fight metadata */
)
CREATE TABLE FightResult
(
FightId INT REFERENCES Fight(FightId),
FighterId INT REFERENCES FIGHTERS(FighterId),
Votes INT,
PRIMARY KEY (FightId,FighterId)
)
But this does add quite possibly unnecessary complexity to your queries.
You may also want to prevent multiple votes on the same contest by the same user. In that case you might use something like (assuming two fighters per contest again)
CREATE TABLE Fights
(
FightId INT PRIMARY KEY,
Fighter1Id INT REFERENCES FIGHTERS(FighterId),
Fighter2Id INT REFERENCES FIGHTERS(FighterId),
CHECK (Fighter1Id < Fighter2Id )
)
CREATE TABLE Votes
(
FightId INT REFERENCES Fights(FightId),
UserId INT REFERENCES Users(UserId),
Vote INT CHECK (Vote IN (1,2)),
PRIMARY KEY (FightId,UserId)
)
You might still keep denormalised vote totals around for performance reasons.
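With the per-user Votes table above, the totals can also be computed on demand. The following is a sketch under assumptions: it uses SQLite via Python's sqlite3 module, drops the foreign keys to FIGHTERS and Users to stay short, and the function names are made up for illustration.

```python
import sqlite3

SCHEMA = """
CREATE TABLE Fights (
    FightId    INTEGER PRIMARY KEY,
    Fighter1Id INT NOT NULL,
    Fighter2Id INT NOT NULL,
    CHECK (Fighter1Id < Fighter2Id)
);
CREATE TABLE Votes (
    FightId INT REFERENCES Fights(FightId),
    UserId  INT,
    Vote    INT CHECK (Vote IN (1, 2)),   -- 1 = Fighter1Id wins, 2 = Fighter2Id wins
    PRIMARY KEY (FightId, UserId)          -- one vote per user per fight
);
"""

TOTALS = """
SELECT f.FightId,
       SUM(CASE WHEN v.Vote = 1 THEN 1 ELSE 0 END) AS Fighter1Votes,
       SUM(CASE WHEN v.Vote = 2 THEN 1 ELSE 0 END) AS Fighter2Votes
  FROM Fights f
  LEFT JOIN Votes v ON v.FightId = f.FightId
 GROUP BY f.FightId
 ORDER BY f.FightId
"""


def cast_vote(conn, fight_id, user_id, vote):
    """Return True if the vote was stored, False if the user had already voted."""
    try:
        conn.execute("INSERT INTO Votes (FightId, UserId, Vote) VALUES (?, ?, ?)",
                     (fight_id, user_id, vote))
        return True
    except sqlite3.IntegrityError:
        return False


def vote_totals(conn):
    return conn.execute(TOTALS).fetchall()
```

The LEFT JOIN keeps fights that have no votes yet in the result, with both totals at zero.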
The solution is to create 2 tables:
Fighters with FighterId (primary key) and all the other data.
FightResult: FightResultId (primary key), FighterId1, FighterId2, FightResult. The two columns FighterIdX are foreign keys to Fighter.
This will make it easy to query and add votes and will keep it simple and easy to understand.
You can also add info like which user voted for a fight (foreign keys to users) to the second table if you like.

Store array of items in SQL table

I know this has probably been asked a million times but I can't find anything definite for me. I'm making a website involving users who can build a list of items. I'm wondering what would be the best way to store their items in an SQL table?
I'm thinking I will need to make a separate table for each user, since I can't see any way to store an array. I think this would be inefficient, however.
Depending on what an "item" is, there seem to be two possible solutions:
1) a one-to-many relationship between users and items
2) a many-to-many relationship between users and items
If a single item (such as a "book") can be "assigned" to more than one user, it's 2). If each item is unique and can only belong to a single user it's 1).
one-to-many relationship
create table users
(
user_id integer primary key not null,
username varchar(100) not null
);
create table items
(
item_id integer primary key not null,
user_id integer not null references users(user_id),
item_name varchar(100) not null
);
many-to-many relationship:
create table users
(
user_id integer primary key not null,
username varchar(100) not null
);
create table items
(
item_id integer primary key not null,
item_name varchar(100) not null
);
create table user_items
(
user_id integer not null references users(user_id),
item_id integer not null references items(item_id)
);
Because of your extremely vague description, this is the best I can think of.
There is no need to use an array or something similar. It seems you are new to database modelling, so you should read up about normalisation. Each time you think about "arrays" you are probably thinking about "tables" (or relations).
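To show how the "array" of a user's items falls out of a plain join, here is a small sketch of the many-to-many variant. It assumes SQLite via Python's sqlite3 module, adds a composite primary key to user_items (not in the tables above) so the same item cannot be linked to a user twice, and uses a made-up helper name, items_for.

```python
import sqlite3

SCHEMA = """
create table users (
    user_id  integer primary key not null,
    username varchar(100) not null
);
create table items (
    item_id   integer primary key not null,
    item_name varchar(100) not null
);
create table user_items (
    user_id integer not null references users(user_id),
    item_id integer not null references items(item_id),
    primary key (user_id, item_id)   -- assumption: no duplicate links
);
"""


def items_for(conn, username):
    """Return the names of all items linked to the given user, sorted by name."""
    rows = conn.execute(
        """
        select i.item_name
          from users u
          join user_items ui on ui.user_id = u.user_id
          join items i       on i.item_id = ui.item_id
         where u.username = ?
         order by i.item_name
        """,
        (username,))
    return [name for (name,) in rows]
```

The list for one user is just the rows of user_items that carry that user's id, so no per-user table or array column is needed.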
Edit (just saw you mentioned MySQL): the above SQL will not create a foreign key constraint in MySQL (even though it will run without an error) due to MySQL's stupid "I'm not telling you if I can't do something" attitude. You need to define the foreign keys separately.
A separate table for each user/account would be best. This will limit the size of the necessary tables and allow for faster searching. When you present data you are usually displaying data for the current user/account. When you have to search through a shared table to find the relevant information, the application will start to slow down as the dependent table grows. Write the application as if it will be used to the fullest extent of SQL. This will limit the need for redesign in the future if the website becomes popular.

Generic Database table design

Just trying to figure out the best way to design my table for the following scenario:
I have several areas in my system (documents, projects, groups and clients) and each of these can have comments logged against them.
My question is should I have one table like this:
CommentID
DocumentID
ProjectID
GroupID
ClientID
etc
Where only one of the ids will have data and the rest will be NULL or should I have a separate CommentType table and have my comments table like this:
CommentID
CommentTypeID
ResourceID (this being the id of the project/doc/client)
etc
My thoughts are that option 2 would be more efficient from an indexing point of view. Is this correct?
Option 2 is not a good solution for a relational database. It's called polymorphic associations (as mentioned by @Daniel Vassallo) and it breaks the fundamental definition of a relation.
For example, suppose you have a ResourceId of 1234 on two different rows. Do these represent the same resource? It depends on whether the CommentTypeId is the same on these two rows. This violates the concept of a type in a relation. See SQL and Relational Theory by C. J. Date for more details.
Another clue that it's a broken design is that you can't declare a foreign key constraint for ResourceId, because it could point to any of several tables. If you try to enforce referential integrity using triggers or something, you find yourself rewriting the trigger every time you add a new type of commentable resource.
I would solve this with the solution that @mdma briefly mentions (but then ignores):
CREATE TABLE Commentable (
ResourceId INT NOT NULL IDENTITY,
ResourceType INT NOT NULL,
PRIMARY KEY (ResourceId, ResourceType)
);
CREATE TABLE Documents (
ResourceId INT NOT NULL,
ResourceType INT NOT NULL CHECK (ResourceType = 1),
FOREIGN KEY (ResourceId, ResourceType) REFERENCES Commentable
);
CREATE TABLE Projects (
ResourceId INT NOT NULL,
ResourceType INT NOT NULL CHECK (ResourceType = 2),
FOREIGN KEY (ResourceId, ResourceType) REFERENCES Commentable
);
Now each resource type has its own table, but the serial primary key is allocated uniquely by Commentable. A given primary key value can be used only by one resource type.
CREATE TABLE Comments (
CommentId INT IDENTITY PRIMARY KEY,
ResourceId INT NOT NULL,
ResourceType INT NOT NULL,
FOREIGN KEY (ResourceId, ResourceType) REFERENCES Commentable
);
Now Comments reference Commentable resources, with referential integrity enforced. A given comment can reference only one resource type. There's no possibility of anomalies or conflicting resource ids.
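A runnable sketch of this pattern, not taken from the answer: it uses SQLite via Python's sqlite3 module, so IDENTITY is replaced by an INTEGER PRIMARY KEY plus a UNIQUE (ResourceId, ResourceType) pair for the composite foreign keys to target, foreign keys are switched on explicitly, and the Body column and helper names are assumptions added for illustration.

```python
import sqlite3

SCHEMA = """
CREATE TABLE Commentable (
    ResourceId   INTEGER PRIMARY KEY,      -- SQLite assigns the next id
    ResourceType INTEGER NOT NULL,
    UNIQUE (ResourceId, ResourceType)      -- target of the composite foreign keys
);
CREATE TABLE Documents (
    ResourceId   INTEGER NOT NULL,
    ResourceType INTEGER NOT NULL CHECK (ResourceType = 1),
    FOREIGN KEY (ResourceId, ResourceType)
        REFERENCES Commentable (ResourceId, ResourceType)
);
CREATE TABLE Projects (
    ResourceId   INTEGER NOT NULL,
    ResourceType INTEGER NOT NULL CHECK (ResourceType = 2),
    FOREIGN KEY (ResourceId, ResourceType)
        REFERENCES Commentable (ResourceId, ResourceType)
);
CREATE TABLE Comments (
    CommentId    INTEGER PRIMARY KEY,
    ResourceId   INTEGER NOT NULL,
    ResourceType INTEGER NOT NULL,
    Body         TEXT,                     -- hypothetical payload column
    FOREIGN KEY (ResourceId, ResourceType)
        REFERENCES Commentable (ResourceId, ResourceType)
);
"""


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
    conn.executescript(SCHEMA)
    return conn


def new_resource(conn, resource_type):
    """Allocate a ResourceId for the given type and return it."""
    cur = conn.execute("INSERT INTO Commentable (ResourceType) VALUES (?)",
                       (resource_type,))
    return cur.lastrowid


def is_rejected(conn, sql, params=()):
    """Return True if the statement violates a CHECK or foreign key constraint."""
    try:
        conn.execute(sql, params)
        return False
    except sqlite3.IntegrityError:
        return True
```

An id allocated for a project cannot be reused by Documents: with type 1 the foreign key finds no matching (id, type) pair in Commentable, and with type 2 the CHECK fails. A comment pointing at an id that was never allocated is rejected in the same way.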
I cover more about polymorphic associations in my presentation Practical Object-Oriented Models in SQL and my book SQL Antipatterns.
Read up on database normalization.
Nulls in the way you describe would be a big indication that the database isn't designed properly.
You need to split up all your tables so that the data held in them is fully normalized. This will save you a lot of time further down the line, guaranteed, and it's a much better practice to get into the habit of.
From a foreign key perspective, the first example is better because you can have multiple foreign key constraints on a column but the data has to exist in all those references. It's also more flexible if the business rules change.
To continue from @OMG Ponies' answer, what you describe in the second example is called a Polymorphic Association, where the foreign key ResourceID may reference rows in more than one table. However, in SQL databases a foreign key constraint can only reference exactly one table. The database cannot enforce the foreign key according to the value in CommentTypeID.
You may be interested in checking out the following Stack Overflow post for one solution to tackle this problem:
MySQL - Conditional Foreign Key Constraints
The first approach is not great, since it is quite denormalized. Each time you add a new entity type, you need to update the table. You may be better off making this an attribute of document - I.e. store the comment inline in the document table.
For the ResourceID approach to work with referential integrity, you will need to have a Resource table, and a ResourceID foreign key in all of your Document, Project etc.. entities (or use a mapping table.) Making "ResourceID" a jack-of-all-trades, that can be a documentID, projectID etc.. is not a good solution since it cannot be used for sensible indexing or foreign key constraint.
To normalize, you need to split the comment table into one table per resource type.
Comment
-------
CommentID
CommentText
...etc
DocumentComment
---------------
DocumentID
CommentID
ProjectComment
--------------
ProjectID
CommentID
If only one comment is allowed, then you add a unique constraint on the foreign key for the entity (DocumentID, ProjectID etc.) This ensures that there can only be one row for the given item and so only one comment. You can also ensure that comments are not shared by using a unique constraint on CommentID.
EDIT: Interestingly, this is almost parallel to the normalized implementation of ResourceID - replace "Comment" in the table name, with "Resource" and change "CommentID" to "ResourceID" and you have the structure needed to associate a ResourceID with each resource. You can then use a single table "ResourceComment".
If there are going to be other entities that are associated with any type of resource (e.g. audit details, access rights, etc..), then using the resource mapping tables is the way to go, since it will allow you to add normalized comments and any other resource related entities.
I wouldn't go with either of those solutions. Depending on some of the specifics of your requirements you could go with a super-type table:
CREATE TABLE Commentable_Items (
commentable_item_id INT NOT NULL,
CONSTRAINT PK_Commentable_Items PRIMARY KEY CLUSTERED (commentable_item_id))
GO
CREATE TABLE Projects (
commentable_item_id INT NOT NULL,
... (other project columns)
CONSTRAINT PK_Projects PRIMARY KEY CLUSTERED (commentable_item_id))
GO
CREATE TABLE Documents (
commentable_item_id INT NOT NULL,
... (other document columns)
CONSTRAINT PK_Documents PRIMARY KEY CLUSTERED (commentable_item_id))
GO
If each item can only have one comment and comments are not shared (i.e. a comment can only belong to one entity) then you could just put the comments in the Commentable_Items table. Otherwise you could link the comments off of that table with a foreign key.
I don't like this approach very much in your specific case though, because "having comments" isn't enough to put items together like that in my mind.
I would probably go with separate Comments tables (assuming that you can have multiple comments per item - otherwise just put them in your base tables). If a comment can be shared between multiple entity types (i.e., a document and a project can share the same comment) then have a central Comments table and multiple entity-comment relationship tables:
CREATE TABLE Comments (
comment_id INT NOT NULL,
comment_text NVARCHAR(MAX) NOT NULL,
CONSTRAINT PK_Comments PRIMARY KEY CLUSTERED (comment_id))
GO
CREATE TABLE Document_Comments (
document_id INT NOT NULL,
comment_id INT NOT NULL,
CONSTRAINT PK_Document_Comments PRIMARY KEY CLUSTERED (document_id, comment_id))
GO
CREATE TABLE Project_Comments (
project_id INT NOT NULL,
comment_id INT NOT NULL,
CONSTRAINT PK_Project_Comments PRIMARY KEY CLUSTERED (project_id, comment_id))
GO
If you want to constrain comments to a single document (for example) then you could add a unique index (or change the primary key) on the comment_id within that linking table.
It's all of these "little" decisions that will affect the specific PKs and FKs. I like this approach because each table is clear on what it is. In databases that's usually better than having "generic" tables/solutions.
Of the options you give, I would go for number 2.
Option 2 is a good way to go. The issue that I see with it is that you are putting the resource key on that table. Each of the IDs from the different resources could be duplicated. When you join resources to the comments you will more than likely come up with comments that do not belong to that particular resource. This would be considered a many-to-many join. I would think a better option would be to have your resource tables, the comments table, and then tables that cross-reference the resource type and the comments table.
If you carry the same sort of data about all comments regardless of what they are comments about, I'd vote against creating multiple comment tables. Maybe a comment is just "thing it's about" and text, but if you don't have other data now, it's likely you will: date the comment was entered, user id of person who made it, etc. With multiple tables, you have to repeat all these column definitions for each table.
As noted, using a single reference field means that you could not put a foreign key constraint on it. This is too bad, but it doesn't break anything; it just means you have to do the validation with a trigger or in code. More seriously, joins get difficult. You can't just say "from comment join document using (documentid)". You need a complex join based on the value of the type field.
So while the multiple pointer fields is ugly, I tend to think that's the right way to go. I know some db people say there should never be a null field in a table, that you should always break it off into another table to prevent that from happening, but I fail to see any real advantage to following this rule.
Personally I'd be open to hearing further discussion on pros and cons.
Pawnshop Application:
I have separate tables for Loan, Purchase, Inventory & Sales transactions.
Each table's rows are joined to their respective customer rows by:
customer.pk [serial] = loan.fk [integer];
= purchase.fk [integer];
= inventory.fk [integer];
= sale.fk [integer];
I have consolidated the four tables into one table called "transaction", where a column:
transaction.trx_type char(1) {L=Loan, P=Purchase, I=Inventory, S=Sale}
Scenario:
A customer initially pawns merchandise, makes a couple of interest payments, then decides he wants to sell the merchandise to the pawnshop, who then places merchandise in Inventory and eventually sells it to another customer.
I designed a generic transaction table where for example:
transaction.main_amount DECIMAL(7,2)
in a loan transaction holds the pawn amount,
in a purchase holds the purchase price,
in inventory and sale holds sale price.
This is clearly a denormalized design, but it has made programming a lot easier and improved performance. Any type of transaction can now be performed from within one screen, without the need to change to different tables.

Database Design

I am making a webapp right now and I am trying to get my head around the database design.
I have a user model(username (which is primary key), password, email, website)
I have an entry model(id, title, content, comments, commentCount)
A user can only comment on an entry once. What is the best and most efficient way to go about doing this?
At the moment, I am thinking of another table that has username (from user model) and entry id (from entry model)
**username id**
Sonic 4
Sonic 5
Knuckles 2
Sonic 6
Amy 15
Sonic 20
Knuckles 5
Amy 4
So then to list comments for entry 4 it searches for id=4.
On a side note:
Instead of storing a commentCount, would it be better to calculate the comment count from the database each time when needed?
Your design is basically sound. Your third table should be named something like UsersEntriesComments, with fields UserName, EntryID and Comment. In this table, you would have a compound primary key consisting of the UserName and EntryID fields; this would enforce the rule that each user can comment on each entry only once. The table would also have foreign key constraints such that UserName must be in the Users table, and EntryID must be in the Entries table (the ID field, specifically).
You could add an ID field to the Users table, but many programmers (myself included) advocate the use of "natural" keys where possible. Since UserNames must be unique in your system, this is a perfectly valid (and easily readable) primary key.
Update: just read your question again. You don't need the Comments or the CommentsCount fields in your Entries table. Comments would properly be stored in the UsersEntriesComments table, and the counts would be calculated dynamically in your queries (saving you the trouble of updating this value yourself).
Update 2: James Black makes a good point in favor of not using UserName as the primary key, and instead adding an artificial primary key to the table (UserID or some such). If you use UserName as the primary key, allowing a user to change their user name is more difficult, as you have to change the username in all the related tables as well.
What exactly do you mean by
entry model(id, title, content, **comments**, commentCount)
(emphasis mine)? Since it looks like you have multiple comments per entity, they should be stored in a separate table:
comments(id, entry_id, content, user_id)
entry_id and user_id are foreign keys to respective tables. Now you just need to create a unique index on (entry_id, user_id) to ensure user can only add one comment per entity.
Also, you may want to create a surrogate (numeric, generated via sequence / identity) primary key for your users table instead of making user name your PK.
Here's my recommendation for your data model:
USERS table
USER_ID (pk, int)
USER_NAME
PASSWORD
EMAIL
WEBSITE
ENTRY table
ENTRY_ID (pk, int)
ENTRY_TITLE
CONTENT
ENTRY_COMMENTS table
ENTRY_ID (pk, fk)
USER_ID (pk, fk)
COMMENT
This setup allows an ENTRY to have 0+ comments. When a comment is added, the primary key being a composite key of ENTRY_ID and USER_ID means that the pair can only exist once in the table (IE: 1, 1 won't allow 1, 1 to be added again).
Do not store counts in a table - use a VIEW for that so the number can be generated based on existing data at the time of execution.
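The composite key and the count view recommended above could be sketched as follows. This is an illustration under assumptions: SQLite via Python's sqlite3 module, trimmed column lists, and a view name (entry_comment_counts) and helper (add_comment) invented for the example.

```python
import sqlite3

SCHEMA = """
CREATE TABLE users (
    user_id   INTEGER PRIMARY KEY,
    user_name TEXT NOT NULL UNIQUE
);
CREATE TABLE entry (
    entry_id    INTEGER PRIMARY KEY,
    entry_title TEXT NOT NULL
);
CREATE TABLE entry_comments (
    entry_id INTEGER NOT NULL REFERENCES entry (entry_id),
    user_id  INTEGER NOT NULL REFERENCES users (user_id),
    comment  TEXT NOT NULL,
    PRIMARY KEY (entry_id, user_id)   -- one comment per user per entry
);

-- The count is derived at query time instead of being stored.
CREATE VIEW entry_comment_counts AS
SELECT e.entry_id, COUNT(c.user_id) AS comment_count
  FROM entry e
  LEFT JOIN entry_comments c ON c.entry_id = e.entry_id
 GROUP BY e.entry_id;
"""


def add_comment(conn, entry_id, user_id, text):
    """Return True if the comment was stored, False if this user already commented."""
    try:
        conn.execute(
            "INSERT INTO entry_comments (entry_id, user_id, comment) VALUES (?, ?, ?)",
            (entry_id, user_id, text))
        return True
    except sqlite3.IntegrityError:
        return False
```

Selecting from entry_comment_counts always reflects the current rows, and entries without comments show a count of 0 because of the LEFT JOIN.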
I wouldn't use the username as a primary ID. I would make a numeric id with autoincrement
I would use that new id in the relations table with a unique key on the 2 fields
Even though it isn't in the question, you may want to have a userid that is the primary key, otherwise it will be difficult if the user is allowed to change their username, or make certain people know you cannot change your username.
Make the joined table have a unique constraint on the userid and entryid. That way the database forces that there is only one comment/entry/user.
It would help if you specified a database, btw.
It sounds like you want to guarantee that the set of comments is unique with respect to username X post_id. You can do this by using a unique constraint, or if your database system doesn't support that explicitly, with an index that does the same. Here's some SQL expressing that:
CREATE TABLE users (
username VARCHAR(10) PRIMARY KEY
-- , any other data ...
);
CREATE TABLE posts (
post_id INTEGER PRIMARY KEY
-- , any other data ...
);
CREATE TABLE comments (
username VARCHAR(10) REFERENCES users(username),
post_id INTEGER REFERENCES posts(post_id),
-- any other data ...
UNIQUE (username, post_id) -- Here's the important bit!
);