Designing an SQL Table and getting it right the first time

Designing an SQL Table and getting it right the first time - sql

I currently working on an issue tracker for my company to help them keep track of problems that arise with the network. I am using C# and SQL.
Each issue has about twenty things we need to keep track of(status, work loss, who created it, who's working on it, etc). I need to attach a list of teams affected by the issue to each entry in my main issue table. The list of teams affected ideally contains some sort of link to a unique table instance, just for that issue, that shows the list of teams affected and what percentage of each teams labs are affected.
So my question is what is the best way to impliment this "link" between an entry into the issue table and a unique table for that issue? Or am I thinking about this problem wrong.

What you are describing is called a "many-to-many" relationship. A team can be affected by many issues, and likewise an issue can affect many teams.
In SQL database design, this sort of relationship requires a third table, one that contains a reference to each of the other two tables. For example:
CREATE TABLE teams (
team_id INTEGER PRIMARY KEY
-- other attributes
);
CREATE TABLE issues (
issue_id INTEGER PRIMARY KEY
-- other attributes
);
CREATE TABLE team_issue (
issue_id INTEGER NOT NULL,
team_id INTEGER NOT NULL,
FOREIGN KEY (issue_id) REFERENCES issues(issue_id),
FOREIGN KEY (team_id) REFERENCES teams(team_id),
PRIMARY KEY (issue_id, team_id)
);

This sounds like a classic many-to-many relationship...
You probably want three tables,
One for issues, with one record (row) per each individual unique issue created...
One for the teams, with one record for each team in your company...
And one table called say, "IssueTeams" or "TeamIssueAssociations" `or "IssueAffectedTeams" to hold the association between the two...
This last table will have one record (row) for each team an issue affects... This table will have a 2-column composite primary key, on the columns IssueId, AND TeamId... Every row will have to have a unique combination of these two values... Each of which is individually a Foreign Key (FK) to the Issue table, and the Team Table, respectively.
For each team, there may be zero to many records in this table, for each issue the team is affected by,
and for each Issue, there may be zero to many records each of which represents a team the issue affects.

If I understand the question correctly I would create....
ISSUE table containing the 20 so so items
TEAM table containing a list of teams.
TEAM_ISSUES table containing the link beteen the two
The TEAM_ISSUES table needs to contain a foriegn key to the ISSUE and TEAM tables (ie it should contain an ISSUE_ID and a TEAM_ID... it therefore acts as an intersection between the two "master" tables. It sounds like this is also the place to put the percentage.
Does that make sense?

There are so many good free open source issue trackers available that you should have pretty good reasons for implementing your own. You could use your time much better in customizing an existing tracker.
We are using Bugtracker.NET in the team I work for. It's been customized quite a bit, but there was no point in developing a system from the beginning. The reason we chose that product was that it runs on .NET and works great with SQL Server, but there are many other alternatives.

We can see those entities in your domain:
The "Issue"
"Teams" affected by that issue, in a certain percentage
So, having identified those two items, you can represent that with two tables, and the relationship between them is another table, that could track the percentage impact too.
Hope this helps.

I wouldn't create a unique table for each issue. I would do something like this
Table: Issue
IssueId primary key
status
workLoss
createdby
etc
Table: Team
TeamID primary key
TeamName
etc
Table: IssueTeam
IssueID (foreign key to issue table)
TeamID (foreign key to team table)
PercentLabsAffected

Unless I'm understanding wrong what you're trying to do, you should not have a unique table for each instance of an issue.
Your database should have three tables: an Issues table, a Teams table, and an IssueTeams joining table. The IssueTeams table would include foreign keys (i.e. TeamID and IssueID) that reference the respective team in Teams and issue in Issues. So Issue Teams might have records like (Issue1, Team1), (Issue1, Team3). You could keep the affected percentage of each teams' labs in the joining table.

Well, just to be all modern and agile-y, 'getting it right the first time' is less trendy than 'refactorable.' But to work through your model:
You have Issues (heh heh). You have Teams.
An Issue affects many Teams. A Team is affected by many Issues. So just for the basic problem, you seem to have a classic Many:Many relationship. A join table containing two columns, one to Issue PK and one to Team PK takes care of that.
Then you have the question of what % of teams. There's a dynamic aspect to that, of course, so to do it right, you'll need to specify a trigger. But the obvious place to put it is a column in Issue ("Affected_Team_Percentage").

If I understand you correctly, you want to create a new table of teams affected for each issue. Creating tables as part of normal operations rings my relational database design alarm bell. Don't do it!
Instead, use one affected_teams table with a foreign key to the issues table and a foreign key to the teams table. That will do the trick.

Related

Many to many relationship between tables

I have the following problem. I have one table Games with unique ID (primary key) and one table Users with unique ID(primary key). The problem is that many users can play one game and one game can be played by many users. I read, that I need one more table to connect the two tables, but I am not sure how to do it. My idea so far is to create another table Games_users where to store gameID and userID both as PK, but I don't know how to continue.

Your approach is correct: You have to create a bridge table between the two entities related many-to-many and put the GameId and UserId.
Remember also to set the foreign keys in order to keep consistency in your database

Creating database tables regarding Primary and Foreign Keys

I'm doing an activity online where I need to create an example database for an Airline. It details the general information needed to be put into each table but I need help linking certain tables.
I've drawn up tables in Word to help me grasp the connections between tables but I'm unsure if I'm doing it correctly.
I have a table called 'Staff' that looks like this:
I was asked to create another table that provides each staff members previous work experience such as the company they worked for and join and end dates, etc.
The 'Work Experience' table:
My question is, what would be the primary key for the Work Experience table? Seeing as Staff_ID references back to the Staff table could it be both a Primary Key and a Foreign Key?

Just create a generic work_experience_id or we_id(whatever you way you want it) so that every time you want to call previous work experience for staff you just select generic id primary key.

2-column table with two foreign keys. Performance/design question

I recently ran into a quite complex problem and after looking around a lot I couldn't find a solution to it. I've found answers to my questions many times before on stackoverflow.com, so I decided to post here.
So I'm making a user/group managment system for a web-based project, and I'm storing all related data into a postgreSQL database. This system relies on three tables:
USERS (Contains the primary key "USER_ID")
GROUPS (Contains the primary key "GROUP_ID")
GROUP_USERS
The two first tables simply define all the users and all the groups on the site, and the last table, GROUP_USERS, stores the groups every user is part of. It only has two columns:
USER_ID
GROUP_ID
Since every user can be a member of several groups, I decided to make a separate table for this purpose, rather than storing a comma separated column in the USERS-table.
Now, both columns are foreign keys, and I want to make them a composite primary key as well, this since each combination of USER_ID and GROUP_ID has to be unique. But now I am stuck with what seems to be a lot of indexes and relations to a very small table only containing numbers. In the end, I want this table to be as fast as possible, even if containing tens of thousands of rows. Size on disk shouldn't be a problem since its just all numbers anyway, but it feels quite stupid to have a full-sized index refering to a smaller table.
Should I stick with my current solution, store comma-separated values in a column in the USERS-table or is there any other solution I should be aware of. What I am looking for is best possible performance. This table could potentially (but not likely or commonly) be queried several hundreds of times on a single page load.
I don't want to use an array-column, even if they are supported by postgreSQL. I want to be as generic as possible so I can switch database later on, if necessary.
EDIT: In other words, will using a composite primary key and two foreign keys in one table with only two columns have a negative impact on performance rather than the opposite due to the size of the generated index?
EDIT2: Clarifications.
Thank you!

I believe you're in the right path right now, but didn't understand which indexes you really defined.
My suggestion is that you should have your primary key index in USERS by USER_ID, your primary key index in GROUPS by GROUP_ID, and two more indexes in GROUP_USERS. One of the indexes in GROUP_USERS should be either by the couple (USER_ID, GROUP_ID), or by the couple (GROUP_ID, USER_ID). The second index should be by the field that was left in second place in the last index defined.
Now why did I mentioned two options while defining the primary key over GROUP_USERS? That's because there is a slightly performance difference between a primary key index and any other duplicate index. It's very likely that your most common query into that table would be to find out if a user is in a certain group, and that query will perform fast in either way. What you have to consider is which of the following two queries will be more common.
Query which groups a certain user is in.
Query which users are in a certain group.
If 1 is more likely over 2, then your primary key should be (USER_ID, GROUP_ID), otherwise (GROUP_ID, USER_ID).

If I understand your question properly, what you might be missing is that Primary Keys (for that matter, Foreign Keys as well) can be what is called Composite, meaning that they contain more than one column... That's what you want here. A composite Primary Key on both UserId and GroupId, and a Foreign Key on each one indivudyally, that each points to (references) the PK in the respective parent table.

Multiple foreign keys to a single column

I'm defining a database for a customer/ order system where there are two highly distinct types of customers. Because they are so different having a single customer table would be very ugly (it'd be full of null columns as they are pointless for one type).
Their orders though are in the same format. Is it possible to have a CustomerId column in my Order table which has a foreign key to both the Customer Types? I have set it up in SQL server and it's given me no problems creating the relationships, but I'm yet to try inserting any data.
Also, I'm planning on using nHibernate as the ORM, could there be any problems introduced by doing the relationships like this?

No, you can't have a single field as a foreign key to two different tables. How would you tell where to look for the key?
You would at least need a field that tells what kind of user it is, or two separate foreign keys.
You could also put the information that is common for all users in one table and have separate tables for the information that is specific for the user types, so that you have a single table with user id as primary key.

A foreign key can only reference a single primary key, so no. However, you could use a bridge table:
CustomerA <---- CustomerA_Orders ----> Order
CustomerB <---- CustomerB_Orders ----> Order
So Order doesn't even have a foreign key; whether this is desirable, though...

I inherited a SQL Server database where this was done (a single column used in four foreign key relationships with four unrelated tables), so yes, it's possible. My predecessor is gone, though, so I can't ask why he thought it was a good idea.
He used a GUID column ("uniqueidentifier" type) to avoid the ambiguity problem, and he turned off constraint checking on the foreign keys, since it's guaranteed that only one will match. But I can think of lots of reasons that you shouldn't, and I haven't thought of any reasons you should.
Yours does sound like the classical "specialization" problem, typically solved by creating a parent table with the shared customer data, then two child tables that contain the data unique to each class of customer. Your foreign key would then be against the parent customer table, and your determination of which type of customer would be based on which child table had a matching entry.

You can create a foreign key referencing multiple tables. This feature is to allow vertical partioining of your table and still maintain referential integrity. In your case however, this is not applicable.
Your best bet would be to have a CustomerType table with possible columns - CustomerTypeID, CustomerID, where CustomerID is the PK and then refernce your OrderID table to CustomerID.
Raj

I know this is a very old question; however if other people are finding this question through the googles, and you don't mind adding some columns to your table, a technique I've used (using the original question as a hypothetical problem to solve) is:
Add a [CustomerType] column. The purpose of storing a value here is to indicate which table holds the PK for your (assumed) [CustomerId] FK column. Optional - addition of a check constraint (to ensure CustomerType is in CustomerA or CustomerB) will help you sleep better at night.
Add a computed column for each [CustomerType], eg:
[CustomerTypeAId] as case when [CustomerType] = 'CustomerA' then [CustomerId] end persisted
[CustomerTypeBId] as case when [CustomerType] = 'CustomerB' then [CustomerId] end persisted
Add your foreign keys to the calculated (and persisted) columns.
Caveat: I'm primarily in a MSSQL environment; so I don't know how well this translates to other DBMS (ie: Postgres, ORACLE, etc).

As noted, if the key is, say, 12345, how would you know which table to look it up in? You could, I suppose, do something to insure that the key values for the two tables never overlapped, but this is too ugly and painful to contemplate. You could have a second field that says which customer type it is. But if you're going to have two fields, why not have one field for customer type 1 id and another for customer type 2 id.
Without knowing more about your app, my first thought is that you really should have a general customer table with the data that is common to both, and then have two additional tables with the data specific to each customer type. I would think that there must be a lot of data common to the two -- basic stuff like name and address and customer number at the least -- and repeating columns across tables sucks big time. The additional tables could then refer back to the base table. As there is then a single key for the base table, the issue of foreign keys having to know which table to refer to evaporates.

Two distinct types of customer is a classic case of types and subtypes or, if you prefer, classes and subclasses. Here is an answer from another question.
Essentially, the class-table-inheritance technique is like Arnand's answer. The use of the shared-primary-key technique is what allows you to get around the problems created by two types of foreign key in one column. The foreign key will be customer-id. That will identify one row in the customer table, and also one row in the appropriate kind of customer type table, as the case may be.

Create a "customer" table include all the columns that have same data for both types of customer.
Than create table "customer_a" and "customer_b"
Use "customer_id" from "consumer" table as foreign key in "customer_a" and "customer_b"
customer
|
---------------------------------
| |
cusomter_a customer_b

Why specify primary/foreign key attributes in column names

A couple of recent questions discuss strategies for naming columns, and I was rather surprised to discover the concept of embedding the notion of foreign and primary keys in column names. That is
select t1.col_a, t1.col_b, t2.col_z
from t1 inner join t2 on t1.id_foo_pk = t2.id_foo_fk
I have to confess I have never worked on any database system that uses this sort of scheme, and I'm wondering what the benefits are. The way I see it, once you've learnt the N principal tables of a system, you'll write several orders of magnitude more requests with those tables.
To become productive in development, you'll need to learn which tables are the important tables, and which are simple tributaries. You'll want to commit an good number of column names to memory. And one of the basic tasks is to join two tables together. To reduce the learning effort, the easiest thing to do is to ensure that the column name is the same in both tables:
select t1.col_a, t1.col_b, t2.col_z
from t1 inner join t2 on t1.id_foo = t2.id_foo
I posit that, as a developer, you don't need to be reminded that much about which columns are primary keys, which are foreign and which are nothing. It's easy enough to look at the schema if you're curious. When looking at a random
tx inner join ty on tx.id_bar = ty.id_bar
... is it all that important to know which one is the foreign key? Foreign keys are important only to the database engine itself, to allow it to ensure referential integrity and do the right thing during updates and deletes.
What problem is being solved here? (I know this is an invitation to discuss, and feel free to do so. But at the same time, I am looking for an answer, in that I may be genuinely missing something).

I agree with you. Putting this information in the column name smacks of the crappy Hungarian Notation idiocy of the early Windows days.

I agree with you that the foreign key column in a child table should have the same name as the primary key column in the parent table. Note that this permits syntax like the following:
SELECT * FROM foo JOIN bar USING (foo_id);
The USING keyword assumes that a column exists by the same name in both tables, and that you want an equi-join. It's nice to have this available as shorthand for the more verbose:
SELECT * FROM foo JOIN bar ON (foo.foo_id = bar.foo_id);
Note, however, there are cases when you can't name the foreign key the same as the primary key it references. For example, in a table that has a self-reference:
CREATE TABLE Employees (
emp_id INT PRIMARY KEY,
manager_id INT REFERENCES Employees(emp_id)
);
Also a table may have multiple foreign keys to the same parent table. It's useful to use the name of the column to describe the nature of the relationship:
CREATE TABLE Bugs (
...
reported_by INT REFERENCES Accounts(account_id),
assigned_to INT REFERENCES Accounts(account_id),
...
);
I don't like to include the name of the table in the column name. I also eschew the obligatory "id" as the name of the primary key column in every table.

I've espoused most of the ideas proposed here over the 20-ish years I've been developing with SQL databases, I'm embarrassed to say. Most of them delivered few or none of the expected benefits and were with hindsight, a pain in the neck.
Any time I've spent more than a few hours with a schema I've fairly rapidly become familiar with the most important tables and their columns. Once it got to a few months, I'd pretty much have the whole thing in my head.
Who is all this explanation for? Someone who only spends a few minutes with the design isn't going to be doing anything serious anyway. Someone who plans to work with it for a long time will learn it if you named your columns in Sanskrit.
Ignoring compound primary keys, I don't see why something as simple as "id" won't suffice for a primary key, and "_id" for foreign keys.
So a typical join condition becomes customer.id = order.customer_id.
Where more than one association between two tables exists, I'd be inclined to use the association rather than the table name, so perhaps "parent_id" and "child_id" rather than "parent_person_id" etc

I only use the tablename with an Id suffix for the primary key, e.g. CustomerId, and foreign keys referencing that from other tables would also be called CustomerId. When you reference in the application it becomes obvious the table from the object properties, e.g. Customer.TelephoneNumber, Customer.CustomerId, etc.

I used "fk_" on the front end of any foreign keys for a table mostly because it helped me keep it straight when developing the DB for a project at my shop. Having not done any DB work in the past, this did help me. In hindsight, perhaps I didn't need to do that but it was three characters tacked onto some column names so I didn't sweat it.
Being a newcomer to writing DB apps, I may have made a few decisions which would make a seasoned DB developer shudder, but I'm not sure the foreign key thing really is that big a deal. Again, I guess it is a difference in viewpoint on this issue and I'll certainly take what you've written and cogitate on it.
Have a good one!

I agree with you--I take a different approach that I have seen recommended in many corporate environments:
Name columns in the format TableNameFieldName, so if I had a Customer table and UserName was one of my fields, the field would be called CustomerUserName. That means that if I had another table called Invoice, and the customer's user name was a foreign key, I would call it InvoiceCustomerUserName, and when I referenced it, I would call it Invoice.CustomerUserName, which immediately tells me which table it's in.
Also, this naming helps you to keep track of the tables your columns are coming from when you're joiining.
I only use FK_ and PK_ in the ACTUAL names of the foreign and primary keys in the DBMS.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas