Table needs more than one identifier - sql

I am in no way a SQL expert so I am sure I did something wrong. I have read a few questions on here about needing a primary key. The way I created this table I can't find a way to actually have a unique key. It is a survey type database. I have a table for the main details like date, triage number, and the person involved. Another table for the questions results and another for the comments. I would have made the triage unique but more than one person can be involved so the same triage number would be used more than once. The people involved can appear more than once as well. The only truly unique thing is combining the person with the triage. I thought about an auto key but it would serve no purpose. Can using two identifiers be an acceptable practice for a survey type table?

The important part:
"... more than one person can be involved so the same triage number would be used more than once. The people involved can appear more than once as well."
Based on your comments, data in these two fields, for example:
Triage Person
------ ------
1 PersonA
1 PersonB
...
7 PersonA
7 PersonB
is fine in that Triage and Person can make a composite key, provided each person recorded in the Person field is uniquely identifiable. That is, if each person value is a name like "John Smith", you may have a problem if there are 2 or more John Smiths answering the survey. So, your Person value itself has to identify people uniquely. Assuming the triage numbers are distinct (i.e., no triage number represents more than one semantically relevant triage position), these two fields will work as the composite key if and only if your survey never records the same triage-person combination more than once.
The foreign key for each of your other tables ought to be the main table's composite key combination, but if the other two tables can be merged into the main one, consider it to reduce join burdens. E.g.: if the comments table stores only comments in a single field and nothing more, why not include that field in the main table and get rid of the comments table?
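As a rough sketch of the composite key described above, the following uses Python's built-in sqlite3 module. The table and column names (survey, survey_comment, survey_date, comment) are assumptions made for illustration, not taken from the question, and other database systems may need slightly different syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when this is on

# Hypothetical schema: the main table is keyed on (triage, person),
# and the comments table points back at it with a two-column foreign key.
conn.executescript("""
CREATE TABLE survey (
    triage      INTEGER NOT NULL,
    person      TEXT    NOT NULL,
    survey_date TEXT,
    PRIMARY KEY (triage, person)
);
CREATE TABLE survey_comment (
    triage  INTEGER NOT NULL,
    person  TEXT    NOT NULL,
    comment TEXT,
    FOREIGN KEY (triage, person) REFERENCES survey (triage, person)
);
""")

def try_insert(sql, params):
    """Return True if the row was accepted, False if a constraint rejected it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False

# The same triage and the same person may each appear more than once...
rows = [(1, "PersonA"), (1, "PersonB"), (7, "PersonA"), (7, "PersonB")]
conn.executemany("INSERT INTO survey (triage, person) VALUES (?, ?)", rows)

# ...but the same combination may not.
duplicate_ok = try_insert(
    "INSERT INTO survey (triage, person) VALUES (?, ?)", (1, "PersonA"))

# A comment must refer to an existing (triage, person) pair.
orphan_ok = try_insert(
    "INSERT INTO survey_comment VALUES (?, ?, ?)", (9, "PersonZ", "no parent row"))
valid_ok = try_insert(
    "INSERT INTO survey_comment VALUES (?, ?, ?)", (7, "PersonB", "all good"))
```

Here duplicate_ok and orphan_ok come back False, while valid_ok is True.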

Your question is quite general and I don't have enough information to give you a definite answer but hopefully my comments below can help.
It is not a problem to use a composite primary key (key consisting of 2 or more columns). It is more often used in linking tables, e.g. in many-to-many relationships.
One thing that you should consider is that if you want to also refer to a table with a composite primary key from other tables, you will have to refer to 2 columns in the foreign key, all the joins, etc. It may be easier to create a separate column for a primary key (e.g. autoincrementing number).
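The alternative mentioned above, a separate auto-incrementing key, might look roughly like this. It is only a sketch using Python's sqlite3 module; survey_id, survey_answer, question and answer are hypothetical names. The UNIQUE constraint keeps the triage/person rule, while child tables only need one column to refer back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when this is on

conn.executescript("""
CREATE TABLE survey (
    survey_id INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
    triage    INTEGER NOT NULL,
    person    TEXT    NOT NULL,
    UNIQUE (triage, person)                       -- the real-world rule still holds
);
CREATE TABLE survey_answer (
    survey_id INTEGER NOT NULL REFERENCES survey (survey_id),  -- one-column FK
    question  TEXT    NOT NULL,
    answer    TEXT
);
""")

cur = conn.execute("INSERT INTO survey (triage, person) VALUES (1, 'PersonA')")
survey_id = cur.lastrowid
conn.execute("INSERT INTO survey_answer VALUES (?, 'Q1', 'yes')", (survey_id,))

# Joins need only the single key column.
joined = conn.execute("""
    SELECT s.triage, s.person, a.answer
    FROM survey s
    JOIN survey_answer a ON a.survey_id = s.survey_id
""").fetchall()

# The triage/person combination is still protected against duplicates.
try:
    conn.execute("INSERT INTO survey (triage, person) VALUES (1, 'PersonA')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```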

Related

How to remove redundancy when a lot of columns can have repeating data?

I am working on a library system. I have a book table with 12 columns, including author, title, publication etc. Among those 12 there are 2 columns, book_id and isbn, that are unique. When a library gets n copies of the same book, the only columns that are going to be unique are book_id and isbn. All other columns are going to repeat n times. Is this a bad design? Is there a way to remove this redundancy? I thought of creating another table where I could store all the redundant columns once and separate book_id and isbn from it. Should I do that, or is the table just fine as it is?
I think you should use two different tables.
In one table you can store the unique keys, book_id and isbn_id. In the other table you can use either book_id or isbn_id as a foreign key reference. Alternatively, you can create an auto-increment id in the first table and use that for the foreign key reference from the second table.
In the second table you can keep the other 10 columns plus 1 column for the reference to the first table.
This will help you reduce redundancy, and the first table can also be used for mapping if you want to extend the database tables in the future.
What you have described sounds like two different tables:
books which defines the attributes of a book.
copies which defines the attributes of an incoming copy.
You seem to understand the attributes of books.
The attributes of copies might include (besides a foreign key reference to books)
condition
acquired_date
source
other information about the acquisition
It might also have "disposed of" date as well. Or you could maintain that in a copies_transactions table of some sort.
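A minimal sketch of the books/copies split, using Python's sqlite3 module. It assumes the ISBN identifies a title rather than a physical copy, and the exact column names (copy_id, copy_condition and so on) are illustrative guesses rather than the asker's real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when this is on

conn.executescript("""
CREATE TABLE books (
    book_id INTEGER PRIMARY KEY,
    isbn    TEXT NOT NULL UNIQUE,   -- assumption: one ISBN per title
    title   TEXT NOT NULL,
    author  TEXT NOT NULL
);
CREATE TABLE copies (
    copy_id        INTEGER PRIMARY KEY,
    book_id        INTEGER NOT NULL REFERENCES books (book_id),
    copy_condition TEXT,
    acquired_date  TEXT,
    source         TEXT
);
""")

# The descriptive columns are stored once...
conn.execute(
    "INSERT INTO books VALUES (1, '978-0-00-000000-2', 'Example Title', 'A. Author')")

# ...and each physical copy is just a short row pointing at the book.
conn.executemany(
    "INSERT INTO copies (book_id, copy_condition, acquired_date, source) "
    "VALUES (?, ?, ?, ?)",
    [(1, "new", "2015-06-01", "purchase"),
     (1, "new", "2015-06-01", "purchase"),
     (1, "used", "2015-07-15", "donation")],
)

book_rows = conn.execute("SELECT COUNT(*) FROM books").fetchone()[0]
copy_rows = conn.execute("SELECT COUNT(*) FROM copies").fetchone()[0]

# A join brings the title back for every copy when it is needed.
titles = conn.execute("""
    SELECT c.copy_id, b.title
    FROM copies c JOIN books b ON b.book_id = c.book_id
    ORDER BY c.copy_id
""").fetchall()
```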

Should I use composite primary keys in this example?

I am not a SQL expert, so I defer to someone with more knowledge. So here is my question. I have designed a database where every table has an Id column (auto increment) that is the primary key. I use this design without any issue, and it makes sense to me: I simply enforce referential integrity by way of this simple primary key, since the Id column of each table uniquely identifies each row.
Some of my colleagues have suggested that I use composite primary keys, but I see no value in doing that. The purpose of a primary key is to enable referential integrity, and that is what it does.
For example, this is a toy example but it demonstrates my design:
tbl_Customers
-------------
Id (PK)
Code (VARCHAR)
Name (VARCHAR)
Surname (VARCHAR)
tbl_CustomerDetails
-----------------
Id (PK)
CustomerId (FK to tbl_Customers)
SomeDetails (VARCHAR)
This does not use a separate 'linking' table, but it does not matter, it demonstrates my design.
Some of my colleagues noted that I should have a composite primary key on tbl_Customers to not only include Id as I do now, but also Code. They say that this will improve performance and that it will ensure that Code will not duplicate.
My counter argument is that if I want Code to not duplicate, I can create a UNIQUE INDEX on Code. And that, since my front-end only ever works with Ids and never allows for example searching (SELECTing) by Code, that there can not be a performance improvement. On my presentation layer, if I show for example Customers and I allow the user to select one to see the associated CustomerDetails, I will select the corresponding tbl_CustomerDetails rows on CustomerId where it matches the selected Id of the clicked customer.
What do you suggest? Am I correct or am I wrong? I am always willing to learn, and if I am wrong here I'd love to learn. But at the moment, I do not feel their arguments are valid. Which is why I am asking the community.
Thanks!
I would suggest going with a single-column primary key instead of composite keys. The biggest drawback of a composite key is that you require more than one value/column to identify a row. If your application uses an O/RM (Object/Relation Mapping) layer, then you will have fits mapping these database rows to objects in a programming language. O/RMs are easiest to set up when every table has a single-column primary key.
Programming aside, the major drawback of composite keys in general, and especially composite keys requiring many columns, is that all of this data needs to be specified and copied to child tables in order to set up proper relationships between tables, which wastes space and adds unnecessary complexity too.
The biggest headache I've run into with developers is they assume "uniqueness of data" equates "identifying a row in the database". This is rarely the case. I've found applications and databases to be much more maintainable and easy to build by defaulting to single column primary keys, and using composite keys as an exception to the rule, then enforcing data uniqueness by using unique constraints or indexes on those columns.
After reading your question and arguments I would like to say you are not wrong.
Since your ID is auto-incremented, it will always provide uniqueness for your rows.
Now, regarding the Code column: if Code should be unique, you can always put a UNIQUE constraint on the column, which will not allow duplicate values for Code. Since your front end works only with Ids, there is no need for a composite primary key on (ID, Code), but make sure you add the UNIQUE constraint on the Code column.
You have already given the explanation, buddy, and I believe you are totally right.
If you are going to make composite primary key then you have to consider two things here:
A composite PK on (ID, Code) will allow duplicate IDs and duplicate codes; it only disallows duplicate combinations.
You have to add the Code column to the tbl_CustomerDetails table as well if you are going to link both tables.
In Summary I would like to say I don't feel that in this case Composite Primary Key is required.
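The difference between the two designs can be checked with a small experiment. This is a sketch using Python's sqlite3 module with two simplified, hypothetical stand-ins for tbl_Customers: one with a composite primary key on (id, code), and one with a single-column primary key plus a UNIQUE constraint on code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

conn.executescript("""
CREATE TABLE customers_composite (
    id   INTEGER NOT NULL,
    code TEXT    NOT NULL,
    PRIMARY KEY (id, code)        -- only the PAIR has to be unique
);
CREATE TABLE customers_unique (
    id   INTEGER PRIMARY KEY,     -- id alone identifies the row
    code TEXT NOT NULL UNIQUE     -- code alone must not repeat
);
""")

def accepts(table, row_id, code):
    """Return True if the insert succeeds, False if a constraint rejects it."""
    try:
        conn.execute(f"INSERT INTO {table} (id, code) VALUES (?, ?)", (row_id, code))
        return True
    except sqlite3.IntegrityError:
        return False

# Composite key: a repeated code or a repeated id slips through.
first_row          = accepts("customers_composite", 1, "C001")
same_code_other_id = accepts("customers_composite", 2, "C001")
same_id_other_code = accepts("customers_composite", 1, "C002")
same_pair          = accepts("customers_composite", 1, "C001")

# Single-column key plus UNIQUE: the repeated code is rejected.
unique_first     = accepts("customers_unique", 1, "C001")
unique_same_code = accepts("customers_unique", 2, "C001")
```

In this run the composite table accepts everything except the exact repeated pair, whereas the second table rejects the repeated code outright, which is the behaviour the asker actually wants.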
If your question is, should you use a composite key in your example, the answer to that is a resounding NO! Your colleague's suggestion to add code as a composite key is not only unnecessary but will more than likely introduce problems for you down the road. Let me illustrate:
Let's say that you'd like to distinguish customers by code: all members have the code MEMB plus the Id number, all vendors have the code VEND plus the Id number, and all customers have the code CUST plus the Id number.
Among the "customers" are donors who don't purchase anything but give a contribution. You decide to make a distinction between donors and customers.
That means you'll have to change the code of some of your customers from CUST to DONOR plus Id. To make that change you will have to UPDATE EVERY INSTANCE of CUST that's a donor into DONOR. That could be a nightmare to say the least as you'll need to figure out every table that has that Id as a reference.
With your current set up, all you have to do is update the Code in ONE place and no more changes are needed. So you're right in your implementation.

Numeric IDs vs. String IDs

I'm using a very stripped down example here so please ask if you need more context.
I'm in the process of restructuring/normalising a database where the majority of the tables have primary key fields which are auto-incremented numerical IDs (1, 2, 3 etc.), and I'm thinking I need to change the ID field from a numerical value to a string value generated from data in the row.
My reasoning for this is as follows:
I have 5 tables; Staff, Members, Volunteers, Interns and Students; all of these have numeric ID's.
I have another table called BuildingAttendance which logs when people visited the premises and for what reason which has the following relevant fields:
ID Type Premises Attended
To differentiate between staff and members. I use the type field, using MEM for member and STA for staff, etc. So as an example:
ID  Type  Premises    Attended
1   MEM   Building A  27/6/15
1   STA   Building A  27/6/15
2   STU   Building B  27/6/15
I'm thinking it might be a better design to use an ID similar to the following:
ID    Premises    Attended
MEM1  Building A  27/6/15
STA1  Building A  27/6/15
STU2  Building B  27/6/15
What would be the best way to deal with this? I know that if my primary key is a string my query performance may take a hit, but is this easier than having 2 columns?
tl;dr - How should I deal with a table that references records from other tables with the same ID system?
Auto-incremented numeric ids have several advantages over strings:
They are easier to implement. In order to generate the strings (as you want them), you would need to implement a trigger or computed column.
They occupy a fixed amount of storage (probably 4 bytes), so they are more efficient in the data record and in indexes.
They allow members to change between types, without affecting the key.
The problem that you are facing is that you have subtypes of a supertype. This information should be stored with the person, not in the attendance record (unless a person could change their type with each visit). There are several ways to approach this in SQL, none as clean as simple class inheritance in a programming language.
One technique is to put all the data in a single table called something like Persons. This would have a unique id, a type, and all the columns from your five tables. The problem is when the columns from your subtables are very different.
In that case, have a table called persons with a unique primary key and the common columns. Then have separate tables for each one and use the PersonId as the primary key for these tables.
The advantage to this approach is that you can have a foreign key reference to Persons for something like BuildingAttendance. And, you can also have foreign key references to each of the subtypes, for other tables where appropriate.
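A minimal sketch of the supertype/subtype layout described above, using Python's sqlite3 module. Only two of the five subtype tables are shown, and every column other than the IDs (name, job_title, joined_on) is an assumption made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when this is on

conn.executescript("""
-- Supertype: one row per person, whatever their type.
CREATE TABLE persons (
    person_id   INTEGER PRIMARY KEY,
    person_type TEXT NOT NULL
        CHECK (person_type IN ('MEM', 'STA', 'VOL', 'INT', 'STU')),
    name        TEXT NOT NULL
);
-- Subtypes: the primary key is also a foreign key to persons.
CREATE TABLE staff (
    person_id INTEGER PRIMARY KEY REFERENCES persons (person_id),
    job_title TEXT
);
CREATE TABLE members (
    person_id INTEGER PRIMARY KEY REFERENCES persons (person_id),
    joined_on TEXT
);
-- Attendance needs a single numeric reference and no type column.
CREATE TABLE building_attendance (
    person_id INTEGER NOT NULL REFERENCES persons (person_id),
    premises  TEXT NOT NULL,
    attended  TEXT NOT NULL
);

INSERT INTO persons VALUES (1, 'MEM', 'Alice'), (2, 'STA', 'Bob');
INSERT INTO members VALUES (1, '2014-01-10');
INSERT INTO staff   VALUES (2, 'Caretaker');
INSERT INTO building_attendance VALUES
    (1, 'Building A', '2015-06-27'),
    (2, 'Building A', '2015-06-27');
""")

# The person's type is looked up through the join instead of being
# encoded into the ID string.
visits = conn.execute("""
    SELECT p.person_type, p.name, a.premises
    FROM building_attendance a
    JOIN persons p ON p.person_id = a.person_id
    ORDER BY p.person_id
""").fetchall()
```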
Gordon Linoff already provided an answer that points out the type/supertype issue. I refer to this as class/subclass, but that's just a difference in terminology.
There are two tags in this area that collect questions that relate to class/subclass. Here they are:
class-table-inheritance
shared-primary-key
If you will look over the info tab for each of these tags, you'll see a brief outline. Plus the answers to the questions will help you with your case.
By creating a single table called Person, with an autonumber ID, you provide a handy way of referencing a person, regardless of that person's type. By making the staff, member, volunteer, student, and intern tables use a copy of this ID as their own ID you will facilitate whatever joins you need to perform.
The decision about whether to include type in attendance depends on whether you want to retrieve the data with the person's current type, or with the type the person had at the time of the attendance.

Is ID column required in SQL?

Traditionally I have always used an ID column in SQL (mostly mysql and postgresql).
However, I am wondering if it is really necessary if the rest of the columns in each row make it unique. In my latest project I have the "ID" column set as my primary key, but I never call it or use it in any way, as the data in the row makes it unique and is much more useful for me.
So, if every row in a SQL table is unique, does it need a primary key ID column, and are there any performance changes with or without one?
Thanks!
EDIT/Additional info:
The specific example that made me ask this question is a table I am using for a many-to-many-to-many-to-many relationship (if we still call it that at that point). It has 4 columns (plus ID), each of which represents an ID of an external table, and each row will always be numeric and unique. Only one of the columns is allowed to be null.
I understand that for normal tables an ID primary key column is a VERY good thing to have. But I get the feeling on this particular table it just wastes space and slows down adding new rows.
If you really do have some pre-existing column in your data set that already does uniquely identify your row - then no, there's no need for an extra ID column. The primary key however must be unique (in ALL circumstances) and cannot be empty (must be NOT NULL).
In my 20+ years of experience in database design, however, this is almost never truly the case. Most "natural" ID's that appear to be unique aren't - ultimately. US Social Security Numbers aren't guaranteed to be unique, and most other "natural" keys end up being almost unique - and that's just not good enough for a database system.
So if you really do have a proper, unique key in your data already - use it! But most of the time, it's easier and more convenient to have just a single surrogate ID that you can guarantee will be unique over all rows.
Don't confuse the logical model with the implementation.
The logical model shows a candidate key (all columns) which could make your primary key.
Great. However...
In practice, having a multi-column primary key has downsides: it's wide, not good when clustered, etc. There is plenty of information out there and in the "related" questions list on the right.
So, you'd typically
add a surrogate key (ID column)
add a unique constraint to keep the other columns unique
the ID column will be the clustered key (can be only one per table)
You can make either key the primary key now
The main exception is link or many-to-many tables that link 2 ID columns: a surrogate isn't needed (unless you have a braindead ORM)
Edit, a link: "What should I choose for my primary key?"
Edit2
For many-many tables: SQL: Do you need an auto-incremental primary key for Many-Many tables?
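For the link-table exception mentioned above, a sketch along these lines shows a many-to-many table with no surrogate ID at all. It uses Python's sqlite3 module, and the student/course/enrolment names are purely hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when this is on

conn.executescript("""
CREATE TABLE student (student_id INTEGER PRIMARY KEY, name  TEXT NOT NULL);
CREATE TABLE course  (course_id  INTEGER PRIMARY KEY, title TEXT NOT NULL);

-- Link table: the two foreign keys together ARE the primary key.
CREATE TABLE enrolment (
    student_id INTEGER NOT NULL REFERENCES student (student_id),
    course_id  INTEGER NOT NULL REFERENCES course (course_id),
    PRIMARY KEY (student_id, course_id)
);

INSERT INTO student VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO course  VALUES (10, 'SQL'), (20, 'Python');
INSERT INTO enrolment VALUES (1, 10), (1, 20), (2, 10);
""")

# Each side may repeat, but the same pair cannot be linked twice.
try:
    conn.execute("INSERT INTO enrolment VALUES (1, 10)")
    duplicate_pair_rejected = False
except sqlite3.IntegrityError:
    duplicate_pair_rejected = True

pair_count = conn.execute("SELECT COUNT(*) FROM enrolment").fetchone()[0]
```

Note that this sketch keeps both key columns NOT NULL; a link table with a nullable column, as in the question's edit, would need separate consideration.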
Yes, you could have many attributes (values) in a record (row) that you could use to make a record unique. This would be called a composite primary key.
However it will be much slower in general because the construction of the primary index will be much more expensive. The primary index is used by relational database management systems (RDBMS) not only to determine uniqueness, but also in how they order and structure records on disk.
A simple primary key of one incrementing value is generally the most performant and the easiest solution for the RDBMS to manage.
You should have one column in every table that is unique.
EDITED...
This is one of the fundamentals of database table design. It's the row identifier - the identifier identifies which row(s) are being acted upon (updated/deleted etc). Relying on column combinations that are "unique", eg (first_name, last_name, city), as your key can quickly lead to problems when two John Smiths exist, or worse when John Smith moves city and you get a collision.
In most cases, it's best to use an artificial key that's guaranteed to be unique - like an auto increment integer. That's why they are so popular - they're needed. Commonly, the key column is simply called id, or sometimes <tablename>_id. (I prefer id)
If natural data is available that is unique and present for every row (perhaps retinal scan data for people), you can use that, but all too often, such data isn't available for every row.
Ideally, you should have only one unique column. That is, there should only be one key.
Using IDs to key tables means you can change the content as needed without having to repoint things
Ex. if every row points to a unique user, what would happen if he/she changed his/her name to, let's say, John Blblblbe, which was already in the db? And then again, what would happen if your software wants to pick up John Blblblbe's details: whose details would be picked up, the old John's or those of the one who has changed his name? Well, if the answer to both questions is 'nothing special gonna happen' then, yep, you don't really need an "ID" column :]
Important:
Also, looking up an exact row by a numeric ID column is much faster, even when the table hasn't got any indexing keys or has more than one unique column.
If you are sure that some other column is going to have unique data for every row and isn't going to be NULL at any time, then there is no need for a separate ID column to distinguish each row from the others; you can make that existing column the primary key for your table.
No, single-attribute keys are not essential and nor are surrogate keys. Keys should have as many attributes as are necessary for data integrity: to ensure that uniqueness is maintained, to represent accurately the universe of discourse and to allow users to identify the data of interest to them. If you have already identified a suitable key and if you don't find any real need to create another one then it would make no sense to add redundant attributes and indexes to your table.
An ID can be more meaningful; for example, an employee ID can encode which department the employee is from, the year they joined, and so on. Apart from that, RDBMSs support lots of operations on IDs.

Multiple foreign keys to a single column

I'm defining a database for a customer/ order system where there are two highly distinct types of customers. Because they are so different having a single customer table would be very ugly (it'd be full of null columns as they are pointless for one type).
Their orders though are in the same format. Is it possible to have a CustomerId column in my Order table which has a foreign key to both the Customer Types? I have set it up in SQL server and it's given me no problems creating the relationships, but I'm yet to try inserting any data.
Also, I'm planning on using nHibernate as the ORM, could there be any problems introduced by doing the relationships like this?
No, you can't have a single field as a foreign key to two different tables. How would you tell where to look for the key?
You would at least need a field that tells what kind of user it is, or two separate foreign keys.
You could also put the information that is common for all users in one table and have separate tables for the information that is specific for the user types, so that you have a single table with user id as primary key.
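The "two separate foreign keys" option can be sketched as follows with Python's sqlite3 module. The column names customer_a_id and customer_b_id are assumptions, and the CHECK constraint is an addition of this sketch (not something the answer states) to make sure exactly one of the two references is filled in.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when this is on

conn.executescript("""
CREATE TABLE customer_a (id INTEGER PRIMARY KEY);
CREATE TABLE customer_b (id INTEGER PRIMARY KEY);

CREATE TABLE orders (
    order_id      INTEGER PRIMARY KEY,
    customer_a_id INTEGER REFERENCES customer_a (id),
    customer_b_id INTEGER REFERENCES customer_b (id),
    -- exactly one of the two references must be set
    CHECK ((customer_a_id IS NULL) <> (customer_b_id IS NULL))
);

INSERT INTO customer_a VALUES (1);
INSERT INTO customer_b VALUES (1);
""")

def accepts(a_id, b_id):
    """Return True if an order with these references is accepted."""
    try:
        conn.execute(
            "INSERT INTO orders (customer_a_id, customer_b_id) VALUES (?, ?)",
            (a_id, b_id))
        return True
    except sqlite3.IntegrityError:
        return False

order_for_a  = accepts(1, None)     # type A customer
order_for_b  = accepts(None, 1)     # type B customer, same numeric id is fine
both_set     = accepts(1, 1)        # ambiguous, rejected by the CHECK
neither_set  = accepts(None, None)  # no customer at all, rejected by the CHECK
unknown_cust = accepts(99, None)    # no such customer_a row, rejected by the FK
```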
A foreign key can only reference a single primary key, so no. However, you could use a bridge table:
CustomerA <---- CustomerA_Orders ----> Order
CustomerB <---- CustomerB_Orders ----> Order
So Order doesn't even have a foreign key; whether this is desirable, though...
I inherited a SQL Server database where this was done (a single column used in four foreign key relationships with four unrelated tables), so yes, it's possible. My predecessor is gone, though, so I can't ask why he thought it was a good idea.
He used a GUID column ("uniqueidentifier" type) to avoid the ambiguity problem, and he turned off constraint checking on the foreign keys, since it's guaranteed that only one will match. But I can think of lots of reasons that you shouldn't, and I haven't thought of any reasons you should.
Yours does sound like the classical "specialization" problem, typically solved by creating a parent table with the shared customer data, then two child tables that contain the data unique to each class of customer. Your foreign key would then be against the parent customer table, and your determination of which type of customer would be based on which child table had a matching entry.
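The parent/child arrangement described above could look roughly like this. It is a sketch with Python's sqlite3 module; loyalty_tier, credit_limit and total are invented columns standing in for whatever is really specific to each customer type.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when this is on

conn.executescript("""
-- Parent table: data shared by both kinds of customer.
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
-- Child tables: data unique to each kind, sharing the parent's key.
CREATE TABLE customer_a (
    customer_id  INTEGER PRIMARY KEY REFERENCES customer (customer_id),
    loyalty_tier TEXT
);
CREATE TABLE customer_b (
    customer_id  INTEGER PRIMARY KEY REFERENCES customer (customer_id),
    credit_limit INTEGER
);
-- Orders reference the parent only, so there is a single foreign key.
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer (customer_id),
    total       INTEGER
);

INSERT INTO customer   VALUES (1, 'Retail shopper'), (2, 'Wholesale buyer');
INSERT INTO customer_a VALUES (1, 'gold');
INSERT INTO customer_b VALUES (2, 5000);
INSERT INTO orders     VALUES (100, 1, 25), (101, 2, 900);
""")

# The customer type is derived from which child table has a matching row.
typed_orders = conn.execute("""
    SELECT o.order_id,
           CASE WHEN a.customer_id IS NOT NULL THEN 'A'
                WHEN b.customer_id IS NOT NULL THEN 'B'
           END AS customer_type
    FROM orders o
    LEFT JOIN customer_a a ON a.customer_id = o.customer_id
    LEFT JOIN customer_b b ON b.customer_id = o.customer_id
    ORDER BY o.order_id
""").fetchall()
```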
You can create a foreign key referencing multiple tables. This feature is to allow vertical partitioning of your table and still maintain referential integrity. In your case however, this is not applicable.
Your best bet would be to have a CustomerType table with possible columns CustomerTypeID and CustomerID, where CustomerID is the PK, and then reference your Order table to CustomerID.
Raj
I know this is a very old question; however if other people are finding this question through the googles, and you don't mind adding some columns to your table, a technique I've used (using the original question as a hypothetical problem to solve) is:
Add a [CustomerType] column. The purpose of storing a value here is to indicate which table holds the PK for your (assumed) [CustomerId] FK column. Optional - addition of a check constraint (to ensure CustomerType is in CustomerA or CustomerB) will help you sleep better at night.
Add a computed column for each [CustomerType], eg:
[CustomerTypeAId] as case when [CustomerType] = 'CustomerA' then [CustomerId] end persisted
[CustomerTypeBId] as case when [CustomerType] = 'CustomerB' then [CustomerId] end persisted
Add your foreign keys to the calculated (and persisted) columns.
Caveat: I'm primarily in a MSSQL environment; so I don't know how well this translates to other DBMS (ie: Postgres, ORACLE, etc).
As noted, if the key is, say, 12345, how would you know which table to look it up in? You could, I suppose, do something to ensure that the key values for the two tables never overlapped, but this is too ugly and painful to contemplate. You could have a second field that says which customer type it is. But if you're going to have two fields, why not have one field for customer type 1 id and another for customer type 2 id?
Without knowing more about your app, my first thought is that you really should have a general customer table with the data that is common to both, and then have two additional tables with the data specific to each customer type. I would think that there must be a lot of data common to the two -- basic stuff like name and address and customer number at the least -- and repeating columns across tables sucks big time. The additional tables could then refer back to the base table. As there is then a single key for the base table, the issue of foreign keys having to know which table to refer to evaporates.
Two distinct types of customer is a classic case of types and subtypes or, if you prefer, classes and subclasses. Here is an answer from another question.
Essentially, the class-table-inheritance technique is like Arnand's answer. The use of the shared-primary-key technique is what allows you to get around the problems created by two types of foreign key in one column. The foreign key will be customer-id. That will identify one row in the customer table, and also one row in the appropriate kind of customer type table, as the case may be.
Create a "customer" table that includes all the columns that hold the same data for both types of customer.
Then create the tables "customer_a" and "customer_b".
Use "customer_id" from the "customer" table as a foreign key in "customer_a" and "customer_b".
                 customer
                    |
    ---------------------------------
    |                               |
customer_a                     customer_b