How to design entity tables for entities with multiple names - sql

I want to create a table structure to store customers and I am facing a challenge: for each customer I can have multiple names, one being the primary one and the others being the alternative names.
The initial take on the tables looks like this:
CREATE TABLE dbo.Customer (
    CustomerId INT IDENTITY(1,1) NOT NULL -- PK
    -- other fields below
)
CREATE TABLE dbo.CustomerName (
    CustomerNameId INT IDENTITY(1,1) NOT NULL -- PK
    ,CustomerId INT -- FK to Customer
    ,CustomerName VARCHAR(30)
    ,IsPrimaryName BIT
)
Though, the name of the customer is part of the Customer entity and I feel that it belongs to the Customer table.
Is there a better design for this situation?
Thank you

Personally, I would keep the Primary name in the Customer table and create an "AlternateNames" table with a zero-to-many relationship to Customer.
This is because presumably most of the time when you are returning customer data, you are only going to be interested in returning the Primary Name. And probably the main (if not only) reason you want the alternate names is for looking up customers when an alternate name has been supplied.
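A minimal sketch of that layout, assuming SQL Server syntax to match the question. The PrimaryName column, the dbo.CustomerAlternateName table, the constraint names, and the sample value 'Acme' are illustrative and not taken from the question.

```sql
-- The primary name lives on the customer row itself.
CREATE TABLE dbo.Customer (
    CustomerId  INT IDENTITY(1,1) NOT NULL,
    PrimaryName VARCHAR(30) NOT NULL,
    CONSTRAINT PK_Customer PRIMARY KEY (CustomerId)
);

-- Zero or more alternate names per customer.
CREATE TABLE dbo.CustomerAlternateName (
    CustomerAlternateNameId INT IDENTITY(1,1) NOT NULL,
    CustomerId              INT NOT NULL,
    AlternateName           VARCHAR(30) NOT NULL,
    CONSTRAINT PK_CustomerAlternateName PRIMARY KEY (CustomerAlternateNameId),
    CONSTRAINT FK_CustomerAlternateName_Customer
        FOREIGN KEY (CustomerId) REFERENCES dbo.Customer (CustomerId)
);

-- Look up a customer by any of its names, primary or alternate.
SELECT c.CustomerId, c.PrimaryName
FROM dbo.Customer AS c
WHERE c.PrimaryName = 'Acme'
   OR EXISTS (SELECT 1
              FROM dbo.CustomerAlternateName AS a
              WHERE a.CustomerId = c.CustomerId
                AND a.AlternateName = 'Acme');
```

Queries that only need the primary name never touch the second table; only name lookups pay for the extra check.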

Unfortunately, this is too long for a comment.
Before figuring this out, more information is needed.
Is additional information needed for names? For instance, language or title or date created?
Are the names unique? Is the uniqueness within a customer or over all names?
Are the primary names unique?
Does every customer have to have a primary name?
How often does the primary name change to an alternate name? (As opposed to just having the name updated.)
When querying the data, will you know if the name is a primary or alternate name? (Or do they all need to be compared?)
Depending on the answers to these questions, the appropriate data structure can have some tricky nuances. For instance, if you have a flag to identify the primary name, it can be tricky to ensure that exactly one row per customer has this value set -- particularly when updating rows.
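One way to enforce part of the flag rule, sketched here under the assumption of SQL Server 2008 or later and the dbo.CustomerName table from the question, is a filtered unique index. The index name and the literal ids 7 and 42 are made up for illustration.

```sql
-- At most one row per customer may have IsPrimaryName = 1.
CREATE UNIQUE INDEX UX_CustomerName_OnePrimary
    ON dbo.CustomerName (CustomerId)
    WHERE IsPrimaryName = 1;

-- Switch the primary name of customer 7 to name row 42 in a single statement,
-- so the old and new flags change together.
UPDATE dbo.CustomerName
SET IsPrimaryName = CASE WHEN CustomerNameId = 42 THEN 1 ELSE 0 END
WHERE CustomerId = 7;
```

This only guarantees "at most one" primary name. It does not stop a customer from having none, so "exactly one" would still need a check elsewhere.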
Note: If you update the question with the answers, I'll delete this.

Related

One Primary Key Value in many tables

This may seem like a simple question, but I am stumped:
I have created a database about cars (in Oracle SQL developer). I have amongst other tables a table called: Manufacturer and a table called Parentcompany.
Since some manufacturers are owned by bigger corporations, I will also show them in my database.
The parentcompany table is the "parent table" and the Manufacturer table the "child table".
For both I have created columns, with each table having its own primary key.
For some reason, when I inserted the values for my columns, I was able to use the same value for the primary key of Manufacturer and Parentcompany
The column: ManufacturerID is primary Key of Manufacturer. The value for this is: 'MBE'
The column: ParentcompanyID is primary key of Parentcompany. The value for this is 'MBE'
Both have the same value. Do I have a problem with the thinking logic?
Or do I just not understand how primary keys work?
Does a primary key only need to be unique in a table, and not the database?
I would appreciate it if someone shed light on the situation.
A primary key is unique for each table.
Have a look at this tutorial: SQL - Primary key
A primary key is a field in a table which uniquely identifies each
row/record in a database table. Primary keys must contain unique
values. A primary key column cannot have NULL values.
A table can have only one primary key, which may consist of single or
multiple fields. When multiple fields are used as a primary key, they
are called a composite key.
If a table has a primary key defined on any field(s), then you cannot
have two records having the same value of that field(s).
A primary key is unique per table. You can use the same PK value in every separate table in the DB. In fact, that often happens, because the PK is often an incremental number representing the ID of a row: 1, 2, 3, 4...
For your case, a more common implementation would be a hierarchical table called Company with the fields company_name and parent_company_name. If a company has a parent, its parent_company_name field holds the company_name value of the parent's row.
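A rough sketch of such a hierarchical table, using the column names from the answer above. The column sizes, constraint names, and the two sample companies are assumptions for illustration.

```sql
-- Each company optionally points at its parent in the same table.
CREATE TABLE Company (
    company_name        VARCHAR(100) NOT NULL,
    parent_company_name VARCHAR(100),  -- NULL when the company has no parent
    CONSTRAINT pk_company PRIMARY KEY (company_name),
    CONSTRAINT fk_company_parent
        FOREIGN KEY (parent_company_name) REFERENCES Company (company_name)
);

-- The parent row must exist before a child can reference it.
INSERT INTO Company (company_name, parent_company_name)
VALUES ('Example Holdings', NULL);

INSERT INTO Company (company_name, parent_company_name)
VALUES ('Example Motors', 'Example Holdings');
```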
There are several reasons why the same value in two different PKs might work out with no problems. In your case, it seems to flow naturally from the semantics of the data.
A row in the Manufacturers table and a row in the ParentCompany table both appear to refer to the same thing, namely a company. In that case, giving a company the same id in both tables is not only possible, but actually useful. It represents a 1 to 1 correspondence between manufacturers and parent companies without adding extra columns to serve as FKs.
Thanks for the quick answers!
I think I know what to do now. I will create a general company table, in which all companies will be stored. Then I will create, as I go along specific company tables like Manufacturer and parent company that reference a certain company in the company table.
To clarify, the only column I would put into the sub-company tables is a column with a foreign key referencing a column of the company table, yes?
For the primary key, I was just confused, because I hear so much about the key needing to be unique, and can't have the same value as another. So then this condition only goes for tables, not the whole database. Thanks for the clarification!
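The plan described above could look roughly like this. This is a separate sketch from any other Company table suggested in this thread, and the column sizes, constraint names, and the optional parent_company_id column are assumptions rather than something the question specifies.

```sql
-- General table: one row per company of any kind.
CREATE TABLE Company (
    company_id   VARCHAR(10)  NOT NULL,
    company_name VARCHAR(100) NOT NULL,
    CONSTRAINT pk_company PRIMARY KEY (company_id)
);

-- Sub-table: the key is also a foreign key to Company.
CREATE TABLE ParentCompany (
    company_id VARCHAR(10) NOT NULL,
    CONSTRAINT pk_parentcompany PRIMARY KEY (company_id),
    CONSTRAINT fk_parentcompany_company
        FOREIGN KEY (company_id) REFERENCES Company (company_id)
);

-- Sub-table: same pattern, plus an optional link to the owning parent company.
CREATE TABLE Manufacturer (
    company_id        VARCHAR(10) NOT NULL,
    parent_company_id VARCHAR(10),  -- hypothetical column, NULL if independent
    CONSTRAINT pk_manufacturer PRIMARY KEY (company_id),
    CONSTRAINT fk_manufacturer_company
        FOREIGN KEY (company_id) REFERENCES Company (company_id),
    CONSTRAINT fk_manufacturer_parent
        FOREIGN KEY (parent_company_id) REFERENCES ParentCompany (company_id)
);
```

With this shape, a value such as 'MBE' can appear as the key in Company, Manufacturer, and ParentCompany at once, and it means the same company in all three.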

SQL - NULL foreign key

Please have a look at the database design below:
create table Person (id int identity, InvoiceID int not null)
create table Invoice (id int identity, date datetime)
Currently all persons have an invoiceID i.e. the InvoiceID is not null.
I want to extend the database so that some person does not have an Invoice. The original developer hated nulls and never uses them. I want to be consistent so I am wondering if there are other patterns I can use to extend the database to meet this requirement. How can this be approached without using nulls?
Please note that the two tables above are for illustration purposes. They are not the actual tables.
NULL is a very important feature in databases and programming in general. It is significantly different from being zero or any other value. It is most commonly used to signify the absence of a value (it can also mean an unknown value, but that interpretation is less common). If some people do not have an invoice, then you should simply allow NULL, as that matches your desired schema.
A common pattern would be to store that association in a separate table.
Person: Id
Invoice: Id
Assoc: person_id, invoice_id
Then if a person doesn't have an invoice, you simply don't have a row. This approach also allows a person to have more than one invoice id which might make sense.
The only way to represent the optional relationship while avoiding nulls is to use another table, as some other answers have suggested. Then the absence of a row for a given Person indicates the person has no Invoice. You can enforce a 1:1 relationship between this table and the Person table by making person_id be the primary or unique key:
CREATE TABLE PersonInvoice (
person_id INT NOT NULL PRIMARY KEY,
invoice_id INT NOT NULL,
FOREIGN KEY (person_id) REFERENCES Person(id),
FOREIGN KEY (invoice_id) REFERENCES Invoice(id)
);
If you want to permit each person to have multiple invoices, you can declare the primary key as the pair of columns instead.
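As a sketch of that variant, assuming Person(id) and Invoice(id) are declared as primary keys (the question's illustration tables do not show this), the table and a query for people without invoices might look like this:

```sql
-- Many invoices per person: the pair of columns is the key.
CREATE TABLE PersonInvoice (
    person_id  INT NOT NULL,
    invoice_id INT NOT NULL,
    PRIMARY KEY (person_id, invoice_id),
    FOREIGN KEY (person_id)  REFERENCES Person(id),
    FOREIGN KEY (invoice_id) REFERENCES Invoice(id)
);

-- People with no invoice are simply those with no matching row.
SELECT p.id
FROM Person AS p
LEFT JOIN PersonInvoice AS pi
       ON pi.person_id = p.id
WHERE pi.person_id IS NULL;
```

The LEFT JOIN produces NULLs in the query result, but no NULL is ever stored in a table.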
But this solution is to meet your requirement to avoid NULL. This is an artificial requirement. NULL has a legitimate place in a data model.
Some relational database theorists like Chris Date eschew NULL, explaining that the existence of NULL leads to some troubling logical anomalies in relational logic. For this camp, the absence of a row as shown above is a better way to represent missing data.
But other theorists, including E. F. Codd who wrote the seminal paper on relational theory, acknowledged the importance of a placeholder that means either "not known" or "not applicable." Codd even proposed in a 1990 book that SQL needed two placeholders, one for "missing but applicable" (i.e. unknown), and the other for "missing but inapplicable."
To me, the anomalies we see when using NULL in certain ways are like the undefined results we see in arithmetic when we divide by zero. The solution is: don't do that.
But certainly we shouldn't use any non-NULL value like 0 or '' (empty string) to represent missing data. And likewise we shouldn't use a NULL as if it were an ordinary scalar value.
I wrote more about NULL in a chapter titled "Fear of the Unknown" in my book, SQL Antipatterns Volume 1: Avoiding the Pitfalls of Database Programming.
You need to move the invoice/person relation to another table.
You end up with
create table Person (id int identity)
create table PersonInvoice (person_id int not null, InvoiceID int not null)
create table Invoice (id int identity, date datetime)
You need this for some databases to allow InvoiceID to be a foreign key, as some do not allow NULLs in a foreign key.
If a person can have only one invoice, then PersonInvoice can have a unique constraint on person_id as well as on the two columns together. You can also enforce a single person per invoice by adding a unique constraint to the InvoiceID field.
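The two optional constraints could be added along these lines. This assumes the link table's columns are named person_id and InvoiceID, as in the paragraph above, and the constraint names are invented.

```sql
-- One invoice per person at most.
ALTER TABLE PersonInvoice
    ADD CONSTRAINT UQ_PersonInvoice_Person UNIQUE (person_id);

-- One person per invoice at most.
ALTER TABLE PersonInvoice
    ADD CONSTRAINT UQ_PersonInvoice_Invoice UNIQUE (InvoiceID);
```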

SQL One-to-One Relationship Definition

I'm designing a database and I'm not sure how to define one of the relationships. Here's the situation:
An invoice is created
If the product is not in stock then it needs to be manufactured and so a work order is created.
The relationship is one-to-one. However work orders are sometimes created for other purposes so the WorkOrder table will also be linked to other tables in a similar one-to-one relationship. Also, some Invoices won't have a work order at all. This means I can't define these relationships in the normal way by using the same primary key in both tables. Instead of doing this I've created a linking table and then set unique indexes on both fields to define the one-to-one relationship (see image).
(source: markevans.org)
Is this the best way?
Cheers
Mark
EDIT: I just realised that this design will allow a single work order to be linked to an invoice and also to one of the other tables I mentioned via 2 linking tables. I guess no solution is perfect.
Okay, this answer is SQL Server specific, but should be adaptable to other RDBMSs, with a little work. So far as I see, we have the following constraints:
An invoice may be associated with 0 or 1 Work Orders
A Work Order must be associated with an invoice or an ABC or a DEF
I'd design the WorkOrder table as follows:
CREATE TABLE WorkOrder (
WorkOrderID int IDENTITY(1,1) not null,
/* Other Columns */
InvoiceID int null,
ABCID int null,
DEFID int null,
/* Etc for other possible links */
constraint PK_WorkOrder PRIMARY KEY (WorkOrderID),
constraint FK_WorkOrder_Invoices FOREIGN KEY (InvoiceID) references Invoice (InvoiceID),
constraint FK_WorkOrder_ABC FOREIGN KEY (ABCID) references ABC (ABCID),
/* Etc for other FKs */
constraint CK_WorkOrders_SingleFK CHECK (
CASE WHEN InvoiceID is null THEN 0 ELSE 1 END +
CASE WHEN ABCID is null THEN 0 ELSE 1 END +
CASE WHEN DEFID is null THEN 0 ELSE 1 END
/* + other FK columns */
= 1
)
)
So, basically, this table is constrained to have an FK to only one other table, no matter how many FK columns are defined. If necessary, a computed column could tell you the "Type" of item that this is linked to, based on which FK column is non-null. Alternatively, the type and a single int column could be real columns, and InvoiceID, ABCID, etc. could be computed columns.
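The computed "Type" column mentioned above might be sketched like this for SQL Server. The column name LinkType and the string labels are illustrative choices.

```sql
-- Derive the link type from whichever FK column is populated.
-- The CHECK constraint on the table ensures exactly one of them is non-null.
ALTER TABLE WorkOrder
    ADD LinkType AS (
        CASE
            WHEN InvoiceID IS NOT NULL THEN 'INVOICE'
            WHEN ABCID     IS NOT NULL THEN 'ABC'
            WHEN DEFID     IS NOT NULL THEN 'DEF'
        END
    );
```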
The final thing to ensure is that an invoice only has 0 or 1 Work Orders. If your RDBMS ignores nulls in unique constraints, this is as simple as applying such a constraint to each FK column. For SQL Server, you need to use a filtered index (>=2008) or an indexed view (<=2005). I'll just show the filtered index:
CREATE UNIQUE INDEX IX_WorkOrder_UniqueInvoices on
WorkOrder (InvoiceID) where (InvoiceID is not null)
Another way to keep WorkOrders straight is to include a type column in WorkOrder (e.g. 'INVOICE', 'ABC', 'DEF'), add to each link table a computed column, or a column constrained by a check constraint, that contains the matching value, and introduce a second foreign key:
CREATE TABLE WorkOrder (
WorkOrderID int IDENTITY(1,1) not null,
Type varchar(10) not null,
constraint PK_WorkOrder PRIMARY KEY (WorkOrderID),
constraint UQ_WorkOrder_TypeCheck UNIQUE (WorkOrderID,Type),
constraint CK_WorkOrder_Types CHECK (Type in ('INVOICE','ABC','DEF'))
)
CREATE TABLE Invoice_WorkOrder (
InvoiceID int not null,
WorkOrderID int not null,
Type varchar(10) not null default 'INVOICE',
constraint PK_Invoice_WorkOrder PRIMARY KEY (InvoiceID),
constraint UQ_Invoice_WorkOrder_OrderIDs UNIQUE (WorkOrderID),
constraint FK_Invoice_WorkOrder_Invoice FOREIGN KEY (InvoiceID) references Invoice (InvoiceID),
constraint FK_Invoice_WorkOrder_WorkOrder FOREIGN KEY (WorkOrderID) references WorkOrder (WorkOrderID),
constraint FK_Invoice_WorkOrder_TypeCheck FOREIGN KEY (WorkOrderID,Type) references WorkOrder (WorkOrderID,Type),
constraint CK_Invoice_WorkOrder_Type CHECK (Type = 'INVOICE')
)
The only disadvantage of this model, although it is closer to your original proposal, is that you can have a work order that isn't actually linked to any other item (although it claims to be for, e.g., an INVOICE).
What you have looks to be a perfectly normal way to construct your tables.
If you think you might like to use only one link table between your WorkOrder table and whatever other tables that may have WorkOrders, you could use a link table like:
WorkOrders
OtherId (Could be InvoiceId, or an ID for SomethingElse that may have a WorkOrder)
OtherType (ENUM - something like 'Invoice', 'SomethingElse')
WorkOrderId
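A sketch of that single link table for SQL Server follows. SQL Server has no ENUM type, so a CHECK constraint stands in for it. The table name WorkOrderLink and all constraint names are hypothetical, and a WorkOrder table with a WorkOrderID primary key is assumed.

```sql
CREATE TABLE WorkOrderLink (
    WorkOrderId INT         NOT NULL,
    OtherType   VARCHAR(20) NOT NULL,
    OtherId     INT         NOT NULL,
    -- Each work order is linked to at most one item.
    CONSTRAINT PK_WorkOrderLink PRIMARY KEY (WorkOrderId),
    -- Each item is linked to at most one work order.
    CONSTRAINT UQ_WorkOrderLink_Other UNIQUE (OtherType, OtherId),
    CONSTRAINT CK_WorkOrderLink_OtherType
        CHECK (OtherType IN ('Invoice', 'SomethingElse')),
    CONSTRAINT FK_WorkOrderLink_WorkOrder
        FOREIGN KEY (WorkOrderId) REFERENCES WorkOrder (WorkOrderID)
);
```

Note that OtherId cannot carry a foreign key here, because it may point at different tables depending on OtherType.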
So the issue is that you can have invoices that don't have work orders and work orders that don't have invoices but the two need to be linked when there is a link. I would say based upon that description that your database diagram is pretty good. This would open you up to allowing more than a one-to-one relationship. This way down the road you can consider having two work orders for one invoice. You might also have one work order that handles two invoices. This opens you up to a lot of possibilities that you may not need now but that you might in the future.
I would recommend your current design. In the future, you may want to add more information about the link between invoice and work order. This middle table will allow you to add this information.
In the interest of fairness to the other side of the coin, you do need to consider speed/number of tables/etc. that this will cause. For example, you have now created a third table which increased your table count by 50% in this example. Look at the rest of your database. If you did this everywhere, you would probably have the most normalized database but it might not be the most performant because of all the joins that are necessary. Basically, this isn't a "one-size-fits-all" solution. Instead it is a design choice. Personally, I hate nullable foreign key fields. I find they don't give me the granularity I usually want with my database designs.
Your schema corresponds to a many-to-many link between the 2 tables. You are de facto opening the possibility to have one work order for multiple invoices, and multiple work orders for one invoice. The model therefore allows far more than the rules you are setting.
You could use a simpler schema, that will reflect the (0,1) relation between work orders and invoices, and the (0,1) relation between Invoices and Work orders:
A work order can be independent of an invoice, or linked to one specific invoice: it has a (0,1) relation to the Invoice table
An invoice can have no work order, or one work order: it has a (0,1) relation to the WorkOrder table
Such a relation can be translated by the following model and rules
Invoice
id_Invoice, Primary Key
WorkOrder
id_WorkOrder, Primary Key
id_Invoice, Foreign Key, Nulls accepted, unique value
With such a structure, it will be easy to add new 'dependants' to work orders table. If, for example, you want to open the possibility to launch work orders from restocking orders (where you want to have minimal quantities of some items in stock), you can then just add the corresponding field to the WorkOrder table:
id_RestockingOrder, ForeignKey, Nulls accepted, unique value
You'll be then able to 'see' from where your WorkOrder comes: an invoice, a restocking order, etc.
Seems it corresponds to your needs.
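Written out for SQL Server 2008 or later, the model above could look roughly like this. A plain UNIQUE constraint in SQL Server would allow only a single NULL, so this sketch uses a filtered unique index instead. The index name is an assumption.

```sql
CREATE TABLE Invoice (
    id_Invoice INT NOT NULL PRIMARY KEY
);

CREATE TABLE WorkOrder (
    id_WorkOrder INT NOT NULL PRIMARY KEY,
    -- NULL means the work order does not come from an invoice.
    id_Invoice   INT NULL REFERENCES Invoice (id_Invoice)
);

-- "Unique if not null": an invoice can appear on at most one work order,
-- while any number of work orders may have no invoice at all.
CREATE UNIQUE INDEX UX_WorkOrder_id_Invoice
    ON WorkOrder (id_Invoice)
    WHERE id_Invoice IS NOT NULL;
```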
Edit:
As noted by @mark, SQL Server will not allow multiple NULL values in a unique constraint, in contradiction with the ANSI specs. As we do not want to wait for SQL Server 2011 to have this rule implemented, there is a workaround: you can build a view that excludes the NULL values and set a unique index on this view. I must admit that I did not like this solution ...
There is still the possibility to implement the 'unique if not null' rule in your code. It will still be simpler than implementing the many-to-many model (with the Invoice_WorkOrder table) you are proposing and managing all the additional uniqueness rules that you'll need to implement.
There is no real need for the link table, just have them linked directly and allow for NULL in the reference field of the work order. Because a work order can be linked to multiple tables what I would do is add a reference id on every work order to every table that can link from it. So you would have:
Invoice
PK - ID
FK - WorkOrderID
SomeOtherTable
PK - ID
FK - WorkOrderID
WorkOrder
PK - ID
FK - InvoiceID (allow NULL)
FK - SomeOtherTableID (allow NULL)
To make sure a WorkOrder is linked to only one item, you have to use code to validate the row (or perhaps a stored procedure which I cannot come up with right now).
EDIT: PS, if you want to use a link table, give it a generic name and add all the linked tables with the same sort of construct I just described allowing for NULL's. In my eyes adding the extra table makes the schema larger than it needs to be, but if a work order contains a lot of big text fields it could increase performance slightly and reduce database size with all the indexes flying around. In anything but the largest applications, I would consider it over-normalization though, but that is a matter of style.

Generic Database table design

Just trying to figure out the best way to design my table for the following scenario:
I have several areas in my system (documents, projects, groups and clients) and each of these can have comments logged against them.
My question is should I have one table like this:
CommentID
DocumentID
ProjectID
GroupID
ClientID
etc
Where only one of the ids will have data and the rest will be NULL or should I have a separate CommentType table and have my comments table like this:
CommentID
CommentTypeID
ResourceID (this being the id of the project/doc/client)
etc
My thoughts are that option 2 would be more efficient from an indexing point of view. Is this correct?
Option 2 is not a good solution for a relational database. It's called polymorphic associations (as mentioned by @Daniel Vassallo) and it breaks the fundamental definition of a relation.
For example, suppose you have a ResourceId of 1234 on two different rows. Do these represent the same resource? It depends on whether the CommentTypeId is the same on these two rows. This violates the concept of a type in a relation. See SQL and Relational Theory by C. J. Date for more details.
Another clue that it's a broken design is that you can't declare a foreign key constraint for ResourceId, because it could point to any of several tables. If you try to enforce referential integrity using triggers or something, you find yourself rewriting the trigger every time you add a new type of commentable resource.
I would solve this with the solution that @mdma briefly mentions (but then ignores):
CREATE TABLE Commentable (
ResourceId INT NOT NULL IDENTITY,
ResourceType INT NOT NULL,
PRIMARY KEY (ResourceId, ResourceType)
);
CREATE TABLE Documents (
ResourceId INT NOT NULL,
ResourceType INT NOT NULL CHECK (ResourceType = 1),
FOREIGN KEY (ResourceId, ResourceType) REFERENCES Commentable
);
CREATE TABLE Projects (
ResourceId INT NOT NULL,
ResourceType INT NOT NULL CHECK (ResourceType = 2),
FOREIGN KEY (ResourceId, ResourceType) REFERENCES Commentable
);
Now each resource type has its own table, but the serial primary key is allocated uniquely by Commentable. A given primary key value can be used only by one resource type.
CREATE TABLE Comments (
CommentId INT IDENTITY PRIMARY KEY,
ResourceId INT NOT NULL,
ResourceType INT NOT NULL,
FOREIGN KEY (ResourceId, ResourceType) REFERENCES Commentable
);
Now Comments reference Commentable resources, with referential integrity enforced. A given comment can reference only one resource type. There's no possibility of anomalies or conflicting resource ids.
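To illustrate how rows flow through these tables, here is a hedged usage sketch in SQL Server syntax. The variable name is arbitrary and the type numbers follow the CHECK constraints above (1 for documents, 2 for projects).

```sql
-- Allocate a resource id for a new document (type 1).
DECLARE @ResourceId INT;

INSERT INTO Commentable (ResourceType) VALUES (1);
SET @ResourceId = SCOPE_IDENTITY();

-- Create the document row that owns this resource id.
INSERT INTO Documents (ResourceId, ResourceType) VALUES (@ResourceId, 1);

-- Attach a comment to that document.
INSERT INTO Comments (ResourceId, ResourceType) VALUES (@ResourceId, 1);

-- The following would be rejected by the CHECK constraint on Projects,
-- because a type 1 resource cannot be stored as a project:
-- INSERT INTO Projects (ResourceId, ResourceType) VALUES (@ResourceId, 1);
```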
I cover more about polymorphic associations in my presentation Practical Object-Oriented Models in SQL and my book SQL Antipatterns.
Read up on database normalization.
Nulls in the way you describe would be a big indication that the database isn't designed properly.
You need to split up your tables so that the data held in them is fully normalized. This will save you a lot of time further down the line, and it's a much better habit to get into.
From a foreign key perspective, the first example is better because you can have multiple foreign key constraints on a column but the data has to exist in all those references. It's also more flexible if the business rules change.
To continue from @OMG Ponies' answer, what you describe in the second example is called a Polymorphic Association, where the foreign key ResourceID may reference rows in more than one table. However in SQL databases, a foreign key constraint can only reference exactly one table. The database cannot enforce the foreign key according to the value in CommentTypeID.
You may be interested in checking out the following Stack Overflow post for one solution to tackle this problem:
MySQL - Conditional Foreign Key Constraints
The first approach is not great, since it is quite denormalized. Each time you add a new entity type, you need to update the table. You may be better off making this an attribute of document - I.e. store the comment inline in the document table.
For the ResourceID approach to work with referential integrity, you will need to have a Resource table, and a ResourceID foreign key in all of your Document, Project etc.. entities (or use a mapping table.) Making "ResourceID" a jack-of-all-trades, that can be a documentID, projectID etc.. is not a good solution since it cannot be used for sensible indexing or foreign key constraint.
To normalize, you need to split the comment table into one table per resource type.
Comment
-------
CommentID
CommentText
...etc
DocumentComment
---------------
DocumentID
CommentID
ProjectComment
--------------
ProjectID
CommentID
If only one comment is allowed, then you add a unique constraint on the foreign key for the entity (DocumentID, ProjectID etc.) This ensures that there can only be one row for the given item and so only one comment. You can also ensure that comments are not shared by using a unique constraint on CommentID.
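For the document case, the tables and the "comments are not shared" constraint could be sketched like this in SQL Server syntax. The minimal Document table, the column types, and the constraint names are assumptions added so the example stands on its own.

```sql
CREATE TABLE Document (
    DocumentID INT IDENTITY(1,1) NOT NULL PRIMARY KEY
);

CREATE TABLE Comment (
    CommentID   INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
    CommentText NVARCHAR(MAX) NOT NULL
);

CREATE TABLE DocumentComment (
    DocumentID INT NOT NULL,
    CommentID  INT NOT NULL,
    CONSTRAINT PK_DocumentComment PRIMARY KEY (DocumentID, CommentID),
    CONSTRAINT FK_DocumentComment_Document
        FOREIGN KEY (DocumentID) REFERENCES Document (DocumentID),
    CONSTRAINT FK_DocumentComment_Comment
        FOREIGN KEY (CommentID) REFERENCES Comment (CommentID),
    -- A comment belongs to at most one document.
    CONSTRAINT UQ_DocumentComment_Comment UNIQUE (CommentID)
);
```

To allow only one comment per document, a UNIQUE constraint on DocumentID would be added in the same way. ProjectComment would follow the same pattern.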
EDIT: Interestingly, this is almost parallel to the normalized implementation of ResourceID - replace "Comment" in the table name, with "Resource" and change "CommentID" to "ResourceID" and you have the structure needed to associate a ResourceID with each resource. You can then use a single table "ResourceComment".
If there are going to be other entities that are associated with any type of resource (e.g. audit details, access rights, etc..), then using the resource mapping tables is the way to go, since it will allow you to add normalized comments and any other resource related entities.
I wouldn't go with either of those solutions. Depending on some of the specifics of your requirements you could go with a super-type table:
CREATE TABLE Commentable_Items (
commentable_item_id INT NOT NULL,
CONSTRAINT PK_Commentable_Items PRIMARY KEY CLUSTERED (commentable_item_id))
GO
CREATE TABLE Projects (
commentable_item_id INT NOT NULL,
... (other project columns)
CONSTRAINT PK_Projects PRIMARY KEY CLUSTERED (commentable_item_id))
GO
CREATE TABLE Documents (
commentable_item_id INT NOT NULL,
... (other document columns)
CONSTRAINT PK_Documents PRIMARY KEY CLUSTERED (commentable_item_id))
GO
If each item can only have one comment and comments are not shared (i.e. a comment can only belong to one entity) then you could just put the comments in the Commentable_Items table. Otherwise you could link the comments off of that table with a foreign key.
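The tables above do not show how the sub-types tie back to the super-type, so here is one possible way to wire them up in SQL Server. The constraint names and the Item_Comments table are hypothetical.

```sql
-- Each project and document must exist as a commentable item.
ALTER TABLE Projects
    ADD CONSTRAINT FK_Projects_Commentable_Items
    FOREIGN KEY (commentable_item_id)
    REFERENCES Commentable_Items (commentable_item_id);

ALTER TABLE Documents
    ADD CONSTRAINT FK_Documents_Commentable_Items
    FOREIGN KEY (commentable_item_id)
    REFERENCES Commentable_Items (commentable_item_id);

-- Comments hang off the super-type, so one foreign key covers every item type.
CREATE TABLE Item_Comments (
    comment_id          INT IDENTITY(1,1) NOT NULL,
    commentable_item_id INT NOT NULL,
    comment_text        NVARCHAR(MAX) NOT NULL,
    CONSTRAINT PK_Item_Comments PRIMARY KEY CLUSTERED (comment_id),
    CONSTRAINT FK_Item_Comments_Commentable_Items
        FOREIGN KEY (commentable_item_id)
        REFERENCES Commentable_Items (commentable_item_id)
);
```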
I don't like this approach very much in your specific case though, because "having comments" isn't enough to put items together like that in my mind.
I would probably go with separate Comments tables (assuming that you can have multiple comments per item - otherwise just put them in your base tables). If a comment can be shared between multiple entity types (i.e., a document and a project can share the same comment) then have a central Comments table and multiple entity-comment relationship tables:
CREATE TABLE Comments (
comment_id INT NOT NULL,
comment_text NVARCHAR(MAX) NOT NULL,
CONSTRAINT PK_Comments PRIMARY KEY CLUSTERED (comment_id))
GO
CREATE TABLE Document_Comments (
document_id INT NOT NULL,
comment_id INT NOT NULL,
CONSTRAINT PK_Document_Comments PRIMARY KEY CLUSTERED (document_id, comment_id))
GO
CREATE TABLE Project_Comments (
project_id INT NOT NULL,
comment_id INT NOT NULL,
CONSTRAINT PK_Project_Comments PRIMARY KEY CLUSTERED (project_id, comment_id))
GO
If you want to constrain comments to a single document (for example) then you could add a unique index (or change the primary key) on the comment_id within that linking table.
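For example, the unique index on the document linking table might be written as follows (the index name is made up):

```sql
-- Each comment can be linked to at most one document.
CREATE UNIQUE INDEX UX_Document_Comments_comment_id
    ON Document_Comments (comment_id);
```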
It's all of these "little" decisions that will affect the specific PKs and FKs. I like this approach because each table is clear on what it is. In databases that's usually better than having "generic" tables/solutions.
Of the options you give, I would go for number 2.
Option 2 is a good way to go. The issue that I see with it is that you are putting the resource key on that table. Each of the IDs from the different resources could be duplicated. When you join resources to the comments you will more than likely come up with comments that do not belong to that particular resource. This would be considered a many-to-many join. I would think a better option would be to have your resource tables, the comments table, and then tables that cross-reference the resource type and the comments table.
If you carry the same sort of data about all comments regardless of what they are comments about, I'd vote against creating multiple comment tables. Maybe a comment is just "thing it's about" and text, but if you don't have other data now, it's likely you will: date the comment was entered, user id of person who made it, etc. With multiple tables, you have to repeat all these column definitions for each table.
As noted, using a single reference field means that you could not put a foreign key constraint on it. This is too bad, but it doesn't break anything; it just means you have to do the validation with a trigger or in code. More seriously, joins get difficult. You can't just say "from comment join document using (documentid)". You need a complex join based on the value of the type field.
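To make the difference concrete, here is a sketch of both joins. It assumes tables named Comment, Document, and Project, type codes 1 and 2, and Title and Name columns, none of which come from the question.

```sql
-- Single ResourceID column: every join must also test the type field.
SELECT c.CommentID,
       COALESCE(d.Title, p.Name) AS ResourceName
FROM Comment AS c
LEFT JOIN Document AS d
       ON c.CommentTypeID = 1 AND d.DocumentID = c.ResourceID
LEFT JOIN Project AS p
       ON c.CommentTypeID = 2 AND p.ProjectID = c.ResourceID;

-- Separate pointer columns: a plain join per resource type.
SELECT c.CommentID, d.Title
FROM Comment AS c
JOIN Document AS d
  ON d.DocumentID = c.DocumentID;
```

The two queries assume different Comment table layouts (option 2 and option 1 from the question, respectively), so they are alternatives rather than statements to run against one schema.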
So while the multiple pointer fields is ugly, I tend to think that's the right way to go. I know some db people say there should never be a null field in a table, that you should always break it off into another table to prevent that from happening, but I fail to see any real advantage to following this rule.
Personally I'd be open to hearing further discussion on pros and cons.
Pawnshop Application:
I have separate tables for Loan, Purchase, Inventory & Sales transactions.
Each table's rows are joined to their respective customer rows by:
customer.pk [serial] = loan.fk [integer];
= purchase.fk [integer];
= inventory.fk [integer];
= sale.fk [integer];
I have consolidated the four tables into one table called "transaction", where a column:
transaction.trx_type char(1) {L=Loan, P=Purchase, I=Inventory, S=Sale}
Scenario:
A customer initially pawns merchandise, makes a couple of interest payments, then decides he wants to sell the merchandise to the pawnshop, who then places merchandise in Inventory and eventually sells it to another customer.
I designed a generic transaction table where for example:
transaction.main_amount DECIMAL(7,2)
in a loan transaction holds the pawn amount,
in a purchase holds the purchase price,
in inventory and sale holds sale price.
This is clearly a denormalized design, but has made programming a lot easier and improved performance. Any type of transaction can now be performed from within one screen, without the need to change to different tables.

SQL Server 2008 - Table - Clarifications

I am new to SQL Server 2008 database development.
Here I have a master table named ‘Student’ and a child table named ‘Address’. The common column between these tables is ‘Student ID’.
My doubts are:
Do we need to put ‘Address Id’ in the ‘Address’ table and make it primary key? Is it mandatory? ( I won’t be using this ‘Address Id’ in any of my reports )
Is Primary key column a must in any table?
Would you please help me on these.
Would you please also refer best links/tutorials for SQL Server 2008 database design practices (If you are aware of) which includes naming conventions, best practices, SQL optimizations etc. etc.
1) Yes, having an ADDRESS_ID column as the primary key of the ADDRESS table is a good idea.
But having the STUDENT_ID as a foreign key in the ADDRESS table is not a good idea. This means that an address record can only be associated to one student. Students can have roommates, so they'd have identical addresses. Which comes back to why it's a good idea to have the ADDRESS_ID column as a primary key, as it will indicate a unique address record.
Rather than have the STUDENT_ID column in the ADDRESS table, I'd have a corollary/xref/lookup table between the STUDENT and ADDRESS tables:
STUDENT_ADDRESSES_XREF
STUDENT_ID, pk, fk to STUDENTS table
ADDRESS_ID, pk, fk to ADDRESS table
EFFECTIVE_DATE, date, not null
EXPIRY_DATE, date, not null
This uses a composite primary key, so that only one combination of the student & address exists. I added the dates in case there was a need to know when exactly, because someone could move back home/etc after all.
Most importantly, this works off the ADDRESS_ID column to allow for a single address to be associated to multiple people.
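As a sketch in SQL Server 2008 syntax, the xref table could be declared as below. The minimal STUDENTS and ADDRESS tables, the INT key types, and the constraint names are assumptions added to keep the example self-contained.

```sql
CREATE TABLE STUDENTS (
    STUDENT_ID INT NOT NULL PRIMARY KEY
);

CREATE TABLE ADDRESS (
    ADDRESS_ID INT NOT NULL PRIMARY KEY
);

-- One row per student/address pairing, with the period it applies to.
CREATE TABLE STUDENT_ADDRESSES_XREF (
    STUDENT_ID     INT  NOT NULL,
    ADDRESS_ID     INT  NOT NULL,
    EFFECTIVE_DATE DATE NOT NULL,
    EXPIRY_DATE    DATE NOT NULL,
    CONSTRAINT PK_STUDENT_ADDRESSES_XREF PRIMARY KEY (STUDENT_ID, ADDRESS_ID),
    CONSTRAINT FK_XREF_STUDENTS
        FOREIGN KEY (STUDENT_ID) REFERENCES STUDENTS (STUDENT_ID),
    CONSTRAINT FK_XREF_ADDRESS
        FOREIGN KEY (ADDRESS_ID) REFERENCES ADDRESS (ADDRESS_ID)
);
```

Two roommates then share one ADDRESS row and each gets their own row in the xref table.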
2) Yes, defining a primary key is frankly a must for any table.
In most databases, the act also creates an index - making searching more efficient. That's on top of the usual things like making sure a record is a unique entry...
Every table should have a way to uniquely and unambiguously identify a record. Make AddressID the primary key for the address table.
Without a primary key, the database will allow duplicate records; possibly creating join problems or trigger problems (if you implement them) down the road.