Database design when having either foreign key or string - sql

In my application I'm creating a case having a supervisor, but as the supervisor can be either an employee or an external supervisor, I'd like to be able to save either an employee id for the internal reference or a string for the name of the external supervisors name.
How should I implement this? Is having a table "case" and the sub-tables "case_internal_sv" and "case_external_sv" the way to go?

There's not a lot of information in your question on which to base an answer.
If there is common data and functionality across all types of supervisors then you will probably want one table to hold that common data. That table would establish the primary key values for supervisors and the case table would have a foreign key into this table. Information that is unique to either internal or external supervisors would go into separate tables and those tables would also have a foreign key back to the common supervisor level data.
This design is superior because you only have one place to go in order to find a list of all supervisors and because you can enforce the supervisor / case relationship directly in the database without a lot of code or additional constraints to ensure that "one and only one" of two columns is populated.
It's sufficiently superior, from a database point of view, that I'd consider using this design even if the data for internal and external supervisors is completely disjoint (which it's unlikely to be).

If your database allows you to define multiple primary keys you could have field employee and field employee_type combine to form the unique primary key. If not you could have an autogenerated primary key for the table and have a field for employee type and a field for employee_id.
What database are you using?

Having tables too normalized can do more harm than help. You should evaluate the benefits/drawbacks of having sub-tables based on your requirements.
A simple solution can be to have a 'sv_type' column in 'case' table. And have two columns
'internal_sv_id', a nullable foreign key to the employee table
'external_sv_name', a nullable string to save external name.
Then check for supervisor in one of these two columns based on the 'sv_type'
This design might not fully conform to 3rd normal form, but it can save lot of costly joins and allow integrity with employee table.
As for me, I'd go for the simplest solution in case of doubt.

Related

A different way to Model a portion of this ERD

I have a very simple table diagram from modeling my application. The problem is I am second guessing my relation between Vendor and VendorOrder. The VendorOrders table should store all vendororders in the system. To get all orders for a certain apartment, you would just use the PK and FK relationship to gather that data. Is there anything I should improve with the overall design?
Diagram:
There's three things I see that you could improve this by doing.
Create an intersection table between your Apartment and Resident tables called ApartmentResidents, where each table references the intersection table with a one to many relationship. In this ERD, it only allows for one resident to be registered to an apartment. If a resident lives in more than one apartment for the lifetime of this database, you'll need to register them as an entirely new resident.
Intersection table example
In your Vendor table, instead of using a name as your primary key I would create an id instead. Using things that have a real-world value as your primary key can get messy for a number of reasons:
If two vendors have the same name, like "Johnson's Repair", you'll need to misspell one of them for it to be a valid key.
If you typo a vendor's name, you're also going to contain a reference to that typo in the foreign key tables (Which also might make it not show in results if you do a select query for the correct spelling).
Placing an index on a string is less performant than if you put it on an auto incrementing integer key.
(Optional) I usually name my database tables pluralized, like "Apartments", or "Vendors". It makes the SQL syntax read more like a sentence inside the query. If I remember right that's also one of the things that SQL's creator was going for too with the syntax design.

SQL tables design layout

I'm trying to design my database with very basic tables and I am confused on the CORRECT way to do it.
I've attached a picture of the main info, and I'm not quite sure how to link them. Meaning what should be a foreign key, or should some of these tables include of LIST<> of the other tables.
UPDATE TO TABLES
As per your requirements, You are right about the associative table
Client can have multiple accounts And Accounts can have multiple clients
Then, Many (Client) to Many (Account)
So, Create an associate table to break the many to many relationship first. Then join it that way
Account can have only one Manager
Which means One(Manager) to Many(Accounts)
So, add an attribute called ManagerID in Accounts
Account can have many traedetail
Which means One(Accounts) to Many(TradeDetails)
So, add an attribute called AccountID in TradeDetails
Depends on whether you are looking to have a normalized database or some other type of design paradigm. I recommend doing some reading on the concepts of database normalization and referential integrity.
What I would do is make tables that have a 1 to 1 relationship such as account/manager into a single table (unless you can think of a really good reason not to). Add Clientid as a foreign key to Account. Add AccountID as a foreign key to TradeDetail. You are basically setting up everything as 1 to many relationships where the table that has 1 record for the id has the field as a primary key and the table that has many has it as a foreign key.

Should one-to-one relationships ever be split into two tables?

I have one table [Users] and another table [Administrators] linked 1:0..1 . Is it best practice to merge these tables? I have read a lot of answers on SO stating splitting tables is only necessary for one-to-many relationships.
My reasoning for separating them is so I can reference administrators with AdministratorId rather than the general UserId. In other tables I have fields which should only ever contain an administrator so it acts as a referential check.
There is a rule of thumb that states a table either models an entity/class or the relationship between entities/classes but not both. However, it is only a rule of thumb, never say never!
SQL generally has a problem with dedicated 1:1 relationship tables because the only inter-table constraints commonly found are foreign keys. However, a FK does not require that a value exists in the referencing table. This makes the relationship 1:0..1 ("one-to-zero-or-one"), which is usually acceptable.
Strict 1:1 requires a workaround. Because SQL lacks multiple assignment, the workaround usually involves resorting to procedural code e.g. two deferrable 'bi-directional' FKs; triggers; forcing updates via CRUD stored procs; etc.
In contrast, modelling a 1:1 relationship in the same table is easy: declare both columns as NOT NULL!
I think the best option is to have two tables for the two different entities, Users and Administrators, possibly with the same Primary Key.
CREATE TABLE User
( UserId int
, ... other data --- data for all users
, PRIMARY KEY (UserId)
) ;
CREATE TABLE Administrator
( AdministratorId int
, ... other data --- data for administrators only
, PRIMARY KEY (AdministratorId)
, FOREIGN KEY AdministratorId
REFERENCES User(UserId)
) ;
This way, as you mention, other tables can reference the AdministratorId:
CREATE TABLE OtherTable
( OtherTableId int
, AdministratorId int
, ... other data
, ...
, FOREIGN KEY AdministratorId
REFERENCES Administrator(AdministratorId)
) ;
Benefits:
referential integrity is trivially implemented.
the relevant data (for Users and Admins) can be stored in the relevant tables so you have less columns in the tables and fewer NULL data.
any query that needs a JOIN to Administrator table will have to look up only a few rows, compared to the (possibly huge) number of rows of the User table. If you have only one table, you'll end up with code like:
WHERE User.admin = True
which may not be easily optimized.
There are several reasons why you might want to keep them separate. One is if the records in one table represent a subset of the records in the other. This patterns is called sub-classing, and is clearly the case in your situation.
This is wise even if the fields (the data) you need to store about admins are not different from the data you need to store about all users. Another reason is if the the usage patterns for a few columns is very different (greater frequency of access) from the usage patterns for the rest of the columns.
Let me stick to the title of your Question first.
Yes, One to one relationship can be split into different tables when you need the following:
Modularity
Security / Data Abstraction
I guess the modularity bit is quite clear.
The Security bit is obvious too since you will require access to both the tables to fetch the whole picture of data.
For Example, Let's suppose you have the below as the scenario:
Wherein, Customer has a dummy id for passport and Passport table doesn't have corresponding customer info.
Hence, the person with access to both tables can only map the passport id with the customer and see the whole picture.
It is common to have tables in a one-to-one relationship. First if they are separate entities that will need to be queried separately or if they are subclasses (as in your case) then the separate table makes sense. Also if the primary table is getting too large for the maximum record size it makes sense to have an additional table in a one-to-one relationship. Finally you have the case where the relationship is 1-1 now, but has the potential to be 1-many in the future such as when you only have one phone number now but wil probably need to store more later. In this case it will be less work to go ahead and make it a separate table.
The crtical piece to setting up a 1-1 relationship is to enforce it as 1-1. The easiest way to do this is to make the FK field also the PK field in the second table.

how to design a schema where the columns of a table are not fixed

I am trying to design a schema where the columns of a table are not fixed. Ex: I have an Employee table where the columns of the table are not fixed and vary (attributes of Employee are not fixed and vary). Frequent addition of a new attribute / column is requirement.
Nullable columns in the Employee table itself i.e. no normalization
Instead of adding nullable columns, separate those columns out in their individual tables ex: if Address is a column to be added then create table Address[EmployeeId, AddressValue].
Create tables ExtensionColumnName [EmployeeId, ColumnName] and ExtensionColumnValue [EmployeeId, ColumnValue]. ExtensionColumnName would have ColumnName as "Address" and ExtensionColumnValue would have ColumnValue as address value.
Employee table
EmployeeId
Name
ExtensionColumnName table
ColumnNameId
EmployeeId
ColumnName
ExtensionColumnValue table
EmployeeId
ColumnNameId
ColumnValue
There is a drawback is the first two ways as the schema changes with every new attribute. Note that adding a new attribute is frequent and a requirement.
I am not sure if this is the good or bad design. If someone had a similar decision to make, please give an insight on things like foreign keys / data integrity, indexing, performance, reporting etc.
It might be useful to look at the current crop of NoSQL databases which allow you to store arbitrary sets of key-value pairs per record.
I would recommend you look at couchdb, mongodb, lucene, etc ...
If the schema changes often in an SQL database this ends up in a nightmare, especially with reporting.
Putting everything in (rowId, key, value) triads is flexible, but slower because of the huge number of records.
The way the ERP vendors do it is just make their schema of the fields they're sure of and add a largisch number of "flexfields" (i.e. 20 numbers, 20 strings, etc) in fixed named columns and use a lookup table to see which flexcolumn corresponds to what. This allows some flexibility for the future while essentially having a static schema.
I recommend using a combination of numbers two and three. Where possible, model tables for standard associations like addresses. This is the most ideal approach...
But for constantly changing values that can't be summarized into logical groupings like that, use two tables in addition to the EMPLOYEES table:
EMPLOYEE_ATTRIBUTE_TYPE_CODES (two columns, employee_attribute_type_code and DESCRIPTION)
EMPLOYEE_ATTRIBUTES (three columns: employee_id foreign key to EMPLOYEES, employee_attribute_type_code foreign key to EMPLOYEE_ATTRIBUTE_TYPE_CODES, and VALUE)
In EMPLOYEE_ATTRIBUTES, set the primary key to be made of:
employee_id
employee_attribute_type_code
This will stop duplicate attributes to the same employee.
If, as you say, new attributes will be added frequently, an EAV data model may work well for you.
There is a pattern, called observation pattern.
For explanation, see these questions/answers: one, two, three.
In general, looks like this:
For example, subjects employee, company and animal can all have observation Name (trait), subjects employee and animal can have observation Weight (measurement) and subject beer bottle can have observations Label (trait) and Volume (measurement). It all fits in the model.
Combine your ExtensionColumn tables into one
Property:
EmployeeID foreign key
PropertyName string
PropertyValue string
If you use a monotonic sequence for assigning primary keys in all your object tables then a single property table can hold properties for all objects.
I would use a combination of 1 and 2. If you are adding attributes frequently, I don't think you have a handle on the data requirements.
I supect some of the attributes being added belong in a another table. If you keep adding attribututes like java certified, asp certified, ..., then you need a certification table. This can be relationship to a certifications code table listing available certifications.
Attributes like manager may be either an attribute or relationship table. If you have multiple relationships between employees, then consider a relationship table with a releation type. Organizations with a matrix management structure will require a releationship table.
Addresses and phone numbers often go in separate tables. An address key like employee_id, address_type would be appropriate. If history is desired add a start_date column to the key.
If you are keeping history I recommend using start_date and end_date columns on the appropriate columns. I try to use a relationship where the record is active when 'start_date <= date-being-considered < end_date' is true.
Attributes like weight, eye color, etc.

Multiple foreign keys to a single column

I'm defining a database for a customer/ order system where there are two highly distinct types of customers. Because they are so different having a single customer table would be very ugly (it'd be full of null columns as they are pointless for one type).
Their orders though are in the same format. Is it possible to have a CustomerId column in my Order table which has a foreign key to both the Customer Types? I have set it up in SQL server and it's given me no problems creating the relationships, but I'm yet to try inserting any data.
Also, I'm planning on using nHibernate as the ORM, could there be any problems introduced by doing the relationships like this?
No, you can't have a single field as a foreign key to two different tables. How would you tell where to look for the key?
You would at least need a field that tells what kind of user it is, or two separate foreign keys.
You could also put the information that is common for all users in one table and have separate tables for the information that is specific for the user types, so that you have a single table with user id as primary key.
A foreign key can only reference a single primary key, so no. However, you could use a bridge table:
CustomerA <---- CustomerA_Orders ----> Order
CustomerB <---- CustomerB_Orders ----> Order
So Order doesn't even have a foreign key; whether this is desirable, though...
I inherited a SQL Server database where this was done (a single column used in four foreign key relationships with four unrelated tables), so yes, it's possible. My predecessor is gone, though, so I can't ask why he thought it was a good idea.
He used a GUID column ("uniqueidentifier" type) to avoid the ambiguity problem, and he turned off constraint checking on the foreign keys, since it's guaranteed that only one will match. But I can think of lots of reasons that you shouldn't, and I haven't thought of any reasons you should.
Yours does sound like the classical "specialization" problem, typically solved by creating a parent table with the shared customer data, then two child tables that contain the data unique to each class of customer. Your foreign key would then be against the parent customer table, and your determination of which type of customer would be based on which child table had a matching entry.
You can create a foreign key referencing multiple tables. This feature is to allow vertical partioining of your table and still maintain referential integrity. In your case however, this is not applicable.
Your best bet would be to have a CustomerType table with possible columns - CustomerTypeID, CustomerID, where CustomerID is the PK and then refernce your OrderID table to CustomerID.
Raj
I know this is a very old question; however if other people are finding this question through the googles, and you don't mind adding some columns to your table, a technique I've used (using the original question as a hypothetical problem to solve) is:
Add a [CustomerType] column. The purpose of storing a value here is to indicate which table holds the PK for your (assumed) [CustomerId] FK column. Optional - addition of a check constraint (to ensure CustomerType is in CustomerA or CustomerB) will help you sleep better at night.
Add a computed column for each [CustomerType], eg:
[CustomerTypeAId] as case when [CustomerType] = 'CustomerA' then [CustomerId] end persisted
[CustomerTypeBId] as case when [CustomerType] = 'CustomerB' then [CustomerId] end persisted
Add your foreign keys to the calculated (and persisted) columns.
Caveat: I'm primarily in a MSSQL environment; so I don't know how well this translates to other DBMS (ie: Postgres, ORACLE, etc).
As noted, if the key is, say, 12345, how would you know which table to look it up in? You could, I suppose, do something to insure that the key values for the two tables never overlapped, but this is too ugly and painful to contemplate. You could have a second field that says which customer type it is. But if you're going to have two fields, why not have one field for customer type 1 id and another for customer type 2 id.
Without knowing more about your app, my first thought is that you really should have a general customer table with the data that is common to both, and then have two additional tables with the data specific to each customer type. I would think that there must be a lot of data common to the two -- basic stuff like name and address and customer number at the least -- and repeating columns across tables sucks big time. The additional tables could then refer back to the base table. As there is then a single key for the base table, the issue of foreign keys having to know which table to refer to evaporates.
Two distinct types of customer is a classic case of types and subtypes or, if you prefer, classes and subclasses. Here is an answer from another question.
Essentially, the class-table-inheritance technique is like Arnand's answer. The use of the shared-primary-key technique is what allows you to get around the problems created by two types of foreign key in one column. The foreign key will be customer-id. That will identify one row in the customer table, and also one row in the appropriate kind of customer type table, as the case may be.
Create a "customer" table include all the columns that have same data for both types of customer.
Than create table "customer_a" and "customer_b"
Use "customer_id" from "consumer" table as foreign key in "customer_a" and "customer_b"
customer
|
---------------------------------
| |
cusomter_a customer_b