SQL - complicated link table, no simple solution - sql

I am working on a project that has a relatively complicated data structure, and we are a bit stumped with the following situation:
The db has various tables which represent glazing unit* components (glass pane, coating, blind), and each of these components has a spectrum. (* glazing unit: think of a double- or triple-glazed glass unit.)
Now we could have one spectra table per component table, and there would be a simple FK relationship, as follows:
tbl_glasspane   1 -- many   tbl_glasspane_spectra
-------------               ---------------------
+ Id                        + GlassPane_Id
  GlassPaneName             + Wavelength
  TypeId etc.                 Value
Where GlassPaneId is a primary key in GlassPane, and GlassPaneId/Wavelength is a composite primary key in tbl_spectra.
This works fine, however we would need one tbl_xxx_spectra for each component table.
The solution to this was to have one Spectra table that each of the component tables would reference, but therein lies the problem - how do I get an arbitrary number of component tables to reference one spectra table?
My initial solution was:
tbl_spectra         tbl_spectraIndex   tbl_glasspane      tbl_coating
---------------     ----------------   ---------------    ---------------
+ SpectraIndex_Id   + Id               + Id               + Id
+ Wavelength                             SpectraIndex_Id    SpectraIndex_Id
  Value
So an explanation...
tbl_spectraIndex is a table which contains one identity column. tbl_spectra has a FK relationship between Id from tbl_spectraIndex and SpectraIndex_Id, and this is one part of the composite PK for tbl_spectra. tbl_glasspane and tbl_coating have a FK relationship to tbl_spectraIndex.
Is this a sensible solution?

In this kind of case it is better to use a denormalized schema. You could use a single spectra table with two extra fields: an int that stores the kind of object (table) the spectra belong to, and an int that holds the Id of the row in that table. The spectra table is then linked to multiple tables through the same field. I tend to do this with my address table, for example. In my database many objects need one or more associated addresses, and every address has the same fields. So I create a single address table with the two fields mentioned above: one for the kind of object, and one that links to the actual object.
Also, have you considered putting all your glazing tables into one single big table? Hope this helps.
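A minimal sketch of the single spectra table with a type field and an id field, using Python's built-in sqlite3 so it can be run as-is. The column names ComponentType and ComponentId, the type codes 1 and 2, and the sample values are assumptions made for illustration; they do not come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tbl_glasspane (Id INTEGER PRIMARY KEY, GlassPaneName TEXT);
CREATE TABLE tbl_coating   (Id INTEGER PRIMARY KEY, CoatingName TEXT);

-- One spectra table shared by every component table.
-- ComponentType says which table ComponentId points into
-- (hypothetical codes: 1 = glass pane, 2 = coating).
CREATE TABLE tbl_spectra (
    ComponentType INTEGER NOT NULL CHECK (ComponentType IN (1, 2)),
    ComponentId   INTEGER NOT NULL,
    Wavelength    REAL    NOT NULL,
    Value         REAL    NOT NULL,
    PRIMARY KEY (ComponentType, ComponentId, Wavelength)
);
""")

conn.execute("INSERT INTO tbl_glasspane VALUES (1, 'Clear 4mm')")
conn.execute("INSERT INTO tbl_coating VALUES (1, 'Low-E')")
conn.executemany(
    "INSERT INTO tbl_spectra VALUES (?, ?, ?, ?)",
    [(1, 1, 380.0, 0.81), (1, 1, 385.0, 0.83), (2, 1, 380.0, 0.12)],
)

GLASSPANE = 1  # hypothetical type code
pane_spectrum = conn.execute(
    "SELECT Wavelength, Value FROM tbl_spectra "
    "WHERE ComponentType = ? AND ComponentId = ? ORDER BY Wavelength",
    (GLASSPANE, 1),
).fetchall()
```

The trade-off is that ComponentId cannot be declared as a foreign key, because it points into a different table depending on ComponentType, so the application has to keep those references valid.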

Related

A different way to Model a portion of this ERD

I have a very simple table diagram from modeling my application. The problem is I am second guessing my relation between Vendor and VendorOrder. The VendorOrders table should store all vendororders in the system. To get all orders for a certain apartment, you would just use the PK and FK relationship to gather that data. Is there anything I should improve with the overall design?
Diagram:
There are three things I see that you could improve.
Create an intersection table called ApartmentResidents between your Apartment and Resident tables, with a one-to-many relationship from each table to the intersection table. The current ERD only allows one resident to be registered to an apartment. If a resident lives in more than one apartment over the lifetime of this database, you'll need to register them as an entirely new resident.
Intersection table example
In your Vendor table, instead of using a name as your primary key I would create an id instead. Using things that have a real-world value as your primary key can get messy for a number of reasons:
If two vendors have the same name, like "Johnson's Repair", you'll need to misspell one of them for it to be a valid key.
If you mistype a vendor's name, that typo is also carried into the foreign key columns of the referencing tables (which might also keep the row out of the results when you query for the correct spelling).
Placing an index on a string is less performant than if you put it on an auto incrementing integer key.
(Optional) I usually give my database tables plural names, like "Apartments" or "Vendors". It makes the SQL read more like a sentence inside the query. If I remember right, that's also one of the things SQL's creator was going for with the syntax design.
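The first two suggestions can be sketched as follows, using Python's built-in sqlite3. The original diagram is not available here, so every column name (ApartmentId, UnitNumber, ResidentId, VendorId) and all sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.executescript("""
CREATE TABLE Apartments (ApartmentId INTEGER PRIMARY KEY, UnitNumber TEXT NOT NULL);
CREATE TABLE Residents  (ResidentId  INTEGER PRIMARY KEY, Name TEXT NOT NULL);

-- Intersection table: one row per (apartment, resident) pair.
CREATE TABLE ApartmentResidents (
    ApartmentId INTEGER NOT NULL REFERENCES Apartments(ApartmentId),
    ResidentId  INTEGER NOT NULL REFERENCES Residents(ResidentId),
    PRIMARY KEY (ApartmentId, ResidentId)
);

-- Surrogate integer key instead of the vendor name.
CREATE TABLE Vendors (
    VendorId INTEGER PRIMARY KEY,
    Name     TEXT NOT NULL
);
""")

conn.executemany("INSERT INTO Apartments VALUES (?, ?)", [(1, "1A"), (2, "2B")])
conn.execute("INSERT INTO Residents VALUES (1, 'Dana')")
# The same resident can now be linked to more than one apartment.
conn.executemany("INSERT INTO ApartmentResidents VALUES (?, ?)", [(1, 1), (2, 1)])
# Two vendors may share a name because the key is the id.
conn.executemany(
    "INSERT INTO Vendors (Name) VALUES (?)",
    [("Johnson's Repair",), ("Johnson's Repair",)],
)

apartments_of_resident = [
    row[0]
    for row in conn.execute(
        "SELECT a.UnitNumber FROM ApartmentResidents ar "
        "JOIN Apartments a ON a.ApartmentId = ar.ApartmentId "
        "WHERE ar.ResidentId = ? ORDER BY a.UnitNumber",
        (1,),
    )
]
same_name_vendors = conn.execute(
    "SELECT COUNT(*) FROM Vendors WHERE Name = ?", ("Johnson's Repair",)
).fetchone()[0]
```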

Numeric IDs vs. String IDs

I'm using a very stripped down example here so please ask if you need more context.
I'm in the process of restructuring/normalising a database where the majority of the tables have primary keys that are auto-incremented numeric IDs (1, 2, 3, etc.), and I'm thinking I need to change the ID field from a numeric value to a string value generated from data in the row.
My reasoning for this is as follows:
I have 5 tables: Staff, Members, Volunteers, Interns and Students. All of these have numeric IDs.
I have another table called BuildingAttendance which logs when people visited the premises and for what reason which has the following relevant fields:
ID   Type   Premises     Attended
To differentiate between staff and members, I use the Type field, with MEM for member, STA for staff, etc. So as an example:
ID   Type   Premises     Attended
1    MEM    Building A   27/6/15
1    STA    Building A   27/6/15
2    STU    Building B   27/6/15
I'm thinking it might be a better design to use an ID similar to the following:
ID     Premises     Attended
MEM1   Building A   27/6/15
STA1   Building A   27/6/15
STU2   Building B   27/6/15
What would be the best way to deal with this? I know that if my primary key is a string my query performance may take a hit, but is this easier than having 2 columns?
tl;dr - How should I deal with a table that references records from other tables with the same ID system?
Auto-incremented numeric ids have several advantages over strings:
They are easier to implement. In order to generate the strings (as you want them), you would need to implement a trigger or computed column.
They occupy a fixed amount of storage (probably 4 bytes), so they are more efficient in the data record and in indexes.
They allow members to change between types, without affecting the key.
The problem that you are facing is that you have subtypes of a supertype. This information should be stored with the person, not in the attendance record (unless a person could change their type with each visit). There are several ways to approach this in SQL, none as clean as simple class inheritance in a programming language.
One technique is to put all the data in a single table called something like Persons. This would have a unique id, a type, and all the columns from your five tables. The problem is when the columns from your subtables are very different.
In that case, have a table called persons with a unique primary key and the common columns. Then have separate tables for each one and use the PersonId as the primary key for these tables.
The advantage to this approach is that you can have a foreign key reference to Persons for something like BuildingAttendance. And, you can also have foreign key references to each of the subtypes, for other tables where appropriate.
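A rough sketch of the Persons supertype with subtype tables that reuse PersonId as their own primary key, written with Python's built-in sqlite3. Only two of the five subtypes are shown, and the extra columns (Name, JobTitle, JoinedOn, AttendanceId) and sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.executescript("""
CREATE TABLE Persons (
    PersonId   INTEGER PRIMARY KEY,
    PersonType TEXT NOT NULL
               CHECK (PersonType IN ('STA', 'MEM', 'VOL', 'INT', 'STU')),
    Name       TEXT NOT NULL
);
-- Each subtype table uses PersonId as both primary key and foreign key.
CREATE TABLE Staff (
    PersonId INTEGER PRIMARY KEY REFERENCES Persons(PersonId),
    JobTitle TEXT
);
CREATE TABLE Members (
    PersonId INTEGER PRIMARY KEY REFERENCES Persons(PersonId),
    JoinedOn TEXT
);
-- Attendance references the supertype, so one column covers every type.
CREATE TABLE BuildingAttendance (
    AttendanceId INTEGER PRIMARY KEY,
    PersonId     INTEGER NOT NULL REFERENCES Persons(PersonId),
    Premises     TEXT NOT NULL,
    Attended     TEXT NOT NULL
);
""")

conn.execute("INSERT INTO Persons VALUES (1, 'MEM', 'Alice')")
conn.execute("INSERT INTO Persons VALUES (2, 'STA', 'Bob')")
conn.execute("INSERT INTO Members VALUES (1, '2015-01-10')")
conn.execute("INSERT INTO Staff VALUES (2, 'Caretaker')")
conn.executemany(
    "INSERT INTO BuildingAttendance (PersonId, Premises, Attended) VALUES (?, ?, ?)",
    [(1, "Building A", "2015-06-27"), (2, "Building A", "2015-06-27")],
)

visits = conn.execute(
    "SELECT p.PersonType, p.Name, a.Premises "
    "FROM BuildingAttendance a JOIN Persons p ON p.PersonId = a.PersonId "
    "ORDER BY p.PersonId"
).fetchall()

# A visit for a person that does not exist is refused by the foreign key.
try:
    conn.execute(
        "INSERT INTO BuildingAttendance (PersonId, Premises, Attended) "
        "VALUES (99, 'Building B', '2015-06-27')"
    )
    unknown_person_rejected = False
except sqlite3.IntegrityError:
    unknown_person_rejected = True
```

With this layout the attendance table needs neither a Type column nor a string key such as MEM1, because the type is looked up through Persons.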
Gordon Linoff already provided an answer that points out the type/supertype issue. I refer to this as class/subclass, but that's just a difference in terminology.
There are two tags in this area that collect questions that relate to class/subclass. Here they are:
class-table-inheritance
shared-primary-key
If you look over the info tab for each of these tags, you'll see a brief outline. The answers to the questions there will also help you with your case.
By creating a single table called Person, with an autonumber ID, you provide a handy way of referencing a person, regardless of that person's type. By making the staff, member, volunteer, student, and intern tables use a copy of this ID as their own ID you will facilitate whatever joins you need to perform.
The decision about whether to include type in attendance depends on whether you want to retrieve the data with the person's current type, or with the type the person had at the time of the attendance.

Horizontal/Vertical partitioning and relations

I've got a question regarding horizontal/vertical partitioning in a relational database.
I'm gonna demonstrate with an example:
Let's say we've got a disjoint inheritance for Person. A person can either be registered, or unregistered, but not both.
A person also has a many-to-many relation with the table House, a house can be owned by 1..* persons, and a person can own 1..* houses.
If I were then to horizontally partition the Person table, which means we have two identical tables, one containing the registered persons and one containing the unregistered, how would it work with the many-to-many relation?
I've also thought of partitioning the relationship, but if there are n relations between Person and other tables, a horizontal partition would cause the number of tables to grow to n * 2. Is this really the way to go?
I hope I made myself clear and thanks in advance.
There's no "right way" to do this. There are different approaches with different trade offs. However, I would start with something like this:
               +------+          +-----+
               |Person<----------+House|
               +^----^+          +-----+
                |    |
                |    |
+---------------++  ++-----------------+
|RegisteredPerson|  |UnRegisteredPerson|
+----------------+  +------------------+
Person would have an autogenerated PersonID. The child tables RegisteredPerson and UnregisteredPerson would have the same Primary Key (making the relationship 1 to 0 or 1).
It is slightly tricky with this approach to enforce a Person having exactly one RegisteredPerson or UnRegistered person. The most straightforward way is to only allow write access to the tables through stored procedures that maintain the correct invariants. There are also schemes using triggers and tagging the Person record with the subtype.
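One possible form of the "tag the Person record with the subtype" scheme is sketched below with Python's built-in sqlite3. The PersonType column and its codes 'R' and 'U' are assumptions for illustration. Note the limit of this sketch: it guarantees at most one subtype row per person, and that the row matches the tag, but it does not force the subtype row to exist.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.executescript("""
CREATE TABLE Person (
    PersonId   INTEGER PRIMARY KEY,
    PersonType TEXT NOT NULL CHECK (PersonType IN ('R', 'U')),
    UNIQUE (PersonId, PersonType)  -- target for the composite foreign keys
);
-- Each child repeats the tag, pins it to one value, and references
-- (PersonId, PersonType) together, so it can only attach to a Person
-- carrying the matching tag.
CREATE TABLE RegisteredPerson (
    PersonId   INTEGER PRIMARY KEY,
    PersonType TEXT NOT NULL CHECK (PersonType = 'R'),
    FOREIGN KEY (PersonId, PersonType) REFERENCES Person (PersonId, PersonType)
);
CREATE TABLE UnRegisteredPerson (
    PersonId   INTEGER PRIMARY KEY,
    PersonType TEXT NOT NULL CHECK (PersonType = 'U'),
    FOREIGN KEY (PersonId, PersonType) REFERENCES Person (PersonId, PersonType)
);
""")

conn.execute("INSERT INTO Person VALUES (1, 'R')")
conn.execute("INSERT INTO RegisteredPerson VALUES (1, 'R')")

# Person 1 is tagged 'R', so an UnRegisteredPerson row for it is refused.
try:
    conn.execute("INSERT INTO UnRegisteredPerson VALUES (1, 'U')")
    second_subtype_rejected = False
except sqlite3.IntegrityError:
    second_subtype_rejected = True
```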
A note on terminology: in the context of databases, "horizontal" and "vertical" partitioning usually refer to storage mechanisms that are entirely unrelated to your question. This is a question about inheritance.
Laurence's answer is correct, as far as it goes. You need an entity called Person for the relationship to House as you have described the problem. There are two ways of enforcing the "only-one" registration by storing the ids in the Person table. One way is:
create table Person (
    . . .
    RegisteredId int references RegisteredPersons(RegisteredPersonId),
    UnregisteredId int references UnregisteredPersons(UnregisteredPersonId),
    check (RegisteredId is null or UnregisteredId is null)
);
(Note this allows a person to be neither registered nor unregistered, although the check can be fixed for that.)
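One way the check could be tightened so that exactly one of the two ids is set is sketched below with Python's built-in sqlite3. The foreign key references are left out to keep the example short, and the helper try_insert is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE Person (
    PersonId       INTEGER PRIMARY KEY,
    RegisteredId   INTEGER,  -- would reference RegisteredPersons
    UnregisteredId INTEGER,  -- would reference UnregisteredPersons
    -- Exactly one of the two ids must be present.
    CHECK (
        (RegisteredId IS NOT NULL AND UnregisteredId IS NULL) OR
        (RegisteredId IS NULL AND UnregisteredId IS NOT NULL)
    )
)
""")


def try_insert(registered_id, unregistered_id):
    """Return True if the row is accepted, False if the CHECK refuses it."""
    try:
        conn.execute(
            "INSERT INTO Person (RegisteredId, UnregisteredId) VALUES (?, ?)",
            (registered_id, unregistered_id),
        )
        return True
    except sqlite3.IntegrityError:
        return False


accepted = {
    "registered only": try_insert(10, None),
    "unregistered only": try_insert(None, 20),
    "neither": try_insert(None, None),
    "both": try_insert(10, 20),
}
```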
An alternative is:
create table Person (
    . . .
    RegType varchar(255),
    RegId int,
    check (RegType in ('Registered', 'Unregistered'))
);
Then, depending on the database, you can define a computed column or view that defines the foreign key reference. Something like this:
RegisteredId as (case when RegType = 'Registered' then RegId end),
UnregisteredId as (case when RegType = 'Unregistered' then RegId end),
Advantages and disadvantages. The first approach enforces the foreign key relationships. However, it eats up an integer of storage for each type. For two types, this isn't a big deal.
The second approach requires functionality that is not standard across databases. For instance, in SQL Server and Oracle, you can enforce the foreign key relationship -- but one does it with computed columns and one does it with filtered indexes.

Table referenced by other tables having different PKs

I would like to create a table called "NOTES". I was thinking this table would contain a "table_name" VARCHAR(100) which indicates what table put in the note, a "key" or multiple "key" columns representing the primary key values of the table that this note applies to and a "note" field VARCHAR(MAX). When other tables use this table they would supply THEIR primary key(s) and their "table_name" and get all the notes associated with the primary key(s) they supplied. The problem is that other tables might have 1, 2 or more PKs so I am looking for ideas on how I can design this...
What you're suggesting sounds a little convoluted to me. I would suggest something like this.
Notes
------
Id - PK
NoteTypeId - FK to NoteTypes.Id
NoteContent

NoteTypes
----------
Id - PK
Description - This could replace the "table_name" column you suggested

SomeOtherTable
--------------
Id - PK
...
Other Columns
...
NoteId - FK to Notes.Id
This would allow you to keep your data better normalized, but still get the relationships between data that you want. Note that this assumes a 1:1 relationship between rows in your other tables and Notes. If that relationship will be many to one, you'll need a cross table.
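For the case where one row needs several notes, the cross table could look roughly like this, sketched with Python's built-in sqlite3. The table name SomeOtherTableNotes, the Name column, and the sample rows are hypothetical; the other names follow the outline above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.executescript("""
CREATE TABLE NoteTypes (
    Id          INTEGER PRIMARY KEY,
    Description TEXT NOT NULL
);
CREATE TABLE Notes (
    Id          INTEGER PRIMARY KEY,
    NoteTypeId  INTEGER NOT NULL REFERENCES NoteTypes(Id),
    NoteContent TEXT NOT NULL
);
CREATE TABLE SomeOtherTable (
    Id   INTEGER PRIMARY KEY,
    Name TEXT
);
-- Cross table: replaces the NoteId column so a row can have many notes.
CREATE TABLE SomeOtherTableNotes (
    SomeOtherTableId INTEGER NOT NULL REFERENCES SomeOtherTable(Id),
    NoteId           INTEGER NOT NULL REFERENCES Notes(Id),
    PRIMARY KEY (SomeOtherTableId, NoteId)
);
""")

conn.execute("INSERT INTO NoteTypes VALUES (1, 'SomeOtherTable')")
conn.execute("INSERT INTO SomeOtherTable VALUES (1, 'row one')")
conn.executemany(
    "INSERT INTO Notes VALUES (?, ?, ?)",
    [(1, 1, "First note"), (2, 1, "Second note")],
)
conn.executemany("INSERT INTO SomeOtherTableNotes VALUES (?, ?)", [(1, 1), (1, 2)])

notes_for_row = [
    row[0]
    for row in conn.execute(
        "SELECT n.NoteContent FROM SomeOtherTableNotes x "
        "JOIN Notes n ON n.Id = x.NoteId "
        "WHERE x.SomeOtherTableId = ? ORDER BY n.Id",
        (1,),
    )
]
```

A table with a multi-column primary key would get its own cross table carrying all of its key columns, which keeps the Notes table itself unaware of how many key columns each referencing table has.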
Have a look at this thread about database normalization
What is Normalisation (or Normalization)?
Additionally, you can check this resource to learn more about foreign keys
http://www.w3schools.com/sql/sql_foreignkey.asp
Instead of putting the other tables' names and primary keys in this table, have the primary key of the NOTES table be NoteId. Create an FK in each other table that will make a note, and store the corresponding NoteIds in the other tables. Then you can simply join on NoteId from all of these other tables to the NOTES table.
As I understand your problem, you're attempting to "abstract" the auditing of multiple tables in a way that you might abstract a class in OOP.
While it's a great OOP design principle, it falls flat in databases for multiple reasons. Perhaps the largest single reason is that if you cannot envision it, neither will someone (even you) looking at it later have an easy time reassembling the data. Smaller than that, though, is that while you tend to think of a table as a container and thus similar to an object, in reality tables are implemented instances of this hypothetical container you are trying to put together, and they operate better if you treat them as such. By creating an audit table specific to a table, or to a subset of tables that share structural and data similarity, you increase the performance of your database and you won't run into strange trigger- or select-related issues later.
And you can't envision it not because you're not good at what you're doing, but rather, the structure is not conducive to database logging.
Instead, I would recommend that you create separate logging tables that manage the auditing of each table you want to audit or log. In fact, some quick Google searches turn up many scripts already written to do much of this work for you: Example of one such search
You should create these individual tables, and then, if you want to report on multiple tables or even all tables at once, you can create a stored procedure (if you want to query based on criteria) or a view with a SELECT statement that JOINs and/or UNIONs the tables you are interested in, in a fashion that makes sense for the report type. You'll still have to add new objects to the view, but even with your original table design you'd have to account for that.
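A minimal sketch of the reporting view over per-table audit tables, using Python's built-in sqlite3. The two audit tables (CustomerAudit, OrderAudit), their columns, and the view name AllAudit are all hypothetical stand-ins for whatever tables are actually being audited.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One audit table per audited table (both are made-up examples).
CREATE TABLE CustomerAudit (
    AuditId    INTEGER PRIMARY KEY,
    CustomerId INTEGER NOT NULL,
    ChangedAt  TEXT NOT NULL,
    Note       TEXT NOT NULL
);
CREATE TABLE OrderAudit (
    AuditId   INTEGER PRIMARY KEY,
    OrderId   INTEGER NOT NULL,
    ChangedAt TEXT NOT NULL,
    Note      TEXT NOT NULL
);
-- The view stitches the separate tables together for reporting.
CREATE VIEW AllAudit AS
    SELECT 'Customer' AS SourceTable, CustomerId AS SourceId, ChangedAt, Note
    FROM CustomerAudit
    UNION ALL
    SELECT 'Order', OrderId, ChangedAt, Note
    FROM OrderAudit;
""")

conn.execute("INSERT INTO CustomerAudit VALUES (1, 7, '2015-06-01', 'name changed')")
conn.execute("INSERT INTO OrderAudit VALUES (1, 3, '2015-06-02', 'status changed')")

report = conn.execute(
    "SELECT SourceTable, SourceId, Note FROM AllAudit ORDER BY ChangedAt"
).fetchall()
```

Auditing another table means adding one more audit table and one more SELECT branch to the view.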

how to design a schema where the columns of a table are not fixed

I am trying to design a schema where the columns of a table are not fixed. Ex: I have an Employee table where the columns of the table are not fixed and vary (attributes of Employee are not fixed and vary). Frequent addition of a new attribute / column is a requirement. I see three ways to do this:
1. Nullable columns in the Employee table itself, i.e. no normalization.
2. Instead of adding nullable columns, separate those columns out into their individual tables, ex: if Address is a column to be added then create table Address [EmployeeId, AddressValue].
3. Create tables ExtensionColumnName [EmployeeId, ColumnName] and ExtensionColumnValue [EmployeeId, ColumnValue]. ExtensionColumnName would have ColumnName as "Address" and ExtensionColumnValue would have ColumnValue as the address value.
Employee table
    EmployeeId
    Name
ExtensionColumnName table
    ColumnNameId
    EmployeeId
    ColumnName
ExtensionColumnValue table
    EmployeeId
    ColumnNameId
    ColumnValue
There is a drawback in the first two ways, as the schema changes with every new attribute. Note that adding a new attribute is frequent and a requirement.
I am not sure if this is a good or bad design. If someone had a similar decision to make, please give some insight on things like foreign keys / data integrity, indexing, performance, reporting etc.
It might be useful to look at the current crop of NoSQL databases which allow you to store arbitrary sets of key-value pairs per record.
I would recommend you look at couchdb, mongodb, lucene, etc ...
If the schema changes often in an SQL database this ends up in a nightmare, especially with reporting.
Putting everything in (rowId, key, value) triads is flexible, but slower because of the huge number of records.
The way the ERP vendors do it is to build their schema from the fields they're sure of and add a largish number of "flexfields" (i.e. 20 numbers, 20 strings, etc.) in fixed named columns, and use a lookup table to see which flexcolumn corresponds to what. This allows some flexibility for the future while essentially having a static schema.
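The flexfield idea can be sketched as follows with Python's built-in sqlite3. This is only an illustration under assumed names: the columns FlexStr1, FlexStr2, FlexNum1, FlexNum2, the lookup table FlexFieldMap, the attributes ShirtSize and ParkingBay, and the helper get_attribute are all hypothetical, and a real ERP schema would carry many more flex columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employee (
    EmployeeId INTEGER PRIMARY KEY,
    Name       TEXT NOT NULL,
    -- Spare, fixed-name columns reserved for future attributes.
    FlexStr1 TEXT, FlexStr2 TEXT,
    FlexNum1 REAL, FlexNum2 REAL
);
-- Lookup table: which flex column currently holds which attribute.
CREATE TABLE FlexFieldMap (
    AttributeName TEXT PRIMARY KEY,
    ColumnName    TEXT NOT NULL UNIQUE
);
""")

conn.execute("INSERT INTO FlexFieldMap VALUES ('ShirtSize', 'FlexStr1')")
conn.execute("INSERT INTO FlexFieldMap VALUES ('ParkingBay', 'FlexNum1')")
conn.execute(
    "INSERT INTO Employee (EmployeeId, Name, FlexStr1, FlexNum1) "
    "VALUES (1, 'Alice', 'M', 42)"
)

# Column names cannot be bound as SQL parameters, so only names from this
# fixed set are ever placed into the query text.
FLEX_COLUMNS = {"FlexStr1", "FlexStr2", "FlexNum1", "FlexNum2"}


def get_attribute(employee_id, attribute_name):
    """Resolve the attribute to its flex column, then read that column."""
    row = conn.execute(
        "SELECT ColumnName FROM FlexFieldMap WHERE AttributeName = ?",
        (attribute_name,),
    ).fetchone()
    if row is None or row[0] not in FLEX_COLUMNS:
        raise KeyError(attribute_name)
    value = conn.execute(
        f"SELECT {row[0]} FROM Employee WHERE EmployeeId = ?", (employee_id,)
    ).fetchone()
    return value[0] if value else None


shirt_size = get_attribute(1, "ShirtSize")
parking_bay = get_attribute(1, "ParkingBay")
```

Adding a new attribute is then one row in FlexFieldMap rather than a schema change, as long as a spare column of the right type is still free.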
I recommend using a combination of numbers two and three. Where possible, model tables for standard associations like addresses. This is the most ideal approach...
But for constantly changing values that can't be summarized into logical groupings like that, use two tables in addition to the EMPLOYEES table:
EMPLOYEE_ATTRIBUTE_TYPE_CODES (two columns, employee_attribute_type_code and DESCRIPTION)
EMPLOYEE_ATTRIBUTES (three columns: employee_id foreign key to EMPLOYEES, employee_attribute_type_code foreign key to EMPLOYEE_ATTRIBUTE_TYPE_CODES, and VALUE)
In EMPLOYEE_ATTRIBUTES, set the primary key to be made of:
employee_id
employee_attribute_type_code
This prevents the same attribute from being assigned twice to the same employee.
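The two tables and the composite primary key described above can be sketched like this with Python's built-in sqlite3. The table and column names follow the answer; the name column on EMPLOYEES and the sample codes EYE and WEIGHT are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.executescript("""
CREATE TABLE EMPLOYEES (
    employee_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE EMPLOYEE_ATTRIBUTE_TYPE_CODES (
    employee_attribute_type_code TEXT PRIMARY KEY,
    description                  TEXT NOT NULL
);
CREATE TABLE EMPLOYEE_ATTRIBUTES (
    employee_id INTEGER NOT NULL REFERENCES EMPLOYEES(employee_id),
    employee_attribute_type_code TEXT NOT NULL
        REFERENCES EMPLOYEE_ATTRIBUTE_TYPE_CODES(employee_attribute_type_code),
    value TEXT,
    -- One value per (employee, attribute type).
    PRIMARY KEY (employee_id, employee_attribute_type_code)
);
""")

conn.execute("INSERT INTO EMPLOYEES VALUES (1, 'Alice')")
conn.executemany(
    "INSERT INTO EMPLOYEE_ATTRIBUTE_TYPE_CODES VALUES (?, ?)",
    [("EYE", "Eye colour"), ("WEIGHT", "Weight in kg")],
)
conn.execute("INSERT INTO EMPLOYEE_ATTRIBUTES VALUES (1, 'EYE', 'green')")

# A second EYE row for the same employee violates the composite key.
try:
    conn.execute("INSERT INTO EMPLOYEE_ATTRIBUTES VALUES (1, 'EYE', 'blue')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# A different attribute for the same employee is fine.
conn.execute("INSERT INTO EMPLOYEE_ATTRIBUTES VALUES (1, 'WEIGHT', '70')")

attributes = dict(
    conn.execute(
        "SELECT employee_attribute_type_code, value "
        "FROM EMPLOYEE_ATTRIBUTES WHERE employee_id = 1"
    )
)
```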
If, as you say, new attributes will be added frequently, an EAV data model may work well for you.
There is a pattern called the observation pattern.
For explanation, see these questions/answers: one, two, three.
In general, it looks like this:
For example, subjects employee, company and animal can all have observation Name (trait), subjects employee and animal can have observation Weight (measurement) and subject beer bottle can have observations Label (trait) and Volume (measurement). It all fits in the model.
Combine your ExtensionColumn tables into one
Property:
EmployeeID foreign key
PropertyName string
PropertyValue string
If you use a monotonic sequence for assigning primary keys in all your object tables then a single property table can hold properties for all objects.
I would use a combination of 1 and 2. If you are adding attributes frequently, I don't think you have a handle on the data requirements.
I suspect some of the attributes being added belong in another table. If you keep adding attributes like java certified, asp certified, ..., then you need a certification table. This can have a relationship to a certifications code table listing the available certifications.
Attributes like manager may be either an attribute or a relationship table. If you have multiple relationships between employees, then consider a relationship table with a relation type. Organizations with a matrix management structure will require a relationship table.
Addresses and phone numbers often go in separate tables. An address key like employee_id, address_type would be appropriate. If history is desired add a start_date column to the key.
If you are keeping history I recommend using start_date and end_date columns on the appropriate columns. I try to use a relationship where the record is active when 'start_date <= date-being-considered < end_date' is true.
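The half-open interval rule 'start_date <= date-being-considered < end_date' can be sketched as below with Python's built-in sqlite3. The table name EmployeeAddress, the address column, the helper address_on, and the use of '9999-12-31' as an open-ended end_date are assumptions for illustration. Dates are stored as ISO text so that string comparison matches date order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EmployeeAddress (
    employee_id  INTEGER NOT NULL,
    address_type TEXT    NOT NULL,
    start_date   TEXT    NOT NULL,
    -- Assumed sentinel meaning "still current".
    end_date     TEXT    NOT NULL DEFAULT '9999-12-31',
    address      TEXT    NOT NULL,
    PRIMARY KEY (employee_id, address_type, start_date)
);
""")

conn.executemany(
    "INSERT INTO EmployeeAddress VALUES (?, ?, ?, ?, ?)",
    [
        (1, "HOME", "2014-01-01", "2015-06-01", "1 Old Street"),
        (1, "HOME", "2015-06-01", "9999-12-31", "2 New Road"),
    ],
)


def address_on(employee_id, address_type, on_date):
    """Return the address active on on_date: start_date <= on_date < end_date."""
    row = conn.execute(
        "SELECT address FROM EmployeeAddress "
        "WHERE employee_id = ? AND address_type = ? "
        "AND start_date <= ? AND ? < end_date",
        (employee_id, address_type, on_date, on_date),
    ).fetchone()
    return row[0] if row else None


before_move = address_on(1, "HOME", "2015-05-31")
on_move_day = address_on(1, "HOME", "2015-06-01")
```

Because the end of one row equals the start of the next and the upper bound is exclusive, a date on the boundary matches exactly one row.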
Attributes like weight, eye color, etc.