Modelling database structures for volatile hierarchy of objects - sql

[Note: start]
Hopefully, there are some of you, professionals, that have already had to cope with this kind of situation.
The particular case concerning post office is fictional and was made just to present the problem.
This question is not about improving performance by adding indexes.
[Note: end]
Recently I've been wondering how to efficiently(!) create a database structure to handle volatile hierarchy levels.
Let's get into the example for a better understanding of the matter.
Suppose we have a post office that stores physical mails in different ways depending on really any factor (that doesn't matter here).
We are to map this situation onto a database model.
OK, so we have those mails. Mails can be physically stored in cases, boxes, drawers, safe deposits and many more (we don't want to restrict our storage types and structure, but keep them flexible enough to accommodate future changes). These are the types we have for now; for instance, one mail can be inside a box that is stored in a safe deposit, but really any combination is possible.
For simplicity, let's assume that a single mail is the lowest granularity we can get. We have to remember, though, that we may also have, for example, an empty box (one with nothing inside it and no container around it), and we want to store that in the DB too.
This model has to:
be efficient - it can't (or should it?) all be stored in a single table, since we process many mails per day
be adjustable - we may need to change our way of storing some already existing objects, so we need those grouped hierarchies to change for some objects
be flexible - we are definitely going to need to add new objects in the future and so the hierarchy may change for upcoming and already existing particular objects
[My idea] So far I've come up with this idea:
Let's store all objects in one self-referencing "hierarchy" table and mark each object to be of some type, so that recursively we can see the path of where the email is located, or show every object with hierarchy that on the top depends on safe deposit.
This method would require :
a) every record to be detailed in this table, which makes it contain many null values, since mails are not described by the same attributes as a drawer,
b) or each object type might have its own table describing it, with a foreign key in "hierarchy table"
[Warning] This table can grow so big that lookup may cause serious performance issues. This would also require us to add new structure (physical table) for every new object in our "hierarchy of storage units", which I think is fine.
[Question] Could you please let me know if my idea (consider B to be my chosen requirement) is the best I can get? What can I improve?
[SAMPLE DATA]
Emails Table:
id
---
1
2
Cases Table:
id
---
1
2
Boxes Table:
id
---
1
Hierarchy-relation Table:
seq | id_obj | obj_type | id_parent_obj | parent_obj_type |
#1 | 1 | Email | 1 | Case | -- email 1 in case 1
#2 | 1 | Case | 1 | Box | -- case 1 in box 1
#3 | 1 | Box | [null] | [null] | -- box 1 no parent
#4 | 2 | Email | [null] | [null] | -- email 2 no parent
#5 | 3 | Email | 1 | Box | -- email 3 in box 1
#6 | 1 | Box | [null] | [null] | -- box 1 no parent
Just looking at this sample data, I can see some redundant information, for example about the box in the Hierarchy-relation table in seq #3 and #6. I think there must be a different approach to this; also, the referential link should probably be kept on seq.
We can see though, that case 2 is empty.

You should define your storage types in one "normal" table.
Add all information to this table, which is bound to the storage itself. But nothing about the things stored in there.
If storages can be "nested" (e.g. boxes within bigger boxes within a ...) you could either add a "self-join" (you did not state your RDBMS, SQL Server offers HIERARCHYID for this) to specify the location as a reference to the parent storage, or you could define a Locator table where you store the IDs of a storage and the IDs of its container (1:n -> One storage can be located in exactly one container).
Then you need a table with your items to store. If you have only one kind of item ("mails") you need a table to define a "mail" with all its attributes.
You can either put the location (the place it's stored) as a foreign key to the storages right into the "mail" table (so each mail knows its location), or - again - you can define a mapping with the mail's ID and the storage ID.
The second approach is needed, when you want to add more information to the process "I store a mail" (e.g. who, when, price, ...) and you could historize this. If you change a location you just set a "ValidUntilDatetime" and add a new line with the new location. So you can follow the process of putting things around... If this was just an FK-column within your mails table, this would not be possible.
If there are more items to store, you could think about a Master-table and as many different item tables as you have different things to store. They share an ID. General information (also the location) are part of the master table, specific data part of the sub tables.
Well, I hope this gets you on the right path...
EDIT: Answer your question from your comments:
ad "Why need of locator table":
Just imagine a table
CREATE TABLE storage(id INT, Name VARCHAR(100), ...)
And sample data like
id Name, ...
1 Box1 ...
2 Box2 ...
3 Case1 ...
[...]
And a table like this
CREATE TABLE Mail( id INT, Creation DATETIME, [From] VARCHAR(100), [To] VARCHAR(100), LocationID INT FOREIGN KEY REFERENCES Storage(id),...)
And sample data like:
id Creation From To LocationID ...
1 2015-10-28 12:00 adr@mail.com xyz@mail.com 2 ...
2 2015-10-28 12:05 adr@mail.com xyz@mail.com 2 ...
3 2015-10-28 12:10 adr@mail.com xyz@mail.com 3 ...
This would make clear, that mails 1 and 2 are physically stored in Box2 and mail 3 is stored in Case1. But you have no information about: Who put it there? When was it put there? How long will it stay there? Who will come to pick it?...
If you have a locator table like
CREATE TABLE Location(id INT, mailID INT FOREIGN KEY REFERENCES Mail(id), storageID INT FOREIGN KEY REFERENCES Storage(id), [When] DATETIME, ...)
You can store the "put" process. In the first example, if an item is transferred from Box2 to Box1 you would simply change the LocationID. But you would not keep the information that the item was in Box2 before, who transferred it, when this happened, and so on...
ad "different ways to store": If a mail is stored nowhere, you might allow the storageID to remain NULL, or you could define a storage item called "Nowhere" and use its ID.
It is enough to store the ID of the direct container. The container itself must know where itself is located. For this read the first part of my answer.
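The "put" process with history described above can be sketched roughly as follows. This is only an illustration, not part of the original answer: it uses SQLite through Python's sqlite3 module so that it runs as-is, and the column names ValidFrom and ValidUntil as well as the helper move_mail are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Storage (id INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE Mail    (id INTEGER PRIMARY KEY);
CREATE TABLE Location (
    id         INTEGER PRIMARY KEY,
    mailID     INTEGER NOT NULL REFERENCES Mail(id),
    storageID  INTEGER REFERENCES Storage(id),  -- NULL = stored nowhere
    ValidFrom  TEXT NOT NULL,                   -- assumed column name
    ValidUntil TEXT                             -- NULL = current location
);
INSERT INTO Storage VALUES (1, 'Box1'), (2, 'Box2');
INSERT INTO Mail VALUES (1);
INSERT INTO Location (mailID, storageID, ValidFrom)
VALUES (1, 2, '2015-10-28 12:00');
""")


def move_mail(conn, mail_id, new_storage_id, when):
    """Close the current location row and open a new one (hypothetical helper)."""
    with conn:  # one transaction: both statements succeed or neither does
        conn.execute(
            "UPDATE Location SET ValidUntil = ? "
            "WHERE mailID = ? AND ValidUntil IS NULL",
            (when, mail_id),
        )
        conn.execute(
            "INSERT INTO Location (mailID, storageID, ValidFrom) "
            "VALUES (?, ?, ?)",
            (mail_id, new_storage_id, when),
        )


# Transfer mail 1 from Box2 to Box1; the old row is kept as history.
move_mail(conn, 1, 1, "2015-10-29 09:30")

history = conn.execute(
    "SELECT s.Name, l.ValidFrom, l.ValidUntil "
    "FROM Location l JOIN Storage s ON s.id = l.storageID "
    "WHERE l.mailID = 1 ORDER BY l.id"
).fetchall()
print(history)
```

The current location is always the row whose ValidUntil is NULL; everything else is the trail of earlier locations.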

Here is a complete model (air code, untested...)
CREATE TABLE StorageType(ID INT IDENTITY PRIMARY KEY
,StorageTypeName VARCHAR(100) NOT NULL);
INSERT INTO StorageType VALUES('Box'),('Case'),('OtherStorage');
CREATE TABLE Storage(ID INT IDENTITY PRIMARY KEY
,StorageTypeID INT NOT NULL CONSTRAINT FK_Storage_StorageTypeID FOREIGN KEY REFERENCES StorageType(ID)
,StorageName VARCHAR(100) NOT NULL
/*more columns to describe a storage*/
);
INSERT INTO Storage VALUES(1,'Box1'),(1,'Box2'),(3,'SomeStorage1');
CREATE TABLE ObjectType(ID INT IDENTITY PRIMARY KEY
,ObjectTypeName VARCHAR(100) NOT NULL);
INSERT INTO ObjectType VALUES('Mail'),('OtherItem');
CREATE TABLE MyObject(ID INT IDENTITY PRIMARY KEY
,MyObjectTypeID INT NOT NULL CONSTRAINT FK_MyObject_MyObjectTypeID FOREIGN KEY REFERENCES ObjectType(ID)
,MyObjectName VARCHAR(100) NOT NULL
/*more columns to describe an object*/
);
INSERT INTO MyObject VALUES(1,'Mail1'),(1,'Mail2'),(2,'SomeObject1');
CREATE TABLE StorageLocation(ID INT IDENTITY PRIMARY KEY
,CreatedOn DATETIME NOT NULL
,OutDate DATETIME NULL
,StorageID INT NOT NULL CONSTRAINT FK_StorageLocation_StorageID FOREIGN KEY REFERENCES Storage(ID)
,ContainerID INT NULL CONSTRAINT FK_StorageLocation_ContainerID FOREIGN KEY REFERENCES Storage(ID)
/*more columns to describe the "put storage" process: who, when, how long, ... */
);
INSERT INTO StorageLocation VALUES(GETDATE(),NULL,2,1); --puts the Box2 into Box1, current, because OutDate IS NULL
/*put all storages in their places*/
CREATE TABLE ObjectLocation (ID INT IDENTITY PRIMARY KEY
,CreatedOn DATETIME NOT NULL
,OutDate DATETIME NULL
,MyObjectID INT NOT NULL CONSTRAINT FK_ObjectLocation_MyObjectID FOREIGN KEY REFERENCES MyObject(ID)
,ContainerID INT NULL CONSTRAINT FK_ObjectLocation_ContainerID FOREIGN KEY REFERENCES Storage(ID)
/*more columns to describe the "put object" process: who, when, how long, ... */
);
INSERT INTO ObjectLocation VALUES(GETDATE(),NULL,1,2); --puts the Mail1 into Box2 (which is in Box1), current, because OutDate IS NULL
/*put all objects in their places*/
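To answer "where is Mail1 right now?" with this model, one option is a recursive query that starts at the object's current container and climbs through StorageLocation. The following is a rough sketch under a few assumptions: it is trimmed to the columns needed, runs on SQLite via Python's sqlite3 rather than SQL Server, and PATH_SQL is just a name picked for the example. As far as I know, SQL Server writes the same idea as WITH ... without the RECURSIVE keyword.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Storage (ID INTEGER PRIMARY KEY, StorageName TEXT NOT NULL);
CREATE TABLE StorageLocation (
    ID INTEGER PRIMARY KEY, OutDate TEXT,
    StorageID   INTEGER NOT NULL REFERENCES Storage(ID),
    ContainerID INTEGER REFERENCES Storage(ID));
CREATE TABLE ObjectLocation (
    ID INTEGER PRIMARY KEY, OutDate TEXT,
    MyObjectID  INTEGER NOT NULL,
    ContainerID INTEGER REFERENCES Storage(ID));
INSERT INTO Storage VALUES (1, 'Box1'), (2, 'Box2'), (3, 'SomeStorage1');
-- Box2 is currently inside Box1
INSERT INTO StorageLocation (OutDate, StorageID, ContainerID) VALUES (NULL, 2, 1);
-- Mail1 (object 1) is currently inside Box2
INSERT INTO ObjectLocation (OutDate, MyObjectID, ContainerID) VALUES (NULL, 1, 2);
""")

# Walk from the object's direct container up to the outermost container,
# following only current rows (OutDate IS NULL).
PATH_SQL = """
WITH RECURSIVE chain(StorageID, depth) AS (
    SELECT ContainerID, 1
    FROM ObjectLocation
    WHERE MyObjectID = ? AND OutDate IS NULL AND ContainerID IS NOT NULL
    UNION ALL
    SELECT sl.ContainerID, chain.depth + 1
    FROM chain
    JOIN StorageLocation sl
      ON sl.StorageID = chain.StorageID
     AND sl.OutDate IS NULL
     AND sl.ContainerID IS NOT NULL
)
SELECT s.StorageName
FROM chain JOIN Storage s ON s.ID = chain.StorageID
ORDER BY chain.depth
"""

path = [name for (name,) in conn.execute(PATH_SQL, (1,))]
print(path)  # innermost container first
```

Each row only stores the direct container, yet the full path can still be reconstructed on demand.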

Related

When creating a foreign key do I also need to include the fields from the linked table?

Apologies, this is quite a fundamental part of SQL and I'm sure it's a question that has been asked before, I just can't find an example that relates to mine enough for me to understand it. Almost all the resources I find are things like employee/manager relationships which I think I understand but can't apply to my situation!
I am using MSSQL Server 2008. I am creating a database which has 3 tables. It will be used for a simple web application that allows people to record which cars they have double parked in front of (thus blocking in), so that the owner of the blocked car knows who to contact if they want to move the car.
Cars - Details about people's cars.
People - who can own at least one car.
ParkedCars - This is where I'm getting stuck.
See below:
CREATE TABLE dbo.Cars
(
Pk_Car_Id INT PRIMARY KEY,
Manufacturer VARCHAR(55),
Model VARCHAR(55),
Colour VARCHAR(50),
RegistrationNo VARCHAR(10)
);
CREATE TABLE dbo.People
(
Pk_People_Id INT PRIMARY KEY,
FirstName VARCHAR(55),
LastName VARCHAR(55),
Extension INT,
Fk_Car_Id INT FOREIGN KEY REFERENCES Cars(Pk_Car_Id)
);
CREATE TABLE dbo.ParkedCars
(
PK_ParkedCars_Id INT PRIMARY KEY,
FK_Car_Id INT FOREIGN KEY REFERENCES dbo.Cars(Pk_Car_Id),
FK_People_Id INT FOREIGN KEY REFERENCES dbo.People(Pk_People_Id),
DateParked datetime
);
My question is - when creating the ParkedCars table, do I need to reference things like people.FirstName or Cars.RegistrationNo as a column of their own? Or can I do what I have done above and just create foreign key columns? So the table may look something like this when populated with data:
+------------------+-----------+--------------+--------------------+
| PK_ParkedCars_Id | FK_Car_Id | FK_People_Id | DateParked |
+------------------+-----------+--------------+--------------------+
| 2 | 1 | 5 | 19/2/2016 08:33:00 |
+------------------+-----------+--------------+--------------------+
| 3 | 4 | 2 | 19/2/2016 08:48:33 |
+------------------+-----------+--------------+--------------------+
Then I can just select the relevant fields from each table and display the results.
I did try a similar method but every time I tried to insert any data, it got stuck because one of the keys was set to null while inserting that particular row. It didn't allow any further data to be inserted into any table.
Please can someone explain the best way to tackle this kind of thing?
Thanks
As @Arvo stated, your schema looks good.
(Maybe with minor tweaks: for example, I'm not sure you need the FK_People_Id field in the ParkedCars table, since the People table already relates to the Cars table.)
If you're having trouble with NULL keys, I would suggest looking into OUTER JOINs, if you're not already familiar with them, and seeing whether that helps.
Also, when you write a SELECT statement that involves the ParkedCars table, you'll probably want to join the Cars table TWICE!
Here's an example, (hope it'll work):
SELECT parked.PK_ParkedCars_Id, c1.RegistrationNo, p1.FirstName, p1.LastName,
parked.FK_Car_Id, c2.RegistrationNo, p2.FirstName, p2.LastName
FROM dbo.ParkedCars AS parked
-- the person referenced by the row, and that person's own car
LEFT OUTER JOIN dbo.People p1
ON parked.FK_People_Id = p1.Pk_People_Id
LEFT OUTER JOIN dbo.Cars c1
ON p1.Fk_Car_Id = c1.Pk_Car_Id
-- the car referenced by the row, and that car's owner
LEFT OUTER JOIN dbo.Cars c2
ON parked.FK_Car_Id = c2.Pk_Car_Id
LEFT OUTER JOIN dbo.People p2
ON c2.Pk_Car_Id = p2.Fk_Car_Id;
Based on the comments, it looks like you have to redesign your schema somewhat.
First, don't put FK_Car_ID in the People table; put FK_People_ID in the Cars table instead. This handles someone having multiple cars; the opposite situation (a car with multiple owners) is unlikely. If you do need to record all possible users of a car, then you have to create a junction table (People_Car_Relation) linking to both car and people.
Second, you don't need FK_People_ID in the ParkedCars table at all; it is enough to link to the car only.
Third, create primary key fields as identity(1,1); this way they fill themselves and you don't have to calculate ID values beforehand.
Then remember: never use an insert statement without a specified field list. Currently you are attempting to insert a record with a specific (duplicate) primary key value, which fails.
Always use the following syntax for inserts:
insert into dbo.parkedcars(fk_car_id, fk_people_id, dateparked) values(...)
this way you can be sure that your SQL code does what you intend it to do.
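Put together, the three suggestions could look roughly like the sketch below. It is an illustration only: it uses SQLite through Python's sqlite3 (where an INTEGER PRIMARY KEY is filled in automatically, playing the role of identity(1,1)), the schema is trimmed to a few columns, and the sample person and registration number are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
CREATE TABLE People (
    Pk_People_Id INTEGER PRIMARY KEY,      -- auto-assigned
    FirstName TEXT, LastName TEXT, Extension INTEGER);
CREATE TABLE Cars (
    Pk_Car_Id INTEGER PRIMARY KEY,
    RegistrationNo TEXT,
    Fk_People_Id INTEGER REFERENCES People(Pk_People_Id));  -- the owner
CREATE TABLE ParkedCars (
    Pk_ParkedCars_Id INTEGER PRIMARY KEY,
    Fk_Car_Id INTEGER NOT NULL REFERENCES Cars(Pk_Car_Id),
    DateParked TEXT);
""")

# Every INSERT names its columns and leaves the key to the database.
with conn:
    owner_id = conn.execute(
        "INSERT INTO People (FirstName, LastName, Extension) VALUES (?, ?, ?)",
        ("Ann", "Example", 1234),
    ).lastrowid
    car_id = conn.execute(
        "INSERT INTO Cars (RegistrationNo, Fk_People_Id) VALUES (?, ?)",
        ("AB12 CDE", owner_id),
    ).lastrowid
    conn.execute(
        "INSERT INTO ParkedCars (Fk_Car_Id, DateParked) VALUES (?, ?)",
        (car_id, "2016-02-19 08:33:00"),
    )

# The owner is reached through the car, so ParkedCars needs no people FK.
contact = conn.execute("""
    SELECT c.RegistrationNo, p.FirstName, p.LastName, p.Extension
    FROM ParkedCars pc
    JOIN Cars c   ON c.Pk_Car_Id = pc.Fk_Car_Id
    JOIN People p ON p.Pk_People_Id = c.Fk_People_Id
""").fetchall()
print(contact)
```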

Can a table title be a Primary Key?

I'm trying to retrofit some tables to an existing database. The existing database has equipment numbers and I'm trying to add in tables with more information on that equipment. Ideally, I'd like to make the table titles the ID numbers and set those as the PK of that table, making the ID numbers in the equipment list the FK.
Is it possible to set the table title as the PK? Here's an example of one of the tables, and I'd like to make "E0111" the PK.
CREATE TABLE E0111(
EQUIPMENT Varchar(200),
MAINTENANCE varchar(200),
CYCLE varchar(200)
);
No you can't do this because the primary key needs to be unique for every row of your table. If you "could" use the table name as the primary key it would be the same for every row.
You should use a unique column in your table as the primary key.
Also, I have no idea how you could achieve this with SQLite or any DBMS.
First thing before I even get anywhere near answering the question about the table names being primary keys, we need to take a step back.
You should NOT have a table for each piece of equipment.
You need an Equipment table, that will store all of your pieces of Equipment together. I assume you have that already in the existing database.
Hopefully it is keyed with a Unique Identifier AND an Equipment Number. The reason for having a separate Unique Identifier, is that your database server uses this for referential integrity and performance - this is not a value that you should show or use anywhere other than inside the database and between your database and whatever application you are using to modify the database. It should not typically be shown to the user.
The Equipment Number is the one you are familiar with (ie 'E0111'), because this is shown to the User and marked on reports etc. The two have different purposes and needs, so should not be combined into a single value.
I will take a stab at what your Equipment table may look like:
EquipmentId int -- database Id - used for primary key
EquipmentName Varchar(200) -- human readable
EquipmentDescription Text
PurchaseDate DateTime
SerialNumber VarChar(50)
Model Varchar(200)
etc..
To then add the Maintenance Cycle table as you propose above it would look like:
MaintenanceId int -- database Id - used for primary key this time for the maintenance table.
EquipmentId int -- foreign key - references the equipment table
MaintenanceType Varchar(200)
DatePerformed DateTime
MaintenanceResults VarChar(200)
NextMaintenanceDate DateTime
To get the results about the Maintenance Cycle for all equipment, you then JOIN the tables on the 2 EquipmentIds, ie
SELECT EquipmentName, EquipmentDescription, SerialNumber, MaintenanceType, DatePerformed
FROM Equipment
JOIN MaintenanceCycle ON Equipment.EquipmentId = MaintenanceCycle.EquipmentId
WHERE EquipmentName = 'E0111'
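A minimal runnable version of this two-table layout might look like the following. It is a sketch under assumptions: SQLite via Python's sqlite3, only a handful of the columns, an explicit EquipmentNumber column for the user-facing 'E0111' value described above, and made-up sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Equipment (
    EquipmentId     INTEGER PRIMARY KEY,    -- internal key, never shown
    EquipmentNumber TEXT NOT NULL UNIQUE,   -- user-facing, e.g. 'E0111'
    EquipmentName   TEXT);
CREATE TABLE MaintenanceCycle (
    MaintenanceId   INTEGER PRIMARY KEY,
    EquipmentId     INTEGER NOT NULL REFERENCES Equipment(EquipmentId),
    MaintenanceType TEXT,
    DatePerformed   TEXT);
-- illustrative data only
INSERT INTO Equipment (EquipmentNumber, EquipmentName) VALUES ('E0111', 'Pump');
INSERT INTO MaintenanceCycle (EquipmentId, MaintenanceType, DatePerformed)
VALUES (1, 'Oil change', '2016-01-15'), (1, 'Inspection', '2016-02-15');
""")

# One shared table per concept; the equipment number is data, not a table name.
rows = conn.execute("""
    SELECT e.EquipmentNumber, m.MaintenanceType, m.DatePerformed
    FROM Equipment e
    JOIN MaintenanceCycle m ON m.EquipmentId = e.EquipmentId
    WHERE e.EquipmentNumber = ?
    ORDER BY m.DatePerformed
""", ("E0111",)).fetchall()
print(rows)
```

Adding equipment E0112 then means inserting a row, not creating another table.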
You cannot make the name of the table a primary key.
All primary keys should be unique, and a primary key is a table column, not the table name. This is the general rule of thumb for a primary key. There are plenty of resources on the internet about primary keys.
Here are just a few:
http://www.w3schools.com/sql/sql_primarykey.asp
http://database-programmer.blogspot.co.uk/2008/01/database-skills-sane-approach-to.html
The name of a table should describe what is held within it. Think of the table as a drawer that you label with what it contains.
In my humble opinion, an equipment ID should appear as a label only on the equipment itself. In your database, it should be the Equipments table that holds the ID of each piece of equipment you have.
Then, if you have other equipment-related information to save, add another table for that kind of information, with the ID of the related equipment the information belongs to.
For example, let's say we hold a maintenance schedule over your equipment.
Equipments
-----------------------------------------------------
Id | Description | Location | Brand | Model
-----------------------------------------------------
1 | Printer/Copier| 1st Floor | HP | PSC1000
Maintenances
---------------------------------------------------------
Id | Description | Date | EquipmentId | EmployeeId
---------------------------------------------------------
Note that the EmployeeId column shall be there only if one requires to know who did what maintenance and when, for instance.
MaintenanceCycles
--------------------------------------------
Id | Code | Description | EquipmentId
--------------------------------------------
1 | M | Monthly | 1
This way, every piece of equipment can have its own cycle, or even multiple cycles if required. This gives you the flexibility you need for future changes.

Which one is best practice for database table having two columns...one for Item1 and second for Item2?

Which one is best practice for database table having two columns...one for Item1 and second for Item2?
Table Structure
Item1 | Item2
Apple | Orange
Pen | Paper
OR
ID | Item1 | Item2
1 | Apple | Orange
2 | Pen | Paper
In short, I want to know whether it is good practice to add a primary key column/field (ID) to a table even when its other columns are allowed to hold the same values more than once.
You should have a primary key. My guess from the extremely limited info you have posted is that neither of your fields will count as a primary key. Therefore, you need an id field.
(note: You could do it without, but it's a bad idea)
Well, having a numeric ID of the record in the table, making that ID a Primary Key for the table, is a preferred way to go:
comparison operations on numbers are generally faster than on strings (provided the number fits into a CPU integer or long value; see the INTEGER SQL type);
you will have your database consistent in case you'll want to rename "Apple" to an "apple" or, maybe, "APPLE" one day. Without the extra ID column, you'll have to update all the dependent tables (in case you're planning to have a Primary/Foreign keys of course).
you will have only one column in your Primary Key even for cases when you'll have both, Orange and Green apples, without the extra ID you'd have to make Primary as (Item1, Item2).
Sounds like you might need more than one table, but it really depends on what you need to store and why. Try designing your model first, then create a data structure that supports your model.
To answer your question, it looks like the second table structure is the "best practice" (or should I say 'better practice'), as it contains an ID column that can act as the primary key for indexing, but it really depends on usage. Consider the following:
One possible data structure (separates by type)
table - Fruit
=============
FruitId int not null identity(1,1) primary key
Name varchar(100) not null
table - OfficeSupply
====================
OfficeSupplyId int not null identity(1,1) primary key
Name varchar(100) not null
Another possibility (combined with a type column)
table - Item
============
ItemId int not null identity(1,1) primary key
Name varchar(100) not null
Type varchar(100) not null

SQL - Properties Structure

What is the best way to setup this table structure.
I have 3 tables, one table we'll call fruit and the other two tables are properties of that fruit so fruit_detailed and fruit_basic.
fruit
id | isDetailed
fruit_detailed
id | price | color | source | weight | fruitid?
fruit_basic
id | value | fruitid?
So what I want to do is have a property in fruit called isDetailed and if true, fill the fruit_detailed table with properties like color, weight, source, etc (multiple column). If its false then store in fruit_basic table with properties written in a single row.
Storage sounds quite basic, but if I want to select a fruit and get its properties, how can I determine which table to join? I could use an IF statement on the isDetailed property and then join accordingly, but then you have two different sets of properties coming back.
How would you create the tables or do the join to get the properties? Am I missing something?
Personally, I see no need to split the basic and detailed attributes out into separate tables. I think they can/should all be columns of the main fruit table.
I would probably model this like so:
CREATE TABLE Fruits (
fruit_id INT NOT NULL,
CONSTRAINT PK_Fruit PRIMARY KEY CLUSTERED (fruit_id)
)
CREATE TABLE Fruit_Details (
fruit_id INT NOT NULL,
price MONEY NOT NULL,
color VARCHAR(20) NOT NULL,
source VARCHAR(20) NOT NULL,
weight DECIMAL(10, 4) NOT NULL,
CONSTRAINT PK_Fruit_Detail PRIMARY KEY CLUSTERED (fruit_id),
CONSTRAINT FK_Fruit_Detail_Fruit FOREIGN KEY (fruit_id) REFERENCES Fruits (fruit_id)
)
I had to guess on appropriate data types for some of the columns. I'm also not sure exactly what the "value" column is in your Fruit_Basic table, so I've left that out for now.
Don't bother putting a bunch of IDs out there simply for the sake of having an ID column on every table. The Fruits->Fruit_Details relationship is a one-to-zero-or-one relationship. In other words, you can have at most one Fruit_Details row for each Fruits row. In some cases you might have no row in Fruit_Details for a particular row in Fruits.
When you're querying you can simply OUTER JOIN from the Fruits table to the Fruit_Details table. If you get back a NULL value for Fruit_Details.fruit_id then you know that the fruit doesn't have any details. You can always include the Fruit_Details columns, they'll just be NULL if the row doesn't exist. That way you can always have homogeneous resultsets. As you've discovered, otherwise you end up having to worry about different column lists coming back depending on the row in question, which will lead to tons of headaches.
If you want to include an "isDetailed" column then you can just use this:
CASE WHEN Fruit_Details.fruit_id IS NULL THEN 0 ELSE 1 END AS isDetailed
This approach also has an advantage over putting all of the columns in one table because it lowers the number of NULL columns in your database and depending on your data can substantially decrease storage requirements and improve performance.
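The OUTER JOIN with a derived isDetailed flag can be tried out with a small sketch like this one. It assumes SQLite via Python's sqlite3 (so MONEY and DECIMAL become NUMERIC) and uses made-up sample rows: fruit 1 has details, fruit 2 does not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Fruits (fruit_id INTEGER PRIMARY KEY);
CREATE TABLE Fruit_Details (
    fruit_id INTEGER PRIMARY KEY REFERENCES Fruits(fruit_id),
    price  NUMERIC NOT NULL,
    color  TEXT    NOT NULL,
    source TEXT    NOT NULL,
    weight NUMERIC NOT NULL);
-- illustrative data only
INSERT INTO Fruits VALUES (1), (2);
INSERT INTO Fruit_Details VALUES (1, 0.5, 'red', 'farm', 0.2);
""")

# Same column list for every fruit; missing details simply come back as NULL.
rows = conn.execute("""
    SELECT f.fruit_id,
           CASE WHEN d.fruit_id IS NULL THEN 0 ELSE 1 END AS isDetailed,
           d.price,
           d.color
    FROM Fruits f
    LEFT OUTER JOIN Fruit_Details d ON d.fruit_id = f.fruit_id
    ORDER BY f.fruit_id
""").fetchall()
print(rows)
```

Because isDetailed is computed from the join, it cannot drift out of sync with the details table the way a stored flag could.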
I'm not sure why you would need to store a basic or detailed list of the fruit in different tables. You should just have 1 table and then leave some of the fields null if the information doesn't exist.
Assuming that value from fruit_basic is the same as price from fruit_detailed, you'd have something like this.
fruit
id | detail_id (fk to fruit_details table)
fruit_details
detail_id | price | color | source | weight

Design question: Filterable attributes, SQL

I have two tables in my database, Operation and Equipment. An operation requires zero or more attributes. However, there's some logic in how the attributes are attributed:
Operation Foo requires equipment A and B
Operation Bar requires no equipment
Operation Baz requires equipment B and either C or D
Operation Quux requires equipment (A or B) and (C or D)
What's the best way to represent this in SQL?
I'm sure people have done this before, but I have no idea where to start.
(FWIW, my application is built with Python and Django.)
Update 1: There will be around a thousand Operation rows and about thirty Equipment rows. The information is coming in CSV form similar to the description above: Quux, (A & B) | (C & D)
Update 2: The level of conjunctions & disjunctions shouldn't be too deep. The Quux example is probably the most complicated, though there appears to be a A | (D & E & F) case.
Think about how you'd model the operations in OO design: the operations would be subclasses of a common superclass Operation. Each subclass would have mandatory object members for the respective equipment required by that operation.
The way to model this with SQL is Class Table Inheritance. Create a common super-table:
CREATE TABLE Operation (
operation_id SERIAL PRIMARY KEY,
operation_type CHAR(1) NOT NULL,
UNIQUE KEY (operation_id, operation_type),
FOREIGN KEY (operation_type) REFERENCES OperationTypes(operation_type)
);
Then for each operation type, define a sub-table with a column for each required equipment type. For example, OperationFoo has a column for each of equipA and equipB. Since they are both required, the columns are NOT NULL. Constrain them to the correct types by creating a Class Table Inheritance super-table for equipment too.
CREATE TABLE OperationFoo (
operation_id INT PRIMARY KEY,
operation_type CHAR(1) NOT NULL CHECK (operation_type = 'F'),
equipA INT NOT NULL,
equipB INT NOT NULL,
FOREIGN KEY (operation_id, operation_type)
REFERENCES Operation(operation_id, operation_type),
FOREIGN KEY (equipA) REFERENCES EquipmentA(equip_id),
FOREIGN KEY (equipB) REFERENCES EquipmentB(equip_id)
);
Table OperationBar requires no equipment, so it has no equip columns:
CREATE TABLE OperationBar (
operation_id INT PRIMARY KEY,
operation_type CHAR(1) NOT NULL CHECK (operation_type = 'B'),
FOREIGN KEY (operation_id, operation_type)
REFERENCES Operation(operation_id, operation_type)
);
Table OperationBaz has one required equipment equipA, and then at least one of equipB and equipC must be NOT NULL. Use a CHECK constraint for this:
CREATE TABLE OperationBaz (
operation_id INT PRIMARY KEY,
operation_type CHAR(1) NOT NULL CHECK (operation_type = 'Z'),
equipA INT NOT NULL,
equipB INT,
equipC INT,
FOREIGN KEY (operation_id, operation_type)
REFERENCES Operation(operation_id, operation_type),
FOREIGN KEY (equipA) REFERENCES EquipmentA(equip_id),
FOREIGN KEY (equipB) REFERENCES EquipmentB(equip_id),
FOREIGN KEY (equipC) REFERENCES EquipmentC(equip_id),
CHECK (COALESCE(equipB, equipC) IS NOT NULL)
);
Likewise in table OperationQuux you can use a CHECK constraint to make sure at least one equipment resource of each pair is non-null:
CREATE TABLE OperationQuux (
operation_id INT PRIMARY KEY,
operation_type CHAR(1) NOT NULL CHECK (operation_type = 'Q'),
equipA INT,
equipB INT,
equipC INT,
equipD INT,
FOREIGN KEY (operation_id, operation_type)
REFERENCES Operation(operation_id, operation_type),
FOREIGN KEY (equipA) REFERENCES EquipmentA(equip_id),
FOREIGN KEY (equipB) REFERENCES EquipmentB(equip_id),
FOREIGN KEY (equipC) REFERENCES EquipmentC(equip_id),
FOREIGN KEY (equipD) REFERENCES EquipmentD(equip_id),
CHECK (COALESCE(equipA, equipB) IS NOT NULL AND COALESCE(equipC, equipD) IS NOT NULL)
);
This may seem like a lot of work. But you asked how to do it in SQL. The best way to do it in SQL is to use declarative constraints to model your business rules. Obviously, this requires that you create a new sub-table every time you create a new operation type. This is best when the operations and business rules never (or hardly ever) change. But this may not fit your project requirements. Most people say, "but I need a solution that doesn't require schema alterations."
Most developers probably don't do Class Table Inheritance. More commonly, they just use a one-to-many table structure like other people have mentioned, and implement the business rules solely in application code. That is, your application contains the code to insert only the equipment appropriate for each operation type.
The problem with relying on the app logic is that it can contain bugs and might insert data that doesn't satisfy the business rules. The advantage of Class Table Inheritance is that with well-designed constraints, the RDBMS enforces data integrity consistently. You have assurance that the database literally can't store incorrect data.
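To see the database rejecting a rule violation, here is a small sketch of the OperationQuux CHECK constraint on its own. It assumes SQLite via Python's sqlite3, leaves out the foreign keys to keep the example short, and try_insert is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE OperationQuux (
    operation_id INTEGER PRIMARY KEY,
    equipA INTEGER,
    equipB INTEGER,
    equipC INTEGER,
    equipD INTEGER,
    -- (A or B) and (C or D): at least one of each pair must be present
    CHECK (COALESCE(equipA, equipB) IS NOT NULL
       AND COALESCE(equipC, equipD) IS NOT NULL)
)""")


def try_insert(a, b, c, d):
    """Return True if the row is accepted, False if the CHECK rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO OperationQuux (equipA, equipB, equipC, equipD) "
                "VALUES (?, ?, ?, ?)",
                (a, b, c, d),
            )
        return True
    except sqlite3.IntegrityError:
        return False


accepted = try_insert(1, None, None, 4)  # A and D: both pairs covered
rejected = try_insert(1, 2, None, None)  # neither C nor D supplied
stored = conn.execute("SELECT COUNT(*) FROM OperationQuux").fetchone()[0]
print(accepted, rejected, stored)
```

No application code has to remember the rule; the second insert fails regardless of which program attempts it.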
But this can also be limiting, for instance if your business rules change and you need to adjust the data. The common solution in this case is to write a script to dump all the data out, change your schema, and then reload the data in the form that is now allowed (Extract, Transform, and Load = ETL).
So you have to decide: do you want to code this in the app layer, or the database schema layer? There are legitimate reasons to use either strategy, but it's going to be complex either way.
Re your comment: You seem to be talking about storing expressions as strings in data fields. I recommend against doing that. The database is for storing data, not code. You can do some limited logic in constraints or triggers, but code belongs in your application.
If you have too many operations to model in separate tables, then model it in application code. Storing expressions in data columns and expecting SQL to use them for evaluating queries would be like designing an application around heavy use of eval().
I think you should have either a one-to-many or many-to-many relationship between Operation and Equipment, depending on whether there is one Equipment entry per piece of equipment, or per equipment type.
I would advise against putting business logic into your database schema, as business logic is subject to change and you'd rather not have to change your schema in response.
Looks like you'll need to be able to group certain equipment together as either conjunction or disjunction and combine these groups together...
OperationEquipmentGroup
id int
operation_id int
is_conjunction bit
OperationEquipment
id int
operation_equipment_group_id int
equipment_id int
You can add ordering columns if that is important, and maybe another column on the group table to specify how groups are combined (this only makes sense if they are ordered). But, judging by your examples, it looks like groups are only ever combined by conjunction.
Since Operations can have one or more piece of equipment, you should use a linking table. Your schema would be like this:
Operation
ID
othercolumn
Equipment
ID
othercolumn
Operation_Equipment_Link
OperationID
EquipmentID
The two fields in the third table can be set up as a composite primary key, so you don't need a third field and can more easily keep duplicates out of the table.
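As a rough illustration of the composite primary key on the link table (SQLite via Python's sqlite3, with a Name column standing in for "othercolumn" and sample rows made up for the example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Operation (ID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Equipment (ID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Operation_Equipment_Link (
    OperationID INTEGER NOT NULL REFERENCES Operation(ID),
    EquipmentID INTEGER NOT NULL REFERENCES Equipment(ID),
    PRIMARY KEY (OperationID, EquipmentID));   -- no separate ID column needed
INSERT INTO Operation VALUES (1, 'Foo');
INSERT INTO Equipment VALUES (1, 'A'), (2, 'B');
INSERT INTO Operation_Equipment_Link VALUES (1, 1), (1, 2);
""")

# Linking the same pair a second time violates the composite key.
try:
    conn.execute("INSERT INTO Operation_Equipment_Link VALUES (1, 1)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

foo_equipment = [name for (name,) in conn.execute("""
    SELECT e.Name
    FROM Operation_Equipment_Link l
    JOIN Equipment e ON e.ID = l.EquipmentID
    WHERE l.OperationID = 1
    ORDER BY e.Name
""")]
print(duplicate_rejected, foo_equipment)
```

Note that this layout only records which equipment belongs to an operation; it does not by itself say whether the items are alternatives (or) or all required (and).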
In addition to Nicholai's suggestion I solved a similar problem as following:
Table Operation has an additional field "OperationType"
Table Equipment has an additional field "EquipmentType"
I have an additional table "DefaultOperationEquipmentType" specifying which EquipmentType needs to be included with each OperationType, e.g.
OperationType EquipmentType
==============.=============.
Foo_Type A_Type
Foo_Type B_Type
Baz_Type B_Type
Baz_Type C_Type
My application doesn't need complex conditions like (A or B) because in my business logic both alternative equipments belong to the same type of equipment, e.g. in a PC environment I could have an equipment Mouse (A) or Trackball (B), but they both belong to EquipmentType "PointingDevice_Type"
Hope that helps
Be Aware I have not tested this in the wild. That being said, the best* way I can see to do a mapping is with a denormalized table for the grouping.
*(aside from Bill's way, which is hard to set up, but masterful when done correctly)
Operations:
--------------------
Op_ID int not null pk
Op_Name varchar 500
Equipment:
--------------------
Eq_ID int not null pk
Eq_Name varchar 500
Total_Available int
Group:
--------------------
Group_ID int not null pk
-- Here you have a choice. You can either:
-- Not recommended
Equip varchar(500) --Stores a list of EQ_ID's {1, 3, 15}
-- Recommended
Eq_ID_1 bit
Eq_1_Total_Required
Eq_ID_2 bit
Eq_2_Total_Required
Eq_ID_3 bit
Eq_3_Total_Required
-- ... etc.
Operations_to_Group_Mapping:
--------------------
Group_ID int not null frk
Op_ID int not null frk
Thus, in case X: A | (D & E & F)
Operations:
--------------------
Op_ID Op_Name
1 X
Equipment:
--------------------
Eq_ID Eq_Name Total_Available
1 A 5
-- ... snip ...
22 D 15
23 E 0
24 F 2
Group:
--------------------
Group_ID Eq_ID_1 Eq_1_Total_Required -- ... etc. ...
1 TRUE 3
-- ... snip ...
2 FALSE 0
Operations_to_Group_Mapping:
--------------------
Group_ID Op_ID
1 1
2 1
As loath as I am to put recursive (tree) structures in SQL, it sounds like this is really what you're looking for. I would use something modeled like this:
Operation
----------------
OperationID PK
RootEquipmentGroupID FK -> EquipmentGroup.EquipmentGroupID
...
Equipment
----------------
EquipmentID PK
...
EquipmentGroup
----------------
EquipmentGroupID PK
LogicalOperator
EquipmentGroupEquipment
----------------
EquipmentGroupID | (also FK -> EquipmentGroup.EquipmentGroupID)
EntityType | PK (all 3 columns)
EntityID | (not FK, but references either Equipment.EquipmentID
or EquipmentGroup.EquipmentGroupID)
Now that I've put forth an arguably ugly schema, allow me to explain a bit...
Every equipment group can either be an and group or an or group (as designated by the LogicalOperator column). The members of each group are defined in the EquipmentGroupEquipment table, with EntityID referencing either Equipment.EquipmentID or another EquipmentGroup.EquipmentGroupID, the target being determined by the value in EntityType. This will allow you to compose a group that consists of equipment or other groups.
This will allow you to represent something as simple as "requires equipment A", which would look like this:
EquipmentGroupID LogicalOperator
--------------------------------------------
1 'AND'
EquipmentGroupID EntityType EntityID
--------------------------------------------
1 1 'A'
...all the way to your "A | (D & E & F)", which would look like this:
EquipmentGroupID LogicalOperator
--------------------------------------------
1 'OR'
2 'AND'
EquipmentGroupID EntityType EntityID
--------------------------------------------
1 1 'A'
1 2 2 -- group ID 2
2 1 'D'
2 1 'E'
2 1 'F'
(I realize that I've mixed data types in the EntityID column; this is just to make it clearer. Obviously you wouldn't do this in an actual implementation)
This would also allow you to represent structures of arbitrary complexity. While I realize that you (correctly) don't wish to overarchitect the solution, I don't think you can really get away with less without breaking 1NF (by combining multiple equipment into a single column).
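To check whether a given set of equipment satisfies such a group tree, the application can walk it recursively. The sketch below is one possible way to do that, not the only one: it loads the "A | (D & E & F)" rows from above into SQLite via Python's sqlite3, keeps EntityID as text purely to mirror the sample data, and satisfied is a hypothetical helper name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EquipmentGroup (
    EquipmentGroupID INTEGER PRIMARY KEY,
    LogicalOperator  TEXT NOT NULL
        CHECK (LogicalOperator IN ('AND', 'OR')));
CREATE TABLE EquipmentGroupEquipment (
    EquipmentGroupID INTEGER NOT NULL
        REFERENCES EquipmentGroup(EquipmentGroupID),
    EntityType INTEGER NOT NULL,   -- 1 = equipment, 2 = nested group
    EntityID   TEXT    NOT NULL,
    PRIMARY KEY (EquipmentGroupID, EntityType, EntityID));
INSERT INTO EquipmentGroup VALUES (1, 'OR'), (2, 'AND');
INSERT INTO EquipmentGroupEquipment VALUES
    (1, 1, 'A'), (1, 2, '2'),
    (2, 1, 'D'), (2, 1, 'E'), (2, 1, 'F');
""")


def satisfied(conn, group_id, available):
    """Evaluate a group against the set of available equipment IDs."""
    (operator,) = conn.execute(
        "SELECT LogicalOperator FROM EquipmentGroup "
        "WHERE EquipmentGroupID = ?",
        (group_id,),
    ).fetchone()
    members = conn.execute(
        "SELECT EntityType, EntityID FROM EquipmentGroupEquipment "
        "WHERE EquipmentGroupID = ?",
        (group_id,),
    ).fetchall()
    results = [
        entity_id in available
        if entity_type == 1
        else satisfied(conn, int(entity_id), available)  # recurse into subgroup
        for entity_type, entity_id in members
    ]
    return all(results) if operator == "AND" else any(results)


print(satisfied(conn, 1, {"A"}))            # A alone is enough
print(satisfied(conn, 1, {"D", "E"}))       # F is missing
print(satisfied(conn, 1, {"D", "E", "F"}))  # the AND group is complete
```

With around a thousand operations and shallow nesting, one small query per group is likely acceptable; a single recursive query would be the next step if it is not.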
From what I understood, you want to store the equipment in relation to the operations in a way that lets you apply your business logic to it later. In that case you'll need 3 tables:
Operations:
ID
name
Equipment:
ID
name
Operations_Equipment:
equipment_id
operation_id
symbol
Where symbol is A, B, C, etc...
If you have the condition like (A & B) | (C & D) you can know which equipment is which easily.