SQL - Properties Structure

What is the best way to set up this table structure?
I have 3 tables, one table we'll call fruit and the other two tables are properties of that fruit so fruit_detailed and fruit_basic.
fruit
id | isDetailed
fruit_detailed
id | price | color | source | weight | fruitid?
fruit_basic
id | value | fruitid?
So what I want to do is have a property in fruit called isDetailed and, if true, fill the fruit_detailed table with properties like color, weight, source, etc. (multiple columns). If it's false, then store the properties in the fruit_basic table, written in a single row.
Storage sounds quite basic, but if I want to select a fruit and get its properties, how can I determine which table to join? I could use an IF statement on the isDetailed property and then join like that, but then you have two different types of properties coming back.
How would you create the tables or do the join to get the properties? Am I missing something?

Personally, I see no need to split the basic and detailed attributes out into separate tables. I think they can/should all be columns of the main fruit table.

I would probably model this like so:
CREATE TABLE Fruits (
fruit_id INT NOT NULL,
CONSTRAINT PK_Fruit PRIMARY KEY CLUSTERED (fruit_id)
)
CREATE TABLE Fruit_Details (
fruit_id INT NOT NULL,
price MONEY NOT NULL,
color VARCHAR(20) NOT NULL,
source VARCHAR(20) NOT NULL,
weight DECIMAL(10, 4) NOT NULL,
CONSTRAINT PK_Fruit_Detail PRIMARY KEY CLUSTERED (fruit_id),
CONSTRAINT FK_Fruit_Detail_Fruit FOREIGN KEY (fruit_id) REFERENCES Fruits (fruit_id)
)
I had to guess on appropriate data types for some of the columns. I'm also not sure exactly what the "value" column is in your Fruit_Basic table, so I've left that out for now.
Don't bother putting a bunch of IDs out there simply for the sake of having an ID column on every table. The Fruits->Fruit_Details relationship is a one-to-zero-or-one relationship. In other words, you can have at most one Fruit_Details row for each Fruits row. In some cases you might have no row in Fruit_Details for a particular row in Fruits.
When you're querying you can simply OUTER JOIN from the Fruits table to the Fruit_Details table. If you get back a NULL value for Fruit_Details.fruit_id then you know that the fruit doesn't have any details. You can always include the Fruit_Details columns, they'll just be NULL if the row doesn't exist. That way you can always have homogeneous resultsets. As you've discovered, otherwise you end up having to worry about different column lists coming back depending on the row in question, which will lead to tons of headaches.
If you want to include an "isDetailed" column then you can just use this:
CASE WHEN Fruit_Details.fruit_id IS NULL THEN 0 ELSE 1 END AS isDetailed
This approach also has an advantage over putting all of the columns in one table because it lowers the number of NULL columns in your database and depending on your data can substantially decrease storage requirements and improve performance.
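The outer join described above can be tried out with SQLite through Python's built-in sqlite3 module. This is only a minimal sketch: the data types are adapted to SQLite (REAL instead of MONEY) and the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Fruits (
    fruit_id INTEGER NOT NULL PRIMARY KEY
);
CREATE TABLE Fruit_Details (
    fruit_id INTEGER NOT NULL PRIMARY KEY REFERENCES Fruits (fruit_id),
    price    REAL NOT NULL,   -- MONEY in the answer; SQLite has no MONEY type
    color    TEXT NOT NULL,
    source   TEXT NOT NULL,
    weight   REAL NOT NULL
);
INSERT INTO Fruits (fruit_id) VALUES (1), (2);
-- Hypothetical sample data: only fruit 1 has a details row.
INSERT INTO Fruit_Details VALUES (1, 0.50, 'red', 'farm', 0.2);
""")

# Every fruit comes back with the same column list; missing details are NULL.
rows = conn.execute("""
    SELECT f.fruit_id,
           CASE WHEN d.fruit_id IS NULL THEN 0 ELSE 1 END AS isDetailed,
           d.price,
           d.color
    FROM Fruits AS f
    LEFT OUTER JOIN Fruit_Details AS d ON d.fruit_id = f.fruit_id
    ORDER BY f.fruit_id
""").fetchall()

for row in rows:
    print(row)
```

Fruit 2 has no row in Fruit_Details, so its detail columns come back as None and isDetailed is 0, while the result set keeps one shape for every fruit.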

I'm not sure why you would need to store a basic or detailed list of the fruit in different tables. You should just have 1 table and then leave some of the fields null if the information doesn't exist.
Assuming that value from fruit_basic is the same as price from fruit_detailed, you'd have something like this.
fruit
id | detail_id (fk to fruit_detailed table)
fruit_details
detail_id | price | color | source | weight

Related

Can a table title be a Primary Key?

I'm trying to retrofit some tables to an existing database. The existing database has equipment numbers and I'm trying to add in tables with more information on that equipment. Ideally, I'd like to make the table titles the ID numbers and set those as the PK of that table, making the ID numbers in the equipment list the FK.
Is it possible to set the table title as the PK? Here's an example of one of the tables, and I'd like to make "E0111" the PK.
CREATE TABLE E0111(
EQUIPMENT Varchar(200),
MAINTENANCE varchar(200),
CYCLE varchar(200)
);
No you can't do this because the primary key needs to be unique for every row of your table. If you "could" use the table name as the primary key it would be the same for every row.
You should use a unique column in your table as the primary key.
Also, I have no idea how you could achieve this with SQLite or any DBMS.
First thing before I even get anywhere near answering the question about the table names being primary keys, we need to take a step back.
You should NOT have a table for each piece of equipment.
You need an Equipment table, that will store all of your pieces of Equipment together. I assume you have that already in the existing database.
Hopefully it is keyed with a Unique Identifier AND an Equipment Number. The reason for having a separate Unique Identifier, is that your database server uses this for referential integrity and performance - this is not a value that you should show or use anywhere other than inside the database and between your database and whatever application you are using to modify the database. It should not typically be shown to the user.
The Equipment Number is the one you are familiar with (ie 'E0111'), because this is shown to the User and marked on reports etc. The two have different purposes and needs, so should not be combined into a single value.
I will take a stab at what your Equipment table may look like:
EquipmentId int -- database Id - used for primary key
EquipmentName Varchar(200) -- human readable
EquipmentDescription Text
PurchaseDate DateTime
SerialNumber VarChar(50)
Model Varchar(200)
etc..
To then add the Maintenance Cycle table as you propose above it would look like:
MaintenanceId int -- database Id - used for primary key this time for the maintenance table.
EquipmentId int -- foreign key - references the equipment table
MaintenanceType Varchar(200)
DatePerformed DateTime
MaintenanceResults VarChar(200)
NextMaintenanceDate DateTime
To get the results about the Maintenance Cycle for all equipment, you then JOIN the tables on the 2 EquipmentIds, ie
SELECT EquipmentName, EquipmentDescription, SerialNumber, MaintenanceType, DatePerformed
FROM Equipment
JOIN MaintenanceCycle ON Equipment.EquipmentId = MaintenanceCycle.EquipmentId
WHERE EquipmentName = 'E0111'
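As a rough, runnable sketch of the two-table layout and the join above, here is a cut-down version using SQLite via Python's sqlite3 module. The column subset, the UNIQUE constraint on EquipmentName, and all sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Equipment (
    EquipmentId   INTEGER PRIMARY KEY,    -- surrogate key, internal use only
    EquipmentName TEXT NOT NULL UNIQUE,   -- the number users see, e.g. 'E0111'
    SerialNumber  TEXT
);
CREATE TABLE MaintenanceCycle (
    MaintenanceId   INTEGER PRIMARY KEY,
    EquipmentId     INTEGER NOT NULL REFERENCES Equipment (EquipmentId),
    MaintenanceType TEXT NOT NULL,
    DatePerformed   TEXT NOT NULL
);
-- Hypothetical sample data.
INSERT INTO Equipment VALUES (1, 'E0111', 'SN-001'), (2, 'E0112', 'SN-002');
INSERT INTO MaintenanceCycle VALUES
    (1, 1, 'Oil change', '2020-01-15'),
    (2, 1, 'Inspection', '2020-02-15'),
    (3, 2, 'Inspection', '2020-03-01');
""")

# One table for all equipment; the equipment number is just a value to filter on.
history = conn.execute("""
    SELECT e.EquipmentName, m.MaintenanceType, m.DatePerformed
    FROM Equipment AS e
    JOIN MaintenanceCycle AS m ON e.EquipmentId = m.EquipmentId
    WHERE e.EquipmentName = ?
    ORDER BY m.DatePerformed
""", ("E0111",)).fetchall()

print(history)
```

Adding a new piece of equipment is then an INSERT into Equipment rather than a new table.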
You cannot make the name of the table a primary key.
A primary key must be unique, and it is defined on a table column (or columns), not on the table name. This is the general rule of thumb for a primary key. There are plenty of resources on the internet about primary keys.
Here are just a few:
http://www.w3schools.com/sql/sql_primarykey.asp
http://database-programmer.blogspot.co.uk/2008/01/database-skills-sane-approach-to.html
The name of a table should describe what is held within it. Think of the table as a drawer that you label with what it contains.
In my humble opinion, the ID of a piece of equipment should only appear as-is on the equipment itself. In your database, there should be a single Equipments table holding the ID of each piece of equipment you have.
Then, if you have other equipment-related information to save, add another table for that kind of information, with the ID of the related equipment for which the information is saved.
For example, let's say we hold a maintenance schedule over your equipment.
Equipments
-----------------------------------------------------
Id | Description | Location | Brand | Model
-----------------------------------------------------
1 | Printer/Copier| 1st Floor | HP | PSC1000
Maintenances
---------------------------------------------------------
Id | Description | Date | EquipmentId | EmployeeId
---------------------------------------------------------
Note that the EmployeeId column shall be there only if one requires to know who did what maintenance and when, for instance.
MaintenanceCycles
--------------------------------------------
Id | Code | Description | EquipmentId
--------------------------------------------
1 | M | Monthly | 1
This way, every piece of equipment can have its cycle, and even multiple cycles if required. This gives you the flexibility you need for further changes.

Can Database Normalization occur from values

Say I want to normalize a table
itemID | itemDate | itemSource | type | color | size | material
254 03/08/1988 toyCo doll null 16 plastic
255 03/08/1988 toyCo car blue null plastic
256 03/08/1988 toyCo boat purple 20 wood
Now the type field can only have 1 of 3 values. doll, car, or boat. Attributes of color, size, and material are functionally dependent on type. As you can see though, items of type|doll do not determine color. I do not know if this is a problem. But moving on.
type(pk) | color | size | material = table A
itemID(pk) | itemDate | itemSource = table B
We are now in 1nf. My question is, can the type key, along with its attributes, become based on the type keys' possible values?
typeDoll(pk) | size | material = table C
typeCar(pk) | color| material = table D
typeBoat(pk) | color | size | material = table E
I'm not sure I understand exactly what you're asking, but here's one approach to creating an exclusive arc in SQL.
-- Columns common to all types.
create table items (
item_id integer primary key,
item_type varchar(10) not null
check (item_type in ('doll', 'car', 'boat')),
-- This constraint lets the pair of columns be the target of a foreign key reference.
unique (item_id, item_type),
item_date date not null default current_date,
item_source varchar(25) not null
);
-- Columns unique to dolls. I'd assume that "size" means one thing when you're
-- talking about dolls, and something slightly different when you're talking
-- about boats.
create table dolls (
item_id integer primary key,
item_type varchar(10) not null default 'doll'
check(item_type = 'doll'),
foreign key (item_id, item_type) references items (item_id, item_type),
doll_size integer not null
check(doll_size between 1 and 20),
doll_material varchar(25) not null -- In production, probably references a table
-- of valid doll materials.
);
The column dolls.item_type, along with its CHECK constraint and the foreign key reference, guarantees that
every row in "dolls" has a matching row in "items", and
that matching row is also about dolls. (Not about boats or cars.)
Tables for boats and cars are similar.
If you have to implement this in MySQL before version 8.0.16, you'll have to replace the CHECK constraints, because those versions parse CHECK constraints but don't enforce them. In some cases, you can replace them with a foreign key reference to a tiny table. In other cases, you might have to write a trigger.
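To see the guarantee in action, here is a trimmed-down sketch of the same two tables in SQLite, driven from Python's sqlite3 module. Only a few columns are kept, and the helper try_insert_doll and the sample items are hypothetical names introduced for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves foreign keys off by default
conn.executescript("""
CREATE TABLE items (
    item_id   INTEGER PRIMARY KEY,
    item_type TEXT NOT NULL CHECK (item_type IN ('doll', 'car', 'boat')),
    UNIQUE (item_id, item_type)   -- target of the composite foreign key
);
CREATE TABLE dolls (
    item_id   INTEGER PRIMARY KEY,
    item_type TEXT NOT NULL DEFAULT 'doll' CHECK (item_type = 'doll'),
    doll_size INTEGER NOT NULL CHECK (doll_size BETWEEN 1 AND 20),
    FOREIGN KEY (item_id, item_type) REFERENCES items (item_id, item_type)
);
INSERT INTO items (item_id, item_type) VALUES (254, 'doll'), (255, 'car');
""")

def try_insert_doll(item_id):
    """Return True if the dolls row was accepted, False if a constraint rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO dolls (item_id, doll_size) VALUES (?, ?)",
                (item_id, 16),
            )
        return True
    except sqlite3.IntegrityError:
        return False

# 254 is a doll, 255 is a car, 999 does not exist in items at all.
results = [try_insert_doll(item_id) for item_id in (254, 255, 999)]
print(results)
```

Only the row for item 254 is accepted: the default item_type 'doll' forces the composite foreign key to look for (255, 'doll') and (999, 'doll'), neither of which exists in items.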
What I am trying to achieve is called Polymorphic Association. This can be accomplished by creating a super table to store all possible columns and using a second and third table to constrain foreign keys to primary keys.
It's explained in detail here

Which one is best practice for database table having two columns...one for Item1 and second for Item2?

Which one is best practice for database table having two columns...one for Item1 and second for Item2?
Table Structure
Item1 | Item2
Apple | Orange
Pen | Paper
OR
ID | Item1 | Item2
1 | Apple | Orange
2 | Pen | Paper
In short I wish to know that is it a good practice to make a primary column/field ID for tables even if they are allowed to accept multiple same values?
You should have a primary key. My guess from the extremely limited info you have posted is that neither of your fields will count as a primary key. Therefore, you need an id field.
(note: You could do it without, but it's a bad idea)
Well, having a numeric ID of the record in the table, making that ID a Primary Key for the table, is a preferred way to go:
comparison operations on numbers are much faster than those on strings (provided the number fits into the CPU's integer or long type; check the INTEGER SQL type);
your database stays consistent if you want to rename "Apple" to "apple" or maybe "APPLE" one day. Without the extra ID column, you'd have to update all the dependent tables (if you're planning to have primary/foreign keys, of course);
you have only one column in your primary key even when you have both Orange and Green apples; without the extra ID you'd have to make the primary key (Item1, Item2).
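The renaming point can be illustrated with a small sketch in SQLite via Python's sqlite3 module. The item and pairing tables below are hypothetical; they assume the two item columns are stored as references to a separate table of names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE item (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE pairing (
    id       INTEGER PRIMARY KEY,
    item1_id INTEGER NOT NULL REFERENCES item (id),
    item2_id INTEGER NOT NULL REFERENCES item (id)
);
INSERT INTO item VALUES (1, 'Apple'), (2, 'Orange');
INSERT INTO pairing VALUES (1, 1, 2);
""")

# Renaming touches exactly one row; the pairing table refers to the ID, not the name.
with conn:
    changed = conn.execute("UPDATE item SET name = 'APPLE' WHERE id = 1").rowcount

pairs = conn.execute("""
    SELECT a.name, b.name
    FROM pairing AS p
    JOIN item AS a ON a.id = p.item1_id
    JOIN item AS b ON b.id = p.item2_id
""").fetchall()

print(changed, pairs)
```

Had pairing stored the names themselves as a key, the same rename would have required updating every dependent row as well.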
Sounds like you might need more than one table, but it really depends on what you need to store and why. Try designing your model first, then create a data structure that supports your model.
To answer your question, it looks like the second table structure is the "best practice" (or should I say 'better practice') as it contains an ID column that can act as the primary key for indexing, but it really depends on usage. Consider the following:
One possible data structure (separates by type)
table - Fruit
=============
FruitId int not null identity(1,1) primary key
Name varchar(100) not null
table - OfficeSupply
====================
OfficeSupplyId int not null identity(1,1) primary key
Name varchar(100) not null
Another possibility (combined with a type column)
table - Item
============
ItemId int not null identity(1,1) primary key
Name varchar(100) not null
Type varchar(100) not null

Design question: Filterable attributes, SQL

I have two tables in my database, Operation and Equipment. An operation requires zero or more attributes. However, there's some logic in how the attributes are attributed:
Operation Foo requires equipment A and B
Operation Bar requires no equipment
Operation Baz requires equipment B and either C or D
Operation Quux requires equipment (A or B) and (C or D)
What's the best way to represent this in SQL?
I'm sure people have done this before, but I have no idea where to start.
(FWIW, my application is built with Python and Django.)
Update 1: There will be around a thousand Operation rows and about thirty Equipment rows. The information is coming in CSV form similar to the description above: Quux, (A & B) | (C & D)
Update 2: The level of conjunctions & disjunctions shouldn't be too deep. The Quux example is probably the most complicated, though there appears to be a A | (D & E & F) case.
Think about how you'd model the operations in OO design: the operations would be subclasses of a common superclass Operation. Each subclass would have mandatory object members for the respective equipment required by that operation.
The way to model this with SQL is Class Table Inheritance. Create a common super-table:
CREATE TABLE Operation (
operation_id SERIAL PRIMARY KEY,
operation_type CHAR(1) NOT NULL,
UNIQUE KEY (operation_id, operation_type),
FOREIGN KEY (operation_type) REFERENCES OperationTypes(operation_type)
);
Then for each operation type, define a sub-table with a column for each required equipment type. For example, OperationFoo has a column for each of equipA and equipB. Since they are both required, the columns are NOT NULL. Constrain them to the correct types by creating a Class Table Inheritance super-table for equipment too.
CREATE TABLE OperationFoo (
operation_id INT PRIMARY KEY,
operation_type CHAR(1) NOT NULL CHECK (operation_type = 'F'),
equipA INT NOT NULL,
equipB INT NOT NULL,
FOREIGN KEY (operation_id, operation_type)
REFERENCES Operation(operation_id, operation_type),
FOREIGN KEY (equipA) REFERENCES EquipmentA(equip_id),
FOREIGN KEY (equipB) REFERENCES EquipmentB(equip_id)
);
Table OperationBar requires no equipment, so it has no equip columns:
CREATE TABLE OperationBar (
operation_id INT PRIMARY KEY,
operation_type CHAR(1) NOT NULL CHECK (operation_type = 'B'),
FOREIGN KEY (operation_id, operation_type)
REFERENCES Operation(operation_id, operation_type)
);
Table OperationBaz has one required equipment equipA, and then at least one of equipB and equipC must be NOT NULL. Use a CHECK constraint for this:
CREATE TABLE OperationBaz (
operation_id INT PRIMARY KEY,
operation_type CHAR(1) NOT NULL CHECK (operation_type = 'Z'),
equipA INT NOT NULL,
equipB INT,
equipC INT,
FOREIGN KEY (operation_id, operation_type)
REFERENCES Operation(operation_id, operation_type),
FOREIGN KEY (equipA) REFERENCES EquipmentA(equip_id),
FOREIGN KEY (equipB) REFERENCES EquipmentB(equip_id),
FOREIGN KEY (equipC) REFERENCES EquipmentC(equip_id),
CHECK (COALESCE(equipB, equipC) IS NOT NULL)
);
Likewise in table OperationQuux you can use a CHECK constraint to make sure at least one equipment resource of each pair is non-null:
CREATE TABLE OperationQuux (
operation_id INT PRIMARY KEY,
operation_type CHAR(1) NOT NULL CHECK (operation_type = 'Q'),
equipA INT,
equipB INT,
equipC INT,
equipD INT,
FOREIGN KEY (operation_id, operation_type)
REFERENCES Operation(operation_id, operation_type),
FOREIGN KEY (equipA) REFERENCES EquipmentA(equip_id),
FOREIGN KEY (equipB) REFERENCES EquipmentB(equip_id),
FOREIGN KEY (equipC) REFERENCES EquipmentC(equip_id),
FOREIGN KEY (equipD) REFERENCES EquipmentD(equip_id),
CHECK (COALESCE(equipA, equipB) IS NOT NULL AND COALESCE(equipC, equipD) IS NOT NULL)
);
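The COALESCE-based CHECK can be exercised on its own. The sketch below uses SQLite through Python's sqlite3 module and keeps only the four equipment columns; the foreign keys are left out for brevity, and the helper accepts is a hypothetical name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE OperationQuux (
    operation_id INTEGER PRIMARY KEY,
    equipA INTEGER,
    equipB INTEGER,
    equipC INTEGER,
    equipD INTEGER,
    -- (A or B) and (C or D): at least one column of each pair must be non-null
    CHECK (COALESCE(equipA, equipB) IS NOT NULL
       AND COALESCE(equipC, equipD) IS NOT NULL)
)
""")

def accepts(a, b, c, d):
    """Return True if the database accepts this combination of equipment."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO OperationQuux (equipA, equipB, equipC, equipD) "
                "VALUES (?, ?, ?, ?)",
                (a, b, c, d),
            )
        return True
    except sqlite3.IntegrityError:
        return False

print(accepts(1, None, None, 4))     # A and D supplied
print(accepts(1, 2, None, None))     # neither C nor D supplied
print(accepts(None, None, 3, 4))     # neither A nor B supplied
```

The first combination satisfies both halves of the constraint; the other two each leave one pair entirely NULL and are rejected by the database itself, not by application code.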
This may seem like a lot of work. But you asked how to do it in SQL. The best way to do it in SQL is to use declarative constraints to model your business rules. Obviously, this requires that you create a new sub-table every time you create a new operation type. This is best when the operations and business rules never (or hardly ever) change. But this may not fit your project requirements. Most people say, "but I need a solution that doesn't require schema alterations."
Most developers probably don't do Class Table Inheritance. More commonly, they just use a one-to-many table structure like other people have mentioned, and implement the business rules solely in application code. That is, your application contains the code to insert only the equipment appropriate for each operation type.
The problem with relying on the app logic is that it can contain bugs and might insert data that doesn't satisfy the business rules. The advantage of Class Table Inheritance is that with well-designed constraints, the RDBMS enforces data integrity consistently. You have assurance that the database literally can't store incorrect data.
But this can also be limiting, for instance if your business rules change and you need to adjust the data. The common solution in this case is to write a script to dump all the data out, change your schema, and then reload the data in the form that is now allowed (Extract, Transform, and Load = ETL).
So you have to decide: do you want to code this in the app layer, or the database schema layer? There are legitimate reasons to use either strategy, but it's going to be complex either way.
Re your comment: You seem to be talking about storing expressions as strings in data fields. I recommend against doing that. The database is for storing data, not code. You can do some limited logic in constraints or triggers, but code belongs in your application.
If you have too many operations to model in separate tables, then model it in application code. Storing expressions in data columns and expecting SQL to use them for evaluating queries would be like designing an application around heavy use of eval().
I think you should have either a one-to-many or many-to-many relationship between Operation and Equipment, depending on whether there is one Equipment entry per piece of equipment, or per equipment type.
I would advise against putting business logic into your database schema, as business logic is subject to change and you'd rather not have to change your schema in response.
Looks like you'll need to be able to group certain equipment together as either conjunction or disjunction and combine these groups together...
OperationEquipmentGroup
id int
operation_id int
is_conjunction bit
OperationEquipment
id int
operation_equipment_group_id int
equipment_id int
You can add ordering columns if that is important, and maybe another column to the group table to specify how groups are combined (which only makes sense if they are ordered). But, by your examples, it looks like groups are only ever ANDed together.
Since Operations can have one or more piece of equipment, you should use a linking table. Your schema would be like this:
Operation
ID
othercolumn
Equipment
ID
othercolumn
Operation_Equipment_Link
OperationID
EquipmentID
The two fields in the third table can be set up as a composite primary key, so you don't need a third field and can more easily keep duplicates out of the table.
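A quick sketch of the linking table in SQLite (via Python's sqlite3 module) shows the composite primary key keeping duplicates out. The Name columns and the sample rows are assumptions standing in for "othercolumn".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Operation (ID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE Equipment (ID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE Operation_Equipment_Link (
    OperationID INTEGER NOT NULL REFERENCES Operation (ID),
    EquipmentID INTEGER NOT NULL REFERENCES Equipment (ID),
    PRIMARY KEY (OperationID, EquipmentID)   -- no third ID column needed
);
-- Hypothetical sample data: Foo needs A and B, Bar needs nothing.
INSERT INTO Operation VALUES (1, 'Foo'), (2, 'Bar');
INSERT INTO Equipment VALUES (1, 'A'), (2, 'B');
INSERT INTO Operation_Equipment_Link VALUES (1, 1), (1, 2);
""")

# Linking Foo to A a second time violates the composite primary key.
try:
    with conn:
        conn.execute("INSERT INTO Operation_Equipment_Link VALUES (1, 1)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

required = conn.execute("""
    SELECT o.Name, COUNT(l.EquipmentID)
    FROM Operation AS o
    LEFT JOIN Operation_Equipment_Link AS l ON l.OperationID = o.ID
    GROUP BY o.ID
    ORDER BY o.ID
""").fetchall()

print(duplicate_rejected, required)
```

Note that this layout records which equipment an operation uses but not the and/or logic between them; that still has to live somewhere else.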
In addition to Nicholai's suggestion I solved a similar problem as following:
Table Operation has an additional field "OperationType"
Table Equipment has an additional field "EquipmentType"
I have an additional table "DefaultOperationEquipmentType" specifying which EquipmentType needs to be included with each OperationType, e.g.
OperationType EquipmentType
==============.=============.
Foo_Type A_Type
Foo_Type B_Type
Baz_Type B_Type
Baz_Type C_Type
My application doesn't need complex conditions like (A or B) because in my business logic both alternative equipments belong to the same type of equipment, e.g. in a PC environment I could have an equipment Mouse (A) or Trackball (B), but they both belong to EquipmentType "PointingDevice_Type"
Hope that helps
Be aware that I have not tested this in the wild. That being said, the best* way I can see to do a mapping is with a denormalized table for the grouping.
*(aside from Bill's way, which is hard to set up, but masterful when done correctly)
Operations:
--------------------
Op_ID int not null pk
Op_Name varchar 500
Equipment:
--------------------
Eq_ID int not null pk
Eq_Name varchar 500
Total_Available int
Group:
--------------------
Group_ID int not null pk
-- Here you have a choice. You can either:
-- Not recommended
Equip varchar(500) --Stores a list of EQ_ID's {1, 3, 15}
-- Recommended
Eq_ID_1 bit
Eq_1_Total_Required
Eq_ID_2 bit
Eq_2_Total_Required
Eq_ID_3 bit
Eq_3_Total_Required
-- ... etc.
Operations_to_Group_Mapping:
--------------------
Group_ID int not null frk
Op_ID int not null frk
Thus, in case X: A | (D & E & F)
Operations:
--------------------
Op_ID Op_Name
1 X
Equipment:
--------------------
Eq_ID Eq_Name Total_Available
1 A 5
-- ... snip ...
22 D 15
23 E 0
24 F 2
Group:
--------------------
Group_ID Eq_ID_1 Eq_1_Total_Required -- ... etc. ...
1 TRUE 3
-- ... snip ...
2 FALSE 0
Operations_to_Group_Mapping:
--------------------
Group_ID Op_ID
1 1
2 1
As loath as I am to put recursive (tree) structures in SQL, it sounds like this is really what you're looking for. I would use something modeled like this:
Operation
----------------
OperationID PK
RootEquipmentGroupID FK -> EquipmentGroup.EquipmentGroupID
...
Equipment
----------------
EquipmentID PK
...
EquipmentGroup
----------------
EquipmentGroupID PK
LogicalOperator
EquipmentGroupEquipment
----------------
EquipmentGroupID | (also FK -> EquipmentGroup.EquipmentGroupID)
EntityType | PK (all 3 columns)
EntityID | (not FK, but references either Equipment.EquipmentID
or EquipmentGroup.EquipmentGroupID)
Now that I've put forth an arguably ugly schema, allow me to explain a bit...
Every equipment group can either be an and group or an or group (as designated by the LogicalOperator column). The members of each group are defined in the EquipmentGroupEquipment table, with EntityID referencing either Equipment.EquipmentID or another EquipmentGroup.EquipmentGroupID, the target being determined by the value in EntityType. This will allow you to compose a group that consists of equipment or other groups.
This will allow you to represent something as simple as "requires equipment A", which would look like this:
EquipmentGroupID LogicalOperator
--------------------------------------------
1 'AND'
EquipmentGroupID EntityType EntityID
--------------------------------------------
1 1 'A'
...all the way to your "A | (D & E & F)", which would look like this:
EquipmentGroupID LogicalOperator
--------------------------------------------
1 'OR'
2 'AND'
EquipmentGroupID EntityType EntityID
--------------------------------------------
1 1 'A'
1 2 2 -- group ID 2
2 1 'D'
2 1 'E'
2 1 'F'
(I realize that I've mixed data types in the EntityID column; this is just to make it clearer. Obviously you wouldn't do this in an actual implementation)
This would also allow you to represent structures of arbitrary complexity. While I realize that you (correctly) don't wish to overarchitect the solution, I don't think you can really get away with less without breaking 1NF (by combining multiple equipment into a single column).
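Once the rows are loaded into the application, checking whether a set of available equipment satisfies an operation is a short recursive walk. The sketch below hard-codes the "A | (D & E & F)" rows from the example as Python data; the function name is_satisfied is hypothetical, and entity type 1 = equipment, 2 = nested group follows the sample rows above.

```python
# EquipmentGroup rows: EquipmentGroupID -> LogicalOperator
groups = {1: "OR", 2: "AND"}

# EquipmentGroupEquipment rows: (EquipmentGroupID, EntityType, EntityID)
# EntityType 1 = equipment, 2 = nested group
members = [
    (1, 1, "A"),
    (1, 2, 2),     # group 1 contains group 2
    (2, 1, "D"),
    (2, 1, "E"),
    (2, 1, "F"),
]

def is_satisfied(group_id, available):
    """Evaluate a group against the set of available equipment IDs."""
    results = []
    for gid, entity_type, entity_id in members:
        if gid != group_id:
            continue
        if entity_type == 1:
            results.append(entity_id in available)
        else:
            # Nested group: evaluate it recursively.
            results.append(is_satisfied(entity_id, available))
    combine = all if groups[group_id] == "AND" else any
    return combine(results)

print(is_satisfied(1, {"A"}))
print(is_satisfied(1, {"D", "E", "F"}))
print(is_satisfied(1, {"D", "E"}))
```

Either A alone or the full D, E, F trio satisfies root group 1, while D and E without F do not. One design detail to decide explicitly: with this code an empty AND group counts as satisfied and an empty OR group does not.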
From what I understood, you want to store the equipment in relation to the operations in a way that will allow you to apply your business logic to it later. In that case you'll need 3 tables:
Operations:
ID
name
Equipment:
ID
name
Operations_Equipment:
equipment_id
operation_id
symbol
Where symbol is A, B, C, etc...
If you have the condition like (A & B) | (C & D) you can know which equipment is which easily.

Simple DB design question -- sets of measurements

I have multiple sets of measurements. Each set has multiple values in it. The data is currently in a spreadsheet, and I'd like to move it to a real database.
In the spreadsheet, each set of measurements is in its own column, i.e.,
1 | 2 | 3 | ...
0.5 | 0.7 | 0.2 | ...
0.3 | 0.6 | 0.4 | ...
and so on. If I have, say, 10 data sets and each has 8 measurements, I wind up with an 8x10 table with 80 values.
To put the database in normal form, I know that it should be designed to add rows, not columns, for each new set. I presume that means the tables should be arranged something like (MySQL syntax):
create table Measurement(
id int not null auto_increment,
primary key(id)
);
create table Measurement_Data(
id int not null auto_increment,
value double not null,
measurement int,
primary key(id),
constraint fk_data foreign key(measurement) references Measurement(id)
);
Is that correct? The result is a table with only one autogenerated column (though of course I could add more to it later) and another table that has each data set stored "lengthwise", meaning that table will have 80 rows given the values above. Is that the right way to handle this problem? It also means that when I need to add data, I need to insert a row into the Measurement table, get its assigned pk, and then add all the new rows to the Measurement_Data table, right?
I appreciate any advice,
Ken
What about a simple one-table solution for this?
CREATE TABLE measurements
(
id INT AUTO_INCREMENT PRIMARY KEY,
dataSetId INT,
measurementNumber INT,
measurementValue DOUBLE,
CONSTRAINT UNIQUE (dataSetId, measurementNumber)
);
Unless your problem is more complicated than you describe, this should be adequate.
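To show how the spreadsheet layout maps onto this single table, here is a small sketch using SQLite via Python's sqlite3 module. It loads the six sample values from the question (one column per data set), and the Python variable names are made up for the example.

```python
import sqlite3

# The sample values from the question: each column is one data set.
spreadsheet = [
    [0.5, 0.7, 0.2],
    [0.3, 0.6, 0.4],
]

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE measurements (
    id                INTEGER PRIMARY KEY,
    dataSetId         INTEGER NOT NULL,
    measurementNumber INTEGER NOT NULL,
    measurementValue  REAL NOT NULL,
    UNIQUE (dataSetId, measurementNumber)
)
""")

# Turn columns into rows: (data set, position within the set, value).
rows = [
    (col + 1, row + 1, value)
    for row, line in enumerate(spreadsheet)
    for col, value in enumerate(line)
]
with conn:
    conn.executemany(
        "INSERT INTO measurements (dataSetId, measurementNumber, measurementValue) "
        "VALUES (?, ?, ?)",
        rows,
    )

# Reading one data set back is a filter plus an ORDER BY.
set_two = [
    value
    for (value,) in conn.execute(
        "SELECT measurementValue FROM measurements "
        "WHERE dataSetId = ? ORDER BY measurementNumber",
        (2,),
    )
]
print(set_two)
```

A new data set is then just more rows with a new dataSetId; no column ever has to be added.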
In general, your plan sounds good as far as it goes. I'd tweak it a bit differently from the other posts:
create table Measurement(
MeasurementId int not null auto_increment,
primary key(MeasurementId)
);
create table MeasurementData(
MeasurementId int not null,
MeasurementDataEntry int not null,
value double not null,
primary key(MeasurementId, MeasurementDataEntry),
constraint fk_data foreign key(MeasurementId) references Measurement(MeasurementId)
);
As per LukLed, MeasurementDataEntry would be an ordinal value, starting at 1 and incrementing by one for each "data entry" for that MeasurementId. The compound primary key should be sufficient; with the problem as stated, I see no real need for an extra surrogate key on MeasurementData.
Be sure that your solution is sufficient for your problem set. Are other columns required in either table (type of measurement, when it occurred, who dunnit)? Are there always the same number of data readings for each Measurement, or can it vary? If it can vary, do you need the number of data measurements stored in the parent table (for quicker retrieval, at the cost of redundancy)? Will you ever need to update data entries, and will that mess up the order? It just seems a bit sparse, and there might be other business rules you'll need to factor in down the road.
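The insert workflow for this parent/child layout (create the parent row, read back its generated key, then add the numbered entries) might look like the following sketch in SQLite via Python's sqlite3 module. The helper add_data_set is a hypothetical name, and SQLite's INTEGER PRIMARY KEY stands in for auto_increment.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Measurement (
    MeasurementId INTEGER PRIMARY KEY
);
CREATE TABLE MeasurementData (
    MeasurementId        INTEGER NOT NULL REFERENCES Measurement (MeasurementId),
    MeasurementDataEntry INTEGER NOT NULL,
    value                REAL NOT NULL,
    PRIMARY KEY (MeasurementId, MeasurementDataEntry)
);
""")

def add_data_set(values):
    """Insert one Measurement row plus its numbered data entries; return the new id."""
    with conn:  # one transaction: either everything is stored or nothing is
        cur = conn.execute("INSERT INTO Measurement DEFAULT VALUES")
        measurement_id = cur.lastrowid
        conn.executemany(
            "INSERT INTO MeasurementData VALUES (?, ?, ?)",
            [(measurement_id, n, v) for n, v in enumerate(values, start=1)],
        )
    return measurement_id

first = add_data_set([0.5, 0.3])
second = add_data_set([0.7, 0.6])

entries = conn.execute(
    "SELECT MeasurementDataEntry, value FROM MeasurementData "
    "WHERE MeasurementId = ? ORDER BY MeasurementDataEntry",
    (second,),
).fetchall()
print(first, second, entries)
```

Wrapping both inserts in one transaction avoids leaving a Measurement row without its data if the second step fails.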
First, I would change the 'measurement' column name to "measurement-id", because the current name is not intuitive. You could also add an ordinal number to the Measurement-Data table to record the order in which measurements were taken, and add a unique constraint on measurement-id and ordinal-number. By adding ordinal-number you can also limit the number of measurements in every test with a check constraint.