Let's hope my explanation is clearer than the title.
I have a set of files. Each file contains a variable number of papers/forms. So I have a table called files, with an fid.
For the sake of simplicity, let's say we have only 3 different forms, each contains its own set of data. So I have 3 tables, FormA, FormB and FormC, with their primary keys Aid, Bid, and Cid respectively.
The file can contain for example, 2 A forms and 1 B form, or 1 of each form, or 3 A forms, 2 B forms, 2 C forms. You get the idea, variable number of forms, and might include more than 1 of the same type.
How to properly represent such relationship in SQL? If it matters, I'm using PostGreSQL.
In PostgreSQL here is how I would do this. Note I am using dangerous (non-beginner/advanced) tools, and it is worth understanding the gotchas.
Now since there are a number of tables here, the question is how we manage the constraints. This is a little convoluted but here is what I would do:
CREATE TABLE file (...);
-- add your tables for tracking form data here....
CREATE TABLE file_to_form (
file_id int NOT NULL;
refkey int NOT NULL,
form_class char NOT NULL
CHECK NOINHERIT (file_id IS NULL)
); -- this table will never have anything in it.
CREATE TABLE file_to_form_a (
PRIMARY KEY (file_id, refkey, form_class)
FOREIGN KEY (refkey) REFERENCES file_a (form_id)
CHECK (form_class = 'a')
) INHERITS (file_to_form);
CREATE TABLE file_to_form_b (
PRIMARY KEY (file_id, refkey, form_class)
FOREIGN KEY (refkey) REFERENCES file_b (form_id)
CHECK (form_class = 'b')
) INHERITS (file_to_form);
-- etc
Now you have a consistent interface for showing which forms are associated with files, and can find them by searching the file_to_form table (which will function similar to a read-only view of all tables that inherit it). This is one of those cases where PostgreSQL's table inheritance really helps, if you take the gotchas seriously and put some thought into how to handle them.
Related
I have several different component types that each have drastically different data specs to store so each component type needs its own table, but they all share some common columns. I'm most concerned with [component.ID] which must be a unique identifier to a component regardless of component type (unique across many tables).
First Option
My first idea was inheritance where the table for each component type inherits a generic [component] table.
create table if not exists component (
ID long primary key default nextval('component_id_seq'),
typeID long not null references componentType (ID),
manufacturerID long not null references manufacturer (ID),
retailPrice numeric check (retailPrice >= 0.0),
purchasePrice numeric check (purchasePrice >= 0.0),
manufacturerPartNum varchar(255) not null,
isLegacy boolean default false,
check (retailPrice >= purchasePrice)
);
create table if not exists motherboard (
foo long,
bar long
) inherits component; //<-- this guy right here!!
/* there would be many other tables with different specific types of components
which each inherit the [component] table*/
PostgreSQL inheritance has some caveats that seem to make this a bad idea.
Constraints like unique or primary key are not respected by the inheriting table. Even if you specify unique in the inheriting table it would only be unique in that table and could duplicate values in the parent table or other inheriting tables.
References do not carry over from the parent table. So the references for typeID or manufacturerID would not apply to the inheriting table.
References to the parent table would not include data in the inheriting tables. This is the worst deal breaker for me using inheritance because I need to be able to reference to all components regardless of type.
Second Option
If I don't use inheritance and just use the component table as a master component list with data common to any component of any type and then have a table for each type of component where each entry refers to a component.ID. that works fine but how do I enforce it?
How do I enforce that each entry in the component table has one and only one corresponding entry in only one of many other tables? The part that baffles me is that there are many tables and the corresponding entry could be in any of them.
A simple reference back to the component table will ensure that each row in the many specific component type tables has one valid component.id to which it belongs.
Third Option
Last of all I could forego a master component table altogether and just have each table for a specific component type have those same columns. Then I am left with the conundrum of how to enforce a unique component ID across many tables and also how to search across all these many tables (which may very well grow or shrink) in queries. I don't want a huge unwieldy UNION between all these tables. That would bog any select query to frozen molasses speed.
Fourth Option
This strikes me as a problem that comes up from time to time is DB design and there is probably a name for it that I don't know and perhaps a solution that is different entirely from the above three options.
The foreign key should contain the type of a subcomponent, the example speaks for itself.
create table component(
id int generated always as identity primary key,
-- or
-- id serial primary key,
type_id int not null,
general_info text,
unique (type_id, id)
);
create table subcomponent_1 (
id int primary key,
type_id int generated always as (1) stored,
-- or
-- type_id int default 1 check(type_id = 1),
specific_info text,
foreign key(type_id, id) references component(type_id, id)
);
insert into component (type_id, general_info)
values (1, 'component type 1');
insert into subcomponent_1 (id, specific_info)
values (1, 'specific info');
Note:
update component
set type_id = 2
where id = 1;
ERROR: update or delete on table "component" violates foreign key constraint "subcomponent_1_type_id_id_fkey" on table "subcomponent_1"
DETAIL: Key (type_id, id)=(1, 1) is still referenced from table "subcomponent_1".
I have been coding ASP.NET forms inside web applications for a long time now. Generally most web apps have a user that logs in, picks a form to fill out and answers questions so your table looks like this
Table: tblInspectionForm
Fields:
inspectionformid (either autoint or guid)
userid (user ID who entered it)
datestamp (added, modified, whatever)
Question1Answer: boolean (maybe a yes/no)
Question2Answer: int (maybe foreign key for sub table 1 with dropdown values)
Question3Answer: int (foreign key for sub table 2 with dropdown values)
If I'm not mistaken it meets both 2nd and 3rd normal forms. You're not storing user names in the tables, just the ID's. You aren't storing the dropdown or "yes/no" values in Q-3, just ID's of other tables.
However, IF all the questions are exactly the same data type (assume there's no Q1 or Q1 is also an int), which link to the exact same foreign key (e.g. a form that has 20 questions, all on a 1-10 scale or have the same answers to chose from), would it be better to do something like this?
so .. Table: tblInspectionForm
userid (user ID who entered it)
datestamp (added, modified, whatever)
... and that's it for table 1 .. then
Table2: tblInspectionAnswers
inspectionformid (composite key that links back to table1 record)
userid (composite key that links back to table1 record)
datastamp (composite key that links back to table1 record)
QuestionIDNumber: int (question 1, question 2, question3)
QuestionAnswer: int (foreign key)
This wouldn't just apply to forms that only have the same types of answers for a single form. Maybe your form has 10 of these 1-10 ratings (int), 10 boolean-valued questions, and then 10 freeform.. You could break it into three tables.
The disadvantage would be that when you save the form, you're making 1 call for every question on your form. The upside is, if you have a lot of nightly integrations or replications that pull your data, if you decide to add a new question, you don't have to manually modify any replications to reporting data sources or anything else that's been designed to read/query your form data. If you originally had 20 questions and you deploy a change to your app that adds a 21st, it will automatically get pulled into any outside replications, data sources, reporting that queries this data. Another advantage is that if you have a REALLY LONG (this happens a lot maybe in the real estate industry when you have inspection forms with 100's of questions that go beyond the 8k limit for a table row) you won't end up running into problems.
Would this kind of scenario ever been the preferred way of saving form data?
As a rule of thumb, whenever you see a set of columns with numbers in their names, you know the database is poorly designed.
What you want to do in most cases is have a table for the form / questionnaire, a table for the questions, a table for the potential answers (for multiple-choice questions), and a table for answers that the user chooses.
You might also need a table for question type (i.e free-text, multiple-choice, yes/no).
Basically, the schema should look like this:
create table Forms
(
id int identity(1,1) not null primary key,
name varchar(100) not null, -- with a unique index
-- other form related fields here
)
create table QuestionTypes
(
id int identity(1,1) not null primary key,
name varchar(100) not null, -- with a unique index
)
create table Questions
(
id int identity(1,1) not null primary key,
form_id int not null foreign key references Forms(id),
type_id int not null foreign key references QuestionTypes(id),
content varchar(1000)
)
create table Answers
(
id int identity(1,1) not null primary key,
question_id int not null foreign key references Questions(id),
content varchar(1000)
-- For quizez, unremark the next row:
-- isCorrect bit not null
)
create table Results
{
id int identity(1,1) not null primary key,
form_id int not null foreign key references Forms(id)
-- in case only registered users can fill the form, unremark the next row
--user_id int not null foreign key references Users(id),
}
create table UserAnswers
(
result_id int not null foreign key references Results(id),
question_id int not null foreign key references Questions(id),
answer_id int not null foreign key references Answers(id),
content varchar(1000) null -- for free text questions
)
This design will require a few joins when generating the forms (and if you have multiple forms per application, you just add an application table that the form can reference), and a few joins to get the results, but it's the best dynamic forms database design I know.
I'm not sure whether it's "preferred" but I have certainly seen that format used commercially.
You could potentially make the secondary table more flexible with multiple answer columns (answer_int, answer_varchar, answer_datetime), and assign a question value that you can relate to get the answer from the right column.
So if q_var = 2 you know to look in answer_varchar, whereas q_value=1 you know is an int and requires a lookup (the name of which could also be specified with the question and stored in a column).
I use an application at the moment which splits answers into combobox, textfield, numeric, date etc in this fashion. The application actually uses a JSON form which splits out the data as it saves into the separate columns. It's a bit flawed as it saves JSON into these columns but the principle can work.
You could go with a single identity field for the parent table key that the child table would reference.
What is The best Data model for Add multiple files to The multiple tables? I have for example 5 tables articles, blogs, posts... and for each item I would like to store multiple files. Files table contains only filepaths (not physicaly files).
Example:
Im using The links table, but when I create in the future The new table for example "comments", then I need to add new column to The links table.
Is there a better way of modeling such data?
One way to solve this is to use the table inheritance pattern. The main idea is to have a base table (let's call it content) with general shared information about all the items (e.g., creation date) and most importantly, the relationship with files. Then, you may add additional content types in the future without having to worry about their relation to files, since the content parent type already handles it.
E.g.:
CREATE TABLE flies (
id NUMERIC PRIMARY KEY,
path VARCHAR(100) NOT NULL
);
CREATE TABLE content (
id NUMERIC PRIMARY KEY,
created TIMESTAMP NOT NULL
);
CREATE TABLE links (
file_id NUMERIC NOT NULL REFERENCES files(id),
content_id NUMERIC NOT NULL REFERENCES content(id),
PRIMARY KEY (file_id, content_id)
);
CREATE TABLE articles (
id NUMERIC PRIMARY KEY REFERENCES content(id),
title VARCHAR(400),
subtitle VARCHAR(400)
);
-- etc...
I'd like to hear your suggestions on this very basic question:
Imagine these three tables:
--DROP TABLE a_to_b;
--DROP TABLE a;
--DROP TABLE b;
CREATE TABLE A
(
ID NUMBER NOT NULL ,
NAME VARCHAR2(20) NOT NULL ,
CONSTRAINT A_PK PRIMARY KEY ( ID ) ENABLE
);
CREATE TABLE B
(
ID NUMBER NOT NULL ,
NAME VARCHAR2(20) NOT NULL ,
CONSTRAINT B_PK PRIMARY KEY ( ID ) ENABLE
);
CREATE TABLE A_TO_B
(
id NUMBER NOT NULL,
a_id NUMBER NOT NULL,
b_id NUMBER NOT NULL,
somevalue1 VARCHAR2(20) NOT NULL,
somevalue2 VARCHAR2(20) NOT NULL,
somevalue3 VARCHAR2(20) NOT NULL
) ;
How would you design table a_to_b?
I'll give some discussion starters:
synthetic id-PK column or combined a_id,b_id-PK (dropping the "id" column)
When synthetic: What other indices/constraints?
When combined: Also index on b_id? Or even b_id,a_id (don't think so)?
Also combined when these entries are referenced themselves?
Also combined when these entries perhaps are referenced themselves in the future?
Heap or Index-organized table
Always or only up to x "somevalue"-columns?
I know that the decision for one of the designs is closely related to the question how the table will be used (read/write ratio, density, etc.), but perhaps we get a 20/80 solution as blueprint for future readers.
I'm looking forward to your ideas!
Blama
I have always made the PK be the combination of the two FKs, a_id and b_id in your example. Adding a synthetic id field to this table does no good, since you never end up looking for a row based on a knowledge of its id.
Using the compound PK gives you a constraint that prevents the same instance of the relationship between a and b from being inserted twice. If duplicate entries need to be permitted, there's something wrong with your data model at the conceptual level.
The index you get behind the scenes (for every DBMS I know of) will be useful to speed up common joins. An extra index on b_id is sometimes useful, depending on the kinds of joins you do frequently.
Just as a side note, I don't use the name "id" for all my synthetic pk columns. I prefer a_id, b_id. It makes it easier to manage the metadata, even though it's a little extra typing.
CREATE TABLE A_TO_B
(
a_id NUMBER NOT NULL REFERENCES A (a_id),
b_id NUMBER NOT NULL REFERENCES B (b_id),
PRIMARY KEY (a_id, b_id),
...
) ;
It's not unusual for ORMs to require (or, in more clueful ORMs, hope for) an integer column named "id" in addition to whatever other keys you have. Apart from that, there's no need for it. An id number like that makes the table wider (which usually degrades I/O performance just slightly), and adds an index that is, strictly speaking, unnecessary. It isn't necessary to identify the entity--the existing key does that--and it leads new developers into bad habits. (Specifically, giving every table an integer column named "id", and believing that that column alone is the only key you need.)
You're likely to need one or more of these indexed.
a_id
b_id
{a_id, b_id}
{b_id, a_id}
I believe Oracle should automatically index {a_id, b_id}, because that's the primary key. Oracle doesn't automatically index foreign keys. Oracle's indexing guidelines are online.
In general, you need to think carefully about whether you need ON UPDATE CASCADE or ON DELETE CASCADE. In Oracle, you only need to think carefully about whether you need ON DELETE CASCADE. (Oracle doesn't support ON UPDATE CASCADE.)
the other comments so far are good.
also consider adding begin_dt and end_dt to the relationship. in this way, you can manage a good number of questions about each relationship through time. (consider baseline issues)
I would like to add a constraint which prevents adding a value to a column if the value exists in the primary key column of another table. Is this possible?
EDIT:
Table: MasterParts
MasterPartNumber (Primary Key)
Description
....
Table: AlternateParts
MasterPartNumber (Composite Primary Key, Foreign Key to MasterParts.MasterPartNumber)
AlternatePartNumber (Composite Primary Key)
Problem - Alternate part numbers for each master part number must not themselves exist in the master parts table.
EDIT 2:
Here is an example:
MasterParts
MasterPartNumber Decription MinLevel MaxLevel ReOderLevel
010-00820-50 Garmin GTN™ 750 1 5 2
AlternateParts
MasterPartNumber AlternatePartNumber
010-00820-50 0100082050
010-00820-50 GTN750
only way I could think of solving this would be writing a checking function(not sure what language you are working with), or trying to play around with table relationships to ensure that it's unique
Why not have a single "part" table with an "is master part" flag and then have an "alternate parts" table that maps a "master" part to one or more "alternate" parts?
Here's one way to do it without procedural code. I've deliberately left out ON UPDATE CASCADE and ON DELETE CASCADE, but in production I'd might use both. (But I'd severely limit who's allowed to update and delete part numbers.)
-- New tables
create table part_numbers (
pn varchar(50) primary key,
pn_type char(1) not null check (pn_type in ('m', 'a')),
unique (pn, pn_type)
);
create table part_numbers_master (
pn varchar(50) primary key,
pn_type char(1) not null default 'm' check (pn_type = 'm'),
description varchar(100) not null,
foreign key (pn, pn_type) references part_numbers (pn, pn_type)
);
create table part_numbers_alternate (
pn varchar(50) primary key,
pn_type char(1) not null default 'a' check (pn_type = 'a'),
foreign key (pn, pn_type) references part_numbers (pn, pn_type)
);
-- Now, your tables.
create table masterparts (
master_part_number varchar(50) primary key references part_numbers_master,
min_level integer not null default 0 check (min_level >= 0),
max_level integer not null default 0 check (max_level >= min_level),
reorder_level integer not null default 0
check ((reorder_level < max_level) and (reorder_level >= min_level))
);
create table alternateparts (
master_part_number varchar(50) not null references part_numbers_master (pn),
alternate_part_number varchar(50) not null references part_numbers_alternate (pn),
primary key (master_part_number, alternate_part_number)
);
-- Some test data
insert into part_numbers values
('010-00820-50', 'm'),
('0100082050', 'a'),
('GTN750', 'a');
insert into part_numbers_master values
('010-00820-50', 'm', 'Garmin GTN™ 750');
insert into part_numbers_alternate (pn) values
('0100082050'),
('GTN750');
insert into masterparts values
('010-00820-50', 1, 5, 2);
insert into alternateparts values
('010-00820-50', '0100082050'),
('010-00820-50', 'GTN750');
In practice, I'd build updatable views for master parts and for alternate parts, and I'd limit client access to the views. The updatable views would be responsible for managing inserts, updates, and deletes. (Depending on your company's policies, you might use stored procedures instead of updatable views.)
Your design is perfect.
But SQL isn't very helpful when you try to implement such a design. There is no declarative way in SQL to enforce your business rule. You'll have to write two triggers, one for inserts into masterparts, checking the new masterpart identifier doesn't yet exist as an alias, and the other one for inserts of aliases checking that the new alias identifier doesn't yet identiy a masterpart.
Or you can do this in the application, which is worse than triggers, from the data integrity point of view.
(If you want to read up on how to enforce constraints of arbitrary complexity within an SQL engine, best coverage I have seen of the topic is in the book "Applied Mathematics for Database Professionals")
Apart that it sounds like a possibly poor design,
You in essence want values spanning two columns in different tables, to be unique.
In order to utilize DBs native capability to check for uniqueness, you can create a 3rd, helper column, which will contain a copy of all the values inside the wanted two columns. And that column will have uniqueness constraint. So for each new value added to one of your target columns, you need to add the same value to the helper column. In order for this to be an inner DB constraint, you can add this by a trigger.
And again, needing to do the above, sounds like an evidence for a poor design.
--
Edit:
Regarding your edit:
You say " Alternate part numbers for each master part number must not themselves exist in the master parts table."
This itself is a design decision, which you don't explain.
I don't know enough about the domain of your problem, but:
If you think of master and alternate parts, as totally different things, there is no reason why you may want "Alternate part numbers for each master part number must not themselves exist in the master parts table". Otherwise, you have a common notion of "parts" be it master or alternate. This means they need to be in the same table, and column.
If the second is true, you need something like this:
table "parts"
columns:
id - pk
is_master - boolean (assuming a part can not be master and alternate at the same time)
description - text
This tables role is to list and describe the parts.
Then you have several ways to denote which part is alternate to which. It depends on whether a part can be alternate to more than one part. And it sounds that anyway one master part can have several alternates.
You can do it in the same table, or create another one.
If same: add column: alternate_to, which will be null for master parts, and will have a foreign key into the id column of the same table.
Otherwise create a table, say "alternatives" with: master_id, alternate_id both referencing with a foreign key to the parts table.
(The first above assumes that a part cannot be alternate to more than one other part. If this is not true, the second will work anyway)