SQL attributes depending on type

Let's say I have an entity CLIENT, which can be either a PERSON or an ORGANIZATION. Depending on which type it is, I have to choose attributes (address, name for organization, date_of_birth, first_name, last_name for person). I have created all three entities, but how can I make the attributes type-dependent?
I have seen "Database design: objects with different attributes", but it didn't help.

One typical choice is a 1:1 extension table:
create table client (id int primary key);
create table person (id int primary key references client(id), ...columns...);
create table organization (id int primary key references client(id), ...columns...);
However, my preferred choice is to include all columns in the client table. You can have a column for type that is either person or organization. Columns that are not relevant for the row's type can be null. Your queries will be much simpler that way.
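As a rough sketch of the single-table option, the following uses Python's built-in sqlite3 module (an assumption; the question does not name a DBMS). The column names come from the question, while the CHECK rules about which columns each type must fill are illustrative guesses, not requirements stated by the asker.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE client (
        id            INTEGER PRIMARY KEY,
        type          TEXT NOT NULL CHECK (type IN ('person', 'organization')),
        address       TEXT,
        name          TEXT,   -- organization only
        first_name    TEXT,   -- person only
        last_name     TEXT,   -- person only
        date_of_birth TEXT,   -- person only
        -- Assumed rule: each type fills its own columns and leaves the others NULL.
        CHECK (
            (type = 'person'
                AND first_name IS NOT NULL AND last_name IS NOT NULL
                AND name IS NULL)
            OR
            (type = 'organization'
                AND name IS NOT NULL
                AND first_name IS NULL AND last_name IS NULL
                AND date_of_birth IS NULL)
        )
    )
""")

conn.execute(
    "INSERT INTO client (type, address, first_name, last_name, date_of_birth) "
    "VALUES ('person', '1 Main St', 'Ada', 'Lovelace', '1815-12-10')"
)
conn.execute(
    "INSERT INTO client (type, address, name) "
    "VALUES ('organization', '2 High St', 'Acme Ltd')"
)


def try_insert_invalid():
    """Return True if the CHECK rejects an organization that carries a first_name."""
    try:
        conn.execute(
            "INSERT INTO client (type, name, first_name) "
            "VALUES ('organization', 'Acme Ltd', 'Ada')"
        )
    except sqlite3.IntegrityError:
        return True
    return False


rejected = try_insert_invalid()
client_count = conn.execute("SELECT COUNT(*) FROM client").fetchone()[0]
```

The CHECK constraint is what keeps the nullable columns honest; without it, nothing stops a row from mixing person and organization attributes.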

Either you use 3 tables, or you use 1 table and leave the unneeded columns null. Which design is superior depends on the use case. Using only 1 table gives simpler queries but requires changing the table for each new subclass. Using multiple tables makes it easy to add more types but gives more complicated queries. When in doubt, I would start with only 1 table, but your mileage may vary.

Related

SQL table design: table with multiple one-to-one relationships

In SQL I have a table setup
RegisterTable
----
DocId int
status int
docType int
CarDocument Table
----
carDocId int (PK, FK -> RegisterTable)
name string
carMake varchar
EmployeeDocument
----
emplyeeDocId int (PK, FK -> RegisterTable)
name varchar
age int
This is a database about documents. The design of the individual tables has no relevance to the question.
So I have different documents (Cars, Employees, etc.); they all have completely different, unrelated sets of fields.
I need to have metadata for these documents, which is represented in RegisterTable. This metadata is similar across documents. So it's a bit like inheritance.
What is the right DB design for this case? Currently I have made three separate tables and created a one-to-one relation from CarDocument/EmployeeDocument to RegisterTable.
When I create a document, I first create its metadata in RegisterTable, then I take the key and use it to create a document in the corresponding CarDocument or EmployeeDocument table.
This works but looks cumbersome to me.
Extra info: I have 10-20 different document tables.
I use typeorm as my ORM solution.
Research:
This has similarities with "Table has one to one relationship with many tables".
My design works but RegisterTable is kinda fake since it holds all the docIds.
Which is the best DB design for this case?

Postgres actually does inheritance - see https://www.postgresql.org/docs/current/tutorial-inheritance.html
Aside from that, if you have metadata that is always the same across various types of documents, your approach to have a metadata table with a relation to the document tables is the right one, in principle (see below).
The metadata table itself does not need to know about the tables that reference it. Your query logic can derive the correct secondary document from the docType and the docId.
For your specific case, as you've posted it above, if a single "status" field is the only actual metadata you hold in that table, I think you would be better off to simply add that field to the document tables. Only if you have a fixed set of metadata that you don't want to replicate over many different tables does it make sense to split it into its own table.
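The idea of deriving the right document table from docType and docId could look roughly like the sketch below, written with Python's sqlite3 module for the sake of a runnable example. The table and column names follow the question; the docType codes (1 for cars, 2 for employees), the DOC_TABLES mapping and the load_document helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.executescript("""
    CREATE TABLE RegisterTable (
        docId   INTEGER PRIMARY KEY,
        status  INTEGER NOT NULL,
        docType INTEGER NOT NULL
    );
    CREATE TABLE CarDocument (
        carDocId INTEGER PRIMARY KEY REFERENCES RegisterTable(docId),
        name     TEXT,
        carMake  TEXT
    );
    CREATE TABLE EmployeeDocument (
        emplyeeDocId INTEGER PRIMARY KEY REFERENCES RegisterTable(docId),
        name         TEXT,
        age          INTEGER
    );
    INSERT INTO RegisterTable VALUES (1, 0, 1), (2, 0, 2);
    INSERT INTO CarDocument VALUES (1, 'Company car', 'Mazda');
    INSERT INTO EmployeeDocument VALUES (2, 'Alice', 41);
""")

# Hypothetical mapping from docType codes to (table, key column).
DOC_TABLES = {
    1: ("CarDocument", "carDocId"),
    2: ("EmployeeDocument", "emplyeeDocId"),
}


def load_document(doc_id):
    """Fetch the metadata row, then the type-specific row chosen by docType."""
    meta = conn.execute(
        "SELECT status, docType FROM RegisterTable WHERE docId = ?", (doc_id,)
    ).fetchone()
    if meta is None:
        return None
    table, key = DOC_TABLES[meta["docType"]]
    # Table and column names come from the fixed mapping above, never from user input.
    row = conn.execute(
        f"SELECT * FROM {table} WHERE {key} = ?", (doc_id,)
    ).fetchone()
    details = dict(row) if row is not None else {}
    return {"status": meta["status"], "docType": meta["docType"], **details}


car = load_document(1)
employee = load_document(2)
```

The metadata table never references the document tables; only the application-side mapping knows which table belongs to which docType.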

I see nothing wrong with your design. One key point, though, is deciding whether you'll share the IDs across all your entities/tables (as you're doing) or have separate IDs. The second choice may be tidier and more flexible. You'll have something like this:
RegisterTable
----
docId int
status int
docType int
CarDocument
----
carDocId int (PK)
docId int (FK-> RegisterTable)
name string
carMake varchar
EmployeeDocument
----
emplyeeDocId int (PK)
docId int (FK-> RegisterTable)
name varchar
age int
Of course, you can also have just ONE big table with a lot of fields, filling each field (or not) depending on the docType, and maybe with different semantics for each different docType (no, I'm joking, don't do that).
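A minimal sketch of the separate-ID variant, again using Python's sqlite3 module as an assumed environment. The UNIQUE constraint on docId is an addition not shown in the listing above (it keeps the relation one-to-one), and create_car_document is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript("""
    CREATE TABLE RegisterTable (
        docId   INTEGER PRIMARY KEY,
        status  INTEGER NOT NULL,
        docType INTEGER NOT NULL
    );
    CREATE TABLE CarDocument (
        carDocId INTEGER PRIMARY KEY,
        docId    INTEGER NOT NULL UNIQUE REFERENCES RegisterTable(docId),
        name     TEXT,
        carMake  TEXT
    );
    -- A document of some other type, so the two ID sequences diverge.
    INSERT INTO RegisterTable (status, docType) VALUES (0, 2);
""")


def create_car_document(name, car_make, status=0):
    """Insert the metadata row and the car row in one transaction."""
    with conn:  # commits on success, rolls back if either insert fails
        doc_id = conn.execute(
            "INSERT INTO RegisterTable (status, docType) VALUES (?, 1)",
            (status,),
        ).lastrowid
        car_doc_id = conn.execute(
            "INSERT INTO CarDocument (docId, name, carMake) VALUES (?, ?, ?)",
            (doc_id, name, car_make),
        ).lastrowid
    return car_doc_id, doc_id


car_doc_id, doc_id = create_car_document("Company car", "Mazda")
```

Here carDocId and docId are assigned independently, so the car table keeps its own key sequence while still pointing at exactly one metadata row.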

There's a more flexible and scalable approach that can be used.
A single table stores all document metadata, and a separate table for each document type stores the details specific to that type.
RegisterTable can be renamed to DocumentMetadata and contains DocId, status, docType, etc.
The CarDocument and EmployeeDocument tables contain the columns specific to each type, such as carMake and age.
The tables can be linked via a foreign key from each document-specific table to the DocumentMetadata table.
This is not only more flexible, because you can keep adding new types of documents, but it also avoids creating a meaningless table that holds no real information (RegisterTable).

Questionable SQL Relationship

I am going through a Pluralsight course that covers building an MVC application using an Entity Framework code-first approach. I was confused about the database schema used for the project.
As you can see, the relationship between Securities and its related tables seems to be one-to-one, but the confusion comes when I realize there is no foreign key relating the two sub-tables, and they appear to share the same primary key column.
The previous video made the Securities model class abstract so that the "Stock" and "MutualFund" model classes could inherit from it and contain all the related data. To me, however, it seems that the same thing could be done using a couple of foreign keys.
I guess my question is: does this method of linking tables serve any useful purpose in SQL or EF? It seems to me that in order to create a new record in one table, all tables would need a new record, which is where I really get confused.

In ORM and EF terminology, this setup is referred to as the "Table per Type" inheritance paradigm, where there is a table per subclass, a base class table, and the primary key is shared between the subclasses and the base class.
For example, in this case Securities_Stock and Securities_MutualFund are two subclasses of the Securities base class / table (possibly abstract).
The relationship will be 0..1 (subclass) to 1 (base class) - i.e. only one of the records in Securities_MutualFund or Securities_Stock will exist for each base table Securities row.
There's also often a discriminator column on the base table to indicate which subclass table to join to, but that doesn't seem to be the case here.
It is also common to enforce referential integrity between the subclasses to the base table with a foreign key.
To answer your question, the reason why there's no FK between the two subclass instance tables is because each instance (with a unique Id) will only ever be in ONE of the sub class tables - it is NOT possible for the same Security to be both a mutual fund and a share.
You are right: in order to add a new concrete Security record, a row is needed in the base Securities table (it must be inserted first, as there are FKs from the subclass tables to the base table), and then a row is inserted into one of the subclass tables with the rest of the 'specific' data.
If a Foreign Key was added between Stock and Mutual Fund, it would be impossible to insert new rows into the tables.
The full pattern often looks like this:
CREATE TABLE BaseTable
(
Id INT PRIMARY KEY, -- Can also be Identity
-- ... Common columns here
Discriminator INT -- Type usually has a small range, so `INT` or `CHAR` are common
);
CREATE TABLE SubClassTable
(
Id INT PRIMARY KEY, -- Not identity, must be manually inserted
-- Specialized SubClass columns here
FOREIGN KEY (Id) REFERENCES BaseTable(Id)
);
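To make the insert order concrete, here is a hedged sketch of the pattern using Python's sqlite3 module (an assumption; the course itself targets Entity Framework). The table names follow the question, while the Symbol, Exchange and ExpenseRatio columns and both helper functions are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript("""
    CREATE TABLE Securities (
        Id            INTEGER PRIMARY KEY,
        Symbol        TEXT NOT NULL,  -- hypothetical common column
        Discriminator TEXT NOT NULL
            CHECK (Discriminator IN ('Stock', 'MutualFund'))
    );
    CREATE TABLE Securities_Stock (
        Id       INTEGER PRIMARY KEY REFERENCES Securities(Id),
        Exchange TEXT                 -- hypothetical subclass column
    );
    CREATE TABLE Securities_MutualFund (
        Id           INTEGER PRIMARY KEY REFERENCES Securities(Id),
        ExpenseRatio REAL             -- hypothetical subclass column
    );
""")


def add_stock(symbol, exchange):
    """Insert the base row first, then the subclass row with the same Id."""
    with conn:
        base_id = conn.execute(
            "INSERT INTO Securities (Symbol, Discriminator) VALUES (?, 'Stock')",
            (symbol,),
        ).lastrowid
        conn.execute(
            "INSERT INTO Securities_Stock (Id, Exchange) VALUES (?, ?)",
            (base_id, exchange),
        )
    return base_id


def orphan_rejected():
    """Return True if a subclass row without a base row is refused."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO Securities_Stock (Id, Exchange) VALUES (999, 'NYSE')"
            )
    except sqlite3.IntegrityError:
        return True
    return False


stock_id = add_stock("ACME", "NYSE")
orphan_was_rejected = orphan_rejected()
```

Only one subclass table receives a row for a given Id, which is why no foreign key between Securities_Stock and Securities_MutualFund is needed.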

Should columns in a junction table be able to store null values?

When creating a junction table in SQL to handle a many-to-many relationship between two tables, should the foreign key columns in the junction table be able to store null values?

It is a bad idea to do that because it stores no information.
A junction table is a link between 2 tables. If a record exists, by definition it must have the id from both sides to make a "junction" link. Otherwise it carries no useful information and is known as a waste of space (TM).

No. It doesn't make sense to store a row representing the absence of a relationship in a table you designed to store the presence of relationships.

In addition to the other answers:
The two columns referencing the other tables usually form the primary key of the junction table, so by definition they cannot be null.
There are some circumstances where those columns do not make up the complete primary key (e.g. when having an attribute as part of the link and allowing multiple links with different attributes) - but then that attribute is part of the PK.
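A small sketch of such a junction table, using Python's sqlite3 module as an assumed environment. The student and course tables are hypothetical stand-ins for the two sides of the relationship. Note that SQLite, unlike most databases, allows NULLs in ordinary composite primary keys, so NOT NULL is spelled out explicitly here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE student (id INTEGER PRIMARY KEY, name  TEXT NOT NULL);
    CREATE TABLE course  (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE student_course (
        student_id INTEGER NOT NULL REFERENCES student(id),
        course_id  INTEGER NOT NULL REFERENCES course(id),
        PRIMARY KEY (student_id, course_id)
    );
    INSERT INTO student VALUES (1, 'Alice');
    INSERT INTO course  VALUES (1, 'Databases');
    INSERT INTO student_course VALUES (1, 1);
""")


def insert_link(student_id, course_id):
    """Return True if the link row is accepted, False if a constraint refuses it."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO student_course (student_id, course_id) VALUES (?, ?)",
                (student_id, course_id),
            )
    except sqlite3.IntegrityError:
        return False
    return True


half_link_accepted = insert_link(1, None)  # missing one side of the junction
duplicate_accepted = insert_link(1, 1)     # same link twice
```

Both attempts are refused: the first by NOT NULL, the second by the composite primary key.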

Multiple foreign keys to a single column

I'm defining a database for a customer/order system where there are two highly distinct types of customers. Because they are so different, having a single customer table would be very ugly (it'd be full of null columns, as many columns are pointless for one type or the other).
Their orders, though, are in the same format. Is it possible to have a CustomerId column in my Order table which has a foreign key to both customer types? I have set it up in SQL Server and it gave me no problems creating the relationships, but I have yet to try inserting any data.
Also, I'm planning on using NHibernate as the ORM. Could doing the relationships like this introduce any problems?

No, you can't have a single field as a foreign key to two different tables. How would you tell where to look for the key?
You would at least need a field that tells what kind of user it is, or two separate foreign keys.
You could also put the information that is common for all users in one table and have separate tables for the information that is specific for the user types, so that you have a single table with user id as primary key.
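The "two separate foreign keys" option might be sketched as follows, using Python's sqlite3 module as an assumed environment. The table and column names are hypothetical, and the CHECK constraint that exactly one of the two keys is set is an addition to what the answer describes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE customer_a (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE customer_b (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (
        id            INTEGER PRIMARY KEY,
        customer_a_id INTEGER REFERENCES customer_a(id),
        customer_b_id INTEGER REFERENCES customer_b(id),
        -- Exactly one of the two references must be filled in.
        CHECK (
            (customer_a_id IS NOT NULL AND customer_b_id IS NULL)
            OR
            (customer_a_id IS NULL AND customer_b_id IS NOT NULL)
        )
    );
    INSERT INTO customer_a VALUES (1, 'Alice');
    INSERT INTO customer_b VALUES (1, 'Acme Ltd');
""")


def place_order(customer_a_id=None, customer_b_id=None):
    """Return True if the order row is accepted, False if a constraint refuses it."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO orders (customer_a_id, customer_b_id) VALUES (?, ?)",
                (customer_a_id, customer_b_id),
            )
    except sqlite3.IntegrityError:
        return False
    return True


ok_a = place_order(customer_a_id=1)
ok_b = place_order(customer_b_id=1)
neither = place_order()
both = place_order(customer_a_id=1, customer_b_id=1)
unknown = place_order(customer_a_id=42)  # no such customer_a row
```

Because each column references exactly one table, the key 1 is unambiguous even though both customer tables contain an id of 1.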

A foreign key can only reference a single primary key, so no. However, you could use a bridge table:
CustomerA <---- CustomerA_Orders ----> Order
CustomerB <---- CustomerB_Orders ----> Order
So Order doesn't even have a foreign key; whether this is desirable, though...

I inherited a SQL Server database where this was done (a single column used in four foreign key relationships with four unrelated tables), so yes, it's possible. My predecessor is gone, though, so I can't ask why he thought it was a good idea.
He used a GUID column ("uniqueidentifier" type) to avoid the ambiguity problem, and he turned off constraint checking on the foreign keys, since it's guaranteed that only one will match. But I can think of lots of reasons that you shouldn't, and I haven't thought of any reasons you should.
Yours does sound like the classical "specialization" problem, typically solved by creating a parent table with the shared customer data, then two child tables that contain the data unique to each class of customer. Your foreign key would then be against the parent customer table, and your determination of which type of customer would be based on which child table had a matching entry.

You can create a foreign key referencing multiple tables. This feature exists to allow vertical partitioning of your table while still maintaining referential integrity. In your case, however, it is not applicable.
Your best bet would be to have a CustomerType table with possible columns CustomerTypeID and CustomerID, where CustomerID is the PK, and then have your Order table reference CustomerID.
Raj

I know this is a very old question; however, if other people are finding it through a search engine and you don't mind adding some columns to your table, a technique I've used (taking the original question as a hypothetical problem to solve) is:
1. Add a [CustomerType] column. The purpose of storing a value here is to indicate which table holds the PK for your (assumed) [CustomerId] FK column. Optionally, add a check constraint (to ensure CustomerType is either CustomerA or CustomerB); it will help you sleep better at night.
2. Add a computed column for each [CustomerType], e.g.:
[CustomerTypeAId] as case when [CustomerType] = 'CustomerA' then [CustomerId] end persisted
[CustomerTypeBId] as case when [CustomerType] = 'CustomerB' then [CustomerId] end persisted
3. Add your foreign keys to the computed (and persisted) columns.
Caveat: I'm primarily in an MSSQL environment, so I don't know how well this translates to other DBMSs (e.g. Postgres, Oracle, etc.).

As noted, if the key is, say, 12345, how would you know which table to look it up in? You could, I suppose, do something to ensure that the key values for the two tables never overlap, but this is too ugly and painful to contemplate. You could have a second field that says which customer type it is. But if you're going to have two fields, why not have one field for the customer type 1 id and another for the customer type 2 id?
Without knowing more about your app, my first thought is that you really should have a general customer table with the data that is common to both, and then have two additional tables with the data specific to each customer type. I would think that there must be a lot of data common to the two -- basic stuff like name and address and customer number at the least -- and repeating columns across tables sucks big time. The additional tables could then refer back to the base table. As there is then a single key for the base table, the issue of foreign keys having to know which table to refer to evaporates.

Two distinct types of customer is a classic case of types and subtypes or, if you prefer, classes and subclasses. Here is an answer from another question.
Essentially, the class-table-inheritance technique is like Arnand's answer. The use of the shared-primary-key technique is what allows you to get around the problems created by two types of foreign key in one column. The foreign key will be customer-id. That will identify one row in the customer table, and also one row in the appropriate kind of customer type table, as the case may be.

Create a "customer" table that includes all the columns holding the same data for both types of customer.
Then create the tables "customer_a" and "customer_b".
Use "customer_id" from the "customer" table as a foreign key in "customer_a" and "customer_b".
             customer
                |
       -------------------
       |                 |
  customer_a        customer_b
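The layout above could be sketched like this, with Python's sqlite3 module as an assumed environment. The loyalty_points and vat_number columns, the orders table layout and the customer_type_for_order helper are hypothetical; the point is that orders carry a single foreign key to the parent table, and the customer type is derived from which child table has a matching row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    );
    CREATE TABLE customer_a (
        customer_id    INTEGER PRIMARY KEY REFERENCES customer(customer_id),
        loyalty_points INTEGER       -- hypothetical type-A column
    );
    CREATE TABLE customer_b (
        customer_id INTEGER PRIMARY KEY REFERENCES customer(customer_id),
        vat_number  TEXT             -- hypothetical type-B column
    );
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id)
    );
    INSERT INTO customer   VALUES (1, 'Alice'), (2, 'Acme Ltd');
    INSERT INTO customer_a VALUES (1, 120);
    INSERT INTO customer_b VALUES (2, 'GB123');
    INSERT INTO orders     VALUES (10, 1), (11, 2);
""")


def customer_type_for_order(order_id):
    """Return 'A' or 'B' depending on which child table matches, else None."""
    row = conn.execute(
        """
        SELECT CASE
                   WHEN a.customer_id IS NOT NULL THEN 'A'
                   WHEN b.customer_id IS NOT NULL THEN 'B'
               END
        FROM orders AS o
        JOIN customer AS c ON c.customer_id = o.customer_id
        LEFT JOIN customer_a AS a ON a.customer_id = c.customer_id
        LEFT JOIN customer_b AS b ON b.customer_id = c.customer_id
        WHERE o.order_id = ?
        """,
        (order_id,),
    ).fetchone()
    return row[0] if row else None


type_of_order_10 = customer_type_for_order(10)
type_of_order_11 = customer_type_for_order(11)
```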

How to design a database schema to support tagging with categories?

I am trying to do something like "Database Design for Tagging", except that each of my tags is grouped into a category.
For example, let's say I have a database about vehicles. Let's say we actually don't know very much about vehicles, so we can't specify the columns all vehicles will have. Therefore we shall "tag" vehicles with information.
1. manufacture: Mercedes
model: SLK32 AMG
convertible: hardtop
2. manufacture: Ford
model: GT90
production phase: prototype
3. manufacture: Mazda
model: MX-5
convertible: softtop
Now, as you can see, all cars are tagged with their manufacture and model, but the other categories don't all match. Note that a car can have only one tag per category, i.e. a car can only have one manufacturer.
I want to design a database to support a search for all Mercedes, or to be able to list all manufactures.
My current design is something like this:
vehicles
int vid
String vin
vehicleTags
int vid
int tid
tags
int tid
String tag
int cid
categories
int cid
String category
I have all the right primary and foreign keys in place, except I can't handle the case where each car can only have one manufacturer. Or can I?
Can I add a constraint to the composite primary key in vehicleTags? That is, could I add a constraint such that a row (vid, tid) can be added to vehicleTags only if there isn't already a row for the same vid whose tid belongs to the same cid?
My guess is no. I think the solution to this problem is add a cid column to vehicleTags, and make the new composite primary key (vid, cid). It would look like:
vehicleTags
int vid
int cid
int tid
This would prevent a car from having two manufacturers, but now I have duplicated the information that tid is in cid.
What should my schema be?
Tom noticed this problem in my database schema in my previous question, How do you do many to many table outer joins?
EDIT
I know that in the example manufacture should really be a column in the vehicle table, but let's say you can't do that. The example is just an example.

This is yet another variation on the Entity-Attribute-Value design.
A more recognizable EAV table looks like the following:
CREATE TABLE vehicleEAV (
vid INTEGER,
attr_name VARCHAR(20),
attr_value VARCHAR(100),
PRIMARY KEY (vid, attr_name),
FOREIGN KEY (vid) REFERENCES vehicles (vid)
);
Some people force attr_name to reference a lookup table of predefined attribute names, to limit the chaos.
What you've done is simply spread an EAV table over three tables, but without improving the order of your metadata:
CREATE TABLE vehicleTag (
vid INTEGER,
cid INTEGER,
tid INTEGER,
PRIMARY KEY (vid, cid),
FOREIGN KEY (vid) REFERENCES vehicles(vid),
FOREIGN KEY (cid) REFERENCES categories(cid),
FOREIGN KEY (tid) REFERENCES tags(tid)
);
CREATE TABLE categories (
cid INTEGER PRIMARY KEY,
category VARCHAR(20) -- "attr_name"
);
CREATE TABLE tags (
tid INTEGER PRIMARY KEY,
tag VARCHAR(100) -- "attr_value"
);
If you're going to use the EAV design, you only need the vehicleTags and categories tables.
CREATE TABLE vehicleTag (
vid INTEGER,
cid INTEGER, -- reference to "attr_name" lookup table
tag VARCHAR(100), -- "attr_value"
PRIMARY KEY (vid, cid),
FOREIGN KEY (vid) REFERENCES vehicles(vid),
FOREIGN KEY (cid) REFERENCES categories(cid)
);
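As a hedged illustration of what the (vid, cid) primary key buys you, the sketch below loads this two-table layout into SQLite through Python's sqlite3 module (an assumed environment). The sample VINs are placeholders, and the three helper functions are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE vehicles   (vid INTEGER PRIMARY KEY, vin TEXT);
    CREATE TABLE categories (cid INTEGER PRIMARY KEY, category VARCHAR(20));
    CREATE TABLE vehicleTag (
        vid INTEGER NOT NULL,
        cid INTEGER NOT NULL,   -- reference to "attr_name" lookup table
        tag VARCHAR(100),       -- "attr_value"
        PRIMARY KEY (vid, cid),
        FOREIGN KEY (vid) REFERENCES vehicles(vid),
        FOREIGN KEY (cid) REFERENCES categories(cid)
    );
    INSERT INTO vehicles   VALUES (1, 'VIN-1'), (2, 'VIN-2');  -- placeholder VINs
    INSERT INTO categories VALUES (1, 'manufacture'), (2, 'model');
    INSERT INTO vehicleTag VALUES
        (1, 1, 'Mercedes'), (1, 2, 'SLK32 AMG'),
        (2, 1, 'Ford'),     (2, 2, 'GT90');
""")


def vehicles_tagged(category, tag):
    """Return the vids of all vehicles carrying the given tag in the given category."""
    rows = conn.execute(
        """
        SELECT vt.vid
        FROM vehicleTag AS vt
        JOIN categories AS c ON c.cid = vt.cid
        WHERE c.category = ? AND vt.tag = ?
        ORDER BY vt.vid
        """,
        (category, tag),
    ).fetchall()
    return [r[0] for r in rows]


def all_tags_in(category):
    """Return the distinct tag values used in one category, sorted."""
    rows = conn.execute(
        """
        SELECT DISTINCT vt.tag
        FROM vehicleTag AS vt
        JOIN categories AS c ON c.cid = vt.cid
        WHERE c.category = ?
        ORDER BY vt.tag
        """,
        (category,),
    ).fetchall()
    return [r[0] for r in rows]


def second_manufacturer_rejected():
    """Return True if vehicle 1 cannot get a second 'manufacture' tag."""
    try:
        conn.execute("INSERT INTO vehicleTag VALUES (1, 1, 'Ford')")
    except sqlite3.IntegrityError:
        return True
    return False


mercedes = vehicles_tagged("manufacture", "Mercedes")
manufacturers = all_tags_in("manufacture")
rejected = second_manufacturer_rejected()
```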
But keep in mind that you're mixing data with metadata. You lose the ability to apply certain constraints to your data model.
How can you make one of the categories mandatory (a conventional column uses a NOT NULL constraint)?
How can you use SQL data types to validate some of your tag values? You can't, because you're using a long string for every tag value. Is this string long enough for every tag you'll need in the future? You can't tell.
How can you constrain some of your tags to a set of permitted values (a conventional table uses a foreign key to a lookup table)? This is your "softtop" vs. "soft top" example. But you can't make a constraint on the tag column because that constraint would apply to all other tag values for other categories. You'd effectively restrict engine size and paint color to "soft top" as well.
SQL databases don't work well with this model. It's extremely difficult to get right, and querying it becomes very complex. If you do continue to use SQL, you will be better off modeling the tables conventionally, with one column per attribute. If you need "subtypes", then define a subordinate table per subtype (Class-Table Inheritance), or else use Single-Table Inheritance. If you have unlimited variation in the attributes per entity, then use Serialized LOB.
Another technology that is designed for these kinds of fluid, non-relational data models is a Semantic Database, storing data in RDF and queried with SPARQL. One free solution is RDF4J (formerly Sesame).

What you describe are not tags. Tags are only values; they do not have an associated key.
Tags are normally implemented as a string column whose value is a delimited list of values.
For example #1, a tag field would contain a value such as:
"manufacture_Mercedes,model_SLK32 AMG,convertible_hardtop"
The user then would normally be able to easily filter entries, by the existence of one or more tags. It is essentially schemaless data from a database perspective. There are downsides to tags, but they also avoid the extreme complications that come from using an EAV model. If you really need an EAV model, it also might be worth considering an attributes field, which contains JSON data. It's more painful to query, but still not as horrible as querying EAV across multiple tables.

I think your solution is to simply add a manufacturer column to your vehicles table. It's an attribute that you know all the vehicles will have (i.e. cars don't spontaneously appear by themselves) and by making it a column in your vehicle table you solve the issue of having one and only one manufacturer for each vehicle. This approach would apply to any attributes that you know will be shared by all vehicles. You can then implement the tagging system for the other attributes that aren't universal.
So taking from your example the vehicle table would be something like:
vehicle
vid
vin
make
model

One way would be to slightly rethink your schema, normalising tag keys away from values:
vehicles
int vid
string vin
tags
int tid
int cid
string key
categories
int cid
string category
vehicleTags
int vid
int tid
string value
Now all you need is a unique constraint on vehicleTags(vid, tid).
Alternatively, there are ways to create constraints beyond simple foreign keys: depending on your database, can you write a custom constraint or an insert/update trigger to enforce vehicle-tag uniqueness?
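The trigger route could look roughly like the sketch below, applied to the question's original four-table schema and run through Python's sqlite3 module (an assumed environment; trigger syntax differs between databases). The trigger name, the sample rows and the tag_vehicle helper are hypothetical, and only INSERTs are covered; an UPDATE on vehicleTags would need a similar trigger.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE vehicles   (vid INTEGER PRIMARY KEY, vin TEXT);
    CREATE TABLE categories (cid INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE tags (
        tid INTEGER PRIMARY KEY,
        tag TEXT,
        cid INTEGER NOT NULL REFERENCES categories(cid)
    );
    CREATE TABLE vehicleTags (
        vid INTEGER NOT NULL REFERENCES vehicles(vid),
        tid INTEGER NOT NULL REFERENCES tags(tid),
        PRIMARY KEY (vid, tid)
    );

    -- Refuse a new tag if the vehicle already has a tag from the same category.
    CREATE TRIGGER one_tag_per_category
    BEFORE INSERT ON vehicleTags
    FOR EACH ROW
    BEGIN
        SELECT RAISE(ABORT, 'vehicle already has a tag in this category')
        WHERE EXISTS (
            SELECT 1
            FROM vehicleTags AS vt
            JOIN tags AS existing ON existing.tid = vt.tid
            JOIN tags AS incoming ON incoming.tid = NEW.tid
            WHERE vt.vid = NEW.vid
              AND existing.cid = incoming.cid
        );
    END;

    INSERT INTO vehicles   VALUES (1, 'VIN-1');  -- placeholder VIN
    INSERT INTO categories VALUES (1, 'manufacture'), (2, 'model');
    INSERT INTO tags       VALUES (1, 'Mercedes', 1), (2, 'Ford', 1), (3, 'SLK32 AMG', 2);
    INSERT INTO vehicleTags VALUES (1, 1);
    INSERT INTO vehicleTags VALUES (1, 3);
""")


def tag_vehicle(vid, tid):
    """Return True if the tag was attached, False if the trigger refused it."""
    try:
        conn.execute("INSERT INTO vehicleTags (vid, tid) VALUES (?, ?)", (vid, tid))
    except sqlite3.DatabaseError:
        return False
    return True


accepted = tag_vehicle(1, 2)  # a second manufacturer for vehicle 1
tag_count = conn.execute(
    "SELECT COUNT(*) FROM vehicleTags WHERE vid = 1"
).fetchone()[0]
```

This keeps the tid-to-cid mapping in one place (the tags table) at the cost of moving the rule out of declarative constraints and into procedural code.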

I needed to solve this exact problem (same general domain and everything — auto parts). I found that the best solution to the problem was to use Lucene/Xapian/Ferret/Sphinx or whichever full-text indexer you prefer. Much better performance than what SQL can offer.
These days, I almost never end up building a database-backed web app that doesn't involve a full-text indexer. This problem and the general issue of search just come up way too often to omit indexers from your toolbox.
