Relational Database Design: Conditionals - sql

I'm designing a relational database that I plan to implement with SQL. I have a use case that I'm working on and seem to be having a bit of trouble thinking through the solution. The design is for an e-commerce order system.
Use Case:
The ORDER_DETAILS table contains a deliveryMethod attribute. I then have a SHIPPING_DETAILS table that contains address information and a PICKUP_DETAILS table that contains location, date, and time information for an in-person pickup. When a user places an order, they have the option to have their order shipped to their address or to pick up their order in person. My current thought is to have a shippingId foreign key and pickupId foreign key in the ORDER_DETAILS table. Then, basically run a conditional check on the deliveryMethod attribute and retrieve data from the appropriate table depending on the value of that attribute (either "shipping" or "pickup"). With this thought, however, I would be allowing for null values to be present in the ORDER_DETAILS for either the shippingId or the pickupId attributes. From my understanding, null values are viewed negatively in relational designs. So I'm looking for some feedback on this design. Is this okay? Am I overthinking the nulls? Is there a more efficient way to design this particular schema?

If I understand your problem correctly,
The cardinality of the relationship of ORDER to SHIPPING is 1 ---> (0, 1)
The cardinality of the relationship of ORDER to PICKUP is 1 ---> (0, 1)
An ORDER MUST have either a SHIPPING or a PICKUP, but not both.
To enforce the constraint (#3) you could define a functional constraint in the database. That gets into interesting stuff.
Anyway, like you say, you could make columns in ORDER that are FKs to the SHIPPING or PICKUP tables, but both of those are nullable. I don't think null FKs are evil or anything, but they do get messy especially if you had a whole bunch of delivery methods and not just two.
If you don't like the nulls, you could have separate association tables: (1) ORDER_DELIVERY that has just an order_id and an delivery_id, each are FKs to the respective tables, and (2) ORDER_PICKUP, also a two column table. In each case the primary key would be order_id. Now there are no nulls: the orders with delivery are in the ORDER_DELIVERY table and the orders with pickup are in ORDER_PICKUP.
Of course there's a tradeoff, as maintaining the constraint that there be exactly one and only one delivery method is not a consistency check across tables.
Another idea is to make the delivery and pickup details be JSON fields. Here you are doing more work on the application side, enforcing constraints programmatically, but you won't have nulls.
I wish I could say that there was a slam-dunk go-to design pattern here, but I don't see one. Personally with only two types of delivery methods, I would not shy from having nulls (as I'm not a purist). But I do love it when the database does the work, so....
(Oh, the answer to the question "are you over thinking things?" is no, this thinking is really good!)

Related

Determining if an entity is weak or not

I'm creating a relational database of a store and its stock of products.
In the brief, it says "products can be returned under agreed terms e.g. expiry date or manufacturers error", based on this I created a weak entity "Terms" with product_ID as the foreign key and errors & expiry as two attributes.
My logic was that the terms only exist if the product exists, therefore it is a weak attribute as every product has terms, but you wouldn't have terms not associated with a product.
Looking at it though, the "Terms" table would basically be Product ID (1) ---> Errors (No) ---> Expiry (01/01/23), and now I'm starting to think those two attributes should be attributes of the product table and not a separate entity, mainly because "Terms" doesn't have a partial/discriminator key that could be used as a composite primary.
Does anyone have any thoughts about which way is correct?
I think this answer really comes down to the trade-offs in terms of performance.
To make sure I understand your question correctly - you basically have two tables:
The main product table
A "lookup" table that just has Product_ID (FK), Errors, and Expiry as the columns
If this is the case, you have two options:
Just add Errors and Expiry as columns to the primary product table
Keep the two tables separated as you have them, and just JOIN that data when needed.
Option 1 has the benefit of keeping all the data in one table, assuming that "Expiry" and "Errors" are unique to the product_ID; if they're not, you may end up duplicating data, and it's better to keep these fields in your separate table to have a 1:Many relationship. The other drawback would be that if your main Product table is beefy, you've slowed down the query even further by adding these columns.
Option 2 can circumvent the two shortcomings of Option 1 - by keeping this data separate, your Product table is much lighter, and if you have a 1:many relationship, you don't duplicate data (saving you more memory overall!). The drawback with Option 2 is that your EDR gets a bit more complicated - you have one more table to keep track of.
Based on these, I recommend keeping your separate "lookup" table - the benefits of separating this data out will help you in the long run - but ultimately you'll need to weight the pros and cons since I don't know the extent of your project.

SQL Database Design: Bad practice to have parent table with only primary key?

My database holds two types of orders - internal and external. Since they are both order types I want them to share a primary key, which comes from the superentity 'RentOrder'. This design is shown below:
My questions:
Would it be considered bad practice that my RentOrder table contains only one column, which is the primary key - 'id'?
ExternalRentOrder and InternalRentOrder have a number of fields in common (e.g. orderDate, rentStartDate, rentEndDate etc.). Obviously these columns could be in the parent RentOrder table. However that would mean I would need to do a parent-child JOIN to get all InternalRentOrder or ExternalRentOrder data. This seems less efficient and performance is my priority. Is there a right way to do this and is my current solution ok?
Thank you for your time.
It is not unreasonable. That said, I usually have other columns in tables, such as:
createdAt -- datetime row was inserted
createdBy -- who created the row
In addition, common columns might also be helpful. In your case:
orderId
supplierId
orderDate
and so on.
In fact, there might be a fair amount of commonality, so you might find a single table is sufficient. Separate tables are helpful if you want foreign key relationships to InternalRentOrder and/or to ExternalRentOrder.
Finally, a type column might also be helpful. Depending on the database you are using, this can make it easier to ensure no duplication between the two tables.

Is it possible to implement a TRUE one-to-one relation?

Consider the following model where a Customer should have one and only one Address and an Address should belong to one and only one Customer:
To implement it, as almost everybody in DB field says, Shared PK is the solution:
But I think it is a fake one-to-one relationship. Because nothing in terms of database relationship actually prevents deleting any row in table Address. So truely, it is 1..[0..1] not 1..1
Am I right? Is there any other way to implement a true 1..1 relation?
Update:
Why cascade delete is not a solution:
If we consider cascade delete as a solution we should put this on either of the tables. Let's say if a row is deleted from table Address, it causes corresponding row in table Customer to be deleted. it's okay but half of the solution. If a row in Customer is deleted, the corresponding row in Address should be deleted as well. This is the second half of the solution, and it obviously makes a cycle.
Beside my comment
You could implement DELETE CASCADE See HOW
I realize there is also the problem of insert.
You have to insert Customer first and then Address
So I think the best way if you really want a 1:1 is create a single table instead.
Customer
CustomerID
Name
Address
City
Sorry, is this meant to be a real-world database relationship? In all of the many databases I have ever built with customer data, there has always been real cases of either customers with multiple addresses, or more than one organisation at the same address.
I wouldn't want to lead you into a database modelling fallacy by suggesting anything different.
Yes, the "shared PK" idiom you show is for 1-to-0-or-1.
The straightforward way to have a true 1-to-1 correspondence is to have one table with Customer and Address as CKs (candidate keys). (Via UNIQUE NOT NULL and/or PRIMARY KEY.) You could offer the separate tables as views. Unfortunately typical DBMSs have restrictions on what you can do via the views, in particular re updating.
The relational way to have separate CUSTOMER and ADDRESS tables and a third table/association/relationship with Customer and Address columns as CKs plus FKs on Customer to and from CUSTOMER and on Address to and from ADDRESS (or equivalent constraint(s)). Unfortunately most DBMSs needlessly won't let you declare cycles in FKs and you cannot impose the constraints without triggers/complexity. (Ultimately, if you want to have proper integrity in a typical SQL database you need to use triggers and complex idioms.)
Entity-oriented design methods unfortunately artificially distinguish between entities, associations and properties. Here is an example where if you consider the simplest design to simply be the one table with PKs then you don't want to always have to have distinct tables for each entity. Or if you consider the simplest design to be the three tables (or even two) with the PKs and FKs (or some other constraint(s) for 1-to-1) then unfortunately typical DBMSs just don't declaratively/ergonomically support that particular design situation.
(Straightforward relational design is to have values (that are sometimes used as ids) 1-to-1 with application things but then just have whatever relevant application relationships/associations/relations and corresponding/representing tables/relations as needed to describe your application situations.)
It's possible in principle to implement a true 1-1 data structure in some DBMSs. It's very difficult to add data or modify data in such a structure using standard SQL however. Standard SQL only permits one table to be updated at a time and therefore as soon as you insert a row into one or other table the intended constraint is broken.
Here are two examples. First using Tutorial D. Note that the comma between the two INSERT statements ensures that the 1-1 constraint is never broken:
VAR CUSTOMER REAL RELATION {
id INTEGER} KEY{id};
VAR ADDRESS REAL RELATION {
id INTEGER} KEY{id};
CONSTRAINT one_to_one (CUSTOMER{id} = ADDRESS{id});
INSERT CUSTOMER RELATION {
TUPLE {id 1234}
},
INSERT ADDRESS RELATION {
TUPLE {id 1234}
};
Now the same thing in SQL.
CREATE TABLE CUSTOMER (
id INTEGER NOT NULL PRIMARY KEY);
CREATE TABLE ADDRESS (
id INTEGER NOT NULL PRIMARY KEY);
INSERT INTO CUSTOMER (id)
VALUES (1234);
INSERT INTO ADDRESS (id)
VALUES (1234);
ALTER TABLE CUSTOMER ADD CONSTRAINT one_to_one_1
FOREIGN KEY (id) REFERENCES ADDRESS (id);
ALTER TABLE ADDRESS ADD CONSTRAINT one_to_one_2
FOREIGN KEY (id) REFERENCES CUSTOMER (id);
The SQL version uses two foreign key constraints, which is the only kind of multi-table constraint supported by most SQL DBMSs. It requires two INSERT statements which means I could only insert a row before adding the constraints, not after.
A strict one-to-one constraint probably isn't very useful in practice but it's actually just a special case of something more important and interesting: join dependency. A join dependency is effectively an "at least one" constraint between tables rather than "exactly one". In the world outside databases it is common to encounter examples of business rules that ought to be implemented as join dependencies ("each customer must have AT LEAST ONE addresss", "each order must have AT LEAST ONE item in it"). In SQL DBMSs it's hard or impossible to implement join dependencies. The usual solution is simply to ignore such business rules thus weakening the data integrity value of the database.
Yes, what you say is true, the dependent side of a 1:1 relationship may not exist -- if only for the time it takes to create the dependent entity after creating the independent entity. In fact, all relationships may have a zero on one side or the other. You can even turn the relationship into a 1:m by placing the FK of the address in the Customer row and making the field not null. You can still have addresses that aren't referenced by any customer.
At first glance, a m:n may look like an exception. The intersection entry is generally defined so that neither FK can be null. But there can be customers and addresses both that have no entry referring to them. So this is really a 0..m:0..n relationship.
What of it? Everyone I've ever worked with has understood that "one" (as in 1:1) or "many" (as in 1:m or m:n) means "no more than this." There is no "exactly this, no more or less." For example, we can design a 1:3 relationship on paper. We cannot strictly enforce it in any database. We have to use triggers, stored procedures and/or scheduled tasks to seek out and call our attention to deviations. Execute a stored procedure weekly, for instance, that will seek and and flag or delete any such orphaned addresses.
Think of it like a "frictionless surface." It exists only on paper.
I see this question as a conceptual misunderstanding. Relations are between different things. Things with a "true 1-to-1 relation" are by definition aspects or attributes of the same thing, and belong in the same table. No, of course a person and and address are not the same, but if they are inseparable, and must always be inserted, deleted, or otherwise acted upon as a unit, then as data they are "the same thing". This is exactly what is described here.
Yes, and it's actually quite easy: just put both entities in the same table!
OTOH, if you need to keep them in separate tables for some reason, then you need a key in one table referencing1 a key in another, and vice-versa. This, of course, represents a "chicken and egg" problem2 which can be resolved by deferring the enforcement of FKs to the end of the transaction3. This works only on DBMSes that support deferred constraints (such as Oracle and PostgreSQL).
1 Via a foreign key.
2 Inserting a row in the first table is impossible because that would violate the referential integrity towards the second table, but inserting a row in the second table is impossible because that would violate the referential integrity towards the first table, etc... Ditto for deletion.
3 So you simply insert both rows, and then check both FKs.

Normalize SQL database

I'm creating a database for a project and I'm a little confused about how normalization applies to my schema. Everytime a loan is aproved for a customer, they have 2 options a check or an EFT, so I want to know wheter the loan was a check or EFT.
This are my 3 tables:
Loans
id_loan (PK)
product
amount
status
Checks
id_check (PK)
id_customer
amount
EFT
id_eft (PK)
id_customer
amount
Then I created a 4th table to establish a relationship between loans and money disposal.
Disposal
id_payment (PK)
id_loan (FK loans)
id_disposal (FK checks or EFT)
disposal_type
In this table I store whether the loan is related to a check or an EFT, disposal_type field is a varchar with two possible values "check" or "EFT". id_disposal field acts as a foreign key for two tables.
The problem is that I think my database isn't normalized with this structure, am I right? What would be the best way to solve this?
You need something like the attached. Note that the customer_loans table is kind of extraneous and overkill, but if there's any columns that relate to the customer and the loan, and not the customer's loan payments, that's where it would go.
In the object world, you'd use inheritance for this. There would be a base type Disposal which CheckDisposal and EftDisposal would derive from. Modern O/RMs support several techniques for mapping this to a relational structure.
TablePerHierarchy puts all of the records into a single table with a discriminator column to identify what type a specific record holds and maps to. The advantage is that it requires fewer joins to get a record. Disadvantage is that it requires app logic to enforce data integrity.
TablePerType maps records into different tables with a fk relationship back to the base table. Of course this requires more joins (especially for deep or wide hierarchies) but data integrity can be enforced in the DB.

Multiple foreign keys to a single column

I'm defining a database for a customer/ order system where there are two highly distinct types of customers. Because they are so different having a single customer table would be very ugly (it'd be full of null columns as they are pointless for one type).
Their orders though are in the same format. Is it possible to have a CustomerId column in my Order table which has a foreign key to both the Customer Types? I have set it up in SQL server and it's given me no problems creating the relationships, but I'm yet to try inserting any data.
Also, I'm planning on using nHibernate as the ORM, could there be any problems introduced by doing the relationships like this?
No, you can't have a single field as a foreign key to two different tables. How would you tell where to look for the key?
You would at least need a field that tells what kind of user it is, or two separate foreign keys.
You could also put the information that is common for all users in one table and have separate tables for the information that is specific for the user types, so that you have a single table with user id as primary key.
A foreign key can only reference a single primary key, so no. However, you could use a bridge table:
CustomerA <---- CustomerA_Orders ----> Order
CustomerB <---- CustomerB_Orders ----> Order
So Order doesn't even have a foreign key; whether this is desirable, though...
I inherited a SQL Server database where this was done (a single column used in four foreign key relationships with four unrelated tables), so yes, it's possible. My predecessor is gone, though, so I can't ask why he thought it was a good idea.
He used a GUID column ("uniqueidentifier" type) to avoid the ambiguity problem, and he turned off constraint checking on the foreign keys, since it's guaranteed that only one will match. But I can think of lots of reasons that you shouldn't, and I haven't thought of any reasons you should.
Yours does sound like the classical "specialization" problem, typically solved by creating a parent table with the shared customer data, then two child tables that contain the data unique to each class of customer. Your foreign key would then be against the parent customer table, and your determination of which type of customer would be based on which child table had a matching entry.
You can create a foreign key referencing multiple tables. This feature is to allow vertical partioining of your table and still maintain referential integrity. In your case however, this is not applicable.
Your best bet would be to have a CustomerType table with possible columns - CustomerTypeID, CustomerID, where CustomerID is the PK and then refernce your OrderID table to CustomerID.
Raj
I know this is a very old question; however if other people are finding this question through the googles, and you don't mind adding some columns to your table, a technique I've used (using the original question as a hypothetical problem to solve) is:
Add a [CustomerType] column. The purpose of storing a value here is to indicate which table holds the PK for your (assumed) [CustomerId] FK column. Optional - addition of a check constraint (to ensure CustomerType is in CustomerA or CustomerB) will help you sleep better at night.
Add a computed column for each [CustomerType], eg:
[CustomerTypeAId] as case when [CustomerType] = 'CustomerA' then [CustomerId] end persisted
[CustomerTypeBId] as case when [CustomerType] = 'CustomerB' then [CustomerId] end persisted
Add your foreign keys to the calculated (and persisted) columns.
Caveat: I'm primarily in a MSSQL environment; so I don't know how well this translates to other DBMS (ie: Postgres, ORACLE, etc).
As noted, if the key is, say, 12345, how would you know which table to look it up in? You could, I suppose, do something to insure that the key values for the two tables never overlapped, but this is too ugly and painful to contemplate. You could have a second field that says which customer type it is. But if you're going to have two fields, why not have one field for customer type 1 id and another for customer type 2 id.
Without knowing more about your app, my first thought is that you really should have a general customer table with the data that is common to both, and then have two additional tables with the data specific to each customer type. I would think that there must be a lot of data common to the two -- basic stuff like name and address and customer number at the least -- and repeating columns across tables sucks big time. The additional tables could then refer back to the base table. As there is then a single key for the base table, the issue of foreign keys having to know which table to refer to evaporates.
Two distinct types of customer is a classic case of types and subtypes or, if you prefer, classes and subclasses. Here is an answer from another question.
Essentially, the class-table-inheritance technique is like Arnand's answer. The use of the shared-primary-key technique is what allows you to get around the problems created by two types of foreign key in one column. The foreign key will be customer-id. That will identify one row in the customer table, and also one row in the appropriate kind of customer type table, as the case may be.
Create a "customer" table include all the columns that have same data for both types of customer.
Than create table "customer_a" and "customer_b"
Use "customer_id" from "consumer" table as foreign key in "customer_a" and "customer_b"
customer
|
---------------------------------
| |
cusomter_a customer_b