Determining if an entity is weak or not - sql

I'm creating a relational database of a store and its stock of products.
In the brief, it says "products can be returned under agreed terms e.g. expiry date or manufacturers error", based on this I created a weak entity "Terms" with product_ID as the foreign key and errors & expiry as two attributes.
My logic was that the terms only exist if the product exists, therefore it is a weak attribute as every product has terms, but you wouldn't have terms not associated with a product.
Looking at it though, the "Terms" table would basically be Product ID (1) ---> Errors (No) ---> Expiry (01/01/23), and now I'm starting to think those two attributes should be attributes of the product table and not a separate entity, mainly because "Terms" doesn't have a partial/discriminator key that could be used as a composite primary.
Does anyone have any thoughts about which way is correct?

I think this answer really comes down to the trade-offs in terms of performance.
To make sure I understand your question correctly - you basically have two tables:
The main product table
A "lookup" table that just has Product_ID (FK), Errors, and Expiry as the columns
If this is the case, you have two options:
Just add Errors and Expiry as columns to the primary product table
Keep the two tables separated as you have them, and just JOIN that data when needed.
Option 1 has the benefit of keeping all the data in one table, assuming that "Expiry" and "Errors" are unique to the product_ID; if they're not, you may end up duplicating data, and it's better to keep these fields in your separate table to have a 1:Many relationship. The other drawback would be that if your main Product table is beefy, you've slowed down the query even further by adding these columns.
Option 2 can circumvent the two shortcomings of Option 1 - by keeping this data separate, your Product table is much lighter, and if you have a 1:many relationship, you don't duplicate data (saving you more memory overall!). The drawback with Option 2 is that your EDR gets a bit more complicated - you have one more table to keep track of.
Based on these, I recommend keeping your separate "lookup" table - the benefits of separating this data out will help you in the long run - but ultimately you'll need to weight the pros and cons since I don't know the extent of your project.

Related

Many-to-many relationship without intersection table?

I am setting up a database where I'd like to have many-to-many relationships between some tables. There's no user interface for this database; we will be putting data into the tables using R scripts and retrieving it using Python scripts.
The entities involved are projects and cost forecasts. Multiple projects may use the same forecast. For each forecast, there are costs to develop a project in each of several future years. I need to be able to retrieve the cost forecast for each future year for each individual project.
I think the tables below would be a fairly standard way to represent these relationships. Note that "pk" means "primary key" and "fk" means "foreign key".
PROJECT
name
forecast_id (fk)
FORECAST
forecast_id (pk)
COST
forecast_id (fk)
year
cost
To retrieve the forecast for a particular project, I would just retrieve all the rows from COST that have a matching forecast_id. I don't need the FORECAST table for anything, except as a home for the forecast_id that establishes the many-to-many relationship between PROJECT and COST.
So my main question is, can I just drop the FORECAST table and have a direct many-to-many relationship between PROJECT and COST, using the forecast_id? I know this is physically possible, but many discussions use language along the lines that "many-to-many relationships aren't possible without a bridge table." But why would I want to add the bridge table, if I can do all my queries without it and it is one more table I would have to maintain?
Going further, many discussions of many-to-many relationships (including #mike-organek's comment below) suggest a structure similar to this:
PROJECT
project_id (pk)
name
PROJECT_COST
project_id (fk)
cost_id (fk)
COST
cost_id (pk)
year
cost
While this seems like a commonly preferred approach, it suits my needs even less well. Now every time I add a new project, instead of just assigning the forecast_id corresponding to a particular forecast, I have to add a bunch of link records to the PROJECT_COST table, one for each future year. This will also require a lot of management, and allows potential creation of relationships I don't want (e.g., one project uses costs from one forecast for the first two years, then costs from a different forecast for the next two years).
So my second question is, is there anything preferable about the second approach over the first approach, or over my simplified approach (using just the PROJECT and COST tables)?
Update
There seems to be some confusion about what I'm asking here. So I've revised the question significantly to try to make it clearer. Note that I renamed cost_group to forecast as part of this.
The second approach (with the project_cost table containing two foreign keys) is the correct way to model a many-to-many relationship.
But your idea with the shared forecast_id (with or without forecast table) exhibits that you are not thinking of a many-to-many relationship in the ordinary sense: if one project is associated with a certain set of costs, all other projects must either be associated with the same or a disjoint set of costs.
If that is what you want, I see no problem with removing the forecast table. There is no referential integrity you are losing that way.
If you have additional requirements, for example that there has to be at least a cost and a project for each existing forecast_id, things may change. That could be guaranteed with foreign keys from the forecast table, but not without that table.

Relational Database Design: Conditionals

I'm designing a relational database that I plan to implement with SQL. I have a use case that I'm working on and seem to be having a bit of trouble thinking through the solution. The design is for an e-commerce order system.
Use Case:
The ORDER_DETAILS table contains a deliveryMethod attribute. I then have a SHIPPING_DETAILS table that contains address information and a PICKUP_DETAILS table that contains location, date, and time information for an in-person pickup. When a user places an order, they have the option to have their order shipped to their address or to pick up their order in person. My current thought is to have a shippingId foreign key and pickupId foreign key in the ORDER_DETAILS table. Then, basically run a conditional check on the deliveryMethod attribute and retrieve data from the appropriate table depending on the value of that attribute (either "shipping" or "pickup"). With this thought, however, I would be allowing for null values to be present in the ORDER_DETAILS for either the shippingId or the pickupId attributes. From my understanding, null values are viewed negatively in relational designs. So I'm looking for some feedback on this design. Is this okay? Am I overthinking the nulls? Is there a more efficient way to design this particular schema?
If I understand your problem correctly,
The cardinality of the relationship of ORDER to SHIPPING is 1 ---> (0, 1)
The cardinality of the relationship of ORDER to PICKUP is 1 ---> (0, 1)
An ORDER MUST have either a SHIPPING or a PICKUP, but not both.
To enforce the constraint (#3) you could define a functional constraint in the database. That gets into interesting stuff.
Anyway, like you say, you could make columns in ORDER that are FKs to the SHIPPING or PICKUP tables, but both of those are nullable. I don't think null FKs are evil or anything, but they do get messy especially if you had a whole bunch of delivery methods and not just two.
If you don't like the nulls, you could have separate association tables: (1) ORDER_DELIVERY that has just an order_id and an delivery_id, each are FKs to the respective tables, and (2) ORDER_PICKUP, also a two column table. In each case the primary key would be order_id. Now there are no nulls: the orders with delivery are in the ORDER_DELIVERY table and the orders with pickup are in ORDER_PICKUP.
Of course there's a tradeoff, as maintaining the constraint that there be exactly one and only one delivery method is not a consistency check across tables.
Another idea is to make the delivery and pickup details be JSON fields. Here you are doing more work on the application side, enforcing constraints programmatically, but you won't have nulls.
I wish I could say that there was a slam-dunk go-to design pattern here, but I don't see one. Personally with only two types of delivery methods, I would not shy from having nulls (as I'm not a purist). But I do love it when the database does the work, so....
(Oh, the answer to the question "are you over thinking things?" is no, this thinking is really good!)

Two identical tables: Keep them seperate or merge them with extra key column

Lets assume that we have two statuses tables which are identical. Each table has his own values. They don't share the same.
Now, what is best practice?
Keep those seperated, or
Merge them with additional key column
Separate:
Table: offer_statuses
- id
- name //For example: calculating, sent
Table: project_statuses
- id
- name //For example: preparation, in progress
Merged:
Table: statuses
- id
- status //For example: offer, project
- name //For example: calculating, sent, preparation, in progress
Id keep then separate. A project status is not an offer status. You gain nothing by combining, and now every foreign key to the combined status table will be two columns instead of one. You can also introduce errors, as your foreign keys will not prevent you from using an offer status where only a project status is valid.
You can go either way. Normally, there would be a separate reference table for each type of status, because this would allow both:
foreign key relations from offers and projects to the appropriate statuses; and
no accidental mixing of statuses between the two entities.
There are some situations where having all statuses in the same table is useful. For instance, if the statuses really do overlap, then putting them in the same table makes sense. Another use-case is internationalization. If the application needs to be easily translatable, then having all language strings (such as status descriptions) in a single table (or small number thereof) is helpful.
In other words, I would usually go for separate tables. However, there might be good reasons for combining them.
I'd recommend to keep them separated. It sounds like they're different from the logical point of view. Having the same attributes is an indicator two tables might model the same entity but no proof.
Keeping them different makes it easier to enforce correctness of the data, as you can use foreign key constraints, that point to the right status. If your just have one table it becomes more difficult to ensure, that e.g. offers only use status for offers.

One mysql table with many fields or many (hundreds of) tables with fewer fields?

I am designing a system for a client, where he is able to create data forms for various products he sales him self.
The number of fields he will be using will not be more than 600-700 (worst case scenario). As it looks like he will probably be in the range of 400 - 500 (max).
I had 2 methods in mind for creating the database (using meta data):
a) Create a table for each product, which will hold only fields necessary for this product, which will result to hundreds of tables but with only the neccessary fields for each product
or
b) use one single table with all availabe form fields (any range from current 300 to max 700), resulting in one table that will have MANY fields, of which only about 10% will be used for each product entry (a product should usualy not use more than 50-80 fields)
Which solution is best? keeping in mind that table maintenance (creation, updates and changes) to the table(s) will be done using meta data, so I will not need to do changes to the table(s) manually.
Thank you!
/**** UPDATE *****/
Just an update, even after this long time (and allot of additional experience gathered) I needed to mention that not normalizing your database is a terrible idea. What is more, a not normalized database almost always (just always from my experience) indicates a flawed application design as well.
i would have 3 tables:
product
id
name
whatever else you need
field
id
field name
anything else you might need
product_field
id
product_id
field_id
field value
Your key deciding factor is whether normalization is required. Even though you are only adding data using an application, you'll still need to cater for anomalies, e.g. what happens if someone's phone number changes, and they insert multiple rows over the lifetime of the application? Which row contains the correct phone number?
As an example, you may find that you'll have repeating groups in your data, like one person with several phone numbers; rather than have three columns called "Phone1", "Phone2", "Phone3", you'd break that data into its own table.
There are other issues in normalisation, such as transitive or non-key dependencies. These concepts will hopefully lead you to a database table design without modification anomalies, as you should hope for!
Pulegiums solution is a good way to go.
You do not want to go with the one-table-for-each-product solution, because the structure of your database should not have to change when you insert or delete a product. Only the rows of one or many tables should be inserted or deleted, not the tables themselves.
While it's possible that it may be necessary, having that many fields for something as simple as a product list sounds to me like you probably have a flawed design.
You need to analyze your potential table structures to ensure that each field contains no more than one piece of information (e.g., "2 hammers, 500 nails" in a single field is bad) and that each piece of information has no more than one field where it belongs (e.g., having phone1, phone2, phone3 fields is bad). Either of these situations indicates that you should move that information out into a separate, related table with a foreign key connecting it back to the original table. As pulegium has demonstrated, this technique can quickly break things down to three tables with only about a dozen fields total.

Do these database design styles (or anti-pattern) have names?

Consider a database with tables Products and Employees. There is a new requirement to model current product managers, being the sole employee responsible for a product, noting that some products are simple or mature enough to require no product manager. That is, each product can have zero or one product manager.
Approach 1: alter table Product to add a new NULLable column product_manager_employee_ID so that a product with no product manager is modelled by the NULL value.
Approach 2: create a new table ProductManagers with non-NULLable columns product_ID and employee_ID, with a unique constraint on product_ID, so that a product with no product manager is modelled by the absence of a row in this table.
There are other approaches but these are the two I seem to encounter most often.
Assuming these are both legitimate design choices (as I'm inclined to believe) and merely represent differing styles, do they have names? I prefer approach 2 and find it hard to convey the difference in style to someone who prefers approach 1 without employing an actual example (as I have done here!) I'd would be nice if I could say, "I'm prefer the inclination-towards-6NF (or whatever) style myself."
Assuming one of these approaches is in fact an anti-pattern (as I merely suspect may be the case for approach 1 by modelling a relationship between two entities as an attribute of one of those entities) does this anti-pattern have a name?
Well the first is nothing more than a one-to-many relationship (one employee to many products). This is sometimes referred to as a O:M relationship (zero to many) because it's optional (not every product has a product manager). Also not every employee is a product manager so its optional on the other side too.
The second is a join table, usually used for a many-to-many relationship. But since one side is only one-to-one (each product is only in the table once) it's really just a convoluted one-to-many relationship.
Personally I prefer the first one but neither is wrong (or bad).
The second would be used for two reasons that come to mind.
You envision the possibility that a product will have more than one manager; or
You want to track the history of who the product manager is for a product. You do this with, say a current_flag column set to 'Y' (or similar) where only one at a time can be current. This is actually a pretty common pattern in database-centric applications.
It looks to me like the two model different behaviour. In the first example, you can have one product manager per product and one employee can be product manager for more than one product (one to many). The second appears to allow for more than one product manager per product (many to many). This would suggest the two solutions are equally valid in different situations and which one you use would depend on the business rule.
There is a flaw in the first approach. Imagine for a second, that the business requirements have changed and now you need to be able to set 2 Product Manager to a product. What will you do? Add another column to the table Product? Yuck. This obviously violates 1NF then.
Another option the second approach gives is an ability to store some attributes for a certain Product Manager <-> Product relation. Like, if you have two Product Manager for a product, then you can set one of them as a primary...
Or, for example, an employee can have a phone number, but as a product manager he/she can have another phone number... This also goes to the special table then.
Approach 1)
Slows down the use of the Product table with the additional Product Manager field (maybe not for all databases but for some).
Linking from the Product table to the Employee table is simple.
Approach 2)
Existing queries using the Product table are not affected.
Increases the size of your database. You've now duplicated the Product ID column to another table as well as added unique constraints and indexes to that table.
Linking from the Product table to the Employee table is more cumbersome and costly as you have to ink to the intermediate table first.
How often must you link between the two tables?
How many other queries use the Product table?
How many records in the Product table?
in the particular case you give, i think the main motivation for two tables is avoiding nulls for missing data and that's how i would characterise the two approaches.
there's a discussion of the pros and cons on wikipedia.
i am pretty sure that, given c date's dislike of this, he defines relational theory so that only the multiple table solution is "valid". for example, you could call the single table approach "poorly typed" (since the type of null is unclear - see quote on p4).