Redundant references vs. historical data


I have a domain model in which each line item is associated with a product. The product has a list of options. Each option is either required or optional. The user can include an optional option which will add it to the line item's selections list.
In order to avoid redundancy, my first thought was to exclude required options from the line item's selections list. There are a lot of required options, so including them for every line item would lead to a bloated database.
The problem is that the products can change over time. Options that were once required could become optional, and vice versa. Entirely new options may also be added to the product. This creates a problem with my initial idea, since the meaning of a line item's selection list would depend on the product's options at the time of the order.
So what should I do?
If I also include required options in the line item's selection lists, then the model is simple. I'd have a snapshot of the options that were included with the product. But then I've also got a lot of bloat in the database since references to required options will be repeated for every line item. Is this something I should be worried about or will SQL Server do some kind of behind-the-scenes compression?
Should I pursue my original idea of excluding required options from the line item's selection lists? Then I would need to keep some historical data regarding changes to the products. That way I could recreate the product and its options as they existed at the time of the order. Sounds possible but more complicated than the first option. I worry it would take more CPU cycles, but that would be okay if it's for old orders which won't be opened very often. I've never had to do this myself before, but maybe it wouldn't be so hard. If this is the approach you recommend, please provide some pointers to design patterns, etc. to help me get started.

I'd go with the first option if there's any chance that your list of required options will change in the future. If you don't store those options with each line item in the database, then you have to keep track of which options were required on which dates, and join them separately. This will needlessly complicate your join logic.
As for bloating your database, I don't think this will be as bad as you might think. It sounds like you probably already have join tables for ProductOptions and LineItemOptions that just contain product keys and option keys. The latter table should be the only one that ends up with more records under your first design choice. Since it only contains keys, its records are not going to take up much more space, and joining on it will be really fast anyway.
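To make the snapshot idea concrete, here is a minimal sketch using Python's built-in sqlite3 module rather than SQL Server. The table names ProductOptions and LineItemOptions come from the answer above; the column names, the helper functions and the sample ids are assumptions made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ProductOptions (
    product_id INTEGER NOT NULL,
    option_id  INTEGER NOT NULL,
    required   INTEGER NOT NULL,      -- 1 = required, 0 = optional
    PRIMARY KEY (product_id, option_id)
);
CREATE TABLE LineItemOptions (        -- keys only, one row per included option
    line_item_id INTEGER NOT NULL,
    option_id    INTEGER NOT NULL,
    PRIMARY KEY (line_item_id, option_id)
);
""")

def add_line_item(conn, line_item_id, product_id, chosen_optional_ids):
    """Snapshot every currently required option plus the optional ones chosen."""
    required = [row[0] for row in conn.execute(
        "SELECT option_id FROM ProductOptions WHERE product_id = ? AND required = 1",
        (product_id,))]
    for option_id in sorted(set(required) | set(chosen_optional_ids)):
        conn.execute("INSERT INTO LineItemOptions VALUES (?, ?)",
                     (line_item_id, option_id))

def options_of(conn, line_item_id):
    """Read a line item's options back without consulting the product at all."""
    return [row[0] for row in conn.execute(
        "SELECT option_id FROM LineItemOptions WHERE line_item_id = ? ORDER BY option_id",
        (line_item_id,))]

# Product 1 starts with two required options (10, 11) and one optional (12).
conn.executemany("INSERT INTO ProductOptions VALUES (?, ?, ?)",
                 [(1, 10, 1), (1, 11, 1), (1, 12, 0)])
add_line_item(conn, 100, 1, [12])

# Later the product changes: 11 becomes optional and 13 is added as required.
conn.execute("UPDATE ProductOptions SET required = 0 WHERE product_id = 1 AND option_id = 11")
conn.execute("INSERT INTO ProductOptions VALUES (1, 13, 1)")
add_line_item(conn, 101, 1, [])

old_order = options_of(conn, 100)   # still what was sold at the time
new_order = options_of(conn, 101)   # reflects the changed product
```

The old line item keeps its original options even though the product has since changed, and no date-based join against product history is needed to read it.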

Related

Buffer table in a database, Good or not?

I have a question.
I need to build a university project, and in this project I will have one database table like this:
This table will have a LOT of records.
To manage this I need to create a validation system.
Which is better (and why): creating a buffer table like this:
Or adding a column to my table like this:
Thank you!

Your question does not have enough information to provide a real answer. Here is some guidance on how to think about the situation. Which approach depends on the nature of your application and especially on what "validation" means.
One reasonable interpretation is that "validation" is part of a work-flow process, so it happens only once (or 99% of the time only once). And you never want to see unvalidated advertisements when you look at advertisements. If this is the case, then there would typically be additional information about the validation process.
This scenario suggests two reasonable approaches:
Do the validation inside a transaction. This would be reasonable if the validation process were entirely in the database and was measured in seconds.
Have a separate table for advertisements being validated. Perhaps even a separate table per "user" or "entity" responsible for them. Depending on the nature of the validation process, this could be a queue that feeds them to people doing the validation.
Putting them in the "advertisements" table doesn't make sense, because there is likely to be additional information involved with the validation process -- who, what, where, when, how.
If an advertisement can be validated and invalidated multiple times, then the best approach may be to put them in the same table. Once again, there are questions about the nature of the process.
Getting access to the two groups without a full table scan is tricky. If 10% of the rows are invalidated and 90% are validated, then a normal index would require a full table scan for reading either group. To get faster access to the smaller group, here are two options:
clustered index on the validation flag.
separate partitions for validated and invalidated rows.
In both cases, changing the validation flag for a record is relatively expensive, because it involves reading and writing the record on different data pages. Unless dozens of changes are made per second, this is probably not a big deal.

Here, there is no need to have a separate "buffer table". You can just properly index the valid field. So the following index would essentially automatically create a buffer table:
create unique index x on y (id)
include (all columns)
where (valid = 0)
This index creates a copy of the yet invalid data. You can do lots of variations such as
create unique index x on y (valid, id)
There's really no need for a separate table. Indexes are very easy compared to partitioning or even manually partitioning. Much less work, more general, more flexible and less potential for human error.
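As a rough illustration of the filtered-index idea, here is a sketch using Python's sqlite3 module. SQLite calls this a partial index and has no INCLUDE clause, so the extra column is simply added to the index key; the table name advertisements, its columns and the sample rows are assumptions, not taken from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE advertisements (
    id    INTEGER PRIMARY KEY,
    title TEXT    NOT NULL,
    valid INTEGER NOT NULL DEFAULT 0   -- 0 = awaiting validation, 1 = validated
);
-- Partial index: only rows that still await validation are stored in it,
-- so it behaves much like an automatically maintained buffer table.
CREATE UNIQUE INDEX ix_advertisements_pending
    ON advertisements (id, title)
    WHERE valid = 0;
""")
conn.executemany(
    "INSERT INTO advertisements (id, title, valid) VALUES (?, ?, ?)",
    [(1, "bike", 1), (2, "sofa", 0), (3, "lamp", 1), (4, "desk", 0)])

def pending(conn):
    """Rows not yet validated; the WHERE clause matches the index filter."""
    return conn.execute(
        "SELECT id, title FROM advertisements WHERE valid = 0 ORDER BY id").fetchall()

before = pending(conn)
# Validating a row removes it from the partial index automatically.
conn.execute("UPDATE advertisements SET valid = 1 WHERE id = 2")
after = pending(conn)
```

There is no second table to keep in sync: flipping the flag is the only write the application has to make.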

Either approach is valid, and which will perform better will depend more on the type of database you are using rather than the theoretical question of whether it is more correct to use a boolean or partition this into two tables.
I actually prefer the partitioning approach (your buffer table idea), but it will be more complex to code around. This may be a significant point to consider. Most modern databases will handle the boolean criteria very well with an index, but sometimes you can be surprised.
The most important thing from a development perspective right now is to pick one and run with it instead of paralyzing your project while you decide the "right" one.

Is it better to use fewer tables with more columns or vice versa?

I'm trying to figure out how to determine the best balance in structuring a database. I want to be able to store the information from several different forms submitted by different people, sometimes multiple times (such as a yearly update). I'm stuck between having a different table for each form, or a combination of form and element definition and element value tables.
Example A: There are three types of forms with different information, so there are four tables: [FormA], [FormB], and [FormC], which each hold the data associated with their respective forms, all FKed to [Customers].
Example B: Same three forms, but this time there are five different tables. [FormDescriptions] defines the form names, types, etc and has three entries, one for each form. [Forms] FKs to [Customers] and [FormDescriptions] and uses these in combination with the submission date to distinguish individual submissions. [FormElements] defines all the elements from the three forms, with a FK on FormDescriptions and a unique elementID. [ElementValues] FKs to [FormElements] and [Forms] and stores the value of the selected element on the selected form.
My question is, is one of these methods inherently better than the other, and if not, in which situations is each better than the other? As much why or why not that you want to include is appreciated.

"My question is, is one of these methods inherently better than the other, and if not, in which situations is each better than the other? As much why or why not that you want to include is appreciated."
Your option two is (your personalized variant of) the EAV antipattern. If you use this, and you expect (now or later) the system to do anything "intelligent" with the data, you'll find yourself in serious trouble. And things as basic as "rigorous data validation to catch data entry errors" already qualifies as "intelligent". So only use it if you can reasonably anticipate that the system will only be used for just merely storing the data, and that it will be unlikely for there ever to be a request to start processing/manipulating the data in "intelligent ways".
If you ever run into requests to start doing "intelligent" things with an EAV database, you'll find that whatever development time you thought you gained by working from a super duper generic information model, you'll lose orders of magnitude more time coding all the "intelligent" things required, i.e. reinstating the data structures in code that you refused to reflect in the DB.
Googling for "EAV antipattern" (try to locate the book by Bill Karwin) should provide you with more than enough info on why not to do it.
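A small sketch of the data validation point, using Python's sqlite3 module. FormA and ElementValues are named in the question; the age column, its allowed range and the sample values are hypothetical and only there to show the difference.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Example A style: one typed table per form, so the database can enforce rules.
CREATE TABLE FormA (
    customer_id  INTEGER NOT NULL,
    submitted_on TEXT    NOT NULL,
    age          INTEGER NOT NULL CHECK (age BETWEEN 0 AND 150)
);
-- Example B style: every value of every form lands in one generic text column.
CREATE TABLE ElementValues (
    form_id    INTEGER NOT NULL,
    element_id INTEGER NOT NULL,
    value      TEXT
);
""")

def try_insert(sql, params):
    """Return True if the database accepted the row, False if a constraint fired."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False

# The same data entry error (an age of 999) sent to both designs.
typed_accepts_bad_age = try_insert(
    "INSERT INTO FormA VALUES (?, ?, ?)", (1, "2024-01-01", 999))
eav_accepts_bad_age = try_insert(
    "INSERT INTO ElementValues VALUES (?, ?, ?)", (1, 7, "999"))
```

In the generic design the database has no way to know that element 7 is an age, so any such rule has to be rebuilt in application code.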

There are two factors to consider here:
Performance
Flexibility
If your system will frequently require you to add more forms in the future, method 2 is better. You won't have to add additional tables or columns, because your forms are data driven. It adds a little overhead for generating forms and saving them as key-value pairs.
On the other hand, if your system won't require many changes to the forms, the first method can work.
Also consider usage of data after forms are submitted. Are you going to run analytics, reports on this data? Are these reports specific to forms? That will favor method 1.

Implementing Review flags in Databases; best practices

I need to store some review flags that relate to some entities. Each review flag can only relate to a single entity property group. For example, table Parents has a ParentsStatus flag and table Children has a set of ChildrenStatus flags.
In the current design proposal I have three tables:
ReviewTypes: stores the flags and the properties they relate to.
ReviewPositions: stores the values the flags can have.
Reviews: stores the transaction data, the actual reviews. It is like UsersToFlags: Flags in a database rows, best practices.
The problem is I am getting pushback that there is no need for the Reviews table and that it would be better to just store the actual review data on each entity. For example, add an extra column to Parents to hold ParentsStatus. They feel it is a simpler solution and that separating the data out is just “overkill” for our scenario.
I don’t like this idea as this means that every time we want to add a new review flag we need to update the core entity table to hold that flag.
Space is not a problem.
Do people have any strong opinions?
Edit:
This comment applies to the three answers. The consensus is the relational approach is best but I think I need to read up a little more on the EAV model as from some very basic reading Best beginner resources for understanding the EAV database model? and its related links it does not appear to be super straightforward and I don't want to dig myself a hole. Thanks to wildplasser. I'll loop back once I read up a bit more.

Oh yes. Their idea is simpler, until you want to enhance it. Given the scheme they are proposing, what if two reviews were needed per entity? What if you wanted to attach other things such as notes/annotations? Once they find out how much of an inflatable dartboard their idea is, what do you have to do to move to a more useful one? Not to mention you need some way of identifying status fields, either with fragile rubbish like "column name ends with _Status", or by hard-coding them somewhere.
Doing it properly is not that much more work, and it's not more complex. In fact, in many ways it's simpler, and it will cope with the inevitable changes at far less cost.

Normalization is always preferable to premature optimization.

One reason why I like the reviews table separate is that you can hold changes you may not want to display yet (because they haven't been reviewed and approved) and still keep the old data until the new data is approved. I don't know if your situation requires that.
To make future programming simpler for when you want to display the changes, you can write a view that shows the old and new data.

Is it ok to have a non persistent variable in an entity?

When using an ORM, is it breaking some kind of good practice to have a model class with a few non-persistent properties, which are only used for calculations, and then can be safely dropped?
Let's say we have a Product. This Product has list of possible Options. An Option may have a price impact on the Product. We also have a set of Rules, which say that when one Option is selected, then the price of another Option changes.
When we add a Product to an Order, along with a selection of Options, we first need to recalculate the price of all the Options based on the rules affecting each selected Option. Then we can calculate the final price of the Product with all its selected Options.
In this example, the Option could have a calculatedPrice property, which would only have meaning within the context of the selected Options, and could be safely dropped after the Product has been added to the Order.
Is there a more correct way to think about this problem, or is that ok?

Yes, it is perfectly fine to have @Transient properties.
Some people may consider it wrong and insist on having a separate class that is almost the same as the entity, but having the additional fields, but that is unnecessary code duplication. Your approach is what I'd do.
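The same idea can be sketched without any particular ORM. The following Python example is only an illustration: the Option class, the to_row method, the RULES format and all prices are assumptions, and a real ORM would mark the field as non-persistent through its own mechanism instead of a hand-written to_row.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Option:
    name: str
    base_price: int  # persisted, in cents
    # Transient: only meaningful for the current selection, never stored.
    calculated_price: Optional[int] = field(default=None, compare=False)

    def to_row(self):
        """Values that would be persisted; calculated_price is left out on purpose."""
        return (self.name, self.base_price)

# Hypothetical rule format: (trigger option, affected option) -> new price in cents.
RULES = {("leather", "heated_seats"): 150}

def price_selection(selected):
    """Fill in calculated_price for each selected option and return the total."""
    names = {option.name for option in selected}
    for option in selected:
        option.calculated_price = option.base_price
        for (trigger, affected), new_price in RULES.items():
            if affected == option.name and trigger in names:
                option.calculated_price = new_price
    return sum(option.calculated_price for option in selected)

selection = [Option("leather", 1000), Option("heated_seats", 300)]
total = price_selection(selection)
```

After the total has been copied onto the order, the calculated prices can simply be discarded along with the objects.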

The other approach, which is used in a large and ghastly e-commerce system I work with, is to have a parallel structure of transient objects containing the computed information. So, in parallel to the Order, there is an OrderPrice. For each Item in the order, there is an ItemPrice. If an Item has a set of Options, then the ItemPrice will have a set of OptionPrices. The Order's ShippingOption also has a ShippingPrice, and so on. Pricing is then handled by another parallel structure of price calculators: you give an Order to an OrderPriceCalculator, and it gives you back an OrderPrice. In doing so, it will send each Item to the ItemPriceCalculator, which will send each Option to the OptionPriceCalculator, and so on.
The price objects can refer to the order objects, but not vice versa. Our system does actually persist the prices, but separately from the orders.
The advantage of this is that it separates the concerns of describing the contents of an order, describing the price of an order, and calculating the price of an order.
The disadvantage is that you have a huge number of classes, and the information you need is, inevitably, never in the objects you have to hand.
The disadvantage probably outweighs the advantage.

Getting rid of hard coded values when dealing with lookup tables and related business logic

Example case:
We're building a renting service, using SQL Server. Information about items that can be rented is stored in a table. Each item has a state that can be either "Available", "Rented" or "Broken". The different states reside in a lookup table.
ItemState table:
id name
1 'Available'
2 'Rented'
3 'Broken'
In addition, we have a business rule which states that whenever an item is returned, its state is changed from "Rented" to "Available".
This could be done with an update statement like "update Items set state=1 where id=@itemid". In application code we might have an enum that maps to the ItemState ids. However, these contain hard-coded values that could lead to maintenance issues later on. Say a developer were to change the set of states but forgot to fix the related business logic layer...
What good methods or alternate designs are there for dealing with this type of design issues?
Links to related articles are also appreciated in addition to direct answers.

In my experience this is a case where you actually have to hard-code, preferably by using an enum whose integer values match the ids of your lookup tables. I can't see anything wrong with saying that "1" is always "Available" and so forth.

Most systems that I've seen hard code the lookup table values and live with it. That's because, in practice, code tables rarely change as much as you think they might. And if they ever do change, you generally need to re-compile any programs that rely on that DDL anyway.
That said, if you want to make the code maintainable (a laudable goal), the best approach would be to externalize the values into a properties file. Then you can edit this file later without having to re-code your entire app.
The limiting factor here is that your app depends for its own internal state on the value you get from the lookup table, so that implies a certain amount of coupling.
For lookups where the app doesn't rely on that code, (for instance, if your code table stores a list of two-letter state codes for use in an address drop-down), then you can lazily load the codes into an object and access them only when needed. But that won't work for what you're doing.

When you have your lookup tables as well as enums defined in the code, then you always have an issue with keeping them in sync. There is not much that can be done here. Both live effectively in two different worlds and are generally unaware of each other.
You may wish to drop the lookup tables and let only your business logic operate on these values. In that case you lose the option of relying on referential integrity to back you up on data integrity.
The other option is to build your application in such a way that you never need these values in your code. That means moving part of your business logic to the database layer, that is, putting it in stored procedures and triggers. This also has the benefit of being agnostic to the client. Anyone can invoke the SPs and be assured the data will be kept in a consistent state, consistent with your business logic rules as well.

You'll need to have some predefined value that never changes, be it an integer, a string or something else.
In your case, the numerical value of the state is the state's surrogate PRIMARY KEY which should never change in a well-designed database.
If you're concerned about the consistency, use a CHAR code: A, R or B.
However, you should stick to it as well as to a numerical code so that A always means Available etc.
Your database structure should be documented as well as the code is.

The answer depends entirely on the language you're using: solutions for this are not the same in Java, PHP, Smalltalk or even Assembler...
But let me tell you something: while it's true hard coded values are not a great thing, there are times in which you do need them. And this one is pretty much one of them: you need to declare in your code your current knowledge of the business logic, which includes these hard coded states.
So, in this particular case, I would hard code those values.

Don't overdesign it. Before trying to come up with a solution to this problem, you need to figure out if it's even a problem. Can you think of any legit hypothetical scenario where you would change the values in the itemState table? Not just "What if someone changes this table?" but "Someone wants to change this table in X way for Y reason, what effect would that have?". You need to stay realistic.
New state? You add a row, but it doesn't affect the existing ones.
Removing a state? You have to remove the references to it in code anyway.
Changing the id of a state? There is no legit reason to do that.
Changing the name of a state? There is no legit reason to do that.
So there really should be no reason to worry about this. But if you must have this cleanly maintainable in the case of irrational people who randomly decide to change Available to 2 because it just fits their Feng Shui better, make sure all tables are generated via a script which reads these values from a configuration file, and then make sure all code reads constants from that same configuration file. Then you have one definition location and any time you want to change the value you modify that configuration file instead of the DB/code.

I think this is a common problem and a valid concern, that's why I googled and found this article in the first place.
What about creating a public static class to hold all the lookup values, but instead of hard-coding them, we initialize these values when the application is loaded and use names to refer to them?
In my application, we tried this, it worked. Also you can do some checking, e.g. the number of different possible values of a lookup in code should be the same as in db, if it's not, log/email/etc. But I don't want to manually code this for the status of 40+ biz entities.
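A check of that kind could look roughly like the sketch below, written in Python with the built-in sqlite3 module. The ItemState table and its three rows come from the question; the enum, the lookup_mismatches function and the rule that names are compared in upper case are assumptions for illustration.

```python
import sqlite3
from enum import IntEnum

class ItemState(IntEnum):
    AVAILABLE = 1
    RENTED = 2
    BROKEN = 3

def lookup_mismatches(conn):
    """Compare the enum with the ItemState table; return (id, db_name, code_name) diffs."""
    db = {row[0]: row[1].upper()
          for row in conn.execute("SELECT id, name FROM ItemState")}
    code = {member.value: member.name for member in ItemState}
    problems = []
    for key in sorted(set(db) | set(code)):
        if db.get(key) != code.get(key):
            problems.append((key, db.get(key), code.get(key)))
    return problems

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ItemState (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.executemany("INSERT INTO ItemState VALUES (?, ?)",
                 [(1, "Available"), (2, "Rented"), (3, "Broken")])

in_sync = lookup_mismatches(conn)          # run once at application startup

# Someone adds a state in the database without touching the code.
conn.execute("INSERT INTO ItemState VALUES (4, 'Reserved')")
after_db_change = lookup_mismatches(conn)  # now reports the unknown state
```

At startup the application could log or email a non-empty result instead of failing silently later.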
Moreover, this can be part of the bigger problem of OR mapping. We're exposed to too many details of the persistence layer, and thus we have to take care of them. With technologies like Entity Framework, we don't need to worry about the "sync" part because it's automated, am I right?
Thanks!

I've used a similar method to what you're describing - a table in the database with values and descriptions (useful for reporting, etc.) and an enum in code. I've handled the synchronization with a comment in code saying something like "these values are taken from table X in database ABC" so that the programmer knows the database needs to be updated. To prevent changes from the database side without the corresponding changes in code I set permissions on the table so that only certain people (who hopefully remember they need to change the code as well) have access.

The values have to be hard-coded, which effectively means that they can't be changed in the database, which means that storing them in the database is redundant.
Therefore, hard-code them and don't have a lookup table in the database. Instead, store the item's state directly in the items table.

You can structure your database so that your application doesn't actually have to care about the codes themselves, but rather the business rules behind them.
I have done both of the following:
Do one or more of your codes have a certain characteristic, such as IsAvailable, that the application cares about? If so, add it as a flag column to the code table, where those that match are set to true (or your DB's equivalent), and those that don't are set to false.
Do you need to use a specific, single code under a certain condition? You can create a singleton table, named something like EnvironmentSettings, with a column such as ItemStateIdOnReturn that's a foreign key to the ItemState table.
If I wanted to avoid declaring an enum in the application, I would use #2 to address the example in the question.
Whether you take this approach depends on your application's priorities. This type of structure comes at the cost of additional development and lookup overhead. Plus, if every individual code comes with its own business rules, then it's not practical to create one new column per required code.
But, it may be worthwhile if you don't want to worry about synchronizing your application with the contents of a code table.
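For the second technique, a minimal sketch with Python's sqlite3 module might look like this. EnvironmentSettings, ItemStateIdOnReturn, ItemState and Items are names from the thread; the remaining column names, the CHECK that limits the settings table to one row, and the return_item helper are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ItemState (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE Items (
    id    INTEGER PRIMARY KEY,
    state INTEGER NOT NULL REFERENCES ItemState(id)
);
-- Singleton table: the CHECK on id allows at most one row.
CREATE TABLE EnvironmentSettings (
    id                  INTEGER PRIMARY KEY CHECK (id = 1),
    ItemStateIdOnReturn INTEGER NOT NULL REFERENCES ItemState(id)
);
INSERT INTO ItemState VALUES (1, 'Available'), (2, 'Rented'), (3, 'Broken');
INSERT INTO EnvironmentSettings VALUES (1, 1);
INSERT INTO Items VALUES (42, 2);   -- item 42 is currently rented
""")

def return_item(conn, item_id):
    """Apply the 'returned' rule without any state id appearing in application code."""
    conn.execute(
        "UPDATE Items "
        "SET state = (SELECT ItemStateIdOnReturn FROM EnvironmentSettings) "
        "WHERE id = ?",
        (item_id,))

return_item(conn, 42)
state_name = conn.execute(
    "SELECT s.name FROM Items i JOIN ItemState s ON s.id = i.state WHERE i.id = 42"
).fetchone()[0]
```

If the state used on return ever changes, only the single settings row is updated; the application code stays the same.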
