Database design advice needed on custom fields - sql

I have a table that stores general information about a customer (name, address, etc) that is common to all customers. I have a field called CustomerType (list of types) that drives what other fields I need to capture. So if they are a government customer then they will see a different set of custom fields than a non-profit customer would see. I need to create forms that each different CustomerType will be fill out. On the SQL side, I need to figure out the best way to store the data so that when I do reporting it is simple. I don't know the best way to attack this problem.

On the SQL side, I need to figure out the best way to store the data so that
when I do reporting it is simple. I don't know the best way to attack this
problem.
There are many possible approaches each with different strengths and weaknesses, here's some to think about:
Create separate customer detail tables for each of the customer types, each containing the fields specific to that customer type. Each detail table keyed on the customer Id. The customer type does not have to be an attribute of the detail table, only of the parent Customer table.
(+) The correctly normalized solution (although you may find awkward situations
where attributes are common to a subset of customer types). The tables will be fairly easy to maintain.
(-) Reports harder to write - you may find yourself using a LOT of unions or outer joins. Development against this schema is more complex, the extra logic to insert/update attributes in the correct tables for particular customer types must be encoded somewhere. This might become unmanageable if you have many customer types, or if you're adding/changing them frequently.
Expand the customer table to contain the super-set of columns required by all customer types, keyed on the customer Id.
(+) Simple, very easy to report on, simple programming logic.
(-) The customer-type specific fields are only partially dependent on the key of the customer table (customer Id) - they are really dependent on the combination of customerId/customerType.
If there are many extra fields, and if there are few fields common between customer types then this denormalization may result in a very wide table with an unmanageable number of columns. It could be a maintenance nightmare - the table must be modified every time a new customer type is added/change.
You might find this a good solution if the number of unique fields required by each customer type is small and they don't change often and if ease of programming and reporting is an overriding concern.
Store the customer specific values as name/value pairs in a generic customer Details table, keyed on customerId/customerType/key.
(+) Very simple to maintain - No data model changes are required to add a new customer type.
(-) Non-relational, makes pure SQL reporting near impossible and makes integrity constraints very difficult to add. You might see this in specialized use cases e.g. where the data will only ever consumed as JSON and direct reporting will never be a requirement, or in some corporate environments where it may be appealing if database changes are very hard to push through.

First of all, have a look at some good tutorials on database design and object relational modelling (ORM) A beginner's guide to SQL database design
My personal suggestion for your design would be to create one table to store all costumers, together with some kind of unique customer id and the CustomerType. Next create a separate table for each of the CustomerTypes and for each user that belongs to that type, store that users unique id in a column together with its customertype specific fields.

Related

How do I represent this model in tables?

I have a table of warehouses and a table of clients to manage several warehouses belonging to different clients
warehouse
=====
id
address
capacity
owner_client
client
=====
id
name
My issue is, i have an ACME client, and ACME has an "ACME safety rating" attribute only applicable to their warehouses. Currently we just have this as a field of warehouses and its null for non-acme warehouses. But this feels wrong and has required some workarounds and special cases.
Whats the best way to represent this? I've thought of making an "Acme safety ratings" table with the number and FK to the warehouse, but now I've made a table specific for one client? What if we need to start tracking "is_foobar_accesible" for the baz client?
The relationally pure way to do this would be to implement your initial suggestion i.e. have a separate table such as ACME_WAREHOUSES that holds the attributes such as SAFTEY_RATING that are only applicable to this client. A different CLIENT_WAREHOUSES table would be created for each client that has its own attributes. In this way you could use standard database constraint functionality to ensure the integrity of the data in these tables.
Another method would be to add a series of nullable columns to the WAREHOUSES table such as ACME_SAFETY_RATING and BAZ_FOOBAR_ACCESSIBLE. This is not relationally pure as it means null values can exist in this table. However, you can still use standard database functionality to ensure the integrity of the data. It can be a bit more convoluted if certain values are mandatory in certain situations. Also, if there are many clients with many differing attributes the number of columns in the table can become unwieldy.
Another method is the Entity-Attribute-Value model. Generally, this is to be avoided if at all possible. It is not relationally pure, as your column values are now no longer defined over domains, and it is extremely difficult, if not impossible, to ensure the integrity of the data. Any real attempt to do so will require a lot of bespoke coding (which needs to be carefully implemented to cater for things like concurrency control that database constraints give you for free) as you cannot use standard database constraints. However, if you are just interested in storing values for information and not doing anything with them you could use this method.
The EAV method does have a danger that because it appears so easy to add attributes to an entity, it becomes the default way of doing so. It is then used to add attributes for which vital processing is dependent and, because you cannot ensure the integrity of the data using this method, you find the values being used are meaningless and the whole logical basis for the processing is destroyed.
I would create a ClientProperty and ClientWarehousePropertyValue table so that you can store these Client owned properties and their values for each warehouse:
ClientProperty
===============
ID
ClientID
Name
ClientWarehousePropertyValue
============================
WarehouseID
ClientPropertyID
Value

Database design for coupon usuage restriction

While working on implementing voucher feature for an eCommerce application, I need to implement Voucher usage restriction, some of restriction I am planning to have
Products
Exclude products
Product categories
Exclude categories
Email /Customer restrictions
Currently We are supporting following 2 type of Vouchers with an option to create Custom voucher type and all those Vouchers types are being maintained in a single table with help of discriminator (Hibernate use).
Serial Vouchers
Promotion Vouchers.
these are only few which I am targeting at initial stage.My main confusion is about database design and restriction of these voucher usage with Voucher.I am not able to decide which is best way to Map these restrictions in database.
Should I go for a single table for all these restriction and have a relation with Voucher table or is it good to group all similar type of restriction in a single table and have their relation with Voucher table.
As an additional information , we are using hibernate to map our entities with the DB table.
This seems like a very wide-open and freeform requirement. Some questions:
How complex will the business rules you are attempting to model be? If you’re allowing (business) users to define their own vouchers, odds are good they’ll come up with some pretty byzantine rules and combinations. If you have to support anything they come up with, you will have problems.
What will the database be tasked to do with this data? Store the “voucher definition”, sure, but then what? Run tallies or reports on them? Analyze how many are used, by who/when/how/for what? Or just list out what was used/generated over the past year?
What kind of data volumes are you going to have? One entry per voucher definition, or per voucher printed/issued? (If the latter, can you use one entry per voucher, with a count of how many issued?) Are we talking dozens, hundreds, or millions of vouchers?
If it’s totally free-form, if they just want a listing without serious analysis, if the overall volume is small, consider using blob fields rather than minutiae-oriented columns. Something like a big text field and a data-entry box wherein the user will “Enter any other criteria defining the voucher”. (You might even do this using XML.) Ugly, you can’t readily analyze the data, but if the goals are too great or diffuse and you're not going to use all that detailed data, it might be necessary.
A final note: a voucher that is good for only selected products cannot be used on products that are added after the voucher is created. A voucher that is good for all but selected products can be used for subsequently created products. This logic may apply to any voucher-limiting criteria. Both methodologies have merit, make sure the users are clear on what they’re doing.
If I understand what your your are doing, you will have a problem with only one table for all restrictions, because it means 1 row per Voucher and multiple values in your different restrictions columns.
It will be harder for you to UPDATE, extract and cast restrictions values.
In my opinion, you should have one table for each restrictions type and map them with Voucher table. However It will be easier for you to add new restrictions.
As a suggestion:
Isn't it more rational to have valid-products and valid-categories instead of Exclude-products and Exclude-categories?
Having a Customer-Creditgroup table will lead us to have valid-customer-group table.
BTW in the current design we can have a voucher definition table, I will call it voucher-type table.
About the restrictions:
In RDBMS level you can state only two types of table constraints decoratively:
uniquely identifying attributes (keys)
Subsets requirements referencing back to the same or other table
(foreign key)
Implementing all other types of table constraints (like a multi-tuple constraints or transition constraints) requires you to develop procedural data integrity code.
When a voucher is going to sold to a specific customer for a specific product we will need to check validity of excluded elements, that could be done by triggers in data base level or business logic of your application.
I would personally go with your second proposal... grouping all similar types of restrictions in a single table, which refers the Voucher table.
I'll add to that, that you can handle includes and excludes on the same table.
So the structure I'd use is some along the lines of:
Voucher (id, type, etc...)
VoucherProductRestriction (id,voucher_id,product_id,include)
VoucherProductCategoryRestriction (id,voucher_id,product_category_id,include)
VoucherCustomerRestriction (id,voucher_id,customer_id)
VoucherEmailRestriction (id,voucher_id,email)
...where the include column could be a boolean that is true in case you want to restrict the voucher to that product or category, or false if you want to restrict it to any product or category other than those specificied.
If I understand your context correctly, it makes no sense to have both include and exclude restrictions on the same voucher (although it could make sense to have more than one of the same type). You can probably handle and check this better if you use a single table for both types of restrictions.

database normalization

EDIT:
Would it be a good idea to just keep it all under 1 big table and have a flag that differentiates the different forms?
I have to build a site with 5 forms, maybe more. so far the fields for the forms are the following:
What would be the best approach to normalize this design?
I was thinking about splitting "Personal Details" into 3 different tables:
and then reference them from the others with an ID...
Would that make sense? It looks like I'll end up with lots of relationships...
Normalized data essentially means that the same data is not stored multiple times in multiple places. For example, instead of storing the customer contact info with an order, the customer ID is stored with the order and the customer's contact information is 'related' to the order. When the customer's phone number is updated, there is only one place the phone number needs to be updated (the customer table) and all the orders will have the correct information without being updated. Each piece of data exists in one, and only one, place. This is normalized data.
So, to answer your question: no, you will not make your database structure more normalized by breaking up a large table as you described.
The reason to break up a single table into multiple tables is usually to create a one to many relationship. For example, one person might have multiple e-mail addresses. Or multiple physical addresses. Another common reason for breaking up tables is to make systems modular, so that tables can be created that join to existing tables without modifying the existing tables.
Breaking one big table into multiple little tables, with a one to one relationship between them, doesn't make the data any more normalized, it just makes your queries more of a pain to write.* And you don't want to structure your database design around interfaces (forms) unless there is a good reason. There usually isn't.
*Although there are sometimes good reasons to break up big tables and create one to one relationships, normalization isn't one of them.

SQL/MySQL structure (Denormalize or keep relational)

I have a question about best practices related to de-normalization or table hierarchy relationships.
For a simple example, let's say I have an app that allows a user to make a payment for an order. I save the order information in the orders table, and I have another table for the payment called payments. Payments has a foreign key to the orders table.
Let's assume that I can pay with a credit card, check, or paypal, and I want to save the information about the payment.
My question is what is the best way to handle this relationship between the different payment data and the payment table. The types of payment all have different data associated with them. So do I denormalize the payments table, putting credit card, check, and paypal information fields in there and then just use the fields as necessary. Alternately I could specify a payment type, and store the information in their own tables, but then I would have to use logic on an application level to get the data out of the correct credit card, check or paypal information tables...
I would choose to keep the database normalized.
but then I would have to use logic on an application level to get the data out of the correct credit card, check or paypal information tables...
You have to use logic (or at least mapping) in either case. Whether its what table to pull the data from or what fields in the table to access.
What about keeping it denormalized and then making a view to put the data back together again. You get the best of both worlds. IIRC, MySQL introduced views in version 5.
So do I denormalize the payments
table, putting credit card, check, and
paypal information fields in there and
then just use the fields as necessary.
yes. but this is not "denormalizing". if you stored order information in the client table, that would be denormalizing. adding nullable columns to accurately describe a payment in the payments table is not.
You can use the idea of table per subclass as the ORM tools do. This would require a join for each query against the payment table but...
Create tables for each payment type so you will have a creditcardpayment and a checkpayment table. The common fields go in the payment table, the specific fields go in the sub tables. The sub tables primary keys are foreign keys to the payment table's id.
To add a new payment you have to first insert the common fields into the payment table, get the id generated, then insert the specific fields into the specific sub table.
To query you have to join the subtables with the payment table. You could use a view to make that easier.
This way the database is still normalized and you have no null columns.
It partially depends on the framework (if any) that you are using. For instance: the Ruby on Rails way would generally be to store the type of the payment in the payments table and then have different, separate tables for each payment type (PayPal, Credit Card, etc).
Alternatively, if you notice that you are repeating the same data in many of the tables, Rails has a way to store all of the data in the same table, using only the fields you need, but still allowing you to have separate objects. For instance, you would have an AbstractPayment object with an abstract_payments table, but you would also have PayPalPayment and CreditCardPayment objects that both inherit from AbstractPayment and use the abstract_payments table. All you need to determine the payment type is a column in abstract_payments that tells you which type it is (probably a string, but could be an integer if you so choose). This is called STI.
No matter what framework/language you use, the same ideas can definitely apply and I think the right solution will depend on how many different types of payments you have, compared with how simple you want your database to be.
Keep it as normalized as possible. Only de-normalize when the performance of a fully normalized schema requires denormalization to improve response time, and do that only on a case by case basis to deal with specific performance issues associated with individual querys within your application.
These are complex problems. Database Normalization requires intimate domain knowledge, and a skilled analysis of how that domain model will be manipulated and utilized within your application. Denormalizing for performance requires that you understand your application's usage patterns well enough to predict performance issues before they occur (waiting till they actually occur in production is too late - by then making fundemental schema changes in the database is very expensive) and know what denormalization techniques to use to address them.
You need to weight the following factors:
How much space will you waste if you put all data into a single table
How complex the SQL queries will become in either case.
If you use different tables, you'll have to use joins. If you put everything into a single table, you'll need to find some magic to "ignore" the rows which don't matter (say when you want to find all credit card payments: Your query must then ignore everything that's something else).
The latter part gets more easy when you move the special data into special tables at the cost of more complex joins.

What is the preferred way to store custom fields in a SQL database?

My friend is building a product to be used by different independent medical units.
The database stores a vast collection of measurements taken at different times, like the temperature, blood pressure, etc...
Let us assume these are held in a table called exams with columns temperature, pressure, etc... (as well as id, patient_id and timestamp). Most of the measurements are stored as floats, but some are of other types (strings, integers...)
While many of these measurements are handled by their product, it needs to allow the different medical units to record and process other custom measurements. A very nifty UI allows the administrator to edit these customs fields, specify their name, type, possible range of values, etc...
He is unsure as to how to store these custom fields.
He is leaning towards a separate table (say a table custom_exam_data with fields like exam_id, custom_field_id, float_value, string_value, ...)
I worry that this will make searching both more difficult to achieve and less efficient.
I am leaning towards modifying the exam table directly (while avoiding conflicts on column names with some scheme like prefixing all custom fields with an underscore or naming them custom_1, ...)
He worries about modifying the database dynamically and having different schemas for each medical unit.
Hopefully some people which more experience can weigh in on this issue.
Notes:
he is using Ruby on Rails but I think this question is pretty much framework agnostic, except from the fact that he is only looking for solutions in SQL databases only.
I simplified the problem a bit since the custom fields need to be available for more than one table, but I believe this doesn`t really impact the direction to take.
(added) A very generic reporting module will need to search, sort, generate stats, etc.. of this data, so it is required that this data be stored in the columns of the appropriate type
(added) User inputs will be filtered, for the standard fields as well as for the custom fields. For example, numbers will be checked within a given range (can't have a temperature of -12 or +444), etc... Thus, conversion to the appropriate SQL type is not a problem.
I've had to deal with this situation many times over the years, and I agree with your initial idea of modifying the DB tables directly, and using dynamic SQL to generate statements.
Creating string UserAttribute or Key/Value columns sounds appealing at first, but it leads to the inner-platform effect where you end up having to re-implement foreign keys, data types, constraints, transactions, validation, sorting, grouping, calculations, et al. inside your RDBMS. You may as well just use flat files and not SQL at all.
SQL Server provides INFORMATION_SCHEMA tables that let you create, query, and modify table schemas at runtime. This has full type checking, constraints, transactions, calculations, and everything you need already built-in, don't reinvent it.
It's strange that so many people come up with ad-hoc solutions for this when there's a well-documented pattern for it:
Entity-Attribute-Value (EAV) Model
Two alternatives are XML and Nested Sets. XML is easier to manage but generally slow. Nested Sets usually require some type of proprietary database extension to do without making a mess, like CLR types in SQL Server 2005+. They violate first-normal form, but are nevertheless the fastest-performing solution.
Microsoft Dynamics CRM achieves this by altering the database design each time a change is made. Nasty, I think.
I would say a better option would be to consider an attribute table. Even though these are often frowned upon, it gives you the flexibility you need, and you can always create views using dynamic SQL to pivot the data out again. Just make sure you always use LEFT JOINs and FKs when creating these views, so that the Query Optimizer can do its job better.
I have seen a use of your friend's idea in a commercial accounting package. The table was split into two, first contained fields solely defined by the system, second contained fields like USER_STRING1, USER_STRING2, USER_FLOAT1 etc. The tables were linked by identity value (when a record is inserted into the main table, a record with same identity is inserted into the second one). Each table that needed user fields was split like that.
Well, whenever I need to store some unknown type in a database field, I usually store it as String, serializing it as needed, and also store the type of the data.
This way, you can have any kind of data, working with any type of database.
I would be inclined to store the measurement in the database as a string (varchar) with another column identifying the measurement type. My reasoning is that it will presumably, come from the UI as a string and casting to any other datatype may introduce a corruption before the user input get's stored.
The downside is that when you go to filter result-sets by some measurement metric you will still have to perform a casting but at least the storage and persistence mechanism is not introducing corruption.
I can't tell you the best way but I can tell you how Drupal achieves a sort of schemaless structure while still using the standard RDBMSs available today.
The general idea is that there's a schema table with a list of fields. Each row really only has two columns, the 'table':String column and the 'column':String column. For each of these columns it actually defines a whole table with just an id and the actual data for that column.
The trick really is that when you are working with the data it's never more than one join away from the bundle table that lists all the possible columns so you end up not losing as much speed as you might otherwise think. This will also allow you to expand much farther than just a few medical companies unlike the custom_ prefix you were proposing.
MySQL is very fast at returning row data for short rows with few columns. In this way this scheme ends up fairly quick while allowing you lots of flexibility.
As to search, my suggestion would be to index the page content instead of the database content. Use Solr to parse through rendered pages and hold links to the actual page instead of trying to search through the database using clever SQL.
Define two new tables: custom_exam_schema and custom_exam_data.
custom_exam_data has an exam_id column, plus an additional column for every custom attribute.
custom_exam_schema would have a row to describe how to interpret each of the columns of the custom_exam_data table. It would have columns like name, type, minValue, maxValue, etc.
So, for example, to create a custom field to track the number of fingers a person has, you would add ('fingerCount', 'number', 0, 10) to custom_exam_schema and then add a column named fingerCount to the exam table.
Someone might say it's bad to change the database schema at run time, but I'd argue that configuring these custom fields is part of set up and won't happen too often. Still, this method lets you handle changes at any time and doesn't risk messing around with your core table schemas.
lets say that your friend's database has to store data values from multiple sources such as demogrphic values, diagnosis, interventions, physionomic values, physiologic exam values, hospitalisation values etc.
He might have as well to define choices, lets say his database is missing the race and the unit staff need the race of the patient (different races are more unlikely to get some diseases), they might want to use a drop down with several choices.
I would propose to use an other table that would have these choices or would you just use a "Custom_field_choices" table, which at some point is exactly the same but with a different name.
Considering that the database :
- needs to be flexible
- that data from multiple tables can be added and be customized
- that you might want to keep the integrity of the main structure of your database for distribution and uniformity purpose
- that data MUST have a limit and alarms and warnings
- that data must have units ( 10 kg or 10 pounds) ?
- that data can have a selection of choices
- that data can be with different rights (from simple user to admin)
- that these data might be needed to generate reports without modifying the code (automation)
- that these data might be needed to make cross reference analysis within the system without modifying the code
the custom table would be my solution, modifying each table would end up being too risky.
I would store those custom fields in a table where each record ( dataType, dataValue, dataUnit ) would use in one row. So there would be a relation oneToMany from one sample to the data. You can also create a table to record all the kind of cutsom types you would use. For example:
create table DataType
(
id int primary key,
name varchar(100) not null unique
description text,
uri varchar(255) //<-- can be used for an ONTOLOGY
)
create table DataRecord
(
id int primary key,
sample_id int not null,//<-- reference to the sample
dataType_id int not null, //<-- references DataType
value varchar(100),//<-- the value as string
unit varchar(50)//<-- g, mg/ml, etc... but it could also be a link to a table describing the units just like DataType
)