I am looking for advice on how to do something I have wanted to do many times in the past, which leads me to believe that there is a design pattern (and likely a gem, or examples) of how to do exactly what I want to achieve.
I am currently working in Rails on a project involving many types of orders. For example, I have a basic order which needs to generate a shipment. The data in the order needs to be "mapped" to a newly created shipment. (Very brief example:)
Order
- from: NYC
- to: LA
- shipping_type: FedEx
Shipment
- from: LA
- to: NYC
- shipping_type: FedEx
The Order and Shipment will have nearly identical fields, one of the minor differences is that the "from" and "to" fields switch. In the real world, each of these objects have many, many more fields. There are also many other types of orders that need to have "maps".
In interest of trying to keep things DRY, is there a way to create maps between these orders, so that I can minimize the number of times that I will have to write a generate_order_from(order) method?
Edit: I am attempting to create a method similar to RestKit's Object Mapping. I just can't find anything similar in Ruby.
The order and shipment will have nearly identical fields ... In the real world, each of these objects have many, many more fields.
Perhaps this is your problem? If you have so many fields and most of them are identical in the order and the shipment, you should pull the common fields into a separate object that is shared by both the order and the shipment (in the database, this would equate to normalizing your tables). Then you would only have to map the fields that actually change (such as switching the from and to fields).
Related
I have a table that stores general information about a customer (name, address, etc) that is common to all customers. I have a field called CustomerType (list of types) that drives what other fields I need to capture. So if they are a government customer then they will see a different set of custom fields than a non-profit customer would see. I need to create forms that each different CustomerType will be fill out. On the SQL side, I need to figure out the best way to store the data so that when I do reporting it is simple. I don't know the best way to attack this problem.
On the SQL side, I need to figure out the best way to store the data so that
when I do reporting it is simple. I don't know the best way to attack this
problem.
There are many possible approaches each with different strengths and weaknesses, here's some to think about:
Create separate customer detail tables for each of the customer types, each containing the fields specific to that customer type. Each detail table keyed on the customer Id. The customer type does not have to be an attribute of the detail table, only of the parent Customer table.
(+) The correctly normalized solution (although you may find awkward situations
where attributes are common to a subset of customer types). The tables will be fairly easy to maintain.
(-) Reports harder to write - you may find yourself using a LOT of unions or outer joins. Development against this schema is more complex, the extra logic to insert/update attributes in the correct tables for particular customer types must be encoded somewhere. This might become unmanageable if you have many customer types, or if you're adding/changing them frequently.
Expand the customer table to contain the super-set of columns required by all customer types, keyed on the customer Id.
(+) Simple, very easy to report on, simple programming logic.
(-) The customer-type specific fields are only partially dependent on the key of the customer table (customer Id) - they are really dependent on the combination of customerId/customerType.
If there are many extra fields, and if there are few fields common between customer types then this denormalization may result in a very wide table with an unmanageable number of columns. It could be a maintenance nightmare - the table must be modified every time a new customer type is added/change.
You might find this a good solution if the number of unique fields required by each customer type is small and they don't change often and if ease of programming and reporting is an overriding concern.
Store the customer specific values as name/value pairs in a generic customer Details table, keyed on customerId/customerType/key.
(+) Very simple to maintain - No data model changes are required to add a new customer type.
(-) Non-relational, makes pure SQL reporting near impossible and makes integrity constraints very difficult to add. You might see this in specialized use cases e.g. where the data will only ever consumed as JSON and direct reporting will never be a requirement, or in some corporate environments where it may be appealing if database changes are very hard to push through.
First of all, have a look at some good tutorials on database design and object relational modelling (ORM) A beginner's guide to SQL database design
My personal suggestion for your design would be to create one table to store all costumers, together with some kind of unique customer id and the CustomerType. Next create a separate table for each of the CustomerTypes and for each user that belongs to that type, store that users unique id in a column together with its customertype specific fields.
While working on implementing voucher feature for an eCommerce application, I need to implement Voucher usage restriction, some of restriction I am planning to have
Products
Exclude products
Product categories
Exclude categories
Email /Customer restrictions
Currently We are supporting following 2 type of Vouchers with an option to create Custom voucher type and all those Vouchers types are being maintained in a single table with help of discriminator (Hibernate use).
Serial Vouchers
Promotion Vouchers.
these are only few which I am targeting at initial stage.My main confusion is about database design and restriction of these voucher usage with Voucher.I am not able to decide which is best way to Map these restrictions in database.
Should I go for a single table for all these restriction and have a relation with Voucher table or is it good to group all similar type of restriction in a single table and have their relation with Voucher table.
As an additional information , we are using hibernate to map our entities with the DB table.
This seems like a very wide-open and freeform requirement. Some questions:
How complex will the business rules you are attempting to model be? If you’re allowing (business) users to define their own vouchers, odds are good they’ll come up with some pretty byzantine rules and combinations. If you have to support anything they come up with, you will have problems.
What will the database be tasked to do with this data? Store the “voucher definition”, sure, but then what? Run tallies or reports on them? Analyze how many are used, by who/when/how/for what? Or just list out what was used/generated over the past year?
What kind of data volumes are you going to have? One entry per voucher definition, or per voucher printed/issued? (If the latter, can you use one entry per voucher, with a count of how many issued?) Are we talking dozens, hundreds, or millions of vouchers?
If it’s totally free-form, if they just want a listing without serious analysis, if the overall volume is small, consider using blob fields rather than minutiae-oriented columns. Something like a big text field and a data-entry box wherein the user will “Enter any other criteria defining the voucher”. (You might even do this using XML.) Ugly, you can’t readily analyze the data, but if the goals are too great or diffuse and you're not going to use all that detailed data, it might be necessary.
A final note: a voucher that is good for only selected products cannot be used on products that are added after the voucher is created. A voucher that is good for all but selected products can be used for subsequently created products. This logic may apply to any voucher-limiting criteria. Both methodologies have merit, make sure the users are clear on what they’re doing.
If I understand what your your are doing, you will have a problem with only one table for all restrictions, because it means 1 row per Voucher and multiple values in your different restrictions columns.
It will be harder for you to UPDATE, extract and cast restrictions values.
In my opinion, you should have one table for each restrictions type and map them with Voucher table. However It will be easier for you to add new restrictions.
As a suggestion:
Isn't it more rational to have valid-products and valid-categories instead of Exclude-products and Exclude-categories?
Having a Customer-Creditgroup table will lead us to have valid-customer-group table.
BTW in the current design we can have a voucher definition table, I will call it voucher-type table.
About the restrictions:
In RDBMS level you can state only two types of table constraints decoratively:
uniquely identifying attributes (keys)
Subsets requirements referencing back to the same or other table
(foreign key)
Implementing all other types of table constraints (like a multi-tuple constraints or transition constraints) requires you to develop procedural data integrity code.
When a voucher is going to sold to a specific customer for a specific product we will need to check validity of excluded elements, that could be done by triggers in data base level or business logic of your application.
I would personally go with your second proposal... grouping all similar types of restrictions in a single table, which refers the Voucher table.
I'll add to that, that you can handle includes and excludes on the same table.
So the structure I'd use is some along the lines of:
Voucher (id, type, etc...)
VoucherProductRestriction (id,voucher_id,product_id,include)
VoucherProductCategoryRestriction (id,voucher_id,product_category_id,include)
VoucherCustomerRestriction (id,voucher_id,customer_id)
VoucherEmailRestriction (id,voucher_id,email)
...where the include column could be a boolean that is true in case you want to restrict the voucher to that product or category, or false if you want to restrict it to any product or category other than those specificied.
If I understand your context correctly, it makes no sense to have both include and exclude restrictions on the same voucher (although it could make sense to have more than one of the same type). You can probably handle and check this better if you use a single table for both types of restrictions.
Let's say we have an Order entity that will be modeled in 2 diff. BCs in a e-commerce application.
The first BC is Order Placement. This BC takes care of collecting all orders placed by our customers from our different websites, validates them and populates its corresponding database by Orders with state either Placed or Rejected.
The 2nd BC is Shipment. This allows the employees in the warehouses to mark an Order as Shipped in its database once it leaves the warehouse.
Now since both BCs use different databases which are empty at first, there will be a need to inform the Shipment BC of the orders that were Placed, so that when a scanner wants to Ship an Order it will be there in the Shipment BC.
My initial approach was to create a domain event once an Order is placed in the Order placement BC and have the Shipment BC subscribe to that event and create a corresponding Order entity in its database for every order placed.
However, I can't stop that feeling that I'm duplicating data across different databases.
My second approach is to ask the Order Placement each time an order is being Shipped for an Order entity, but I still need to maintain the state of the Order in case a failure of a failure in the shipment.
Is there a better approach to all this from a DDD POV?
Your first approach is perfectly fine in my opinion. You are not duplicating data, because as you already noticed, that data is from another context. Same data in different contexts means different things.
As Vernon Vaughn pointed out in his book «Implementing Domain Driven Design»: "A greater degree of autonomy can be achieved when dependent state is already in place in our local system. Some may think of this as a cache of whole dependent objects, but that’s not usually the case when using DDD. Instead we create local domain objects translated from the foreign model, maintaining only the minimal amount of state needed by the local model.”
So copying data is okay as long as it is the only data other BCs need.
But he also mentions that if you use exact copies, it might be a sign of a modeling problem.
I am about to embark on a project for work that is very outside my normal scope of duties. As a SQL DBA, my initial inclination was to approach the project using a SQL database but the more I learn about NoSQL, the more I believe that it might be the better option. I was hoping that I could use this question to describe the project at a high level to get some feedback on the pros and cons of using each option.
The project is relatively straightforward. I have a set of objects that have various attributes. Some of these attributes are common to all objects whereas some are common only to a subset of the objects. What I am tasked with building is a service where the user chooses a series of filters that are based on the attributes of an object and then is returned a list of objects that matches all^ of the filters. When the user selects a filter, he or she may be filtering on a common or subset attribute but that is abstracted on the front end.
^ There is a chance, depending on user feedback, that the list of objects may match only some of the filters and the quality of the match will be displayed to the user through a score that indicates how many of the criteria were matched.
After watching this talk by Martin Folwler (http://www.youtube.com/watch?v=qI_g07C_Q5I), it would seem that a document-style NoSQL database should suit my needs but given that I have no experience with this approach, it is also possible that I am missing something obvious.
Some additional information - The database will initially have about 5,000 objects with each object containing 10 to 50 attributes but the number of objects will definitely grow over time and the number of attributes could grow depending on user feedback. In addition, I am hoping to have the ability to make rapid changes to the product as I get user feedback so flexibility is very important.
Any feedback would be very much appreciated and I would be happy to provide more information if I have left anything critical out of my discussion. Thanks.
This problem can be solved in by using two separate pieces of technology. The first is to use a relatively well designed database schema with a modern RDBMS. By modeling the application using the usual principles of normalization, you'll get really good response out of storage for individual CRUD statements.
Searching this schema, as you've surmised, is going to be a nightmare at scale. Don't do it. Instead look into using Solr/Lucene as your full text search engine. Solr's support for dynamic fields means you can add new properties to your documents/objects on the fly and immediately have the ability to search inside your data if you have designed your Solr schema correctly.
I'm not an expert in NoSQL, so I will not be advocating it. However, I have few points that can help you address your questions regarding the relational database structure.
First thing that I see right away is, you are talking about inheritance (at least conceptually). Your objects inherit from each-other, thus you have additional attributes for derived objects. Say you are adding a new type of object, first thing you need to do (conceptually) is to find a base/super (parent) object type for it, that has subset of the attributes and you are adding on top of them (extending base object type).
Once you get used to thinking like said above, next thing is about inheritance mapping patterns for relational databases. I'll steal terms from Martin Fowler to describe it here.
You can hold inheritance chain in the database by following one of the 3 ways:
1 - Single table inheritance: Whole inheritance chain is in one table. So, all new types of objects go into the same table.
Advantages: your search query has only one table to search, and it must be faster than a join for example.
Disadvantages: table grows faster than with option 2 for example; you have to add a type column that says what type of object is the row; some rows have empty columns because they belong to other types of objects.
2 - Concrete table inheritance: Separate table for each new type of object.
Advantages: if search affects only one type, you search only one table at a time; each table grows slower than in option 1 for example.
Disadvantages: you need to use union of queries if searching several types at the same time.
3 - Class table inheritance: One table for the base type object with its attributes only, additional tables with additional attributes for each child object type. So, child tables refer to the base table with PK/FK relations.
Advantages: all types are present in one table so easy to search all together using common attributes.
Disadvantages: base table grows fast because it contains part of child tables too; you need to use join to search all types of objects with all attributes.
Which one to choose?
It's a trade-off obviously. If you expect to have many types of objects added, I would go with Concrete table inheritance that gives reasonable query and scaling options. Class table inheritance seems to be not very friendly with fast queries and scalability. Single table inheritance seems to work with small number of types better.
Your call, my friend!
May as well make this an answer. I should comment that I'm not strong in NoSQL, so I tend to lean towards SQL.
I'd do this as a three table set. You will see it referred to as entity value pair logic on the web...it's a way of handling multiple dynamic attributes for items. Lets say you have a bunch of products and each one has a few attributes.
Prd 1 - a,b,c
Prd 2 - a,d,e,f
Prd 3 - a,b,d,g
Prd 4 - a,c,d,e,f
So here are 4 products and 6 attributes...same theory will work for hundreds of products and thousands of attributes. Standard way of holding this in one table requires the product info along with 6 columns to store the data (in this setup at least one third of them are null). New attribute added means altering the table to add another column to it and coming up with a script to populate existing or just leaving it null for all existing. Not the most fun, can be a head ache.
The alternative to this is a name value pair setup. You want a 'header' table to hold the common values amoungst your products (like name, or price...things that all rpoducts always have). In our example above, you will notice that attribute 'a' is being used on each record...this does mean attribute a can be a part of the header table as well. We'll call the key column here 'header_id'.
Second table is a reference table that is simply going to store the attributes that can be assigned to each product and assign an ID to it. We'll call the table attribute with atrr_id for a key. Rather straight forwards, each attribute above will be one row.
Quick example:
attr_id, attribute_name, notes
1,b, the length of time the product takes to install
2,c, spare part required
etc...
It's just a list of all of your attributes and what that attribute means. In the future, you will be adding a row to this table to open up a new attribute for each header.
Final table is a mapping table that actually holds the info. You will have your product id, the attribute id, and then the value. Normally called the detail table:
prd1, b, 5 mins
prd1, c, needs spare jack
prd2, d, 'misc text'
prd3, b, 15 mins
See how the data is stored as product key, value label, value? Any future product added can have any combination of any attributes stored in this table. Adding new attributes is adding a new line to the attribute table and then populating the details table as needed.
I beleive there is a wiki for it too... http://en.wikipedia.org/wiki/Entity-attribute-value_model
After this, it's simply figuring out the best methodology to pivot out your data (I'd recommend Postgres as an opensource db option here)
EDIT:
Would it be a good idea to just keep it all under 1 big table and have a flag that differentiates the different forms?
I have to build a site with 5 forms, maybe more. so far the fields for the forms are the following:
What would be the best approach to normalize this design?
I was thinking about splitting "Personal Details" into 3 different tables:
and then reference them from the others with an ID...
Would that make sense? It looks like I'll end up with lots of relationships...
Normalized data essentially means that the same data is not stored multiple times in multiple places. For example, instead of storing the customer contact info with an order, the customer ID is stored with the order and the customer's contact information is 'related' to the order. When the customer's phone number is updated, there is only one place the phone number needs to be updated (the customer table) and all the orders will have the correct information without being updated. Each piece of data exists in one, and only one, place. This is normalized data.
So, to answer your question: no, you will not make your database structure more normalized by breaking up a large table as you described.
The reason to break up a single table into multiple tables is usually to create a one to many relationship. For example, one person might have multiple e-mail addresses. Or multiple physical addresses. Another common reason for breaking up tables is to make systems modular, so that tables can be created that join to existing tables without modifying the existing tables.
Breaking one big table into multiple little tables, with a one to one relationship between them, doesn't make the data any more normalized, it just makes your queries more of a pain to write.* And you don't want to structure your database design around interfaces (forms) unless there is a good reason. There usually isn't.
*Although there are sometimes good reasons to break up big tables and create one to one relationships, normalization isn't one of them.