Moqui - Associate multiple assets to an asset - moqui

The assets entity in Moqui has an associated asset field. But, we have a use case where multiple assets need to be associated with an asset.
For example, a tool(manufacturing equipment) may be used only in specified machine(manufacturing equipment). We are exploring the option to create an join entity.
Are we deviating from the best practices of framework?
Added to answer the comment from David E Jones
Business Requirement
There is a custom tool designed to manufacture a component.
This tool is technically compatible with wide range of machines in operation.
The operating cost of machines in question vary in a very wide band. So, the tool should be used only on specific machines to keep the overall cost of manufactured component within a specified band.
So, for a given tool, we intend to assign the allowed machine(s) and use only assigned machines for manufacturing.

As David remarked it is difficult to design for business requirements without detail and context, and there is relatively little to go on here.
I guess the tooling that might be set up on a particular machine could comprise a large range, related to the range of component specifications of orders for a component that might come in.
The main process to be designed here I guess would be to choose the most economically optimum machine to set up with the tooling for a particular order, and that would always vary depending on the other orders ongoing or scheduled, and the machines those orders were assigned to.
Back to your query with the above in mind, if you are defining particular toolings or tools as assets, it might comprise an approach to look at defining the assetTypeEnumId as 'tooling' or similar, and use the classEnumId across the asset types of machines and toolings to stipulate the maximum economic level of machine that the tool should be used with, etc.
Alternatively, or in addition, it might be useful to look at the AssetStandardCost entity and into setting up some enums for assetStandardCostTypeEnumId.
It would seem to me on the surface that the approach of trying to directly associate multiple toolings to multiple machines (with a range of constraints in addition) would quickly lead to exponential possibilities.
All in all, my experience would be that if you look into the existing entities they will typically suggest a variety of approaches, and later on when further requirements arise you may be glad you used what was existing rather than try and do something new.

Business requirements are difficult to design for without detail and context, but it sounds like what you really want to model is not at the Asset level but at the Product level. For asset type products the Product and related entities (like ProductAssoc) are used to define characteristics of physical items, Asset records represent the actual physical items.
One example of this is the maintenance side of things. The maintenance schedule is part of the Product definition (applicable to all assets for that product) and the maintenance history is part of the Asset side of things (applicable only to specific physical items).

Related

Is it better to use entity-arrtibute-value model over storing various different product in single description text column? [duplicate]

It is safe to say that the EAV/CR database model is bad. That said,
Question: What database model, technique, or pattern should be used to deal with "classes" of attributes describing e-commerce products which can be changed at run time?
In a good E-commerce database, you will store classes of options (like TV resolution then have a resolution for each TV, but the next product may not be a TV and not have "TV resolution"). How do you store them, search efficiently, and allow your users to setup product types with variable fields describing their products? If the search engine finds that customers typically search for TVs based on console depth, you could add console depth to your fields, then add a single depth for each tv product type at run time.
There is a nice common feature among good e-commerce apps where they show a set of products, then have "drill down" side menus where you can see "TV Resolution" as a header, and the top five most common TV Resolutions for the found set. You click one and it only shows TVs of that resolution, allowing you to further drill down by selecting other categories on the side menu. These options would be the dynamic product attributes added at run time.
Further discussion:
So long story short, are there any links out on the Internet or model descriptions that could "academically" fix the following setup? I thank Noel Kennedy for suggesting a category table, but the need may be greater than that. I describe it a different way below, trying to highlight the significance. I may need a viewpoint correction to solve the problem, or I may need to go deeper in to the EAV/CR.
Love the positive response to the EAV/CR model. My fellow developers all say what Jeffrey Kemp touched on below: "new entities must be modeled and designed by a professional" (taken out of context, read his response below). The problem is:
entities add and remove attributes weekly (search keywords dictate future attributes)
new entities arrive weekly (products are assembled from parts)
old entities go away weekly (archived, less popular, seasonal)
The customer wants to add attributes to the products for two reasons:
department / keyword search / comparison chart between like products
consumer product configuration before checkout
The attributes must have significance, not just a keyword search. If they want to compare all cakes that have a "whipped cream frosting", they can click cakes, click birthday theme, click whipped cream frosting, then check all cakes that are interesting knowing they all have whipped cream frosting. This is not specific to cakes, just an example.
There's a few general pros and cons I can think of, there are situations where one is better than the other:
Option 1, EAV Model:
Pro: less time to design and develop a simple application
Pro: new entities easy to add (might even
be added by users?)
Pro: "generic" interface components
Con: complex code required to validate simple data types
Con: much more complex SQL for simple
reports
Con: complex reports can become almost
impossible
Con: poor performance for large data sets
Option 2, Modelling each entity separately:
Con: more time required to gather
requirements and design
Con: new entities must be modelled and
designed by a professional
Con: custom interface components for each
entity
Pro: data type constraints and validation simple to implement
Pro: SQL is easy to write, easy to
understand and debug
Pro: even the most complex reports are relatively simple
Pro: best performance for large data sets
Option 3, Combination (model entities "properly", but add "extensions" for custom attributes for some/all entities)
Pro/Con: more time required to gather requirements and design than option 1 but perhaps not as much as option 2 *
Con: new entities must be modelled and designed by a professional
Pro: new attributes might be easily added later on
Con: complex code required to validate simple data types (for the custom attributes)
Con: custom interface components still required, but generic interface components may be possible for the custom attributes
Con: SQL becomes complex as soon as any custom attribute is included in a report
Con: good performance generally, unless you start need to search by or report by the custom attributes
* I'm not sure if Option 3 would necessarily save any time in the design phase.
Personally I would lean toward option 2, and avoid EAV wherever possible. However, for some scenarios the users need the flexibility that comes with EAV; but this comes with a great cost.
It is safe to say that the EAV/CR database model is bad.
No, it's not. It's just that they're an inefficient usage of relational databases. A purely key/value store works great with this model.
Now, to your real question: How to store various attributes and keep them searchable?
Just use EAV. In your case it would be a single extra table. index it on both attribute name and value, most RDBMs would use prefix-compression to on the attribute name repetitions, making it really fast and compact.
EAV/CR gets ugly when you use it to replace 'real' fields. As with every tool, overusing it is 'bad', and gives it a bad image.
// At this point, I'd like to take a moment to speak to you about the Magento/Adobe PSD format.
// Magento/PSD is not a good ecommerce platform/format. Magento/PSD is not even a bad ecommerce platform/format. Calling it such would be an
// insult to other bad ecommerce platform/formats, such as Zencart or OsCommerce. No, Magento/PSD is an abysmal ecommerce platform/format. Having
// worked on this code for several weeks now, my hate for Magento/PSD has grown to a raging fire
// that burns with the fierce passion of a million suns.
http://code.google.com/p/xee/source/browse/trunk/XeePhotoshopLoader.m?spec=svn28&r=11#107
The internal models are wacky at best, like someone put the schema into a boggle game, sealed that and put it in a paint shacker...
Real world: I'm working on a midware fulfilment app and here are one the queries to get address information.
CREATE OR REPLACE VIEW sales_flat_addresses AS
SELECT sales_order_entity.parent_id AS order_id,
sales_order_entity.entity_id,
CONCAT(CONCAT(UCASE(MID(sales_order_entity_varchar.value,1,1)),MID(sales_order_entity_varchar.value,2)), "Address") as type,
GROUP_CONCAT(
CONCAT( eav_attribute.attribute_code," ::::: ", sales_order_entity_varchar.value )
ORDER BY sales_order_entity_varchar.value DESC
SEPARATOR '!!!!!'
) as data
FROM sales_order_entity
INNER JOIN sales_order_entity_varchar ON sales_order_entity_varchar.entity_id = sales_order_entity.entity_id
INNER JOIN eav_attribute ON eav_attribute.attribute_id = sales_order_entity_varchar.attribute_id
AND sales_order_entity.entity_type_id =12
GROUP BY sales_order_entity.entity_id
ORDER BY eav_attribute.attribute_code = 'address_type'
Exacts address information for an order, lazily
--
Summary: Only use Magento if:
You are being given large sacks of money
You must
Enjoy pain
I'm surprised nobody mentioned NoSQL databases.
I've never practiced NoSQL in a production context (just tested MongoDB and was impressed) but the whole point of NoSQL is being able to save items with varying attributes in the same "document".
Where performance is not a major requirement, as in an ETL type of application, EAV has another distinct advantage: differential saves.
I've implemented a number of applications where an over-arching requirement was the ability to see the history of a domain object from its first "version" to it's current state. If that domain object has a large number of attributes, that means each change requires a new row be inserted into it's corresponding table (not an update because the history would be lost, but an insert). Let's say this domain object is a Person, and I have 500k Persons to track with an average of 100+ changes over the Persons life-cycle to various attributes. Couple that with the fact that rare is the application that has only 1 major domain object and you'll quickly surmize that the size of the database would quickly grow out of control.
An easy solution is to save only the differential changes to the major domain objects rather than repeatedly saving redundant information.
All models change over time to reflect new business needs. Period. Using EAV is but one of the tools in our box to use; but it should never be automatically classified as "bad".
I'm struggling with the same issue. It may be interesting for you to check out the following discussion on two existing ecommerce solutions: Magento (EAV) and Joomla (regular relational structure):
https://forum.virtuemart.net/index.php?topic=58686.0
It seems, that Magento's EAV performance is a real showstopper.
That's why I'm leaning towards a normalized structure. To overcome the lack of flexibility I'm thinking about adding some separate data dictionary in the future (XML or separate DB tables) that could be edited, and based on that, application code for displaying and comparing product categories with new attributes set would be generated, together with SQL scripts.
Such architecture seems to be the sweetspot in this case - flexible and performant at the same time.
The problem could be frequent use of ALTER TABLE in live environment. I'm using Postgres, so its MVCC and transactional DDL will hopefully ease the pain.
I still vote for modeling at the lowest-meaningful atomic-level for EAV. Let standards, technologies and applications that gear toward certain user community to decide content models, repetition needs of attributes, grains, etc.
If it's just about the product catalog attributes and hence validation requirements for those attributes are rather limited, the only real downside to EAV is query performance and even that is only a problem when your query deals with multiple "things" (products) with attributes, the performance for the query "give me all attributes for the product with id 234" while not optimal is still plenty fast.
One solution is to use the SQL database / EAV model only for the admin / edit side of the product catalog and have some process that denormalizes the products into something that makes it searchable. Since you already have attributes and hence it's rather likely that you want faceting, this something could be Solr or ElasticSearch. This approach avoids basically all downsides to the EAV model and the added complexity is limited to serializing a complete product to JSON on update.
EAV has many drawbacks:
Performance degradation over time
Once the amount of data in the application grows beyond a certain size, the retrieval and manipulation of that data is likely to become less and less efficient.
The SQL queries are very complex and difficult to write.
Data Integrity problems.
You can't define foreign keys for all the fields needed.
You have to define and maintain your own metadata.
I have a slightly different problem: instead of many attributes with sparse values (which is possibly a good reason to use EAV), I want to store something more like a spreadsheet. The columns in the sheet can change, but within a sheet all cells will contain data (not sparse).
I made a small set of tests to benchmark two designs: one using EAV, and the other using a Postgres ARRAY to store cell data.
EAV
Array
Both schemas have indexes on appropriate columns, and the indexes are used by the planner.
It turned out the array-based schema was an order of magnitude faster for both inserts and queries. From quick tests, it seemed that both scaled linearly. The tests aren't very thorough, though. Suggestions and forks welcome - they're under an MIT licence.

Domain Driven Design - Creating general purpose entities vs. Context specific Entities

Situation
Suppose you have Orders and Clients as entities in your application. In one aggregate, the Order entity is considered to be the root but you also want to make use of the Client entity for simple things. In another the Client is the root entity and the Order entity is touched ever so lightly.
An example:
Let's say that in the Order aggregate I use the Client only to read details like name, address, build order history and not to make the client do client specific business logic. (like persistence, passwords resets and back flips..).
On the other hand, in the Client aggregate I use the Order entity to report on the client's buying habbits, order totals, order counting, without requiring advanced order functionality like order processing, updating, status changes, etc.
Possible solution
I believe the better solution is to create the entities for each aggregate specific to the aggregate context, because making them full featured (general purpose) and ready for any situation and usage seems like overkill and could potentially become a maintenance nightmare. (and potentially memory intensive)
Question
What is the DDD recommended way of handling this situation?
What is your take on the matter?
The basic driver for these decisions should be the ubiquitous language, and consequently the real world domain you're modeling. If both works in a specific domain, I'd favor separation over god-classes for maintainability reasons.
Apart from separating behavior into different aggregates, you should also take care that you don't mix different bounded contexts. Depending on the requirements of your domain, it could make sense to separate the Purchase Context from the Reporting Context (to extend on your example).
To decide on a context design, context maps are a helpful tool.
You are one the right track. In DDD, entities are not merely containers encapsulating all attributes related to a "subject" (for example: a customer, or an order). This is a very important concept that eludes a lot of people. An entity in DDD represents an operation boundary, thus only the data necessary to perform the operation is considered to be a part of the entity. Exactly which data to include in an entity can be difficult to consider because some data is relevant in a different use-cases. Here are some tips when analyzing data:
Analyze invariants, things that must be considered when applying validation rules and that can not be out of sync should be in the same aggregate.
Drop the database-thinking, normalization is not a concern of DDD
Just because things look the same, it doesn't mean that they are. For example: the current shipping address registered on a customer is different from the shipping address which a specific order was shipped to.
Don't look at reads. Reading, like creating a report or populating av viewmodel/dto/whatever has nothing to do with operation boundaries and can typically be a 360 deg view of the data. In fact don't event use your domain model when returning reads, use a different architectural stack.

Product groups - Better data model? New paradigm for implementation?

Apologies for the less than ideal title; had a hard time coming up with something.
Three entities: Products, Clients and Client_Product_Authorization.
A Client has access to many Products.
A Product is accessed by many Clients.
A Client_Product_Authorization authorizes a Client to access a Product.
We have 100,000+ Products (virtual goods) and managing Client_Product_Authorizations becomes a very labor intensive task. So we have Products that is a container of other Products. Example:
Product X is a container for Products 1 through 2000.
So, by creating a Client_Product_Authorization granting a Client the Product X, we are indirectly providing for the client to access products 1 through 2000. Mind that product 1 might be contained in different container products (so, yes, it is a many-to-many self relationship).
Here is the entity level data model:
The advantage of this mechanism is that we can just change the composition of Product X (adding or removing other products) and it automatically adjusts the product list available to the clients authorized to Product X. Managing awarding access to our large product-base is a matter of selecting a few container products.
The disadvantage is that it now became harder (in terms of creating a SQL statement, because of the many-to-many self-relationship) to know what the Client is actually authorized to see (as in the individual non-container products). As in:
Product Z is a container for Product X and Product Y
Product X is a container for Products 1 through 2000
Product Y is a container for Products 2001 through 5000
What are the actual non-container products a client authorized to Product Z can see?
Products 1 through 2000 and 2001 through 5000.
I would like to make the list of non-container products a client is authorized for to be materialized in some way. So that questions like:
Should Client ABC be allowed to see Product 78?
OR
What products is Client ABC authorized to see?
can be easily responded with a query.
The goal is to make the job of software trying to determine the list of products accessible to a client a simple mechanism, instead of requiring a traversal through all container products, their sub-container products, etc etc etc.
Three questions:
a) Is there a better data-model for this scenario?
b) Is there a different mechanism to simplify the management of access authorization for this large set of products?
c) How would you go about making the obtention of the list of non-container products available to a client as simple as possible?
I appreciate the collective's input. Thanks in advance!
Update: Limiting the number of nested products is not an option for business reasons.
Since you're using SQL 2005 you should have access to common table expressions (CTEs) which makes the recursion of finding the children of a product much easier. You might want to look into CTEs and see if that's sufficient for what you're doing.
Also, I don't recall this specific scenario and my copy of the book is at home, but Joe Celko wrote a very good book on modeling hierarchies and trees in an RDBMS. It's probably worth looking into to see if there is a better model for this. He had a few rather ingenious ones for other scenarios that didn't seem obvious at first, but which are very efficient. Even if there isn't a direct match, some of the techniques which he uses might be useful.
The model which you have is what's referred to as the adjacency list model. Celko also shows how to model hierarchies using what are called the nested set model and the path enumeration model.
The nested set model may seem a little complicated at first, but it's actually simple in a way. It's more expensive for updating, but selects from it are VERY fast compared to just about any other way to model hierarchies. You can find an abbreviated description of it here. Since a product can be contained in multiple trees you would have to adapt it slightly for your case.
The path enumeration model basically just uses a delimited (or XML) string to list out the path to the row in question, starting at the root of the tree. You then use string (or XQuery) functions to find children of a parent, etc. As far as I know, it's only really useful for trees, which have a single root, so I don't think that you could use it in your case.
From a purely relational perspective (and my own personal perspective), what you have is the most logical means of representation. The downside, of course, is that SQL isn't terribly well-suited for recursion. There are other paradigms out there for storing tree structures (this is a long-standing issue in RDBMS discussions), but I won't get into them because I generally feel like they sacrifice readability and maintainability in favor of ease of querying. That may be what you need, but I don't know.
Since you're running 2005, you can do recursion pretty easily with common table expressions (CTE's; see this MSDN article). Otherwise, you either have to turn it into a multi-step procedure (a stored procedure is probably a good candidate, since you can hide the implementation complexity from the calling code) or limit the number of levels you're willing to look (ie, products can only be nested 3 levels deep) by adding left joins for each level.

Service Design (WCF, ASMX, SOA)

Soliciting feedback/thoughts on a pattern or best practice to address a situation that I have seen a few times over the years, yet I haven't found any one solution that addresses it the way I'd like.
Here is the background.
Company has 3 applications supporting 3 separate "lines of business" that are very much related to each other. Two of the applications are literally copy/paste from the original. The applications need to be able to grow at different rates and have slightly different functionality. The main differences in functionality come from the data entry fields. The differences essentially fall into one of the following categories:
One instance has a few fields
that the other does not.
String field has a max length of 200 in one
instance, but 50 in another.
Lookup/Reference fields have
different underlying values (i.e.
same table structures, but coming
from different databases).
A field is defined as a user supplied,
free text, value in one instance,
but a lookup/reference in another.
The problem is that there are other applications within the company that need to consume data from these three separate applications, but ideally, talk to them in a core/centralized manner (i.e. through a central service rather than 3 separate services). My question is how to handle, in particular, item D above. I am thinking a "lowest common denominator" approach might be the only way. For example:
<SomeFieldName>
<Code></Code> <!-- would store a FK ref value if instance used lookup, otherwise would be empty or nonexistent-->
<Text></Text> <!-- would store the text from the lookup if instance used lookup, would store user supplied text if not-->
</SomeFieldName>
Other thoughts/ideas on this?
TIA!
So are the differences strictly from a Datamodel view or are there functional business / behavioral differences at the application level.
If the later is the case then I would definetly go down the path you appear to be heading down with SOA. Now how you impliment your SOA just depends upon your architecture needs. What I would look at for design would be into some various patterns. Its hard to say for sure which one(s) would meet the needs with out more information / example on how the behavioral / functional differences are being used. From off of the top of my head tho with what you have described I would probably start off looking at a Strategy pattern in my initial design.
Definetly prototype this using TDD so that you can determine if your heading down the right path.
How about: extend your LCD approach, put a facade in front of these systems. devise a normalised form of the data which (if populated with enough data) can be transformed to any of the specific instances. [Heading towards an ESB here.]
Then you have the problem, how does a client know what "enough" is? Some kind of meta-data may be needed so that you can present a suiatble UI. So extend the services to provide an operation to deliver the meta data.

Entity Attribute Value Database vs. strict Relational Model Ecommerce

It is safe to say that the EAV/CR database model is bad. That said,
Question: What database model, technique, or pattern should be used to deal with "classes" of attributes describing e-commerce products which can be changed at run time?
In a good E-commerce database, you will store classes of options (like TV resolution then have a resolution for each TV, but the next product may not be a TV and not have "TV resolution"). How do you store them, search efficiently, and allow your users to setup product types with variable fields describing their products? If the search engine finds that customers typically search for TVs based on console depth, you could add console depth to your fields, then add a single depth for each tv product type at run time.
There is a nice common feature among good e-commerce apps where they show a set of products, then have "drill down" side menus where you can see "TV Resolution" as a header, and the top five most common TV Resolutions for the found set. You click one and it only shows TVs of that resolution, allowing you to further drill down by selecting other categories on the side menu. These options would be the dynamic product attributes added at run time.
Further discussion:
So long story short, are there any links out on the Internet or model descriptions that could "academically" fix the following setup? I thank Noel Kennedy for suggesting a category table, but the need may be greater than that. I describe it a different way below, trying to highlight the significance. I may need a viewpoint correction to solve the problem, or I may need to go deeper in to the EAV/CR.
Love the positive response to the EAV/CR model. My fellow developers all say what Jeffrey Kemp touched on below: "new entities must be modeled and designed by a professional" (taken out of context, read his response below). The problem is:
entities add and remove attributes weekly (search keywords dictate future attributes)
new entities arrive weekly (products are assembled from parts)
old entities go away weekly (archived, less popular, seasonal)
The customer wants to add attributes to the products for two reasons:
department / keyword search / comparison chart between like products
consumer product configuration before checkout
The attributes must have significance, not just a keyword search. If they want to compare all cakes that have a "whipped cream frosting", they can click cakes, click birthday theme, click whipped cream frosting, then check all cakes that are interesting knowing they all have whipped cream frosting. This is not specific to cakes, just an example.
There's a few general pros and cons I can think of, there are situations where one is better than the other:
Option 1, EAV Model:
Pro: less time to design and develop a simple application
Pro: new entities easy to add (might even
be added by users?)
Pro: "generic" interface components
Con: complex code required to validate simple data types
Con: much more complex SQL for simple
reports
Con: complex reports can become almost
impossible
Con: poor performance for large data sets
Option 2, Modelling each entity separately:
Con: more time required to gather
requirements and design
Con: new entities must be modelled and
designed by a professional
Con: custom interface components for each
entity
Pro: data type constraints and validation simple to implement
Pro: SQL is easy to write, easy to
understand and debug
Pro: even the most complex reports are relatively simple
Pro: best performance for large data sets
Option 3, Combination (model entities "properly", but add "extensions" for custom attributes for some/all entities)
Pro/Con: more time required to gather requirements and design than option 1 but perhaps not as much as option 2 *
Con: new entities must be modelled and designed by a professional
Pro: new attributes might be easily added later on
Con: complex code required to validate simple data types (for the custom attributes)
Con: custom interface components still required, but generic interface components may be possible for the custom attributes
Con: SQL becomes complex as soon as any custom attribute is included in a report
Con: good performance generally, unless you start need to search by or report by the custom attributes
* I'm not sure if Option 3 would necessarily save any time in the design phase.
Personally I would lean toward option 2, and avoid EAV wherever possible. However, for some scenarios the users need the flexibility that comes with EAV; but this comes with a great cost.
It is safe to say that the EAV/CR database model is bad.
No, it's not. It's just that they're an inefficient usage of relational databases. A purely key/value store works great with this model.
Now, to your real question: How to store various attributes and keep them searchable?
Just use EAV. In your case it would be a single extra table. index it on both attribute name and value, most RDBMs would use prefix-compression to on the attribute name repetitions, making it really fast and compact.
EAV/CR gets ugly when you use it to replace 'real' fields. As with every tool, overusing it is 'bad', and gives it a bad image.
// At this point, I'd like to take a moment to speak to you about the Magento/Adobe PSD format.
// Magento/PSD is not a good ecommerce platform/format. Magento/PSD is not even a bad ecommerce platform/format. Calling it such would be an
// insult to other bad ecommerce platform/formats, such as Zencart or OsCommerce. No, Magento/PSD is an abysmal ecommerce platform/format. Having
// worked on this code for several weeks now, my hate for Magento/PSD has grown to a raging fire
// that burns with the fierce passion of a million suns.
http://code.google.com/p/xee/source/browse/trunk/XeePhotoshopLoader.m?spec=svn28&r=11#107
The internal models are wacky at best, like someone put the schema into a boggle game, sealed that and put it in a paint shacker...
Real world: I'm working on a midware fulfilment app and here are one the queries to get address information.
CREATE OR REPLACE VIEW sales_flat_addresses AS
SELECT sales_order_entity.parent_id AS order_id,
sales_order_entity.entity_id,
CONCAT(CONCAT(UCASE(MID(sales_order_entity_varchar.value,1,1)),MID(sales_order_entity_varchar.value,2)), "Address") as type,
GROUP_CONCAT(
CONCAT( eav_attribute.attribute_code," ::::: ", sales_order_entity_varchar.value )
ORDER BY sales_order_entity_varchar.value DESC
SEPARATOR '!!!!!'
) as data
FROM sales_order_entity
INNER JOIN sales_order_entity_varchar ON sales_order_entity_varchar.entity_id = sales_order_entity.entity_id
INNER JOIN eav_attribute ON eav_attribute.attribute_id = sales_order_entity_varchar.attribute_id
AND sales_order_entity.entity_type_id =12
GROUP BY sales_order_entity.entity_id
ORDER BY eav_attribute.attribute_code = 'address_type'
Exacts address information for an order, lazily
--
Summary: Only use Magento if:
You are being given large sacks of money
You must
Enjoy pain
I'm surprised nobody mentioned NoSQL databases.
I've never practiced NoSQL in a production context (just tested MongoDB and was impressed) but the whole point of NoSQL is being able to save items with varying attributes in the same "document".
Where performance is not a major requirement, as in an ETL type of application, EAV has another distinct advantage: differential saves.
I've implemented a number of applications where an over-arching requirement was the ability to see the history of a domain object from its first "version" to it's current state. If that domain object has a large number of attributes, that means each change requires a new row be inserted into it's corresponding table (not an update because the history would be lost, but an insert). Let's say this domain object is a Person, and I have 500k Persons to track with an average of 100+ changes over the Persons life-cycle to various attributes. Couple that with the fact that rare is the application that has only 1 major domain object and you'll quickly surmize that the size of the database would quickly grow out of control.
An easy solution is to save only the differential changes to the major domain objects rather than repeatedly saving redundant information.
All models change over time to reflect new business needs. Period. Using EAV is but one of the tools in our box to use; but it should never be automatically classified as "bad".
I'm struggling with the same issue. It may be interesting for you to check out the following discussion on two existing ecommerce solutions: Magento (EAV) and Joomla (regular relational structure):
https://forum.virtuemart.net/index.php?topic=58686.0
It seems, that Magento's EAV performance is a real showstopper.
That's why I'm leaning towards a normalized structure. To overcome the lack of flexibility I'm thinking about adding some separate data dictionary in the future (XML or separate DB tables) that could be edited, and based on that, application code for displaying and comparing product categories with new attributes set would be generated, together with SQL scripts.
Such architecture seems to be the sweetspot in this case - flexible and performant at the same time.
The problem could be frequent use of ALTER TABLE in live environment. I'm using Postgres, so its MVCC and transactional DDL will hopefully ease the pain.
I still vote for modeling at the lowest-meaningful atomic-level for EAV. Let standards, technologies and applications that gear toward certain user community to decide content models, repetition needs of attributes, grains, etc.
If it's just about the product catalog attributes and hence validation requirements for those attributes are rather limited, the only real downside to EAV is query performance and even that is only a problem when your query deals with multiple "things" (products) with attributes, the performance for the query "give me all attributes for the product with id 234" while not optimal is still plenty fast.
One solution is to use the SQL database / EAV model only for the admin / edit side of the product catalog and have some process that denormalizes the products into something that makes it searchable. Since you already have attributes and hence it's rather likely that you want faceting, this something could be Solr or ElasticSearch. This approach avoids basically all downsides to the EAV model and the added complexity is limited to serializing a complete product to JSON on update.
EAV has many drawbacks:
Performance degradation over time
Once the amount of data in the application grows beyond a certain size, the retrieval and manipulation of that data is likely to become less and less efficient.
The SQL queries are very complex and difficult to write.
Data Integrity problems.
You can't define foreign keys for all the fields needed.
You have to define and maintain your own metadata.
I have a slightly different problem: instead of many attributes with sparse values (which is possibly a good reason to use EAV), I want to store something more like a spreadsheet. The columns in the sheet can change, but within a sheet all cells will contain data (not sparse).
I made a small set of tests to benchmark two designs: one using EAV, and the other using a Postgres ARRAY to store cell data.
EAV
Array
Both schemas have indexes on appropriate columns, and the indexes are used by the planner.
It turned out the array-based schema was an order of magnitude faster for both inserts and queries. From quick tests, it seemed that both scaled linearly. The tests aren't very thorough, though. Suggestions and forks welcome - they're under an MIT licence.