Why is it recommended to avoid bidirectional relations in ORM?

What are the technical reasons that bidirectional relations between entities are not recommended? Does it impact an ORM's performance? (If so, why?)
Sources:
http://docs.doctrine-project.org/projects/doctrine-orm/en/latest/reference/best-practices.html#constrain-relationships-as-much-as-possible
https://ocramius.github.io/doctrine-best-practices/#/86

In the first source you refer to, three reasons are mentioned:
This has several benefits:
Reduced coupling in your domain model
Simpler code in your domain model (no need to maintain bidirectionality properly)
Less work for Doctrine
In the second:
BI-DIRECTIONAL ASSOCIATIONS ARE OVERHEAD
I assume those are the whys. "Less work for Doctrine" and "overhead" most likely mean that it impacts performance; I wouldn't know how else to interpret that...
Makes sense since the ORM needs to update both sides whenever you change something in a bi-directional relationship.
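To make that concrete, here is a small sketch of what "maintaining both sides" means, with hypothetical Department/Member classes (in C#, though the same bookkeeping applies to Doctrine's owning and inverse sides):

using System.Collections.Generic;

// Hypothetical classes; the point is the synchronization burden itself.
public class Department
{
    private readonly List<Member> _members = new List<Member>();
    public IReadOnlyCollection<Member> Members => _members;

    public void AddMember(Member member)
    {
        if (_members.Contains(member)) return; // guard against ping-pong recursion
        _members.Add(member);
        member.JoinDepartment(this);           // keep the inverse side in sync
    }
}

public class Member
{
    public Department Department { get; private set; }

    public void JoinDepartment(Department department)
    {
        if (Department == department) return;  // guard against ping-pong recursion
        Department = department;               // (moving between departments would
        department.AddMember(this);            //  need even more bookkeeping)
    }
}

With a unidirectional association, only one of these methods would exist and both guards would disappear; that is the "less work" being referred to.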

Beyond the reasons mentioned in the sources you provided (and in Wilt's answer), having a lot of relationships between entities makes it easier to violate the Single Responsibility Principle and can make your code more complex.
Take this example: I want to update a user's phone number from a certain part of the code, and I currently only have access to an organization the user belongs to. If I have a full path of connections between entities, I can do this:
foreach ($organization->getDepartments() as $department) {
    if ($department->getName() == 'sales') {
        foreach ($department->getMembers() as $member) {
            if ($member->getName() == 'Kevin') {
                // Phone numbers are strings: as an integer the leading
                // zero would be lost (or be an invalid octal literal).
                $member->setPhoneNumber('012343929394');
            }
        }
    }
}
It's a personal preference, but I think that making this sort of thing hard to do is a good idea. Instead, you would fetch the member by name from the database in a dedicated service for editing user info. This means your logic is more encapsulated: a new developer working on the code will be more likely to look for the UserEditService if they don't have access to everything from everywhere, as sketched below.
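A minimal sketch of that alternative; the service and repository names are hypothetical, not anything from a specific framework:

// Hypothetical abstractions; the point is that edits go through one entry point.
public interface IMemberRepository
{
    Member FindByName(string name);
    void Save(Member member);
}

public class Member
{
    public string Name { get; set; }
    public string PhoneNumber { get; private set; }
    public void SetPhoneNumber(string number) => PhoneNumber = number;
}

public class UserEditService
{
    private readonly IMemberRepository _members;

    public UserEditService(IMemberRepository members) => _members = members;

    public void ChangePhoneNumber(string memberName, string phoneNumber)
    {
        var member = _members.FindByName(memberName); // one targeted fetch,
        member.SetPhoneNumber(phoneNumber);           // no graph traversal
        _members.Save(member);
    }
}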

Related

How to avoid aggregate being dependent on outside includes?

I do not use lazy loading. My root aggregate has child entities (collection navigation properties). I want my aggregate to be self-contained, responsible for itself, to follow the Single Responsibility Principle (SRP), and to adhere to high cohesion and low coupling.
The problem is that the code that retrieves the root aggregate needs to include certain child entities depending on which way it wants to interact with the aggregate.
Example:
public class Blog // My root aggregate
{
    private readonly List<Author> _authors = new List<Author>();
    private readonly List<Post> _posts = new List<Post>();

    public IReadOnlyCollection<Author> Authors => _authors;
    public IReadOnlyCollection<Post> Posts => _posts;

    public void AddAuthor(Author author)
    {
        _authors.Add(author);
    }

    public void AddPost(Post post)
    {
        _posts.Add(post);
    }
}
If I want to add an author, I have to do:
var blog = _context.Blogs.Include(x => x.Authors).Single(x => x.BlogId == 1);
blog.AddAuthor(/* ... */);
And if I want to add a post, I would have to do:
var blog = _context.Blogs.Include(x => x.Posts).Single(x => x.BlogId == 1);
blog.AddPost(/* ... */);
But I feel this breaks encapsulation, because now my Blog aggregate is not self-contained: its functionality depends on how the caller retrieved the aggregate from the DbContext (or the repository). If the caller did not Include the necessary child entities, the operation on the aggregate would fail (since the collection would not have been loaded).
I would like to avoid lazy loading because it is less suitable for web applications and performs worse due to executing multiple queries. I feel that having repository methods such as GetBlogWithAuthors and GetBlogWithPosts would be ugly. Do I have to create a GetBlog repository method which always Includes all child entities? (That would be a big, slow query that could time out.)
Are there any solutions to this problem?
I realize it is probably a practice domain, but I think an important point that is not talked about enough is that strict DDD should not always be applied. DDD adds a certain amount of complexity in order to contain an explosion of complexity; if there is little complexity to start with, the upfront cost is not worth it.
As was mentioned in the comments, an Aggregate is a consistency boundary. Since there does not seem to be any consistency being enforced here, you can split it: Blog can hold a collection of PostRef or something similar, with maybe just an Id and a Title, so it need not pull back ALL the Post data.
Then Post is its own aggregate. I am guessing that Post has an Author. It is recommended not to reference entities inside other aggregates, only their aggregate roots, so it now seems like Authors should not be in Blog either.
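A rough sketch of that split, assuming PostRef carries just an Id and a Title:

using System;
using System.Collections.Generic;

// Blog keeps only lightweight references; Post is its own aggregate.
public class Blog
{
    private readonly List<PostRef> _posts = new List<PostRef>();
    public IReadOnlyCollection<PostRef> Posts => _posts;

    public void AddPost(PostRef postRef) => _posts.Add(postRef);
}

// Just enough data for listings and Blog's own rules.
public class PostRef
{
    public Guid PostId { get; set; }
    public string Title { get; set; }
}

// The full content lives here, loaded independently of Blog.
public class Post
{
    public Guid Id { get; set; }
    public string Title { get; set; }
    public string Body { get; set; }
    public Guid AuthorId { get; set; } // reference other aggregates by id only
}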
When your starting point is an ORM, my experience is that your model will fight the DDD recommendations. Create your model first, and then see how to persist your aggregates. My experience, and many others', at that point is that an ORM just isn't worth the yak shaving it brings throughout the project. It is also far too easy for someone who does not understand the constraints to add a reference that should not be there.
To address performance concerns: remember that your read and write models do not have to be the same. You optimize your write model for enforcing constraints; if they are separate, you can then optimize your read model for query performance. If this sounds like CQRS to you, then you are correct. Again, though, the number of moving parts increases, and it should solve more problems than it introduces. Again, your ORM will fight you on this.
Lastly, if you do have consistency constraints that require really large amounts of data, you need to ask whether they really need to be enforced in real time. When you start modeling time, some new options emerge.
For example: SubmittedPost -> RejectedPost OR AcceptedPost -> PublishedPost. If this happens as a background process, the amount of data that needs to be pulled will not affect UX. If this sounds interesting, I suggest you take a look at the great book Domain Modeling Made Functional.
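The book models such lifecycles as F# discriminated unions; a rough C# analogue (state names from the sentence above, everything else assumed) looks like this:

using System;

// Each lifecycle state is its own type, so an illegal transition
// (e.g. publishing a SubmittedPost) simply does not compile.
public abstract class PostState { }

public class SubmittedPost : PostState { public Guid Id { get; set; } public string Body { get; set; } }
public class RejectedPost  : PostState { public Guid Id { get; set; } public string Reason { get; set; } }
public class AcceptedPost  : PostState { public Guid Id { get; set; } public string Body { get; set; } }
public class PublishedPost : PostState { public Guid Id { get; set; } public string Body { get; set; } public DateTime At { get; set; } }

public static class PostWorkflow
{
    // Runs as a background process, so pulling whatever data the review
    // needs does not affect UX.
    public static PostState Review(SubmittedPost p, bool ok, string reason) =>
        ok ? (PostState)new AcceptedPost { Id = p.Id, Body = p.Body }
           : new RejectedPost { Id = p.Id, Reason = reason };

    public static PublishedPost Publish(AcceptedPost p) =>
        new PublishedPost { Id = p.Id, Body = p.Body, At = DateTime.UtcNow };
}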
Some other resources:
Shameless plug: Functional modeling
Nick has an example of relaxing invariant business rules when accepting input
A discussion on aggregates... it went deep fast. This question was asked there, and I don't think we answered it well, so hopefully I did better here.

How to model a non member aggregate in UML class diagram

In the below UML diagram, Account has an aggregation of Orders. Based on most online resources, this would typically mean the Account class has something similar to a List<Order> as an instance member.
But in reality, for a real-world web app with persistent storage, that is not usually how the Account class would look. It won't hold a list of orders as an instance member; instead, some other controller class will just query a datastore, asking for all Orders belonging to an Account. So in a UML class diagram for such an app, is this still the right way to represent relations? The cardinality, and maybe the concept of aggregation, looks right from a database entity perspective; it's just that the diamond makes no sense from a class perspective.
Or should it show a DataStore/DataManager with a getOrdersForAccount() method, connected to the Account and Order classes through a dependency relation (dotted line with arrow)?
This depends on what you want to represent.
The class model you have already would be sufficient as a logical domain model, expressing the logical relationships between entities in your domain. This might not be precisely how you implement your software in code, but it will guide you (and others) in understanding the entities and their relationships without getting bogged down in that implementation detail. At this level, your diagram may have a few design choices (strong aggregation, for example, is arguably a design choice, as is the use of enumerations and keys) but not that many, and nothing that really detracts from the underlying logic. If anything, you could lose some design choices here and improve the expression of the logic.
What you may also want is a representation of how the OO code is physically implemented as well. This would be an additional class diagram that shows the implementation detail more precisely. You will have far more design choices in this diagram: whether to use a collection or not for orders (e.g. a list or some other collection-type class), and what your data access patterns are (Adapters, Managers, ORMs, etc.). At this level you will most likely lose the strong aggregation notation, as here we are talking about classes referencing each other, which is most simply denoted using basic associations. You might want to use arrows and/or dot notation to indicate end ownership and reference directions, so that the relationships between classes are clearer.
So, I think your question is a classic question about levels of abstraction in models and analysis vs design. Thanks for asking it!
The aggregation just means: "if you delete the account, you need to delete the orders as well".
I also recommend simply leaving the aggregation out (in most cases), since it adds only a little extra semantics to your model. In this case it seems obvious that the orders are deleted when the account is deleted. The only thing the aggregation adds here is (as in most cases) some confusion, or some futile discussion about the worth of that diamond.
If you have a domain where the filled diamond is used, it should be documented in the modeling rules. When using shared aggregation, documentation is even mandatory, since it has no semantics per se in the spec (see the box on p. 110 of UML 2.5).
It depends on how deep you want to go with UML design.
If you target code generation from UML, then you probably need to add the class you mentioned.
It would look a lot like the Registry pattern:
[UML diagram: Account depends on a DataManager abstraction, implemented by DataManagerImplementation, which provides the Order list]
You can add an abstraction so you can change the implementation of your DataManager (if your DataManager is third-party, just call its API from DataManagerImplementation).
After that, depending on your implementation: once you have the list, if you need to keep it, add the association Account -> Order; if you can live with the list on the stack, you are good to go.
C++ instantiation example:
DataManagerImplementation *db = new DataManagerImplementation();
// Dependency injection
Account *acc = new Account(db);
Then, in the Account class:
Account::Account(DataManager *db)
{
    // Fetch the order list at creation.
    // 'orders' could be a member instead, if the list must be kept.
    m_db = db;
    std::vector<Order*> *orders = m_db->GetOrders(this);
}
PS: I also recommend putting an arrow (direction) on associations/aggregations; otherwise it implies the association is bidirectional, i.e. that Account has a pointer to an order list and every Order also has a pointer to an Account, and I am not sure that is needed.
To edit PlantUML: http://www.plantuml.com/plantuml/png/SoWkIImgAStDuN99B4dqJSnBJ4yjyimjo4dDJSqhIIp9pCzJqDMjiLFmBqf9BK9ImuKk05Hcfw2afGHHYIbjfL2McboINsG3bj6oKz1oJoq1iuir79EJyqlpIZIve0m5a566IfYMEgJcfG0T2m00

How to use model structure from ZF2 Blog tutorial in a real world application?

I'm currently developing a ZF2 application and want to implement it pretty similarly to the ZF2 example Blog application. The DAL will probably be replaced by Doctrine in the future, but in the first version the model should work like in the Blog application: Service + Mapper + Data Objects.
In the Blog application the method ZendDbSqlMapper#save(...) gets (like every other public method of a mapper) a data object as argument, extracts it, and then writes the data to the database. But my real-world case is a bit more complex, and I don't yet understand (but want to) whether the approach is still applicable to it, and how.
The application should primarily deal with saving and retrieving requests/orders for some technical services. (In the next step they are manually processed by an employee and implemented.) So the common case will be saving (updating/creating) an Order.
The physical model looks like this:
As you can see, the Order has some dependencies, which in turn have their own dependencies, etc. On creating an Order I have to create a LogicalConnection first. For a LogicalConnection, an (abstract) PhysicalConnection and a concrete physical connection variant like PhysicalConnectionX are needed. (It implements Class Table Inheritance.) Furthermore, a LogicalConnection needs a new Customer (to simplify: every time a new customer for a new order) and an Endpoint with a concrete endpoint variant like EndpointA (also a CTI implementation). The tables on the left side of the data model are just basic data that should not / cannot be changed. (Of course, updating is even a bit more complicated, since I have to check for every related object whether it already exists, to avoid e.g. creating multiple customers for the same endpoint.)
My first idea was to implement it like this:
transform the input the model gets from the form (I don't use Zend\Collection, because my form is structured completely differently than my objects and the database);
hydrate the Order object from it (recursive hydration is already implemented);
create a Mapper for every object type;
and let every Mapper#save(...)
call the save(...) on the mappers of the objects it depends on;
and then care only for its own object.
Pseudocode:
MyDataObjectA {
    $id;
    $myObjectB;
}

MyDataObjectB {
    $id;
}

MapperA {
    save($dataObjectA) {
        saving $dataObjectA
        calling MapperB#save($dataObjectA->getObjectB())
    }
}

MapperB {
    save($dataObjectB) {
        saving $dataObjectB
    }
}
It's a lot of code, and every case has to be handled manually. (And I'm not sure, but maybe I can run into problems with context-dependent saving, since this approach doesn't consider the context.) However, I don't believe it's a recommended solution.
Well, it might smack of an ORM. But what about the model structure from the ZF2 Blog tutorial? Is it applicable to such a case? Or is it only useful for very simple structures and almost never for a real-world application? (Then I would ask: do we really need this tutorial, if it shows an approach that can almost never be used in a real application?) Or maybe I just misunderstand something and there is a better (more efficient, more elegant, etc.) approach?

Object persistence terminology: 'repository' vs. 'store' vs. 'context' vs. 'retriever' vs. (...)

I'm not sure how to name data store classes when designing a program's data access layer (DAL).
(By a data store class, I mean a class that is responsible for reading a persisted object into memory, or for persisting an in-memory object.)
It seems reasonable to name a data store class according to two things:
what kinds of objects it handles;
whether it loads and/or persists such objects.
⇒ A class that loads Banana objects might be called e.g. BananaSource.
I don't know how to go about the second point (i.e. the Source bit in the example). I've seen different nouns apparently used for just that purpose:
repository: this sounds very general. Does this denote something read-/write-accessible?
store: this sounds like something that potentially allows write access.
context: sounds very abstract. I've seen this with LINQ and object-relational mappers (ORMs).
P.S. (several months later): This is probably appropriate for containers that contain "active" or otherwise supervised objects (the Unit of Work pattern comes to mind).
retriever: sounds like something read-only.
source & sink: probably not appropriate for object persistence; a better fit for data streams?
reader / writer: quite clear in its intention, but sounds too technical to me.
Are these names arbitrary, or are there widely accepted meanings / semantic differences behind each? More specifically, I wonder:
What names would be appropriate for read-only data stores?
What names would be appropriate for write-only data stores?
What names would be appropriate for mostly read-only data stores that are occasionally updated?
What names would be appropriate for mostly write-only data stores that are occasionally read?
Does one name fit all scenarios equally well?
As no one has yet answered the question, I'll post what I have decided on in the meantime.
Just for the record, I have pretty much decided on calling most data store classes repositories. First, it appears to be the most neutral, non-technical term from the list I suggested, and it seems to be well in line with the Repository pattern.
Generally, "repository" seems to fit well where data retrieval/persistence interfaces are something similar to the following:
public interface IRepository<TResource, TId>
{
    int Count { get; }

    TResource GetById(TId id);
    IEnumerable<TResource> GetManyBySomeCriteria(...);

    TId Add(TResource resource);
    void Remove(TId id);
    void Remove(TResource resource);

    ...
}
Another term I have decided to use is provider, which I'll prefer over "repository" whenever objects are generated on the fly instead of being retrieved from a persistence store, or when access to a persistence store happens in a purely read-only manner. (Factory would also be appropriate, but it sounds more technical, and I have decided against technical terms for most uses.)
P.S.: Some time has gone by since writing this answer, and I've had several opportunities at work to review other people's code. One term I've since added to my vocabulary is Service, which I reserve for SOA scenarios: I might publish a FooService that is backed by a private Foo repository or provider. The "service" is basically just a thin public-facing layer above these that takes care of things like authentication, authorization, or aggregating/batching DTOs for proper "chunkiness" of service responses.
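For illustration, such a thin layer might look like this (a sketch only: Foo, FooDto, User and the Authorize hook are placeholders, and it leans on the IRepository interface sketched above):

using System;

public class Foo    { public Guid Id { get; set; } public string Name { get; set; } }
public class FooDto { public Guid Id { get; set; } public string Name { get; set; } }
public class User   { public string Login { get; set; } }

// Thin public-facing service over a private repository: the service handles
// cross-cutting concerns, the repository handles persistence.
public class FooService
{
    private readonly IRepository<Foo, Guid> _repository; // the private backing store

    public FooService(IRepository<Foo, Guid> repository) => _repository = repository;

    public FooDto GetFoo(Guid id, User caller)
    {
        Authorize(caller);                                  // auth concerns live here
        var foo = _repository.GetById(id);
        return new FooDto { Id = foo.Id, Name = foo.Name }; // shape the DTO here
    }

    private static void Authorize(User caller) { /* stand-in for real checks */ }
}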
Well, to add something to your conclusion:
A repository is meant to care about only one entity and follows certain patterns, like yours does.
A store is allowed to do a bit more, also working with other entities.
A reader/writer is separated so as to semantically expose, and inject, only reading or only writing functionality into other classes; it comes from the CQRS pattern.
A context is more or less bound to an ORM, as you mentioned, and is usually used under the hood of a repository or store; some use it directly instead of building a repository on top, but it is harder to abstract.
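A minimal sketch of that reader/writer separation (all names invented for the example, reusing the Banana from the question):

using System;
using System.Collections.Generic;

public class Banana { public Guid Id { get; set; } public string Variety { get; set; } }

// Read side and write side as separate abstractions.
public interface IBananaReader
{
    Banana GetById(Guid id);
    IEnumerable<Banana> GetAll();
}

public interface IBananaWriter
{
    void Add(Banana banana);
    void Remove(Guid id);
}

// One concrete store may implement both, but each consumer is handed only the
// capability it needs: a query page gets an IBananaReader and cannot write.
public class BananaStore : IBananaReader, IBananaWriter
{
    private readonly Dictionary<Guid, Banana> _data = new Dictionary<Guid, Banana>();

    public Banana GetById(Guid id) => _data[id];
    public IEnumerable<Banana> GetAll() => _data.Values;
    public void Add(Banana banana) => _data[banana.Id] = banana;
    public void Remove(Guid id) => _data.Remove(id);
}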

Single Responsibility Principle vs Anemic Domain Model anti-pattern

I'm in a project that takes the Single Responsibility Principle pretty seriously. We have a lot of small classes and things are quite simple. However, we have an anemic domain model: there is no behaviour in any of our model classes; they are just property bags. This isn't a complaint about our design; it actually seems to work quite well.
During design reviews, SRP is brought up whenever new behaviour is added to the system, and so new behaviour typically ends up in a new class. This keeps things very easily unit-testable, but I am perplexed sometimes because it feels like pulling behaviour out of the place where it's relevant.
I'm trying to improve my understanding of how to apply SRP properly. It seems to me that SRP is in opposition to adding business modelling behaviour that shares the same context to one object, because the object inevitably ends up either doing more than one related thing, or doing one thing but knowing multiple business rules that change the shape of its outputs.
If that is so, then it feels like the end result is an Anemic Domain Model, which is certainly the case in our project. Yet the Anemic Domain Model is an anti-pattern.
Can these two ideas coexist?
EDIT: A couple of context related links:
SRP - http://www.objectmentor.com/resources/articles/srp.pdf
Anemic Domain Model - http://martinfowler.com/bliki/AnemicDomainModel.html
I'm not the kind of developer who likes to find a prophet and follow what they say as gospel, so I don't provide links to these as a way of stating "these are the rules", just as a source for the definitions of the two concepts.
Rich Domain Model (RDM) and the Single Responsibility Principle (SRP) are not necessarily at odds. RDM is more at odds with a very specialised subclass of SRP: the model advocating "data beans + all business logic in controller classes" (DBABLICC).
If you read Martin's SRP chapter, you'll see his modem example is entirely in the domain layer, but it abstracts the DataChannel and Connection concepts as separate classes. He keeps the Modem itself as a wrapper, since that is a useful abstraction for client code. It's much more about proper (re)factoring than mere layering. Cohesion and coupling are still the base principles of design.
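From memory, the shape of that example is roughly the following; the member signatures are paraphrased, not quoted:

// Separate abstractions for the two 'reasons to change'...
public interface IDataChannel
{
    void Send(char c);
    char Receive();
}

public interface IConnection
{
    void Dial(string phoneNumber);
    void Hangup();
}

// ...while Modem stays as the convenient wrapper client code sees.
public class Modem : IDataChannel, IConnection
{
    public void Dial(string phoneNumber) { /* connection management */ }
    public void Hangup()                 { /* connection management */ }
    public void Send(char c)             { /* data communication */ }
    public char Receive() => default(char); /* data communication */
}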
Finally, three issues:
As Martin notes himself, it's not always easy to see the different "reasons for change". The very concepts of YAGNI, Agile, etc. discourage anticipating future reasons for change, so we shouldn't invent them where they aren't immediately obvious. I see "premature, anticipated reasons for change" as a real risk in applying SRP; it should be managed by the developer.
Further to the previous point, even correct (but unnecessarily zealous) application of SRP may result in unwanted complexity. Always think about the next poor sod who has to maintain your class: will the diligent abstraction of trivial behaviour into its own interfaces, base classes and one-line implementations really aid their understanding of what should simply have been a single class?
Software design is often about getting the best compromise between competing forces. For example, a layered architecture is mostly a good application of SRP, but what about the fact that changing a property of a business class from, say, a boolean to an enum has a ripple effect across all the layers, from the db through the domain, facades and web services to the GUI? Does this point to bad design? Not necessarily: it points to the fact that your design favours one aspect of change over another.
I'd have to say "yes", but you have to apply your SRP properly. If an operation applies to only one class, it belongs in that class, wouldn't you say? And what if the same operation applies to multiple classes? In that case, if you want to follow the OO model of combining data and behavior, you'd put the operation into a base class, no?
I suspect from your description that you're ending up with classes which are basically bags of operations, so you've essentially recreated the C style of coding: structs and modules.
From the linked SRP paper:
"The SRP is one of the simplest of the principle, and one of the hardest to get right."
The quote from the SRP paper is very apt: SRP is hard to get right. It and OCP are the two elements of SOLID that simply must be relaxed to at least some degree in order to actually get a project done. Overzealous application of either will very quickly produce ravioli code.
SRP can indeed be taken to ridiculous lengths, if the "reasons for change" are too specific. Even a POCO/POJO "data bag" can be thought of as violating SRP, if you consider the type of a field changing as a "change". You'd think common sense would tell you that a field's type changing is a necessary allowance for "change", but I've seen domain layers with wrappers for built-in value types; a hell that makes ADM look like Utopia.
It's often good to ground yourself with some realistic goal, based on readability or a desired level of cohesion. When you say, "I want this class to do one thing", it should have no more and no less than what is necessary to do it. You can maintain at least procedural cohesion with this basic philosophy. "I want this class to maintain all the data for an invoice" will generally allow SOME business logic, such as summing subtotals or calculating sales tax, based on the object's responsibility to know how to give you an accurate, internally consistent value for any field it contains.
I personally do not have a big problem with a "lightweight" domain. Just having the one role of being the "data expert" makes the domain object the keeper of every field/property pertinent to the class, as well as of all calculated-field logic, any explicit/implicit data type conversions, and possibly the simpler validation rules (i.e. required fields, value limits, things that would break the instance internally if allowed). If a calculation algorithm, perhaps for a weighted or rolling average, is likely to change, encapsulate the algorithm and refer to it from the calculated field (that's just good OCP/PV).
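A small sketch of that idea (the invoice fields and the ITaxCalculator interface are illustrative):

using System;
using System.Collections.Generic;
using System.Linq;

// The volatile algorithm hides behind an interface (OCP/PV)...
public interface ITaxCalculator
{
    decimal TaxFor(decimal subtotal);
}

// ...while the invoice stays the 'data expert' for its own fields
// and the values calculated from them.
public class Invoice
{
    private readonly List<decimal> _lineAmounts = new List<decimal>();
    private readonly ITaxCalculator _tax;

    public Invoice(ITaxCalculator tax) => _tax = tax;

    public void AddLine(decimal amount)
    {
        // Simple validation: keep the internal state consistent.
        if (amount < 0) throw new ArgumentOutOfRangeException(nameof(amount));
        _lineAmounts.Add(amount);
    }

    public decimal Subtotal => _lineAmounts.Sum();             // calculated field
    public decimal Total => Subtotal + _tax.TaxFor(Subtotal);  // delegates the volatile part
}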
I don't consider such a domain object to be "anemic". My perception of that term is a "data bag": a collection of fields that has no concept whatsoever of the outside world, or even of the relation between its fields other than that it contains them. I've seen that too, and it's not fun tracking down inconsistencies in object state that the object never knew were a problem. Overzealous SRP will lead to this by stating that a data object is not responsible for any business logic, but common sense would generally intervene first and say that the object, as the data expert, must be responsible for maintaining a consistent internal state.
Again, personal opinion, I prefer the Repository pattern to Active Record. One object, with one responsibility, and very little if anything else in the system above that layer has to know anything about how it works. Active Record requires the domain layer to know at least some specific details about the persistence method or framework (whether that be the names of stored procedures used to read/write each class, framework-specific object references, or attributes decorating the fields with ORM information), and thus injects a second reason to change into every domain class by default.
My $0.02.
I've found that following the SOLID principles did in fact lead me away from DDD's rich domain model; in the end, I found I didn't care. More to the point, I found that the logical concept of a domain model and a class in whatever language weren't mapped 1:1, unless we were talking about a facade of some sort.
I wouldn't say this is exactly a C style of programming where you have structs and modules; rather, you'll probably end up with something more functional. I realise the styles are similar, but the details make a big difference. I found my class instances ended up behaving like higher-order functions, partial function applications, lazily evaluated functions, or some combination of the above. It's somewhat ineffable for me, but that's the feeling I get from writing code following TDD + SOLID: it ends up behaving like a hybrid OO/functional style.
As for inheritance being a bad word, I think that's more due to the fact that inheritance isn't sufficiently fine-grained in languages like Java/C#. In other languages, it's less of an issue and more useful.
I like the definition of SRP as:
"A class has only one business reason to change"
So, as long as behaviours can be grouped into single "business reasons" then there is no reason for them not to co-exist in the same class. Of course, what defines a "business reason" is open to debate (and should be debated by all stakeholders).
Before I get into my rant, here's my opinion in a nutshell: somewhere everything has got to come together... and then a river runs through it.
I am haunted by coding.
=======
Anemic domain model and me... well, we pal around a lot. Maybe it's just the nature of small to medium-sized applications with very little business logic built into them. Or maybe it's just me.
However, here's my 2 cents:
Couldn't you just factor the code out of the entities and tie it to an interface?
public class Object1
{
    public string Property1 { get; set; }
    public string Property2 { get; set; }

    private IAction1 action1;

    public Object1(IAction1 action1)
    {
        this.action1 = action1;
    }

    public void DoAction1()
    {
        action1.Do(Property1);
    }
}

public interface IAction1
{
    void Do(string input1);
}
Does this somehow violate the principles of SRP?
Furthermore, isn't having a bunch of classes sitting around, not tied to each other by anything but the consuming code, actually a larger violation of SRP, just pushed up a layer?
Imagine the guy writing the client code, sitting there trying to figure out how to do something related to Object1. If he has to work with your model, he will be working with Object1, the data bag, and a bunch of "services", each with a single responsibility. It'll be his job to make sure all those things interact properly. So now his code becomes a transaction script, and that script will itself contain every responsibility necessary to properly complete that particular transaction (or unit of work).
Furthermore, you could say, "no brah, all he needs to do is access the service layer. It's like Object1Service.DoActionX(Object1). Piece of cake." Well then, where's the logic now? All in that one method? You're still just pushing code around, and no matter what, you'll end up with the data and the logic being separated.
So in this scenario, why not expose that particular Object1Service to the client code and have its DoActionX() basically just be another hook into your domain model? By this I mean:
public class Object1Service
{
    private Object1Repository repository;

    public Object1Service(Object1Repository repository)
    {
        this.repository = repository;
    }

    // Tie in your Unit of Work aspects or whatever here if need be
    public void DoAction1(Object1DTO object1DTO)
    {
        Object1 object1 = repository.GetById(object1DTO.Id);

        object1.DoAction1();

        repository.Save(object1);
    }
}
You have still factored the actual code for Action1 out of Object1, but for all intents and purposes you have a non-anemic Object1.
Say you need Action1 to represent two (or more) different operations that you would like to make atomic and separate into their own classes. Just create an interface for each atomic operation and hook it up inside DoAction1.
That's how I might approach this situation. But then again, I don't really know what SRP is all about.
Convert your plain domain objects to the Active Record pattern with a common base class for all domain objects. Put common behaviour in the base class and override it in derived classes where necessary, or define new behaviour where required.
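A bare-bones sketch of that structure (persistence details elided; the method names are assumptions rather than any specific framework's API):

using System;

// Common persistence behaviour lives in the base class.
public abstract class ActiveRecordBase
{
    public Guid Id { get; protected set; } = Guid.NewGuid();

    public virtual void Save()   { /* write this record to storage */ }
    public virtual void Delete() { /* remove this record from storage */ }
}

// Domain classes override or extend the behaviour where needed.
public class Customer : ActiveRecordBase
{
    public string Name { get; set; }

    public override void Save()
    {
        if (string.IsNullOrWhiteSpace(Name))
            throw new InvalidOperationException("Customer needs a name.");
        base.Save(); // reuse the common behaviour after the specific check
    }
}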