I have been confused about ORMs ever since I saw the following sample code:
public class Article
{
    public List<Comment> Comments;

    public void AddComment(Comment comment)
    {
        Comments.Add(comment);
    }

    // I'm surprised by this kind of operation -
    // how much of a performance hit must it be?
    public void Save()
    {
        // update the article and all its comments
    }
}
According to my thinking, the responsibility for saving a comment should be assigned to the comment itself:
public class Comment
{
    public Article BelongArticle;

    // I think this is better/more direct than using an Article object,
    // but it's based on thinking about the database structure.
    // I was told one should "forget" the database, but that's really hard.
    public int ArticleId;

    public void Save()
    {
        // save the comment directly
    }
}
You are reaching conclusions without any real basis because you are just looking at some sample code and not considering what actually might be happening.
The whole point of using an ORM is so you can allow it to handle the database transactions while you work in an object oriented rather than a relational fashion in your application. You really cannot say anything about how the ORM performs when you do an update on Article by just looking at Article.Save. Article.Save is an OO construct, and what the ORM actually executes on the database is a relational action. What about Article.Save makes you think it is inefficient? Looking at that does not give you any information. You would have to look at what the ORM of choice is doing on the database.
Suppose the Article is a new object. In this case you have to save the Article, set the foreign key in the Comment, and then save the Comment. Your "preferred" code does not show the full operation, but it still must occur. The difference is that the ORM gives you an object-oriented way to do this: just call Save on the article. Under the hood the same operations must occur either way. Maybe the ORM takes a few more steps than you would take doing it manually, but maybe not.
Suppose Article is not a new object and you add a new Comment. Depending on the platform and on how your code is written, what happens when you call Save this time may be no different from the approach you consider better: if there is nothing that needs to be updated in the Article, the ORM may simply save the Comment.
ORMs use various methods, but in general they maintain some kind of running account (change tracking) of objects that need to be inserted or updated.
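To make that concrete, here is a minimal sketch of the unit-of-work idea (the names are illustrative, not any particular ORM's API): the session tracks which objects are new or changed, and a single commit flushes only what actually changed.
// Minimal unit-of-work sketch; real ORMs (EF, NHibernate, ...) are far more elaborate.
public class UnitOfWork
{
    private readonly List<object> _newObjects = new List<object>();
    private readonly List<object> _dirtyObjects = new List<object>();

    public void RegisterNew(object entity) => _newObjects.Add(entity);

    public void RegisterDirty(object entity)
    {
        if (!_dirtyObjects.Contains(entity))
            _dirtyObjects.Add(entity);
    }

    public void Commit()
    {
        // Only tracked changes produce SQL. If the Article is unchanged and one
        // Comment is new, this issues a single INSERT for the Comment and nothing else.
        foreach (var entity in _newObjects) Insert(entity);
        foreach (var entity in _dirtyObjects) Update(entity);
        _newObjects.Clear();
        _dirtyObjects.Clear();
    }

    private void Insert(object entity) { /* emit INSERT for this entity */ }
    private void Update(object entity) { /* emit UPDATE for this entity */ }
}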
You cannot just say that the first approach is somehow inefficient just on face because you called a method on Article instead of Comment - what is actually happening will depend on the specific ORM platform you use as well as the state of the objects.
Related
I have a use case to store a given object as JSON in the local file system, and my current implementation looks like the code below. Let's say I want to store the same object in a different format in some remote location or database in the future. That requires modifications to the constructor and the implementation (addConfig) in ConfigStore. But the parameters of addConfig will remain the same, which means it will require changes only in the places where we construct the ConfigStore object.
Here, I am programming to an implementation. Even if I introduce an interface and modify my ConfigStore class to implement it, I still need to update all the places that create an instance of ConfigStore when I move to a different format or data store later. So, does it really make sense to use an interface for this particular use case? If yes, what are the advantages?
I know the concept of interfaces and I have been using them widely. But I am trying to understand whether "Program to interfaces, not implementations" really applies to this use case. I see many of my teammates using interfaces for just this kind of purpose (i.e., what if we move to a different store later), so I want to get some thoughts here.
public class ConfigStore {

    @Autowired private final String mPathRoot;
    @Autowired private final ObjectMapper mObjectMapper;

    public void addConfig(Config config, String countryCode) {
        // Code goes here
    }
}
In my opinion, the "code to an interface" principle should always be adopted. If you find yourself with a design in which you need to update all the places that use ConfigStore for a new format, then you need to take a closer look at your overall design.
I can think of two common pitfalls when adopting "program to an interface":
Creation of the specific object.
The need for different parameters or sequencing for different implementations of the interface.
Regarding the latter, it is usually solvable by rethinking your abstractions. There is no ready-made answer for that.
However, regarding the first pitfall, the best way to solve it is by using dependency injection and one of the Factory patterns to "hide away the mess". This way only the relevant factory code needs to be updated for new formats.
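As a rough sketch of that combination (in C# with illustrative names; the question's code is Java/Spring, but the shape is the same), callers depend only on the interface, and the factory is the single place that knows which concrete store exists:
public class Config { } // stub for illustration

public interface IConfigStore
{
    void AddConfig(Config config, string countryCode);
}

public class FileSystemConfigStore : IConfigStore
{
    public void AddConfig(Config config, string countryCode)
    {
        // serialize the config as JSON and write it under the local path root
    }
}

public class DatabaseConfigStore : IConfigStore
{
    public void AddConfig(Config config, string countryCode)
    {
        // store the config in a remote database instead
    }
}

public static class ConfigStoreFactory
{
    // The only code that must change when you switch to a different store.
    public static IConfigStore Create() => new FileSystemConfigStore();
}
Callers then write ConfigStoreFactory.Create().AddConfig(...) and never mention a concrete class.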
Hope this helps.
I do not use lazy loading. My root aggregate has entities (collection navigation properties). I want my aggregate to be self-contained and responsible for itself, following the Single Responsibility Principle (SRP) and adhering to high cohesion and low coupling.
The problem is that the code that retrieves the root aggregate needs to include certain child entities depending on how it wants to interact with the aggregate.
Example:
public class Blog // My root aggregate
{
    public ICollection<Author> Authors { get; set; }
    public ICollection<Post> Posts { get; set; }

    public void AddAuthor(Author author)
    {
        Authors.Add(author);
    }

    public void AddPost(Post post)
    {
        Posts.Add(post);
    }
}
If I want to add an author, I have to do:
var blog = _context.Blogs.Include(x => x.Authors).Single(x => x.BlogId == 1);
blog.AddAuthor(/* ... */);
And if I want to add a post, I would have to do:
var blog = _context.Blogs.Include(x => x.Posts).Single(x => x.BlogId == 1);
blog.AddPost(/* ... */);
But I feel this breaks encapsulation, because now my Blog aggregate is not self-contained; its functionality depends on how the caller retrieved the aggregate from the DbContext (or the repository). If the caller did not include the necessary child entities, the operation on the aggregate would fail (since the property would be null).
I would like to avoid lazy loading because it is less suitable for web applications and performs worse due to executing multiple queries. I feel that having a repository with methods such as GetBlogWithAuthors and GetBlogWithPosts would be ugly. Do I have to create a repository method such as GetBlog which always includes all child entities? (That would be a big, slow query that could time out.)
Are there any solutions to this problem?
I realize it is probably a practice domain, but an important point that is not talked about enough is that strict DDD should not always be applied. DDD adds a certain amount of upfront complexity in order to contain a larger explosion of complexity. If there is little complexity to start with, it is not worth it.
As was mentioned in the comments, an Aggregate is a consistency boundary. Since there does not seem to be any consistency being enforced here, you can split it up. Blog can hold a collection of PostRef or something similar, where PostRef carries maybe just an Id and a Title, so it need not pull back ALL the Post data.
Then Post is its own aggregate. I am guessing that a Post has an Author. It is recommended not to reference entities inside other aggregates (only other aggregate roots), so it seems Authors should not live inside Blog either.
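A rough sketch of that split (PostRef and its members are my guesses, not something from the question):
public class PostRef // lightweight reference kept inside the Blog aggregate
{
    public int Id { get; private set; }
    public string Title { get; private set; }
}

public class Blog // aggregate root: no longer pulls full Post or Author data
{
    private readonly List<PostRef> _posts = new List<PostRef>();
    public IReadOnlyCollection<PostRef> Posts => _posts;
}

public class Post // now its own aggregate root
{
    public int Id { get; private set; }
    public int AuthorId { get; private set; } // reference other aggregates by id only
}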
When your starting point is an ORM, my experience is that your model will fight the DDD recommendations. Create your model first and then see how to persist your aggregates. My experience, and that of many others, is that at that point an ORM just isn't worth the yak shaving it brings throughout the project. It is also far too easy for someone who does not understand the constraints to add a reference that should not be there.
To address the performance concerns: remember that your read and write models do not have to be the same. You optimize your write model for enforcing constraints; if the two are separate, you can then optimize your read model for query performance. If this sounds like CQRS to you, you are correct. Again, though, the number of moving parts increases, and the split should solve more problems than it introduces. And again, your ORM will fight you on this.
Lastly, if you do have consistency constraints that require really large amounts of data, you need to ask whether they really must be enforced in real time. When you start modeling time, some new options emerge.
SubmittedPost -> RejectedPost OR AcceptedPost -> PublishedPost. If this happens as a background process, the amount of data that needs to be pulled will not affect UX. If this sounds interesting, I suggest you take a look at the great book Domain Modeling Made Functional.
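To sketch what modeling those states as types could look like (C# rather than the book's F#, and the names are mine), each stage gets its own type, so a post can only move forward through explicit transitions:
public class SubmittedPost { public string Body; }
public class RejectedPost  { public string Reason; }
public class AcceptedPost  { public string Body; }
public class PublishedPost { public string Body; public System.DateTime PublishedAt; }

public static class PostWorkflow
{
    // Run by a background process, so pulling the needed data does not block the user.
    public static AcceptedPost Accept(SubmittedPost post) =>
        new AcceptedPost { Body = post.Body };

    public static RejectedPost Reject(SubmittedPost post, string reason) =>
        new RejectedPost { Reason = reason };

    public static PublishedPost Publish(AcceptedPost post) =>
        new PublishedPost { Body = post.Body, PublishedAt = System.DateTime.UtcNow };
}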
Some other resources:
Shameless plug: Functional modeling
Nick has an example of relaxing invariant business rules when accepting input
A discussion on aggregates... it went deep fast. This question was asked and I don't think we answered it well, so hopefully I did better here.
I'm currently developing a ZF2 application and want to implement it very similarly to the ZF2 example Blog application. Probably the DAL will be replaced by Doctrine in the future, but in the first version the model should work like in the Blog application: Service + Mapper + Data Objects.
In the Blog application, the method ZendDbSqlMapper#save(...) receives (like every other public method of a Mapper) a data object as argument, extracts it, and then writes the data to the database. But my real-world case is a bit more complex, and I don't (but want to) understand whether the approach is still applicable to it, and how.
The application should primarily deal with (saving and retrieving) requests/orders for some technical services. (In the next step they are manually processed by an employee and implemented.) So the common case will be saving (updating/creating) an Order.
The physical model looks like this:
As you can see, the Order has some dependencies, which in turn have their own dependencies, and so on. On creating an Order, I have to create a LogicalConnection first. For a LogicalConnection, an (abstract) PhysicalConnection and a concrete physical connection variant like PhysicalConnectionX are needed. (It implements Class Table Inheritance.) Furthermore, a LogicalConnection needs a new Customer (to simplify: every new order gets a new customer) and an Endpoint with a concrete endpoint variant like EndpointA (also a CTI implementation). The tables on the left side of the data model are just basic data that should not / cannot be changed. (Of course, updating is even a bit more complicated, since I have to check for every related object whether it already exists, to avoid e.g. creating multiple customers for the same endpoint.)
My first idea was to implement it like this:
transform the input the model gets from the form (I don't use Zend\Collection, because my form is structured completely differently than my objects and the database);
hydrate the Order object for it (recursive hydration is already implemented);
create a Mapper for every object type;
and let every Mapper#save(...)
call save(...) on the mappers of the objects it depends on;
and then care only for its object.
Pseudocode:
MyDataObjectA {
    $id;
    $myObjectB;
}
MyDataObjectB {
    $id;
}
MapperA {
    save($dataObjectA) {
        calling MapperB#save($dataObjectA->getObjectB())
        saving $dataObjectA
    }
}
MapperB {
    save($dataObjectB) {
        saving $dataObjectB
    }
}
It's a lot of code, and every case has to be handled manually. (And I'm not sure, but maybe I could run into problems with context-dependent saving, since this approach doesn't consider the context.) However, I don't believe it's a recommended solution.
Well, it might smack of an ORM. But what about the model structure from the ZF2 Blog tutorial? Is it applicable to such a case? Or is it only useful for very simple structures and almost never for a real-world application? (Then I would ask: do we really need this tutorial, if it shows an approach that can almost never be used in a real application?) Or maybe I just misunderstand something, and there is a better (more efficient, more elegant, etc.) approach?
I just started learning OOP and I'm finding it really hard to decide where functionality belongs. Let's use a downvote on SO as our example:
When we cast one, the following must happen in a transaction:
Decrement the voter's rep and downVotes count.
Decrement the recipient's rep.
Decrement the post score.
So...
How do we determine which action belongs to which object?
Where would such functionality live? In the DAO layer, services layer, or the actual objects themselves?
It becomes increasingly tricky when objects interact with each other, such as in my example. It's often hard to decide which function belongs to which object, and so on...
Take a look at the SOLID principles of OO design, and at coupling & cohesion.
OO can be used in many places; it is not limited to, e.g., your business layer. You can write your JavaScript in an object-oriented style.
I'd model your example SO domain similarly to this (in C#). This is idealistic OO code, and in the real world some compromises would be made, such as making fields public for my ORM. What I am trying to show is that each object is responsible for its own data; no one else can change it directly. They must ask that object to do something by calling one of its public methods.
public class User
{
    private int _reputation;
    private int _downvotes;

    // Casting a downvote affects the voter, the post, and (via the post) its author.
    public void Downvote(Post post)
    {
        DecreaseReputation();   // the voter pays a small reputation cost
        IncrementDownvotes();   // track how many downvotes the voter has cast
        post.Downvote();
    }

    // Called by Post when one of this user's posts is downvoted.
    public void RegisterDownvote()
    {
        DecreaseReputation();
    }

    private void DecreaseReputation()
    {
        _reputation--;
    }

    private void IncrementDownvotes()
    {
        _downvotes++;
    }
}

public class Post
{
    private int _score;
    private readonly User _poster;

    public Post(User poster)
    {
        _poster = poster;
    }

    public void Downvote()
    {
        DecreaseScore();
        _poster.RegisterDownvote();
    }

    private void DecreaseScore()
    {
        _score--;
    }
}
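Wiring it together might look like this (purely illustrative):
var author = new User();
var post = new Post(author); // the post knows its author

var voter = new User();
voter.Downvote(post); // voter's rep and downvote count change,
                      // the post's score drops, and the author's rep drops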
This is not an easy question to answer, and it sounds more like a design-pattern question than an OOP question per se. In the case of SO (I am making an assumption based on assumed design patterns for their site), all the "layers" of the design are involved in what you are calling a "transaction" (not a DB term, I assume, the way you are using it).

The UI layer or view accepts the downvote and makes what appears to be an AJAX request to a layer that handles business rules, which determines what actually happens when a downvote is cast against a user. At that point, the business layer makes requests to the data layer to update a database somewhere with the user's score, reputation, etc. This may also be done a bit differently using web services; who knows what's under the hood here at SO.

As far as OOP goes, I am sure there is a lot of it under the hood, everywhere, in all the layers, in scripting and other languages. But I would imagine that in your example, SO is not passing around a User class object when a vote is cast; there is no need to.
Here is the very popular MVC design pattern for example: http://en.wikipedia.org/wiki/Model%E2%80%93view%E2%80%93controller
Say I have an object for a U.S. state, and I want to perform a batch update on some attribute of that object, say census data. I see two options:
have a separate object that performs batch operations and finds and loops through all instances of the states, updating each one
some kind of function like State.new.parse_census which would not have any state information but could go through and update the database.
Sorry this is such a newb question. I'm assuming the former is cleaner and correct, but I want to make sure I am not making a design mistake with that assumption. Thanks.
The standard method is to add a method to the USState class, like so:
public void updateCensus(...)
{
    // do stuff to internal data
}
then whatever is housing all the instances of this class just loops through:
public class Houser
{
    ArrayList<USState> list;

    public void foo()
    {
        for (USState state : list)
        {
            state.updateCensus(...);
        }
    }
}
The idea here is that you design your class to manage its own internals. This way, it is more maintainable, readable, and outside classes do not need to be exposed to the underlying structure of your object to interact with it appropriately.
The probably "cleanest" way of doing this would be to have the state object own a container (e.g. std::vector<censusdata>, assuming C++) and use std::foreach() to iterate over the elements.
Depending on the size of the elements and on what container you choose, this will be reasonably efficient too.
If performance is crucial, on the other hand, it is better to break with OOP and store each field of censusdata in a separate contiguous container. That way, when iterating over it, caches will work much more efficiently. It's quite non-pretty, though.
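Roughly, the two layouts look like this (sketched in C# for consistency with the other examples here; the same idea applies in C++, and the type names are mine):
// Array-of-structs: all fields of one element sit together, interleaved in memory.
struct CensusData
{
    public int Population;
    public double MedianIncome;
}

// Struct-of-arrays: one contiguous array per field. Scanning a single field
// walks memory sequentially, which is much friendlier to the cache.
class CensusColumns
{
    public int[] Populations = new int[50];
    public double[] MedianIncomes = new double[50];

    public long TotalPopulation()
    {
        long total = 0;
        foreach (var p in Populations)
            total += p;
        return total;
    }
}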
It may indicate that your class structure needs refactoring
If you have a load of objects that all have a particular aspect that changes at once, it may be that this aspect should be refactored out into its own class, which all your objects point to.
So you have a list of State classes and you need to update the tax rate paid by each state. Now (I don't know if this is true in the US) suppose every state pays the government the same rate of tax. Rather than looping through each state and updating its tax rate, you should probably have a separate TaxRate class that all states point to.
Then to update, you just need to change the TaxRate object in one place and all states see the update.
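In code, the shared object might look something like this (hypothetical names):
public class TaxRate
{
    public decimal Rate { get; set; }
}

public class USState
{
    private readonly TaxRate _taxRate; // the same instance is shared by every state

    public USState(TaxRate sharedRate)
    {
        _taxRate = sharedRate;
    }

    public decimal TaxOwed(decimal income) => income * _taxRate.Rate;
}
Assigning sharedRate.Rate once is then immediately visible to every state that holds the reference.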
You wrote "database", so I assume you are talking about persistent objects stored in a (relational?) database, right? In this case, your options depend heavily on how your architecture looks like. If you are using an object-relational mapper, for example, it may provide already some kind of navigational tools for the purpose. And if you have to update a million of objects, it might be a good idea to bypass the ORM mapper and send a single update SQL to your database which does the job, if that's possible.
If you are working in C#, List<T>.ForEach or a plain foreach loop would be the way to go. (LINQ's Select will not work here: it expects a lambda that returns a value and is lazily evaluated, so nothing would actually execute.)
var states = new List<USState>();
// ...initialize "states" here

// Update census in each state
states.ForEach(state => state.UpdateCensus(...));
ForEach will iterate through every state and call its UpdateCensus method.