If you have a class A that is an aggregate of classes B and C, is it better for A
to store IDs for B and C
to load and store the entire objects for B and C (edit: store by reference to objects B and C, i.e. instantiate objects B and C as opposed to storing IDs for them)
store the ID's and provide methods to pull methods B and C
I'm assuming this varies depending on performance requirements and other requirements, but I'm just looking for any general guidelines or thoughts.
In a typical program running in memory, objects will almost always be stored by reference as pointers, so you ARE effectively storing IDs for B and C; it's just that you don't deal with the details yourself, because the language hides them from you.
Loading and storing the "Entire Object" is a questionable concept. I know you are trying to be language independent, but one of the first things that really helped me "Get" OO is that nearly every object should have a lifecycle of its own.
If you have object A that "Contains" object B, and you pass a reference to object B to object C, then object A has to know something about object C. This is completely NOT OK. Freeing up object B's lifecycle so that object A knows nothing of object C is one of the core concepts that makes OO work.
So if that's what you meant by storing the entire object, then no--never do that.
And that's true of Databases and other storage as well. Even if one object is responsible for destroying another, it should rarely contain the other objects' data.
And (although I think you meant to say "pull Objects B and C", not "methods") the concept of being able to pull one object from another is also very useful, and there is generally nothing wrong with it, with one caveat:
Remember that an object has no control over what goes on outside itself. It could be passed around, have its methods called in a semi-random order, etc. Therefore it's helpful to keep your object as safe as possible. If something is called in the wrong order, or an invalid variable is passed in, or you find that somehow you've entered an invalid state, fail early and fail LOUD so that the programmer who made the mistake finds out about it immediately.
You also want to make it as difficult as possible to get into an illegal state--this means keep your object small and simple, make variables final whenever possible and try not to have too many places where parameter call order matters.
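For example, a guard-clause style sketch of "fail early, fail loud" (the Order class and its members are invented for illustration):

using System;

// Guard clauses reject bad input and out-of-order calls at the boundary,
// instead of letting the object drift into an invalid state.
public sealed class Order
{
    private readonly string _customerId;   // readonly/final wherever possible
    private bool _submitted;

    public Order(string customerId)
    {
        if (string.IsNullOrWhiteSpace(customerId))
            throw new ArgumentException("customerId must be non-empty", nameof(customerId));
        _customerId = customerId;
    }

    public string CustomerId => _customerId;

    public void Submit()
    {
        // Reject calls made in the wrong order, loudly.
        if (_submitted)
            throw new InvalidOperationException("Order has already been submitted.");
        _submitted = true;
    }
}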
I tend to load and store the entire objects (and their subobjects) as my default approach.
Sometimes this will cause long load times, a large memory footprint, or both. Then you'll need to determine if all the loaded objects are in fact used or if many are created and never accessed.
If all the objects are used, a more creative approach will be needed to fit everything into memory: load a subset, process it, dispose of it, and then load the next subset. Or simply buy more memory and make it available to your app.
If many of the objects are not used, the best approach is to lazily load the sub-objects as they are needed.
This depends on the situation. If the objects stay in memory, it's more OO (and easier) to have A contain B and C. I've found that if the objects need to be persisted, though, it makes things easier and more efficient for A to store IDs for B and C. (That way if you need data directly in A but not B and C, you won't have to pull B and C out of the database, file, etc.)
It depends.
If B and C are heavy and expensive to load and construct, it might be worthwhile to defer loading them until you are sure they are needed (Lazy Initialize).
If they are simple and lightweight, maybe you just want to construct them whenever you get the IDs.
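A minimal sketch of that lazy-initialize approach, assuming some repository abstraction for loading B (IBRepository and its Load method are invented names):

using System;

// A keeps only the ID of B and loads B the first time it is actually needed.
public interface IBRepository { B Load(int id); }
public class B { }

public class A
{
    private readonly Lazy<B> _b;
    public int BId { get; }

    public A(int bId, IBRepository bRepository)
    {
        BId = bId;
        _b = new Lazy<B>(() => bRepository.Load(bId)); // runs at most once, on first access
    }

    public B B => _b.Value; // touching this property triggers the load
}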
Related
Consider a class A that instantiates a class B. Class B requires certain values from an instance of class A. What is best practice, in terms of SOLID or other design principles, for passing these values from A to B:
1. by passing the whole instance of A to the constructor of B
2. by just passing the necessary attribute values from an instance of A to the constructor of B
3. depends on the situation?
In case of (3), which criteria would favour one or the other solution?
(I do not know if this has an effect on the possible answers, but I am coding in Python)
This is kind of an open question, and I would say it depends on the situation. Since you specifically say that B will be instantiated from A, I assume Dependency Injection (related to the D in SOLID, the Dependency Inversion Principle) is out of the question. I would then focus on the Single-responsibility principle, in addition to the general goal of keeping code as clean as possible.
A couple of things to consider:
How many "values" will B actually require? If B only requires one or two simple values, then I would pass them in as parameters to keep it clean and simple. If it needs more than that, then you could either pass the whole of A (or a reference to it) in as a parameter, and let B pick what it needs from there, or create another class (C?) that contains exactly those values that are needed. That will keep the constructor signature of B fairly clean, and make it less burdensome to add / change the data passed into it (it will probably be easier and less messy to modify C later than to modify the signature of B, and all calls to it).
Where will B be used? If it will only ever be used by A, then it might make sense to let B fetch data directly from A as needed. If it will be used other places and by other classes, then it might be better to minimize any dependency on A, and make sure it has everything it needs from the get-go. This also applies if there is a chance B will need to be serialized and stored or sent over a network, etc., as it might then not have access to A.
The optimal solution will vary depending on context. You might for instance think of the difference between two tightly coupled classes dealing with the GUI of a Windows Desktop application, and two classes in a service architecture, where one or both might contain data to be transferred or stored in a database.
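For concreteness, a minimal sketch of the "small class holding exactly what B needs" option (written in C# to match the other examples in this document; BSettings, Name, and Capacity are invented names, and the same shape translates directly to Python):

// B depends only on a small values object, never on A itself.
public sealed class BSettings
{
    public string Name { get; }
    public int Capacity { get; }

    public BSettings(string name, int capacity)
    {
        Name = name;
        Capacity = capacity;
    }
}

public class B
{
    private readonly BSettings _settings;

    public B(BSettings settings)
    {
        _settings = settings;   // B never sees A, only the values it was given
    }

    public string Describe() => _settings.Name + " (" + _settings.Capacity + ")";
}

public class A
{
    public string Name { get; set; } = "example";
    public int Capacity { get; set; } = 10;

    public B CreateB() => new B(new BSettings(Name, Capacity));
}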
I think you missed one -- and the one that is actually most common.
You pass a reference to the whole object to B, which either holds the reference, or copies the values it needs during construction.
Depending on the relationship between the objects, either is acceptable.
Your solution #2 is common for those cases in which instantiating B does not require an A -- that is, B could just as well be instantiated by some class C that has equivalent values, or even just programmatically from values either input or computed at run time.
For whatever it's worth, I usually use #2, but if I commonly construct B from an A, I will create a special constructor facade that accepts A and harvests the values needed into the primary constructor.
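Roughly what that facade looks like (a sketch with invented member names, not the asker's actual classes):

public class A
{
    public string Name { get; set; } = "example";
    public int Capacity { get; set; } = 10;
}

public class B
{
    private readonly string _name;
    private readonly int _capacity;

    // Primary constructor: takes only the values B actually needs.
    public B(string name, int capacity)
    {
        _name = name;
        _capacity = capacity;
    }

    // Facade constructor: accepts a whole A and harvests the needed values.
    public B(A source) : this(source.Name, source.Capacity) { }

    public override string ToString() => _name + " x" + _capacity;
}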
If an aggregate needs some read-only data that doesn't belong to it in order to perform an operation, is there any negative consequence to letting the repository query some data from another aggregate to create the aggregate?
In detail:
I have a BC with two aggregates, say A and B. B needs a bit of data from A to perform some operation but won't modify it in any way. The data fits better on A, since that's where the rules to modify it live.
Reading IDDD and PPP of DDD, it seems that it is acceptable to pass a transient reference to an aggregate (or a sub-entity of it) to another one, or to pass a read-only view as a value object to the other aggregate.
In my example, B doesn't need the whole A aggregate, only some specific data, so a value object seems like a good approach in this case. A could create the VO, acting as a factory; the VO will conform to the UL, and B doesn't need to be aware of A at all. A business use case in the application layer can reconstitute A and B from repositories, tell A to create the VO, and perform the operation on B, passing the VO.
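Roughly, with invented names (ASnapshot, Limit, CanPlaceOrder), that flow might look like the following sketch; A acts as the factory for a small read-only VO and the application service passes it to B:

// A exposes a snapshot VO; B consumes it without ever referencing A.
public sealed class ASnapshot            // immutable value object
{
    public decimal Limit { get; }
    public ASnapshot(decimal limit) { Limit = limit; }
}

public class A
{
    private decimal _limit = 100m;       // the data whose rules live in A
    public ASnapshot CreateSnapshot() => new ASnapshot(_limit);   // A is the factory
}

public class B
{
    public bool CanPlaceOrder(decimal amount, ASnapshot limits) => amount <= limits.Limit;
}

public class PlaceOrderUseCase           // application layer
{
    public bool Execute(A a, B b, decimal amount) => b.CanPlaceOrder(amount, a.CreateSnapshot());
}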
Let's suppose now that reconstitution of A is expensive, or that there is another reason why it is not desirable to load the whole of A just to create the VO from a bit of information (maybe the data is not from one instance of A but is aggregated from a list of them, or whatever). Here a simple solution could be to let the repository of A create the VO directly from the data store. I feel comfortable with this, and it seems to be a common pattern.
But now I'm thinking of a case where the operation on B is performed many times, or is maybe part of a bigger calculation on B that many other operations need. I could keep a reference to the VO with the data needed (as a private, read-only property of B or somewhere in its graph) and let the repository of B take the data needed to create the VO and reconstitute B with it. Now B will always have the data locally to perform its operations. The data taken from A cannot be modified; saving B through its repository will just discard that data (maybe it could use it to detect a conflicting concurrent update), A and B will not be consistent at all times but that's OK, and reloading B from the repository will query the data again to update the view inside B in case of a conflict.
This approach seems OK to me since, as I understand it, the domain model is unrelated to the data model, with the repository acting as a sort of ACL between the two. There is also a single source of truth for the data inside A, since the copy inside B is immutable and eventually consistent. The drawbacks I see are that the repository will have more logic (though not business logic) and that it could be unclear where exactly the data is coming from, since the dependency from B on A is now hidden inside infrastructural code.
So the questions are:
Is this a not-so-good approach after all?
Is there another drawback I am not seeing?
Have you or someone else done something like this, so I can learn from that experience?
I know the example is very poor, since in DDD everything is about context. But this is a question I have run into many times in different situations. I also know that a valid concern is whether the aggregate boundaries are well defined, but let's say they look good for the problem at hand.
is it acceptable to let the repository query some data from another aggregate to create the aggregate?
Acceptable is kind of weakly defined. A better question to ask might be "are there negative consequences?"
In this example, the usual consideration is whether or not the system becomes harder to change. Take a look at Adam Ralph's talk on service boundaries to get a sense for what happens when you don't control the coupling between components.
These days, if B needs a copy of A's data, then we usually introduce into our design of B a cache of A's data. Store the copy of the data with B, and work out explicitly how and when updates to A are communicated to B. The cache becomes part of B's data model.
See also Pat Helland's paper: Data on the Outside versus Data on the Inside.
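One possible rendering of that "cache of A's data inside B" idea (a sketch; the event and member names are invented, and how updates actually travel from A to B is left open):

using System;

// B keeps its own copy of the slice of A's data it needs, and refreshes it
// explicitly when A publishes a change.
public sealed class ACreditLimitChanged
{
    public decimal NewLimit { get; }
    public DateTime OccurredAt { get; }
    public ACreditLimitChanged(decimal newLimit, DateTime occurredAt)
    {
        NewLimit = newLimit;
        OccurredAt = occurredAt;
    }
}

public class B
{
    // Part of B's own data model: a cached, read-only copy of A's data.
    private decimal _cachedCreditLimit;
    private DateTime _cacheAsOf;

    public void Apply(ACreditLimitChanged evt)
    {
        if (evt.OccurredAt <= _cacheAsOf) return;   // ignore stale updates
        _cachedCreditLimit = evt.NewLimit;
        _cacheAsOf = evt.OccurredAt;
    }

    public bool CanSpend(decimal amount) => amount <= _cachedCreditLimit;
}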
I've been confused by this question for a long time, and I don't know which field of computer science the question belongs to.
To be more specific, let's say we have a variable a and it's an object. b is a member of a and it's also an object.
What if b has a method which could destroy a? Can this happen? If it can, does b still exist after a is destroyed?
I think this issue is different from a leaf node trying to delete its parent node.
Does the result vary with the language?
I'm sorry if the description is not clear enough; I haven't really met a real case.
This depends a lot on the language.
I think most languages do not let you explicitly destroy an object, so this situation cannot happen.
In many languages (e.g. Java, Perl, JavaScript, Python, ...), objects are destroyed by the garbage collector when the program can no longer access them, and objects are inherently references, i.e. a having a member b simply means a keeps a reference to b, so if a goes away, nothing happens to b (as long as you still have some other reference to b).
In a language like C++, you can explicitly destroy objects. What happens then depends on other details: If b is a direct member of a, then b is destroyed along with a (and if you still have a pointer or reference to b, it becomes "dangling", i.e. it is an error to try to access it). On the other hand, if b is merely pointed to or referenced from a, then it stays alive.
I used to use Structures quite a lot in the VB6 days, and try to avoid them now with .NET. Just wondering if using Structures in 2010 instead of a Class is considered nasty?
Thanks for the help.
Choosing a Structure takes consideration; it isn't inherently "nasty". There are reasons why a Structure can be nasty; however, there are also reasons a Class can be nasty in its own way...
Basically, when you decide between these two kinds of containers, you're deciding how memory will be used.
There are different semantics associated with Structure and Class in VB.NET and they represent different memory usage patterns.
By creating a Class you're creating a reference type.
good for large data
memory contains a reference to the object's location on the heap (like the concept of pointing to an object), though this happens transparently to the VB.NET programmer because you're in "managed mode".
By creating a Structure you're creating a value type.
good for small data
memory allocated contains the actual value
be judicious, because these are apt to get pushed onto the stack area of memory (i.e. for local variables, but not class fields); make them too large and you could run into stack issues.
There are also some good video resources on YouTube if you learn better from videos.
There are many articles on the Internet, like these MSDN articles, that teach the basics and details:
Value Types and Reference Types
7.1 Types - Reference and Value
MSDN Type Fundamentals - subheading: Reference and Value Types
Example
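Here is a minimal sketch of the difference in copy semantics (written in C#; Structure/Class in VB.NET behave the same way, and the type names are made up):

struct PointStruct { public int X; }
class PointClass  { public int X; }

class CopySemanticsDemo
{
    static void Main()
    {
        var s1 = new PointStruct { X = 1 };
        var s2 = s1;                       // value type: s2 is an independent copy
        s2.X = 99;
        System.Console.WriteLine(s1.X);    // prints 1 - the original is untouched

        var c1 = new PointClass { X = 1 };
        var c2 = c1;                       // reference type: both point at the same object
        c2.X = 99;
        System.Console.WriteLine(c1.X);    // prints 99 - there is only one object
    }
}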
Structures exist because in some scenarios they make more sense than classes. They are particularly useful for representing small abstract data types such as 3D points, latitude-longitude pairs, rational numbers, etc.
The basic motivation for using structs is to avoid GC pressure. Since structs live inline (on the stack or inside whatever container you put them in) rather than on the heap, they typically result in far fewer small allocations, which makes a huge difference if you need to hold an array of one million points or rationals.
A key issue to watch out for is that structs are value types, and thus are generally passed around by value (the obvious exception being ref and out parameters). This has important implications. For instance:
Point3D[] points = ...;
points[9].Move(0, 0, 5);
The above code works fine, and increases the z coordinate of the 10th point by 5. The following code, however:
List<Point3D> points = ...;
points[9].Move(0, 0, 5);
Will still compile and run, but you will find that the z coordinate of the 10th point remains unchanged. This is because the List's index operator returns a copy of the point, and it is the copy that you are calling Move on.
The solution is quite simple. Always make structs immutable by marking all fields readonly. If you still need to Move points around, define + on the Point3D type and use assignment:
points[9] = points[9] + new Point3D(0, 0, 5);
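For concreteness, an immutable Point3D along those lines might look like this (a sketch showing only the members discussed above):

public struct Point3D
{
    public readonly double X, Y, Z;

    public Point3D(double x, double y, double z)
    {
        X = x;
        Y = y;
        Z = z;
    }

    // "Moving" a point yields a new value instead of mutating in place.
    public static Point3D operator +(Point3D a, Point3D b)
    {
        return new Point3D(a.X + b.X, a.Y + b.Y, a.Z + b.Z);
    }
}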
It's considered pretty bad to use anything without understanding the implications.
Structures are value types, not reference types - and as such, they behave slightly differently. When you pass a value type, modifications are on a copy, not on the original. When you assign a value type to an object reference (say, in a non-generic list), boxing occurs. Make sure you read up on the full effect of choosing one over the other.
Read this for some understanding of the benefits of structures vs classes and vice versa.
A structure can be preferable when:
You have a small amount of data and simply want the equivalent of the UDT (user-defined type) of previous versions of Visual Basic
You perform a large number of operations on each instance and would incur performance degradation with heap management
You have no need to inherit the structure or to specialize functionality among its instances
You do not box and unbox the structure
You are passing blittable data across a managed/unmanaged boundary
A class is preferable when:
You need to use inheritance and polymorphism
You need to initialize one or more members at creation time
You need to supply an unparameterized constructor
You need unlimited event handling support
To answer your question directly, there is nothing inherently wrong with using a structure in VB.NET. As with any design decision, you do need to consider the consequences of that decision.
It's important that you're aware of the difference between a class and a structure so that you can make an educated decision about which is appropriate. As stated by Alex et al, one of the key differences between a structure and a class is that a structure is considered a value type and a class is considered a reference type.
Reference types use copy-by-reference semantics; this means that when an object is created or copied, only a pointer to the actual object is allocated on the stack, while the actual object data is allocated on the heap.
In contrast, value types have copy-by-value semantics, which means that each time a value type (e.g. a structure) is copied, the entire object is copied to a new location on the stack.
For objects with a small amount of data, this isn't really a problem, but if you have a large amount of data then using a reference type will likely be less expensive in terms of stack allocations, because only a pointer will be copied to the stack.
Microsoft has guidelines on the use of structures that more precisely describe the differences between classes and structures and the consequences of choosing one over the other.
From a behavioral standpoint, there are three types of 'things' in .NET:
Mutable reference types
Value types which can be mutated without being entirely replaced
Immutable reference and value types
Eric Lippert really dislikes group #2 above, since .NET isn't terribly good at handling them and sometimes treats them as though they're in group #1 or #3. Nonetheless, there are times when mutable value types make more sense semantically than anything else.
Suppose, for example, that one has a rectangle and one wants to make another rectangle which is like the first one, but twice as tall. It is IMHO cleaner to say:
Rect2 = Rect1 ' Makes a new Rectangle that's just like Rect1
Rect2.Height = Rect2.Height*2
than to say either
Rect2 = Rect1.Clone ' Would be necessary if Rect1 were a class
Rect2.Height = Rect2.Height*2
or
Rect2 = New Rectangle(Rect1.Left, Rect1.Top, Rect1.Width, Rect1.Height*2)
When using classes, if one wants an object that's slightly different from an existing object, one must consider before mutating the object whether anyone else might want to use the original; if so, one must make a copy of it and then make the desired changes to the copy. With structs, there's no such restriction.
A simple way to think of value types is to regard every assignment operation as making a clone of the original, but in a way that's considerably cheaper than cloning a class type. If one would end up cloning a lot of objects as often as one would assign references without cloning, that's a substantial argument in favor of structs.
I have a "parent" Obj-C object containing (in a collection) a bunch of objects whose instance variables point to one another, possibly circularly (fear not, no retaining going on between these "siblings"). I write the parent object to XML, which of course involves (among other things) writing out its "children", in no particular order, and due to the possible circularity, I replace these references between the children with unique IDs that each child has.
The problem is reading this XML back in... as I create one "child", I come across an ID, but there's no guarantee the object it refers to has been created yet. Since the references are possibly circular, there isn't even an order in which to read them that solves this problem.
What do I do? My current solution is to replace (in the actual instance variables) the references with strings containing the unique IDs. This is nasty, though, because to use these instance variables, instead of something like [oneObject aSibling] I now have to do something like [theParent childWithID:[oneObject aSiblingID]]. I suppose I could create an aSibling method to simplify things, but it feels like there's a cleaner way than all this. Is there?
This sounds an awful lot like you are re-inventing NSCoding, which already handles circular references, etc. Now, there might be a good reason to re-invent that wheel. Only you can answer that question.
In any case, sounds like you want a two pass unarchival process.
Pass 1: Grab all the objects out of the backing store and reconstitute. As each object comes out, shove it in a dictionary or map with the UID as the key. Whenever an object contains a UID, register the object as needing to be fixed up; add it to a set or array that you keep around during unarchival.
Pass 2: Walk the set or array of objects that need to be fixed up and fix 'em up, replacing the UIDs with objects from the map you built in pass #1.
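A sketch of those two passes (written in C# for brevity; the Objective-C version is the same shape, with an NSMutableDictionary keyed by UID, and the Child/Sibling names are invented):

using System.Collections.Generic;

// Pass 1: build a UID -> object map and remember which objects still hold raw UIDs.
// Pass 2: swap each stored UID for the real object from the map.
class Child
{
    public string Uid;
    public string SiblingUid;   // as read from the XML
    public Child Sibling;       // fixed up in pass 2
}

class Unarchiver
{
    public static List<Child> Load(IEnumerable<Child> parsed)
    {
        var byUid = new Dictionary<string, Child>();
        var needsFixup = new List<Child>();

        foreach (var child in parsed)              // pass 1
        {
            byUid[child.Uid] = child;
            if (child.SiblingUid != null)
                needsFixup.Add(child);
        }

        foreach (var child in needsFixup)          // pass 2
            child.Sibling = byUid[child.SiblingUid];

        return new List<Child>(byUid.Values);
    }
}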
I hit a bit of a parse error on that last paragraph. Assuming your classes are sensibly declared, they ought to be able to repair themselves on the fly.
(All things considered, this is exactly the kind of data structure that is much easier to implement in a GC'd environment. If you are targeting Mac OS X, not the iPhone, turning on GC is going to make your life easier, most likely)
Java's serialization process does much the same thing. Every object it writes out, it puts in a "previously seen objects" table. When it comes to writing out a subsequent reference, if it has seen the object before, it writes out a code which indicates that it's a previously seen object from that list. When reading back in, whenever it sees such a reference, it replaces it on the fly with the instance it has already read.
That approach means you don't have to use a map for all instances; rather, the substitution happens only for objects you've seen a second time. However, you still need to be able to uniquely reference the first instance you've written; whether that is done by some pointer to a part of the data structure or something else depends on what you're writing.
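The write side of that scheme can be sketched like this (a hypothetical text format, not Java's actual wire format): keep a table of objects already written, and emit a back-reference instead of re-serializing.

using System.Collections.Generic;
using System.Text;

// First sighting of an object gets an id and its full data;
// later sightings are written as "ref:<id>". Cycles terminate because the
// object is registered before its fields are written.
class Node
{
    public string Name;
    public Node Next;
}

class Writer
{
    private readonly Dictionary<Node, int> _seen = new Dictionary<Node, int>();
    private readonly StringBuilder _output = new StringBuilder();

    public string Write(Node node)
    {
        WriteNode(node);
        return _output.ToString();
    }

    private void WriteNode(Node node)
    {
        if (node == null) { _output.Append("null;"); return; }

        if (_seen.TryGetValue(node, out int id))
        {
            _output.Append("ref:").Append(id).Append(';');   // previously seen
            return;
        }

        _seen[node] = _seen.Count;                            // register before recursing
        _output.Append("obj:").Append(node.Name).Append(';');
        WriteNode(node.Next);
    }
}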