How are value objects saved and loaded? - repository

Since there are no repositories for value objects, how can I load all value objects?
Suppose we are modeling a blog application and we have these classes:
Post (Entity)
Comment (Value object)
Tag (Value object)
PostsRepository (Repository)
I know that when I save a new post, its tags are saved with it in the same table. But how could I load all tags of all posts? Should PostsRepository have a method to load all tags?
I usually do it that way, but I would like to know other people's opinions.

I was looking for a better solution to this question and I found this post:
It explains very well why there is so much confusion about value objects and databases.
Here is a phrase from it that I liked very much:
"Persistence is not an excuse to turn everything to entities."
Gojko Adzic gives us three alternatives for saving our value objects.

I am currently working through a similar example. Once you need to uniquely refer to tags, they are no longer simple value objects and may continue to grow in complexity. I decided to make them their own entities and create a separate repository to retrieve them. In most scenarios they are loaded or saved with the post, but when they are required alone the other repository is used.
I hope this helps.
In part thanks to this post I decided to restructure my application slightly. You are right that I probably was incorrectly making tags an entity. I have since changed my application so that tags are just strings and the post repository handles all the storage requirements around tags. For operations that need posts the tags are loaded with them. For any operation that just requires tags or lists of tags the repository has methods for that.
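The arrangement described above (tags as plain strings, with the post repository also answering tag-only queries) might look roughly like the following sketch. The class and method names (`InMemoryPostRepository`, `all_tags`) are assumptions made for illustration, and a dict stands in for the real table.

```python
from dataclasses import dataclass, field


@dataclass
class Post:
    """Entity: identified by post_id; tags are plain strings owned by the post."""
    post_id: int
    title: str
    tags: frozenset[str] = field(default_factory=frozenset)


class InMemoryPostRepository:
    """Hypothetical repository; a dict stands in for the real table(s)."""

    def __init__(self) -> None:
        self._posts: dict[int, Post] = {}

    def save(self, post: Post) -> None:
        # Tags are persisted together with the post that owns them.
        self._posts[post.post_id] = post

    def get(self, post_id: int) -> Post:
        # Operations that need a post get its tags loaded with it.
        return self._posts[post_id]

    def all_tags(self) -> list[str]:
        # Tag-only query: the caller gets strings, not Post objects.
        return sorted({tag for post in self._posts.values() for tag in post.tags})


repo = InMemoryPostRepository()
repo.save(Post(1, "Value objects", frozenset({"ddd", "persistence"})))
repo.save(Post(2, "Repositories", frozenset({"ddd"})))
print(repo.all_tags())
```

With a real database, `all_tags` would typically become a single distinct-values query rather than a loop over loaded posts.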

Here is my take in how I might solve this type of problem in the way that I'm currently practicing DDD.
If you are editing something that tags are added to and removed from, such as a Post, then tags may be entities, but perhaps they could be value objects; either way they are loaded and saved along with the post. I personally tend to favor value objects unless the object needs to be modified, but I do realize that there is a difference between entity objects modeled as read-only "snapshots" and actual value objects that lack identity. The tricky part is that sometimes what you would normally think of as a key can be part of a value object, as long as it is not used as identity in that context, and I think tags fall into this category.
If you are editing the tags themselves, then it is probably a separate bounded context, or at least a separate aggregate in which tags themselves are the aggregate root and are persisted through a repository. Note that the entity class that represents tags in this context doesn't have to be the same entity class for tags used in the Post aggregate.
If you're listing available tags on the display for read-only purposes, such as to provide a selection list, then that is probably a list of value objects. These value objects can, but don't have to, be in the Domain Model, since they are mainly about supporting the UI and not about the actual domain.
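To make "value object that lacks identity" concrete, here is a minimal sketch, assuming a tag is nothing more than its name. The `Tag` class is hypothetical; the point is only that two instances with the same value are interchangeable and cannot be modified in place.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class Tag:
    """Value object: no identity, compared purely by its value."""
    name: str


a = Tag("ddd")
b = Tag("ddd")
print(a == b)        # equal because the values match
print(len({a, b}))   # hashable, so equal tags collapse in a set

try:
    a.name = "other"  # value objects are replaced, not edited
except FrozenInstanceError:
    print("immutable")
```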
Please chime in if anybody has any thoughts on why my take on this might be wrong but this is the way I've been doing it.


Embeddable vs one to many

I have seen an article in Dzone regarding Post and Post Details (two different entities) and the relations between them. There the post and its details are in different tables. But as I see it, Post Detail is an embeddable part because it cannot be used without the "parent" Post. So what is the logic to separate it in another table?
Please give me a clearer explanation of when to use which one.
Embeddable classes represent the state of their parent classes. So to take your example, a StackOverflow POST has an ID which is invariant and used in an unbreakable URL for sharing e.g. There are a series of other attributes (state, votes, etc) which are scalar properties. When the post gets edited we have various versions of the text (which are kept and visible to people with sufficient rep). Those are your POST DETAILS.
"what is the logic to separate it in another table?"
Because keeping different things in separate tables is what relational databases do. The standard way of representing this data model is a parent table POST and child table POST_DETAIL with a defined relationship enforced through a foreign key.
Embeddable is a concept from object-oriented programming. Oracle does support object-relational constructs in the database. So it would be possible to define a POST_DETAIL Type and create a POST Table which has a column declared as a nested table of that Type. However, that would be a bad design for two reasons:
The SQL for working with nested tables is clunky. For instance, to get the POST and the latest version of its text would require unnesting the collection of details every time we need to display it. Computationally not much different from joining to a child table and filtering on latest version flag, but harder to optimise.
Children can have children themselves. In the case of Posts, Tags are details because they can vary due to editing. But if you embed TAG in POST_DETAIL embedded in POST how easy would it be to find all the Posts with an [oracle] tag?
This is the difference between Object-Oriented design and relational design.
OO is strongly hierarchical: everything belongs to something, and the way to get the detail is through the parent. This approach works well when dealing with single instances of things, and so is appropriate for UI design.
Relational prioritises commonality: everything of the same type is grouped together with links to other things. This approach is suited for dealing with sets of things, and so is appropriate for data management tasks (do you want to find all the employees who work in BERLIN or whose job is ENGINEER or who are managed by ELLIOTT?)
"give me a more clear explanation when to use which one"
Always store the data relationally in separate tables. Build APIs using OO patterns when it makes sense to do so.
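As a rough illustration of the relational layout, the sketch below uses SQLite through Python's `sqlite3` module rather than Oracle. The table and column names are assumptions, and for brevity tags hang directly off the post instead of off each detail version. Both the set-based question (all posts with a given tag) and the single-instance question (latest text of one post) stay simple queries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE post (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE post_detail (
        post_id INTEGER NOT NULL REFERENCES post(id),
        version INTEGER NOT NULL,
        body    TEXT NOT NULL,
        PRIMARY KEY (post_id, version)
    );
    CREATE TABLE post_tag (
        post_id INTEGER NOT NULL REFERENCES post(id),
        tag     TEXT NOT NULL,
        PRIMARY KEY (post_id, tag)
    );
    INSERT INTO post VALUES (1, 'Nested tables'), (2, 'Embeddables');
    INSERT INTO post_detail VALUES
        (1, 1, 'first draft'), (1, 2, 'edited text'), (2, 1, 'only version');
    INSERT INTO post_tag VALUES (1, 'oracle'), (1, 'sql'), (2, 'jpa');
""")

# Set-based question: which posts carry the 'oracle' tag?
oracle_posts = conn.execute(
    "SELECT p.title FROM post p JOIN post_tag t ON t.post_id = p.id "
    "WHERE t.tag = ? ORDER BY p.id",
    ("oracle",),
).fetchall()

# Single-instance question: the latest text of post 1.
latest = conn.execute(
    "SELECT body FROM post_detail WHERE post_id = ? "
    "ORDER BY version DESC LIMIT 1",
    (1,),
).fetchone()

print(oracle_posts, latest)
```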

2sxc Knowledge Management solution hurdles

I'm evaluating 2sxc as a possible platform for implementing a knowledge management solution but we're in a bit of a rush. Our alternative is DNN Live Articles.
So far I really like the look of 2sxc, but I have questions regarding our possible use of it.
The main questions I have are around hierarchical lists like nested Categories and permissions.
I've looked at some of the apps I've installed, like FAQs with Categories, but I can't find anything yet where the categories are nested. I tried creating a Content Type and adding fields where the first is the Category Name and the second is Parent Category. I created a new Content Type Field with a Data Type of Entity, but the only options for Input Type are default and Content Block Items. It works, but when you create a new category the content that comes up in the Parent Category field covers just about everything; I'm not sure I understand the concept behind this.
Then the second issue is permissions. Does this system somehow incorporate permissions? We'd like to lock down knowledge articles by category, but I haven't seen any implementations that showcase how one would do this.
Regarding #1 I don't understand your question, sorry :)
Regarding #2: there is no rule-based security, so you can't say "items with category X may be edited, but category Y may not"
BUT: you can easily implement this in your UI, if your main concern is user guidance and not "bad people with very good IT skills"

API object versioning

I'm building an API and I have a question about how to represent objects.
Imagine we have a system with Articles that have a bunch of properties. Some of these properties are complex; for example, the Author of the Article refers to another object. We have a URL to fetch all the articles in the system, and another URL to fetch a particular Article.
My first approach to implement this would be to create two representations of the same object Article, because when you request all the articles, it makes sense that you don't retrieve all the information about the Articles, but for example just the title, the date and the name of the author (instead of the whole Author object), excluding other properties like tags, or the content. The idea beneath this is to try to make the response of all the Articles a little bit lighter.
Now I'm going to the client side, and I decide to implement an SDK for Android, for example. So the first step would be to create the objects to store the information that I retrieve from the API. Now a problem pops up, because I want to define the Article object, but I would need two versions of it, and it's not only more difficult to implement, but it's going to be more difficult to use.
So my question is, when defining an API, is it a good practice to have multiple versions of the same object (maybe a light one, and a full one) to save some bandwidth when sending the result of a request but generating a more difficult to use service, or it's not worth it and you should retrieve always the same version of the object, generating heavier responses but making the service easier to use?
I work at a company that deals with Articles as well and we also have a REST API to expose the data.
I think you're on the right track, but I'll even take it one step further. These are the three potential calls for large entities in an API:
Index. For the articles, this would be something like /articles. It just returns a list of article ids. You can add parameters to filter, sort, etc. It's very lightweight and I've found it to be very useful.
Header/Mini/Light version. These are only the crucial fields that you think will meet the widest variety of use cases. For us, we have a lot of use cases where we might want to display the top 5 articles, and in those cases, only title, author and maybe publication date. Those fields belong in a "header" article, or a "light" article. This is especially useful for AJAX calls as you don't want to return the entire article (for us the object is quite large.)
Full version. This is the full article. All the text/paragraphs/image references - everything. It's a heavy call to make, but you will be guaranteed to get whatever is available.
Then it just takes discipline to leave the objects the way they are. Ideally users are able to get the version described in (2) to save time over the wire, but if they have to, they go with (3).
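The three calls can be sketched as three functions over the same stored article. Everything here (the `ARTICLES` dict, the function names, the choice of header fields) is a hypothetical stand-in and not the actual API described above.

```python
# Hypothetical in-memory data standing in for the article store.
ARTICLES = {
    1: {"id": 1, "title": "Caching", "author": "Ann", "published": "2020-01-05",
        "body": "Full text of the caching article.", "tags": ["http"]},
    2: {"id": 2, "title": "Versioning", "author": "Bob", "published": "2020-02-10",
        "body": "Full text of the versioning article.", "tags": ["api"]},
}

# The "header" view: the few fields most list displays need.
HEADER_FIELDS = ("id", "title", "author", "published")


def article_index() -> list[int]:
    """Index call: ids only."""
    return sorted(ARTICLES)


def article_header(article_id: int) -> dict:
    """Light call: only the header fields."""
    full = ARTICLES[article_id]
    return {name: full[name] for name in HEADER_FIELDS}


def article_full(article_id: int) -> dict:
    """Full call: every field that is available."""
    return dict(ARTICLES[article_id])


print(article_index())
print(article_header(1))
```

Keeping the header field list in one place is what makes it practical to "leave the objects the way they are" once clients depend on them.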
I've considered having a dynamic way to return only fields people are interested in, but it would be a lot of implementation. Basically the idea was to let the user go to /article and then show them a sample JSON result. Then the user could click on the fields they wanted returned and get a token. Then they'd pass the token as a parameter to the API and the API would then know which fields to return.
This creates a dynamic schema. It is a lot of work and I never got around to it, but you can see that if you want to be creative, you can.
Consider whether your data (for one API client) is changing a lot or not. If it's possible to cache data on the client, that'll improve performance by not contacting the API as much. Otherwise I think it's a good idea to have a light-weight and full-scale object type (or more like two views of the same object type).
In the client you should implement it as one object type (to keep it DRY; Don't Repeat Yourself) with all the properties. When fetching a light-weight object, you only store a few of the properties, the rest being null (or similar “undefined” value for the given property type). It should be possible to determine whether all or only a partial subset of the properties are loaded.
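A minimal sketch of that single client-side type, assuming the light response carries the title and author name while the full response adds `content` and `tags` (the field names are illustrative):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Article:
    id: int
    title: str
    author_name: str
    content: Optional[str] = None      # None until the full version is fetched
    tags: Optional[list[str]] = None   # None until the full version is fetched

    @property
    def is_fully_loaded(self) -> bool:
        # Lets callers tell a partial object from a complete one.
        return self.content is not None and self.tags is not None

    def merge_full(self, payload: dict) -> None:
        """Fill in the heavy fields from a full-version response."""
        self.content = payload["content"]
        self.tags = list(payload["tags"])


article = Article(id=1, title="Caching", author_name="Ann")
print(article.is_fully_loaded)
article.merge_full({"content": "Full text.", "tags": ["http"]})
print(article.is_fully_loaded)
```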
When making API requests in the client on a given model (e.g. authors) you should be explicit about whether the light-weight or full-scale object is needed and whether cached data is acceptable. This makes it possible to control the data in the UI layer. For example, a list of authors might only need to display a name and the number of articles connected with that author. When displaying the author screen, more properties are needed. Also, if using cached data, you should provide a way for the user to refresh it.
Once the app works you can start to implement optimizations like: don't fetch light-weight data if full-scale data is already known, and don't fetch data at all if a recent cache copy exists. I think the best approach is to look at the actual use cases and make the performance improvements that have the highest value for the user.
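Those two optimizations could be sketched as a small cache in front of the API calls. The `ArticleCache` class, its parameters, and the fake fetch functions are all assumptions for illustration; a real client would plug in its own HTTP calls and expiry policy.

```python
import time
from typing import Callable


class ArticleCache:
    """Hypothetical client-side cache keyed by article id."""

    def __init__(
        self,
        fetch_light: Callable[[int], dict],
        fetch_full: Callable[[int], dict],
        max_age_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch_light = fetch_light
        self._fetch_full = fetch_full
        self._max_age = max_age_seconds
        self._clock = clock
        # article id -> (time stored, whether it is the full version, data)
        self._entries: dict[int, tuple[float, bool, dict]] = {}

    def get(self, article_id: int, need_full: bool = False,
            allow_cached: bool = True) -> dict:
        entry = self._entries.get(article_id)
        if allow_cached and entry is not None:
            stored_at, is_full, data = entry
            is_recent = self._clock() - stored_at <= self._max_age
            # A cached full version also satisfies a light request.
            if is_recent and (is_full or not need_full):
                return data
        data = self._fetch_full(article_id) if need_full else self._fetch_light(article_id)
        self._entries[article_id] = (self._clock(), need_full, data)
        return data


requests = []  # records which API calls were actually made


def fake_light(article_id: int) -> dict:
    requests.append("light")
    return {"id": article_id, "title": "Caching"}


def fake_full(article_id: int) -> dict:
    requests.append("full")
    return {"id": article_id, "title": "Caching", "content": "Full text."}


cache = ArticleCache(fake_light, fake_full)
cache.get(7, need_full=True)        # goes to the API
cache.get(7)                        # served from the cached full version
cache.get(7, allow_cached=False)    # explicit refresh requested by the user
print(requests)
```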

Avoid loading unnecessary data from db into objects (web pages)

Really newbie question coming up. Is there a standard (or good) way to deal with not needing all of the information that a database table contains loaded into every associated object. I'm thinking in the context of web pages where you're only going to use the objects to build a single page rather than an application with longer lived objects.
For example, let's say you have an Article table containing id, title, author, date, summary and fullContents fields. You don't need the fullContents to be loaded into the associated objects if you're just showing a page containing a list of articles with their summaries. On the other hand, if you're displaying a specific article you might want every field loaded for that one article and maybe just the titles for the other articles (e.g. for display in a recent articles sidebar).
Some techniques I can think of:
Don't worry about it, just load everything from the database every time.
Have several different, possibly inherited, classes for each table and create the appropriate one for the situation (e.g. SummaryArticle, FullArticle).
Use one class but set unused properties to null at creation if that field is not needed and be careful.
Give the objects access to the database so they can load some fields on demand.
Something else?
All of the above seem to have fairly major disadvantages.
I'm fairly new to programming, very new to OOP and totally new to databases so I might be completely missing the obvious answer here. :)
1) Loading the whole object is, unfortunately, what ORMs do by default. That is why hand-tuned SQL performs better. But most objects don't need this optimization, and you can always delay optimization until later. Don't optimize prematurely (but do write good SQL/HQL and use good DB design with indexes). By and large, though, the ORM projects I've seen result in a lot of lazy approaches, pulling or updating way more data than needed.
2) Different Models (Entities), depending on operation. I prefer this one. It may add more classes to the object domain, but to me it is the cleanest and results in better performance and security (especially if you are serializing to AJAX). I sometimes use one model for serializing an object to a client, and another for internal operations. If you use inheritance, you can do this well. For example CustomerBase -> Customer. CustomerBase might have an ID, name and address. Customer can extend it to add other info, even stuff like passwords. For list operations (list all customers) you can return CustomerBase with a custom query, but for individual CRUD operations (Create/Retrieve/Update/Delete), use the full Customer object. Even then, be careful about what you serialize. Most frameworks have whitelists of attributes they will and won't serialize. Use them.
3) Dangerous. Special cases will cause bugs in your system.
4) Bad for performance. Hit the database once, not once for each field (except for BLOBs).
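The CustomerBase -> Customer idea from point 2, together with an explicit serialization whitelist, might look like this sketch. The extra fields (`email`, `password_hash`) and the helper `to_public_dict` are assumptions added for illustration.

```python
from dataclasses import dataclass, asdict


@dataclass
class CustomerBase:
    """Narrow model used for list operations."""
    id: int
    name: str
    address: str


@dataclass
class Customer(CustomerBase):
    """Full model used for individual CRUD operations."""
    email: str
    password_hash: str


# Whitelist of attributes that are allowed to leave the server.
PUBLIC_FIELDS = ("id", "name", "address")


def to_public_dict(customer: CustomerBase) -> dict:
    """Serialize only whitelisted attributes, whatever subclass is passed in."""
    data = asdict(customer)
    return {name: data[name] for name in PUBLIC_FIELDS}


full = Customer(1, "Ann", "1 Main St", "ann@example.com", "not-a-real-hash")
print(to_public_dict(full))
```

Because the whitelist names what may be sent rather than what may not, a field added to `Customer` later stays private until someone deliberately exposes it.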
You have a number of methods to solve your issue.
Use Stored Procedures in your database to remove the rows or columns you don't want. This can work great but takes up some space.
Use an ORM of some kind. For .NET you can use Entity Framework, NHibernate, or Subsonic. There are many other ORM tools for .NET. Ruby has it built in with Rails. Java uses Hibernate.
Write embedded queries in your website. Don't forget to parametrize them or you will open yourself up to hackers. This option is usually frowned upon because of the mingling of SQL and code. Also, it is the easiest to break.
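For the embedded-query option, parametrizing means passing values separately from the SQL text so they are never parsed as SQL. A minimal sketch using Python's built-in `sqlite3` module, with an illustrative `article` table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE article (id INTEGER PRIMARY KEY, title TEXT, author TEXT)")
conn.executemany(
    "INSERT INTO article (title, author) VALUES (?, ?)",
    [("Intro", "ann"), ("Advanced", "bob")],
)


def titles_by_author(author: str) -> list[str]:
    # The ? placeholder keeps user input out of the SQL text itself.
    rows = conn.execute(
        "SELECT title FROM article WHERE author = ? ORDER BY id", (author,)
    )
    return [title for (title,) in rows]


print(titles_by_author("ann"))
# An injection attempt is treated as an (unmatched) author name, not as SQL.
print(titles_by_author("ann' OR '1'='1"))
```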
From your list, options 1, 2 and 4 are probably the most commonly used ones.
1. Don't worry about it, just load everything from the database every time: Well, unless your application is under heavy load or you have some extremely heavy fields in your tables, use this option and save yourself the hassle of figuring out something better.
2. Have several different, possibly inherited, classes for each table and create the appropriate one for the situation (e.g. SummaryArticle, FullArticle): Such classes would often be called "view models" or something similar, and depending on your data access strategy, you might be able to get hold of such objects without actually declaring any new class. E.g., using Linq-2-Sql the expression data.Articles.Select(a => new { a.Title, a.Author }) will give you a collection of anonymously typed objects with the properties Title and Author. The generated SQL will be similar to select Title, Author from Article.
4. Give the objects access to the database so they can load some fields on demand: The objects you describe here would usually be called "proxy objects" and/or their properties referred to as being "lazy loaded". Again, depending on your data access strategy, creating proxies might be hard or easy. E.g., with NHibernate, you can have lazy properties by simply throwing in lazy=true in your mapping, and proxies are automatically created.
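Options 2 and 4 can also be written by hand without an ORM. The sketch below, using `sqlite3` and hypothetical names (`ArticleSummary`, `Article`, `list_summaries`), selects only the narrow columns for list pages and loads the heavy column the first time it is read. It is a hand-rolled stand-in for what LINQ projections and NHibernate lazy properties do for you, not a reproduction of either.

```python
import sqlite3
from typing import NamedTuple, Optional


class ArticleSummary(NamedTuple):
    """Narrow 'view model' for list pages (option 2)."""
    id: int
    title: str
    author: str


class Article:
    """Loads the heavy full_contents column on first access (option 4)."""

    def __init__(self, conn: sqlite3.Connection, summary: ArticleSummary) -> None:
        self._conn = conn
        self.summary = summary
        self._full_contents: Optional[str] = None

    @property
    def full_contents(self) -> str:
        if self._full_contents is None:
            row = self._conn.execute(
                "SELECT full_contents FROM article WHERE id = ?",
                (self.summary.id,),
            ).fetchone()
            self._full_contents = row[0]
        return self._full_contents


def list_summaries(conn: sqlite3.Connection) -> list[ArticleSummary]:
    # Only the columns the list page needs are selected.
    rows = conn.execute("SELECT id, title, author FROM article ORDER BY id")
    return [ArticleSummary(*row) for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE article (id INTEGER PRIMARY KEY, title TEXT, author TEXT, full_contents TEXT)"
)
conn.execute("INSERT INTO article VALUES (1, 'Intro', 'ann', 'A very long body')")

summaries = list_summaries(conn)
article = Article(conn, summaries[0])
print(summaries[0].title, article.full_contents)
```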
Your question does not mention how you are actually mapping data from your database to objects now, but if you are not using any ORM framework at the moment, do have a look at NHibernate and Entity Framework - they are both pretty solid solutions.

Method names for getting data [closed]

Warning: This is a not very serious question/discussion that I am posting... but I am willing to bet that most developers have pondered this "issue"...
Always wanted to get other opinions regarding naming conventions for methods that went and got data from somewhere and returned it...
Most method names are somewhat simple and obvious... SaveEmployee(), DeleteOrder(), UploadDocument(). Of course, with classes, you would most likely use the short form...Save(), Delete(), Upload() respectively.
However, I have always struggled with naming the initial call that gets the data. It seems that for every project I end up jumping between different naming conventions because I am never quite happy with the last one I used. As far as I can tell these are the possibilities -->
What are your thoughts?
It is all about consistent semantics.
In your question title you use getting data. This is extremely general, in the sense that you need to define what getting means in a semantically significant, unambiguous way. I offer the following examples to hopefully put you on the right track when thinking about naming things.
getBooks() is when you are getting all the books associated with an object. It implies that the criteria for the set are already defined and that where the books come from is a hidden detail.
A find method is when you are trying to find a sub-set of the books based on parameters to the method call. It will usually be overloaded for different searches.
loadBooks(source) is when you are loading from an external source, like a file or db.
I would not use fetch/retrieve because they are too vague and get conflated with get, and there is no unambiguous semantic associated with the terms.
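The get/find/load distinction above can be sketched as follows. The `Library` and `Book` classes are hypothetical, the names are adapted to Python's snake_case, and a JSON string stands in for the external source that a load method would read.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Book:
    title: str
    author: str


class Library:
    def __init__(self, books: list[Book]) -> None:
        self._books = list(books)

    def get_books(self) -> list[Book]:
        """All books already associated with this library; no criteria."""
        return list(self._books)

    def find_books(self, author: str) -> list[Book]:
        """A sub-set selected by a caller-supplied parameter."""
        return [book for book in self._books if book.author == author]


def load_books(source: str) -> list[Book]:
    """Load from an external source; here the source is a JSON string."""
    return [Book(**item) for item in json.loads(source)]


library = Library(load_books(
    '[{"title": "Dune", "author": "Herbert"}, {"title": "Emma", "author": "Austen"}]'
))
print(len(library.get_books()))
print(library.find_books("Austen"))
```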
Example: fetch implies that some entity needs to go and get something that is remote and bring it back. Dogs fetch a stick, and retrieve is a synonym for fetch with the added semantic that you may have had possession of the thing prior as well. get is a synonym for obtain as well which implies that you have sole possession of something and no one else can acquire it simultaneously.
Semantics are extremely important:
the branch of linguistics and logic concerned with meaning
The comments are proof that generic terms like get and fetch have no specific semantic and are interpreted differently by different people. Pick a semantic for a term, document what it is intended to imply if the semantic is not clear, and be consistent with its use.
Words with vague or ambiguous meanings are given different semantics by different people because of their prejudices and preconceptions based on their personal opinions, and that will never end well.
Honestly, you should just decide with your team which naming convention to use. But for fun, let's see what your train of thought would be to decide on any of these:
This method belongs to a data source, and we don't care how it is obtaining them, we just want to Get them from the data source.
You treat your data source like a bloodhound, and it is his job to fetch your books. I guess you should decide on your own how many he can fit in his mouth at once.
Your data source is a librarian and will use the Dewey Decimal system to find your books.
These books belong in some sort of "electronic book bag" and must be loaded into it. Be sure to call ZipClosed() after loading to prevent losing them.
I have nothing.
The answer is to just stick to what you are comfortable with and be consistent.
If you have a Barnes & Noble website and you use GetBooks(), then if you have another item like a Movie entity, use GetMovies(). So go with whatever you and your team like, and be consistent.
It is not clear what you mean by "getting the data". From the database? A file? Memory?
My view about method naming is that its role is to eliminate any ambiguities and ideally a need to look up documentation. I believe that this should be done even at the cost of longer method names. According to studies, most intermediate+ developers are able to read multiple words in camel case. With IDE and auto completions, writing long method names is also not a problem.
Thus, when I see "fetchBooks", unless the context is very clear (e.g., a class named BookFetcherFromDatabase), it is ambiguous. Fetch it from where? What is the difference between fetch and find? You're also risking the problem that some developers will associate semantics with certain keywords. For example, fetch for database (or memory) vs. load (from file) or download (from web).
I would rather see something like "fetchBooksFromDatabase", "loadBookFromFile", "findBooksInCollection", etc. It is less sightly, but once you get over the length, it is clear. Everyone reading this would right away get what it is that you are trying to do.
In OO (C++/Java) I tend to use getSomething and setSomething because very often if not always I am either getting a private attribute from the class representing that data object or setting it - the getter/setter pair. As a plus, Eclipse generates them for you.
I tend to use Load only when I mean files - as in "load into memory" and that usually implies loading into primitives, structs (C) or objects. I use send/receive for web.
As said above, consistency is everything, and that includes consistency across developers.