What is the best way to save my POJOs into Jackrabbit JCR? - serialization

In Jackrabbit I have experienced two ways to save my POJOs into repository nodes for storage in the Jackrabbit JCR:
writing my own layer
and
using Apache Graffito
Writing my own code has proven time consuming and labor intensive (had to write and run a lot of ugly automated tests) though quite flexible.
Using Graffito has been a disappointment because it seems to be a "dead" project stuck in 2006.
What are some better alternatives?

Another alternative is to completely skip an OCM framework and simply use javax.jcr.Node as a very flexible DAO itself. The fundamental reason why OCM frameworks exist is because with RDBMS you need a mapping from objects to the relational model. With JCR, which is already very object-oriented (node ~= object), this underlying reason is gone. What is left is that with DAOs you can restrict what your programmers can access in their code (incl. the help of autocompletion). But this approach does not really leverage the JCR concept, which means schema-free and flexible programming. Using the JCR API directly in your code is the best way to follow that concept.
Imagine you want to add a new property to an existing node/object later in the life of your application - with an OCM framework you have to modify it as well and make sure it still works properly. With direct access to nodes it is simply a single point of change. I know, this is a good way to get problems with typos in e.g. property names; but this fear is not really backed by reality, since in most cases you will notice typos or non-matching names very quickly when you test your application. A good solution is to use string constants for the common node or property names, even as part of your APIs if you expose the JCR API across them. This still gives you the flexibility to quickly add new properties without having to adapt OCM layers.
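The string-constant idea can be sketched roughly as follows. To keep the example runnable without a repository, PropertyBag is a hypothetical stand-in for javax.jcr.Node (it only mimics setting and reading named string properties), and ArticleNames, ArticleWriter and the property names are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Shared property names, defined once (the names themselves are hypothetical).
final class ArticleNames {
    static final String TITLE = "title";
    static final String AUTHOR = "author";

    private ArticleNames() {}
}

// Hypothetical stand-in for a repository node: a bag of named string properties.
// With a real javax.jcr.Node you would call its own property methods instead.
final class PropertyBag {
    private final Map<String, String> properties = new HashMap<>();

    void setProperty(String name, String value) {
        properties.put(name, value);
    }

    String getString(String name) {
        return properties.get(name);
    }

    boolean hasProperty(String name) {
        return properties.containsKey(name);
    }
}

// Application code only ever refers to the constants, never to raw strings.
final class ArticleWriter {
    private ArticleWriter() {}

    static void write(PropertyBag node, String title, String author) {
        node.setProperty(ArticleNames.TITLE, title);
        node.setProperty(ArticleNames.AUTHOR, author);
    }
}
```

Adding a new property later then means adding one constant and one setProperty call, with no mapping layer to update.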
To put some constraints on what is allowed or what is mandatory (i.e. a "semi-schema") you can use node types and mixins (since JCR 2.0 you can also change the node type for existing content): thus you can handle this completely on the repository level and don't have to care about typing and constraints inside your application code - apart from catching the exceptions ;-)
But, of course, this choice depends on your requirements and personal preferences.

You might want to have a look at Jackrabbit OCM, which is alive and kicking. Of course another way is to manually serialize/deserialize the POJOs. For that there are many different options. The question is whether you need a fixed schema to query the objects in JCR. If you just want to serialize into XML then XStream is a very painless way to do so. If you need a more fixed schema there is also Betwixt from Apache Commons.

It depends on your needs. When you use javax.jcr.Node directly, your code is heavily coupled to the underlying mechanism. In medium and even some small sized projects, this is not a good idea. Obviously the question will be how to go from the Node to your own domain model. The problem is quite similar to going from a JDBC ResultSet to your own domain model. Mind you, I mean the problem is similar from a technical point of view. From a functional point of view, there are huge differences between using JDBC and JCR.
Another deciding factor is whether you can impose a structure on your JCR content or not. Some application domains can (but still match better with JCR than JDBC); in other domains the content may be highly unstructured in nature. In such a case OCM is clearly overkill. I'd still advise writing your own wrapper layer around the javax.jcr.* classes.

There's also https://github.com/ilikeorangutans/omf, a very flexible object to JCR mapper. Unfortunately it doesn't have write support yet. However we're successfully using this framework in a large CMS installation.

There is also the JCROM project at http://code.google.com/p/jcrom/. That project went dormant for a couple of years, but there have been a few new releases as of summer 2013.

Related

Method JavaFx TreeItem getRoot() is not visible. What is the OOP/MVC reason it is not?

I needed to get the root item of a TreeView. The obvious way to get it is to use the getRoot() on the TreeView. Which I use.
I like to experiment, and was wondering if I could get the same root by climbing up the tree from a leaf item (a TreeItem), calling getParent() repeatedly until the result is null.
It is working as well, and, in my custom TreeItem, I added a public method 'getRoot()' to play around with it. Thus finding out this method does already exist in parent TreeItem, but is not exposed.
My question: Why would it not be exposed? Is it a bad practice regarding OOP / MVC architecture?
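For reference, the climb described above can be sketched as below. SimpleTreeItem is a hypothetical, minimal stand-in so the example runs without JavaFX; with the real TreeItem the same loop would call its public getParent() method.

```java
// Hypothetical minimal tree item; NOT javafx.scene.control.TreeItem.
final class SimpleTreeItem<T> {
    private final T value;
    private SimpleTreeItem<T> parent; // stays null for the root

    SimpleTreeItem(T value) {
        this.value = value;
    }

    T getValue() {
        return value;
    }

    SimpleTreeItem<T> getParent() {
        return parent;
    }

    // Creates a child under this item and returns the child for chaining.
    SimpleTreeItem<T> addChild(T childValue) {
        SimpleTreeItem<T> child = new SimpleTreeItem<>(childValue);
        child.parent = this;
        return child;
    }
}

final class TreeItems {
    private TreeItems() {}

    // Walks up the parent chain until it reaches an item that has no parent.
    static <T> SimpleTreeItem<T> rootOf(SimpleTreeItem<T> item) {
        SimpleTreeItem<T> current = item;
        while (current.getParent() != null) {
            current = current.getParent();
        }
        return current;
    }
}
```

A loop is used instead of recursion so that very deep trees cannot overflow the stack.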
The reason for the design is summed up by kleopatra's comment:
Why would it not be exposed I would pose it the other way round: why should it? It's convenience api at best, easy to implement by clients, not really needed - adding such to a framework/toolkit tends to exploding api/implementation to maintain.
JavaFX is filled with decisions like this on purpose. A lot of the reasoning is based on experience (good and bad) from AWT/Swing. Just some examples:
For specifying execution on the UI thread, there is a runLater API, but no invokeAndWait API like Swing, even though it would be easy for the framework to provide such an API and it has been requested.
Providing an invokeAndWait API means that naive (and experienced :-) developers could use it incorrectly to accidentally deadlock threads.
Lots of classes are final and not extensible.
Sometimes developers want to extend classes, but can't because they are final. This means that they can't over-ride a lot of the built-in tested functionality of the framework and accidentally break it that way. Instead they can usually use aggregation over inheritance to do what they need. The framework forces them to do so in order to protect itself and them.
Color objects are immutable.
Immutable objects in general make stuff easier to maintain.
Native look and feels aren't part of the framework.
You can still create them if you want, and there are 3rd party libraries that do that, but it doesn't need to be in the core framework.
The application programming interface is single threaded not multi-threaded.
Because the developers of the framework realized that multi-threaded UI frameworks are a failed dream.
The philosophy was to code to make the 80% use case easier and the 20% use case (usually) possible, using additional user or 3rd party code, while making it difficult for the user code to accidentally (or intentionally) break the framework. You just stumbled upon one instance of an application of this philosophy.
There are a whole host of catch-phrases that you could use to describe the reason for this design approach. None of them are OOP or MVC specific. The underlying principles have been around far longer than software engineering, they are just approaches towards work and engineering in general. Here are some links if interested:
You ain't going to need it YAGNI
Minimal viable product MVP
Worse-is-better
Muntzing
Feature creep prevention
Keep it simple stupid KISS
Occam's razor

Domain services seem to require only a fraction of the total queries defined in repositories -- how to address that?

I'm currently facing some doubts about layering and repositories.
I was thinking of creating my repositories in a persistence module. Those repositories would inherit (or implement/extend) from repositories created in the domain layer module, being kept "persistence agnostic".
The issue is that from all I can see, the necessities of the domain layer regarding its repositories are quite humble. In general, they tend to be rather CRUDish.
It's in general at the application layer level, when solving particular business use-cases that the queries tend to be more complex and contrived (and thus, the number of repository's methods to explode).
So this raises the question of how to deal with this:
1) Should I just leave the domain repository interfaces simple and then just add the extra methods in the repository implementations (such that the application layer, that does know about the repository implementations, can use them)?
2) Should I just add those methods at the domain level repository implementations? I think not.
3) Should I create another set of repositories to be used just at the application layer level? This would probably mean moving to a more CQRSesque application.
Thanks
I think you should react to the realities of your business / requirements.
That is, if your use-cases are clearly not "persistence agnostic" then don't hold on to that particular restriction. Not everything can be reduced to CRUD. In fact I think most things worth implementing can't be reduced to CRUD persistence. Most database systems relational or otherwise have a lot of features nowadays, and it seems quaint to just ignore those. Use them.
If you don't want to mix SQL with other code, there are still a lot of other "patterns" that let you do that without requiring you to abstract access to something you actually don't need abstraction to.
On the flipside, you build a dependency to a particular persistence system. Is that a problem? Most of the time it actually isn't, but you have to decide for yourself.
All in all I would choose option 4: Model the problem. If I need a complicated SQL to build a use-case, and I don't need database independence (I rarely if ever do), then just write it where it is used, end of story.
You can use other tools like refactoring later to correct design issues.
The Application layer doesn't have to know about the Infrastructure.
Normally it should be fine working with just what Repository interfaces declared in the Domain provide. The concrete implementations are injected at runtime.
Declaring repository interfaces in the Domain layer is not only about using them in domain services but also elsewhere.
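A minimal sketch of that arrangement, with hypothetical names throughout (Customer, CustomerRepository, InMemoryCustomerRepository, RenameCustomerService) and an in-memory map standing in for a real persistence mechanism:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Domain layer: the aggregate and the repository contract it needs.
final class Customer {
    private final String id;
    private String name;

    Customer(String id, String name) {
        this.id = id;
        this.name = name;
    }

    String id() { return id; }
    String name() { return name; }
    void rename(String newName) { this.name = newName; }
}

interface CustomerRepository {
    Optional<Customer> findById(String id);
    void save(Customer customer);
}

// Infrastructure layer: one possible implementation (in-memory for this sketch).
final class InMemoryCustomerRepository implements CustomerRepository {
    private final Map<String, Customer> store = new HashMap<>();

    @Override
    public Optional<Customer> findById(String id) {
        return Optional.ofNullable(store.get(id));
    }

    @Override
    public void save(Customer customer) {
        store.put(customer.id(), customer);
    }
}

// Application layer: depends only on the domain interface; the concrete
// repository is handed in from outside (constructor injection).
final class RenameCustomerService {
    private final CustomerRepository customers;

    RenameCustomerService(CustomerRepository customers) {
        this.customers = customers;
    }

    // Returns false when no customer with the given id exists.
    boolean rename(String id, String newName) {
        Optional<Customer> found = customers.findById(id);
        if (!found.isPresent()) {
            return false;
        }
        Customer customer = found.get();
        customer.rename(newName);
        customers.save(customer);
        return true;
    }
}
```

Only the code that wires the application together needs to name InMemoryCustomerRepository; the service itself never sees it.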
Should I create another set of repositories to be used just at the
application layer level? This would probably mean moving to a more
CQRSesque application.
You could do that, but you would lose some reusability.
It is also not related to CQRS - CQRS is a vertical division of the whole application between queries and commands, not giving horizontal layers different ways of fetching data.
Given that a repository is not about querying but about working with full aggregates most of the time perhaps you could elaborate on why you may need to create a separate set of repositories that are used only in your application/integration layer?
Perhaps you need to have a read-specific implementation that is optimised for data retrieval:
This would probably mean moving to a more CQRSesque application
Well, you'd probably want to implement read-specific bits that make sense. I usually have my data access separated either by namespace and, at times, even in a separate assembly. I then use I{Aggregate}Query implementations that return the relevant bits of data in as simple a type as possible. However, it is quite possible to even map to a more complex read model that even has relations but it is still only a read model and is not concerned with any command processing. To this end the domain is never even aware of these classes.
I would not go with extending the repositories.
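As a rough illustration of a read-specific query kept apart from the repositories, here is a sketch in Java (the answer above has .NET in mind). OrderSummary, OrderQuery and InMemoryOrderQuery are hypothetical names, and the in-memory list stands in for what would normally be a tuned database query.

```java
import java.util.ArrayList;
import java.util.List;

// Read model: a flat, immutable view holding only what a screen needs.
final class OrderSummary {
    final String orderId;
    final String customerName;
    final int totalCents;

    OrderSummary(String orderId, String customerName, int totalCents) {
        this.orderId = orderId;
        this.customerName = customerName;
        this.totalCents = totalCents;
    }
}

// Read-side contract; the domain layer does not reference it at all.
interface OrderQuery {
    List<OrderSummary> summariesForCustomer(String customerName);
}

// Stand-in implementation that filters an in-memory list of rows.
final class InMemoryOrderQuery implements OrderQuery {
    private final List<OrderSummary> rows;

    InMemoryOrderQuery(List<OrderSummary> rows) {
        this.rows = new ArrayList<>(rows); // defensive copy
    }

    @Override
    public List<OrderSummary> summariesForCustomer(String customerName) {
        List<OrderSummary> result = new ArrayList<>();
        for (OrderSummary row : rows) {
            if (row.customerName.equals(customerName)) {
                result.add(row);
            }
        }
        return result;
    }
}
```

The repositories keep dealing with whole aggregates, while screens and reports go through query types like this one.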

Umbraco Hive and Services Layer

I'm experimenting with the new Umbraco 5 hive, and I'm kinda a bit confused.
I'm plugging in an existing Linq to SQL services layer, which I developed for a webforms site.
I don't know much about the repository pattern, my services handle all connections with the data context, and work very well.
I have made a few repositories that plug in to the hive, and handle conversion of my entities to the Umbraco TypedEntity type.
These repositories reference my existing services layer, to retrieve, add, update and delete. The services also handle other entity specific functions, which will not be used by the hive.
Now, it's nice to plug in these services, and just reference them in the hive repositories, but it seems I may be doing things the wrong way round, according to the official repository pattern as I have read about it.
I know there's no hard fast rules, but I would appreciate comments on what I'm doing to achieve this functionality.
I've asked this here instead of the Umbraco forum, as I want a wider perspective.
Cheers.
I personally feel that the Hive is overkill. With the ability to use your own classes directly within razor macros, I think the best approach is to forego the hive altogether and simply use your classes. Why would you trade all of the power of your existing service just to make it fit into the hive interface?
If you're writing a library for other Umbraco developers, you may need to do this, but it's my personal opinion that the hive is over-engineered at worst and a layer of abstraction aimed at newish developers at best.
So, if I were to advise you, I would say to consider the more general principles: "Keep It Simple" and "You Aren't Gonna Need It". If the interface they give you offers a tangible benefit, implement it. If not, consider what you really gain for all of that work.

traversing object graph from n-tier client

I'm a student currently dabbling in a .Net n-tier app that uses Nhibernate+WCF+WPF.
One of the things that is done quite terribly is object graph serialisation. In fact it isn't done at all: currently associations are ignored and we are using DTOs everywhere.
As far as I can tell one method to proceed is to predefine which objects and collections should be loaded and serialised to go across the wire, thus being able to present some associations to the client, however this seems limited, inflexible and inconsistent (can you tell that I don't like this idea).
One option that occurred to me was to simply replace the NHProxies that lazy load collection on the client tier with a "disconnectedProxy" that would retrieve the associated stuff over the wire. This would mean that we'd have to expand our web service signature a little and do some hackery on our generated proxies but this seemed like a good T4/other code gen experiment.
As far as I can tell this seems to be a common stumbling block but after doing a lot of reading I haven't been able to figure out any good/generally accepted solutions. I'm looking for a bit of direction as much as any particular solution, but if there is an easy way to make the client "feel" connected please let me know.
You ask a very good question that unfortunately does not have a very clean answer. Even if you were able to get lazy loading to work over WCF (which we were able to do) you still would have issues using the proxy interceptor. Trust me on this one, you want POCO objects on the client tier!
What you really need to consider...what has been conceived as the industry standard approach to this problem from the research I have seen, is called persistence vs. usage or persistence ignorance. In other words, your object model and mappings represent your persistence domain but it does not match your ideal usage scenarios. You don't want to bring the whole database down to the client just to display a couple properties right??
It seems like such a simple problem but the solution is either very simple, or very complex. On one hand you can design your entities around your usage scenarios but then you end up with proliferation of your object domain making it difficult to maintain. On the other, you still want the rich object model relationships in order to write granular business logic.
To simplify this problem let’s examine the two main gaps we need to fill…between the database and the database/service layer and the service to client gap. NHibernate fills the first one just fine by providing an ORM to load data into your objects. It does a decent job, but in order to achieve great performance it needs to be tweaked using custom loading strategies. I digress…
The second gap, between the server and client, is where things get dicey. To simplify, imagine if you did not send any mapped entities over the wire to the client? Try creating a mechanism that exchanges business entities into DTO objects and likewise DTO objects into business entities. That way your client deals with only DTOs (POCO of course), and your business logic can maintain its rich structure. This allows you to leverage not only NHibernate’s lazy loading mechanism, but other benefits from the session such as L1 cache.
For brevity and intellectual property reasons I will not go into the design of said mechanism, but hopefully this is enough information to point you in the right direction. If you don’t care about performance or latency at all…just turn lazy loading off all together and work through the serialization issues.
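Purely as an illustration of the general shape of such an exchange, and not the mechanism alluded to above, a hand-written mapper might look like this sketch (in Java rather than .NET; Member, MemberDto and MemberMapper are hypothetical names):

```java
import java.util.ArrayList;
import java.util.List;

// Rich server-side entity; in the setting above this would be the ORM-mapped class.
final class Member {
    private final long id;
    private String displayName;
    private final List<String> orderNumbers = new ArrayList<>(); // an association that could be lazy-loaded

    Member(long id, String displayName) {
        this.id = id;
        this.displayName = displayName;
    }

    long id() { return id; }
    String displayName() { return displayName; }
    void changeDisplayName(String newName) { this.displayName = newName; }
    void addOrder(String orderNumber) { orderNumbers.add(orderNumber); }
    int orderCount() { return orderNumbers.size(); }
}

// Plain DTO that crosses the wire: only the data the client actually shows.
final class MemberDto {
    long id;
    String displayName;
    int orderCount;
}

final class MemberMapper {
    private MemberMapper() {}

    // Entity -> DTO: runs on the server, where touching the association is cheap.
    static MemberDto toDto(Member member) {
        MemberDto dto = new MemberDto();
        dto.id = member.id();
        dto.displayName = member.displayName();
        dto.orderCount = member.orderCount();
        return dto;
    }

    // DTO -> entity: copies the editable fields back onto an entity the server has loaded.
    static void applyTo(MemberDto dto, Member target) {
        if (dto.id != target.id()) {
            throw new IllegalArgumentException("DTO and entity ids differ");
        }
        target.changeDisplayName(dto.displayName);
    }
}
```

The client never sees Member, so no proxy or session has to survive serialization.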
It has been a while for me but the injection/disconnected proxies may not be as bad as it sounds. Since you are a student I am going to assume you have some time and want to muck around a bit.
If you want to inject your own custom serialization/deserialization logic you can use IDataContractSurrogate which can be applied using DataContractSerializerOperationBehavior. I have only done a few basic things with this but it may be worth looking into. By adding some fun logic (read: potentially hackish) at this layer you might be able to make it more connected.
Here is an MSDN post about someone who came to the same realization, DynamicProxy used by NHibernate makes it not possible to directly serialize NHibernate objects doing lazy loading.
If you are really determined to transport the object graph across the network and preserve lazy loading functionality, take a look at some code I produced over here http://slagd.com/?page_id=6 . Basically it creates a fake session on the other side of the wire and allows the NHibernate proxies to retain their functionality. Not saying it's the right way to do things, but it might give you some ideas.

Code generators or ORMs?

What do you suggest for Data Access layer? Using ORMs like Entity Framework and Hibernate OR Code Generators like Subsonic, .netTiers, T4, etc.?
For me, this is a no-brainer, you generate the code.
I'm going to go slightly off topic here because there's a bigger underlying fallacy at play. The fallacy is that these ORM frameworks solve the object/relational impedance mismatch. This claim is a barefaced lie.
I find the best way to resolve the object/relational impedance mismatch is to either use OOP exclusively and use an object database or use the idioms of the relational database exclusively and ignore OOP.
The abstraction "everything is a table" is to me, much more powerful than the abstraction "everything is a class." It takes less code, less intellectual effort and leads to faster code when you code to the database rather than to an object model.
To me this seems obvious. If your application is data driven then surely your code should be data driven too? Yet to say this is hugely controversial.
The central problem here is that OOP becomes a really leaky abstraction when used in conjunction with a database. Code that looks perfectly sensible when written to the idioms of OOP looks completely insane when you see the traffic that code generates at the database. When that messiness becomes a performance problem, OOP is the first casualty.
There is really no way to resolve this. Databases work with sets of data. OOP focuses on instances of classes. Trying to marry the two is always going to end in divorce.
So to answer your question, I believe you should generate your classes and try and make them map the underlying database structure as closely as possible.
Perhaps controversially, I've always felt that using code generators for the ADO.NET plumbing is fundamentally solving the wrong problem.
At some point, hopefully not too long after learning about Connection Strings, SqlCommands, DataAdapters, and all that, we notice that:
Such code is ugly
It is very boring to write
It's very easy to miss something if you're doing it by hand
It has to be repeated every time you want to access the database
So, the problem to solve is "how to do the same thing lots of times fast"?
I say no.
Using code generators to make this process quick still means that you have a ton of code, all the same, all over your business classes (or your data access layer, if you separate that from your business logic).
And then, if you want to do something generically (like track stored procedure usage, for instance), you end up having to customise your code generator if it doesn't already have the feature you want. And even if it does, you still have to regenerate everything all the time.
I like to do things once, not many times, no matter how fast I can do them.
So I rolled my own Data Access class that knows how to add parameters, set up and close connections, manage transactions, and other cool stuff. It only had to be written once, and calling its methods from a Business object that needs some database stuff done consists of one line of code.
When I needed to make the application support multithreaded database accesses, it required a change to the Data Access class only, and all the business classes just do what they already did.
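The answer above is about ADO.NET; as a rough analogue only, the same "write the plumbing once" idea can be sketched with JDBC in Java. DataAccess and ConnectionSource are hypothetical names, and the SQL in the usage note is made up.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Hypothetical hook that knows how to open a connection (pool, driver, ...).
interface ConnectionSource {
    Connection open() throws SQLException;
}

// The one place that opens and closes connections, binds parameters and
// translates errors. Business classes never touch java.sql directly.
final class DataAccess {
    private final ConnectionSource source;

    DataAccess(ConnectionSource source) {
        this.source = source;
    }

    // Runs an INSERT/UPDATE/DELETE and returns the number of affected rows.
    int execute(String sql, Object... params) {
        try (Connection connection = source.open();
             PreparedStatement statement = connection.prepareStatement(sql)) {
            for (int i = 0; i < params.length; i++) {
                statement.setObject(i + 1, params[i]); // JDBC parameters are 1-based
            }
            return statement.executeUpdate();
        } catch (SQLException e) {
            throw new IllegalStateException("Statement failed: " + sql, e);
        }
    }
}
```

A business object would then need a single line such as `dataAccess.execute("UPDATE customer SET name = ? WHERE id = ?", name, id);`, and cross-cutting changes (logging, threading, transactions) stay inside DataAccess.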
There is no right answer; it all depends on your project. As Simon points out, if your application is all data driven, then it might make sense, depending on the size and complexity of the domain, to use a non-OOP paradigm. I had a lot of success building a system using a Transaction Script pattern, which passed XML messages around the system.
However this system started to break down after five or six years as the application grew in size and complexity (5 or 6 webs, several web services, tons of COM+ components, legacy and .net code, 8+ databases with 800+ tables 4,000+ procedures). No one knew what anything was, and duplication was running rampant.
There are other ways to alleviate the maintenance burden than OOP; however, if you have a very complex domain then having a rich domain model is ideal IMHO, as it allows the business rules to be expressed in nice encapsulated components.
To answer your question, avoid code generators if you can. Code generators are a recipe for disaster, but if you do go with code generation do not modify the generated code. Also be sure to have a good process in place that is easy for developers to get the new generated code.
I recommend one of the following: an ORM or a hand-rolled lightweight DAL. I am currently transitioning a project over to nHibernate off my hand rolled DAL and am having a lot of success; however, I like having the option of using either. Further, if you properly separate your concerns (Data Access from Business Layer from Presentation) you can have a single service layer that talks to a DAO (Data Access Object) that for one object is an ORM but for another is hand rolled. I like this flexibility as it allows me to apply the best tool to the job.
I like nHibernate over a hand rolled DAL because while my DAL does abstract away most of the ADO.Net code you still have to write the code that takes a data reader to an object or an object and creates the parameters.
I've always preferred to go the code generator route, especially in C# where you can make use of extended classes to add functionality to the basic data objects.
Hate to say this, but it depends. If you find an ORM tool that fits your needs, go for it. We wrote our own system in small steps while developing the application. We are using C++ and there are not that many tools out there anyway. Ours ended up being an XML description of the database, from which the SQL generation script, the basic object layer and the metadata were generated.
Do your homework and evaluate these tools and you will find a good fit for your needs.