Maven best practice for generating multiple jars with different/filtered classes? - maven-2

I developed a Java utility library (similar to Apache Commons) that I use in various projects.
In addition to fat clients, I also use it for mobile clients (PDA with J9 Foundation profile).
Over time the library, which started as a single project, spread over multiple packages. As a result I ended up with a lot of functionality that is not really needed in all the projects.
Since this library is also used inside some mobile/PDA projects I need a way to collect just the used classes and generate the actual specialized jars.
Currently in the projects that are using this library, I have Ant jar tasks that generate (from the utility project) the specialized jar files (ex: my-util-1.0-pda.jar, my-util-1.0-rcp.jar) using include/exclude jar task features. This is mostly needed due to the size constraints on the generated jar file, for the mobile projects.
Now that I am migrating to Maven, I wonder whether there are any best practices for achieving something similar. I am considering the following scenarios:
[1] - In addition to the main jar artifact (my-lib-1.0.jar), also generate the separate/specialized artifacts inside the my-lib project using classifiers (ex: my-lib-1.0-pda.jar), via Maven Jar Plugin or Maven Assembly Plugin filtering/includes. I'm not very comfortable with this approach since it pollutes the library with the demands of its consumers (filters).
[2] - Create additional Maven projects for all the specialized clients/projects that will "wrap" "my-lib" and generate the filtered jar artifacts (ex: my-lib-wrapper-pda-1.0 ...etc). These wrapper projects will contain the filtering (to generate the filtered artifact) and will depend just on the "my-lib" project, and the client projects will depend on my-lib-wrapper-xxx-1.0 instead of my-lib-1.0. This approach looks problematic: although it leaves the "my-lib" project intact (with no additional classifiers and artifacts), it basically doubles the number of projects, since every client project needs one extra lib just to collect the needed classes from the "my-util" library (the "my-pda-app" project will need a "my-lib-wrapper-for-my-pda-app" project/dependency).
[3] - In every client project that uses the library (ex: my-pda-app) add some specialized Maven plugins to trim out (when generating the final artifact/package) the classes that are not required (ex: maven-assembly-plugin, maven-jar-plugin, proguard-maven-plugin).
What is the best practice for solving this kind of problem in the "Maven way"?

The Maven general rule is "one primary artifact per POM" for the sake of modularity and the reasons one shouldn't break this convention (in general) are very well explained in the How to Create Two JARs from One Project (...and why you shouldn’t) blog post. There are however justified exceptions (for example an EJB project producing an EJB JAR and a client EJB JAR with only interfaces). Having said that:
The mentioned blog post (also check Using Maven When You Can't Use the Conventions) explains how you could implement Option 1 using separate profiles or the JAR plugin. If you decide to implement this solution, keep in mind that this should be an exception and that it might make dependency management trickier (and, as you mentioned, pollute the project with "client filtering logic"). Just in case, I would use several JAR plugin executions here.
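As a rough sketch of the "several JAR plugin executions" idea, the fragment below adds a second execution of the maven-jar-plugin jar goal so that a classifier-tagged jar is built next to the main one. The execution id and the com/example/util/core/** include pattern are assumptions made for illustration; the real filter depends on which packages the PDA clients actually need.

```xml
<!-- Inside <build><plugins> of the my-lib POM. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-jar-plugin</artifactId>
  <executions>
    <execution>
      <!-- Hypothetical id; the default execution still builds my-lib-1.0.jar -->
      <id>pda-jar</id>
      <phase>package</phase>
      <goals>
        <goal>jar</goal>
      </goals>
      <configuration>
        <!-- Produces my-lib-1.0-pda.jar as an attached artifact -->
        <classifier>pda</classifier>
        <includes>
          <!-- Assumed package layout; list only what the PDA clients need -->
          <include>com/example/util/core/**</include>
        </includes>
      </configuration>
    </execution>
  </executions>
</plugin>
```

One such execution per specialized jar keeps all the filtering rules in one place, which is also why this approach ties the library to its consumers.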
Option 2 isn't very different from Option 1 IMO (except that it separates things): basically, having N other wrapping/filtering projects is very similar to having N filtering rules in one project. And if filtering makes sense, I prefer Option 1.
I don't like Option 3 at all because I think it shouldn't be the responsibility of a client of a library to "trim out" unwanted things. First, a client project doesn't necessarily have the required knowledge (what to trim) and, second, this might create a big mess with other plugins.
BUT if the fat clients are not using the whole my-lib (like server-side code would require the whole EJB JAR), then filtering isn't the right "maven way" to handle your situation. The right way would be Option 4: put everything common in a project (producing my-lib-core-1.0.jar) and specific parts in specific projects (that will produce my-lib-pda-1.0.jar etc). Clients would then depend on the core artifact and specialized ones.
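To make Option 4 a bit more concrete, a PDA client POM might declare its dependencies roughly as follows. The com.example groupId is a placeholder, and the artifact names simply follow the my-lib-core / my-lib-pda naming used above; this is a sketch, not a prescribed layout.

```xml
<!-- Inside the POM of a PDA client such as my-pda-app. -->
<dependencies>
  <!-- Code shared by every kind of client -->
  <dependency>
    <groupId>com.example</groupId> <!-- placeholder groupId -->
    <artifactId>my-lib-core</artifactId>
    <version>1.0</version>
  </dependency>
  <!-- PDA-specific part; an RCP client would depend on its own module instead -->
  <dependency>
    <groupId>com.example</groupId>
    <artifactId>my-lib-pda</artifactId>
    <version>1.0</version>
  </dependency>
</dependencies>
```

If my-lib-pda itself depends on my-lib-core, the first entry could be left to transitive resolution, but declaring it makes the client's real requirements visible.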


How to Group Plug-ins into Features

We are struggling hard with how to use features the correct way.
Let’s say we have the plug-in org.acme.module which depends on org.thirdparty.specific and org.acme.core.
And we have the plug-in org.acme.other which depends on org.acme.core.
We want to create an application from these, which includes a target file and a product file. We have the following options:
One feature per module:
This makes the target and product files gigantic, and the dependencies are very hard to manage manually.
One feature per dependency group:
This approach makes the dependencies very easy to manage, and the target and product files are easy to read and maintain. However it does not work at all. The moment org.acme.core changes, you need to change ALL the features. Furthermore, the application has no say in what to package, so it can’t even decide to update org.acme.core (because of a bugfix or something).
Platform Feature:
org.thirdparty.specific (but could be its own feature)
This is the approach used for Hello World applications and Eclipse add-ons - and it only works for those. Since all modules' target platforms would point to org.acme.platform.feature, every time anything changes for any platform plug-in, you'd have to update org.acme.platform.feature accordingly.
We actually tried that approach with only about 50 platform plug-ins. It's not feasible to have a developer change the feature for every bugfix. (And while Tycho supports version "0.0.0", Eclipse does not, so it's another bag of problems to use that. Also, we need reproducibility, so having PDE choose versions willy-nilly is out of the question.)
Again it all comes down to "I can't use org.acme.platform.feature and override org.acme.core's version for two weeks until the new feature gets released."
The entire problem is made even more difficult since sometimes more than one configuration of plug-ins is possible (let's say for different database providers), and then there are high-level modules that need other child modules to work correctly, which has to be managed somehow.
Is there something we are missing? How do other companies manage these problems?
The Eclipse guys seem to use the “one feature per module” approach. Not surprisingly, since it’s the only one that works. But they don’t use target platforms nor product files.
The key to a successful grouping is when to use "includes" in features and when to just use dependencies. The difference is that "includes" are really included, i.e. p2 will install included bundles and/or included features all the time. That's the reason why you need to update a bundle in every feature if it's included. If you don't update it, you will end up with multiple versions in the install.
Also, in the old days one had to specify dependencies in features. These days, p2 will mostly figure out dependencies from the bundles. Thus, I would actually stop specifying dependencies in features and specify just includes. Think of features as a way to specify what gets aggregated.
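To illustrate the difference between "includes" and dependencies, here is a minimal feature.xml sketch. The feature ids reuse names from this thread, while the label and the version numbers are made up for the example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<feature id="my.product.base" label="My Product Base" version="1.0.0.qualifier">

   <!-- "includes": p2 always installs this nested feature together with the parent -->
   <includes id="my.product.base.dependencies" version="1.0.0.qualifier"/>

   <!-- Dependency only: must be resolvable at install time, any matching version will do.
        The answer suggests leaving such entries out and letting p2 derive them from the bundles. -->
   <requires>
      <import plugin="org.thirdparty.specific" version="1.0.0" match="greaterOrEqual"/>
   </requires>

   <!-- Bundles aggregated directly by this feature (assumed version) -->
   <plugin id="org.acme.core" download-size="0" install-size="0" version="1.2.0" unpack="false"/>

</feature>
```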
Another key point to grouping is: less is more. If you have as many features as bundles, chances are pretty high that you have a granularity issue. Instead, think about what a user would install separately. There is no need to have four features for things that a user would never install alone. Features should not be understood as a way of grouping development/project structures; that's where folders in SCM or different SCM repos are ok. Think of features as deployment structures.
With that approach, I would recommend a structure similar to the following example.
my.product.base: base feature containing the bare minimum of the product
    could be org.acme.core plus a few minimum
my.product.base.dependencies: features with 3rd party libraries for my.product.base
feature bundling an add-on
    separate features for things that can be installed separately
3rd party libraries for add-on dependencies
Now in the product definition I would list just my.product.base. There is no need to also list the dependencies features. p2 will fetch and install the dependencies automatically. However, if you want to bind your product to specific versions of the dependencies and don't want p2 to select any matching one, then you must include the my.product.base.dependencies feature.
In the target definition I would include a "my.product.sdk" feature. That feature is an aggregation feature of all other features. It makes target platform management easier. I typically create an sdk feature with everything.
Another feature that is also very often seen is a "master" feature. This is an "everything" feature that may be used for creating a p2 repository during the build. The resulting p2 repository is then used for assembling products.
For a more real world example see here:
Features and Continuous Delivery
There was a comment regarding frequent updates to feature.xml. A feature.xml only needs to be modified when there is a change in structure. No updates need to happen when the bundle version is modified. You should reference bundles in features with version 0.0.0. That makes Tycho fill in the proper version at build time. Thus, all you need to do is commit a change to any bundle and then kick off a rebuild. Tycho also takes care of updating the feature qualifier based on the qualifiers of the contained bundles. Thus, the new feature qualifier will be different from the one in a previous build.
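A feature.xml written this way might look like the sketch below. The feature id and label are illustrative; the bundle ids are the ones from the question.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<feature id="my.product.base" label="My Product Base" version="1.0.0.qualifier">

   <!-- version="0.0.0" is a placeholder: the build fills in the version
        of the bundle that was actually built or resolved -->
   <plugin id="org.acme.core" download-size="0" install-size="0" version="0.0.0" unpack="false"/>
   <plugin id="org.acme.module" download-size="0" install-size="0" version="0.0.0" unpack="false"/>

</feature>
```

With this layout the file only changes when a bundle is added to or removed from the feature, not on every bugfix release of org.acme.core.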

In cmake, what is a "project"?

This question is about the project command and, by extension, what the concept of a project means in cmake. I genuinely don't understand what a project is, and how it differs from a target (which I do understand, I think).
I had a look at the cmake documentation for the project command, and it says that the project command does this:
Set a name, version, and enable languages for the entire project.
It should go without saying that using the word project to define project is less than helpful.
Nowhere on the page does it seem to explain what a project actually is (it goes through some of the things the command does, but doesn't say whether that list is exclusive or not). The examples take us through a basic build setup, and while it uses the project keyword it also doesn't explain what it does or means, at least not as far as I can tell.
What is a project? And what does the project command do?
A project logically groups a number of targets (that is, libraries, executables and custom build steps) into a self-contained collection that can be built on its own.
In practice that means, if you have a project command in a CMakeLists.txt, you should be able to run CMake from that file and the generator should produce something that is buildable. In most codebases, you will only have a single project per build.
Note however that you may nest multiple projects. A top-level project may include a subdirectory which is in turn another self-contained project. In this case, the project command introduces additional scoping for certain values. For example, the PROJECT_BINARY_DIR variable will always point to the root binary directory of the current project. Compare this with CMAKE_BINARY_DIR, which always points to the binary directory of the top-level project. Also note that certain generators may generate additional files for projects. For example, the Visual Studio generators will create a .sln solution file for each subproject.
Use sub-projects if your codebase is very complex and you need users to be able to build certain components in isolation. This gives you a very powerful mechanism for structuring the build system. Due to the increased coding and maintenance overhead required to make the several sub-projects truly self-contained, I would advise to only go down that road if you have a real use case for it. Splitting the codebase into different targets should always be the preferred mechanism for structuring the build, while sub-projects should be reserved for those rare cases where you really need to make a subset of targets self-contained.

How to use or abuse artifact classifiers in maven?

We are currently attempting to port a very (very) large project built with ant to maven (while also moving to svn). All possibilities are being explored in remodeling the project structure to best fit the maven paradigm.
Now to be more specific, I have come across classifiers and would like to know how I could use them to my advantage, while refraining from "classifier anti-patterns".
classifier: You may occasionally find a fifth element on the
coordinate, and that is the classifier. We will visit the classifier
later, but for now it suffices to know that those kinds of projects
are displayed as groupId:artifactId:packaging:classifier:version.
The classifier allows to distinguish artifacts that were built from
the same POM but differ in their content. It is some optional and
arbitrary string that - if present - is appended to the artifact name
just after the version number. As a motivation for this element,
consider for example a project that offers an artifact targeting JRE
1.5 but at the same time also an artifact that still supports JRE 1.4. The first artifact could be equipped with the classifier jdk15 and the
second one with jdk14 such that clients can choose which one to use.
Another common use case for classifiers is the need to attach
secondary artifacts to the project's main artifact. If you browse the
Maven central repository, you will notice that the classifiers sources
and javadoc are used to deploy the project source code and API docs
along with the packaged class files.
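On the consuming side, picking one of those variants comes down to adding a classifier element to the dependency. The coordinates below are placeholders; only the jdk15 classifier is taken from the quoted text.

```xml
<dependency>
  <groupId>com.example</groupId>      <!-- placeholder coordinates -->
  <artifactId>some-library</artifactId>
  <version>1.0</version>
  <!-- Resolves some-library-1.0-jdk15.jar instead of some-library-1.0.jar -->
  <classifier>jdk15</classifier>
</dependency>
```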
I think the correct question would be "How to use or abuse attached artifacts in Maven?", because basically that is why classifiers were introduced: to allow you to publish attached artifacts.
Well, Maven projects often implicitly use attached artifacts, e.g. by using maven-javadoc-plugin or maven-source-plugin. maven-javadoc-plugin publishes an attached artifact that contains the generated documentation under the javadoc classifier, and maven-source-plugin publishes the sources under the sources classifier.
Now what about explicit usage of attached artifacts? I use attached artifacts to publish harness shell scripts ( and Co). It's also a good idea to publish SQL scripts in the attached artifact with a classifier sql or something like that.
How can you attach an arbitrary artifact with your own classifier? This can be done with build-helper-maven-plugin.
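A minimal sketch of such a configuration, using the attach-artifact goal of build-helper-maven-plugin. The file name schema.sql and the execution id are assumptions; the sql classifier follows the suggestion above.

```xml
<!-- Inside <build><plugins> of the project POM. -->
<plugin>
  <groupId>org.codehaus.mojo</groupId>
  <artifactId>build-helper-maven-plugin</artifactId>
  <executions>
    <execution>
      <id>attach-sql</id> <!-- hypothetical id -->
      <phase>package</phase>
      <goals>
        <goal>attach-artifact</goal>
      </goals>
      <configuration>
        <artifacts>
          <artifact>
            <!-- Assumed to have been generated or copied here earlier in the build -->
            <file>${project.build.directory}/schema.sql</file>
            <type>sql</type>
            <classifier>sql</classifier>
          </artifact>
        </artifacts>
      </configuration>
    </execution>
  </executions>
</plugin>
```

The attached file is then installed and deployed together with the main artifact, under the same groupId, artifactId and version.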
... I would like to know how I could use them to my advantage ...
Don't use them. They are optional and arbitrary.
If you are in the middle of porting a project over to maven, keep things simple and only do what is necessary (at first) to get everything working as you'd like. Then, after things are working like you want, you can explore more advanced features of maven to do cool stuff.
This answer is based on your question sounding like a "This features sounds neat, how can I use it even though I don't have a need for it?" kind of question. If you have a need for this feature, please update your question with more information on how you were thinking of utilizing the classifier feature and we will all be more informed to help you.
In contrast to Jesse Web's answer, it is good to learn about classifiers so that you can leverage them and avoid having to refactor code in addition to porting to maven. We went through the same process a year or two ago. Previously we had everything in one code base and built it all together with ant. In migrating to maven, we also found the need to break out the various components into their own maven projects. Some of these projects were really libraries, but had some web resources (jsp, js, images, etc.). The end result was us creating an attached artifact (as mentioned by @Male) with the web resources, using the classifier "web-resources" and the type "war" (to use as an overlay). This was then, and still is now that we understand maven better, the best solution for porting an old, coupled project. We eventually want to separate out these web resources since they don't belong in this library, but at least that can be done as a separate task.
In general, you want to avoid having attached artifacts. This is typically a sign that a separate project should be created to build that artifact. I suggest looking at doing this anytime you are tempted to attach an artifact with a separate classifier.
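For reference, consuming the "web-resources" artifact described in this answer from a war project might look roughly like this. The coordinates are placeholders; the type and classifier are the ones mentioned above.

```xml
<!-- Inside <dependencies> of the web application (packaging war). -->
<dependency>
  <groupId>com.example</groupId>       <!-- placeholder coordinates -->
  <artifactId>some-library</artifactId>
  <version>1.0</version>
  <!-- A dependency of type war is applied as an overlay by maven-war-plugin -->
  <type>war</type>
  <classifier>web-resources</classifier>
</dependency>
```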
I use classifiers to define supporting artefacts to the main artefact.
For example I have foo-1.0.war and have some associated config called
You can use classifiers when you have different versions of the same artifact that you want to deploy to your repository.
Here's a use case:
I use them in conjunction with properties in a pom. The pom has default values which can be overridden via the command line. Running without options uses the default property value. If I build a version of the artifact with different property values, I can deploy that to the repo with a classifier.
For example, the command:
mvn -DmyProperty=specialValue package install:install-file -Dfile=target/my-ear.ear -DpomFile=my-ear/pom.xml -Dclassifier=specialVersion
Builds a version of an ear artifact with special properties and deploys the artifact to my repo with a classifier "specialVersion".
So, my repo can have my-ear-1.0.0.ear and my-ear-1.0.0-specialVersion.ear.

Using maven to create two artifacts with overlapping classes

I have a maven pom that creates an artifact, let's call it everything.jar.
I would like to copy a subset of the classes in everything.jar into another jar, let's call it mini.jar.
What's the best way to structure my maven pom(s) to produce two jar files, one called mini.jar with just a few classes, and the other everything.jar with everything in mini plus some additional classes, without actually making copies of the source?
I'd do it the other way around.
Create a multi-module project:
   /    |      \
mini  extra  everything
mini contains the core stuff
extra has a dependency to mini and defines the additional classes
everything has a dependency to both and uses the maven-shade-plugin to create a combined jar from the two other projects (you can also do that from inside the extra project, but I'd call that less elegant)
shade:shade mojo
Selecting Contents for Uber JAR
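A sketch of what the POM of the everything module could look like. The com.example groupId, the parent artifactId and the version are placeholders, and the plugin version is left out for brevity (a real build should pin one).

```xml
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>

  <parent>
    <groupId>com.example</groupId>      <!-- placeholder coordinates -->
    <artifactId>parent</artifactId>
    <version>1.0</version>
  </parent>
  <artifactId>everything</artifactId>

  <dependencies>
    <dependency>
      <groupId>com.example</groupId>
      <artifactId>mini</artifactId>
      <version>${project.version}</version>
    </dependency>
    <dependency>
      <groupId>com.example</groupId>
      <artifactId>extra</artifactId>
      <version>${project.version}</version>
    </dependency>
  </dependencies>

  <build>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-shade-plugin</artifactId>
        <executions>
          <execution>
            <phase>package</phase>
            <goals>
              <goal>shade</goal>
            </goals>
            <configuration>
              <!-- Merge only the two sibling modules into the combined jar -->
              <artifactSet>
                <includes>
                  <include>com.example:mini</include>
                  <include>com.example:extra</include>
                </includes>
              </artifactSet>
            </configuration>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>
</project>
```

The everything module has no sources of its own here; its jar consists of the classes of mini and extra, so nothing is copied at the source level.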

How to find unnecessary dependencies in a maven multi-project?

If you are developing a large evolving multi-module maven project it seems inevitable that there are some dependencies given in the poms that are unnecessary, since they are transitively included by other dependencies. For example this happens if you have a module A that originally includes C. Later you refactor and have A depend on a module B which in turn depends on C. If you are not careful enough you'll wind up with both B and C in A's dependency list. But of course you do not need to put C into A's pom, since it is included transitively anyway. Is there a tool to find such unnecessary dependencies?
(These dependencies do not actually hurt, but they might obscure your actual module structure and having less stuff in the pom is usually better. :-)
To some extent you can use dependency:analyze, but it's not too helpful. Also check JBoss Tattletale.
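Besides running mvn dependency:analyze by hand, the analysis can be bound to the build. The fragment below is a sketch using the analyze-only goal of maven-dependency-plugin; the execution id is arbitrary. The goal reports "used undeclared" and "unused declared" dependencies based on bytecode analysis, so anything loaded only through reflection is not seen.

```xml
<!-- Inside <build><plugins> of the parent or module POM. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-dependency-plugin</artifactId>
  <executions>
    <execution>
      <id>analyze-dependencies</id> <!-- arbitrary id -->
      <phase>verify</phase>
      <goals>
        <goal>analyze-only</goal>
      </goals>
      <configuration>
        <!-- Only report for now; switch to true to break the build on findings -->
        <failOnWarning>false</failOnWarning>
      </configuration>
    </execution>
  </executions>
</plugin>
```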
Some time ago I started a maven-storyteller-plugin to be able to analyze the poms more deeply, but the project is very far from production/public use. You can use the storyteller:recount goal to analyze the unused/redundant dependencies.
The problem with the whole story is how to determine "unused" things. What is quite possible to analyze is, for instance, class references. But that won't work if you're using reflection, directly or indirectly.
Update November 2014.
I've just moved my old code of the Storyteller plugin to GitHub. I'll refresh it and release to the central so that it's usable for others.
Personally, I use the pom editor of M2Eclipse to view the dependency tree visually (2D tree). Then I take a look at the lib directories of my deliverable (war, ear). Then, still in the M2Eclipse pom dependencies viewer, I go to every 3rd party library and right-click on the dependency I want to exclude (an exclusion is added automatically to the right dependency).
There are no golden rules, simply some basic tips:
A lot of poms are not correct: many 3rd party libs out there declare far too many dependencies in the default compile scope. If everybody crafted their poms carefully, you would not have so many unwanted dependencies.
You need to guess from the names of the dependencies what you will have to exclude. The best examples are parsers, transformers and document builders: xalan, xerces, xalan alfred and co. Try to remove them and use the internal JDK 1.6 parser. Common Apache stuff and log4j are also worth looking at.
Also look regularly at the delivered lib directory to check that you do not have duplicate libraries with different versions (the dependency resolver of maven should avoid that).
Go bottom up: start with your common modules, then go up to the service layer, trimming down dependencies in every module. Don't try to start in the ear/war modules; it will be too difficult.
Check often that your deliverables are still working, by either testing or comparing an old deliverable with the new one (especially what has disappeared from the WEB-INF/lib directory, using WinMerge/Beyond Compare).
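The exclusion that the M2Eclipse editor generates is an ordinary exclusions entry in the pom. As an illustration, excluding a bundled xerces parser from a third-party library could look like this; the com.thirdparty:some-lib coordinates are made up.

```xml
<dependency>
  <groupId>com.thirdparty</groupId>   <!-- made-up third-party library -->
  <artifactId>some-lib</artifactId>
  <version>1.0</version>
  <exclusions>
    <!-- Keep this transitive dependency out of the deliverable -->
    <exclusion>
      <groupId>xerces</groupId>
      <artifactId>xercesImpl</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```

Exclusions apply per declared dependency, so the same entry has to be repeated for every library that drags the unwanted artifact in.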
Say you have A -> B, B -> C, and then refactor such that A -> (B, C). If it is the case that A still compiles against B, you very much don't want to simply pick up the dependency because you receive it transitively.
Think of the case when A -> (B-1.0, C-1.0), B-1.0 -> C-1.0. Everything's in sync, so to avoid "duplication" you remove C from A's dependency. Then you upgrade A to use B-2.0 -> C-2.0. You begin to see errors because A wants C-1.0 classes but found C-2.0 classes. While quickly reconcilable in this scenario, it is far less so when you have lots of dependencies.
You very much want the information in A's pom that says that it explicitly expects to find C-1.0 on the classpath so that you can understand when you have transitive dependency conflicts. Again, Maven will do the job of ensuring that the "closest" version of any particular jar ends up on your classpath. But when things go wrong, you want all the dependency metadata you can get.
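In pom terms, the situation after the upgrade would look roughly like the fragment below. The com.example groupId and the artifact ids B and C are placeholders that mirror the letters used above.

```xml
<!-- Inside <dependencies> of A's POM. -->
<dependency>
  <groupId>com.example</groupId>  <!-- placeholder coordinates -->
  <artifactId>B</artifactId>
  <version>2.0</version>          <!-- B 2.0 itself depends on C 2.0 -->
</dependency>
<dependency>
  <groupId>com.example</groupId>
  <artifactId>C</artifactId>
  <!-- Declared directly, so it is "nearer" than the C 2.0 that B brings in
       and this is the version that ends up on A's classpath -->
  <version>1.0</version>
</dependency>
```

Running mvn dependency:tree on A then shows the C 2.0 coming from B as omitted in favour of the directly declared version, which is exactly the conflict information the explicit declaration preserves.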
On a slightly more practical note, a dependency is unused when you can remove it from your pom and all of your unit/integration/acceptance tests still pass. ;-)