How to find unnecessary dependencies in a maven multi-project? - maven-2

If you are developing a large, evolving multi-module Maven project, it seems inevitable that some of the dependencies given in the POMs are unnecessary, since they are included transitively by other dependencies. For example, this happens if you have a module A that originally includes C. Later you refactor and have A depend on a module B which in turn depends on C. If you are not careful enough, you'll wind up with both B and C in A's dependency list. But of course you do not need to put C into A's POM, since it is included transitively anyway. Is there a tool to find such unnecessary dependencies?
(These dependencies do not actually hurt, but they might obscure your actual module structure, and having less stuff in the POM is usually better. :-)

To some extent you can use dependency:analyze, but it's not too helpful. Also check JBoss Tattletale.
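For reference, here is a minimal sketch of wiring the analyzer into a build so it runs automatically; the plugin version, phase binding, and failOnWarning setting shown are illustrative, not prescriptive:

    <build>
      <plugins>
        <plugin>
          <groupId>org.apache.maven.plugins</groupId>
          <artifactId>maven-dependency-plugin</artifactId>
          <version>2.8</version>
          <executions>
            <execution>
              <id>analyze-deps</id>
              <phase>verify</phase>
              <goals>
                <goal>analyze-only</goal>
              </goals>
              <configuration>
                <!-- fail the build on used-but-undeclared or declared-but-unused dependencies -->
                <failOnWarning>true</failOnWarning>
              </configuration>
            </execution>
          </executions>
        </plugin>
      </plugins>
    </build>

Running mvn verify then prints the "used undeclared" and "unused declared" dependency lists for each module.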
Some time ago I started a maven-storyteller-plugin to be able to analyze the POMs more deeply, but the project is very far from production/public use. You can use the storyteller:recount goal to analyze the unused/redundant dependencies.
The problem with the whole story is how to determine "unused" things. What is quite possible to analyze is, for instance, class references. But that won't work if you're using reflection, whether directly or indirectly.
Update November 2014.
I've just moved my old code for the Storyteller plugin to GitHub. I'll refresh it and release it to Central so that it's usable for others.

I personally use the POM editor of M2Eclipse to visually view the dependency tree (2D tree). Then I take a look in my deliverable's (war, ear) lib directories. Then, still in the M2Eclipse POM dependencies viewer, I go to every 3rd party artifact and right-click on the dependency I want to exclude (an exclusion is added automatically to the right dependency).
There are no golden rules, just some basic tips:
a lot of POMs out there are not correct: many 3rd party libs require way too many dependencies in the default compile scope; if everybody carefully crafted their POM, you would not have so many unwanted dependencies.
you need to guess from the names of dependencies what you will have to exclude; the best examples are parsers, transformers, document builders: xalan, xerces, and co. Try to remove them and use the internal JDK 1.6 parser and common Apache stuff; log4j is also worth looking at.
also look regularly in the delivered lib directory to check that you do not have duplicate libraries with different versions (Maven's dependency resolver should avoid that)
go bottom-up: start with your common modules, then go up to the service layer, trimming down dependencies in every module; don't try to start with the ear/war modules, it will be too difficult
check often that your deliverables still work, either by testing or by comparing an old deliverable with the new one (especially what has disappeared in the WEB-INF/lib directory, using WinMerge/Beyond Compare)

When you have A -> B and B -> C, and you then refactor such that A -> (B, C): if it is the case that A still compiles directly against classes from C, you very much don't want to simply drop the declaration and pick the dependency up transitively.
Think of the case where A -> (B-1.0, C-1.0) and B-1.0 -> C-1.0. Everything's in sync, so to avoid "duplication" you remove C from A's dependencies. Then you upgrade A to use B-2.0, which depends on C-2.0. You begin to see errors because A wants C-1.0 classes but finds C-2.0 classes. While quickly reconcilable in this scenario, it is far less so when you have lots of dependencies.
You very much want the information in A's POM that says it explicitly expects to find C-1.0 on the classpath, so that you can understand when you have transitive dependency conflicts. Again, Maven will do the job of ensuring that the "closest" version of any particular jar ends up on your classpath. But when things go wrong, you want all the dependency metadata you can get.
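For illustration, A's POM would keep both declarations (the coordinates are hypothetical):

    <dependencies>
      <!-- A compiles against both B and C, so both are declared,
           even though B-1.0 would drag C-1.0 in transitively -->
      <dependency>
        <groupId>com.example</groupId>
        <artifactId>B</artifactId>
        <version>1.0</version>
      </dependency>
      <dependency>
        <groupId>com.example</groupId>
        <artifactId>C</artifactId>
        <version>1.0</version>
      </dependency>
    </dependencies>

When B is later upgraded to 2.0, running mvn dependency:tree on A helps make the C-1.0/C-2.0 conflict visible instead of letting the version switch silently.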
On a slightly more practical note, a dependency is unused when you can remove it from your pom and all of your unit/integration/acceptance tests still pass. ;-)

Related

How to Group Plug-ins into Features

We are struggling hard with how to use features the correct way.
Let’s say we have the plug-in org.acme.module which depends on org.thirdparty.specific and org.acme.core.
And we have the plug-in org.acme.other which depends on org.acme.core.
We want to create an application from these, which includes a target file and a product file. We have the following options:
One feature per module:
org.acme.core.feature
    org.acme.core
org.acme.module.feature
    org.acme.module
org.acme.other.feature
    org.acme.other
org.thirdparty.specific.feature
    org.thirdparty.specific
This makes the target and product files gigantic, and the dependencies are very hard to manage manually.
One feature per dependency group:
org.acme.module.feature
    org.acme.core
    org.acme.module
    org.thirdparty.specific
org.acme.other.feature
    org.acme.core
    org.acme.other
This approach makes the dependencies very easy to manage, and the target and product files are easy to read and maintain. However it does not work at all. The moment org.acme.core changes, you need to change ALL the features. Furthermore, the application has no say in what to package, so it can’t even decide to update org.acme.core (because of a bugfix or something).
Platform Feature:
org.acme.platform.feature
    org.acme.core
    org.acme.other
    org.thirdparty.specific (but could be its own feature)
org.acme.module.feature
    org.acme.module
This is the approach used for Hello World applications and Eclipse add-ons - and it only works for those. Since all modules' target platforms would point to org.acme.platform.feature, every time anything changes for any platform plug-in, you'd have to update org.acme.platform.feature accordingly.
We actually tried that approach with only about 50 platform plug-ins. It's not feasible to have a developer change the feature for every bugfix. (And while Tycho supports version "0.0.0", Eclipse does not, so using that opens another bag of problems. Also, we need reproducibility, so having PDE choose versions willy-nilly is out of the question.)
Again, it all comes down to "I can't use org.acme.platform.feature and override org.acme.core's version for two weeks until the new feature gets released."
The entire problem is made even more difficult since sometimes more than one configuration of plug-ins is possible (say, for different database providers), and then there are high-level modules using other child modules to work correctly, which also has to be managed somehow.
Is there something we are missing? How do other companies manage these problems?
The Eclipse guys seem to use the “one feature per module” approach. Not surprisingly, since it’s the only one that works. But they use neither target platforms nor product files.
The key to a successful grouping is when to use "includes" in features and when to just use dependencies. The difference is that "includes" are really included, i.e. p2 will install included bundles and/or included features all the time. That's the reason why you need to update a bundle in every feature if it's included. If you don't update it, you will end up with multiple versions in the install.
Also, in the old days one had to specify dependencies in features. These days, p2 will mostly figure out dependencies from the bundles. Thus, I would actually stop specifying dependencies in features and just use includes. Think of features as a way to specify what gets aggregated.
Another key to grouping is: less is more. If you have as many features as bundles, chances are pretty high that you have a granularity issue. Instead, think about what a user would install separately. There is no need to have four features for things that a user would never install alone. Features should not be understood as a way of grouping development/project structures; that's what folders in SCM or different SCM repos are for. Think of features as deployment structures.
With that approach, I would recommend a structure similar to the following example.
my.product.base
    base feature containing the bare minimum of the product
    could be org.acme.core plus a few essentials
my.product.base.dependencies
    feature with the 3rd party libraries for my.product.base
my.addon.xyz
    feature bundling an add-on
    separate features for things that can be installed separately
my.addon.xyz.dependencies
    feature with the 3rd party libraries for the add-on
Now in the product definition I would list just my.product.base. There is no need to also list the dependencies features. p2 will fetch and install the dependencies automatically. However, if you want to bind your product to specific versions of the dependencies and don't want p2 to select any matching one, then you must include the my.product.base.dependencies feature.
In the target definition I would include a "my.product.sdk" feature. That feature is an aggregation feature of all other features. It makes target platform management easier. I typically create an sdk feature with everything.
Another feature that is also very often seen is a "master" feature. This is an "everything" feature that may be used for creating a p2 repository during the build. The resulting p2 repository is then used for assembling products.
For a more real world example see here:
http://git.eclipse.org/c/gyrex/gyrex-server.git/tree/releng/features
Features and Continuous Delivery
There was a comment regarding frequent updates to feature.xml. A feature.xml only needs to be modified when there is a change in structure. No update needs to happen when a bundle version is modified. You should reference bundles in features with version 0.0.0. That makes Tycho fill in the proper version at build time. Thus, all you need to do is commit a change to any bundle and then kick off a rebuild. Tycho also takes care of updating the feature qualifier based on the qualifiers of the contained bundles. Thus, the new feature qualifier will be different from that of the previous build.
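To make this concrete, here is a minimal sketch of two such feature.xml files, reusing the names from above (labels and the size attributes are illustrative):

    <feature id="my.product.base" label="Product Base" version="1.0.0.qualifier">
       <!-- bundle referenced with version 0.0.0; Tycho fills in the real version at build time -->
       <plugin id="org.acme.core" version="0.0.0" download-size="0" install-size="0" unpack="false"/>
    </feature>

    <feature id="my.product.sdk" label="Product SDK" version="1.0.0.qualifier">
       <!-- includes are really included: p2 installs these features whenever the sdk feature is installed -->
       <includes id="my.product.base" version="0.0.0"/>
       <includes id="my.addon.xyz" version="0.0.0"/>
    </feature>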

IntelliJ multi-project

Moving to IntelliJ, I'm trying to properly understand the logic behind its project structure. I come from Eclipse. After reading for a while, I understood the relation between a workspace and a project, and then between a project and its modules. However, something that is puzzling me is the logic of the default project configuration in IntelliJ. Indeed, when you create a project there is an initial module which, to a certain extent, is equivalent to the project itself. To be more precise, the initial module folder is the project folder. This is kind of confusing to me. Then, when you add more modules, they are sub-modules of that module.
My first question is: what is the rationale for making this first module equivalent to the project folder?
Following on from this, what is the point of having modules as sub-modules of other modules?
In Eclipse I simply used to have different projects (i.e. modules), independent from each other, adding dependencies as necessary. So how does the IDEA approach make this better, or if it doesn't, what is the rationale here?
I saw that one can start with an empty project and then add modules to it. However, in that case the modules are added as subfolders of the project, and therefore there is no initial module equivalent to the project folder. So why this difference, and what is the rationale behind it?
Which would be the better approach, the first or the second?
Would it be OK to have this first initial module with no src or test folder, but just with the proper facets, so as to spread them to the sub-modules?
I would appreciate it if someone could explain the rationale behind all of this.
I will move to SBT soon (i.e. the Maven structure, which I suppose inspired all modern IDE project structures); if someone wants to explain within that context, fine. Nevertheless, I want to understand the rationale in IntelliJ first.
Many thanks,
-M-
PS: What I'm looking for is some advice on multi-module project structure in IntelliJ, as I'm moving my Eclipse workspaces over to it.
I think that it's not uncommon for projects to be relatively small, so that they don't need fancy modules with dependency management etc. In that case, I find that the default project created by IntelliJ fits my needs perfectly: no need to add submodules, everything is directly in the parent project, and it reduces the structure to its bare minimum.
On the other hand, big projects with submodules will likely resemble the structure of a Maven multimodule project (perhaps SBT too, but I don't know this tool at all). You have a parent root which acts as a container for submodules. The parent project may also store configuration (a default SDK, a language level etc. that will be inherited by the submodules). The actual code will be contained in the submodules.
Regarding your questions, it all depends on the kind of project you are developing. For a small codebase, you could keep a simple project with no submodule. For bigger codebases, you can either create modules manually, or import an existing Maven/SBT/whatever project, which will automatically create modules reflecting the imported structure.

maven dependency clash

In my project, there are 2 libraries, each of which depends on the XML parsing class javax.xml.parsers.DocumentBuilderFactory. Each of these libraries references the class from a different jar (one gets it from a jar called xmlParserAPIs while the other gets it from xml-apis-1.0.b2.jar). Unfortunately there are different versions of the class in each of these files, so I am seeing runtime errors depending on the order in which they are loaded. Both of these XML jars are transitive dependencies of 3rd party libraries. Is there a good way to handle this conflict?
edit: I'm not sure if it makes a difference in how to handle the problem, but this only happens in testing, because one of the dependencies is in the test scope.
thanks,
jeff
(...) Unfortunately there are different versions of the class in each of these files so I am seeing runtime errors, depending on the order they are loaded.
In theory, xml-apis.jar and xmlParserAPIs.jar (from xerces2-j) are the same JAR under different names, xmlParserAPIs.jar having been deprecated for years (see this message and this one).
If your dependencies rely on different and incompatible versions of xml-apis.jar, I would say that these dependencies are mutually exclusive, in other words incompatible, at least for the versions you're using. The only solution would be to find versions with a converging dependency.
In case they can use compatible versions, use a dependency exclusion for xmlParserAPIs.jar so that only xml-apis.jar is used.
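For example, a sketch of such an exclusion (the outer artifact some.group:third-party-lib is hypothetical; xmlParserAPIs is commonly published under the xerces group id):

    <dependency>
      <groupId>some.group</groupId>
      <artifactId>third-party-lib</artifactId>
      <version>1.0</version>
      <exclusions>
        <!-- keep only xml-apis.jar on the classpath -->
        <exclusion>
          <groupId>xerces</groupId>
          <artifactId>xmlParserAPIs</artifactId>
        </exclusion>
      </exclusions>
    </dependency>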
I'm not sure if it makes a difference on how to handle the problem but this only happens in testing because one of the dependencies is in the test scope.
No, this just explains why you don't see the problem at runtime (because the test-scoped dependency is not on the classpath then and thus, obviously, doesn't conflict).

In a Maven project, what are reasons for either a nested or a flat directory layout?

As my Maven project grows, I'm trying to stay on top of the project structure. So far, I have a nested directory layout with 2-3 levels, where there's a POM on each level with module entries corresponding to the directories at that level. POM inheritance (parent property) does not necessarily follow this, and is not relevant for the purpose of this question.
Now, while the nested structure seems pretty natural to Maven, and it's nice and clean as long as you are on one particular level, I'm starting to get confused by what I look at in my IDE (Eclipse and IntelliJ IDEA).
I had a look at the Apache Felix sources, and they have a pretty complex project in what seems to be a flat directory structure, so I'm wondering if this would be a better way to go.
What are some pros and cons for either approach that you have experienced in practice?
Note that this question (which I found meanwhile) seems to be very similar. I'll leave it to the community to decide whether this should be closed as a duplicate.
I use a kind of mixed approach. Things with distinct lifecycle (from a release and thus VCS point of view) are flat, things with the same lifecycle are nested. And I use svn:externals for the checkout. I wrote about this approach in this previous answer.
I vote for nesting. I'm using IDEA 9 which shows the nesting in the project pane, so the presentation mirrors your logical project structure. (This wasn't the case in 8.1 - it was flattened out.)
I prefer keeping things nested, especially if the names are very similar; it makes navigation much easier when using a command prompt. I have a project with names like myapp-layer-component, so they all start with the same prefix, and many share the same -layer-, so using autocomplete on the command line is next to useless. Separating these out into a nested structure makes things much easier, because each part of the name (appname, layer, or component) is repeated just once at each level in the directory structure.
If building from the command line, it's much easier to build a subset of the project; e.g. if I'm working on the db model, I often need to build all the projects in that area. This is tricky to do when the files are flattened out: the only way I know is to use Maven's -pl argument and list the projects to build. With the nested directories, I just cd to the db directory and run mvn.
For example, instead of
myapp-web-gui1
myapp-web-gui2
myapp-web-base
myapp-svc-clustered
myapp-svc-clustered-integrationtest
myapp-svc-simple
myapp-db-model
myapp-db-hibernate
We have the structure
\myapp
    \web
        \gui1
            pom.xml
        \gui2
            pom.xml (other poms omitted to keep it short)
        \base
    \svc
        \clustered
        \clustered-it
        \simple
    \db
        \model
        \hibernate
You could also add nesting for the integration tests, but this seems like driving the point too far.
With nesting, you also get all the benefits of inheritance (and some of its pains...)
The only issue I've had with this is that the directory name doesn't match the artifactId. (I'm still using full artifactIds.) And so each project must explicitly define its SCM paths, since these can no longer be inferred from the parent POM. Of course, each directory could be named the same as its artifactId, and then the SCM details could be inferred from the parent, but I found the long directory names a bit unwieldy.
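For illustration, the aggregator pom.xml in the \db directory would look roughly like this (the group id is hypothetical); note that the module entries name directories, not artifactIds, which is exactly why the SCM paths above have to be spelled out:

    <project xmlns="http://maven.apache.org/POM/4.0.0">
      <modelVersion>4.0.0</modelVersion>
      <groupId>com.example.myapp</groupId>
      <artifactId>myapp-db</artifactId>
      <version>1.0-SNAPSHOT</version>
      <packaging>pom</packaging>
      <!-- directory names, not the full artifactIds myapp-db-model / myapp-db-hibernate -->
      <modules>
        <module>model</module>
        <module>hibernate</module>
      </modules>
    </project>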

Maven best practice for generating multiple jars with different/filtered classes?

I developed a Java utility library (similarly to Apache Commons) that I use in various projects.
In addition to fat clients, I also use it for mobile clients (PDA with J9 Foundation profile).
Over time, the library that started as a single project spread over multiple packages. As a result, I end up with a lot of functionality which is not really needed in all the projects.
Since this library is also used in some mobile/PDA projects, I need a way to collect just the used classes and generate the actual specialized jars.
Currently in the projects that are using this library, I have Ant jar tasks that generate (from the utility project) the specialized jar files (ex: my-util-1.0-pda.jar, my-util-1.0-rcp.jar) using include/exclude jar task features. This is mostly needed due to the size constraints on the generated jar file, for the mobile projects.
Migrating now to Maven I just wonder if there are any best practices to arrive to something similar. I consider the following scenarios:
[1] - in addition to the main jar artifact (my-lib-1.0.jar), also generate inside the my-lib project the separate/specialized artifacts using classifiers (ex: my-lib-1.0-pda.jar), using Maven Jar Plugin or Maven Assembly Plugin filtering/includes. I'm not very comfortable with this approach, since it pollutes the library with the library consumers' demands (filters).
[2] - create additional Maven projects for all the specialized clients/projects that will "wrap" my-lib and generate the filtered jar artifacts (ex: my-lib-wrapper-pda-1.0, etc.). As a result, these wrapper projects will contain the filtering (to generate the filtered artifact) and will depend just on the my-lib project, and the client projects will depend on my-lib-wrapper-xxx-1.0 instead of my-lib-1.0. This approach may look problematic since, even though it leaves the my-lib project intact (with no additional classifiers and artifacts), it basically doubles the number of projects: for every client project I'll have one lib project just to collect the needed classes from the my-util library (a my-pda-app project will need a my-lib-wrapper-for-my-pda-app project/dependency).
[3] - in every client project that uses the library (ex: my-pda-app), add some specialized Maven plugins to trim out (when generating the final artifact/package) the classes that are not required (ex: maven-assembly-plugin, maven-jar-plugin, proguard-maven-plugin).
What is the best practice for solving this kind of problem the "Maven way"?
The general Maven rule is "one primary artifact per POM", for the sake of modularity, and the reasons one shouldn't break this convention (in general) are very well explained in the How to Create Two JARs from One Project (...and why you shouldn’t) blog post. There are however justified exceptions (for example an EJB project producing an EJB JAR and a client EJB JAR with only interfaces). Having said that:
The mentioned blog post (also check Using Maven When You Can't Use the Conventions) explains how you could implement Option 1 using separate profiles or the JAR plugin. If you decide to implement this solution, keep in mind that this should be an exception and that it might make dependency management trickier (and, as you mentioned, pollute the project with "client filtering logic"). Just in case, I would use several JAR plugin executions here.
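A rough sketch of such multiple executions (the pda classifier and the package filter are invented for illustration):

    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-jar-plugin</artifactId>
      <executions>
        <!-- the default jar is still built; this execution additionally
             attaches my-lib-1.0-pda.jar with a trimmed-down class set -->
        <execution>
          <id>pda-jar</id>
          <phase>package</phase>
          <goals>
            <goal>jar</goal>
          </goals>
          <configuration>
            <classifier>pda</classifier>
            <includes>
              <include>com/example/mylib/core/**</include>
            </includes>
          </configuration>
        </execution>
      </executions>
    </plugin>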
Option 2 isn't very different from Option 1 IMO (except that it separates things): basically, having N other wrapping/filtering projects is very similar to having N filtering rules in one project. And if filtering makes sense, I prefer Option 1.
I don't like Option 3 at all because I think it shouldn't be the responsibility of a client of a library to "trim out" unwanted things. First, a client project doesn't necessarily have the required knowledge (what to trim) and, second, this might create a big mess with other plugins.
BUT if the fat clients are not using the whole of my-lib (in the way server-side code would require the whole EJB JAR), then filtering isn't the right "Maven way" to handle your situation. The right way would be Option 4: put everything common into one project (producing my-lib-core-1.0.jar) and the specific parts into specific projects (which will produce my-lib-pda-1.0.jar, etc.). Clients would then depend on the core artifact plus the specialized ones.
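With Option 4, a client such as my-pda-app would then simply declare (hypothetical coordinates):

    <dependencies>
      <dependency>
        <groupId>com.example</groupId>
        <artifactId>my-lib-core</artifactId>
        <version>1.0</version>
      </dependency>
      <!-- only the PDA-specific part, instead of a filtered fat jar -->
      <dependency>
        <groupId>com.example</groupId>
        <artifactId>my-lib-pda</artifactId>
        <version>1.0</version>
      </dependency>
    </dependencies>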