maven dependency clash - maven-2

In my project, there are 2 libraries, each of which depends on the XML parsing class javax.xml.parsers.DocumentBuilderFactory. Each library loads this class from a different jar (one gets it from a jar called xmlParserAPIs while the other gets it from xml-apis-1.0.b2.jar). Unfortunately there are different versions of the class in each of these jars, so I am seeing runtime errors depending on the order in which they are loaded. Both of these XML jars are transitive dependencies of 3rd party libraries. Is there a good way to handle this conflict?
edit: I'm not sure if it makes a difference in how to handle the problem, but this only happens in testing because one of the dependencies is in the test scope.
thanks,
jeff

(...) Unfortunately there are different versions of the class in each of these jars, so I am seeing runtime errors depending on the order in which they are loaded.
In theory, xml-apis.jar and xmlParserAPIs.jar (from xerces2-j) are the same JAR under different names, xmlParserAPIs.jar having been deprecated for years (see this message and this one).
If your dependencies rely on different and incompatible versions of xml-apis.jar, I would say that these dependencies are mutually exclusive, in other words incompatible, at least in the versions you're using. The only solution would be to find versions that converge on a compatible dependency.
In case they can use compatible versions, declare a dependency exclusion for xmlParserAPIs.jar so that only xml-apis.jar is used.
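For illustration, here is a minimal sketch of such an exclusion, applied to whichever dependency drags the jar in (the com.thirdparty coordinates are hypothetical; mvn dependency:tree will show you the real culprit, and this assumes the jar is published under the xerces group as xerces:xmlParserAPIs):

    <dependency>
        <groupId>com.thirdparty</groupId>        <!-- hypothetical 3rd party library -->
        <artifactId>some-library</artifactId>
        <version>1.0</version>
        <exclusions>
            <exclusion>
                <groupId>xerces</groupId>
                <artifactId>xmlParserAPIs</artifactId>
            </exclusion>
        </exclusions>
    </dependency>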
I'm not sure if it makes a difference in how to handle the problem, but this only happens in testing because one of the dependencies is in the test scope.
No, this just explains why you don't get the problem at runtime: the test-scoped dependency is not on the classpath then and, obviously, cannot conflict.

Related

Difference between yarn/npm5 lockfiles and exact package versions?

My simple question is: why can't I just use exact versions in my package.json? How is this different from a lockfile?
The main difference is that lockfiles also lock nested dependencies - all of the dependencies of your dependencies, and so on. Managing and tracking all of those changes can be incredibly difficult, and the number of packages that are used can grow exponentially.
There are also situations where you cannot manually specify that a particular version of a package should be used - consider 2 libraries that specify foo at ~1.0.0 and ~2.0.0 respectively. The difference in major version tells us that the API of foo#v1 is not going to match the API of foo#v2, so there's no way you could override the package version at your app level without causing conflicts and failures.
Finally, you might wonder "why have semver at all then? Why not just have all packages manually specify the exact version of their dependencies?" One of the main advantages of semver is it means you don't have to update every dependency in the tree whenever a sub-dependency updates. If I rely on foo, and foo relies on bar, and bar just had a critical bug that was patched, and we're using exact versions for everything, then foo must also be updated before I can get the fix. If foo and bar have different maintainers, or if foo is abandoned, that could take a while and I may need to fork the project (something I've done more than once in Java-land).
This is very useful for maintaining ecosystems of libraries because it fundamentally reduces the amount of maintenance work required per node in the dependency tree, making it easier to extract libraries and patterns. I once had an early project where we were building a component library that used exact versions, and any time the core library containing shared functionality was updated, we had to submit a PR to each of the other packages to update the version, and sometimes follow-up PRs to components that depended on those. Needless to say, we consolidated the packages after a few months.
Hope that helps!

How to Group Plug-ins into Features

We are struggling hard with how to use features the correct way.
Let’s say we have the plug-in org.acme.module which depends on org.thirdparty.specific and org.acme.core.
And we have the plug-in org.acme.other which depends on org.acme.core.
We want to create an application from these, which includes a target file and a product file. We have the following options:
One feature per module:
org.acme.core.feature
    org.acme.core
org.acme.module.feature
    org.acme.module
org.acme.other.feature
    org.acme.other
org.thirdparty.specific.feature
    org.thirdparty.specific
This makes the target and product files gigantic, and the dependencies are very hard to manage manually.
One feature per dependency group:
org.acme.module.feature
    org.acme.core
    org.acme.module
    org.thirdparty.specific
org.acme.other.feature
    org.acme.core
    org.acme.other
This approach makes the dependencies very easy to manage, and the target and product files are easy to read and maintain. However it does not work at all. The moment org.acme.core changes, you need to change ALL the features. Furthermore, the application has no say in what to package, so it can’t even decide to update org.acme.core (because of a bugfix or something).
Platform Feature:
org.acme.platform.feature
    org.acme.core
    org.acme.other
    org.thirdparty.specific (but could be its own feature)
org.acme.module.feature
    org.acme.module
This is the approach used for Hello World applications and Eclipse add-ons - and it only works for those. Since all modules' target platforms would point to org.acme.platform.feature, every time anything changes for any platform plug-in, you'd have to update org.acme.platform.feature accordingly.
We actually tried that approach with only about 50 platform plug-ins. It's not feasible to have a developer change the feature for every bugfix. (And while Tycho supports version "0.0.0", Eclipse does not, so it's another bag of problems to use that. Also, we need reproducibility, so having PDE choose versions willy-nilly is out of the question.)
Again it all comes down to "I can't use org.acme.platform.feature and override org.acme.core's version for two weeks until the new feature gets released."
The entire problem is made even more difficult since sometimes more than one configuration of plug-ins is possible (say, for different database providers), and then there are high-level modules using other child modules to work correctly, all of which has to be managed somehow.
Is there something we are missing? How do other companies manage these problems?
The Eclipse guys seem to use the "one feature per module" approach. Not surprisingly, since it's the only one that works. But they use neither target platforms nor product files.
The key to successful grouping is knowing when to use "includes" in features and when to just use dependencies. The difference is that "includes" are really included, i.e. p2 will install included bundles and/or included features all the time. That's the reason why you need to update a bundle's version in every feature that includes it - if you don't update it, you will end up with multiple versions in the install.
Also, in the old days one had to specify dependencies in features. These days, p2 will mostly figure out dependencies from the bundles. Thus, I would actually stop specifying dependencies in features and use includes only. Think of features as a way to specify what gets aggregated.
Another key point to grouping is: less is more. If you have as many features as bundles, chances are pretty high that you have a granularity issue. Instead, think about what a user would install separately. There is no need to have four features for things that a user would never install alone. Features should not be understood as a way of grouping development/project structures - that's what folders in SCM or different SCM repos are for. Think of features as deployment structures.
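As a concrete sketch of the include mechanics (the feature and bundle ids are made up): the feature below includes a nested feature and aggregates one bundle, so p2 will always install both of them together with the feature itself; nothing here is a mere dependency:

    <feature id="my.umbrella.feature" version="1.0.0.qualifier">
        <!-- an included feature: p2 installs it whenever my.umbrella.feature is installed -->
        <includes id="my.nested.feature" version="0.0.0"/>
        <!-- an included bundle, aggregated by this feature -->
        <plugin id="my.bundle" download-size="0" install-size="0" version="0.0.0" unpack="false"/>
    </feature>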
With that approach, I would recommend a structure similar to the following example.
my.product.base
    base feature containing the bare minimum of the product
    could be org.acme.core plus a minimal set of others
my.product.base.dependencies
    feature with the 3rd party libraries for my.product.base
my.addon.xyz
    feature bundling an add-on
    separate features for things that can be installed separately
my.addon.xyz.dependencies
    3rd party libraries for the add-on
Now in the product definition I would list just my.product.base. There is no need to also list the dependencies features: p2 will fetch and install the dependencies automatically. However, if you want to bind your product to specific versions of the dependencies and don't want p2 to select just any matching one, then you must include the my.product.base.dependencies feature as well.
In the target definition I would include a "my.product.sdk" feature. That feature is an aggregation feature of all other features. It makes target platform management easier. I typically create an sdk feature with everything.
Another feature that is also very often seen is a "master" feature. This is an "everything" feature that may be used for creating a p2 repository during the build. The resulting p2 repository is then used for assembling products.
For a more real world example see here:
http://git.eclipse.org/c/gyrex/gyrex-server.git/tree/releng/features
Features and Continuous Delivery
There was a comment regarding frequent updates to feature.xml. A feature.xml only needs to be modified when there is a change in structure. No updates need to happen when a bundle version is modified. You should reference bundles in features with version 0.0.0. That makes Tycho fill in the proper version at build time. Thus, all you need to do is commit a change to any bundle and kick off a rebuild. Tycho also takes care of updating the feature qualifier based on the qualifiers of the contained bundles, so the new feature qualifier will be different from the one in a previous build.
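A minimal sketch of such a reference inside a feature.xml (reusing the bundle id from the question):

    <feature id="my.product.base" version="1.0.0.qualifier">
        <!-- Tycho replaces version="0.0.0" with the actual bundle version at build time -->
        <plugin id="org.acme.core" download-size="0" install-size="0" version="0.0.0" unpack="false"/>
    </feature>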

Apache Ivy Configurations

I'm slowly beginning to understand the importance of module configurations within the Ivy universe. However it is still difficult for me to clearly see how the same chunk of code could have different configurations that have different dependency requirements (the one exception is in the case of test configs that require JUnit on top of the normal dependencies -- I actually understand that 100%!)
For instance, take the following code:
    package org.myorg.myprogram.core;

    // Import an object from a dependency
    import org.someElse.theirJAR.Widget;

    public class MyCode
    {
        public MyCode()
        {
            if (Widget.SOME_STATIC == 3)
                System.out.println("Fizz");
            else
                System.out.println("Buzz");
        }
    }
Now aside from the fact that this is terrible code, I just don't see how my program (which, let's pretend, is JARred up into MyProgram.jar) could be set up to have multiple "configurations", some of which may require theirJAR and its Widget class, and others that don't. To me, if we fail to provide MyCode with a Widget, it will die at runtime, always.
Again, I understand the necessity for test configurations; just not anything else (I have also asked questions about compile- vs run-time dependencies, and I guess I also see the necessity for those as well). But beyond test configs, compile-time configs, and runtime configs, what other module configurations could you possibly need? How would MyCode need a Widget in some cases, and not in other cases, yet still run perfectly fine without a Widget?
I greatly appreciate any help wrapping my brain around this!
Hibernate is a good example. Hibernate supports multiple cache implementations to act as its second-level cache. You don't want to transitively depend on all the possible caches, only on the one you use.
In general, we use the typical compile, test, runtime set of configurations.
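To make that concrete, here is a minimal ivy.xml sketch along those lines (the revision numbers are only examples): one configuration per cache provider, extending the default one, so consumers resolving "default" never pull in a cache they didn't ask for:

    <ivy-module version="2.0">
        <info organisation="org.myorg" module="myprogram"/>
        <configurations>
            <conf name="default" description="core runtime dependencies"/>
            <conf name="ehcache" extends="default" description="default plus Ehcache as second-level cache"/>
        </configurations>
        <dependencies>
            <dependency org="org.hibernate" name="hibernate-core" rev="3.6.10.Final" conf="default->default"/>
            <!-- only pulled in when a consumer asks for the 'ehcache' configuration -->
            <dependency org="net.sf.ehcache" name="ehcache-core" rev="2.4.3" conf="ehcache->default"/>
        </dependencies>
    </ivy-module>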
To add to SteveD's answer, remember that dependencies can be more than just .jar files. Some dependencies come with source and javadoc files, release notes, license files, etc. Multiple configurations of the dependency might let you select the subset of files you wish to resolve.
You might also want to use configurations to control the contents of different distributions. For example, you might want to release the jar on its own (the "master" configuration in Maven parlance) and additionally build a tar package containing all runtime dependencies, with (or without) source code.
Another use for configurations is when you target multiple platforms. I often release Groovy scripts packaged to run as standalone jars or as Tomcat web applications.

How to find unnecessary dependencies in a maven multi-project?

If you are developing a large, evolving multi-module Maven project, it seems inevitable that some of the dependencies given in the POMs are unnecessary, since they are transitively included by other dependencies. For example, this happens if you have a module A that originally includes C. Later you refactor and have A depend on a module B which in turn depends on C. If you are not careful enough you'll wind up with both B and C in A's dependency list. But of course you do not need to put C into A's POM, since it is included transitively anyway. Is there a tool to find such unnecessary dependencies?
(These dependencies do not actually hurt, but they might obscure your actual module structure, and having less stuff in the POM is usually better. :-)
To some extent you can use dependency:analyze, but it's not too helpful. Also check JBoss Tattletale.
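If you want the analysis to run as part of the build rather than on demand, a sketch like the following should work (the plugin version is only an example; a plain mvn dependency:analyze on the command line needs no configuration at all):

    <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-dependency-plugin</artifactId>
        <version>2.8</version>
        <executions>
            <execution>
                <id>analyze</id>
                <phase>verify</phase>
                <goals>
                    <goal>analyze-only</goal>
                </goals>
                <configuration>
                    <!-- flip to true to fail the build on used-undeclared or unused-declared dependencies -->
                    <failOnWarning>false</failOnWarning>
                </configuration>
            </execution>
        </executions>
    </plugin>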
Some time ago I started a maven-storyteller-plugin to be able to analyze the POMs more deeply, but the project is very far from production/public use. You can use the storyteller:recount goal to analyze the unused/redundant dependencies.
The problem with the whole story is how to determine "unused" things. What is quite possible to analyze is, for instance, class references. But that won't work if you're using reflection - directly or indirectly.
Update November 2014.
I've just moved my old code of the Storyteller plugin to GitHub. I'll refresh it and release it to Central so that it's usable for others.
I personally use the POM editor of M2Eclipse to visually view the dependency tree (2D tree). Then I take a look at the lib directories of my deliverables (war, ear). Then, still in the M2Eclipse POM dependencies viewer, I go to every 3rd party library and right-click on the dependency I want to exclude (an exclusion is added automatically to the right dependency).
There are no golden rules, just some basic tips:
a lot of POMs are not correct: many 3rd party libs out there declare far too many dependencies in the default compile scope; if everybody carefully crafted their POM, you would not have so many unwanted dependencies.
you can often guess from the names of the dependencies what you will have to exclude; the best examples are parsers, transformers, and document builders: xalan, xerces and co. Try to remove them and use the internal JDK 1.6 parser; common Apache stuff and log4j are also worth looking at.
also look regularly in the lib delivery to check that you do not have duplicate libraries with different versions (the Maven dependency resolver should normally avoid that)
go bottom-up: start with your common modules, then move up towards the service layer, trimming down dependencies in every module; don't try to start with the ear/war modules, it will be too difficult
check often that your deliverables still work, either by testing or by comparing an old deliverable with the new one (especially what has disappeared from the WEB-INF/lib directory, using WinMerge/Beyond Compare)
Consider the case where you have A -> B and B -> C, and then refactor such that A depends on both B and C. If A still compiles directly against C, you very much don't want to simply pick up that dependency just because you receive it transitively.
Think of the case when A -> (B-1.0, C-1.0) and B-1.0 -> C-1.0. Everything's in sync, so to avoid "duplication" you remove C from A's dependencies. Then you upgrade A to use B-2.0, which depends on C-2.0. You begin to see errors because A wants C-1.0 classes but finds C-2.0 classes. While quickly reconcilable in this scenario, it is far less so when you have lots of dependencies.
You very much want the information in A's POM saying that it explicitly expects to find C-1.0 on the classpath, so that you can understand when you have transitive dependency conflicts. Again, Maven will do the job of ensuring that the "closest" version of any particular jar ends up on your classpath. But when things go wrong, you want all the dependency metadata you can get.
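In POM terms (the coordinates are made up), that means A keeps both declarations even though C also arrives transitively via B:

    <!-- in A's pom.xml -->
    <dependencies>
        <dependency>
            <groupId>com.example</groupId>
            <artifactId>B</artifactId>
            <version>1.0</version>
        </dependency>
        <!-- declared explicitly because A compiles directly against C -->
        <dependency>
            <groupId>com.example</groupId>
            <artifactId>C</artifactId>
            <version>1.0</version>
        </dependency>
    </dependencies>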
On a slightly more practical note, a dependency is unused when you can remove it from your pom and all of your unit/integration/acceptance tests still pass. ;-)

Maven best practice for generating multiple jars with different/filtered classes?

I developed a Java utility library (similar to Apache Commons) that I use in various projects.
In addition to fat clients, I also use it for mobile clients (PDA with J9 Foundation profile).
In time, the library that started as a single project spread over multiple packages. As a result I ended up with a lot of functionality which is not really needed in all the projects.
Since this library is also used inside some mobile/PDA projects I need a way to collect just the used classes and generate the actual specialized jars.
Currently in the projects that are using this library, I have Ant jar tasks that generate (from the utility project) the specialized jar files (ex: my-util-1.0-pda.jar, my-util-1.0-rcp.jar) using include/exclude jar task features. This is mostly needed due to the size constraints on the generated jar file, for the mobile projects.
Migrating now to Maven I just wonder if there are any best practices to arrive to something similar. I consider the following scenarios:
[1] - In addition to the main jar artifact (my-lib-1.0.jar), also generate inside the my-lib project the separate/specialized artifacts using classifiers (ex: my-lib-1.0-pda.jar), using the Maven Jar Plugin or Maven Assembly Plugin filtering/includes. I'm not very comfortable with this approach since it pollutes the library with its consumers' demands (filters).
[2] - Create additional Maven projects for all the specialized clients/projects, which will "wrap" "my-lib" and generate the filtered jar artifacts (ex: my-lib-wrapper-pda-1.0 etc.). As a result, these wrapper projects will contain the filtering (to generate the filtered artifact) and will depend just on the "my-lib" project, while the client projects will depend on my-lib-wrapper-xxx-1.0 instead of my-lib-1.0. This approach may look problematic since, even though it leaves the "my-lib" project intact (with no additional classifiers and artifacts), it basically doubles the number of projects: for every client project I'll have one wrapper lib, just to collect the needed classes from the "my-util" library (a "my-pda-app" project will need a "my-lib-wrapper-for-my-pda-app" project/dependency).
[3] - In every client project that uses the library (ex: my-pda-app), add some specialized Maven plugins to trim out (when generating the final artifact/package) the classes that are not required (ex: maven-assembly-plugin, maven-jar-plugin, proguard-maven-plugin).
What is the best practice for solving this kind of problems in the "Maven way"?
The Maven general rule is "one primary artifact per POM" for the sake of modularity and the reasons one shouldn't break this convention (in general) are very well explained in the How to Create Two JARs from One Project (...and why you shouldn’t) blog post. There are however justified exceptions (for example an EJB project producing an EJB JAR and a client EJB JAR with only interfaces). Having said that:
The mentioned blog post (also check Using Maven When You Can't Use the Conventions) explains how you could implement Option 1 using separate profiles or the JAR plugin. If you decide to implement this solution, keep in mind that it should remain an exception and that it might make dependency management trickier (and, as you mentioned, pollute the project with "client filtering logic"). Just in case, I would use several JAR plugin executions here.
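A sketch of what such an execution might look like (the classifier and include patterns are made up for this example); each extra execution of the jar goal produces an additional classified artifact next to the main jar:

    <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-jar-plugin</artifactId>
        <executions>
            <execution>
                <id>pda-jar</id>
                <phase>package</phase>
                <goals>
                    <goal>jar</goal>
                </goals>
                <configuration>
                    <!-- produces my-lib-1.0-pda.jar containing only the listed packages -->
                    <classifier>pda</classifier>
                    <includes>
                        <include>org/myorg/util/core/**</include>
                        <include>org/myorg/util/pda/**</include>
                    </includes>
                </configuration>
            </execution>
        </executions>
    </plugin>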
Option 2 isn't very different from Option 1 IMO (except that it separates things): basically, having N other wrapping/filtering projects is very similar to having N filtering rules in one project. And if filtering makes sense, I prefer Option 1.
I don't like Option 3 at all because I think it shouldn't be the responsibility of a client of a library to "trim out" unwanted things. First, a client project doesn't necessarily have the required knowledge (what to trim) and, second, this might create a big mess with other plugins.
BUT if the fat clients are not using the whole of my-lib (the way server-side code would require the whole EJB JAR), then filtering isn't the right "Maven way" to handle your situation. The right way would be Option 4: put everything common in a project (producing my-lib-core-1.0.jar) and the specific parts in specific projects (that will produce my-lib-pda-1.0.jar etc). Clients would then depend on the core artifact and the specialized ones they need.
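With Option 4, a PDA client's POM (coordinates made up) would then simply declare the artifacts it actually needs:

    <dependencies>
        <dependency>
            <groupId>com.mycompany</groupId>
            <artifactId>my-lib-core</artifactId>
            <version>1.0</version>
        </dependency>
        <!-- only the PDA-specific part of the library, nothing else -->
        <dependency>
            <groupId>com.mycompany</groupId>
            <artifactId>my-lib-pda</artifactId>
            <version>1.0</version>
        </dependency>
    </dependencies>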