Difference between yarn/npm5 lockfiles and exact package versions? - npm

My simple question is: why can't I just use exact versions in my package.json? How is this different from a lockfile?

The main difference is that lockfiles also lock nested dependencies - all of the dependencies of your dependencies, and so on. Managing and tracking all of those changes can be incredibly difficult, and the number of packages that are used can grow exponentially.
There are also situations where you cannot manually specify that a particular version of a package should be used - consider 2 libraries that specify foo at ~1.0.0 and ~2.0.0 respectively. The difference in major version tells us that the API of foo#v1 is not going to match the API of foo#v2, so there's no way you could override the package version at your app level without causing conflicts and failures.
Finally, you might wonder "why have semver at all then? Why not just have all packages manually specify the exact version of their dependencies?" One of the main advantages of semver is that you don't have to update every dependency in the tree whenever a sub-dependency updates. If I rely on foo, and foo relies on bar, and bar just had a critical bug that was patched, and we're using exact versions for everything, then foo must also be updated before I can get the fix. If foo and bar have different maintainers, or if foo is abandoned, that could take a while and I may need to fork the project (something I've done more than once in Java-land).
This is very useful for maintaining ecosystems of libraries because it fundamentally reduces the amount of maintenance work required per-node in the dependency tree, making it easier to extract libraries and patterns. I once had an early project where we were building a component library that used exact versions, and any time the core library containing shared functionality was updated, we had to submit a PR to each of the other packages to update the version, and sometimes followup PRs to components that depended on those. Needless to say, we consolidated the packages after a few months.
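To make that concrete, here's a minimal sketch (the version numbers are made up). Your app pins foo exactly, foo declares bar as a range, and the lockfile records which bar actually got installed:

Your app's package.json:
{
  "dependencies": {
    "foo": "1.2.3"
  }
}

foo's own package.json:
{
  "dependencies": {
    "bar": "~1.4.0"
  }
}

When bar ships a 1.4.4 bugfix, a fresh install picks it up without anyone publishing a new foo, while the lockfile keeps recording the exact bar version that was resolved, so your builds stay reproducible until you deliberately update it.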
Hope that helps!

Related

How to Group Plug-ins into Features

We are struggling hard with how to use features the right way.
Let’s say we have the plug-in org.acme.module which depends on org.thirdparty.specific and org.acme.core.
And we have the plug-in org.acme.other which depends on org.acme.core.
We want to create an application from these, which includes a target file and a product file. We have the following options:
One feature per module:
org.acme.core.feature
org.acme.core
org.acme.module.feature
org.acme.module
org.acme.other.feature
org.acme.other
org.thirdparty.specific.feature
org.thirdparty.specific
This makes the target and product files gigantic, and the dependencies are very hard to manage manually.
One feature per dependency group:
org.acme.module.feature
org.acme.core
org.acme.module
org.thirdparty.specific
org.acme.other.feature
org.acme.core
org.acme.other
This approach makes the dependencies very easy to manage, and the target and product files are easy to read and maintain. However it does not work at all. The moment org.acme.core changes, you need to change ALL the features. Furthermore, the application has no say in what to package, so it can’t even decide to update org.acme.core (because of a bugfix or something).
Platform Feature:
org.acme.platform.feature
org.acme.core
org.acme.other
org.thirdparty.specific (but could be its own feature)
org.acme.module.feature
org.acme.module
This is the approach used for Hello World applications and Eclipse add-ons - and it only works for those. Since all modules' target platforms would point to org.acme.platform.feature, every time anything changes for any platform plug-in, you'd have to update org.acme.platform.feature accordingly.
We actually tried that approach with only about 50 platform plug-ins. It's not feasible to have a developer change the feature for every bugfix. (And while Tycho supports version "0.0.0", Eclipse does not, so it's another bag of problems to use that. Also, we need reproducibility, so having PDE choose versions willy-nilly is out of the question.)
Again it all comes down to "I can't use org.acme.platform.feature and override org.acme.core's version for two weeks until the new feature gets released."
The entire problem is made even more difficult since sometimes more than one configuration of plug-ins is possible (say, for different database providers), and there are high-level modules using other child modules to work correctly, which also has to be managed somehow.
Is there something we are missing? How do other companies manage these problems?
The Eclipse guys seem to use the "one feature per module" approach. Not surprisingly, since it's the only one that works. But they use neither target platforms nor product files.
The key to a successful grouping is when to use "includes" in features and when to just use dependencies. The difference is that "includes" are really included, i.e. p2 will install included bundles and/or included features all the time. That's the reason why you need to update a bundle in every feature if it's included. If you don't update it, you will end up with multiple versions in the install.
Also, in the old days one had to specify dependencies in features. These days, p2 will mostly figure out dependencies from the bundles. Thus, I would actually stop specifying dependencies in features and use just includes. Think of features as a way to specify what gets aggregated.
Another key point to grouping is - less is more. If you have as many features as bundles, chances are pretty high that you have a granularity issue. Instead, think about what a user would install separately. There is no need to have four features for things that a user would never install alone. Features should not be understood as a way of grouping development/project structures - that's where folders in SCM or different SCM repos are ok. Think of features as deployment structures.
With that approach, I would recommend a structure similar to the following example.
my.product.base
base feature containing the bare minimum of the product
could be org.acme.core plus a minimal set of others
my.product.base.dependencies
features with 3rd party libraries for my.product.base
my.addon.xyz
feature bundling an add-on
separate features for things that can be installed separately
my.addon.xyz.dependencies
3rd party libraries for add-on dependencies
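As a rough feature.xml sketch for one of these features (ids and versions are only placeholders), the difference between an include and a plain dependency looks like this - the includes element is really aggregated and always installed, while the import under requires is only resolved by p2:

<feature id="my.addon.xyz" version="1.0.0.qualifier">
   <!-- really included: p2 installs this nested feature every time -->
   <includes id="my.addon.xyz.dependencies" version="0.0.0"/>
   <!-- only a dependency: p2 resolves a matching version, nothing is aggregated here -->
   <requires>
      <import feature="my.product.base" version="1.0.0" match="compatible"/>
   </requires>
</feature>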
Now in the product definition I would list just my.product.base. There is no need to also list the dependencies features. p2 will fetch and install the dependencies automatically. However, if you want to bind your product to specific versions of the dependencies and don't want p2 to select any matching one, then you must include the my.product.base.dependencies feature.
In the target definition I would include a "my.product.sdk" feature. That feature is an aggregation feature of all other features. It makes target platform management easier. I typically create an sdk feature with everything.
Another feature that is also very often seen is a "master" feature. This is an "everything" feature that may be used for creating a p2 repository during the build. The resulting p2 repository is then used for assembling products.
For a more real world example see here:
http://git.eclipse.org/c/gyrex/gyrex-server.git/tree/releng/features
Features and Continuous Delivery
There was a comment regarding frequent updates to feature.xml. A feature.xml only needs to be modified when there is a change in structure. No updates need to happen when a bundle version is modified. You should reference bundles in features with version 0.0.0. That makes Tycho fill in the proper version at build time. Thus, all you need to do is commit a change to any bundle and then kick off a rebuild. Tycho also takes care of updating the feature qualifier based on the qualifiers of the contained bundles. Thus, the new feature qualifier will be different from the one in a previous build.
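For illustration, a bundle reference with the 0.0.0 placeholder might look roughly like this in feature.xml (the id is just an example):

<plugin id="org.acme.core" version="0.0.0" download-size="0" install-size="0" unpack="false"/>

Tycho replaces 0.0.0 with the bundle's actual version during the build, so the file itself never needs touching for a routine version bump.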

Examples of Semantic Version Names

I have been reading about semver. I really like the general idea. However, when it comes to putting it to practice, I feel like I'm missing some key pieces of information. I'm not sure where the name of a library exists, or what to do with file variants. For instance, is the file name something like [framework]-[semver].min.js? Are there popular JavaScript frameworks that use semver? I don't know of any.
Thank you!
Let me try to explain.
If you are not developing a library that you intend to keep around for years to come, don't bother with it. If you do want to version every release, read the following.
Suppose you are an architect or developer building a library that is meant to be used by hundreds of developers over time, in a distributed manner. You really need to be careful about what you are doing and what your contributors are adding (interesting features that tempt you to push those changes into the currently distributed file). How do you tell your library's users to upgrade, and in what scenarios? People have always followed some sort of versioning, and interestingly, their schemes mostly worked fine.
Then why do you need semver?
The idea is that there should be a concrete specification for a group of people to follow collectively, even if they each already know it in their own minds. With that thought, a specification was made: the authors gathered the best practices around versioning software and published them on a single website, semver.org. Its main principles are:
Imagine you have already released your library with a version "lib.1.0.98". Now follow these rules for subsequent development.
Say your library is bundled and named xyz, and:
Given a version number MAJOR.MINOR.PATCH, (like xyz.MAJOR.MINOR.PATCH), increment the:
1. MAJOR version when you make incompatible API changes
(existing code of your library's users breaks if they adopt the new version without making code changes in their programs),
2. MINOR version when you add functionality in a backwards-compatible manner
(existing code keeps working, possibly with some improvements in performance and features), and
3. PATCH version when you make backwards-compatible bug fixes.
Additional labels for pre-release and build metadata are available as extensions to the MAJOR.MINOR.PATCH format.
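Continuing the example above, the rules translate into bumps like these (the concrete changes are of course hypothetical):

xyz.1.0.98 -> xyz.1.0.99   backwards-compatible bug fix (PATCH)
xyz.1.0.98 -> xyz.1.1.0    new backwards-compatible functionality (MINOR)
xyz.1.0.98 -> xyz.2.0.0    incompatible API change (MAJOR)
xyz.1.1.0-rc.1             pre-release label
xyz.1.1.0+build.20140325   build metadata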
If you are not a developer, or are not in a position where you need to develop a library to a standard, you need not worry about semver at all.
Finally, the famous d3 library follows this practice.
Semantic Versioning only defines how to name your versions. It does not specify what you will do with your version number afterwards. You can put the version numbers in package names, store them in a properties file inside your application, or just publish them on a wiki. All those options are open to discussion and not part of the problem space addressed by SemVer.
semver is used by npm and bower (and perhaps some other tools) for dependency management. Using semver it is possible to decide which versions of which packages to use if multiple libraries used depend on the same library.
As others have said, semantic versioning is a standard versioning scheme that tells your users which versions of your library should be compatible with each other, and which ones are not.
The idea is to give your users more confidence that it's safe to upgrade to a newer patch or minor version, because it has been tried and tested and is backwards compatible with the previous version. That, at least, is what you're telling your users.
As far as tooling goes, I don't do much in javascript, but I typically let my build server handle stamping my assemblies etc. with the correct version. I have a static major number I bump whenever I make breaking changes, a static minor number I bump every time I add new features, and an auto-incrementing patch number for whenever I check in bug fixes.
Especially if this is a javascript library you plan to share on a public repository of some kind (nuget, gem, etc.) you probably want some form of automated packaging system, and you put the logic for specifying your version number in there (in the package metadata, or in the name of the javascript file, which is typically the standard I've seen).
Take a look at sbt which is the Scala Build Tool. In it, we write dependencies like this:
val scalatest = "org.scalatest" %% "core" % "2.1.7" % "test"
val jodatime = "org.joda" % "jodatime" % "1.4.5"
Wherein the operator %% means "the current version of Scala that you're building against." Packaging things in this language generally creates JAR files with names like <my project>_<scala version>_<library version>.jar, which is quite handy for semantically naming things automagically. The % operator can be interpreted as "don't version this part."
That said, this resulted from the fact that the same library compiled against different Scala versions was not binary compatible with itself. So it was more a result of the binary incompatibilities than a conscious design choice.

Does Ivy have different resolution behavior depending on status attribute?

My colleague pointed out a flaw in maintaining our artifacts (still somewhat new to Ivy):
The release builds are marked as "integration", which means Ivy rechecks for new versions on each build, slowing down the build even when it has cached the dependencies.
That did not make much sense to me since, I think, Ivy still needs to check what is in the repo before making a decision about the version to deliver. So, I decided to research that a bit to understand exactly what the effects of marking libraries with different status values are.
I cannot find much in the documentation, though, or on the net. What am I missing?
Could someone please shed some light on this?
Thank you
The status is just a string that can be defined for Ivy. Statuses don't affect the resolution of artifacts per se; they have no effect on the default retrieval. A status is just a marker on an artifact.
Status:
Status of a revision - A module's status indicates how stable a module revision can be considered. It can be used to consolidate the status of all the dependencies of a module, to prevent the use of an integration revision of a dependency in the release of your module. Three statuses are defined by default in Ivy:
integration: revisions built by a continuous build, a nightly build, and so on, fall in this category
milestone: revisions delivered to the public but not actually finished fall in this category
release: a revision fully tested and labelled falls in this category
To get the behaviour your colleague mentioned, you need to declare the dependency as changing, or configure it on the resolver definition:
Changes in artifacts - Some people, especially those coming from maven 2 land, like to use one special revision to handle often updated modules. In maven 2 this is called a SNAPSHOT version, and some argue that it helps save disk space to keep only one version for the high number of intermediary builds you can make whilst developing.
Ivy supports this kind of approach with the notion of "changing revision". A changing revision is just that: a revision for which Ivy should consider that the artifacts may change over time. To handle this, you can either specify a dependency as changing on the dependency tag, or use the changingPattern and changingMatcher attributes on your resolvers to indicate which revision or group of revisions should be considered as changing.
Once Ivy knows that a revision is changing, it will follow this principle to avoid checking your repository too often: if the module metadata has not changed, it will consider the whole module (including artifacts) as not changed. Even if the module descriptor file has changed, it will check the publication data of the module to see if this is a new publication of the same revision or not. Then if the publication date has changed, it will check the artifacts' last modified timestamps, and download them accordingly.
So if you want to use changing revisions, use the publish task to publish your modules; it will take care of updating the publication date, and everything will work fine. And remember to set checkmodified="true" on your resolver too!
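As a rough sketch of the two options (organisation, module name and pattern are made up), the dependency-level flag goes in ivy.xml and the resolver-level settings in ivysettings.xml:

In ivy.xml, mark a single dependency as changing:
<dependency org="org.acme" name="foo" rev="1.0-SNAPSHOT" changing="true"/>

Or, in ivysettings.xml, let the resolver treat a whole group of revisions as changing and check modification dates:
<ibiblio name="shared-repo" m2compatible="true"
         changingPattern=".*-SNAPSHOT" checkmodified="true"/>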

maven dependency clash

In my project, there are 2 libraries, each of which depends on the XML parsing class javax.xml.parsers.DocumentBuilderFactory. Each of these libraries references the class from a different jar (one gets it from a jar called xmlParserAPIs while the other gets it from xml-apis-1.0.b2.jar). Unfortunately there are different versions of the class in each of these files, so I am seeing runtime errors depending on the order they are loaded. Both of these xml jars are transitive dependencies of 3rd party libraries. Is there a good way to handle this conflict?
edit: I'm not sure if it makes a difference on how to handle the problem but this only happens in testing because one of the dependencies is in the test scope.
thanks,
jeff
(...) Unfortunately there are different versions of the class in each of these files so I am seeing runtime errors, depending on the order they are loaded.
In theory, xml-apis.jar and xmlParserAPIs.jar (from xerces2-j) are the same JAR under different names, xmlParserAPIs.jar having been deprecated for years (see this message and this one).
If your dependencies rely on different and incompatible versions of xml-apis.jar, I would say that these dependencies are mutually exclusive, in other words incompatible, at least for the versions you're using. The only solution would be to find versions with a converging dependency.
In case they can use compatible versions, use a dependency exclusion for xmlParserAPIs.jar so that only xml-apis.jar is used.
I'm not sure if it makes a difference on how to handle the problem but this only happens in testing because one of the dependencies is in the test scope.
No, this just explains why you don't get the problem at runtime (because the test-scoped dependency is not on the classpath then and, obviously, doesn't conflict).
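For reference, an exclusion looks roughly like this in the pom; the third-party coordinates below are placeholders, while xerces:xmlParserAPIs is the coordinate usually seen for the deprecated jar:

<dependency>
    <groupId>some.thirdparty</groupId>
    <artifactId>library-pulling-in-old-parser</artifactId>
    <version>1.0</version>
    <exclusions>
        <exclusion>
            <groupId>xerces</groupId>
            <artifactId>xmlParserAPIs</artifactId>
        </exclusion>
    </exclusions>
</dependency>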

How to find unnecessary dependencies in a maven multi-project?

If you are developing a large, evolving multi-module maven project, it seems inevitable that there are some dependencies given in the poms that are unnecessary, since they are transitively included by other dependencies. For example, this happens if you have a module A that originally includes C. Later you refactor and have A depend on a module B which in turn depends on C. If you are not careful enough you'll wind up with both B and C in A's dependency list. But of course you do not need to put C into A's pom, since it is included transitively anyway. Is there a tool to find such unnecessary dependencies?
(These dependencies do not actually hurt, but they might obscure your actual module structure and having less stuff in the pom is usually better. :-)
To some extent you can use dependency:analyze, but it's not too helpful. Also check JBoss Tattletale.
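For what it's worth, a run looks roughly like this; the artifacts in the warnings are illustrative, not taken from a real build:

mvn dependency:analyze
...
[WARNING] Used undeclared dependencies found:
[WARNING]    com.example:c:jar:1.0:compile
[WARNING] Unused declared dependencies found:
[WARNING]    com.example:legacy-utils:jar:2.3:compile

Keep in mind that "unused declared" only means no bytecode-level class references were found, which is exactly the reflection caveat mentioned below.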
Some time ago I started the maven-storyteller-plugin to be able to analyze poms more deeply, but the project is very far from production/public use. You can use the storyteller:recount goal to analyze the unused/redundant dependencies.
The problem with the whole story is - how to determine "unused" things. What is quite possible to analyze is, for instance, class references. But it won't work if you're using reflection - directly or indirectly.
Update November 2014.
I've just moved my old code of the Storyteller plugin to GitHub. I'll refresh it and release it to Central so that it's usable by others.
I personally use the pom editor of M2Eclipse to visually view the dependency tree (2D tree). Then I take a look in my deliverable (war, ear) lib directories. Then, still in the M2Eclipse pom dependencies viewer, I go to every 3rd party dependency and right-click on the one I want to exclude (an exclusion is added automatically to the right dependency).
There are no golden rules, simply some basic tips:
a lot of poms are not correct: many 3rd-party libs out there declare way too many dependencies in the default compile scope; if everybody carefully crafted their pom, you would not have so many unwanted dependencies.
you need to guess from the names of the dependencies what you will have to exclude; the best examples are parsers, transformers, document builders: xalan, xerces and co. Try to remove them and use the internal JDK 1.6 parser; common Apache stuff and log4j are also worth looking at.
also look regularly in the delivered lib directories to check that you do not have duplicate libraries with different versions (the dependency resolver of Maven should avoid that)
go bottom-up: start with your common modules, then go up to the service layer, trimming down dependencies in every module; don't try to start with the ear/war modules, it will be too difficult
check often that your deliverables still work, either by testing or by comparing an old deliverable with the new one (especially what has disappeared from the WEB-INF/lib directory, with WinMerge/Beyond Compare)
When you have A -> B and B -> C, and then refactor such that A -> (B, C): even if A still compiles with only B declared, you very much don't want to simply pick up C because you receive it transitively.
Think of the case when A -> (B-1.0, C-1.0) and B-1.0 -> C-1.0. Everything's in sync, so to avoid "duplication" you remove C from A's dependencies. Then you upgrade A to use B-2.0 -> C-2.0. You begin to see errors because A wants C-1.0 classes but finds C-2.0 classes. While quickly reconcilable in this scenario, it is far less so when you have lots of dependencies.
You very much want the information in A's pom that says that it explicitly expects to find C-1.0 on the classpath so that you can understand when you have transitive dependency conflicts. Again, Maven will do the job of ensuring that the "closest" version of any particular jar ends up on your classpath. But when things go wrong - you want all the dependency metadata you can get.
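In pom terms that means A keeps something like the following, even though C would also arrive transitively through B (the coordinates are hypothetical):

<dependencies>
    <dependency>
        <groupId>com.example</groupId>
        <artifactId>B</artifactId>
        <version>1.0</version>
    </dependency>
    <!-- declared explicitly because A uses C's classes directly -->
    <dependency>
        <groupId>com.example</groupId>
        <artifactId>C</artifactId>
        <version>1.0</version>
    </dependency>
</dependencies>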
On a slightly more practical note, a dependency is unused when you can remove it from your pom and all of your unit/integration/acceptance tests still pass. ;-)