Alternatives to Git Submodules?

I feel that using Git submodules is somehow troublesome for my development workflow. I've heard about Git subtree and Gitslave.
Are there more tools out there for multiple-repository projects, and how do they compare?
Can these tools run on Windows?

Which is best for you depends on your needs, desires, and workflow. They are in some senses semi-isomorphic; it is just that some are a lot easier to use than others for specific tasks.
gitslave is useful when you control and develop on the subprojects at more or less the same time as the superproject, and furthermore when you typically want to tag, branch, push, pull, etc. all repositories at the same time. gitslave has never been tested on Windows that I know of. It requires Perl.
git-submodule is better when you do not control the subprojects, or more specifically when you wish to fix a subproject at a specific revision even as the subproject changes. git-submodule is a standard part of Git and thus works on Windows.
git-subtree provides a front end to Git's built-in subtree merge strategy. It is better when you prefer to have a single-repository, "unified" Git history. Compared to the raw subtree merge strategy, it makes it easier to export changes to the different (directory) trees back out to the original projects, but this is not as automatic as it is with gitslave or even git-submodule.
repo is in theory similar to gitslave, but it is not as well documented for non-Android operations, as far as I have found. It is fairly dedicated to the Google Android development model and natively supports only a handful of Git commands (though you can run arbitrary commands). The native support lacks, for example, a centralized repository to push to, and checking out a branch seems fairly difficult.
kitenet's mr is what you would want to use if you have multiple version control systems in use, but for Git-only superprojects it is limited by its lowest-common-denominator approach. There are ways to run arbitrary commands, but they are not as well integrated.
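To make the comparison concrete, here is a minimal sketch of the submodule and subtree workflows for vendoring a library; the URL, path, and branch names are placeholders:
    # git-submodule: pin an external repo at a specific revision
    git submodule add https://example.com/lib.git vendor/lib
    git submodule update --init --recursive    # needed after cloning the superproject
    # git-subtree: merge the external history into a subdirectory
    git subtree add  --prefix=vendor/lib https://example.com/lib.git main --squash
    git subtree pull --prefix=vendor/lib https://example.com/lib.git main --squash
    # export local changes made under vendor/lib back to the library's own repo
    git subtree push --prefix=vendor/lib https://example.com/lib.git main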

For some use cases, I have liked each of the following two simple approaches:
Nested repositories. If your software project has a plugin mechanism, with each plugin in its own sub-directory, it can make sense to git-ignore these plugin directories and, in your local filesystem, to make each of them into its own git repository. This way, all your files form a single directory tree, but are managed in different git repositories. It will not confuse git.
Per-package repositories. For software projects where you use some kind of source code package management system (gem/bundler, npm, pear, or the like), it can make sense to put your re-used code into separate git repositories, make source packages from them, and then install them with the package management tool into the parent project. Your parent project's git repository then only contains a reference to the required packages and their versions, while the actual code of these packages is git-ignored, just like all other packages and external libraries. Compared to the nested repositories proposed above, this is a more elaborate approach, as it allows you to specify which package version is to be installed.
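A minimal sketch of the nested-repositories approach described above; the plugin path is a placeholder:
    # in the parent project: ignore the plugin directory
    echo "plugins/my-plugin/" >> .gitignore
    git add .gitignore && git commit -m "Ignore nested plugin repository"
    # the plugin directory becomes its own, independent repository
    cd plugins/my-plugin
    git init
    git add . && git commit -m "Initial plugin commit"
    git remote add origin https://example.com/my-plugin.git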

I currently use submodules for development, not just for tracking third-party libraries. There are some ways you can make life easier with submodules, especially when they are the source of merge or rebase conflicts. Look to git ls-tree to get the two commits involved in a conflict on a submodule. This is probably the most difficult part of submodules for people to deal with; for now, scripting will make it much easier to work with. Future versions of Git should have better native support for dealing with them.
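For example, during a merge conflict on a submodule, git ls-tree shows the submodule commit (a "gitlink" entry, mode 160000) recorded on each side; the path is a placeholder:
    git ls-tree HEAD       path/to/submodule   # the commit on our side
    git ls-tree MERGE_HEAD path/to/submodule   # the commit on their side
    # each line looks like: 160000 commit <sha1>  path/to/submodule
    # resolve by checking out the wanted commit inside the submodule, then staging it
    cd path/to/submodule && git checkout <sha1>
    cd - && git add path/to/submodule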
Hope this helps.

We encountered a similar issue when using Git submodules in projects where we had dependencies in a variety of languages. To deal with them, we built and open-sourced a tool called MDLR ("Modular") that gives you declarative, version-controlled Git dependencies with similar functionality to Git submodules, but without the annoying workflow. You can install it and manage your dependencies with the instructions/downloads on the GitHub repo.

Related

git submodule vs npm package?

I'm using git submodules to build and share components between projects. The project is not in production yet, so at this point submodules are serving me well.
But I'm concerned about maintenance and deployment: would it be a good idea to transform it into an npm package?
An npm package will allow fragmentation across different package versions. On the other hand, git submodules have a bit of a learning curve, and the tooling is really not that good. With git submodules, you have all the source in one folder.
If it's at all possible, I'd recommend using a plain monorepo for all projects. You may need to create build-time variables (via babel plugins), and you may need some sort of "live config" that gets served from the backend. I worked with git submodules for a year, and I've recently worked with a project that uses npm to share code.
I would recommend using only one git submodule, for all shared code, instead of several submodules. I would strongly consider using lerna, and use your one git submodule to track lerna's packages directory. And if the team decides they don't like git submodules, you can easily make this repo a sibling git repo, instead of a submodule. However, above all this, I'd recommend using a plain monorepo.
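A sketch of that layout, assuming the shared code lives in its own repository (names and URL are placeholders):
    npx lerna init                        # creates lerna.json and an empty packages/
    rm -r packages                        # replace the empty directory with a submodule
    git submodule add https://example.com/shared-packages.git packages
    git commit -m "Track lerna's packages directory as a git submodule"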
Here's a great talk on monorepos from Netflix: https://www.youtube.com/watch?v=VNqmHJtItCs (strong focus on discouraging npm-style packages)
Here's google's infamous monorepo talk: https://www.youtube.com/watch?v=W71BTkUbdqE
This is a great site to read to help you think about good development flows: https://trunkbaseddevelopment.com/ (it primarily advocates for the monorepo approach)
If you are developing software for different clients (different people/companies paying you for similar projects), and have some agreement that the projects should be at least ~80% the same, you may really enjoy using build flags to help get started on splitting functionality. But you should very proactively keep the code around the build flags clean, and refactor it into re-usable components/packages. Give each client some sort of build-flags.json. Build flags should be named for features only, which in theory can all be individually toggled. Some code may be totally custom for each project; in this case you may want to consider using dynamic imports, but generally this is a pain point I have yet to fully cross, although I have plenty of unrefined ideas around it.
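A hypothetical build-flags.json along those lines; the file name and feature names are made up for illustration:
    # build-flags.json is not a standard file; this is just one possible shape
    cat > clients/acme/build-flags.json <<'EOF'
    {
      "darkMode": true,
      "advancedReports": false,
      "inlineChat": true
    }
    EOF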
If a monorepo is just not happening, I would actually recommend using npm packages plus separate repos over git submodules, assuming you can do good semantic versioning of the packages. (And yalc seems to be a good tool for linking packages together locally, as opposed to the standard npm/yarn link.)
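For reference, the basic yalc flow looks roughly like this; the package name is a placeholder:
    # in the shared package's working copy
    yalc publish                  # copy the package into the local yalc store
    # in each consuming project
    yalc add my-shared-package    # install from the store instead of the registry
    # after editing the shared package
    yalc push                     # republish and update all consumers at once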
My findings after trying lerna, npm workspaces, and git submodules: I find it is not a case of the one versus the other.
The reason I say this is that one can have submodules that are part of the monorepo. Doing exactly this made my development experience better, as I could clone an existing project and actively develop it within the bigger project (the monorepo), then contribute back to the cloned project once satisfied with the changes. This is something that you cannot do with npm workspaces alone. Hence my argument that it is not a case of one versus the other: they solve different problems and can therefore complement each other.
Before using npm workspaces I would use npm link all the time. npm workspaces makes this use case of developing with multiple packages more convenient. Even when the team you work with does not use a monorepo, you could use one yourself to develop multiple packages and test them in conjunction. Once satisfied, you can push the individual repos. This is something you cannot do with git alone.
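A minimal npm workspaces setup for this kind of side-by-side package development (npm 7+; names are placeholders):
    # root package.json declaring the workspaces
    cat > package.json <<'EOF'
    {
      "name": "dev-monorepo",
      "private": true,
      "workspaces": ["packages/*"]
    }
    EOF
    npm install                        # links all workspace packages together
    npm test --workspace=packages/ui   # run a script in a single workspace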
Maybe you can think of more novel ways of combining the features of npm and git.

Repository for storing derived information (build artifacts)

I'm looking for a "repository" to store derived information (build artifacts).
We have a repository (currently Mercurial) to store our source code. When something is pushed to the source repository, the code goes through a continuous integration server, which does an incremental build; as a result, some DLLs will have changed. These should be added to some "repository" so that everybody can use that version without needing to do the build again.
I'm looking for the following features:
It should be easy to update the source code and get the corresponding binaries (we could probably make a script for that)
You should easily get all binaries at once (not only those that changed during the last incremental build).
Binaries that weren't changed should only be stored once in the repository.
When updating the source code and the binaries only the changed binaries should be transferred (and not all binaries). This is similar to what happens for source code.
When updating to some version, only that version should be stored locally, not the complete history.
We should be able to remove certain versions from the binary "repository" after a while. However, if the DLLs are still necessary for subsequent incremental builds, they should of course not be completely removed from the "repository".
What would fit these requirements?
I agree with Manfred: what you are looking for is a binary repository manager. Besides the Nexus repository manager, you should consider Artifactory.
As for the feature list you asked about:
As you have mentioned, the CI server should be responsible for identifying a change in version control and starting a build process which creates the binaries. The CI server/build tool should also deploy the generated binaries to the repository manager, provided the build was successful. Artifactory offers a build integration feature which takes care of deploying the binaries together with the build metadata.
Using the build integration feature of Artifactory, you can get a list of all the binaries generated by a specific build and download them as an archive. Artifactory provides a REST API for these actions (see the sketch after this list).
There are different approaches to storing the artifacts in a repository manager. Some tools store multiple copies of the same binary. Others, for example Artifactory, use checksum-based storage, which keeps only one copy per binary (based on its checksum). This pays off if you keep multiple copies of the same binary in different repositories, especially if you are dealing with large binaries (WAR files, Docker images, ISOs, etc.). Another benefit is cheap copies/moves between repositories, which is a common practice in promotion workflows.
The Artifactory build integration uses checksum-based deployment, which deploys only binaries that do not already exist in Artifactory. For binaries which do exist and have not changed, it only creates a new reference to the existing binary, saving the need to send the actual bytes.
Artifactory provides multiple options for cleaning up binaries, including built-in cleanup policies and the option to develop your own custom logic using user plugins and the Artifactory Query Language (AQL).
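As a minimal sketch, fetching a single binary from an Artifactory repository is a plain HTTP GET; the host, repository name, and artifact path below are placeholders:
    # download one artifact by its repository path
    curl -u user:password -O \
      "https://artifactory.example.com/artifactory/libs-release/com/example/app/1.2.3/app-1.2.3.dll"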
In addition, I highly recommend taking a look at the binary repository comparison matrix.
Disclaimer: I work for JFrog, the company behind Artifactory.
You are basically asking for a repository manager like the Nexus Repository Manager, as you have correctly identified with the tags.
In terms of the specific requirements from your question, here are a couple of ideas:
Binary components are typically identified via coordinates that most of the time include some sort of name and version. A release and build process changes those and deploys the binaries to the repository. This allows you to match source code with binaries. You can also embed information like git refs in the produced binaries.
Accessing the binaries is typically done via HTTP, so it's easy (see the URL sketch after this list). You then just have to determine what it means to get "all binaries".
Not duplicating binaries that are essentially the same can be supported by the underlying file system or by the build tool. I have seen both approaches work. Often, however, it is not worth the effort, since storage is cheap.
There are various ways to automatically clean up repositories, including scheduled tasks that do it regularly. Worst case, you have to implement your own logic in an extension.
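As a sketch of the access-over-HTTP point above: in the standard Maven repository layout served by Nexus, the coordinates com.example:app:1.2.3 map to a predictable URL (host and repository name are placeholders):
    # groupId dots become path slashes: com.example -> com/example
    curl -O "https://nexus.example.com/repository/releases/com/example/app/1.2.3/app-1.2.3.jar"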
Disclaimer: I work as community advocate and trainer for the Nexus Repository Manager with Sonatype.

Any native git command line tool based on libgit2?

As libgit2 is a library, is there any existing C/C++ project which depends on libgit2 and exposes the usual Git command line interfaces (like git clone, git commit, etc.)?
The closest you may find is the examples folder of the libgit2 project.
As stated in the README:
These examples are a mixture of basic emulation of core Git command line functions and simple snippets demonstrating libgit2 API usage (for use with Docurium). As a whole, they are not vetted carefully for bugs, error handling, and cross-platform compatibility in the same manner as the rest of the code in libgit2, so copy with caution.
That being said, you are welcome to copy code from these examples as desired when using libgit2. They have been released to the public domain, so there are no restrictions on their use.
One of the long-term goals of the libgit2 project is to run the whole git.git test suite against those examples (to ensure compatibility with the core Git implementation), so there's a reasonable chance they'll keep on evolving.
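To try them out, the examples can be built together with the library; in recent libgit2 versions they are bundled into a single lg2 binary with one subcommand per example (the exact CMake flag and output path may vary by version):
    git clone https://github.com/libgit2/libgit2.git
    cd libgit2 && mkdir build && cd build
    cmake -DBUILD_EXAMPLES=ON ..
    cmake --build .
    ./examples/lg2 status    # e.g. run the 'status' example against a repository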
From time to time there is some project which tries to reimplement the git tool on top of libgit2 or one of its bindings, but these don't tend to go very far.
The git interface is a collection of quirks, and it's not a very rewarding job to reimplement them in your own tool. On top of that, if you do go through and reimplement the interface, what you get is a version of git with mismatched features, which leaves you roughly where you were before you even started.
There are some systems where it might be worth going through all that trouble, namely to avoid needing a Unix-like environment with shell or Perl; but there is also an effort to port those parts of git itself to C, which tackles the problem from the other side.

How can I write branch-specific hooks in BZR?

In Subversion, hooks are written on a per-repository basis. Each hook is a script with a descriptive filename (e.g. pre-commit) in a folder named "hooks" at the root of the repository. According to the BZR docs, hooks are typically installed globally (e.g. in the ~/.bazaar/plugins/ directory).
Is it possible to create, say, a pre-commit hook that is committed to the branch and that runs without a user having to install a plugin?
I see in the docs and in some code discussions a reference to something called "branch hooks," which sounds promising.
I found this blog post: http://schettino72.wordpress.com/2008/01/20/how-to-execute-tests-on-a-bazaar-pre-commit-hook/, which says:
"plugins in bazaar are not project specific. so you cant control in which projects (branches) your plugin will be applied (it will be applied to all)."
which is less promising. The blog gives a workaround: you write and install a plugin that calls hooks in your repository if they exist. Ideally, I do not want to rely on users installing plugins for a really basic hook, namely a simple test, to run. Is this possible?
You can use a Bazaar server and install hooks on it.
You may also find these links interesting:
http://people.samba.org/bzr/jelmer/bzr-shell-hooks/trunk/
http://bazaar.launchpad.net/~stianse/%2Bjunk/bzr-shell-hooks/
I did some research into this and found the motivation behind the lack of branch-specific hooks in distributed revision control systems. I had compared with Subversion, a centralized RCS, as an example of the desired feature.
Git and Mercurial are distributed RCSs (like Bazaar) that have tools for hooks, including different approaches to branch-specific hooks and global hooks. Regardless, the hooks are not revision controlled, and they require the user of the branch to enable them due to the security risk. The Mercurial documentation on hooks, under the section titled "Hooks and security," says:
"In Mercurial, hooks are not revision controlled, and do not propagate when you clone, or pull from, a repository. The reason for this is simple: a hook is a completely arbitrary piece of executable code. It runs under your user identity, with your privilege level, on your machine.
It would be extremely reckless for any distributed revision control system to implement revision-controlled hooks, as this would offer an easily exploitable way to subvert the accounts of users of the revision control system."
In a centralized RCS like Subversion, the hooks run on the repository server, so user permissions and server setup restrict the impact of a damaging hook script. In a distributed RCS, the hooks typically run on the user's local machine, which is risky.
As vitaly.v.ch mentions, a Bazaar server could be set up to run hooks when pushed to and pulled from. But then a pre-commit hook doesn't make sense, since commits happen on the user's machine. It would be more like a pre-push hook.
Bazaar has all the needed functionality for hooks, but individual user configuration is required to install and enable them due to the security risk they pose.
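For completeness, installing such a plugin per user is just a copy into the plugins directory mentioned above; the plugin directory name here is a placeholder:
    # each user must do this themselves; hooks do not propagate with the branch
    mkdir -p ~/.bazaar/plugins
    cp -r my_precommit_plugin ~/.bazaar/plugins/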
No, your users have to install the plugin to activate your hook.

Archivable, replicable releases when building with Maven: is there a right way?

We have a largish standalone (i.e. not Java EE) commercial Java project (10,000+ classes, four or five SVN repositories, ten or twenty third-party libraries) that's in the process of switching over to Maven. Unfortunately only one engineer (in a team of a dozen or so distributed across three countries) has any prior Maven experience, so we're kind of figuring it out as we go.
In the old Ant way of doing things, we'd:
check out source code from three or four repositories
compile it all into a single monolithic JAR
release that (as part of a ZIP file with library JARs, an installer, various config files, etc.)
check the JAR into SVN so we had a record of what the customers had actually got.
Now, we've got a Maven repository full of artifacts, and a build process that depends on Maven having access to that repository. So if we need to replicate what we actually shipped to a customer, we need to do a build against a Maven repository that has all the proper versions of everything. This is doable, I guess, if in (some version of) the (SVN-controlled) POM files we set all the dependencies to released versions?
But it gives our release engineer the creepy-crawlies, because there doesn't seem to be any way:
to make sure that somebody doesn't clobber the copy of foo-api-1.2.3.jar on the WebDAV server by mistake (the WebDAV server has access control, but that wouldn't stop a buggy build script)
to detect it if they did
to recover afterwards
His idea is, for release builds, to use a local file system as the repository rather than the WebDAV server, and put that local repository under SVN control.
Our one Maven-experienced engineer doesn't like that -- I guess because he doesn't like putting binaries under version control? -- and suggests that maybe the professional version of the Nexus server can solve the clobbering or clobber-tracking/recovery problem.
Personally, I'm not happy (sorry, Sonatype readers) with shelling out money for a non-free build system when we haven't even seen any benefit from the free version yet, and there's no guarantee it will actually solve the problem.
So our choices seem to be:
WebDAV server
Pros: only one server, also accessible by devs, ...?
Cons: easy clobbering, no clobber-tracking/recovery
Local file system
Pros: can be placed under revision control
Cons: only works with the distribution script
Frankly, both of these seem like hacks to me, and I have to wonder if there isn't a better way to do this.
So: Is there a right thing to do here?
I'm not sure I got everything, but I would:
Use the maven-release-plugin (which automates the release process, i.e. executes all the steps documented in release:prepare).
Use WebDAV with an anonymous read-only and authenticated-write policy (so only the release engineer can actually deploy released artifacts to the corporate repo).
There is no need to put generated artifacts under version control (as long as you have the POMs under version control). I don't see the benefits of using the local file system instead of WebDAV (it does not provide more security; you can secure WebDAV as well), and I don't see what the commercial version of Nexus would solve here.
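For reference, the two-step flow with the maven-release-plugin looks like this:
    # bumps the POM versions and tags the release in version control
    mvn release:prepare
    # checks out the tag and builds/deploys the released artifacts to the corporate repo
    mvn release:perform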
Nexus has a setting which prevents you from clobbering an already-released artifact in a release repository.
For a team of about a dozen, the free version of Nexus should be enough.