svn to git migration with nested svn:externals - git-svn

Migrating from SVN with svn:externals to git.
Each external may have its own svn:externals. I haven't found a useful guide here for migrating SVN with externals to git.
Each branch may have its own branches/tags/trunk.
What's the best way to migrate the whole repository?
I am looking at a git svn clone of the main repository, adding each cloned external as a git submodule. But since the externals are nested, I don't know what the best solution is.
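For reference, the approach I'm considering looks roughly like this (the URLs and the external's name, lib-foo, are made up; a standard trunk/branches/tags layout is assumed):

git svn clone --stdlayout https://svn.example.com/repo/main main
git svn clone --stdlayout https://svn.example.com/repo/lib-foo lib-foo
cd main
# after pushing lib-foo to a git host, attach it where the external used to live
git submodule add https://git.example.com/lib-foo.git lib-foo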
Script used so far:
https://github.com/eneroth/git-externals

I did my own research and didn't find an all-in-one solution in open source. There are plenty of single, one-sided solutions which do not cover many aspects of an svn-to-git conversion, such as:
svn:externals
svn:ignore conversion into .gitignore
automatic conversion of SVN author emails, or prevention from continuing without a conversion (for example, GitLab uses account emails to track repo changes for an account and aggregate participation statistics, and will refuse to attribute anything if you forget to convert the author#&lt;repo-guid&gt; emails left behind by the git-svn tool); see the sample authors file after this list
svn tags/branches conversion into native git tags/branches
handling conversion collisions, or cases where conversion is impossible
handling two-way conversions (git-to-svn)
handling conversion resuming (after a commit to one of the svn/git repos)
handling conversion without access to the bare repository or without access to the SVN repository root
handling a manual conversion start/resume (through a script or executable) instead of running as a standalone service, or vice versa
support for conversion from/to popular svn/git hubs like SourceForge, GitHub, GitLab, Bitbucket
...and so on.
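On the author-email point above, git-svn accepts an authors file that maps SVN usernames to git identities; a minimal sketch (the usernames and emails are made up):

jdoe = John Doe <john.doe@example.com>
asmith = Alice Smith <alice.smith@example.com>

git svn clone --authors-file=authors.txt https://svn.example.com/repo/main main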
It's a pretty big area to investigate and research for yourself, and it can consume a lot of time just to find out what you actually need or want from an svn-to-git (or whatever) conversion.
For myself, I've found this to be pretty close to what I want:
https://techbase.kde.org/Projects/MoveToGit/UsingSvn2Git
Example of the rules file:
https://cgit.kde.org/kde-ruleset.git/tree/kde-rules-main
Example of the account map:
https://cgit.kde.org/kde-ruleset.git/tree/account-map
Script examples to pack/push from the local bare git repository (generated by the KDE tool) to the remote git repository:
https://phabricator.wikimedia.org/diffusion/OSOF/browse/master/svn2git/scripts
Third-party projects to scan the SVN repo before slicing it:
https://github.com/hartwork/svneverever
Some ports to other third-party projects:
https://github.com/mazong1123/svn2gitnet (a .NET port of the Ruby tool)
http://rsvndump.sourceforge.net (like svnrdump, but additionally allows dumps of subdirectories even if your repository access is limited to that subdirectory)
Some interesting standalone implementations:
https://gitlab.com/esr/reposurgeon (some comparison from the author: http://www.catb.org/~esr/reposurgeon/features.html)
But still, there are many unresolved aspects and drawbacks, such as:
support for git subtree/submodules/etc. to slice an SVN repo into one root git repo with references to other small git repos
automatic conversion of svn:externals to git subtrees (the rules from the KDE project support only manual or semi-automatic (regular-expression-based) conversion)
If you decide to use the KDE converter, you can write a script that prepares the rules for each revision range, translating the svn:externals into whatever you want.
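As a rough illustration of the rules format (the repository name and paths are hypothetical; see the KDE examples above for the full syntax):

create repository main
end repository

match /trunk/
    repository main
    branch master
end match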

Related

How to use Ivy/Ant to build using intermediate artifacts

I am trying to revise my build process to use Ant with Apache Ivy for my personal projects. These consist of a few shared modules, and a few application modules that depend on the shared modules. For the sake of this post, let's simplify and say I have a shared module (common), and an application module (application) which depends on common. Each module has its own effective svn repository:
svn_repo_1/common/trunk
                 /branches
                 /tags
svn_repo_2/application/trunk
                      /branches
                      /tags
I check out the relevant revision into a common workspace, in a flat structure:
workspace/common
workspace/application
In general, application will depend on a published version of common, so there will be no need to build common when building application.
However, when I need to add new functionality to common that is required by application, I would then like application to depend on the latest common build from my workspace (without needing to publish common to my repository).
I assumed this is what latest.integration meant (i.e. changing application's ivy.xml to specify latest.integration for the common revision). My intention was to use Ivy's buildlist task to find the local modules that needed to be built before application could be built. This does not work, however, because the buildlist task seems to include the common/build.xml entry regardless of whether application's ivy.xml file specifies latest.integration or some other published revision.
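For concreteness, the dependency line in application's ivy.xml would look something like this (the org value is made up):

<dependency org="com.example" name="common" rev="latest.integration"/>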
I would appreciate any suggestions. I am struggling with ivy's documentation and samples, so any real-world examples would also be helpful. Note: I am not interested in a Maven solution here.
Wow, this is truly deja vu! Go back to some of my first questions on this site from 3-4 months ago and they're almost all Ivy-related! I empathize with you 100% that Ivy is a difficult beast to learn and tame, but after using it professionally for a few months now, I'll never develop without it again. So my first piece of advice: keep going. Sooner or later, what little (practical) documentation you find on Apache Ivy will all start to make sense and fall into place.
I can understand there may be extenuating reasons why you don't want to publish your common to your repo. However, if you are a newcomer to transitive dependency management, the first piece of practical advice I can give you is that you should always publish your JARs/WARs/whatever to your repo, not to an intermediary "integration" area local to your workspace.
The reason for this is simple: Ivy only has the ability to crawl the repositories you define in your settings file (basically). If you deliberately keep a JAR like common outside of one of these defined repositories, then: (a) Ivy has no way to resolve transitive dependencies (its primary job), and (b) "downstream" (dependent) JARs fail to be dynamically updated every time you tweak common. Thus, using Ivy while deliberately not publishing JARs is a bit counter-productive; I'm surprised Ivy even includes it as a feature.
I guess I would need to understand your motivation for not publishing common. If you're simply having problems getting the ivy:publish task to work, no worries, I can provide plenty of examples to help get you started. But if there are some other reasons, then I ask you to consider this solution: set up multiple repositories.
Perhaps you have one "primary" repository where mostly everything gets published, and then a "secondary" or "intermediary" repository where you publish common whenever it makes sense (for you) to do that. You can then configure your Ant build with two different publish tasks, such as publish-main and publish-integration (a sketch follows below).
That way you get the best of both worlds: you get your intermediary staging area, and you get to keep everything inside of Ivy's powerful control.
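A minimal sketch of those two targets, assuming resolvers named main and integration are defined in your ivysettings.xml, xmlns:ivy="antlib:org.apache.ivy.ant" is declared on the <project> element, a jar target produces build/[artifact].[ext], and ${version} is set elsewhere in the build:

<target name="publish-main" depends="jar">
    <!-- ivy:resolve must have run earlier in the build -->
    <ivy:publish resolver="main" pubrevision="${version}" overwrite="true">
        <artifacts pattern="build/[artifact].[ext]"/>
    </ivy:publish>
</target>

<target name="publish-integration" depends="jar">
    <ivy:publish resolver="integration" pubrevision="${version}" status="integration" overwrite="true">
        <artifacts pattern="build/[artifact].[ext]"/>
    </ivy:publish>
</target>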

Apache Ivy Terms & Ambiguities

I'm learning how to augment my build with Ivy, using a "brute force" method of just trying to get a few sample projects up and running. I've pored over the official docs and read several online tutorials, but am choking on a few terms that seem to be used vaguely, ambiguously and/or in conflicting ways. I'm just looking for an experienced Ivy connoisseur to help bring some clarity to these terms for me.
"Resolution" Cache vs. "Repository" Cache vs. "Ivy" Cache
The "Ivy Repository", as opposed to my normal SCM which is a server running SVN
What's the difference between these 3 types of cache? What's the difference between the "Ivy Repository" and my SVN?
Thanks to anyone who can help!
"Resolution" Cache vs. "Repository" Cache vs. "Ivy" Cache
The Ivy cache is basically a folder where Ivy stores artifacts and configuration. If not configured differently, it can be found in UserHome/.ivy2.
The Ivy cache consists of the resolution cache and a repository cache.
The repository cache contains the artifacts downloaded from a repository by Ivy. It caches the repository so that Ivy won't need to query the repository every time it tries to resolve/download an artifact: if Ivy finds a suitable artifact in the repository cache, it will not query the repository, thus saving the cost of that query. If and how the cache is used is a bit more complicated and depends on the dependencies/configuration.
The resolution cache is a collection of Ivy-specific files that tell Ivy how an artifact was resolved (downloaded).
The "Ivy Repository", as opposed to my normal SCM which is a server running SVN
A repository in the Ivy world is a location which contains artifact (JAR) files. This can be the local filesystem or a web server. It has no versioning system. Each version of an artifact is contained in a separate folder. You can't commit artifacts; you just add them to the file system. See the terminology:
org\artifact\version1\artifact.jar
org\artifact\version2\artifact.jar
A repository is accessed via a resolver, which has to know the layout of the repository.
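For illustration, a minimal resolver definition in ivysettings.xml could look like this (the name and patterns are made up):

<ivysettings>
    <resolvers>
        <filesystem name="local-repo">
            <ivy pattern="/path/to/repo/[organisation]/[module]/[revision]/ivy.xml"/>
            <artifact pattern="/path/to/repo/[organisation]/[module]/[revision]/[artifact].[ext]"/>
        </filesystem>
    </resolvers>
</ivysettings>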
From the doc on caches:
Cache types
An Ivy cache is composed of two different parts:
the repository cache
The repository cache is where Ivy stores data downloaded from module repositories, along with some meta information concerning these artifacts, like their original location.
This part of the cache can be shared if you use a well suited lock strategy.
the resolution cache
This part of the cache is used to store resolution data, which is used by Ivy to reuse the results of a resolve process.
This part of the cache is overwritten each time a new resolve is performed, and should never be used by multiple processes at the same time.
While there is always only one resolution cache, you can define multiple repository caches, each resolver being able to use a separate cache.
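On that last point, ivysettings.xml lets you define named repository caches and assign one per resolver; a minimal sketch (the names and paths are made up):

<ivysettings>
    <caches>
        <cache name="shared-cache" basedir="${user.home}/.ivy2/shared-cache"/>
    </caches>
    <resolvers>
        <filesystem name="local-repo" cache="shared-cache">
            <!-- ivy/artifact patterns as usual -->
        </filesystem>
    </resolvers>
</ivysettings>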

How to make a maven project buildable for the customer

We have a project which should be buildable by the customer using Maven. It has some open-source dependencies that are mavenized (no problem), some that aren't mavenized, proprietary stuff (Oracle JDBC driver), and some internal stuff.
Until now we had everything but the first category packaged with the project itself in a local repository (a repository with a file://path-in-project-folder URL specified in the project's pom.xml).
We would love to move these out of the project, as we are about to use them in other projects as well. Currently we plan to use Nexus as an internal Maven repository.
What's the best practice to make such dependencies/Maven repositories available to the customer so that he can continue to build the project?
Ideas so far:
The customer sets up a Nexus repository as well, and we somehow deploy all these non-public dependencies to his repository (like a mirror)
We provide a "dumb" dump/snapshot of the non-public dependencies; the customer adds this snapshot to his settings.xml as a repository (but how is this possible?)
Make our internal nexus repo available to the customers build server (not an option in our case)
I'm wondering how others solve these problems.
Thank you!
Of course, hosting a repository of some kind is a straightforward option, as long as you can cover the uptime / bandwidth / authentication requirements.
If you're looking to ship physical artifacts, you'll find this pattern helpful: https://brettporter.wordpress.com/2009/06/10/a-maven-friendly-pattern-for-storing-dependencies-in-version-control/
That relies on the repository being created in source control - if you want a project to build a repository, consider something like: http://svn.apache.org/viewvc/incubator/npanday/trunk/dist/npanday-repository-builder/pom.xml?revision=1139488&view=markup (using the assembly plugin's capability to build a repository).
Basically, by building a repository you can ship that with the source code and use file:// to reference it from within the build.
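In the project's pom.xml, pointing at such a shipped repository might look like this (the id is arbitrary):

<repositories>
    <repository>
        <id>project-local</id>
        <url>file://${project.basedir}/repository</url>
    </repository>
</repositories>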
There are two options:
Document exactly which artifacts you need to compile that are not available via Maven Central.
Implement Nexus, make an export with Nexus, give the export to the customer, and have them do an import of it. I'm not sure whether you would run into license issues.
I assumed that you already have a repository manager, but it reads like you don't.

How do I fix incorrect checksums in my Nexus repository?

Some of the artifacts in my local Nexus repository don't have the correct checksum. For example (wrong checksum):
cat central/org/codehaus/plexus/plexus-compiler-api/1.8/plexus-compiler-api-1.8.pom.sha1
95f3332c2bbace129da501424f297e47dd0e976b
vs (correct checksum):
sha1sum central/org/codehaus/plexus/plexus-compiler-api/1.8/plexus-compiler-api-1.8.pom
4c2947f7e2d09b6e13da34292d897c564f1f9828
It looks like I have a few artifacts in my repository that were downloaded when this bug was active.
Maven Central has the correct checksum (4c29...) now, but the checksums in my local Nexus repository remain stale. I don't know how to get my local repository to verify and / or re-download the correct checksum from central.
What is the correct way of fixing my local repository? There aren't too many artifacts with this problem, so I think I could (by hand) verify they still exist in central and delete them from my local repository. They should get re-cached with the correct checksums. Is there a better way?
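For the record, a rough shell sketch of that by-hand verification, run from the proxy repository's storage directory (paths follow the example above):

find central -name '*.sha1' | while read -r sum; do
    file="${sum%.sha1}"            # the artifact the checksum belongs to
    [ -f "$file" ] || continue
    actual=$(sha1sum "$file" | cut -d' ' -f1)
    expected=$(cut -c1-40 "$sum")  # some .sha1 files carry trailing text
    [ "$actual" = "$expected" ] || echo "mismatch: $file"
done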
Update:
I've looked at this more and I'm almost positive I know what the source of my problem is. One of the artifacts I'm having trouble with is this one (plexus-compiler-api:1.8):
In my repository, both the .pom and .pom.sha1 are timestamped as 29-Mar-2010. At central, the .pom is timestamped as 29-Mar-2010 while the .pom.sha1 is timestamped as 21-Apr-2010. I was reading about Nexus maintenance. I assume that, on 21-Apr-2010, Maven Central rebuilt metadata and verified checksums which fixed the incorrect .sha1 for the plexus-compiler-api:1.8 artifact.
According to the Sonatype link above, I should be able to expire the caches for Maven Central and have my local installation pull new copies of anything with newer timestamps than the originally cached artifacts. However, based on the behavior I've observed, I think it's only checking timestamps for artifact files, not checksum files.
As far as my local Nexus repository is concerned, I have the most recent version of the artifact (29-Mar-2010), so there's no need to re-download anything.
I've noticed my version of Nexus is quite old (1.5 vs 1.9.1), so I'll try updating and see if the newer version does a better job of expiring caches. If not, I'll probably see what the Sonatype guys think (maybe it's a bug?).
Nope, what you face is the defined behaviour of Nexus and Maven.
First, expiring caches does not delete anything from the local cache of Nexus; it just marks the items "old". The effect of marking items as "old" shows up on the next incoming request for those same artifacts (if they are never asked for, the "old" artifacts just sit there). Meaning, expiring the cache alone will not cause Nexus to download remotely changed (newer) files. Nexus never downloads on its own (if we leave the index out of this discussion). You have to force a client (Maven) to ask for them, and that will result in the following chain of actions: "cache content old", remote change detection, and finally re-download and caching of the new file.
Next, what happens here is that Maven, since the artifact (the JAR file) has not changed, does not even ask for the checksum file, hence nothing triggers the refetch of the "old"-marked checksum on the Nexus side. Also note that for released artifacts (and Maven Central contains released artifacts only), Maven will never re-check them unless they are missing from the local repository (once brought into the local repository, Maven will never try to refetch them). Meaning, you need to remove them from the local repository to be sure that Maven will ask Nexus for them, and finally, that Nexus will detect the checksum file changes on the remote and do what you actually want.
A re-download should happen, for example, if you nuke your Maven local repository and rebuild with a clean/empty one. In that case, Maven should ask for both the JAR artifact and the checksum file; but from your description it's not clear how you invoked Maven (or did you?) after expiring the caches on Nexus.
Try this:
a) run "expire caches" on the Nexus "Maven Central" proxy repository
b) nuke the local repository (or just redirect it to a new clean folder by tampering with ~/.m2/settings.xml)
c) make Maven build your project; it should refetch both the JAR and the checksum files (using the empty/nuked local repository)
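For step (b), an alternative to editing settings.xml is to point Maven at a throwaway local repository from the command line (the path is illustrative):

mvn -Dmaven.repo.local=/tmp/clean-repo clean install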
Hope this explains some of the stuff you wrote.
Reference to the JIRA issue discussing the same thing.
This was a bug.
As explained by Tamas, when a proxied repository cache is expired, Nexus will check the remote repository for newer timestamps. The locally cached artifacts are essentially flagged dirty and the check for updated artifacts happens on demand as artifacts are requested from the local Nexus server.
Nexus (1.9.1) is making the assumption that if an artifact timestamp is unchanged, the checksums should be unchanged as well. Most of the time this will be true, but, due to the old bug in Maven that was deploying artifacts with incorrect checksums, there are rare cases where an artifact can be unchanged yet have an updated checksum.
I think the best way to deal with this for now will be to move any bad checksums and let Nexus try to re-resolve them the next time they are requested:
mv plexus-compiler-api-1.8.pom.sha1 plexus-compiler-api-1.8.pom.sha1.bak
Thanks for the help Tamas.

Maven - installing artifacts to a local repository in workspace

I'd like to have a way in which 'mvn install' puts files in a repository folder under my source (checkout) root, while using third-party dependencies from ~/.m2/repository.
So after 'mvn install', the layout is:
/work/project/
    repository
        com/example/foo-1.0.jar
        com/example/bar-1.0.jar
    foo
        src/main/java
    bar
        src/main/java
~/.m2/repository
    log4j/log4j/1.2/log4j-1.2.jar
(In particular, /work/project/repository does not contain log4j)
In essence, I'm looking for a way of creating a composite repository that references other repositories.
My intention is to be able to have multiple checkouts of the same source and work on each without the checkouts overwriting each other in the local repository on 'install'. Multiple checkouts can arise from working on different branches in cvs/svn, but in my case it is due to cloning of the master branch in git (in git, each clone is like a branch). I don't like the alternatives, which are to use a special version/classifier per checkout or to reinstall (rebuild) everything each time I switch.
Maven can search multiple repositories (local, remote, "fake" remote) to resolve dependencies, but there is only ONE local repository where artifacts get installed during install. It would be a real nightmare to install artifacts into specific locations and to maintain this list without breaking anything; that would just not work, you don't want to do this.
But, TBH, I don't get the point. So, why do you want to do this? There might be alternative and much simpler solutions, like installing your artifacts in the local repository and then copying them under your project root. Why wouldn't this work? I'd really like to know the final intention, though.
UPDATE: Having read the update to the initial question, the only solution I can think of (given that you don't want to use different versions/tags) would be to use two local repositories and to switch between them (very error-prone, though).
To do so, either use different user accounts (as the local repository is user specific by default).
Or update your ~/.m2/settings.xml each time you want to switch:
<settings xmlns="http://maven.apache.org/SETTINGS/1.0.0"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:schemaLocation="http://maven.apache.org/SETTINGS/1.0.0
                              http://maven.apache.org/xsd/settings-1.0.0.xsd">
    <localRepository>${user.home}/.m2/repository</localRepository>
    <!--localRepository>${user.home}/.m2/repository2</localRepository-->
    ...
</settings>
Or have another settings.xml and point on it using the --settings option:
mvn install --settings /path/to/alternate/settings.xml
Or specify the alternate location on the command line using the -Dmaven.repo.local option:
mvn install -Dmaven.repo.local=/path/to/repo
These solutions are all error-prone, as I said, and none of them is very satisfying. Even if you might have very good reasons to work on several branches in parallel, your use case (not rebuilding everything) is not very common. Here, using distinct user accounts might be the least bad solution IMO.
This is INDEED possible with the command line, and in fact is quite useful. For example, if you want to create an additional repo under your Eclipse project, you just do:
mvn install:install-file -DlocalRepositoryPath=repo ^
    -DcreateChecksum=true -Dpackaging=jar ^
    -Dfile=%2 -DgroupId=%3 -DartifactId=%4 -Dversion=%5
It's the "localRepositoryPath" parameter that will direct your install to any local repo you want.
I have this in a batch file that I run from my project root, and it installs the file into a "repo" directory within my project (hence the % parameters). So why would you want to do this? Well, let's say you are a professional services consultant, and you regularly go into customer locations where you are forced to use their security-hardened laptops. You copy your self-contained project to their laptop from a USB stick, and presto, you can do your Maven build, no problem.
Generally, if you are using YOUR laptop, then it makes sense to have a single local repo that has everything in it. But to those who got cocky and said things like "why would you want to do that", I have some news... the world is a bigger place with more options than you might realize. If you are using a laptop that is NOT yours, and you need to build your project on that laptop, get the resulting artifact, and then remove your project directory (and the local repo you just used), this is the way to go.
As to why you would want to have 2 local repos: the default .m2/repository is where the company's standard stuff goes, and the local "in project" repo is where YOUR stuff goes.
This is not possible with the command-line client, but you can create more complex repository layouts with a Maven repository server like Nexus.
The reason why it's not possible is that Maven allows nesting projects, and most of them will reference each other, so installing each artifact in a different repository would lead to lots of searches on your local hard disk (or to failed builds when you start a build in a sub-project).
FYI: symlinks work in Windows 7 and above, so this kind of thing is easy to achieve if all your code goes in the same place in the local repo, i.e. /com/myco/.
Type mklink at a command prompt for details.
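As a sketch of what that could look like (the paths are hypothetical), linking the shared group folder from the default local repo into a per-project one:

mklink /D C:\work\project\repository\com\myco C:\Users\me\.m2\repository\com\myco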
I can see that you do not want to use special versions or classifiers, but that is one of the best solutions to this problem. I work on the same project but on different versions, and each mvn install takes half an hour to build. The best option is to change the pom version by appending the change name, for example 1.0.0-SNAPSHOT-change1 for the change I'm working on, thereby having multiple versions of the same project but with different code bases.
It has made my life very easy in the long run. It helps run multiple builds at the same time without issues. Even during an SCM push, we can skip the pom file from staging, so there can always be 2 versions for you to work on.
In case you have a huge project with multiple sub-modules and want to change all the versions together, you can use the command below to do just that:
mvn versions:set -DnewVersion=1.0.0-SNAPSHOT-change1 -DprocessAllModules
And once done, you can revert using
mvn versions:revert
I know this might be not what you are looking for, but it might help someone who wants to do this.