Repository for storing derived information (build artifacts) - repository

I'm looking for a "repository" to store derived information (build artifacts).
We have a repository (currently Mercurial) to store our source code. When something is pushed to the source repository the code goes through a continuous integration server and we do an incremental build and as a result some dlls will be changed. This should be added to some "repository" so that everybody can use that version without needing to do the build again.
I'm looking for the following features:
It should be easy to update the source code and get the corresponding binaries (we could probably make a script for that)
You should easily be able to get all binaries at once (not only those that changed during the last incremental build).
Binaries that weren't changed should only be stored once in the repository.
When updating the source code and the binaries only the changed binaries should be transferred (and not all binaries). This is similar to what happens for source code.
When updating to some version, only that version should be stored locally, not the complete history.
We should be able to remove certain versions from the binary "repository" after a while. However, if the dlls are still necessary for subsequent incremental builds, these dlls should of course not be completely removed from the "repository".
What would fit these requirements?

I agree with Manfred, what you are looking for is a binary repository manager. Besides the Nexus repository manager you should consider Artifactory.
As for the feature list you asked about:
As you have mentioned the CI server should be responsible for identifying a change in the version control and starting a build process which creates the binaries. The CI server/build tool should also deploy the generated binaries to the repository manager, in case the build was successful. Artifactory offers a build integration feature which takes care of deploying the binaries together with the build metadata.
Using the build integration feature of Artifactory, you can get a list of all the binaries generated by a specific build and download them as an archive. Artifactory provides a REST API for those actions.
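For example (the host name, repository and path here are placeholders, not something from the question), downloading a specific binary from Artifactory is just an HTTP GET against the repository path, which is easy to script:
curl -u user:password -O "https://artifactory.example.com/artifactory/libs-release-local/com/example/mylib/1.2.3/mylib-1.2.3.dll"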
There are different approaches for storing the artifacts in a repository manager. Some tools store multiple copies of the same binary. Others, for example Artifactory, use checksum-based storage which keeps only one copy per binary (based on its checksum). This pays off if you keep multiple copies of the same binary in different repositories, especially if you are dealing with large binaries (war files, docker images, ISOs etc.). Another benefit is cheap copies/moves between repositories, which is a common practice for promotion workflows.
The Artifactory build integration uses checksum-based deployment, which deploys only binaries that do not already exist in Artifactory. For binaries which do exist and have not changed, it only creates a new reference to the existing binary, saving the need to send the actual bytes.
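As a rough illustration of how such a checksum-based deployment can look over HTTP (server URL, repository and path below are invented for the example), the client computes the SHA-1 locally and asks the server to deploy by checksum, so the actual bytes are only uploaded if the binary is not already known:
# compute the checksum locally; the PUT carries only headers, no file content
sha1=$(sha1sum mylib.dll | cut -d' ' -f1)
curl -u user:password -X PUT -H "X-Checksum-Deploy: true" -H "X-Checksum-Sha1: $sha1" "https://artifactory.example.com/artifactory/libs-release-local/com/example/mylib/1.2.3/mylib.dll"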
Artifactory provides multiple options for cleaning up binaries, including built-in cleanup policies and the option to develop your own custom logic using user plugins and the Artifactory Query Language (AQL).
In addition, I highly recommend taking a look at the binary repository comparison matrix.
Disclaimer: I work for JFrog, the company behind Artifactory.

You are basically asking for a repository manager like the Nexus Repository Manager as you have correctly identified with the tags.
In terms of the specific requirements from your question, here are a couple of ideas.
Binary components are typically identified via coordinates that most of the time include some sort of name and version. A release and build process changes those and deploys the components to the repository. This allows you to match source code with binaries. You can also embed information like git refs in the produced binaries.
Accessing the binaries is typically done via HTTP, so it's easy (see the sketch after this list). You then just have to determine what it means to get "all binaries".
Not duplicating binaries that are essentially the same can be supported by the underlying file system or the build tool. I have seen both approaches work. Often, however, it is not worth the effort since storage is cheap.
There are various ways to automatically clean up repositories, including scheduled tasks that do it regularly. Worst case, you have to implement your own logic in an extension.
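As a sketch of what the HTTP access and "get all binaries" point could look like in practice (the repository URL and the versions.txt manifest are hypothetical, not something the question or Nexus prescribes), a small script can resolve each component's coordinates to a repository path and fetch it:
# versions.txt lists one module path and version per line, e.g. "com/example/common 1.4.0"
while read path version; do
  name=$(basename "$path")
  curl -sO "https://nexus.example.com/repository/releases/$path/$version/$name-$version.dll"
done < versions.txt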
Disclaimer: I work as community advocate and trainer for the Nexus Repository Manager with Sonatype.

Related

How to avoid a build and deployment of dependencies which have no code changes

I'm doing a proof of concept on continuous integration and whether our development team will benefit from automated builds and automated deployments to reduce human error.
I've already come quite far in the process but have some questions on how to configure our incremental builds to avoid rebuilding of dependencies that had no code changes.
In addition I’d like our deployment tool to identify and deploy only assemblies rebuilt as a result of a code change.
We already use Microsoft products like TFS for source control, Visual Studio for development and Team Foundation Build for continuous integration builds. We're currently leaning toward InRelease for deployment as it seems to integrate well with Team Foundation Build.
But first, here is our current setup...
There are 200+ C# solution files, each containing one or more projects. It is not practical in our environment to combine these projects into fewer solutions, i.e. this is by design. Projects within a solution use project references to resolve dependencies and file references to projects in other solutions. As far as I know, this is the approach recommended by Microsoft when dealing with a large number of projects.
We use a "branch by feature" strategy, i.e. isolated development on concurrent feature branches which are merged up to a stable Main branch when complete. When it's time for a release, a release branch is created from Main and isolated for hotfixes and deployment. The feature branches and the Main branch have a CI build triggered by code check-ins. Releases will most likely be manually executed from InRelease against a selected release branch. A release will be deployed through various environments including INTEGRATION/TEST, UAT and ultimately to all our clients. We're still fleshing out the details of the branching strategy, but that's a question for another time.
The current problems to solve:
1. Avoid rebuilding of dependencies that have no code changes...
When we deploy new functionality or a patch to a client, we want to push the absolute minimum in files. Our company has a very large customer base (thousands of customers) with sometimes slow internet connections, so doing a full deployment of all assemblies (200+) to every customer is not an option. I've partially solved the problem by setting up incremental builds which correctly rebuild only changed projects as expected, but which also rebuild all the dependent projects even though NO CODE CHANGES were made to them. This results in both the changed assemblies and their dependents having new timestamps. If we use the change of timestamp to identify which assemblies to deploy, then this would result in deployment of functionally unchanged assemblies. The goal here is to deploy only assemblies where the code has changed and assemblies where breaking changes occur.
For example:
Solution B, has a project called Project B
Solution A, has a project called Project A
Project B -> Project A (where Project B has a file dependency on Project A)
When a non-breaking change is made in Project A, say to the interior of a method, then the expected result is: only A is built and therefore a candidate for deployment.
When a breaking change is made in Project A that breaks Project B, the expected result is: both A and B are built and are therefore candidates for deployment.
Currently MSBuild rebuilds all dependents regardless, which is not what we want.
2. Automatically identify which assemblies should be deployed...
I have a partial solution to the problem.
When a build is performed, my build process template is configured to run a MSBuild script containing a list of solutions to build in a particular order.
This operation is performed in the build agent's workspace. Every time a new build is performed, the build process template creates a unique drop folder and copies the binaries from the build agent workspace to that drop folder. This is out-of-the-box functionality taken care of by the standard build process template. The build has been configured not to clear the build agent workspace, so the first time it runs it will build all projects within a solution, but subsequent builds will only build projects that have code changes or are dependent on other changed projects (incremental build?). Therefore unchanged assemblies keep their original timestamps and changed assemblies get new timestamps.
We have a tool that can do folder comparisons between drop folders and output the results to a txt file. This allows us to identify which binaries have been added/changed/removed since the last deployment. It also gives us the added benefit of comparing the list of actual artefacts to a manifest of expected artefacts as defined by the developer. This will ensure that no assemblies get deployed that have not been specified and proven to be unit tested.
The question is how we can leverage InRelease to deploy only the required files as per the example above and not all files in the drop folder?
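As a hypothetical sketch of the kind of drop-folder comparison described above (the UNC paths are placeholders, flat drop folders are assumed, and Get-FileHash requires PowerShell 4+), comparing file hashes rather than timestamps sidesteps the problem of dependents being rebuilt with new timestamps but unchanged content:
$old = "\\server\drops\Build_41"
$new = "\\server\drops\Build_42"
Get-ChildItem $new -Filter *.dll | ForEach-Object {
    $previous = Join-Path $old $_.Name
    if (-not (Test-Path $previous) -or
        (Get-FileHash $_.FullName).Hash -ne (Get-FileHash $previous).Hash) {
        $_.Name   # new or changed content -> candidate for deployment
    }
}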
Install a TFS proxy in front of your build machine; this reduces the network traffic.
Start with a branching strategy like Service Pack; you can read documentation about it in the ALM Rangers guidance. Then adapt your process template to build just the part of the code that changed. I think you will find more information in BRD Lite, another guidance from the ALM Rangers.

Is nuget appropriate for daily development workflow?

I am looking at nuget for improving automatic handling of dependencies (both internal and third party) during development.
As long as you develop through the CI Build Server, all is good:
get latest source for A and B, where B depends on A
fix bug in A
build A
check into source control
CI Build Server initiated
new nuget package is created and placed in corporate repository
build B (which will get the updated A package)
run B to verify that the bug in A was fixed
repeat n times
However, I'm wondering if it is possible to work locally as a single developer, without having to wait for the CI Build Server to produce a new package?
NuGet has a feature called Package Restore, which will download all dependencies automatically on build. You can also list the order of repositories that Package Restore should look in for packages.
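For illustration only (the feed name, folder and solution name are placeholders), registering a plain folder as an additional package source and restoring against it can be done with the NuGet command line:
nuget sources add -Name LocalDev -Source C:\LocalNuGet
nuget restore MySolution.sln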
If the workflow could become:
get latest source for A and B, where B depends on A
fix bug in A
build A
(building creates a local nuget package)
run B to test the (resolved) bug in A (B should now use our locally built NuGet package, not the one from the corporate repository)
...repeat n times
check into source control
CI Build Server initiated
new nuget package created in corporate repository
Is this possible using Visual Studio, MSBuild, a CI Build Server and nuget? I'm especially interested in the making of local packages while developing locally.
Note that I have native projects; apart from the post-build generation of the NuGet package, this is a workflow that I hope should work for both C# and C++ projects.
The solution I have now, though far from ideal, is what I could figure out works best. Oh! and it is a work in progress so it WILL change in the coming weeks/months as I figure out how to get around the kinks.
I mostly have to deal with managed DLLs right now, but I do have some native code and, worse, multi-platform native code to deal with eventually.
Create a local repository, basically just a folder, and configure it in your list of NuGet feeds.
Then I created a task (MSBuild) that will package the project and output it in the local repository's root folder. Make sure the version of your package is always increasing. Presently I do this manually by editing the assembly version.
Once built, update your other projects that reference it; I usually do this through the Package Manager Console (update-package).
For each project that was updated, bump up its version; rinse, lather and repeat until you get to your top-most project (the actual program).
Once everything is nice and good and you are ready to commit, then the build system should do its own packaging and send it to your official repository.
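A minimal sketch of that local cycle, assuming a folder feed at C:\LocalNuGet and a project named ProjectA (both invented for the example); here the version is bumped on the pack command line rather than by editing the assembly version as described above:
nuget pack ProjectA\ProjectA.csproj -Version 1.2.3-dev1 -OutputDirectory C:\LocalNuGet
# then, in the Package Manager Console of the consuming solution:
Update-Package ProjectA -Source C:\LocalNuGet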
The Good
No clogging of the repository and build system with intermediary development versions; that garbage remains (as it should) local.
Local repos are super easy to set up; it can even be done without changes to VS, through the global NuGet config.
This is friendly to both paradigms of package restore or checking in packages with the project. That said, I would recommend not checking in the packages you built locally, but rather ones that were committed to your repository, ideally through the build system. What's built locally should remain local.
The Bad
Still much more complicated than just adding projects to a solution.
The deeper (or wider) your dependency tree the bigger the pain.
The Ugly
Makes some native NuGet behaviors quite quirky and annoying:
The update operation takes forever if your VS is connected to a version control system (Perforce for me). I hear they "solved" the problem; I would hate to see how it was before if it was worse than it is now!
Having NuGet change non-code references back to "never copy" is a major pain.
If Only
Configure the desired state of a content dependency (copy always, never or newer) directly from the nuspec and be done with it! (Oh, and same story with ClickOnce content status: include, exclude etc.)
Make the update operation quick; 2 minutes for a dozen projects is just insane, especially if the ultimate goal is to manage 500+.
Perhaps a hybrid mode where locally we work with project inclusion but the build system works with NuGet dependencies (and builds them if necessary).
If you are going to parse the project files, do follow MSBuild parsing rules and honor the conditional statements.
There are still issues I have yet to figure out, like how to manage multiple branches of the code in the repository, and how to handle version conflicts further up the food chain. In a large project (ultimately we have to bring 500+ separate projects together into a single application executable), conflicts are expected.
I would love to bring all the goodness of sane dependency management à la Maven but thus far I did not find nuget to be mature enough to even think of proposing it to the dev team.
Certainly. In our solutions, NuGet parks the libraries in the "packages" directory of the solution's hierarchy which is ultimately kept in TFS. This allows for complete solution check-outs that includes the required libraries. If it's your intention to update the libraries normally provided by NuGet, you'll need to update the dependent projects' references to point to the project containing the updated code normally provided by the NuGet process.
Prior to checking in your regular solution work (not the NuGet-related libs), make sure the solution's NuGet libs are up to date and the references in the solution point back to the NuGet-installed libs. Of course, you'll check in and fetch the NuGet-related libs beforehand.

How to upload new/changed files from development server to the production one?

Recently I started to incorporate good practices in my development workflow, so I split the development server and the production one. I also incorporated a versioning system using Subversion (Tortoise SVN).
Now I have the problem of synchronizing the production server (Apache shared hosting) with the files of the latest development version on my local machine.
Before, I didn't have this problem because I worked directly with the server files through FileZilla. But now I don't know how to transfer the files in an efficient way and what the good practices are in this area.
I read something about Ant and Phing but I'm not sure if this is appropriate for me or is unnecessary complexity.
Rsync is a cross-platform tool designed to help in situations like this; I've used it for similar purposes on multiple occasions. This DevShed tutorial may be of some help.
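As a purely illustrative sketch (the paths, user and host below are made up, and this assumes your shared host allows SSH access), a one-way sync of the local working copy to the production web root might look like this; the --dry-run pass shows what would be transferred before doing it for real:
rsync -avz --dry-run ./public_html/ user@production.example.com:/var/www/site/
rsync -avz ./public_html/ user@production.example.com:/var/www/site/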
I don't think you want to "automate" it; rather, establish control over your deployment and integration process. I generally like SVN, but it has some bugs, and one problem I have with it is that it doesn't support baselining -- instead you need to make a physical branch of your repository if you want to have a stable version to promote to higher environments while continuing to advance the trunk.
Anyway, you should look at continuous integration and Jenkins. This is a rather broad topic to which no single answer can be given. There are many ins, outs, and what-have-yous. It depends on your application platform and components: do you have database changes, are you dealing with external web services or 3rd-party APIs, etc.
There may be more structured solutions out there, but with TortoiseSVN you can export only the files changed between revisions, in a folder tree structure, and then upload them as usual with FileZilla.
Take a look at:
http://verysimple.com/2007/09/06/using-tortoisesvn-to-export-only-newmodified-files/
1. Using TortoiseSVN, right-click on your working folder and select “Show Log” from the TortoiseSVN menu.
2. Click the revision that was last published.
3. Ctrl+Click the HEAD revision (or whatever revision you want to release) so that both the old and the new revisions are highlighted.
4. Right-click on either of the highlighted revisions and select “Compare revisions.” This will open a dialog window that lists all new/modified files.
5. Select all files from this list (Ctrl+A), then right-click on the highlighted files and select “Export selection to…”
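If you prefer the command line, a similar list of changed paths can be produced with the Subversion client itself; the revision number and repository URL here are placeholders for your own values:
svn diff --summarize -r 1500:HEAD https://svn.example.com/repo/trunk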
Side note:
You have to give more details about your workflow and configuration; the applicable solutions depend on them. I see 4 main nodes in play: Workplace, Repo Server, DEV, PROD. Some nodes may be combined (1+2, 2+3) and may have different sets of tools (do you have SSH, rsync, NFS, or Subversion clients on DEV|PROD?). All details matter.
In any case, Subversion repositories have a feature called hooks; in your case a post-commit hook (executed on the repository server side after each commit) may be used.
In this hook (any code which can be executed in unattended mode) you can define and implement any rules for deploying to any target under any conditions. You only need to know:
Which transport will be used for transferring files
What your webspaces on the servers are (working copies or just clean unversioned files; both solutions have pros and cons), which will determine which deployment policy ("export" or "update") you have to implement in the hook
There are scripts around which export the files affected by a revision (or a range of revisions) into an unversioned tree.
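For illustration, a rough post-commit hook sketch along those lines, assuming the "export" policy and an rsync transport (repository layout, host and paths are all hypothetical; svnlook changed could be used instead to narrow the transfer to just the files touched by the revision):
#!/bin/sh
REPOS="$1"
REV="$2"
EXPORT_DIR="/tmp/deploy-$REV"
# export the committed tree to a clean, unversioned folder
svn export -q -r "$REV" "file://$REPOS/trunk" "$EXPORT_DIR"
# push it to the production webspace
rsync -az "$EXPORT_DIR/" user@production.example.com:/var/www/site/
rm -rf "$EXPORT_DIR"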

Testing a NuGet package

We are big users of NuGet, we've got 25-30 packages which we make available on a network share.
We'd like to be able to test new packages before they're built and released in the consuming applications. Ideally, this could be done using something similar to Maven's snapshot and having a specific development package (e.g. snapshot functionality).
Has anyone else come up with a, ideally reasonably non-hacky, way of doing it?
Our favoured method is to generate the package assemblies and then manually overwrite the assemblies in the packages/ directory, i.e. to replace the actual project references, but that doesn't seem particularly clean.
Update:
We use a CI build server which creates builds on every commit and has a specific manually triggered NuGet build which works off specifically tagged versions of the codebase. We don't want to create a NuGet build off every commit, but we would like to be able to test a likely candidate in the wild before we trigger the manual NuGet package build.
I ended up writing a unit/integration testing framework to solve a similar problem. Basically, I needed to verify the content of the package, the versions and info, what would happen when I installed and uninstalled the package, what versions the assemblies in lib were, what bitness the assemblies were built for (x86 or x64) and so on - and I needed it all to run without Visual Studio installed and on my build machine (headless) as a quality gate.
Standing on the shoulders of giants like Pester, PETools, and SharpDevelop's package management module, I put together nuget-test.
Clone the project into your package directory (where your .nuspec file and package files are). If for whatever reason you want to keep the nuget-test project as a "git" repo, then simply remove "remove-item nuget-test/.git -Recurse -Force" from the command below.
git clone https://github.com/nickfloyd/nuget-test.git; remove-item nuget-test/.git -Recurse -Force
Run Setup.ps1 in the root of the nuget-test directory in an x86 instance of PowerShell.
PS> .\setup.ps1
Write tests and place them in the nuget-test/test directory using the Pester syntax.
Run the tests.
PS> Invoke-Pester
Project page: nuget-test
On github: https://github.com/nickfloyd/nuget-test
I hope this helps you get closer to what you're trying to get done.
If you're using NuGet packages to distribute your libraries, you should not limit yourself to only testing the libraries. You should test the packages themselves as well (if your binaries are OK but incorrectly installed, consumers will still have issues). The whole point is to improve this experience.
One way could be to have an additional CI or QA repository. The one you currently have is actually your "production" repository containing consumable releases, considered finished high-quality products.
Going further, you could have a logical package promotion flow (based on Continuous Integration or even using a Continuous Delivery approach), where:
- each check-in produces a package on your CI repository
- testers pick up a CI package for QA and, if it is found OK, promote it either to a QA feed or to the Production feed (whatever you prefer; it depends on the quality of your testing and how well it is automated)
There are various ways of implementing this scenario, using simple network shares, internal NuGet.Server or Gallery implementations, or simply use http://myget.org to give it a try with minimal cost and zero effort.
Hope that helps!
Cheers,
Xavier

Archivable, replicable releases when building with Maven: is there a right way?

We have a largish standalone (i.e. not Java EE) commercial Java project (10,000+ classes, four or five SVN repositories, ten or twenty third-party libraries) that's in the process of switching over to Maven. Unfortunately only one engineer (in a team of a dozen or so distributed across three countries) has any prior Maven experience, so we're kind of figuring it out as we go.
In the old Ant way of doing things, we'd:
check out source code from three or four repositories
compile it all into a single monolithic JAR
release that (as part of a ZIP file with library JARs, an installer, various config files, etc.)
check the JAR into SVN so we had a record of what the customers had actually got.
Now, we've got a Maven repository full of artifacts, and a build process that depends on Maven having access to that repository. So if we need to replicate what we actually shipped to a customer, we need to do a build against a Maven repository that has all the proper versions of everything. This is doable, I guess, if in (some version of) the (SVN-controlled) POM files we set all the dependencies to released versions?
But it gives our release engineer the creepy-crawlies, because there doesn't seem to be any way:
to make sure that somebody doesn't clobber the copy of foo-api-1.2.3.jar on the WebDAV server by mistake (the WebDAV server has access control, but that wouldn't stop a buggy build script)
to detect it if they did
to recover afterwards
His idea is, for release builds, to use a local file system as the repository rather than the WebDAV server, and put that local repository under SVN control.
Our one Maven-experienced engineer doesn't like that -- I guess because he doesn't like putting binaries under version control? -- and suggests that maybe the professional version of the Nexus server can solve the clobbering or clobber-tracking/recovery problem.
Personally, I'm not happy (sorry, Sonatype readers) with shelling out money for a non-free build system when we haven't even seen any benefit from the free version yet, and there's no guarantee it will actually solve the problem.
So our choices seem to be:
WebDAV server
Pros: only one server, also accessible by devs, ...?
Cons: easy clobbering, no clobber-tracking/recovery
Local file system
Pros: can be placed under revision control
Cons: only works with the distribution script
Frankly, both of these seem like hacks to me, and I have to wonder if there isn't a better way to do this.
So: Is there a right thing to do here?
I'm not sure I get everything, but I would:
Use the maven-release-plugin (which automates the release process, i.e. executes all the steps documented in release:prepare); a short example follows below.
Use WebDAV with an anonymous read-only and authenticated write policy (so only the release engineer can actually deploy released artifacts to the corporate repo).
There is no need to put generated artifacts under version control (if you have the POMs under version control). I don't see the benefit of using the local file system instead of WebDAV (it does not provide more security; you can secure WebDAV as well). I don't see what the commercial version of Nexus would solve here.
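For completeness, the two commands the release plugin workflow mentioned above boils down to (run from the project root, and assuming the usual plugin configuration such as the SCM section in the POM is already in place):
mvn release:prepare   # tags the release in SCM and bumps the POM versions
mvn release:perform   # checks out the tag, builds it and deploys the artifacts to the remote repository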
Nexus has a setting which prevents you from clobbering an already released artefact in a release repository.
For a team of about a dozen, the free version of Nexus should be enough.