VSTS multi-phased builds - run nuget restore and npm install in parallel

I have a build where, in the pre-compilation stage, nuget restore takes ~3 minutes to restore packages from cache, and npm install takes about as long.
These two cache restores could run in parallel, but I am not sure whether this is possible using VSTS phases.
Each phase may use different agents. You should not assume that the state from an earlier phase is available during subsequent phases.
What I would need is a way to pass the contents of the packages and node_modules directories from two different phases into a third one that invokes the compiler.
Is this possible with VSTS phases?

I wouldn't do this with phases. In fact, I'd consider not doing it at all. Restoring packages (regardless of type) is an I/O-bound operation -- you're not likely to gain much by parallelizing it, and it may even be slower. The bulk of the time spent restoring packages is either waiting for files to download or copying files around on disk. Downloading twice as many files at once just takes twice as long, and the same goes for copying files on disk. Roughly speaking, of course -- it may be a bit faster in some cases, but it's not likely to be significantly faster in the average case.
That said, you could write a script to spin off two separate jobs and wait for them to complete. Something like this, in PowerShell:
# Start both restores as background jobs and record their job IDs
$dotnetRestoreJob = (Start-Job -ScriptBlock { dotnet restore }).Id
$npmRestoreJob = (Start-Job -ScriptBlock { npm install }).Id

# Poll once a second until neither job is still running
do {
    $jobStatus = Get-Job -Id @($dotnetRestoreJob, $npmRestoreJob)
    $jobStatus
    Start-Sleep -Seconds 1
}
while ($jobStatus | Where-Object { $_.State -eq 'Running' })
Of course, you'd probably want to capture the output from the jobs and check for whether there was a success exit code or a failure exit code, but that's the general idea.
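For example, a minimal sketch of that error handling might look like this -- the throw wrappers and the Wait-Job/Receive-Job handling are my own illustration, not part of the original script:
# Surface each restore's exit code by throwing from inside the job
$jobs = @(
    Start-Job -ScriptBlock { dotnet restore; if ($LASTEXITCODE -ne 0) { throw 'dotnet restore failed' } }
    Start-Job -ScriptBlock { npm install; if ($LASTEXITCODE -ne 0) { throw 'npm install failed' } }
)

# Block until both jobs finish, then replay their output
Wait-Job -Job $jobs | Out-Null
foreach ($job in $jobs) {
    Receive-Job -Job $job                        # re-emits the job's output and any errors
    if ($job.State -eq 'Failed') { exit 1 }      # fail the build step on either restore failing
}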

The real problem here wasn't that npm install and nuget restore couldn't have run in parallel on a VSTS hosted agent. No.
The real problem was that hosted agents do not use the nuget cache by design:
We have determined that this issue is not a bug. Hosted agent will download nuget packages every time you queue a new build. You could not speed this nuget restore step using a hosted agent.
https://developercommunity.visualstudio.com/content/problem/148357/nuget-restore-is-slow-on-hostedagent-2017.html
So the solution that took nuget restore time from 240s down to 20s was to move the build to a private (local) agent. That way the local cache does get used.
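If you want to confirm the cache is actually in place on the private agent, nuget can list its local cache locations (the path in the comment is just a typical example):
# List the local caches nuget reuses between builds on a private agent
nuget locals all -list
# Typical output includes a line like:
#   global-packages: C:\Users\builduser\.nuget\packages\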

Configure allowed_pull_policies on shared GitLab runner

I'm using GitLab.com's managed CI runners, and I'd like to run my CI jobs using the if-not-present pull policy to avoid the extra minutes it takes to pull the image for each job. Trying to set that value in the .gitlab-ci.yml file gives me this error:
pull_policy ([if-not-present]) defined in GitLab pipeline config is not one of the allowed_pull_policies ([always])
This led me to the config.toml settings for restricting Docker pull policies, so I created a config.toml file at the root of my repository and tried that. However, I still get the same error.
Is config.toml only available for manual/self-hosted runners? Is there any other way to get past this?
Context
Image selection in .gitlab-ci.yml:
default:
  image:
    name: registry.gitlab.com/myorg/myrepo/ci/builder:latest
    pull_policy: if-not-present
Contents of config.toml:
[[runners]]
  executor = "docker"
  [runners.docker]
    pull_policy = ["if-not-present"]
    allowed_pull_policies = ["always", "if-not-present"]
First of all, the config.toml file is not meant to be in your repo but on the runner machine (or container).
But anyway, the always pull policy should not cause image pulls to last minutes if the layers are already cached locally: it just ensures you have the latest version by checking the metadata. If the pulls take minutes, it means that either the layers are not available locally or the image was actually updated (or the connection to your container registry is so incredibly slow that just checking the metadata takes minutes, but that is unlikely).
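For instance, with the layers already cached on the machine, pulling an unchanged image under the always policy returns almost immediately (illustrative output):
# With cached layers, 'always' only verifies the image digest
docker pull registry.gitlab.com/myorg/myrepo/ci/builder:latest
# ...
# Status: Image is up to date for registry.gitlab.com/myorg/myrepo/ci/builder:latest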
It is quite possible that GitLab's managed runners have no way to cache layers locally, in which case there is no practical difference between the always and if-not-present policies. Indeed, if you use GitLab SaaS:
A dedicated temporary runner VM hosts and runs each CI job.
(see https://docs.gitlab.com/ee/ci/runners/index.html)
Thus the downloaded layers are discarded as soon as the job finishes.
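If you do run your own runner, the allowed_pull_policies setting belongs in that runner's config.toml on the runner machine (typically /etc/gitlab-runner/config.toml on Linux), not in the repository. A minimal sketch, with the runner name invented for illustration and the registration url/token omitted:
[[runners]]
  name = "my-docker-runner"
  executor = "docker"
  [runners.docker]
    image = "alpine:latest"
    # let jobs request if-not-present from .gitlab-ci.yml
    allowed_pull_policies = ["always", "if-not-present"]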

Why does artifact publish take a long time in Bamboo?

We are using Bamboo to build our code, create artifacts, and deploy.
Problem Scenario
I have a plan with a stage containing 3 jobs (dev/test/prod). The jobs build the code and publish a 16-20 MB artifact as a shared artifact. When I run this plan, the publish takes 8-9 minutes in all 3 jobs, and it happens at approximately the same timestamp in each job.
Here is an example log statement:
simple 10-Sep-2021 13:46:15 Publishing an artifact: Preview Artifact
simple 10-Sep-2021 13:55:09 Finished publishing of artifact Required shared artifact: [Preview Artifact], pattern: [**/Artifact.*.zip] in 8.897 min
I went onto the build server (Windows Server 2012) and viewed the artifact file in the work directory and in the artifacts directory. The file timestamps are indeed almost 9 minutes apart.
This is very consistent. I can view many previous builds and it is consistently taking 8 or 9 minutes.
Fixed Scenario
I just edited the plan and disabled 2 of the jobs. Now the artifact publish step takes mere seconds:
27-Sep-2021 15:20:19 Publishing an artifact: Preview Artifact
27-Sep-2021 15:20:56 Finished publishing of artifact Required shared artifact: [Preview Artifact], pattern: [**/Artifact.*.zip] in 37.06 s
Questions
Why is the artifact publish so slow when I run concurrent jobs? What is Bamboo doing during the publish step that could take so long?
I have 20 other build plans (that do not use concurrent jobs) in which the artifact copy takes less than a minute. I have never seen this problem with any of these other plans.
I don't see anything special in the documentation, nor can I find a problem like this when I search Google and Stack Overflow. I need the artifact to be shared because I use it in a Deployment project.
EDIT:
Now that I think of it, 37 seconds is way too long as well. I just copied the file manually and it took about a second. Why is it taking so long even without concurrent jobs?

The mystery of stuck inactive msbuild.exe processes, locked Stylecop.dll, Nuget AccessViolationException and CI builds clashing with each other

Observations:
On our Jenkins build server, we were seeing lots of msbuild.exe processes (~100) hanging around after job completion, each with around 20 MB memory usage and 0% CPU activity.
Builds using different versions of stylecop were intermittently failing:
workspace\packages\StyleCop.MSBuild.4.7.41.0\tools\StyleCop.targets(109,7):
error MSB4131: The "ViolationCount" parameter is not supported by the "StyleCopTask" task.
Verify the parameter exists on the task, and it is a gettable public instance property.
Nuget.exe was intermittently exiting with the following access violation error (0xC0000005):
.\workspace\.nuget\nuget install .\workspace\packages.config -o .\workspace\packages
exited with code -1073741819.
MsBuild was launched in the following way via a Jenkins Matrix job, with 'BuildInParallel' enabled:
msbuild /t:%Targets% /m
  /p:Client=%Client%;LOCAL_BUILD=%LOCAL_BUILD%;BUILD_NUMBER=%BUILD_NUMBER%;
  JOB_NAME=%JOB_NAME%;Env=%Env%;Configuration=%Configuration%;Platform=%Platform%;
  Clean=%Clean% %~dp0\_Jenkins\Build.proj
After a lot of digging around and trying various things to no effect, I eventually ended up creating a new minimal solution which reproduced the issue with very little else going on. The issue turned out to be caused by msbuild's multi-core parallelisation - the 'm' parameter.
The 'm' parameter tells msbuild to spawn "nodes"; these remain alive after the build has ended and are then re-used by new builds!
The StyleCop 'ViolationCount' error was caused by a given build re-using an old version of stylecop.dll from another build's workspace, where ViolationCount was not supported. This was odd, because the CI workspace only contained the new version. It seems that once StyleCop.dll was loaded into a given msbuild node, it stayed loaded for the next build. I can only assume this is because StyleCop loads some sort of singleton into the node's process? This also explains the file locking between builds.
The nuget access violation crash has now gone (with no other changes), so is evidently related to the above node re-use issue.
Since the 'm' parameter defaults to the number of cores, we were seeing 24 msbuild instances created on our build server for a given job.
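You can see the lingering nodes for yourself after a build finishes; a quick check in PowerShell (illustrative, assuming PowerShell is available on the build server):
# List msbuild processes still alive after the build completed
Get-Process msbuild | Select-Object Id, StartTime, WS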
The following posts were helpful:
msbuild.exe staying open, locking files
http://www.hanselman.com/blog/FasterBuildsWithMSBuildUsingParallelBuildsAndMulticoreCPUs.aspx
http://stylecop.codeplex.com/discussions/394606
https://github.com/Glimpse/Glimpse/issues/115
http://msdn.microsoft.com/en-us/library/vstudio/ms164311.aspx
The fix:
Add the line set MSBUILDDISABLENODEREUSE=1 to the batch file which launches msbuild
Launch msbuild with /m:4 /nr:false
The 'nr' parameter tells msbuild not to use "node reuse", so msbuild instances are closed after the build completes and no longer clash with each other (those clashes were the cause of the errors above).
The 'm' parameter is set to 4 to stop too many nodes being spawned per job.
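Putting it together, the original launch command becomes something like this (a sketch of the batch invocation above with node reuse disabled; the caret line continuations are added for readability):
REM Disable node reuse for everything this batch file launches
set MSBUILDDISABLENODEREUSE=1

REM Cap the node count and turn off node reuse on the command line too
msbuild /t:%Targets% /m:4 /nr:false ^
  /p:Client=%Client%;LOCAL_BUILD=%LOCAL_BUILD%;BUILD_NUMBER=%BUILD_NUMBER%;JOB_NAME=%JOB_NAME%;Env=%Env%;Configuration=%Configuration%;Platform=%Platform%;Clean=%Clean% ^
  %~dp0\_Jenkins\Build.proj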
I had the same issue. One stale reference I found was in the .csproj files:
<PropertyGroup>
  <StyleCopMSBuildTargetsFile>..\packages\StyleCop.MSBuild.4.7.48.0\tools\StyleCop.targets</StyleCopMSBuildTargetsFile>
</PropertyGroup>
I also deleted the entire "packages" folder that sits in the same folder as the .sln file, after closing Visual Studio. That made VS rebuild the folder and let go of its cached old version of StyleCop.
I'd had the same issue for a while; builds were taking over 6 minutes to finish. After some digging I found out it was node reuse's fault, so adding /m:4 /nr:false fixed my issue immediately.

Your total DEV#Cloud disk usage is over your subscription's quota

I have one Jenkins job.
My first configuration stores the last 60 builds.
After 32 builds I get the following message:
Build execution is suspended due to the following reason(s):
Your total DEV#Cloud disk usage is over your subscription's quota. Your subscription Free allows 2 GB, but you are using 2052 MB across all services (Forge and Jenkins). To fix this, you can either upgrade your subscription or delete some data in your Forge repositories, Jenkins workspaces or build artifacts.
OK, the build artefacts are too big.
Now I have configured the Jenkins job to store 60 builds and only 3 artefacts.
Where can I find the (old) build artefacts?
Where can I delete them?
You can manually delete build artifacts by deleting builds. This can be achieved by selecting a build from the build history and then deleting it with the "delete this build" link. This is quite cumbersome, so a better solution is to go to the build config and do the following: check the "discard old builds" checkbox, click the "Advanced" button, and put a suitable value in either "days to keep artifacts" or "max # of builds to keep with artifacts".
You could also install the disk usage plugin, which gives you information on how much space your jobs are taking.
Here's a wiki article about managing disk usage on DEV#cloud.

Atlassian Bamboo: First plan with a simple job of downloading a local git repo

I just downloaded the free trial of the Bamboo continuous integration server and created my first plan, with nothing in it but checking out the source code from git. I have a local git repository on the Bamboo machine, so the git URL points to a local path.
The problem is that when I run the job, it never finishes, even after waiting for an hour. These are the last lines of the activity log:
07-Apr-2011 20:03:23 Checking out revision f9dc82500914333ed4bbdae5ed038771fd658c3c.
07-Apr-2011 20:03:23 Creating local git repository in '/home/bob/bamboo-home/xml-data/build-dir/DEV-DEV-1/.git'.
From the shell I can go to the directory shown in the log and see that the source code was cloned correctly into the Bamboo working directory. But the job never finishes, and the log gets no further updates from this point. I have to terminate the job manually. Any ideas? Am I missing something?
Just a guess, since the Bamboo instance we have at work pulls from AccuRev and not Git, and I've never run into this problem myself -- but it may be hung because there isn't a builder defined for that plan. You might try defining a builder (even one that you know will fail) just to see if it makes it to that next step.
I had a very similar problem.
It's not a very original solution, but I just uninstalled Bamboo and installed it again. It works now.