Does Drone support configuring the timeout value for a build? - drone.io

I set up a local Drone server for our CI. Our project is a Java project managed by Maven. When the mvn clean install command runs, Maven downloads all dependencies into the ~/.m2 directory. The first run of this command downloads a huge amount of data from the remote Maven repository, which can take a very long time. In that case, I get the error below on Drone CI.
ERROR: terminal inactive for 15m0s, build cancelled
I understand that this message means there was no output on the console for 15 minutes, but that is normal in my build environment. I wonder whether I can configure the 15m to a larger value so I can build our project.

My previous answer is outdated. You can now change the default timeout for each individual repository from the repository settings screen. This setting is only available to system administrators.
You can increase the terminal inactive timeout by passing DRONE_TIMEOUT=<duration> to each of your agents.
docker run -e DRONE_TIMEOUT=15m drone/drone:0.5 agent
The timeout value may be any valid Go duration string [1].
# 30 minute timeout
DRONE_TIMEOUT=30m
# 1 hour timeout
DRONE_TIMEOUT=1h
# 1 hour, 30 minute timeout
DRONE_TIMEOUT=1h30m
[1] https://golang.org/pkg/time/#ParseDuration
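If you run the agent with Docker Compose instead of docker run, the same variable can be set in the compose file. A minimal sketch, assuming the 0.5 agent image used above; the rest of the agent configuration is whatever you already use:
agent:
  image: drone/drone:0.5
  command: agent
  environment:
    - DRONE_TIMEOUT=1h
    # keep your existing DRONE_SERVER / DRONE_SECRET settings here
  volumes:
    - /var/run/docker.sock:/var/run/docker.sock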

Looking at the Drone source code, it looks like there are environment variables DRONE_TIMEOUT and DRONE_TIMEOUT_INACTIVITY that you can use to configure the inactivity timeout. I tried putting them in my .drone.yml file and they didn't seem to do anything, so they may only be available at a higher level.
Here is the reference to the environment variable DRONE_TIMEOUT_INACTIVITY:
https://github.com/drone/drone/blob/17e5eb50363f3fcdc0a0461162bee93041d600b7/drone/exec.go#L62
Here is the reference to the environment variable DRONE_TIMEOUT:
https://github.com/drone/drone/blob/eee4fd1fd2556ac9e4115c746ce785c7364a6f12/drone/agent/agent.go#L95
Here is where the error is thrown:
https://github.com/drone/drone/blob/5abb5de44aa11ea546db1d3846d603eacef7f0d9/agent/agent.go#L206
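Based on those references, both values appear to be read from the environment of the process that runs the build rather than from .drone.yml. A hedged sketch of how you might set them (exact behaviour depends on your Drone version):
# local build via the CLI (exec.go reads DRONE_TIMEOUT_INACTIVITY)
DRONE_TIMEOUT_INACTIVITY=30m drone exec

# agent process (agent.go reads DRONE_TIMEOUT)
docker run -e DRONE_TIMEOUT=30m drone/drone:0.5 agent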

Related

Configure allowed_pull_policies on shared GitLab runner

I'm using GitLab.com's managed CI runners, and I'd like to run my CI jobs using the if-not-present pull policy to avoid the extra minutes it takes to pull the image for each job. Trying to set that value in the .gitlab-ci.yml file gives me this error:
pull_policy ([if-not-present]) defined in GitLab pipeline config is not one of the allowed_pull_policies ([always])
This led me to the config.toml settings for restricting Docker pull policies, so I created a config.toml file at the root of my repository and tried that. However, I still get the same error.
Is config.toml only available for manual/self-hosted runners? Is there any other way to get past this?
Context
Image selection in .gitlab-ci.yml:
default:
  image:
    name: registry.gitlab.com/myorg/myrepo/ci/builder:latest
    pull_policy: if-not-present
Contents of config.toml:
[[runners]]
  executor = "docker"
  [runners.docker]
    pull_policy = ["if-not-present"]
    allowed_pull_policies = ["always", "if-not-present"]
First of all, the config.toml file is not meant to be in your repo but on the runner machine (or container).
But in any case, the always pull policy should not cause image pulls to take minutes if the layers are already cached locally: it just ensures you have the latest version by checking the metadata. If the pulls take minutes, it means either the layers are not available locally, or the image was actually updated (or the connection to your container registry is so incredibly slow that just checking the metadata takes minutes, but that is unlikely).
It is quite possible that GitLab's managed runners have no way to cache layers locally, in which case there would be no practical difference between the always and if-not-present policies. For instance, if you use GitLab SaaS:
A dedicated temporary runner VM hosts and runs each CI job.
(see https://docs.gitlab.com/ee/ci/runners/index.html)
Thus the downloaded layers are discarded as soon as the job finishes.
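If you do switch to a self-hosted runner, the pull policies belong in the runner's own config.toml on the runner host, not in the repository. A sketch, assuming a Linux runner registered as root (the path and runner name are assumptions):
# /etc/gitlab-runner/config.toml on the runner host
[[runners]]
  name = "my-self-hosted-runner"
  executor = "docker"
  [runners.docker]
    allowed_pull_policies = ["always", "if-not-present"]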

GitLab Runner fails to upload artifacts with "invalid argument" error

I'm completely new to trying to implement GitLab's CI/CD pipelines, but it's been going quite well. In fact, for my ASP.NET project, if I specify a Publish Profile in the msbuild command that uses Web Deploy, it actually deploys the code successfully to the web server.
However, I'm now wanting to have the "build" job create artifacts which are uploaded to GitLab that I can then subsequently deploy. We're using a self-hosted instance of GitLab, for which I'm not an admin, but I can speak to the admin if I know what I'm asking for!
So I've configured my .gitlab-ci.yml file like this:
variables:
  NUGET_PATH: 'C:\Program Files\Nuget\Nuget.exe'
  NUGET_SOURCES: 'https://api.nuget.org/v3/index.json'
  MSBUILD_PATH: 'C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\MSBuild\Current\Bin\msbuild.exe'

stages:
  - build

build-job:
  variables:
    CI_DEBUG_TRACE: "true"
  stage: build
  script:
    - '& "$env:NUGET_PATH" restore ApplicationTemplate.sln -Source "$env:NUGET_SOURCES"'
    - '& "$env:MSBUILD_PATH" ApplicationTemplate\ApplicationTemplate.csproj /p:DeployOnBuild=true /p:Configuration=Release /p:PublishProfile=FolderPublish.pubxml'
  artifacts:
    paths:
      - '.\ApplicationTemplate\bin\Release\Publish\'
The output shows that this builds the code just fine, and it also seems to successfully find the artifacts for upload. However, when it uploads the artifacts, even though the request gets a 200 OK response, the process fails. Here is the log output:
So, it finds the artifacts, it attempts to upload them and even gets a 200 OK response (in contrast to the handful of similar reports of this error I've been able to find online), but it still fails due to an invalid argument.
I've already enabled verbose debugging, as you can see from the output, but I'm none the wiser. Looking at the GitLab Runner entries in the Windows Event Log on the box where the runner is hosted doesn't shed any light on things either. The total size of the artifacts is 61.1MB, so I don't think my issue is related to that.
Can anyone see from this output what's invalid? Can I identify which argument is invalid and/or why it's invalid?
Edit: Things I've tried
Specifying a value for artifacts:expire_in.
Setting artifacts:public to FALSE, since I'm using a self-hosted GitLab environment and the default value for this setting (TRUE) is not valid in such an environment.
Trying every format I can think of for the value of the artifacts:paths setting (this seems to be incredibly robust - regardless of the format I use, the Runner seems to have no problem parsing it and finding the files to upload).
Taking a cue from this question I created a new project with a very simple build job to upload a single file:
stages:
  - build

build-job:
  variables:
    CI_DEBUG_TRACE: "true"
  stage: build
  script:
    - echo "Test" > test.txt
  artifacts:
    paths:
      - test.txt
About 50% of the time this job hangs on the upload of the artifacts and I have to cancel it. The other half of the time it fails in exactly the same way as my previous project:
After countless hours working on this, it seems that ultimately the issue was that our internal Web Application Firewall was blocking some part of the transfer of artefacts to the server, or the response back from it. With the WAF reconfigured not to block traffic from the machine running the GitLab Runner, the artefacts are successfully uploaded and the job succeeds.
This would have been significantly easier to diagnose if the logging from GitLab was better. As per my comment on this issue, it should be possible to see the content of the response from the GitLab server after uploading artefacts, even when the response code is 200.
What's strange - and made diagnosing the issue even harder - is that when I worked through the issue with the admin of our GitLab instance, digging through logs and running it in debug mode, the artefact upload process was uploading something successfully. We could see, for example, the GitLab Runner's log had been uploaded to the server. Clearly the WAF's blocking was selective and didn't block everything in both directions.

TFS - 'Run SSH task' option times out

In TFS, I'm using the SSH task with the 'Commands' option to connect to a remote machine and run a few commands. I cd to a particular folder and run a shell script using 'sh '.
This script usually takes around 2 hours to finish. The SSH task times out after 15 minutes and exits, but when I check the machine manually, the process is still running.
Why doesn't the SSH task wait until the script finishes completely?
According to your description, you may have hit a timeout limit on the SSH task or the build definition.
First, please double-check the timeout setting under Control Options.
Specifies the maximum time, in minutes, that a task is allowed to
execute before being cancelled by server. A zero value indicates an
infinite timeout.
Another place to check is the build job timeout, under the settings of your build definition: Options -> Build job timeout in minutes.
Specifies the maximum time a build job is allowed to execute on an
agent before being canceled by the server.
An empty or zero value indicates an infinite timeout.
If both are set properly and you still hit the timeout, please attach a more detailed log of the failed build captured with verbose debugging (set system.debug=true) for troubleshooting.
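If your TFS version supports YAML build definitions, the same limits can be set there. A rough sketch, assuming Azure Pipelines YAML syntax; the SSH service connection name, folder, and script below are placeholders:
jobs:
  - job: run_remote_script
    timeoutInMinutes: 180        # build job timeout; zero typically means unlimited on self-hosted agents
    steps:
      - task: SSH@0
        timeoutInMinutes: 150    # per-task timeout (Control Options)
        inputs:
          sshEndpoint: 'my-ssh-connection'
          runOptions: 'commands'
          commands: 'cd /path/to/folder && sh myscript.sh'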

How to disable simultaneous builds on drone.io?

I use Drone as CI and want to know how I can disable simultaneous builds. What happens is that when I push two commits to the git repo, Drone triggers a build for each of them. How can I make the second build wait until the first one finishes?
Regarding the open source version of Drone: set the DOCKER_MAX_PROCS environment variable of your drone agent to 1, i.e. docker run -e DOCKER_MAX_PROCS=1 [...] drone/drone:0.5 agent. The agent will run one build concurrently, other builds will queue up.
See the Installation Reference section in the readme for more info.

How to stop (kill) a build in drone

Is there a way to kill a build in drone before it finishes or times out?
The default timeout in drone is 6 hours (https://github.com/drone/drone/blob/master/cmd/drone/drone.go#L32)
And if you have a mistake in your makefile that just gets stuck, you need to wait for 6 hours.
This is especially annoying if you have a limited number of simultaneous builds.
My question is about the self hosted, open source version, not the hosted version if it makes any difference.
You can stop drone build using CLI:
drone build stop <repo/name> <build>
If build cannot be stopped/canceled, you can kill it:
drone build kill <repo/name> <build>
See more commands in drone CLI docs.
I just pushed a new commit and it automatically stopped the stuck build and started a new one. No need to wait for 6 hours. ;)
This is possible from the UI in Drone 0.4.
To stop the build using the drone CLI, use the following command:
drone build stop <repo/name> <DRONE_BUILD_NUMBER>
Make sure the following are exported:
export DRONE_SERVER=https://drone.server.com
export DRONE_TOKEN=<secret_drone_token>
It is also possible to stop the build using API:
DELETE /api/repos/{owner}/{repo}/builds/{build}
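For example, assuming the server accepts the CLI token as a bearer token (the host, repository, and build number below are placeholders):
curl -X DELETE \
  -H "Authorization: Bearer $DRONE_TOKEN" \
  https://drone.server.com/api/repos/octocat/hello-world/builds/42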