How do I correctly configure Gitlab Runners with S3/Minio as a distributed cache?

I am running GitLab Runners on OpenShift, and they are picking up jobs correctly. The runner is configured to use an S3 cache, with a local MinIO service acting as the S3 endpoint for the distributed cache. However, when a job runs, the runner appears to ignore this setup and tries to use a local cache instead (and gets a permission denied error when trying to create it locally).
config.toml:
concurrent = 8
check_interval = 0
[[runners]]
name = "GitLab Runner"
url = "https://gitlab.com/"
token = "XXX"
executor = "kubernetes"
builds_dir = "/tmp/build"
environment = ["HOME=/tmp/build"]
cache_dir = "/tmp/cache"
[runners.kubernetes]
namespace = "gitlab-runners"
privileged = false
host = ""
cert_file = ""
key_file = ""
ca_file = ""
image = ""
cpus = ""
memory = ""
service_cpus = ""
service_memory = ""
helper_cpus = ""
helper_memory = ""
helper_image = ""
[runners.cache]
Type = "s3"
Shared = true
Path = "gitlab"
[runners.cache.s3]
ServerAddress = "minio-service"
AccessKey = "XXX"
SecretKey = "XXX"
BucketName = "gitlab-runner"
BucketLocation = "eu-west-1"
Insecure = true
Cache job output:
Initialized empty Git repository in /tmp/XXXX/XXX/.git/
Created fresh repository.
Checking out e57da922 as develop...
Skipping Git submodules setup
Restoring cache
00:01
Checking cache for develop-1...
FATAL: mkdir ../../../../cache: permission denied
Failed to extract cache
Executing "step_script" stage of the job script
02:02
$ npm install
added 1966 packages, and audited 1967 packages in 2m
found 0 vulnerabilities
Saving cache for successful job
00:01
Creating cache develop-1...
node_modules/: found 44671 matching files and directories
FATAL: mkdir ../../../../cache: permission denied
Failed to create cache
Cleaning up file based variables
00:00
Job succeeded
Second job (pulling from cache) output:
Restoring cache
00:00
Checking cache for develop-1...
FATAL: file does not exist
Failed to extract cache
Executing "step_script" stage of the job script

Related

Where does Nomad put the downloaded S3 files?

I have the following Nomad job:
job "aws_s3_copy_rev2" {
  datacenters = ["dc1"]
  type = "system"
  group "aws_s3_copy_rev2" {
    count = 1
    task "aws_s3_copy_rev2" {
      driver = "raw_exec"
      artifact {
        source = "s3::https://my-data-files/123/"
      }
      resources {
        cpu = 500 # 500 MHz
        memory = 256 # 256MB
        network {
          port "http" {}
        }
      }
    }
  }
}
I submitted the job using nomad run aws_s3_copy_rev2.nomad, but I do not know where the files are downloaded to. Where does Nomad put the downloaded S3 files?
This is my configuration file for starting the Nomad agent.
# Increase log verbosity
log_level = "DEBUG"
# Setup data dir
data_dir = "/tmp/client1"
# Give the agent a unique name. Defaults to hostname
name = "client1"
# Enable the client
client {
  enabled = true
  # For demo assume we are talking to server1. For production,
  # this should be like "nomad.service.consul:4647" and a system
  # like Consul used for service discovery.
  servers = ["xxx:4647"]
  options {
    "driver.raw_exec.enable" = "1"
  }
}
# Modify our port to avoid a collision with server1
ports {
  http = 5657
}
Usually artifacts are stored in the allocation folder of your Nomad allocation, which in the default case would be /etc/nomad.d/alloc/<alloc_id>/<task>/local/<your_file.ext> on Linux machines. I am not sure where things land on other operating systems.
In this case, your data_dir is set to /tmp/client1, so I would expect the files would be somewhere like /tmp/client1/alloc/<alloc_id>/<task>/local/<your_file.ext>.
It is important to note that these artifacts are downloaded on the Nomad client running an allocation of your job, not on the machine you are starting the job from.
Also, you might want to be careful rooting your Nomad data directory in the /tmp folder as it might get periodically deleted, which might explain why you cannot find those files.
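The directory layout described in this answer can be written out as a small helper. This is only a minimal Python sketch, assuming the <data_dir>/alloc/<alloc_id>/<task>/local/ layout stated above; the function name expected_artifact_path and the values "abc123" and "file.ext" are hypothetical placeholders, not anything Nomad provides.

```python
from pathlib import PurePosixPath


def expected_artifact_path(data_dir: str, alloc_id: str, task: str, filename: str) -> PurePosixPath:
    """Build <data_dir>/alloc/<alloc_id>/<task>/local/<filename>.

    Assumes the layout described in the answer above; the real
    allocation ID has to be looked up on the Nomad client.
    """
    return PurePosixPath(data_dir) / "alloc" / alloc_id / task / "local" / filename


if __name__ == "__main__":
    # "abc123" and "file.ext" are placeholders for a real allocation ID and file name.
    print(expected_artifact_path("/tmp/client1", "abc123", "aws_s3_copy_rev2", "file.ext"))
```

With the data_dir from the question, the helper points at a path under /tmp/client1/alloc/, which is where to start looking on the client node.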
You can reference this directory in the Nomad environment as ${NOMAD_TASK_DIR} and access or execute the file using that path:
artifact {
  source = "s3::https://some-bucket/code/archive-logs.sh"
  destination = "/local/"
}
driver = "raw_exec"
kill_timeout = "120s"
config {
  command = "/bin/bash"
  args = ["${NOMAD_TASK_DIR}/archive-logs.sh","7"]
}

Rabbit MQ declarative clustering

I have a RabbitMQ node running on a Windows 2012 server (rabbit@my-server-1).
I am creating a second node (rabbit@my-server-2) on a separate server (also Windows 2012) and would like to cluster it with the existing node. The second node is deployed via Octopus Deploy, and to make life easier I would like the clustering to happen automatically when the node starts.
Reading the documentation (https://www.rabbitmq.com/clustering.html and https://www.rabbitmq.com/configure.html) leads me to believe I just need to add the following to the rabbitmq.conf file:
cluster_nodes.disc.1 = rabbit@my-server-1
However, doing so prevents the node from starting. The erl.exe process starts using 100% CPU and I see the following message in the erl_crash.dump file:
Slogan: init terminating in do_boot (generate_config_file)
I believe this is symptomatic of an invalid config file, and indeed removing these config entries allows me to start the node fine.
I am able to cluster to the existing node manually via the relevant rabbitmqctl commands, but would prefer the declarative solution if possible.
I'm running RabbitMQ v3.7.4 and Erlang v20.3
So, what am I doing wrong? I've done some googling but haven't found anything that helps.
EDIT
Config file in full is:
listeners.ssl.default = 5671
ssl_options.cacertfile = e:/Rabbit/Certificates/cacert.pem
ssl_options.certfile = e:/Rabbit/Certificates/cert.pem
ssl_options.keyfile = e:/Rabbit/Certificates/key.pem
ssl_options.password = xxxxxxx
ssl_options.verify = verify_none
ssl_options.fail_if_no_peer_cert = false
ssl_options.versions.1 = tlsv1.2
web_stomp.ssl.port = 14879
web_stomp.ssl.backlog = 1024
web_stomp.ssl.certfile = e:/Rabbit/Certificates/cert.pem
web_stomp.ssl.keyfile = e:/Rabbit/Certificates/key.pem
web_stomp.ssl.cacertfile = e:/Rabbit/Certificates/cacert.pem
web_stomp.ssl.password = xxxxxxx
cluster_nodes.disc.1 = rabbit@my-server-1
How about adding the clustering information as described in the documentation under "Config File Peer Discovery Backend"?
That would leave you with a config file like this:
listeners.ssl.default = 5671
ssl_options.cacertfile = e:/Rabbit/Certificates/cacert.pem
ssl_options.certfile = e:/Rabbit/Certificates/cert.pem
ssl_options.keyfile = e:/Rabbit/Certificates/key.pem
ssl_options.password = xxxxxxx
ssl_options.verify = verify_none
ssl_options.fail_if_no_peer_cert = false
ssl_options.versions.1 = tlsv1.2
web_stomp.ssl.port = 14879
web_stomp.ssl.backlog = 1024
web_stomp.ssl.certfile = e:/Rabbit/Certificates/cert.pem
web_stomp.ssl.keyfile = e:/Rabbit/Certificates/key.pem
web_stomp.ssl.cacertfile = e:/Rabbit/Certificates/cacert.pem
web_stomp.ssl.password = xxxxxxx
cluster_formation.peer_discovery_backend = rabbit_peer_discovery_classic_config
cluster_formation.classic_config.nodes.1 = rabbit@my-server-1
cluster_formation.classic_config.nodes.2 = rabbit@my-server-2
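Since the deployment in the question is automated, the clustering lines could also be generated rather than typed by hand. The following is a minimal Python sketch under that assumption; classic_config_lines is a hypothetical helper name, and the only config keys it emits are the two shown in the answer above.

```python
def classic_config_lines(nodes):
    """Return rabbitmq.conf lines for the classic config peer discovery backend.

    `nodes` is a list of node names such as "rabbit@my-server-1".
    Entries are numbered from 1, matching the config shown above.
    """
    lines = [
        "cluster_formation.peer_discovery_backend = rabbit_peer_discovery_classic_config"
    ]
    for index, node in enumerate(nodes, start=1):
        lines.append(f"cluster_formation.classic_config.nodes.{index} = {node}")
    return lines


if __name__ == "__main__":
    # Node names taken from the question.
    for line in classic_config_lines(["rabbit@my-server-1", "rabbit@my-server-2"]):
        print(line)
```

The printed lines can be appended to the rest of rabbitmq.conf by whatever deployment step writes the file.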

What's the use of the [runners.docker] section in config.toml for use case with docker machine?

Reading the documentation on autoscaling, I can't figure out the role of the [runners.docker] section when using docker+machine as the executor:
[runners.docker]
image = "ruby:2.1" # The default image used for builds is 'ruby:2.1'
The executors documentation says:
docker+machine : like docker, but uses auto-scaled docker machines -
this requires the presence of [runners.docker] and [runners.machine]
I understand that I have to define this [runners.docker] section to be able to use the [runners.machine] section, but what is the purpose of [runners.docker]?
I can't work out how to configure it because I don't understand why it is needed.
Our gitlab-runner runs on a vSphere VM and is configured to scale using the docker+machine executor, with MachineDriver set to vmwarevsphere. Everything works nicely, but I would like to fully understand the configuration file.
Here is our config.toml, censored with stars, including the [runners.docker] section I can't understand (note that the person who wrote it has left the company, so I can't ask him):
[[runners]]
name = "gitlab-runner"
limit = 6
output_limit = 102400
url = "http://gitlab.**************.lan"
token = "*******************"
executor = "docker+machine"
[runners.docker]
tls_verify = false
image = "docker:latest"
dns = ["*.*.*.*"]
privileged = true
disable_cache = false
volumes = ["/etc/localtime:/etc/localtime:ro", "/var/run/docker.sock:/var/run/docker.sock", "/etc/docker/certs.d:/etc/docker/certs.d", "/cache:/cache", "/builds:/builds"]
cache_dir = "cache"
shm_size = 0
[runners.cache]
Type = "s3"
ServerAddress = "*.*.*.*"
AccessKey = "*****************"
SecretKey = "*****************"
BucketName = "runner"
Insecure = true
[runners.machine]
IdleCount = 4
MaxBuilds = 10
IdleTime = 3600
MachineDriver = "vmwarevsphere"
MachineName = "gitlab-runner-pool-1-%s"
MachineOptions = ["vmwarevsphere-username=************", "vmwarevsphere-password=*****************", "vmwarevsphere-vcenter=*.*.*.*", "vmwarevsphere-datastore=*********", "vmwarevsphere-memory-size=3096", "vmwarevsphere-disk-size=40960", "vmwarevsphere-cpu-count=3", "vmwarevsphere-network=*****************", "vmwarevsphere-datacenter=**************", "vmwarevsphere-hostsystem=*******************", "engine-storage-driver=overlay2", "engine-insecure-registry=**************", "engine-insecure-registry=*******************"]
OffPeakPeriods = ["* * 0-8,21-23 * * mon-fri *", "* * * * * sat,sun *"]
OffPeakTimezone = "Local"
OffPeakIdleCount = 1
OffPeakIdleTime = 600
The [runners.machine] section defines how to start and provision your runner machines; the [runners.docker] section then defines how the runner uses Docker on each of those machines.
Docker Machine on its own only does the following (quoting its documentation):
"Docker Machine is a tool that lets you install Docker Engine on virtual hosts, and manage the hosts with docker-machine commands."
So Docker Machine does nothing with the GitLab runner itself. You still need to configure the runner after that, and that's where the [runners.docker] section comes into play, because the runner needs to know which default image to use, which volumes to mount, and so on.

gitlab-runner "Checking for jobs... failed": Error decoding json payload unexpected EOF

I installed gitlab-runner on Windows 10.
When GitLab CI starts a job that this gitlab-runner is supposed to pick up, the runner sometimes logs the following:
time="2017-12-26T16:39:49+08:00" level=warning msg="Checking for jobs... failed" runner=96856a1d status="Error decoding json payload unexpected EOF"
It is really annoying.
I have to restart gitlab-runner before it works again.
The following is the content of config.toml
concurrent = 1
check_interval = 30
[[runners]]
name = "windows docker runner"
url = "http://my-gitlab.internal.example.com:9090/"
token = "abcdefg1c39f10e869625c2118e"
executor = "docker"
[runners.docker]
tls_verify = false
image = "docker:latest"
privileged = false
disable_cache = false
volumes = ["/var/run/docker.sock:/var/run/docker.sock", "/cache"]
[runners.cache]
Insecure = false
Try running the runner in debug mode (stop the service first, then start it in the foreground with gitlab-runner --debug run) to get more information about the error.

Cannot get the file provisioner working with Terraform on DigitalOcean

I am trying to use Terraform to create a DigitalOcean node with Consul installed on it.
I'm using the following .tf file, but it hangs and does not copy the Consul .zip file onto the droplet.
I get the following error message after a couple of minutes:
ssh: handshake failed: ssh: unable to authenticate, attempted methods
[none publickey], no supported methods remain
The droplets are created correctly, though. I can log in from the command line with the key I specified (so without specifying a password). I'm guessing the connection block might be faulty, but I'm not sure what I'm missing.
Any ideas?
variable "do_token" {}
# Configure the DigitalOcean Provider
provider "digitalocean" {
  token = "${var.do_token}"
}
# Create nodes
resource "digitalocean_droplet" "consul" {
  count = "1"
  image = "ubuntu-14-04-x64"
  name = "consul-${count.index+1}"
  region = "lon1"
  size = "1gb"
  ssh_keys = ["7b:51:d3:e3:ae:6e:c6:e2:61:2d:40:56:17:54:fc:e3"]
  connection {
    type = "ssh"
    user = "root"
    agent = true
  }
  provisioner "file" {
    source = "consul_0.7.1_linux_amd64.zip"
    destination = "/tmp/consul_0.7.1_linux_amd64.zip"
  }
  provisioner "remote-exec" {
    inline = [
      "sudo unzip -d /usr/local/bin /tmp/consul_0.7.1_linux_amd64.zip"
    ]
  }
}
Terraform requires that you specify the private SSH key to use for the connection with private_key. You can create a new variable containing the path to your private key and read it with Terraform's file interpolation function:
connection {
  type = "ssh"
  user = "root"
  agent = true
  private_key = "${file("${var.private_key_path}")}"
}
You face this issue because your SSH key is protected by a password. To solve it, generate a key without a password.