How to control job scheduling in a better way in gitlab-ci? - gitlab-ci

I have in my gitlab projects jobs defined and executed via gitlab-ci. However, it doesn't do well with interdependent jobs as there's no management of this case except manual.
The case I have is a service, which is a part of the overall app, takes long time to start. Starting this service is done within a job, while another job have another service, which is also a part of the overall app, querying the former service. Due to interdependence, I have just delayed the execution of this later job so that most probably the former job has its service up and running.
I wanted to use Rundeck as a job scheduler but not sure if this can be done with gitlab? Maybe I am wrong about gitlab, so does gitlab allow better job scheduling?
Here's an example of what I am doing:
.gitlab-ci.yml
deploy:
environment:
name:$CI_ENVIRONMENT
url: http://$CI_ENVIRONMENT.local.net:4999/
allow_failure: true
script:
- sudo dpkg -i myapp.deb
- sleep 30m //here I wait for the service to be ready for later jobs to run successfully
- RESULT=`curl http://localhost:9999/api/test | grep Success'

it looks like it is a typical trigger feature inside gitlab-ci
see gitlab-ci triggers
mostly at the end of the long start-up service job A to use curl to trigger another one
deploy_service_a:
stage: deploy
script:
- "curl --request POST --form token=TOKEN --form ref=master https://gitlab.example.com/api/v4/projects/9/trigger/pipeline"
only:
- tags

Related

Connect back to main container from GitlabCI service

I am using Gitlab CI with docker executor and services.
During test I'm starting a server in the main script, and I need the service to make a request back to the main script.
Is there address/alias I can use to connect back to the main build script? Something like host.docker.internal.
Pseudo-example:
test:
services:
name: ping-pong-service
variables:
CALLBACK_ADDRESS: 'http://host.docker.internal:8090/pong'
script:
- "Start a server at 0.0.0.0:8090"
- curl http://ping-pong-service:80/ping
Supose that ping-pong-service is a service that when receiving any http request on :80, performs new request to CALLBACK_ADDRESS. What should I enter into CALLBACK_ADDRESS to connect back to main container?
I tried looking into what containers get started on the runner, but the main container doesn't seem to have predictable name or alias in the docker network.
Env:
Docker: 20.10.12
Gitlab Runner: 14.8.0, self-hosted, FF_NETWORK_PER_BUILD=1
Gitlab: 14.9.2-ee, self-hosted
When using the FF_NETWORK_PER_BUILD feature flag for networks per job, containers started using services: can reach the main job container using the network alias build
Assuming your service is configured as you describe, you would use:
variables:
CALLBACK_ADDRESS: 'http://build:8090/pong'
Note: this does not apply to containers started using docker run in the job container for this scenario.

Running AWS Log Agent from inside a Fargate container

Trying to run the AWS Logs Agent inside a docker container running on AWS ECS Fargate.
This has been working fine under EC2 for several years. Under Fargate context, it does not seem to be able to resolve the task role being passed to it.
Permissions on the Task Role should be good... I've even tried giving it full CloudWatch permissions to eliminate that as a reason.
I've managed to hack the python based launcher script to add a --debug flag which gave me this in the log:
Caught retryable HTTP exception while making metadata service request to
http://169.254.169.254/latest/meta-data/iam/security-credentials
It does not appear to be properly resolving the credentials that are passed into the task as the 'Task Role'
I managed to find a hack workaround, that may illustrate what I believe to be a bug or inadequacy in the agent. I had to hack the launcher script using sed as follows:
sed -i "s|HTTPS_PROXY|AWS_CONTAINER_CREDENTIALS_RELATIVE_URI=$AWS_CONTAINER_CREDENTIALS_RELATIVE_URI HTTPS_PROXY|"
/var/awslogs/bin/awslogs-agent-launcher.sh
This essentially de-references the ENV variable holding the URI for retrieving the task role and passes it to the agent's launcher.
It results in something like this:
/usr/bin/env -i AWS_CONTAINER_CREDENTIALS_RELATIVE_URI=/v2/credentials/f4ca7e30-b73f-4919-ae14-567b1262b27b (etc...)
With this in place, I restart the log agent and it works as expected.
Note that you can do something like this to add --debug flag to the launcher also which was very helpful in trying to figure out where it went astray.

Managing the health and well being of multiple pods with dependencies

We have several pods (as service/deployments) in our k8s workflow that are dependent on each other, such that if one goes into a CrashLoopBackOff state, then all these services need to be redeployed.
Instead of having to manually do this, is there a programatic way of handling this?
Of course we are trying to figure out why the pod in question is crashing.
If these are so tightly dependant on each other, I would consider these options
a) Rearchitect your system to be more resilient towards failure and tolerate, if a pod is temporary unavailable
b) Put all parts into one pod as separate containers, making the atomic design more explicit
If these don't fit your needs, you can use the Kubernetes API to create a program that automates the task of restarting all dependent parts. There are client libraries for multiple languages and integration is quite easy. The next step would be a custom resource definition (CRD) so you can manage your own system using an extension to the Kubernetes API.
First thing to do is making sure that pods are started in correct sequence. This can be done using initContainers like that:
spec:
initContainers:
- name: waitfor
image: jwilder/dockerize
args:
- -wait
- "http://config-srv/actuator/health"
- -wait
- "http://registry-srv/actuator/health"
- -wait
- "http://rabbitmq:15672"
- -timeout
- 600s
Here your pod will not start until all the services in a list are responding to HTTP probes.
Next thing you may want to define liveness probe that periodically executes curl to the same services
spec:
livenessProbe:
exec:
command:
- /bin/sh
- -c
- curl http://config-srv/actuator/health &&
curl http://registry-srv/actuator/health &&
curl http://rabbitmq:15672
Now if any of those services fail - you pod will fail liveness probe, be restarted and wait for services to become back online.
That's just an example how it can be done. In your case checks can be different of course.

Running a service in the background withouth getting stuck

Currently I am having an issue with trying to run a process/script in the background[The master starts it on the minion]
The script is something like this:
#!/bin/bash
nohup ping 8.8.8.8 >/dev/null&
And I call it from the master with:
Process-Name:
service.running:
- name: Script-Name
- enable: True
For some reason it gets stuck on the master,I've read a little bit on this issue[it has happenned before apparently] and tried their solutions on it but apparently nothing involving the service state seems to work.
Is there anyway to work around this?
In short, you should configure your script as system daemon first (SysV init.d script, or systemd unit, or ... depends on OS).
Details
The service.running function requires properly configured system service ~ daemon.
For example, on RHEL-based Linux, if you don't see your script name in the output of one of these commands, you should configure it as proper service first (which is a separate topic):
# systemd
systemctl list-units | grep your_service_name
# SysV init.d
chkconfig --list | grep your_service_name
And because you want to start it in background, cmd.run function is not the right tool either:
It will only report successful start of the script without waiting for its completion results.
It will also start new instance of your script every time.
However, if all you simply want is to "fire and forget", use cmd.run.

fig up: docker containers start synchronisation

For one of my home projects I decided to use docker containers and fig for orchestration (first time using those tools).
Here is my fig.yaml:
rabbitmq:
image: dockerfile/rabbitmq:latest
mongodb:
image: mongo
app:
build: .
command: python /code/app/main.py
links:
- rabbitmq
- mongodb
volumes:
- .:/code
Rabbitmq starting time is much slower than loading time of my application. Even though rabbitmq container starts loading first (since it is in app links), when my app tries to connect to rabbitmq server it's not yet available (it's definately loading timing problem, since if I just insert sleep for 5 seconds before connecting to rabbitmq - everything works fine). Is there some standard way to resolve loading time synchronisation problems?
Thanks.
I don't think there is an standard way to solve this, but it is a known problem and some people have acceptable workarounds.
There is a proposal on the Docker issue tracker about not considering a container as started until it is listening at the exposed ports. However it likely won't be accepted due to other problems it would create elsewhere. There is a fig proposal on the same topic as well.
The easy solution is to do the sleep like #jcortejoso says. An example from http://blog.chmouel.com/2014/11/04/avoiding-race-conditions-between-containers-with-docker-and-fig/:
function check_up() {
service=$1
host=$2
port=$3
max=13 # 1 minute
counter=1
while true;do
python -c "import socket;s = socket.socket(socket.AF_INET, socket.SOCK_STREAM);s.connect(('$host', $port))" \
>/dev/null 2>/dev/null && break || \
echo "Waiting that $service on ${host}:${port} is started (sleeping for 5)"
if [[ ${counter} == ${max} ]];then
echo "Could not connect to ${service} after some time"
echo "Investigate locally the logs with fig logs"
exit 1
fi
sleep 5
(( counter++ ))
done
}
And then use check_up "DB Server" ${RABBITMQ_PORT_5672_TCP_ADDR} 5672 before starting your app server, as described in the link above.
Another option is to use docker-wait. In your fig.yml.
rabbitmq:
image: dockerfile/rabbitmq:latest
mongodb:
image: mongo
rabbitmqready:
image: aanand/wait
links:
- rabbitmq
app:
build: .
command: python /code/app/main.py
links:
- rabbitmqready
- mongodb
volumes:
- .:/code
Similar problems I have encountered I have solved using a custom script set up as CMD in my Dockerfiles. Then you can run any check command you wish (sleep for a time, or waiting to the service be listening, for example). I think there is not a standard way to do this, anyway I think the best way would be the application run could be able to ask the external service to be up and running, and the connect to them, but this is not possible in most cases.
For testing on our CI, we built a small utility that can be used in a Docker container to wait for linked services to be ready. It automatically finds all linked TCP services from their environment variables and repeatedly and concurrently tries to establish TCP connections until it succeeds or times out.
We also wrote a blog post describing why we built it and how we use it.