I have a very simple container that prints "hello world".
I have successfully run it on both Marathon and Swarm and scaled it to X instances.
On both, the instances seem to be stuck in a cycle: they run, sleep for a bit, then run again.
The Marathon cycle is: Waiting, Running, Delayed, and repeat.
The Swarm cycle is: Ready, Running, Shutdown, and repeat.
How do I specify that the container should finish after its first execution, whether in Swarm or Marathon?
You cannot: both Swarm and Marathon are designed for long-running services.
To run something just once, use a plain docker run command with Swarm, and some other framework on Mesos (Marathon runs on Mesos), e.g. Chronos, which is a cron replacement for Mesos and runs tasks periodically.
I am trying to install OneAgent in my ECS Fargate task. Alongside the application container, I have added another container definition for OneAgent with the image alpine:latest, and I am using runtime injection.
When the task runs, the OneAgent container is initially in the running state, and after a minute it goes to the stopped state while the application container stays running.
In Dynatrace, the same host is available but keeps being recreated every 5-10 minutes.
Actually, the issue was that the task was in draining status because of an application issue, which is why the host kept being recreated in Dynatrace. Also, because I used runtime injection for ECS Fargate, the OneAgent container definition stops once the binaries are downloaded and injected into the volume, while the application container keeps running and sending logs to Dynatrace.
I had the same problem. After connecting to the cluster via SSH, I saw that the agent needs to be privileged. The only thing that worked for me was sending traces and metrics through OpenTelemetry.
https://aws-otel.github.io/docs/components/otlp-exporter
Alternative:
Use sleep infinity in the command field of your OneAgent container so that it keeps running instead of exiting.
I am using the Jenkins Docker Plugin (https://wiki.jenkins.io/display/JENKINS/Docker+Plugin) to dynamically create containers and use them as Jenkins slaves. This works fine for some jobs. However, for some longer-running jobs (more than 10 minutes), the Docker container gets removed midway, which makes the job fail.
I have tried increasing various timeout options in the plugin configuration, with no result. Can anyone please help?
I know I am quite late to post an answer here, but I was able to find the root cause. The problem was running two Jenkins instances with the same Jenkins home directory. It seems the Jenkins Docker plugin runs a daemon that kills Docker containers associated with the Jenkins master. Because we were running two Jenkins instances with the same Jenkins home directory (one a copy of the other), the Docker containers started for CI work by one instance were deleted by the other instance's daemon.
I'm trying to test out Airflow on Kubernetes. The Scheduler, Worker, Queue, and Webserver are all on different deployments and I am using a Celery Executor to run my tasks.
Everything is working fine except for the fact that the Scheduler is not able to queue up jobs. Airflow is able to run my tasks fine when I manually execute it from the Web UI or CLI but I am trying to test the scheduler to make it work.
My configuration is almost the same as it is on a single server:
sql_alchemy_conn = postgresql+psycopg2://username:password@localhost/db
broker_url = amqp://user:password@$RABBITMQ_SERVICE_HOST:5672/vhost
celery_result_backend = amqp://user:password@$RABBITMQ_SERVICE_HOST:5672/vhost
I believe that with these configurations, I should be able to make it run but for some reason, only the workers are able to see the DAGs and their state, but not the scheduler, even though the scheduler is able to log their heartbeats just fine. Is there anything else I should debug or look at?
First, you use Postgres as the database for Airflow, don't you? Did you deploy a pod and a service for Postgres? If so, have you verified that your config file points at that service rather than at localhost:
sql_alchemy_conn = postgresql+psycopg2://username:password@serviceNamePostgres/db
You can use this github. I used it 3 weeks ago for a first test and it worked pretty well.
Its entrypoint is useful for verifying that RabbitMQ and Postgres are configured correctly.
I am running a Selenium test suite on ec2 windows instances. These instances should be restarted once every few days as maintenance to free up memory etc.
The problem is that when I send the restart command to the slave from Jenkins, I can't be certain that the slave has no running jobs at that time, since the slave runs several executors.
Is there a way to tell a node to drop its number of executors to 0 as soon as job X is triggered? If not, is there a way to gracefully take a slave offline (i.e. "complete all jobs in the queue but don't accept any new jobs")?
(jenkins_url)/safeRestart - Allows all running jobs to complete. New
jobs will remain in the queue to run after the restart is complete.
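Note that safeRestart restarts Jenkins itself. For the "don't accept new jobs on this one node" part of the question, a possible sketch is to mark the node temporarily offline over HTTP, on the understanding that builds already running on it are allowed to finish. This assumes the /computer/NODE/toggleOffline endpoint and API-token authentication are available on your Jenkins version; the helper names, host, and node name below are hypothetical. The endpoint toggles, so calling it on a node that is already offline brings it back online.

```ruby
require "net/http"
require "uri"
require "erb"

# Builds the URL of the toggleOffline endpoint for one node.
# The endpoint path is an assumption about your Jenkins version.
def toggle_offline_uri(jenkins_url, node_name)
  encoded = ERB::Util.url_encode(node_name) # spaces become %20
  URI("#{jenkins_url.chomp('/')}/computer/#{encoded}/toggleOffline")
end

# Sends the POST that flips the node's temporarily-offline flag.
# user and api_token are placeholders for real credentials.
def mark_node_offline(jenkins_url, node_name, user:, api_token:,
                      message: "maintenance restart")
  uri = toggle_offline_uri(jenkins_url, node_name)
  request = Net::HTTP::Post.new(uri)
  request.basic_auth(user, api_token)
  request.set_form_data("offlineMessage" => message)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https") do |http|
    http.request(request)
  end
end
```

Once the node shows no busy executors, the restart command can be sent, and a second call to the same endpoint should bring the node back online.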
I want to start Redis and resque-scheduler from a rake task, so I'm doing the following:
namespace :raketask do
  task :start do
    system("QUEUE=* rake resque:work &")
    system("rake redis:start")
    system("rake resque:scheduler")
  end
end
The problem is that Redis starts in the foreground, so the scheduler never kicks off. It won't start in the background (using &). The scheduler must be started AFTER Redis is up and running.
My answer is similar to nirvdrum's. The Resque workers are going to fail/quit if Redis isn't already running and accepting connections.
check out this gist for an example of how to get things started with monit (linux stuff).
Monit allows one service to be dependent on another, and makes sure they stay alive by monitoring a .pid file.
That strikes me as not a great idea. You should have your redis server started via an init script or something. But, if you really want to go this way, you probably need to modify your redis:start task to use nohup and background the process so you can disconnect from the TTY and keep the process running.
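If you do go the rake route, the ordering described above (background Redis, wait until it accepts connections, then start the workers and the scheduler) might be sketched as follows. This is only a sketch: it assumes Redis listens on 127.0.0.1:6379 (the Redis default) and that rake redis:start stays in the foreground; port_open?, wait_for_port, and start_all are hypothetical helper names, not part of Resque or Rake.

```ruby
require "socket"

# True if something is accepting TCP connections on host:port.
def port_open?(host, port)
  Socket.tcp(host, port, connect_timeout: 1) { true }
rescue SystemCallError, SocketError, IOError
  false
end

# Polls until host:port accepts connections; false if `timeout` seconds pass first.
def wait_for_port(host, port, timeout: 30, interval: 0.5)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  until port_open?(host, port)
    return false if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
    sleep interval
  end
  true
end

# Hypothetical replacement for the body of the :start task.
def start_all(host: "127.0.0.1", port: 6379)
  # Start Redis in the background so this method can continue.
  redis_pid = Process.spawn("rake redis:start", out: File::NULL, err: File::NULL)
  Process.detach(redis_pid)

  abort "Redis did not come up in time" unless wait_for_port(host, port)

  # Workers and scheduler start only once Redis accepts connections.
  worker_pid = Process.spawn({ "QUEUE" => "*" }, "rake resque:work")
  Process.detach(worker_pid)
  system("rake resque:scheduler")
end
```

Inside the task, calling start_all would take the place of the three system calls. An init script or monit remains the more robust option, since nothing here restarts Redis if it dies later.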