For one of my home projects I decided to use Docker containers and fig for orchestration (it is my first time using those tools).
Here is my fig.yaml:
rabbitmq:
  image: dockerfile/rabbitmq:latest
mongodb:
  image: mongo
app:
  build: .
  command: python /code/app/main.py
  links:
    - rabbitmq
    - mongodb
  volumes:
    - .:/code
RabbitMQ's startup time is much longer than my application's load time. Even though the rabbitmq container starts loading first (since it is in app's links), when my app tries to connect to the RabbitMQ server it is not yet available (it's definitely a startup timing problem, since if I just insert a 5-second sleep before connecting to RabbitMQ, everything works fine). Is there some standard way to resolve loading time synchronisation problems?
Thanks.
I don't think there is a standard way to solve this, but it is a known problem and some people have acceptable workarounds.
There is a proposal on the Docker issue tracker about not considering a container as started until it is listening at the exposed ports. However it likely won't be accepted due to other problems it would create elsewhere. There is a fig proposal on the same topic as well.
The easy solution is to do the sleep like @jcortejoso says. An example from http://blog.chmouel.com/2014/11/04/avoiding-race-conditions-between-containers-with-docker-and-fig/:
function check_up() {
    service=$1
    host=$2
    port=$3
    max=13 # ~1 minute (13 attempts x 5 seconds)
    counter=1
    while true; do
        # Try to open a TCP connection to the service; break out once it succeeds.
        python -c "import socket;s = socket.socket(socket.AF_INET, socket.SOCK_STREAM);s.connect(('$host', $port))" \
            >/dev/null 2>/dev/null && break || \
            echo "Waiting for $service on ${host}:${port} to start (sleeping for 5)"
        if [[ ${counter} == ${max} ]]; then
            echo "Could not connect to ${service} after some time"
            echo "Investigate locally the logs with fig logs"
            exit 1
        fi
        sleep 5
        (( counter++ ))
    done
}
And then use check_up "DB Server" ${RABBITMQ_PORT_5672_TCP_ADDR} 5672 before starting your app server, as described in the link above.
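For example, a minimal entrypoint wrapper could look like the sketch below. The python /code/app/main.py command and the RABBITMQ_PORT_5672_TCP_ADDR variable come from the files above; the MongoDB variable name and the script name are assumptions based on fig's usual link-variable convention.
#!/bin/bash
# entrypoint.sh (hypothetical name) - wait for linked services, then start the app.
# Assumes the check_up function shown above is defined in or sourced by this script.
check_up "RabbitMQ" "${RABBITMQ_PORT_5672_TCP_ADDR}" 5672
check_up "MongoDB" "${MONGODB_PORT_27017_TCP_ADDR}" 27017   # variable name assumed from the "mongodb" link alias
# Replace the shell with the real process so it receives signals directly
exec python /code/app/main.py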
Another option is to use docker-wait. In your fig.yml:
rabbitmq:
  image: dockerfile/rabbitmq:latest
mongodb:
  image: mongo
rabbitmqready:
  image: aanand/wait
  links:
    - rabbitmq
app:
  build: .
  command: python /code/app/main.py
  links:
    - rabbitmqready
    - mongodb
  volumes:
    - .:/code
I have solved similar problems using a custom script set up as CMD in my Dockerfiles. Then you can run any check you wish (sleep for a while, or wait until the service is listening, for example). I don't think there is a standard way to do this; ideally the application itself would be able to wait for the external services to be up and running before connecting to them, but that is not possible in most cases.
For testing on our CI, we built a small utility that can be used in a Docker container to wait for linked services to be ready. It automatically finds all linked TCP services from their environment variables and repeatedly and concurrently tries to establish TCP connections until it succeeds or times out.
We also wrote a blog post describing why we built it and how we use it.
I am trying to run an Azure container instance, but it appears to be getting killed off the second I run it. This works fine in two other resource groups but not my production resource group, where I see the following:
In events I see 'Successfully pulled image selenium/standalone-chrome:latest' with count 1, then 'Started container', and then 'Killing container' with count 31. The times for started and killed are the same.
In logs, it just says 'No logs available'
The metrics for CPU and memory on the container never show any change from zero.
I looked at this article but the proposed solution didn't work: Azure Container Group Instance. I have tried adding both an empty directory volume and 2 GB of RAM as advised here: https://github.com/SeleniumHQ/docker-selenium, but nothing works.
This is the code I am using to create the container:
containerGroup = await azure.ContainerGroups.Define(containerName)
    .WithRegion("West Europe")
    .WithExistingResourceGroup(configuration.ContainerResourceGroup)
    .WithLinux()
    .WithPublicImageRegistryOnly()
    .WithEmptyDirectoryVolume("devshm")
    .DefineContainerInstance(containerName)
        .WithImage("selenium/standalone-chrome")
        .WithExternalTcpPorts(4444)
        .WithVolumeMountSetting("devshm", "/dev/shm")
        .WithMemorySizeInGB(2)
        .Attach()
    .WithDnsPrefix(configuration.AppServiceName + "container")
    .WithRestartPolicy(ContainerGroupRestartPolicy.OnFailure)
    .CreateAsync(cancellationToken);
How do I debug what is going wrong?
What is wrong with the container?
In case this helps someone: I renamed the "containerName" parameter in the above example from myinstance to myinstance1 and changed the region from West Europe to UK South. This fixed the issue. I can only think that Azure caches instances somehow to reduce start-up times, and the cached image I was using was poisoned.
One issue could be the restart policy - have a look at the restart policy section of Microsoft's ACI troubleshooting page. Under the "Container continually exits and restarts (no long-running process)" heading it says:
Container groups default to a restart policy of Always, so containers in the container group always restart after they run to completion. You may need to change this to OnFailure or Never if you intend to run task-based containers. If you specify OnFailure and still see continual restarts, there might be an issue with the application or script executed in your container.
In your case you may need to adjust the code as follows, using WithStartingCommandLine:
containerGroup = await azure.ContainerGroups.Define(containerName)
    .WithRegion("West Europe")
    .WithExistingResourceGroup(configuration.ContainerResourceGroup)
    .WithLinux()
    .WithPublicImageRegistryOnly()
    .WithEmptyDirectoryVolume("devshm")
    .DefineContainerInstance(containerName)
        .WithImage("selenium/standalone-chrome")
        .WithExternalTcpPorts(4444)
        .WithVolumeMountSetting("devshm", "/dev/shm")
        .WithMemorySizeInGB(2)
        .WithStartingCommandLine("tail")
        .WithStartingCommandLine("-f")
        .WithStartingCommandLine("/dev/null")
        .Attach()
    .WithDnsPrefix(configuration.AppServiceName + "container")
    .WithRestartPolicy(ContainerGroupRestartPolicy.OnFailure)
    .CreateAsync(cancellationToken);
This link is helpful for this issue.
--command-line
linux => "tail -f /dev/null"
windows => "ping -t localhost"
# .yml
command: tail -f /dev/null
It will keep your Azure instance running, so Azure then has an endpoint to connect to and a running process to analyze.
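For example, with the Azure CLI the same idea looks roughly like this (a sketch only; the resource group and container name are placeholders):
# Create the container group with an explicit long-running command
# so the container does not exit immediately after start-up
az container create \
  --resource-group my-resource-group \
  --name selenium-chrome \
  --image selenium/standalone-chrome \
  --ports 4444 \
  --restart-policy OnFailure \
  --command-line "tail -f /dev/null"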
I am trying to use the Geode Redis Adapter as the server for the rate limiting provided by Spring Cloud Gateway. If I use a real Redis server, everything works perfectly, but with the Geode Redis Adapter it doesn't.
I am not too sure if this functionality is supported.
I tried to start a Geode image (https://hub.docker.com/r/apachegeode/geode/) exposing the default Redis port 6379. After starting the container, I executed the following command using gfsh:
start server --name=redis --redis-port=6379 --J=-Dgemfireredis.regiontype=PARTITION_PERSISTENT
When I try to access it from my local machine with redis-cli -h localhost -p 6379, I can connect.
My implementation is simple:
application.yaml
- id: rate-limitter
  predicates:
    - Path=${GUI_CONTEXT_PATH:/rate-limit}
    - Host=${APP_HOST:localhost:8080}
  filters:
    - name: RequestRateLimiter
      args:
        key-resolver: "#{@remoteAddrKeyResolve}"
        redis-rate-limiter:
          replenishRate: ${rate.limit.replenishRate:1}
          burstCapacity: ${rate.limit.burstCapacity:2}
  uri: ${APP_HOST:localhost:8080}
Application.java
@Bean
KeyResolver remoteAddrKeyResolve() {
    return exchange -> Mono.just(exchange.getSession().subscribe().toString());
}
When my application starts and I try to access /rate-limit, I expect it to connect to Redis and my page to be displayed.
However, my Spring application keeps trying to connect (i.l.c.p.ReconnectionHandler: Reconnected to localhost:6379), so the page is not displayed and keeps loading. FIXED in Edit 1 below.
The problem is that I am using RedisRateLimiter and tried to simulate access with a for loop. Checking RedisRateLimiter.REMAINING_HEADER, the value is always -1. That doesn't seem right, because I don't have this issue with Redis itself.
During the start of the application, I also receive these messages on connection to Geode Redis Adapter:
Starting without optional epoll library
Starting without optional kqueue library
Is anything missing in my Geode Redis Adapter setup, or anything else in Spring?
Thank you
Edit 1: I forgot to start the locator and create the region; that's why I wasn't able to connect.
start locator --name=locator
start server --name=redis --redis-port=6379 --J=-Dgemfireredis.regiontype=PARTITION_PERSISTENT
create region --name=redis-region --type=REPLICATE_PERSISTENT
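After that, a quick check from the shell should show the adapter answering basic Redis commands (a sketch; the key and value are arbitrary):
# Verify the Geode Redis adapter responds like a Redis server
redis-cli -h localhost -p 6379 ping          # expect PONG
redis-cli -h localhost -p 6379 set foo bar   # expect OK
redis-cli -h localhost -p 6379 get foo       # expect "bar"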
We have several pods (as services/deployments) in our k8s workflow that are dependent on each other, such that if one goes into a CrashLoopBackOff state, then all of these services need to be redeployed.
Instead of having to do this manually, is there a programmatic way of handling this?
Of course we are trying to figure out why the pod in question is crashing.
If these are so tightly dependant on each other, I would consider these options
a) Rearchitect your system to be more resilient to failure and to tolerate a pod being temporarily unavailable
b) Put all parts into one pod as separate containers, making the atomic design more explicit
If these don't fit your needs, you can use the Kubernetes API to create a program that automates the task of restarting all dependent parts. There are client libraries for multiple languages and integration is quite easy. The next step would be a custom resource definition (CRD) so you can manage your own system using an extension to the Kubernetes API.
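As a rough illustration of that automation (a sketch only; the label selector and deployment names are placeholders, it assumes a kubectl version that supports rollout restart, and a real controller would use a client library instead of shelling out):
#!/bin/sh
# Watch for CrashLoopBackOff on the core pods and restart the dependent deployments
while true; do
  reasons=$(kubectl get pods -l app=core-service \
    -o jsonpath='{.items[*].status.containerStatuses[*].state.waiting.reason}')
  if echo "$reasons" | grep -q CrashLoopBackOff; then
    kubectl rollout restart deployment/dependent-a deployment/dependent-b
    sleep 300   # let the rollout settle before checking again
  fi
  sleep 30
done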
The first thing to do is to make sure that the pods are started in the correct sequence. This can be done using initContainers, like this:
spec:
  initContainers:
  - name: waitfor
    image: jwilder/dockerize
    args:
    - -wait
    - "http://config-srv/actuator/health"
    - -wait
    - "http://registry-srv/actuator/health"
    - -wait
    - "http://rabbitmq:15672"
    - -timeout
    - 600s
Here your pod will not start until all the services in the list are responding to HTTP probes.
Next, you may want to define a liveness probe that periodically runs curl against the same services:
spec:
  livenessProbe:
    exec:
      command:
      - /bin/sh
      - -c
      - curl http://config-srv/actuator/health &&
        curl http://registry-srv/actuator/health &&
        curl http://rabbitmq:15672
Now if any of those services fails, your pod will fail its liveness probe, be restarted, and wait for the services to come back online.
That's just an example of how it can be done. In your case the checks may be different, of course.
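To see whether the probes behave as expected, the standard kubectl commands are enough (the pod name is a placeholder):
kubectl describe pod my-app-pod        # Events section shows failed liveness probes
kubectl get pods -w                    # watch restart counts change live
kubectl logs my-app-pod --previous     # logs of the previous (restarted) container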
Currently I am having an issue trying to run a process/script in the background (the master starts it on the minion).
The script is something like this:
#!/bin/bash
nohup ping 8.8.8.8 >/dev/null&
And I call it from the master with:
Process-Name:
  service.running:
    - name: Script-Name
    - enable: True
For some reason it gets stuck on the master. I've read a little bit about this issue (it has happened before, apparently) and tried the suggested solutions, but nothing involving the service state seems to work.
Is there anyway to work around this?
In short, you should configure your script as a system daemon first (a SysV init.d script, or a systemd unit, or ... depending on the OS).
Details
The service.running function requires a properly configured system service (daemon).
For example, on RHEL-based Linux, if you don't see your script name in the output of one of these commands, you should configure it as a proper service first (which is a separate topic; a minimal systemd sketch follows the commands below):
# systemd
systemctl list-units | grep your_service_name
# SysV init.d
chkconfig --list | grep your_service_name
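For instance, on a systemd-based distribution a minimal unit for the ping script could look like the sketch below. The unit name and script path are placeholders, and under systemd the script should run its command in the foreground (drop the nohup and the trailing &).
# Register the script as a systemd service so service.running can manage it
cat > /etc/systemd/system/script-name.service <<'EOF'
[Unit]
Description=Background ping script
After=network.target

[Service]
# Run the payload in the foreground; systemd takes care of daemonizing
ExecStart=/usr/local/bin/script-name.sh
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
systemctl daemon-reload
systemctl enable --now script-name.service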
And because you want to start it in the background, the cmd.run function is not the right tool either:
It will only report a successful start of the script, without waiting for its results.
It will also start a new instance of your script every time.
However, if all you simply want is to "fire and forget", use cmd.run.
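For example (the minion ID and script path are placeholders):
# Fire and forget: the script backgrounds itself with nohup, so cmd.run returns right away
salt 'minion-id' cmd.run 'bash /path/to/script-name.sh'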
Is there any reason that the rabbitmq-management plugin wouldn't work when I'm using 'rabbitmq-multi' to spin up a cluster of nodes on my desktop? Or, more precisely, that the management plugin would cause that spinup to fail?
I get Error: {node_start_failed,normal} when rabbitmq-multi starts rabbit_1@localhost.
The first node, rabbit@localhost, seems to start okay though.
If I take out the management plugins, all the nodes start up (and then cluster) fine. I think I'm using a recent enough Erlang version (5.8/OTP R14A according to the README in my erl5.8.2 folder). I'm using all the plugins that are listed as required on the plugins page, including mochiweb, webmachine, amqp_client, rabbitmq-mochiweb, rabbitmq-management-agent, and rabbitmq-management. Those plugins, and only those plugins.
The problem is that rabbitmq-multi only assigns sequential ports for AMQP, not HTTP (or STOMP or AMQPS or anything else the broker may open). Therefore each node tries to listen on the same port for the management plugin and only the first succeeds. rabbitmq-multi will be going away in the next release; this is one reason why.
I think you'll want to start the nodes without using rabbitmq-multi, just with multiple invocations of rabbitmq-server, using environment variables to configure each node differently. I use a script like:
start-node.sh:
#!/bin/sh
RABBITMQ_NODE_PORT=$1 RABBITMQ_NODENAME=$2 \
RABBITMQ_MNESIA_DIR=/tmp/rabbitmq-$2-mnesia \
RABBITMQ_PLUGINS_EXPAND_DIR=/tmp/rabbitmq-$2-plugins-scratch \
RABBITMQ_LOG_BASE=/tmp \
RABBITMQ_SERVER_START_ARGS="-rabbit_mochiweb port 5$1" \
/path/to/rabbitmq-server -detached
and then invoke it as
start-node.sh 5672 rabbit
start-node.sh 5673 hare
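Once both nodes are up they can be clustered with rabbitmqctl, roughly as sketched below; the exact command depends on your RabbitMQ version (recent releases use join_cluster, the old 2.x series used cluster), and the node name must use your machine's short hostname.
# Join the second node ("hare") to the first ("rabbit")
rabbitmqctl -n hare stop_app
rabbitmqctl -n hare join_cluster rabbit@$(hostname -s)   # on old releases: "cluster" instead of "join_cluster"
rabbitmqctl -n hare start_app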