Deploying application which is not a web app? Kubernetes - kotlin

I am trying to deploy a pod to the cluster. The application I am deploying is not a web server. I have an issue with setting up the liveness and readiness probes. Usually, I would use something like /isActive and /buildInfo endpoint for that.
I've read this https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/#define-a-liveness-command.
Wondering if I need to code a mechanism which will create a file and then somehow prob it from the deployment.yaml file?
Edit: this is what I used to keep the container running, not sure if that is the best way to do it?
- touch /tmp/healthy; while true; do sleep 30; done;

It does not make sense to create files in your application just for the liveness probe. On the K8s documentation this is just an example to show you how the exec command probe works.
The idea behind the liveness probe is bipartite:
Avoid traffic on your Pods, before they have been fully started.
Detect unresponsive applications due to lack of resources or deadlocks where the application main process is still running.
Given that your deployments don't seem to expect external traffic, you don't require a liveness probe for the first case. Regarding the second case, question is how your application could lock up and how you would notice externally, e.g. by monitoring a log file or similar.
Bear in mind, that K8s will still monitor whether your applications main process is running. So, restarts on application failure will still occur, if you application stops running without a liveness probe. So, if you can be fairly sure that your application is not prone to becoming unresponsive while still running, you can also do without a liveness probe.

Related

Kubernetes probe running acceptance test

I have a situation where my acceptance test makes a connection with a rabbitMQ instance during the pipeline. But the rabbitMQ instance is private, making not possible to make this connection in the pipeline.
I was wondering if making an api endpoint that run this test and adding to the startup probe would be a good approach to make sure this test is passing.
If the rabbitmq is a container in your pod yes, if it isn't then you shouldn't.
There's no final answer to this, but the startup probe is just there to ensure that your pod is not being falsly considered unhealthy by other probes just because it takes a little longer to start. It's aimed at legacy applications that need to build assets or compile stuff at startup.
If there was a place to put a connectivity test to rabitmq would be the liveness probe, but you should only do that if your application is entirely dependent on a connection to rabbitmq, otherwise your authentication would fail because you couldn't connect to the messaging queue. And if you have a second app that tries to connect to your endpoint as a liveness probe? And a third app that connects the second one to check if that app is alive? You could kill an entire ecosystem just because rabbitmq rebooted or crashed real quick.
Not recommended.
You could have that as part of your liveness probe IF your app is a worker, then, not having a connection to rabbitmq would make the worker unusable.
Your acceptance tests should be placed on your CD or in a post-deploy script step if you don't have a CD.

What is benefit of readiness probe and healthCheck?

I am working on an application that, as I can see is doing multiple health checks?
DB readiness probe
Another API dependency readiness probe
When I look at cluster logs, I realize that my service, when it fails a DB-check, just throws 500 and goes down. What I am failing to understand here is that if DB was down or another API was down and IF I do not have a readiness probe then my container is going down anyway. Also, I will see that my application did throw some 500 because DB or another service was off.
What is the benefit of the readiness probe of my container was going down anyway? Another question I have is that is Healthcheck something that I should consider only if I am deploying my service to a cluster? If it was not a cluster microservice environment, would it increase/decrease benefits of performing healtheck?
There are three types of probes that Kubernetes uses to check the health of a Pod:
Liveness: Tells Kubernetes that something went wrong inside the container, and it's better to restart it to see if Kubernetes can resolve the error.
Readiness: Tells Kubernetes that the Pod is ready to receive traffic. Sometimes something happens that doesn't wholly incapacitate the Pod but makes it impossible to fulfill the client's request. For example: losing connection to a database or a failure on a third party service. In this case, we don't want Kubernetes to reset the Pod, but we also don't wish for it to send it traffic that it can't fulfill. When a Readiness probe fails, Kubernetes removes the Pod from the service and stops communication with the Pod. Once the error is resolved, Kubernetes can add it back.
Startup: Tells Kubernetes when a Pod has started and is ready to receive traffic. These probes are especially useful on applications that take a while to begin. While the Pod initiates, Kubernetes doesn't send Liveness or Readiness probes. If it did, they might interfere with the app startup.
You can get more information about how probes work on this link:
https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/
Readiness probes are used in a few places. A big one is that non-ready pods are removed from all Services that reference them. They also matter for rolling updates on Deployments/StatefulSets as the roll won't continue until the new pods reach a ready state. In general the checks used for readiness probes should only be checking the current service. So it shouldn't be reaching out to a database. Sometimes that's hard to implement and does indeed make them less useful. But check per-pod stuff like the web server is listening on the port and can return HTTP responses.

Failed Deployment in App Engine Google Cloud

I am deploying my nodejs application in google cloud app engine but it is giving error
This request caused a new process to be started for your application, and thus caused your application code to be loaded for the first time.
This request may thus take longer and use more CPU than a typical request for your application. -- when making request.
I had also saw some stackoverflow answers, but they didn't worked for me.
my app.yaml have this config
runtime: nodejs10
Can anyone help me out
You could add the following to your app.yaml:
inbound_services:
- warmup
And then implement a handler that will catch all warmup requests, so that your application doesn't get the full load. The full explanation is given here. Another detailed post about this topic can be found here.
Additionally you can also add automatic scaling options. You can play a bit with those to find the optimum for your application. Especially the latency related variables are important. Good to note that they can be set in a standard GAE environment.
automatic_scaling:
min_idle_instances: automatic
max_idle_instances: automatic
min_pending_latency: automatic
max_pending_latency: automatic
More scaling options can be found here.
The "request caused a new process to be started" notification usually occurred when there is no warm up request present in your application.
Can you try to implement a health check handler that only returns a ready status when the application is warmed up. This will allow your service to not receive traffic until it is ready.
Warning: Legacy health checks using the /_ah/health path are now
deprecated, and you should migrate to use split health checks.
Here you can find Split health checks for Nodejs
Liveness checks
Liveness checks confirm that the VM and the Docker container are
running. Instances that are deemed unhealthy are restarted.
path: "/liveness_check"
check_interval_sec: 30
timeout_sec: 4
failure_threshold: 2
success_threshold: 2
Readiness checks
Readiness checks confirm that an instance can accept incoming
requests. Instances that don't pass the readiness check are not added
to the pool of available instances.
path: "/readiness_check"
check_interval_sec: 5
timeout_sec: 4
failure_threshold: 2
success_threshold: 2
app_start_timeout_sec: 300
Edit
For App Engine Standard, which doesn't afford you that flexibility, hardware and software failures that cause early termination or frequent restarts can occur without prior warning. link
App Engine attempts to keep manual and basic scaling instances running
indefinitely. However, at this time there is no guaranteed uptime for
manual and basic scaling instances. Hardware and software failures
that cause early termination or frequent restarts can occur without
prior warning and can take considerable time to resolve; thus, you
should construct your application in a way that tolerates these
failures.
Here are some good strategies for avoiding downtime due to instance
restarts:
Reduce the amount of time it takes for your instances restart or for
new ones to start.
For long-running computations, periodically create
checkpoints so that you can resume from that state.
Your app should be "stateless" so that nothing is stored on the instance.
Use queues for performing asynchronous task execution.
If you configure your instances to manual scaling: Use load balancing across > multiple instances. Configure more instances than required to handle normal
traffic. Write fall-back logic that uses cached results when a manual
scaling instance is unavailable.
Instance Uptime

Running a Raku Cro app as a persistent service

I'd like to run a perl6/raku Cro app as a service behind a frontend webserver.
Just running cro run won't handle restarting after segfaults & reboots.
Previously with perl5 I've used FastCGI - however Cro::HTTP::Server's Cro::HTTP::Server.new().start() idiom doesn't look compatible with FastCGI::Native's while $fcgi.accept() {} example.
The service.p6 generated by cro stub does have a SIGINT handler, however I'm unsure whether this is sufficient to point to it in a systemctl service, i.e.
[Service]
ExecStart = /path/to/service.p6
How are people currently hosting Cro apps?
cro run is intended as a development tool, not a deployment one, and so is indeed not a good choice for hosting the services.
All of the Cro services I directly take care of are containerized (some guidance on that here) and then run on a hosted Kubernetes cluster. Kubernetes takes care of the automatic restarts, rolling out new versions, etc. I'm also aware of docker-compose being used in place of Kubernetes, which I guess works, though I believe that's also considered as primarily a development tool.
Setting it up as a systemctl service should also work fine, provided that's configured to always restart. However, it seems that you'd want to handle SIGTERM for the clean shutdown to work instead of SIGINT (nothing wrong with handling both).
I do also place a frontend web server in front of Cro (using Apache, though nginx would be a fine choice too), and also use that to do some caching of static content also (using content-control in my routes to describe the cachability).

Should all pods using a redis cache be constrained to the same node as the rediscache itself?

We are running one of our services in a newly created kubernetes cluster. Because of that, we have now switched them from the previous "in-memory" cache to a Redis cache.
Preliminary tests on our application which exposes an API shows that we experience timeouts from our applications to the Redis cache. I have no idea why and it issue pops up very irregularly.
So I'm thinking maybe the reason for these timeouts are actually network related. Is it a good idea to put in affinity so we always run the Redis-cache on the same nodes as the application to prevent network issues?
The issues have not arisen during "very high load" situations so it's concerning me a bit.
This is an opinion question so I'll answer in an opinionated way:
Like you mentioned I would try to put the Redis and application pods on the same node, that would rule out wire networking issues. You can accomplish that with Kubernetes pod affinity. But you can also try nodeslector, that way you always pin your Redis and application pods to a specific node.
Another way to do this is to taint your nodes where you want to run your workloads and then add a toleration to the Redis and your application pods.
Hope it helps!