HAProxy pods keep going into CrashLoopBackOff - Redis

I'm setting up redis-ha in my Kubernetes cluster, and I used Helm to install it, but my HAProxy pods keep going into CrashLoopBackOff.
The install command: helm install -f develop-redis-values.yaml stable/redis-ha --namespace=develop -n=develop-redis
In develop-redis-values.yaml, I set haproxy.enabled to true (see the sketch below).
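A minimal sketch of the relevant part of the values file; haproxy.enabled is the chart's toggle for the HAProxy deployment, and everything else is left at the chart defaults:
# develop-redis-values.yaml
haproxy:
  enabled: true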
These are the logs from the crash-looping pod:
> [ALERT] 268/104750 (1) : parsing [/usr/local/etc/haproxy/haproxy.cfg:34] : 'tcp-check expect string' expects <string> as an argument.
> [ALERT] 268/104750 (1) : Error(s) found in configuration file : /usr/local/etc/haproxy/haproxy.cfg
> [ALERT] 268/104750 (1) : Fatal errors found in configuration.
I expected the HAProxy pods to be running.

A CrashLoopBackOff can be related to these possible causes:
the application inside your pod is not starting due to an error;
the image your pod is based on is not present in the registry, or the node where your pod has been scheduled cannot pull it from the registry;
some parameters of the pod have not been configured correctly.
In your case, there are errors in your HAProxy configuration file: the alert points at line 34 of the generated haproxy.cfg, where 'tcp-check expect string' was given no argument. That pattern typically appears when a templated value (for example, a Redis auth string the check is supposed to match) renders empty, leaving the directive without the <string> it expects.
Have you tried to pull the image you're using locally and start a container to verify it?
You can enter the container and check the configuration with:
haproxy -c -V -f /usr/local/etc/haproxy/haproxy.cfg
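You can also run the same check without entering the pod, by extracting the rendered config and validating it in a throwaway local container. A sketch, assuming the ConfigMap is named develop-redis-haproxy-configmap (check the actual name with kubectl get configmap -n develop) and that a stock haproxy image is available:
kubectl -n develop get configmap develop-redis-haproxy-configmap \
  -o jsonpath='{.data.haproxy\.cfg}' > haproxy.cfg
docker run --rm -v "$PWD/haproxy.cfg:/usr/local/etc/haproxy/haproxy.cfg:ro" \
  haproxy:latest haproxy -c -f /usr/local/etc/haproxy/haproxy.cfg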
For more information and debugging approaches:
https://pillsfromtheweb.blogspot.com/2020/05/troubleshooting-kubernetes.html

Related

GitLab Runner unable to clone repo

I am using the GitLab UI to deploy Rancher via Terraform; the job runs on a GitLab Runner as a container on a Linux VM.
Below is the relevant config from .gitlab-ci.yml:
- echo "https://gitlab-ci-token:${CI_JOB_TOKEN}@git.myservice.demo.com" >> ~/.git-credentials
- git config --global credential.helper 'store --file ~/.git-credentials'
When I run the pipeline, it fails to clone the repo. I have an active deployment token, so I am not sure why it's failing.
Any guidance will be appreciated, as I am very new to GitLab.
Pipeline error
Running with gitlab-runner 13.2.1 (efa30e33)
on b069898257b6 HpcxYCyA
Preparing the "docker" executor
00:05
Using Docker executor with image hashicorp/terraform:0.12.29 ...
Pulling docker image hashicorp/terraform:0.12.29 ...
Using docker image sha256:323b4bbc567117d19a68bcfe71e87ce9be855674005f645e41c8faedf4c263cb for hashicorp/terraform:0.12.29 ...
Preparing environment
00:02
Running on runner-hpcxycya-project-257-concurrent-0 via 7d0ddeb92b75...
Getting source from Git repository
00:02
$ git config --global http.proxy $HTTP_PROXY; git config --global https.proxy $HTTPS_PROXY
Fetching changes with git depth set to 50...
Reinitialized existing Git repository in /builds/demo/rancher-prod/.git/
fatal: unable to access 'https://git.myservice.demo.com/demo/rancher-prod.git/': SSL certificate problem: unable to get local issuer certificate
ERROR: Job failed: exit code 1
# openssl s_client -connect git.myserives.demo.com:443
140605252743616:error:0200206E:system library:connect:Connection timed out:../crypto/bio/b_sock2.c:110:
140605252743616:error:2008A067:BIO routines:BIO_connect:connect error:../crypto/bio/b_sock2.c:111:
connect:errno=110
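The clone error ("unable to get local issuer certificate") means git inside the job container does not trust the certificate served by git.myservice.demo.com. If that certificate comes from an internal CA, one common remedy is to point the runner at the CA bundle; a sketch, assuming a self-managed runner with the default config path and a CA file you provide yourself:
# /etc/gitlab-runner/config.toml
[[runners]]
  tls-ca-file = "/etc/gitlab-runner/certs/ca.crt"
The openssl timeout above is a separate signal: it suggests also checking that the VM can reach the Git host at all (proxy or firewall), independent of TLS trust.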

Setting up SSL between Helm and Tiller

I am following these instructions to set up SSL between Helm and Tiller.
When I run helm init like this, I get a warning that Tiller is already installed:
helm init --tiller-tls --tiller-tls-cert ./tiller.cert.pem --tiller-tls-key ./tiller.key.pem --tiller-tls-verify --tls-ca-cert ca.cert.pem
$HELM_HOME has been configured at /Users/Koustubh/.helm.
Warning: Tiller is already installed in the cluster.
(Use --client-only to suppress this message, or --upgrade to upgrade Tiller to the current version.)
Happy Helming!
When I check my pods, I get
tiller-deploy-6444c7d5bb-chfxw 0/1 ContainerCreating 0 2h
and after describing the pod, I get
Warning FailedMount 7m (x73 over 2h) kubelet, gke-myservice-default-pool-0198f291-nrl2 Unable to mount volumes for pod "tiller-deploy-6444c7d5bb-chfxw_kube-system(3ebae1df-e790-11e8-98ae-42010a9800f9)": timeout expired waiting for volumes to attach or mount for pod "kube-system"/"tiller-deploy-6444c7d5bb-chfxw". list of unmounted volumes=[tiller-certs]. list of unattached volumes=[tiller-certs default-token-9x886]
Warning FailedMount 1m (x92 over 2h) kubelet, gke-myservice-default-pool-0198f291-nrl2 MountVolume.SetUp failed for volume "tiller-certs" : secrets "tiller-secret" not found
If I try to delete the running Tiller deployment like this, it just gets stuck:
helm reset --debug --force
How can I solve this issue? I have also tried the --upgrade flag with helm init, but that doesn't work either.
I had this issue too, but resolved it by deleting both the Tiller deployment and the service and re-initialising.
I'm also using RBAC, so I have added those commands too:
# Remove existing tiller:
kubectl delete deployment tiller-deploy -n kube-system
kubectl delete service tiller-deploy -n kube-system
# Re-init with your certs
helm init --tiller-tls --tiller-tls-cert ./tiller.cert.pem --tiller-tls-key ./tiller.key.pem --tiller-tls-verify --tls-ca-cert ca.cert.pem
# Add RBAC service account and role
kubectl create serviceaccount --namespace kube-system tiller
kubectl create clusterrolebinding tiller-cluster-rule --clusterrole=cluster-admin --serviceaccount=kube-system:tiller
kubectl patch deploy --namespace kube-system tiller-deploy -p '{"spec":{"template":{"spec":{"serviceAccount":"tiller"}}}}'
# Re-initialize
helm init --service-account tiller --upgrade
# Test the pod is up
kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
tiller-deploy-69775bbbc7-c42wp 1/1 Running 0 5m
# Copy the certs to `~/.helm`
cp tiller.cert.pem ~/.helm/cert.pem
cp tiller.key.pem ~/.helm/key.pem
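If the new pod were still to report secrets "tiller-secret" not found (the FailedMount event from the question), the secret can also be created by hand from the same certs; a sketch, using the key names Helm v2's Tiller expects (ca.crt, tls.crt, tls.key):
kubectl create secret generic tiller-secret -n kube-system \
  --from-file=ca.crt=ca.cert.pem \
  --from-file=tls.crt=tiller.cert.pem \
  --from-file=tls.key=tiller.key.pem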
Validate that Helm only responds over TLS:
$ helm version
Client: &version.Version{SemVer:"v2.10.0", GitCommit:"9ad53aac42165a5fadc6c87be0dea6b115f93090", GitTreeState:"clean"}
Error: cannot connect to Tiller
$ helm version --tls
Client: &version.Version{SemVer:"v2.10.0", GitCommit:"9ad53aac42165a5fadc6c87be0dea6b115f93090", GitTreeState:"clean"}
Server: &version.Version{SemVer:"v2.10.0", GitCommit:"9ad53aac42165a5fadc6c87be0dea6b115f93090", GitTreeState:"clean"}
Thanks to
https://github.com/helm/helm/issues/4691#issuecomment-430617255
https://medium.com/@pczarkowski/easily-install-uninstall-helm-on-rbac-kubernetes-8c3c0e22d0d7

Kubeflow setup returns no resources found

I am following the steps in the getting started guide for Kubeflow, and I got stuck at verifying that the setup works.
I managed to get this:
$ kubectl get ns
NAME STATUS AGE
default Active 2m
kube-public Active 2m
kube-system Active 2m
kubeflow-admin Active 14s
but when I do
$ kubectl -n kubeflow get svc
No resources found.
I also got
$ kubectl -n kubeflow get pods
No resources found.
I repeated these steps on both my Mac and my Ubuntu VM, and both returned the same problem. Am I missing something here?
Thanks.
Yes, you're missing something here, and that is to use the correct namespace. Use:
$ kubectl -n kubeflow-admin get all
The process of installing the Kubeflow services involves downloading the container images, so it takes time for the services to come up.
You can run the command periodically until they are up, as in the sketch below.
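A minimal example; it assumes the kubeflow-admin namespace shown above, and -w streams updates so you don't have to re-run the command by hand:
kubectl -n kubeflow-admin get pods -w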

Cannot get TCP port information from Kubernetes host - OpenShift Origin - oc

I was following OpenShift's Local Cluster Management documentation.
After I ran oc cluster up:
[root@user ~]# oc cluster up
Starting OpenShift using openshift/origin:v3.6.0 ...
-- Checking OpenShift client ... OK
-- Checking Docker client ... OK
-- Checking Docker version ...
WARNING: Docker version is 1.21, it needs to be >= 1.22
-- Checking for existing OpenShift container ... OK
-- Checking for openshift/origin:v3.6.0 image ... OK
-- Checking Docker daemon configuration ... OK
-- Checking for available ports ... FAIL
Error: Cannot get TCP port information from Kubernetes host
Caused By:
Error: cannot create container using image openshift/origin:v3.6.0
Caused By:
Error: Error response from daemon: SHM size must be greater then 0
[root@ip-172-31-0-186 ~]# oc cluster up --loglevel=5
-- Checking OpenShift client ...
-- Checking Docker client ...
I0803 04:30:33.543172 1417 up.go:590] No Docker environment variables found. Will attempt default socket.
I0803 04:30:33.543221 1417 up.go:595] No Docker host (DOCKER_HOST) configured. Will attempt default socket.
-- Checking Docker version ...
I0803 04:30:33.543240 1417 helper.go:114] Retrieving Docker version
I0803 04:30:33.554087 1417 helper.go:120] Docker version results: &types.Version{Version:"1.9.1", APIVersion:"1.21", GitCommit:"78ee77d/1.9.1", GoVersion:"go1.4.2", Os:"linux", Arch:"amd64", KernelVersion:"3.10.0-693.el7.x86_64", Experimental:false, BuildTime:""}
I0803 04:30:33.554126 1417 helper.go:124] APIVersion: 1.21
I0803 04:30:33.554158 1417 up.go:686] Checking that docker API version is at least 1.22
WARNING: Docker version is 1.21, it needs to be >= 1.22
-- Checking for existing OpenShift container ...
I0803 04:30:33.554181 1417 helper.go:171] Inspecting docker container "origin"
I0803 04:30:33.555084 1417 helper.go:175] Container "origin" was not found
-- Checking for openshift/origin:v3.6.0 image ...
I0803 04:30:33.555101 1417 helper.go:143] Inspecting Docker image "openshift/origin:v3.6.0"
I0803 04:30:33.556444 1417 helper.go:146] Image "openshift/origin:v3.6.0" found: &types.ImageInspect{ID:"c6d16974c8a3a5da3ab799533daa2dbd54e56b1f0ebbad59345154fc8e836ff2", RepoTags:[]string{"docker.io/openshift/origin:v3.6.0"}, RepoDigests:[]string{}, Parent:"395d30169bc02cca2e7083926b0fd6f2e6b7034a6de41a811cce0ab7c7473fca", Comment:"", Created:"2017-08-01T18:34:13.736398725Z", Container:"ae53137cc1b98b2f93051589d6aee252e505ac82f8e7a31f5ab49bfc0e9dc91a", ContainerConfig:(*container.Config)(0xc420277b00), DockerVersion:"1.12.6", Author:"", Config:(*container.Config)(0xc4202e2120), Architecture:"amd64", Os:"linux", Size:611206034, VirtualSize:974248741, GraphDriver:types.GraphDriverData{Name:"devicemapper", Data:map[string]string{"DeviceId":"7", "DeviceName":"docker-202:2-25214823-c6d16974c8a3a5da3ab799533daa2dbd54e56b1f0ebbad59345154fc8e836ff2", "DeviceSize":"107374182400"}}, RootFS:types.RootFS{Type:"", Layers:[]string(nil), BaseLayer:""}}
-- Checking Docker daemon configuration ...
I0803 04:30:33.556503 1417 helper.go:65] Retrieving Docker daemon info
I0803 04:30:33.681753 1417 helper.go:71] Docker daemon info: &types.Info{ID:"IITV:S6LY:XNQS:LA63:VAH6:POZR:RGCW:MFWK:OTI7:DEII:AQK5:FDC6", Containers:0, ContainersRunning:0, ContainersPaused:0, ContainersStopped:0, Images:6, Driver:"devicemapper", DriverStatus:[][2]string{[2]string{"Pool Name", "docker-202:2-25214823-pool"}, [2]string{"Pool Blocksize", "65.54 kB"}, [2]string{"Base Device Size", "107.4 GB"}, [2]string{"Backing Filesystem", ""}, [2]string{"Data file", "/dev/loop0"}, [2]string{"Metadata file", "/dev/loop1"}, [2]string{"Data Space Used", "1.091 GB"}, [2]string{"Data Space Total", "107.4 GB"}, [2]string{"Data Space Available", "18.09 GB"}, [2]string{"Metadata Space Used", "1.339 MB"}, [2]string{"Metadata Space Total", "2.147 GB"}, [2]string{"Metadata Space Available", "2.146 GB"}, [2]string{"Udev Sync Supported", "true"}, [2]string{"Deferred Removal Enabled", "false"}, [2]string{"Deferred Deletion Enabled", "false"}, [2]string{"Deferred Deleted Device Count", "0"}, [2]string{"Data loop file", "/var/lib/docker/devicemapper/devicemapper/data"}, [2]string{"Metadata loop file", "/var/lib/docker/devicemapper/devicemapper/metadata"}, [2]string{"Library Version", "1.02.140-RHEL7 (2017-05-03)"}}, SystemStatus:[][2]string(nil), Plugins:types.PluginsInfo{Volume:[]string(nil), Network:[]string(nil), Authorization:[]string(nil)}, MemoryLimit:true, SwapLimit:true, KernelMemory:false, CPUCfsPeriod:true, CPUCfsQuota:true, CPUShares:false, CPUSet:false, IPv4Forwarding:true, BridgeNfIptables:true, BridgeNfIP6tables:true, Debug:false, NFd:15, OomKillDisable:true, NGoroutines:25, SystemTime:"2017-08-03T04:30:33.681150233-04:00", ExecutionDriver:"native-0.2", LoggingDriver:"json-file", CgroupDriver:"", NEventsListener:0, KernelVersion:"3.10.0-693.el7.x86_64", OperatingSystem:"Red Hat Enterprise Linux Server 7.4 (Maipo)", OSType:"", Architecture:"", IndexServerAddress:"https://index.docker.io/v1/", RegistryConfig:(*registry.ServiceConfig)(0xc4210fb700), NCPU:2, MemTotal:3973541888, DockerRootDir:"/var/lib/docker", HTTPProxy:"", HTTPSProxy:"", NoProxy:"", Name:"ip-172-31-0-186.us-west-2.compute.internal", Labels:[]string(nil), ExperimentalBuild:false, ServerVersion:"1.9.1", ClusterStore:"", ClusterAdvertise:"", SecurityOptions:[]string(nil)}
I0803 04:30:33.681847 1417 helper.go:42] Looking for "172.30.0.0/16" in []*registry.NetIPNet{(*registry.NetIPNet)(0xc4210f1a10), (*registry.NetIPNet)(0xc4210f1a70)}
I0803 04:30:33.681859 1417 helper.go:46] Found "172.30.0.0/16"
-- Checking for available ports ...
I0803 04:30:33.681920 1417 run.go:181] Creating container named ""
config:
image: openshift/origin:v3.6.0
entry point:
/bin/bash
command:
-c
cat /proc/net/tcp && ( [ -e /proc/net/tcp6 ] && cat /proc/net/tcp6 || true)
host config:
pid mode: host
user mode:
network mode: host
FAIL
Error: Cannot get TCP port information from Kubernetes host
Caused By:
Error: cannot create container using image openshift/origin:v3.6.0
Caused By:
Error: Error response from daemon: SHM size must be greater then 0
I have placed the Kubernetes config file in .kube/config, but I'm still getting the same error. Does the Kubernetes cluster need to be on the same machine?
UPDATE-1
I installed the latest Docker version following the Docker docs.
To resolve a dependency, I installed container-selinux (sudo yum install ftp://fr2.rpmfind.net/linux/centos/7.3.1611/extras/x86_64/Packages/container-selinux-2.9-4.el7.noarch.rpm).
After that I tried to bring up the cluster with oc cluster up. This time, it fails at the Docker daemon configuration check.
[root@ip-172-31-0-186 ~]# oc cluster up
Starting OpenShift using openshift/origin:v3.6.0 ...
-- Checking OpenShift client ... OK
-- Checking Docker client ... OK
-- Checking Docker version ... OK
-- Checking for existing OpenShift container ... OK
-- Checking for openshift/origin:v3.6.0 image ... OK
-- Checking Docker daemon configuration ... FAIL
Error: did not detect an --insecure-registry argument on the Docker daemon
Solution:
Ensure that the Docker daemon is running with the following argument:
--insecure-registry 172.30.0.0/16
The docs say to add --insecure-registry 172.30.0.0/16 in /etc/sysconfig/docker, but for newer versions of Docker there is no file at that location. Anyway, I created and updated /etc/sysconfig/docker, but I am still getting the above error.
OK, the problem is the insecure-registry configuration. Specify the insecure registry in /etc/docker/daemon.json with the config below:
{
  "insecure-registries": [
    "172.30.0.0/16"
  ]
}
This works with the latest Docker as well.
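The daemon only reads daemon.json at startup, so restart it after editing; a minimal example, assuming a systemd-based distribution:
sudo systemctl restart docker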
For any particular version of Kubernetes or OpenShift, the supported Docker version is a little behind the latest release.
So I would advise you not to install the latest Docker from the Docker documentation, but to install it using your Linux distribution's package manager. For Fedora and CentOS, just do:
sudo yum install -y docker
Once you have done that, all the dependency management will be taken care of, and you don't need to manually install anything else.
Now that you have installed Docker using the package manager, you will find /etc/sysconfig/docker, and you can add the --insecure-registry 172.30.0.0/16 line there, as sketched below.
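For reference, a sketch of the relevant line; the OPTIONS variable is the stock RHEL/CentOS packaging convention, and any flags already present should be kept:
# /etc/sysconfig/docker
OPTIONS='--selinux-enabled --insecure-registry 172.30.0.0/16'
Restart the Docker service afterwards so the flag takes effect.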
HTH.

Redis "Fatal error, can't open config file 'restart'" after a crash

So after restarting my httpd, Redis crashed (due to the number of sudden requests sent via httpd and written to Redis), and now when I try to restart Redis on my CentOS 6.5 server I get the following error:
[root@host /]# /usr/sbin/redis-server restart
[1705] 17 Apr 00:30:49 # Fatal error, can't open config file 'restart'
I have also tried to log in to Redis using redis-cli, and I get an error stating the connection to the server failed.
What options do I have to safely restart the server?
From the directory where you downloaded and unzipped your Redis source, run the following. This is for RHEL-based systems.
make install
# (OR)
sudo cp src/redis-server /usr/local/bin/
sudo cp src/redis-cli /usr/local/bin/
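As for the error message itself: redis-server treats its first argument as the path to a configuration file, so restart is being read as a filename rather than an action. Start the server with your config instead; a sketch, assuming common CentOS locations (adjust the paths to your install):
/usr/sbin/redis-server /etc/redis.conf
# or, if Redis was installed from a package with an init script:
service redis restart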