Crash when starting Traefik in cluster mode - traefik

I recently wanted to move from a single-node Traefik install (using a configuration file) to a 3-node Traefik cluster.
Following the docs, I uploaded the configuration:
$ traefik storeconfig
It reported no error, and the keys are present in the Consul KV store.
But when launching Traefik in cluster mode, I get a segmentation fault:
$ traefik --cluster=true -d
INFO[0001] Using TOML configuration file /etc/traefik/traefik.toml
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x83500e]
goroutine 1 [running]:
github.com/containous/traefik/cluster.NewLeadership(0x2e08560, 0xc420557840, 0xc4202a1340, 0x0)
/go/src/github.com/containous/traefik/cluster/leadership.go:28 +0x6e
github.com/containous/traefik/server.NewServer(0x2540be400, 0x100, 0x0, 0x0, 0x0, 0x0, 0x0, 0xc42035b930, 0x5, 0xc4205ef740, ...)
/go/src/github.com/containous/traefik/server/server.go:105 +0x63e
main.run(0xc4205678c0)
/go/src/github.com/containous/traefik/cmd/traefik/traefik.go:307 +0x6f6
main.main.func1(0xc42016cdc0, 0xc4202b31a0)
/go/src/github.com/containous/traefik/cmd/traefik/traefik.go:61 +0xd9
github.com/containous/traefik/vendor/github.com/containous/staert.(*Staert).Run(0xc4206c1f30, 0x1aa1940, 0xc420496300)
/go/src/github.com/containous/traefik/vendor/github.com/containous/staert/staert.go:83 +0x2e
main.main()
/go/src/github.com/containous/traefik/cmd/traefik/traefik.go:218 +0x1bf1
I've tried the latest stable release (1.3.7) and 1.4.0-rc1; both show the same error.
Any ideas?

I suspect your traefik.toml does not contain the correct configuration for your Consul backend.
Either fix the consul section of /etc/traefik/traefik.toml, or pass the settings on the command line:
traefik --consul --consul.endpoint=YOURENDPOINTHERE --cluster=true -d
See the documentation for details:
https://docs.traefik.io/configuration/backends/consul/
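If the diagnosis above is right, a quick check before launching in cluster mode is to confirm that the file really has a consul section. This is only a minimal sketch: it assumes the Consul settings live under a top-level [consul] table in the TOML file, and the helper name has_consul_section is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not part of Traefik): succeeds if the given TOML file
# contains a line that is exactly "[consul]", ignoring surrounding whitespace.
has_consul_section() {
    grep -Eq '^[[:space:]]*\[consul\][[:space:]]*$' "$1"
}

# Example use (commented out so the snippet has no side effects):
# if has_consul_section /etc/traefik/traefik.toml; then
#     traefik --cluster=true -d
# else
#     echo "no [consul] section found in traefik.toml" >&2
# fi
```

It only tells you the section exists, not that the endpoint in it is reachable or correct.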

Trivy on EKS unable to scan any images

I am trying to scan all images deployed on an EKS cluster that I am setting up for high security (it will be deployed to a classified IL5 environment). The cluster runs Kubernetes v1.23, and all worker nodes run Bottlerocket OS.
I expect images to be scanned and available in the VulnerabilityReports CRD.
I was able to successfully install Falco to the cluster (uses containerd). However, when deploying the official Helm chart (0.6.0-rc3) the scan-vulnerability containers start and then immediately error out. I set this environment variable on the trivy-operator deployment:
- name: CONTAINER_RUNTIME_ENDPOINT
  value: /run/containerd/containerd.sock
Output of run with -debug:
{
"level": "error",
"ts": 1668286646.865245,
"logger": "reconciler.vulnerabilityreport",
"msg": "Scan job container",
"job": "trivy-system/scan-vulnerabilityreport-74f54b6cd",
"container": "discovery",
"status.reason": "Error",
"status.message": "2022-11-12T20:57:13.674Z\t\u001b[31mFATAL\u001b[0m\timage scan error: scan error: unable to initialize a scanner: unable to initialize a docker scanner: 4 errors occurred:\n\t* unable to inspect the image (023620263533.dkr.ecr.us-gov-east-1.amazonaws.com/docker.io/istio/pilot:1.15.2): Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?\n\t* unable to initialize Podman client: no podman socket found: stat podman/podman.sock: no such file or directory\n\t* containerd socket not found: /run/containerd/containerd.sock\n\t* GET https://023620263533.dkr.ecr.us-gov-east-1.amazonaws.com/v2/docker.io/istio/pilot/manifests/1.15.2: unexpected status code 401 Unauthorized: Not Authorized\n\n\n\n",
"stacktrace": "github.com/aquasecurity/trivy-operator/pkg/vulnerabilityreport.(*WorkloadController).processFailedScanJob\n\t/home/runner/work/trivy-operator/trivy-operator/pkg/vulnerabilityreport/controller.go:551\ngithub.com/aquasecurity/trivy-operator/pkg/vulnerabilityreport.(*WorkloadController).reconcileJobs.func1\n\t/home/runner/work/trivy-operator/trivy-operator/pkg/vulnerabilityreport/controller.go:376\nsigs.k8s.io/controller-runtime/pkg/reconcile.Func.Reconcile\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.13.1/pkg/reconcile/reconcile.go:102\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.13.1/pkg/internal/controller/controller.go:121\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.13.1/pkg/internal/controller/controller.go:320\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.13.1/pkg/internal/controller/controller.go:273\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.13.1/pkg/internal/controller/controller.go:234"
}
I confirmed that Bottlerocket uses containerd, since /run/containerd/containerd.sock is specified on my Falco deployment. Even when I mount this socket as a volume into the pod and set CONTAINER_RUNTIME_ENDPOINT to this path, I get the same error.
Edit
I added the following security context:
seLinuxOptions:
  user: system_u
  role: system_r
  type: control_t
  level: s0-s0:c0.c1023
Initially I mounted dockershim.sock from the host into the pod, then realized that was not necessary. The error messages were a little misleading: the real problem was authentication with ECR. Furthermore, the seLinuxOptions needed to be specified at the pod level, not at the container level.

Hyperledger Fabric error: "TLS: bad certificate server" when installing chaincode

I'm just starting to learn HLF (Hyperledger Fabric), and I get an error while following a tutorial from the docs.
I downloaded fabric-samples using this command (replaced bit.ly link with the destination):
curl -sSL https://raw.githubusercontent.com/hyperledger/fabric/master/scripts/bootstrap.sh | bash -s -- 2.2.2 1.4.9
I run logspout in one terminal and execute peer lifecycle chaincode install basic.tar.gz in another, and this is the result I get:
Error: failed to retrieve endorser client for install: endorser client
failed to connect to localhost:7051: failed to create new connection:
context deadline exceeded
Log presented by Logspout:
peer0.org1.example.com|2022-03-15 13:03:24.452 UTC [core.comm]
ServerHandshake -> ERRO 04a Server TLS handshake failed in 2.650245ms
with error remote error: tls: bad certificate server=PeerServer
remoteaddress=172.22.0.1:61126
I set the envs in terminal as instructed in the docs, and I checked that CORE_PEER_TLS_ROOTCERT_FILE variable points to an existing file. The content of the file is the same as on the container.
What I tried:
Downloaded fabric-samples again and redid the whole setup, copy-pasting the commands directly from the docs.
Do you have any suggestions on where to look for the issue?
I resolved the problem: I was using a peer binary at version 2.2.1 left over from previous experiments, and it probably conflicted with FABRIC_CFG_PATH.
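For anyone hitting the same thing, here is a rough sketch of how the stale binary could be spotted. It assumes that peer version prints a line of the form "Version: X.Y.Z"; the helper names extract_version and versions_match are made up for illustration, and 2.2.2 is simply the Fabric version passed to bootstrap.sh in the question.

```shell
#!/bin/sh
# Hypothetical helpers for comparing the peer binary on PATH with the
# Fabric version that was downloaded by the bootstrap script.

# Read `peer version`-style output on stdin and print the version number.
# Assumes a line whose first field is "Version:" (leading spaces are fine).
extract_version() {
    awk '$1 == "Version:" { print $2; exit }'
}

# Succeed only if the expected and actual version strings are identical.
versions_match() {
    [ "$1" = "$2" ]
}

# Example use (commented out because it needs the peer binary installed):
# actual=$(peer version | extract_version)
# versions_match 2.2.2 "$actual" || echo "peer on PATH is $actual, expected 2.2.2" >&2
```

Running which peer next to this shows which binary is actually picked up first on PATH.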

How to delete corrupt local EKS Anywhere cluster

I've created a local EKS Anywhere cluster following this tutorial.
$ CLUSTER_NAME=dev-cluster
$ eksctl anywhere generate clusterconfig $CLUSTER_NAME \
--provider docker > $CLUSTER_NAME.yaml
$ eksctl anywhere create cluster -f $CLUSTER_NAME.yaml
After creating the cluster, I tried to delete it. While the deletion was still in progress, I pressed Ctrl+C to stop the operation. Now the cluster seems to be corrupt, and I can no longer delete it.
$ eksctl anywhere delete cluster -f ${CONFIG_FILE}
Performing provider setup and validations
Creating management cluster
collecting cluster diagnostics
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0x2479de7]
goroutine 1 [running]:
github.com/aws/eks-anywhere/pkg/workflows.(*CollectDiagnosticsTask).Run(0x0, 0x2b04168, 0xc000130008, 0xc00015efd0, 0xc061b451d05e8958, 0xc000c83440)
/codebuild/output/src572116999/src/git-codecommit.us-west-2.amazonaws.com/v1/repos/aws.eks-anywhere/pkg/workflows/diagnostics.go:23 +0x67
github.com/aws/eks-anywhere/pkg/workflows.(*deleteManagementCluster).Run(0xc0004824f8, 0x2b04168, 0xc000130008, 0xc00015efd0, 0x13, 0x2)
/codebuild/output/src572116999/src/git-codecommit.us-west-2.amazonaws.com/v1/repos/aws.eks-anywhere/pkg/workflows/delete.go:172 +0x191
github.com/aws/eks-anywhere/pkg/task.(*taskRunner).RunTask(0xc000a93ba8, 0x2b04168, 0xc000130008, 0xc00015efd0, 0x0, 0x0)
/codebuild/output/src572116999/src/git-codecommit.us-west-2.amazonaws.com/v1/repos/aws.eks-anywhere/pkg/task/task.go:115 +0x1ef
github.com/aws/eks-anywhere/pkg/workflows.(*Delete).Run(0xc000a93c80, 0x2b04168, 0xc000130008, 0xc000a76330, 0xc0004d57a0, 0xc000c85300, 0x0, 0x0, 0x0, 0xc0007d1cc0)
/codebuild/output/src572116999/src/git-codecommit.us-west-2.amazonaws.com/v1/repos/aws.eks-anywhere/pkg/workflows/delete.go:53 +0x145
github.com/aws/eks-anywhere/cmd/eksctl-anywhere/cmd.(*deleteClusterOptions).deleteCluster(0x37fb100, 0x2b04168, 0xc000130008, 0xc00041cec0, 0x0)
/codebuild/output/src572116999/src/git-codecommit.us-west-2.amazonaws.com/v1/repos/aws.eks-anywhere/cmd/eksctl-anywhere/cmd/deletecluster.go:120 +0x389
github.com/aws/eks-anywhere/cmd/eksctl-anywhere/cmd.glob..func2(0x37cc180, 0xc00041cec0, 0x0, 0x2, 0x0, 0x0)
/codebuild/output/src572116999/src/git-codecommit.us-west-2.amazonaws.com/v1/repos/aws.eks-anywhere/cmd/eksctl-anywhere/cmd/deletecluster.go:36 +0xb3
github.com/spf13/cobra.(*Command).execute(0x37cc180, 0xc00041cea0, 0x2, 0x2, 0x37cc180, 0xc00041cea0)
/go/pkg/mod/github.com/spf13/cobra@v1.1.3/command.go:852 +0x472
github.com/spf13/cobra.(*Command).ExecuteC(0x37cb280, 0x8, 0xc000000180, 0x2496ec5)
/go/pkg/mod/github.com/spf13/cobra@v1.1.3/command.go:960 +0x375
github.com/spf13/cobra.(*Command).Execute(...)
/go/pkg/mod/github.com/spf13/cobra@v1.1.3/command.go:897
github.com/spf13/cobra.(*Command).ExecuteContext(...)
/go/pkg/mod/github.com/spf13/cobra@v1.1.3/command.go:890
github.com/aws/eks-anywhere/cmd/eksctl-anywhere/cmd.Execute(0x0, 0x0)
/codebuild/output/src572116999/src/git-codecommit.us-west-2.amazonaws.com/v1/repos/aws.eks-anywhere/cmd/eksctl-anywhere/cmd/root.go:43 +0x53
main.main()
/codebuild/output/src572116999/src/git-codecommit.us-west-2.amazonaws.com/v1/repos/aws.eks-anywhere/cmd/eksctl-anywhere/main.go:29 +0xe5
The main error seems to be:
invalid memory address or nil pointer dereference
How can I manually delete this local cluster? brew uninstall aws/tap/eks-anywhere doesn't seem to have worked.

Install Julia behind proxy on my linux box

I am trying to install Julia on a Linux box that is behind a proxy. Installing Julia itself is quite easy, but installing modules is very exhausting.
Setting https_proxy - failed.
sslVerify: false at the git level - failed.
JULIA_SSL_NO_VERIFY and other variables like it - failed.
A .curlrc file with insecure in it - failed.
I searched for documentation on libgit2, but found nothing really helpful on the official page or elsewhere on the internet.
I'm really fed up with this. Can someone please help me?
Error:
(v1.6) pkg> add IJulia
Installing known registries into ~/.julia
┌ Warning: could not download https://pkg.julialang.org/registries
└ @ Pkg.Types /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Pkg/src/Types.jl:997
┌ Warning: could not download https://pkg.julialang.org/registries
└ @ Pkg.Types /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Pkg/src/Types.jl:997
Cloning registry from "https://github.com/JuliaRegistries/General.git"
ERROR: failed to clone from https://github.com/JuliaRegistries/General.git, error: GitError(Code:ERROR, Class:SSL, SSL error: 0xffff8e00 - SSL - An invalid SSL record was received)

ambari-agent can not reach ambari-server

After installing ambari-server with a local httpd repository, I got the following errors at the Confirm Hosts step in the web UI:
INFO 2018-05-27 15:39:16,776 NetUtil.py:70 - Connecting to https://master:8440/ca
ERROR 2018-05-27 15:39:16,787 NetUtil.py:96 - [Errno 8] _ssl.c:493: EOF occurred in violation of protocol
ERROR 2018-05-27 15:39:16,788 NetUtil.py:97 - SSLError: Failed to connect.Please check openssl library versions.
Refer to: https://bugzilla.redhat.com/show_bug.cgi?id=1022468 for more details.
WARNING 2018-05-27 15:39:16,789 NetUtil.py:124 - Server at https://master:8440 is not reachable, sleeping for 10 seconds...
INFO 2018-05-27 15:39:26,793 NetUtil.py:70 - Connecting to https://master:8440/ca
ERROR 2018-05-27 15:39:26,799 NetUtil.py:96 - [Errno 8] _ssl.c:493: EOF occurred in violation of protocol
ERROR 2018-05-27 15:39:26,799 NetUtil.py:97 - SSLError: Failed to connect. Please check openssl library versions.Refer to: https://bugzilla.redhat.com/show_bug.cgi?id=1022468 for more details.
WARNING 2018-05-27 15:39:26,801 NetUtil.py:124 - Server at https://master:8440 is not reachable, sleeping for 10 seconds...
My environment is as follows:
CentOS Linux release 7.5.1804 (Core)
Python 2.7.5
Java 1.8.0_171
OpenSSL 1.0.2k
Ambari 2.6.2.0
HDP-2.6.5.0
From my other ambari-agent nodes, I can reach master on port 8440:
[root@slave2 ~]# telnet master 8440
Trying 192.168.17.128...
Connected to master.
Escape character is '^]'.
Please give me some help, thanks a lot!
I am also getting the same issue.
This worked for me.
In /etc/ambari-agent/conf/ambari-agent.ini, add this line below [security]:
force_https_protocol=PROTOCOL_TLSv1_2
In /etc/python/cert-verification.cfg, change verify from the default to disable:
[https]
verify=disable
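The first of the two edits above can be scripted if you have many agent nodes. This is only a sketch under assumptions: the helper name set_force_tls12 is made up, it expects the [security] header to already exist in the file, and it does nothing if force_https_protocol is already set to any value.

```shell
#!/bin/sh
# Hypothetical helper: insert force_https_protocol=PROTOCOL_TLSv1_2 right
# after the [security] header of an ini file. If the key is already present
# the file is left alone; if there is no [security] header nothing is added.
set_force_tls12() {
    ini=$1
    if grep -q '^force_https_protocol=' "$ini"; then
        return 0
    fi
    tmp=$(mktemp) || return 1
    # Print every line, and emit the new key right after the section header.
    awk '
        { print }
        /^\[security\]/ { print "force_https_protocol=PROTOCOL_TLSv1_2" }
    ' "$ini" > "$tmp" && cat "$tmp" > "$ini"
    rm -f "$tmp"
}

# Example use (commented out; back up the file and restart the agent yourself):
# set_force_tls12 /etc/ambari-agent/conf/ambari-agent.ini
```

Writing back with cat instead of mv keeps the original file's owner and permissions.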
Please check JAVA_HOME and the OpenSSL version in your setup.