I am trying to scan all images deployed on an EKS cluster that I am setting up for high security (it will be deployed to a classified IL5 environment). Kubernetes v1.23; all worker nodes run Bottlerocket OS.
I expect images to be scanned and available in the VulnerabilityReports CRD.
I was able to install Falco on the cluster successfully (it also uses containerd). However, when deploying the official trivy-operator Helm chart (0.6.0-rc3), the scan-vulnerabilityreport containers start and then immediately error out. I set this environment variable on the trivy-operator deployment:
- name: CONTAINER_RUNTIME_ENDPOINT
  value: /run/containerd/containerd.sock
Output of a run with -debug enabled:
{
  "level": "error",
  "ts": 1668286646.865245,
  "logger": "reconciler.vulnerabilityreport",
  "msg": "Scan job container",
  "job": "trivy-system/scan-vulnerabilityreport-74f54b6cd",
  "container": "discovery",
  "status.reason": "Error",
  "status.message": "2022-11-12T20:57:13.674Z\t\u001b[31mFATAL\u001b[0m\timage scan error: scan error: unable to initialize a scanner: unable to initialize a docker scanner: 4 errors occurred:\n\t* unable to inspect the image (023620263533.dkr.ecr.us-gov-east-1.amazonaws.com/docker.io/istio/pilot:1.15.2): Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?\n\t* unable to initialize Podman client: no podman socket found: stat podman/podman.sock: no such file or directory\n\t* containerd socket not found: /run/containerd/containerd.sock\n\t* GET https://023620263533.dkr.ecr.us-gov-east-1.amazonaws.com/v2/docker.io/istio/pilot/manifests/1.15.2: unexpected status code 401 Unauthorized: Not Authorized\n\n\n\n",
  "stacktrace": "github.com/aquasecurity/trivy-operator/pkg/vulnerabilityreport.(*WorkloadController).processFailedScanJob\n\t/home/runner/work/trivy-operator/trivy-operator/pkg/vulnerabilityreport/controller.go:551\ngithub.com/aquasecurity/trivy-operator/pkg/vulnerabilityreport.(*WorkloadController).reconcileJobs.func1\n\t/home/runner/work/trivy-operator/trivy-operator/pkg/vulnerabilityreport/controller.go:376\nsigs.k8s.io/controller-runtime/pkg/reconcile.Func.Reconcile\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.13.1/pkg/reconcile/reconcile.go:102\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.13.1/pkg/internal/controller/controller.go:121\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.13.1/pkg/internal/controller/controller.go:320\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.13.1/pkg/internal/controller/controller.go:273\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.13.1/pkg/internal/controller/controller.go:234"
}
I confirmed that Bottlerocket uses containerd, as /run/containerd/containerd.sock is the path specified in my Falco deployment. Even when I mount this socket as a volume into the pod and set CONTAINER_RUNTIME_ENDPOINT to that path, I get the same error.
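For reference, this is roughly the hostPath mount I tried (a sketch, not the exact manifest; the container name is illustrative):

spec:
  volumes:
    - name: containerd-sock
      hostPath:
        path: /run/containerd/containerd.sock
        type: Socket
  containers:
    - name: trivy-operator
      env:
        - name: CONTAINER_RUNTIME_ENDPOINT
          value: /run/containerd/containerd.sock
      volumeMounts:
        - name: containerd-sock
          mountPath: /run/containerd/containerd.sock
          readOnly: true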
Edit
I added the following security context:
seLinuxOptions:
  user: system_u
  role: system_r
  type: control_t
  level: s0-s0:c0.c1023
Initially I mounted dockershim.sock from the host into the pod, then realized that was not necessary; the error messages were a little misleading, and it was really an ECR authentication issue. Furthermore, the SELinux options needed to be specified at the pod level, not the container level.
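For anyone hitting the same SELinux placement issue, the options have to sit on the pod spec rather than on a container (a minimal sketch; only the placement matters, the values are the ones above, and the container name is illustrative):

spec:
  securityContext:
    seLinuxOptions:
      user: system_u
      role: system_r
      type: control_t
      level: s0-s0:c0.c1023
  containers:
    - name: trivy-operator
      # no per-container seLinuxOptions here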
Related
I'm just starting to learn Hyperledger Fabric (HLF), and I get an error while following the tutorial from the docs: link
I downloaded fabric-samples using this command (I replaced the bit.ly link with its destination):
curl -sSL https://raw.githubusercontent.com/hyperledger/fabric/master/scripts/bootstrap.sh | bash -s -- 2.2.2 1.4.9
I run logspout in one terminal and try to execute peer lifecycle chaincode install basic.tar.gz in another, and this is the result I get:
Error: failed to retrieve endorser client for install: endorser client
failed to connect to localhost:7051: failed to create new connection:
context deadline exceeded
Log presented by Logspout:
peer0.org1.example.com|2022-03-15 13:03:24.452 UTC [core.comm]
ServerHandshake -> ERRO 04a Server TLS handshake failed in 2.650245ms
with error remote error: tls: bad certificate server=PeerServer
remoteaddress=172.22.0.1:61126
I set the environment variables in the terminal as instructed in the docs, and I checked that the CORE_PEER_TLS_ROOTCERT_FILE variable points to an existing file. The content of the file is the same as in the container.
What I tried to do:
download fabric-samples again and redo the whole setup, copy-pasting the commands directly from the docs
Do you have any suggestions for where to look for the issue?
I resolved the problem: I was using a peer binary from version 2.2.1 left over from previous experiments, and it probably collided with FABRIC_CFG_PATH.
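If anyone else hits this, a quick way to spot a stale binary (a sketch, assuming fabric-samples was downloaded into the current directory):

# Check which peer binary is on the PATH and what version it reports
which peer
peer version

# Make sure the freshly downloaded binaries and config are picked up first
export PATH="$PWD/fabric-samples/bin:$PATH"
export FABRIC_CFG_PATH="$PWD/fabric-samples/config"
peer version   # should now report the 2.2.2 binary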
I am using dask-yarn in local deploy mode on a MapR cluster. I have unpacked the virtual environment into a folder shared between the nodes.
Sometimes the workers (containers) start properly in the cluster, but other times the containers fail with the following error message in YARN:
/usr/bin/env: 'python3.6': No such file or directory
In the meantime, I see a lot of containers with status FAILED (> 1000). My initial provision is around 5 workers, but I have to wait around 10 minutes or more until I get that initial provision.
The following is my /etc/dask/yarn.yaml configuration:
yarn:
  specification: null
  name: dask
  queue: default
  deploy-mode: local
  environment: "venv://<shared_location>"
  tags: []
  user: ''
  host: "host_name"
  port: 8788
  dashboard-address: ":17439"

  scheduler:
    vcores: 1
    memory: 2GiB

  worker:
    vcores: 1
    memory: 2GiB
    restarts: -1
    env: {'SOME_VAR': 'some_value'}
Reason for the problem: some of the nodes did not have the same Python version installed in the same location. Since I am using a virtual environment, the environment expects Python to be available at the same path on all nodes.
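A quick way to spot the mismatch (a sketch; the node names are placeholders and passwordless SSH is assumed):

# Verify that python3.6 resolves to the same path on every YARN node
for node in node1 node2 node3; do
  echo "== $node =="
  ssh "$node" 'command -v python3.6 || echo "python3.6 not found"'
done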
Using filebeat 7.5.2:
I'm using a Filebeat configuration with close_eof enabled and I run Filebeat with the --once flag. I can see the harvester reaching EOF, but Filebeat keeps running.
Filebeat conf:
filebeat.inputs:
  - type: log
    close_eof: true
    enabled: true
    paths:
      - "${LOGS_PATH}"
    scan_frequency: 1s
    fields: {
      machine: "${HOST}"
    }

output.logstash:
  hosts: ["192.168.41.6:5044"]
  bulk_max_size: 1024
  timeout: 30s
  pipelining: 1
  workers: 1
And I run it using:
filebeat run --once -v -c "PATH TO CONF..."
And some logs from the filebeat instance:
...
2020-02-04T18:30:16.950Z INFO instance/beat.go:297 Setup Beat: filebeat; Version: 7.5.2
2020-02-04T18:30:17.059Z INFO [publisher] pipeline/module.go:97 Beat name: logstash
2020-02-04T18:30:17.167Z WARN beater/filebeat.go:152 Filebeat is unable to load the Ingest Node pipelines for the configured modules because the Elasticsearch output is not configured/enabled. If you have already loaded the Ingest Node pipelines or are using Logstash pipelines, you can ignore this warning.
2020-02-04T18:30:17.168Z INFO instance/beat.go:429 filebeat start running.
2020-02-04T18:30:17.168Z INFO [monitoring] log/log.go:118 Starting metrics logging every 30s
2020-02-04T18:30:17.168Z INFO registrar/migrate.go:104 No registry home found. Create: /tmp/tmp.BXJtfiaEzb/data/registry/filebeat
2020-02-04T18:30:17.179Z INFO registrar/migrate.go:112 Initialize registry meta file
2020-02-04T18:30:17.192Z INFO registrar/registrar.go:108 No registry file found under: /tmp/tmp.BXJtfiaEzb/data/registry/filebeat/data.json. Creating a new registry file.
2020-02-04T18:30:17.193Z INFO registrar/registrar.go:145 Loading registrar data from /tmp/tmp.BXJtfiaEzb/data/registry/filebeat/data.json
2020-02-04T18:30:17.193Z INFO registrar/registrar.go:152 States Loaded from registrar: 0
2020-02-04T18:30:17.193Z WARN beater/filebeat.go:368 Filebeat is unable to load the Ingest Node pipelines for the configured modules because the Elasticsearch output is not configured/enabled. If you have already loaded the Ingest Node pipelines or are using Logstash pipelines, you can ignore this warning.
2020-02-04T18:30:17.193Z INFO crawler/crawler.go:72 Loading Inputs: 1
2020-02-04T18:30:17.194Z INFO log/input.go:152 Configured paths: [/tmp/tmp.BXJtfiaEzb/*.log]
2020-02-04T18:30:17.206Z INFO input/input.go:114 Starting input of type: log; ID: 13918413832820009056
2020-02-04T18:30:17.225Z INFO input/input.go:167 Stopping Input: 13918413832820009056
2020-02-04T18:30:17.225Z INFO crawler/crawler.go:106 Loading and starting Inputs completed. Enabled inputs: 1
2020-02-04T18:30:17.225Z INFO log/harvester.go:251 Harvester started for file: /tmp/tmp.BXJtfiaEzb/dcbgw-20200124080032_darkblue.log
2020-02-04T18:30:17.231Z INFO beater/filebeat.go:384 Running filebeat once. Waiting for completion ...
2020-02-04T18:30:17.231Z INFO beater/filebeat.go:386 All data collection completed. Shutting down.
2020-02-04T18:30:17.231Z INFO crawler/crawler.go:139 Stopping Crawler
2020-02-04T18:30:17.231Z INFO crawler/crawler.go:149 Stopping 1 inputs
2020-02-04T18:30:17.258Z INFO pipeline/output.go:95 Connecting to backoff(async(tcp://192.168.41.6:5044))
2020-02-04T18:30:17.296Z INFO pipeline/output.go:105 Connection to backoff(async(tcp://192.168.41.6:5044)) established
... Only metrics here ...
2020-02-04T18:35:55.686Z INFO log/harvester.go:274 End of file reached: /tmp/tmp.BXJtfiaEzb/dcbgw-20200124080032_darkblue.log. Closing because close_eof is enabled.
2020-02-04T18:35:55.686Z INFO crawler/crawler.go:165 Crawler stopped
... MORE METRICS ...
2020-02-04T18:36:26.609Z ERROR logstash/async.go:256 Failed to publish events caused by: read tcp 192.168.41.6:49662->192.168.41.6:5044: i/o timeout
2020-02-04T18:36:26.621Z ERROR logstash/async.go:256 Failed to publish events caused by: client is not connected
2020-02-04T18:36:28.520Z ERROR pipeline/output.go:121 Failed to publish events: client is not connected
2020-02-04T18:36:28.520Z INFO pipeline/output.go:95 Connecting to backoff(async(tcp://192.168.41.6:5044))
2020-02-04T18:36:28.521Z INFO pipeline/output.go:105 Connection to backoff(async(tcp://192.168.41.6:5044)) established
... MORE METRICS ...
From this I'm outputting to Logstash 7.5.2 running in the same Ubuntu 18 VM. Running Logstash with log level trace does not output any error.
I wanted to learn more about Fabric8; however, I cannot build even a very simple project. I am running it locally on a Minikube cluster.
The setup is:
Mac OS Sierra
Minikube v0.18.0
Fabric8 v0.4.122
So I have a simple Spring Boot application in the local Gogs repository. The builds are failing with this message:
/usr/bin/git checkout -f d8af29f8af7a498331a244d245fb321003ef110d
/usr/bin/git rev-list d8af29f8af7a498331a244d245fb321003ef110d # timeout=10
[Pipeline] End of Pipeline
io.fabric8.kubernetes.client.KubernetesClientException: An error has occurred.
at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:57)
at io.fabric8.kubernetes.client.utils.HttpClientUtils.createHttpClient(HttpClientUtils.java:153)
[...]
Caused by: javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
at sun.security.ssl.Alerts.getSSLException(Alerts.java:192)
at sun.security.ssl.SSLSocketImpl.fatal(SSLSocketImpl.java:1949)
at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:302)
So I took the ca.crt from Minikube (~/minikube/ca.crt) and added it (base64-encoded) to the jenkins-git-ssh secret, which gets mounted into the Jenkins pod at /var/run/secrets/kubernetes.io/serviceaccount. The next build ended with this error:
/usr/bin/git checkout -f d8af29f8af7a498331a244d245fb321003ef110d
/usr/bin/git rev-list d8af29f8af7a498331a244d245fb321003ef110d # timeout=10
[Pipeline] End of Pipeline
io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: GET at: https://kubernetes.default/. Message: Unauthorized.
The same happens when I use apiserver.crt from Minikube.
When using ca.pem instead I get:
Caused by: java.security.cert.CertificateException: Unable to initialize, java.io.IOException: extra data given to DerValue constructor
at sun.security.x509.X509CertImpl.<init>(X509CertImpl.java:198)
at sun.security.provider.X509Factory.engineGenerateCertificate(X509Factory.java:102)
I can access the Kubernetes API from the Jenkins pod only when adding both apiserver.crt and apiserver.key to the secret. Executing
curl -k --cert apiserver.crt --key apiserver.key https://kubernetes.default/.
is successful then - but the Jenkins build is still failing.
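For reference, the secret was edited roughly like this (a sketch from memory; the key names are mine, not necessarily the exact ones):

kubectl edit secret jenkins-git-ssh
# add under data: (values base64-encoded):
#   ca.crt: <base64 of ~/minikube/ca.crt>
#   apiserver.crt: <base64 of apiserver.crt>
#   apiserver.key: <base64 of apiserver.key>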
So I'm a bit lost here. Does anybody have an idea how to continue?
Thanks and regards,
Daniel
We have a fix, but it's not released yet. Details can be found at https://github.com/fabric8io/fabric8/issues/6829#issuecomment-301467664, which also describes a workaround.
TL;DR: you can edit the jenkins service account and remove the following lines before restarting the jenkins master pod (see the command sketch after these lines):
-secrets:
-- name: "jenkins-git-ssh"
-- name: "jenkins-master-ssh"
-- name: "jenkins-release-gpg"
Hope that helps.
I'm trying to run Pig on Tez on Amazon EMR 4.5.0. The configuration works without Tez; I'm just trying to get it to work on Tez.
To create the cluster (from the command line), we're using (TEZ_VERSION is defined as 0.5.2):
--bootstrap-actions Path=s3://support.elasticmapreduce/tez/bigtop/install-tez.rb,Args=[-v,$TEZ_VERSION,--tez-site,tez.lib.uris=s3://support.elasticmapreduce/tez/$TEZ_VERSION/tez-$TEZ_VERSION-minimal.tar.gz]
Additionally, I'm overriding PIG_CLASSPATH:
--configurations file://pig_tez_classification.json
Containing:
[
  {
    "Classification": "hadoop-env",
    "Properties": {},
    "Configurations": [
      {
        "Classification": "export",
        "Properties": {
          "PIG_CLASSPATH": "/etc/tez/conf"
        },
        "Configurations": []
      }
    ]
  }
]
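For context, the bootstrap action and the classification file above end up in a single create-cluster call along these lines (a sketch; the cluster name, instance settings, and key name are placeholders):

aws emr create-cluster \
  --name "pig-on-tez" \
  --release-label emr-4.5.0 \
  --applications Name=Pig \
  --instance-type m3.xlarge --instance-count 3 \
  --ec2-attributes KeyName=my-key \
  --use-default-roles \
  --bootstrap-actions Path=s3://support.elasticmapreduce/tez/bigtop/install-tez.rb,Args=[-v,$TEZ_VERSION,--tez-site,tez.lib.uris=s3://support.elasticmapreduce/tez/$TEZ_VERSION/tez-$TEZ_VERSION-minimal.tar.gz] \
  --configurations file://pig_tez_classification.json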
The PIG_CLASSPATH is needed to prevent this error:
ERROR org.apache.pig.backend.hadoop.executionengine.tez.TezJob - Cannot submit DAG
org.apache.tez.dag.api.TezUncheckedException: Invalid configuration of tez jars, tez.lib.uris is not defined in the configuration
The tez.lib.uris override is needed to prevent this error:
ERROR org.apache.pig.backend.hadoop.executionengine.tez.TezJob (PigTezLauncher-0): Cannot submit DAG
java.io.FileNotFoundException: File does not exist: hdfs://ip-172-31-3-207.eu-west-1.compute.internal:8020/apps/tez/0.5.2/tez-0.5.2-minimal.tar.gz
It appears the install script writes the tar.gz file to the right location in HDFS, but when I log in over SSH afterwards, the file is not there. I think that in EMR-4 the bootstrap actions are run at a different time, i.e. before HDFS is available?
After all that, I still get this error:
WARN org.apache.pig.backend.hadoop.executionengine.tez.TezJob - Exception while gathering stats
java.lang.NullPointerException
at org.apache.pig.tools.pigstats.tez.TezDAGStats.accumulateStats(TezDAGStats.java:191)
at org.apache.pig.tools.pigstats.tez.TezPigScriptStats.accumulateStats(TezPigScriptStats.java:180)
at org.apache.pig.backend.hadoop.executionengine.tez.TezJob.run(TezJob.java:194)
at org.apache.pig.backend.hadoop.executionengine.tez.TezLauncher$1.run(TezLauncher.java:167)
Trying with tez version 0.8.2 yields:
ERROR org.apache.pig.backend.hadoop.executionengine.tez.TezJob - Cannot submit DAG
org.apache.tez.dag.api.SessionNotRunning: TezSession has already shutdown
This seems to be due to a mismatch in the Tez version being used, because it still prints out:
INFO org.apache.pig.tools.pigstats.tez.TezPigScriptStats - Script Statistics:
HadoopVersion: 2.7.2-amzn-0
PigVersion: 0.14.0-amzn-0
TezVersion: 0.5.2
UserId: hadoop
So, does anyone know how to get Pig on Tez on Amazon EMR (whatever versions) running?
I'm successfully running it with emr-4.4.0. However, I was unable to get Pig to work correctly with the tar.gz uploaded to HDFS. Instead, I had to unpack the tarball and upload all the individual files, then set tez.lib.uris to hdfs:///apps/tez-0.8.2,hdfs:///apps/tez-0.8.2/lib.
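In case it helps, the unpack-and-upload step looked roughly like this (a sketch; the local tarball path is an assumption):

# Unpack the Tez tarball locally, then upload the individual files to HDFS
mkdir tez-0.8.2
tar -xzf tez-0.8.2-minimal.tar.gz -C tez-0.8.2
hdfs dfs -mkdir -p /apps/tez-0.8.2
hdfs dfs -put tez-0.8.2/* /apps/tez-0.8.2/

with tez.lib.uris then pointing at hdfs:///apps/tez-0.8.2,hdfs:///apps/tez-0.8.2/lib as noted above.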