Random "/usr/bin/env: 'python3.6': No such file or directory" when working with dask-yarn - hadoop-yarn

I am using dask-yarn in local mode in a mapr-cluster. I have unpacked the virtual environment in a shared folder between the nodes.
Some times the workers ( containers ) start properly in the cluster, but sometimes the containers have the next error message in yarn.
/usr/bin/env: 'python3.6': No such file or directory
In the meantime, I see a lot of containers with status FAILED ( > 1000 ). My initial provision is around 5 workers however I have to wait around 10 minutes or more until I get the initial provision.
The next is my /etc/dask/yarn.yaml configuration
yarn:
specification: null
name: dask
queue: default
deploy-mode: local
environment: "venv://<shared_location>"
tags: []
user: ''
host: "host_name"
port: 8788
dashboard-address: ":17439"
scheduler:
vcores: 1
memory: 2GiB
worker:
vcores: 1
memory: 2GiB
restarts: -1
env: {'SOME_VAR':'some_value'}

Reason for the problem: Some of the nodes didnt have the same python version and in the same location. Since I am using a virtual environment. The virtual environment expected to have python in the same location in all Nodes

Related

Trivy on EKS unable to scan any images

I am trying to scan all images deployed on my EKS cluster I am setting up for high security (will be deployed to classified IL5 environment). Kubernetes v1.23, all worker nodes run on Bottlerocket OS.
I expect images to be scanned and available in the VulnerabilityReports CRD.
I was able to successfully install Falco to the cluster (uses containerd). However, when deploying the official Helm chart (0.6.0-rc3) the scan-vulnerability containers start and then immediately error out. I set this environment variable on the trivy-operator deployment:
- name: CONTAINER_RUNTIME_ENDPOINT
value: /run/containerd/containerd.sock
Output of run with -debug:
{
"level": "error",
"ts": 1668286646.865245,
"logger": "reconciler.vulnerabilityreport",
"msg": "Scan job container",
"job": "trivy-system/scan-vulnerabilityreport-74f54b6cd",
"container": "discovery",
"status.reason": "Error",
"status.message": "2022-11-12T20:57:13.674Z\t\u001b[31mFATAL\u001b[0m\timage scan error: scan error: unable to initialize a scanner: unable to initialize a docker scanner: 4 errors occurred:\n\t* unable to inspect the image (023620263533.dkr.ecr.us-gov-east-1.amazonaws.com/docker.io/istio/pilot:1.15.2): Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?\n\t* unable to initialize Podman client: no podman socket found: stat podman/podman.sock: no such file or directory\n\t* containerd socket not found: /run/containerd/containerd.sock\n\t* GET https://023620263533.dkr.ecr.us-gov-east-1.amazonaws.com/v2/docker.io/istio/pilot/manifests/1.15.2: unexpected status code 401 Unauthorized: Not Authorized\n\n\n\n",
"stacktrace": "github.com/aquasecurity/trivy-operator/pkg/vulnerabilityreport.(*WorkloadController).processFailedScanJob\n\t/home/runner/work/trivy-operator/trivy-operator/pkg/vulnerabilityreport/controller.go:551\ngithub.com/aquasecurity/trivy-operator/pkg/vulnerabilityreport.(*WorkloadController).reconcileJobs.func1\n\t/home/runner/work/trivy-operator/trivy-operator/pkg/vulnerabilityreport/controller.go:376\nsigs.k8s.io/controller-runtime/pkg/reconcile.Func.Reconcile\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime#v0.13.1/pkg/reconcile/reconcile.go:102\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime#v0.13.1/pkg/internal/controller/controller.go:121\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime#v0.13.1/pkg/internal/controller/controller.go:320\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime#v0.13.1/pkg/internal/controller/controller.go:273\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime#v0.13.1/pkg/internal/controller/controller.go:234"
}
I confirmed that bottlerocket uses containerd, as /run/containerd/containerd.sock is specified on my Falco deployment. Even when I mount this as volume onto the pod and set the CONTAINER_RUNTIME_ENDPOINT to this path I get the same error.
Edit
I added the following security context:
seLinuxOptions:
user: system_u
role: system_r
type: control_t
level: s0-s0:c0.c1023
Initially I mounted the dockershim.sock from the host to the pod, then realized that was not necessary, the error messages were a little misleading, it was really an authentication with ECR issue. Furthermore, the seLinux flags needed to be specified at the pod level, and not the container level.

Filebeat does not complete on close_eof + --once

Using filebeat 7.5.2:
I'm using a filebeat configuration with close_eof enabled and I run filebeat with the flag --once. I can see the harvester reaching eof but the filebeat keeps going.
Flebeat conf:
filebeat.inputs:
- type: log
close_eof: true
enabled: true
paths:
- "${LOGS_PATH}"
scan_frequency: 1s
fields: {
machine: "${HOST}"
}
output.logstash:
hosts: ["192.168.41.6:5044"]
bulk_max_size: 1024
timeout: 30s
pipelining: 1
workers: 1
And I run it using:
filebeat run --once -v -c "PATH TO CONF..."
And some logs from the filebeat instance:
...
2020-02-04T18:30:16.950Z INFO instance/beat.go:297 Setup Beat: filebeat; Version: 7.5.2
2020-02-04T18:30:17.059Z INFO [publisher] pipeline/module.go:97 Beat name: logstash
2020-02-04T18:30:17.167Z WARN beater/filebeat.go:152 Filebeat is unable to load the Ingest Node pipelines for the configured modules because the Elasticsearch out
put is not configured/enabled. If you have already loaded the Ingest Node pipelines or are using Logstash pipelines, you can ignore this warning.
2020-02-04T18:30:17.168Z INFO instance/beat.go:429 filebeat start running.
2020-02-04T18:30:17.168Z INFO [monitoring] log/log.go:118 Starting metrics logging every 30s
2020-02-04T18:30:17.168Z INFO registrar/migrate.go:104 No registry home found. Create: /tmp/tmp.BXJtfiaEzb/data/registry/filebeat
2020-02-04T18:30:17.179Z INFO registrar/migrate.go:112 Initialize registry meta file
2020-02-04T18:30:17.192Z INFO registrar/registrar.go:108 No registry file found under: /tmp/tmp.BXJtfiaEzb/data/registry/filebeat/data.json. Creating a new re
gistry file.
2020-02-04T18:30:17.193Z INFO registrar/registrar.go:145 Loading registrar data from /tmp/tmp.BXJtfiaEzb/data/registry/filebeat/data.json
2020-02-04T18:30:17.193Z INFO registrar/registrar.go:152 States Loaded from registrar: 0
2020-02-04T18:30:17.193Z WARN beater/filebeat.go:368 Filebeat is unable to load the Ingest Node pipelines for the configured modules because the Elasticsearch out
put is not configured/enabled. If you have already loaded the Ingest Node pipelines or are using Logstash pipelines, you can ignore this warning.
2020-02-04T18:30:17.193Z INFO crawler/crawler.go:72 Loading Inputs: 1
2020-02-04T18:30:17.194Z INFO log/input.go:152 Configured paths: [/tmp/tmp.BXJtfiaEzb/*.log]
2020-02-04T18:30:17.206Z INFO input/input.go:114 Starting input of type: log; ID: 13918413832820009056
2020-02-04T18:30:17.225Z INFO input/input.go:167 Stopping Input: 13918413832820009056
2020-02-04T18:30:17.225Z INFO crawler/crawler.go:106 Loading and starting Inputs completed. Enabled inputs: 1
2020-02-04T18:30:17.225Z INFO log/harvester.go:251 Harvester started for file: /tmp/tmp.BXJtfiaEzb/dcbgw-20200124080032_darkblue.log
2020-02-04T18:30:17.231Z INFO beater/filebeat.go:384 Running filebeat once. Waiting for completion ...
2020-02-04T18:30:17.231Z INFO beater/filebeat.go:386 All data collection completed. Shutting down.
2020-02-04T18:30:17.231Z INFO crawler/crawler.go:139 Stopping Crawler
2020-02-04T18:30:17.231Z INFO crawler/crawler.go:149 Stopping 1 inputs
2020-02-04T18:30:17.258Z INFO pipeline/output.go:95 Connecting to backoff(async(tcp://192.168.41.6:5044))
2020-02-04T18:30:17.296Z INFO pipeline/output.go:105 Connection to backoff(async(tcp://192.168.41.6:5044)) established
... Only metrics here ...
2020-02-04T18:35:55.686Z INFO log/harvester.go:274 End of file reached: /tmp/tmp.BXJtfiaEzb/dcbgw-20200124080032_darkblue.log. Closing because close_eof is enabled.
2020-02-04T18:35:55.686Z INFO crawler/crawler.go:165 Crawler stopped
... MORE METRICS ...
2020-02-04T18:36:26.609Z ERROR logstash/async.go:256 Failed to publish events caused by: read tcp 192.168.41.6:49662->192.168.41.6:5044: i/o timeout
2020-02-04T18:36:26.621Z ERROR logstash/async.go:256 Failed to publish events caused by: client is not connected
2020-02-04T18:36:28.520Z ERROR pipeline/output.go:121 Failed to publish events: client is not connected
2020-02-04T18:36:28.520Z INFO pipeline/output.go:95 Connecting to backoff(async(tcp://192.168.41.6:5044))
2020-02-04T18:36:28.521Z INFO pipeline/output.go:105 Connection to backoff(async(tcp://192.168.41.6:5044)) established
... MORE METRICS ...
From this I'm outputing this to Logstash 7.5.2 running in the same Ubuntu 18 VM. Running Logstash with log level trace does not output any error.

Making Dockerized Flask server concurrent

I have a Flask server that I'm running on AWS Fargate. My task has 2 vCPUs and 8 GB of memory. My server is only able to respond to one request at a time. If I run 2 API requests at the same, each that takes 7 seconds, the first request will take 7 seconds to return and the second will take 14 seconds to return.
This is my Docker file (using this repo):
FROM tiangolo/uwsgi-nginx-flask:python3.7
COPY ./requirements.txt requirements.txt
RUN pip3 install --no-cache-dir -r requirements.txt
RUN python3 -m spacy download en
RUN apt-get update
RUN apt-get install wkhtmltopdf -y
RUN apt-get install poppler-utils -y
RUN apt-get install xvfb -y
COPY ./ /app
I have the following config file:
[uwsgi]
module = main
callable = app
enable-threads = true
These are my logs when I start the server:
Checking for script in /app/prestart.sh
Running script /app/prestart.sh
Running inside /app/prestart.sh, you could add migrations to this file, e.g.:
#! /usr/bin/env bash
# Let the DB start
sleep 10;
# Run migrations
alembic upgrade head
/usr/lib/python2.7/dist-packages/supervisor/options.py:298: UserWarning: Supervisord is running as root and it is searching for its configuration file in default locations (including its current working directory); you probably want to specify a "-c" argument specifying an absolute path to a configuration file for improved security.
'Supervisord is running as root and it is searching '
2019-10-05 06:29:53,438 CRIT Supervisor running as root (no user in config file)
2019-10-05 06:29:53,438 INFO Included extra file "/etc/supervisor/conf.d/supervisord.conf" during parsing
2019-10-05 06:29:53,446 INFO RPC interface 'supervisor' initialized
2019-10-05 06:29:53,446 CRIT Server 'unix_http_server' running without any HTTP authentication checking
2019-10-05 06:29:53,446 INFO supervisord started with pid 1
2019-10-05 06:29:54,448 INFO spawned: 'nginx' with pid 9
2019-10-05 06:29:54,450 INFO spawned: 'uwsgi' with pid 10
[uWSGI] getting INI configuration from /app/uwsgi.ini
[uWSGI] getting INI configuration from /etc/uwsgi/uwsgi.ini
;uWSGI instance configuration
[uwsgi]
cheaper = 2
processes = 16
ini = /app/uwsgi.ini
module = main
callable = app
enable-threads = true
ini = /etc/uwsgi/uwsgi.ini
socket = /tmp/uwsgi.sock
chown-socket = nginx:nginx
chmod-socket = 664
hook-master-start = unix_signal:15 gracefully_kill_them_all
need-app = true
die-on-term = true
show-config = true
;end of configuration
*** Starting uWSGI 2.0.18 (64bit) on [Sat Oct 5 06:29:54 2019] ***
compiled with version: 6.3.0 20170516 on 09 August 2019 03:11:53
os: Linux-4.14.138-114.102.amzn2.x86_64 #1 SMP Thu Aug 15 15:29:58 UTC 2019
nodename: ip-10-0-1-217.ec2.internal
machine: x86_64
clock source: unix
pcre jit disabled
detected number of CPU cores: 2
current working directory: /app
detected binary path: /usr/local/bin/uwsgi
your memory page size is 4096 bytes
detected max file descriptor number: 1024
lock engine: pthread robust mutexes
thunder lock: disabled (you can enable it with --thunder-lock)
uwsgi socket 0 bound to UNIX address /tmp/uwsgi.sock fd 3
uWSGI running as root, you can use --uid/--gid/--chroot options
*** WARNING: you are running uWSGI as root !!! (use the --uid flag) ***
Python version: 3.7.4 (default, Jul 13 2019, 14:20:24) [GCC 6.3.0 20170516]
Python main interpreter initialized at 0x55e1e2b181a0
uWSGI running as root, you can use --uid/--gid/--chroot options
*** WARNING: you are running uWSGI as root !!! (use the --uid flag) ***
python threads support enabled
your server socket listen backlog is limited to 100 connections
your mercy for graceful operations on workers is 60 seconds
mapped 1239640 bytes (1210 KB) for 16 cores
*** Operational MODE: preforking ***
2019-10-05 06:29:55,483 INFO success: nginx entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2019-10-05 06:29:55,484 INFO success: uwsgi entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)

Filebeat not starting TCP server (input)

So I have configured filebeat to accept input via TCP. This is filebeat.yml file.
filebeat.inputs:
- type: tcp
host: ["localhost:9000"]
max_message_size: 20MiB
For some reason filebeat does not start the TCP server at port 9000. I have verified this using wireshark. Wireshark shows nothing at port 9000.
This is output of command "filebeat -e -d "*"" run on terminal
2019-08-14T09:12:40.745-0600 INFO instance/beat.go:468 Home path: [/usr/local/Cellar/filebeat/6.2.4] Config path: [/usr/local/etc/filebeat] Data path: [/usr/local/var/lib/filebeat] Logs path: [/usr/local/var/log/filebeat]
2019-08-14T09:12:40.745-0600 DEBUG [beat] instance/beat.go:495 Beat metadata path: /usr/local/var/lib/filebeat/meta.json
2019-08-14T09:12:40.745-0600 INFO instance/beat.go:475 Beat UUID: 764da0fd-ea93-4777-b1ea-63149be0d6b6
2019-08-14T09:12:40.745-0600 INFO instance/beat.go:213 Setup Beat: filebeat; Version: 6.2.4
2019-08-14T09:12:40.745-0600 DEBUG [beat] instance/beat.go:230 Initializing output plugins
2019-08-14T09:12:40.745-0600 DEBUG [processors] processors/processor.go:49 Processors:
2019-08-14T09:12:40.745-0600 INFO pipeline/module.go:76 Beat name: Ad-MBP.domain
2019-08-14T09:12:40.745-0600 ERROR fileset/modules.go:95 Not loading modules. Module directory not found: /usr/local/Cellar/filebeat/6.2.4/module
2019-08-14T09:12:40.745-0600 INFO [monitoring] log/log.go:97 Starting metrics logging every 30s
2019-08-14T09:12:40.745-0600 INFO instance/beat.go:301 filebeat start running.
2019-08-14T09:12:40.745-0600 DEBUG [registrar] registrar/registrar.go:90 Registry file set to: /usr/local/var/lib/filebeat/registry
2019-08-14T09:12:40.746-0600 INFO registrar/registrar.go:110 Loading registrar data from /usr/local/var/lib/filebeat/registry
2019-08-14T09:12:40.746-0600 INFO registrar/registrar.go:121 States Loaded from registrar: 0
2019-08-14T09:12:40.746-0600 WARN beater/filebeat.go:261 Filebeat is unable to load the Ingest Node pipelines for the configured modules because the Elasticsearch output is not configured/enabled. If you have already loaded the Ingest Node pipelines or are using Logstash pipelines, you can ignore this warning.
2019-08-14T09:12:40.746-0600 INFO crawler/crawler.go:48 Loading Prospectors: 1
2019-08-14T09:12:40.746-0600 DEBUG [registrar] registrar/registrar.go:152 Starting Registrar
2019-08-14T09:12:40.746-0600 DEBUG [cfgfile] cfgfile/reload.go:95 Checking module configs from: /usr/local/etc/filebeat/modules.d/*.yml
2019-08-14T09:12:40.746-0600 DEBUG [cfgfile] cfgfile/reload.go:109 Number of module configs found: 0
2019-08-14T09:12:40.746-0600 INFO crawler/crawler.go:82 Loading and starting Prospectors completed. Enabled prospectors: 0
2019-08-14T09:12:40.746-0600 INFO cfgfile/reload.go:127 Config reloader started
2019-08-14T09:12:40.748-0600 DEBUG [cfgfile] cfgfile/reload.go:151 Scan for new config files
2019-08-14T09:12:40.748-0600 DEBUG [cfgfile] cfgfile/reload.go:170 Number of module configs found: 0
2019-08-14T09:12:40.748-0600 INFO cfgfile/reload.go:219 Loading of config files completed.
I am not sure what I am doing wrong..
I believe filebeat inputs are only available from filebeat 6.3+, anything older used filebeat prospectors.
6.3 TCP input documentation, nothing available for 6.2 or older as it uses prospectors:
https://www.elastic.co/guide/en/beats/filebeat/6.3/filebeat-input-tcp.html
Your logs show that you are on filebeat version 6.24, could you try out your configuration with 6.3+?

Minishift: Problems creating virtual machine

my question about the installation of openshift environment using minishift on virtual box.
minishift v1.4.1+0f658ea
VirtualBox-5.1.26-117224-Win.exe
The installation is incomplete due to the folowing error:-
C:\Users\xyzdgs\Desktop\Openshift_n_Docker\OpenShift Developer>minishift.exe start --vm-driver=C:\Program Files\Oracle\VirtualBox\VBoxSVC.exe
-- Starting local OpenShift cluster using 'C:\Program' hypervisor ...
-- Minishift VM will be configured with ...
Memory: 2 GB
vCPUs : 2
Disk size: 20 GB
Downloading ISO 'https://github.com/minishift/minishift-b2d-iso/releases/download/v1.1.0/minishift-b2d.iso'
40.00 MiB / 40.00 MiB [===========================================] 100.00% 0s
-- Starting Minishift VM ... | Unsupported driver: C:\Program
So, to solve this I simply put the directory where all drivers are located in the installation and run it again
C:\Users\xyzdgs\Desktop\Openshift_n_Docker\OpenShift Developer>minishift.exe start --vm-driver=C:\Program Files\Oracle\VirtualBox\
-- Starting local OpenShift cluster using 'C:\Program' hypervisor ...
-- Starting Minishift VM ... / FAIL E0825 11:20:43.830638 1260 start.go:342]
Error starting the VM: Error getting the state for host: machine does not exist.
Retrying.
| FAIL E0825 11:20:44.297638 1260 start.go:342] Error starting the VM: Error getting the state for host: machine does not exist. Retrying.
/ FAIL E0825 11:20:44.612638 1260 start.go:342] Error starting the VM: Error getting the state for host: . Retrying.
Error starting the VM: Error getting the state for host: machine does not exist
Error getting the state for host: machine does not exist
Error getting the state for host: machine does not exist
It says "machine does not exist", shouldn't the machine be created by minishift itself (see te procedure here: blog.novatec-gmbh.de/getting-started-minishift-openshift-origin-one-vm/)
Not sure what is causing this. Please guide.
The main issue with the command -- and what it's really complaining about -- is that you're passing in an unquoted path:
minishift.exe start --vm-driver=C:\Program Files\Oracle\VirtualBox\VBoxSVC.exe
should have been
minishift.exe start --vm-driver="C:\Program Files\Oracle\VirtualBox\VBoxSVC.exe"
But according to the MiniShift documentation, you should update to VirtualBox 5.1.12+ (which you have) and use the following syntax:
minishift.exe start --vm-driver=virtualbox
7 months after this question was asked and using VirtualBox v4.3.30, I can get MiniShift v1.15.1 running with the last command, but can't get it to accept your previous syntax or even produce the same error from it.