Is it possible to have mamba aggregate a list of yaml files during the create command? - mamba

I have a process where envs are created and then multiple env files are used to update the env via mamba env update.
The problem is that I'm running commands in parallel, and the lockfile only appears to work with mamba env create, not with mamba env update: https://github.com/mamba-org/mamba/issues/1582
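For reference, the multi-step workflow in question looks roughly like this (environment and file names are illustrative):
# create once, then apply each env file with a separate update call
mamba create --yes --quiet --name=myenv python=3.6
mamba env update --name=myenv --file=deps1.yml
mamba env update --name=myenv --file=deps2.yml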
Without me having to merge all the env files myself, is there a way to do something like this:
mamba create --yes --quiet --name=myenv python=3.6 --file=deps1.yml --file=deps2.yml

Related

Do I need to set AWS config in Docker Compose as a volume?

In my project I need to configure an AWS bucket download, because it always gets a read timeout or connection error when downloading a fairly large file in my deployment. I have a .aws/config in my root directory, and in my Dockerfile I use "ADD . .", which adds all the files in the project. To build the image I use Docker Compose. However, for some reason it is not using the AWS config values. Is there a way to pass these values to Docker so that they are actually used?
This is my "config" file, which is in ".aws" in the root of the project:
[default]
read_timeout = 1200
connect_timeout = 1200
http_socket_timeout = 1200
s3 =
  max_concurrent_requests = 2
  multipart_threshold = 8MB
  multipart_chunksize = 8MB
My Dockerfile looks like this:
FROM python:3.7.7-stretch AS BASE
RUN apt-get update \
    && apt-get --assume-yes --no-install-recommends install \
        build-essential \
        curl \
        git \
        jq \
        libgomp1 \
        vim
WORKDIR /app
# upgrade pip version
RUN pip install --no-cache-dir --upgrade pip
RUN pip3 install boto3
ADD . .
I expected that through the "ADD . ." boto3 would use the config file. But that is unfortunately not the case.
Perhaps this would answer your question as to why the ADD command didn't work:
Hidden file .env not copied using Docker COPY
Instead of relying on the local config settings of the machine where the Docker image is built, you might want to put the configuration in as an explicit file in your repo, copy it over to ~/.aws/config (or anywhere in the container) and reference it by pointing AWS_CONFIG_FILE at its path; or use any one of the methods defined in the AWS documentation below:
https://boto3.amazonaws.com/v1/documentation/api/latest/guide/configuration.html
wherein you can define the configuration as part of your Python code or declare the values as environment variables.
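Since "ADD . ." already places the repo's .aws/config at /app/.aws/config (WORKDIR is /app, assuming the file isn't excluded by .dockerignore), one hedged option is to point boto3 at it explicitly via AWS_CONFIG_FILE; the image name below is an assumption:
# run the image with the config path exported; Compose users can set the same variable under "environment:"
docker run -e AWS_CONFIG_FILE=/app/.aws/config my-image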

Why can't Poetry load my environment variables?

I'm trying to get Poetry to load env vars from a file when the .venv is activated. I can't use the plugin to load a .env because I'm not allowed to use Poetry 1.2.
I've got my env vars in a file called creds.sh.
I've edited both my .venv\Scripts\activate and .venv\Scripts\activate.bat to source from that file:
set -a
. creds.sh
set +a
and,
for /f "delims== tokens=1,2" %%G in (%~dp0..\..\creds.sh) do set %%G=%%H
respectively.
When I open up a new bash, and \activate, it works. I can env | grep <my var> and find my vars.
When I open up a new cmd, and \activate.bat, it works. I can again env | grep <my var> and find my vars.
When I poetry shell or poetry run my-script, it can't find my vars! What the heck is going on?
One thought is that Poetry is somehow opening up some other kind of shell, so I've tried to look for the $SHELL var inside my .venv, but it doesn't seem to exist.
Any ideas? How else might I inject env vars into the Poetry env when the env is activated?
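One workaround sketch, not from the thread: source creds.sh in the shell that launches Poetry, so the variables are already present in the environment that poetry run or poetry shell inherits (this assumes creds.sh sits in the project root):
# export every variable defined in creds.sh into the current shell, then hand off to poetry
set -a
. ./creds.sh
set +a
poetry run my-script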

How to run a Python project (package) on AWS EMR serverless

I have a Python project with several modules, classes, and a dependencies file (a requirements.txt). I want to pack it into one file with all the dependencies and give the file path to AWS EMR Serverless, which will run it.
The problem is that I don't understand how to pack a Python project with all its dependencies, which file EMR can consume, etc. All the examples I have found use a single Python file.
In simple words, what should I do if my Python project is not a single file but is more complex?
Can anyone help with some details?
There are a few ways to do this with EMR Serverless. Regardless of which way you choose, you will need to provide a main entrypoint Python script to the EMR Serverless StartJobRun command.
Let's assume you've got a job structure like this where main.py is your entrypoint that creates a Spark session and runs your jobs and job1 and job2 are your local modules.
├── jobs
│   └── job1.py
│   └── job2.py
├── main.py
├── requirements.txt
Option 1. Use --py-files with your zipped local modules and --archives with a packaged virtual environment for your external dependencies
Zip up your job files
zip -r job_files.zip jobs
Create a virtual environment using venv-pack with your dependencies.
Note: This has to be done with a similar OS and Python version as EMR Serverless, so I prefer using a multi-stage Dockerfile with custom outputs.
FROM --platform=linux/amd64 amazonlinux:2 AS base
RUN yum install -y python3
ENV VIRTUAL_ENV=/opt/venv
RUN python3 -m venv $VIRTUAL_ENV
ENV PATH="$VIRTUAL_ENV/bin:$PATH"
# copy the dependency list into the image so pip can read it
COPY requirements.txt .
RUN python3 -m pip install --upgrade pip && \
    python3 -m pip install venv-pack==0.2.0 && \
    python3 -m pip install -r requirements.txt
RUN mkdir /output && venv-pack -o /output/pyspark_deps.tar.gz
FROM scratch AS export
COPY --from=base /output/pyspark_deps.tar.gz /
If you run DOCKER_BUILDKIT=1 docker build --output . ., you should now have a pyspark_deps.tar.gz file on your local system.
Upload main.py, job_files.zip, and pyspark_deps.tar.gz to a location on S3.
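The upload itself might look something like this (the bucket name is a placeholder, and the key names match the entryPoint and --py-files arguments used below):
aws s3 cp main.py s3://<YOUR_BUCKET>/main.py
aws s3 cp job_files.zip s3://<YOUR_BUCKET>/job_files.zip
aws s3 cp pyspark_deps.tar.gz s3://<YOUR_BUCKET>/pyspark_deps.tar.gz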
Run your EMR Serverless job with a command like this (replacing APPLICATION_ID, JOB_ROLE_ARN, and YOUR_BUCKET):
aws emr-serverless start-job-run \
    --application-id $APPLICATION_ID \
    --execution-role-arn $JOB_ROLE_ARN \
    --job-driver '{
        "sparkSubmit": {
            "entryPoint": "s3://<YOUR_BUCKET>/main.py",
            "sparkSubmitParameters": "--py-files s3://<YOUR_BUCKET>/job_files.zip --conf spark.archives=s3://<YOUR_BUCKET>/pyspark_deps.tar.gz#environment --conf spark.emr-serverless.driverEnv.PYSPARK_DRIVER_PYTHON=./environment/bin/python --conf spark.emr-serverless.driverEnv.PYSPARK_PYTHON=./environment/bin/python --conf spark.executorEnv.PYSPARK_PYTHON=./environment/bin/python"
        }
    }'
Option 2. Package your local modules as a Python library and use --archives with a packaged virtual environment
This is probably the most reliable way, but it will require you to use setuptools. You can use a simple pyproject.toml file along with your existing requirements.txt:
[project]
name = "mysparkjobs"
version = "0.0.1"
dynamic = ["dependencies"]
[tool.setuptools.dynamic]
dependencies = {file = ["requirements.txt"]}
You then can use a multi-stage Dockerfile and custom build outputs to package your modules and dependencies into a virtual environment.
Note: This requires you to enable Docker Buildkit
FROM --platform=linux/amd64 amazonlinux:2 AS base
RUN yum install -y python3
ENV VIRTUAL_ENV=/opt/venv
RUN python3 -m venv $VIRTUAL_ENV
ENV PATH="$VIRTUAL_ENV/bin:$PATH"
WORKDIR /app
COPY . .
RUN python3 -m pip install --upgrade pip && \
    python3 -m pip install venv-pack==0.2.0 && \
    python3 -m pip install .
RUN mkdir /output && venv-pack -o /output/pyspark_deps.tar.gz
FROM scratch AS export
COPY --from=base /output/pyspark_deps.tar.gz /
Now you can run DOCKER_BUILDKIT=1 docker build --output . . and a pyspark_deps.tar.gz file will be generated with all your dependencies. Upload this file and your main.py script to S3.
Assuming you uploaded both files to s3://<YOUR_BUCKET>/code/pyspark/myjob/, run the EMR Serverless job like this (replacing APPLICATION_ID, JOB_ROLE_ARN, and YOUR_BUCKET):
aws emr-serverless start-job-run \
    --application-id <APPLICATION_ID> \
    --execution-role-arn <JOB_ROLE_ARN> \
    --job-driver '{
        "sparkSubmit": {
            "entryPoint": "s3://<YOUR_BUCKET>/code/pyspark/myjob/main.py",
            "sparkSubmitParameters": "--conf spark.archives=s3://<YOUR_BUCKET>/code/pyspark/myjob/pyspark_deps.tar.gz#environment --conf spark.emr-serverless.driverEnv.PYSPARK_DRIVER_PYTHON=./environment/bin/python --conf spark.emr-serverless.driverEnv.PYSPARK_PYTHON=./environment/bin/python --conf spark.executorEnv.PYSPARK_PYTHON=./environment/bin/python"
        }
    }'
Note the additional sparkSubmitParameters that specify your dependencies and configure the driver and executor environment variables for the proper paths to python.

How to dynamically pass a parameter to an RPM during installation

We need to dynamically pass a variable during RPM installation and capture it in the spec file to trigger a script in %post.
Following is the command:
RPM Install Command
sudo rpm -Uvh --force abc.noarch.rpm --define '_ip 10.1.2.4' --define 'version 3'
abc.spec
Name: abc
Version: 1
Release: 1.0
Summary: Test
%{!?_ip: %define _ip 0.0.0.0 }
%{!?_version: %define _version 0 }
%post
echo "ip:::: %{_ip}"
echo "VESION:::: %{_version}"
So when I run the RPM with the above command, I get the following output:
[root@test solution]$ sudo rpm -Uvh --force abc.noarch.rpm --define '_ip 10.1.2.4' --define 'version 3'
Preparing... ################################# [100%]
Updating / installing...
1:abc ################################# [ 50%]
ip:::: 0.0.0.0
VESION:::: 0
Though I pass a different value in the CLI command, I still see that the argument I pass is not being captured in the spec file.
I need input on how to capture the values which I'm passing on the CLI.
The option --define defines a macro. Macros are evaluated when building an RPM from a SRC.RPM using rpmbuild. The binary package (no matter whether arch or noarch) has every macro already expanded, even %{_bindir} etc.
The RPM ecosystem was designed as non-interactive. This is a big difference from the DEB ecosystem, where questions can be raised using debconf.
You cannot work around it. You cannot even ask by directly reading STDIN, as rpm closes this descriptor before executing scriptlets.
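One way to see this for yourself is to dump the scriptlets stored in the built package; whatever you pass to rpm -Uvh, the %post body already contains the values that were expanded at build time:
# query the scriptlets of the package file itself (-p); the macros are already expanded
rpm -qp --scripts abc.noarch.rpm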
The best practice is to use configuration files, e.g. /etc/abc/ip.conf, and:
either instruct the user to manually (or using Ansible) alter that file and store the correct data there,
or do NOT distribute /etc/abc/ip.conf in the main abc package and instead require abc-config, and then create one or more config packages along these lines:
Name: abc-testing-config
Provides: abc-config
...
%files
/etc/abc/ip.conf
And you then instruct users to install abc abc-testing-config. Or it can be abc abc-EMEA-config, etc.
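As a hypothetical illustration of the config-file route, the value that used to be passed with --define is instead written to /etc/abc/ip.conf by the admin (or shipped by one of the config packages) and read by the application at runtime:
# install the package, then record the site-specific value in its config file
sudo rpm -Uvh abc.noarch.rpm
echo "ip=10.1.2.4" | sudo tee /etc/abc/ip.conf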

Singularity container from conda environment

I want to build a container from my conda environment following this post. However, I get the following error: '/bin/sh: 1: cannot create ~/.bashrc: Directory nonexistent'. I am using a Vagrant VM to build my image and would be grateful for any help.
Editing the .bashrc, aside from failing, will not be helpful, as the shell loaded by Singularity is explicitly started with --norc. You want to use the $SINGULARITY_ENVIRONMENT variable in %post to have the values available.
Something along these lines:
%post
    # You may need to install some pre-reqs your host system has installed outside of conda, e.g.
    # apt update && apt install -y build-essential make zlib
    ENV_NAME=$(head -1 environment.yml | cut -d' ' -f2)
    echo ". /opt/conda/etc/profile.d/conda.sh" >> $SINGULARITY_ENVIRONMENT
    echo "conda activate $ENV_NAME" >> $SINGULARITY_ENVIRONMENT
    . /opt/conda/etc/profile.d/conda.sh
    conda env create -f environment.yml -p /opt/conda/envs/$ENV_NAME
I listed a few libraries that you probably have installed on your current machine but that might not be installed in the slim Docker image. You can install them via apt or conda, depending on your preference. If missing prerequisites do bite, the specifics will depend on your environment.yml and host OS, so you'll have to iterate until the build succeeds.
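For completeness, a build-and-check sketch from inside the Vagrant VM could look like this; the definition file name myenv.def is an assumption:
sudo singularity build myenv.sif myenv.def
# if the activation lines written to $SINGULARITY_ENVIRONMENT took effect, python resolves inside the conda env
singularity exec myenv.sif which python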