How do I download one or more files from a stopped Fargate task? - aws-fargate

I have an ECS task that runs some test cases. I have it running in Fargate. Yay!
Now I want to download the test results file(s) from the container. I have the task and container IDs handy. I can find the exit code with
aws ecs describe-tasks --cluster Fargate --tasks <my-task-id>
How do I download the log and/or files produced?

It looks like, as of right now, the only way to get test results off of my server is to send the results to S3 before the container shuts down.
According to this thread, there was no way to mount a volume / EFS onto a Fargate container at the time of writing.
Here's my bash script for running my tests (in build.sh) and then uploading the results to S3:
#!/bin/bash
echo Running tests...
pushd ~circleci/project/
export AWS_ACCESS_KEY_ID=$AWS_ACCESS_KEY
export AWS_SECRET_ACCESS_KEY=$AWS_SECRET_KEY
commandToRun="~/project/.circleci/build_scripts/build.sh";
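# Run the command, mirroring its output to a log file
eval $commandToRun 2>&1 | tee /tmp/build.log
# Get the exit code of the command itself, not of tee
exitCode=${PIPESTATUS[0]}
# Upload the log regardless of the test result
aws s3 cp /tmp/build.log "s3://$CICD_BUCKET/build.log" \
--storage-class REDUCED_REDUNDANCY \
--region us-east-1
exit ${exitCode}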
Of course, you'll have to pass in the AWS_ACCESS_KEY, AWS_SECRET_KEY and CICD_BUCKET environment variables. The bucket name you choose needs to be pre-created, but any directory structure below it does NOT need to be created in advance.
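One detail worth keeping in mind with this pattern: after a pipeline, `$?` reports the status of the last command in it (here `tee`), not of the tests. A minimal sketch, assuming bash, of a helper that logs through `tee` but still returns the command's own status. The name `run_and_log` is hypothetical and not part of any AWS tooling:

```shell
#!/bin/bash
# Sketch only: run_and_log is a hypothetical helper name.
# Runs a command, mirrors its combined output to a log file, and
# returns the command's own exit status rather than tee's.
run_and_log() {
  local log_file="$1"
  shift
  "$@" 2>&1 | tee "$log_file"
  return "${PIPESTATUS[0]}"
}

# Possible usage (commented out so the sketch has no side effects):
#   run_and_log /tmp/build.log ~/project/.circleci/build_scripts/build.sh
#   exitCode=$?
#   aws s3 cp /tmp/build.log "s3://$CICD_BUCKET/build.log"
#   exit "$exitCode"
```

Saving the status before the upload means the container still exits with the test result even if the copy to S3 succeeds or fails independently.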

You probably want to look at using CodeBuild for this use case, which can automatically copy artifacts to S3.
It's actually quite easy to orchestrate the following using a simple bash script and the AWS CLI:
Idempotently create or update a CodeBuild project (using a simple CloudFormation template that you define in your source repository)
Run a CodeBuild job that executes a given revision of your source repository (again using a self-defining buildspec.yml specification defined in your source repository)
Attach to the CloudWatch Logs log group for your CodeBuild job and stream the log output
Finally, detect whether the job has completed successfully and then download any artifacts locally from S3
I use this approach to run builds in CodeBuild, with Bamboo as the overarching continuous delivery system.
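The steps above could be sketched roughly as follows. This is an illustration under assumptions, not the author's actual script: `STACK_NAME`, `TEMPLATE_FILE`, `PROJECT_NAME` and `ARTIFACT_URI` are hypothetical placeholders, the template is assumed to define the CodeBuild project, and log streaming is left out for brevity.

```shell
#!/bin/bash
# Sketch only: all variable and function names here are hypothetical.

# Polls a build until it leaves IN_PROGRESS, prints the final status,
# and returns success only when that status is SUCCEEDED.
wait_for_build() {
  local build_id="$1" status
  while :; do
    status=$(aws codebuild batch-get-builds --ids "$build_id" \
      --query 'builds[0].buildStatus' --output text) || return 1
    [ "$status" != "IN_PROGRESS" ] && break
    sleep "${POLL_SECONDS:-10}"
  done
  echo "$status"
  [ "$status" = "SUCCEEDED" ]
}

run_build() {
  local revision="$1" build_id
  # Create or update the project; succeed even when nothing changed.
  aws cloudformation deploy --stack-name "$STACK_NAME" \
    --template-file "$TEMPLATE_FILE" --capabilities CAPABILITY_IAM \
    --no-fail-on-empty-changeset || return 1
  # Start a build of the requested source revision.
  build_id=$(aws codebuild start-build --project-name "$PROJECT_NAME" \
    --source-version "$revision" --query 'build.id' --output text) || return 1
  # Wait for completion, then fetch the artifacts only on success.
  wait_for_build "$build_id" || return 1
  aws s3 cp "$ARTIFACT_URI" ./artifacts/ --recursive
}
```

Calling `run_build <revision>` from the outer CI system then yields a single exit status that reflects the CodeBuild result.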

How to Use Docker Build Secrets with Kaniko

Context
Our current build system builds docker images inside of a docker container (Docker in Docker). Many of our docker builds need credentials to be able to pull from private artifact repositories.
We've handled this with docker secrets: we pass the secret to the docker build command and, in the Dockerfile, reference the secret in the RUN command where it's needed. This means we're using docker buildkit. This article explains it.
We are moving to a different build system (GitLab) and the admins have disabled Docker in Docker (security reasons) so we are moving to Kaniko for docker builds.
Problem
Kaniko doesn't appear to support secrets the way docker does (there are no command line options to pass a secret through the Kaniko executor).
The credentials the docker build needs are stored in GitLab variables. For DinD, you simply add those variables to the docker build as a secret:
DOCKER_BUILDKIT=1 docker build . \
--secret=type=env,id=USERNAME \
--secret=type=env,id=PASSWORD
And then in the Dockerfile, use the secret:
RUN --mount=type=secret,id=USERNAME --mount=type=secret,id=PASSWORD \
USER=$(cat /run/secrets/USERNAME) \
PASS=$(cat /run/secrets/PASSWORD) \
./scriptThatUsesTheseEnvVarCredentialsToPullArtifacts
...rest of build..
Without the --secret flag to the kaniko executor, I'm not sure how to take advantage of docker secrets... nor do I understand the alternatives. I also want to continue to support developer builds. We have a 'build.sh' script that takes care of gathering credentials and adding them to the docker build command.
Current Solution
I found this article and was able to sort out a working solution. I want to ask the experts if this is valid or what the alternatives might be.
I discovered that when the kaniko executor runs, it appears to mount a volume into the image that's being built at /kaniko. That directory does not exist when the build is complete and does not appear to be cached in the docker layers.
I also found out that if the Dockerfile secret is not passed in via the docker build command, the build still executes.
So my gitlab-ci.yml file has this excerpt (the REPO_USER/REPO_PWD variables are GitLab CI variables):
- echo "${REPO_USER}" > /kaniko/repo-credentials.txt
- echo "${REPO_PWD}" >> /kaniko/repo-credentials.txt
- /kaniko/executor
  --context "${CI_PROJECT_DIR}/docker/target"
  --dockerfile "${CI_PROJECT_DIR}/docker/target/Dockerfile"
  --destination "${IMAGE_NAME}:${BUILD_TAG}"
The key piece here is echoing the credentials to a file in the /kaniko directory before calling the executor. That directory is (temporarily) mounted into the image which the executor is building. And since all this happens inside of the kaniko image, that file will disappear when the kaniko (GitLab) job completes.
The developer build script (snip):
# To keep it simple, this assumes that the developer has their credentials
# cached in a file (ignored by git) called dev-credentials.txt
DOCKER_BUILDKIT=1 docker build . \
--secret id=repo-creds,src=dev-credentials.txt
Basically same as before. Had to put it in a file instead of environment variables.
The dockerfile (snip):
RUN --mount=type=secret,id=repo-creds,target=/kaniko/repo-credentials.txt \
USER=$(sed '1q;d' /kaniko/repo-credentials.txt) \
PASS=$(sed '2q;d' /kaniko/repo-credentials.txt) \
./scriptThatUsesTheseEnvVarCredentialsToPullArtifacts
...rest of build..
This Works!
In the Dockerfile, by mounting the secret in the /kaniko subfolder, it will work with both the DinD developer build as well as with the CI Kaniko executor.
For dev builds, the DinD secret works as always. (I had to change it to a file rather than env variables, which I didn't love.)
When the build is run by Kaniko, I suppose that since the secret in the RUN command is not found, it doesn't even try to write the temporary credentials file (which I expected would fail the build). Instead, because I wrote the variables directly to the temporarily mounted /kaniko directory, the rest of the RUN command was happy.
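Both build paths rely on the same two-line file convention: the username on line 1 and the password on line 2, read back with `sed 'Nq;d'`. A small sketch of that round trip, using a temporary file in place of /kaniko/repo-credentials.txt. The helper names `write_credentials` and `read_line` are hypothetical:

```shell
#!/bin/sh
# Sketch only: write_credentials and read_line are hypothetical helper names.

# Writes the user on line 1 and the password on line 2, readable by the owner only.
write_credentials() {
  printf '%s\n%s\n' "$2" "$3" > "$1"
  chmod 600 "$1"
}

# Prints line N of a file: sed deletes every line before N, then prints
# line N and quits, which is the 'Nq;d' idiom used in the Dockerfile.
read_line() {
  sed "${2}q;d" "$1"
}
```

For example, `read_line creds.txt 2` prints the password written by `write_credentials creds.txt someuser somepass`.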
Advice
To me this seems more kludgy than expected, so I'd like to hear about alternative solutions. Finding out that the /kaniko folder is mounted into the image at build time seems to open a lot of possibilities.

Upload YARN application logs from EMR cluster to S3

I know they can be uploaded to s3 in ~5 minute intervals with logpusher, but I would ideally like to get them within 30s-1min of step completion.
The logs I am looking for are the application's stdout logs.
I can ssh to the master node and get these logs via:
yarn logs -applicationId <application_id>
Is there a way that I can either write a bootstrap script that restarts the logpusher service after a step has been completed, or a way to submit an emr step that will export the yarn logs to s3?
EDIT:
I ended up accomplishing this by setting up an automatic follow-up job with boto3, using AWS's script-runner jar to run a bash script. The script uses the yarn CLI to write a list of YARN application IDs to a text file, downloads a Python script I made from S3 to parse that file and find the most recent application ID, passes that ID to the yarn CLI to write that application's logs to a text file, and then uploads the logs to S3. This reduces the wait time to ~15 seconds after a job completes.
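A rough sketch of what such a follow-up script might look like. Note that this swaps the Python parsing step for plain `grep` and `sort`, and treats the application with the highest ID as the most recent one, which is an assumption (it is the most recently submitted, not necessarily the last to finish). `latest_app_id`, `export_latest_logs` and `LOG_BUCKET` are hypothetical names:

```shell
#!/bin/bash
# Sketch only: function names and LOG_BUCKET are hypothetical.

# Reads `yarn application -list` style text on stdin and prints the
# application ID with the highest cluster timestamp and sequence number.
latest_app_id() {
  grep -oE 'application_[0-9]+_[0-9]+' | sort -t_ -k2,2n -k3,3n | tail -n 1
}

# Dumps the logs of the most recent finished application and uploads them.
export_latest_logs() {
  local app_id
  app_id=$(yarn application -list -appStates FINISHED | latest_app_id)
  [ -n "$app_id" ] || return 1
  yarn logs -applicationId "$app_id" > "/tmp/$app_id.log" || return 1
  aws s3 cp "/tmp/$app_id.log" "s3://$LOG_BUCKET/yarn-logs/$app_id.log"
}
```

Submitting `export_latest_logs` as a step right after the main step keeps the delay close to the time the log dump itself takes.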
You need not specify anything. By default, EMR pushes application logs to the S3 location specified in the Log URI. Just look under containers in the Log URI location.

Flink on EMR cannot access S3 bucket from "flink run" command

I'm prototyping the use of AWS EMR for a Flink-based system that we're planning to deploy. My cluster has the following versions:
Release label: emr-5.10.0
Hadoop distribution: Amazon 2.7.3
Applications: Flink 1.3.2
The documentation provided by Amazon (Amazon flink documentation) and the documentation from Flink (Apache flink documentation) both mention directly using S3 resources as an integrated file system with the s3://<bucket>/<file> pattern. I have verified that all the correct permissions are set, and I can use the AWS CLI to copy S3 resources to the Master node with no problem, but attempting to start a Flink job using a Jar from S3 does not work.
I am executing the following step:
JAR location : command-runner.jar
Main class : None
Arguments : flink run -m yarn-cluster -yid application_1513333002475_0001 s3://mybucket/myapp.jar
Action on failure: Continue
The step always fails with
JAR file does not exist: s3://mybucket/myapp.jar
I have spoken to AWS support, and they suggested having a previous step copy the S3 file to the local Master node and then referencing it with a local path. While this would obviously work, I would rather get the native S3 integration working.
I have also tried using the s3a filesystem and get the same result.
You need to download your jar from S3 first so that it is available in the classpath:
aws s3 cp s3://mybucket/myapp.jar myapp.jar
and then run it with flink run -m yarn-cluster myapp.jar
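The two commands can be wrapped into one helper so that a single step does both the copy and the submission. This is a minimal sketch; `run_flink_from_s3` is a hypothetical name, and any extra flags from the original step (such as -yid) would need to be added to the `flink run` line:

```shell
#!/bin/bash
# Sketch only: run_flink_from_s3 is a hypothetical helper name.

# Copies the jar from S3 into a fresh local temp directory, then submits
# the local copy to YARN with the flink CLI.
run_flink_from_s3() {
  local s3_uri="$1" local_jar
  local_jar="$(mktemp -d)/$(basename "$s3_uri")" || return 1
  aws s3 cp "$s3_uri" "$local_jar" || return 1
  flink run -m yarn-cluster "$local_jar"
}
```

For example, `run_flink_from_s3 s3://mybucket/myapp.jar` submits a local copy named myapp.jar.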

How to reduce time running gclient sync for WebRTC

I am building the WebRTC library using Travis CI.
This works, but it takes a long time, and more and more often the build ends with the message:
The job exceeded the maximum time limit for jobs, and has been terminated.
You can consult the log of a failed build: travis log
During the gclient sync:
_______ running 'download_from_google_storage --directory --recursive --num_threads=10 --no_auth --quiet --bucket chromium-webrtc-resources src/resources' in '/home/travis/build/mpromonet/webrtc-streamer/webrtc'
...
Hook 'download_from_google_storage --directory --recursive --num_threads=10 --no_auth --quiet --bucket chromium-webrtc-resources src/resources' took 1255.11 secs
I disabled the tests, so I think this download is useless, and it takes a lot of time.
Is there any way to pass arguments or set variables to avoid this time-consuming task?
One way to avoid downloading chromium-webrtc-resources, defined in the DEPS file as
{
  # Download test resources, i.e. video and audio files from Google Storage.
  'pattern': '.',
  'action': ['download_from_google_storage',
             '--directory',
             '--recursive',
             '--num_threads=10',
             '--no_auth',
             '--quiet',
             '--bucket', 'chromium-webrtc-resources',
             'src/resources'],
},
is to patch it, either removing this section or adding a condition that is false.
In order to patch it, I used the following command:
sed -i -e "s|'src/resources'],|'src/resources'],'condition':'rtc_include_tests==true',|" src/DEPS
This saves about 20 minutes and allows the travis build to stay below the timeout.
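If the patch step can run more than once against the same checkout (for example with a cached source tree), the plain sed command would append the condition again on each run. A small sketch of a guarded variant, assuming GNU sed; `patch_deps` is a hypothetical helper name:

```shell
#!/bin/bash
# Sketch only: patch_deps is a hypothetical helper name; assumes GNU sed (-i).

# Adds the condition to the test-resources hook unless it is already there.
patch_deps() {
  local deps_file="$1"
  # Skip when a previous run has already inserted the condition.
  grep -qF "'src/resources'],'condition'" "$deps_file" && return 0
  sed -i -e "s|'src/resources'],|'src/resources'],'condition':'rtc_include_tests==true',|" "$deps_file"
}
```

Calling `patch_deps src/DEPS` before `gclient sync` then has the same effect as the one-line sed command on a fresh checkout.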
You can bake the entire toolchain into a docker image and run your actual tests/builds in that. Delegate the docker image update into another automated process (travis-ci cronjob for example).
An additional benefit is that you now have full control over when parts of your toolchain change. I find that very important.
Edit:
Some resources to read.
The official travis docs for using docker
Building & deploying images on travis
Dockerhub automated builds

Deploy flat files from Bamboo to S3

We deploy flat files to our web servers using bamboo SCP jobs.
I would like to move content from the web servers to S3, so need a Bamboo job to deploy static content to an S3 bucket.
I assumed it would be a 2 min job to make a build plan to deploy flat files to S3, but suspect I'm missing something obvious here, as I can't see how to do it.
First you need to create a "Script" in your build job.
Then export the AWS access keys in your build script:
export AWS_ACCESS_KEY_ID=AKIAJA335522247FF
export AWS_SECRET_ACCESS_KEY=crNwiopyfDWD780wO32hv0cAkmzV65vyA3++No+
After that you can simply iterate over your files and copy them with the aws command to your desired bucket:
bucket="s3://my-backups/database/"
# Iterate over the glob directly so file names with spaces survive
for f in backups/*
do
  file=$(basename "$f")
  echo "Processing $file"
  target="$bucket$file"
  aws s3 cp "$f" "$target"
done
Alternatively, you can also copy a whole folder:
aws s3 cp "my-files/" "s3://my-backups/" --recursive
Or, if you want to be even faster, you can synchronize only the changes:
aws s3 sync "my-files/" "s3://my-backups/"