Upload YARN application logs from EMR cluster to S3

I know the logs can be uploaded to S3 at ~5-minute intervals by the logpusher, but I would ideally like to get them within 30s-1min of step completion.
The logs I am looking for are the YARN application logs for stdout.
I can ssh to the master node and get these logs via:
yarn logs -applicationId <<application_id>>
Is there a way that I can either write a bootstrap script that restarts the logpusher service after a step has been completed, or a way to submit an emr step that will export the yarn logs to s3?
EDIT:
I ended up accomplishing this by setting up an automatic follow-up job with boto3 that uses AWS's script-runner.jar. The step runs a bash script that uses the YARN CLI to write a list of YARN application IDs to a text file, downloads a Python script I made from S3 to parse that file and find the most recent application ID, passes that ID back to the YARN CLI to write that application's logs to a text file, and then uploads the logs to S3. This reduces the wait time to ~15 seconds after a job completes.
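For anyone trying the same thing, here is a minimal sketch of the kind of bash script that follow-up step can run; the bucket name and paths are placeholders, and the "most recent application" lookup is inlined with sort/awk instead of the separate Python script described above:
#!/bin/bash
# List finished YARN applications and pick the most recently started one.
# (My real setup writes this list to a file and parses it with a separate
# Python script; the sort/awk below is a simplified stand-in.)
APP_ID=$(yarn application -list -appStates FINISHED 2>/dev/null \
  | grep '^application_' \
  | sort -t_ -k2,2n -k3,3n \
  | tail -n 1 \
  | awk '{print $1}')
# Dump that application's aggregated logs to a local file.
yarn logs -applicationId "$APP_ID" > "/tmp/${APP_ID}.log"
# Upload the log file to S3 (bucket and prefix are placeholders).
aws s3 cp "/tmp/${APP_ID}.log" "s3://my-log-bucket/yarn-logs/${APP_ID}.log"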

You need not specify anything. By default, EMR pushes application logs to the S3 location specified in the cluster's Log URI. Just look under containers in the Log URI location.
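For reference, the container logs end up under the Log URI in roughly this layout (bucket and IDs below are placeholders), with each container's stdout gzipped inside its container directory:
# List the pushed container logs for one application (names are placeholders)
aws s3 ls s3://my-emr-logs/j-XXXXXXXXXXXXX/containers/application_XXXXXXXXXXXXX_0001/ --recursive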

Related

Yarn Job related commands

For any job submitted to YARN via the YARN console or the YARN Cluster UI, how do I find:
Who submitted the job?
To which YARN queue was the job submitted?
How much time did it take to finish?
I tried the command below, but it prints a lot of details rather than these specific ones:
yarn application -list
Take a look at the YARN Admin page (the ResourceManager web UI); it has details about all the jobs you have submitted to the cluster.
Just access <local_ip>:8088, e.g. localhost:8088.
Also, there is a logs section under the /logs/userlogs directory. This directory contains the logs for all applications run by a user.
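If you want those specific fields from the CLI rather than the UI, yarn application -status prints them for a single application; a rough sketch (application ID is a placeholder):
# Prints a status block that includes the User, Queue, Start-Time
# and Finish-Time fields for one application.
yarn application -status application_XXXXXXXXXXXXX_0001
# Pull out just those fields (exact field names can vary by Hadoop version).
yarn application -status application_XXXXXXXXXXXXX_0001 \
  | grep -E 'User|Queue|Start-Time|Finish-Time'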

MLflow Artifacts Storing But Not Listing In UI

I've run into an issue using MLflow server. When I first ran the command to start an mlflow server on an ec2 instance, everything worked fine. Now, although logs and artifacts are being stored to postgres and s3, the UI is not listing the artifacts. Instead, the artifact section of the UI shows:
Loading Artifacts Failed
Unable to list artifacts stored under <s3-location> for the current run. Please contact your tracking server administrator to notify them of this error, which can happen when the tracking server lacks permission to list artifacts under the current run's root artifact directory.
But when I check in S3, I see the artifact in the location that the error shows. What could have started causing this? It used to work not long ago, and nothing was changed on the EC2 instance hosting MLflow.
I found the answer: MLflow could not find boto3, so installing it with conda fixed the issue. The log line revealing this was buried and hard to find in stdout.
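In case anyone hits the same thing, this is roughly what the fix looks like; the conda environment name and store URIs below are placeholders:
# Install boto3 into the environment that runs the tracking server
conda install -n mlflow-env -c conda-forge boto3   # or: pip install boto3
# Restart the server; with boto3 available it can list the S3 artifacts again
mlflow server \
  --backend-store-uri postgresql://user:password@localhost:5432/mlflow \
  --default-artifact-root s3://my-mlflow-artifacts \
  --host 0.0.0.0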

Gitlab-CI: AWS S3 deploy is failing

I am trying to create a deployment pipeline in GitLab-CI for a React project. The build works fine, and I use artifacts to store the dist folder from my yarn build command. That part works as well.
The issue is with my deployment command: aws s3 sync dist/'bucket-name'.
Expected: "Done in x seconds"
Actual:
error Command failed with exit code 2.
info Visit https://yarnpkg.com/en/docs/cli/run for documentation about this command.
Running after_script 00:01
Uploading artifacts for failed job 00:01
ERROR: Job failed: exit code 1
The files seem to have been uploaded correctly to the S3 bucket, however I do not know why I get an error on the deployment job.
When I run the aws s3 sync dist/'bucket-name' locally everything works correctly.
Check out AWS CLI Return Codes
2 -- The meaning of this return code depends on the command being run.
The primary meaning is that the command entered on the command line failed to be parsed. Parsing failures can be caused by, but are not limited to, missing any required subcommands or arguments or using any unknown commands or arguments. Note that this return code meaning is applicable to all CLI commands.
The other meaning is only applicable to s3 commands. It can mean at least one or more files marked for transfer were skipped during the transfer process. However, all other files marked for transfer were successfully transferred. Files that are skipped during the transfer process include: files that do not exist, files that are character special devices, block special device, FIFO's, or sockets, and files that the user cannot read from.
The second paragraph might explain what's happening.
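One way to confirm that locally is to check the return code of the sync directly; the bucket name below is a placeholder:
aws s3 sync dist/ s3://my-bucket/
echo "sync exit code: $?"   # 2 means at least one file was skipped, per the docs above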
There is no yarn build command. See https://classic.yarnpkg.com/en/docs/cli/run
As Anton mentioned, the second paragraph of his answer was the problem. The solution was removing special characters from a couple of SVG filenames. I suspect uploading the dist folder as an artifact (zip) might have changed some of the file names, which confused S3. Removing ® and + from the filenames resolved the issue.

How do I download one or more files from a stopped Fargate task?

I have an ECS task that runs some test cases. I have it running in Fargate. Yay!
Now I want to download the test results file(s) from the container. I have the task and container IDs handy. I can find the exit code with
aws ecs describe-tasks --cluster Fargate --tasks <my-task-id>
How do I download the log and/or files produced?
It looks like, as of right now, the only way to get test results off of my server is to send the results to S3 before the container shuts down.
From this thread, there's no way to mount a volume / EFS onto a Fargate container.
Here's my bash script for running my tests (in build.sh) and then uploading the results to S3:
#!/bin/bash
echo Running tests...
pushd ~circleci/project/
export AWS_ACCESS_KEY_ID=$AWS_ACCESS_KEY
export AWS_SECRET_ACCESS_KEY=$AWS_SECRET_KEY
commandToRun="~/project/.circleci/build_scripts/build.sh";
# Run the tests, teeing the output to a log file
eval $commandToRun 2>&1 | tee /tmp/build.log
# Capture the exit code of the test command itself, not of tee
exitCode=${PIPESTATUS[0]}
# Push the log to S3 before the container shuts down
aws s3 cp /tmp/build.log s3://$CICD_BUCKET/build.log \
  --storage-class REDUCED_REDUNDANCY \
  --region us-east-1
exit ${exitCode}
Of course, you'll have to pass in the AWS_ACCESS_KEY, AWS_SECRET_KEY and CICD_BUCKET environment variables. The bucket name you choose needs to be pre-created, but any directory structure below it does NOT need to be created in advance.
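For what it's worth, those variables can be injected when you launch the Fargate task via container overrides on run-task; everything below (cluster, task definition, container name, subnet, security group, bucket) is a placeholder, and in practice a task IAM role avoids passing credentials at all:
aws ecs run-task \
  --cluster Fargate \
  --launch-type FARGATE \
  --task-definition my-test-task \
  --network-configuration 'awsvpcConfiguration={subnets=[subnet-0123456789abcdef0],securityGroups=[sg-0123456789abcdef0],assignPublicIp=ENABLED}' \
  --overrides '{"containerOverrides":[{"name":"my-test-container","environment":[
    {"name":"AWS_ACCESS_KEY","value":"..."},
    {"name":"AWS_SECRET_KEY","value":"..."},
    {"name":"CICD_BUCKET","value":"my-cicd-bucket"}]}]}'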
You probably want to look at using CodeBuild for this use case, which can automatically copy artifacts to S3.
It's actually quite easy to orchestrate the following using a simple bash script and the AWS CLI:
Idempotently Create/Update a CodeBuild project (using a simple CloudFormation template you can define in your source repository)
Run a Codebuild job that executes a given revision of your source repository (using again a self-defining buildspec.yml specification defined in your source repository)
Attach to the CloudWatch logs log group for your CodeBuild job and stream log output
Finally detect when the job has completed successfully or not and then download any artifacts locally using S3
I use this approach to run builds in CodeBuild, with Bamboo as the overarching continuous delivery system.
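Here is a rough sketch of the "run a CodeBuild job, wait, then download artifacts" part of that flow; the project name, bucket, and revision variable are placeholders, and error handling is omitted:
# Kick off a build of a specific revision and capture the build id
BUILD_ID=$(aws codebuild start-build \
  --project-name my-project \
  --source-version "$GIT_COMMIT" \
  --query 'build.id' --output text)
# Poll until the build leaves IN_PROGRESS
while [ "$(aws codebuild batch-get-builds --ids "$BUILD_ID" \
           --query 'builds[0].buildStatus' --output text)" = "IN_PROGRESS" ]; do
  sleep 10
done
# Download whatever artifacts the buildspec published to S3
aws s3 cp s3://my-artifact-bucket/my-project/ ./artifacts/ --recursive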

Flink on EMR cannot access S3 bucket from "flink run" command

I'm prototyping the use of AWS EMR for a Flink-based system that we're planning to deploy. My cluster has the following versions:
Release label: emr-5.10.0
Hadoop distribution: Amazon 2.7.3
Applications: Flink 1.3.2
Both the Flink documentation provided by Amazon (the Amazon EMR Flink documentation) and the Apache Flink documentation mention using S3 resources directly as an integrated file system with the s3://<bucket>/<file> pattern. I have verified that all the correct permissions are set, and I can use the AWS CLI to copy S3 resources to the master node with no problem, but attempting to start a Flink job using a jar from S3 does not work.
I am executing the following step:
JAR location : command-runner.jar
Main class : None
Arguments : flink run -m yarn-cluster -yid application_1513333002475_0001 s3://mybucket/myapp.jar
Action on failure: Continue
The step always fails with
JAR file does not exist: s3://mybucket/myapp.jar
I have spoken to AWS support, and they suggested having a previous step copy the S3 file to the local Master node and then referencing it with a local path. While this would obviously work, I would rather get the native S3 integration working.
I have also tried using the s3a filesystem and get the same result.
You need to download your jar from S3 so that it is available locally on the master node:
aws s3 cp s3://mybucket/myapp.jar myapp.jar
and then run: flink run -m yarn-cluster myapp.jar
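For completeness, this is roughly what that looks like as EMR steps submitted from the CLI; the cluster id is a placeholder and the YARN application id is only reused from the question for illustration:
CLUSTER_ID=j-XXXXXXXXXXXXX   # placeholder
# Step 1: copy the jar from S3 onto the master node
aws emr add-steps --cluster-id "$CLUSTER_ID" --steps \
  'Type=CUSTOM_JAR,Name=CopyFlinkJar,Jar=command-runner.jar,ActionOnFailure=CONTINUE,Args=[aws,s3,cp,s3://mybucket/myapp.jar,/home/hadoop/myapp.jar]'
# Step 2: run the job in the existing Flink YARN session using the local copy
aws emr add-steps --cluster-id "$CLUSTER_ID" --steps \
  'Type=CUSTOM_JAR,Name=RunFlinkJob,Jar=command-runner.jar,ActionOnFailure=CONTINUE,Args=[flink,run,-m,yarn-cluster,-yid,application_1513333002475_0001,/home/hadoop/myapp.jar]'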