Is it possible to change the bucket & path of the Log URI of a running EMR cluster? Or is the only way to change it to terminate & restart the cluster? We had a typo in the bucket name of a cluster we have already spun up and done some work on, so we'd like to change the log location without terminating the cluster.
Related
I'm using amazon EKS fargate. I can see container logs using fluentbit side car etc no problem at all. But those logs ONLY show what is happening inside the container AFTER it has started up
I enabled aws eks cluster logging fully
Now I would like to see logs in cloudwatch which is equivalent of
kubectl describe pod
command
I have searched the ENTIRE cloudwatch clustername log group and am not able to find logs like
"pulling image into container"
"efs not mounted"
etc
I want to see logs in cloudwatch prior to the actual container creation stage
IS it possible at all using eks fargate ?
Thanks a bunch
You can use Container Insights which can collect metrics by using performance log events using the embedded metric format. The logs are stored in CloudWatch Logs. CloudWatch generates several metrics automatically from the logs which you can view in the CloudWatch console.
In Amazon EKS and Kubernetes, Container Insights uses a containerized version of the CloudWatch agent to discover all of the running containers in a cluster. It then collects performance data at every layer of the performance stack.
https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/Container-Insights-view-metrics.html
I'm execution a process in a cluster emr with steps that executes a command s3-dist-cp to copy files from my s3 to EMR.
I'm having an error Input path does not exist. But the ip of machine in log does not is part of cluster.
Did someone already face this problem?
We would like to process AWS ELB access logs and write them into InfluxDB
to be used for application metrics and monitoring (ex. Grafana).
We configured ELB to store access logs into S3 bucket.
What would be the best way to process those logs and write them to InfluxDB?
What we tried so far was to mount S3 bucket to filesystem using s3fs and then use Telegraf agent for processing. But this approach has some issues: s3fs mounting looks like a hack, and all the files in the bucket are compressed and need to be unzipped before telegraf can process them which makes this task overcomplicated.
Is there any better way?
Thanks,
Oleksandr
Can you just install the telegraf agent on the AWS instance that is generating the logs, and have them sent directly to InfluxDB in real-time?
I know that Beanstalk's Snapshot Logs can give you a recent overview of the httpd/access_log files from among the EC2 instances under the ELB for that environment. But does anyone know a good way to get all the logs?
It's a production environment, so I want to do the processing elsewhere. But I don't want to (for obvious reasons) configure root sftp and go around collecting the files manually.
I think I had read something about configuring logging to S3?
In the "Configuration" tab for an Environment, under "Software Configuration", there is a checkbox for enabling log file rotation to S3. These are stored in an S3 bucket used specifically for Elastic Beanstalk.
You can feed your current logs to aws cloudwatch logs.
AWS cloudwatch logs will centralise all logs of your infrastructure with a neat solution to search an process them as well as creating metrix and alarm based on your logs.
I have a guide on how to Store aws beanstalk symfony and apache logs in cloudwatch logs. This will help you to get up and running fast, and then you can tweak it.
I am working on a Java MapReduce app that has to be able to provide an upload service for some pictures from the local machine of the user to an S3 bucket.
The thing is the app must run on an EC2 cluster, so I am not sure how I can refer to the local machine when copying the files. The method copyFromLocalFile(..) needs a path from the local machine which will be the EC2 cluster...
I'm not sure if I stated the problem correctly, can anyone understand what I mean?
Thanks
You might also investigate s3distcp: http://docs.amazonwebservices.com/ElasticMapReduce/latest/DeveloperGuide/UsingEMR_s3distcp.html
Apache DistCp is an open-source tool you can use to copy large amounts of data. DistCp uses MapReduce to copy in a distributed manner—sharing the copy, error handling, recovery, and reporting tasks across several servers. S3DistCp is an extension of DistCp that is optimized to work with Amazon Web Services, particularly Amazon Simple Storage Service (Amazon S3). Using S3DistCp, you can efficiently copy large amounts of data from Amazon S3 into HDFS where it can be processed by your Amazon Elastic MapReduce (Amazon EMR) job flow. You can also use S3DistCp to copy data between Amazon S3 buckets or from HDFS to Amazon S3.
You will need to get the files from the userMachine to at least 1 node before you will be able to use them through a MapReduce.
The FileSystem and FileUtil functions refer to paths either on the HDFS or the local disk of one of the nodes in the cluster.
It cannot reference the user's local system. (Maybe if you did some ssh setup... maybe?)