Attaching EBS Volumes in AWS Fargate - aws-fargate

After specifying the volumes in task definition I am unable to find volumes . How can I attach the Volumes to AWS fargate ?

Fargate task comes with 10gig of ephemeral storage. And you can additionally mount another 4gig (only). This is ephemeral too.
Refer this doc: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/AWS_Fargate.html#fargate-tasks-storage

Related

How to set up AWS S3 bucket as persistent volume in on-premise k8s cluster

Since NFS has single point of failure issue. I am thinking to build a storage layer using S3 or Google Cloud Storage as PersistentVolumn in my local k8s cluster.
After a lot of google search, I still cannot find an way. I have tried using s3 fuse to mount volume to local, and then create PV by specifying the hotPath. However, a lot of my pods (for example airflow, jenkins), complained about no write permission, or say "version being changed".
Could someone help figuring out the right way to mount S3 or GCS bucket as a PersistenVolumn from local cluster without using AWS, or GCP.
S3 is not a file system and is not intended to be used in this way.
I do not recommend to use S3 this way, because in my experience any FUSE-drivers very unstable and with I/O operations you will easily ruin you mounted disk and stuck in Transport endpoint is not connected nightmare for you and your infrastructure users. It's also may lead to high CPU usage and RAM leakage.
Useful crosslinks:
How to mount S3 bucket on Kubernetes container/pods?
Amazon S3 with s3fs and fuse, transport endpoint is not connected
How stable is s3fs to mount an Amazon S3 bucket as a local directory

HDFS over S3 / Google storage bucket translation layer - how?

I'd love to expose a Google storage bucket over HDFS to a service.
Service in question is a cluster (SOLR) that can speak only to HDFS, given I have no hadoop (nor need for it), ideally I'd like to have a docker container that would user a Google storage bucket as a backend and expose it's contents via HDFS.
If possible I'd like to avoid mounts (like fuse gcsfs), has anyone done such thing?
I think I could just do mount gcsfs and setup a single node cluster with HDFS, but is there a simpler / more robust way?
Any hints / directions are appreciated.
The Cloud Storage Connector for Hadoop is the tool you might need.
It is not a Docker image but rather an install. Further instructions can be found in the GitHub repository under README.md and INSTALL.md
If it is accessed from AWS S3 you'll need a Service Account with access to Cloud Storage and set the env variable GOOGLE_APPLICATION_CREDENTIALS to /path/to/keyfile.
To use SOLR with GCS, you need indeed a hadoop cluster and you can do that in GCP by creating a dataproc cluster then use the connector mentioned to connect your SOLR solution with GCS. for more info check this SOLR

Is AWS Fargate useful for deploying a "web" service stack?

I see Fargate as a good service for deploying a Docker Compose based stack, but I was wondering if it is any good for "long-running" web services, not just ones where you need dynamic scaling and undeterminate workloads (e.g. containers that are created and die on demand).
That depends on your use-case. ECS lets you quickly deploy containerized applications. With Fargate we don't need to manage the underlying infrastructure (say server-less approach for containers!). Fargate is suitable for long-running apps, microservices, and batch jobs.
Few of my observations on Fargate are:
Fargate storage is ephemeral - We cannot store container data in disk such as volumes. (although Fargate provides 10 GB of volume mounts that is nonpersistent empty storage.)
Logs can be sent to Cloudwatch using awslogs driver. Recently AWS announced the support for Splunk log driver.
Fargate uses only awavpc network mode.
Fargate supports environment variables. Environment variables are the only way to pass parameters to the container.

How to set up a volume linked to S3 in Docker Cloud with AWS?

I'm running my Play! webapp with Docker Cloud (could also use Rancher) and AWS and I'd like to store all the logs in S3 (via volume). Any ideas on how I could achieve that with minimal effort?
Use docker volumes to store the logs in the host system.
Try S3 aws-cli to sync your local directory with S3 Bucket
aws s3 sync /var/logs/container-logs s3://bucket/
create a cron to run it on every minute or so.
Reference: s3 aws-cli

Hadoop upload files from local machine to amazon s3

I am working on a Java MapReduce app that has to be able to provide an upload service for some pictures from the local machine of the user to an S3 bucket.
The thing is the app must run on an EC2 cluster, so I am not sure how I can refer to the local machine when copying the files. The method copyFromLocalFile(..) needs a path from the local machine which will be the EC2 cluster...
I'm not sure if I stated the problem correctly, can anyone understand what I mean?
Thanks
You might also investigate s3distcp: http://docs.amazonwebservices.com/ElasticMapReduce/latest/DeveloperGuide/UsingEMR_s3distcp.html
Apache DistCp is an open-source tool you can use to copy large amounts of data. DistCp uses MapReduce to copy in a distributed manner—sharing the copy, error handling, recovery, and reporting tasks across several servers. S3DistCp is an extension of DistCp that is optimized to work with Amazon Web Services, particularly Amazon Simple Storage Service (Amazon S3). Using S3DistCp, you can efficiently copy large amounts of data from Amazon S3 into HDFS where it can be processed by your Amazon Elastic MapReduce (Amazon EMR) job flow. You can also use S3DistCp to copy data between Amazon S3 buckets or from HDFS to Amazon S3.
You will need to get the files from the userMachine to at least 1 node before you will be able to use them through a MapReduce.
The FileSystem and FileUtil functions refer to paths either on the HDFS or the local disk of one of the nodes in the cluster.
It cannot reference the user's local system. (Maybe if you did some ssh setup... maybe?)