Transferring data from Google Cloud Storage to AWS S3 - amazon-s3

I am transferring data from Google Cloud Storage to AWS S3 using distcp on EMR (I made some configuration changes to EMR to achieve this). Is the data transfer secure? If not, what are the other options?

Related

AWS S3 bucket migration to Cloud Storage

I have to migrate all AWS S3 buckets and their contents to GCP Cloud Storage using Terraform only. How can I do that?
I have not found anything suitable, so I am asking here.

S3 event notifications from on-prem to cloud

I am looking for technology solutions to this problem: how to trigger S3 bucket notifications from on-prem NetApp S3 storage to the AWS cloud. We're looking at two approaches: StorageGRID event notifications and Qlik S3 options.
Current setup: AWS Direct Connect to on-prem, with NetApp S3-compatible storage on-prem. We need the notifications to trigger in the AWS cloud.
Thanks

How to set up AWS S3 bucket as persistent volume in on-premise k8s cluster

Since NFS is a single point of failure, I am thinking of building a storage layer using S3 or Google Cloud Storage as a PersistentVolume in my local k8s cluster.
After a lot of Google searching, I still cannot find a way. I have tried using s3 fuse to mount the bucket locally and then creating a PV by specifying the hostPath. However, many of my pods (for example Airflow and Jenkins) complained about having no write permission, or reported "version being changed".
Could someone help me figure out the right way to mount an S3 or GCS bucket as a PersistentVolume from a local cluster, without running on AWS or GCP?
S3 is not a file system and is not intended to be used in this way.
I do not recommend using S3 this way. In my experience, FUSE drivers are very unstable, and under I/O load you can easily corrupt the mounted disk and get stuck in a "Transport endpoint is not connected" nightmare for you and your infrastructure users. It may also lead to high CPU usage and memory leaks.
Useful crosslinks:
How to mount S3 bucket on Kubernetes container/pods?
Amazon S3 with s3fs and fuse, transport endpoint is not connected
How stable is s3fs to mount an Amazon S3 bucket as a local directory

Apache Atlas and AWS S3

I am working on a project that has a requirement to store scientific data on AWS S3 as raw data for the beginning of a data lake. We are planning to use JSON for application data and S3 metadata to persist application metadata (JSON schema) and process metadata. At the moment, on site S3 is the only service available to us from the AWS cloud.
The client would like a publish environment where they can get the raw data back as files. We would like to avoid building a custom catalog and security infrastructure.
I don't see anything in Apache Atlas that connects directly to AWS S3. We could put Apache Hive on top of AWS S3 and then put Apache Atlas and Ranger on top of that, but I am not sure whether that is how we can publish the raw data from S3, or whether it even works, as Hive is more of a processing environment.
Is it possible to use Apache Atlas and Ranger on top of AWS S3 directly?

Copy objects from S3 to google cloud storage using aws-cli

Is it possible to access Google Cloud Storage using the AWS CLI?
Google Cloud Platform supports copying files from S3 to Google Cloud Storage using gsutil with the following command:
gsutil -m cp -R s3://bucketname gs://bucketname
But I need to do this with the AWS CLI instead of gsutil.
I am not aware of any solution from the AWS side, but unless you have a special reason not to use gsutil or another Google solution, you may consider using Google Cloud Storage Transfer Service instead. This service is recommended when transferring data from Amazon S3 buckets.
Compared with simply using gsutil, or other CLI tools out there, Google Cloud Storage Transfer has several nice features like the possibility to schedule one-time or recurring transfers, where you can use advanced filters. Also, you can indicate if you want the source objects to be deleted after transferring them, and even synchronize the destination bucket with the source one, deleting existing objects if they don't have a corresponding object in the source.
You can schedule transfers from the GCP Console or using the XML and JSON API.
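As a rough illustration of the API route, here is a minimal Python sketch that only builds the JSON body for a transfer job; it does not call any service. The field names (transferSpec, awsS3DataSource, gcsDataSink, objectConditions, transferOptions, schedule) follow my understanding of the Storage Transfer Service v1 REST resource and should be checked against the current reference before use. The function name, its parameters, and all bucket, project, and key values are hypothetical placeholders.

```python
import json
from datetime import date


def _api_date(d):
    """Convert a datetime.date into a {year, month, day} object."""
    return {"year": d.year, "month": d.month, "day": d.day}


def build_s3_to_gcs_transfer_job(project_id, s3_bucket, gcs_bucket, start_date,
                                 aws_access_key_id, aws_secret_access_key,
                                 end_date=None, include_prefixes=None,
                                 delete_from_source=False, mirror_sink=False):
    """Build a transfer job request body (hypothetical helper, not an official API).

    delete_from_source: delete source objects after they are transferred.
    mirror_sink: delete sink objects that have no counterpart in the source.
    """
    # Assumption: the service treats these two options as mutually exclusive,
    # so reject the combination before building the request.
    if delete_from_source and mirror_sink:
        raise ValueError("delete_from_source and mirror_sink cannot both be set")

    transfer_spec = {
        "awsS3DataSource": {
            "bucketName": s3_bucket,
            "awsAccessKey": {
                "accessKeyId": aws_access_key_id,
                "secretAccessKey": aws_secret_access_key,
            },
        },
        "gcsDataSink": {"bucketName": gcs_bucket},
        "transferOptions": {
            "deleteObjectsFromSourceAfterTransfer": delete_from_source,
            "deleteObjectsUniqueInSink": mirror_sink,
        },
    }
    # Optional filter: only transfer objects whose names start with a prefix.
    if include_prefixes:
        transfer_spec["objectConditions"] = {
            "includePrefixes": list(include_prefixes)
        }

    schedule = {"scheduleStartDate": _api_date(start_date)}
    if end_date is not None:
        schedule["scheduleEndDate"] = _api_date(end_date)

    return {
        "description": "Copy s3://%s to gs://%s" % (s3_bucket, gcs_bucket),
        "status": "ENABLED",
        "projectId": project_id,
        "schedule": schedule,
        "transferSpec": transfer_spec,
    }


if __name__ == "__main__":
    # All values below are placeholders.
    job = build_s3_to_gcs_transfer_job(
        "my-project", "source-bucket", "destination-bucket", date(2030, 1, 1),
        "AWS_KEY_ID", "AWS_SECRET", include_prefixes=["logs/"], mirror_sink=True)
    print(json.dumps(job, indent=2))
```

The resulting dictionary would be sent as the body of a request that creates a transfer job, authenticated with Google credentials. Avoid hardcoding the AWS secret in real use; read it from a secret store or the environment.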