How to create an interaction between Google Drive and AWS S3?

I'm trying to set up a connection between a Google Drive folder and an S3 bucket, but I'm not sure where to start.
I've already created a sort of "Frankenstein process", but it's only easy for me to use, and sharing it with my co-workers is a pain.
I have a script that generates a plain text file and saves it into a Drive folder. To upload it, I installed Drive File Stream so the file is also available on my Mac, and then I wrote a Python 3 script that uses the boto3 library to upload the text file into different S3 buckets depending on the file name.
I was thinking I could create a Lambda to process the file into the S3 buckets, but I can't work out how to create the connection between Drive and S3. I'd appreciate it if someone could give me some advice on how to start with this.
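Roughly, this is what my current upload step does (bucket names and the file-name convention below are placeholders):

import boto3

# Placeholder mapping from file-name prefix to destination bucket.
BUCKETS = {
    "sales": "my-sales-bucket",
    "stock": "my-stock-bucket",
}

def upload_report(path):
    s3 = boto3.client("s3")
    filename = path.split("/")[-1]
    # The prefix of the file name (e.g. "sales_2020-01-01.txt") picks the bucket.
    bucket = BUCKETS[filename.split("_")[0]]
    s3.upload_file(path, bucket, filename)

# The file is read from the Drive File Stream mount on my Mac (path is made up).
upload_report("/Users/me/Google Drive/reports/sales_2020-01-01.txt")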
Thanks

If you simply want to connect Google Drive and AWS S3, there is a service called Zapier that provides different types of integrations without writing a line of code.
For more details you can check out this link:
https://zapier.com/apps/amazon-s3/integrations/google-drive

Related

Is there a way to upload files to Amazon S3 from SFTP?

My idea is this: I have an SFTP host with data on it and I want to create a file in S3 from this data, but to save network resources I don't want to download all of this data to a system first only to upload it again. So my question is: is it possible to transfer the data directly to S3 without first downloading it? (preferably with the Amazon S3 Java SDK)
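One way to avoid the intermediate file (the data still flows through the machine running the code, it just never touches disk) is to pass the remote file object straight to the S3 upload call. Here is a sketch of the idea in Python with paramiko and boto3 rather than the Java SDK; the host, credentials and names below are made up:

import boto3
import paramiko

# Placeholder SFTP connection details.
transport = paramiko.Transport(("sftp.example.com", 22))
transport.connect(username="user", password="secret")
sftp = paramiko.SFTPClient.from_transport(transport)

s3 = boto3.client("s3")

# Open the remote file as a file-like object and stream it straight to S3.
with sftp.open("/data/export.csv", "rb") as remote_file:
    s3.upload_fileobj(remote_file, "my-bucket", "export.csv")

sftp.close()
transport.close()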

Copy documents from Google Drive to Amazon S3 programmatically using Java

I have downloaded files from Google Drive and saved them to my local system using the Google Drive API with Java. My aim is to copy documents from Google Drive to Amazon S3.
I can achieve this by downloading the Google Drive documents into a local directory and uploading them to Amazon S3 using my s3Utility's public void uploadToBucket(int userId, String bucketName, String fileName, File fileData) method.
Is there any direct way to achieve this? That is, I want to remove one step: I don't want to download the documents locally. Instead, I would like to pass the Google Drive document's download URL to the S3 method and have it save the document into S3. Is that possible? Any suggestions? Sorry for the essay-type question.
You will need a server somewhere to run your code, as both Google Drive and Amazon S3 are closed services - you cannot add your own code to them.
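If you do run it on a server (or your own machine), you can at least skip writing to a local directory by keeping the document in memory. A rough sketch of the idea in Python (the question uses Java, but the Drive v3 and S3 calls have direct Java SDK equivalents; the drive service object, file ID, bucket and key are assumed to exist):

import io
import boto3
from googleapiclient.http import MediaIoBaseDownload

def copy_drive_file_to_s3(drive, file_id, bucket, key):
    # Download the Drive file into an in-memory buffer instead of a local file.
    buffer = io.BytesIO()
    request = drive.files().get_media(fileId=file_id)
    downloader = MediaIoBaseDownload(buffer, request)
    done = False
    while not done:
        _, done = downloader.next_chunk()
    # Rewind the buffer and stream it straight to S3.
    buffer.seek(0)
    boto3.client("s3").upload_fileobj(buffer, bucket, key)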

Access files in s3n://elasticmapreduce/samples/wordcount/input

How can I access the files sitting in the following S3 folder, which is owned by someone else?
s3n://elasticmapreduce/samples/wordcount/input
The files in s3n://elasticmapreduce/samples/wordcount/input are public, and are made available by Amazon as input for the sample word count Hadoop program. The best way to fetch them is to:
Start a new Amazon Elastic MapReduce Job Flow (it doesn't matter which one) from the Amazon Web Services console, and make sure that you keep the job alive with the Keep Alive option.
Once the EC2 machines have started, find the instances on EC2 from the Amazon Web Services console.
ssh into one of the running EC2 instances as the hadoop user, for example
ssh -i keypair.pem hadoop@ec2-IPADDRESS.compute-1.amazonaws.com
Obtain the files you need, using hadoop dfs -copyToLocal s3://elasticmapreduce/samples/wordcount/input/0002 .
sftp the files to your local system.
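If you only need the sample files on your own machine rather than inside an EMR cluster, a simpler alternative (sketched here with boto3, assuming the objects are still publicly readable) is to fetch them anonymously:

import boto3
from botocore import UNSIGNED
from botocore.client import Config

# An anonymous (unsigned) client is enough because the sample objects are public.
s3 = boto3.client("s3", config=Config(signature_version=UNSIGNED))
s3.download_file("elasticmapreduce", "samples/wordcount/input/0002", "0002")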
You can access wordSplitter.py here:
https://elasticmapreduce.s3.amazonaws.com/samples/wordcount/wordSplitter.py
You can access the input files here:
https://elasticmapreduce.s3.amazonaws.com/samples/wordcount/input/0012
https://elasticmapreduce.s3.amazonaws.com/samples/wordcount/input/0011
https://elasticmapreduce.s3.amazonaws.com/samples/wordcount/input/0010
https://elasticmapreduce.s3.amazonaws.com/samples/wordcount/input/0009
https://elasticmapreduce.s3.amazonaws.com/samples/wordcount/input/0008
https://elasticmapreduce.s3.amazonaws.com/samples/wordcount/input/0007
https://elasticmapreduce.s3.amazonaws.com/samples/wordcount/input/0006
https://elasticmapreduce.s3.amazonaws.com/samples/wordcount/input/0005
https://elasticmapreduce.s3.amazonaws.com/samples/wordcount/input/0004
https://elasticmapreduce.s3.amazonaws.com/samples/wordcount/input/0003
https://elasticmapreduce.s3.amazonaws.com/samples/wordcount/input/0002
https://elasticmapreduce.s3.amazonaws.com/samples/wordcount/input/0001
The owner of the folder (or, more likely, of the files in the folder) must have made it accessible to anonymous readers.
If that is the case, s3n://x/y... is translated to
http://s3.amazonaws.com/x/y...
or
http://x.s3.amazonaws.com/y...
where x is the name of the bucket and y... is the path within the bucket.
If you want to make sure the file exists, e.g. if you suspect the name was misspelled, you can open
http://s3.amazonaws.com/x
in your browser and you'll see XML describing the "files", that is the S3 objects, available.
Try this:
http://s3.amazonaws.com/elasticmapreduce
I tried this, and it seems that the path you want is not public.
The AWS EMR documentation quotes s3://elasticmapreduce/samples/wordcount/input in one of the "getting started" examples. But s3 is different from s3n, so the input might be available to EMR, but not to HTTP access.
In Amazon S3 there is no concept of folders; a bucket is just a flat collection of objects. But you can list all the files you are interested in from a browser with the following URL:
s3.amazonaws.com/elasticmapreduce?prefix=samples/wordcount/input/
Then you can download them by specifying the whole name, e.g.
s3.amazonaws.com/elasticmapreduce/samples/wordcount/input/0001
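The same listing can also be done programmatically; a rough sketch with boto3, using anonymous access on the assumption that the objects are publicly readable:

import boto3
from botocore import UNSIGNED
from botocore.client import Config

# Unsigned client, since the sample objects are world-readable.
s3 = boto3.client("s3", config=Config(signature_version=UNSIGNED))
response = s3.list_objects_v2(Bucket="elasticmapreduce", Prefix="samples/wordcount/input/")
for obj in response.get("Contents", []):
    print(obj["Key"], obj["Size"])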

Allowing users to download files as a batch from AWS S3 or CloudFront

I have a website that allows users to search for music tracks and download those they select as MP3s.
I have the site on my server and all of the MP3s on S3, distributed via CloudFront. So far so good.
The client now wishes for users to be able to select a number of music tracks and then download them all in bulk or as a batch, instead of one at a time.
Usually I would place all the files in a zip and then present the user with a link to that new zip file to download. In this case, as the files are on S3, that would require me to first copy all the files from S3 to my web server, process them into a zip and then serve the download from my server.
Is there any way I can create a zip on S3 or CloudFront, or is there some way to batch/group files into a zip?
Maybe I could set up an EC2 instance to handle this?
I would greatly appreciate some direction.
Best
Joe
I am afraid you won't be able to create the batches without additional processing. Firing up an EC2 instance might be an option to create a batch per user.
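A rough sketch of what such a per-user batch job could look like with boto3 (bucket, prefix and key names are placeholders; everything is held in memory, so very large batches would need streaming or temporary files):

import io
import zipfile
import boto3

def build_batch_zip(bucket, prefix, zip_key):
    s3 = boto3.client("s3")
    buffer = io.BytesIO()
    # Pull each selected object and add it to an in-memory zip archive.
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        listing = s3.list_objects_v2(Bucket=bucket, Prefix=prefix)
        for obj in listing.get("Contents", []):
            body = s3.get_object(Bucket=bucket, Key=obj["Key"])["Body"].read()
            archive.writestr(obj["Key"].split("/")[-1], body)
    # Upload the finished zip back to S3 and hand the user a temporary link.
    buffer.seek(0)
    s3.upload_fileobj(buffer, bucket, zip_key)
    return s3.generate_presigned_url(
        "get_object", Params={"Bucket": bucket, "Key": zip_key}, ExpiresIn=3600
    )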
I am facing the exact same problem. So far the only thing I was able to find is the AWS CLI's s3 sync command:
https://docs.aws.amazon.com/cli/latest/reference/s3/sync.html
In my case, I am using Rails + its Paperclip add-on, which means that I have no way to easily download all of the user's images in one go, because the files are scattered across a lot of subdirectories.
However, if you can group your user's files in a better way, say like this:
/users/<ID>/images/...
/users/<ID>/songs/...
...etc., then you can solve your problem right away with:
aws s3 sync s3://<your_bucket_name>/users/<user_id>/songs /cache/<user_id>
Do bear in mind you'll have to give your server the proper credentials so the S3 CLI tools can work without prompting for usernames/passwords.
And that should sort you.
Additional discussion here:
Downloading an entire S3 bucket?
S3 is based on single HTTP requests per object.
So the answer is to use threads to achieve the same thing.
The Java API uses TransferManager for this:
http://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/s3/transfer/TransferManager.html
You can get great performance with multiple threads.
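The same multi-threaded idea sketched in Python with boto3 (bucket and key names are placeholders):

from concurrent.futures import ThreadPoolExecutor
import boto3

s3 = boto3.client("s3")
keys = ["songs/track1.mp3", "songs/track2.mp3", "songs/track3.mp3"]

def fetch(key):
    # One HTTP request per object, run in parallel across the pool.
    s3.download_file("my-bucket", key, key.split("/")[-1])

with ThreadPoolExecutor(max_workers=8) as pool:
    pool.map(fetch, keys)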
There is no bulk download, sorry.

How to upload a file from the web onto Amazon S3?

I have a link to a file (like so: http://example.com/tmp/database.csv). I want to upload it directly into S3, instead of downloading it to my computer first (and then uploading). Is this possible?
The file will have to move through some application you write. Amazon S3 does not have any mechanism to execute code or pull files, so the only way to do this is to send it directly from the server where the file is hosted or from another server.
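For example, a small script on any server (or in a Lambda) can stream the bytes straight through without saving them to disk; a sketch with the requests and boto3 libraries, where the bucket and key names are assumptions:

import boto3
import requests

url = "http://example.com/tmp/database.csv"

# Stream the response so the whole file is never held in memory or written to disk.
response = requests.get(url, stream=True)
response.raise_for_status()
response.raw.decode_content = True

boto3.client("s3").upload_fileobj(response.raw, "my-bucket", "database.csv")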