How to import data to Amazon S3 from a URL

I have an S3 bucket and the URL of a large file. I would like to store the content located at the URL in the S3 bucket.
I could download the file to my local machine and then upload it to S3 with Cloudberry or Jungledisk or whatever. However, if the file is large, this may take a long time because the file must be transferred twice, and my network connection is much slower than Amazon's.
If I have a lot of data to store in S3, I can start an EC2 instance, retrieve the files to the instance with curl or wget, and then push the data from the EC2 instance to S3. This works, but it's a lot of steps if I just want to archive one file.
Any suggestions?

You can stream the file directly from the source to S3.
If you are using Node.js, you can use the streaming-s3 package.
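Here is the same idea as a minimal sketch in Python with requests and boto3 (the URL, bucket, and key below are placeholders). upload_fileobj reads the HTTP response body in chunks and performs a multipart upload, so the file is never written to local disk:

```python
import boto3
import requests

url = "http://example.com/tmp/large-file.tar.gz"  # hypothetical source URL
bucket = "my-bucket"                              # hypothetical bucket name
key = "archives/large-file.tar.gz"                # hypothetical object key

s3 = boto3.client("s3")

# Stream the HTTP response body straight into a multipart S3 upload;
# the file never touches the local disk.
with requests.get(url, stream=True) as resp:
    resp.raise_for_status()
    resp.raw.decode_content = True  # transparently handle gzip/deflate encoding
    s3.upload_fileobj(resp.raw, bucket, key)
```

Note that the bytes still pass through whatever machine runs this script, so running it on an EC2 instance in the same region as the bucket keeps the transfer on Amazon's fast network.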

Related

Is there a way to upload files to Amazon S3 from SFTP?

My idea is this: I have an SFTP host with data on it, and I want to create a file in S3 from this data. To save network resources, I don't want to download all of this data to a system first only to upload it again. So my question is: is it possible to transfer the data directly to S3 without downloading it first? (preferably with the Amazon S3 Java SDK)
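One possible sketch of that approach, shown here with Python's paramiko and boto3 rather than the Java SDK (the host, credentials, paths, and bucket are all hypothetical): the SFTP client exposes the remote file as a file-like object, which the S3 client can stream from without making a local copy.

```python
import boto3
import paramiko

# Hypothetical connection details
sftp_host = "sftp.example.com"
sftp_user = "user"
sftp_password = "secret"
remote_path = "/data/export.csv"

bucket = "my-bucket"        # hypothetical bucket
key = "imports/export.csv"  # hypothetical object key

transport = paramiko.Transport((sftp_host, 22))
transport.connect(username=sftp_user, password=sftp_password)
sftp = paramiko.SFTPClient.from_transport(transport)

s3 = boto3.client("s3")

# sftp.open() returns a file-like object, so boto3 can read it in
# chunks and upload without ever touching the local filesystem.
with sftp.open(remote_path, "rb") as remote_file:
    s3.upload_fileobj(remote_file, bucket, key)

sftp.close()
transport.close()
```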

block file system on S3

I am a little puzzled; I hope someone can help me out.
We create some ORC files that we would like to query while they are stored on S3.
We noticed that the S3 native filesystem (s3n) does not really work out for this purpose. I am not really sure what the problem is, but my guess is that the reader is not able to jump to specific bytes inside the file, so it has to load the whole file before it can query it.
So we tried storing the files using the URI scheme s3://, which is a block file system backed by S3, much like HDFS, and it worked great.
But I am a little worried after reading this source about Amazon EMR, which says:
Amazon S3 block file system (URI path: s3bfs://)
The Amazon S3 block file system is a legacy file storage system. We strongly discourage the use of this system.
Important
We recommend that you do not use this file system because it can trigger a race condition that might cause your cluster to fail. However, it might be required by legacy applications.
EMRFS (URI path: s3://)
EMRFS is an implementation of HDFS used for reading and writing regular files from Amazon EMR directly to Amazon S3.
I am not using EMR. I create my files by launching an EC2 cluster and then use S3 as cold storage. But I am kind of puzzled right now and not sure which filesystem I use when I store my files on S3 using the URI scheme s3://: do I use EMRFS, or do I use the deprecated s3bfs filesystem?
Amazon S3 is an object storage system. It is not recommended to "mount" S3 as a filesystem. Amazon Elastic Block Store (EBS) is a block storage system that appears as volumes on Amazon EC2 instances.
When used from Amazon Elastic MapReduce (EMR), Hadoop has extensions that make it easy to work with Amazon S3. However, if you are not using EMR, there is no need to use EMRFS (which is available only on EMR), nor should you use S3 as a block storage system.
The easiest way to use S3 from EC2 is via the AWS Command-Line Interface (CLI). You can copy files to/from S3 with the aws s3 cp command. There is also a sync command to make it easy to synchronize data to/from S3.
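For example (the bucket name and paths are placeholders):

```sh
aws s3 cp ./backup.tar.gz s3://my-bucket/backup.tar.gz   # upload a file
aws s3 cp s3://my-bucket/backup.tar.gz ./backup.tar.gz   # download a file
aws s3 sync ./data s3://my-bucket/data                   # synchronize a directory
```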
You can also programmatically connect to Amazon S3 via an SDK, so that your app can directly transfer files to/from S3.
As to which to choose: typically, applications like to work with files on a local filesystem, so copy your files from S3 to a local device. However, if your app can communicate directly with S3, there will be fewer "moving parts".
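As a sketch of the SDK route in Python with boto3 (the bucket, keys, and local paths are placeholders):

```python
import boto3

s3 = boto3.client("s3")

# Copy an object from S3 to the local filesystem so the app can work on it...
s3.download_file("my-bucket", "input/data.orc", "/tmp/data.orc")

# ...and push the result back to S3 when done.
s3.upload_file("/tmp/result.orc", "my-bucket", "output/result.orc")
```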

Best way of storing and retrieving files - BaaS or S3

We are facing the following dilemma:
Our mobile client application will authenticate users through a BaaS (Backend-as-a-Service) and will then need to send a file to the cloud - specifically to an Amazon EC2 server where the main processing will take place. Since the processing of the file might happen later, the files need to be stored (and there is also the prospect of keeping an archive of them for future use by the users). The question is: which of the following would you suggest as the preferred way?
a) send the file to the EC2 server directly which will then issue an Amazon S3 request to save the file there
OR
b) store the file to the BaaS (which in our case is parse.com which uses S3 as its data-storage) and retrieve it later by the EC2 server
The cost of transferring a file from EC2 to S3, and vice versa, is zero as long as both are in the same region, which is true in both cases a) and b). The problem is that we need to map each user to the files he has access to, and a) and b) differ a lot in this respect.
So basically, are you sending the file to EC2 so that EC2 processes it and then saves it to S3, or is it just saving it to S3? I used a very easy way of transferring data from EC2 to S3: s3fs-fuse. You can mount an S3 bucket as a drive on EC2, so when you store something on EC2 it is automatically stored in S3 as well. Might be handy for you.
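If you want to try that mount approach, s3fs-fuse is the usual tool; a minimal sketch (the bucket name, mount point, and credentials are placeholders):

```sh
# Store credentials for s3fs-fuse (hypothetical keys)
echo "ACCESS_KEY_ID:SECRET_ACCESS_KEY" > ~/.passwd-s3fs
chmod 600 ~/.passwd-s3fs

# Mount the bucket; files written under /mnt/s3 become S3 objects
mkdir -p /mnt/s3
s3fs my-bucket /mnt/s3 -o passwd_file=~/.passwd-s3fs
```

Note the caution from the earlier answer on this page, though: S3 is an object store, and mounting it as a filesystem is generally not recommended.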

Is it possible to add files to Amazon S3 buckets using a web URL as the source?

I am trying to load one of my S3 buckets.
The file I am trying to load is a huge tarball on the web. I don't want to download the file to my disk and then upload it again to the S3 bucket.
Is there any way I can directly specify this URL so that the file gets added to S3?
You have to "put" to S3; it does not "get". In other words, S3 will not fetch a file from a URL for you; something has to push the bytes to it.
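That said, if you have any machine in between (for example, an EC2 instance in the same region), you can avoid staging the file on disk by piping the download straight into the AWS CLI, which accepts - (stdin) as the upload source. The URL and bucket here are placeholders:

```sh
curl -L "http://example.com/huge-tarball.tar.gz" \
  | aws s3 cp - s3://my-bucket/huge-tarball.tar.gz
```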

How to upload a file from the web onto Amazon S3?

I have a link to a file (like so: http://example.com/tmp/database.csv). I want to upload it directly into S3, instead of downloading it to my computer first (and then uploading it). Is this possible?
The file will have to move through some application you write. Amazon S3 does not have any mechanism to execute code or pull files, so the only way to do this is to send it directly from the server where the file is hosted or from another server.