Can I cUrl a file from the web to Amazon S3? - file-io

Is there a way to transfer a file from the web directly to my Amazon S3 account?
For example, I want to transfer a large RDF file from www.data.gov directly to Amazon S3 without having to download the file to my local machine first.

You need a server somewhere that will execute the curl command. The easiest way is probably to use this a tool that I wrote for AWS EC2: https://github.com/mjhm/cURLServer. You can check out the docs on a live version at http://ec2-204-236-157-181.us-west-1.compute.amazonaws.com/

Related

GoReplay - Upload to S3 does not work

I am trying to capture all incoming traffic on a specific port using GoReplay and to upload it directly to S3 servers.
I am running a simple file server on port 8000 and a gor instance using the (simple) command
gor --input-raw :8000 --output-file s3://<MyBucket>/%Y_%m_%d_%H_%M_%S.log
I does create a temporal file at /tmp/ but other than that, id does not upload any thing to S3.
Additional information :
The OS is Ubuntu 14.04
AWS cli is installed.
The AWS credentials are deffined within the environent
It seems the information you are providing or scenario you explained is not complete however to upload a file from your EC2 machine to S3 is simple as written command below.
aws s3 cp yourSourceFile s3://your bucket
To see your file you can see your file by using below command
aws s3 ls s3://your bucket
However, s3 is object storage and you can't use it to upload those files which are continually editing or adding or updating.

AWS S3 download files with exec permission

I've been struggling with this one for quite a while. Thought it would work out-of-box based on AWS documentation of supporting the acl header.
I'm using the AWS S3 CLI in order to download files from my S3 bucket. Some of the files will need to have 'exec' permissions (running on Linux).
I can chmod the files but I would like to control that during the upload rather than during the download.
So, the question is whether I can use the AWS CLI so that it will automatically grant execution (or other) permissions based on something that I can set during the upload or afterwards on the uploaded file.
Thanks,

AWS FTP behavior

I'm having some issue on my AWS S3 bucket and vsftpd.
I've created a vsftpd instance and mount AWS S3 bucket. My issue is that everytime I upload a file and the connection was disrupted, it appends the existing file on the S3 bucket instead of override it when the FTP client retry. What should I set on the S3 bucket policy to have such behavior to override instead of append?
There are no Amazon S3 configuration settings that would impact this behaviour -- it is totally the result of the software you are using.
It's also worth mentioning that FTP is a rather old protocol and these days there are much better alternatives, such as uploads via the browser or Dropbox-like shared folders.
One of the easiest options is to have your users upload directly to Amazon S3 -- that way, you don't need to run any servers. This could be done by uploading via a browser, or by providing users with some software, such as Cloudberry Explorer or the AWS Command-Line Interface (CLI).
I highly encourage you to stop using FTP these days.

block file system on S3

i am a little puzzled i hope someone can help me out.
we create some ORC-Files that we would like to query while they are stored on S3.
We noticed that the S3 native Filesystem S3n does not really work out for this manner. I am not really sure what the problem is - but my guess is, that the reader is not able to jump to specific bytes inside the file so that he has to load the whole file before he can query it.
So we tried storing the files on S3 (uri s3://) which is a block file system just like HDFS backed by s3 and it worked great.
But i am a little worried after reading up on this source about Amazon EMR which says
Amazon S3 block file system (URI path: s3bfs://)
The Amazon S3 block file system is a legacy file storage system. We strongly discourage the use of this system.
Important
We recommend that you do not use this file system because it can trigger a race condition that might cause your cluster to fail. However, it might be required by legacy applications.
EMRFS (URI path: s3://)
EMRFS is an implementation of HDFS used for reading and writing regular files from Amazon EMR directly to Amazon S3.
I am not using EMR - i create my files by launching an EC2 cluster and then use s3 as a cold storage - but I am kind of puzzled right now and not sure which filesystem I use when I store my files on s3 using the URI scheme s3:// - do i use EMRFS or do i use the deprecated s3bfs filesystem?
Amazon S3 is an object storage system. It is not recommended to "mount" S3 as a filesystem. Amazon Elastic Block Store (EBS) is a block storage system that appears as volumes on Amazon EC2 instances.
When used from Amazon Elastic MapReduce (EMR), Hadoop has extensions that make it easy to work with Amazon S3. However, if you are not using EMR, there is no need to use EMRFS (which is available only on EMR), nor should you use S3 as a block storage system.
The easiest way to use S3 from EC2 is via the AWS Command-Line Interface (CLI). You can copy files to/from S3 by using the aws s3 cp command. There's also a sync command to make it easy to syncrhonize data to/from S3.
You can also programmatically connect to Amazon S3 via an SDK, so that your app can directly transfer files to/from S3.
As to which to choose... typically, applications like to work with files on a local filesystem, so copy your files from S3 to a local device. However, if your app can directly communicate with S3, there will be less "moving parts".

How to upload files directly to Amazon S3 from a remote server?

Is it possible to upload a file to S3 from a remote server?
The remote server is basically a URL based file server. Example, using http://example.com/1.jpg, it serves the image. It doesn't do anything else and can't run code on this server.
It is possible to have another server telling S3 to upload a file from http://example.com/1.jpg
upload from http://example.com/1.jpg
server -------------------------------------------> S3 <-----> example.com
If you can't run code on the server or execute requests then, no, you can't do this. You will have to download the file to a server or computer that you own and upload from there.
You can see the operations you can perform on amazon S3 at http://docs.amazonwebservices.com/AmazonS3/latest/API/APIRest.html
Checking the operations for both the REST and SOAP APIs you'll see there's no way to give Amazon S3 a remote URL and have it grab the object for you. All of the PUT requests require the object's data to be provided as part of the request. Meaning the server or computer that is initiating the web request needs to have the data.
I have had a similar problem in the past where I wanted to download my users' Facebook Thumbnails and upload them to S3 for use on my site. The way I did it was to download the image from Facebook into Memory on my server, then upload to Amazon S3 - the full thing took under 2 seconds. After the upload to S3 was complete, write the bucket/key to a database.
Unfortunately there's no other way to do it.
I think the suggestion provided is quite good, you can SCP the file to S3 Bucket. Giving the pem file will be a password less authentication, via PHP file you can validate the extensions. PHP file can pass the file, as argument to SCP command.
The only problem with this solution is, you must have your instance in AWS. You can't use this solution if your website is hosted in other Hosting Providers and you are trying to upload files straight to S3 Bucket.
Technically it's possible, using AWS Signature Version 4, Assuming your remote server is the customer in the image below, you could prepare a form in the main server, and send the form fields to the remote server, for it to curl it. Detailed example here.
you can use scp command from Terminal.
1)using terminal, go to the place where there is that file you want to transfer to the server
2) type this:
scp -i yourAmazonKeypairPath.pem fileNameThatYouWantToTransfer.php ec2-user#ec2-00-000-000-15.us-west-2.compute.amazonaws.com:
N.B. Add "ec2-user#" before your ec2blablbla stuffs that you got from the Ec2 website!! This is such a picky error!
3) your file will be uploaded and the progress will be shown. When it is 100%, you are done!