Does Amazon S3 support symlinks? - amazon-s3

I have an object which I would like to address using different keys without actually copying the object itself, like a symlink in Linux. Does Amazon S3 provide such a thing?

S3 does not support the notion of a symlink, where one object key is treated as an alias for a different object key. (You've probably heard this before: S3 is not a filesystem. It's an object store).
If you are using the static web site hosting feature, there is a partial emulation of this capability, with object-level redirects:
http://docs.aws.amazon.com/AmazonS3/latest/dev/how-to-page-redirect.html
This causes requests for "object-a" to be greeted with a 301 Moved Permanently response, with the URL for "object-b" in the Location: header, which serves a similar purpose, but is of course still quite different. It only works if the request arrives at the website endpoint (not the REST endpoint).
If you use a reverse proxy (haproxy, nginx, etc.) in EC2 to handle incoming requests and forward them to the bucket, then of course you have the option at the proxy layer of rewriting the request URL before forwarding to S3, so you could translate the incoming request path to whatever you needed to present to S3. How practical this is depends on your application and motivation, but this is one of the strategies I use to modify where, in a particular bucket, an object appears, compared to where it is actually stored, allowing me to rewrite paths based on other attributes in the request.

I had a similar question and needed a solution, which I describe below. While S3 does not support symlinks, you can do this in a way with the following:
echo "https://s3.amazonaws.com/my.bucket.name/path/to/a/targetfile" > file
aws s3 cp file s3://my.bucket.name/file
wget $(curl https://s3.amazonaws.com/my.bucket.name/file)
What this is actually doing is getting the contents of the file, which is really just a pointer to the target file, then passing that to wget (curl can also be used to redirect to a file instead of wget).
This is really just a work around though as its not a true symlink but rather a creative solution to simulate symlinks.

Symlinks no, but same object to multiple keys, maybe.
Please refer to Rodrigo's answer at Amazon S3 - Multiple keys to one object
If you're using the website serving on S3, you can do it via header x-amz-website-redirect-location
If you're not using the website serving, you can create your custom header (x-amz-meta-KeyAlias) and handle it manually.

Related

Overriding httpd/apache upstream proxy httpcode with another

I have some react code (written by someone else) that needs to be served. The preferred method is via a Google Storage Bucket, fronted by their Cloud CDN, and this works. However, due to some quirks in the code, there is a requirement to override 404s with 200s, and serve content from the homepage instead (i.e. if there is a 404, don't serve a 404, serve the content of the homepage and return as a 200 instead)
(If anyone is interested, this override currently is implemented in CloudFront on AWS. Google CDN does not provide this functionality yet)
So, if the code is served at "www.mysite.com/app/" and someone hits "www.mysite.com/app/not-here" (which would return a 404), what should happen is that the response should NOT be 404, but a 200 with the content being served from index.html instead.
I was able to get this working by bundling all the code inside a docker container and then using the solution here. However, this setup means if we have a code change, all the running containers need to be restarted, and the customer expects zero downtime, hence the bucket solution.
So I now need to do the same thing but with the files being proxied in (with the upstream being the CDN).
I cannot use the original solution since the files are no longer local, and httpd can't check for existence of something that is not local.
I've tried things like ProxyErrorOverride and ErrorDocument, and managed to get it to redirect, but that is not what is needed.
Does anyone know how/if this can be done?
If the question is: how to catch the 404 error provided by Cloud Storage when a file is missing with httpd/apache? I don't know.
However, I think that isn't the best solution. Serving files directly from Cloud Storage is convenient but not industrial.
Imagine, you deploy several broken files successively, how to rollback in a stable format?
The best is to package your different code release in an atomic bag, a container for instance. Each version are in a different container and performing a rollback is easier and consistent.
Now your "container restart" issue. I don't know on which platform you are running your container. If your run it on a Compute Engine (a VM) it's maybe the worse solution. Today, there is container orchestration system that allows you to deploy, scale up and down the containers, and to perform progressive rollout, to replace, without downtime, the existing running containers by a newer version.
Cloud Run is a wonderful serverless solution for that, you also have Kubernetes (GKE on Google Cloud) that you can use with Knative for a better developer experience.

S3 download - SDK vs HTTP request inside lambda function

I'm looking for some benchmark or article explaining what is faster.
Inside a lambda function, is it faster to....:
A) Download an S3 file through cloudfront with a regular request module (i.e. hit the cloudfront URL with request or axios and download it)
B) Use the AWS SDK to get the file through the getObject methods
I've been googling this for a while now and I don't quite get to the answer, and I'm hoping I can skip benchmark it if someone else did already.
I'm talking about pretty small files, like fonts or images.
And the root of the question is, I believe AWS uses some sort of backbone communication for some cases. Given that lambda is inside their system, as S3 is, maybe requesting the image through the internet (HTTP) is not that fast.
Thanks!
In the same region it should be faster to use the SDK to download it. If it's not in the same region you might want to replicated it so that it is.

Access files stored on Amazon S3 through web browser

Current Situation
I have a project on GitHub that builds after every commit on Travis-CI. After each successful build Travis uploads the artifacts to an S3 bucket. Is there some way for me to easily let anyone access the files in the bucket? I know I could generate a read-only access key, but it'd be easier for the user to access the files through their web browser.
I have website hosting enabled with the root document of "." set.
However, I still get an 403 Forbidden when trying to go to the bucket's endpoint.
The Question
How can I let users easily browse and download artifacts stored on Amazon S3 from their web browser? Preferably without a third-party client.
I found this related question: Directory Listing in S3 Static Website
As it turns out, if you enable public read for the whole bucket, S3 can serve directory listings. Problem is they are in XML instead of HTML, so not very user-friendly.
There are three ways you could go for generating listings:
Generate index.html files for each directory on your own computer, upload them to s3, and update them whenever you add new files to a directory. Very low-tech. Since you're saying you're uploading build files straight from Travis, this may not be that practical since it would require doing extra work there.
Use a client-side S3 browser tool.
s3-bucket-listing by Rufus Pollock
s3-file-list-page by Adam Pritchard
Use a server-side browser tool.
s3browser (PHP)
s3index Scala. Going by the existence of a Procfile, it may be readily deployable to Heroku. Not sure since I don't have any experience with Scala.
Filestash is the perfect tool for that:
login to your bucket from https://www.filestash.app/s3-browser.html:
create a shared link:
Share it with the world
Also Filestash is open source. (Disclaimer: I am the author)
I had the same problem and I fixed it by using the
new context menu "Make Public".
Go to https://console.aws.amazon.com/s3/home,
select the bucket and then for each Folder or File (or multiple selects) right click and
"make public"
You can use a bucket policy to give anonymous users full read access to your objects. Depending on whether you need them to LIST or just perform a GET, you'll want to tweak this. (I.e. permissions for listing the contents of a bucket have the action set to "s3:ListBucket").
http://docs.aws.amazon.com/AmazonS3/latest/dev/AccessPolicyLanguage_UseCases_s3_a.html
Your policy will look something like the following. You can use the S3 console at http://aws.amazon.com/console to upload it.
{
"Version":"2008-10-17",
"Statement":[{
"Sid":"AddPerm",
"Effect":"Allow",
"Principal": {
"AWS": "*"
},
"Action":["s3:GetObject"],
"Resource":["arn:aws:s3:::bucket/*"
]
}
]
}
If you're truly opening up your objects to the world, you'll want to look into setting up CloudWatch rules on your billing so you can shut off permissions to your objects if they become too popular.
https://github.com/jupierce/aws-s3-web-browser-file-listing is a solution I developed for this use case. It leverages AWS CloudFront and Lambda#Edge functions to dynamically render and deliver file listings to a client's browser.
To use it, a simple CloudFormation template will create an S3 bucket and have your file server interface up and running in just a few minutes.
There are many viable alternatives, as already suggested by other posters, but I believe this approach has a unique range of benefits:
Completely serverless and built for web-scale.
Open source and free to use (though, of course, you must pay AWS for resource utilization -- such S3 storage costs).
Simple / static client browser content:
No Ajax or third party libraries to worry about.
No browser compatibility worries.
All backing systems are native AWS components.
You never share account credentials or rely on 3rd party services.
The S3 bucket remains private - allowing you to only expose parts of the bucket.
A custom hostname / SSL certificate can be established for your file server interface.
Some or all of the host files can be protected behind Basic Auth username/password.
An AWS WebACL can be configured to prevent abusive access to the service.

Amazon S3 Cloudfront Deployment Best Practice

Our current plan for a site is to use Amazon's Cloudfront service as a CDN for asset files such as CSS, JavaScript, and Images, and any other static files.
We currently have 1 bucket in S3 that contains all of these static files. The files are separated into different folders depending on what they are, "Scripts" are JS files, "Images" are Images, etc yadda yadda yadda.
So, what I didn't realize from the start was that once you deploy a Bucket from S3 to a Cloudfront Distribution, then every subsequent update to the bucket won't deploy again to that same Distribution. So, it looks as if you have to redeploy the bucket to another Cloudfront instance every time you have a static file update.
That's fine for images, because we can easily make sure that if there is a change to an image, then we just create a new image. But, that's difficult to do for CSS and JS.
So, that gets me to the Best Practice questions:
Is it best practice to create another Cloudfront Distribution for every production deployment? The problem here would be that causes trouble with CNAME records.
Is it best practice to NOT warehouse CSS and JS in Cloudfront because of the nature of those files, and their need to be easily modified? Seems like the answer to this would be NO because that's the purpose of a CDN.
Is there some other method with Cloudfront that I don't know about?
You can issue invalidation requests to CloudFront.
http://docs.amazonwebservices.com/AmazonCloudFront/latest/DeveloperGuide/Invalidation.html
Instead of an S3 bucket, though, we use our own server as a custom origin. We have .htaccess alias style_*.css to style.css, and we inject the file modification time for style.css in the HTML. As CloudFront sees a totally different URL, it'll fetch the new version.
(Note: Some CDNs let you do that via query string, but CloudFront ignores all query string data for caching, hence the .htaccess solution.)
edit: CloudFront can be (optionally) configured to use query strings now.
CloudFront has started supporting query strings, which you can use to invalidate cache.
http://aws.typepad.com/aws/2012/05/amazon-cloudfront-support-for-dynamic-content.html

Is there a way to have index.html functionality with content hosted on S3?

Is there a way to make S3 default to an index.html page? E.g.: My bucket object listing:
/index.html
/favicon.ico
/images/logo.gif
A call to www.example.com/index.html works great! But if one were to call www.example.com/ we'd either get a 403 or a REST object listing XML document depending on how bucket-level ACL was configured.
So, the question: Is there a way to have index.html functionality with content hosted on S3?
For people still struggling against this after 3 years, let me add some important information:
The URL for your website (and to which you have to point your DNS) is not
<bucket_name>.s3-us-west-2.amazonaws.com, but
<bucket_name>.s3-website-us-west-2.amazonaws.com.
If you use the first, it will not work as intended, no matter how much you config the Index document.
For a specific example, consider:
http://www-example-com.s3.amazonaws.com/index.html works.
http://www-example-com.s3.amazonaws.com/ fails with AccessDenied.
http://www-example-com.s3-website-us-west-2.amazonaws.com/ works!
To get your true website address, go to your S3 Management Console, select the target bucket, then Properties, then Static Website Hosting. It will show the website URL that will work.
Amazon S3 now supports Index Documents
The index document for a bucket can be set to something like index.html. When accessing the root of the site or a sub-directory containing a document of that name that document is returned.
It is extremely easy to do using the aws cli:
aws s3 website $MY_BUCKET_NAME --index-document index.html
You can set the index document from the AWS Management Console:
You can easily solve it by Amazon CloudFront link. At Amazon CloudFront you could modify the root object. You can download manager here: m1.mycloudbuddy.com/downloads.html.
Since It's been long time, this question being asked, and Amazon S3 changing their Interface. I would like to answer with updated screenshots.
We need to enable 'static web hosting' for S3 to serve as web hosting.
- Go to Properties -> click on static web hosting -> Select 'use this bucket to host a website'
- Enter the index document (index.html by default), error document and redirection rules, if any.
As answered in this answer on Stack Overflow, web hosting link would be: http://bucket-name.s3-website-region.amazonaws.com
I would suggest reading this thread from 2006 (On Amazon web services developers connection). It seems there's no easy solution to this.
Yes. using AWS Cloudfront lets you assign a default file.
you can do it using dns webforwards and cloaking. just forward to the complete path of the index.html
www.example.com forwards to http://www.example.com.s3.amazonaws.com and make sure you cloak the output.