S3 download: SDK vs HTTP request inside a Lambda function

I'm looking for some benchmark or article explaining what is faster.
Inside a Lambda function, is it faster to:
A) Download an S3 file through CloudFront with a regular HTTP request module (i.e. hit the CloudFront URL with request or axios and download it), or
B) Use the AWS SDK to get the file through the getObject method?
I've been googling this for a while now without finding a clear answer, and I'm hoping I can skip benchmarking it if someone else already has.
I'm talking about pretty small files, like fonts or images.
The root of the question is that I believe AWS uses some sort of internal backbone network in some cases. Given that Lambda runs inside their system, as S3 does, maybe requesting the image over the internet (HTTP) is not that fast.
Thanks!

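In the same region, it should be faster to use the SDK to download it. Both options are HTTPS requests under the hood, but getObject goes straight to the regional S3 endpoint, while the CloudFront URL is served from an edge location and, on a cache miss, CloudFront first has to fetch the object from S3. If the bucket is not in the same region as the function, you might want to replicate it so that it is.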
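Whether this holds for your particular files is easy to check. Below is a minimal timing harness in Python, assuming you can call both fetch strategies from the same function; the bucket, key, and CloudFront domain in the commented wiring are placeholders, not real resources. It reports the median rather than the mean so that one slow cold connection doesn't dominate the result.

```python
import statistics
import time
from typing import Callable, Dict


def median_latency_ms(fetch: Callable[[], bytes], runs: int = 20, warmup: int = 2) -> float:
    """Call `fetch` repeatedly and return the median wall-clock latency in milliseconds."""
    for _ in range(warmup):
        fetch()  # warm up connections and caches before measuring
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fetch()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


def compare(fetchers: Dict[str, Callable[[], bytes]], runs: int = 20) -> Dict[str, float]:
    """Return {strategy name: median latency in ms} for each fetch strategy."""
    return {name: median_latency_ms(fn, runs=runs) for name, fn in fetchers.items()}


# Hypothetical wiring inside a Lambda handler (create the client once, outside the handler):
# import boto3, urllib.request
# s3 = boto3.client("s3")
# fetchers = {
#     "sdk": lambda: s3.get_object(Bucket="my-bucket", Key="fonts/a.woff2")["Body"].read(),
#     "cloudfront": lambda: urllib.request.urlopen("https://dxxxx.cloudfront.net/fonts/a.woff2").read(),
# }
# print(compare(fetchers))
```

Run it a few times across cold and warm invocations, since CloudFront cache hits and Lambda container reuse can both shift the numbers.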

Related

Overriding httpd/apache upstream proxy httpcode with another

I have some React code (written by someone else) that needs to be served. The preferred method is via a Google Cloud Storage bucket fronted by Cloud CDN, and this works. However, due to some quirks in the code, there is a requirement to override 404s with 200s and serve the homepage content instead (i.e. if there is a 404, don't return a 404; serve the content of the homepage with a 200 instead).
(If anyone is interested, this override is currently implemented in CloudFront on AWS. Cloud CDN does not provide this functionality yet.)
So, if the code is served at "www.mysite.com/app/" and someone hits "www.mysite.com/app/not-here" (which would return a 404), what should happen is that the response should NOT be 404, but a 200 with the content being served from index.html instead.
I was able to get this working by bundling all the code inside a Docker container and then using the solution here. However, with this setup any code change means all the running containers need to be restarted, and the customer expects zero downtime, hence the bucket solution.
So I now need to do the same thing but with the files being proxied in (with the upstream being the CDN).
I cannot use the original solution since the files are no longer local, and httpd can't check for existence of something that is not local.
I've tried things like ProxyErrorOverride and ErrorDocument, and managed to get it to redirect, but that is not what is needed.
Does anyone know how/if this can be done?
If the question is how to catch, with httpd/Apache, the 404 error that Cloud Storage returns when a file is missing: I don't know.
However, I don't think that is the best solution. Serving files directly from Cloud Storage is convenient but not robust enough for production.
Imagine you deploy several broken files in succession: how do you roll back to a stable state?
The best approach is to package each code release as an atomic unit, a container for instance. Each version lives in its own container, so performing a rollback is easier and consistent.
Now, your "container restart" issue. I don't know which platform you run your containers on. If you run them on Compute Engine (a VM), that's probably the worst option. Today there are container orchestration systems that let you deploy containers, scale them up and down, and perform progressive rollouts that replace the existing running containers with a newer version without downtime.
Cloud Run is a wonderful serverless solution for that; you also have Kubernetes (GKE on Google Cloud), which you can use with Knative for a better developer experience.
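For reference, the behavior the question asks for (serve index.html with a 200 whenever the upstream returns 404) is simple to express in a thin proxy of your own, which is one way around httpd not being able to check for files that aren't local. A minimal Python sketch, with the upstream fetch passed in as a function so it can be backed by any HTTP client; the `/app/index.html` default and the `(status, headers, body)` tuple shape are assumptions for illustration.

```python
from typing import Callable, Dict, Tuple

# (status code, response headers, body) as returned by the upstream (e.g. the CDN)
Response = Tuple[int, Dict[str, str], bytes]


def spa_fallback(path: str, fetch: Callable[[str], Response],
                 index_path: str = "/app/index.html") -> Response:
    """Proxy `path` to the upstream; on a 404, serve index.html with a 200 instead."""
    status, headers, body = fetch(path)
    if status != 404:
        return status, headers, body  # found (or some other error): pass through unchanged
    idx_status, idx_headers, idx_body = fetch(index_path)
    if idx_status != 200:
        return status, headers, body  # index itself is unavailable: keep the original 404
    return 200, idx_headers, idx_body
```

In a real proxy you would probably restrict the fallback to GET requests for page routes, so that a genuinely missing script or image still surfaces as a 404.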

Mount S3 bucket as an NFS share on an EC2 instance

Long-time reader; I've usually been able to find the answers I'm looking for in existing posts, but this time I haven't.
I am teaching myself the AWS CDK from scratch and have only just started with it, so not finding anything that helps may be a result of not yet knowing enough to ask the right questions... so please bear with me.
So far I've used the AWS CDK with Python to create a stack that creates an S3 bucket and also launches an EC2 instance with an AWS file storage gateway AMI (so running Amazon Linux). This deploys and runs fine. Now I'd like to programmatically set up the S3 bucket to be accessed via an NFS share on the EC2 instance. From what I've seen I assumed this should be fairly trivial, but I keep getting lost in documentation and searches, and I'm not sure I'm looking in the right places or asking search engines the right questions.
It looks like I should be able to script this with user data so that it runs when the instance starts, but I'm a bit lost. Can anyone throw me some crumbs to follow towards a good way of doing this, or a better way of achieving what I want (which is basically accessing the S3 bucket contents as though they were files on the EC2 instance)? Or, if it's trivial enough, tell me how to do it?
Much appreciated :)
Dan
You are on the right track. user_data can be used for that.
I don't have full code to give you, as it's use-case specific (e.g. which OS are you using?), but the user_data script would have to download and install s3fs:
s3fs allows Linux and macOS to mount an S3 bucket via FUSE. s3fs preserves the native object format for files, allowing use of other tools like AWS CLI.
However, S3 is an object storage system, and it can't really be mounted on an instance the way you would mount NFS or EBS storage. With s3fs-fuse you can mimic that behavior, though, and for some use cases it will be sufficient.
So what you can do is set up the user_data script through the console, verify that it works, and then basically copy and paste it into your CDK code. It's more of a trial-and-error approach, but this is the best way to learn.
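A minimal sketch of generating those user-data commands in Python, assuming Amazon Linux 2 (where s3fs-fuse is installed from EPEL) and an instance role that can already access the bucket; the function name and the `/mnt/s3` mount point are hypothetical. In CDK Python the resulting list could be passed to `instance.add_user_data(*commands)`, and `bucket.grant_read_write(instance)` would give the instance role the S3 permissions s3fs needs.

```python
import shlex
from typing import List


def s3fs_user_data(bucket: str, mount_point: str = "/mnt/s3") -> List[str]:
    """Shell commands that install s3fs-fuse and mount `bucket` when the instance boots."""
    b = shlex.quote(bucket)
    m = shlex.quote(mount_point)
    return [
        # Assumption: Amazon Linux 2, where s3fs-fuse is packaged in EPEL.
        "amazon-linux-extras install epel -y",
        "yum install -y s3fs-fuse",
        f"mkdir -p {m}",
        # iam_role=auto makes s3fs use the instance profile credentials (no keys on disk);
        # allow_other lets non-root users see the mounted files.
        f"s3fs {b} {m} -o iam_role=auto -o allow_other",
    ]


if __name__ == "__main__":
    print("\n".join(s3fs_user_data("my-bucket")))
```

Printing the commands first makes it easy to paste them into the console's user-data field for the trial run described above, then reuse the same function from the CDK stack.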

Does Amazon S3 support symlinks?

I have an object which I would like to address using different keys without actually copying the object itself, like a symlink in Linux. Does Amazon S3 provide such a thing?
S3 does not support the notion of a symlink, where one object key is treated as an alias for a different object key. (You've probably heard this before: S3 is not a filesystem. It's an object store).
If you are using the static website hosting feature, there is a partial emulation of this capability, with object-level redirects:
http://docs.aws.amazon.com/AmazonS3/latest/dev/how-to-page-redirect.html
This causes requests for "object-a" to be greeted with a 301 Moved Permanently response, with the URL for "object-b" in the Location: header, which serves a similar purpose, but is of course still quite different. It only works if the request arrives at the website endpoint (not the REST endpoint).
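A minimal sketch of creating such a redirect object with boto3, assuming a client `s3` with permission to write to the bucket. `WebsiteRedirectLocation` is boto3's parameter for the `x-amz-website-redirect-location` header, and its value must begin with `/`, `http://`, or `https://`; the helper name is hypothetical.

```python
from typing import Any


def create_website_redirect(s3: Any, bucket: str, alias_key: str, target: str) -> None:
    """Create an empty object at `alias_key` that the website endpoint redirects to `target`."""
    if not target.startswith(("/", "http://", "https://")):
        target = "/" + target  # treat a bare key as a path on the same website
    s3.put_object(Bucket=bucket, Key=alias_key, Body=b"", WebsiteRedirectLocation=target)


# Usage (placeholder names):
# import boto3
# create_website_redirect(boto3.client("s3"), "my-bucket", "object-a", "object-b")
```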
If you use a reverse proxy (haproxy, nginx, etc.) in EC2 to handle incoming requests and forward them to the bucket, then of course you have the option at the proxy layer of rewriting the request URL before forwarding to S3, so you could translate the incoming request path to whatever you needed to present to S3. How practical this is depends on your application and motivation, but this is one of the strategies I use to modify where, in a particular bucket, an object appears, compared to where it is actually stored, allowing me to rewrite paths based on other attributes in the request.
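The rewriting step at the proxy layer boils down to mapping the incoming request path to the key actually stored in the bucket. A minimal Python sketch, assuming the alias table is a plain dictionary of exact keys and `/`-terminated prefixes; the aliases in the usage are hypothetical, and in an nginx or haproxy setup the same mapping would normally be expressed in the proxy's own configuration.

```python
from typing import Dict


def rewrite_path(request_path: str, aliases: Dict[str, str]) -> str:
    """Translate an incoming request path into the S3 key to fetch.

    Exact matches win; otherwise the longest alias ending in "/" that prefixes the
    path is swapped for its target prefix. Unknown paths pass through unchanged.
    """
    key = request_path.lstrip("/")
    if key in aliases:
        return aliases[key]
    for prefix in sorted(aliases, key=len, reverse=True):
        if prefix.endswith("/") and key.startswith(prefix):
            return aliases[prefix] + key[len(prefix):]
    return key


# Hypothetical table: "object-a" is served from "object-b",
# and everything under "latest/" comes from "releases/v2/".
# rewrite_path("/latest/app.js", {"object-a": "object-b", "latest/": "releases/v2/"})
```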
I had a similar question and needed a solution, which I describe below. While S3 does not support symlinks, you can do this in a way with the following:
echo "https://s3.amazonaws.com/my.bucket.name/path/to/a/targetfile" > file
aws s3 cp file s3://my.bucket.name/file
wget "$(curl -s https://s3.amazonaws.com/my.bucket.name/file)"
What this does is fetch the contents of the pointer object, which is really just the URL of the target file, and pass that to wget (curl can also be used instead of wget, redirecting its output to a file). Both objects must be readable by whoever runs the last command, for example publicly readable, since curl and wget send no AWS credentials here.
This is really just a workaround, though, as it's not a true symlink but a creative way to simulate one.
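The same pointer-file trick as a minimal Python sketch using only the standard library. Like the shell commands, it assumes both objects can be read without AWS credentials; `get` can be swapped for any function that returns a URL's bytes.

```python
import urllib.request
from typing import Callable


def http_get(url: str) -> bytes:
    """Plain unauthenticated GET, like curl/wget in the shell version."""
    with urllib.request.urlopen(url) as resp:
        return resp.read()


def follow_pointer(pointer_url: str, get: Callable[[str], bytes] = http_get) -> bytes:
    """Read a pointer object whose body is a URL, then fetch and return the target."""
    target = get(pointer_url).decode("utf-8").strip()  # strip the newline that echo adds
    if not target.startswith(("http://", "https://")):
        raise ValueError(f"pointer object does not contain a URL: {target!r}")
    return get(target)
```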
Symlinks no, but same object to multiple keys, maybe.
Please refer to Rodrigo's answer at Amazon S3 - Multiple keys to one object
If you're using S3 static website hosting, you can do it with the x-amz-website-redirect-location header.
If you're not using website hosting, you can set your own custom metadata header (e.g. x-amz-meta-KeyAlias) and handle the alias manually in your client.
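A minimal sketch of the manual approach with a boto3-style client, assuming the alias object was created with a custom metadata entry named `keyalias` (boto3 adds the `x-amz-meta-` prefix itself, e.g. `s3.put_object(Bucket=..., Key="alias", Body=b"", Metadata={"keyalias": "real/key"})`). The metadata name and helper are hypothetical, and every read costs an extra HEAD request.

```python
from typing import Any


def get_with_alias(s3: Any, bucket: str, key: str) -> bytes:
    """Return the object's bytes, following a custom `keyalias` metadata entry if present."""
    head = s3.head_object(Bucket=bucket, Key=key)
    # boto3 returns user metadata without the x-amz-meta- prefix; S3 stores the names in lowercase.
    meta = {name.lower(): value for name, value in head.get("Metadata", {}).items()}
    target = meta.get("keyalias", key)  # no alias: read the key itself
    return s3.get_object(Bucket=bucket, Key=target)["Body"].read()
```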

Allowing users to download files as a batch from AWS s3 or Cloudfront

I have a website that lets users search for music tracks and download the ones they select as MP3s.
The site is on my server, and all of the MP3s are on S3 and distributed via CloudFront. So far so good.
The client now wants users to be able to select a number of music tracks and download them all in bulk, as a batch, instead of one at a time.
Usually I would put all the files in a zip and then give the user a link to that new zip file. In this case, because the files are on S3, that would require first copying all the files from S3 to my web server, packing them into a zip, and then serving the download from my server.
Is there any way I can create a zip on S3 or CloudFront, or some other way to batch/group files into a zip?
Maybe I could set up an EC2 instance to handle this?
I would greatly appreciate some direction.
Best
Joe
I am afraid you won't be able to create the batches without additional processing. Firing up an EC2 instance might be an option to create a batch per user.
I am facing the exact same problem. So far the only thing I have been able to find is the AWS CLI's s3 sync command:
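If you do go the EC2 (or any server-side) route, the batch can be built by streaming each object straight into the archive instead of copying every file to disk first. A minimal Python sketch, assuming a boto3-style client whose `get_object(...)["Body"]` is a readable stream; the function name is hypothetical. It uses `ZIP_STORED` because MP3s are already compressed, and it flattens keys to their file names, so two tracks with the same name in different folders would collide (zipfile warns about duplicate names).

```python
import shutil
import zipfile
from typing import Any, BinaryIO, Iterable


def zip_s3_objects(s3: Any, bucket: str, keys: Iterable[str], out: BinaryIO) -> None:
    """Stream each object into one zip archive written to `out` (any writable binary stream)."""
    # ZIP_STORED: MP3s are already compressed, so deflating them would mostly burn CPU.
    with zipfile.ZipFile(out, "w", compression=zipfile.ZIP_STORED) as zf:
        for key in keys:
            body = s3.get_object(Bucket=bucket, Key=key)["Body"]
            arcname = key.rsplit("/", 1)[-1]  # keep just the file name inside the zip
            with zf.open(arcname, "w") as dest:
                shutil.copyfileobj(body, dest)  # copies in chunks; no whole file in memory
```

The finished archive can then be uploaded back to S3 and handed to the user as a (possibly presigned) link, or streamed directly as the HTTP response.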
https://docs.aws.amazon.com/cli/latest/reference/s3/sync.html
In my case, I am using Rails + its Paperclip addon which means that I have no way to easily download all of the user's images in one go, because the files are scattered in a lot of subdirectories.
However, if you can group your user's files in a better way, say like this:
/users/<ID>/images/...
/users/<ID>/songs/...
...etc., then you can solve your problem right away with:
aws s3 sync s3://<your_bucket_name>/users/<user_id>/songs /cache/<user_id>
Keep in mind that you'll have to give your server the proper AWS credentials so the CLI tools can work without prompting for them.
And that should sort you.
Additional discussion here:
Downloading an entire S3 bucket?
S3 serves each object through its own HTTP request.
So the way to achieve the same thing is to use multiple threads.
In the Java API, use TransferManager:
http://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/s3/transfer/TransferManager.html
You can get great performance with multiple threads.
There is no bulk download, sorry.
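The same idea in Python, as a minimal sketch rather than a TransferManager equivalent: one GET per object, run concurrently on a thread pool. It assumes a boto3 client (low-level boto3 clients are documented as thread-safe, so one can be shared across threads); the worker count is an arbitrary starting point, and keys are flattened to file names inside `dest_dir`.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Iterable, List


def download_all(s3: Any, bucket: str, keys: Iterable[str], dest_dir: str, workers: int = 8) -> List[str]:
    """Download keys in parallel (one request per object) and return the local paths in order."""
    os.makedirs(dest_dir, exist_ok=True)

    def fetch(key: str) -> str:
        path = os.path.join(dest_dir, os.path.basename(key))
        s3.download_file(bucket, key, path)  # boto3 signature: download_file(Bucket, Key, Filename)
        return path

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fetch, keys))  # map preserves the input order
```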

Stop an in-progress query-string-authorized download on Amazon S3?

With Amazon S3, can I stop a query-string-authorized download that is in progress?
Are there other file download services that provide such a feature?
I'm not aware of a built in way to do this. If I understand your goal, you want to potentially stop an HTTP response mid-stream based on some custom rules you have. Is that right?
If so, perhaps you could write a very thin proxy to S3 that encapsulates this logic. If you ran the proxy on EC2 you wouldn't incur any additional bandwidth fees.
The downside is that you would have to manage scaling the proxy (i.e. add more EC2 nodes based on traffic), so depending on your scaling requirements this could require a bit of work. But the proxy script itself would probably be fairly trivial. Something like:
Make a streaming HTTP request to S3 for the object
For each x-byte chunk of the response from S3:
    Check the auth condition. Continue if valid; break if not.
    Send the chunk to the caller
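The loop above as a minimal Python sketch: `source` is any readable byte stream (for example the `Body` of a boto3 `get_object` response), and `still_authorized` is a hypothetical callback standing in for whatever custom rule you re-check. Because the check runs once per chunk, a revoked download stops at the next chunk boundary and the client simply sees a truncated response.

```python
from typing import BinaryIO, Callable, Iterator


def guarded_stream(source: BinaryIO, still_authorized: Callable[[], bool],
                   chunk_size: int = 64 * 1024) -> Iterator[bytes]:
    """Yield chunks from `source`, re-checking authorization before sending each one."""
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            return  # end of the object
        if not still_authorized():
            return  # authorization revoked: stop mid-download
        yield chunk  # send the chunk to the caller
```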
I'm not aware of anyone that allows this. In general, the authentication is only checked once, when you begin downloading, but not thereafter.
Can you describe what you're trying to do more broadly?