Stop an in-progress query-string-authorized download on Amazon S3?

With Amazon S3, can I stop a query-string-authorized download that is in progress?
Are there other file download services that provide such a feature?

I'm not aware of a built-in way to do this. If I understand your goal, you want to potentially stop an HTTP response mid-stream based on some custom rules you have. Is that right?
If so, perhaps you could write a very thin proxy to S3 that encapsulates this logic. If you ran the proxy on EC2 you wouldn't incur any additional bandwidth fees.
The downside is that you would have to manage scaling the proxy (i.e. add more EC2 nodes based on traffic), so depending on your scaling requirements this could require a bit of work. But the proxy script itself would probably be fairly trivial. A minimal sketch in Python (assuming the requests library; auth_condition_holds stands in for whatever custom check you have):
    import requests

    def stream_from_s3(signed_url, chunk_size=64 * 1024):
        response = requests.get(signed_url, stream=True)  # streaming HTTP request to S3
        for chunk in response.iter_content(chunk_size):   # each chunk of the response
            if not auth_condition_holds():                # check auth condition
                break                                     # stop mid-stream if no longer valid
            yield chunk                                   # send chunk to the caller

I'm not aware of anyone who allows this. In general, authentication is checked only once, when you begin the download, and not thereafter.
Can you describe what you're trying to do more broadly?

Related

When to use S3 Presigned Url vs Upload through Backend

I read Amazon s3: direct upload vs presigned url and was wondering when use a direct upload from the backend to s3 vs a presigned url.
I understand that the direct upload requires extra bandwidth (user -> server -> S3), but I believe it's more secure. Do the bandwidth savings with the presigned URL justify the slight security drawback (i.e. with things like user messages)?
I am also checking the file types on the backend (via magic numbers), which I think is incompatible with presigned URLs. Should this reason alone rule out presigned URLs?
In addition, I have a file size limit of 5 MB (not sure if this is considered large?). Would there be a significant difference in performance and scalability (e.g. thousands to millions of files sent per hour) between presigned URLs and direct upload?
Your question sounds like you're asking for opinion, so mine is as follows:
It depends on how secure you need it to be and what you consider is safe. I was wondering about the same questions and I believe that in my case, in the end, it is all secured by SSL encryption anyway (which is enough for me), so I prefer to save my servers bandwidth and memory usage.
Once more it depends on your own system requirements. Anyway, if any upload fails, S3 will return an error cause after the request fails. If checking the file type is a must and checking it on your backend is the only way to do it, you already have your answer.
In a scenario with millions of files (close to 5 MB each) being sent every hour, I would recommend uploading directly to S3 via presigned URLs, because receiving and resending every file through your own servers would consume a lot of RAM.
There are a few more advantages of uploading directly to S3, as you can read here.
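For what it's worth, S3 can enforce the size limit itself when you use a presigned POST, even though content checks like magic numbers still have to happen after the upload (e.g. in an S3-triggered Lambda). A minimal boto3 sketch, with placeholder bucket and key names:

    import boto3

    s3 = boto3.client("s3")

    # The content-length-range condition makes S3 reject uploads over 5 MB,
    # so the size limit holds even without routing files through your server.
    post = s3.generate_presigned_post(
        Bucket="my-upload-bucket",      # placeholder
        Key="uploads/example.jpg",      # placeholder
        Conditions=[["content-length-range", 0, 5 * 1024 * 1024]],
        ExpiresIn=300,                  # URL valid for 5 minutes
    )
    # Return post["url"] and post["fields"] to the client, which POSTs the file.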

S3 download - SDK vs HTTP request inside lambda function

I'm looking for a benchmark or article explaining which is faster.
Inside a Lambda function, is it faster to:
A) Download an S3 file through CloudFront with a regular request module (i.e. hit the CloudFront URL with request or axios and download it)
B) Use the AWS SDK to get the file through the getObject method
I've been googling this for a while now and I haven't quite gotten to the answer, and I'm hoping I can skip benchmarking it if someone else already has.
I'm talking about pretty small files, like fonts or images.
And the root of the question is: I believe AWS uses some sort of backbone network in some cases. Given that Lambda runs inside their system, as S3 does, maybe requesting the image over the public internet (HTTP) is not that fast.
Thanks!
In the same region it should be faster to use the SDK to download it. If it's not in the same region, you might want to replicate it so that it is.
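For reference, a minimal Lambda handler using the SDK path (Python/boto3 here; bucket and key are placeholders). A same-region get_object call stays on AWS's network rather than going out over the public internet:

    import boto3

    s3 = boto3.client("s3")

    def handler(event, context):
        # Option B: fetch the object via the SDK instead of a CloudFront URL.
        obj = s3.get_object(Bucket="my-assets-bucket", Key="fonts/my-font.woff2")
        data = obj["Body"].read()
        return {"statusCode": 200, "bytesRead": len(data)}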

Amazon EC2 Load Testing

I am designing an AWS deployment solution for a new dynamic website project. I have acquired an EC2 instance for testing the environment. I need some help on how to do load testing on an EC2 instance to determine how many HTTP requests it can safely handle... P.S. I am new to the AWS platform.
Thanks...
RedLine offers an EC2 Load Testing solution that will automate the distribution of load tests on your own EC2 instances.
Late to the party but could help someone in the future:
A possible tool for load tests, stress tests, whatever you may call them, is Apache JMeter, but there are plenty of alternatives.
A simple starting setup, further explained in this excellent tutorial on DigitalOcean, can consist of a Thread Group containing an HTTP Request Sampler and a View Results in Table Listener. The Thread Group is used to configure the number of "clients" you want to simulate. The HTTP Request Sampler is used to configure the server's properties (hostname, path, etc.). The View Results in Table Listener outputs a handy CSV file that can be used to calculate means, compare different types of EC2 instances, and so on.
JMeter is a beautiful program with a GUI that can be run on your local workstation, producing an XML test plan that can then be executed on another machine, such as an EC2 instance. You can even make simple manual edits to the XML file on your server afterward, if necessary.
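If it helps, the headless run on the remote machine looks roughly like this (plan and output file names are placeholders):

    # Run the test plan in non-GUI mode and write results to a CSV file
    jmeter -n -t test-plan.jmx -l results.csv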
Take a look at Amazon's testing policy to make sure you're not doing anything illegal.
A couple of quick points:
Set the environment up exactly like it's supposed to run. If there's a database involved, you'll want to involve that in the testing too. Synthetic CPU-based benchmarks won't help you much, since normally very little of the time spent replying to HTTP requests is actual CPU time.
A recommendation is to use a service for the benchmarking. Setting up load testing is not without its complexities, and unless you consider benchmarking your core business, you're probably better off using something like Neustar to load and measure your site (there are many such services; that one isn't necessarily the best fit for you, it's just one pulled from memory).
Of course you can set up a load test yourself, but getting it done right is not something that can be described in a few sentences. There are very well paid people who do that for a living :)
There is good experience with the curl-loader (aka Davilka) tool, including on Amazon EC2:
http://curl-loader.sourceforge.net

Correct Server Schema to upload pictures in Amazon Web Services

I want to upload pictures to AWS S3 from the iPhone. Every user should be able to upload pictures, but the pictures must remain private to each user.
My question is very simple. Since I have no real experience with servers I was wondering which of the following two approaches is better.
1) Use some kind of token vending machine system to grant the user access to the AWS S3 bucket to upload directly.
2) Send the picture to the EC2 Servlet and have the virtual server place it on the S3 storage.
Edit: I would also need to retrieve the pictures. Should I do that directly or through the servlet?
Thanks in advance.
Hey, personally I don't think it's a good idea to use a token vending machine to upload the data directly from the iPhone, because it's much harder to control access privileges, etc. If you have a chance, use EC2 and a servlet, but that will add costs to your solution.
Also, when dealing with S3 you need to take into consideration that some files are not available right after you save them. Look at this answer from the S3 FAQ.
For retrieving data directly from S3 you will need to deal with the privileges issue again. Check the access model for S3, but again it's probably easier to manage the access for non public files via the servlet. The good news is that there is no data transfer charge for data transferred between EC2 and S3 within the same region.
Another important point in favor of the latter solution:
High performance in handling load and network speeds within the Amazon ecosystem. With direct uploads, the client would have to handle the complex asynchronous operations of multipart uploads, etc., instead of focusing on the presentation and rendering of the image.
The servlet hosted on EC2 would be way more powerful than what you can do on your phone.
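As a rough illustration of option 2 (sketched in Python with boto3 rather than a Java servlet, since the flow is the same; bucket name and key layout are placeholders), the server writes each picture under a per-user prefix so the privacy rules live in one place:

    import boto3

    s3 = boto3.client("s3")

    def store_user_picture(user_id, picture_bytes):
        # The bucket stays non-public; only this server writes to it, and
        # keying by user ID keeps each user's pictures separate and private.
        s3.put_object(
            Bucket="my-pictures-bucket",        # placeholder
            Key=f"users/{user_id}/photo.jpg",   # placeholder layout
            Body=picture_bytes,
            ContentType="image/jpeg",
        )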

Adding decision logic to Apache's mod_proxy_balancer with Memcache

What I am trying to achieve is to have Apache's mod_proxy_balancer check if a request was already made using a Memcache store.
Basically:
Streaming media request comes in.
Check in Memcache whether that streaming media has already been served (and by which server).
If so, check whether that streaming media server can handle another request.
If it can, send the request to that streaming media server.
If not, send the request to the next streaming media server in line.
Store the key:value pair in Memcache.
My questions are:
Does mod_proxy_balancer already do this in some way?
Is there any way to make Apache a content-aware load balancer?
Any other suggestions would be greatly appreciated too: other software, another approach, etc.
Cheers.
Looking at mod_proxy_balancer.c, one could, as suggested in the comments in the file, add additional lbmethods, something along the lines of "bymemcached_t" or "bymemcached_r", where the t and r endings mirror the existing "bytraffic" and "byrequests" methods respectively. We would run the pseudocode above and, if no entry is found, fall back to the other methods and save the result in the Memcached store.
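The lookup itself might look roughly like this (a Python sketch using pymemcache; can_handle_more and next_in_line are hypothetical stand-ins for the capacity check and the fallback lbmethod):

    from pymemcache.client.base import Client

    mc = Client(("localhost", 11211))         # assumed local Memcached instance

    def pick_server(media_key, servers):
        prev = mc.get(media_key)              # which server served this media before?
        if prev is not None and can_handle_more(prev.decode()):
            return prev.decode()              # sticky: reuse that server
        server = next_in_line(servers)        # hypothetical fallback (round robin, etc.)
        mc.set(media_key, server.encode())    # remember the choice
        return server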
In my research I came across HAProxy, which, according to its documentation, does exactly what I want using the 'uri' balance algorithm, just without Memcached. That is fine for my purposes.
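For anyone landing here, the relevant HAProxy configuration is just a few lines (server names and addresses below are placeholders); 'balance uri' hashes the request URI, so the same media URL keeps going to the same backend:

    backend streaming_media
        balance uri                        # hash the URI; same URI -> same server
        server media1 10.0.0.1:8080 check  # placeholder addresses
        server media2 10.0.0.2:8080 check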