How to handle large file processing on a Heroku application? - ruby-on-rails-3

I have a simple Rails app hosted on Heroku.
I'm trying to upload a 50MB file, and Heroku shuts down the request after 30 seconds - as expected from reading their docs.
How do I handle this situation?
I was thinking of creating a PHP script on my dedicated server, performing an AJAX request with the file to that script, and having it return a URL to the file asset. Then, when submitting the form in Rails, I would use that file path on the dedicated server.

You should have the user upload the file directly from the browser to AWS S3 or similar service. Here's a blog post on how to configure this. This means that the file will not have to travel through Heroku. It has the added benefit of making the file immediately available to all dynos if you've scaled your app to multiple dynos (versus being available on just the dyno that accepted the upload).
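As a rough sketch of the server-side half of that flow: your app generates a short-lived presigned POST that the browser form submits straight to S3. The example below uses Python's boto3 to show the shape of it (the Ruby aws-sdk gem exposes the same presigned-POST operation); the bucket name, key prefix, and size limit are placeholders.

```python
import uuid
import boto3

s3 = boto3.client("s3")

def presigned_upload_form(filename):
    # Each upload gets its own key so users can't overwrite each other's files.
    key = f"uploads/{uuid.uuid4()}/{filename}"
    return s3.generate_presigned_post(
        Bucket="my-app-uploads",          # placeholder bucket
        Key=key,
        ExpiresIn=3600,                   # form is valid for one hour
        Conditions=[["content-length-range", 0, 200 * 1024 * 1024]],
    )

# The returned dict contains 'url' and 'fields'; render them into the HTML
# form so the POST goes directly to S3 and never passes through a dyno.
```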

Related

s3 to ftp server without downloading to ec2

Is there a way (programmatically) to transfer a file from an S3 bucket to an external FTP server without downloading it to an EC2 instance?
More details:
1. I have a Django server running on EC2 which serves an Angular web app.
2. A user uploads a file to an S3 bucket using my web app, and once the upload is complete the web app sends a POST request containing the file object's S3 URL.
3. The Django server, upon receiving the POST request, may need to copy the file (uploaded to S3) to an external FTP server. The target FTP server may differ depending on the user who uploaded the file (each user group may have its own FTP server).
4. I understand that upon receiving the POST request, the Django server can download the file from S3 and then upload it to the appropriate target FTP server.
My question is: can I reduce the overhead on my EC2 instance in step 4 by somehow initiating a transfer from S3 to the target FTP server and getting a callback/notification when that transfer completes (success or error)?
Thanks.
You can create a Lambda function to do this.
A complete reference implementation is discussed here:
https://pythonvibes.wordpress.com/2016/12/09/ftp-and-sftp-through-lambda/
Hope it helps.
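For reference, here is a minimal sketch of such a function, assuming the caller passes the bucket, key, and target FTP credentials in the invocation event (the event shape and field names here are my own, not from the linked post). It streams the object from S3 straight to the FTP server, so the file never has to fit in the Lambda's memory or /tmp.

```python
import ftplib
import boto3

s3 = boto3.client("s3")

def lambda_handler(event, context):
    # Assumed event shape: {"bucket": ..., "key": ..., "ftp_host": ...,
    # "ftp_user": ..., "ftp_password": ...}
    body = s3.get_object(Bucket=event["bucket"], Key=event["key"])["Body"]

    ftp = ftplib.FTP(event["ftp_host"])
    ftp.login(event["ftp_user"], event["ftp_password"])
    try:
        # storbinary pulls the S3 stream in chunks and writes them to the
        # FTP data connection, so nothing is buffered on disk.
        ftp.storbinary("STOR " + event["key"].split("/")[-1], body)
    finally:
        ftp.quit()

    return {"status": "ok", "key": event["key"]}
```

If the Django server invokes the function synchronously (the RequestResponse invocation type), the return value doubles as the completion callback; with asynchronous invocation you would have the function publish its result to SNS/SQS or call back to a Django endpoint.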

Uploading large file (10+ GB) from Web client via azure web site to azure blob storage

I've got a bit of a problem uploading a really large file into Azure blob storage.
I have no problem uploading the file to the web site as a regular file upload into an upload directory.
I have no problem putting it into blob storage either, since chunking is handled internally.
The problem I'm having is that moving the large file from the upload directory to blob storage takes longer than the browser timeout, so the customer sees an error message.
As far as I know, the solution is to chunk-upload directly from the web browser.
But how do I deal with the block IDs? Since the web service is supposed to be stateless, I don't think I can keep a list of already-uploaded blocks around.
Also, can blob storage deal with out-of-order blocks?
And do I have to manage all that state manually?
Or is there an easier way, maybe just handing the blob service the HttpRequest input stream from the file upload POST request (multipart form data)?
Lots of Greetings!
You could move the file from the web server to blob storage asynchronously: return success for the original request as soon as the file is on the web server, then have JavaScript on the page poll your web server periodically to confirm that the file has made it to durable storage in blobs. Once the polling gets a success response from the web server, it can display success to the user.
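On the block-ID questions: blocks can be staged in any order (and in parallel), they only become part of the blob when you commit the block list, the order of that list is what defines the blob, and uncommitted blocks are discarded by the service after about a week. The list of block IDs is the only state you need, and it can live with whoever is doing the chunking rather than in the web service. A server-side sketch of the protocol using the Python azure-storage-blob package (connection string, container, and blob names are placeholders; the same semantics apply through the REST API or the .NET SDK):

```python
import base64
import uuid
from azure.storage.blob import BlobClient, BlobBlock

blob = BlobClient.from_connection_string(
    "<connection-string>", container_name="uploads", blob_name="big-upload.bin")

block_ids = []
with open("big-upload.bin", "rb") as source:
    while True:
        chunk = source.read(4 * 1024 * 1024)   # 4 MB blocks
        if not chunk:
            break
        # Block IDs must be base64 strings of equal length within one blob.
        block_id = base64.b64encode(uuid.uuid4().hex.encode()).decode()
        blob.stage_block(block_id=block_id, data=chunk)
        block_ids.append(BlobBlock(block_id=block_id))

# Committing the list (in the desired order) assembles the blob.
blob.commit_block_list(block_ids)
```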

Uploading large files with Carrierwave to S3

So I will need to upload large files (zip files that are a few GB large) to S3, and I would like Carrierwave to manage the download/distribution of those files.
Meaning, when a user pays, Carrierwave can automagically generate the dynamic URL and send it to them. I know how to do this already, but it just occurred to me that I have never uploaded files bigger than a few dozen MB via Carrierwave, much less a few GB, to S3.
Given the flakiness of HTTP connections, I figure this is a suboptimal way to do it.
I don't have that many files to upload (maybe 10 - 20 max), and users won't be uploading them. It will be a storefront where the customers will be buying/downloading the files, not uploading them.
It would be nice if there was a way for me to upload the files into my S3 bucket separately (say FTP, git, or some other mechanism) and then just link it to my app through Carrierwave in some way.
What's the best way to approach this?
Also, don't forget that you will encounter the Heroku 30-second timeout when you are uploading the file in the first place.
Don't worry though, there are options:
Direct upload - S3 supports direct uploads: you present a form which uploads straight to S3, bypassing Heroku, and you then receive a callback into your application with the uploaded file's details for you to process (https://github.com/dwilkie/carrierwave_direct)
Upload to S3 separately and then expose the bucket/folder in your application to connect the files to your models. We take this approach with a number of clients: they use Transmit (a Mac client) to upload large assets to S3 and then visit their app to link each asset to a Rails model.
Also, I'm pretty sure S3 is an HTTP-based service, so you're only going to be able to upload via HTTP.
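As a sketch of the "upload separately, then link it" route, a one-off script works just as well as a GUI client; this one uses Python's boto3 (bucket and key names are placeholders). Its transfer manager splits anything over the threshold into multipart parts, so a transient failure only costs the affected part rather than the whole multi-GB upload.

```python
import boto3
from boto3.s3.transfer import TransferConfig

s3 = boto3.client("s3")

# Files above 100 MB are uploaded as 64 MB parts over several threads.
config = TransferConfig(
    multipart_threshold=100 * 1024 * 1024,
    multipart_chunksize=64 * 1024 * 1024,
    max_concurrency=4,
)

s3.upload_file(
    "product-archive.zip",            # local file (placeholder name)
    "my-store-assets",                # placeholder bucket
    "products/product-archive.zip",   # the key your app will point at
    Config=config,
)
```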

Pyramid/Pylons: How to check if an uploaded file is complete in a POST request?

I'm building a web tool which allows users to upload PDFs to a server using their web browsers. The server is based on Python (Paste + Pyramid).
The problem I have right now is the following: if a user uploads a rather large file (let's say 100 MB) and cancels the upload before it completes, my handler code on the server is still called (instead of the request being aborted).
The problem is that request.POST['myfile'].file is incomplete when that happens. This effectively means that the PDF file is corrupted if I simply write it to some place on the server.
When I watch the server's log, it shows a "broken pipe" exception within the Paste server; however, I have no idea how to catch that exception and have it prevent my view/handler code from executing and storing the incomplete file.
It seems like the paster HTTP server does not correctly validate the uploaded form data and simply passes the request down the WSGI pipeline even though the connection (HTTP POST) was closed by the client.
I worked around this issue by setting up nginx to act as a reverse proxy. This also adds some security benefits, as it might be better tested than paster.
Update:
My main problem was that I was using runserver (the built-in web server of manage.py). After some trial and error we ended up using WSGI.
More specifically, uWSGI and Nginx as web server. Static content is served directly by Nginx while dynamic pages are piped through uWSGI and are handled by the Python web app.
Unless you are doing something fancy (like tracking upload progress, etc.), your Pylons controller should not be invoked until the entire file has been uploaded.
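If the handler does get invoked with a truncated body, one defensive option is to have the browser send the expected file size as an extra form field and compare it with what actually arrived before keeping anything. A rough Pyramid sketch, assuming a hypothetical "size" field and placeholder paths:

```python
import os
import shutil
from pyramid.view import view_config
from pyramid.httpexceptions import HTTPBadRequest

@view_config(route_name="upload_pdf", request_method="POST", renderer="json")
def upload_pdf(request):
    upload = request.POST["myfile"]               # FieldStorage-like object
    expected = int(request.POST.get("size", -1))  # size reported by the browser (assumed field)

    tmp_path = "/tmp/upload.partial"              # placeholder path
    with open(tmp_path, "wb") as out:
        shutil.copyfileobj(upload.file, out)

    if expected >= 0 and os.path.getsize(tmp_path) != expected:
        os.remove(tmp_path)                       # aborted upload: discard the fragment
        raise HTTPBadRequest("incomplete upload")

    os.rename(tmp_path, "/srv/pdfs/" + os.path.basename(upload.filename))
    return {"status": "stored"}
```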

How do I get a status report of all files currently being uploaded via a HTTP form on an Apache Server?

How do I get a status report of all files currently being uploaded via HTTP form based file upload on an Apache Server?
I don't believe you can do this with Apache itself. The upload looks like nothing more than a POST as far as Apache is concerned. There are modules and other servers that do special processing for uploads, so you may have some luck there. It would probably be easier to keep track of it in your application.
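As a rough illustration of the "track it in your application" route, sketched in Python/WSGI (the query-string parameter and the module-level progress dict are my own, and a real deployment would need shared storage such as memcached when running multiple processes): wrap wsgi.input so every read performed while the framework parses the multipart body is counted against an upload ID taken from the query string.

```python
progress = {}  # upload_id -> (bytes_seen, content_length)

class CountingInput:
    """Wraps wsgi.input and records how many bytes have been consumed so far."""

    def __init__(self, stream, upload_id, total):
        self.stream, self.upload_id, self.total = stream, upload_id, total
        self.seen = 0

    def _count(self, data):
        self.seen += len(data)
        progress[self.upload_id] = (self.seen, self.total)
        return data

    def read(self, size=-1):
        return self._count(self.stream.read(size))

    def readline(self, size=-1):
        return self._count(self.stream.readline(size))

def track_uploads(app):
    """WSGI middleware: count upload progress for POSTs that carry an id in the query string."""
    def middleware(environ, start_response):
        upload_id = environ.get("QUERY_STRING", "")
        if environ.get("REQUEST_METHOD") == "POST" and upload_id:
            total = int(environ.get("CONTENT_LENGTH") or 0)
            environ["wsgi.input"] = CountingInput(
                environ["wsgi.input"], upload_id, total)
        return app(environ, start_response)
    return middleware
```

A separate GET endpoint can then look up progress[upload_id] and return it to the browser for a progress bar.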
Check out SWFUpload; it uses Flash (in a nice way) to assist with managing multiple uploads.
There are events you can monitor for how many files of a set have been uploaded.