Heroku taking too much time to upload to S3 (carrierwave and fog) - ruby-on-rails-3

I have a Rails app. When the user calls an action from the web, the Heroku app generates a random image, uploads that image to S3, and then returns the public URL of that image (stored in S3) to the user...
The "upload that image to S3" step takes literally ages: 20 seconds for a 27 KB file, which shouldn't be possible... I really don't know how to fix this, because it does work; it is uploading to S3, just far too slowly...
I was thinking of just storing the image in the Rails app's tmp folder, but I don't know how long it would stay there before being deleted... Any ideas?
thanks!

If the images are not temporary, you should not store them on Heroku: the dyno filesystem is ephemeral, so they will be deleted when the dyno restarts or goes inactive.
I think you should investigate moving the S3 upload into an asynchronous background job. There are several options available: DelayedJob, Resque, and Sidekiq are some of the more popular gems for background processing. You will also have to add a worker dyno on Heroku to process your background jobs.
By moving the upload into a background job, your users no longer face that huge wait, because the image upload happens separately from the web request.
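As a rough sketch (not the asker's actual code), a Sidekiq worker for this could look something like the following; the worker, model, and attribute names are hypothetical placeholders.

    # Minimal Sidekiq worker that performs the S3 upload outside the web request.
    # ImageUploadWorker, GeneratedImage and the mounted `file` uploader are
    # hypothetical names; adapt them to your own app.
    class ImageUploadWorker
      include Sidekiq::Worker

      def perform(image_id, tmp_path)
        image = GeneratedImage.find(image_id)
        File.open(tmp_path) do |f|
          image.file = f   # assign to the mounted CarrierWave uploader
          image.save!      # saving triggers store!, i.e. the S3 upload via fog
        end
      ensure
        File.delete(tmp_path) if File.exist?(tmp_path)
      end
    end

    # In the controller: write the generated image to a temp file, enqueue the
    # upload, and respond immediately (the client can poll for the public URL).
    # ImageUploadWorker.perform_async(image.id, tmp_path)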

Related

NSURLSessionUploadTask & API Calls

Problem:
I am building a photo organization/backup iOS app, which requires users' photos from their device galleries to be uploaded to our server.
A user creates a profile and logs into the app, and their gallery is scanned for images. The uploads happen in batches of around 10 files as multipart form-data requests. After each batch is uploaded, I need to make an API call to our server to commit the upload that just finished. The server could also be making its own commits, such as creating albums based on image content, after processing previously committed batches. If that happens, the client gets an out-of-sync response from the server while committing the upload, and it has to fetch the server's commits first before committing its own. The upload of the next batch of 10 images is then scheduled.
I am using NSURLSessionUploadTask for this purpose. When an upload finishes while the app is suspended, the app gets woken up, and I may be required to do either steps 1, 2, 3 or just 2, 3 below:
1. Fetch commits done by the server for previously committed uploads
2. Commit the upload that just finished
3. Schedule the next batch of 10 images for upload
====================================================
This isn't working for us. The app doesn't have enough time to execute steps 1, 2, 3 (or 2, 3) when it is woken up after the upload finishes. Requesting background execution time can't take us beyond 180 seconds, because Apple caps it at that.
I cannot just keep uploading batches without committing them, because the server has a buffer of 100 images before it starts overwriting: if I continue uploading without committing, the server starts to overwrite previously uploaded images, losing data.
I have also observed that even when I try uploading multiple batches of images without performing any commits, the first task I schedule as an NSURLSessionUploadTask in the background, as soon as the ongoing upload finishes, is not picked up by the upload daemon nsurlsessiond immediately. Sometimes it picks it up after 5 minutes, sometimes after 10. Throughout the process I have a good Wi-Fi connection, and my NSURLSession is configured accordingly.
I had read on Stack Overflow that if the nsurlsessiond queue becomes empty, it doesn't pick up the next task immediately. So, just to experiment, I submitted 3-4 batches for upload and put the app into the background, but the behavior is the same: the second upload doesn't get picked up for 5-10 minutes, and the same goes for the third and fourth.
====================================================
I have seen Google Photos upload continuously without using location services; Dropbox also does it. Their commit architectures may be different from ours, but currently I am not even able to get 3-4 batches of images to upload continuously, even after ignoring the whole commit side of things where I have to make extra API calls.
Can you guide us in analyzing this situation so that I can come up with a solution for background uploads in our app?
Thanks again.

Using AWS S3 for photo storage

I'm going to be using S3 to store user-uploaded photos. Obviously, I won't be serving the image files to user agents without resizing them down. However, no single size will do, as some thumbnails will be smaller than other, larger previews. So I was thinking of defining a standard set of dimensions, scaling from 16x16 at the lowest to 1024x1024 at the highest. Is this a good way to solve the problem? What if I need a new size later on? How would you solve this?
Pre-generating different sizes and storing them in S3 is a fine approach, especially if you know what sizes you need, are likely to use all of the sizes for all of the images, and don't have so many images and sizes that the storage cost is excessive.
Here's another approach I use when I don't want to pre-generate and store all the different sizes for every image, or when I don't know what sizes I will want to use in the future:
1. Store the original size in S3.
2. Run a web server that can generate any desired size from the original image on request.
3. Stick a CDN (CloudFront) in front of the web server.
Now, your web site or application can request a URL like /16x16/someimage.jpg from CloudFront. The first time this happens, CloudFront will get the resized image from your web server, but then CloudFront will cache the image and serve it for you, greatly reducing the amount of traffic that hits your web server.
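For illustration, here is a minimal sketch of such a resize-on-request server in Ruby, using Sinatra, MiniMagick and the aws-sdk-s3 gem; the bucket name, route shape, and caching policy are assumptions, not part of the original answer.

    # Resize-on-request server: fetch the original from S3, resize, and let
    # CloudFront cache the result. This code only runs on CloudFront cache misses.
    require "sinatra"
    require "mini_magick"
    require "aws-sdk-s3"

    S3     = Aws::S3::Client.new
    BUCKET = "my-original-images"   # hypothetical bucket holding the originals

    # e.g. GET /16x16/someimage.jpg
    get "/:dimensions/:key" do
      original = S3.get_object(bucket: BUCKET, key: params[:key]).body.read
      image = MiniMagick::Image.read(original)
      image.resize(params[:dimensions])           # e.g. "16x16", "1024x1024"

      content_type "image/jpeg"
      cache_control :public, max_age: 31_536_000  # let CloudFront keep it for a year
      image.to_blob
    end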
Here's a service that resizes images from arbitrary URLs, serving them through CloudFront: http://filter.to
This sounds like a good approach. Depending on your application, you should define a set of thumbnail sizes that you always generate, but also store the original user file in case your requirements change later. When you want to add a new thumbnail size, you can iterate over all the original files and generate the new thumbnails from them. This option gives you flexibility for later.
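In a Rails/CarrierWave setup like the one in the first question, this fixed-set-of-sizes approach might look roughly like the sketch below; the uploader class, version names, and dimensions are just examples.

    # CarrierWave uploader with a fixed set of thumbnail versions. The original
    # file is stored untouched, so new sizes can be added later.
    class PhotoUploader < CarrierWave::Uploader::Base
      include CarrierWave::MiniMagick
      storage :fog   # S3 via fog, as in the original question

      version :thumb do
        process resize_to_fill: [16, 16]
      end

      version :preview do
        process resize_to_limit: [1024, 1024]
      end
    end

    # Adding a new size later: declare the version, then rebuild from the originals.
    # Photo.find_each { |p| p.image.recreate_versions!(:new_size) }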

Server Load & Scalability for Massive Uploads

Users will upload millions of audio items to my server. The current app is designed to receive the content, transcode it, and finally send it by FTP to storage servers. I want to know:
1. Can the app server bear the enormous load of user tasks like commenting, uploading, and transcoding after scaling out to more servers (to carry the web app load)?
2. If the answer to the question above is yes, is that the correct and best approach? A better architecture might be to hand transcoding off to the storage servers, wait for the job to finish, and then send the response back to the app server, but at the same time that has more complexity and is less secure.
3. What is the common approach for this type of website?
4. If I send the upload and transcoding jobs to the storage servers, is that compatible with enterprise storage technologies and long-term scalability?
5. The current app is based on PHP. Is it possible to move the tmp folder to other servers to relieve the upload load?
Thanks for the answer. Regarding the tmp folder in question 5: I mean Apache's tmp folder. As far as I know, all uploaded files are stored in Apache's tmp folder before being moved to their final storage destination (e.g. the storage servers or any other solution). I was wondering whether this is a rule for Apache, so that all uploaded files must first land on the app server. If so, how can I control, scale, and redirect this massive storage load to a temporary storage server? I mean a server or storage solution that plays the role of Apache's tmp folder, just hosting uploaded files before they are sent to their final storage location. I have studied and designed everything around scaling the database, storage, load balancing, memcache, etc., but this is one of my unsolved questions: in a scaled architecture, where do newly arrived user files land? And what is the common solution for this? (In a one-box setup all files sit temporarily in Apache's tmp dir, but what about a massive amount of content in a scaled system?)
Regards
You might want to take a look at the Viddler architecture: http://highscalability.com/blog/2011/5/10/viddler-architecture-7-million-embeds-a-day-and-1500-reqsec.html
Since I don't feel I can answer this (I wanted to add a comment, but my text was too long), some thoughts:
If you are building such a large system (as it sounds), you should run performance tests to see how many concurrent connections/uploads (or whatever else) your architecture can handle. As I always say: if you don't know, the answer is "no, it can't".
I think the best way to deal with heavy load (that is, a lot of uploads, each requiring a blocked thread on the app server) is not to use the app server to handle the file uploads at all. Perform all your heavy operations (transcoding) asynchronously, e.g. queue the uploaded files and process them afterwards. In any case, the application server should not wait for the response of the transcoding system: just tell the user that their file is going to be processed, and send them a message (or whatever) when it's finished. You can use something like gearman for that.
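As a rough Ruby-flavoured sketch of that pattern (the answer mentions gearman, but any job queue works the same way; the class, queue, and helper names here are made up):

    # Rough sketch only: enqueue the heavy transcoding work and respond right away.
    # Upload, Transcoder, and the queue name are hypothetical placeholders.
    class TranscodeJob
      include Sidekiq::Worker
      sidekiq_options queue: "transcoding"

      def perform(upload_id)
        upload = Upload.find(upload_id)
        Transcoder.run(upload.source_path)   # the expensive part, off the app server
        upload.update!(status: "processed")
        # Notify the user here (email, push, etc.) that processing has finished.
      end
    end

    # In the upload endpoint: accept the file, enqueue, and answer immediately.
    # TranscodeJob.perform_async(upload.id)
    # render json: { status: "queued" }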
I would look at existing architectures that also have to handle a lot of uploads/conversions (e.g. Flickr): just go to SlideShare and search for "flickr" or "scalable web architecture".
I do not really understand this one, but I would dedicate servers to specific tasks (e.g. application servers, database servers, transcoding servers, storage): each server should do what it does best.
I am afraid I don't know what you are talking about when you say tmp folder.
Good luck

Background jobs on Amazon Web Services

I am new to AWS, so I need some advice on how to correctly set up background jobs. I've got some data (about 30 GB) that I need to:
a) download from some other server; it is a set of zip archives with links within an RSS feed
b) decompress into S3
c) process each file, or sometimes a group of decompressed files, perform data transformations, and store the results in SimpleDB/S3
d) repeat forever depending on RSS updates
Can someone suggest a basic architecture for proper solution on AWS?
Thanks.
Denis
I think you should run an EC2 instance to perform all the tasks you need and shut it down when done. That way you pay only for the time EC2 runs. Depending on your architecture, though, you might need to keep it running all the time; small instances are very cheap anyway.
download from some other server; it is a set of zip archives with links within an RSS feed
You can use wget
decompress into S3
Try to use s3-tools (github.com/timkay/aws/raw/master/aws)
process each file or sometime group of decompressed files, perform transformations of data, and store it into SimpleDB/S3
Write your own bash script
repeat forever depending on RSS updates
One more bash script to check updates + run the script by Cron
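If you'd rather keep the whole pipeline in one script instead of wiring wget, s3-tools and cron together by hand, here is a rough Ruby sketch of the same a)-c) steps; the feed URL, bucket name, and process_file step are placeholders, not part of the answer.

    # Rough sketch of steps a)-c) in Ruby: fetch the RSS feed, download each
    # zip archive, decompress it in memory, transform, and upload to S3.
    # FEED_URL, the bucket name and process_file are hypothetical placeholders.
    require "rss"
    require "open-uri"
    require "zip"          # rubyzip gem
    require "aws-sdk-s3"

    FEED_URL = "https://example.com/archives.rss"
    BUCKET   = Aws::S3::Resource.new.bucket("my-processed-data")

    # Placeholder for your own per-file transformation.
    def process_file(name, data)
      data
    end

    RSS::Parser.parse(URI.open(FEED_URL).read).items.each do |item|
      archive = URI.open(item.link)               # a) download the zip archive
      Zip::File.open_buffer(archive) do |zip|     # b) decompress
        zip.each do |entry|
          result = process_file(entry.name, entry.get_input_stream.read)
          BUCKET.object("processed/#{entry.name}").put(body: result)  # c) store
        end
      end
    end
    # d) run this script from cron to repeat on RSS updates.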
First off, write some code that does a) through c). Test it, etc.
If you want to run the code periodically, it's a good candidate for using a background process workflow. Add the job to a queue; when it's deemed complete, remove it from the queue. Every hour or so add a new job to the queue meaning "go fetch the RSS updates and decompress them".
You can do it by hand using AWS Simple Queue Service or any other background job processing service / library. You'd set up a worker instance on EC2 or any other hosting solution that will poll the queue, execute the task, and poll again, forever.
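For the SQS route, the worker loop could look roughly like this (the queue URL, message format, and handle_job step are assumptions for illustration):

    # Rough sketch of an SQS-backed worker loop using the aws-sdk-sqs gem.
    require "aws-sdk-sqs"
    require "json"

    QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/rss-jobs" # hypothetical
    sqs = Aws::SQS::Client.new

    # Placeholder for the real fetch/decompress/process work.
    def handle_job(job)
      puts "processing #{job.inspect}"
    end

    loop do
      # Long polling: wait up to 20 seconds for a message instead of hammering the API.
      resp = sqs.receive_message(queue_url: QUEUE_URL, wait_time_seconds: 20,
                                 max_number_of_messages: 1)
      resp.messages.each do |msg|
        handle_job(JSON.parse(msg.body))   # e.g. { "task" => "fetch_rss" }
        # Delete only after the work succeeds, so failed jobs become visible again.
        sqs.delete_message(queue_url: QUEUE_URL, receipt_handle: msg.receipt_handle)
      end
    end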
It may be easier to use Amazon Simple Workflow Service, which seems to be intended for what you're trying to do (automated workflows). Note: I've never actually used it.
I think deploying your code on an Elastic Beanstalk instance will do the job for you at scale, since you are processing a huge chunk of data here and a single EC2 instance might max out its resources, mostly memory. The AWS SQS idea of batching the processing will also help optimize the process and effectively manage timeouts on your server side.

High load image uploader/resizer in conjunction with Amazon S3

We are running a product-oriented service that requires us to download and resize many thousands of photos from various web sources every day, then upload them to an Amazon S3 bucket and serve them with CloudFront...
The problem is that downloading and resizing are really resource-consuming, and it would take many hours to process them all...
What we are looking for is a service that would do this for us quickly and, of course, at a reasonable price...
Does anybody know of such a service? I tried to Google it, but I don't really know how to phrase the search to get what I need.