How to build a daemon to encode video files on S3?

I am interested in running a daemon that processes user-uploaded video files, encodes them in an optimal format, and adds some watermarks.
I was considering services such as Zencoder, Encoding.com and Amazon's encoding service, but some lack overlay capabilities and others are just too expensive for our (big) volumes.
I want to build a daemon that encodes videos stored on S3 as soon as users upload them.
The solution I thought of would be Python on Heroku, with Celery as a task queue to keep track of the encoded files and ffmpeg to do the actual work. However, I ran into trouble compiling ffmpeg for Heroku (I need libass support, so the basic ffmpeg binaries aren't enough).
What approach/technology stack would you consider for this mini-project?
Thanks!
Yuval
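As a minimal sketch of the ffmpeg step described above, a worker such as a Celery task could build a command like the one below and run it with subprocess.run for each uploaded file. It assumes an ffmpeg build with libx264. The file names, overlay position and quality settings are illustrative placeholders, not tuned values.

```python
import shlex
from typing import List


def build_watermark_command(
    src: str,
    watermark: str,
    dst: str,
    x: int = 10,
    y: int = 10,
    crf: int = 23,
) -> List[str]:
    """Return an ffmpeg argv that overlays a PNG watermark and re-encodes to H.264/AAC.

    Assumes ffmpeg was built with libx264; all defaults are illustrative.
    """
    return [
        "ffmpeg",
        "-y",                        # overwrite dst if it already exists
        "-i", src,                   # input 0: the uploaded video
        "-i", watermark,             # input 1: the watermark image
        # Draw input 1 on top of input 0, x/y pixels from the top-left corner.
        "-filter_complex", f"[0:v][1:v]overlay={x}:{y}",
        "-c:v", "libx264", "-crf", str(crf), "-preset", "medium",
        "-c:a", "aac",
        "-movflags", "+faststart",   # put the index up front for web playback
        dst,
    ]


if __name__ == "__main__":
    cmd = build_watermark_command("upload.mov", "logo.png", "encoded.mp4")
    print(shlex.join(cmd))
    # A Celery task would execute it with subprocess.run(cmd, check=True).
```

Returning an argument list rather than a shell string avoids quoting problems with user-supplied file names.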

Have you tried ffencoderd? It is a basic Perl daemon that uses ffmpeg. It accepts encoding jobs either through a SOAP interface or from XML job files created locally in a directory on the server, and it provides a multimedia repository through an HTTP server.
Hope this helps.
(By the way, I am the developer :P, but seriously, I am just trying to help.)
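For illustration only, here is a minimal sketch of the "drop a job file into a directory" pattern the answer describes. The XML layout (for example <job><input>in.mov</input><output>out.mp4</output></job>) and the convention of renaming processed files to .done are hypothetical assumptions, not ffencoderd's actual job format.

```python
import time
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import Callable, Dict, List


def parse_job(path: Path) -> Dict[str, str]:
    """Read a job file in the HYPOTHETICAL layout <job><input/><output/></job>."""
    root = ET.parse(path).getroot()
    return {
        "input": root.findtext("input", ""),
        "output": root.findtext("output", ""),
    }


def poll_once(job_dir: Path, handle: Callable[[Dict[str, str]], None]) -> List[str]:
    """Process every *.xml job in job_dir once; return the names handled."""
    processed = []
    for job_file in sorted(job_dir.glob("*.xml")):
        handle(parse_job(job_file))                     # e.g. run the ffmpeg command
        job_file.rename(job_file.with_suffix(".done"))  # so it is not picked up again
        processed.append(job_file.name)
    return processed


def run_forever(job_dir: Path, handle: Callable[[Dict[str, str]], None],
                interval: float = 5.0) -> None:
    """Daemon loop: poll the job directory every `interval` seconds."""
    while True:
        poll_once(job_dir, handle)
        time.sleep(interval)
```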

Related

AWS S3 and AjaXplorer

I'm using AjaXplorer to give my clients access to a shared directory stored in Amazon S3. I installed the SDK, configured the plugin (http://ajaxplorer.info/plugins/access/s3/) and can upload and download files, but the upload size is limited by my host's PHP limit, which is 64 MB.
Is there a way to upload directly to S3, bypassing my host, to improve speed and be subject only to S3's limits rather than PHP's?
Thanks
I don't think that is possible, because the file is first uploaded to the PHP script on your server, which then transfers it to the bucket.
Maybe. The only way around this is to use some jQuery or plain JavaScript that bypasses your server/PHP entirely and streams directly into S3. This involves enabling CORS and creating a signed policy on the fly to allow your uploads, but it can be done!
I ran into just this issue with some inordinately large media files for our website users that I no longer wanted to host on the web servers themselves.
The best place to start, IMHO, is here:
https://github.com/blueimp/jQuery-File-Upload
A demo is here:
https://blueimp.github.io/jQuery-File-Upload/
This was written to upload and write files to a variety of locations, including S3. The only tricky parts are getting the MIME type right for each upload and getting your bucket policy the way you need it.
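The "signed policy on the fly" mentioned above can be sketched with only the Python standard library. This assumes AWS Signature Version 4 and a bucket that already has a CORS rule allowing POST from your site; the bucket name, key prefix, region and credentials are placeholders. The browser form posts these fields, plus a per-file Content-Type field and the file itself (last), to the bucket's endpoint, so the upload never passes through PHP.

```python
import base64
import datetime as dt
import hashlib
import hmac
import json
from typing import Dict


def _hmac(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def presigned_post_fields(
    bucket: str,
    key_prefix: str,
    access_key: str,
    secret_key: str,
    region: str,
    now: dt.datetime,              # UTC time of signing
    expires_in: int = 3600,
    max_bytes: int = 5 * 1024**3,  # 5 GB, the single-request S3 upload limit
) -> Dict[str, str]:
    """Build the hidden form fields for a browser POST straight to S3 (SigV4)."""
    date = now.strftime("%Y%m%d")
    amz_date = now.strftime("%Y%m%dT%H%M%SZ")
    credential = f"{access_key}/{date}/{region}/s3/aws4_request"
    expiration = (now + dt.timedelta(seconds=expires_in)).strftime("%Y-%m-%dT%H:%M:%SZ")

    policy = {
        "expiration": expiration,
        "conditions": [
            {"bucket": bucket},
            ["starts-with", "$key", key_prefix],     # uploads confined to this prefix
            ["starts-with", "$Content-Type", ""],    # any MIME type; tighten if needed
            ["content-length-range", 0, max_bytes],  # S3's limit instead of PHP's
            {"x-amz-algorithm": "AWS4-HMAC-SHA256"},
            {"x-amz-credential": credential},
            {"x-amz-date": amz_date},
        ],
    }
    policy_b64 = base64.b64encode(json.dumps(policy).encode("utf-8")).decode("ascii")

    # SigV4 signing key chain: date -> region -> service -> "aws4_request".
    k = _hmac(("AWS4" + secret_key).encode("utf-8"), date)
    k = _hmac(k, region)
    k = _hmac(k, "s3")
    k = _hmac(k, "aws4_request")
    signature = hmac.new(k, policy_b64.encode("ascii"), hashlib.sha256).hexdigest()

    return {
        "key": key_prefix + "${filename}",  # S3 substitutes the uploaded file's name
        "policy": policy_b64,
        "x-amz-algorithm": "AWS4-HMAC-SHA256",
        "x-amz-credential": credential,
        "x-amz-date": amz_date,
        "x-amz-signature": signature,
    }
```

In practice an SDK helper such as boto3's generate_presigned_post does the same job; the hand-rolled version only shows what is being signed.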

Managing files on Amazon S3

I have a git repository that stores audio files.
Obviously, it's not the best usage of git, and the repo has become quite large.
As an alternative, I would like to be able to manipulate these audio files at the command line, "committing" when some work is done.
Is this kind of workflow possible when manipulating Amazon S3 files from the command line?
Or do you, for example, scp files to S3?
There are some rsync-style tools for S3 that may work for you; here is one, which I have not tried: http://www.s3rsync.com/
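Sync tools generally decide what to transfer by comparing local files with what is already in the bucket. As a hedged sketch of one such strategy (s3cmd's sync can compare MD5 checksums, for example), the function below lists local files whose MD5 differs from the remote ETag. It assumes the remote ETags were fetched separately (for example from a bucket listing) and that the objects were uploaded in a single part without KMS encryption, since only then does the ETag equal the MD5.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def md5_hex(path: Path, chunk_size: int = 1 << 20) -> str:
    """MD5 of a file, read in chunks so large audio files don't fill memory."""
    h = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def files_to_upload(local_root: Path, remote_etags: Dict[str, str]) -> List[str]:
    """Relative paths that are new locally or whose MD5 differs from the remote ETag.

    remote_etags maps keys (relative to the synced prefix) to ETags.
    ASSUMPTION: single-part, non-KMS uploads, where ETag == MD5.
    """
    changed = []
    for path in sorted(p for p in local_root.rglob("*") if p.is_file()):
        rel = path.relative_to(local_root).as_posix()
        etag = remote_etags.get(rel, "").strip('"')  # listings return quoted ETags
        if etag != md5_hex(path):
            changed.append(rel)
    return changed
```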
How important are the older versions of the audio? Amazon S3 buckets can have versioning turned on, which gives you full versioning support. You pay full price for each stored version; I don't know whether you have 10 GB or 10 TB to store, or what your budget is. Amazon's versioning is nice, but not many tools fully support it.
To manipulate S3 files you first have to download them and then upload them again when you are done; this is relatively simple to do.
However, if you have a truly large number of files, the slow transfer rate and bandwidth charges will kill you. If you don't have that many files, Dropbox is built on top of S3 and offers syncing and rudimentary version control, and bandwidth is not charged.
I feel that a good networked storage system with git on your LAN is still the better idea.

Allowing users to download files as a batch from AWS S3 or CloudFront

I have a website that allows users to search for music tracks and download the ones they select as MP3s.
I have the site on my server and all of the mp3s on s3 and then distributed via cloudfront. So far so good.
The client now wants users to be able to select a number of music tracks and then download them all in bulk, as a batch, instead of one at a time.
Usually I would put all the files in a zip and then give the user a link to that new zip file. In this case, as the files are on S3, that would require me to first copy all the files from S3 to my web server, process them into a zip, and then serve the download from my server.
Is there any way I can create a zip on S3 or CloudFront, or some other way to batch/group files into a zip?
Maybe I could set up an EC2 instance to handle this?
I would greatly appreciate some direction.
Best
Joe
I am afraid you won't be able to create the batches without additional processing. Firing up an EC2 instance to create a batch per user might be an option.
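A minimal sketch of that per-user batch job on EC2: download the selected objects and bundle them into one ZIP, which you would then upload back to S3 and link to. The fetch function is a hypothetical stand-in for your S3 download code, and the archive is built in memory, so very large batches would be better written to a temporary file.

```python
import io
import zipfile
from typing import Callable, Iterable


def build_zip(keys: Iterable[str], fetch: Callable[[str], bytes]) -> bytes:
    """Bundle the given object keys into one in-memory ZIP archive.

    `fetch` is HYPOTHETICAL: any function returning an object's bytes.
    MP3s are already compressed, so ZIP_STORED skips a pointless second pass.
    """
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_STORED) as zf:
        for key in keys:
            # Keep the S3 key as the path inside the archive (avoids name clashes).
            zf.writestr(key, fetch(key))
    return buf.getvalue()
```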
I am facing the exact same problem. So far the only thing I was able to find is the sync command in Amazon's AWS CLI:
https://docs.aws.amazon.com/cli/latest/reference/s3/sync.html
In my case, I am using Rails with its Paperclip add-on, which means I have no easy way to download all of a user's images in one go, because the files are scattered across a lot of subdirectories.
However, if you can group your user's files in a better way, say like this:
/users/<ID>/images/...
/users/<ID>/songs/...
...etc., then you can solve your problem right away with:
aws s3 sync s3://<your_bucket_name>/users/<user_id>/songs /cache/<user_id>
Do keep in mind that you'll have to give your server the proper credentials so the AWS CLI tools can run without prompting for usernames/passwords.
And that should sort you out.
Additional discussion here:
Downloading an entire S3 bucket?
S3 is based on single HTTP requests,
so the answer is to use threads to achieve the same thing.
The Java API provides TransferManager for this:
http://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/s3/transfer/TransferManager.html
You can get great performance with multiple threads.
Sorry, there is no bulk download.
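The same thread-based idea as TransferManager, sketched in Python with a thread pool. fetch is a hypothetical function that downloads one object (for example, a wrapper around a boto3 client's get_object call), and the default worker count is an arbitrary starting point to tune.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable


def download_all(
    keys: Iterable[str],
    fetch: Callable[[str], bytes],  # HYPOTHETICAL single-object downloader
    workers: int = 8,
) -> Dict[str, bytes]:
    """Fetch many objects concurrently; with no bulk GET, threads provide the parallelism."""
    results: Dict[str, bytes] = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(fetch, key): key for key in keys}
        for fut in as_completed(futures):
            results[futures[fut]] = fut.result()  # re-raises any download error
    return results
```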

Is it possible to install LAME on shared hosting?

I have a shared hosting account; is it possible to install LAME on my account? I also don't have SSH access. I want to convert WAV to MP3.
You will need to request that the server administrator installs it. Most shared hosting providers are used to these types of requests, and are generally accommodating. However, they may have concerns with you processing large amounts of audio content, and using more than your fair share of system resources.
It's worth asking, at least!
If you don't have direct access to the server, I don't really see a way to do that.
That's more of a question for https://serverfault.com/. The best bet is to simply ask your provider.
But unless there is a policy against it, you could just copy a lame binary into your webspace (outside of the cgi-bin and htdocs roots!) and make it executable (chmod +x via FTP). It's self-contained, so you only need to match the architecture (usually x86-64 nowadays, but check phpinfo).
Without access to the server you will not be able to install LAME on the system. Some providers will consider your request and install LAME; it really depends on who your hosting provider is.
If this is a service you want to provide, you can always let anyone upload a WAV file and, when you have time, convert it and send the user a link to their MP3.
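A minimal sketch of that "convert later" approach, run for example from a scheduled job or by hand: it finds uploaded WAV files that have no MP3 yet and converts them with a lame binary. The upload directory layout and the ./bin/lame path (a binary copied to your webspace as described above) are assumptions.

```python
import subprocess
from pathlib import Path
from typing import List


def pending_conversions(upload_dir: Path) -> List[Path]:
    """WAV files in upload_dir that do not yet have an MP3 next to them."""
    return sorted(w for w in upload_dir.glob("*.wav")
                  if not w.with_suffix(".mp3").exists())


def lame_command(lame_binary: str, wav: Path) -> List[str]:
    # -V 2 selects LAME's VBR quality level 2; the MP3 is written next to the WAV.
    return [lame_binary, "-V", "2", str(wav), str(wav.with_suffix(".mp3"))]


def convert_pending(upload_dir: Path, lame_binary: str = "./bin/lame") -> List[Path]:
    """Convert every pending WAV; return the MP3 paths to send to users as links."""
    done = []
    for wav in pending_conversions(upload_dir):
        subprocess.run(lame_command(lame_binary, wav), check=True)
        done.append(wav.with_suffix(".mp3"))
    return done
```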

Fastest / best way to copy data between S3 and EC2?

I have a fairly large amount of data (~30G, split into ~100 files) I'd like to transfer between S3 and EC2: when I fire up the EC2 instances I'd like to copy the data from S3 to EC2 local disks as quickly as I can, and when I'm done processing I'd like to copy the results back to S3.
I'm looking for a tool that'll do a fast / parallel copy of the data back and forth. I have several scripts hacked up, including one that does a decent job, so I'm not looking for pointers to basic libraries; I'm looking for something fast and reliable.
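One ingredient of a fast parallel copy is giving each worker a similar amount of data. As a sketch that is not tied to any particular tool, the greedy heuristic below splits the ~100 files into size-balanced batches; each batch could then be handed to its own process running s3cmd or aws s3 cp.

```python
import heapq
from typing import Dict, List


def balance(sizes: Dict[str, int], workers: int) -> List[List[str]]:
    """Split files into `workers` batches with roughly equal total bytes.

    Greedy longest-processing-time heuristic: give the next-largest file
    to whichever batch currently holds the fewest bytes.
    """
    heap = [(0, i) for i in range(workers)]  # (bytes assigned, batch index)
    batches: List[List[str]] = [[] for _ in range(workers)]
    for name, size in sorted(sizes.items(), key=lambda kv: kv[1], reverse=True):
        total, i = heapq.heappop(heap)
        batches[i].append(name)
        heapq.heappush(heap, (total + size, i))
    return batches
```

Sorting largest-first keeps one huge file from landing at the end and unbalancing the batches.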
Unfortunately, Adam's suggestion won't work, as his understanding of EBS is wrong (although I wish he were right, and I often thought it should work that way). EBS has nothing to do with S3; it only gives you an "external drive" for EC2 instances that is separate from, but attachable to, the instances. You still have to copy data between S3 and EC2, even though there are no data transfer costs between the two (within the same region).
You didn't mention your instance's operating system, so I cannot give tailored information. A popular command-line tool I use is http://s3tools.org/s3cmd ... It is based on Python and therefore, according to its website, should work on Windows as well as Linux, although I use it ALL the time on Linux. You could easily whip up a quick script that uses its built-in "sync" command, which works similarly to rsync, and have it triggered every time you're done processing your data. You could also use the recursive put and get commands to transfer data only when needed.
There are also graphical tools, such as CloudBerry Pro, that offer some command-line options on Windows for setting up scheduled commands. http://s3tools.org/s3cmd is probably the easiest.
By now there is a sync command in the AWS command-line tools that should do the trick: http://docs.aws.amazon.com/cli/latest/reference/s3/sync.html
On startup:
aws s3 sync s3://mybucket /mylocalfolder
before shutdown:
aws s3 sync /mylocalfolder s3://mybucket
Of course, the details are always fun to work out, e.g. how parallel it is (and whether you can make it more parallel, and whether that is any faster given the virtual nature of the whole setup).
By the way, I hope you're still working on this... or somebody is. ;)
I think you might be better off using an Elastic Block Store to store your files instead of S3. An EBS is akin to a 'drive' on S3 that can be mounted into your EC2 instance without having to copy the data each time, thereby allowing you to persist your data between EC2 instances without having to write to or read from S3 each time.
http://aws.amazon.com/ebs/
Install the s3cmd package with
yum install s3cmd
or
sudo apt-get install s3cmd
depending on your OS,
then copy data with:
s3cmd get s3://tecadmin/file.txt
s3cmd ls can also list the files.
For more details, see this.
For me the best way is:
wget http://s3.amazonaws.com/my_bucket/my_folder/my_file.ext
from PuTTY (this only works if the object is publicly readable).