Uploaded File Storage/Retrieval - file-upload

I am developing a web application that needs to store uploaded files - images, pdfs, etc. I need this to be secure and to scale - I don't have a finite number of uploads to plan for. From my research, the best practice seems to be storing files in the private file system, storing paths and meta data in the database, and serving through an authenticated script.
My question is where should these files be stored?
I can't store them on the web servers because I have more than 1, would be worried about disk space, and don't want the performance hit from replication.
Should they be programmatically uploaded to a CDN? Should I spin up a file server/cluster to handle this?
Is there a standard way for securely storing/retrieving a large number of files for web applications?

"My question is where should these files be stored?"
I would suggest using a dedicated storage server or cloud service such as Amazon AWS. It is secure and completely scalable. That is how it is usually done these days.
"Should they be programmatically uploaded to a CDN?" - yes, along with a matching db entry of some sort for retrieval.
"Should I spin up a file server/cluster to handle this?" - you could. I would suggest one of the many cloud storage services though.
"Is there a standard way for securely storing/retrieving a large number of files for web applications?" Yes. Uploading files and info via web or mobile app (.php, rails, .net, etc) where the app uploads to storage area (not located in public directory) and then inserts file info into a database.

Related

Where and how to store files uploaded by a user using rest api?

Currently I’m using a shared storage(azure file storage) to store profile pictures and company logos and also some custom python scripts uploaded by admins. My rest services are running in a docker swarm cluster where all the nodes have access to the shared location. Are there any drawbacks to this kind of design? I’m currently saving the files to the location and creating a url for that file and serving it as a static resource using my nginx reverse proxy/load balancer. So I was curious to know if there are any drawbacks to this design and how can I make it better?
There are several ways to access, store, and manipulate files in Azure file storage using REST API:
The Azure File service offers the following four resources: the storage account, shares, directories, and files. Shares provide a way to organize sets of files and also can be mounted as an SMB file share that is hosted in the cloud.
More info here
When it comes to the design, it will depend of what kind of concerns your customers may have, slow connectivity, are they going to need these files permanently etc ...

Storing and retrieving files stored separately from codebase coldfusion

We currently have a site running cold fusion 11. In an effort to improve some aspects of security we would like to store all files uploaded by our users on a server separate from our codebase and DB servers.
I'm pretty much starting from scratch here as I wasn't able to find much in my searches so far. What's the best practice for doing this and what cold fusion functions would work for storing and retrieving files from an external source?
I could use some more information to be more helpful. But let's say you have a separate server that stores all your user files on a Windows network. I would use CFContent to serve those files with the file being retrieved over a UNC path.
I'd recommend reading this blog entry of mine on Securely Serving Files via CFContent. Wil, also from CF Webtools, posts one here: Serving File Downloads with ColdFusion
We had a similar issue when we migrated to a Unix platform. Our solution was to mount a file server to the webserver. It's accessed programmatically by ColdFusion as if it's on the same server, but it's inaccessible from the web root (browser). It's worked very smoothly for us.

AWS S3 and AjaXplorer

I'm using AjaXplorer to give access to my clients to a shared directory stored in Amazon S3. I installed the SD, configured the plugin (http://ajaxplorer.info/plugins/access/s3/) and could upload and download files but the upload size is limited to my host PHP limit which is 64MB.
Is there a way I can upload directly to S3 without going over my host to improve speed and have S3 limit, no PHP's?
Thanks
I think that is not possible, because the server will first climb to the PHP file and then make transfer to bucket.
Maybe
The only way around this is to use some JQuery or JS that can bypass your server/PHP entirely and stream directly into S3. This involves enabling CORS and creating a signed policy on the fly to allow your uploads, but it can be done!
I ran into just this issue with some inordinately large media files for our website users that I no longer wanted to host on the web servers themselves.
The best place to start, IMHO is here:
https://github.com/blueimp/jQuery-File-Upload
A demo is here:
https://blueimp.github.io/jQuery-File-Upload/
This was written to upload+write files to a variety of locations, including S3. The only tricky bits are getting your MIME type correct for each particular upload, and getting your bucket policy the way you need it.

Can I easily limit which files a user can download from an Amazon S3 server?

I have tried looking for an answer to this but I think I am perhaps using the wrong terminology so I figure I will give this a shot.
I have a Rails app where a company can have an account with multiple users each with various permissions etc. Part of the system will be the ability to upload files and I am looking at S3 for storage. What I want is the ability to say that users from Company A can only download the files associated with that company?
I get the impression I can't unless I restrict the downloads to my deployment servers IP range (which will be Heroku) and then feed the files through a controller and a send_file() call. This would work but then I am reading data from S3 to Heroku then back to the user vs. direct from S3 to the user.
If I went with the send_file method can I close off my S3 server to the outside world and have my Heroku app send the file direct?
A less secure idea I had was to create a unique slug for each file and store it under that name to prevent random guessing of files i.e. http://mys3server/W4YIU5YIU6YIBKKD.jpg etc. This would be quick and dirty but not 100% secure.
Amazon S3 Buckets support policies for granting or denying access based on different conditions. You could probably use those to protect your files from different user groups. Have a look at the policy documentation to get an idea what is possible. After that you can switch over to the AWS policy generator to generate a valid policy depending on your needs.

Correct Server Schema to upload pictures in Amazon Web Services

I want to upload pictures to the AWS s3 through the iPhone. Every user should be able to upload pictures but they must remain private for each one of them.
My question is very simple. Since I have no real experience with servers I was wondering which of the following two approaches is better.
1) Use some kind of token vending machine system to grant the user access to the AWS s3 database to upload directly.
2) Send the picture to the EC2 Servlet and have the virtual server place it on the S3 storage.
Edit: I would also need to retrieve, should i do it directly or through the servlet?
Thanks in advance.
Hey personally I don't think it's a good idea to use token vending machine to directly upload the data via the iPhone, because it's much harder to control the access privileges, etc. If you have a chance use ec2 and servlet, but that will add costs to your solution.
Also when dealing with S3 you need to take in consideration that some files are not available right after you save them. Look at this answer from S3 FAQ.
For retrieving data directly from S3 you will need to deal with the privileges issue again. Check the access model for S3, but again it's probably easier to manage the access for non public files via the servlet. The good news is that there is no data transfer charge for data transferred between EC2 and S3 within the same region.
Another important point to consider the latter solution
High performance in handling load and network speeds within amazon ecosystem. With direct uploads the client would have to handle complex asynchronous operations of multipart uploads etc instead of focusing on the presentation and rendering of the image.
The servlet hosted on EC2 would be way more powerful than what you can do on your phone.