I have an application that heavily uses the local file system, and we need to port it to use S3. What services are out there that will handle access to S3 without having to change the application's source code?
Such a service would essentially present S3 as a local file system.
Thanks.
See FuseOverAmazon (or s3fs), but keep in mind that S3 is an eventually consistent data store, and your app should be architected to take that into account. It's also important to note that mounting an S3 bucket as a file system has very poor performance.
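For reference, a minimal s3fs-fuse mount typically looks something like the sketch below; the bucket name, mount point, and credentials file are placeholders, and the install command depends on your distribution.

# install s3fs-fuse (Debian/Ubuntu; other distros differ)
sudo apt-get install -y s3fs

# store the ACCESS_KEY:SECRET_KEY pair that s3fs expects and lock the file down
echo "AKIAEXAMPLE:wJalrEXAMPLEKEY" > ~/.passwd-s3fs
chmod 600 ~/.passwd-s3fs

# mount the bucket onto a local directory
mkdir -p ~/mybucket
s3fs mybucket ~/mybucket -o passwd_file=~/.passwd-s3fs

# unmount when finished
fusermount -u ~/mybucket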
Take a look at RioFS. Our project is an alternative to the s3fs project; its main advantages over s3fs are simplicity, speed of operations, and bug-free code. The project is currently in beta, but it has been running on several high-load file servers for quite some time.
We are looking for more people to join the project and help with testing. From our side we offer quick bug fixes and will listen to your requests for new features.
Hope it helps!
Long-time reader here; I've usually been able to find the answers I'm looking for in existing posts, but this time I haven't.
I am essentially teaching myself AWS CDK from scratch. I've only just started with it, so not finding anything that helps me may simply be a result of not yet knowing enough to ask the right questions... so please bear with me.
So far I've used the AWS CDK with Python to create a stack which creates an S3 bucket and also fires up an EC2 instance with an AWS Storage Gateway (file gateway) AMI loaded on it (so running Amazon Linux). This deploys and runs fine; however, I'd now like to programmatically set up the S3 bucket to be accessed via an NFS share on the EC2 instance. From what I've seen I'd assumed this is, or should be, fairly trivial, but I keep getting lost in documentation and internet hunts, and I'm not sure I'm looking in the right places or asking search engines the right questions to unlock the path to achieving this.
It looks like I should be able to script something to make this happen when the instance starts, using user data, but I'm a bit lost. Is anyone able to throw me some crumbs to follow towards a good way of achieving this, or a better way of achieving what I actually want (which is basically accessing the S3 bucket contents as though they were files on the EC2 instance)? And if it's trivial enough, just tell me how to do it.
Much appreciated :)
Dan
You are on the right track; user_data can be used for that.
I don't have complete code to give you, as it's use-case specific (e.g. which OS are you using?), but the user_data would have to download and install s3fs:
s3fs allows Linux and macOS to mount an S3 bucket via FUSE. s3fs preserves the native object format for files, allowing use of other tools like AWS CLI.
However, S3 is an object store, and it can't really be mounted on an instance the way NFS or EBS storage can. But with s3fs-fuse you can mimic such behavior, and for some use cases it will be sufficient.
So what you can do is set up the user_data script through the console, verify that it works, and then basically copy and paste it into CDK. It's more of a trial-and-see approach, but that is the best way to learn.
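As a rough sketch of such a user_data script for an Amazon Linux 2 instance (the bucket name, mount point, and EPEL package availability are all assumptions here):

#!/bin/bash
# install s3fs-fuse from the EPEL repository (Amazon Linux 2; other AMIs differ)
amazon-linux-extras install epel -y
yum install -y s3fs-fuse

# mount the bucket, letting s3fs pick up credentials from the instance's IAM role
mkdir -p /mnt/mybucket
s3fs mybucket /mnt/mybucket -o iam_role=auto -o allow_other

Once it works from the console, you can feed the same lines to CDK via instance.add_user_data(...) or ec2.UserData.custom(...); the instance role also needs read/write permissions on the bucket, which you can grant with something like bucket.grant_read_write(instance.role).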
I'm currently using shared storage (Azure File Storage) to store profile pictures, company logos, and some custom Python scripts uploaded by admins. My REST services run in a Docker Swarm cluster where all the nodes have access to the shared location. I save the files to that location, create a URL for each file, and serve it as a static resource through my nginx reverse proxy/load balancer. Are there any drawbacks to this kind of design, and how can I make it better?
There are several ways to access, store, and manipulate files in Azure file storage using REST API:
The Azure File service offers the following four resources: the storage account, shares, directories, and files. Shares provide a way to organize sets of files and also can be mounted as an SMB file share that is hosted in the cloud.
More info here
When it comes to the design, it will depend on what kinds of concerns your customers have: slow connectivity, whether they will need these files permanently, and so on.
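For example, once a share exists, mounting it over SMB from a Linux node looks roughly like this (the storage account name, share name, and key are placeholders):

sudo mkdir -p /mnt/myshare
sudo mount -t cifs //mystorageacct.file.core.windows.net/myshare /mnt/myshare -o vers=3.0,username=mystorageacct,password=<storage-account-key>,dir_mode=0777,file_mode=0777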
I want to upload pictures to AWS S3 from the iPhone. Every user should be able to upload pictures, but they must remain private to each user.
My question is very simple. Since I have no real experience with servers I was wondering which of the following two approaches is better.
1) Use some kind of token vending machine system to grant the user access to upload directly to the S3 bucket.
2) Send the picture to a servlet on EC2 and have the virtual server place it on S3 storage.
Edit: I would also need to retrieve the pictures; should I do that directly or through the servlet?
Thanks in advance.
Hey, personally I don't think it's a good idea to use a token vending machine to upload the data directly from the iPhone, because it's much harder to control access privileges, etc. If you have the chance, use EC2 and a servlet, but that will add cost to your solution.
Also, when dealing with S3 you need to take into consideration that some files are not available right after you save them. Look at this answer from the S3 FAQ.
For retrieving data directly from S3 you will need to deal with the privileges issue again. Check the access model for S3, but again, it's probably easier to manage access to non-public files via the servlet. The good news is that there is no data transfer charge for data transferred between EC2 and S3 within the same region.
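One pattern worth knowing for the direct-retrieval case (not covered above) is handing clients short-lived pre-signed URLs instead of credentials; for example, from a server that already holds credentials (bucket and key are placeholders):

# generate a download link that expires in 15 minutes
aws s3 presign s3://my-photo-bucket/user123/photo.jpg --expires-in 900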
Another important point in favor of the latter solution:
High performance in handling load and network speeds within the Amazon ecosystem. With direct uploads, the client would have to handle complex asynchronous operations such as multipart uploads instead of focusing on the presentation and rendering of the image.
The servlet hosted on EC2 would be way more powerful than what you can do on your phone.
I have a website that allows users to search for music tracks and download the ones they select as MP3s.
I have the site on my server and all of the MP3s on S3, distributed via CloudFront. So far so good.
The client now wants users to be able to select a number of music tracks and then download them all in bulk, as a batch, instead of one at a time.
Usually I would place all the files in a zip and then present the user with a link to that new zip file to download. In this case, as the files are on S3, that would require me to first copy all the files from S3 to my web server, process them into a zip, and then serve the zip from my server.
Is there any way I can create a zip on S3 or CloudFront, or is there some way to batch/group files into a zip?
Maybe I could set up an EC2 instance to handle this?
I would greatly appreciate some direction.
Best
Joe
I am afraid you won't be able to create the batches without additional processing. Firing up an EC2 instance might be an option for creating a batch per user.
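As a rough outline of that EC2 route (the bucket names, keys, and paths below are placeholders): pull the selected tracks down, zip them, push the archive back to S3, and give the user a time-limited link:

# copy the selected tracks from S3 to a scratch directory
mkdir -p /tmp/batch-123
aws s3 cp s3://my-music-bucket/tracks/track-1.mp3 /tmp/batch-123/
aws s3 cp s3://my-music-bucket/tracks/track-2.mp3 /tmp/batch-123/

# zip them up and push the archive back to S3
cd /tmp/batch-123 && zip batch-123.zip *.mp3
aws s3 cp /tmp/batch-123/batch-123.zip s3://my-music-bucket/zips/

# hand the user a link that expires after an hour
aws s3 presign s3://my-music-bucket/zips/batch-123.zip --expires-in 3600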
I am facing the exact same problem. So far the only thing I was able to find is the AWS CLI's s3 sync command:
https://docs.aws.amazon.com/cli/latest/reference/s3/sync.html
In my case, I am using Rails and its Paperclip add-on, which means I have no way to easily download all of a user's images in one go, because the files are scattered across a lot of subdirectories.
However, if you can group your user's files in a better way, say like this:
/users/<ID>/images/...
/users/<ID>/songs/...
...etc., then you can solve your problem right away with:
aws s3 sync s3://<your_bucket_name>/users/<user_id>/songs /cache/<user_id>
Do keep in mind that you'll have to give your server the proper credentials so the S3 CLI tools can work without prompting for usernames/passwords.
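For completeness, a sketch of wiring up those credentials non-interactively (the values are obviously placeholders; on EC2, an IAM instance role is usually the nicer option):

aws configure set aws_access_key_id AKIAEXAMPLE
aws configure set aws_secret_access_key wJalrEXAMPLEKEY
aws configure set default.region eu-west-1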
And that should sort you.
Additional discussion here:
Downloading an entire S3 bucket?
S3 operates on single HTTP requests, one object per request.
So the answer is to use threads to achieve the same thing.
In the Java API, use TransferManager:
http://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/s3/transfer/TransferManager.html
You can get great performance with multiple threads.
There is no bulk download, sorry.
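If you're not on the Java SDK, a hedged shell equivalent of the same idea, parallel single-object requests, could look like this (the bucket, prefix, and degree of parallelism are placeholders); note that aws s3 sync and aws s3 cp --recursive already issue requests concurrently under the hood:

# list the keys under a prefix, then download 8 objects at a time in parallel
mkdir -p ./downloads
aws s3api list-objects-v2 --bucket my-bucket --prefix songs/ --query 'Contents[].Key' --output text | tr '\t' '\n' | xargs -P8 -I{} aws s3 cp "s3://my-bucket/{}" ./downloads/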
On an app server on which a few source files change frequently, is the following approach recommended?
Use a cron job with s3tools to sync the source files to a private S3 bucket (every 15 minutes, for example).
On server start-up, use a user data script to sync from the sources bucket and retrieve the latest sources (a rough sketch of both steps appears after this question).
Advantages:
1. No need to attach an EBS volume to the app server just to store a few files.
2. The same setup for all app servers.
3. Sources are automatically backed up.
4. As a byproduct, distributes code to multiple app servers automatically.
Disadvantages:
keeping source code on S3
other?
What do you think about this methodology? Is this the right way to use EC2 when source code changes frequently (a few times a day)? Please recommend the best approach for running EC2 instances where the sources change often.
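As a concrete sketch of the approach described above (the bucket name and paths are made up):

# crontab entry on the app server: push the sources to the private bucket every 15 minutes
*/15 * * * * aws s3 sync /opt/app/src s3://my-private-bucket/src --delete

# user-data / boot script on a new app server: pull the latest sources
aws s3 sync s3://my-private-bucket/src /opt/app/src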
I think you're better off using a proper source code repository, like Subversion or Git, rather than storing the source files on S3. That way you can have a central location for the source files while avoiding the update consistency problems that kdgregory mentioned.
You can put the source repository on one of your own servers outside of EC2, or host it on an EC2 instance (make sure the repository files are on an EBS volume in the latter case).
If you're going to be running a large number of EC2 instances, then it will be less effort to have them sync themselves from a central location (ie, you sync to private bucket, app-servers sync from that bucket).
HOWEVER, recognize that updates to an S3 bucket are atomic only at the object level, and more importantly, are not guaranteed to be immediately consistent (although I recall seeing a recent note that the us-west endpoint does offer read-after-write consistency).
This means that your app-servers may load a set of new files that are internally inconsistent -- some will be old, some will be new. If this is a problem for you, then you should implement a scheme that uploads directly to the app-servers, and ensures changeset consistency (perhaps by uploading to a temporary directory that is then renamed).
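A hedged sketch of that last idea, adapted to the S3-sync setup from the question (the paths and bucket are placeholders): pull the new file set into a fresh directory, then repoint a symlink so the running app never reads a half-updated tree:

# pull the new source set into a fresh, timestamped release directory
NEW=/opt/app/releases/$(date +%s)
mkdir -p "$NEW"
aws s3 sync s3://my-private-bucket/src "$NEW"

# swap the 'current' symlink over to the new release in one step
ln -sfn "$NEW" /opt/app/current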