I would like to serve user-uploaded content (pictures, videos, and other files) from a CDN. Using Amazon S3 with CloudFront seems like a reasonable way to go. My only question is about the speed of the file system. My plan was to host user media under a URI scheme like cdn.mycompany.com/u/u/i/d/uuid.jpg.
I don't have any prior experience with S3 or CDNs, and I was wondering whether this strategy would scale well to handle a large amount of user-uploaded content, and whether there might be a more conventional way to accomplish this.
You will never have problems dealing with scale on CloudFront. It's an enterprise-grade beast.
Disclaimer: Not if you're Google.
It is an excellent choice. Especially for streaming video and audio, CloudFront is priceless.
My customers use my plugin to display private streaming video and audio; one of them even has 8,000 videos in a single bucket without problems.
My question stemmed from a misunderstanding of S3 buckets as a conventional file system. I was concerned that putting too many files in the same directory would create overhead in locating a file. However, it turns out that S3 buckets are implemented more like a hash map over a flat key namespace, so this overhead doesn't actually exist. See here for details: Max files per directory in S3
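To illustrate the flat key space, here is a minimal sketch (using the AWS SDK for Java v1; the bucket name and local file path are hypothetical) of generating a u/u/i/d/uuid.jpg style key from the UUID's leading characters and uploading the object. The slashes are just part of the key string; S3 does not create real directories.

import java.io.File;
import java.util.UUID;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;

public class ShardedUpload {
    public static void main(String[] args) {
        AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();

        // Build a key like "a/b/1/2/ab12....jpg" from the UUID's first characters.
        String uuid = UUID.randomUUID().toString();
        String key = String.format("%c/%c/%c/%c/%s.jpg",
                uuid.charAt(0), uuid.charAt(1), uuid.charAt(2), uuid.charAt(3), uuid);

        // "user-media-example" and the local path are placeholders.
        s3.putObject("user-media-example", key, new File("/tmp/upload.jpg"));
        System.out.println("Stored object under key: " + key);
    }
}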
I want to create a Java utility to read S3 bucket information.
We can connect to S3 either via the native S3 APIs or via the Hadoop filesystem approach.
Approach 1: Using S3 APIs
// Build an S3 client with the AWS SDK for Java v1
// (assumes `credentials` and `region` are defined elsewhere)
AmazonS3 s3client = AmazonS3ClientBuilder
    .standard()
    .withCredentials(new AWSStaticCredentialsProvider(credentials))
    .withRegion(Regions.valueOf(region))
    .build();
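For reading bucket information with this client, a minimal sketch (the bucket name is a placeholder) could list the objects like this:

import com.amazonaws.services.s3.model.ListObjectsV2Result;
import com.amazonaws.services.s3.model.S3ObjectSummary;

ListObjectsV2Result result = s3client.listObjectsV2("my-bucket");   // placeholder bucket name
for (S3ObjectSummary summary : result.getObjectSummaries()) {
    System.out.println(summary.getKey() + " (" + summary.getSize() + " bytes)");
}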
Approach 2: Using the Hadoop filesystem:
// Configure the Hadoop S3A connector (configuration is an org.apache.hadoop.conf.Configuration)
configuration.set("fs.s3a.access.key", "XXXXXXXXXXX");
configuration.set("fs.s3a.secret.key", "XXXXXXXXXXX");
configuration.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem");
configuration.set("fs.s3a.endpoint", "http://127.0.0.1:8080");   // local endpoint used in this test
UserGroupInformation.setConfiguration(configuration);
FileSystem fileSystem = new Path("s3a://" + bucketName).getFileSystem(configuration);
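To read bucket contents through this route, a minimal sketch (reusing fileSystem and bucketName from above) could be:

import org.apache.hadoop.fs.FileStatus;

FileStatus[] statuses = fileSystem.listStatus(new Path("s3a://" + bucketName + "/"));
for (FileStatus status : statuses) {
    System.out.println(status.getPath() + " (" + status.getLen() + " bytes)");
}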
When should we use which approach, and which one is more efficient for reading data?
In my observation, the filesystem route is slower, but I have not found any documentation that explains the performance difference.
Performance shouldn't be the only factor. If you want higher performance, or at least better file operation consistency guarantees, look into S3Guard.
But if you have to create a Java client that will only ever talk to S3, and never needs to integrate with the Hadoop ecosystem or use other Hadoop-compatible filesystems (HDFS, GCS, ADLS, etc.), then you should use the plain AWS SDK.
If you're running a mocked S3 service (or MinIO) on 127.0.0.1, then that's not a proper benchmark against a real S3 service.
I have been using rclone to back up Google Drive data to AWS S3 cloud storage. I have multiple Google Drive accounts whose backups go to S3, and each drive holds a different number of documents.
I want to compress those documents into a single zip file, which then needs to be copied to S3.
Is there any way to achieve this?
I referred to the link below, but it doesn't have complete steps to accomplish the task.
https://rclone.org/compress/
Any suggestion would be appreciated.
Rclone can't compress the files into a single archive itself, but you can instead use a small piece of code to zip or rar the files and then use rclone to back the archive up to AWS.
If this is OK, I can explain the details here.
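For example, here is a minimal sketch in Java (the local paths, the bucket, and the rclone remote name "s3" are all assumptions, and it assumes rclone is installed and configured) that zips a directory with java.util.zip and then shells out to rclone copy:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class ZipAndUpload {
    public static void main(String[] args) throws IOException, InterruptedException {
        Path source = Paths.get("/data/drive-backup");       // hypothetical local folder
        Path zipFile = Paths.get("/data/drive-backup.zip");  // archive to create

        // Collect every regular file under the source directory.
        List<Path> files;
        try (Stream<Path> walk = Files.walk(source)) {
            files = walk.filter(Files::isRegularFile).collect(Collectors.toList());
        }

        // Write them all into a single zip archive.
        try (ZipOutputStream zos = new ZipOutputStream(Files.newOutputStream(zipFile))) {
            for (Path p : files) {
                zos.putNextEntry(new ZipEntry(source.relativize(p).toString()));
                Files.copy(p, zos);
                zos.closeEntry();
            }
        }

        // Let rclone push the archive to S3; "s3:my-bucket/backups" is a placeholder remote path.
        Process rclone = new ProcessBuilder("rclone", "copy", zipFile.toString(), "s3:my-bucket/backups")
                .inheritIO()
                .start();
        System.exit(rclone.waitFor());
    }
}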
Is there a way to check that my files are already on the edge servers for my users to load from? Or does Amazon S3 take time to spread your files around the world? How long does that take, and can I be notified when it's done?
After I uploaded a file, I immediately tested the load speed by asking users in faraway places (such as Japan). They said it was actually slower than my current hosting in the US. That's odd, because Amazon does have an edge server in Tokyo, so shouldn't S3 be faster?
When I created my bucket, I set the region to US Standard. Is that why? If so, is there a way to make your files available around the world?
Thank you for your time.
As you already said, your S3 bucket is located in a specific region, for example us-east, europe, us-west, etc. This is where your files are physically stored; they are not distributed geographically. Users in other parts of the world will experience delays when requesting data from that bucket.
What you are looking for is Amazon's CloudFront CDN. You specify an origin (your S3 bucket, in your case), and your files are then served from Amazon CloudFront edge locations worldwide. Check out their FAQ and the list of edge locations.
We're uploading and serving/streaming media (pictures, videos) using Amazon S3 for storage combined with CloudFront for delivery. The site gets only light use, but the Amazon costs come to $3,000 per month, and according to the billing report 90% of the costs originate from the S3 service.
I've heard that the cloud can be expensive if you don't code the right way, so here are my questions:
What is the right way? Where should I pay more attention: to the way I upload files or to the way I serve them?
Has anyone else had to deal with unexpectedly high costs? If so, what was the cause?
We have an almost similar model: we stream (RTMP) from S3 and CloudFront. We have thousands of files and a decent load, but our monthly bill for S3 is around $50 (negligible compared to your figure). Firstly, you should raise your charges with AWS technical support; they always give a good response and also suggest better ways to utilize resources. Secondly, if you do live streaming, where you divide the file into chunks and stream them one by one instead of streaming or downloading the whole file, it can be more effective in terms of I/O when users are not watching the whole video but just part of it. Also, you can try to utilize caching at the application level.
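As an illustration of application-level caching, here is a minimal sketch (a naive in-memory cache in front of S3 reads with the AWS SDK for Java v1; the class and its behavior are assumptions, not the poster's actual setup) so repeated requests for the same object don't hit S3 every time:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.S3ObjectInputStream;
import com.amazonaws.util.IOUtils;

public class CachingS3Reader {
    private final AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
    // Naive unbounded cache; a real deployment would cap the size and expire entries.
    private final Map<String, byte[]> cache = new ConcurrentHashMap<>();

    public byte[] read(String bucket, String key) {
        return cache.computeIfAbsent(bucket + "/" + key, k -> {
            try (S3ObjectInputStream in = s3.getObject(bucket, key).getObjectContent()) {
                return IOUtils.toByteArray(in);
            } catch (Exception e) {
                throw new RuntimeException("Failed to read s3://" + bucket + "/" + key, e);
            }
        });
    }
}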
Another way to get a better picture of what's going on in your buckets: Qloudstat
I am trying to communicate to a client the likelihood of losing files in S3. I would also like to know whether it is possible to lose an entire bucket from S3. So, I would like to know the following:
Is there a documented expected file loss percentage in S3?
Is there a documented expected bucket loss percentage in S3?
When I say "lose" a file, I mean a file that is lost, damaged, or otherwise unable to be retrieved from S3, where the loss is caused by a failure on S3's side rather than by a tool or other user error.
Amazon doesn't give any kind of SLA or data loss guarantees for data stored on S3, but as far as I know nobody has ever lost any data on S3 aside from user/tool errors.
I would say the probability of user / coder error causing data loss is substantially greater than data loss through some kind of failure on S3. So you may wish to consider some kind of backup strategy to mitigate that.