When to use s3cmd over accessing the S3 API programmatically? - amazon-s3

I've been having difficulty understanding when to use the s3cmd program over the Java API. A vendor has documentation on accessing S3 with s3cmd, but it is unclear to me because the bucket names appear to be dynamic and no region is specified. Additionally, I'm reaching out over a custom endpoint. I've tried writing some Java code to interact with S3 the same way that s3cmd does, but I haven't been able to connect. Overall, it appears to be quite a bit different.
To me, s3cmd seems to be a utility to manipulate these files or quickly get at them, and integrating that utility into a Java program seems pointless.
Anyone have any resources or can help me understand this better?

S3cmd (s3cmd) is a free command line tool and client for uploading, retrieving and managing data in Amazon S3 and other cloud storage service providers that use the S3 protocol, such as Google Cloud Storage or DreamHost DreamObjects. It is best suited for power users who are familiar with command line programs. It is also ideal for batch scripts and automated backup to S3, triggered from cron, etc.
S3cmd is written in Python. It's an open source project available under the GNU General Public License v2 (GPLv2) and is free for both commercial and private use. You will only have to pay Amazon for using their storage.
Lots of features and options have been added to S3cmd since its very first release in 2008. We recently counted more than 60 command-line options, including multipart uploads, encryption, incremental backup, s3 sync, ACL and metadata management, S3 bucket size, bucket policies, and more!
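If you need to reproduce from code what the vendor's s3cmd instructions do (custom endpoint, no explicit region), the main thing is to point your SDK client at that same endpoint; the AWS SDK for Java has an equivalent setting on its client builder. Here is a minimal sketch using boto3, where the endpoint URL, credentials, bucket and key are placeholders for whatever the vendor documents:

import boto3

# Placeholders -- substitute the endpoint and credentials from the vendor's documentation.
s3 = boto3.client(
    "s3",
    endpoint_url="https://s3.vendor.example.com",  # the same host s3cmd is pointed at (host_base in .s3cfg)
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

# Roughly what `s3cmd ls` and `s3cmd get` do:
print([b["Name"] for b in s3.list_buckets()["Buckets"]])
s3.download_file("example-bucket", "path/to/object.txt", "object.txt")

Whether you then use s3cmd or an SDK mostly comes down to whether the transfer is a one-off or cron job, or something your application has to do itself.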

Related

Comparison of uploading files to GCS using Google Drive vs gsutil

I have been comparing two ways of uploading files to cloud storage: one in-browser (or emulating a browser) via Google Drive, and the other from the command line via gsutil to a Google Cloud Storage bucket.
Does Google Drive use gsutil in the backend, or is the uploader a totally customized and proprietary piece of software? Is there a way to achieve upload speeds to a Google Cloud Storage bucket similar to the upload speeds I'm able to achieve via Drive? If not, what would you suggest for getting upload speeds to a GCS bucket equivalent to those in Google Drive?
I'm not sure whether Google Drive uses gsutil in the background.
There are several optimizations that you can use to improve gsutil speeds.
First of all, you might use perfdiag to launch a small diagnostic test that will give you an overview of the speeds achievable:
gsutil perfdiag -o test.json gs://<your bucket name>
Secondly, you will need to understand your workload (small vs. big files) and identify whether you need a regional or multi-regional bucket (yes, there is a performance difference). tl;dr:
"Regional buckets are great for data processing since their physical distance is fairly tight, and the overhead of write consistency is low."
"Multiregional Storage, on the other hand, guarantees 2 replicates which are geo diverse (100 miles apart) which can get better remote latency and availability."
There is some information on Cloud Atlas specifically on this topic; you can check it out here:
https://medium.com/google-cloud/google-cloud-storage-what-bucket-class-for-the-best-performance-5c847ac8f9f2
https://medium.com/google-cloud/google-cloud-storage-large-object-upload-speeds-7339751eaa24
https://medium.com/@duhroach/optimizing-google-cloud-storage-small-file-upload-performance-ad26530201dc
https://medium.com/@duhroach/google-cloud-storage-performance-4cfcec8bad72
https://cloud.google.com/storage/docs/best-practices
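If the uploads eventually need to happen from code rather than gsutil, the Python client library also lets you tune the resumable-upload chunk size, which is one of the knobs affecting large-file throughput. A rough sketch, assuming the google-cloud-storage package; the bucket name, object name and chunk size are placeholder values, not recommendations:

from google.cloud import storage

client = storage.Client()  # uses Application Default Credentials
bucket = client.bucket("my-example-bucket")

# A larger chunk size means fewer resumable-upload requests for big files.
blob = bucket.blob("uploads/big-file.bin", chunk_size=8 * 1024 * 1024)
blob.upload_from_filename("/path/to/big-file.bin")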

How to copy files from an encrypted S3 bucket to Google Cloud Storage?

I need to sync some files between an encrypted (S3-SSE) S3 bucket and a Google Cloud Storage bucket.
The task sounds simple, as gsutil supports S3, but unfortunately it seems it does not support SSE:
Requests specifying Server Side Encryption with AWS KMS managed keys require AWS Signature Version 4.
Is there an easy way to sync files between an encrypted (S3-SSE) S3 bucket and a Google Cloud Storage bucket (apart from writing our own script)?
As gsutil doesn't currently support Signature Version 4, there doesn't look to be an "easy" way (i.e. without writing a script of your own) to sync files between your two buckets. A naive solution might simply chain together the s3 cli and gsutil tools for each copy, using your machine as the middleman for a daisy-chain approach as gsutil already does for cross-cloud-provider copies.
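A rough sketch of that daisy-chain in Python, assuming boto3 and google-cloud-storage; the bucket names are placeholders, and the S3 downloads of SSE-KMS objects work as long as the caller is allowed to use the KMS key (boto3 signs with Signature Version 4):

import boto3
from google.cloud import storage

s3 = boto3.client("s3")
gcs_bucket = storage.Client().bucket("my-gcs-bucket")

# Copy every S3 object through the local machine into GCS.
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket="my-encrypted-s3-bucket"):
    for obj in page.get("Contents", []):
        key = obj["Key"]
        local_path = "/tmp/" + key.replace("/", "_")
        s3.download_file("my-encrypted-s3-bucket", key, local_path)  # object is decrypted transparently on GET
        gcs_bucket.blob(key).upload_from_filename(local_path)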

AWS FTP behavior

I'm having an issue with my AWS S3 bucket and vsftpd.
I've created a vsftpd instance and mounted an AWS S3 bucket. My issue is that every time I upload a file and the connection is disrupted, the FTP client's retry appends to the existing file in the S3 bucket instead of overwriting it. What should I set in the S3 bucket policy to get overwrite behavior instead of append?
There are no Amazon S3 configuration settings that would impact this behaviour -- it is totally the result of the software you are using.
It's also worth mentioning that FTP is a rather old protocol and these days there are much better alternatives, such as uploads via the browser or Dropbox-like shared folders.
One of the easiest options is to have your users upload directly to Amazon S3 -- that way, you don't need to run any servers. This could be done by uploading via a browser, or by providing users with some software, such as Cloudberry Explorer or the AWS Command-Line Interface (CLI).
I highly encourage you to stop using FTP these days.
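One simple way to let users upload straight to S3 without running any servers is a pre-signed URL; here is a small sketch with boto3, where the bucket, key and expiry are placeholders:

import boto3

s3 = boto3.client("s3")

# Generate a URL that permits a single HTTP PUT of this key for one hour.
url = s3.generate_presigned_url(
    "put_object",
    Params={"Bucket": "my-upload-bucket", "Key": "incoming/report.pdf"},
    ExpiresIn=3600,
)
print(url)  # the user PUTs the file to this URL; no AWS credentials needed on their side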

Managing files on Amazon S3

I have a git repository that stores audio files.
Obviously, it's not the best usage of git, and the repo has become quite large.
As an alternative, I would like to be able to manipulate these audio files at the command line, "committing" when some work is done.
Is this type of workflow possible when manipulating Amazon S3 files at the command line?
Or do you, for example, scp files to S3?
There are some rsync tools for S3 that may work for you; here is an example which I have not tried: http://www.s3rsync.com/
How important are the older versions of the audio? Amazon S3 buckets can have 'versioning' turned on, and you get full versioning support. You pay full price for each version - I don't know if you have 10 GB or 10 TB to store, what your budget is, etc. The Amazon versioning is nice, but there are not a lot of tools that fully support it.
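If you do go the versioning route, it can be turned on and inspected programmatically; a small boto3 sketch, with the bucket and key names as placeholders:

import boto3

s3 = boto3.client("s3")

# Every overwrite then keeps the previous copy as an older version.
s3.put_bucket_versioning(
    Bucket="my-audio-bucket",
    VersioningConfiguration={"Status": "Enabled"},
)

# A rough "history" of one file.
for v in s3.list_object_versions(Bucket="my-audio-bucket", Prefix="tracks/song.wav").get("Versions", []):
    print(v["VersionId"], v["LastModified"], v["IsLatest"])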
To manipulate S3 files you will first have to download them and then upload them when you are done; this is relatively simple to do (see the sketch below).
However, if the number of files you have is truly large, the slow transfer rate and bandwidth charges will kill you. If you don't have that many files, Dropbox is built on top of S3 and has syncing and rudimentary version control, and bandwidth is not charged.
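The download/edit/re-upload cycle mentioned above is only a few lines with boto3 (names are placeholders); each re-upload is effectively your "commit":

import boto3

s3 = boto3.client("s3")

s3.download_file("my-audio-bucket", "tracks/song.wav", "song.wav")  # "check out" the file
# ... edit song.wav locally with your audio tools ...
s3.upload_file("song.wav", "my-audio-bucket", "tracks/song.wav")    # "commit" it back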
I feel like using a good networked storage system and git on your LAN is still the better idea.

How do I use Amazon's new RRS for S3?

Reduced Redundancy Storage (RRS) is a new storage option from Amazon that is a bit cheaper than standard S3 storage because there is less redundancy.
However, I cannot find any information on how to specify that my data should use RRS rather than standard S3. In fact, there doesn't seem to be any website interface for the S3 service. If I log into AWS, there are only options for EC2, Elastic MapReduce, CloudFront and RDS, none of which I use.
I know this question is old, but it's worth mentioning that Amazon's interface for S3 now has an option to change your files (recursively) to RRS. Select a folder and right-click on it; under Properties, change the storage class to RRS.
You can use S3 Browser to switch to Reduced Redundancy Storage. It allows you to view/edit the storage class for a single file or for multiple files. Moreover, you can configure a default storage class for the bucket, so S3 Browser will automatically apply the predefined storage class to all new files you upload through S3 Browser.
If you are using S3 Browser to work with RRS, the following article may be helpful:
Working with Amazon S3 Reduced Redundancy Storage (RRS)
Note: storage class preferences are stored in a local settings file. Other S3 applications use their own way of storing bucket defaults, and currently there is no single standard for this.
All objects in Amazon S3 have a storage class setting. The default setting is STANDARD. You can use an optional header on a PUT request to specify the setting REDUCED_REDUNDANCY.
From: http://aws.amazon.com/s3/faqs/#How_do_I_specify_that_I_want_to_store_my_data_using_RRS
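In SDK terms that header is normally exposed as a storage-class parameter on the upload call; for example, with boto3 (bucket and key are placeholders):

import boto3

s3 = boto3.client("s3")

# StorageClass sets the x-amz-storage-class header on the PUT request.
with open("archive.tar.gz", "rb") as f:
    s3.put_object(
        Bucket="my-bucket",
        Key="backups/archive.tar.gz",
        Body=f,
        StorageClass="REDUCED_REDUNDANCY",
    )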
If you are looking for a way to convert existing data in amazon s3, you can use a fairly recent version of boto and a script I wrote. Details explained on my blog:
http://www.bryceboe.com/2010/07/02/amazon-s3-convert-objects-to-reduced-redundancy-storage/
If you're on a Mac, the free Cyberduck FTP program will do it. Log into S3, right-click on the bucket (or folder, or file), choose 'Info', and change the storage class from 'unknown' or 'regular s3 storage' to 'reduced redundancy storage'. It took about 2 hours to change 30,000 files for me...
If you use boto, you can do this:
key.change_storage_class('REDUCED_REDUNDANCY')