Modify S3 API to access Ceph instead of Amazon S3 storage - amazon-s3

I have a JAR file, jets3t-0.7.4.jar, with which I can access Amazon's S3 storage. I need to modify its source code so that it accesses Ceph object storage instead. I know it can be done by modifying the S3 API, but I do not know how. Does anyone know how to do this? I googled for information, but didn't really find anything informative. Any help is appreciated. Thanks!

Just let the S3 endpoint resolve to your Ceph radosgw (Ceph's S3 API interface), via /etc/resolv.conf, dnsmasq, or jets3t's config. There are many ways available.
Many object storage systems claim to be S3 compatible, but in fact they are not. I think Ceph is one of them. If what you want is full compatibility, google Cloudian.

Related

Is it possible to sync an Azure repo with MWAA (Amazon Managed Workflows for Apache Airflow)?

I have set up a private MWAA instance in AWS. It has set up a bucket that stores DAGs in S3.
I've created a private repository in Azure DevOps and have set up a role that can access this bucket.
With Azure-Pipelines is it possible to sync the entire repository to control the DAGs created/modified in that S3 bucket?
I've seen it's possible to create artifacts and push them to the S3 bucket, but what if a DAG is deleted from the repository? It will still persist in the S3 bucket and will still be available in MWAA.
Any guidance will be appreciated.
If you just want to sync the entire repository to the S3 bucket, you can use the Amazon S3 Upload task in your Azure pipeline.
I'm not sure if that will fully address your problem, though.
If there is any misunderstanding, please feel free to add comments related to your issue.
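On the deleted-DAG concern raised in the question: an upload-only step never removes objects, so the pipeline also needs to delete keys that no longer exist in the repository. The AWS CLI's `aws s3 sync` command has a `--delete` flag for this. The same idea can be sketched in Python as below. This is only an illustration: the function name `plan_sync` and the `dags/` prefix are assumptions, not part of any AWS or Azure API.

```python
def plan_sync(repo_files, bucket_keys, prefix="dags/"):
    """Work out what to upload and what to delete so that the bucket
    prefix mirrors the repository (hypothetical helper)."""
    # Keys the bucket should contain after the sync.
    wanted = {prefix + path for path in repo_files}
    # Keys currently under the prefix; anything outside it is left alone.
    existing = {key for key in bucket_keys if key.startswith(prefix)}
    to_upload = sorted(wanted)             # upload everything in the repo
    to_delete = sorted(existing - wanted)  # stale objects, e.g. removed DAGs
    return to_upload, to_delete


uploads, deletes = plan_sync(
    ["etl.py", "reports/daily.py"],
    ["dags/etl.py", "dags/old_job.py", "requirements.txt"],
)
print(deletes)  # the DAG that was removed from the repo
```

The delete list would then be passed to whatever S3 client the pipeline uses. Restricting deletions to the prefix keeps files such as requirements.txt safe.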

Mount S3 bucket as an NFS share on an EC2 instance

Long-time reader, and I've usually been able to find the answers I've been looking for in existing posts - but this time I've not been able to.
I am essentially teaching myself AWS CDK from scratch. I've only really just started with it, so not finding anything which helps me on my mission may be a result of not knowing enough yet to be asking the right questions... so please bear with me.
Thus far I've used the AWS CDK with Python to create a stack which creates an S3 bucket, and also fires up an EC2 instance with an AWS file storage gateway AMI loaded on it (so running Amazon Linux). This deploys and runs fine - however now I'd like to programmatically set up the S3 bucket to be accessed via an NFS share on the EC2 instance. From what I've seen I'd assumed it is or should be fairly trivial however I keep getting a bit lost in documentation and internet hunts and not quite sure I'm looking in the right places or asking search engines the right questions to unlock the path to achieve this.
It looks like I should be able to script something up to make it happen when the instance starts, using user-data, but I'm a bit lost. Is anyone able to throw me some crumbs to follow to find a good way of achieving this, or a better way of achieving what I want to happen (which is basically accessing the S3 bucket contents as though they are files on an EC2 instance) - or just tell me how to do it if it's trivial enough?
Much appreciated :)
Dan
You are on the right track. user_data can be used for that.
I don't have full code to give you as it's use-case specific (e.g. which OS are you using?), but the user_data would have to download and install s3fs:
s3fs allows Linux and macOS to mount an S3 bucket via FUSE. s3fs preserves the native object format for files, allowing use of other tools like AWS CLI.
However, S3 is an object storage system, and it can't really be mounted on an instance the way NFS or EBS storage can. With s3fs-fuse you can mimic that behavior, and for some use cases it will be sufficient.
So what you can do is set up the user_data script through the console, verify that it works, and then basically just copy and paste it into CDK. It's more of a trial-and-error approach, but this is the best way to learn.
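As a rough starting point, the user_data script could be generated from Python like this. It is a sketch under several assumptions: the instance runs Amazon Linux 2 (so s3fs-fuse is installed from EPEL), the instance has an IAM role that allows access to the bucket, and the helper name `s3fs_user_data` and the mount point `/mnt/s3` are made up for illustration.

```python
def s3fs_user_data(bucket, mount_point="/mnt/s3"):
    """Build a user_data shell script that mounts `bucket` with s3fs
    (hypothetical helper; assumes Amazon Linux 2 and an instance role)."""
    lines = [
        "#!/bin/bash",
        # s3fs-fuse is packaged in EPEL on Amazon Linux 2.
        "amazon-linux-extras install -y epel",
        "yum install -y s3fs-fuse",
        f"mkdir -p {mount_point}",
        # iam_role=auto makes s3fs use the instance role's credentials;
        # allow_other lets non-root users read the mount.
        f"s3fs {bucket} {mount_point} -o iam_role=auto -o allow_other",
    ]
    return "\n".join(lines) + "\n"


print(s3fs_user_data("my-example-bucket"))
```

Once the script has been verified by hand on a test instance, the returned string can be handed to the CDK instance definition, for example through `ec2.UserData.custom(...)`.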

Copy file from AWS S3 bucket to Google cloud storage bucket

I am trying to copy a file from an AWS S3 bucket to a Google Cloud Storage bucket, and I want to implement it with the Python API. Please help me if anyone has done this.
Thanks in advance.
You can use gsutil to do this.
You can also use the GCS Transfer Service.
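Since the question asks for Python specifically, here is a minimal sketch that reads one object with boto3 and writes it with the google-cloud-storage client. It assumes both packages are installed, that credentials for both clouds are configured, and that the object fits in memory. The function name and the bucket and key names are placeholders.

```python
def copy_s3_object_to_gcs(s3_client, gcs_bucket, s3_bucket_name, key):
    """Copy one object from S3 to GCS, keeping the same key.

    s3_client  -- a boto3 S3 client, e.g. boto3.client("s3")
    gcs_bucket -- a google.cloud.storage Bucket object
    Returns the number of bytes copied.
    """
    response = s3_client.get_object(Bucket=s3_bucket_name, Key=key)
    data = response["Body"].read()  # whole object in memory: small files only
    blob = gcs_bucket.blob(key)
    blob.upload_from_string(
        data,
        content_type=response.get("ContentType", "application/octet-stream"),
    )
    return len(data)


# Example wiring (not executed here; names are placeholders):
#   import boto3
#   from google.cloud import storage
#   s3 = boto3.client("s3")
#   gcs = storage.Client().bucket("my-gcs-bucket")
#   copy_s3_object_to_gcs(s3, gcs, "my-s3-bucket", "path/to/file.csv")
```

For large objects or whole buckets, gsutil or the transfer service mentioned above is likely the better fit, because this sketch buffers each object completely.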

Connect to AWS S3 without API

I've looked everywhere on the Interweb but couldn't find a satisfying answer...
Does anybody know what "protocol" the AWS S3 speaks?
Our idea is to write a Function for a PLC (no chance to use the provided API) to communicate directly with AWS S3.
For Example PLC to "AWS IoT" works in MQTT/HTTP - how can I skip "AWS IoT"?
I know there is the possibility to put an IoT device inbetween - but we are evaluating our possibilities right now.
Thank you in advance
All of the AWS services have a documented REST API - the S3 one is here. In addition, all of their libraries are open source so you could likely get some ideas from them too.
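To make the "protocol" concrete: S3 is plain HTTP(S), and authenticated requests carry an Authorization header computed with AWS Signature Version 4 (HMAC-SHA256). The sketch below builds the headers for a GET of a single object using only hashing primitives, which is roughly what a PLC function would have to reproduce. The function name and the example credentials are made up, and the code assumes the object key needs no URL-encoding.

```python
import hashlib
import hmac


def _hmac(key, msg):
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def sign_s3_get(access_key, secret_key, region, bucket, key, amz_date):
    """Return HTTP headers for a SigV4-signed GET of one S3 object.

    amz_date is a UTC timestamp such as "20240101T000000Z".
    """
    host = f"{bucket}.s3.{region}.amazonaws.com"
    date_stamp = amz_date[:8]
    payload_hash = hashlib.sha256(b"").hexdigest()  # a GET has an empty body
    signed_headers = "host;x-amz-content-sha256;x-amz-date"
    canonical_request = "\n".join([
        "GET",
        "/" + key,  # assumes the key needs no URL-encoding
        "",         # no query string
        f"host:{host}",
        f"x-amz-content-sha256:{payload_hash}",
        f"x-amz-date:{amz_date}",
        "",         # blank line terminates the header block
        signed_headers,
        payload_hash,
    ])
    scope = f"{date_stamp}/{region}/s3/aws4_request"
    string_to_sign = "\n".join([
        "AWS4-HMAC-SHA256",
        amz_date,
        scope,
        hashlib.sha256(canonical_request.encode("utf-8")).hexdigest(),
    ])
    # Derive the signing key: date -> region -> service -> "aws4_request".
    k = _hmac(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    k = _hmac(k, region)
    k = _hmac(k, "s3")
    k = _hmac(k, "aws4_request")
    signature = hmac.new(k, string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()
    return {
        "Host": host,
        "x-amz-date": amz_date,
        "x-amz-content-sha256": payload_hash,
        "Authorization": (
            f"AWS4-HMAC-SHA256 Credential={access_key}/{scope}, "
            f"SignedHeaders={signed_headers}, Signature={signature}"
        ),
    }


headers = sign_s3_get("AKIDEXAMPLE", "example-secret", "us-east-1",
                      "examplebucket", "test.txt", "20240101T000000Z")
print(headers["Authorization"])
```

The request itself is then an ordinary `GET /test.txt HTTP/1.1` sent to the Host shown, with these headers attached. So the PLC would need an HTTP(S) client and SHA-256/HMAC routines, but not the AWS SDK.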

How do I use Amazon's new RRS for S3?

Reduced Redundancy Storage (RRS) is a new service from Amazon that is a bit cheaper than S3 because there is less redundancy.
However, I cannot find any information on how to specify that my data should use RRS rather than standard S3. In fact, there doesn't seem to be any website interface for S3 at all. If I log into AWS, there are only options for EC2, Elastic MapReduce, CloudFront and RDS, none of which I use.
I know this question is old but it's worth mentioning that Amazon's interface for S3 now has an option to change your files (recursively) to RRS. Select a folder and right click on it, under properties change the storage to RRS.
You can use S3 Browser to switch to Reduced Redundancy Storage. It allows you to view/edit storage class for a single file or for multiple files. Moreover, you can configure default storage class for the bucket, so S3 Browser will automatically apply predefined storage class for all new files you are uploading through S3 Browser.
If you are using S3 Browser to work with RRS, the following article may be helpful:
Working with Amazon S3 Reduced Redundancy Storage (RRS)
Note that Storage Class preferences are stored in a local settings file. Other S3 applications have their own ways of storing bucket defaults, and currently there is no single standard for this.
All objects in Amazon S3 have a storage class setting. The default setting is STANDARD. You can use an optional header on a PUT request to specify the setting REDUCED_REDUNDANCY.
From: http://aws.amazon.com/s3/faqs/#How_do_I_specify_that_I_want_to_store_my_data_using_RRS
If you are looking for a way to convert existing data in amazon s3, you can use a fairly recent version of boto and a script I wrote. Details explained on my blog:
http://www.bryceboe.com/2010/07/02/amazon-s3-convert-objects-to-reduced-redundancy-storage/
If you're on a Mac, the free Cyberduck FTP program will do it. Log into S3, right-click on the bucket (or folder, or file), choose 'info' and change the storage class from 'unknown' or 'regular s3 storage' to 'reduced redundancy storage'. It took about 2 hours to change 30,000 files for me...
If you use boto, you can do this:
key.change_storage_class('REDUCED_REDUNDANCY')
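To apply that call to every object in a bucket, a loop along these lines should work with the classic boto (version 2) library. This is a sketch, not a tested migration script: the function name is made up, and the bucket name in the usage comment is a placeholder.

```python
def convert_bucket_to_rrs(bucket):
    """Switch every object in a boto (v2) bucket to REDUCED_REDUNDANCY.

    Returns the number of keys that were changed.
    """
    converted = 0
    for key in bucket.list():  # iterates over all keys, paging as needed
        if key.storage_class != "REDUCED_REDUNDANCY":
            key.change_storage_class("REDUCED_REDUNDANCY")
            converted += 1
    return converted


# Example wiring (not executed here; requires boto and AWS credentials):
#   import boto
#   conn = boto.connect_s3()
#   bucket = conn.get_bucket("my-example-bucket")
#   print(convert_bucket_to_rrs(bucket))
```

Skipping keys that are already REDUCED_REDUNDANCY avoids rewriting objects unnecessarily if the script is run more than once.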