Sync files from Amazon S3 to Azure repo

Is there any way to sync files from an S3 bucket to Azure DevOps without having to build a pipeline?
I've been researching this and found the AWS Toolkit for Azure DevOps, but I haven't found anything that doesn't require me to develop something myself. I'd like something simpler, ideally with ongoing data synchronization.

Related

S3 bucket notifications to RabbitMQ using the dotnet SDK

I'm pretty new to S3. I'm trying to create a bucket and receive notifications on ObjectCreated events using code only (not the AWS Management UI).
I'm writing in .NET, so I'm using the AWSSDK.Core NuGet package.
So far I've managed to create a bucket using the SDK.
It seems like a trivial task, yet I couldn't find any references around the web on how to accomplish it.
Also, the object storage is S3-compatible, not AWS S3.
I tried configuring an SNS topic, but it seems that in order to enable notifications, the API requires SQS as the queueing service, not RabbitMQ.
I did see another approach, configuring a Lambda function that forwards messages to RabbitMQ, but I couldn't find references or documentation for that either.
Any help is appreciated :)
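For reference, here is a rough sketch of the S3 call involved, shown with the aws CLI rather than the .NET SDK (the same underlying S3 operation, PutBucketNotificationConfiguration, is also wrapped by the .NET SDK); the bucket name, endpoint URL and topic ARN are placeholders, and S3-compatible vendors may implement only a subset of this API:

# Placeholder names throughout; --endpoint-url points the CLI at an S3-compatible store.
aws s3api create-bucket --bucket my-bucket --endpoint-url https://s3.example.com

# PutBucketNotificationConfiguration only accepts SNS topics, SQS queues and Lambda
# functions as targets, which is why RabbitMQ cannot be wired in directly.
aws s3api put-bucket-notification-configuration \
  --bucket my-bucket \
  --endpoint-url https://s3.example.com \
  --notification-configuration '{
    "TopicConfigurations": [{
      "TopicArn": "arn:aws:sns:us-east-1:123456789012:s3-events",
      "Events": ["s3:ObjectCreated:*"]
    }]
  }'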

Delta Table transactional guarantees when loading using Autoloader from AWS S3 to Azure Datalake

I'm trying to use Autoloader where AWS S3 is the source and the Delta lake is in Azure Datalake Gen. When I try to read files, I get the following error:
Writing to Delta table on AWS from non-AWS is unsafe in terms of providing transactional guarantees. If you can guarantee that no one else will be concurrently modifying the same Delta table, you may turn this check off by setting the SparkConf: "spark.databricks.delta.logStore.crossCloud.fatal" to false when launching your cluster.
I tried setting this at the cluster level and it works fine. My question is: is there any way to ensure transactional guarantees while loading data from AWS S3 to Azure Datalake (the Datalake is the backing storage for our Delta lake)? We don't want to set "spark.databricks.delta.logStore.crossCloud.fatal" at the cluster level. Will there be any issue if we do, and is it a good solution for a production ETL pipeline?
This warning appears when Databricks detects that you're doing multi-cloud work.
But this warning is for the case when you're writing into AWS S3 using Delta, because S3 doesn't have an atomic write operation (like put-if-absent), so it requires a coordinator process that is available only on AWS.
In your case you can ignore this message, because you're just reading from AWS S3 and writing into a Delta table that is on Azure Datalake.

HDFS over S3 / Google storage bucket translation layer - how?

I'd love to expose a Google Cloud Storage bucket over HDFS to a service.
The service in question is a SOLR cluster that can only talk to HDFS. Given that I have no Hadoop (nor any need for it), ideally I'd like a Docker container that would use a Google Cloud Storage bucket as a backend and expose its contents via HDFS.
If possible I'd like to avoid mounts (like FUSE gcsfs); has anyone done such a thing?
I think I could just mount gcsfs and set up a single-node cluster with HDFS, but is there a simpler / more robust way?
Any hints / directions are appreciated.
The Cloud Storage Connector for Hadoop is the tool you might need.
It is not a Docker image but rather an install. Further instructions can be found in the GitHub repository under README.md and INSTALL.md.
If it is accessed from outside GCP (for example from AWS), you'll need a Service Account with access to Cloud Storage, and the environment variable GOOGLE_APPLICATION_CREDENTIALS set to /path/to/keyfile.
To use SOLR with GCS you do indeed need a Hadoop cluster; you can do that in GCP by creating a Dataproc cluster and then using the connector mentioned above to connect your SOLR solution to GCS. For more info, check the SOLR documentation.
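As a rough sketch of what using the connector looks like once it is installed (the key path and bucket name are placeholders; the core-site.xml property names in the comments are the ones documented for the connector):

# Service-account key for the connector (placeholder path).
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/keyfile.json

# Assumes the gcs-connector jar is on the Hadoop classpath and core-site.xml
# registers the gs:// scheme, e.g.:
#   fs.AbstractFileSystem.gs.impl = com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS
#   google.cloud.auth.service.account.enable = true

# With that in place, gs:// paths behave like any other Hadoop filesystem:
hadoop fs -ls gs://my-bucket/
hadoop fs -copyFromLocal ./some-local-file gs://my-bucket/solr-data/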

Copy objects from S3 to google cloud storage using aws-cli

Is it possible to access Google Cloud Storage using the aws CLI?
Google Cloud Platform supports copying files from S3 to Google Cloud Storage using gsutil with the following command:
gsutil -m cp -R s3://bucketname gs://bucketname
But I need to do this with the aws CLI instead of gsutil.
I am not aware of any solution from the AWS side, but unless you have a special reason not to use gsutil or another Google solution, you may want to consider the Google Cloud Storage Transfer Service instead. This service is the recommended option when transferring data from Amazon S3 buckets.
Compared with simply using gsutil or other CLI tools, Cloud Storage Transfer has several nice features, like the ability to schedule one-time or recurring transfers with advanced filters. You can also have the source objects deleted after transferring them, and even synchronize the destination bucket with the source, deleting objects in the destination that don't have a corresponding object in the source.
You can schedule transfers from the GCP Console or using the XML and JSON API.
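As a sketch of the JSON API route (project ID, bucket names, AWS credentials and dates below are all placeholders), a one-time transfer job can be created roughly like this:

# Create a one-time transfer job via the Storage Transfer Service JSON API.
curl -X POST https://storagetransfer.googleapis.com/v1/transferJobs \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  -d '{
    "description": "one-time S3 to GCS copy",
    "status": "ENABLED",
    "projectId": "my-project",
    "schedule": {
      "scheduleStartDate": {"year": 2021, "month": 1, "day": 15},
      "scheduleEndDate":   {"year": 2021, "month": 1, "day": 15}
    },
    "transferSpec": {
      "awsS3DataSource": {
        "bucketName": "my-s3-bucket",
        "awsAccessKey": {"accessKeyId": "AKIA...", "secretAccessKey": "..."}
      },
      "gcsDataSink": {"bucketName": "my-gcs-bucket"}
    }
  }'

Making the end date equal to the start date turns this into a one-off job rather than a recurring one.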

Can Spinnaker use local storage such as a MySQL database?

I want to deploy Spinnaker for my team, but I've run into a problem. The Spinnaker documentation says:
Before you can deploy Spinnaker, you must configure it to use one of the supported storage types.
Azure Storage
Google Cloud Storage
Redis
S3
Can Spinnaker use local storage, such as a MySQL database?
The Spinnaker microservice responsible for persisting your pipeline configs and application metadata, front50, has support for the storage systems you listed. One could add support for additional systems like MySQL by extending front50, but that support does not exist today.
Some folks have had success configuring front50 to use S3 and pointing it at a Minio installation.
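If you go the Minio route, the Halyard configuration typically looks something like the sketch below (endpoint, bucket and credentials are placeholders):

# Placeholder endpoint, bucket and credentials; Minio speaks the S3 API, so
# front50 is simply configured for S3 storage with a custom endpoint.
hal config storage s3 edit \
  --endpoint http://minio.example.com:9000 \
  --bucket spinnaker \
  --access-key-id minio \
  --secret-access-key   # prompts for the secret interactively
hal config storage edit --type s3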