Copy Data From Azure Blob Storage to AWS S3

I am new to Azure Data Factory and have an interesting requirement.
I need to move files from Azure Blob Storage to Amazon S3, ideally using Azure Data Factory.
However, S3 isn't supported as a sink:
https://learn.microsoft.com/en-us/azure/data-factory/copy-activity-overview
I also understand from a variety of comments I've read on here that you cannot directly copy from Blob Storage to S3 - you would need to download the file locally and then upload it to S3.
Does anyone know of any examples, in Data Factory, SSIS or an Azure Runbook, that can do such a thing? I suppose an option would be to write an Azure Logic App or Function that is called from Data Factory.

Managed to get something working on this - it might be useful for someone else.
I decided to write an Azure Function that uses an HTTP request as a trigger.
These two posts helped me a lot:
How can I use NuGet packages in my Azure Functions?
Copy from Azure Blob to AWS S3 using C#
Please note my answer about the NuGet packages if you are using Azure Functions 2.x.
Here is the code - you can modify the basis of this to your needs.
I return a JSON-serialized object because Azure Data Factory requires this as the response to an HTTP request sent from a pipeline:
#r "Microsoft.WindowsAzure.Storage"
#r "Newtonsoft.Json"
#r "System.Net.Http"
using System.Net;
using Microsoft.AspNetCore.Mvc;
using Microsoft.Extensions.Primitives;
using Newtonsoft.Json;
using Microsoft.WindowsAzure.Storage.Blob;
using System.Net.Http;
using Amazon.S3;
using Amazon.S3.Model;
using Amazon.S3.Transfer;
using Amazon.S3.Util;
public static async Task<IActionResult> Run(HttpRequest req, ILogger log)
{
log.LogInformation("Example Function has recieved a HTTP Request");
// get Params from query string
string blobUri = req.Query["blobUri"];
string bucketName = req.Query["bucketName"];
// Validate query string
if (String.IsNullOrEmpty(blobUri) || String.IsNullOrEmpty(bucketName)) {
Result outcome = new Result("Invalid Parameters Passed to Function",false,"blobUri or bucketName is null or empty");
return new BadRequestObjectResult(outcome.ConvertResultToJson());
}
// cast the blob to its type
Uri blobAbsoluteUri = new Uri(blobUri);
CloudBlockBlob blob = new CloudBlockBlob(blobAbsoluteUri);
// Do the Copy
bool resultBool = await CopyBlob(blob, bucketName, log);
if (resultBool) {
Result outcome = new Result("Copy Completed",true,"Blob: " + blobUri + " Copied to Bucket: " + bucketName);
return (ActionResult)new OkObjectResult(outcome.ConvertResultToJson());
}
else {
Result outcome = new Result("ERROR",false,"Copy was not successful Please review Application Logs");
return new BadRequestObjectResult(outcome.ConvertResultToJson());
}
}
static async Task<bool> CopyBlob(CloudBlockBlob blob, string existingBucket, ILogger log) {
var accessKey = "myAwsKey";
var secretKey = "myAwsSecret";
var keyName = blob.Name;
// Make the client
AmazonS3Client myClient = new AmazonS3Client(accessKey, secretKey, Amazon.RegionEndpoint.EUWest1);
// Check the Target Bucket Exists;
bool bucketExists = await AmazonS3Util.DoesS3BucketExistAsync (myClient,existingBucket);
if (!bucketExists) {
log.LogInformation("Bucket: " + existingBucket + " does not exist or is inaccessible to the application");
return false;
}
// Set up the Transfer Utility
TransferUtility fileTransferUtility = new TransferUtility(myClient);
// Stream the file
try {
log.LogInformation("Starting Copy");
using (var stream = await blob.OpenReadAsync()) {
// Note: You need permissions to not be private on the source blob
log.LogInformation("Streaming");
await fileTransferUtility.UploadAsync(stream,existingBucket,keyName);
log.LogInformation("Streaming Done");
}
log.LogInformation("Copy completed");
}
catch (AmazonS3Exception e) {
log.LogInformation("Error encountered on server. Message:'{0}' when writing an object", e.Message);
}
catch (Exception e) {
log.LogInformation("Unknown encountered on server. Message:'{0}' when writing an object", e.Message);
return false;
}
return true;
}
public class Result {
public string result;
public bool outcome;
public string UTCtime;
public string details;
public Result(string msg, bool outcomeBool, string fullMsg){
result=msg;
UTCtime=DateTime.Now.ToString("yyyy-MM-dd h:mm:ss tt");
outcome=outcomeBool;
details=fullMsg;
}
public string ConvertResultToJson() {
return JsonConvert.SerializeObject(this);
}
}
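For reference, the pipeline then invokes the function over HTTP with the two parameters on the query string. A hypothetical example (the function app name, function name, key, blob URI and bucket are all placeholders, and in practice the blobUri value should be URL-encoded):

https://myfunctionapp.azurewebsites.net/api/CopyBlobToS3?code=<function-key>&blobUri=https://myaccount.blob.core.windows.net/mycontainer/myfile.csv&bucketName=my-target-bucket

In ADF this would typically be a Web activity, with the JSON body of the response available to downstream activities.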

You can use Skyplane to copy data across clouds (its benchmarks claim up to a 110X speedup over CLI tools, with automatic compression to save on egress). To transfer from Azure Blob Storage to S3 you can call one of these commands:
skyplane cp -r az://azure-bucket-name/ s3://aws-bucket-name/
skyplane sync -r az://azure-bucket-name/ s3://aws-bucket-name/
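Note that Skyplane needs credentials for both clouds configured before it can transfer anything; running skyplane init walks you through connecting the Azure and AWS accounts.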

ADF now includes SFTP as a sink. From the same link provided in the question (supported sinks are in the far-right column):
Using the AWS Transfer Family you can set up an SFTP server and add a user with an SSH public key, then use that configuration to set up an SFTP connection from ADF that will write directly to an S3 bucket.
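As a rough sketch of the ADF side, the SFTP linked service would look something like the JSON below. Treat the property names as my reading of the ADF docs rather than gospel, and the host, user and key values are placeholders:

{
    "name": "TransferFamilySftp",
    "properties": {
        "type": "Sftp",
        "typeProperties": {
            "host": "s-xxxxxxxx.server.transfer.eu-west-1.amazonaws.com",
            "port": 22,
            "authenticationType": "SshPublicKey",
            "userName": "adf-user",
            "privateKeyContent": { "type": "SecureString", "value": "<base64-encoded private key>" }
        }
    }
}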

Download Files From Azure Storage using AzCopy into a temporary local repository
You can download the files from Azure cloud storage to your local system with the command below; the /s flag copies all the files recursively:
azcopy /Source:[source_container_url] /Dest:[local_file_path] /SourceKey:[source_storage_account_access_key] /s
Upload Local Files to Amazon S3 using the aws s3 cp command
aws s3 cp local_file_path s3://my-bucket/ --recursive
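If you are on the newer AzCopy v10, the equivalent download (assuming a SAS token on the container URL instead of the /SourceKey flag) is:
azcopy copy "https://[account].blob.core.windows.net/[container]?[SAS]" "[local_file_path]" --recursive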

Related

How to update an existing Blob in Azure Storage in .NET 6 or in ASP.NET Core

I have prepared some C# code to create a container in Azure Storage and then upload a file into that container. The code is below (note that fileName must be declared before it is used):
var connectionString = _settings.appConfig.StorageConnectionString;
BlobServiceClient blobServiceClient = new BlobServiceClient(connectionString);
BlobContainerClient blobContainer = blobServiceClient.GetBlobContainerClient("nasir-container");
await blobContainer.CreateIfNotExistsAsync(); // Create the container.
string fileName = "D:/Workspace/Adappt/MyWordFile.docx";
BlobClient blobClient = blobContainer.GetBlobClient(fileName); // Creating the blob
FileStream uploadFileStream = System.IO.File.OpenRead(fileName);
blobClient.Upload(uploadFileStream);
uploadFileStream.Close();
Now I have updated my MyWordFile.docx with more content. Now I would like to upload this updated file to the same blob storage. How can I do this? I also want to create versioning so that I can get the file content based on the version.
Now I have updated my MyWordFile.docx with more content. Now I would like to upload this updated file to the same blob storage. How can I do this?
To update a blob, you simply upload the same file again (basically the same code you wrote to upload the file in the first place) and let the upload overwrite the existing blob. Note that in the v12 SDK, Upload throws if the blob already exists unless you pass overwrite: true.
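Applied to the code in the question, a minimal sketch (reusing the same fileName and blobContainer as above):

// Re-upload the modified file over the existing blob;
// without overwrite: true, Upload throws because the blob already exists.
BlobClient blobClient = blobContainer.GetBlobClient(fileName);
using (FileStream uploadFileStream = System.IO.File.OpenRead(fileName))
{
    blobClient.Upload(uploadFileStream, overwrite: true);
}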
I also want to create versioning too so that I can get the file content based on the version.
There are two ways you can implement versioning for blobs:
Automatic versioning: If you want the Azure Blob Storage service to maintain versions of your blobs, all you need to do is enable versioning on the storage account. Once you enable it, any time a blob is modified a new version of the blob is created automatically for you by the service. Please see this link to learn more about blob versioning: https://learn.microsoft.com/en-us/azure/storage/blobs/versioning-overview.
Manual versioning: While automatic versioning is great, there could be many reasons why you would want to opt for manual versioning (e.g. you only want to version a few blobs and not all of them, or you are not using a V2 storage account). If that's the case, you can create a version of the blob by taking a snapshot of it before you update it. A snapshot creates a read-only copy of the blob at the point in time the snapshot was taken. Please see this link to learn more about blob snapshots: https://learn.microsoft.com/en-us/azure/storage/blobs/snapshots-overview.
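For the manual option, a minimal sketch (assuming the v12 SDK, a BlobClient named blobClient, and newContentStream standing in for your updated content):

// Take a read-only, point-in-time snapshot of the current content,
// then overwrite the base blob with the new content.
await blobClient.CreateSnapshotAsync();
await blobClient.UploadAsync(newContentStream, overwrite: true);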
First you need to enable versioning on the storage account, through the portal.
Just click on Disabled; it will take you to a different page, where you select Enable versioning and click Save.
After that, whenever you upload or update a blob, a new version is created automatically.
// Requires the Azure.Storage.Blobs package and these usings:
// using System.Text;
// using Azure;
// using Azure.Storage.Blobs;
// using Azure.Storage.Blobs.Models;
// using Azure.Storage.Blobs.Specialized;
public static async Task UpdateVersionedBlobMetadata(BlobContainerClient blobContainerClient,
    string blobName)
{
    try
    {
        // Create the container.
        await blobContainerClient.CreateIfNotExistsAsync();

        // Upload a block blob.
        BlockBlobClient blockBlobClient = blobContainerClient.GetBlockBlobClient(blobName);
        string blobContents = string.Format("Block blob created at {0}.", DateTime.Now);
        byte[] byteArray = Encoding.ASCII.GetBytes(blobContents);

        string initalVersionId;
        using (MemoryStream stream = new MemoryStream(byteArray))
        {
            Response<BlobContentInfo> uploadResponse =
                await blockBlobClient.UploadAsync(stream, null, default);

            // Get the version ID for the current version.
            initalVersionId = uploadResponse.Value.VersionId;
        }

        // Update the blob's metadata to trigger the creation of a new version.
        Dictionary<string, string> metadata = new Dictionary<string, string>
        {
            { "key", "value" },
            { "key1", "value1" }
        };

        Response<BlobInfo> metadataResponse =
            await blockBlobClient.SetMetadataAsync(metadata);

        // Get the version ID for the new current version.
        string newVersionId = metadataResponse.Value.VersionId;

        // Request metadata on the previous version.
        BlockBlobClient initalVersionBlob = blockBlobClient.WithVersion(initalVersionId);
        Response<BlobProperties> propertiesResponse = await initalVersionBlob.GetPropertiesAsync();
        PrintMetadata(propertiesResponse);

        // Request metadata on the current version.
        BlockBlobClient newVersionBlob = blockBlobClient.WithVersion(newVersionId);
        Response<BlobProperties> newPropertiesResponse = await newVersionBlob.GetPropertiesAsync();
        PrintMetadata(newPropertiesResponse);
    }
    catch (RequestFailedException e)
    {
        Console.WriteLine(e.Message);
        Console.ReadLine();
        throw;
    }
}

static void PrintMetadata(Response<BlobProperties> propertiesResponse)
{
    if (propertiesResponse.Value.Metadata.Count > 0)
    {
        Console.WriteLine("Metadata values for version {0}:", propertiesResponse.Value.VersionId);
        foreach (var item in propertiesResponse.Value.Metadata)
        {
            Console.WriteLine("Key:{0} Value:{1}", item.Key, item.Value);
        }
    }
    else
    {
        Console.WriteLine("Version {0} has no metadata.", propertiesResponse.Value.VersionId);
    }
}
The above code is from the official Azure Blob Storage versioning documentation.

Easy way to retrieve image source in abp

I'm pretty new to the ABP Framework and probably this question has a really simple answer, but I haven't managed to find it. Images are an important part of any app, and handling them well (size, caching) is mandatory.
Scenario
set up a File System Blob Storing provider. This means the uploaded file will be stored in the file system as an image file
make a service that uses a blob container to save and retrieve the image. After saving it, I use the unique file name as the blob name; this name is used to retrieve it back
the user is logged in, so authorization is required
I can easily obtain the byte[]s of the image by calling blobContainer.GetAllBytesOrNullAsync(blobName)
I want to easily display the image in an <img> tag or in a datatable row directly.
So, here is my question: is there an easy way to use a blob-stored image as the src of an <img> directly in a Razor page? What I've managed to achieve is setting, in the model, a source string made of the image type plus the bytes converted to a base-64 string (as here); however, in this case I need to do it in the model, and I also don't know whether the browser caches it. I don't see how caching would work in this case.
I am aware that this may be a question more related to ASP.NET Core, but I was thinking that maybe in ABP there is some way to access the image via a link.
If you have the ID of the blob then it is easy to do. Just create an endpoint to get the image based on the blob ID.
Here is the sample AppService
public class DocumentAppService : FileUploadAppService
{
    private readonly IBlobContainer<DocumentContainer> _blobContainer;
    private readonly IRepository<Document, Guid> _repository;

    public DocumentAppService(IRepository<Document, Guid> repository, IBlobContainer<DocumentContainer> blobContainer)
    {
        _repository = repository;
        _blobContainer = blobContainer;
    }

    public async Task<List<DocumentDto>> Upload([FromForm] List<IFormFile> files)
    {
        var output = new List<DocumentDto>();
        foreach (var file in files)
        {
            using var memoryStream = new MemoryStream();
            await file.CopyToAsync(memoryStream).ConfigureAwait(false);
            var id = Guid.NewGuid();
            var newFile = new Document(id, file.Length, file.ContentType, CurrentTenant.Id);
            var created = await _repository.InsertAsync(newFile);
            await _blobContainer.SaveAsync(id.ToString(), memoryStream.ToArray()).ConfigureAwait(false);
            output.Add(ObjectMapper.Map<Document, DocumentDto>(newFile));
        }
        return output;
    }

    public async Task<FileResult> Get(Guid id)
    {
        var currentFile = _repository.FirstOrDefault(x => x.Id == id);
        if (currentFile != null)
        {
            var myfile = await _blobContainer.GetAllBytesOrNullAsync(id.ToString());
            return new FileContentResult(myfile, currentFile.MimeType);
        }
        throw new FileNotFoundException();
    }
}
The Upload function uploads the files and the Get function returns a file.
Now set the Get route as the src for the image.
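If you're using ABP's conventional auto API controllers, the Get method above would typically be exposed at a route like /api/app/document/{id} (an assumption; check the routes your application actually generates), so the image can be referenced as:
<img src="/api/app/document/3fa85f64-5717-4562-b3fc-2c963f66afa6" />
Since the image is now served from a stable GET URL, browser caching can apply to it, subject to the cache headers your endpoint returns.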
Here is the blog post: https://blog.antosubash.com/posts/dotnet-file-upload-with-abp
Repo: https://github.com/antosubash/FileUpload

Log exceptions from log4net to Azure table storage in Azure function

I am referring to this link to log exceptions from log4net to Azure Table Storage.
Now I need to do the same in Azure Functions in .NET Core, but since there are no config files in Azure Functions, I am not able to use the same approach there.
What can I try to resolve this?
Now I need to do the same in Azure Functions in .NET Core, but since there are no config files in Azure Functions I am not able to use the same there
It seems you are trying to read some properties that you would usually read from a config file. You can do that in an Azure Function as well: there is a file named local.settings.json from which you can read your required properties (locally; when deployed, the same values come from the function app's application settings). See the example below:
local.settings.json:
{
    "IsEncrypted": false,
    "Values": {
        "AzureWebJobsStorage": "UseDevelopmentStorage=true",
        "FUNCTIONS_WORKER_RUNTIME": "dotnet",
        "TableName": "YourTableName",
        "AccountName": "YourAccountName",
        "AccountKey": "YourAccountKey"
    }
}
Read your properties in the Azure Function from local.settings.json:
public static class AzureFunctionApp2Arunraj414CaseForGetConfigProperty
{
    [FunctionName("AzureFunctionApp2Arunraj414CaseForGetConfigProperty")]
    public static async Task<IActionResult> Run(
        [HttpTrigger(AuthorizationLevel.Anonymous, "get", "post", Route = null)] HttpRequest req,
        ILogger log)
    {
        log.LogInformation("C# HTTP trigger function processed a request.");

        // Read the request body
        var content = await new StreamReader(req.Body).ReadToEndAsync();
        // Parse the request body into a class (Users is this sample's own request model)
        Users objUsers = JsonConvert.DeserializeObject<Users>(content);

        // Read the desired values from local.settings.json (application settings when deployed)
        var yourTableNameFromLocalSettingsJson = Environment.GetEnvironmentVariable("TableName");
        var yourAccountNameFromLocalSettingsJson = Environment.GetEnvironmentVariable("AccountName");
        var yourAccountKeyFromLocalSettingsJson = Environment.GetEnvironmentVariable("AccountKey");

        // Return all the properties read from local.settings.json
        var result = new OkObjectResult(yourTableNameFromLocalSettingsJson + " & " + yourAccountNameFromLocalSettingsJson + " & " + yourAccountKeyFromLocalSettingsJson + " From local.settings.json");
        return result;
    }
}
Debug & Testing:
See the screen shot below:
Note: You can even set a property outside of the Values section of the local.settings.json file, as a separate top-level property.
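Back to the original log4net question: since there is no app.config/web.config for log4net to pick up automatically, one option is to deploy a log4net.config file with the function app and load it explicitly at startup. A sketch under that assumption (MyFunction is a placeholder type; 'context' is the ExecutionContext injected into the function method by the Functions runtime):

// Load log4net configuration from a file shipped alongside the function,
// since Azure Functions has no app.config for log4net to find on its own.
var repository = log4net.LogManager.GetRepository(System.Reflection.Assembly.GetExecutingAssembly());
log4net.Config.XmlConfigurator.Configure(repository,
    new FileInfo(Path.Combine(context.FunctionAppDirectory, "log4net.config")));
var logger = log4net.LogManager.GetLogger(typeof(MyFunction));
logger.Error("Goes to whatever appender log4net.config declares, e.g. a table storage appender.");

The connection string and table name your appender needs can then be read with Environment.GetEnvironmentVariable as shown above.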

Can you download a specific Block from an Azure Block Blob?

Is it possible to download a specific block from an Azure Block Blob if you know the Block Id?
Yes, you absolutely can. Here's an example of how to download the first block:
var storageAccount = CloudStorageAccount.Parse("DefaultEndpointsProtocol=https;AccountName=...;AccountKey=...;EndpointSuffix=core.windows.net");
CloudBlobClient serviceClient = storageAccount.CreateCloudBlobClient();

var container = serviceClient.GetContainerReference("containerName");
var blockBlob = container.GetBlockBlobReference("blobName");

// The committed block list is returned in order, so the first entry is the first block.
var blocklist = await blockBlob.DownloadBlockListAsync();
var firstBlock = blocklist.First();

// Download just the byte range covered by the first block.
var memStream = new MemoryStream();
await blockBlob.DownloadRangeToStreamAsync(memStream, 0, firstBlock.Length);

// Rewind before reading, otherwise the reader starts at the end of the stream.
memStream.Position = 0;

string contents;
using (var streamReader = new StreamReader(memStream))
{
    contents = await streamReader.ReadToEndAsync();
}
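To fetch an arbitrary block rather than the first one, you can compute its offset from the block list, since committed blocks are returned in order. A minimal sketch (targetBlockId is a hypothetical base64 block ID you already know):

var blocks = (await blockBlob.DownloadBlockListAsync()).ToList();
long offset = 0;
ListBlockItem target = null;
foreach (var block in blocks)
{
    if (block.Name == targetBlockId) { target = block; break; }
    offset += block.Length; // skip past every block before the target
}
if (target != null)
{
    var ms = new MemoryStream();
    await blockBlob.DownloadRangeToStreamAsync(ms, offset, target.Length);
    ms.Position = 0;
    // ms now contains only the bytes of the requested block
}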
You will need the WindowsAzure.Storage package from NuGet, which provides the Microsoft.WindowsAzure.Storage and Microsoft.WindowsAzure.Storage.Blob namespaces used above.
You could use the Microsoft Azure Storage SDK to get started with Azure Blob Storage quickly. The Azure Storage SDK is a wrapper over the Blob service REST API. In the official Blob service REST API reference, we couldn't find anything about downloading a specific block via its block ID. However, you can use Get Blob to download the bytes of your blob in a specified range, by giving the offset and the length of the data to download.

Cascading Tap Configuration for Amazon S3 to Local file System

So I'm trying out the Cascading framework and I was able to run apps locally. As a next step I want to use Cascading to download files from S3 to the local file system. I'm planning to use FileTap. When I googled, I found S3FS, and it looks like it's deprecated. Can someone point me in the right direction? In other words, what is the right way to create a FileTap for Amazon S3 with an access key ID, secret key, and the like?
Though this code is for the HadoopFlowConnector, this will work with the local flow connector if using the FileTap.
public class Main {

    public void run(String[] args) {
        Properties properties = new Properties();
        String accessKey = args[0];
        String secretKey = args[1];

        // Better to put these keys in a Hadoop XML file.
        // For the block-based file system:
        properties.setProperty("fs.s3.awsAccessKeyId", accessKey);
        properties.setProperty("fs.s3.awsSecretAccessKey", secretKey);

        // For the S3 native file system:
        // properties.setProperty("fs.s3n.awsAccessKeyId", accessKey);
        // properties.setProperty("fs.s3n.awsSecretAccessKey", secretKey);
        // properties.setProperty("fs.defaultFS", "hdfs://localhost:8020/");
        // properties.setProperty("fs.permissions.umask-mode", "007");

        AppProps.setApplicationJarClass(properties, Main.class);
        HadoopFlowConnector flowConnector = new HadoopFlowConnector(properties);

        String input = "s3://my-bucket/my-log.csv";
        // If using native S3:
        // String input = "s3n://my-bucket/my-log.csv";

        Tap inTap = new Hfs(new TextDelimited(false, ";"), input);
        Pipe copyPipe = new Pipe("copy");
        Tap outTap = new Hfs(new TextDelimited(false, ";"), "data/output");

        FlowDef flowDef = FlowDef.flowDef()
                .addSource(copyPipe, inTap)
                .addTailSink(copyPipe, outTap);

        flowConnector.connect(flowDef).complete();
    }

    public static void main(String[] args) {
        new Main().run(args);
    }
}
Code courtesy of http://whiteboxdeveloper.blogspot.com/2015/01/processing-data-from-aws-s3-using.html.
I just added the commented S3N input line. Since I was using S3N-based access, I uncommented the S3N properties and used them.
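As a side note, on current Hadoop versions the s3:// and s3n:// filesystems are themselves deprecated in favor of s3a://; the equivalent properties there are fs.s3a.access.key and fs.s3a.secret.key, with URIs like s3a://my-bucket/my-log.csv.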