Pause AWS S3 file upload

I am developing a Spring Boot based POC to implement AWS S3 file upload/pause/resume functionality, following this: https://sdk.amazonaws.com/java/api/latest/software/amazon/awssdk/transfer/s3/S3TransferManager.html
In the sample code under the description of resumeUploadFile, it says "// Optionally, persist the resumableFileUpload". Is this for recovering from scenarios like a JVM crash? What's the impact if it's not persisted?
Please advise.
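For context, here is a minimal sketch of the pause/persist/resume flow the Javadoc describes, assuming the AWS SDK for Java 2.x Transfer Manager (pause/resume relies on the CRT-based S3 client); the bucket, key, and file paths below are placeholders, not anything from the docs:

```java
import java.nio.file.Paths;

import software.amazon.awssdk.transfer.s3.S3TransferManager;
import software.amazon.awssdk.transfer.s3.model.FileUpload;
import software.amazon.awssdk.transfer.s3.model.ResumableFileUpload;
import software.amazon.awssdk.transfer.s3.model.UploadFileRequest;

public class PauseResumeExample {
    public static void main(String[] args) {
        S3TransferManager tm = S3TransferManager.create();

        // Start the upload (bucket/key/path are placeholders).
        FileUpload upload = tm.uploadFile(UploadFileRequest.builder()
                .putObjectRequest(b -> b.bucket("my-bucket").key("my-key"))
                .source(Paths.get("/tmp/large-file.bin"))
                .build());

        // Pause; the returned token describes the in-progress multipart upload.
        ResumableFileUpload resumable = upload.pause();

        // Persisting the token (e.g. to disk) matters if the resume may happen
        // in a different JVM run; within the same process you can simply keep
        // the object in memory and resume directly.
        resumable.serializeToFile(Paths.get("/tmp/pause-token.json"));

        // Later (possibly after a restart), reload the token and resume.
        ResumableFileUpload reloaded =
                ResumableFileUpload.fromFile(Paths.get("/tmp/pause-token.json"));
        tm.resumeUploadFile(reloaded).completionFuture().join();

        tm.close();
    }
}
```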

Related

What is the best approach to sync data from an AWS S3 bucket to Azure Data Lake Gen 2

Currently, I download CSV files from AWS S3 to my local computer using:
aws s3 sync s3://<cloud_source> c:/<local_destination> --profile aws_profile
Now, I would like to use the same process to sync the files from AWS to Azure Data Lake Storage Gen2 (one-way sync) on a daily basis. [Note: I only have read/download permissions for the S3 data source.]
I thought about 5 potential paths to solving this problem:
Use AWS CLI commands within Azure. I'm not entirely sure how to do that without running an Azure VM. Also, I would like my AWS profile credentials to persist.
Use Python's subprocess library to run AWS CLI commands. I run into similar issues as option 1, namely a) maintaining a persistent install of AWS CLI, b) passing AWS profile credentials, and c) running without an Azure VM.
Use Python's Boto3 library to access AWS services. In the past, it appears that Boto3 didn't support the AWS sync command. So, developers like #raydel-miranda developed their own. [see Sync two buckets through boto3]. However, it now appears that there is a DataSync class for Boto3. [see DataSync | Boto3 Docs 1.17.27 documentation]. Would I still need to run this in an Azure VM or could I use Azure Data Factory?
Use Azure Data Factory to copy data from AWS S3 bucket. [see Copy data from Amazon Simple Storage Service by using Azure Data Factory] My concern would be that I would want to sync rather than copy. I believe Azure Data Factory has functionality to check if a file already exists, but what if the file has been deleted from AWS S3 data source?
Use an Azure Data Science Virtual Machine to: a) install the AWS CLI, b) create my AWS profile to store the access credentials, and c) run the aws s3 sync... command.
Any tips, suggestions, or ideas on automating this process are greatly appreciated.
Adding one more to the list :)
6. Please do also look into the AzCopy option: https://learn.microsoft.com/en-us/azure/storage/common/storage-use-azcopy-s3?toc=/azure/storage/blobs/toc.json
I am not aware of any tool that helps with syncing the data; more or less all of them will just do a copy, so I think you will have to implement that yourself. A couple of quick thoughts:
#3) You can run this from a batch service. You can initiate that from Azure Data Factory. Also, since we are talking about Python, you can also run it from Azure Databricks.
#4) ADF does not have any sync logic for files to be deleted. We can implement that using the getMetadata activity (a sketch of the comparison logic is below, after this list): https://learn.microsoft.com/en-us/azure/data-factory/control-flow-get-metadata-activity
AzReplicate is another option, especially for very large containers: https://learn.microsoft.com/en-us/samples/azure/azreplicate/azreplicate/
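To make the sync-vs-copy point concrete, here is a minimal sketch of the comparison logic you would have to implement yourself, whatever service ends up running it. It assumes the AWS SDK for Java 2.x for listing the S3 side; how you list the ADLS Gen2 side and how you apply the copy/delete actions is left abstract, and the bucket name is a placeholder:

```java
import java.util.HashSet;
import java.util.Set;

import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.ListObjectsV2Request;

public class SyncPlanner {

    /** Keys currently present in the source S3 bucket (read-only access is enough). */
    static Set<String> listSourceKeys(S3Client s3, String bucket) {
        Set<String> keys = new HashSet<>();
        s3.listObjectsV2Paginator(ListObjectsV2Request.builder().bucket(bucket).build())
          .contents()
          .forEach(obj -> keys.add(obj.key()));
        return keys;
    }

    /** "Sync" = copy what is new on the source, delete what disappeared from it. */
    static void plan(Set<String> sourceKeys, Set<String> destinationKeys) {
        Set<String> toCopy = new HashSet<>(sourceKeys);
        toCopy.removeAll(destinationKeys);   // present in S3, missing in ADLS Gen2

        Set<String> toDelete = new HashSet<>(destinationKeys);
        toDelete.removeAll(sourceKeys);      // present in ADLS Gen2, gone from S3

        toCopy.forEach(k -> System.out.println("COPY   " + k));
        toDelete.forEach(k -> System.out.println("DELETE " + k));
    }

    public static void main(String[] args) {
        try (S3Client s3 = S3Client.create()) {
            Set<String> source = listSourceKeys(s3, "cloud_source"); // placeholder bucket
            Set<String> destination = new HashSet<>();               // fill from an ADLS Gen2 listing
            plan(source, destination);
        }
    }
}
```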

When to use s3cmd over accessing the S3 API programmatically?

I've been having difficulty understanding when to use the s3cmd program over using the Java API. A vendor has documentation on accessing S3 with s3cmd. It is unclear to me, as the bucket names appear to be dynamic, no region is specified, and I'm reaching out over a custom endpoint. I've tried writing some Java code to interact with S3 the same way that s3cmd does, but I haven't been able to connect. Overall, it appears to be quite a bit different.
To me s3cmd seems to be a utility to manipulate these files or quickly get at them. Integrating this utility into a Java program seems meaningless.
Anyone have any resources or can help me understand this better?
S3cmd (s3cmd) is a free command line tool and client for uploading, retrieving and managing data in Amazon S3 and other cloud storage service providers that use the S3 protocol, such as Google Cloud Storage or DreamHost DreamObjects. It is best suited for power users who are familiar with command line programs. It is also ideal for batch scripts and automated backup to S3, triggered from cron, etc.
S3cmd is written in Python. It's an open source project available under GNU Public License v2 (GPLv2) and is free for both commercial and private use. You will only have to pay Amazon for using their storage.
Lots of features and options have been added to S3cmd since its very first release in 2008. We recently counted more than 60 command line options, including multipart uploads, encryption, incremental backup, s3 sync, ACL and Metadata management, S3 bucket size, bucket policies, and more!
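If the goal is to reproduce from Java what s3cmd is doing against a vendor's S3-compatible endpoint, the usual missing pieces are the endpoint override and path-style addressing. A rough sketch, assuming the AWS SDK for Java 2.x; the endpoint, credentials, region, and bucket are placeholders for whatever the vendor's documentation gives you:

```java
import java.net.URI;

import software.amazon.awssdk.auth.credentials.AwsBasicCredentials;
import software.amazon.awssdk.auth.credentials.StaticCredentialsProvider;
import software.amazon.awssdk.regions.Region;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.S3Configuration;
import software.amazon.awssdk.services.s3.model.ListObjectsV2Request;

public class VendorS3Example {
    public static void main(String[] args) {
        S3Client s3 = S3Client.builder()
                // Plays the same role as host_base / access keys in an .s3cfg file.
                .endpointOverride(URI.create("https://s3.vendor.example.com"))
                .credentialsProvider(StaticCredentialsProvider.create(
                        AwsBasicCredentials.create("ACCESS_KEY", "SECRET_KEY")))
                // Many S3-compatible services ignore the region, but the SDK requires one.
                .region(Region.US_EAST_1)
                // Path-style URLs (endpoint/bucket/key) rather than virtual-hosted buckets.
                .serviceConfiguration(S3Configuration.builder()
                        .pathStyleAccessEnabled(true)
                        .build())
                .build();

        s3.listObjectsV2(ListObjectsV2Request.builder().bucket("vendor-bucket").build())
          .contents()
          .forEach(obj -> System.out.println(obj.key() + " (" + obj.size() + " bytes)"));

        s3.close();
    }
}
```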

How to upload a file to AWS S3 bucket from CakePHP 3.0?

I have used an S3 bucket to upload documents from my CakePHP 2.0 web application.
Right now I am facing a problem uploading from a CakePHP 3.0 shell script.
I get Fatal Error: Class S3 not found in App\Shell\S3.php.
I have googled a lot and tried to use the S3 SDK, but when I try to use the S3 class in my shell script I get the above error.
Please let me know if I am missing something, any help will be appreciated.
Your issue looks like a problem with autoloading the S3 SDK. Have you included the S3 SDK's autoloader properly?
Anyway, I'd recommend The PHP League's Flysystem for a very easy-to-use abstraction of S3: https://github.com/thephpleague/flysystem-aws-s3-v3

Cloudbees file logging options

It seems CloudBees writes the logs only to a stream and not to a file. I need to save my logs. Can I use any option other than Papertrail to store/retrieve log files? Can I listen to some input stream and get a feed of logs? Can I dump logs directly to Amazon S3?
As the filesystem isn't persistent, we also don't provide file-based logging. We don't provide a platform helper to store logs to S3, as Papertrail offers a comparable persistent solution with better performance and a dedicated service.
You can, of course, use your favorite logging framework with custom extensions to get logs stored on S3 (or elsewhere) if you prefer that option.
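As a rough illustration of that approach, here is a minimal sketch of a custom java.util.logging handler that buffers records and pushes them to S3 with the AWS SDK for Java 2.x; the bucket name, key scheme, and flush threshold are placeholders, not anything the platform provides:

```java
import java.util.logging.Handler;
import java.util.logging.LogRecord;
import java.util.logging.Logger;
import java.util.logging.SimpleFormatter;

import software.amazon.awssdk.core.sync.RequestBody;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.PutObjectRequest;

/** Buffers log records in memory and flushes them to S3 as timestamped objects. */
public class S3LogHandler extends Handler {
    private static final int FLUSH_THRESHOLD = 64 * 1024; // flush roughly every 64 KB

    private final S3Client s3 = S3Client.create();
    private final StringBuilder buffer = new StringBuilder();

    public S3LogHandler() {
        setFormatter(new SimpleFormatter());
    }

    @Override
    public synchronized void publish(LogRecord record) {
        if (!isLoggable(record)) {
            return;
        }
        buffer.append(getFormatter().format(record));
        if (buffer.length() >= FLUSH_THRESHOLD) {
            flush();
        }
    }

    @Override
    public synchronized void flush() {
        if (buffer.length() == 0) {
            return;
        }
        // One object per flush; "my-log-bucket" is a placeholder.
        String key = "logs/app-" + System.currentTimeMillis() + ".log";
        s3.putObject(PutObjectRequest.builder().bucket("my-log-bucket").key(key).build(),
                     RequestBody.fromString(buffer.toString()));
        buffer.setLength(0);
    }

    @Override
    public synchronized void close() {
        flush();
        s3.close();
    }

    public static void main(String[] args) {
        Logger logger = Logger.getLogger("demo");
        logger.addHandler(new S3LogHandler());
        logger.info("application started");
    }
}
```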

Upload file to Amazon S3 and assign callback for percentage uploaded

Is there an easy way to assign a callback function for the percentage uploaded in PHP with Amazon S3?
Something similar to this File Download, but for upload?
The AWS SDK for PHP 1.2.6 includes a runnable sample in _samples/cli-s3_progress_bar.php, which shows how to track upload/download progress. Download it here: http://aws.amazon.com/releasenotes/PHP/1553377899765189