How to copy data from private S3 bucket to Azure Blob storage via Azure yaml pipeline - amazon-s3

I have an S3 bucket that stores CSV files. New CSV files are added to this bucket at the beginning of each month, and I want these new files uploaded automatically to Azure Blob Storage at the beginning of each month.
The way I was thinking to do this is to create a script (Bash/PowerShell) that pulls data from the AWS S3 bucket to Azure Blob Storage via the AzCopy command, and then plug this script into an Azure YAML pipeline that runs at the start of every month to execute it. But I can't find a way to integrate this script into an Azure YAML pipeline. Is this command feasible with a YAML pipeline, or is there a simpler way to do this?

We can copy data from an S3 bucket to Azure Blob Storage using azcopy.
azcopy copy "<s3-bucket-uri>" "https://StorageAccountName.blob.core.windows.net/container-name/?sas-token" --recursive
We can integrate azcopy with a YAML pipeline.
First, install azcopy on the pipeline agent as below:
- task: Bash@3
  displayName: Install azcopy
  inputs:
    targetType: 'inline'
    script: |
      curl -sL https://aka.ms/InstallAzureCLIDeb | sudo bash
      mkdir $(Agent.ToolsDirectory)/azcopy
      wget -O $(Agent.ToolsDirectory)/azcopy/azcopy_v10.tar.gz https://aka.ms/downloadazcopy-v10-linux
      tar -xf $(Agent.ToolsDirectory)/azcopy/azcopy_v10.tar.gz -C $(Agent.ToolsDirectory)/azcopy --strip-components=1
Then create an Azure CLI task in the pipeline that runs azcopy to copy the data from the S3 bucket to Azure Blob Storage.
Reference code:
- task: AzureCLI@2
  displayName: Download using azcopy
  inputs:
    azureSubscription: 'Service-Connection'
    scriptType: 'bash'
    scriptLocation: 'inlineScript'
    inlineScript: |
      # expiry timestamp, e.g. for generating a short-lived SAS token for the destination container
      end=`date -u -d "180 minutes" '+%Y-%m-%dT%H:%M:00Z'`
      $(Agent.ToolsDirectory)/azcopy/azcopy copy "<s3-bucket-uri>" "https://StorageAccountName.blob.core.windows.net/container-name/?sas-token" --recursive --check-md5=FailIfDifferent
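Note (an addition, not from the original answer): because the S3 bucket is private, azcopy also needs AWS credentials for the source. azcopy reads them from the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables, which can be passed into the task from secret pipeline variables via an env block at the same level as inputs, for example (variable names are placeholders):
  env:
    AWS_ACCESS_KEY_ID: $(AwsAccessKeyId)            # secret pipeline variable (placeholder name)
    AWS_SECRET_ACCESS_KEY: $(AwsSecretAccessKey)    # secret pipeline variable (placeholder name)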
Reference SO thread: Azure Pipelines - Download files with azcopy - Stack Overflow
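To cover the "runs every start of the month" part of the question, the pipeline itself can be put on a cron schedule in the same YAML (the branch name below is an assumption):
schedules:
  - cron: "0 0 1 * *"             # 00:00 UTC on the 1st of every month
    displayName: Monthly S3 to Blob copy
    branches:
      include:
        - main
    always: true                  # run even when there are no code changes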

Related

Persistent Bitbucket pipeline build artifacts greater than 14 days

I have a pipeline which loses build artifacts after 14 days. I.e., after 14 days, without S3 or Artifactory integration, the pipeline of course loses the "Deploy" button functionality - it becomes greyed out since the build artifact is removed. I understand this is intentional on BB/Atlassian's part to reduce costs etc. (details in the link below).
Please check the last section of this page, "Artifact downloads and Expiry": https://support.atlassian.com/bitbucket-cloud/docs/use-artifacts-in-steps/
If you need artifact storage for longer than 14 days (or more than 1 GB), we recommend using your own storage solution, like Amazon S3 or a hosted artifact repository like JFrog Artifactory.
Question:
Is anyone able to provide advice or sample code on how to approach BB Pipelines integration with Artifactory (or S3) in order to retain artifacts? Is the Artifactory generic upload/download pipe approach the only way, or is the quote above hinting at a more native BB "repository setting" that provides integration with S3 or Artifactory? https://www.jfrog.com/confluence/display/RTF6X/Bitbucket+Pipelines+Artifactory+Pipes
Bitbucket gives an example of linking to an S3 bucket on their site.
https://support.atlassian.com/bitbucket-cloud/docs/publish-and-link-your-build-artifacts/
The key is Step 4 where you link the artefact to the build.
However, the example doesn't actually create an artefact that is linked to S3, but rather adds a build status with a description that links to the uploaded items in S3. To use these in further steps you would then have to download the artefacts.
This can be done using the aws cli and an image that has this installed, for example the amazon/aws-sam-cli-build-image-nodejs14.x (SAM was required in my case).
The following is an example that:
Creates an artefact (a txt file) and uploads it to an AWS S3 bucket
Creates a "link" as a build status against the commit that triggered the pipeline, as per Atlassian's suggestion (this is just added for reference after the 14 days... meh)
Carries out a "deployment", whereby the artefact is downloaded from AWS S3. In this stage I also set the downloaded S3 artefact as a Bitbucket artefact - I mean why not... it may expire after 14 days, but if I've just re-deployed then I may want it available for another 14 days....
image: amazon/aws-sam-cli-build-image-nodejs14.x

pipelines:
  branches:
    main:
      - step:
          name: Create artefact
          script:
            - mkdir -p artefacts
            - echo "This is an artefact file..." > artefacts/buildinfo.txt
            - echo "Generating Build Number:\ ${BITBUCKET_BUILD_NUMBER}" >> artefacts/buildinfo.txt
            - echo "Git Commit Hash:\ ${BITBUCKET_COMMIT}" >> artefacts/buildinfo.txt
            - aws s3api put-object --bucket bitbucket-artefact-test --key ${BITBUCKET_BUILD_NUMBER}/buildinfo.txt --body artefacts/buildinfo.txt
      - step:
          name: Link artefact to AWS S3
          script:
            - export S3_URL="https://bitbucket-artefact-test.s3.eu-west-2.amazonaws.com/${BITBUCKET_BUILD_NUMBER}/buildinfo.txt"
            - export BUILD_STATUS="{\"key\":\"doc\", \"state\":\"SUCCESSFUL\", \"name\":\"DeployArtefact\", \"url\":\"${S3_URL}\"}"
            - curl -H "Content-Type:application/json" -X POST --user "${BB_AUTH_STRING}" -d "${BUILD_STATUS}" "https://api.bitbucket.org/2.0/repositories/${BITBUCKET_REPO_OWNER}/${BITBUCKET_REPO_SLUG}/commit/${BITBUCKET_COMMIT}/statuses/build"
      - step:
          name: Test - Deployment
          deployment: Test
          script:
            - mkdir artifacts
            - aws s3api get-object --bucket bitbucket-artefact-test --key ${BITBUCKET_BUILD_NUMBER}/buildinfo.txt artifacts/buildinfo.txt
            - cat artifacts/buildinfo.txt
          artifacts:
            - artifacts/**
Note:
I've got the following secrets/variables against the repository:
AWS_ACCESS_KEY_ID
AWS_SECRET_ACCESS_KEY
BB_AUTH_STRING
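One small caveat (an addition, not from the original answer): the S3_URL used for the build-status link only works if the object is publicly readable. If the bucket is private, a presigned URL could be generated instead, for example:
export S3_URL=$(aws s3 presign s3://bitbucket-artefact-test/${BITBUCKET_BUILD_NUMBER}/buildinfo.txt --expires-in 604800)   # 7 days, the maximum for presigned URLs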

Dotnet core code publish push to s3 as Zip from gitlab CI/CD

How can I zip the artifacts before copying them to the S3 bucket? This is needed because Elastic Beanstalk requires a zip file to update.
I want to deploy the dotnet publish output to Beanstalk. I am using GitLab CI/CD to trigger the build when new changes are pushed to the GitLab repo.
In my .gitlab-ci.yml file, what I am doing is:
build and publish the code using dotnet publish
copy the published folder artifact to s3 bucket as zip
create new beanstalk application version
update beanstalk environment to reflect the new changes.
Here I was able to perform all the steps except step 3. Can anyone please help me with how I can zip the published folder and copy that zip to the S3 bucket? Please find my relevant code below:
build:
  image: mcr.microsoft.com/dotnet/core/sdk:3.1
  script:
    - dotnet publish -c Release -o /builds/maskkk/samplewebapplication/publish/
  stage: build
  artifacts:
    paths:
      - /builds/maskkk/samplewebapplication/publish/

deployFile:
  image: python:latest
  stage: deploy
  script:
    - pip install awscli
    - aws configure set aws_access_key_id $AWS_ACCESS_KEY_ID
    - aws configure set aws_secret_access_key $AWS_SECRET_ACCESS_KEY
    - aws configure set region us-east-2
    - aws s3 cp --recursive /builds/maskkk/samplewebapplication/publish/ s3://elasticbeanstalk-us-east-2-654654456/JBGood-$CI_PIPELINE_ID
    - aws elasticbeanstalk create-application-version --application-name Test5 --version-label JBGood-$CI_PIPELINE_ID --source-bundle S3Bucket=elasticbeanstalk-us-east-2-654654456,S3Key=JBGood-$CI_PIPELINE_ID
    - aws elasticbeanstalk update-environment --application-name Test5 --environment-name Test5-env --version-label JBGood-$CI_PIPELINE_ID
I got the answer to this issue: we can simply run
zip -r ../published.zip *
This will create a zip file, which can then be uploaded to S3.
Please let me know if there is any better solution to this.
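For completeness, a rough sketch of how that zip step could slot into the deploy job above (paths, bucket and application names are taken from the question; the apt-get line is an assumption, since the python image may not ship with zip, and the aws configure lines from the original job would still be needed):
deployFile:
  image: python:latest
  stage: deploy
  script:
    - apt-get update && apt-get install -y zip                 # assumption: zip is not preinstalled in the image
    - pip install awscli
    - cd /builds/maskkk/samplewebapplication/publish/
    - zip -r ../published.zip *                                # bundle the publish output into a single archive
    - aws s3 cp ../published.zip s3://elasticbeanstalk-us-east-2-654654456/JBGood-$CI_PIPELINE_ID.zip
    - aws elasticbeanstalk create-application-version --application-name Test5 --version-label JBGood-$CI_PIPELINE_ID --source-bundle S3Bucket=elasticbeanstalk-us-east-2-654654456,S3Key=JBGood-$CI_PIPELINE_ID.zip
    - aws elasticbeanstalk update-environment --application-name Test5 --environment-name Test5-env --version-label JBGood-$CI_PIPELINE_ID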

How to download S3-Bucket, compress on the fly and reupload to another s3 bucket without downloading locally?

I want to download the contents of an S3 bucket (hosted on Wasabi, which claims to be fully S3 compatible) to my VPS, tar and gzip and gpg it, and reupload this archive to another S3 bucket on Wasabi!
My VPS machine only has 30 GB of storage, and the whole bucket is about 1000 GB in size, so I need to download, archive, encrypt and reupload all of it on the fly without storing the data locally.
The secret seems to be in using the | pipe command. But I am stuck even at the beginning: downloading a bucket into an archive locally (I want to go step by step):
s3cmd sync s3://mybucket | tar cvz archive.tar.gz -
In my mind at the end I expect some code like this:
s3cmd sync s3://mybucket | tar cvz | gpg --passphrase secretpassword | s3cmd put s3://theotherbucket/archive.tar.gz.gpg
but it's not working so far!
What am I missing?
The aws s3 sync command copies multiple files to the destination. It does not copy to stdout.
You could use aws s3 cp s3://mybucket - (including the dash at the end) to copy the contents of the file to stdout.
From cp — AWS CLI Command Reference:
The following cp command downloads an S3 object locally as a stream to standard output. Downloading as a stream is not currently compatible with the --recursive parameter:
aws s3 cp s3://mybucket/stream.txt -
This will only work for a single file.
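If the goal is simply to get every object encrypted into the other bucket (rather than producing one single tar archive), a rough sketch of a per-object streaming loop could look like the following, assuming the AWS CLI is pointed at the Wasabi endpoint (e.g. via --endpoint-url) and that a GPG_PASSPHRASE variable holds the passphrase (both are assumptions, not part of the original answer):
# Stream each object through gpg into the destination bucket without touching local disk.
# Note: keys containing spaces would need more careful parsing than awk '{print $4}'.
aws s3 ls s3://mybucket --recursive | awk '{print $4}' | while read -r key; do
  aws s3 cp "s3://mybucket/$key" - \
    | gpg --batch --pinentry-mode loopback --passphrase "$GPG_PASSPHRASE" --symmetric -o - \
    | aws s3 cp - "s3://theotherbucket/$key.gpg"
done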
You may try https://github.com/kahing/goofys. I guess in your case it could be the following approach:
$ goofys source-s3-bucket-name /mnt/src
$ goofys destination-s3-bucket-name /mnt/dst
$ tar -cvzf - /mnt/src | gpg -e -o /mnt/dst/archive.tgz.gpg
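Since both buckets live on Wasabi rather than AWS, goofys would presumably also need to be pointed at the Wasabi endpoint, e.g.:
$ goofys --endpoint https://s3.wasabisys.com source-s3-bucket-name /mnt/src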

Automating Folder creation in Azure Data Lake Store (Gen2)

Is there any way we can automate folder creation in ADLS Gen2 programmatically? It is required because this is a Production subscription with limited access from the Portal.
If there are any suggestions, please help with this.
For the benefit of the broader audience, re-sharing Ivan Yang's comment as an answer:
You can use the REST API here.
You could use the Azure CLI for this, but it is still in preview.
#add the extension
az extension add --name storage-preview
# Create Directory
$dirCheck = az storage blob directory exists -c $filesystemName -d $dirname --account-name $storageaccountname --debug
$exists = $dirCheck | ConvertFrom-Json
If (-Not $exists.exists[0])
{
az storage blob directory create -c $filesystemName -d $dirname --account-name $storageaccountname
}
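As a side note (not from the original answer): newer Azure CLI versions also expose Data Lake Gen2 directories through the az storage fs command group, which as far as I know no longer needs the preview extension. A rough equivalent of the create call above would be:
# Create a directory in an ADLS Gen2 filesystem (container) via the data lake ("fs") commands
az storage fs directory create --name $dirname --file-system $filesystemName --account-name $storageaccountname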

gsutil: specify project on copy

I'm attempting to come up with commands to facilitate deployment to different environments (production, staging) in my GCP project using gsutil.
The following deploys to production without issue:
gsutil cp -r ./build/* gs://<production-project-name>/
I'd like to deploy to a bucket in another project. The gsutil help page alludes to a -p option for ls and mb used to change the project context of the gsutil command.
I'd like to use a command like this to deploy my app to a staging environment:
gsutil cp -r ./build/* gs://<existing-bucket-in-staging-project>/ -p <staging-project-name>
Alas, the -p option is not available for the cp command. I confirmed on the gsutil cp doc page.
What is the best way to deploy a build artifact to a Google Cloud Storage bucket in a project other than the one currently specified in the terminal environment?
The bucket namespace is global, so as long as the credentials you're using have permission to the other project, you shouldn't need a project parameter with the cp command. In other words, this command should work fine:
gsutil cp -r ./build/* gs://<bucket-in-staging-project>
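For what it's worth, the -p flag mainly matters when the bucket is created, since that is the point where a bucket gets tied to a project; after that, copies are addressed purely by bucket name. A minimal sketch (the bucket name is a placeholder):
gsutil mb -p <staging-project-name> gs://<new-staging-bucket>
gsutil cp -r ./build/* gs://<new-staging-bucket>/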