How to add rollback functionality to a basic S3 CodeBuild deploy - amazon-s3

I have followed these instructions to get a very basic CI workflow in AWS. It works flawlessly, but I want one extra piece of functionality: rollback. At first I thought it would work out of the box, but not in my case. If I select the previous job in CodeBuild that I want to roll back to and hit "Retry", I get this error message: "Error ArtifactsOverride must be set when using artifacts type CodePipelines". I have also tried to rerun the whole pipeline from the pipeline history page, but it's just a list of builds without any functionality.
My question is: how do I add a rollback function to my workflow? It doesn't have to be in the same pipeline, etc., but it should not touch git.

AWS CloudFormation now supports rolling back based on a CloudWatch alarm.
I'd put a CloudFront distribution in front of your S3 bucket with the origin path set to a folder within that bucket. Every time you deploy to S3 from CodeBuild you deploy to a random new S3 folder.
You then pass the folder name in a JSON file as an output artifact from your CodeBuild step. You can use this artifact as a parameter to a CloudFormation template updated by a CloudFormation action in your pipeline.
The CloudFormation template would update the OriginPath field of your CloudFront distribution to the folder containing your new deployment.
If the alarm fires then the CloudFormation template would roll back and flip back to the old folder.
There are several advantages to this approach:
Customers only ever see either the old version or the new version, rather than a potentially mixed set of files, while the deployment is running.
The deployment logic is simpler because you're uploading a fresh set of files every time, rather than figuring out which files are new and which need to be deleted.
The rollback is pretty simple because you're flipping back to files which are still there rather than re-deploying the old files.
Your pipeline would need to contain both the CodeBuild action and a subsequent CloudFormation action.
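To make the CodeBuild side concrete, here is a minimal sketch of what the deploy step could look like, assuming a Python build script, a build output folder named dist, and hypothetical names for the bucket and artifact file; the JSON it writes is what the CloudFormation action would consume as a parameter override for OriginPath.
import json
import os
import uuid

import boto3

s3 = boto3.client("s3")
bucket = "my-site-bucket"              # hypothetical origin bucket behind CloudFront
folder = f"deploy-{uuid.uuid4().hex}"  # fresh, randomly named folder per deployment

# Upload every built file under the new folder prefix.
for root, _, files in os.walk("dist"):
    for name in files:
        path = os.path.join(root, name)
        key = f"{folder}/{os.path.relpath(path, 'dist')}"
        s3.upload_file(path, bucket, key)

# Output artifact: the CloudFormation action reads this to set the distribution's OriginPath.
with open("deployment.json", "w") as f:
    json.dump({"OriginPath": f"/{folder}"}, f)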

Related

MLflow artifacts on S3 but not in UI

I'm running mlflow on my local machine and logging everything through a remote tracking server with my artifacts going to an S3 bucket. I've confirmed that they are present in S3 after a run but when I look at the UI the artifacts section is completely blank. There's no error, just empty space.
Any idea why this is? I've included a picture from the UI.
You should see a 500 response for the artifacts request to the MLflow tracking server, e.g. in the browser console when you click through to the page for the model of interest. The UI service doesn't know the artifact location (since you set that to be an S3 bucket) and tries to load the defaults.
You need to specify the --artifacts-destination s3://yourPathToArtifacts argument to your mlflow server command. Also, when running the server in your environment, don't forget to supply some common AWS credentials provider(s) (such as the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY env variables) as well as the MLFLOW_S3_ENDPOINT_URL env variable to point to your S3 endpoint.
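For example, a rough sketch of starting the server with those settings from Python (the credentials, endpoint, backend store and bucket path here are placeholders, not values from your setup):
import os
import subprocess

# Placeholders -- substitute your own credentials and S3 endpoint.
os.environ["AWS_ACCESS_KEY_ID"] = "..."
os.environ["AWS_SECRET_ACCESS_KEY"] = "..."
os.environ["MLFLOW_S3_ENDPOINT_URL"] = "https://s3.us-east-1.amazonaws.com"

# Start the tracking server so it can serve artifact requests backed by S3.
subprocess.run([
    "mlflow", "server",
    "--host", "0.0.0.0",
    "--port", "5000",
    "--backend-store-uri", "sqlite:///mlflow.db",           # assumed backend store
    "--artifacts-destination", "s3://yourPathToArtifacts",  # bucket path from above
])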
I had the same issue with mlflow running on an ec2 instance. I logged into the server and noticed that it was overloaded and no disk space was left. I deleted a few temp files and the mlflow UI started displaying the files again. It seems like mlflow stores tons of tmp files, but that is a separate issue.

Read bucket object in CDK

In Terraform, to read an object from an S3 bucket at deployment time, I can use a data source:
data "aws_s3_bucket_object" "example" {}
Is there a similar concept in CDK? I've seen various methods of uploading assets to s3, as well as importing an existing bucket, but not getting an object from the bucket. I need to read a configuration file from the bucket that will affect further deployment.
It's important to remember that CDK itself is not a deployment option. It can deploy, but the code you are writing in a CDK stack is the definition of your resources, not a method of deployment.
So, you can do one of a few things.
Use the SDK for your language to make a call to the S3 bucket and load the data directly. This is a perfectly acceptable and well-understood way to gather information you need before deployment: each time the stack synths (which it does before every cdk deploy), that code will run and pull your data.
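A minimal sketch of that first option, assuming a Python CDK app and a hypothetical bucket/key (my-config-bucket / deploy-config.json):
import json

import boto3
import aws_cdk as cdk

# Runs at synth time, i.e. before anything is deployed.
s3 = boto3.client("s3")
obj = s3.get_object(Bucket="my-config-bucket", Key="deploy-config.json")
config = json.loads(obj["Body"].read())

app = cdk.App()
# Use the values from the config file to shape your stack definition, e.g.:
# MyStack(app, "MyStack", instance_count=config["instance_count"])
app.synth()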
Use a CodePipeline to set up a proper pipeline, and give it two sources: one being your version control repo and the second your S3 bucket:
https://docs.aws.amazon.com/codebuild/latest/userguide/sample-multi-in-out.html
The preferred way: drop the JSON file and use Parameter Store. CDK contains modules that will create a token version of the parameter at synth time, and at deploy time it will resolve that reference back to Systems Manager Parameter Store:
https://docs.aws.amazon.com/cdk/v2/guide/get_ssm_value.html
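A minimal sketch of that approach, assuming a Python CDK stack and a hypothetical parameter name (/my-app/deploy-config):
import aws_cdk as cdk
from aws_cdk import aws_ssm as ssm
from constructs import Construct

class ConfigDrivenStack(cdk.Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        # A token at synth time; CloudFormation resolves the real value at deploy time.
        config_value = ssm.StringParameter.value_for_string_parameter(
            self, "/my-app/deploy-config"
        )
        cdk.CfnOutput(self, "ConfigValue", value=config_value)

        # Alternatively, value_from_lookup resolves at synth time and caches the result
        # in cdk.context.json:
        # config_value = ssm.StringParameter.value_from_lookup(self, "/my-app/deploy-config")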
If your parameters change after deployment, you can have that as part of your CDK stack pretty easily (using CloudFormation outputs). If they change in the middle of/during deployment, you really need to be using a CodePipeline to manage those steps instead of just CDK.
Because remember: cdk deploy is just a convenience. It will execute everything and has no way to pause in the middle and execute specific steps (other than very basic "this depends on that resource" ordering).

Does Serverless, Inc ever see my AWS credentials?

I would like to start using serverless-framework to manage lambda deploys at my company, but we handle PHI so security’s tight. Our compliance director and CTO had concerns about passing our AWS key and secret to another company.
When doing a serverless deploy, do AWS credentials ever actually pass through to Serverless, Inc?
If not, can someone point me to where in the code I can prove that?
Thanks!
Running serverless deploy isn't just one call; it's many.
AWS example (oversimplification):
Check if deployment s3 bucket already exists
Create an S3 bucket
Upload packages to s3 bucket
Call CloudFormation
Check CloudFormation stack status
Get info of created resources (e.g. endpoint URLs of created APIs)
And those calls can change depending on what you are doing and what you have done before.
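As a rough boto3 sketch of that same sequence (the bucket and stack names are made up), every one of these calls goes straight to an amazonaws.com endpoint:
import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")
cfn = boto3.client("cloudformation")
bucket = "my-service-deployment-bucket"  # hypothetical deployment bucket
stack = "my-service-dev"                 # hypothetical stack name

# 1-2. Check whether the deployment bucket exists; create it if not.
try:
    s3.head_bucket(Bucket=bucket)
except ClientError:
    s3.create_bucket(Bucket=bucket)

# 3. Upload the packaged code.
s3.upload_file("my-service.zip", bucket, "my-service.zip")

# 4-6. Kick off CloudFormation, wait on the stack status, then read the outputs.
cfn.update_stack(StackName=stack,
                 TemplateURL=f"https://{bucket}.s3.amazonaws.com/template.json",
                 Capabilities=["CAPABILITY_NAMED_IAM"])
cfn.get_waiter("stack_update_complete").wait(StackName=stack)
outputs = cfn.describe_stacks(StackName=stack)["Stacks"][0].get("Outputs", [])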
The point I'm trying to make is that these calls, which contain your credentials, are not all located in one place, and if you want to do a full code review of Serverless Framework and all its dependencies, have fun with that.
But under the hood, we know that it's actually using the JavaScript aws-sdk (go check out the package.json), and we know what endpoints that uses: {service}.{region}.amazonaws.com.
So to prove to your employers that nothing containing your credentials goes anywhere except AWS, you can just run a serverless deploy with Wireshark running (other network packet analyzers are available). That way you can see anything that isn't going to amazonaws.com.
But wait, why are calls being made to serverless.com and serverlessteam.com when I run a deploy?
Well that's just tracking some stats and you can see what they track here. But if you are uber paranoid, this can be turned off with serverless slstats --disable.

Concourse CI tool to continuously monitor an S3 bucket and trigger job if a file appears

I need to be able to trigger a Concourse task when a file appears in an Amazon S3 bucket.
There is this Concourse tool:
https://github.com/pivotalservices/concourse-curl-resource
However, I am looking to test for the existence of a file and, if it exists, trigger another job, then delete the file to reset.
Any suggestions?
The s3 resource can handle this with a versioned file. Every time you update/change the same file, the resource will detect a new update, and then trigger a job.

Backing up a Serverless Framework deployment

I'm familiar with Terraform and its terraform.tfstate file where it keeps track of which local resource identifiers map to which remote resources. I've noticed that there is a .serverless directory on my machine which seems to contain files such as CloudFormation templates and ZIP files containing Lambda code.
Suppose I create and deploy a project from my laptop, and Serverless spins up fooxyz.cloudfront.net which points to a Lambda function arn:aws:lambda:us-east-1:123456789012:function:handleRequest456. If I naively try to run Serverless again from another machine (or if I git clean my working directory), it'll spin up a new CloudFront endpoint since it doesn't know that fooxyz.cloudfront.net already represents the same application. I'm looking to back up the state it keeps internally, so that it modifies an existing resource rather than creates a new one. (The equivalent in Terraform would be to back up the terraform.tfstate file.)
If I wished to back up or restore a Serverless deployment state, which files would I back up? In the case of AWS, it seems like I should be backing up the CloudFormation templates; I don't want to back up the Lambda code since it's directly generated from the source. However, I'm likely going to use more than just AWS in the future, and so don't want to "special-case" the CloudFormation templates if at all possible.
How can I back up only the files I cannot regenerate?
I think what you are asking is: if I or a colleague checks out the serverless code from git on a different machine, will we still be able to deploy and update the same Lambda functions and the same API Gateway endpoints?
And the answer to that is yes! Serverless keeps track of all of that for you. Unless you run serverless remove, no operation will create a new Lambda or API endpoint.
My team and I use this method: we commit all the code to a git repo, one of us checks it out and deploys a function (or the entire thing), and it updates the existing set of functions properly. If you set up an environment file, that's really all you need to worry about, and I recommend leaving it out of git entirely.
For AWS, Serverless Framework keeps track of your deployment via CloudFormation (CF) parameters/identifiers which are specific to an account/region. The CF stack templates are uploaded to an (auto-generated) S3 bucket, so they're already backed up for you.
So all you really need is the original deployment code in a git repo and access to your keys. Everything else is already backed up for you.
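If you want to see that for yourself, here is a quick boto3 sketch (the stack and bucket names are made up; yours will differ) that pulls the live template from CloudFormation and lists what's in the auto-generated deployment bucket:
import boto3

cfn = boto3.client("cloudformation")
s3 = boto3.client("s3")

# The stack Serverless created, typically named <service>-<stage>.
template = cfn.get_template(StackName="my-service-dev")["TemplateBody"]

# The auto-generated deployment bucket holding uploaded templates and code zips.
bucket = "my-service-dev-serverlessdeploymentbucket-abc123"
for obj in s3.list_objects_v2(Bucket=bucket).get("Contents", []):
    print(obj["Key"])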