Can Spinnaker prevent out-of-order deployments?

Currently
We use a CI platform to build, test, and release new code when a new PR is merged into master. The "release" step is quite simple/stupid, and essentially runs kubectl patch with the tag of the newly-pushed docker image.
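For illustration, that step amounts to roughly the following (a sketch only; the deployment, container, and registry names are placeholders, and NEW_TAG stands in for whatever variable our CI exposes for the new image tag):

# Illustrative sketch of the current "release" step; names are placeholders.
release:
  script:
    - |
      kubectl patch deployment my-app -p \
        "{\"spec\":{\"template\":{\"spec\":{\"containers\":[{\"name\":\"my-app\",\"image\":\"registry.example.com/my-app:${NEW_TAG}\"}]}}}}"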
The Problem
When two PRs merge at about the same time (ex: A, then B -- B includes A's commits, but not vice-versa), it may happen that B finishes its build/test first and begins its release step first. When this happens, A releases second, even though it has older code. The result is a steady state in which B's code has effectively been rolled back by A's deployment.
We want to keep our CI/CD as continuous as possible, ideally without:
serializing our CI pipeline (so that only one workflow runs at a time)
delaying/batching our deployments
Does Spinnaker have functionality or a best practice that solves this?

Best practices for your issue are widely described under message ordering for asynchronous systems. The simplest solution would be to implement the FIFO principle for your CI/CD pipeline.
It will save you from implementing checks between CI and CD parts.
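For comparison, one example of such a check between CI and CD (the kind of guard that FIFO ordering lets you skip) is a commit-ancestry test in the deploy job. The sketch below is only illustrative: it assumes the deployed image tag is a git SHA, that the job has full git history available, and that the deployment name, registry, and NEW_SHA variable are placeholders:

deploy:
  script:
    - |
      # Tag of the image currently running in the cluster (assumed to be a git SHA).
      DEPLOYED_SHA=$(kubectl get deployment my-app \
        -o jsonpath='{.spec.template.spec.containers[0].image}' | cut -d: -f2)
      # Deploy only if the running commit is an ancestor of the commit being shipped,
      # i.e. the new build really does contain the code that is already live.
      if git merge-base --is-ancestor "$DEPLOYED_SHA" "$NEW_SHA"; then
        kubectl set image deployment/my-app my-app="registry.example.com/my-app:$NEW_SHA"
      else
        echo "Skipping deploy: $NEW_SHA does not contain the currently deployed commit."
      fi

With FIFO ordering in place, none of this is needed, because deployments start in the same order the merges happened.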

Related

How to test my pipeline before changing it?

When changing the pipelines for my company, I often see a pipeline breaking under some specific condition that we did not anticipate. We use YAML files to describe the pipelines (Azure DevOps).
We have multiple scenarios, such as:
Pipelines are run by automatic triggers, by other pipelines and manually
Pipelines share the same templates
There are IF conditions for some jobs/steps based on parameters (user input)
In the end, I keep thinking of testing all scenarios before merging changes; we could create scripts to do that. But it's infeasible to actually RUN all scenarios because it would take forever, so I wonder how to test them without running them. Is it possible? Do you have any ideas?
Thanks!
I already tried the Preview endpoints from the Azure REST API, which are good, but they only validate the input, such as variables and parameters. We also need to verify which steps would run and which variables would be set in them.
As far as I know (I am still new to our ADO solutions), we have to fully run/schedule the pipeline to see that it runs and then wait a day or so for the scheduler to complete the automatic executions. At that point I have some failing pipelines for a couple of days that I need to fix.
I do get emails when a pipeline fails, configured like this in the JSON that holds the metadata to create a job:
"settings": {
"name": "pipelineName",
"email_notifications": {
"on_failure": [
"myEmail#email.com"
],
"no_alert_for_skipped_runs": true
},
There's an equivalent extension that can be added from the link below, but I have not done it this way and cannot verify whether it works.
Azure Pipelines: Notification on Job failure
https://marketplace.visualstudio.com/items?itemName=rvo.SendEmailTask
I am not sure what actions your pipeline performs, but if it schedules jobs on external compute such as Databricks, there should be an email alert system you can use to detect failures.
Other than that, if you have multiple environments (dev, qa, prod), you could test in a non-production environment.
Or, if you have a dedicated storage location that is only for testing a pipeline, use that for the first few days, then reschedule the pipeline against the real location once testing has completed a few runs.

How to use one container in pipeline?

The situation is that we are moving from Jenkins to GitLab CI. Every time a stage runs in the pipeline, a new container is created. I would like to know whether it is possible to reuse the container from the previous stage, that is, a single container for the whole pipeline. The GitLab executor is Docker.
I want to preserve the state of a single container.
No, this is not possible in a practical way with the docker executor. Each job is executed in its own container. There is no setting to change this behavior.
Keep in mind that jobs (even across stages) can run concurrently and that jobs can land on runners on completely different underlying machines. Therefore, this is not really practical.
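If what actually needs to carry over between jobs is files rather than a live container, the usual pattern is to pass them as artifacts. A minimal sketch (job names and commands are placeholders):

build:
  stage: build
  script:
    - ./build.sh            # placeholder build command
  artifacts:
    paths:
      - dist/               # files saved here are uploaded after the job

test:
  stage: test
  script:
    - ls dist/              # the artifact is restored into this job's fresh container

This shares files, not running processes or installed state, so anything the next job needs at runtime still has to be part of the image or reinstalled in the job.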

GitLab pipelines equivalent for GitHub actions

I have a pipeline in GitLab that consists of multiple stages.
Each stage has a few jobs and produces artifacts, which are passed to the next stage if all the jobs in the stage pass.
Something similar to this screenshot:
Is there any way to achieve something similar in GitHub Actions?
Generally speaking, you can get very close to what you have above in GitHub Actions. You'd trigger a workflow based on push and pull_request events so that it runs when someone pushes to your repository; then you'd define each of your jobs. You would then use the needs syntax to define dependencies instead of stages (which is similar to the 14.2 needs syntax from GitLab), so for example your auto-deploy job would have needs: [test1, test2].
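A minimal sketch of that shape (job names and commands are placeholders):

# Sketch of a workflow where needs replaces GitLab stage ordering.
name: ci
on: [push, pull_request]

jobs:
  test1:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: ./run-tests.sh unit           # placeholder command
  test2:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: ./run-tests.sh integration    # placeholder command
  auto-deploy:
    needs: [test1, test2]                  # runs only after both test jobs succeed
    runs-on: ubuntu-latest
    steps:
      - run: ./deploy.sh                   # placeholder command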
The one thing you will not be able to replicate is the manual wait on pushing to production. GitHub Actions does not have the ability to pause at a job step and wait for a manual action. You can typically work around this by running workflows based on the release event, or by using a manual kick-off of the whole pipeline with a given variable set.
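For the manual kick-off with a variable, one option is a workflow_dispatch trigger with an input (a sketch; the input name is arbitrary, and the value is available to steps as github.event.inputs.environment):

# Manually triggered workflow with a user-supplied input.
on:
  workflow_dispatch:
    inputs:
      environment:
        description: "Where to deploy"
        required: true
        default: "production"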
When looking at how to handle artifacts, check out the answer to this other Stack Overflow question: Github actions share workspace/artifacts between jobs?

Liquibase incremental snapshots

We've got a rather interesting use-case where we're using Liquibase to deploy a database for our application but we're not actually in control of the database. This means that we've got to add in a lot of extra logic around each time we run Liquibase to avoid encountering any errors during the actual run. One way we've done that is that we're generating snapshots of what the DB should look like for each release of our product and then comparing that snapshot with the running DB to know that it's in a compatible state. The snapshot files for our complete database aren't gigantic but if we have to have a full one for every possible release that could cause our software package to get large in the future with dead weight.
We've looked at using the Linux patch command to create offset files, as the deltas between these files will typically be very small (i.e. one column change, etc.), but the issue is the generated IDs in the snapshot, which are not consistent across runs:
"snapshotId": "aefa109",
"table": "liquibase.structure.core.Table#aefa103"
Is there any way to force the IDs to be consistent or attack this problem in a different way?
Thanks!
Perhaps we should change how we think about PROD deployments. When I read:
This means that we've got to add in a lot of extra logic around each time we run Liquibase to avoid encountering any errors during the actual run.
This is sort of an anti-pattern in the world of Liquibase. Typically, Liquibase is used in a CI/CD pipeline and deployments of SQL are done on "lower environments" to practice for the PROD deployment (which many do not have control over, so your situation is a common one).
When we try to accommodate the possible errors during a PROD deployment, I feel we are already in a bad place with our deployment automation. We should have been testing the deploys on lower environments that look like PROD.
For example, your pipeline for your DB could look like:
DEV->QA->PROD
Create SQL for deployment in a changelog (a minimal changelog sketch follows this list)
DEV & QA seeded with restore from current state of PROD (maybe minus the row data)
You would have all control in DEV (the wild west)
Less control of QA (typically only by QA)
Iterate till you have no errors in your DEV & QA env
Deploy to PROD
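To make the changelog step concrete, a Liquibase changelog in YAML can be as small as the sketch below (the changeset id, author, table, and column are placeholders):

databaseChangeLog:
  - changeSet:
      id: add-email-column            # placeholder changeset id
      author: your.name               # placeholder author
      changes:
        - addColumn:
            tableName: customer       # placeholder table
            columns:
              - column:
                  name: email
                  type: varchar(255)

Each release adds new changesets to this file (or to files it includes), and Liquibase tracks which changesets have already been applied in each environment.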
If you still have errors, I would argue that you must root cause why and resolve so you can have a pipeline that is automatable.
Hope that helps,
Ronak

Can TeamCity tests be run asynchronously?

In our environment we have quite a few long-running functional tests which currently tie up build agents and force other builds to queue. Since these agents are only waiting on test results they could theoretically just be handing off the tests to other machines (test agents) and then run queued builds until the test results are available.
For CI builds (including unit tests) this should remain inline as we want instant feedback on failures, but it would be great to get a better balance between the time taken to run functional tests, the lead time of their results, and the throughput of our collective builds.
As far as I can tell, TeamCity does not natively support this scenario so I'm thinking there are a few options:
Spin up more agents and assign them to a 'Test' pool. Trigger functional build configs to run on these agents (triggered by successful CI builds). While this seems the cleanest, it doesn't scale very well, as we then have a lead time of purchasing licenses and will often need to run tests in alternate environments, which would temporarily double (or more) the required number of test agents.
Add builds or build steps to launch tests on external machines, then immediately mark the build as successful so queued builds can be processed; then, when the tests are complete, mark the build as succeeded/failed. This is reliant on being able to update the results of a previous build (REST API perhaps?). It also feels ugly to mark something as successful and then update it as failed later, but we could always be selective in what we monitor so we only see the final result.
Just keep spinning up agents until we no longer have builds queueing. The problem with this is that it's a moving target. If we knew where the plateau was (or whether it existed) this would be the way to go, but our usage pattern means this isn't viable.
Has anyone had success with a similar scenario, or knows pros/cons of any of the above I haven't thought of?
Your description of the available options seems to be pretty accurate.
If you want live updates of the build's progress, you will need to have one TeamCity agent "busy" for each running build.
The only downside here seems to be the agent licenses cost.
If the testing builds just launch processes on other machines, the TeamCity agent processes themselves can be run on a low-end machine and even many agents on a single computer.
An extension to your second scenario could be two build configurations instead of a single one: one would start the external process, and the other could be triggered on the external process's completion and then publish all of the external process's results as its own. It could also have a snapshot dependency on the starting build to maintain the relation.
For anyone curious, we ended up buying more agents and assigning them to a test pool. Investigation proved that it isn't possible to update build results (I can definitely understand why this ugliness wouldn't be supported out of the box).