Set CI/CD in dbt to test only modified models and new tests

I'm new to dbt and have been searching for an answer to this without success: how can we configure CI/CD to test only modified models and their tests, or newly added tests?

You will need the manifest.json from your last full production run; then you can use the --state flag together with the state: selector method to select only the modified models.
https://docs.getdbt.com/reference/node-selection/methods#the-state-method
https://docs.getdbt.com/docs/guides/understanding-state
As Aleix points out in their comment, you will usually also want to run the models downstream of the modified ones (for example, state:modified+).
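To make that concrete, here is a minimal sketch of such a CI job, assuming GitLab CI; the job name, stage, and the step that fetches the production manifest are placeholders you would adapt to your own setup:

test_modified_models:
  stage: test
  script:
    # Make the manifest.json from the last full production run available locally
    # (how you fetch it, e.g. object storage or job artifacts, is up to you).
    - mkdir -p prod-artifacts && cp /path/to/prod/manifest.json prod-artifacts/
    # Build (run + test) only the models that changed relative to production,
    # plus everything downstream of them.
    - dbt build --select state:modified+ --state prod-artifacts/

You can also add --defer so that refs to unmodified models resolve to the production objects instead of requiring them to be rebuilt in CI.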

Related

Can I default every dbt command to cautious indirect selection?

I have a dbt project with several singular tests that ref() several models, many of them testing if a downstream model matches some expected data in the upstream. Whenever I build only another downstream model that uses the same upstream models, dbt will try to execute the tests with an out-of-date model. Sample visualization:
Model "a" is an upstream view that only makes simply transformations on a source table
Model "x" is a downstream reporting table that uses ref("a")
Model "y" is another downstream reporting table that also uses ref("a")
There is a test "t1" making sure every a.some_id exists in x.some_key
Now if I run dbt build -s +y, "t1" will be picked up and executed; however, "x" is out of date compared to "a" because new data has been pushed into the source table, so the test will fail
If I run dbt build -s +y --indirect-selection=cautious the problem will not happen, since "t1" will not be picked up in the graph.
I want every single dbt command in my project to use --indirect-selection=cautious by default. Looking at the documentation, I've been unable to find any environment variable or YML key in dbt_project.yml that would change this behavior. Setting a new default selector also doesn't help, because it is overridden by the use of the -s flag (see the sketch below). Setting some form of alias does work, but it only affects me, not other developers.
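For reference, the default-selector attempt looks roughly like this; the selector name is made up, and the parents/indirect_selection keys on the criterion assume a reasonably recent dbt version:

selectors:
  - name: cautious_default          # hypothetical name
    description: "Build with cautious indirect test selection"
    default: true
    definition:
      method: fqn
      value: y                      # mirrors the "-s +y" example above
      parents: true                 # the "+" prefix, i.e. include upstream models
      indirect_selection: cautious  # don't pick up tests that also depend on unselected nodes

But dbt only falls back to the default selector when no -s/--select flag is given, so it does nothing for dbt build -s +y.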
Can I make every dbt build in my project use cautious selection by default, unless the flag --indirect-selection=eager is given?

gitlab-ci: Is there a way to visualize or simulate pipelines for different branch/tag names?

We have a fairly complex .gitlab-ci.yml configuration. I know about the new GitLab pipeline editor, but I can't find a way to 'simulate' which jobs get picked by my rules depending on the branch name, tag, etc.
On our jobs, we have a $PIPELINE custom variable that lets us have different pipeline 'types', by using schedules to set this variable to different values, like this:
rules:
- if: '$PIPELINE == "regular" && ($CI_COMMIT_BRANCH == "master" || $CI_COMMIT_TAG != null)'
or like this:
rules:
- if: '$CI_COMMIT_TAG != null'
Is there a way to 'simulate' a pipeline with different branch names, tags, and variables so I can see which jobs get picked in each case, without actually running the pipelines (e.g. with a test tag, etc.)? Or is there a better way to do this?
Thanks in advance.
Not quite, but this is close, with GitLab 15.3 (August 2022):
Simulate default branch pipeline in the Pipeline Editor
The pipeline editor helps prevent syntax errors in your pipeline before you commit. But pipeline logic issues are harder to spot. For example, incorrect rules and needs job dependencies might not be noticed until after you commit and try running a pipeline.
In this release, we’ve brought the ability to simulate a pipeline to the pipeline editor.
This was previously available in limited form in the CI Lint tool, but now you can use it directly in the pipeline editor. Use it to simulate a new pipeline creation on the default branch with your changes, and detect logic problems before you actually commit!
See Documentation and Issue.

What is the difference and relationship between an Azure DevOps Build Definition, and a Pipeline?

I am trying to automate a process in Azure DevOps, using the REST API. I think it should go like this (at least, this is the current manual process):
fork repo
create pipeline(s) using the YAML files in the newly forked repo
run the pipelines in a particular way
I am new to the Azure DevOps REST API and I am struggling to understand what I have done and what I should be doing.
Using the REST API, I seem to be able to create what I would call a pipeline, using the pipeline endpoint; I do notice that if I want to run it, I have to interact with its build definition instead.
Also, looking at code other colleagues have written, it seems (though I may be wrong) like they are able to achieve the same by simply creating a build definition, without explicitly creating a pipeline.
This lack of understanding is driving me bonkers so I am hoping someone can enlighten me!
Question
What is the difference, and relationship, between a Build Definition and a Pipeline?
Additional info: I am not interested in working with the older Release Pipelines, and I have tried to find the answer in the Azure DevOps REST API docs, but to no avail.
If you want to create a pipeline, you can do this using either of these. However, the difference is really one of concept:
build definitions are part of the original flow, which consisted of builds and releases: the build was responsible for building, testing, and publishing an artifact for later use in releases to deploy
pipelines are the newer approach, which uses a YAML-defined process for building, testing, and deploying code
You can find more info here: What's the difference between a build pipeline and a release pipeline in Azure DevOps?
For instance, for this pipeline/build
https://dev.azure.com/thecodemanual/DevOps%20Manual/_build?definitionId=157
where the definition ID is 157, you will get responses from both endpoints:
https://dev.azure.com/{{organization}}/{{project}}/_apis/build/definitions/157?api-version=5.1
and
https://dev.azure.com/{{organization}}/{{project}}/_apis/pipelines/157?api-version=6.0-preview.1
and in that sense, pipeline ID = build definition ID
The pipelines endpoint is not very useful:
https://dev.azure.com/{Organization}/{ProjectName}/_apis/pipelines?api-version=6.0-preview.1
It will only give you a list of pipelines with very basic info such as name, ID, folder etc.
To create and update YAML pipelines you need to use the Build definitions endpoint. The IDs you use in the endpoint are the same IDs as the Pipelines endpoint uses.
Get definition, Get list, Create, Update:
https://dev.azure.com/{Organization}/{ProjectName}/_apis/build/definitions?api-version=6.0
(To create a working pipeline you must first Get an existing pipeline, modify the JSON you receive, then POST it as a new definition.)

Is there a way to make Gitlab CI run only when I commit an actual file?

New to GitLab CI/CD.
What is the proper construct to use in my .gitlab-ci.yml file to ensure that my validation job runs only when a "real" checkin happens?
What I mean is: I observe that the moment I create a merge request, say (which of course creates a new branch), the CI/CD process runs. That is, the branch creation itself, despite the fact that no files have changed, causes the .gitlab-ci.yml file to be processed and pipelines to be kicked off.
Ideally, I'd only want this sort of thing to happen when there is actually a change to a file, a file addition, etc. In common-sense terms, I don't want CI/CD running on operations that don't actually change the state of the software under development.
I'm passably familiar with except and only, but these don't seem to be able to limit things the way I want. Am I missing a fundamental category or recipe?
I'm afraid what you ask is not possible within GitLab CI.
There could be a way to use the CI_COMMIT_SHA predefined variable, since it will be the same in your new branch as in your source branch.
Still, the pipeline will run before any custom script or condition gets a chance to determine or compare SHAs.
GitLab runs pipelines for branches or tags, not for individual commits. Pushing to a repo triggers a pipeline, and creating a branch is in fact pushing a change to the repo.
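For completeness, the construct people usually reach for is rules:changes, something like this (job name and paths are made up):

validate:
  script:
    - ./run-validation.sh           # placeholder for the validation step
  rules:
    # Only run when files under src/ actually change.
    - changes:
        - "src/**/*"

But this doesn't solve the case above: pushing a brand-new branch still creates a pipeline, and on that first push changes evaluates to true for every file because there is no previous commit to compare against; as far as I know that is GitLab's documented behavior.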

Entity Framework Code First - Tests Overlapping Each Other

My integration tests use a live DB that's generated using the EF initializers. When I run the tests individually, they run as expected. However, when I run them all at once, I get a lot of failed tests.
I appear to have some overlap going on. For example, I have two tests that use the same setup method. This setup method builds and populates the DB. Both tests perform the same Act step, which adds a handful of items to the DB (the same items), but each test checks different calculations (instead of one big test that does a lot of things).
One way I could solve this is to do some trickery in the setup that creates a unique DB for each test that's run, so that everything stays isolated. However, the EF initialization isn't working when I do that, because it creates a new DB rather than dropping and replacing an existing one (only the latter triggers the seeding).
Any ideas on how to address this? It seems like a question of how I organize my tests; I'm just not sure how best to go about it and was looking for input. I really don't want to have to run each test manually.
Use the test setup and teardown methods provided by your test framework: start a transaction in the setup and roll it back in the teardown (for example, with NUnit). You can even put the setup and teardown methods in a base class for all tests; each test will then run in its own transaction, which is rolled back at the end of the test, returning the database to its initial state.
In addition to what Ladislav mentioned, you can also use what's called a Delta Assertion.
For example, suppose you test adding a new Order to the SUT.
You could create a test that asserts there is exactly 1 Order in the database at the end of the test.
But you can also create a Delta Assertion by first checking how many Orders are in the database at the start of the test method. Then, after adding an Order to the SUT, you assert that there are NumberOfOrdersAtStart + 1 Orders in the database.