Detect table to view materialization change in CI - dbt

Is there an easy way to detect a change in materialization in CI, to avoid a dbt run failure with the error
Compilation Error in model stores_stores
(models/marts/core/blah.sql) Trying to create view
`blah`.`dbt`.`blah`, but it currently exists as a
table. Either drop `blah`.`dbt`.`blah` manually,
or run dbt with `--full-refresh` and dbt will drop it for you.
Thanks!

There is a way; however, it's not particularly "easy".
What you can do is leverage the artifacts that dbt generates.
manifest.json: produced by compile, run, test, docs generate, ls
run_results.json: produced by run, test, seed, snapshot, docs generate
catalog.json: produced by docs generate
The information for materialisation change can be found in both the run_results and the manifest. However, in your context of adding a check to the CI in order to fail early, you want to be notified before getting an error from a dbt run. So you could actually generate the manifest.json with dbt compile.
In the nodes key of the manifest, each node has a config.materialized key that you can look at. You can parse it with the command line or with python and store the result in a JSON file which holds the materialisation information of each model. That file can be checked into your codebase, for example.
cat target/manifest.json | jq '.nodes | to_entries | map({node: .key, materialized: .value.config.materialized})' > old_state.json
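If you prefer the python route mentioned above, here is a minimal equivalent sketch (it produces the same old_state.json as the jq one-liner):
import json

# Load the manifest produced by `dbt compile`
with open("target/manifest.json") as f:
    manifest = json.load(f)

# Mirror the jq one-liner: one entry per node with its materialization
state = [
    {"node": name, "materialized": node["config"]["materialized"]}
    for name, node in manifest["nodes"].items()
]

with open("old_state.json", "w") as f:
    json.dump(state, f, indent=2)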
Then after you've made a change to your dbt code, you need to run
dbt compile # generates new manifest.json
cat target/manifest.json | jq '.nodes | to_entries | map({node: .key, materialized: .value.config.materialized})' > new_state.json
You can then compare the two states, e.g. with diff on the command line. Here is an example output:
$ diff old_state.json new_state.json
12c12
< "materialized": "table"
---
> "materialized": "view"
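If you want CI to fail early on such a change, a minimal sketch is to regenerate the state and fail on any difference (note that this naive diff also fires when models are added or removed, not only when a materialisation flips):
dbt compile
cat target/manifest.json | jq '.nodes | to_entries | map({node: .key, materialized: .value.config.materialized})' > new_state.json
# A non-zero diff exit code means the states differ; fail the job
diff old_state.json new_state.json || { echo "Materialisation change detected, consider --full-refresh"; exit 1; }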
As I said, this is not "easy" per se, but I hope my answer gave you some ideas on how to proceed to get what you want. If you're interested in more details, you can check my blog post on the topic.

Related

Is it possible to disable the syntax check when running molecule test?

I have a role, role-1, that I'm testing, which depends on another role, role-2.
I clone the second role, role-2, into /tmp during the prepare step, and it is imported from /tmp later during the converge step. However, during the
INFO Running default > syntax
I get an error that role-2 is not found, as this role is not yet cloned at that point and does not exist on the system.
From the debug/verbose output it looks like molecule test results in the following command being run:
COMMAND: ansible-playbook --diff --inventory /home/vagrant/.cache/molecule/role-1/default/inventory --skip-tags molecule-notest,notest --syntax-check /opt/role-1-role/ansible/roles/role-1/molecule/default/converge.yml
Is there a way to stop this command from running the --syntax-check, or to override the default command that molecule test runs? Or to have the syntax check skip certain tasks or files?
I just found that you can add a scenario section to the molecule.yml file and override/re-order the test sequence, so that solves the issue I was having: reorder the sequence so that the syntax check happens after the prepare step.
See molecule.scenario
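For illustration, a molecule.yml sketch of that reordering (the surrounding steps are an assumption based on molecule's default sequence, which varies by version; the point is just that syntax comes after prepare):
scenario:
  test_sequence:
    - dependency
    - destroy
    - create
    - prepare
    - syntax
    - converge
    - idempotence
    - verify
    - destroy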

Ctest get number of tests passed/failed in script

Is there a straightforward way when using ctest to get the number of tests passed (and/or failed) within a script, e.g., BASH, without grep-ping through a generated output file?
a straightforward way ... without grep-ping
No, I believe there is not.
You can also "grep" and count the lines Test failed. and Test passed. in the_build_dir/Testing/Temporary/LastTest.log that CMake/CTest generates.
You could potentially generate the ctest XML reports intended for a dashboard and then parse them yourself (instead of submitting them). It's nowhere near as straightforward: a ctest script has to be written that configures, builds and tests the project, and then a separate XML tool needs to parse the result.
You can also run a CDash server, let that ctest script upload the results to CDash, and then query the CDash server with a simple curl 'https://your.cdash.server/api/v1/index.php?project=TheProjectName' | jq '.buildgroups[] | select(.id == 2).builds[] | {pass: .test.pass, fail: .test.fail}'. The querying is simple, but it requires running a CDash server and also testing with a ctest script, so it's not exactly straightforward either.
Btw, it's easy to get the number of failed tests - it's just wc -l the_build_dir/Testing/Temporary/LastTestsFailed.log.
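Putting that together, a small shell sketch (assuming the default build-tree layout; the existence check guards against the log being absent, e.g. after a fully passing run):
( cd the_build_dir && ctest ) || true   # don't abort the script when tests fail
failed=0
if [ -f the_build_dir/Testing/Temporary/LastTestsFailed.log ]; then
  failed=$(wc -l < the_build_dir/Testing/Temporary/LastTestsFailed.log)
fi
echo "Failed tests: $failed"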

How to run multi-tag selector

I am using dbt 0.18.1 and I followed the documentation about tags; however, I am curious how to run a multi-tag selector as arguments.
According to this:
https://github.com/fishtown-analytics/dbt/pull/1014
Select using a mix of tags, fqns, and parent/child selectors:
$ dbt run --model tag:nightly+ salesforce.*+
Unfortunately this is not really a "mix of tags".
I have the tags [mixpanel_tests, quality] and I wish to run the models that have both tags (not either one). If I run dbt run -m tag:quality -t blabla
it executes all models that have quality in their array of tags, regardless of whether it's a single argument or multiple arguments; however, I wish to run ONLY the models marked with both. How to do that?
How do I specify a selector with 2 or 3 tags to run only the models that have all the mentioned tags defined (i.e. mixpanel_tests and quality)? More or less an AND clause rather than an OR clause.
I hope it is clear: how do I get a multi-tag selector that executes only models with the given combination of tags?
Check out the intersection operator. It's new in dbt v0.18, and it's for this use case exactly.
dbt run -m tag:mixpanel_tests,tag:quality
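Note the comma: listing the selectors separated by a space instead, e.g.
dbt run -m tag:mixpanel_tests tag:quality
gives the union (OR) of the two tags, while the comma-separated form above selects only models carrying both.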

Bamboo with tSQLt - Failed to parse test result file

First of all I should point out I'm new to Atlassian's Bamboo and continuous integration in general. This is the first project where I've used either.
I've created a raft of unit tests using the tSQLt framework. I've also configured Bamboo to:
Get a fresh copy of the repository from BitBucket
Drop & re-create the build DB
Use Red-Gate SQL Compare to deploy the DB objects from source to the build DB
Run the tSQLt tests
Output the results of the tests in XML format to a file called TestResults.xml
I've checked and can confirm that the TestResults.xml file is created.
In Bamboo I then added a JUnit Parser task to consume the contents of this TestResults.xml file. However, when that task runs it returns this error:
Failed to parse test result file
At first I thought it might have meant that Bamboo could not find the file. I changed the task that created the results file to output a file called TestResults2.xml. When I did that the JUnit Parser returned this error:
Failing task since test cases were expected but none were found.
So I'm assuming that the first error message means Bamboo is finding the file; it just can't parse it.
I have no idea where to start working out what exactly is the problem. Has anyone got any ideas?
I had a similar problem, but it turned out to be weird behavior from Bamboo, which needs the file timestamps to be modified before it gains visibility of the JUnit file.
In a Windows environment you just need to add a "Script task" before the "JUnit task":
powershell (ls *.xml).LastWriteTime = Get-Date
Reference
https://jira.atlassian.com/browse/BAM-12768
I have had several cases of this and was able to fix it by removing single quotes and greater-than/less-than characters from the test names inside the *.rb file.
Example
test "make sure 'go_to_world' is removed from header and length < 23"
changed to remove the single quotes and the < symbol:
test "make sure go_to_world is removed from header and length less than 23"
Very common culprits are contractions ("won't", "don't", "shouldn't"), possessives ("the vessel's data"), and also < or > characters.
I think there is a bug in the parser that just doesn't escape those characters in a test title appropriately.
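If you want to catch such titles before the parser does, a quick sketch (the path and the test-definition pattern are assumptions; adjust them to wherever your tests live):
# Flag test titles containing single quotes or angle brackets
grep -rnE "test \".*['<>]" --include='*.rb' .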

How to force STORE (overwrite) to HDFS in Pig?

When developing Pig scripts that use the STORE command, I have to delete the output directory for every run or the script stops with:
2012-06-19 19:22:49,680 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 6000: Output Location Validation Failed for: 'hdfs://[server]/user/[user]/foo/bar More info to follow:
Output directory hdfs://[server]/user/[user]/foo/bar already exists
So I'm searching for an in-Pig solution to automatically remove the directory, one that also doesn't choke if the directory is non-existent at call time.
In the Pig Latin Reference I found the shell command invoker fs. Unfortunately the Pig script breaks whenever anything produces an error. So I can't use
fs -rmr foo/bar
(i.e. remove recursively) since it breaks if the directory doesn't exist. For a moment I thought I might use
fs -test -e foo/bar
which is a test and shouldn't break, or so I thought. However, Pig again interprets test's return code on a non-existing directory as a failure code and breaks.
There is a JIRA ticket for the Pig project addressing my problem and suggesting an optional parameter OVERWRITE or FORCE_WRITE for the STORE command. Anyway, I'm using Pig 0.8.1 out of necessity and there is no such parameter.
At last I found a solution on grokbase. Since finding the solution took too long I will reproduce it here and add to it.
Suppose you want to store your output using the statement
STORE Relation INTO 'foo/bar';
Then, in order to delete the directory, you can call at the start of the script
rmf foo/bar
No ";" or quotation marks are required since it is a shell command.
I cannot reproduce it now, but at some point I got an error message (something about missing files), and I can only assume that rmf interfered with map/reduce. So I recommend putting the call before any relation declaration; after SETs, REGISTERs and defaults should be fine.
Example:
SET mapred.fairscheduler.pool 'inhouse';
REGISTER /usr/lib/pig/contrib/piggybank/java/piggybank.jar;
%default name 'foobar'
rmf foo/bar
Rel = LOAD 'something.tsv';
STORE Rel INTO 'foo/bar';
Once you use the fs command, there are a lot of ways to do this. For an individual file, I wound up adding this to the beginning of my scripts:
-- Delete file (won't work for output, which will be a directory,
-- but will work for a file that gets copied or moved during
-- the script.)
fs -touchz top_100
rm top_100
For a directory
-- Delete dir
fs -rm -r out
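If your Hadoop version's rm supports the -f flag (an assumption worth checking with hadoop fs -help), that makes the directory delete idempotent too, so the script no longer chokes when the directory is missing:
-- Delete dir, ignoring the error if it doesn't exist
fs -rm -r -f out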