DBT - [WARNING]: Did not find matching node for patch - dbt

I keep getting the error below when I use dbt run - I can't find anything on why this error occurs or how to fix it within the dbt documentation.
[WARNING]: Did not find matching node for patch with name 'vGenericView' in the 'models' section of file 'models\generic_schema\schema.sql'

did you by chance recently upgrade to dbt 1.0.0? If so, this means that you have a model, vGenericView defined in a schema.yml but you don't have a vGenericView.sql model file to which it corresponds.

If all views and tables defined in schema are 1 to 1 with model files then try to run dbt clean and test or run afterward.
Not sure what happened to my project, but ran into frustration looking for missing and/or misspelled files when it was just leftovers from different compiled files not cleaned out. Previously moved views around to different schemas and renamed others.

So the mistake is here in the naming:
The model name in the models.yml file should for example be: employees
And the sql file should be named: employees.sql
So your models.yml will look like:
version: 2
models:
- name: employees
description: "View of employees"
And there must be a model with file name: employees.sql

One case when this will happen is if you have the same data source defined in two different schema.yml file (or whatever you call it)

Related

How to specify model schema when referencing another dbt project as a package? (dbt multi-repo setup)

We're using a dbt multi-repo setup with different projects for different business areas. We have several projects, something like this:
dbt_dwh
dbt_project1
dbt_project2
The dbt_dwh project contains models which we plan to reference in projects 1 and 2 (we have ~10 projects that would reference the dbt_dwh project) by way of installing git packages. Ideally, we'd like to be able to just reference the models in the dbt_dwh project (e.g.
SELECT * FROM {{ ref('dbt_dwh', 'model_1') }}). However, each of our projects sits in it's own database schema and this causes issue upon dbt run because dbt uses the target schema from dbt_project_x, where these objects don't exist. I've included example set-up info below, for clarity.
packages.yml file for dbt_project1:
packages:
- git: https://git/repo/url/here/dbt_dwh.git
revision: master
profiles.yml for dbt_dwh:
dbt_dwh:
target: dwh_dev
outputs:
dwh_dev:
<config rows here>
dwh_prod:
<config rows here>
profiles.yml for dbt_project1:
dbt_project1:
target: project1_dev
outputs:
project1_dev:
<config rows here>
project1_prod:
<config rows here>
sf_orders.sql in dbt_dwh:
{{
config(
materialized = 'table',
alias = 'sf_orders'
)
}}
SELECT * FROM {{ source('salesforce', 'orders') }} WHERE uid IS NOT NULL
revenue_model1.sql in dbt_project1:
{{
config(
materialized = 'table',
alias = 'revenue_model1'
)
}}
SELECT * FROM {{ ref('dbt_dwh', 'sf_orders') }}
My expectation here was that dbt would examine the sf_orders model and see that the default schema for the project it sits in (dbt_dwh) is dwh_dev, so it would construct the object reference as dwh_dev.sf_orders.
However, if you use command dbt run -m revenue_model1 then the default dbt behaviour is to assume all models are located in the default target for dbt_project1, so you get something like:
11:05:03 1 of 1 START sql table model project1_dev.revenue_model1 .................... [RUN]
11:05:04 1 of 1 ERROR creating sql table model project1_dev.revenue_model1 ........... [ERROR in 0.89s]
11:05:05
11:05:05 Completed with 1 error and 0 warnings:
11:05:05
11:05:05 Runtime Error in model revenue_model1 (folder\directory\revenue_model1.sql)
11:05:05 404 Not found: Table database_name.project1_dev.sf_orders was not found
I've got several questions here:
How do you force dbt to use a specific schema on runtime when using dbt ref function?
Is it possible to force dbt to use the default parameters/settings for models inside the dbt_dwh project when this Git repo is installed as a package in another project?
Some points to note:
All objects & schemas listed above sit in the same database
I know that many people recommend mono-repo set-up to avoid exactly this type of scenario, but switching to a mono-repo structure is not feasible right now, as we are already fully invested in multi-repo setup
Although it would be feasible to create source.yml files in each of the dbt projects to reference the output objects of the dbt_dwh project, this feels like duplication of effort and could result in different versions of the same sources.yml file across projects
I appreciate it is possible to hard-code the output schema in the dbt config block, but this removes our ability to test in dev environment/schema for dbt_dwh project
I managed to find a solution so I'll answer my own question in case anybody else runs up against the same issue. Unfortunately this is not documented anywhere that I can find, however, a throw-away comment in the dbt Slack workspace sparked an idea that allowed me to find the/a solution (I'll post the message if I manage to find it again, to give credit where it's due).
To fix this is actually very simple, you just need to add the project being imported to your profiles.yml file and specify the schema. For our use case this is fine as we only have 1 schema we use.
profiles.yml for dbt_project1:
models:
db_project_1:
outputs:
project1_dev:
<configs here>
project1_prod:
<configs here>
dbt_dwh:
+schema: [[schema you want these models to run into]]
<configs here>
The advantages with this approach are:
When you generate/serve dbt docs it allows you to see the upstream lineage from the upstream project
If there are any upstream dependencies in your upstream project you can run this using dbt run -m +model_name (this can be super handy)
If you don't want this behaviour then you can use dbt run -m +model_name --exclude dbt_dwh (for example) to prevent models in your upstream project from running.
I haven't yet figured out if it is possible to use the default parameters/settings for models inside the upstream project (in this case dbt_dwh) but I will edit this answer if I find a way.

how to load / mount an existing file based database in HSQLdb

Good day all !
I have 2.5 HSQLDB running as server on my Windows machine via:
#java -classpath ./lib/hsqldb.jar org.hsqldb.server.Server -database.0 file:.\data -dbname.0 foo
I left the files of my foo database in the data folder:
foo.data
foo.log (empty BTW)
foo.properties
foo.script
... and I see this line:
[Server#74cd4d]: Database [index=0. id=0, db=file:.\data, alias=foo] opened successfully in 335ms.
I open the manager via:
#java classpath .\lib\hsqldb.jar org.hsqldb.util.DatabaseManagerSwing
connection properties:
Recent Settings: foo
Setting Name: foo
Type: HSQL Database Engine Server
Driver: org.hsqldb.jdbc.JDBCDriver
URL: jdbc:hsqldb:hsql://localhost/foo
I run the query to show all tables (which I know for a fact exists):
SELECT * FROM INFORMATION_SCHEMA.SYSTEM_TABLES where TABLE_TYPE='TABLE'
... but I get zero results.
Any ideas?
Thanks!
PS I had to spend an hour figuring out why I hit the post button and it complained I did not indent my code correctly so I indented the whole thing and was then forced to type this long diatribe in order to get my post to be accepted ...
... God bless the pedantic "Knights who say Ni" !
The file: URL among the server properties indicates the path and file name of the database files (but without the file extensions such as .script, .data).
The URL that you used in db=file:.\data refers to database files named data.script, data.properties, etc. in the parent directory of lib. You can check the parent directory and will probably find a set of database files with those names was created when you started the server.
In order to access you proper database, the URL should be db=file:.\data\foo

boost build - sources with the same name

src
|--Manager.cpp
|--Specializations
| |--Manager.cpp
Building this Boost.Build tries to create
/bin/...
|--Manager.o
|--Manager.o
but fails. How to resolve this automatically? I read FAQ item, but I don't like the solution, as I have to fix things manually when I have a same class name, but different namespace. Would it be possible to make Boost.Build automatically prefix object file names with directory?
/bin/...
|--Manager.o
|--Specializations.Manager.o
Or duplicate the source directory tree?
/bin/...
|--Manager.o
|--Specializations
| |--Manager.o
This behavior has been changed a long time ago and should just work. Boost.Build now mimics the source structure, i.e. you should get both bin/Manager.o and bin/Specializations/Manager.o.

MsTest, DataSourceAttribute - how to get it working with a runtime generated file?

for some test I need to run a data driven test with a configuration that is generated (via reflection) in the ClassInitialize method (by using reflection). I tried out everything, but I just can not get the data source properly set up.
The test takes a list of classes in a csv file (one line per class) and then will test that the mappings to the database work out well (i.e. try to get one item from the database for every entity, which will throw an exception when the table structure does not match).
The testmethod is:
[DataSource(
"Microsoft.VisualStudio.TestTools.DataSource.CSV",
"|DataDirectory|\\EntityMappingsTests.Types.csv",
"EntityMappingsTests.Types#csv",
DataAccessMethod.Sequential)
]
[TestMethod()]
public void TestMappings () {
Obviously the file is EntityMappingsTests.Types.csv. It should be in the DataDirectory.
Now, in the Initialize method (marked with ClassInitialize) I put that together and then try to write it.
WHERE should I write it to? WHERE IS THE DataDirectory?
I tried:
File.WriteAllText(context.TestDeploymentDir + "\\EntityMappingsTests.Types.csv", types.ToString());
File.WriteAllText("EntityMappingsTests.Types.csv", types.ToString());
Both result in "the unit test adapter failed to connect to the data source or read the data". More exact:
Error details: The Microsoft Jet database engine could not find the
object 'EntityMappingsTests.Types.csv'. Make sure the object exists
and that you spell its name and the path name correctly.
So where should I put that file?
I also tried just writing it to the current directory and taking out the DataDirectory part - same result. Sadly, there is limited debugging support here.
Please use the ProcessMonitor tool from technet.microsoft.com/en-us/sysinternals/bb896645. Put a filter on MSTest.exe or the associate qtagent32.exe and find out what locations it is trying to load from and at what point in time in the test loading process. Then please provide an update on those details here .
After you add the CSV file to your VS project, you need to open the properties for it. Set the Property "Copy To Output Directory" to "Copy Always". The DataDirectory defaults to the location of the compiled executable, which runs from the output directory so it will find it there.

extra-paths not added to python path with zc.recipe.testrunner

I am trying to run tests by adding a version of tornado downloaded from github.com in the sys.path.
[tests]
recipe = zc.recipe.testrunner
extra-paths = ${buildout:directory}/parts/tornado/
defaults = ['--auto-color', '--auto-progress', '-v']
But when I run bin/tests I get the following error :
ImportError: No module named tornado
Am I not understanding how to use extra-paths ?
Martin
Have you tried looking into generated bin/tests script if it contains your path? It will tell definitely if your buildout.cfg is correct or not. Maybe problem is elsewhere. Because it seem that your code is ok.
If you happen to regularly include various branches from git/mercurial or elsewhere to buildout, you might be interested in mr.developer. mr.developer can download and add package to develop =. You wont need to set extra-path in every section.