dbt two-argument ref

How do I refer, by relative path, from view1 to view2 using ref('package_name', 'model_name')?
/root_folder
  /project1
    /models
      view1.sql
    dbt_project.yml
  /project2
    /models
      view2.sql
    dbt_project.yml
There is no code example in the documentation.
Thank you.

The only way for project2 to know about models in project1 is if project2 includes project1 as a package in its packages.yml file. Then you can refer to view1 as ref('project1', 'view1') in project2.
You'll have to check the syntax here, but you could include project1 in project2's packages.yml like so:
In project2/packages.yml:
packages:
  - local: ../project1
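With that in place, a model in project2 can use the two-argument ref. A minimal sketch (assuming the project name declared in project1's dbt_project.yml is project1):
-- project2/models/view2.sql
select *
from {{ ref('project1', 'view1') }}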
Needless to say, you'd save yourself a lot of headaches by simply not splitting projects. In most cases you shouldn't need to, and simply organizing models into folders covers most of what you might need.

To try and answer the question with a little more focus on your comment:
I have one folder for each dataset I have on bigquery. Can I write models for several datasets in one single folder?
Yes you can!
A quick note on terms from the dbt docs' "BigQuery configurations" page, in case you are not using BigQuery:
schema is interchangeable with the BigQuery concept of a dataset
database is interchangeable with the BigQuery concept of a project
Here is how this works for me:
project-dir
  analysis
  data
  macros
  models
    sources
      dataset1.yml
      dataset2.yml
  seed
  dbt_project.yml
  packages.yml
Where the contents of a dataset .yml file are:
version: 2
sources:
  - name: fivetran_log
    database: my-bigquery-project-id
    loader: fivetran
    tables:
      - name: account
      - name: log
      - name: user
No entries are required in dbt_project.yml to use these sources immediately; you can reference them directly from models like:
select *
from {{ source('fivetran_log', 'user') }}
That should allow you to have multiple dataset sources, but one single dbt project directory for all your views.
However, if the datasets you are referencing are in different BigQuery regions or different billing projects, I believe you will run into some errors.
Appendix of related questions / resources across the dbt-verse:
Should I have an organisation-wide project (a monorepo), or should each workflow have its own?
Building dbt models to be compatible with multiple data warehouses


How can I change the target dataset dynamically based on the file path location in the project?

Is there a way to change the target dataset on BigQuery based on the file path location with dbt?
For example:
project
  models
    contextA
    contextB
There are two datasets on BigQuery: datasetA and datasetB.
I would like to save the models inside folder contextA to the dataset called datasetA, and the ones inside contextB to datasetB, dynamically, without considering the target in profiles.yml.
I've tried to do it with the generate_alias_name macro, but without success.
You can do this in dbt_project.yml. In the models folder, you will have folders contextA and contextB. Then in dbt_project.yml, you will add this:
models:
  project_name:
    contextA:
      schema: datasetA
    contextB:
      schema: datasetB
You can do this even with nested folders:
contextA:
  schema: datasetA
  another_folder_in_contextA_that_saves_models_to_a_different_dataset:
    schema: another_dataset
contextB:
  schema: datasetB
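One caveat: by default dbt appends the custom schema to the target schema from profiles.yml, so models can land in something like target_schema_datasetA. If you want the dataset names used exactly as written, regardless of the target, you can override the generate_schema_name macro in your macros folder. A minimal sketch:
{% macro generate_schema_name(custom_schema_name, node) -%}
    {%- if custom_schema_name is none -%}
        {{ target.schema }}
    {%- else -%}
        {{ custom_schema_name | trim }}
    {%- endif -%}
{%- endmacro %}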

SpacyEntityExtractor is not recognising time entities correctly

Rasa version: 0.15
OS: macOS
Text: set an alarm at 3 am
Result:
entity = CARDINAL
value = 3
We can see that the expected entities from this text should be:
entity = TIME
value = 3 am
Why is it showing the wrong result?
spaCy model used: 'en_core_web_md'
The pipeline I am using is:
language: "en"
pipeline:
  - name: "SpacyNLP"
    model: "en_core_web_sm"
    case_sensitive: false
  - name: "WhitespaceTokenizer"
  - name: "SpacyEntityExtractor"
  - name: "CRFEntityExtractor"
  - name: "EntitySynonymMapper"
  - name: "CountVectorsFeaturizer"
  - name: "EmbeddingIntentClassifier"
I'm not familiar with the elements of the stack that are not spaCy, but as far as spaCy goes: the models are not always correct. They use probabilistic approaches to determine the category of a named entity.
You can experiment with larger models (such as en_core_web_lg), but they are more expensive computationally. Alternatively, you can think about training the NER model to better fit your purpose; spaCy's makers offer a tool for this, called Prodigy, I think. Either way, without extensive training it is still a challenge to create totally robust named entity recognition.
I would recommend trying out rasa/duckling. It uses the entity extractor from wit.ai and is very good at extracting time and date entities. For this, you need to run a separate Docker container and include the Duckling extractor in the pipeline configuration in your nlu_config.yml, pointing it at the endpoint of that Docker container.
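As a rough sketch (assuming a Rasa NLU 0.15-era setup; check the component name and options against the docs for your version), the pipeline entry might look like this, with Duckling itself started via docker run -p 8000:8000 rasa/duckling:
language: "en"
pipeline:
  - name: "DucklingHTTPExtractor"
    url: "http://localhost:8000"
    dimensions: ["time"]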

Freeze django model object in time, like a snapshot

Let's say I want to track employees working on projects.
Project 2015
- Employee A
Employee A now changes his healthcare provider and his address, and completes his bachelor's degree.
Project 2018
- Employee A
For Project 2018, Employee A's details are up to date. But if I look back at Project 2015, Employee A's details are now newer than the project itself. For example, it now looks like Employee A had a bachelor's degree while working on Project 2015, which is incorrect.
What I need is something like an instance of Employee A frozen in time (a time capsule/snapshot/copy) when saving him to a project, while still being able to update the "live" version of the employee.
There are other models where I will run into the same problem. It really boggles my mind, because it's so counterintuitive for database thinking. Is there a right/correct way to handle this? Is there maybe a Django revisions solution? Thank you!
The django-simple-history project is very useful, and with it you can have snapshots of your objects.
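For illustration, registering history on a model looks roughly like this (a minimal sketch; the name field is just an example):
from django.db import models
from simple_history.models import HistoricalRecords

class Employee(models.Model):
    name = models.CharField(max_length=100)
    # each save adds a row to a history table, so past versions can be looked up
    history = HistoricalRecords()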
I had similar challenges, and we worked it out with a pattern that would roughly translate to this domain as:
from django.db.models import CASCADE, CharField, ForeignKey, Model

class EmployeeProfile(Model):
    class Meta:
        abstract = True

    common_field1 = CharField(max_length=255)
    common_field2 = CharField(max_length=255)
    common_field3 = CharField(max_length=255)

    def get_employee_profile_data(self):
        return {
            'common_field1': self.common_field1,
            'common_field2': self.common_field2,
            'common_field3': self.common_field3,
        }

class Employee(EmployeeProfile):
    # employee-specific fields go here
    pass

class ProjectProfile(EmployeeProfile):
    class Meta:
        unique_together = ('project', 'owner')

    project = ForeignKey('Project', on_delete=CASCADE)  # assumes an existing Project model
    owner = ForeignKey(Employee, on_delete=CASCADE)  # means that the Employee "owns" this profile

# A factory function
def create_project_profile(employee, project):
    return ProjectProfile.objects.create(
        project=project,
        owner=employee,
        **employee.get_employee_profile_data())
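Usage would then look roughly like this (variable names are hypothetical):
# Snapshot Employee A's common profile data at the moment he joins the 2015 project
profile_2015 = create_project_profile(employee_a, project_2015)
# Later changes to employee_a's live fields won't alter profile_2015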
We tried to think with separation of concerns in mind.
In this scenario, I think the pattern fulfills the following:
A project has a project-specific profile which is owned by an Employee
An employee can only have one profile per project
It is possible to change the specific profile for a project without affecting the "live data"
It is possible to change an employee's live profile data without affecting any project
Benefits are that database migrations will affect both Employee and ProjectProfile, and it should be simple to put get_employee_profile_data under unit test.
The owner reference will make sure it's easy to query for a project's participants, etc.
Hope it can give some ideas...

How to build a temporary FeatureClass in ArcObject?

I want to build a temporary FeatureClass which contains temporary Features, such as points, that are not needed later in the program.
I searched the ArcObjects API reference, but I can't find an efficient way to solve this problem. So how can I build a temporary "container" to store some temporary Features?
Should I first use CreateFeatureClass to build a real FeatureClass and delete it later? I don't think that method is great, since I'd have to deal with the CLSID parameters.
PS: This "container" must have the ability to return a Cursor.
I think you should use an InMemoryWorkspace.
IWorkspaceFactory2 objWorkspaceFactory = new InMemoryWorkspaceFactoryClass();
IWorkspaceName objWorkspaceName = objWorkspaceFactory.Create(string.Empty, p_strName, null, 0);
IName objName = (IName)objWorkspaceName;
IWorkspace objWorkspace = (IWorkspace)objName.Open();
Now, using this workspace, you can create temporary feature classes (perform a search, get a cursor, and then delete the feature class); a sketch follows below.
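A rough, untested sketch of that middle step (field and class names here are illustrative; you will likely also want to set a spatial reference on the geometry definition):
IFeatureWorkspace featureWorkspace = (IFeatureWorkspace)objWorkspace;

// Minimal field set: an ObjectID field plus a point geometry field
IFields fields = new FieldsClass();
IFieldsEdit fieldsEdit = (IFieldsEdit)fields;

IField oidField = new FieldClass();
IFieldEdit oidFieldEdit = (IFieldEdit)oidField;
oidFieldEdit.Name_2 = "OBJECTID";
oidFieldEdit.Type_2 = esriFieldType.esriFieldTypeOID;
fieldsEdit.AddField(oidField);

IGeometryDef geometryDef = new GeometryDefClass();
IGeometryDefEdit geometryDefEdit = (IGeometryDefEdit)geometryDef;
geometryDefEdit.GeometryType_2 = esriGeometryType.esriGeometryPoint;

IField shapeField = new FieldClass();
IFieldEdit shapeFieldEdit = (IFieldEdit)shapeField;
shapeFieldEdit.Name_2 = "SHAPE";
shapeFieldEdit.Type_2 = esriFieldType.esriFieldTypeGeometry;
shapeFieldEdit.GeometryDef_2 = geometryDef;
fieldsEdit.AddField(shapeField);

// Passing null for the CLSID/EXTCLSID parameters lets the geodatabase use the defaults,
// so there is no need to deal with CLSIDs by hand
IFeatureClass tempFeatureClass = featureWorkspace.CreateFeatureClass(
    "TempPoints", fields, null, null,
    esriFeatureType.esriFTSimple, "SHAPE", "");

// The temporary container can hand back a cursor like any other feature class
IFeatureCursor cursor = tempFeatureClass.Search(null, false);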
I believe that in your case an in-memory workspace is more efficient than working with a shapefile or personal geodatabase.
You can use the IScratchWorkspaceFactory2 interface, which is used to create temporary personal geodatabases in the temp directory. You can find this directory by looking at the %TEMP% environment variable. The scratch personal geodatabase will have the name mx<N>.mdb, where <N> is the lowest positive number that uniquely identifies the geodatabase.
IScratchWorkspaceFactory2 factory = new ScratchWorkspaceFactoryClass();
var selectionContainer = factory.DefaultScratchWorkspace;

Query to get all projects in a workspace using lookback api

Is Project a valid _Type to use in a Lookback query?
I tried "_Type":"Project"
https://rally1.rallydev.com/analytics/v2.0/service/rally/workspace/1234/artifact/snapshot/query.js?find={"_Type":"Project","State":"Open"}&fields=["Name"]
and also "_TypeHierarchy":"Project"
https://rally1.rallydev.com/analytics/v2.0/service/rally/workspace/1234/artifact/snapshot/query.js?find={"_TypeHierarchy":"Project","State":"Open"}&fields=["Name"]
and both returned 0 results. The same syntax works with "_TypeHierarchy":"Defect" but not with Project, yet there are no errors. Thanks.
The Lookback API supports querying snapshots for a given Project or ProjectHierarchy. For example:
{
  [...]
  "Project": 12345
}
or
{
  [...]
  "_ProjectHierarchy": 12345
}
However, it's not possible to get a list of projects from the Lookback API outside the context of artifact snapshots. Getting projects would be a manual process: if you get a list of snapshots, you could iterate the result set and extract the Project OIDs, then generate a list. You could even parse the _ProjectHierarchy values and construct the project tree. Another caveat is that hydrating the Project OIDs will require WSAPI calls.
Querying projects from the Lookback API may be expensive. You can specify fields to reduce the amount of data in the response, e.g.
fields: ["Project", "_ProjectHierarchy"]