Referencing an exposure in dbt

I am new to using exposures. I want to show more than one step downstream. Is it possible to make an exposure that depends on another exposure? How do you reference it? I tried this but it doesn't work. It says there is no node Step1:
- name: Step1
  depends_on:
    - ref('MyTable')
- name: Step2
  depends_on:
    - ref('Step1')

This isn't supported today. Exposures are leaf nodes in the directed acyclic graph (DAG).
However, there is a dbt-core GitHub issue today that lists what you're asking for as a potential new feature:
exposures that depend on other exposures: one exposure for each Mode query / Looker view, one exposure for the dashboard that depends on those queries / views
Until then, if you have a DAG like table_A -> exposure1 -> exposure2, the best you can do is restructure it like:
          exposure1
         /
table_A
         \
          exposure2
IMHO, documenting only exposure1 is sufficient, but it sounds like you'd like more.
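For reference, here's a minimal sketch of that restructured layout as an exposures .yml file - type and owner are required fields for exposures, and the exposure names and email below are placeholders:

version: 2

exposures:
  - name: exposure1
    type: analysis
    owner:
      email: someone@example.com
    depends_on:
      - ref('table_A')

  - name: exposure2
    type: dashboard
    owner:
      email: someone@example.com
    depends_on:
      - ref('table_A')

Both exposures hang off table_A directly, so the lineage graph still captures everything downstream of the model, just without the exposure-to-exposure link.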

Related

How to persist column descriptions in BigQuery tables

I have created models in my dbt (data build tool) project where I have specified column descriptions. My dbt_project.yml file is shown below:
models:
  sakila_dbt_project:
    # Applies to all files under models/example/
    +persist_docs:
      relation: true
      columns: true
    events:
      materialized: table
      +schema: examples
I have added +persist_docs as described by dbt as the fix to make column descriptions appear, but still no description appears in the BigQuery table.
My models/events/events.yml looks like this
version: 2
models:
  - name: events
    description: This table contains clickstream events from the marketing website
    columns:
      - name: event_id
        description: This is a unique identifier for the event
        tests:
          - unique
          - not_null
      - name: user-id
        quote: true
        description: The user who performed the event
        tests:
          - not_null
What am I missing?
P.S. I'm using dbt version 0.21.0.
Looks consistent with the required format as shown in the docs:
dbt_project.yml
models:
..<resource-path>:
....+persist_docs:
......relation: true
......columns: true
models/schema.yml
version: 2
models:
..- name: dim_customers
....description: One record per customer
....columns:
......- name: customer_id
........description: Primary key
Maybe spacing? I converted the spaces to periods in the examples above because the number of spaces is unforgivingly specific for yml files.
I've started using the VS Code YAML formatter because of how often I run into spacing issues on these keys in both the schema.yml and the dbt_project.yml.
Otherwise, this isn't for a source or external table, right? Those are the only two artifacts that persist_docs is unsupported for.
Sources: persist_docs unsupported -> sources tab
External tables: unsupported (can't find it in the docs again, but read it today in the docs or a GitHub issue)
Also unsupported on Apache Spark (irrelevant here) -> Apache Spark profile
Also, if you're going to be working with persist_docs a lot, check out the persist_docs_op macro example that Jeremy left - a run-operation to update your persisted docs in case that's all you changed!
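For illustration, here's a rough sketch of that kind of run-operation adapted to your project - this is not Jeremy's exact macro, just the idea, with the dataset, table, and descriptions taken from your question (BigQuery lets you set descriptions via ALTER TABLE ... SET OPTIONS):

-- macros/repersist_docs.sql (hedged sketch; adjust names to your project)
{% macro repersist_docs() %}
  {# re-apply the table-level description without rebuilding the model #}
  {% do run_query("alter table examples.events set options (description = 'This table contains clickstream events from the marketing website')") %}
  {# re-apply one column-level description; repeat per column as needed #}
  {% do run_query("alter table examples.events alter column event_id set options (description = 'This is a unique identifier for the event')") %}
{% endmacro %}

Then dbt run-operation repersist_docs pushes the descriptions without a full dbt run.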

spring cloud sleuth - new propagation field which joins span-ids

I would like to add a new custom propagation field - Request-Id.
This new field should be a concatenation of span ids, separated by dots. For example, if we have the following chain of calls:
(ServiceA) -----> (ServiceB) -----> (ServiceC) -----> (ServiceD)
Span Ids are ("SpanAB", "SpanBC", "SpanCD") respectively.
Request-ids should be ("SpanAB", "SpanAB.SpanBC", "SpanAB.SpanBC.SpanCD") respectively.
Could you help me with creating this new custom field?
Take a look at the docs; you should be able to implement this using Baggage.
FYI: Sleuth uses Brave's Baggage under the hood by default.
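A rough sketch of that idea on a Sleuth 3.x / Brave setup (the filter, its ordering, and the wiring are assumptions, not a drop-in solution): register Request-Id as a remote baggage field, e.g. spring.sleuth.baggage.remote-fields=Request-Id, so it is propagated to downstream calls, and have each service append its own span id to whatever value it received:

// Hedged sketch: appends the local span id to the propagated Request-Id baggage.
// Assumes Request-Id is registered via spring.sleuth.baggage.remote-fields and
// that this filter runs after Sleuth's tracing filter (so a span already exists).
import brave.Tracer;
import brave.baggage.BaggageField;
import brave.propagation.TraceContext;
import org.springframework.stereotype.Component;

import javax.servlet.*;
import java.io.IOException;

@Component
public class RequestIdBaggageFilter implements Filter {

    private final Tracer tracer;

    public RequestIdBaggageFilter(Tracer tracer) {
        this.tracer = tracer;
    }

    @Override
    public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
            throws IOException, ServletException {
        if (tracer.currentSpan() != null) {
            TraceContext ctx = tracer.currentSpan().context();
            BaggageField requestId = BaggageField.getByName(ctx, "Request-Id");
            if (requestId != null) {
                String upstream = requestId.getValue(ctx); // value received from the caller, if any
                String spanId = ctx.spanIdString();        // this service's span id
                requestId.updateValue(ctx, upstream == null ? spanId : upstream + "." + spanId);
            }
        }
        chain.doFilter(req, res);
    }
}

Exactly which span id you append (the server span handling the request vs. the client span of the outgoing call) depends on where you hook in, so adjust that to the semantics you want.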

need to join the vertex in dse

I have created property keys and vertex labels like
schema.propertyKey('REFERENCE_ID').Int().multiple().create();
schema.propertyKey('Name').Text().single().create();
schema.propertyKey('PARENT_NAME').Text().single().create();
... (more property keys elided) ...
schema.propertyKey('XXX').Text().single().create();
schema.vertexLabel('VERT1').properties("REFERENCE_ID", .. "PROPERTY10" .. "PROPERTY15")  // 15 properties
schema.vertexLabel('VER2').properties("REFERENCE_ID", .. "PROPERTY20" .. "PROPERTY35")   // 35 properties
schema.vertexLabel('VERT3').properties("REFERENCE_ID", .. "PROPERTY20" .. "PROPERTY25")  // 25 properties
schema.vertexLabel('VERT4').properties("REFERENCE_ID", .. "PROPERTY20" .. "PROPERTY25")  // 25 properties
and loaded CSV data using the DSE Graph Loader (graphloader, CSV to vertex),
and created edges:
schema.edgeLabel('ed1').single().create()
schema.edgeLabel('ed1').connection('VERT1', 'VER2').add()
schema.edgeLabel('ed1').single().create()
schema.edgeLabel('ed1').connection('VERT1', 'VERT3').add()
schema.edgeLabel('ed2').single().create()
schema.edgeLabel('ed2').connection('VERT3','VERT4').add()
But I don't know how to map the data between the vertices and edges. I want to join all these 4 vertex labels. Could you please help with this?
I'm new to DSE. I just ran the above code in DataStax Studio successfully and I can see the loaded data. I need to join the vertices...
SQL code (I want the same in DSE Gremlin):
select v1.REFERENCE_ID,v2.name,v3.total from VERT1 v1
join VER2 v2 on v1.REFERENCE_ID=v2.REFERENCE_ID
join VERT3 v3 on v2.sid=v3.sid
there are 2 "main" options in DSE for adding edge data, plus one if you're also using DSE Analytics.
One is to use Gremlin, like what's documented here - https://docs.datastax.com/en/dse/6.0/dse-dev/datastax_enterprise/graph/using/insertTraversalAPI.html
This approach would be a traversal based approach and may not be the best/fastest choice for bulk operations
Another solution is to use the Graph Loader, check out the example with the .asEdge code sample here - https://docs.datastax.com/en/dse/6.0/dse-dev/datastax_enterprise/graph/dgl/dglCSV.html#dglCSV
If you have DSE Analytics enabled, you can also use DataStax's DSE GraphFrame implementation, which leverages Spark, to preform this task as well. Here's an example - https://docs.datastax.com/en/dse/6.0/dse-dev/datastax_enterprise/graph/graphAnalytics/dseGraphFrameImport.html
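As a rough illustration of the first (traversal) option, something like this links the rows that share a REFERENCE_ID - labels and property names are copied from your schema, and it assumes a small data set with scans allowed (e.g. development mode), so treat it as a sketch rather than a bulk-load recipe:

// create ed1 edges between VERT1 and VER2 vertices with the same REFERENCE_ID
g.V().hasLabel('VERT1').as('v1').
  V().hasLabel('VER2').
  where(eq('v1')).by('REFERENCE_ID').
  addE('ed1').from('v1')

// once the edges exist, the SQL-style join becomes a traversal instead of a key match
g.V().hasLabel('VERT1').as('v1').
  out('ed1').hasLabel('VER2').as('v2').
  select('v1', 'v2').
    by('REFERENCE_ID').
    by('Name')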

Creating Titan indexed types for elasticsearch

I am having problems getting the Elasticsearch indexes to work correctly with Titan Server. I currently have a local Titan/Cassandra setup using Titan Server 0.4.0 with Elasticsearch enabled. I have a test graph 'bg' with the following properties:
Vertices have two properties, "type" and "value".
Edges have a number of other properties with names like "timestamp", "length" and so on.
I am running titan.sh with the rexster-cassandra-es.xml config, and my configuration looks like this:
storage.backend = "cassandra"
storage.hostname = "127.0.0.1"
storage.index.search.backend = "elasticsearch"
storage.index.search.directory = "db/es"
storage.index.search.client-only= "false"
storage.index.search.local-mode = "true"
This configuration is the same in the bg config in Rexster and the Groovy script that loads the data.
When I load up the Rexster client and type in g = rexster.getGraph("bg"), I can perform an exact search using g.V.has("type","ip_address") and get the correct vertices back. However, when I run the query:
g.V.has("type",CONTAINS,"ip_")
I get the error:
Data type of key is not compatible with condition
I think this is something to do with the type "value" not being indexed. What I would like to do is make all vertex and edge attributes indexable so that I can use any of the string matching functions on them as necessary. I have already tried making an indexed key using the command
g.makeKey("type").dataType(String.class).indexed(Vertex.class).indexed("search",Vertex.class).make()
but to be honest I have no idea how this works. Can anyone help point me in the right direction with this? I am completely unfamiliar with elastic search and Titan type definitions.
Thanks,
Adam
The wiki page Indexing Backend Overview should answer every little detail of your questions.
Cheers,
Daniel
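To expand on that a little: exact-match lookups go through Titan's standard index, while string predicates like Text.CONTAINS have to be answered by the external "search" index (your Elasticsearch backend), and in Titan 0.4 the key must be defined against that index before any data is loaded, because type definitions are immutable. A hedged sketch of what that can look like on a fresh graph:

// define the keys against the "search" index (the Elasticsearch backend from the config above)
g.makeKey("type").dataType(String.class).indexed(Vertex.class).indexed("search", Vertex.class).make()
g.makeKey("value").dataType(String.class).indexed(Vertex.class).indexed("search", Vertex.class).make()
g.commit()

// text predicates are then served by Elasticsearch
// (import com.thinkaurelius.titan.core.attribute.Text first if your console doesn't already have it)
g.V.has("type", Text.CONTAINS, "ip_address")

Note that CONTAINS matches whole tokens, so a fragment like "ip_" won't match "ip_address"; the wiki page above covers the prefix/regex predicates for that kind of query.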

Circular Definitions in yaml

I'm trying to use yaml to represent a train network with stations and lines; a minimum working example might be 3 stations, connected linearly, so A<->B<->C. I represent the three stations as follows:
---
stations:
- A
- B
- C
Now I want to store the different lines on the network, and where they start/end. To do this, I add a lines array and some anchors, as follows:
---
stations:
  - &S-A A
  - &S-B B
  - &S-C C
lines:
  - &L-A2C A to C:
      from: *S-A
      to: *S-C
  - &L-C2A C to A:
      from: *S-C
      to: *S-A
and here's the part I'm having trouble with: I want to store the next stop for each line at each station. Ideally something like this:
---
stations:
  - &S-A A:
      next:
        - *L-A2C: *S-B
  - &S-B B:
      next:
        - *L-A2C: *S-C
        - *L-C2A: *S-A
  - &S-C C:
      next:
        - *L-C2A: *S-B
(the lines array remains the same)
But this fails - at least in the Python yaml library - saying yaml.composer.ComposerError: found undefined alias 'L-A2C'. I know why this is - it's because I haven't defined the line yet. But I can't define the lines first either, because they depend on the stations, and now the stations depend on the lines.
Is there a better way to implement this?
Congratulations! You found a limitation in most (if not all) YAML implementations. I recently discovered it too and I am investigating how to work around it (in the Ruby world). But that's not going to help you. What you are going to have to do is store the "next stops" as a separate set of data points:
next-stops:
  *S-A:
    - *L-A2C: *S-B
  *S-B:
    - *L-A2C: *S-C
    - *L-C2A: *S-A
  *S-C:
    - *L-C2A: *S-B
Does that help?
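If it helps, stitching those next stops back together after parsing takes only a few lines in Python - a sketch assuming the document is laid out as above and saved as network.yaml (both names are illustrative):

import yaml

with open("network.yaml") as f:
    net = yaml.safe_load(f)

# net["next-stops"] maps a station name to a list of {line name: next station} entries,
# e.g. {"B": [{"A to C": "C"}, {"C to A": "A"}], ...}
def next_stop(station, line):
    for hop in net.get("next-stops", {}).get(station, []):
        if line in hop:
            return hop[line]
    return None

print(next_stop("B", "A to C"))  # -> C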