I am new to using exposures. I want to show more than one step downstream. Is it possible to make an exposure that depends on another exposure? How do you reference it? I tried the following, but it doesn't work; it says there is no node Step1:
- name: Step1
  depends_on:
    - ref('MyTable')
- name: Step2
  depends_on:
    - ref('Step1')
This isn't supported today: exposures are leaf nodes in the directed acyclic graph (DAG), so nothing, including another exposure, can depend on them.
However, there is a dbt-core GitHub issue that lists what you're asking for as a potential new feature:
exposures that depend on other exposures:
one exposure for each Mode query / Looker view, one exposure for the
dashboard that depends on those queries / views
Until then, if you have a DAG like table_A -> exposure1 -> exposure2, the best you can do is restructure it so that both exposures depend directly on table_A:
exposure1
/
table_A
\
exposure2
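Since an exposure can only point at models, the restructuring above amounts to replacing every exposure-to-exposure dependency with the model(s) at the bottom of the chain. Here is a minimal Python sketch, assuming the dependencies are held in a plain dict. flatten_exposures() and to_exposures_yaml() are hypothetical helpers, not part of dbt, and every upstream name is assumed to be a model reachable with ref():

```python
def flatten_exposures(depends_on):
    """Rewrite exposure dependencies so each exposure points only at models.

    depends_on maps an exposure name to the names it depends on. Any name that
    is not itself a key is treated as a model (a convention of this sketch).
    """
    def resolve(name, path):
        if name in path:
            raise ValueError(f"cycle through {name!r}")
        if name not in depends_on:
            return [name]  # not an exposure, so it is a model: stop here
        models = []
        for upstream in depends_on[name]:
            for model in resolve(upstream, path | {name}):
                if model not in models:  # keep first-seen order, drop duplicates
                    models.append(model)
        return models

    return {exposure: resolve(exposure, frozenset()) for exposure in depends_on}


def to_exposures_yaml(flat):
    """Emit a minimal exposures block as text (only name and depends_on)."""
    lines = ["exposures:"]
    for name, models in flat.items():
        lines.append(f"  - name: {name}")
        lines.append("    depends_on:")
        lines.extend(f"      - ref('{model}')" for model in models)
    return "\n".join(lines)


chain = {"exposure1": ["table_A"], "exposure2": ["exposure1"]}
flat = flatten_exposures(chain)
print(to_exposures_yaml(flat))
```

For the chain above, both exposures end up with depends_on: ref('table_A'), matching the fan-out diagram. Real exposures also need fields such as type and owner; they are left out here to keep the output short.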
IMHO, documenting only exposure1 is sufficient, but it sounds like you'd like more.
How can I swap values within classes please?
As shown in this table:
[Table not available: "Before" and "After" versions of the data]
I want to do this because the data is oversampled. It is very repetitive, and this causes machine learning tools to overfit.
OK, try this out:
import numpy as np
import pandas as pd
# Setup example dataframe
df = pd.DataFrame({"Class" : [1,2,1,3,1,2,1,3,1,2,1,3,1,2,1,3],
1:[1,1,1,0,1,0,1,0,1,0,1,0,1,0,1,1],
2:[0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0],
3:[0,0,1,1,1,0,1,1,0,0,1,1,1,0,1,1],
4:[1,0,1,1,1,0,1,1,1,0,1,1,1,0,1,1],
5:[0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1],
6:[0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1]}).set_index("Class")
# Do a filter on class, and store the positions/index of matching contents
class_to_edit=3
swappable_indices = np.where(df.index==class_to_edit)[0]
# Extract the column to edit
column_to_edit=1
column_values = df[column_to_edit].values
# Decide how many values to swap, and randomly assign swaps
# No guarantee here that the swaps will not contain the same values i.e. you could
# end up swapping 1's for 1's and 0's for 0's here - it's entirely random.
number_of_swaps = 2
swap_pairs = np.random.choice(swappable_indices,number_of_swaps*2, replace=False)
# Using the swap pairs, build a map of substitutions,
# starting with a vanilla no-swap map, then updating it with the generated swaps
swap_map={e:e for e in range(0,len(column_values))}
swap_map.update({swap_pairs[e]:swap_pairs[e+1] for e in range(0,len(swap_pairs),2)})
swap_map.update({swap_pairs[e+1]:swap_pairs[e] for e in range(0,len(swap_pairs),2)})
# Having built the swap-map, apply it to the data in the column,
column_values=[column_values[swap_map[e]] for e,v in enumerate(column_values)]
# and then plug the column back into the dataframe
df[column_to_edit]=column_values
It's a bit grubby, and I'm sure there's a cleaner way to build that substitution map, perhaps with a one-line dict comprehension, but it should do the trick.
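One tidier option, sketched here on the same example data, is to skip the dict and express the substitution map as a NumPy index array: start from the identity order, swap the chosen positions, and index the column with it once. swap_within_class() is a hypothetical helper name, and it uses numpy.random.default_rng rather than the legacy np.random functions so that a seed can be passed in:

```python
import numpy as np
import pandas as pd


def swap_within_class(df, class_label, column, n_swaps, seed=None):
    """Return a copy of df with n_swaps random pairwise swaps in one column,
    restricted to rows whose index label equals class_label."""
    rng = np.random.default_rng(seed)
    # Row positions (not labels) that belong to the class
    positions = np.flatnonzero(df.index == class_label)
    # Draw 2 * n_swaps distinct positions; first row is a, second row is b,
    # so position a[i] gets swapped with position b[i]
    a, b = rng.choice(positions, size=2 * n_swaps, replace=False).reshape(2, -1)
    # The substitution map as an index array: identity except at the pairs
    order = np.arange(len(df))
    order[a], order[b] = b, a
    out = df.copy()
    out[column] = df[column].to_numpy()[order]
    return out


df = pd.DataFrame({"Class": [1, 2, 1, 3, 1, 2, 1, 3, 1, 2, 1, 3, 1, 2, 1, 3],
                   1: [1, 1, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1],
                   2: [0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0],
                   3: [0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1],
                   4: [1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1],
                   5: [0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1],
                   6: [0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1]}).set_index("Class")

swapped = swap_within_class(df, class_label=3, column=1, n_swaps=2, seed=0)
```

As with the dict version, nothing prevents a 1 from being swapped with another 1; rows outside the chosen class and all other columns are left untouched.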
Alternatively, there's np.random.permutation, which might bear some fruit in terms of adding some noise (albeit by shuffling values rather than performing discrete swaps).
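For the permutation route, here is a sketch that shuffles each column independently inside every class (rows sharing an index label). It is a swapped-in alternative to discrete swaps, not the method above, and shuffle_within_classes() is a hypothetical name. The test data is random, in the spirit of the less rigid dataset suggested in the edit:

```python
import numpy as np
import pandas as pd


def shuffle_within_classes(df, columns, seed=None):
    """Return a copy of df where each listed column is shuffled independently
    inside every class (rows that share an index label)."""
    rng = np.random.default_rng(seed)
    labels = df.index.to_numpy()
    out = df.copy()
    for col in columns:
        values = out[col].to_numpy().copy()
        for label in np.unique(labels):
            pos = np.flatnonzero(labels == label)
            # Permute only this class's values, in place within the column
            values[pos] = rng.permutation(values[pos])
        out[col] = values
    return out


data_rng = np.random.default_rng(1)
df = pd.DataFrame(data_rng.integers(0, 2, size=(16, 6)),
                  columns=range(1, 7),
                  index=pd.Index([1, 2, 1, 3] * 4, name="Class"))
shuffled = shuffle_within_classes(df, columns=df.columns, seed=0)
```

Because each class keeps exactly the same values in each column, the per-class column frequencies are preserved; only the combinations of values within a row change.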
[edit] For testing, try a dataset with a bit less rigidity; here's an example of a more randomly generated one. If you want to impose some order on the dataset, just replace the relevant columns with fixed values.
df = pd.DataFrame({"Class" : [1,2,1,3,1,2,1,3,1,2,1,3,1,2,1,3],
1:np.random.choice([0,1],16),
2:np.random.choice([0,1],16),
3:np.random.choice([0,1],16),
4:np.random.choice([0,1],16),
5:np.random.choice([0,1],16),
6:np.random.choice([0,1],16)}).set_index("Class")
I'm trying to use YAML to represent a train network with stations and lines; a minimal working example might be three stations connected linearly, A<->B<->C. I represent the three stations as follows:
---
stations:
- A
- B
- C
Now I want to store the different lines on the network, and where they start/end. To do this, I add a lines array and some anchors, as follows:
---
stations:
  - &S-A A
  - &S-B B
  - &S-C C
lines:
  - &L-A2C A to C:
      from: *S-A
      to: *S-C
  - &L-C2A C to A:
      from: *S-C
      to: *S-A
And here's the part I'm having trouble with: I want to store the next stop of each line at each station. Ideally something like this:
---
stations:
  - &S-A A:
      next:
        - *L-A2C: *S-B
  - &S-B B:
      next:
        - *L-A2C: *S-C
        - *L-C2A: *S-A
  - &S-C C:
      next:
        - *L-C2A: *S-B
(the lines array remains the same)
But this fails, at least with the Python yaml library (PyYAML), raising yaml.composer.ComposerError: found undefined alias 'L-A2C'. I know why: I haven't defined the lines yet. But I can't define the lines first, because they depend on the stations, and now the stations depend on the lines.
Is there a better way to implement this?
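Congratulations! You've run into a limitation of YAML itself rather than a bug in one particular parser: the YAML specification says it is an error for an alias to use an anchor that does not occur earlier in the document, so forward references like yours are rejected by most (if not all) implementations. I recently ran into this limitation too and am investigating how to work around it (in the Ruby world), but that's not going to help you. What you'll have to do instead is store the "next stops" as a separate set of data points, placed after both the stations and the lines have been defined: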
next-stops:
  *S-A:
    - *L-A2C: *S-B
  *S-B:
    - *L-A2C: *S-C
    - *L-C2A: *S-A
  *S-C:
    - *L-C2A: *S-B
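If you can live without aliases, another option is to refer to stations and lines by plain name and resolve those references in code after loading, which removes the ordering constraint entirely. Below is a sketch of that swapped-in approach, with a Python dict standing in for the parsed file. The layout (next-stops as a mapping of line to next station rather than a list of single-key mappings) and the validate() and route() helpers are assumptions for illustration:

```python
# Stand-in for the loaded YAML. Assumption of this sketch: stations and lines
# are referred to by name, not by alias, so section order no longer matters.
network = {
    "stations": ["A", "B", "C"],
    "lines": {
        "A to C": {"from": "A", "to": "C"},
        "C to A": {"from": "C", "to": "A"},
    },
    # station -> {line: next station on that line}
    "next-stops": {
        "A": {"A to C": "B"},
        "B": {"A to C": "C", "C to A": "A"},
        "C": {"C to A": "B"},
    },
}


def validate(net):
    """Raise ValueError if lines or next-stops mention an undefined name."""
    stations, lines = set(net["stations"]), net["lines"]
    for name, line in lines.items():
        for end in ("from", "to"):
            if line[end] not in stations:
                raise ValueError(f"line {name!r}: unknown station {line[end]!r}")
    for station, stops in net["next-stops"].items():
        if station not in stations:
            raise ValueError(f"next-stops: unknown station {station!r}")
        for line, nxt in stops.items():
            if line not in lines or nxt not in stations:
                raise ValueError(f"next-stops[{station!r}]: bad entry {line!r}: {nxt!r}")


def route(net, line):
    """Follow next-stops from a line's first station to its last."""
    stop, end = net["lines"][line]["from"], net["lines"][line]["to"]
    path = [stop]
    while stop != end:
        if len(path) > len(net["stations"]):  # guard against a loop in the data
            raise ValueError(f"line {line!r} never reaches {end!r}")
        stop = net["next-stops"][stop][line]
        path.append(stop)
    return path


validate(network)
print(route(network, "A to C"))  # ['A', 'B', 'C']
```

The trade-off is that the YAML parser no longer checks the references for you, which is why the sketch runs its own validation pass after loading.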
Does that help?