I am still a beginner in R. For a project I am trying to fit a GAM to a simple dataset with a time variable and a year variable. I keep getting an error message that claims an argument is unused, even though I specify it in the code.
The dataset includes a categorical variable "Year" with only two levels, 2020 and 2022. I want to investigate whether there is a peak in the hourly rate of visitors ("H1") in a nature reserve. For each observation period the average time was taken, which is the predictor variable used here ("T"). I want to use a GAM for this, with a separate smooth fitted for each of the two years.
The following is the line of code that I tried to use
`gam1 <- gam(H1~Year+s(T,by=Year),data = d)`
When I try to run this code, I get the following error message
`Error in s(T, by = Year) : unused argument (by = Year)`
I also tried simply getting rid of the "by" argument
`gam1 <- gam(H1~Year+s(T,Year),data = d)`
This allows me to run the code, but when I try to view the output using summary(gam1), I get
Error in [<-(tmp, snames, 2, value = round(nldf, 1)) : subscript out of bounds
Since I feel like both errors are probably related to the same thing that I'm doing wrong, I decided to combine the question.
Did you load the {mgcv} package or the {gam} package? The latter doesn't have factor by smooths and as such the first error message is what I would expect if you did library("gam") and then tried to fit the model you showed.
To fit the model you showed, you should restart R and try in a clean session:
library("mgcv")
# load your data
# fit model
gam1 <- gam(H1 ~ Year + s(T, by = Year), data = d)
It could well be that you have both {gam} and {mgcv} loaded, in which case whichever you loaded last will be earlier on the function search path. As both packages have functions gam() and s(), R might just be finding the wrong versions (masking), so you might also try
gam1 <- mgcv::gam(H1 ~ Year + mgcv::s(T, by = Year), data = d)
But you would be better off only loading {mgcv} if you want factor by smooths.
@Gavin Simpson
I did have both loaded, and I tried just using mgcv as you suggested. However, then I get the following error.
Error in names(dat) <- object$term :
'names' attribute [1] must be the same length as the vector [0]
I am assuming this is simply because it's not actually trying to use the "gam" function, but rather it attempts to name something gam1. So I would assume I actually need the package of 'gam' before I could do this.
The second line of code also doesn't work. I get the following error
Error in model.frame.default(formula = H1 ~ Year + mgcv::s(T, by = Year), :
invalid type (list) for variable 'mgcv::s(T, by = Year)'
This happens no matter the order I load the two packages in. And if I don't load 'gam', I get the error as described above.
Looking at https://docs.raku.org/language/pod#Lists, I don't see a way to create a numbered list:
one
three
four
Is there an undocumented way to do it?
There is not currently (as of January 2022) an implemented way to use ordered lists in Pod6.
The historical design documents contain Pod6 syntax for ordered lists and, as far as I know, this remains something that we'd like to add. Once that syntax is implemented, you'll be able to write something like:
=item1 # Animal
=item2 # Vertebrate
=item2 # Invertebrate
=item1 # Phase
=item2 # Solid
=item2 # Liquid
=item2 # Gas
This would produce output along the lines of:
1. Animal
1.1 Vertebrate
1.2 Invertebrate
2. Phase
2.1 Solid
2.2 Liquid
2.3 Gas
(Though the exact syntax for rendering the list would be up to the implementation of the Pod renderer.)
But until that's implemented, there isn't any way to use Pod6 syntax to create an ordered list.
Edit:
I just checked the actual parsed Pod6, and it looks like (to my surprise) the ordered list syntax I showed above actually is parsed internally. For example, running say $=pod[5].raku with the Pod6 shows the following (based on the =item2 # Liquid line):
Pod::Item.new(level => 2, config => {:numbered(1)}, contents => [Pod::Block::Para.new(config => {}, contents => ["Liquid"])])
So the parsing work is in place; it's just the Pod::To::* renderers that need to add support. (And there could even be some out there that have that support. I do know that neither Rakudo's Pod::To::Text nor Raku's Pod::To::HTML (v0.8.1) currently renders ordered lists, however.)
Workarounds
Depending on the output formats you're targeting, you could of course write the ordered list yourself (pretty easy if you're rendering to plain text, more annoying to do if you're printing to HTML). This does, of course, sacrifice Pod6's multi-output-format support, which is one of its key features.
For a workaround that doesn't sacrifice Pod's multi-output nature, you'd probably want to look into manipulating/reformatting the Pod text programmatically. If you do so, the docs to start with are the Pod6 section on accessing Pod and the (unfortunately very short) section on the DOC phaser.
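To make the "write the ordered list yourself" option concrete, here is a minimal sketch of the numbering logic in Python rather than Raku. It assumes you have already extracted (level, text) pairs from your list items; the function name `number_items` and the input shape are hypothetical and not part of any Pod6 tooling.

```python
def number_items(items):
    """Return numbered lines for (level, text) pairs; levels start at 1."""
    counters = []  # counters[i] holds the current number at level i + 1
    lines = []
    for level, text in items:
        # Leaving a deeper level resets its counters.
        del counters[level:]
        # Pad with zeros if a level was skipped.
        while len(counters) < level:
            counters.append(0)
        counters[level - 1] += 1
        label = ".".join(str(n) for n in counters)
        if level == 1:
            label += "."  # top-level items are shown as "1.", "2.", ...
        lines.append(f"{label} {text}")
    return lines


items = [
    (1, "Animal"), (2, "Vertebrate"), (2, "Invertebrate"),
    (1, "Phase"), (2, "Solid"), (2, "Liquid"), (2, "Gas"),
]
print("\n".join(number_items(items)))
```

For this input the sketch reproduces the numbering shown earlier (1. Animal, 1.1 Vertebrate, and so on). A real renderer would still need to decide how to format the labels for each output format.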
Just use a list and a loop?
my @list = [ (1, 2, 3), (1, 2, ),
             [<a b c>, <d e f>],
             [[1]]];
for @list -> @element {
    say "{@element} → {@element.^name}";
    for @element -> $sub-element {
        say $sub-element;
    }
}
# OUTPUT:
#1 2 3 → List
#1
#2
#3
#1 2 → List
#1
#2
#a b c d e f → Array
#(a b c)
#(d e f)
#1 → Array
#1
I am new to using exposures. I want to show more than 1 step downstream. Is it possible to make an exposure that depends on another exposure? How do you reference it? I tried this but it doesn't work. It says there is no node Step1:
- name: Step1
  depends_on:
    - ref('MyTable')
- name: Step2
  depends_on:
    - ref('Step1')
This isn't supported today. Exposures are leaf nodes in the directed, acyclic graph.
However, there is a dbt-core GitHub issue that lists what you're asking for as a potential new feature:
exposures that depend on other exposures:
one exposure for each Mode query / Looker view, one exposure for the
dashboard that depends on those queries / views
Until then, if you have a DAG like table_A -> exposure1 -> exposure2, the best you can do is restructure it like this:
          exposure1
         /
table_A
         \
          exposure2
IMHO, documenting only exposure1 is sufficient, but sounds like you'd like more.
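The restructuring described above can be sketched in plain Python. This is only an illustration of the idea and does not use dbt's API: the function name `flatten_exposures` and the dictionary layout are hypothetical. Each exposure is rewritten to depend on the underlying non-exposure nodes that its upstream exposures depend on.

```python
def flatten_exposures(exposure_deps):
    """Map each exposure to the non-exposure nodes it ultimately depends on.

    exposure_deps: dict of exposure name -> list of dependency names, where
    a dependency may be a model/table or (in the desired design) an exposure.
    """
    def resolve(name, seen):
        if name in seen:
            raise ValueError(f"cycle detected at {name!r}")
        resolved = []
        for dep in exposure_deps[name]:
            if dep in exposure_deps:
                # The dependency is itself an exposure: use its upstream nodes.
                resolved.extend(resolve(dep, seen | {name}))
            else:
                resolved.append(dep)
        return resolved

    return {
        name: sorted(set(resolve(name, frozenset())))
        for name in exposure_deps
    }


# Desired (unsupported) chain: table_A -> exposure1 -> exposure2
desired = {"exposure1": ["table_A"], "exposure2": ["exposure1"]}
flat = flatten_exposures(desired)
print(flat)
```

Here both exposures end up depending directly on table_A, which matches the diagram above. The flattened mapping could then be used to fill in each exposure's depends_on list by hand.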
How can I swap values within classes please?
As shown in this table (Before / After; table image not preserved):
I want to do this because it is oversampled data. It is very repetitive and this causes machine learning tools to overfit.
OK, try this out:
import numpy as np
import pandas as pd

# Setup example dataframe
df = pd.DataFrame({"Class" : [1,2,1,3,1,2,1,3,1,2,1,3,1,2,1,3],
1:[1,1,1,0,1,0,1,0,1,0,1,0,1,0,1,1],
2:[0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0],
3:[0,0,1,1,1,0,1,1,0,0,1,1,1,0,1,1],
4:[1,0,1,1,1,0,1,1,1,0,1,1,1,0,1,1],
5:[0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1],
6:[0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1]}).set_index("Class")
# Do a filter on class, and store the positions/index of matching contents
class_to_edit=3
swappable_indices = np.where(df.index==class_to_edit)[0]
# Extract the column to edit
column_to_edit=1
column_values = df[column_to_edit].values
# Decide how many values to swap, and randomly assign swaps
# No guarantee here that the swaps will not contain the same values i.e. you could
# end up swapping 1's for 1's and 0's for 0's here - it's entirely random.
number_of_swaps = 2
swap_pairs = np.random.choice(swappable_indices,number_of_swaps*2, replace=False)
# Using the swap pairs, build a map of substitutions,
# starting with a vanilla no-swap map, then updating it with the generated swaps
swap_map={e:e for e in range(0,len(column_values))}
# (index into swap_pairs, so that the randomly chosen positions are the ones swapped)
swap_map.update({swap_pairs[e]:swap_pairs[e+1] for e in range(0,len(swap_pairs),2)})
swap_map.update({swap_pairs[e+1]:swap_pairs[e] for e in range(0,len(swap_pairs),2)})
# Having built the swap-map, apply it to the data in the column,
column_values=[column_values[swap_map[e]] for e,v in enumerate(column_values)]
# and then plug the column back into the dataframe
df[column_to_edit]=column_values
It's a bit grubby, and I'm sure there's a cleaner way to build that substitution map in perhaps a one-line list comprehension - but that should do the trick.
Alternatively, there's the np.random.permutation function, which might bear some fruit in terms of adding some noise (albeit not by performing discrete swaps).
[edit] For testing, try a dataset with a bit less rigidity. Here's an example of a more randomly generated one. Just edit out the columns you want to replace with fixed values if you want to impose some order on the dataset.
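Both ideas (discrete pair swaps and a full shuffle within a class) can probably be written more compactly with NumPy fancy indexing. The following is a sketch under assumptions: the function names `swap_within_class` and `shuffle_within_class` are made up for illustration, the class label is assumed to live in the index as in the example above, and each function returns a modified copy rather than editing in place.

```python
import numpy as np
import pandas as pd


def swap_within_class(df, column, class_label, n_swaps, seed=None):
    """Swap n_swaps random pairs of values in `column` among rows of one class."""
    rng = np.random.default_rng(seed)
    positions = np.flatnonzero(df.index == class_label)
    # Pick 2 * n_swaps distinct row positions, then pair them up.
    picked = rng.choice(positions, size=2 * n_swaps, replace=False)
    left, right = picked[0::2], picked[1::2]
    values = df[column].to_numpy().copy()
    # Fancy indexing on the right-hand side yields copies, so this is a true swap.
    values[left], values[right] = values[right], values[left]
    out = df.copy()
    out[column] = values
    return out


def shuffle_within_class(df, column, class_label, seed=None):
    """Randomly permute all values in `column` among rows of one class."""
    rng = np.random.default_rng(seed)
    positions = np.flatnonzero(df.index == class_label)
    values = df[column].to_numpy().copy()
    values[positions] = rng.permutation(values[positions])
    out = df.copy()
    out[column] = values
    return out


# Small hypothetical example in the same shape as the dataframe above.
example = pd.DataFrame(
    {"Class": [1, 3, 1, 3, 2, 3, 3], 1: [1, 0, 1, 1, 0, 0, 1]}
).set_index("Class")
swapped = swap_within_class(example, column=1, class_label=3, n_swaps=2, seed=0)
shuffled = shuffle_within_class(example, column=1, class_label=3, seed=0)
```

As with the longer version, a swap may exchange two identical values, so the column can come back unchanged. Values in other classes are never touched.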
df = pd.DataFrame({"Class" : [1,2,1,3,1,2,1,3,1,2,1,3,1,2,1,3],
1:np.random.choice([0,1],16),
2:np.random.choice([0,1],16),
3:np.random.choice([0,1],16),
4:np.random.choice([0,1],16),
5:np.random.choice([0,1],16),
6:np.random.choice([0,1],16)}).set_index("Class")