RStudio Error: Unused argument (by = ...) when fitting gam model, and smoothing separately for a factor

I am still a beginner in R. For a project I am trying to fit a GAM on a simple dataset with a time variable and a year. I am doing it in R and I keep getting an error message that claims an argument is unused, even though I specify it in the code.
The dataset includes a categorical variable, "Year", with only two levels: 2020 and 2022. I want to investigate whether there is a peak in the hourly rate of visitors ("H1") in a nature reserve. For each observation period the average time was taken, which is the predictor variable used here ("T"). I want to use a GAM for this, and have the smoothing applied separately for the two years.
The following is the line of code that I tried to use
`gam1 <- gam(H1~Year+s(T,by=Year),data = d)`
When I try to run this code, I get the following error message
`Error in s(T, by = Year) : unused argument (by = Year)`
I also tried simply getting rid of the "by" argument
`gam1 <- gam(H1~Year+s(T,Year),data = d)`
This allows me to run the code, but when trying to summon the output using summary(gam1), I get
Error in [<-(tmp, snames, 2, value = round(nldf, 1)) : subscript out of bounds
Since I feel like both errors are probably related to the same thing that I'm doing wrong, I decided to combine the question.

Did you load the {mgcv} package or the {gam} package? The latter doesn't have factor by smooths and as such the first error message is what I would expect if you did library("gam") and then tried to fit the model you showed.
To fit the model you showed, you should restart R and try in a clean session:
library("mgcv")
# load your data
# fit model
gam1 <- gam(H1 ~ Year + s(T, by = Year), data = d)
It could well be that you have both {gam} and {mgcv} loaded, in which case whichever you loaded last will be earlier on the function search path. As both packages have functions gam() and s(), R might just be finding the wrong versions (masking), so you might also try
gam1 <- mgcv::gam(H1 ~ Year + mgcv::s(T, by = Year), data = d)
But you would be better off only loading {mgcv} if you want factor by smooths.

@Gavin Simpson
I did have both loaded, and I tried just using mgcv as you suggested. However, then I get the following error.
Error in names(dat) <- object$term :
'names' attribute [1] must be the same length as the vector [0]
I am assuming this is simply because it's not actually trying to use the "gam" function, but rather it attempts to name something gam1. So I would assume I actually need the package of 'gam' before I could do this.
The second line of code also doesn't work. I get the following error
Error in model.frame.default(formula = H1 ~ Year + mgcv::s(T, by = Year), :
invalid type (list) for variable 'mgcv::s(T, by = Year)'
This happens no matter the order I download the two packages in. And if I don't download 'gam', I get the error as described above.


Pyomo dynamic optimisation ERROR: variable that is not attached to an active block on the submodel being written

I am trying to set up a model for a dynamic optimisation problem with pyomo.DAE.
I defined my state variable as well as its corresponding derivative (both indexed by m.Time). I then set up a simple constraint that expresses the relationship between state and derivative variable in the simplest terms. Solving the problem with a dummy objective (so just testing the constraint), I get the following error:
ERROR: Model contains an expression (calc_my_state[0])
that contains a variable (derivative_var[0]) that is not
attached to an active block on the submodel being written
Here's an excerpt of what I wrote:
(.....)
m.state_var = Var(m.Time, initialize=0)
m.derivative_var = DerivativeVar(m.state_var, wrt=m.Time)
def calc_my_state(m,i):
    return m.derivative_var[i] == m.state_var[i]*2
m.calc_my_state = Constraint(m.Time, rule=calc_my_state)
m.obj = Objective(expr=1)
opt = SolverFactory("glpk")
results = opt.solve(m)
I tried to reproduce the simple setup of a DAE in Pyomo, more or less copying and pasting lines from the pyomo.DAE documentation.
I printed derivative_var.get_state_var() and it gives me the right state variable without error.
I also tried solving simple DAE examples that I found on the internet and solving them with my solver settings worked fine as well.
What am I missing? I am grateful for any input!!! Thanks!
I found the missing link: I did not specify the "Discretization Transformation". Once something like the following was added, the script ran without error!
discretizer = TransformationFactory('dae.finite_difference')
discretizer.apply_to(m, wrt=m.Time)
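Roughly speaking, a discretization transformation replaces each derivative with an algebraic approximation on a grid of time points, so the solver only ever sees ordinary equations. The following is a minimal plain-Python sketch of a backward finite difference applied to the same relationship (derivative = 2 * state). It is not Pyomo's implementation; the function name, the initial value of 1.0, the time horizon and the number of steps are assumptions chosen for illustration.

```python
def backward_difference_solution(x0, t_end, nfe):
    """Approximate dx/dt = 2*x on [0, t_end] with nfe backward-difference steps.

    Hypothetical helper for illustration only (not part of Pyomo).
    """
    h = t_end / nfe  # step size between neighbouring time points
    x = [x0]
    for _ in range(nfe):
        # Backward difference: (x_i - x_prev) / h == 2 * x_i
        # rearranged to x_i = x_prev / (1 - 2*h)
        x.append(x[-1] / (1 - 2 * h))
    return x


# Assumed values: start at 1.0, horizon of 1.0, ten finite elements
states = backward_difference_solution(x0=1.0, t_end=1.0, nfe=10)
print(states[-1])
```

Each step is a purely algebraic equation linking two neighbouring time points, which is the kind of constraint a discretized model contains in place of the derivative.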

Understanding the "Not found: Dataset ### was not found in location US" error

I know this topic has come up many times but still here I am. Data processing location seems consistent (dataset, US; query: US) and I am using backticks & long format in the FROM clause
Below are two sequences of code. The first one works perfectly:
SELECT station_id
FROM `bigquery-public-data.austin_bikeshare.bikeshare_stations`
Whereas the following returns an error message:
SELECT bikeshare_stations.station_id
FROM `bigquery-public-data.austin_bikeshare`
Not found: Dataset glassy-droplet-347618:bigquery-public-data was not found in location US
My question, thus, is why does the first query work while the second doesn't?
You need to understand the different parts of the backticks:
bigquery-public-data is the name of the project;
austin_bikeshare is the name of the schema (aka dataset in BQ); and
bikeshare_stations is the name of the table/view.
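Therefore, the shorter format you are looking for is: austin_bikeshare.bikeshare_stations (instead of bigquery-public-data.austin_bikeshare). Note that a two-part name is resolved against the query's default project, as the error message shows, so the short form only works when that project is bigquery-public-data; otherwise keep the full three-part name from the first query.
Using bigquery-public-data.austin_bikeshare means that you have a schema called bigquery-public-data that contains a table called austin_bikeshare, when this is not true.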
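To make the naming rule concrete, here is a small plain-Python sketch that splits a reference into its parts the way the error message suggests it is read. The helper `resolve_table_ref` is hypothetical and only mimics the behaviour described above; it is not BigQuery's actual resolver. The default project is the one shown in the error message.

```python
def resolve_table_ref(ref, default_project):
    """Split a table reference into (project, dataset, table).

    Hypothetical helper: a two-part name is read as dataset.table
    inside the default project, as the error message implies.
    """
    parts = ref.split(".")
    if len(parts) == 3:
        project, dataset, table = parts
    elif len(parts) == 2:
        project = default_project
        dataset, table = parts
    else:
        raise ValueError(
            f"expected dataset.table or project.dataset.table, got {ref!r}"
        )
    return project, dataset, table


default = "glassy-droplet-347618"  # project taken from the error message

# Full three-part name: resolves to the public project
print(resolve_table_ref(
    "bigquery-public-data.austin_bikeshare.bikeshare_stations", default))

# Two-part name: 'bigquery-public-data' is treated as a dataset
print(resolve_table_ref("bigquery-public-data.austin_bikeshare", default))
```

The second call returns a dataset named bigquery-public-data in the default project, which matches the dataset named in the "Not found" error.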

splitting columns with str.split() not changing the outcome

I have to use str.split() for an exercise. I have a column called title and it looks like this:
I need to split it into two columns, Name and Season. The following code does not throw an error, but it doesn't seem to do anything either when I test it with df.head()
df[['Name', 'Season']] = df['title'].str.split(':',n=1, expand=True)
Any help as to why?
The code you have in your question is correct, and should be working. The issue could be coming from the execution order of your code though, if you're using Jupyter Notebook or some method that allows for unclear ordering of code execution.
I recommend starting a fresh kernel/terminal to clear all variables from the namespace, then executing those lines in order, e.g.:
# perform steps to load data in and clean
print(df.columns)
df[['Name', 'Season']] = df['title'].str.split(':',n=1, expand=True)
print(df.columns)
Alternatively you could add an assertion step in your code to ensure it's working as well:
df[['Name', 'Season']] = df['title'].str.split(':',n=1, expand=True)
assert {'Name', 'Season'}.issubset(set(df.columns)), "Columns were not added"
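For a quick sanity check outside the notebook, the same split can be run on a tiny throwaway frame. This is a minimal sketch: the two titles are made up for illustration (the real data is not shown in the question), and the extra strip step is an optional addition to drop the space left after the colon.

```python
import pandas as pd

# Made-up titles, assumed to follow a "Name: Season" pattern
df = pd.DataFrame({"title": ["Show A: Season 1", "Show B: Season 2: Part 1"]})

# n=1 splits on the first colon only; expand=True returns a DataFrame
df[["Name", "Season"]] = df["title"].str.split(":", n=1, expand=True)

# Optional: remove the leading space that followed the colon
df["Season"] = df["Season"].str.strip()

print(df.columns.tolist())
print(df)
```

If this works in a fresh session but the real frame still shows no new columns, that points to execution order or to df being reloaded after the split.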

New to Python and Pandas, Looking for help aggregating observations

I am relatively new to using Python and Pandas, and was looking for help with this line of code:
`Football.ydstogo[Football.ydstogo>='11']&[Football.ydstogo<='19']= '10-plus'`
I am working with data from the NFL, and trying to build a model to predict when a team will pass, or when a team will run the ball. One of my variables (ydstogo) measures the distance for the team, with the ball, to get a first down. I am trying to group together the observations after 10 yards for ease of visualization.
When I tried running the code above, the error in my output is "can't assign to operator". I've used this code before to change gender observations to dummy variables, so I'm confused why it is not working here.
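As I understand it, you want to find elements with a (string) value between '11' and '19' and set a new string there. Each comparison needs its own parentheses and both must sit inside a single pair of square brackets, so you should probably change your code to: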
Football.ydstogo[(Football.ydstogo >= '11') & (Football.ydstogo <= '19')] = '10-plus'
Alternative:
Football.ydstogo[Football.ydstogo.between('11', '19')] = '10-plus'
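A variant worth considering compares the values as numbers rather than as strings and assigns through .loc, which avoids relying on lexicographic string ordering and on chained indexing. This is a minimal sketch under the assumption that ydstogo holds digit strings; the sample values are made up for illustration.

```python
import pandas as pd

# Made-up sample; assumes ydstogo is stored as strings of digits
Football = pd.DataFrame({"ydstogo": ["3", "10", "11", "15", "19", "2"]})

# Compare numerically so the result does not depend on string ordering
yards = pd.to_numeric(Football["ydstogo"])

# Single .loc assignment on the original frame (between is inclusive)
Football.loc[yards.between(11, 19), "ydstogo"] = "10-plus"

print(Football["ydstogo"].tolist())
```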

Understanding PsychoPy codes for trialHandler and responses

I am new to coding, and would like help in understanding the script used by the PsychoPy program.
To be more specific, I would like to understand the code in lines 6 to 15. I am aware that it is used to manage the multiple trials, but I am hoping someone can help clarify those bits. I also noted that removing lines 6-8 doesn't change the experiment, but removing lines 10-15 essentially stops the experiment from running.
trialsAll = data.TrialHandler(trialList=data.importConditions('trialType.xlsx'), nReps=10, method='random', name='trialsAll', dataTypes='corr')
thisExp = data.ExperimentHandler(name='Ours')
thisExp.addLoop(trialsAll) #adds a loop to the experiment
thisTrial = trialsAll.trialList[0]
if thisTrial != None:
    for paramName in thisTrial.keys():
        exec(paramName + '= thisTrial.' + paramName)
# Loop through trials
for thisTrial in trialsAll:
    currentLoop=trialsAll
    if thisTrial != None:
        for paramName in thisTrial.keys():
            exec(paramName + '=thisTrial.' + paramName)
My second question is about getting responses. Is there a reason that thisResp is set to None?
#get response
thisResp=None
while thisResp==None:
    allKeys=event.waitKeys()
Thanks a lot for any help. I appreciate it.
Regards,
Cash
if thisTrial != None:
    for paramName in thisTrial.keys():
        exec(paramName + '= thisTrial.' + paramName)
This code allows the use of abbreviations. For example, say your conditions file has a field called 'angle', you can refer to this directly rather than via the keys of that trial's dictionary (e.g. thisTrial['angle'] ) or using dot notation ( thisTrial.angle ). i.e., in this example:
angle = thisTrial.angle
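The effect of that exec loop can be reproduced without PsychoPy. In this minimal sketch, `Trial` is a hypothetical stand-in for one row of the conditions file (a dict whose keys can also be read as attributes), the fields 'angle' and 'word' are made up, and the variables are created in an explicit `scope` dict rather than in the script's own namespace.

```python
class Trial(dict):
    """Hypothetical stand-in for thisTrial: keys readable as attributes."""
    __getattr__ = dict.__getitem__


# Made-up fields, as if read from one row of a conditions file
thisTrial = Trial(angle=45, word="red")

# Namespace in which the generated assignments are executed
scope = {"thisTrial": thisTrial}
for paramName in thisTrial.keys():
    # Builds and runs e.g. "angle = thisTrial.angle"
    exec(paramName + " = thisTrial." + paramName, scope)

print(scope["angle"], scope["word"])
```

After the loop, each field exists as a plain variable holding the same value as the corresponding entry in the trial.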
for thisTrial in trialsAll:
is fundamental to running a PsychoPy trial loop. It will cycle through each trial contained in the TrialHandler object, which is created to manage trials and is connected to a given conditions file.
#get response
thisResp=None
while thisResp==None:
    allKeys=event.waitKeys()
The line 'while thisResp==None:' requires that the variable 'thisResp' actually exists if we are going to be able to check its value. So in the immediately preceding line, it is created and given an initial null value so that the next line will run OK. Note that at this stage, it is just an arbitrary variable, which doesn't have any actual connection to the subject's response. That will presumably occur later in the code, when it gets assigned a value other than None.
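How such a loop typically ends can be sketched as follows. This is an assumption about the part of the script that was not shown: `wait_keys` is a hypothetical stand-in for event.waitKeys() that replays a fixed list of key presses, and the accepted keys 'left' and 'right' are made up for illustration.

```python
# Made-up sequence of key presses, one list per call
key_stream = iter([["x"], ["space"], ["left"]])


def wait_keys():
    """Hypothetical stand-in for event.waitKeys(): returns the next keys."""
    return next(key_stream)


thisResp = None  # no valid response yet
while thisResp is None:
    allKeys = wait_keys()
    for key in allKeys:
        # Only accepted keys end the loop; other keys are ignored
        if key in ("left", "right"):
            thisResp = key

print(thisResp)
```

The loop keeps waiting while thisResp is still None and stops once an accepted key has been assigned to it.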