Vector of variable names in R

I'd like to create a function that automatically generates uni- and multivariate regression analyses, but I can't figure out how to specify **variables in vectors**. This seems like it should be easy, but skimming the documentation I haven't figured it out so far...
An easy example:
a <- rnorm(100)
b <- rnorm(100)
k <- c("a", "b")
d <- c(a, b)
summary(k[1])
But k[1] is "a", a character string; d is just b appended to a, not a vector of variable names. In effect, I'd like k[1] to refer to the vector a.
Appreciate any answers...
//M

You can use the "get" function to get an object based on a character string of its name, but in the long run it is better to store the variables in a list and just access them that way; things become much simpler. You can grab subsets, and you can use lapply or sapply to run the same code on every element. When saving or deleting, you can work on the entire list rather than trying to remember every element. E.g.:
mylist <- list(a=rnorm(100), b=rnorm(100) )
names(mylist)
summary(mylist[[1]])
# or
summary(mylist[['a']])
# or
summary(mylist$a)
# or
d <- 'a'
summary(mylist[[d]])
# or
lapply( mylist, summary )
If you are programmatically creating models for analysis with lm (or other modeling functions), then one approach is to just subset your data and use ".", e.g.:
yvar <- 'Sepal.Width'
xvars <- c('Petal.Width','Sepal.Length')
fit <- lm( Sepal.Width ~ ., data=iris[, c(yvar,xvars)] )
Or you can build the formula using "paste" or "sprintf", then use "as.formula" to convert it to a formula, e.g.:
yvar <- 'Sepal.Width'
xvars <- c('Petal.Width','Sepal.Length')
my.formula <- paste( yvar, '~', paste( xvars, collapse=' + ' ) )
my.formula <- as.formula(my.formula)
fit <- lm( my.formula, data=iris )
Note also the problem of multiple comparisons if you are looking at many different models fit automatically.
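Putting these pieces together: a minimal sketch (not from the original answers) of the automated univariate-plus-multivariate screen the question asks about, using the built-in iris data and base R's reformulate, which builds a formula from character strings:
yvar  <- 'Sepal.Width'
xvars <- c('Petal.Width', 'Sepal.Length')
# one univariate model per predictor, e.g. Sepal.Width ~ Petal.Width
uni.fits <- lapply(xvars, function(x) lm(reformulate(x, response = yvar), data = iris))
# one multivariate model with all predictors
multi.fit <- lm(reformulate(xvars, response = yvar), data = iris)
lapply(uni.fits, summary)
summary(multi.fit)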

You could use a list: k = list(a, b). This creates a list with components a and b, but it is not a list of variable names.

get() is what you're looking for:
summary(get(k[1]))
Edit:
get() is not what you're looking for, it's list(). get() could be useful too, though.
If you're looking for automatic generation of regression analyses, you might actually benefit from using eval(), although every R programmer will warn you against eval() unless you know very well what you're doing. Please read the help files for eval() and parse() very carefully before you use them.
An example :
d <- data.frame(
  var1 = rnorm(1000),
  var2 = rpois(1000, 4),
  var3 = sample(letters[1:3], 1000, replace = TRUE)
)
vars <- names(d)

auto.lm <- function(d, dep, indep){
  expr <- paste(
    "out <- lm(",
    dep,
    "~",
    paste(indep, collapse = "*"),
    ", data = d)"
  )
  eval(parse(text = expr))
  return(out)
}

auto.lm(d, vars[1], vars[2:3])


filtering "events" in awkward-array

I am reading data from a file of "events". For each event, there is some number of "tracks". For each track there is a series of "variables". A stripped-down version of the code (using awkward0 as awkward) looks like this:
f = h5py.File('dataAA/pv_HLT1CPU_MinBiasMagDown_14Nov.h5',mode="r")
afile = awkward.hdf5(f)
pocaz = np.asarray(afile["poca_z"].astype(dtype_X))
pocaMx = np.asarray(afile["major_axis_x"].astype(dtype_X))
pocaMy = np.asarray(afile["major_axis_y"].astype(dtype_X))
pocaMz = np.asarray(afile["major_axis_z"].astype(dtype_X))
In this snippet of code, "pocaz", "pocaMx", etc. are what I have called variables (a physics label, not a Python data type). On rare occasions, pocaz takes on an extreme value, pocaMx and/or pocaMy take on nan values, and/or pocaMz takes on the value inf. I would like to remove these tracks from the events using some syntactically simple method. I am guessing this functionality exists (perhaps in the current version of awkward but not awkward0), but cannot find it described in a transparent way. Is there a simple example anywhere?
Thanks,
Mike
It looks to me, from the fact that you're able to call np.asarray on these arrays without error, that they are one-dimensional arrays of numbers. If so, then Awkward Array isn't doing anything for you here; you should be able to find the one-dimensional NumPy arrays inside
f["poca_z"], f["major_axis_x"], f["major_axis_y"], f["major_axis_z"]
as groups (note that this is f, not afile) and leave Awkward Array entirely out of it.
The reason I say that is because you can use np.isfinite on these NumPy arrays. (There's an equivalent in Awkward v1, v2, but you're talking about Awkward v0 and I don't remember.) That will give you an array of booleans for you to slice these arrays.
I don't have the HDF5 file for testing, but I think it would go like this:
f = h5py.File('dataAA/pv_HLT1CPU_MinBiasMagDown_14Nov.h5', mode="r")
pocaz  = np.asarray(f["poca_z"]["0"], dtype=dtype_X)
pocaMx = np.asarray(f["major_axis_x"]["0"], dtype=dtype_X)  # the only array
pocaMy = np.asarray(f["major_axis_y"]["0"], dtype=dtype_X)  # in each group
pocaMz = np.asarray(f["major_axis_z"]["0"], dtype=dtype_X)  # is named "0"
good = np.ones(len(pocaz), dtype=bool)
good &= np.isfinite(pocaz)
good &= np.isfinite(pocaMx)
good &= np.isfinite(pocaMy)
good &= np.isfinite(pocaMz)
pocaz[good], pocaMx[good], pocaMy[good], pocaMz[good]
If you also need to cut extreme finite values, you can include
good &= (-1000 < pocaz) & (pocaz < 1000)
etc. in the good selection criteria.
(The way you'd do this in Awkward Array is not any different, since Awkward is just generalizing what NumPy does here, but if you don't need it, you might as well leave it out.)
If you want numpy arrays, why not read the data with h5py functions? It provides a very natural way to return the datasets as arrays. Code would look like this. (FYI, I used the file context manager to open the file.)
with h5py.File('dataAA/pv_HLT1CPU_MinBiasMagDown_14Nov.h5', mode="r") as h5f:
    # the [()] returns the dataset as an array:
    pocaz_arr = h5f["poca_z"]["0"][()]
    # verify array shape and datatype:
    print(f"Shape: {pocaz_arr.shape}, Dtype: {pocaz_arr.dtype}")
    pocaMx_arr = h5f["major_axis_x"]["0"][()]  # the only dataset
    pocaMy_arr = h5f["major_axis_y"]["0"][()]  # in each group
    pocaMz_arr = h5f["major_axis_z"]["0"][()]  # is named "0"
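The np.isfinite masking from the previous answer carries over unchanged to these arrays; a minimal sketch, assuming the arrays were read as above:
good = (np.isfinite(pocaz_arr) & np.isfinite(pocaMx_arr)
        & np.isfinite(pocaMy_arr) & np.isfinite(pocaMz_arr))
# keep only tracks where every variable is finite
pocaz_clean, pocaMx_clean = pocaz_arr[good], pocaMx_arr[good]
pocaMy_clean, pocaMz_clean = pocaMy_arr[good], pocaMz_arr[good]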

Is it possible to use a list comprehension and index variable names?

Consider a dictionary d in Julia which contains a thousand keys. Each key is a symbol and each value is an array. I can access the value associated with the symbol :S1 and assign it to the variable k1 via
k1 = d[:S1]
Now assume I want to define the new variables k2, k3, k4, ..., k10 by repeating the same procedure for the special keys :S1 ... :S10 (not for all the keys in the dictionary). What is the most efficient way to do it? I have the impression this can be solved using metaprogramming but not sure about that.
The easy way is to use Parameters.jl.
using Parameters
d = Dict{Symbol,Any}(:a=>5.0,:b=>2,:c=>"Hi!")
@unpack a, c = d
a == 5.0 #true
c == "Hi!" #true
BTW, this doesn't use eval.
If the special keys are all known at compile time, I suggest using Chris Rackauckas's answer. It is much less evil and works to create local variables. If for some reason they are only known at runtime, then you can do as follows. (Though I guess it is actually pretty strange to need to create variables whose names you don't even know at compile time.)
@eval is your friend* here. See the manual.
@eval $key = $value
Or you can use the functional form eval() taking a quoted expression:
eval(:($key = $value))
Note, however, that you cannot use this to introduce new local variables; eval always executes at module scope, and that is an intentional restriction for performance reasons.
julia> d = Dict(k => rand(3) for k in [:a, :b1, :c2, :c1])
Dict{Symbol,Array{Float64,1}} with 4 entries:
:a => [0.446723, 0.0853543, 0.476118]
:b1 => [0.212369, 0.846363, 0.854601]
:c1 => [0.542332, 0.885369, 0.635742]
:c2 => [0.118641, 0.987508, 0.578754]
julia> for (k, v) in d
           # create constants only for the keys starting with `c`
           if first(string(k)) == 'c'
               @eval const $k = $v
           end
       end
julia> c2
3-element Array{Float64,1}:
0.118641
0.987508
0.578754
*Honestly, eval is not your friend.
It is, however, the only dude badass enough to walk with you down this dark road of generating code based on runtime values. (@generated functions are only marginally less badass, being willing to generate code based on runtime Types.)
If you are in this situation where you absolutely have to generate code based on runtime information, consider whether you have not made a design mistake several forks further up the road.
In case you really want to have k1, k2, ..., k10, ... you could use a slightly more complicated eval than Lyndon's:
for (i, j) in enumerate(d)
    @eval $(Symbol("k$i")) = $(j.second)
end
Warning: eval() works at global scope, so even if you use this inside a function, k1 ... kn will be global variables.
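For completeness: if distinct variables k1, k2, ... aren't strictly required, a plain comprehension sidesteps eval entirely (a minimal sketch, assuming keys :S1 through :S10 exist in d):
k = [d[Symbol("S", i)] for i in 1:10]
k[1]  # the array that would have been bound to k1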

Clearing numerical values in Mathematica

I am working on fairly large Mathematica projects, and the problem arises that I have to intermittently check numerical results but want to easily revert to having all my constructs in analytical form.
The code is fairly fluid, and I don't want to use scoping constructs everywhere as they add work overhead. Is there an easy way to identify and clear all assignments that are numerical?
EDIT: I really do know that scoping is the way to do this correctly ;-). However, for my workflow I am really just looking for a dirty trick to nix all numerical assignments after the fact instead of having the foresight to put down a Block.
If your assignments are on the top level, you can use something like this:
a = 1;
b = c;
d = 3;
e = d + b;
Cases[DownValues[In],
HoldPattern[lhs_ = rhs_?NumericQ] |
HoldPattern[(lhs_ = rhs_?NumericQ;)] :> Unset[lhs],
3]
This will work if you have a sufficient history length ($HistoryLength, which defaults to infinity). Note, however, that in the above example e was assigned 3 + c, and the 3 here was not undone. So the problem is really ambiguous in formulation, because some numbers can make it into definitions. One way to avoid this is to use SetDelayed for assignments rather than Set.
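To illustrate the SetDelayed suggestion with the same assignments (a small sketch): a delayed definition re-evaluates its right-hand side on every use, so clearing the numeric symbols afterwards restores the fully symbolic form.
a = 1; b = c; d = 3;
e := d + b    (* delayed: rhs captured unevaluated *)
e             (* 3 + c while d and b hold values *)
Unset[d]; Unset[b];
e             (* back to the fully symbolic d + b *)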
Another alternative would be to analyze the names in, say, the Global` context (if that is the context where your symbols live), then inspect the OwnValues and DownValues of the symbols in a fashion similar to the above, and remove definitions with purely numerical right-hand sides.
But IMO neither of these approaches is robust. I'd still use scoping constructs and try to isolate numerics. One possibility is to wrap your final code in Block, and assign numerical values inside this Block. This seems a much cleaner approach. The work overhead is minimal: you just have to remember which symbols you want to assign the values to. Block will automatically ensure that outside it, the symbols have no definitions.
EDIT
Yet another possibility is to use local rules. For example, one could define rule[a] = a -> 1; rule[d] = d -> 3 instead of the assignments above. You could then apply these rules, extracting them as, say, DownValues[rule][[All, 2]], whenever you want to test with some numerical arguments.
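A quick sketch of that usage (the rule definitions are from the paragraph above; expr is an arbitrary symbolic expression):
rule[a] = a -> 1; rule[d] = d -> 3;
expr = a + d^2;
expr /. DownValues[rule][[All, 2]]  (* gives 10; expr itself stays symbolic *)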
Building on Andrew Moylan's solution, one can construct a Block-like function that takes rules:
SetAttributes[BlockRules, HoldRest]
BlockRules[rules_, expr_] :=
 Block @@ Append[Apply[Set, Hold @ rules, {2}], Unevaluated[expr]]
You can then save your numeric rules in a variable and use BlockRules[savedrules, code], or even define a function that applies a fixed set of rules, kind of like so:
In[76]:= NumericCheck =
Function[body, BlockRules[{a -> 3, b -> 2`}, body], HoldAll];
In[78]:= a + b // NumericCheck
Out[78]= 5.
EDIT: In response to Timo's comment, it might be possible to use NotebookEvaluate (new in 8) to achieve the requested effect.
SetAttributes[BlockRules, HoldRest]
BlockRules[rules_, expr_] :=
 Block @@ Append[Apply[Set, Hold @ rules, {2}], Unevaluated[expr]]
nb = CreateDocument[{ExpressionCell[
     Defer[Plot[Sin[a x], {x, 0, 2 Pi}]], "Input"],
    ExpressionCell[Defer[Integrate[Sin[a x^2], {x, 0, 2 Pi}]],
     "Input"]}];
BlockRules[{a -> 4}, NotebookEvaluate[nb, InsertResults -> True];]
As the result of this evaluation, you get a notebook with your commands evaluated while a was locally set to 4. To take it further, you would have to take the notebook with your code, open a new notebook, evaluate Notebooks[] to identify the notebook of interest, and then do:
BlockRules[variablerules,
 NotebookEvaluate[NotebookPut[NotebookGet[nbobj]],
  InsertResults -> True]]
I hope you can make this idea work.

Elegantly alter a list of variables: Generalization of AddTo, TimesBy, etc

Suppose I've defined a list of variables
{a,b,c} = {1,2,3}
If I want to double them all I can do this:
{a,b,c} *= 2
The variables {a,b,c} now evaluate to {2,4,6}.
If I want to apply an arbitrary transformation function to them, I can do this:
{a,b,c} = f /@ {a,b,c}
How would you do that without specifying the list of variables twice?
(Set aside the objection that I'd probably want an array rather than a list of distinctly named variables.)
You can do this:
Function[Null, # = f /@ #, HoldAll][{a, b, c}]
For example,
In[1]:=
{a,b,c}={1,2,3};
Function[Null, # = f /@ #, HoldAll][{a,b,c}];
{a,b,c}
Out[3]= {f[1],f[2],f[3]}
Or, you can do the same without hard-coding f, by defining a custom set function. The effect of a foreach loop can be reproduced easily if you give it the Listable attribute:
ClearAll[set];
SetAttributes[set, {HoldFirst, Listable}]
set[var_, f_] := var = f[var];
Example:
In[10]:= {a,b,c}={1,2,3};
set[{a,b,c},f1];
{a,b,c}
Out[12]= {f1[1],f1[2],f1[3]}
You may also want to get speed benefits in cases where your f is Listable, which is especially relevant now since M8's Compile enables user-defined functions to benefit from being Listable in terms of speed, in a way that previously only built-in functions could. All you have to do with set in such cases (when you are after speed and you know that f is Listable) is to remove the Listable attribute of set.
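A sketch of that variant (the same set as above minus the Listable attribute, with Sqrt standing in for a Listable f):
ClearAll[set];
SetAttributes[set, HoldFirst]  (* no Listable: the Listable f threads over the whole list *)
set[var_, f_] := var = f[var];
{a, b, c} = {1., 4., 9.};
set[{a, b, c}, Sqrt];
{a, b, c}  (* {1., 2., 3.} *)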
I hit upon an answer to this when fixing up this old question: ForEach loop in Mathematica
Defining the each function as in the accepted answer to that question, we can answer this question with:
each[i_, {a,b,c}, i = f[i]]

Variables in Haskell

Why does the following Haskell script not work as expected?
find :: Eq a => a -> [(a,b)] -> [b]
find k t = [v | (k,v) <- t]
Given find 'b' [('a',1),('b',2),('c',3),('b',4)], the interpreter returns [1,2,3,4] instead of [2,4]. The introduction of a new variable, below called u, is necessary to get this to work:
find :: Eq a => a -> [(a,b)] -> [b]
find k t = [v | (u,v) <- t, k == u]
Does anyone know why the first variant does not produce the desired result?
From the Haskell 98 Report:
As usual, bindings in list comprehensions can shadow those in outer scopes; for example:
[ x | x <- x, x <- x ] = [ z | y <- x, z <- y]
One other point: if you compile with -Wall (or specifically with -fwarn-name-shadowing) you'll get the following warning:
Warning: This binding for `k' shadows the existing binding
bound at Shadowing.hs:4:5
Using -Wall is usually a good idea; it will often highlight what's going on in potentially confusing situations like this.
The pattern match (k,v) <- t in the first example creates two new local variables k and v that are populated with the contents of each tuple drawn from t. The pattern match doesn't compare those contents against the already existing variable k; it creates a new variable k (which shadows the outer one).
Generally, there is never any "variable substitution" happening in a pattern; variable names in a pattern always create new local variables.
You can only pattern match on literals and constructors; you can't match on variables.
That being said, you may be interested in view patterns.
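For illustration, a minimal sketch of the view-pattern version (hypothetical, requires GHC's ViewPatterns extension): the view pattern applies (== k) to the first tuple component, and the generator only keeps tuples for which the result matches True.
{-# LANGUAGE ViewPatterns #-}

find :: Eq a => a -> [(a, b)] -> [b]
find k t = [v | ((== k) -> True, v) <- t]

main :: IO ()
main = print (find 'b' [('a',1),('b',2),('c',3),('b',4)])  -- prints [2,4]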