How to store several variable names in Stata?

I want to store a list of variable names in a new local variable, such that I do not have to type a long list of variable names for each regression. I am using Stata 14.
E.g., I have the following 5 independent variables: a b c d e and one dependent variable: f
I don't want:
regress f a b c d e
But I want something like:
regress f allvar
How can I generate allvar?
Unfortunately, this does not work
local allvar a b c d e

The following works fine.
clear
set more off
sysuse auto
// first regression
regress price mpg rep78 weight
// second regression
local allvars mpg rep78 weight
regress price `allvars'
Unless you show us something reproducible and/or more explicit, it's difficult to see what the problem is. A report only mentioning "does not work" is usually useless.
See also the keyword _all in help varlist.
You are using a local macro. If you are running the code by parts, then don't. You need to run the whole code, all at once. Read [P] macro, for details. An excerpt:
Local macros exist solely within the program or do-file in which they
are defined. If that program or do-file calls another program or
do-file, the local macros previously defined temporarily cease to
exist, and their existence is reestablished when the calling program
regains control. When a program or do-file ends, its local macros are
permanently deleted.

A common reason why your command sometimes "does not work" is that you ran your do-file line by line, rather than all in one go. A local macro is local to the program or do-file in which it is defined (hence the name). So if you ran the line local allvar a b c d e on its own, that local macro is created but disappears as soon as Stata finishes running that section of your .do file. There are two solutions:
You can get into the habit of running the definition of local macros and their use in one go. It is actually good practice to make many small .do files and make each .do file self-contained (see for example this excellent book), so you can easily just run the entire .do file each time you want to check or change something.
Alternatively, you can use global macros. These continue to exist after the do-file has finished, for the rest of your Stata session. As someone who programs in Stata, using global macros hurts my eyes, but I guess that if you use Stata only to analyse data it does little harm.
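For example, a minimal sketch of the global-macro variant, reusing the auto data from the first answer (note the $ prefix when expanding a global):
sysuse auto, clear
global allvars mpg rep78 weight
regress price $allvars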
As an aside, allvar does not seem like the right name for that local macro: it does not contain all variables, as it excludes the variable f. This sounds pedantic (and it is), but it is good practice to use names that accurately describe their content. In a real project we tend to come back to the code after some time. A common scenario is that you submitted a paper to a journal, it took half a year or more for the reviews to come in, and now you need to "read" your own .do-file to understand what you did half a year ago. At that point you are very happy that you were pedantic when writing the .do file...
As a further aside, assuming that a b c d e f are indeed all the variables in your dataset, you can also create your local using:
ds f, not
local rhs `r(varlist)' // rhs short for right-hand side
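The constructed local can then be used directly in the regression:
regress f `rhs'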

Related

Pyiron automatically assigns bonds in the input files for lammps for O and H whenever they exist in the structure

Whenever I purposely put H atoms inside a structure (with Fe and O as host atoms) as interstitials, I expect the H to be treated as an isolated atom, with no bonds between it and the surrounding host atoms. However, depending on the location of the H interstitial, pyiron sometimes defines a bond between the additional H and the original O for the lammps calculation.
This can be useful for automatic detection of bonds. However, how can one disable this feature when it is not needed?
I also believe there is a bug with the .define_bonds() function.
The inputs for the function are:
species="O"
element_list=["H"]
cutoff_list=[2.0]
max_bond_list=1
bond_type_list=1
While I believe max_bond_list and bond_type_list are supposed to be lists, the call only runs error-free if both are passed as integers, because of the way they are handled internally. However, passing them as integers then breaks running the jobs, because at that point they are expected to be iterables.
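For reference, this is the call the parameter names suggest should work, with every argument list-typed (a sketch based purely on the inputs listed above; the actual pyiron signature may differ):
structure.define_bonds(
    species="O",           # anchor species the bonds are defined around
    element_list=["H"],    # neighbouring elements to bond to
    cutoff_list=[2.0],     # cutoff distance per element
    max_bond_list=[1],     # list form, as the parameter name suggests
    bond_type_list=[1],    # list form, as the parameter name suggests
)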

Lua Spaghetti Modules

I am currently developing my own programming language. The codebase (in Lua) is composed of several modules, as follows:
The first, error.lua, has no dependencies;
lexer.lua depends only on error.lua;
prototypes.lua also has no dependencies;
parser.lua, instead, depends on all the modules above;
interpreter.lua is the fulcrum of the whole codebase. It depends on error.lua, parser.lua, and memory.lua;
memory.lua depends on functions.lua;
finally, functions.lua depends on memory.lua and interpreter.lua. It is required from inside memory.lua, so we can say that memory.lua also depends on interpreter.lua.
With "A depends on B" I mean that the functions declared in A need those declared in B.
The real problem, though, is when A depends on B which depends on A, which, as you can understand from the list above, happens quite frequently in my code.
To give a concrete example of my problem, here's what interpreter.lua looks like:
--first, I require the modules that DON'T depend on interpreter.lua
local parser, Error = table.unpack(require("parser"))
--(since error.lua is needed both in the lexer, parser and interpreter module,
--I only actually require it once in lexer.lua and then pass its result around)
--Then, I should require memory.lua. But since memory.lua and
--functions.lua need some functions from interpreter.lua to work, I just
--forward declare the variables needed from those functions and then those functions themselves:
--forward declaration
local globals, new_memory, my_nil, interpret_statement
--functions I need to declare before requiring memory.lua
local function interpret_block()
--uses interpret_statement and new_memory
end
local function interpret_expression()
--uses new_memory, Error and my_nil
end
--Now I can safely require memory.lua:
globals, new_memory, my_nil = require("memory")(interpret_block, interpret_expression)
--(I'll explain why it returns a function to call later)
--Then I have to fulfill the forward declaration of interpret_statement:
function interpret_statement()
--uses interpret_expression, new_memory and Error
end
--finally, the result is a function
return function()
--uses parser, new_function and globals
end
The memory.lua module returns a function so that it can receive interpret_block and interpret_expression as arguments, like this:
--memory.lua
return function(interpret_block, interpret_expression)
--declaration of globals, new_memory, my_nil
return globals, new_memory, my_nil
end
Now, I got the idea of the forward declarations here and that of the functions-as-modules (like in memory.lua, to pass some functions from the requiring module to the required module) here. They're all great ideas, and I must say that they work well. But you pay in readability.
In fact, breaking the code into smaller pieces this time made my work harder than it would have been if I had coded everything in a single file, which is impossible for me because it's over 1000 lines of code and I'm coding from a smartphone.
The feeling I have is that of working with spaghetti code, only on a larger scale.
So how could I solve the problem of my code being hard to understand because some modules need each other to work (without making all the variables global, of course)? How would programmers in other languages solve this problem? How should I reorganize my modules? Are there any standard rules for using Lua modules that could help me with this problem?
If we look at your Lua files as a directed graph, where an edge points from a dependency to its usage, the goal is to modify your graph to be a tree or forest, as you intend to get rid of the cycles.
A cycle is a set of nodes which, traversed in the direction of the edges, leads back to the starting node.
Now, the question is how to get rid of cycles?
The answer looks like this:
Let's consider node N and let's consider {D1, D2, ..., Dm} as its direct dependencies. If there is no Di in that set that depends on N either directly or indirectly, then you can leave N as it is. In that case, the set of problematic dependencies looks like this: {}
However, what if you have a non-empty set, like this: {PD1, ..., PDk} ?
You then need to analyze each PDi (for i between 1 and k) along with N, and see which subset of each PDi does not depend on N, and which subset of N does not depend on any PDi. This way you can split them into N_base and N, and PDi_base and PDi: N depends on N_base, and so do all the PDi; each PDi depends on PDi_base along with N_base.
This approach minimizes cycles in the dependency graph. However, it is quite possible that a set of functions {f1, ..., fl} exists in this group which cannot be migrated into a _base module as discussed, due to dependencies, so some cycles remain. In this case you need to give the group in question a name, create a module for it, and migrate all those functions into that module, as sketched below.
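As a concrete sketch of this splitting, applied to the cycle between memory.lua and interpreter.lua from the question (the function names here are invented for illustration): move the part of the interpreter that memory.lua actually uses into a new interpreter_base.lua, so every require points "downwards":
-- interpreter_base.lua: the subset of the interpreter that memory.lua needs.
-- It requires nothing, so any module may require it without creating a cycle.
local base = {}
function base.is_truthy(value)
  return value ~= nil and value ~= false
end
return base

-- memory.lua: now requires interpreter_base instead of interpreter.
local base = require("interpreter_base")
local memory = {}
function memory.new_memory(parent)
  return { parent = parent, slots = {} }
end
return memory

-- interpreter.lua: requires both; since the graph is now acyclic,
-- the forward declarations and functions-as-modules tricks are no longer needed.
local base = require("interpreter_base")
local memory = require("memory")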

SPSS automatically create interaction variables for Logistic Regression

I am using SPSS and have about 300 variables (categorical, scalar and ordinal) to model. I need an easy/quick way to create interaction-variable composites for logistic regression where interactions exist. R does this automatically and creates about 158 composites (variables that have interactions); there does not appear to be any automated way to create and input interaction variables in SPSS, and having to manually input and/or test these 158 composites every time I run a new model is going to be A LOT OF WORK!! Any suggestions on a quick way to do this?
If you are going to be repeatedly running this model and need a way to create these synthetic variables, you should most likely create a syntax file that will do it for you. When you use the GUI in SPSS to run a command, SPSS generates the syntax in the output window. You can copy this syntax and use it to create your own script. So, for instance you might write something like this:
DO IF (NOT MISSING(Var1)).
COMPUTE Var2 = Var1 * dummy1.
END IF.
EXECUTE.
And sadly, yes, you would have to write this block of code 300 times the first go around, but in the future you can simply run it and have all the new variables computed.
Another approach is to name your variables sequentially and use a loop to process them. So assuming that your variables were sequentially named VarA, VarB, & VarC, then you could do a loop like so:
VECTOR VectorVar = VarA TO VarC.
LOOP #cnt = 1 to 3 by 1.
COMPUTE VectorVar(#cnt) = VectorVar(#cnt) * dummy1.
END LOOP.
EXECUTE.
Are you really looking to put in all 158 interaction terms? I'd be skeptical of that approach. But if you want to build variables representing all these interaction terms rather than specify them in the model, you can do it with the CREATE DUMMIES extension command available from the SPSS Community website (www.ibm.com/developerworks/spssdevcentral).
You could also use Python programmability to build the explicit interaction terms in the logistic procedure.
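For instance, a minimal sketch of that approach (the variable names are hypothetical, and spss.Submit simply pushes the generated syntax back to SPSS to run):
BEGIN PROGRAM.
import spss
# hypothetical predictors; in practice, read them from the data dictionary
pred = ["VarA", "VarB", "VarC"]
for i in range(len(pred)):
    for j in range(i + 1, len(pred)):
        a, b = pred[i], pred[j]
        # one COMPUTE per pairwise interaction composite
        spss.Submit("COMPUTE %s_x_%s = %s * %s." % (a, b, a, b))
spss.Submit("EXECUTE.")
END PROGRAM.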
HTH,
Jon Peck

Creating robust real-time monitors for variables

We can create a real-time monitor for a variable like this:
CreatePalette@Panel@Row[{"x = ", Dynamic[x]}]
(This is more interesting and useful if x happens to be something like $Assumptions. It's so easy to set a value and then forget about it.)
Unfortunately this stops working if the kernel is re-launched (Quit[], then evaluate something). The palette won't show changes in the value of x any more.
Is there a way to do this so it keeps working even across kernel sessions? I find myself restarting the kernel quite often. (If the resulting palette causes the kernel to be automatically started after Quit that's fine.)
Update: As mentioned in the comments, it turns out that the palette ceases working only if we quit by evaluating Quit[]. When using Evaluation -> Quit Kernel -> Local, it will keep working.
Link to same question on MathGroup.
I can only guess, because on my Ubuntu machine the situation seems buggy. The trick with the Quit from the menu that Leonid suggested did not work here. Another observation: in a fresh Mathematica session with only one notebook open,
Dynamic[x]
x = 1
Dynamic[x]
x = 2
gives as expected
2
1
2
2
Typing Quit on the next line, evaluating it, and then typing x=3 updates only the first of the Dynamic[x].
Nevertheless, have you checked the command
Internal`GetTrackedSymbols[]
This gives not only the tracked symbols but also some kind of ID for where the dynamic content belongs. If you can find out what exactly these numbers are, and investigate the other functions in the Internal` context, you may be able to re-add your palette's Dynamic content manually after restarting the kernel.
I thought I had something like that with
Internal`SetValueTrackExtra
but I'm currently not able to reproduce the behavior.
@halirutan's answer jarred my memory...
Have you ever come across: Experimental/ref/ValueFunction? (documentation address)
Although the documentation contains no examples, the 'more information' section provides the following tidbit:
The assignment ValueFunction[symb] = f specifies that whenever
symb gets a new value val, the expression f[symb,val] should be
evaluated.
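Going only by that description, a minimal sketch of how it might be used to rebuild a monitor after a kernel restart (untested; the exact behavior of this undocumented symbol may differ):
Experimental`ValueFunction[x] = Function[{sym, val}, Print["x is now ", val]];
x = 1  (* should print: x is now 1 *)
x = 2  (* should print: x is now 2 *)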

Clearing numerical values in Mathematica

I am working on fairly large Mathematica projects and the problem arises that I have to intermittently check numerical results but want to easily revert to having all my constructs in analytical form.
The code is fairly fluid, and I don't want to use scoping constructs everywhere, as they add work overhead. Is there an easy way to identify and clear all assignments that are numerical?
EDIT: I really do know that scoping is the way to do this correctly ;-). However, for my workflow I am really just looking for a dirty trick to nix all numerical assignments after the fact instead of having the foresight to put down a Block.
If your assignments are on the top level, you can use something like this:
a = 1;
b = c;
d = 3;
e = d + b;
Cases[DownValues[In],
HoldPattern[lhs_ = rhs_?NumericQ] |
HoldPattern[(lhs_ = rhs_?NumericQ;)] :> Unset[lhs],
3]
This will work if you have a sufficient history length ($HistoryLength defaults to infinity). Note, however, that in the above example e was assigned 3 + c, and the 3 there was not undone. So the problem is really ambiguous in formulation, because some numbers can make it into definitions. One way to avoid this is to use SetDelayed for assignments, rather than Set.
Another alternative would be to analyze the names in, say, the Global` context (if that is the context where your symbols live), and then the OwnValues and DownValues of those symbols, in a fashion similar to the above, and remove the definitions with a purely numerical r.h.s.
But IMO neither of these approaches is robust. I'd still use scoping constructs and try to isolate numerics. One possibility is to wrap your final code in Block, and assign numerical values inside this Block. This seems a much cleaner approach. The work overhead is minimal - you just have to remember which symbols you want to assign the values to. Block will automatically ensure that outside it, the symbols have no definitions.
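For instance, a minimal sketch of that pattern (the symbols here are just for illustration):
expr = a Sin[d t];  (* built symbolically, outside any scoping construct *)
Block[{a = 1, d = 3},
 (* numerical check; outside the Block, a and d remain unassigned *)
 Plot[expr, {t, 0, 2 Pi}]
]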
EDIT
Yet another possibility is to use local rules. For example, one could define rule[a] = a->1; rule[d]=d->3 instead of the assignments above. You could then apply these rules, extracting them as say
DownValues[rule][[All, 2]], whenever you want to test with some numerical arguments.
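With the rules from the example above, a numerical test is then just rule application:
rule[a] = a -> 1; rule[d] = d -> 3;
(a + d)^2 /. DownValues[rule][[All, 2]]  (* 16 *)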
Building on Andrew Moylan's solution, one can construct a Block-like function that takes rules:
SetAttributes[BlockRules, HoldRest]
BlockRules[rules_, expr_] :=
Block @@ Append[Apply[Set, Hold@rules, {2}], Unevaluated[expr]]
You can then save your numeric rules in a variable, and use BlockRules[ savedrules, code ], or even define a function that would apply a fixed set of rules, kind of like so:
In[76]:= NumericCheck =
Function[body, BlockRules[{a -> 3, b -> 2`}, body], HoldAll];
In[78]:= a + b // NumericCheck
Out[78]= 5.
EDIT In response to Timo's comment, it might be possible to use NotebookEvaluate (new in 8) to achieve the requested effect.
SetAttributes[BlockRules, HoldRest]
BlockRules[rules_, expr_] :=
Block @@ Append[Apply[Set, Hold@rules, {2}], Unevaluated[expr]]
nb = CreateDocument[{ExpressionCell[
Defer[Plot[Sin[a x], {x, 0, 2 Pi}]], "Input"],
ExpressionCell[Defer[Integrate[Sin[a x^2], {x, 0, 2 Pi}]],
"Input"]}];
BlockRules[{a -> 4}, NotebookEvaluate[nb, InsertResults -> True];]
As the result of this evaluation you get a notebook with your commands evaluated when a was locally set to 4. In order to take it further, you would have to take the notebook
with your code, open a new notebook, evaluate Notebooks[] to identify the notebook of interest and then do :
BlockRules[variablerules,
NotebookEvaluate[NotebookPut[NotebookGet[nbobj]],
InsertResults -> True]]
I hope you can make this idea work.