Stata logistic regression error due to missing values: what can I do?

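I am performing a regression with the following command (note that regress fits a linear regression; a logistic regression would use logit or logistic):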
regress igg1ugml age_years term_spec cesareansection estradiollevels breastfeed_howlong, robust
When I run the command with both estradiollevels and breastfeed_howlong, Stata returns an error. When I drop either one of those two variables, it produces output.
Neither variable is a string variable. However, both have many missing values: 256 of 334 for breastfeed_howlong and 270 of 334 for estradiollevels.
What can I do about this? I would like to include both variables in my analysis.
Thank you in advance
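Stata's estimation commands such as regress only use observations that are non-missing on the outcome and on every covariate (listwise deletion). With 78 non-missing values for breastfeed_howlong and 64 for estradiollevels, the number of observations where both are measured can be anywhere from 0 to 64; if it is zero, or too small for the number of parameters, the model cannot be fitted, which typically shows up as a "no observations" or "insufficient observations" error. The sketch below is plain Python with made-up toy data and hypothetical column names, not Stata code: each covariate on its own leaves usable rows, while the two together leave none. In Stata, a direct check would be count if !missing(igg1ugml, age_years, term_spec, cesareansection, estradiollevels, breastfeed_howlong), or misstable patterns to see which variables are observed together.

```python
def complete_cases(rows, columns):
    """Keep only rows with no missing value (None) in any of the given columns.

    This mimics listwise deletion: a regression uses an observation only if
    the outcome and every covariate are observed.
    """
    return [r for r in rows if all(r[c] is not None for c in columns)]


def overlap_bounds(n_total, n_a, n_b):
    """Smallest and largest possible number of rows where two variables are
    both observed, given how many non-missing values each one has."""
    return max(0, n_a + n_b - n_total), min(n_a, n_b)


# Hypothetical toy data in which the two covariates are never observed together.
rows = [
    {"igg": 1.2, "estradiol": 30.0, "breastfeed": None},
    {"igg": 0.8, "estradiol": None, "breastfeed": 6.0},
    {"igg": 1.5, "estradiol": 45.0, "breastfeed": None},
    {"igg": 0.9, "estradiol": None, "breastfeed": 12.0},
]

print(len(complete_cases(rows, ["igg", "estradiol"])))                # 2
print(len(complete_cases(rows, ["igg", "breastfeed"])))               # 2
print(len(complete_cases(rows, ["igg", "estradiol", "breastfeed"])))  # 0

# Counts from the question: 334 - 256 = 78 and 334 - 270 = 64 non-missing.
print(overlap_bounds(334, 78, 64))  # (0, 64)
```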


Can you build a model that normalises FEATURES using the test set while avoiding data leakage?

Just can't wrap my head around this one.
I understand that:
Normalising a target variable using the test set uses information on that target variable in the test set. This means we get inflated performance metrics that cannot be replicated once we receive a new test set (which does not have a target variable available).
However, when we receive a new test set, we do have the predictor variables available. What is wrong with using these to normalise? Yes, the predictors contain information that relates to the target variable, but that is literally what predicting with a model means: we use the information in the predictors to get specific predictions for a target. Why can't it be built into the model definition that it uses the input data to normalise before predicting?
The performance metrics, surely, wouldn't be skewed as we are just using information from the predictors.
Yes, the test set is supposed to be 'unseen', but in reality, surely it's only the test set target variable that is unseen, not the predictors.
I have read around this and answers so far are vague, just repeating that test set is unseen and that we gain information about the test set. I would really appreciate an answer on why we can't use specifically the predictors, as I think the target case is obvious.
Thanks in advance!!
Having gone away and thought about my question (normalising on the test-set features in addition to the training set), I realise this doesn't make much sense. Normalising is not part of training but something we do before it, so normalising with the test-set features is fine as an idea, but we would then still have to train on that normalised data using only the training-set outcomes. I originally thought "normalise on more data" > "normalise on less data", but in fact we would normalise on one set (training + test) and then fit on another (training only). We would probably get a more poorly trained model as a result, so I now believe it's a stupid idea!
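A minimal sketch of the usual leakage-free setup, in plain Python with made-up numbers: the scaling parameters are estimated once on the training features and then reused unchanged on test or future data. It also shows one concrete problem with normalising each test set on its own features: the scaled value of the very same input depends on which other rows happen to be in the batch, so predictions are no longer a fixed function of a single observation. The function names are illustrative, not from any particular library.

```python
import statistics


def fit_scaler(values):
    """Estimate mean and population standard deviation from one data set."""
    return statistics.fmean(values), statistics.pstdev(values)


def transform(values, scaler):
    """Apply an already-fitted scaler; nothing is re-estimated here."""
    mu, sigma = scaler
    return [(v - mu) / sigma for v in values]


# Hypothetical training feature values.
train = [2.0, 4.0, 6.0, 8.0]
scaler = fit_scaler(train)  # mean 5.0, std sqrt(5)

# Leakage-free: the same input always maps to the same scaled value.
z_fixed = transform([6.0], scaler)[0]

# Normalising each test batch on its own features: the scaled value of the
# same input 6.0 now depends on the other rows in the batch.
batch_a = [6.0, 7.0, 8.0]
batch_b = [6.0, 100.0, 200.0]
z_a = transform(batch_a, fit_scaler(batch_a))[0]
z_b = transform(batch_b, fit_scaler(batch_b))[0]

print(z_fixed, z_a, z_b)
```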

Stata output variable to matrix with ebalance

I'm using the ebalance Stata package to calculate post-stratification weights, and I'd like to convert the weights output (_webal, which is generated as a double with format %10.0g) to a matrix.
I'd like to normalize all weights in the "control" group, but I can't seem to convert the variable to a matrix in order to manipulate the weights individually. (I'm a novice in Stata, so I was just going to do this using a loop. I'd normally just export the data and do this in R, but I have to calculate the results within a bootstrap.) I can, however, view the individual-level weights produced by the output, and I can use them to calculate sample statistics.
Any ideas, anyone? Thanks so much!
This is not an answer, but it doesn't fit within a comment box.
As a self-described novice in Stata, you are asking the wrong question.
Your problem is that you have a variable that you want to do some calculations on, and since you can't just use R and you don't know how to do those (unspecified) calculations directly in Stata, you have decided that the first step is to create a matrix from the variable.
Your question would be better phrased as a simple description of the relevant portions of your data and the calculation you need to do using that data (ebalance is an obscure distraction that probably lost you a few readers) and where you are stuck.
See also https://stackoverflow.com/help/mcve for a discussion of how to construct a minimal, complete, and verifiable example, together with a description of the results you expect for that example.
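In the spirit of the answer above, the calculation can be done on the weight variable directly rather than through a matrix. The question does not say what "normalize" means, so this sketch assumes it means rescaling the control-group weights to sum to one; the group coding (0 = control) and the data are hypothetical. In Stata itself the analogous step would be something like egen double wsum = total(_webal) if treat == 0 followed by replace _webal = _webal / wsum if treat == 0, where treat is a hypothetical 0/1 treatment indicator. The Python below only illustrates the arithmetic.

```python
def normalize_within_group(weights, groups, target):
    """Rescale the weights of one group so they sum to 1.

    Weights outside the target group are returned unchanged. "Normalize"
    is assumed here to mean "sum to one"; dividing by the group mean
    instead would make the weights average to one.
    """
    total = sum(w for w, g in zip(weights, groups) if g == target)
    return [w / total if g == target else w for w, g in zip(weights, groups)]


# Hypothetical weights and group codes (0 = control, 1 = treated).
weights = [1.0, 3.0, 0.5, 1.5, 2.0]
groups = [0, 0, 1, 1, 0]
normalized = normalize_within_group(weights, groups, target=0)
print(normalized)
```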

A tricky graph execution-order problem in TensorFlow

I built a graph (shown below) with two large variables and two input placeholders.
At each step, I want to use the current values of the variables (a subset of their entries) and the input placeholders to calculate delta values. The delta values are then added to the variables using scatter_add.
Problem: the two computation paths are not the same; one requires more computation. TensorFlow seems to pick the order of the paths arbitrarily: it executes one path, then the other. For example, it may update variable 0 first and then use the new value of variable 0 when computing the other path (the update of variable 1). This is not what I need.
Any ideas?
tensorflow graph:
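I found the solution: tf.control_dependencies() solves this problem. Creating the scatter_add update ops inside a tf.control_dependencies(...) block that lists the delta computations ensures that both deltas are computed before either variable is updated.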
https://www.tensorflow.org/api_docs/python/tf/control_dependencies
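The ordering issue can be seen without TensorFlow. The plain-Python sketch below (an analogy, not TensorFlow code) contrasts a sequential update, where the second delta sees the already-updated first variable, with the synchronous update the question wants, where both deltas are computed from the old values before either is applied. The delta functions are hypothetical stand-ins for the two computation paths; the control dependency plays the role of "compute d0 and d1 first" in the second function.

```python
def sequential_update(v0, v1, delta0, delta1):
    """Order-dependent: delta1 is computed after v0 has already changed."""
    v0 = v0 + delta0(v0, v1)
    v1 = v1 + delta1(v0, v1)
    return v0, v1


def synchronous_update(v0, v1, delta0, delta1):
    """Compute both deltas from the old values first, then apply both."""
    d0 = delta0(v0, v1)
    d1 = delta1(v0, v1)
    return v0 + d0, v1 + d1


def delta0(a, b):
    # Hypothetical update rule for variable 0.
    return b


def delta1(a, b):
    # Hypothetical update rule for variable 1.
    return a


print(sequential_update(1, 2, delta0, delta1))   # (3, 5)
print(synchronous_update(1, 2, delta0, delta1))  # (3, 3)
```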

Calculating a t-test in Stata

I am currently trying to run a t test on a variable and determine if it's statistically significantly different from 1. Here is the code I am using:
ttest dm1=1
And it is spitting out this output:
I don't want my null hypothesis to be that the mean of dm1 equals 1; I want it to be that the coefficient on dm1 equals 1. When I compute the statistic by hand from the regression output, (b_dm1 - 1)/SE(b_dm1), I get a t statistic of around -48.89. If this is not the proper way, what is the code to test whether the coefficient is statistically different from 1? Also, here is an image of the regression model for reference:
The ttest syntax you used tests the null hypothesis that the mean of dm1 is 1. It has nothing to do with the regression coefficients at all.
If I understand what you are asking, you want a Wald test:
sysuse auto
reg price mpg weight i.foreign
test mpg=1
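For a single linear restriction such as mpg = 1, the F statistic that test reports equals the square of the t statistic (b - 1)/SE(b) that the asker computed by hand. A small sketch of that arithmetic follows, with hypothetical coefficient and standard-error values rather than numbers from the question. The p-value helper uses a large-sample normal approximation, which is an assumption of this sketch; Stata's test uses the F(1, df) distribution, so small-sample results will differ slightly.

```python
from statistics import NormalDist


def wald_test_single(b, se, null_value):
    """t statistic for H0: coefficient == null_value, plus the F = t**2
    reported for a test of one linear restriction."""
    t = (b - null_value) / se
    return t, t * t


def two_sided_p_normal(t):
    """Large-sample two-sided p-value from the standard normal distribution."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(t)))


# Hypothetical coefficient and standard error, not taken from the question.
t, f = wald_test_single(b=0.5, se=0.25, null_value=1.0)
p = two_sided_p_normal(t)
print(t, f, round(p, 4))  # -2.0 4.0 0.0455
```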

How to effectively use knn in Stata

I have two questions about running discrim knn in Stata.
1) How do I code the command properly? I've tried various versions, but always seem to get an error that too many variables are specified.
The variable containing the correct outcome is buy.
I am trying: discrim knn buy, group(train test) k(1)
2) My understanding of KNN was that factor (binary) variables are fine to use with KNN, even encouraged. However, I get the error message "factor variables and time-series operators not allowed".
Lastly, though I know this isn't the best place for this question: should each variable be normalized before running knn? I've heard conflicting answers.
I'm guessing that the error you're getting is
group(): too many variables specified
This is because you can only group by one variable with knn. knn performs discriminant analysis based on a single grouping variable, in your case distinguishing the training set from the test set. I imagine your train and test variables are binary, in which case using only one of them is enough, as they are merely logical opposites of each other. A single variable has enough information to distinguish the two groups.
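On the normalization question: kNN classifies by distance, so a feature measured on a large scale can dominate the distance and effectively drown out the others. Whether to standardize is still a modelling choice (it decides how much each feature counts), but the plain-Python sketch below, with hypothetical features (a 0/1 flag and an income in dollars) and made-up values, shows how the nearest neighbour can flip once each column is z-scored. The scaling parameters are estimated on the training rows only and reused for the query.

```python
import math
import statistics


def fit_scaler(rows):
    """Per-column mean and population standard deviation, from training rows only."""
    cols = list(zip(*rows))
    return [(statistics.fmean(c), statistics.pstdev(c)) for c in cols]


def scale(row, params):
    """Z-score one row with previously fitted column parameters."""
    return tuple((v - m) / s for v, (m, s) in zip(row, params))


def nearest_index(query, rows):
    """1-nearest neighbour by Euclidean distance; returns the row index."""
    return min(range(len(rows)), key=lambda i: math.dist(query, rows[i]))


# Hypothetical features: (binary flag, income in dollars).
train = [(1.0, 50000.0), (0.0, 50500.0), (1.0, 80000.0), (0.0, 20000.0)]
query = (1.0, 50400.0)

raw_nn = nearest_index(query, train)  # income dominates the raw distance
params = fit_scaler(train)
scaled_train = [scale(r, params) for r in train]
scaled_nn = nearest_index(scale(query, params), scaled_train)
print(raw_nn, scaled_nn)  # 1 0
```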