Stata output variable to matrix with ebalance - variables

I'm using the ebalance Stata package to calculate post-stratification weights, and I'd like to convert the weights output (_webal, which is generated as a double with format %10.0g) to a matrix.
I'd like to normalize all weights in the "control" group, but I can't seem to convert the variable to a matrix in order to manipulate the weights individually (I'm a novice to Stata, so I was just going to do this using a loop––I'd normally just export and do this in R, but I have to calculate results within a bootstrap). I can, however, view the individual-level weights produced by the output, and I can use them to calculate sample statistics.
Any ideas, anyone? Thanks so much!

This is not an answer, but it doesn't fit within a comment box.
As a self-described novice in Stata, you are asking the wrong question.
Your problem is that you have a variable that you want to do some calculations on, and since you can't just use R and you don't know how to do those (unspecified) calculations directly in Stata, you have decided that the first step is to create a matrix from the variable.
Your question would be better phrased as a simple description of the relevant portions of your data and the calculation you need to do using that data (ebalance is an obscure distraction that probably lost you a few readers) and where you are stuck.
See also https://stackoverflow.com/help/mcve for a discussion of completing a minimal complete example with a description of the results you expect for that example.

Related

Predict a nonlinear array based on 2 features with scalar values using XGBoost or equivalent

So I have been looking at XGBoost as a place to start with this, however I am not sure the best way to accomplish what I want.
My data is set up something like this
Where every value, whether it be input or output is numerical. The issue I'm facing is that I only have 3 input data points per several output data points.
I have seen that XGBoost has a multi-output regression method, however I am only really seeing it used to predict around 2 outputs per 1 input, whereas my data may have upwards of 50 output points needing to be predicted with only a handful of scalar input features.
I'd appreciate any ideas you may have.
For reference, I've been looking at mainly these two demos (they are the same idea just one is scikit and the other xgboost)
https://machinelearningmastery.com/multi-output-regression-models-with-python/
https://xgboost.readthedocs.io/en/stable/python/examples/multioutput_regression.html

How to provide the Gekko Python with the first and second derivatives of the objective function?

I am trying to minimize the difference of a function with a data point over different time points. So the objective function is the sum of the squares of the difference between the model (my function) and the data points over different times.
My model has analytical first and second order derivatives. How can I provide these derivatives to Gekko Python?
There are several examples in the APMonitor webpage regarding parameter estimation. Please check the link below. It also provides the data and model that you can use for practice.
TCLab C - Parameter Estimation
You can also get the idea how to implement the higher order differential equations in GEKKO in the link below. You basically want to introduce additional variable which links the first derivative variable to the 2nd derivative variable. That way, you can collapse the higer order DE down into the multiple 1st order DEs.
Solve 2nd Order Differential Equation

Why is tf.transpose so important in a RNN?

I've been reading the docs to learn TensorFlow and have been struggling on when to use the following functions and their purpose.
tf.split()
tf.reshape()
tf.transpose()
My guess so far is that:
tf.split() is used because inputs must be a sequence.
tf.reshape() is used to make the shapes compatible (Incorrect shapes tends to be a common problem / mistake for me). I used numpy for this before. I'll probably stick to tf.reshape() now. I am not sure if there is a difference between the two.
tf.transpose() swaps the rows and columns from my understanding. If I don't use tf.transpose() my loss doesn't go down. If the parameter values are incorrect the loss doesn't go down. So the purpose of me using tf.transpose() is so that my loss goes down and my predictions become more accurate.
This bothers me tremendously because I'm using tf.transpose() because I have to and have no understanding why it's such an important factor. I'm assuming if it's not used correctly the inputs and labels can be in the wrong position. Making it impossible for the model to learn. If this is true how can I go about using tf.transpose() so that I am not so reliant on figuring out the parameter values via trial and error?
Question
Why do I need tf.transpose()?
What is the purpose of tf.transpose()?
Answer
Why do I need tf.transpose()? I can't imagine why you would need it unless you coded your solution from the beginning to require it. For example, suppose I have 120 student records with 50 stats per student and I want to use that to try and make a linear association with their chance of taking 3 classes. I'd state it like so
c = r x m
r = records, a matrix with a shape if [120x50]
m = the induction matrix. it has a shape of [50x3]
c = the chance of all students taking one of three courses, a matrix with a shape of [120x3]
Now if instead of making m [50x3], we goofed and made m [3x50], then we'd have to transpose it before multiplication.
What is the purpose of tf.transpose()?
Sometimes you just need to swap rows and columns, like above. Wikipedia has a fantastic page on it. The transpose function has some excellent properties for matrix math function, like associativeness and associativeness with the inverse function.
Summary
I don't think I've ever used tf.transpose in any CNN I've written.

Is multiple regression the best approach for optimization?

I am being asked to take a look at a scenario where a company has many projects that they wish to complete, but with any company budget comes into play. There is a Y value of a predefined score, with multiple X inputs. There are also 3 main constraints of Capital Costs, Expense Cost and Time for Completion in Months.
The ask is could an algorithmic approach be used to optimize which projects should be done for the year given the 3 constraints. The approach also should give different results if the constraint values change. The suggested method is multiple regression. Though I have looked into different approaches in detail. I would like to ask the wider community, if anyone has dealt with a similar problem, and what approaches have you used.
Fisrt thing we should understood, a conclution of something is not base on one argument.
this is from communication theory, that every human make a frame of knowledge (understanding conclution), where the frame construct from many piece of knowledge / information).
the concequence is we cannot use single linear regression in math to create a ML / DL system.
at least we should use two different variabel to make a sub conclution. if we push to use single variable with use linear regression (y=mx+c). it's similar to push computer predict something with low accuration. what ever optimization method that you pick...it's still low accuracy..., why...because linear regresion if you use in real life, it similar with predict 'habbit' base on data, not calculating the real condition.
that's means...., we should use multiple linear regression (y=m1x1+m2x2+ ... + c) to calculate anything in order to make computer understood / have conclution / create model of regression. but, not so simple like it. because of computer try to make a conclution from data that have multiple character / varians ... you must classified the data and the conclution.
for an example, try to make computer understood phitagoras.
we know that phitagoras formula is c=((a^2)+(b^2))^(1/2), and we want our computer can make prediction the phitagoras side (c) from two input values (a and b). so to do that, we should make a model or a mutiple linear regresion formula of phitagoras.
step 1 of course we should make a multi character data of phitagoras.
this is an example
a b c
3 4 5
8 6 10
3 14 etc..., try put 10 until 20 data
try to make a conclution of regression formula with multiple regression to predic the c base on a and b values.
you will found that some data have high accuration (higher than 98%) for some value and some value is not to accurate (under 90%). example a=3 and b=14 or b=15, will give low accuration result (under 90%).
so you must make and optimization....but how to do it...
I know many method to optimize, but i found in manual way, if I exclude the data that giving low accuracy result and put them in different group then, recalculate again to the data group that excluded, i will get more significant result. do again...until you reach the accuracy target that you want.
each group data, that have a new regression, is a new class.
means i will have several multiple regression base on data that i input (the regression come from each group of data / class) and the accuracy is really high, 99% - 99.99%.
and with the several class, the regresion have a fuction as a 'label' of the class, this is what happens in the backgroud of the automation computation. but with many module, the user of the module, feel put 'string' object as label, but the truth is, the string object binding to a regresion that constructed as label.
with some conditional parameter you can get the good ML with minimum number of data train.
try it on excel / libreoffice before step more further...
try to follow the tutorial from this video
and implement it in simple data that easy to construct in excel, like pythagoras.
so the answer is yes...the multiple regression is the best approach for optimization.

How to plot a Pearson correlation given a time series?

I am using the code in this website http://blog.chrislowis.co.uk/2008/11/24/ruby-gsl-pearson.html to implement a Pearson Correlation given two time series data like so:
require 'gsl'
pearson_correlation = GSL::Stats::correlation(
GSL::Vector.alloc(first_metrics),GSL::Vector.alloc(second_metrics)
)
This returns a number such as -0.2352461593569471.
I'm currently using the highcharts library and am feeding it two sets of timeseries data. Given that I have a finite time series for both sets, can I do something with this number (-0.2352461593569471) to create a third time series showing the slope of this curve? If anyone can point me in the right direction I'd really appreciate it!
No, correlation doesn't tell you anything about the slope of the line of best fit. It just tells you approximately how much of the variability in one variable (or one time series, in this case) can be explained by the other. There is a reasonably good description here: http://www.graphpad.com/support/faqid/1141/.
How you deal with the data in your specific case is highly dependent on what you're trying to achieve. Are you trying to show that variable X causes variable Y? If so, you could start by dropping the time-series-ness, and just treat the data as paired values, and use linear regression. If you're trying to find a model of how X and Y vary together over time, you could look at multivariate linear regression (I'm not very familiar with this, though).