Creating a new feature from existing ones using a decision tree

Is it possible to create a new feature out of two, or more than two existing features using a decision tree?
If so, how, and can it produce features with good information value that can better help the model?

The Decision Tree itself doesn't create the third variable. You would create the third variable yourself, a task commonly referred to as feature engineering. There are numerous and perhaps infinite possibilities, for example,
x3 = x1 + x2
x3 = x1 / x2 (as long as x2 can't be zero)
x3 = x1 * exp(x2)
...
As you explore this wonderful world of feature engineering, you may find that some types of combinations work better with decision trees than others... but in general there is no correct answer; just experiment.
A tip to get you started: decision trees naturally handle collinearity quite well, because as soon as a node splits on x, the variables that are collinear with x become much less useful within that split. So transformations that are highly correlated with x1 or x2 directly might not help very much.
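To make this concrete, here is a minimal pure-Python sketch (no ML library assumed). It uses a depth-1 tree (a single threshold split, or "stump") as a stand-in for one node of a decision tree, on hypothetical data where the label is 1 exactly when x1 > x2. The data, the helper name best_stump_accuracy and the choice of x3 = x1 - x2 (a close relative of the x1 + x2 example above) are illustrative assumptions.

```python
import random


def best_stump_accuracy(values, labels):
    """Training accuracy of the best single-threshold split (a depth-1 tree) on one feature.

    Assumes the feature values are distinct, so every cut between sorted values is a valid threshold.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    total_pos = sum(labels)
    best = 0.0
    pos_left = 0
    # Cut after the first i sorted points, for i = 0..n.
    for i in range(n + 1):
        if i > 0:
            pos_left += pairs[i - 1][1]
        neg_left = i - pos_left
        pos_right = total_pos - pos_left
        neg_right = (n - i) - pos_right
        # Predict 0 on the left and 1 on the right, or the reverse; keep whichever is better.
        correct = max(neg_left + pos_right, pos_left + neg_right)
        best = max(best, correct / n)
    return best


# Hypothetical data: the label is 1 exactly when x1 > x2.
rng = random.Random(0)
x1 = [rng.random() for _ in range(1000)]
x2 = [rng.random() for _ in range(1000)]
y = [1 if a > b else 0 for a, b in zip(x1, x2)]
x3 = [a - b for a, b in zip(x1, x2)]  # engineered feature

acc_x1 = best_stump_accuracy(x1, y)
acc_x2 = best_stump_accuracy(x2, y)
acc_x3 = best_stump_accuracy(x3, y)
```

On data like this, a single split on x1 or x2 alone can only reach roughly three quarters accuracy in expectation, while one split on x3 at 0 classifies every point correctly. A deeper tree could approximate the diagonal boundary with many axis-aligned splits, but the engineered feature gets there in one.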

Related

Matrix Inverse in Visual Basic

I'm writing a program that applies the Newton-Raphson method to a system of n equations using a DataGridView. My problem is computing the inverse of the Jacobian matrix. I've searched the internet for a solution but couldn't find one, so I'd really appreciate any help. Thanks in advance.
If you are asking for a library recommendation, that is explicitly off topic on Stack Overflow. However, below I describe some commonly used algorithms; this may help you to find, or write, what you need. I would not recommend writing one yourself unless you really want to, as it can be tricky to get these algorithms right. If you do decide to write something, I'd recommend the QR method, as it is the easiest to write, though the theory is a little subtle.
First off do you really need to compute the inverse? If, for example, what you need to do is to compute
x = inv(J)*y
then it's faster and more accurate to treat this problem as
solve J*x = y for x
The methods below all factor J into other matrices, for which this solution can be done. A good package that implements the factorisation will also have the code to perform the solution.
If you really do need the inverse, often the best way is to solve, one column at a time,
J*K = I for K, where I is the identity matrix
LU decomposition
This may well be the fastest of the algorithms described here, but it is also the least accurate. An important point is that the algorithm must include (partial) pivoting, or it will not work on all invertible matrices; for example, it will fail on a rotation through 90 degrees, whose top-left entry is zero.
What you get is a factorisation of J into:
J = P*L*U
where P is a permutation matrix, L is lower triangular and U is upper triangular.
So, having factorised, we solve for x in three steps, each straightforward and each doable in place (i.e. all the x's can be the same variable):
Solve P*x1 = y for x1
Solve L*x2 = x1 for x2
Solve U*x = x2 for x
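The three steps above can be sketched in plain Python (the question is about Visual Basic, but the algorithm carries over line for line). This is a minimal sketch of LU with partial pivoting, not a production routine, and the function names are hypothetical. It records the row order as a list, i.e. it factorises P'*J = L*U, which is the same factorisation as J = P*L*U. The last function builds the inverse one column at a time from the factorisation, for the case where you really do need it.

```python
def lu_decompose(J):
    """LU factorisation with partial pivoting.

    Returns (perm, LU): row k of L*U equals row perm[k] of J. LU holds U on and above
    the diagonal and the multipliers of the unit-lower-triangular L below it.
    """
    n = len(J)
    A = [list(map(float, row)) for row in J]
    perm = list(range(n))
    for k in range(n):
        # Partial pivoting: bring the largest remaining entry of column k onto the diagonal.
        p = max(range(k, n), key=lambda i: abs(A[i][k]))
        if A[p][k] == 0.0:
            raise ValueError("matrix is singular")
        A[k], A[p] = A[p], A[k]
        perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, n):
            A[i][k] /= A[k][k]  # multiplier, stored in the L part
            for j in range(k + 1, n):
                A[i][j] -= A[i][k] * A[k][j]
    return perm, A


def lu_solve(perm, LU, y):
    """Solve J*x = y using the factorisation from lu_decompose."""
    n = len(LU)
    # Step 1: the "solve P*x1 = y" step is just a reordering of y.
    x = [float(y[perm[i]]) for i in range(n)]
    # Step 2: forward substitution with L (unit diagonal), in place.
    for i in range(n):
        for j in range(i):
            x[i] -= LU[i][j] * x[j]
    # Step 3: back substitution with U, in place.
    for i in reversed(range(n)):
        for j in range(i + 1, n):
            x[i] -= LU[i][j] * x[j]
        x[i] /= LU[i][i]
    return x


def lu_inverse(J):
    """Inverse of J, built one column at a time by solving J*k = e for each unit vector e."""
    n = len(J)
    perm, LU = lu_decompose(J)
    cols = [lu_solve(perm, LU, [1.0 if r == c else 0.0 for r in range(n)]) for c in range(n)]
    return [[cols[c][r] for c in range(n)] for r in range(n)]


J = [[0.0, -1.0], [1.0, 0.0]]               # rotation through 90 degrees: needs pivoting
x = lu_solve(*lu_decompose(J), [1.0, 2.0])  # == [2.0, -1.0]
J_inv = lu_inverse(J)                       # == [[0.0, 1.0], [-1.0, 0.0]]
```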
QR decomposition
This may be somewhat slower than LU but is more accurate. Conceptually this factorises J into
J = Q*R
where Q is orthogonal and R is upper triangular. However, as it is usually implemented, you in fact pass y as well as J to the routine, and it returns R (overwriting J) and Q'*y (overwriting y), so to solve for x you just need to solve
R*x = Q'*y
which, given that R is upper triangular, is easy.
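A minimal Householder-QR sketch in plain Python, following the usual implementation described above: the routine reduces a copy of J to R while applying the same reflections to a copy of y, so that copy ends up holding Q'*y; then it back-substitutes. Q itself is never formed. The function name is hypothetical, J is assumed square and non-singular, and there is no handling of near-singular J beyond an exact-zero check.

```python
import math


def qr_solve(J, y):
    """Solve J*x = y with Householder QR (J square and non-singular).

    J is reduced to R in a copy; the same reflections are applied to a copy of y,
    which therefore ends up holding Q'*y. Q itself is never formed.
    """
    n = len(J)
    A = [list(map(float, row)) for row in J]
    b = [float(v) for v in y]
    for k in range(n):
        norm = math.sqrt(sum(A[i][k] ** 2 for i in range(k, n)))
        if norm == 0.0:
            raise ValueError("matrix is singular")
        # Reflect column k onto alpha*e_k; alpha's sign is opposite to A[k][k] to avoid cancellation.
        alpha = -norm if A[k][k] >= 0.0 else norm
        v = [0.0] * n
        v[k] = A[k][k] - alpha
        for i in range(k + 1, n):
            v[i] = A[i][k]
        vv = sum(v[i] ** 2 for i in range(k, n))
        # Apply H = I - 2*v*v'/(v'*v) to the remaining columns of A ...
        for j in range(k, n):
            s = 2.0 * sum(v[i] * A[i][j] for i in range(k, n)) / vv
            for i in range(k, n):
                A[i][j] -= s * v[i]
        # ... and to b, accumulating Q'*y.
        s = 2.0 * sum(v[i] * b[i] for i in range(k, n)) / vv
        for i in range(k, n):
            b[i] -= s * v[i]
    # A now holds R on and above the diagonal; back-substitute R*x = Q'*y.
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(A[i][j] * x[j] for j in range(i + 1, n))) / A[i][i]
    return x


x = qr_solve([[0.0, -1.0], [1.0, 0.0]], [1.0, 2.0])  # approximately [2.0, -1.0]
```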
SVD (Singular value decomposition)
This is the most accurate, but also the slowest. Moreover, unlike the others, it lets you make progress even if J is singular (you can compute the 'generalised inverse' applied to y).
I recommend reading up on this, but advise against implementing it yourself.
Briefly you factorise J as
J = U*S*V'
where U and V are orthogonal and S diagonal.
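Following the advice not to implement SVD yourself, here is a sketch that uses NumPy (an assumption: in Visual Basic you would call an equivalent library routine instead). It applies the generalised inverse to y for a deliberately singular J by inverting only the singular values above a tolerance; np.linalg.pinv does the same in one call. The tolerance rule shown is a common default, not the only choice.

```python
import numpy as np

# Deliberately singular J: the second row is twice the first.
J = np.array([[1.0, 2.0], [2.0, 4.0]])
y = np.array([1.0, 2.0])

U, s, Vt = np.linalg.svd(J)  # J = U @ np.diag(s) @ Vt, s sorted in decreasing order
tol = max(J.shape) * np.finfo(float).eps * s[0]
# Invert only the singular values that are not (numerically) zero.
s_inv = np.array([1.0 / v if v > tol else 0.0 for v in s])
x = Vt.T @ (s_inv * (U.T @ y))  # generalised inverse applied to y

x_pinv = np.linalg.pinv(J) @ y  # the library's one-call equivalent
```

Here J*x = y has infinitely many solutions (any x with x1 + 2*x2 = 1); the generalised inverse picks the one with the smallest norm, [0.2, 0.4].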
There are, of course, many other ways of solving this problem. For example, if your matrices are very large (dimension in the thousands), it may be faster to use an iterative method, particularly if they are sparse (lots of zeros).
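As one example of an iterative method, here is a minimal Gauss-Seidel sketch in plain Python that stores only the non-zero entries of each row. Whether your J suits it is an assumption: Gauss-Seidel is guaranteed to converge when J is strictly diagonally dominant or symmetric positive definite, and may fail otherwise (more general methods such as GMRES or BiCGSTAB are found in sparse linear-algebra libraries). The test system is hypothetical.

```python
def gauss_seidel(rows, y, tol=1e-10, max_iter=10_000):
    """Solve J*x = y iteratively. rows[i] is a dict {column: value} of the non-zero entries of row i."""
    n = len(rows)
    x = [0.0] * n
    for _ in range(max_iter):
        max_change = 0.0
        for i, row in enumerate(rows):
            # Use updated x values as soon as they exist (the difference from Jacobi iteration).
            off_diag = sum(v * x[j] for j, v in row.items() if j != i)
            new_xi = (y[i] - off_diag) / row[i]
            max_change = max(max_change, abs(new_xi - x[i]))
            x[i] = new_xi
        if max_change < tol:
            return x
    raise RuntimeError("Gauss-Seidel did not converge")


# Example: a sparse, strictly diagonally dominant tridiagonal system (4 on the diagonal, -1 beside it).
n = 50
rows = []
for i in range(n):
    row = {i: 4.0}
    if i > 0:
        row[i - 1] = -1.0
    if i < n - 1:
        row[i + 1] = -1.0
    rows.append(row)
x_true = [float(i % 7) for i in range(n)]
y = [sum(v * x_true[j] for j, v in row.items()) for row in rows]
x = gauss_seidel(rows, y)
```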

How to set bounds and constraints on Tensorflow Variables (tf.Variable)

I am using TensorFlow to minimize a function. The function takes about 10 parameters. Every single parameter has bounds, i.e. a minimum and a maximum value the parameter is allowed to take. For example, the parameter x1 needs to be between 1 and 10.
I also have a pair of parameters that need to have the following constraint x2 > x3. In other words, x2 must always be bigger than x3. (In addition to this, x2 and x3 also have bounds, similarly to the example of x1 above.)
I know that tf.Variable has a "constraint" argument, however I can't really find any examples or documentation on how to use this to achieve the bounds and constraints as mentioned above.
Thank you!
It seems to me (I may be mistaken) that constrained optimization (you can search for it together with TensorFlow) is not exactly the use case TensorFlow was designed for. You may want to take a look at this repo; it may satisfy your needs, but as far as I understand, it still doesn't solve arbitrary constrained optimization problems, just some classification problems with labels and features, compatible with precision/recall scores.
If you want to use constraints on the TensorFlow variable (i.e. some function applied after each gradient step, which you can also do manually by reading the variable values, manipulating them and reassigning them), it means you will be clipping the variables after every step taken with the unconstrained gradient. It's an open question whether you will reach the right optimum this way, or whether your variables will get stuck at the boundaries because the unconstrained gradient points outside the feasible region.
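The "clip after each step" behaviour can be sketched without TensorFlow as projected gradient descent: take an ordinary gradient step, then clip the variable back into its bounds. This is roughly what happens when an optimizer applies a variable's clipping constraint after each update. The objectives are hypothetical one-dimensional examples with the bounds [1, 10] from the question.

```python
def projected_gradient_descent(grad, x0, lower, upper, lr=0.1, steps=500):
    """Gradient step, then clip back into [lower, upper] (a projection onto the interval)."""
    x = x0
    for _ in range(steps):
        x = x - lr * grad(x)
        x = min(max(x, lower), upper)  # the "constraint" applied after the step
    return x


# f = (x - 3)^2: the unconstrained optimum is inside the bounds, so clipping never activates.
x_inside = projected_gradient_descent(lambda x: 2.0 * (x - 3.0), 5.0, 1.0, 10.0)
# f = (x - 12)^2: the unconstrained optimum is outside, so the iterate ends up pinned at the bound.
x_edge = projected_gradient_descent(lambda x: 2.0 * (x - 12.0), 5.0, 1.0, 10.0)
```

In this one-dimensional convex case the boundary value is also the true constrained optimum; whether clipping behaves as well on a real multi-parameter loss is the open question raised above.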
My approach 1
If your problem is simple enough, you can try to parametrize x2 and x3 as x2 = x3 + t with t positive (below, t is clipped to [1, 10]), and then do the clipping in the graph:
import tensorflow as tf  # TensorFlow 1.x graph-mode API (tf.get_variable, sessions)

x3 = tf.get_variable('x3',
                     dtype=tf.float32,
                     shape=(1,),
                     initializer=tf.random_uniform_initializer(minval=1., maxval=10.),
                     constraint=lambda z: tf.clip_by_value(z, 1, 10))
t = tf.get_variable('t',
                    dtype=tf.float32,
                    shape=(1,),
                    initializer=tf.random_uniform_initializer(minval=1., maxval=10.),
                    constraint=lambda z: tf.clip_by_value(z, 1, 10))
x2 = x3 + t  # x2 > x3 holds automatically because t >= 1
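Then, in a separate call, additionally keep x2 within [1, 10]. Since x2 is a derived tensor rather than a tf.Variable, tf.assign cannot write to it directly; one option is to clip x3 instead (this needs t <= 9, otherwise the allowed range for x3 is empty):
sess.run(tf.assign(x3, tf.clip_by_value(x3, 1.0, 10.0 - t)))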
But my opinion is that it won't work well.
My approach 2
I would also try to invent some loss terms that keep the variables within the constraints, which is more likely to work. For example, a penalty keeping x2 in the interval [1, 10] could be (with pi taken from Python's math module):
loss += alpha * tf.abs(tf.math.tan(((x2 - 5.5) / 4.5) * math.pi / 2))
Here the argument of tan is mapped onto (-pi/2, pi/2), and tan then makes the penalty grow very rapidly as x2 approaches the boundaries. In this case I think you're more likely to find your optimum, but the loss weight alpha might be too large, and training will get stuck somewhere nearby if the required value of x2 lies near the boundary. In that case, try a smaller alpha.
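A dependency-free sketch of approach 2: plain gradient descent on a hypothetical objective plus the alpha*|tan(...)| penalty for [1, 10], with the penalty's derivative written out by hand (d|tan u|/du = sign(tan u) * (1 + tan^2 u)). The objectives, alpha = 0.01 and the learning rate are illustrative assumptions. A step that jumped past a boundary would land on another branch of tan and push further out, so the step size must stay small relative to the distance to the bounds.

```python
import math


def penalty_grad(x, lower=1.0, upper=10.0, alpha=0.01):
    """Derivative of alpha*|tan(u)|, u = ((x - mid)/half)*pi/2, which blows up at the bounds."""
    mid = (lower + upper) / 2.0   # 5.5 for [1, 10]
    half = (upper - lower) / 2.0  # 4.5 for [1, 10]
    u = (x - mid) / half * math.pi / 2.0
    t = math.tan(u)
    sign = (t > 0) - (t < 0)
    # d|tan u|/du = sign(tan u) * (1 + tan(u)^2); du/dx = pi / (2 * half)
    return alpha * sign * (1.0 + t * t) * math.pi / (2.0 * half)


def minimize_with_penalty(f_grad, x0=5.5, lr=0.01, steps=2000):
    """Plain gradient descent on f(x) + penalty(x); x must start strictly inside (1, 10)."""
    x = x0
    for _ in range(steps):
        x -= lr * (f_grad(x) + penalty_grad(x))
    return x


x_inner = minimize_with_penalty(lambda x: 2.0 * (x - 3.0))  # f = (x - 3)^2, optimum inside
x_edge = minimize_with_penalty(lambda x: 2.0 * (x - 12.0))  # f = (x - 12)^2, optimum outside
```

With these settings x_inner ends up very close to 3 (the penalty shifts it only slightly), while x_edge settles just inside 10 instead of at the true constrained optimum of exactly 10; a smaller alpha moves it closer to the boundary.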
In addition to the answer by Slowpoke, reparameterization is another option. E.g. let's say you have a param p which should be bounded in [lower_bound,upper_bound], you could write:
p_inner = tf.Variable(...) # unbounded
p = tf.sigmoid(p_inner) * (upper_bound - lower_bound) + lower_bound
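However, this will change the behavior of gradient descent: the sigmoid flattens out near both bounds, so the gradient with respect to p_inner shrinks as p approaches lower_bound or upper_bound.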
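A plain-Python sketch of the reparameterization, with the chain rule that TensorFlow would apply automatically written out by hand. Gradient descent runs on the unbounded p_inner, and p always stays strictly inside (lower_bound, upper_bound). The objectives, learning rate and step count are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_bounded(df_dp, lower_bound, upper_bound, p_inner=0.0, lr=0.05, steps=2000):
    """Gradient descent on the unbounded p_inner, where p = sigmoid(p_inner)*(upper - lower) + lower."""
    width = upper_bound - lower_bound
    for _ in range(steps):
        s = sigmoid(p_inner)
        p = s * width + lower_bound
        # Chain rule: df/dp_inner = df/dp * dp/dp_inner, with dp/dp_inner = width * s * (1 - s).
        p_inner -= lr * df_dp(p) * width * s * (1.0 - s)
    return sigmoid(p_inner) * width + lower_bound


p_inside = fit_bounded(lambda p: 2.0 * (p - 3.0), 1.0, 10.0)    # optimum 3 is inside the bounds
p_outside = fit_bounded(lambda p: 2.0 * (p - 12.0), 1.0, 10.0)  # optimum 12 is outside
```

The changed behaviour shows up in the factor s*(1 - s): it goes to 0 as p approaches a bound, so p_outside creeps towards 10 ever more slowly and never reaches it exactly.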

Should T be a parameter, a function, or what?

I'm new here and I don't really know how to ask my question precisely. I have to write code that computes x1 = x0 + t*e, which in practice looks like:
x1 = [0.5, 1] + [0, t]
x1 = [0.5, 1+t]
How should I declare t to make it work? I mean t has to remain here all the time, to make it possible to calculate the roots of a quadratic function a few steps further.
This would be hard to implement in a general purpose programming language because you need t to stay "symbolic" so you can do algebraic manipulations with it. You should look into implementing this in a Computer Algebra System (CAS) because those are specifically designed to handle symbolic computations. Implementing what you describe would be very quick and easy in a CAS.
There is well-known (and expensive, proprietary) CAS software such as Mathematica, or MATLAB with its Symbolic Math Toolbox. If you are working in C++ or Python, there are SymbolicC++ and SymPy, respectively, which integrate well with those languages. See Wikipedia for a list of CAS software.
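The answer recommends a CAS (in SymPy, for example, you would declare t with sympy.symbols('t') and call sympy.solve). If pulling in a CAS is not an option and, as in the question, x depends on t only linearly, a dependency-free sketch is to carry each coordinate as a pair (constant, coefficient of t). The quadratic |x1|^2 - r^2 used below is a hypothetical stand-in, since the question does not say which quadratic function is involved; the helper names are also hypothetical.

```python
import math


def along_line(x0, e):
    """x1 = x0 + t*e, kept symbolic in t: each coordinate is a (constant, coefficient of t) pair."""
    return [(c0, k) for c0, k in zip(x0, e)]


def squared_norm_minus(x1, r):
    """Coefficients (a, b, c) such that |x1|^2 - r^2 = a*t^2 + b*t + c."""
    a = sum(k * k for _, k in x1)
    b = sum(2.0 * c0 * k for c0, k in x1)
    c = sum(c0 * c0 for c0, _ in x1) - r * r
    return a, b, c


def real_roots(a, b, c):
    """Real solutions t of a*t^2 + b*t + c = 0, in increasing order."""
    if a == 0.0:
        return [] if b == 0.0 else [-c / b]
    disc = b * b - 4.0 * a * c
    if disc < 0.0:
        return []
    sq = math.sqrt(disc)
    return sorted([(-b - sq) / (2.0 * a), (-b + sq) / (2.0 * a)])


x1 = along_line([0.5, 1.0], [0.0, 1.0])  # x1 = [0.5, 1 + t], as in the question
coeffs = squared_norm_minus(x1, 2.0)      # t^2 + 2t - 2.75
roots = real_roots(*coeffs)               # t = -1 - sqrt(15)/2 and t = -1 + sqrt(15)/2
```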

Optimization Algorithm vs Regression Models

Currently, I'm dealing with forecasting problems. I have a reference that used a linear function to relate the input and output data:
y = p0 + p1*x1 + p2*x2
Both x1 and x2 are known inputs, y is the output, and p0, p1 and p2 are the coefficients. Then, he used all the training data and the Least Squares Estimation (LSE) method to find the optimal coefficients (p0, p1, p2) to build the model.
My question is: if he already used the LSE algorithm, can I improve his method by using an optimization algorithm (PSO or GA, for example) to find better coefficient values?
You answered this yourself:
"Then, he used all the training data and the Least Squares Estimation (LSE) method to find the optimal coefficients (p0, p1, p2) to build the model."
Because the least-squares objective of a linear model is convex (and even has a closed-form solution), the LSE method obtained the global optimum (ignoring subtle rounding errors and early-stopping/tolerance errors). Without changing the model, there is nothing to gain from other coefficients, regardless of whether you use meta-heuristics like GA.
So you may modify the model, or add additional features (feature engineering, e.g. the product of two variables; kernel methods).
One thing to try: Support Vector Machines. Their training problem is also convex, and they can be trained efficiently (on not-too-large datasets). They are also designed to work well with kernels. An additional advantage (compared with more complex, e.g. non-convex, models) is that they tend to generalize well, which seems important here because you don't have much data (it sounds like a very small dataset).
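To illustrate why a meta-heuristic cannot beat LSE on the same linear model, here is a plain-Python sketch that fits y = p0 + p1*x1 + p2*x2 by solving the normal equations, then checks that random perturbations of the coefficients (a crude stand-in for what GA or PSO would try) never lower the sum of squared errors. The data are hypothetical; for real work a library least-squares routine (typically QR- or SVD-based) is numerically preferable to forming the normal equations.

```python
import random


def fit_ols(x1, x2, y):
    """Least-squares coefficients [p0, p1, p2] for y = p0 + p1*x1 + p2*x2 via (A'A) p = A'y."""
    rows = [[1.0, a, b] for a, b in zip(x1, x2)]
    M = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    v = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(3)]
    # Gaussian elimination with partial pivoting on the 3x3 system.
    for k in range(3):
        piv = max(range(k, 3), key=lambda i: abs(M[i][k]))
        M[k], M[piv] = M[piv], M[k]
        v[k], v[piv] = v[piv], v[k]
        for i in range(k + 1, 3):
            f = M[i][k] / M[k][k]
            for j in range(k, 3):
                M[i][j] -= f * M[k][j]
            v[i] -= f * v[k]
    p = [0.0] * 3
    for i in reversed(range(3)):
        p[i] = (v[i] - sum(M[i][j] * p[j] for j in range(i + 1, 3))) / M[i][i]
    return p


def sse(p, x1, x2, y):
    """Sum of squared errors of the linear model with coefficients p."""
    return sum((t - (p[0] + p[1] * a + p[2] * b)) ** 2 for a, b, t in zip(x1, x2, y))


# Hypothetical data: y = 1 + 2*x1 - 0.5*x2 plus a little fixed noise.
x1 = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [1.0, 0.0, 2.0, 1.0, 3.0, 2.0]
noise = [0.1, -0.2, 0.05, 0.0, -0.1, 0.15]
y = [1.0 + 2.0 * a - 0.5 * b + e for a, b, e in zip(x1, x2, noise)]

p_best = fit_ols(x1, x2, y)
best = sse(p_best, x1, x2, y)

rng = random.Random(1)
trials = [[c + rng.gauss(0.0, 0.1) for c in p_best] for _ in range(1000)]
never_better = all(sse(trial, x1, x2, y) >= best - 1e-12 for trial in trials)
```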
See also #ayhan's comment!

Normal Distribution function

edit
So based on the answers so far (thanks for taking your time) I'm getting the sense that I'm probably NOT looking for a Normal Distribution function. Perhaps I'll try to re-describe what I'm looking to do.
Let's say I have an object that returns a number from 0 to 10, and that number controls "speed". However, instead of 10 being the top speed, I need 5 to be the top speed, and anything lower or higher should slow down accordingly (with easing, hence the bell curve).
I hope that's clearer ;/
-original question
These are the times I wish I remembered something from math class.
I'm trying to figure out how to write a function in Obj-C where I define the boundaries, e.g. (0 - 10), and then if x = foo, y = ?, where x runs 0,1,2,3,4,5,6,7,8,9,10 and y runs 0,1,2,3,4,5,4,3,2,1,0, but along a curve.
Something like the attached image.
I tried googling for Normal Distribution but it's way over my head. I was hoping to find a site that lists useful algorithms like these, but wasn't very successful.
So can anyone help me out here? And if there are good sites that show useful mathematical functions, I'd love to check them out.
TIA!!!
-added
I'm not looking for a random number. I'm looking for, e.g.: if x = 0, y should be 0; if x = 5, y should be 5; if x = 10, y should be 0... and all the not-so-obvious numbers in between.
(Image: http://dizy.cc/slider.gif)
Okay, your edit really clarifies things. You're not looking for anything to do with the normal distribution, just a nice smooth little ramp function. The one Paul provides will do nicely, but is tricky to modify for other values. It can be made a little more flexible (my code examples are in Python, which should be very easy to translate to any other language):
def quarticRamp(x, b=10, peak=5):
    if not 0 <= x <= b:
        raise ValueError  # or return 0
    return peak*x*x*(x-b)*(x-b)*16/(b*b*b*b)
Parameter b is the upper bound for the region you want to have a slope on (10, in your example), and peak is how high you want it to go (5, in the example).
Personally I like a quadratic spline approach, which is marginally cheaper computationally and has a different curve to it (this curve is really nice to use in a couple of special applications that don't happen to matter at all for you):
def quadraticSplineRamp(x, a=0, b=10, peak=5):
    if not a <= x <= b:
        raise ValueError  # or return 0
    # The curve is symmetric: mirror the right half onto the left half.
    if x > (b+a)/2:
        x = a + b - x
    z = 2*(x-a)/(b-a)  # 0 at x == a, 1 at the midpoint
    if z > 0.5:
        return peak * (1 - 2*(z-1)*(z-1))
    else:
        return peak * (2*z*z)
This is similar to the other function, but takes a lower bound a (0 in your example). The logic is a little more complex because it's a somewhat-optimized implementation of a piecewise function.
The two curves have slightly different shapes; you probably don't care what the exact shape is, and so could pick either. There are an infinite number of ramp functions meeting your criteria; these are two simple ones, but they can get as baroque as you want.
The thing you want to plot is the probability density function (pdf) of the normal distribution. You can find it on the mighty Wikipedia.
Luckily, the pdf for a normal distribution is not difficult to implement - some of the other related functions are considerably worse because they require the error function.
To get a plot like the one you showed, you want a mean of 5 and a standard deviation of about 1.5. The mean (which is also the median) is the centre of the range, and choosing a standard deviation from the left and right boundaries isn't difficult: for example, placing each boundary about three standard deviations from the mean gives (10 - 5)/3 ≈ 1.7.
A function to calculate the y value of the pdf given the x coordinate, standard deviation and mean might look something like:
#include <math.h>

#define PI 3.14159265358979323846

double normal_pdf(double x, double mean, double std_dev) {
    return( 1.0/(sqrt(2*PI)*std_dev) *
            exp(-(x-mean)*(x-mean)/(2*std_dev*std_dev)) );
}
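For the speed use case in the question's edit, what you probably want is the pdf's shape rather than its normalisation. A minimal Python sketch, assuming a peak speed of 5 at x = 5: multiplying normal_pdf by peak*sqrt(2*pi)*std_dev cancels the normalising constant, so the curve tops out at exactly peak. The defaults come from this answer; the function name is hypothetical. Unlike the curve in the question, this never quite reaches 0 at x = 0 or x = 10.

```python
import math


def gaussian_speed(x, mean=5.0, std_dev=1.5, peak=5.0):
    """Bell-shaped speed: equals peak at x == mean and falls off smoothly on both sides."""
    return peak * math.exp(-(x - mean) ** 2 / (2.0 * std_dev ** 2))


speeds = [round(gaussian_speed(x), 3) for x in range(11)]  # x = 0, 1, ..., 10
```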
The density of a normal distribution is never equal to 0, so please make sure that what you want to plot is indeed a normal distribution.
If you're only looking for this bell shape (with the flat tangent at both ends and everything), you can use the following formula:
x^2*(x-10)^2 for x between 0 and 10
0 elsewhere
(Divide by 125 if you need the peak to be 5.)
double bell(double x) {
if ((x < 10) && (x>0))
return x*x*(x-10.)*(x-10.)/125.;
else
return 0.;
}
Well, there's good old Wikipedia, of course. And MathWorld.
What you want is a random number generator for "generating normally distributed random deviates". Since Objective C can call regular C libraries, you either need a C-callable library like the GNU Scientific Library, or for this, you can write it yourself following the description here.
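If you do turn out to need normally distributed random numbers, one standard way to write it yourself is the Box-Muller transform (which may or may not be the method the linked description uses). A minimal Python sketch follows; Python's own random.gauss already does this job, so it only shows the idea, which translates directly to C or Objective-C given a uniform generator.

```python
import math
import random


def normal_deviate(mu=0.0, sigma=1.0, rng=random):
    """One N(mu, sigma^2) sample from two uniform samples, via the Box-Muller transform."""
    u1 = 1.0 - rng.random()  # in (0, 1], so log(u1) is finite
    u2 = rng.random()
    z = math.sqrt(-2.0 * math.log(u1)) * math.cos(2.0 * math.pi * u2)
    return mu + sigma * z


rng = random.Random(42)
samples = [normal_deviate(rng=rng) for _ in range(100_000)]
mean = sum(samples) / len(samples)
std = math.sqrt(sum((s - mean) ** 2 for s in samples) / len(samples))
```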
Try simulating rolls of dice by generating random numbers between 1 and 6. If you add up the rolls from 5 independent dice rolls, you'll get a surprisingly good approximation to the normal distribution. You can roll more dice if you'd like and you'll get a better approximation.
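The dice idea in Python, as a quick sketch: the sum of 5 fair dice has mean 5*3.5 = 17.5 and variance 5*35/12 ≈ 14.6, and its histogram already looks bell-shaped. The number of dice, the number of samples and the histogram scale are arbitrary choices.

```python
import random
from collections import Counter


def dice_sum(n_dice=5, rng=random):
    """Sum of n_dice independent rolls of a fair six-sided die."""
    return sum(rng.randint(1, 6) for _ in range(n_dice))


rng = random.Random(0)
rolls = [dice_sum(rng=rng) for _ in range(100_000)]
counts = Counter(rolls)

# Crude text histogram: one '#' per 200 occurrences.
for total in range(5, 31):
    print(f"{total:2d} {'#' * (counts[total] // 200)}")
```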
Here's an article that explains why this works. It's probably more mathematical detail than you want, but you could show it to someone to justify your approach.
If what you want is the value of the probability density function, p(x), of a normal (Gaussian) distribution of mean mu and standard deviation sigma at x, the formula is
p(x) = exp( -((x-mu)^2)/(2*sigma^2) ) / (sigma * sqrt(2*pi))
where pi is the area of a circle divided by the square of its radius (approximately 3.14159...). Using the C standard library math.h, this is:
#include <math.h>

/* M_PI comes from POSIX <math.h>; it is not part of ISO C itself. */
double normal_pdf(double x, double mu, double sigma) {
    double n = sigma * sqrt(2 * M_PI); // normalization factor
    double p = exp( -pow(x-mu, 2) / (2 * pow(sigma, 2)) ); // unnormalized pdf
    return p / n;
}
Of course, you can do the same in Objective-C.
For reference, see the Wikipedia or MathWorld articles.
It sounds like you want to write a function that yields a curve of a specific shape. Something like y = f(x), for x in [0:10]. You have a constraint on the max value of y, and a general idea of what you want the curve to look like (somewhat bell-shaped, y=0 at the edges of the x range, y=5 when x=5). So roughly, you would call your function iteratively with the x range, with a step that gives you enough points to make your curve look nice.
So you really don't need random numbers, and this has nothing to do with probability unless you want it to (as in, you want your curve to look like the outline of a normal distribution or something along those lines).
If you have a clear idea of what function will yield your desired curve, the code is trivial: a function to compute f(x) and a for loop to call it the desired number of times for the desired values of x. Plot the x,y pairs and you're done. So that's your algorithm: call a function in a for loop.
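The "call a function in a for loop" recipe, sketched in Python with the x^2*(x-10)^2/125 curve from another answer as f (any of the other candidate curves could be swapped in). The number of steps is an arbitrary choice; the (x, y) pairs are what you would hand to your plotting or animation code.

```python
def bell(x):
    """x^2 * (x - 10)^2 / 125 on [0, 10], else 0: zero at both ends and 5 at x = 5."""
    if 0.0 < x < 10.0:
        return x * x * (x - 10.0) ** 2 / 125.0
    return 0.0


def sample_curve(f, start=0.0, stop=10.0, steps=20):
    """Evaluate f at steps + 1 evenly spaced x values and return the (x, y) pairs."""
    width = (stop - start) / steps
    points = []
    for i in range(steps + 1):
        x = start + i * width
        points.append((x, f(x)))
    return points


for x, y in sample_curve(bell, steps=10):
    print(f"x = {x:4.1f}  y = {y:.3f}")
```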
The contents of the routine implementing the function will depend on the specifics of what you want the curve to look like. If you need help on functions that might return a curve resembling your sample, I would direct you to the reading material in the other answers. :) However, I suspect that this is actually an assignment of some sort, and that you have been given a function already. If you are actually doing this on your own to learn, then I again echo the other reading suggestions.
y = -abs(x - 5) + 5