Is it possible to use Keras to optimize the coefficients of a mathematical function? - tensorflow

I'm very new to Keras and to neural networks in general. Suppose I have a list of points (x, y) that came from a quadratic function of the form ax^2 + bx + c. Is it possible to feed the points into a neural network and get the coefficients a, b and c as an output from the network?
I know that I can simply use polynomial regression to achieve my goal; that is not the point.

If you are asking how to do polynomial regression using neural networks, here's the recipe.
Your dataset consists of points (x, y). Design your network as a fully connected (dense) network with 1 input layer and 1 output layer. The input layer consists of 2 nodes and the output layer consists of 1 node. Then feed the network the inputs x^2 and x. The output will be computed as:
y = w * X + c
where w is a matrix of learnable parameters. Specifically, it has shape 1x2 since it contains parameters a and b. c is a bias. The input matrix X has shape 2xN, where N is the number of points in your dataset and for each point, the first component is x^2 and the second component is x.
As the loss function, use the standard Mean Squared Error loss. As for the optimizer, a simple Stochastic Gradient Descent should work just fine. At convergence, w approximates [a, b] and the bias c approximates the constant term, so the coefficients of the quadratic can be read directly from the learned parameters.
I don't know Keras, but I think it will not be tough to figure out by yourself how to implement this simple network.
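The recipe above can be sketched without any deep learning framework. The following is a minimal NumPy illustration, not the answerer's code: the function name `fit_quadratic`, the learning rate, the epoch count and the synthetic data are all assumptions made for the example, and it uses full-batch gradient descent rather than the stochastic variant mentioned above.

```python
import numpy as np

def fit_quadratic(x, y, lr=0.1, epochs=2000):
    """Fit y ~ a*x^2 + b*x + c by gradient descent on the MSE loss."""
    X = np.stack([x**2, x])        # shape (2, N): first row x^2, second row x
    w = np.zeros((1, 2))           # learnable weights, will approach [a, b]
    c = 0.0                        # learnable bias, will approach c
    n = x.shape[0]
    for _ in range(epochs):
        pred = w @ X + c           # model output y = w * X + c, shape (1, N)
        err = pred - y             # residuals, shape (1, N)
        grad_w = (2.0 / n) * err @ X.T   # gradient of the MSE w.r.t. w
        grad_c = (2.0 / n) * err.sum()   # gradient of the MSE w.r.t. c
        w -= lr * grad_w
        c -= lr * grad_c
    a, b = w[0]
    return float(a), float(b), float(c)

# Synthetic, noise-free data (assumed for illustration): a=2, b=-3, c=1
x = np.linspace(-1.0, 1.0, 50)
y = 2.0 * x**2 - 3.0 * x + 1.0
a, b, c = fit_quadratic(x, y)
print(a, b, c)
```

In Keras the same model would presumably be a single `Dense(1)` layer fed with the two features, compiled with an MSE loss. The layer's `get_weights()` then returns the kernel (holding a and b) and the bias (holding c).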

Related

Binary classification of pairs with opposite labels

I have a data-set without labels, but I do have a way to get pairs of examples with opposite labels, that is given a pair x,z I know that their true labels are either 0,1 or 1,0.
So, I am building a model that accepts pairs of samples as input, and learns to classify them with opposite labels. Assuming I have an arbitrary model for predicting a single sample, y_hat = f(x), I am building a model with Keras that accepts pairs of samples (x,z) and outputs pairs of predictions, f(x), f(z). I then use a custom loss function that drives the model towards the correct direction: Given that a regular binary classifier is trained using the Binary Cross Entropy (BCE) to make the predicted and desired output "close", I use the negative BCE. Also, since BCE is not symmetric, I symmetrize it. So, the loss function I give the model.compile method is:
from tensorflow import keras

bce = keras.losses.BinaryCrossentropy()

def neg_sym_bce(y1, y2):
    return -0.5 * (bce(y1, y2) + bce(y2, y1))
My problem is, this model fails to learn to classify even a single pair of my data (I get f(x)~=f(z)~=0.5), and if I try to train it with synthetic "easy" data, it takes hundreds of epochs to converge (also on a single pair).
This made me suspect that it has to do with a "vanishing gradient" problem. Indeed, when I plot (see below) the loss for a single pair, which is a function of 2 variables (the 2 outputs), it is evident that there is a wide plateau around the point (0.5, 0.5). It is also evident that the global minima are, as expected, around the points (0, 1) and (1, 0).
So, is there a way to deal with the vanishing gradient here? I read about the problem but the references I found deal with vanishing gradient in the network, not in the loss itself.
Or, is there another loss that can drive the model to predict opposite labels?
I think that if your labels are always either 0,1 or 1,0, you can just use categorical_crossentropy for the loss.
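The plateau described in the question can be checked numerically. The sketch below re-implements the negative symmetrized BCE for a single pair in plain Python (it does not call Keras, and it omits the clipping and averaging that `keras.losses.BinaryCrossentropy` applies) and estimates the gradient by finite differences. The helper name `grad` and the step size `eps` are assumptions for the example. Working the derivative out by hand gives dL/dp1 = 0.5 * [log(p2 / (1 - p2)) + p2/p1 - (1 - p2)/(1 - p1)], which is exactly 0 at (0.5, 0.5), so a model whose two outputs both start near 0.5 receives almost no training signal.

```python
import math

def bce(target, pred):
    """Binary cross-entropy of one prediction against one (soft) target."""
    return -(target * math.log(pred) + (1.0 - target) * math.log(1.0 - pred))

def neg_sym_bce(p1, p2):
    """Negative symmetrized BCE between the two outputs of a pair."""
    return -0.5 * (bce(p1, p2) + bce(p2, p1))

def grad(p1, p2, eps=1e-6):
    """Central finite-difference gradient of neg_sym_bce w.r.t. (p1, p2)."""
    d1 = (neg_sym_bce(p1 + eps, p2) - neg_sym_bce(p1 - eps, p2)) / (2 * eps)
    d2 = (neg_sym_bce(p1, p2 + eps) - neg_sym_bce(p1, p2 - eps)) / (2 * eps)
    return d1, d2

g_center = grad(0.5, 0.5)   # stationary point: both components vanish
g_off = grad(0.6, 0.4)      # away from the center the outputs are pushed apart
print(g_center, g_off)
```

At (0.6, 0.4) the first component is negative and the second is positive, so a gradient descent step increases the first output and decreases the second, which is the intended direction.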

TensorFlow / PyTorch: Gradient for loss which is measured externally

I am relatively new to Machine Learning and Python.
I have a system, which consists of a NN whose output is fed into an unknown nonlinear function F, e.g. some hardware. The idea is to train the NN to be an inverse F^(-1) of that unknown nonlinear function F. This means that a loss L is calculated at the output of F. However, backpropagation cannot be used in a straightforward manner for calculating the gradients and updating the NN weights because the gradient of F is not known either.
Is there any way to use a loss function L that is not directly connected to the NN for calculating the gradients in TensorFlow or PyTorch? Or to take a loss that was obtained with other software (Matlab, C, etc.) and use it for backpropagation?
As far as I know, keras.backend.gradients only allows calculating gradients with respect to connected weights; otherwise the gradient is either zero or None.
I read about the stop_gradient() function in TensorFlow, but I am not sure whether this is what I am looking for. It allows you to exclude some variables from the gradient computation during backpropagation. But I think the operation F is not interpreted as a variable anyway.
Can I define any arbitrary loss function (including a hardware measurement) and use it for backpropagation in TensorFlow or is it required to be connected to the graph as well?
Please, let me know if my question is not specific enough.
AFAIK, all modern deep learning packages (PyTorch, TensorFlow, Keras, etc.) rely on gradient descent (and its many variants) to train networks.
As the name suggests, you cannot do gradient descent without gradients.
However, you might circumvent the "non differentiability" of your "given" function F by looking at the problem from a slightly different perspective:
You are trying to learn a model M that "counters" the effect of F. So you have access to F (but not its gradients) and a set of representative inputs X={x_0, x_1, ... x_n}.
For each example x_i you can compute y_i = F(x_i) and your end goal is to have a model M that given y_i will output x_i.
Therefore, you can treat y_i as your model's input and compute a loss between M(y_i) and x_i that produced it. This way you do not need to compute gradients through the "black box" F.
Pseudocode would look something like:
for x in examples:
    y = F(x)             # apply F to x, keeping only its output, WITHOUT any gradients
    pred = M(y)          # apply the trainable model M to the output of F
    loss = ||x - pred||  # gradients propagate through M and stop at F
    loss.backward()
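To make the idea concrete, here is a minimal runnable sketch in NumPy rather than PyTorch. Everything in it is an assumption chosen for illustration: the black box `F` is a hypothetical linear stand-in (so the learned inverse can be checked exactly), and the model M is a single weight and bias trained by plain gradient descent. A real nonlinear F would need a more expressive M, but the training loop has the same shape: F is only ever called, never differentiated.

```python
import numpy as np

def F(x):
    """Hypothetical black box: we may call it, but never differentiate it."""
    return 3.0 * x + 2.0

x = np.linspace(-1.0, 1.0, 100)   # representative inputs x_i
y = F(x)                          # measured outputs y_i = F(x_i)

# Trainable model M(y) = w * y + b, fitted so that M(F(x)) is close to x.
w, b = 0.0, 0.0
lr = 0.05
for _ in range(2000):
    pred = w * y + b
    err = pred - x                      # loss = mean(err ** 2)
    w -= lr * 2.0 * np.mean(err * y)    # dLoss/dw uses y only as data
    b -= lr * 2.0 * np.mean(err)        # dLoss/db

recovered = w * F(0.25) + b             # should be close to 0.25
print(w, b, recovered)
```

For this stand-in the exact inverse is M(y) = (y - 2) / 3, so w should approach 1/3 and b should approach -2/3.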

Unsupervised Neural Network to Maximize a Function?

Suppose I have vectors of dimension 1 x N, {X_1 ... X_n} and {X_1' ... X_n'}, where each X_i and X_i' are related but the relation cannot be modeled by a function. I want to train a neural network by feeding it X_i and having it output Y_i with dimension N x 1, such that norm((X_i')(Y_i)) is maximized. The constraint is that Y_i has a norm of 1 (otherwise the network would just use numbers as large as possible in Y_i).
I do not use X_i' as the inputs because they are not available in real life. I hope that when I test the neural network by feeding it {X_n+1 ... X_k}, it will output {Y_n+1 ... Y_k} where norm((X_n+1')(Y_n+1)) are maximized. Again, note that I only have {X_n+1'...X_k'} when testing, but not in real life where the neural network will be used.
I tried defining custom tensorflow or keras loss functions, but they don't seem to work. Also I tried using a neural network to first predict X_i' from X_i, but the performance is not very good.
One difficulty is defining a loss function that has no labels and making the neural network do backprop using this loss function. Any ideas on how this may be achieved?

Computing gradients of keras output with respect to the network inputs?

If I have a network in Keras with some input variables, say x, y, z, how would I calculate the gradient of the outputs with respect to each of these inputs? I have been looking around and can't find a clear answer, and I haven't managed to work it out myself after experimenting with tf.gradients for a while.
I have seen the question "Keras with TF backend: get gradient of outputs with respect to inputs", but it is not clear to me and I don't understand how to implement it. Any help and a simple example would be great, thanks.
EDIT:
Here is a concrete example of what I am looking for.
Consider for example the function f(x,y,z) = x^2 + y^2 + z^2.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

def function(x, y, z):
    return (x**2) + (y**2) + (z**2)

network = Sequential()
network.add(Dense(128, input_shape=(3,), activation='relu'))  # input_shape must be a tuple
network.add(Dense(128, activation='relu'))
network.add(Dense(1, activation='relu'))
If I trained the neural network on random examples of x, y, z and function values f(x,y,z) so that it approximated f, I would then like to use the network to return the gradient of the function with respect to each of the inputs individually. The gradient vector for this example is
∇f(x,y,z) = (2x, 2y, 2z). So once the network is trained, for a given input vector (x,y,z) I would like to approximate not only the function value but also its derivatives with respect to the inputs. For example, if I provided the input vector (1,2,3) to the trained network, I would like to get not only the approximation of f(1,2,3) = 1^2 + 2^2 + 3^2 = 14, but also the approximation of the partial derivative of f with respect to each input separately: 2 for x = 1, and likewise 4 and 6 for the other two inputs. This is just a simple example of what I would like to do.

How to create a custom connected neural network using tensorflow?

I want to create a network that has specific fixed connections between layers.
For example:
[Figure: sparsely connected neural network]
I tried looking into functions in TensorFlow, but I only found dense networks with regularizers, which don't work the way I want.
If it's not possible in tensorflow, then please suggest some other library that can be used. Thanks!
You can always find a workaround. Let's say a layer computes y = xW (Wx works as well) but you want some of the entries in W to always be zero. You can do it column-wise:
For column i of W (or element i of the output, since y is a vector), compute y_i = x * D_i * W_i, where W_i is column i of W. The matrix D_i is a constant diagonal matrix (tf.constant, tf.diag; tf.linalg.diag in TensorFlow 2.x) whose diagonal entries are 1 for the connections to keep and 0 for the ones to remove.
Then you can use tf.concat to combine all the y_i into the output Y.
You can abstract this into a function whose signature may look like def sparse_layer(input_layer, gates_matrix, activation_f, ...) which returns the output layer.
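The column-wise construction can be sketched in NumPy, which stands in for TensorFlow here. The function below borrows the name `sparse_layer` from the suggested signature, but its arguments (`x`, `W`, `gates`) and the example values are assumptions for illustration, and the activation is left out.

```python
import numpy as np

def sparse_layer(x, W, gates):
    """Compute y = x W with fixed zero connections, one output at a time.

    x:     input vector, shape (n_in,)
    W:     weight matrix, shape (n_in, n_out)
    gates: 0/1 matrix, shape (n_in, n_out); gates[j, i] == 0 removes the
           connection from input j to output i.
    """
    outputs = []
    for i in range(W.shape[1]):
        D_i = np.diag(gates[:, i])     # constant diagonal gate for output i
        y_i = x @ D_i @ W[:, i]        # only the gated-in inputs contribute
        outputs.append(y_i)
    return np.array(outputs)           # plays the role of tf.concat

x = np.array([1.0, 2.0, 3.0])
W = np.array([[1.0, 4.0],
              [2.0, 5.0],
              [3.0, 6.0]])
gates = np.array([[1, 0],
                  [0, 1],
                  [1, 1]])
y = sparse_layer(x, W, gates)
print(y)  # [10. 28.]
```

The loop gives the same result as multiplying the weights elementwise by the gate matrix, `x @ (W * gates)`, which is usually the simpler way to express a fixed connectivity pattern.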