Function inverse tensorflow - tensorflow

Is there a way to find the inverse of the neural network representation of a function in TensorFlow v1? I need this to find the optimal function in an optimization problem that I am solving.
To be precise, the optimal function is found by minimizing an error computed as the L2 norm of the difference between the approximated optimal function C* (coded as a neural network object) and the inverse of a value function V* (coded as another neural network object).
My problem is that I do not know how to write inverse of V* in tensorflow, as I cannot find something like tf.inverse().
Any help is much appreciated. Thanks.

Unless I am misunderstanding the situation, I believe that it is impossible to do this in a generalized way. Many functions do not have a perfect inverse. For a simple example, imagine a square(x) function that computes x². You might think that the inverse is sqrt(y), but in reality the "correct" result could be either sqrt(y) or -sqrt(y), with no way of telling which is correct.
Similarly, with most neural networks I imagine it would be impossible to find the "true" mathematical inverse. There are architectures that attempt to train a neural net and its inverse simultaneously (autoencoders and BiGAN/ALI come to mind), and for some nets it might be possible to train an inverse empirically, but these can have extremely varying levels of accuracy that depend heavily on many factors.
Depending on how much control you have over V*, you might be able to design it in such a way that it is mathematically invertible (and then you would have to manually code the inverse), or you might be able to make it a simpler model that is not based on a neural net. However, if V* is an arbitrary preexisting net, then you're probably out of luck.
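To illustrate what "design it to be mathematically invertible and code the inverse by hand" could look like, here is a minimal NumPy sketch (not TensorFlow, and not the asker's V*). It assumes every layer has a square weight matrix and a leaky ReLU activation, which is strictly monotonic and therefore invertible. The class `InvertibleLayer` and the helpers `forward_all` / `inverse_all` are hypothetical names introduced for this example.

```python
import numpy as np


class InvertibleLayer:
    """y = leaky_relu(W @ x + b) with a square weight matrix W (assumed invertible)."""

    def __init__(self, dim, slope=0.1, seed=0):
        rng = np.random.default_rng(seed)
        # Small random perturbation of the identity: very likely invertible.
        self.W = 0.1 * rng.normal(size=(dim, dim)) + np.eye(dim)
        self.b = rng.normal(size=dim)
        self.slope = slope

    def forward(self, x):
        z = self.W @ x + self.b
        return np.where(z >= 0, z, self.slope * z)

    def inverse(self, y):
        # Leaky ReLU preserves sign, so the sign of y tells us which branch was used.
        z = np.where(y >= 0, y, y / self.slope)
        return np.linalg.solve(self.W, z - self.b)


def forward_all(layers, x):
    for layer in layers:
        x = layer.forward(x)
    return x


def inverse_all(layers, y):
    # Undo the layers in reverse order.
    for layer in reversed(layers):
        y = layer.inverse(y)
    return y


layers = [InvertibleLayer(3, seed=s) for s in range(3)]
x = np.array([0.5, -1.2, 2.0])
y = forward_all(layers, x)
x_recovered = inverse_all(layers, y)
print(np.allclose(x, x_recovered))
```

The price of this design is that input and output dimensions must match in every layer, which is exactly the kind of restriction an arbitrary pre-existing net will not satisfy.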
Further reading:
SO: local inverse of a neural network
AI.SE: Can we get the inverse of the function that a neural network represents?

Related

Optimization of data-driven function as Tensorflow model

I try to find the optimum of a data-driven function represented as a Tensorflow model.
That is, I trained a model to approximate a function and now want to find the optimum of this approximated function using an algorithm and a software package or Python library such as ipopt, ipyopt, casadi, and so on. Or is it possible to do this directly in TensorFlow? I also have to define constraints, so I can't just use plain autodiff to run gradient descent on my input.
Is there any idea how to realize this in an efficient way?
Maybe this image visualizes my problem to better understand what I'm looking for.
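One possible route, sketched under assumptions: wrap the trained model as a plain Python function and hand it to a general-purpose constrained solver. The example below uses `scipy.optimize.minimize` with the SLSQP method instead of the packages named in the question. The functions `surrogate` and `surrogate_grad` are hypothetical stand-ins for the trained model and its gradient (which, with a real model, could come from autodiff), and the constraint and bounds are made up for illustration.

```python
import numpy as np
from scipy.optimize import minimize


def surrogate(x):
    """Hypothetical stand-in for the trained model's scalar prediction."""
    return (x[0] - 1.0) ** 2 + (x[1] - 2.0) ** 2


def surrogate_grad(x):
    """Gradient of the stand-in; a real model would supply this via autodiff."""
    return np.array([2.0 * (x[0] - 1.0), 2.0 * (x[1] - 2.0)])


# SLSQP expects inequality constraints in the form fun(x) >= 0.
# Here: x0 + x1 <= 2, rewritten as 2 - x0 - x1 >= 0.
constraints = [{"type": "ineq", "fun": lambda x: 2.0 - x[0] - x[1]}]
bounds = [(0.0, 5.0), (0.0, 5.0)]

result = minimize(
    surrogate,
    x0=np.array([0.0, 0.0]),
    jac=surrogate_grad,
    method="SLSQP",
    bounds=bounds,
    constraints=constraints,
)
print(result.x, result.fun)
```

For a real network, each call to `surrogate` would run a forward pass, so a solver that needs few function evaluations and accepts an exact gradient is usually the more efficient choice.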

APIs of make inferences in GPflow

I have built some Gaussian process models in GPflow and trained them successfully, but I cannot find APIs that help me make inferences straightforwardly in GPflow, such as separating the contributions of different kernels in a GPR model.
I know that I can do it manually, like calculating the covariance matrices, inverse and multiply, but such work can be quite annoying as the model gets more complex, like a multi-output SVGP model. Any suggestions?
Thanks in advance!
If you want to, e.g., decompose an additive kernel, I think the easiest way for vanilla GPR would be to just switch out the kernel for the part you're interested in, while still keeping the learned hyperparameters.
I'm not totally sure about it, but I think it could also work for SVGP, since the approximation itself is just a standard GP using the same kernel but conditioned on the inducing points.
However, I'm not sure whether the decomposition of the variational approximation can be assumed to be close to the decomposition of the true posterior.
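For vanilla GPR, the additive decomposition of the posterior mean can be written out in a few lines of NumPy. This is only a sketch and deliberately does not use the GPflow API. It assumes the model is a sum of two independent GPs, in which case the posterior mean of one component uses that component's cross-covariance while the training covariance still contains the full kernel plus noise. The kernel functions `rbf` and `linear` and all hyperparameter values are hypothetical, standing in for learned ones.

```python
import numpy as np


def rbf(a, b, lengthscale, variance):
    """Squared-exponential kernel on 1-D inputs."""
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)


def linear(a, b, variance):
    """Linear kernel on 1-D inputs."""
    return variance * a[:, None] * b[None, :]


rng = np.random.default_rng(0)
X = np.linspace(-2.0, 2.0, 20)
y = 0.5 * X + np.sin(3.0 * X) + 0.1 * rng.normal(size=X.shape)
X_new = np.linspace(-2.0, 2.0, 7)

# Hypothetical hyperparameters, as if they had been learned already.
noise_var = 0.01
K_train = rbf(X, X, 0.5, 1.0) + linear(X, X, 1.0) + noise_var * np.eye(len(X))

# alpha = (K + noise * I)^{-1} y is shared by all components.
alpha = np.linalg.solve(K_train, y)

mean_rbf = rbf(X_new, X, 0.5, 1.0) @ alpha      # contribution of the RBF part
mean_linear = linear(X_new, X, 1.0) @ alpha     # contribution of the linear part
mean_full = (rbf(X_new, X, 0.5, 1.0) + linear(X_new, X, 1.0)) @ alpha

print(np.allclose(mean_rbf + mean_linear, mean_full))
```

Note that this is slightly different from replacing the kernel wholesale: only the cross-covariance between new and training points is restricted to one component.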

Give an example visual recognition task where a fully connected network would be more suitable than a convolutional neural network

I know CNNs have a lot of good features, like weight sharing, memory savings and feature extraction. However, this question confuses me. Is there any possible situation where a fully connected network is better than a CNN? Why?
Thanks a lot guys!
Is there any possible situation that fully connected network better than CNN?
Well, I think we should first define what we mean by "better". Accuracy and precision are not the only things to consider: computational time, degrees of freedom and difficulty of the optimization should also be taken into account.
First, consider an input of size h*w*c. Feeding this input to a convolutional layer with F feature maps and kernel size s results in about F*s*s*c learnable parameters (assuming there are no constraints on the ranks of the convolutions; otherwise we have even fewer parameters). Feeding the same input into a fully connected layer with the same number of feature maps results in F*d_1*d_2*w*h*c parameters (where d_1, d_2 are the dimensions of each feature map), which is clearly on the order of billions of learnable parameters for any input image of decent resolution.
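To put numbers on the two formulas, here is a small arithmetic sketch. The concrete sizes (a 224x224 RGB input, 64 feature maps, 3x3 kernels, feature maps the same size as the input) are assumptions chosen for illustration, and biases are ignored as in the formulas above.

```python
def conv_params(F, s, c):
    """Weights of a conv layer: F feature maps, s x s kernels, c input channels."""
    return F * s * s * c


def dense_params(F, d1, d2, h, w, c):
    """Weights of a dense layer mapping an h*w*c input to F maps of size d1 x d2."""
    return F * d1 * d2 * h * w * c


h, w, c = 224, 224, 3   # assumed input size
F, s = 64, 3            # assumed number of feature maps and kernel size

n_conv = conv_params(F, s, c)
n_dense = dense_params(F, h, w, h, w, c)  # feature maps as large as the input

print(f"conv:  {n_conv:,} weights")
print(f"dense: {n_dense:,} weights")
print(f"ratio: {n_dense // n_conv:,}x")
```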
While it can be tempting to think that we can get away with shallower networks (we already have lots of parameters, right?), fully connected layers are just linear layers after all, so we still need to insert many non-linearities for the network to gain reasonable representational power. This means that you will still need a deep network, but with so many parameters that it would be intractable. In addition, a larger network has more degrees of freedom and will therefore model much more than what we want: it will model noise unless we feed it enough data or constrain it.
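The point that stacked fully connected layers without non-linearities gain nothing can be checked directly: two affine layers collapse into a single one. A minimal NumPy sketch with arbitrary, made-up layer sizes:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=8)

# Two fully connected layers with no activation in between.
W1, b1 = rng.normal(size=(16, 8)), rng.normal(size=16)
W2, b2 = rng.normal(size=(4, 16)), rng.normal(size=4)
two_layers = W2 @ (W1 @ x + b1) + b2

# The same map written as a single affine layer.
W_merged = W2 @ W1
b_merged = W2 @ b1 + b2
one_layer = W_merged @ x + b_merged

print(np.allclose(two_layers, one_layer))

# With a ReLU in between, the composition is in general no longer one affine map.
with_relu = W2 @ np.maximum(W1 @ x + b1, 0.0) + b2
```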
So yes, there might be a fully connected network that in theory could give us better performance, but we don't know how to train it yet. Finally, and this is purely based on intuition and therefore might be wrong, but it seems unlikely to me that such a fully connected network would converge to a dense solution. Since many convolutional networks achieve very high levels of accuracy (99% and up) on many tasks, I think that the optimal solution the fully connected network would converge to would be close to the convolutional network. So, we don't really need to train the fully connected one, but just a subset of its architecture.
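The idea that a convolutional layer is "a subset" of a fully connected one can be made concrete in one dimension: a convolution is a dense layer whose weight matrix is mostly zeros, with the same few weights repeated along each row. The sketch below uses made-up values, and `conv_as_dense_matrix` is a hypothetical helper name. The comparison uses cross-correlation (`np.correlate`), which is what deep learning libraries usually call convolution.

```python
import numpy as np


def conv_as_dense_matrix(kernel, input_len):
    """Build the dense weight matrix equivalent to a 'valid' 1-D convolution."""
    k = len(kernel)
    out_len = input_len - k + 1
    W = np.zeros((out_len, input_len))
    for i in range(out_len):
        # Each output unit sees only k neighbouring inputs, with shared weights.
        W[i, i:i + k] = kernel
    return W


x = np.array([1.0, 2.0, 0.0, -1.0, 3.0, 0.5])
kernel = np.array([0.2, -0.5, 1.0])

W = conv_as_dense_matrix(kernel, len(x))
dense_out = W @ x
conv_out = np.correlate(x, kernel, mode="valid")

print(np.allclose(dense_out, conv_out))
print(f"{np.count_nonzero(W)} of {W.size} dense weights are non-zero, "
      f"and only {len(kernel)} of them are distinct")
```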

Standard parameter representation in neural networks

In neural network forward propagation I have often seen example vectors multiplied from the left (vector-matrix) and sometimes from the right (matrix-vector). Notation, some TensorFlow tutorials and the datasets I have found seem to prefer the former over the latter, contrary to the way linear algebra tends to be taught (the matrix-vector way).
Moreover, they represent inverted ways of representing parameters: enumerate problem variables in dimension 0 or enumerate neurons in dimension 0.
This confuses me and makes me wonder whether there is really a standard here or whether it has only been coincidence. If there is one, I would like to know whether it follows from some deeper reason. I would feel much better having this question answered.
(By the way, I know that you will normally use example matrices instead of vectors [or more complex things in conv nets, etc..] because the use of minibatches, but the point still holds.)
Not sure if this answer is what you are looking for, but in the context of Tensorflow, the standard is to use a dense layer (https://www.tensorflow.org/api_docs/python/tf/layers/dense) which is a higher level abstraction that wraps up the affine transformation logic you are referring to.
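As far as I know, the dense layer stores its kernel with shape (inputs, units) and computes inputs · kernel + bias, i.e. the "vector from the left" convention with one example per row. The two conventions are just transposes of each other, which this NumPy sketch (not TensorFlow code, with arbitrary made-up sizes) shows:

```python
import numpy as np

rng = np.random.default_rng(0)
batch, n_in, n_out = 5, 3, 2

X = rng.normal(size=(batch, n_in))        # one example per row
kernel = rng.normal(size=(n_in, n_out))   # problem variables along dimension 0
bias = rng.normal(size=n_out)

# Row convention (examples multiplied from the left): result is (batch, n_out).
row_convention = X @ kernel + bias

# Column convention (textbook matrix-vector): neurons along dimension 0,
# one example per column, result is (n_out, batch).
W = kernel.T
col_convention = W @ X.T + bias[:, None]

print(np.allclose(row_convention, col_convention.T))
```

One practical reason for the row convention is that datasets are usually stored with one example per row, so a minibatch can be multiplied without transposing.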

Neural network gives different output for same input

What are the potential reasons for a NN to output different values for the same input? Especially when there isn't any random or stochastic processes?
This is a very broad and general question, maybe even too broad to be on here, but there are several things you should know about neural networks:
They are NOT methods for finding one perfect optimal solution. A neural network usually learns from the examples it is given and "figures out" a way to predict results reasonably well. Reasonable is relative: for some models it may mean 50% success, and for others anything short of 99.9% will be considered failure.
Their outcome is very dependent on the data they were trained on. The order of data matters, and it's usually a good idea to shuffle data during training, but that can lead to wildly different results. Also, the quality of data matters - if the training data is very different in nature from the test data, for example.
The best analogy of neural networks in computing is of course - the brain. Even with the same information and same basic underlying biology, we could all evolve different opinions on matters based on endless other variables. Same thing with computer learning to some extent.
Some types of neural networks use dropout layers, that are specifically designed to shut off random parts of the network during training. This should not affect the final prediction process, because for predictions that layer is usually set to allow all the parts of the network to operate, but if you are inputting data and telling the model it is "training" instead of asking it to predict, the results may vary significantly.
The sum of all this is just to say: The training of neural networks should be expected to yield different results from similar starting conditions, and so must be tested multiple times for every condition to determine what parts of it are inevitable and what parts are not.
It might be due to shuffling of the data. If you want to use the same vector, you should turn the shuffle argument off.
You should try disabling dropout. Dropout randomly sets the outputs of certain neurons to 0. This will mean that your output will be different each time.
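A minimal NumPy sketch of why dropout matters here. This is not any framework's implementation. It assumes the common "inverted dropout" variant, where kept activations are rescaled during training and the layer is an identity at prediction time. The function name `dropout` and its `training` flag are introduced for this example.

```python
import numpy as np


def dropout(x, rate, training, rng):
    """Inverted dropout: random mask while training, identity when predicting."""
    if not training:
        return x
    keep = rng.random(x.shape) >= rate
    # Rescale so the expected activation matches the prediction-time value.
    return x * keep / (1.0 - rate)


rng = np.random.default_rng(0)
x = np.ones(1000)

# "Training" mode: the same input gives different outputs on every call.
train_a = dropout(x, 0.5, True, rng)
train_b = dropout(x, 0.5, True, rng)
print(np.array_equal(train_a, train_b))

# Prediction mode: the same input always gives the same output.
infer_a = dropout(x, 0.5, False, rng)
infer_b = dropout(x, 0.5, False, rng)
print(np.array_equal(infer_a, infer_b))
```

So if a trained network returns different values for the same input, one of the first things to check is whether it is being run in training mode.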