Can Torch optim package support multiple inputs - optimization

I'm trying to use the Adam implementation in the torch7 optim package to optimize a neural network that takes two independent inputs. Can this be done? The code seems to support only a single input vector. Is there another implementation that can take a generic table of inputs?
The reference usage I saw, upon which I based my code is here

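In principle, optim can handle it as long as you flatten all your parameters into one tensor and all your gradients into another. The optimizer only sees that parameter vector and the closure you pass it, so the number of network inputs does not matter. See the getParameters() function.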

How to integrate a pytorch model into a dynamic optimization, for example in Pyomo or gekko

Let's say I have a PyTorch model describing the evolution of some multidimensional system based on its own state x and an external actuator u. So x_(t+1) = f(x_t, u_t), with f being the artificial neural network from PyTorch.
Now I want to solve a dynamic optimization problem to find an optimal sequence of u-values that minimizes an objective depending on x. Something like this:
min sum over all timesteps phi(x_t)
s.t.: x_(t+1) = f(x_t, u_t)
Additionally I also have some upper and lower bounds on some of the variables in x.
Is there an easy way to do this using a dynamic optimization toolbox like Pyomo or GEKKO?
I already wrote some code that transforms a feedforward neural network into a NumPy function, which can then be passed as a constraint to Pyomo. The problem with this approach is that it requires significant reprogramming effort every time the structure of the neural network changes, so quick testing becomes difficult. Integrating recurrent neural networks is also difficult, because the hidden cell states would have to be added as additional variables to the optimization problem.
I think a good solution could be to do the function evaluations and gradient calculations in torch and somehow pass the results to the dynamic optimizer. I'm just not sure how to do this.
Thanks a lot for your help!
TensorFlow or PyTorch models can't be directly integrated into GEKKO at the moment. However, I believe you can retrieve the derivatives from TensorFlow or PyTorch, which would allow you to pass them to GEKKO.
There is a GEKKO Brain module with examples at the links below. You can also find an example that uses a GEKKO feedforward neural network for dynamic optimization.
GEKKO Brain Feedforward neural network examples
MIMO MPC example with GEKKO neural network model
A recurrent neural network library for the GEKKO Brain module is currently being developed; it will make all of GEKKO's dynamic optimization functions easy to use with such models.
In the meantime, you can use a sequential method: wrap the TensorFlow or PyTorch model in an available optimization solver such as the scipy optimization module.
Check out the link below for a dynamic optimization example with a Keras LSTM model and scipy optimize.
Keras LSTM MPC
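To illustrate the sequential method for the PyTorch case, here is a minimal sketch. It is not taken from the linked example. The network f is an untrained stand-in, and the stage cost phi, the target state, the dimensions and the bounds on u are all hypothetical. The model is rolled forward over the horizon in torch, autograd provides the gradient of the summed cost with respect to the u sequence, and both are handed to scipy.optimize.minimize.

```python
import numpy as np
import torch
from scipy.optimize import minimize

torch.manual_seed(0)

N_X, N_U, HORIZON = 2, 1, 10  # hypothetical dimensions

# Stand-in for the trained network f(x_t, u_t) -> x_{t+1}; untrained here.
f = torch.nn.Sequential(
    torch.nn.Linear(N_X + N_U, 16),
    torch.nn.Tanh(),
    torch.nn.Linear(16, N_X),
).double()
for p in f.parameters():
    p.requires_grad_(False)  # only u is optimized, not the weights

x0 = torch.zeros(N_X, dtype=torch.float64)
x_target = torch.ones(N_X, dtype=torch.float64)


def phi(x):
    """Hypothetical stage cost: squared distance to a target state."""
    return ((x - x_target) ** 2).sum()


def objective(u_flat):
    """Roll the model forward; return (cost, d cost / d u) for SciPy."""
    u = torch.tensor(u_flat, dtype=torch.float64).reshape(HORIZON, N_U)
    u.requires_grad_(True)
    x = x0
    cost = torch.zeros((), dtype=torch.float64)
    for t in range(HORIZON):
        x = f(torch.cat([x, u[t]]))  # x_{t+1} = f(x_t, u_t)
        cost = cost + phi(x)
    cost.backward()
    return cost.item(), u.grad.numpy().ravel().copy()


u_init = np.zeros(HORIZON * N_U)
result = minimize(
    objective,
    u_init,
    jac=True,                            # objective returns (value, gradient)
    method="L-BFGS-B",
    bounds=[(-1.0, 1.0)] * u_init.size,  # box bounds on u only
)
u_opt = result.x.reshape(HORIZON, N_U)
```

Because the states are eliminated by the rollout, this handles bounds on u only. Bounds on x would need something extra, for example a penalty term in phi or a solver that accepts nonlinear constraints (scipy's SLSQP or trust-constr). Changing the network structure only changes f, not the optimization code.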

Trouble with implementing local response normalization in TensorFlow

I'm trying to implement a local response normalization layer in TensorFlow to be used in a Keras model:
Here is an image of the operation I am trying to implement:
Here is the Paper link, please refer to section 3.3 to see the description of this layer
I have a working NumPy implementation, but it uses for loops and Python's built-in min and max functions to compute the summation. These Python-level operations cause errors when defining a custom Keras layer, so I can't use this implementation.
The issue is that I need to iterate over all the elements in the feature map and generate a normalized value for each of them. Additionally, the upper and lower bounds of the summation change depending on which value I am currently normalizing. I can't really think of a way to handle this without nested for loops, but that will not work in a custom Keras layer because it isn't a native TensorFlow operation.
Could anyone point me towards tensorflow/keras backend functions that could help me in implementing this layer?
EDIT: I know that this layer is implemented as a keras layer, but I want to build intuition about custom layers, so I want to implement this layer using tensor ops.
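One way to avoid the per-element loops is sketched below, assuming the operation is the usual cross-channel formula b_i = a_i / (k + alpha * sum_j a_j^2)^beta with j running from max(0, i - n/2) to min(N - 1, i + n/2). The idea is to zero-pad the squared activations along the channel axis, so that the clipped bounds are handled implicitly, and then add up n shifted slices. The sketch uses NumPy so it runs on its own, and it assumes an odd n and a channels-last layout. The default values are commonly quoted ones and should be checked against the paper. Each step has a direct tensor counterpart (tf.square, tf.pad, slicing, tf.pow).

```python
import numpy as np


def local_response_norm(a, n=5, k=2.0, alpha=1e-4, beta=0.75):
    """Normalize across channels; `a` has shape (batch, height, width, channels)."""
    half = n // 2
    channels = a.shape[-1]
    squared = np.square(a)
    # Zero-pad the channel axis so the clipping at the edges
    # (the max(0, ...) / min(N - 1, ...) bounds) happens implicitly.
    padded = np.pad(squared, [(0, 0), (0, 0), (0, 0), (half, half)])
    # Sum the shifted views; this loops over the window size only,
    # not over the elements of the feature map.
    window_sum = sum(padded[..., s:s + channels] for s in range(2 * half + 1))
    return a / np.power(k + alpha * window_sum, beta)


feature_map = np.random.default_rng(0).normal(size=(2, 3, 3, 7))
normalized = local_response_norm(feature_map)
```

The remaining Python loop runs over the window size n, which is a small constant, so in a graph-based framework it just unrolls into n slice-and-add operations.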

Behaviour of Alpha Dropout in Training and Inference time

I am in the process of implementing a self-normalizing neural network using TensorFlow. There are currently TensorFlow "primitives" in the form of tf.nn.selu and tf.contrib.nn.alpha_dropout that should make this an easy process.
My problem is with tf.contrib.nn.alpha_dropout. I was expecting it to have a boolean switch for when you are in training and when you are in inference as does the usual dropout function used with other activation functions.
In the original implementation by the authors, we indeed see that they have this boolean switch (training) in the selu dropout function (dropout_selu).
Is there something I am missing?
tf.contrib.nn.alpha_dropout should be seen as an analogue to tf.nn.dropout. The latter function also does not have an argument for a training switch. It is not to be confused with tf.layers.dropout, which wraps tf.nn.dropout and has a training argument. As we can see in the implementation, the layers version returns either the result of nn.dropout or the identity depending on the training switch. It should be relatively easy to define your own wrapper around alpha_dropout in a similar manner.
To avoid any confusion: layers.dropout eventually calls the "keras layers" version of dropout which is the implementation linked above.
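A minimal sketch of such a wrapper, assuming the training flag is a plain Python boolean and that the wrapped function is called as dropout_fn(x, keep_prob). The name with_training_switch is made up for illustration, and fake_alpha_dropout is only a stand-in so the snippet runs without TensorFlow; in practice you would pass tf.contrib.nn.alpha_dropout instead.

```python
def with_training_switch(dropout_fn):
    """Wrap a dropout-style function so it becomes the identity at inference."""
    def wrapped(x, keep_prob, training=False):
        if not training:
            return x  # inference: pass activations through unchanged
        return dropout_fn(x, keep_prob)
    return wrapped


# Stand-in for tf.contrib.nn.alpha_dropout; it only scales the input
# and is NOT real alpha dropout.
def fake_alpha_dropout(x, keep_prob):
    return [v * keep_prob for v in x]


alpha_dropout = with_training_switch(fake_alpha_dropout)
train_out = alpha_dropout([1.0, 2.0], keep_prob=0.5, training=True)
infer_out = alpha_dropout([1.0, 2.0], keep_prob=0.5, training=False)
```

If the training flag is a tensor or placeholder rather than a Python boolean, the Python if would have to be replaced by a graph-level conditional such as tf.cond.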

Is it possible to train pytorch and tensorflow model together on one GPU?

I have a PyTorch model and a TensorFlow model, and I want to train them together on one GPU, following the process below: input --> pytorch model --> output_pytorch --> tensorflow model --> output_tensorflow --> pytorch model.
Is it possible to do this? If the answer is yes, are there any problems I will encounter?
Thanks in advance.
I haven't done this myself, but it is possible, though implementing it can be a little tricky.
You can consider each network as a function, and in some sense you want to compose these functions to form your overall network. To do this, you compute the final function by feeding the result of one network into the other, and then use the chain rule to compute the derivatives (using the automatic differentiation of both packages).
I think a good way to implement this might be to wrap the TF model as a PyTorch Function and use tf.gradients to compute the backward pass.
Doing the gradient updates can get really hard, because some variables exist in TF's computation graph. You could turn the TF variables into PyTorch Variables, make them placeholders in the TF computation graph, feed them in via feed_dict and update them using PyTorch mechanisms, but I think that would be really hard to do. Instead, if you do the updates inside the backward method of the Function, you might be able to get the job done (it is really ugly, but it might work).
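The wrapping idea can be sketched as follows. To keep the snippet runnable without TensorFlow, the "external" model is a NumPy stand-in (y = x @ W) with a hand-written backward pass. The names external_forward, external_vjp and ExternalModel are hypothetical, and with TensorFlow 1.x the two helper functions are where the session run and tf.gradients would go.

```python
import numpy as np
import torch

# Stand-in for the TensorFlow model: y = x @ W, evaluated outside PyTorch.
W = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])


def external_forward(x_np):
    return x_np @ W


def external_vjp(x_np, grad_y_np):
    """Gradient w.r.t. the input, given the gradient w.r.t. the output."""
    return grad_y_np @ W.T


class ExternalModel(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        x_np = x.detach().cpu().numpy()
        ctx.x_np = x_np  # keep the input for the backward pass
        return torch.from_numpy(external_forward(x_np)).to(x.device)

    @staticmethod
    def backward(ctx, grad_output):
        grad_x = external_vjp(ctx.x_np, grad_output.detach().cpu().numpy())
        return torch.from_numpy(grad_x).to(grad_output.device)


x = torch.ones(4, 3, dtype=torch.float64, requires_grad=True)
y = ExternalModel.apply(x)   # PyTorch -> external model -> PyTorch
loss = (y ** 2).sum()
loss.backward()              # autograd calls ExternalModel.backward
```

This only propagates gradients through the external model to its input. Updating the external model's own weights is the hard part described above and is not covered by the sketch. Note also that every call copies data between the frameworks through host memory.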

Why does TensorFlow have a lot of mathematical equations re-implemented?

I was looking through the TensorFlow API and noticed that a lot of mathematical operations that already exist in Python and NumPy have been re-implemented (or at least given a TensorFlow interface). For example:
Is there a good reason to do this?
I've been searching their site but can't find why they'd do this.
I do have some guesses, though. One of my main guesses is that they want those operations to take part in backpropagation through whatever neural network graph gets implemented. In other words, they want to have their derivatives implemented. Is this one of the reasons? (I wish I knew how to even check whether my guess is right.)
For example, in one of the most basic examples of linear regression, one defines the prediction function that one wants to implement:
product = tf.matmul(x,W)
y = product + b
instead of
product = tf.matmul(x,W)
y = tf.add(product, b)
Somehow the first implementation does not interfere with the stochastic gradient descent algorithm used for training, so it probably doesn't matter whether one uses the plain Python/numpy + or tf.add to train? This is one aspect that confuses me: how do I know which one I should be using?
Or maybe there are performance reasons? Or maybe it's to give those operations access to GPUs when required?
You have to understand that these operations build a TensorFlow graph, which means they aren't the same as the NumPy functions; they are more of an abstraction of them.
Maybe you have noticed that you have to create a session and then evaluate the functions through that session to get a result, whereas NumPy functions are executed directly. This is because the graph and its functions only define what to do, like writing down a formula; to get a result for a specific x (or whatever) you have to insert a value for x. This is what you're doing through the session and eval.
So, to conclude: with TensorFlow you define a graph, which is a more abstract representation of the functions. The graph is not executed when it is defined; it is executed when you call eval and thereby run the session.
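The define-then-run idea can be illustrated with a toy sketch in plain Python. This is not how TensorFlow is implemented: Node, placeholder, add and mul are made-up names. It shows that building an expression only records operations, that values appear only at evaluation time, and that overloading + produces the same graph node as calling add explicitly. TensorFlow tensors overload + in the same spirit, which is why product + b and tf.add(product, b) behave the same during training.

```python
class Node:
    """Toy symbolic node: records an operation instead of computing it."""

    def __init__(self, op, inputs=(), name=None):
        self.op, self.inputs, self.name = op, inputs, name

    def __add__(self, other):  # `a + b` builds the same node as add(a, b)
        return add(self, other)

    def eval(self, feed):
        if self.op == "placeholder":
            return feed[self.name]
        left, right = (node.eval(feed) for node in self.inputs)
        return left + right if self.op == "add" else left * right


def placeholder(name):
    return Node("placeholder", name=name)


def add(a, b):
    return Node("add", (a, b))


def mul(a, b):
    return Node("mul", (a, b))


x, w, b = placeholder("x"), placeholder("w"), placeholder("b")
y_operator = mul(x, w) + b      # nothing is computed yet
y_function = add(mul(x, w), b)  # same graph, spelled out

feed = {"x": 3.0, "w": 2.0, "b": 1.0}
print(y_operator.eval(feed), y_function.eval(feed))  # 7.0 7.0
```

Because the whole formula exists as a data structure before any numbers flow through it, a framework can derive gradients from it and decide where (CPU or GPU) each operation runs.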
Also note that you can't mix NumPy functions and TensorFlow functions directly, but you can define your own TensorFlow ops (https://www.tensorflow.org/versions/r0.9/how_tos/adding_an_op/index.html).
Btw, TensorFlow's ops are not NumPy under the hood: their kernels are implemented in C++ (with CUDA versions for GPUs), which is also why the same op can run on a GPU.