Is it possible to train PyTorch and TensorFlow models together on one GPU?

I have a PyTorch model and a TensorFlow model, and I want to train them together on one GPU, following the process below: input --> pytorch model --> output_pytorch --> tensorflow model --> output_tensorflow --> pytorch model.
Is it possible to do this? If the answer is yes, are there any problems I will encounter?
Thanks in advance.

I haven't done this, but it is possible, although implementing it can be a little bit tricky.
You can consider each network as a function. In some sense you want to compose these functions to form your network: compute the final function by feeding the result of one network to the other, then use the chain rule to compute the derivatives (using the automatic differentiation of both packages).
I think a good way to implement this might be to wrap the TF model as a PyTorch Function and use tf.gradients to compute the backward pass.
Doing gradient updates can get really hard, because some variables exist in TF's computation graph. You could turn the TF variables into PyTorch Variables, turn them into placeholders in the TF computation graph, feed them in through feed_dict, and update them using PyTorch mechanisms, but I think that would be really hard to do. Instead, if you do your updates inside the backward method of the Function, you might be able to do the job (it is really ugly, but it might work).
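To make the chain-rule idea concrete, here is a minimal framework-free sketch. It assumes each model can be treated as a black box with a forward method and a backward method that turns the upstream gradient into an input gradient and a parameter gradient. LinearBox and TanhBox are hypothetical scalar stand-ins for the PyTorch and TensorFlow models; in a real setup the second box's backward would be the place to call tf.gradients.

```python
import math

class LinearBox:
    """Hypothetical stand-in for the first (PyTorch) model: y = w * x."""
    def __init__(self, w):
        self.w = w

    def forward(self, x):
        self.x = x  # cache the input for the backward pass
        return self.w * x

    def backward(self, grad_out):
        grad_w = grad_out * self.x  # dL/dw
        grad_x = grad_out * self.w  # dL/dx, handed to whatever came before
        return grad_x, grad_w

class TanhBox:
    """Hypothetical stand-in for the second (TensorFlow) model: z = tanh(v * y)."""
    def __init__(self, v):
        self.v = v

    def forward(self, y):
        self.y = y
        self.z = math.tanh(self.v * y)
        return self.z

    def backward(self, grad_out):
        local = 1.0 - self.z ** 2           # derivative of tanh at its argument
        grad_v = grad_out * local * self.y  # dL/dv
        grad_y = grad_out * local * self.v  # dL/dy, handed to the first box
        return grad_y, grad_v

def loss_and_grads(a, b, x, target):
    # Forward: output of one box is fed to the other.
    y = a.forward(x)
    z = b.forward(y)
    loss = 0.5 * (z - target) ** 2
    # Backward: chain rule, each box only differentiates itself.
    grad_y, grad_v = b.backward(z - target)
    _, grad_w = a.backward(grad_y)
    return loss, grad_w, grad_v

def numeric_grad(fun, p, eps=1e-6):
    """Central finite difference, used only to check the chained gradients."""
    return (fun(p + eps) - fun(p - eps)) / (2 * eps)

def loss_for(w, v, x=0.5, target=0.2):
    return loss_and_grads(LinearBox(w), TanhBox(v), x, target)[0]

loss, grad_w, grad_v = loss_and_grads(LinearBox(0.7), TanhBox(-1.3), x=0.5, target=0.2)
```

Each box only needs the gradient arriving from downstream, which is why the two frameworks never have to see each other's graph.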


How to integrate a pytorch model into a dynamic optimization, for example in Pyomo or gekko

Let's say I have a pytorch-model describing the evolution of some multidimensional system based on its own state x and an external actuator u. So x_(t+1) = f(x_t, u_t) with f being the artificial neural network from pytorch.
Now I want to solve a dynamic optimization problem to find an optimal sequence of u-values to minimize an objective that depends on x. Something like this:
min sum over all timesteps phi(x_t)
s.t.: x_(t+1) = f(x_t, u_t)
Additionally I also have some upper and lower bounds on some of the variables in x.
Is there an easy way to do this using a dynamic optimization toolbox like pyomo or gekko?
I already wrote some code that transforms a feedforward neural network into a numpy function, which can then be passed as a constraint to pyomo. The problem with this approach is that it requires significant reprogramming effort every time the structure of the neural network changes, so quick testing becomes difficult. Integrating recurrent neural networks also gets difficult, because hidden cell states would have to be added as additional variables to the optimization problem.
I think a good solution could be to do the function evaluations and gradient calculations in torch and somehow pass the results to the dynamic optimizer. I'm just not sure how to do this.
Thanks a lot for your help!
TensorFlow or PyTorch models can't be directly integrated into GEKKO at the moment. However, I believe you can retrieve the derivatives from TensorFlow or PyTorch, which would allow you to pass them to GEKKO.
There is a GEKKO Brain module, with examples in the links below. You can also find an example that uses a GEKKO feedforward neural network for dynamic optimization.
GEKKO Brain Feedforward neural network examples
MIMO MPC example with GEKKO neural network model
A recurrent neural network library for the GEKKO Brain module is currently being developed; it will make it easy to use all of GEKKO's dynamic optimization functions.
In the meantime, you can use a sequential method by wrapping the TensorFlow or PyTorch model in an available optimization solver, such as the scipy optimization module.
Check out the link below for a dynamic optimization example with a Keras LSTM model and scipy optimize.
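As a rough illustration of the sequential method, the sketch below simulates the model forward for a candidate control sequence, sums the objective, and lets an outer optimizer adjust the controls. Everything here is an assumption for illustration: f is a hand-written linear stand-in for the trained network, phi(x) = (x - x_ref)^2 is an invented objective, and minimize_fd is a simple finite-difference gradient descent standing in for a real solver such as the scipy optimization module. State bounds are omitted; they would typically enter as penalties or solver constraints.

```python
def f(x, u):
    """Stand-in for the trained network x_(t+1) = f(x_t, u_t) (assumed linear here)."""
    return 0.9 * x + 0.5 * u

def rollout(x0, controls):
    """Simulate the model forward over the whole horizon."""
    xs = [x0]
    for u in controls:
        xs.append(f(xs[-1], u))
    return xs

def objective(controls, x0=0.0, x_ref=1.0):
    """Sum over all timesteps of phi(x_t), with phi(x) = (x - x_ref)^2 (assumed)."""
    xs = rollout(x0, controls)
    return sum((x - x_ref) ** 2 for x in xs[1:])

def minimize_fd(fun, u0, lr=0.2, steps=2000, eps=1e-6):
    """Toy optimizer: gradient descent with central finite differences."""
    u = list(u0)
    for _ in range(steps):
        grad = []
        for i in range(len(u)):
            up = u[:]
            up[i] += eps
            dn = u[:]
            dn[i] -= eps
            grad.append((fun(up) - fun(dn)) / (2 * eps))
        u = [ui - lr * gi for ui, gi in zip(u, grad)]
    return u

u_init = [0.0] * 5
u_opt = minimize_fd(objective, u_init)
```

With a PyTorch model, the finite differences could be replaced by gradients from torch's autograd, so the optimizer only ever sees a function value and a gradient vector, regardless of the network structure.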

Why is back-propagation for Conv in TensorFlow separated into two operations?

I am trying to implement a custom convolution operation in TensorFlow with C++ and CUDA, and I found that back-propagation for Conv2D in TensorFlow is implemented via two separate operations. Indeed, I found two operation implementations in the TensorFlow source code, which means the gradients for the filter and the input are calculated separately. What is the idea behind this implementation? Why were they not simply merged into one single operation?
Alright, I did a test and found that there is about a 30% speed boost if the back-propagation for the different inputs is split into separate TF ops, compared with wrapping it into one single TF op. This is counterintuitive; perhaps it is related to TF's architecture. Note: my test was based on CUDA im2col/col2im with cuBLAS instead of cuDNN.
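One way to see why the split is natural: the gradient with respect to the input needs only the filter and the upstream gradient, while the gradient with respect to the filter needs only the input and the upstream gradient. The pure-Python 1D sketch below (function names are hypothetical, not TensorFlow's) computes the two independently, so either one can be skipped when it is not needed, for example when the input is raw data that requires no gradient. It does not explain the measured 30% difference.

```python
def conv1d(x, w):
    """'Valid' cross-correlation: y[i] = sum_k x[i + k] * w[k]."""
    n_out = len(x) - len(w) + 1
    return [sum(x[i + k] * w[k] for k in range(len(w))) for i in range(n_out)]

def grad_wrt_input(input_len, w, grad_y):
    """dL/dx: uses the filter and the upstream gradient, never the input values."""
    grad_x = [0.0] * input_len
    for i, g in enumerate(grad_y):
        for k, wk in enumerate(w):
            grad_x[i + k] += g * wk
    return grad_x

def grad_wrt_filter(x, filter_len, grad_y):
    """dL/dw: uses the input and the upstream gradient, never the filter values."""
    return [
        sum(g * x[i + k] for i, g in enumerate(grad_y))
        for k in range(filter_len)
    ]

x = [1.0, 2.0, 3.0, 4.0]
w = [1.0, 0.0, -1.0]
y = conv1d(x, w)
grad_y = [1.0] * len(y)  # upstream gradient of loss = sum(y)

gx = grad_wrt_input(len(x), w, grad_y)
gw = grad_wrt_filter(x, len(w), grad_y)
```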

Does Tensorflow support Keras models fit() method with eager execution?

I am training a Keras model (tf.keras.models.Sequential) calling its method fit().
Since I enabled eager execution, training time (for the same number of epochs) went up from 20.1s to 49.4s. Also, training didn't seem to converge anymore, as loss remained around 9 (without eager execution it went down to 1), while method fit() didn't even report the requested metric "accuracy" anymore.
Is eager execution supported for Keras models? Note that I am calling method fit() on the model, not using an estimator.
Here is the snippet of code that declares the model and does the training. Using TF 1.7 for GPU installed with pip3.
import tensorflow as tf

model = tf.keras.models.Sequential([
    tf.keras.layers.InputLayer(input_shape=(11,)),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(32, activation='relu'),
    tf.keras.layers.Dense(11, activation='softmax')
])
optimizer = tf.train.AdamOptimizer()
# optimizer = 'adam'
model.compile(optimizer=optimizer, loss='categorical_crossentropy', metrics=['accuracy'])
# x=train_x is assumed here, by analogy with y=train_y
model.fit(x=train_x, y=train_y, epochs=200, batch_size=64, verbose=2)
UPDATE: filed issue #18642 on Tensorflow GITHUB.
The issue I reported on tensorflow got this answer:
Thank you for the bug report. We have a fix for this issue, that will show up on GitHub soon.
See issue #18642 on GITHUB for Tensorflow.
Based on this, I understand that method fit() of Keras models will be supported with eager execution, once the bug is fixed.
Here is a quote from the Tensorflow site found here
There are many parameters to optimize when calculating derivatives. TensorFlow code is easier to read when structured into reusable classes and objects instead of a single top-level function. Eager execution encourages the use of the Keras-style layer classes in the tf.keras.layers module. Additionally, the tf.train.Optimizer classes provide sophisticated techniques to calculate parameter updates.
That means Keras layers, and models built from them, can be used with eager execution.
As for your timing, the link also mentions that eager execution skips the graph-building step.
TensorFlow's eager execution is an imperative programming environment that evaluates operations immediately, without an extra graph-building step. Operations return concrete values instead of constructing a computational graph to run later.
This may make it harder for your model to run efficiently given the number of Dense layers you have. Someone may correct me on that, because I have not done much work with Dense layers before, or it has been a long time since I have. If that does not help, then I would look into your loss function. This answer may help if that becomes a problem.
Everything else looks alright though. Hope this helps.
OK, I see what you are saying, Fate. Yes, the first link uses a Sequential model, but with GradientTape for gradient descent. Reading deeper into the eager tutorial shows that they only use GradientTape as well. Here is what the tutorial says about training:
Automatic differentiation is useful for implementing machine learning algorithms such as backpropagation for training neural networks. During eager execution, use tfe.GradientTape to trace operations for computing gradients later. tfe.GradientTape is an opt-in feature to provide maximal performance when not tracing. Since different operations can occur during each call, all forward-pass operations get recorded to a "tape". To compute the gradient, play the tape backwards and then discard. A particular tfe.GradientTape can only be computed once, subsequent calls throw a runtime error.
So maybe as of right now only Gradient tape and the estimator method are what you are supposed to use with eager.
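The "tape" mechanism in the quote can be illustrated with a toy pure-Python version. This is only a sketch of the idea (record each forward operation, then play the records backwards once); Tape and Var are hypothetical classes and are not how TensorFlow implements tfe.GradientTape.

```python
class Tape:
    """Toy tape: stores (output, [(input, local_derivative), ...]) per operation."""
    def __init__(self):
        self.records = []

    def gradient(self, target, sources):
        if self.records is None:
            raise RuntimeError("this tape has already been used")
        grads = {id(target): 1.0}
        # Play the tape backwards: later operations are processed first.
        for out, parents in reversed(self.records):
            g = grads.get(id(out), 0.0)
            for parent, local in parents:
                grads[id(parent)] = grads.get(id(parent), 0.0) + g * local
        self.records = None  # discard the tape after one gradient computation
        return [grads.get(id(s), 0.0) for s in sources]

class Var:
    """Scalar value whose + and * operations are recorded on a tape."""
    def __init__(self, value, tape):
        self.value = value
        self.tape = tape

    def __add__(self, other):
        out = Var(self.value + other.value, self.tape)
        self.tape.records.append((out, [(self, 1.0), (other, 1.0)]))
        return out

    def __mul__(self, other):
        out = Var(self.value * other.value, self.tape)
        self.tape.records.append((out, [(self, other.value), (other, self.value)]))
        return out

tape = Tape()
w, x, b = Var(2.0, tape), Var(3.0, tape), Var(1.0, tape)
y = w * x + b   # forward pass is recorded as it executes
loss = y * y
grads = tape.gradient(loss, [w, x, b])  # [dloss/dw, dloss/dx, dloss/db]
```

Because the tape is filled during ordinary Python execution, the recorded operations can differ from call to call, which is the point the tutorial makes about eager mode.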
When reading the compile method on Model (documentation), you can find an argument, run_eagerly:
run_eagerly: Bool. Defaults to False. If True, this Model's logic will not be wrapped in a tf.function. Recommended to leave this as None unless your Model cannot be run inside a tf.function.
So by default, a tf.keras.Model runs through graph execution, not eager execution.

Tensorflow: How to create new neuron (Not perceptron neuron)

TensorFlow is extremely useful for creating neural networks that involve perceptron neurons. However, if one wanted to use a new type of neuron instead of the classic perceptron neuron, is this possible through augmenting TensorFlow code? I can't seem to find an answer. I understand this would change the forward propagation and other mathematical calculations, and I am willing to change all the necessary areas.
I am also aware that I could just code the layers I need, and the neurons I had in mind, from scratch. But TensorFlow has GPU integration, so it seems better to modify its code than to write my own from scratch.
Has anyone experimented with this? My goal is to create neural network structures that use a different type of neuron than the classic perceptron.
If someone knows where in TensorFlow I could look to see where the perceptron neurons are initialized, I would very much appreciate it!
To be more specific, is it possible to alter code in TensorFlow so that a module such as tf.layers or tf.nn (conv2D, batch-norm, max-pool, etc.) uses a different neuron type rather than the perceptron? I can figure out the details. I just need to know where (I'm sure there are a few locations) I would go about changing code for this.
However, if one wanted to use a new type of neuron instead of the classic perceptron neuron, is this possible through augmenting tensorflow code?
Yes. Tensorflow provides you the possibility to define a computational graph. It then can automatically calculate the gradient for that. No need to do it yourself. This is the reason why you define it symbolically. You might want to read the whitepaper or start with a tutorial.

Tensorflow - How to ignore certain labels

I'm trying to implement a fully convolutional network and train it on the Pascal VOC dataset, however after reading up on the labels in the set, I see that I need to somehow ignore the "void" label. In Caffe their softmax function has an argument to ignore labels, so I'm wondering what the mechanic is, so I can implement something similar in tensorflow.
In tensorflow you're feeding the data in feed_dict right? Generally you'd want to just pre-process the data and remove the unwanted samples - don't give them to tensorflow for processing.
My preferred approach is a producer-consumer model, where you fire up a TensorFlow queue and load it with samples from a loader thread that simply skips enqueuing your void samples.
When training, the model dequeues samples directly (you don't use feed_dict in the optimize step). This way you're not bothering to write out a whole new dataset with the specific preprocessing step you're interested in today (tomorrow you're likely to find you want to do some other preprocessing step).
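A minimal sketch of that producer-consumer pattern, using Python's standard queue and threading modules as a stand-in for a TensorFlow queue. The sample data and the helper names loader and consume are made up for illustration, and the sketch assumes one label per sample, with 255 as the void value as in the Pascal VOC segmentation masks.

```python
import queue
import threading

VOID_LABEL = 255  # void value in Pascal VOC segmentation masks
SENTINEL = None   # marks the end of the stream

def loader(samples, q):
    """Producer: enqueue every sample except the void ones."""
    for features, label in samples:
        if label == VOID_LABEL:
            continue  # skip enqueuing void samples
        q.put((features, label))
    q.put(SENTINEL)

def consume(q):
    """Consumer: dequeue until the sentinel arrives (the training step would go here)."""
    kept = []
    while True:
        item = q.get()
        if item is SENTINEL:
            break
        kept.append(item)
    return kept

# Hypothetical (features, label) pairs.
samples = [([0.1, 0.2], 3), ([0.5, 0.1], VOID_LABEL), ([0.9, 0.4], 7)]

q = queue.Queue(maxsize=8)
thread = threading.Thread(target=loader, args=(samples, q), daemon=True)
thread.start()
kept = consume(q)
thread.join()
```

Changing the preprocessing tomorrow then only means changing the condition inside loader, not rewriting the dataset.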
As a side comment, I think tensorflow is a little more do-it-yourself than some other frameworks. But I tend to like that, it abstracts enough to be convenient, but not so much that you don't understand what's happening. When you implement it you understand it, that's the motto that comes to mind with tensorflow.