Running Keras Sequential model in tf session - tensorflow

Given a Keras sequential model (specifically a 2 layer LSTM): How do we run it in a tf session?
I have to train the model multiple times in a single script and I run out of memory pretty fast. Is running it in a tf session the right solution? If not, what is?

In Keras, model.fit() or model.fit_generator() is usually used for model training. An example can be found here. If you don't need to stick with a tf session, a pure Keras implementation is also a good choice.
For the out of memory issue, there may be multiple potential reasons. Maybe you can check if your dataset is too large? Without further info about your code, it's hard to say.
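One thing that often helps when a single script trains many models is clearing Keras' global state between runs. Below is a minimal sketch, assuming TensorFlow 2.x with tf.keras. The helper names build_model and train_once, the layer sizes, and the random data are all made up for illustration, and this may or may not be the cause of the memory growth in your case.

```python
import gc

import numpy as np
import tensorflow as tf


def build_model(timesteps, features):
    """Hypothetical 2-layer LSTM standing in for the model in the question."""
    return tf.keras.Sequential([
        tf.keras.Input(shape=(timesteps, features)),
        tf.keras.layers.LSTM(8, return_sequences=True),
        tf.keras.layers.LSTM(8),
        tf.keras.layers.Dense(1),
    ])


def train_once(x, y):
    """Train a fresh model and return its final training loss."""
    # Drop the global state Keras accumulated from earlier models.
    tf.keras.backend.clear_session()
    model = build_model(x.shape[1], x.shape[2])
    model.compile(optimizer="adam", loss="mse")
    history = model.fit(x, y, epochs=1, batch_size=16, verbose=0)
    final_loss = float(history.history["loss"][-1])
    # Release the Python reference so the memory can be reclaimed.
    del model
    gc.collect()
    return final_loss


# Random placeholder data, only here to make the sketch runnable.
x = np.random.rand(32, 10, 4).astype("float32")
y = np.random.rand(32, 1).astype("float32")
losses = [train_once(x, y) for _ in range(3)]
```

If memory still grows after this, the dataset size or batch size would be the next things to check.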

Related

What effects should tensorflow.compat.v1.disable_v2_behavior() have on training using the Keras API?

I have a CNN that trains, on a few hundred thousand examples, to a validation accuracy of ~95% after one epoch. It's straightforward code, using Keras to define a network with the Sequential API. Originally I prepared and used this model on TF 1.3. When I port it over to TF 2.1, replacing the keras calls with tensorflow.keras, it gets to ~60% quickly and gets stuck there (seemingly for many epochs), and the training loss always seems to converge to the same value.
If I add tf.compat.v1.disable_v2_behavior() at the top of the script, it trains similarly to before.
The documentation states simply that "It switches all global behaviors that are different between TensorFlow 1.x and 2.x to behave as intended for 1.x". Hidden behind the Keras API, I haven't found a clear answer to what this really means in practice. Why should I expect a VGG-like CNN, defined using Keras and trained with model.fit(), to work well without v2 behaviour but to fail so consistently with?
Edit: disable_eager_execution() produces the same result, with improved performance.
Please try disabling eager execution and see if that helps.
tf.compat.v1.disable_eager_execution()
(Add this to the top of your script)

How to do parallel GPU inferencing in Tensorflow 2.0 + Keras?

Let's begin with the premise that I'm new to TensorFlow and deep learning in general.
I have a TF 2.0 Keras-style model trained using tf.keras.Model.fit(), two available GPUs, and I'm looking to reduce inference times.
I trained the model distributed across the GPUs using the extremely handy tf.distribute.MirroredStrategy().scope() context manager:
mirrored_strategy = tf.distribute.MirroredStrategy()
with mirrored_strategy.scope():
    model.compile(...)
    model.fit(...)
Both GPUs are effectively used (even if I'm not quite happy with the resulting accuracy).
I can't seem to find a similar strategy for distributing inference between GPUs with the tf.keras.Model.predict() method: when I run model.predict() I get (obviously) usage from only one of the two GPUs.
Is it possible to instantiate the same model on both GPUs and feed them different chunks of data in parallel?
There are posts that suggest how to do it in TF 1.x, but I can't seem to replicate the results in TF 2.0:
https://medium.com/#sbp3624/tensorflow-multi-gpu-for-inferencing-test-time-58e952a2ed95
Tensorflow: simultaneous prediction on GPU and CPU
My main struggles with the question are:
TF 1.x is tf.Session()-based, while sessions are implicit in TF 2.0. If I understand correctly, the solutions I read use a separate session for each GPU, and I don't really know how to replicate that in TF 2.0.
I don't know how to use the model.predict() method with a specific session.
I know that the question is probably not well-formulated but I summarize it as:
Does anybody have a clue on how to run Keras-style model.predict() on multiple GPUs (inferencing on a different batch of data on each GPU in a parallel way) in TF2.0?
Thanks in advance for any help.
Try loading the model inside a tf.distribute.MirroredStrategy scope and use a larger batch_size:
mirrored_strategy = tf.distribute.MirroredStrategy()
with mirrored_strategy.scope():
    model = tf.keras.models.load_model(saved_model_path)
# data is a placeholder name for your input array
result = model.predict(data, batch_size=greater_batch_size)
There still does not seem to be an official example for distributed inference. There is a potential solution here using tf.distribute.MirroredStrategy: https://github.com/tensorflow/tensorflow/issues/37686. However, it does not seem to fully utilize multiple GPUs.
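For what it's worth, here is a slightly fuller sketch of the same idea, assuming TensorFlow 2.x. A small stand-in model is built inside the scope instead of calling load_model(saved_model_path), and the per-replica batch size of 32 and the random input data are made-up values. The global batch is scaled by the number of replicas so that each GPU should receive a full per-replica batch. I have not measured how well this balances load across GPUs.

```python
import numpy as np
import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()

# Hypothetical per-replica batch size; the global batch is split across replicas.
per_replica_batch = 32
global_batch = per_replica_batch * strategy.num_replicas_in_sync

with strategy.scope():
    # Stand-in model; in practice this could be
    # tf.keras.models.load_model(saved_model_path).
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(8,)),
        tf.keras.layers.Dense(4, activation="relu"),
        tf.keras.layers.Dense(2),
    ])

# Random placeholder inputs, only here to make the sketch runnable.
data = np.random.rand(256, 8).astype("float32")
dataset = tf.data.Dataset.from_tensor_slices(data).batch(global_batch)

predictions = model.predict(dataset, verbose=0)
```

On a machine with a single device this falls back to one replica, so it only shows the wiring, not a speedup.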

Distributed Tensorflow Using fit_generator()

Is it possible to use TensorFlow in a distributed manner together with fit_generator()? In my research so far I have not seen anything on how to do this, or whether it is possible at all. If it is not possible, what are some ways to use distributed TensorFlow when the data does not all fit in memory?
Using fit_generator() is not possible under a TensorFlow distribution scope.
Have a look at tf.data. I rewrote all my Keras ImageDataGenerators as a TensorFlow data pipeline. It doesn't take much time, is more transparent, and is remarkably faster.
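To give an idea of what such a rewrite can look like, here is a minimal sketch that wraps a plain Python generator in a tf.data pipeline. It assumes TensorFlow 2.4 or newer (for output_signature and tf.data.AUTOTUNE). The generator sample_generator, the feature shape, and the batch size are hypothetical; a real pipeline would read and decode your own files instead.

```python
import numpy as np
import tensorflow as tf


def sample_generator(num_samples=100):
    """Hypothetical generator yielding one (features, label) pair at a time."""
    for i in range(num_samples):
        features = np.full((8,), i, dtype=np.float32)
        label = np.float32(i % 2)
        yield features, label


dataset = (
    tf.data.Dataset.from_generator(
        sample_generator,
        # Declare the shape and dtype of each yielded element.
        output_signature=(
            tf.TensorSpec(shape=(8,), dtype=tf.float32),
            tf.TensorSpec(shape=(), dtype=tf.float32),
        ),
    )
    .batch(16)
    # Overlap data preparation with model execution.
    .prefetch(tf.data.AUTOTUNE)
)

# Pull one batch to check the pipeline produces what we expect.
first_features, first_labels = next(iter(dataset))
```

A dataset built this way can be passed directly to model.fit(dataset), so the data never has to sit in memory all at once.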

Preventing overfitting in transfer learning using TensorFlow and Keras

I've got a TensorFlow 2 model with a pre-trained Keras layer coming from TensorFlow Hub. I want to fine-tune the weights in this sub-model to suit my dataset, but if I do that naively by setting trainable=True and training=True, my model will grossly overfit.
If I had the actual layers of the underlying model under my control, I would insert dropout layers or set L2 coefficient on those individual layers. But the layers are imported to my network using TensorFlow Hub KerasLayer method. Also, I suspect that the underlying model is quite complicated.
I wonder what the standard practice is for solving this kind of issue.
Maybe there is a way to force regularization to the whole network somehow? I know that in TensorFlow 1, there were optimizers like ProximalAdagradOptimizer that took L2 coefficients. In TensorFlow 2, the only optimizer like this is FTRL, but it's hard for me to make it work for my dataset.
I "solved" it by
pretraining non-transferred parts of the model,
then turning on learning for the shared layers,
introducing early stopping,
and configuring the optimizer to go really slow.
This way, I managed to not damage the transferred layers too much. Anyway, I still wonder whether this is the best one can do.
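The steps above can be sketched roughly as follows, assuming TensorFlow 2.x. A tiny Dense stack named base stands in for the hub.KerasLayer, and the learning rates, patience, epoch counts, and random data are made-up values rather than the ones I actually used.

```python
import numpy as np
import tensorflow as tf

# Stand-in for the pre-trained sub-model (e.g. a hub.KerasLayer).
base = tf.keras.Sequential(
    [tf.keras.Input(shape=(16,)), tf.keras.layers.Dense(8, activation="relu")],
    name="pretrained_base",
)
head = tf.keras.layers.Dense(1, activation="sigmoid", name="new_head")
model = tf.keras.Sequential([tf.keras.Input(shape=(16,)), base, head])

# Random placeholder data, only here to make the sketch runnable.
x = np.random.rand(64, 16).astype("float32")
y = (np.random.rand(64, 1) > 0.5).astype("float32")

early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=2, restore_best_weights=True
)

# Phase 1: train only the new head while the transferred layers stay frozen.
base.trainable = False
model.compile(optimizer=tf.keras.optimizers.Adam(1e-3), loss="binary_crossentropy")
frozen_weight_count = len(model.trainable_weights)
model.fit(x, y, validation_split=0.25, epochs=3,
          callbacks=[early_stop], verbose=0)

# Phase 2: unfreeze the transferred layers and fine-tune very slowly.
# Recompiling is needed for the change in `trainable` to take effect.
base.trainable = True
model.compile(optimizer=tf.keras.optimizers.Adam(1e-5), loss="binary_crossentropy")
unfrozen_weight_count = len(model.trainable_weights)
history = model.fit(x, y, validation_split=0.25, epochs=3,
                    callbacks=[early_stop], verbose=0)
```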

Is it possible to train pytorch and tensorflow model together on one GPU?

I have a PyTorch model and a TensorFlow model, and I want to train them together on one GPU, following the process below: input --> pytorch model --> output_pytorch --> tensorflow model --> output_tensorflow --> pytorch model.
Is it possible to do this? If the answer is yes, are there any problems I will run into?
Thanks in advance.
I haven't done this, but it is possible, though implementing it can be a little bit tricky.
You can consider each network as a function. You want, in some sense, to compose these functions to form your network. To do this, you can compute the final function by feeding the result of one network to the other, and then use the chain rule to compute the derivatives (using symbolic differentiation from both packages).
I think a good way to implement this might be to wrap the TF model as a PyTorch Function and use tf.gradients to compute the backward pass.
Doing gradient updates can get really hard (because some variables exist in TF's computation graph). You could turn the TF variables into PyTorch Variables, turn them into placeholders in the TF computation graph, feed them in via feed_dict, and update them using PyTorch mechanisms, but I think that would be really hard to do. Instead, if you do your updates inside the backward method of the Function, you might be able to do the job (it is really ugly, but it might work).