Seq2Seq use of buckets in TensorFlow tutorial - tensorflow

I am wondering why the buckets are being introduced in the Seq2Seq TensorFlow tutorial. I understand the efficiency gain from not processing the padding symbols, but you can avoid processing the paddings if you use rnn and specify the sequence_length parameter. Or if you use dynamic_rnn.
Is it because it helps distributing the training across multiple devices / machines ?

One reason is that seq2seq was created before dynamic rnn was available. The other is that, even with dynamic rnn, it still helps for speed if your batches are organized by bucket.

Related

Data augmentation on GPU

As tf.data augmentations are executed only on CPUs. I need a way to run certain augmentations on the TPU for an audio project.
For example,
CPU: tf.recs read -> audio crop -> noise addition.
TPU: spectogram -> Mixup Augmentation.
Most augmentations can be done as a Keras Layer on top of the model, but MixUp requires both changes in input as well as label.
Is there a way to do it using tf keras APIs.
And if there is any way we can transfer part of tf.data to run on TPU that will also be helpful.
As you have rightly mentioned and as per the Tensorflow documentation also the preprocessing of tf.data is done on CPU only.
However, you can do some workaround to preprocess your tf.data using TPU/GPU by directly using transformation function in your model with something like below code.
input = tf.keras.layers.Input((512,512,3))
x = tf.keras.layers.Lambda(transform)(input)
You can follow this Kaggle post for detailed discussion on this topic.
See the Tensorflow guide that discusses preprocessing data before the model or inside the model. By including preprocessing inside the model, the GPU is leveraged instead of the CPU, it makes the model portable, and it helps reduce the training/serving skew. The guide also has multiple recipes to get you started too. It doesn't explicitly state this works for a TPU but it can be tried.

Distributed Tensorflow Using fit_generator()

Is it possible to use Tensorflow in a distributed manner and use the fit_generator()? In my research so far I have not seen anything on how to do this or if it is possible. If it is not possible then what are some possible solutions to use distributed Tensorflow when all the data will not fit in memory.
Using fit_generator() is not possible under a tensorflow distribution scope.
have a lookt at tf.data. i rewrote all my Keras ImageDataGenerators to a tensorflow data pipeline. doesn't need much time, is more transparent and quite remarkably faster.

Is there a standard way to optimize models to run well on different mobile devices?

I’m working on a few side projects that involve deploying ML models to the edge. One of them is a photo-editing app that includes CNN’s for facial recognition, object detection, classification, and style transfer. The other is a NLP app that assists in the writing process by suggesting words and sentence completions..
Once I have a trained model that’s accurate, it ends up being really slow on one or more mobile devices that I'm testing on (usually the lower end Android). I’ve read that there are optimizations one can do to speed models up, but I don’t know how. Is there a standard, go-to tool for optimizing models for mobile/edge?
I will be talking about TensorFlow Lite specifically it is a platform for running TensorFlow ops on Android and iOS. There are several optimisation techniques mentioned on their website but I will discuss the ones which feel important to me.
Constructing relevant models for platforms:
The first step in model optimization is its construction from scratch meaning TensorFlow. We need to create a model which can be used exported to a memory constrained device.
We definitely need to train different models for different machines. A model constructed to work on a high-end TPU will never run efficiently on a Mobile processor.
Create a model which has minimum layers and ops.
Do this without compromising the model's accuracy.
For this, you will need expertise in ML and also which ops are the best to preprocess data.
Also, extra preprocessing of input data brings down the model complexity to a great extent.
Model quantization:
We convert the high precision floats or decimals to lower precision floats. It affects the model's performance slightly but greatly reduces the model size and then holds less of the memory.
Post-training quantization is a general technique to reduce model size while also providing up to 3x lower latency with little degradation in model accuracy. Post-training quantization quantizes weights from floating point to 8-bits of precision - from TF docs.
You can see the TensorFlow Lite TFLiteConverter example:
import tensorflow as tf
converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
converter.optimizations = [tf.lite.Optimize.OPTIMIZE_FOR_SIZE]
tflite_quant_model = converter.convert()
Also you should try using the post_training_quantize= flag which reduces the model size considerably.
Hope it helps.

Is it possible to train pytorch and tensorflow model together on one GPU?

I have a pytorch model and a tensorflow model, I want to train them together on one GPU, following the process bellow: input --> pytorch model--> output_pytorch --> tensorflow model --> output_tensorflow --> pytorch model.
Is is possible to do this? If answer is yes, is there any problem which I will encounter?
Thanks in advance.
I haven't done this but it is possible but implementing is can be a little bit.
You can consider each network as a function, you want to - in some sense - compose these function to form your network, to do this you can compute the final function by just giving result of one network to the other and then use chain-rule to compute the derivatives(using symbolic differentiation from both packages).
I think a good way for implementing this you might be to wrap TF models as a PyTorch Function and use tf.gradients for computing the backward pass.
Doing gradient updates can really get hard (because some variables exist in TF's computation graph) you can turn TF variables to PyTorch Variable turn them into placeholdes in TF computation graph, feed them in feed_dict and update them using PyTorch mechanisms, but I think it would be really hard to do, instead if you do your updates inside backward method of the function you might be able to do the job(it is really ugly but might do the job).

Setting learningRateMultiplier in CNTK from within Python

I am loading a pre-trained network and would like to change/set the "learningRateMultiplier" for various layers. I have done this before using Brainscript (link see below), but need to do this now from within Python. Is this supported? Or is there any other way in Python to set per-layer specific learning rates?
Brainscript:
https://github.com/Microsoft/CNTK/wiki/Parameters-And-Constants
To give some context: I would like to fine-tune all layers in Fast R-CNN training including the conv layers. However past experiments indicate that this requires smaller learning rates for the conv layers compared to the fc layers (perhaps due to the gradients from all ROIs being added up or otherwise combined).
Thanks,
Patrick
Unless a better alternative surfaces, I'd recommend creating two learners with different learning rates and disjoint parameters arguments. You can provide a list with multiple learners to your model's Trainer, which should coordinate their use during training.