From the little I have played around with TensorFlow I see it has already-implemented architectures like RNN/LSTM cells, ConvNets, etc. Is there a way to define one's "custom" architecture (e.g. an "enhanced" LSTM network with a few convolutional layers)?
Yes, it is totally possible. The output of an LSTM, or of any other network, is a tensor that can be used as the input of another network.
See how to combine them at https://github.com/jazzsaxmafia/show_and_tell.tensorflow.
You can find more examples at https://github.com/TensorFlowKR/awesome_tensorflow_implementations.
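As a minimal sketch of the idea (not taken from the linked repositories), the Keras functional API lets you feed the output tensor of a few convolutional layers straight into an LSTM. The function name build_conv_lstm, the layer sizes, and the input shape are assumptions chosen for illustration only.

```python
import numpy as np
import tensorflow as tf


def build_conv_lstm(timesteps=20, features=8, n_classes=2):
    """Hypothetical example: two Conv1D layers followed by an LSTM."""
    inputs = tf.keras.Input(shape=(timesteps, features))
    # Convolutions extract local patterns along the time axis.
    x = tf.keras.layers.Conv1D(16, kernel_size=3, padding="same", activation="relu")(inputs)
    x = tf.keras.layers.Conv1D(16, kernel_size=3, padding="same", activation="relu")(x)
    # The conv output is just a (batch, time, channels) tensor, so the LSTM can consume it.
    x = tf.keras.layers.LSTM(32)(x)
    outputs = tf.keras.layers.Dense(n_classes, activation="softmax")(x)
    return tf.keras.Model(inputs, outputs)


model = build_conv_lstm()
dummy = np.random.rand(4, 20, 8).astype("float32")
probs = model(dummy).numpy()
print(probs.shape)
```

Any other combination works the same way, as long as the tensor shapes between consecutive layers are compatible.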
I am training a yolov4 (fully convolutional) in tensorflow 2.3.0.
I would like to change the spatial input shape of the network during training, to further adjust the weights to different scales.
Is this possible?
EDIT:
I know of the existence of darknet, but it suffers from some very specific augmentations I use and have implemented in my repo, that is why I ask explicitly for tensorflow.
To be more precise about what I want to do:
I want to train for several batches at Y1xX1xC then change the input size to Y2xX2xC and train again for several batches and so on.
It is not possible. In the past people trained several networks for different scales but the current state-of-the-art approach is feature pyramids.
https://arxiv.org/pdf/1612.03144.pdf
Another great candidate is dilated convolution, which can learn long-distance dependencies among pixels at varying distances. You can concatenate the outputs of several dilated convolutions, and the model will then learn which distance is important for which case.
https://towardsdatascience.com/review-dilated-convolution-semantic-segmentation-9d5a5bd768f5
It's important to mention which TensorFlow repository you're using. You can definitely achieve this. The idea is to keep the fixed spatial input dimension in a single batch.
But an even better approach is to use the darknet repository from AlexeyAB: https://github.com/AlexeyAB/darknet
Just set random=1 in https://github.com/AlexeyAB/darknet/blob/master/cfg/yolov4.cfg [line 1149]. It will train your network with randomly varying spatial dimensions.
One thing you can do is start your training with the AlexeyAB repo with random=1 set, then take the trained weights file to TensorFlow for fine-tuning.
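For the pure-TensorFlow route, keeping the spatial size fixed within each batch but changing it between batches could look roughly like the sketch below. It is not YOLOv4: build_fcn is a hypothetical toy fully convolutional model, and the data is random, purely to show that a model declared with a (None, None, C) input can be trained at several resolutions in turn.

```python
import numpy as np
import tensorflow as tf


def build_fcn(channels=3):
    """Hypothetical tiny fully convolutional net; height and width are left unspecified."""
    inputs = tf.keras.Input(shape=(None, None, channels))
    x = tf.keras.layers.Conv2D(8, 3, padding="same", activation="relu")(inputs)
    x = tf.keras.layers.Conv2D(8, 3, strides=2, padding="same", activation="relu")(x)
    outputs = tf.keras.layers.Conv2D(1, 1, activation="sigmoid")(x)
    return tf.keras.Model(inputs, outputs)


model = build_fcn()
optimizer = tf.keras.optimizers.Adam(1e-3)
loss_fn = tf.keras.losses.BinaryCrossentropy()


def train_step(images, targets):
    # Plain eager training step; every image in the batch has the same size.
    with tf.GradientTape() as tape:
        preds = model(images, training=True)
        loss = loss_fn(targets, preds)
    grads = tape.gradient(loss, model.trainable_weights)
    optimizer.apply_gradients(zip(grads, model.trainable_weights))
    return float(loss)


losses = []
# Train a few batches at one resolution, then switch to the next one.
for height, width in [(32, 32), (64, 48), (96, 96)]:
    for _ in range(2):
        images = np.random.rand(4, height, width, 3).astype("float32")
        # The stride-2 layer halves the spatial size, so the targets are half-sized too.
        targets = np.random.randint(0, 2, size=(4, height // 2, width // 2, 1)).astype("float32")
        losses.append(train_step(images, targets))
```

With a real detector, the target encoding (grid cells, anchors) has to be regenerated for each resolution, which is the part that depends on the repository you use.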
I am trying to add an attention layer for my text classification model. The inputs are texts (e.g. movie review), the output is a binary outcome (e.g. positive vs negative).
model = Sequential()
model.add(Embedding(max_features, 32, input_length=maxlen))
model.add(Bidirectional(CuDNNGRU(16,return_sequences=True)))
##### add attention layer here #####
model.add(Dense(1, activation='sigmoid'))
After some searching, I found a couple of ready-to-use attention layers for Keras. There is the keras.layers.Attention layer that is built into Keras. There are also the SeqWeightedAttention and SeqSelfAttention layers in the keras-self-attention package. As a person who is relatively new to the deep learning field, I have a hard time understanding the mechanism behind these layers.
What does each of these layers do? Which one will be the best for my model?
Thank you very much!
If you are using RNN, I would not recommend using the keras.layers.Attention class.
While reading the tf.keras.layers.Attention code on GitHub to better understand how to use it, the first line I came across was: "This class is suitable for Dense or CNN networks, and not for RNN networks"
There is another open-source package maintained by CyberZHG called keras-self-attention. To the best of my knowledge, it is NOT part of the Keras or TensorFlow library and seems to be an independent piece of code. It contains the two classes you mentioned, SeqWeightedAttention and SeqSelfAttention. The former returns a 2D tensor and the latter a 3D tensor, so SeqWeightedAttention should work for your situation. The former seems to be loosely based on Raffel et al. and can be used for sequence classification; the latter seems to be a variation of Bahdanau attention.
In general, I would suggest writing your own sequence-to-classification model. The attention piece can be added in fewer than half a dozen lines of code (the bare-bones essence), which takes much less time than you would spend integrating, debugging, or understanding the code in these external libraries.
Please refer: Create an LSTM layer with Attention in Keras for multi-label text classification neural network
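To give a feel for how small the attention piece can be, here is a rough sketch of attention pooling in the spirit of Raffel et al.: score each timestep, softmax the scores over time, and return the weighted sum. The class name AttentionPooling and the sizes max_features=1000 and maxlen=50 are assumptions for illustration, a plain GRU stands in for CuDNNGRU, and padding masks are not handled.

```python
import numpy as np
import tensorflow as tf


class AttentionPooling(tf.keras.layers.Layer):
    """Hypothetical layer: collapses (batch, time, units) to (batch, units)."""

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        # One scalar score per timestep.
        self.score = tf.keras.layers.Dense(1, activation="tanh")

    def call(self, inputs):
        scores = self.score(inputs)                     # (batch, time, 1)
        weights = tf.nn.softmax(scores, axis=1)         # normalise over the time axis
        return tf.reduce_sum(weights * inputs, axis=1)  # weighted sum of the RNN states


max_features, maxlen = 1000, 50  # assumed vocabulary size and sequence length
inputs = tf.keras.Input(shape=(maxlen,), dtype="int32")
x = tf.keras.layers.Embedding(max_features, 32)(inputs)
x = tf.keras.layers.Bidirectional(tf.keras.layers.GRU(16, return_sequences=True))(x)
x = AttentionPooling()(x)
outputs = tf.keras.layers.Dense(1, activation="sigmoid")(x)
model = tf.keras.Model(inputs, outputs)

tokens = np.random.randint(0, max_features, size=(4, maxlen)).astype("int32")
preds = model(tokens).numpy()
```

The pooled output is 2D, which is what the final Dense(1) classifier needs; a layer that returns a 3D tensor would still require flattening or pooling before the classifier.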
So tensorflow is extremely useful at creating neural networks that involve perceptron neurons. However, if one wanted to use a new type of neuron instead of the classic perceptron neuron, is this possible through augmenting tensorflow code? I can't seem to find an answer. I understand this would change the forward propagation, and more mathematical calculations, and I am willing to change all the necessary areas.
I am also aware that I can just code the layers I need, and the neurons I had in mind, from scratch. But TensorFlow has GPU integration, so it seems better to adapt its code than to create my own from scratch.
Has anyone experimented with this? My goal is to create neural network structures that use a different type of neuron than the classic perceptron.
If someone knows where in TensorFlow I could look to see where the perceptron neurons are initialized, I would very much appreciate it!
Edit:
To be more specific, is it possible to alter code in TensorFlow so that modules such as tf.layers or tf.nn (conv2D, batch-norm, max-pool, etc.) use a different neuron type rather than the perceptron? I can figure out the details. I just need to know where (I'm sure there are a few locations) I would go about changing code for this.
However, if one wanted to use a new type of neuron instead of the classic perceptron neuron, is this possible through augmenting tensorflow code?
Yes. Tensorflow provides you the possibility to define a computational graph. It then can automatically calculate the gradient for that. No need to do it yourself. This is the reason why you define it symbolically. You might want to read the whitepaper or start with a tutorial.
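As an illustration of that point, a non-standard neuron can be written as a custom Keras layer without touching TensorFlow's internals: you only define the forward computation, and the gradient is derived automatically. The "quadratic neuron" below (the product of two linear projections plus a bias) is a made-up example, and QuadraticNeuronLayer is a hypothetical name, not something TensorFlow provides.

```python
import numpy as np
import tensorflow as tf


class QuadraticNeuronLayer(tf.keras.layers.Layer):
    """Hypothetical neuron type: y = (x @ w_a) * (x @ w_b) + bias."""

    def __init__(self, units, **kwargs):
        super().__init__(**kwargs)
        self.units = units

    def build(self, input_shape):
        in_dim = int(input_shape[-1])
        self.w_a = self.add_weight(name="w_a", shape=(in_dim, self.units),
                                   initializer="glorot_uniform")
        self.w_b = self.add_weight(name="w_b", shape=(in_dim, self.units),
                                   initializer="glorot_uniform")
        self.bias = self.add_weight(name="bias", shape=(self.units,),
                                    initializer="zeros")

    def call(self, inputs):
        # Only the forward pass is defined; no backward pass is written by hand.
        return tf.matmul(inputs, self.w_a) * tf.matmul(inputs, self.w_b) + self.bias


layer = QuadraticNeuronLayer(4)
x = tf.constant(np.random.rand(5, 3).astype("float32"))
with tf.GradientTape() as tape:
    y = layer(x)
    loss = tf.reduce_mean(tf.square(y))
# TensorFlow differentiates through the custom forward computation.
grads = tape.gradient(loss, layer.trainable_weights)
```

Because the layer is built from ordinary tensor operations, it runs on the GPU like any built-in layer and can be stacked with Dense or Conv2D layers in a model.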
I'm guessing that DNN in the sense used in TensorFlow means "deep neural network". But I find this deeply confusing since the notion of a "deep" neural network seems to be in wide use elsewhere to mean a network with typically several convolutional and/or associated layers (ReLU, pooling, dropout, etc).
In contrast, in the first place many people will encounter this term (the tf.estimator Quickstart example code) we find:
# Build 3 layer DNN with 10, 20, 10 units respectively.
classifier = tf.estimator.DNNClassifier(feature_columns=feature_columns,
                                        hidden_units=[10, 20, 10],
                                        n_classes=3,
                                        model_dir="/tmp/iris_model")
This sounds suspiciously shallow, and even more suspiciously like an old-style multilayer perceptron (MLP) network. However, there is no mention of DNN as an alternative term on that close-to-definitive source. So is a DNN in the TensorFlow tf.estimator context actually an MLP? Documentation on the hidden_units parameter suggests this is the case:
hidden_units: Iterable of number hidden units per layer. All layers are fully connected. Ex. [64, 32] means first layer has 64 nodes and second one has 32.
That has MLP written all over it. Is this understanding correct? Is DNN therefore a misnomer, and if so should DNNClassifier ideally be deprecated in favour of MLPClassifier? Or does DNN stand for something other than deep neural network?
Give me your definition of "deep" neural network and you get your answer.
But yes, it is simply an MLP, and MLPClassifier would indeed be a more accurate name. But it does not sound as cool as the current name.
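To make the MLP reading concrete, here is a sketch of the same 10-20-10 architecture written as plain fully connected Keras layers. It is an approximation for illustration, not the estimator's actual source: build_mlp is a hypothetical helper, n_features=4 assumes the four Iris features, and ReLU is assumed as the hidden activation.

```python
import numpy as np
import tensorflow as tf


def build_mlp(n_features=4, hidden_units=(10, 20, 10), n_classes=3):
    """Hypothetical stand-in for hidden_units=[10, 20, 10], n_classes=3."""
    inputs = tf.keras.Input(shape=(n_features,))
    x = inputs
    for units in hidden_units:
        # Every hidden layer is fully connected, exactly as in a classic MLP.
        x = tf.keras.layers.Dense(units, activation="relu")(x)
    logits = tf.keras.layers.Dense(n_classes)(x)
    return tf.keras.Model(inputs, logits)


mlp = build_mlp()
dense_layers = [layer for layer in mlp.layers if isinstance(layer, tf.keras.layers.Dense)]
layer_widths = [layer.units for layer in dense_layers]
n_params = mlp.count_params()
print(layer_widths, n_params)
```

There are no convolutional, pooling, or recurrent layers anywhere in this stack, only weight matrices and biases.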
First of all your definition of DNN is a bit misleading.
There are several architectures of deep neural networks. Even Deep Feedforward Networks are nothing more than MLPs with many layers, plus some techniques to make them attractive.
Some works have used "DNNs" to span all deep learning architectures. By convention, however, "DNNs" refers to architectures that use deep forward-propagation networks, also called Deep Feedforward Networks.
The most important example of a deep learning model is the deep feedforward network, or multilayer perceptron (MLP). An MLP is just a mathematical function that maps a set of input values to output values. The function is formed by composing many simpler functions. You can think of each application of a different mathematical function as providing a new representation of the input.
Therefore, it makes sense that this estimator is called DNNClassifier.
My advice is to read this book here.
I am loading a pre-trained network and would like to change/set the "learningRateMultiplier" for various layers. I have done this before using Brainscript (link see below), but need to do this now from within Python. Is this supported? Or is there any other way in Python to set per-layer specific learning rates?
Brainscript:
https://github.com/Microsoft/CNTK/wiki/Parameters-And-Constants
To give some context: I would like to fine-tune all layers in Fast R-CNN training including the conv layers. However past experiments indicate that this requires smaller learning rates for the conv layers compared to the fc layers (perhaps due to the gradients from all ROIs being added up or otherwise combined).
Thanks,
Patrick
Unless a better alternative surfaces, I'd recommend creating two learners with different learning rates and disjoint parameters arguments. You can provide a list with multiple learners to your model's Trainer, which should coordinate their use during training.