What does DNN mean in a TensorFlow Estimator.DNNClassifier? - tensorflow

I'm guessing that DNN in the sense used in TensorFlow means "deep neural network". But I find this deeply confusing since the notion of a "deep" neural network seems to be in wide use elsewhere to mean a network with typically several convolutional and/or associated layers (ReLU, pooling, dropout, etc).
In contrast, in the first place many people will encounter this term (the tf.estimator Quickstart example code), we find:
# Build 3 layer DNN with 10, 20, 10 units respectively.
classifier = tf.estimator.DNNClassifier(feature_columns=feature_columns,
                                        hidden_units=[10, 20, 10],
                                        n_classes=3,
                                        model_dir="/tmp/iris_model")
This sounds suspiciously shallow, and even more suspiciously like an old-style multilayer perceptron (MLP) network. However, there is no mention of DNN as an alternative term on that close-to-definitive source. So is a DNN in the TensorFlow tf.estimator context actually an MLP? Documentation on the hidden_units parameter suggests this is the case:
hidden_units: Iterable of number hidden units per layer. All layers are fully connected. Ex. [64, 32] means first layer has 64 nodes and second one has 32.
That has MLP written all over it. Is this understanding correct? Is DNN therefore a misnomer, and if so should DNNClassifier ideally be deprecated in favour of MLPClassifier? Or does DNN stand for something other than deep neural network?

Give me your definition of a "deep" neural network and you'll have your answer.
But yes, it is simply an MLP, and MLPClassifier would indeed be a more accurate name. It just does not sound as cool as the current one.

First of all, your definition of DNN is a bit misleading.
There are several architectures of deep neural networks. Even Deep Feedforward Networks are nothing more than MLPs with multiple layers, plus some techniques that make them attractive.
Some works have used "DNN" to span all deep learning architectures; by convention, however, "DNN" refers to architectures that use deep forward-propagation networks, also called Deep Feedforward Networks.
The most important example of a deep learning model is the deep feedforward network, or multilayer perceptron (MLP). An MLP is just a mathematical function that maps a set of input values to output values. The function is formed by composing many simpler functions. You can think of each application of a different mathematical function as providing a new representation of the input.
Therefore, it makes sense that this estimator is called DNNClassifier.
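The "composition of simpler functions" view can be sketched in a few lines of plain Python. This is only an illustration, not how DNNClassifier is implemented; the helper names (dense, compose) and the hand-picked weights are hypothetical.

```python
def relu(z):
    """Rectified linear activation."""
    return max(0.0, z)


def identity(z):
    """No-op activation for the output layer."""
    return z


def dense(weights, biases, activation):
    """Return a function computing activation(W x + b) for one fully connected layer."""
    def layer(x):
        return [
            activation(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, biases)
        ]
    return layer


def compose(*layers):
    """Chain layers so the output of one becomes the input of the next."""
    def network(x):
        for layer in layers:
            x = layer(x)  # each layer yields a new representation of the input
        return x
    return network


# Hypothetical weights: 2 inputs -> 2 hidden units -> 1 output.
hidden = dense([[1.0, -1.0], [0.5, 0.5]], [0.0, -1.0], relu)
output = dense([[2.0, 1.0]], [0.5], identity)
mlp = compose(hidden, output)

print(mlp([3.0, 1.0]))
```

Adding more entries to hidden_units corresponds to passing more dense layers to compose; nothing else about the structure changes, which is why "deep feedforward network" and "MLP" describe the same thing here.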
My advice is to read this book here.

Related

Differences between different attention layers for Keras

I am trying to add an attention layer for my text classification model. The inputs are texts (e.g. movie review), the output is a binary outcome (e.g. positive vs negative).
model = Sequential()
model.add(Embedding(max_features, 32, input_length=maxlen))
model.add(Bidirectional(CuDNNGRU(16,return_sequences=True)))
##### add attention layer here #####
model.add(Dense(1, activation='sigmoid'))
After some searching, I found a couple of ready-to-use attention layers for Keras. There is the keras.layers.Attention layer that is built into Keras. There are also the SeqWeightedAttention and SeqSelfAttention layers in the keras-self-attention package. As a person who is relatively new to the deep learning field, I have a hard time understanding the mechanism behind these layers.
What does each of these layers do? Which one would be best for my model?
Thank you very much!
If you are using an RNN, I would not recommend using the keras.layers.Attention class.
While reading the tf.keras.layers.Attention code on GitHub to better understand how to use it, the first line I came across was: "This class is suitable for Dense or CNN networks, and not for RNN networks".
There is another open-source version maintained by CyberZHG called keras-self-attention. To the best of my knowledge this is NOT part of the Keras or TensorFlow library and seems to be an independent piece of code. It contains the two classes you mentioned: SeqWeightedAttention and SeqSelfAttention. The former returns a 2D value and the latter a 3D value, so SeqWeightedAttention should work for your situation. The former seems to be loosely based on Raffel et al. and can be used for sequence classification; the latter seems to be a variation of Bahdanau attention.
In general, I would suggest that you write your own sequence-to-classification model. The attention piece can be added in less than half a dozen lines of code (bare-bones essence), which takes much less time than you would spend integrating, debugging, or understanding the code in these external libraries.
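To make the 3D-to-2D idea concrete, here is a framework-free sketch of attention pooling for a single sequence. It is a simplification, not the keras-self-attention implementation: the scoring is a plain dot product with a context vector, which in a real model would be a learned weight. The names attention_pool and context are hypothetical.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states, context):
    """Collapse a (timesteps, features) sequence into one (features,) vector.

    hidden_states: per-timestep feature vectors, e.g. the outputs of a GRU
                   with return_sequences=True.
    context: scoring vector with one entry per feature (assumed to be learned
             in a real model; fixed here for illustration).
    """
    # One scalar score per timestep.
    scores = [sum(h_i * c_i for h_i, c_i in zip(h, context)) for h in hidden_states]
    weights = softmax(scores)
    # Weighted sum over timesteps, computed feature by feature.
    n_features = len(hidden_states[0])
    pooled = [
        sum(w * h[j] for w, h in zip(weights, hidden_states))
        for j in range(n_features)
    ]
    return pooled, weights


states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 timesteps, 2 features
pooled, weights = attention_pool(states, [2.0, 2.0])
print(pooled, weights)
```

With a batch dimension added, the input is 3D (batch, timesteps, features) and the pooled output is 2D (batch, features), which is the shape a final Dense(1, activation='sigmoid') layer expects.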
Please refer to: Create an LSTM layer with Attention in Keras for multi-label text classification neural network

Difference between high and low level libraries

What is the difference between high-level and low-level libraries?
I understand that Keras is a high-level library and TensorFlow is a low-level library, but I'm still not familiar enough with these frameworks to understand what "high" vs "low" level means for libraries.
Keras is a high-level deep learning (DL) API. Key components of the API are:
Model - to define the Neural network(NN).
Layers - building blocks of the NN model (e.g. Dense, Convolution).
Optimizers - different methods for doing gradient descent to learn weights of NN (e.g. SGD, Adam).
Losses - objective functions that the optimizer should minimize for use cases like classification, regression (e.g. categorical_crossentropy, MSE).
Moreover, it provides reasonable defaults for the APIs e.g. learning rates for Optimizers, which would work for the common use cases. This reduces the cognitive load on the user during the learning phase.
The 'Guiding Principles' section here is very informative:
https://keras.io/
The mathematical operations involved in running the neural networks themselves, like convolutions, matrix multiplications etc., are delegated to the backend. One of the backends supported by Keras is TensorFlow.
To highlight the differences with a code snippet:
Keras
# Define Neural Network
model = Sequential()
# Add Layers to the Network
model.add(Dense(512, activation='relu', input_shape=(784,)))
....
# Define objective function and optimizer
model.compile(loss='categorical_crossentropy',
              optimizer=Adam(),
              metrics=['accuracy'])
# Train the model for certain number of epochs by feeding train/validation data
history = model.fit(x_train, y_train,
                    batch_size=batch_size,
                    epochs=epochs,
                    verbose=1,
                    validation_data=(x_test, y_test))
Tensorflow
It ain't a code snippet anymore :) since you need to define everything starting from the Variables that would store the weights, the connections between the layers, the training loop, creating batches of data to do the training etc.
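To give a feel for what "define everything" involves, here is a minimal sketch in plain Python (deliberately not TensorFlow code) that fits a one-weight linear model by hand. The toy data and the train function are made up for illustration; the point is only that the variables, the gradients, the batching, and the training loop are all the author's responsibility at this level.

```python
# Toy data for y = 2x + 1 (made up for illustration).
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0 * x + 1.0 for x in xs]


def train(xs, ys, learning_rate=0.02, epochs=2000, batch_size=2):
    """Fit y = w * x + b with hand-written mini-batch gradient descent."""
    w, b = 0.0, 0.0  # the "variables" that store the weights
    for _ in range(epochs):
        # Create batches of data by hand.
        for start in range(0, len(xs), batch_size):
            batch_x = xs[start:start + batch_size]
            batch_y = ys[start:start + batch_size]
            n = len(batch_x)
            # Forward pass.
            preds = [w * x + b for x in batch_x]
            # Gradients of the mean squared error, derived by hand.
            grad_w = sum(2.0 * (p - y) * x for p, y, x in zip(preds, batch_y, batch_x)) / n
            grad_b = sum(2.0 * (p - y) for p, y in zip(preds, batch_y)) / n
            # Parameter update (what an optimizer does for you in Keras).
            w -= learning_rate * grad_w
            b -= learning_rate * grad_b
    return w, b


w, b = train(xs, ys)
print(w, b)
```

In Keras, everything inside train is replaced by model.compile and model.fit, which is the sense in which it is the higher-level API.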
You can refer to the links below to compare the code complexity of training on MNIST (the DL "Hello world" example) in Keras vs TensorFlow.
https://github.com/keras-team/keras/blob/master/examples/mnist_mlp.py
https://github.com/aymericdamien/TensorFlow-Examples/blob/master/examples/3_NeuralNetworks/multilayer_perceptron.py
Considering the benefits that come with Keras, Tensorflow has made tf.keras the high-level API in Tensorflow 2.0.
https://www.tensorflow.org/tutorials/
High level means that your interactions are closer to writing English, and the code you write is essentially more understandable to humans.
An example of low level would be a language in which you would have to do things such as allocate memory, copy data from one memory address to another etc.
Keras is considered high level because you can make a neural network in just a few lines of code, the library will handle all the complexity for you.
In tensorflow (I haven't used it), you probably have to write many more lines of code to achieve the same thing, but probably have a greater degree of control. Reading tensorflow code for a NN would be less meaningful to a layman than reading keras code for a NN.
Keras sits on top of Tensorflow, and thus the framework is relatively 'higher-level' than Tensorflow itself.
A 'high' level language or framework is typically defined as one that has a greater number of dependencies or has a greater distance from core binary code, relative to a lower-level language or framework.
E.g., jQuery would be considered higher-level than JavaScript, as it depends on JavaScript, whereas JavaScript would be considered higher-level than assembly code, as it is ultimately interpreted or compiled down to machine code.

Tensorflow: How to create new neuron (Not perceptron neuron)

So tensorflow is extremely useful at creating neural networks that involve perceptron neurons. However, if one wanted to use a new type of neuron instead of the classic perceptron neuron, is this possible through augmenting tensorflow code? I can't seem to find an answer. I understand this would change the forward propagation, and more mathematical calculations, and I am willing to change all the necessary areas.
I am also aware that I can just code the layers and neurons I have in mind from scratch, but TensorFlow has GPU integration, so it seems better to modify its code than to create my own from scratch.
Has anyone experimented with this? My goal is to create neural network structures that use a different type of neuron than the classic perceptron.
If someone knows where in TensorFlow I could look to see where the perceptron neurons are initialized, I would very much appreciate it!
Edit:
To be more specific, is it possible to alter code in TensorFlow so that modules such as tf.layers or tf.nn (conv2D, batch-norm, max-pool, etc.) use a different neuron type rather than the perceptron? I can figure out the details. I just need to know where (I'm sure there are a few locations) I would go about changing code for this.
"However, if one wanted to use a new type of neuron instead of the classic perceptron neuron, is this possible through augmenting tensorflow code?"
Yes. Tensorflow provides you the possibility to define a computational graph. It then can automatically calculate the gradient for that. No need to do it yourself. This is the reason why you define it symbolically. You might want to read the whitepaper or start with a tutorial.
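The idea of building a graph and letting the framework derive the gradients can be illustrated with a tiny reverse-mode autodiff sketch in plain Python. This is not TensorFlow's implementation or API; the Value class and the quadratic_neuron unit are hypothetical, chosen only to show that a non-perceptron neuron needs no hand-written gradient once it is expressed through differentiable operations.

```python
class Value:
    """A scalar node in a computational graph that remembers how it was made."""

    def __init__(self, data, parents=(), local_grads=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents          # nodes this value was computed from
        self._local_grads = local_grads  # d(self)/d(parent) for each parent

    def __add__(self, other):
        return Value(self.data + other.data, (self, other), (1.0, 1.0))

    def __mul__(self, other):
        return Value(self.data * other.data, (self, other), (other.data, self.data))

    def backward(self):
        """Propagate d(self)/d(node) to every node that self depends on."""
        order, seen = [], set()

        def visit(node):
            # Depth-first traversal that lists parents before children.
            if id(node) not in seen:
                seen.add(id(node))
                for parent in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        # Walk from the output back to the inputs, applying the chain rule.
        for node in reversed(order):
            for parent, local in zip(node._parents, node._local_grads):
                parent.grad += local * node.grad


def quadratic_neuron(x, w, b):
    """Hypothetical non-perceptron unit: y = (w * x)^2 + b."""
    z = w * x
    return z * z + b


x, w, b = Value(3.0), Value(0.5), Value(1.0)
y = quadratic_neuron(x, w, b)
y.backward()
print(y.data, w.grad, x.grad, b.grad)
```

In TensorFlow the same pattern applies: express the new neuron with existing tensor operations and the gradients come for free, so there is no need to edit the library's internals.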

Setting learningRateMultiplier in CNTK from within Python

I am loading a pre-trained network and would like to change/set the "learningRateMultiplier" for various layers. I have done this before using Brainscript (link see below), but need to do this now from within Python. Is this supported? Or is there any other way in Python to set per-layer specific learning rates?
Brainscript:
https://github.com/Microsoft/CNTK/wiki/Parameters-And-Constants
To give some context: I would like to fine-tune all layers in Fast R-CNN training including the conv layers. However past experiments indicate that this requires smaller learning rates for the conv layers compared to the fc layers (perhaps due to the gradients from all ROIs being added up or otherwise combined).
Thanks,
Patrick
Unless a better alternative surfaces, I'd recommend creating two learners with different learning rates and disjoint parameters arguments. You can provide a list with multiple learners to your model's Trainer, which should coordinate their use during training.

Define custom model/architecture in TensorFlow

From the little I have played around with TensorFlow I see it has already-implemented architectures like RNN/LSTM cells, ConvNets, etc. Is there a way to define one's "custom" architecture (e.g. an "enhanced" LSTM network with a few convolutional layers)?
Yes, it is totally possible. The outputs of an LSTM, or of any network, are tensors, which can be used as the input of another network.
See how to combine them at https://github.com/jazzsaxmafia/show_and_tell.tensorflow.
You can find more examples at https://github.com/TensorFlowKR/awesome_tensorflow_implementations.