Difference between high-level and low-level libraries - tensorflow

What is the difference between high-level and low-level libraries?
I understand that Keras is a high-level library and TensorFlow is a low-level library, but I'm still not familiar enough with these frameworks to understand what that means for high- vs low-level libraries.

Keras is a high-level Deep Learning (DL) API. Key components of the API are:
Model - to define the neural network (NN).
Layers - building blocks of the NN model (e.g. Dense, Convolution).
Optimizers - different methods of gradient descent for learning the weights of the NN (e.g. SGD, Adam).
Losses - objective functions that the optimizer should minimize, for use cases like classification and regression (e.g. categorical_crossentropy, MSE).
Moreover, it provides reasonable defaults for the APIs, e.g. learning rates for the optimizers, which work for the common use cases. This reduces the cognitive load on the user during the learning phase.
The 'Guiding Principles' section here is very informative:
https://keras.io/
The mathematical operations involved in running the neural networks themselves, like convolutions and matrix multiplications, are delegated to the backend. One of the backends supported by Keras is TensorFlow.
To highlight the differences with a code snippet:
Keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import Adam

# Define the neural network
model = Sequential()
# Add layers to the network
model.add(Dense(512, activation='relu', input_shape=(784,)))
# ...
# Define the objective function and optimizer
model.compile(loss='categorical_crossentropy',
              optimizer=Adam(),
              metrics=['accuracy'])
# Train the model for a number of epochs by feeding in train/validation data
history = model.fit(x_train, y_train,
                    batch_size=batch_size,
                    epochs=epochs,
                    verbose=1,
                    validation_data=(x_test, y_test))
TensorFlow
This one no longer fits in a short snippet :) since you need to define everything yourself, starting from the Variables that store the weights, the connections between the layers, the training loop, creating batches of data for training, etc.
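To give a flavour of the difference, here is a hedged TF2-style sketch of roughly what the Keras snippet above takes care of for you. The layer sizes mirror the Keras example, the labels are assumed to be one-hot encoded, and the batching and epoch loop around train_step are still your responsibility:

import tensorflow as tf

# Weights and biases must be created explicitly as Variables.
W_h = tf.Variable(tf.random.normal([784, 512], stddev=0.05))
b_h = tf.Variable(tf.zeros([512]))
W_out = tf.Variable(tf.random.normal([512, 10], stddev=0.05))
b_out = tf.Variable(tf.zeros([10]))
optimizer = tf.optimizers.Adam()

def forward(x):
    h = tf.nn.relu(tf.matmul(x, W_h) + b_h)  # wire the layers by hand
    return tf.matmul(h, W_out) + b_out       # output logits

def train_step(x_batch, y_batch):
    # Compute the loss and gradients manually, then apply them.
    with tf.GradientTape() as tape:
        loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(
            labels=y_batch, logits=forward(x_batch)))
    variables = [W_h, b_h, W_out, b_out]
    optimizer.apply_gradients(zip(tape.gradient(loss, variables), variables))
    return loss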
You can refer to the links below to compare the code complexity of training MNIST (the DL "Hello World" example) in Keras vs TensorFlow.
https://github.com/keras-team/keras/blob/master/examples/mnist_mlp.py
https://github.com/aymericdamien/TensorFlow-Examples/blob/master/examples/3_NeuralNetworks/multilayer_perceptron.py
Considering the benefits that come with Keras, TensorFlow has made tf.keras the high-level API in TensorFlow 2.0.
https://www.tensorflow.org/tutorials/

High-level means that your interactions are closer to writing English, and the code you write is generally more understandable to humans.
An example of low-level would be a language in which you have to do things such as allocate memory and copy data from one memory address to another.
Keras is considered high-level because you can build a neural network in just a few lines of code; the library handles all the complexity for you.
In TensorFlow (I haven't used it), you probably have to write many more lines of code to achieve the same thing, but you probably get a greater degree of control. Reading TensorFlow code for an NN would be less meaningful to a layman than reading Keras code for the same NN.

Keras sits on top of TensorFlow, and thus the framework is relatively 'higher-level' than TensorFlow itself.
A 'high-level' language or framework is typically one that has a greater number of dependencies, or a greater distance from the machine code that actually runs, relative to a lower-level language or framework.
E.g., jQuery would be considered higher-level than JavaScript, as it depends on JavaScript, whereas JavaScript would be considered higher-level than machine code, since it is ultimately interpreted or JIT-compiled down to machine code.

Related

Strategies for pre-training models for use in tfjs

This is a more general version of a question I've already asked: Significant difference between outputs of deep tensorflow keras model in Python and tensorflowjs conversion
As far as I can tell, the layers of a tfjs model run in the browser (so far I've only tested Chrome and Firefox) show small numerical differences in their output values compared to the same model run in Python or Node. The cumulative effect of these small differences across all the layers of the model can cause fairly significant differences in the final output. See here for an example of this.
This means a model trained in Python or Node will not perform as well, in terms of accuracy, when run in the browser. And the deeper the model, the worse it gets.
My question, therefore, is: what is the best way to train a model for use with tfjs in the browser? Is there a way to ensure the output will be identical? Or do you just have to accept that there will be small numerical differences, and, if so, are there methods that can be used to train a model to be more resilient to them?
This answer is based on my personal observations. As such, it is debatable and not backed by much evidence. Some things I follow to get the accuracy of 16-bit models close to that of 32-bit models are:
Avoid using activations that have small upper and lower bounds, such as sigmoid or tanh, for hidden layers. These activations make the weights of the next layer very sensitive to small values, and hence to small changes. I prefer using ReLU for such models; since it is now the standard activation for hidden layers in most models, you should be using it in any case.
Avoid weight decay and L1/L2 regularization on weights while training (the kernel_regularizer parameter in Keras), since these increase the sensitivity of the weights. Use Dropout instead; I didn't observe a major drop in performance on TFLite when using it instead of numerical regularizers.
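For illustration, here is a minimal Keras sketch that follows both suggestions; the layer sizes and dropout rate are placeholders I chose, not values from the answer:

from tensorflow.keras import layers, models

model = models.Sequential([
    # ReLU hidden layers instead of bounded activations like sigmoid/tanh
    layers.Dense(256, activation='relu', input_shape=(784,)),
    layers.Dropout(0.3),   # Dropout instead of kernel_regularizer
    layers.Dense(128, activation='relu'),
    layers.Dropout(0.3),
    # a bounded activation only at the output layer
    layers.Dense(10, activation='softmax'),
])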

Best case to use tensorflow

I followed all the steps mentioned in the article:
https://stackabuse.com/tensorflow-2-0-solving-classification-and-regression-problems/
Then I compared the results with linear regression and found that its error (68) is lower than that of the TensorFlow model (84).
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Note: despite the name, this is linear (not logistic) regression
logreg_clf = LinearRegression()
logreg_clf.fit(X_train, y_train)
pred = logreg_clf.predict(X_test)
print(np.sqrt(mean_squared_error(y_test, pred)))  # RMSE
Does this mean that if I have a large dataset, I will get better results with TensorFlow than with linear regression?
What is the best situation in which to use TensorFlow?
Answering your first question: neural networks are notorious for overfitting on smaller datasets, and here you are comparing the performance of a simple linear regression model with that of a neural network with two hidden layers on the test set. So it's not very surprising to see the MLP model fall behind the linear regression model (assuming you are working with a relatively small dataset). Larger datasets will definitely help the neural network learn more accurate parameters and generalize better.
Coming to your second question: TensorFlow is basically a library for building deep learning models. Whenever you work on a deep learning problem like image recognition or natural language processing, you need massive computational power and will be processing a ton of data to train your models. This is where TensorFlow comes in handy: it offers GPU support, which significantly speeds up training that would otherwise be practically impossible. Moreover, if you are building a product that has to be deployed in a production environment, you can make use of TensorFlow Serving, which helps you take your models much closer to your customers.
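As a quick sanity check that TensorFlow can actually see a GPU on your machine, you can list the visible devices (standard TF2 API; the output depends on your hardware):

import tensorflow as tf

# An empty list means training will fall back to the CPU.
print(tf.config.list_physical_devices('GPU'))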

Differences between different attention layers for Keras

I am trying to add an attention layer for my text classification model. The inputs are texts (e.g. movie review), the output is a binary outcome (e.g. positive vs negative).
model = Sequential()
model.add(Embedding(max_features, 32, input_length=maxlen))
model.add(Bidirectional(CuDNNGRU(16,return_sequences=True)))
##### add attention layer here #####
model.add(Dense(1, activation='sigmoid'))
After some searching, I found a couple of ready-to-use attention layers for Keras. There is the keras.layers.Attention layer that is built into Keras. There are also the SeqWeightedAttention and SeqSelfAttention layers in the keras-self-attention package. As a person who is relatively new to the deep learning field, I have a hard time understanding the mechanism behind these layers.
What does each of these layers do? Which one would be best for my model?
Thank you very much!
If you are using an RNN, I would not recommend using the keras.layers.Attention class.
While analysing the tf.keras.layers.Attention GitHub code to better understand how to use it, the first line I came across was: "This class is suitable for Dense or CNN networks, and not for RNN networks".
There is another open-source version maintained by CyberZHG called keras-self-attention. To the best of my knowledge, this is NOT part of the Keras or TensorFlow library and seems to be an independent piece of code. It contains the two classes you mentioned: SeqWeightedAttention and SeqSelfAttention. The former returns a 2D value and the latter a 3D value, so SeqWeightedAttention should work for your situation. The former seems to be loosely based on Raffel et al. and can be used for sequence classification; the latter seems to be a variation of Bahdanau attention.
In general, I would suggest you write your own sequence-to-classification model. The attention piece can be added in less than half a dozen lines of code (in its bare-bones essence), much less than the time you would spend integrating, debugging, or understanding the code in these external libraries.
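For instance, a minimal Raffel-style weighted-attention layer might look like the sketch below. It is an illustration of the idea, not the library code, and it assumes the incoming tensor has shape (batch, timesteps, units), e.g. the output of your Bidirectional GRU with return_sequences=True:

import tensorflow as tf
from tensorflow.keras import layers

class SimpleAttention(layers.Layer):
    def build(self, input_shape):
        # One learned scoring vector over the feature dimension.
        self.w = self.add_weight(shape=(int(input_shape[-1]), 1),
                                 initializer='glorot_uniform',
                                 name='attention_weight')

    def call(self, h):
        scores = tf.matmul(h, self.w)               # (batch, timesteps, 1)
        weights = tf.nn.softmax(scores, axis=1)     # attention over timesteps
        return tf.reduce_sum(weights * h, axis=1)   # (batch, units) context vector

Dropped in at the placeholder comment in your model, it collapses the 3D GRU output into a 2D context vector that the final Dense(1, activation='sigmoid') layer can consume.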
Please refer: Create an LSTM layer with Attention in Keras for multi-label text classification neural network

Preventing overfitting in transfer learning using TensorFlow and Keras

I've got a TensorFlow 2 model with a pre-trained Keras layer coming from TensorFlow Hub. I want to fine-tune the weights in this sub-model to suit my dataset, but if I do that naively, by setting trainable=True and training=True, my model grossly overfits.
If I had the actual layers of the underlying model under my control, I would insert dropout layers or set an L2 coefficient on those individual layers. But the layers are imported into my network using the TensorFlow Hub KerasLayer method. Also, I suspect that the underlying model is quite complicated.
I wonder what the standard practice is for solving this kind of issue.
Maybe there is a way to force regularization onto the whole network somehow? I know that in TensorFlow 1 there were optimizers like ProximalAdagradOptimizer that took L2 coefficients. In TensorFlow 2 the only optimizer like this is FTRL, but it's hard for me to make it work for my dataset.
I "solved" it by
pretraining non-transferred parts of the model,
then turning on learning for the shared layers,
introducing early stopping,
and configuring the optimizer to go really slow.
This way, I managed not to damage the transferred layers too much. Anyway, I still wonder whether this is the best one can do.
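A rough Keras sketch of those steps; the hub handle, head layers, learning rates, and patience are illustrative placeholders rather than values from the answer, and x_train/y_train and x_val/y_val are assumed to hold your data:

import tensorflow as tf
import tensorflow_hub as hub

# Hypothetical hub module; substitute your own handle for the elided URL.
base = hub.KerasLayer("https://tfhub.dev/...", trainable=False)

model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid'),
])

# Phase 1: pre-train only the non-transferred head while the hub layer is frozen.
model.compile(optimizer=tf.keras.optimizers.Adam(1e-3), loss='binary_crossentropy')
model.fit(x_train, y_train, validation_data=(x_val, y_val), epochs=5)

# Phase 2: unfreeze the transferred layers, recompile with a very small
# learning rate, and stop early as soon as validation loss degrades.
base.trainable = True
model.compile(optimizer=tf.keras.optimizers.Adam(1e-5), loss='binary_crossentropy')
early_stop = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=2,
                                              restore_best_weights=True)
model.fit(x_train, y_train, validation_data=(x_val, y_val),
          epochs=20, callbacks=[early_stop])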

What does DNN mean in a TensorFlow Estimator.DNNClassifier?

I'm guessing that DNN in the sense used in TensorFlow means "deep neural network". But I find this deeply confusing, since the notion of a "deep" neural network seems to be in wide use elsewhere to mean a network with typically several convolutional and/or associated layers (ReLU, pooling, dropout, etc.).
In contrast, in the first place many people will encounter this term (the tf.estimator Quickstart example code), we find:
# Build a 3-layer DNN with 10, 20, 10 units respectively.
classifier = tf.estimator.DNNClassifier(feature_columns=feature_columns,
                                        hidden_units=[10, 20, 10],
                                        n_classes=3,
                                        model_dir="/tmp/iris_model")
This sounds suspiciously shallow, and even more suspiciously like an old-style multilayer perceptron (MLP) network. However, there is no mention of DNN as an alternative term on that close-to-definitive source. So is a DNN in the TensorFlow tf.estimator context actually an MLP? The documentation of the hidden_units parameter suggests it is:
hidden_units: Iterable of number hidden units per layer. All layers are fully connected. Ex. [64, 32] means first layer has 64 nodes and second one has 32.
That has MLP written all over it. Is this understanding correct? Is DNN therefore a misnomer, and if so, should DNNClassifier ideally be deprecated in favour of MLPClassifier? Or does DNN stand for something other than "deep neural network"?
Give me your definition of a "deep" neural network and you get your answer.
But yes, it is simply an MLP, and a proper name would indeed be MLPClassifier. It just doesn't sound as cool as the current name.
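To make the equivalence concrete, here is a hedged sketch of the same architecture as a plain Keras MLP. The input size of 4 assumes the four Iris features, and ReLU reflects DNNClassifier's default activation:

from tensorflow import keras

mlp = keras.Sequential([
    keras.layers.Dense(10, activation='relu', input_shape=(4,)),  # 4 Iris features
    keras.layers.Dense(20, activation='relu'),
    keras.layers.Dense(10, activation='relu'),
    keras.layers.Dense(3, activation='softmax'),  # n_classes=3
])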
First of all, your definition of a DNN is a bit misleading.
There are several architectures of deep neural networks, among them deep feedforward networks, which are nothing more than multilayered MLPs plus some techniques that make them practical.
Some works have used "DNN" to span all deep learning architectures; however, by convention, "DNN" refers to architectures that use deep feedforward propagation, also called deep feedforward networks.
The quintessential example of a deep learning model is the deep feedforward network, or multilayer perceptron (MLP). An MLP is just a mathematical function that maps some set of input values to output values, and that function is formed by composing many simpler functions; you can think of each application of a simpler function as producing a new representation of the input.
Therefore, it makes sense that this estimator is called DNNClassifier.
My advice is to read this book here.