What's the relationship between Tensorflow's dataflow graph and DNN? - tensorflow

As we know, a DNN is composed of many layers, each consisting of many neurons that apply the same function to different parts of the input. Meanwhile, if we use TensorFlow to run a DNN task, TensorFlow automatically generates a dataflow graph, which we can visualize with TensorBoard. But there is no neuron in the layers shown there. So I wonder: what is the relationship between TensorFlow's dataflow graph and a DNN? When a neuron in a DNN layer is mapped into the dataflow graph, how is it represented? What is the relationship between a neuron in a DNN and a node in TensorFlow (which represents an operation)? I just started learning DNNs and TensorFlow, so please help me put my thoughts in order. Thanks :)

You have to differentiate between the metaphorical representation of a DNN and its mathematical description. The math behind a classic neuron is the sum of the weighted inputs plus a bias (usually with an activation function applied to the result).
So in this case you have an input vector multiplied by a weight vector (containing trainable variables) and then added to a bias scalar (also trainable).
If you now consider a layer of neurons instead of one, the weights become a matrix and the bias a vector. So calculating a feed-forward layer is nothing more than a matrix multiplication followed by a vector addition.
This is the operation you can see in your TensorFlow graph.
You can actually build your neural network this way without using the so-called high-level API, which provides the layer abstraction. (Many did this in the early days of TensorFlow.)
The actual "magic" that TensorFlow does for you is calculating and executing the derivatives of this forward pass in order to compute the updates for the weights.
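To make the "layer = matrix multiplication + bias" point concrete, here is a minimal NumPy sketch (not TensorFlow code). The sizes, the random values and the choice of ReLU are assumptions picked only for illustration. It computes a layer once as a whole and once neuron by neuron and checks that both agree.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 4 input features, a layer of 3 neurons.
x = rng.normal(size=(1, 4))   # one input sample as a row vector
W = rng.normal(size=(4, 3))   # column j holds the weights of neuron j
b = rng.normal(size=(3,))     # one bias per neuron


def relu(z):
    """Element-wise activation function."""
    return np.maximum(z, 0.0)


# The whole layer at once: this is the matmul + add you see as graph nodes.
layer_out = relu(x @ W + b)

# The same computation written "one neuron at a time".
per_neuron = np.array([[relu(x[0] @ W[:, j] + b[j]) for j in range(3)]])

print(np.allclose(layer_out, per_neuron))
```

So the individual neurons do not disappear; each one is simply a column of the weight matrix plus one entry of the bias vector.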

Related

Deploying tensorflow RNN models (other than keras LSTM) to a microcontroller without unrolling the network?

Goal
I want to compare different types of RNN tflite-micro models, built using tensorflow, on a microcontroller based on their accuracy, model size and inference time. I have also created my own custom RNN cell that I want to compare with the LSTM cell, GRU cell, and SimpleRNN cell. I create the tensorflow model using tf.keras.layers.RNN(Cell(...)).
Problem
I have successfully deployed a keras LSTM-RNN using tf.keras.layers.LSTM(...) but when I create the same model using tf.keras.layers.RNN(tf.keras.layers.LSTMCell(...)) and deploy it to the microcontroller, then it does not work. I trained both networks on a batch size of 64, but then I copy the weights and biases to a model where the batch_size is fixed to 1 as tflite-micro does not support dynamic batch sizes.
When the keras LSTM layer is converted to a tflite model it creates a fused operator called UnidirectionalSequenceLSTM but the network created with an RNN layer using the LSTMCell does not have that UnidirectionalSequenceLSTM operator, instead it has a reshape and while operator. The first network has only 1 subgraph but the second has 3 subgraphs.
When I run that second model on the microcontroller, two things go wrong:
the interpreter returns the same result for different inputs
the interpreter fails on some inputs reporting an error with the while loop saying that int32 is not supported (which is in the while operator, and can't be quantized to int8)
LSTM tflite-model visualized with Netron
RNN(LSTMCell) tflite-model visualized with Netron
Bad solution (10x model size)
I figured out that by unrolling the second network I can successfully deploy it and get correct results on the microcontroller. However, that increases the model size 10x which is really bad as we are trying to deploy the model on a resource constrained device.
Better solution?
I have explained the problem using the example of the LSTM layer (works) and LSTM cell in an RNN layer (does not work), but I want to be able to deploy a model using the GRU cell, SimpleRNN cell, and of course the custom cell that I have created. And all those have the same problem as the network created with the LSTM cell.
What can I do?
Do I have to create a special fused operator? Maybe even one for each cell I want to compare? How would I do that?
Can I use the interface into the conversion infrastructure for user-defined RNN implementations mentioned here: https://www.tensorflow.org/lite/models/convert/rnn? As I understand the documentation, this would only work for user-defined LSTM implementations, not user-defined RNN implementations as the title suggests.

What are the uses of layers in keras/Tensorflow

So I am new to computer vision, and I do not really know what the layers do in keras. What is the use of adding layers (dense, Conv2D, etc) in keras? What do they add to it?
A convolutional neural network has 4 main steps: convolution, pooling, flattening, and full connection.
Conv2D(), Conv3D(), etc. are convolution layers, used for feature extraction.
Pooling layers (MaxPool2D(), AvgPool2D(), etc.) are part of feature extraction as well, but with a different operation: they downsample the feature maps.
Flattening layers (Flatten()) convert the extracted feature maps into a vector before it is fed into the fully connected layers (the Dense layers).
Dense layers implement the fully connected step, which acts as the classifier (the network classifies based on the features extracted by the convolution layers).
There are also regularization and normalization layers such as Dropout(), BatchNormalization(), etc.
For more information, just open the keras documentation.
If you want to start learning Convolution neural network, this article may help.
A layer in an artificial neural network is a group of nodes at a specific depth in the network. Keras is a high-level API that runs on top of backends such as TensorFlow or CNTK in order to simplify tasks. A network built in Keras comprises 3 main kinds of layers:
Input layer - contains the raw data.
Hidden layers - where the nodes learn some aspects of the raw input data. Stacking them gives the levels of abstraction that form a neural network.
Output layer - produces the final result, often a single node, which can be used for classification.
Keras as a whole offers many different types of layers. A convolutional layer creates a kernel that is convolved with the layer input to produce a tensor of outputs. Pooling layers downsample the feature maps by summarizing the features in each patch of a map. Max pooling and average pooling are the commonly used methods in a pooling layer.
Other commonly used layers in Keras are embedding layers, noise layers and core layers. A single NN layer can only represent a linearly separable function. Most prediction problems are more complicated and require more than one layer. This is where the multi-layer concept comes in.
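As a small illustration of why more than one layer is needed, XOR is the classic function that is not linearly separable, yet two stacked layers can compute it. The sketch below uses NumPy with hand-picked (not trained) weights and a step activation; all names and values are assumptions chosen for the example.

```python
import numpy as np


def step(z):
    """Threshold activation: 1 where z > 0, else 0."""
    return (z > 0).astype(int)


# All four binary input combinations.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])

# Hidden layer with two units: the first acts like OR, the second like AND.
W1 = np.array([[1.0, 1.0],
               [1.0, 1.0]])
b1 = np.array([-0.5, -1.5])

# Output unit: fires when OR is on and AND is off, which is XOR.
W2 = np.array([1.0, -1.0])
b2 = -0.5

hidden = step(X @ W1 + b1)
xor_out = step(hidden @ W2 + b2)

print(xor_out.tolist())  # [0, 1, 1, 0]
```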
I think this clears up your doubts; for any other queries, see https://www.tensorflow.org/api_docs/python/tf/keras
Neural networks are a great tool nowadays to automate classification problems. However when it comes to computer vision the amount of input data is too great to be handled efficiently by simple neural networks.
To reduce the network workload, your data needs to be preprocessed and certain features need to be identified. To find features in images we can use certain filters (like sobel edge detection), which will highlight the essential features needed for classification.
Again the amount of filters required to classify one image is too great, and thus the selection of those filters needs to be automated.
That's where the convolutional layer comes in.
We use a convolutional layer to generate multiple random (at first) filters that will highlight certain features in an image. While the network is training those filters are optimized to do a better job at highlighting features.
In TensorFlow we use Conv2D() to add one of those layers. An example of parameters is Conv2D(64, 3, activation='relu'): 64 denotes the number of filters, 3 denotes the size of the filters (in this case 3x3), and activation='relu' denotes the activation function.
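To show what a single 3x3 filter does, here is a rough NumPy sketch that slides a fixed Sobel kernel over a tiny image. This is only an illustration: in a Conv2D layer the kernel values are learned rather than fixed, and the helper name conv2d_valid and the toy image are made up for this example. Like the Keras layer, it slides the kernel without flipping it and uses no padding.

```python
import numpy as np


def conv2d_valid(image, kernel):
    """Slide the kernel over the image (no padding, stride 1)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Multiply the patch with the kernel element-wise and sum up.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


# 5x5 toy image: dark on the left (0), bright on the right (1).
image = np.array([[0, 0, 1, 1, 1]] * 5, dtype=float)

# Sobel kernel that responds to vertical edges.
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)

feature_map = conv2d_valid(image, sobel_x)
print(feature_map)
```

The 5x5 input shrinks to a 3x3 feature map, with large values where the window covers the edge and zeros where the image is flat.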
After the convolutional layer we use a pooling layer to downsample and further highlight the features produced by the previous convolutional layer. In TensorFlow this is usually done with MaxPooling2D(), which takes the filtered image and slides a 2x2 (by default) window over it in steps of 2 pixels. The window takes the maximum value in each 2x2 area and writes it into a new, smaller image.
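Here is a small NumPy sketch of 2x2 max pooling with a step of 2, just to make the operation visible on numbers. The function name max_pool_2x2 and the example values are assumptions for illustration, not the Keras implementation.

```python
import numpy as np


def max_pool_2x2(feature_map):
    """Take the maximum of each non-overlapping 2x2 block."""
    h, w = feature_map.shape
    h, w = h - h % 2, w - w % 2  # drop an odd trailing row/column
    blocks = feature_map[:h, :w].reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))


fm = np.array([[1, 3, 2, 0],
               [4, 2, 1, 1],
               [0, 0, 5, 6],
               [1, 2, 7, 8]])

pooled = max_pool_2x2(fm)
print(pooled.tolist())  # [[4, 2], [2, 8]]
```

Each output pixel keeps only the strongest response of its 2x2 neighbourhood, so the image is halved in both dimensions.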
We can use this set of convolutional layer and pooling layers multiple times to make the image easier for the network to work with.
After we are done with those layers, we need to pass the output to a conventional (Dense) neural network.
To do that, we first need to flatten the image data from a multi-dimensional tensor (height x width x filters) to a 1D tensor (vector). This is done by adding a Flatten() layer.
Finally we need to add our Dense layers, which are trained on the flattened data. We do this by calling Dense(). An example of parameters is Dense(64, activation='relu'), where 64 is the number of nodes we are using.
Here is an example CNN structure I used recently:
import tensorflow as tf

IMG_SIZE = 64  # assumed example value; the original post does not state it

# Build model
model = tf.keras.models.Sequential()
# Convolution and pooling layers
model.add(tf.keras.layers.Conv2D(64, 3, activation='relu', input_shape=(IMG_SIZE, IMG_SIZE, 1))) # Input layer
model.add(tf.keras.layers.MaxPooling2D())
model.add(tf.keras.layers.Conv2D(64, 3, activation='relu'))
model.add(tf.keras.layers.MaxPooling2D())
# Flatten layer
model.add(tf.keras.layers.Flatten())
# Dense layers
model.add(tf.keras.layers.Dense(64, activation='relu'))
model.add(tf.keras.layers.Dense(2, activation='softmax')) # Output layer
Of course this worked for a certain classification problem and the number of layers and method parameters differ depending on the problem.
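To see how the image shrinks on its way through the structure above, the arithmetic can be written out in plain Python. This is only a sketch: IMG_SIZE = 64 is an assumed value (the post does not give one), and the formulas assume the Keras defaults of no padding and stride 1 for Conv2D and a 2x2 window with stride 2 for MaxPooling2D.

```python
def conv_out(size, kernel=3):
    """Output width/height of a convolution with no padding and stride 1."""
    return size - kernel + 1


def pool_out(size, pool=2):
    """Output width/height of pooling with a 2x2 window and stride 2."""
    return size // pool


IMG_SIZE = 64   # assumed input size, for illustration only
FILTERS = 64    # number of filters in the last Conv2D layer

size = IMG_SIZE
size = conv_out(size)   # first Conv2D
size = pool_out(size)   # first MaxPooling2D
size = conv_out(size)   # second Conv2D
size = pool_out(size)   # second MaxPooling2D

flat_features = size * size * FILTERS      # length of the Flatten() output
dense_params = flat_features * 64 + 64     # weights + biases of Dense(64)

print(size, flat_features, dense_params)   # 14 12544 802880
```

With these assumptions most of the trainable parameters sit in the first Dense layer, which is one reason to pool before flattening.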
The Youtube channel The Coding Train has a very helpful video explaining the Convolutional and Pooling layer.

Training quantized models in TensorFlow

I would like to train a quantized network, i.e. use quantized weights during the forward pass to calculate the loss and then update the underlying full-precision floating point weights during the backward pass.
Note that in my case "fake quantization" is sufficient. That means that the weights can still be stored as 32-bit floating point values, as long as they represent a low bitwidth quantized value.
In a blog post from Pete Warden he states:
[...] we do have support for “fake quantization” operators. If you include these in your graphs at the points where quantization is expected to occur (for example after convolutions), then in the forward pass the float values will be rounded to the specified number of levels (typically 256) to simulate the effects of quantization.
The mentioned operators can be found in the TensorFlow API.
Can anybody point out to me how to use these functions?
If I call them after e.g. a conv layer in my model definition, why would this quantize the weights in the layer instead of the outputs (activations) of this layer?
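For reference, the rounding that "fake quantization" performs in the forward pass can be sketched in a few lines of NumPy. This is a simplified illustration, not the TensorFlow implementation (the real operators, e.g. tf.quantization.fake_quant_with_min_max_vars, handle the range and zero point more carefully); the function name, the range and the 2-bit setting are assumptions chosen so the result is easy to read. What gets quantized is simply whatever tensor the function is applied to, here an array standing in for the weights.

```python
import numpy as np


def fake_quantize(x, x_min, x_max, num_bits=8):
    """Round x to 2**num_bits evenly spaced float values in [x_min, x_max]."""
    levels = 2 ** num_bits - 1
    scale = (x_max - x_min) / levels
    clipped = np.clip(x, x_min, x_max)
    # Still float values afterwards, but only a few distinct ones.
    return np.round((clipped - x_min) / scale) * scale + x_min


# Full-precision "weights" (made-up values).
weights = np.array([-1.0, -0.26, 0.4, 0.9991, 3.0])

# 2 bits -> 4 representable values: -1, -1/3, 1/3, 1.
q_weights = fake_quantize(weights, x_min=-1.0, x_max=1.0, num_bits=2)
print(q_weights)
```

During training the rounding is typically treated as the identity in the backward pass, so the gradients update the underlying full-precision values.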

When predicting with an LSTM in Keras, is the hidden state still adjusted?

When I first train an LSTM in Keras on sequence data (my training data) and then use model.predict() to make predictions with my test data as input, is the hidden state of the LSTM still being adjusted?
The basic operation of a neural network is to take an input (vector) that is connected to the output through connections and, sometimes, other layers such as context layers. These connections are modelled as matrices and vary in strength; we call these weight matrices.
This means that the only thing we do when feeding data into the network is to put a vector into the network, multiply the values with the weight matrix and call that the output. In special cases, like recurrent networks, we also keep some values stored in other vectors and combine these stored values with the current input.
During training we not only feed data into the network, we also compute an error value that we evaluate in a clever way so that it tells us how we should change the weight matrices we multiply our inputs (and possibly past inputs for recurrent layers) with.
Therefore: yes, the hidden state is still computed and updated from step to step during prediction, because the basic execution behavior of recurrent layers does not change. We are just not updating the weights anymore.
There are layers that do behave differently at execution time because they act as regularisers, i.e. methods that make training the network more effective but are not needed in the same form during execution. Examples are noise layers, which are switched off, and BatchNormalization, which uses its stored moving statistics instead of batch statistics. Many neural network layers (including recurrent ones) also support dropout, another form of regularisation that disables a random percentage of connections in the layer. This is also only done during training.
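The difference between "state" and "weights" can be shown with a tiny recurrent cell in NumPy. This is a plain tanh RNN rather than an LSTM, with made-up sizes and random values, so treat it as a sketch of the idea only: the hidden state changes at every time step of the forward pass, while the weight matrices are never touched.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "trained" weights: 2 input features, 3 hidden units.
W_x = rng.normal(size=(2, 3))
W_h = rng.normal(size=(3, 3))
b = np.zeros(3)
weights_before = (W_x.copy(), W_h.copy())


def rnn_predict(sequence):
    """Forward pass only: returns the hidden state after every time step."""
    h = np.zeros(3)  # the state starts at zero for each new sequence
    states = []
    for x_t in sequence:
        # The state is recomputed from the current input and the old state.
        h = np.tanh(x_t @ W_x + h @ W_h + b)
        states.append(h)
    return np.array(states)


sequence = rng.normal(size=(4, 2))  # 4 time steps
states = rnn_predict(sequence)

state_changes = not np.allclose(states[0], states[-1])
weights_unchanged = (np.array_equal(W_x, weights_before[0])
                     and np.array_equal(W_h, weights_before[1]))
print(state_changes, weights_unchanged)
```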

pruning tensorflow connections and weights (using cifar10 cnn)

I'm using tensorflow to run a cnn for image classification.
I use the TensorFlow CIFAR-10 CNN implementation (tensorflow cifar10).
I want to decrease the number of connections, meaning I want to prune the low-weight connections.
How can I create a new graph (subgraph) without some of the neurons?
TensorFlow does not allow you to lock/freeze a particular kernel of a particular layer, as far as I have found. The only way I've found to do this is to use the tf.assign() function, as shown in
How to freeze/lock weights of one Tensorflow variable (e.g., one CNN kernel of one layer
It's fairly cave-man but I've seen no other solution that works. Essentially, you have to .assign() the values every so often as you iterate through the data. Since this approach is so inelegant and brute-force, it's very slow. I do the .assign() every 100 batches.
Someone please post a better solution and soon!
The cifar10 model you point to, and for that matter most models written in TensorFlow, do not model the weights (and hence connections) of individual neurons directly in the computation graph. For instance, for fully connected layers, all the connections between two layers, say with M neurons in the layer below and N neurons in the layer above, are modeled by one MxN weight matrix. If you wanted to completely remove a neuron and all of its outgoing connections from the layer below, you can simply slice out an (M-1)xN matrix by removing the relevant row, and multiply it with the corresponding M-1 activations of the remaining neurons.
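The row-slicing idea can be sketched in NumPy as follows. Sizes, values and the index of the pruned neuron are arbitrary assumptions for the example; the check at the end shows that removing the row gives the same result as forcing that neuron's activation to zero in the full layer.

```python
import numpy as np

rng = np.random.default_rng(0)

M, N = 4, 3                           # neurons in the layer below / above
activations = rng.normal(size=(M,))   # outputs of the layer below
W = rng.normal(size=(M, N))           # row i = outgoing weights of neuron i

pruned_neuron = 2                     # hypothetical neuron to remove
keep = np.arange(M) != pruned_neuron

W_pruned = W[keep, :]                 # (M-1) x N matrix
a_pruned = activations[keep]          # the remaining M-1 activations
out_pruned = a_pruned @ W_pruned

# Reference: keep the full matrix but silence the neuron.
a_zeroed = activations.copy()
a_zeroed[pruned_neuron] = 0.0
out_zeroed = a_zeroed @ W

print(W_pruned.shape, np.allclose(out_pruned, out_zeroed))
```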
Another way is to add an additional mask to control the connections.
The first step involves adding mask and threshold variables to the layers that need to undergo pruning. The variable mask is the same shape as the layer's weight tensor and determines which of the weights participate in the forward execution of the graph.
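A rough NumPy sketch of the mask idea described above: weights whose magnitude falls below a threshold get a 0 in the mask and therefore drop out of the forward pass. This only illustrates the principle and is not the code of the pruning library; the weight values and the threshold are made up.

```python
import numpy as np

# Made-up weight matrix of one layer.
W = np.array([[0.80, -0.02, 0.30],
              [0.01, -0.60, 0.05],
              [-0.40, 0.03, 0.90]])

threshold = 0.1  # assumed pruning threshold

# Same shape as W: 1 keeps a connection, 0 prunes it.
mask = (np.abs(W) >= threshold).astype(W.dtype)

# Only the masked weights take part in the forward computation.
W_masked = W * mask
x = np.array([1.0, 2.0, 3.0])
y = x @ W_masked

sparsity = 1.0 - mask.mean()  # fraction of pruned connections
print(mask)
print(y, sparsity)
```

The original weights stay stored in W, so the mask can be recomputed as the threshold changes during training.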
There is a pruning implementation under tensorflow/contrib/model_pruning. I hope this helps you prune the model quickly.
https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/model_pruning
I think Google has an updated answer here: https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/model_pruning
Removing pruning nodes from the trained graph:
$ bazel build -c opt contrib/model_pruning:strip_pruning_vars
$ bazel-bin/contrib/model_pruning/strip_pruning_vars --checkpoint_path=/tmp/cifar10_train --output_node_names=softmax_linear/softmax_linear_2 --filename=cifar_pruned.pb
I suppose that cifar_pruned.pb will be smaller, since the pruned (zero-masked) variables are removed.