Accessing neural network weights and neuron activations - tensorflow

After training a network using Keras:
I want to access the final trained weights of the network in some order.
I want to know the neuron activation values for every input passed. For example, after training, if I pass X as my input to the network, I want to know the neuron activation values for that X for every neuron in the network.
Does Keras provide API access to these things? I want to do further analysis based on the neuron activation values.
Update : I know I can do this using Theano purely, but Theano requires more low-level coding. And, since Keras is built on top of Theano, I think there could be a way to do this?
If Keras can't do this, then among Tensorflow and Caffe , which can? Keras is the easiest to use, followed by Tensorflow/Caffe, but I don't know which of these provide the network access I need. The last option for me would be to drop down to Theano, but I think it'd be more time-consuming to build a deep CNN with Theano..

This is covered in the Keras FAQ, you basically want to compute the activations for each layer, so you can do it with this code:
from keras import backend as K
#The layer number
n = 3
# with a Sequential model
get_nth_layer_output = K.function([model.layers[0].input],
[model.layers[n].output])
layer_output = get_nth_layer_output([X])[0]
Unfortunately you would need to compile and run a function for each layer, but this should be straightforward.
To get the weights, you can call get_weights() on any layer.
nth_weights = model.layers[n].get_weights()

Related

Freezing BERT layers after importing via TF-hub and training them?

I will describe my intention here. I want to import BERT pretrained model via tf-hub function hub.module(bert_url, trainable = True) and utilize it for text classification task. I plan to use a large corpus to fine-tune weights of BERT as well as a few dense layers whose inputs are the BERT outputs. I would then like to freeze layers of BERT and train only the dense layers following BERT. How can I do this efficiently?
You mention Hub's TF1 API hub.Module, so I suppose you are writing TF1 code and using the TF1-compatible Hub assets google/bert/..., such as https://tfhub.dev/google/bert_cased_L-12_H-768_A-12/1
Are you going to have separate run of your program for the two phases of training? If so, maybe you can just drop trainable=True from the hub.Module call in the second run. This doesn't affect variable names, so you can restore the training result from the first run, including BERT's adjusted weights. (To be clear: the pre-trained weights shipped with the hub.Module are only used for initialization at the very start of training; restoring a checkpoint overrides them.)

What are the uses of layers in keras/Tensorflow

So I am new to computer vision, and I do not really know what the layers do in keras. What is the use of adding layers (dense, Conv2D, etc) in keras? What do they add to it?
Convolution neural network has 4 main steps: Convolution, Pooling, Flatten, and Full connection.
Conv2D(), Conv3D(), etc. is for Feature extraction (It's a Convolution Layer).
Pooling layers (MaxPool2D(), AvgPool2D(), etc) is for Feature extraction as well (It has different operation though).
Flattening layers (Flatten() ) are to convert the extracted feature map into Vector before being fed into the Fully connection layers (The Dense layers).
Dense layers are for Fully connected step in Computer vision that acts as Classifier (The Neural network classify each extracted features from the Convolution layers.)
There are also optimization layers such as Dropout(), BatchNormalization(), etc.
For more information, just open the keras documentation.
If you want to start learning Convolution neural network, this article may help.
A layer in an Artificial Neural Network is a bunch of nodes bound together at a specific depth in a Neural Network. Keras is a high level API used over NN modules like TensorFlow or CNTK in order to simplify tasks. A Keras layer comprises 3 main parts:
Input Layer - Which contains the raw data
Hidden layer - Where the nodes of a layer learn some aspects about
the raw data which is input. It's similar to levels of abstraction
to form a Neural network.
Output Layer - Consists of a single output which is mostly a single
node and can be subjected to classification.
Keras, as a whole consists of many different types of layers. A Convolutional layer creates a kernel which is convoluted with the input over a single temporal space to derive a group of outputs. Pooling layers provide sampling of the feature maps by simplifying features in a map into patches. Max Pooling and Average Pooling are commonly used methods in a Pool layer.
Other commonly used layers in Keras are Embedding layers, Noise layers and Core layers. A single NN layer can represent only a Linearly seperable method. Most prediction problems are complicated and more than just one layer is required. This is where Multi Layer concept is required.
I think i clear your doubts and for any other queries you can see on https://www.tensorflow.org/api_docs/python/tf/keras
Neural networks are a great tool nowadays to automate classification problems. However when it comes to computer vision the amount of input data is too great to be handled efficiently by simple neural networks.
To reduce the network workload, your data needs to be preprocessed and certain features need to be identified. To find features in images we can use certain filters (like sobel edge detection), which will highlight the essential features needed for classification.
Again the amount of filters required to classify one image is too great, and thus the selection of those filters needs to be automated.
That's where the convolutional layer comes in.
We use a convolutional layer to generate multiple random (at first) filters that will highlight certain features in an image. While the network is training those filters are optimized to do a better job at highlighting features.
In Tensorflow we use Conv2D() to add one of those layers. An example of parameters is : Conv2D(64, 3, activation='relu'). 64 denotes the number of filters used, 3 denotes the size of the filters (in this case 3x3) and activation='relu' denotes the activation function
After the convolutional layer we use a pooling layer to further highlight the features produced by the previous convolutional layer. In Tensorflow this is usually done with MaxPooling2D() which takes the filtered image and applies a 2x2 (by default) layer every 2 pixels. The filter applied by MaxPooling is basically looking for the maximum value in that 2x2 area and adds it in a new image.
We can use this set of convolutional layer and pooling layers multiple times to make the image easier for the network to work with.
After we are done with those layers, we need to pass the output to a conventional (Dense) neural network.
To do that, we first need to flatten the image data from a 2D Tensor(Matrix) to a 1D Tensor(Vector). This is done by calling the Flatten() method.
Finally we need to add our Dense layers which are used to train on the flattened data. We do this by calling Dense(). An example of parameters is Dense(64, activation='relu')
where 64 is the number of nodes we are using.
Here is an example CNN structure I used recently:
# Build model
model = tf.keras.models.Sequential()
# Convolution and pooling layers
model.add(tf.keras.layers.Conv2D(64, 3, activation='relu', input_shape=(IMG_SIZE, IMG_SIZE, 1))) # Input layer
model.add(tf.keras.layers.MaxPooling2D())
model.add(tf.keras.layers.Conv2D(64, 3, activation='relu'))
model.add(tf.keras.layers.MaxPooling2D())
# Flattened layers
model.add(tf.keras.layers.Flatten())
# Dense layers
model.add(tf.keras.layers.Dense(64, activation='relu'))
model.add(tf.keras.layers.Dense(2, activation='softmax')) # Output layer
Of course this worked for a certain classification problem and the number of layers and method parameters differ depending on the problem.
The Youtube channel The Coding Train has a very helpful video explaining the Convolutional and Pooling layer.

TensorFlow Graph to Keras Model?

Is it possible to define a graph in native TensorFlow and then convert this graph to a Keras model?
My intention is simply combining (for me) the best of the two worlds.
I really like the Keras model API for prototyping and new experiments, i.e. using the awesome multi_gpu_model(model, gpus=4) for training with multiple GPUs, saving/loading weights or whole models with oneliners, all the convenience functions like .fit(), .predict(), and others.
However, I prefer to define my model in native TensorFlow. Context managers in TF are awesome and, in my opinion, it is much easier to implement stuff like GANs with them:
with tf.variable_scope("Generator"):
# define some layers
with tf.variable_scope("Discriminator"):
# define some layers
# model losses
G_train_op = ...AdamOptimizer(...)
.minimize(gloss,
var_list=tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES,
scope="Generator")
D_train_op = ...AdamOptimizer(...)
.minimize(dloss,
var_list=tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES,
scope="Discriminator")
Another bonus is structuring the graph this way. In TensorBoard debugging complicated native Keras models are hell since they are not structured at all. With heavy use of variable scopes in native TF you can "disentangle" the graph and look at a very structured version of a complicated model for debugging.
By utilizing this I can directly setup custom loss function and do not have to freeze anything in every training iteration since TF will only update the weights in the correct scope, which is (at least in my opinion) far easier than the Keras solution to loop over all the existing layers and set .trainable = False.
TL;DR:
Long story short: I like the direct access to everything in TF, but most of the time a simple Keras model is sufficient for training, inference, ... later on. The model API is much easier and more convenient in Keras.
Hence, I would prefer to set up a graph in native TF and convert it to Keras for training, evaluation, and so on. Is there any way to do this?
I don't think it is possible to create a generic automated converter for any TF graph, that will come up with a meaningful set of layers, with proper namings etc. Just because graphs are more flexible than a sequence of Keras layers.
However, you can wrap your model with the Lambda layer. Build your model inside a function, wrap it with Lambda and you have it in Keras:
def model_fn(x):
layer_1 = tf.layers.dense(x, 100)
layer_2 = tf.layers.dense(layer_1, 100)
out_layer = tf.layers.dense(layer_2, num_classes)
return out_layer
model.add(Lambda(model_fn))
That is what sometimes happens when you use multi_gpu_model: You come up with three layers: Input, model, and Output.
Keras Apologetics
However, integration between TensorFlow and Keras can be much more tighter and meaningful. See this tutorial for use cases.
For instance, variable scopes can be used pretty much like in TensorFlow:
x = tf.placeholder(tf.float32, shape=(None, 20, 64))
with tf.name_scope('block1'):
y = LSTM(32, name='mylstm')(x)
The same for manual device placement:
with tf.device('/gpu:0'):
x = tf.placeholder(tf.float32, shape=(None, 20, 64))
y = LSTM(32)(x) # all ops / variables in the LSTM layer will live on GPU:0
Custom losses are discussed here: Keras: clean implementation for multiple outputs and custom loss functions?
This is how my model defined in Keras looks in Tensorboard:
So, Keras is indeed only a simplified frontend to TensorFlow so you can mix them quite flexibly. I would recommend you to inspect source code of Keras model zoo for clever solutions and patterns that allows you to build complex models using clean API of Keras.
You can insert TensorFlow code directly into your Keras model or training pipeline! Since mid-2017, Keras has fully adopted and integrated into TensorFlow. This article goes into more detail.
This means that your TensorFlow model is already a Keras model and vice versa. You can develop in Keras and switch to TensorFlow whenever you need to. TensorFlow code will work with Keras APIs, including Keras APIs for training, inference and saving your model.

Implementing stochastic forward passes in part of a neural network in Keras?

my problem is the following:
I am working on an object detection problem and would like to use dropout during test time to obtain a distribution of outputs. The object detection network consists of a training model and a prediction model, which wraps around the training model. I would like to perform several stochastic forward passes using the training model and combine these e.g. by averaging the predictions in the prediction wrapper. Is there a way of doing this in a keras model instead of requiring an intermediate processing step using numpy?
Note that this question is not about how to enable dropout during test time
def prediction_wrapper(model):
# Example code.
# Arguments
# model: the training model
regression = model.outputs[0]
classification = model.outputs[1]
predictions = # TODO: perform several stochastic forward passes (dropout during train and test time) here
avg_predictions = # TODO: combine predictions here, e.g. by computing the mean
outputs = # TODO: do some processing on avg_predictions
return keras.models.Model(inputs=model.inputs, outputs=outputs, name=name)
I use keras with a tensorflow backend.
I appreciate any help!
The way I understand, you're trying to average the weight updates for a single sample while Dropout is enabled. Since dropout is random, you would get different weight updates for the same sample.
If this understanding is correct, then you could create a batch by duplicating the same sample. Here I am assuming that the Dropout is different for each sample in a batch. Since, backpropagation averages the weight updates anyway, you would get your desired behavior.
If that does not work, then you could write a custom loss function and train with a batch-size of one. You could update a global counter inside your custom loss function and return non-zero loss only when you've averaged them the way you want it. I don't know if this would work, it's just an idea.

DeepLearning Anomaly Detection for images

I am still relatively new to the world of Deep Learning. I wanted to create a Deep Learning model (preferably using Tensorflow/Keras) for image anomaly detection. By anomaly detection I mean, essentially a OneClassSVM.
I have already tried sklearn's OneClassSVM using HOG features from the image. I was wondering if there is some example of how I can do this in deep learning. I looked up but couldn't find one single code piece that handles this case.
The way of doing this in Keras is with the KerasRegressor wrapper module (they wrap sci-kit learn's regressor interface). Useful information can also be found in the source code of that module. Basically you first have to define your Network Model, for example:
def simple_model():
#Input layer
data_in = Input(shape=(13,))
#First layer, fully connected, ReLU activation
layer_1 = Dense(13,activation='relu',kernel_initializer='normal')(data_in)
#second layer...etc
layer_2 = Dense(6,activation='relu',kernel_initializer='normal')(layer_1)
#Output, single node without activation
data_out = Dense(1, kernel_initializer='normal')(layer_2)
#Save and Compile model
model = Model(inputs=data_in, outputs=data_out)
#you may choose any loss or optimizer function, be careful which you chose
model.compile(loss='mean_squared_error', optimizer='adam')
return model
Then, pass it to the KerasRegressor builder and fit with your data:
from keras.wrappers.scikit_learn import KerasRegressor
#chose your epochs and batches
regressor = KerasRegressor(build_fn=simple_model, nb_epoch=100, batch_size=64)
#fit with your data
regressor.fit(data, labels, epochs=100)
For which you can now do predictions or obtain its score:
p = regressor.predict(data_test) #obtain predicted value
score = regressor.score(data_test, labels_test) #obtain test score
In your case, as you need to detect anomalous images from the ones that are ok, one approach you can take is to train your regressor by passing anomalous images labeled 1 and images that are ok labeled 0.
This will make your model to return a value closer to 1 when the input is an anomalous image, enabling you to threshold the desired results. You can think of this output as its R^2 coefficient to the "Anomalous Model" you trained as 1 (perfect match).
Also, as you mentioned, Autoencoders are another way to do anomaly detection. For this I suggest you take a look at the Keras Blog post Building Autoencoders in Keras, where they explain in detail about the implementation of them with the Keras library.
It is worth noticing that Single-class classification is another way of saying Regression.
Classification tries to find a probability distribution among the N possible classes, and you usually pick the most probable class as the output (that is why most Classification Networks use Sigmoid activation on their output labels, as it has range [0, 1]). Its output is discrete/categorical.
Similarly, Regression tries to find the best model that represents your data, by minimizing the error or some other metric (like the well-known R^2 metric, or Coefficient of Determination). Its output is a real number/continuous (and the reason why most Regression Networks don't use activations on their outputs). I hope this helps, good luck with your coding.