For benchmarking different frameworks, I want to train an Inception v3 network from scratch.
Here is the code snippet to build the model:
import tensorflow as tf

IMAGE_RES = 229
NUM_CLASSES = 102
channels = 3  # RGB images

model = tf.keras.applications.InceptionV3(include_top=True, weights=None, classes=NUM_CLASSES)
model.build(input_shape=(None, IMAGE_RES, IMAGE_RES, channels))
According to the official Keras website, the argument weights=None means random initialization. Does this mean that I am training my network from scratch? If not, how is it possible to train the network from scratch?
Yes, it means that you are training your model from scratch.
Weights and biases in deep learning models are randomly initialized following specific schemes (see the Xavier Glorot scheme, for example). These schemes generally help the network converge faster and achieve better results by preventing the gradients from vanishing or exploding and by maintaining a low variance of the gradients across all layers.
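A quick way to see this (a minimal sketch, assuming tf.keras and that the first convolution of InceptionV3 sits at model.layers[1]) is to build the model with weights=None and inspect a kernel: the values are small random draws from the default Glorot uniform initializer rather than pretrained ImageNet filters.

import tensorflow as tf

# weights=None: no pretrained weights are downloaded; every layer is set up
# by its default initializer (Glorot uniform for conv/dense kernels)
model = tf.keras.applications.InceptionV3(include_top=True, weights=None, classes=102)

# inspect a freshly initialized convolution kernel (layer index assumed)
kernel = model.layers[1].get_weights()[0]
print(kernel.shape, kernel.mean(), kernel.std())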
I am starting to learn convolutional neural networks and have built the famous MNIST and Fashion-MNIST models, obtaining good accuracy.
But then I moved to another simple dataset, the cats vs. dogs dataset from Kaggle. After applying all the concepts I learned from the Stanford and Andrew Ng lectures, I was only able to get 80% accuracy. So I decided to try GoogLeNet and AlexNet, but these models were not able to give me accuracy above 50% after 6 epochs.
I wanted to know whether GoogLeNet and AlexNet are designed for 1000-category output and won't work with a 2-category output?
With my own model I obtained an accuracy of 80%. I expected the famous GoogLeNet model to give me more accuracy, but that's not the case.
Below is the GoogLeNet model that I am using:
import cv2
import numpy as np
from sklearn.model_selection import train_test_split

data = []
labels = []
for i in range(0, 12499):
    # load and resize a cat image, label it 0
    img = cv2.imread("train/cat." + str(i) + ".jpg")
    res = cv2.resize(img, dsize=(224, 224), interpolation=cv2.INTER_CUBIC)
    data.append(res)
    labels.append(0)
    # load and resize a dog image, label it 1
    img2 = cv2.imread("train/dog." + str(i) + ".jpg")
    res2 = cv2.resize(img2, dsize=(224, 224), interpolation=cv2.INTER_CUBIC)
    data.append(res2)
    labels.append(1)

# Keras expects NumPy arrays rather than Python lists
data = np.array(data)
labels = np.array(labels)

train_data, test_data, train_labels, test_labels = train_test_split(
    data, labels, test_size=0.2, random_state=42)
from tensorflow.keras import layers
from tensorflow.keras.layers import MaxPooling2D, Flatten, Dense

model = tf.keras.Sequential()
model.add(layers.Conv2D(64, kernel_size=3, activation='relu', input_shape=(224, 224, 3)))
model.add(layers.Conv2D(64, kernel_size=3, activation='relu'))
model.add(MaxPooling2D(pool_size=(3, 3), strides=(2, 2)))
model.add(layers.Conv2D(128, kernel_size=3, activation='relu'))
model.add(layers.Conv2D(128, kernel_size=3, activation='relu'))
model.add(MaxPooling2D(pool_size=(3, 3), strides=(2, 2)))
model.add(layers.Conv2D(256, kernel_size=3, activation='relu'))
model.add(layers.Conv2D(256, kernel_size=3, activation='relu'))
model.add(MaxPooling2D(pool_size=(3, 3), strides=(2, 2)))
model.add(layers.Conv2D(512, kernel_size=3, activation='relu'))
model.add(layers.Conv2D(512, kernel_size=3, activation='relu'))
model.add(layers.Conv2D(512, kernel_size=3, activation='relu'))
model.add(MaxPooling2D(pool_size=(3, 3), strides=(2, 2)))
model.add(layers.Conv2D(512, kernel_size=3, activation='relu'))
model.add(layers.Conv2D(512, kernel_size=3, activation='relu'))
model.add(layers.Conv2D(512, kernel_size=3, activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
# flatten the feature maps before the fully connected layers
model.add(Flatten())
model.add(Dense(4096, activation='relu'))
model.add(Dense(4096, activation='relu'))
model.add(Dense(2, activation='softmax'))

model.compile(optimizer=tf.train.AdamOptimizer(0.001),
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

model.fit(x=train_data, y=train_labels, batch_size=32, epochs=10,
          validation_data=(test_data, test_labels))
I expected the accuracy of the above model to be well above 50%, but it ranges between 50% and 51% after 6 epochs.
P.S. I changed the last Dense layer to 2 units instead of 1000, and I am using the Keras API for TensorFlow.
Any help would be appreciated.
I struggled a bit with this earlier as well. I haven't tried it on GoogLeNet yet, but I did try it on AlexNet. On AlexNet I managed to get relatively OK results (83%) for cats vs. dogs after following the paper closely. A few things you may want to do:
If you refer to the CS231n notes from Fei-Fei Li,
http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture9.pdf
you will notice on slide 10 that the input layer should be 227 by 227 instead. They also provide the mathematical justification for why this is so.
I then tried to follow other details closely from the original paper:
https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf
These included:
As in section 3.3 of the paper, adding a normalization layer after each of the first two max-pooling layers. Keras has stopped supporting LRN, but I added batch normalization instead and it works. (I ran an experiment with and without batch normalization; the accuracy difference was 82% versus 62%.)
As in section 4.2 of the paper, I added two dropout layers (0.5) after the two fully connected layers.
As in section 5 of the paper, I changed my batch size to 128 and used SGD with momentum 0.9 and weight decay 0.0005.
As pointed out in one of the comments on your original question, my final layer was also a single unit with a sigmoid activation.
Training for 20 epochs gave me 83% accuracy; a rough sketch of these modifications is shown below. The original paper also includes data augmentation, but I did not include it in my implementation.
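As a rough illustration only (not my exact code; the layer sizes below are simplified placeholders rather than the true AlexNet dimensions), the modifications above might look like this in tf.keras:

from tensorflow.keras import layers, models, optimizers

model = models.Sequential([
    layers.Conv2D(96, kernel_size=11, strides=4, activation='relu',
                  input_shape=(227, 227, 3)),
    layers.MaxPooling2D(pool_size=3, strides=2),
    layers.BatchNormalization(),            # stand-in for the paper's LRN (section 3.3)
    layers.Conv2D(256, kernel_size=5, padding='same', activation='relu'),
    layers.MaxPooling2D(pool_size=3, strides=2),
    layers.BatchNormalization(),            # second normalization layer
    layers.Flatten(),
    layers.Dense(4096, activation='relu'),
    layers.Dropout(0.5),                    # dropout after fully connected layers (section 4.2)
    layers.Dense(4096, activation='relu'),
    layers.Dropout(0.5),
    layers.Dense(1, activation='sigmoid'),  # single sigmoid unit for cat vs. dog
])

# SGD with momentum as in section 5; weight decay can be added with an L2
# kernel_regularizer on the layers if desired
model.compile(optimizer=optimizers.SGD(learning_rate=0.01, momentum=0.9),
              loss='binary_crossentropy',
              metrics=['accuracy'])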
Keras also has an image classification example based on a modified Xception architecture, which I believe is one of the derivatives of the Inception (GoogLeNet) architecture:
https://keras.io/examples/vision/image_classification_from_scratch/
I have tried it, and after running for 15 epochs the accuracy is about 90%.
Hope this helps.
Is it possible to define a graph in native TensorFlow and then convert this graph to a Keras model?
My intention is simply to combine (for me) the best of both worlds.
I really like the Keras model API for prototyping and new experiments, e.g. using the awesome multi_gpu_model(model, gpus=4) for training with multiple GPUs, saving/loading weights or whole models with one-liners, and all the convenience functions like .fit(), .predict(), and others.
However, I prefer to define my model in native TensorFlow. Context managers in TF are awesome and, in my opinion, it is much easier to implement stuff like GANs with them:
with tf.variable_scope("Generator"):
# define some layers
with tf.variable_scope("Discriminator"):
# define some layers
# model losses
G_train_op = ...AdamOptimizer(...)
.minimize(gloss,
var_list=tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES,
scope="Generator")
D_train_op = ...AdamOptimizer(...)
.minimize(dloss,
var_list=tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES,
scope="Discriminator")
Another bonus is structuring the graph this way. In TensorBoard, debugging complicated native Keras models is hell since they are not structured at all. With heavy use of variable scopes in native TF you can "disentangle" the graph and look at a very structured version of a complicated model, which is great for debugging.
By utilizing this I can directly set up custom loss functions and do not have to freeze anything in every training iteration, since TF will only update the weights in the correct scope, which is (at least in my opinion) far easier than the Keras solution of looping over all the existing layers and setting .trainable = False.
TL;DR:
Long story short: I like the direct access to everything in TF, but most of the time a simple Keras model is sufficient for training, inference, and so on later. The model API is much easier and more convenient in Keras.
Hence, I would prefer to set up a graph in native TF and convert it to Keras for training, evaluation, and so on. Is there any way to do this?
I don't think it is possible to create a generic automated converter for any TF graph that comes up with a meaningful set of layers, proper naming, etc., simply because graphs are more flexible than a sequence of Keras layers.
However, you can wrap your model with the Lambda layer. Build your model inside a function, wrap it with Lambda and you have it in Keras:
from keras.models import Sequential
from keras.layers import Lambda

def model_fn(x):
    layer_1 = tf.layers.dense(x, 100)
    layer_2 = tf.layers.dense(layer_1, 100)
    out_layer = tf.layers.dense(layer_2, num_classes)
    return out_layer

model = Sequential()
model.add(Lambda(model_fn))
That is what sometimes happens when you use multi_gpu_model: you end up with three layers: Input, model, and Output.
Keras Apologetics
However, the integration between TensorFlow and Keras can be much tighter and more meaningful. See this tutorial for use cases.
For instance, variable scopes can be used pretty much like in TensorFlow:
x = tf.placeholder(tf.float32, shape=(None, 20, 64))
with tf.name_scope('block1'):
    y = LSTM(32, name='mylstm')(x)
The same for manual device placement:
with tf.device('/gpu:0'):
    x = tf.placeholder(tf.float32, shape=(None, 20, 64))
    y = LSTM(32)(x)  # all ops / variables in the LSTM layer will live on GPU:0
Custom losses are discussed here: Keras: clean implementation for multiple outputs and custom loss functions?
This is how my model defined in Keras looks in TensorBoard:
So Keras is indeed only a simplified frontend to TensorFlow, and you can mix them quite flexibly. I would recommend inspecting the source code of the Keras model zoo for clever solutions and patterns that allow you to build complex models using the clean Keras API.
You can insert TensorFlow code directly into your Keras model or training pipeline! Since mid-2017, Keras has been fully adopted into and integrated with TensorFlow. This article goes into more detail.
This means that your TensorFlow model is already a Keras model and vice versa. You can develop in Keras and switch to TensorFlow whenever you need to. TensorFlow code will work with the Keras APIs, including the Keras APIs for training, inference, and saving your model.
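As a minimal illustration of this (a sketch assuming tf.keras; the layer sizes are arbitrary), a loss written as plain TensorFlow ops can be passed straight into the Keras training API:

import tensorflow as tf

def custom_loss(y_true, y_pred):
    # ordinary TensorFlow ops, used directly by Keras during training
    return tf.reduce_mean(tf.square(y_true - y_pred))

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu', input_shape=(20,)),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer='adam', loss=custom_loss)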
I am working on the MNIST dataset in TensorFlow with a deep neural network classifier. I am using the following structure for the network.
from tensorflow.examples.tutorials.mnist import input_data
import numpy as np
import tensorflow as tf

MNIST_DATASET = input_data.read_data_sets(mnist_data_path)

train_data = np.array(MNIST_DATASET.train.images, 'int64')
train_target = np.array(MNIST_DATASET.train.labels, 'int64')
test_data = np.array(MNIST_DATASET.test.images, 'int64')
test_target = np.array(MNIST_DATASET.test.labels, 'int64')

classifier = tf.contrib.learn.DNNClassifier(
    feature_columns=[tf.contrib.layers.real_valued_column("", dimension=784)],
    n_classes=10,  # digits 0 to 9 - 10 classes
    hidden_units=[2500, 1000, 1500, 2000, 500],
    model_dir="model"
)

classifier.fit(train_data, train_target, steps=1000)
However, I get about 40% accuracy when I run the following line.
accuracy_score = 100*classifier.evaluate(test_data, test_target)['accuracy']
How can I tune the network? Am I doing something wrong? Similar studies report 99% accuracy in academia.
Thank you.
I found a good configuration on GitHub.
First of all, it is not the best possible configuration; academic studies have already reached 99.79% accuracy on the test set.
classifier = tf.contrib.learn.DNNClassifier(
    feature_columns=feature_columns,
    n_classes=10,
    hidden_units=[128, 32],
    optimizer=tf.train.ProximalAdagradOptimizer(learning_rate=learning_rate),
    activation_fn=tf.nn.relu
)
Also, the following parameters are passed to the classifier:
epoch = 15000
learning_rate = 0.1
batch_size = 40
With these settings, the model achieves 97.83% accuracy on the test set and 99.77% accuracy on the training set.
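As a sketch of how those values might be passed (the variable names are mine, and I am assuming the deprecated tf.contrib.learn fit signature, which takes steps and batch_size rather than epochs):

learning_rate = 0.1
batch_size = 40
steps = 15000  # the "epoch = 15000" above is used as the number of training steps

classifier.fit(train_data, train_target, steps=steps, batch_size=batch_size)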
Speaking from experience, it would be a good idea to have no more than 2 hidden layers in a fully connected network for the MNIST dataset, e.g. hidden_units=[500, 500]. That should get you to over 90% accuracy.
What is the problem? The extreme number of model parameters. For example, the second hidden layer alone requires 2500*1000 + 1000 parameters. The rule of thumb is to keep the number of trainable parameters somewhat comparable to the number of training examples, or at least that is so in classical machine learning. Otherwise, regularize the model rigorously.
What steps can be taken here?
Use a simpler model: decrease the number of hidden units and the number of layers.
Use a model with a smaller number of parameters. Convolutional layers, for instance, generally use far fewer parameters for the same number of units; 1000 convolutional filters with 3x3 kernels (on a single-channel input) need only 1000*(3*3+1) parameters.
Apply regularization: batch normalization, noise injection into your input, dropout, and weight decay are good examples to start from (see the convolutional sketch below).
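As a minimal sketch (not from the original answer, and assuming tf.keras with flattened 784-pixel MNIST inputs), a small convolutional network following that advice could look like this:

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Reshape((28, 28, 1), input_shape=(784,)),
    tf.keras.layers.Conv2D(32, 3, activation='relu'),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64, 3, activation='relu'),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dropout(0.5),  # regularization, as suggested above
    tf.keras.layers.Dense(10, activation='softmax'),
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

This has far fewer trainable parameters than the [2500, 1000, 1500, 2000, 500] fully connected stack while typically reaching much higher accuracy on MNIST.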
I am still relatively new to the world of deep learning. I want to create a deep learning model (preferably using TensorFlow/Keras) for image anomaly detection. By anomaly detection I mean, essentially, what a OneClassSVM does.
I have already tried sklearn's OneClassSVM using HOG features from the images. I was wondering whether there is an example of how to do this with deep learning. I searched around but couldn't find a single code example that handles this case.
One way of doing this in Keras is with the KerasRegressor wrapper module (it wraps scikit-learn's regressor interface). Useful information can also be found in the source code of that module. Basically, you first have to define your network model, for example:
from keras.layers import Input, Dense
from keras.models import Model

def simple_model():
    # input layer
    data_in = Input(shape=(13,))
    # first layer, fully connected, ReLU activation
    layer_1 = Dense(13, activation='relu', kernel_initializer='normal')(data_in)
    # second layer, etc.
    layer_2 = Dense(6, activation='relu', kernel_initializer='normal')(layer_1)
    # output, single node without activation
    data_out = Dense(1, kernel_initializer='normal')(layer_2)
    # build and compile the model
    model = Model(inputs=data_in, outputs=data_out)
    # you may choose any loss or optimizer function; be careful which you choose
    model.compile(loss='mean_squared_error', optimizer='adam')
    return model
Then, pass it to the KerasRegressor builder and fit it with your data:
from keras.wrappers.scikit_learn import KerasRegressor

# choose your epochs and batch size
regressor = KerasRegressor(build_fn=simple_model, nb_epoch=100, batch_size=64)
# fit with your data
regressor.fit(data, labels, epochs=100)
You can now make predictions or obtain the model's score:
p = regressor.predict(data_test)                 # obtain predicted values
score = regressor.score(data_test, labels_test)  # obtain test score
In your case, since you need to separate anomalous images from the ones that are OK, one approach is to train your regressor by passing anomalous images labeled 1 and normal images labeled 0.
This will make your model return a value closer to 1 when the input is an anomalous image, enabling you to threshold the results as desired (see the snippet below). You can think of this output as its R^2 coefficient to the "Anomalous Model" you trained as 1 (perfect match).
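For example (a sketch; images_to_check and the 0.5 cutoff are illustrative placeholders you would tune on validation data):

# values close to 1 indicate likely anomalies
scores = regressor.predict(images_to_check)
flagged = scores > 0.5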
Also, as you mentioned, autoencoders are another way to do anomaly detection. For this I suggest you take a look at the Keras blog post Building Autoencoders in Keras, where they explain in detail how to implement them with the Keras library.
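As a minimal sketch of that idea (assuming flattened images of length 784 scaled to [0, 1]; normal_images, test_images, and threshold are placeholder names), you train the autoencoder only on normal images and flag inputs with a large reconstruction error:

import numpy as np
from keras.layers import Input, Dense
from keras.models import Model

inp = Input(shape=(784,))
encoded = Dense(64, activation='relu')(inp)
decoded = Dense(784, activation='sigmoid')(encoded)
autoencoder = Model(inp, decoded)
autoencoder.compile(optimizer='adam', loss='mse')

# train to reconstruct normal images only
autoencoder.fit(normal_images, normal_images, epochs=50, batch_size=128)

# score new images by reconstruction error; a large error suggests an anomaly
recon = autoencoder.predict(test_images)
errors = np.mean((recon - test_images) ** 2, axis=1)
anomalies = errors > threshold  # threshold chosen from validation data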
It is worth noting that single-class classification is another way of saying regression.
Classification tries to find a probability distribution over the N possible classes, and you usually pick the most probable class as the output (which is why most classification networks use a sigmoid or softmax activation on their outputs, since the range is [0, 1]). The output is discrete/categorical.
Regression, in contrast, tries to find the best model that represents your data by minimizing the error or some other metric (such as the well-known R^2 metric, or coefficient of determination). Its output is a real number/continuous value (which is why most regression networks use no activation on their outputs). I hope this helps; good luck with your coding.
After training a network using Keras:
I want to access the final trained weights of the network in some order.
I want to know the neuron activation values for every input passed. For example, after training, if I pass X as my input to the network, I want to know the neuron activation values for that X for every neuron in the network.
Does Keras provide API access to these things? I want to do further analysis based on the neuron activation values.
Update: I know I can do this using pure Theano, but Theano requires more low-level coding. And since Keras is built on top of Theano, I think there could be a way to do this?
If Keras can't do this, then which of TensorFlow and Caffe can? Keras is the easiest to use, followed by TensorFlow/Caffe, but I don't know which of these provides the network access I need. The last option for me would be to drop down to Theano, but I think it would be more time-consuming to build a deep CNN in Theano.
This is covered in the Keras FAQ; you basically want to compute the activations for each layer, which you can do with this code:
from keras import backend as K

# the layer number
n = 3
# with a Sequential model
get_nth_layer_output = K.function([model.layers[0].input],
                                  [model.layers[n].output])
layer_output = get_nth_layer_output([X])[0]
Unfortunately you would need to compile and run a function for each layer, but this should be straightforward.
To get the weights, you can call get_weights() on any layer.
nth_weights = model.layers[n].get_weights()
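For example (a sketch; this applies to layers such as Dense that hold a kernel and a bias), the returned value is a list of NumPy arrays:

# for a Dense layer: kernel of shape (input_dim, units), bias of shape (units,)
kernel, bias = model.layers[n].get_weights()
print(kernel.shape, bias.shape)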