Hierarchical classification using the LCPN (Local Classifier per Parent Node) approach - TensorFlow

Objective:
I am working on a hierarchical classification problem and want to solve it with the Local Classifier per Parent Node (LCPN) approach in TensorFlow. To do so, I have to create local classifiers based on the hierarchical structure of the dataset.
For example:
I have manually created a hierarchical tree structure for the CIFAR-10 dataset following this paper. The hierarchical structure is as follows (the leaves are the 10 CIFAR-10 classes):
root
+-- transport
|   +-- sky: airplane
|   +-- water: ship
|   +-- road: automobile, truck
+-- animal
    +-- bird: bird
    +-- reptile: frog
    +-- pet: cat, dog
    +-- medium: deer, horse
Based on this structure, a total of 6 local classifiers is required.
1 classifier at level 1:
1 for classifying the classes transport and animal.
2 classifiers at level 2:
1 for classifying the classes sky, water, and road (subclasses of transport)
1 for classifying the classes bird, reptile, pet, and medium (subclasses of animal)
3 classifiers at level 3:
1 for classifying the classes automobile and truck (subclasses of road)
1 for classifying the classes cat and dog (subclasses of pet)
1 for classifying the classes deer and horse (subclasses of medium)
NOTE:
I want all final predictions at level 3 (10 classes). If a classifier outputs a level-2 class that does not have more than one subclass at level 3, the corresponding level-3 class should be assigned automatically for that sample.
For example: if the first classifier identifies a sample as transport, the classifier for the subclasses of transport (sky, water, road) is selected next. If that level-2 classifier classifies the sample as sky, no further classifier is needed, since sky has only one subclass, airplane. So for my implementation, I want the final prediction at level 3, with the output airplane.
Implementation:
To implement this, so far I have done the following:
I have determined the number of local classifiers and their numbers of classes from the dataset using treelib. This gives the number of outputs required for each local classifier.
I am working on a dataset pipeline using tf.data.Dataset.filter, which provides a filtered dataset for training each model, since each local classifier should be trained only on the relevant samples. For example, the classifier for the subclasses of the level-1 class transport will be trained with samples of all classes under transport, so I want to filter out the samples that belong to the class animal or any subclass of animal.
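Something like the following sketch is what I have in mind (assuming the standard CIFAR-10 integer labels, where airplane=0, automobile=1, ship=8 and truck=9 are the leaves under transport; TRANSPORT_LEAVES and keep_transport are just placeholder names):
import tensorflow as tf

# Leaf labels under the level-1 class "transport" (assumed CIFAR-10 label ids)
TRANSPORT_LEAVES = tf.constant([0, 1, 8, 9], dtype=tf.int64)

def keep_transport(image, label):
    # Keep a sample only if its label is one of the transport leaf classes
    return tf.reduce_any(tf.equal(tf.cast(label, tf.int64), TRANSPORT_LEAVES))

(x_train, y_train), _ = tf.keras.datasets.cifar10.load_data()
ds = tf.data.Dataset.from_tensor_slices((x_train, y_train.squeeze()))
transport_ds = ds.filter(keep_transport)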
After that, I have to implement a decision-tree-style routine that routes a sample through the local classifiers at prediction time, sketched below.
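A rough sketch of the routing I am imagining (the classifiers and children dictionaries are placeholders describing the tree above; single-child nodes such as sky are resolved without a classifier):
import tensorflow as tf

def route(sample, node, classifiers, children):
    # classifiers: node name -> trained local Keras model
    # children: node name -> ordered list of child node names
    # A node absent from `children` is a finished level-3 label.
    while node in children:
        kids = children[node]
        if len(kids) == 1:
            node = kids[0]  # e.g. sky -> airplane: auto-assign, no classifier needed
        else:
            probs = classifiers[node](sample)
            node = kids[int(tf.argmax(probs, axis=-1)[0])]
    return node

# e.g. children = {'root': ['transport', 'animal'],
#                  'transport': ['sky', 'water', 'road'],
#                  'sky': ['airplane'], ...}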
Now, I am struggling with the implementation of this approach. Is there a better solution for this kind of problem, or any alternative approach?

Look at it this way: when you create an initial batch of information with 10 samples in one input, you do the same thing at every level and capture the results in each layer. As in any classification problem, the decision at each level is made by finding a maximum or minimum, or by ranges or critical values (much as a notebook picks its LAN interface over Wi-Fi: selection by speed and priority). Sample: simply print out the result at each level of the hierarchy:
import tensorflow as tf

class MyDenseLayer(tf.keras.layers.Layer):
    def __init__(self, num_outputs):
        super(MyDenseLayer, self).__init__()
        self.num_outputs = num_outputs

    def build(self, input_shape):
        self.kernel = self.add_weight("kernel",
                                      shape=[int(input_shape[-1]),
                                             self.num_outputs])

    def call(self, inputs):
        return tf.matmul(inputs, self.kernel)

start = 3
limit = 33
delta = 3

# Create data: 10 values, reshaped to (10, 1)
sample = tf.range(start, limit, delta)
sample = tf.cast(sample, dtype=tf.float32)
sample = tf.reshape(sample, (10, 1))

# Initial, (10, 1) -> (10, 10)
layer = MyDenseLayer(10)
data = layer(sample)
# Layer 1, (10, 2)
layer = MyDenseLayer(2)
data = layer(data)
# Layer 2, (10, 7)
layer = MyDenseLayer(7)
data = layer(data)
# Layer 3, (10, 10)
layer = MyDenseLayer(10)
data = layer(data)
print(data)

Related

How to fine tune an object detection model for custom data and classes using Detectron2?

I have pre-trained model weights (as .pth) and the model's configuration (as .yaml), and I want to fine-tune this model on my downstream task. The only problem is that I have 1 class while the pre-trained model has 5 classes, and after fine-tuning with Detectron2, it gives me results for all 5 classes instead of my 1 class. How can I deal with that scenario?
This is the exact tutorial I am following, but instead of training on all 5 classes as thing_classes= ['None','text', 'title', 'list', 'table', 'figure'], I want to train on just one class, ['text']. The author has answered, but it did not help: when I got the results during testing, I still got results for all 5 classes.
Pre-trained Model Weight
Pre-trained Model Config
I have put 'category_id' of every instance as 0 (because I have just 1 class).
Below is the code where I have registered the data and everything; there is no problem with training, the model trains well:
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor, DefaultTrainer
!wget -O ./faster_rcnn_R_50_FPN_3x.pth 'https://www.dropbox.com/s/dgy9c10wykk4lq4/model_final.pth?dl=1'
!wget -O ./faster_rcnn_R_50_FPN_3x.yaml 'https://www.dropbox.com/s/f3b12qc4hc0yh4m/config.yml?dl=1'
cfg = get_cfg()
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 1 # Just one class predictions
cfg.merge_from_file("./faster_rcnn_R_50_FPN_3x.yaml")
cfg.MODEL.WEIGHTS= './faster_rcnn_R_50_FPN_3x.pth' # layout parser Pre trained weights
cfg.SOLVER.IMS_PER_BATCH = 4
cfg.SOLVER.BASE_LR = 0.0025
cfg.SOLVER.MAX_ITER = 50 #adjust up if val mAP is still rising, adjust down if overfit
cfg.SOLVER.GAMMA = 0.05
cfg.MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE = 4
cfg.DATASETS.TRAIN = (Data_Resister_training,)
trainer = DefaultTrainer(cfg)
trainer.resume_or_load(resume=False)
trainer.train()
Not sure if this will fix it, but try swapping the order of the merge_from_file call and the setting of the number of classes:
cfg = get_cfg()
cfg.merge_from_file("./faster_rcnn_R_50_FPN_3x.yaml")
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 1 # Just one class predictions
...
Maybe that parameter gets overwritten.
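A quick way to check whether that is what happens is to print the value right after merging (a small diagnostic sketch reusing the config file from the question):
cfg = get_cfg()
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 1
cfg.merge_from_file("./faster_rcnn_R_50_FPN_3x.yaml")
print(cfg.MODEL.ROI_HEADS.NUM_CLASSES)  # if this prints 5, the YAML overwrote your setting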

How should I access feature maps before Pooling layer in the BiT model for tensorflow?

I would like to train on the BiT models, which offer good transfer learning potential. Over 25 model architectures are provided on TensorFlow Hub, grouped into classifiers and feature extractors. Since I would like to build a model with two outputs (features and classification results), I used the R50x1 feature-extraction model pre-trained on ImageNet-21k, as follows.
First, I load the model and add a fully connected head to it.
import tensorflow as tf
import tensorflow_hub as hub

# Load the feature-extractor module
module = hub.KerasLayer("https://tfhub.dev/google/bit/m-r50x1/1")

# Model with two outputs: class logits and the feature embedding
class BiT_2outs(tf.keras.Model):
    def __init__(self, num_classes, module):
        super().__init__()
        self.num_classes = num_classes
        self.head = tf.keras.layers.Dense(num_classes, kernel_initializer='zeros')
        self.bit_model = module

    def call(self, images):
        bit_embedding = self.bit_model(images)
        print(bit_embedding.shape)  # print out feature shapes
        return self.head(bit_embedding), bit_embedding

model = BiT_2outs(num_classes=NUM_CLASSES, module=module)
model(image)  # predict for a random image
The feature map shape printed by bit_embedding.shape is (1, 2048), which clearly corresponds to the feature maps after the global average/max pooling layer. However, I would like the feature maps before the GAP layer, with a shape such as (32, 32, 2048). How can I access these feature maps?
A. Kolesnikov, L. Beyer, X. Zhai, J. Puigcerver, J. Yung, S. Gelly and N. Houlsby: Big Transfer (BiT): General Visual Representation Learning.
Update 09.03.2022
I have manually downloaded the SavedModel and loaded it in TensorFlow, which now allows me to extract the outputs of the various layers inside the BiT model.
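For reference, the generic Keras technique looks like the sketch below, assuming the downloaded SavedModel loads as a Keras model; the layer name 'pre_pool_layer' is a stand-in for whatever the last convolutional layer is actually called (inspect model.summary() first):
import tensorflow as tf

model = tf.keras.models.load_model('path/to/bit_saved_model')
model.summary()  # locate the layer right before the global pooling

# Expose both the pre-pooling feature maps and the final output
feature_model = tf.keras.Model(
    inputs=model.input,
    outputs=[model.get_layer('pre_pool_layer').output,  # e.g. (None, 32, 32, 2048)
             model.output])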

Can we create a recursive model in keras?

I have models to build in keras where the output of one model has to be fed as input to other models.
Input -> say a batch of 64 x 64 images
First model -> three outputs, splitting some of the input images of the batch into 32 x 32, 64 x 32 and 64 x 16.
Each of these differently sized images is input to three different models, which split them further. This continues six times in a recursive fashion.
See the attached image for a better understanding.
There are 6 stages; at each stage there are three choices from the parent model.
In this way a ternary tree structure of models is formed.
Each model has its own loss and optimizers.
How do I implement such a model during training? Should I use recursion? Is recursion allowed in keras model training in such a manner?
Will the sizes/number change during training, or will you define the setup and keep it fixed? If it stays the same throughout and you are just changing it to test different model setups, you can easily create a function that generates the model tree. For example:
def create_model(tree_depth):
    models = []
    for i in range(tree_depth):
        model = ...  # might be nice to have a function for defining a single model
        models.append(model)
    top_level_inputs = tf.keras.layers.Input((64, 64))
    # using the functional model format here
    # if you want different parts of the input to go to different models, you may struggle.
    # Look into strided_slice if necessary
    x = models[0](top_level_inputs)
    for mod in models[1:]:
        x = mod(x)  # you will need to code the true tree structure here, rather than this one-level for loop
    total_model = tf.keras.models.Model(top_level_inputs, x)
    return total_model

my_model = create_model(my_depth)
The biggest challenge will be automating the shapes if each layer does not get the same sized inputs, and writing some sort of nested loop to handle the recursion/splitting; a recursive sketch of the latter follows.
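For the recursion question specifically, a functional-API tree can be built with an ordinary recursive Python function, something like this sketch (Dense layers stand in for the real sub-models, and all sizes are placeholders):
import tensorflow as tf

def build_tree(x, depth, branching=3):
    # Recursively attach `branching` child sub-models to x; returns all leaf tensors.
    if depth == 0:
        return [x]
    leaves = []
    for _ in range(branching):
        y = tf.keras.layers.Dense(16, activation='relu')(x)  # stand-in child model
        leaves += build_tree(y, depth - 1, branching)
    return leaves

inputs = tf.keras.Input((64, 64))
x = tf.keras.layers.Flatten()(inputs)
outputs = build_tree(x, depth=2)  # depth=6 as in the question gives 3**6 leaves
tree_model = tf.keras.Model(inputs, outputs)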

How to design an optimal CNN? [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about programming within the scope defined in the help center.
Closed 1 year ago.
I am working on a Ph.D. project whose objective is to reduce CO2 emissions on Earth.
I have a dataset, and I was able to successfully implement a CNN that gives 80% accuracy (worst-case scenario). However, the field where I work is very demanding, and I have the impression that I could get better accuracy with a well-optimized CNN.
How do experts design CNNs? How could I choose between Inception modules, dropout regularization, batch normalization, convolutional filter size, size and depth of convolutional channels, number of fully connected layers, activation neurons, etc.? How do people navigate this large optimization problem in a scientific manner? The combinations are endless. Are there any real-life examples where this problem is navigated, addressing its full complexity (not just optimizing a few hyper-parameters)?
Fortunately, my dataset is not too large, so the CNN models I am considering should have very few parameters.
How do experts design CNNs? How could I choose between Inception modules, dropout regularization, batch normalization, convolutional filter size, size and depth of convolutional channels, number of fully connected layers, activation neurons, etc.? How do people navigate this large optimization problem in a scientific manner? The combinations are endless.
You are right that the number of combinations is huge, and without the right approach you may end up nowhere. As a great one said, machine learning is an art, not a science, and results are data-dependent. Here are a few tips regarding your concern.
Log everything: during training, save the necessary logs of every experiment, such as training loss, validation loss, weight files, execution times, visualizations, etc. Some of these can be saved with CSVLogger, ModelCheckpoint, etc. TensorBoard is a great tool for inspecting both training logs and visualizations, and much more.
Strong validation strategies: this is very important. To build a stable cross-validation (CV) scheme, we must have a good understanding of the data and the challenges it poses. We check that the validation set has a distribution similar to the training set and the test set, and we try to make sure our models improve both on our CV and on the test set (if ground truth is available for the test set). Partitioning the data randomly is usually not enough to satisfy this; understanding the data, and how to partition it without introducing data leakage into the CV, is key to avoiding overfitting.
Change only one thing: during experiments, change one thing at a time and save the observations (logs) for those changes. For example, increase the image size gradually from, say, 224 upward and observe the results. Start with a small combination: while experimenting with image size, fix everything else, such as model architecture and learning rate; the same goes for the learning rate or the model architecture. Later, we may need to change more than one thing once we find promising combinations. In Kaggle competitions, these are very common approaches. Below is a very simple example (though it is not limited to this):
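(A sketch only: build_model, train_ds and val_ds are placeholders for your own model factory and data pipelines.)
import tensorflow as tf

# Change only the image size between runs; everything else stays fixed.
for img_size in [224, 256, 320]:
    model = build_model(input_shape=(img_size, img_size, 3))  # hypothetical helper
    model.compile(optimizer=tf.keras.optimizers.Adam(1e-3),
                  loss='categorical_crossentropy', metrics=['accuracy'])
    model.fit(train_ds(img_size), validation_data=val_ds(img_size), epochs=10,
              callbacks=[tf.keras.callbacks.CSVLogger(f'log_{img_size}.csv')])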
However, as you said, your Ph.D. project is to reduce CO2 emissions on Earth. In my understanding, this is more of an application-specific problem than an algorithm-specific one, so I think it is better to take advantage of well-recognized pre-trained models.
If we do wish to write our own CNN, we should give it a decent amount of time. Start with a very simple one, for example:
Conv2D(16, 3, 'relu') -> MaxPool(2)
Conv2D(32, 3, 'relu') -> MaxPool(2)
Conv2D(64, 3, 'relu') -> MaxPool(2)
Conv2D(128, 3, 'relu') -> MaxPool(2)
Here we gradually increase the depth while reducing the feature dimension, so that more semantic information emerges by the final layers. When stacking Conv2D layers, it is common practice to increase the channel depth in an order such as 16, 32, 64, 128, etc. If we want to put an Inception or residual block inside our network, I think we should first do some basic math about what feature properties will come out of it. Following such concepts, we may also wish to look at approaches like SENet and ResNeSt. Regarding Dropout: if we observe that the model overfits during training, we should add some. In the final layer, we may want to choose GlobalAveragePooling over a Flatten layer feeding fully connected layers. It should now be clear that a lot of ablation studies are needed to reach a satisfactory CNN model.
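As a sketch, the simple stack above could be written in Keras like this (the input size and class count are assumptions):
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input((224, 224, 3)),
    tf.keras.layers.Conv2D(16, 3, activation='relu'),
    tf.keras.layers.MaxPooling2D(2),
    tf.keras.layers.Conv2D(32, 3, activation='relu'),
    tf.keras.layers.MaxPooling2D(2),
    tf.keras.layers.Conv2D(64, 3, activation='relu'),
    tf.keras.layers.MaxPooling2D(2),
    tf.keras.layers.Conv2D(128, 3, activation='relu'),
    tf.keras.layers.MaxPooling2D(2),
    tf.keras.layers.GlobalAveragePooling2D(),  # preferred over Flatten, as noted above
    tf.keras.layers.Dense(10, activation='softmax'),
])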
In this regard, we suggest exploring two important things: (1) read one of the pre-trained model papers/blogs/videos about the strategies used to build the architecture, for example this EfficientNet Explained; (2) then explore its source code. That will give you more intuition and encourage you to build your own giant.
We would like to end with one last working example. See the model diagram below: it is a small inception network (source). Looking closely, we can see that it consists of the following three modules.
Conv Module
Inception Module
Downsample Module
Take a close look at each module's configuration, such as filter sizes and strides. Let's try to understand and implement these modules. Before that, here are two good references (1, 2) to refresh the Inception concept.
Conv Module
From the diagram we can see that it consists of one convolutional layer, one batch normalization layer, and one ReLU activation, and that it produces C feature maps with K x K filters and S x S strides. To build it, we create a class that inherits from tf.keras.layers.Layer:
class ConvModule(tf.keras.layers.Layer):
    def __init__(self, kernel_num, kernel_size, strides, padding='same'):
        super(ConvModule, self).__init__()
        # conv layer
        self.conv = tf.keras.layers.Conv2D(kernel_num,
                                           kernel_size=kernel_size,
                                           strides=strides, padding=padding)
        # batch norm layer
        self.bn = tf.keras.layers.BatchNormalization()

    def call(self, input_tensor, training=False):
        x = self.conv(input_tensor)
        x = self.bn(x, training=training)
        x = tf.nn.relu(x)
        return x
Inception Module
Next comes the Inception module. According to the diagram, it consists of two convolutional modules whose outputs are then merged. To merge them, we need to ensure that the output feature map dimensions (height and width) are the same.
class InceptionModule(tf.keras.layers.Layer):
    def __init__(self, kernel_size1x1, kernel_size3x3):
        super(InceptionModule, self).__init__()
        # two conv modules: they will take the same input tensor
        self.conv1 = ConvModule(kernel_size1x1, kernel_size=(1,1), strides=(1,1))
        self.conv2 = ConvModule(kernel_size3x3, kernel_size=(3,3), strides=(1,1))
        self.cat = tf.keras.layers.Concatenate()

    def call(self, input_tensor, training=False):
        x_1x1 = self.conv1(input_tensor)
        x_3x3 = self.conv2(input_tensor)
        x = self.cat([x_1x1, x_3x3])
        return x
Notice that we hard-code the exact kernel sizes and strides of both convolutional layers according to the network diagram. Also, in ConvModule we have already set padding to 'same', so the feature map dimensions will be the same for both self.conv1 and self.conv2, which is required in order to concatenate them at the end.
In this module, two variables act as placeholders: kernel_size1x1 and kernel_size3x3. This is on purpose, because we need different numbers of feature maps at different stages of the entire model: looking at the model diagram, we can see that InceptionModule takes a different number of filters at different stages.
Downsample Module
Lastly, the downsample module. The main intuition behind downsampling is to obtain feature information that represents the inputs more strongly, as it tends to remove unwanted features so the model can focus on the most relevant ones. There are many ways to reduce the dimension of the feature maps (or inputs), for example using strides of 2 or conventional pooling operations; among the pooling operations are MaxPooling, AveragePooling, and GlobalAveragePooling.
From the diagram, we can see that the downsample module contains one convolutional layer and one max-pooling layer whose outputs are merged. Looking closely at the diagram (top right), the convolutional layer uses a 3 x 3 filter with 2 x 2 strides, and the pooling layer (here MaxPooling) uses a 3 x 3 pooling size with 2 x 2 strides. We again need to ensure that the dimensions coming from each path are the same in order to merge them at the end. When we designed ConvModule, we purposely set the padding argument to 'same'; in this case, we need to set it to 'valid'.
class DownsampleModule(tf.keras.layers.Layer):
    def __init__(self, kernel_size):
        super(DownsampleModule, self).__init__()
        # conv layer
        self.conv3 = ConvModule(kernel_size, kernel_size=(3,3),
                                strides=(2,2), padding="valid")
        # pooling layer
        self.pool = tf.keras.layers.MaxPooling2D(pool_size=(3, 3),
                                                 strides=(2,2))
        self.cat = tf.keras.layers.Concatenate()

    def call(self, input_tensor, training=False):
        # forward pass
        conv_x = self.conv3(input_tensor, training=training)
        pool_x = self.pool(input_tensor)
        # merged
        return self.cat([conv_x, pool_x])
Okay, now we have built all three modules: ConvModule, InceptionModule, and DownsampleModule. Let's initialize their parameters according to the diagram.
class MiniInception(tf.keras.Model):
    def __init__(self, num_classes=10):
        super(MiniInception, self).__init__()
        # the first conv module
        self.conv_block = ConvModule(96, (3,3), (1,1))
        # 2 inception modules and 1 downsample module
        self.inception_block1 = InceptionModule(32, 32)
        self.inception_block2 = InceptionModule(32, 48)
        self.downsample_block1 = DownsampleModule(80)
        # 4 inception modules and 1 downsample module
        self.inception_block3 = InceptionModule(112, 48)
        self.inception_block4 = InceptionModule(96, 64)
        self.inception_block5 = InceptionModule(80, 80)
        self.inception_block6 = InceptionModule(48, 96)
        self.downsample_block2 = DownsampleModule(96)
        # 2 inception modules
        self.inception_block7 = InceptionModule(176, 160)
        self.inception_block8 = InceptionModule(176, 160)
        # average pooling
        self.avg_pool = tf.keras.layers.AveragePooling2D((7,7))
        # model tail
        self.flat = tf.keras.layers.Flatten()
        self.classifier = tf.keras.layers.Dense(num_classes, activation='softmax')

    def call(self, input_tensor, training=True, **kwargs):
        # forward pass
        x = self.conv_block(input_tensor)
        x = self.inception_block1(x)
        x = self.inception_block2(x)
        x = self.downsample_block1(x)
        x = self.inception_block3(x)
        x = self.inception_block4(x)
        x = self.inception_block5(x)
        x = self.inception_block6(x)
        x = self.downsample_block2(x)
        x = self.inception_block7(x)
        x = self.inception_block8(x)
        x = self.avg_pool(x)
        x = self.flat(x)
        return self.classifier(x)
The number of filters for each computational block is set according to the design of the model (see the diagram). After initializing all the blocks (in the __init__ function), we connect them according to the design (in the call function).
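As a quick sanity check (CIFAR-10-sized inputs assumed):
model = MiniInception(num_classes=10)
dummy = tf.random.normal((1, 32, 32, 3))
print(model(dummy).shape)  # (1, 10)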
I think you are way off on your estimate of the number of parameters needed. Think more like a few million, which is what you will get if you use transfer learning. You can struggle to make your own model if you wish, but you will probably not do any better (and more likely nowhere near as good) than the results you will get from transfer learning. I highly recommend the MobileNetV2 model. You can make that, or any of the other models, perform better with an adjustable learning rate using ReduceLROnPlateau; documentation for that is here. The other thing I recommend is the Keras callback EarlyStopping; documentation is here. Set it to monitor validation loss and set restore_best_weights=True. Set the number of epochs to a large number so this callback gets triggered and returns the model with the weights from the epoch with the lowest validation loss. My recommended code is shown below:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import regularizers
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adamax

height=224
width=224
img_shape=(height, width, 3)
dropout=.3
lr=.001
class_count=156 # number of classes
base_model=tf.keras.applications.MobileNetV2(include_top=False, input_shape=img_shape, pooling='max', weights='imagenet')
x=base_model.output
x=keras.layers.BatchNormalization(axis=-1, momentum=0.99, epsilon=0.001)(x)
x=Dense(512, kernel_regularizer=regularizers.l2(l=0.016), activity_regularizer=regularizers.l1(0.006),
        bias_regularizer=regularizers.l1(0.006), activation='relu',
        kernel_initializer=tf.keras.initializers.GlorotUniform(seed=123))(x)
x=Dropout(rate=dropout, seed=123)(x)
output=Dense(class_count, activation='softmax', kernel_initializer=tf.keras.initializers.GlorotUniform(seed=123))(x)
model=Model(inputs=base_model.input, outputs=output)
model.compile(Adamax(learning_rate=lr), loss='categorical_crossentropy', metrics=['accuracy'])
rlronp=tf.keras.callbacks.ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=1, verbose=1, mode='auto',
                                            min_delta=0.0001, cooldown=0, min_lr=0)
estop=tf.keras.callbacks.EarlyStopping(monitor="val_loss", min_delta=0, patience=4,
                                       verbose=1, mode="auto", baseline=None,
                                       restore_best_weights=True)
callbacks=[rlronp, estop]
Also look at the balance of your dataset; that is, compare how many training samples you have for each class. If the ratio of most samples to least samples is greater than 2 or 3, you may want to take action to mitigate that. Numerous methods are available; the simplest is to use the class_weight parameter in model.fit. To do that, you need to create a class_weights dictionary. The process is outlined below.
Let's say your class distribution is:
class0 - 500 samples
class1- 2000 samples
class2 - 1500 samples
class3 - 200 samples
Then your dictionary would be
class_weights={0: 2000/500, 1:2000/2000, 2: 2000/1500, 3: 2000/200}
and in model.fit set class_weight=class_weights.
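Putting it together in model.fit (the generators are placeholders for your own data):
history = model.fit(train_generator, validation_data=valid_generator, epochs=40,
                    callbacks=callbacks, class_weight=class_weights)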

Tensorflow 2.x: List behavior of model architecture / select model parts via indices like in Pytorch

Is it possible to imitate the PyTorch behaviour of selecting parts of an existing model via list indices (for transfer learning) in TF 2.x?
Example:
In PyTorch I can take the first seven layers of a MobileNetV2 architecture and use them as the backbone for my custom architecture like this:
import torch.nn as nn

class MyCustomNetwork(nn.Module):
    def __init__(self, num_classes):
        super(MyCustomNetwork, self).__init__()
        model_mobilenetv2 = mobilenet_v2(pretrained=True, cpu_mode=cpu_mode)
        # Take the output features after layer 7
        self.model_until_layer7 = model_mobilenetv2.features[:7]
        self.some_custom_layers = [...]

    def forward(self, x):
        x = self.model_until_layer7(x)
        x = self.some_custom_layers(x)
        return x
I know the TF transfer learning example, but it only allows cutting off the head of mobilenet_v2, not cutting after arbitrary layers.
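For what it's worth, a rough TF 2.x counterpart is to rebuild a sub-model up to an internal layer of the Keras application. Note that Keras layer indices do not line up one-to-one with PyTorch's features[:7], so the cut point below is only illustrative:
import tensorflow as tf

base = tf.keras.applications.MobileNetV2(include_top=False, weights='imagenet',
                                         input_shape=(224, 224, 3))
# Cut after an arbitrary internal layer (by index or, more robustly, by name)
backbone = tf.keras.Model(inputs=base.input, outputs=base.layers[7].output)

inputs = tf.keras.Input((224, 224, 3))
x = backbone(inputs)
# ... custom layers here, analogous to self.some_custom_layers above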