should I create json annotation for validation images? - object-detection

I am trying to implement mask rcnn for my own dataset but couldnt find any info about annotations for the val folder that contains the images for validattion. I created json annotations using Via 2.0.8 for my training set and that make senese. but if the validation images are the images to test later on why to make annotations for them. I can't train my module without json file in the val folder.
I tried to copy the json annotation for training images to the validation folder. it worker I think but that means I should have the same amount of images in both training and val with same names as well.

You can take a look at this answer. Basically, you need validation set to validate the output and to measure the performance of your model. After the model is trained using the training set, the validation set is used to measure the model's performance in case of accuracy, average precision, etc. This means that the validation set needs to have similar annotation files (ground truth) as the training set, so that the result of the model's prediction can be compared to the true results defined by you. For example, the model performs segmentation on an image and outputs some result. This result is then compared with the annotation (the expected correct output) in the validation set to measure the accuracy of the model's prediction. The test set is just for you to test your model on and see how it is performing. However there is no exact measurements in the test set to calculate the performance and accuracy.
In case of segmentation, one of the popular measurements is the dice score for which we need the annotations (in validation set) to calculate.

Related

Feed image data without class label

I am trying to implement image super resolution using SRGAN. In the process, I used DIV2K dataset (http://data.vision.ee.ethz.ch/cvl/DIV2K/DIV2K_train_HR.zip) as my source.
I have worked with image classification using CNN (I used keras.layers.convolutional.Conv2D). But in this case we don't have class label in my data source.
I have unzipped the file and kept in D:\Unzipped\DIV2K_train_HR. Then used following command to read the files.
img_dataset = tensorflow.keras.utils.image_dataset_from_directory("D:\\unzipped")
Then created the model as follows
model = Sequential()
model.add(Conv2D(filters=64,kernel_size=(3,3),activation="relu",input_shape=(256,256,3)))
model.add(AveragePooling2D(pool_size=(2,2)))
model.add(Conv2D(filters=64,kernel_size=(3,3),activation="relu"))
model.add(MaxPooling2D(pool_size=(2,2)))
model.compile(optimizer='sgd', loss='mse')
model.fit(img_dataset,batch_size=32, epochs=10)
But I am getting error : "Graph execution error". I am unable to find the root cause behind this error. Is this error appearing as the class label is missing (I think as per code DIV2K_train_HR is treated as one class label)? Or is this happening due to images don't have one specific size?
Note: This code does not match with SRGAN architecture. I am new to GAN and trying to move ahead step by step. I got stuck in the first step itself.
Yes, the error message is because you don't have labels in your dataset.
As a first step in GAN network you need to create a discriminator model: given some image it should recognize if it is a real or fake image. You can take images from your dataset and label them as 1 ("real images"). Then generate "fake images" by down-sampling and up-sampling images from your dataset and label them as 0. Train your discriminator model so that it can distinguish between original and processed images.
After that, you create generator model. The generator model takes a down-sampled version of the image as an input and creates an up-sampled version in original resolution. GAN model combines generator and discriminator models by passing output from generator to discriminator. The target label is 1, i.e. we want generator create up-sampled versions of images, which discriminator can't distinguish from the real ones. Now train GAN network (set 'trainable' to false for discriminator model weights).
After your generator manages to produce images, which discriminator can't distinguish from the real, you take them, label as 0 and train discriminator again. Then train generator again etc.
The process continues until discriminator can't distinguish fake images from the real ones anymore (i.e. accuracy doesn't exceed 0.5).
Please see a simple example on ("Generative Adversarial Networks"):
https://github.com/ageron/handson-ml3/blob/main/17_autoencoders_gans_and_diffusion_models.ipynb
This code is explained in ch. 17 in book "Hands-on Machine Learning with Scikit-Learn, Keras and TensorFlow (3rd edition)" by Aurélien Géron.

variational autoencoder with limited data

Im working on a binary classificaton project, and im using VAE (variational autoencoder) to handle the imbalance between the 2 classes by generating new samples for the minority class.
the first class (majority class) contains 20000 samples, and the second one (minority class) contains 500 samples.
After training VAE model on the minority class, i generated new samples for this class and add them to the training set, then i trained two classification models, a model on trained on the imbalanced data (only training set) and the second one trained with training set + data generated by VAE). The problem is the first model is giving results better than the second(f1-score, Roc auc...), and i thought that maybe the problem was because of the limited amount of data that the VAE was trained on.
Any help please.
Though 500 training Images are not good enough to generate diversified images from a VAE, you can still try producing some. It's better to take mean of latents of 10 different images (or even more) and pass it through the decoder ( if you're already doing this, ignore it. If you're doing some other method, try this).
If it's still not working, then, I suggest you to build a Conditional VAE on your entire dataset. In conditional VAE, you train VAE using the labels so that your models learns not only reconstruction but also what class of image it is reconstructing. This helps you to generate an Image of any particular class.

Can choose some data in training data after doing data augmentation?

I am training a UNET for semantic segmentation but I only have 200 labeled images. Given the small size of the dataset, it definitely needs some data augmentation techniques.
I have question about the test and the validation set.
I have custom data generator which keep feeding data from folder for training model.
So what I plan to do is:
do data augmentation for the training set and keep all of it in the same folder
"randomly" pick some of training data into test and validation set (of course, before training).
I am not sure if this is fine, since we just do some simple processing (flipping, transposing, adjusting brightness)
Would it be better to separate the data first and do the augmentation for the rest of data in the training folder?

Improving training accuracy on Resnet

I am training an object detector using mxnet/resnet50
After the last training run the mAP was 78%, and the loss was 0.37
When I run the detector on my test set (independent of train/val data)
I am getting false positives result - with some rather high 30-60% confidence levels. I think I need to add some train/val images that do not have ANY of the objects i'm training the detector for.
I'm planning on adding about 20% more images that have a label of -1 -- which I read somewhere is how you designate an image having no label in mxnet.
Does this seem reasonable? is -1 the right way to designate it? any downside?
Thanks,
john
One method for an unbalanced object detection task is to have a classifier before the object detection stage, which determines if the image contains an object or not. You can weight the loss for each class in this classifier relative to its inverse frequency (i.e. higher weight for classes that appear less frequently). You should test on data with a similar class balance as the real world. You might find this post useful.

How can I evaluate FaceNet embeddings for face verification on LFW?

I am trying to create a script that is able to evaluate a model on lfw dataset. As a process, I am reading pair of images (using the LFW annotation list), track and crop the face, align it and pass it through a pre-trained facenet model (.pb using tensorflow) and extract the features. The feature vector size = (1,128) and the input image is (160,160).
To evaluate for the verification task, I am using a Siamese architecture. That is, I am passing a pair of images (same or different person) from two identical models ([2 x facenet] , this is equivalent like passing a batch of images with size 2 from a single network) and calculating the euclidean distance of the embeddings. Finally, I am training a linear SVM classifier to extract 0 when the embedding distance is small and 1 otherwise using pair labels. This way I am trying to learn a threshold to be used while testing.
Using this architecture I am getting a score of 60% maximum. On the other hand, using the same architecture on other models (e.g vgg-face), where the features are 4096 [fc7:0] (not embeddings) I am getting 90%. I definitely cannot replicate the scores that I see online (99.x%), but using the embeddings the score is very low. Is there something wrong with the pipeline in general ?? How can I evaluate the embeddings for verification?
Nevermind, the approach is correct, facenet model that is available online is poorly trained and that is the reason for the poor score. Since this model is trained on another dataset and not the original one that is described in the paper (obviously), verification score will be less than expected. However, if you set a constant threshold to the desired value you can probably increase true positives but by sacrificing f1 score.
You can use a similarity search engine. Either using approximated kNN search libraries such as Faiss or Nmslib, cloud-ready similarity search open-source tools such as Milvus, or production-ready managed service such as Pinecone.io.