Retraining Inception and Downsampling - tensorflow

I have followed this TensorFlow tutorial on transfer learning with the Inception model, using my own dataset of 640x360 images. My question comes in two parts:
1) My dataset contains 640x360 images. Is the first operation a downsampling to 299x299? I ask because I have a higher-resolution version of the same dataset, and I am wondering whether training on the higher-resolution images would change performance (hopefully for the better).
2) When running the network (using tf.sess.run()), is my input image downsampled to 299x299?
Note: I have seen the 299x299 resolution listed in many places online, like this one, and I am confused about exactly which images it refers to: the initial training dataset images (for Inception I think it was ImageNet), the transfer learning dataset (my own), the input image when running the CNN, or a combination of the three.
Thanks in advance :)

The Inception model will resize your images to 299x299. You can confirm this by visualizing the TensorFlow graph. If you have enough samples for transfer learning, the accuracy should be good enough with images resized to 299x299. If you really want to train at the original resolution, the input size of the graph's initial layers needs to be changed.
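To get a feel for what the resize does to a 640x360 image, here is a minimal arithmetic sketch. It assumes a plain resize that does not preserve aspect ratio (an assumption about the preprocessing, not something confirmed above); the helper name resize_scale is made up for illustration.

```python
def resize_scale(src_w, src_h, dst_w=299, dst_h=299):
    """Return per-axis scale factors and the resulting aspect-ratio distortion."""
    sx = dst_w / src_w          # horizontal scale factor
    sy = dst_h / src_h          # vertical scale factor
    return sx, sy, sy / sx      # distortion > 1 means the image is squeezed horizontally

sx, sy, distortion = resize_scale(640, 360)
print(f"horizontal scale {sx:.3f}, vertical scale {sy:.3f}, distortion {distortion:.2f}x")
# prints: horizontal scale 0.467, vertical scale 0.831, distortion 1.78x
```

Under that assumption, a higher-resolution source ends up at the same 299x299 input, so any extra detail is discarded before the network sees it.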

Related

How to create a dataset for image classification

I trained a model using images I gathered from the web. Then, when inferences were made using images newly collected from the web, performance was poor.
I am wondering how I can improve my dataset using misclassified images. Can I add all the misclassified images to the training dataset? And then do I have to collect new images?
[Edit]
I added some of the misclassified images to the training dataset, and the performance evaluation got better.
It would help if you could provide more information on how you trained your model and on your network architecture.
However, here are some general guidelines:
You can try to diversify the images in your training set by, yes, adding new images. The more varied the examples you provide to your network, the higher the chance that they will be similar to the images you want predictions for.
Do data augmentation. It is pretty straightforward and usually improves accuracy quite a bit. You can have a look at this TensorFlow tutorial on data augmentation. If you don't know what data augmentation is, it is basically a technique for making minor changes to your images, for example rotating the image a bit, resizing it, etc. This way the model is trained to recognize your images even with slight changes, which usually makes it more robust to new images.
You could consider transfer learning. The main idea is to take a model that has been trained on a huge dataset and fine-tune it for your specific problem. The tutorial I linked shows the typical transfer learning workflow: taking a model pretrained on the ImageNet dataset (the huge dataset) and retraining it on the Kaggle "cats vs dogs" classification dataset (a smaller dataset, like the one you might have).
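To make the data augmentation guideline concrete, here is a small library-free sketch that treats an image as a nested list of pixel values. The function names are illustrative only; in practice you would use your framework's own augmentation utilities rather than this toy version.

```python
import random

def flip_horizontal(img):
    """Mirror each row left to right."""
    return [row[::-1] for row in img]

def rotate90(img):
    """Rotate clockwise by 90 degrees: reverse the rows, then transpose."""
    return [list(col) for col in zip(*img[::-1])]

def augment(img, rng):
    """Apply a random flip and a random number of quarter turns."""
    out = [list(row) for row in img]
    if rng.random() < 0.5:
        out = flip_horizontal(out)
    for _ in range(rng.randrange(4)):
        out = rotate90(out)
    return out

image = [[1, 2],
         [3, 4]]
print(flip_horizontal(image))  # [[2, 1], [4, 3]]
print(rotate90(image))         # [[3, 1], [4, 2]]
print(augment(image, random.Random(0)))
```

Each call to augment yields a slightly different view of the same picture, which is the point: the label stays the same while the pixels move.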

How can I Localize anomalies with heatmaps using autoencoder?

I am working on an anomaly detection model (for PCBs) using an autoencoder, on Google Colab with the free GPU. As a first step I tried to build my autoencoder and visualize the reconstruction of my training data (pictures without defects, size 1.3 MP). I built a three-layer model and trained it for 150 epochs with a batch size of 2, and it gave me good results. I used an SSIM loss function to calculate the difference between the test photos (pictures with anomalies) and the training data (pictures without anomalies). The problem is that I want to visualize these differences as a heatmap, since I read in some articles that it is possible to localize anomalies at the pixel level. I suppose it is related to the loss function used to calculate the difference.
Do you have any idea which functions could help me visualize/localize anomalies?
The task of outputting where in an image an anomaly is located is known as anomaly localization. There are many academic papers on advanced methods for it.
When using a reconstructing autoencoder on images for anomaly detection, one can compute the difference between the input image and the reconstructed output image as an anomaly-level image.
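A minimal sketch of that idea, assuming grayscale images stored as nested lists and using the absolute per-pixel difference as the anomaly score (SSIM or another measure could be swapped in). The function names and the threshold value are illustrative assumptions.

```python
def anomaly_map(image, reconstruction):
    """Per-pixel absolute difference between input and reconstruction."""
    return [[abs(a - b) for a, b in zip(row_in, row_out)]
            for row_in, row_out in zip(image, reconstruction)]

def anomalous_pixels(amap, threshold):
    """Return (row, col) positions whose anomaly score exceeds the threshold."""
    return [(y, x)
            for y, row in enumerate(amap)
            for x, score in enumerate(row)
            if score > threshold]

# Toy example: the input has a bright defect at row 1, col 2 that the
# autoencoder (trained on defect-free images) fails to reconstruct.
image          = [[10, 10, 10], [10, 10, 90], [10, 10, 10]]
reconstruction = [[10, 11, 10], [10, 10, 12], [ 9, 10, 10]]

amap = anomaly_map(image, reconstruction)
print(amap)                        # [[0, 1, 0], [0, 0, 78], [1, 0, 0]]
print(anomalous_pixels(amap, 20))  # [(1, 2)]
```

The resulting map can be displayed as a heatmap, for example with matplotlib's imshow and a colormap, so that high-error regions stand out.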

Variation in total loss while training the Faster RCNN model using customized data

I am working on an object detection model to identify two classes. I am using Faster RCNN on a customized dataset with the TensorFlow API. The dataset contains 20k images (augmented) with two classes. While training the model, the loss is not decreasing properly even as it reaches 100k steps. It shows a lot of variation, as shown in the image. Can someone tell me where I am making a mistake?

Training SSD-MOBILENET V1 and the loss does not decrease

I'm new to everything about CNNs and TensorFlow. I have been training a pretrained ssd_mobilenet_v1_pets.config model to detect columns of buildings for about one day, but the loss is between 2 and 1 and has not decreased for the last 10 hours.
I realized that my input images are 128x128 and SSD resizes the images to 300x300.
Does the size of the input images affect the training?
If that is the case, should I retrain the network with larger input images? Or what would be another option to decrease the loss? My training dataset has 660 images and my test dataset has 166. I don't know if that is enough images.
I really appreciate your help.
Loss values of ssd_mobilenet can be different from faster_rcnn. From EdjeElectronics' TensorFlow Object Detection Tutorial:
For my training on the Faster-RCNN-Inception-V2 model, it started at
about 3.0 and quickly dropped below 0.8. I recommend allowing your
model to train until the loss consistently drops below 0.05, which
will take about 40,000 steps, or about 2 hours (depending on how
powerful your CPU and GPU are). Note: The loss numbers will be
different if a different model is used. MobileNet-SSD starts with a
loss of about 20, and should be trained until the loss is consistently
under 2.
For more information: https://github.com/EdjeElectronics/TensorFlow-Object-Detection-API-Tutorial-Train-Multiple-Objects-Windows-10#6-run-the-training
The SSD Mobilnet architecture demands additional training to suffice
the loss accuracy values of the R-CNN model, however, offers
practicality, scalability, and easy accessibility on smaller devices
which reveals the SSD model as a promising candidate for further
assessment (Fleury and Fleury, 2018).
For more information: Fleury, D. & Fleury, A. (2018). Implementation of Regional-CNN and SSD machine learning object detection architectures for the real time analysis of blood borne pathogens in dark field microscopy. MDPI AG.
I would recommend holding out 15%-20% of your images for testing, covering all the variety present in the training data. You said you have 650+ images for training and 150+ for testing: the test set is roughly 25% of the size of the training set, or about 20% of all images. That looks like enough images to start with. More is better, but make sure your model still has sufficient data to learn from!
Resizing the images does not contribute to the loss. It ensures that all images have a consistent size so that the model can process them without bias. The loss has nothing to do with image resizing as long as every image is resized identically.
You will have to stop and resume from checkpoints repeatedly if you want your model to fit well. Usually, you can get good accuracy by retraining SSD MobileNet until the loss is consistently under 1. Ideally we want the loss to be as low as possible, but we also want to make sure the model is not overfitting. It is all about trial and error. (A loss between 0.5 and 1 seems to do the job well, but again it all depends on you.)
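One way to make "consistently under" a threshold concrete is to look at the most recent logged losses rather than a single value. The sketch below is only an illustration: the loss values are invented, and the window size is an arbitrary choice, not a recommendation from the tutorial.

```python
def moving_average(values, window):
    """Smooth a noisy loss curve with a simple trailing mean."""
    return [sum(values[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(values))]

def consistently_below(values, threshold, window):
    """True if each of the last `window` values is below the threshold."""
    if len(values) < window:
        return False
    return all(v < threshold for v in values[-window:])

# Hypothetical loss values logged during training (made up for illustration).
losses = [20.0, 8.0, 4.0, 2.5, 1.8, 2.2, 1.6, 1.4, 1.5, 1.3]

print(consistently_below(losses, threshold=2.0, window=4))  # True
print(consistently_below(losses, threshold=2.0, window=6))  # False
print(moving_average([1.0, 3.0, 5.0], window=2))            # [2.0, 4.0]
```

A smoothed curve also makes it easier to tell a real plateau from ordinary step-to-step noise.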
I think your model is underperforming because your test data contains more variety than your training data covers.
The training data has not given the model enough knowledge to handle the new variety in the test data. (For example: your test data has some images of buildings from new angles that are not sufficiently represented in the training data.) In that case, I recommend putting the full variety of images into the training data and then picking test images while making sure you still have sufficient training data for the new angles. That's why I recommend holding out 15%-20% as test data.
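As a rough sketch of such a split, the snippet below shuffles the pooled images and holds out a fixed fraction for testing. The file names are hypothetical placeholders, and a plain random shuffle is only an assumption: it does not by itself guarantee that every angle or pose appears in both sets, so you may still need to check the result by hand.

```python
import random

def split_dataset(items, test_fraction=0.2, seed=0):
    """Shuffle a copy of the items and hold out a fraction for testing."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical file names standing in for the 660 + 166 = 826 images.
images = [f"img_{i:04d}.jpg" for i in range(826)]

train, test = split_dataset(images, test_fraction=0.2)
print(len(train), len(test))  # 661 165
```

Pooling everything first and then splitting keeps the training and test sets drawn from the same mix of images.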

TensorFlow object detection: why is the location in the image affecting detection accuracy when using SSD MobileNet v1?

I'm training a model to detect meteors in pictures of the night sky. I have a fairly small dataset of about 85 images, each annotated with a bounding box. I'm using transfer learning, starting from the ssd_mobilenet_v1_coco_11_06_2017 checkpoint with TensorFlow 1.4. I'm resizing images to 600x600 pixels during training. I'm using data augmentation in the pipeline configuration to randomly flip the images horizontally and vertically and to rotate them by 90 degrees. After 5000 steps, the model converges to a loss of about 0.3 and will detect meteors, but it seems to matter where in the image the meteor is located. Do I have to train the model with examples of every possible location? I've attached a sample detection run where I tiled a meteor over the entire image and received various levels of detection (filtered to 50%). How can I improve this?
(Image: detected meteors in image example)
It could very well be your data, and I think you are making a prudent move by improving the heterogeneity of your dataset, BUT it could also be your choice of model.
It is worth noting that ssd_mobilenet_v1_coco has the lowest COCO mAP of the models in the TensorFlow Object Detection API model zoo. You aren't trying to detect a COCO object, but the mAP numbers are a reasonable approximation of generic model accuracy.
At the highest level, the choice of model is largely a tradeoff between speed and accuracy. The model you chose, ssd_mobilenet_v1_coco, favors speed over accuracy. Consequently, I would recommend you try one of the Faster RCNN models (e.g., faster_rcnn_inception_v2_coco) before you spend a significant amount of time preprocessing images.