How to refine the Graphcut cmex code based on a specific energy functions? - optimization

I download the following graph-cut code:
https://github.com/shaibagon/GCMex
I compiled the mex files, and ran it for pre-defined image in the code (which is rgb image)
I wanna optimize the image segmentation results,
I have probability map of the image, which its dimension is (width,height, 5). Five probability distribution over the image dimension are stacked together. each relates to one the classes.
My problem is which parts of code should according to the probability image.
I want to define Data and Smoothing terms based on my application.
My question is:
1) Has someone refined the code according to the defining different energy function (I wanna change Unary and pair-wise formulation).
2) I have a stack of 3D images. I wanna define 6-neighborhood system, 4 neighbors in current slice and the other two from two adjacent slices. In which function and part of code can I do the refinements?
Thanks

Related

Is it possible that two displacement sensors value( one is digital input and the other is analog) express simultaneously in a vi?

I tried to express two displacement values in one waveform chart.
I have two displacement sensors, one is a digital input sensor and the other is analog input sensor.
I have to see those values in one waveform chart simultaneously.
In my attempts, two VIs of each instruments were combined into one VI. I found the error that when I ran the VI, only one VI would report values and not both simultaneously.
By any chance, is there a way to run it at the same time and see the values on one graph?
Let me share a few scenarios that might help solve your issue.
Plot multiple doubles on a chart: bundle them together and put the resulting cluster into a chart.
bundled doubles on a chart
Two VIs measuring a double and plotting those results.
bundled subvi outputs to a chart
You could similarly plot boolean values (your digital input sensor) by using the boolean to 0,1 VI first (converts the T/F to 1/0 respectively) and that can be bundled as above.
(These are the most direct/easiest ways of doing this; you have a number generator and you bundle the numbers to a graph).
Of course, I can imagine that your question might actually be about how to share values from parallel-running subVIs. If that's the case, say so and this answer can be edited to point you in the right direction.

Implement CVAE for a single image

I have a multi-dimensional, hyper-spectral image (channels, width, height = 15, 2500, 2500). I want to compress its 15 channel dimensions into 5 channels.So, the output would be (channels, width, height = 5, 2500, 2500). One simple way to do is to apply PCA. However, performance is not so good. Thus, I want to use Variational AutoEncoder(VAE).
When I saw the available solution in Tensorflow or keras library, it shows an example of clustering the whole images using Convolutional Variational AutoEncoder(CVAE).
https://www.tensorflow.org/tutorials/generative/cvae
https://keras.io/examples/generative/vae/
However, I have a single image. What is the best practice to implement CVAE? Is it by generating sample images by moving window approach?
One way of doing it would be to have a CVAE that takes as input (and output) values of all the spectral features for each of the spatial coordinates (the stacks circled in red in the picture). So, in the case of your image, you would have 2500*2500 = 6250000 input data samples, which are all vectors of length 15. And then the dimension of the middle layer would be a vector of length 5. And, instead of 2D convolutions that are normally used along the spatial domain of images, in this case it would make sense to use 1D convolution over the spectral domain (since the values of neighbouring wavelengths are also correlated). But I think using only fully-connected layers would also make sense.
As a disclaimer, I haven’t seen CVAEs used in this way before, but like this, you would also get many data samples, which is needed in order for the learning generalise well.
Another option would be indeed what you suggested -- to just generate the samples (patches) using a moving window (maybe with a stride that is the half size of the patch). Even though you wouldn't necessarily get enough data samples for the CVAE to generalise really well on all HSI images, I guess it doesn't matter (if it overfits), since you want to use it on that same image.

DeepLabV3, segmentation and classification/detection on coral

I am trying to use DeepLabV3 for image segmentation and object detection/classification on Coral.
I was able to sucessfully run the semantic_segmentation.py example using DeepLabV3 on the coral, but that only shows an image with an object segmented.
I see that it assigns labels to colors - how do i associate the labels.txt file that I made based off of the label info of the model to these colors? (how do i know which color corresponds to which label).
When I try to run the
engine = DetectionEngine(args.model)
using the deeplab model, I get the error
ValueError: Dectection model should have 4 output tensors!This model
has 1.
I guess this way is the wrong approach?
Thanks!
I believe you have reached out to us regarding the same query. I just wanted to paste the answer here for others to reference:
"The detection model usually have 4 output tensors to specifies the locations, classes, scores, and number and detections. You can read more about it here. In contrary, the segmentation model only have a single output tensor, so if you treat it the same way, you'll most likely segfault trying to access the wrong memory region. If you want to do all three tasks on the same image, my suggestion is to create 3 different engines and feed the image into each. The only problem with this is that each time you switch the model, there will likely be data transfer bottleneck for the model to get loaded onto the TPU. We have here an example on how you can run 2 models on a single TPU, you should be able to modify it to take 3 models."
On the last note, I just saw that you added:
how do i associate the labels.txt file that I made based off of the label info of the model to these colors
I just don't think this is something you can do for segmentation model but maybe I'm just confused on your query?
Take object detection model for example, there are 4 output tensors, the second tensor gives you an array of id associates with a certain class that you can map to a a label file. Segmentaion models only give the pixel surrounding an objects.
[EDIT]
Apology, looks like I'm the one confused on segmentation models.
Quote form my college :)
"You are interested to know the name of the label, you can find the corresponding integer to that label from result array in Semantic_segmentation.py. Where result is classification data of each pixel.
For example;
if you print result array in the with bird.jpg as input you would find few pixel's value as 3 which is corresponding 4th label in pascal_voc_segmentation_labels.txt (as indexing starts at 0 )."

TensorFlow Object Detection API: evaluation mAP behaves weirdly?

I am training an object detector for my own data using Tensorflow Object Detection API. I am following the (great) tutorial by Dat Tran https://towardsdatascience.com/how-to-train-your-own-object-detector-with-tensorflows-object-detector-api-bec72ecfe1d9. I am using the provided ssd_mobilenet_v1_coco-model pre-trained model checkpoint as the starting point for the training. I have only one object class.
I exported the trained model, ran it on the evaluation data and looked at the resulted bounding boxes. The trained model worked nicely; I would say that if there was 20 objects, typically there were 13 objects with spot on predicted bounding boxes ("true positives"); 7 where the objects were not detected ("false negatives"); 2 cases where problems occur were two or more objects are close to each other: the bounding boxes get drawn between the objects in some of these cases ("false positives"<-of course, calling these "false positives" etc. is inaccurate, but this is just for me to understand the concept of precision here). There are almost no other "false positives". This seems much better result than what I was hoping to get, and while this kind of visual inspection does not give the actual mAP (which is calculated based on overlap of the predicted and tagged bounding boxes?), I would roughly estimate the mAP as something like 13/(13+2) >80%.
However, when I run the evaluation (eval.py) (on two different evaluation sets), I get the following mAP graph (0.7 smoothed):
mAP during training
This would indicate a huge variation in mAP, and level of about 0.3 at the end of the training, which is way worse than what I would assume based on how well the boundary boxes are drawn when I use the exported output_inference_graph.pb on the evaluation set.
Here is the total loss graph for the training:
total loss during training
My training data consist of 200 images with about 20 labeled objects each (I labeled them using the labelImg app); the images are extracted from a video and the objects are small and kind of blurry. The original image size is 1200x900, so I reduced it to 600x450 for the training data. Evaluation data (which I used both as the evaluation data set for eval.pyand to visually check what the predictions look like) is similar, consists of 50 images with 20 object each, but is still in the original size (the training data is extracted from the first 30 min of the video and evaluation data from the last 30 min).
Question 1: Why is the mAP so low in evaluation when the model appears to work so well? Is it normal for the mAP graph fluctuate so much? I did not touch the default values for how many images the tensorboard uses to draw the graph (I read this question: Tensorflow object detection api validation data size and have some vague idea that there is some default value that can be changed?)
Question 2: Can this be related to different size of the training data and the evaluation data (1200x700 vs 600x450)? If so, should I resize the evaluation data, too? (I did not want to do this as my application uses the original image size, and I want to evaluate how well the model does on that data).
Question 3: Is it a problem to form the training and evaluation data from images where there are multiple tagged objects per image (i.e. surely the evaluation routine compares all the predicted bounding boxes in one image to all the tagged bounding boxes in one image, and not all the predicted boxes in one image to one tagged box which would preduce many "false false positives"?)
(Question 4: it seems to me the model training could have been stopped after around 10000 timesteps were the mAP kind of leveled out, is it now overtrained? it's kind of hard to tell when it fluctuates so much.)
I am a newbie with object detection so I very much appreciate any insight anyone can offer! :)
Question 1: This is the tough one... First, I think you don't understand correctly what mAP is, since your rough calculation is false. Here is, briefly, how it is computed:
For each class of object, using the overlap between the real objects and the detected ones, the detections are tagged as "True positive" or "False positive"; all the real objects with no "True positive" associated to them are labelled "False Negative".
Then, iterate through all your detections (on all images of the dataset) in decreasing order of confidence. Compute the accuracy (TP/(TP+FP)) and recall (TP/(TP+FN)), only counting the detections that you've already seen ( with confidence bigger than the current one) for TP and FP. This gives you a point (acc, recc), that you can put on a precision-recall graph.
Once you've added all possible points to your graph, you compute the area under the curve: this is the Average Precision for this category
if you have multiple categories, the mAP is the standard mean of all APs.
Applying that to your case: in the best case your true positive are the detections with the best confidence. In that case your acc/rec curve will look like a rectangle: you'd have 100% accuracy up to (13/20) recall, and then points with 13/20 recall and <100% accuracy; this gives you mAP=AP(category 1)=13/20=0.65. And this is the best case, you can expect less in practice due to false positives which higher confidence.
Other reasons why yours could be lower:
maybe among the bounding boxes that appear to be good, some are still rejected in the calculations because the overlap between the detection and the real object is not quite big enough. The criterion is that Intersection over Union (IoU) of the two bounding boxes (real one and detection) should be over 0.5. While it seems like a gentle threshold, it's not really; you should probably try and write a script to display the detected bounding boxes with a different color depending on whether they're accepted or not (if not, you'll get both a FP and a FN).
maybe you're only visualizing the first 10 images of the evaluation. If so, change that, for 2 reasons: 1. maybe you're just very lucky on these images, and they're not representative of what follows, just by luck. 2. Actually, more than luck, if these images are the first from the evaluation set, they come right after the end of the training set in your video, so they are probably quite similar to some images in the training set, so they are easier to predict, so they're not representative of your evaluation set.
Question 2: if you have not changed that part in the config file mobilenet_v1_coco-model, all your images (both for training and testing) are rescaled to 300x300 pixels at the start of the network, so your preprocessings don't matter.
Question 3: no it's not a problem at all, all these algorithms were designed to detect multiple objects in images.
Question 4: Given the fluctuations, I'd actually keep training it until you can see improvement or clear overtraining. 10k steps is actually quite small, maybe it's enough because your task is relatively easy, maybe it's not enough and you need to wait ten times that to have significant improvement...

VTK / ITK Dice Similarity Coefficient on Meshes

I am new to VTK and am trying to compute the Dice Similarity Coefficient (DSC), starting from 2 meshes.
DSC can be computed as 2 Vab / (Va + Vb), where Vab is the overlapping volume among mesh A and mesh B.
To read a mesh (i.e. an organ contour exported in .vtk format using 3D Slicer, https://www.slicer.org) I use the following snippet:
string inputFilename1 = "organ1.vtk";
// Get all data from the file
vtkSmartPointer<vtkGenericDataObjectReader> reader1 = vtkSmartPointer<vtkGenericDataObjectReader>::New();
reader1->SetFileName(inputFilename1.c_str());
reader1->Update();
vtkSmartPointer<vtkPolyData> struct1 = reader1->GetPolyDataOutput();
I can compute the volume of the two meshes using vtkMassProperties (although I observed some differences between the ones computed with VTK and the ones computed with 3D Slicer).
To then intersect 2 meshses, I am trying to use vtkIntersectionPolyDataFilter. The output of this filter, however, is a set of lines that marks the intersection of the input vtkPolyData objects, and NOT a closed surface. I therefore need to somehow generate a mesh from these lines and compute its volume.
Do you know which can be a good, accurate way to generete such a mesh and how to do it?
Alternatively, I tried to use ITK as well. I found a package that is supposed to handle this problem (http://www.insight-journal.org/browse/publication/762, dated 2010) but I am not able to compile it against the latest version of ITK. It says that ITK must be compiled with the (now deprecated) ITK_USE_REVIEW flag ON. Needless to say, I compiled it with the new Module_ITKReview set to ON and also with backward compatibility but had no luck.
Finally, if you have any other alternative (scriptable) software/library to solve this problem, please let me know. I need to perform these computation automatically.
You could try vtkBooleanOperationPolyDataFilter
http://www.vtk.org/doc/nightly/html/classvtkBooleanOperationPolyDataFilter.html
filter->SetOperationToIntersection();
if your data is smooth and well-behaved, this filter works pretty good. However, sharp structures, e.g. the ones originating from binary image marching cubes algorithm can make a problem for it. That said, vtkPolyDataToImageStencil doesn't necessarily perform any better on this regard.
I had once impression that the boolean operation on polygons is not really ideal for "organs" of size 100k polygons and more. Depends.
If you want to compute a Dice Similarity Coefficient, I suggest you first generate volumes (rasterize) from the meshes by use of vtkPolyDataToImageStencil.
Then it's easy to compute the DSC.
Good luck :)