Libsvm model file: support vector labels are different from class labels

I have a training file for a two-class problem and the labels are +1 and -1. After I run svm-train, the generated model file has real-valued labels between -2 and +2.
Portion of Training file:
-1 1:-0.0902235 2:0.642459 3:-0.996008 4:-0.990354 5:-0.0415552 6:-0.559606 7:0.481824
-1 1:-0.53561 2:-0.739702 3:0.0719997 4:-0.0874957 5:-0.804345 6:-0.492728 7:-0.192003
1 1:-0.0607377 2:0.621136 3:-0.998019 4:-0.997149 5:0.0494642 6:-0.402682 7:0.128106
Corresponding support vectors in the model file:
-2 1:-0.0902235 2:0.642459 3:-0.996008 4:-0.990354 5:-0.0415552 6:-0.559606 7:0.481824
-0.962578101983108 1:-0.53561 2:-0.739702 3:0.0719997 4:-0.0874957 5:-0.804345 6:-0.492728 7:-0.192003
2 1:-0.0607377 2:0.621136 3:-0.998019 4:-0.997149 5:0.0494642 6:-0.402682 7:0.128106
They are in libsvm format.
I have not been able to figure out why this label alteration happens. Do the support vector labels matter at test time?

Just got the answer. The label alteration is explained in the libsvm FAQ:
http://www.csie.ntu.edu.tw/~cjlin/libsvm/faq.html#f402
In short, the leading number on each support vector line in the model file is not a class label at all: it is the coefficient y_i * alpha_i from the dual problem, so it is a real value bounded by ±C (here C = 2), and its sign carries the original class label.
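For illustration, here is a minimal sketch that recovers each support vector's original label from the sign of its coefficient (assuming a two-class C-SVC model saved as model.txt; the file name is hypothetical):

# Parse the SV section of a libsvm model file; each line there starts
# with the coefficient y_i * alpha_i, whose sign is the original label.
with open('model.txt') as f:
    lines = f.read().splitlines()

sv_lines = lines[lines.index('SV') + 1:]
for line in sv_lines:
    coef = float(line.split()[0])
    label = 1 if coef > 0 else -1
    print('coef=%+.6f  original label=%+d' % (coef, label))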

Related

How to batch an object detection dataset?

I am working on implementing a face detection model on the wider face dataset. I learned it is available in TensorFlow Datasets, so I am using that.
However, I am facing an issue while batching the data. Since an image can have multiple faces, the number of bounding boxes differs from image to image. For example, an image with 2 faces will have 2 bounding boxes, one with 4 will have 4, and so on.
The problem is that these unequal numbers of bounding boxes cause the Dataset's tensors to have different shapes, and in TensorFlow, as far as I know, we cannot batch tensors of unequal shapes (source: Tensorflow Datasets: Make batches with different shaped data). So I am unable to batch the dataset.
So, after loading the dataset and batching it with the following code:
ds, info = tfds.load('wider_face', split='train', shuffle_files=True, with_info=True)
ds1 = ds.batch(12)
for step, (x, y, z) in enumerate(ds1):
    print(step)
    break
I am getting this kind of error when I run it (link to error image).
In general, any help on how I can batch TensorFlow object detection datasets will be very helpful.
It might be a bit late but I thought I should post this anyway. The padded_batch feature ought to do the trick here. It gets around the issue by matching dimensions, padding with zeros.
ds, info = tfds.load('wider_face', split='train', shuffle_files=True, with_info=True)
ds1 = ds.padded_batch(12)
for step, (x, y, z) in enumerate(ds1):
    print(step)
    break
Another solution would be to not use batch at all and instead assemble batches yourself with for loops and custom buffers, but that kind of defeats the purpose. Just for posterity I'll add the sample code here as an example of a simple workaround.
ds, info = tfds.load('wider_face', split='train', shuffle_files=True, with_info=True)
batch_size = 12
image_annotations_pair = [(x['image'], x['faces']['bbox']) for n, x in enumerate(ds) if n < batch_size]
Then use a train_step modified for this.
For details one may refer to https://www.kite.com/python/docs/tensorflow.contrib.autograph.operators.control_flow.dataset_ops.DatasetV2.padded_batch
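As a further sketch, this is roughly what an explicit padded_batch call could look like for this dataset; the padded_shapes and padding_values below are illustrative assumptions, not part of the original answer:

import tensorflow as tf
import tensorflow_datasets as tfds

ds, info = tfds.load('wider_face', split='train', shuffle_files=True, with_info=True)

# Keep just the image and its bounding boxes so every element has the same structure.
ds = ds.map(lambda x: (x['image'], x['faces']['bbox']))

# Pad the ragged dimensions with zeros: image height/width differ per example,
# and the number of boxes differs per image.
ds1 = ds.padded_batch(
    12,
    padded_shapes=([None, None, 3], [None, 4]),
    padding_values=(tf.constant(0, dtype=tf.uint8),
                    tf.constant(0.0, dtype=tf.float32)))

for step, (images, boxes) in enumerate(ds1):
    print(step, images.shape, boxes.shape)
    break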

DeepLabV3, segmentation and classification/detection on coral

I am trying to use DeepLabV3 for image segmentation and object detection/classification on Coral.
I was able to successfully run the semantic_segmentation.py example using DeepLabV3 on the Coral, but that only shows an image with an object segmented.
I see that it assigns labels to colors. How do I associate the labels.txt file that I made (based on the label info of the model) with these colors? (How do I know which color corresponds to which label?)
When I try to run
engine = DetectionEngine(args.model)
using the deeplab model, I get the error
ValueError: Dectection model should have 4 output tensors! This model has 1.
I guess this is the wrong approach?
Thanks!
I believe you have reached out to us regarding the same query. I just wanted to paste the answer here for others to reference:
"A detection model usually has 4 output tensors, which specify the locations, classes, scores, and number of detections. You can read more about it here. In contrast, a segmentation model has only a single output tensor, so if you treat it the same way, you'll most likely segfault trying to access the wrong memory region. If you want to do all three tasks on the same image, my suggestion is to create 3 different engines and feed the image into each. The only problem with this is that each time you switch the model, there will likely be a data-transfer bottleneck while the model is loaded onto the TPU. We have here an example of how you can run 2 models on a single TPU; you should be able to modify it to take 3 models."
On the last note, I just saw that you added:
how do i associate the labels.txt file that I made based off of the label info of the model to these colors
I just don't think this is something you can do for a segmentation model, but maybe I'm just confused about your query?
Take an object detection model, for example: there are 4 output tensors, and the second one gives you an array of IDs, each associated with a certain class, that you can map to a label file. Segmentation models only give you the pixels surrounding objects.
[EDIT]
Apologies, looks like I'm the one confused about segmentation models.
Quote from my colleague :)
"You are interested to know the name of the label, you can find the corresponding integer to that label from result array in Semantic_segmentation.py. Where result is classification data of each pixel.
For example;
if you print result array in the with bird.jpg as input you would find few pixel's value as 3 which is corresponding 4th label in pascal_voc_segmentation_labels.txt (as indexing starts at 0 )."
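To make that mapping concrete, a minimal sketch (result here is a stand-in for the real per-pixel class array produced by semantic_segmentation.py, and the label file is assumed to have one label per line):

import numpy as np

# Stand-in for the real per-pixel output of the segmentation model.
result = np.array([0, 0, 3, 3, 15])

# Line index in the label file == class id in the result array.
with open('pascal_voc_segmentation_labels.txt') as f:
    labels = [line.strip() for line in f]

for class_id in np.unique(result):
    print(class_id, labels[int(class_id)])  # e.g. 3 -> the 4th line ('bird')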

Is submitting a single column of data (I have 513 columns) as an image input to a CNN the correct approach?

I have DNA methylation data of lower-grade glioma samples obtained from the GDC data portal. The values in my data range from 0 to 1. I want to classify these samples into two classes, IDH WT and IDH mutant.
I want to use a CNN for the classification, so I am doing image embedding here to provide images as input. I am new to deep learning methods and needed help with data preparation for the CNN. I am referring to this paper (reference paper: doi: http://dx.doi.org/10.1101/364323). My question is:
I have a data frame of 9203*513 (rows*columns), and I have saved each column into a separate file (an Excel file).
Then I reshaped each single file into 767*12 (rows*columns) by inserting one more row, with a zero added there.
Then I did image embedding of each file and submitted all of these files to the CNN as input (training set: 80% of the images, test set: 20%).
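For reference, a minimal numpy sketch of the reshaping step described above (assuming a single appended zero brings the 9203 values of one column up to 767 * 12 = 9204):

import numpy as np

column = np.random.rand(9203)    # stand-in for one real data column (values in [0, 1])
padded = np.append(column, 0.0)  # 9203 values + 1 zero = 9204
image = padded.reshape(767, 12)  # 767 * 12 == 9204
print(image.shape)               # (767, 12)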
So is this the correct way to prepare input for a CNN for a classification problem?
I'm open to all suggestions.
Thank you for your time and consideration.

CNTK ImageDeserializer and DCGAN sample

I'm reworking this sample https://github.com/Microsoft/CNTK/blob/master/Tutorials/CNTK_206B_DCGAN.ipynb to work with png MNIST files (rather than the flat 1d array image input that the tutorial uses). I use ImageDeserializer (and a map file) to load the data:
def create_mb_source(map_file, image_dims, num_classes, randomize=True):
    transforms = [
        xforms.scale(width=image_dims[2], height=image_dims[1], channels=image_dims[0], interpolations='linear')]
    return MinibatchSource(ImageDeserializer(map_file, StreamDefs(
        features=StreamDef(field='image', transforms=transforms),
        labels=StreamDef(field='label', shape=num_classes))),
        randomize=randomize)
I changed the input of the Discriminator (and the output of the Generator) to expect a 28x28 image. See the code here: https://github.com/olgaliak/cntk-cyclegan/blob/master/trainDCGan.py
The problem is that trainDCGan.py is now generating noise. I'd appreciate your help!
The issue got solved once I:
1) switched to using 3 channels in ImageDeserializer, and
2) changed the network architecture to use 2D strides/kernels instead of 1D.
This commit highlights the changes that made things work.
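For reference, a minimal sketch of what change 2) could look like with CNTK's layers API; the layer names come from cntk.layers, but the filter counts and kernel sizes here are illustrative assumptions, not the actual commit:

import cntk as C
from cntk.layers import Convolution2D, Dense, Sequential

# Discriminator stem using 2D kernels and 2D strides over a 28x28 RGB image,
# instead of 1D convolutions over a flattened vector.
def make_discriminator():
    return Sequential([
        Convolution2D((5, 5), 64, strides=(2, 2), pad=True, activation=C.relu),
        Convolution2D((5, 5), 128, strides=(2, 2), pad=True, activation=C.relu),
        Dense(1, activation=C.sigmoid)
    ])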

Strict class labels in SVM

I'm using one-vs-all to do a 21-class SVM categorization.
I want the label -1 to mean "not in this class" and the label 1 to mean "indeed in this class" for each of the 21 kernels.
I've generated my pre-computed kernels and my test vectors using this standard.
Using easy.py everything went well for 20 of the classes, but for one of them the labels were switched, so that all the inputs that should have been labelled 1 for being in the class were instead labelled -1, and vice versa.
The difference in that class was that the first vector in the pre-computed kernel was labelled 1, while in all the other kernels the first vector was labelled -1. This suggests that LibSVM relabels all of my vectors.
Is there a way to prevent this or a simple way to work around it?
You already discovered that libsvm uses the label +1 for whatever label it encounters first.
The reason is that libsvm allows arbitrary labels and maps them to +1 and -1 according to the order in which they appear in the label vector.
So you can either check this directly, or you can look at the model returned by libsvm.
It contains an entry called Label, which is a vector containing the order in which libsvm encountered the labels. You can also use this information to switch the sign of your scores.
If during training libsvm encounters label A first, then during prediction libsvm will use positive decision values for assigning objects the label A and negative values for the other label.
So if you use label 1 for the positive class and 0 for the negative class, then to obtain the right output values you should do the following trick (Matlab):
% test_data.y contains 0s and 1s
[labels, ~, values] = svmpredict(test_data.y, test_data.X, model, ' ');
if (model.Label(1) == 0)  % check which label libsvm encountered first
    values = -values;
end
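Alternatively, you can prevent the flip altogether by reordering the training data so that an instance carrying the label you want treated as +1 comes first. A minimal sketch in Python (file names are hypothetical, and the files are assumed to be in libsvm format with the label as the first token):

# Move the first line labelled '1' to the front so libsvm encounters the
# positive class first and keeps decision values positive for it.
with open('train.txt') as f:
    lines = f.read().splitlines()

idx = next(i for i, line in enumerate(lines) if line.split()[0] == '1')
lines.insert(0, lines.pop(idx))

with open('train_reordered.txt', 'w') as f:
    f.write('\n'.join(lines) + '\n')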