How to detect only one specified class instead of all classes in tensorflow object detection? - tensorflow

I trained my dataset with six classes and it detects the different classes fine. Is it possible to modify the object detector script to detect only one specified class instead of all six, or must I retrain the model for a single class from scratch? Thanks a lot for any recommendation.
Here is the drawing part of my object detector script:
vis_util.visualize_boxes_and_labels_on_image_array(
    image,
    np.squeeze(boxes),
    np.squeeze(classes).astype(np.int32),
    np.squeeze(scores),
    category_index,
    use_normalized_coordinates=True,
    line_thickness=1,
    agnostic_mode=False,
    groundtruth_box_visualization_color='black',
    skip_scores=False,
    skip_labels=False,
    min_score_thresh=0.80)

Unless you change the code, you're going to get probabilities for all the classes. Of course, you can then select the highest one among them. Does that make sense?

It might not be the best solution to this problem, but you could try making a copy of the label_map.pbtxt file (one to alter and one for safe-keeping) and deleting all labels except the one you are interested in from one of them.
Then you can lower the min_score_thresh to maybe 0.1 or something (or not modify this parameter at all), and only detect the one label you kept in the label_map.pbtxt-file.
If you are using the Object detection API from GitHub, the mscoco_label_map.pbtxt-file can be found in models-master/research/object_detection/data/ (remember to open it with a text-editor)

Before you call the visualization function, add the following code:
objectOfInterest = 1  # class number of the object of interest, as per the label file
box = np.asarray(boxes)
cls = np.asarray(classes).astype(np.int32)
scr = np.asarray(scores)
boolar = (cls == objectOfInterest)  # boolean mask for the chosen class
classes = np.extract(boolar, cls)
scores = np.extract(boolar, scr)
boxes = np.extract(boolar, box)

The code suggested above by Suman was almost perfect, but each entry of the "boxes" array is a 4-value tuple of box coordinates, so np.extract cannot filter it directly. To select a specific class, the matching box coordinate tuples have to be selected as well, so I've added some lines before the code suggested by Suman. Check the code below:
objectOfInterest = 1  # class number of the object of interest, as per the label file
box = np.asarray(boxes)
cls = np.asarray(classes).astype(np.int32)
scr = np.asarray(scores)
boxes = []
for i in range(len(cls)):
    if cls[i] == objectOfInterest:
        boxes.append(box[i])
boxes = np.array(boxes)
boolar = (cls == objectOfInterest)
classes = np.extract(boolar, cls)
scores = np.extract(boolar, scr)
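As a side note (not part of either answer), since the squeezed boxes array has shape [N, 4], the same filtering can be written with boolean mask indexing, which keeps the box rows intact without an explicit loop. A minimal sketch:
import numpy as np

objectOfInterest = 1                        # class id from the label map
cls = np.squeeze(classes).astype(np.int32)
mask = cls == objectOfInterest              # True for detections of the chosen class

boxes = np.squeeze(boxes)[mask]             # keep only the matching [ymin, xmin, ymax, xmax] rows
scores = np.squeeze(scores)[mask]
classes = cls[mask]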

Related

Non Max Suppression settings and postprocessing for EfficientDet

I've downloaded and installed the Tensorflow Object Detection API and downloaded one of the EfficientDet models. As I want to do some work on the raw scores directly before Non-Max Suppression reduces them to class outputs, my first goal was to try and get the same final outputs from the raw scores, using the downloaded model config as a guide.
post_processing {
  batch_non_max_suppression {
    score_threshold: 9.99999993922529e-09
    iou_threshold: 0.5
    max_detections_per_class: 100
    max_total_detections: 100
  }
  score_converter: SIGMOID
}
As the Object Detection API has no score converter method under postprocessing, I'm not sure what this does, but the only batch NMS method in utils seems to be batch_multiclass_non_max_suppression.
So, having fed an image into the network and got an output detections, to try and replicate its results:
result = post_processing.batch_multiclass_non_max_suppression(
    tf.expand_dims(detections['raw_detection_boxes'], 2),
    detections['raw_detection_scores'],
    9.99999993922529e-09, 0.5, 100, max_total_size=100)
detections['detection_boxes'] = result[0]
detections['detection_scores'] = result[1]
detections['detection_classes'] = result[2]
i.e., substitute the relevant scores in the detections with the output of NMS, and insert the dimension needed for the batch function to work. This is then visualised following the TensorFlow Hub colab.
The problem is that whilst the input image (this from the MSCOCO dataset) should produce this:
It instead produces this:
The bounding boxes are all (seemingly) shifted upwards and the categories are simply off, which suggests there's more processing being done between the raw scores, NMS, and output, but it's entirely unclear what. The scores are correct, so it appears to be pruning correctly.
Edit: After looking at the SSD model template, I suspect the misaligned bounding boxes are because I'm not passing the resized image dimensions (generated by the preprocessing step) along to NMS, which should be easy enough to address by reproducing the image resize function. However, applying a slice operation to remove a background class doesn't fix the incorrect labels:
Instead, it seems to have lost the person class entirely. This makes sense: it isn't configured to include a background class of any sort, and if Person (id 1) is instead coming out as index 0, then this would cut it off.
EDIT 2: I looked at the original meta-architecture further and copied the image-resizing function, i.e.:
from object_detection.protos import image_resizer_pb2
from object_detection.builders import image_resizer_builder
from object_detection.utils import config_util as c
from object_detection.utils import shape_utils

config = c.get_configs_from_pipeline_file(r"C:\Users\Person\.keras\datasets\efficientdet_d7_coco17_tpu-32\pipeline.config")
image_config = c.get_image_resizer_config(config['model'])
resize = image_resizer_builder.build(image_config)

def compute_clip_window(preprocessed_images, true_image_shapes):
    # identical to the meta-arch definition
    ...

# image resizing (input_tensor is the original image tensor fed to the model)
im = tf.cast(input_tensor, tf.float32)
channel_offset = [0.485, 0.456, 0.406]
channel_scale = [0.229, 0.224, 0.225]
im = ((im / 255.0) - [[channel_offset]]) / [[channel_scale]]
resized = shape_utils.resize_images_and_return_shapes(im, resize)
clip = compute_clip_window(resized[0], resized[1])
This allows the clip argument to be supplied to NMS. However, it doesn't change anything, and it still returns the same misaligned boxes as the second image. This is incredibly confusing, as this seems like it should replicate everything the model needs in both the preprocessing and postprocessing steps to generate its own output: the image is normalized and resized; the true image size is retained alongside the resized image; no further processing of the raw boxes or raw scores happens before they get passed to NMS (the returned versions of the raw values are identical to the values passed to NMS except for the extra dimension); and the model itself doesn't interfere with the post-processing at all. The call signature runs preprocessing, prediction, and postprocessing in turn, so nothing else should be happening in the interim.
Edit 3: I added another line (to no effect): setting the multiclass scores in the NMS additional fields to the detection scores with backgrounds (i.e., the raw scores). By adding +1 to all the label classes, I got the following image:
Whilst this is correct, this only corrects for the earlier parts of the dataset, i.e. where the only empty class is the 0th. It still appears that there must be some mapping step I'm not following, alongside whatever is causing the image misalignment.
The easiest solution in my case was to load the model from the checkpoint and configs, rather than use the saved model directly, in order to access the original preprocess, predict, and postprocess methods, rather than having a single function call.
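A minimal sketch of that checkpoint-based approach (the paths are placeholders and the exact layout depends on the downloaded model, so treat this as an assumption rather than the poster's exact code):
import tensorflow as tf
from object_detection.builders import model_builder
from object_detection.utils import config_util

# Build the model architecture from the pipeline config, then restore the checkpoint.
configs = config_util.get_configs_from_pipeline_file('efficientdet_d7/pipeline.config')
model = model_builder.build(model_config=configs['model'], is_training=False)
ckpt = tf.compat.v2.train.Checkpoint(model=model)
ckpt.restore('efficientdet_d7/checkpoint/ckpt-0').expect_partial()

# The restored model exposes the individual stages instead of one opaque call:
image = tf.zeros([1, 512, 512, 3], dtype=tf.float32)   # dummy input for illustration
preprocessed, true_shapes = model.preprocess(image)
predictions = model.predict(preprocessed, true_shapes)
detections = model.postprocess(predictions, true_shapes)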

Applying Tensorflow Dataset .map() to subsequent dataset elements

I've got a TFRecordDataset and I'm trying to preprocess the features of two subsequent elements by means of the map() API.
dataset_ext = dataset.map(lambda x: tf.py_function(parse_data, [x], [tf.float32]))
As map applies the function parse_data to every dataset element, I don't know what parse_data should look like in order to keep track of the feature extracted from the previous dataset element.
Can anyone help? Thank you
EDIT: I'm working on the Waymo dataset, so each element is a frame. You can refer to https://github.com/Jossome/Waymo-open-dataset-document for its structure.
This is my parse function parse_data:
from waymo_open_dataset import dataset_pb2 as open_dataset

def parse_data(input_data):
    frame = open_dataset.Frame()
    frame.ParseFromString(bytearray(input_data.numpy()))
    av_speed = (frame.images[0].velocity.v_x,
                frame.images[0].velocity.v_y,
                frame.images[0].velocity.v_z)
    return av_speed
I'd like to build a dataset whose features are the car speed and acceleration, defined as the speed variation between subsequent frames (the first value can be 0).
One way I thought about is to give the map function dataset and dataset.skip(1) as inputs but I'm not sure about it yet.
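A minimal sketch of that idea (not from the original post): zip the dataset with a shifted copy of itself so each element of the new dataset carries two consecutive frames. The parse_two_frames helper is hypothetical and would parse both protobufs and return the speed and the speed delta:
import tensorflow as tf

# Pair each frame with the next one: element i becomes (frame_i, frame_{i+1}).
paired = tf.data.Dataset.zip((dataset, dataset.skip(1)))

def parse_pair(prev_frame, curr_frame):
    # py_function because the Waymo frames need Python-side protobuf parsing.
    return tf.py_function(parse_two_frames, [prev_frame, curr_frame],
                          [tf.float32, tf.float32])

dataset_ext = paired.map(parse_pair)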
I am not sure, but it might be unnecessary to make your mapped function a tf.py_function. What parse_data should look like depends on your dataset dataset_ext. If it has, for example, two file paths (one instance of input data and one instance of output data), the mapping function should take 2 arguments and return 2 arguments.
For example: if your dataset contains images and you want them to be randomly cropped each time an example of your dataset is drawn the mapping function looks like this:
def process_img_random_crop(img_in, img_out, output_shape):
    merged = tf.stack([img_in, img_out])
    mergedCrop = tf.image.random_crop(merged, size=(2,) + output_shape)
    img_in_cropped, img_out_cropped = tf.unstack(mergedCrop, 2, 0)
    return img_in_cropped, img_out_cropped
I call it as follows:
image_ds_test = image_ds_test.map(lambda i, o: process_img_random_crop(i, o, output_shape=(64, 64, 1)), num_parallel_calls=tf.data.experimental.AUTOTUNE)
What exactly is your plan with dataset_ext and what does it contain?
Edit:
Okay, got what you meant with the two frames. The map function is applied to each entry of your dataset separately, so if you need cross-entry information, a single entry of your dataset needs to contain two frames. For this more complicated set-up, I would suggest using a tensorflow Sequence: the explanation from the tensorflow team is pretty straightforward. Hope this helps!
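A minimal sketch of such a Sequence (an assumed structure, not the answerer's code): it serves consecutive frame pairs so each batch element contains both the previous and the current frame's features:
import numpy as np
import tensorflow as tf

class FramePairSequence(tf.keras.utils.Sequence):
    # Serves (previous_frame, current_frame) feature pairs in order.
    def __init__(self, frame_features, batch_size=8):
        self.frames = frame_features      # e.g. a list of per-frame feature vectors
        self.batch_size = batch_size

    def __len__(self):
        # Number of batches of consecutive pairs (frame 0 has no predecessor).
        return int(np.ceil((len(self.frames) - 1) / self.batch_size))

    def __getitem__(self, idx):
        start = 1 + idx * self.batch_size
        end = min(start + self.batch_size, len(self.frames))
        prev = np.asarray(self.frames[start - 1:end - 1])
        curr = np.asarray(self.frames[start:end])
        # Acceleration could be computed here as curr - prev before returning.
        return prev, curr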

Tensorflow: linking segmented images with data

I'm trying to implement a scenario where I have some 900x900 images that contain a feature I would like to segment out. I would thus like to create a network that outputs a binary 900x900 image, specifying whether each pixel contains the said feature.
I am able to load my input 900x900 int32 images and my label 900x900 binary images using tf.train.shuffle_batch, but the images and labels are not linked, meaning each image is not correctly attached to its corresponding label.
I currently have the input images in one folder and the labeled images in a second folder.
How can I link the input images with my label images?
Much thanks!
This question is essentially a question about the best way to represent the labels that you have in a way ordered as per your training data.
Representation
A simple representation that you could use for a label is a 900 x 900 mask (a numpy array) with 1's and 0's indicating whether the corresponding pixel has the feature or not.
Order
Assuming you have some sort of ordering or identifier in the file names that connects the input images and label images, you could load the input images and label images in the order of the file names:
import os
import re

def findpaths(path):
    print(path)
    im_paths = []
    im_dict = {}
    for root, dirs, files in os.walk(path, topdown=False):
        # print(root, dirs, len(files))
        for name in files:
            if name.find('.png') != -1:
                im_path = os.path.join(root, name)
                im_paths.append(im_path)
                im_id = int(re.findall(r'\d+', name)[0])
                im_dict[im_path] = im_id
    im_paths_sorted = sorted(im_paths, key=lambda x: im_dict[x])
    return im_paths_sorted, im_dict
The above example shows a function that loads files with a naming convention such as d-1.png, d-2.png, and so on. This lets you create a list of file paths ordered by an identifier in the filenames. You can do this for both the input and the label images, and then keep them in the order of the identifier, as sketched below.
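Building on that, a minimal sketch (assuming the findpaths helper above, placeholder folder names, and the tf.data API rather than tf.train.shuffle_batch) of pairing the sorted input and label paths so each image stays attached to its mask:
import tensorflow as tf

input_paths, _ = findpaths('data/inputs')    # folder names are placeholders
label_paths, _ = findpaths('data/labels')

def load_pair(img_path, mask_path):
    img = tf.io.decode_png(tf.io.read_file(img_path), channels=1)
    mask = tf.io.decode_png(tf.io.read_file(mask_path), channels=1)
    return img, mask

# Both lists are sorted by the same identifier, so index i in one
# corresponds to index i in the other.
dataset = (tf.data.Dataset.from_tensor_slices((input_paths, label_paths))
           .map(load_pair)
           .shuffle(buffer_size=100)
           .batch(4))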
Hope this helps!

sklearn: get feature names after L1-based feature selection

This question and answer demonstrate that when feature selection is performed using one of scikit-learn's dedicated feature selection routines, then the names of the selected features can be retrieved as follows:
np.asarray(vectorizer.get_feature_names())[featureSelector.get_support()]
For example, in the above code, featureSelector might be an instance of sklearn.feature_selection.SelectKBest or sklearn.feature_selection.SelectPercentile, since these classes implement the get_support method which returns a boolean mask or integer indices of the selected features.
When one performs feature selection via linear models penalized with the L1 norm, it's unclear how to accomplish this. sklearn.svm.LinearSVC has no get_support method and the documentation doesn't make clear how to retrieve the feature indices after using its transform method to eliminate features from a collection of samples. Am I missing something here?
For sparse estimators you can generally find the support by checking where the non-zero entries are in the coefficients vector (provided the coefficients vector exists, which is the case for e.g. linear models)
support = np.flatnonzero(estimator.coef_)
For your LinearSVC with l1 penalty it would accordingly be
from sklearn.svm import LinearSVC
svc = LinearSVC(C=1., penalty='l1', dual=False)
svc.fit(X, y)
selected_feature_names = np.asarray(vectorizer.get_feature_names())[np.flatnonzero(svc.coef_)]
I've been using sklearn 0.15.2, and according to the LinearSVC documentation, coef_ is an array of shape [n_features] if n_classes == 2, else [n_classes, n_features].
So first, np.flatnonzero doesn't work for the multi-class case; you'll get an index out of range error. Second, it should be np.where(svc.coef_ != 0)[1] instead of np.where(svc.coef_ != 0)[0], because index 0 is the class axis, not the feature axis. I ended up using np.asarray(vectorizer.get_feature_names())[list(set(np.where(svc.coef_ != 0)[1]))]
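Putting the two answers together, a small sketch (variable names follow the answers above) that handles both the binary and the multi-class shapes of coef_:
import numpy as np

coef = np.atleast_2d(svc.coef_)               # (1, n_features) or (n_classes, n_features)
selected = np.unique(np.where(coef != 0)[1])  # column indices are feature indices
selected_feature_names = np.asarray(vectorizer.get_feature_names())[selected]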

ValueError: setting an array element with a sequence at fit(X, y) in k-nearest neighbor

I have an error at this line: neigh.fit(X, y):
ValueError: setting an array element with a sequence.
I checked the fit function and X should be: {array-like, sparse matrix, BallTree, cKDTree}.
My X is a list of lists, with the first element a solidity number and the second element a Hu moments list (7 cells).
If I change it and take only the first Hu moment number, to get a pure list of lists, it gives this error: query data dimension must match BallTree data dimension.
My code:
import glob
import os

import cv2

listafeaturevector = list()
path = 'imgknn/'
for infile in glob.glob(os.path.join(path, '*.jpg')):
    print("current file is: " + infile)
    gray = cv2.imread(infile, 0)
    element = cv2.getStructuringElement(cv2.MORPH_CROSS, (6, 6))
    graydilate = cv2.erode(gray, element)
    ret, thresh = cv2.threshold(graydilate, 127, 255, cv2.THRESH_BINARY_INV)
    imgbnbin = thresh
    # CONTOURS
    contours, hierarchy = cv2.findContours(imgbnbin, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
    print(len(contours))
    for i in range(0, len(contours)):
        fv = list()  # 1 feature vector
        # HUMOMENTS
        # print("humoments")
        mom = cv2.moments(contours[i], 1)
        Humoments = cv2.HuMoments(mom)
        # print(Humoments)
        fv.append(Humoments)  # query data dimension must match BallTree data dimension
        # SOLIDITY
        area = cv2.contourArea(contours[i])
        hull = cv2.convexHull(contours[i])  # has many values
        hull_area = cv2.contourArea(hull)
        solidity = float(area) / hull_area
        fv.append(solidity)
        # fv.append(elongation)
        listafeaturevector.append(fv)

print("i have done")
print(len(listafeaturevector))
lenmatrice = len(listafeaturevector)

# KNN
X = listafeaturevector
y = [0, 1, 2, 3] * (lenmatrice / 4)
from sklearn.neighbors import KNeighborsClassifier
neigh = KNeighborsClassifier(n_neighbors=3)
neigh.fit(X, y)  # ValueError: setting an array element with a sequence.
print(neigh.predict([[1.1]]))
print(neigh.predict_proba([[0.9]]))
If I try to convert it into a numpy array:
listafv = np.dstack(listafeaturevector)
listafv = np.rollaxis(listafv, -1)
print(listafv.shape)
data = listafv.reshape((lenmatrice, -1))
print(data.shape)
#KNN
X = data
I got: setting an array element with a sequence.
A couple of suggestions/questions:
Humoments = cv2.HuMoments(mom)
What is the class of the return value Humoments, a float or a list? If it is a float, that is fine.
for each image file:
    for i in range(0, len(contours)):
        fv = list()  # 1 feature vector
        ...
        fv.append(Humoments)
        ...
        fv.append(solidity)
        listafeaturevector.append(fv)
The above code does not seem correct. In your problem, I think you need to construct one feature vector for each image, so anything related to image i should go into the same feature vector x_i; then you combine all feature vectors to get the list of feature vectors X. However, your listafeaturevector (or X) is appended to inside the inner-most loop, which is obviously not correct.
Second, you have a loop over the number of elements in contours; are you sure that number stays the same for each image? Otherwise, the number of features (|x_i|) will differ across images, which might cause the error:
setting an array element with a sequence.
Third, are you clear about how you want to classify the images? What are the target values/labels of the different images? I see you just set the labels with [0,1,2,3] * (lenmatrice/4). Can you elaborate on what you are trying to do with those images? Do they contain different types of objects? Do they show different patterns? Do those images describe different topics/colors? If yes, for each different type you give a different label, either 0, 1, 2 or 'red', 'white', 'black' (assuming you have only 3 types). The values of the labels do not matter; what matters is how many distinct values there are. I am trying to understand what the labels mean in your case.
On the other hand, if you only want to retrieve similar images, you don't need to use a classifier or specify a label for each image. Instead, try to use NearestNeighbors.
print(neigh.predict([[1.1]]))
print(neigh.predict_proba([[0.9]]))
Fourth, the above two test lines are not correct. You need to pass an X-like object in order to get a prediction from the classifier. That is to say, you need a feature vector x with the same structure as the ones you constructed in your training examples (with the Hu moments, elongation, and solidity in the same order).
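On the nested-sequence point: cv2.HuMoments returns a (7, 1) NumPy array, so appending it as-is puts a sequence inside the feature vector. A minimal sketch of flattening it (reusing contours[i] and solidity from the question's loop, so it is a fragment rather than a standalone script) that gives every sample the same flat 8-value layout:
import cv2

mom = cv2.moments(contours[i], 1)
Humoments = cv2.HuMoments(mom)      # shape (7, 1) array, not a scalar

fv = list(Humoments.flatten())      # 7 scalar Hu moments
fv.append(solidity)                 # plus solidity -> 8 plain floats per sample
listafeaturevector.append(fv)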