Can I get the coordinates of objects in an image with yolov5? - object-detection

I want to know whether it is possible to use yolov5 to find objects in an image and then get back the type of each object and where it is in the picture, not only the image with the bounding boxes drawn on it.

Found it. The detections for the first image are available as a pandas DataFrame:
results.pandas().xyxy[0]
or as a tensor:
results.xyxy[0]
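For reference, each row of results.xyxy[0] holds xmin, ymin, xmax, ymax, confidence and class in pixel coordinates, and the pandas variant adds a name column. Below is a minimal sketch of turning such rows into labelled boxes. The rows_to_boxes helper and the sample values are made up for illustration, and the commented lines only show one common way results might be obtained through torch.hub.

```python
# One common way to obtain `results` (needs torch and network access,
# so it is left as a comment here):
#   import torch
#   model = torch.hub.load("ultralytics/yolov5", "yolov5s")
#   results = model("image.jpg")
#   rows = results.xyxy[0].tolist()
#   names = model.names


def rows_to_boxes(rows, names):
    """Convert [xmin, ymin, xmax, ymax, confidence, class] rows to dicts.

    `names` maps a class index to a label (hypothetical helper argument).
    """
    boxes = []
    for xmin, ymin, xmax, ymax, conf, cls in rows:
        boxes.append({
            "name": names[int(cls)],
            "confidence": conf,
            "xmin": xmin, "ymin": ymin, "xmax": xmax, "ymax": ymax,
        })
    return boxes


# Made-up sample in the same layout as results.xyxy[0].tolist()
sample_rows = [[10.0, 20.0, 110.0, 220.0, 0.9, 0.0]]
sample_names = {0: "person"}

boxes = rows_to_boxes(sample_rows, sample_names)
print(boxes[0]["name"], boxes[0]["xmin"], boxes[0]["ymin"],
      boxes[0]["xmax"], boxes[0]["ymax"])
```

The corner coordinates are in pixels of the input image, with the origin at the top-left corner.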

Related

Is it possible to make pytesseract read only a certain part of an image using coordinates?

I know the coordinates of the rectangle I want it to read in, but how do I make it read data only in that box?

Is it possible to use polygon data annotation to perform tensorflow object detection?

My problem is not how to annotate data with polygons, circles or lines; it is how to use the annotated data to generate a ".tfrecord" file and perform object detection. The tutorials I saw use rectangle annotation, like these: Taylor Swift detection, raccoon detection.
Rectangle annotation would be fine for me if the objects I want to detect (pipelines) were not so close together.
Example of a rectangle in PASCAL VOC format:
<bndbox>
<xmin>82</xmin>
<xmax>172</xmax>
<ymin>108</ymin>
<ymax>146</ymax>
</bndbox>
Is there a way to add a "mask" to highlight some part of this bounding box?
If anything is unclear, please let me know.
If your objects are very close to each other, you can use instance segmentation instead of object detection. There you can use polygons to generate both the masks and the bounding boxes used to train the model.
Consider this well-presented and easy-to-use repository for Mask R-CNN (an instance segmentation model):
https://github.com/matterport/Mask_RCNN
Check this for a lightweight Mask R-CNN.
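To illustrate how one polygon annotation can yield both a bounding box and a mask, here is a minimal pure-Python sketch. The function names and the triangle are invented for the example, and the even-odd ray-casting test is just one simple way to rasterise a polygon; a real pipeline would more likely use a library routine.

```python
def polygon_to_bbox(points):
    """Return (xmin, ymin, xmax, ymax) enclosing a list of (x, y) vertices."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)


def point_in_polygon(px, py, points):
    """Even-odd ray casting: count polygon edges crossed by a ray going right."""
    inside = False
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        # Only edges that straddle the horizontal line y = py can be crossed.
        if (y1 > py) != (y2 > py):
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if px < x_cross:
                inside = not inside
    return inside


def polygon_to_mask(points, width, height):
    """Rasterise the polygon into a height x width grid of 0/1 values,
    testing the centre of each pixel."""
    return [
        [1 if point_in_polygon(x + 0.5, y + 0.5, points) else 0
         for x in range(width)]
        for y in range(height)
    ]


# Made-up annotation: a right triangle
triangle = [(1, 1), (5, 1), (1, 5)]
bbox = polygon_to_bbox(triangle)
mask = polygon_to_mask(triangle, 6, 6)
print(bbox)
for row in mask:
    print(row)
```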

Bounding box coordinates via video object detection

Is it possible to get the coordinates of detected objects through video object detection? I used OpenCV to convert the video to images and LabelImg to draw bounding boxes on them. For the output, I want to be able to read (for a video file) the coordinates of each bounding box so that I can get the centre of the box.
Object detection works on a per-image (per-frame) basis.
A basic object detection model takes an image as input and returns the detected bounding boxes.
Once your model is trained, read the video frame by frame (or skip 'n' frames at a time), pass each frame to the object detector, take the coordinates it returns, and draw them onto the output video frame.
That is how object detection works for a video.
Please refer to the links below:
https://github.com/tensorflow/models/tree/master/research/object_detection
https://www.edureka.co/blog/tensorflow-object-detection-tutorial/
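The frame-by-frame loop described above can be sketched roughly as follows. This is only an illustration: read_frames, centres_per_frame and fake_detect are hypothetical names, and detect stands in for whatever trained model you use, assumed here to return a list of (xmin, ymin, xmax, ymax) tuples per frame.

```python
def read_frames(video_path, step=1):
    """Yield (frame_index, frame) for every `step`-th frame of a video file."""
    import cv2  # imported lazily so the helpers below work without OpenCV

    cap = cv2.VideoCapture(video_path)
    index = 0
    try:
        while True:
            ok, frame = cap.read()
            if not ok:  # end of video or read error
                break
            if index % step == 0:
                yield index, frame
            index += 1
    finally:
        cap.release()


def box_centre(xmin, ymin, xmax, ymax):
    """Centre point of an axis-aligned bounding box."""
    return (xmin + xmax) / 2, (ymin + ymax) / 2


def centres_per_frame(frames, detect):
    """Run `detect` on each frame and collect the box centres by frame index."""
    centres = {}
    for index, frame in frames:
        centres[index] = [box_centre(*box) for box in detect(frame)]
    return centres


# Stand-in detector and frames, so the sketch runs without a model or a video
def fake_detect(frame):
    return [(10, 20, 30, 60)]


centres = centres_per_frame([(0, None), (5, None)], fake_detect)
print(centres)
```

With a real video and model, the call would look like centres_per_frame(read_frames("video.mp4", step=5), detect), where the file name is a placeholder.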

Resize images for object detection

I want to train Mask R-CNN on my images, and my understanding is that all the images need to be the same size. I also read that you can add "padding" to images so that you retain the right aspect ratio.
Does anyone know how to add padding to the images and resize them? Does anyone have code for that, or an online tool that can do it?
Thanks
The OpenCV library has a padding function that can add borders to your images (copyMakeBorder), as well as a resize function.
Refer to this webpage.
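As a rough sketch of resizing with padding (often called letterboxing), the code below scales the longer side to a square target size and pads the shorter side equally on both ends. The helper names, the black padding colour and the 512 target are assumptions for illustration, not something prescribed by Mask R-CNN.

```python
def letterbox_params(width, height, target):
    """Return (new_w, new_h, left, top, right, bottom) for a square target,
    keeping the aspect ratio and centring the resized image."""
    scale = target / max(width, height)
    new_w = max(1, round(width * scale))
    new_h = max(1, round(height * scale))
    pad_w = target - new_w
    pad_h = target - new_h
    left, top = pad_w // 2, pad_h // 2
    return new_w, new_h, left, top, pad_w - left, pad_h - top


def letterbox(image, target):
    """Resize `image` (a NumPy array as loaded by cv2.imread) and pad it
    with black borders to target x target pixels."""
    import cv2  # imported lazily; only this function needs OpenCV

    height, width = image.shape[:2]
    new_w, new_h, left, top, right, bottom = letterbox_params(width, height, target)
    resized = cv2.resize(image, (new_w, new_h), interpolation=cv2.INTER_AREA)
    return cv2.copyMakeBorder(resized, top, bottom, left, right,
                              cv2.BORDER_CONSTANT, value=(0, 0, 0))


# A 640x480 image fitted into a 512x512 square
params = letterbox_params(640, 480, 512)
print(params)
```

If the images have annotations, the same scale and the left/top offsets need to be applied to the boxes or masks as well.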

extract an object from image using some image processing filter

I am working on an application where I have an image containing, for example, a glass, a cup or a chair. The object can be of any type.
My question is: is there any way to apply image processing filters, or something similar, that returns an image containing just the object with a transparent background?
You can use object detection methods such as
http://opencv.willowgarage.com/documentation/object_detection.html
http://docs.opencv.org/modules/objdetect/doc/latent_svm.html
to detect the object, plot a bounding box around it and extract it from the image.
Depending on your application, you can also use image difference (background subtraction) to get the object.
Actually, I have solved the problem.
The issue was that I did not want to use any advanced method that relies on template matching, neural networks or anything like that.
In my case the aim was to recognize an object in an image, and that object could be anything (e.g. a table, a cellphone, a person, a shirt); the catch was that there could be at most one object per image.
So just by using OpenCV's watershed segmentation I was able to separate the object from the background.
However, the threshold used for the watershed varies with the frequency content of the image and with how much the object's shades differ from the background.