I am building an R-CNN detection network using TensorFlow's Object Detection API.
My goal is to detect bounding boxes for animals in outdoor videos. Most frames do not have animals and are just of dynamic backgrounds.
Most tutorials focus on training custom labels, but make no mention of negative training samples. How does this class of detectors deal with images that do not contain objects of interest? Does it just output a low probability, or will it be forced to try to draw a bounding box within the image?
My current plan is to use traditional background subtraction in OpenCV to generate potential frames and pass them to a trained network. Should I also include a class of 'background' bounding boxes as 'negative data'?
The final option would be to use OpenCV for background subtraction, R-CNN to generate bounding boxes, then a classification model on the crops to identify animals versus background.
In general it's not necessary to explicitly include "negative images". What happens in these detection models is that they use the parts of the image that don't belong to the annotated objects as negatives.
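To illustrate how unannotated image regions end up as negatives, here is a minimal Python sketch of overlap-based labelling. The function names `iou` and `label_candidates` and the thresholds 0.7 and 0.3 are illustrative assumptions, not values read from the TensorFlow Object Detection API; real detectors apply the same idea to anchors or region proposals.

```python
def iou(box_a, box_b):
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def label_candidates(candidates, ground_truth, pos_thresh=0.7, neg_thresh=0.3):
    """Label each candidate box by its best overlap with any annotated box.

    Thresholds are hypothetical defaults chosen for illustration.
    """
    labels = []
    for cand in candidates:
        # With no annotations at all, the best overlap is 0.0.
        best = max((iou(cand, gt) for gt in ground_truth), default=0.0)
        if best >= pos_thresh:
            labels.append("object")
        elif best < neg_thresh:
            labels.append("background")
        else:
            labels.append("ignored")  # ambiguous overlap, not used for training
    return labels
```

Under this scheme, an image with an empty annotation list simply contributes candidates that are all labelled "background".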
If you expect your model to differentiate between "found a figure" and "no figure", then you will almost certainly need to train it on negative examples. Label these as "no image". In the "no image" case, yes, use the entire image as the bounding box; don't suggest that the model recognize anything smaller.
In "no image" cases, you may get a smaller bounding box, but that doesn't matter: in inference, you'll simply ignore whatever box is returned for "no image".
Of course, the critical issue here is to try it out, and see how well it works for you.
I have found success by scanning my ground truth, copying the box areas plus a margin, then pasting tilings of those box areas onto new background images (guaranteed to have no objects), and creating corresponding XML files with the box category assertions.
I collect non-objects as "uncategorised" boxes - usually from glitches in the output from my latest model. These are tiled (just like the "is-objects") but are not updated in the XML files.
I produce tilings at various scales to build each new training set.
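The tiling idea described above can be sketched roughly as follows. This is not the code from the linked repository; `tile_crops` and its parameters are hypothetical, images are assumed to be NumPy arrays, and crops with label `None` stand in for the "uncategorised" non-objects that are pasted but never recorded.

```python
import numpy as np


def tile_crops(background, crops, labels, margin=0):
    """Paste crops left to right, top to bottom onto a copy of background.

    Returns the composite image and a list of
    (label, xmin, ymin, xmax, ymax) records for the labelled crops only.
    Each crop is assumed to include `margin` pixels of context per side.
    """
    canvas = background.copy()
    height, width = canvas.shape[:2]
    boxes = []
    x = y = row_height = 0
    for crop, label in zip(crops, labels):
        h, w = crop.shape[:2]
        if x + w > width:  # current row is full, start a new one
            x, y, row_height = 0, y + row_height, 0
        if y + h > height or w > width:  # no room left on the canvas
            break
        canvas[y:y + h, x:x + w] = crop
        if label is not None:  # unlabelled crops act as hard negatives
            boxes.append((label, x + margin, y + margin,
                          x + w - margin, y + h - margin))
        x += w
        row_height = max(row_height, h)
    return canvas, boxes
```

The returned records would then be written out in whatever annotation format the training pipeline expects (for example Pascal VOC XML).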
A further explanation and sample Python code are here:
https://github.com/brentcroft/ground-truth-productions
Please help me with training the mask_rcnn_inception_resnet_v2_atrous_coco model on my own dataset.
https://github.com/tensorflow/models/tree/master/research/object_detection
Model: https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/detection_model_zoo.md
I have referred to https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/instance_segmentation.md but I can't clearly understand the steps.
Do we have to give the Bounding box coordinates of the object along with the mask.png file?
How do I convert the mask data to TFRecord files (for instance segmentation)?
Can anyone suggest a labelling tool that produces both the bounding box and the mask.png file?
Tools like LabelBox, labelme, and labelimg give only one of these: the bounding box coordinates, the mask.png file, or the polygon coordinates for the object.
Please help.
The best approach is to provide PNG masks together with XML annotations. This should work with create_pet_tf_record.py if you set faces_only=false in that file. You can read the code to see what input it expects.
Then change the paths in the pipeline configuration to point to your directories.
Do we have to give the Bounding box coordinates of the object along with the mask.png file?
Answer: Yes, you need the original images, bounding box files, and mask images.
Use the following tool to annotate each object in your original images: Label image
Once you're done with this, you need to annotate each pixel inside each bounding box. There are several tools you can use for this, for example: VGG annotator
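Since both the box and the mask are needed, one way to keep them consistent is to derive the box from the mask. The sketch below assumes one binary mask per object instance, stored as a NumPy array; `mask_to_bbox` is a hypothetical helper, not part of the Object Detection API.

```python
import numpy as np


def mask_to_bbox(mask):
    """Return (xmin, ymin, xmax, ymax) enclosing the non-zero pixels.

    xmax and ymax are exclusive. Returns None for an empty mask.
    """
    rows = np.any(mask, axis=1)  # which rows contain object pixels
    cols = np.any(mask, axis=0)  # which columns contain object pixels
    if not rows.any():
        return None
    ymin, ymax = np.where(rows)[0][[0, -1]]
    xmin, xmax = np.where(cols)[0][[0, -1]]
    return int(xmin), int(ymin), int(xmax) + 1, int(ymax) + 1
```

Comparing this derived box with the hand-drawn one is a cheap sanity check before writing the TFRecord files.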
According to the Nvidia website, DIGITS uses datasets in KITTI format. Is it possible to prepare such a dataset in DIGITS or in an external application, or will I have to write a tool of my own?
I would like to simply draw bounding boxes on the displayed image and then have them converted to an appropriate txt file.
Thanks in advance!
Yep, you can use one of the available solutions for bounding box annotation, e.g. RectLabel, save the annotations in Pascal VOC format, and then transform them to KITTI using one of the freely available converters, e.g. VOD Converter
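If you would rather script the conversion yourself, a minimal Python sketch could look like this. It assumes standard Pascal VOC XML (`object/name` and `object/bndbox`) and writes the 15-field KITTI label line, filling the fields that a 2D annotation cannot provide (truncation, occlusion, alpha, 3D dimensions, location, rotation) with zeros. `voc_to_kitti_lines` is a hypothetical name, and this is not the code of any particular converter.

```python
import xml.etree.ElementTree as ET


def voc_to_kitti_lines(voc_xml):
    """Convert one Pascal VOC annotation (XML string) to KITTI label lines."""
    root = ET.fromstring(voc_xml)
    lines = []
    for obj in root.iter("object"):
        name = obj.findtext("name").strip()
        box = obj.find("bndbox")
        xmin, ymin, xmax, ymax = (
            float(box.findtext(tag)) for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        fields = [
            name,        # type
            "0.00",      # truncated (unknown, set to 0)
            "0",         # occluded (unknown, set to 0)
            "0.00",      # alpha (unused for 2D boxes)
            f"{xmin:.2f}", f"{ymin:.2f}", f"{xmax:.2f}", f"{ymax:.2f}",
        ] + ["0.00"] * 7  # 3D dimensions, location, rotation_y (unused)
        lines.append(" ".join(fields))
    return lines
```

Each image then gets one txt file containing these lines, named after the image.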
I want to create a visualization of a matrix for some academic work. I decided to go about this by having the pixels in the image correspond to the values in the matrix. I created the nice small png that follows:
When properly scaled up, you get a very reasonable image:
This is a screenshot from within Inkscape. However, when I export this as a PDF, both evince and chrome do a terrible job at upscaling what should be very trivial, and instead I get something that looks like:
The pdf itself seems to scale appropriately well for printing, but unfortunately I do a lot of my editing without printing, and this looks unacceptable. I did find this incredibly old thread about people seeming to have a similar issue with chrome's pdf viewer, and the "solution" was to just upscale the raster graphics. This is a solution, but is terribly inefficient.
Is anyone aware of a way to change the pdf so that it gets upscaled appropriately? Maybe a config change in evince or chrome that will render these properly? Even a nice way to go from a raster image to a vector image might be suitable?
The comments aggregated into an answer...
An image dictionary in a PDF has an (optional) boolean entry Interpolate. It is specified as a flag indicating whether image interpolation shall be performed by a conforming reader.
The program used by the OP to create the PDF, Inkscape, seems to have explicitly set this flag to true. Editing the PDF to unset this flag creates a file which looks as desired by the OP.
(An alternative solution, proposed in this Inkscape forum thread eventually found by the OP, is to save the PDF with high-resolution bitmaps embedded: File -> Inkscape Preferences -> Bitmaps -> Resolution for Create Bitmap Copy, and set it to 6000 dpi.)
The fact that interpolation looks different in different viewers and different output media, is by design. The PDF specification states on interpolation:
A conforming Reader may choose to not implement this feature of PDF, or may use any specific implementation of interpolation that it wishes.
A different way to get around this problem (especially as some PDF viewers have the tendency to not really live up to the specification and e.g. interpolate ignoring that flag) would be to use vector graphics here, drawing the bitmap pixels as rectangles. The result should be optimal.
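As a rough illustration of the rectangle approach, the following Python sketch turns a matrix into an SVG with one filled rectangle per cell, which can then be opened in Inkscape and saved as PDF. The function name `matrix_to_svg`, the greyscale mapping, and the cell size are assumptions made for this example; the matrix is assumed to be a non-empty list of equal-length rows.

```python
def matrix_to_svg(matrix, cell=10):
    """Render a non-empty 2D list of numbers as an SVG of grey rectangles."""
    rows = len(matrix)
    cols = len(matrix[0])
    lo = min(min(row) for row in matrix)
    hi = max(max(row) for row in matrix)
    span = (hi - lo) or 1  # avoid division by zero for a constant matrix
    parts = [
        f'<svg xmlns="http://www.w3.org/2000/svg" '
        f'width="{cols * cell}" height="{rows * cell}" '
        f'shape-rendering="crispEdges">'
    ]
    for r, row in enumerate(matrix):
        for c, value in enumerate(row):
            grey = round(255 * (value - lo) / span)  # lowest value is black
            parts.append(
                f'<rect x="{c * cell}" y="{r * cell}" '
                f'width="{cell}" height="{cell}" '
                f'fill="rgb({grey},{grey},{grey})"/>'
            )
    parts.append("</svg>")
    return "\n".join(parts)
```

Because every cell is a vector shape, no image interpolation is involved, at the cost of a larger file for big matrices.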
I am looking for a way to measure the coordinates of different rectangles in a PDF file.
Mainly, I have to perform some overprinting on an existing PDF, and I need to know the x, y, w, h of the areas where I am supposed to write the text.
It seems that Preview.app on Mac has this ability but so far I wasn't able to find anything on Windows that does the same.
Please do not confuse this feature with the Measuring Tools from Adobe Reader which are used to measure distance in printed construction stuff, not the PDF page itself.
It seems that the default unit of measure is the point, so I need something that would allow me to select a rectangle and that will tell me its coordinates.
Please do not suggest exporting as an image and using something else to measure the pixels on the image.
Update: http://legacy.activepdf.com/support/knowledgebase/view.cfm?tk=rl&kb=11866 -- PDF Units, that's what I am looking for, something to measure the PDF coordinates in PDF units.
Disclaimer: I work for Atalasoft.
I know you said not to suggest this, but honestly, it's the easiest approach:
If you mean "sweep out a rectangle in the UI and report the coordinates", that's pretty straightforward, but it's going to be a build-your-own type of thing. What you will need are:
1. A PDF rasterizer (GhostScript, Acrobat, FoxIt, Atalasoft) to get you an image at a specific resolution.
2. A tool to display that image in a window and let you sweep out a rectangle (this is straightforward WinForms-type code for .NET, but we have a control that does this out of the box, combining 1 & 2 into one step).
3. A tool that can look at the structure of a PDF page and report back the crop box (if any) and the media box for each page (iText, DotPdf).
4. A tool/understanding of matrix transformations to build the matrix that goes from display space into PDF space (and/or vice versa, probably in iText, definitely in DotPdf).
The code flow becomes something like:
For each page:
Open document, pull out crop and media box, rasterize page, build transformation matrix.
Display image, build/hook into event for selection changing.
Push the image viewer rectangle coordinates through the transformation matrix.
Profit.
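The coordinate step in the flow above can be sketched in Python as follows. This is a simplified illustration, not Atalasoft, iText, or DotPdf code: it assumes the page was rasterized at a known DPI, that the image covers exactly the crop box, that the page is not rotated, and that the default user unit of 1/72 inch applies. `pixel_rect_to_pdf` is a hypothetical name.

```python
def pixel_rect_to_pdf(rect_px, dpi, crop_box):
    """Map a rectangle selected on the rasterized page into PDF units.

    rect_px  -- (left, top, width, height) in pixels, origin at top-left
    dpi      -- resolution used when rasterizing the page
    crop_box -- (llx, lly, urx, ury) in PDF units, origin at bottom-left
    Returns (x, y, w, h) in PDF units, where (x, y) is the lower-left corner.
    """
    left, top, width, height = rect_px
    llx, lly, urx, ury = crop_box
    scale = 72.0 / dpi              # PDF units per pixel
    x = llx + left * scale
    w = width * scale
    h = height * scale
    # Image y grows downward, PDF y grows upward, so flip around the top edge.
    y = ury - (top + height) * scale
    return x, y, w, h
```

For a rotated page or a non-default user unit, the same idea applies, but the scale-and-flip has to be replaced by a full transformation matrix.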
From a coding point of view (assuming zero prior knowledge of this, but a decent understanding of linear algebra), expect this to take from 3 days to 2 weeks. If I were to write it, it would probably take on the order of a few hours, but I wrote most of our PDF tools and this is pretty easy.
If your goal is to intuit where rectangles are on the page and report back those coordinates, that's also doable, but it is decidedly non-trivial in comparison. You need to write code that can rip through a PDF display list and interpret the contents correctly. That means being able to handle all the cumulative matrix transformations, the graphics state changes, the gstate object use, Form XObject placement, and so on. You need to answer the question "what is a rectangle?" because in PDF placement, it could be an re operator, a set of degenerate beziers, a set of lines, an image of a rectangle or (surprise!) a combination of all of the above. Honestly, intuiting anything about the content on a PDF page is a Herculean task.
I generate a figure in MATLAB with a large number of elements (100000+) and want to save it to a PDF file. With the zbuffer or painters renderer I get a very large file (over 4 Mb) that opens slowly, because all points are stored in vector format. Using the OpenGL renderer rasterizes the figure in the PDF, which is fine for the plot but not good for the text labels. The file size is then about 150 Kb.
Try this simplified code, for example:
x=linspace(1,10,100000);
y=sin(x)+randn(size(x));
plot(x,y,'.')
set(gcf,'Renderer','zbuffer')
print -dpdf -r300 testpdf_zb
set(gcf,'Renderer','painters')
print -dpdf -r300 testpdf_pa
set(gcf,'Renderer','opengl')
print -dpdf -r300 testpdf_op
The actual figure is much more complex with several axes and different types of plots.
Is there a way to rasterize the figure, but keep text labels as vectors?
Another problem with OpenGL is that it does not work in terminal mode (-nosplash -nodesktop -nodisplay) under Mac OSX. It looks like OpenGL is not supported there. I have to use terminal mode for automation. The MATLAB version I run is 2007b, on Mac OSX Server 10.4.
This is a funny one. Your problem is not Matlab, it's Ghostscript (Matlab creates PDFs by calling Ghostscript, at least on Windows). When I run
x=linspace(1,10,100000);
y=sin(x)+randn(size(x));
plot(x,y,'.')
print -dpsc2 test.ps
I've got a 2Mb PS file (all vector, of course), which when compressed became a 164Kb ZIP. One would expect to get more-or-less the same result when converting PS to PDF, but ps2pdf test.ps produced your 4Mb file!
Since you are on a Mac, you probably have Distiller. I'd give it a try — generate PS files as above, and then run them through Distiller; you should get a 150K vector PDF.
If you insist on rasterizing, I can suggest printing the figure without any axes or labels to a tiff, opening the tiff, and recreating axes and labels on top of it.
If you don't want to go with a 2D histogram (i.e. an image where pixel brightness corresponds to density of points) as BlessedKey suggests, it looks like the only good way is to do the rasterizing yourself, as mentioned by AB.
getframe followed by frame2im seems to be the way to go for that. Unfortunately, getframe returns empty if you run with -nodisplay. Therefore, you'd have to save the figure as .fig, and on another computer run a script that opens the figure, gets the content of the axes with getframe, displays the image from getframe and then saves to pdf.
As an alternative to simple plotting or a 2D histogram, you may want to look into scattercloud, which combines plotting the points with density information, by the way.
If at all possible, you should try to subsample your problem before building the illustration. If you are plotting points on a curve, then 10,000 is probably more than you need. A modern printer is only about 600 DPI, after all.
If the points are illustrating a cloud with some density properties, a better solution may be to build a two dimensional histogram first, and illustrate that with imshow or imagesc.
If multiple clouds are being illustrated with different colors, you may be interested in building one such image for each cloud and then combining them with transparency.
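The question uses MATLAB, but as a language-neutral illustration of the 2D-histogram idea, here is a small Python/NumPy sketch. `density_image` and the bin counts are assumptions made for this example; the resulting array is what one would hand to an image-display function such as imagesc or imshow.

```python
import numpy as np


def density_image(x, y, bins=(200, 200)):
    """Bin scattered points into a normalized density image.

    Returns the image (rows follow y, with the largest y in the top row)
    and the data extent (xmin, xmax, ymin, ymax) for labelling the axes.
    """
    counts, x_edges, y_edges = np.histogram2d(x, y, bins=bins)
    # histogram2d puts x on the first axis; transpose so rows follow y,
    # then flip vertically so the image is drawn the right way up.
    image = counts.T[::-1]
    peak = image.max()
    if peak > 0:
        image = image / peak  # scale densities to the range 0..1
    extent = (x_edges[0], x_edges[-1], y_edges[0], y_edges[-1])
    return image, extent
```

The image has a fixed size regardless of how many points were plotted, so the saved file stays small while axes and text labels can still be drawn as vectors on top.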