YOLOv3 "Assertion '0' failed" error - NOT a CRLF error - object detection

I'm looking to train my own object detector using YOLOv3 for a single class. Basically, it needs to detect whether or not the test images contain the object. I'm hitting an error where training never begins and exits with "Assertion '0' failed". I checked other answers, which say that CRLF line endings must be converted to LF for it to work on Linux, but that solution doesn't work either. I'm following all the steps outlined on pjreddie's website!

This error is usually caused by the image size, the batch size, or sometimes the subdivisions setting as well.
Try tuning these parameters to lower values; an illustrative example follows.
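For reference, these are the lines in the .cfg file you would typically tune (the values here are illustrative, not prescriptive; what fits depends on your GPU memory):

[net]
# lower batch if training exits immediately or runs out of memory
batch=32
# raise subdivisions to shrink the per-step mini-batch held on the GPU
subdivisions=16
# width/height must be multiples of 32 for YOLOv3
width=416
height=416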

Okay, so what helped was restarting the server and redoing everything from the beginning.
It hasn't happened again, so I think it was just a transient error?
I'm not sure, though. I will try to look into it further.

Related

How to check the root cause of a CUDA out-of-memory issue in the middle of training?

I'm running RoBERTa with Hugging Face's language_modeling.py. After 400 steps I suddenly get a CUDA out-of-memory error. I don't know how to deal with it. Can you please help? Thanks
This can have multiple causes. If you only get it after a few iterations, it might be that you aren't freeing the computational graphs. Are you using loss.backward(retain_graph=True) or something similar?
Also, when you're running inference, be sure to use

with torch.no_grad():
    model.forward(...)

Otherwise the computational graphs are saved there as well and potentially never freed, since you never call backward() on them.
My problem was that I didn't compare the size of my GPU memory with the sizes of my samples. I had a lot of pretty small samples, and after many iterations a large one. My bad.
Thank you, and remember to check these things if it happens to you too.
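If you want to catch this pattern early, a minimal sketch of a training loop that logs GPU memory per step looks like this (PyTorch; the loader, model, and optimizer names are placeholders):

import torch

for step, batch in enumerate(loader):
    loss = model(batch)        # placeholder forward pass returning a scalar loss
    loss.backward()            # no retain_graph=True, so the graph is freed
    optimizer.step()
    optimizer.zero_grad()
    if step % 100 == 0:
        # a sudden jump here often means an unusually large sample just arrived
        print(step, torch.cuda.memory_allocated() / 1e9, "GB")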

Tensorflow: The graph couldn't be sorted in topological order only when running in terminal

I encountered a problem while running a Python script on a Google Cloud compute instance, with Python 3.6 and TensorFlow 1.13.1. Several people on Stack Overflow have hit similar problems with loops in the computational graph, but none of those threads really found the culprit. I've also observed something interesting, so maybe someone experienced can figure it out.
The error message is like this:
2019-05-28 22:28:57.747339: E tensorflow/core/grappler/optimizers/dependency_optimizer.cc:704] Iteration = 0, topological sort failed with message: The graph couldn't be sorted in topological order.
2019-05-28 22:28:57.754195: E tensorflow/core/grappler/optimizers/dependency_optimizer.cc:704] Iteration = 1, topological sort failed with message: The graph couldn't be sorted in topological order.
My script for train.py looks like this:

import A, B, C
...

def main():
    ...

if __name__ == '__main__':
    main()
So I will show my two ways of running this script:
VERSION 1:
In the terminal,
python3 train.py
This gives me the error I stated above. When I ran it with CPU only, I noticed it threw something like failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected, so I added a GPU to my instance, but the loop in the computational graph is still there.
VERSION 2 (this is where the weird thing happens):
I simply copied the code in main, with nothing changed, into a Jupyter notebook and ran it there. Suddenly, no error occurred anymore.
I don't really know what's going on under the hood. I just noticed that the messages printed at startup are not the same between the two ways of running the code.
If you encounter the same problem, copying the code into a Jupyter notebook might help directly. I would really like to share more info if someone has any ideas about what might cause this. Thank you!
Well, it turns out that, no matter what, I had chosen a wrong way to build the graph at the beginning, one that I didn't expect to produce a loop. The loop error gave me the hint that I was doing something wrong. The interesting question I mentioned above is still not answered, but I'd like to share my mistake, so anyone who sees the loop error should check whether they are doing the same thing as me.
In my input_fn, I used tensor.eval() to get the corresponding numpy.array in the middle of the graph, in order to interact with data outside of that function. I chose not to use tf.data.Dataset because the whole process is complicated and I couldn't compress it into a Dataset directly. But it turns out this approach sabotages TensorFlow's static computational-graph design, so during training it trained on the same batch again and again. My two cents of advice: if you want to achieve something super complex in your input_fn, you will likely be better off, or even only able to do the right thing, by using the old-fashioned modelling approach: tf.placeholder.
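For anyone unfamiliar with it, the tf.placeholder pattern mentioned above looks roughly like this (TF 1.x API; the model, shapes, and data are purely illustrative):

import numpy as np
import tensorflow as tf

x = tf.placeholder(tf.float32, shape=[None, 10])
y = tf.placeholder(tf.float32, shape=[None, 1])
pred = tf.layers.dense(x, 1)
loss = tf.losses.mean_squared_error(y, pred)
train_op = tf.train.AdamOptimizer(1e-3).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for step in range(100):
        # arbitrary Python code can prepare each batch here, which is
        # exactly what tensor.eval() inside an input_fn cannot do safely
        xs = np.random.rand(32, 10).astype(np.float32)
        ys = np.random.rand(32, 1).astype(np.float32)
        sess.run(train_op, feed_dict={x: xs, y: ys})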

Drawing an image using Vulkan

I'm stuck, so I took my code and wrote a smallish example to illustrate my issue: the texture renders all black. The target is vkwayland, but I exported my buffers and created vktest to make testing simpler.
Edit: links to sources and renderdocs:
https://www.reddit.com/r/vulkan/comments/abp7re/a_smallish_example_of_drawing_an_image_that/ed977in
The validation layers are silent.
Much discussion here: https://www.reddit.com/r/vulkan/comments/abp7re/a_smallish_example_of_drawing_an_image_that/ed4r5br
So, as resolved on Reddit:
You are using an obsolete SDK (and validation layers), which is why you get no error in this case.
And you are reading a UINT image format through a FLOAT sampler, which yields undefined values (in your case zeroes, i.e. black). For example, an image created with a *_UINT format must be read through a usampler2D in GLSL, not a plain sampler2D.

How to disable summary for Tensorflow Estimator?

I'm using the TensorFlow-GPU 1.8 API on Windows 10. For many projects I use tf.Estimator, which really works great. It takes care of a bunch of steps, including writing summaries for TensorBoard. But right now the 'events.out.tfevents' file is getting way too big and I am running into "out of space" errors. For that reason I want to disable the summary writing, or at least reduce the amount of summaries written.
Going along with that mission, I found out about the RunConfig you can pass at construction of the tf.Estimator. Apparently the parameter 'save_summary_steps' (which by default is 200) controls the way summaries are written out. Unfortunately, changing this parameter seems to have no effect at all. It neither disables the summaries (using a None value) nor reduces the size of 'events.out.tfevents' (choosing higher values, e.g. 3000).
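For reference, this is roughly how the config is wired up (a minimal sketch; my_model_fn and the model_dir path are placeholders):

import tensorflow as tf

config = tf.estimator.RunConfig(save_summary_steps=3000)  # or None, which I expected to disable summaries
estimator = tf.estimator.Estimator(model_fn=my_model_fn,
                                   model_dir='/tmp/model',  # illustrative path
                                   config=config)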
I hope you guys can help me out here. Any help is appreciated.
Cheers,
Tobs.
I've observed the following behavior. It doesn't make sense to me, so I hope we get a better answer:
When the input_fn gets data from a tf.data.TFRecordDataset, the number of steps between saved events is the minimum of save_summary_steps and (the number of training examples divided by the batch size). That means it saves at least once per epoch.
When the input_fn gets data from tf.TextLineReader, it follows save_summary_steps as you'd expect, and I can give it a large value for infrequent updates. A rough sketch of the Dataset-backed input_fn is below.
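For concreteness, the Dataset-backed input_fn I'm describing looks roughly like this (TF 1.x; the file path and parse_fn are placeholders):

import tensorflow as tf

def input_fn():
    dataset = tf.data.TFRecordDataset('train.tfrecords')  # illustrative path
    dataset = dataset.map(parse_fn)   # parse_fn: your tf.parse_single_example logic
    dataset = dataset.batch(32).repeat()
    return dataset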

How to give the best chance of success to OCR software?

I am using Tesseract OCR (via pytesser) and PIL (the Python Imaging Library) for automated testing of an application.
I check that the displayed text is OK by taking a screenshot and extracting the text with Tesseract.
I had some issues in the beginning, and it seems to work better since I increased the size of the screenshot using PIL's bicubic interpolation.
Unfortunately, I still have some mistakes, like confusion between '0' and 'O'. I can imagine that I will have other similar issues in the future.
I would like to know if there are techniques to prepare an image in order to help the OCR. Any idea is welcome.
Thanks in advance
Shameless plug and disclaimer: my company packages Tesseract for use in .NET.
Tesseract is an OK OCR engine. It can miss a lot and gets readily confused by non-text. The best thing you can do for it is to make sure it gets text only. The next best thing is to give it something sanely binarized (use an adaptive or dynamic threshold to get there) or grayscale and let it try to do the binarization itself (see the sketch after this list).
Train Tesseract to recognize your font.
Make the image extra clean, with enough free space around the characters.
Profit :)
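A minimal preprocessing sketch along those lines, using PIL plus pytesseract (a wrapper similar to the pytesser the question mentions; the file path and threshold value are illustrative):

from PIL import Image
import pytesseract

img = Image.open('screenshot.png')                # illustrative path
img = img.convert('L')                            # grayscale
w, h = img.size
img = img.resize((w * 3, h * 3), Image.BICUBIC)   # upscale, as the asker did
img = img.point(lambda p: 255 if p > 128 else 0)  # crude fixed threshold; tune or use adaptive
print(pytesseract.image_to_string(img))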
Here are a few real-world examples.
The first image is the original (cropped power-meter numbers).
The second image is a slightly cleaned-up version from GIMP; Tesseract reached around 50% OCR accuracy on it.
The third image is the completely cleaned image: 100% recognized, without any training!
Even under the best conditions, OCR misreads will sneak up on you. Your best option is to design your tests to be aware of them.
For distinguishing between 0 and O, one simple solution is to choose a font that distinguishes between the two (e.g., one where 0 has a dash or dot in its middle). Would that be acceptable in your application?
Another solution is to apply a dictionary-based step after the character-by-character analysis of the text: feed the recognized text into some form of spell-checker or validator to differentiate between difficult characters.
For instance, a round symbol surrounded by other digits is most likely a zero, while the same symbol surrounded by letters is most likely a capital O. It's a trivial example, but it shows how context is necessary to make a more reliable OCR system. A small sketch of such a rule follows.
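A minimal sketch of that kind of context rule (pure illustration, not a production validator):

import re

def fix_zero_vs_o(text):
    # an 'O' sandwiched between digits was probably a misread '0'
    text = re.sub(r'(?<=\d)O(?=\d)', '0', text)
    # a '0' sandwiched between letters was probably a misread 'O'
    text = re.sub(r'(?<=[A-Za-z])0(?=[A-Za-z])', 'O', text)
    return text

print(fix_zero_vs_o('1O3 W0RD'))  # -> '103 WORD'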