Xcode's "Create ML" developer tool always "converged early at 10 iterations" - createml

I am using Xcode's Create ML tool (right-click on Xcode icon in the dock → Open Developer Tool → Create ML). It works pretty well except that it always completes the training with the message:
Completed training - converged early at 10 iterations
even though I set Maximum Iterations to 100 (the default is 25). When I test the model on new images by dragging them into the Preview tab and it fails to identify one correctly, I add that image to the training data and rerun the training from scratch. But it again converges early at 10 iterations. It seems like it never gets to see the newly added images, which were supposed to help improve its accuracy. How do I make Create ML work on all the images in the training data?


How to disable summary for Tensorflow Estimator?

I'm using the Tensorflow-GPU 1.8 API on Windows 10. For many projects I use tf.Estimator, which works great. It takes care of a bunch of steps, including writing summaries for Tensorboard. But right now the 'events.out.tfevents' file is getting way too big and I am running into "out of space" errors. For that reason I want to disable summary writing, or at least reduce the amount of summaries written.
While looking into this I found the RunConfig you can pass when constructing a tf.Estimator. Apparently the parameter 'save_summary_steps' (which by default is 200) controls how often summaries are written out. Unfortunately, changing this parameter seems to have no effect at all: setting it to None does not disable the summaries, and choosing a higher value (e.g. 3000) does not reduce the size of 'events.out.tfevents'.
I hope you guys can help me out here. Any help is appreciated.
I've observed the following behavior. It doesn't make sense to me so I hope we get a better answer:
When the input_fn gets data from tf.data.TFRecordDataset then the number of steps between saving events is the minimum of save_summary_steps and (number of training examples divided by batch size). That means it does it a minimum of once per epoch.
When the input_fn gets data from tf.TextLineReader, it follows save_summary_steps as you'd expect and I can give it a large value for infrequent updates.
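The TFRecordDataset observation above can be written down as a small piece of arithmetic. This is only a sketch of the reported behavior, not TensorFlow's documented rule; the function name and the example numbers are hypothetical, and integer division is assumed for "examples divided by batch size".

```python
def effective_summary_interval(save_summary_steps, num_examples, batch_size):
    """Steps between summary writes under the behavior observed above.

    Assumption: this models only the reported observation for an input_fn
    backed by tf.data.TFRecordDataset; it is not a TensorFlow API.
    """
    # One epoch takes this many steps (at least one).
    steps_per_epoch = max(1, num_examples // batch_size)
    # Observed: summaries are written at least once per epoch.
    return min(save_summary_steps, steps_per_epoch)


# Hypothetical numbers: 10,000 examples, batch size 100 -> 100 steps per epoch.
print(effective_summary_interval(3000, 10000, 100))  # 100
print(effective_summary_interval(50, 10000, 100))    # 50
```

If this matches what you see, a large save_summary_steps only helps once an epoch is longer than that value.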

What are the methods for reducing false positives with tensorflow object detection?

I am training a single object detector with mask rcnn and I have tried several methods for reducing false positives. I started with a few thousand example images of the object with bounding boxes, trained on those, and got decent results. But when running on images that don't contain the object, I would often get false matches with high confidence (sometimes 0.99).
The first thing I tried was adding the hard example miner in the config file. I believe I did this correctly because I added a print statement to ensure the object gets created. However, none of the configs for faster rcnn have hard example mining in them, so I suspect that the miner only works correctly for ssd. I expected a noticeable improvement with a hard example miner, but I did not see one.
The second thing I tried was to add "background" images. I set the minimum number of negatives to a non-zero value in the hard example miner config and added tons of background images that previously got false detections to the training data. I even added these images to the tfrecords file so that they would be balanced evenly with images that do contain the object. This approach actually made things worse and gave me more false detections.
The last thing I tried was creating another category, called "object-background" and took all the false matches and assigned them to this new category. This approach worked pretty well, but I view it as a hack.
To summarize, my main question is: what is the best method for reducing false positives within the current tensorflow object detection framework? Would SSD be a better approach, since it seems to have a hard example miner built into its configs by default?
After some more investigation I actually was able to get the hard example miner with faster rcnn working. I had a bug where I wasn't actually inserting background images into the tf records file.
I think when training a single object detector (a model with one category) it's most crucial to add background images if you want to have good precision/recall. If you just have a few thousand examples of the object, that won't be nearly enough images for the model to learn all the various background noise you will be sending it when actually using the model in your application.

Approx. Graphics card memory required to run Tensor flow object detection with faster_rcnn_nas model

I recently tried running TensorFlow object detection with the faster_rcnn_nas model on a K80 graphics card, which has 11 GB of usable memory. It still crashed, and based on the console errors it appears to need more memory.
My training data set has 1000 images of size 400x500 (approx.) and the test data has 200 images of the same size.
I am wondering roughly how much memory is needed to run the faster_rcnn_nas model, or, more generally, whether it is possible to know the memory requirements for any other model?
Tensorflow doesn't have an easy way to compute memory requirements (yet), and it's a bit of a job to work it out by hand. You could probably just run it on the CPU and look at the process to get a ballpark number.
But the solution to your problem is straightforward. Trying to pass 1000 images of size 400x500 at once is insanity. That would probably exceed the capacity of even the largest 24GB GPUs. You can probably only pass through tens or hundreds of images per batch. You need to split your data into batches and process them over multiple iterations.
In fact, during training you should be taking a random sample of images at each step and training on that (this is the "stochastic" part of stochastic gradient descent). The number of images in each sample is known as the "batch size". For the test set you might get all 200 images to go through (since you don't run backprop), but if not then you'll have to split up the test set too (this is quite common).
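To make the batching idea concrete, here is a minimal sketch in plain Python. The file names, the batch size of 32, and both helper functions are hypothetical, and the byte count assumes 3-channel float32 inputs. It covers only the raw input tensor, which is a lower bound and says nothing about the much larger activation memory of the model itself.

```python
import random


def minibatches(items, batch_size, shuffle=True, seed=None):
    """Yield successive batches of at most batch_size items."""
    order = list(range(len(items)))
    if shuffle:
        # Random order per pass: the "stochastic" part of the sampling.
        random.Random(seed).shuffle(order)
    for start in range(0, len(order), batch_size):
        yield [items[i] for i in order[start:start + batch_size]]


def input_bytes(num_images, height, width, channels=3, bytes_per_value=4):
    """Size of the raw input tensor only (assumes float32, 3 channels)."""
    return num_images * height * width * channels * bytes_per_value


# Hypothetical file names standing in for the 1000 training images.
image_paths = ["img_%04d.jpg" % i for i in range(1000)]
batches = list(minibatches(image_paths, batch_size=32, seed=0))

print(len(batches))                # number of batches per epoch
print(input_bytes(1000, 400, 500))  # all images at once, inputs alone
print(input_bytes(32, 400, 500))    # one batch of 32, inputs alone
```

The same generator can be reused for the test set with shuffle=False if all 200 images do not fit in one pass.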

tensorflow one of 20 parameter server is very slow

I am trying to train a DNN model using tensorflow. My script has two variables, one dense feature and one sparse feature. Each minibatch pulls the full dense feature and the specified sparse features using embedding_lookup_sparse, and the feedforward pass can only begin once the sparse features are ready. I run my script with 20 parameter servers, and increasing the worker count did not scale out. So I profiled my job using the tensorflow timeline and found that one of the 20 parameter servers is very slow compared to the other 19. There are no dependencies between the different parts of the trainable variables. I am not sure whether there is a bug or some limitation, such as tensorflow only being able to queue 40 fan-out requests. Any idea how to debug this? Thanks in advance.
[Image: tensorflow timeline profiling]
It sounds like you might have exactly 2 variables, one is stored at PS0 and the other at PS1. The other 18 parameter servers are not doing anything. Please take a look at variable partitioning (https://www.tensorflow.org/versions/master/api_docs/python/state_ops/variable_partitioners_for_sharding), i.e. partition a large variable into small chunks and store them at separate parameter servers.
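As a rough illustration of what partitioning buys you, the sketch below splits the rows of one large variable into contiguous chunks, one per parameter server. It does not call any TensorFlow API: the function name, the even-split rule, and the row count are assumptions made for illustration only.

```python
def shard_row_ranges(num_rows, num_shards):
    """Split num_rows into num_shards contiguous [start, end) ranges.

    Assumption: shards are as even as possible, with the first
    (num_rows % num_shards) shards holding one extra row.
    """
    base, extra = divmod(num_rows, num_shards)
    ranges = []
    start = 0
    for shard in range(num_shards):
        size = base + (1 if shard < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


# Hypothetical embedding table with 1,000,003 rows spread over 20 servers.
ranges = shard_row_ranges(num_rows=1000003, num_shards=20)
for ps_index, (lo, hi) in enumerate(ranges[:3]):
    print("PS%d holds rows [%d, %d)" % (ps_index, lo, hi))
```

With a split like this, lookups against the sparse variable are spread across all 20 servers instead of landing on a single one.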
This is kind of a hacky way to log Send/Recv timings from the Timeline object for each iteration, but it works pretty well for analyzing the dumped JSON data (compared to visualizing it in chrome://trace).
The steps you have to perform are:
1. Download the TensorFlow source and check out the correct branch (r0.12 for example).
2. Modify the only place that calls the SetTimelineLabel method, inside executor.cc:
   - instead of only recording non-transferable nodes, you want to record Send/Recv nodes as well;
   - be careful to call SetTimelineLabel only once inside NodeDone, as it sets the text string of a node, which will be parsed later by a Python script.
3. Build TensorFlow from the modified source.
4. Modify the model code (for example, inception_distributed_train.py) to use Timeline and the graph metadata correctly.
Then you can run the training and retrieve a JSON file once for each iteration! :)
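Once a JSON file has been dumped per iteration, the analysis side could look roughly like the sketch below. It assumes the dump follows the Chrome trace event layout (a top-level "traceEvents" list whose complete events have "ph": "X" and a "dur" in microseconds) and that Send/Recv nodes can be recognized by "Send" or "Recv" in the event name. Both are assumptions, as is the hand-made sample; adjust the matching to whatever your modified build actually writes.

```python
import json
from collections import defaultdict


def send_recv_durations(trace_json):
    """Sum the durations of Send/Recv events in one trace dump.

    Assumption: Chrome trace event format, with Send/Recv nodes
    identifiable by their event name.
    """
    trace = json.loads(trace_json)
    totals = defaultdict(int)
    for event in trace.get("traceEvents", []):
        if event.get("ph") != "X":  # skip metadata and other event types
            continue
        name = event.get("name", "")
        if "Send" in name or "Recv" in name:
            totals[name] += event.get("dur", 0)
    return dict(totals)


# Hand-made sample standing in for one iteration's dump (hypothetical values).
sample = json.dumps({"traceEvents": [
    {"ph": "X", "name": "_Recv", "ts": 0, "dur": 1200},
    {"ph": "X", "name": "MatMul", "ts": 1200, "dur": 300},
    {"ph": "X", "name": "_Send", "ts": 1500, "dur": 800},
    {"ph": "X", "name": "_Recv", "ts": 2300, "dur": 400},
    {"ph": "M", "name": "process_name"},
]})

totals = send_recv_durations(sample)
print(totals)
```

Running this over the file from each iteration gives a per-step series you can compare across parameter servers.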
Some suggestions that were too big for a comment:
You can't see data transfer in the timeline because the tracing of Send/Recv is currently turned off. There is some discussion here: https://github.com/tensorflow/tensorflow/issues/4809
In the latest version (a nightly that is 5 days old or newer) you can turn on verbose logging by doing export TF_CPP_MIN_VLOG_LEVEL=1, and it shows second level timestamps (see here about higher granularity).
So with vlog perhaps you can use messages generated by this line to see the times at which Send ops are generated.

Tensorboard Event File Is Large and Growing

I am training a fairly complex network and noticed that the event file continues to grow during training and can reach a size of 2GB or more. I assume that the size of the event file should be approximately constant over a training session -- correct? I tried using tf.get_default_graph().finalize() placed just before the training loop per mrry's suggestion but that did not cause any errors. How else can I go about debugging this?