Remove data from tensorboard event files to make them smaller - tensorflow

When I train a model for multiple days with image summary activated, my .tfevent files are huge ( > 70GiB).
I don't want to deactivate the image summary as it allows me to visualize the progress of the network during training. However, once the network is trained, I don't need those information anymore (in fact, I'm not even sure it is possible to visualize previous images with tensorboard).
I would like to be able to remove them from the event file without loosing other information like the loss curve (as it is useful to compare models together).
The solution would be to use two separate summary (one for the images and one for the loss) but I would like to know if there is a better way.

It is sure better to save the big summaries less often as Terry has suggested, but in case you already have an event file which is huge, you can still reduce its size by deleting some of the summaries.
I have had this issue, where I have saved a lot of image summaries, which I don't need now, so I have written a script to copy the eventfile, while only leaving the scalar summaries:
https://gist.github.com/serycjon/c9ad58ecc3176d87c49b69b598f4d6c6
The important stuff is:
for event in tf.train.summary_iterator(event_file_path):
event_type = event.WhichOneof('what')
if event_type != 'summary':
writer.add_event(event)
else:
wall_time = event.wall_time
step = event.step
# possible types: simple_value, image, histo, audio
filtered_values = [value for value in event.summary.value if value.HasField('simple_value')]
summary = tf.Summary(value=filtered_values)
filtered_event = tf.summary.Event(summary=summary,
wall_time=wall_time,
step=step)
writer.add_event(filtered_event)
you can use this as a base for more complicated stuff, like leaving only every 100-th image summary, filtering based on summary tag, etc.

If you look at the event types in the log using #serycjon's loop you'll see that the graph_def and meta_graph_def might be saved often.
I had 46 GB worth of logs that I reduced to 1.6 GB by removing all the graphs. You can leave one graph so that you can still view it in tensorboard.

Just handled this problem, hoping this is not too late.
My slolution is to save your image summary every 100(or other value) training steps, then the growth speed of the .tfevent's file size will be slow down, eventually the file size will be much smaller.

Related

TensorFlow Object Detection API: evaluation mAP behaves weirdly?

I am training an object detector for my own data using Tensorflow Object Detection API. I am following the (great) tutorial by Dat Tran https://towardsdatascience.com/how-to-train-your-own-object-detector-with-tensorflows-object-detector-api-bec72ecfe1d9. I am using the provided ssd_mobilenet_v1_coco-model pre-trained model checkpoint as the starting point for the training. I have only one object class.
I exported the trained model, ran it on the evaluation data and looked at the resulted bounding boxes. The trained model worked nicely; I would say that if there was 20 objects, typically there were 13 objects with spot on predicted bounding boxes ("true positives"); 7 where the objects were not detected ("false negatives"); 2 cases where problems occur were two or more objects are close to each other: the bounding boxes get drawn between the objects in some of these cases ("false positives"<-of course, calling these "false positives" etc. is inaccurate, but this is just for me to understand the concept of precision here). There are almost no other "false positives". This seems much better result than what I was hoping to get, and while this kind of visual inspection does not give the actual mAP (which is calculated based on overlap of the predicted and tagged bounding boxes?), I would roughly estimate the mAP as something like 13/(13+2) >80%.
However, when I run the evaluation (eval.py) (on two different evaluation sets), I get the following mAP graph (0.7 smoothed):
mAP during training
This would indicate a huge variation in mAP, and level of about 0.3 at the end of the training, which is way worse than what I would assume based on how well the boundary boxes are drawn when I use the exported output_inference_graph.pb on the evaluation set.
Here is the total loss graph for the training:
total loss during training
My training data consist of 200 images with about 20 labeled objects each (I labeled them using the labelImg app); the images are extracted from a video and the objects are small and kind of blurry. The original image size is 1200x900, so I reduced it to 600x450 for the training data. Evaluation data (which I used both as the evaluation data set for eval.pyand to visually check what the predictions look like) is similar, consists of 50 images with 20 object each, but is still in the original size (the training data is extracted from the first 30 min of the video and evaluation data from the last 30 min).
Question 1: Why is the mAP so low in evaluation when the model appears to work so well? Is it normal for the mAP graph fluctuate so much? I did not touch the default values for how many images the tensorboard uses to draw the graph (I read this question: Tensorflow object detection api validation data size and have some vague idea that there is some default value that can be changed?)
Question 2: Can this be related to different size of the training data and the evaluation data (1200x700 vs 600x450)? If so, should I resize the evaluation data, too? (I did not want to do this as my application uses the original image size, and I want to evaluate how well the model does on that data).
Question 3: Is it a problem to form the training and evaluation data from images where there are multiple tagged objects per image (i.e. surely the evaluation routine compares all the predicted bounding boxes in one image to all the tagged bounding boxes in one image, and not all the predicted boxes in one image to one tagged box which would preduce many "false false positives"?)
(Question 4: it seems to me the model training could have been stopped after around 10000 timesteps were the mAP kind of leveled out, is it now overtrained? it's kind of hard to tell when it fluctuates so much.)
I am a newbie with object detection so I very much appreciate any insight anyone can offer! :)
Question 1: This is the tough one... First, I think you don't understand correctly what mAP is, since your rough calculation is false. Here is, briefly, how it is computed:
For each class of object, using the overlap between the real objects and the detected ones, the detections are tagged as "True positive" or "False positive"; all the real objects with no "True positive" associated to them are labelled "False Negative".
Then, iterate through all your detections (on all images of the dataset) in decreasing order of confidence. Compute the accuracy (TP/(TP+FP)) and recall (TP/(TP+FN)), only counting the detections that you've already seen ( with confidence bigger than the current one) for TP and FP. This gives you a point (acc, recc), that you can put on a precision-recall graph.
Once you've added all possible points to your graph, you compute the area under the curve: this is the Average Precision for this category
if you have multiple categories, the mAP is the standard mean of all APs.
Applying that to your case: in the best case your true positive are the detections with the best confidence. In that case your acc/rec curve will look like a rectangle: you'd have 100% accuracy up to (13/20) recall, and then points with 13/20 recall and <100% accuracy; this gives you mAP=AP(category 1)=13/20=0.65. And this is the best case, you can expect less in practice due to false positives which higher confidence.
Other reasons why yours could be lower:
maybe among the bounding boxes that appear to be good, some are still rejected in the calculations because the overlap between the detection and the real object is not quite big enough. The criterion is that Intersection over Union (IoU) of the two bounding boxes (real one and detection) should be over 0.5. While it seems like a gentle threshold, it's not really; you should probably try and write a script to display the detected bounding boxes with a different color depending on whether they're accepted or not (if not, you'll get both a FP and a FN).
maybe you're only visualizing the first 10 images of the evaluation. If so, change that, for 2 reasons: 1. maybe you're just very lucky on these images, and they're not representative of what follows, just by luck. 2. Actually, more than luck, if these images are the first from the evaluation set, they come right after the end of the training set in your video, so they are probably quite similar to some images in the training set, so they are easier to predict, so they're not representative of your evaluation set.
Question 2: if you have not changed that part in the config file mobilenet_v1_coco-model, all your images (both for training and testing) are rescaled to 300x300 pixels at the start of the network, so your preprocessings don't matter.
Question 3: no it's not a problem at all, all these algorithms were designed to detect multiple objects in images.
Question 4: Given the fluctuations, I'd actually keep training it until you can see improvement or clear overtraining. 10k steps is actually quite small, maybe it's enough because your task is relatively easy, maybe it's not enough and you need to wait ten times that to have significant improvement...

Shuffling the training dataset with Tensorflow object detection api

I'm working on a logo detection algorithm using the Faster-RCNN model with the Tensorflow object detection api.
My dataset is alphabetically ordered (so there are a hundred adidas logo, then hundred apple logo etc.). And i would like it to be shuffled while training.
I've put some values in the config file:
train_input_reader:{
shuffle: true
queue_capacity: some value
min_after_dequeue : some other value}
However whatever are the values, I'm putting in, algorithm is at first training on all of the a's logos (adidas, apple and so on) and only a lapse of time after starting to see the b's logos (bmw etc.) and the c's one etc.
Of course I could just shuffle my input dataset directly, but I would like to understand the logic behind it.
PS: I've seen this post about shuffling and min_after_dequeue, but I still dont quite get it. My batch size is 1 so it shouldn't be using tf.train.shuffle_batch() but only tf.RandomShuffleQueue
My training dataset size is 5000 and if I write min_after_dequeue: 4000 or 5000 it is still not shuffled right. Why though?
Update:
#AllenLavoie It's a bit hard for me; as there is a lot of dependencies and I'm new to Tensorflow.
But in the end the queue is constructed by
tf.contrib.slim.parallel_reader.parallel_read( _, string_tensor = parallel_reader.parallel_read(
config.input_path,
reader_class=tf.TFRecordReader,
num_epochs=(input_reader_config.num_epochs
if input_reader_config.num_epochs else None),
num_readers=input_reader_config.num_readers,
shuffle=input_reader_config.shuffle,
dtypes=[tf.string, tf.string],
capacity=input_reader_config.queue_capacity,
min_after_dequeue=input_reader_config.min_after_dequeue)
It seems that when I'm putting num_readers = 1 in the config file the dataset is finally shuffling as I want, (at least in the beginning), but when there are more somehow on the start the logos are getting in the alphabetical order.
I recommend shuffling the dataset prior to training. The way shuffling currently happens is imperfect and my guess at what is happening is that at the beginning the queue starts off empty and only gets examples that start with 'A' --- after a while it may be more shuffled, but there is no getting around the beginning part when the queue hasn't been filled yet.

Can I change Inv operation into Reciprocal in an existing graph in Tensorflow?

I am working on an image classification problem with tensorflow. I have 2 different CNNs trained separately (in fact 3 in total but I will deal with the third later), for different tasks and on a AWS (Amazon) machine. One tells if there is text in the image and the other one tells if the image is safe for work or not. Now I want to use them in a single script on my computer, so that I can put an image as input and get the results of both networks as output.
I load the two graphs in a single tensorflow Session, using the import_meta_graph API and the import_scope argument and putting each subgraph in a separate scope. Then I just use the restore method of the created saver, giving it the common Session as argument.
Then, in order to run inference, I retrieve the placeholders and final output with graph=tf.get_default_graph() and my_var=graph.get_operation_by_name('name').outputs[0] before using it in sess.run (I think I could just have put 'name' in sess.run instead of fetching the output tensor and putting it in a variable, but this is not my problem).
My problem is the text CNN works perfectly fine, but the nsfw detector always gives me the same output, no matter the input (even with np.zeros()). I have tried both separately and same story: text works but not nsfw. So I don't think the problem comes from using two networks simultaneaously.
I also tried on the original AWS machine I trained it on, and this time the nsfw CNN worked perfectly.
Both networks are very similar. I checked on Tensorboard if everything was fine and I think it is ok. The differences are in the number of hidden units and the fact that I use batch normalization in the nsfw model and not in the text one. Now why this title ? I observed that I had a warning when running the nsfw model that I didn't have when using only the text model:
W tensorflow/core/framework/op_def_util.cc:332] Op Inv is deprecated. It will cease to work in GraphDef version 17. Use Reciprocal.
So I thougt maybe this was the reason, everything else being equal. I checked my GraphDef version, which seems to be 11, so Inv should still work in theory. By the way the AWS machine use tensroflow version 0.10 and I use version 0.12.
I noticed that the text network only had one Inv operation (via a filtering on the names of the operations given by graph.get_operations()), and that the nsfw model had the same operation plus multiple Inv operations due to the batch normalization layers. As precised in the release notes, tf.inv has simply been renamed to tf.reciprocal, so I tried to change the names of the operations to Reciprocal with tf.group(), as proposed here, but it didn't work. I have seen that using tf.identity() and changing the name could also work, but from what I understand, tensorflow graphs are an append-only structure, so we can't really modify its operations (which seems to be immutable anyway).
The thing is:
as I said, the Inv operation should still work in my GraphDef version;
this is only a warning;
the Inv operations only appear under name scopes that begin with 'gradients' so, from my understanding, this shouldn't be used for inference;
the text model also have an Inv operation.
For these reasons, I have a big doubt on my diagnosis. So my final questions are:
do you have another diagnosis?
if mine is correct, is it possible to replace Inv operations with Reciprocal operations, or do you have any other solution?
After a thorough examination of the output of relevant nodes, with the help of Tensorboard, I am now pretty certain that the renaming of Inv to Reciprocal has nothing to do with my problem.
It appears that the last batch normalization layer eliminates almost any variance of its output when the inputs varies. I will ask why elsewhere.

Tensorflow--how to limit epochs with evaluation only?

Given that I train a model; save it off with metagraph/save.Saver, and the load that graph into a new script/process to test against test data, what is the best way to make sure I only iterate over the test data once?
With my training data, I want to be able to iterate over the entire data set for an arbitrary number of iterations. I use
tf.train.string_input_producer()
to drive a queue of loading files for training, so I can safely leave num_epochs as default (=None) and let other controls drive training termination.
However, when I run the graph for evaluation, I just want to the evaluate the test set once (and gather the appropriate statistics).
Initial attempted solution:
Make a tensor for Epochs, and pass that into tf.train.string_input_producer, and then tf.Assign it to the appropriate value based on test/train.
But:
tf.train.string_input_producer only takes integers as num_epochs, so this isn't possible...unless I'm missing something.
Further notes: I use
tf.train.batch()
to read-in test/train data that has been serialized into protocol buffers (https://www.tensorflow.org/versions/r0.11/how_tos/reading_data/index.html#file-formats), so I have minimal visibility into how the data is loaded and how far along it is.
tf.train.batch apparently will throw tf.errors.OutOfRangeError, but I'm not clear how to catch that successfully, or if that is even what I really want to do. I tried a very naive
try...except...finally
(like in https://www.tensorflow.org/versions/r0.11/how_tos/reading_data/index.html#creating-threads-to-prefetch-using-queuerunner-objects), which didn't catch the error from tf.train.batch.

Incorporating very large constants in Tensorflow

For example, the comments for the Tensorflow image captioning example model state:
NOTE: This script will consume around 100GB of disk space because each image
in the MSCOCO dataset is replicated ~5 times (once per caption) in the output.
This is done for two reasons:
1. In order to better shuffle the training data.
2. It makes it easier to perform asynchronous preprocessing of each image in
TensorFlow.
The primary goal of this question is to see if there is an alternative to this type of duplication. In my use case, storing the data in this way would require each image to be duplicated in the TFRecord files many more times, on the order of 20 - 50 times.
I should note first that I have already fed the images through VGGnet to extract 4096 dim features, and I have these stored as a mapping between filename and the vectors.
Before switching over to Tensorflow, I had been feeding batches containing filename strings and then looking up the corresponding vector on a per-batch basis. This allows me to store all of the image data in ~15GB without needing to duplicate the data on disk.
My first attempt to do this in in Tensorflow involved storing indices in the TFExample buffers and then doing a "preprocessing" step to slice into the corresponding matrix:
img_feat = pd.read_pickle("img_feats.pkl")
img_matrix = np.stack(img_feat)
preloaded_images = tf.Variable(img_matrix)
first_image = tf.slice(preloaded_images, [0,0], [1,4096])
However, in this case, Tensorflow disallows a variable larger than 2GB. So my next thought was to partition this across several variables:
img_tensors = []
for i in range(NUM_SPLITS):
with tf.Graph().as_default():
img_tensors.append(tf.Variable(img_matrices[i], name="preloaded_images_%i"%i))
first_image = tf.concat(1, [tf.slice(t, [0,0], [1,4096//NUM_SPLITS]) for t in img_tensors])
In this case, I'm forced to store each partition on a separate graph, because it seems any one graph cannot be this large either. However, now the concat fails because each tensor I am concatenating is on a separate graph.
Any advice on incorporating a large amount (~15GB) of preloaded into the Tensorflow graph.
Potentially related is this question; however in this case I'd like to override the decoding of the actual JPEG file with the preprocessed value in a tensor op.