Multiple ckpt, meta into single .pb file? - tensorflow

In tensorflow, the training produced the following files:
checkpoint
model.ckpt-10000.meta
model.ckpt-10000.data-00000-of-00001
model.ckpt-10000.index
model.ckpt-11000.meta
model.ckpt-11000.data-00000-of-00001
model.ckpt-11000.index
model.ckpt-12000.meta
model.ckpt-12000.data-00000-of-00001
model.ckpt-12000.index
model.ckpt-8000.meta
model.ckpt-8000.data-00000-of-00001
model.ckpt-8000.index
model.ckpt-9000.meta
model.ckpt-9000.data-00000-of-00001
model.ckpt-9000.index
I am interested in creating a .pb file from the output generated by training; however, from the examples I have seen, it requires one set of intermediate output files. How do I merge all of the output file sets into a single .pb?

What you are trying to do does not make sense (at least to me). I recommend you read about these checkpoint files here and here first.
In short, the checkpoint file just tells you which model is the latest. The .meta file stores info about your graph structure, .data stores the values of the variables, and .index stores key/value pairs describing where the value of each parameter can be found in the .data files.
All your files look like model.ckpt-xxxx, where xxxx is the step number, so you have snapshots of the training at different steps. That is why it does not make sense to combine the value of a variable at step 9000 with its value at step 11000. Also, the .meta files are probably all identical.
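If what you actually want is a single .pb for inference, the usual route is to freeze one checkpoint (not all of them). A minimal sketch, assuming TF 1.x and that you know your graph's output node names (output_node below is a placeholder):

import tensorflow as tf  # assumes TF 1.x (or tf.compat.v1)

ckpt = 'model.ckpt-12000'        # pick ONE checkpoint, e.g. the latest step
output_nodes = ['output_node']   # placeholder: replace with your graph's real output op names

with tf.Session() as sess:
    saver = tf.train.import_meta_graph(ckpt + '.meta')   # rebuild the graph from the .meta file
    saver.restore(sess, ckpt)                            # load variable values from .data/.index
    frozen = tf.graph_util.convert_variables_to_constants(
        sess, sess.graph.as_graph_def(), output_nodes)   # bake the variables in as constants
    tf.train.write_graph(frozen, '.', 'frozen_model.pb', as_text=False)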

Related

How to load in a downloaded tfrecord dataset into TensorFlow?

I am quite new to TensorFlow, and have never worked with TFRecords before.
I have downloaded a dataset of images from online and the download format was TFRecord.
This is the file structure in the downloaded dataset:
[two screenshots of the folder structure omitted; the second shows what is inside "test"]
What I want to do is load the training, validation and testing data into TensorFlow in a similar way to loading a built-in dataset. For example, you might load the MNIST dataset like this and get arrays containing the pixel data and arrays containing the corresponding image labels.
(train_images, train_labels), (test_images, test_labels) = tf.keras.datasets.mnist.load_data()
However, I have no idea how to do so.
I know that I can use dataset = tf.data.TFRecordDataset(filename) somehow to open the dataset, but would this act on the entire dataset folder, one of the subfolders, or the actual files? If it is the actual files, would it be on the .TFRecord file? How do I use/what do I do with the .PBTXT file which contains a label map?
And even after opening the dataset, how can I extract the data and create the necessary arrays which I can then feed into a TensorFlow model?
It's mostly archaeology, plus a few tricks.
First, I'd read the README.dataset and README.roboflow files. Can you show us what's in them?
Second, .pbtxt files are text-formatted, so we may be able to understand what that file is if you just open it with a text editor. Can you show us what's in it?
The thing to remember about a TFRecord file is that it's nothing but a sequence of binary records. tf.data.TFRecordDataset('balls.tfrecord') will give you a dataset that yields those records in order.
The third part is the hard one, because here you'll have binary blobs of data and we don't have any clues yet about how they're encoded.
It's common for TFRecord files to contain serialized tf.train.Example protos, so it would be worth a shot to try decoding one as a tf.train.Example and see if that tells us what's inside.
import tensorflow as tf

# grab the first record from the file
for record in tf.data.TFRecordDataset('balls.tfrecord'):
    break

example = tf.train.Example()
example.ParseFromString(record.numpy())
print(example)
The Example object is just a representation of a dict. If you get something other than an error there, look at the dict keys and see if you can make sense of them.
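For instance, to list just the keys from the example object above:

print(list(example.features.feature.keys()))  # the feature names stored in this record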
Then to make a dataset that decodes them you'll want something like:
def decode(record):
    # key_dtypes: a dict mapping the feature names you found above to their dtypes
    return tf.io.parse_example(record, {key: tf.io.RaggedFeature(dtype) for key, dtype in key_dtypes.items()})

ds = ds.map(decode)

YOLO darknet retrain does not even start saying it could not find *.txt in some *labels* directory

I have been trying to retrain YOLOv3 on a custom dataset. I saved the .jpg images and their corresponding .txt annotation files in the same directory. I have set up my .data file, .names file and .cfg file appropriately, as suggested in many tutorials online. Quite frustratingly, I keep running into the problem where it says Couldn't open file: <some-path>/labels/<some file>.txt. What is annoying here is that it seems to be looking for .txt files in some labels directory which neither exists nor is mentioned anywhere in my setup. All my .jpg and .txt files are in a directory named images, located at the same level as where the system is looking for this labels directory.
What is even more annoying is that if I do move the .txt files into a labels directory, which is where darknet is looking, this error goes away but the training never starts.
I have tried many different ways of specifying the paths, using different models, cfg files etc., but all in vain. Please, someone help.
After making many attempts from different angles, I found out the right way. The answer to the first question, about organizing images and labels, is contrary to what most tutorials online suggest: labels should be located in a separate directory from the images. The path to each .txt file should differ from that of its corresponding .jpg only in the word images. For example, if the path to an image is <path/to/somewhere/images/somewhere/xyz.jpg>, the path to its corresponding label file should be <path/to/somewhere/labels/somewhere/xyz.txt>.
As an answer to the second part where the training does not start, make sure to use argument -clear 1 at the end of the darknet training command, i.e. ./darknet detector train cfg/data_file.data cfg/cfg_file.cfg yolov3.weights -clear 1.
Remember, in your *.data file, you have the following settings:
classes = [Your number of classes]
train = data/train.txt
valid = data/test.txt
names = data/obj.names
backup = backup/
You should have data/train.txt and data/test.txt, which are text files containing the list of paths to your images.
For example, if you put all of your images and txt files at data/obj, the txt file should contain:
data/obj/1.jpg
data/obj/2.jpg
.
.
(and so on)
Then, YOLO will automatically look for the corresponding label of each image, which should have the same name (in this case: 1.txt, 2.txt, ...).
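If you want to generate data/train.txt automatically, here is a small sketch (assuming, hypothetically, that all your images sit directly in data/obj/):

import glob

# write the path of every .jpg under data/obj into data/train.txt, one per line
with open('data/train.txt', 'w') as f:
    for path in sorted(glob.glob('data/obj/*.jpg')):
        f.write(path + '\n')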
Reference:
(No. 3 and 4 in https://github.com/AlexeyAB/darknet#how-to-train-to-detect-your-custom-objects)

How to load images for classification problem using Keras

I am working on an image classification problem using the Keras framework. It is a binary classification problem and I have 2 folders, training set and test set, which contain images of both classes. I don't have a separate folder for each class (say cat vs. dog). Keras' ImageDataGenerator works when we have separate folders for each class (a cat folder and a dog folder), but I have all the images in the single training set folder and I do not understand how to proceed. Kindly suggest how to load the images.
I also have 2 CSV files, train.csv and test.csv. train.csv contains 2 columns, namely image_id and class_name, while test.csv contains only image_id. Note that image_id matches the names of the files in the image folders.
The latest versions of ImageDataGenerator have a method called flow_from_dataframe, which does exactly what you want.
Basically, you use it by first loading your CSV file into a pandas DataFrame, instantiating an ImageDataGenerator, and then calling flow_from_dataframe with three important parameters:
directory: Folder where your data lives.
x_col: Column in the DataFrame that contains the filenames inside the directory that correspond to your training/testing data.
y_col: Column in the DataFrame corresponding to the labels that will be output by the generator.
Then you use this generator as any other, by calling fit_generator. More information and examples are available here.
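A minimal sketch of that workflow, assuming the column names from the question (image_id with the file names, class_name with the labels) and, hypothetically, that the training images live in a training_set/ folder; adjust names and sizes to your data:

import pandas as pd
from tensorflow.keras.preprocessing.image import ImageDataGenerator

train_df = pd.read_csv('train.csv')
datagen = ImageDataGenerator(rescale=1./255)

train_gen = datagen.flow_from_dataframe(
    dataframe=train_df,
    directory='training_set/',   # folder containing the training images
    x_col='image_id',            # column with the file names
    y_col='class_name',          # column with the labels
    class_mode='binary',         # binary classification, as in the question
    target_size=(150, 150),
    batch_size=32)

You can then pass train_gen to fit_generator like any other generator; for the test set, build a second generator from test.csv with class_mode=None, since it has no labels.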

TensorFlow Supervisor just stores the latest five models

I am using TensorFlow's Supervisor to train my own model. I followed the official guide and set save_model_secs to 600. However, I strangely find that the log_dir path keeps only the latest five models and automatically discards models generated earlier. I carefully read the source code, supervisor.py, but cannot find the relevant removal code or the mechanism by which only five models are kept throughout the training process. Does anyone have a hint? Any help is really appreciated.
tf.train.Supervisor has a saver argument. If not given, it will use a default. This is configured to only store the last five checkpoints. You can overwrite this by passing your own tf.train.Saver object.
See here for the docs. There are essentially two ways of storing more checkpoints when creating the Saver:
Pass some large integer to the max_to_keep argument. If you have enough storage, passing 0 or None should result in all checkpoints being kept.
Saver also has an argument keep_checkpoint_every_n_hours. This gives you a separate "stream" of checkpoints that are kept indefinitely. So, for example, you could store checkpoints every 600 seconds (via the save_model_secs argument to Supervisor) and keep only the five most recent of those, but additionally keep one checkpoint every, say, 30 minutes (0.5 hours); those will never be deleted.
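A minimal sketch combining both options (TF 1.x; note that the Saver has to be created after your model's variables exist):

import tensorflow as tf

# ... build your model first; the Saver must be created after the variables exist
global_step = tf.train.get_or_create_global_step()

saver = tf.train.Saver(max_to_keep=100,                    # keep up to 100 recent checkpoints
                       keep_checkpoint_every_n_hours=0.5)  # plus one long-lived checkpoint every 30 min

sv = tf.train.Supervisor(logdir='log_dir',
                         saver=saver,           # overrides the default Saver (which uses max_to_keep=5)
                         save_model_secs=600)   # still snapshot every 600 seconds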

Training custom dataset with translate model

Running the model out of the box generates these files in the data dir:
ls
dev-v2.tgz newstest2013.en
giga-fren.release2.fixed.en newstest2013.en.ids40000
giga-fren.release2.fixed.en.gz newstest2013.fr
giga-fren.release2.fixed.en.ids40000 newstest2013.fr.ids40000
giga-fren.release2.fixed.fr training-giga-fren.tar
giga-fren.release2.fixed.fr.gz vocab40000.from
giga-fren.release2.fixed.fr.ids40000 vocab40000.to
Reading the source of translate.py:
https://github.com/tensorflow/models/blob/master/tutorials/rnn/translate/translate.py
tf.app.flags.DEFINE_string("from_train_data", None, "Training data.")
tf.app.flags.DEFINE_string("to_train_data", None, "Training data.")
To utilize my own training data, I created the dirs my-from-train-data & my-to-train-data and added my own training data to each of these dirs; the training data is contained in the files mydata.from & mydata.to
my-to-train-data contains mydata.from
my-from-train-data contains mydata.to
I could not find documentation on using your own training data or what format it should take, so I inferred this from the translate.py source and the contents of the data dir created when executing the translate model out of the box.
Contents of mydata.from :
Is this a question
Contents of mydata.to :
Yes!
I then attempt to train the model using :
python translate.py --from_train_data my-from-train-data --to_train_data my-to-train-data
This returns with an error :
tensorflow.python.framework.errors_impl.NotFoundError: my-from-train-data.ids40000
It appears I need to create the file my-from-train-data.ids40000; what should its contents be? Is there an example of how to train this model using custom data?
Great question, training a model on your own data is way more fun than using the standard data. An example of what you could put in the terminal is:
python translate.py --from_train_data mydatadir/to_translate.in --to_train_data mydatadir/to_translate.out --from_dev_data mydatadir/test_to_translate.in --to_dev_data mydatadir/test_to_translate.out --train_dir train_dir_model --data_dir mydatadir
What goes wrong in your example is that you are not pointing to a file, but to a folder. from_train_data should always point to a plaintext file, whose rows should be aligned with those in the to_train_data file.
Also: as soon as you run this script with sensible data (more than one line ;) ), translate.py will generate your ids (40,000 of them if from_vocab_size and to_vocab_size are not set). It is important to know that these files are created in the folder specified by data_dir... if you do not specify one, they are generated in /tmp (I prefer to keep them in the same place as my data).
Hope this helps!
A quick answer to:
"It appears I need to create the file my-from-train-data.ids40000; what should its contents be? Is there an example of how to train this model using custom data?"
Yes, that's the missing vocab/word-id file, which is generated when preparing the data.
Here is a tutorial from the TensorFlow documentation.
A quick overview of the files, and why you might be confused by the files that are output versus the ones you need to use:
python/ops/seq2seq.py: >> Library for building sequence-to-sequence models.
models/rnn/translate/seq2seq_model.py: >> Neural translation sequence-to-sequence model.
models/rnn/translate/data_utils.py: >> Helper functions for preparing translation data.
models/rnn/translate/translate.py: >> Binary that trains and runs the translation model.
The TensorFlow translate.py file requires several files to be generated when using your own corpus for translation.
The corpus needs to be aligned, meaning: line 1 in the source-language file corresponds to line 1 in the target-language file. This allows the model to do the encoding and decoding.
You want to make sure the vocabulary has been generated from the dataset (data_utils.py takes care of this). Check these steps:
python translate.py \
  --data_dir [your_data_directory] --train_dir [checkpoints_directory] \
  --en_vocab_size=40000 --fr_vocab_size=40000
Note: if your vocabulary size is smaller, change that value accordingly.
There is a longer discussion here: tensorflow/issues/600
If all else fails, check out this ByteNet implementation in Tensorflow which does translation task as well.