TFRecord vs RecordIO - mxnet

TensorFlow Object Detection API prefers TFRecord file format. MXNet and Amazon Sagemaker seem to use RecordIO format. How are these two binary file formats different, or are they the same thing?

RecordIO and TFRecord are the same in the sense that they serve the same purpose - to put data in one sequential file for faster reading. TFRecord files typically hold records serialized with Protocol Buffers (tf.train.Example messages), while MXNet's RecordIO uses its own lightweight binary framing.
It seems to me that RecordIO is more like an umbrella term: a format used to store a huge chunk of data in one file for faster reading. Some products adopt "RecordIO" as an actual name, but TensorFlow decided to use its own word for it - TFRecord. That's why some people call TFRecord a "TensorFlow-flavored RecordIO format".
There is no single RecordIO format as such. The Apache Mesos developers, who also call their format RecordIO, say: "Since there is no formal specification of the RecordIO format, there tend to be slight incompatibilities between RecordIO implementations". And their RecordIO format is different from the one MXNet uses - it lacks the "magic number" at the beginning of each record.
So, at the byte level, TensorFlow's TFRecord and MXNet's RecordIO are different file formats: you shouldn't expect MXNet to read a TFRecord, or vice versa. But at a logical level they serve the same purpose and can be considered similar.
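To make the byte-level difference concrete, here is a pure-Python sketch of the TFRecord framing (not the protobuf payload). Note the CRC fields are filled with placeholder zeros here - real TFRecord files use masked CRC32C checksums, so TensorFlow itself would reject this file; it only illustrates the record layout.

```python
import struct

# Minimal sketch of the TFRecord *framing*. Each record is:
#   8-byte little-endian length, 4-byte CRC of the length,
#   the payload bytes, 4-byte CRC of the payload.
# Real files use masked CRC32C; we write placeholder zeros.

def write_records(path, payloads):
    with open(path, "wb") as f:
        for data in payloads:
            f.write(struct.pack("<Q", len(data)))  # record length
            f.write(b"\x00" * 4)                   # length CRC (placeholder)
            f.write(data)                          # payload (normally a serialized tf.train.Example)
            f.write(b"\x00" * 4)                   # payload CRC (placeholder)

def read_records(path):
    records = []
    with open(path, "rb") as f:
        while True:
            header = f.read(8)
            if not header:
                break
            (length,) = struct.unpack("<Q", header)
            f.read(4)                  # skip length CRC
            records.append(f.read(length))
            f.read(4)                  # skip payload CRC
    return records
```

MXNet's RecordIO framing is different: each record starts with a fixed magic number followed by a length/flag word, which is exactly why neither library can read the other's files.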

Related

Difference between .pb and .h5

What is the main difference between .pb format of tensorflow and .h5 format of keras to store models? Is there any reason to choose one over the other?
They are different file formats with different characteristics, both used by TensorFlow to save models (.h5 specifically by Keras).
.pb - protobuf
Protocol Buffers (protobuf) is a way to store structured data (in this case, a neural network). The project is open source and maintained by Google.
Example
person {
  name: "John Doe"
  email: "jdoe@example.com"
}
A simple message containing two fields; you can load it in one of many supported languages (e.g. C++, Go, Python), parse it, modify it, and send it to someone else in binary format.
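For context, the message above would be described by a schema like the following (a hypothetical person.proto; the field numbers are illustrative and identify each field in the binary encoding):

```proto
syntax = "proto3";

message Person {
  string name = 1;
  string email = 2;
}
```

Running protoc on such a file generates the classes you parse and serialize in each supported language.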
Advantages
really small and efficient to parse (when compared to say .xml), hence often used for data transfer across the web
used by Tensorflow's Serving when you want to take your model to production (e.g. inference over the web)
language agnostic - binary format can be read by multiple languages (Java, Python, Objective-C, and C++ among others)
the recommended format since TF 2.0; see the official serialization guide
saves various metadata (optimizers, losses etc. if using keras's model)
Disadvantages
SavedModel is conceptually harder to grasp than a single file
creates a folder containing the weights rather than a single file
Sources
You can read about this format here
.h5 - HDF5 binary data format
Originally used by Keras to save models (Keras is now officially part of TensorFlow). It is less general and more "data-oriented", less programmatic than .pb.
Advantages
Used to save giant data (so some neural networks would fit well)
Common file saving format
Everything saved in one file (weights, losses, optimizers used with keras etc.)
Disadvantages
Cannot be used with TensorFlow Serving directly, but you can convert it first: keras.experimental.export_saved_model(model, 'path_to_saved_model') did this in early TF 2.0 releases (it has since been removed), and in current TF 2.x calling model.save('path_to_saved_model') with no .h5 extension writes the SavedModel format
All in all
Use the simpler one (.h5) if you don't need to productionize your model (or production is reasonably far away). Use .pb if you are going for production or just want to standardize on a single format across all TensorFlow-provided tools.

Tensorflow Stored Learning

I haven't tried TensorFlow yet, but I'm still curious: how does it store the acquired learning of a machine-learning program for later use, and in what form, data type, and file type?
For example, TensorFlow was used to sort cucumbers in Japan. The computer took a long time to learn, from the example images given, what good cucumbers look like. In what form was that learning saved for future use?
I ask because it would seem inefficient if the program had to re-learn from the images every time it needs to sort cucumbers.
Ultimately, a high level way to think about a machine learning model is three components - the code for the model, the data for that model, and metadata needed to make this model run.
In TensorFlow, the code for this model is written in Python and saved in what is known as a GraphDef, using a serialization format created at Google called Protocol Buffers (protobuf). Other libraries commonly use different serialization formats, such as Python's native pickle.
The main reason you write this code is to "learn" from some training data - which is ultimately a large set of matrices, full of numbers. These are the "weights" of the model - and this too is stored using ProtoBuf, although other formats like HDF5 exist.
Tensorflow also stores Metadata associated with this model - for instance, what should the input look like (eg: an image? some text?), and the output (eg: a class of image aka - cucumber1, or 2? with scores, or without?). This too is stored in Protobuf.
During prediction time, your code loads up the graph, the weights and the meta - and takes some input data to give out an output. More information here.
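As a toy illustration of the idea (not TensorFlow's actual format), here is how learned weights can be serialized once and reloaded later with Python's pickle, so a program never has to retrain - the weight values below are made up for the example:

```python
import pickle

# Toy illustration of persisting "learned" parameters so they can be
# reloaded without retraining. TensorFlow uses protobuf-based formats
# for this; pickle is just the simplest stdlib analogue.

weights = {
    "layer1": [[0.1, -0.3], [0.7, 0.2]],  # pretend these came from training
    "layer2": [[0.5], [-0.9]],
}

with open("weights.pkl", "wb") as f:
    pickle.dump(weights, f)

# Later (e.g. in the cucumber-sorting program), load instead of retrain:
with open("weights.pkl", "rb") as f:
    restored = pickle.load(f)
```

The expensive part (training) happens once; every later run pays only the cost of deserializing the file.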
Are you talking about the symbolic math library, or the idea of tensor flow in general? Please be more specific here.
If you want a more specific answer, please give more details as to what sort of work you are interested in.
Edit: So I'm presuming your question is more related to the general field of tensor flow than any particular application. Your question still is too vague for this website, but I'll try to point you toward a few resources you might find interesting.
TensorFlow-based image recognition typically uses an ANN (Artificial Neural Network) as the object on which to act. What this means is that the TensorFlow library handles the number crunching for the neural network, which you can read all about with a quick Google search.
The point is that TensorFlow isn't a form of machine learning itself; it serves as a number-crunching library, similar to something like numpy in Python, for large-scale deep learning. You should read more here.

Can the canned tensorflow estimator train on a dataset that can't fit in memory?

I have been looking at the high-level estimator interface in Tensorflow, walked through fairly well in the wide_n_deep tutorial. It doesn't seem to allow streaming input, which I think I require for a training set that doesn't fit in memory.
Does the high-level API support this? I was reading the source, and I can't quite tell. It looks like maybe I could write an input function that had generators instead of arrays, but maybe the code precludes that.
P.S. Sort of related to this question, but I want to stick to the high-level API if I could.
You can certainly train on data that does not fit into memory with TensorFlow using the high-level APIs - just use the Dataset API. You can search for:
"The Dataset API supports a variety of file formats so that you can process large datasets that do not fit in memory" on that page. If you want to use Datasets with Estimators, search for "input_fn" on the same page.
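The key idea is that an input_fn can stream batches lazily instead of materializing the whole dataset in memory. This stdlib-only sketch (with a hypothetical one-sample-per-line file format) shows the generator pattern that tf.data primitives like TextLineDataset and Dataset.batch implement for you:

```python
def stream_batches(path, batch_size):
    """Yield lists of samples lazily; only one batch is in memory at a time."""
    batch = []
    with open(path) as f:
        for line in f:           # the file is read incrementally, never all at once
            batch.append(line.rstrip("\n"))
            if len(batch) == batch_size:
                yield batch
                batch = []
    if batch:                    # final partial batch
        yield batch
```

An Estimator's input_fn built on tf.data works the same way: each training step pulls the next batch from the pipeline, so the dataset's total size never matters.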

How to use tensorflow-wavenet

I am trying to use the tensorflow-wavenet program for text to speech.
These are the steps:
Download Tensorflow
Download librosa
Install the requirements: pip install -r requirements.txt
Download the corpus and put it into a directory named "corpus"
Train the model: python train.py --data_dir=corpus
Generate audio: python generate.py --wav_out_path=generated.wav --samples 16000 model.ckpt-1000
After doing this, how can I generate a voice read-out of a text file?
According to the tensorflow-wavenet page:
Currently there is no local conditioning on extra information which would allow context stacks or controlling what speech is generated.
You can find more information about current development of the project by reading the issues on the repository (local conditioning is a desired feature!)
The Wavenet paper compares Wavenet to two TTS baselines, one of which appears to have code for training available online: http://hts.sp.nitech.ac.jp
A recent paper by DeepMind describes one approach to going from text to speech using WaveNet, which I have not tried to implement but which at least states the method they use: they first train one network to predict a spectrogram from text, then train WaveNet to use the same sort of spectrogram as an additional conditional input to produce speech. It's a neat idea, especially since you can train the WaveNet part on some huge database of voice-only data, for which you can extract the spectrogram, and then train the text-to-spectrogram part using a different dataset where you have text.
https://google.github.io/tacotron/publications/tacotron2/index.html has the paper and some example outputs.
There seems to be a bunch of unintuitive engineering around the spectrogram prediction part (no doubt because of the nature of text-to-time learning), but there's some detail in the paper at least. The dataset is proprietary so I've no idea how hard it would be to get any results using other datasets.
For those who may come across this question, there is a new python implementation ForwardTacotron that enables text-to-speech readily.

mnist and cifar10 examples with TFRecord train/test file

I am a new user of TensorFlow. I would like to use it to train on a dataset of 2M images. I did this experiment in Caffe using the LMDB file format.
After reading TensorFlow-related posts, I realized that TFRecord is the most suitable file format for this, so I am looking for complete CNN examples that use TFRecord data. I noticed that the image-related tutorials (mnist and cifar10 in link1 and link2) use a different binary file format where the entire dataset is loaded at once. Therefore, I would like to know if these tutorials (mnist and cifar10) are available using TFRecord data (for both CPU and GPU).
I assume that you want to both write and read TFRecord files. What is done in reading_data.py should help you convert MNIST data into TFRecords.
For reading them back, this script does the trick: fully_connected_reader.py
The same could be done for cifar10.
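At 2M images you will likely also want to shard the converted data across several record files, so no single file (and no single reader) becomes a bottleneck. Here is a stdlib-only sketch of round-robin sharding; the train-NNNNN-of-NNNNN naming mimics TensorFlow's convention, but the simple length-prefixed record format is illustrative, not TFRecord:

```python
import os

def shard_paths(out_dir, num_shards):
    """Illustrative shard naming, e.g. train-00000-of-00004."""
    return [os.path.join(out_dir, f"train-{i:05d}-of-{num_shards:05d}")
            for i in range(num_shards)]

def write_sharded(samples, out_dir, num_shards):
    """Round-robin samples (bytes) across shards, one length-prefixed record each."""
    os.makedirs(out_dir, exist_ok=True)
    paths = shard_paths(out_dir, num_shards)
    files = [open(p, "wb") for p in paths]
    try:
        for i, data in enumerate(samples):
            f = files[i % num_shards]
            f.write(len(data).to_bytes(8, "little"))  # simple length prefix
            f.write(data)
    finally:
        for f in files:
            f.close()
    return paths

def read_shard(path):
    records = []
    with open(path, "rb") as f:
        while True:
            header = f.read(8)
            if not header:
                break
            records.append(f.read(int.from_bytes(header, "little")))
    return records
```

With real TFRecords you would pass the whole list of shard filenames to the input pipeline, which can then read shards in parallel and shuffle between them.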