TensorFlow: partially decode binary data

I am wondering if there is a native TensorFlow function that can decode a binary file (for example a TFRecord), starting from a given byte offset and reading the following N bytes, without decoding the entire file.
This has been implemented for JPEG images in tf.image.decode_and_crop_jpeg, but I cannot find a way to do the same thing with an arbitrary binary file.
This would be very useful when the cropping window is much smaller than the whole data.
Currently, I am using a custom tf.py_func as the mapping function of a Dataset object. It works, but with all the limitations of a custom py_func.
Is there a native tensorflow way to do the same thing?
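One built-in option worth checking, assuming your records are fixed-length, is tf.data.FixedLengthRecordDataset: its header_bytes and record_bytes arguments let TensorFlow start reading at an offset and pull out only the requested byte ranges. (TFRecord files use variable-length framing, so this sketch does not apply to them directly; the file name below is just for illustration.)

```python
import os
import tempfile

import numpy as np
import tensorflow as tf

# Write a small binary file of int32 values to read back partially.
path = os.path.join(tempfile.mkdtemp(), "data.bin")
values = np.arange(10, dtype=np.int32)
with open(path, "wb") as f:
    f.write(values.tobytes())

# Read 4-byte records, skipping the first 8 bytes (two int32 values),
# without materializing the whole file as one tensor.
ds = tf.data.FixedLengthRecordDataset([path], record_bytes=4, header_bytes=8)
decoded = [int(tf.io.decode_raw(rec, tf.int32).numpy()[0]) for rec in ds.take(3)]
print(decoded)  # [2, 3, 4]
```

Here header_bytes plays the role of the offset and record_bytes the window size; footer_bytes can additionally trim the tail of the file.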

Related

Buffer deduplication procedure will be skipped when flatbuffer library is not properly loaded. (Tensorflow Lite)

Every time I convert a model to the tflite format, I receive this WARNING. I wonder if this library would further reduce the model size; if so, I would like to use it. But I can't find relevant information on Google, and the FlatBuffers documentation doesn't seem to mention how to simply install it so that TensorFlow can invoke it.

How to draw samples from a categorical distribution in TensorFlow.js

Issue in short
In the Python version of TensorFlow there is a tf.random.categorical() method that draws samples from a categorical distribution, but I can't find a similar method in the TensorFlow.js API. So, what is the proper way to draw samples from a categorical distribution in TensorFlow.js?
Issue in details
In the Text generation with an RNN tutorial, the tf.random.categorical() method is used in the generate_text() function to decide which character should be passed next to the RNN input when generating a sequence.
predicted_id = tf.random.categorical(predictions, num_samples=1)[-1,0].numpy()
I'm experimenting with TensorFlow.js and trying to generate "random" Shakespeare-like writing, but in the browser. All parts of the tutorial seem to work well together except the step that uses the tf.random.categorical() method.
I guess writing an alternative to tf.random.categorical() manually should not be that hard, and there are also a couple of third-party JavaScript libraries that implement this functionality already, but it seems logical to have it as part of the TensorFlow.js API.
I think you can use tf.multinomial instead.
I peeked at the source code: with the name and seed parameters set to None, tf.random.categorical is essentially the same as tf.multinomial, with some random seeding going on, I guess.
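Until an equivalent lands in the TF.js API, the sampling itself is also easy to replicate by hand. Here is a minimal NumPy sketch of what tf.random.categorical does (inverse-CDF sampling over softmaxed logits), which ports to tf.cumsum and friends in TensorFlow.js; the function name sample_categorical is made up for illustration:

```python
import numpy as np

def sample_categorical(logits, num_samples, rng=None):
    """Draw integer samples from rows of unnormalized log-probabilities,
    mirroring the semantics of tf.random.categorical."""
    if rng is None:
        rng = np.random.default_rng()
    logits = np.asarray(logits, dtype=np.float64)
    # Softmax with the usual max-subtraction for numerical stability.
    z = logits - logits.max(axis=-1, keepdims=True)
    probs = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)
    # Inverse-CDF sampling: see where a uniform draw lands in the CDF.
    cdf = probs.cumsum(axis=-1)
    u = rng.random((logits.shape[0], num_samples))
    return np.array([np.searchsorted(cdf[i], u[i]) for i in range(len(cdf))])

# A distribution that puts almost all mass on index 2:
print(sample_categorical([[-1e9, -1e9, 0.0]], num_samples=4))  # [[2 2 2 2]]
```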

Tensorflow Object Detection API model for use in TensorFlow.js

I am trying to use an object detection model, that was created using the TF Object Detection API, in TensorFlow.js.
I converted the model using:
tensorflowjs_converter --input_format=tf_saved_model inference_graph/saved_model inference_graph/web_model
It gets converted without any problems and loads in my JavaScript code.
Now I am a bit unsure about what my next steps should be.
I have to translate the Python into JavaScript, but there are certain areas I am unsure about.
With the Object Detection API in Python there are many steps: (1) preprocessing the image, such as converting to RGB, reshaping the numpy array and expanding dimensions (I have an idea of how I would approach this), and (2) the run-inference-for-a-single-image function; I am not sure how I would go about these in TensorFlow.js.
I tried to find some general information about deploying an object detection model in tensorflow.js, but I could not find much, except with pre-trained models.
Any information about this topic would be great!
Thanks!
As mentioned by #edkeveked, you will need to perform similar input and output processing in JavaScript as is being done in Python. I can't say exactly what you will need to do since I am not familiar with the model. However, you can find an example using a specific object detection model here:
https://github.com/vabarbosa/tfjs-model-playground/blob/master/object-detector/demo/object-detector.js
See also:
https://medium.com/codait/bring-machine-learning-to-the-browser-with-tensorflow-js-part-iii-62d2b09b10a3
You would need to replicate the same preprocessing in JavaScript before giving the input to the model. In JS, images use the RGB channels by default, so there is no need to make that conversion again.

Build a dataset for TensorFlow

I have a large number of JPGs representing vehicles. I want to create a dataset for TensorFlow with a categorization such that every vehicle image describes the side, the angle or the roof, i.e. I want to create nine subsets of images (front, back, driver side, driver front angle, driver back angle, passenger side, passenger front angle, passenger back angle, roof). At the moment the filename of each JPG describes the desired category.
How can I turn this set into a dataset that TensorFlow can easily manipulate? Also, should I run a procedure which crops the JPGs to extract only the vehicle portion? How could I do that using TensorFlow?
I apologize in advance for not providing details and examples in this question, but I don't really know how to find an entry point for this problem. The tutorials I'm following all assume an already created dataset, ready to use.
Okay, I'm going to try to answer this as well as I can, but producing and pre-processing data for use in ML algorithms is laborious and often expensive (hence the repeated use of well known data sets for testing algorithm designs).
To address a few straight-forward questions first:
should I run a procedure which crops the JPGs to extract only the vehicle portion?
No, this isn't necessary. The neural network will itself sort the relevant information in the images from the irrelevant, and having a diverse set of images will help to build a robust classifier. Also, you would likely make life a lot more difficult for yourself later on by cropping the images (see point 1 below for more).
How could I do that using TensorFlow?
You wouldn't. TensorFlow is designed to build and test ML models and does not have tools for pre-processing data (well, perhaps TensorFlow Extended does, but this shouldn't be necessary).
Now a rough guideline for how you would go about creating a data set from the files described:
1) The first thing you will need to do is to load your .jpg images into Python and resize them all to be identical. A neural network needs the same number of inputs (pixels, in this case) in every training example, so differently sized images will not work.
There is a good answer detailing how to load images using the Python Imaging Library (PIL) on Stack Overflow here.
The PIL image instances (elements of the list loadedImages in the example above) can then be converted to numpy arrays using data = np.asarray(image), which TensorFlow can work with.
In addition to building a set of numpy arrays of your data, you will also need a second numpy array of labels for this data. A typical way to encode this is as a numpy array of the same length as your number of images, with an integer value for each entry representing the class to which that image belongs (0-8 for your 9 classes). You could enter these by hand, but this would be labour-intensive; I would suggest using the built-in find method of Python strings to locate keywords within the filenames and determine each class automatically. This could be done within the
for image in imagesList:
loop in the above link, as image should be a string containing the image filename.
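The keyword lookup described above could be sketched like this (the class names and filenames are assumptions; adjust them to your own naming scheme). Note that more specific names must be checked before their substrings, so that e.g. "driver front angle" is not mislabelled as "front":

```python
import numpy as np

# The nine classes, ordered so that more specific names match first.
CLASSES = [
    "driver front angle", "driver back angle", "driver side",
    "passenger front angle", "passenger back angle", "passenger side",
    "roof", "front", "back",
]

def label_from_filename(filename):
    """Return the integer class id found in a filename via str.find."""
    name = filename.lower()
    for class_id, keyword in enumerate(CLASSES):
        if name.find(keyword) != -1:
            return class_id
    raise ValueError(f"no known class in {filename!r}")

filenames = ["car01 driver front angle.jpg", "car02 roof.jpg", "car03 back.jpg"]
labels = np.array([label_from_filename(f) for f in filenames])
print(labels)  # [0 6 8]
```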
As I mentioned above, resizing the images is necessary to make sure they are all identical. You could do this with numpy, using indexing to choose a subsection of each image array, or with PIL's resize function before converting to numpy. There is no single right answer here; many methods have been used to resize images for this purpose, from padding to stretching to cropping.
The end result should be 2 numpy arrays. One of image data, which has shape [n, h, w, 3], where n = the number of images, h = image height, w = image width and 3 = the three RGB channels (provided the images are in colour); note that TensorFlow conventionally expects the batch dimension first. The second of labels associated with these images, of shape [n,], where every element of the length-n array is an integer from 0-8 specifying its class.
At this point it would be a good idea to save the dataset in this format using numpy.save() so that you don't have to go through this process again.
2) Once you have your images in this format, TensorFlow has a class called tf.data.Dataset into which you can load the image and label data described above, and which will allow you to shuffle and sample data from it.
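A minimal sketch of that last step, assuming the two arrays from point 1 (random stand-in data here, so the shapes are illustrative):

```python
import numpy as np
import tensorflow as tf

# Stand-in data: 12 tiny 8x8 RGB "images" and their integer labels (0-8).
images = np.random.rand(12, 8, 8, 3).astype(np.float32)
labels = np.random.randint(0, 9, size=12)

# Build a pipeline that shuffles the examples and serves them in batches.
ds = (tf.data.Dataset.from_tensor_slices((images, labels))
        .shuffle(buffer_size=12)
        .batch(4))

for batch_images, batch_labels in ds:
    print(batch_images.shape, batch_labels.shape)  # (4, 8, 8, 3) (4,)
```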
I hope that was helpful, and I am sorry that there is no quick-fix solution to this (at least not one I am aware of). Good luck.

How to load tensorflow graph from memory address

I'm using the TensorFlow C++ API to load a graph from a file and execute it. Everything is working great, but I'd like to load the graph from memory rather than from a file (so that I can embed the graph into the binary for better portability). I have variables that reference both the binary data (as an unsigned char array) and the size of the data.
This is how I am currently loading my graph:
GraphDef graph_def;
ReadBinaryProto(tensorflow::Env::Default(), "./graph.pb", &graph_def);
It feels like this should be simple, but most of the discussion is about the Python API. I did try looking for the source of ReadBinaryProto but wasn't able to find it in the tensorflow repo.
The following should work:
GraphDef graph_def;
if (!graph_def.ParseFromArray(data, len)) {
  // Handle the parse error
}
...
This is because GraphDef is a subclass of google::protobuf::Message, and thus inherits a variety of parsing methods.
Edit: Caveat: as of January 2017, the snippet above works only when the serialized graph is < 64MB, because of a default protocol buffer setting. For larger graphs, take inspiration from ReadBinaryProto's implementation.
FWIW, the code for ReadBinaryProto is in tensorflow/core/platform/env.cc