Fastest way to load_model for inference in TensorFlow Keras

I’m trying to quickly load a model from disk to make predictions in a REST API. The tf.keras.models.load_model method takes ~1 s to load, so it’s too slow for what I’m trying to do. The compile flag is set to False.
What is the fastest way to load a model from disk for inference only in Tensorflow/Keras?
Is there any way to persist the model in memory between requests?
I tried caching, but pickle deserialisation is very expensive and adds ~1.2 s. I suspect the built-in Keras load_model does some sort of deserialisation too, which seems to be the killer.
P.S. I'm aware of TFX, but it feels like overkill as I've already set up a REST API. Predictions are fast; I just need to load the model from disk quickly or persist it in memory between requests.
Thanks in advance,

Doink! I had a bit of a brain-fart moment there, so in case you have one too, here is a solution that does the job.
Just load the model once when you start the server, so that all requests can use the same in-memory model.
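A minimal, framework-agnostic sketch of that idea follows. The names load_model_from_disk, InferenceService and handle_request are hypothetical; the toy lambda stands in for a real model, and in practice the loader would call tf.keras.models.load_model(path, compile=False) and the handler would call model.predict(batch).

```python
import time


def load_model_from_disk():
    """Hypothetical stand-in for tf.keras.models.load_model(path, compile=False)."""
    time.sleep(0.01)  # simulate the slow load from disk
    return lambda batch: [2.0 * x for x in batch]  # toy "model" for illustration


class InferenceService:
    def __init__(self, loader):
        # Pay the loading cost exactly once, when the server process starts.
        self._model = loader()

    def handle_request(self, batch):
        # Every request reuses the same in-memory model: no reload, no unpickling.
        return self._model(batch)


# Created once at startup (e.g. at module import time), not inside a request handler.
service = InferenceService(load_model_from_disk)
```

With Flask, FastAPI or similar, the same rule applies: create the service (or the loaded Keras model) at module level or in a startup hook, and have the route function only call into it.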


Using dynamically generated data with keras

I'm training a neural network using Keras, but I'm not sure how to feed the training data into the model in the way that I want.
My training data set is effectively infinite: I have some code to generate training examples as needed, so I just want to pipe a continuous stream of novel data into the network. Keras seems to want me to specify my entire dataset in advance by creating a NumPy array with everything in it, but this obviously won't work with my approach.
I've experimented with creating a generator class based on keras.utils.Sequence, which seems like a better fit, but it still requires me to specify a length via the __len__ method, which makes me think it will only create that many examples before recycling them. Can someone suggest a better approach?
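One commonly used approach, sketched here under assumptions: an endless Python generator that builds a fresh batch on every call. make_example is a hypothetical stand-in for your own generation code (here a toy 4-feature input with a "sum is positive" label). tf.keras Model.fit accepts such a generator together with steps_per_epoch, which only defines how many batches count as one epoch; it does not cap how many distinct examples are produced. The same holds for keras.utils.Sequence: __len__ is the number of batches per epoch, and __getitem__ is free to synthesize new data on each call.

```python
import numpy as np


def make_example(rng):
    """Hypothetical stand-in for your own example generator."""
    x = rng.normal(size=4).astype("float32")
    y = np.float32(x.sum() > 0)  # toy label: is the sum of the features positive?
    return x, y


def endless_batches(batch_size=32, seed=0):
    """Yield an unlimited stream of freshly generated (x, y) batches."""
    rng = np.random.default_rng(seed)
    while True:
        pairs = [make_example(rng) for _ in range(batch_size)]
        xs = np.stack([p[0] for p in pairs])
        ys = np.array([p[1] for p in pairs], dtype="float32")
        yield xs, ys
```

Usage would then look like model.fit(endless_batches(), steps_per_epoch=100, epochs=10), where the step and epoch counts are arbitrary choices that only control how often Keras reports metrics and runs callbacks.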

What is the best way to load a file for fast computation?

I'm deploying a deep learning model and saved the Keras model as an .h5 file. I think a complex model will be large and hence slow to work with on the server. Is there a way to address this other than reducing the number of layers in the model? Is there some way of compressing the .h5 file so that the server can load it faster?
Thank you
There is a way to do that.
What you are looking for is called quantization.
Rather than reducing the number of layers (or pruning, which removes individual weights), quantization reduces both the size and the latency of the model by lowering the numerical precision of the weights (and, in some cases, the activations).
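To illustrate what quantizing weights means, here is a small NumPy sketch of affine 8-bit quantization of a float32 tensor. It is a conceptual illustration only, not how TensorFlow implements it internally; the function names are hypothetical. Each float32 value (4 bytes) is stored as a uint8 (1 byte) plus one shared scale and zero point, so the tensor shrinks about 4x at the cost of a rounding error of at most half a quantization step.

```python
import numpy as np


def quantize_uint8(w):
    """Affine 8-bit quantization: w is approximately (q - zero_point) * scale."""
    # Include 0 in the range so that 0.0 is exactly representable.
    w_min = min(float(w.min()), 0.0)
    w_max = max(float(w.max()), 0.0)
    scale = (w_max - w_min) / 255.0 or 1.0  # avoid a zero scale for all-zero tensors
    zero_point = int(round(-w_min / scale))
    q = np.clip(np.round(w / scale) + zero_point, 0, 255).astype(np.uint8)
    return q, scale, zero_point


def dequantize(q, scale, zero_point):
    """Recover approximate float32 weights from the 8-bit codes."""
    return ((q.astype(np.float32) - zero_point) * scale).astype(np.float32)
```

For an actual Keras model, TensorFlow's own route is post-training quantization through TensorFlow Lite: tf.lite.TFLiteConverter.from_keras_model(model) with converter.optimizations = [tf.lite.Optimize.DEFAULT], followed by converter.convert(). Note that this produces a .tflite model served with tf.lite.Interpreter rather than a smaller .h5 file.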
For more detailed information, read this page on the official TensorFlow documentation:

Tensorflow object detection api: how to use imgaug for augmentation?

I've been hand-rolling augmenters using imgaug, as I really like some of the options that are not available in the TensorFlow Object Detection API. For instance, I use motion blur because so much of my data has fast-moving, blurry objects.
How can I best integrate my augmentation sequence with the API for on-the-fly training?
E.g., say I have an augmenter:
import imgaug.augmenters as iaa
aug = iaa.SomeOf((0, 2),
                 [iaa.Fliplr(0.5), iaa.Flipud(0.5), iaa.Affine(rotate=(-10, 10))])
Is there some way to configure the Object Detection API to work with this?
What I am currently doing is using imgaug to generate (augmented) training data, and then creating tfrecord files from each iteration of this augmentation pipeline. This is very inefficient as I am saving large amounts of data to disk rather than running augmentation on the fly, during training.
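The key point of on-the-fly augmentation for detection is that each image and its boxes are transformed together, lazily, as samples are consumed, instead of being written back to disk. The generic NumPy sketch below shows this with a random horizontal flip. It does not use imgaug or the Object Detection API's own hooks; random_hflip and augmented_stream are hypothetical names, and boxes are assumed to be [x_min, y_min, x_max, y_max] in pixel units on a continuous [0, width] axis (the Object Detection API itself stores normalized [ymin, xmin, ymax, xmax], so the indexing would need adapting).

```python
import numpy as np


def random_hflip(image, boxes, rng, p=0.5):
    """Flip an HxWxC image and its boxes horizontally with probability p."""
    if rng.random() >= p:
        return image, boxes
    width = image.shape[1]
    flipped = image[:, ::-1, :]
    new_boxes = boxes.copy()
    # Mirror x coordinates: the old right edge becomes the new left edge.
    new_boxes[:, 0] = width - boxes[:, 2]
    new_boxes[:, 2] = width - boxes[:, 0]
    return flipped, new_boxes


def augmented_stream(samples, seed=0):
    """Endlessly cycle over (image, boxes) pairs, augmenting each one as it is consumed."""
    rng = np.random.default_rng(seed)
    while True:
        for image, boxes in samples:
            yield random_hflip(image, boxes, rng)
```

imgaug can transform boxes alongside images itself (e.g. by passing imgaug.augmentables.bbs.BoundingBoxesOnImage through aug(image=..., bounding_boxes=...)), and a Python function like this can, in principle, be wrapped with tf.numpy_function inside a tf.data pipeline; how that plugs into the Object Detection API's input builder is not covered here.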
Someone has made a repo for this:
Sorry, this is not a code answer and I have not actually looked into it, so I will not mark this as officially answered. If I ever get a chance to test it, I will let people know.

Deep Learning with TensorFlow on Compute Engine VM

I'm actually new to machine learning, but this topic is very interesting to me, so I'm using TensorFlow to classify some images from the MNIST dataset. I run this code on a Compute Engine VM at Google Cloud, because my computer is too weak for this. The code actually runs well, but the problem is that each time I log in to my VM and run the same code, I have to wait while my CNN model trains before I can run some tests, experiment with my data (for example, plot it), or import some external images to improve my accuracy, etc.
Is there some way to save the result of training my model just once, somewhere, so that when I log in to the same VM tomorrow, for example, I don't have to wait for my model to train again? Is that possible?
Or is there maybe another way to do something similar?
You can save a trained model in TensorFlow and then use it later by loading it; that way you only have to train your model once and can use it as many times as you want. To do that, you can follow the TensorFlow documentation on that topic, where you can find information on how to save and load the model. In short, you will have to use the SavedModelBuilder class to define the type and location of your saved model, and then add the MetaGraphs and variables you want to save. Loading the saved model for later use is even easier, as you only have to run a command pointing to the location to which the model was exported.
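The "train once, reload afterwards" pattern can be sketched without TensorFlow as below. train_or_load is a hypothetical helper, JSON stands in for TensorFlow's own SavedModel format, and the training function is whatever slow step you want to skip on later sessions. As long as the file lives on the VM's persistent disk, it survives between logins.

```python
import json
import os


def train_or_load(path, train_fn):
    """Return (weights, trained_now): load saved weights if present, else train and save them."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f), False  # reuse the earlier result, no retraining
    weights = train_fn()  # the slow step, run only the first time
    with open(path, "w") as f:
        json.dump(weights, f)
    return weights, True
```

With a TF2 Keras model the same structure would use model.save(path) after training and tf.keras.models.load_model(path) on subsequent runs, checking os.path.exists(path) first.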
On the other hand, I would strongly recommend changing your working environment to one that is more cost-effective for you. Google Cloud has the Cloud ML Engine service, which might suit the type of work you are doing. It allows you to train your models and perform predictions without needing an instance running all the required software. I happen to have worked a little with TensorFlow recently, and at first I was also working on a virtual machine instance, but after following some tutorials I was able to save some money by migrating my work to ML Engine, as you are only charged for what you use. If you are using your VM only for that purpose, take a look at it.
You can of course consult all the available documentation, but as a quickstart, if you are interested in ML Engine, I recommend having a look at how to train your models and how to get your predictions.

VLFeat SVM storage

I am working with VLFeat and I have code that trains a linear SVM on my dataset. I would like to save the SVM to a file somehow so that I can load it up later and test it on several other datasets. What is the best way to do this?
edit: I am using the C++ API. Is it enough to save the model by using vl_svm_get_model? Would a simple serialization of the bytes of the svm struct work?
I am not using this approach myself; however, it should be sufficient to save the model (weight vector) and bias terms to a file for later testing.
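A sketch of such a file format follows, written in Python for brevity; the same layout can be produced from C or C++ with fwrite. It assumes a standard linear SVM whose decision value is w·x + b, with the weights taken from vl_svm_get_model and the bias saved alongside them. The helper names are hypothetical. Writing out the values themselves, rather than the raw bytes of the SVM struct, matters because a struct that holds pointers would not be meaningful after reloading in another process.

```python
import struct


def save_linear_svm(path, weights, bias):
    """Write the dimension (uint64), the weights and the bias as little-endian float64."""
    with open(path, "wb") as f:
        f.write(struct.pack("<Q", len(weights)))
        f.write(struct.pack(f"<{len(weights)}d", *weights))
        f.write(struct.pack("<d", bias))


def load_linear_svm(path):
    """Read back the (weights, bias) pair written by save_linear_svm."""
    with open(path, "rb") as f:
        (dim,) = struct.unpack("<Q", f.read(8))
        weights = list(struct.unpack(f"<{dim}d", f.read(8 * dim)))
        (bias,) = struct.unpack("<d", f.read(8))
    return weights, bias


def decision_value(weights, bias, x):
    """Linear SVM score w.x + b; its sign gives the predicted class."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias
```

Storing the dimension first lets the loader validate the file before it is used on a new dataset.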