I am processing some data from the UNSW-NB15 dataset using TensorFlow and Python. I want to save the processed data so that it can be reloaded at any time, rather than repeating the preprocessing on every run. What is the most effective way to do this?
I have not tried anything yet, but I am wondering if there is a way to do it with a single load statement that does not require me to sort out the datatypes of the columns. I am using TensorFlow 2.7.
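In case it helps later: if the processed data lives in a pandas DataFrame, one minimal sketch is to round-trip it through Parquet, which stores the dtypes in the file. This assumes pyarrow (or fastparquet) is installed; the file name and columns below are made up:

```python
import pandas as pd

# Hypothetical stand-in for the processed UNSW-NB15 features.
df = pd.DataFrame({
    "dur": [0.12, 0.34],
    "proto": ["tcp", "udp"],
    "label": [0, 1],
})

# Parquet stores the schema next to the data, so every column's dtype
# survives the round trip without manual re-declaration.
df.to_parquet("unsw_nb15_processed.parquet")

# Later: a single load statement restores columns and dtypes as saved.
df = pd.read_parquet("unsw_nb15_processed.parquet")
```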
I'm training a neural network using Keras, but I'm not sure how to feed the training data into the model in the way that I want.
My training data set is effectively infinite: I have some code to generate training examples as needed, so I just want to pipe a continuous stream of novel data into the network. Keras seems to want me to specify my entire dataset in advance by creating a NumPy array with everything in it, but this obviously won't work with my approach.
I've experimented with creating a generator class based on keras.utils.Sequence, which seems like a better fit, but it still requires me to specify a length via the __len__ method, which makes me think it will only create that many examples before recycling them. Can someone suggest a better approach?
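One approach that sidesteps the __len__ bookkeeping entirely is tf.data.Dataset.from_generator, which wraps an endless Python generator; model.fit then only needs steps_per_epoch to know when to call an epoch done. The toy generator and model below are made up for illustration:

```python
import numpy as np
import tensorflow as tf

def generate_examples():
    # Hypothetical stand-in for the real example generator: yields
    # (features, label) pairs forever, never recycling anything.
    while True:
        x = np.random.normal(size=(16,)).astype("float32")
        y = np.float32(x.sum() > 0)
        yield x, y

ds = tf.data.Dataset.from_generator(
    generate_examples,
    output_signature=(
        tf.TensorSpec(shape=(16,), dtype=tf.float32),
        tf.TensorSpec(shape=(), dtype=tf.float32),
    ),
).batch(32)

model = tf.keras.Sequential([
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")

# steps_per_epoch only decides how often Keras reports metrics and runs
# callbacks; the underlying stream is still infinite and never repeats.
model.fit(ds, steps_per_epoch=100, epochs=10)
```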
I'm trying to quickly load a model from disk to make predictions in a REST API. The tf.keras.models.load_model method takes ~1 s to load, which is too slow for what I'm trying to do. The compile flag is already set to False.
What is the fastest way to load a model from disk for inference only in TensorFlow/Keras?
Is there any way to persist the model in memory between requests?
I tried caching, but pickle deserialisation is very expensive and adds ~1.2 s. I suspect the built-in Keras load_model does some sort of deserialisation too, which seems to be the killer.
PS: I'm aware of TFX, but it feels like overkill since I've already set up a REST API. Predictions are fast; I just need to quickly load the model from disk or keep it in memory between requests.
Thanks in advance,
Joan
Doink! I had a bit of a brain-fart moment just there, so in case you have it too, here is a solution that does the job.
Just load the model once when you start the server, so that all requests can use the same model.
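For example, with Flask (the framework and the "model_dir" path here are just placeholder assumptions), a minimal sketch looks like this:

```python
import numpy as np
import tensorflow as tf
from flask import Flask, jsonify, request

app = Flask(__name__)

# Loaded once at startup; every request handler reuses the same object.
model = tf.keras.models.load_model("model_dir", compile=False)

@app.route("/predict", methods=["POST"])
def predict():
    # Expects a JSON body like {"features": [0.1, 0.2, ...]}.
    features = np.asarray([request.get_json()["features"]], dtype="float32")
    preds = model.predict(features)
    return jsonify(predictions=preds.tolist())

if __name__ == "__main__":
    app.run()
```

If you serve this under something like Gunicorn, the --preload flag loads the app (and therefore the model) in the parent process before the workers fork, so each worker starts with the model already in memory.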
I have a TF dataset that has been prepared by doing some complex preprocessing as part of a separate process. Now, I would like to save it to disk and read it later.
What is the best way to do that?
Maybe you can use tf.data.experimental.save.
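A rough sketch (the path and the toy dataset are placeholders; in recent releases the same pair has been promoted to tf.data.Dataset.save / tf.data.Dataset.load):

```python
import tensorflow as tf

# Stand-in for the dataset produced by the preprocessing step.
ds = tf.data.Dataset.range(10).map(lambda x: x * 2)

# Persist the elements to disk.
tf.data.experimental.save(ds, "/tmp/my_dataset")

# Read it back later. Older versions require element_spec explicitly;
# newer ones can infer it from the saved snapshot.
restored = tf.data.experimental.load("/tmp/my_dataset",
                                     element_spec=ds.element_spec)
```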
I'm hoping someone can guide me in the right direction.
I am trying to feed input variables (features) and labels to tf.estimator.DNNClassifier, and it keeps recommending that I use TensorFlow datasets instead of reading the data from a pandas DataFrame (using tf.estimator.inputs.pandas_input_fn()).
The issue is that I need to read my CSV file into a DataFrame first to make a lot of transformations before feeding the data into the DNN. As I understand from this blog post, a TensorFlow dataset wants to read the data from a CSV file, for reasons that make sense.
So will I then have to write the transformed data to another CSV just so I can re-import it into a TensorFlow dataset? That doesn't make any sense. Is there a good guide that I can read? I'm frustrated.
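For what it's worth, you don't have to round-trip through a second CSV: a tf.data dataset can be built directly from the in-memory DataFrame with tf.data.Dataset.from_tensor_slices. A rough sketch, with made-up column names:

```python
import pandas as pd
import tensorflow as tf

# Hypothetical DataFrame after all the pandas transformations.
df = pd.DataFrame({"age": [25.0, 32.0, 47.0],
                   "income": [40.0, 85.0, 62.0],
                   "label": [0, 1, 1]})

def input_fn():
    # Slice the in-memory frame straight into a dataset of
    # ({feature_name: value}, label) pairs -- no intermediate CSV.
    features = {name: df[name].values for name in ("age", "income")}
    ds = tf.data.Dataset.from_tensor_slices((features, df["label"].values))
    return ds.shuffle(len(df)).repeat().batch(2)

feature_columns = [tf.feature_column.numeric_column(name)
                   for name in ("age", "income")]
classifier = tf.estimator.DNNClassifier(hidden_units=[8],
                                        feature_columns=feature_columns)
classifier.train(input_fn=input_fn, steps=10)
```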
The benefit would be that I can store and load individual models using tf.train.export_meta_graph(), but I'm not sure if this usage is what TensorFlow was designed for. Does using multiple graphs in parallel have any negative impact on parallelism, performance, functionality, etc., as long as I don't want to share data between them?
It's not a good idea, because passing data between models would require fetching the result from one session and then feeding the Python object back into the other session. Locally, that means unnecessary copy operations, and it's worse in the distributed setting.
There are now export_scoped_meta_graph() and import_scoped_meta_graph() in tf.contrib.framework.meta_graph for saving and loading parts of a graph, so using a single global graph is the recommended approach.
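To make the copy cost concrete, here is a small sketch (written against the TF1-style tf.compat.v1 API) of what moving a tensor between two per-model graphs entails:

```python
import tensorflow.compat.v1 as tf
tf.disable_eager_execution()

# Two independent graphs, each with its own session.
g1, g2 = tf.Graph(), tf.Graph()
with g1.as_default():
    out1 = tf.constant([1.0, 2.0]) * 2.0
with g2.as_default():
    in2 = tf.placeholder(tf.float32, shape=[2])
    out2 = in2 + 1.0

sess1 = tf.Session(graph=g1)
sess2 = tf.Session(graph=g2)

# Fetching pulls the result out of graph 1 into a NumPy object...
intermediate = sess1.run(out1)
# ...and feeding copies it back into graph 2: two transfers that a
# single shared graph would avoid entirely.
result = sess2.run(out2, feed_dict={in2: intermediate})
print(result)  # [3. 5.]
```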