What is the current state of the art for a pipeline composed of Python code and TensorFlow/Keras models?
We are aiming for scalability and a reactive design using Dask and Streamz (on servers registered with Kubernetes).
But we currently do not know the right way to design such an infrastructure, given that we want our TensorFlow models to persist rather than be repeatedly created and destroyed.
Is TensorFlow Serving the right technology for this task?
(So far I have only been able to find basic examples, such as persistent dataflows with Dask and http://matthewrocklin.com/blog/work/2017/02/11/dask-tensorflow)
Related
We are looking for ML model serving with a developer experience where the ML engineers don't need to know DevOps.
Ideally we are looking for the following ergonomics or something similar:
Initialize a new model-serving endpoint, preferably via a CLI, and get back a GCS bucket.
Each time we train a new model, we put it in the GCS bucket from step 1.
The serving system guarantees that the most recent model in the bucket is served, unless a model is specified by version number.
We are also looking for a service that optimizes cost and latency.
Any suggestions?
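For what it's worth, TF Serving's default version policy already behaves like the requirement above: each model lives in a base directory containing numerically named version subdirectories, the server polls the base path, and the highest version wins unless a specific version is pinned in the model config. A minimal pure-Python sketch of that selection rule (the function name and `pinned` parameter are illustrative, not part of any library):

```python
def pick_version(version_dirs, pinned=None):
    """Mimic TF Serving's default policy: serve the highest numeric
    version found under the model base path, unless a version is pinned."""
    versions = sorted(int(d) for d in version_dirs if d.isdigit())
    if not versions:
        raise ValueError("no servable versions found")
    if pinned is not None:
        if pinned not in versions:
            raise ValueError(f"pinned version {pinned} not available")
        return pinned
    return versions[-1]
```

So dropping a new `gs://bucket/model/4/` export next to `3/` is enough for the newest model to take over, with no redeploy of the serving layer.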
Have you considered https://www.tensorflow.org/tfx/serving/architecture? You can definitely automate the entire workflow using TFX, and I think the guide here does a good job of walking through it. Depending on your use case, you may want to use TFT instead of Kubeflow as they do in that guide.
Besides serving automation, you may also want to consider pipeline automation, to separate the feature engineering from the pipeline mechanics itself. For example, you can build the pipeline, abstract the feature engineering out into a TensorFlow function meeting certain requirements, and automate the deployment process as well. This way you don't need to deal with the feature specs/schemas manually, and you know that your transformations are the same at serving time as they were during training.
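The "abstract out the feature engineering" idea above is the analyze-then-transform pattern that tf.Transform implements. A framework-agnostic, pure-Python sketch of the pattern (function and field names are hypothetical; in TFX the analyze phase is a full pass over the training data and the transform is baked into the serving graph):

```python
def analyze(rows):
    """Full-pass 'analyze' phase: compute dataset-wide statistics once,
    over the training data only."""
    xs = [r["x"] for r in rows]
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    return {"mean": mean, "std": var ** 0.5 or 1.0}

def make_transform(stats):
    """Return the single transform function used both in the training
    pipeline and inside the serving endpoint, so no skew can arise."""
    def transform(row):
        return {"x_scaled": (row["x"] - stats["mean"]) / stats["std"]}
    return transform
```

Because the exact same `transform` closure (with frozen statistics) is applied at training and serving time, the two code paths cannot drift apart.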
You can do the same thing with scikit-learn as well, and I believe serving scikit-learn models is also supported under the Vertex AI umbrella.
To your point about latency: you definitely want the pipeline doing the transformations on the GPU. As such, I would recommend TensorFlow over something like scikit-learn if the use case is truly time-sensitive.
Best of luck!
Folks, I am writing an application that will produce recommendations based on ML model calls. The application will use different models, some of which must be called in sequence. A data scientist should be able to upload a model to the system. This means the application needs logic to store model metadata as well as the address of a model server. A model server will be instantiated dynamically on a model-upload event.
I would like to use a cluster of TensorFlow Serving instances here, but I am stuck on a question of architecture.
Is there something like a service registry for TensorFlow Serving servers? What is the best way to build such a cluster of servers hosting different models?
I need some clarification on what you're trying to do. Is the feature vector the same for all the models? If not, this will be quite a bit harder. Trained models are encapsulated in the SavedModel format. It sounds like you're trying to train an ensemble, but some of the models are frozen? You could certainly write a custom component to make an inference request as part of the input to Trainer, if that's what you need.
UPDATE 1
From your comment below, it sounds like what you might be looking for is a service mesh, such as Istio. That would help manage the connections between services running inside containers, and between users and services. In this case the tf.Serving instances running your models are the services, but the basic request-response pattern is the same. Does that help?
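Even without a mesh, the "service registry" part of the question can start as a small metadata store that maps a model name and version to the address of the tf.Serving instance hosting it, populated on the model-upload event. A hypothetical in-memory sketch (a real system would back this with a database or the orchestrator's own service discovery, e.g. Kubernetes DNS):

```python
class ModelRegistry:
    """Hypothetical in-memory registry mapping a model name to the
    addresses of the TF Serving instances that host its versions."""

    def __init__(self):
        self._models = {}  # name -> {version: address}

    def register(self, name, version, address):
        """Called on the model-upload event, after the server spins up."""
        self._models.setdefault(name, {})[version] = address

    def resolve(self, name, version=None):
        """Return the endpoint for a model; latest version by default."""
        versions = self._models[name]
        if version is None:
            version = max(versions)
        return versions[version]
```

The application then resolves an address from the registry before each inference call (or chain of calls), which keeps the sequencing logic independent of where each model is deployed.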
I need to put a TensorFlow model into production behind a simple API endpoint. The model should be shared among processes/workers/threads in order not to waste too much memory.
I already tried running multiple gunicorn workers with the --preload option, loading the model before defining the resources, but after a while I get a timeout error: the model stops responding. I know TensorFlow Serving exists for this purpose, but the problem is that the model is not yet available at deployment time, and the ML pipeline is composed of many different components (the TensorFlow model is just one of them). The end user (customer) is the one who trains/saves the model. (I am using Docker.) Thanks in advance.
The issue is that the TensorFlow runtime and its global state are not fork-safe.
See this thread: https://github.com/tensorflow/tensorflow/issues/5448 or this one: https://github.com/tensorflow/tensorflow/issues/51832
You could attempt to load the model only in the workers using the post_fork and post_worker_init server hooks.
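A sketch of what that might look like in a `gunicorn.conf.py` (the model path is hypothetical; `post_worker_init(worker)` is a real gunicorn server hook). Because each worker loads the model after the fork, the master process never initializes the non-fork-safe TF runtime:

```python
# gunicorn.conf.py -- load the model per worker instead of pre-forking it.
# Do NOT combine this with --preload for the model itself.

model = None  # populated independently in each worker

def post_worker_init(worker):
    # Import TensorFlow here so the master process never touches the
    # (non-fork-safe) TF runtime or its global state.
    global model
    import tensorflow as tf
    model = tf.keras.models.load_model("/models/current")  # hypothetical path
    worker.log.info("model loaded in worker %s", worker.pid)
```

Note the trade-off: this gives up the copy-on-write memory saving that --preload was meant to provide, since each worker now holds its own copy of the model, but it avoids the fork-safety hangs.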
I was thinking about how one should deploy multiple models for use. I am currently working with TensorFlow, and was referring to this and this article.
But I am not able to find any article that addresses serving several models in a distributed manner. Q.1. Does TensorFlow Serving only serve models from a single machine? Is there a way to set up a cluster of machines running TensorFlow Serving, so that multiple machines serve the same model (working as master and replicas, or load-balancing between them) while also serving different models?
Q.2. Does similar functionality exist for other deep learning frameworks, say Keras, MXNet, etc. (i.e., not restricting ourselves to TensorFlow, and serving models from different frameworks)?
A1: Serving TensorFlow models in a distributed fashion is made easy with Kubernetes, a container-orchestration system that takes much of the pain of running a distributed system away from you, including load balancing. Please check Serving with Kubernetes.
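Concretely, a Kubernetes Service spreads requests across identical TF Serving replicas for you; the underlying idea is no more than rotating through replica endpoints. A toy sketch of that behavior (endpoint names are illustrative):

```python
import itertools

class RoundRobinBalancer:
    """Minimal sketch of the load balancing a Kubernetes Service
    performs across identical TF Serving replicas."""

    def __init__(self, endpoints):
        if not endpoints:
            raise ValueError("need at least one endpoint")
        self._cycle = itertools.cycle(endpoints)

    def pick(self):
        # Each call hands back the next replica in rotation.
        return next(self._cycle)
```

In practice you never write this yourself: you point clients at the Service's stable DNS name and let Kubernetes choose the replica, which is exactly the "master and replicas" setup the question describes.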
A2: Sure, check for instance Prediction IO. It's not deep-learning-specific, but it can be used to deploy models made with e.g. Spark MLlib.
I'm trying to create ML models that deal with big datasets. My question is about the preprocessing of these big datasets: I'd like to know the differences between doing the preprocessing with Dataprep, Dataproc, or TensorFlow.
Any help would be appreciated.
Those are three different things; you can't really compare them directly.
Dataprep - data service for visually exploring, cleaning, and preparing structured and unstructured data for analysis
In other words, if you have large training data and you want to clean it up, visualize it, etc., Google Dataprep enables you to do that easily.
Cloud Dataproc is a fast, easy-to-use, fully-managed cloud service for running Apache Spark and Apache Hadoop clusters in a simpler, more cost-efficient way.
Within the context of your question: after you clean up your data and it is ready to feed into your ML algorithm, you can use Cloud Dataproc to distribute it across multiple nodes and process it much faster. In some machine learning algorithms the disk read speed can be a bottleneck, so Dataproc can greatly improve your algorithm's running time.
Finally, TensorFlow:
TensorFlow™ is an open source software library for numerical computation using data flow graphs. Nodes in the graph represent mathematical operations, while the graph edges represent the multidimensional data arrays (tensors) communicated between them.
So, once your data is ready, you can use TensorFlow to implement machine learning algorithms. TensorFlow is a Python library, so it is relatively easy to pick up. TensorFlow also lets you run your algorithms on GPUs instead of CPUs and, more recently, on Google Cloud TPUs (hardware made specifically for machine learning, with even better performance than GPUs).
In the context of preprocessing for machine learning, I would like to take the time to answer this question in detail, so please bear with me!
Google provides four different processing products. Since preprocessing has many different aspects and covers many different ML prerequisites, each of these platforms is better suited to a particular preprocessing domain. The products are as follows:
Google ML Engine / Cloud AI: This product is based on TensorFlow: you can run machine learning code written in TensorFlow on ML Engine. For specific types of data such as images, text, or sequences, the tf.keras.preprocessing and tf.contrib.learn.preprocessing libraries are available to rapidly produce the appropriate input/tensor format for TensorFlow.
You may also need to transform your data via tf.Transform in a preprocessing step. tf.Transform, a library for TensorFlow, allows users to define preprocessing pipelines as part of a TensorFlow graph, ensuring that no training-serving skew can arise from preprocessing.
Cloud Dataprep: Preprocessing is sometimes described as data cleaning, data cleansing, data prepping, or data alteration; for these purposes, Cloud Dataprep is the best option. For instance, if you want to get rid of null values, or of ASCII characters that may cause errors in your ML model, you can use Cloud Dataprep.
Cloud Dataflow, Cloud Dataproc: Feature extraction, feature selection, scaling, and dimensionality reduction can also be considered part of ML preprocessing. Both services support fast, distributed preprocessing of an ML model's input: on Cloud Dataproc you can use Spark libraries, and Apache Spark MLlib can be applied to much of the ML preprocessing/processing work. Note that since Cloud Dataflow is based on Apache Beam, it leans more toward stream processing, while Cloud Dataproc is Hadoop-based and better suited for batch preprocessing. For more details, please refer to the Using Apache Spark with TensorFlow document.