We are looking for ML model serving with a developer experience where the ML engineers don’t need to know Devops.
Ideally we are looking for the following ergonomics or something similar:
Initialize a new model serving end point preferably by a CLI, get a GCS bucket
each time we train a new model, we put it in the GCS bucket of step 1.
The serving system guarantees that the most recent model in the bucket is served unless a model is specified by version number.
We are also looking for a service that optimizes cost and latency.
Any suggestions?
Have you considered https://www.tensorflow.org/tfx/serving/architecture? You can definitely automate the entire workflow using tfx. I think the guide here does a good job walking through it. Depending on your use-case, you may want to use tft instead of Kubeflow like they're doing in that guide. Besides serving automation, you may also want to consider pipeline automation to separate the feature engineering from the pipeline mechanics itself. For example, you can build the pipeline, abstract out the feature engineering into a tensorflow function meeting certain requirements, and automate the deployment process also. This way you don't need to deal with the feature specs/schemas manually, and you know that your transformations are the same during serving as they were while training.
You can do the same thing with scikit-learn also, and I believe serving scikit-learn models is also supported under the vertex-ai umbrella.
To your point about latency, you definitely want the pipeline doing the transformations on the gpu, as such, I would recommend using tensorflow over something like scikit-learn if the use-case is truly time sensitive.
Best of luck!
Related
I was wondering if it is possible/how easy it would be to implement a TFX pipeline (on a real dataset, with 100+ GB dataset, not a tutorial with a small dataset) in AWS?
For the orchestration, I might use Kubeflow. But I suppose, the major issue would be setting up a proper scalable runner for the Apache Beam. I am thinking of using Apache Flink for that.
Anyone with experience doing it? How would you go about putting a TF in production in AWS in general when you need to train the model on a regular basis on new data, do you write the pipeline from scratch or use some tool?
Folks, I am writing an application which will produce recommendations based on ML model call. The application will have different models, some of them should be called in sequence. A data scientist should be able, to upload a model in the system. This means that the application should have logic to store models metadata as well as address of a model server. A model server will be instantiated dynamically on a model upload event.
I would like to use a cluster of TensorFlow Serving here, however I am stacked with a question of architecture.
Is there a way to have something like service registry for TensorFlow servers? What is the best way to build such a cluster of servers with different models?
I need some clarification on what you're trying to do. Is the feature vector for all the models the same? If not then it will be quite a bit harder to do this. Trained models are encapsulated in the SavedModel format. It sounds like you're trying to train an ensemble, but some of the models are frozen? You could certainly write a custom component to make an inference request as part of the input to Trainer, if that's what you need.
UPDATE 1
From your comment below it sounds like what you might be looking for is a service mesh, such as Istio for example. That would help manage the connections between services running inside containers, and the connections between users and services. In this case tf.Serving instances running your models are the services, but the basic request-response pattern is the same. Does that help?
I'm trying to create ML models dealing with big datasets. My question is more related to the preprocessing of these big datasets. In this sense, I'd like to know what are the differences between doing the preprocessing with Dataprep, Dataproc or Tensorflow.
Any help would be appreciated.
Those are 3 different things, you can't really compare them.
Dataprep - data service for visually exploring, cleaning, and
preparing structured and unstructured data for analysis
In other words, if you have a large training data and you want to clean it up, visualize etc. google dataprep enables you to do that easily.
Cloud Dataproc is a fast, easy-to-use, fully-managed cloud service for
running Apache Spark and Apache Hadoop clusters in a simpler, more
cost-efficient way.
Within the context of your question, after you cleanup your data and it is ready to feed into your ML algorithm, you can use Cloud Dataproc to distribute it across multiple nodes and process it much faster. In some machine learning algorithms the disk read speed might be a bottleneck so it could greatly improve your machine learning algorithms running time.
Finally Tensorflow:
TensorFlow™ is an open source software library for numerical
computation using data flow graphs. Nodes in the graph represent
mathematical operations, while the graph edges represent the
multidimensional data arrays (tensors) communicated between them.
So after your data is ready to process; you can use Tensorflow to implement machine learning algorithms. Tensorflow is a python library so it is relatively easy to pick up. Tensorflow also enables to run your algorithms on GPU instead of CPU and (recently) also on Google Cloud TPUs(hardware made specifically for machine learning, even better performance than GPUs).
In the context of preprocessing for Machine Learning, I would like to put a time to answer this question in details. So, please bear with me!
Google provides four different processing products. Since, preprocessing has different aspects and covers many different ML prerequisites, each of these platforms is more suitable for a particular preprocessing domain. Products are as follows:
Google ML Engine/ Cloud AI: This product is based on Tensorflow. You can run your Machine Learning code in Tensorflow on the ML Engine. For specific types of data like image, text or sequential, tf.keras.preprocessing or tf.contrib.learn.preprocessing Libraries are available to make the appropriate input/tensor format of data for Tensorflow rapidly.
You may also need to transform your data via tf.Transform in a preprocessing step. tf.Transform, a library for TensorFlow, allows users to define preprocessing pipelines as part of a TensorFlow graph. tf.Transform ensures that no skew can arise during preprocessing.
Cloud DataPrep: Preprocessing sometimes is defined as data cleaning, data cleansing, data prepping and data alteration. For this purposes, Cloud DataPrep is the best option. For instance, if you want to get rid of null values or some ASCII characters which may cause errors in your ML model, you can use Cloud DataPrep.
Cloud DataFlow, Cloud Dataproc: Feature extraction, feature selection, scaling, dimension reduction also can be considered as a part of ML preprocessing. Since Cloud DataFlow and DataProc both support Spark, one can use Spark libraries for distributed fast preprocessing of the ML models input. Apache Spark MLlib can also be applied to many ML preprocessing/processing. Note that since Cloud DataFlow supports Apache Beam, it is more into stream processing while Cloud DataProc is more Hadoop-based and is better for batch preprocessing. For more details, please refer to Using Apache Spark with TensorFlow document
I have implemented a neural network model using Python and Tensorflow, which normally runs on my own computer.
Now I would like to train it on new datasets on the Google Cloud Platform. Do you think it is possible? Do I need to change my code?
Thank you very much for your help!
Google Cloud offers the Cloud ML Engine service, which allows to train your models and perform predictions without the need of running and maintaining an instance with the required software.
In order to run the TensorFlow NN models you already have, you will not need to change your code, you will only have to package the trainer appropriately, as described in the documentation, and run a ML Engine job that performs the training itself. Once you have your model, you can also deploy it in the same service and later get predictions with different features depending on your requirements (urgency in getting the predictions, data set sources, etc.).
Alternatively, as suggested in the comments, you can always launch a Compute Engine instance and run there your TensorFlow model as if you were doing it locally in your computer. However, I would strongly recommend the approach I proposed earlier, as you will be saving some money, because you will only be charged for your usage (training jobs and/or predictions) and do not need to configure an instance from scratch.
I'm actualy new in Machine Learning, but this theme is vary interesting for me, so Im using TensorFlow to classify some images from MNIST datasets...I run this code on Compute Engine(VM) at Google Cloud, because my computer is to weak for this. And the code actualy run well, but the problam is that when I each time enter to my VM and run the same code I need to wait while my model is training on CNN, and after I can make some tests or experiment with my data to plot or import some external images to impruve my accuracy etc.
Is There is some way to save my result of trainin model just once, some where, that when I will decide for example to enter to the same VM tomorrow...and dont wait anymore while my model is training. Is that possible to do this ?
Or there is maybe some another way to do something similar ?
You can save a trained model in TensorFlow and then use it later by loading it; that way you only have to train your model once, and use it as many times as you want. To do that, you can follow the TensorFlow documentation regarding that topic, where you can find information on how to save and load the model. In short, you will have to use the SavedModelBuilder class to define the type and location of your saved model, and then add the MetaGraphs and variables you want to save. Loading the saved model for posterior usage is even easier, as you will only have to run a command pointing to the location of the file in which the model was exported.
On the other hand, I would strongly recommend you to change your working environment in such a way that it can be more profitable for you. In Google Cloud you have the Cloud ML Engine service, which might be good for the type of work you are developing. It allows you to train your models and perform predictions without the need of an instance running all the required software. I happen to have worked a little bit with TensorFlow recently, and at first I was also working with a virtualized instance, but after following some tutorials I was able to save some money by migrating my work to ML Engine, as you are only charged for the usage. If you are using your VM only with that purpose, take a look at it.
You can of course consult all the available documentation, but as a first quickstart, if you are interested in ML Engine, I recommend you to have a look at how to train your models and how to get your predictions.