How to retain Entity Identifier with Batch Prediction of XGBoost Model in Vertex AI - xgboost

I am wondering how we can match back predictions to the entity after executing a batch prediction using an XGBoost model via Custom Training on Prebuilt Images.
When kicking off a BatchPredictionJob it expects the input to be of the form
input_1,input_2,input_3
0.1,0.2,0.3
0.4,0.5,0.6
...
for csv or
[0.1,0.2,0.3]
[0.4,0.5,0.6]
...
for jsonl with the output predictions:
{"instance":[0.1,0.2,0.3], "prediction":0.0345}
...
The output predictions then just contain these instances of input values without any indication of how to map these predictions back to the original entity. As the training is distributed I do not believe I can rely on the file ordering, does anyone have a method to do so?

Doing Batch predictions on a model runs the jobs using distributed processing which means the data is distributed among an arbitrary cluster of virtual machines, and is processed in an unpredictable order.
In the AI platform, to match the returned batch prediction with input instances an instance key needs to be defined. But in Vertex AI this feature has not been documented.
As the concept of using instance keys with prebuilt XGBoost container image on custom trained models is not mentioned in the Vertex AI docs, this issue has been raised in this issue tracker. We cannot provide an ETA at this moment but you can follow the progress in the issue tracker and you can ‘STAR’ the issue to receive automatic updates and give it traction by referring to this link.
In Vertex AI the batch prediction outputs are not ordered, for which a feature request has been raised and you can track the update on this request from this link.

Related

NER Incremental training with Spacy

I would like to incrementally train a NER Spacy Model.
By incrementally I mean send a first batch of N training samples, get a first model, then send a second batch of M training samples and get a model identical as if the N+M samples would have been sent in one batch and the model trained.
To be clear, this is not about adding samples after the model has been fully trained. Instead it is the ability to save intermediate states in the model so we can "resume" and add more training samples.
This is very useful if the number of samples is large or to create an "active learning" systems.
It seems doable with NLTK according to this article : and I was wondering if this can be done with Spacy.
So far I have trained my own custom NER model with Spacy using nlp.update but it does not seem to store any intermediate state that supports incremental training.
Yes, this is possible in spaCy. Your approach with nlp.update is correct; once you have added your second batch of training samples, you just need to make a call to nlp.to_disk("/path") (https://spacy.io/usage/saving-loading). Then you can continue this process by loading your saved model again.

How to deploy a live learning tensor flow model in cloud?

How do I deploy a tensor flow model in cloud which can learn and update the weights when given as input . Since most of the deployment methods I saw involved model freezing which implied freezing of weights also . Is it possible or is the latter the only way ?
Freezing the model is the most compact form and lets you have a smaller inference node which you can call for just prediction and only has the necessary information to do just that.
If you want to have and model and make it available to learn online and also make inference you could have so it has all the graph loaded with the newest weights. For security save the weights from time to time. Of course you could have two programs one for inference with the latest frozen model and another one that you up from time to time to make a new training, using the last saved weights. I recommend you the second option. Hope it helps!

What is difference between a regular model checkpoint and a saved model in tensorflow?

There's a fairly clear difference between a model and a frozen model. As described in model_files, relevant part: Freezing
...so there's the freeze_graph.py script that takes a graph definition and a set of checkpoints and freezes them together into a single file.
Is a "saved_model" most similar to a "frozen_model" (and not a saved
"model_checkpoint")?
Is this defined somewhere in docs I'm missing?
In prior versions of tensorflow we would save and restore model
weights, but this seems to be in context of a "model_checkpoint" not
a "saved_model", is that still correct?
I'm asking more for the design overview here, not implementation specifics.
Checkpoint file only contains variables for specific model and should be loaded with either exactly same, predefined graph or with specific assignment_map to load only chosen variables. See https://www.tensorflow.org/api_docs/python/tf/train/init_from_checkpoint
Saved model is more broad cause it contains graph that can be loaded within a session and training could be continued. Frozen graph, however, is serialized and could not be used to continue training.
You can find all the info here https://www.tensorflow.org/guide/saved_model

Deep networks on Cloud ML

I am trying to train a very deep model on Cloud ML however i am having serious memory issues that i am not managing to go around. The model is a very deep convolutional neural network to auto-tag music.
The model for this can be found in the image below. A batch of 20 with a tensor of 12x38832x1 is inserted in the network.
The music was originally 465894x1 samples which was then split into 12 windows. Hence, 12x38832x1. When using the map_fn function each loop would have the seperate 38832x1 samples (conv1d).
Processing windows at a time yields better results than the whole music using one CNN. This was split prior to storing the data in TFRecords in order to minimise the needed processing during training. This is loaded in a queue with maximum queue size of 200 samples (ie 10 batches).
Once dequeue, it is transposed to have the 12 dimension first which then can be used in the map_fn function for processing of the windows. This is not transposed prior to being queued as the first dimension needs to match the batch dimension of the output which is [20, 50]. Where 20 is the batch size as the data and 50 are the different tags.
For each window, the data is processed and the results of each map_fn are superpooled using a smaller network. The processing of the windows is done by a very deep neural network which is giving me problems to keep as all the config options i am giving are giving me out of memory errors.
As a model i am using one similar to Census Tensorflow Model.
First and foremost, i am not sure if this is the best option since for evaluation a separate graph is built and not shared variables. This would require double the amount of parameters.
Secondly, as a cluster setup, i have been using one complex_l master, 3 complex_l workers and 3 large_model parameter servers. I do not know if am underestimating the amount of memory needed here.
My model has previously worked with a much smaller network. However, increasing it in size started giving me bad out of memory errors.
My questions are:
The memory requirement is big, but i am sure it can be processed on cloud ml. Am i underestimating the amount of memory needed? What are your suggestions about the cluster for such a network?
When using a train.server in the dispatch function, do you need to pass on the cluster_spec so it is used in the replica_device setter? Or does it allocate on it's own? When not using it, and setting tf.configProto of log placement, all the variables seem to be on the master worker. On the Census Example in the task.py this is not passed on. I can assume this is correct?
How does one calculate how much memory is needed for a model (rough estimate to select the cluster)?
Is there any other tensorflow core tutorial how to setup such big jobs? (other than Census)
When training a big model in distributed between-graph replication, does all the model need to fit on the worker, or the worker only does ops and then transmits the results to the PS. Does that mean that the workers can have low memory just for singular ops?
PS: With smaller models the network trained successfully. I am trying to deepen the network for better ROC.
Questions coming up from on-going troubleshooting:
When using the replica_device_setter with the parameter cluster, i noticed that the master has very little memory and CPU usage and checking the log placement there are very little ops on the master. I checked the TF_CONFIG that is loaded and it says the following for the cluster field:
u'cluster': {u'ps': [u'ps-4da746af4e-0:2222'], u'worker': [u'worker-4da746af4e-0:2222'], u'master': [u'master-4da746af4e-0:2222']}
On the other hand, in the tf.train.Clusterspec documentation, it only shows workers. Does that mean that the master is not considered as worker? What happens in such case?
Error is it Memory or something else? EOF Error?

Sharing Queue between two graphs in tensorflow

Is it possible to share a queue between two graphs in TensorFlow? I'd like to do a kind of bootstrapping to select "hard negative" examples during training.
To speed up the process, I want separate threads for hard negative example selection, and for the training process. The hard negative selection is based on the evaluation of the current model, and it will load its graph from a checkpoint file. The training graph is run on another thread and writes the checkpoint file. The two graphs should share the same queue: the training graph will consume examples and the hard negative selection will produce them.
Currently there's no support for sharing state between different graphs in the open-source version of TensorFlow: each graph runs in a separate session, and each session uses an isolated set of devices.
However, it seems like it would be possible to achieve your goal using a queue in single graph. Simply construct a queue (using e.g. tf.FIFOQueue) and use tf.import_graph_def() to import the graph from the checkpoint file into the current graph. Using the return_elements argument to tf.import_graph_def() you can specify the name of the tensor that will contain the negative examples, and then add a q.enqueue_many() operation to add them to your queue. You would then fork a thread to run the enqueue_many operation in a loop. In your training graph, you can use q.dequeue_many() to get a batch of negative examples, and use them as the input to your training process.