How does AWS SageMaker XGBoost perform compared to XGBoost installed locally?

I did hyperparameter tuning on two XGBoost models with the same parameter ranges: one using the XGBoost built into AWS SageMaker, the other using XGBoost installed locally. The model optimized via SageMaker seems to perform worse than the local one (18% lower prediction accuracy on a binary classification problem). Has anyone encountered a similar problem, and if so, what could be the reasons? Thanks!

Related

Best case to use tensorflow

I followed all the steps in this article:
https://stackabuse.com/tensorflow-2-0-solving-classification-and-regression-problems/
Then I compared the results with linear regression and found that the linear regression error (68) is lower than the TensorFlow model's error (84).
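import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
# Ordinary least-squares linear regression baseline
lin_reg = LinearRegression()
lin_reg.fit(X_train, y_train)
pred = lin_reg.predict(X_test)
# Root-mean-squared error on the test set
print(np.sqrt(mean_squared_error(y_test, pred)))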
Does this mean that with a larger dataset, the TensorFlow model would give better results than linear regression?
In which situations should I use TensorFlow?
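The error figures in the question (68 and 84) are root-mean-squared errors, as computed by np.sqrt(mean_squared_error(...)) in the code above: they are in the same units as the target, and lower is better, so the linear regression did fit the test set better. A minimal pure-Python equivalent, with made-up numbers for illustration:

```python
import math

def rmse(y_true, y_pred):
    """Root-mean-squared error, the same quantity as
    np.sqrt(mean_squared_error(y_true, y_pred)) for plain lists."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical targets and predictions: errors of 1, 0 and 1 give sqrt(2/3).
y_test = [3.0, 5.0, 7.0]
print(rmse(y_test, [2.0, 5.0, 8.0]))  # about 0.816
```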
To answer your first question: neural networks are notorious for overfitting on small datasets. Here you are comparing a simple linear regression model with a neural network that has two hidden layers on the test set, so it is not very surprising that the MLP falls behind the linear regression model (assuming your dataset is relatively small). A larger dataset will generally help a neural network learn more accurate parameters and generalize better.
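The overfitting argument can be illustrated without TensorFlow. In this sketch a 1-nearest-neighbour regressor stands in for a high-capacity network (a substitution made only to keep the example tiny): it memorises the six noisy training points, so its training error is zero, yet on held-out points it does worse than a straight line. The data are hypothetical, chosen so that the true relation is y = 2x and the training targets carry +/-1 noise.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns a predict function."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return lambda x: a + b * x

def fit_nearest_neighbour(xs, ys):
    """1-nearest-neighbour regressor: reproduces every training point exactly."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]

def model_rmse(model, xs, ys):
    """Root-mean-squared error of a predict function on (xs, ys)."""
    return (sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)) ** 0.5

# Hypothetical toy data: true relation y = 2x, training targets with +/-1 noise.
x_train = [0, 1, 2, 3, 4, 5]
y_train = [1, 1, 5, 5, 9, 9]
x_test = [0.4, 1.4, 2.4, 3.4, 4.4]
y_test = [2 * x for x in x_test]

line = fit_line(x_train, y_train)
memo = fit_nearest_neighbour(x_train, y_train)
for name, model in [("linear", line), ("1-NN", memo)]:
    print(name, "train", round(model_rmse(model, x_train, y_train), 3),
          "test", round(model_rmse(model, x_test, y_test), 3))
```

The memorising model wins on the training set and loses on the test set, which is the same pattern as a two-hidden-layer MLP trailing linear regression on a small dataset.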
As for your second question: TensorFlow is primarily a library for building deep learning models. Deep learning problems such as image recognition or natural language processing typically need a lot of computational power and a lot of training data, and this is where TensorFlow comes in handy: its GPU support can speed up training dramatically, making training runs feasible that would otherwise take impractically long. Moreover, if you are building a product that has to be deployed in a production environment, you can use TensorFlow Serving to bring your models much closer to your customers.

How to do parallel GPU inferencing in Tensorflow 2.0 + Keras?

Let me start by saying that I'm new to TensorFlow and to deep learning in general.
I have a TF 2.0 Keras-style model trained using tf.keras.Model.fit(), two available GPUs, and I'm looking to reduce inference times.
I trained the model across both GPUs using the extremely handy tf.distribute.MirroredStrategy().scope() context manager:
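import tensorflow as tf

mirrored_strategy = tf.distribute.MirroredStrategy()
with mirrored_strategy.scope():
    # the model must be built (or loaded) inside this scope so its variables are mirrored
    model.compile(...)
    model.fit(...)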
Both GPUs are used effectively (even if I'm not quite happy with the accuracy of the results).
However, I can't find a similar strategy for distributing inference across GPUs with the tf.keras.Model.predict() method: when I run model.predict(), only one of the two GPUs is used (as expected).
Is it possible to instantiate the same model on both GPUs and feed each one a different chunk of data in parallel?
There are posts that explain how to do this in TF 1.x, but I can't replicate the results in TF 2.0:
https://medium.com/#sbp3624/tensorflow-multi-gpu-for-inferencing-test-time-58e952a2ed95
Tensorflow: simultaneous prediction on GPU and CPU
My main difficulties with the question are:
- TF 1.x is based on tf.Session(), while sessions are implicit in TF 2.0. If I understand correctly, the solutions I found use a separate session for each GPU, and I don't know how to replicate that in TF 2.0.
- I don't know how to use the model.predict() method with a specific session.
I know the question is probably not well formulated, but to summarize:
Does anybody know how to run a Keras-style model.predict() on multiple GPUs in TF 2.0, running inference on a different batch of data on each GPU in parallel?
Thanks in advance for any help.
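Try loading the model inside a tf.distribute.MirroredStrategy scope and using a larger batch_size:
import tensorflow as tf

mirrored_strategy = tf.distribute.MirroredStrategy()
with mirrored_strategy.scope():
    model = tf.keras.models.load_model(saved_model_path)
    # x: your input data (placeholder name); each global batch is split across the replicas
    result = model.predict(x, batch_size=greater_batch_size)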
There still does not seem to be an official example of distributed inference. There is a potential solution using tf.distribute.MirroredStrategy here: https://github.com/tensorflow/tensorflow/issues/37686. However, it does not seem to fully utilize multiple GPUs.
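The approach the question asks about (one model copy per GPU, each fed its own chunk of the data) can be sketched in a framework-agnostic way: split the input into contiguous chunks, call one predict function per device from its own thread, and stitch the results back together in order. In the sketch, predict_fns is a hypothetical list of callables, one per device. With TensorFlow 2, each callable might wrap a copy of the model loaded and called inside `with tf.device('/GPU:0'):` (respectively `'/GPU:1'`); that wiring, and how much the two GPUs actually overlap, is an untested assumption here.

```python
from concurrent.futures import ThreadPoolExecutor

def split_into_chunks(data, n):
    """Split a sequence into n contiguous chunks whose sizes differ by at most one."""
    size, extra = divmod(len(data), n)
    chunks, start = [], 0
    for i in range(n):
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks

def parallel_predict(predict_fns, data):
    """Run predict_fns[i] on the i-th chunk of data concurrently and
    concatenate the per-chunk results in the original order."""
    chunks = split_into_chunks(data, len(predict_fns))
    with ThreadPoolExecutor(max_workers=len(predict_fns)) as pool:
        futures = [pool.submit(fn, chunk) for fn, chunk in zip(predict_fns, chunks)]
        results = [f.result() for f in futures]  # preserves chunk order
    # For NumPy outputs, np.concatenate(results) would be used instead.
    return [y for part in results for y in part]

# Hypothetical stand-ins for two per-GPU models: each just doubles its inputs.
fake_gpu_models = [lambda chunk: [2 * v for v in chunk]] * 2
print(parallel_predict(fake_gpu_models, list(range(5))))  # [0, 2, 4, 6, 8]
```

Collecting the futures in submission order, rather than as they complete, is what keeps the output aligned with the input.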

How to optimize a trained Tensorflow graph for execution speedup?

In order to do fast CPU inference of a frozen TensorFlow graph (.pb), I am currently using TensorFlow's C API. The inference speed is already fairly good; however, unlike with CPU-specific tools such as Intel's OpenVINO, I have so far had no way to optimize the graph before running it. I am interested in any suitable sort of optimization:
- device-specific optimization for CPU
- graph-specific optimization (fusing operations, removing nodes, ...)
- ... and everything else lowering the time required for inference.
Therefore I am looking for a way to optimize graphs after training and before execution. As mentioned, tools like Intel's OpenVINO (for CPUs) and NVIDIA's TensorRT (for GPUs) do this kind of thing. I am also working with OpenVINO, but I am currently waiting for a bug fix, so I would like to try an additional approach.
I thought about trying TensorFlow XLA, but I have no experience with it. Moreover, I need to end up with either a frozen graph (.pb) or something I can convert to a frozen graph (e.g. .h5).
I would be grateful for recommendations!
Greets
Follow these steps:
1. Freeze the trained TensorFlow model (frozen_graph.pb). For that you may need the trained model .pb, its checkpoints and the output node names.
2. Optimize the frozen model with the Intel OpenVINO Model Optimizer:
python3 mo.py --input_model frozen_graph.pb
You may also need to pass --input_shape.
3. You will get .xml and .bin files as the result. With the help of benchmark_app, you can measure the inference performance of the optimized model.
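The output node names are needed when freezing because freezing keeps only the part of the graph those outputs depend on; training-only nodes such as the loss and gradients are dropped, which is one of the simple graph-level optimizations the question asks about. The toy sketch below shows the idea on a hypothetical graph represented as a dict mapping each node to its inputs. Real TensorFlow graphs are GraphDef protos, where TF 1.x's `tf.compat.v1.graph_util.extract_sub_graph` performs this kind of pruning.

```python
def prune_to_outputs(graph, output_nodes):
    """Keep only the nodes that the given output nodes (transitively) depend on.

    graph maps each node name to the list of node names it takes as inputs.
    """
    keep, stack = set(), list(output_nodes)
    while stack:
        node = stack.pop()
        if node not in keep:
            keep.add(node)
            stack.extend(graph[node])  # walk backwards along input edges
    return {name: inputs for name, inputs in graph.items() if name in keep}

# Hypothetical toy graph: a training-only branch (loss, gradients) hangs off the
# inference path and is dropped once "softmax" is named as the output.
graph = {
    "input": [],
    "weights": [],
    "matmul": ["input", "weights"],
    "softmax": ["matmul"],
    "labels": [],
    "loss": ["softmax", "labels"],
    "gradients": ["loss", "weights"],
}
print(sorted(prune_to_outputs(graph, ["softmax"])))
# ['input', 'matmul', 'softmax', 'weights']
```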

Using scikit learn for Neural Networks vs Tensorflow in training

I was implementing some sample neural networks, and in most tutorials I saw this statement:
Neural networks tend to work better on GPUs than on CPU.
The scikit-learn framework isn’t built for GPU optimization.
Does this statement ("work better") refer solely to the training phase of a neural network, or does it include prediction as well? I would greatly appreciate some explanation.
That statement refers to the training phase. The point is that with a GPU you can explore the space of candidate models more efficiently, so you will probably find better models in less time. However, this concerns only computational cost, not the predictive performance of a given model.

Optimizers in Tensorflow

From various TensorFlow examples (translation, PTB), it seems that you need to explicitly change the learning rate when using GradientDescentOptimizer. Is this also the case when using more 'sophisticated' techniques like Adagrad, Adadelta, etc.? Also, when we continue training the model from a saved instance, are the past values used by these optimizers saved in the model file?
It depends on the optimizer you are using. Vanilla SGD needs (or at least accepts) manual adaptation of the learning rate, and some other optimizers do too. Adadelta, for example, does not (https://arxiv.org/abs/1212.5701).
So this depends not so much on TensorFlow as on the mathematical background of the optimizer you are using.
Furthermore: yes, saving and restoring does not reset the optimizer's accumulated state (and thus its effective learning rates); training continues from the point that was saved.
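Both points can be made concrete with Adagrad (used here instead of Adadelta because its update is shorter to write). Adagrad keeps a running sum of squared gradients that shrinks the effective step size automatically, and that sum is optimizer state that must be saved with the weights for a resumed run to match an uninterrupted one. This is a minimal hand-written sketch, not TensorFlow's implementation; the toy loss, learning rate and step counts are arbitrary choices for illustration.

```python
import copy
import math

def adagrad_step(theta, grad, state, lr=0.5, eps=1e-8):
    """One Adagrad update. state["sum_sq"] accumulates squared gradients, so the
    effective learning rate lr / sqrt(sum_sq) decays without manual scheduling."""
    state["sum_sq"] += grad * grad
    return theta - lr * grad / (math.sqrt(state["sum_sq"]) + eps)

def grad_fn(theta):
    # Gradient of the toy loss (theta - 3)^2.
    return 2.0 * (theta - 3.0)

def train(theta, state, steps):
    for _ in range(steps):
        theta = adagrad_step(theta, grad_fn(theta), state)
    return theta

# Uninterrupted run of 10 steps.
uninterrupted = train(0.0, {"sum_sq": 0.0}, 10)

# 5 steps, then "save" a checkpoint holding the parameter AND the optimizer state.
state = {"sum_sq": 0.0}
theta = train(0.0, state, 5)
checkpoint = (theta, copy.deepcopy(state))
resumed = train(checkpoint[0], copy.deepcopy(checkpoint[1]), 5)

# Resuming with a fresh accumulator takes larger steps and ends elsewhere.
reset = train(checkpoint[0], {"sum_sq": 0.0}, 5)

print(resumed == uninterrupted, reset == uninterrupted)  # True False
```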