is there any way to train spacy on google colab gpu? - google-colaboratory

I want to train a spaCy model on a custom dataset, but it takes too much time to train. Is there any way to speed up the training?
I passed device=0 to ner.begin_training(), but it takes the same amount of time as before.

Yes, it is possible.
Go to [Edit] -> [Notebook settings] -> select GPU under Hardware accelerator (this restarts the runtime, so all your cell state is lost).
Use !pip install -U spacy[cuda100] to install spaCy with CUDA support.
Run the following script:
import spacy
gpu = spacy.prefer_gpu()
print('GPU:', gpu)
It returns:
GPU: True
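If prefer_gpu() returns True but training is still just as slow, make sure the GPU is claimed before the pipeline is created. A minimal sketch, assuming the spaCy 2.x training API and a hypothetical blank English pipeline with one NER label:
import spacy

spacy.require_gpu()            # raises an error if no GPU is available
nlp = spacy.blank("en")        # create the pipeline AFTER requiring the GPU
ner = nlp.create_pipe("ner")
nlp.add_pipe(ner)
ner.add_label("MY_LABEL")      # hypothetical label, for illustration only

optimizer = nlp.begin_training()
# ...then call nlp.update(texts, annotations, sgd=optimizer, losses=losses)
# in your usual training loop.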


How to optimize your tensorflow model by using TensorRT?

These are the instructions given to solve the assignment:
Convert your TensorFlow model to UFF
Use TensorRT’s C++ API to parse your model to convert it to a CUDA engine.
The TensorRT engine would automatically optimize your model and perform steps like fusing layers, converting the weights to FP16 (or INT8 if you prefer), optimizing to run on Tensor Cores, and so on.
Can anyone tell me how to proceed with this assignment? I don't have a GPU in my laptop; is it possible to do this in Google Colab or on an AWS free account?
And what packages do I have to install to run TensorRT on my laptop or in Google Colab?
I haven't used .uff, but I have used .onnx, and from what I've seen the process is similar.
According to the documentation, with TensorFlow you can do something like:
from tensorflow.python.compiler.tensorrt import trt_convert as trt
converter = trt.TrtGraphConverter(
    input_graph_def=frozen_graph,
    nodes_blacklist=['logits', 'classes'])
frozen_graph = converter.convert()
This is in TensorFlow 1.x, and they keep it pretty straightforward: TrtGraphConverter has an option to serialize for FP16, like:
converter = trt.TrtGraphConverter(
    input_saved_model_dir=input_saved_model_dir,
    max_workspace_size_bytes=(1 << 32),  # 4 GB workspace
    precision_mode="FP16",
    maximum_cached_engines=100)
See the precision_mode part; once you have serialized, you can load the network easily in TensorRT. Some good examples using C++ are here.
Unfortunately, you'll need an NVIDIA GPU with FP16 support; check this support matrix.
If I'm correct, Google Colab offered a Tesla K80 GPU, which does not have FP16 support. I'm not sure about AWS, but I'm certain the free tier does not have GPUs.
Your cheapest option could be buying a Jetson Nano, which is around ~$90; it's a very capable board and I'm sure you'll use it in the future. Or you could rent an AWS GPU server, but that is a bit expensive and the setup process is a pain.
Best of luck!
Export and convert your TensorFlow model into a .onnx file.
Then, use this onnx-tensorrt tool to do the CUDA engine file conversion.
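For the export step, one commonly used tool is the tf2onnx package (my suggestion; the answer above doesn't name a specific exporter). A minimal sketch, assuming your model is stored as a SavedModel in ./saved_model:
pip install tf2onnx
python -m tf2onnx.convert --saved-model ./saved_model --output model.onnx --opset 13
The resulting model.onnx is what onnx-tensorrt takes as input to build the CUDA engine.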

TensorFlow Keras Sequential API GPU usage

When using TensorFlow's Keras Sequential API, is there any way to force my model to be trained on a certain piece of hardware? My understanding is that if there is a GPU to use (and I have tensorflow-gpu installed), I will, by default, do my training on the GPU.
Do I have to switch to a different API to gain more control over where my model is deployed?
I am a Keras user and I work on Ubuntu. I specify a certain GPU as follows:
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"
where 0 is the index of the GPU. By default, TensorFlow uses the first GPU (index 0) if there are several on your machine. You can see information about your GPUs by typing the following command in your terminal:
nvidia-smi
or
watch -n 1 -d nvidia-smi
if you want the output to refresh every second. The nvidia-smi output lists each GPU together with its index, which is the number you pass to CUDA_VISIBLE_DEVICES.
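If you want finer control from inside your code rather than through an environment variable, here is a minimal sketch assuming TensorFlow 2.x (the answer above only covers the environment-variable approach):
import tensorflow as tf

# List the GPUs TensorFlow can see.
gpus = tf.config.list_physical_devices('GPU')
print(gpus)

# Restrict TensorFlow to the first GPU only (must run before any op executes).
if gpus:
    tf.config.set_visible_devices(gpus[0], 'GPU')

# Or pin a specific block of work to a device explicitly.
with tf.device('/GPU:0'):
    model = tf.keras.Sequential([tf.keras.layers.Dense(10)])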

How to specify which GPU to use when running tensorflow?

We have a DGX-1 in the lab.
I see many tasks running on different GPUs.
For the MLPerf Docker application, I can use NV_GPU=x to assign which GPU to use.
However, I have Python Keras/TensorFlow code; I used the same approach, but the load doesn't go to the specified GPU.
You could use CUDA_VISIBLE_DEVICES to specify the GPU to be used by your model:
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"  # assign GPUs 0 and 1 to the model
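One detail worth stressing (my addition, not part of the answer above): the variable has to be set before TensorFlow initializes its GPU context, so set it before the first import of tensorflow in the process. A minimal sketch, using the TF 2.x API for the check:
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"  # must be set before importing tensorflow

import tensorflow as tf
# Only the GPUs listed above should show up here.
print(tf.config.list_physical_devices('GPU'))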

Using GPU capabilities for retraining images using retrain.py on tensorflow-hub

I am new to TensorFlow and I am using retrain.py to train on some images. Suppose I have a larger database of 10,000 images and a GPU-capable system. How can I make retrain.py run on my NVIDIA GPU so that training is done faster?
I am following the steps from the link below
https://www.tensorflow.org/hub/tutorials/image_retraining
To get GPU support, be sure to install the PIP package tensorflow-gpu instead of plain tensorflow. You should see some performance benefits from that for retrain.py. That said, retrain.py shows its age (far predating TF Hub) and does not utilize GPUs so well, because it does not properly batch images when extracting bottleneck values.
If you are ready to live on the cutting edge of TF 2.0.0alpha0 (announced last week), take a look at Hub's
examples/colab/tf2_image_retraining.ipynb which is considerably smaller, faster (if you use a GPU), and even supports fine-tuning the image module.
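A quick way to confirm that the tensorflow-gpu build actually sees your GPU before kicking off retrain.py (my own check, assuming the TF 1.x API that retrain.py targets):
import tensorflow as tf
from tensorflow.python.client import device_lib

# Should print True and list at least one device of type GPU.
print(tf.test.is_gpu_available())
print([d.name for d in device_lib.list_local_devices() if d.device_type == 'GPU'])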

htop cpu almost red when running tensorflow, predict is very slow

I'm using TensorFlow to train a model and predict, and I use htop on Ubuntu to monitor CPU usage. Prediction is very slow, I just can't bear it. htop shows the CPU bars almost entirely red, which means almost all CPU time is being spent in kernel threads, yet CPU usage is 0% before TensorFlow starts.
I have not changed the thread count; I'm using TensorFlow v0.11 on Ubuntu 14.04.
The problem is that the default glibc malloc is not efficient for small allocations. Also, because Google develops and tests TensorFlow with tcmalloc internally, bad interactions with regular malloc don't get ironed out. The solution is to run TensorFlow with tcmalloc.
sudo apt-get install google-perftools
export LD_PRELOAD="/usr/lib/libtcmalloc.so.4"
python ...
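A quick way to confirm that the preload took effect (my own addition, assuming a Linux /proc filesystem):
# Run inside the Python process started with LD_PRELOAD set; it scans the
# process's memory maps for the tcmalloc shared library.
with open("/proc/self/maps") as maps:
    print("tcmalloc loaded:", any("tcmalloc" in line for line in maps))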
If you're looking for something to improve inference performance, I can recommend trying OpenVINO. It improves your model's inference time by converting it to Intermediate Representation (IR), performing graph pruning, and fusing certain operations into others. Then, at runtime, it uses vectorization. OpenVINO is optimized for Intel hardware, although it should work with any CPU.
It's rather straightforward to convert the TensorFlow model to OpenVINO unless you have fancy custom layers. The full tutorial on how to do it can be found here. Some snippets are below.
Install OpenVINO
The easiest way to do it is using PIP. Alternatively, you can use this tool to find the best way in your case.
pip install openvino-dev[tensorflow]
Use Model Optimizer to convert the SavedModel
The Model Optimizer is a command-line tool that comes with the OpenVINO Development Package. It converts the TensorFlow model to IR, the default format for OpenVINO. You can also try FP16 precision, which should give you better performance without a significant accuracy drop (just change data_type). Run in the command line:
mo --saved_model_dir "model" --data_type FP32 --output_dir "model_ir"
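For example, the FP16 variant mentioned above would look like this (the output directory name is just an illustration):
mo --saved_model_dir "model" --data_type FP16 --output_dir "model_ir_fp16"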
Run the inference
The converted model can be loaded by the runtime and compiled for a specific device, e.g., CPU or GPU (integrated into your CPU like Intel HD Graphics). If you don't know what the best choice for you is, use AUTO. If you care about latency, I suggest adding a performance hint (as shown below) to use the device that fulfills your requirement. If you care about throughput, change the value to THROUGHPUT or CUMULATIVE_THROUGHPUT.
# Load the network (Core comes from the OpenVINO runtime package)
from openvino.runtime import Core

ie = Core()
model_ir = ie.read_model(model="model_ir/model.xml")
compiled_model_ir = ie.compile_model(model=model_ir, device_name="AUTO", config={"PERFORMANCE_HINT": "LATENCY"})
# Get the output layer
output_layer_ir = compiled_model_ir.output(0)
# Run inference on the input image
result = compiled_model_ir([input_image])[output_layer_ir]
Disclaimer: I work on OpenVINO.