Training a TensorFlow object detection model on AWS - tensorflow

The tutorial on the GitHub page for the TensorFlow Object Detection API also has information on running the training on Google Cloud Platform, but I need to run the training on an AWS instance. I have the TFRecord files with me. Is there a tutorial or similar resource available for this? Googling doesn't help much. I am new to AWS.

You need to launch an instance that already has TensorFlow installed on it. AWS provides prepared AMIs (the Deep Learning AMIs) for exactly that.
See here: https://aws.amazon.com/tensorflow/
Then you just upload your TFRecords and training code to the instance and run the training script.
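For the launch itself, here is a minimal boto3 sketch; the AMI ID, key pair name, region, and instance type below are placeholders to replace with the Deep Learning AMI for your region and your own settings:

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Launch one instance from a Deep Learning AMI.
# The ImageId is a placeholder; look up the current Deep Learning
# AMI ID for your region in the AWS console.
response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",   # placeholder Deep Learning AMI
    InstanceType="p2.xlarge",          # a GPU instance type
    KeyName="my-key-pair",             # your existing EC2 key pair
    MinCount=1,
    MaxCount=1,
)
print(response["Instances"][0]["InstanceId"])

Once the instance is running, copy your TFRecords and pipeline config over with scp and start the Object Detection API training script exactly as you would locally.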

Related

Can Horovod with TensorFlow work on non-GPU instances in Amazon SageMaker?

I want to perform distributed training on Amazon SageMaker. The code is written with TensorFlow and is similar to the following example, for which I think a CPU instance should be enough:
https://github.com/horovod/horovod/blob/master/examples/tensorflow_word2vec.py
Can Horovod with TensorFlow work on non-GPU instances in Amazon SageMaker?
Yes, you should be able to use both CPU and GPU instances with Horovod on Amazon SageMaker. Please follow the example below:
https://github.com/aws/amazon-sagemaker-examples/blob/main/sagemaker-python-sdk/tensorflow_script_mode_horovod/tensorflow_script_mode_horovod.ipynb
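The core of that notebook is enabling MPI in the estimator's distribution settings. A minimal sketch, assuming a Horovod training script named train.py and that this runs in a SageMaker notebook (the instance type and S3 path are placeholders):

import sagemaker
from sagemaker.tensorflow import TensorFlow

# Works inside a SageMaker notebook; otherwise pass an IAM role ARN.
role = sagemaker.get_execution_role()

# Horovod runs over MPI; SageMaker enables it via the distribution dict.
estimator = TensorFlow(
    entry_point="train.py",           # your Horovod training script
    role=role,
    instance_count=2,
    instance_type="ml.c5.xlarge",     # a non-GPU (CPU) instance type
    framework_version="2.4.1",
    py_version="py37",
    distribution={"mpi": {"enabled": True, "processes_per_host": 1}},
)
estimator.fit("s3://my-bucket/training-data")   # placeholder S3 path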

Use a model trained by Google Cloud Vertex AI accelerated with TRT on Jetson Nano

I am trying to standardize our deployment workflow for machine vision systems, so we came up with the following workflow:
(image: deployment workflow diagram)
To create a prototype, we followed this workflow. There is no problem with the GCP side whatsoever, but when we export the models we train on Vertex AI, it produces the three formats mentioned in the workflow:
SavedModel
TFLite
TFJS
We tried to convert these models to ONNX, but we failed with different errors:
SavedModel - we always get the same error regardless of the parameters, as follows (a sketch of the conversion we are attempting follows this list):
(screenshot: error during SavedModel-to-ONNX conversion)
I tried to track down the error and found that the model does not even load inside TensorFlow itself, which is weird since it was exported from GCP Vertex AI, which is built on TensorFlow.
TFLite - converts successfully, but there is again a problem with the ONNX opset; with opset 15 the conversion succeeds, but then the NVIDIA TensorRT ONNX parser doesn't recognize the model during the ONNX-to-TRT conversion.
TFJS - not tried yet.
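For reference, here is a minimal sketch of the SavedModel-to-ONNX conversion we are attempting, assuming the tf2onnx converter; the export directory is a placeholder:

import tf2onnx

# Convert the Vertex AI SavedModel export to ONNX.
# "exported_model/" is a placeholder for the Vertex AI export directory.
model_proto, _ = tf2onnx.convert.from_saved_model(
    "exported_model/",
    opset=13,              # we tried several opsets, including 15
    output_path="model.onnx",
)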
So we are blocked by these problems.
We can run the models exported directly from Vertex AI on the Jetson Nano device, but TF-TRT and TensorFlow are not memory-optimized on the GPU, so the system freezes after 3 to 4 hours of running.
We tried this workflow once with Google Teachable Machine and it worked well; every step ran perfectly fine. So I am really confused about how to make sense of this: the workflow works with a Teachable Machine model, which is created by Google, but not with a Vertex AI model, which is developed by the same company.
Or am I doing something wrong in this workflow?
For background: we are developing this workflow inside a C++ framework for a real-time application in an industrial environment.

Automate AI-Platforms model deployment using pipelines on GCP

I have some models running in AI Platform on GCP which are serving predictions without a problem.
Now I am trying to automate this deployment process using Kubeflow pipelines so that the model version gets updated periodically. I tried to create some pipelines from the available samples, but none of them are for AI Platform.
The training of the model has been handled by AI Platform Jobs with the following parameters:
Python: 3.7
Framework: TensorFlow
Framework version: 2.1
ML runtime version: 2.1
Trained models are created periodically and saved in Cloud Storage buckets.
How can I automate this deployment process using pipelines?
If there is an alternative approach to this automation, I would like to try it as well.
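For context, the manual step I want to automate is essentially creating a new model version from the latest SavedModel in the bucket. A minimal sketch of that single step with the googleapiclient discovery client; the project, model, version, and bucket names are placeholders:

from googleapiclient import discovery

# Client for the AI Platform Training & Prediction API.
ml = discovery.build("ml", "v1")

# Create a new version of an existing AI Platform model from a
# SavedModel in Cloud Storage. All names below are placeholders.
body = {
    "name": "v2",
    "deploymentUri": "gs://my-bucket/models/latest/",
    "runtimeVersion": "2.1",
    "framework": "TENSORFLOW",
    "pythonVersion": "3.7",
}
request = ml.projects().models().versions().create(
    parent="projects/my-project/models/my_model",
    body=body,
)
print(request.execute())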

Downloading ImageNet validation set

I have a problem downloading the ImageNet validation dataset in Colab using the wget command.
The official site forced me to create an account, which didn't help me.
Is there a way to download it, or could someone share it with me?

How can I use the TensorFlow Embedding Projector inside my own GCP VM or JupyterLab instance?

Is there a way of running the Embedding Projector inside my GCP JupyterLab instance (or through any other GCP service) as opposed to using the public https://projector.tensorflow.org?
The TensorFlow documentation mentions that the Embedding Projector can be run inside TensorBoard, but doesn't provide any links or details.
Unfortunately there is no Google Cloud product that provides those projector functionalities specifically, but you can run the TensorBoard projector plugin locally in AI Notebooks (JupyterLab).
Here's the source repository of TensorBoard's projector plugin, and here's the step-by-step guide where the projector plugin is used for the specific use case you mentioned. Bear in mind that this step-by-step guide targets TensorFlow 1.x, not 2.x.
If you want to use TensorFlow 2.x you will need to import the plugin like this
from tensorboard.plugins import projector
and then migrate the TensorFlow 1.x code described in the guide to >= 2.0 in order to produce the same log files. If you already have the necessary files for your custom projector, you just need to select the plugin inside the TensorBoard UI.
(screenshot: TensorBoard projector plugin selection)
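If you still need to produce those log files in TensorFlow 2.x, here is a minimal sketch following the pattern of the official TensorBoard embeddings tutorial; the embedding matrix and labels are dummy placeholders:

import os
import numpy as np
import tensorflow as tf
from tensorboard.plugins import projector

log_dir = "logs/projector"
os.makedirs(log_dir, exist_ok=True)

# Dummy embedding matrix: 100 items, 64 dimensions each.
embedding = tf.Variable(
    np.random.randn(100, 64).astype("float32"), name="embedding"
)
checkpoint = tf.train.Checkpoint(embedding=embedding)
checkpoint.save(os.path.join(log_dir, "embedding.ckpt"))

# One label per embedding row.
with open(os.path.join(log_dir, "metadata.tsv"), "w") as f:
    for i in range(100):
        f.write(f"item_{i}\n")

# Point the projector plugin at the checkpointed variable.
config = projector.ProjectorConfig()
emb = config.embeddings.add()
emb.tensor_name = "embedding/.ATTRIBUTES/VARIABLE_VALUE"
emb.metadata_path = "metadata.tsv"
projector.visualize_embeddings(log_dir, config)

Running tensorboard --logdir logs/projector afterwards should show the projector in the plugin dropdown.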
You can also embed the public projector tool in an IFrame (I understand that this is not your case, but it might be helpful to other people searching for an alternative solution). Opening an AI Notebook and pasting the following code would do the job.
import IPython
url = 'https://projector.tensorflow.org/'
IPython.display.IFrame(url, width=1333, height=900)
Remember to change the width and height values if you need to.