TensorFlow Slim ImageNet training

I am trying to prepare the data to train an ImageNet model from scratch, and I am a bit confused about how the training works.
While preparing the TFRecords I noticed this file inside the Inception model data directory: "imagenet_metadata.txt". The file holds labels for 21,842 classes, yet the training script and the "imagenet_lsvrc_2015_synsets.txt" file only work with 1,000 classes.
I am wondering what modifications I need to make to train the model on the 21K classes instead of the 1K ones?

It's quite straightforward with Slim. To train ImageNet-21K with Slim, I recommend the following steps:
1. In the tf_models/slim/datasets folder, create a copy of the imagenet.py file (for example imgnet.py). In the new file, change the required variables to your desired values:
_FILE_PATTERN = 'imgnet_%s_*.tfrecord'  # your TFRecord file pattern

_SPLITS_TO_SIZES = {
    'train': <number of training samples>,
    'validation': <number of validation samples>,
}

_NUM_CLASSES = 21841
* The WordNet synset list contains 21,842 entries, but the total number of classes in ImageNet-21K is 21,841 (n04399382 is missing), so be sure about the total number of available classes.
* You also need to make a small modification to the code so that it loads the synset files from a local path instead of downloading them:
base_url = '/home/snf/libraries/tf_models/slim'
synset_url = '{}/listOfTags.txt'.format(base_url)
synset_to_human_url = '{}/imagenet21k_metadata.txt'.format(base_url)
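With those variables pointing at local files, the download-and-parse logic in imagenet.py's create_readable_names_for_imagenet_labels() can read them directly. A minimal sketch of that change, assuming the 21K metadata file is tab-separated like the 1K one:
# Read the synset list and the synset-to-human mapping from disk
# instead of fetching them over HTTP.
synset_list = [s.strip() for s in open(synset_url).readlines()]
num_classes = len(synset_list)  # should be 21841 here

synset_to_human = {}
for line in open(synset_to_human_url).readlines():
    parts = line.strip().split('\t')
    assert len(parts) == 2
    synset, human = parts
    synset_to_human[synset] = human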
2. Add the new dataset to dataset_factory.py in tf_models/slim/datasets:
from datasets import imgnet

datasets_map = {
    'cifar10': cifar10,
    'flowers': flowers,
    'imagenet': imagenet,
    'mnist': mnist,
    'imgnet': imgnet,  # add this line to datasets_map
}
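After this, train_image_classifier.py can resolve the new dataset by name through the factory. A small sketch of what that lookup amounts to (DATASET_DIR stands for your TFRecord directory):
from datasets import dataset_factory

# --dataset_name=imgnet is resolved via datasets_map:
dataset = dataset_factory.get_dataset('imgnet', 'train', DATASET_DIR)
print(dataset.num_classes)  # 21841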
3. In tf_models/slim/, create a Train_Imgnet.sh file containing these lines:
TRAIN_DIR=trained/imgnet-inception-v4
DATASET_DIR=/media/where/tfrecords/saved/tfRecords-fall11_21k

CUDA_VISIBLE_DEVICES="0,1,2,3" python train_image_classifier.py \
  --train_dir=${TRAIN_DIR} \
  --dataset_name=imgnet \
  --dataset_split_name=train \
  --dataset_dir=${DATASET_DIR} \
  --model_name=inception_v4 \
  --max_number_of_steps=10000000 \
  --batch_size=32 \
  --learning_rate=0.01 \
  --learning_rate_decay_type=fixed \
  --save_interval_secs=60 \
  --save_summaries_secs=60 \
  --log_every_n_steps=100 \
  --optimizer=rmsprop \
  --weight_decay=0.00004 \
  --num_readers=12 \
  --num_clones=4
Set the file to executable (chmod +x Train_Imgnet.sh) and run it (./Train_Imgnet.sh).

Related

How to loop YOLOv3 over multiple images on Google Colab?

I'm using Google Colab and run detection on one image using this code:
%cd /content/darknet/
!./darknet detector test \
/content/dataset/data/obj.data \
/content/dataset/models/yolov3.cfg \
/content/dataset/backup/yolov3_final.weights \
/content/test_images/IMG_2022.jpg
show_image('/content/darknet/predictions.jpg')  # helper defined elsewhere in the notebook
but I want to iterate over multiple images in one folder. Does someone have a solution for how to run YOLOv3 on every image in a folder?
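A minimal sketch of one way to do this from Python, assuming the darknet binary and the config/weights paths from the question (the test-image folder and output names are examples):
import glob
import shutil
import subprocess

image_paths = sorted(glob.glob('/content/test_images/*.jpg'))
for i, path in enumerate(image_paths):
    # Run the same darknet command as above, once per image.
    subprocess.run([
        './darknet', 'detector', 'test',
        '/content/dataset/data/obj.data',
        '/content/dataset/models/yolov3.cfg',
        '/content/dataset/backup/yolov3_final.weights',
        path,
    ], cwd='/content/darknet', check=True)
    # darknet overwrites predictions.jpg on every run, so keep a copy
    # of each result under a unique name.
    shutil.copy('/content/darknet/predictions.jpg',
                '/content/predictions_{}.jpg'.format(i))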

Using LaBSE deployed to Google Cloud AI Platform

I deployed the LaBSE model to AI Platform in the past few days.
The issue I encounter is that the response to the request exceeds the 2 MB limit.
Several ideas I had to improve the situation:
1) make AI Platform return minified (not beautifully formatted) JSON (without spaces and newlines everywhere)
2) make AI Platform return the results in a binary format
3) since the answer is composed of ~13 outputs: change it to only one output
Do you know any ways of doing 1) or 2)?
I spent a lot of effort on 3). I'm sure this one is possible, for example by editing the network before uploading it. Here is what I have tried so far:
VERSION = 'v1'
MODEL = 'labse_2_b'
MODEL_DIR = BUCKET + '/' + MODEL

# Download the model
! wget 'https://tfhub.dev/google/LaBSE/2?tf-hub-format=compressed' \
    --output-document='{MODEL}.tar.gz'
! mkdir {MODEL}
! tar -xzvf '{MODEL}.tar.gz' -C {MODEL}

# Attempts to load the model, edit it, and save it:
model.save(export_path, save_format='tf')
# ValueError: Model <keras.engine.sequential.Sequential object at 0x7f87e833c650>
# cannot be saved because the input shapes have not been set.
# Usually, input shapes are automatically determined from calling
# `.fit()` or `.predict()`. To manually set the shapes, call
# `model.build(input_shape)`.
model.build(input_shape=(None,))  # cannot find a proper shape

# Create an AI Platform model version:
! gsutil -m cp -r '{MODEL}' {MODEL_DIR}  # upload model to Google Cloud Storage
! gcloud ai-platform versions create $VERSION \
    --model {MODEL} \
    --origin {MODEL_DIR} \
    --runtime-version=2.1 \
    --framework='tensorflow' \
    --python-version=3.7 \
    --region="{REGION}"
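One direction for 3) I have not fully explored: wrap the extracted SavedModel in a tf.Module whose serving signature returns a single tensor. A minimal sketch, assuming the loaded model is callable with a dict of tokenized inputs and returns a dict that includes a 'default' (pooled embedding) key:
import tensorflow as tf
import tensorflow_hub as hub

labse = hub.load(MODEL)  # the directory extracted from the .tar.gz

class SingleOutput(tf.Module):
    def __init__(self, model):
        super().__init__()
        self.model = model

    @tf.function(input_signature=[{
        'input_word_ids': tf.TensorSpec([None, None], tf.int32),
        'input_mask': tf.TensorSpec([None, None], tf.int32),
        'input_type_ids': tf.TensorSpec([None, None], tf.int32),
    }])
    def serve(self, inputs):
        outputs = self.model(inputs)  # dict of ~13 tensors
        # Keep only the pooled sentence embedding to shrink the response.
        return {'default': outputs['default']}

wrapped = SingleOutput(labse)
tf.saved_model.save(wrapped, MODEL + '_single_output',
                    signatures={'serving_default': wrapped.serve})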
Could someone please help with that?
Thanks a lot in advance.
EDIT:
For those wondering about this limitation, as in the comments below, here are some complementary pieces of information:
A short sentence such as
"I wish you a pleasant flight and a good meal aboard this plane."
which is just 16 word pieces long:
[101, 146, 34450, 15100, 170, 147508, 48088, 14999, 170, 17072, 66369, 351617, 15272, 69746, 119, 102]
cannot be processed:
Response size too large. Received at least 3220082 bytes; max is 2000000.". Details: "Response size too large. Received at least 3220082 bytes; max is 2000000.

Can't get $TPU_NAME environment variable to work properly

I'm a newbie! I'm trying to train a BERT model from scratch on a Kaggle kernel. I can't get the BERT run_pretraining.py script to work on TPUs; it works fine on CPUs, though. I'm guessing the issue is with the $TPU_NAME environment variable.
!python run_pretraining.py \
--input_file='gs://xxxxxxxxxx/*' \
--output_dir=/kaggle/working/model/ \
--do_train=True \
--do_eval=True \
--bert_config_file=/kaggle/input/bert-bangla-test-config/config.json \
--train_batch_size=32 \
--max_seq_length=128 \
--max_predictions_per_seq=20 \
--num_train_steps=20 \
--num_warmup_steps=2 \
--learning_rate=2e-5 \
--use_tpu=True \
--tpu_name=$TPU_NAME
If the script is using a tf.distribute.cluster_resolver.TPUClusterResolver (https://www.tensorflow.org/api_docs/python/tf/distribute/cluster_resolver/TPUClusterResolver), then you can simply instantiate the TPUClusterResolver without any arguments, and it'll automatically pick up TPU_NAME (https://github.com/tensorflow/tensorflow/blob/v2.3.0/tensorflow/python/tpu/client/client.py#L47).
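A minimal sketch of what that looks like in TF 2.3+ (this is the standard TPU setup, not something specific to run_pretraining.py):
import tensorflow as tf

# With no arguments, the resolver reads the TPU_NAME environment
# variable (or queries the metadata service) to locate the TPU.
resolver = tf.distribute.cluster_resolver.TPUClusterResolver()
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)
strategy = tf.distribute.TPUStrategy(resolver)
print('TPU devices:', tf.config.list_logical_devices('TPU'))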
Okay, I found a rookie solution :P
Run:
import os
os.environ
From the returned dictionary you can get the address; just copy-paste it. It'll be in the form 'TPU_NAME': 'grpc://xxxxxxx'.
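Instead of copy-pasting, you can also read it programmatically (a small sketch; the printed address is an example):
import os

# Returns None if the variable is not set in the kernel.
tpu_address = os.environ.get('TPU_NAME')
print(tpu_address)  # e.g. 'grpc://10.0.0.2:8470'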

Using your own evaluation and training set in cloud-ml-engine sample

In the flowers tutorial by Google here: https://cloud.google.com/ml-engine/docs/tensorflow/flowers-tutorial
for preprocessing of the data we used the following command:
python trainer/preprocess.py \
--input_dict "$DICT_FILE" \
--input_path "gs://cloud-ml-data/img/flower_photos/train_set.csv" \
--output_path "${GCS_PATH}/preproc/train" \
--cloud
I understand we could replace the CSV file with our own list and hence train with a different set of images; however, creating a CSV file for over 100 types of images will be cumbersome. Is there a way to overcome this?
The train_set.csv is a list of file paths in Google Cloud Storage together with the prediction label.
This is a part of the file:
gs://cloud-ml-data/img/flower_photos/daisy/754296579_30a9ae018c_n.jpg,daisy
gs://cloud-ml-data/img/flower_photos/dandelion/18089878729_907ed2c7cd_m.jpg,dandelion
gs://cloud-ml-data/img/flower_photos/dandelion/284497199_93a01f48f6.jpg,dandelion
gs://cloud-ml-data/img/flower_photos/dandelion/3554992110_81d8c9b0bd_m.jpg,dandelion
gs://cloud-ml-data/img/flower_photos/daisy/4065883015_4bb6010cb7_n.jpg,daisy
gs://cloud-ml-data/img/flower_photos/roses/7420699022_60fa574524_m.jpg,roses
gs://cloud-ml-data/img/flower_photos/dandelion/4558536575_d43a611bd4_n.jpg,dandelion
gs://cloud-ml-data/img/flower_photos/daisy/7568630428_8cf0fc16ff_n.jpg,daisy
gs://cloud-ml-data/img/flower_photos/tulips/7064813645_f7f48fb527.jpg,tulips
gs://cloud-ml-data/img/flower_photos/sunflowers/4933229095_f7e4218b28.jpg,sunflowers
gs://cloud-ml-data/img/flower_photos/daisy/14523675369_97c31d0b5b.jpg,daisy
gs://cloud-ml-data/img/flower_photos/sunflowers/21518663809_3d69f5b995_n.jpg,sunflowers
gs://cloud-ml-data/img/flower_photos/dandelion/15782158700_3b9bf7d33e_m.jpg,dandelion
gs://cloud-ml-data/img/flower_photos/tulips/8713398906_28e59a225a_n.jpg,tulips
gs://cloud-ml-data/img/flower_photos/tulips/6770436217_281da51e49_n.jpg,tulips
gs://cloud-ml-data/img/flower_photos/dandelion/8754822932_948afc7cef.jpg,dandelion
gs://cloud-ml-data/img/flower_photos/daisy/22873310415_3a5674ec10_m.jpg,daisy
gs://cloud-ml-data/img/flower_photos/sunflowers/5967283168_90dd4daf28_n.jpg,sunflowers
So you will have to collect a set of images for your own train set, upload them to GCS, and classify them. Then you just have to retrieve the list of paths (this can easily be achieved using the gsutil ls command) and concatenate each path with its classification label, as in the sketch below.
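A minimal sketch of generating the CSV, assuming your images are organized one folder per label, e.g. gs://your-bucket/img/<label>/<image>.jpg (the bucket path is hypothetical):
import subprocess

base = 'gs://your-bucket/img/'
entries = subprocess.check_output(['gsutil', 'ls', base]).decode().split()
with open('train_set.csv', 'w') as f:
    for entry in entries:
        if not entry.endswith('/'):  # skip plain files at the top level
            continue
        label = entry.rstrip('/').split('/')[-1]
        # One CSV row per image: <gcs path>,<label>
        for path in subprocess.check_output(['gsutil', 'ls', entry]).decode().split():
            f.write('{},{}\n'.format(path, label))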

How to periodically evaluate the performance of models in TF-Slim?

I am trying to use DenseNet for a regression problem with TF-Slim. My data contains 60,000 JPEG images with 37 float labels per image. I divided my data into three different TFRecord files: a train set (60%), a validation set (20%), and a test set (20%).
I need to evaluate the validation set during the training loop and plot the validation metrics as training progresses.
The TF-Slim documentation only explains the training loop and the evaluation loop separately, so I can only evaluate the validation or test set after the training loop has finished, while, as I said, I need to evaluate during training.
I tried to use the slim.evaluation.evaluation_loop function instead of slim.evaluation.evaluate_once, but it doesn't help:
slim.evaluation.evaluation_loop(
    master=FLAGS.master,
    checkpoint_dir=checkpoint_path,
    logdir=FLAGS.eval_dir,
    num_evals=num_batches,
    eval_op=list(names_to_updates.values()) + print_ops,
    variables_to_restore=variables_to_restore,
    summary_op=tf.summary.merge(summary_ops),
    eval_interval_secs=eval_interval_secs)
I tried evaluation.evaluate_repeatedly as well:
from tensorflow.contrib.training.python.training import evaluation

evaluation.evaluate_repeatedly(
    master=FLAGS.master,
    checkpoint_dir=checkpoint_path,
    eval_ops=list(names_to_updates.values()) + print_ops,
    eval_interval_secs=eval_interval_secs)
Both of these functions just read the latest available checkpoint from checkpoint_dir and apparently wait for the next one; however, when the new checkpoints are generated, they don't run the evaluation at all.
I use Python 2.7.13 and Tensorflow 1.3.0 on CPU.
Any help will be highly appreciated.
Using evaluate_once works just fine with a bash script using sleep. It appears that TensorBoard is capable of plotting multiple single runs from a given eval_dir...
So I use something like:
#!/bin/bash
set -e
# Paths to model and evaluation results
TRAIN_DIR=~/pDL/tensorflow/model/mobilenet_v1_1_224_rp-v1/run0004
TEST_DIR=${TRAIN_DIR}/eval
# Where the dataset is saved to.
DATASET_DIR=/mnt/data/tensorflow/data
# Run evaluation (using slim.evaluation.evaluate_once)
CONTINUE=1
while [ "$CONTINUE" -ne 0 ]
do
python eval_image_classifier.py \
--checkpoint_path=${TRAIN_DIR} \
--eval_dir=${TEST_DIR} \
--dataset_name=master_db \
--preprocessing_name=preprocess224 \
--dataset_split_name=valid \
--dataset_dir=${DATASET_DIR} \
--model_name=mobilenet_v1 \
--patch_size=64
echo "sleeping for next run"
sleep 600
done
This appears to be an issue of setting the checkpoint_path properly, as addressed here:
https://github.com/tensorflow/tensorflow/issues/13769
where the answer by Ellie68 is to set:
if tf.gfile.IsDirectory(FLAGS.checkpoint_path):
    if tf.train.latest_checkpoint(FLAGS.checkpoint_path):
        checkpoint_path = tf.train.latest_checkpoint(FLAGS.checkpoint_path)
else:
    checkpoint_path = FLAGS.checkpoint_path