Not getting global steps in TensorFlow 2.0

When I run the following code, the global step does not count up to 5000.
First, when I run this part of the code, the output is what I expect:
from __future__ import absolute_import, division, print_function, unicode_literals

import tensorflow as tf
import pandas as pd

CSV_COLUMN_NAMES = ['SepalLength', 'SepalWidth', 'PetalLength', 'PetalWidth', 'Species']
SPECIES = ['Setosa', 'Versicolor', 'Virginica']

train_path = tf.keras.utils.get_file(
    "iris_training.csv", "https://storage.googleapis.com/download.tensorflow.org/data/iris_training.csv")
test_path = tf.keras.utils.get_file(
    "iris_test.csv", "https://storage.googleapis.com/download.tensorflow.org/data/iris_test.csv")

train = pd.read_csv(train_path, names=CSV_COLUMN_NAMES, header=0)
test = pd.read_csv(test_path, names=CSV_COLUMN_NAMES, header=0)

train_y = train.pop('Species')
test_y = test.pop('Species')

def input_fn(features, labels, training=True, batch_size=256):
    # Convert the inputs to a Dataset.
    dataset = tf.data.Dataset.from_tensor_slices((dict(features), labels))
    # Shuffle and repeat if you are in training mode.
    if training:
        dataset = dataset.shuffle(1000).repeat()
    return dataset.batch(batch_size)

my_feature_columns = []
for key in train.keys():
    my_feature_columns.append(tf.feature_column.numeric_column(key=key))
#print(my_feature_columns)
But after adding the classifier code, the global step stops increasing. This is the code added after the code above:
classifier = tf.estimator.DNNClassifier(
    feature_columns=my_feature_columns,
    # Two hidden layers of 30 and 10 nodes respectively.
    hidden_units=[30, 10],
    # The model must choose between 3 classes.
    n_classes=3)

classifier.train(
    input_fn=lambda: input_fn(train, train_y, training=True),
    steps=5000)
Instead of the global step counting up to 5000, I get this output:
WARNING:tensorflow:Using temporary folder as model directory: /tmp/tmpfl0bqu11
WARNING:tensorflow:It seems that global step (tf.train.get_global_step) has not been increased. Current value (could be stable): 1061 vs previous value: 1061. You could increase the global step by passing tf.train.get_global_step() to Optimizer.apply_gradients or Optimizer.minimize
I wrote the same code my teacher demonstrated; it runs for him, but not for me.
How can I get the global step to increase?
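If it helps with debugging, one way to verify whether training actually advanced despite the warning is to read the global step variable from the trained estimator. This is a minimal sketch using Estimator.get_variable_value; the variable name 'global_step' is the standard one used by canned estimators:

# Sketch: inspect the global step after classifier.train(...) returns.
print("global step:", classifier.get_variable_value('global_step'))

# evaluate() also reports the current global step alongside its metrics.
eval_result = classifier.evaluate(
    input_fn=lambda: input_fn(test, test_y, training=False))
print(eval_result)  # includes a 'global_step' entry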

Related

Any example workflow from TensorFlow to OpenMV?

I have trained a multi-class image classification model based on MobileNet-V2 (only the Dense layer was added), carried out full integer quantization (INT8), and exported a model.tflite file, which I call with tf.classify() on OpenMV.
Here is my code to quantize it:
import tensorflow as tf
import numpy as np
import pathlib

def representative_dataset():
    for _ in range(100):
        data = np.random.rand(1, 96, 96, 3)  # random tensor for test
        yield [data.astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_saved_model('saved_model/my_model')
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
tflite_quant_model = converter.convert()

tflite_models_dir = pathlib.Path("/tmp/mnist_tflite_models/")
tflite_models_dir.mkdir(exist_ok=True, parents=True)
tflite_model_quant_file = tflite_models_dir/"mnist_model_quant.tflite"
tflite_model_quant_file.write_bytes(tflite_quant_model)
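As a side note, the converter settings above give integer quantization with float fallback; as far as I know, forcing full INT8 kernels with integer input/output usually also involves flags like these (a sketch, not part of the original script):

# Sketch: flags typically added for full-integer (INT8) conversion.
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8   # or tf.uint8, depending on the target
converter.inference_output_type = tf.int8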
The accuracy of this model was quite good in testing during training. However, when tested on OpenMV, the same label is output for every object (although the probabilities differ slightly).
I looked up some materials; one of them mentioned that tf.classify() has offset and scale parameters, related to compressing RGB values to [-1, 1] or [0, 1] during training, but these parameters are not in the official API documentation.
for obj in tf.classify(self.net, img1, min_scale=1.0, scale_mul=0.5, x_overlap=0.0, y_overlap=0.0):
    print("**********\nTop 1 Detections at [x=%d,y=%d,w=%d,h=%d]" % obj.rect())
    sorted_list = sorted(zip(self.labels, obj.output()), key=lambda x: x[1], reverse=True)
    for i in range(1):
        print("%s = %f" % (sorted_list[i][0], sorted_list[i][1]))
        return sorted_list[i][0]
So, are there any examples of a workflow from training a TensorFlow model to deploying it on OpenMV?
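One thing worth checking before the deployment step: the representative dataset above uses random tensors, but calibration generally works best on real images preprocessed exactly as in training. A sketch, assuming MobileNet-V2-style scaling to [-1, 1] and a hypothetical calib_images/ directory:

import glob
import numpy as np
from PIL import Image

def representative_dataset():
    # Calibrate on real images, preprocessed the same way as during training.
    for path in glob.glob('calib_images/*.jpg')[:100]:
        img = Image.open(path).resize((96, 96))
        data = np.asarray(img, dtype=np.float32)[None, ...]
        data = data / 127.5 - 1.0  # MobileNet-V2 style scaling to [-1, 1]
        yield [data]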

tensorflow estimator passes train data through some weird normalization

Problem Description
I'm using tensorflow Estimator API, and have encountered a weird phenomenon.
I'm passing the exact same input_fn to both training and evaluation, and for some reason the images which are provided to the network are not identical.
They seem similar, but on closer inspection the evaluation images are fine, while the training images are somewhat distorted.
After loading them both, I noticed that for some reason the training images go through some kind of ReLU. I confirmed it with this code, which operates on mat_eval and mat_train, the tensors that input_fn provides in evaluation and training mode:
special_relu = lambda mat: ((mat - 0.5) / 0.5) * ((mat - 0.5) / 0.5 > 0)
np.allclose(mat_train, special_relu(mat_eval))
>>> True
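For intuition (my reading, not claimed in the original post): special_relu is exactly a ReLU applied after the common (x - 0.5) / 0.5 normalization that maps [0, 1] pixels to [-1, 1], as this small numpy check shows:

import numpy as np

x = np.array([0.0, 0.25, 0.5, 0.75, 1.0])  # eval-style pixel values in [0, 1]
normalized = (x - 0.5) / 0.5               # maps them to [-1, 1]
relu = normalized * (normalized > 0)       # zeroes out everything below 0.5
print(relu)                                # [0.  0.  0.  0.5 1. ]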
What I thought and tried
My initial thought was that it is some form of BatchNormalization. But BatchNormalization is supposed to happen within the network, not as a preprocessing step, isn't it?
What I recorded (using tf.summary.image) was the features['image'] object, passed to my model_fn. And if I understand correctly, the features object is passed to model_fn by the input_fn called by the Estimator object.
Regardless, I tried to remove the parts of the code that are supposed to call BatchNormalization. This had no effect. Of course, I might not have done that correctly, but as I said, I don't really think it is BatchNormalization.
Code
from datetime import datetime
from pathlib import Path

import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow.python.platform import tf_logging as logging

from dcnn import modeling
from dcnn.dv_constants import BATCH_SIZE, BATCHES_PER_EPOCH
from dcnn.variant_io import get_input_fn, num_variants_in_ds

logging.set_verbosity(logging.INFO)

new_checkpoint_name = lambda: f'./train_dir/' \
                              f'{datetime.now().strftime("%d-%m %H:%M:%S")}'

if __name__ == '__main__':
    model_name = 'small_inception'
    start_from_checkpoint = ''
    # start_from_checkpoint = '/home/yonatan/Desktop/yonas_code/dcnn/train_dir' \
    #                         '/2111132905/model.ckpt-256'
    model_dir = str(Path(start_from_checkpoint).parent) if \
        start_from_checkpoint else new_checkpoint_name()
    test = False
    train = True
    predict = False
    epochs = 1
    train_dataset_name = 'same_example'
    val_dataset_name = 'same_example'
    test_dataset_name = 'same_example'
    predict_dataset_name = 'same_example'

    model = modeling.get_model(model_name=model_name)
    estimator = model.make_estimator(
        batch_size=BATCH_SIZE,
        model_dir=model_dir,
        params=dict(batches_per_epoch=BATCHES_PER_EPOCH),
        use_tpu=False,
        # The target of the TensorFlow standard server to use.
        # Can be the empty string to run locally using an in-process server.
        master='',
        start_from_checkpoint=start_from_checkpoint)

    if train:
        train_input_fn = get_input_fn(train_dataset_name, repeat=True)
        val_input_fn = get_input_fn(val_dataset_name, repeat=False)
        steps = (epochs * num_variants_in_ds(train_dataset_name)) / \
            BATCH_SIZE
        train_spec = tf.estimator.TrainSpec(input_fn=val_input_fn,
                                            max_steps=steps)
        eval_spec = tf.estimator.EvalSpec(input_fn=val_input_fn,
                                          throttle_secs=1)
        metrics = tf.estimator.train_and_evaluate(estimator, train_spec,
                                                  eval_spec)
        print(metrics)
I have plenty more code to share, but I tried to be concise. If anyone has any idea why this behavior happens, or needs more information, let me know.

Tensorflow: FailedPreconditionError: Error while reading resource variable from Container: localhost. When running sess.run() on custom loss function

I have code running Keras with TensorFlow 1. The code modifies the loss function in order to do deep reinforcement learning:
import os
import gym
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

env = gym.make("CartPole-v0").env
env.reset()
n_actions = env.action_space.n
state_dim = env.observation_space.shape

from tensorflow import keras
import random
from tensorflow.keras import layers as L
import tensorflow as tf
from tensorflow.python.keras.backend import set_session

sess = tf.compat.v1.Session()
graph = tf.compat.v1.get_default_graph()
init = tf.global_variables_initializer()
sess.run(init)

# let's create a network for approximate q-learning following guidelines above
network = keras.models.Sequential()
network.add(L.InputLayer(state_dim))
network.add(L.Dense(5, activation='elu'))
network.add(L.Dense(5, activation='relu'))
network.add(L.Dense(n_actions, activation='linear'))

s = env.reset()

# Create placeholders for the <s, a, r, s'> tuple and a special indicator for game end (is_done = True)
states_ph = keras.backend.placeholder(dtype='float32', shape=(None,) + state_dim)
actions_ph = keras.backend.placeholder(dtype='int32', shape=[None])
rewards_ph = keras.backend.placeholder(dtype='float32', shape=[None])
next_states_ph = keras.backend.placeholder(dtype='float32', shape=(None,) + state_dim)
is_done_ph = keras.backend.placeholder(dtype='bool', shape=[None])

# get q-values for all actions in current states
predicted_qvalues = network(states_ph)

# select q-values for chosen actions
predicted_qvalues_for_actions = tf.reduce_sum(predicted_qvalues * tf.one_hot(actions_ph, n_actions),
                                              axis=1)

gamma = 0.99

# compute q-values for all actions in next states
predicted_next_qvalues = network(next_states_ph)

# compute V*(next_states) using predicted next q-values
next_state_values = tf.math.reduce_max(predicted_next_qvalues, axis=1)

# compute "target q-values" for loss - it's what's inside the square brackets in the formula above
target_qvalues_for_actions = rewards_ph + tf.constant(gamma) * next_state_values

# at the last state we shall use the simplified formula: Q(s,a) = r(s,a) since s' doesn't exist
target_qvalues_for_actions = tf.where(is_done_ph, rewards_ph, target_qvalues_for_actions)

# mean squared error loss to minimize
loss = (predicted_qvalues_for_actions - tf.stop_gradient(target_qvalues_for_actions)) ** 2
loss = tf.reduce_mean(loss)

# training function that resembles agent.update(state, action, reward, next_state) from the tabular agent
train_step = tf.compat.v1.train.AdamOptimizer(1e-4).minimize(loss)

a = 0
next_s, r, done, _ = env.step(a)

sess.run(train_step, {
    states_ph: [s], actions_ph: [a], rewards_ph: [r],
    next_states_ph: [next_s], is_done_ph: [done]
})
When I run the sess.run() training step, I get the following error:
tensorflow.python.framework.errors_impl.FailedPreconditionError: Error while reading resource variable beta1_power from Container: localhost. This could mean that the variable was uninitialized. Not found: Container localhost does not exist. (Could not find resource: localhost/beta1_power)
Any ideas on what might be the problem?
The initialization operation should be fetched and run (only once) after the variables (i.e. the model) have been created and the computation graph has been defined. Therefore, it should be placed right before running the training step:
# Define and create the computation graph/model
# ...

# Initialize variables in the graph/model
init = tf.global_variables_initializer()
sess.run(init)

# Start training
sess.run(train_step, ...)
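Applied to the script above, this means moving the two initialization lines from right after the Session is created down to just after the optimizer is defined; a sketch of the reordering (using the compat.v1 alias to match the rest of the script):

# ... network, placeholders, loss defined as above ...
train_step = tf.compat.v1.train.AdamOptimizer(1e-4).minimize(loss)

# Initialize AFTER the optimizer exists, so its slot variables
# (e.g. beta1_power from the error message) are covered too.
init = tf.compat.v1.global_variables_initializer()
sess.run(init)

sess.run(train_step, {
    states_ph: [s], actions_ph: [a], rewards_ph: [r],
    next_states_ph: [next_s], is_done_ph: [done]
})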

Tensorflow data import

I just started using tensorflow, but I am failing to import the data properly for use with the DNNClassifier. I have two files in HDF5 format that I import with pandas. The feature vector has dimension 100 and there are 5 classes the features can belong to. If I use, for example, the following code:
import pandas as pd
import numpy as np
import tensorflow as tf

# Data
train = pd.read_hdf("train.h5", "train")
test = pd.read_hdf("test.h5", "test")
Y = train.iloc[0:, 0]
X = train.iloc[0:, 1:]
X_t = test.iloc[0:, 0:]
Y = np.array(Y.values).astype('int')
X = np.array(X.values).astype('double')
X_t = np.array(X_t.values).astype('double')

# Train
feature_columns = [tf.contrib.layers.real_valued_column("", dimension=100)]
classifier = tf.contrib.learn.DNNClassifier(feature_columns=feature_columns,
                                            hidden_units=[10, 20],
                                            n_classes=5,
                                            model_dir="/tmp/model")

# Define the training inputs
def get_train_inputs():
    x = tf.constant(X)
    y = tf.constant(Y)
    return x, y

# Fit
classifier.fit(input_fn=get_train_inputs, steps=1000)
predictions = list(classifier.predict(input_fn=get_train_inputs))
print(predictions)
I get the error: InvalidArgumentError (see above for traceback): Shape in shape_and_slice spec [100,10] does not match the shape stored in checkpoint: [1,10]
[[Node: save/RestoreV2_2 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_recv_save/Const_0, save/RestoreV2_2/tensor_names, save/RestoreV2_2/shape_and_slices)]]
I don't get why this happens. How should I transform my data to use it with this classifier?
My solution: change your model_dir="/tmp/model" to
model_dir="/tmp/model-1"
Note: it need not be model-1; replace it with any valid new name, like
model_dir="/tmp/model-a", or something like that.

Tensorflow - How to manipulate Saver

I am working with the Boston housing data tutorial for tensorflow, but am inserting my own data set:
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import pandas as pd
import tensorflow as tf

tf.logging.set_verbosity(tf.logging.INFO)

COLUMNS = ["crim", "zn", "indus", "nox", "rm", "age",
           "dis", "tax", "ptratio", "medv"]
FEATURES = ["crim", "zn", "indus", "nox", "rm",
            "age", "dis", "tax", "ptratio"]
LABEL = "medv"

def input_fn(data_set):
    feature_cols = {k: tf.constant(data_set[k].values) for k in FEATURES}
    labels = tf.constant(data_set[LABEL].values)
    return feature_cols, labels

def main(unused_argv):
    # Load datasets
    training_set = pd.read_csv("boston_train.csv", skipinitialspace=True,
                               skiprows=1, names=COLUMNS)
    test_set = pd.read_csv("boston_test.csv", skipinitialspace=True,
                           skiprows=1, names=COLUMNS)
    # Set of 6 examples for which to predict median house values
    prediction_set = pd.read_csv("boston_predict.csv", skipinitialspace=True,
                                 skiprows=1, names=COLUMNS)

    # Feature cols
    feature_cols = [tf.contrib.layers.real_valued_column(k)
                    for k in FEATURES]

    # Build 2 layer fully connected DNN with 10, 10 units respectively.
    regressor = tf.contrib.learn.DNNRegressor(
        feature_columns=feature_cols, hidden_units=[10, 10])

    # Fit
    regressor.fit(input_fn=lambda: input_fn(training_set), steps=5000)

    # Score accuracy
    ev = regressor.evaluate(input_fn=lambda: input_fn(test_set), steps=1)
    loss_score = ev["loss"]
    print("Loss: {0:f}".format(loss_score))

    # Print out predictions
    y = regressor.predict(input_fn=lambda: input_fn(prediction_set))
    print("Predictions: {}".format(str(y)))

if __name__ == "__main__":
    tf.app.run()
The issue I am having is that the dataset is so big that saving checkpoint files via tf.train.Saver() fills up all my disk space.
Is there a way to either disable the saving of checkpoint files, or reduce the number of checkpoints saved, in the script above?
Thanks
The tf.contrib.learn.DNNRegressor initializer takes a tf.contrib.learn.RunConfig object, which can be used to control the behavior of the internally-created saver. For example, you can do the following to keep only one checkpoint:
config = tf.contrib.learn.RunConfig(keep_checkpoint_max=1)
regressor = tf.contrib.learn.DNNRegressor(
    feature_columns=feature_cols, hidden_units=[10, 10], config=config)
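RunConfig also controls how often checkpoints are written; if I remember the API correctly, raising save_checkpoints_secs (or using save_checkpoints_steps) reduces checkpoint frequency, e.g.:

# Keep at most one checkpoint and write one at most every 30 minutes.
config = tf.contrib.learn.RunConfig(keep_checkpoint_max=1,
                                    save_checkpoints_secs=30 * 60)
regressor = tf.contrib.learn.DNNRegressor(
    feature_columns=feature_cols, hidden_units=[10, 10], config=config)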