Eager-Mode very slow (~22x slower than Graph-Mode) - tensorflow

I read that TensorFlow 2.0 will have some major changes, and that a big part of them will be eager execution [1], so I tried to play around a bit with TensorFlow's eager mode.
I took some code from a GitHub repo and tried to run it in eager mode (however, without using a Keras Model/Layers as proposed).
It turned out that it's quite slow. So I tried different modifications and compared it with the original (graph-mode) source of the model. The result is that graph mode is around 22x faster than eager mode. It's totally clear to me that graph mode is faster, but by this much?
Is this always the case, or do I need some special modifications/configurations of the variables to get performance comparable to graph mode?
The source code for both variants can be found at [2].
Thanks in advance!
Eager-Mode:
# With
# with tf.device("/gpu:0"):
# ...
#
# Runtime is 0.35395
# Runtime is 0.12711
# Runtime is 0.12438
# Runtime is 0.12428
# Runtime is 0.12572
# Runtime is 0.12593
# Runtime is 0.12505
# Runtime is 0.12527
# Runtime is 0.12418
# Runtime is 0.12340
Graph Mode:
# Runtime is 0.81241
# Runtime is 0.00573
# Runtime is 0.00573
# Runtime is 0.00570
# Runtime is 0.00555
# Runtime is 0.00564
# Runtime is 0.00545
# Runtime is 0.00540
# Runtime is 0.00591
# Runtime is 0.00574
[1] https://groups.google.com/a/tensorflow.org/forum/#!topic/developers/JHDpgRyFVUs
[2] https://gist.github.com/lhlmgr/f6709e5aba4a5314b5221d58232b09bd

Using eager execution may mean undoing some habits developed with TensorFlow graphs, since code snippets that used to run once (e.g., a Python function that constructs the graph to compute the loss) will now run repeatedly (the same Python function will compute the loss on each iteration).
I took a cursory look at the code you linked and noticed some easy wins that would probably also be surfaced by standard Python profiling tools. You may want to use those (cProfile, py-spy, etc.).
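For instance, a minimal cProfile sketch (train_step here is a hypothetical stand-in for one iteration of your training loop):

import cProfile

# Sorting by cumulative time shows which Python calls dominate each iteration.
cProfile.run('for _ in range(100): train_step()', sort='cumtime')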
For example, the Keras network is currently implemented as:
class NFModel(tf.keras.Model):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)

    def call(self, *args, **kwargs):
        num_layers = 6
        d, r = 2, 2
        bijectors = []

        for i in range(num_layers):
            with tf.variable_scope('bijector_%d' % i):
                V = tf.get_variable('V', [d, r], dtype=DTYPE)  # factor loading
                shift = tf.get_variable('shift', [d], dtype=DTYPE)  # affine shift
                L = tf.get_variable('L', [d * (d + 1) / 2], dtype=DTYPE)  # lower triangular
                bijectors.append(tfb.Affine(
                    scale_tril=tfd.fill_triangular(L),
                    scale_perturb_factor=V,
                    shift=shift,
                ))

                alpha = tf.get_variable('alpha', [], dtype=DTYPE)
                abs_alpha = tf.abs(alpha) + .01
                bijectors.append(LeakyReLU(alpha=abs_alpha))

        base_dist = tfd.MultivariateNormalDiag(loc=tf.zeros([2], DTYPE))
        mlp_bijector = tfb.Chain(list(reversed(bijectors[:-1])), name='2d_mlp_bijector')
        dist = tfd.TransformedDistribution(distribution=base_dist, bijector=mlp_bijector)
Instead, if you create the variables in __init__ once and avoid tf.get_variable calls on every call to the network, you should see a big improvement.
class NFModel(tf.keras.Model):
    def __init__(self, *args, **kwargs):
        super(NFModel, self).__init__(*args, **kwargs)
        num_layers = 6
        d, r = 2, 2
        self.num_layers = num_layers
        self.V = [tf.get_variable('V', [d, r], dtype=DTYPE) for _ in range(num_layers)]
        self.shift = [tf.get_variable('shift', [d], dtype=DTYPE) for _ in range(num_layers)]
        self.L = [tf.get_variable('L', [d * (d + 1) / 2], dtype=DTYPE) for _ in range(num_layers)]
        self.alpha = [tf.get_variable('alpha', [], dtype=DTYPE) for _ in range(num_layers)]

    def call(self, *args, **kwargs):
        bijectors = []

        for i in range(self.num_layers):
            V = self.V[i]
            shift = self.shift[i]
            L = self.L[i]
            bijectors.append(tfb.Affine(
                scale_tril=tfd.fill_triangular(L),
                scale_perturb_factor=V,
                shift=shift,
            ))

            alpha = self.alpha[i]
            abs_alpha = tf.abs(alpha) + .01
            bijectors.append(LeakyReLU(alpha=abs_alpha))

        base_dist = tfd.MultivariateNormalDiag(loc=tf.zeros([2], DTYPE))
        mlp_bijector = tfb.Chain(list(reversed(bijectors[:-1])), name='2d_mlp_bijector')
        dist = tfd.TransformedDistribution(distribution=base_dist, bijector=mlp_bijector)
        return {"dist": dist}
There are probably other such easy wins; a profiling tool will nudge you in the right direction.
Also note that TF 2.0 is less about "eager execution" and more about how one interacts with graphs, as per the RFC.
Hope that helps.

Related

How to calculate cosine similarity given sparse matrix data in TensorFlow?

I'm supposed to change part of a Python script on GitHub. The code is an attention-based similarity measure, but I want to turn it into cosine similarity.
The respective code is in the layers.py file (inside the call method).
Attention-Based:
def __call__(self, inputs):
    x = inputs
    # dropout
    if self.sparse_inputs:
        x = sparse_dropout(x, 1 - self.dropout, self.num_features_nonzero)
    else:
        x = tf.nn.dropout(x, 1 - self.dropout)
    # graph learning
    h = dot(x, self.vars['weights'], sparse=self.sparse_inputs)
    N = self.num_nodes
    edge_v = tf.abs(tf.gather(h, self.edge[0]) - tf.gather(h, self.edge[1]))
    edge_v = tf.squeeze(self.act(dot(edge_v, self.vars['a'])))
    sgraph = tf.SparseTensor(indices=tf.transpose(self.edge), values=edge_v, dense_shape=[N, N])
    sgraph = tf.sparse_softmax(sgraph)
    return h, sgraph
I edited the above code to match what I believe are my requirements (cosine similarity). However, when I run the following code:
def __call__(self, inputs):
    x = inputs
    # dropout
    if self.sparse_inputs:
        x = sparse_dropout(x, 1 - self.dropout, self.num_features_nonzero)
    else:
        x = tf.nn.dropout(x, 1 - self.dropout)
    # graph learning
    h = dot(x, self.vars['weights'], sparse=self.sparse_inputs)
    N = self.num_nodes
    h_norm = tf.nn.l2_normalize(h)
    edge_v = tf.matmul(h_norm, tf.transpose(h_norm))
    h_norm_1 = tf.norm(h_norm)
    edge_v /= h_norm_1 * h_norm_1
    edge_v = dot(edge_v, self.vars['a'])  # It causes an error when I add this line
    zero = tf.constant(0, dtype=tf.float32)
    where = tf.not_equal(edge_v, zero)
    indices = tf.where(where)
    values = tf.gather_nd(edge_v, indices)
    sgraph = tf.SparseTensor(indices, values, dense_shape=[N, N])
    return h, sgraph
The script shows some runtime errors:
Screenshot of error message
I suspect the error here is related to line 226:
edge_v = dot(edge_v, self.vars['a']) # It causes an error when I add this line
Any advice on how to accomplish this successfully?
Link of the script on GitHub:
https://github.com/jiangboahu/GLCN-tf
Note: I don't want to use built-in functions, because I think they are not precise enough for this job.
ETA: It appears that there are some answers around, but they seem to tackle different problems, as far as I understood them.
Thanks a bunch in advance
What's dot? Have you imported the method?
It should be either:
edge_v = tf.keras.backend.dot(edge_v, self.vars['a'])
or (note that tf.tensordot requires an axes argument):
edge_v = tf.tensordot(edge_v, self.vars['a'], axes=1)
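As an aside, a minimal sketch of dense row-wise cosine similarity (my illustration, not the GLCN code; h stands for a hypothetical dense [N, F] feature matrix):

import tensorflow as tf

h = tf.random.normal([5, 16])                          # hypothetical node features
h_norm = tf.nn.l2_normalize(h, axis=1)                 # unit-normalize each row
cos_sim = tf.matmul(h_norm, h_norm, transpose_b=True)  # [N, N] pairwise cosine similarity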

Tensorflow: OOM when batch size too large

My script is failing due to too high memory usage. When I reduce the batch size it works.
@tf.function(autograph=not DEBUG)
def step(prev_state, input_b):
    input_b = tf.reshape(input_b, shape=[1, input_b.shape[0]])
    state = FastALIFStateTuple(v=prev_state[0], z=prev_state[1], b=prev_state[2], r=prev_state[3])

    new_b = self.decay_b * state.b + (tf.ones(shape=[self.units], dtype=tf.float32) - self.decay_b) * state.z
    thr = self.thr + new_b * self.beta

    z = state.z
    i_in = tf.matmul(input_b, W_in)
    i_rec = tf.matmul(z, W_rec)
    i_t = i_in + i_rec
    I_reset = z * thr * self.dt
    new_v = self._decay * state.v + (1 - self._decay) * i_t - I_reset

    # Spike generation
    is_refractory = tf.greater(state.r, .1)
    zeros_like_spikes = tf.zeros_like(z)
    new_z = tf.where(is_refractory, zeros_like_spikes, self.compute_z(new_v, thr))
    new_r = tf.clip_by_value(state.r + self.n_refractory * new_z - 1,
                             0., float(self.n_refractory))
    return [new_v, new_z, new_b, new_r]

@tf.function(autograph=not DEBUG)
def evolve_single(inputs):
    accumulated_state = tf.scan(step, inputs, initializer=state0)
    Z = tf.squeeze(accumulated_state[1])  # -> [T, units]
    if self.model_settings['avg_spikes']:
        Z = tf.reshape(tf.reduce_mean(Z, axis=0), shape=(1, -1))
    out = tf.matmul(Z, W_out) + b_out
    return out  # -> [BS, Num_labels]

# # - Using a simple loop
# out_store = []
# for i in range(fingerprint_3d.shape[0]):
#     out_store.append(tf.squeeze(evolve_single(fingerprint_3d[i, :, :])))
# return tf.reshape(out_store, shape=[fingerprint_3d.shape[0], self.d_out])
final_out = tf.squeeze(tf.map_fn(evolve_single, fingerprint_3d))  # -> [BS, T, self.units]
return final_out
This code snippet is inside a tf.function, but I omitted that since I don't think it's relevant.
As can be seen, I run the code on fingerprint_3d, a tensor with dimensions [BatchSize, Time, InputDimension], e.g. [50, 100, 20]. When I run this with BatchSize < 10, everything works fine, although tf.scan already uses a lot of memory for that.
When I now execute the code on a batch of size 50, I suddenly get an OOM, even though I am executing it in an iterative manner (here commented out).
How should I execute this code so that the batch size doesn't matter?
Is TensorFlow maybe parallelizing my for loop so that it executes over multiple batches at once?
Another, unrelated question: what function should I use instead of tf.scan if I only want to accumulate one state variable, as opposed to tf.scan, which accumulates all of them? Or is that possible with tf.scan?
As mentioned in the discussions here, tf.foldl, tf.foldr, and tf.scan all require keeping track of all values for all iterations, which is necessary for computations like gradients. I am not aware of any ways to mitigate this issue; still, I would also be interested if anyone has a better answer than mine.
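For the side question, a minimal sketch of the difference (my illustration, not from the original code): tf.scan stacks every intermediate state, while tf.foldl only carries the accumulator forward and returns the final value.

import tensorflow as tf

xs = tf.range(5, dtype=tf.float32)
all_states = tf.scan(lambda acc, x: acc + x, xs, initializer=0.0)    # shape [5]: every partial sum
final_state = tf.foldl(lambda acc, x: acc + x, xs, initializer=0.0)  # scalar: only the last sum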
When I used
@tf.function
def get_loss_and_gradients():
    with tf.GradientTape(persistent=False) as tape:
        logits, spikes = rnn.call(fingerprint_input=graz_dict["train_input"], W_in=W_in, W_rec=W_rec, W_out=W_out, b_out=b_out)
        loss = loss_normal(tf.cast(graz_dict["train_groundtruth"], dtype=tf.int32), logits)
    gradients = tape.gradient(loss, [W_in, W_rec, W_out, b_out])
    return loss, logits, spikes, gradients
it works.
When I remove the @tf.function decorator, the memory blows up. So it really does seem important that TensorFlow can create a graph for your computations.

Balanced Accuracy Score in Tensorflow

I am implementing a CNN for a highly unbalanced classification problem, and I would like to implement custom metrics in TensorFlow to use with the "select best model" callback.
Specifically, I would like to implement the balanced accuracy score, which is the average of the recall of each class (see the sklearn implementation here). Does someone know how to do it?
I was facing the same issue so I implemented a custom class based off SparseCategoricalAccuracy:
import tensorflow as tf
from tensorflow import keras

class BalancedSparseCategoricalAccuracy(keras.metrics.SparseCategoricalAccuracy):
    def __init__(self, name='balanced_sparse_categorical_accuracy', dtype=None):
        super().__init__(name, dtype=dtype)

    def update_state(self, y_true, y_pred, sample_weight=None):
        y_flat = y_true
        if y_true.shape.ndims == y_pred.shape.ndims:
            y_flat = tf.squeeze(y_flat, axis=[-1])
        y_true_int = tf.cast(y_flat, tf.int32)

        cls_counts = tf.math.bincount(y_true_int)
        cls_counts = tf.math.reciprocal_no_nan(tf.cast(cls_counts, self.dtype))
        weight = tf.gather(cls_counts, y_true_int)
        return super().update_state(y_true, y_pred, sample_weight=weight)
The idea is to set each class weight inversely proportional to its size.
This code produces some warnings from Autograph but I believe those are Autograph bugs, and the metric seems to work fine.
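If it helps, a quick usage sketch (assuming a standard compiled Keras classifier with integer labels):

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=[BalancedSparseCategoricalAccuracy()])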
There are three ways I can think of to tackle the situation:
1) Random under-sampling - in this method you randomly remove samples from the majority classes.
2) Random over-sampling - in this method you increase the number of samples by replicating them.
3) Weighted cross entropy - you can also use weighted cross entropy so that the loss value compensates for the minority classes. See here.
I have personally tried method 2, and it does increase my accuracy by a significant amount, but it may vary from dataset to dataset. A sketch of option 3 follows below.
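A minimal sketch of option 3 with Keras class weights (the weight values here are made up; in practice, set them roughly inversely proportional to class frequency):

# Assume class 1 is the minority class; errors on it count ten times as much.
model.fit(x_train, y_train,
          epochs=10,
          class_weight={0: 1.0, 1: 10.0})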
NOTE
It appears that the implementation/API of the Recall class, which I used as a template for my answer, has been modified in newer TF versions (as pointed out by @guilaumme-gaudin). I therefore recommend that you look at the Recall implementation in your current TF version and adapt the approach I describe in the original post from there; this way I don't have to update my answer every time the TF team modifies the implementation/API of its metrics.
Original post
I'm not an expert in TensorFlow, but using a bit of pattern matching between metric implementations in the tf source code, I came up with this:
import numpy as np

from tensorflow.python.keras import backend as K
from tensorflow.python.keras.metrics import Metric
from tensorflow.python.keras.utils import metrics_utils
from tensorflow.python.ops import init_ops
from tensorflow.python.ops import math_ops
from tensorflow.python.keras.utils.generic_utils import to_list

class BACC(Metric):

    def __init__(self, thresholds=None, top_k=None, class_id=None, name=None, dtype=None):
        super(BACC, self).__init__(name=name, dtype=dtype)
        self.init_thresholds = thresholds
        self.top_k = top_k
        self.class_id = class_id

        default_threshold = 0.5 if top_k is None else metrics_utils.NEG_INF
        self.thresholds = metrics_utils.parse_init_thresholds(
            thresholds, default_threshold=default_threshold)
        self.true_positives = self.add_weight(
            'true_positives',
            shape=(len(self.thresholds),),
            initializer=init_ops.zeros_initializer)
        self.true_negatives = self.add_weight(
            'true_negatives',
            shape=(len(self.thresholds),),
            initializer=init_ops.zeros_initializer)
        self.false_positives = self.add_weight(
            'false_positives',
            shape=(len(self.thresholds),),
            initializer=init_ops.zeros_initializer)
        self.false_negatives = self.add_weight(
            'false_negatives',
            shape=(len(self.thresholds),),
            initializer=init_ops.zeros_initializer)

    def update_state(self, y_true, y_pred, sample_weight=None):
        return metrics_utils.update_confusion_matrix_variables(
            {
                metrics_utils.ConfusionMatrix.TRUE_POSITIVES: self.true_positives,
                metrics_utils.ConfusionMatrix.TRUE_NEGATIVES: self.true_negatives,
                metrics_utils.ConfusionMatrix.FALSE_POSITIVES: self.false_positives,
                metrics_utils.ConfusionMatrix.FALSE_NEGATIVES: self.false_negatives,
            },
            y_true,
            y_pred,
            thresholds=self.thresholds,
            top_k=self.top_k,
            class_id=self.class_id,
            sample_weight=sample_weight)

    def result(self):
        """
        Returns the Balanced Accuracy (average between recall and specificity)
        """
        result = (math_ops.div_no_nan(self.true_positives, self.true_positives + self.false_negatives) +
                  math_ops.div_no_nan(self.true_negatives, self.true_negatives + self.false_positives)) / 2
        return result

    def reset_states(self):
        num_thresholds = len(to_list(self.thresholds))
        K.batch_set_value(
            [(v, np.zeros((num_thresholds,))) for v in self.variables])

    def get_config(self):
        config = {
            'thresholds': self.init_thresholds,
            'top_k': self.top_k,
            'class_id': self.class_id
        }
        base_config = super(BACC, self).get_config()
        return dict(list(base_config.items()) + list(config.items()))
I've simply taken the Recall class implementation from the source code as a template and extended it to make sure it has TP, TN, FP, and FN defined.
After that I modified the result method so that it calculates balanced accuracy, and voilà :)
I compared the results from this with sklearn's balanced accuracy score and the values matched so I think it's correct, but do double check just in case.
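A small sanity test along those lines might look like this (my sketch; binary labels and the default 0.5 threshold assumed):

import numpy as np
from sklearn.metrics import balanced_accuracy_score

y_true = np.array([0, 0, 0, 1, 1])
y_prob = np.array([0.1, 0.7, 0.2, 0.8, 0.4])

metric = BACC()
metric.update_state(y_true, y_prob)
print(metric.result().numpy())  # balanced accuracy from the custom metric
print(balanced_accuracy_score(y_true, (y_prob > 0.5).astype(int)))  # sklearn reference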
I have not tested this code yet, but looking at the source code of tensorflow==2.1.0, this might work for the binary classification case:
from tensorflow.keras.metrics import Recall
from tensorflow.python.ops import math_ops

class BalancedBinaryAccuracy(Recall):
    def result(self):
        result = (math_ops.div_no_nan(self.true_positives, self.true_positives + self.false_negatives) +
                  math_ops.div_no_nan(self.true_negatives, self.true_negatives + self.false_positives)) / 2
        return result[0] if len(self.thresholds) == 1 else result
As an alternative to writing a custom metric, you can write a custom callback using the metrics already implemented and available via the training logs. For example, you can define the training balanced accuracy callback like this:
class TrainBalancedAccuracyCallback(tf.keras.callbacks.Callback):

    def __init__(self, **kargs):
        super(TrainBalancedAccuracyCallback, self).__init__(**kargs)

    def on_epoch_end(self, epoch, logs={}):
        train_sensitivity = logs['tp'] / (logs['tp'] + logs['fn'])
        train_specificity = logs['tn'] / (logs['tn'] + logs['fp'])
        logs['train_sensitivity'] = train_sensitivity
        logs['train_specificity'] = train_specificity
        logs['train_balacc'] = (train_sensitivity + train_specificity) / 2
        print('train_balacc', logs['train_balacc'])
and the same for the validation:
class ValBalancedAccuracyCallback(tf.keras.callbacks.Callback):

    def __init__(self, **kargs):
        super(ValBalancedAccuracyCallback, self).__init__(**kargs)

    def on_epoch_end(self, epoch, logs={}):
        val_sensitivity = logs['val_tp'] / (logs['val_tp'] + logs['val_fn'])
        val_specificity = logs['val_tn'] / (logs['val_tn'] + logs['val_fp'])
        logs['val_sensitivity'] = val_sensitivity
        logs['val_specificity'] = val_specificity
        logs['val_balacc'] = (val_sensitivity + val_specificity) / 2
        print('val_balacc', logs['val_balacc'])
and then you can pass instances of these to the callbacks argument of the model's fit method.
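Note that for logs['tp'], logs['fn'], etc. to exist, the model has to be compiled with confusion-matrix metrics registered under those names; a minimal sketch of what that could look like:

model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=[tf.keras.metrics.TruePositives(name='tp'),
                       tf.keras.metrics.TrueNegatives(name='tn'),
                       tf.keras.metrics.FalsePositives(name='fp'),
                       tf.keras.metrics.FalseNegatives(name='fn')])

model.fit(x_train, y_train,
          validation_data=(x_val, y_val),
          callbacks=[TrainBalancedAccuracyCallback(), ValBalancedAccuracyCallback()])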

(Openmdao 2.4.0) 'compute_partials' function of a Component seems to be run even when forcing 'declare_partials' to FD for this component

I want to solve the Sellar MDA using the Newton nonlinear solver for the Group. I have defined the disciplines with derivatives (using compute_partials), but I want to check the number of calls to the disciplines' compute and compute_partials when forcing (or not forcing) the disciplines not to use their analytic derivatives (using declare_partials in the problem definition). The problem is that compute_partials still seems to be called, even though I force it not to be used.
Here is an example (Sellar).
So, for Discipline 2, I add a counter, and I have:
import numpy as np

from openmdao.test_suite.components.sellar import SellarDis1, SellarDis2

class SellarDis2withDerivatives(SellarDis2):
    """
    Component containing Discipline 2 -- derivatives version.
    """

    def _do_declares(self):
        # Analytic Derivs
        self.declare_partials(of='*', wrt='*')
        self.exec_count_d = 0

    def compute_partials(self, inputs, J):
        """
        Jacobian for Sellar discipline 2.
        """
        y1 = inputs['y1']
        if y1.real < 0.0:
            y1 *= -1

        J['y2', 'y1'] = .5 * y1 ** -.5
        J['y2', 'z'] = np.array([[1.0, 1.0]])
        self.exec_count_d += 1
I create a similar MDA to the one in the OpenMDAO docs, but using the SellarDis1withDerivatives and SellarDis2withDerivatives I created, and changing the nonlinear_solver to NewtonSolver(), like this:
cycle.add_subsystem('d1', SellarDis1withDerivatives(), promotes_inputs=['x', 'z', 'y2'], promotes_outputs=['y1'])
cycle.add_subsystem('d2', SellarDis2withDerivatives(), promotes_inputs=['z', 'y1'], promotes_outputs=['y2'])

# Newton solver (replacing the gradient-free Nonlinear Block Gauss-Seidel from the docs)
cycle.nonlinear_solver = NewtonSolver()
cycle.linear_solver = DirectSolver()
Then I run the following problem
prob2 = Problem()
prob2.model = SellarMDA()
prob2.setup()

prob2.model.cycle.d1.declare_partials('*', '*', method='fd')
prob2.model.cycle.d2.declare_partials('*', '*', method='fd')

prob2['x'] = 2.
prob2['z'] = [-1., -1.]

prob2.run_model()
count = prob2.model.cycle.d2.exec_count_d
print("Number of derivatives calls (%i)" % (count))
And, as a result, I obtain:
=====
cycle
NL: Newton Converged in 3 iterations
Number of derivatives calls (3)
Therefore, it seems that compute_partials is still called somehow (even if the derivatives are computed with FD). Does someone have an explanation?
I believe this to be a bug (or perhaps an unintended consequence of how derivatives are specified).
This behavior is a by-product of mixed declaration of derivatives, where we allow the user to declare some derivatives on a component as 'fd' and other derivatives as analytic. So we are always capable of doing both fd and compute_partials on a component.
There are two changes we could make in OpenMDAO to remedy this:
1) Don't call compute_partials if no derivatives were explicitly declared as analytic.
2) Filter out any variables declared as 'fd' so that if a user tries to set them in compute_partials, a KeyError is raised (or maybe just a warning, and the derivative value is not overwritten).
In the meantime, the only workarounds would be to comment out the compute_partials method, or alternatively to enclose the component in a group and finite difference the group, as sketched below.
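A minimal sketch of that group finite-difference workaround (my illustration, not from the original thread; approx_totals tells OpenMDAO to FD across the whole group instead of using component partials):

from openmdao.api import Group, Problem

prob = Problem()
sub = prob.model.add_subsystem('sub', Group())
sub.add_subsystem('d2', SellarDis2withDerivatives(), promotes=['*'])
sub.approx_totals(method='fd')  # the group's derivatives come from FD, not compute_partials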
Another workaround is to have an attribute (here called _call_compute_partials) in your class, which tracks whether any analytic derivatives were declared. The conditional in compute_partials() could also be implemented outside the method, where the method is called.
from openmdao.core.explicitcomponent import ExplicitComponent
from openmdao.core.indepvarcomp import IndepVarComp
from openmdao.core.problem import Problem
from openmdao.drivers.scipy_optimizer import ScipyOptimizeDriver

class ExplicitComponent2(ExplicitComponent):

    def __init__(self, **kwargs):
        super(ExplicitComponent2, self).__init__(**kwargs)
        self._call_compute_partials = False

    def declare_partials(self, of, wrt, dependent=True, rows=None, cols=None, val=None,
                         method='exact', step=None, form=None, step_calc=None):
        if method == 'exact':
            self._call_compute_partials = True
        super(ExplicitComponent2, self).declare_partials(of, wrt, dependent, rows, cols, val,
                                                         method, step, form, step_calc)

class Cylinder(ExplicitComponent2):
    """Main class"""

    def setup(self):
        self.add_input('radius', val=1.0)
        self.add_input('height', val=1.0)

        self.add_output('Area', val=1.0)
        self.add_output('Volume', val=1.0)

        # self.declare_partials('*', '*', method='fd')
        # self.declare_partials('*', '*')
        self.declare_partials('Volume', 'height', method='fd')
        self.declare_partials('Volume', 'radius', method='fd')
        self.declare_partials('Area', 'height', method='fd')
        self.declare_partials('Area', 'radius')
        # self.declare_partials('Area', 'radius', method='fd')

    def compute(self, inputs, outputs):
        radius = inputs['radius']
        height = inputs['height']

        area = height * radius * 2 * 3.14 + 3.14 * radius ** 2 * 2
        volume = 3.14 * radius ** 2 * height

        outputs['Area'] = area
        outputs['Volume'] = volume

    def compute_partials(self, inputs, partials):
        if self._call_compute_partials:
            print('Calculate partials...')

if __name__ == "__main__":

    prob = Problem()

    indeps = prob.model.add_subsystem('indeps', IndepVarComp(), promotes=['*'])
    indeps.add_output('radius', 2.)
    indeps.add_output('height', 3.)

    main = prob.model.add_subsystem('cylinder', Cylinder(), promotes=['*'])

    # setup the optimization
    prob.driver = ScipyOptimizeDriver()

    prob.model.add_design_var('radius', lower=0.5, upper=5.)
    prob.model.add_design_var('height', lower=0.5, upper=5.)
    prob.model.add_objective('Area')
    prob.model.add_constraint('Volume', lower=10.)

    prob.setup()
    prob.run_driver()

    print(prob['Volume'])  # should be around 10

Dequeueing from RandomShuffleQueue does not reduce size

In order to train a model, I have encapsulated my model in a class.
I use a tf.RandomShuffleQueue to enqueue a list of filenames.
However, when I dequeue elements, they get dequeued, but the size of the queue does not decrease.
Following are more specific questions, followed by the code snippet:
If I have only 5 images, for example, but the steps range up to 100, would this result in addfilenames being called repeatedly and automatically? It does not give me any error on dequeuing, so I am thinking that it is getting called automatically.
Why is the size of the tf.RandomShuffleQueue not changing? It remains constant.
import os
import time
import functools
import tensorflow as tf
from Read_labelclsloc import readlabel

def ReadTrain(traindir):
    # Returns a list of training images, their labels and a dictionary.
    # The dictionary maps label names to integer numbers.
    return trainimgs, trainlbls, classdict

def ReadVal(valdir, classdict):
    # Reads the validation image labels.
    # Returns a dictionary with filenames as keys and
    # corresponding labels as values.
    return valdict

def lazy_property(function):
    # Just a decorator to make sure that on repeated calls to
    # member functions, ops don't get created repeatedly.
    # Acknowledgements : https://danijar.com/structuring-your-tensorflow-models/
    attribute = '_cache_' + function.__name__

    @property
    @functools.wraps(function)
    def decorator(self):
        if not hasattr(self, attribute):
            setattr(self, attribute, function(self))
        return getattr(self, attribute)
    return decorator

class ModelInitial:

    def __init__(self, traindir, valdir):
        self.graph
        self.traindir = traindir
        self.valdir = valdir
        self.traininginfo()
        self.epoch = 0

    def traininginfo(self):
        self.trainimgs, self.trainlbls, self.classdict = ReadTrain(self.traindir)
        self.valdict = ReadVal(self.valdir, self.classdict)
        with self.graph.as_default():
            self.trainimgs_tensor = tf.constant(self.trainimgs)
            self.trainlbls_tensor = tf.constant(self.trainlbls, dtype=tf.uint16)
            self.trainimgs_dict = {}
            self.trainimgs_dict["ImageFile"] = self.trainimgs_tensor
        return None

    @lazy_property
    def graph(self):
        g = tf.Graph()
        with g.as_default():
            # Layer definitions go here
            return g

    @lazy_property
    def addfilenames(self):
        # This is the function where filenames are pushed to a RandomShuffleQueue
        filename_queue = tf.RandomShuffleQueue(capacity=len(self.trainimgs), min_after_dequeue=0,
                                               dtypes=[tf.string], names=["ImageFile"],
                                               seed=0, name="filename_queue")
        sz_op = filename_queue.size()
        dq_op = filename_queue.dequeue()
        enq_op = filename_queue.enqueue_many(self.trainimgs_dict)
        return filename_queue, enq_op, sz_op, dq_op

    def Train(self):
        # The function for training.
        # I have not written the training part yet.
        # Still struggling with preprocessing.
        with self.graph.as_default():
            filename_q, filename_enqueue_op, sz_op, dq_op = self.addfilenames
            qr = tf.train.QueueRunner(filename_q, [filename_enqueue_op])
            filename_dequeue_op = filename_q.dequeue()
            init_op = tf.global_variables_initializer()

        sess = tf.Session(graph=self.graph)
        sess.run(init_op)
        coord = tf.train.Coordinator()
        enq_threads = qr.create_threads(sess, coord=coord, start=True)
        counter = 0
        for step in range(100):
            print(sess.run(dq_op["ImageFile"]))
            print("Epoch = %d " % (self.epoch))
            print("size = %d" % (sess.run(sz_op)))
            counter += 1
        names = [n.name for n in self.graph.as_graph_def().node]
        coord.request_stop()
        coord.join(enq_threads)
        print("Counter = %d" % (counter))
        return None

if __name__ == "__main__":
    modeltrain = ModelInitial(<Path to training images>,
                              <Path to validation images>)
    a = modeltrain.graph
    print(a)
    modeltrain.Train()
    print("Success")
The mystery is caused by the tf.train.QueueRunner that you created for the queue, which causes it to be filled in the background.
The following lines cause a background "queue runner" thread to be created:
qr = tf.train.QueueRunner(filename_q, [filename_enqueue_op])
# ...
enq_threads = qr.create_threads(sess, coord=coord, start=True)
This thread calls filename_enqueue_op in a loop, which causes the queue to be filled up as you remove elements from it.
The background thread created above will almost always have a pending enqueue operation (filename_enqueue_op) on the queue. This means that after you dequeue a filename, the pending enqueue will run and fill the queue back up to capacity. (Technically there is a race condition here, and you could see a size of capacity - 1, but this is quite unlikely.)
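To see the size actually drop, you can drive the same kind of queue without a QueueRunner; a minimal standalone sketch (TF 1.x API, as in the question):

import tensorflow as tf

filenames = [b'a.jpg', b'b.jpg', b'c.jpg', b'd.jpg', b'e.jpg']
q = tf.RandomShuffleQueue(capacity=len(filenames), min_after_dequeue=0, dtypes=[tf.string])
enq = q.enqueue_many([filenames])
dq = q.dequeue()
sz = q.size()

with tf.Session() as sess:
    sess.run(enq)  # fill the queue once; nothing refills it in the background
    for _ in range(len(filenames)):
        print(sess.run(dq), sess.run(sz))  # size now decreases with every dequeue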