tf.data.Dataset input pipeline delivering bad results compared to Threading/Queue - tensorflow

Earlier I used Threads and Queues as my Data Pipeline and I got a really high Util on both GPUs (the data was created on the fly). I wanted to use the tf Dataset, but I struggle to reproduce the results.
I tried a lot of approches. Since I create data on the fly the from_generator() method seemed perfect. The code you see below is my last try I did. It seems that there is a bottleneck in creating the data although I am using the map() function for the processing of the generated images. What I tried in the code below I wanted to "multithread" the generators somehow, so there is more data coming in at the same time. But no better results so far.
def generator(n):
with tf.device('/cpu:0'):
while True:
...
yield image, label
def get_generator(n):
return partial(generator, n)
def dataset(n):
return tf.data.Dataset.from_generator(get_generator(n), output_types=(tf.float32, tf.float32), output_shapes=(tf.TensorShape([None,None,1]),tf.TensorShape([None,None,1])))
def input_fn():
# ds = tf.data.Dataset.from_generator(generator, output_types=(tf.float32, tf.float32), output_shapes=(tf.TensorShape([None,None,1]),tf.TensorShape([None,None,1])))
ds = tf.data.Dataset.range(BATCH_SIZE).apply(tf.data.experimental.parallel_interleave(dataset, cycle_length=BATCH_SIZE))
ds = ds.map(map_func=lambda img, lbl: processImage(img, lbl))
ds = ds.shuffle(SHUFFLE_SIZE)
ds = ds.batch(BATCH_SIZE)
ds = ds.prefetch(1)
return ds
The expected results would be a high GPU Util (>80%), but for now it is really low 10/20%.

You can use tf.data.Dataset.from_tensor_slices instead.
Just pass image/labels path. The function accepts filenames as argument.
def input_func():
dataset = tf.data.Dataset.from_tensor_slices(images_path, labels_path)
dataset = dataset.shuffle().repeat()
...
return dataset

Related

How to avoid memory leakage in an autoregressive model within tensorflow

Recently, I am training a LSTM with attention mechanism for regressionin tensorflow 2.9 and I met an problem during training with model.fit():
At the beginning, the training time is okay, like 7s/step. However, it was increasing during the process and after several steps, like 1000, the value might be 50s/step. Here below is a part of the code for my model:
class AttentionModel(tf.keras.Model):
def __init__(self, encoder_output_dim, dec_units, dense_dim, batch):
super().__init__()
self.dense_dim = dense_dim
self.batch = batch
encoder = Encoder(encoder_output_dim)
decoder = Decoder(dec_units,dense_dim)
self.encoder = encoder
self.decoder = decoder
def call(self, inputs):
# Creat a tensor to record the result
tempt = list()
encoder_output, encoder_state = self.encoder(inputs)
new_features = np.zeros((self.batch, 1, 1))
dec_initial_state = encoder_state
for i in range(6):
dec_inputs = DecoderInput(new_features=new_features, enc_output=encoder_output)
dec_result, dec_state = self.decoder(dec_inputs, dec_initial_state)
tempt.append(dec_result.logits)
new_features = dec_result.logits
dec_initial_state = dec_state
result=tf.concat(tempt,1)
return result
In the official documents for tf.function, I notice: "Don't rely on Python side effects like object mutation or list appends".
Since I use a dynamic python list with append() to record the intermediate variables, I guess each time during training, a new tf.graph was added. Is the reason my training is getting slower and slower?
Additionally, what should I use instead of python list to avoid this? I have tried with a numpy.zeros matrix but it will lead to another problem:
tempt = np.zeros(shape=(1,6))
...
for i in range(6):
dec_inputs = DecoderInput(new_features=new_features, enc_output=encoder_output)
dec_result, dec_state = self.decoder(dec_inputs, dec_initial_state)
tempt[i]=(dec_result.logits)
...
Cannot convert a symbolic tf.Tensor (decoder/dense_3/BiasAdd:0) to a numpy array. This error may indicate that you're trying to pass a Tensor to a NumPy call, which is not supported.

Metrics using batches v/s metrics using full dataset

I am using training an image classification model using the pre-trained mobile network. During training, I am seeing very high values (more than 70%) for Accuracy, Precision, Recall, and F1-score on both the training dataset and validation dataset.
For me, this is an indication that my model is learning fine.
But when I checked these metrics on an Unbatched training and Unbatched Validation these metrics are very low. These are not even 1%.
Unbatched dataset means I am not taking calculating these metrics over batches and not taking the average of metrics to calculate the final metrics which is what Tensorflow/Keras does during model training. I am calculating these metrics on a full dataset in a single run
I am unable to find out what is causing this Behaviour. Please help me understand what is causing this difference and how to ensure that results are consistent on both, i.e. a minor difference is acceptable.
Code that I used for evaluating metrics
My old code
def test_model(model, data, CLASSES, label_one_hot=True, average="micro",
threshold_analysis=False, thres_analysis_start_point=0.0,
thres_analysis_end_point=0.95, thres_step=0.05, classwise_analysis=False,
produce_confusion_matrix=False):
images_ds = data.map(lambda image, label: image)
labels_ds = data.map(lambda image, label: label).unbatch()
NUM_VALIDATION_IMAGES = count_data_items(tf_records_filenames=data)
cm_correct_labels = next(iter(labels_ds.batch(NUM_VALIDATION_IMAGES))).numpy() # get everything as one batch
if label_one_hot is True:
cm_correct_labels = np.argmax(cm_correct_labels, axis=-1)
cm_probabilities = model.predict(images_ds)
cm_predictions = np.argmax(cm_probabilities, axis=-1)
warnings.filterwarnings('ignore')
overall_score = f1_score(cm_correct_labels, cm_predictions, labels=range(len(CLASSES)), average=average)
overall_precision = precision_score(cm_correct_labels, cm_predictions, labels=range(len(CLASSES)), average=average)
overall_recall = recall_score(cm_correct_labels, cm_predictions, labels=range(len(CLASSES)), average=average)
# cmat = (cmat.T / cmat.sum(axis=1)).T # normalized
# print('f1 score: {:.3f}, precision: {:.3f}, recall: {:.3f}'.format(score, precision, recall))
overall_test_results = {'overall_f1_score': overall_score, 'overall_precision':overall_precision, 'overall_recall':overall_recall}
if classwise_analysis is True:
label_index_dict = get_index_label_from_tf_record(dataset=data)
label_index_dict = {k:v for k, v in sorted(list(label_index_dict.items()))}
label_index_df = pd.DataFrame(label_index_dict, index=[0]).T.reset_index().rename(columns={'index':'class_ind', 0:'class_names'})
# Class wise precision, recall and f1_score
classwise_score = f1_score(cm_correct_labels, cm_predictions, labels=range(len(CLASSES)), average=None)
classwise_precision = precision_score(cm_correct_labels, cm_predictions, labels=range(len(CLASSES)), average=None)
classwise_recall = recall_score(cm_correct_labels, cm_predictions, labels=range(len(CLASSES)), average=None)
ind_class_count_df = class_ind_counter_from_tfrecord(data)
ind_class_count_df = ind_class_count_df.merge(label_index_df, how='left', left_on='class_names', right_on='class_names')
classwise_test_results = {'classwise_f1_score':classwise_score, 'classwise_precision':classwise_precision,
'classwise_recall':classwise_recall, 'class_names':CLASSES}
classwise_test_results_df = pd.DataFrame(classwise_test_results)
if produce_confusion_matrix is True:
cmat = confusion_matrix(cm_correct_labels, cm_predictions, labels=range(len(CLASSES)))
return overall_test_results, classwise_test_results, cmat
return overall_test_results, classwise_test_results
if produce_confusion_matrix is True:
cmat = confusion_matrix(cm_correct_labels, cm_predictions, labels=range(len(CLASSES)))
return overall_test_results, cmat
warnings.filterwarnings('always')
return overall_test_results
Just to ensure that my model testing function is correct I write a newer version of code in TensorFlow.
def eval_model(y_true, y_pred):
eval_results = {}
unbatch_accuracy = tf.keras.metrics.CategoricalAccuracy(name='unbatch_accuracy')
unbatch_recall = tf.keras.metrics.Recall(name='unbatch_recall')
unbatch_precision = tf.keras.metrics.Precision(name='unbatch_precision')
unbatch_f1_micro = tfa.metrics.F1Score(name='unbatch_f1_micro', num_classes=n_labels, average='micro')
unbatch_f1_macro = tfa.metrics.F1Score(name='unbatch_f1_macro', num_classes=n_labels, average='macro')
unbatch_accuracy.update_state(y_true, y_pred)
unbatch_recall.update_state(y_true, y_pred)
unbatch_precision.update_state(y_true, y_pred)
unbatch_f1_micro.update_state(y_true, y_pred)
unbatch_f1_macro.update_state(y_true, y_pred)
eval_results['unbatch_accuracy'] = unbatch_accuracy.result().numpy()
eval_results['unbatch_recall'] = unbatch_recall.result().numpy()
eval_results['unbatch_precision'] = unbatch_precision.result().numpy()
eval_results['unbatch_f1_micro'] = unbatch_f1_micro.result().numpy()
eval_results['unbatch_f1_macro'] = unbatch_f1_macro.result().numpy()
unbatch_accuracy.reset_states()
unbatch_recall.reset_states()
unbatch_precision.reset_states()
unbatch_f1_micro.reset_states()
unbatch_f1_macro.reset_states()
return eval_results
The results are nearly the same by using both of the functions.
Please suggest what is going on here.
I think this sugesstion MAY help you, I am not sure. in this, you added
unbatch_accuracy.reset_states()
unbatch_recall.reset_states()
unbatch_precision.reset_states()
unbatch_f1_micro.reset_states()
unbatch_f1_macro.reset_states()
resetting states at each epoch maybe not be a cumulative one
After spending many hours, I found the issue was due to the shuffle function. I was using the below function to shuffle, batch and prefetch the dataset.
def shuffle_batch_prefetch(dataset, prefetch_size=1, batch_size=16,
shuffle_buffer_size=None,
drop_remainder=False,
interleave_num_pcall=None):
if shuffle_buffer_size is None:
raise ValueError("shuffle_buffer_size can't be None")
def shuffle_fn(ds):
return ds.shuffle(buffer_size=shuffle_buffer_size, seed=108)
dataset = dataset.apply(shuffle_fn)
dataset = dataset.batch(batch_size, drop_remainder=drop_remainder)
dataset = dataset.prefetch(buffer_size=prefetch_size)
return dataset
Part of the function that causes the problem
def shuffle_fn(ds):
return ds.shuffle(buffer_size=shuffle_buffer_size, seed=108)
dataset = dataset.apply(shuffle_fn)
I removed the shuffle part and metrics are back as per the expectation.
Function after removing the shuffle part
def shuffle_batch_prefetch(dataset, prefetch_size=1, batch_size=16,
drop_remainder=False,
interleave_num_pcall=None):
dataset = dataset.batch(batch_size, drop_remainder=drop_remainder)
dataset = dataset.prefetch(buffer_size=prefetch_size)
return dataset
Results after removing the shuffle part
I am still not able to understand why shuffling causes this error. Shuffling was the best practice to follow before training your data. Although, I have already shuffled training data during data read time so removing this was not a problem for me

Sampling for large class and augmentation for small classes in each batch

Let's say we have 2 classes one is small and the second is large.
I would like to use for data augmentation similar to ImageDataGenerator
for the small class, and sampling from each batch, in such a way, that, that each batch would be balanced. (Fro minor class- augmentation for major class- sampling).
Also, I would like to continue using image_dataset_from_directory (since the dataset doesn't fit into RAM).
What about
sample_from_datasets
function?
import tensorflow as tf
from tensorflow.python.data.experimental import sample_from_datasets
def augment(val):
# Example of augmentation function
return val - tf.random.uniform(shape=tf.shape(val), maxval=0.1)
big_dataset_size = 1000
small_dataset_size = 10
# Init some datasets
dataset_class_large_positive = tf.data.Dataset.from_tensor_slices(tf.range(100, 100 + big_dataset_size, dtype=tf.float32))
dataset_class_small_negative = tf.data.Dataset.from_tensor_slices(-tf.range(1, 1 + small_dataset_size, dtype=tf.float32))
# Upsample and augment small dataset
dataset_class_small_negative = dataset_class_small_negative \
.repeat(big_dataset_size // small_dataset_size) \
.map(augment)
dataset = sample_from_datasets(
datasets=[dataset_class_large_positive, dataset_class_small_negative],
weights=[0.5, 0.5]
)
dataset = dataset.shuffle(100)
dataset = dataset.batch(6)
iterator = dataset.as_numpy_iterator()
for i in range(5):
print(next(iterator))
# [109. -10.044552 136. 140. -1.0505208 -5.0829906]
# [122. 108. 141. -4.0211563 126. 116. ]
# [ -4.085523 111. -7.0003924 -7.027302 -8.0362625 -4.0226436]
# [ -9.039093 118. -1.0695585 110. 128. -5.0553837]
# [100. -2.004463 -9.032592 -8.041705 127. 149. ]
Set up the desired balance between the classes in the weights parameter of sample_from_datasets.
As it was noticed by
Yaoshiang,
the last batches are imbalanced and the datasets length are different. This can be avoided by
# Repeat infinitely both datasets and augment the small one
dataset_class_large_positive = dataset_class_large_positive.repeat()
dataset_class_small_negative = dataset_class_small_negative.repeat().map(augment)
instead of
# Upsample and augment small dataset
dataset_class_small_negative = dataset_class_small_negative \
.repeat(big_dataset_size // small_dataset_size) \
.map(augment)
This case, however, the dataset is infinite and the number of batches in epoch has to be further controlled.
You can use tf.data.Dataset.from_generator that allows more control on your data generation without loading all your data into RAM.
def generator():
i=0
while True :
if i%2 == 0:
elem = large_class_sample()
else :
elem =small_class_augmented()
yield elem
i=i+1
ds= tf.data.Dataset.from_generator(
generator,
output_signature=(
tf.TensorSpec(shape=yourElem_shape , dtype=yourElem_ype))
This generator will alterate samples between the two classes,and you can add more dataset operations(batch , shuffle..)
I didn't totally follow the problem. Would psuedo-code this work? Perhaps there are some operators on tf.data.Dataset that are sufficient to solve your problem.
ds = image_dataset_from_directory(...)
ds1=ds.filter(lambda image, label: label == MAJORITY)
ds2=ds.filter(lambda image, label: label != MAJORITY)
ds2 = ds2.map(lambda image, label: data_augment(image), label)
ds1.batch(int(10. / MAJORITY_RATIO))
ds2.batch(int(10. / MINORITY_RATIO))
ds3 = ds1.zip(ds2)
ds3 = ds3.map(lambda left, right: tf.concat(left, right, axis=0)
You can use the tf.data.Dataset.from_tensor_slices to load the images of two categories seperately and do data augmentation for the minority class. Now that you have two datasets combine them with tf.data.Dataset.sample_from_datasets.
# assume class1 is the minority class
files_class1 = glob('class1\\*.jpg')
files_class2 = glob('class2\\*.jpg')
def augment(filepath):
class_name = tf.strings.split(filepath, os.sep)[0]
image = tf.io.read_file(filepath)
image = tf.expand_dims(image, 0)
if tf.equal(class_name, 'class1'):
# do all the data augmentation
image_flip = tf.image.flip_left_right(image)
return [[image, class_name],[image_flip, class_name]]
# apply data augmentation for class1
train_class1 = tf.data.Dataset.from_tensor_slices(files_class1).\
map(augment,num_parallel_calls=tf.data.AUTOTUNE)
train_class2 = tf.data.Dataset.from_tensor_slices(files_class2)
dataset = tf.python.data.experimental.sample_from_datasets(
datasets=[train_class1,train_class2],
weights=[0.5, 0.5])
dataset = dataset.batch(BATCH_SIZE)

Does tf.dataset.Dataset.interleave() offer any benefit if items are simple images?

Assuming we load images from disk using tensorflow dataset like this:
paths = tf.data.Dataset.list_files("/path/to/dataset/train-*.png")
images = paths.map(load_img_from_path, num_parallel_calls=tf.data.AUTOTUNE)
In this special case: Would we gain any benefit (speed) in using interleave() instead of map()?
P.s.: I asked a broader question here, but the question above was not answered. Or more specifically the existing answer only compared the scenario above with another scenario using a tf.records dataset.
In the example below, interleave() is actually significantly slower than map(), but I'm probably misunderstanding something...
PATHS_IMG = glob.glob(pattern)
BATCH_SIZE = 4
def main():
measure_time(func_map) # ~20s
measure_time(func_interleave) # ~60s
measure_time(func_map) # ~20s
def measure_time(func_parse):
t0 = time.time()
# setup dataset and iterate over it
ds = tf.data.Dataset.from_tensor_slices(PATHS_IMG)
ds = func_parse(ds)
ds = ds.batch(BATCH_SIZE)
ds = ds.prefetch(1) # same time measurements if we set ds.prefetch(4)
for i, item in enumerate(ds):
print(f"{i}/{len(PATHS_IMG)//BATCH_SIZE}: len(item)={len(item)} and item[0].shape={item[0].shape}")
# return
print(f"It took {time.time() - t0} seconds")
def func_map(ds):
ds = ds.map(_parse_tf, num_parallel_calls=BATCH_SIZE)
return ds
def func_interleave(ds):
ds = ds.interleave(
lambda p: tf.data.Dataset.from_tensors(_parse_tf(p)),
num_parallel_calls=BATCH_SIZE,
)
return ds
def _parse_tf(path):
def _parse_np(path):
img = PIL.Image.open(path) # uint8
img = np.array(img, dtype=np.float32)
img = img[:512, :512, :]
return img
return tf.numpy_function(_parse_np, [path], tf.float32)

Does the Tensorflow Dataset API include Queue functionality? [duplicate]

TL;DR: how to ensure that data is loaded in multi threaded manner when using Dataset api in tensorflow 0.1.4?
Previously I did something like this with my images in disk:
filename_queue = tf.train.string_input_producer(filenames)
image_reader = tf.WholeFileReader()
_, image_file = image_reader.read(filename_queue)
imsize = 120
image = tf.image.decode_jpeg(image_file, channels=3)
image = tf.image.convert_image_dtype(image, dtype=tf.float32)
image_r = tf.image.resize_images(image, [imsize, imsize])
images = tf.train.shuffle_batch([image_r],
batch_size=20,
num_threads=30,
capacity=200,
min_after_dequeue=0)
This ensures that there will be 20 threads getting data ready for next learning iterations.
Now with the Dataset api I do something like:
dataset = tf.data.Dataset.from_tensor_slices((filenames, filenames_up, filenames_blacked))
dataset = dataset.map(parse_upscaler_corrector_batch)
After this I create an iterator:
sess = tf.Session();
iterator = dataset.make_initializable_iterator();
next_element = iterator.get_next();
sess.run(iterator.initializer);
value = sess.run(next_element)
Variable value will be passed for further processing.
So how do I ensure that data is being prepared in multui-threading manner here? Where could I read about Dataset api and multi threading data read?
So it appears that the way to achieve this is as follows:
dataset = dataset.map(parse_upscaler_corrector_batch, num_parallel_calls=12).prefetch(32).batch(self.ex_config.batch_size)
If one changes num_parallel_calls=12 one can see that both network/hdd load and cpu load either spike or decrease.