PyTorch Multi-GPU K80s Batch fails for Tensors

My training works with mini-batches on a single GPU (the default):
if USE_CUDA:
    encoderchar = encoderchar.cuda()
    encoder = encoder.cuda()
    decoder = decoder.cuda()
But when I train with all available GPUs, I get an error:
if USE_CUDA:
    encoderchar = torch.nn.DataParallel(encoderchar, device_ids=[0, 1, 2, 3, 4, 5, 6, 7])
    encoder = torch.nn.DataParallel(encoder, device_ids=[0, 1, 2, 3, 4, 5, 6, 7])
    decoder = torch.nn.DataParallel(decoder, device_ids=[0, 1, 2, 3, 4, 5, 6, 7])
    encoderchar = encoderchar.cuda()
    encoder = encoder.cuda()
    decoder = decoder.cuda()
I get the following error during the forward pass.
RuntimeError Traceback (most recent call last)
<ipython-input-10-227f3e86847c> in <module>()
18 loss, ar1, ar2 = train(data_input_batch_index, data_input_batch_length, data_target_batch_index, data_target_batch_length,
19 encoderchar, encoder, decoder, encoderchar_optimizer, encoder_optimizer, decoder_optimizer,
---> 20 criterion, batch_size)
21
22 # Keep track of loss
<ipython-input-8-21861d792653> in train(input_batch, input_batch_length, target_batch, target_batch_length, encoderchar, encoder, decoder, encoderchar_optimizer, encoder_optimizer, decoder_optimizer, criterion, batch_size)
21 #reshaped_input_length = Variable(torch.LongTensor(reshaped_input_length)).cuda()
22 hidden_all, output = encoderchar(w, reshaped_input_length)
---> 23 encoder_input[ix] = output.transpose(0,1).contiguous().view(batch_size, -1)
24
25 temporary_target_batch_length = [15] * batch_size
/home/ubuntu/anaconda3/envs/tensorflow/lib/python3.6/site-packages/torch/autograd/variable.py in __setitem__(self, key, value)
78 else:
79 if isinstance(value, Variable):
---> 80 return SetItem(key)(self, value)
81 else:
82 return SetItem(key, value)(self)
/home/ubuntu/anaconda3/envs/tensorflow/lib/python3.6/site-packages/torch/autograd/_functions/tensor.py in forward(self, i, value)
37 else: # value is Tensor
38 self.value_size = value.size()
---> 39 i._set_index(self.index, value)
40 return i
41
RuntimeError: sizes do not match at /py/conda-bld/pytorch_1493681908901/work/torch/lib/THC/THCTensorCopy.cu:31
The parameters passed to the encoderchar forward are a CUDA LongTensor and a Python list:
hidden_all, output = encoderchar(w, reshaped_input_length)
encoder_input[ix] = output.transpose(0, 1).contiguous().view(batch_size, -1)
nvidia-smi shows the following after the error is thrown.
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
| 0 18320 C python 453MiB |
| 1 18320 C python 266MiB |
| 2 18320 C python 266MiB |
| 3 18320 C python 266MiB |
| 4 18320 C python 266MiB |
| 5 18320 C python 266MiB |
| 6 18320 C python 266MiB |
| 7 18320 C python 262MiB |
+-----------------------------------------------------------------------------+
What's wrong here?

DataParallel needs to know which dimension to split the input data along (i.e., which dimension is the batch dimension). By default it assumes that the batch dimension of the input is dim=0.
In your case, the batch dimension of the inputs to the encoderchar module is dim 1.
So either modify your DataParallel instantiation, specifying dim=1:
encoderchar = torch.nn.DataParallel(encoderchar, device_ids=[0, 1, 2, 3, 4, 5, 6, 7], dim=1)
Or change the input shape so that the batch dimension is moved to dim 0:
w = w.view(batch_size, -1)
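For concreteness, here is a minimal sketch of the first option using the module and device names from the question. Only the encoderchar line follows directly from the answer; keeping the default dim=0 for encoder and decoder is an assumption that their inputs are batch-first, so adjust those too if they are not:
if USE_CUDA:
    # scatter encoderchar's inputs along dim 1, where its batch dimension lives
    encoderchar = torch.nn.DataParallel(encoderchar, device_ids=[0, 1, 2, 3, 4, 5, 6, 7], dim=1)
    # encoder and decoder are assumed to take batch-first inputs, so dim stays at the default 0
    encoder = torch.nn.DataParallel(encoder, device_ids=[0, 1, 2, 3, 4, 5, 6, 7])
    decoder = torch.nn.DataParallel(decoder, device_ids=[0, 1, 2, 3, 4, 5, 6, 7])
    encoderchar = encoderchar.cuda()
    encoder = encoder.cuda()
    decoder = decoder.cuda()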

Related

Error resulting from ImageDataGenerator during data augmentation

Can someone please help me fix this error? The code works fine before the for loop (an array of the image is printed there). Is there something wrong with the for loop? The expected output is a set of augmented images of the input image saved to disk. The input image is a jpg.
The code I wrote:
import keras
import tensorflow as tf
from keras.preprocessing.image import ImageDataGenerator
from skimage import io  # needed for io.imread below; this import was missing from the snippet

data_gen = tf.keras.preprocessing.image.ImageDataGenerator(
    rotation_range=45,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True,
    fill_mode='contrast',
    cval=125
)

x = io.imread('mona.jpg')
x = x.reshape((1,) + x.shape)  # array with shape (1, 256, 256, 3)

i = 0
for batch in data_gen.flow(x, batch_size=16, save_to_dir='/Users/ghad/Desktop',
                           save_prefix='aug',
                           save_format='jpg'):
    i += 1
    if i > 20:
        break  # the snippet was cut off here; a break is needed to end the infinite generator loop
The generated error:
RuntimeError Traceback (most recent call last)
Input In [14], in <cell line: 31>()
28 x = x.reshape((1, ) + x.shape) #Array with shape (1, 256, 256, 3)
30 i = 0
---> 31 for batch in data_gen.flow(x, batch_size=16,
32 save_to_dir='/Users/ghadahalhabib/Desktop',
33 save_prefix='aug',
34 save_format='jpg'):
35 i += 1
36 if i > 20:
File ~/opt/anaconda3/envs/tensorflow/lib/python3.9/site-packages/keras/preprocessing/image.py:148, in Iterator.__next__(self, *args, **kwargs)
147 def __next__(self, *args, **kwargs):
--> 148 return self.next(*args, **kwargs)
File ~/opt/anaconda3/envs/tensorflow/lib/python3.9/site-packages/keras/preprocessing/image.py:160, in Iterator.next(self)
157 index_array = next(self.index_generator)
158 # The transformation of images is not under thread lock
159 # so it can be done in parallel
--> 160 return self._get_batches_of_transformed_samples(index_array)
File ~/opt/anaconda3/envs/tensorflow/lib/python3.9/site-packages/keras/preprocessing/image.py:709, in NumpyArrayIterator._get_batches_of_transformed_samples(self, index_array)
707 x = self.x[j]
708 params = self.image_data_generator.get_random_transform(x.shape)
--> 709 x = self.image_data_generator.apply_transform(
710 x.astype(self.dtype), params)
711 x = self.image_data_generator.standardize(x)
712 batch_x[i] = x
File ~/opt/anaconda3/envs/tensorflow/lib/python3.9/site-packages/keras/preprocessing/image.py:1800, in ImageDataGenerator.apply_transform(self, x, transform_parameters)
1797 img_col_axis = self.col_axis - 1
1798 img_channel_axis = self.channel_axis - 1
-> 1800 x = apply_affine_transform(
1801 x,
1802 transform_parameters.get('theta', 0),
1803 transform_parameters.get('tx', 0),
1804 transform_parameters.get('ty', 0),
1805 transform_parameters.get('shear', 0),
1806 transform_parameters.get('zx', 1),
1807 transform_parameters.get('zy', 1),
1808 row_axis=img_row_axis,
1809 col_axis=img_col_axis,
1810 channel_axis=img_channel_axis,
1811 fill_mode=self.fill_mode,
1812 cval=self.cval,
1813 order=self.interpolation_order)
1815 if transform_parameters.get('channel_shift_intensity') is not None:
1816 x = apply_channel_shift(x,
1817 transform_parameters['channel_shift_intensity'],
1818 img_channel_axis)
File ~/opt/anaconda3/envs/tensorflow/lib/python3.9/site-packages/keras/preprocessing/image.py:2324, in apply_affine_transform(x, theta, tx, ty, shear, zx, zy, row_axis, col_axis, channel_axis, fill_mode, cval, order)
2321 final_affine_matrix = transform_matrix[:2, :2]
2322 final_offset = transform_matrix[:2, 2]
-> 2324 channel_images = [ndimage.interpolation.affine_transform( # pylint: disable=g-complex-comprehension
2325 x_channel,
2326 final_affine_matrix,
2327 final_offset,
2328 order=order,
2329 mode=fill_mode,
2330 cval=cval) for x_channel in x]
2331 x = np.stack(channel_images, axis=0)
2332 x = np.rollaxis(x, 0, channel_axis + 1)
File ~/opt/anaconda3/envs/tensorflow/lib/python3.9/site-packages/keras/preprocessing/image.py:2324, in <listcomp>(.0)
2321 final_affine_matrix = transform_matrix[:2, :2]
2322 final_offset = transform_matrix[:2, 2]
-> 2324 channel_images = [ndimage.interpolation.affine_transform( # pylint: disable=g-complex-comprehension
2325 x_channel,
2326 final_affine_matrix,
2327 final_offset,
2328 order=order,
2329 mode=fill_mode,
2330 cval=cval) for x_channel in x]
2331 x = np.stack(channel_images, axis=0)
2332 x = np.rollaxis(x, 0, channel_axis + 1)
File ~/opt/anaconda3/envs/tensorflow/lib/python3.9/site-packages/scipy/ndimage/interpolation.py:574, in affine_transform(input, matrix, offset, output_shape, output, order, mode, cval, prefilter)
572 npad = 0
573 filtered = input
--> 574 mode = _ni_support._extend_mode_to_code(mode)
575 matrix = numpy.asarray(matrix, dtype=numpy.float64)
576 if matrix.ndim not in [1, 2] or matrix.shape[0] < 1:
File ~/opt/anaconda3/envs/tensorflow/lib/python3.9/site-packages/scipy/ndimage/_ni_support.py:54, in _extend_mode_to_code(mode)
52 return 6
53 else:
---> 54 raise RuntimeError('boundary mode not supported')
RuntimeError: boundary mode not supported
For the code
for batch in data_gen.flow(x, batch_size=16, save_to_dir='/Users/ghad/Desktop', save_prefix='aug', save_format='jpg'):
you are feeding in only a single image but asking for batches of 16 augmented images. That won't work: normally the length of x is larger than the batch size. Set the batch size to 1; that way you produce one augmented image each time the generator yields a batch. Note also that the RuntimeError in the traceback comes from fill_mode='contrast': 'contrast' is not a valid fill_mode (the valid options are 'constant', 'nearest', 'reflect', and 'wrap'), which is what makes scipy raise "boundary mode not supported".
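A minimal corrected sketch along those lines, using the image and save path from the question; swapping in 'nearest' for the invalid fill_mode and importing io from scikit-image are assumptions on my part:
import tensorflow as tf
from skimage import io  # assumed source of io.imread in the question

data_gen = tf.keras.preprocessing.image.ImageDataGenerator(
    rotation_range=45,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True,
    fill_mode='nearest',  # 'contrast' is not a valid fill mode
)

x = io.imread('mona.jpg')
x = x.reshape((1,) + x.shape)  # a single image, shape (1, H, W, 3)

i = 0
for batch in data_gen.flow(x, batch_size=1,  # one image in, one augmented image per iteration
                           save_to_dir='/Users/ghad/Desktop',
                           save_prefix='aug',
                           save_format='jpg'):
    i += 1
    if i > 20:  # stop after 20 augmented images
        break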

How to split the dataset into inputs and labels in tensorflow?

Consider the code below. I want to split the tensorflow.python.data.ops.dataset_ops.BatchDataset into inputs and labels using the function below, yet I get the error 'BatchDataset' object is not subscriptable. Can anyone help me with that?
import tensorflow as tf

input_slice = 3
labels_slice = 2

def split_window(features):
    inputs = features[:, input_slice, :]
    labels = features[:, labels_slice, :]

##### create a batch dataset
dataset = tf.data.Dataset.range(1, 25 + 1).batch(5)

##### split the dataset into inputs and labels
dataset = split_window(dataset)
The dataset without the split window looks like this:
tf.Tensor([1 2 3 4 5], shape=(5,), dtype=int64)
tf.Tensor([ 6 7 8 9 10], shape=(5,), dtype=int64)
tf.Tensor([11 12 13 14 15], shape=(5,), dtype=int64)
tf.Tensor([16 17 18 19 20], shape=(5,), dtype=int64)
tf.Tensor([21 22 23 24 25], shape=(5,), dtype=int64)
But what I want is to split it into inputs and labels like this:
Inputs:
[1 2 3 ]
[ 6 7 8 ]
[11 12 13 ]
[16 17 18 ]
[21 22 23 ]
Labels:
[4 5]
[9 10]
[14 15]
[19 20]
[24 25]
You can try this:
import tensorflow as tf

input_slice = 3
labels_slice = 2

def split_window(x):
    features = tf.slice(x, [0], [input_slice])
    labels = tf.slice(x, [input_slice], [labels_slice])
    return features, labels

dataset = tf.data.Dataset.range(1, 25 + 1).batch(5).map(split_window)
for i, j in dataset:
    print(i.numpy(), end="->")
    print(j.numpy())
[1 2 3]->[4 5]
[6 7 8]->[ 9 10]
[11 12 13]->[14 15]
[16 17 18]->[19 20]
[21 22 23]->[24 25]
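Equivalently (an alternative sketch, not part of the original answer), the mapped function can use plain tensor slicing instead of tf.slice:
def split_window(x):
    # each batch holds 5 consecutive values: the first 3 are inputs, the last 2 are labels
    return x[:input_slice], x[input_slice:input_slice + labels_slice]

dataset = tf.data.Dataset.range(1, 25 + 1).batch(5).map(split_window)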
You can't apply a Python function directly to a tf.data.Dataset; you need to use the .map() method. Also, your function returns nothing.
import tensorflow as tf

input_slice = 3
labels_slice = 2

def split_window(features):
    inputs = tf.gather_nd(features, [input_slice])
    labels = tf.gather_nd(features, [labels_slice])
    return inputs, labels

dataset = tf.data.Dataset.range(1, 25 + 1).batch(5).map(split_window)
for x, y in dataset:
    print(x.numpy(), y.numpy())
4 3
9 8
14 13
19 18
24 23

Conditional sum of two 3D arrays; condition is a 1D array

I want to conditionally add two numpy arrays of 3D points (shape (N, 3)), where the condition is given as a 1D array (shape (N,)).
What's an efficient (vectorized) way to do this? numpy.where() only supports a conditional operation where all three arrays (including the condition) have matching dimensions.
For instance:
a = np.asarray([[1, 1], [2, 2], [3, 3], [4, 4]])
b = np.asarray([[0.2, 0.2], [0.3, 0.3], [0.4, 0.4], [0.5, 0.5]])
c = np.asarray([0, 1, 1, 0])
I would like to be able to do:
np.where(c == 1, a + b, a)
i.e., add a + b element-wise wherever the corresponding element of c equals 1, and keep the value from a otherwise.
However, I get an error instead:
ValueError: operands could not be broadcast together with shapes (4,) (4,3) (4,3)
Use boolean indexing:
import numpy as np
a = np.random.randint(20, 30, size=(5, 3))
#[[23 29 23]
# [20 27 24]
# [28 26 26]
# [27 20 26]
# [23 24 23]]
b = np.random.randint(20, 30, size=(5, 3))
#[[22 25 20]
# [28 29 20]
# [29 22 29]
# [28 28 21]
# [22 26 27]]
c = np.random.randint(0, 2, size=5).astype(bool)
# [ True True True False False]
r = a + b
r[~c] = a[~c] # keeping the default value if the corresponding value is 0 (or False) in c.
print(r)
[[45 54 43]
[48 56 44]
[57 48 55]
[27 20 26]
[23 24 23]]
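Applied to the arrays from the question, the same boolean-indexing idea looks like this (a minimal sketch; the broadcast np.where variant at the end is an alternative, not part of the answer above):
import numpy as np

a = np.asarray([[1, 1], [2, 2], [3, 3], [4, 4]])
b = np.asarray([[0.2, 0.2], [0.3, 0.3], [0.4, 0.4], [0.5, 0.5]])
c = np.asarray([0, 1, 1, 0])

mask = c == 1        # 1D boolean mask over the rows
r = a.astype(float)  # promote to float so the fractional parts of b survive
r[mask] += b[mask]   # add b only to the rows where c == 1
print(r)
# [[1.  1. ]
#  [2.3 2.3]
#  [3.4 3.4]
#  [4.  4. ]]

# The original np.where attempt also works once the 1D condition is broadcast to 2D:
r2 = np.where((c == 1)[:, None], a + b, a)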

Tensorflow: how to make sure all samples in each batch are with the same label?

I wonder whether there is a way to apply constraints on the batches that TensorFlow generates. For example, say we are training a CNN on a huge dataset for image classification. Is it possible to force TensorFlow to generate batches where all samples have the same class? For instance, one batch of images all tagged "Apple", and another where all samples are tagged "Orange".
The reason I ask is that I want to run some experiments to see how different levels of shuffling influence the final trained models. Sample-level shuffling is common practice for CNN training, and everybody does it. I just want to check it myself and get first-hand knowledge of the effect.
Thanks!
Dataset.filter() can be used:
import numpy as np
import tensorflow as tf

labels = np.random.randint(0, 10, (10000,))
data = np.random.uniform(size=(10000, 5))
ds = tf.data.Dataset.from_tensor_slices((data, labels))
ds = ds.filter(lambda data, labels: tf.equal(labels, 1))  # comment this line out for the unfiltered case
ds = ds.batch(5)
iterator = ds.make_one_shot_iterator()
vals = iterator.get_next()
with tf.Session() as sess:
    for _ in range(5):
        py_data, py_labels = sess.run(vals)
        print(py_labels)
with ds.filter():
> [1 1 1 1 1]
[1 1 1 1 1]
[1 1 1 1 1]
[1 1 1 1 1]
[1 1 1 1 1]
without ds.filter():
> [8 0 7 6 3]
[2 4 7 6 1]
[1 8 5 5 5]
[7 1 7 4 0]
[7 1 8 0 0]
Edit: the following code shows how to use a feedable iterator to perform batch label selection on the fly; see the "Creating an iterator" section of the TensorFlow data guide.
import random

import tensorflow as tf

labels = ['Apple'] * 100 + ['Orange'] * 100
data = list(range(200))
random.shuffle(labels)
batch_size = 4

ds_apple = tf.data.Dataset.from_tensor_slices((data, labels)).filter(
    lambda data, label: tf.equal(label, 'Apple')).batch(batch_size)
ds_orange = tf.data.Dataset.from_tensor_slices((data, labels)).filter(
    lambda data, label: tf.equal(label, 'Orange')).batch(batch_size)

handle = tf.placeholder(tf.string, [])
iterator = tf.data.Iterator.from_string_handle(
    handle, ds_apple.output_types, ds_apple.output_shapes)
batch = iterator.get_next()

apple_iterator = ds_apple.make_one_shot_iterator()
orange_iterator = ds_orange.make_one_shot_iterator()

with tf.Session() as sess:
    apple_handle = sess.run(apple_iterator.string_handle())
    orange_handle = sess.run(orange_iterator.string_handle())

    # loop and switch back and forth between apples and oranges
    for _ in range(3):
        feed_dict = {handle: apple_handle}
        print(sess.run(batch, feed_dict=feed_dict))
        feed_dict = {handle: orange_handle}
        print(sess.run(batch, feed_dict=feed_dict))
Typical output for this is as follows. Note that the data values increase monotonically across Apple and Orange batches showing that the iterators are not resetting.
> (array([2, 3, 6, 7], dtype=int32), array([b'Apple', b'Apple', b'Apple', b'Apple'], dtype=object))
(array([0, 1, 4, 5], dtype=int32), array([b'Orange', b'Orange', b'Orange', b'Orange'], dtype=object))
(array([ 9, 13, 15, 19], dtype=int32), array([b'Apple', b'Apple', b'Apple', b'Apple'], dtype=object))
(array([ 8, 10, 11, 12], dtype=int32), array([b'Orange', b'Orange', b'Orange', b'Orange'], dtype=object))
(array([21, 22, 23, 25], dtype=int32), array([b'Apple', b'Apple', b'Apple', b'Apple'], dtype=object))
(array([14, 16, 17, 18], dtype=int32), array([b'Orange', b'Orange', b'Orange', b'Orange'], dtype=object))
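For what it's worth, the same Dataset.filter idea carries over to TF 2.x eager execution, where the session and iterator plumbing is no longer needed. A minimal sketch under that assumption, reusing the random data from the first example:
import numpy as np
import tensorflow as tf

labels = np.random.randint(0, 10, 10000)
data = np.random.uniform(size=(10000, 5))

# keep only samples whose label is 1, then batch them
ds = tf.data.Dataset.from_tensor_slices((data, labels))
ds = ds.filter(lambda d, l: tf.equal(l, 1)).batch(5)

for d, l in ds.take(5):
    print(l.numpy())  # each batch prints [1 1 1 1 1]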

Does 'tf.assign' return its argument?

The Tensorflow documentation says that tf.assign(ref, ...) returns ref, but it appears instead (not surprisingly) to return a Tensor (attached to the assign op):
import tensorflow as tf
sess = tf.InteractiveSession()
Q = tf.Variable(tf.constant(range(1, 12)))
sess.run(tf.global_variables_initializer())
qop = tf.assign(Q, tf.zeros(Q.shape, tf.int32))#.eval()
print(Q.eval())
print(qop.eval())
print(Q.eval())
produces
[ 1 2 3 4 5 6 7 8 9 10 11]
[0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0]
demonstrating that the argument Q and the returned qop behave differently (and that Q is unchanged until qop is evaluated).
Is the return value of tf.assign described correctly in the documentation?
Take a look at the TensorFlow documentation on graphs and operations. tf.assign adds an assign Operation to the graph and returns a Tensor attached to that operation; an Operation is a graph node that performs computations on tensors. You use operations to compose a graph of computations, and those computations actually occur later, when you evaluate (or run) part of the graph.
In your example, qop is the definition of an operation that assigns zeros to the variable Q. The graph would look something like Q --> qop. For pedagogical purposes, let's change the order of your code to something like this:
Q = tf.Variable(tf.constant(range(1, 12)))
Q.eval() # Error: Variable has not been initialized.
sess.run(tf.global_variables_initializer())
Q.eval() # Output: array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11], dtype=int32)
qop = tf.assign(Q, tf.zeros(Q.shape, tf.int32))
Q.eval() # Output: array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11], dtype=int32)
qop.eval()
Q.eval() # Output array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], dtype=int32)
The first time you evaluate Q you get an error, because the variable represented by Q has not been initialized yet. After you run sess.run(tf.global_variables_initializer()) the error goes away, because that line runs an operation that initializes all the global variables of the current graph. When you run Q.eval() after defining the qop operation, Q still has the same values, because qop has been defined but not yet executed. Only once you execute qop (qop.eval()) does the value of the variable represented by Q change.
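A quick way to check this in the same session (a sketch; the exact printed class and op names may differ slightly between TF 1.x versions):
print(type(Q))        # the Variable itself
print(type(qop))      # a Tensor, produced by the underlying Assign operation
print(qop.op.type)    # 'Assign' -- the graph node that does the work
print(sess.run(qop))  # running it performs the assignment and returns the new value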