I'm building a small program to predict a single float from a 1-D array of floats. So far I've been using dense layers to achieve this:
const model = sequential();
model.add(layers.dense({units: 32, inputShape: [numCols,]}));
model.add(layers.activation({activation: 'relu'}));
model.add(layers.dense({units: 4}));
model.add(layers.dense({units: 1}));
Where my xs input shape is [numRows, numCols] (e.g. [132, 100] for a dataset of 132 examples: [[1, 2, 3, ...], [4, 5, 6, ...], ...]) and my ys output is a single value [num] per example (e.g. [17.50]).
But I wanted to try out an LSTM to see if it would perform better. The issue is that the LSTM layers want a 3-D tensor, and I'm not sure how to go about producing one.
I've tried the following:
const trainXs = xs.clone()
.reshape([numRows, numCols, 1]);
The above converted my input [[1, 2, 3, ...], [4, 5, 6, ...], ...] to [[[1], [2], [3], ...], [[4], [5], [6], ...], ...].
And the layers:
const model = sequential();
model.add(layers.simpleRNN({
units: 32,
inputShape: [numCols, numRows], // [100, 132]
recurrentInitializer: 'glorotNormal',
returnSequences: true
}));
model.add(layers.simpleRNN({
units: 32,
recurrentInitializer: 'glorotNormal',
returnSequences: true
}));
But the above would fail with the following error:
Error: Error when checking input: expected simple_rnn_SimpleRNN1_input to have shape [,100,132], but got array with shape [132,100,1].
I'm a bit confused and I'm not sure how I should reshape my 2d tensor to fit the requirements of the LSTM layers.
Update:
The fit call:
model.fit(trainXs, trainYs, {
epochs: 1000,
batchSize: 12,
validationData: [testXs, testYs] // Test data has the same shape as trainXs/trainYs
});
I only have a single layer at the moment:
model.add(layers.simpleRNN({
units: 32,
inputShape: [1, numCols, numRows],
recurrentInitializer: 'glorotNormal',
returnSequences: true
}));
The reference says:
The shape of the input (not including the first, batch dimension) needs to be at least 2-D, with the first dimension being time steps.
So the first dimension of your input should contain the time steps. For simplicity, just use 1. In your case, the shape of the tensor passed to the cell would then be [1, numCols, numRows], as you already saw in the error message.
Related
I am new to deep learning and I am utterly confused about the terminology.
In the TensorFlow documentation:
for the RNN layer (https://www.tensorflow.org/api_docs/python/tf/keras/layers/RNN#input_shape):
N-D tensor with shape [batch_size, timesteps, ...]
for the LSTM layer (https://www.tensorflow.org/api_docs/python/tf/keras/layers/LSTM):
inputs: A 3D tensor with shape [batch, timesteps, feature].
I understand that for input_shape we don't have to specify the batch/batch size.
But I would still like to know the difference between batch and batch size.
What are timesteps vs. features?
Is the 1st dimension always the batch, the 2nd timesteps, and the 3rd features?
Example 1
from numpy import array
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM

data = array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
data = data.reshape((1, 5, 2))
print(data.shape)  # (1, 5, 2)
print(data)
[[[ 1 2]
[ 3 4]
[ 5 6]
[ 7 8]
[ 9 10]]]
model = Sequential()
model.add(LSTM(32, input_shape=(5, 2)))
Example 2
from tensorflow.keras.preprocessing.sequence import TimeseriesGenerator

data1 = array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11])
n_features = 1
data1 = data1.reshape((len(data1), n_features))
print(data1)
# define generator
n_input = 2
generator = TimeseriesGenerator(data1, data1, length=n_input, stride=2, batch_size=10)
# number of batch
print('Batches: %d' % len(generator))
# OUT --> Batches: 1
# print each batch
for i in range(len(generator)):
x, y = generator[i]
print('%s => %s' % (x, y))
x, y = generator[0]
print(x.shape)
[[[ 1]
[ 2]]
[[ 3]
[ 4]]
[[ 5]
[ 6]]
[[ 7]
[ 8]]
[[ 9]
[10]]] => [[ 3]
[ 5]
[ 7]
[ 9]
[11]]
(5, 2, 1)
# define model
model = Sequential()
model.add(LSTM(100, activation='relu', input_shape=(n_input, n_features)))
Difference between batch_size v. batch
In the documentation you quoted, batch means batch_size.
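To make that concrete, here is a minimal sketch with made-up shapes (not taken from your question): the first axis of your dataset holds all the examples, and the batch_size you pass to fit() only decides how many of them are packed into each batch the model actually sees per training step.
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

x = np.random.rand(100, 5, 2)   # 100 examples, 5 timesteps, 2 features
y = np.random.rand(100, 1)

model = Sequential([LSTM(32, input_shape=(5, 2)), Dense(1)])
model.compile(optimizer='adam', loss='mse')

# Each batch fed to the model has shape (10, 5, 2) = (batch_size, timesteps, feature),
# so one epoch over 100 examples takes 100 / 10 = 10 steps.
model.fit(x, y, batch_size=10, epochs=1)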
Meaning of timesteps and feature
Taking a glance at https://www.tensorflow.org/tutorials/structured_data/time_series (weather forecast example with real-world data!) will help you understand more about time-series data.
feature is what you want the model to make predictions from; in the above forecast example, it is a vector (array) of pressure, temperature, etc.
RNN/LSTM layers are designed to handle time series; this is why you need to feed timesteps, along with feature, to your model. timesteps represents when the data was recorded; again, in the example above, data is sampled every hour, so timesteps == 0 is the data taken at the first hour, timesteps == 1 the second hour, and so on.
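As a concrete illustration (a small NumPy sketch with made-up hourly readings, not the tutorial's actual data): each hour contributes one feature vector, a window of consecutive hours forms one example, and stacking the windows yields the (batch_size, timesteps, feature) tensor an RNN/LSTM expects.
import numpy as np

# 6 hourly readings, each a feature vector [pressure, temperature] (made-up values)
readings = np.array([[1000.0, 20.1],
                     [1001.5, 20.4],
                     [1002.0, 20.9],
                     [1001.2, 21.3],
                     [1000.8, 21.0],
                     [ 999.9, 20.6]])

# Slide a 3-hour window over the series; each window is one training example
timesteps = 3
windows = np.stack([readings[i:i + timesteps]
                    for i in range(len(readings) - timesteps + 1)])
print(windows.shape)  # (4, 3, 2) -> (batch_size, timesteps, feature)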
Order of dimensions of the input/output data
In TensorFlow, the first dimension of data often represents a batch.
What comes after the batch axis depends on the problem field. In general, global features (like batch size) precede element-specific features (like image size).
Examples:
time-series data are in (batch_size, timesteps, feature) format.
Image data are often represented in NHWC format: (batch_size, image_height, image_width, channels).
From https://www.tensorflow.org/guide/tensor#about_shapes :
While axes are often referred to by their indices, you should always keep track of the meaning of each. Often axes are ordered from global to local: The batch axis first, followed by spatial dimensions, and features for each location last. This way feature vectors are contiguous regions of memory.
Hello, I am creating my first neural network using TensorFlow.js.
I want to use the points (0,0), (0,1), (1,0), (1,1) and the labels 0, 1, 1, 0 as inputs to my NN. I tried it the following way:
async function runModel() {
// Build and compile model.
const model = tf.sequential();
model.add(tf.layers.dense({units: 2, inputShape: [2]}));
model.compile({optimizer: 'sgd', loss: 'meanSquaredError'});
// Generate some synthetic data for training.
const xs = tf.tensor2d([[1], [0]], [2,1]);
const ys = tf.tensor2d([[1]], [1, 1]);
// Train model with fit().
await model.fit(xs, ys, {epochs: 10});
// Run inference with predict().
model.predict(tf.tensor2d([[0], [1]], [2, 1])).print();
}
runModel()
I end up with the error:
Uncaught (in promise) Error: Error when checking input: expected
dense_Dense1_input to have shape [,2], but got array with shape [2,1].
and I tried playing with all the parameters, but even with the documentation I can't figure out how to make it work.
As already explained here and there, this error is thrown when there is a mismatch between the shape expected by the model and the shape of the training data.
expected dense_Dense1_input to have shape [,2], but got array with shape [2,1]
The error thrown is meaningful enough to help solve the issue. The first layer is expecting a tensor of shape [,2], since the inputShape is [2]. But xs has the shape [2, 1]; it should instead have the shape [1, 2].
In the model, the last layer will return 2 values, whereas in reality it should return only one (an XOR operation outputs a single value). Therefore, instead of units: 2, it should be units: 1. That means ys should have the shape [,1]. The shape of ys is already what the model expects, so no changes there.
The shape of the tensor used for prediction should match the model's input shape, i.e. [,2].
By making the above changes, it becomes the following:
const model = tf.sequential();
model.add(tf.layers.dense({units: 1, inputShape: [2]}));
model.compile({optimizer: 'sgd', loss: 'meanSquaredError'});
// Generate some synthetic data for training.
const xs = tf.tensor2d([[1, 0]]);
const ys = tf.tensor2d([[1]], [1, 1]);
// Train model with fit().
await model.fit(xs, ys, {epochs: 10});
// Run inference with predict().
model.predict(tf.tensor([[0, 1]], [1, 2])).print()
I have to swap a tensor's axes using tf.transpose to do batch matrix multiplication (as shown in the code below).
tensor input_a: shape [10000, 10000]
tensor input_b: shape [batch_size, 10000, 10]
tensor output: shape [batch_size, 10000, 10]
# transpose_input_b: shape [10000, batch_size, 10]
transpose_input_b = tf.transpose(input_b, [1, 0, 2])
# reshape_input_b: shape [10000, batch_size * 10]
reshape_input_b = tf.reshape(transpose_input_b, [10000, -1])
# ret: shape [10000, batch_size * 10]
ret = tf.matmul(input_a, reshape_input_b, a_is_sparse = True)
# reshape_ret: [10000, batch_size, 10]
reshape_ret = tf.reshape(ret, [10000, -1, 10])
# output : [batch_size, 10000, 10]
output = tf.transpose(reshape_ret, [1, 0, 2])
However, it seems very slow. I noticed this in the documentation for tf.transpose:
In numpy transposes are memory-efficient constant time operations as they simply return a new view of the same data with adjusted strides.
TensorFlow does not support strides, so transpose returns a new tensor with the items permuted.
So I think that might be the reason why my code runs slowly. Is there any way to swap a tensor's axes, or do the batch matrix multiplication, more efficiently?
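For comparison, here is a minimal sketch of the same batched product written with tf.einsum (TF 2.x style, with an illustrative batch_size); it avoids the explicit transposes and reshapes, but note it does not use the a_is_sparse hint, so whether it is actually faster would have to be measured.
import tensorflow as tf

batch_size = 4  # illustrative value
input_a = tf.random.normal([10000, 10000])
input_b = tf.random.normal([batch_size, 10000, 10])

# 'ij,bjk->bik' contracts over j: the shared [10000, 10000] matrix multiplies
# every batch element, giving shape [batch_size, 10000, 10].
output = tf.einsum('ij,bjk->bik', input_a, input_b)
print(output.shape)  # (4, 10000, 10)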
I am doing image semantic segmentation with a U-Net. I am confused about the last layers for pixel classification. The U-Net code is like this:
...
reshape = Reshape((n_classes,self.img_rows * self.img_cols))(conv9)
permute = Permute((2,1))(reshape)
activation = Activation('softmax')(permute)
model = Model(input = inputs, output = activation)
return model
...
Can I just reshape without using Permute like this?
reshape = Reshape((self.img_rows * self.img_cols, n_classes))(conv9)
Updated:
I found the training result is not right when using the direct reshape:
reshape = Reshape((self.img_rows * self.img_cols, n_classes))(conv9)  # the loss does not converge
My ground truth is generated like this:
X = []
Y = []
im = cv2.imread(impath)
X.append(im)
seg_labels = np.zeros((height, width, n_classes))
for c, spath in enumerate(segpaths):  # c is the class index
    mask = cv2.imread(spath, 0)
    seg_labels[:, :, c] += mask
Y.append(seg_labels.reshape(width*height, n_classes))
Why does reshaping directly not work?
You clearly misunderstand the meaning of each operation and the final goal:
final goal: classification for each pixel, i.e. softmax along the semantic class axis
how to achieve this goal in the original code? Let's see the code line by line:
reshape = Reshape((n_classes,self.img_rows * self.img_cols))(conv9) # L1
permute = Permute((2,1))(reshape) # L2
activation = Activation('softmax')(permute) # L3
L1's output dim = n_class-by-n_pixs, (n_pixs=img_rows x img_cols)
L2's output dim = n_pixs-by-n_class
L3's output dim = n_pixs-by-n_class
Note the default softmax activation is applied to the last axis, i.e. the axis that n_class stands for, which is the semantic class axis.
Therefore, this original code fulfills the final goal of semantic segmentation.
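To see the "last axis" point in isolation, here is a tiny standalone check (a sketch using tf.nn.softmax directly with fake logits, not the model's own code): with shape (n_pixs, n_class), the default softmax normalizes each pixel's class scores, while softmax over axis 0 would mix scores across pixels.
import tensorflow as tf

# Fake logits for 2 pixels and 3 classes, shape (n_pixs, n_class)
logits = tf.constant([[1.0, 2.0, 3.0],
                      [3.0, 2.0, 1.0]])

per_pixel = tf.nn.softmax(logits)           # default: last axis, each row sums to 1
per_class = tf.nn.softmax(logits, axis=0)   # axis 0: each column sums to 1 (not what we want here)

print(tf.reduce_sum(per_pixel, axis=-1))    # [1. 1.]
print(tf.reduce_sum(per_class, axis=0))     # [1. 1. 1.]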
Let's revisit the code that you want to change, which is
reshape = Reshape((self.img_rows * self.img_cols, n_classes))(conv9) # L4
L4's output dim = n_pixs-by-n_class
My guess is that you think L4's output dim matches L2's, and thus L4 is a short-cut that is equivalent to executing L1 and L2.
However, matching the shape does not necessarily mean matching the physical meaning of axes. Why? A simple example will explain.
Say you have 2 semantic classes and 3 pixels. To see the difference assume all three pixels belong to the same class.
In other words, a ground truth tensor will look like this
# cls#1 cls#2
[ [0, 1], # pixel #1
[0, 1], # pixel #2
[0, 1], # pixel #3
]
Assume you have a perfect network that generates the exact response for each pixel; your solution will nevertheless create a tensor like the one below
# cls#1 cls#2
[ [0, 0], # pixel #1
[0, 1], # pixel #2
[1, 1], # pixel #3
]
whose shape is the same as the ground truth's, but fails to match the physical meaning of axes.
This further makes the softmax operation meaningless, because it is supposed to apply to the class dimension, but this dimension does not physically exist. As a result, it leads to the following erroneous output after applying softmax,
# cls#1 cls#2
[ [0.5, 0.5], # pixel #1
[0, 1], # pixel #2
[0.5, 0.5], # pixel #3
]
which completely messes up the training, even under this ideal assumption.
Therefore, it is a good habit to write down the physical meaning of each axis of a tensor. When you do any tensor reshape operation, ask yourself whether the physical meaning of an axis is changed in your expected way.
For example, if you have a tensor T of shape batch_dim x img_rows x img_cols x feat_dim, you can do many things and not all of them make sense (due to the problematic physical meaning of axes)
(Wrong) reshape it to whatever x feat_dim, because the whatever dimension is meaningless at test time, where the batch_size might be different.
(Wrong) reshape it to batch_dim x feat_dim x img_rows x img_cols, because the 2nd dimension is NOT the feature dimension, and neither are the 3rd and 4th dimensions.
(Correct) permute axes (3,1,2); this will give you a tensor of shape batch_dim x feat_dim x img_rows x img_cols, while keeping the physical meaning of each axis.
(Correct) reshape it to batch_dim x whatever x feat_dim. This is also valid, because whatever = img_rows x img_cols is equivalent to the pixel-location dimension, and the meanings of both batch_dim and feat_dim are unchanged.
Your code will still run since the shape will be the same, but the result (and the backprop) will be different since the values of the tensors will be different. For example:
arr = np.array([[[1,1,1],[1,1,1]],[[2,2,2],[2,2,2]],[[3,3,3],[3,3,3]],[[4,4,4],[4,4,4]]])
arr.shape
>>>(4, 2, 3)
# do reshape, then permute
reshape_1 = arr.reshape((4, 2*3))
np.swapaxes(reshape_1, 1, 0)
>>>array([[1, 2, 3, 4],
[1, 2, 3, 4],
[1, 2, 3, 4],
[1, 2, 3, 4],
[1, 2, 3, 4],
[1, 2, 3, 4]])
#do reshape directly
reshape_2 = arr.reshape(2*3, 4)
reshape_2
>>>array([[1, 1, 1, 1],
[1, 1, 2, 2],
[2, 2, 2, 2],
[3, 3, 3, 3],
[3, 3, 4, 4],
[4, 4, 4, 4]])
The Reshape and Permute are done to take the softmax at each pixel location. Adding to @meowongac's answer, Reshape preserves the order of the elements. In this case, since the channel dimension has to be swapped to the end, Reshape followed by Permute is appropriate.
Consider the case of a (2,2) image with 3 values at each location:
arr = np.array([[[1,1],[1,1]],[[2,2],[2,2]],[[3,3],[3,3]]])
>>> arr.shape
(3, 2, 2)
>>> arr
array([[[1, 1],
[1, 1]],
[[2, 2],
[2, 2]],
[[3, 3],
[3, 3]]])
>>> arr[:,0,0]
array([1, 2, 3])
The channel values at each location are [1,2,3]. The goal is to swap the channel axis(length 3) to the end.
>>> arr.reshape((2,2,3))[0,0]
array([1, 1, 1]) # incorrect
>>> arr.transpose((1,2,0))[0,0] # similar to what permute does.
array([1, 2, 3]) # correct
More examples at this link: https://discuss.pytorch.org/t/how-to-change-shape-of-a-matrix-without-dispositioning-the-elements/30708
From the accepted answer in this question, given the following input and kernel matrices, the output of tf.nn.conv2d is
[[14 6]
[6 12]]
which makes sense. However, when I make the input and kernel matrices have 3 channels each (by repeating each original matrix) and run the same code:
# the previous input
i_grey = np.array([
[4, 3, 1, 0],
[2, 1, 0, 1],
[1, 2, 4, 1],
[3, 1, 0, 2]
])
# copy to 3-dimensions
i_rgb = np.repeat( np.expand_dims(i_grey, axis=0), 3, axis=0 )
# convert to tensor
i_rgb = tf.constant(i_rgb, dtype=tf.float32)
# make kernel depth match input; same process as input
k = np.array([
[1, 0, 1],
[2, 1, 0],
[0, 0, 1]
])
k_rgb = np.repeat( np.expand_dims(k, axis=0), 3, axis=0 )
# convert to tensor
k_rgb = tf.constant(k_rgb, dtype=tf.float32)
here's what my input and kernel matrices look like at this point
# reshape input to format: [batch, in_height, in_width, in_channels]
image_rgb = tf.reshape(i_rgb, [1, 4, 4, 3])
# reshape kernel to format: [filter_height, filter_width, in_channels, out_channels]
kernel_rgb = tf.reshape(k_rgb, [3, 3, 3, 1])
conv_rgb = tf.squeeze( tf.nn.conv2d(image_rgb, kernel_rgb, [1,1,1,1], "VALID") )
with tf.Session() as sess:
conv_result = sess.run(conv_rgb)
print(conv_result)
I get the final output:
[[35. 15.]
[35. 26.]]
But I was expecting the original output*3:
[[42. 18.]
[18. 36.]]
because from my understanding, each channel of the kernel is convolved with each channel of the input, and the resultant matrices are summed to get the final output.
Am I missing something from this process or the tensorflow implementation?
Reshape is a tricky function. It will produce the shape you want, but it can easily scramble the data together. In cases like yours, one should avoid using reshape at all costs.
In this particular case, it is better to duplicate the arrays along the new axis instead. With the [batch, in_height, in_width, in_channels] layout, channels is the last dimension, so that is the axis that should be passed to the repeat() function. The following code should better reflect the logic behind it:
i_grey = np.expand_dims(i_grey, axis=0) # add batch dim
i_grey = np.expand_dims(i_grey, axis=3) # add channel dim
i_rgb = np.repeat(i_grey, 3, axis=3 ) # duplicate along channels dim
And likewise with filters:
k = np.expand_dims(k, axis=2) # input channels dim
k = np.expand_dims(k, axis=3) # output channels dim
k_rgb = np.repeat(k, 3, axis=2) # duplicate along the input channels dim
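Putting it together, a sketch of the full corrected pipeline (it uses np.newaxis instead of the two expand_dims calls above, and keeps the TF 1.x session style of the question): since every input channel is a copy of the grayscale image and every kernel channel is a copy of the original kernel, the sum over input channels makes the VALID convolution come out to 3x the single-channel result, i.e. [[42. 18.] [18. 36.]].
import numpy as np
import tensorflow as tf  # TF 1.x, to match the question's tf.Session usage

i_grey = np.array([[4, 3, 1, 0],
                   [2, 1, 0, 1],
                   [1, 2, 4, 1],
                   [3, 1, 0, 2]], dtype=np.float32)
k = np.array([[1, 0, 1],
              [2, 1, 0],
              [0, 0, 1]], dtype=np.float32)

# Input layout [batch, in_height, in_width, in_channels]: duplicate along the last axis
image_rgb = np.repeat(i_grey[np.newaxis, :, :, np.newaxis], 3, axis=3)   # (1, 4, 4, 3)
# Kernel layout [filter_height, filter_width, in_channels, out_channels]
kernel_rgb = np.repeat(k[:, :, np.newaxis, np.newaxis], 3, axis=2)       # (3, 3, 3, 1)

conv_rgb = tf.squeeze(tf.nn.conv2d(tf.constant(image_rgb), tf.constant(kernel_rgb),
                                   [1, 1, 1, 1], "VALID"))

with tf.Session() as sess:
    print(sess.run(conv_rgb))  # [[42. 18.] [18. 36.]], i.e. 3x the grayscale result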