concat_axis in the merge function of Keras - tensorflow

In Keras, we can use merge to concatenate two layers. There is a parameter concat_axis. It looks like the default value for this parameter is -1, and quite a lot of code sets it to 1. What do concat_axis=1 and concat_axis=-1 mean, respectively? I could not find an explanation in the Keras documentation. Thanks.

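concat_axis is the axis (dimension) along which the inputs are concatenated; all other dimensions must match. Negative values count from the end, so the default -1 means the last axis.
If your input tensors have shape (samples, channels, rows, cols), set concat_axis to 1 to concatenate per feature map (along the channels axis).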
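concat_axis follows the same convention as the axis argument of NumPy's np.concatenate: 0 is the samples axis, 1 the next one, and -1 the last one. The NumPy analogue below (not Keras code; the shapes are made up for illustration) shows which dimension grows in each case.

```python
import numpy as np

# Channels-first feature maps: (samples, channels, rows, cols)
a = np.zeros((8, 3, 32, 32))
b = np.ones((8, 5, 32, 32))
merged_channels = np.concatenate([a, b], axis=1)  # channel counts add up: (8, 8, 32, 32)

# Channels-last feature maps: (samples, rows, cols, channels); -1 is the last axis
c = np.zeros((8, 32, 32, 3))
d = np.ones((8, 32, 32, 5))
merged_last = np.concatenate([c, d], axis=-1)  # (8, 32, 32, 8)

print(merged_channels.shape, merged_last.shape)
```

So concat_axis=1 and concat_axis=-1 both stack feature maps; which one is right depends on whether the channels axis comes first or last in your data.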


Why is the "steps" argument necessary when predicting using data tensors? What does this error mean?

I am trying to predict() the output for a single data point d, using my trained Keras model loaded from a file. But I get ValueError: If predicting from data tensors, you should specify the 'steps' argument. What does that mean?
I tried setting steps=1, but then I get a different error: ValueError: Cannot feed value of shape () for Tensor u'input_1:0', which has shape '(?, 600)'.
Here is my code:
d = np.concatenate((hidden[p[i]], hidden[x[i]])).resize((1,600))
hidden[p[i]] = autoencoder.predict(d, steps=1)
The model is expecting (?,600) as input. I have concatenated two numpy arrays of shape (300,) each to get (600,), which is resized to (1,600). This (1,600) is my input to predict().
In my case, the input to predict was None (because I had a bug in another part of the code).
In the official docs, steps is the total number of steps (batches of samples) before declaring the prediction round finished. So steps=1 means making predictions on one batch, not on one record (a single data point).
https://keras.io/models/sequential/
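To make "one batch, not one record" concrete, here is a pure-Python sketch of the steps semantics for generator-style input. predict_with_steps and batches are hypothetical helpers, not Keras functions; they only mimic the documented behavior of consuming exactly steps batches.

```python
import math

def predict_with_steps(batch_generator, steps, predict_batch):
    """Hypothetical helper: draw exactly `steps` batches and predict on each."""
    outputs = []
    for _ in range(steps):
        batch = next(batch_generator)
        outputs.extend(predict_batch(batch))
    return outputs

def batches(samples, batch_size):
    """Yield consecutive slices of `samples` of length `batch_size` (last may be shorter)."""
    for start in range(0, len(samples), batch_size):
        yield samples[start:start + batch_size]

samples = list(range(100))
batch_size = 32
steps = math.ceil(len(samples) / batch_size)  # 4 batches: 32 + 32 + 32 + 4

# A stand-in "model" that doubles its inputs
preds = predict_with_steps(batches(samples, batch_size), steps, lambda b: [2 * s for s in b])
one_step = predict_with_steps(batches(samples, batch_size), 1, lambda b: [2 * s for s in b])
print(len(preds), len(one_step))
```

With steps=1, only the first batch of 32 samples is predicted, not a single sample.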
Define a value for the steps argument:
d = np.concatenate((hidden[p[i]], hidden[x[i]])).reshape((1, 600))  # reshape, not resize: ndarray.resize() returns None
hidden[p[i]] = autoencoder.predict(d, steps=1)
If you are using a test data generator, it is good practice to define the steps, as mentioned in the documentation.
If you are predicting a single instance, there is no need to define steps. Just make sure the argument (i.e. instance 'd') is not None, otherwise that error will show up. Some reshaping may also be necessary.
In my case I got the same error; I just reshaped the data to predict with NumPy's reshape() to the shape of the data originally used to train the model.
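One common way to end up passing None is NumPy's ndarray.resize(): it modifies the array in place and returns None, whereas reshape() returns a new array. A small sketch with dummy (300,) arrays standing in for the question's hidden vectors:

```python
import numpy as np

a = np.zeros(300)  # stand-ins for hidden[p[i]] and hidden[x[i]]
b = np.ones(300)

c = np.concatenate((a, b))
result = c.resize((1, 600))  # in-place: c now has shape (1, 600), result is None

d = np.concatenate((a, b)).reshape((1, 600))  # returns a new (1, 600) array

print(result, c.shape, d.shape)
```

Assigning the return value of resize() to d therefore feeds None to predict(), which matches the "Cannot feed value of shape ()" error.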

How to change the tensor shape in middle layers?

Say I have a 2000x100 matrix. I put it into a 10-dimensional embedding layer, which gives me a 2000x100x10 tensor, so it's 2000 examples and each example is a 100x10 matrix. Then I pass it to a Conv1D and KMaxPooling to get a 2000x24 matrix, i.e. 2000 examples, each a 24-dimensional vector. Now I would like to recombine those examples before I apply another layer: I would like to combine the first 10 examples together, then the next 10, and so on, so that I get a tuple. Then I pass that tuple to the next layer.
My question is: can I do that with Keras? Any idea how to do it?
The idea of using "samples" is that these samples should be independent and not relate to each other.
This is something Keras will demand from your model: if it started with 2000 samples, it must end with 2000 samples. Ideally, these samples do not talk to each other; you can use custom layers to hack this, but only in the middle. You will need to end with 2000 samples anyway.
I believe you're going to end your model with 200 groups, so maybe you should already start with shape (200,10,100) and use TimeDistributed wrappers:
from keras.layers import Input, TimeDistributed, Embedding, Conv1D
inputs = Input((10,100)) #shape (200,10,100)
out = TimeDistributed(Embedding(....))(inputs) #shape (200,10,100,10)
out = TimeDistributed(Conv1D(...))(out) #shape (200,10,len,filters)
#here, you use your layer that will work on the groups without TimeDistributed.
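Starting with shape (200,10,100) only requires regrouping the data before it reaches the model. A NumPy sketch, assuming x holds the question's 2000x100 integer inputs (random ids here, purely as placeholders) and that consecutive samples form a group: reshape keeps row order, so group g contains samples 10*g to 10*g+9.

```python
import numpy as np

num_samples, seq_len, group_size = 2000, 100, 10
x = np.random.randint(0, 50, size=(num_samples, seq_len))  # hypothetical token ids

# (2000, 100) -> (200, 10, 100): 200 groups of 10 consecutive samples
grouped = x.reshape(-1, group_size, seq_len)

print(grouped.shape)
```

The grouped array is what you would then feed to the model built on Input((10,100)).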
To reshape a tensor without changing the batch size, use the Reshape(newShape) layer, where newShape does not include the first dimension (batch size).
To reshape a tensor including the batch size, use a Lambda(lambda x: K.reshape(x,newShape)) layer, where newShape includes the first dimension (batch size). Here you must remember the warning above: somewhere you will need to undo this change so that you end up with the same batch size as the input.
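The only difference between the two reshapes is whether the first axis takes part. NumPy analogue (not Keras code), using the question's 2000x24 pooled features: the first reshape mimics Reshape((4, 6)), which keeps the batch axis; the second mimics Lambda(lambda x: K.reshape(x, (-1, 10, 24))), which turns 2000 samples into 200 groups and has to be undone later. The target shapes are illustrative choices.

```python
import numpy as np

features = np.random.rand(2000, 24)  # e.g. the KMaxPooling output from the question

# Like Reshape((4, 6)): target shape excludes the batch axis, which stays 2000
per_sample = features.reshape((features.shape[0],) + (4, 6))  # (2000, 4, 6)

# Like Lambda(lambda x: K.reshape(x, (-1, 10, 24))): the batch axis changes to 200
groups = features.reshape((-1, 10, 24))  # (200, 10, 24)
# ... layers that mix the 10 samples inside each group would go here ...

# Undo the batch change so the model ends with 2000 samples again
restored = groups.reshape((-1, 24))  # (2000, 24)

print(per_sample.shape, groups.shape, restored.shape)
```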

Do I need to add custom gradients for this op?

I have a convolutional layer, which produces 16 output feature maps, and I want to take these maps and transform them into 4 maps like this:
Split 16 feature maps into 4 groups, 4 maps each.
Multiply each group by a mask to zero out some values.
Add the resulting feature maps in each group to get 4 maps.
Or, I can first multiply all 16 maps by a mask, and then split the result into 4 groups to do reduce_sum on each group. The resulting 4 maps will be used as input to the next convolutional or pooling layer.
Will TensorFlow be able to automatically calculate gradients for this combination of tf.split, tf.multiply, and tf.reduce_sum?
EDIT: here's the series of ops, where conv is an output from tf.layers.conv2d, and mask is a binary numpy array of the same shape as conv (full code is here):
conv_masked = mask * conv
conv_grouped = tf.reshape(conv_masked, (batch_size, num_groups, fs*fs, dim, dim))
out = tf.reduce_sum(conv_grouped, axis=2)
Almost all TensorFlow operations already have their gradients implemented, and tf.split, tf.multiply, tf.reshape and tf.reduce_sum all do. As long as your computation is built entirely from TensorFlow ops like these, you are fine.
Also, TensorFlow overloads the basic arithmetic operators:
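Why nothing custom is needed here: masking, reshaping and summing are all linear in conv, so the gradient of sum(out) with respect to conv is simply mask. The NumPy check below (not TensorFlow code) compares that analytic gradient with central finite differences. It assumes a channels-first layout (batch, channels, h, w), which is what the question's reshape to (batch_size, num_groups, fs*fs, dim, dim) implies, and uses 16 channels in 4 groups; masked_group_sum is a hypothetical name.

```python
import numpy as np

def masked_group_sum(conv, mask, num_groups):
    """Mask, group consecutive channels, and sum within each group.
    conv and mask have shape (batch, channels, h, w) (channels-first assumed)."""
    b, c, h, w = conv.shape
    grouped = (mask * conv).reshape(b, num_groups, c // num_groups, h, w)
    return grouped.sum(axis=2)  # shape (batch, num_groups, h, w)

rng = np.random.default_rng(0)
conv = rng.standard_normal((2, 16, 3, 3))
mask = (rng.random((2, 16, 3, 3)) > 0.5).astype(float)  # binary mask
out = masked_group_sum(conv, mask, 4)

# Analytic gradient of L = sum(out) w.r.t. conv: the op is linear, so it equals mask
analytic = mask

# Central finite differences, one element of conv at a time
eps = 1e-6
numeric = np.zeros_like(conv)
for idx in np.ndindex(conv.shape):
    plus = conv.copy()
    plus[idx] += eps
    minus = conv.copy()
    minus[idx] -= eps
    numeric[idx] = (masked_group_sum(plus, mask, 4).sum()
                    - masked_group_sum(minus, mask, 4).sum()) / (2 * eps)

print(out.shape, np.allclose(numeric, analytic, atol=1e-5))
```

Masked-out positions get zero gradient, which is exactly what TensorFlow's built-in gradients for these ops produce when chained.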
masked_tensor = tensor * mask
masked_tensor = tf.multiply(tensor, mask)
If the elements involved are tensors then the two expressions above are equivalent.
As for the type used for the mask, you can write either of:
mask = tf.constant(array)
mask = np.array(array)
For me, using Python 3.6.3 and TensorFlow 1.3.0, both generated the same result from the operation. But I found nothing in the documentation that explicitly says that NumPy arrays are always accepted, so I would avoid it.
One thing to note, though: if you store the mask in a variable, it should be a non-trainable variable. Otherwise the optimizer will alter your mask.

TensorFlow: evaluate a single input with sess.run

I've trained a network in TensorFlow using a session
sess.run([train_op, loss], feed_dict=feed_dict)
What's the simplest way to give the session an input of a single data row and print the output?
I've tried (many variations of)
sess.run(print_function, data_row)
But I get the result
You must feed a value for placeholder tensor 'Placeholder_1'
with dtype int32 and shape [<batch size>]
Assumption: your first dimension represents the number of inputs (as is the case in most official examples).
I'd suggest making the first dimension of the placeholder None, so that you can pass any number of rows (a batch of any size, including a single row). Have a look at this tutorial for an example: http://learningtensorflow.com/lesson4/
Quoting from this tutorial,
The first dimension of the placeholder is None, meaning we can have any number of rows. The second dimension is fixed at 3, meaning each row needs to have three columns of data.
It's also well documented in the official documentation at https://www.tensorflow.org/resources/faq#how_do_i_build_a_graph_that_works_with_variable_batch_sizes.
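Once the first dimension is None, a single data row just has to be wrapped as a batch of one. Note also that the second positional argument of Session.run is feed_dict, which must be a dict mapping placeholders to values, and every placeholder the fetched tensor depends on has to be fed (the error in the question names Placeholder_1, which the fetched op presumably depends on). A NumPy sketch of the reshaping; x and output_op in the comment are hypothetical names.

```python
import numpy as np

data_row = np.array([0.5, 1.0, 2.0])  # one example with 3 features

# Add a leading batch axis: shape (3,) -> (1, 3), same as data_row.reshape(1, -1)
batch_of_one = data_row[np.newaxis, :]

# With a placeholder x of shape [None, 3] (TF 1.x), you would then call:
#   sess.run(output_op, feed_dict={x: batch_of_one})
print(batch_of_one.shape)
```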

Dynamic LSTM model in Tensorflow

I am looking to design an LSTM model using TensorFlow, wherein the sentences are of different lengths. I came across a tutorial on the PTB dataset (https://github.com/tensorflow/tensorflow/blob/master/tensorflow/models/rnn/ptb/ptb_word_lm.py). How does this model capture instances of varying length? The example does not discuss padding or any other technique to handle variable-size sequences.
If I use padding, what should be the unrolling dimension?
You can do this in two ways:
TF has a way to specify the input length. Look for a parameter called "sequence_length"; I have used it in tf.nn.bidirectional_rnn. TF will then run your cell only up to each sequence's sequence_length rather than the full number of steps.
Pad your input with a predefined dummy input, and the target with a predefined dummy output. The LSTM cell will learn to predict the dummy output for the dummy input. When using the outputs (say, for a matrix calculation), chop off the dummy parts.
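Both options start from the same padded batch. A NumPy sketch, assuming 0 is a reserved dummy id that never occurs in real sentences: pad_batch (a hypothetical helper) pads every sentence to the longest one in the batch, which is also the length you would unroll to, and returns the true lengths (what you would pass as sequence_length in the first option) plus a boolean mask for chopping off the dummy positions in the second.

```python
import numpy as np

def pad_batch(sequences, pad_id=0):
    """Pad variable-length id sequences to the longest one in the batch.
    pad_id is assumed to be a dummy id never used by real tokens."""
    lengths = np.array([len(s) for s in sequences])
    max_len = int(lengths.max())  # unroll length for this batch
    padded = np.full((len(sequences), max_len), pad_id, dtype=np.int64)
    for i, seq in enumerate(sequences):
        padded[i, :len(seq)] = seq
    # True where a position holds a real token, False where it holds padding
    mask = np.arange(max_len)[None, :] < lengths[:, None]
    return padded, lengths, mask

sentences = [[4, 8, 15], [16, 23], [42, 7, 9, 1]]
padded, lengths, mask = pad_batch(sentences)

# Chop the dummy parts off again (the same slicing works on per-step outputs)
unpadded = [row[:n].tolist() for row, n in zip(padded, lengths)]
print(padded.tolist(), lengths.tolist())
```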
The PTB model is truncated in time: it always back-propagates a fixed number of steps (num_steps in the configs). So there is no padding; it just reads the data, tries to predict the next word, and always reads num_steps words at a time.
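A simplified sketch of that reading scheme, assuming the corpus is already one flat list of integer word ids. It follows the idea described above (split the stream into batch_size rows, then slide a fixed num_steps window, with targets shifted by one word) rather than reproducing the tutorial's reader code; ptb_style_batches is a hypothetical name.

```python
import numpy as np

def ptb_style_batches(word_ids, batch_size, num_steps):
    """Yield (inputs, targets) windows of fixed length num_steps from one long stream."""
    data = np.asarray(word_ids)
    batch_len = len(data) // batch_size
    # Drop the remainder and lay the stream out as batch_size parallel rows
    data = data[: batch_size * batch_len].reshape(batch_size, batch_len)
    epoch_size = (batch_len - 1) // num_steps  # -1 leaves room for the shifted targets
    for i in range(epoch_size):
        x = data[:, i * num_steps:(i + 1) * num_steps]
        y = data[:, i * num_steps + 1:(i + 1) * num_steps + 1]  # next word at each step
        yield x, y

windows = list(ptb_style_batches(range(20), batch_size=2, num_steps=3))
first_x, first_y = windows[0]
print(len(windows), first_x.tolist(), first_y.tolist())
```

Every window has the same length, so sentence boundaries never need padding; they are just ordinary tokens in the stream.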