TF-slim layers count - tensorflow

Would the code below represent one or two layers? I'm confused because isn't there also supposed to be an input layer in a neural net?
input_layer = slim.fully_connected(input, 6000, activation_fn=tf.nn.relu)
output = slim.fully_connected(input_layer, num_output)
Does that contain a hidden layer? I'm just trying to be able to visualize the net. Thanks in advance!

You have a neural network with one hidden layer. In your code, input corresponds to the 'Input' layer in the above image. input_layer is what the image calls 'Hidden'. output is what the image calls 'Output'.
Remember that the "input layer" of a neural network isn't a traditional fully-connected layer since it's just raw data without an activation. It's a bit of a misnomer. Those neurons in the picture above in the input layer are not the same as the neurons in the hidden layer or output layer.

From tensorflow-slim:
Furthermore, TF-Slim's slim.stack operator allows a caller to repeatedly apply the same operation with different arguments to create a stack or tower of layers. slim.stack also creates a new tf.variable_scope for each operation created. For example, a simple way to create a Multi-Layer Perceptron (MLP):
# Verbose way:
x = slim.fully_connected(x, 32, scope='fc/fc_1')
x = slim.fully_connected(x, 64, scope='fc/fc_2')
x = slim.fully_connected(x, 128, scope='fc/fc_3')
# Equivalent, TF-Slim way using slim.stack:
slim.stack(x, slim.fully_connected, [32, 64, 128], scope='fc')
So the network mentioned here is a [32, 64,128] network - a layer with a hidden size of 64.

Related

gaussian projection versus gaussian noise

I am facing difficulties with the following layer in keras:
gaussian_projection = 64
gaussian_scale = 20
initializer = tf.keras.initializers.TruncatedNormal(mean=0.0, stddev=gauss_scale)
proj_kernel = tf.keras.layers.Dense(gaussian_projection, use_bias=False, trainable=False,
kernel_initializer=initializer)
What does above layers intends to do? Is it a layer to add gaussian noise or something different?
I hope someone knows about it.
##################### Another 2nd version of the layer ##########
input_dim = 3
new_layer = tf.keras.layers.Dense(input_dim, use_bias=False, trainable=False,
kernel_initializer='identity')
tf.keras.layers.GaussianNoise(stddev=gaussian_scale)
Does both version of layers (1st and 2nd) intends to do the same thing, i.e., adding gaussian noise?
I think the above 2 are different as follows:
The first block of codes basically create a Dense layer, in which the gaussian_projection variable is the number of units and the initializer is a way to initialize the layer. This initialization is normally done to improve the convergence of the layer and network; but overall, the first block of codes is a typical Dense layer. I think there is no noise added in this first block of code.
On the other hand, the second block of codes create a GaussianNoise layer after the Dense layer, which is normally done to regularize the network and reduce overfitting. And based on the official documentation, this GaussianNoise layer is only active during training.

How to use embedding models in tensorflow hub with LSTM layer?

I'm learning tensorflow 2 working through the text classification with TF hub tutorial. It used an embedding module from TF hub. I was wondering if I could modify the model to include a LSTM layer. Here's what I've tried:
train_data, validation_data, test_data = tfds.load(
name="imdb_reviews",
split=('train[:60%]', 'train[60%:]', 'test'),
as_supervised=True)
embedding = "https://tfhub.dev/google/tf2-preview/gnews-swivel-20dim/1"
hub_layer = hub.KerasLayer(embedding, input_shape=[],
dtype=tf.string, trainable=True)
model = tf.keras.Sequential()
model.add(hub_layer)
model.add(tf.keras.layers.Embedding(10000, 50))
model.add(tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64)))
model.add(tf.keras.layers.Dense(64, activation='relu'))
model.add(tf.keras.layers.Dense(1))
model.summary()
model.compile(optimizer='adam',
loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
metrics=['accuracy'])
history = model.fit(train_data.shuffle(10000).batch(512),
epochs=10,
validation_data=validation_data.batch(512),
verbose=1)
results = model.evaluate(test_data.batch(512), verbose=2)
for name, value in zip(model.metrics_names, results):
print("%s: %.3f" % (name, value))
I don't know how to get the vocabulary size from the hub_layer. So I just put 10000 there. When run it, it throws this exception:
tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[480,1] = -6 is not in [0, 10000)
[[node sequential/embedding/embedding_lookup (defined at .../learning/tensorflow/text_classify.py:36) ]] [Op:__inference_train_function_36284]
Errors may have originated from an input operation.
Input Source operations connected to node sequential/embedding/embedding_lookup:
sequential/embedding/embedding_lookup/34017 (defined at Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/contextlib.py:112)
Function call stack:
train_function
I stuck here. My questions are:
how should I use the embedding module from TF hub to feed an LSTM layer? it looks like embedding lookup has some issues with the setting.
how do I get the vocabulary size from the hub layer?
Thanks
Finally figured out the way to link pre-trained embeddings to LSTM or other layers. Just post the steps here in case anyone feels helpful.
Embedding layer has to be the first layer in the model. (hub_layer is the same as Embedding layer.) The not very intuitive part is that any text input to the hub layer will be converted to only one vector of shape [embedding_dim]. You need to do sentence splitting and tokenization to make sure whatever input to the model is a sequence in the form of array of arrays. e.g., "Let us prepare the data." should be converted to [["let"],["us"],["prepare"], ["the"], ["data"]]. You will also need to pad the sequences if you are using batch mode.
In addition, you will need to convert your target tokens to int if your training labels are strings. The input to the model is array of strings with shape [batch, seq_length], the hub embedding layer converts it to [batch, seq_length, embed_dim]. (If you add a LSTM or other RNN layer, the output from the layer is [batch, seq_length, rnn_units]. ) The output dense layer will output index of text instead of actual text. The index of text is stored in the downloaded tfhub directory as "tokens.txt". You can load the file and convert text to the corresponding index. Otherwise you cannot compute the loss.

In CNN network, should input image go to all neurons in first convolution layer (I mean first hidden layer) or not?

I am new to CNN topic, I have one basic question regarding mapping between input image with neurons in first convolution layer.
My question is :
should input image go to all neurons in first convolution layer (I mean first hidden layer) or not ?
For example: if my first hidden layer in CNN as 8 neurons, in that case complete input image is passed to all these 8 neurons or only set of pixels of input image is passed to each neuron.
I am not sure whether you understand what you are asking because the number of neurons in a convolutional layer is something that you don't concern yourself too much with unless you are building your own implementation of a CNN.
To answer your question - No, each neuron in the first convolutional layer is connected only to pixels in its own receptive field (given by kernel size) and the same logic also applies to next convolutional layers as well except now they are connected to lower layer neurons in their receptive field.
For example: if my first hidden layer in CNN as 8 neurons
How do you know that it has 8 neurons? Unless you are doing some low level CNN programming, you do not specify the number of neurons. The number of neurons that you need is usually given by a combination of kernel size, stride, type of padding and the amount of filters that you have chosen to use. These 4 things (together with input size of the image) tells you exactly how many neurons you need.
For example, in Keras (since you have tagged this question with tensorflow) you might see convolutional layer like this one:
keras.layers.Conv2D(filters=128, kernel_size=3, strides=1, activation="relu",
input_shape=(100, 100, 1))
As you can see, you don't specify anything like the number of neurons here (at least not directly). Under this settings, you end up with output whose width and height are cropped by 2 (due to the default padding) so the output shape of this layer is (128, 98, 98) (technically it has a shape of (None, 128, 98, 98) where None is for batch size). If you would flatten this and feed the output into a single neuron (let's say in a dense layer), you would end up with 128 * 98 * 98 = 1,229,313 weights just between these two layers.
So for the analogy between dense and convolutional layer, the above convolutional layers with 128 filters connected to one output neuron is similar to having dense layer with 1,229,313 neurons connected to one output neuron.

How to use tf.nn.ctc_loss in cnn+ctc network

Recently, I try to use tensorflow to implement a cnn+ctc network base on the article Towards End-to-End Speech Recognition with Deep Convolutional Neural Networks.
I try to feed batch spectrogram data (shape:(10,120,155,3),batch_size is 10) into 10 convolution layer and 3 fully connected layer. So the output before connecting the ctc layer is 2d data(shape:(10,1024)).
Here is my problem: I want to use tf.nn.ctc_loss function in tensorflow library,but it generate the ValueError: Dimension must be 2 but is 3 for 'transpose'(op:'Transpose') with input shapes:[?,1024],[3].
I guess the error is related to the dimension of my 2d input data. The discription of the ctc_loss function in tensorflow official site is require a 3d input with the shape (batch_size x max_time x num_classes).
So, what is the extra dimension of 'num_classes' ? what should I change the shape of my cnn+fc output data?
The fully connected layer should be applied per time step.
It's like applying same dense layer per time step in recurrent neural network.
For output of convolution layer, time step is width.
So for example, output shape would be:
convolution: (10,120,155,3) = (batch, height, width, channels)
flatten: (10, 155, 120*3) = (batch, max_time, features)
fully connected: (10, 155, 1024), (same dense layer applied per time step)
(10, 155, num_classes)
It is expected shape for ctc_loss in tensorflow.

Regarding setting up the target tensor shape for sparse_categorical_crossentropy

I am trying to experiment with a multi-layer encoder-decoder type of network. The screenshot of the last several layers of network architecture is as follows. This is how I setup model compiling and training process.
optimizer = SGD(lr=0.001, momentum=0.9, decay=0.0005, nesterov=False)
autoencoder.compile(loss="sparse_categorical_crossentropy", optimizer=optimizer, metrics=['accuracy'])
model.fit(imgs_train, imgs_mask_train, batch_size=batch_size, nb_epoch=nb_epoch, verbose=1,callbacks=[model_checkpoint])
imgs_train and imgs_mask_train are of shape (2000, 1, 128, 128). imgs_train represent the raw image and imgs_mask_train represents the mask image. I am trying to solve a semantic segmentation problem. However, running the program generates the following error message, (I only keep the main related part).
tensorflow.python.pywrap_tensorflow.StatusNotOK: Invalid argument: logits first dimension must match labels size. logits shape=[4096,128] labels shape=[524288]
[[Node: SparseSoftmaxCrossEntropyWithLogits = SparseSoftmaxCrossEntropyWithLogits[T=DT_FLOAT, Tlabels=DT_INT64, _device="/job:localhost/replica:0/task:0/cpu:0"](Reshape_364, Cast_158)]]
It seems to me that the loss function of sparse_categorical_crossentropy causes the problem for the current (imgs_train, imgs_mask_train) shape setting. The Keras API does not include the detail about how to setup the target tensor. Any suggestions are highly appreciated!
I am currently trying to figure the same problem and as far as I can tell it takes a sparse representation of the target category. That means integers as the target label instead of the one-hot encoded binary class matrix.
Concerning your problem, do you have categories in your masking or do you just have information about the outline of an object? With outline information it becomes a pixel wise binary loss instead of a categorical one. If you have categories, the output of your decoder should have dimensionality (None, number_of_classes, 128, 128). On that you should be able to use a sparse target mask but I haven't tried this myself...
Hope that helps