Why are my TRAINABLE_VARIABLES in Tensorflow so weird? - tensorflow

I've just started my first TF project.
I trained a 4-layer vanilla NN on MNIST.
Then I wanted to display the learned weights,
but weirdly I got way more output than I expected.
I used
sess.run(tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, "my_w1"))
where I had previously defined
tf.Variable(tf.random_normal([layer_sizes[i-1], layer_sizes[i]]), name = "my_w1").
The problem is that I expected a 2D array of shape (784, 500),
but I got a 3D one of shape (15, 784, 500).
What does the first dimension mean?

This is your batch size: the number of images you use in each iteration. It comes from this part of the code: epoch_x, epoch_y = mnist.train.next_batch(batch_size)
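If you want to see exactly what that collection returns, a minimal sketch (assuming TF 1.x graph mode, as in the question) is to print every matching variable and its shape:

# Minimal sketch: list every trainable variable whose name matches "my_w1"
import tensorflow as tf

matching_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope="my_w1")
for v in matching_vars:
    print(v.name, v.shape)   # the weight matrix defined in the question should be (784, 500)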

Related

Keras model.fit, dimensions must be equal?

I am a newbie in ML. I have a set of time-series data with Date and Temp columns that I want to use for anomaly detection. I used the MinMax scaler on the data and got an array normal_train_data with shape (200, 0).
Then I used the autoencoder which uses
keras.layers.Dense(128, activation ='sigmoid').
After that, when I call
history = model.fit(normal_train_data, normal_train_data, epochs= 50, batch_size=128, validation_data=(train_data_scaled[:,1:], train_data_scaled[:,1:]) ...)
I get the error:
ValueError: Dimensions must be equal, but are 128 and 0 with input shapes: [?,128], [?,0].
As far as I understand, the input has shape (200, 0) and the output (1, 128).
Can you help me fix this error please? Thank you.
I tried to use tf.keras.layers.Flatten() in the encoder part. I am not sure whether it's OK to use a Dense layer or whether I should choose another layer type.
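One quick sanity check before fitting (using the variable names from the question) is to print the shapes that actually reach the model, since a second dimension of 0 means the Dense(128) layer receives no features at all:

# Shapes that model.fit will see; a trailing 0 here would explain the [?,0] in the error
print(normal_train_data.shape)          # e.g. (200, 0)
print(train_data_scaled[:, 1:].shape)   # validation inputs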

Properly using tf.GradientTape().gradient

This is not a code question, but more of a "how it works" one. I have a model that takes as input 4 tf.Tensors of the same shape (60, 200, 15000), and my output is a tf.Tensor of shape (60, 200). My custom loss reshapes all 4 tensors to the output's shape, so there is no problem there; inside the custom loss I then compute the loss value. My question comes after that, when I do loss = tf.GradientTape().gradient(loss_fn, model.trainable_variables) and optimizer.apply_gradients(zip(loss, model.trainable_variables)).
How does the gradient "know" which variables to apply itself to? How can I tell whether the gradient is computed properly?
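For context, the usual GradientTape pattern looks like the following minimal sketch (TF 2.x eager mode; model, inputs, targets, loss_fn and optimizer stand in for the objects described above):

# The tape records every operation involving trainable variables that runs
# inside the "with" block; that is how it later "knows" what to differentiate.
import tensorflow as tf

with tf.GradientTape() as tape:
    predictions = model(inputs, training=True)   # forward pass recorded by the tape
    loss = loss_fn(targets, predictions)         # loss computed inside the tape context

# One gradient per trainable variable, in the same order as model.trainable_variables
grads = tape.gradient(loss, model.trainable_variables)
optimizer.apply_gradients(zip(grads, model.trainable_variables))

A quick check: tape.gradient returns one entry per trainable variable, and an entry comes back as None if that variable never took part in computing the loss, which is a simple way to verify the gradient is wired up correctly.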

Strange output of Conv2D in tflite graph

I have a tflite graph, a fragment of which is depicted in the attached picture.
I needed to debug its behavior, and already at the first step I got quite puzzling results.
When I feed a tensor of zeros as input, I expect the output of the first Conv2D to consist only of the values from the Conv2D bias (since all kernel elements get multiplied by zeros), but instead I got a tensor of seemingly random data. Here is the code snippet:
import numpy as np
import tensorflow as tf

def test_graph(path=PATH_DEFAULT):
    interp = tf.lite.Interpreter(path)
    interp.allocate_tensors()
    input_details = interp.get_input_details()
    in_idx = input_details[0]['index']
    zeros = np.zeros(shape=(1, 256, 256, 3), dtype=np.float32)
    interp.set_tensor(in_idx, zeros)
    interp.invoke()
    # index of the output of the first Conv2D operator is 3 (see netron pic)
    after_conv_2d = interp.get_tensor(3)
    # shape of the bias is just [count of output channels]
    n, h, w, c = after_conv_2d.shape
    # if we feed zeros as input, we can expect that the only values we get are the values of the bias,
    # since all kernel elements in that case are multiplied by zeros
    uniq_vals_cnt = len(np.unique(after_conv_2d))
    assert uniq_vals_cnt <= c, f"There are {uniq_vals_cnt} in output, should be <= than {c}"
output:
AssertionError: There are 287928 in output, should be <= than 24
Can someone help me with my misunderstanding?
It seems my assumption that I can get any intermediate tensor from the interpreter is wrong: we can do that only for outputs, even though the interpreter does not raise an error and even returns tensors of the right shape for indices belonging to non-output tensors.
One way to debug such a graph would be to make all tensors outputs, but it seems the easiest way to do that would be to convert the tflite file to pb with toco and then convert the pb back to tflite with the new outputs specified. This way is not ideal, though, because toco support for tflite -> pb conversion was removed after 1.9, and using earlier versions can break (in my case it breaks) on some graphs.
More on this here:
tflite: get_tensor on non-output tensors gives random values
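As a possible alternative to the toco round trip: newer TensorFlow releases (around 2.5 and later, as far as I know) expose an experimental interpreter flag that keeps intermediate tensor buffers alive, so get_tensor on non-output indices returns real data. A hedged sketch, reusing path and zeros from the snippet above:

# Ask the interpreter to preserve all intermediate tensors instead of reusing their memory
interp = tf.lite.Interpreter(model_path=path,
                             experimental_preserve_all_tensors=True)
interp.allocate_tensors()
interp.set_tensor(interp.get_input_details()[0]['index'], zeros)
interp.invoke()
after_conv_2d = interp.get_tensor(3)   # should now hold the actual Conv2D output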

CNN Keras Object Localization - Bad predictions

I'm a beginner in machine learning and I am currently trying to predict the position of an object within an image that is part of a dataset I created.
This dataset contains about 300 images in total and contains 2 classes (Ace and Two).
I created a CNN that predicts whether it's an Ace or a two with about 88% accuracy.
Since this was working well, I decided to try to predict the position of the card instead of the class. I read up on some articles, and from what I understood, all I had to do was take the same CNN that I used to predict the class and change the last layer to a Dense layer with 4 nodes.
That's what I did, but apparently this isn't working.
Here is my model:
from keras.models import Sequential
from keras.layers import Conv2D, Activation, MaxPooling2D, Dense, Flatten

model = Sequential()
model.add(Conv2D(64, (3, 3), input_shape=(150, 150, 1)))
model.add(Activation("relu"))
model.add(MaxPooling2D(pool_size=2))
model.add(Conv2D(32, (3, 3)))
model.add(Activation("relu"))
model.add(MaxPooling2D(pool_size=2))
model.add(Dense(64))
model.add(Activation("relu"))
model.add(Flatten())
model.add(Dense(4))
model.compile(loss="mean_squared_error", optimizer='adam', metrics=[])
model.fit(X, y, batch_size=1, validation_split=0,
          epochs=30, verbose=1, callbacks=[TENSOR_BOARD])
What I feed to my model:
X: a grayscale image of 150x150 pixels, with each pixel rescaled to [0, 1].
y: the smallest X coordinate, the highest Y coordinate, and the width and height of the object (each of those values is within [0, 1]).
And here's an example of predictions it gives me:
[array([ 28.66145 , 41.278576, -9.568813, -13.520659], dtype=float32)]
but what I really wanted was:
[0.32, 0.38666666666666666, 0.4, 0.43333333333333335]
I knew something was wrong here so I decided to train and test my CNN on a single image (so it should overfit and predict the right bounding box for this single image if it worked). Even after overfitting on this single image, the predicted values were ridiculously high.
So my question is:
What am I doing wrong ?
EDIT 1
After trying @Matias's solution, which was to add a sigmoid activation function to the last layer, all of the output values are now between [0, 1].
But, even with this, the model still produces bad outputs.
For example, after training it 10 epochs on the same image, it predicted this:
[array([0.0000000e+00, 0.0000000e+00, 8.4378130e-18, 4.2288357e-07],dtype=float32)]
but what I expected was:
[0.2866666666666667, 0.31333333333333335, 0.44666666666666666, 0.5]
EDIT 2
Okay, so, after experimenting for quite a while, I've come to the conclusion that the problem was either my model (the way it is built) or the lack of training data.
But even if it was caused by a lack of training data, I should have been able to overfit it on 1 image in order to get the right predictions for this one, right?
I created another post that asks about this last question, since the original one has been answered and I don't want to completely re-edit this post, as that would make the first answers kind of pointless.
Since your targets (the Y values) are normalized to the [0, 1] range, the output of the model should match this range. For this you should use a sigmoid activation at the output layer, so the output is constrained to the [0, 1] range:
model.add(Dense(4, activation='sigmoid'))
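Put together with the model from the question, the tail of the network then looks like this (a sketch; only the last layer and its activation change):

model.add(Flatten())
model.add(Dense(4, activation='sigmoid'))   # predictions constrained to [0, 1]
model.compile(loss='mean_squared_error', optimizer='adam', metrics=[])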

Regarding setting up the target tensor shape for sparse_categorical_crossentropy

I am trying to experiment with a multi-layer encoder-decoder type of network. A screenshot of the last several layers of the network architecture is attached. This is how I set up the model compilation and training process.
optimizer = SGD(lr=0.001, momentum=0.9, decay=0.0005, nesterov=False)
autoencoder.compile(loss="sparse_categorical_crossentropy", optimizer=optimizer, metrics=['accuracy'])
model.fit(imgs_train, imgs_mask_train, batch_size=batch_size, nb_epoch=nb_epoch, verbose=1,callbacks=[model_checkpoint])
imgs_train and imgs_mask_train are of shape (2000, 1, 128, 128). imgs_train contains the raw images and imgs_mask_train contains the mask images. I am trying to solve a semantic segmentation problem. However, running the program generates the following error message (I only keep the main related part):
tensorflow.python.pywrap_tensorflow.StatusNotOK: Invalid argument: logits first dimension must match labels size. logits shape=[4096,128] labels shape=[524288]
[[Node: SparseSoftmaxCrossEntropyWithLogits = SparseSoftmaxCrossEntropyWithLogits[T=DT_FLOAT, Tlabels=DT_INT64, _device="/job:localhost/replica:0/task:0/cpu:0"](Reshape_364, Cast_158)]]
It seems to me that the sparse_categorical_crossentropy loss function causes the problem for the current (imgs_train, imgs_mask_train) shape setup. The Keras API does not include details about how to set up the target tensor. Any suggestions are highly appreciated!
I am currently trying to figure out the same problem, and as far as I can tell the loss takes a sparse representation of the target category. That means integer class labels as the target instead of a one-hot encoded binary class matrix.
Concerning your problem: do you have categories in your masking, or do you just have information about the outline of an object? With outline information it becomes a pixel-wise binary loss instead of a categorical one. If you have categories, the output of your decoder should have dimensionality (None, number_of_classes, 128, 128). On that you should be able to use a sparse target mask, but I haven't tried this myself...
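To make the expected shapes concrete, here is a small standalone sketch written against tf.keras with a channels-last layout (adapt the axes if your setup is channels-first, as in the (None, number_of_classes, 128, 128) layout mentioned above):

import numpy as np
import tensorflow as tf

num_classes = 2
batch = 4

# Per-pixel class scores from the decoder: (batch, 128, 128, num_classes)
logits = tf.random.normal((batch, 128, 128, num_classes))
# Sparse target mask: one integer class id per pixel, shape (batch, 128, 128),
# i.e. no one-hot encoding and no extra channel axis
labels = np.random.randint(0, num_classes, size=(batch, 128, 128))

loss = tf.keras.losses.sparse_categorical_crossentropy(labels, logits, from_logits=True)
print(loss.shape)   # (4, 128, 128): one cross-entropy value per pixel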
Hope that helps