Keras Input Layer Shape On Input Layer Error - tensorflow

I am trying to learn ai algorithms by building. I found a question on Stackoverflow which is here.
I copied this code to try it out, and then modified it to this.
import numpy as np
import tensorflow as tf
from tensorflow import keras as keras
from tensorflow.keras.layers import Dense, Activation
from tensorflow.keras.models import Sequential
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
from tensorflow.python.keras import activations
# Importing the dataset
dataset = np.genfromtxt("data.txt", delimiter='')
X = dataset[:, :-1]
y = dataset[:, -1]
# Splitting the dataset into the Training set and Test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.08, random_state = 0)
# Feature Scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
# Initialising the ANN
#model = Sequential()
# Adding the input layer and the first hidden layer
#model.add(Dense(32, activation = 'relu', input_dim = 6))
# Adding the second hidden layer
#model.add(Dense(units = 32, activation = 'relu'))
# Adding the third hidden layer
#model.add(Dense(units = 32, activation = 'relu'))
# Adding the output layer
#model.add(Dense(units = 1))
#model = Sequential([
# keras.Input(shape= (6),name= "digits"),
# Dense(units = 32, activation = "relu"),
# Dense(units = 32, activation = "relu"),
# Dense(units = 1 , name = "predict")##
#])
#
input = keras.Input(shape= (6),name= "digits")
#x0 = Dense(units = 6)(input)
x1 = Dense(units = 32, activation = "relu")(input)
x2 = Dense(units = 32, activation = "relu")(x1)
output = Dense(units = 1 , name = "predict")(x2)
model = keras.Model(inputs = input , outputs= output)
#model.add(Dense(1))
# Compiling the ANN
#model.compile(optimizer = 'adam', loss = 'mean_squared_error')
# Fitting the ANN to the Training set
#model.fit(X_train, y_train, batch_size = 10, epochs = 200)
optimizer = keras.optimizers.Adam(learning_rate=1e-3)
loss = keras.losses.MeanSquaredError()
epochs = 200
for epoch in range(epochs):
print("\nStart of epoch %d" % (epoch,))
# Iterate over the batches of the dataset.
for step in range(len(X_train)):
# Open a GradientTape to record the operations run
# during the forward pass, which enables auto-differentiation.
with tf.GradientTape() as tape:
# Run the forward pass of the layer.
# The operations that the layer applies
# to its inputs are going to be recorded
# on the GradientTape.
logits = model( X_train[step] , training=True) # Logits for this minibatch
# Compute the loss value for this minibatch.
loss_value = loss(y_train[step], logits)
# Use the gradient tape to automatically retrieve
# the gradients of the trainable variables with respect to the loss.
grads = tape.gradient(loss_value, model.trainable_weights)
# Run one step of gradient descent by updating
# the value of the variables to minimize the loss.
optimizer.apply_gradients(zip(grads, model.trainable_weights))
# Log every 200 batches.
if step % 200 == 0:
print(
"Training loss (for one batch) at step %d: %.4f"
% (step, float(loss_value))
)
print("Seen so far: %s samples" % ((step + 1) * 64))
y_pred = model.predict(X_test)
plt.plot(y_test, color = 'red', label = 'Real data')
plt.plot(y_pred, color = 'blue', label = 'Predicted data')
plt.title('Prediction')
plt.legend()
plt.show()
I modified the code for creating data when processing. If I use model.fit, it uses data I have given but I wanted to when epochs start to create data from a simulation and then process it.(sorry for bad english. if i couldn't explain very well)
When I start code in line 81:
Exception has occurred: ValueError
Input 0 of layer dense is incompatible with the layer: : expected min_ndim=2, found ndim=1. Full shape received: (6,)
It gives an Exception. I tried to use shape=(6,) shape=(6,1) or similar to this but it doesn't fix anything.

You need to add a batch dimension when calling the keras model:
logits = model( X_train[step][np.newaxis,:] , training=True) # Logits for this minibatch
A batch dimension is used to feed multiple samples to the network. By default, Keras assumes that the input has a batch dimension. To feed one sample, Keras expects a batch of 1 sample. In that case, it means a shape of (1,6). If you want to feed a batch of 2 samples, then the shape will be (2,6), etc.

Related

"No gradients provided for any variable" error when trying to use GradientTape mechanism

I'm trying to use GradientTape mechanism for the first time. I've looked at some examples but I'm getting the "No gradients provided for any variable" error and was wondering how to overcome this?
I want to define some complex loss functions, so I tried using GradientTape to produce its gradient for the CNN training. What was I doing wrong and can I fix it?
Attached is a run-able example code that demonstrates my problem:
# imports
import numpy as np
import tensorflow as tf
import sklearn
from tensorflow import keras
from tensorflow.keras import layers
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
tf.config.run_functions_eagerly(True)
#my loss function
def my_loss_fn(y_true, y_pred):
` # train SVM classifier
VarC=1E6
VarGamma='scale'
clf = SVC(kernel='rbf', C=VarC, gamma=VarGamma, probability=True )
clf.fit(y_pred, y_true)
y_pred = clf.predict_proba(y_pred)
scce = tf.keras.losses.SparseCategoricalCrossentropy()
return scce(y_true, y_pred)
`
#creating inputs to demontration
X0=0.5*np.ones((12,12))
X0[2:12:4,:]=0
X0[3:12:4,:]=0
X1=0.5*np.ones((12,12))
X1[1:12:4,:]=0
X1[2:12:4,:]=0
X1=np.transpose(X1)
X=np.zeros((2000,12,12))
for i in range(0,1000):
X[i]=X0+np.random.rand(12,12)
for i in range(1000,2000):
X[i]=X1+np.random.rand(12,12)
y=np.zeros(2000, dtype=int)
y[1000:2000]=1
x_train, x_val, y_train, y_val = train_test_split(X, y, train_size=0.5)
x_val, x_test, y_val, y_test = train_test_split(x_val, y_val, train_size=0.5)
x_train = tf.convert_to_tensor(x_train)
x_val = tf.convert_to_tensor(x_val)
x_test = tf.convert_to_tensor(x_test)
y_train = tf.convert_to_tensor(y_train)
y_val = tf.convert_to_tensor(y_val)
y_test = tf.convert_to_tensor(y_test)
inputs = keras.Input((12,12,1), name='images')
x0 = tf.keras.layers.Conv2D(8,4,strides=4)(inputs)
x0 = tf.keras.layers.AveragePooling2D(pool_size=(3, 3), name='pooling')(x0)
outputs = tf.keras.layers.Flatten(name='predictions')(x0)
model = keras.Model(inputs=inputs, outputs=outputs)
optimizer=tf.keras.optimizers.Adam(learning_rate=0.001)
# Instantiate a loss function.
loss_fn = my_loss_fn
# Prepare the training dataset.
batch_size = 256
train_dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))
train_dataset = train_dataset.shuffle(buffer_size=1024).batch(batch_size)
epochs = 100
for epoch in range(epochs):
print('Start of epoch %d' % (epoch,))
# Iterate over the batches of the dataset.
for step, (x_batch_train, y_batch_train) in enumerate(train_dataset):
# Open a GradientTape to record the operations run
# during the forward pass, which enables autodifferentiation.
with tf.GradientTape() as tape:
tape.watch(model.trainable_weights)
# Run the forward pass of the layer.
# The operations that the layer applies
# to its inputs are going to be recorded
# on the GradientTape.
logits = model(x_batch_train, training=True) # Logits for this minibatch
# Compute the loss value for this minibatch.
loss_value = loss_fn(y_batch_train, logits)
# Use the gradient tape to automatically retrieve
# the gradients of the trainable variables with respect to the loss.
grads = tape.gradient(loss_value, model.trainable_weights)
# Run one step of gradient descent by updating
# the value of the variables to minimize the loss.
optimizer.apply_gradients(zip(grads, model.trainable_weights))
# Log every 200 batches.
if step % 200 == 0:
print('Training loss (for one batch) at step %s: %s' % (step, float(loss_value)))
print('Seen so far: %s samples' % ((step + 1) * 64))
And when running, I get:
ValueError: No gradients provided for any variable: (['conv2d_2/kernel:0', 'conv2d_2/bias:0'],). Provided grads_and_vars is ((None, <tf.Variable 'conv2d_2/kernel:0' shape=(4, 4, 1, 8) dtype=float32, nump
If I use some standard loss function:
For example the following model and loss function
inputs = keras.Input((12,12,1), name='images')
x0 = tf.keras.layers.Conv2D(8,4,strides=4)(inputs)
x0 = tf.keras.layers.AveragePooling2D(pool_size=(3, 3), name='pooling')(x0)
x0 = tf.keras.layers.Flatten(name='features')(x0)
x0 = layers.Dense(16, name='meta_features')(x0)
outputs = layers.Dense(2, name='predictions')(x0)
model = keras.Model(inputs=inputs, outputs=outputs)
loss_fn = keras.losses.SparseCategoricalCrossentropy(from_logits=True)
Everything works fine and converges well.
What am I doing wrong and can I fix it?

Mean of Tensorflow Keras's Glorot Normal Initializer is not zero

As per the documentation of Glorot Normal, mean of the Normal Distribution of the Initial Weights should be zero.
Draws samples from a truncated normal distribution centered on 0
But it doesn't seem to be zero, am I missing something?
Please find the code below:
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
import numpy as np
print(tf.__version__)
initializer = tf.keras.initializers.GlorotNormal(seed = 1234)
model = Sequential([Dense(units = 3, input_shape = [1], kernel_initializer = initializer,
bias_initializer = initializer),
Dense(units = 1, kernel_initializer = initializer,
bias_initializer = initializer)])
batch_size = 1
x = np.array([-1.0, 0, 1, 2, 3, 4.0], dtype = 'float32')
y = np.array([-3, -1.0, 1, 3.0, 5.0, 7.0], dtype = 'float32')
x = np.reshape(x, (-1, 1))
# Prepare the training dataset.
train_dataset = tf.data.Dataset.from_tensor_slices((x, y))
train_dataset = train_dataset.shuffle(buffer_size=64).batch(batch_size)
epochs = 1
learning_rate=1e-3
# Instantiate an optimizer.
optimizer = tf.keras.optimizers.SGD(learning_rate=learning_rate)
for epoch in range(epochs):
# Iterate over the batches of the dataset.
for step, (x_batch_train, y_batch_train) in enumerate(train_dataset):
with tf.GradientTape() as tape:
logits = model(x_batch_train, training=True) # Logits for this minibatch
# Compute the loss value for this minibatch.
loss_value = tf.keras.losses.MSE(y_batch_train, logits)
Initial_Weights_1st_Hidden_Layer = model.trainable_weights[0]
Mean_Weights_Hidden_Layer = tf.reduce_mean(Initial_Weights_1st_Hidden_Layer)
Initial_Weights_Output_Layer = model.trainable_weights[2]
Mean_Weights_Output_Layer = tf.reduce_mean(Initial_Weights_Output_Layer)
Initial_Bias_1st_Hidden_Layer = model.trainable_weights[1]
Mean_Bias_Hidden_Layer = tf.reduce_mean(Initial_Bias_1st_Hidden_Layer)
Initial_Bias_Output_Layer = model.trainable_weights[3]
Mean_Bias_Output_Layer = tf.reduce_mean(Initial_Bias_Output_Layer)
if epoch ==0 and step==0:
print('\n Initial Weights of First-Hidden Layer = ', Initial_Weights_1st_Hidden_Layer)
print('\n Mean of Weights of Hidden Layer = %s' %Mean_Weights_Hidden_Layer.numpy())
print('\n Initial Weights of Second-Hidden/Output Layer = ', Initial_Weights_Output_Layer)
print('\n Mean of Weights of Output Layer = %s' %Mean_Weights_Output_Layer.numpy())
print('\n Initial Bias of First-Hidden Layer = ', Initial_Bias_1st_Hidden_Layer)
print('\n Mean of Bias of Hidden Layer = %s' %Mean_Bias_Hidden_Layer.numpy())
print('\n Initial Bias of Second-Hidden/Output Layer = ', Initial_Bias_Output_Layer)
print('\n Mean of Bias of Output Layer = %s' %Mean_Bias_Output_Layer.numpy())
Because you don't draw too many samples from that distribution.
initializer = tf.keras.initializers.GlorotNormal(seed = 1234)
mean = tf.reduce_mean(initializer(shape=(1, 3))).numpy()
print(mean) # -0.29880756
But if you increase the samples:
initializer = tf.keras.initializers.GlorotNormal(seed = 1234)
mean = tf.reduce_mean(initializer(shape=(1, 500))).numpy()
print(mean) # 0.003004579
Same thing applies for your example too. If you increase first dense layer's units to 500, you should see 0.003004579 with same seed.

Implementing Variational Auto Encoder using Tensoroflow_Probability NOT on MNIST data set

I know there are many questions related to Variational Auto Encoders. However, this question in two aspects differs from the existing ones: 1) it is implemented using Tensforflow V2 and Tensorflow_probability; 2) It does not use MNIST or any other image data set.
As about the problem itself:
I am trying to implement VAE using Tensorflow_probability and Keras. and I want to train and evaluate it on some synthetic data sets --as part of my research. I provided the code below.
Although the implementation is done and during the training, the loss value decreases but once I want to evaluate the trained model on my test set I face different errors.
I am somehow confident that the issue is related to input/output shape but unfortunately I did not manage the solve it.
Here is the code:
import numpy as np
import tensorflow as tf
import tensorflow.keras as tfk
import tensorflow_probability as tfp
from tensorflow.keras import layers as tfkl
from sklearn.datasets import make_classification
from tensorflow_probability import layers as tfpl
from sklearn.model_selection import train_test_split
tfd = tfp.distributions
n_epochs = 5
n_features = 2
latent_dim = 1
n_units = 4
learning_rate = 1e-3
n_samples = 400
batch_size = 32
# Generate synthetic data / load data sets
x_in, y_in = make_classification(n_samples=n_samples, n_features=n_features, n_informative=2, n_redundant=0,
n_repeated=0, n_classes=2, n_clusters_per_class=2, weights=[0.5, 0.5],
flip_y=0.01, class_sep=1.0, hypercube=True,
shift=0.0, scale=1.0, shuffle=False, random_state=42)
x_in = x_in.astype('float32')
y_in = y_in.astype('float32') # .reshape(-1, 1)
x_train, x_test, y_train, y_test = train_test_split(x_in, y_in, test_size=0.4, random_state=42, shuffle=True)
x_test, x_val, y_test, y_val = train_test_split(x_test, y_test, test_size=0.5, random_state=42, shuffle=True)
print("shapes:", x_train.shape, y_train.shape, x_test.shape, y_test.shape, x_val.shape, y_val.shape)
prior = tfd.Independent(tfd.Normal(loc=[tf.zeros(latent_dim)], scale=1.), reinterpreted_batch_ndims=1)
train_dataset = tf.data.Dataset.from_tensor_slices(x_train).batch(batch_size)
valid_dataset = tf.data.Dataset.from_tensor_slices(x_val).batch(batch_size)
test_dataset = tf.data.Dataset.from_tensor_slices(x_test).batch(batch_size)
encoder = tf.keras.Sequential([
tfkl.InputLayer(input_shape=[n_features, ], name='enc_input'),
tfkl.Lambda(lambda x: tf.cast(x, tf.float32)), # - 0.5
tfkl.Dense(n_units, activation='relu', name='enc_dense1'),
tfkl.Dense(int(n_units / 2), activation='relu', name='enc_dense2'),
tfkl.Dense(tfpl.MultivariateNormalTriL.params_size(latent_dim),
activation=None, name='mvn_triL1'),
tfpl.MultivariateNormalTriL(
# weight >> num_train_samples or some thing except 1 to convert VAE to beta-VAE
latent_dim, activity_regularizer=tfpl.KLDivergenceRegularizer(prior, weight=1.), name='bottleneck'),
])
decoder = tf.keras.Sequential([
tfkl.InputLayer(input_shape=latent_dim, name='dec_input'),
# tfkl.Dense(n_units, activation='relu', name='dec_dense1'),
# tfkl.Dense(int(n_units * 2), activation='relu', name='dec_dense2'),
tfpl.IndependentBernoulli([n_features], tfd.Bernoulli.logits, name='dec_output'),
])
vae = tfk.Model(inputs=encoder.inputs, outputs=decoder(encoder.outputs), name='VAE')
print("enoder:", encoder)
print(" ")
print("encoder.inputs:", encoder.inputs)
print(" ")
print(" encoder.outputs:", encoder.outputs)
print(" ")
print("decoder:", decoder)
print(" ")
print("decoder:", decoder.inputs)
print(" ")
print("decoder.outputs:", decoder.outputs)
print(" ")
# negative log likelihood i.e the E_{S(eps)} [p(x|z)];
# because the KL term was added in the last layer of the encoder, i.e., via activity_regularizer.
# this loss function takes two arguments, namely the original data points x, and the output of the model,
# which we call it rv_x (because it is a random variable)
negloglik = lambda x, rv_x: -rv_x.log_prob(x)
vae.compile(optimizer=tf.optimizers.Adam(learning_rate=learning_rate),
loss=negloglik,)
vae.summary()
history = vae.fit(train_dataset, epochs=n_epochs, validation_data=valid_dataset,)
print("x.shape:", x_test.shape)
x_hat = vae(x_test)
print("original:")
print(x_test)
print(" ")
print("Decoded Random Samples:")
print(x_hat.sample())
print(" ")
print("Decoded Means:")
print(x_hat.mean())
The Questions:
With the above code I receive the following error:
tensorflow.python.framework.errors_impl.InvalidArgumentError: Input to reshape is a tensor with 80 values, but the requested shape has 160 [Op:Reshape]
As far I know we can add as many layers as I want in the decoder model before its output layer --as it is done a convolutional VAEs, am I right?
If I uncomment the following two lines of code in decoder:
# tfkl.Dense(n_units, activation='relu', name='dec_dense1'),
# tfkl.Dense(int(n_units * 2), activation='relu', name='dec_dense2'),
I see the following warnings and the upcoming error:
WARNING:tensorflow:Gradients do not exist for variables ['dec_dense1/kernel:0', 'dec_dense1/bias:0', 'dec_dense2/kernel:0', 'dec_dense2/bias:0'] when minimizing the loss.
WARNING:tensorflow:Gradients do not exist for variables ['dec_dense1/kernel:0', 'dec_dense1/bias:0', 'dec_dense2/kernel:0', 'dec_dense2/bias:0'] when minimizing the loss.
WARNING:tensorflow:Gradients do not exist for variables ['dec_dense1/kernel:0', 'dec_dense1/bias:0', 'dec_dense2/kernel:0', 'dec_dense2/bias:0'] when minimizing the loss.
WARNING:tensorflow:Gradients do not exist for variables ['dec_dense1/kernel:0', 'dec_dense1/bias:0', 'dec_dense2/kernel:0', 'dec_dense2/bias:0'] when minimizing the loss.
And the error:
tensorflow.python.framework.errors_impl.InvalidArgumentError: Input to reshape is a tensor with 640 values, but the requested shape has 160 [Op:Reshape]
Now the question is why the decoder layers are not used during the training as it is mentioned in the warning.
PS, I also tried to pass the x_train, x_valid, x_test directly during the training and evaluation process but it does not help.
Any helps would be indeed appreciated.

implement one RNN layer in deep DAE seems worse performance

I was trying to implement one RNN layer in deep DAE which is shown in the figure:
DRDAE:
My code is modified based on the DAE tutorial, I change one layer to basic LSTM RNN layer. It somehow can works. The noise in output among different pictures seems lies in same places.
However, compared to both only one layer of RNN and the DAE tutorial, the performance of the structure is much worse. And it requires much more iteration to reach a lower cost.
Can someone help why does the structure got worse result? Below is my code for DRDAE.
# -*- coding: utf-8 -*-
from __future__ import division, print_function, absolute_import
import tensorflow as tf
from tensorflow.contrib import rnn
import numpy as np
import matplotlib.pyplot as plt
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data", one_hot=True)
# Parameters
learning_rate = 0.0001
training_epochs = 50001
batch_size = 256
display_step = 500
examples_to_show = 10
total_batch = int(mnist.train.num_examples/batch_size)
# Network Parameters
n_input = 784 # data input
n_hidden_1 = 392 # 1st layer num features
n_hidden_2 = 196 # 2nd layer num features
n_steps = 14
# tf Graph input
X = tf.placeholder("float", [None, n_input])
Y = tf.placeholder("float", [None, n_input])
weights = {
'encoder_h1': tf.Variable(tf.random_normal([n_input, n_hidden_1])),
'encoder_h2': tf.Variable(tf.random_normal([n_hidden_1, n_hidden_2])),
'decoder_h1': tf.Variable(tf.random_normal([n_hidden_2, n_hidden_1])),
'decoder_h2': tf.Variable(tf.random_normal([n_hidden_1, n_input])),
}
biases = {
'encoder_b1': tf.Variable(tf.random_normal([n_hidden_1])),
'encoder_b2': tf.Variable(tf.random_normal([n_hidden_2])),
'decoder_b1': tf.Variable(tf.random_normal([n_hidden_1])),
'decoder_b2': tf.Variable(tf.random_normal([n_input])),
}
def RNN(x, size, weights, biases):
# Prepare data shape to match `rnn` function requirements
# Current data input shape: (batch_size, n_steps, n_input)
# Required shape: 'n_steps' tensors list of shape (batch_size, n_input)
# Unstack to get a list of 'n_steps' tensors of shape (batch_size, n_input)
x = tf.split(x,n_steps,1)
# Define a lstm cell with tensorflow
lstm_cell = rnn.BasicLSTMCell(size, forget_bias=1.0)
# Get lstm cell output
outputs, states = rnn.static_rnn(lstm_cell, x, dtype=tf.float32)
# Linear activation, using rnn inner loop last output
return tf.matmul(outputs[-1], weights) + biases
# Building the encoder
def encoder(x):
# Encoder Hidden layer with sigmoid activation #1
layer_1 = tf.nn.sigmoid(tf.add(tf.matmul(x, weights['encoder_h1']), biases['encoder_b1']))
# Decoder Hidden layer with sigmoid activation #2
layer_2 = tf.nn.sigmoid(tf.add(tf.matmul(layer_1, weights['encoder_h2']), biases['encoder_b2']))
return layer_2
# Building the decoder
def decoder(x):
# Encoder Hidden layer with sigmoid activation #1
layer_1 = RNN(x, n_hidden_2, weights['decoder_h1'],biases['decoder_b1'])
# Decoder Hidden layer with sigmoid activation #2
layer_2 = tf.nn.sigmoid(tf.add(tf.matmul(layer_1, weights['decoder_h2']), biases['decoder_b2']))
return layer_2
# Construct model
encoder_op = encoder(X)
decoder_op = decoder(encoder_op)
# Prediction
y_pred = decoder_op
# Targets (Labels) are the original data.
y_true = Y
# Define loss and optimizer, minimize the squared error
cost = tf.reduce_mean(tf.pow(y_true - y_pred, 2))
optimizer = tf.train.RMSPropOptimizer(learning_rate).minimize(cost)
# Evaluate model
correct_pred = tf.equal(tf.argmax(y_pred,1), tf.argmax(y_true,1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
# Initializing the variables
init = tf.global_variables_initializer()
# Launch the graph
with tf.Session() as sess:
#with tf.device("/cpu:0"):
sess.run(init)
# Training cycle
for epoch in range(training_epochs):
# Loop over all batches
for i in range(total_batch):
batch, _ = mnist.train.next_batch(batch_size)
origin = batch
# Run optimization op (backprop) and cost op (to get loss value)
sess.run(optimizer, feed_dict={X: batch, Y: origin})
# Display logs per epoch step
if epoch % display_step == 0:
c, acy = sess.run([cost, accuracy], feed_dict={X: batch, Y: origin})
print("Epoch:", '%05d' % (epoch+1), "cost =", "{:.9f}".format(c), "accuracy =", "{:.3f}".format(acy))
print("Optimization Finished!")
# Applying encode and decode over test set
encode_decode = sess.run(
y_pred, feed_dict={X: mnist.test.images[:examples_to_show]})
# Compare original images with their reconstructions
f, a = plt.subplots(2, 10, figsize=(10, 2))
for i in range(examples_to_show):
a[0][i].imshow(np.reshape(mnist.test.images[i], (28, 28)))
a[1][i].imshow(np.reshape(encode_decode[i], (28, 28)))

Tensorflow Autoencoder - How To Calculate Reconstruction Error?

I've implemented the following Autoencoder in Tensorflow as shown below. It basically takes MNIST digits as inputs, learns the structure of the data and reproduces the input at its output.
from __future__ import division, print_function, absolute_import
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
# Import MNIST data
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data", one_hot=True)
# Parameters
learning_rate = 0.01
training_epochs = 20
batch_size = 256
display_step = 1
examples_to_show = 10
# Network Parameters
n_hidden_1 = 256 # 1st layer num features
n_hidden_2 = 128 # 2nd layer num features
n_input = 784 # MNIST data input (img shape: 28*28)
# tf Graph input (only pictures)
X = tf.placeholder("float", [None, n_input])
weights = {
'encoder_h1': tf.Variable(tf.random_normal([n_input, n_hidden_1])),
'encoder_h2': tf.Variable(tf.random_normal([n_hidden_1, n_hidden_2])),
'decoder_h1': tf.Variable(tf.random_normal([n_hidden_2, n_hidden_1])),
'decoder_h2': tf.Variable(tf.random_normal([n_hidden_1, n_input])),
}
biases = {
'encoder_b1': tf.Variable(tf.random_normal([n_hidden_1])),
'encoder_b2': tf.Variable(tf.random_normal([n_hidden_2])),
'decoder_b1': tf.Variable(tf.random_normal([n_hidden_1])),
'decoder_b2': tf.Variable(tf.random_normal([n_input])),
}
# Building the encoder
def encoder(x):
# Encoder Hidden layer with sigmoid activation #1
layer_1 = tf.nn.sigmoid(tf.add(tf.matmul(x, weights['encoder_h1']),
biases['encoder_b1']))
# Decoder Hidden layer with sigmoid activation #2
layer_2 = tf.nn.sigmoid(tf.add(tf.matmul(layer_1, weights['encoder_h2']),
biases['encoder_b2']))
return layer_2
# Building the decoder
def decoder(x):
# Encoder Hidden layer with sigmoid activation #1
layer_1 = tf.nn.sigmoid(tf.add(tf.matmul(x, weights['decoder_h1']),
biases['decoder_b1']))
# Decoder Hidden layer with sigmoid activation #2
layer_2 = tf.nn.sigmoid(tf.add(tf.matmul(layer_1, weights['decoder_h2']),
biases['decoder_b2']))
return layer_2
# Construct model
encoder_op = encoder(X)
decoder_op = decoder(encoder_op)
# Prediction
y_pred = decoder_op
# Targets (Labels) are the input data.
y_true = X
# Define loss and optimizer, minimize the squared error
cost = tf.reduce_mean(tf.pow(y_true - y_pred, 2))
optimizer = tf.train.RMSPropOptimizer(learning_rate).minimize(cost)
# Initializing the variables
init = tf.global_variables_initializer()
# Launch the graph
with tf.Session() as sess:
sess.run(init)
total_batch = int(mnist.train.num_examples/batch_size)
# Training cycle
for epoch in range(training_epochs):
# Loop over all batches
for i in range(total_batch):
batch_xs, batch_ys = mnist.train.next_batch(batch_size)
# Run optimization op (backprop) and cost op (to get loss value)
_, c = sess.run([optimizer, cost], feed_dict={X: batch_xs})
# Display logs per epoch step
if epoch % display_step == 0:
print("Epoch:", '%04d' % (epoch+1),
"cost=", "{:.9f}".format(c))
print("Optimization Finished!")
# Applying encode and decode over test set
encode_decode = sess.run(
y_pred, feed_dict={X: mnist.test.images[:examples_to_show]})
# Compare original images with their reconstructions
f, a = plt.subplots(2, 10, figsize=(10, 2))
for i in range(examples_to_show):
a[0][i].imshow(np.reshape(mnist.test.images[i], (28, 28)))
a[1][i].imshow(np.reshape(encode_decode[i], (28, 28)))
f.show()
plt.draw()
plt.waitforbuttonpress()
When I am encoding and decoding over the test set, how do I calculate the reconstruction error (i.e. the Mean Squared Error/Loss) for each sample?
In other words I'd like to see how well the Autoencoder is able to reconstruct its input so that I can use the Autoencoder as a single-class classifier.
Many thanks in advance.
Barry
You can take the output of the decoder and take the difference with the true image and take the average.
Say y is the output of the decoder and the original test image is x then you can do something like for each of the examples and take an average over it:
tf.square(y-x)
This will be your reconstruction cost for the test set.