Why can't a deep NN approximate the simple ln(x) function? - tensorflow

I have created an ANN with two ReLU hidden layers + a linear output layer and am trying to approximate the simple ln(x) function, but I can't get it to work well. I am confused because ln(x) in the range x: [0.0, 1.0] should be approximated without problems (I am using learning rate 0.01 and basic gradient descent optimization).
import tensorflow as tf
import numpy as np

def GetTargetResult(x):
    curY = np.log(x)
    return curY

# Create model
def multilayer_perceptron(x, weights, biases):
    # Hidden layer with RELU activation
    layer_1 = tf.add(tf.matmul(x, weights['h1']), biases['b1'])
    layer_1 = tf.nn.relu(layer_1)
    # Hidden layer with RELU activation
    layer_2 = tf.add(tf.matmul(layer_1, weights['h2']), biases['b2'])
    layer_2 = tf.nn.relu(layer_2)
    # Output layer with linear activation
    out_layer = tf.matmul(layer_2, weights['out']) + biases['out']
    return out_layer

# Parameters
learning_rate = 0.01
training_epochs = 10000
batch_size = 50
display_step = 500

# Network Parameters
n_hidden_1 = 50  # 1st layer number of features
n_hidden_2 = 10  # 2nd layer number of features
n_input = 1

# Store layers weight & bias
weights = {
    'h1': tf.Variable(tf.random_uniform([n_input, n_hidden_1])),
    'h2': tf.Variable(tf.random_uniform([n_hidden_1, n_hidden_2])),
    'out': tf.Variable(tf.random_uniform([n_hidden_2, 1]))
}
biases = {
    'b1': tf.Variable(tf.random_uniform([n_hidden_1])),
    'b2': tf.Variable(tf.random_uniform([n_hidden_2])),
    'out': tf.Variable(tf.random_uniform([1]))
}

x_data = tf.placeholder(tf.float32, [None, 1])
y_data = tf.placeholder(tf.float32, [None, 1])

# Construct model
pred = multilayer_perceptron(x_data, weights, biases)

# Minimize the mean squared errors.
loss = tf.reduce_mean(tf.square(pred - y_data))
optimizer = tf.train.GradientDescentOptimizer(learning_rate)
train = optimizer.minimize(loss)

# Before starting, initialize the variables. We will 'run' this first.
init = tf.initialize_all_variables()

# Launch the graph.
sess = tf.Session()
sess.run(init)

for step in range(training_epochs):
    x_in = np.random.rand(batch_size, 1).astype(np.float32)
    y_in = GetTargetResult(x_in)
    sess.run(train, feed_dict={x_data: x_in, y_data: y_in})
    if step % display_step == 0:
        curX = np.random.rand(1, 1).astype(np.float32)
        curY = GetTargetResult(curX)
        curPrediction = sess.run(pred, feed_dict={x_data: curX})
        curLoss = sess.run(loss, feed_dict={x_data: curX, y_data: curY})
        print("For x = {0} and target y = {1} prediction was y = {2} and squared loss was = {3}"
              .format(curX, curY, curPrediction, curLoss))
For the configuration above the NN just learns to guess y = -1.00. I have tried different learning rates, a couple of optimizers and different configurations with no success: learning does not converge in any case. I have done something like this with a logarithm in the past in another deep learning framework without problems. Could this be a TF-specific issue? What am I doing wrong?

What your network has to predict
ln(x) on (0, 1] (plot omitted; source: WolframAlpha)
What your architecture is
ReLU(ReLU(x * W_1 + b_1) * W_2 + b_2) * W_out + b_out
Thoughts
My first thought was that ReLU is the problem. However, you don't apply ReLU to the output, so that should not be the cause.
Changing the initialization (from uniform to normal) and the optimizer (from SGD to Adam) seems to fix the problem:
#!/usr/bin/env python
import tensorflow as tf
import numpy as np

def get_target_result(x):
    return np.log(x)

def multilayer_perceptron(x, weights, biases):
    """Create model."""
    # Hidden layer with RELU activation
    layer_1 = tf.add(tf.matmul(x, weights['h1']), biases['b1'])
    layer_1 = tf.nn.relu(layer_1)
    # Hidden layer with RELU activation
    layer_2 = tf.add(tf.matmul(layer_1, weights['h2']), biases['b2'])
    layer_2 = tf.nn.relu(layer_2)
    # Output layer with linear activation
    out_layer = tf.matmul(layer_2, weights['out']) + biases['out']
    return out_layer

# Parameters
learning_rate = 0.01
training_epochs = 10**6
batch_size = 500
display_step = 500

# Network Parameters
n_hidden_1 = 50  # 1st layer number of features
n_hidden_2 = 10  # 2nd layer number of features
n_input = 1

# Store layers weight & bias
weights = {
    'h1': tf.Variable(tf.truncated_normal([n_input, n_hidden_1], stddev=0.1)),
    'h2': tf.Variable(tf.truncated_normal([n_hidden_1, n_hidden_2], stddev=0.1)),
    'out': tf.Variable(tf.truncated_normal([n_hidden_2, 1], stddev=0.1))
}
biases = {
    'b1': tf.Variable(tf.constant(0.1, shape=[n_hidden_1])),
    'b2': tf.Variable(tf.constant(0.1, shape=[n_hidden_2])),
    'out': tf.Variable(tf.constant(0.1, shape=[1]))
}

x_data = tf.placeholder(tf.float32, [None, 1])
y_data = tf.placeholder(tf.float32, [None, 1])

# Construct model
pred = multilayer_perceptron(x_data, weights, biases)

# Minimize the mean squared errors.
loss = tf.reduce_mean(tf.square(pred - y_data))
optimizer = tf.train.GradientDescentOptimizer(learning_rate)
# train = optimizer.minimize(loss)
train = tf.train.AdamOptimizer(1e-4).minimize(loss)

# Before starting, initialize the variables. We will 'run' this first.
init = tf.initialize_all_variables()

# Launch the graph.
sess = tf.Session()
sess.run(init)

for step in range(training_epochs):
    x_in = np.random.rand(batch_size, 1).astype(np.float32)
    y_in = get_target_result(x_in)
    sess.run(train, feed_dict={x_data: x_in, y_data: y_in})
    if step % display_step == 0:
        curX = np.random.rand(1, 1).astype(np.float32)
        curY = get_target_result(curX)
        curPrediction = sess.run(pred, feed_dict={x_data: curX})
        curLoss = sess.run(loss, feed_dict={x_data: curX, y_data: curY})
        print(("For x = {0} and target y = {1} prediction was y = {2} and "
               "squared loss was = {3}").format(curX, curY,
                                                curPrediction, curLoss))
Training this for 1 minute gave me:
For x = [[ 0.19118255]] and target y = [[-1.65452647]] prediction was y = [[-1.65021849]] and squared loss was = 1.85587377928e-05
For x = [[ 0.17362741]] and target y = [[-1.75084364]] prediction was y = [[-1.74087048]] and squared loss was = 9.94640868157e-05
For x = [[ 0.60853624]] and target y = [[-0.4966988]] prediction was y = [[-0.49964082]] and squared loss was = 8.65551464813e-06
For x = [[ 0.33864763]] and target y = [[-1.08279514]] prediction was y = [[-1.08586168]] and squared loss was = 9.4036658993e-06
For x = [[ 0.79126364]] and target y = [[-0.23412406]] prediction was y = [[-0.24541236]] and squared loss was = 0.000127425722894
For x = [[ 0.09994856]] and target y = [[-2.30309963]] prediction was y = [[-2.29796076]] and squared loss was = 2.6408026315e-05
For x = [[ 0.31053194]] and target y = [[-1.16946852]] prediction was y = [[-1.17038012]] and squared loss was = 8.31002580526e-07
For x = [[ 0.0512077]] and target y = [[-2.97186542]] prediction was y = [[-2.96796203]] and squared loss was = 1.52364455062e-05
For x = [[ 0.120253]] and target y = [[-2.11815739]] prediction was y = [[-2.12729549]] and squared loss was = 8.35050013848e-05
So the answer might be that your optimizer is not good / the optimization problem starts at a bad point. See
Xavier Glorot, Yoshua Bengio: Understanding the difficulty of training deep feedforward neural networks
Visualizing Optimization Algos
The image referenced here is from Alec Radford's nice optimizer GIFs (image omitted). It does not contain Adam, but it gives you a feeling for how much better one can do than SGD:
Two ideas for how this might be improved
try dropout
try not to use x values close to 0. I would rather sample values in [0.01, 1].
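A minimal sketch of that second suggestion, reusing the x_in / y_in / batch_size names from the training loop above (my illustration, not part of the original answer):

# Sample training inputs away from 0, where ln(x) heads towards -infinity.
# np.random.uniform draws from [low, high), so values below 0.01 never occur.
x_in = np.random.uniform(low=0.01, high=1.0, size=(batch_size, 1)).astype(np.float32)
y_in = np.log(x_in)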
However, my experience with regression problems is quite limited.

First of all, your input data is in the range [0, 1), which is not a good input to a neural network. Subtract the mean from x after computing y to normalize it (ideally also divide by the standard deviation).
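A minimal sketch of that normalization, assuming x is sampled with np.random.rand as in the question (the mean and standard deviation of a uniform [0, 1) sample are roughly 0.5 and 0.29):

x_in = np.random.rand(batch_size, 1).astype(np.float32)
y_in = np.log(x_in)            # compute the target from the raw x first
x_in = (x_in - 0.5) / 0.29     # then center (and optionally scale) the network input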
However, in your particular case it was not enough to make it work.
I played with it and found two ways to make it work (both require data normalization as described above):
Either completely remove the second layer
or
Make the number of neurons in the second layer 50.
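A minimal sketch of that second option, reusing the names from the question (my illustration, not code from the original answer):

n_hidden_2 = 50  # 2nd layer number of features, widened from 10
# The weights/biases dictionaries in the question are built from n_hidden_2,
# so no other change is needed; the affected entries become:
# 'h2':  tf.Variable(tf.random_uniform([n_hidden_1, n_hidden_2]))
# 'out': tf.Variable(tf.random_uniform([n_hidden_2, 1]))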
My guess would be that 10 neurons do not have sufficient representational power to pass enough information to the last layer (obviously, a perfectly smart NN would learn to ignore the second layer in this case, passing the answer through one of the neurons, but the theoretical possibility doesn't mean that gradient descent will learn to do so).

I have not looked at the code, but here is the theory. If you use an activation function like tanh, then for small weights the activation function is in its linear region and for large weights the activation function saturates at -1 or +1. If you are in the linear region across all layers then you cannot approximate complex functions (i.e. you have a sandwich of linear layers, hence the best you can do is a linear approximation), but if you have bigger weights then the nonlinearity allows you to approximate a wide range of functions. There are no free lunches; the weights need to be at the right values to avoid over-fitting and under-fitting. This process is called regularization.
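A tiny numeric illustration of that point (my addition, not part of the original answer):

import numpy as np

x = np.array([0.01, 0.1, 3.0, 10.0])
print(np.tanh(x))  # approximately [0.01, 0.0997, 0.995, 1.0]
# For small inputs tanh(x) is nearly x (linear region); for large inputs it saturates at +/-1.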

Related

Why does the Spark ML perceptron classifier have a high F1-score while the same model on TensorFlow performs very badly?

Our team is working on an NLP problem. We have a dataset with some labeled sentences and we must classify them into two classes, 0 or 1.
We preprocess the data and use word embeddings so that we have 300 features for each sentence, then we use a simple neural network to train the model.
Since the data are very skewed we measure the model score with the F1-score, computing it both on the train set (80%) and the test set (20%).
Spark
We used the multilayer perceptron classifier featured in PySpark's MLlib:
layers = [300, 600, 2]
trainer = MultilayerPerceptronClassifier(featuresCol='features', labelCol='target',
                                         predictionCol='prediction', maxIter=10, layers=layers,
                                         blockSize=128)
model = trainer.fit(train_df)
result = model.transform(test_df)
predictionAndLabels = result.select("prediction", "target").withColumnRenamed("target", "label")
evaluator = MulticlassClassificationEvaluator(metricName="f1")
f1_score = evaluator.evaluate(predictionAndLabels)
This way we get F1-scores ranging between 0.91 and 0.93.
TensorFlow
We then chose to switch (mainly for learning purposes) to TensorFlow, so we implemented a neural network using the same architecture and formulas as the MLlib one:
# Network Parameters
n_input = 300
n_hidden_1 = 600
n_classes = 2
# TensorFlow graph input
features = tf.placeholder(tf.float32, shape=(None, n_input), name='inputs')
labels = tf.placeholder(tf.float32, shape=(None, n_classes), name='labels')
# Initializes weights and biases
initialize_weights_and_biases()
# Layers definition
layer_1 = tf.add(tf.matmul(features, weights['h1']), biases['b1'])
layer_1 = tf.nn.sigmoid(layer_1)
out_layer = tf.matmul(layer_1, weights['out']) + biases['out']
out_layer = tf.nn.softmax(out_layer)
# Optimizer definition
learning_rate_ph = tf.placeholder(tf.float32, shape=(), name='learning_rate')
loss_function = tf.losses.log_loss(labels=labels, predictions=out_layer)
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate_ph).minimize(loss_function)
# Start TensorFlow session
init = tf.global_variables_initializer()
tf_session = tf.InteractiveSession()
tf_session.run(init)
# Train Neural Network
learning_rate = 0.01
iterations = 100
batch_size = 256
total_batch = int(len(y_train) / batch_size)
for epoch in range(iterations):
    avg_cost = 0.0
    for block in range(total_batch):
        batch_x = x_train[block * batch_size:min(block * batch_size + batch_size, len(x_train)), :]
        batch_y = y_train[block * batch_size:min(block * batch_size + batch_size, len(y_train)), :]
        _, c = tf_session.run([optimizer, loss_function], feed_dict={learning_rate_ph: learning_rate,
                                                                     features: batch_x,
                                                                     labels: batch_y})
        avg_cost += c
    avg_cost /= total_batch
    print("Iteration " + str(epoch + 1) + " Logistic-loss=" + str(avg_cost))
# Make predictions
predictions_train = tf_session.run(out_layer, feed_dict={features: x_train, labels: y_train})
predictions_test = tf_session.run(out_layer, feed_dict={features: x_test, labels: y_test})
# Compute F1-score
f1_score = f1_score_tf(y_test, predictions_test)
Support functions:
def initialize_weights_and_biases():
    global weights, biases
    epsilon_1 = sqrt(6) / sqrt(n_input + n_hidden_1)
    epsilon_2 = sqrt(6) / sqrt(n_classes + n_hidden_1)
    weights = {
        'h1': tf.Variable(tf.random_uniform([n_input, n_hidden_1],
                                            minval=0 - epsilon_1, maxval=epsilon_1, dtype=tf.float32)),
        'out': tf.Variable(tf.random_uniform([n_hidden_1, n_classes],
                                             minval=0 - epsilon_2, maxval=epsilon_2, dtype=tf.float32))
    }
    biases = {
        'b1': tf.Variable(tf.constant(1, shape=[n_hidden_1], dtype=tf.float32)),
        'out': tf.Variable(tf.constant(1, shape=[n_classes], dtype=tf.float32))
    }

def f1_score_tf(actual, predicted):
    actual = np.argmax(actual, 1)
    predicted = np.argmax(predicted, 1)
    tp = tf.count_nonzero(predicted * actual)
    fp = tf.count_nonzero(predicted * (actual - 1))
    fn = tf.count_nonzero((predicted - 1) * actual)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return tf.Tensor.eval(f1)
This way we get F1-scores ranging between 0.24 and 0.25.
Question
The only differences that I can see between the two neural networks are:
Optimizer: L-BFGS in Spark, Gradient Descent in TensorFlow
Weights and biases initialization: Spark makes its own initialization while we initialize them manually in TensorFlow
I don't think that these two parameters can cause such a big difference in performance between the models, but Spark still seems to get very high scores in very few iterations.
I can't tell whether TensorFlow is performing very badly or whether Spark's scores are simply not truthful. In either case, I think we are missing something important.
Initializing weights as uniform and bias as 1 is certainly not a good idea, and it may very well be the cause of this discrepancy.
Use normal or truncated_normal instead, with the default zero mean and a small variance for the weights:
weights = {
    'h1': tf.Variable(tf.truncated_normal([n_input, n_hidden_1],
                                          stddev=0.01, dtype=tf.float32)),
    'out': tf.Variable(tf.truncated_normal([n_hidden_1, n_classes],
                                           stddev=0.01, dtype=tf.float32))
}
and zero for the biases:
biases = {
    'b1': tf.Variable(tf.constant(0, shape=[n_hidden_1], dtype=tf.float32)),
    'out': tf.Variable(tf.constant(0, shape=[n_classes], dtype=tf.float32))
}
That said, I am not sure about the correctness of using the MulticlassClassificationEvaluator for a binary classification problem, and I would suggest doing some further manual checks to confirm that the function indeed returns what you think it returns...
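One possible manual check, as a sketch (my suggestion; predictionAndLabels and its prediction / label columns come from the question's Spark code): recompute the F1-score outside Spark, for example with scikit-learn, and compare it with the evaluator's output.

from sklearn.metrics import f1_score

# Collect the (small) prediction/label columns to the driver and recompute F1 there.
rows = predictionAndLabels.select("prediction", "label").collect()
y_pred = [r["prediction"] for r in rows]
y_true = [r["label"] for r in rows]
print(f1_score(y_true, y_pred))  # binary F1 for the positive class, to compare with the evaluator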

Implementing one RNN layer in a deep DAE gives worse performance

I was trying to implement one RNN layer in a deep DAE, as shown in the figure (DRDAE; image omitted).
My code is modified from the DAE tutorial; I changed one layer to a basic LSTM RNN layer. It somehow works, and the noise in the output seems to lie in the same places across different pictures.
However, compared to both a single RNN layer and the DAE tutorial, the performance of this structure is much worse, and it requires many more iterations to reach a lower cost.
Can someone explain why this structure gets worse results? Below is my code for the DRDAE.
# -*- coding: utf-8 -*-
from __future__ import division, print_function, absolute_import
import tensorflow as tf
from tensorflow.contrib import rnn
import numpy as np
import matplotlib.pyplot as plt
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data", one_hot=True)
# Parameters
learning_rate = 0.0001
training_epochs = 50001
batch_size = 256
display_step = 500
examples_to_show = 10
total_batch = int(mnist.train.num_examples/batch_size)
# Network Parameters
n_input = 784 # data input
n_hidden_1 = 392 # 1st layer num features
n_hidden_2 = 196 # 2nd layer num features
n_steps = 14
# tf Graph input
X = tf.placeholder("float", [None, n_input])
Y = tf.placeholder("float", [None, n_input])
weights = {
    'encoder_h1': tf.Variable(tf.random_normal([n_input, n_hidden_1])),
    'encoder_h2': tf.Variable(tf.random_normal([n_hidden_1, n_hidden_2])),
    'decoder_h1': tf.Variable(tf.random_normal([n_hidden_2, n_hidden_1])),
    'decoder_h2': tf.Variable(tf.random_normal([n_hidden_1, n_input])),
}
biases = {
    'encoder_b1': tf.Variable(tf.random_normal([n_hidden_1])),
    'encoder_b2': tf.Variable(tf.random_normal([n_hidden_2])),
    'decoder_b1': tf.Variable(tf.random_normal([n_hidden_1])),
    'decoder_b2': tf.Variable(tf.random_normal([n_input])),
}
def RNN(x, size, weights, biases):
    # Prepare data shape to match `rnn` function requirements
    # Current data input shape: (batch_size, n_steps, n_input)
    # Required shape: 'n_steps' tensors list of shape (batch_size, n_input)
    # Split to get a list of 'n_steps' tensors of shape (batch_size, n_input)
    x = tf.split(x, n_steps, 1)
    # Define a lstm cell with tensorflow
    lstm_cell = rnn.BasicLSTMCell(size, forget_bias=1.0)
    # Get lstm cell output
    outputs, states = rnn.static_rnn(lstm_cell, x, dtype=tf.float32)
    # Linear activation, using rnn inner loop last output
    return tf.matmul(outputs[-1], weights) + biases

# Building the encoder
def encoder(x):
    # Encoder hidden layer with sigmoid activation #1
    layer_1 = tf.nn.sigmoid(tf.add(tf.matmul(x, weights['encoder_h1']), biases['encoder_b1']))
    # Encoder hidden layer with sigmoid activation #2
    layer_2 = tf.nn.sigmoid(tf.add(tf.matmul(layer_1, weights['encoder_h2']), biases['encoder_b2']))
    return layer_2

# Building the decoder
def decoder(x):
    # Decoder hidden layer: LSTM RNN #1
    layer_1 = RNN(x, n_hidden_2, weights['decoder_h1'], biases['decoder_b1'])
    # Decoder hidden layer with sigmoid activation #2
    layer_2 = tf.nn.sigmoid(tf.add(tf.matmul(layer_1, weights['decoder_h2']), biases['decoder_b2']))
    return layer_2
# Construct model
encoder_op = encoder(X)
decoder_op = decoder(encoder_op)
# Prediction
y_pred = decoder_op
# Targets (Labels) are the original data.
y_true = Y
# Define loss and optimizer, minimize the squared error
cost = tf.reduce_mean(tf.pow(y_true - y_pred, 2))
optimizer = tf.train.RMSPropOptimizer(learning_rate).minimize(cost)
# Evaluate model
correct_pred = tf.equal(tf.argmax(y_pred,1), tf.argmax(y_true,1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
# Initializing the variables
init = tf.global_variables_initializer()
# Launch the graph
with tf.Session() as sess:
    # with tf.device("/cpu:0"):
    sess.run(init)
    # Training cycle
    for epoch in range(training_epochs):
        # Loop over all batches
        for i in range(total_batch):
            batch, _ = mnist.train.next_batch(batch_size)
            origin = batch
            # Run optimization op (backprop) and cost op (to get loss value)
            sess.run(optimizer, feed_dict={X: batch, Y: origin})
        # Display logs per epoch step
        if epoch % display_step == 0:
            c, acy = sess.run([cost, accuracy], feed_dict={X: batch, Y: origin})
            print("Epoch:", '%05d' % (epoch+1), "cost =", "{:.9f}".format(c), "accuracy =", "{:.3f}".format(acy))
    print("Optimization Finished!")
    # Applying encode and decode over test set
    encode_decode = sess.run(
        y_pred, feed_dict={X: mnist.test.images[:examples_to_show]})
    # Compare original images with their reconstructions
    f, a = plt.subplots(2, 10, figsize=(10, 2))
    for i in range(examples_to_show):
        a[0][i].imshow(np.reshape(mnist.test.images[i], (28, 28)))
        a[1][i].imshow(np.reshape(encode_decode[i], (28, 28)))

Solve a prediction task using TensorFlow

I've been trying to solve a prediction/regression problem using TensorFlow, but I'm facing some problems. Let me give you some context before I explain my actual problem.
The data I've been playing with is a set of 5 features, let's call them [f1, f2, f3, f4, f5], representing, somehow, a particular phenomenon identified by a real value (the target).
What I've been trying to do is to train a multilayer perceptron that learns the relationship between the features and the target values. In a nutshell, I'd like to predict real values based on what the neural network has seen during the training phase.
I've identified this as a prediction/regression problem and wrote the following code:
#picking device to run on
os.environ['CUDA_VISIBLE_DEVICES'] = '1'
# Parameters
learning_rate = 0.001
training_epochs = 99999
batch_size = 4096
STDDEV = 0.1
# Network Parameters
n_hidden_1 = 10 # 1st layer number of neurons
n_hidden_2 = 10 # 2nd layer number of neurons
n_hidden_3 = 10 # 3rd layer number of neurons
n_hidden_4 = 10 # 4th layer number of neurons
n_hidden_5 = 10 # 5th layer number of neurons
n_input = 5 # number of features
n_classes = 1 # one target value (float)
# tf Graph input
x = tf.placeholder("float", [None, n_input])
y = tf.placeholder("float", [None, n_classes])
# LOADING DATA
data_train = loader.loadDataset(dir_feat, train_path, 'TRAIN', features)
data_test = loader.loadDataset(dir_feat, test_path, 'TEST', features)
valid_period = 5
test_period = 10
def multilayer_perceptron(x, weights, biases):
    # Hidden layer with sigmoid activation
    layer_1 = tf.add(tf.matmul(x, weights['h1']), biases['b1'])
    layer_1 = tf.nn.sigmoid(layer_1)
    layer_2 = tf.add(tf.matmul(layer_1, weights['h2']), biases['b2'])
    layer_2 = tf.nn.sigmoid(layer_2)
    layer_3 = tf.add(tf.matmul(layer_2, weights['h3']), biases['b3'])
    layer_3 = tf.nn.sigmoid(layer_3)
    layer_4 = tf.add(tf.matmul(layer_3, weights['h4']), biases['b4'])
    layer_4 = tf.nn.sigmoid(layer_4)
    layer_5 = tf.add(tf.matmul(layer_4, weights['h5']), biases['b5'])
    layer_5 = tf.nn.sigmoid(layer_5)
    # Output layer with linear activation
    out = tf.matmul(layer_5, weights['out']) + biases['out']
    return out
# Store layers weight & bias
weights = {
    'h1': tf.Variable(tf.random_normal([n_input, n_hidden_1], stddev=STDDEV)),
    'h2': tf.Variable(tf.random_normal([n_hidden_1, n_hidden_2], stddev=STDDEV)),
    'h3': tf.Variable(tf.random_normal([n_hidden_2, n_hidden_3], stddev=STDDEV)),
    'h4': tf.Variable(tf.random_normal([n_hidden_3, n_hidden_4], stddev=STDDEV)),
    'h5': tf.Variable(tf.random_normal([n_hidden_4, n_hidden_5], stddev=STDDEV)),
    'out': tf.Variable(tf.random_normal([n_hidden_5, n_classes], stddev=STDDEV))
}
biases = {
    'b1': tf.Variable(tf.random_normal([n_hidden_1])),
    'b2': tf.Variable(tf.random_normal([n_hidden_2])),
    'b3': tf.Variable(tf.random_normal([n_hidden_3])),
    'b4': tf.Variable(tf.random_normal([n_hidden_4])),
    'b5': tf.Variable(tf.random_normal([n_hidden_5])),
    'out': tf.Variable(tf.random_normal([n_classes]))
}
# Construct model
pred = multilayer_perceptron(x, weights, biases)
def RMSE():
    return tf.sqrt(tf.reduce_mean(tf.square(y - pred)))
cost = RMSE()
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate).minimize(cost)
# Initializing the variables
init = tf.initialize_all_variables()
# Launch the graph
with tf.Session() as sess:
    sess.run(init)
    # Training cycle
    for epoch in range(1, training_epochs):
        avg_cost = 0.
        avg_R_square_train = []
        train_dataset = loader.Dataset(data=data_train, batch_size=batch_size, num_feats=n_input)
        total_batch = train_dataset.getNumberBatches()
        # Loop over all batches
        for i in range(total_batch):
            batch_x, batch_y = train_dataset.next_batch(update=True)
            # Run optimization op (backprop) and cost op (to get loss value)
            sess.run(optimizer, feed_dict={x: batch_x, y: batch_y})
            c_train = sess.run(cost, feed_dict={x: batch_x, y: batch_y})
            # Compute average loss
            avg_cost += c_train / total_batch
        print("Epoch:" + str(epoch) + ", TRAIN_loss = {:.9f}".format(avg_cost))
        # TESTING
        if epoch % test_period == 0:
            c_test = sess.run(cost, feed_dict={x: data_test[0][0], y: data_test[0][1]})
            print("Epoch:" + str(epoch) + ", TEST_loss = {:.9f}".format(c_test))
The problem I've encountered is that the cost function on the test set (after some iterations) gets stuck in a local minimum and doesn't decrease anymore.
Epoch:6697, TRAIN_loss = 2.162182076
Epoch:6698, TRAIN_loss = 2.156500859
Epoch:6699, TRAIN_loss = 2.157814605
Epoch:6700, TRAIN_loss = 2.160744122
Epoch:6700, TEST_loss = 2.301288128
Epoch:6701, TRAIN_loss = 2.139338647
...
Epoch:6709, TRAIN_loss = 2.166410744
Epoch:6710, TRAIN_loss = 2.162357884
Epoch:6710, TEST_loss = 2.301478863
Epoch:6711, TRAIN_loss = 2.143475396
...
Epoch:6719, TRAIN_loss = 2.145476401
Epoch:6720, TRAIN_loss = 2.150237552
Epoch:6720, TEST_loss = 2.301517725
Epoch:6721, TRAIN_loss = 2.151232243
...
Epoch:6729, TRAIN_loss = 2.163080522
Epoch:6730, TRAIN_loss = 2.160523321
Epoch:6730, TEST_loss = 2.301782370
...
Epoch:6739, TRAIN_loss = 2.156920952
Epoch:6740, TRAIN_loss = 2.162290675
Epoch:6740, TEST_loss = 2.301943779
...
I've tried changing several hyper-parameters, such as the number of hidden layers and/or the number of nodes, the learning rate, and the batch size, but the situation doesn't change at all. I have also tried other loss functions such as MAE and MSE.
Actually, the number of data samples I have is roughly 270,000.
Can someone please suggest how to solve this problem or give me some useful advice about it?
Thanks in advance.
Davide

Zero cost in TensorFlow multilayer perceptron

I have built a simple neural network to classify data into only 2 classes.
The data looks something like this:
34.62365962451697,78.0246928153624,0
60.18259938620976,86.30855209546826,1
There are no zero values in the data, so there's no obvious source of such a cost. The cost is zero with the Adagrad optimizer and NaN with the gradient descent optimizer.
Here's the code
import numpy as ny
import tensorflow as tf
def load():
    data = []
    for line in open("ex2data1.txt"):
        row = line.split(',')
        x = ny.array(row, dtype='|S4')
        data.append(x.astype(ny.float64))
    return ny.array(data)

def multilayer_perceptron(x, weights, biases):
    # Hidden layer with ReLU activation
    layer_1 = tf.add(tf.matmul(x, weights['h1']), biases['b1'])
    layer_1 = tf.nn.relu(layer_1)
    # Hidden layer with ReLU activation
    layer_2 = tf.add(tf.matmul(layer_1, weights['h2']), biases['b2'])
    layer_2 = tf.nn.relu(layer_2)
    # Output layer with linear activation
    out_layer = tf.matmul(layer_2, weights['out']) + biases['out']
    return out_layer
# Store layers weight & bias
weights = {
    'h1': tf.Variable(tf.random_normal([2, 15])),
    'h2': tf.Variable(tf.random_normal([15, 15])),
    'out': tf.Variable(tf.random_normal([15, 1]))
}
biases = {
    'b1': tf.Variable(tf.random_normal([15])),
    'b2': tf.Variable(tf.random_normal([15])),
    'out': tf.Variable(tf.random_normal([1]))
}
data = load()
Xdata = ny.array(data[:, 0:2])
Ydata = ny.array(data[:, 2])
Ydata = ny.array(Ydata.reshape([100, 1]))
# Step 2 - Create input and output placeholders for data
X = tf.placeholder("float", [None, 2], name="X")
Y = tf.placeholder("float", [None, 1], name="Y")
pred = multilayer_perceptron(X, weights, biases)
# Minimize error using cross entropy
with tf.name_scope("cost"):
    cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=pred, labels=Y))
    optimizer = tf.train.AdagradOptimizer(0.001).minimize(cost)
    tf.summary.scalar("cost", cost)
init = tf.global_variables_initializer()
summary_op = tf.summary.merge_all()
with tf.Session() as sess:
    sess.run(init)
    print(Xdata)
    print(Ydata)
    # Step 12 train the model
    for i in range(1000):
        sess.run(optimizer, feed_dict={X: Xdata, Y: Ydata})
        if i % 100 == 0:
            print(sess.run(cost, feed_dict={X: Xdata, Y: Ydata}))
With the way your labels are represented you should not use this loss function: tf.nn.softmax_cross_entropy_with_logits expects one-hot labels across a class dimension, and with a single output logit the softmax is always 1, so the cross-entropy collapses to zero regardless of the weights. I think this is relevant.
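A minimal sketch of the alternative (my addition), assuming the single-column 0/1 labels and the pred / Y tensors from the question: pair the single output logit with a sigmoid cross-entropy instead.

# Binary labels of shape [None, 1] pair naturally with a per-logit sigmoid cross-entropy.
with tf.name_scope("cost"):
    cost = tf.reduce_mean(
        tf.nn.sigmoid_cross_entropy_with_logits(logits=pred, labels=Y))
    optimizer = tf.train.AdagradOptimizer(0.001).minimize(cost)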

Can I generate the input given the output in a pretrained TensorFlow model?

Let's assume I have trained a model for the MNIST task, given the following code:
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("/tmp/data/", one_hot=True)
import tensorflow as tf
# Parameters
learning_rate = 0.001
training_epochs = 15
batch_size = 100
display_step = 1
# Network Parameters
n_hidden_1 = 256 # 1st layer number of features
n_hidden_2 = 256 # 2nd layer number of features
n_input = 784 # MNIST data input (img shape: 28*28)
n_classes = 10 # MNIST total classes (0-9 digits)
# tf Graph input
x = tf.placeholder("float", [None, n_input])
y = tf.placeholder("float", [None, n_classes])
weights = {
    'h1': tf.Variable(tf.random_normal([n_input, n_hidden_1])),
    'h2': tf.Variable(tf.random_normal([n_hidden_1, n_hidden_2])),
    'out': tf.Variable(tf.random_normal([n_hidden_2, n_classes]))
}
biases = {
    'b1': tf.Variable(tf.random_normal([n_hidden_1])),
    'b2': tf.Variable(tf.random_normal([n_hidden_2])),
    'out': tf.Variable(tf.random_normal([n_classes]))
}
# Create model
def multilayer_perceptron(x, weights, biases):
    # Hidden layer with RELU activation
    layer_1 = tf.add(tf.matmul(x, weights['h1']), biases['b1'])
    layer_1 = tf.nn.relu(layer_1)
    # Hidden layer with RELU activation
    layer_2 = tf.add(tf.matmul(layer_1, weights['h2']), biases['b2'])
    layer_2 = tf.nn.relu(layer_2)
    # Output layer with linear activation
    out_layer = tf.matmul(layer_2, weights['out']) + biases['out']
    return out_layer
# Construct model
pred = multilayer_perceptron(x, weights, biases)
# Test model
correct_prediction = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))
# Calculate accuracy
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
# Define loss and optimizer
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=pred, labels=y))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)
# Initializing the variables
init = tf.global_variables_initializer()
# Launch the graph
with tf.Session() as sess:
    sess.run(init)
    # Training cycle
    for epoch in range(training_epochs):
        avg_cost = 0.
        avg_acc = 0.
        total_batch = int(mnist.train.num_examples/batch_size)
        # Loop over all batches
        for i in range(total_batch):
            batch_x, batch_y = mnist.train.next_batch(batch_size)
            # Run optimization op (backprop) and cost op (to get loss value)
            _, c = sess.run([optimizer, cost], feed_dict={x: batch_x, y: batch_y})
            batch_acc = accuracy.eval({x: batch_x, y: batch_y})
            # Compute average loss
            avg_cost += c / total_batch
            avg_acc += batch_acc / total_batch
        # Display logs per epoch step
        if epoch % display_step == 0:
            test_acc = accuracy.eval({x: mnist.test.images, y: mnist.test.labels})
            print(
                "Epoch:",
                '%04d' % (epoch+1),
                "cost=",
                "{:.9f}".format(avg_cost),
                "average_train_accuracy=",
                "{:.6f}".format(avg_acc),
                "test_accuracy=",
                "{:.6f}".format(test_acc)
            )
    print("Optimization Finished!")
So this model predicts the number shown in an image given the image.
Once I have trained it, could I make the input a 'variable' instead of a 'placeholder' and try to reverse-engineer the input given an output?
For example, I would like to feed the output '8' and produce a representative image of the number eight.
I thought of:
Freezing the model
Adding a variable matrix 'M' of the same size as the input between the input and the weights
Feeding an identity matrix as input to the input placeholder
Running the optimizer to learn the 'M' matrix.
Is there a better way?
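For reference, a minimal sketch of that optimize-the-input idea (my illustration, not part of the original question), assuming the trained multilayer_perceptron, weights and biases from the code above:

# Treat the input image itself as the only trainable variable and push up the logit for class 8.
target_class = 8
input_var = tf.Variable(tf.zeros([1, n_input]))              # the 'image' being optimized
logits = multilayer_perceptron(input_var, weights, biases)   # reuse the trained graph
objective = -logits[0, target_class]                         # minimizing this maximizes the target logit
opt_step = tf.train.AdamOptimizer(0.1).minimize(objective, var_list=[input_var])
# Passing var_list=[input_var] leaves the trained weights untouched (effectively frozen).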
If your goal is to reverse the model in the sense that the input should be a digit and the output an image displaying that digit (in, say, handwritten form), it is not quite possible to do with machine learning models.
Because machine learning models attempt to create generalizations from the input (so that similar input will provide similar output, although the model was never trained on it), they tend to be quite lossy. Additionally, the reduction from hundreds, thousands or more input variables into a single output variable obviously has to lose some information in the process.
More specifically, although a multilayer perceptron (as you're using in your example) is a fully connected neural network, some weights are expected to be zero, thus completely dropping the information in certain input variables. Moreover, the same output of a neuron can be produced by multiple distinct inputs to its function, due to the many degrees of freedom.
It is theoretically possible to replace those degrees of freedom and lost information with specifically crafted or random data, but that does not guarantee a successful output.
On a side note, I'm a bit puzzled by this question. If you are able to generate that model yourself, you could also create a similar model that does the opposite. You could train a model to accept an input digit (and perhaps some random seed) and output an image.