I have the network below, and when I train it the accuracy remains 0.000. I tried to make it easy by including just 2 samples. The inputs are all zeros except in one of the samples. The difference between the samples is that for the all-zero input the output is something like 0.3 0.4 0.3 and in the other case it's 0.4 0.3 0.3 (both sum to 1). I would expect it to be easy to get at least 50% accuracy, and probably 100%, on just two training samples.
Question: is there something wrong in the configuration of my network? If not, any suggestions on how to proceed? So far TensorFlow has not been easy for me to debug.
Might have relevance: I first had the weights and bias initialized to zero and then got a 0.5 accuracy. When I print the content of the layers after training only the weight and bias of the out layer contain positive values.
self.session = tf.Session()
n_hidden_1 = 10 # 1st layer number of neurons
n_hidden_2 = 10 # 2nd layer number of neurons
self.num_input = 68 # data values
self.num_classes = 18
self.weights = {
'h1': tf.Variable(tf.random_normal([self.num_input, n_hidden_1])),
'h2': tf.Variable(tf.random_normal([n_hidden_1, n_hidden_2])),
'out': tf.Variable(tf.random_normal([n_hidden_2, self.num_classes]))
}
self.biases = {
'b1': tf.Variable(tf.random_normal([n_hidden_1])),
'b2': tf.Variable(tf.random_normal([n_hidden_2])),
'out': tf.Variable(tf.random_normal([self.num_classes]))
}
self.input = tf.placeholder(dtype=tf.float32, shape = [None, self.num_input])
self.output = tf.placeholder(dtype=tf.float32, shape = [None, self.num_classes])
layer_1 = tf.nn.relu(tf.add(tf.matmul(self.input, self.weights['h1']), self.biases['b1']))
# Hidden fully connected layer with n_hidden_2 (10) neurons
layer_2 = tf.nn.relu(tf.add(tf.matmul(layer_1, self.weights['h2']), self.biases['b2']))
# Output fully connected layer with a neuron for each class
self.out_layer = tf.nn.softmax(tf.matmul(layer_2, self.weights['out']) + self.biases['out'])
self.loss_op = tf.reduce_mean(tf.squared_difference(self.out_layer, self.output))
optimizer = tf.train.AdamOptimizer(learning_rate=0.1)
self.train_op = optimizer.minimize(self.loss_op)
# Evaluate model
correct_pred = tf.equal(tf.argmax(self.out_layer, 1), tf.argmax(self.output, 1))
self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
self.session.run(tf.global_variables_initializer())
def train(self,train_x,train_y):
loss, acc = self.session.run([self.loss_op, self.accuracy], feed_dict={self.input: train_x, self.output: train_y})
self.logger.info("Before training Loss= " + \
"{:.4f}".format(loss) + ", Training Accuracy= " + \
"{:.3f}".format(acc))
self.session.run(self.train_op, feed_dict={self.input: train_x, self.output: train_y})
loss, acc = self.session.run([self.loss_op, self.accuracy], feed_dict={self.input: train_x, self.output: train_y})
self.logger.info("After training Loss= " + \
"{:.4f}".format(loss) + ", Training Accuracy= " + \
"{:.3f}".format(acc))
It looks like you're running train(...) just once. You need to call session.run(train_op, feed_dict=...) in a loop.
That call will only make a single update to the parameters, which isn't going to be much better than random initialization.
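To illustrate the difference between one update and many, here is a minimal NumPy sketch rather than the question's TensorFlow graph. It assumes a single linear softmax layer trained with cross-entropy (the question uses two hidden layers and a squared difference), and the helper names train_step, cross_entropy and accuracy are made up for this example. The point is only that the update has to be repeated in a loop before the accuracy means anything.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # stabilise the exponentials
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(w, b, x, y):
    p = softmax(x @ w + b)
    return float(-np.mean(np.sum(y * np.log(p), axis=1)))

def accuracy(w, b, x, y):
    p = softmax(x @ w + b)
    return float(np.mean(p.argmax(axis=1) == y.argmax(axis=1)))

def train_step(w, b, x, y, lr):
    """One gradient-descent update, the analogue of a single session.run(train_op)."""
    grad_logits = (softmax(x @ w + b) - y) / len(x)
    return w - lr * (x.T @ grad_logits), b - lr * grad_logits.sum(axis=0)

# Two samples as in the question: all zeros, and all zeros except one value.
x = np.zeros((2, 4))
x[1, 0] = 1.0
y = np.array([[0.3, 0.4, 0.3], [0.4, 0.3, 0.3]])

rng = np.random.RandomState(0)
w = rng.normal(size=(4, 3))
b = rng.normal(size=3)

initial_loss = cross_entropy(w, b, x, y)
for step in range(2000):  # many updates, not just one
    w, b = train_step(w, b, x, y, lr=0.5)
final_loss = cross_entropy(w, b, x, y)
final_acc = accuracy(w, b, x, y)
print(initial_loss, final_loss, final_acc)
```

In the question's class, the equivalent change would be to call self.session.run(self.train_op, ...) inside a for loop and log the loss every so often.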
I've been walking through some TensorFlow tutorials and am cobbling together a pet experiment. However, I am running into some dimension errors and I can't seem to figure them out.
My goal: I have an input matrix for the shape 1xN. I have a training set of dimension 10xN. (1 and 10 were chosen arbitrarily). N is intended to represent N samples in a training set: 1 input value mapped to one vector of outputs. You can think of this as 1 input neuron and m output neurons. The training set is a set of these single values mapped to a 1d vector. I wish to train the network by running the set of these mapped inputs and outputs against it and reducing the error.
The simple algorithm that I am trying to accomplish:
For each value in the input vector
Load the input neuron with that value
Feed forward
Evaluate against the corresponding vector
Repeat to minimize error.
However, I seem to be getting mixed up with how to format the data to feed to the network. I have a placeholder of 1 input neurons and one of n output neurons. I want to follow the above algorithm but I am not sure if I am doing it right:
# Data parameters
num_frames = 10
stimuli_value_low = .00001
stimuli_value_high = 100
pixel_value_low = .00001
pixel_value_high = 256.0
stimuli_dimension = 1
frame_dimension = 10
stimuli = np.random.uniform(stimuli_value_low, stimuli_value_high, (stimuli_dimension, num_frames))
frames = np.random.uniform(pixel_value_low, pixel_value_high, (frame_dimension, num_frames))
# Parameters
learning_rate = 0.01
training_iterations = 1000
display_iteration = 10
# Network Parameters
n_hidden_1 = 100
n_hidden_2 = 100
num_input_neurons = stimuli_dimension
num_output_neurons = frame_dimension
# Create placeholders
input_placeholder = tf.placeholder("float", [None, num_input_neurons])
output_placeholder = tf.placeholder("float", [None, num_output_neurons])
# Store layers weight & bias
weights = {
'h1': tf.Variable(tf.random_normal([num_input_neurons, n_hidden_1])),
'h2': tf.Variable(tf.random_normal([n_hidden_1, n_hidden_2])),
'out': tf.Variable(tf.random_normal([n_hidden_2, num_output_neurons]))
}
biases = {
'b1': tf.Variable(tf.random_normal([n_hidden_1])),
'b2': tf.Variable(tf.random_normal([n_hidden_2])),
'out': tf.Variable(tf.random_normal([num_output_neurons]))
}
# Create model
def neural_net(input_placeholder):
# Hidden fully connected layer
layer_1 = tf.add(tf.matmul(input_placeholder, weights['h1']), biases['b1'])
# Hidden fully connected layer
layer_2 = tf.add(tf.matmul(layer_1, weights['h2']), biases['b2'])
# Output fully connected layer with a neuron for each pixel
out_layer = tf.matmul(layer_2, weights['out']) + biases['out']
return out_layer
# Construct model
logits = neural_net(input_placeholder)
# Define loss operation and optimizer
loss_operation = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits = logits, labels = output_placeholder))
optimizer = tf.train.AdamOptimizer(learning_rate = learning_rate)
train_operation = optimizer.minimize(loss_operation)
# Evaluate model (with test logits, for dropout to be disabled)
correct_pred = tf.equal(tf.argmax(logits, 1), tf.argmax(output_placeholder, 1))
accuracy_operation = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
# Initialize the variables (i.e. assign their default value)
init = tf.global_variables_initializer()
# Start Training
with tf.Session() as sess:
# Run the initializer
sess.run(init)
for step in range(1, training_iterations + 1):
sess.run(train_operation, feed_dict = {X: stimuli, Y: frames})
if iteration % display_iteration == 0 or iteration == 1:
loss, accuracy = sess.run([loss_operation, accuracy_operation], feed_dict = {X: stimuli, Y: frames})
print("Step " + str(iteration) +
", Loss = " + "{:.4f}".format(loss) +
", Training Accuracy= " + \
"{:.3f}".format(acc))
print("Optimization finished!")
I think it is something to do with how I am structuring my data or feeding it to the run function.
Here is the error I am getting:
ValueError Traceback (most recent call last)
<ipython-input-420-7517598734d6> in <module>()
6 for step in range(1, training_iterations + 1):
7
----> 8 sess.run(train_operation, feed_dict = {X: stimuli, Y: frames})
9
10 if iteration % display_iteration == 0 or iteration == 1:
1 frames
/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py in _run(self, handle, fetches, feed_dict, options, run_metadata)
1147 'which has shape %r' %
1148 (np_val.shape, subfeed_t.name,
-> 1149 str(subfeed_t.get_shape())))
1150 if not self.graph.is_feedable(subfeed_t):
1151 raise ValueError('Tensor %s may not be fed.' % subfeed_t)
ValueError: Cannot feed value of shape (1, 10) for Tensor 'Placeholder_6:0', which has shape '(?, 1)'
How can I ensure I am formatting my input data correctly and forming my network correspondingly?
Turns out I had the dimensions of the arrays I was generating backwards:
stimuli = np.random.uniform(stimuli_value_low, stimuli_value_high, (stimuli_dimension, num_frames))
frames = np.random.uniform(pixel_value_low, pixel_value_high, (frame_dimension, num_frames))
should be:
stimuli = np.random.uniform(stimuli_value_low, stimuli_value_high, (num_frames, stimuli_dimension))
frames = np.random.uniform(pixel_value_low, pixel_value_high, (num_frames, frame_dimension))
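A quick way to check this before feeding anything to the session is to print the array shapes. The sketch below uses only NumPy and the dimensions from the question; stimuli_wrong and stimuli_fixed are names introduced here for illustration. A placeholder of shape [None, dim] expects samples along the first axis and features along the second.

```python
import numpy as np

num_frames = 10          # number of samples
stimuli_dimension = 1    # values per input sample
frame_dimension = 10     # values per output sample

# Samples along the first axis, so each array matches a [None, dim] placeholder.
stimuli = np.random.uniform(0.00001, 100.0, (num_frames, stimuli_dimension))
frames = np.random.uniform(0.00001, 256.0, (num_frames, frame_dimension))

# An array that was generated the other way round can be transposed instead.
stimuli_wrong = np.random.uniform(0.00001, 100.0, (stimuli_dimension, num_frames))
stimuli_fixed = stimuli_wrong.T

print(stimuli.shape, frames.shape, stimuli_wrong.shape, stimuli_fixed.shape)
```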
Our team is working on a NLP problem. We have a dataset with some labeled sentences and we must classify them into two classes, 0 or 1.
We preprocess the data and use word embeddings so that we have 300 features for each sentence, then we use a simple neural network to train the model.
Since the data are very skewed we measure the model score with the F1-score, computing it both on the train set (80%) and the test set (20%).
Spark
We used the multilayer perceptron classifier featured in PySpark's MLlib:
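from pyspark.ml.classification import MultilayerPerceptronClassifier
from pyspark.ml.evaluation import MulticlassClassificationEvaluator
layers = [300, 600, 2]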
trainer = MultilayerPerceptronClassifier(featuresCol='features', labelCol='target',
predictionCol='prediction', maxIter=10, layers=layers,
blockSize=128)
model = trainer.fit(train_df)
result = model.transform(test_df)
predictionAndLabels = result.select("prediction", "target").withColumnRenamed("target", "label")
evaluator = MulticlassClassificationEvaluator(metricName="f1")
f1_score = evaluator.evaluate(predictionAndLabels)
This way we get F1-scores ranging between 0.91 and 0.93.
TensorFlow
We then chose to switch (mainly for learning purposes) to TensorFlow, so we implemented a neural network using the same architecture and formulas as MLlib's:
# Network Parameters
n_input = 300
n_hidden_1 = 600
n_classes = 2
# TensorFlow graph input
features = tf.placeholder(tf.float32, shape=(None, n_input), name='inputs')
labels = tf.placeholder(tf.float32, shape=(None, n_classes), name='labels')
# Initializes weights and biases
initialize_weights_and_biases()
# Layers definition
layer_1 = tf.add(tf.matmul(features, weights['h1']), biases['b1'])
layer_1 = tf.nn.sigmoid(layer_1)
out_layer = tf.matmul(layer_1, weights['out']) + biases['out']
out_layer = tf.nn.softmax(out_layer)
# Optimizer definition
learning_rate_ph = tf.placeholder(tf.float32, shape=(), name='learning_rate')
loss_function = tf.losses.log_loss(labels=labels, predictions=out_layer)
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate_ph).minimize(loss_function)
# Start TensorFlow session
init = tf.global_variables_initializer()
tf_session = tf.InteractiveSession()
tf_session.run(init)
# Train Neural Network
learning_rate = 0.01
iterations = 100
batch_size = 256
total_batch = int(len(y_train) / batch_size)
for epoch in range(iterations):
avg_cost = 0.0
for block in range(total_batch):
batch_x = x_train[block * batch_size:min(block * batch_size + batch_size, len(x_train)), :]
batch_y = y_train[block * batch_size:min(block * batch_size + batch_size, len(y_train)), :]
_, c = tf_session.run([optimizer, loss_function], feed_dict={learning_rate_ph: learning_rate,
features: batch_x,
labels: batch_y})
avg_cost += c
avg_cost /= total_batch
print("Iteration " + str(epoch + 1) + " Logistic-loss=" + str(avg_cost))
# Make predictions
predictions_train = tf_session.run(out_layer, feed_dict={features: x_train, labels: y_train})
predictions_test = tf_session.run(out_layer, feed_dict={features: x_test, labels: y_test})
# Compute F1-score
f1_score = f1_score_tf(y_test, predictions_test)
Support functions:
def initialize_weights_and_biases():
global weights, biases
epsilon_1 = sqrt(6) / sqrt(n_input + n_hidden_1)
epsilon_2 = sqrt(6) / sqrt(n_classes + n_hidden_1)
weights = {
'h1': tf.Variable(tf.random_uniform([n_input, n_hidden_1],
minval=0 - epsilon_1, maxval=epsilon_1, dtype=tf.float32)),
'out': tf.Variable(tf.random_uniform([n_hidden_1, n_classes],
minval=0 - epsilon_2, maxval=epsilon_2, dtype=tf.float32))
}
biases = {
'b1': tf.Variable(tf.constant(1, shape=[n_hidden_1], dtype=tf.float32)),
'out': tf.Variable(tf.constant(1, shape=[n_classes], dtype=tf.float32))
}
def f1_score_tf(actual, predicted):
actual = np.argmax(actual, 1)
predicted = np.argmax(predicted, 1)
tp = tf.count_nonzero(predicted * actual)
fp = tf.count_nonzero(predicted * (actual - 1))
fn = tf.count_nonzero((predicted - 1) * actual)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
return tf.Tensor.eval(f1)
This way we get F1-scores ranging between 0.24 and 0.25.
Question
The only differences that I can see between the two neural networks are:
Optimizer: L-BFGS in Spark, Gradient Descent in TensorFlow
Weights and biases initialization: Spark makes its own initialization while we initialize them manually in TensorFlow
I don't think these two differences can cause such a big gap in performance between the models, but still Spark seems to reach very high scores in very few iterations.
I can't tell whether TensorFlow is performing very badly or whether Spark's scores are not truthful. In either case I think we are missing something important.
Initializing weights as uniform and bias as 1 is certainly not a good idea, and it may very well be the cause of this discrepancy.
Use normal or truncated_normal instead, with the default zero mean and a small variance for the weights:
weights = {
'h1': tf.Variable(tf.truncated_normal([n_input, n_hidden_1],
stddev=0.01, dtype=tf.float32)),
'out': tf.Variable(tf.truncated_normal([n_hidden_1, n_classes],
stddev=0.01, dtype=tf.float32))
}
and zero for the biases:
biases = {
'b1': tf.Variable(tf.constant(0, shape=[n_hidden_1], dtype=tf.float32)),
'out': tf.Variable(tf.constant(0, shape=[n_classes], dtype=tf.float32))
}
That said, I am not sure about the correctness of using the MulticlassClassificationEvaluator for a binary classification problem, and I would suggest doing some further manual checks to confirm that the function indeed returns what you think it returns...
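One possible manual check, as a sketch: as far as I know, MulticlassClassificationEvaluator with metricName="f1" reports a support-weighted average of the per-class F1 scores, whereas f1_score_tf in the question scores class 1 only. On skewed data those two numbers can be far apart for the very same predictions. The labels below are made up for illustration (90 negatives, 10 positives), and f1_for_class and weighted_f1 are hypothetical helper names, not part of Spark or TensorFlow.

```python
def f1_for_class(actual, predicted, cls):
    """F1 score treating `cls` as the positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == cls and p == cls)
    fp = sum(1 for a, p in pairs if a != cls and p == cls)
    fn = sum(1 for a, p in pairs if a == cls and p != cls)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def weighted_f1(actual, predicted):
    """Per-class F1 averaged with weights equal to each class's share of the labels."""
    classes = sorted(set(actual))
    total = sum(f1_for_class(actual, predicted, c) * actual.count(c) for c in classes)
    return total / len(actual)

# Made-up skewed data: 90 negatives followed by 10 positives.
actual = [0] * 90 + [1] * 10
# Hypothetical classifier: 3 false alarms, and only 2 of the 10 positives found.
predicted = [0] * 87 + [1] * 3 + [1] * 2 + [0] * 8

binary_f1 = f1_for_class(actual, predicted, 1)
weighted = weighted_f1(actual, predicted)
print(round(binary_f1, 3), round(weighted, 3))
```

With these made-up predictions the class-1 F1 is about 0.27 while the weighted F1 is about 0.87, so it is worth computing both metrics on the Spark predictions and on the TensorFlow predictions before comparing the models.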
I have tried implementing dropout in TensorFlow.
I do know that keep_prob should be declared as a placeholder and that its value should differ between training and testing. Still, I almost broke my brain trying to find out why the accuracy is so low with dropout. With keep_prob = 1, the train accuracy is 99% and the test accuracy is 85%; with keep_prob = 0.5, both train and test accuracy are 16%. Any ideas where to look? Thank you!
def forward_propagation(X, parameters, keep_prob):
"""
Implements the forward propagation for the model: LINEAR -> RELU -> LINEAR -> RELU -> LINEAR -> SOFTMAX
Arguments:
X -- input dataset placeholder, of shape (input size, number of examples)
parameters -- python dictionary containing your parameters "W1", "b1", "W2", "b2", "W3", "b3"
the shapes are given in initialize_parameters
Returns:
Z3 -- the output of the last LINEAR unit
"""
# Retrieve the parameters from the dictionary "parameters"
W1 = parameters['W1']
b1 = parameters['b1']
W2 = parameters['W2']
b2 = parameters['b2']
W3 = parameters['W3']
b3 = parameters['b3']
Z1 = tf.add(tf.matmul(W1,X),b1) # Z1 = np.dot(W1, X) + b1
A1 = tf.nn.relu(Z1) # A1 = relu(Z1)
A1 = tf.nn.dropout(A1,keep_prob) # apply dropout
Z2 = tf.add(tf.matmul(W2,A1),b2) # Z2 = np.dot(W2, a1) + b2
A2 = tf.nn.relu(Z2) # A2 = relu(Z2)
A2 = tf.nn.dropout(A2,keep_prob) # apply dropout
Z3 = tf.add(tf.matmul(W3,A2),b3) # Z3 = np.dot(W3,A2) + b3
return Z3
def model(X_train, Y_train, X_test, Y_test, learning_rate = 0.0001, lambd = 0.03, train_keep_prob = 0.5,
num_epochs = 800, minibatch_size = 32, print_cost = True):
"""
Implements a three-layer tensorflow neural network: LINEAR->RELU->LINEAR->RELU->LINEAR->SOFTMAX.
Arguments:
X_train -- training set, of shape (input size = 12288, number of training examples = 1080)
Y_train -- training labels, of shape (output size = 6, number of training examples = 1080)
X_test -- test set, of shape (input size = 12288, number of test examples = 120)
Y_test -- test set, of shape (output size = 6, number of test examples = 120)
learning_rate -- learning rate of the optimization
lambd -- L2 regularization hyperparameter
train_keep_prob -- probability of keeping a neuron in hidden layer for dropout implementation
num_epochs -- number of epochs of the optimization loop
minibatch_size -- size of a minibatch
print_cost -- True to print the cost every 100 epochs
Returns:
parameters -- parameters learnt by the model. They can then be used to predict.
"""
ops.reset_default_graph() # to be able to rerun the model without overwriting tf variables
tf.set_random_seed(1) # to keep consistent results
seed = 3 # to keep consistent results
(n_x, m) = X_train.shape # (n_x: input size, m : number of examples in the train set)
n_y = Y_train.shape[0] # n_y : output size
costs = [] # To keep track of the cost
# Create Placeholders of shape (n_x, n_y)
X, Y = create_placeholders(n_x, n_y)
keep_prob = tf.placeholder(tf.float32)
# Initialize parameters
parameters = initialize_parameters()
# Forward propagation: Build the forward propagation in the tensorflow graph
Z3 = forward_propagation(X, parameters, keep_prob)
# Cost function: Add cost function to tensorflow graph
cost = compute_cost(Z3, Y, parameters, lambd)
# Backpropagation.
optimizer = tf.train.AdamOptimizer(learning_rate = learning_rate).minimize(cost)
# Initialize all the variables
init = tf.global_variables_initializer()
# Start the session to compute the tensorflow graph
with tf.Session() as sess:
# Run the initialization
sess.run(init)
# Do the training loop
for epoch in range(num_epochs):
epoch_cost = 0. # Defines a cost related to an epoch
num_minibatches = int(m / minibatch_size) # number of minibatches of size minibatch_size in the train set
seed = seed + 1
minibatches = random_mini_batches(X_train, Y_train, minibatch_size, seed)
for minibatch in minibatches:
# Select a minibatch
(minibatch_X, minibatch_Y) = minibatch
# IMPORTANT: The line that runs the graph on a minibatch.
# Run the session to execute the "optimizer" and the "cost", the feedict should contain a minibatch for (X,Y).
_ , minibatch_cost = sess.run([optimizer, cost], feed_dict={X: minibatch_X, Y: minibatch_Y, keep_prob: train_keep_prob})
epoch_cost += minibatch_cost / num_minibatches
# Print the cost every 100 epochs
if print_cost == True and epoch % 100 == 0:
print ("Cost after epoch %i: %f" % (epoch, epoch_cost))
if print_cost == True and epoch % 5 == 0:
costs.append(epoch_cost)
# plot the cost
plt.plot(np.squeeze(costs))
plt.ylabel('cost')
plt.xlabel('iterations (per tens)')
plt.title("Learning rate =" + str(learning_rate))
plt.show()
# lets save the parameters in a variable
parameters = sess.run(parameters)
print ("Parameters have been trained!")
# Calculate the correct predictions
correct_prediction = tf.equal(tf.argmax(Z3), tf.argmax(Y))
# Calculate accuracy on the test set
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
print ("Train Accuracy:", accuracy.eval({X: X_train, Y: Y_train, keep_prob: 1.0}))
print ("Test Accuracy:", accuracy.eval({X: X_test, Y: Y_test, keep_prob: 1.0}))
return parameters
The algorithm is correct. It is just that keep_prob = 0.5 is too low.
Managed to get 87% accuracy on the test set with the following hyperparameters:
learning_rate = 0.00002, lambd = 0.03, train_keep_prob = 0.90, num_epochs = 1500, minibatch_size = 32,
In the first case your model was overfitting the data, hence the large difference between the train and test accuracy. Dropout is a regularization technique that reduces the variance of the model by reducing the effect of particular nodes, and hence prevents overfitting. But setting keep_prob = 0.5 (too low) weakens the model so much that it underfits severely, giving an accuracy as low as 16%. You should iterate by gradually decreasing the keep_prob value until you find a suitable value.
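For intuition about what keep_prob does, here is a small NumPy sketch of inverted dropout, which is the scheme tf.nn.dropout uses: each unit is kept with probability keep_prob and the survivors are scaled by 1 / keep_prob so the expected activation is unchanged. The dropout function below is a stand-in written for this example, not the TensorFlow implementation.

```python
import numpy as np

def dropout(activations, keep_prob, rng):
    """Inverted dropout: zero each unit with probability 1 - keep_prob, rescale the rest."""
    mask = rng.uniform(size=activations.shape) < keep_prob
    return activations * mask / keep_prob

rng = np.random.RandomState(0)
a = np.ones((1000, 100))              # pretend hidden-layer activations

train_out = dropout(a, 0.5, rng)      # training: about half the units are zeroed
test_out = dropout(a, 1.0, rng)       # evaluation: nothing is dropped

kept_fraction = float(np.mean(train_out != 0))
print(kept_fraction, train_out.mean(), np.array_equal(test_out, a))
```

With keep_prob = 0.5 roughly half of every hidden layer is silenced on each step, which is a lot for a small network; at keep_prob = 0.9 only about one unit in ten is dropped.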
Let's assume I have trained a model for the MNIST task, given the following code:
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("/tmp/data/", one_hot=True)
import tensorflow as tf
# Parameters
learning_rate = 0.001
training_epochs = 15
batch_size = 100
display_step = 1
# Network Parameters
n_hidden_1 = 256 # 1st layer number of features
n_hidden_2 = 256 # 2nd layer number of features
n_input = 784 # MNIST data input (img shape: 28*28)
n_classes = 10 # MNIST total classes (0-9 digits)
# tf Graph input
x = tf.placeholder("float", [None, n_input])
y = tf.placeholder("float", [None, n_classes])
weights = {
'h1': tf.Variable(tf.random_normal([n_input, n_hidden_1])),
'h2': tf.Variable(tf.random_normal([n_hidden_1, n_hidden_2])),
'out': tf.Variable(tf.random_normal([n_hidden_2, n_classes]))
}
biases = {
'b1': tf.Variable(tf.random_normal([n_hidden_1])),
'b2': tf.Variable(tf.random_normal([n_hidden_2])),
'out': tf.Variable(tf.random_normal([n_classes]))
}
# Create model
def multilayer_perceptron(x, weights, biases):
# Hidden layer with RELU activation
layer_1 = tf.add(tf.matmul(x, weights['h1']), biases['b1'])
layer_1 = tf.nn.relu(layer_1)
# Hidden layer with RELU activation
layer_2 = tf.add(tf.matmul(layer_1, weights['h2']), biases['b2'])
layer_2 = tf.nn.relu(layer_2)
# Output layer with linear activation
out_layer = tf.matmul(layer_2, weights['out']) + biases['out']
return out_layer
# Construct model
pred = multilayer_perceptron(x, weights, biases)
# Test model
correct_prediction = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))
# Calculate accuracy
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
# Define loss and optimizer
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=pred, labels=y))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)
# Initializing the variables
init = tf.global_variables_initializer()
# Launch the graph
with tf.Session() as sess:
sess.run(init)
# Training cycle
for epoch in range(training_epochs):
avg_cost = 0.
avg_acc = 0.
total_batch = int(mnist.train.num_examples/batch_size)
# Loop over all batches
for i in range(total_batch):
batch_x, batch_y = mnist.train.next_batch(batch_size)
# Run optimization op (backprop) and cost op (to get loss value)
_, c = sess.run([optimizer, cost], feed_dict={x: batch_x, y: batch_y})
batch_acc = accuracy.eval({x: batch_x, y: batch_y})
# Compute average loss
avg_cost += c / total_batch
avg_acc += batch_acc / total_batch
# Display logs per epoch step
if epoch % display_step == 0:
test_acc = accuracy.eval({x: mnist.test.images, y: mnist.test.labels})
print(
"Epoch:",
'%04d' % (epoch+1),
"cost=",
"{:.9f}".format(avg_cost),
"average_train_accuracy=",
"{:.6f}".format(avg_acc),
"test_accuracy=",
"{:.6f}".format(test_acc)
)
print("Optimization Finished!")
So this model predicts the number shown in an image given the image.
Once I have trained it, could I make the input a 'variable' instead of a 'placeholder' and try to reverse engineer the input given an output?
For example, I would like to feed the output '8' and produce a representative image of the number eight.
I thought of:
Freezing the model
Adding a variable matrix 'M' of the same size as the input between the input and the weights
Feeding an identity matrix as input to the input placeholder
Running the optimizer to learn the 'M' matrix.
Is there a better way?
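The steps listed above amount to gradient ascent on the input while the weights stay fixed. A rough NumPy sketch of that idea follows. It assumes a single frozen linear softmax layer with random weights standing in for the trained network, so the result is not a real digit image; the names W, b and target are placeholders for this example only.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.RandomState(0)
n_input, n_classes = 784, 10
W = rng.normal(scale=0.1, size=(n_input, n_classes))  # stand-in for trained, frozen weights
b = np.zeros(n_classes)

target = 8
one_hot = np.eye(n_classes)[target]
x = np.zeros(n_input)                 # the input is now the thing being optimized

for _ in range(200):
    p = softmax(x @ W + b)
    grad_x = W @ (one_hot - p)        # gradient of log p[target] with respect to x
    x = np.clip(x + 0.1 * grad_x, 0.0, 1.0)  # keep pixel values in [0, 1]

p_target = float(softmax(x @ W + b)[target])
print(p_target)
```

The optimized x makes the frozen model confident about class 8, but nothing forces it to look like a handwritten digit, which is the limitation the answer below describes.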
If your goal is to reverse the model in the sense that the input should be a digit and the output an image displaying that digit (in say, handwritten form), it is not quite possible to do with machine learning models.
Because machine learning models attempt to create generalizations from the input (so that similar input will provide similar output, although the model was never trained on it) they tend to be quite lossy. Additionally, the reduction from hundreds, thousands and more input variables into a single output variable obviously has to lose some information in the process.
More specifically, although a multilayer perceptron (as you're using in your example) is a fully connected neural network, some weights are expected to be zero, thus completely dropping the information in certain input variables. Moreover, the same output of a neuron can be produced by multiple distinct input values to its function, due to the many degrees of freedom.
It is theoretically possible to replace those degrees of freedom and lost information with specifically crafted or random data, but that does not guarantee a successful output.
On a side note, I'm a bit puzzled by this question. If you are able to generate that model yourself, you could also create a similar model that does the opposite. You could train a model to accept an input digit (and perhaps some random seed) and output an image.
I have created an ANN with two ReLU hidden layers plus a linear output layer and am trying to approximate the simple ln(x) function, but I can't get a good fit. I am confused because ln(x) on the range x: [0.0-1.0] should be approximated without problems (I am using a learning rate of 0.01 and basic gradient descent optimization).
import tensorflow as tf
import numpy as np
def GetTargetResult(x):
curY = np.log(x)
return curY
# Create model
def multilayer_perceptron(x, weights, biases):
# Hidden layer with RELU activation
layer_1 = tf.add(tf.matmul(x, weights['h1']), biases['b1'])
layer_1 = tf.nn.relu(layer_1)
# # Hidden layer with RELU activation
layer_2 = tf.add(tf.matmul(layer_1, weights['h2']), biases['b2'])
layer_2 = tf.nn.relu(layer_2)
# Output layer with linear activation
out_layer = tf.matmul(layer_2, weights['out']) + biases['out']
return out_layer
# Parameters
learning_rate = 0.01
training_epochs = 10000
batch_size = 50
display_step = 500
# Network Parameters
n_hidden_1 = 50 # 1st layer number of features
n_hidden_2 = 10 # 2nd layer number of features
n_input = 1
# Store layers weight & bias
weights = {
'h1': tf.Variable(tf.random_uniform([n_input, n_hidden_1])),
'h2': tf.Variable(tf.random_uniform([n_hidden_1, n_hidden_2])),
'out': tf.Variable(tf.random_uniform([n_hidden_2, 1]))
}
biases = {
'b1': tf.Variable(tf.random_uniform([n_hidden_1])),
'b2': tf.Variable(tf.random_uniform([n_hidden_2])),
'out': tf.Variable(tf.random_uniform([1]))
}
x_data = tf.placeholder(tf.float32, [None, 1])
y_data = tf.placeholder(tf.float32, [None, 1])
# Construct model
pred = multilayer_perceptron(x_data, weights, biases)
# Minimize the mean squared errors.
loss = tf.reduce_mean(tf.square(pred - y_data))
optimizer = tf.train.GradientDescentOptimizer(learning_rate)
train = optimizer.minimize(loss)
# Before starting, initialize the variables. We will 'run' this first.
init = tf.global_variables_initializer()
# Launch the graph.
sess = tf.Session()
sess.run(init)
for step in range(training_epochs):
x_in = np.random.rand(batch_size, 1).astype(np.float32)
y_in = GetTargetResult(x_in)
sess.run(train, feed_dict = {x_data: x_in, y_data: y_in})
if(step % display_step == 0):
curX = np.random.rand(1, 1).astype(np.float32)
curY = GetTargetResult(curX)
curPrediction = sess.run(pred, feed_dict={x_data: curX})
curLoss = sess.run(loss, feed_dict={x_data: curX, y_data: curY})
print("For x = {0} and target y = {1} prediction was y = {2} and squared loss was = {3}".format(curX, curY,curPrediction, curLoss))
With the configuration above, the NN just learns to guess y = -1.00. I have tried different learning rates, a couple of optimizers and different configurations with no success: learning does not converge in any case. I did something similar with the logarithm in another deep learning framework in the past without problems. Could this be a TF-specific issue? What am I doing wrong?
What your network has to predict
[Plot of ln(x); source: WolframAlpha]
What your architecture is
ReLU(ReLU(x * W_1 + b_1) * W_2 + b_2)*W_out + b_out
Thoughts
My first thought was that ReLU is the problem. However, you don't apply relu to the output, so that should not cause the problem.
Changing the initialization (from uniform to normal) and the Optimizer (from SGD to ADAM) seems to fix the problem:
#!/usr/bin/env python
import tensorflow as tf
import numpy as np
def get_target_result(x):
return np.log(x)
def multilayer_perceptron(x, weights, biases):
"""Create model."""
# Hidden layer with RELU activation
layer_1 = tf.add(tf.matmul(x, weights['h1']), biases['b1'])
layer_1 = tf.nn.relu(layer_1)
# # Hidden layer with RELU activation
layer_2 = tf.add(tf.matmul(layer_1, weights['h2']), biases['b2'])
layer_2 = tf.nn.relu(layer_2)
# Output layer with linear activation
out_layer = tf.matmul(layer_2, weights['out']) + biases['out']
return out_layer
# Parameters
learning_rate = 0.01
training_epochs = 10**6
batch_size = 500
display_step = 500
# Network Parameters
n_hidden_1 = 50 # 1st layer number of features
n_hidden_2 = 10 # 2nd layer number of features
n_input = 1
# Store layers weight & bias
weights = {
'h1': tf.Variable(tf.truncated_normal([n_input, n_hidden_1], stddev=0.1)),
'h2': tf.Variable(tf.truncated_normal([n_hidden_1, n_hidden_2], stddev=0.1)),
'out': tf.Variable(tf.truncated_normal([n_hidden_2, 1], stddev=0.1))
}
biases = {
'b1': tf.Variable(tf.constant(0.1, shape=[n_hidden_1])),
'b2': tf.Variable(tf.constant(0.1, shape=[n_hidden_2])),
'out': tf.Variable(tf.constant(0.1, shape=[1]))
}
x_data = tf.placeholder(tf.float32, [None, 1])
y_data = tf.placeholder(tf.float32, [None, 1])
# Construct model
pred = multilayer_perceptron(x_data, weights, biases)
# Minimize the mean squared errors.
loss = tf.reduce_mean(tf.square(pred - y_data))
optimizer = tf.train.GradientDescentOptimizer(learning_rate)
# train = optimizer.minimize(loss)
train = tf.train.AdamOptimizer(1e-4).minimize(loss)
# Before starting, initialize the variables. We will 'run' this first.
init = tf.global_variables_initializer()
# Launch the graph.
sess = tf.Session()
sess.run(init)
for step in range(training_epochs):
x_in = np.random.rand(batch_size, 1).astype(np.float32)
y_in = get_target_result(x_in)
sess.run(train, feed_dict={x_data: x_in, y_data: y_in})
if(step % display_step == 0):
curX = np.random.rand(1, 1).astype(np.float32)
curY = get_target_result(curX)
curPrediction = sess.run(pred, feed_dict={x_data: curX})
curLoss = sess.run(loss, feed_dict={x_data: curX, y_data: curY})
print(("For x = {0} and target y = {1} prediction was y = {2} and "
"squared loss was = {3}").format(curX, curY,
curPrediction, curLoss))
Training this for 1 minute gave me:
For x = [[ 0.19118255]] and target y = [[-1.65452647]] prediction was y = [[-1.65021849]] and squared loss was = 1.85587377928e-05
For x = [[ 0.17362741]] and target y = [[-1.75084364]] prediction was y = [[-1.74087048]] and squared loss was = 9.94640868157e-05
For x = [[ 0.60853624]] and target y = [[-0.4966988]] prediction was y = [[-0.49964082]] and squared loss was = 8.65551464813e-06
For x = [[ 0.33864763]] and target y = [[-1.08279514]] prediction was y = [[-1.08586168]] and squared loss was = 9.4036658993e-06
For x = [[ 0.79126364]] and target y = [[-0.23412406]] prediction was y = [[-0.24541236]] and squared loss was = 0.000127425722894
For x = [[ 0.09994856]] and target y = [[-2.30309963]] prediction was y = [[-2.29796076]] and squared loss was = 2.6408026315e-05
For x = [[ 0.31053194]] and target y = [[-1.16946852]] prediction was y = [[-1.17038012]] and squared loss was = 8.31002580526e-07
For x = [[ 0.0512077]] and target y = [[-2.97186542]] prediction was y = [[-2.96796203]] and squared loss was = 1.52364455062e-05
For x = [[ 0.120253]] and target y = [[-2.11815739]] prediction was y = [[-2.12729549]] and squared loss was = 8.35050013848e-05
So the answer might be that your optimizer is not good / the optimization problem starts at a bad point. See
Xavier Glorot, Yoshua Bengio: Understanding the difficulty of training deep feedforward neural networks
Visualizing Optimization Algos
The following image is from Alec Radford's nice GIFs. It does not contain Adam, but you get a feeling for how much better one can do than SGD:
Two ideas for how this might be improved
try dropout
try not to use x values close to 0. I would rather sample values in [0.01, 1].
However, my experience with regression problems is quite limited.
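The second idea can be sketched with NumPy alone. ln(x) goes to minus infinity as x approaches 0, so a few samples very close to 0 can dominate a squared-error loss. The helper sample_batch and its default bounds are assumptions made for this example; it would replace the np.random.rand call in the training loop.

```python
import numpy as np

def sample_batch(batch_size, low=0.01, high=1.0):
    """Draw inputs from [low, high) so the targets ln(x) stay bounded."""
    x = np.random.uniform(low, high, size=(batch_size, 1)).astype(np.float32)
    y = np.log(x)
    return x, y

x_in, y_in = sample_batch(500)
print(x_in.min(), y_in.min())   # the smallest target is no lower than about ln(0.01)
```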
First of all, your input data is in range [0, 1), which is not a good input to a neural network. Subtract mean from x after computing y to make it normalized (also ideally divide by standard deviation).
However, in your particular case it was not enough to make it work.
I played with it and found two ways to make it work (both require data normalization as described above):
Either completely remove the second layer
or
Make the number of neurons in the second layer 50.
My guess would be that 10 neurons do not have sufficient representation power to pass enough information to the last layer (obviously, a perfectly smart NN would learn to ignore the second layer in this case passing the answer in one of the neurons, but the theoretical possibility doesn't mean that gradient descent will learn to do so).
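The normalization described above can be sketched as follows, using NumPy only. The targets are computed from the raw x first, and only then is x shifted and scaled. The variable names are chosen for this example, and the fixed seed is just there to make the run repeatable.

```python
import numpy as np

rng = np.random.RandomState(0)
x_raw = rng.rand(500, 1)            # inputs in [0, 1), as in the question
y = np.log(x_raw)                   # targets come from the raw, unnormalized x

x_mean = x_raw.mean()
x_std = x_raw.std()
x_norm = (x_raw - x_mean) / x_std   # zero mean, unit standard deviation

print(x_norm.mean(), x_norm.std())
```

The same x_mean and x_std would have to be reused for any input passed to the network later, otherwise the network sees data on a different scale than it was trained on.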
I have not looked at the code, but this is the theory. If you use an activation function like "tanh", then for small weights the activation function is in its linear region, and for large weights it saturates at either -1 or +1. If you are in the linear region across all layers, you cannot approximate complex functions (you have a sandwich of linear layers, so the best you can do is a linear approximation), but with bigger weights the nonlinearities allow you to approximate a wide range of functions. There are no free lunches: the weights need to be at the right values to avoid over-fitting and under-fitting. This process is called regularization.
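A tiny numeric illustration of the two regimes, using only the standard library. The function tanh_unit and the weight values 0.01 and 100 are picked for this example.

```python
import math

def tanh_unit(x, weight):
    """A single tanh neuron without bias."""
    return math.tanh(weight * x)

x = 0.5
small = tanh_unit(x, 0.01)    # pre-activation 0.005: tanh is almost the identity here
large = tanh_unit(x, 100.0)   # pre-activation 50: tanh is saturated near +1

print(small, 0.01 * x)   # the two values are nearly equal (linear region)
print(large)             # very close to 1.0 (saturated region)
```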