RNN for Speech Emotion Recognition - tensorflow

I want to classify speech data into four different emotions (angry, sad, happy, neutral).
The problem is that when I run my RNN code, all of the speech data gets classified into a single class (for example, everything is classified as "angry" every time).
I don't know what is causing this problem or what I have to change in the training.
Here's my TensorFlow RNN main function for training and calculating accuracy:
def RNN(x, weights, biases, lstm_size):
    lstm_cell = []
    for i in range(lstm_size):
        lstm_cell.append(rnn.BasicLSTMCell(hidden_dim, forget_bias=1.0, state_is_tuple=True, activation=tf.nn.sigmoid))
    stacked_lstm = tf.contrib.rnn.MultiRNNCell(lstm_cell, state_is_tuple=True)
    outputs, states = tf.nn.dynamic_rnn(stacked_lstm, x, dtype=tf.float32)
    foutput = tf.contrib.layers.fully_connected(outputs[:, -1], output_dim, activation_fn=None)
    return foutput

logits = RNN(X, weights, biases, lstm_size)
prediction = tf.nn.sigmoid(logits)
cost = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=logits, labels=Y))
learning_rate = tf.train.exponential_decay(learning_rate=initial_learning_rate, global_step=training_steps, decay_steps=training_steps/10, decay_rate=0.96, staircase=True)
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
train_op = optimizer.minimize(cost)
pred = tf.argmax(prediction, axis=1)
label = tf.argmax(Y, axis=1)
correct_pred = tf.equal(pred, label)
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
The input to the RNN is speech features (pitch and MFCC) and the output is a one-hot code (for example, angry = [1,0,0,0]).
Also, I wonder whether it is right to calculate classification accuracy like this.
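For reference, a conventional single-label, four-class setup would use softmax cross-entropy instead of the sigmoid variant. The snippet below is only a minimal sketch of that conventional layout, reusing the same logits and one-hot Y as above; it is an illustration, not a confirmed fix for the behaviour described.

prediction = tf.nn.softmax(logits)  # class probabilities that sum to 1 over the four emotions
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(logits=logits, labels=Y))
correct_pred = tf.equal(tf.argmax(prediction, axis=1), tf.argmax(Y, axis=1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))  # fraction of correctly classified examples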

Related

In tensorflow 1, when the loss function is defined with operations on Tensors, is the model really trained?

First, I'm sorry, but it's not possible to reproduce this problem in a few lines, as the model involved is a very complex network.
But here is an idea of the code:
def return_iterator(data, nb_epochs, batch_size):
    dataset = tf.data.Dataset.from_tensor_slices(data)
    dataset = dataset.repeat(nb_epochs).batch(batch_size)
    iterator = dataset.make_one_shot_iterator()
    yy = iterator.get_next()
    return tf.cast(yy, tf.float32)

with tf.Session(config=tf.ConfigProto(allow_soft_placement=True)) as sess:
    y_pred = complex_model.autoencode(train)
    y_pred = tf.convert_to_tensor(y_pred, dtype=tf.float32)

    nb_epochs = 10
    batch_size = 64

    y_real = return_iterator(train, nb_epochs, batch_size)
    y_pred = return_iterator(y_pred, nb_epochs, batch_size)

    res_equal = 1. - tf.reduce_mean(tf.abs(y_pred - y_real), [1, 2, 3])
    loss = 1 - tf.reduce_sum(res_equal, axis=0)

    opt = tf.train.AdamOptimizer().minimize(loss)
    tf.global_variables_initializer().run()

    for epoch in range(0, nb_epochs):
        _, d_loss = sess.run([opt, loss])
To define the loss, I must use operations like tf.reduce_mean and tf.reduce_sum, and these operations only accept Tensors as input.
My question is: with this code, will the complex_model autoencoder actually be trained during training? (Even though here it is only used to produce the predictions that go into the loss.)
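For contrast, here is a minimal sketch (the names are illustrative, not my real code) of what I would call defining the loss directly on the model output, so that the loss and the model's variables clearly belong to the same graph:

y_pred = complex_model.autoencode(train_tensor)        # train_tensor: a tf.Tensor holding the inputs (illustrative)
loss = tf.reduce_mean(tf.abs(y_pred - train_tensor))   # loss built directly from the model output
opt = tf.train.AdamOptimizer().minimize(loss)          # minimize() updates every variable the loss depends on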
Thank you.
P.S.: I am using TF 1.15 (and I cannot use another version).

Transformer didn't work well with tensorflow gradient tape

I implemented a Transformer with TensorFlow 2.0. The model works well when I train it with model.fit(dataset).
However, when I train the model with tf.GradientTape and evaluate it, the model yields a blank-space token for all inputs. Here is my code; the TensorFlow version is 2.7.0.
def loss_function(y_true, y_pred):
    y_true = tf.reshape(y_true, shape=(-1, MAX_LENGTH - 1))
    loss = tf.keras.losses.SparseCategoricalCrossentropy(
        from_logits=True, reduction='none')(y_true, y_pred)
    mask = tf.cast(tf.not_equal(y_true, 0), tf.float32)
    loss = tf.multiply(loss, mask)
    return tf.reduce_mean(loss)

for epoch in range(num_epochs):
    for step, data in enumerate(dataset):
        enc_inputs, dec_inputs, outputs = data[0]['inputs'], data[0]['dec_inputs'], data[1]['outputs']
        with tf.GradientTape() as tape:
            logits = model([enc_inputs, dec_inputs], training=True)
            loss = loss_function(outputs, logits)
        grads = tape.gradient(loss, model.trainable_variables)
        optimizer.apply_gradients(zip(grads, model.trainable_variables))
I think there is no problem with my Transformer model code itself, because it works well with model.fit(dataset). What's wrong with my training code?
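One detail about the loss_function above (I am not sure whether it matters here): tf.reduce_mean divides by the total number of positions, padded ones included. A variant that averages only over the real (non-pad) tokens would look like this:

def loss_function(y_true, y_pred):
    y_true = tf.reshape(y_true, shape=(-1, MAX_LENGTH - 1))
    loss = tf.keras.losses.SparseCategoricalCrossentropy(
        from_logits=True, reduction='none')(y_true, y_pred)
    mask = tf.cast(tf.not_equal(y_true, 0), tf.float32)
    loss = loss * mask
    # Average only over non-pad tokens rather than over all positions.
    return tf.reduce_sum(loss) / (tf.reduce_sum(mask) + 1e-9)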

BERT Model Fine Tuning and migrating to TF2

I executed this excellent tutorial:
https://towardsdatascience.com/building-a-multi-label-text-classifier-using-bert-and-tensorflow-f188e0ecdc5d
I understood most of it except the part where the model is created. I would like to understand that part and migrate it to TF2 BERT.
When he says "Basically we load the pre-trained model and then train the last layer for classification task.", does that mean he is freezing all the other layers and fine-tuning only the last layer? This is the relevant code (in TF1) that I am not able to understand:
def create_model(bert_config, is_training, input_ids, input_mask, segment_ids,
                 labels, num_labels, use_one_hot_embeddings):
    """Creates a classification model."""
    model = modeling.BertModel(
        config=bert_config,
        is_training=is_training,
        input_ids=input_ids,
        input_mask=input_mask,
        token_type_ids=segment_ids,
        use_one_hot_embeddings=use_one_hot_embeddings)

    output_layer = model.get_pooled_output()
    hidden_size = output_layer.shape[-1].value

    output_weights = tf.get_variable(
        "output_weights", [num_labels, hidden_size],
        initializer=tf.truncated_normal_initializer(stddev=0.02))
    output_bias = tf.get_variable(
        "output_bias", [num_labels], initializer=tf.zeros_initializer())

    with tf.variable_scope("loss"):
        if is_training:
            # I.e., 0.1 dropout
            output_layer = tf.nn.dropout(output_layer, keep_prob=0.9)

        logits = tf.matmul(output_layer, output_weights, transpose_b=True)
        logits = tf.nn.bias_add(logits, output_bias)
        # probabilities = tf.nn.softmax(logits, axis=-1)  # multiclass case
        probabilities = tf.nn.sigmoid(logits)  # multi-label case
        labels = tf.cast(labels, tf.float32)
        tf.logging.info("num_labels:{};logits:{};labels:{}".format(num_labels, logits, labels))
        per_example_loss = tf.nn.sigmoid_cross_entropy_with_logits(labels=labels, logits=logits)
        loss = tf.reduce_mean(per_example_loss)

    return (loss, per_example_loss, logits, probabilities)
I went through the TF2 fine-tuning tutorials for BERT, but how do I achieve the same thing there? I am able to train other models where step 1 is not required.
Use the official BERT example:
https://www.tensorflow.org/tutorials/text/classify_text_with_bert
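If it helps, here is a rough TF2/Keras sketch (not the tutorial's code; the TF-Hub handle, sequence length, and hyperparameters are placeholders) of the same idea as create_model above: the pooled BERT output goes through dropout and a dense layer with one logit per label, trained with sigmoid cross-entropy for the multi-label case. With trainable=True the whole encoder is fine-tuned, not just the last layer.

import tensorflow as tf
import tensorflow_hub as hub  # assumes a TF2 BERT SavedModel from TF-Hub is used

def build_multilabel_model(bert_handle, num_labels, seq_length=128):
    # bert_handle is a placeholder for a TF2 BERT encoder on TF-Hub
    encoder = hub.KerasLayer(bert_handle, trainable=True)  # trainable=True fine-tunes all BERT layers
    inputs = {
        'input_word_ids': tf.keras.Input(shape=(seq_length,), dtype=tf.int32, name='input_word_ids'),
        'input_mask': tf.keras.Input(shape=(seq_length,), dtype=tf.int32, name='input_mask'),
        'input_type_ids': tf.keras.Input(shape=(seq_length,), dtype=tf.int32, name='input_type_ids'),
    }
    pooled = encoder(inputs)['pooled_output']         # analogue of model.get_pooled_output()
    x = tf.keras.layers.Dropout(0.1)(pooled)          # same 0.1 dropout as the TF1 code
    logits = tf.keras.layers.Dense(num_labels)(x)     # analogue of output_weights / output_bias
    model = tf.keras.Model(inputs, logits)
    model.compile(
        optimizer=tf.keras.optimizers.Adam(2e-5),
        loss=tf.keras.losses.BinaryCrossentropy(from_logits=True))  # sigmoid cross-entropy, multi-label
    return model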

Can I generate the input given the output in a pretrained TensorFlow model?

Let's assume I have trained a model for the MNIST task, given the following code:
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("/tmp/data/", one_hot=True)

import tensorflow as tf

# Parameters
learning_rate = 0.001
training_epochs = 15
batch_size = 100
display_step = 1

# Network Parameters
n_hidden_1 = 256  # 1st layer number of features
n_hidden_2 = 256  # 2nd layer number of features
n_input = 784     # MNIST data input (img shape: 28*28)
n_classes = 10    # MNIST total classes (0-9 digits)

# tf Graph input
x = tf.placeholder("float", [None, n_input])
y = tf.placeholder("float", [None, n_classes])

weights = {
    'h1': tf.Variable(tf.random_normal([n_input, n_hidden_1])),
    'h2': tf.Variable(tf.random_normal([n_hidden_1, n_hidden_2])),
    'out': tf.Variable(tf.random_normal([n_hidden_2, n_classes]))
}
biases = {
    'b1': tf.Variable(tf.random_normal([n_hidden_1])),
    'b2': tf.Variable(tf.random_normal([n_hidden_2])),
    'out': tf.Variable(tf.random_normal([n_classes]))
}

# Create model
def multilayer_perceptron(x, weights, biases):
    # Hidden layer with RELU activation
    layer_1 = tf.add(tf.matmul(x, weights['h1']), biases['b1'])
    layer_1 = tf.nn.relu(layer_1)
    # Hidden layer with RELU activation
    layer_2 = tf.add(tf.matmul(layer_1, weights['h2']), biases['b2'])
    layer_2 = tf.nn.relu(layer_2)
    # Output layer with linear activation
    out_layer = tf.matmul(layer_2, weights['out']) + biases['out']
    return out_layer

# Construct model
pred = multilayer_perceptron(x, weights, biases)

# Test model
correct_prediction = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))
# Calculate accuracy
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))

# Define loss and optimizer
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=pred, labels=y))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)

# Initializing the variables
init = tf.global_variables_initializer()

# Launch the graph
with tf.Session() as sess:
    sess.run(init)

    # Training cycle
    for epoch in range(training_epochs):
        avg_cost = 0.
        avg_acc = 0.
        total_batch = int(mnist.train.num_examples / batch_size)
        # Loop over all batches
        for i in range(total_batch):
            batch_x, batch_y = mnist.train.next_batch(batch_size)
            # Run optimization op (backprop) and cost op (to get loss value)
            _, c = sess.run([optimizer, cost], feed_dict={x: batch_x, y: batch_y})
            batch_acc = accuracy.eval({x: batch_x, y: batch_y})
            # Compute average loss
            avg_cost += c / total_batch
            avg_acc += batch_acc / total_batch
        # Display logs per epoch step
        if epoch % display_step == 0:
            test_acc = accuracy.eval({x: mnist.test.images, y: mnist.test.labels})
            print(
                "Epoch:",
                '%04d' % (epoch + 1),
                "cost=",
                "{:.9f}".format(avg_cost),
                "average_train_accuracy=",
                "{:.6f}".format(avg_acc),
                "test_accuracy=",
                "{:.6f}".format(test_acc)
            )
    print("Optimization Finished!")
So this model predicts the number shown in an image, given the image.
Once I have trained it, could I make the input a 'variable' instead of a 'placeholder' and try to reverse-engineer the input given an output?
For example, I would like to feed the output '8' and produce a representative image of the number eight.
I thought of:
Freezing the model
Adding a variable matrix 'M' of the same size as the input, between the input and the weights
Feeding an identity matrix as input to the input placeholder
Running the optimizer to learn the 'M' matrix
Is there a better way?
If your goal is to reverse the model in the sense that the input should be a digit and the output an image displaying that digit (in, say, handwritten form), that is not really possible with machine learning models.
Because machine learning models attempt to create generalizations from the input (so that similar input will produce similar output, even though the model was never trained on it), they tend to be quite lossy. Additionally, the reduction from hundreds, thousands or more input variables to a single output variable necessarily loses some information in the process.
More specifically, although a multilayer perceptron (as you're using in your example) is a fully connected neural network, some weights are expected to be zero, completely dropping the information in certain input variables. Moreover, the same output of a neuron can be produced by many distinct inputs to its function, due to the many degrees of freedom.
It is theoretically possible to replace those degrees of freedom and the lost information with specifically crafted or random data, but that does not guarantee a successful output.
On a side note, I'm a bit puzzled by this question. If you are able to build that model yourself, you could also create a similar model that does the opposite: train a model to accept an input digit (and perhaps a random seed) and output an image.
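That said, the optimisation idea from the question can be sketched roughly as follows (TF1 style, reusing n_input, weights, biases, multilayer_perceptron and an open, already-trained sess from the code above; the learning rate and step count are illustrative). Expect a noisy image that merely maximises the '8' output, not a clean handwritten eight.

# Illustrative sketch: freeze the trained weights and optimise only an input variable.
target = tf.constant([[0., 0., 0., 0., 0., 0., 0., 0., 1., 0.]])  # one-hot label for digit 8
x_var = tf.Variable(tf.zeros([1, n_input]))                        # the "image" to be learned
logits_var = multilayer_perceptron(x_var, weights, biases)         # reuse the trained network
input_cost = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(logits=logits_var, labels=target))
# var_list=[x_var] leaves the trained weights and biases untouched.
input_opt = tf.train.GradientDescentOptimizer(0.1).minimize(input_cost, var_list=[x_var])

sess.run(tf.variables_initializer([x_var]))
for _ in range(1000):
    sess.run(input_opt)
image_of_8 = sess.run(tf.reshape(tf.clip_by_value(x_var, 0., 1.), [28, 28]))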

NaN values for loss function (MSE) in TensorFlow

I would like to use a feedforward neural network to output a continuous real value, using TensorFlow. My input values are, of course, continuous real values too.
I want my net to have two hidden layers and to use MSE as the cost function, so I've defined it like this:
def mse(logits, outputs):
    mse = tf.reduce_mean(tf.pow(tf.sub(logits, outputs), 2.0))
    return mse

def training(loss, learning_rate):
    optimizer = tf.train.GradientDescentOptimizer(learning_rate)
    train_op = optimizer.minimize(loss)
    return train_op

def inference_two_hidden_layers(images, hidden1_units, hidden2_units):
    with tf.name_scope('hidden1'):
        weights = tf.Variable(
            tf.truncated_normal([WINDOW_SIZE, hidden1_units], stddev=1.0 / math.sqrt(float(WINDOW_SIZE))),
            name='weights')
        biases = tf.Variable(tf.zeros([hidden1_units]), name='biases')
        hidden1 = tf.nn.relu(tf.matmul(images, weights) + biases)
    with tf.name_scope('hidden2'):
        weights = tf.Variable(
            tf.truncated_normal([hidden1_units, hidden2_units], stddev=1.0 / math.sqrt(float(hidden1_units))),
            name='weights')
        biases = tf.Variable(tf.zeros([hidden2_units]), name='biases')
        hidden2 = tf.nn.relu(tf.matmul(hidden1, weights) + biases)
    with tf.name_scope('identity'):
        weights = tf.Variable(
            tf.truncated_normal([hidden2_units, 1], stddev=1.0 / math.sqrt(float(hidden2_units))),
            name='weights')
        biases = tf.Variable(tf.zeros([1]), name='biases')
        logits = tf.matmul(hidden2, weights) + biases
    return logits
I'm doing batch training, and at every step I evaluate the train_op and loss ops:
_, loss_value = sess.run([train_op, loss], feed_dict=feed_dict)
The problem is that I'm getting NaN values when evaluating the loss function. That does NOT happen if I use a neural network with just one hidden layer, like the following:
def inference_one_hidden_layer(inputs, hidden1_units):
    with tf.name_scope('hidden1'):
        weights = tf.Variable(
            tf.truncated_normal([WINDOW_SIZE, hidden1_units], stddev=1.0 / math.sqrt(float(WINDOW_SIZE))),
            name='weights')
        biases = tf.Variable(tf.zeros([hidden1_units]), name='biases')
        hidden1 = tf.nn.relu(tf.matmul(inputs, weights) + biases)
    with tf.name_scope('identity'):
        weights = tf.Variable(
            tf.truncated_normal([hidden1_units, NUM_CLASSES], stddev=1.0 / math.sqrt(float(hidden1_units))),
            name='weights')
        biases = tf.Variable(tf.zeros([NUM_CLASSES]), name='biases')
        logits = tf.matmul(hidden1, weights) + biases
    return logits
Why do I get NaN loss values when using a net with two hidden layers?
Mind your learning rate. If you expand your network, you have more parameters to learn, which usually means you also need to decrease the learning rate.
With a learning rate that is too high, the weights explode, the output values explode with them, and the squared-error loss overflows, which is where the NaN values typically come from.
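As an illustration (not taken from your code), two common mitigations are simply lowering the learning rate and clipping gradients before applying them. A rough sketch of the training() helper with clipping might look like this:

def training(loss, learning_rate):
    # Illustrative: clip each gradient's norm so one bad batch cannot blow up the weights.
    optimizer = tf.train.GradientDescentOptimizer(learning_rate)
    grads_and_vars = optimizer.compute_gradients(loss)
    clipped = [(tf.clip_by_norm(g, 5.0), v) for g, v in grads_and_vars if g is not None]
    train_op = optimizer.apply_gradients(clipped)
    return train_op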