Denoising autoencoder - training with added noise on custom interval - tensorflow

I'm trying to understand denoising autoencoders. I've followed this keras tutorial - https://blog.keras.io/building-autoencoders-in-keras.html
In the tutorial, the training data is created by adding an artificial noise in the following way:
x_train_noisy = x_train + noise_factor * np.random.normal(loc=0.0, scale=1.0, size=x_train.shape)
x_test_noisy = x_test + noise_factor * np.random.normal(loc=0.0, scale=1.0, size=x_test.shape)
which produces:
This means the noise as well as the underlying data from the MNIST dataset have values between 0 and 1.
After applying the trained model, most of the noise is correctly removed:
I'm trying to train the model with only a small amount of artificial noise, but on the interval -5 to 5, as follows:
def noise_matrix(arr, num, min, max):
    m = np.prod(arr.shape)
    # arr.ravel() is a view for contiguous arrays, so this overwrites arr in place
    arr.ravel()[np.random.randint(0, m, size=num)] = np.random.uniform(min, max, num)
    return arr
x_train_noisy = noise_matrix(x_train, x_train.shape[0] * 2, -5, 5)
x_test_noisy = noise_matrix(x_test, x_test.shape[0] * 2, -5, 5)
which produces:
(Differences in contrast in the above picture is caused by implicit normalization in matplotlib library)
Now, when I train the autoencoder and apply the model, I'm getting the following result:
Most of the noise is not removed. What steps do I need to take in order to remove noise drawn from the interval (-5,5)? I've tried normalizing all the data to the interval (0,1) after adding the noise, but this is not the way to go (I was getting very bad results with this approach).

The decoded image still has obvious noise. Since you did not provide the code used to fit the autoencoder, I am guessing that you are fitting it with the noisy data as both input and target, ae.fit(x=x_noised, y=x_noised),
whereas you should be fitting with the original, clean data as the target:
ae.fit(x=x_noised, y=x_original)
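One thing worth checking before fitting on x_original: the noise_matrix helper in the question writes through arr.ravel(), which is a view for a contiguous array, so it may overwrite the clean data in place and leave no clean target to fit against. A minimal sketch of a copy-based variant follows; add_sparse_noise and the toy array are hypothetical names used only for illustration, not part of the original code.

```python
import numpy as np

def add_sparse_noise(arr, num, low, high, rng=None):
    """Return a noisy copy of arr; the input array is left untouched."""
    rng = np.random.default_rng() if rng is None else rng
    noisy = arr.copy()
    flat = noisy.reshape(-1)  # view onto the copy, which is C-contiguous
    idx = rng.integers(0, flat.size, size=num)
    flat[idx] = rng.uniform(low, high, size=num)
    return noisy

rng = np.random.default_rng(0)
x_clean = rng.random((8, 28, 28)).astype("float32")  # stand-in for x_train
x_noisy = add_sparse_noise(x_clean, num=16, low=-5, high=5, rng=rng)
# The autoencoder would then be fit with x=x_noisy and y=x_clean.
```

Because the noise is written into a copy, x_clean stays in its original range and can serve as the training target.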

Evaluating (model.evaluate) with a triplet loss Siamese neural network model - tensorflow

I have trained a Siamese neural network that uses triplet loss. It was a pain, but I think I managed to do it. However, I am struggling to understand how to make evaluations with this model.
The SNN:
def triplet_loss(y_true, y_pred):
    margin = K.constant(1)
    return K.mean(K.maximum(K.constant(0), K.square(y_pred[:,0]) - 0.5*(K.square(y_pred[:,1])+K.square(y_pred[:,2])) + margin))
def euclidean_distance(vects):
    x, y = vects
    return K.sqrt(K.maximum(K.sum(K.square(x - y), axis=1, keepdims=True), K.epsilon()))
anchor_input = Input((max_len, ), name='anchor_input')
positive_input = Input((max_len, ), name='positive_input')
negative_input = Input((max_len, ), name='negative_input')
Shared_DNN = create_base_network(embedding_dim = EMBEDDING_DIM, max_len=MAX_LEN, embed_matrix=embed_matrix)
encoded_anchor = Shared_DNN(anchor_input)
encoded_positive = Shared_DNN(positive_input)
encoded_negative = Shared_DNN(negative_input)
positive_dist = Lambda(euclidean_distance, name='pos_dist')([encoded_anchor, encoded_positive])
negative_dist = Lambda(euclidean_distance, name='neg_dist')([encoded_anchor, encoded_negative])
tertiary_dist = Lambda(euclidean_distance, name='ter_dist')([encoded_positive, encoded_negative])
stacked_dists = Lambda(lambda vects: K.stack(vects, axis=1), name='stacked_dists')([positive_dist, negative_dist, tertiary_dist])
model = Model([anchor_input, positive_input, negative_input], stacked_dists, name='triple_siamese')
model.compile(loss=triplet_loss, optimizer=adam_optim, metrics=[accuracy])
history = model.fit([Anchor,Positive,Negative],y=Y_dummy,validation_data=([Anchor_test,Positive_test,Negative_test],Y_dummy2), batch_size=128, epochs=25)
I understand that once a model is trained with triplets, the evaluation shouldn't actually require that triplets be used. However, how do I finagle this reshaping?
Because this is a SNN, I would want to feed two inputs into model.evaluate, along with a categorical variable denoting if the two inputs are similar or not (1 = similar, 0 = not similar).
So basically, I want model.evaluate(input1, input2, y_label). But I am not sure how to get this with the model that I trained. As shown above, I trained with three inputs: model.fit([Anchor,Positive,Negative],y=Y_dummy ... ) .
I know I should save the weights of my trained model, but I just don't know what model to load the weights onto.
Your help is greatly appreciated!
EDIT:
I am aware of the approach below for prediction, but I am not looking for prediction; I want to use model.evaluate to get some final measure of loss/accuracy for the model. Also, this approach only feeds the anchor into the model (whereas I'm interested in text similarity, so I would want to feed in 2 inputs).
eval_model = Model(inputs=anchor_input, outputs=encoded_anchor)
eval_model.load_weights('weights.hdf5')
Considering that eval_model is trained to produce embeddings, I think it makes sense to evaluate the similarity between two embeddings using cosine similarity.
According to the TF documentation, tf.keras.losses.cosine_similarity returns the negative of the cosine similarity, a number between -1 and 1. A value closer to -1 indicates greater similarity, and a value closer to 1 indicates greater dissimilarity.
We can simply calculate this value between the two inputs of each pair (X1 and X2), for all the samples at our disposal. When it is < 0 we can say that the two inputs are similar (1 = similar, 0 = not similar). In the end, it is possible to calculate the binary accuracy as a final metric.
We can make all the calculations using TF, without needing model.evaluate.
eval_model = Model(inputs=anchor_input, outputs=encoded_anchor)
eval_model.load_weights('weights.hdf5')
cos_sim = tf.keras.losses.cosine_similarity(
    eval_model(X1), eval_model(X2)
).numpy().reshape(-1,1)
accuracy = tf.reduce_mean(tf.keras.metrics.binary_accuracy(Y, -cos_sim, threshold=0))
Another approach consists in computing the cosine similarity between the anchor and positive images and comparing it with the similarity between the anchor and the negative images.
eval_model = Model(inputs=anchor_input, outputs=encoded_anchor)
eval_model.load_weights('weights.hdf5')
positive_similarity = tf.keras.losses.cosine_similarity(
    eval_model(X_anchor), eval_model(X_positive)
).numpy().mean()
negative_similarity = tf.keras.losses.cosine_similarity(
    eval_model(X_anchor), eval_model(X_negative)
).numpy().mean()
Because tf.keras.losses.cosine_similarity returns the negated similarity, we should expect positive_similarity (anchor vs. positive inputs) to be lower, i.e. closer to -1, than negative_similarity (anchor vs. negative inputs).
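The thresholding idea can also be sketched without TensorFlow. This assumes the embeddings have already been computed (for example with eval_model.predict); the arrays emb_a, emb_b and labels below are made-up toy values, and the threshold of 0 is simply carried over from the answer rather than tuned.

```python
import numpy as np

def cosine_similarity(a, b, eps=1e-12):
    """Row-wise cosine similarity in [-1, 1]; larger means more similar."""
    a_norm = a / np.maximum(np.linalg.norm(a, axis=1, keepdims=True), eps)
    b_norm = b / np.maximum(np.linalg.norm(b, axis=1, keepdims=True), eps)
    return np.sum(a_norm * b_norm, axis=1)

# Toy embeddings for three input pairs (hypothetical values).
emb_a = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
emb_b = np.array([[2.0, 0.1], [0.0, -3.0], [1.0, 0.9]])
labels = np.array([1, 0, 1])  # 1 = similar, 0 = not similar

sim = cosine_similarity(emb_a, emb_b)
pred = (sim > 0).astype(int)          # similar when the similarity is positive
accuracy = float((pred == labels).mean())
```

Unlike tf.keras.losses.cosine_similarity, this helper returns the plain (not negated) cosine similarity, so "similar" corresponds to values above the threshold.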

Tensorflow Quantum: PQC not optimizing

I have followed the tutorial available at: https://www.tensorflow.org/quantum/tutorials/mnist. I have modified this tutorial to the simplest example I could think of: an input set in which x increases linearly from 0 to 1 and y = x < 0.3. I then use a PQC with a single Rx gate with a symbol, and a readout using a Z gate.
When retrieving the optimized symbol and adjusting it manually, I can easily find a value that provides 100% accuracy, but when I let the Adam optimizer run, it converges to either always predict 1 or always predict -1. Does anybody spot what I do wrong? (and I apologize for not being able to break down the code to a smaller example)
import tensorflow as tf
import tensorflow_quantum as tfq
import cirq
import sympy
import numpy as np
# used to embed classical data in quantum circuits
def convert_to_circuit_cont(image):
    """Encode truncated classical image into quantum datapoint."""
    values = np.ndarray.flatten(image)
    qubits = cirq.GridQubit.rect(4, 1)
    circuit = cirq.Circuit()
    for i, value in enumerate(values):
        if value:
            circuit.append(cirq.rx(value).on(qubits[i]))
    return circuit
# define classical dataset
length = 1000
np.random.seed(42)
# create a linearly increasing set for x from 0 to 1 in 1/length steps
x_train_sorted = np.asarray([[x/length] for x in range(0,length)], dtype=np.float32)
# p is used to shuffle x and y similarly
p = np.random.permutation(len(x_train_sorted))
x_train = x_train_sorted[p]
# y = x < 0.3 in {-1, 1} for Hinge loss
y_train_sorted = np.asarray([1 if (x/length)<0.30 else -1 for x in range(0,length)])
y_train = y_train_sorted[p]
# test == train for this example
x_test = x_train_sorted[:]
y_test = y_train_sorted[:]
# convert classical data into quantum circuits
x_train_circ = [convert_to_circuit_cont(x) for x in x_train]
x_test_circ = [convert_to_circuit_cont(x) for x in x_test]
x_train_tfcirc = tfq.convert_to_tensor(x_train_circ)
x_test_tfcirc = tfq.convert_to_tensor(x_test_circ)
# define the PQC circuit, consisting out of 1 qubit with 1 gate (Rx) and 1 parameter
def create_quantum_model():
    data_qubits = cirq.GridQubit.rect(1, 1)
    circuit = cirq.Circuit()
    a = sympy.Symbol("a")
    circuit.append(cirq.rx(a).on(data_qubits[0]))
    return circuit, cirq.Z(data_qubits[0])
model_circuit, model_readout = create_quantum_model()
# Build the Keras model.
model = tf.keras.Sequential([
    # The input is the data-circuit, encoded as a tf.string
    tf.keras.layers.Input(shape=(), dtype=tf.string),
    # The PQC layer returns the expected value of the readout gate, range [-1,1].
    tfq.layers.PQC(model_circuit, model_readout),
])
# used for logging progress during optimization
def hinge_accuracy(y_true, y_pred):
    y_true = tf.squeeze(y_true) > 0.0
    y_pred = tf.squeeze(y_pred) > 0.0
    result = tf.cast(y_true == y_pred, tf.float32)
    return tf.reduce_mean(result)
# compile the model with Hinge loss and Adam, as done in the example. Have tried with various learning_rates
model.compile(
    loss=tf.keras.losses.Hinge(),
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.1),
    metrics=[hinge_accuracy])
EPOCHS = 20
BATCH_SIZE = 32
NUM_EXAMPLES = 1000
# fit the model
qnn_history = model.fit(
    x_train_tfcirc, y_train,
    batch_size=32,
    epochs=EPOCHS,
    verbose=1,
    validation_data=(x_test_tfcirc, y_test),
    use_multiprocessing=False)
results = model.predict(x_test_tfcirc)
results_mapped = [-1 if x<=0 else 1 for x in results[:,0]]
print(np.sum(np.equal(results_mapped, y_test)))
After 20 epochs of optimization, I get the following:
1000/1000 [==============================] - 0s 410us/sample - loss: 0.5589 - hinge_accuracy: 0.6982 - val_loss: 0.5530 - val_hinge_accuracy: 0.7070
This results in 700 samples out of 1000 predicted correctly. When looking at the mapped results, this is because all results are predicted as -1. When looking at the raw results, they linearly increase from -0.5484014 to -0.99996257.
When retrieving the weight with w = model.layers[0].get_weights(), subtracting 0.8, and setting it again with model.layers[0].set_weights(w), I get 920/1000 correct. Fine-tuning this process allows me to achieve 1000/1000.
Update 1:
I have also printed the update of the weight over the various epochs:
4.916246, 4.242602, 3.3765688, 2.6855211, 2.3405066, 2.206207, 2.1734586, 2.1656137, 2.1510274, 2.1634471, 2.1683235, 2.188944, 2.1510284, 2.1591303, 2.1632445, 2.1542525, 2.1677444, 2.1702878, 2.163104, 2.1635907
I set the weight to 1.36, a value which gives 908/1000 (as opposed to 700/1000). The optimizer moves away from it:
1.7992111, 2.0727847, 2.1370323, 2.15711, 2.1686404, 2.1603785, 2.183334, 2.1563332, 2.156857, 2.169908, 2.1658351, 2.170673, 2.1575692, 2.1505954, 2.1561477, 2.1754034, 2.1545155, 2.1635509, 2.1464484, 2.1707492
One thing that I noticed is that the value for the hinge accuracy was 0.75 with the weight 1.36, which is higher than the 0.7 for 2.17. If this is the case, I am either in an unlucky part of the optimization landscape where the best accuracy does not correspond to the minimum of the loss landscape, or the loss value is determined incorrectly. This is what I will be investigating next.
The minimum of the hinge loss for this example does not coincide with the maximum of the number of correctly classified examples (a plot of both against the value of the parameter shows this). Given that the optimizer works towards the minimum of the loss, not the maximum number of correctly classified examples, the code (and framework/optimizer) do what they are supposed to do. Alternatively, one could use a different loss function to try to find a better fit, for example a binarized L1 loss. That function would have the same global optimum as the accuracy, but would likely have a very flat landscape.
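The mismatch can be reproduced without TFQ. The sketch below assumes the model output equals cos(x + a), which is the Z expectation of a qubit starting in |0> after the Rx(x) data gate followed by the trainable Rx(a); the grid resolution is an arbitrary choice, and the variable names are illustrative only.

```python
import numpy as np

length = 1000
x = np.arange(length) / length            # inputs 0, 0.001, ..., 0.999
y = np.where(x < 0.3, 1.0, -1.0)          # labels in {-1, 1} for the hinge loss

# Scan the single parameter a over one full period.
a_grid = np.linspace(0.0, 2.0 * np.pi, 2001)
pred = np.cos(x[None, :] + a_grid[:, None])          # shape (len(a_grid), length)

hinge = np.maximum(0.0, 1.0 - y[None, :] * pred).mean(axis=1)
accuracy = ((pred > 0.0) == (y > 0.0)).mean(axis=1)

a_min_loss = a_grid[np.argmin(hinge)]     # where a loss-driven optimizer ends up
a_max_acc = a_grid[np.argmax(accuracy)]   # where classification is best
acc_at_min_loss = accuracy[np.argmin(hinge)]
```

Under this assumption the loss-minimizing parameter lands near 2.17 with an accuracy of 0.7, while the accuracy-maximizing parameter sits near pi/2 - 0.3 (about 1.27), which matches the behaviour reported in the question.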

How does a 1D multi-channel convolutional layer (Keras) train?

I am working with time series EEG data recorded from 10 individual locations on the body to classify future behavior in terms of increasing heart activity. I would like to better understand how my labeled data corresponds to the training inputs.
So far, several RNN configurations as well as countless combinations of vanilla dense networks have not gotten me great results, so I figured a 1D convnet is worth a try.
The things I'm having trouble understanding are:
1.) Feeding data into the model.
orig shape = (30000 timesteps, 10 channels)
array fed to layer = (300 slices, 100 timesteps, 10 channels)
Are the slices separated by 1 time step, giving me 300 slices of timesteps at either end of the original array, or are they separated end to end? If the second is true, how could I create an array of (30000 - 100) slices separated by one ts and is also compatible with the 1D CNN layer?
2) Matching labels with the training and testing data
My understanding is that when you feed in a sequence of train_x_shape = (30000, 10), there are 30000 labels with train_y_shape = (30000, 2) (2 classes) associated with the train_x data.
So, when (300 slices of) 100 timesteps of train_x data with shape = (300, 100, 10) are fed into the model, does the label value correspond to the entire 100 ts (one label per 100 ts, with this label being equal to the last time step's label), or is each of the 100 rows/vectors in the slice labeled, one for each ts?
Train input:
train_x = train_x.reshape(train_x.shape[0], 1, train_x.shape[1])
n_timesteps = 100
n_channels = 10
layer : model.add(Convolution1D(filters = n_channels * 2, padding = 'same', kernel_size = 3, input_shape = (n_timesteps, n_channels)))
final layer : model.add(Dense(2, activation = 'softmax'))
I use categorical_crossentropy for loss.
Answer 1
This will really depend on "how did you get those slices"?
The answer is totally dependent on what "you're doing". So, what do you want?
If you have simply reshaped (array.reshape(...)) the original array from shape (30000,10) to shape (300,100,10), the model will see:
300 individual (and not connected) sequences
100 timesteps in each sequence
Sequence 1 goes from step 0 to 99;
Sequence 2 goes from step 100 to 199, and so on.
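A quick way to check which timesteps land in each sequence after a plain reshape is to fill a toy array with its own step indices. This is only an illustrative sketch; the array below is synthetic and stands in for the real recording.

```python
import numpy as np

steps = np.arange(30000)
data = np.stack([steps] * 10, axis=1)     # shape (30000, 10); every channel holds the step index

reshaped = data.reshape(300, 100, 10)     # 300 non-overlapping sequences of 100 steps

first = reshaped[0, :, 0]                 # step indices inside sequence 1
second = reshaped[1, :, 0]                # step indices inside sequence 2
print(first[0], first[-1], second[0], second[-1])
```

The printed indices show that consecutive sequences are laid end to end with no overlap.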
Creating overlapping slices - Sliding window
If you want to create sequences shifted by only one timestep, make a loop for that.
import numpy as np
originalSequence = np.zeros((30000, 10))  # placeholder for the real (timesteps, channels) array
newSlices = []  # empty list
start = 0
end = start + 100  # window length: 100 timesteps per slice
while end <= 30000:
    newSlices.append(originalSequence[start:end])
    start += 1
    end += 1
newSlices = np.asarray(newSlices)  # shape (29901, 100, 10)
Beware: if you do this in the input data, you will have to do a similar thing in your output data as well.
Answer 2
Again, that's totally up to you. What do you want to achieve?
Convolutional layers will keep the timesteps with these options:
If you use padding='same', the final length will be the same as the input
If you don't, the final length will be reduced depending on the kernel size you choose
Recurrent layers will keep the timesteps or not depending on:
Whether you use return_sequences=True - Output has timesteps
Or you use return_sequences=False - Output has no timesteps
If you want only one output for each sequence (not per timestep):
Recurrent models:
Use LSTM(...., return_sequences=True) until the last LSTM
The last LSTM will be LSTM(..., return_sequences=False)
Convolutional models:
At some point after the convolutions, choose one of these to add:
GlobalMaxPooling1D
GlobalAveragePooling1D
Flatten (but treat the number of channels later with a Dense(2))
Reshape((2,))
I think I'd go with GlobalMaxPooling1D if using convolutions, but recurrent models seem better for this. (Not a rule, though).
You can choose to use intermediate MaxPooling1D layers to gradually reduce the length from 100 to 50, then to 25 and so on. This will probably reach a better output.
Remember to keep X and Y paired:
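import numpy as np
train_x = np.zeros((30000, 10))  # placeholder for the real inputs
train_y = np.zeros((30000, 2))   # placeholder for the real one-hot labels
newXSlices = []  # empty list
newYSlices = []  # empty list
start = 0
end = start + 100  # window length: 100 timesteps per slice
while end <= 30000:
    newXSlices.append(train_x[start:end])
    newYSlices.append(train_y[end-1])  # label of the last timestep in the window
    start += 1
    end += 1
newXSlices = np.asarray(newXSlices)  # shape (29901, 100, 10)
newYSlices = np.asarray(newYSlices)  # shape (29901, 2)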
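The same pairing of windows and labels can be sketched without a Python loop, assuming NumPy 1.20 or newer (which provides sliding_window_view). The toy arrays and the names x_slices and y_slices are illustrative, and taking the label of the last timestep in each window is one possible choice, not the only one.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

window = 100
train_x = np.arange(500 * 10, dtype=np.float32).reshape(500, 10)  # toy (timesteps, channels)
train_y = np.eye(2, dtype=np.float32)[np.arange(500) % 2]         # toy one-hot labels

# sliding_window_view appends the window axis last: (n_windows, channels, window)
windows = sliding_window_view(train_x, window_shape=window, axis=0)
x_slices = windows.transpose(0, 2, 1)   # -> (n_windows, window, channels)
y_slices = train_y[window - 1:]         # label of the last step of each window
```

The result is a read-only view onto train_x, so it uses little extra memory; call np.ascontiguousarray(x_slices) if a writable, contiguous copy is needed before training.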

How to deal with Imbalanced Dataset for Multi Label Classification

I was wondering how to penalize less represented classes more than other classes when dealing with a really imbalanced dataset (10 classes over about 20000 samples, but here is the number of occurrences for each class: [10868 26 4797 26 8320 26 5278 9412 4485 16172]).
I read about the Tensorflow function : weighted_cross_entropy_with_logits (https://www.tensorflow.org/api_docs/python/tf/nn/weighted_cross_entropy_with_logits) but I am not sure I can use it for a multi label problem.
I found a post that sums up perfectly the problem I have (Neural Network for Imbalanced Multi-Class Multi-Label Classification) and that proposes an idea, but it had no answers, and I thought the idea might be good :)
Thank you for your ideas and answers !
First of all, my suggestion is that you modify your cost function so that it can be used in a multi-label way. Here is code that shows how to use softmax cross entropy in TensorFlow for a multi-label image task.
With that code, you can multiply the loss in each row by a weight. Here is the example code in case you have a multi-label task (i.e., each image can have two labels):
logits_split = tf.split( axis=1, num_or_size_splits=2, value= logits )
labels_split = tf.split( axis=1, num_or_size_splits=2, value= labels )
weights_split = tf.split( axis=1, num_or_size_splits=2, value= weights )
total = 0.0
for i in range(len(logits_split)):
    temp = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits_split[i], labels=labels_split[i]))
    total += temp * tf.reshape(weights_split[i], [-1])
I think you can just use tf.nn.weighted_cross_entropy_with_logits for multiclass classification.
For example, for 4 classes, where the ratios to the class with the largest number of members are [0.8, 0.5, 0.6, 1], You would just give it a weight vector in the following way:
cross_entropy = tf.nn.weighted_cross_entropy_with_logits(
    targets=ground_truth_input, logits=logits,
    pos_weight=tf.constant([0.8, 0.5, 0.6, 1]))
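To see what a per-class pos_weight does in a multi-label (sigmoid) setting, here is a minimal NumPy sketch. The loss expression follows the formula given in the TensorFlow documentation for tf.nn.weighted_cross_entropy_with_logits; deriving pos_weight as "negatives per positive" from the question's class counts is an assumed heuristic, and n_samples = 20000 is only the approximate figure from the question.

```python
import numpy as np

counts = np.array([10868, 26, 4797, 26, 8320, 26, 5278, 9412, 4485, 16172], dtype=np.float64)
n_samples = 20000.0                           # approximate total, per the question
pos_weight = (n_samples - counts) / counts    # assumed heuristic: negatives per positive

def weighted_sigmoid_cross_entropy(labels, logits, pos_weight):
    """Per-element loss; pos_weight scales only the positive-label term."""
    log_weight = 1.0 + (pos_weight - 1.0) * labels
    # Stable softplus(-logits) = log(1 + exp(-logits))
    softplus_neg = np.log1p(np.exp(-np.abs(logits))) + np.maximum(-logits, 0.0)
    return (1.0 - labels) * logits + log_weight * softplus_neg

# Two toy samples: one positive for a rare class (index 1), one for a common class (index 0).
labels = np.zeros((2, 10))
labels[0, 1] = 1.0
labels[1, 0] = 1.0
logits = np.zeros((2, 10))                    # an untrained model predicting 0.5 everywhere

loss = weighted_sigmoid_cross_entropy(labels, logits, pos_weight)
```

With these weights, missing a positive of a rare class costs far more than missing a positive of a common class, which is the penalty the question asks for.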
So I am not entirely sure that I understand your problem given what you have written. The post you link to writes about multi-label AND multi-class, but that doesn't really make sense given what is written there either. So I will approach this as a multi-class problem where for each sample, you have a single label.
In order to penalize the classes, I implemented a weight Tensor based on the labels in the current batch. For a 3-class problem, you could eg. define the weights as the inverse frequency of the classes, such that if the proportions are [0.1, 0.7, 0.2] for class 1, 2 and 3, respectively, the weights will be [10, 1.43, 5]. Defining a weight tensor based on the current batch is then
weight_per_class = tf.constant([10, 1.43, 5]) # shape (num_classes,)
onehot_labels = tf.one_hot(labels, depth=3) # shape (batch_size, num_classes)
weights = tf.reduce_sum(
    tf.multiply(onehot_labels, weight_per_class), axis=1) # shape (batch_size,)
reduction = tf.losses.Reduction.MEAN # this ensures that we get a weighted mean
loss = tf.losses.softmax_cross_entropy(
    onehot_labels=onehot_labels, logits=logits, weights=weights, reduction=reduction)
Using softmax ensures that the classification problem is not 3 independent classifications.
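The weighting scheme above can be checked by hand in NumPy. This is a sketch under the same assumptions as the answer (single-label, 3 classes, inverse-frequency weights); the labels and logits are made-up toy values.

```python
import numpy as np

proportions = np.array([0.1, 0.7, 0.2])
weight_per_class = 1.0 / proportions            # approximately [10, 1.43, 5]

labels = np.array([0, 1, 1, 2])                 # toy integer class ids
logits = np.array([[2.0, 0.5, 0.1],
                   [0.2, 1.5, 0.3],
                   [1.0, 0.9, 0.1],
                   [0.1, 0.2, 2.5]])

# Each sample's weight is simply the weight of its class.
sample_weights = weight_per_class[labels]

# Softmax cross entropy per sample, computed stably via log-sum-exp.
shifted = logits - logits.max(axis=1, keepdims=True)
log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
per_sample_loss = -log_probs[np.arange(len(labels)), labels]

# Weighted mean: weighted sum of the losses divided by the sum of the weights.
weighted_mean_loss = (sample_weights * per_sample_loss).sum() / sample_weights.sum()
```

Indexing weight_per_class with the integer labels gives the same per-sample weights as multiplying by the one-hot matrix and summing over the class axis.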

tflearn multi layer perceptron with unexpected prediction

I would like to rebuild a MLP I implemented first with scikit-learn's MLPRegressor with tflearn.
sklearn.neural_network.MLPRegressor implementation:
train_data = pd.read_csv('train_data.csv', delimiter = ';', decimal = ',', header = 0)
test_data = pd.read_csv('test_data.csv', delimiter = ';', decimal = ',', header = 0)
X_train = np.array(train_data.drop(['output'], axis=1))
X_scaler = StandardScaler()
X_scaler.fit(X_train)
X_train = X_scaler.transform(X_train)
Y_train = np.array(train_data['output'])
clf = MLPRegressor(activation = 'tanh', solver='lbfgs', alpha=0.0001, hidden_layer_sizes=(3))
clf.fit(X_train, Y_train)
prediction = clf.predict(X_train)
The model worked and I got an accuracy of 0.85. Now I would like to build a similar MLP with tflearn. I started with the following code:
train_data = pd.read_csv('train_data.csv', delimiter = ';', decimal = ',', header = 0)
test_data = pd.read_csv('test_data.csv', delimiter = ';', decimal = ',', header = 0)
X_train = np.array(train_data.drop(['output'], axis=1))
X_scaler = StandardScaler()
X_scaler.fit(X_train)
X_train = X_scaler.transform(X_train)
Y_train = np.array(train_data['output'])
Y_scaler = StandardScaler()
Y_scaler.fit(Y_train.reshape((-1,1)))  # scalers expect a 2D array
Y_train = Y_scaler.transform(Y_train.reshape((-1,1)))
net = tfl.input_data(shape=[None, 6])
net = tfl.fully_connected(net, 3, activation='tanh')
net = tfl.fully_connected(net, 1, activation='sigmoid')
net = tfl.regression(net, optimizer='sgd', loss='mean_square', learning_rate=3.)
clf = tfl.DNN(net)
clf.fit(X_train, Y_train, n_epoch=200, show_metric=True)
prediction = clf.predict(X_train)
At some point I definitely configured something the wrong way because the prediction is way off. The range of Y_train is between 20 and 88 and the prediction shows numbers around 0.005. In the tflearn documentation I just found examples for classification.
UPDATE 1:
I realized that the regression layer uses 'categorical_crossentropy' as its default loss function, which is for classification problems. So I selected 'mean_square' instead. I also tried to normalize Y_train. The prediction still does not even match the range of Y_train. Any thoughts?
FINAL UPDATE:
Take a look at the accepted answer.
One step should be to not scale the output.
I am also working on a regression problem; I scale only the inputs and it works fine with some neural networks. However, if I use tflearn I get wrong predictions.
I made a couple of actually really dumb mistakes.
First of all, I scaled the output to the interval 0 to 1 but used the activation function tanh in the output layer, which delivers values from -1 to 1. So I had to use either an activation function that outputs values between 0 and 1 (e.g. sigmoid) or a linear one without any scaling applied.
Secondly and most importantly, for my data I chose a pretty bad combination of learning rate and n_epoch. I didn't specify any learning rate and the default one is 0.1, I think. In any case it was too small (I ended up using 3.0). At the same time the epoch count (10) was also far too small; with 200 it worked fine.
I also explicitly chose sgd as optimizer (default: adam), which turned out to work a lot better.
I updated the code in my question.
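To illustrate why the output activation and the target scaling have to agree, here is a small NumPy sketch. It assumes min-max scaling of the targets to [0, 1] paired with a sigmoid output, as described above; the target values and the raw network outputs are made up for illustration.

```python
import numpy as np

y = np.array([20.0, 35.5, 61.0, 88.0])        # toy targets in the range stated in the question
y_min, y_max = y.min(), y.max()

# Min-max scaling puts the targets in [0, 1], which a sigmoid output can reach.
y_scaled = (y - y_min) / (y_max - y_min)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

raw_output = np.array([-4.0, -1.0, 0.4, 5.0])  # made-up pre-activation values
pred_scaled = sigmoid(raw_output)              # always strictly between 0 and 1

# Undo the scaling to get predictions back in the original units.
pred = pred_scaled * (y_max - y_min) + y_min
```

A sigmoid can never reach unscaled targets between 20 and 88, so predictions have to be mapped back with the inverse of whatever scaling was applied to the targets.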