I have two Keras models, let's call them model1 and model2. Both are simple perceptrons. Here is the code for setting up model1; model2 is identical.
import tensorflow as tf
from tensorflow import keras

model1 = keras.Sequential([
    keras.layers.Dense(100, activation=tf.nn.relu),
    keras.layers.Dropout(0.5, noise_shape=None, seed=None),
    keras.layers.Dense(26, activation=tf.nn.softmax)
])
model1.compile(optimizer='sgd',
               loss='sparse_categorical_crossentropy',
               metrics=['accuracy'])
I want to mix these two models after training them, such that the resulting model is a random sampling of the weights and biases of model1 and model2. For example, if the weights are represented by [x1, x2, x3, x4, ...] and [y1, y2, y3, y4, ...], the result would be a random combination of those: [x1, y2, y3, x4, ...].
I've looked into Keras's merging layers, but I don't see a clear way to accomplish this with the API. I am looking for insight on how to build a new model consisting of a random ~50/50 split of the weights and biases of model1 and model2. Any ideas on how to accomplish this?
Aight, after another week of beating my head against a table, I finally realized how much of a doofus I was being. Here's the function I made to solve this problem, and it is so unbelievably simple.
# Initialize and train model1 and model2; they are the inputs to this function.
import random

def mateKerasNN(net1, net2):
    net1weights = net1.get_weights()
    net2weights = net2.get_weights()
    net3weights = net1.get_weights()  # start from a copy of net1's weight arrays
    for i in range(len(net1weights)):
        for j in range(len(net1weights[i])):
            net3weights[i][j] = random.choice([net1weights[i][j], net2weights[i][j]])
    return net3weights

# model3 must already be built with the same architecture as model1/model2.
model3weights = mateKerasNN(model1, model2)
model3.set_weights(model3weights)
Note, this actually randomizes each neuron's weights as a group. So neuron 1, with its 40 weights, moves as one group into the new model, as do neurons 2 through 784. I will be building a version where all the weights are randomized individually, but this code is a good start.
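For reference, here is a rough sketch of that fully element-wise version (my own illustration, not the original function): it builds a boolean mask per weight array so every individual weight is drawn from model1 or model2 with 50/50 probability.

import numpy as np

def mate_keras_nn_elementwise(net1, net2):
    w1 = net1.get_weights()
    w2 = net2.get_weights()
    mixed = []
    for a, b in zip(w1, w2):
        # For each individual weight, pick from net1 or net2 with probability 0.5.
        mask = np.random.rand(*a.shape) < 0.5
        mixed.append(np.where(mask, a, b))
    return mixed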
Related
I have a task to write a neural network with 9 input neurons and 4 output neurons for a multiclass classification problem. I have tried different models, and for all of them:
Drop-out mechanism is used.
Batch normalization is used.
And all the resulting neural networks overfit. Precision is below 80%, and I want at least 90%. The median loss is around 0.8.
Can you please suggest what model I should use?
Dataset:
TMS_coefficients.RData file
Part of my code:
(trainX, testX, trainY, testY) = train_test_split(dataset,
                                                  values, test_size=0.25, random_state=42)
# neural network model
visible = layers.Input(shape=(9,))
hidden0 = layers.Dense(64, activation="tanh")(visible)
batch0 = layers.BatchNormalization()(hidden0)
drop0 = layers.Dropout(0.3)(batch0)
hidden1 = layers.Dense(32, activation="tanh")(drop0)
batch1 = layers.BatchNormalization()(hidden1)
drop1 = layers.Dropout(0.2)(batch1)
hidden2 = layers.Dense(128, activation="tanh")(drop1)
batch2 = layers.BatchNormalization()(hidden2)
drop2 = layers.Dropout(0.5)(batch2)
hidden3 = layers.Dense(64, activation="tanh")(drop2)
batch3 = layers.BatchNormalization()(hidden3)
output = layers.Dense(4, activation="softmax")(batch3)
model = tf.keras.Model(inputs=visible, outputs=output)
model.compile(optimizer=tf.keras.optimizers.Adam(0.0001),
loss='categorical_crossentropy',
metrics=['Precision'],)
history = model.fit(trainX, trainY, validation_data=(testX, testY), epochs=5000, batch_size=256)
From the loss curve, I can say it is not overfitting at all! In fact, your model is underfitting. Why? Because when you stopped training, the validation loss curve had not flattened out yet. That means your model still has the potential to do better if it is trained longer.
The model overfits when the training loss keeps decreasing (or stays the same) while the validation loss gradually increases instead of decreasing. That is clearly not the case here.
So, what you can do:
Try training longer.
Add more layers.
Try different activation functions like ReLU instead of tanh.
Use lower dropout (your model is probably struggling to learn with such high dropout rates).
Make sure you have shuffled your data before the train-test split (if you are using sklearn's train_test_split(), this is done by default). Also check that the test data is similar to the training data and that both go through the same preprocessing steps. See the sketch after this list for one way to combine some of these changes.
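As a rough sketch (not a guaranteed fix), the model could look like this with ReLU activations, lower dropout, and a longer run capped by early stopping; the layer sizes and the trainX/trainY/testX/testY split are reused from the question's code, and the exact hyperparameters here are illustrative.

import tensorflow as tf
from tensorflow.keras import layers, callbacks

visible = layers.Input(shape=(9,))
x = layers.Dense(64, activation="relu")(visible)
x = layers.BatchNormalization()(x)
x = layers.Dropout(0.1)(x)
x = layers.Dense(32, activation="relu")(x)
x = layers.BatchNormalization()(x)
x = layers.Dropout(0.1)(x)
output = layers.Dense(4, activation="softmax")(x)

model = tf.keras.Model(inputs=visible, outputs=output)
model.compile(optimizer=tf.keras.optimizers.Adam(0.0001),
              loss='categorical_crossentropy',
              metrics=['Precision'])

# Train longer, but stop once the validation loss stops improving.
early_stop = callbacks.EarlyStopping(monitor='val_loss', patience=200,
                                     restore_best_weights=True)
history = model.fit(trainX, trainY, validation_data=(testX, testY),
                    epochs=20000, batch_size=256, callbacks=[early_stop])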
I'm learning how to train an RNN model in Keras, and I was expecting that training a model to predict the moving average of the last N steps would be quite easy.
I have a time series with thousands of steps and I'm able to create a model and train it with batches of data.
If I train it with the following model, though, the test set predictions differ a lot from the real values (batch = 30, moving average window = 10):
inputs = tf.keras.Input(shape=(batch_length, num_features))
x = tf.keras.layers.LSTM(10, return_sequences=False)(inputs)
outputs = tf.keras.layers.Dense(num_labels)(x)
model = tf.keras.Model(inputs=inputs, outputs=outputs, name="test_model")
To get good predictions, I need to add another TimeDistributed layer, getting 2D predictions instead of 1D ones (one prediction per time step):
inputs = tf.keras.Input(shape=(batch_length, num_features))
x = tf.keras.layers.LSTM(10, return_sequences=True)(inputs)
x = tf.keras.layers.TimeDistributed(tf.keras.layers.Dense(num_labels))(x)
outputs = tf.keras.layers.Dense(num_labels)(x)
model = tf.keras.Model(inputs=inputs, outputs=outputs, name="test_model")
If your goal is to give the last 10 timesteps as input and have the moving average as the prediction, I suggest trying a regressor model with densely connected layers rather than an RNN (a linear activation with regularization might work well enough).
That option would be cheaper to train and run than an LSTM.
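A minimal sketch of that suggestion, assuming the data has been reshaped into windows of the last 10 timesteps (the window size, layer choice, and regularization strength are illustrative):

import tensorflow as tf

window = 10  # last N timesteps fed in as features
inputs = tf.keras.Input(shape=(window,))
# A single linear layer with L2 regularization; for a plain moving average
# the ideal weights are simply 1/window each, which this can learn directly.
outputs = tf.keras.layers.Dense(
    1, activation="linear",
    kernel_regularizer=tf.keras.regularizers.l2(1e-4))(inputs)

model = tf.keras.Model(inputs=inputs, outputs=outputs, name="dense_regressor")
model.compile(optimizer="adam", loss="mse")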
I am using a deep neural network model (implemented in Keras) to make predictions. Something like this:
# Assuming standalone Keras; TensorFlow is needed for the Lambda layer.
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Lambda, SimpleRNN
import tensorflow as tf

def make_model():
    model = Sequential()
    model.add(Conv2D(20, (5, 5), activation="relu"))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Flatten())
    model.add(Dense(20, activation="relu"))
    # Add a time dimension so the output can be fed into the RNN layer.
    model.add(Lambda(lambda x: tf.expand_dims(x, axis=1)))
    model.add(SimpleRNN(50, activation="relu"))
    model.add(Dense(1, activation="sigmoid"))
    model.compile(loss="binary_crossentropy", optimizer="adagrad", metrics=["accuracy"])
    return model
model = make_model()
model.fit(x_train, y_train, validation_data = (x_validation,y_validation), epochs = 25, batch_size = 25, verbose = 1)
## Prediction:
prediction = model.predict_classes(x)
probabilities = model.predict_proba(x)  # I assume these are the probabilities of the predicted class
My problem is a binary classification problem. I wish to calculate a confidence score for each of these predictions, i.e. I wish to know whether my model is 99% certain it is "0" or only 58% certain it is "0".
I have found some views on how to do it, but can't implement them. The approach I wish to follow says: "With classifiers, when you output you can interpret values as the probability of belonging to each specific class. You can use their distribution as a rough measure of how confident you are that an observation belongs to that class."
How should I predict with a model like the one above so that I get its confidence about each prediction? I would appreciate some practical examples (preferably in Keras).
The softmax output is a problematic way to estimate the confidence of a model's prediction.
There are a few recent papers about this topic.
You can look for "calibration" of neural networks to find relevant papers.
This is one example you can start with - https://arxiv.org/pdf/1706.04599.pdf
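For instance, the paper linked above proposes temperature scaling. A rough sketch of that idea, assuming you can obtain your model's pre-softmax logits on a held-out validation set (val_logits and val_labels are hypothetical names):

import tensorflow as tf

def fit_temperature(val_logits, val_labels, lr=0.01, steps=300):
    # Learn a single temperature T that rescales the logits so the softmax
    # probabilities are better calibrated (Guo et al., 2017).
    T = tf.Variable(1.0)
    logits = tf.constant(val_logits, dtype=tf.float32)
    labels = tf.constant(val_labels, dtype=tf.int64)
    opt = tf.keras.optimizers.Adam(lr)
    for _ in range(steps):
        with tf.GradientTape() as tape:
            loss = tf.reduce_mean(
                tf.nn.sparse_softmax_cross_entropy_with_logits(
                    labels=labels, logits=logits / T))
        grads = tape.gradient(loss, [T])
        opt.apply_gradients(zip(grads, [T]))
    return float(T.numpy())

# Calibrated confidences for new data are then softmax(test_logits / T).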
In Keras, there is a method called predict() that is available for both Sequential and Functional models. It will work fine in your case, since you are using binary_crossentropy as your loss function and a final Dense layer with a sigmoid activation.
Here is how to call it with one test data instance. Below, mymodel.predict() returns the probability of the positive class, a value between 0 and 1; that value is the confidence score you mentioned. You can then use np.where() as shown below to threshold it at 50% and get the final class.
import numpy as np

yhat_probabilities = mymodel.predict(mytestdata, batch_size=1)
yhat_classes = np.where(yhat_probabilities > 0.5, 1, 0).squeeze().item()
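For a whole batch of test instances rather than a single one, the same idea would look roughly like this (mytestbatch is a hypothetical name):

yhat_probabilities = mymodel.predict(mytestbatch)                # shape (n_samples, 1)
yhat_classes = np.where(yhat_probabilities > 0.5, 1, 0).ravel()  # one 0/1 label per sample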
I've come to understand that the probabilities that are output by logistic regression can be interpreted as confidence.
Here are some links to help you come to your own conclusion.
https://machinelearningmastery.com/how-to-score-probability-predictions-in-python/
how to assess the confidence score of a prediction with scikit-learn
https://stats.stackexchange.com/questions/34823/can-logistic-regressions-predicted-probability-be-interpreted-as-the-confidence
https://kiwidamien.github.io/are-you-sure-thats-a-probability.html
Feel free to upvote my answer if you find it useful.
How about using a softmax activation in the last layer? Let's say something like this:
model.add(Dense(2, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer = adagrad, metrics = ["accuracy"])
In this way, for each data point the model will give you a probabilistic-ish result that tells you how likely it is that the data point belongs to each of the two classes.
For example, for a given X, if the model returns (0.3, 0.7), you know it is more likely that X belongs to class 1 than to class 0, and that the likelihood has been estimated as 0.7 versus 0.3.
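With that change (and with the labels one-hot encoded, since categorical_crossentropy expects that format), reading off the predicted class and its confidence could look roughly like this, where x is the test data from the question:

import numpy as np

probs = model.predict(x)                    # shape (num_samples, 2); each row sums to 1
predicted_class = np.argmax(probs, axis=1)  # most likely class per sample
confidence = np.max(probs, axis=1)          # softmax "confidence" for that class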
I'm trying to save and load the weights of my model, which has a merged layer. Since my model code is a bit long, let me shorten it with a simple example in pseudo-style code.
First, my model looks like this:
def some_model():
    model1 = Sequential()
    model1.add(...)

    model2 = Sequential()
    model2.add(...)

    final_model = Sequential()
    final_model.add(Merge([model1, model2], mode='concat'))
    return final_model
So, after training, I only saved final_model's weights:
final_model.save_weights('w_final_model.h5')
And there was no problem when I loaded the weights for further training or testing:
final_model = some_model()
final_model.load_weights('w_final_model.h5')
So far so good. Yet my curiosity was piqued when I tried to investigate the shapes of final_model's layers.
Apparently, final_model only has its own layers. In other words, it doesn't look like it carries all the weight vectors of model1 and model2. But it still works, and I wonder how this is possible. Is it only loading the weights for final_model's layers while the model1 and model2 weights are initialized again? Yet the network's output is too good to assume that the model1 and model2 weights were newly initialized. Do I need to save each model's weights separately?
For the problem I'm solving, the following is true:
I'm trying to predict the probabilities for the input belonging to each of 12 classes.
It's possible that the input belongs to none of the 12 classes, which means all 12 outputs (probabilities) would be low.
The output probabilities should be independent. That is, if the likelihood of class 1 is 95%, the likelihood of class 2 could still be >5%, etc. In other words, I don't need the probabilities to add up to 1, because some classes are similar. To be clear, in practice each input can only belong to one class; what I'm referring to is the probabilities.
The way I'm currently approaching this is as follows:
One hot encode the 12 output classes
Loss function: Categorical crossentropy
Final layer: Dense with 12 neurons and sigmoid activation
Questions
Does it make sense to have 1 model to predict membership of each of these 12 classes? Or does it make more sense to have independent models each of which predicts a probability just for one class? What's better?
Is it better to have 13 classes instead of 12 where the new one represents that the input doesn't belong to any class?
Code
# Assuming standalone Keras and scikit-learn for the helpers used below.
import numpy as np
from sklearn.model_selection import train_test_split
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import Adam
from keras.utils import np_utils

test_fraction = 0.2
dropout_prob = 0.4
activation_function = 'relu'
loss_function = 'categorical_crossentropy'
opt = Adam()
verbose_level = 1
num_targets = 12
batch_size = 32
epochs = 75
X = np.array(keypoints)
labels = np.array(labels)
labels = np_utils.to_categorical(labels)
X_train, X_test, y_train, y_test = train_test_split(X, labels, test_size=test_fraction, random_state=42)
# model training
model = Sequential()
model.add(Dense(32, input_shape=(180,)))
model.add(Dense(64, activation=activation_function))
model.add(Dense(128, activation=activation_function))
model.add(Dense(num_targets, activation='sigmoid'))
model.summary()
model.compile(loss=loss_function, optimizer=opt, metrics=['accuracy'])
history = model.fit(X_train, y_train, batch_size=batch_size, epochs=epochs,
                    verbose=verbose_level, validation_data=(X_test, y_test))
model.save('../models/model.h5')
This is called multi-label classification, and it can be easily implemented by making two changes in the model:
Use a sigmoid activation at the output
Use the binary_crossentropy loss which can handle multi-label classification.
And then just train your model.
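A minimal sketch of those two changes applied to the question's model, reusing the variables (activation_function, num_targets, opt, batch size, etc.) defined in the question's code:

model = Sequential()
model.add(Dense(32, input_shape=(180,)))
model.add(Dense(64, activation=activation_function))
model.add(Dense(128, activation=activation_function))
# Sigmoid keeps the 12 outputs independent instead of forcing them to sum to 1.
model.add(Dense(num_targets, activation='sigmoid'))

# binary_crossentropy treats each of the 12 outputs as its own yes/no decision.
model.compile(loss='binary_crossentropy', optimizer=opt, metrics=['accuracy'])
history = model.fit(X_train, y_train, batch_size=batch_size, epochs=epochs,
                    verbose=verbose_level, validation_data=(X_test, y_test))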
For a start, correct
model.add(Dense(num_targets, activation='sigmoid'))
to
model.add(Dense(num_targets, activation='softmax'))
The sigmoid activation is used for binary classification, not multiclass classification.
To answer your questions, I would say that:
You don't need independent models (one model per class); one model for all classes is fine. One model per class could also be unprofitable from a hardware-resources point of view.
Regarding your second question, whether to have 13 classes instead of 12 depends on the problem you are dealing with. If you have data for this 13th class, you can train your model on it if that is what you want to do.