I use this code to build a regression model
training_input_func = tf.estimator.inputs.pandas_input_fn(
    x=x_train,
    y=y_train['Price'],
    batch_size=256,
    num_epochs=500,
    shuffle=True)

regressor = tf.estimator.DNNRegressor(
    feature_columns=feature_cols,
    activation_fn=tf.nn.relu,
    hidden_units=[100, 50, 100],
    model_dir='model',
    optimizer=tf.train.GradientDescentOptimizer(learning_rate=0.01))

regressor.train(input_fn=training_input_func, steps=2000)
It takes 2-3 minutes to execute. But when I try the equivalent code in Keras:
epochs = 500
batch_size = 256

model_1 = keras.Sequential()
model_1.add(Dense(100, activation="tanh"))
model_1.add(Dense(50, activation="relu"))
model_1.add(Dense(y_train_array.shape[0]))

model_1.compile(loss='mean_squared_error', optimizer=Adam(), metrics=[metrics.mae])

model_1.fit(x_train_array, y_train_array,
            batch_size=batch_size,
            epochs=epochs,
            shuffle=True,
            verbose=2,  # set to 2 to observe execution, 0 if not
            validation_data=(x_validation_array, y_validation_array),
            callbacks=keras_callbacks,
            use_multiprocessing=True,
            workers=50)
It takes almost 3-4 hours for the first epoch. The training set has around 3M examples and the validation set around 30k. Is there any problem in my code? I know Keras can take more time compared to raw TensorFlow.
I'm working on an image classification task for diabetic retinopathy with fundus image data. There are 5 classes. The class distribution is 1805 images (class 1), 370 images (class 2), 999 images (class 3), 193 images (class 4), and 295 images (class 5).
Here are the steps that I have tried to run:
Preprocessing (resized to 224 x 224)
The train/test split is 85% : 15%
x_train, xtest, y_train, ytest = train_test_split(
    x_train, y_train,
    test_size=0.15,
    random_state=SEED,
    stratify=y_train
)
Data augmentation
ImageDataGenerator(
    zoom_range=0.15,
    fill_mode='constant',
    cval=0.,
    horizontal_flip=True,
    vertical_flip=True,
)
Training with the ResNet-50 model and cross-validation
def getResNet():
    modelres = ResNet50(weights=None, include_top=False,
                        input_shape=(IMAGE_HEIGHT, IMAGE_HEIGHT, 3))
    x = modelres.output
    x = GlobalAveragePooling2D()(x)
    x = Dense(5, activation='softmax')(x)
    model = Model(inputs=modelres.input, outputs=x)
    return model
num_folds = 5
skf = StratifiedKFold(n_splits = 5, shuffle=True, random_state=2021)
cvscores = []
fold = 1
for train, val in skf.split(x_train, y_train.argmax(1)):
    print('Fold: ', fold)
    Xtrain = x_train[train]
    Xval = x_train[val]
    Ytrain = y_train[train]
    Yval = y_train[val]
    data_generator = create_datagen().flow(Xtrain, Ytrain, batch_size=32, seed=2021)
    model = getResNet()
    model.compile(loss='categorical_crossentropy',
                  optimizer=Adam(lr=0.0001),
                  metrics=['accuracy'])
    with tf.compat.v1.device('/device:GPU:0'):
        model_train = model.fit(data_generator,
                                validation_data=(Xval, Yval),
                                epochs=30, batch_size=32, verbose=1)
    model_name = 'cnn_keras_aug_Fold_' + str(fold) + '.h5'
    model.save(model_name)
    scores = model.evaluate(xtest, ytest, verbose=0)
    print("%s: %.2f%%" % (model.metrics_names[1], scores[1] * 100))
    cvscores.append(scores[1] * 100)
    fold = fold + 1
The maximum results I got from this method were training accuracy of 81.2%, validation accuracy of 72.2%, and test accuracy of 70.73%.
Can anyone give me ideas to improve the model so that I can get test accuracy above 90%, if possible?
Later, I plan to use this model as a pre-trained model for diabetic retinopathy data from other sources as well.
BTW, I've tried replacing my preprocessing with this method:
def preprocessing(path):
    image = cv2.imread(path)
    image = crop_image_from_gray(image)
    green = image[:, :, 1]
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    cl = clahe.apply(green)
    image[:, :, 0] = image[:, :, 0]
    image[:, :, 2] = image[:, :, 2]
    image[:, :, 1] = cl
    image = cv2.resize(image, (224, 224))
    return image
I've also tried replacing my model with VGG16 and EfficientNetB0, but none of that had much effect on my results; I'm still stuck at about 70% accuracy.
Please help me come up with ideas to improve my modeling results.
Your training accuracy is 81.2%. It is generally impossible to have test accuracy higher than training accuracy, i.e. with the current setup you will not reach 90%.
However, your validation (and test) accuracy is about 70-72%, which suggests that on your small dataset the model is overfitting. If you add regularization (e.g. dropout), the gap between training accuracy and validation (and test) accuracy will likely shrink, and this is how you can improve your validation score.
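As an illustration, here is a minimal sketch of adding dropout to the classification head of getResNet() from the question (assuming Dropout is imported from keras.layers alongside Dense; the 0.5 rate is an arbitrary starting point, not a tuned value):
def getResNetWithDropout():
    modelres = ResNet50(weights=None, include_top=False,
                        input_shape=(IMAGE_HEIGHT, IMAGE_HEIGHT, 3))
    x = modelres.output
    x = GlobalAveragePooling2D()(x)
    x = Dropout(0.5)(x)  # randomly zero out half of the pooled features during training
    x = Dense(5, activation='softmax')(x)
    return Model(inputs=modelres.input, outputs=x)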
To further increase the score, you need to check your data manually, understand which classes contribute the most to the errors, and figure out how those errors can be reduced (e.g. by updating your preprocessing pipeline).
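As a starting point for that error analysis, here is a small sketch of a per-class breakdown on the held-out test set, assuming model, xtest and ytest exist as in the question and ytest is one-hot encoded:
import numpy as np
from sklearn.metrics import confusion_matrix, classification_report

# Predicted and true class indices for the test set
y_pred = np.argmax(model.predict(xtest), axis=1)
y_true = np.argmax(ytest, axis=1)

# Rows are true classes, columns are predicted classes
print(confusion_matrix(y_true, y_pred))
# Per-class precision/recall shows which classes drive the errors
print(classification_report(y_true, y_pred))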
I have to predict the time dependence of soil moisture from rainfall and several other time series. I have forecasts for all of them, and the only thing left to do is predict soil moisture.
Following a guide, I built a CNN model, because ARIMA models cannot take external stochastic influences into account.
The model works, but not as it should.
If you look at the attached plot, you'll see that the forecasted series (yellow, smsfu_sum) does not depend on rain (the aprec series) the way it does in the training set. I want a sharp peak in the forecast, but changing the kernel and pooling sizes doesn't help.
So I tried to train a CNN-LSTM model based on this guide.
Here is the code for the model architecture:
def build_model(train, n_input):
    # prepare data
    train_x, train_y = to_supervised(train, n_input)
    # define parameters
    verbose, epochs, batch_size = 1, 20, 32
    n_timesteps, n_features, n_outputs = train_x.shape[1], train_x.shape[2], train_y.shape[1]
    # reshape output into [samples, timesteps, features]
    train_y = train_y.reshape((train_y.shape[0], train_y.shape[1], 1))
    # define model
    model = Sequential()
    model.add(Conv1D(filters=64, kernel_size=3, activation='softmax', input_shape=(n_timesteps, n_features)))
    model.add(Conv1D(filters=64, kernel_size=3, activation='softmax'))
    model.add(MaxPooling1D(pool_size=2))
    model.add(Flatten())
    model.add(RepeatVector(n_outputs))
    model.add(LSTM(200, activation='relu', return_sequences=True))
    model.add(TimeDistributed(Dense(100, activation='softmax')))
    model.add(TimeDistributed(Dense(1)))
    model.compile(loss='mse', optimizer='adam')
    # fit network
    model.fit(train_x, train_y, epochs=epochs, batch_size=batch_size, verbose=verbose)
    return model
I used a batch size of 32 and split the data with this function:
def to_supervised(train, n_input, n_out=300):
    # flatten data
    data = train.reshape((train.shape[0] * train.shape[1], train.shape[2]))
    X, y = list(), list()
    in_start = 0
    # step over the entire history one time step at a time
    for _ in range(len(data)):
        # define the end of the input sequence
        in_end = in_start + n_input
        out_end = in_end + n_out
        # ensure we have enough data for this instance
        if out_end <= len(data):
            X.append(data[in_start:in_end, :])
            y.append(data[in_end:out_end, 2])
        # move along one time step
        in_start += 1
    return array(X), array(y)
I'm using n_input = 1000 and n_out = 480 (that's the horizon I have to predict).
On the very first iteration, the loss goes to NaN.
How should I fix this? There are no missing values in my data; I dropped every NaN.
I'm trying to train a model to predict gender using the CelebA dataset and TensorFlow.
This is my model:
train_data_gen = train_image_generator.flow_from_dataframe(
    dataframe=train_split,
    directory=celeba.images_folder,
    x_col='id',
    y_col='Male',
    target_size=(IMG_WIDTH, IMG_HEIGHT),
    batch_size=batch_size,
    classes=['1', '0']
)
base_model = tf.keras.applications.MobileNetV2(input_shape=IMG_SHAPE,
                                               include_top=False,
                                               weights='imagenet')

model = tf.keras.Sequential([
    base_model,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(512, activation='relu'),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(2),
    tf.keras.layers.Softmax()
])

base_learning_rate = 0.001
model.compile(optimizer=tf.keras.optimizers.RMSprop(lr=base_learning_rate),
              loss=tf.keras.losses.CategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])
Then I use the following to evaluate the model
test_data_gen = test_image_generator.flow_from_dataframe(
    dataframe=test_split,
    directory=celeba.images_folder,
    x_col='id',
    y_col='Male',
    target_size=(IMG_WIDTH, IMG_HEIGHT),
    batch_size=batch_size,
    classes=['1', '0']
)

model = tf.keras.models.load_model("cp-0004.ckpt")

# Re-evaluate the model
loss, acc = model.evaluate(test_data_gen, verbose=2)
which gives an accuracy of 0.87.
But when I use the following, I get 0.51 accuracy!
pred_test = model.predict(test_data_gen)
pred_df = pd.DataFrame(pred_test, columns=["Male", "Female"])
pred_df[pred_df > 0.5] = "1"
pred_df[pred_df < 0.5] = "0"
# test_split_raw = celeba.split('test', drop_zero=False)
confusion_matrix(test_split["Male"].astype(int).values, np.argmax(pred_df.values, 1))
Can anyone explain why the accuracy from the evaluate function is different?
You want to check test_image_generator.flow_from_dataframe. The default value of shuffle is set to True.
Your generator object therefore yields randomly from your test data.
Your model then predicts on those randomly drawn images, but you compare against your ordered dataframe. If you want to compare to test_split["Male"], set shuffle to False. If you don't, you will always get ~0.5 accuracy (if your data is equally distributed).
Another hint: You should use the .evaluate() method if you have labeled data. Using .evaluate() also yields accuracy.
Use .predict() only for new, unlabeled data.
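A minimal sketch of that fix, reusing the generator arguments from the question with shuffle=False added so predictions line up with the dataframe order:
test_data_gen = test_image_generator.flow_from_dataframe(
    dataframe=test_split,
    directory=celeba.images_folder,
    x_col='id',
    y_col='Male',
    target_size=(IMG_WIDTH, IMG_HEIGHT),
    batch_size=batch_size,
    classes=['1', '0'],
    shuffle=False  # keep rows in the same order as test_split
)
With shuffle=False, the i-th row of model.predict(test_data_gen) corresponds to the i-th row of test_split, so comparing against test_split["Male"] becomes meaningful.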
I made a direct comparison between TensorFlow and Keras with the same parameters and the same dataset (MNIST).
The strange thing is that Keras achieves 96% performance in 10 epochs, while TensorFlow achieves about 70% performance in 10 epochs. I have run this code many times in the same instance and this inconsistency always occurs.
Even with 50 epochs, the TensorFlow version only reaches about 90%.
Keras code:
import keras
from keras.datasets import mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
# One hot encoding
from keras.utils import np_utils
y_train = np_utils.to_categorical(y_train)
y_test = np_utils.to_categorical(y_test)
# Changing the shape of input images and normalizing
x_train = x_train.reshape((60000, 784))
x_test = x_test.reshape((10000, 784))
x_train = x_train.astype('float32') / 255
x_test = x_test.astype('float32') / 255
import keras
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation
# Creating the neural network
model = Sequential()
model.add(Dense(30, input_dim=784, kernel_initializer='normal', activation='relu'))
model.add(Dense(30, kernel_initializer='normal', activation='relu'))
model.add(Dense(10, kernel_initializer='normal', activation='softmax'))
# Optimizer
optimizer = keras.optimizers.Adam()
# Loss function
model.compile(loss='categorical_crossentropy', optimizer=optimizer, metrics=['acc'])
# Training
model.fit(x_train, y_train, epochs=10, batch_size=200, validation_data=(x_test, y_test), verbose=1)
# Checking the final accuracy
accuracy_final = model.evaluate(x_test, y_test, verbose=0)
print('Model Accuracy: ', accuracy_final)
TensorFlow code: (x_train, x_test, y_train, y_test are the same as the input for the Keras code above)
import tensorflow as tf
# Epochs parameters
epochs = 10
batch_size = 200
# Neural network parameters
n_input = 784
n_hidden_1 = 30
n_hidden_2 = 30
n_classes = 10
# Placeholders x, y
x = tf.placeholder(tf.float32, [None, n_input])
y = tf.placeholder(tf.float32, [None, n_classes])
# Creating the first layer
w1 = tf.Variable(tf.random_normal([n_input, n_hidden_1]))
b1 = tf.Variable(tf.random_normal([n_hidden_1]))
layer_1 = tf.nn.relu(tf.add(tf.matmul(x,w1),b1))
# Creating the second layer
w2 = tf.Variable(tf.random_normal([n_hidden_1, n_hidden_2]))
b2 = tf.Variable(tf.random_normal([n_hidden_2]))
layer_2 = tf.nn.relu(tf.add(tf.matmul(layer_1,w2),b2))
# Creating the output layer
w_out = tf.Variable(tf.random_normal([n_hidden_2, n_classes]))
bias_out = tf.Variable(tf.random_normal([n_classes]))
output = tf.matmul(layer_2, w_out) + bias_out
# Loss function
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits = output, labels = y))
# Optimizer
optimizer = tf.train.AdamOptimizer().minimize(cost)
# Making predictions
predictions = tf.equal(tf.argmax(output, 1), tf.argmax(y, 1))
# Accuracy
accuracy = tf.reduce_mean(tf.cast(predictions, tf.float32))
# Variables that will be used in the training cycle
train_size = x_train.shape[0]
total_batches = train_size / batch_size
# Initializing the variables
init = tf.global_variables_initializer()
# Opening the session
with tf.Session() as sess:
    sess.run(init)
    # Training cycle
    for epoch in range(epochs):
        # Loop through all batch iterations
        for i in range(0, train_size, batch_size):
            batch_x = x_train[i:i + batch_size]
            batch_y = y_train[i:i + batch_size]
            # Fit training
            sess.run(optimizer, feed_dict={x: batch_x, y: batch_y})
        # Running accuracy (with test data) on each epoch
        acc_val = sess.run(accuracy, feed_dict={x: x_test, y: y_test})
        # Showing results after each epoch
        print("Epoch: ", "{}".format(epoch + 1))
        print("Accuracy_val = ", "{:.3f}".format(acc_val))
    print("Training Completed!")
    # Checking the final accuracy
    checking = tf.equal(tf.argmax(output, 1), tf.argmax(y, 1))
    accuracy_final = tf.reduce_mean(tf.cast(checking, tf.float32))
    print("Model Accuracy:", accuracy_final.eval({x: x_test, y: y_test}))
I'm running everything in the same instance. Can anyone explain this inconsistency?
I think the initialization is the culprit. For example, one real difference is that you initialize the biases in TF with random_normal, which isn't best practice; Keras defaults to initializing biases to zero, which is. You don't override this, since you only set kernel_initializer, not bias_initializer, in your Keras code.
Furthermore, things are worse for the weight initializers. You are using RandomNormal for Keras, defined like so:
keras.initializers.RandomNormal(mean=0.0, stddev=0.05, seed=None)
But in TF you use tf.random.normal:
tf.random.normal(shape, mean=0.0, stddev=1.0, dtype=tf.dtypes.float32, seed=None, name=None)
I can tell you that using a standard deviation of 0.05 is reasonable for initialization, but using 1.0 is not.
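For concreteness, a sketch of what matching those Keras defaults would look like in the TF code from the question (zero biases, stddev=0.05 for the weights; these values are simply the Keras defaults discussed above, not tuned numbers):
# Weights drawn with a small standard deviation, matching Keras' 'normal' initializer
w1 = tf.Variable(tf.random_normal([n_input, n_hidden_1], stddev=0.05))
b1 = tf.Variable(tf.zeros([n_hidden_1]))  # zero biases, matching the Keras default

w2 = tf.Variable(tf.random_normal([n_hidden_1, n_hidden_2], stddev=0.05))
b2 = tf.Variable(tf.zeros([n_hidden_2]))

w_out = tf.Variable(tf.random_normal([n_hidden_2, n_classes], stddev=0.05))
bias_out = tf.Variable(tf.zeros([n_classes]))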
I suspect that if you changed these parameters, things would look better. But if they don't, I'd suggest dumping the TensorFlow graph for both models and just checking by hand to see the differences. The graphs are small enough in this case to double-check.
To some extent this highlights the difference in philosophy between Keras and TF. Keras tries hard to set good defaults for NN training that correspond to what is known to work. But TensorFlow is completely agnostic - you have to know those practices and explicitly code them in. The standard deviation thing is a stellar example: of course it should be 1 by default in a mathematical function, but 0.05 is a good value if you know it will be used to initialize an NN layer.
Answer originally provided by Dmitriy Genzel on Quora.
Below is the code I am using. I've commented out the lines that convert my model to a TPU model. With a GPU, an epoch on the same amount of data takes 7 seconds, while on the TPU it takes 90 seconds.
Inp = tf.keras.Input(name='input', shape=(input_dim,), dtype=tf.float32)
x = tf.keras.layers.Dense(900, kernel_initializer='uniform', activation='relu',
                          input_dim=input_dim, name='Dense_01')(Inp)
x = tf.keras.layers.Dropout(0.3, name='Dropout_02')(x)
output = tf.keras.layers.Dense(stop_criteria, activation='softmax', name='Dense_02')(x)

model = tf.keras.Model(inputs=[Inp], outputs=[output])
opt = tf.train.AdamOptimizer(.001)
model.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['acc'])

'''tpu_model = tf.contrib.tpu.keras_to_tpu_model(model,
       strategy=tf.contrib.tpu.TPUDistributionStrategy(
           tf.contrib.cluster_resolver.TPUClusterResolver(TPU_ADDRESS)))'''

model.fit(X_tra, y_tra, epochs=5, batch_size=batch_size, shuffle=False,
          validation_split=0.1, verbose=2)
Here is the link to the notebook
Have you tried the tpu_model.fit_generator method like in the example below?
The other part looks fine.
Also, one problem could be the use of the Adam optimizer. There was something about it, but I forgot where the link is. Try another optimizer with the code below; if a different optimizer works, you know the issue must be with the Adam optimizer.
tf.keras.backend.clear_session()

training_model = lstm_model(seq_len=100, batch_size=128, stateful=False)

tpu_model = tf.contrib.tpu.keras_to_tpu_model(
    training_model,
    strategy=tf.contrib.tpu.TPUDistributionStrategy(
        tf.contrib.cluster_resolver.TPUClusterResolver(TPU_WORKER)))

tpu_model.fit_generator(
    training_generator(seq_len=100, batch_size=1024),
    steps_per_epoch=100,
    epochs=10,
)
tpu_model.save_weights('/tmp/bard.h5', overwrite=True)
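If you want to test the optimizer idea on the model from the question, a minimal sketch would be to recompile it with a non-Adam optimizer (plain gradient descent here, chosen arbitrarily) and rerun the same fit call:
# Quick experiment: swap AdamOptimizer for plain gradient descent and retrain
opt = tf.train.GradientDescentOptimizer(learning_rate=0.001)
model.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['acc'])
model.fit(X_tra, y_tra, epochs=5, batch_size=batch_size, shuffle=False,
          validation_split=0.1, verbose=2)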