XOR problem with 2-2-1 configuration should always predict output accurately? - tensorflow

I am trying to solve the XOR problem using the following code:
import numpy as np
from tensorflow.keras.models import Sequential, Model
from tensorflow.keras.layers import Dense, Input, Concatenate
from tensorflow.keras.utils import plot_model
from tensorflow.keras.optimizers import SGD, Adam
# input data
x = np.array([[0,0], [0,1], [1,0], [1,1]], 'float32')
y = np.array([[0], [1], [1], [0]], 'float32')
### Model
model = Sequential()
# add layers (architecture)
model.add(Dense(2, activation = 'relu')
model.add(Dense(1, activation = 'sigmoid'))
# compile
model.compile(loss = 'mean_squared_error',
optimizer = SGD(learning_rate = 0.1, momentum=0.8),
metrics = ['accuracy'])
# train
model.fit(x, y, epochs = 25000, batch_size = 1)
# evaluate
ev = model.evaluate(x, y)
I already tested:
using different activation functions in the hidden layer (sigmoid and tanh)
using different learning rates and momentum
Also, I am running with a high number of epochs (25000). Still, it only accurately predicts all outputs a few times. Most of the times accuracy is equal to 0.5 or 0.75.
I have read that this is the minimum configuration to solve this problem. However, it also seems that the error surface presents a number of regions with local minima.
My question is:
Should I assume that the model is correct and can learn the problem, although sometimes it gets 'stuck' in a local minima, OR do I still need to improve my model somehow to solve the XOR more accurately and consistently?

Related

How can I make my Neural Network predict new output values?

I'm working on a project where I have 3 inputs (v, f, n) and 1 output (delta(t)).
I'm trying to test the effect of the inputs on the output and to figure out which input is the most effective in different situations, therefore I would like to predict new output values that depend on new inputs values.
I have been testing this system and I got the following data table:
This table contains 1000 rows.
I'm new to this whole Neural Network thing, so I don't know what should be the Activation function, the loss function, etc.
I've been trying use some Keras models, but I'm getting wrong predictions when trying model.predict() some inputs values.
import numpy as np
import pandas as pd
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import Adam
model = Sequential()
model.add(Dense(16, activation='relu', input_shape=(3,)))
model.add(Dense(16, activation='relu'))
model.add(Dense(1))
model.compile(optimizer=Adam(), loss='mse')
data = np.array(pd.read_excel(r'Data.xlsx'))
x = data[:, :3]
y = data[:, 3]
target = model.fit(x, y, validation_split=0.2, epochs=15000,
batch_size=256)
# check some predictions:
print(model.predict([[0.9, 840370875, 240]]))

Resnet-50 adversarial training with cleverhans FGSM accuracy stuck at 5%

I am facing a strange problem when adversarially training a resnet-50, and I am not sure whether is's a logical error, or a bug somewhere in the code/libraries.
I am adversarially training a resnet-50 thats loaded from Keras, using the FastGradientMethod from cleverhans, and expecting the adversarial accuracy to rise at least above 90% (probably 99.x%). The training algorithm, training- and attack-params should be visible in the code.
The problem, as already stated in the title is, that the accuracy is stuck at 5% after training ~3000 of 39002 training inputs in the first epoch. (GermanTrafficSignRecognitionBenchmark, GTSRB).
When training without and adversariy loss function, the accuracy does not get stuck after 3000 samples, but continues to rise > 0.95 in the first epoch.
When substituting the network with a lenet-5, alexnet and vgg19, the code works as expected, and an accuracy absolutely comparabele to the non-adversarial, categorical_corssentropy lossfunction is achieved. I've also tried running the procedure using solely tf-cpu and different versions of tensorflow, the result is always the same.
Code for obtaining ResNet-50:
def build_resnet50(num_classes, img_size):
from tensorflow.keras.applications import ResNet50
from tensorflow.keras import Model
from tensorflow.keras.layers import Dense, Flatten
resnet = ResNet50(weights='imagenet', include_top=False, input_shape=img_size)
x = Flatten(input_shape=resnet.output.shape)(resnet.output)
x = Dense(1024, activation='sigmoid')(x)
predictions = Dense(num_classes, activation='softmax', name='pred')(x)
model = Model(inputs=[resnet.input], outputs=[predictions])
return model
Training:
def lr_schedule(epoch):
# decreasing learning rate depending on epoch
return 0.001 * (0.1 ** int(epoch / 10))
def train_model(model, xtrain, ytrain, xtest, ytest, lr=0.001, batch_size=32,
epochs=10, result_folder=""):
from cleverhans.attacks import FastGradientMethod
from cleverhans.utils_keras import KerasModelWrapper
import tensorflow as tf
from tensorflow.keras.optimizers import SGD
from tensorflow.keras.callbacks import LearningRateScheduler, ModelCheckpoint
sgd = SGD(lr=lr, decay=1e-6, momentum=0.9, nesterov=True)
model(model.input)
wrap = KerasModelWrapper(model)
sess = tf.compat.v1.keras.backend.get_session()
fgsm = FastGradientMethod(wrap, sess=sess)
fgsm_params = {'eps': 0.01,
'clip_min': 0.,
'clip_max': 1.}
loss = get_adversarial_loss(model, fgsm, fgsm_params)
model.compile(loss=loss, optimizer=sgd, metrics=['accuracy'])
model.fit(xtrain, ytrain,
batch_size=batch_size,
validation_data=(xtest, ytest),
epochs=epochs,
callbacks=[LearningRateScheduler(lr_schedule)])
Loss-function:
def get_adversarial_loss(model, fgsm, fgsm_params):
def adv_loss(y, preds):
import tensorflow as tf
tf.keras.backend.set_learning_phase(False) #turn off dropout during input gradient calculation, to avoid unconnected gradients
# Cross-entropy on the legitimate examples
cross_ent = tf.keras.losses.categorical_crossentropy(y, preds)
# Generate adversarial examples
x_adv = fgsm.generate(model.input, **fgsm_params)
# Consider the attack to be constant
x_adv = tf.stop_gradient(x_adv)
# Cross-entropy on the adversarial examples
preds_adv = model(x_adv)
cross_ent_adv = tf.keras.losses.categorical_crossentropy(y, preds_adv)
tf.keras.backend.set_learning_phase(True) #turn back on
return 0.5 * cross_ent + 0.5 * cross_ent_adv
return adv_loss
Versions used:
tf+tf-gpu: 1.14.0
keras: 2.3.1
cleverhans: > 3.0.1 - latest version pulled from github
It is a side-effect of the way we estimate the moving averages on BatchNormalization.
The mean and variance of the training data that you used are different from the ones of the dataset used to train the ResNet50. Because the momentum on the BatchNormalization has a default value of 0.99, with only 10 iterations it does not converge quickly enough to the correct values for the moving mean and variance. This is not obvious during training when the learning_phase is 1 because BN uses the mean/variance of the batch. Nevertheless when we set learning_phase to 0, the incorrect mean/variance values which are learned during training significantly affect the accuracy.
You can fix this problem by below approachs:
More iterations
Reduce the size of the batch from 32 to 16(to perform more updates per epoch) and increase the number of epochs from 10 to 250. This way the moving average and variance will converge to the correct values.
Change the momentum of BatchNormalization
Keep the number of iterations fixed but change the momentum of the BatchNormalization layer to update more aggressively the rolling mean and variance (not recommended for production models).
On the original snippet, add the following code between reading the base_model and defining the new layers:
# ....
base_model = ResNet50(weights='imagenet', include_top=False, input_shape=input_shape)
# PATCH MOMENTUM - START
import json
conf = json.loads(base_model.to_json())
for l in conf['config']['layers']:
if l['class_name'] == 'BatchNormalization':
l['config']['momentum'] = 0.5
m = Model.from_config(conf['config'])
for l in base_model.layers:
m.get_layer(l.name).set_weights(l.get_weights())
base_model = m
# PATCH MOMENTUM - END
x = base_model.output
# ....
Would also recommend you to try another hack provided bu us here.

Batch normalization layer for CNN-LSTM

Suppose that I have a model like this (this is a model for time series forecasting):
ipt = Input((data.shape[1] ,data.shape[2])) # 1
x = Conv1D(filters = 10, kernel_size = 3, padding = 'causal', activation = 'relu')(ipt) # 2
x = LSTM(15, return_sequences = False)(x) # 3
x = BatchNormalization()(x) # 4
out = Dense(1, activation = 'relu')(x) # 5
Now I want to add batch normalization layer to this network. Considering the fact that batch normalization doesn't work with LSTM, Can I add it before Conv1D layer? I think it's rational to have a batch normalization layer after LSTM.
Also, where can I add Dropout in this network? The same places? (after or before batch normalization?)
What about adding AveragePooling1D between Conv1D and LSTM? Is it possible to add batch normalization between Conv1D and AveragePooling1D in this case without any effect on LSTM layer?
Update: the LayerNormalization implementation I was using was inter-layer, not recurrent as in the original paper; results with latter may prove superior.
BatchNormalization can work with LSTMs - the linked SO gives false advice; in fact, in my application of EEG classification, it dominated LayerNormalization. Now to your case:
"Can I add it before Conv1D"? Don't - instead, standardize your data beforehand, else you're employing an inferior variant to do the same thing
Try both: BatchNormalization before an activation, and after - apply to both Conv1D and LSTM
If your model is exactly as you show it, BN after LSTM may be counterproductive per ability to introduce noise, which can confuse the classifier layer - but this is about being one layer before output, not LSTM
If you aren't using stacked LSTM with return_sequences=True preceding return_sequences=False, you can place Dropout anywhere - before LSTM, after, or both
Spatial Dropout: drop units / channels instead of random activations (see bottom); was shown more effective at reducing coadaptation in CNNs in paper by LeCun, et al, w/ ideas applicable to RNNs. Can considerably increase convergence time, but also improve performance
recurrent_dropout is still preferable to Dropout for LSTM - however, you can do both; just do not use with with activation='relu', for which LSTM is unstable per a bug
For data of your dimensionality, any sort of Pooling is redundant and may harm performance; scarce data is better transformed via a non-linearity than simple averaging ops
I strongly recommend a SqueezeExcite block after your Conv; it's a form of self-attention - see paper; my implementation for 1D below
I also recommend trying activation='selu' with AlphaDropout and 'lecun_normal' initialization, per paper Self Normalizing Neural Networks
Disclaimer: above advice may not apply to NLP and embed-like tasks
Below is an example template you can use as a starting point; I also recommend the following SO's for further reading: Regularizing RNNs, and Visualizing RNN gradients
from keras.layers import Input, Dense, LSTM, Conv1D, Activation
from keras.layers import AlphaDropout, BatchNormalization
from keras.layers import GlobalAveragePooling1D, Reshape, multiply
from keras.models import Model
import keras.backend as K
import numpy as np
def make_model(batch_shape):
ipt = Input(batch_shape=batch_shape)
x = ConvBlock(ipt)
x = LSTM(16, return_sequences=False, recurrent_dropout=0.2)(x)
# x = BatchNormalization()(x) # may or may not work well
out = Dense(1, activation='relu')
model = Model(ipt, out)
model.compile('nadam', 'mse')
return model
def make_data(batch_shape): # toy data
return (np.random.randn(*batch_shape),
np.random.uniform(0, 2, (batch_shape[0], 1)))
batch_shape = (32, 21, 20)
model = make_model(batch_shape)
x, y = make_data(batch_shape)
model.train_on_batch(x, y)
Functions used:
def ConvBlock(_input): # cleaner code
x = Conv1D(filters=10, kernel_size=3, padding='causal', use_bias=False,
kernel_initializer='lecun_normal')(_input)
x = BatchNormalization(scale=False)(x)
x = Activation('selu')(x)
x = AlphaDropout(0.1)(x)
out = SqueezeExcite(x)
return out
def SqueezeExcite(_input, r=4): # r == "reduction factor"; see paper
filters = K.int_shape(_input)[-1]
se = GlobalAveragePooling1D()(_input)
se = Reshape((1, filters))(se)
se = Dense(filters//r, activation='relu', use_bias=False,
kernel_initializer='he_normal')(se)
se = Dense(filters, activation='sigmoid', use_bias=False,
kernel_initializer='he_normal')(se)
return multiply([_input, se])
Spatial Dropout: pass noise_shape = (batch_size, 1, channels) to Dropout - has the effect below; see Git gist for code:

How to fix flatlined accuracy and NaN loss in tensorflow image classification

I am currently experimenting with TensorFlow and machine learning, and as a challenge, I decided to try and code a machine learning software, on the Kaggle website, that can analyze brain MRI scans and predict if a tumour exists or not. I did so with the code below and began training the model. However, the text that showed up during training showed that none of the loss values (training or validation) had proper values and that the accuracies flatlined, or fluctuated between two numbers (the same numbers each time).
I have looked at other posts but was unable to find anything that gave me tips. I changed my loss function (from sparse_categorical_crossentropy to binary_crossentropy). But none of these changed the values.
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import os
import tensorflow as tf
from tensorflow import keras
import numpy as np
import cv2
import pandas as pd
from random import shuffle
IMG_SIZE = 50
data_path = "../input/brain_tumor_dataset"
data = []
folders = os.listdir(data_path)
for folder in folders:
for file in os.listdir(os.path.join(data_path, folder)):
if file.endswith("jpg") or file.endswith("jpeg") or file.endswith("png") or file.endswith("JPG"):
data.append(os.path.join(data_path, folder, file))
shuffle(data)
images = []
labels = []
for file in data:
img = cv2.imread(file)
img = cv2.resize(img, (IMG_SIZE, IMG_SIZE))
images.append(img)
if "Y" in file:
labels.append(1)
else:
labels.append(0)
union_list = list(zip(images, labels))
shuffle(union_list)
images, labels = zip(*union_list)
images = np.array(images)
labels = np.array(labels)
train_img = images[:200]
train_lbl = labels[:200]
val_img = images[200:]
val_lbl = labels[200:]
train_img = np.array(train_img)
val_img = np.array(val_img)
train_img = train_img.astype("float32") / 255.0
val_img = val_img.astype("float32") / 255.0
model = keras.Sequential([
tf.keras.layers.Conv2D(32, (3, 3), padding='same', activation=tf.nn.relu, input_shape=(IMG_SIZE, IMG_SIZE, 3)),
tf.keras.layers.MaxPooling2D((2,2), strides=2),
tf.keras.layers.Conv2D(64, (3, 3), padding='same', activation=tf.nn.relu),
tf.keras.layers.MaxPooling2D((2,2), strides=2),
tf.keras.layers.Dropout(0.8),
tf.keras.layers.Flatten(),
tf.keras.layers.Dense(128, activation=tf.nn.relu),
tf.keras.layers.Dense(1, activation=tf.nn.sigmoid)
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
history = model.fit(train_img, train_lbl, epochs = 100, validation_data=(val_img, val_lbl))
This should give a result with increasing accuracy, and decreasing loss, but the loss is nan, and the accuracy is flatlined.
I managed to solve the problem. I looked at my code again and realized that my output layer only had one node. However, it needed to output the probabilities for two different categories ('yes' or 'no' for whether it is a tumour or not). Once I changed it to 2 nodes, the network began working properly and reached 95% accuracy on both the training and validation sets.
My validation accuracy still fluctuates a little between a few values, but this is most likely because I only have 23 images in the validation set. In order to decrease the fluctuations, however, I also decreased the epoch number to just 10. Everything seems to be great now.
It's likely the cause of the flatlining accuracy is the NaN loss. I'd try to figure out at what point in the computation the loss is becoming NaN (in inference? in the optimiser? in the loss calculation?). This post details some methods for outputting these intermediate values.

Keras CNN overfitting for more than four classes

I'm trying to train a classifier on Google QuickDraw drawings using Keras:
import numpy as np
from tensorflow.keras.layers import Conv2D, Dense, Flatten, MaxPooling2D
from tensorflow.keras.models import Sequential
model = Sequential()
model.add(Conv2D(filters=32, kernel_size=5, data_format="channels_last", activation="relu", input_shape=(28, 28, 1)))
model.add(MaxPooling2D(data_format="channels_last"))
model.add(Conv2D(filters=16, kernel_size=3, data_format="channels_last", activation="relu"))
model.add(MaxPooling2D(data_format="channels_last"))
model.add(Flatten(data_format="channels_last"))
model.add(Dense(units=128, activation="relu"))
model.add(Dense(units=64, activation="relu"))
model.add(Dense(units=4, activation="softmax"))
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
x = np.load("./x.npy")
y = np.load("./y.npy")
model.fit(x=x, y=y, batch_size=100, epochs=40, validation_split=0.2)
The input data is a 4d array with 12000 normalized images (28 x 28 x 1) per class. The output data is an array of one hot encoded vectors.
If I train this model on four classes, it produces convincing results:
(red is training data, blue is validation data)
I know the model is slightly overfitted. However, I want to keep the architecture as simple as possible, so I accepted that.
My problem is that as soon as I add just one arbitrary class, the model starts to overfit extremely:
I tried many different things to prevent it from overfitting such as Batch Normalization, Dropout, Kernel Regularizers, much more training data and different batch sizes, none of which caused any significant improvement.
What could be the reason why my CNN overfits so much?
EDIT: This is the code I used to create x.npy and y.npy:
import numpy as np
from tensorflow.keras.utils import to_categorical
files = ['cat.npy', 'dog.npy', 'apple.npy', 'banana.npy', 'flower.npy']
SAMPLES = 12000
x = np.concatenate([np.load(f'./data/{f}')[:SAMPLES] for f in files]) / 255.0
y = np.concatenate([np.full(SAMPLES, i) for i in range(len(files))])
# (samples, rows, cols, channels)
x = x.reshape(x.shape[0], 28, 28, 1).astype('float32')
y = to_categorical(y)
np.save('./x.npy', x)
np.save('./y.npy', y)
The .npy files come from here.
The problem lies with how the data split is done. Notice that there are 5 classes and you do 0.2 validation split. By default there's no shuffling and in your code you feed the data in a sequential order. What that means:
Training data consists entirely of 4 classes: 'cat.npy', 'dog.npy', 'apple.npy', 'banana.npy'. That's the 0.8 training split.
Test data is 'flower.npy'. That's your 0.2 validation split. The model was never trained on this so it gets terrible accuracy.
Such results are only possible thanks to the fact that the validation_split=0.2, so you get close to perfect class separation.
Solution
x = np.load("./x.npy")
y = np.load("./y.npy")
# Shuffle the data!
p = np.random.permutation(len(x))
x = x[p]
y = y[p]
model.fit(x=x, y=y, batch_size=100, epochs=40, validation_split=0.2)
if my hypothesis is correct, setting the validation_split to e.g. 0.5 should also get you much better results (though it's not a solution).