TensorFlow Keras model for solving a simple equation

I'm trying to understand how to create a simple TensorFlow 2.2 Keras model that can predict the value of a simple function:
f(a, b, c, d) = a < b ? max(a/2, c/3) : max(b/2, d/3)
I know this particular problem could be reduced to categorical classification, but my intention is to find a good way to build a model that estimates the value, and later build on it with more functions that have increasingly complex conditions.
For a start, I'm stuck on understanding why such a simple function is so hard to learn.
To train the model, I use the following code:
import numpy as np

def generate_input(multiplier):
    return np.random.rand(1024 * multiplier, 4) * 1000

def generate_output(input):
    def compute_func(row):
        return max(row[0]/2, row[2]/3) if row[0] < row[1] else max(row[1]/2, row[3]/3)
    return np.apply_along_axis(compute_func, 1, input)
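With generate_input(1000), each iteration produces about one million rows, and apply_along_axis evaluates compute_func one row at a time in Python. A vectorized equivalent using np.where computes the same targets for the whole array at once. This is a minimal sketch; the name generate_output_vectorized is hypothetical:

```python
import numpy as np

def generate_input(multiplier):
    # Uniform random values in [0, 1000), four columns: a, b, c, d
    return np.random.rand(1024 * multiplier, 4) * 1000

def generate_output_vectorized(inputs):
    # Hypothetical vectorized version of generate_output
    a, b, c, d = inputs[:, 0], inputs[:, 1], inputs[:, 2], inputs[:, 3]
    # f = max(a/2, c/3) if a < b else max(b/2, d/3), evaluated element-wise
    return np.where(a < b, np.maximum(a / 2, c / 3), np.maximum(b / 2, d / 3))

if __name__ == "__main__":
    x = generate_input(1)
    y = generate_output_vectorized(x)
    print(x.shape, y.shape)  # (1024, 4) (1024,)
```

Both branches are computed for every row and np.where keeps the right one, which costs a little extra arithmetic but avoids the per-row Python call.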
for epochs in range(0, 512):
    # print('Generating data...')
    train_input = generate_input(1000)
    train_output = generate_output(train_input)
    # print('Training...')
    fit_history = model.fit(
        train_input, train_output,
        # (remaining arguments not shown in the original post)
    )
I have tried different models, both simpler and more complex, but I still don't get good convergence.
For example, a simple linear stack of Dense layers:
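from tensorflow.keras.activations import tanh
from tensorflow.keras.layers import Dense, Input
from tensorflow.keras.losses import mean_squared_error
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam

inputs = Input(shape=(4,))
layer = Dense(8, activation=tanh)(inputs)
layer = Dense(16, activation=tanh)(layer)
layer = Dense(32, activation=tanh)(layer)
layer = Dense(64, activation=tanh)(layer)
layer = Dense(128, activation=tanh)(layer)
layer = Dense(32, activation=tanh)(layer)
layer = Dense(8, activation=tanh)(layer)
output = Dense(1)(layer)
model = Model(inputs=inputs, outputs=output)
model.compile(optimizer=Adam(), loss=mean_squared_error)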
Can you point me in the direction one should follow to solve this type of conditional function?
Or am I missing some pre-processing?

In my opinion, you have a pretty deep model, so you don't have enough data to train it. I don't think you need such a deep architecture.
I would not define the problem the way you did. You don't actually want the network to generate the max value at the output; you want it to select which value is the max, right? If so, I would design this as a multiclass classification problem instead of a regression problem. That is, I would use output = Dense(4, activation='softmax')(layer) as the last layer and categorical cross-entropy as the loss. Of course, when generating the outputs, you then need to return an array of three zeros and a single 1 (a one-hot vector), something like this:
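def compute_func(row):
    # One-hot target; index 0 -> a/2, 1 -> b/2, 2 -> c/3, 3 -> d/3
    ret_value = [0, 0, 0, 0]
    if row[0] < row[1]:
        ret_value[0 if row[0] / 2 >= row[2] / 3 else 2] = 1
    else:
        ret_value[1 if row[1] / 2 >= row[3] / 3 else 3] = 1
    return ret_value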
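Once a model ending in Dense(4, activation='softmax') has been trained on one-hot targets like these, its output is a probability per candidate, and the numeric estimate of f can be recovered by picking the most probable candidate. A minimal sketch, assuming the class order a/2, b/2, c/3, d/3; the names DIVISORS, candidates and decode are hypothetical:

```python
import numpy as np

# Hypothetical class order: 0 -> a/2, 1 -> b/2, 2 -> c/3, 3 -> d/3
DIVISORS = np.array([2.0, 2.0, 3.0, 3.0])

def candidates(inputs):
    # Shape (n, 4): the four values the function can select between
    return np.asarray(inputs, dtype=float) / DIVISORS

def decode(inputs, class_probs):
    # Pick the most probable class per row and return that candidate's value
    cls = np.argmax(class_probs, axis=1)
    return candidates(inputs)[np.arange(len(cls)), cls]
```

With a trained model you would call something like decode(x_test, model.predict(x_test)). If the classifier picks the right class, the decoded value is exact, so the error is entirely a classification error rather than a regression error.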


Concatenate models with weighting, or leave them separate for manual weighting?

I am training two models, one for each column of data in my dataset.
One model seems fairly accurate in its results, so I want to give it more weight in determining the final outputs.
I don't know whether I should concatenate these two models and somehow provide the weights using something like a Rescaling layer in Keras, or leave them separate and do my own processing afterwards.
What are the pros and cons of each?
from tensorflow import keras

# DENSE_LAYER_SIZE is a constant defined elsewhere in the original code
def get_model(n_inputs_1, n_inputs_2, n_outputs):
    inp1 = keras.layers.Input(shape=(n_inputs_1,))
    de1 = keras.layers.Dense(DENSE_LAYER_SIZE, activation='relu')(inp1)
    dr1 = keras.layers.Dropout(.2)(de1)
    inp2 = keras.layers.Input(shape=(n_inputs_2,))
    de2 = keras.layers.Dense(DENSE_LAYER_SIZE, activation='relu')(inp2)
    dr2 = keras.layers.Dropout(.2)(de2)
    rs2 = keras.layers.Rescaling(0.01)(dr2)  # reduce impact of input 2 - is this ok?
    conc = keras.layers.Concatenate()([dr1, rs2])
    out = keras.layers.Dense(n_outputs, activation='sigmoid')(conc)
    model = keras.models.Model([inp1, inp2], out)
    opt = keras.optimizers.Adam(learning_rate=0.01)
    model.compile(loss='binary_crossentropy', optimizer=opt, metrics=['categorical_accuracy'])
    return model
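For the "leave them separate" option, the post-processing can be as simple as a weighted average of the two models' outputs, with the weight chosen on held-out data. A minimal sketch, assuming both models output arrays of the same shape and that integer class labels are compared with the argmax of the combined output (as categorical_accuracy also does); the helper names and the weight grid are hypothetical:

```python
import numpy as np

def weighted_ensemble(pred_1, pred_2, w):
    # Convex combination: w in [0, 1] is the weight given to model 1
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must be in [0, 1]")
    return w * np.asarray(pred_1) + (1.0 - w) * np.asarray(pred_2)

def pick_weight(pred_1, pred_2, y_val, grid=np.linspace(0, 1, 11)):
    # Try each weight and keep the one with the best validation accuracy
    scores = [
        np.mean(np.argmax(weighted_ensemble(pred_1, pred_2, w), axis=1) == y_val)
        for w in grid
    ]
    return float(grid[int(np.argmax(scores))])
```

Keeping the models separate makes the weight an explicit, tunable number; a Rescaling layer inside the concatenated model instead scales activations that the following Dense layer can re-weight during training, so its effect on the final output is less direct.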

Discrepancy between results reported by TensorFlow model.evaluate and model.predict

I've been going back and forth on this for ages without finding a solution anywhere. I have a HuggingFace model ('bert-base-cased') that I'm using with TensorFlow and a custom dataset. I've (1) tokenized my data; (2) split the data; (3) converted the data to TF dataset format; (4) instantiated, compiled and fit the model.
During training, it behaves as you'd expect: training and validation accuracy go up. But when I evaluate the model on the test dataset using TF's model.evaluate and model.predict, the results are very different. The accuracy as reported by model.evaluate is higher (and more or less in line with the validation accuracy); the accuracy as reported by model.predict is about 10% lower. (Maybe it's just a coincidence, but it's similar to the reported training accuracy after the single epoch of fine-tuning.)
Can anyone figure out what's causing this? I include snippets of my code below.
from transformers import AutoTokenizer, DefaultDataCollator, TFAutoModelForSequenceClassification

# tokenize the dataset
tokenizer = AutoTokenizer.from_pretrained(pretrained_model_name_or_path="bert-base-cased", use_fast=False)

def tokenize_function(examples):
    return tokenizer(examples['text'], padding="max_length", truncation=True)

tokenized_datasets = dataset.map(tokenize_function, batched=True)
# splitting dataset
trainSize = 0.7
valTestSize = 1 - trainSize
train_testvalid = tokenized_datasets.train_test_split(test_size=valTestSize,stratify_by_column='class')
valid_test = train_testvalid['test'].train_test_split(test_size=0.5,stratify_by_column='class')
# renaming each of the datasets for convenience
train_set = train_testvalid['train']
val_set = valid_test['train']
test_set = valid_test['test']
# converting the tokenized datasets to TensorFlow datasets
data_collator = DefaultDataCollator(return_tensors="tf")
tf_train_dataset = train_set.to_tf_dataset(
    columns=["attention_mask", "input_ids", "token_type_ids"],
    # (remaining arguments not shown in the original post)
)
tf_validation_dataset = val_set.to_tf_dataset(
    columns=["attention_mask", "input_ids", "token_type_ids"],
    # (remaining arguments not shown in the original post)
)
tf_test_dataset = test_set.to_tf_dataset(
    columns=["attention_mask", "input_ids", "token_type_ids"],
    # (remaining arguments not shown in the original post)
)
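# loading tensorflow model
model = TFAutoModelForSequenceClassification.from_pretrained("bert-base-cased", num_labels=1)
# compiling the model
# (model.compile(...) call not shown in the original post)
# fitting model
history = model.fit(tf_train_dataset,
                    # (remaining arguments not shown in the original post)
                    )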
# Evaluating the model on the test data using `evaluate`
results = model.evaluate(x=tf_test_dataset,verbose=2) # reports binary_accuracy: 0.9152
# first attempt at using model.predict method
hits = 0
misses = 0
for x, y in tf_test_dataset:
    logits = tf.keras.backend.get_value(model(x, training=False).logits)
    labels = tf.keras.backend.get_value(y)
    for i in range(len(logits)):
        # a negative logit means class 0, otherwise class 1
        if logits[i][0] < 0:
            z = 0
        else:
            z = 1
        if z == labels[i]:
            hits += 1
        else:
            misses += 1
print(hits/(hits+misses))  # reports binary_accuracy: 0.8187
# second attempt at using model.predict method
modelPredictions = model.predict(tf_test_dataset).logits
testDataLabels = np.concatenate([y for x, y in tf_test_dataset], axis=0)
hits = 0
misses = 0
for i in range(len(modelPredictions)):
    if modelPredictions[i][0] >= 0:
        z = 1
    else:
        z = 0
    if z == testDataLabels[i]:
        hits += 1
    else:
        misses += 1
print(hits/(hits+misses))  # reports binary_accuracy: 0.8187
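Both loops compute the fraction of examples where the sign of the single logit matches the label. A vectorized NumPy sketch of the same computation (the function name is hypothetical; it assumes logits of shape (n, 1) and 0/1 labels). One thing worth checking, offered as a possibility rather than a diagnosis: if the compiled metric was Keras's binary accuracy with its default threshold of 0.5, it thresholds y_pred at 0.5, which is the right cut-off for probabilities but not for raw logits, where the equivalent cut-off is 0.

```python
import numpy as np

def binary_accuracy_from_logits(logits, labels):
    # A logit >= 0 corresponds to a sigmoid probability >= 0.5, i.e. class 1
    preds = (np.asarray(logits)[:, 0] >= 0).astype(int)
    return float(np.mean(preds == np.asarray(labels).ravel()))
```

Usage: binary_accuracy_from_logits(modelPredictions, testDataLabels).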
Things I've tried include:
different loss functions (it's a binary classification problem with the label column of the dataset filled with either a zero or a one for each row);
different ways of unpacking the test dataset and feeding it to model.predict;
altering the 'num_labels' parameter between 1 and 2.
I fixed the problem by changing the num_labels parameter to two and the loss function to sparse categorical cross entropy. (I then had to change my model.predict loop by taking the argmax of the two logits produced by the model.)
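With num_labels=2 the model emits one logit per class, and the thresholding loop reduces to an argmax over the two logits. A minimal vectorized sketch (hypothetical helper name; it assumes model.predict(tf_test_dataset).logits has shape (n, 2) and labels are 0/1), usable as accuracy_from_two_logits(modelPredictions, testDataLabels):

```python
import numpy as np

def accuracy_from_two_logits(logits, labels):
    # One logit per class; the larger logit gives the predicted class
    preds = np.argmax(np.asarray(logits), axis=1)
    return float(np.mean(preds == np.asarray(labels).ravel()))
```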

Evaluating (model.evaluate) with a triplet loss Siamese neural network model

I have trained a Siamese neural network that uses triplet loss. It was a pain, but I think I managed to do it. However, I am struggling to understand how to evaluate this model.
The SNN:
from tensorflow.keras import backend as K
from tensorflow.keras.layers import Input, Lambda
from tensorflow.keras.models import Model

# create_base_network, max_len, MAX_LEN, EMBEDDING_DIM, embed_matrix, adam_optim,
# accuracy and the training arrays are defined elsewhere in the original code

def triplet_loss(y_true, y_pred):
    margin = K.constant(1)
    return K.mean(K.maximum(K.constant(0), K.square(y_pred[:,0]) - 0.5*(K.square(y_pred[:,1])+K.square(y_pred[:,2])) + margin))

def euclidean_distance(vects):
    x, y = vects
    return K.sqrt(K.maximum(K.sum(K.square(x - y), axis=1, keepdims=True), K.epsilon()))

anchor_input = Input((max_len, ), name='anchor_input')
positive_input = Input((max_len, ), name='positive_input')
negative_input = Input((max_len, ), name='negative_input')
Shared_DNN = create_base_network(embedding_dim=EMBEDDING_DIM, max_len=MAX_LEN, embed_matrix=embed_matrix)
encoded_anchor = Shared_DNN(anchor_input)
encoded_positive = Shared_DNN(positive_input)
encoded_negative = Shared_DNN(negative_input)
positive_dist = Lambda(euclidean_distance, name='pos_dist')([encoded_anchor, encoded_positive])
negative_dist = Lambda(euclidean_distance, name='neg_dist')([encoded_anchor, encoded_negative])
tertiary_dist = Lambda(euclidean_distance, name='ter_dist')([encoded_positive, encoded_negative])
stacked_dists = Lambda(lambda vects: K.stack(vects, axis=1), name='stacked_dists')([positive_dist, negative_dist, tertiary_dist])
model = Model([anchor_input, positive_input, negative_input], stacked_dists, name='triple_siamese')
model.compile(loss=triplet_loss, optimizer=adam_optim, metrics=[accuracy])
history = model.fit([Anchor, Positive, Negative], y=Y_dummy, validation_data=([Anchor_test, Positive_test, Negative_test], Y_dummy2), batch_size=128, epochs=25)
I understand that once a model is trained with triplets, the evaluation shouldn't actually require that triplets be used. However, how do I finagle this reshaping?
Because this is a SNN, I would want to feed two inputs into model.evaluate, along with a categorical variable denoting if the two inputs are similar or not (1 = similar, 0 = not similar).
So basically, I want model.evaluate(input1, input2, y_label). But I am not sure how to get this with the model that I trained. As shown above, I trained with three inputs: model.fit([Anchor,Positive,Negative],y=Y_dummy ... ) .
I know I should save the weights of my trained model, but I just don't know what model to load the weights onto.
Your help is greatly appreciated!
I am aware of the approach below for prediction, but I am not looking for prediction; I want to use model.evaluate to get a final measure of loss/accuracy for the model. Also, this approach only feeds the anchor into the model, whereas I'm interested in text similarity, so I would want to feed in two inputs:
eval_model = Model(inputs=anchor_input, outputs=encoded_anchor)
Since eval_model is trained to produce embeddings, I think a good way to evaluate it is to measure the similarity between two embeddings with cosine similarity.
According to the TF documentation, tf.keras.losses.cosine_similarity returns a number between -1 and 1 that is the negative of the usual cosine similarity: values closer to -1 indicate greater similarity, values closer to 1 indicate greater dissimilarity, and 0 indicates orthogonality.
We can simply compute this value for the two inputs of every available sample pair (X1 and X2 below). When the value is < 0, we say that the two inputs are similar (1 = similar, 0 = not similar). Finally, we can compute the binary accuracy as the final metric.
We can do all the calculations in TF, without needing model.evaluate.
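eval_model = Model(inputs=anchor_input, outputs=encoded_anchor)
cos_sim = tf.keras.losses.cosine_similarity(
    eval_model(X1), eval_model(X2)
)
# -cos_sim is the plain cosine similarity; with threshold=0, pairs whose similarity is above 0 count as similar
accuracy = tf.reduce_mean(tf.keras.metrics.binary_accuracy(Y, -cos_sim, threshold=0))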
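The same metric can be sketched in plain NumPy on embeddings already produced by eval_model (for example emb1 = eval_model.predict(X1)). Here the cosine similarity is computed directly rather than negated, so a pair counts as similar when its similarity is above the threshold; the threshold of 0 follows the answer and is an assumption you may want to tune on validation data. The helper names are hypothetical:

```python
import numpy as np

def cosine_similarity_rows(e1, e2, eps=1e-12):
    # Row-wise cosine similarity in [-1, 1]; 1 means same direction
    e1 = np.asarray(e1, dtype=float)
    e2 = np.asarray(e2, dtype=float)
    num = np.sum(e1 * e2, axis=1)
    den = np.linalg.norm(e1, axis=1) * np.linalg.norm(e2, axis=1)
    return num / np.maximum(den, eps)

def pair_accuracy(e1, e2, y, threshold=0.0):
    # Predict 1 (similar) when the similarity exceeds the threshold
    preds = (cosine_similarity_rows(e1, e2) > threshold).astype(int)
    return float(np.mean(preds == np.asarray(y).ravel()))
```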
Another approach is to compute the cosine similarity between the anchor and positive inputs and compare it with the similarity between the anchor and negative inputs.
eval_model = Model(inputs=anchor_input, outputs=encoded_anchor)
positive_similarity = tf.keras.losses.cosine_similarity(
    eval_model(X_anchor), eval_model(X_positive)
)
negative_similarity = tf.keras.losses.cosine_similarity(
    eval_model(X_anchor), eval_model(X_negative)
)
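We should expect the anchor to be more similar to the positive input than to the negative input. Because tf.keras.losses.cosine_similarity returns the negated similarity, this means positive_similarity should be lower (closer to -1) than negative_similarity.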
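That expectation can be turned into a single number: the fraction of triplets for which the anchor is more similar to the positive than to the negative. A NumPy sketch, assuming the three embedding arrays come from eval_model.predict and following the answer's choice of plain (non-negated) cosine similarity; since the network was trained on Euclidean distances, comparing distances instead would be an equally reasonable variant. The helper names are hypothetical:

```python
import numpy as np

def _cos(e1, e2, eps=1e-12):
    # Row-wise cosine similarity
    num = np.sum(e1 * e2, axis=1)
    den = np.linalg.norm(e1, axis=1) * np.linalg.norm(e2, axis=1)
    return num / np.maximum(den, eps)

def triplet_accuracy(anchor, positive, negative):
    anchor, positive, negative = (np.asarray(a, dtype=float) for a in (anchor, positive, negative))
    # A triplet is counted as correct when sim(anchor, positive) > sim(anchor, negative)
    return float(np.mean(_cos(anchor, positive) > _cos(anchor, negative)))
```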

Applying Keras Model to give an output value for each row

I am learning Keras and would like to understand how to apply a classifier (Sequential) to all rows in my dataset, not just the x% held out for test validation.
My confusion is that when I define my data split, I get one portion for training and one for testing. How would I apply the model to the full dataset to get the predicted value for each row? My end goal is to produce the predicted value for every customer in the dataset and concatenate it to the data.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

dataset = pd.read_csv('BankCustomers.csv')
X = dataset.iloc[:, 3:13]
y = dataset.iloc[:, 13]
feature_train, feature_test, label_train, label_test = train_test_split(X, y, test_size=0.2, random_state=0)
sc = StandardScaler()
feature_train = sc.fit_transform(feature_train)
feature_test = sc.transform(feature_test)
For completeness the classifier looks like below.
from keras.layers import Dense
from keras.models import Sequential

# Initialising the ANN
classifier = Sequential()
# Adding the input layer and the first hidden layer
classifier.add(Dense(activation="relu", input_dim=11, units=6, kernel_initializer="uniform"))
# Adding the second hidden layer
classifier.add(Dense(activation="relu", units=6, kernel_initializer="uniform"))
# Adding the output layer
classifier.add(Dense(activation="sigmoid", units=1, kernel_initializer="uniform"))
# Compiling the ANN
classifier.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
# Fitting the ANN to the Training set (nb_epoch was renamed to epochs in Keras 2)
classifier.fit(feature_train, label_train, batch_size=10, epochs=100)
The course I am doing suggests ways to get accuracy and predictions for the test set, like below, but not for the full dataset.
from sklearn.metrics import confusion_matrix

# Predicting the Test set results
label_pred = classifier.predict(feature_test)
label_pred = (label_pred > 0.5)  # FALSE/TRUE depending on above or below 50%
cm = confusion_matrix(label_test, label_pred)
I tried concatenating the model's predictions on both the training and test data, but I was then unsure how to determine which index matched up with the original dataset (i.e. I don't know which rows of the original set make up the 20% test data).
I apologise in advance if this question is superfluous; I have been looking for answers on Stack Overflow and in the course, but so far no luck.
You can use pandas indexes to sort your results back into the original order.
Predict on both feature_train and feature_test (not sure why you'd want to predict on feature_train, though).
Because StandardScaler returns NumPy arrays, feature_train and feature_test no longer carry the original index, but label_train and label_test (returned by train_test_split from the y Series) still do. Wrap each set of predictions in a Series with that index; predict returns an (n, 1) array, so flatten it first:
train_predictions = pd.Series(classifier.predict(feature_train).ravel(), index=label_train.index)
test_predictions = pd.Series(classifier.predict(feature_test).ravel(), index=label_test.index)
If you look at these indexes, you can see they're shuffled (because of train_test_split).
You can now concatenate them and use sort_index; the result holds the predictions in the order in which the rows appeared in your initial dataframe (X), and it can be attached to the original data because the indexes line up:
predictions = pd.concat([train_predictions, test_predictions], axis=0).sort_index()
dataset["prediction"] = predictions
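The index-alignment idea can be checked in isolation. The sketch below uses a toy DataFrame, shuffles it with DataFrame.sample as a stand-in for train_test_split, and uses a hypothetical fake_predict function in place of classifier.predict; the only point is that concatenating per-split Series that keep their original index, then calling sort_index, restores the original row order:

```python
import numpy as np
import pandas as pd

# Toy frame standing in for X; its index records the original row order
X = pd.DataFrame({"f1": np.arange(10, dtype=float), "f2": np.arange(10, 20, dtype=float)})

# Shuffle, then split 80/20, mimicking what train_test_split does to the index
shuffled = X.sample(frac=1.0, random_state=0)
X_train, X_test = shuffled.iloc[:8], shuffled.iloc[8:]

def fake_predict(features):
    # Hypothetical stand-in for classifier.predict: returns an (n, 1) array
    return features[:, [0]] / 10.0

# Wrap each split's predictions in a Series that keeps the split's original index
pred_train = pd.Series(fake_predict(X_train.to_numpy()).ravel(), index=X_train.index)
pred_test = pd.Series(fake_predict(X_test.to_numpy()).ravel(), index=X_test.index)

# Concatenate and restore the original order, then attach to the original rows
predictions = pd.concat([pred_train, pred_test]).sort_index()
result = X.assign(prediction=predictions)
```

Assigning a Series to a DataFrame aligns on the index, so the sort_index call mainly matters when you want the predictions on their own in the original order.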

Non-linear loss combination

My network has 2 outputs. I'm trying to have a loss on two terms that is not a linear sum of two losses:
def weightedBCE(y_true, y_pred):
    assert y_pred.shape[2] == 2
    y_pred_val = y_pred[:,:,0]
    stds = y_pred[:,:,1]
    bce = K.binary_crossentropy(y_true, y_pred_val)
    loss = bce * (1. + LAM*stds)
    return loss
The final layers of my model are defined like this (outSall has 3 values):
std = make_std_model()(outSall)
DSAfinal = Dense(1, activation="sigmoid")(outSall)
output = concatenate([DSAfinal, std], axis=-1)
But it doesn't work because Keras expects one loss function per output, while my loss uses both outputs of the network together.
The first output is a standard classification output with binary cross-entropy loss, but I want that loss multiplied by (1 + LAM * stds), where LAM is a weighting factor and stds is the second output of the network.
How can I do this?
assert y_pred.shape[2] == 2
IndexError: list index out of range
I had an extra index; it's now fixed (see below). But now I get the error pasted below.
def weightedBCE(y_true, y_pred):
    assert y_pred.shape[1] == 2
    y_pred_val = y_pred[:,0]
    stds = y_pred[:,1]
    bce = K.binary_crossentropy(y_true, y_pred_val)
    loss = bce * (1. + LAM*stds)
    return loss
ValueError: logits and labels must have the same shape ((?,) vs (?, ?)
Keras assumes y_true has the same shape as y_pred, which was the problem. I changed the loss to:
def weightedBCE(y_true, y_pred):
    assert y_pred.shape[1] == 2
    y_pred_val = y_pred[:,0]
    stds = y_pred[:,1]
    # y_true has the same shape as y_pred, so take only its first column
    bce = K.binary_crossentropy(y_true[:,0], y_pred_val)
    loss = bce * (1. + LAM*stds)
    return loss
There is still some problem with handling two outputs; see the question "Binary Cross Entropy not giving similar results when I have 2 outputs".
Instead of creating a Keras model with two outputs, create a Keras model with a single output which is a concatenation of the two tensors (you can use keras.layers.Concatenate for that). Then you can compile the model with a single custom loss function, as the one you wrote above.
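To see what the single-output loss computes, here is a NumPy sketch of weightedBCE for a concatenated output of shape (batch, 2), with column 0 being the sigmoid probability and column 1 the std. LAM = 0.5 is a hypothetical value, and the clipping constant approximates (but is not guaranteed to equal) what K.binary_crossentropy does to avoid log(0):

```python
import numpy as np

LAM = 0.5  # hypothetical weighting factor

def weighted_bce_np(y_true, y_pred, lam=LAM, eps=1e-7):
    # y_true: (batch, 1) labels; y_pred: (batch, 2) = [probability, std]
    y = np.asarray(y_true, dtype=float)[:, 0]
    p = np.clip(np.asarray(y_pred, dtype=float)[:, 0], eps, 1 - eps)
    stds = np.asarray(y_pred, dtype=float)[:, 1]
    bce = -(y * np.log(p) + (1 - y) * np.log(1 - p))
    # Per-sample BCE scaled by (1 + lam * std), as in the Keras version
    return bce * (1.0 + lam * stds)
```

Because the scaling factor depends on the network's own std output, minimizing this loss also puts pressure on that output to shrink; whether that is desirable depends on what stds is meant to represent.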