I have created a custom loss function to deal with binary class imbalance, but my loss function does not improve per epoch. For metrics, I'm using precision and recall.
Is this a design issue where I'm not picking good hyper-parameters?
weights = [np.array([.10,.90]), np.array([.5,.5]), np.array([.1,.99]), np.array([.25,.75]), np.array([.35,.65])]
for weight in weights:
print('Model with weights {a}'.format(a=weight))
model = keras.models.Sequential([
keras.layers.Flatten(), #input_shape=[X_train.shape[1]]
keras.layers.Dense(32, activation='relu'),
keras.layers.Dense(32, activation='relu'),
keras.layers.Dense(1, activation='sigmoid')])
model.compile(loss=weighted_loss(weight),metrics=[tf.keras.metrics.Precision(), tf.keras.metrics.Recall()])
n_epochs = 10
history = model.fit(X_train.astype('float32'), y_train.values.astype('float32'), epochs=n_epochs, validation_data=(X_test.astype('float32'), y_test.values.astype('float32')), batch_size=64)
model.evaluate(X_test.astype('float32'), y_test.astype('float32'))
pd.DataFrame(history.history).plot(figsize=(8, 5))
plt.grid(True); plt.gca().set_ylim(0, 1); plt.show()
Custom loss function to deal with class imbalance issue:
def weighted_loss(weights):
weights = K.variable(weights)
def loss(y_true, y_pred):
y_pred /= K.sum(y_pred, axis=-1, keepdims=True)
y_pred = K.clip(y_pred, K.epsilon(), 1 - K.epsilon())
loss = y_true * K.log(y_pred) * weights
loss = -K.sum(loss, -1)
return loss
return loss
Output:
Model with weights [0.1 0.9]
Epoch 1/10
274/274 [==============================] - 1s 2ms/step - loss: 1.1921e-08 - precision_24: 0.1092 - recall_24: 0.4119 - val_loss: 1.4074e-08 - val_precision_24: 0.1247 - val_recall_24: 0.3953
Epoch 2/10
274/274 [==============================] - 0s 1ms/step - loss: 1.1921e-08 - precision_24: 0.1092 - recall_24: 0.4119 - val_loss: 1.4074e-08 - val_precision_24: 0.1247 - val_recall_24: 0.3953
Epoch 3/10
274/274 [==============================] - 0s 1ms/step - loss: 1.1921e-08 - precision_24: 0.1092 - recall_24: 0.4119 - val_loss: 1.4074e-08 - val_precision_24: 0.1247 - val_recall_24: 0.3953
Epoch 4/10
274/274 [==============================] - 0s 969us/step - loss: 1.1921e-08 - precision_24: 0.1092 - recall_24: 0.4119 - val_loss: 1.4074e-08 - val_precision_24: 0.1247 - val_recall_24: 0.3953
[...]
Image of the input data set and the true y variable class designation:
Input Dataset a (17480 X 20) matrix:
y is the output array (2 classes) with dimensions (17480 x 1) and total number of 1's is: 1748 (the class that I want to predict)
Since there is no MWE present it's rather difficult to be sure. In order to be as educative as possible I'll lay out some observations and remarks.
The first observation is that your custom loss function has really small values i.e. ~10e-8 throughout training. This seems to tell your model that performance is already really good while in fact, when looking at the metrics you chose, it isn't. This indicates that the problem resides near the output or has something to do with the loss function. My recommendation here is since you have a classification problem to have a look at this post regarding weighted cross-entropy [1].
Second observation is that it seems you don't have a benchmark for performance of your model. In general, ML workflow goes from very simple to complex models. I would recommend trying a simple Logistic Regression [2] to get an idea for minimal performance. After this I would try some more complex models such as tree booster (XGBoost/LightGBM/...) or a random forest. Especially considering you are using a full-blown neural network for tabular data with only about 20 numerical features that tends to still be in the traditional machine learning territory.
Once you have obtained a baseline and perhaps improved performance using a standard machine learning technique, you can look towards a neural network again. Some other recommendations depending on the results of the traditional approaches are:
Try several and optimizers and cross-validate them over different learning rates.
Try, as mentioned by #TyQuangTu, some simpler and shallower architectures.
Try an activation function that does not have the "dying neuron" problems such as LeakyRelu or ELU.
Hopefully this answer can help you and if you have any more questions I am glad to help.
[1] Unbalanced data and weighted cross entropy
[2] https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html
Related
How is Accuracy defined when the loss function is mean square error? Is it mean absolute percentage error?
The model I use has output activation linear and is compiled with loss= mean_squared_error
model.add(Dense(1))
model.add(Activation('linear')) # number
model.compile(loss='mean_squared_error', optimizer='adam', metrics=['accuracy'])
and the output looks like this:
Epoch 99/100
1000/1000 [==============================] - 687s 687ms/step - loss: 0.0463 - acc: 0.9689 - val_loss: 3.7303 - val_acc: 0.3250
Epoch 100/100
1000/1000 [==============================] - 688s 688ms/step - loss: 0.0424 - acc: 0.9740 - val_loss: 3.4221 - val_acc: 0.3701
So what does e.g. val_acc: 0.3250 mean? Mean_squared_error should be a scalar not a percentage - shouldnt it? So is val_acc - mean squared error, or mean percentage error or another function?
From definition of MSE on wikipedia:https://en.wikipedia.org/wiki/Mean_squared_error
The MSE is a measure of the quality of an estimator—it is always
non-negative, and values closer to zero are better.
Does that mean a value of val_acc: 0.0 is better than val_acc: 0.325?
edit: more examples of the output of accuracy metric when I train - where the accuracy is increase as I train more. While the loss function - mse should decrease. Is Accuracy well defined for mse - and how is it defined in Keras?
lAllocator: After 14014 get requests, put_count=14032 evicted_count=1000 eviction_rate=0.0712657 and unsatisfied allocation rate=0.071714
1000/1000 [==============================] - 453s 453ms/step - loss: 17.4875 - acc: 0.1443 - val_loss: 98.0973 - val_acc: 0.0333
Epoch 2/100
1000/1000 [==============================] - 443s 443ms/step - loss: 6.6793 - acc: 0.1973 - val_loss: 11.9101 - val_acc: 0.1500
Epoch 3/100
1000/1000 [==============================] - 444s 444ms/step - loss: 6.3867 - acc: 0.1980 - val_loss: 6.8647 - val_acc: 0.1667
Epoch 4/100
1000/1000 [==============================] - 445s 445ms/step - loss: 5.4062 - acc: 0.2255 - val_loss: 5.6029 - val_acc: 0.1600
Epoch 5/100
783/1000 [======================>.......] - ETA: 1:36 - loss: 5.0148 - acc: 0.2306
There are at least two separate issues with your question.
The first one should be clear by now from the comments by Dr. Snoopy and the other answer: accuracy is meaningless in a regression problem, such as yours; see also the comment by patyork in this Keras thread. For good or bad, the fact is that Keras will not "protect" you or any other user from putting not-meaningful requests in your code, i.e. you will not get any error, or even a warning, that you are attempting something that does not make sense, such as requesting the accuracy in a regression setting.
Having clarified that, the other issue is:
Since Keras does indeed return an "accuracy", even in a regression setting, what exactly is it and how is it calculated?
To shed some light here, let's revert to a public dataset (since you do not provide any details about your data), namely the Boston house price dataset (saved locally as housing.csv), and run a simple experiment as follows:
import numpy as np
import pandas
import keras
from keras.models import Sequential
from keras.layers import Dense
# load dataset
dataframe = pandas.read_csv("housing.csv", delim_whitespace=True, header=None)
dataset = dataframe.values
# split into input (X) and output (Y) variables
X = dataset[:,0:13]
Y = dataset[:,13]
model = Sequential()
model.add(Dense(13, input_dim=13, kernel_initializer='normal', activation='relu'))
model.add(Dense(1, kernel_initializer='normal'))
# Compile model asking for accuracy, too:
model.compile(loss='mean_squared_error', optimizer='adam', metrics=['accuracy'])
model.fit(X, Y,
batch_size=5,
epochs=100,
verbose=1)
As in your case, the model fitting history (not shown here) shows a decreasing loss, and an accuracy roughly increasing. Let's evaluate now the model performance in the same training set, using the appropriate Keras built-in function:
score = model.evaluate(X, Y, verbose=0)
score
# [16.863721372581754, 0.013833992168483997]
The exact contents of the score array depend on what exactly we have requested during model compilation; in our case here, the first element is the loss (MSE), and the second one is the "accuracy".
At this point, let us have a look at the definition of Keras binary_accuracy in the metrics.py file:
def binary_accuracy(y_true, y_pred):
return K.mean(K.equal(y_true, K.round(y_pred)), axis=-1)
So, after Keras has generated the predictions y_pred, it first rounds them, and then checks to see how many of them are equal to the true labels y_true, before getting the mean.
Let's replicate this operation using plain Python & Numpy code in our case, where the true labels are Y:
y_pred = model.predict(X)
l = len(Y)
acc = sum([np.round(y_pred[i])==Y[i] for i in range(l)])/l
acc
# array([0.01383399])
Well, bingo! This is actually the same value returned by score[1] above...
To make a long story short: since you (erroneously) request metrics=['accuracy'] in your model compilation, Keras will do its best to satisfy you, and will return some "accuracy" indeed, calculated as shown above, despite this being completely meaningless in your setting.
There are quite a few settings where Keras, under the hood, performs rather meaningless operations without giving any hint or warning to the user; two of them I have happened to encounter are:
Giving meaningless results when, in a multi-class setting, one happens to request loss='binary_crossentropy' (instead of categorical_crossentropy) with metrics=['accuracy'] - see my answers in Keras binary_crossentropy vs categorical_crossentropy performance? and Why is binary_crossentropy more accurate than categorical_crossentropy for multiclass classification in Keras?
Disabling completely Dropout, in the extreme case when one requests a dropout rate of 1.0 - see my answer in Dropout behavior in Keras with rate=1 (dropping all input units) not as expected
The loss function (Mean Square Error in this case) is used to indicate how far your predictions deviate from the target values. In the training phase, the weights are updated based on this quantity. If you are dealing with a classification problem, it is quite common to define an additional metric called accuracy. It monitors in how many cases the correct class was predicted. This is expressed as a percentage value. Consequently, a value of 0.0 means no correct decision and 1.0 only correct decisons.
While your network is training, the loss is decreasing and usually the accuracy increases.
Note, that in contrast to loss, the accuracy is usally not used to update the parameters of your network. It helps to monitor the learning progress and the current performane of the network.
#desertnaut has said it very clearly.
Consider the following two pieces of code
compile code
binary_accuracy code
def binary_accuracy(y_true, y_pred):
return K.mean(K.equal(y_true, K.round(y_pred)), axis=-1)
Your labels should be integer,Because keras does not round y_true, and you get high accuracy.......
I've trained/fine-tuned a Spanish RoBERTa model that has recently been pre-trained for a variety of NLP tasks except for text classification.
Since the baseline model seems to be promising, I want to fine-tune it for a different task: text classification, more precisely, sentiment analysis of Spanish Tweets and use it to predict labels on scraped tweets I have.
The preprocessing and the training seem to work correctly. However, I don't know how I can use this mode afterwards for prediction.
I'll leave out the preprocessing part because I don't think there seems to be an issue.
Code:
# Training with native TensorFlow
from transformers import TFAutoModelForSequenceClassification
## Model Definition
model = TFAutoModelForSequenceClassification.from_pretrained("BSC-TeMU/roberta-base-bne", from_pt=True, num_labels=3)
## Model Compilation
optimizer = tf.keras.optimizers.Adam(learning_rate=5e-5)
loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
metric = tf.metrics.SparseCategoricalAccuracy()
model.compile(optimizer=optimizer,
loss=loss,
metrics=metric)
## Fitting the data
history = model.fit(train_dataset.shuffle(1000).batch(64), epochs=3, batch_size=64)
Output:
/usr/local/lib/python3.7/dist-packages/transformers/configuration_utils.py:337: UserWarning: Passing `gradient_checkpointing` to a config initialization is deprecated and will be removed in v5 Transformers. Using `model.gradient_checkpointing_enable()` instead, or if you are using the `Trainer` API, pass `gradient_checkpointing=True` in your `TrainingArguments`.
"Passing `gradient_checkpointing` to a config initialization is deprecated and will be removed in v5 "
Some weights of the PyTorch model were not used when initializing the TF 2.0 model TFRobertaForSequenceClassification: ['roberta.embeddings.position_ids']
- This IS expected if you are initializing TFRobertaForSequenceClassification from a PyTorch model trained on another task or with another architecture (e.g. initializing a TFBertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing TFRobertaForSequenceClassification from a PyTorch model that you expect to be exactly identical (e.g. initializing a TFBertForSequenceClassification model from a BertForSequenceClassification model).
Some weights or buffers of the TF 2.0 model TFRobertaForSequenceClassification were not initialized from the PyTorch model and are newly initialized: ['classifier.dense.weight', 'classifier.dense.bias', 'classifier.out_proj.weight', 'classifier.out_proj.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Epoch 1/5
16/16 [==============================] - 35s 1s/step - loss: 1.0455 - sparse_categorical_accuracy: 0.4452
Epoch 2/5
16/16 [==============================] - 18s 1s/step - loss: 0.6923 - sparse_categorical_accuracy: 0.7206
Epoch 3/5
16/16 [==============================] - 18s 1s/step - loss: 0.3533 - sparse_categorical_accuracy: 0.8885
Epoch 4/5
16/16 [==============================] - 18s 1s/step - loss: 0.1871 - sparse_categorical_accuracy: 0.9477
Epoch 5/5
16/16 [==============================] - 18s 1s/step - loss: 0.1031 - sparse_categorical_accuracy: 0.9714
Question:
How can I use the model after fine-tuning for text classification/sentiment analysis? (I want to create a predicted label for each tweet I scraped.)
What would be a good way of approaching this?
I've tried to save the model, but I don't know where I can find it and use then:
# Save the model
model.save_pretrained('Twitter_Roberta_Model')
I've also tried to just add it to a HuggingFace pipeline like the following. But I'm not sure if this works correctly.
classifier = pipeline('sentiment-analysis',
model=model,
tokenizer=AutoTokenizer.from_pretrained("BSC-TeMU/roberta-base-bne"))
Although this is an example for a specific model (DistilBert), the following prediction code should work similarly (small modifications according to your needs). You just need to replace the distillbert according to your model (TFAutoModelForSequenceClassification) and of course ensure the proper tokenizer is used.
loaded_model = TFDistilBertForSequenceClassification.from_pretrained('distilbert-base-uncased')
loaded_model.load_weights('./distillbert_tf.h5')
input_text = "The text on which I test"
input_text_tokenized = tokenizer.encode(input_text,
truncation=True,
padding=True,
return_tensors="tf")
prediction = loaded_model(input_text_tokenized)
prediction_logits = prediction[0]
prediction_probs = tf.nn.softmax(prediction_logits,axis=1).numpy()
print(f'The prediction probs are: {prediction_probs}')
I've trained/fine-tuned a Spanish RoBERTa model that has recently been pre-trained for a variety of NLP tasks except for text classification.
Since the baseline model seems to be promising, I want to fine-tune it for a different task: text classification, more precisely, sentiment analysis of Spanish Tweets and use it to predict labels on scraped tweets I have.
The preprocessing and the training seem to work correctly. However, I don't know how I can use this mode afterwards for prediction.
I'll leave out the preprocessing part because I don't think there seems to be an issue.
Code:
# Training with native TensorFlow
from transformers import TFAutoModelForSequenceClassification
## Model Definition
model = TFAutoModelForSequenceClassification.from_pretrained("BSC-TeMU/roberta-base-bne", from_pt=True, num_labels=3)
## Model Compilation
optimizer = tf.keras.optimizers.Adam(learning_rate=5e-5)
loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
metric = tf.metrics.SparseCategoricalAccuracy()
model.compile(optimizer=optimizer,
loss=loss,
metrics=metric)
## Fitting the data
history = model.fit(train_dataset.shuffle(1000).batch(64), epochs=3, batch_size=64)
Output:
/usr/local/lib/python3.7/dist-packages/transformers/configuration_utils.py:337: UserWarning: Passing `gradient_checkpointing` to a config initialization is deprecated and will be removed in v5 Transformers. Using `model.gradient_checkpointing_enable()` instead, or if you are using the `Trainer` API, pass `gradient_checkpointing=True` in your `TrainingArguments`.
"Passing `gradient_checkpointing` to a config initialization is deprecated and will be removed in v5 "
Some weights of the PyTorch model were not used when initializing the TF 2.0 model TFRobertaForSequenceClassification: ['roberta.embeddings.position_ids']
- This IS expected if you are initializing TFRobertaForSequenceClassification from a PyTorch model trained on another task or with another architecture (e.g. initializing a TFBertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing TFRobertaForSequenceClassification from a PyTorch model that you expect to be exactly identical (e.g. initializing a TFBertForSequenceClassification model from a BertForSequenceClassification model).
Some weights or buffers of the TF 2.0 model TFRobertaForSequenceClassification were not initialized from the PyTorch model and are newly initialized: ['classifier.dense.weight', 'classifier.dense.bias', 'classifier.out_proj.weight', 'classifier.out_proj.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Epoch 1/5
16/16 [==============================] - 35s 1s/step - loss: 1.0455 - sparse_categorical_accuracy: 0.4452
Epoch 2/5
16/16 [==============================] - 18s 1s/step - loss: 0.6923 - sparse_categorical_accuracy: 0.7206
Epoch 3/5
16/16 [==============================] - 18s 1s/step - loss: 0.3533 - sparse_categorical_accuracy: 0.8885
Epoch 4/5
16/16 [==============================] - 18s 1s/step - loss: 0.1871 - sparse_categorical_accuracy: 0.9477
Epoch 5/5
16/16 [==============================] - 18s 1s/step - loss: 0.1031 - sparse_categorical_accuracy: 0.9714
Question:
How can I use the model after fine-tuning for text classification/sentiment analysis? (I want to create a predicted label for each tweet I scraped.)
What would be a good way of approaching this?
I've tried to save the model, but I don't know where I can find it and use then:
# Save the model
model.save_pretrained('Twitter_Roberta_Model')
I've also tried to just add it to a HuggingFace pipeline like the following. But I'm not sure if this works correctly.
classifier = pipeline('sentiment-analysis',
model=model,
tokenizer=AutoTokenizer.from_pretrained("BSC-TeMU/roberta-base-bne"))
Although this is an example for a specific model (DistilBert), the following prediction code should work similarly (small modifications according to your needs). You just need to replace the distillbert according to your model (TFAutoModelForSequenceClassification) and of course ensure the proper tokenizer is used.
loaded_model = TFDistilBertForSequenceClassification.from_pretrained('distilbert-base-uncased')
loaded_model.load_weights('./distillbert_tf.h5')
input_text = "The text on which I test"
input_text_tokenized = tokenizer.encode(input_text,
truncation=True,
padding=True,
return_tensors="tf")
prediction = loaded_model(input_text_tokenized)
prediction_logits = prediction[0]
prediction_probs = tf.nn.softmax(prediction_logits,axis=1).numpy()
print(f'The prediction probs are: {prediction_probs}')
I am working on an autoencoder in Keras that contains some dropout layers. To evaluate bias and variance, I'd like to compare the losses of training and test data. However, since dropout is used during training, the losses cannot be compared. (See here for an explanation of why the training data results can be worse than test data results.)
In order to get training data losses that are not influenced by the dropout, I wrote a callback to validate some additional data set (in this case, it would be the training data again).
The strange thing is that I ALWAYS get the same results as for the validation data. Here's a minimal example:
from pprint import pprint
import keras
import numpy as np
import pandas as pd
from numpy.random import seed as np_seed
from tensorflow.random import set_seed as tf_seed
np_seed(1)
tf_seed(2)
# Generation of data sets for training and testing. Random data is only used to showcase the problem.
df_train = pd.DataFrame(data=np.random.random((1000, 10))) # This will be used for training
df_test_1 = pd.DataFrame(data=np.random.random((1000, 10))) # This will be used as validation data set directly
df_test_2 = pd.DataFrame(data=np.random.random((1000, 10))) # This will be used within the callback
np_seed(1)
tf_seed(2)
model = keras.models.Sequential(
[
keras.Input(shape=(10, )),
keras.layers.Dropout(rate=0.01),
keras.layers.Dense(5, activation='relu'),
keras.layers.Dropout(rate=0.01),
keras.layers.Dense(10, activation='linear'),
]
)
model.compile(
loss='mean_squared_error',
optimizer=keras.optimizers.Adam(),
)
class CustomDataValidation(keras.callbacks.Callback):
def __init__(self, x=None, y=None):
self.x = x
self.y = y
def on_epoch_end(self, epoch, logs=None):
result = self.model.evaluate(x=self.x, y=self.y, return_dict=True)
for loss_name, loss_value in result.items():
logs["custom_" + loss_name] = loss_value
cdv = CustomDataValidation(df_test_2, df_test_2)
hist = model.fit(df_train, df_train, validation_data=(df_test_1, df_test_1), epochs=2, validation_split=0.1, callbacks=[cdv])
pprint(hist.history)
The output is
Epoch 1/2
4/4 [==============================] - 0s 1ms/step - loss: 0.7625
29/29 [==============================] - 0s 5ms/step - loss: 0.9666 - val_loss: 0.7625
Epoch 2/2
4/4 [==============================] - 0s 1ms/step - loss: 0.5331
29/29 [==============================] - 0s 2ms/step - loss: 0.6638 - val_loss: 0.5331
{'custom_loss': [0.7624925374984741, 0.5331208109855652],
'loss': [0.9665887951850891, 0.6637843251228333],
'val_loss': [0.7624925374984741, 0.5331208109855652]}
'custom_loss' and 'val_loss' are equal although they should be based on totally different data sets.
The question is therefore: How can I evaluate the model performance on custom data within a callback?
Edit: Since I did not yet got an answer on stackoverflow, I created an issue at tensorflow's git repo. Also, there's now a notebook available that shows the problem.
It seems that this is a bug in tensorflow versions 2.3.x (tested with 2.3.0 and 2.3.1). In versions 2.4.0-rc0 and 2.2.1, the loss outputs of loss and custom_loss differ, which is the expected behavior:
{'custom_loss': [0.7694963216781616, 0.541864812374115],
'loss': [0.9665887951850891, 0.6637843251228333],
'val_loss': [0.7624925374984741, 0.5331208109855652]}
I cannot seem to make val_sample_weights work with validation data, using model.fit().
My understanding of val_sample_weights is that it is suppose to do exactly the same as sample_weight such that reported loss and val_loss can be directly compared when training using sample_weight.
From my example code below, we see that loss goes to zero when setting sample_weight to zero, but val_loss is unaffected when setting val_sample_weights to zero.
Am I missing or misunderstanding something here, or does this seem like val_sample_weights has no use in tf.keras? Maybe a bug in tf.keras? I'm on Windows, tensorflow version 1.14.0.
Below is a working example if you want to test it yourself:
import tensorflow as tf
import numpy as np
data_size = 100
input_size=3
seed = 1
# synetic train data
x_train = np.random.rand(data_size ,input_size)
y_train= np.random.rand(data_size,1)
# synetic test data
x_test = np.random.rand(data_size ,input_size)
y_test= np.random.rand(data_size,1)
tf.random.set_random_seed(seed)
# create model
inputs = tf.keras.layers.Input(shape=(input_size))
pred=tf.keras.layers.Dense(1, activation='sigmoid')(inputs)
model = tf.keras.models.Model(inputs=inputs, outputs=pred)
loss = tf.keras.losses.MSE
metrics = tf.keras.losses.MAE
model.compile(loss=loss , metrics=[metrics], optimizer='adam')
# Make model static, so we can compare it between different scenarios
for layer in model.layers:
layer.trainable = False
# base model no weights (same result as without class_weights)
model.fit(x=x_train,y=y_train, validation_data=(x_test,y_test))
# which outputs:
#> 100/100 [==============================] - 0s 1ms/sample - loss: 0.0851 - mean_absolute_error: 0.2511 - val_loss: 0.0900 - val_mean_absolute_error: 0.2629
#changing the sample_weights to zero, Hence loss and val_loss should be zero, and metrics should be the same
sample_weight_train = np.zeros(100)
sample_weight_val = np.zeros(100)
model.fit(x=x_train,y=y_train,sample_weight=sample_weight_train, validation_data=(x_test,y_test,sample_weight_val))
# which outputs:
#> 100/100 [==============================] - 0s 860us/sample - loss: 0.0000e+00 - mean_absolute_error: 0.2507 - val_loss: 0.0899 - val_mean_absolute_error: 0.2627