When using the TACO dataset with mmdetection, I get the error 'numpy.float64' object cannot be interpreted as an integer - numpy

I am trying to train instance segmentation model with mmdetection using dataset and I got this error
TypeError: 'numpy.float64' object cannot be interpreted as an integer
I want to segment with mmedetection (You can also find the mmdetection repo here. mmdetection) using the dataset in the link. TACO
TACO dataset is a dataset in COCO format used for garbage detection. Classes in them
['Aluminium foil', 'Battery', 'Aluminium blister pack', 'Carded blister pack', 'Other plastic bottle', 'Clear plastic bottle', 'Glass bottle', 'Plastic bottle cap', 'Metal bottle cap', 'Broken glass', 'Food Can', 'Aerosol', 'Drink can', 'Toilet tube', 'Other carton', 'Egg carton', 'Drink carton', 'Corrugated carton', 'Meal carton', 'Pizza box', 'Paper cup', 'Disposable plastic cup', 'Foam cup', 'Glass cup', 'Other plastic cup', 'Food waste', 'Glass jar', 'Plastic lid', 'Metal lid', 'Other plastic', 'Magazine paper', 'Tissues', 'Wrapping paper', 'Normal paper', 'Paper bag', 'Plastified paper bag', 'Plastic film', 'Six pack rings', 'Garbage bag', 'Other plastic wrapper', 'Single-use carrier bag', 'Polypropylene bag', 'Crisp packet', 'Spread tub', 'Tupperware', 'Disposable food container', 'Foam food container', 'Other plastic container', 'Plastic glooves', 'Plastic utensils', 'Pop tab', 'Rope & strings', 'Scrap metal', 'Shoe', 'Squeezable tube', 'Plastic straw', 'Paper straw', 'Styrofoam piece', 'Unlabeled litter', 'Cigarette']
I obtained the data by following the instructions in the Getting started section in the TACO github repo. I separated the annotations.json file as test, train and valid with the help of split-method link.
As you can see below, when I run it for the train after giving the path of the test, train and valid annotation files and images I have obtained to the config file in mmdetection for the train process, I get an error in the Evaluating section after the train part is finished.
2023-02-08 08:33:25,978 - mmdet - INFO - Epoch [10][10/50] lr:
2.500e-03, eta: 0:12:28, time: 0.731, data_time: 0.244, memory: 3987, loss_rpn_cls: 0.0011, loss_rpn_bbox: 0.0070, loss_cls: 0.0439, acc:
98.4180, loss_bbox: 0.0905, loss_mask: 0.0740, loss: 0.2165
2023-02-08 08:33:30,935 - mmdet - INFO - Epoch [10][20/50] lr:
2.500e-03, eta: 0:12:21, time: 0.499, data_time: 0.030, memory: 3987, loss_rpn_cls: 0.0021, loss_rpn_bbox: 0.0070, loss_cls: 0.0459, acc:
98.4473, loss_bbox: 0.0870, loss_mask: 0.0935, loss: 0.2354
2023-02-08 08:33:35,857 - mmdet - INFO - Epoch [10][30/50] lr:
2.500e-03, eta: 0:12:13, time: 0.488, data_time: 0.027, memory: 3987, loss_rpn_cls: 0.0008, loss_rpn_bbox: 0.0077, loss_cls: 0.0322, acc:
98.9062, loss_bbox: 0.0798, loss_mask: 0.0771, loss: 0.1976
2023-02-08 08:33:41,099 - mmdet - INFO - Epoch [10][40/50] lr:
2.500e-03, eta: 0:12:07, time: 0.523, data_time: 0.030, memory: 3987, loss_rpn_cls: 0.0009, loss_rpn_bbox: 0.0074, loss_cls: 0.0388, acc:
98.8574, loss_bbox: 0.0996, loss_mask: 0.0815, loss: 0.2281
2023-02-08 08:33:46,139 - mmdet - INFO - Epoch [10][50/50] lr:
2.500e-03, eta: 0:12:00, time: 0.506, data_time: 0.031, memory: 3987, loss_rpn_cls: 0.0012, loss_rpn_bbox: 0.0073, loss_cls: 0.0400, acc:
98.8672, loss_bbox: 0.0757, loss_mask: 0.0783, loss: 0.2025
2023-02-08 08:33:53,641 - mmdet - INFO - Epoch [11][10/50] lr:
2.500e-03, eta: 0:11:59, time: 0.724, data_time: 0.240, memory: 3987, loss_rpn_cls: 0.0006, loss_rpn_bbox: 0.0054, loss_cls: 0.0373, acc:
98.8672, loss_bbox: 0.0857, loss_mask: 0.0723, loss: 0.2013
2023-02-08 08:33:58,535 - mmdet - INFO - Epoch [11][20/50] lr:
2.500e-03, eta: 0:11:51, time: 0.488, data_time: 0.028, memory: 3987, loss_rpn_cls: 0.0008, loss_rpn_bbox: 0.0064, loss_cls: 0.0309, acc:
99.1016, loss_bbox: 0.0847, loss_mask: 0.0703, loss: 0.1931
2023-02-08 08:34:03,679 - mmdet - INFO - Epoch [11][30/50] lr:
2.500e-03, eta: 0:11:45, time: 0.514, data_time: 0.034, memory: 3987, loss_rpn_cls: 0.0014, loss_rpn_bbox: 0.0096, loss_cls: 0.0424, acc:
98.4766, loss_bbox: 0.0927, loss_mask: 0.0798, loss: 0.2260
2023-02-08 08:34:09,220 - mmdet - INFO - Epoch [11][40/50] lr:
2.500e-03, eta: 0:11:39, time: 0.553, data_time: 0.063, memory: 3987, loss_rpn_cls: 0.0009, loss_rpn_bbox: 0.0089, loss_cls: 0.0336, acc:
98.9746, loss_bbox: 0.0824, loss_mask: 0.0768, loss: 0.2026
2023-02-08 08:34:14,489 - mmdet - INFO - Epoch [11][50/50] lr:
2.500e-03, eta: 0:11:33, time: 0.529, data_time: 0.032, memory: 3987, loss_rpn_cls: 0.0005, loss_rpn_bbox: 0.0079, loss_cls: 0.0359, acc:
98.7793, loss_bbox: 0.0876, loss_mask: 0.0947, loss: 0.2265
2023-02-08 08:34:21,829 - mmdet - INFO - Epoch [12][10/50] lr:
2.500e-03, eta: 0:11:31, time: 0.708, data_time: 0.232, memory: 3987, loss_rpn_cls: 0.0005, loss_rpn_bbox: 0.0037, loss_cls: 0.0285, acc:
98.9844, loss_bbox: 0.0640, loss_mask: 0.0747, loss: 0.1714
2023-02-08 08:34:26,653 - mmdet - INFO - Epoch [12][20/50] lr:
2.500e-03, eta: 0:11:24, time: 0.483, data_time: 0.028, memory: 3987, loss_rpn_cls: 0.0005, loss_rpn_bbox: 0.0058, loss_cls: 0.0302, acc:
98.8965, loss_bbox: 0.0657, loss_mask: 0.0694, loss: 0.1716
2023-02-08 08:34:31,685 - mmdet - INFO - Epoch [12][30/50] lr:
2.500e-03, eta: 0:11:17, time: 0.500, data_time: 0.028, memory: 3987, loss_rpn_cls: 0.0015, loss_rpn_bbox: 0.0101, loss_cls: 0.0366, acc:
98.9844, loss_bbox: 0.1061, loss_mask: 0.0828, loss: 0.2371
2023-02-08 08:34:36,939 - mmdet - INFO - Epoch [12][40/50] lr:
2.500e-03, eta: 0:11:11, time: 0.525, data_time: 0.031, memory: 3987, loss_rpn_cls: 0.0015, loss_rpn_bbox: 0.0063, loss_cls: 0.0334, acc:
98.8477, loss_bbox: 0.0706, loss_mask: 0.0762, loss: 0.1880
2023-02-08 08:34:42,106 - mmdet - INFO - Epoch [12][50/50] lr:
2.500e-03, eta: 0:11:04, time: 0.520, data_time: 0.032, memory: 3987, loss_rpn_cls: 0.0011, loss_rpn_bbox: 0.0073, loss_cls: 0.0400, acc:
98.7305, loss_bbox: 0.0884, loss_mask: 0.0778, loss: 0.2147
2023-02-08 08:34:42,154 - mmdet - INFO - Saving checkpoint at 12
epochs
3/3, 4.3 task/s, elapsed: 1s, ETA: 0s 2023-02-08 08:34:46,374 -
mmdet - INFO - Evaluating bbox... Loading and preparing results...
DONE (t=0.00s) creating index... index created!
TypeError Traceback (most recent call last)
/tmp/ipykernel_26640/1144535353.py in
17 # Create work_dir
18 mmcv.mkdir_or_exist(osp.abspath(cfg.work_dir))
19 train_detector(model, datasets, cfg, distributed=False, validate=True)
~/violations-tracing-project/mmdetection/mmdet/apis/train.py in train_detector(model, dataset, cfg, distributed, validate, timestamp, meta)
244 elif cfg.load_from:
245 runner.load_checkpoint(cfg.load_from)
246 runner.run(data_loaders, cfg.workflow)
/opt/conda/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py in run(self, data_loaders, workflow, max_epochs, **kwargs)
134 if mode == 'train' and self.epoch >= self._max_epochs:
135 break
136 epoch_runner(data_loaders[i], **kwargs)
137
138 time.sleep(1) # wait for some hooks like loggers to finish
/opt/conda/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py in train(self, data_loader, **kwargs)
56 self._iter += 1
57
58 self.call_hook('after_train_epoch')
59 self._epoch += 1
60
/opt/conda/lib/python3.7/site-packages/mmcv/runner/base_runner.py in call_hook(self, fn_name)
315 """
316 for hook in self._hooks:
317 getattr(hook, fn_name)(self)
318
319 def get_hook_info(self) -> str:
/opt/conda/lib/python3.7/site-packages/mmcv/runner/hooks/evaluation.py in after_train_epoch(self, runner)
269 """Called after every training epoch to evaluate the results."""
270 if self.by_epoch and self._should_evaluate(runner):
271 self._do_evaluate(runner)
272
273 def _do_evaluate(self, runner):
~/violations-tracing-project/mmdetection/mmdet/core/evaluation/eval_hooks.py in _do_evaluate(self, runner)
61 self.latest_results = results
62 runner.log_buffer.output['eval_iter_num'] = len(self.dataloader)
63 key_score = self.evaluate(runner, results)
64 # the key_score may be None so it needs to skip the action to save
65 # the best checkpoint
/opt/conda/lib/python3.7/site-packages/mmcv/runner/hooks/evaluation.py in evaluate(self, runner, results)
366 """
367 eval_res = self.dataloader.dataset.evaluate(
368 results, logger=runner.logger, **self.eval_kwargs)
369
370 for name, val in eval_res.items():
~/violations-tracing-project/mmdetection/mmdet/datasets/coco.py in evaluate(self, results, metric, logger, jsonfile_prefix, classwise, proposal_nums, iou_thrs, metric_items)
643 metrics, logger, classwise,
644 proposal_nums, iou_thrs,
645 metric_items)
646
647 if tmp_dir is not None:
~/violations-tracing-project/mmdetection/mmdet/datasets/coco.py in evaluate_det_segm(self, results, result_files, coco_gt, metrics, logger, classwise, proposal_nums, iou_thrs, metric_items)
481 break
482
483 cocoEval = COCOeval(coco_gt, coco_det, iou_type)
484 cocoEval.params.catIds = self.cat_ids
485 cocoEval.params.imgIds = self.img_ids
/opt/conda/lib/python3.7/site-packages/pycocotools/cocoeval.py in init(self, cocoGt, cocoDt, iouType)
74 self._gts = defaultdict(list) # gt for evaluation
75 self._dts = defaultdict(list) # dt for evaluation
76 self.params = Params(iouType=iouType) # parameters
77 self._paramsEval = {} # parameters for evaluation
78 self.stats = [] # result summarization
/opt/conda/lib/python3.7/site-packages/pycocotools/cocoeval.py in init(self, iouType)
525 def init(self, iouType='segm'):
526 if iouType == 'segm' or iouType == 'bbox':
527 self.setDetParams()
528 elif iouType == 'keypoints':
529 self.setKpParams()
/opt/conda/lib/python3.7/site-packages/pycocotools/cocoeval.py in setDetParams(self)
505 self.catIds = []
506 # np.arange causes trouble. the data point on arange is slightly larger than the true value
507 self.iouThrs = np.linspace(.5, 0.95, np.round((0.95 - .5) / .05) + 1, endpoint=True)
508 self.recThrs = np.linspace(.0, 1.00, np.round((1.00 - .0) / .01) + 1, endpoint=True)
509 self.maxDets = [1, 10, 100]
<array_function internals> in linspace(*args, **kwargs)
/opt/conda/lib/python3.7/site-packages/numpy/core/function_base.py in linspace(start, stop, num, endpoint, retstep, dtype, axis)
118
119 """
120 num = operator.index(num)
121 if num < 0:
122 raise ValueError("Number of samples, %s, must be non-negative." % num)
TypeError: 'numpy.float64' object cannot be interpreted as an integer
I converted the values in the segmentation part of the annotation files from float to integer, and I did the same for the values in the box part. but the problem was not fixed.
on this line where train for mmdetection is started
train_detector(model, datasets, cfg, distributed=False, validate=True)
If validate=False, the train process is finished without an error, but since there is no validation, I cannot determine my accuracy (such as precision, recall). Also, although it was not the best result, it was not a correct train because it saved the last log.
If I solve the float problem in the validation part here, I will continue my train process in a healthy way.

Related

Different Results between Sequential and Model in Tensorflow despite same Build up

I am trying to move a sequential neural network from the time series tutorial on the Tensorflow website to a functional API one (https://www.tensorflow.org/tutorials/structured_data/time_series#single-shot_models).
The tutorial code is as follows:
multi_dense_model = tf.keras.Sequential()
multi_dense_model.add(tf.keras.layers.Input(shape=(24, 19)))
multi_dense_model.add(tf.keras.layers.Lambda(lambda x: x[:, -1:, :]))
multi_dense_model.add(tf.keras.layers.Dense(512, activation='relu'))
multi_dense_model.add(tf.keras.layers.Dense(OUT_STEPS*num_features, kernel_initializer=tf.initializers.zeros()))
multi_dense_model.add(tf.keras.layers.Reshape([OUT_STEPS, num_features]))
early_stopping = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=2, mode='min')
multi_dense_model.compile(loss=tf.keras.losses.MeanSquaredError(), optimizer=tf.keras.optimizers.Adam(), metrics=[tf.keras.metrics.MeanAbsoluteError()])
history = multi_dense_model.fit(multi_window.train, epochs=MAX_EPOCHS, validation_data=multi_window.val, callbacks=[early_stopping])
Where I get the following result:
Epoch 1/20
1532/1532 [==============================] - 11s 7ms/step - loss: 0.2391 - mean_absolute_error: 0.3012 - val_loss: 0.2272 - val_mean_absolute_error: 0.2895
Epoch 2/20
1532/1532 [==============================] - 9s 6ms/step - loss: 0.2226 - mean_absolute_error: 0.2850 - val_loss: 0.2283 - val_mean_absolute_error: 0.2908
Epoch 3/20
1532/1532 [==============================] - 11s 7ms/step - loss: 0.2192 - mean_absolute_error: 0.2820 - val_loss: 0.2230 - val_mean_absolute_error: 0.2847
Epoch 4/20
1532/1532 [==============================] - 10s 7ms/step - loss: 0.2166 - mean_absolute_error: 0.2798 - val_loss: 0.2212 - val_mean_absolute_error: 0.2836
Epoch 5/20
1532/1532 [==============================] - 10s 7ms/step - loss: 0.2144 - mean_absolute_error: 0.2780 - val_loss: 0.2189 - val_mean_absolute_error: 0.2809
Epoch 6/20
1532/1532 [==============================] - 9s 6ms/step - loss: 0.2131 - mean_absolute_error: 0.2768 - val_loss: 0.2196 - val_mean_absolute_error: 0.2812
Epoch 7/20
1532/1532 [==============================] - 10s 7ms/step - loss: 0.2118 - mean_absolute_error: 0.2759 - val_loss: 0.2193 - val_mean_absolute_error: 0.2827
437/437 [==============================] - 2s 4ms/step - loss: 0.2193 - mean_absolute_error: 0.2827
Now I changed the code to the functional API:
input1 = tf.keras.layers.Input(shape=(24, 19))
lamb1 = tf.keras.layers.Lambda(lambda x: x[:, -1:, :])(input1)
dense1 = tf.keras.layers.Dense(512, activation='relu')(lamb1)
dense2 = tf.keras.layers.Dense(OUT_STEPS*num_features, kernel_initializer=tf.initializers.zeros(), activation='relu')(dense1)
resha1 = tf.keras.layers.Reshape([OUT_STEPS, num_features])(dense2)
multi_dense_model = tf.keras.models.Model(inputs=input1, outputs=resha1)
early_stopping = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=2, mode='min')
multi_dense_model.compile(loss=tf.keras.losses.MeanSquaredError(), optimizer=tf.keras.optimizers.Adam(), metrics=[tf.keras.metrics.MeanAbsoluteError()])
history = multi_dense_model.fit(multi_window.train, epochs=MAX_EPOCHS, validation_data=multi_window.val, callbacks=[early_stopping])
multi_val_performance['Dense'] = multi_dense_model.evaluate(multi_window.val)
multi_performance['Dense'] = multi_dense_model.evaluate(multi_window.test, verbose=0)
multi_window.plot(multi_dense_model)
And get this:
Epoch 1/20
1532/1532 [==============================] - 11s 7ms/step - loss: 0.9995 - mean_absolute_error: 0.8084 - val_loss: 0.9425 - val_mean_absolute_error: 0.7799
Epoch 2/20
1532/1532 [==============================] - 10s 7ms/step - loss: 0.9995 - mean_absolute_error: 0.8084 - val_loss: 0.9425 - val_mean_absolute_error: 0.7799
Epoch 3/20
1532/1532 [==============================] - 10s 7ms/step - loss: 0.9995 - mean_absolute_error: 0.8084 - val_loss: 0.9425 - val_mean_absolute_error: 0.7799
Epoch 4/20
1532/1532 [==============================] - 10s 7ms/step - loss: 0.9995 - mean_absolute_error: 0.8084 - val_loss: 0.9425 - val_mean_absolute_error: 0.7799
Epoch 5/20
1532/1532 [==============================] - 10s 7ms/step - loss: 0.9995 - mean_absolute_error: 0.8084 - val_loss: 0.9425 - val_mean_absolute_error: 0.7799
Epoch 6/20
1532/1532 [==============================] - 10s 7ms/step - loss: 0.9995 - mean_absolute_error: 0.8084 - val_loss: 0.9425 - val_mean_absolute_error: 0.7799
Epoch 7/20
1532/1532 [==============================] - 11s 7ms/step - loss: 0.9995 - mean_absolute_error: 0.8084 - val_loss: 0.9425 - val_mean_absolute_error: 0.7799
Any idea why this might be? I tried a lot of things but cant get it to match. Also model.summary prints pretty much the same for both (Sequential is always ignoring the Input Layer but I think that does not make a difference since you have to specify for Models Input)
This is the complete code I am using in case you want to copy paste
import os
import datetime
import IPython
import IPython.display
import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import tensorflow as tf
mpl.rcParams['figure.figsize'] = (8, 6)
mpl.rcParams['axes.grid'] = False
zip_path = tf.keras.utils.get_file(
origin='https://storage.googleapis.com/tensorflow/tf-keras-datasets/jena_climate_2009_2016.csv.zip',
fname='jena_climate_2009_2016.csv.zip',
extract=True)
csv_path, _ = os.path.splitext(zip_path)
df = pd.read_csv(csv_path)
# Slice [start:stop:step], starting from index 5 take every 6th record.
df = df[5::6]
date_time = pd.to_datetime(df.pop('Date Time'), format='%d.%m.%Y %H:%M:%S')
wv = df['wv (m/s)']
bad_wv = wv == -9999.0
wv[bad_wv] = 0.0
max_wv = df['max. wv (m/s)']
bad_max_wv = max_wv == -9999.0
max_wv[bad_max_wv] = 0.0
# The above inplace edits are reflected in the DataFrame.
df['wv (m/s)'].min()
wv = df.pop('wv (m/s)')
max_wv = df.pop('max. wv (m/s)')
# Convert to radians.
wd_rad = df.pop('wd (deg)')*np.pi / 180
# Calculate the wind x and y components.
df['Wx'] = wv*np.cos(wd_rad)
df['Wy'] = wv*np.sin(wd_rad)
# Calculate the max wind x and y components.
df['max Wx'] = max_wv*np.cos(wd_rad)
df['max Wy'] = max_wv*np.sin(wd_rad)
timestamp_s = date_time.map(pd.Timestamp.timestamp)
day = 24*60*60
year = (365.2425)*day
df['Day sin'] = np.sin(timestamp_s * (2 * np.pi / day))
df['Day cos'] = np.cos(timestamp_s * (2 * np.pi / day))
df['Year sin'] = np.sin(timestamp_s * (2 * np.pi / year))
df['Year cos'] = np.cos(timestamp_s * (2 * np.pi / year))
column_indices = {name: i for i, name in enumerate(df.columns)}
n = len(df)
train_df = df[0:int(n*0.7)]
val_df = df[int(n*0.7):int(n*0.9)]
test_df = df[int(n*0.9):]
num_features = df.shape[1]
train_mean = train_df.mean()
train_std = train_df.std()
train_df = (train_df - train_mean) / train_std
val_df = (val_df - train_mean) / train_std
test_df = (test_df - train_mean) / train_std
df_std = (df - train_mean) / train_std
df_std = df_std.melt(var_name='Column', value_name='Normalized')
class WindowGenerator():
def __init__(self, input_width, label_width, shift,
train_df=train_df, val_df=val_df, test_df=test_df,
label_columns=None):
# Store the raw data.
self.train_df = train_df
self.val_df = val_df
self.test_df = test_df
# Work out the label column indices.
self.label_columns = label_columns
if label_columns is not None:
self.label_columns_indices = {name: i for i, name in
enumerate(label_columns)}
self.column_indices = {name: i for i, name in
enumerate(train_df.columns)}
# Work out the window parameters.
self.input_width = input_width
self.label_width = label_width
self.shift = shift
self.total_window_size = input_width + shift
self.input_slice = slice(0, input_width)
self.input_indices = np.arange(self.total_window_size)[self.input_slice]
self.label_start = self.total_window_size - self.label_width
self.labels_slice = slice(self.label_start, None)
self.label_indices = np.arange(self.total_window_size)[self.labels_slice]
def __repr__(self):
return '\n'.join([
f'Total window size: {self.total_window_size}',
f'Input indices: {self.input_indices}',
f'Label indices: {self.label_indices}',
f'Label column name(s): {self.label_columns}'])
def make_dataset(self, data):
data = np.array(data, dtype=np.float32)
ds = tf.keras.utils.timeseries_dataset_from_array(
data=data,
targets=None,
sequence_length=self.total_window_size,
sequence_stride=1,
shuffle=True,
batch_size=32,)
ds = ds.map(self.split_window)
return ds
WindowGenerator.make_dataset = make_dataset
#property
def train(self):
return self.make_dataset(self.train_df)
#property
def val(self):
return self.make_dataset(self.val_df)
#property
def test(self):
return self.make_dataset(self.test_df)
#property
def example(self):
"""Get and cache an example batch of `inputs, labels` for plotting."""
result = getattr(self, '_example', None)
if result is None:
# No example batch was found, so get one from the `.train` dataset
result = next(iter(self.train))
# And cache it for next time
self._example = result
return result
WindowGenerator.train = train
WindowGenerator.val = val
WindowGenerator.test = test
WindowGenerator.example = example
def split_window(self, features):
inputs = features[:, self.input_slice, :]
labels = features[:, self.labels_slice, :]
if self.label_columns is not None:
labels = tf.stack(
[labels[:, :, self.column_indices[name]] for name in self.label_columns],
axis=-1)
# Slicing doesn't preserve static shape information, so set the shapes
# manually. This way the `tf.data.Datasets` are easier to inspect.
inputs.set_shape([None, self.input_width, None])
labels.set_shape([None, self.label_width, None])
return inputs, labels
WindowGenerator.split_window = split_window
OUT_STEPS = 24
multi_window = WindowGenerator(input_width=24,
label_width=OUT_STEPS,
shift=OUT_STEPS)
MAX_EPOCHS = 20
os.environ['CUDA_VISIBLE_DEVICES'] = '-1'
# multi_dense_model = tf.keras.Sequential()
# multi_dense_model.add(tf.keras.layers.Input(shape=(24, 19)))
# multi_dense_model.add(tf.keras.layers.Lambda(lambda x: x[:, -1:, :]))
# multi_dense_model.add(tf.keras.layers.Dense(512, activation='relu'))
# multi_dense_model.add(tf.keras.layers.Dense(OUT_STEPS*num_features, kernel_initializer=tf.initializers.zeros()))
# multi_dense_model.add(tf.keras.layers.Reshape([OUT_STEPS, num_features]))
input1 = tf.keras.layers.Input(shape=(24, 19))
lamb1 = tf.keras.layers.Lambda(lambda x: x[:, -1:, :])(input1)
dense1 = tf.keras.layers.Dense(512, activation='relu')(lamb1)
dense2 = tf.keras.layers.Dense(OUT_STEPS*num_features, activation='relu')(dense1)
resha1 = tf.keras.layers.Reshape([OUT_STEPS, num_features])(dense2)
multi_dense_model = tf.keras.models.Model(inputs=input1, outputs=resha1)
early_stopping = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=2, mode='min')
multi_dense_model.compile(loss=tf.keras.losses.MeanSquaredError(), optimizer=tf.keras.optimizers.Adam(), metrics=[tf.keras.metrics.MeanAbsoluteError()], run_eagerly=True)
history = multi_dense_model.fit(multi_window.train, epochs=MAX_EPOCHS, validation_data=multi_window.val, callbacks=[early_stopping])
Most likely because you are applying 2 non linearity:
#Sequential
multi_dense_model.add(tf.keras.layers.Dense(512, activation='relu'))
multi_dense_model.add(tf.keras.layers.Dense(OUT_STEPS*num_features, kernel_initializer=tf.initializers.zeros()))
#Functional
dense1 = tf.keras.layers.Dense(512, activation='relu')(lamb1)
dense2 = tf.keras.layers.Dense(OUT_STEPS*num_features, kernel_initializer=tf.initializers.zeros(), activation='relu')(dense1)
# scroll right -------------------------------> ^^^^^^^^^^^^^^^^^
And by definition, Dense layer with no activation, becomes linear layer... So the two models are not equivalent

Segmentation of German Asphalt Pavement Distress Dataset (GAPs) using U-Net

I'm trying to train a U-Net like model to segment the German Asphalt Pavement Distress Dataset.
Mask images are stored as grey value images.
Coding of the grey values:
0 = VOID, 1 = intact road, 2 = applied patch, 3 = pothole, 4 = inlaid patch, 5 = open joint, 6 = crack 7 = street inventory
I found the following colab notebook which was implementing U-Net segmentation on Oxford pets dataset:
https://colab.research.google.com/github/keras-team/keras-io/blob/master/examples/vision/ipynb/oxford_pets_image_segmentation.ipynb
I modified the notebook to fit my problem of GAPs segmentation, and this is a link to my modified notebook:
https://colab.research.google.com/drive/1YfM4lC78QNdfbkgz-1LGSKaBG4-65dkC?usp=sharing
The training is running, but while the loss is decreasing, but accuracy is never increasing above 0.05. I'm stuck with this issue for days now, and I need help about how to get the model to train properly.
The following is a link to the dataset images and masks:
https://drive.google.com/drive/folders/1-JvLSa9b1falqEake2KVaYYtyVh-dgKY?usp=sharing
In the Sequence class, you do not shuffle the content of the batches, only the batch order is shuffled with the fit method. You have to shuffle the order of all the data at each epoch.
Here is a way to do it in a Sequence subclass:
class OxfordPets(keras.utils.Sequence):
"""Helper to iterate over the data (as Numpy arrays)."""
def __init__(self, batch_size, img_size, input_img_paths, target_img_paths):
self.batch_size = batch_size
self.img_size = img_size
self.input_img_paths = input_img_paths
self.target_img_paths = target_img_paths
self.set_len = len(self.target_img_paths) // self.batch_size
self.indices = random.sample(range(self.set_len), k=self.set_len)
def __len__(self):
return self.set_len
def __getitem__(self, idx):
"""Returns tuple (input, target) correspond to batch #idx."""
i = idx * self.batch_size
indices = self.indices[i : i + self.batch_size]
batch_input_img_paths = [self.input_img_paths[k] for k in indices]
batch_target_img_paths = [self.target_img_paths[k] for k in indices]
x = np.zeros((self.batch_size,) + self.img_size + (3,), dtype="float32")
for j, path in enumerate(batch_input_img_paths):
img = load_img(path, target_size=self.img_size)
x[j] = img
y = np.zeros((self.batch_size,) + self.img_size + (1,), dtype="uint8")
for j, path in enumerate(batch_target_img_paths):
img = load_img(path, target_size=self.img_size, color_mode="grayscale")
y[j] = np.expand_dims(img, 2)
# Ground truth labels are 1, 2, 3. Subtract one to make them 0, 1, 2:
#y[j] -= 1 # I commented this line out because the ground truth labels of GAPs dataset are 0, 1, 2, 3, 4, 5, 6, 7
return x, y
def on_epoch_end(self):
self.indices = random.sample(range(self.set_len), k=self.set_len)
self.indices is a random shuffle of the all indices range(self.set_len) and it is built in the constructor and at the end of each epoch. This permits to shuffle the order of all the data.
Using rmsprop optimizer, it works then :
Epoch 1/15
88/88 [==============================] - 96s 1s/step - loss: 1.9617 - categorical_accuracy: 0.9156 - val_loss: 5.8705 - val_categorical_accuracy: 0.9375
Epoch 2/15
88/88 [==============================] - 93s 1s/step - loss: 0.4754 - categorical_accuracy: 0.9369 - val_loss: 1.9207 - val_categorical_accuracy: 0.9375
Epoch 3/15
88/88 [==============================] - 94s 1s/step - loss: 0.4497 - categorical_accuracy: 0.9447 - val_loss: 9.3833 - val_categorical_accuracy: 0.9375
Epoch 4/15
88/88 [==============================] - 94s 1s/step - loss: 0.3173 - categorical_accuracy: 0.9423 - val_loss: 14.2518 - val_categorical_accuracy: 0.9369
Epoch 5/15
88/88 [==============================] - 94s 1s/step - loss: 0.0645 - categorical_accuracy: 0.9400 - val_loss: 110.9821 - val_categorical_accuracy: 0.8963
Note that there is very quickly some overfitting.

Tensorflow DataSet Shuffle Impact the validation training accuracy and ambiguous behavior

i am struggling with training a neural network that uses tf.data.DataSet as input.
What I find is that if I call .shuffle() before split the entire dataset in train, val, test set the accuracy on val (in training) and test (in evaluate) is 91%, but when I run .evaluate() on the test set many times the accuracy and loss metrics change every time. The same behavior occurs with .predict() on test set, with the classes that change every time.
This is the output of traning, evaluate end predict process
total_record: 93166 - trainin_size: 74534 - val_size: 9316 - test_size: 9316
Epoch 1/5
145/145 [==============================] - 42s 273ms/step - loss: 1.7143 - sparse_categorical_accuracy: 0.4051 - val_loss: 1.4997 - val_sparse_categorical_accuracy: 0.4885
Epoch 2/5
145/145 [==============================] - 40s 277ms/step - loss: 0.7571 - sparse_categorical_accuracy: 0.7505 - val_loss: 1.1634 - val_sparse_categorical_accuracy: 0.6050
Epoch 3/5
145/145 [==============================] - 41s 281ms/step - loss: 0.4894 - sparse_categorical_accuracy: 0.8223 - val_loss: 0.7628 - val_sparse_categorical_accuracy: 0.7444
Epoch 4/5
145/145 [==============================] - 38s 258ms/step - loss: 0.3417 - sparse_categorical_accuracy: 0.8656 - val_loss: 0.4236 - val_sparse_categorical_accuracy: 0.8579
Epoch 5/5
145/145 [==============================] - 40s 271ms/step - loss: 0.2660 - sparse_categorical_accuracy: 0.8926 - val_loss: 0.2807 - val_sparse_categorical_accuracy: 0.9105
accr = model.evaluate(test_set)
19/19 [==============================] - 1s 39ms/step - loss: 0.2622 - sparse_categorical_accuracy: 0.9153
accr = model.evaluate(test_set)
19/19 [==============================] - 1s 40ms/step - loss: 0.2649 - sparse_categorical_accuracy: 0.9170
accr = model.evaluate(test_set)
19/19 [==============================] - 1s 40ms/step - loss: 0.2726 - sparse_categorical_accuracy: 0.9141
accr = model.evaluate(test_set)
19/19 [==============================] - 1s 40ms/step - loss: 0.2692 - sparse_categorical_accuracy: 0.9166
pred = model.predict(test_set)
pred_class = np.argmax(pred, axis=1)
pred_class
Out[41]: array([0, 1, 5, ..., 2, 0, 1])
pred = model.predict(test_set)
pred_class = np.argmax(pred, axis=1)
pred_class
Out[42]: array([2, 3, 1, ..., 1, 2, 0])
pred = model.predict(test_set)
pred_class = np.argmax(pred, axis=1)
pred_class
Out[43]: array([1, 2, 4, ..., 1, 3, 0])
pred = model.predict(test_set)
pred_class = np.argmax(pred, axis=1)
pred_class
Out[44]: array([0, 3, 1, ..., 0, 5, 4])
So, I tried to apply .shuffle() after the split and only on the training and validation (commenting the main .shuffle() and uncommenting the shuffle in train_set and val_set).
But in this case, I find that the network goes into overfitting after just 5 epochs (with the previous training process callbacks block the training at 30° epochs with 94% val accuracy), with an accuracy of 75% since 2° epoch on validation set.
However, in this case if I run .evaluate() and .predict() on the testset to which .shuffle () has not been applied, the metrics and classes remain unchanged on each call.
Why this behavior?
But especially what is the great way and what is the real accuracy of model?
Thank's
This is the code of the process
""" ### Make tf.data.Dataset """
dataset = tf.data.Dataset.from_tensor_slices(({ "features_emb_subj": features_emb_subj,
"features_emb_snip": features_emb_snip,
"features_emb_fromcat": features_emb_fromcat,
"features_dense": features_dense,
"features_emb_user": features_emb_user}, cat_labels))
dataset = dataset.shuffle(int(len(features_dense)), reshuffle_each_iteration=True)
""" ### Split in train,val,test """
train_size = int(0.8 * len(features_dense))
val_size = int(0.10 * len(features_dense))
test_size = int(0.10 * len(features_dense))
test_set = dataset.take(test_size)
validation_set = dataset.skip(test_size).take(val_size)
training_set = dataset.skip(test_size + val_size)
test_set = test_set.batch(BATCH_SIZE, drop_remainder=False)
#validation_set = validation_set.shuffle(val_size, reshuffle_each_iteration=True)
validation_set = validation_set.batch(BATCH_SIZE, drop_remainder=False)
#training_set = training_set.shuffle(train_size, reshuffle_each_iteration=True)
training_set = training_set.batch(BATCH_SIZE, drop_remainder=True)
"""### Train model """
callbacks = [EarlyStopping(monitor='val_loss', patience=3, min_delta=0.0001, restore_best_weights=True)]
history = model.fit( training_set,
epochs = 5,
validation_data = validation_set,
callbacks=callbacks,
class_weight = setClassWeight(cat_labels),
verbose = 1)
"""### Evaluate model """
accr = model.evaluate(test_set)
"""### Predict test_test """
pred = model.predict(test_set)
pred_class = np.argmax(pred, axis=1)
pred_class
In the comments of this Question you can see that shuffle applies to the base dataset, and such is propagated to the references in the train, test and validation sets.
I would recommend to create 3 distinct datasets, using (e.g.) sklearn.model_selection.train_test_split on the original data before tf.data.Dataset.from_tensor_slices on those split tensor slices, so you can use the shuffle on the training dataset only.

custom metric in keras goes to NaN

I am trying to make a custom metric in Keras for evaluating a balanced accuracy score while training and validation. The accuracy is the average of the accuracies for each class.
Let's say if we have 4 classes and accuracy for each of them are [1,1,0.5,0.5], then the balanced accuracy = (1+1+0.5+0.5)/4 = 0.75
I use it in the compilation of model like model.compile(..., metrics = [balanced_acc])
And while training the first few epochs are fine. but it always becomes NaN after few iterations within an epoch.
But it never exceeds 1 before NaN. Namely I don't think it's a numerical error.
And when a new epoch comes, it becomes rational for a while then hit NaN again.
Some records are as follows:
5700/43200 [==>...........................] - ETA: 28s - loss: 3.8991 - accuracy: 0.9254 - f1_m: 0.8699 - balanced_acc: 0.9279
5900/43200 [===>..........................] - ETA: 27s - loss: 4.0909 - accuracy: 0.9258 - f1_m: 0.8718 - balanced_acc: 0.9271
6100/43200 [===>..........................] - ETA: 27s - loss: 4.1779 - accuracy: 0.9274 - f1_m: 0.8737 - balanced_acc: 0.9279
6200/43200 [===>..........................] - ETA: 26s - loss: 4.3051 - accuracy: 0.9268 - f1_m: 0.8746 - balanced_acc: 0.9278
6400/43200 [===>..........................] - ETA: 26s - loss: 4.4368 - accuracy: 0.9266 - f1_m: 0.8763 - balanced_acc: 0.9284
6600/43200 [===>..........................] - ETA: 26s - loss: 4.5449 - accuracy: 0.9267 - f1_m: 0.8779 - balanced_acc: nan
6700/43200 [===>..........................] - ETA: 25s - loss: 4.5420 - accuracy: 0.9273 - f1_m: 0.8786 - balanced_acc: nan
6900/43200 [===>..........................] - ETA: 25s - loss: 4.6166 - accuracy: 0.9265 - f1_m: 0.8801 - balanced_acc: nan
And the code for the custom metric function is:
def balanced_acc(y_true, y_pred):
y_true = K.argmax(y_true, axis = 1)
y_pred = K.argmax(y_pred, axis = 1)
# now working under ordinary label instead of one-hot representation
# datatype should be int for both now.
acc_sum = K.cast(0, dtype='float32') #initialize sum of accuracies for each batch
for class_label in [0,1,2]: # this code is for 3 classes
shape = K.shape(y_true)[0]
empty_index = K.arange(0, shape) # make an indices tensor like [0 1 2 3 4 ... shape-1]
indices = empty_index[tf.math.equal(y_true, class_label)] #get the indices where y_true == the individual class
y_true_class_label = tf.keras.backend.gather(y_true, indices) #gather the elements of that class_label, here just repeating that specific label
y_pred_corresponds = tf.keras.backend.gather(y_pred, indices) #gather the corresponding predictions
acc_sum = acc_sum + tf.contrib.metrics.accuracy(y_true_class_label, y_pred_corresponds)
return acc_sum/3.0

Understanding Keras Callback Output

My Keras model below with callback is giving the following output during training.
from keras.callbacks import ModelCheckpoint
checkpoint = ModelCheckpoint('main_model_weights_new.h5', monitor='val_loss', verbose=1,
save_best_only=False, mode='auto',save_weights_only=True)
import pandas
pandas.DataFrame(model.fit(trainX, trainY, epochs=200, batch_size=100,
validation_data=(testX, testY),
callbacks= [checkpoint]).history).to_csv("history.csv")
I was expecting to see the train loss, train accuracy, valid_loss and valid_accuracy. But as shown below it appears there is one more train_loss and train accuracy, output along with the Valid Loss and Valid Accuracy. Can anyone explain me which one to consider as train_loss here?
Output:
Epoch 1/200 2800/2810 [============================>.] - ETA: 0s - loss: 29.7255 - dense_2_loss_1: 3.9492 - dense_2_loss_2: 5.5785 -
dense_2_loss_3: 5.5198 - dense_2_loss_4: 5.6908 - dense_2_loss_5:
4.9863 - dense_2_loss_6: 4.0008 - dense_2_acc_1: 0.1711 - dense_2_acc_2: 0.0836 - dense_2_acc_3: 0.0821 - dense_2_acc_4: 0.1200
- dense_2_acc_5: 0.2393 - dense_2_acc_6: 0.4171Epoch 00000: saving model to main_model_weights_new.h5 2810/2810
[==============================] - 62s - loss: 29.7213 -
dense_2_loss_1: 3.9471 - dense_2_loss_2: 5.5732 - dense_2_loss_3:
5.5226 - dense_2_loss_4: 5.6907 - dense_2_loss_5: 4.9885 - dense_2_loss_6: 3.9992 - dense_2_acc_1: 0.1715 - dense_2_acc_2: 0.0843
- dense_2_acc_3: 0.0822 - dense_2_acc_4: 0.1199 - dense_2_acc_5: 0.2388 - dense_2_acc_6: 0.4167 - val_loss: 31.5189 - val_dense_2_loss_1: 3.6305 - val_dense_2_loss_2: 6.3004 -
val_dense_2_loss_3: 5.9689 - val_dense_2_loss_4: 5.5387 -
val_dense_2_loss_5: 4.9914 - val_dense_2_loss_6: 5.0890 -
val_dense_2_acc_1: 0.2982 - val_dense_2_acc_2: 0.0351 -
val_dense_2_acc_3: 0.0351 - val_dense_2_acc_4: 0.1228 -
val_dense_2_acc_5: 0.2456 - val_dense_2_acc_6: 0.4035 Epoch 2/200