Getting memory error during preprocessing HMDB51 dataset - tensorflow

I am working on action recognition on HMDB51. Here is my code below.
This part is for declaring some constants and directories:
# Specify the height and width to which each video frame will be resized in our dataset.
IMAGE_HEIGHT , IMAGE_WIDTH = 64, 64
# Specify the number of frames of a video that will be fed to the model as one sequence.
SEQUENCE_LENGTH = 20
# Specify the directory containing the HMDB51 dataset.
DATASET_DIR = r"\HMDB51"
# Specify the list containing the names of the classes used for training. Feel free to choose any set of classes.
CLASSES_LIST = ["brush_hair", "cartwheel", "catch", "chew", "clap", "climb", "climb_stairs", "dive",
"draw_sword", "dribble", "drink", "eat", "fall_floor", "fencing", "flic_flac", "golf",
"handstand", "hit", "hug", "jump", "kick", "kick_ball", "kiss", "laugh",
"pick", "pour", "pullup", "punch", "push", "pushup", "ride_bike", "ride_horse",
"run", "shake_hands", "shoot_ball", "shoot_bow", "shoot_gun", "sit", "situp", "smile",
"smoke", "somersault", "stand","swing_baseball", "sword", "sword_exercise", "talk", "throw", "turn",
"walk", "wave"]
This part is for extracting frames from each video:
import cv2

def frames_extraction(video_path):
    # Declare a list to store video frames.
    frames_list = []
    # Read the video file using the VideoCapture object.
    video_reader = cv2.VideoCapture(video_path)
    # Get the total number of frames in the video.
    video_frames_count = int(video_reader.get(cv2.CAP_PROP_FRAME_COUNT))
    # Calculate the interval after which frames will be added to the list.
    skip_frames_window = max(int(video_frames_count/SEQUENCE_LENGTH), 1)
    # Iterate through the video frames.
    for frame_counter in range(SEQUENCE_LENGTH):
        # Set the current frame position of the video.
        video_reader.set(cv2.CAP_PROP_POS_FRAMES, frame_counter * skip_frames_window)
        # Read the frame from the video.
        success, frame = video_reader.read()
        # If the frame was not read successfully, break out of the loop.
        if not success:
            break
        # Resize the frame to a fixed size. Note that cv2.resize expects (width, height).
        resized_frame = cv2.resize(frame, (IMAGE_WIDTH, IMAGE_HEIGHT))
        # Normalize the resized frame by dividing it by 255 so that each pixel value lies between 0 and 1.
        normalized_frame = resized_frame / 255
        # Append the normalized frame to the frames list.
        frames_list.append(normalized_frame)
    # Release the VideoCapture object.
    video_reader.release()
    # Return the frames list.
    return frames_list
This part is for creating train, label lists:
import os
import numpy as np

def create_dataset():
    '''
    This function will extract the data of the selected classes and create the required dataset.
    Returns:
        features: A list containing the extracted frames of the videos.
        labels: A list containing the indexes of the classes associated with the videos.
        video_files_paths: A list containing the paths of the videos on disk.
    '''
    # Declare empty lists to store the features, labels and video file paths.
    features = []
    labels = []
    video_files_paths = []
    # Iterate through all the classes mentioned in the classes list.
    for class_index, class_name in enumerate(CLASSES_LIST):
        # Display the name of the class whose data is being extracted.
        print(f'Extracting Data of Class: {class_name}')
        # Get the list of video files present in the specific class name directory.
        files_list = os.listdir(os.path.join(DATASET_DIR, class_name))
        # Iterate through all the files present in the files list.
        for file_name in files_list:
            # Get the complete video path.
            video_file_path = os.path.join(DATASET_DIR, class_name, file_name)
            # Extract the frames of the video file.
            frames = frames_extraction(video_file_path)
            # Check that the number of extracted frames equals the SEQUENCE_LENGTH specified above,
            # so videos with fewer frames than SEQUENCE_LENGTH are ignored.
            if len(frames) == SEQUENCE_LENGTH:
                # Append the data to their respective lists.
                features.append(frames)
                labels.append(class_index)
                video_files_paths.append(video_file_path)
    # Convert the lists to numpy arrays.
    features = np.asarray(features)
    labels = np.array(labels)
    # Return the frames, class indexes, and video file paths.
    return features, labels, video_files_paths
So, when I tried to create the dataset as below:
# Create the dataset.
features, labels, video_files_paths = create_dataset()
I get the memory error below:
[image: screenshot of the error]
How can I fix this? I think I need to preprocess the dataset in batches while training the model, but how can I do that? When I worked with images I used keras.utils.image_dataset_from_directory; should I now build my own data loader?
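A rough size estimate suggests why building the whole array at once can exhaust RAM. The sketch below assumes about 7,000 clips (an approximate figure for HMDB51, not taken from the post) and 8 bytes per value, because dividing a uint8 frame by 255 in NumPy produces float64. `dataset_bytes` is a hypothetical helper name.

```python
def dataset_bytes(num_videos, frames_per_video, height, width, channels, bytes_per_value):
    """Return the bytes needed to hold every frame of every video in one dense array."""
    values_per_video = frames_per_video * height * width * channels
    return num_videos * values_per_video * bytes_per_value

NUM_VIDEOS = 7000  # assumption: approximate HMDB51 clip count

# Same shape as in the question: 20 frames of 64x64 RGB per video.
float64_bytes = dataset_bytes(NUM_VIDEOS, 20, 64, 64, 3, 8)
float32_bytes = dataset_bytes(NUM_VIDEOS, 20, 64, 64, 3, 4)
uint8_bytes = dataset_bytes(NUM_VIDEOS, 20, 64, 64, 3, 1)

for name, size in [("float64", float64_bytes),
                   ("float32", float32_bytes),
                   ("uint8", uint8_bytes)]:
    print(f"{name}: {size / 1024**3:.1f} GiB")
```

Under these assumptions the float64 array alone needs roughly 12.8 GiB, and `np.asarray(features)` builds it while the original Python list is still alive, so the peak is about twice that. Storing frames as float32 or uint8 shrinks the footprint, but loading batches lazily removes the limit altogether.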

You can also do this without building a custom data loader.
Install the necessary dependencies: tensorflow and keras (glob is part of the Python standard library and needs no installation).
Then install the video generator package: !pip install keras-video-generators==1.0.11
Here is a full example:
import tensorflow as tf
import glob
import keras
from keras_video import VideoFrameGenerator
from tensorflow.keras.applications.vgg16 import VGG16
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import TimeDistributed, GRU, Dense, Flatten
from tensorflow.keras.optimizers import Adam
glob_pattern='/content/HMDB51/{classname}/*.mp4'
classes = [i.split('/')[3] for i in glob.glob('/content/HMDB51/*')]
classes.sort()
SIZE = (128, 128)
CHANNELS = 3
NBFRAME = 20
BS = 5
data_aug = keras.preprocessing.image.ImageDataGenerator(
    dtype = 'float16',
    rescale = 1./255,
    zoom_range = .1,
    horizontal_flip = True,
    rotation_range = 15
)
train = VideoFrameGenerator(
    classes=classes,
    glob_pattern=glob_pattern,
    nb_frames=NBFRAME,
    split_val=.20,
    shuffle=True,
    batch_size=BS,
    target_shape=SIZE,
    nb_channel=CHANNELS,
    transformation=data_aug,
    use_frame_cache=True)
valid = train.get_validation_generator()
def build_vgg(shape=(128, 128, 3), nbout=51):
    vgg_model = VGG16(include_top=False, input_shape=shape, weights='imagenet')
    for layer in vgg_model.layers:
        layer.trainable = False
    return vgg_model
def action_model(shape=(20, 128, 128, 3), nbout=51):
    convnet = build_vgg(shape[1:])
    model = Sequential()
    model.add(TimeDistributed(convnet, input_shape=shape))
    model.add(Dense(256, activation='relu'))
    model.add(TimeDistributed(Flatten()))
    model.add(GRU(256, dropout=0.20))
    model.add(Dense(128, activation='relu'))
    model.add(Dense(nbout, activation='softmax'))
    model.summary()
    return model
INSHAPE=(NBFRAME,) + SIZE + (CHANNELS,)
model = action_model(INSHAPE, len(classes))
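# `learning_rate` replaces the deprecated `lr` argument.
optimizer = Adam(learning_rate=1e-4)
model.compile(
    optimizer=optimizer,
    loss='categorical_crossentropy',
    metrics=['accuracy']
)
EPOCHS=100
# `callbacks` was not defined in the original snippet; an empty list is assumed here.
callbacks = []
# model.fit accepts generators directly; fit_generator is deprecated in TF 2.x.
history = model.fit(
    train,
    validation_data=valid,
    verbose=1,
    epochs=EPOCHS,
    callbacks=callbacks
)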
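If a custom loader is preferred over keras-video-generators, the core idea is to keep only file paths in memory and decode one batch at a time. A minimal, library-agnostic sketch follows; `batch_generator` and `load_fn` are hypothetical names, and `load_fn` stands in for something like the question's `frames_extraction`.

```python
import random

def batch_generator(samples, batch_size, load_fn, shuffle=True, seed=None):
    """Yield (inputs, labels) batches, loading each video only when it is needed.

    samples: list of (video_path, label) pairs; only paths are kept in memory.
    load_fn: hypothetical callable that maps a path to its decoded frames.
    """
    order = list(range(len(samples)))
    if shuffle:
        random.Random(seed).shuffle(order)
    for start in range(0, len(order), batch_size):
        batch = [samples[i] for i in order[start:start + batch_size]]
        inputs = [load_fn(path) for path, _ in batch]
        labels = [label for _, label in batch]
        yield inputs, labels

# Usage with a stand-in loader; a real one would decode and resize frames.
fake_samples = [(f"video_{i}.avi", i % 3) for i in range(7)]
batches = list(batch_generator(fake_samples, batch_size=3,
                               load_fn=lambda path: path.upper(),
                               shuffle=False))
print(len(batches))
```

A generator like this could be wrapped with `tf.data.Dataset.from_generator`, or rewritten as a `keras.utils.Sequence` subclass, so that `model.fit` pulls one batch at a time instead of holding the whole dataset in memory.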


TFX Tensorflow model validator component - You passed a data dictionary with keys ['image_raw_xf']. Expected the following keys: ['input_1']

I'm building a tfx pipeline based on the cifar10 example : [https://github.com/tensorflow/tfx/tree/master/tfx/examples/cifar10]
The difference is that I don't want to convert it to tf_lite model and instead use a regular keras based tensorflow model.
Everything works as expected until I get to the Evaluator component as it fails with the following error:
ValueError: Missing data for input "input_1". You passed a data dictionary with keys ['image_xf']. Expected the following keys: ['input_1']
[while running 'Run[Trainer]']
Not sure what I'm doing wrong, but so far I debugged/modified the code as follows:
[1] The preprocessing_fn output is outputting the key image_xf:
_IMAGE_KEY = 'image'
_LABEL_KEY = 'label'
def _transformed_name(key):
  return key + '_xf'
def preprocessing_fn(inputs):
  """tf.transform's callback function for preprocessing inputs.
  Args:
    inputs: map from feature keys to raw not-yet-transformed features.
  Returns:
    Map from string feature key to transformed feature operations.
  """
  outputs = {}
  # tf.io.decode_png function cannot be applied on a batch of data.
  # We have to use tf.map_fn
  image_features = tf.map_fn(
      lambda x: tf.io.decode_png(x[0], channels=3),
      inputs[_IMAGE_KEY],
      dtype=tf.uint8)
  # image_features = tf.cast(image_features, tf.float32)
  image_features = tf.image.resize(image_features, [224, 224])
  image_features = tf.keras.applications.mobilenet.preprocess_input(
      image_features)
  outputs[_transformed_name(_IMAGE_KEY)] = image_features
  #outputs["input_1"] = image_features
  # TODO(b/157064428): Support label transformation for Keras.
  # Do not apply label transformation as it will result in wrong evaluation.
  outputs[_transformed_name(_LABEL_KEY)] = inputs[_LABEL_KEY]
  return outputs
[2] When I build the model, I am using transfer learning with an inputLayer with the same name image_xf.
def _build_keras_model() -> tf.keras.Model:
  """Creates a Image classification model with MobileNet backbone.
  Returns:
    The image classification Keras Model and the backbone MobileNet model
  """
  # We create a MobileNet model with weights pre-trained on ImageNet.
  # We remove the top classification layer of the MobileNet, which was
  # used for classifying ImageNet objects. We will add our own classification
  # layer for CIFAR10 later. We use average pooling at the last convolution
  # layer to get a 1D vector for classification, which is consistent with the
  # origin MobileNet setup
  base_model = tf.keras.applications.MobileNet(
      input_shape=(224, 224, 3),
      include_top=False,
      weights='imagenet',
      pooling='avg')
  base_model.input_spec = None
  # We add a Dropout layer at the top of MobileNet backbone we just created to
  # prevent overfitting, and then a Dense layer to classifying CIFAR10 objects
  model = tf.keras.Sequential([
      tf.keras.layers.InputLayer(
          input_shape=(224, 224, 3), name=_transformed_name(_IMAGE_KEY)),
      base_model,
      tf.keras.layers.Dropout(0.1),
      tf.keras.layers.Dense(10, activation='softmax')
  ])
[3] The model signature is created accordingly:
def _get_serve_image_fn(model, tf_transform_output):
  """Returns a function that feeds the input tensor into the model."""
  model.tft_layer = tf_transform_output.transform_features_layer()
  @tf.function
  def serve_image_fn(serialized_tf_examples):
    feature_spec = tf_transform_output.raw_feature_spec()
    feature_spec.pop(_LABEL_KEY)
    parsed_features = tf.io.parse_example(serialized_tf_examples, feature_spec)
    transformed_features = model.tft_layer(parsed_features)
    return model(transformed_features)
  return serve_image_fn
def run_fn(fn_args: FnArgs):
  tf_transform_output = tft.TFTransformOutput(fn_args.transform_output)
  signatures = {
      'serving_default':
          _get_serve_image_fn(model,tf_transform_output).get_concrete_function(
              tf.TensorSpec(
                  shape=[None],
                  dtype=tf.string,
                  name=_IMAGE_KEY))
  }
  temp_saving_model_dir = os.path.join(fn_args.serving_model_dir)
  model.save(temp_saving_model_dir, save_format='tf', signatures=signatures)
Now, I suspect that tensorflow is not saving the model correctly because when I export the saved model, the input layer is input_1 instead of image_xf.
import tensorflow as tf
import numpy as np
import tensorflow.python.ops.numpy_ops.np_config as np_config
np_config.enable_numpy_behavior()
path = './model/Format-Serving/'
imported = tf.saved_model.load(path)
model = tf.keras.models.load_model(path)
print(model.summary())
print(list(imported.signatures.keys()))
print(model.get_layer('mobilenet_1.00_224').layers[0].name)
The thing to notice here is (1) that the Input layer I added in Sequential model above is missing and (2) the mobilenet first layer is input_1, so it makes sense why I'm getting a mismatch.
2021-10-15 08:33:40.683034: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
mobilenet_1.00_224 (Function (None, 1024)              3228864
_________________________________________________________________
dropout (Dropout)            (None, 1024)              0
_________________________________________________________________
dense (Dense)                (None, 10)                10250
=================================================================
Total params: 3,239,114
Trainable params: 1,074,186
Non-trainable params: 2,164,928
_________________________________________________________________
None
['serving_default']
input_1
So how can I actually get the model to save correctly with the right input?
Here is the full code:
pipeline.py
# Lint as: python2, python3
# Copyright 2019 Google LLC. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""CIFAR10 image classification example using TFX.
This example demonstrates how to do data augmentation, transfer learning,
and inserting TFLite metadata with TFX.
The trained model can be plugged into MLKit for object detection.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import logging
import os
from typing import List, Text
import absl
from tfx import v1 as tfx
import tensorflow_model_analysis as tfma
from tfx.components import Evaluator
from tfx.components import ExampleValidator
from tfx.components import ImportExampleGen
from tfx.components import Pusher
from tfx.components import SchemaGen
from tfx.components import StatisticsGen
from tfx.components import Trainer
from tfx.components import Transform
from tfx.dsl.components.common import resolver
from tfx.dsl.experimental import latest_blessed_model_resolver
from tfx.orchestration import metadata
from tfx.orchestration import pipeline
from tfx.orchestration.beam.beam_dag_runner import BeamDagRunner
from tfx.proto import example_gen_pb2
from tfx.proto import pusher_pb2
from tfx.proto import trainer_pb2
from tfx.types import Channel
from tfx.types.standard_artifacts import Model
from tfx.types.standard_artifacts import ModelBlessing
_pipeline_name = 'cifar10_native_keras'
# This example assumes that CIFAR10 train set data is stored in
# ~/cifar10/data/train, test set data is stored in ~/cifar10/data/test, and
# the utility function is in ~/cifar10. Feel free to customize as needed.
_cifar10_root = os.path.join(os.getcwd())
_data_root = os.path.join(_cifar10_root, 'data')
# Python module files to inject customized logic into the TFX components. The
# Transform and Trainer both require user-defined functions to run successfully.
_module_file = os.path.join(_cifar10_root, 'cifar10_utils_native_keras.py')
# Path which can be listened to by the model server. Pusher will output the
# trained model here.
_serving_model_dir_lite = os.path.join(_cifar10_root, 'serving_model_lite',
_pipeline_name)
# Directory and data locations. This example assumes all of the images,
# example code, and metadata library is relative to $HOME, but you can store
# these files anywhere on your local filesystem.
_tfx_root = os.path.join(os.getcwd(), 'tfx')
_pipeline_root = os.path.join(_tfx_root, 'pipelines', _pipeline_name)
# Sqlite ML-metadata db path.
_metadata_path = os.path.join(_tfx_root, 'metadata', _pipeline_name,
'metadata.db')
# Path to labels file for mapping model outputs.
_labels_path = os.path.join(_data_root, 'labels.txt')
# Pipeline arguments for Beam powered Components.
_beam_pipeline_args = [
'--direct_running_mode=multi_processing',
'--direct_num_workers=0',
]
def _create_pipeline(pipeline_name: Text, pipeline_root: Text, data_root: Text,
                     module_file: Text, serving_model_dir_lite: Text,
                     metadata_path: Text,
                     labels_path: Text,
                     beam_pipeline_args: List[Text]) -> pipeline.Pipeline:
  """Implements the CIFAR10 image classification pipeline using TFX."""
  # This is needed for datasets with pre-defined splits
  # Change the pattern argument to train_whole/* and test_whole/* to train
  # on the whole CIFAR-10 dataset
  input_config = example_gen_pb2.Input(splits=[
      example_gen_pb2.Input.Split(name='train', pattern='train/*'),
      example_gen_pb2.Input.Split(name='eval', pattern='test/*')
  ])
  # Brings data into the pipeline.
  example_gen = ImportExampleGen(
      input_base=data_root, input_config=input_config)
  # Computes statistics over data for visualization and example validation.
  statistics_gen = StatisticsGen(examples=example_gen.outputs['examples'])
  # Generates schema based on statistics files.
  schema_gen = SchemaGen(
      statistics=statistics_gen.outputs['statistics'], infer_feature_shape=True)
  # Performs anomaly detection based on statistics and data schema.
  example_validator = ExampleValidator(
      statistics=statistics_gen.outputs['statistics'],
      schema=schema_gen.outputs['schema'])
  # Performs transformations and feature engineering in training and serving.
  transform = Transform(
      examples=example_gen.outputs['examples'],
      schema=schema_gen.outputs['schema'],
      module_file=module_file)
  model_resolver = resolver.Resolver(
      #instance_name='latest_model_resolver',
      strategy_class=tfx.dsl.experimental.LatestArtifactStrategy,
      model=Channel(type=Model)).with_id('latest_blessed_model_resolver')
  # Uses user-provided Python function that trains a model.
  # When training on the whole dataset, use 18744 for train steps, 156 for eval
  # steps. 18744 train steps correspond to 24 epochs on the whole train set, and
  # 156 eval steps correspond to 1 epoch on the whole test set. The
  # configuration below is for training on the dataset we provided in the data
  # folder, which has 128 train and 128 test samples. The 160 train steps
  # correspond to 40 epochs on this tiny train set, and 4 eval steps correspond
  # to 1 epoch on this tiny test set.
  trainer = Trainer(
      module_file=module_file,
      examples=transform.outputs['transformed_examples'],
      transform_graph=transform.outputs['transform_graph'],
      schema=schema_gen.outputs['schema'],
      base_model=model_resolver.outputs['model'],
      train_args=trainer_pb2.TrainArgs(num_steps=160),
      eval_args=trainer_pb2.EvalArgs(num_steps=4),
      custom_config={'labels_path': labels_path})
  # Get the latest blessed model for model validation.
  # model_resolver = resolver.Resolver(
  #     strategy_class=latest_blessed_model_resolver.LatestBlessedModelResolver,
  #     model=Channel(type=Model),
  #     model_blessing=Channel(
  #         type=ModelBlessing)).with_id('latest_blessed_model_resolver')
  # Uses TFMA to compute evaluation statistics over features of a model and
  # perform quality validation of a candidate model (compare to a baseline).
  eval_config = tfma.EvalConfig(
      model_specs=[tfma.ModelSpec(label_key='label')],
      slicing_specs=[tfma.SlicingSpec()],
      metrics_specs=[
          tfma.MetricsSpec(metrics=[
              tfma.MetricConfig(
                  class_name='SparseCategoricalAccuracy',
                  threshold=tfma.MetricThreshold(
                      value_threshold=tfma.GenericValueThreshold(
                          lower_bound={'value': 0.55}),
                      # Change threshold will be ignored if there is no
                      # baseline model resolved from MLMD (first run).
                      change_threshold=tfma.GenericChangeThreshold(
                          direction=tfma.MetricDirection.HIGHER_IS_BETTER,
                          absolute={'value': -1e-3})))
          ])
      ])
  # Uses TFMA to compute the evaluation statistics over features of a model.
  # We evaluate using the materialized examples that are output by Transform
  # because
  # 1. the decoding_png function currently performed within Transform are not
  # compatible with TFLite.
  # 2. MLKit requires deserialized (float32) tensor image inputs
  # Note that for deployment, the same logic that is performed within Transform
  # must be reproduced client-side.
  evaluator = Evaluator(
      examples=example_gen.outputs['examples'],
      model=trainer.outputs['model'],
      #baseline_model=model_resolver.outputs['model'],
      eval_config=eval_config)
  # Checks whether the model passed the validation steps and pushes the model
  # to a file destination if check passed.
  pusher = Pusher(
      model=trainer.outputs['model'],
      model_blessing=evaluator.outputs['blessing'],
      push_destination=pusher_pb2.PushDestination(
          filesystem=pusher_pb2.PushDestination.Filesystem(
              base_directory=serving_model_dir_lite)))
  components = [
      example_gen, statistics_gen, schema_gen, example_validator, transform,
      trainer, model_resolver, evaluator, pusher
  ]
  return pipeline.Pipeline(
      pipeline_name=pipeline_name,
      pipeline_root=pipeline_root,
      components=components,
      enable_cache=True,
      metadata_connection_config=metadata.sqlite_metadata_connection_config(
          metadata_path),
      beam_pipeline_args=beam_pipeline_args)
# To run this pipeline from the python CLI:
# $python cifar_pipeline_native_keras.py
if __name__ == '__main__':
  loggers = [logging.getLogger(name) for name in logging.root.manager.loggerDict]
  for logger in loggers:
    logger.setLevel(logging.INFO)
  logging.getLogger().setLevel(logging.INFO)
  absl.logging.set_verbosity(absl.logging.FATAL)
  BeamDagRunner().run(
      _create_pipeline(
          pipeline_name=_pipeline_name,
          pipeline_root=_pipeline_root,
          data_root=_data_root,
          module_file=_module_file,
          serving_model_dir_lite=_serving_model_dir_lite,
          metadata_path=_metadata_path,
          labels_path=_labels_path,
          beam_pipeline_args=_beam_pipeline_args))
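The step counts quoted in the Trainer comment follow from dataset size, batch size and epoch count. Below is a small sketch of that arithmetic, using the batch sizes that appear in the utils file (32 for the tiny dataset, 64 for the full one). `steps_for` is a hypothetical helper, and dropping the partial last batch is an assumption that happens to reproduce the quoted numbers.

```python
def steps_for(num_samples, batch_size, epochs):
    """Number of steps for `epochs` passes over the data, dropping partial batches."""
    steps_per_epoch = num_samples // batch_size
    return steps_per_epoch * epochs

# Tiny dataset shipped with the example: 128 train and 128 test samples.
tiny_train_steps = steps_for(128, 32, 40)    # 4 steps per epoch, 40 epochs
tiny_eval_steps = steps_for(128, 32, 1)      # one pass over the test set

# Whole CIFAR-10: 50000 train and 10000 test samples.
full_train_steps = steps_for(50000, 64, 24)  # 781 steps per epoch, 24 epochs
full_eval_steps = steps_for(10000, 64, 1)

print(tiny_train_steps, tiny_eval_steps, full_train_steps, full_eval_steps)
```

These values match the 160 and 4 passed to `TrainArgs` and `EvalArgs` above, and the 18744 and 156 suggested for the whole dataset.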
utils file:
# Lint as: python2, python3
# Copyright 2019 Google LLC. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Python source file includes CIFAR10 utils for Keras model.
The utilities in this file are used to build a model with native Keras.
This module file will be used in Transform and generic Trainer.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
from typing import List, Text
import absl
import tensorflow as tf
import tensorflow_transform as tft
from tfx.components.trainer.fn_args_utils import DataAccessor
from tfx.components.trainer.fn_args_utils import FnArgs
from tfx.components.trainer.rewriting import converters
from tfx.components.trainer.rewriting import rewriter
from tfx.components.trainer.rewriting import rewriter_factory
from tfx.dsl.io import fileio
from tfx_bsl.tfxio import dataset_options
# import flatbuffers
# from tflite_support import metadata_schema_py_generated as _metadata_fb
# from tflite_support import metadata as _metadata
# When training on the whole dataset use following constants instead.
# This setting should give ~91% accuracy on the whole test set
# _TRAIN_DATA_SIZE = 50000
# _EVAL_DATA_SIZE = 10000
# _TRAIN_BATCH_SIZE = 64
# _EVAL_BATCH_SIZE = 64
# _CLASSIFIER_LEARNING_RATE = 3e-4
# _FINETUNE_LEARNING_RATE = 5e-5
# _CLASSIFIER_EPOCHS = 12
_TRAIN_DATA_SIZE = 128
_EVAL_DATA_SIZE = 128
_TRAIN_BATCH_SIZE = 32
_EVAL_BATCH_SIZE = 32
_CLASSIFIER_LEARNING_RATE = 1e-3
_FINETUNE_LEARNING_RATE = 7e-6
_CLASSIFIER_EPOCHS = 30
_IMAGE_KEY = 'image'
_LABEL_KEY = 'label'
_TFLITE_MODEL_NAME = 'tflite'
def _transformed_name(key):
  return key + '_xf'
def _get_serve_image_fn(model, tf_transform_output):
  """Returns a function that feeds the input tensor into the model."""
  model.tft_layer = tf_transform_output.transform_features_layer()
  @tf.function
  def serve_image_fn(serialized_tf_examples):
    feature_spec = tf_transform_output.raw_feature_spec()
    feature_spec.pop(_LABEL_KEY)
    parsed_features = tf.io.parse_example(serialized_tf_examples, feature_spec)
    transformed_features = model.tft_layer(parsed_features)
    return model(transformed_features)
  return serve_image_fn
def _image_augmentation(image_features):
  """Perform image augmentation on batches of images.
  Args:
    image_features: a batch of image features
  Returns:
    The augmented image features
  """
  batch_size = tf.shape(image_features)[0]
  image_features = tf.image.random_flip_left_right(image_features)
  image_features = tf.image.resize_with_crop_or_pad(image_features, 250, 250)
  image_features = tf.image.random_crop(image_features,
                                        (batch_size, 224, 224, 3))
  return image_features
def _data_augmentation(feature_dict):
  """Perform data augmentation on batches of data.
  Args:
    feature_dict: a dict containing features of samples
  Returns:
    The feature dict with augmented features
  """
  image_features = feature_dict[_transformed_name(_IMAGE_KEY)]
  image_features = _image_augmentation(image_features)
  feature_dict[_transformed_name(_IMAGE_KEY)] = image_features
  return feature_dict
def _input_fn(file_pattern: List[Text],
              data_accessor: DataAccessor,
              tf_transform_output: tft.TFTransformOutput,
              is_train: bool = False,
              batch_size: int = 200) -> tf.data.Dataset:
  """Generates features and label for tuning/training.
  Args:
    file_pattern: List of paths or patterns of input tfrecord files.
    data_accessor: DataAccessor for converting input to RecordBatch.
    tf_transform_output: A TFTransformOutput.
    is_train: Whether the input dataset is train split or not.
    batch_size: representing the number of consecutive elements of returned
      dataset to combine in a single batch
  Returns:
    A dataset that contains (features, indices) tuple where features is a
      dictionary of Tensors, and indices is a single Tensor of label indices.
  """
  dataset = data_accessor.tf_dataset_factory(
      file_pattern,
      dataset_options.TensorFlowDatasetOptions(
          batch_size=batch_size, label_key=_transformed_name(_LABEL_KEY)),
      tf_transform_output.transformed_metadata.schema)
  # Apply data augmentation. We have to do data augmentation here because
  # we need to apply data augmentation on-the-fly during training. If we put
  # it in Transform, it will only be applied once on the whole dataset, which
  # will lose the point of data augmentation.
  if is_train:
    dataset = dataset.map(lambda x, y: (_data_augmentation(x), y))
  return dataset
def _freeze_model_by_percentage(model: tf.keras.Model, percentage: float):
  """Freeze part of the model based on specified percentage.
  Args:
    model: The keras model need to be partially frozen
    percentage: the percentage of layers to freeze
  Raises:
    ValueError: Invalid values.
  """
  if percentage < 0 or percentage > 1:
    raise ValueError('Freeze percentage should be between 0.0 and 1.0')
  if not model.trainable:
    raise ValueError(
        'The model is not trainable, please set model.trainable to True')
  num_layers = len(model.layers)
  num_layers_to_freeze = int(num_layers * percentage)
  for idx, layer in enumerate(model.layers):
    if idx < num_layers_to_freeze:
      layer.trainable = False
    else:
      layer.trainable = True
def _build_keras_model() -> tf.keras.Model:
  """Creates a Image classification model with MobileNet backbone.
  Returns:
    The image classification Keras Model and the backbone MobileNet model
  """
  # We create a MobileNet model with weights pre-trained on ImageNet.
  # We remove the top classification layer of the MobileNet, which was
  # used for classifying ImageNet objects. We will add our own classification
  # layer for CIFAR10 later. We use average pooling at the last convolution
  # layer to get a 1D vector for classification, which is consistent with the
  # origin MobileNet setup
  base_model = tf.keras.applications.MobileNet(
      input_shape=(224, 224, 3),
      include_top=False,
      weights='imagenet',
      pooling='avg')
  base_model.input_spec = None
  # We add a Dropout layer at the top of MobileNet backbone we just created to
  # prevent overfitting, and then a Dense layer to classifying CIFAR10 objects
  model = tf.keras.Sequential([
      tf.keras.layers.InputLayer(
          input_shape=(224, 224, 3), name=_transformed_name(_IMAGE_KEY)),
      base_model,
      tf.keras.layers.Dropout(0.1),
      tf.keras.layers.Dense(10, activation='softmax')
  ])
  # Freeze the whole MobileNet backbone to first train the top classifier only
  _freeze_model_by_percentage(base_model, 1.0)
  model.compile(
      loss='sparse_categorical_crossentropy',
      optimizer=tf.keras.optimizers.RMSprop(lr=_CLASSIFIER_LEARNING_RATE),
      metrics=['sparse_categorical_accuracy'])
  model.summary(print_fn=absl.logging.info)
  return model, base_model
# TFX Transform will call this function.
def preprocessing_fn(inputs):
  """tf.transform's callback function for preprocessing inputs.
  Args:
    inputs: map from feature keys to raw not-yet-transformed features.
  Returns:
    Map from string feature key to transformed feature operations.
  """
  outputs = {}
  # tf.io.decode_png function cannot be applied on a batch of data.
  # We have to use tf.map_fn
  image_features = tf.map_fn(
      lambda x: tf.io.decode_png(x[0], channels=3),
      inputs[_IMAGE_KEY],
      dtype=tf.uint8)
  # image_features = tf.cast(image_features, tf.float32)
  image_features = tf.image.resize(image_features, [224, 224])
  image_features = tf.keras.applications.mobilenet.preprocess_input(
      image_features)
  outputs[_transformed_name(_IMAGE_KEY)] = image_features
  #outputs["input_1"] = image_features
  # TODO(b/157064428): Support label transformation for Keras.
  # Do not apply label transformation as it will result in wrong evaluation.
  outputs[_transformed_name(_LABEL_KEY)] = inputs[_LABEL_KEY]
  return outputs
# TFX Trainer will call this function.
def run_fn(fn_args: FnArgs):
  """Train the model based on given args.
  Args:
    fn_args: Holds args used to train the model as name/value pairs.
  Raises:
    ValueError: if invalid inputs.
  """
  tf_transform_output = tft.TFTransformOutput(fn_args.transform_output)
  baseline_path = fn_args.base_model
  if baseline_path is not None:
    model = tf.keras.models.load_model(os.path.join(baseline_path))
  else:
    train_dataset = _input_fn(
        fn_args.train_files,
        fn_args.data_accessor,
        tf_transform_output,
        is_train=True,
        batch_size=_TRAIN_BATCH_SIZE)
    eval_dataset = _input_fn(
        fn_args.eval_files,
        fn_args.data_accessor,
        tf_transform_output,
        is_train=False,
        batch_size=_EVAL_BATCH_SIZE)
    model, base_model = _build_keras_model()
    absl.logging.info('Tensorboard logging to {}'.format(fn_args.model_run_dir))
    # Write logs to path
    tensorboard_callback = tf.keras.callbacks.TensorBoard(
        log_dir=fn_args.model_run_dir, update_freq='batch')
    # Our training regime has two phases: we first freeze the backbone and train
    # the newly added classifier only, then unfreeze part of the backbone and
    # fine-tune with classifier jointly.
    steps_per_epoch = int(_TRAIN_DATA_SIZE / _TRAIN_BATCH_SIZE)
    total_epochs = int(fn_args.train_steps / steps_per_epoch)
    if _CLASSIFIER_EPOCHS > total_epochs:
      raise ValueError('Classifier epochs is greater than the total epochs')
    absl.logging.info('Start training the top classifier')
    model.fit(
        train_dataset,
        epochs=_CLASSIFIER_EPOCHS,
        steps_per_epoch=steps_per_epoch,
        validation_data=eval_dataset,
        validation_steps=fn_args.eval_steps,
        callbacks=[tensorboard_callback])
    absl.logging.info('Start fine-tuning the model')
    # Unfreeze the top MobileNet layers and do joint fine-tuning
    _freeze_model_by_percentage(base_model, 0.9)
    # We need to recompile the model because layer properties have changed
    model.compile(
        loss='sparse_categorical_crossentropy',
        optimizer=tf.keras.optimizers.RMSprop(lr=_FINETUNE_LEARNING_RATE),
        metrics=['sparse_categorical_accuracy'])
    model.summary(print_fn=absl.logging.info)
    model.fit(
        train_dataset,
        initial_epoch=_CLASSIFIER_EPOCHS,
        epochs=total_epochs,
        steps_per_epoch=steps_per_epoch,
        validation_data=eval_dataset,
        validation_steps=fn_args.eval_steps,
        callbacks=[tensorboard_callback])
  # Prepare the TFLite model used for serving in MLKit
  signatures = {
      'serving_default':
          _get_serve_image_fn(model,tf_transform_output).get_concrete_function(
              tf.TensorSpec(
                  shape=[None],
                  dtype=tf.string,
                  name=_IMAGE_KEY))
  }
  temp_saving_model_dir = os.path.join(fn_args.serving_model_dir)
  model.save(temp_saving_model_dir, save_format='tf', signatures=signatures)
# tfrw = rewriter_factory.create_rewriter(
# rewriter_factory.TFLITE_REWRITER,
# name='tflite_rewriter')
# converters.rewrite_saved_model(temp_saving_model_dir,
# fn_args.serving_model_dir, tfrw,
# rewriter.ModelType.TFLITE_MODEL)
# # Add necessary TFLite metadata to the model in order to use it within MLKit
# # TODO(dzats#): Handle label map file path more properly, currently
# # hard-coded.
# tflite_model_path = os.path.join(fn_args.serving_model_dir,
# _TFLITE_MODEL_NAME)
# # TODO(dzats#): Extend the TFLite rewriter to be able to add TFLite metadata
# ## to the model.
# _write_metadata(
# model_path=tflite_model_path,
# label_map_path=fn_args.custom_config['labels_path'],
# mean=[127.5],
# std=[127.5])
# fileio.rmtree(temp_saving_model_dir)
OK, I found the answer. The model expects an input named input_1, so in _get_serve_image_fn I need to store the transformed image under that key in the feature dictionary:
def _get_serve_image_fn(model, tf_transform_output):
    """Returns a function that feeds the input tensor into the model."""
    model.tft_layer = tf_transform_output.transform_features_layer()
    @tf.function
    def serve_image_fn(serialized_tf_examples):
        feature_spec = tf_transform_output.raw_feature_spec()
        feature_spec.pop(_LABEL_KEY)
        parsed_features = tf.io.parse_example(serialized_tf_examples, feature_spec)
        transformed_features = model.tft_layer(parsed_features)
        # Re-key the transformed image under the name of the backbone's input layer (input_1).
        input_name = model.get_layer('mobilenet_1.00_224').layers[0].name
        transformed_features[input_name] = transformed_features.pop(_transformed_name(_IMAGE_KEY))
        return model(transformed_features)
    return serve_image_fn

CNN sudden drop in accuracy after steady increase over ~24 epochs

I'm attempting to create a CNN to classify emotions based on facial expressions from an image. I'm using this dataset I found on Kaggle, in csv format with each row having grayscale image data and emotion. It has ~30,000 data points randomly split into 80% training, 10% validation and 10% testing.
While adjusting the settings and structure of my CNN, I've experienced at least one of these problems at a time:
The CNN only (or mainly) predicts one or two outputs.
Validation accuracy that does not change at all.
Accuracy fluctuating around the equivalent of random guessing (16.66%) +/- 10%.
Constant accuracy and loss for both training and validation data.
I've tried changing learning rate, optimizer, batch size, number of filters, epochs. I've used a different data set and tried class weights for slight imbalance as well as varying numbers of hidden layers and complexities.
I've also tried training on one sample but got this for accuracy and this for loss.
The following graphs are for the configuration currently in the code, though changing what I've listed above just results in any of the problems I mentioned.
Train/Val Accuracy during training
Train/Val Loss during training
I suspected it may be due to labels being mismatched with data, so I manually checked whether labels and images matched in train_set, which they did.
However using the following code to check them in all_sets (after conversion) gave several mismatched data samples out of 16.
for num, row in enumerate(all_sets[0]):
if num < 16:
newimage = tensorflow.keras.preprocessing.image.array_to_img(row)
newimage.save("filename.png") # takes type from filename extension
display(Image('filename.png'))
print(classes[all_labels[0][num]])
!rm -rf filename.png
Conversion code in question:
tempSets = [train_set, test_set, val_set] #store all sets in list to iterate through
#panda reads image from CSV as a string rather than an array of floats.
#convert from panda dataframe to numpy array, as easier to feed into image augmentation
set_sizes = [train_set.shape[0] , test_set.shape[0], val_set.shape[0]] #Array storing number of records for each set
all_sets = [np.empty([set_sizes[0], 48,48,1], dtype=float), # train
np.empty([set_sizes[1], 48,48,1], dtype=float), # test
np.empty([set_sizes[2], 48,48,1], dtype=float)] # validate
all_labels = [np.empty(set_sizes[0], dtype=int), # train
np.empty(set_sizes[1], dtype=int), # test
np.empty(set_sizes[2], dtype=int)] # validate
for count, val in enumerate(tempSets): # for each set
for num, row in enumerate(val.itertuples()): #each row in a set, store as a tuple and keep index num
num_str_list = row._2.split() # split the long string into a list of strings separated by whitespace (_2 is the ' pixels' column; itertuples renames it because the name is not a valid identifier)
for convertI in range(len(num_str_list)): # for each element in string array
num_str_list[convertI] = float(num_str_list[convertI]) #convert to float and store in new (1d) array
for xPixel in range(48):
for yPixel in range(48):
#match indexes to map 1d array contents into 2d
all_sets[count][num, xPixel, yPixel, 0] = num_str_list[xPixel*48 + yPixel]
all_labels[count][num] = data['emotion'][num] #assign labels with matching index
I'm afraid I'm lost as to where the issue is, as I can't find problems in my conversion or in displaying the images. I would appreciate any help. Thank you.
My full code:
drive.mount('/content/drive', force_remount=False)
driveContent='/content/drive/My Drive/EmotionClassification'
#Set base_dir as current working directory
base_dir = os.getcwd()
#Check whether training data is present in current working directory
dataSetIsInColab = False
for fname in os.listdir(base_dir): #get all names in current directory
if fname == 'icml_face_data.csv':
dataSetIsInColab = True
if dataSetIsInColab == False:
#dataset.zip is a zip file containing a CSV file which holds the dataset.
zip_path = os.path.join(driveContent,'dataset.zip') #grab content from drive
!cp "{zip_path}" . #Copy zip file to current working directory
!unzip -q dataset.zip # Unzip (-q prevents printing all files)
!rm dataset.zip #Remove zip file since we now have unzipped data
data = panda.read_csv('icml_face_data.csv') # import data as panda dataframe object
data.pop(' Usage') #remove unnecessary column
data.rename(columns={" pixels":"pixels"}) # remove space from column name as it can cause issues
data[data['emotion']!=1] #remove disgust emotion due to large imbalance
data.loc[data['emotion'] > 1, 'emotion'] = data['emotion'] - 1 #shift emotion values to fill disgust slot
classes = ['anger', 'fear', 'happiness', 'sadness', 'surprise', 'neutral']
#Use panda dataframe methods to randomly split dataset
train_set = data.sample(frac=0.8) # Train set is 80%
temp_set = data.drop(train_set.index) # Temp set is 20%
test_set = temp_set.sample(frac=0.5) #Test set is half of 20%
val_set = temp_set.drop(test_set.index) #Val set is the other half of 20%
#train_set.reset_index(drop=True, inplace=True)
#test_set.reset_index(drop=True, inplace=True)
#val_set.reset_index(drop=True, inplace=True)
tempSets = [train_set, test_set, val_set] #store all sets in list to iterate through
#panda reads image from CSV as a string rather than an array of floats.
#convert from panda dataframe to numpy array, as easier to feed into image augmentation
set_sizes = [train_set.shape[0] , test_set.shape[0], val_set.shape[0]] #Array storing number of records for each set
all_sets = [np.empty([set_sizes[0], 48,48,1], dtype=float), # train
np.empty([set_sizes[1], 48,48,1], dtype=float), # test
np.empty([set_sizes[2], 48,48,1], dtype=float)] # validate
all_labels = [np.empty(set_sizes[0], dtype=int), # train
np.empty(set_sizes[1], dtype=int), # test
np.empty(set_sizes[2], dtype=int)] # validate
for count, val in enumerate(tempSets): # for each set
for num, row in enumerate(val.itertuples()): #each row in a set, store as a tuple and keep index num
num_str_list = row._2.split() # split the long string into a list of strings separated by whitespace (_2 is the ' pixels' column; itertuples renames it because the name is not a valid identifier)
for convertI in range(len(num_str_list)): # for each element in string array
num_str_list[convertI] = float(num_str_list[convertI]) #convert to float and store in new (1d) array
for xPixel in range(48):
for yPixel in range(48):
#match indexes to map 1d array contents into 2d
all_sets[count][num, xPixel, yPixel, 0] = num_str_list[xPixel*48 + yPixel]
all_labels[count][num] = data['emotion'][num] #assign labels with matching index
#Check to use GPU if possible
gpu_name = tensorflow.test.gpu_device_name()
if gpu_name != '/device:GPU:0':
print('GPU device not found')
print('Found GPU at: {}'.format(gpu_name))
#SET UP IMAGE AUGMENTATION AND INPUT
imgShape=(48,48,1) # input image dimensions (greyscale)
batchSize = 64
#this datagen is applied to training sets
dataGen = ImageDataGenerator(
rescale=1./255,
rotation_range=45,
width_shift_range=0.2,
height_shift_range=0.2,
shear_range=0.2,
zoom_range=0.2,
horizontal_flip=True,
vertical_flip = True)
# Set up data generator batches, shapes and class modes
trainingDataGen = dataGen.flow(
x=all_sets[0],
y=tensorflow.keras.utils.to_categorical(all_labels[0]),
batch_size=batchSize)
testGen = ImageDataGenerator(rescale=1./255).flow(
x=all_sets[1],
y=tensorflow.keras.utils.to_categorical(all_labels[1]))
validGen = ImageDataGenerator(rescale=1./255).flow(
x=all_sets[2],
y=tensorflow.keras.utils.to_categorical(all_labels[2]),
batch_size=batchSize)
# BUILD THE MODEL
model = Sequential()
model.add(Conv2D(256, (3,3), activation='relu', input_shape = imgShape))
model.add(MaxPooling2D((2,2)))
model.add(Dropout(0.1))
model.add(Conv2D(512, (3,3), activation='relu'))
model.add(MaxPooling2D((2,2)))
model.add(Dropout(0.1))
model.add(Flatten())
model.add(Dense(32, activation='relu'))
model.add(Dense(6, activation='softmax'))
model.summary() # Show the structure of the network
opt = tensorflow.keras.optimizers.Adam(learning_rate = 0.0001) # initialise optimizer
model.compile(loss='categorical_crossentropy', optimizer=opt, metrics=['accuracy'])
weights = sklearn.utils.class_weight.compute_class_weight('balanced', np.unique(all_labels[0]), all_labels[0]) # use class weights for imbalanced classes
weights = {i : weights[i] for i in range(6)}
history = model.fit(
trainingDataGen,
batch_size = batchSize,
epochs = 32,
class_weight = weights,
validation_data = validGen
)
EDIT: I found the issue in my conversion: I was assigning labels based on the unsplit data instead of each split set individually. I changed the assignment
From all_labels[count][num] = data['emotion'][num] to
all_labels[count][num] = row.emotion
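The fix above can also be sketched in vectorized form, which makes it harder to index the wrong frame by accident. This is only a minimal sketch under stated assumptions, not the notebook's code: split_to_arrays is a hypothetical helper, the column names pixels and emotion are assumed (the frame in the full code still carries ' pixels' with a leading space), and the demo uses 2x2 images in place of the real 48x48 ones.

```python
import numpy as np
import pandas as pd

def split_to_arrays(frame, pixel_col="pixels", label_col="emotion", side=48):
    """Convert one split into (images, labels) using only that split's own rows.

    Hypothetical helper: column names and image side length are assumptions.
    """
    # Parse each space-separated pixel string into a flat float array.
    flat = np.stack([np.array(s.split(), dtype=float) for s in frame[pixel_col]])
    # Row-major reshape matches the xPixel*side + yPixel mapping of the nested loops.
    images = flat.reshape(len(frame), side, side, 1)
    # Labels come positionally from the same frame, never from the unsplit data.
    labels = frame[label_col].to_numpy(dtype=int)
    return images, labels

# Tiny synthetic stand-in for the real CSV.
demo = pd.DataFrame({
    "emotion": [0, 3, 5, 2],
    "pixels": ["0 1 2 3", "4 5 6 7", "8 9 10 11", "12 13 14 15"],
})
subset = demo.iloc[[2, 0]]  # rows out of order, like a randomly sampled split
demo_images, demo_labels = split_to_arrays(subset, side=2)
```

Because both arrays are built from the same frame in the same order, the position of a row inside the split never has to agree with its index in the original data.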
Labels are matching now. Training accuracy/loss steadily improves, but validation accuracy and loss are fluctuating a lot. I'm going to do some more tweaking, but is this something I should worry about?
EDIT 2: Made title more fitting
So I left it training for more epochs, and here are the results.
accuracy and loss.
Peaking at 35% isn't ideal
EDIT 3: It's back to fluctuating wildly.
accuracy and loss.
I used some code to test the predictions of my network after training. Here are the results:
Total tests performed: 4068
Total accuracy: 0.17649950835791545
anger : {'Successes': 0, 'Occured': 567} Accuracy: 0.0
fear : {'Successes': 0, 'Occured': 633} Accuracy: 0.0
happiness : {'Successes': 0, 'Occured': 1023} Accuracy: 0.0
sadness : {'Successes': 0, 'Occured': 685} Accuracy: 0.0
surprise : {'Successes': 0, 'Occured': 442} Accuracy: 0.0
neutral : {'Successes': 718, 'Occured': 718} Accuracy: 1.0
Code below:
#Test our model's performance
truth = []
predict = []
#Run 128 batches from the test generator through the model
for i in range(128):
a,b = next(testGen)
predict.append(model.predict(a))
truth.append(b)
predict = np.concatenate(predict) #Array of predictions model has made
truth = np.concatenate(truth) #Array of true results
successes = 0
items = []
for i in range(len(classes)):
items.append({"Successes" : 0, "Occured" : 0})
for i in range(0, len(predict)):
items[np.argmax(truth[i])]["Occured"] += 1
if (np.argmax(predict[i]) == np.argmax(truth[i])): # If the model's prediction matches the true value
successes += 1
items[np.argmax(truth[i])]["Successes"] += 1
print("Total tests performed: ", len(predict))
print("Total accuracy: ", successes/len(predict))
print()
for i in range(len(classes)):
print(classes[i], ':', items[i], "Accuracy: ", (items[i]["Successes"]/items[i]["Occured"]))

I want to convert my binary classification model to a multiclass classification model; labels are taken from directory names

My code below works fine for classifying images from two directories into two categories; it takes the labels from the directory names. When I add a third directory (a third label), it stops working with the error posted below. Can someone help me solve this?
I have tried removing the NumPy array; I saw somewhere that I just need to pass the images through a CNN, but I couldn't get that to work.
I am trying to build a classifier for pneumonia caused by a coronavirus and other diseases using frontal chest X-rays.
from tensorflow.keras.preprocessing.image import ImageDataGeneratorfrom
from tensorflow.keras.applications import VGG16
from tensorflow.keras.layers import AveragePooling2D
from tensorflow.keras.layers import Dropout
from tensorflow.keras.layers import Flatten
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import Input
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.utils import to_categorical
from sklearn.preprocessing import LabelBinarizer
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
from imutils import paths
import matplotlib.pyplot as plt
import numpy as np
import argparse
import cv2
import os
# construct the argument parser and parse the arguments
# initialize the initial learning rate, number of epochs to train for,
# and batch size
INIT_LR = 1e-3
EPOCHS = 40
BS = 66
# grab the list of images in our dataset directory, then initialize
# the list of data (i.e., images) and class images
print("[INFO] loading images...")
imagePaths = list(paths.list_images('/content/drive/My Drive/testset/'))
data = []
labels = []
# loop over the image paths
for imagePath in imagePaths:
# extract the class label from the filename
label = imagePath.split(os.path.sep)[-2]
# load the image, swap color channels, and resize it to be a fixed
# 224x224 pixels while ignoring aspect ratio
image = cv2.imread(imagePath)
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
image = cv2.resize(image, (224, 224))
# update the data and labels lists, respectively
data.append(image)
labels.append(label)
# convert the data and labels to NumPy arrays while scaling the pixel
# intensities to the range [0, 1]
data = np.array(data) / 255.0
labels = np.array(labels)
# perform one-hot encoding on the labels
lb = LabelBinarizer()
labels = lb.fit_transform(labels)
labels = to_categorical(labels)
# partition the data into training and testing splits using 80% of
# the data for training and the remaining 20% for testing
(trainX, testX, trainY, testY) = train_test_split(data, labels,
test_size=0.20, stratify=labels, random_state=42)
# initialize the training data augmentation object
trainAug = ImageDataGenerator(
rotation_range=15,
fill_mode="nearest")
# load the VGG16 network, ensuring the head FC layer sets are left
# off
baseModel = VGG16(weights="imagenet", include_top=False,
input_tensor=Input(shape=(224, 224, 3)))
# construct the head of the model that will be placed on top of the
# the base model
headModel = baseModel.output
headModel = AveragePooling2D(pool_size=(4, 4))(headModel)
headModel = Flatten(name="flatten")(headModel)
headModel = Dense(64, activation="relu")(headModel)
headModel = Dropout(0.5)(headModel)
headModel = Dense(2, activation="softmax")(headModel)
# place the head FC model on top of the base model (this will become
# the actual model we will train)
model = Model(inputs=baseModel.input, outputs=headModel)
# loop over all layers in the base model and freeze them so they will
# *not* be updated during the first training process
for layer in baseModel.layers:
layer.trainable = False
# compile our model
print("[INFO] compiling model...")
opt = Adam(lr=INIT_LR, decay=INIT_LR / EPOCHS)
model.compile(loss="binary_crossentropy", optimizer=opt, metrics=["accuracy"])
# train the head of the network
print("[INFO] training head...")
H = model.fit(
trainAug.flow(trainX, trainY, batch_size=BS),
steps_per_epoch=len(trainX) // BS,
validation_data=(testX, testY),
validation_steps=len(testX) // BS,
epochs=EPOCHS)
# make predictions on the testing set
print("[INFO] evaluating network...")
predIdxs = model.predict(testX, batch_size=BS)
# for each image in the testing set we need to find the index of the
# label with corresponding largest predicted probability
predIdxs = np.argmax(predIdxs, axis=1)
# show a nicely formatted classification report
print(classification_report(testY.argmax(axis=1), predIdxs,
target_names=lb.classes_))
# compute the confusion matrix and use it to derive the raw
# accuracy, sensitivity, and specificity
cm = confusion_matrix(testY.argmax(axis=1), predIdxs)
total = sum(sum(cm))
acc = (cm[0, 0] + cm[1, 1]) / total
sensitivity = cm[0, 0] / (cm[0, 0] + cm[0, 1])
specificity = cm[1, 1] / (cm[1, 0] + cm[1, 1])
# show the confusion matrix, accuracy, sensitivity, and specificity
print(cm)
print("acc: {:.4f}".format(acc))
print("sensitivity: {:.4f}".format(sensitivity))
print("specificity: {:.4f}".format(specificity))
# plot the training loss and accuracy
N = EPOCHS
plt.style.use("ggplot")
plt.figure()
plt.plot(np.arange(0, N), H.history["loss"], label="train_loss")
plt.plot(np.arange(0, N), H.history["val_loss"], label="val_loss")
plt.plot(np.arange(0, N), H.history["accuracy"], label="train_acc")
plt.plot(np.arange(0, N), H.history["val_accuracy"], label="val_acc")
plt.title("Training Loss and Accuracy on COVID-19 Dataset")
plt.xlabel("Epoch #")
plt.ylabel("Loss/Accuracy")
plt.legend(loc="lower left")
plt.savefig("plot.png")
# serialize the model to disk
print("[INFO] saving COVID-19 detector model...")
model.save('/content/drive/My Drive/setcovid/model.h5', )
This is the error I got in my program
There are a few changes you need to make for this to work. The error you're getting comes from the one-hot encoding: you are one-hot encoding your labels twice.
lb = LabelBinarizer()
labels = lb.fit_transform(labels)
labels = to_categorical(labels)
Remove the last line (the to_categorical call). LabelBinarizer already returns one-hot labels when there are three or more classes, so removing it fixes the error you're getting now.
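To see why the second encoding breaks the label shape, here is a small NumPy-only sketch. It does not call the real libraries: one_hot is a hypothetical stand-in that mimics how to_categorical appends a one-hot axis, and the class names are made up for illustration.

```python
import numpy as np

def one_hot(indices, num_classes):
    """Hypothetical stand-in for to_categorical: append a one-hot axis."""
    return np.eye(num_classes, dtype="float32")[np.asarray(indices, dtype=int)]

# Made-up directory names standing in for the three label folders.
names = np.array(["covid", "normal", "pneumonia", "covid"])
classes, class_ids = np.unique(names, return_inverse=True)

once = one_hot(class_ids, len(classes))  # shape (4, 3): one row per image
twice = one_hot(once, 2)                 # shape (4, 3, 2): every 0/1 encoded again
```

The array encoded once has one column per class, which is what a 3-unit softmax layer expects. The array encoded twice has an extra trailing axis, so its shape no longer matches the model output.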
There is another problem I must mention: your model's output layer has only 2 neurons, but you want to classify 3 classes. Set the output layer to 3 neurons.
headModel = Dense(3, activation="softmax")(headModel)
Since you're now training with 3 classes, the problem is no longer binary, so you need a different loss. I recommend categorical cross-entropy.
model.compile(loss="categorical_crossentropy", optimizer=opt, metrics=["accuracy"])
You also forgot the following imports. Add these too.
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.applications.vgg16 import VGG16
from tensorflow.keras.layers import *
And you're good to go.
By the way, the batch size you're using (66) worries me. I don't know which GPU you have, but I would suggest decreasing it.

Predictions in recorded video using object detection tensorflow API

I am trying to read a video file (using OpenCV), loop over all frames, run TensorFlow's Object Detection API on each frame to get predictions and bounding boxes, and write the predicted frames (with boxes) to a new video file. I used object_detection_tutorial.ipynb with some modifications to capture the video frames and process them with faster-rcnn-inception-resnet-v2 loaded from a frozen graph (after training).
I am using a tesla P100 gpu in a cloud machine with windows 10 and 56GB ram. Also using tensorflow-gpu.
When I run the code, it takes 0.5 seconds per frame. Is this a normal speed for a Tesla P100, or am I doing something wrong in the code that makes it slower?
This code is just a test, as later I will have to use it in a real-time video prediction task. If 0.5 seconds per frame is the expected speed with the TensorFlow API, I don't think I can use it for my task :(
So, after running it, I get the following running times:
processing frame number 1.0
time to capture video frame 0.0
time to predict 0.49225664138793945
time to generate boxes in a frame 0.14833950996398926
time to write a frame in video file 0.04687023162841797
total time in the loop 0.6874663829803467
As you can see, the parts that run on the CPU (OpenCV) are fast. But the prediction step on the GPU (the sess.run call) takes almost 0.5 seconds on its own.
Any advice? Thank you in advance. My code follows below.
from distutils.version import StrictVersion
import numpy as np
import os
import six.moves.urllib as urllib
import sys
import tarfile
import tensorflow as tf
import zipfile
import time
from collections import defaultdict
from io import StringIO
#from matplotlib import pyplot as plt
from PIL import Image
import cv2
from imutils import paths
import re
#This is needed since the code is stored in the object_detection folder.
sys.path.append("..")
from object_detection.utils import ops as utils_ops
if StrictVersion(tf.__version__) < StrictVersion('1.9.0'):
raise ImportError('Please upgrade your TensorFlow installation to v1.9.* or later!')
from utils import label_map_util
from utils import visualization_utils as vis_util
#Detection using tensorflow inside write_video function
def write_video():
filename = 'output/teste_v2.avi'
codec = cv2.VideoWriter_fourcc('W', 'M', 'V', '2')
cap = cv2.VideoCapture('pneu_trim2.mp4')
framerate = round(cap.get(5),2)
w = int(cap.get(3))
h = int(cap.get(4))
resolution = (w, h)
VideoFileOutput = cv2.VideoWriter(filename, codec, framerate, resolution)
################################
# # Model preparation
# ## Variables
#
# Any model exported using the `export_inference_graph.py` tool can be loaded here simply by changing `PATH_TO_FROZEN_GRAPH` to point to a new .pb file.
#
# What model to download.
MODEL_NAME = 'training/pneu_incep_step_24887'
print("loading model from " + MODEL_NAME)
# Path to frozen detection graph. This is the actual model that is used for the object detection.
PATH_TO_FROZEN_GRAPH = MODEL_NAME + '/frozen_inference_graph.pb'
# List of the strings that is used to add correct label for each box.
PATH_TO_LABELS = os.path.join('data', 'object-detection.pbtxt')
NUM_CLASSES = 5
# ## Load a (frozen) Tensorflow model into memory.
time_graph = time.time()
print('loading graphs')
detection_graph = tf.Graph()
with detection_graph.as_default():
od_graph_def = tf.GraphDef()
with tf.gfile.GFile(PATH_TO_FROZEN_GRAPH, 'rb') as fid:
serialized_graph = fid.read()
od_graph_def.ParseFromString(serialized_graph)
tf.import_graph_def(od_graph_def, name='')
print("tempo build graph = " + str(time.time() - time_graph))
# ## Loading label map
label_map = label_map_util.load_labelmap(PATH_TO_LABELS)
categories = label_map_util.convert_label_map_to_categories(label_map, max_num_classes=NUM_CLASSES, use_display_name=True)
category_index = label_map_util.create_category_index(categories)
################################
with tf.Session(graph=detection_graph) as sess:
with detection_graph.as_default():
while (cap.isOpened()):
time_loop = time.time()
print('processing frame number: ' + str(cap.get(1)))
time_captureframe = time.time()
ret, image_np = cap.read()
print("time to capture video frame = " + str(time.time() - time_captureframe))
if (ret != True):
break
# the array based representation of the image will be used later in order to prepare the
# result image with boxes and labels on it.
#image_np = load_image_into_numpy_array(image)
# Expand dimensions since the model expects images to have shape: [1, None, None, 3]
image_np_expanded = np.expand_dims(image_np, axis=0)
image_tensor = detection_graph.get_tensor_by_name('image_tensor:0')
# Each box represents a part of the image where a particular object was detected.
boxes = detection_graph.get_tensor_by_name('detection_boxes:0')
# Each score represents the level of confidence for each of the objects.
# Score is shown on the result image, together with the class label.
scores = detection_graph.get_tensor_by_name('detection_scores:0')
classes = detection_graph.get_tensor_by_name('detection_classes:0')
num_detections = detection_graph.get_tensor_by_name('num_detections:0')
# Actual detection.
time_prediction = time.time()
(boxes, scores, classes, num_detections) = sess.run(
[boxes, scores, classes, num_detections],
feed_dict={image_tensor: image_np_expanded})
print("time to predict = " + str(time.time() - time_prediction))
# Visualization of the results of a detection.
time_visualizeboxes = time.time()
vis_util.visualize_boxes_and_labels_on_image_array(
image_np,
np.squeeze(boxes),
np.squeeze(classes).astype(np.int32),
np.squeeze(scores),
category_index,
use_normalized_coordinates=True,
line_thickness=8)
print("time to generate boxes in a frame = " + str(time.time() - time_visualizeboxes))
time_writeframe = time.time()
VideoFileOutput.write(image_np)
print("time to write a frame in video file = " + str(time.time() - time_writeframe))
print("total time in the loop = " + str(time.time() - time_loop))
cap.release()
VideoFileOutput.release()
print('done')
Actually, the problem is the model you are using.
https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/detection_model_zoo.md
Faster-rcnn-inception-resnet-v2 is simply a slow model, so each prediction takes more time.
You can refer to the link above to see the reported speed of each model.
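When comparing candidate models on your own hardware, it may help to average the prediction time over several frames and skip the first call, which is often slower than the rest. The following is a minimal sketch under stated assumptions: mean_seconds_per_frame and fake_predict are hypothetical names, and fake_predict only stands in for the real sess.run call on one frame.

```python
import time

def mean_seconds_per_frame(predict, frames, warmup=1):
    """Average wall-clock time of predict(frame), skipping the first `warmup` calls."""
    timings = []
    for i, frame in enumerate(frames):
        start = time.perf_counter()
        predict(frame)
        elapsed = time.perf_counter() - start
        if i >= warmup:
            timings.append(elapsed)
    if not timings:
        raise ValueError("need more frames than warm-up calls")
    return sum(timings) / len(timings)

# Hypothetical stand-in for running the detector on one frame.
def fake_predict(frame):
    return sum(frame)

avg = mean_seconds_per_frame(fake_predict, [[1, 2, 3]] * 5)
```

Swapping fake_predict for a function that wraps the real detection call gives one number per model that can be set against the speeds listed in the model zoo.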

CNN Training in Keras freezes

I am training a CNN model in Keras (TensorFlow backend). I use on-the-fly augmentation with fit_generator(). The model takes images as input and is supposed to predict the steering angle for a self-driving car. The training just freezes after this point. I have tried changing the batch size, learning rate, etc., but it doesn't help.
The training freezes at the end of first epoch.
Please help!
BATCH_SIZE=32
INPUT_IMAGE_ROWS=160
INPUT_IMAGE_COLS=320
INPUT_IMAGE_CHANNELS=3
AUGMENTATION_NUM_BINS=200
NUM_EPOCHS=3
AUGMENTATION_BIN_MAX_PERC=0.5
AUGMENTATION_FACTOR=3
import csv
import cv2
import numpy as np
from random import shuffle
from sklearn.model_selection import train_test_split
import keras
from keras.callbacks import Callback
import math
from keras.preprocessing.image import *
print("\nLoading the dataset from file ...")
def load_dataset(file_path):
    dataset = []
    with open(file_path) as csvfile:
        reader = csv.reader(csvfile)
        for line in reader:
            try:
                dataset.append({'center':line[0], 'left':line[1], 'right':line[2], 'steering':float(line[3]),
                                'throttle':float(line[4]), 'brake':float(line[5]), 'speed':float(line[6])})
            except:
                continue # some images throw error during loading
    return dataset
dataset = load_dataset('C:\\Users\\kiit1\\Documents\\steering angle prediction\\dataset_coldivision\\data\\driving_log.csv')
print("Loaded {} samples from file {}".format(len(dataset),'C:\\Users\\kiit1\\Documents\\steering angle prediction\\dataset_coldivision\\data\\driving_log.csv'))
print("Partioning the dataset:")
shuffle(dataset)
#partitioning data into 80% training, 19% validation and 1% testing
X_train,X_validation=train_test_split(dataset,test_size=0.2)
X_validation,X_test=train_test_split(X_validation,test_size=0.05)
print("X_train has {} elements.".format(len(X_train)))
print("X_validation has {} elements.".format(len(X_validation)))
print("X_test has {} elements.".format(len(X_test)))
print("Partitioning the dataset complete.")
def generate_batch_data(dataset, batch_size = 32):
    global augmented_steering_angles
    global epoch_steering_count
    global epoch_bin_hits
    batch_images = np.zeros((batch_size, INPUT_IMAGE_ROWS, INPUT_IMAGE_COLS, INPUT_IMAGE_CHANNELS))
    batch_steering_angles = np.zeros(batch_size)
    while 1:
        for batch_index in range(batch_size):
            # select a random image from the dataset
            image_index = np.random.randint(len(dataset))
            image_data = dataset[image_index]
            while 1:
                try:
                    image, steering_angle = load_and_augment_image(image_data)
                except:
                    continue
                bin_idx = int (steering_angle * AUGMENTATION_NUM_BINS / 2)
                if( epoch_bin_hits[bin_idx] < epoch_steering_count/AUGMENTATION_NUM_BINS*AUGMENTATION_BIN_MAX_PERC
                    or epoch_steering_count<500 ):
                    batch_images[batch_index] = image
                    batch_steering_angles[batch_index] = steering_angle
                    augmented_steering_angles.append(steering_angle)
                    epoch_bin_hits[bin_idx] = epoch_bin_hits[bin_idx] + 1
                    epoch_steering_count = epoch_steering_count + 1
                    break
        yield batch_images, batch_steering_angles
print("\nTraining the model ...")
class LifecycleCallback(keras.callbacks.Callback):
    def on_epoch_begin(self, epoch, logs={}):
        pass
    def on_epoch_end(self, epoch, logs={}):
        global epoch_steering_count
        global epoch_bin_hits
        global bin_range
        epoch_steering_count = 0
        epoch_bin_hits = {k:0 for k in range(-bin_range, bin_range)}
    def on_batch_begin(self, batch, logs={}):
        pass
    def on_batch_end(self, batch, logs={}):
        self.losses.append(logs.get('loss'))
    def on_train_begin(self, logs={}):
        print('Beginning training')
        self.losses = []
    def on_train_end(self, logs={}):
        print('Ending training')
# Compute the correct number of samples per epoch based on batch size
def compute_samples_per_epoch(array_size, batch_size):
num_batches = array_size / batch_size
samples_per_epoch = math.ceil(num_batches)
samples_per_epoch = samples_per_epoch * batch_size
return samples_per_epoch
def load_and_augment_image(image_data, side_camera_offset=0.2):
    # select a value between 0 and 2 to switch between center, left and right image
    index = np.random.randint(3)
    if (index==0):
        image_file = image_data['left'].strip()
        angle_offset = side_camera_offset
    elif (index==1):
        image_file = image_data['center'].strip()
        angle_offset = 0.
    elif (index==2):
        image_file = image_data['right'].strip()
        angle_offset = - side_camera_offset
    steering_angle = image_data['steering'] + angle_offset
    image = cv2.imread(image_file)
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    # apply a mixture of several augmentation methods
    image, steering_angle = random_transform(image, steering_angle)
    return image, steering_angle
augmented_steering_angles = []
epoch_steering_count = 0
bin_range = int(AUGMENTATION_NUM_BINS / 4 * 3)
epoch_bin_hits = {k:0 for k in range(-bin_range, bin_range)}
#flips image about y-axis
def horizontal_flip(image,steering_angle):
flipped_image=cv2.flip(image,1);
steering_angle=-steering_angle
return flipped_image,steering_angle
def translate(image,steering_angle,width_shift_range=50.0,height_shift_range=5.0):
    tx = width_shift_range * np.random.uniform() - width_shift_range / 2
    ty = height_shift_range * np.random.uniform() - height_shift_range / 2
    # new steering angle
    steering_angle += tx / width_shift_range * 2 * 0.2
    transformed_matrix=np.float32([[1,0,tx],[0,1,ty]])
    rows,cols=(image.shape[0],image.shape[1])
    translated_image=cv2.warpAffine(image,transformed_matrix,(cols,rows))
    return translated_image,steering_angle
def brightness(image,bright_increase=None):
    if(image.shape[2]>1):
        image_hsv=cv2.cvtColor(image,cv2.COLOR_RGB2HSV)
    else:
        image_hsv=image
    if bright_increase:
        image_hsv[:,:,2] += bright_increase
    else:
        bright_increase = int(30 * np.random.uniform(-0.3,1))
        image_hsv[:,:,2] = image[:,:,2] + bright_increase
    image = cv2.cvtColor(image_hsv, cv2.COLOR_HSV2RGB)
    return image
def rotation(image,rotation_range=5):
image=random_rotation(image,rotation_range);
return image
# Shift range for each channels
def channel_shift(image, intensity=30, channel_axis=2):
image = random_channel_shift(image, intensity, channel_axis)
return image
# Crop and resize the image
def crop_resize_image(image, cols=INPUT_IMAGE_COLS, rows=INPUT_IMAGE_ROWS, top_crop_perc=0.1, bottom_crop_perc=0.2):
    height = image.shape[0]
    width = image.shape[1]
    # crop top and bottom
    top_rows = int(height * top_crop_perc)
    bottom_rows = int(height * bottom_crop_perc)
    image = image[top_rows:height - bottom_rows, 0:width]
    # resize to the final size even if the aspect ratio is not preserved
    image = cv2.resize(image, (cols, rows), interpolation=cv2.INTER_LINEAR)
    return image
# Apply a sequence of random transformations for better generalization and to prevent overfitting
def random_transform(image, steering_angle):
    # all further transformations are done on the smaller image to reduce the processing time
    image = crop_resize_image(image)
    # each image is flipped horizontally with probability 0.5
    if np.random.random() < 0.5:
        image, steering_angle = horizontal_flip(image, steering_angle)
    image, steering_angle = translate(image, steering_angle)
    image = rotation(image)
    image = brightness(image)
    image = channel_shift(image)
    return img_to_array(image), steering_angle
from keras.models import Sequential, Model
from keras.layers.core import Lambda, Dense, Activation, Flatten, Dropout
from keras.layers.convolutional import Cropping2D, Convolution2D
from keras.layers.advanced_activations import ELU
from keras.layers.noise import GaussianNoise
from keras.optimizers import Adam
print("\nBuilding and compiling the model ...")
model = Sequential()
model.add(Lambda(lambda x: (x / 127.5) - 1.0, input_shape=(INPUT_IMAGE_ROWS, INPUT_IMAGE_COLS, INPUT_IMAGE_CHANNELS)))
# Conv Layer1 of 16 filters having size(8, 8) with strides (4,4)
model.add(Convolution2D(16, 8, 8, subsample=(4, 4), border_mode="same"))
model.add(ELU())
# Conv Layer2 of 32 filters having size(5, 5) with strides (2,2)
model.add(Convolution2D(32, 5, 5, subsample=(2, 2), border_mode="same"))
model.add(ELU())
# Conv Layer3 of 64 filters having size(5, 5) with strides (2,2)
model.add(Convolution2D(64, 5, 5, subsample=(2, 2), border_mode="same"))
model.add(Flatten())
model.add(Dropout(.5))
model.add(ELU())
model.add(Dense(512))
model.add(Dropout(.5))
model.add(ELU())
model.add(Dense(1))
model.summary()
adam = Adam(lr=0.0001)
model.compile(loss='mse', optimizer=adam)
lifecycle_callback = LifecycleCallback()
train_generator = generate_batch_data(X_train, BATCH_SIZE)
validation_generator = generate_batch_data(X_validation, BATCH_SIZE)
samples_per_epoch = compute_samples_per_epoch((len(X_train)*AUGMENTATION_FACTOR), BATCH_SIZE)
nb_val_samples = compute_samples_per_epoch((len(X_validation)*AUGMENTATION_FACTOR), BATCH_SIZE)
history = model.fit_generator(train_generator,
    validation_data = validation_generator,
    samples_per_epoch = ((len(X_train) // BATCH_SIZE ) * BATCH_SIZE) * 2,
    nb_val_samples = ((len(X_validation) // BATCH_SIZE ) * BATCH_SIZE) * 2,
    nb_epoch = NUM_EPOCHS, verbose=1,
)
print("\nTraining the model ended.")
Your data generator has an unusual structure, and that is most likely causing this issue, though I cannot be completely sure.
Your structure is as follows:
while 1:
    ....
    for _ in range(batch_size):
        randomly select an image  # this is inefficient, see below for comments
        while 1:
            process image
            if epoch is not done:
                collect images in a list
                break
    yield ...
Now,
Do not choose images randomly at each iteration. Instead, shuffle your dataset once at the start of each epoch and then select images sequentially.
As far as I understand, the break under "if epoch is not done" is a mistake. Did you mean: if the epoch is not done, collect images; otherwise break? Your break is inside the if, which means that the first time the condition is true, execution leaves the innermost while 1 loop. Surely that is not what you intend, right?
The yield is outside the for loop. You should yield each batch, so if the for loop is iterating over batches, then the yield should be inside it.
The structure of a basic data generator should be:
while 1:
    shuffle entire dataset once  # not applicable for massive datasets
    for _ in range(n_batches_per_epoch):
        get a data batch
        Optionally, do some preprocessing  # preferably on the entire batch,
        # not one by one; you could also preprocess the entire dataset if it is
        # simple enough, such as mean subtraction.
        yield batches, labels
I would suggest rewriting the data generator. You could look at the myGenerator() function on this page for a basic data generator. Once you have written the generator, test it as a stand-alone function to make sure it outputs data indefinitely and keeps track of epochs.
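As a rough illustration of the structure above, here is a minimal NumPy sketch. It assumes the whole dataset fits in memory as two arrays; the name batch_generator and its parameters are hypothetical and are not taken from the question's code or from the linked page.

```python
import numpy as np

def batch_generator(images, labels, batch_size, rng=None):
    """Yield (batch_images, batch_labels) forever, reshuffling once per epoch.

    Hypothetical helper: assumes `images` and `labels` are NumPy arrays
    with the same length along axis 0.
    """
    rng = rng if rng is not None else np.random.default_rng()
    n_samples = len(images)
    # The incomplete last batch of each epoch is dropped in this sketch.
    n_batches_per_epoch = n_samples // batch_size
    while True:
        # Shuffle the sample order once at the start of each epoch.
        order = rng.permutation(n_samples)
        for b in range(n_batches_per_epoch):
            idx = order[b * batch_size:(b + 1) * batch_size]
            # Per-batch preprocessing or augmentation would go here.
            yield images[idx], labels[idx]
```

Calling next() on the generator a few times with small dummy arrays, and checking the shapes and that no sample repeats within an epoch, is a quick way to test it on its own before passing it to the training call.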
In short, it is hard to say which part is problematic: it might be the data, the model, or something else. So please be patient, and you will resolve the issue eventually.
First of all, you can train a baseLine model without data augmentation. If your data augmentation is helpful, you should expect a performance improvement in the new augmLine model, which is trained with data augmentation.
If baseLine behaves similarly to augmLine, you may consider changing your network design. For example, in your current design, 1) Conv2D layers without any activation are very rare, and you may want to use relu or tanh, and 2) ELU(alpha) is known to be sensitive to the alpha value.
If baseLine actually works fine, this is an indicator that your augmLine's data is problematic. To ensure the correctness of the augmented data, you should plot both the image data and the target values and verify them manually. One common mistake in image data augmentation is forgetting that, if the target values depend on the input image, you have to generate new target values according to the augmented image. Sometimes this task is not trivial.
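Alongside plotting, a small automated check can catch the case where an augmentation changes the image but not the target. The sketch below uses a horizontal flip as the example, with plain NumPy slicing standing in for the flip; the function names are hypothetical and are not part of the question's code.

```python
import numpy as np

def flip_with_target(image, steering_angle):
    """Mirror an (H, W, C) image left-right and negate the steering angle to match."""
    return image[:, ::-1, :], -steering_angle

def check_flip_consistency(image, steering_angle):
    """Flipping twice should return both the original image and the original target."""
    flipped, flipped_angle = flip_with_target(image, steering_angle)
    restored, restored_angle = flip_with_target(flipped, flipped_angle)
    return np.array_equal(restored, image) and restored_angle == steering_angle
```

A round-trip check like this only works for invertible transformations such as flips. For translations or rotations, inspecting plotted samples next to their adjusted targets remains the more reliable test.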
Note that, for a fair comparison, you need to keep the validation data unchanged for both experiments.
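One way to keep the validation data identical across the two experiments is to split the sample indices once with a fixed seed and reuse that split for both runs. This is only a sketch: split_once, val_fraction and seed are hypothetical names, and the 20% validation fraction is an arbitrary assumption.

```python
import numpy as np

def split_once(n_samples, val_fraction=0.2, seed=0):
    """Return (train_indices, val_indices); the same seed always gives the same split."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_val = int(n_samples * val_fraction)
    return order[n_val:], order[:n_val]
```

With this approach, augmentation would be applied only to the samples selected by the training indices, while the validation samples are fed to both models unmodified.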