I'm trying to learn machine learning from the official TensorFlow tutorials.
But most tutorials download their datasets from the command prompt.
I can't find any tutorial about loading my own image dataset from my own disk.
It would be great if you could give me a direct answer.
I put the image dataset on my Windows 10 desktop:
C:\Users\User\Desktop\DataSet\coins\data
    \test        (labels 1-211)
    \train       (labels 1-211)
    \validation  (labels 1-211)
You can use image_dataset_from_directory for this; you just have to pass the path to the files in the directory argument.
from tensorflow.keras.preprocessing import image_dataset_from_directory
train_dataset = image_dataset_from_directory(
    directory=TRAIN_DIR,
    labels="inferred",
    label_mode="categorical",
    image_size=SIZE,
    seed=SEED,
    subset=None,
    interpolation="bilinear",
    follow_links=False,
)
validation_dataset = image_dataset_from_directory(
    directory=VALIDATION_DIR,
    labels="inferred",
    label_mode="categorical",
    image_size=SIZE,
    seed=SEED,
    subset=None,
    interpolation="bilinear",
    follow_links=False,
)
test_dataset = image_dataset_from_directory(
    directory=TEST_DIR,
    labels="inferred",
    label_mode="categorical",
    image_size=SIZE,
    seed=SEED,
    subset=None,
    interpolation="bilinear",
    follow_links=False,
)
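TRAIN_DIR, VALIDATION_DIR, TEST_DIR, SIZE, and SEED are not defined in the snippet above; a minimal sketch matching the directory layout from the question (the size and seed values are only illustrative) would be:

TRAIN_DIR = r"C:\Users\User\Desktop\DataSet\coins\data\train"
VALIDATION_DIR = r"C:\Users\User\Desktop\DataSet\coins\data\validation"
TEST_DIR = r"C:\Users\User\Desktop\DataSet\coins\data\test"
SIZE = (256, 256)  # (height, width) every image is resized to
SEED = 42          # fixed seed so shuffling is reproducible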
You can use flow_from_directory in Keras.
Here is a pretty good tutorial:
flow_from_directory in Keras
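For reference, a minimal flow_from_directory sketch (the path, target size, and batch size are illustrative, assuming one subdirectory per class label, as in the question):

from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(rescale=1. / 255)  # scale pixels to [0, 1]
train_generator = datagen.flow_from_directory(
    r"C:\Users\User\Desktop\DataSet\coins\data\train",
    target_size=(256, 256),
    batch_size=32,
    class_mode="categorical")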
I created my tf.data.Dataset from the image files in the directory:
train_ds = tf.keras.utils.image_dataset_from_directory(
    "home/the path/to the directory/",
    validation_split=0.2,
    subset="training",
    seed=13,
    image_size=image_size,
    batch_size=batch_size,
)
val_ds = tf.keras.utils.image_dataset_from_directory(
    "home/the path/to the directory/",
    validation_split=0.2,
    subset="validation",
    seed=13,
    image_size=image_size,
    batch_size=batch_size,
)
Then I saved the datasets using
tf.data.experimental.save(train_ds, path)
tf.data.experimental.save(val_ds, path)
The original directory contained JPEG images totaling about 500 MB, but the binary files written by tf.data.experimental.save() are 15 GB each!
What did I do wrong?
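As a side note (my own assumption, not from the original post): the dataset yields decoded image tensors, so save() writes raw uncompressed pixel data rather than the original JPEG bytes. tf.data.experimental.save does accept a compression argument, which may shrink the files considerably:

# Hedged sketch: compress the decoded image tensors on disk.
tf.data.experimental.save(train_ds, path, compression="GZIP")
# The same compression value must be passed back when loading:
loaded_ds = tf.data.experimental.load(path, compression="GZIP")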
I am fairly new to TensorFlow and I am trying to train a BERT model for a binary classification task.
I have a data set in a single CSV file that looks like this:
Description         Target
This text passed    1
This text failed    0
I loaded the data set as a pandas data frame.
The guide I am using is the official TensorFlow guide I found here.
The guide uses the IMDb dataset that is structured in separate folders.
This is the code block that created the TensorFlow dataset:
AUTOTUNE = tf.data.AUTOTUNE
batch_size = 32
seed = 42
raw_train_ds = tf.keras.preprocessing.text_dataset_from_directory(
    'aclImdb/train',
    batch_size=batch_size,
    validation_split=0.2,
    subset='training',
    seed=seed)
class_names = raw_train_ds.class_names
train_ds = raw_train_ds.cache().prefetch(buffer_size=AUTOTUNE)

val_ds = tf.keras.preprocessing.text_dataset_from_directory(
    'aclImdb/train',
    batch_size=batch_size,
    validation_split=0.2,
    subset='validation',
    seed=seed)
val_ds = val_ds.cache().prefetch(buffer_size=AUTOTUNE)

test_ds = tf.keras.preprocessing.text_dataset_from_directory(
    'aclImdb/test',
    batch_size=batch_size)
test_ds = test_ds.cache().prefetch(buffer_size=AUTOTUNE)
My question: is there a way to convert my Pandas dataframe into the same format?
That is, how do I generate train_ds, test_ds, and val_ds from a pandas dataframe?
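In case it helps, a minimal sketch (my own assumption: the dataframe is called df, has the Description and Target columns shown above, and comes from an illustrative data.csv):

import tensorflow as tf
import pandas as pd

df = pd.read_csv("data.csv")  # hypothetical file name
batch_size = 32
seed = 42

# Build one dataset of (text, label) pairs; reshuffle_each_iteration=False
# keeps the shuffle fixed so take/skip give a stable, leak-free split.
full_ds = tf.data.Dataset.from_tensor_slices(
    (df["Description"].values, df["Target"].values)
).shuffle(len(df), seed=seed, reshuffle_each_iteration=False)

# Illustrative 70/20/10 train/validation/test split.
val_size = len(df) * 2 // 10
test_size = len(df) // 10

val_ds = full_ds.take(val_size).batch(batch_size)
test_ds = full_ds.skip(val_size).take(test_size).batch(batch_size)
train_ds = full_ds.skip(val_size + test_size).batch(batch_size)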
I've got a dataset coming in via
train_ds = tf.keras.preprocessing.image_dataset_from_directory(
    data_dir,
    validation_split=validation_split,
    subset="training",
    seed=seed,
    image_size=(img_height, img_width),
    batch_size=batch_size)
(Based around code from https://www.tensorflow.org/tutorials/load_data/images with very minor changes to configuration)
I'm converting the eventual model to a TFLite model, which is working, but I think the model is too large for the end device, so I'm trying to run post-training quantization by supplying a representative_dataset (as in https://www.tensorflow.org/lite/performance/post_training_quantization).
However, I can't work out how to turn the dataset generated by image_dataset_from_directory into the format expected by representative_dataset.
The example provided has
def representative_dataset():
    for data in tf.data.Dataset.from_tensor_slices((images)).batch(1).take(100):
        yield [data.astype(tf.float32)]
I've tried things like
def representative_dataset():
    for data in train_ds.batch(1).take(100):
        yield [data.astype(tf.float32)]
but that wasn't it
Looks like
def representative_dataset():
    for image_batch, labels_batch in train_ds:
        yield [image_batch]
was what I was looking for; image_batch is already tf.float32.
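For completeness, this is roughly how the function gets wired into the converter, following the post-training quantization guide linked above (here model is assumed to be your trained Keras model):

import tensorflow as tf

# Sketch per the linked guide: enable default optimizations and supply
# the representative dataset so activations can be calibrated.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
tflite_model = converter.convert()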
I wasn't able to get tf.keras.preprocessing.image_dataset_from_directory to work, but I had some luck with tf.keras.preprocessing.ImageDataGenerator.
In my case, the images were in the 'images/all' directory. I had to make sure to remove any non-image files (e.g. XML annotations) from that directory.
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.applications.mobilenet import preprocess_input

def representative_dataset():
    test_datagen = ImageDataGenerator(preprocessing_function=preprocess_input)
    test_generator = test_datagen.flow_from_directory(
        './images',
        target_size=(300, 300),
        batch_size=1,
        classes=['all'],
        class_mode='categorical')
    for ind in range(len(test_generator.filenames)):
        img_with_label = test_generator.next()
        yield [np.array(img_with_label[0], dtype=np.float32, ndmin=2)]
I'm trying to implement an autoencoder in TensorFlow 2.3, taking my own image dataset stored on disk as input. Can someone explain how to do this correctly?
I tried loading the data with tf.keras.preprocessing.image_dataset_from_directory(), but when I start training with the data from that method I get the following error:
"ValueError: y argument is not supported when using dataset as input."
Please find below the code that I am running.
import tensorflow as tf
from convautoencoder import ConvAutoencoder
from tensorflow.keras.optimizers import Adam
import matplotlib.pyplot as plt
import numpy as np
EPOCHS = 25
batch_size = 1
img_height = 180
img_width = 180
data_dir = "/media/aniruddha/FE47-91B8/Laptop_Backup/Auto-Encoders/Basic/data"
train_ds = tf.keras.preprocessing.image_dataset_from_directory(
    data_dir,
    validation_split=0.2,
    subset="training",
    seed=123,
    image_size=(img_height, img_width),
    batch_size=batch_size)
val_ds = tf.keras.preprocessing.image_dataset_from_directory(
    data_dir,
    validation_split=0.2,
    subset="validation",
    seed=123,
    image_size=(img_height, img_width),
    batch_size=batch_size)
(encoder, decoder, autoencoder) = ConvAutoencoder.build(224, 224, 3)
opt = Adam(lr=1e-3)
autoencoder.compile(loss="mse", optimizer=opt)
H = autoencoder.fit(train_ds, train_ds, validation_data=(val_ds, val_ds), epochs=EPOCHS, batch_size=batch_size)
I resolved this: I was not feeding the input dataset as a tuple to the model for training. Once I corrected that, the training started.
I used generators to feed the input data as tuples to the autoencoder.
Please find my code below.
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint
from tensorflow.keras.optimizers import Adam

# initialize the training data augmentation objects
trainAug = ImageDataGenerator(rescale=1. / 255)
valAug = ImageDataGenerator(rescale=1. / 255)

# initialize the training generator; class_mode="input" yields
# (image, image) tuples, which is what the autoencoder needs
trainGen = trainAug.flow_from_directory(
    config.TRAIN_PATH,
    class_mode="input",
    classes=None,
    target_size=(64, 64),
    color_mode="grayscale",
    shuffle=True,
    batch_size=BS)

# initialize the validation generator
valGen = valAug.flow_from_directory(
    config.TRAIN_PATH,
    class_mode="input",
    classes=None,
    target_size=(64, 64),
    color_mode="grayscale",
    shuffle=False,
    batch_size=BS)

# initialize the testing generator
testGen = valAug.flow_from_directory(
    config.TRAIN_PATH,
    class_mode="input",
    classes=None,
    target_size=(64, 64),
    color_mode="grayscale",
    shuffle=False,
    batch_size=BS)

early_stop = EarlyStopping(monitor='val_loss', patience=20)
mc = ModelCheckpoint('best_model_1.h5', monitor='val_loss', mode='min', save_best_only=True)

# construct our convolutional autoencoder
print("[INFO] building autoencoder...")
(encoder, decoder, autoencoder) = ConvAutoencoder.build(64, 64, 1)
opt = Adam(learning_rate=0.0001, beta_1=0.9, beta_2=0.999, epsilon=1e-04, amsgrad=False)
autoencoder.compile(loss="mse", optimizer=opt)

# train the convolutional autoencoder
H = autoencoder.fit(trainGen, validation_data=valGen, epochs=EPOCHS, batch_size=BS, callbacks=[mc, early_stop])
fit expects data and labels, but when given a tf.data.Dataset it accepts only the single dataset argument. To use the data as its own labels for the autoencoder, you should provide it twice to the dataset constructor, e.g.:
dataset = tf.data.Dataset.from_tensor_slices((images, images))
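If you already have the (image, label) datasets from image_dataset_from_directory, as in the question, a minimal sketch (my own suggestion, not from the original answer) is to map each batch to an (image, image) pair so fit can consume the single dataset:

# Reuse each image batch as both input and target; the class label
# produced by image_dataset_from_directory is discarded.
ae_train_ds = train_ds.map(lambda x, y: (x, x))
ae_val_ds = val_ds.map(lambda x, y: (x, x))
H = autoencoder.fit(ae_train_ds, validation_data=ae_val_ds, epochs=EPOCHS)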
Firstly, I trained a ResNet50 as a six-class classifier from scratch on Kaggle, and got results like this.
As you can see, the accuracy on the training set and the validation set improved steadily.
After that, I rented a cloud host for a better GPU (a 1080 Ti), copied my code over (I uploaded my Jupyter notebook), and ran it. But strange things happened: my validation accuracy is extremely unsteady and always fluctuates widely (around 0.3). Here's the screenshot.
Also, training on the host is much more difficult than on the Kaggle kernel.
Here are the screenshots after some epochs (the host's model was actually trained for many more epochs than the Kaggle one).
And here's my ImageDataGenerator code.
train_datagen = ImageDataGenerator(
    rescale=1./255,
    shear_range=0.1,
    zoom_range=0.1,
    width_shift_range=0.1,
    height_shift_range=0.1,
    horizontal_flip=True,
    vertical_flip=True,
    validation_split=0.1
)
test_datagen = ImageDataGenerator(
    rescale=1./255,
    validation_split=0.1
)
train_generator = train_datagen.flow_from_directory(
    base_path,
    target_size=(300, 300),
    batch_size=16,
    class_mode='categorical',
    subset='training',
    seed=0
)
validation_generator = test_datagen.flow_from_directory(
    base_path,
    target_size=(300, 300),
    batch_size=16,
    class_mode='categorical',
    subset='validation',
    seed=0
)