TensorFlow Inception Model Can't See Files In Folder

I hope everyone is having a great day.
I'm attempting to train a convnet for fun, but the TensorFlow Inception v3 model doesn't find any files within my folders even though they are there.
I get this error when trying to train it:
Looking for images in 'evening_star'
No files found
Looking for images in 'morning_star'
No files found
No valid folders of images found at /tf_files/image_recognition
Does anyone have any ideas as to why this could be occurring?

Check your files' extensions. I had the same problem because all my images were "png". If you have a look into the script "retrain.py", you'll see:
extensions = ['jpg', 'jpeg', 'JPG', 'JPEG']
Just modify it to allow "png" as well and the problem is solved:
extensions = ['jpg', 'jpeg', 'JPG', 'JPEG', 'png', 'PNG']
Hope this helps!
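As a quick sanity check (not part of the original answer), you can list which files the extension filter would actually match for each label folder; the paths below come from the question, and the glob pattern mirrors the per-label image_dir/label/*.ext lookup that retrain.py performs:

import glob
import os

image_dir = '/tf_files/image_recognition'  # image directory from the question
extensions = ['jpg', 'jpeg', 'JPG', 'JPEG', 'png', 'PNG']

for label in ('evening_star', 'morning_star'):
    matches = []
    for ext in extensions:
        # one glob per extension, e.g. /tf_files/image_recognition/evening_star/*.png
        matches.extend(glob.glob(os.path.join(image_dir, label, '*.' + ext)))
    print(label, '->', len(matches), 'files found')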

When trying to load an external tfrecord with TFDS, given a tf.train.Example, how to get tfds.features?

What I need help with / What I was wondering
Hi, I am trying to load external tfrecord files with TFDS. I have read the official doc here, and found that I need to define the feature structure using tfds.features. However, since the tfrecord files are already generated, I do not have control over the generation pipeline. I do, however, know the tf.train.Example structure used in TFRecordWriter during generation, shown as follows.
from tensorflow.python.training.training import BytesList, Example, Feature, Features, Int64List
Example(features=Features(feature={
    'image': Feature(bytes_list=BytesList(value=[img_str])),    # img_str is jpg encoded image raw bytes
    'caption': Feature(bytes_list=BytesList(value=[caption])),  # caption is a string
    'height': Feature(int64_list=Int64List(value=[height])),    # image height in pixels
    'width': Feature(int64_list=Int64List(value=[width])),      # image width in pixels
}))
The doc only describes how to translate tfds.features into the human-readable structure of the tf.train.Example. But nowhere does it mention how to translate a tf.train.Example into tfds.features, which is needed to automatically add the proper metadata files with tfds.folder_dataset.write_metadata.
I wonder how to translate the above tf.train.Example into tfds.features? Thanks a lot!
BTW, while I understand that it is possible to read the data directly from the TFRecord with tf.data.TFRecordDataset and then use map(decode_fn) for decoding, as suggested here, it seems to me this approach lacks necessary metadata like num_shards or shard_lengths. In that case, I am not sure whether it is still OK to use common operations like cache/repeat/shuffle/map/batch on that tf.data.TFRecordDataset, so I think it is better to stick with the tfds approach.
What I've tried so far
I have searched the official doc for quite some time but cannot find the answer. There is a Scalar class in tfds.features, which I assume could be used to decode the Int64List. But how can I decode the BytesList?
Environment information
tensorflow-datasets version: 4.8.2
tensorflow version: 2.11.0
After some searching, I found the simplest solution is:
features = tfds.features.FeaturesDict({
    'image': tfds.features.Image(),  # ideally also pass the `shape=(h, w, c)` info here
    'caption': tfds.features.Text(),
    'height': tf.int32,
    'width': tf.int32,
})
Then I can either write the metadata with tfds.folder_dataset.write_metadata and load through TFDS, or read the records directly with:
ds = tf.data.TFRecordDataset(filenames)
ds = ds.map(features.deserialize_example)
batch, shuffle, etc. will work as expected in both cases.
The TFDS metadata would be helpful if fine-grained split control is needed.
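For completeness, a rough sketch of the write_metadata route; the data_dir path, split name, and shard_lengths below are placeholders you would replace with values matching your own tfrecord files (you may also need filename_template if your shard names don't follow the TFDS convention):

import tensorflow_datasets as tfds

data_dir = '/path/to/tfrecords'  # directory that already contains the .tfrecord shards (placeholder)

tfds.folder_dataset.write_metadata(
    data_dir=data_dir,
    features=features,  # the FeaturesDict defined above
    split_infos=[
        # shard_lengths = number of examples in each shard; must match your files
        tfds.core.SplitInfo(name='train', shard_lengths=[1000], num_bytes=0),
    ],
)

# Once the metadata files exist, the dataset can be rebuilt with full TFDS info
builder = tfds.builder_from_directory(data_dir)
ds = builder.as_dataset(split='train')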

How to convert a .wav file to a TFRecord file?

I am new to TensorFlow and hope someone can help me.
I've seen this document https://www.tensorflow.org/api_docs/python/tf/audio/decode_wav and executed:
tf.audio.decode_wav(contents, desired_channels=-1, desired_samples=-1, name=None)
The only thing I changed was contents, passing my file path instead.
But I still get an error!
Is there any way to convert it? I also want to output something like this: .tfrecord-00000-of-00008
Read the .wav file into a string of bytes and then decode it:
import tensorflow as tf
wav_contents = tf.io.read_file("file.wav")
audio, sample_rate = tf.audio.decode_wav(contents=wav_contents)
audio.shape
This example was borrowed from the TensorFlow tutorial on reading audio files.
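The question also asks how to get a .tfrecord file out of this; the answer above stops at decoding, but a minimal sketch of writing one WAV file into a single-shard TFRecord (the file names and feature keys are my own choices, not from the tutorial) could look like this:

import tensorflow as tf

def wav_to_tfrecord(wav_path, tfrecord_path):
    # Read the raw bytes and decode them, mainly to verify the file and grab the sample rate
    wav_bytes = tf.io.read_file(wav_path)
    audio, sample_rate = tf.audio.decode_wav(contents=wav_bytes)

    example = tf.train.Example(features=tf.train.Features(feature={
        'audio_raw': tf.train.Feature(bytes_list=tf.train.BytesList(value=[wav_bytes.numpy()])),
        'sample_rate': tf.train.Feature(int64_list=tf.train.Int64List(value=[int(sample_rate)])),
    }))
    with tf.io.TFRecordWriter(tfrecord_path) as writer:
        writer.write(example.SerializeToString())

wav_to_tfrecord('file.wav', 'audio.tfrecord-00000-of-00001')  # the -of-0000N suffix is just a naming convention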

Keras ImageDataGenerator: PIL.UnidentifiedImageError

I tried to use ImageDataGenerator to build an image generator to train my model, but I am unable to do so because of a PIL.UnidentifiedImageError. I tried different datasets and the problem occurs only with my dataset.
Unfortunately I can't delete all the training/testing images as one answer suggested, but I can remove the files causing this problem. How can I detect the files causing the error?
This is a common problem, particularly if you download images from, say, Google. I developed a function that, given a directory, will go through all sub-directories and check the files in each sub-directory to ensure they have proper extensions and are valid image files. The code is provided below. It returns two lists: good_list is a list of valid image files and bad_list is a list of invalid image files. You will need to have OpenCV installed. If you do not have it installed, use pip install opencv-contrib-python.
def test_images(dir):
    import os
    import cv2
    bad_list = []
    good_list = []
    good_exts = ['jpg', 'png', 'bmp', 'tiff', 'jpeg', 'gif']  # acceptable image file types
    for klass in os.listdir(dir):  # iterate through the sub-directories
        class_path = os.path.join(dir, klass)  # path to the sub-directory
        if os.path.isdir(class_path):
            for f in os.listdir(class_path):  # iterate through the image files
                f_path = os.path.join(class_path, f)  # path to the image file
                ext = f[f.rfind('.') + 1:].lower()  # the file's extension (lower-cased so JPG matches jpg)
                if ext not in good_exts:
                    print(f'file {f_path} has an invalid extension {ext}')
                    bad_list.append(f_path)
                else:
                    try:
                        img = cv2.imread(f_path)
                        size = img.shape  # raises if cv2 could not read the file as an image
                        good_list.append(f_path)
                    except Exception:
                        print(f'file {f_path} is not a valid image file')
                        bad_list.append(f_path)
        else:
            print(f'** WARNING ** directory {dir} has files in it, it should only contain sub-directories')
    return good_list, bad_list
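A possible way to call it (the directory path is hypothetical); whether you delete the bad files or just move them aside is up to you:

import os

good_list, bad_list = test_images('/path/to/train_images')  # hypothetical dataset directory
print(f'found {len(bad_list)} problem files')
for f_path in bad_list:
    os.remove(f_path)  # or move them to a quarantine folder instead of deleting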

Accessing already downloaded dataset with tensorflow_datasets API

I am trying to work with the quite recently published tensorflow_datasets API to train a Keras model on the Open Images dataset. The dataset is about 570 GB in size. I downloaded the data with the following code:
import tensorflow_datasets as tfds
import tensorflow as tf
open_images_dataset = tfds.image.OpenImagesV4()
open_images_dataset.download_and_prepare(download_dir="/notebooks/dataset/")
After the download was complete, the connection to my Jupyter notebook was somehow interrupted, but the extraction seemed to have finished as well; at least all downloaded files had a counterpart in the "extracted" folder. However, I am not able to access the downloaded data now:
tfds.load(name="open_images_v4", data_dir="/notebooks/open_images_dataset/extracted/", download=False)
This only gives the following error:
AssertionError: Dataset open_images_v4: could not find data in /notebooks/open_images_dataset/extracted/. Please make sure to call dataset_builder.download_and_prepare(), or pass download=True to tfds.load() before trying to access the tf.data.Dataset object.
When I call the function download_and_prepare() it only downloads the whole dataset again.
Am I missing something here?
Edit:
After the download the folder under "extracted" has 18 .tar.gz files.
This is with tensorflow-datasets 1.0.1 and tensorflow 2.0.
The folder hierarchy should be like this:
/notebooks/open_images_dataset/extracted/open_images_v4/0.1.0
All the datasets have a version. Then the data can be loaded like this:
ds = tfds.load('open_images_v4', data_dir='/notebooks/open_images_dataset/extracted', download=False)
I didn't have open_images_v4 data. I put cifar10 data into a folder named open_images_v4 to check what folder structure tensorflow_datasets was expecting.
The solution to this was to also use the "data_dir" parameter when initializing the dataset:
builder = tfds.image.OpenImagesV4(data_dir="/raid/openimages/dataset")
builder.download_and_prepare(download_dir="/raid/openimages/dataset")
This way the dataset is downloaded and extracted in the same directory. Before, it was (for me unnoticeably) extracting to the default directory, which is under /home/.../. That's what caused the error, as there wasn't enough space left under my home directory.
After the extraction, the folder structure is exactly as Manoj-Mohan described.
The above solution didn't work for me.
builder = tfds.builder(name='folder_name', data_dir=data_dir)
builder.download_and_prepare(download_dir="/home/...")
ds = builder.as_dataset()
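Once download_and_prepare() has populated a data_dir, later sessions should be able to load straight from it without re-downloading; a sketch, with the path as a placeholder for whichever data_dir you used above:

import tensorflow_datasets as tfds

ds = tfds.load(
    'open_images_v4',
    split='train',
    data_dir='/raid/openimages/dataset',  # placeholder: the same data_dir passed to download_and_prepare()
    download=False,                       # fail instead of re-downloading if the data is missing
)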

ValueError: Input 0 of node Variable/Assign was passed int32 from Variable:0 incompatible with expected int32_ref

I am currently trying to get a trained TF seq2seq model working with TensorFlow.js. I need to get the JSON files for this. My input is a few sentences and the output is "embeddings". The model works when I read in the checkpoint; however, I can't get it converted for tf.js. Part of the conversion process is to freeze my latest checkpoint as a protobuf (pb) file and then convert that to the JSON format expected by tensorflow.js.
The above is my understanding; since I haven't done this before it may be wrong, so please feel free to correct me if what I have deduced from reading is off.
When I try to convert to the tensorflow.js format I use the following command:
sudo tensorflowjs_converter \
    --input_format=tf_frozen_model \
    --output_node_names='embeddings' \
    --saved_model_tags=serve \
    ./saved_model/model.pb /web_model
This then displays the error listed in this post:
ValueError: Input 0 of node Variable/Assign was passed int32 from
Variable:0 incompatible with expected int32_ref.
One of the problems I'm running into is that I'm not even sure how to troubleshoot this, so I was hoping someone might have some guidance or know what my issue may be.
I have uploaded the code I used to convert the checkpoint file to a protobuf at the link below. At the bottom of the notebook I then added an import of that file, which produces the same error I get when trying to convert to the tensorflowjs format. (Just scroll to the bottom of the notebook.)
https://github.com/xtr33me/textsumToTfjs/blob/master/convert_ckpt_to_pb.ipynb
Any help would be greatly appreciated!
I'm still unsure why I was getting the above error, but in the end I was able to resolve the issue by switching over to TF's SavedModel via tf.saved_model. A rough example of what worked for me can be found below, should anyone run into something similar in the future. After saving out the model below, I was then able to run the tensorflowjs_converter call on it and export the correct files.
if first_iter:  # first time through
    first_iter = False
    # Let's try saving this bad boy
    cwd = os.getcwd()
    path = os.path.join(cwd, 'simple')
    shutil.rmtree(path, ignore_errors=True)  # remove any previous export
    inputs_dict = {
        "batch_decoder_input": tf.convert_to_tensor(batch_decoder_input)
    }
    outputs_dict = {
        "batch_decoder_output": tf.convert_to_tensor(batch_decoder_output)
    }
    tf.saved_model.simple_save(
        sess, path, inputs_dict, outputs_dict
    )
    print('Model Saved')
    # End save model code
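For reference, the converter invocation against that SavedModel directory looked roughly like the following; exact flags vary between tensorflowjs versions (older releases also want --output_node_names, newer ones use --signature_name), so treat this as a sketch rather than the exact command:

tensorflowjs_converter \
    --input_format=tf_saved_model \
    --saved_model_tags=serve \
    ./simple \
    ./web_model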