MNIST-like issue. Convolutional Neural Network - tensorflow

This should be easy for some, but I'm a bit new to TensorFlow, and all my research has led me to multi-thousand-line Git repositories; I'm curious whether there is a simpler alternative for a beginner. My idea takes a 200x260 color image as input and outputs a one-hot vector over 10 classes (1-10). I realized it is very similar to MNIST, but TensorFlow has no documentation on how the mnist library turns its images into a usable form. Does anybody have ideas on how to turn a folder of about 200 images (yes, I know, small) into a usable form? I already have my one-hot vectors. Also, I set my placeholder as tf.placeholder(tf.float32, [None, 200, 260, 3]). Would that work? I would really prefer to keep color as well. Thanks for any tips!

First, you can load each of your images with imread from skimage.
For example:
import skimage.io
import skimage.transform
my_image = skimage.io.imread('./path/myimage.png')
If they are all already the size you want (200x260), you can normalize them by dividing by 255, which scales the pixel values to between 0 and 1. If not, you can use resize from skimage, which resizes the image and, for 8-bit input, also converts it to floats between 0 and 1 (its preserve_range argument defaults to False).
For example:
my_image = skimage.transform.resize(my_image, (200, 260))
To visualize an image, you can plot it with imshow from matplotlib.pyplot.
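To get from a folder of images to something the placeholder above accepts, the loaded images only need to be stacked into one float array of shape (N, 200, 260, 3). The sketch below assumes 8-bit images straight from imread (if you already ran resize, the values are in [0, 1] and you should skip the division); to_batch is a hypothetical helper name, and the dummy arrays stand in for real imread results.

```python
import numpy as np

def to_batch(images, height=200, width=260):
    """Stack a list of HxWx3 uint8 images into a float32 array in [0, 1].

    The result has shape (N, height, width, 3), matching a placeholder
    declared as tf.placeholder(tf.float32, [None, 200, 260, 3]).
    """
    batch = np.stack(images).astype(np.float32) / 255.0  # 0..255 -> 0..1
    expected = (len(images), height, width, 3)
    if batch.shape != expected:
        raise ValueError(f"got shape {batch.shape}, expected {expected}")
    return batch

# Two dummy images standing in for skimage.io.imread() results.
fake = [np.full((200, 260, 3), 255, dtype=np.uint8),
        np.zeros((200, 260, 3), dtype=np.uint8)]
x = to_batch(fake)
print(x.shape, x.dtype, x.max(), x.min())
```

np.stack itself fails if the images differ in size, so resize them first.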
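To mimic the convenient next_batch function that TensorFlow's MNIST helper provides, you can use the following code (training_images and training_labels are the NumPy arrays holding your images and one-hot labels):
i = 0  # start index of the next batch
def next_batch(batch_size):
    global i  # needed because i is reassigned below
    x = training_images[i:i + batch_size]
    y = training_labels[i:i + batch_size]
    i = (i + batch_size) % len(training_images)  # wrap around to the start
    return x, y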
Then you can create your CNN and train it on the images. The placeholder you created for X looks right.
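As an alternative to the modulo-based next_batch above (not what that answer uses), a generator can walk through the data once per epoch in a fresh random order without a global counter. This is a sketch; iterate_batches is a hypothetical name, and images and labels must be NumPy arrays so that index arrays select matching rows.

```python
import numpy as np

def iterate_batches(images, labels, batch_size, rng=None):
    """Yield (x, y) mini-batches that cover the dataset exactly once.

    Pass a np.random.Generator as rng to visit the samples in a new random
    order on every call (i.e. every epoch); the last batch may be smaller.
    """
    order = np.arange(len(images))
    if rng is not None:
        rng.shuffle(order)
    for start in range(0, len(order), batch_size):
        idx = order[start:start + batch_size]
        yield images[idx], labels[idx]   # same indices keep x and y aligned

# Dummy data: 200 small images and one-hot labels over 10 classes.
images = np.zeros((200, 20, 26, 3), dtype=np.float32)
labels = np.eye(10, dtype=np.float32)[np.arange(200) % 10]
rng = np.random.default_rng(0)
for epoch in range(2):
    sizes = [len(x) for x, y in iterate_batches(images, labels, 64, rng)]
    print(epoch, sizes)
```

Shuffling per epoch keeps consecutive batches from always containing the same images, which matters more than usual with only about 200 samples.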

I also struggled with this in the beginning, but the best way I know to get data into TensorFlow is to convert your images into the TFRecord format, especially if you have a large dataset that doesn't fit into RAM. That way TensorFlow can load your data as needed (you need to provide input functions that decode your files back into tensors).
Although this certainly isn't the easiest way, it is probably the best in the long run if you want to add more images.
The easiest way would be to load your images with Pillow or any other image library (I'm assuming you're using TensorFlow with Python) and hand them to TensorFlow when running your session.

Related

False prediction from efficientnet transfer learning

I'm new to transfer learning in TensorFlow and chose TF Hub to simplify finding a pretrained model, but now I'm confused, because my model gives a wrong prediction when I try it on an image from the internet. I used the efficientnet_v2_imagenet1k_b0 feature vector, without fine-tuning, to train on the rock-paper-scissors dataset from https://www.kaggle.com/drgfreeman/rockpaperscissors. I used ImageDataGenerator and flow_from_directory for data processing.
This is my model here
This is my train result here
This is my test result here
This is the second time I've gotten something like this when using transfer learning with TF Hub. I want to know why it happens and how to fix it, so that it doesn't happen again. Thanks a lot for your help, and sorry for my bad English.
I downloaded your code and the dataset to my local machine; I had to make a few adjustments to run it locally.
I believe the efficientnet_v2_imagenet1k_b0 model differs from the newer EfficientNet models in that this version DOES require pixel values to be scaled between 0 and 1. I ran the model with and without rescaling, and it works well only if the pixels are rescaled. Below is the code I used to test whether the model correctly predicts an image downloaded from the internet. It worked as expected.
import cv2
import matplotlib.pyplot as plt
import numpy as np
class_dict=train_generator.class_indices
print (class_dict)
rev_dict={}
for key, value in class_dict.items():
    rev_dict[value]=key  # map class index back to class name
print (rev_dict)
fpath=r'C:\Temp\rps\1.jpg' # an image downloaded from internet that should be paper class
img=plt.imread(fpath)
print (img.shape)
img=cv2.resize(img, (224,224)) # resize to 224 X 224 to be same size as model was trained on
print (img.shape)
plt.imshow(img)
img=img/255.0 # rescale as was done with training images
img=np.expand_dims(img,axis=0)
print(img.shape)
p=model.predict(img)
print (p)
index=np.argmax(p)
print (index)
klass=rev_dict[index]
prob=p[0][index]* 100
print (f'image is of class {klass}, with probability of {prob:6.2f}')
The results were:
{'paper': 0, 'rock': 1, 'scissors': 2}
{0: 'paper', 1: 'rock', 2: 'scissors'}
(300, 300, 3)
(224, 224, 3)
(1, 224, 224, 3)
[[9.9902594e-01 5.5121275e-04 4.2284720e-04]]
0
image is of class paper, with probability of 99.90
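The rev_dict loop and argmax above can be folded into one function that decodes a whole batch of predictions. This is a sketch: decode_predictions is a hypothetical name, class_indices is what train_generator.class_indices returned, and the probabilities are the ones printed above.

```python
import numpy as np

def decode_predictions(probs, class_indices):
    """Map each row of softmax outputs to (class_name, percent)."""
    index_to_class = {v: k for k, v in class_indices.items()}  # invert the dict
    results = []
    for row in np.asarray(probs):
        best = int(np.argmax(row))
        results.append((index_to_class[best], float(row[best]) * 100))
    return results

# Values copied from the output above.
class_indices = {'paper': 0, 'rock': 1, 'scissors': 2}
p = [[9.9902594e-01, 5.5121275e-04, 4.2284720e-04]]
res = decode_predictions(p, class_indices)
for name, pct in res:
    print(f'image is of class {name}, with probability of {pct:6.2f}')
```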
You had this in your code:
uploaded = files.upload()
len_file = len(uploaded.keys())
This did not run locally because files was not defined (it appears to be Google Colab's google.colab.files module), so I could not find what causes your misclassification problem.
Remember that in flow_from_directory, if you do not specify color_mode, it defaults to 'rgb'. So even though the training images are 4-channel PNGs, the model is actually trained on 3 channels. Make sure the images you want to predict on also have 3 channels.
To really help, I would need to see the code for how you provide your data to model.predict. As a guess, though: remember that EfficientNet needs the pixels in the range 0 to 255, so do not scale your images (this holds for the Keras EfficientNet models in tf.keras.applications, which rescale their inputs internally). Make sure your test images are RGB and the same size as the images used in training. I would also need to see the code for how you process the predictions.
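Both answers come down to making the prediction input look like the training input: 3 channels, the training size, and the same pixel range. The sketch below handles channels, range and the batch dimension (resizing, e.g. with cv2.resize, is assumed to be done already). prepare_for_predict and scale_to_unit are hypothetical names; which range is right depends on the model (per the first answer, [0, 1] for the TF Hub efficientnet_v2_imagenet1k_b0 module; [0, 255] for models that rescale internally). Treating float input as already in [0, 1] is an assumption that matches plt.imread on PNG files.

```python
import numpy as np

def prepare_for_predict(img, scale_to_unit=True):
    """Turn one HxW, HxWx3 or HxWx4 image into a (1, H, W, 3) float32 batch."""
    img = np.asarray(img)
    if img.ndim == 2:                       # grayscale -> repeat into 3 channels
        img = np.stack([img] * 3, axis=-1)
    if img.shape[-1] == 4:                  # RGBA PNG -> drop the alpha channel
        img = img[..., :3]
    was_float = np.issubdtype(img.dtype, np.floating)
    img = img.astype(np.float32)
    if was_float:                           # assume float input is already in [0, 1]
        img = img * 255.0
    if scale_to_unit:
        img = img / 255.0                   # [0, 255] -> [0, 1]
    return np.expand_dims(img, axis=0)      # add the batch dimension

rgba = np.zeros((224, 224, 4), dtype=np.uint8)
rgba[..., 0] = 255                                      # red, alpha channel all zero
unit = prepare_for_predict(rgba)                        # for a [0, 1] model
raw = prepare_for_predict(rgba, scale_to_unit=False)    # for a [0, 255] model
print(unit.shape, unit.max(), raw.max())
```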

Resnet50 image preprocessing

I am using https://tfhub.dev/google/imagenet/resnet_v2_50/feature_vector/3 to extract image feature vectors. However, I'm confused when it comes to how to preprocess the images prior to passing them through the module.
According to the related GitHub explanation, the following should be done:
import tensorflow as tf  # TF 1.x API (tf.read_file, tf.random_crop)
image_path = "path/to/the/jpg/image"
image_string = tf.read_file(image_path)
image = tf.image.decode_jpeg(image_string, channels=3)
image = tf.image.convert_image_dtype(image, tf.float32)
# All other transformations (during training), in my case:
image = tf.random_crop(image, [224, 224, 3])
image = tf.image.random_flip_left_right(image)
# During testing:
image = tf.image.resize_image_with_crop_or_pad(image, 224, 224)
However, using the transformations above, the results I am getting suggest that something might be wrong. Moreover, the ResNet paper says that the images should be preprocessed as follows:
A 224×224 crop is randomly sampled from an image or its
horizontal flip, with the per-pixel mean subtracted...
I don't quite understand what this means. Can someone point me in the right direction?
Looking forward to your answers!
The image modules on TensorFlow Hub all expect pixel values in the range [0,1], as produced by your code snippet above. This makes it easy and safe to switch between modules.
Inside the module, the input values are scaled to the range the network was trained for. The module https://tfhub.dev/google/imagenet/resnet_v2_50/feature_vector/3 has been published from a TF-Slim checkpoint (see documentation), which uses yet another convention for normalizing inputs than He et al., but all of this is taken care of.
To demystify the language in He et al.: it refers to the mean R, G and B values aggregated over all pixels of the dataset they studied, following the old wisdom that normalizing inputs to zero mean helps neural networks train better. However, later papers on image classification no longer devoted as much attention to dataset-specific preprocessing.
The passage you quoted from the ResNet paper is based on the following explanation from the AlexNet paper:
ImageNet consists of variable-resolution images, while our system requires a constant input dimensionality. Therefore, we down-sampled the images to a fixed resolution of 256×256. Given a rectangular image, we first rescaled the image such that the shorter side was of length 256, and then cropped out the central 256×256 patch from the resulting image. We did not pre-process the images in any other way, except for subtracting the mean activity over the training set from each pixel.
So in the ResNet paper, a similar process consists of taking a 224x224-pixel crop of the image (or of its horizontally flipped version), so that the network is always given constant-sized images, and then centering it by subtracting the mean.
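The procedure in the quotes can be sketched in NumPy as below. This is an illustration of the paper-style preprocessing only, not what the TF Hub module does internally (per the answer above, the module handles its own normalization from [0,1] inputs). "Mean" is computed both ways: the per-channel R, G, B mean described in the answer, and the mean image (one value per pixel position) that the AlexNet wording suggests. The random training data and random_crop_flip are hypothetical.

```python
import numpy as np

def random_crop_flip(img, size=224, rng=None):
    """Random size x size crop, then a horizontal flip with probability 0.5."""
    if rng is None:
        rng = np.random.default_rng()
    h, w = img.shape[:2]
    top = int(rng.integers(0, h - size + 1))    # random top-left corner
    left = int(rng.integers(0, w - size + 1))
    crop = img[top:top + size, left:left + size]
    if rng.random() < 0.5:
        crop = crop[:, ::-1]                     # reverse the width axis = mirror
    return crop

# Hypothetical training set: 8 RGB images already rescaled to 256x256.
rng = np.random.default_rng(0)
train = rng.integers(0, 256, size=(8, 256, 256, 3)).astype(np.float32)

channel_mean = train.mean(axis=(0, 1, 2))  # mean R, G, B over all pixels: shape (3,)
mean_image = train.mean(axis=0)            # mean at every pixel position: (256, 256, 3)

# Subtract the mean first (on the full 256x256 image), then crop and flip.
centered_rgb = random_crop_flip(train[0] - channel_mean, rng=rng)
centered_img = random_crop_flip(train[0] - mean_image, rng=rng)
print(centered_rgb.shape, centered_img.shape)
```

Subtracting before cropping keeps the mean image aligned with the pixels it was computed for; with the per-channel mean the order does not matter.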

How to deal with large(>2GB) embedding lookup table in tensorflow?

When I use pre-trained word vectors to do classification with an LSTM, I wonder how to deal with an embedding lookup table larger than 2 GB in TensorFlow.
To do this, I tried to create the embedding lookup as in the code below:
data = tf.nn.embedding_lookup(vector_array, input_data)
and got this ValueError:
ValueError: Cannot create a tensor proto whose content is larger than 2GB
The variable vector_array in the code is a NumPy array containing about 14 million unique tokens, with a 100-dimensional word vector for each.
Thank you for your help.
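A quick size check shows why the error appears: assuming the vectors are float32 (4 bytes each; float64 would double this), the matrix is far above the roughly 2 GB (2**31 bytes) that the error message refers to, so it cannot be embedded in the graph as a constant.

```python
# Rough size of the embedding matrix from the question, assuming float32.
num_tokens = 14_000_000
dim = 100
bytes_needed = num_tokens * dim * 4
proto_limit = 2**31  # the ~2 GB limit named in the error message
print(bytes_needed / 1e9, "GB,", round(bytes_needed / proto_limit, 2), "x the limit")
```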
You need to copy it to a TensorFlow variable. There's a great answer to this question on Stack Overflow:
Using a pre-trained word embedding (word2vec or Glove) in TensorFlow
This is how I did it:
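import tensorflow as tf  # TF 1.x API
# embedding_vocab_size, EMBEDDING_DIM and embedding_matrix (the NumPy array) come from your own code
embedding_weights = tf.Variable(tf.constant(0.0, shape=[embedding_vocab_size, EMBEDDING_DIM]), trainable=False, name="embedding_weights")
# the placeholder lets the large matrix be fed at run time instead of being stored in the graph
embedding_placeholder = tf.placeholder(tf.float32, [embedding_vocab_size, EMBEDDING_DIM])
embedding_init = embedding_weights.assign(embedding_placeholder)
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
sess.run(embedding_init, feed_dict={embedding_placeholder: embedding_matrix})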
You can then use the embedding_weights variable to perform the lookup (remember to store the word-index mapping).
Update: using the variable is not required, but it lets you save the embeddings for future use so that you don't have to redo the whole thing (loading very large embeddings takes a while on my laptop). If that's not important, you can simply use placeholders, as Niklas Schnelle suggested.
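The word-index mapping matters because the lookup itself is just a row gather by integer id, which is what tf.nn.embedding_lookup computes for a single (unpartitioned) params tensor. Below is a NumPy sketch with a hypothetical tiny vocabulary; mapping unknown words to an "<unk>" row 0 is an assumed convention, not something from the answer.

```python
import numpy as np

# Hypothetical tiny vocabulary standing in for the 14M-token one.
vocab = ["<unk>", "the", "cat", "sat"]
word_to_index = {w: i for i, w in enumerate(vocab)}   # store this mapping!
vectors = np.arange(len(vocab) * 3, dtype=np.float32).reshape(len(vocab), 3)

def to_ids(tokens):
    """Map tokens to row indices; unknown words fall back to <unk> (row 0)."""
    return np.array([word_to_index.get(t, 0) for t in tokens])

ids = to_ids(["the", "dog", "sat"])
embedded = vectors[ids]   # row gather: one embedding row per token
print(ids, embedded.shape)
```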
For me, the accepted answer doesn't seem to work. While there is no error, the results were terrible (compared to a smaller embedding initialized directly), and I suspect the embeddings were just the constant 0 that the tf.Variable() is initialized with.
Using just a placeholder, without an extra variable:
self.Wembed = tf.placeholder(
tf.float32, self.embeddings.shape,
name='Wembed')
and then feeding the embedding on every session.run() call, however, seems to work.
Using feed_dict with large embeddings was too slow for me with TF 1.8, probably due to the issue mentioned by Niklas Schnelle.
I ended up with the following code:
embeddings_ph = tf.placeholder(tf.float32, wordVectors.shape, name='wordEmbeddings_ph')
embeddings_var = tf.Variable(embeddings_ph, trainable=False, name='wordEmbeddings')
embeddings = tf.nn.embedding_lookup(embeddings_var,input_data)
.....
sess.run(tf.global_variables_initializer(), feed_dict={embeddings_ph:wordVectors})

How can I feed a numpy array to a prefetch and buffer pipeline of TensorFlow

I tried to follow the CIFAR-10 example. However, I want to replace the file reading with a NumPy array. There are a few benefits to doing that:
Simpler code (I want to remove the binary file parsing)
Simpler graph and visualization --> easier to explain to other audiences
A small performance improvement (less I/O and parsing)?
What would be a simple way to do it?
You need to get the tensor reshaped_image by either:
giving it a name, or
finding its default name, with TensorBoard for instance.
For example, to give it a name:
reshaped_image = tf.cast(read_input.uint8image, tf.float32, name="float_image")
Then you can feed your NumPy array using a feed_dict (tensor names carry an output index, hence ":0"):
reshaped_image = tf.get_default_graph().get_tensor_by_name("float_image:0")
sess.run(loss, feed_dict={reshaped_image: your_numpy})
The same goes for labels.

How should I structure my labels for TensorFlow?

I'm trying to use TensorFlow to train output servo commands given an input image.
I plan on using a file, as @mrry suggested in this question, with the images listed like so:
../some/path/some_img.JPG *some_label*
My question is, what are the label formats I can provide to TensorFlow and what structures are suggested?
My data is basically n servo commands from 0-10 seconds. A vector would work great:
[0,2,4,3]
or similarly:
[0,.25,.4,.3]
I couldn't find much about labels in the docs. Can anyone shed any light on TensorFlow labels?
And a very related question is what is the best way to structure these for TensorFlow to properly learn from them?
In TensorFlow, labels are just generic tensors. You can use any kind of tensor to store your labels; in your case, a 1-D tensor of shape (4,) per image seems to be what you want.
Labels differ from the rest of the data only in how they are used in the computational graph: (usually) labels are used only inside the loss function, while the other data is propagated through the whole network. For your problem, a 4-dimensional regression should work.
Also, look at my newest comment on the (old) question. Using slice_input_producer seems preferable in your case.
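To make "labels are just tensors used in the loss" concrete, here is a NumPy sketch assuming a hypothetical label file in which each line holds an image path followed by four whitespace-separated servo values (paths containing spaces would break this simple split). The labels become an (N, 4) float array and only enter a mean-squared-error loss, the usual choice for this kind of 4-dimensional regression.

```python
import io
import numpy as np

# Hypothetical label file: image path followed by 4 servo values per line.
label_file = io.StringIO(
    "../some/path/img_001.JPG 0 2 4 3\n"
    "../some/path/img_002.JPG 1 1 0 5\n"
)

paths, labels = [], []
for line in label_file:
    path, *values = line.split()
    paths.append(path)
    labels.append([float(v) for v in values])
labels = np.array(labels, dtype=np.float32)   # shape (N, 4): one 4-vector per image

def mse(predictions, targets):
    """Mean squared error: the labels only enter the computation here."""
    return float(np.mean((predictions - targets) ** 2))

predictions = np.zeros_like(labels)   # stand-in for the network's 4 outputs
print(labels.shape, mse(predictions, labels))
```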