ValueError when trying to visualize an image - numpy

I'm trying to visualize some images belonging to different classes. The classes are class0, class1, and class2, which correspond to X-ray pictures of healthy, COVID, and pneumonia lungs respectively. As an example, see the picture of a COVID lung below:
I've created three datasets containing the training, test, and validation data. Please see the code below:
import pandas as pd
from keras_preprocessing.image import ImageDataGenerator
from matplotlib import pyplot as plt
import numpy as np
#Creating three dataframes reading .txt files
trainingfile = pd.read_table('data/training.txt', delim_whitespace=True, names=('class', 'image'))
testingfile = pd.read_table('data/testing.txt', delim_whitespace=True, names=('class', 'image'))
validationfile = pd.read_table('data/validation.txt', delim_whitespace=True, names=('class', 'image'))
#Change 0,1,2 to categorical class class0,class1,class2
trainingfile = trainingfile.replace([0, 1, 2], ['class0', 'class1', 'class2'])
testingfile = testingfile.replace([0, 1, 2], ['class0', 'class1', 'class2'])
validationfile = validationfile.replace([0, 1, 2], ['class0', 'class1', 'class2'])
#Final training, test and validation data
datagen=ImageDataGenerator(rescale=None)
train_generator=datagen.flow_from_dataframe(dataframe=trainingfile, directory="data/", x_col="image", y_col="class", class_mode="categorical", target_size=(256,256), batch_size=32)
test_generator=datagen.flow_from_dataframe(dataframe=testingfile, directory="data/", x_col="image", y_col="class", class_mode="categorical", target_size=(256,256), batch_size=15)
validation_generator=datagen.flow_from_dataframe(dataframe=validationfile, directory="data/", x_col="image", y_col="class", class_mode="categorical", target_size=(256,256), batch_size=21)
Now, the code to visualize one picture:
first_image = train_generator[0]
first_image = np.array(first_image, dtype='float')
pixels = first_image.reshape((28, 28))
plt.imshow(pixels, cmap='gray')
plt.show()
I get the following error:
ValueError Traceback (most recent call last)
<ipython-input-3-b237e88f96dd> in <module>
1 first_image = train_generator[0]
----> 2 first_image = np.array(first_image, dtype='float')
3 pixels = first_image.reshape((28, 28))
4 plt.imshow(pixels, cmap='gray')
5 plt.show()
ValueError: could not broadcast input array from shape (32,256,256,3) into shape (32)
Furthermore, is there any way to visualize an image corresponding to a specific class?
If, instead of first_image = train_generator[0], I do first_image = train_generator[0][0], then the error that pops up is:
ValueError Traceback (most recent call last)
<ipython-input-4-0664c7dc8c6b> in <module>
1 first_image = train_generator[0][0]
2 first_image = np.array(first_image, dtype='float')
----> 3 pixels = first_image.reshape((28, 28))
4 plt.imshow(pixels, cmap='gray')
5 plt.show()
ValueError: cannot reshape array of size 6291456 into shape (28,28)
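As a side note (not from the original post): train_generator[0] returns an (images, labels) tuple in which images has shape (32, 256, 256, 3), which is why casting the tuple to a single array fails and why one image cannot be reshaped to (28, 28). A minimal sketch of plotting one image, and one image of a specific class, assuming the generators defined above:
images, labels = train_generator[0]            # first batch: shapes (32, 256, 256, 3) and (32, 3)
plt.imshow(images[0].astype("uint8"))          # one 256x256 RGB image, no reshape needed
plt.show()
# To show an image of a specific class, e.g. 'class1':
class_index = train_generator.class_indices['class1']
idx = np.argmax(labels[:, class_index] == 1)   # first image of that class (assumes the batch contains one)
plt.imshow(images[idx].astype("uint8"))
plt.show()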

Related

'key of type tuple not found and not a MultiIndex' while generating ROC for multi-class classification

I am trying to generate a ROC curve for a multi-class classification with XGBoost, but I get 'key of type tuple not found and not a MultiIndex' every time.
Classification:
from xgboost import XGBClassifier
from xgboost import plot_tree
from sklearn import metrics
from sklearn.metrics import roc_curve, auc
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_score
from itertools import cycle
import numpy as np
import matplotlib.pyplot as plt
model = XGBClassifier()
model = model.fit(x_train, y_train)
print('Accuracy:', model.score(x_test,y_test))
score=cross_val_score(model,X,y,cv=5)
print(score)
print('CV Score:',np.mean(score))
y_pred1=model.predict(x_test)
Generating ROC:
n_classes = 5
fpr = dict()
tpr = dict()
roc_auc = dict()
lw=2
for i in range(n_classes):
    fpr[i], tpr[i], _ = roc_curve(y_test[:, i], y_pred1[:, i])
    roc_auc[i] = auc(fpr[i], tpr[i])
colors = cycle(['blue', 'red', 'green', 'yellow', 'pink'])
for i, color in zip(range(n_classes), colors):
    plt.plot(fpr[i], tpr[i], color=color, lw=2,
             label='ROC curve of class {0} (area = {1:0.2f})'
                   ''.format(i, roc_auc[i]))
plt.plot([0, 1], [0, 1], 'k--', lw=lw)
plt.xlim([-0.05, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Receiver operating characteristic for multi-class data')
plt.legend(loc="lower right")
plt.show()
Out:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-34-14f08a1b6222> in <module>
5 lw=2
6 for i in range(n_classes):
----> 7 fpr[i], tpr[i], _ = roc_curve(y_test[:, i], y_pred1[:, i])
8 roc_auc[i] = auc(fpr[i], tpr[i])
9 colors = cycle(['blue', 'red', 'green', 'yellow', 'pink'])
2 frames
/usr/local/lib/python3.8/dist-packages/pandas/core/series.py in _get_values_tuple(self, key)
1014
1015 if not isinstance(self.index, MultiIndex):
-> 1016 raise KeyError("key of type tuple not found and not a MultiIndex")
1017
1018 # If key is contained, would have returned by now
KeyError: 'key of type tuple not found and not a MultiIndex'
Q: Why is it returning a MultiIndex error even though I have 5 classes in my dataframe?
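The KeyError is not about the number of classes: y_test here is a pandas Series of integer labels, so y_test[:, i] is interpreted as a tuple key, which only works on a MultiIndex. model.predict also returns 1-D class labels rather than per-class scores. A minimal sketch of the usual fix, binarizing the labels and using predicted probabilities (a suggestion, not from the original post):
from sklearn.preprocessing import label_binarize
y_test_bin = label_binarize(y_test, classes=list(range(n_classes)))  # shape (n_samples, n_classes)
y_score = model.predict_proba(x_test)                                # per-class probabilities
for i in range(n_classes):
    fpr[i], tpr[i], _ = roc_curve(y_test_bin[:, i], y_score[:, i])
    roc_auc[i] = auc(fpr[i], tpr[i])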

How to fix an attribute error on a saved model

I have my model saved as a NumPy array. I am trying to pickle the model to run predictions on a new dataset. However, I keep running into an AttributeError.
import pickle
import pandas as pd

model = y_pred_binary
filename = 'transfer_prediction_model.pkl'
pickle.dump(model, open(filename, 'wb'))
pickled_model = pickle.load(open(filename, 'rb'))
df = pd.read_csv(f"transfer_students_prediction_dataset.csv")
predictions = pickled_model.predict(df.drop('on_time_graduation', axis=1))
print(predictions)
AttributeError Traceback (most recent call last)
Input In [24], in <cell line: 3>()
      1 pickled_model = pickle.load(open(filename, 'rb'))
      2 df = pd.read_csv(f"transfer_students_prediction_dataset.csv")
----> 3 predictions = pickled_model.predict(df.drop('on_time_graduation', axis = 1))
      4 print(predictions)
AttributeError: 'numpy.ndarray' object has no attribute 'predict'
I have tried using other methods such as joblib or fitting the model using sklearn but I still run into the same problem.
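The object being pickled here is y_pred_binary, a NumPy array of predictions, so the loaded object has no .predict method; what needs to be pickled is the fitted estimator itself. A minimal sketch, assuming a scikit-learn classifier (clf, x_train, y_train, and new_data are placeholders, not from the original post):
import pickle
from sklearn.linear_model import LogisticRegression
clf = LogisticRegression().fit(x_train, y_train)       # pickle the fitted estimator, not its predictions
with open('transfer_prediction_model.pkl', 'wb') as f:
    pickle.dump(clf, f)
with open('transfer_prediction_model.pkl', 'rb') as f:
    pickled_model = pickle.load(f)
predictions = pickled_model.predict(new_data)          # .predict now exists on the loaded object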

Get a sample of one image per class with image_dataset_from_directory

I am trying to visualize skin cancer images using Keras. I have imported the images in my notebook and have created batched datasets using tf.keras.preprocessing.image_dataset_from_directory. The code is as follows:
train_ds = tf.keras.preprocessing.image_dataset_from_directory(
    data_dir,
    validation_split=0.2,
    subset="training",
    seed=1337,
    image_size=image_size,
    batch_size=batch_size)
Now, I have been trying to visualize the images. However, I want one image from each class (there are 9 classes in the dataset). I have used the code below:
plt.figure(figsize=(10, 10))
for images, labels in train_ds.take(1):
    for i in range(9):
        ax = plt.subplot(3, 3, i + 1)
        plt.imshow(images[i].numpy().astype("uint8"))
        plt.title(class_names[labels[i]])
        plt.axis("off")
This code gets me a lot of duplicate classes. How do I get one image for each class (in this case I have 9 classes, and I want one plot for each of them)? I am not sure how to fetch unique images and their labels from a BatchDataset!
for i in range(len(class_names)):
    filtered_ds = train_ds.filter(lambda x, l: tf.math.equal(l[0], i))
    for image, label in filtered_ds.take(1):
        ax = plt.subplot(3, 3, i + 1)
        plt.imshow(image[0].numpy().astype('uint8'))
        plt.title(class_names[label.numpy()[0]])
        plt.axis('off')
You could loop through and filter on each label.
Example:
import tensorflow as tf
# fake images
imgs = tf.random.normal([100, 64, 64, 3])
# fake labels
labels = tf.random.uniform([100], minval=0, maxval=10, dtype=tf.int32)
# make dataset
ds = tf.data.Dataset.from_tensor_slices((imgs, labels))
for i in range(9):
    filtered = ds.filter(lambda _, l: tf.math.equal(l, i))
    for img, label in filtered.take(1):
        assert label.numpy() == i
        # plot image
Try the following code; it displays one image from each of the 10 categories of CIFAR-10:
import numpy as np
import matplotlib.pyplot as plt
from tensorflow import keras
(x_train, y_train), (x_test, y_test)= keras.datasets.cifar10.load_data()
fig, ax= plt.subplots(nrows= 2, ncols= 5, figsize= (18,5))
plt.suptitle('displaying one image of each category in train set'.upper(),
y= 1.05, fontsize= 16)
i= 0
for j in range(2):
    for k in range(5):
        ax[j, k].imshow(x_train[list(y_train).index(i)])
        ax[j, k].axis('off')
        ax[j, k].set_title(i)
        i += 1
plt.tight_layout()
plt.show()
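As an aside (not part of the original answer), list(y_train).index(i) compares one-element NumPy arrays inside list.index; an equivalent but more explicit lookup for the first sample of each class would be:
idx = int(np.argmax(y_train.flatten() == i))   # index of the first training image with label i
ax[j, k].imshow(x_train[idx])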

Using tf extract_image_patches for input to a CNN?

I want to extract patches from my original images to use them as input for a CNN.
After a little research I found a way to extract patches with
tensorflow.compat.v1.extract_image_patches.
Since these need to be reshaped to "image format", I implemented a method reshape_image_patches to reshape them and store the reshaped patches in a list.
image_patches2 = []

def reshape_image_patches(image_patches, sess, ksize_rows, ksize_cols):
    a = sess.run(tf.shape(image_patches))
    nr, nc = a[1], a[2]
    for i in range(nr):
        for j in range(nc):
            patch = tf.reshape(image_patches[0, i, j], [ksize_rows, ksize_cols, 3])
            image_patches2.append(patch)
    return image_patches2
How can I use this in combination with Keras generators to make these patches the input of my CNN?
Edit 1:
I have tried the approach in Load tensorflow images and create patches
import tensorflow as tf
import matplotlib.pyplot as plt
import numpy as np
dataset = tf.keras.preprocessing.image_dataset_from_directory(
    <directory>,
    label_mode=None,
    seed=1,
    subset='training',
    validation_split=0.1,
    image_size=(900, 900))
get_patches = lambda x: (tf.reshape(
    tf.image.extract_patches(
        x,
        sizes=[1, 16, 16, 1],
        strides=[1, 8, 8, 1],
        rates=[1, 1, 1, 1],
        padding='VALID'), (111*111, 16, 16, 3)))
dataset = dataset.map(get_patches)
fig = plt.figure()
plt.subplots_adjust(wspace=.1, hspace=.2)
images = next(iter(dataset))
for index, image in enumerate(images):
    ax = plt.subplot(2, 2, index + 1)
    ax.set_xticks([])
    ax.set_yticks([])
    ax.imshow(image)
plt.show()
At the line images = next(iter(dataset)) I get the error: InvalidArgumentError: Input to reshape is a tensor with 302800896 values, but the requested shape has 9462528
[[{{node Reshape}}]]
Does somebody know how to fix this?
tf.reshape does not change the order of, or the total number of, elements in the tensor. As the error states, you are trying to reduce the total number of elements from 302800896 to 9462528. You are using tf.reshape inside a lambda function.
In the example below, I have recreated your scenario: I passed 2 as the shape argument to tf.reshape, which doesn't accommodate all the elements of the original tensor and therefore throws the error.
Code -
%tensorflow_version 2.x
import tensorflow as tf
t1 = tf.Variable([1,2,2,4,5,6])
t2 = tf.reshape(t1, 2)
Output -
---------------------------------------------------------------------------
InvalidArgumentError Traceback (most recent call last)
<ipython-input-3-0ff1d701ff22> in <module>()
3 t1 = tf.Variable([1,2,2,4,5,6])
4
----> 5 t2 = tf.reshape(t1, 2)
3 frames
/usr/local/lib/python3.6/dist-packages/six.py in raise_from(value, from_value)
InvalidArgumentError: Input to reshape is a tensor with 6 values, but the requested shape has 2 [Op:Reshape]
tf.reshape can change the arrangement of elements, but the total number of elements must remain the same. So the fix would be to change the shape to [2, 3]:
Code -
%tensorflow_version 2.x
import tensorflow as tf
t1 = tf.Variable([1,2,2,4,5,6])
t2 = tf.reshape(t1, [2,3])
print(t2)
Output -
tf.Tensor(
[[1 2 2]
[4 5 6]], shape=(2, 3), dtype=int32)
To solve your problem, either extract patches (tf.image.extract_patches) of the size that you are trying to tf.reshape to, or change the tf.reshape target shape to match the size of the extracted patches.
I would also suggest looking into other tf.image functionality such as tf.image.central_crop and tf.image.crop_and_resize.
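Applied to the pipeline in the question, the mismatch comes from the batch dimension: image_dataset_from_directory yields batches (32 images by default), so the tensor passed to the map function holds 32 × 111 × 111 patches (32 × 9462528 = 302800896 values), not 111 × 111. A minimal sketch of a shape-agnostic version (my suggestion, not from the original answer):
get_patches = lambda x: tf.reshape(
    tf.image.extract_patches(
        x,
        sizes=[1, 16, 16, 1],
        strides=[1, 8, 8, 1],
        rates=[1, 1, 1, 1],
        padding='VALID'),
    (-1, 16, 16, 3))                            # -1 lets TF infer batch_size * 111 * 111 patches
dataset = dataset.map(get_patches).unbatch()    # optional: one 16x16x3 patch per element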

Converting tensorflow dataset to pandas dataframe

I am very new to deep learning and computer vision. I want to do a face recognition project. For that I downloaded some images from the Internet and converted them to a TensorFlow dataset with the help of this article from the TensorFlow documentation. Now I want to convert that dataset to a pandas dataframe in order to write it to CSV files. I have tried a lot but am unable to do it.
Can someone help me with it?
Here is the code for making the dataset, followed by some of the (wrong) code I tried for the conversion.
import tensorflow as tf
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
filenames = tf.constant(['al.jpg', 'al2.jpg', 'al3.jpg', 'al4.jpeg','al5.jpeg', 'al6.jpeg','al7.jpg','al8.jpeg', '5.jpg', 'hrit8.jpeg', 'Hrithik-Roshan.jpg', 'Hrithik.jpg', 'hriti1.jpeg', 'hriti2.jpg', 'hriti3.jpeg', 'hritik4.jpeg', 'hritik5.jpg', 'hritk9.jpeg', 'index.jpeg', 'sah.jpeg', 'sah1.jpeg', 'sah3.jpeg', 'sah4.jpg', 'sah5.jpg','sah6.jpg','sah7.jpg'])
labels = tf.constant([1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 2, 2, 2, 2, 2, 2])
dataset = tf.data.Dataset.from_tensor_slices((filenames, labels))
def _parse_function(filename, label):
    image_string = tf.read_file(filename)
    image_decoded = tf.image.decode_jpeg(image_string, channels=3)
    image_resized = tf.image.resize_images(image_decoded, [28, 28])
    return image_resized, label
dataset = dataset.map(_parse_function)
dataset = dataset.shuffle(buffer_size=100)
dataset = dataset.batch(26)
iterator = dataset.make_one_shot_iterator()
image,labels = iterator.get_next()
sess = tf.Session()
print(sess.run([image, labels]))
Initially I just tried to use df = pd.DataFrame(dataset)
Then I got the following error:
ValueError Traceback (most recent call last)
<ipython-input-15-d5503ae4603d> in <module>()
----> 1 df = pd.DataFrame((dataset))
~/anaconda3/lib/python3.6/site-packages/pandas/core/frame.py in __init__(self, data, index, columns, dtype, copy)
402 dtype=values.dtype, copy=False)
403 else:
--> 404 raise ValueError('DataFrame constructor not properly called!')
405
406 NDFrame.__init__(self, mgr, fastpath=True)
ValueError: DataFrame constructor not properly called!
Thereafter I came across this article and realized my mistake: in TensorFlow (1.x), things exist only within a session. So I tried the following code:
with tf.Session() as sess:
    df = pd.DataFrame(sess.run(dataset))
Please pardon me if I made a stupid mistake; I wrote the above code by analogy with print(sess.run(dataset)), and got a much bigger error:
TypeError: Fetch argument <BatchDataset shapes: ((?, 28, 28, 3), (?,)), types: (tf.float32, tf.int32)> has invalid type <class 'tensorflow.python.data.ops.dataset_ops.BatchDataset'>, must be a string or Tensor. (Can not convert a BatchDataset into a Tensor or Operation.)
I think you could use map like this. I assumed that you want to add a NumPy array to a dataframe as described here. But you have to append one by one and also figure out how this whole array fits in one column of the dataframe.
import tensorflow as tf
import pandas as pd
filenames = tf.constant(['C:/Machine Learning/sunflower/50987813_7484bfbcdf.jpg'])
labels = tf.constant([1])
dataset = tf.data.Dataset.from_tensor_slices((filenames, labels))
sess = tf.Session()
def convert_to_dataframe(filename, label):
    print(pd.DataFrame.from_records(filename))
    return filename, label

def _parse_function(filename, label):
    image_string = tf.read_file(filename)
    image_decoded = tf.image.decode_jpeg(image_string, channels=3)
    image_resized = tf.image.resize_images(image_decoded, [28, 28])
    return image_resized, label

dataset = dataset.map(_parse_function)
dataset = dataset.map(lambda filename, label: tf.py_func(convert_to_dataframe,
                                                         [filename, label],
                                                         [tf.float32, tf.int32]))
dataset = dataset.shuffle(buffer_size=100)
dataset = dataset.batch(26)
iterator = dataset.make_one_shot_iterator()
image,labels = iterator.get_next()
sess.run([image, labels])
One easy way to do it is to save the dataset into a normal CSV file, and then read that CSV file directly into a pandas dataframe.
import tensorflow_datasets as tfds
# Construct a tf.data.Dataset
ds = tfds.load('civil_comments/CivilCommentsCovert', split='train')
#read the dataset into a tensorflow styled_dataframe
df = tfds.as_dataframe(ds)
#save the dataframe into csv file
df.to_csv("/.../.../Desktop/covert_toxicity.csv")
#read the csv file as normal, then you have the df you need
import pandas as pd
file_path = "/.../.../Desktop/covert_toxicity.csv"
df = pd.read_csv(file_path, header = 0, sep=",")
df
A simpler way to convert a TensorFlow object to a dataframe is to convert it to a NumPy array and pass that to the pandas DataFrame constructor.
import pandas as pd

# Requires TF 2.x eager execution so that .numpy() is available on the tensors
df = pd.DataFrame({'filename': filenames.numpy(), 'label': labels.numpy()})
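If the goal is the original one (writing the images and labels themselves to CSV), a minimal sketch under TF 2.x eager execution, assuming the batched (image, label) dataset built in the question (the pixel column names px0, px1, ... are made up for illustration):
import pandas as pd
rows = []
for image, label in dataset.unbatch():                  # undo .batch(26): one (28, 28, 3) image per element
    row = {'label': int(label.numpy())}
    row.update({f'px{i}': float(v) for i, v in enumerate(image.numpy().ravel())})
    rows.append(row)
df = pd.DataFrame(rows)
df.to_csv('images.csv', index=False)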