I have a list of image files that I want to convert to NumPy arrays and append to a text file, one array per line. This is my code:
from PIL import Image
import numpy as np
import os
data = os.listdir("inputs")
print(len(data))
with open('np_arrays.txt', 'a+') as file:
    for dt in data:
        img = Image.open("inputs\\" + dt)
        np_img = np.array(img)
        file.write(np_img)
        file.write('\n')
but file.write() requires a string and does not accept a numpy ndarray. How can I solve this?
NumPy also allows you to save directly to .txt files with np.savetxt.
I'm still not entirely sure what format you want your text file to be in, but a solution might look something like this:
from PIL import Image
import numpy as np
import os
data = os.listdir("inputs")
print(len(data))
shape = ( len(data), .., .., ) # input the desired shape
np_imgs = np.empty(shape)
for i, dt in enumerate(data):
    img = Image.open("inputs\\" + dt)
    np_imgs[i] = np.array(img)  # caveat: all images must have exactly the same shape to fit in one NumPy array
# np.savetxt only accepts 1-D or 2-D arrays, so write one flattened image per row
np.savetxt('np_arrays.txt', np_imgs.reshape(len(data), -1))
Note that np.savetxt() has many parameters (such as fmt, delimiter and header) that let you fine-tune the format of the output text file.
The write() function only allows strings as its input. Try using numpy.array2string.
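A minimal sketch of the numpy.array2string suggestion, assuming one flattened array per line is an acceptable format. The helper name array_to_line, the synthetic arrays and the temporary output path are illustrative only and not part of the original code:

```python
import os
import tempfile

import numpy as np


def array_to_line(arr):
    """Render an array of any shape as a single line of text (hypothetical helper)."""
    flat = np.asarray(arr).ravel()
    # threshold above flat.size disables "..." summarization;
    # a very large max_line_width keeps everything on one line.
    return np.array2string(
        flat, separator=" ", threshold=flat.size + 1, max_line_width=10**9
    )


# Synthetic stand-ins for the arrays produced by np.array(Image.open(...)).
images = [
    np.arange(12, dtype=np.uint8).reshape(2, 2, 3),
    np.full((1, 2, 3), 255, dtype=np.uint8),
]

out_path = os.path.join(tempfile.mkdtemp(), "np_arrays.txt")
with open(out_path, "a+") as file:
    for np_img in images:
        file.write(array_to_line(np_img) + "\n")

# Read the file back to confirm there is one line per array.
with open(out_path) as file:
    lines = file.read().splitlines()
```

Flattening discards the image shape, so the shape would have to be stored separately if the arrays need to be reconstructed later.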
I need help with my code. I built a recommendation system using cosine similarity in a Colab notebook and serialized it with pickle. When I deserialize it in the same Colab notebook, it works perfectly, but when I deserialize it in a new Colab notebook, it gives me this error:
name 'data' is not defined
data is a variable that is initialized with my dataset which is outside of the class InstaPost.
import pandas as pd
import numpy as np
from sklearn.feature_extraction import text
from sklearn.metrics.pairwise import cosine_similarity
import dill as pickle
data = pd.read_csv("/content/instaData.txt")
data
data = data[["Caption", "Hashtags"]]
captions = data["Caption"].tolist()
uni_tfidf = text.TfidfVectorizer(input=captions, stop_words="english")
uni_matrix = uni_tfidf.fit_transform(captions)
uni_sim = cosine_similarity(uni_matrix)
def recommend_post(x):
    return ", ".join(data["Caption"].loc[x.argsort()[-7:-1]])
data["Recommended Post"] = [recommend_post(x) for x in uni_sim]
class InstaPost:
    def Post(number):
        count = 0
        wordy = (data["Recommended Post"][number])
        sentence = wordy.split(',')
        for i in sentence:
            count=count+1
            print(count," ",i)
obj = InstaPost
obj.Post(1)
pickle_out = open("modelREC", "wb")
pickle.dump(obj, pickle_out)
pickle_out.close()
pickle_in = open("modelREC", "rb")
exe = pickle.load(pickle_in)
print(exe.Post(10))
NOTE: in a different file,
print(exe.Post)
works and gives the output
<function InstaPost.Post at 0x7efc0b4c3f70>
If I need to pass a reference to data, please guide me on how I should do it. It would be a great help to me.
Thanks in advance
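One possible direction, sketched under assumptions: pickle the recommendation data itself and have the class hold it as an attribute, so nothing depends on a global named data existing in the new notebook. The constructor, the post method and the sample captions below are hypothetical, and the class definition would still need to be available wherever the data is loaded:

```python
import pickle


class InstaPost:
    """Holds its recommendations instead of reading a global variable."""

    def __init__(self, recommendations):
        self.recommendations = recommendations

    def post(self, number):
        # Split the comma-separated recommendation string into captions.
        return [caption.strip() for caption in self.recommendations[number].split(",")]


# Hypothetical stand-in for data["Recommended Post"].tolist().
recommendations = ["caption a, caption b", "caption c, caption d"]

# Serialize only the plain data, then rebuild the object after loading.
payload = pickle.dumps(recommendations)
restored = InstaPost(pickle.loads(payload))

for count, caption in enumerate(restored.post(1), start=1):
    print(count, " ", caption)
```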
I am trying to create a simple image classification tool.
I would like the code below to classify images. It works fine when the input is a non-image NumPy array.
#https://e2eml.school/images_to_numbers.html
import numpy as np
from sklearn.utils import Bunch
from PIL import Image
monkey = [1]
dog = [2]
example_animals = Bunch(data = np.array([monkey,dog]),target = np.array(['monkey','dog']))
from sklearn.cluster import KMeans
kmeans = KMeans(n_clusters=2) #with KMeans you get to pre specify the number of Clusters
KModel = kmeans.fit(example_animals.data) #fit a model using the training data , in this case original example animal data passed through
import pandas as pd
crosstab = pd.crosstab(example_animals.target,KModel.labels_)
print(crosstab)
I have looked into how to turn an image into a NumPy array at https://e2eml.school/images_to_numbers.html
The code below, where I have converted the images to NumPy arrays, doesn't work.
When run, it raises the following error:
'setting an array element with a sequence'
#https://e2eml.school/images_to_numbers.html
import numpy as np
from sklearn.utils import Bunch
from PIL import Image
monkey = np.asarray(Image.open("monkey.jpg"))
dog = np.asarray(Image.open("dog.jpeg"))
example_animals = Bunch(data = np.array([monkey,dog]),target = np.array(['monkey','dog']))
from sklearn.cluster import KMeans
kmeans = KMeans(n_clusters=2) #with KMeans you get to pre specify the number of Clusters
KModel = kmeans.fit(example_animals.data) #fit a model using the training data , in this case original example animal data passed through
import pandas as pd
crosstab = pd.crosstab(example_animals.target,KModel.labels_)
print(crosstab)
I would appreciate any insight into how to fix the error 'setting an array element with a sequence' so that the images will be compatible with the sklearn processing.
You need to be sure that your images "monkey.jpg" and "dog.jpeg" have the same number of pixels. Otherwise, you will have to resize the images so that they have the same size. Moreover, the data of your Bunch object needs to be of shape (n_samples, n_features) (you can check the documentation: https://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html#sklearn.cluster.KMeans.fit).
You also need to be aware that you are using an unsupervised learning model (KMeans), so the output of the model is not directly "monkey" or "dog" but a cluster index.
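As a rough illustration of the (n_samples, n_features) requirement, the sketch below uses random arrays as stand-ins for two equally sized RGB images. The 220x180 size and the variable names are assumptions chosen for the example:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for np.asarray(Image.open(...)) on two images of identical size.
monkey = rng.integers(0, 256, size=(220, 180, 3), dtype=np.uint8)
dog = rng.integers(0, 256, size=(220, 180, 3), dtype=np.uint8)

# Stack into (n_samples, height, width, channels) ...
images = np.stack([monkey, dog])

# ... then flatten every image into one row of features.
X = images.reshape(len(images), -1)

print(images.shape, X.shape)
```

An array shaped like X is what KMeans.fit expects; np.stack raises an error if the images differ in shape, which makes a size mismatch visible early.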
I found the solution to the error 'setting an array element with a sequence'.
KMeans requires the data arrays being compared to be the same size.
This means that imported pictures need to be resized, converted into NumPy arrays (a format that is compatible with KMeans) and finally flattened into 1-dimensional arrays.
#https://e2eml.school/images_to_numbers.html
#https://machinelearningmastery.com/how-to-load-and-manipulate-images-for-deep-learning-in-python-with-pil-pillow/
import numpy as np
from matplotlib import pyplot as plt
from sklearn.utils import Bunch
from PIL import Image
from sklearn.cluster import KMeans
import pandas as pd
monkey = Image.open("monkey.jpg")
dog = Image.open("dog.jpeg")
#resize pictures
monkey1 = monkey.resize((180,220))
dog1 = dog.resize((180,220))
#make pictures into numpy array
monkey2 = np.asarray(monkey1)
dog2 = np.asarray(dog1)
#https://www.quora.com/How-do-I-convert-image-data-from-2D-array-to-1D-using-python
#make numpy array into 1 dimensional array
monkey3 = monkey2.reshape(-1)
dog3 = dog2.reshape(-1)
example_animals = Bunch(data = np.array([monkey3,dog3]),target = np.array(['monkey','dog']))
kmeans = KMeans(n_clusters=2) #with KMeans you get to pre specify the number of Clusters
KModel = kmeans.fit(example_animals.data) #fit a model using the training data , in this case original example animal data passed through
crosstab = pd.crosstab(example_animals.target,KModel.labels_)
print(crosstab)
I have a simple ndarray with the following shape:
import matplotlib.pyplot as plt
%matplotlib inline
plt.imshow(trainImg[0]) #can display a sample image
print(trainImg.shape) # (4750, 128, 128, 3), the shape of the dataset
I intend to apply Gaussian blur to all the images. The for loop I went with:
trainImg_New = np.empty((4750, 128, 128,3))
for idx, img in enumerate(trainImg):
    trainImg_New[idx] = cv2.GaussianBlur(img, (5, 5), 0)
I tried to display a sample blurred image as:
plt.imshow(trainImg_New[0]) #view a sample blurred image
but I get an error:
Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).
It just displays a blank image.
TL;DR:
The warning is most likely caused by trainImg_New having a float dtype while its values are larger than 1. So, as @Frightera mentioned, try using np.uint8 to convert the images' dtype.
I tested the snippet below:
import numpy as np
import matplotlib.pyplot as plt
import cv2
trainImg_New = np.random.rand(4750, 128, 128,3) # all value is in range [0, 1]
save = np.empty((4750, 128, 128,3))
for idx, img in enumerate(trainImg_New):
    save[idx] = cv2.GaussianBlur(img, (5, 5), 0)
plt.imshow(np.float32(save[0]+255)) # reproduces the warning from the question
plt.imshow(np.float32(save[0]+10)) # reproduces the warning from the question
plt.imshow(np.uint8(save[0]+10)) # no warning
First of all, cv2.GaussianBlur does not change the range of the array's values, and the values of the original image arrays are valid. So I believe the only cause is that the dtype of trainImg_New[0] does not match its range.
The snippet above shows that the dtype of trainImg_New[0] determines the valid range of the array's values.
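The dtype mismatch can be seen without OpenCV. The sketch below assumes the original trainImg holds uint8 pixels in 0..255 (the question does not state this) and uses a small random array in its place:

```python
import numpy as np

# Synthetic stand-in for trainImg, assumed to be uint8 with values in 0..255.
train_img = np.random.randint(0, 256, size=(4, 8, 8, 3), dtype=np.uint8)

# np.empty defaults to float64, so copied pixel values become floats far above 1,
# which is what plt.imshow clips for float RGB data.
float_out = np.empty(train_img.shape)
float_out[0] = train_img[0]

# Allocating the output with the input's dtype keeps the values as 0..255 integers.
uint8_out = np.empty_like(train_img)
uint8_out[0] = train_img[0]

print(float_out.dtype, uint8_out.dtype)
```

Under that assumption, allocating the result with np.empty_like(trainImg), or converting with astype(np.uint8) before plotting, keeps the dtype and the value range consistent.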
I suggest you use tfa.image.gaussian_filter2d from the tensorflow_addons package. I think you'll be able to pass all your images at once.
import tensorflow as tf
from skimage import data
import tensorflow_addons as tfa
import matplotlib.pyplot as plt
image = data.astronaut()
plt.imshow(image)
plt.show()
blurred = tfa.image.gaussian_filter2d(image,
                                      filter_shape=(25, 25),
                                      sigma=3.)
plt.imshow(blurred)
plt.show()
I'm experimenting with NumPy and I'd like to ask for a solution to the following code. I'd like to generate a 256x256 image from scratch using a random RGB scheme, which is probably the way to go. Any NumPy insights would be welcome!
# -*- coding: utf-8 -*-
from PIL import Image
import numpy as np
import random
def transform_matrice(data):
    aux_data = []
    for e in data:
        aux = []
        for a in e:
            aux.append(np.array([[random.randrange(255), random.randrange(255), random.randrange(255)]]))
        aux_data.append(aux)
    return aux_data
w, h = 250, 250
data = np.zeros((h, w, 3), dtype=np.uint8)
ret = transform_matrice(data)
img = Image.fromarray(ret, 'RGB')
img.save('eg.png')
img.show()
with this code I got the following error:
AttributeError: 'list' object has no attribute '__array_interface__'
You do not need to create an empty data array, nor do you need for loops; NumPy can do it for you!
np.random.randint will create a 3D array of shape (w,h,3) with integers from 0 to 255 using the following command:
def transform_matrice(w,h):
    return np.random.randint(0,256,size=(w,h,3)).astype('uint8')
ret = transform_matrice(250,250)
Note that I passed 256 and not 255 as the second argument, since the upper bound is exclusive: it must be one above the largest integer you want.
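A small self-contained variant, for illustration only: the function name random_rgb and the (height, width) argument order are choices made here, and passing dtype directly to np.random.randint avoids the extra astype call:

```python
import numpy as np


def random_rgb(height, width):
    """Return a (height, width, 3) uint8 array of random RGB values."""
    # The upper bound 256 is exclusive, so values range from 0 to 255.
    return np.random.randint(0, 256, size=(height, width, 3), dtype=np.uint8)


ret = random_rgb(250, 250)
print(ret.shape, ret.dtype)
```

Because ret is a real ndarray rather than a nested list, it can be passed to Image.fromarray(ret, 'RGB') as in the question.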
I have a Pandas column which contains NumPy arrays or lists of varying size. If I try to convert the dataframe to HDF5 using to_hdf, I get the following message:
PerformanceWarning:
your performance may suffer as PyTables will pickle object types that it cannot
map directly to c-types [inferred_type->mixed-integer,key->block0_values]
I am guessing this is because of the ragged tensors in the pandas column. h5py does have a special datatype for ragged tensors.
http://docs.h5py.org/en/stable/special.html#arbitrary-vlen-data
Here is an example:
h5f = h5py.File('data.h5', 'w')
dt = h5py.special_dtype(vlen=np.dtype('int32'))
h5f.create_dataset('batch', data=yourData, dtype=dt, compression='gzip', compression_opts=9)
So I could convert the pandas DataFrame to NumPy and then save each NumPy array separately, storing the variable-length column with the special vlen datatype.
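The h5py route described above could look roughly like this. The sketch uses a small hand-made list of ragged int32 arrays and a temporary file in place of the real dataframe column; the dataset name totalCites2 is borrowed from the question's column name:

```python
import os
import tempfile

import h5py
import numpy as np

# Hand-made stand-in for a column holding arrays of varying length.
ragged = [
    np.array([1, 2, 3], dtype="int32"),
    np.array([4], dtype="int32"),
    np.array([5, 6], dtype="int32"),
]

path = os.path.join(tempfile.mkdtemp(), "ragged.h5")
vlen_int32 = h5py.vlen_dtype(np.dtype("int32"))

with h5py.File(path, "w") as h5f:
    # One variable-length entry per row of the column.
    dset = h5f.create_dataset("totalCites2", shape=(len(ragged),), dtype=vlen_int32)
    for i, row in enumerate(ragged):
        dset[i] = row

# Read the rows back to check that the varying lengths survived.
with h5py.File(path, "r") as h5f:
    restored = [row.tolist() for row in h5f["totalCites2"][:]]
```

The remaining fixed-width columns would need to be written as ordinary datasets alongside this one, which is the manual step the question hopes to avoid.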
I am wondering if there is a way to do this in Pandas.
The following is a minimal example using a small chunk of my data. It downloads and opens a small chunk of the dataframe, and saves it to hdf5
import requests
import pickle
import numpy as np
import pandas as pd
#Download function for google drive
def download_file_from_google_drive(id, destination):
    URL = "https://docs.google.com/uc?export=download"
    session = requests.Session()
    response = session.get(URL, params = { 'id' : id }, stream = True)
    token = get_confirm_token(response)
    if token:
        params = { 'id' : id, 'confirm' : token }
        response = session.get(URL, params = params, stream = True)
    save_response_content(response, destination)
def get_confirm_token(response):
    for key, value in response.cookies.items():
        if key.startswith('download_warning'):
            return value
    return None
def save_response_content(response, destination):
    CHUNK_SIZE = 32768
    with open(destination, "wb") as f:
        for chunk in response.iter_content(CHUNK_SIZE):
            if chunk: # filter out keep-alive new chunks
                f.write(chunk)
#download the google drive file
download_file_from_google_drive('1-0R28Yhdrq2QWQ-4MXHIZUdZG2WZK2qR', 'sample.pkl')
sampleDF2 = pd.read_pickle('sample.pkl')
sampleDF2.to_hdf( 'pandasList.hdf', 'first', complevel = 9 )
sampleDF2['totalCites2'] = sampleDF2['totalCites2'].apply(lambda x: np.array(x))
sampleDF2.to_hdf( 'pandasNumpy.hdf', 'first', complevel = 9 )
For convenience, here is a colab notebook which has this code
https://colab.research.google.com/drive/1DjiPsN3MbRWP6NnJwvaAhzy66FNbPVA8
Edit:
As hpualj mentioned, Pandas uses PyTables, not h5py, so it looks like the question should be how to use VLArray, which is how PyTables stores variable-length arrays.