Python Dill or Pickle gives error when use in a new file - google-colaboratory

I need help with my code. I have built a recommendation system using cosine similarity on a colab and used pickle to serialized it. when I deserialized it inside a colab file, it works perfectly fine but when I deserialize it in a new colab file. it gives me an error
name 'data' is not defined
data is a variable that is initialized with my dataset which is outside of the class InstaPost.
import pandas as pd
import numpy as np
from sklearn.feature_extraction import text
from sklearn.metrics.pairwise import cosine_similarity
import dill as pickle
data = pd.read_csv("/content/instaData.txt")
data
data = data[["Caption", "Hashtags"]]
captions = data["Caption"].tolist()
uni_tfidf = text.TfidfVectorizer(input=captions, stop_words="english")
uni_matrix = uni_tfidf.fit_transform(captions)
uni_sim = cosine_similarity(uni_matrix)
def recommend_post(x):
return ", ".join(data["Caption"].loc[x.argsort()[-7:-1]])
data["Recommended Post"] = [recommend_post(x) for x in uni_sim]
class InstaPost:
def Post(number):
count = 0
wordy = (data["Recommended Post"][number])
sentence = wordy.split(',')
for i in sentence:
count=count+1
print(count," ",i)
obj = InstaPost
obj.Post(1)
pickle_out = open("modelREC", "wb")
pickle.dump(obj, pickle_out)
pickle_out.close()
pickle_in = open("modelREC", "rb")
exe = pickle.load(pickle_in)
print(exe.Post(10))
NOTE: on a different file
print(exe.Post)
works
and give output
<function InstaPost.Post at 0x7efc0b4c3f70>
if I need to give the reference of the data than please guide me how should I do it. It will be a great help to me
Thanks in advance

Related

AttributeError: module 'PIL.TiffTags' has no attribute 'IFD' PS: i`m using google colab

here is the part of my code
import PIL
import numpy as np
ramp = "$#B%8&WM#*oahkbdpqwmZO0QLCJUYXzcvunxrjft/\|()1{}[]?-_+~<>i!lI;:,^`'."
def average(image):
im = np.array(image)
return np.average(im.flatten())
def convert(path, imgScale, fontScale):
if imgScale>1:
raise Exception('isnt right scale')
image = Image.open(path).convert("L")
W, H = image.size
I used to watch for any solutions. People say it is because of pil version. But i have the last one(https://i.stack.imgur.com/04mlK.png)

Why ImportExampleGen reads TFRecords as SparseTensor instead of Tensor?

I'm converting a CSV file into a TFRecords file like this:
File: ./dataset/csv/file.csv
feature_1, feture_2, output
1, 1, 1
2, 2, 2
3, 3, 3
import tensorflow as tf
import csv
import os
print(tf.__version__)
def create_csv_iterator(csv_file_path, skip_header):
with tf.io.gfile.GFile(csv_file_path) as csv_file:
reader = csv.reader(csv_file)
if skip_header: # Skip the header
next(reader)
for row in reader:
yield row
def _int64_feature(value):
"""Returns an int64_list from a bool / enum / int / uint."""
return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))
def create_example(row):
"""
Returns a tensorflow.Example Protocol Buffer object.
"""
features = {}
for feature_index, feature_name in enumerate(["feature_1", "feture_2", "output"]):
feature_value = row[feature_index]
features[feature_name] = _int64_feature(int(feature_value))
return tf.train.Example(features=tf.train.Features(feature=features))
def create_tfrecords_file(input_csv_file):
"""
Creates a TFRecords file for the given input data
"""
output_tfrecord_file = input_csv_file.replace("csv", "tfrecords")
writer = tf.io.TFRecordWriter(output_tfrecord_file)
print("Creating TFRecords file at", output_tfrecord_file, "...")
for i, row in enumerate(create_csv_iterator(input_csv_file, skip_header=True)):
if len(row) == 0:
continue
example = create_example(row)
content = example.SerializeToString()
writer.write(content)
writer.close()
print("Finish Writing", output_tfrecord_file)
create_tfrecords_file("./dataset/csv/file.csv")
Then I'll read the generated TFRecords files using ImportExampleGen class:
import os
import absl
import tensorflow_model_analysis as tfma
tf.get_logger().propagate = False
from tfx import v1 as tfx
from tfx.orchestration.experimental.interactive.interactive_context import InteractiveContext
%load_ext tfx.orchestration.experimental.interactive.notebook_extensions.skip
context = InteractiveContext()
example_gen = tfx.components.ImportExampleGen(input_base="./dataset/tfrecords")
context.run(example_gen, enable_cache=True)
statistics_gen = tfx.components.StatisticsGen(
examples=example_gen.outputs['examples'])
context.run(statistics_gen, enable_cache=True)
schema_gen = tfx.components.SchemaGen(
statistics=statistics_gen.outputs['statistics'],
infer_feature_shape=False)
context.run(schema_gen, enable_cache=True)
File: ./transform.py
def preprocessing_fn(inputs):
"""tf.transform's callback function for preprocessing inputs.
Args:
inputs: map from feature keys to raw not-yet-transformed features.
Returns:
Map from string feature key to transformed feature operations.
"""
print(inputs)
return inputs
transform = tfx.components.Transform(
examples=example_gen.outputs['examples'],
schema=schema_gen.outputs['schema'],
module_file=os.path.abspath("./transform.py"))
context.run(transform, enable_cache=True)
In the preprocessing_fn function shows that inputs is a SparseTensor objects. My question is why? As far as I can tell, my dataset's samples are dense and they should be Tensor instead. Am I doing something wrong?
For anyone else who might be struggling with the same issue, I found the culprit. It's the SchemaGen class. This is how I was instantiating its object:
schema_gen = tfx.components.SchemaGen(
statistics=statistics_gen.outputs['statistics'],
infer_feature_shape=False)
I don't know what's the use case for asking SchemaGen class not to infer the shape of the features but the tutorial I was following had it set to False and I had just copied and pasted the same thing. Comparing with some other tutorials, I realized that it could be the reason why I was getting SparseTensor.
So, if you let SchemaGen infer the shape of your features or you load a hand crafted schema in which you've set the shapes yourself, you'll be getting a Tensor in your preprocessing_fn. But if the shapes are not set, the features will be instances of SparseTensor.
For the sake of completeness, this is the fixed snippet:
schema_gen = tfx.components.SchemaGen(
statistics=statistics_gen.outputs['statistics'],
infer_feature_shape=True)

Using Sklearn with NumPy and Images and get this error 'setting an array element with a sequence'

I am trying to create a simple image classification tool.
I would like the code below to work with classifying images. It works fine when it is a non image NumPy array.
#https://e2eml.school/images_to_numbers.html
import numpy as np
from sklearn.utils import Bunch
from PIL import Image
monkey = [1]
dog = [2]
example_animals = Bunch(data = np.array([monkey,dog]),target = np.array(['monkey','dog']))
from sklearn.cluster import KMeans
kmeans = KMeans(n_clusters=2) #with KMeans you get to pre specify the number of Clusters
KModel = kmeans.fit(example_animals.data) #fit a model using the training data , in this case original example animal data passed through
import pandas as pd
crosstab = pd.crosstab(example_animals.target,KModel.labels_)
print(crosstab)
I have looked into how to make an image into a NumPy array at https://e2eml.school/images_to_numbers.html
The code below where I have converted images to NumPy array doesn't work.
When run it gets the following error
** 'setting an array element with a sequence'**
#https://e2eml.school/images_to_numbers.html
import numpy as np
from sklearn.utils import Bunch
from PIL import Image
monkey = np.asarray(Image.open("monkey.jpg"))
dog = np.asarray(Image.open("dog.jpeg"))
example_animals = Bunch(data = np.array([monkey,dog]),target = np.array(['monkey','dog']))
from sklearn.cluster import KMeans
kmeans = KMeans(n_clusters=2) #with KMeans you get to pre specify the number of Clusters
KModel = kmeans.fit(example_animals.data) #fit a model using the training data , in this case original example animal data passed through
import pandas as pd
crosstab = pd.crosstab(example_animals.target,KModel.labels_)
print(crosstab)
I would appreciate any insight how I fix the error 'setting an array element with a sequence' so that the images will be compatible with the sklearn processing.
You need to be sure that your images "monkey.jpg" and "dog.jpeg" have the same number of pixels. Otherwise, you will have to resize the images to have the same size. Moreover, the data of your Bunch object need to be of shape (n_samples, n_features) (you can check the documentation https://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html#sklearn.cluster.KMeans.fit)
You need to be aware that you use an unserpervised learning model (Kmeans). So the output of the model is not directly "monkey" or "dog".
I found the solution to error setting an array element with a sequence
Kmeans requires the data arrays for comparison need to be the same size.
This means if importing pictures, the pictures need to be resized, converted into a numpy array (a format that is compatible with Kmeans) and finally made into a 1 dimensional array.
#https://e2eml.school/images_to_numbers.html
#https://machinelearningmastery.com/how-to-load-and-manipulate-images-for-deep-learning-in-python-with-pil-pillow/
import numpy as np
from matplotlib import pyplot as plt
from sklearn.utils import Bunch
from PIL import Image
from sklearn.cluster import KMeans
import pandas as pd
monkey = Image.open("monkey.jpg")
dog = Image.open("dog.jpeg")
#resize pictures
monkey1 = monkey.resize((180,220))
dog1 = dog.resize((180,220))
#make pictures into numpy array
monkey2 = np.asarray(monkey1)
dog2 = np.asarray(dog1)
#https://www.quora.com/How-do-I-convert-image-data-from-2D-array-to-1D-using-python
#make numpy array into 1 dimensional array
monkey3 = monkey2.reshape(-1)
dog3 = dog2.reshape(-1)
example_animals = Bunch(data = np.array([monkey3,dog3]),target = np.array(['monkey','dog']))
kmeans = KMeans(n_clusters=2) #with KMeans you get to pre specify the number of Clusters
KModel = kmeans.fit(example_animals.data) #fit a model using the training data , in this case original example food data passed through
crosstab = pd.crosstab(example_animals.target,KModel.labels_)
print(crosstab)

Accessing methods within a class from bokeh FileInput widget

I am working on a Bokeh serve UI and am running into trouble interfacing a class (and its methods) with the FileInput widget. I am using a class (in this example, called "EIS_data") which, when instantiated, loads a file using pd.read_csv. The EIS_data class also has a method to plot the data in a particular way, and I'd like to be able to load the pandas dataframe and call and manipulate the data using the methods already in place in the class.
So far, I have been able to load the data successfully using the FileInput widget, but I can't figure out how to access the dataframe again once it's loaded in. In a standalone Jupyter notebook, I could run d = EIS_data("filename") and then ```d.plot''' to load the data into a pandas dataframe and then plot it according to the method defined in the EIS_data class, but I can't figure out how to replicate this in the UI code once the data are loaded using the FileInput widget.
Is there a way I can interface this with Bokeh widgets, such that I could simply add d.plot() to curdoc()? I have found a workaround using ColumnDataSource, but it seems a shame to redefine plotting methods and data handling when they are already defined in the class. Below are minimal working examples of the UI code and the class definition.
UI Code:
import numpy as np
import pandas as pd
from eis_analysis_trimmed import EIS_data
import bokeh
from bokeh.io import curdoc
from bokeh import layouts
from bokeh.layouts import column,row,gridplot
from bokeh.plotting import figure
from bokeh.models import *
import base64
import io
## Instantiate the EIS_data class for loading data
def load_data(f):
return EIS_data(f)
## updater function called to load data with FileInput widget
## Must be decoded using base64
def load_file(attr, old, new):
decoded = base64.b64decode(new)
d = io.BytesIO(decoded)
dat = load_data(d)
print(dat.df)
print(dat)
print("EIS Data Uploaded Successfully")
return dat
f_load = Paragraph(text="""Load Data""",height=15)
f = FileInput()
f.on_change('value',load_file)
curdoc().add_root(column(f))
and here is the EIS_data class:
import numpy as np
from scipy.optimize import curve_fit
from sklearn.metrics import r2_score
from bokeh.plotting import figure, show
from bokeh.models import LinearAxis, Range1d
from bokeh.resources import INLINE
import bokeh.io
#locally include javascript dependencies in html
bokeh.io.output_notebook(INLINE)
class EIS_data:
def __init__(self, file_name, delimiter='\t',
header=0, f_low=None, f_high=None):
#load eis data into a pandas dataframe
eis_data = pd.read_csv(file_name, delimiter=delimiter, header=header)
#iterate through all of the columns and check to see
#if all of the values in that column are null
#if they are, then remove that column
for c in eis_data.columns:
if eis_data[c].isnull().all():
eis_data = eis_data.drop([c], axis=1)
#make sure that the data are imported as floats and not strings
eis_data = eis_data[['freq/Hz', 'Re(Z)/Ohm', '-Im(Z)/Ohm']]
eis_data['freq/Hz'] = pd.to_numeric(eis_data['freq/Hz'])
eis_data['Re(Z)/Ohm'] = pd.to_numeric(eis_data['Re(Z)/Ohm'])
eis_data['-Im(Z)/Ohm'] = pd.to_numeric(eis_data['-Im(Z)/Ohm'])
self.df = eis_data.sort_values(by='freq/Hz')
def plot(self, fit_vals = None):
plot = figure(title="Nyquist Plot",
x_axis_label='Re(Z) Ohm',
y_axis_label='-Im(Z) Ohm',
plot_width=600,
plot_height=600)
plot.circle(self.df['Re(Z)/Ohm'], self.df['-Im(Z)/Ohm'],
size=7, color='navy', name='Data')
return plot
EDIT: Adding the workaround using ColumnDataSource
from bokeh.layouts import column
from bokeh.plotting import figure
from bokeh.models import *
from bokeh.models.widgets import FileInput
import base64
import io
from eis_analysis2 import EIS_data
# Instantiate the EIS_data class for loading data
def load_data(data):
return EIS_data(data)
# updater function called to load data with FileInput widget
# Must be decoded using base64
def load_file(attr, old, new):
decoded = base64.b64decode(new)
d = io.BytesIO(decoded)
dat = load_data(d)
dat_df = dat.df
# Replace plot data with data from newly-loaded file
source.data = dict(freq=dat_df[dat_df.columns[0]], reZ=dat_df[dat_df.columns[1]], imZ=dat_df[dat_df.columns[2]])
#phase,mag = bode_calc(reZ,imZ)
print(dat_df)
print("EIS Data Uploaded Successfully")
# Create Column Data Source that will be used by the plot
source = ColumnDataSource(data=dict(freq=[], reZ=[], imZ=[]))
##Make the nyquist plot
nyq_plot = figure(title="Nyquist Plot",
x_axis_label='Re(Z) Ohm',
y_axis_label='-Im(Z) Ohm',
plot_width=600,
plot_height=600)
nyq_plot.circle(x="reZ", y="imZ",source=source,size=7, color='navy', name='Data')
f = FileInput()
f.on_change('value', load_file)
layout = column(f, nyq_plot)
curdoc().add_root(layout)

A bytes-like object is required, not 'Tensor' when calling map on string tensors in eager mode

I am trying to use TF.dataset.map to port over this old code because I get a deprecation warning.
Old code which reads a set of custom protos from a TFRecord file:
record_iterator = tf.python_io.tf_record_iterator(path=filename)
for record in record_iterator:
example = MyProto()
example.ParseFromString(record)
I am trying to use eager mode and map, but I get this error.
def parse_proto(string):
proto_object = MyProto()
proto_object.ParseFromString(string)
dataset = tf.data.TFRecordDataset(dataset_paths)
parsed_protos = raw_tf_dataset.map(parse_proto)
This code works:
for raw_record in raw_tf_dataset:
proto_object = MyProto()
proto_object.ParseFromString(raw_record.numpy())
But the map gives me an error:
TypeError: a bytes-like object is required, not 'Tensor'
What is the right way to take use the argument the function results of the map and treat them like a string?
You need to extract string form the tensor and use in the map function. Below are the steps to be implemented in the code to achieve this.
You have to decorate the map function with tf.py_function(get_path, [x], [tf.float32]). You can find more about tf.py_function here. In tf.py_function, first argument is the name of map function, second argument is the element to be passed to map function and final argument is the return type.
You can get your string part by using bytes.decode(file_path.numpy()) in map function.
So modify your program as below,
parsed_protos = raw_tf_dataset.map(parse_proto)
to
parsed_protos = raw_tf_dataset.map(lambda x: tf.py_function(parse_proto, [x], [function return type]))
Also modify parse_proto as below,
def parse_proto(string):
proto_object = MyProto()
proto_object.ParseFromString(string)
to
def parse_proto(string):
proto_object = MyProto()
proto_object.ParseFromString(bytes.decode(string.numpy()))
In the below simple program, we are using tf.data.Dataset.list_files to read path of the image. Next in the map function we are reading the image using load_img and later doing the tf.image.central_crop function to crop central part of the image.
Code -
%tensorflow_version 2.x
import tensorflow as tf
from keras.preprocessing.image import load_img
from keras.preprocessing.image import img_to_array, array_to_img
from matplotlib import pyplot as plt
import numpy as np
def load_file_and_process(path):
image = load_img(bytes.decode(path.numpy()), target_size=(224, 224))
image = img_to_array(image)
image = tf.image.central_crop(image, np.random.uniform(0.50, 1.00))
return image
train_dataset = tf.data.Dataset.list_files('/content/bird.jpg')
train_dataset = train_dataset.map(lambda x: tf.py_function(load_file_and_process, [x], [tf.float32]))
for f in train_dataset:
for l in f:
image = np.array(array_to_img(l))
plt.imshow(image)
Output -
Hope this answers your question. Happy Learning.