Passing a dict of tensors to a Keras model - tensorflow

I am trying to preprocess the infamous Titanic data (from Kaggle) by following this tutorial.
Everything worked until I ran the titanic_preprocessing model on the data (titanic_features), which raised this error:
ValueError: Failed to convert a NumPy array to a Tensor (Unsupported object type float).
The tutorial says the data should be transformed into a dict of tensors, but:
1) I don't see how the code (see the HERE1 tag in my code below) makes a dict of tensors (there is no tf.convert_to_tensor, for example).
2) I don't understand why all the data has to be transformed again, since the previous code (where preprocessed_inputs etc. are created) was supposed to do just that.
Here is my code, but you can also execute it on Google Colab here.
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.layers.experimental import preprocessing
url = "https://raw.githubusercontent.com/aymeric75/IA/master/train.csv"
titanic = pd.read_csv(url)
titanic_features = titanic.copy()
titanic_labels = titanic_features.pop('Survived')
inputs = {}
for name, column in titanic_features.items():
    dtype = column.dtype
    if dtype == object:
        dtype = tf.string
    else:
        dtype = tf.float32
    inputs[name] = tf.keras.Input(shape=(1,), name=name, dtype=dtype)
numeric_inputs = {name:input for name,input in inputs.items()
                  if input.dtype==tf.float32}
x = layers.Concatenate()(list(numeric_inputs.values()))
norm = preprocessing.Normalization()
norm.adapt(np.array(titanic[numeric_inputs.keys()]))
all_numeric_inputs = norm(x)
preprocessed_inputs = [all_numeric_inputs]
for name, input in inputs.items():
    if input.dtype == tf.float32:
        continue
    lookup = preprocessing.StringLookup(vocabulary=np.unique(titanic_features[name].dropna()))
    one_hot = preprocessing.CategoryEncoding(max_tokens=lookup.vocab_size())
    x = lookup(input)
    x = one_hot(x)
    preprocessed_inputs.append(x)
preprocessed_inputs_cat = layers.Concatenate()(preprocessed_inputs)
titanic_preprocessing = tf.keras.Model(inputs, preprocessed_inputs_cat)
titanic_features_dict = {}
# This model just contains the input preprocessing. You can run it to see what it does to your data.
# Keras models don't automatically convert Pandas DataFrames because
# it's not clear if it should be converted to one tensor or to a dictionary of tensors. So convert it to a dictionary of tensors:
# HERE1
titanic_features_dict = {name: np.array(value)
                         for name, value in titanic_features.items()}
features_dict = {name:values[:1] for name, values in titanic_features_dict.items()}
titanic_preprocessing(features_dict)
Thanks a lot for your support!
Aymeric
[UPDATE] If you can answer question 2 (why one should retransform all the data, since the previous code was supposed to do just that when preprocessed_inputs etc. are created), then I will accept your answer, because I think I do need to reformat the input (but I don't see what the point of all the previous code is...)

In your case, the problem is caused by the "Cabin" feature containing NaN (Not a Number) values. TensorFlow handles NaN in floating-point data, but not in string data: pandas stores a missing string as a float NaN, so the object column mixes str and float values and cannot be converted to a tf.string tensor (hence "Unsupported object type float").
You can replace all those NaN values with empty strings in your pandas DataFrame:
titanic_features["Cabin"] = titanic_features["Cabin"].fillna("")
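The fillna call above only touches Cabin. A slightly more general sketch, assuming you want every non-numeric column cleaned the same way before building the dict, could look like this; the helper name to_feature_dict and the two-row sample are hypothetical, not from the tutorial:

```python
import numpy as np
import pandas as pd

def to_feature_dict(df):
    """Hypothetical helper: fill NaN in non-numeric columns with "" and
    return {column name: ndarray}, ready to pass to the Keras model."""
    df = df.copy()
    for name in df.columns:
        if not pd.api.types.is_numeric_dtype(df[name]):
            # A missing string is a float NaN, which breaks tf.string conversion
            df[name] = df[name].fillna("")
    # Numeric columns keep their NaN; float tensors can hold NaN
    return {name: np.array(values) for name, values in df.items()}

# Hypothetical two-row sample with the same kinds of gaps as the Titanic data
sample = pd.DataFrame({
    "Cabin": [np.nan, "C85"],
    "Age": [22.0, np.nan],
})
features = to_feature_dict(sample)
print(features["Cabin"], features["Age"])
```

Filling with "" keeps every row, which matters here because the labels were popped into a separate Series and must stay aligned with the features.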
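As for question 2: the previous code only declares the preprocessing function, as a Keras model built on symbolic tf.keras.Input placeholders. No data is actually preprocessed until you call the titanic_preprocessing model. That call needs real data, passed as a dict keyed by the input names, which is why the DataFrame is converted to a dict of NumPy arrays. Keras converts those arrays to tensors itself when the model is called, so no explicit tf.convert_to_tensor is needed.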
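To illustrate the split between building the model and running it, here is a minimal sketch with two hypothetical numeric features ("age" and "fare", with made-up values) instead of the full Titanic pipeline. Building the model touches no data; calling it with a dict of NumPy arrays is what actually runs the computation:

```python
import numpy as np
import tensorflow as tf

# Building the model is purely symbolic: these Inputs hold no data yet.
# The feature names are hypothetical stand-ins for Titanic columns.
inputs = {
    "age": tf.keras.Input(shape=(1,), name="age", dtype=tf.float32),
    "fare": tf.keras.Input(shape=(1,), name="fare", dtype=tf.float32),
}
concatenated = tf.keras.layers.Concatenate()(list(inputs.values()))
model = tf.keras.Model(inputs, concatenated)

# Data only flows when the model is called. Keras converts the NumPy
# arrays in the dict to tensors itself and matches them to Inputs by key.
batch = {
    "age": np.array([[22.0], [38.0]], dtype=np.float32),
    "fare": np.array([[7.25], [71.5]], dtype=np.float32),
}
out = model(batch)
print(np.asarray(out))
```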

Related

Linear regression with one feature from Pandas dataframe

I have tried the code below
import pandas as pd
from sklearn.linear_model import LinearRegression
import numpy as np
# Assign the dataframe to this variable.
# TODO: Load the data
bmi_life_data = pd.read_csv("bmi_and_life_expectancy.csv")
X= bmi_life_data['BMI'].values.reshape(-1,1)
y = bmi_life_data['Life expectancy'].values.reshape(-1,1)
# Make and fit the linear regression model
#TODO: Fit the model and Assign it to bmi_life_model
bmi_life_model = LinearRegression()
bmi_life_model.fit(X,y)
# Make a prediction using the model
# TODO: Predict life expectancy for a BMI value of 21.07931
laos_life_exp = bmi_life_model.predict(21.07931)
but it gives me the error
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
I get this even after reshaping, and without the reshape the error is the same.
The error was in the prediction line
laos_life_exp = bmi_life_model.predict(21.07931)
which should be
laos_life_exp = bmi_life_model.predict([[21.07931]])
because predict expects a 2-D input of shape (n_samples, n_features), here one sample with one feature.
Thanks to @onyambu
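A minimal sketch of the shape rule, assuming toy data that lies exactly on y = 2x + 1 instead of the BMI CSV (which is not reproduced here). Both ways of building the single-sample input give the (1, 1) shape that predict accepts:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical toy data on y = 2x + 1 (not the BMI dataset)
X = np.array([1.0, 2.0, 3.0, 4.0]).reshape(-1, 1)  # (4, 1): 4 samples, 1 feature
y = 2.0 * X.ravel() + 1.0

model = LinearRegression().fit(X, y)

single = np.array([[21.07931]])                 # (1, 1): one sample, one feature
also_ok = np.array([21.07931]).reshape(1, -1)   # same shape, built from a 1-D array
pred = model.predict(single)
print(single.shape, also_ok.shape, pred)
```

Passing the bare scalar 21.07931 fails because scikit-learn cannot tell whether it is one sample or one feature, which is exactly what the reshape hint in the error message is about.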

NumPy array value error from training in Auto-Keras with StratifiedKFold

Background
My sentiment analysis research involves a variety of datasets. Recently I've encountered one dataset that I just cannot train on successfully. I mostly work with open data in .CSV format, hence Pandas and NumPy are heavily used.
One of the approaches in my research is to integrate automated machine learning (AutoML), and the library I chose is Auto-Keras, mainly its TextClassifier() wrapper to achieve AutoML.
Main Problem
I've verified in the official documentation that TextClassifier() takes data in the form of a NumPy array. However, when I load the data into a Pandas DataFrame and use .to_numpy() on the columns I need for training, the following error keeps showing:
ValueError Traceback (most recent call last)
<ipython-input-13-1444bf2a605c> in <module>()
16 clf = ak.TextClassifier(overwrite=True, max_trials=2)
17
---> 18 clf.fit(x_train, y_train, epochs=3, callbacks=cbs)
19
20
ValueError: Failed to convert a NumPy array to a Tensor (Unsupported object type float).
Error-related code sections
The section where I drop the unneeded Pandas DataFrame columns using .drop() and convert the needed columns to NumPy arrays using the to_numpy() function that Pandas provides.
df_src = pd.read_csv(get_data)
df_src = df_src.drop(columns=["Name", "Cast", "Plot", "Direction",
"Soundtrack", "Acting", "Cinematography"])
df_src = df_src.reset_index(drop=True)
X = df_src["Review"].to_numpy()
Y = df_src["Overall Sentiment"].to_numpy()
print(X, "\n")
print("\n", Y)
The main part, where the error occurs: I perform StratifiedKFold() and, at the same time, use TextClassifier() to train and test the model.
# assumes skf (a StratifiedKFold instance), ak (autokeras), tf, np and sklearn's metrics are set up earlier
fold = 0
for train, test in skf.split(X, Y):
    fold += 1
    print(f"Fold #{fold}\n")
    x_train = X[train]
    y_train = Y[train]
    x_test = X[test]
    y_test = Y[test]
    cbs = [tf.keras.callbacks.EarlyStopping(patience=3)]
    clf = ak.TextClassifier(overwrite=True, max_trials=2)
    # The line where it indicated the error.
    clf.fit(x_train, y_train, epochs=3, callbacks=cbs)
    pred = clf.predict(x_test) # result data type is in lists of `string`
    ceval = clf.evaluate(x_test, y_test)
    metrics_test = metrics.classification_report(y_test, np.array(list(pred), dtype=int))
    print(metrics_test, "\n")
    print(f"Fold #{fold} finished\n")
Supplementary
I am sharing the full code related to the error through Google Colab, which you can help me diagnose here.
Edit notes
I have tried potential solutions, such as:
x_train = np.asarray(x_train).astype(np.float32)
y_train = np.asarray(y_train).astype(np.float32)
or
x_train = tf.data.Dataset.from_tensor_slices((x_train,))
y_train = tf.data.Dataset.from_tensor_slices((y_train,))
However, the problem remains.
One of the strings is actually NaN: the missing review is stored as a float, which is what "Unsupported object type float" refers to. Just remove this entry and the corresponding label.
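A minimal sketch of that fix, assuming it is done on the DataFrame before .to_numpy() so the text and its label are dropped together and X and Y stay aligned for StratifiedKFold. The helper name drop_missing_pairs and the three-row frame are hypothetical:

```python
import numpy as np
import pandas as pd

def drop_missing_pairs(df, text_col, label_col):
    """Hypothetical helper: drop rows where the text or the label is missing,
    so X and Y stay aligned."""
    clean = df.dropna(subset=[text_col, label_col]).reset_index(drop=True)
    X = clean[text_col].to_numpy()
    Y = clean[label_col].to_numpy()
    return X, Y

# Hypothetical three-row frame; the second review is missing (NaN)
df_src = pd.DataFrame({
    "Review": ["Great acting", np.nan, "Dull plot"],
    "Overall Sentiment": [1, 0, 0],
})
X, Y = drop_missing_pairs(df_src, "Review", "Overall Sentiment")
print(X, Y)
```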

how to transfer type of data in TensorFlow code

Suppose two models have been built in TensorFlow, with model1 followed by model2.
The catch is that model1 outputs a tensor, while model2 requires an ndarray as input when the graph structure is created (no data flows through the graph at that point). Without building two or more Sessions, how can we combine model1 with model2?
(In fact, model2 calls a library function that requires an ndarray input, and I don't want to reimplement that function myself.)
Here is a sample:
import tensorflow as tf
import cv2
img = cv2.imread("star_sky.jpg")  # assuming the image shape is (256, 256, 3)
x_input = tf.placeholder(shape=(1,256,256,3),dtype=tf.float32)
W = tf.Variable(tf.random_normal([3,3,3,3]),dtype = tf.float32)
x_output_temp = tf.nn.conv2d(x_input,W,[1,1,1,1],padding="SAME")
# the other model wants to use x_output to get the Canny edges of the image
x_output_ = x_output_temp[0]
x_output = cv2.Canny(x_output_,100,200)  # the numbers are the threshold parameters
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    img = [img]
    x_output.eval({x_input:img})
If you want to use cv2 to process a TensorFlow Tensor, you need to do it inside a tf.py_func, which converts the tensor to an ndarray at graph execution time and runs the Python code you pass on that array.
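A minimal sketch of the same idea in TensorFlow 2, where tf.numpy_function plays the role of tf.py_func (the TF 1.x call is still reachable as tf.compat.v1.py_func). To keep it self-contained, the hypothetical threshold_edges function is a NumPy stand-in for cv2.Canny, and x * 2.0 stands in for model1:

```python
import numpy as np
import tensorflow as tf

def threshold_edges(image):
    # Receives a plain ndarray at execution time.
    # Hypothetical stand-in for cv2.Canny; a real Canny call would first
    # need the image converted to 8-bit (uint8).
    return (image > 0.5).astype(np.float32)

@tf.function
def pipeline(x):
    features = x * 2.0  # stand-in for model1's output tensor
    # tf.numpy_function passes `features` to threshold_edges as an ndarray
    # and wraps the returned ndarray back into a float32 tensor.
    return tf.numpy_function(threshold_edges, [features], tf.float32)

result = pipeline(tf.constant([0.1, 0.4], dtype=tf.float32))
print(result.numpy())  # [0. 1.]
```

The wrapped Python code runs outside the graph, so it cannot be differentiated through and is tied to the Python process that built the graph.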

My TensorFlow Graph is abnormally large using Edward

I have code here that I've modified from this website. Basically what I have written is this:
#import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf
#from tensorflow.examples.tutorials.mnist import input_data
from edward.models import Categorical, Normal
import edward as ed
#ed.set_seed(39)
import pandas as pd
import csv
# Use the TensorFlow method to download and/or load the data.
with open("data_final.csv", "r") as csvfile:
    reader1 = csv.reader(csvfile)
    data1 = np.array(list(reader1)).astype(np.float64)
#mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
N = data1.shape[0] -1 # number of images in a minibatch.
D = 4 # number of features.
K = 4 # number of classes.
# Create a placeholder to hold the data (in minibatches) in a TensorFlow graph.
x = tf.placeholder(tf.float32, [N, D])
# Normal(0,1) priors for the variables. Note that the syntax assumes TensorFlow 1.1.
w = Normal(loc=tf.zeros([D, K]), scale=tf.ones([D, K]))
b = Normal(loc=tf.zeros(K), scale=tf.ones(K))
# Categorical likelihood for classification.
y =tf.matmul(x,w)+b
# Construct the q(w) and q(b). In this case we assume Normal distributions.
qw = Normal(loc=tf.Variable(tf.random_normal([D, K])),
scale=tf.nn.softplus(tf.Variable(tf.random_normal([D, K]))))
qb = Normal(loc=tf.Variable(tf.random_normal([K])),
scale=tf.nn.softplus(tf.Variable(tf.random_normal([K]))))
# We use a placeholder for the labels in anticipation of the training data.
y_ph = tf.placeholder(tf.float32, [N, K])
# Define the VI inference technique, i.e. minimise the KL divergence between q and p.
inference = ed.KLqp({w: qw, b: qb}, data={y:y_ph})
# Initialise the inference variables
inference.initialize(n_iter=5000, n_print=100, scale={y: 1})
# We will use an interactive session.
sess = tf.InteractiveSession()
# Initialise all the variables in the session.
tf.global_variables_initializer().run()
I use the data linked here to run the code. Less than a second after starting, I get this error (so I find it hard to believe that this actually happened):
ValueError: GraphDef cannot be larger than 2GB.
I think there are other topics with the same error, but those people had instantiated something like 1 million parameters. I have on the order of 20 parameters, so I'm unsure why I'm getting this error.
In my case, variables (and likely graphs) from previous Edward runs had not been garbage collected. Garbage collecting/resetting the console fixed the problem.
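One way to avoid relying on a console reset, sketched under the assumption that the model-building code can be wrapped in a function: build each run in a fresh tf.Graph so nothing from earlier runs accumulates in the shared default graph. In TF 1.x (which Edward uses), calling tf.reset_default_graph() before rebuilding has a similar effect. The sketch below uses the TF 2 compat API, and its tiny placeholder/matmul graph is a hypothetical stand-in for the Edward model:

```python
import tensorflow as tf

def build_graph():
    # Each call builds its ops in a brand-new graph instead of the shared
    # default graph, so repeated runs cannot pile up old nodes.
    graph = tf.Graph()
    with graph.as_default():
        x = tf.compat.v1.placeholder(tf.float32, [None, 4], name="x")
        w = tf.ones([4, 4], name="w")
        tf.matmul(x, w, name="y")
    return graph

g1 = build_graph()
g2 = build_graph()
print(len(g1.get_operations()), len(g2.get_operations()))
```

Both graphs report the same op count, whereas building repeatedly into the default graph would keep adding nodes to it.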

Tensorflow: How to feed a placeholder variable with a tensor?

I have a placeholder variable that expects a batch of input images:
input_placeholder = tf.placeholder(tf.float32, [None] + image_shape, name='input_images')
Now I have 2 sources for the input data:
1) a tensor and
2) some numpy data.
For the numpy input data, I know how to feed data to the placeholder variable:
sess = tf.Session()
mLoss, = sess.run([loss], feed_dict = {input_placeholder: myNumpyData})
How can I feed a tensor to that placeholder variable?
mLoss, = sess.run([loss], feed_dict = {input_placeholder: myInputTensor})
gives me an error:
TypeError: The value of a feed cannot be a tf.Tensor object. Acceptable feed values include Python scalars, strings, lists, or numpy ndarrays.
I don't want to convert the tensor into a numpy array using .eval(), since that would slow my program down, is there any other way?
This was discussed on GitHub in 2016; please check here. Here is the key point by concretevitamin:
One key thing to note is that Tensor is simply a symbolic object. The values of your feed_dict are the actual values, e.g. a Numpy ndarray.
A tensor, as a symbolic object, flows inside the graph, while the actual values live outside it. So only actual values can be passed into the graph; a symbolic object cannot exist outside the graph.
You can use feed_dict to feed data into non-placeholders. So, first, wire up your dataflow graph directly to your myInputTensor tensor data source (i.e. don't use a placeholder). Then, when you want to run with your numpy data, you can effectively mask myInputTensor with myNumpyData, like this:
mLoss, = sess.run([loss], feed_dict={myInputTensor: myNumpyData})
[I'm still trying to figure out how to do this with multiple tensor data sources however.]
One way of solving the problem is to actually remove the Placeholder tensor and replace it by your "myInputTensor".
You will use the myInputTensor as the source for the other operations in the graph and when you want to infer the graph with your np array as input data, you will feed a value to this tensor directly.
Here is a quick example:
import tensorflow as tf
import numpy as np
# Input Tensor
myInputTensor = tf.ones(dtype=tf.float32, shape=1) # In your case, this would be the results of some ops
output = myInputTensor * 5.0
with tf.Session() as sess:
    print(sess.run(output)) # == 5.0, using the Tensor value
    myNumpyData = np.zeros(1)
    print(sess.run(output, {myInputTensor: myNumpyData})) # == 0.0 * 5.0 = 0.0, using the np value
This works for me in the latest version... maybe you have an older version of TF?
import tensorflow as tf  # TensorFlow 1.x API
a = tf.Variable(1)
sess = tf.Session()
print(sess.run(2*a, feed_dict={a: 5}))  # prints 10