I wonder if I can use a TensorFlow Dataset for training scikit-learn and other ML frameworks.
For example, can I use a tf.data.Dataset to train XGBoost, LogisticRegression, a RandomForest classifier, etc.?
I.e., can I pass the tf.data.Dataset object into the .fit() method of these models for training?
I tried out:
xs=np.asarray([i for i in range(10000)]).reshape(-1, 1)
ys=np.asarray([int(i%2==0)for i in range(10000)])
xs = tf.data.Dataset.from_tensor_slices(xs)
ys = tf.data.Dataset.from_tensor_slices(ys)
cls.fit(xs, ys)
I'm getting the following error:
TypeError: float() argument must be a string or a number, not 'TensorSliceDataset'
You can use the as_numpy_iterator() method; from the docs:
Returns an iterator which converts all elements of the dataset to numpy.
Following your example:
from sklearn.svm import SVC
x = list(xs.as_numpy_iterator())
y = list(ys.as_numpy_iterator())
clf = SVC(gamma='auto')
clf.fit(x, y)
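If the dataset is too large to materialise as a list, one possible variation is to feed it batch by batch to an estimator that supports partial_fit. This is only a sketch: SGDClassifier, the batch size of 100 and the float32/int64 dtypes are my own choices for illustration and do not come from the question.

```python
import numpy as np
import tensorflow as tf
from sklearn.linear_model import SGDClassifier

# Same toy data as in the question, kept as one (features, labels) dataset.
xs = np.arange(10000, dtype=np.float32).reshape(-1, 1)
ys = (np.arange(10000) % 2 == 0).astype(np.int64)
dataset = tf.data.Dataset.from_tensor_slices((xs, ys)).batch(100)

clf = SGDClassifier()
n_batches = 0
for x_batch, y_batch in dataset.as_numpy_iterator():
    # Each element is a tuple of NumPy arrays, one batch at a time.
    # partial_fit needs the full list of classes, because a single batch
    # is not guaranteed to contain every label.
    clf.partial_fit(x_batch, y_batch, classes=np.array([0, 1]))
    n_batches += 1
```

Estimators without partial_fit (SVC, RandomForestClassifier) still need the whole dataset as arrays, as in the snippet above.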
Related
I want to implement a wide and deep neural network using Keras. I am using the official Keras code example, but I want to pass a NumPy array as the dataset instead of a CSV file.
Code:
def get_dataset_from_csv(csv_file_path, batch_size, shuffle=False):
    dataset = tf.data.experimental.make_csv_dataset(
        csv_file_path,
        batch_size=batch_size,
        column_names=CSV_HEADER,
        column_defaults=COLUMN_DEFAULTS,
        label_name=TARGET_FEATURE_NAME,
        num_epochs=1,
        header=True,
        shuffle=shuffle,
    )
    return dataset.cache()
I used the code from this link: https://keras.io/examples/structured_data/wide_deep_cross_networks/
The function above builds a cached dataset with the TensorFlow function "make_csv_dataset", but I want to pass a NumPy array as the dataset directly.
What could be the solution?
Keras layers generally expect floating-point inputs, so you need to cast integer data to float. A simple way is to change the type to float after you load your data as a NumPy array:
x = x.astype('float32')
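To then get a dataset shaped like the one make_csv_dataset produces, something along these lines might work. It is a sketch under assumptions: get_dataset_from_numpy and feature_names are hypothetical names of mine, all columns are assumed to be numeric, and the elements are (dict of features, label) pairs because that is what make_csv_dataset yields when label_name is set.

```python
import numpy as np
import tensorflow as tf

def get_dataset_from_numpy(features, labels, feature_names, batch_size, shuffle=False):
    # Hypothetical helper: cast to float32 as suggested above.
    features = np.asarray(features).astype("float32")
    # One entry per column, mirroring the dict-of-features elements
    # that make_csv_dataset yields.
    feature_dict = {name: features[:, i] for i, name in enumerate(feature_names)}
    dataset = tf.data.Dataset.from_tensor_slices((feature_dict, labels))
    if shuffle:
        dataset = dataset.shuffle(buffer_size=len(features))
    return dataset.batch(batch_size)

# Tiny made-up array, only to show the shape of the result.
x = np.arange(12).reshape(6, 2)
y = np.array([0, 1, 0, 1, 0, 1])
dataset = get_dataset_from_numpy(x, y, feature_names=["a", "b"], batch_size=4)
batches = list(dataset.as_numpy_iterator())
```

I left out .cache() here because the arrays are already in memory; add it back if your pipeline does expensive mapping afterwards.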
I am trying to write a function that runs KMeans on a dataset and outputs the cluster centroids. My aim is to use this in a custom keras layer, so I am using TensorFlow's implementation of KMeans that takes a tensor as the input dataset.
My problem however is that I can't make it work even as a standalone function. The problem comes from the fact that KMeans accepts a generator function that provides mini-batches instead of a plain tensor, but when I am using closure to do that, I get a graph disconnected error:
import tensorflow as tf # version: 2.4.1
from tensorflow.compat.v1.estimator.experimental import KMeans
#tf.function
def KMeansCentroids(inputs, num_clusters, steps, use_mini_batch=False):
    # `inputs` is a 2D tensor
    def input_fn():
        # Each of the lines below results in the same "Graph Disconnected" error. The tuples aren't really needed; they are only there to be consistent with the documentation.
        return (inputs, None)
        return (tf.data.Dataset.from_tensor_slices(inputs), None)
        return (tf.convert_to_tensor(inputs), None)
    kmeans = KMeans(
        num_clusters=num_clusters,
        use_mini_batch=use_mini_batch)
    kmeans.train(input_fn, steps=steps) # This is where the error happens
    return kmeans.cluster_centers()
>>> x = tf.random.uniform((100, 2))
>>> c = KMeansCentroids(x, 5, 10)
The exact error is:
ValueError:
Tensor("strided_slice:0", shape=(), dtype=int32)
must be from the same graph as
Tensor("Equal:0", shape=(), dtype=bool)
(graphs are FuncGraph(name=KMeansCentroids, id=..) and <tensorflow.python.framework.ops.Graph object at ...>).
If I were to use a numpy dataset and convert to tensor inside the function, the code would work just fine.
Also, making input_fn() directly return tf.random.uniform((100, 2)) (ignoring the inputs argument) works as well. That's why I am guessing that TensorFlow doesn't support closures here, since it needs to build the computation graph up front.
But I don't see how to work around that.
Could it be a version error due to KMeans being a compat.v1.experimental module?
Note that the documentation of KMeans states for the input_fn():
The function should construct and return one of the following:
A tf.data.Dataset object: Outputs of Dataset object must be a tuple (features, labels) with same constraints as below.
A tuple (features, labels): Where features is a tf.Tensor or a dictionary of string feature name to Tensor and labels is a Tensor or a dictionary of string label name to Tensor. Both features and labels are consumed by model_fn. They should satisfy the expectation of model_fn from inputs.
The problem you're facing is that you are using a tensor that was created outside the graph in which it is consumed. When you call the .train function, a new graph is created, and it contains only what is defined in input_fn and in model_fn.
kmeans.train(input_fn, steps=steps)
After that, any tensor that comes from outside these functions is treated as an outsider and is not part of this new graph. That's why you get a graph disconnected error when you try to use such a tensor. To resolve this, you need to create the necessary tensors within these functions.
import tensorflow as tf
from tensorflow.compat.v1.estimator.experimental import KMeans
#tf.function
def KMeansCentroids(num_clusters, steps, use_mini_batch=False):
    def input_fn(batch_size):
        pinputs = tf.random.uniform((100, 2))
        dataset = tf.data.Dataset.from_tensor_slices((pinputs))
        dataset = dataset.shuffle(1000).repeat()
        return dataset.batch(batch_size)
    kmeans = KMeans(
        num_clusters=num_clusters,
        use_mini_batch=use_mini_batch)
    kmeans.train(input_fn = lambda: input_fn(5),
                 steps=steps)
    return kmeans.cluster_centers()
c = KMeansCentroids(5, 10)
Here is some more info for reading: 1. FYI, I tested your code with a few versions of tf > 2, and I don't think the error is version-related.
Re-mentioning here for future readers. An alternative for using KMeans within Keras layers:
tf_kmeans.py
ClusteringLayer
I can't find a simple way to convert a tensor to a NumPy array without enabling eager mode, which gives a nice .numpy() method, but also slows down my model training.
I'd be super grateful for your suggestions. For context, I'm writing a custom metric for my TensorFlow model that relies on a scikit learn function, which only takes numpy arrays.
I've tried wrapping the tensors with np.array(), which throws a not implemented error. Also gave sessions and .eval() a go, but didn't get it to work either and seemed like too much for this simple job.
My specific error:
NotImplementedError: Cannot convert a symbolic Tensor (model_17/dense_17/Sigmoid:0) to a numpy array.
# Custom metric
def accuracy_ml(y_true, y_pred):
    return accuracy_score(y_true, np.round(y_pred)) # ERROR here feeding tensor to sklearn function
# Model
cnn = simple_model(input_shape=(224, 224, 3),
                   num_classes=10,
                   base_model = base_ResNet101)
lr = 1e-2
loss_fn = tf.keras.losses.BinaryCrossentropy()
metrics = [accuracy_ml]
cnn.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=lr),
            loss=loss_fn,
            metrics=metrics)
# Simple baseline eval that fails
validation_steps=17
loss0, accuracy0 = cnn.evaluate(validation_batches, steps = validation_steps)
Wrapping my NumPy metric with tf.numpy_function() solved it. https://www.tensorflow.org/api_docs/python/tf/numpy_function
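For future readers, the wrapping can look roughly like this. It is a minimal sketch, not my exact code: the helper name _accuracy_np is made up, and flattening both arrays with ravel() (so every element is scored as a separate binary prediction) is an assumption you may need to change for your labels.

```python
import numpy as np
import tensorflow as tf
from sklearn.metrics import accuracy_score

def _accuracy_np(y_true, y_pred):
    # Runs as plain Python: both arguments arrive as NumPy arrays.
    score = accuracy_score(y_true.ravel(), np.round(y_pred).ravel())
    # The returned dtype has to match the Tout given to tf.numpy_function.
    return np.float32(score)

def accuracy_ml(y_true, y_pred):
    # tf.numpy_function lets the scikit-learn call run inside the graph.
    score = tf.numpy_function(_accuracy_np, [y_true, y_pred], tf.float32)
    # The op has no static shape information, so declare it a scalar.
    score.set_shape(())
    return score

# Quick check with made-up values, both eagerly and inside a tf.function.
y_true = tf.constant([[1.0], [0.0], [1.0], [0.0]])
y_pred = tf.constant([[0.9], [0.2], [0.4], [0.1]])
eager_score = float(accuracy_ml(y_true, y_pred))
graph_score = float(tf.function(accuracy_ml)(y_true, y_pred))
```

Keep in mind that the wrapped function is not differentiable and is not saved with the model, so this suits metrics rather than losses.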
Trying to use non-Keras-backend functions for a custom loss calculation in Keras models.
I am trying to make my Keras CNN model use a custom loss function (kappa score). However, since kappa is not defined in the Keras backend, I need to use the scikit-learn kappa implementation. This sklearn function takes arrays of labels as arguments, unlike Keras backend functions, which take tensors. The loss function call within Keras passes the tensors y_pred and y_true. I did the implementation below using a guide I found online, but I get errors.
import keras.backend as K
def cohen_kappa_score_func(y_true, y_pred):
    sess = tf.Session()
    with sess.as_default():
        score = cohen_kappa_score(type(y_true.eval()),type(y_pred.eval()), weights='linear')#idea is to convert the tensor to array
    sess.close()
    return score
#use this later to compile the keras model with custom loss function as
model.compile(optimizer=optimizers.SGD(lr=0.001, momentum=0.9),
              loss=cohen_kappa_score_func,
              metrics=['categorical_crossentropy', 'mae','categorical_accuracy'])
This doesn't work, and I get the following error:
"InvalidArgumentError (see above for traceback): You must feed a value for placeholder tensor 'dense_15_target' with dtype float and shape [?,?]
[[node dense_15_target "
Please give me suggestions to solve this.
I'd like to pass the parameters of the trained model (weights and bias for convolution and fully connected layers) to other frameworks or languages including iOS and Torch by parsing the saved file.
I tried tf.train.write_graph(session.graph_def, '', 'graph.pb'), but it seems it only includes the graph architecture without the weights and biases. If so, is creating a checkpoint file (saver.save(session, "model.ckpt")) the best way? Is the ckpt file format easy to parse in Swift or other languages?
Please let me know if you have any suggestions.
Instead of parsing a .ckpt file, you can just try evaluating the tensor (in your case the weights of a convolutional layer) and getting the values as a numpy array. Here is a quick toy example (tested on r0.10; there might be some small API changes in newer versions):
import tensorflow as tf
import numpy as np
x = tf.placeholder(np.float32, [2,1])
w = tf.Variable(tf.truncated_normal([2,2], stddev=0.1))
b = tf.Variable(tf.constant(1.0, shape=[2,1]))
z = tf.matmul(w, x) + b
with tf.Session() as sess:
    sess.run(tf.initialize_all_variables())
    w_val, z_val = sess.run([w, z], feed_dict={x: np.arange(2).reshape(2,1)})
print(w_val)
print(z_val)
Output:
[[-0.02913031 0.13549708]
[ 0.13807134 0.03763327]]
[[ 1.13549709]
[ 1.0376333 ]]
If you have trouble getting a reference to your tensor (say it is nested inside a higher-level "layer" operation), try finding it by name. More info here: Tensorflow: How to get a tensor by name?
If you want to see how the weights change during training, you can also try to save all the values you are interested in into tf.Summary objects and parse them later: Parsing `summary_str` byte string evaluated on tensorflow summary object
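Once the weights are numpy arrays, one way to hand them to Swift, Torch or anything else is to dump them into a plain format such as JSON. The snippet below is only a sketch of that idea: export_weights, the layer names and the file name weights.json are made up for illustration, and the arrays stand in for values like w_val that you would get from sess.run.

```python
import json
import os
import tempfile

import numpy as np

def export_weights(named_arrays, path):
    # Hypothetical helper: store each array as its shape plus a flat
    # list of floats, which any JSON parser can read back.
    payload = {}
    for name, arr in named_arrays.items():
        arr = np.asarray(arr, dtype=np.float32)
        payload[name] = {"shape": list(arr.shape), "values": arr.ravel().tolist()}
    with open(path, "w") as f:
        json.dump(payload, f)

# Stand-ins for arrays obtained by evaluating the variables.
weights = {
    "conv1/w": np.ones((2, 2), dtype=np.float32),
    "conv1/b": np.zeros((2, 1), dtype=np.float32),
}
path = os.path.join(tempfile.gettempdir(), "weights.json")
export_weights(weights, path)

# Reading the file back shows what the other side would have to parse.
with open(path) as f:
    restored = json.load(f)
```

The values are flattened in row-major (C) order, so the reader has to reshape them with the stored shape. For large models a binary format will be much smaller than JSON.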