I have a dataset in a Python dictionary. The structure is as follows:
data.data['0']['input'],data.data['0']['target'],data.data['0']['length']
Both input and target are arrays of size (n,) and length is an int.
I have created a class derived from tf.keras.utils.Sequence and defined __getitem__ like this:
def __getitem__(self, idx):
    idx = str(idx)
    return {
        'input': np.asarray(self.data[idx]['input']),
        'target': np.asarray(self.data[idx]['target']),
        'length': self.data[idx]['length']
    }
How can I iterate over such a dataset using tf.data.Dataset? I get this error when I try to use from_tensor_slices:
ValueError: Attempt to convert a value with an unsupported type (<class 'dict'>) to a Tensor.
I think you should convert the dictionary to a tensor, as proposed in "convert a dictionary to a tensor", or write the data out to a text file or to TFRecords. Hope this helps!
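Another way to avoid converting the whole dictionary up front is to wrap it in a generator that yields one record at a time. The sketch below is plain Python, with a hypothetical `records` dictionary standing in for `data.data`. A generator of this kind is what `tf.data.Dataset.from_generator` accepts, provided a matching `output_signature` is supplied; that step is not shown here and is untested against the asker's data.

```python
def iter_records(records):
    """Yield one {'input', 'target', 'length'} dict per record, in key order."""
    # Keys are stringified integers ('0', '1', ...), so sort them numerically.
    for key in sorted(records, key=int):
        rec = records[key]
        yield {
            "input": list(rec["input"]),
            "target": list(rec["target"]),
            "length": int(rec["length"]),
        }

# Hypothetical stand-in for data.data.
records = {
    "1": {"input": [4, 5], "target": [6, 7], "length": 2},
    "0": {"input": [1, 2, 3], "target": [2, 3, 4], "length": 3},
}

batches = list(iter_records(records))
print(batches[0]["length"])  # 3
```

Because each record is yielded separately, the arrays do not all need the same length n, which is one reason a generator may fit better here than from_tensor_slices.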
A simple question, and I'm sure the answer is straightforward, but I'm really struggling to match the model's input shape with the tensor I feed into it.
This simple code
let tf = require('@tensorflow/tfjs-node');
let features = {
  x: [1,2,3,4,5,6,7,8,9],
  y: [1,2,3,4,5,6,7,8,9]
}
let tensorfeature = tf.tensor2d(Object.values(features))
console.log(tensorfeature.shape)
const model = tf.sequential();
model.add(tf.layers.dense(
  {
    inputShape: tensorfeature.shape,
    units: 1
  }
))
const optimizer = tf.train.sgd(0.005);
model.compile({optimizer: optimizer, loss: 'meanAbsoluteError'});
model.fit(tensorfeature,
  {epochs: 5}
)
results in: Error: Error when checking input: expected dense_Dense1_input to have 3 dimension(s). but got array with shape 2,9
I tried multiple things with reshape, slice, etc., with no luck. Can someone point out what exactly is wrong?
model.fit takes at least two parameters x, y which are either tensors or array of tensors. The config object is the third parameter.
Also, the features tensor (tensorfeature) passed to model.fit should have one more dimension than the model's inputShape, because the first axis is the batch axis. Since tensorfeature.shape is used as the inputShape, the tensor's dimensions must be expanded before training the model on it. This can be done using reshape or expandDims.
model.fit(tensorfeature.expandDims(0))
// or possibly
model.fit(tensorfeature.reshape([1, ...tensorfeature.shape]))
This shape mismatch between the model and the training data has been discussed here and there
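The rule in this answer can be checked with plain shape arithmetic. The sketch below is Python rather than TensorFlow.js and uses two hypothetical helpers, `batch_shape_for` and `is_valid_batch`, to show that a model declared with inputShape [2, 9] expects batches of shape [batch, 2, 9], which is why the unexpanded [2, 9] tensor is rejected.

```python
def batch_shape_for(input_shape, batch_size):
    """Shape a batch must have for a model declared with `input_shape`."""
    return (batch_size,) + tuple(input_shape)

def is_valid_batch(tensor_shape, input_shape):
    """True if `tensor_shape` is `input_shape` with one leading batch axis."""
    return (len(tensor_shape) == len(input_shape) + 1
            and tuple(tensor_shape[1:]) == tuple(input_shape))

input_shape = (2, 9)  # tensorfeature.shape in the question

print(is_valid_batch((2, 9), input_shape))     # False: no batch axis
print(is_valid_batch((1, 2, 9), input_shape))  # True: shape after expandDims(0)
print(batch_shape_for(input_shape, 1))         # (1, 2, 9)
```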
I am currently following this tutorial to retrain Inception for image classification:
https://cloud.google.com/blog/big-data/2016/12/how-to-train-and-classify-images-using-google-cloud-machine-learning-and-cloud-dataflow
However, when I make a prediction with the API, I only get the index of my class as the label. I would like the API to return a string with the actual class name instead, e.g. instead of
predictions:
- key: '0'
  prediction: 4
  scores:
  - 8.11998e-09
  - 2.64907e-08
  - 1.10307e-06
I would like to get:
predictions:
- key: '0'
  prediction: ROSES
  scores:
  - 8.11998e-09
  - 2.64907e-08
  - 1.10307e-06
Looking at the reference for the Google API it should be possible:
https://cloud.google.com/ml-engine/reference/rest/v1/projects/predict
I already tried changing the following in model.py:
outputs = {
    'key': keys.name,
    'prediction': tensors.predictions[0].name,
    'scores': tensors.predictions[1].name
}
tf.add_to_collection('outputs', json.dumps(outputs))
to
if tensors.predictions[0].name == 0:
    pred_name ='roses'
elif tensors.predictions[0].name == 1:
    pred_name ='tulips'
outputs = {
    'key': keys.name,
    'prediction': pred_name,
    'scores': tensors.predictions[1].name
}
tf.add_to_collection('outputs', json.dumps(outputs))
but this doesn't work.
My next idea was to change this part of the preprocess.py file, so that instead of getting the index I use the string label:
def process(self, row, all_labels):
    try:
        row = row.element
    except AttributeError:
        pass
    if not self.label_to_id_map:
        for i, label in enumerate(all_labels):
            label = label.strip()
            if label:
                self.label_to_id_map[label] = label #i
and
label_ids = []
for label in row[1:]:
    try:
        label_ids.append(label.strip())
        #label_ids.append(self.label_to_id_map[label.strip()])
    except KeyError:
        unknown_label.inc()
but this gives the error:
TypeError: 'roses' has type <type 'str'>, but expected one of: (<type 'int'>, <type 'long'>) [while running 'Embed and make TFExample']
Hence I thought I should change something here in preprocess.py to allow strings:
example = tf.train.Example(features=tf.train.Features(feature={
    'image_uri': _bytes_feature([uri]),
    'embedding': _float_feature(embedding.ravel().tolist()),
}))
if label_ids:
    label_ids.sort()
    example.features.feature['label'].int64_list.value.extend(label_ids)
But I don't know how to change it appropriately, as I could not find something like str_list. Could anyone please help me out here?
Online prediction certainly allows this; the model itself needs to be updated to do the conversion from int to string.
Keep in mind that the Python code is just building a graph which describes what computation to do in your model -- you're not sending the Python code to online prediction, you're sending the graph you build.
That distinction is important because the changes you have made are in Python -- you don't yet have any inputs or predictions, so you won't be able to inspect their values. What you need to do instead is add the equivalent lookups to the graph that you're exporting.
You could modify the code like so:
labels = tf.constant(['cars', 'trucks', 'suvs'])
predicted_indices = tf.argmax(softmax, 1)
prediction = tf.gather(labels, predicted_indices)
Leave the inputs/outputs from the original code untouched.
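For intuition, here is what that lookup computes, written in plain Python rather than as graph ops. The score rows are made up for illustration, and `predict_labels` is a hypothetical helper; in the exported model the same two steps would be `tf.argmax` and `tf.gather`, so that they run inside the graph rather than in Python.

```python
labels = ["cars", "trucks", "suvs"]  # illustrative labels, as in the answer

def predict_labels(score_rows, labels):
    """Map each row of class scores to the label with the highest score."""
    results = []
    for row in score_rows:
        index = max(range(len(row)), key=row.__getitem__)  # argmax over the row
        results.append(labels[index])                      # gather by index
    return results

scores = [[0.1, 0.7, 0.2], [0.8, 0.1, 0.1]]  # made-up softmax outputs
print(predict_labels(scores, labels))  # ['trucks', 'cars']
```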
I want to create TensorFlow records to feed my model.
So far I use the following code to store a uint8 NumPy array in TFRecord format:
def _int64_feature(value):
    return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))

def _bytes_feature(value):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))

def _floats_feature(value):
    return tf.train.Feature(float_list=tf.train.FloatList(value=[value]))

def convert_to_record(name, image, label, map):
    filename = os.path.join(params.TRAINING_RECORDS_DATA_DIR, name + '.' + params.DATA_EXT)
    writer = tf.python_io.TFRecordWriter(filename)
    image_raw = image.tostring()
    map_raw = map.tostring()
    label_raw = label.tostring()
    example = tf.train.Example(features=tf.train.Features(feature={
        'image_raw': _bytes_feature(image_raw),
        'map_raw': _bytes_feature(map_raw),
        'label_raw': _bytes_feature(label_raw)
    }))
    writer.write(example.SerializeToString())
    writer.close()
which I read with this example code
features = tf.parse_single_example(example, features={
    'image_raw': tf.FixedLenFeature([], tf.string),
    'map_raw': tf.FixedLenFeature([], tf.string),
    'label_raw': tf.FixedLenFeature([], tf.string),
})
image = tf.decode_raw(features['image_raw'], tf.uint8)
image.set_shape(params.IMAGE_HEIGHT*params.IMAGE_WIDTH*3)
image = tf.reshape(image, (params.IMAGE_HEIGHT,params.IMAGE_WIDTH,3))
map = tf.decode_raw(features['map_raw'], tf.uint8)
map.set_shape(params.MAP_HEIGHT*params.MAP_WIDTH*params.MAP_DEPTH)
map = tf.reshape(map, (params.MAP_HEIGHT,params.MAP_WIDTH,params.MAP_DEPTH))
label = tf.decode_raw(features['label_raw'], tf.uint8)
label.set_shape(params.NUM_CLASSES)
and that's working fine. Now I want to do the same with my array "map" being a float NumPy array instead of uint8, and I could not find examples of how to do it.
I tried the function _floats_feature, which works if I pass a scalar to it, but not with arrays.
With uint8 the serialization can be done with the tostring() method.
How can I serialize a float NumPy array, and how can I read it back?
FloatList and BytesList expect an iterable, so you need to pass a list of floats. Remove the extra brackets in your _floats_feature, i.e.
def _floats_feature(value):
    return tf.train.Feature(float_list=tf.train.FloatList(value=value))

numpy_arr = np.ones((3,)).astype(np.float64)
example = tf.train.Example(features=tf.train.Features(feature={"bytes": _floats_feature(numpy_arr)}))
print(example)
features {
  feature {
    key: "bytes"
    value {
      float_list {
        value: 1.0
        value: 1.0
        value: 1.0
      }
    }
  }
}
I will expand on Yaroslav's answer.
Int64List, BytesList and FloatList expect an iterable of the underlying elements (a repeated field). In your case you can use a list.
You mentioned: it works if I pass a scalar to it, but not with arrays. And this is expected, because when you pass a scalar, your _floats_feature creates an array of one float element in it (exactly as expected). But when you pass an array you create a list of arrays and pass it to a function which expects a list of floats.
So just remove construction of the array from your function: float_list=tf.train.FloatList(value=value)
I've stumbled across this while working on a similar problem. Since part of the original question was how to read back the float32 feature from tfrecords, I'll leave this here in case it helps anyone:
If map.ravel() was used to input map of dimensions [x, y, z] into _floats_feature:
features = {
    ...
    'map': tf.FixedLenFeature([x, y, z], dtype=tf.float32)
    ...
}
parsed_example = tf.parse_single_example(serialized=serialized, features=features)
map = parsed_example['map']
Yaroslav's example failed when an n-dimensional array was the input:
numpy_arr = np.ones((3,3)).astype(np.float64)
I found that it worked when I used numpy_arr.ravel() as the input. But is there a better way to do it?
First of all, many thanks to Yaroslav and Salvador for their enlightening answers.
In my experience, their methods only work when the input is a 1-D NumPy array of shape (n,). When the input is a NumPy array with more than one dimension, the following error appears:
def _float_feature(value):
    return tf.train.Feature(float_list=tf.train.FloatList(value=value))

numpy_arr = np.arange(12).reshape(2, 2, 3).astype(np.float64)
example = tf.train.Example(features=tf.train.Features(feature={"bytes":
    _float_feature(numpy_arr)}))
print(example)
TypeError: array([[0., 1., 2.],
[3., 4., 5.]]) has type numpy.ndarray, but expected one of: int, long, float
So, I'd like to expand on Tsuan's answer, that is, flattening the input before it is fed into the TF example. The modified code is as follows:
def _floats_feature(value):
    return tf.train.Feature(float_list=tf.train.FloatList(value=value))

numpy_arr = np.arange(12).reshape(2, 2, 3).astype(np.float64).flatten()
example = tf.train.Example(features=tf.train.Features(feature={"bytes":
    _floats_feature(numpy_arr)}))
print(example)
In addition, I find ndarray.flatten() preferable to np.ravel() here: flatten() always returns a copy, whereas ravel() may return a view of the original array.
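Whichever flattening call is used, the original shape is lost once the values are in a FloatList, so it has to be stored or known when reading the record back. The following sketch illustrates that round trip with nested Python lists instead of NumPy arrays; `flatten` and `unflatten` are hypothetical helpers written for this illustration, not TensorFlow or NumPy functions.

```python
def flatten(nested):
    """Return all leaf values of a nested list in row-major order."""
    if not isinstance(nested, list):
        return [nested]
    flat = []
    for item in nested:
        flat.extend(flatten(item))
    return flat

def unflatten(flat, shape):
    """Rebuild a nested list of the given shape from a flat list."""
    if len(shape) == 1:
        return list(flat[:shape[0]])
    step = len(flat) // shape[0]  # number of values per outermost slice
    return [unflatten(flat[i * step:(i + 1) * step], shape[1:])
            for i in range(shape[0])]

shape = (2, 2, 3)
original = unflatten([float(i) for i in range(12)], shape)

flat = flatten(original)           # what would go into FloatList(value=...)
restored = unflatten(flat, shape)  # what the reader does after parsing
print(restored == original)        # True
```

With NumPy the same two steps would be a flatten before writing and a reshape to the known shape after reading.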
Use tfrmaker, a TFRecord utility package. You can install the package with pip:
pip install tfrmaker
Then you could create tfrecords like this:
from tfrmaker import images
# Map label names to integer encodings.
LABELS = {"bishop": 0, "knight": 1, "pawn": 2, "queen": 3, "rook": 4}
# Specify the data and output directories.
DATA_DIR = "datasets/chess/"
OUTPUT_DIR = "tfrecords/chess/"
# Create tfrecords from the images present in the given data directory.
info = images.create(DATA_DIR, LABELS, OUTPUT_DIR)
# info contains a list of information (path: relative path, size: number of images in the tfrecord) about the created tfrecords
print(info)
The package also has some cool features, such as:
dynamic resizing
splitting tfrecords into optimal shards
splitting tfrecords into training, validation, and testing sets
counting the number of images in tfrecords
asynchronous tfrecord creation
NOTE: This package currently supports image datasets that are organised as directories with class names as subdirectory names.
I train a model with a placeholder for is_training:
is_training_ph = tf.placeholder(tf.bool)
However, once training and validation are done, I would like to permanently inject a constant False for this value and then "re-optimize" the graph (i.e. using optimize_for_inference). Is there something along the lines of freeze_graph that will do this?
One possibility is to use the tf.import_graph_def() function and its input_map argument to rewrite the value of that tensor in the graph. For example, you could structure your program as follows:
with tf.Graph().as_default() as training_graph:
    # Build model.
    is_training_ph = tf.placeholder(tf.bool, name="is_training")
    # ...
training_graph_def = training_graph.as_graph_def()

with tf.Graph().as_default() as temp_graph:
    tf.import_graph_def(training_graph_def,
                        input_map={is_training_ph.name: tf.constant(False)})
temp_graph_def = temp_graph.as_graph_def()
After building temp_graph_def, you can use it as the input to freeze_graph.
An alternative, which might be more compatible with the freeze_graph and optimize_for_inference scripts (which make assumptions about variable names and checkpoint keys) would be to modify TensorFlow's graph_util.convert_variables_to_constants() function so that it converts placeholders instead:
def convert_placeholders_to_constants(input_graph_def,
                                      placeholder_to_value_map):
    """Replaces placeholders in the given tf.GraphDef with constant values.

    Args:
      input_graph_def: GraphDef object holding the network.
      placeholder_to_value_map: A map from the names of placeholder tensors in
        `input_graph_def` to constant values.

    Returns:
      GraphDef containing a simplified version of the original.
    """
    output_graph_def = tf.GraphDef()

    for node in input_graph_def.node:
        output_node = tf.NodeDef()
        if node.op == "Placeholder" and node.name in placeholder_to_value_map:
            output_node.op = "Const"
            output_node.name = node.name
            dtype = node.attr["dtype"].type
            data = np.asarray(placeholder_to_value_map[node.name],
                              dtype=tf.as_dtype(dtype).as_numpy_dtype)
            output_node.attr["dtype"].type = dtype
            output_node.attr["value"].CopyFrom(tf.AttrValue(
                tensor=tf.contrib.util.make_tensor_proto(data,
                                                         dtype=dtype,
                                                         shape=data.shape)))
        else:
            output_node.CopyFrom(node)

        output_graph_def.node.extend([output_node])

    return output_graph_def
...then you could build training_graph_def as above, and write:
temp_graph_def = convert_placeholders_to_constants(training_graph_def,
                                                   {is_training_ph.op.name: False})
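To make the rewriting step concrete without depending on a particular TensorFlow version, the same pass can be sketched on a toy graph in which each node is a plain dict. The node layout below is a simplification invented for illustration, not the real GraphDef schema, and `placeholders_to_constants` is a hypothetical stand-in for the function above.

```python
def placeholders_to_constants(nodes, placeholder_to_value):
    """Return a copy of `nodes` with the named placeholders turned into constants."""
    output = []
    for node in nodes:
        if node["op"] == "Placeholder" and node["name"] in placeholder_to_value:
            # Keep the name so downstream nodes that list it as an input still resolve.
            output.append({
                "name": node["name"],
                "op": "Const",
                "inputs": [],
                "value": placeholder_to_value[node["name"]],
            })
        else:
            output.append(dict(node))  # unchanged nodes are copied as-is
    return output

# Toy graph: two placeholders feeding a single downstream node.
toy_graph = [
    {"name": "is_training", "op": "Placeholder", "inputs": []},
    {"name": "x", "op": "Placeholder", "inputs": []},
    {"name": "dropout", "op": "Dropout", "inputs": ["x", "is_training"]},
]

frozen = placeholders_to_constants(toy_graph, {"is_training": False})
print([(n["name"], n["op"]) for n in frozen])
# [('is_training', 'Const'), ('x', 'Placeholder'), ('dropout', 'Dropout')]
```

Only placeholders named in the map are replaced, so real inputs such as x stay feedable after the rewrite.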