TensorFlow: how find out that tensor already created - tensorflow

I'm trying to create big tensor with shape 42000x28x28, data comes as a ArrayBuffer, let's assume that I create tensor something like that:
tf.tensor4d(new Uint8Array(xsTrainBuffer), [42000 * 0.9, 28, 28, 1]).div<Tensor<Rank.R4>>(255);
The question is - how find out that tensor is already prepared and I may use it in model training, since this function doesn't return any promise, but as for me it looks a little bit weird consider this operation as synchronous one.

I believe a better approach would be the tf.data.array() function. This will create a DataSet for you that is ideal for working with large amounts of data.
While the function itself is synchronous, it does a lazy loading of the data - so immediately after you call the function, if you try to log the data, it will show empty. However, use the forEachAsync() function to log the data and you will see that it reads it correctly.
Check out this iris-fitDataset example where they use the tf.data.array() function.
Note that Dataset's have their own model functions such as fitDataset() instead of fit() so you don't have to convert the Dataset into anything else.

Related

Applying Tensorflow Dataset .map() to subsequent dataset elements

I've got a TFRecordDataset and I'm trying to preprocess the features of two subsequent elements by means of the map() API.
dataset_ext = dataset.map(lambda x: tf.py_function(parse_data, [x], [tf.float32]))
As map applies the function parse_data to every dataset element, I don't know what parse_data should look like in order to keep track of the feature extracted from the previous dataset element.
Can anyone help? Thank you
EDIT: I'm working on the Waymo dataset, so each element is a frame. You can refer to https://github.com/Jossome/Waymo-open-dataset-document for its structure.
This is my parse function parse_data:
from waymo_open_dataset import dataset_pb2 as open_dataset
def parse_data(input_data):
frame = open_dataset.Frame()
frame.ParseFromString(bytearray(input_data.numpy()))
av_speed = (frame.images[0].velocity.v_x, frame.images[0].velocity.v_y, frame.images[0].velocity.v_z)
return av_speed
I'd like to build a dataset whose features are the car speed and acceleration, defined as the speed variation between subsequent frames (the first value can be 0).
One way I thought about is to give the map function dataset and dataset.skip(1) as inputs but I'm not sure about it yet.
I am not sure but it might be unnecessary to make your mapped function a tf.py_function. How parse_data is supposed to look like depends on your dataset dataset_ext. If it has for example two file paths (1 instace of input data and 1 instance of output data), the mapping function should have 2 arguments and should return 2 arguments.
For example: if your dataset contains images and you want them to be randomly cropped each time an example of your dataset is drawn the mapping function looks like this:
def process_img_random_crop(img_in, img_out, output_shape):
merged = tf.stack([img_in, img_out])
mergedCrop = tf.image.random_crop(merged, size=(2,) + output_shape)
img_in_cropped, img_out_cropped = tf.unstack(mergedCrop, 2, 0)
return img_in_cropped, img_out_cropped
I call it as follows:
image_ds_test = image_ds_test.map(lambda i, o: process_img_random_crop(i, o, output_shape=(64, 64, 1)), num_parallel_calls=tf.data.experimental.AUTOTUNE)
What exactly is your plan with dataset_ext and what does it contain?
Edit:
Okay, got what you meant with you the two frames. So the map function is applied to each entry of your dataset separatly. If you need cross-entry information, a single entry of your dataset needs to contain two frames. With this more complicated set-up, I would suggest you to use a tensorflow Sequence: The explanation from the tensorflow team is pretty straigth forward. Hope this help!

Learning to rank how to save model

I successfully managed to implement learning to rank by following the tutorial TF-Ranking for sparse features using the ANTIQUE question answering dataset.
Now my goal is to successfully save the learned model to disk so that I can easily load it without training again. Due to the Tensorflow docs, the estimator.export_saved_model() method seems to be the way to go. But I can't wrap my head around how to tell Tensorflow how my feature structure looks like. Due to the docs here the easiest way seems to be calling tf.estimator.export.build_parsing_serving_input_receiver_fn(), which returns me the required inpur receiver function which I have to pass to the export_saved_model function. But how do I tell Tensorflow how my features from my learning to rank model look like?
From my current understanding the model has context feature specs and example feature specs. So I guess I somehow have to combine those two specs into one feature description, which I then can pass to the build_parsing_serving_input_receiver_fn function?
So I think you are on the right track;
You can get a build_ranking_serving_input_receiver_fn like this: (substitue context_feature_columns(...) and example_feature_columns(...) with defs you probably have for creating your own context and example structures for your training data):
def example_serving_input_fn():
context_feature_spec = tf.feature_column.make_parse_example_spec(
context_feature_columns(_VOCAB_PATHS).values())
example_feature_spec = tf.feature_column.make_parse_example_spec(
list(example_feature_columns(_VOCAB_PATHS).values()))
servingInputReceiver = tfr.data.build_ranking_serving_input_receiver_fn(
data_format=tfr.data.ELWC,
context_feature_spec=context_feature_spec,
example_feature_spec=example_feature_spec,
list_size=_LIST_SIZE,
receiver_name="input_ranking_data",
default_batch_size=None)
return servingInputReceiver
And then pass this to export_saved_model like this:
ranker.export_saved_model('path_to_save_model', example_serving_input_fn())
(ranker here is a tf.estimator.Estimator, maybe you called this 'estimator' in your code)
ranker = tf.estimator.Estimator(
model_fn=model_fn,
model_dir=_MODEL_DIR,
config=run_config)

How to implement the tensor product of two layers in Keras/Tf

I'm trying to set up a DNN for classification and at one point I want to take the tensor product of a vector with itself. I'm using the Keras functional API at the moment but it isn't immediately clear that there is a layer that does this already.
I've been attempting to use a Lambda layer and numpy in order to try this, but it's not working.
Doing a bit of googling reveals
tf.linalg.LinearOperatorKronecker, which does not seem to work either.
Here's what I've tried:
I have a layer called part_layer whose output is a single vector (rank one tensor).
keras.layers.Lambda(lambda x_array: np.outer(x_array, x_array),) ( part_layer) )
Ideally I would want this to to take a vector of the form [1,2] and give me [[1,2],[2,4]].
But the error I'm getting suggests that the np.outer function is not recognizing its arguments:
AttributeError: 'numpy.ndarray' object has no attribute '_keras_history
Any ideas on what to try next, or if there is a simple function to use?
You can use two operations:
If you want to consider the batch size you can use the Dot function
Otherwise, you can use the the dot function
In both case the code should look like this:
dot_lambda = lambda x_array: tf.keras.layers.dot(x_array, x_array)
# dot_lambda = lambda x_array: tf.keras.layers.Dot(x_array, x_array)
keras.layers.Lambda(dot_lamda)( part_layer)
Hope this help.
Use tf.tensordot(x_array, x_array, axes=0) to achieve what you want. For example, the expression print(tf.tensordot([1,2], [1,2], axes=0)) gives the desired result: [[1,2],[2,4]].
Keras/Tensorflow needs to keep an history of operations applied to tensors to perform the optimization. Numpy has no notion of history, so using it in the middle of a layer is not allowed. tf.tensordot performs the same operation, but keeps the history.

How do I get and use value from a tensor within a TF 2.0 Dataset map step?

I'm using TensorFlow Alpha 2.0.
I have TFRecords files I'm reading from, each one holding a short video clip with each frame encoded as jpeg byte string to save space:
{
'numframes': tf.io.FixedLenFeature([], tf.int64),
'frames': tf.io.VarLenFeature(tf.string)
}
I have a map step in my tf.data.Dataset pipeline that successfully parses each example:
def parse_tfrecord(p):
return tf.io.parse_single_example(p, example_schema)
My next step is to read out the number of frames from numframes and run the tf.io.decode_jpeg function on each frame in frames.values[i] with i being from range(numframes):
def parse_jpegs(p):
numframes = p['numframes']
return tf.map_fn(tf.io.decode_jpeg, [p['frames'].values[i] for i in range(numframes)])
My dataset pipeline for completeness:
def dataset():
dataset = tf.data.Dataset.list_files("*.tfrecord")
dataset = tf.data.TFRecordDataset(dataset)
dataset = dataset.shuffle(1000).repeat()
dataset = dataset.map(parse_tfrecord)
dataset = dataset.map(parse_jpegs)
return dataset
If I exclude the dataset.map(parse_jpegs) line it all works alright, showing me something like {'frames': <tensorflow.python.framework.sparse_tensor.SparseTensor at 0x7f394c285518>, 'numframes': <tf.Tensor: id=2937, shape=(), dtype=int64, numpy=25>}
(Note that the numframes tensor includes a numpy value of 25. I can get that outside my dataset pipeline with the tensor.numpy() method)
Within that map function though, I can't call .numpy() to get the value out of the tensor, and when printing the tensor itself it hasn't been evaluated or something because there is no value shown yet.
What is the best way to parse all these frames within the dataset pipeline?
EDIT: Error message I'm getting is TypeError: 'Tensor' object cannot be interpreted as an integer in parse_jpegs when trying to get numframes. This makes sense to me why a tensor can't be interpreted as an int, but how can I get the value from that tensor to use to set the range?
The problem I'm running into comes down to the fact that each "frames" object has a different number of frames. If I can apply tf.io.decode_jpeg to each frame in that list without needing to record number of frames separately I would be fine with that, but I have "numframes" here so I know how many frames need to be decoded in my "frames" list.
EDIT: I'll heave the question up for anyone else who might find it helpful, but I ended up just returning the raw bytestrings and doing the decode_jpeg in a separate generator function outside the dataset API. It was much easier that way, even if it might be slower.
In my specific case, I ended up finding out that map_fn was trying to turn my input tensor into an output tensor of the same type. In this case, tf.io.decode_jpeg takes in a string (of bytes) and outputs a uint8 array, which was causing problems. Another argument to tf.map_fn(... output_type=tf.uint8) seems to have fixed it for me! Maybe not exactly as written since I continued tinkering with it since asking the question, but I got it working now.

Pytorch register_hook to Keras implementation

Im trying to implement the following project into Tensorflow/Keras.
https://github.com/jacobgil/pytorch-pruning
Im having a hard time understanding what register_hook does? It can be found in finetune.py, row 66.
x.register_hook(self.compute_rank)
I've searched for clear explanations regarding this function and tried to find Keras-equivalents, without any luck. Do you have answers to these questions?
First things first, here's the documentation:
http://pytorch.org/docs/master/autograd.html#torch.autograd.Variable.register_hook
This allows you to register a method to a Variable that is called whenever the Variable's .grad is updated, i.e. in a backward pass, and takes the grad as input. The method can return a Variable that would replace the original .grad or None if you just want to read the gradients to do something else.
If you update the gradients this way, the nodes further down in the compute graph see the new updated gradient in the backward pass and will have their respective gradients calculated with the updated value.
I'm not a Tensorflow expert, but the RegisterGradient decorators (documentation) seem to be able to do the same, for an example see this answer.