How to convert a MapDataset to a 1D tensor?

I am trying to read data that has one id per line into a 1D tensor of ids. My current strategy is to read the files into a TextLineDataset and then convert each id from string to int64. But the output is a MapDataset, and I am not sure how to convert it to a 1D tensor:
import tensorflow as tf

def decode_vocab_fn(record_bytes):
    # Each line holds one id; parse it as a single string column.
    data = tf.io.decode_csv(record_bytes, [["item_list"]], field_delim='\x01')
    return tf.strings.to_number(data[0], out_type=tf.dtypes.int64)

expanded_eval_paths = io.expand_paths(data_path)
files = tf.data.Dataset.list_files(expanded_eval_paths, shuffle=True)
dataset = tf.data.TextLineDataset(files)
dataset = dataset.map(decode_vocab_fn)
Can anyone give some pointers on how to convert a MapDataset to a 1D tensor? Thank you!
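One possible approach, as a minimal sketch rather than a definitive answer: in TF2 eager mode a dataset is iterable, so if every element is a scalar id you can collect the elements and stack them. The in-memory dataset below is only a stand-in for the file-based pipeline in the question, and the sample ids are made up. Note that this loads the whole dataset into memory.

```python
import tensorflow as tf

# Stand-in for the mapped TextLineDataset: one scalar int64 id per element.
lines = tf.data.Dataset.from_tensor_slices(["3", "1", "4"])
dataset = lines.map(lambda s: tf.strings.to_number(s, out_type=tf.int64))

# Iterate eagerly and stack the scalar elements into one 1D tensor.
ids = tf.stack(list(dataset))
print(ids.shape)  # (3,)
```

If a NumPy array is enough, list(dataset.as_numpy_iterator()) gives the same values without building a tensor.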

Related

tf.io.decode_raw return tensor how to make it bytes or string

I've been struggling with this for a while. I searched Stack Overflow and checked the TF2 docs several times. One solution is suggested there, but I don't understand why my own approach doesn't work.
In my case, I store a binary string (i.e., bytes) in TFRecords. If I iterate over the dataset via as_numpy_iterator, or call numpy() directly on each item, I get the binary string back, so plain iteration works.
I'm not sure what exactly map() passes to test_callback. The argument has neither a numpy method nor a numpy property, and the same is true of the value that tf.io.decode_raw returns (it is a Tensor, but it has no numpy either).
Essentially, I need to take a binary string, parse it via x = decoder.FromString(y), and then pass x to my encoder, which transforms it into a tensor.
def test_callback(example_proto):
    # I tried to figure out whether I can use bytes.decode
    # directly, and what the most efficient solution is.
    parsed_features = tf.io.decode_raw(example_proto, out_type=tf.uint8)
    # tf.io.decode_raw returns a tensor with N bytes.
    x = creator.FromString(parsed_features.numpy)
    encoded_seq = midi_encoder.encode(x)
    return encoded_seq

raw_dataset = tf.data.TFRecordDataset(filenames=["main.tfrecord"])
raw_dataset = raw_dataset.map(test_callback)
Thank you, folks.

I found one solution but I would love to see more suggestions.
def test_callback(example_proto):
    from_string = creator.FromString(example_proto.numpy())
    encoded_seq = encoder.encoder(from_string)
    return encoded_seq

raw_dataset = tf.data.TFRecordDataset(filenames=["main.tfrecord"])
raw_dataset = raw_dataset.map(lambda x: tf.py_function(test_callback, [x], [tf.int64]))
My understanding is that tf.py_function carries a performance penalty.
Thank you

How can I efficiently replace the last row of a rank-2 tensor with zeros?

Let us say that I have a rank-2 tensor (a matrix). I want to fill the last row of this pre-existing matrix with zeros. I would not like TensorFlow to copy the whole matrix to a new place, because it is huge. Is this possible?

The answer is based on David Parks' suggestion to look into this thread:
How to do slice assignment in Tensorflow
Using this answer I have arrived at the exact solution to my problem:
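import tensorflow as tf  # TensorFlow 1.x API (tf.Session, tf.scatter_update)

a = tf.Variable(tf.ones([10, 36, 36]))
value = tf.zeros([36, 36])
# Overwrite the slice at index 9 of the variable with zeros.
d = tf.scatter_update(a, 9, value)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(a.eval(session=sess))
    sess.run(d)
    print(a.eval(session=sess))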
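For readers on TensorFlow 2.x, where tf.Session is no longer available by default, a rough equivalent is sliced assignment on a tf.Variable. This is a sketch with a small made-up shape, assuming eager execution, and it uses a rank-2 variable as in the question.

```python
import tensorflow as tf

a = tf.Variable(tf.ones([4, 3]))

# Assign zeros to the last row of the variable via sliced assignment.
last = a.shape[0] - 1
a[last].assign(tf.zeros([3]))

print(a.numpy())
```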

Why does a dot product of two masked vectors in numpy return an oddly shaped array?

I have the following code:
result = np.ma.dot( array1, masked_array2 )
Which gives something like this:
masked_array(data = 24.681441709536468,
mask = False,
fill_value = 1e+20)
result.data.shape gives:
()
I can access the value by converting it to a float, like
float(result.data)
Is this the correct way of accessing the data?

The result is a 0D array.
Typically NumPy returns a scalar instead of a 0D array:
type(np.dot([1,2], [3,4])) # gives a NumPy integer scalar such as numpy.int64
However, when the result is a masked array, the mask means it cannot be converted directly to a plain scalar without losing information. Thus you get an "oddly shaped" 0D array as the result.
Yes, you can access it by converting it to float.
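A small sketch of this behaviour, using the variable names from the question with made-up values:

```python
import numpy as np

array1 = np.array([1.0, 2.0, 3.0])
masked_array2 = np.ma.array([4.0, 5.0, 6.0], mask=[False, True, False])

# The masked entry is treated as 0 in the product: 1*4 + 0 + 3*6.
result = np.ma.dot(array1, masked_array2)

print(result.shape)   # ()
value = float(result)
print(value)          # 22.0
```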

Numpy Array Shape Issue

I have initialized this empty 2d np.array
inputs = np.empty((300, 2), int)
And I am attempting to append a 2d row to it as such
inputs = np.append(inputs, np.array([1,2]), axis=0)
But I'm getting
ValueError: all the input arrays must have same number of dimensions
And Numpy thinks it's a 2 row 0 dimensional object (transpose of 2d)
np.array([1, 2]).shape
(2,)
Where have I gone wrong?

To add a row to a (300,2) shape array, you need a (1,2) shape array. Note the matching 2nd dimension.
np.array([[1,2]]) works. So does np.array([1,2])[None, :] and np.atleast_2d([1,2]).
I encourage the use of np.concatenate. It forces you to think more carefully about the dimensions.
Do you really want to start with np.empty? Look at its values. They are uninitialized, so they are arbitrary and often large. Also note that appending a row to a (300,2) array gives a (301,2) array; it does not fill in one of the existing rows.
@Divakar suggests np.row_stack. That puzzled me a bit, until I checked and found that it is just another name for np.vstack. That function passes all inputs through np.atleast_2d before doing np.concatenate. So ultimately it is the same solution: turn the (2,) array into a (1,2) array.
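The point about matching dimensions can be seen in a short sketch (a small array is used here in place of the (300,2) one):

```python
import numpy as np

inputs = np.zeros((3, 2), dtype=int)
row = np.array([1, 2])            # shape (2,)

# Turn the 1D row into shape (1, 2) so it matches along axis 1.
out = np.concatenate([inputs, row[None, :]], axis=0)

print(row[None, :].shape)  # (1, 2)
print(out.shape)           # (4, 2)
```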

A 2D row needs double brackets in the array literal, so
np.array([1,2])
needs to be
np.array([[1,2]])

If you intend to append that as the last row of inputs, you can simply use np.row_stack -
np.row_stack((inputs,np.array([1,2])))
Please note this np.array([1,2]) is a 1D array.
You can even pass it a 2D row version for the same result -
np.row_stack((inputs,np.array([[1,2]])))

assign certain entries of Tensor, like set_subtensor of Theano

Can I just assign values to certain entries in a tensor? I ran into this problem when computing the cross-correlation matrix of an NxP feature matrix feats, where N is the number of observations and P is the dimension. Some columns are constant, so their standard deviation is zero, and I don't want to divide by std for those constant columns. Here is what I did:
fmean, fvar = tf.nn.moments(feats, axes = [0], keep_dims = False)
fstd = tf.sqrt(fvar)
feats = feats - fmean
sel = (fstd != 0)
feats[:, sel] = feats[:, sel]/ fstd[sel]
corr = tf.matmul(tf.transpose(feats), feats)
However, I got this error: TypeError: 'Tensor' object does not support item assignment. Is there any workaround for this issue?

You can make your feats a tf.Variable and use tf.scatter_update to update locations selectively.
It's a bit awkward in that scatter_update needs a list of linear indices to update, so you'd need to convert your implicit 2D [:, sel] specification into an explicit list of 1D indices. There's an example of constructing 1D indices from 2D here
There's some work on simplifying this kind of use case in issue #206
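For this particular use case, an alternative that avoids assignment altogether is to replace the zero standard deviations with ones before dividing. This is a sketch using tf.where, not the scatter_update approach described above, and the small feats matrix is made up for illustration.

```python
import tensorflow as tf

# Two observations, two features; the second column is constant.
feats = tf.constant([[1.0, 5.0], [3.0, 5.0]])

fmean, fvar = tf.nn.moments(feats, axes=[0])
fstd = tf.sqrt(fvar)

# Where std is zero, divide by 1 instead so constant columns stay unscaled.
safe_std = tf.where(tf.equal(fstd, 0.0), tf.ones_like(fstd), fstd)
normed = (feats - fmean) / safe_std

corr = tf.matmul(normed, normed, transpose_a=True)
print(corr.numpy())
```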
