Is there an efficient way to select 5 regions of a tensor in Tensorflow? - tensorflow

For example, given a tensor m whose shape is [28, 28],
I want to randomly select five regions within the tensor, each of shape [3, 3].
Then I want to modify the values of these regions.

One solution would be random extraction inside a loop:
import random
import tensorflow as tf
tensor = tf.ones(shape=(28,28))
desired_shape = (3,3)
# random top-left corner, chosen so that the region fits inside the tensor
dim1 = random.randint(0, tensor.shape[0] - desired_shape[0])
dim2 = random.randint(0, tensor.shape[1] - desired_shape[1])
extracted_tensor = tensor[dim1:dim1+desired_shape[0], dim2:dim2+desired_shape[1]]
First import the random module and create a tensor (or use your own), then set desired_shape.
Then draw two random offsets, one for each dimension, and extract the region by slicing.
Keep in mind, however, that you cannot assign values to a tensor in TensorFlow, as this thread says.
To work around this, convert the tensor to a NumPy array, change the values, and convert it back to a tensor:
np_arr = tensor.numpy()
for i in range(5):
    dim1 = random.randint(0, tensor.shape[0] - desired_shape[0])
    dim2 = random.randint(0, tensor.shape[1] - desired_shape[1])
    # the 3-element row is broadcast across all rows of the region
    np_arr[dim1:dim1+desired_shape[0], dim2:dim2+desired_shape[1]] = [1,2,3] # any value
new_tens = tf.convert_to_tensor(np_arr)
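If you would rather stay inside TensorFlow, one possible alternative to the NumPy round trip is tf.tensor_scatter_nd_update, which returns a new tensor with the given indices overwritten. The sketch below is only an illustration under assumptions: the helper name fill_random_regions, its parameters, and the use of a NumPy random generator for the offsets are not part of the answer above.

```python
import numpy as np
import tensorflow as tf

def fill_random_regions(tensor, num_regions=5, region=(3, 3), value=0.0, seed=None):
    """Hypothetical helper: overwrite random rectangular regions of a 2-D tensor."""
    rng = np.random.default_rng(seed)
    height, width = tensor.shape
    rh, rw = region
    # Top-left corners, chosen so that every region fits inside the tensor.
    rows = rng.integers(0, height - rh + 1, size=num_regions)
    cols = rng.integers(0, width - rw + 1, size=num_regions)
    # Offsets of every cell inside one region, each of shape (rh, rw).
    dr, dc = np.meshgrid(np.arange(rh), np.arange(rw), indexing="ij")
    # Absolute (row, col) index of every cell in every region.
    idx = np.stack(
        [(rows[:, None, None] + dr).ravel(), (cols[:, None, None] + dc).ravel()],
        axis=1,
    )
    updates = tf.fill([idx.shape[0]], tf.cast(value, tensor.dtype))
    # Regions may overlap; that is harmless because every update writes the same value.
    return tf.tensor_scatter_nd_update(tensor, idx, updates)

m = tf.ones((28, 28))
modified = fill_random_regions(m, value=0.0, seed=0)
# Number of overwritten cells (at most 5 * 9, fewer if regions overlap).
print(int(tf.reduce_sum(tf.cast(modified == 0.0, tf.int32))))
```

Building all indices at once avoids the Python loop, which matters more as the number of regions grows.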

Related

Get embedding vectors from Embedding Column in Tensorflow

I want to get the numpy vectors created using the "Embedding Column" in Tensorflow.
For example, creating a sample DF:
import pandas as pd
import tensorflow as tf
from tensorflow import feature_column
sample_column1 = ["Apple","Apple","Mango","Apple","Banana","Mango","Mango","Banana","Banana"]
sample_column2 = [1,2,1,3,4,6,2,1,3]
ds = pd.DataFrame(sample_column1,columns=["A"])
ds["B"] = sample_column2
ds
Converting the pandas DF to Tensorflow object
# A utility method to create a tf.data dataset from a Pandas Dataframe
def df_to_dataset(dataframe, shuffle=True, batch_size=32):
    dataframe = dataframe.copy()
    labels = dataframe.pop('B')
    ds = tf.data.Dataset.from_tensor_slices((dict(dataframe), labels))
    #print (ds)
    if shuffle:
        ds = ds.shuffle(buffer_size=len(dataframe))
    #print (ds)
    ds = ds.batch(batch_size)
    return ds
Creating an embedding column:
tf_ds = df_to_dataset(ds)
# embedding cols
col_a = feature_column.categorical_column_with_vocabulary_list(
    'A', ['Apple', 'Mango', 'Banana'])
col_a_embedding = feature_column.embedding_column(col_a, dimension=8)
Is there any way to get the embeddings as numpy vectors from the 'col_a_embedding' object?
For example, the category "Apple" will be embedded into a vector of size 8:
[a1 a2 a3 a4 a5 a6 a7 a8]
Can we fetch that vector?
I don't see a way to get what you want using feature columns (I don't see a function named sequence_embedding_column or similar among the functions available in tf.feature_column), because the result of a feature column seems to be a fixed-length tensor. That is achieved by using a combiner (sum, mean, sqrtn, etc.) to aggregate the individual embedding vectors, so the sequence dimension of the categories is actually lost.
But it's totally doable if you use lower-level APIs.
First, you could construct a lookup table to convert categorical strings to ids.
features = tf.constant(["apple", "banana", "apple", "mango"])
table = tf.lookup.index_table_from_file(
    vocabulary_file="fruit.txt", num_oov_buckets=1)
ids = table.lookup(features)
#Content of "fruit.txt"
apple
mango
banana
unknown
Now you could initialize the embedding as a 2d variable. Its shape is [number of categories, embedding dimension].
num_categories = 5  # 4 lines in fruit.txt + 1 OOV bucket, so that every possible id has a row
embedding_dim = 64
category_emb = tf.get_variable(
    "embedding_table", [num_categories, embedding_dim],
    initializer=tf.truncated_normal_initializer(stddev=0.02))
You could then lookup category embedding like below:
ids_embeddings = tf.nn.embedding_lookup(category_emb, ids)
Note that ids_embeddings has shape [number of ids, embedding_dim], with one row per looked-up id. Feel free to reshape it to the shape you want.
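Conceptually, an embedding lookup is just row indexing into the [number of categories, embedding dimension] table. The NumPy sketch below illustrates that and the resulting shape. The random table and the hard-coded ids are made-up stand-ins for a trained embedding and for the output of the lookup table.

```python
import numpy as np

rng = np.random.default_rng(0)
num_categories, embedding_dim = 5, 8   # illustrative sizes only
embedding_table = rng.normal(scale=0.02, size=(num_categories, embedding_dim))

# Assumed ids for ["apple", "banana", "apple", "mango"] given the fruit.txt order.
ids = np.array([0, 2, 0, 1])
ids_embeddings = embedding_table[ids]  # one row per id

print(ids_embeddings.shape)            # (4, 8)
# Both "apple" entries map to the same vector.
print(np.array_equal(ids_embeddings[0], ids_embeddings[2]))  # True
```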
I suggest the easiest and fastest way is the following, which is what I am doing in my own app:
Use pandas read_csv to read your file into a string column of type "category", using the dtype parameter. Let's call it field "f". This is the original string column, not a numerical column yet.
Still in pandas, create a new column and copy the original column's cat.codes into it. Let's call it field "f_code". Pandas automatically encodes this into a compactly represented numerical column. It will have the numbers you need for passing to neural networks.
Now, in your Keras functional API model, pass f_code to the model's Input layer and on to an Embedding layer. The value in f_code will be a number now, like int8, so the Embedding layer will process it correctly. Don't pass the original column to the model.
Below are some sample code lines copied out of my project doing exactly the steps above.
all_col_types_readcsv = {'userid':'int32','itemid':'int32','rating':'float32','user_age':'int32','gender':'category','job':'category','zipcode':'category'}
<some code omitted>
d = pd.read_csv(fn, sep='|', header=0, dtype=all_col_types_readcsv, encoding='utf-8', usecols=usecols_readcsv)
<some code omitted>
from pandas.api.types import is_string_dtype
# Select the columns to add code columns to. Numeric cols work fine with Embedding layer so ignore them.
cat_cols = [cn for cn in d.select_dtypes('category')]
print(cat_cols)
str_cols = [cn for cn in d.columns if is_string_dtype(d[cn])]
print(str_cols)
add_code_columns = [cn for cn in d.columns if (cn in cat_cols) and (cn in str_cols)]
print(add_code_columns)
<some code omitted>
# Actually add _code column for the selected columns
for cn in add_code_columns:
    codecolname = cn + "_code"
    if codecolname not in d.columns:
        d[codecolname] = d[cn].cat.codes
You can see the numeric codes pandas made for you:
d.info()
d.head()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 99991 entries, 0 to 99990
Data columns (total 5 columns):
userid 99991 non-null int32
itemid 99991 non-null int32
rating 99991 non-null float32
job 99991 non-null category
job_code 99991 non-null int8
dtypes: category(1), float32(1), int32(2), int8(1)
memory usage: 1.3 MB
Finally, you can omit the job column and retain the job_code column, in this example, for passing into your keras neural network model. Here is some of my model code:
v = Lambda(lambda z: z[:, field_num0_X_cols[cn]], output_shape=(), name="Parser_" + cn)(input_x)
emb_input = Lambda(lambda z: tf.expand_dims(z, axis=-1), output_shape=(1,), name="Expander_" + cn)(v)
a = Embedding(input_dim=num_uniques[cn]+1, output_dim=emb_len[cn], input_length=1, embeddings_regularizer=reg, name="E_" + cn)(emb_input)
By the way, please also wrap np.array() around all pandas dataframes when passing them into model.fit(). It's not well documented, and apparently also not checked at runtime, that pandas dataframes cannot be safely passed in. Otherwise you get massive memory allocations, which can crash the host.
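The category-to-code steps described in this answer can be tried on a toy frame. This is a minimal sketch, not the project code: the job values are invented, and only the column naming (job, job_code) follows the sample output above.

```python
import numpy as np
import pandas as pd

# Toy data; the real project reads this column with read_csv and dtype='category'.
d = pd.DataFrame({"job": ["nurse", "artist", "nurse", "doctor"]}).astype({"job": "category"})

# pandas assigns integer codes in the order of the (sorted) categories.
d["job_code"] = d["job"].cat.codes

num_uniques = d["job"].cat.categories.size   # vocabulary size for the Embedding layer
x = np.array(d["job_code"])                  # plain NumPy array, safe to feed to a model

print(list(d["job"].cat.categories))         # ['artist', 'doctor', 'nurse']
print(x.tolist())                            # [2, 0, 2, 1]
```

Missing values get the code -1, so they would need separate handling before being fed to an Embedding layer.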

How to create Keras ZeroTensor of specific shape

I am a total beginner with tensorflow.keras and I am wondering how I could create a constant zero tensor of a specific shape.
For example with this:
zeros = tf.keras.backend.zeros((someTensor.shape[0], someTensor.shape[1], someTensor.shape[2], channels))
concat = tf.keras.backend.concatenate([someTensor, zeros], axis=3)
The operation tf.keras.backend.zeros fails with:
ValueError: Cannot convert a partially known TensorShape to a Tensor
I guess that's because the batch size is unknown during graph building. How can I create a zero tensor, or any other constant tensor, when I don't know the batch size at that moment? Or is there some kind of "unknown" value that I can specify?
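The problem is that you are passing a tuple that mixes static dimensions (some of them unknown, such as the batch size) with integers.
Instead, build the shape as a tensor:
shape = K.shape(someTensor)
ch = K.constant([channels], dtype='int32') # int32 to match the dtype of K.shape; I think K.variable with the same dtype also works.
newShape = K.concatenate([shape[:3], ch])
zeros = K.zeros(newShape)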
Now, if this doesn't work because of unknown shapes, a dirty workaround would be:
#if someTensor is 3D
zeros = K.zeros_like(someTensor)
zeros = K.stack([zeros] * channels, axis=-1)
#if someTensor is 4D
zeros = K.zeros_like(someTensor[:,:,:,0])
zeros = K.stack([zeros]*channels, axis=-1)
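For comparison, here is a small sketch of the same idea written with tf.zeros and tf.shape directly instead of the Keras backend functions. The function name append_zero_channels, the constant CHANNELS and the input signature are assumptions chosen for the illustration. The signature leaves the batch dimension unknown on purpose.

```python
import tensorflow as tf

CHANNELS = 2  # hypothetical number of zero channels to append

@tf.function(input_signature=[tf.TensorSpec([None, 4, 4, 3], tf.float32)])
def append_zero_channels(some_tensor):
    # tf.shape is evaluated at run time, so the unknown batch size is not a problem.
    dynamic_shape = tf.shape(some_tensor)
    zeros_shape = tf.concat([dynamic_shape[:3], [CHANNELS]], axis=0)
    zeros = tf.zeros(zeros_shape, dtype=some_tensor.dtype)
    return tf.concat([some_tensor, zeros], axis=3)

out = append_zero_channels(tf.ones([5, 4, 4, 3]))
print(out.shape)  # (5, 4, 4, 5)
```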

Numpy: stack arrays whose internal dimensions differ

I have a situation similar to the following:
import numpy as np
a = np.random.rand(55, 1, 3)
b = np.random.rand(55, 626, 3)
Here the shapes represent the number of observations, then the number of time slices per observation, then the number of dimensions of the observation at the given time slice. So a is a full representation of 3 dimensions for each of the 55 observations at one new time interval.
I'd like to stack a and b into an array with shape (55, 627, 3). How can one accomplish this in numpy? Any suggestions would be greatly appreciated!
To follow up on Divakar's answer above, the axis argument in numpy is the index of a given dimension within an array's shape. Here I want to stack a and b by virtue of their middle shape value, which is at index = 1:
import numpy as np
a = np.random.rand(5, 1, 3)
b = np.random.rand(5, 100, 3)
# concatenate along the time axis; with these smaller arrays the result has shape (5, 101, 3)
stacked = np.concatenate((b, a), axis=1)
# validate that a was appended to the end of b
print(stacked[:, -1, :], '\n\n\n', a.squeeze())
This returns:
[[0.72598529 0.99395887 0.21811998]
[0.9833895 0.465955 0.29518207]
[0.38914048 0.61633291 0.0132326 ]
[0.05986115 0.81354865 0.43589306]
[0.17706517 0.94801426 0.4567973 ]]
[[0.72598529 0.99395887 0.21811998]
[0.9833895 0.465955 0.29518207]
[0.38914048 0.61633291 0.0132326 ]
[0.05986115 0.81354865 0.43589306]
[0.17706517 0.94801426 0.4567973 ]]
A purist might instead use np.all(stacked[:, -1, :] == a.squeeze()) to validate this equivalence. All glory to @Divakar!
Strictly for the curious, the use case for this concatenation is a kind of wonky data preparation pipeline for a Long Short Term Memory Neural Network. In that kind of network, the training data shape should be number_of_observations, number_of_time_intervals, number_of_dimensions_per_observation. I am generating new predictions of each object at a new time interval, so those predictions have shape number_of_observations, 1, number_of_dimensions_per_observation. To visualize the sequence of observations' positions over time, I want to add the new positions to the array of previous positions, hence the question above.
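The use case described above, appending each newly predicted time step to the history of positions, could look roughly like the sketch below. The random arrays are stand-ins for real model predictions, and the variable names history and new_step are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
history = rng.random((5, 100, 3))   # observations, time steps, dimensions

for _ in range(3):
    # Stand-in for a prediction of shape (observations, 1, dimensions).
    new_step = rng.random((5, 1, 3))
    # Append along the time axis (axis=1).
    history = np.concatenate((history, new_step), axis=1)

print(history.shape)                # (5, 103, 3)
# The last time step equals the most recent prediction.
print(np.array_equal(history[:, -1, :], new_step.squeeze(axis=1)))  # True
```

Each np.concatenate call copies the whole array, so for many steps it may be cheaper to collect the predictions in a list and concatenate once at the end.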

gather values from 2dim tensor in tensorflow

Hi, TensorFlow beginner here. I'm trying to get the values of certain elements in a 2-dim tensor, in my case class scores from a probability matrix.
The probability matrix is (1000,81) with batchsize 1000 and number of classes 81. ClassIDs is (1000,) and contains the index for the highest class score for each sample. How do I get the corresponding class score from the probability matrix using tf.gather?
class_ids = tf.cast(tf.argmax(probs, axis=1), tf.int32)
class_scores = tf.gather_nd(probs,class_ids)
class_scores should be a tensor of shape (1000,) containing the highest class_score for each sample.
Right now I'm using a workaround that looks like this:
class_score_count = []
for i in range(probs.shape[0]):
    prob = probs[i,:]
    class_score = prob[class_ids[i]]
    class_score_count.append(class_score)
class_scores = tf.stack(class_score_count, axis=0)
Thanks for the help!
You can do it with tf.gather_nd like this:
class_ids = tf.cast(tf.argmax(probs, axis=1), tf.int32)
# If shape is not dynamic you can use probs.shape[0].value instead of tf.shape(probs)[0]
row_ids = tf.range(tf.shape(probs)[0], dtype=tf.int32)
idx = tf.stack([row_ids, class_ids], axis=1)
class_scores = tf.gather_nd(probs, idx)
You could also just use tf.reduce_max. It computes the maximum a second time, but it may not be much slower if your data is not too big:
class_scores = tf.reduce_max(probs, axis=1)
You need to run the tensor class_ids to get its values.
The values will be a NumPy array,
which you can access normally, for example in a loop.
You have to do something like this:
predictions = sess.run(tf.argmax(probs, 1), feed_dict={x: X_data})
The predictions variable has all the information you need.
TensorFlow only returns the values of tensors that you run explicitly.
I think this is what the batch_dims argument for tf.gather is for.
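A minimal sketch of that suggestion, assuming a TensorFlow version in which tf.gather accepts batch_dims. The 3x3 probs matrix is a made-up stand-in for the (1000, 81) one in the question.

```python
import tensorflow as tf

probs = tf.constant([[0.1, 0.7, 0.2],
                     [0.5, 0.3, 0.2],
                     [0.2, 0.2, 0.6]])
class_ids = tf.argmax(probs, axis=1)

# batch_dims=1 pairs row i of probs with entry i of class_ids,
# so the result has one score per row.
class_scores = tf.gather(probs, class_ids, batch_dims=1)

print(class_scores.numpy())  # highest score of each row
```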

Tensorflow : Choosing a range of columns in each row from a Tensor

I would like to choose only particular columns in each row of a tensor, using it for an RNN
seq_len=[11,12,20,30] #This is the sequence length, assume 4 sequences
array=tf.ones([4,30]) #Assuming this is the array I want to index from
function(array,seq_len) #apply required function
Output=(first 11 elements from row 0, first 12 from row 1, first 20 from row 2 etc), perhaps obtained as a flat tensor
You can use tf.sequence_mask and tf.boolean_mask to get them flattened:
mask = tf.sequence_mask(seq_len, MAX_LENGTH) # Replace MAX_LENGTH with the size of array on the right dimension, 30 in your case
output= tf.boolean_mask(array, mask=mask)
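To see what the mask-based approach returns, here is a small eager-mode sketch with distinguishable values. The 3x4 array and the sequence lengths are invented for the illustration and are much smaller than in the question.

```python
import tensorflow as tf

seq_len = [1, 3, 2]
array = tf.reshape(tf.range(12), [3, 4])   # rows: [0..3], [4..7], [8..11]

# True for the first seq_len[i] positions of row i, False elsewhere.
mask = tf.sequence_mask(seq_len, maxlen=array.shape[1])
# boolean_mask keeps the selected elements in row-major order, as a flat tensor.
output = tf.boolean_mask(array, mask)

print(output.numpy().tolist())             # [0, 4, 5, 6, 8, 9]
```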
A tensor in TensorFlow can be sliced just like a NumPy array, and the slices can then be concatenated into one tensor, assuming you measure the sequence length from the first element.
Use [row_idx, column_idx] to slice the tensor. slice = array[0,:] would assign the first row to slice.
flat_slices = tf.concat([slice, slice], axis=0) will flatten them into one tensor.
import tensorflow as tf
seq_len = [11,12,20,30]
array = tf.ones([4,30])
init = tf.global_variables_initializer()
with tf.Session() as sess:
    init.run()
    flatten = array[0,:seq_len[0]]
    for i in range(1,len(seq_len)):
        row = array[i,:seq_len[i]]
        # tf.concat requires the axis to concatenate along
        flatten = tf.concat([flatten, row], axis=0)
    print(sess.run(flatten))