vectorize pytorch tensor indexing - indexing

I have a batch of images img_batch, size [8,3,32,32], and I want to manipulate each image by setting randomly selected pixels to zero. I can do this using a for loop over each image but I'm not sure how to vectorize it so I'm not processing only one image at a time. This is my code using loops.
import random
import torch

batch_size = 8
prct0 = 0.1
noise = torch.tensor([9, 14, 5, 7, 6, 14, 1, 3])
comb_img = []
for ind in range(batch_size):
    img = img_batch[ind]
    c, h, w = img.shape
    # fraction of pixels to zero in this image
    prct = 1 - (1 - prct0)**noise[ind].item()
    # pick that many distinct pixel positions
    idx = random.sample(range(h*w), int(prct*h*w))
    img_noised = img.clone()
    # zero the chosen positions in every channel
    img_noised.view(c,1,-1)[:,0,idx] = 0
    comb_img.append(img_noised)
comb_img = torch.stack(comb_img) # output is comb_img [8,3,32,32]
I'm new to pytorch and if you see any other improvements, please share.

First, do you need the per-image noise value? It is a lot easier if you treat all images the same and the number of pixels to set to 0 does not differ per image.
If you do need it, you can do it this way, but it still needs a small for loop (in the list comprehension).
#don't want RGB masking, want the whole pixel
rng = torch.rand(*img_batch[:,0:1].shape)
#create binary mask
mask = torch.stack([rng[i] <= 1-(1-prct0)**noise[i] for i in range(batch_size)])
img_batch_masked = img_batch.clone()
#broadcast mask to 3 RGB channels
img_batch_masked[mask.tile([1,3,1,1])] = 0
You can check that the mask is set correctly by summing mask across the last 3 dims, and seeing if it matches your target percentage:
In [5]: print(mask.sum([1,2,3])/(mask.shape[2] * mask.shape[3]))
tensor([0.6058, 0.7716, 0.4195, 0.5162, 0.4739, 0.7702, 0.1012, 0.2684])
In [6]: print(1-(1-prct0)**noise)
tensor([0.6126, 0.7712, 0.4095, 0.5217, 0.4686, 0.7712, 0.1000, 0.2710])
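The remaining list comprehension can probably be avoided by reshaping the per-image threshold so that it broadcasts against the random tensor. A minimal sketch, assuming the same prct0 and noise values as in the question; img_batch here is a random stand-in for the real batch, and prob is a name introduced only for this example:

```python
import torch

torch.manual_seed(0)  # only to make the sketch reproducible

img_batch = torch.rand(8, 3, 32, 32)  # stand-in for the real batch
prct0 = 0.1
noise = torch.tensor([9, 14, 5, 7, 6, 14, 1, 3])

# per-image probability of zeroing a pixel, shaped [8, 1, 1, 1] so that it
# broadcasts against a [8, 1, 32, 32] tensor
prob = (1 - (1 - prct0) ** noise.float()).view(-1, 1, 1, 1)

# one random draw per pixel (not per channel)
rng = torch.rand(img_batch.shape[0], 1, *img_batch.shape[2:])
mask = rng <= prob  # [8, 1, 32, 32], True where the pixel is zeroed

# masked_fill broadcasts the single-channel mask over the 3 color channels
img_batch_masked = img_batch.masked_fill(mask, 0.0)
```

As with the loop version, each pixel is drawn independently, so the zeroed fraction only matches the target on average.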

You can easily do this without a loop, in a fully vectorized manner:
1. Create a random noise tensor.
2. Select a threshold (prct0) and round each noise value to 0 or 1 depending on whether it is below or above that threshold.
3. Element-wise multiply the image tensor by the rounded noise tensor.
I think calling the vector of power multipliers noise is a bit confusing, so I've renamed that vector power_vec in this example:
power_vec = noise.view(-1,1,1,1) # shape [8,1,1,1], one exponent per image
# create random noise - note one channel rather than 3 color channels
rand_noise = torch.rand(8,1,32,32)
noise = torch.pow(rand_noise,power_vec) # [8,1,32,32] and [8,1,1,1] are broadcastable
# "round" noise based on threshold
# note: a pixel is kept with probability 1 - prct0**(1/n), which equals the
# question's (1-prct0)**n only for n = 1
z = torch.zeros(noise.shape)
o = torch.ones(noise.shape)
noise_rounded = torch.where(noise>prct0,o,z)
# apply noise mask to each color channel
output = img_batch * noise_rounded.expand(8,3,32,32)
For simplicity this solution uses your original batch size and image size but could be trivially extended to work on inputs of any image and batch size.
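If the exact pixel count per image matters (the loop in the question zeroes exactly int(prct*h*w) pixels), one possible loop-free variant ranks random values instead of thresholding them. This is a different technique from the one in the answer above, sketched under the assumption that the question's prct0 and noise are used; img_batch is a random stand-in, and k and perm are names introduced for this example:

```python
import torch

torch.manual_seed(0)  # only to make the sketch reproducible

img_batch = torch.rand(8, 3, 32, 32)  # stand-in for the real batch
prct0 = 0.1
noise = torch.tensor([9, 14, 5, 7, 6, 14, 1, 3])

b, c, h, w = img_batch.shape
n_pix = h * w

# number of pixels to zero in each image, truncated like int(prct*h*w)
prct = 1 - (1 - prct0) ** noise.double()
k = (prct * n_pix).long()  # shape [b]

# argsort of random values gives an independent random permutation of
# 0..n_pix-1 in every row, so exactly k[i] entries of row i are below k[i]
perm = torch.rand(b, n_pix).argsort(dim=1)
mask = (perm < k.view(-1, 1)).view(b, 1, h, w)

img_batch_masked = img_batch.masked_fill(mask, 0.0)
```

Nothing here hard-codes the batch or image size, so the same lines should work for other shapes.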

How to correctly ignore padded or missing timesteps at decoding time in multi-feature sequences with LSTM autoencoder

I am trying to learn a latent representation for text sequences (3 features per timestep) by reconstructing them with an autoencoder. Some of the sequences are shorter than the maximum padded length, i.e. the number of timesteps I am considering (seq_length=15), and I am not sure whether the reconstruction will learn to ignore the padded timesteps when calculating the loss and accuracies.
I followed the suggestions from this answer to crop the outputs, but my losses are nan, and so are several of the accuracies.
input1 = keras.Input(shape=(seq_length,),name='input_1')
input2 = keras.Input(shape=(seq_length,),name='input_2')
input3 = keras.Input(shape=(seq_length,),name='input_3')
input1_emb = layers.Embedding(70,32,input_length=seq_length,mask_zero=True)(input1)
input2_emb = layers.Embedding(462,192,input_length=seq_length,mask_zero=True)(input2)
input3_emb = layers.Embedding(84,36,input_length=seq_length,mask_zero=True)(input3)
merged = layers.Concatenate()([input1_emb, input2_emb,input3_emb])
activ_func = 'tanh'
encoded = layers.LSTM(120,activation=activ_func,input_shape=(seq_length,),return_sequences=True)(merged) #
encoded = layers.LSTM(60,activation=activ_func,return_sequences=True)(encoded)
encoded = layers.LSTM(15,activation=activ_func)(encoded)
# Decoder reconstruct inputs
decoded1 = layers.RepeatVector(seq_length)(encoded)
decoded1 = layers.LSTM(60, activation= activ_func , return_sequences=True)(decoded1)
decoded1 = layers.LSTM(120, activation= activ_func , return_sequences=True,name='decoder1_last')(decoded1)
Decoder one has an output shape of (None, 15, 120).
input_copy_1 = layers.TimeDistributed(layers.Dense(70, activation='softmax'))(decoded1)
input_copy_2 = layers.TimeDistributed(layers.Dense(462, activation='softmax'))(decoded1)
input_copy_3 = layers.TimeDistributed(layers.Dense(84, activation='softmax'))(decoded1)
For each output, I am trying to crop the 0-padded timesteps as suggested by this answer. padding is 0 where the actual input was missing (it was zero due to padding) and 1 otherwise.
#tf.function
def cropOutputs(x):
    #x[0] is softmax of respective feature (time distributed) on top of decoder
    #x[1] is the actual input feature
    padding = tf.cast( tf.not_equal(x[1][1],0), dtype=tf.keras.backend.floatx())
    print(padding)
    return x[0]*tf.tile(tf.expand_dims(padding, axis=-1),tf.constant([1,x[0].shape[2]], tf.int32))
Applying crop function to all three outputs.
input_copy_1 = layers.Lambda(cropOutputs, name='input_copy_1', output_shape=(None, 15, 70))([input_copy_1,input1])
input_copy_2 = layers.Lambda(cropOutputs, name='input_copy_2', output_shape=(None, 15, 462))([input_copy_2,input2])
input_copy_3 = layers.Lambda(cropOutputs, name='input_copy_3', output_shape=(None, 15, 84))([input_copy_3,input3])
My logic is to crop the padded timesteps of each feature (all 3 features of a sequence have the same length, meaning they miss the same timesteps). Each timestep has a softmax applied over its feature size (70, 462, 84), so to zero out a timestep I build a multi-dimensional mask of zeros and ones from the padding, tiled to that feature size, and multiply it with the respective softmax output.
I am not sure whether I am doing this right, because I get nan losses for these outputs, and so do the accuracies of the other tasks I am learning jointly with this one (it happens only with this cropping).
If it helps someone, I ended up cropping the padded entries from the loss directly (taking some Keras code pointers from these answers).
#tf.function
def masked_cc_loss(y_true, y_pred):
    # masked_val_hotencoded: one-hot vector of the padding value, defined elsewhere
    mask = tf.keras.backend.all(tf.equal(y_true, masked_val_hotencoded), axis=-1)
    mask = 1 - tf.cast(mask, tf.keras.backend.floatx())
    # the functional form returns one loss value per timestep, so the mask applies element-wise
    loss = tf.keras.losses.categorical_crossentropy(y_true, y_pred) * mask
    return tf.keras.backend.sum(loss) / tf.keras.backend.sum(mask) # averaging by the number of unmasked entries
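To check that a masked loss like this really ignores padded timesteps, it can help to run it on a tiny hand-made batch. The following is only a sketch under stated assumptions: NUM_CLASSES, PAD_ID and pad_one_hot are hypothetical stand-ins (the original masked_val_hotencoded is not shown), and class 0 is assumed to be the padding class.

```python
import tensorflow as tf

NUM_CLASSES = 4
PAD_ID = 0  # assumption: class 0 is reserved for padding
# hypothetical stand-in for masked_val_hotencoded
pad_one_hot = tf.one_hot(PAD_ID, NUM_CLASSES)

def masked_cc_loss(y_true, y_pred):
    # True where the target at a timestep is the padding vector
    is_pad = tf.reduce_all(tf.equal(y_true, pad_one_hot), axis=-1)
    mask = 1.0 - tf.cast(is_pad, y_pred.dtype)  # [batch, time]
    # per-timestep cross-entropy, shape [batch, time]
    per_step = tf.keras.losses.categorical_crossentropy(y_true, y_pred)
    # average over the real (unpadded) timesteps only
    return tf.reduce_sum(per_step * mask) / tf.reduce_sum(mask)

# one sequence of 3 timesteps; the last one is padding
y_true = tf.one_hot([[2, 1, 0]], NUM_CLASSES)
y_pred = tf.constant([[[0.10, 0.10, 0.70, 0.10],
                       [0.10, 0.60, 0.20, 0.10],
                       [0.25, 0.25, 0.25, 0.25]]])
loss = masked_cc_loss(y_true, y_pred)
```

Here the loss should come out as the mean of -log(0.7) and -log(0.6), with the padded third timestep contributing nothing. A batch that contains only padding would divide by zero, so a real implementation may want to guard the denominator.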

Using One Hot Encodings

Problem definition:
Implement the function below to take one label and the total number of classes 𝐶 , and return the one hot encoding in a column wise matrix. Use tf.one_hot() to do this, and tf.reshape() to reshape your one hot tensor!
tf.reshape(tensor, shape)
def one_hot_matrix(label, depth=6):
    """
    Computes the one hot encoding for a single label

    Arguments:
        label -- (int) Categorical labels
        depth -- (int) Number of different classes that label can take

    Returns:
        one_hot -- tf.Tensor A single-column matrix with the one hot encoding.
    """
    # (approx. 1 line)
    # one_hot = ...
    # YOUR CODE STARTS HERE
    # YOUR CODE ENDS HERE
    return one_hot
If you take the "# (approx. 1 line)" hint seriously:
one_hot = tf.reshape(tf.one_hot(label,depth,axis = 0), [depth, ])
one_hot = tf.one_hot(label, depth, axis = 0)
one_hot = tf.reshape(one_hot, (-1,1))
one_hot = tf.reshape(tf.one_hot(label,depth,axis=0), (depth,))
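The answers above differ mainly in the output shape: (depth,) versus (depth, 1). A small runnable sketch of the single-column variant, assuming a scalar integer label; col is a name introduced only for this example, and which shape the exercise grader expects is not stated here:

```python
import tensorflow as tf

def one_hot_matrix(label, depth=6):
    # tf.one_hot on a scalar label gives a vector of length depth
    one_hot = tf.one_hot(label, depth, axis=0)
    # reshape to a single column, shape (depth, 1)
    return tf.reshape(one_hot, (depth, 1))

col = one_hot_matrix(2, depth=4)
```

Reshaping to (depth,) instead gives the flat-vector form used in the one-line answers.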

How does a 1D multi-channel convolutional layer (Keras) train?

I am working with time series EEG data recorded from 10 individual locations on the body to classify future behavior in terms of increasing heart activity. I would like to better understand how my labeled data corresponds to the training inputs.
So far, several RNN configurations as well as countless combinations of vanilla dense networks have not gotten me great results, so I figure a 1D convnet is worth a try.
The things I'm having trouble understanding are:
1.) Feeding data into the model.
orig shape = (30000 timesteps, 10 channels)
array fed to layer = (300 slices, 100 timesteps, 10 channels)
Are the slices offset from each other by 1 timestep (overlapping windows), or are they laid end to end? If the second is true, how could I create an array of (30000 - 100) slices that are separated by one timestep and are still compatible with the 1D CNN layer?
2.) Matching labels with the training and testing data
My understanding is that when you feed in a sequence of train_x_shape = (30000, 10), there are 30000 labels with train_y_shape = (30000, 2) (2 classes) associated with the train_x data.
So, when (300 slices of) 100 timesteps of train_x data with shape = (300, 100, 10) are fed into the model, does the label correspond to the entire 100 timesteps (one label per 100 timesteps, equal to the last timestep's label), or is each of the 100 rows/vectors in the slice labeled, one label per timestep?
Train input:
train_x = train_x.reshape(train_x.shape[0], 1, train_x.shape[1])
n_timesteps = 100
n_channels = 10
layer : model.add(Convolution1D(filters = n_channels * 2, padding = 'same', kernel_size = 3, input_shape = (n_timesteps, n_channels)))
final layer : model.add(Dense(2, activation = 'softmax'))
I use categorical_crossentropy for loss.
Answer 1
This really depends on how you got those slices.
The answer depends entirely on what you are doing. So, what do you want?
If you have simply reshaped (array.reshape(...)) the original array from shape (30000,10) to shape (300,100,10), the model will see:
300 individual (and not connected) sequences
100 timesteps in each sequence
Sequence 1 goes from step 0 to 99;
Sequence 2 goes from step 100 to 199 and so on.
Creating overlapping slices - Sliding window
If you want to create sequences shifted by only one timestep, make a loop for that.
import numpy as np
originalSequence = someArrayWithShape((30000,10))
newSlices = [] #empty list
start = 0
end = start + 100 #window length: 100 timesteps per slice
while end <= 30000:
    newSlices.append(originalSequence[start:end])
    start+=1
    end+=1
newSlices = np.asarray(newSlices) #shape (29901, 100, 10)
Beware: if you do this in the input data, you will have to do a similar thing in your output data as well.
Answer 2
Again, that's totally up to you. What do you want to achieve?
Convolutional layers will keep the timesteps with these options:
If you use padding='same', the final length will be the same as the input
If you don't, the final length will be reduced depending on the kernel size you choose
Recurrent layers will keep the timesteps or not depending on:
Whether you use return_sequences=True - Output has timesteps
Or you use return_sequences=False - Output has no timesteps
If you want only one output for each sequence (not per timestep):
Recurrent models:
Use LSTM(...., return_sequences=True) until the last LSTM
The last LSTM will be LSTM(..., return_sequences=False)
Convolutional models:
At some point after the convolutions, choose one of these to add:
GlobalMaxPooling1D
GlobalAveragePooling1D
Flatten (but treat the number of channels later with a Dense(2))
Reshape((2,))
I think I'd go with GlobalMaxPooling1D if using convolutions, but recurrent models seem better for this. (Not a rule, though.)
You can choose to use intermediate MaxPooling1D layers to gradually reduce the length from 100 to 50, then to 25 and so on. This will probably reach a better output.
Remember to keep X and Y paired:
import numpy as np
train_x = someArrayWithShape((30000,10))
train_y = someArrayWithShape((30000,2))
newXSlices = [] #empty list
newYSlices = [] #empty list
start = 0
end = start + 100 #window length: 100 timesteps per slice
while end <= 30000:
    newXSlices.append(train_x[start:end])
    newYSlices.append(train_y[end-1]) #label of the last timestep in the window
    start+=1
    end+=1
newXSlices = np.asarray(newXSlices) #shape (29901, 100, 10)
newYSlices = np.asarray(newYSlices) #shape (29901, 2)
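The Python loop can get slow for 30000 timesteps. A possible vectorized alternative, assuming NumPy 1.20 or newer (for sliding_window_view) and assuming each window is labeled by its last timestep; make_windows is a hypothetical helper name, and the small arrays stand in for the real data:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def make_windows(x, y, window):
    """Overlapping windows shifted by one timestep, with one label each.

    x: (timesteps, channels), y: (timesteps, classes).
    Returns X of shape (n, window, channels) and Y of shape (n, classes),
    where n = timesteps - window + 1 and Y[i] is the label of the last
    timestep of X[i].
    """
    # sliding_window_view puts the window axis last: (n, channels, window)
    views = sliding_window_view(x, window_shape=window, axis=0)
    X = views.transpose(0, 2, 1)  # -> (n, window, channels), still a view
    Y = y[window - 1:]            # label at the end of each window
    return X, Y

# small stand-in data; the question's arrays would be (30000, 10) and (30000, 2)
x = np.arange(20 * 3, dtype=np.float32).reshape(20, 3)
y = np.eye(2, dtype=np.float32)[np.arange(20) % 2]
X, Y = make_windows(x, y, window=5)
```

X is a read-only view into x, so it uses no extra memory; copy it (np.ascontiguousarray(X)) if the training code needs a writable or contiguous array.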

how to resolve tensorflow gradient issue with large weight matrix dimension?

I am trying to compute gradients with tf.gradients by passing in the loss, and then update the weights using those gradients. I observe that the gradients are zero when the weight matrix dimension is large. What could be the reason for this?
There is no error in the code, and I get non-zero gradients with smaller weight matrix and input vector sizes. Typically, for an input vector of size 1 x 100 and a weight of size 100 x 50, the FP value will be 1 x 50. During BP, the input will be 1 x 50 and the weight will be 50 x 100, so the BP vector will be 1 x 100.
The loss is the sum of differences between the forward propagation (FP) and backward propagation (BP) values. The FP values are computed from the input vector (IP) as sigmoid(IP x weight), and the BP values as sigmoid(FP x weight.Transpose).
var_grad = tf.gradients(loss, [weight_matrix])[0]
update = tf.subtract(weight_matrix,(tf.multiply(var_grad,0.1)))
I am calling the TF graph as below
result,cost,gradient = sess.run([update,loss,var_grad], feed_dict={weight_matrix: weight_mat})
Forward propagation values
w = tf.matmul(final,weight)
sig = tf.sigmoid(w)
loss is computed as
for i in range(len(PopulatedList)):
    branch = PopulatedList[i]
    RC_FP = branch['RC_FP']
    RC_BP = branch['RC_BP']
    LC_FP = branch['RC_FP']
    LC_BP = branch['LC_BP']
    loss = tf.reduce_sum(tf.squared_difference(RC_FP_TF,RC_BP_TF),[0, 1]) + tf.reduce_sum(tf.squared_difference(LC_FP_TF,LC_BP_TF),[0, 1])
    out_error.append(loss)
error = tf.reduce_sum(out_error)
return error
There is nothing seemingly wrong in your code, so what you're seeing is probably a consequence of how you initialize your weights.
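One common way initialization can produce near-zero gradients with large matrices is sigmoid saturation: with unscaled random weights, the pre-activations grow with the input size and the sigmoid's derivative collapses toward zero. Whether that is what happens in the question is an assumption; the NumPy sketch below only illustrates the effect, and mean_sigmoid_grad and the chosen sizes are hypothetical.

```python
import numpy as np

def mean_sigmoid_grad(n_in, n_out, scale, seed=0):
    """Mean of sigmoid'(z) for z = x @ W with W ~ N(0, scale^2)."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 1.0, size=(1, n_in))
    W = rng.normal(0.0, scale, size=(n_in, n_out))
    z = x @ W
    # numerically stable form: sigmoid'(z) = e^{-|z|} / (1 + e^{-|z|})^2
    e = np.exp(-np.abs(z))
    return float(np.mean(e / (1.0 + e) ** 2))

# small input size, unit-variance weights
small = mean_sigmoid_grad(10, 500, scale=1.0)
# large input size, same weights: pre-activations are large, sigmoid saturates
large = mean_sigmoid_grad(10_000, 500, scale=1.0)
# large input size, weights scaled by 1/sqrt(n_in)
large_scaled = mean_sigmoid_grad(10_000, 500, scale=1.0 / np.sqrt(10_000))
```

In this sketch the average derivative for the large unscaled case is far smaller than for the other two, which suggests trying a weight scale that shrinks with the input dimension.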

Create color histogram of an image using tensorflow

Is there a neat way to compute a color histogram of an image? Maybe by abusing the internal code of tf.histogram_summary? From what I've seen, this code is not very modular and calls directly some C++ code.
Thanks in advance.
I would use tf.unsorted_segment_sum, where the "segment IDs" are computed from the color values and the thing you sum is a tf.ones vector. Note that tf.unsorted_segment_sum is probably better thought of as "bucket sum". It implements dest[segment] += thing_to_sum -- exactly the operation you need for a histogram.
In slightly pseudocode (meaning I haven't run this):
binned_values = tf.reshape(tf.floor(img_r * (NUM_BINS-1)), [-1])
binned_values = tf.cast(binned_values, tf.int32)
ones = tf.ones_like(binned_values, dtype=tf.int32)
counts = tf.unsorted_segment_sum(ones, binned_values, NUM_BINS)
You could accomplish this in one pass instead of separating out the r, g, and b values with a split if you wanted to cleverly construct your "ones" to look like "100100..." for red, "010010" for green, etc., but I suspect it would be slower overall, and harder to read. I'd just do the split that you proposed above.
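The "bucket sum" idea can be checked on a tiny array before wiring it into a graph. This is a NumPy analogue of the pseudocode above, not TensorFlow code; the 2x3 img_r is made-up data standing in for one color channel with values in [0, 1]:

```python
import numpy as np

NUM_BINS = 4
# stand-in for one color channel with values in [0, 1]
img_r = np.array([[0.0, 0.1, 0.5],
                  [0.9, 1.0, 0.5]])

# same binning as above: floor(value * (NUM_BINS - 1)), flattened
binned = np.floor(img_r * (NUM_BINS - 1)).astype(np.int64).ravel()

# bucket sum: counts[b] += 1 for every pixel that falls in bin b
counts = np.zeros(NUM_BINS, dtype=np.int64)
np.add.at(counts, binned, 1)
```

For this input counts comes out as [2, 2, 1, 1]. Note that with this binning only the value 1.0 lands in the last bin; if equal-width bins are wanted, floor(value * NUM_BINS) clipped to NUM_BINS - 1 is one alternative.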
This is what I'm using right now:
# Assumption: img is a tensor of the size [img_width, img_height, 3], normalized to the range [-1, 1].
with tf.variable_scope('color_hist_producer') as scope:
    bin_size = 0.2
    hist_entries = []
    # Split image into single channels (TF >= 1.0 argument order)
    img_r, img_g, img_b = tf.split(img, 3, axis=2)
    for img_chan in [img_r, img_g, img_b]:
        for idx, i in enumerate(np.arange(-1, 1, bin_size)):
            gt = tf.greater(img_chan, i)
            leq = tf.less_equal(img_chan, i + bin_size)
            # Put together with logical_and, cast to float and sum up entries -> gives count for current bin.
            hist_entries.append(tf.reduce_sum(tf.cast(tf.logical_and(gt, leq), tf.float32)))
    # Stack scalars together to a tensor, then normalize histogram.
    hist = tf.nn.l2_normalize(tf.stack(hist_entries), 0)
tf.histogram_fixed_width might be what you are looking for.
Full documentation: https://www.tensorflow.org/api_docs/python/tf/histogram_fixed_width
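A minimal sketch of how tf.histogram_fixed_width could be applied per color channel, assuming an image of shape [height, width, 3] normalized to [-1, 1] as in the earlier answer. The 2x2 image and NUM_BINS are made up for illustration:

```python
import tensorflow as tf

NUM_BINS = 4
# stand-in image of shape [height, width, 3] with values in [-1, 1]
img = tf.constant([[[-1.0, 0.0, 1.0], [-0.6, 0.0, 1.0]],
                   [[0.2, 0.0, 1.0], [0.9, 0.0, -1.0]]])

# one histogram per channel over the fixed range [-1, 1]
hists = [
    tf.histogram_fixed_width(img[..., c], [-1.0, 1.0], nbins=NUM_BINS)
    for c in range(3)
]
hist = tf.stack(hists)  # shape [3, NUM_BINS], dtype int32
```

Each row of hist holds the bin counts of one channel. The counts are integers, so cast to float before normalizing (for example with tf.nn.l2_normalize).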