How to write expensive summaries less often in TensorFlow

I have a TensorFlow model with several different summaries. Some, such as loss and accuracy, are inexpensive and I want to write them often. Others, like accuracy on the test set, are more expensive to calculate and I want to write them, say, 100 times less often than the normal summaries. What is the best way to implement this in TensorFlow?

Instead of merging all summaries with merge_all(), create a few different groups of summaries with merge() and write them at different frequencies. Something like this:
s1 = tf.summary.image(...)
s2 = tf.summary.scalar(...)
s3 = tf.summary.histogram(...)
s4 = tf.summary.audio(...)
summary_expensive = tf.summary.merge([s1, s4])
summary_cheap = tf.summary.merge([s2, s3])

# open a session `sess`
# init variables
# create a writer `writer`
for i in xrange(many_steps):
    summary1 = sess.run(summary_cheap)
    writer.add_summary(summary1, i)
    if i % 100 == 0:
        summary2 = sess.run(summary_expensive)
        writer.add_summary(summary2, i)

Related

PyTorch alternative for tf.data.experimental.sample_from_datasets

Suppose I have two datasets, dataset one with 100 items and dataset two with 5000 items.
Now I want my model to see as many items from dataset one as from dataset two during training.
In Tensorflow I can do:
dataset = tf.data.experimental.sample_from_datasets(
    [dataset_one, dataset_two], weights=[50, 1], seed=None
)
Is there an alternative in PyTorch that does the same?
I think this is not too difficult to implement by creating a custom dataset (non-working sketch):
from torch.utils.data import Dataset

class SampleDataset(Dataset):
    def __init__(self, datasets, weights):
        self.datasets = datasets
        self.weights = weights

    def __len__(self):
        return sum([len(dataset) for dataset in self.datasets])

    def __getitem__(self, idx):
        # sample a random number and based on that sample an item
        return self.datasets[dataset_idx][sample_idx]
However, this seems quite common. Is there already something like this available?
I don't think there is a direct equivalent in PyTorch.
However, there's a class called torch.utils.data.WeightedRandomSampler which samples indices based on a list of weights. You can use this in combination with torch.utils.data.ConcatDataset and torch.utils.data.DataLoader's sampler option.
I'll give an example with two datasets: SetA, which has 500 elements, and SetB, which only has 10.
First, you can create a concatenation of all your datasets with ConcatDataset:
ds = ConcatDataset([SetA(), SetB()])
Then, we need to sample it. The problem is, you can't just give WeightedRandomSampler [50, 1], as you did in Tensorflow. As a workaround, you can create a list of probabilities of the same length as the size of the total dataset.
The corresponding probability list for this example would be:
dist = np.array([1/51]*500 + [50/51]*10)
Essentially, the first 500 indices (i.e. indices 'pointing' to SetA) will each have a weight of 1/51 of being chosen, while the following 10 indices (i.e. indices in SetB) will each have a weight of 50/51 (i.e. much more likely to be sampled, since there are fewer elements in SetB; this is the desired result!)
We can create a sampler from that distribution:
sampler = WeightedRandomSampler(dist, 10)
Where 10 is the number of sampled elements. I would use the size of the smallest dataset; otherwise you would likely go over the same data points multiple times during the same epoch...
Finally, we just have to instantiate the dataloader with our dataset and sampler:
dl = DataLoader(ds, sampler=sampler)
To summarize:
ds = ConcatDataset([SetA(), SetB()])
dist = np.array([1/51]*500 + [50/51]*10)
sampler = WeightedRandomSampler(dist, 10)
dl = DataLoader(ds, sampler=sampler)
Edit, for any number of datasets:
sets = [SetA(), SetB(), SetC()]
ds = ConcatDataset(sets)
dist = np.concatenate([[(len(ds) - len(s))/len(ds)]*len(s) for s in sets])
sampler = WeightedRandomSampler(weights=dist, num_samples=min([len(s) for s in sets]))
dl = DataLoader(ds, sampler=sampler)
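For completeness, here is a self-contained sketch of the two-dataset version above, with the imports spelled out and the loader actually iterated; SetA and SetB stand in for whatever map-style Dataset subclasses you have (500 and 10 items respectively):

import numpy as np
from torch.utils.data import ConcatDataset, DataLoader, WeightedRandomSampler

ds = ConcatDataset([SetA(), SetB()])

# one weight per index of the concatenated dataset
dist = np.array([1 / 51] * 500 + [50 / 51] * 10)

sampler = WeightedRandomSampler(weights=dist, num_samples=10)
dl = DataLoader(ds, batch_size=2, sampler=sampler)

for batch in dl:
    ...  # on average, roughly half the samples come from each set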

How can I use multiple gpus in cupy?

I am trying to parallelise multiple matrix multiplications using multiple GPUs in CuPy.
CuPy accelerates matrix multiplication (e.g. $A\times B$).
Suppose I have four square matrices A, B, C, D, and I want to calculate $A\times B$ and $C\times D$ on two different local GPUs. How can I do this in CuPy?
For example, in TensorFlow:
for i in xrange(FLAGS.num_gpus):
    with tf.device('/gpu:%d' % i):
        ...
Is there a similar way in CuPy? The thing about CuPy is that it executes code straight away, so it cannot run the next line (e.g. $C\times D$) until the current line finishes (e.g. $A\times B$).
Thanks for Tos's help. Now the new question is:
say I have ten of these matrix pairs stored in two 3D numpy arrays (say ?*?*10). How can I write a loop to store the results of the multiplications?
anumpy  # size (1e5, 1e5, 10)
bnumpy  # size (1e5, 1e5, 10)
for i in range(10):
    # say I have 3 gpus
    with cupy.cuda.Device(i % 3):
        a = cupy.array(anumpy[:, :, i])
        b = cupy.array(bnumpy[:, :, i])
        ab[:, :, math.floor(i / 3)] = a @ b
How can I combine these 3 ab results from the different devices?
Can I have arrays with the same name on different GPUs?
Use with cupy.cuda.Device(i) and avoid blocking operations. For example, to compute matmuls of pairs of CPU arrays, send the results back to the CPU (cupy.asnumpy) only after all the matmul operations have been launched.
a = cupy.array(a)
b = cupy.array(b)
ab = a @ b
# ab = cupy.asnumpy(ab)  # not here

with cupy.cuda.Device(1):
    c = cupy.array(c)
    d = cupy.array(d)
    cd = c @ d

cd = cupy.asnumpy(cd)
ab = cupy.asnumpy(ab)
CuPy does not synchronize the device execution in most operations. Code like A.dot(B) returns immediately after launching the matrix product on the device, without waiting for the device-side operation itself, so if the operation is heavy enough (e.g. the matrices are large), the computation effectively overlaps with the second matrix product on the other device.
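As a minimal sketch of that behaviour (assuming two GPUs are visible), both products below are launched before either device is synchronised, so they can run concurrently:

import cupy

with cupy.cuda.Device(0):
    x0 = cupy.random.rand(4000, 4000)
    y0 = x0 @ x0   # returns immediately; the kernel runs asynchronously

with cupy.cuda.Device(1):
    x1 = cupy.random.rand(4000, 4000)
    y1 = x1 @ x1   # launched while device 0 may still be computing

# explicit waits are only needed if you want to time or order the work
cupy.cuda.Device(0).synchronize()
cupy.cuda.Device(1).synchronize()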
I'm not 100% sure if I understand the question properly, but I guess it can be something like this:
def my_cal(gpu_id, anumpy, bnumpy):
    with cupy.cuda.Device(gpu_id):
        # preallocate the output on this device
        ab = cupy.empty((anumpy.shape[0], bnumpy.shape[1], 10))
        for i in range(10):
            a = cupy.array(anumpy[:, :, i])
            b = cupy.array(bnumpy[:, :, i])
            ab[:, :, i] = a @ b
        return cupy.asnumpy(ab)

np_ab0 = my_cal(0, anumpy, bnumpy)
np_ab1 = my_cal(1, anumpy, bnumpy)
np_ab2 = my_cal(2, anumpy, bnumpy)

After quantisation in neural network, will the output need to be scaled with the inverse of the weight scaling

I'm currently writing a script to quantise a Keras model down to 8 bits. I'm doing a fairly basic linear scaling on the weights: I assume a normal distribution of weights and biases, and then interpolate all the values within 2 standard deviations of the mean to the range [-128, 127].
This all works and I can run the model through inference, but the output images are terrible. I know there will be a small performance hit, but I'm seeing roughly a 10x degradation.
My question is, after this scaling of the weights, do I need to do the inverse scaling operation to my output? None of the papers I've been reading seem to mention this, but I'm unsure why else my results would be so bad.
The network is for image demosaicing. It takes in a RAW image, and is meant to output an image with very low noise, and no demosaicing artefacts. My full precision model is very good, with image PSNRs of around 40-43dB, but after quantisation, I'm getting 4-8dB, and incredibly bad looking images.
Code for anyone who's bothered to read it
for i in layer_index:
    count = count + 1
    layer = model.get_layer(index=i)
    weights = layer.get_weights()
    weights_act = weights[0]
    bias_act = weights[1]
    std = np.std(weights_act)
    if std > max_std:
        max_std = std
    mean = np.mean(weights_act)
    mean_of_mean = mean_of_mean + mean
mean_of_mean = mean_of_mean / count
max_bound = mean_of_mean + 2 * max_std
min_bound = mean_of_mean - 2 * max_std
print(max_bound, min_bound)

for i in layer_index:
    layer = model.get_layer(index=i)
    weights = layer.get_weights()
    weights_act = weights[0]
    bias_act = weights[1]
    weights_shape = weights_act.shape
    bias_shape = bias_act.shape
    new_weights = np.empty(weights_shape, dtype=np.int8)
    print(new_weights.dtype)
    new_biass = np.empty(bias_shape, dtype=np.int8)
    for a in range(weights_shape[0]):
        for b in range(weights_shape[1]):
            for c in range(weights_shape[2]):
                for d in range(weights_shape[3]):
                    new_weight = (((weights_act[a, b, c, d] - min_bound) * (127 - (-128)) / (max_bound - min_bound)) + (-128))
                    new_weights[a, b, c, d] = np.int8(new_weight)
                    # print(new_weights[a,b,c,d], weights_act[a,b,c,d])
    for e in range(bias_shape[0]):
        new_bias = (((bias_act[e] - min_bound) * (127 - (-128)) / (max_bound - min_bound)) + (-128))
        new_biass[e] = np.int8(new_bias)
    new_weight_layer = (new_weights, new_biass)
    layer.set_weights(new_weight_layer)
You are not doing what you think you are doing; I'll explain.
If you wish to take a pre-trained model and quantize it, you have to add scales after each operation that involves weights. Let's take the convolution operation as an example.
As we know, the convolution operation is linear. In my explanation I will ignore the bias for the sake of simplicity (adding it back is relatively easy). Let's assume X is our input, Y is our output and W is the weights; convolution can be written as:
Y = W * X
where '*' represents the convolution operation. What you are basically doing is taking the weights, multiplying them by some scalar (let's call it 'a') and shifting them by some other scalar (let's call it 'b'), so in your model you use W' where: W' = Wa + b
So if we return to the convolution operation, we see that in your quantized network you basically perform the following operation: Y' = W' * X = (Wa + b) * X
Because convolution is linear we get: Y' = a(W * X) + b * X
Don't forget that in your network you want to receive Y, not Y', at the output of the convolution; therefore you must shift and re-scale the output to get the correct answer.
After that explanation (which I hope was clear enough), you can see the problem in your network: you apply this scale and shift to all of the weights and never compensate for it. I think your confusion comes from reading papers about models that are trained in quantized mode from the beginning, rather than papers that quantize a pre-trained model.
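To make the compensation concrete, here is a tiny 1-D dot-product analogue of the argument above (the scalars a and b are arbitrary, and a dot product stands in for the convolution):

import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=8)     # original full-precision weights
X = rng.normal(size=8)     # input

a, b = 12.7, -3.1          # arbitrary scale and shift used for the "quantisation"
W_prime = a * W + b        # what the quantised model actually stores

Y_true = W @ X             # what the full-precision layer would output
Y_prime = W_prime @ X      # what the quantised layer outputs: a*(W @ X) + b*sum(X)

# undo the shift and the scale on the output to recover Y
Y_recovered = (Y_prime - b * X.sum()) / a
print(np.allclose(Y_true, Y_recovered))   # True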
For your problem, I think the TensorFlow Graph Transform Tool might help; take a look at:
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/tools/graph_transforms/README.md
If you wish to read more about quantizing a pre-trained model, you can find more information here (for more academic material, go to scholar.google.com):
https://www.tensorflow.org/lite/performance/post_training_quantization

Restored model in tensorflow gives different results for relu operation

The weights retrieved from the restored model don't change and the input is also constant, but the output of the 'Relu:0' operation gives different results each time.
Below is my code:
sess=tf.Session()
saver = tf.train.import_meta_graph('checkpoints/checkpoints_otherapproach_1/cameranetwork_RAID_CNN-3100.meta')
saver.restore(sess,tf.train.latest_checkpoint(checkpoint_dir='checkpoints/checkpoints_otherapproach_1/'))
images = tf.get_default_graph().get_tensor_by_name('images:0')
phase = tf.get_default_graph().get_tensor_by_name('phase:0')
Activ = tf.get_default_graph().get_tensor_by_name('network/siamese_model/convolution_1/conv_1/Relu:0')
image_array = np.zeros(shape = [1,3,128,64,3]) #*******
imagepath = 'RAiD_Dataset' + '/images_afterremoving_persons_notinallcameras/'+'test'+'/camera_'+str(1)
fullfile_name = imagepath+"/"+ 'camera_1_person_23_index_1.jpg'
image_array[0][0] = cv2.imread(fullfile_name)
image_array[0][1] = image_array[0][0]
image_array[0][2] = image_array[0][0]
image_array = image_array.astype(np.float32)
feed_dict_values ={images: image_array, phase:False}
temp2 = sess.run(Activ, feed_dict=feed_dict_values)
temp1 = sess.run(Activ, feed_dict=feed_dict_values)
print((temp1 == temp2).all())  # output is False
There are two possible reasons for this:
Some of the TensorFlow ops inherit non-deterministic behavior from CUDA. This results in small numerical errors (which might be amplified by non-linearities). See this answer on how to try running your model on a single CPU thread; a minimal config sketch is also given after these two points. If the two arrays turn out to be identical in that setting, then this is the cause.
I'm assuming that you know the graph you are loading, but the graph itself might produce inconsistent results 'by design' due to operations that deliberately introduce either randomness or non-constant data. For example, consider operations that use the random number generator, or operations that update variables (e.g., tf.assign) each time Activ is evaluated.
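That single-CPU-thread check for reason 1 can look something like this (a minimal sketch; it only changes how the restored graph is executed):

import tensorflow as tf

# Hide the GPU and restrict TensorFlow to a single compute thread so that
# CUDA / thread-scheduling non-determinism is taken out of the picture.
config = tf.ConfigProto(
    device_count={'GPU': 0},
    intra_op_parallelism_threads=1,
    inter_op_parallelism_threads=1,
)
sess = tf.Session(config=config)
# ...restore the checkpoint and run Activ twice as before, then compare.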

How to connect two models

I have model A (autoencoder) which takes as input a batch of images A_in (original images), and outputs a batch of images A_out (reconstructed images). Then I have model B (binary classifier) which takes as input a batch of images B_in, which is a mixture of A_in and A_out.
I want B to distinguish between A_in and A_out, to see if A is doing a good job reconstructing images. B_out is a probability that a given image is A_in.
B trains in parallel with A to classify the two kinds of images. B_loss = (B_out - label). Labels are 0 or 1 (original or reconstructed). When we optimize B_loss we only update B parameters.
I want to train model A so that it optimizes a combined loss function: Combined_Loss = reconstruction error (A_out - A_in) - classification error (B_out - label), so that it tries to reconstruct the images and fool B at the same time. Here I want to only update A parameters (we don't want to help B here).
Now, my question is about constructing that mixture of A_in and A_out, and feeding it to B so that the graphs A and B are connected.
Right now it's like this:
A_out = autoencoder(A_in: orig_images)
B_out = classifier(B_in: numpy(mix(A_in, A_out)))
How do I define it like this:
A_out = autoencoder(A_in: orig_images)
B_out = classifier(mix(A_out, A_in))
So that when I train A and B at the same time:
sess.run([autoencoder_train_op, classifier_train_op],
         feed_dict={A_in: orig_images, B_in: classifier_images, labels: classifier_labels})
I wouldn't need B_in placeholder (the graphs would be connected)?
Here's my Numpy code that constructs classifier_images (mix(A_in, A_out)):
reconstr_images = sess.run(A_out, feed_dict={A_in: orig_images})
half_and_half_images = np.concatenate((reconstr_images[:batch_size // 2], orig_images[batch_size // 2:]))
half_and_half_labels = np.zeros(labels.shape)
half_and_half_labels[batch_size // 2:] = 1
random_indices = np.random.permutation(batch_size)
classifier_images = half_and_half_images[random_indices]
classifier_labels = half_and_half_labels[random_indices]
How do I convert it into TensorFlow node?
You can connect your models directly. In other words, don't use a placeholder for B's inputs; use your mixture of A_in and A_out instead. If you just want to run B on its own, you can still feed your own values into the tensors that come from A. Feeding only placeholders is the common case, but TensorFlow supports feeding a value into any tensor. If it makes it easier to think about, you can pass A's outputs through tf.identity so that you have something resembling a placeholder. A sketch of this approach follows.
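Here is a rough TF1-style sketch of that first approach, mixing inside the graph instead of in numpy; autoencoder and classifier stand for your existing model-building functions, and batch_size, H, W, C are assumed to be known:

import tensorflow as tf

A_in = tf.placeholder(tf.float32, [batch_size, H, W, C], name='A_in')
A_out = autoencoder(A_in)

# first half reconstructed (label 0), second half original (label 1), as in your numpy code
half = batch_size // 2
mixed_images = tf.concat([A_out[:half], A_in[half:]], axis=0)
mixed_labels = tf.concat([tf.zeros([half]), tf.ones([batch_size - half])], axis=0)

# shuffle images and labels with the same random permutation
perm = tf.random_shuffle(tf.range(batch_size))
B_in = tf.gather(mixed_images, perm)
labels = tf.gather(mixed_labels, perm)
B_out = classifier(B_in)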
Another approach is what is usually done in GANs (where the generator output is fed into the discriminator). You can create two "towers" of operations that share the same variables. One tower is just B, and you can feed your inputs into B's placeholders to run B alone. The other tower is B on top of A, which you can use to run/train A and B together. The Bs in these two towers have the same structure and share variables, but are separate ops. This approach is likely the cleanest and most flexible; a small variable-sharing sketch follows.
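A minimal sketch of the variable sharing behind the two-tower idea; build_classifier is a hypothetical function that stacks B's layers, and mix stands for whatever in-graph mixing you use (for example the snippet above):

import tensorflow as tf

def classifier(x):
    # every call reuses the same variables, so both towers share B's weights
    with tf.variable_scope('B', reuse=tf.AUTO_REUSE):
        return build_classifier(x)   # hypothetical model-building function

B_out_alone = classifier(B_in_placeholder)    # tower 1: just B, fed via its placeholder
B_out_on_A = classifier(mix(A_out, A_in))     # tower 2: B on top of A, same variables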