How to do fast nearest-neighbor upsampling in Theano?

Currently I am using this code, but it is really slow:
def resizeNN(inp, scale):
    b, ch, row, col = inp.shape
    out = T.zeros((b, ch, row * scale, col * scale))
    for y in range(scale):
        for x in range(scale):
            out = T.inc_subtensor(out[:, :, y::scale, x::scale], inp)
    return out
Is there any way to speed it up?

This code is more vectorized. I'm not sure how fast this is due to reshape.
def resizeNN(inp, scale):
    inp_shp = T.shape(inp)
    return T.tile(inp.dimshuffle(0, 1, 2, 3, 'x', 'x'),
                  (scale, scale)).transpose(
                      0, 1, 2, 4, 3, 5).reshape(
                          (inp_shp[0], inp_shp[1],
                           inp_shp[2] * scale, inp_shp[3] * scale))
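Another option, untested here for speed, is Theano's repeat op; a minimal sketch, assuming T is theano.tensor:
import theano.tensor as T

def resizeNN_repeat(inp, scale):
    # repeat each pixel `scale` times along the row axis, then the column axis
    return T.repeat(T.repeat(inp, scale, axis=2), scale, axis=3)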

Related

Evaluating the squared term of a gaussian kernel for having a covariance matrix for multi-dimensional inputs [duplicate]

I have the following code. It is taking forever in Python. There must be a way to translate this calculation into a broadcast...
def euclidean_square(a, b):
    squares = np.zeros((a.shape[0], b.shape[0]))
    for i in range(squares.shape[0]):
        for j in range(squares.shape[1]):
            diff = a[i, :] - b[j, :]
            sqr = diff**2.0
            squares[i, j] = np.sum(sqr)
    return squares
You can use np.einsum after calculating the differences in a broadcasted way, like so -
ab = a[:,None,:] - b
out = np.einsum('ijk,ijk->ij',ab,ab)
Or use scipy's cdist with its optional metric argument set as 'sqeuclidean' to give us the squared euclidean distances as needed for our problem, like so -
from scipy.spatial.distance import cdist
out = cdist(a,b,'sqeuclidean')
I collected the different methods proposed here, and in two other questions, and measured the speed of the different methods:
import numpy as np
import scipy.spatial
import sklearn.metrics

def dist_direct(x, y):
    d = np.expand_dims(x, -2) - y
    return np.sum(np.square(d), axis=-1)

def dist_einsum(x, y):
    d = np.expand_dims(x, -2) - y
    return np.einsum('ijk,ijk->ij', d, d)

def dist_scipy(x, y):
    return scipy.spatial.distance.cdist(x, y, "sqeuclidean")

def dist_sklearn(x, y):
    return sklearn.metrics.pairwise.pairwise_distances(x, y, "sqeuclidean")

def dist_layers(x, y):
    res = np.zeros((x.shape[0], y.shape[0]))
    for i in range(x.shape[1]):
        res += np.subtract.outer(x[:, i], y[:, i])**2
    return res

# inspired by the excellent https://github.com/droyed/eucl_dist
def dist_ext1(x, y):
    nx, p = x.shape
    x_ext = np.empty((nx, 3*p))
    x_ext[:, :p] = 1
    x_ext[:, p:2*p] = x
    x_ext[:, 2*p:] = np.square(x)
    ny = y.shape[0]
    y_ext = np.empty((3*p, ny))
    y_ext[:p] = np.square(y).T
    y_ext[p:2*p] = -2*y.T
    y_ext[2*p:] = 1
    return x_ext.dot(y_ext)

# https://stackoverflow.com/a/47877630/648741
def dist_ext2(x, y):
    return np.einsum('ij,ij->i', x, x)[:, None] + np.einsum('ij,ij->i', y, y) - 2 * x.dot(y.T)
I use timeit to compare the speed of the different methods. For the comparison, I use vectors of length 10, with 100 vectors in the first group, and 1000 vectors in the second group.
import timeit
p = 10
x = np.random.standard_normal((100, p))
y = np.random.standard_normal((1000, p))
for method in dir():
    if not method.startswith("dist_"):
        continue
    t = timeit.timeit(f"{method}(x, y)", number=1000, globals=globals())
    print(f"{method:12} {t:5.2f}ms")
On my laptop, the results are as follows:
dist_direct 5.07ms
dist_einsum 3.43ms
dist_ext1 0.20ms <-- fastest
dist_ext2 0.35ms
dist_layers 2.82ms
dist_scipy 0.60ms
dist_sklearn 0.67ms
While the two methods dist_ext1 and dist_ext2, both based on the idea of writing (x-y)**2 as x**2 - 2*x*y + y**2, are very fast, there is a downside: When the distance between x and y is very small, due to cancellation error the numerical result can sometimes be (very slightly) negative.
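If that matters downstream (for example before taking a square root), a simple guard is to clip the result at zero; a minimal sketch using the arrays from the benchmark above:
out = dist_ext2(x, y)
out = np.maximum(out, 0)  # clamp tiny negative values caused by cancellation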
Another solution besides using cdist is the following:
difference_squared = np.zeros((a.shape[0], b.shape[0]))
for dimension_iterator in range(a.shape[1]):
    difference_squared = difference_squared + np.subtract.outer(a[:, dimension_iterator], b[:, dimension_iterator])**2.

Can I implement a gradient descent for arbitrary convex loss function?

I have a loss function I would like to try and minimize:
def lossfunction(X, b, lambs):
    B = b.reshape(X.shape)
    penalty = np.linalg.norm(B, axis=1)**(0.5)
    return np.linalg.norm(np.dot(X, B) - X) + lambs*penalty.sum()
Gradient descent, or similar methods, might be useful. I can't calculate the gradient of this function analytically, so I am wondering how I can numerically calculate the gradient for this loss function in order to implement a descent method.
Numpy has a gradient function, but it requires me to pass a scalar field at predetermined points.
You could try scipy.optimize.minimize
For your case a sample call would be (x0 being an initial guess for the array that is optimized over):
from scipy.optimize import minimize
minimize(lossfunction, x0, args=(b, lambs), method='Nelder-Mead')
You could estimate the derivative numerically by a central difference:
def derivative(fun, X, b, lambs, h):
    return (fun(X + 0.5*h, b, lambs) - fun(X - 0.5*h, b, lambs))/h
And use it like this:
# assign values to X, b, lambs
# set the value of h
h = 0.001
print(derivative(lossfunction, X, b, lambs, h))
The code above is valid for dimX = 1, some modifications are needed to account for multidimensional vector X:
def gradient(fun, X, b, lambs, h):
    res = []
    for i in range(0, len(X)):
        t1 = np.array(X, dtype=float)  # copy so X itself is not modified
        t1[i] = t1[i] + 0.5*h
        t2 = np.array(X, dtype=float)
        t2[i] = t2[i] - 0.5*h
        res = res + [(fun(t1, b, lambs) - fun(t2, b, lambs))/h]
    return res
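For reference, SciPy also ships a finite-difference helper, scipy.optimize.approx_fprime, which does essentially the same thing with a forward (one-sided) difference instead of the central difference above. A minimal sketch, assuming X is the argument you differentiate with respect to and flattening it into the 1-D vector that approx_fprime expects:
import numpy as np
from scipy.optimize import approx_fprime

# gradient of lossfunction with respect to X, evaluated at X
f = lambda x_flat: lossfunction(x_flat.reshape(X.shape), b, lambs)
grad = approx_fprime(X.ravel(), f, 1e-6).reshape(X.shape)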
Forgive the naivety of the code, I barely know how to write Python :-)

Row-wise Histogram

Given a 2-dimensional tensor t, what's the fastest way to compute a tensor h where
h[i, :] = tf.histogram_fixed_width(t[i, :], vals, nbins)
I.e. where tf.histogram_fixed_width is called per row of the input tensor t?
It seems that tf.histogram_fixed_width is missing an axis parameter that works like, e.g., tf.reduce_sum's axis parameter.
tf.histogram_fixed_width indeed works on the entire tensor. You have to loop through the rows explicitly to compute the per-row histograms. Here is a complete working example using TensorFlow's tf.while_loop construct:
import tensorflow as tf

t = tf.random_uniform([2, 2])

i = 0
hist = tf.constant(0, shape=[0, 5], dtype=tf.int32)

def loop_body(i, hist):
    h = tf.histogram_fixed_width(t[i, :], [0.0, 1.0], nbins=5)
    return i+1, tf.concat([hist, tf.expand_dims(h, 0)], axis=0)

i, hist = tf.while_loop(
    lambda i, _: i < 2, loop_body, [i, hist],
    shape_invariants=[tf.TensorShape([]), tf.TensorShape([None, 5])])

sess = tf.InteractiveSession()
print(hist.eval())
Inspired by keveman's answer and because the number of rows of t is fixed and rather small, I chose to use a combination of tf.gather to split rows and tf.pack to join rows. It looks simple and works, will see if it is efficient...
t_histo_rows = [
    tf.histogram_fixed_width(
        tf.gather(t, [row]),
        vals, nbins)
    for row in range(t_num_rows)]

t_histo = tf.pack(t_histo_rows, axis=0)
I would like to propose another implementation.
This implementation can also handle multiple axes and unknown dimensions (batching).
import numpy as np
import tensorflow as tf

def histogram(tensor, nbins=10, axis=None):
    value_range = [tf.reduce_min(tensor), tf.reduce_max(tensor)]
    if axis is None:
        return tf.histogram_fixed_width(tensor, value_range, nbins=nbins)
    else:
        if not hasattr(axis, "__len__"):
            axis = [axis]
        other_axis = [x for x in range(0, len(tensor.shape)) if x not in axis]
        swap = tf.transpose(tensor, [*other_axis, *axis])
        flat = tf.reshape(swap, [-1, *np.take(tensor.shape.as_list(), axis)])
        count = tf.map_fn(lambda x: tf.histogram_fixed_width(x, value_range, nbins=nbins),
                          flat, dtype=(tf.int32))
        return tf.reshape(count,
                          [*np.take([-1 if a is None else a for a in tensor.shape.as_list()], other_axis),
                           nbins])
The only slow part here is tf.map_fn, but it is still faster than the other solutions mentioned.
If someone knows an even faster implementation, please comment, since this operation is still very expensive.
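For reference, a quick usage sketch of the histogram function above (TF 2.x eager mode, toy shapes), computing one histogram per item along the leading batch axis:
import numpy as np
import tensorflow as tf

x = tf.random.uniform([4, 8, 16])
h = histogram(x, nbins=10, axis=[1, 2])  # one 10-bin histogram per batch item
print(h.shape)  # (4, 10)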
The answers above are still slow when running on a GPU. Here I give another option, which is faster (at least in my environment), but it is limited to values in the range 0~1 (you can normalize the values first). The train_equal_mask_nbin tensor can be defined once in advance.
def histogram_v3_nomask(tensor, nbins, row_num, col_num):
    # init mask
    equal_mask_list = []
    for i in range(nbins):
        equal_mask_list.append(tf.ones([row_num, col_num], dtype=tf.int32) * i)
    # [nbins, row, col]
    # [0, row, col] is a tensor of shape [row, col] with all values 0
    # [1, row, col] is a tensor of shape [row, col] with all values 1
    # ...
    train_equal_mask_nbin = tf.stack(equal_mask_list, axis=0)

    # [inst, doc_len] float to int (equally segment the floats into bins)
    int_input = tf.cast(tensor * nbins, dtype=tf.int32)
    # input [row, col] -> copied nbins times, [nbins, row_num, col_num]
    int_input_nbin_copy = tf.reshape(tf.tile(int_input, [nbins, 1]), [nbins, row_num, col_num])
    # calculate histogram
    histogram = tf.transpose(tf.count_nonzero(tf.equal(train_equal_mask_nbin, int_input_nbin_copy), axis=2))
    return histogram
With the advent of tf.math.bincount, I believe the problem has become much simpler.
Something like this should work:
def hist_fixed_width(x, st, en, nbins):
    x = (x - st) / (en - st)
    x = tf.cast(x * nbins, dtype=tf.int32)
    x = tf.clip_by_value(x, 0, nbins - 1)
    return tf.math.bincount(x, minlength=nbins, axis=-1)
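A quick sanity-check sketch of the bincount version (assuming a recent TensorFlow where tf.math.bincount accepts the axis argument):
import tensorflow as tf

x = tf.constant([[0.05, 0.15, 0.95],
                 [0.55, 0.65, 0.75]])
print(hist_fixed_width(x, 0.0, 1.0, nbins=5))
# expected per-row counts over [0, 1):
# [[2 0 0 0 1]
#  [0 0 1 2 0]]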

Emulating nested for loops with scan is slow

I am trying to simulate nested for loops by using the scan function, but this is slow. Is there a better way to simulate nested for loops with TensorFlow? I am not doing this computation solely with numpy because I need automatic differentiation.
Specifically, I am convolving over an image with a bilateral filter all while using Tensorflow control ops. To accomplish this, I nested scan() functions, but this leaves me with remarkably poor performance---filtering a small image takes more than 5 minutes.
Is there a better way than nesting scan functions, and how badly am I using Tensorflow control flow operations? I'm interested in general answers more than one specific for my code.
Here is the original, faster code if you want to see it:
def bilateralFilter(image, sigma_space=1, sigma_range=None, win_size=None):
    if sigma_range is None:
        sigma_range = sigma_space
    if win_size is None:
        win_size = max(5, 2 * int(np.ceil(3*sigma_space)) + 1)
    win_ext = (win_size - 1) / 2
    height = image.shape[0]
    width = image.shape[1]
    # pre-calculate spatial_gaussian
    spatial_gaussian = []
    for i in range(-win_ext, win_ext+1):
        for j in range(-win_ext, win_ext+1):
            spatial_gaussian.append(np.exp(-0.5*(i**2+j**2)/sigma_space**2))
    padded = np.pad(image, win_ext, mode="edge")
    out_image = np.zeros(image.shape)
    weight = np.zeros(image.shape)
    idx = 0
    for row in xrange(-win_ext, 1+win_ext):
        for col in xrange(-win_ext, 1+win_ext):
            slice = padded[win_ext+row:height+win_ext+row,
                           win_ext+col:width+win_ext+col]
            value = np.exp(-0.5*((image - slice)/sigma_range)**2) \
                * spatial_gaussian[idx]
            out_image += value*slice
            weight += value
            idx += 1
    out_image /= weight
    return out_image
This is the Tensorflow version:
sess = tf.InteractiveSession()
with sess.as_default():
    def bilateralFilter(image, sigma_space, sigma_range):
        win_size = max(5., 2 * np.ceil(3 * sigma_space) + 1)
        win_ext = int((win_size - 1) / 2)
        height = tf.shape(image)[0].eval()
        width = tf.shape(image)[1].eval()
        spatial_gaussian = []
        for i in range(-win_ext, win_ext + 1):
            for j in range(-win_ext, win_ext + 1):
                spatial_gaussian.append(np.exp(-0.5 * (i ** 2 +
                                                       j ** 2) / sigma_space ** 2))
        # we use "symmetric" as it best approximates "edge" padding
        padded = tf.pad(image, [[win_ext, win_ext], [win_ext, win_ext]],
                        mode='SYMMETRIC')
        out_image = tf.zeros(tf.shape(image))
        weight = tf.zeros(tf.shape(image))
        spatial_index = tf.constant(0)
        row = tf.constant(-win_ext)
        col = tf.constant(-win_ext)

        def cond(padded, row, col, weight, out_image, spatial_index):
            return tf.less(row, win_ext + 1)

        def body(padded, row, col, weight, out_image, spatial_index):
            sub_image = tf.slice(padded, [win_ext + row, win_ext + col],
                                 [height, width])
            value = tf.exp(-0.5 *
                           (((image - sub_image) / sigma_range) ** 2)) * \
                spatial_gaussian[spatial_index.eval()]
            out_image += value * sub_image
            weight += value
            spatial_index += 1
            row, col = tf.cond(tf.not_equal(tf.mod(col,
                                                   tf.constant(2*win_ext + 1)), 0),
                               lambda: (row + 1, tf.constant(-win_ext)),
                               lambda: (row, col))
            return padded, row, col, weight, out_image, spatial_index

        padded, row, col, weight, out_image, spatial_index = \
            tf.while_loop(cond, body,
                          [padded, row, col, weight, out_image, spatial_index])

        out_image /= weight
        return out_image

    cat = plt.imread("cat.png")  # grayscale
    cat = tf.reshape(tf.constant(cat), [276, 276])
    cat_blurred = bilateralFilter(cat, 2., 0.25)
    cat_blurred = cat_blurred.eval()
    plt.figure()
    plt.gray()
    plt.imshow(cat_blurred)
    plt.show()
Here is one problem with your code: body() uses a bunch of Python-level values (for example spatial_gaussian[spatial_index.eval()]) and you seem to expect them to be updated at each loop iteration. Please take a look at the TensorFlow tutorial about graph construction and execution. In short, those Python values and their associated code are only executed at graph-construction time, and they are not even part of TensorFlow's execution graph. An operation is only included in the execution graph if it is a tf operator.
Also, it seems that tf.while_loop is better suited for your code than scan.
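As a small self-contained illustration of that point (TF 1.x graph mode, toy numbers): indexing the Python list with spatial_index.eval() happens once, at construction time, whereas putting the precomputed weights into a constant tensor and indexing it with tf.gather keeps the lookup inside the executed graph:
import tensorflow as tf

spatial_gaussian = [0.1, 0.5, 1.0, 0.5, 0.1]         # precomputed Python list
weights = tf.constant(spatial_gaussian, tf.float32)  # now part of the graph

def body(i, acc):
    # tf.gather indexes with a tensor, so the lookup happens at run time
    return i + 1, acc + tf.gather(weights, i)

_, total = tf.while_loop(lambda i, acc: i < len(spatial_gaussian),
                         body, [tf.constant(0), tf.constant(0.0)])

with tf.Session() as sess:
    print(sess.run(total))  # 2.2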

How can I make a greyscale copy of a Surface in pygame?

In pygame, I have a surface:
im = pygame.image.load('foo.png').convert_alpha()
im = pygame.transform.scale(im, (64, 64))
How can I get a grayscale copy of the image, or convert the image data to grayscale? I have numpy.
Use a Surfarray, and filter it with numpy or Numeric:
def grayscale(self, img):
    arr = pygame.surfarray.array3d(img)
    # luminosity filter
    avgs = [[(r*0.298 + g*0.587 + b*0.114) for (r, g, b) in col] for col in arr]
    arr = numpy.array([[[avg, avg, avg] for avg in col] for col in avgs])
    return pygame.surfarray.make_surface(arr)
After a lot of research, I came up with this solution, because the answers to this question were too slow for what I wanted to use this feature for:
def greyscale(surface: pygame.Surface):
    start = time.time()  # delete me!
    arr = pygame.surfarray.array3d(surface)
    # calculates the avg of the "rgb" values; this reduces the dim by 1
    mean_arr = np.mean(arr, axis=2)
    # restores the dimension from 2 to 3
    mean_arr3d = mean_arr[..., np.newaxis]
    # repeat the avg value obtained before over axis 2
    new_arr = np.repeat(mean_arr3d[:, :, :], 3, axis=2)
    diff = time.time() - start  # delete me!
    # return the new surface
    return pygame.surfarray.make_surface(new_arr)
I used time.time() to measure the time cost of this approach; for an (800, 600, 3) array it takes 0.026769161224365234 s to run.
As you pointed out, here is a variant preserving the luminance:
def greyscale(surface: pygame.Surface):
    arr = pygame.surfarray.pixels3d(surface)
    mean_arr = np.dot(arr[:, :, :], [0.299, 0.587, 0.114])
    mean_arr3d = mean_arr[..., np.newaxis]
    new_arr = np.repeat(mean_arr3d[:, :, :], 3, axis=2)
    return pygame.surfarray.make_surface(new_arr)
The easiest way is to iterate over all the pixels in your image and call .get_at(...) and .set_at(...).
This will be pretty slow, so in answer to your implicit suggestion about using NumPy, look at http://www.pygame.org/docs/tut/surfarray/SurfarrayIntro.html. The concepts and most of the code are identical.
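For completeness, a minimal sketch of that slow per-pixel approach (pure pygame, no NumPy), using the same luminosity weights as above:
import pygame

def greyscale_slow(surface: pygame.Surface) -> pygame.Surface:
    out = surface.copy()
    width, height = surface.get_size()
    for x in range(width):
        for y in range(height):
            r, g, b, a = surface.get_at((x, y))
            lum = int(0.299*r + 0.587*g + 0.114*b)  # luminosity weights
            out.set_at((x, y), (lum, lum, lum, a))
    return out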