How to isolate regions of a 3d surface with a constrained height? - numpy

I have a 2-variable discrete function represented in the form of a tuple through the following line of code:
hist_values, hist_x, hist_y = np.histogram2d()
Here you can think of hist_values as the height of a non-smooth 3D surface over a grid of bins whose edges are given by hist_x and hist_y.
Now, I would like to collect those grids for which hist_values is above some threshold level.

You could simply compare hist_values with the threshold. This gives you a boolean mask array that can be used for indexing, e.g.:
import numpy as np
# prepare random input
arr1 = np.random.randint(0, 100, 1000)
arr2 = np.random.randint(0, 100, 1000)
# compute 2D histogram
hist_values, hist_x, hist_y = np.histogram2d(arr1, arr2)
threshold = 15  # example threshold; choose your own
mask = hist_values > threshold  # boolean array, same shape as hist_values
hist_values[mask] # only the values above `threshold`
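Note that the selected values are returned as a flattened (1D) array.
Alternatively, you could use the mask to create a masked-array object (see the numpy.ma documentation). numpy.ma hides the entries where the mask is True, so pass ~mask if you want to keep the values above the threshold visible.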
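A minimal sketch of the masked-array route, assuming an example threshold of 15 (the question does not give one) and seeded random input in place of real data. Unlike hist_values[mask], the masked array keeps the 2D shape of the histogram:

```python
import numpy as np

# Seeded random input, standing in for the real data
rng = np.random.default_rng(0)
arr1 = rng.integers(0, 100, 1000)
arr2 = rng.integers(0, 100, 1000)
hist_values, hist_x, hist_y = np.histogram2d(arr1, arr2)

threshold = 15  # example value; pick whatever height you need

# Hide every bin at or below the threshold; the 2D layout is preserved
masked = np.ma.masked_array(hist_values, mask=hist_values <= threshold)

print(masked.shape)      # same shape as hist_values
print(masked.count())    # number of bins above the threshold
print(masked.filled(0))  # hidden bins replaced by 0
```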
If you are after the coordinates at which this is happening, you should use numpy.where().
# i0 and i1 contain the indices in the 0 and 1 dimensions respectively
i0, i1 = np.where(hist_values > threshold)
# e.g. this will give you the first value satisfying your condition
hist_values[i0[0], i1[0]]
For the corresponding values of hist_x and hist_y, note that these are the bin edges, not the bin midpoints, so you have to pick either the lower or the upper edge of each bin:
# lower edges of `hist_x` and `hist_y` respectively...
hist_x[i0]
hist_y[i1]
# ... and upper edges
hist_x[i0 + 1]
hist_y[i1 + 1]
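If you would rather report the bin midpoints, they can be computed from consecutive edges. This is a sketch assuming the same kind of input as above and an example threshold of 15; it returns one (x, y, height) row per bin above the threshold:

```python
import numpy as np

rng = np.random.default_rng(0)
arr1 = rng.integers(0, 100, 1000)
arr2 = rng.integers(0, 100, 1000)
hist_values, hist_x, hist_y = np.histogram2d(arr1, arr2)
threshold = 15  # example value

# Midpoint of each bin: average of consecutive edges (len(edges) - 1 values)
x_centers = 0.5 * (hist_x[:-1] + hist_x[1:])
y_centers = 0.5 * (hist_y[:-1] + hist_y[1:])

i0, i1 = np.where(hist_values > threshold)

# One (x_center, y_center, height) row per selected bin
selected = np.column_stack([x_centers[i0], y_centers[i1], hist_values[i0, i1]])
print(selected[:5])
```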

Related

vectorize pytorch tensor indexing

I have a batch of images img_batch, size [8,3,32,32], and I want to manipulate each image by setting randomly selected pixels to zero. I can do this using a for loop over each image but I'm not sure how to vectorize it so I'm not processing only one image at a time. This is my code using loops.
import random
import torch

batch_size = 8
prct0 = 0.1
noise = torch.tensor([9, 14, 5, 7, 6, 14, 1, 3])
comb_img = []
for ind in range(batch_size):
    img = img_batch[ind]
    c, h, w = img.shape
    prct = 1 - (1 - prct0)**noise[ind].item()
    idx = random.sample(range(h*w), int(prct*h*w) )
    img_noised = img.clone()
    img_noised.view(c,1,-1)[:,0,idx] = 0
    comb_img.append(img_noised)
comb_img = torch.stack(comb_img) # output is comb_img [8,3,32,32]
I'm new to PyTorch, so if you see any other improvements, please share.
First note: do you need to use noise? It will be a lot easier if you treat all images the same, rather than zeroing a different number of pixels in each image.
However, you can do it this way, though you still need a small for loop (in the list comprehension):
#don't want RGB masking, want the whole pixel
rng = torch.rand(*img_batch[:,0:1].shape)
#create binary mask
mask = torch.stack([rng[i] <= 1-(1-prct0)**noise[i] for i in range(batch_size)])
img_batch_masked = img_batch.clone()
#broadcast mask to 3 RGB channels
img_batch_masked[mask.tile([1,3,1,1])] = 0
You can check that the mask is set correctly by summing mask over the last three dimensions, dividing by the number of pixels per image, and comparing the result with your target fraction:
In [5]: print(mask.sum([1,2,3])/(mask.shape[2] * mask.shape[3]))
tensor([0.6058, 0.7716, 0.4195, 0.5162, 0.4739, 0.7702, 0.1012, 0.2684])
In [6]: print(1-(1-prct0)**noise)
tensor([0.6126, 0.7712, 0.4095, 0.5217, 0.4686, 0.7712, 0.1000, 0.2710])
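The remaining list comprehension can also be replaced by broadcasting: compute the per-image fraction once, reshape it to [8, 1, 1, 1], and compare it against the random tensor in a single operation. This is a sketch, not the answer's exact code: it uses a random stand-in for img_batch and swaps the boolean-index assignment for Tensor.masked_fill, which broadcasts the single-channel mask across the three colour channels.

```python
import torch

torch.manual_seed(0)
batch_size = 8
prct0 = 0.1
noise = torch.tensor([9, 14, 5, 7, 6, 14, 1, 3])
img_batch = torch.rand(batch_size, 3, 32, 32)  # stand-in for the real images

# Per-image fraction of pixels to zero, shape [8]
prct = 1 - (1 - prct0) ** noise
# One random number per pixel (single channel), shape [8, 1, 32, 32]
rng = torch.rand(batch_size, 1, 32, 32)
# Reshape prct to [8, 1, 1, 1] so it broadcasts against rng
mask = rng <= prct.view(-1, 1, 1, 1)

# masked_fill broadcasts the [8, 1, 32, 32] mask across the 3 channels,
# so whole pixels (not individual RGB values) are zeroed
img_batch_masked = img_batch.masked_fill(mask, 0)
```

The same check as above applies: mask.sum([1, 2, 3]) / (32 * 32) should be close to prct.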
You can easily do this without a loop in a fully vectorized manner:
Create noise tensor
Select a threshold and round the noise tensor to 0 or 1 based on above or below that threshold (prct0)
Element-wise multiply image tensor by noise tensor
I think calling the vector of power multipliers noise is a bit confusing, so I've renamed that vector power_vec in this example:
power_vec = noise
# create random noise - note one channel rather than 3 color channels
rand_noise = torch.rand(8,1,32,32)
noise = torch.pow(rand_noise, power_vec.view(-1, 1, 1, 1)) # reshape to [8,1,1,1] so each image gets its own exponent
# "round" noise based on threshold
z = torch.zeros(noise.shape)
o = torch.ones(noise.shape)
noise_rounded = torch.where(noise>prct0,o,z)
# apply noise mask to each color channel
output = img_batch * noise_rounded.expand(8,3,32,32)
For simplicity this solution uses your original batch size and image size but could be trivially extended to work on inputs of any image and batch size.
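One caveat worth checking: thresholding U**k at prct0 keeps a pixel with probability P(U**k > prct0) = 1 - prct0**(1/k), which is not the (1 - prct0)**k the question's loop targets (the two only coincide for k = 1). Since this is purely a probability question, the sketch below checks it with NumPy rather than PyTorch. If the loop's semantics matter, comparing torch.rand directly against 1 - (1 - prct0)**k, as in the mask-based approach, reproduces them.

```python
import numpy as np

rng = np.random.default_rng(0)
prct0 = 0.1
power_vec = np.array([9, 14, 5, 7, 6, 14, 1, 3])

# Many uniform samples per exponent, to estimate the keep-rate empirically
u = rng.random((power_vec.size, 200_000))

# Fraction of pixels kept by the torch.pow approach: U**k > prct0
keep_pow = (u ** power_vec[:, None] > prct0).mean(axis=1)
# Fraction of pixels the original loop keeps: (1 - prct0)**k
keep_loop = (1 - prct0) ** power_vec

print(np.round(keep_pow, 3))   # close to 1 - prct0**(1/k)
print(np.round(keep_loop, 3))
```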

How can I reconstruct original matrix from SVD components with following shapes?

I am trying to reconstruct the following matrix of shape (256 x 256 x 2) with SVD components as
U.shape = (256, 256, 256)
s.shape = (256, 2)
vh.shape = (256, 2, 2)
I have already tried the reconstruction methods from the NumPy and SciPy documentation, but they failed multiple times; I suspect a 3D matrix needs a different way of reconstruction.
I am using numpy.linalg.svd for the decomposition.
From np.linalg.svd's documentation:
"... If a has more than two dimensions, then broadcasting rules apply, as explained in :ref:`routines.linalg-broadcasting`. This means that SVD is working in "stacked" mode: it iterates over all indices of the first a.ndim - 2 dimensions and for each combination SVD is applied to the last two indices."
This means that you only need to turn s (a stack of singular-value vectors, in the general case) into the matching stack of rectangular diagonal matrices. More precisely: for each slice, pad s with zeros to the length of u's last dimension, build the diagonal matrix, and keep only the first 2 columns (in general, as many columns as vh has rows, which equals the length of the last axis of s).
Here is a working code with example for your case:
import numpy as np
mat = np.random.randn(256, 256, 2) # Your matrix of dim 256 x 256 x2
u, s, vh = np.linalg.svd(mat) # Get the decomposition
# Pad the singular values' arrays, obtain diagonal matrix and take only first 2 columns:
s_rep = np.apply_along_axis(lambda _s: np.diag(np.pad(_s, (0, u.shape[1]-_s.shape[0])))[:, :_s.shape[0]], 1, s)
mat_reconstructed = u @ s_rep @ vh # @ multiplies the stacked matrices slice by slice
mat_reconstructed equals mat up to floating-point error.
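A shorter route, if you do not need the full 256 x 256 u matrices, is the reduced SVD (full_matrices=False). Then u is (256, 256, 2), and scaling the columns of each u by its singular values replaces the padding step entirely. A sketch with random data standing in for your matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
mat = rng.standard_normal((256, 256, 2))

# Reduced SVD: u is (256, 256, 2) instead of (256, 256, 256)
u, s, vh = np.linalg.svd(mat, full_matrices=False)
print(u.shape, s.shape, vh.shape)  # (256, 256, 2) (256, 2) (256, 2, 2)

# Scale the columns of each u by its singular values, then multiply by vh.
# s[..., None, :] has shape (256, 1, 2) and broadcasts over the rows of u.
mat_reconstructed = (u * s[..., None, :]) @ vh
print(np.allclose(mat, mat_reconstructed))  # True
```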

Adding a third dimension to my 2D array in a for loop

I have a for loop that gives me a 16 x 8 2D array per iteration. I want to stack all of these 2D arrays along the z-axis into a 3D array, so that I can determine the variance over the z-axis. I have tried multiple commands, such as np.dstack, matrix3D[p,:,:] = ... and np.newaxis, both inside and outside the loop. However, the closest I've come to my desired output is the last array repeated and stacked on top of itself, and the dimensions were way off. I need to keep the original 16 x 8 format. By now I'm in a bit too deep and could use a nudge in the right direction!
My code:
excludedElectrodes = [1,a.numberOfColumnsInArray,a.numberOfElectrodes-a.numberOfColumnsInArray+1,a.numberOfElectrodes]
matrixEA = np.full([a.numberOfRowsInArray, a.numberOfColumnsInArray], np.nan)
for iElectrode in range(a.numberOfElectrodes):
if a.numberOfDeflectionsPerElectrode[iElectrode] != 0:
matrixEA[iElectrode // a.numberOfColumnsInArray][iElectrode % a.numberOfColumnsInArray] = 0
for iElectrode in range (a.numberOfElectrodes):
if iElectrode+1 not in excludedElectrodes:
"""Preprocessing"""
# Loop over heartbeats
for p in range (1,len(iLAT)):
# Calculate parameters, store them in right row-col combo (electrode number)
matrixEA[iElectrode // a.numberOfColumnsInArray][iElectrode % a.numberOfColumnsInArray] = (np.trapz(abs(correctedElectrogram[limitA[0]:limitB[0]]-totalBaseline[limitA[0]:limitB[0]]))/(1000))
# Stack all matrixEA arrays along z axis
matrix3D = np.dstack(matrixEA)
This example snippet does what you want, although I suspect your errors have more to do with things unrelated to the concatenation part. Here, we index with None to add a new length-1 axis, along which we then concatenate the 2D arrays.
import numpy as np
# Dummy function that returns a (16,8) array
def foo(a):
    return np.random.random((16,8)) + a
arrays2D = []
# Your loop
for i in range(10):
    # Calculate your (16,8) array
    f = foo(i)
    # And append it to the list
    arrays2D.append(f)
# Stack arrays along a new last axis -> shape (16, 8, 10)
array3D = np.concatenate([arr[..., None] for arr in arrays2D], axis=-1)
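np.stack does the same thing without the manual None indexing. One likely cause of the "last array repeated" symptom, assuming your loop keeps filling one preallocated array such as matrixEA in place: appending that array to a list stores a reference to the same object each time, so every slice ends up showing the last iteration. Appending a copy avoids this. A sketch with random data in place of your electrode calculations:

```python
import numpy as np

rng = np.random.default_rng(0)
work = np.full((16, 8), np.nan)  # one buffer reused every iteration, like matrixEA

arrays2D = []
for p in range(10):
    work[:, :] = rng.random((16, 8)) + p  # overwrite the buffer in place
    # Append a copy: appending `work` itself would store 10 references
    # to the same array, so every slice would show the last iteration.
    arrays2D.append(work.copy())

# np.stack creates the new axis; axis=-1 puts the loop index last
array3D = np.stack(arrays2D, axis=-1)
variance = array3D.var(axis=-1)  # variance over the loop axis, per (row, col)
print(array3D.shape, variance.shape)  # (16, 8, 10) (16, 8)
```

Use np.nanvar instead of .var if some electrodes stay NaN and you want them ignored rather than propagated.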

Difficulty with numpy broadcasting

I have two 2D point clouds (oldPts and newPts) which I wish to combine. They are m x 2 and n x 2 NumPy integer arrays, with m and n of order 2000. newPts contains many duplicates or near-duplicates of oldPts, and I need to remove these before combining.
So far I have used the histogram2d function to produce a 2D representation of oldPts (H). I then compare each point in newPts to an NxN area of H, and if that area is empty I accept the point. This last part I am currently doing with a Python loop, which I would like to remove. Can anybody show me how to do this with broadcasting, or perhaps suggest a completely different method? The working code is below:
npzfile = np.load(path+datasetNo+'\\temp.npz')
arrs = npzfile.files
oldPts = npzfile[arrs[0]]
newPts = npzfile[arrs[1]]
# remove all the negative values
oldPts = oldPts[oldPts.min(axis=1)>=0,:]
newPts = newPts[newPts.min(axis=1)>=0,:]
# round to integers
oldPts = np.around(oldPts).astype(int)
newPts = newPts.astype(int)
# put the oldPts into 2d array
H, xedg,yedg= np.histogram2d(oldPts[:,0],oldPts[:,1],
bins = [xMax,yMax],
range = [[0, xMax], [0, yMax]])
finalNewList = []
N = 5
for pt in newPts:
    if not H[max(0,pt[0]-N):min(xMax,pt[0]+N),
             max(0,pt[1]-N):min(yMax,pt[1]+N)].any():
        finalNewList.append(pt)
finalNew = np.array(finalNewList)
The right way to do this is to compute the distance between every pair of points (one old, one new) with scipy.spatial.distance.cdist, and then accept only the new points that are "different enough" from every old point:
import numpy as np
oldPts = np.random.randn(1000,2)
newPts = np.random.randn(2000,2)
from scipy.spatial.distance import cdist
dist = cdist(oldPts, newPts)
print(dist.shape) # (1000, 2000)
okIndex = np.min(dist, axis=0) > 5 # nearest old point is farther than 5
print(np.sum(okIndex)) # number of accepted new points
finalNew = newPts[okIndex,:]
print(finalNew.shape) # (number of accepted points, 2)
Above I use a Euclidean distance of 5 as the threshold for "too close": any point in newPts that is farther than 5 from all points in oldPts is accepted into finalNew. You will have to look at the range of values in dist to find a good threshold, but your histogram can guide you in picking one.
(One good way to visualize dist is to use matplotlib.pyplot.imshow(dist).)
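This is a more refined version of what you were doing with the histogram. Your loop checks a square window around each point, which corresponds to the Chebyshev (L∞) distance rather than the Euclidean one, so passing metric='chebyshev' to cdist and thresholding at 5 should closely reproduce the histogram result, assuming your histogram bins have width 1 in both dimensions. It will not be exact, because the slice pt-N:pt+N is half-open and the old points are rounded into bins.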
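If you want to keep the histogram approach and reproduce the loop exactly, it can be vectorized with a summed-area table (integral image), a technique swapped in here rather than taken from the answer above: after one 2D cumulative sum, the sum over any rectangular window costs four lookups, and those lookups can be done for all points at once. This sketch assumes H holds non-negative counts (so "any nonzero" is the same as "sum > 0") and uses hypothetical random integer points:

```python
import numpy as np

def window_is_empty(H, pts, N):
    """True for each point whose window H[x-N:x+N, y-N:y+N] (clipped) is all zero."""
    xMax, yMax = H.shape
    # Summed-area table with a leading row and column of zeros:
    # S[i, j] == H[:i, :j].sum()
    S = np.zeros((xMax + 1, yMax + 1), dtype=H.dtype)
    S[1:, 1:] = H.cumsum(axis=0).cumsum(axis=1)

    # Same clipped, half-open bounds as the loop's slice, for all points at once
    x0 = np.clip(pts[:, 0] - N, 0, xMax)
    x1 = np.clip(pts[:, 0] + N, 0, xMax)
    y0 = np.clip(pts[:, 1] - N, 0, yMax)
    y1 = np.clip(pts[:, 1] + N, 0, yMax)

    window_sum = S[x1, y1] - S[x0, y1] - S[x1, y0] + S[x0, y0]
    return window_sum == 0

# Example with hypothetical integer point clouds
rng = np.random.default_rng(1)
xMax, yMax, N = 60, 50, 5
oldPts = np.column_stack([rng.integers(0, xMax, 300), rng.integers(0, yMax, 300)])
newPts = rng.integers(0, 70, (400, 2))  # some points fall outside H on purpose
H, xedg, yedg = np.histogram2d(oldPts[:, 0], oldPts[:, 1],
                               bins=[xMax, yMax],
                               range=[[0, xMax], [0, yMax]])
finalNew = newPts[window_is_empty(H, newPts, N)]
```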
(PS. If you're interested in another useful function in scipy.spatial.distance, check out my answer that uses pdist to find unique rows/columns in an array.)
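For larger clouds, building the full distance matrix with cdist grows as m x n in memory. A k-d tree, swapped in here as an alternative to cdist, answers the same question (distance from each new point to its nearest old point) without materializing that matrix. A sketch using scipy.spatial.cKDTree with random data and an example threshold of 0.05 (an assumption; tune it to your data):

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
oldPts = rng.standard_normal((1000, 2))
newPts = rng.standard_normal((2000, 2))
threshold = 0.05  # example value; tune it to your data

tree = cKDTree(oldPts)
# Distance from each new point to its nearest old point (k=1 -> 1D array).
# p=2 is Euclidean; p=np.inf would give the square (Chebyshev) neighbourhood.
nearest_dist, _ = tree.query(newPts, k=1, p=2)

finalNew = newPts[nearest_dist > threshold]
print(finalNew.shape)
```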

ValueError: setting an array element with a sequence

This Python code:
import numpy,math
import scipy.optimize as optimization
import matplotlib.pyplot as plt
# Create toy data for curve_fit.
zo = numpy.array([0.0,1.0,2.0,3.0,4.0,5.0])
mu = numpy.array([0.1,0.9,2.2,2.8,3.9,5.1])
sig = numpy.array([1.0,1.0,1.0,1.0,1.0,1.0])
# Define hubble function.
def Hubble(x,a,b):
    return H0 * m.sqrt( a*(1+x)**2 + 1/2 * a * (1+b)**3 )
# Define
def Distancez(x,a,b):
    return c * (1+x)* np.asarray(quad(lambda tmp:
        1/Hubble(a,b,tmp),0,x))
def mag(x,a,b):
    return 5*np.log10(Distancez(x,a,b)) + 25
    #return a+b*x
# Compute chi-square manifold.
Steps = 101 # grid size
Chi2Manifold = numpy.zeros([Steps,Steps]) # allocate grid
amin = 0.2 # minimal value of a covered by grid
amax = 0.3 # maximal value of a covered by grid
bmin = 0.3 # minimal value of b covered by grid
bmax = 0.6 # maximal value of b covered by grid
for s1 in range(Steps):
    for s2 in range(Steps):
        # Current values of (a,b) at grid position (s1,s2).
        a = amin + (amax - amin)*float(s1)/(Steps-1)
        b = bmin + (bmax - bmin)*float(s2)/(Steps-1)
        # Evaluate chi-squared.
        chi2 = 0.0
        for n in range(len(xdata)):
            residual = (mu[n] - mag(zo[n], a, b))/sig[n]
            chi2 = chi2 + residual*residual
        Chi2Manifold[Steps-1-s2,s1] = chi2 # write result to grid.
Throws this error message:
ValueError Traceback (most recent call last)
<ipython-input-136-d0ef47a881a7> in <module>()
36 residual = (mu[n] - mag(zo[n], a, b))/sig[n]
37 chi2 = chi2 + residual*residual
---> 38 Chi2Manifold[Steps-1-s2,s1] = chi2 # write result to grid.
ValueError: setting an array element with a sequence.
Note: If I define a simple mag function such as (a+b*x), I do not get any error message.
In fact, all three functions (Hubble, Distancez and mag) have to be functions of the redshift z, which is an array.
Now do you think I need to redefine all these functions to return arrays? I mean, first create an array of redshifts, so that the outputs of the functions automatically become arrays?
I need the outputs of the Distancez() and mag() functions to be arrays. I managed this simply by changing the upper limit of the integral in Distancez from x to x.any(). Now I get an array, which is what I want. However, the output value of, for example, Distancez(0.25, 0.5, 0.3) is now different from what I get with just x as the upper limit. Any help would be appreciated.
The ValueError is saying that it cannot assign a sequence to a single element of the array Chi2Manifold. chi2 is probably a NumPy array because residual is a NumPy array, because your mag() function returns a NumPy array, all because your Distancez function returns a NumPy array: you are telling it to do this with np.asarray().
If Distancez() returned a scalar floating-point value, you would probably be set. Do you need np.asarray() in Distancez()? Assuming quad is scipy.integrate.quad, note that it returns a tuple (integral, estimated absolute error), so np.asarray(quad(...)) is a 2-element array, while quad(...)[0] is the scalar integral. I don't know what your Hubble() function is supposed to do, and I'm not an astronomer, but in my experience distances are often scalars ;).
If chi2 is meant to be a sequence or numpy array, you probably want to set an appropriately-sized range of values in Chi2Manifold to chi2.
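A sketch of one way to get both behaviours: keep Distancez scalar by taking only the integral from quad, and get arrays by evaluating it once per redshift with np.vectorize. Several things here are assumptions: quad is taken to be scipy.integrate.quad; H0 and c are placeholder values, since the question never defines them; and Hubble's first argument is treated as the redshift, since the question's call Hubble(a, b, tmp) does not match def Hubble(x, a, b). As for x.any(): it returns a single boolean (True if any element of x is nonzero), which quad effectively treats as an upper limit of 1 (or 0), so you were integrating over a different range, which is why the values changed.

```python
import numpy as np
from scipy.integrate import quad

H0 = 70.0       # placeholder value; the question never defines H0
c = 299792.458  # placeholder (speed of light in km/s); also undefined in the question

def Hubble(x, a, b):
    return H0 * np.sqrt(a * (1 + x) ** 2 + 0.5 * a * (1 + b) ** 3)

def Distancez(x, a, b):
    # quad returns (integral, abs_error_estimate); keep only the integral
    integral, _abserr = quad(lambda tmp: 1 / Hubble(tmp, a, b), 0, x)
    return c * (1 + x) * integral

# Evaluate the scalar function once per redshift to get an array back
Distancez_vec = np.vectorize(Distancez)

def mag(x, a, b):
    return 5 * np.log10(Distancez_vec(x, a, b)) + 25

zo = np.array([0.5, 1.0, 2.0])
print(Distancez(0.25, 0.5, 0.3))  # a plain float
print(mag(zo, 0.25, 0.5))         # one magnitude per redshift
```

With the question's zo, the z = 0.0 entry gives a distance of 0, and log10 of that is -inf, so that point will need separate handling.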