I have two NumPy arrays on which I need to perform some basic math operations.
However, the result of these operations cannot be greater than 255, because of the dtype (uint8) of the final numpy array (named magnitude). Any idea, other than traversing the array element by element?
# Notice that the data type is "np.uint8", also arrays are 2D
magnitude = np.zeros((org_im_width,org_im_height), dtype=np.uint8)
# "numpy_arr_1" and "numpy_arr_2" both of the same size & type as "magnitude"
# In the following operation, I should limit the number to 255
magnitude = ( (np.int_(numpy_arr_1))**2 + (np.int_(numpy_arr_2))**2 )**0.5
# The following doesn't work obviously:
# magnitude = min(255,((np.int_(numpy_arr_1))**2+(np.int_(numpy_arr_2))**2)**0.5)
First of all, if you assign magnitude = ... after its creation, you are replacing the initial uint8 array with the one obtained from the operation, so magnitude won't be uint8 anymore.
Anyway, in case that is just a mistake in the example, to achieve what you want you can either clamp/clip or normalize the values of the resulting operation:
NumPy provides np.clip, which limits the values of an array to given minimum and maximum values:
>>> magnitude = np.clip(operation, 0, 255)
Where operation is the magnitude you calculate. In fact, what you might want is:
>>> magnitude = np.clip(np.sqrt(a**2 + b**2), 0, 255).astype(np.uint8)
Where a and b are your np.int_(numpy_arr_1) and np.int_(numpy_arr_2) respectively, renamed for readability purposes.
Additionally, since in your case all the values are positive, you can replace np.clip with np.minimum:
>>> magnitude = np.minimum(np.sqrt(a**2 + b**2), 255).astype(np.uint8)
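As a quick sanity check on toy values (purely illustrative):
>>> np.minimum(np.array([100.0, 300.0]), 255).astype(np.uint8)
array([100, 255], dtype=uint8)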
However, this just limits the magnitude of the vector to 255 (which is what you want), but you lose a lot of information for points of higher magnitude. If the magnitude at some point is 1000, it will be clamped to 255, so in your final array 1000 becomes 255. Two points with widely different magnitudes can end up with the same value (1000 and 255 both map to 255 in this case).
To avoid this, you can normalize (re-scale) your range of magnitudes to [0, 255]. That is, if in your initial computation the magnitude array is in the range [0, 1000], transform it to [0, 255], so what was 1000 before becomes 255, while what was 255 before becomes about 65 (simple linear scaling).
>>> tmp = np.sqrt(a**2 + b**2).astype(float)
>>> magnitude = (tmp / tmp.max() * 255).astype(np.uint8)
tmp / tmp.max() rescales all the values to the [0, 1] range (provided the array is float), and multiplying by 255 rescales the array to [0, 255] again.
In case the lower end of your magnitudes is not 0, you can rescale from, say, [200, 1000] to [0, 255], which better represents your data:
>>> tmp = np.sqrt(a**2 + b**2).astype(float)
>>> tmax, tmin = tmp.max(), tmp.min()
>>> magnitude = ((tmp - tmin) / (tmax - tmin) * 255).astype(np.uint8)
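Putting both options together in a quick, self-contained sketch (the toy arrays a and b here just stand in for np.int_(numpy_arr_1) and np.int_(numpy_arr_2)):
>>> a = np.random.randint(0, 256, (4, 4)).astype(np.int_)
>>> b = np.random.randint(0, 256, (4, 4)).astype(np.int_)
>>> clipped = np.clip(np.sqrt(a**2 + b**2), 0, 255).astype(np.uint8)                  # clamp
>>> tmp = np.sqrt(a**2 + b**2)
>>> rescaled = ((tmp - tmp.min()) / (tmp.max() - tmp.min()) * 255).astype(np.uint8)   # rescale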
I have a density function (from quantum mechanics calculations) that has to be multiplied with the spherical Bessel function on a momentum grid (momentum q is a 1D array, real-space distance r is a 1D array, so I need to calculate jn(q*r) as a 2D array). The product is then integrated over real space to get the result as a function of momentum (a 1D array with the same shape as q).
The Bessel function oscillates, while the density function decays quickly beyond a threshold distance. I used the adaptive quadrature in quadpy, which is fine when the oscillation is slow, but it fails when the oscillation is fast (for high momentum values or high Bessel-function orders). mpmath's quadosc could be a nice option, but currently I get the error "object arrays are not supported", which seems to be the same problem as in Relation between mpmath and scipy: Type Error. What would be the best way to solve this, given that the density function is calculated outside of mpmath?
import numpy as np
from mpmath import besselj, sqrt, pi, besseljzero, inf, quadosc
from scipy.interpolate import interp1d

n = 1
q = np.geomspace(1e-7, 500, 1000)

# let's create a fake Gaussian density
x = np.geomspace(1e-7, 10, 1000)
y = np.exp(-(x - 5)**2)
density = interp1d(x, y, kind='cubic', fill_value=0, bounds_error=False)

# if we just want to integrate the spherical Bessel function
def spherical_jn(x, n=n):
    return besselj(n + 1 / 2, x) * sqrt(pi / 2 / x)

# this is fine
vals = quadosc(
    spherical_jn, [0, inf], zeros=lambda m: besseljzero(n + 1 / 2, m)
)

# now we want to integrate the spherical Bessel function times the density
def spherical_jn_density(x, n=n):
    grid = q[..., None] * x
    return besselj(n + 1 / 2, grid) * sqrt(pi / 2 / grid) * density(x)

# this will fail
vals_density = quadosc(
    spherical_jn_density, [0, inf], zeros=lambda m: besseljzero(n + 1 / 2, m)
)
Expected: an accurate integral of a highly oscillating spherical Bessel function times an arbitrary decaying function (one that decays to zero at large distance).
Your density is an interp1d callable, which works like this:
In [33]: density(.5)
Out[33]: array(1.60522789e-09)
It does not work when given an mpmath object:
In [34]: density(mpmath.mpf(.5))
ValueError: object arrays are not supported
It's ok if x is first converted to ordinary float:
In [37]: density(float(mpmath.mpf(.5)))
Out[37]: array(1.60522789e-09)
Tweaking your function:
def spherical_jn_density(x, n=1):
    print(repr(x))
    grid = q[..., None] * x
    return besselj(n + 1 / 2, grid) * sqrt(pi / 2 / grid) * density(x)
and trying to run quadosc (with a smaller q):
In [57]: vals_density = quadosc(
...: spherical_jn_density, [0, inf], zeros=lambda m: besseljzero(n + 1 / 2, m))
mpf('0.506414729137261838698106')
TypeError: cannot create mpf from array([[mpf('5.06414729137261815781894e-8')],
[mpf('0.000000473559111442409924364745')],
[mpf('0.00000442835129247081824275722')],
[mpf('0.0000414104484439061558283487')],
[mpf('0.000387237851532012775822723')],
[mpf('0.00362114295531604773233197')],
[mpf('0.0338620727569835882851491')],
[mpf('0.316651395857188250996884')],
[mpf('2.96107409661232278850947')],
[mpf('27.6896294168963266721213')]], dtype=object)
In other words,
besselj(n + 1 / 2, grid)
is having problems, even before trying to evaluate density(x). mpmath functions don't work with numpy arrays; and many numpy/scipy functions don't work with mpmath objects.
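One way around this, sketched below under the assumption that you are willing to loop over the q grid in plain Python, is to keep the integrand scalar: convert the mpf argument to float before calling the scipy interpolator, and rescale the Bessel zeros by q. The integrand function and the q grid here are illustrative, not your exact setup:
import numpy as np
from mpmath import besselj, sqrt, pi, besseljzero, inf, quadosc
from scipy.interpolate import interp1d

n = 1
x = np.geomspace(1e-7, 10, 1000)
y = np.exp(-(x - 5)**2)
density = interp1d(x, y, kind='cubic', fill_value=0, bounds_error=False)

def integrand(x, q, n=n):
    # x arrives as an mpf from quadosc; convert before calling the scipy interpolator
    rho = float(density(float(x)))
    return besselj(n + 1 / 2, q * x) * sqrt(pi / 2 / (q * x)) * rho

q_grid = np.geomspace(0.1, 5, 10)  # small grid just for illustration
vals = []
for q in q_grid:
    # the zeros of J_{n+1/2}(q*x), as a function of x, sit at besseljzero(n+1/2, m) / q
    vals.append(quadosc(lambda x: integrand(x, float(q)), [0, inf],
                        zeros=lambda m: besseljzero(n + 1 / 2, m) / q))
This is slow (one quadosc call per momentum value), but each call only ever sees scalars, so the mixing of mpmath and numpy types is avoided.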
Question:
Assume for this part that the total count for every sample is 5000 (i.e., sum of column = 5000).
Imagine there was a row (gene G) in this dataset for which the count is expected to be 1 in 10% of samples and 0 in the remaining 90% of samples. We are doing an experiment where we would like to know if the expression of gene G changes in experimental vs control conditions, and we will measure n samples (single cells) from each condition.
Plot the statistical power to detect a 10% increase in the expression of G in experimental vs control at Bonferroni-corrected p < 0.05 as a function of n, assuming that we will be performing a similar test for significance on 1000 genes total. How many samples from each condition do we need to measure to achieve a power of 95%?
First, for gene G, I created a pandas dataframe for the control and experimental conditions, where the proportion of 1s is 10% and 20%, respectively. I extrapolated the conditions to 1000 genes and then performed a cross-tabulation analysis.
import numpy as np
import pandas as pd
from statsmodels.stats.contingency_tables import mcnemar
from statsmodels.stats.gof import chisquare_effectsize
from statsmodels.stats.power import GofChisquarePower

n = 5000    # number of records
nog = 1000  # number of genes
gene_list = ["gene_" + str(i) for i in range(0, nog)]

def generate_gene_df(gene, n):
    df = pd.DataFrame.from_dict(
        {"Gene": gene,
         "Cells": [f'Cell{x}' for x in range(1, n + 1)],
         "Control": np.random.choice([1, 0], p=[0.1, 0.9], size=n),
         "Experimental": np.random.choice([1, 0], p=[0.1 + 0.1, 0.9 - 0.1], size=n)},
        orient='columns'
    )
    df = df.set_index(["Gene", "Cells"])
    return df

# List of simulated genes
gene_df_list = [generate_gene_df(gene, n) for gene in gene_list]
df = pd.concat(gene_df_list)
df = df.reset_index()

table = pd.crosstab([df["Gene"], df["Cells"]],
                    [df["Control"], df["Experimental"]]).to_numpy()
Table:
array([[1, 0, 0, 0],
[1, 0, 0, 0],
[0, 0, 1, 0],
...,
[0, 1, 0, 0],
[0, 1, 0, 0],
[0, 1, 0, 0]])
Now, I want to plot the statistical power at Bonferroni-corrected p < 0.05 as a function of n. I also want to find out how many samples from each condition we need to measure to achieve a power of 95%.
My attempt:
McNemar's test
result = mcnemar(table, exact=True)
print('statistic=%.3f, p-value=%.3f' % (result.statistic, result.pvalue))

alpha = 0.05
if result.pvalue > alpha:
    print('Same proportions of errors (fail to reject H0)')
else:
    print('Different proportions of errors (reject H0)')
Output:
statistic=0.000, p-value=1.000
Same proportions of errors (fail to reject H0)
I calculated the power analysis using:
nobs = 5000
alpha = 0.05
effect_size = chisquare_effectsize(0.5, 0.5*1.1, correction=None, cohen=True, axis=0)
analysis = GofChisquarePower()
power_chisquare = analysis.solve_power(effect_size=effect_size, nobs=nobs, alpha=alpha)
print('Based on Chi Square test, the minimum number of samples required to see an effect of the desired size: %.3f' % power_chisquare)
Based on Chi Square test, the minimum number of samples required to see an effect of the desired size: 0.050
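(For context, here is a minimal sketch of how such a power curve could be traced with statsmodels, sweeping nobs at the Bonferroni-corrected alpha of 0.05/1000; the 0.9/0.1 vs 0.8/0.2 cell proportions are my assumed reading of the 10% increase, and this is not the exact code behind my plot:)
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.stats.gof import chisquare_effectsize
from statsmodels.stats.power import GofChisquarePower

alpha_bonf = 0.05 / 1000  # Bonferroni correction across 1000 genes
# control: 10% ones, experimental: 20% ones (assumed reading of the "10% increase")
effect_size = chisquare_effectsize([0.9, 0.1], [0.8, 0.2])

analysis = GofChisquarePower()
n_range = np.arange(10, 1000, 10)
power = [analysis.power(effect_size=effect_size, nobs=n, alpha=alpha_bonf, n_bins=2)
         for n in n_range]

plt.plot(n_range, power)
plt.axhline(0.95, linestyle='--')
plt.xlabel('samples per condition (n)')
plt.ylabel('power')
plt.show()

# smallest n reaching 95% power
print(analysis.solve_power(effect_size=effect_size, alpha=alpha_bonf, power=0.95, n_bins=2))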
Why does the power curve look atypical? Did I perform the analyses correctly? Is McNemar an appropriate statistical test method in this case?
I have a discrete function of two variables, represented as the tuple returned by the following line of code:
hist_values, hist_x, hist_y = np.histogram2d()
You can think of this as a non-smooth 3D surface, with hist_values being the height of the surface over grid cells whose edge coordinates are given by (hist_x, hist_y).
Now, I would like to collect those grid cells for which hist_values is above some threshold level.
You could simply compare hist_values with the threshold; this gives you a mask, an array of bool, which can be used for slicing, e.g.:
import numpy as np

# prepare random input
arr1 = np.random.randint(0, 100, 1000)
arr2 = np.random.randint(0, 100, 1000)

# compute 2D histogram
hist_values, hist_x, hist_y = np.histogram2d(arr1, arr2)

threshold = 10  # example value
mask = hist_values > threshold  # the array of `bool`
hist_values[mask]  # only the values above `threshold`
Of course, the values are then collected in a flattened array.
Alternatively, you could also use mask to instantiate a masked-array object (using numpy.ma, see docs for more info on it).
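For instance, one possible form (just a sketch):
# mask (hide) everything that is NOT above the threshold
masked_values = np.ma.masked_where(~mask, hist_values)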
If you are after the coordinates at which this is happening, you should use numpy.where().
# i0 and i1 contain the indices in the 0 and 1 dimensions respectively
i0, i1 = np.where(hist_values > threshold)
# e.g. this will give you the first value satisfying your condition
hist_values[i0[0], i1[0]]
For the corresponding values of hist_x and hist_y, note that these are the boundaries of the bins, not, for example, their mid-values; therefore you can resort to either the lower or the upper bound of each bin.
# lower edges of `hist_x` and `hist_y` respectively...
hist_x[i0]
hist_y[i1]
# ... and upper edges
hist_x[i0 + 1]
hist_y[i1 + 1]
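If you prefer the mid-values of the bins instead, a small sketch reusing i0 and i1 from above:
# bin centres corresponding to the selected cells
x_centres = (hist_x[i0] + hist_x[i0 + 1]) / 2
y_centres = (hist_y[i1] + hist_y[i1 + 1]) / 2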
I'm new to TensorFlow. I'm processing a number of feature maps from two images. At a certain point, I have N feature maps for each of the two images, and I want to obtain a single feature volume that concatenates the two.
I can simply concatenate them with tf.concat([features1, features2], axis=-1), obtaining a new volume that, for each pixel, holds the features at that position from both images.
What if I want to concatenate features from pixels having different coordinates in the two images?
For example, I have a function mapping x,y in the first image to u,v in the second image. Such a function does not follow a shared rule for all pixels (e.g., it's not a simple horizontal translation). Using numpy arrays, the behavior would be this:
for y in range(0, H):
    for x in range(0, W):
        u, v = f(x, y)  # per-pixel mapping from (x, y) in image1 to (u, v) in image2
        concat[y][x] = np.concatenate([image1[y][x], image2[v][u]])
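(Just to make the intended result concrete, here is a vectorized numpy sketch of the same gather, assuming the mapping has been precomputed into integer index arrays U and V of shape (H, W); the names and sizes are made up for illustration:)
import numpy as np

H, W, C = 4, 5, 3  # toy sizes
image1 = np.random.rand(H, W, C)
image2 = np.random.rand(H, W, C)

# precomputed mapping: pixel (y, x) of image1 pairs with pixel (V[y, x], U[y, x]) of image2
U = np.random.randint(0, W, size=(H, W))
V = np.random.randint(0, H, size=(H, W))

# fancy indexing gathers image2 at the mapped coordinates; concatenate along the channel axis
concat = np.concatenate([image1, image2[V, U]], axis=-1)  # shape (H, W, 2*C)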
I tried to slice single-pixel features and concatenate them together within for loops, but as you know this is very inefficient (and infeasible with large images; the memory required is just too high).
matrix = []
for y in range(0, H):
    row = []
    for x in range(0, W):
        u, v = f(x, y)  # per-pixel mapping, as above
        row.append(tf.concat([tf.slice(image1, [0, y, x, 0], [-1, 1, 1, -1]),
                              tf.slice(image2, [0, v, u, 0], [-1, 1, 1, -1])], 3))
    row_array = tf.concat(row, 2)
    matrix.append(row_array)
result = tf.concat(matrix, 1)
What's the best option, if it exists?
Given two matrices A (1000 x 100) and B (100 x 1000), instead of directly computing their product in tensorflow, i.e., tf.matmul(A, B), I want to first select 10 columns (randomly) from A and the corresponding 10 rows from B, and then compute tf.matmul(A_s, B_s).
Naturally, the second multiplication should be much faster, as the number of required multiplications is reduced by a factor of 10.
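(In numpy terms, the intended computation is simply the following sketch, with idx standing in for the 10 randomly chosen indices; names are just for illustration:)
import numpy as np

A = np.random.rand(1000, 100)
B = np.random.rand(100, 1000)

idx = np.random.choice(100, size=10, replace=False)  # 10 shared row/column indices

A_s = A[:, idx]     # 1000 x 10
B_s = B[idx, :]     # 10 x 1000
approx = A_s @ B_s  # roughly 10x fewer scalar multiplications than A @ B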
However, in practice, it seems that selecting the given columns of matrix A in tensorflow to create A_s is an extremely inefficient process.
Given the indices of the required columns in idx, I tried the following solutions to create A_s. The solutions are ranked according to their performance:
1.
A_s = tf.transpose(tf.gather(tf.unstack(A, axis=1), idx))
tf.matmul(A_s, B_s) is 5 times slower than tf.matmul(A, B) because creating A_s is too expensive.
2.
# here params corresponds to A and indices to idx (renamed in this snippet)
p_shape = K.shape(params)
p_flat = K.reshape(params, [-1])
i_flat = K.reshape(K.reshape(
    K.arange(0, p_shape[0]) * p_shape[1], [-1, 1]) + indices, [-1])
indices = [i_flat]
v = K.transpose(indices)
updates = i_flat * 0 - 1
shape = tf.to_int32([p_shape[0] * p_shape[1]])
scatter = tf.scatter_nd(v, updates, shape) + 1
out_temp = tf.dynamic_partition(p_flat,
                                partitions=scatter, num_partitions=2)[0]
A_s = tf.reshape(out_temp, [p_shape[0], -1])
This results in a 6-7 times slower product.
3.
X, Y = tf.meshgrid(tf.range(0, p_shape[0]), indices)
idx = K.concatenate([K.expand_dims(K.reshape(X, [-1]), 1),
                     K.expand_dims(K.reshape(Y, [-1]), 1)], axis=1)
A_s = tf.reshape(tf.gather_nd(params, idx), [p_shape[0], -1])
This is 10-12 times slower.
Any idea on how I can improve the efficiency of the column-selection process would be very much appreciated.
PS1: I ran all the experiments on CPU.
PS2: Matrix A is a placeholder, not a variable. In some implementations this can get problematic, as its shape may not be inferred.