The gyration tensor of a set of N points in 3d space is defined as

$$S_{mn} = \frac{1}{N} \sum_{i=1}^{N} r_m^{(i)} r_n^{(i)},$$

assuming the condition

$$\sum_{i=1}^{N} \mathbf{r}^{(i)} = \mathbf{0},$$

i.e. that the coordinates are taken relative to the centre of mass of the set.
How do I compute this in numpy without using an explicit for loop? I know that I can just do something like
import numpy as np

def calculate_gyration_tensor(points):
    '''
    Calculates the gyration tensor of a set of points.
    '''
    COM = points.mean(axis=0)  # centre of mass, assuming equal masses
    gyration_tensor = np.zeros((3, 3))
    for p in points:
        gyration_tensor += np.outer(p - COM, p - COM)
    return gyration_tensor / len(points)
but this quickly becomes inefficient for large N, because of the for loop. Is there a better way to do it?
You can do it with np.einsum like this:

def gyration(points):
    '''
    Calculate the gyration tensor.
    points : numpy array of shape N x 3
    '''
    center = points.mean(0)
    # centered points
    normed_points = points - center[None, :]
    return np.einsum('im,in->mn', normed_points, normed_points) / len(points)
# test
points = np.arange(36).reshape(12, 3)
gyration(points)

Output:

array([[107.25, 107.25, 107.25],
       [107.25, 107.25, 107.25],
       [107.25, 107.25, 107.25]])
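Note that this einsum is just the matrix product of the centered coordinate array with itself, so an equivalent formulation (the function name here is mine) is:

def gyration_matmul(points):
    normed_points = points - points.mean(axis=0)
    return normed_points.T @ normed_points / len(points)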
In my code, I am trying to extract data from a csv file to use in the function, but it doesn't output anything and gives no error. The code works when I pass plain numpy arrays as inputs; I'm not sure why it doesn't work with pandas.
import numpy as np
import pandas as pd
import os

# change the current directory to the directory where the running script file is
os.chdir(os.path.dirname(os.path.abspath(__file__)))

# finding best fit line for y=mx+b by iteration
def gradient_descent(x, y):
    m_iter = b_iter = 1  # starting point
    iteration = 10000
    n = len(x)
    learning_rate = 0.05
    last_mse = 10000
    # take baby steps to reach global minima
    for i in range(iteration):
        y_predicted = m_iter*x + b_iter
        # mse = 1/n*sum([value**2 for value in (y-y_predicted)])  # cost function to minimize
        mse = 1/n*sum((y-y_predicted)**2)  # cost function to minimize
        if (last_mse - mse)/mse < 0.001:
            break
        # recall MSE formula is 1/n*sum((yi-y_predicted)^2), where y_predicted = m*x+b
        # using partial deriv of MSE formula, d/dm and d/db
        dm = -(2/n)*sum(x*(y-y_predicted))
        db = -(2/n)*sum((y-y_predicted))
        # use current predicted value to get the next value for prediction
        # by using learning rate
        m_iter = m_iter - learning_rate*dm
        b_iter = b_iter - learning_rate*db
        print('m is {}, b is {}, cost is {}, iteration {}'.format(m_iter, b_iter, mse, i))
        last_mse = mse

#x = np.array([1,2,3,4,5])
#y = np.array([5,7,8,10,13])
#gradient_descent(x,y)

df = pd.read_csv('Linear_Data.csv')
x = df['Area']
y = df['Price']
gradient_descent(x, y)
Well no, your code also works with pandas dataframes:
df = pd.DataFrame({'Area': [1,2,3,4,5], 'Price': [5,7,8,10,13]})
x = df['Area']
y = df['Price']
gradient_descent(x,y)
Above will give you the same output as with numpy arrays.
Try checking what's in Linear_Data.csv and/or adding some print statements in the gradient_descent function just to check your assumptions. I would suggest first adding a print statement before the condition with the break statement:
print(last_mse, mse)
if (last_mse - mse)/mse < 0.001:
    break
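A likely explanation (an assumption on my part, since Linear_Data.csv isn't shown): with raw, unscaled Area values, a learning rate of 0.05 makes the very first update overshoot, so mse comes out larger than the initial last_mse of 10000. Then (last_mse - mse)/mse is negative, the break fires on the first iteration, and the print inside the loop is never reached. A minimal sketch that standardizes the columns before fitting:

# Hypothetical fix: standardize the inputs so the fixed step size behaves
x_raw = df['Area'].to_numpy(dtype=float)
y_raw = df['Price'].to_numpy(dtype=float)
x = (x_raw - x_raw.mean()) / x_raw.std()
y = (y_raw - y_raw.mean()) / y_raw.std()
gradient_descent(x, y)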
I believe the code below is a more or less correct implementation of this exponential heatmap function:
def expfunc(image, landmark, sigma=6):  # image: array of shape (512, 512), landmark: array of shape (2,)
    a = np.sqrt(np.log(2)/2)/sigma
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            prob = np.exp(-a*(np.abs(i-landmark[0])+np.abs(j-landmark[1])))
            if prob > 0.01:
                image[i][j] = prob
            else:
                image[i][j] = 0
    return image
My questions are:
How could I vectorize this code?
This probability function assigns a value to every pixel, so how should I deal with the very small ones? Right now I am using a threshold of 0.01, below which everything is set to zero.
Let me know if this works for you:
i = np.arange(image.shape[0])
j = np.arange(image.shape[1])
prob = np.exp(-a*(np.abs(i[:,None]-landmark[0])+np.abs(j-landmark[1])))
image = np.where(prob>0.01, prob, 0)
First compute the array prob for all of the indices i and j. Then prob has the same shape as image, and you can redefine image based on the values of prob using numpy.where.
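Putting the pieces together into a self-contained sketch that mirrors the question's expfunc (the vectorized function name is mine, and the 0.01 cutoff is kept from the original):

import numpy as np

def expfunc_vectorized(image, landmark, sigma=6):
    a = np.sqrt(np.log(2) / 2) / sigma
    i = np.arange(image.shape[0])
    j = np.arange(image.shape[1])
    # i[:, None] broadcasts against j to cover every (row, column) pair at once
    prob = np.exp(-a * (np.abs(i[:, None] - landmark[0]) + np.abs(j - landmark[1])))
    return np.where(prob > 0.01, prob, 0)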
I want to concatenate two tensors checkerboard-ly in tensorflow2, like examples showed below:
example 1:
a = [[1,1],[1,1]]
b = [[0,0],[0,0]]
concated_a_and_b = [[1,0,1,0],[0,1,0,1]]
example 2:
a = [[1,1,1],[1,1,1],[1,1,1]]
b = [[0,0,0],[0,0,0],[0,0,0]]
concated_a_and_b = [[1,0,1,0,1,0],[0,1,0,1,0,1],[1,0,1,0,1,0]]
Is there a decent way in tensorflow2 to concatenate them like this?
A bit of background for this:
I first split a tensor c with a checkerboard mask into two halves a and b. After some transformations I have to concatenate them back into the original shape and order.
What I mean by "checkerboard-ly" is the interleaving pattern shown in the two examples above.
Step 1: Generate a matrix with alternated values
You can do this by first concatenating into [1, 0] pairs, and then by applying a final reshape.
Step 2: Reverse some rows
I split the matrix into two parts, reverse the second part, and then rebuild the full matrix by picking alternately from the first and second parts.
Code sample:

import math
import numpy as np
import tensorflow as tf

a = tf.ones(shape=(3, 4))
b = tf.zeros(shape=(3, 4))

x = tf.expand_dims(a, axis=-1)
y = tf.expand_dims(b, axis=-1)
paired_ones_zeros = tf.concat([x, y], axis=-1)
alternated_values = tf.reshape(paired_ones_zeros, [-1, a.shape[1] + b.shape[1]])

num_samples = alternated_values.shape[0]
middle = math.ceil(num_samples / 2)
is_num_samples_odd = middle * 2 != num_samples

# Gather first part of the matrix, don't do anything to it
first_elements = tf.gather_nd(alternated_values, [[index] for index in range(middle)])
# Gather second part of the matrix and reverse its elements
second_elements = tf.reverse(tf.gather_nd(alternated_values, [[index] for index in range(middle, num_samples)]), axis=[1])

# Pick alternatively between first and second part of the matrix
indices = np.concatenate([[[index], [index + middle]] for index in range(middle)], axis=0)
if is_num_samples_odd:
    indices = indices[:-1]

output = tf.gather_nd(
    tf.concat([first_elements, second_elements], axis=0),
    indices
)
print(output)
I know this is not a decent way, as it hurts both time and space complexity, but it solves the above problem:
def concat(tf1, tf2):
    result = []
    for (index, (tf_item1, tf_item2)) in enumerate(zip(tf1, tf2)):
        item = []
        for (subitem1, subitem2) in zip(tf_item1, tf_item2):
            if index % 2 == 0:
                item.append(subitem1)
                item.append(subitem2)
            else:
                item.append(subitem2)
                item.append(subitem1)
        result.append(item)
    return result
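For comparison, here is a shorter vectorized sketch (the function name is mine) that interleaves the columns with tf.stack/tf.reshape and then selects the row phase with a parity mask via tf.where, assuming a and b share the same 2-D shape as in the examples:

import tensorflow as tf

def checkerboard_concat(a, b):
    # Interleave columns two ways: rows of ab read a0 b0 a1 b1 ..., rows of ba read b0 a0 b1 a1 ...
    ab = tf.reshape(tf.stack([a, b], axis=-1), [tf.shape(a)[0], -1])
    ba = tf.reshape(tf.stack([b, a], axis=-1), [tf.shape(a)[0], -1])
    # Even rows start with a, odd rows start with b, matching the examples in the question
    row_is_even = tf.range(tf.shape(a)[0]) % 2 == 0
    return tf.where(row_is_even[:, None], ab, ba)

a = tf.ones(shape=(3, 3))
b = tf.zeros(shape=(3, 3))
print(checkerboard_concat(a, b))  # rows: 1 0 1 0 1 0 / 0 1 0 1 0 1 / 1 0 1 0 1 0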
How to find the correlation-peak values and coordinates of a set of 2D cross-correlation functions?
Given a 3D ndarray that contains a set of 2D cross-correlation functions, what is an efficient way to find the maximum (peak) values and their coordinates (x and y indices)?
The code below does the job, but I think it is inefficient.
import numpy as np
import numpy.matlib
ccorr = np.random.rand(7,5,5)
xind = ccorr.argmax(axis=-1)
mccorr = ccorr[np.matlib.repmat(np.arange(0,7)[:,np.newaxis],1,5),np.matlib.repmat(np.arange(0,5)[np.newaxis,:],7,1), xind]
yind = mccorr.argmax(axis=-1)
xind = xind[np.arange(0,7),yind]
values = mccorr[np.arange(0,7),yind]
print("cross-correlation functions (z,y,x)")
print(ccorr)
print("x and y indices of the maximum values")
print(xind,yind)
print("Maximum values")
print(values)
You'll want to flatten the dimensions you're searching over and then use unravel_index and take_along_axis to get the coordinates and values, respectively.
ccorr = np.random.rand(7,5,5)
cc_rav = ccorr.reshape(ccorr.shape[0], -1)
idx = np.argmax(cc_rav, axis = -1)
indices_2d = np.unravel_index(idx, ccorr.shape[1:])
vals = np.take_along_axis(cc_rav, idx[:, None], axis=1).squeeze(-1)
If you're using numpy version < 1.15:
vals = cc_rav[np.arange(ccorr.shape[0]), idx]
or:
vals = ccorr[np.arange(ccorr.shape[0]),
             indices_2d[0], indices_2d[1]]
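As a quick sanity check, vals should match a plain per-slice maximum:

assert np.allclose(vals, ccorr.max(axis=(1, 2)))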
I have to optimize the coefficients for three numpy arrays so as to maximize my evaluation function.
I have a target array called train['target'] and three prediction arrays named array1, array2 and array3.
I want to find the best linear coefficients x, y, z for these three arrays, i.e. the ones that maximize the function
roc_auc_score(train['target'], x*array1 + y*array2 + z*array3)
The above function is at its maximum when the prediction is closest to the target, i.e. when x*array1 + y*array2 + z*array3 is closest to train['target'].
The range is x, y, z >= 0 and x, y, z <= 1.
Basically I am trying to find the weights x, y, z for each of the three arrays that bring x*array1 + y*array2 + z*array3 as close as possible to train['target'].
Any help in getting this would be appreciated.
I used pulp.LpProblem('Giapetto', pulp.LpMaximize) to do the maximization. It works for ordinary numbers, integers etc., but fails when I try to do it with arrays.
import numpy as np
import pulp
from sklearn.metrics import roc_auc_score  # assuming sklearn's roc_auc_score

# create the LP object, set up as a maximization problem
prob = pulp.LpProblem('Giapetto', pulp.LpMaximize)

# set up decision variables
x = pulp.LpVariable('x', lowBound=0)
y = pulp.LpVariable('y', lowBound=0)
z = pulp.LpVariable('z', lowBound=0)

score = roc_auc_score(train['target'], x*array1 + y*array2 + z*array3)
prob += score
coef = x + y + z
prob += (coef == 1)

# solve the LP using the default solver
optimization_result = prob.solve()

# make sure we got an optimal solution
assert optimization_result == pulp.LpStatusOptimal

# display the results
for var in (x, y, z):
    print('Optimal weekly number of {} to produce: {:1.0f}'.format(var.name, var.value()))
I get an error at the line
score = roc_auc_score(train['target'],x*array1+ y*array2 + z*array3)
TypeError: unsupported operand type(s) for /: 'int' and 'LpVariable'
I can't progress beyond this line when using arrays. I'm not sure if my approach is correct. Any help in optimizing the function would be appreciated.
When you add sums of array elements to a PuLP model, you have to use built-in PuLP constructs like lpSum to do it -- you can't just add arrays together (as you discovered).
So your score definition should look something like this:
score = pulp.lpSum([train['target'][i] - (x * array1[i] + y * array2[i] + z * array3[i]) for i in arr_ind])
A few notes about this:
- You didn't provide the definition of roc_auc_score, so I just pretended that it equals the sum of the element-wise differences between the target array and the weighted sum of the other 3 arrays.
- I suspect your actual calculation for roc_auc_score is nonlinear; more on this below.
- arr_ind is a list of the indices of the arrays, which I created like this:
# build array index
arr_ind = range(len(array1))
- You also didn't include the arrays, so I created them like this:
array1 = np.random.rand(10, 1)
array2 = np.random.rand(10, 1)
array3 = np.random.rand(10, 1)
train = {}
train['target'] = np.ones((10, 1))
Here is my complete code, which compiles and executes, though I'm sure it doesn't give you the result you are hoping for, since I just guessed about target and roc_auc_score:
import numpy as np
import pulp
# create the LP object, set up as a maximization problem
prob = pulp.LpProblem('Giapetto', pulp.LpMaximize)
# dummy arrays since arrays weren't in OP code
array1 = np.random.rand(10, 1)
array2 = np.random.rand(10, 1)
array3 = np.random.rand(10, 1)
# build array index
arr_ind = range(len(array1))
# set up decision variables
x = pulp.LpVariable('x', lowBound=0)
y = pulp.LpVariable('y', lowBound=0)
z = pulp.LpVariable('z', lowBound=0)
# dummy roc_auc_score since roc_auc_score wasn't in OP code
train = {}
train['target'] = np.ones((10, 1))
score = pulp.lpSum([train['target'][i] - (x * array1[i] + y * array2[i] + z * array3[i]) for i in arr_ind])
prob += score
coef = x + y + z
prob += coef == 1
# solve the LP using the default solver
optimization_result = prob.solve()
# make sure we got an optimal solution
assert optimization_result == pulp.LpStatusOptimal
# display the results
for var in (x, y, z):
    print('Optimal weekly number of {} to produce: {:1.0f}'.format(var.name, var.value()))
Output:
Optimal weekly number of x to produce: 0
Optimal weekly number of y to produce: 0
Optimal weekly number of z to produce: 1
Now, if your roc_auc_score function is nonlinear, you will have additional troubles. I would encourage you to try to formulate the score in a way that is linear, possibly using additional variables (for example, if you want the score to be an absolute value).
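To illustrate that last point, here is a hedged sketch (variable names are mine, and the objective is just an illustration, not your actual roc_auc_score) of the standard trick for linearizing absolute values: introduce one auxiliary variable per residual, constrain it from both sides, and minimize the sum.

import numpy as np
import pulp

# Hypothetical data standing in for the OP's arrays (names are illustrative)
array1, array2, array3 = (np.random.rand(10) for _ in range(3))
target = np.ones(10)

prob = pulp.LpProblem('abs_fit', pulp.LpMinimize)
x = pulp.LpVariable('x', lowBound=0, upBound=1)
y = pulp.LpVariable('y', lowBound=0, upBound=1)
z = pulp.LpVariable('z', lowBound=0, upBound=1)
# one auxiliary variable per data point, bounding |residual_i|
t = [pulp.LpVariable('t_{}'.format(i), lowBound=0) for i in range(10)]

prob += pulp.lpSum(t)  # objective: minimize the sum of absolute residuals
for i in range(10):
    residual = target[i] - (x * array1[i] + y * array2[i] + z * array3[i])
    prob += t[i] >= residual   # t_i >= residual_i
    prob += t[i] >= -residual  # t_i >= -residual_i, together: t_i >= |residual_i|
prob += x + y + z == 1

prob.solve()
print(x.value(), y.value(), z.value())

Because the objective pushes each t_i down while the two constraints hold it above the residual and its negation, each t_i settles at exactly |residual_i| in the optimum.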