Remove duplicates with additional requirements - numpy

I have three columns (x, y, m), where x and y are coordinates and m is a measurement. Some rows are duplicates, defined as rows sharing the same (x, y). Within each group of duplicates, I want to keep only the row with the minimum m. Here is an example:
x = np.array([1,1,2,2,1,1,2])
y = np.array([1,2,1,2,1,1,1])
m = np.array([10,2,13,4,6,15,7])
There are three duplicates with coordinates (1,1); among them, the minimum m is 6. There are two duplicates with coordinates (2,1); among them, the minimum m is 7. So the final result I want is:
x = np.array([1,2,1,2])
y = np.array([2,2,1,1])
m = np.array([2,4,6,7])
numpy.unique alone cannot handle this situation. Any thoughts?

We could use pandas here for a cleaner solution -
import pandas as pd
In [43]: df = pd.DataFrame({'x':x,'y':y,'m':m})
In [46]: out_df = df.loc[df.groupby(['x','y'])['m'].idxmin()]  # idxmin returns index labels, so index with .loc
# Format #1 : Final output as a 2D array
In [47]: out_df.values
Out[47]:
array([[1, 1, 6],
[1, 2, 2],
[2, 1, 7],
[2, 2, 4]])
# Format #2 : Final output as three separate 1D arrays
In [50]: X,Y,M = out_df.values.T
In [51]: X
Out[51]: array([1, 1, 2, 2])
In [52]: Y
Out[52]: array([1, 2, 1, 2])
In [53]: M
Out[53]: array([6, 2, 7, 4])

You can try something like this:
import collections
import numpy as np

x = np.array([1,1,2,2,1,1,2])
y = np.array([1,2,1,2,1,1,1])
m = np.array([10,2,13,4,6,15,7])

# Keep the smallest m seen so far for each (x, y) pair, in first-seen order.
results = collections.OrderedDict()
for xi, yi, mi in zip(x, y, m):
    key = (xi, yi)
    if key not in results or mi < results[key]:
        results[key] = mi

# Unpack the surviving pairs and their minimum measurements.
x = np.array([key[0] for key in results])
y = np.array([key[1] for key in results])
m = np.array(list(results.values()))
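Since the question asks for NumPy, here is a minimal sketch of a pure-NumPy alternative, assuming the three arrays are 1D and of equal length. It sorts with np.lexsort so that rows sharing (x, y) become adjacent with the smallest m first, then keeps the first row of each group. The names order, first, x_out, y_out and m_out are illustrative only.

```python
import numpy as np

x = np.array([1, 1, 2, 2, 1, 1, 2])
y = np.array([1, 2, 1, 2, 1, 1, 1])
m = np.array([10, 2, 13, 4, 6, 15, 7])

# np.lexsort treats the LAST key as the primary key: sort by x, then y, then m.
order = np.lexsort((m, y, x))
xs, ys, ms = x[order], y[order], m[order]

# A row starts a new (x, y) group when it differs from the previous sorted row.
first = np.ones(len(xs), dtype=bool)
first[1:] = (xs[1:] != xs[:-1]) | (ys[1:] != ys[:-1])

# Within each group the rows are sorted by m, so the first row holds the minimum.
x_out, y_out, m_out = xs[first], ys[first], ms[first]
print(x_out, y_out, m_out)
```

The result comes out sorted by (x, y), the same order as the pandas output, rather than in the order shown in the question.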


Python: create (sparse) stacked diagonal block matrix

I need to create a matrix with the form
M=[
[a1, 0, 0],
[0, b1, 0],
[0, 0, c1],
[a2, 0, 0],
[0, b2, 0],
[0, 0, c2],
[a3, 0, 0],
[0, b3, 0],
[0, 0, c3],
...]
where a(i), b(i) and c(i) are [1xp] blocks. The resulting matrix M has the form [3m x 3p]. I am given the input data in the form of 3 matrices [m x p]:
A = [[a1.T, a2.T, a3.T, ...]].T
B = [[b1.T, b2.T, b3.T, ...]].T
C = [[c1.T, c2.T, c3.T, ...]].T
How can I create the matrix M? Ideally it would be sparse, using the scipy.sparse library, but I am struggling to create it even as a dense matrix using numpy. Is there no way around a loop, or at least a list comprehension, in this case?
No need to make it complicated. For your scale, the following executes in less than a second.
import numpy as np
import scipy.sparse
from numpy.random import default_rng
rand = default_rng(seed=0)
m = 70_000
p = 20
abc = rand.random((3, m, p))
M_dense = np.zeros((m, 3, 3*p))
for i in range(3):
    M_dense[:, i, i*p:(i+1)*p] = abc[i, ...]
M_sparse = scipy.sparse.csr_matrix(M_dense.reshape((-1, 3*p)))
print(M_sparse.shape)
(210000, 60)
Far better, though, is to construct the sparse matrix directly. Note the permuted shape of abc.
abc = rand.random((m, 3, p))
data = abc.ravel()
indices = np.tile(np.arange(3*p), m)
indptr = np.arange(0, data.size+1, p)
M_sparse = scipy.sparse.csr_matrix((data, indices, indptr))
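As a sanity check on the direct construction, the following sketch builds a small case both ways and compares them. The sizes m = 4 and p = 2 are arbitrary, and the explicit shape argument is an addition here (the original lets SciPy infer it). It assumes abc[j, 0], abc[j, 1] and abc[j, 2] hold the blocks a_j, b_j and c_j.

```python
import numpy as np
import scipy.sparse

m, p = 4, 2                       # small illustrative sizes
rng = np.random.default_rng(0)
abc = rng.random((m, 3, p))       # abc[j, i] is block i (a, b or c) of group j

# CSR layout: row 3*j + i stores p values in columns i*p .. (i+1)*p - 1.
data = abc.ravel()
indices = np.tile(np.arange(3 * p), m)
indptr = np.arange(0, data.size + 1, p)
M_sparse = scipy.sparse.csr_matrix((data, indices, indptr), shape=(3 * m, 3 * p))

# Dense reference built with the loop-based approach.
M_dense = np.zeros((m, 3, 3 * p))
for i in range(3):
    M_dense[:, i, i * p:(i + 1) * p] = abc[:, i, :]
M_dense = M_dense.reshape(-1, 3 * p)

print(np.allclose(M_sparse.toarray(), M_dense))
```

Each row of the result holds exactly p stored values, which is why indptr advances in steps of p.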

Numpy.polyfit Not Returning Polynomial

I am trying to create a python program in which the user inputs a set of data and the program spits out an output in which it creates a graph with a line/polynomial which best fits the data.
This is the code:
from matplotlib import pyplot as plt
import numpy as np
x = []
y = []
x_num = 0
while True:
    sequence = int(input("Input 1 number in the sequence, type 9040321 to stop"))
    if sequence == 9040321:
        poly = np.polyfit(x, y, deg=2, rcond=None, full=False, w=None, cov=False)
        plt.plot(poly)
        plt.scatter(x, y, c="blue", label="data")
        plt.legend()
        plt.show()
        break
    else:
        y.append(sequence)
        x.append(x_num)
        x_num += 1
I ran the program and entered 1, 2, 4, 8, each as a separate input. Matplotlib plotted the points properly; however, for a degree of 2, the output was the following image:
This is clearly not correct, but I am unsure what the problem is. I think it has something to do with the degree, but when I change the degree to 3, it still does not fit. I am looking for a curve like y=sqrt(x) that goes through each of the points and, when that is not possible, the curve that fits best.
Edit: I added a print(poly) call and, for the input above, it gives [0.75 0.05 1.05]. I do not know what to make of this.
Approximation by a second degree polynomial
np.polyfit gives the coefficients of a polynomial close to the given points. To plot the polynomial as a smooth curve with matplotlib, you need to calculate a lot of x,y pairs. Using np.linspace(start, stop, numsteps) for the xs, numpy's vectorization allows calculating all the corresponding ys in one go. E.g. ys = a * x**2 + b * x + c.
from matplotlib import pyplot as plt
import numpy as np
x = [0, 1, 2, 3, 4, 5, 6]
y = [1, 2, 4, 8, 16, 32, 64]
plt.scatter(x, y, color='crimson', label='given points')
poly = np.polyfit(x, y, deg=2, rcond=None, full=False, w=None, cov=False)
xs = np.linspace(min(x), max(x), 100)
ys = poly[0] * xs ** 2 + poly[1] * xs + poly[2]
plt.plot(xs, ys, color='dodgerblue', label=f'$({poly[0]:.2f})x^2+({poly[1]:.2f})x + ({poly[2]:.2f})$')
plt.legend()
plt.show()
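For reference, here is a small sketch using the question's own input (1, 2, 4, 8 at x = 0, 1, 2, 3). It shows that np.polyfit returns only the coefficients, highest power first, and that np.polyval can evaluate them instead of writing the polynomial out by hand. Plotting is left out so the snippet runs without a display; xs and ys are what would be passed to plt.plot.

```python
import numpy as np

x = [0, 1, 2, 3]
y = [1, 2, 4, 8]

# polyfit returns coefficients only, ordered from the highest power down.
coeffs = np.polyfit(x, y, deg=2)
print(np.round(coeffs, 2))

# Evaluate the fitted polynomial on a dense grid to get a smooth curve.
xs = np.linspace(min(x), max(x), 100)
ys = np.polyval(coeffs, xs)

# Same values as writing the polynomial out explicitly.
ys_manual = coeffs[0] * xs**2 + coeffs[1] * xs + coeffs[2]
print(np.allclose(ys, ys_manual))
```

Calling plt.plot(poly) as in the question plots the three coefficients against their indices 0, 1, 2, which is why that graph did not follow the data.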
Higher degree approximating polynomials
Given N points with distinct x values, a polynomial of degree N-1 can pass exactly through each of them. Here is an example with 7 points and polynomials of up to degree 6:
from matplotlib import pyplot as plt
import numpy as np
x = [0, 1, 2, 3, 4, 5, 6]
y = [1, 2, 4, 8, 16, 32, 64]
plt.scatter(x, y, color='black', zorder=3, label='given points')
for degree in range(0, len(x)):
    poly = np.polyfit(x, y, deg=degree, rcond=None, full=False, w=None, cov=False)
    xs = np.linspace(min(x) - 0.5, max(x) + 0.5, 100)
    ys = sum(poly_i * xs**i for i, poly_i in enumerate(poly[::-1]))
    plt.plot(xs, ys, label=f'degree {degree}')
plt.legend()
plt.show()
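The claim about exact interpolation can be checked numerically. This sketch, which assumes the same seven points, records the largest absolute error at the given points for each degree; the errors dictionary is an illustrative name.

```python
import numpy as np

x = np.array([0, 1, 2, 3, 4, 5, 6])
y = np.array([1, 2, 4, 8, 16, 32, 64])

# Largest absolute deviation at the given points, for each polynomial degree.
errors = {}
for degree in range(len(x)):
    coeffs = np.polyfit(x, y, deg=degree)
    errors[degree] = np.max(np.abs(np.polyval(coeffs, x) - y))
    print(degree, errors[degree])
```

The error should fall to roughly floating-point noise only at degree len(x) - 1; lower degrees are least-squares compromises.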
Another example
x = [0, 1, 2, 3, 4]
y = [1, 1, 6, 5, 5]
import numpy as np
import matplotlib.pyplot as plt
x = [1, 2, 3, 4]
y = [1, 2, 4, 8]
coeffs = np.polyfit(x, y, 2)
print(coeffs)
poly = np.poly1d(coeffs)
print(poly)
x_cont = np.linspace(0, 4, 81)
y_cont = poly(x_cont)
plt.scatter(x, y)
plt.plot(x_cont, y_cont)
plt.grid(1)
plt.show()
Executing the code, you have the graph above and this is printed in the terminal:
[ 0.75 -1.45  1.75]
      2
0.75 x - 1.45 x + 1.75
It seems to me that you had false expectations about the output of polyfit.

Multiply every row of a matrix with every row of another matrix

In numpy / PyTorch, I have two matrices, e.g. X=[[1,2],[3,4],[5,6]], Y=[[1,1],[2,2]]. I would like to dot product every row of X with every row of Y, and have the results
[[3, 6],[7, 14], [11,22]]
How do I achieve this? Thanks!
I think this is what you are looking for:
import numpy as np
x= [[1,2],[3,4],[5,6]]
y= [[1,1],[2,2]]
x = np.asarray(x) #convert list to numpy array
y = np.asarray(y) #convert list to numpy array
product = np.dot(x, y.T)
.T transposes the matrix, which is necessary here because np.dot multiplies the rows of x by the columns of its second argument. print(product) will output:
[[ 3 6]
[ 7 14]
[11 22]]
Using einsum
np.einsum('ij,kj->ik', X, Y)
array([[ 3, 6],
[ 7, 14],
[11, 22]])
In PyTorch, you can achieve this using torch.mm(a, b) or torch.matmul(a, b), as shown below:
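import numpy as np
import torch

x = np.array([[1,2],[3,4],[5,6]])
y = np.array([[1,1],[2,2]])
# Convert the NumPy arrays to tensors (shares memory with the arrays)
x = torch.from_numpy(x)
y = torch.from_numpy(y)
# print(torch.matmul(x, torch.t(y)))
print(torch.mm(x, torch.t(y)))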
output:
tensor([[ 3, 6],
[ 7, 14],
[11, 22]], dtype=torch.int32)
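To make the equivalence between the NumPy approaches explicit, here is a short sketch that computes the same row-by-row dot products three ways: a matrix product with the transpose, einsum, and plain broadcasting. The variable names are illustrative.

```python
import numpy as np

X = np.array([[1, 2], [3, 4], [5, 6]])
Y = np.array([[1, 1], [2, 2]])

# Entry (i, k) of each result is the dot product of row i of X with row k of Y.
via_matmul = X @ Y.T
via_einsum = np.einsum('ij,kj->ik', X, Y)

# Broadcasting: (3, 1, 2) * (1, 2, 2) -> (3, 2, 2), then sum over the last axis.
via_broadcast = (X[:, None, :] * Y[None, :, :]).sum(axis=-1)

print(via_matmul)
```

The broadcasting version materializes an intermediate array of shape (rows of X, rows of Y, columns), so the matrix product is usually the better choice for large inputs.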

Position variances and covariances in matrix form

We have the following sets of data, which are already given to us: A, B, C, which represent variances, and D, E, F, which represent covariances. I would like to arrange these sets of data in matrix form:
matrix: Z Y X
Z A D F
Y D B E
X F E C
How can I arrange the sets of data in matrix form, considering that I don't know the number of variances/covariances in advance?
Then I would like to multiply the resulting matrix as follows:
matrix * (G, H, I) * (G, H, I)^T
where (G, H, I)^T is the column vector with entries G, H, I.
My second question is: how do I multiply a 3*3 matrix by a 1*3 vector and a 3*1 vector?
You can use numpy.matrix and numpy.array to create your own matrix and arrays,
In [1]: import numpy as np
matrix1 = np.matrix([[1, 4, 6], [4, 2, 5],[6, 5, 3]])
array1 = np.array([7,8,9])
Second question: use numpy.transpose to build the square matrix that is the outer product of array1 with itself,
In [2]: matrix2 = array1*np.transpose([array1])
In [3]: matrix2
Out[3]: array([[49, 56, 63],
[56, 64, 72],
[63, 72, 81]])
Finally, multiply the two matrices with numpy.matmul,
In [4]: matrix3 = np.matmul(matrix1, matrix2)
In [5]: matrix3
Out[5]: matrix([[651, 744, 837],
[623, 712, 801],
[763, 872, 981]])
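For the first question (an unknown number of variances and covariances), one possible sketch fills the diagonal and the upper triangle and then mirrors it. The helper build_cov_matrix is hypothetical, and it assumes the covariances are supplied in row-major upper-triangle order, which for the 3x3 layout in the question means D, F, E. It also shows the scalar quadratic form v S v^T, in case that is the product the question intends.

```python
import numpy as np

def build_cov_matrix(variances, covariances):
    """Hypothetical helper: symmetric matrix from a diagonal and an upper triangle.

    Assumes covariances are given in row-major upper-triangle order,
    e.g. (0,1), (0,2), (1,2) for a 3x3 matrix.
    """
    variances = np.asarray(variances)
    covariances = np.asarray(covariances)
    n = variances.size
    if covariances.size != n * (n - 1) // 2:
        raise ValueError("expected n*(n-1)/2 covariances for n variances")
    S = np.zeros((n, n), dtype=np.result_type(variances, covariances))
    S[np.diag_indices(n)] = variances
    S[np.triu_indices(n, k=1)] = covariances
    # Mirror the strict upper triangle into the lower triangle.
    return S + np.triu(S, k=1).T

# Same numbers as matrix1 above: A, B, C = 1, 2, 3 and D, F, E = 4, 6, 5.
S = build_cov_matrix([1, 2, 3], [4, 6, 5])
v = np.array([7, 8, 9])

print(S)
print(v @ S @ v)   # scalar quadratic form v S v^T
```

A plain 2D np.array is used here instead of np.matrix, which the NumPy documentation no longer recommends for new code.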

Managing high dimensions in Numpy

I want to write a function of 4 variables : f(x1,x2,x3,x4), each in a different dimension.
This can be achieved by f(x1,x2[newaxis],x3[newaxis,newaxis],x4[newaxis,newaxis,newaxis]).
Do you know a smarter way?
You're looking for np.ix_ [1]:
f(*np.ix_(x1, x2, x3, x4))
For example:
>>> np.ix_([1, 2, 3], [4, 5])
(array([[1],
[2],
[3]]), array([[4, 5]]))
[1] Or equivalently, np.meshgrid(..., sparse=True, indexing='ij')
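A short sketch of how this plays out with four inputs. The function f here is a made-up example (any elementwise expression would do), and the array lengths are arbitrary. It also checks that the sparse meshgrid call yields arrays of the same shapes.

```python
import numpy as np

def f(a, b, c, d):
    # Made-up elementwise function; broadcasting does the rest.
    return a + 10 * b + 100 * c + 1000 * d

x1, x2, x3, x4 = np.arange(2), np.arange(3), np.arange(4), np.arange(5)

# np.ix_ puts x1 along axis 0, x2 along axis 1, and so on.
grids = np.ix_(x1, x2, x3, x4)
out = f(*grids)
print([g.shape for g in grids])
print(out.shape)

# The sparse meshgrid form gives arrays with the same shapes.
mesh = np.meshgrid(x1, x2, x3, x4, sparse=True, indexing='ij')
print([g.shape for g in mesh])
```

Here out[i, j, k, l] equals f(x1[i], x2[j], x3[k], x4[l]), so each input varies along its own axis of the result.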
One way would be to reshape each array giving appropriate number of singleton dimensions along the leading axes. To do this across all arrays, we could use a list comprehension.
Thus, one way to handle generic number of input arrays would be -
L = [x1,x2,x3,x4]
out = [l.reshape([1]*i + [len(l)]) for i,l in enumerate(L)]
Sample run -
In [186]: # Initialize input arrays
...: x1 = np.random.randint(0,9,(4))
...: x2 = np.random.randint(0,9,(2))
...: x3 = np.random.randint(0,9,(5))
...: x4 = np.random.randint(0,9,(3))
...:
In [187]: A = x1,x2[None],x3[None,None],x4[None,None,None]
In [188]: L = [x1,x2,x3,x4]
...: out = [l.reshape([1]*i + [len(l)]) for i,l in enumerate(L)]
...:
In [189]: A
Out[189]:
(array([2, 1, 1, 1]),
array([[8, 2]]),
array([[[0, 3, 5, 8, 7]]]),
array([[[[6, 7, 0]]]]))
In [190]: out
Out[190]:
[array([2, 1, 1, 1]),
array([[8, 2]]),
array([[[0, 3, 5, 8, 7]]]),
array([[[[6, 7, 0]]]])]