How to sort a two-dimensional array in descending order for a column? - numpy

array([[ 0. , 0.04],
[ 0. , 0.1 ],
[ 0. , 0.2 ],
[ 0. , 0.4 ],
[ 0.27, 1. ],
[ 0.3 , 1. ]])
How can I sort the array by the second column in descending order in a simple way?
The result's shape should also be (6, 2).

Get argsort indices for the second column, flip them and index into rows -
a[a[:,1].argsort()[::-1]]
Alternatively, get argsort indices on negated version and index into rows -
a[(-a[:,1]).argsort()]
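Both one-liners can be checked on the array from the question. This is a minimal sketch; the names by_flip and by_negation are only illustrative. Note that rows with equal values in the second column may come out in a different relative order in the two variants, since flipping also reverses the order of ties.

```python
import numpy as np

a = np.array([[0.0, 0.04],
              [0.0, 0.1],
              [0.0, 0.2],
              [0.0, 0.4],
              [0.27, 1.0],
              [0.3, 1.0]])

# Variant 1: ascending argsort on column 1, then reverse the index order
by_flip = a[a[:, 1].argsort()[::-1]]

# Variant 2: ascending argsort on the negated column 1
by_negation = a[(-a[:, 1]).argsort()]

print(by_flip[:, 1])   # second column, now non-increasing
print(by_flip.shape)   # shape is unchanged
```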


Addressing polynomial multiplication and division "overflow" issue

I have a list of coefficients of degree-1 polynomials, where row i represents a[i][0]*x^1 + a[i][1]:
a = np.array([[ 1. , 77.48514702],
[ 1. , 0. ],
[ 1. , 2.4239275 ],
[ 1. , 1.21848739],
[ 1. , 0. ],
[ 1. , 1.18181818],
[ 1. , 1.375 ],
[ 1. , 2. ],
[ 1. , 2. ],
[ 1. , 2. ]])
I am running into issues with the following operation,
np.polydiv(reduce(np.polymul, a), a[0])[0] != reduce(np.polymul, a[1:])
where
In [185]: reduce(np.polymul, a[1:])
Out[185]:
array([ 1. , 12.19923307, 63.08691612, 179.21045388,
301.91486027, 301.5756213 , 165.35814595, 38.39582615,
0. , 0. ])
and
In [186]: np.polydiv(reduce(np.polymul, a), a[0])[0]
Out[186]:
array([ 1.00000000e+00, 1.21992331e+01, 6.30869161e+01, 1.79210454e+02,
3.01914860e+02, 3.01575621e+02, 1.65358169e+02, 3.83940472e+01,
1.37845155e-01, -1.06809521e+01])
First of all, the remainder of np.polydiv(reduce(np.polymul, a), a[0]) is far from 0 (827.61514239, to be exact). Secondly, the last two terms of the quotient should be 0 but are far from it: 1.37845155e-01 and -1.06809521e+01.
What are my options for improving the accuracy?
There is a slightly complicated way to keep the "multiply first, then divide" structure.
First, pick n points and evaluate the product of the polynomials in a at each of them.
xs = np.linspace(0, 1., 10)
ys = np.array([np.prod(list(map(lambda r: np.polyval(r, x), a))) for x in xs])
Then do the division on ys instead of on the coefficients.
ys = ys/np.array([np.polyval(a[0], x) for x in xs])
Finally, recover the coefficients using polynomial interpolation with xs and ys.
from scipy.interpolate import lagrange
lagrange(xs, ys)

How can the numpy computing sequence influence the result?

Why do the following two lines of code, which should compute the same thing, give different results?
kernel1 = np.diag(np.exp(-scale*eigen_values))
kernel2 = np.exp(-scale*np.diag(eigen_values))
The check
np.all(kernel1==kernel2)
outputs
False
Look at the values! Then you'll see the problem: when given a 1-d array, numpy.diag creates a 2-d array with zeros in the off-diagonal positions. In kernel1, you do diag last, so the off-diagonal values are 0. In kernel2, you apply exp after diag, and exp(0) is 1, so in kernel2, the off-diagonal terms are all 1. (Remember that numpy.exp is applied element-wise; it is not the matrix exponential.)
In [19]: eigen_values = np.array([1, 0.5, 0.1])
In [20]: scale = 1.0
In [21]: np.diag(np.exp(-scale*eigen_values))
Out[21]:
array([[0.36787944, 0. , 0. ],
[0. , 0.60653066, 0. ],
[0. , 0. , 0.90483742]])
In [22]: np.exp(-scale*np.diag(eigen_values))
Out[22]:
array([[0.36787944, 1. , 1. ],
[1. , 0.60653066, 1. ],
[1. , 1. , 0.90483742]])
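The difference can also be confirmed programmatically. The sketch below reuses the values from the session above; the names off_diagonal and kernel2_fixed are illustrative, and zeroing the off-diagonal entries is just one possible way to make the second form match the first.

```python
import numpy as np

eigen_values = np.array([1, 0.5, 0.1])
scale = 1.0

kernel1 = np.diag(np.exp(-scale * eigen_values))   # exp first, then diag
kernel2 = np.exp(-scale * np.diag(eigen_values))   # diag first, then exp

# Boolean mask selecting everything except the main diagonal
off_diagonal = ~np.eye(len(eigen_values), dtype=bool)

# The diagonals agree; only the off-diagonal entries differ (0 vs exp(0) = 1)
same_diagonal = np.allclose(np.diag(kernel1), np.diag(kernel2))
print(same_diagonal, kernel1[off_diagonal], kernel2[off_diagonal])

# Resetting the off-diagonal entries to 0 recovers kernel1
kernel2_fixed = np.where(off_diagonal, 0.0, kernel2)
```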

Data standardization, across samples or across features?

I have data for 4 samples with 5 features each, stored in an array, data.
import numpy as np
data = np.array([[1,1,1,1,0],
[0,0,0,0,0],
[1,1,1,1,0],
[1,0,0,0,0]])
print (data)
n_samples, n_features = data.shape  # (4, 5)
When I apply StandardScaler on it as follows, does it standardize the data across features or across samples?
from sklearn.preprocessing import StandardScaler, MinMaxScaler
result = StandardScaler().fit_transform(data)
print (result)
[[ 0.57735027 1. 1. 1. 0. ]
[-1.73205081 -1. -1. -1. 0. ]
[ 0.57735027 1. 1. 1. 0. ]
[ 0.57735027 -1. -1. -1. 0. ]]
What's the best practice of data standardization in machine learning, across samples or across features?
StandardScaler and MinMaxScaler scale each feature (column) separately, using statistics computed over all samples, and this is the common practice.
import numpy as np
from sklearn.preprocessing import StandardScaler
data = np.array([[1,1,1,1,0],
[0,0,0,0,0],
[1,1,1,1,0],
[1,0,0,0,0]])
result = StandardScaler().fit_transform(data)
result
array([[ 0.57735027, 1. , 1. , 1. , 0. ],
[-1.73205081, -1. , -1. , -1. , 0. ],
[ 0.57735027, 1. , 1. , 1. , 0. ],
[ 0.57735027, -1. , -1. , -1. , 0. ]])
You can verify it yourself:
(data - data.mean(0))/data.std(0).clip(1e-5)
array([[ 0.57735027, 1. , 1. , 1. , 0. ],
[-1.73205081, -1. , -1. , -1. , 0. ],
[ 0.57735027, 1. , 1. , 1. , 0. ],
[ 0.57735027, -1. , -1. , -1. , 0. ]])
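To make the two options from the question concrete, here is a NumPy-only sketch that standardizes the same data per feature (axis=0) and per sample (axis=1). The helper standardize is hypothetical, and replacing a zero standard deviation with 1 is an assumption of this sketch so that constant columns or rows come out as 0 rather than causing a division by zero.

```python
import numpy as np

data = np.array([[1, 1, 1, 1, 0],
                 [0, 0, 0, 0, 0],
                 [1, 1, 1, 1, 0],
                 [1, 0, 0, 0, 0]], dtype=float)

def standardize(x, axis):
    """Subtract the mean and divide by the standard deviation along `axis`."""
    mean = x.mean(axis=axis, keepdims=True)
    std = x.std(axis=axis, keepdims=True)
    std[std == 0] = 1.0  # assumption: constant slices are left at 0
    return (x - mean) / std

per_feature = standardize(data, axis=0)  # statistics per column, over samples
per_sample = standardize(data, axis=1)   # statistics per row, over features

print(per_feature)
print(per_sample)
```

Only per_feature reproduces the StandardScaler output shown above.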

sort a Numpy ndarray of classes and coordinates by min value from left to right

I want to sort a numpy array that contains the classes of objects found in an image together with their coordinates. The order should start in the top left corner of the image and work through row-wise until the bottom right corner.
My numpy array:
import numpy as np
columns=['classses','ymin','xmin','ymax','xmax']
arr=np.array([[10., 0.50835305, 0.47248545, 0.59892374, 0.51885366],
[11., 0.36795592, 0.52040386, 0.46757331, 0.56760514],
[ 4., 0.24611123, 0.29460225, 0.34236759, 0.34000006],
[ 2. , 0.37274304, 0.38200337, 0.46354109, 0.4273783 ],
[ 2. , 0.510912 , 0.37931672, 0.59918219, 0.42638448],
[11. , 0.10971789, 0.51647586, 0.20377752, 0.562015 ],
[ 7. , 0.51268667, 0.24481608, 0.59831458, 0.29086089],
[10. , 0.24716213, 0.47549573, 0.33929491, 0.52023494],
[ 1. , 0.37433949, 0.61748177, 0.46359614, 0.65206224],
[ 7. , 0.24870941, 0.24960253, 0.33646214, 0.29458734],
[11. , 0.24345258, 0.51865327, 0.33831981, 0.565395 ],
[ 8. , 0.11206201, 0.33702213, 0.19984987, 0.38336146],
[10. , 0.24955718, 0.6559478 , 0.34239537, 0.70276546],
[ 2. , 0.24712075, 0.38360605, 0.33835301, 0.42949697],
[ 4. , 0.51084387, 0.29126126, 0.59996665, 0.33353919],
[ 8. , 0.51466578, 0.33362284, 0.60250646, 0.37810257],
[ 6. , 0.510656 , 0.56336159, 0.59472215, 0.61143786],
[ 2. , 0.1192565 , 0.69437939, 0.2057956 , 0.73883325],
[ 7. , 0.11934 , 0.25181183, 0.20320818, 0.29591617],
[ 9. , 0.51130402, 0.65646565, 0.59214538, 0.70244706],
[ 3. , 0.11690334, 0.56094837, 0.20533638, 0.60812557],
[11. , 0.50439239, 0.51784241, 0.59443074, 0.56629324],
[ 7. , 0.37829998, 0.24856552, 0.46135774, 0.29153487],
[ 4. , 0.37588719, 0.29197016, 0.46272004, 0.33599868],
[ 1. , 0.37316957, 0.57077163, 0.46224919, 0.60553724],
[10. , 0.1145431 , 0.47239822, 0.20014074, 0.5183605 ],
[10. , 0.37647596, 0.65606439, 0.46242031, 0.70245349],
[ 1. , 0.24754623, 0.61552459, 0.34198812, 0.65568751],
[10. , 0.37339926, 0.47152713, 0.461395 , 0.52023202],
[10. , 0.37436292, 0.69828469, 0.46418577, 0.74559146],
[ 6. , 0.37082726, 0.42555344, 0.4643003 , 0.47343689],
[ 9. , 0.5126825 , 0.69970727, 0.59857124, 0.74693108],
[ 2. , 0.1202545 , 0.3842268 , 0.19877489, 0.42925853],
[ 5. , 0.24687886, 0.5643267 , 0.33911708, 0.61170775],
[10. , 0.12104956, 0.65108246, 0.21425578, 0.69579262],
[ 6. , 0.24587491, 0.42739749, 0.33760101, 0.47690719],
[ 8. , 0.24526763, 0.33704251, 0.33957234, 0.38356996],
[ 4. , 0.1150065 , 0.29550964, 0.20008969, 0.3379634 ],
[ 6. , 0.514301 , 0.42620456, 0.59742886, 0.47339022],
[ 1. , 0.24682792, 0.7001856 , 0.34188086, 0.74008971],
[ 8. , 0.11335434, 0.42906916, 0.19882832, 0.47424948],
[ 1. , 0.11596378, 0.61286598, 0.20856762, 0.64871949],
[ 8. , 0.37103209, 0.33494309, 0.46368858, 0.38201007],
[ 6. , 0.37533277, 0.33500299, 0.46548373, 0.38105384]])
The array's shape is (44, 5).
I converted the array to a pandas DataFrame, multiplied the values by the actual height and width of the image, and computed the mean X and Y values from their min and max values.
import pandas as pd
df=pd.DataFrame(arr.copy(),index=None,columns=['classses','ymin','xmin','ymax','xmax'])
df['ymin']=(df['ymin']+df['ymax'])*1080/2
df['xmin']=(df['xmin']+df['xmax'])*1920/2
df=df.drop(columns=['xmax','ymax'])
## now it's rather y and x actually
df.sort_values(by=['ymin','xmin'])
Output:
classses ymin xmin
11 8.0 168.432415 691.568246
40 8.0 168.578636 867.185894
5 11.0 169.287521 1035.351226
25 10.0 169.929274 951.128371
37 4.0 170.151943 608.134118
32 2.0 172.275871 780.945917
20 3.0 174.009449 1122.310982
18 7.0 174.176017 525.818880
41 1.0 175.246956 1211.122051
...
Although class 8 is located pretty far to the top left, it does not have the lowest value for both X and Y.
I've also tried argsort() and lexsort(), and also converting to a list and using sorted() with operator.itemgetter(), but sorting on both columns gave the same results.
I also thought about using pop() and argmin() to get the min value of each column and then using the pandas index to get the corresponding class. But I guess that would be a problem as soon as I arrive at the end of each row.
Thanks in advance!
Here you can see a (not so accurate) plot of the objects on the image
One possible mistake is the sort direction. To go from top left to bottom right when the y-axis points upward (origin at the bottom left, as in a plot), you need to sort the y-axis with ascending=False and the x-axis with ascending=True.
Try df.sort_values(by=['ymin','xmin'],ascending=[False,True])
This will at least give you something in the first row.
However, if you strictly want the top-left class first, you first need to set up some rules to decide which objects are in the same row. This is another question.
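One possible rule for deciding which objects share a row is to sort the y-centers and start a new row wherever the jump between consecutive centers exceeds a threshold. The sketch below is a hedged illustration, not a definitive solution: the function sort_reading_order and its row_gap parameter are hypothetical, the boxes are assumed to be laid out as [class, ymin, xmin, ymax, xmax], y is assumed to grow downward (image coordinates), and the demo boxes are made up.

```python
import numpy as np

def sort_reading_order(boxes, row_gap=0.05):
    """Order boxes row by row (top to bottom), left to right within a row."""
    boxes = np.asarray(boxes, dtype=float)
    y_center = (boxes[:, 1] + boxes[:, 3]) / 2
    x_center = (boxes[:, 2] + boxes[:, 4]) / 2

    # Walk through the boxes from top to bottom
    order_y = np.argsort(y_center)
    gaps = np.diff(y_center[order_y])

    # A new row starts wherever the jump in y-center exceeds row_gap
    row_of_sorted = np.concatenate(([0], np.cumsum(gaps > row_gap)))
    row_id = np.empty(len(boxes), dtype=int)
    row_id[order_y] = row_of_sorted

    # np.lexsort treats the LAST key as the primary one: row first, then x
    order = np.lexsort((x_center, row_id))
    return boxes[order]

# Made-up demo: two rows with two boxes each, given out of order
demo = np.array([[1, 0.10, 0.50, 0.20, 0.60],
                 [2, 0.12, 0.10, 0.22, 0.20],
                 [3, 0.40, 0.30, 0.50, 0.40],
                 [4, 0.38, 0.05, 0.48, 0.15]])

ordered = sort_reading_order(demo)
print(ordered[:, 0])
```

The threshold has to be chosen to fit the data: it should be larger than the vertical jitter within a row and smaller than the distance between rows.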

one-hot encoding and existing data

I have a numpy array (N,M) where some of the columns should be one-hot encoded. Please help to make a one-hot encoding using numpy and/or tensorflow.
Example:
[
[ 0.993, 0, 0.88 ]
[ 0.234, 1, 1.00 ]
[ 0.235, 2, 1.01 ]
.....
]
The 2nd column here (with values 0, 1 and 2 above) should be one-hot encoded; I know that there are only 3 distinct values (0, 1, 2).
The resulting array should look like:
[
[ 0.993, 0.88, 0, 0, 0 ]
[ 0.234, 1.00, 0, 1, 0 ]
[ 0.235, 1.01, 1, 0, 0 ]
.....
]
That way I would be able to feed this array into tensorflow.
Please notice that the 2nd column was removed and its one-hot version was appended at the end of each sub-array.
Any help would be highly appreciated.
Thanks in advance.
Update:
Well, not exactly...
1. I have more than 3 columns in the array, but I still want to encode only the 2nd one.
2. The first array is structured, i.e. its shape is (N,).
Here is what I have right now:
def one_hot(value, max_value):
    value = int(value)
    a = np.zeros(max_value, 'uint8')
    if value != 0:
        a[value] = 1
    return a
# data is structured array with the shape of (N,)
# it has strings, ints, floats inside..
# was get by np.genfromtxt(dtype=None)
unique_values = dict()
unique_values['categorical1'] = 1
unique_values['categorical2'] = 2
for row in data:
    row[col] = unique_values[row[col]]
codes = np.zeros((data.shape[0], len(unique_values)))
idx = 0
for row in data:
    codes[idx] = one_hot(row[col], len(unique_values))  # could be optimised by not creating new array every time
    idx += 1
data = np.c_[data[:, [range(0, col), range(col + 1, 32)]], codes[data[:, col].astype(int)]]
I am also trying to concatenate via:
print(data.shape)  # shape (5000,)
print(codes.shape)  # shape (5000,3)
data = np.concatenate((data, codes), axis=1)
Here's one approach -
In [384]: a # input array
Out[384]:
array([[ 0.993, 0. , 0.88 ],
[ 0.234, 1. , 1. ],
[ 0.235, 2. , 1.01 ]])
In [385]: codes = np.array([[0,0,0],[0,1,0],[1,0,0]]) # define codes here
In [387]: codes
Out[387]:
array([[0, 0, 0], # encoding for 0
[0, 1, 0], # encoding for 1
[1, 0, 0]]) # encoding for 2
# Slice out the second column and append one-hot encoded array
In [386]: np.c_[a[:,[0,2]], codes[a[:,1].astype(int)]]
Out[386]:
array([[ 0.993, 0.88 , 0. , 0. , 0. ],
[ 0.234, 1. , 0. , 1. , 0. ],
[ 0.235, 1.01 , 1. , 0. , 0. ]])
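If the conventional one-hot layout is acceptable (value k sets position k, which differs from the custom codes table above), the lookup table can be generated with np.eye instead of being written by hand. This is a sketch under that assumption; one_hot_column is a hypothetical helper, and it expects a plain 2-D numeric array rather than a structured one.

```python
import numpy as np

def one_hot_column(a, col, n_values):
    """Drop column `col` and append its one-hot encoding at the end."""
    a = np.asarray(a, dtype=float)
    idx = a[:, col].astype(int)        # category indices 0 .. n_values - 1
    codes = np.eye(n_values)[idx]      # row k of the identity encodes value k
    rest = np.delete(a, col, axis=1)   # all columns except the encoded one
    return np.hstack([rest, codes])

a = np.array([[0.993, 0, 0.88],
              [0.234, 1, 1.00],
              [0.235, 2, 1.01]])

encoded = one_hot_column(a, col=1, n_values=3)
print(encoded)
```

Because np.delete takes the column index directly, the same call works when the array has more than 3 columns.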