Numpy Numpty - HexBytes to String Literal - numpy

I read a file with np.fromfile using a structured dtype, and one field holds raw bytes ('V2'). It looks like this:
[[b'\x00\x00', b'\x05\x01', b'\x00\x00', b'\x00\x00', b'\x00\x00' .....],
...
[b'\x00\x00', b'\x05\x01', b'\x00\x00', b'\x00\x00', b'\x00\x00' .....]]
The sub-array has shape (44640, 50).
I'd like to decode this entire array into hex strings and keep the same shape
(e.g. each 2-byte chunk such as b'\x05\x01' becomes '0501').
I tried iterating through it with the bytes.hex() instance method, but that doesn't keep the 2 bytes x 50 structure.
Always extremely grateful for your time and advice...
Code copied from a comment, with a guess as to the line breaks:
dt3 = np.dtype([('DIG', 'u1', (digField)), ('ANL', 'V2', (anField)), ('MSG', 'u1', (260 - digField - (anField * 2))), ('DAT', 'u1', (20))])
raw_ry = np.fromfile(logpath, dtype=dt3, count=-1)
dt4 = np.dtype('U')
anDecode_ry = np.array([item.hex() for item in raw_ry['ANL']], dtype=dt4)
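One possible way to keep the shape is to reinterpret each 2-byte element as two uint8 values, map every byte to its two-character hex string through a lookup table, and join the per-byte strings. This is only a sketch: `void_to_hex` is a hypothetical helper name, and the small `demo` array stands in for `raw_ry['ANL']`.

```python
import numpy as np

def void_to_hex(arr):
    """Return hex strings with the same shape as `arr` (fixed-width void dtype)."""
    # A field pulled out of a structured array is strided, so copy it first.
    arr = np.ascontiguousarray(arr)
    width = arr.dtype.itemsize                      # 2 for 'V2'
    # Expose the raw bytes as a trailing axis of length `width`.
    u8 = arr.view(np.uint8).reshape(arr.shape + (width,))
    # Lookup table: byte value -> two-character lowercase hex string.
    lut = np.array([f"{i:02x}" for i in range(256)])
    pairs = lut[u8]
    # Concatenate the per-byte strings along the trailing axis.
    out = pairs[..., 0]
    for k in range(1, width):
        out = np.char.add(out, pairs[..., k])
    return out

# Stand-in for raw_ry['ANL']: a (2, 2) array of 2-byte void elements.
demo = np.array([[0, 0, 5, 1], [255, 16, 0, 0]], dtype=np.uint8).view('V2')
hex_strings = void_to_hex(demo)
print(hex_strings.shape)
print(hex_strings)
```

With the real data the call would presumably be `void_to_hex(raw_ry['ANL'])`, giving an array of 4-character strings with shape (44640, 50).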

Related

np.array returns different dimensions of the array for the same data

I need to convert a list of ndarrays into an ndarray of ndarrays. In the first case I split the original array into 5 pieces with np.array_split, which gives me a list of ndarrays; I then pass this list to np.array() and get an ndarray with shape (5,). In the second case I do the same with other data and get an ndarray with shape (5, 200, 3072). The only difference between the two data sets is their shape: (121, 3072) in the first case and (1000, 3072) in the second.
Here shape will be (5,)
train_folds_X = []
train_folds_X = np.array_split(binary_train_X,5,axis = 0)
np.array(train_folds_X).shape
but here shape will be (5,200,3072)
train_folds_X = []
train_folds_X = np.array_split(train_X,5,axis = 0)
np.array(train_folds_X).shape
binary_train_X has shape (121, 3072) and train_X has shape (1000, 3072). Otherwise it is the same data: digits from the Street View House Numbers dataset (http://ufldl.stanford.edu/housenumbers/), except that binary_train_X contains only the digits 0 and 9. Before np.array is applied, train_folds_X has len 5 in both cases. I don't understand why this is happening.
The reason is that in the first case the result of the split is a list of unevenly sized arrays. Look at this:
y = np.empty([121,3072])
[np.array_split(y,5,axis = 0)[i].shape for i in range(5)]
which returns the first array a bit bigger than the others:
[(25, 3072), (24, 3072), (24, 3072), (24, 3072), (24, 3072)]
and thus the resulting shape of:
np.array(np.array_split(y,5,axis = 0)).shape
is (5,). You should also get a deprecation warning:
Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
(From NumPy 1.24 onwards this is a ValueError rather than a warning.)
On the other hand the second case behaves as expected:
z = np.empty([1000,3072])
[np.array_split(z,5,axis = 0)[i].shape for i in range(5)]
returns:
[(200, 3072), (200, 3072), (200, 3072), (200, 3072), (200, 3072)]
and thus the resulting array can be constructed as you expect:
np.array(np.array_split(z,5,axis = 0)).shape
returns (5, 200, 3072)
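If the uneven folds really need to live in a single array, one option is to build the object array explicitly, while equally sized folds can be stacked into a regular 3-D array. The sketch below is an illustration only: the names `folds_obj` and `folds_3d` are made up, and a small column count (8 instead of 3072) is assumed to keep the demo light.

```python
import numpy as np

N_FEATURES = 8  # assumption: stands in for the 3072 columns of the real data

# 121 rows do not divide evenly by 5, so the folds have different lengths.
uneven = np.array_split(np.zeros((121, N_FEATURES)), 5, axis=0)
uneven_shapes = [fold.shape for fold in uneven]

# Explicit 1-D object array: each element holds one fold.
folds_obj = np.empty(len(uneven), dtype=object)
for i, fold in enumerate(uneven):
    folds_obj[i] = fold

# 1000 rows divide evenly, so the folds can be stacked into a 3-D array.
even = np.array_split(np.zeros((1000, N_FEATURES)), 5, axis=0)
folds_3d = np.stack(even)

print(uneven_shapes)
print(folds_obj.shape, folds_3d.shape)
```

In many cases simply keeping the result of np.array_split as a Python list is enough, since cross-validation loops usually iterate over the folds anyway.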

Python - Slicing an Array of float

I have two 1-D float arrays ('Xdata' and 'tdata'), and I want to compute a new variable named 'ratedata'. When I run the code, the console shows "IndexError: only integers, slices (:), ellipsis (...), numpy.newaxis (None) and integer or boolean arrays are valid indices". How can I fix this problem? Thank you.
The code:
dxdt_a = np.array(pd.read_excel('T50-katalis1-m14.xlsx',index_col=0,header=5))
Xdata = dxdt_a[:,1]
tdata = dxdt_a[:,0]
ratedata = np.zeros(len(Xdata))
for i in ratedata:
    ratedata[i] = (Xdata[i+1]-Xdata[i])/(tdata[1]-tdata[0])
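The error most likely comes from `for i in ratedata`, which iterates over the float values stored in the array (all 0.0) rather than over integer positions, so `ratedata[i]` is indexed with a float. A minimal sketch of two possible fixes follows; the sample `Xdata` and `tdata` values are invented for illustration, and a constant time step is assumed, as in the original code.

```python
import numpy as np

# Invented sample data standing in for the two spreadsheet columns.
Xdata = np.array([0.0, 0.1, 0.3, 0.6])
tdata = np.array([0.0, 2.0, 4.0, 6.0])
dt = tdata[1] - tdata[0]  # constant step, as in the original code

# Fix 1: loop over integer positions; stop one short because of Xdata[i + 1].
ratedata = np.zeros(len(Xdata))
for i in range(len(Xdata) - 1):
    ratedata[i] = (Xdata[i + 1] - Xdata[i]) / dt

# Fix 2: the same forward difference without a Python loop.
ratedata_vec = np.zeros(len(Xdata))
ratedata_vec[:-1] = np.diff(Xdata) / dt

print(ratedata)
print(ratedata_vec)
```

In both versions the last element stays 0 because there is no following sample to difference against.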

Represent a 3d vector as a single numerical value

Is it possible to convert a 3D vector representing a colour into a single numerical value (x), ideally a float between 0 and 1? Maths is not my strong suit at all, so from my googling I think I either need to use vectorization or convert the value to a tensor to achieve my objective. Would that be correct?
An example of what I am trying to achieve is:
labColour = (112, 48, 0)
labAsFloat = colour_to_float(labColour, cspace='LAB')
print(labAsFloat) # outputs something like 0.74673543
def colour_to_float(colour, cspace):
    return ...  # ??? somehow vectorise??
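I'm not quite sure I understand your question correctly. If the objective is merely a unique floating-point representation, then this might work.
import numpy as np
def colour_to_float(colour):
    # Pad the three 8-bit channels with a zero byte to get 4 bytes.
    int_arr = list(colour)
    int_arr.append(0)
    data_bytes = np.array(int_arr, dtype=np.uint8)
    # Reinterpret those 4 bytes as one float32 (no numeric conversion).
    return (data_bytes.view(dtype=np.float32))[0]
def float_to_colour(num):
    # Reinterpret the float32 as 4 bytes and drop the padding byte.
    return np.array([num], dtype=np.float32).view(dtype=np.uint8)[:3].tolist()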
Results:
labColour = (230, 140, 50)
f = colour_to_float(labColour)
print(f)
4.64232e-39
lab = float_to_colour(f)
print(lab)
[230, 140, 50]
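The byte-reinterpretation above always yields a very small number (on the order of 1e-38 or less), not something spread over 0 to 1. If a value in that range is wanted, one alternative is to pack the three channels into a 24-bit integer and divide by the largest 24-bit value. This is only a sketch under the assumption that each channel is an integer in 0..255; the function names are hypothetical, and the resulting number is just an encoding, not a perceptually meaningful colour distance.

```python
MAX_24BIT = 0xFFFFFF  # largest value that three 8-bit channels can encode

def colour_to_unit_float(colour):
    """Pack three 0..255 channels into one float in [0, 1]."""
    c0, c1, c2 = (int(c) for c in colour)
    packed = (c0 << 16) | (c1 << 8) | c2
    return packed / MAX_24BIT

def unit_float_to_colour(value):
    """Invert colour_to_unit_float."""
    packed = round(value * MAX_24BIT)
    return [(packed >> 16) & 0xFF, (packed >> 8) & 0xFF, packed & 0xFF]

labColour = (230, 140, 50)
f = colour_to_unit_float(labColour)
print(f)
print(unit_float_to_colour(f))
```

A Python float is a 64-bit double, so the 24-bit integer survives the division and the round trip exactly.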

TypeError: unhashable type: 'numpy.ndarray' - How to get data from data frame by querying radius from ball tree?

How to get data by querying radius from ball tree? For example
from sklearn.neighbors import BallTree
import pandas as pd
bt = BallTree(df[['lat','lng']], metric="haversine")
for idx, row in df.iterrows():
    res = df[bt.query_radius(row[['lat','lng']],r=1)]
I want to get those rows in df that are within radius r=1, but it throws a TypeError:
TypeError: unhashable type: 'numpy.ndarray'
Following the first answer, I got an index-out-of-range error when iterating over the rows:
5183
(5219, 25)
5205
(5219, 25)
5205
(5219, 25)
5221
(5219, 25)
Traceback (most recent call last):
File "/Users/Chu/Documents/dssg2018/sa4.py", line 45, in <module>
df.loc[idx,word]=len(df.iloc[indices[idx]][df[word]==1])/\
IndexError: index 5221 is out of bounds for axis 0 with size 5219
And the code is
bag_of_words = ['beautiful','love','fun','sunrise','sunset','waterfall','relax']
for idx,row in df.iterrows():
    for word in bag_of_words:
        if word in row['caption']:
            df.loc[idx, word] = 1
        else:
            df.loc[idx, word] = 0
bt = BallTree(df[['lat','lng']], metric="haversine")
indices = bt.query_radius(df[['lat','lng']],r=(float(10)/40000)*360)
for idx,row in df.iterrows():
    for word in bag_of_words:
        if word in row['caption']:
            print(idx)
            print(df.shape)
            df.loc[idx,word]=len(df.iloc[indices[idx]][df[word]==1])/\
                np.max([1,len(df.iloc[indices[idx]][df[word]!=1])])
The error is not in the BallTree itself; the indices it returns are not being used correctly to index the DataFrame.
Do it this way:
for idx, row in df.iterrows():
    indices = bt.query_radius(row[['lat','lng']].values.reshape(1,-1), r=1)
    res = df.iloc[[x for b in indices for x in b]]
    # Do what you want to do with res
This will also do (since we are sending only a single point each time):
res = df.iloc[indices[0]]
Explanation:
I'm using scikit-learn 0.20, and the code you wrote above:
df[bt.query_radius(row[['lat','lng']],r=1)]
did not work for me. I needed to turn the row into a 2-D array with reshape().
bt.query_radius() returns an array of arrays of indices within the specified radius r, as mentioned in the documentation:
ind : array of objects, shape = X.shape[:-1]
each element is a numpy integer array listing the indices of neighbors of the corresponding point. Note that unlike the results of
a k-neighbors query, the returned neighbors are not sorted by distance
by default.
So we need to iterate over both levels of arrays to reach the actual indices of the data.
Once we have the indices, iloc is the way to access rows of a pandas DataFrame by position.
Update:
You don't need to query the tree separately for each point. You can pass all the coordinates at once; the result is an array of index arrays, one per query point, listing the points within the radius of that point.
indices = bt.query_radius(df[['lat','lng']], r=1)
for idx, row in df.iterrows():
    # Note: indices is positional, so this assumes df has the default 0..n-1 index.
    nearest_points_index = indices[idx]
    res = df.iloc[nearest_points_index]
    # Do what you want to do with res
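The IndexError in the question (index 5221 for a frame with 5219 rows) suggests that the DataFrame index labels are not a contiguous 0..n-1 range, while the arrays returned by query_radius are positional. Separately, scikit-learn's haversine metric expects [lat, lng] in radians and measures the radius in radians too. The sketch below illustrates both points; the sample coordinates, the index labels and the `EARTH_RADIUS_KM` constant are assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import BallTree

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius

# Invented data with a non-contiguous index, as after filtering rows.
df = pd.DataFrame(
    {"lat": [48.8566, 48.8570, 51.5074], "lng": [2.3522, 2.3530, -0.1278]},
    index=[10, 42, 5221],
)

# The haversine metric works on radians, in [lat, lng] order.
coords = np.radians(df[["lat", "lng"]].to_numpy())
bt = BallTree(coords, metric="haversine")

# A 10 km radius expressed as an angle on the unit sphere.
indices = bt.query_radius(coords, r=10 / EARTH_RADIUS_KM)

neighbour_counts = {}
# enumerate() gives the row position, which is what `indices` is keyed by;
# `label` is the DataFrame index value and is only used for bookkeeping.
for pos, (label, row) in enumerate(df.iterrows()):
    res = df.iloc[indices[pos]]
    neighbour_counts[label] = len(res)

print(neighbour_counts)
```

Each point counts itself as a neighbour, because its distance to itself is zero. Calling `df.reset_index(drop=True)` before the loop would be another way to make labels and positions agree.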

Sort Numpy array by subfield

I have a structured numpy array in which one of the fields has subfields:
import numpy, string, random
# 'S10' is the current spelling of the older 'a10' byte-string alias.
dtype = [('name', 'S10'), ('id', 'i4'),
         ('size', [('length', 'f8'), ('width', 'f8')])]
a = numpy.zeros(10, dtype = dtype)
for idx in range(len(a)):
    # The nested 'size' field is filled from a (length, width) tuple.
    a[idx] = (''.join(random.sample(string.ascii_lowercase, 10)), idx,
              tuple(numpy.random.uniform(0, 1, size=2)))
I can easily sort it by any of the fields, like this:
a.sort(order = ['name'])
a.sort(order = ['size'])
When I try to sort by a structured field ('size' in this example), it is effectively sorted by the first subfield ('length' in this example). However, I would like to have my elements sorted by the second subfield ('width', which the attempts below refer to as 'height'). I tried something like this, but it does not work:
a.sort(order = ['size[\'height\']'])
ValueError: unknown field name: size['height']
a.sort(order = ['size', 'height'])
ValueError: unknown field name: height
Is there a way to accomplish this?
I believe this is what you want:
a[a["size"]["width"].argsort()]
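To make the argsort approach concrete, here is a small sketch with hand-picked values so that the resulting order is easy to check. The four-element array and its values are invented for illustration; only the dtype layout follows the question.

```python
import numpy as np

dtype = [("name", "S10"), ("id", "i4"),
         ("size", [("length", "f8"), ("width", "f8")])]
a = np.zeros(4, dtype=dtype)
a["id"] = np.arange(4)
a["size"]["length"] = [1.0, 2.0, 3.0, 4.0]
a["size"]["width"] = [0.4, 0.1, 0.3, 0.2]

# Indices that would sort the nested 'width' subfield.
order = a["size"]["width"].argsort(kind="stable")

# Fancy indexing returns a sorted copy; `a` itself is left unchanged.
sorted_a = a[order]
print(sorted_a["id"])
print(sorted_a["size"]["width"])
```

Unlike a.sort(), this does not sort in place; assigning back with `a[:] = a[order]` would be one way to overwrite the original array.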