NumPy argmax and structured array error: expected a readable buffer object - numpy

I got the following error while using the NumPy argmax method. Could someone help me understand what happened?
import numpy as np
b = np.zeros(1, dtype={'names':['a','b'], 'formats': ['i4']*2})
b.argmax()
The error is
TypeError: expected a readable buffer object
While the following runs without a problem:
a = np.zeros(3)
a.argmax()
It seems the error is due to the structured array. Could anyone explain the reason?

Your b is:
array([(0, 0)], dtype=[('a', '<i4'), ('b', '<i4')])
I get a different error message with argmax:
TypeError: Cannot cast array data from dtype([('a', '<i4'), ('b', '<i4')]) to dtype('V8') according to the rule 'safe'
But this works:
In [88]: b['a'].argmax()
Out[88]: 0
Generally you can't do math operations across the fields of a structured array. You can operate within each field (if it is numeric). Since the fields could be a mix of numbers, strings and other objects, there's been no effort to handle the special cases where such operations might make sense.
If you really must do operations across the fields, try a different view, e.g.:
In [94]: b.view('<i4').argmax()
Out[94]: 0
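As a minimal sketch of both approaches (argmax within one field, and argmax on a plain integer view), assuming every field has the same 4-byte integer type; the sample values are made up for illustration:

```python
import numpy as np

# Structured array with two int32 fields; the values are illustrative only.
b = np.zeros(3, dtype={'names': ['a', 'b'], 'formats': ['i4'] * 2})
b['a'] = [1, 5, 2]
b['b'] = [7, 0, 3]

# argmax within a single numeric field works like on any plain array.
idx_a = int(b['a'].argmax())

# Viewing the same memory as plain int32 drops the field names.
# This is only valid because every field has the same type and size.
flat = b.view(np.int32)            # interleaved values: a0, b0, a1, b1, ...
rows = flat.reshape(len(b), -1)    # one row per record, one column per field

print(idx_a)                 # index of the largest 'a'
print(flat.argmax())         # index into the interleaved values
print(rows.argmax(axis=1))   # for each record, which field is largest
```

Note that the index returned on the flat view counts interleaved field values, not records, so reshaping to one row per record is usually easier to interpret.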

Related

Why are numpy arrays called homogeneous?

Why are numpy arrays called homogeneous when you can have elements of different type in the same numpy array like this?
np.array([1,2,3,4,"a"])
I understand that I cannot perform some types of broadcasting operations here; for example, np1*4 results in an error.
But my question really is: when it can have elements of different types, why is it called homogeneous?
NumPy automatically converts them to the most applicable common data type.
e.g.,
>>> np.array([1,2,3,4,"a"]).dtype.type
numpy.str_
In short, this means all elements are stored as strings.
>>> np.array([1,2,3,4]).dtype.type
numpy.int64
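A short sketch of that conversion, to make the "homogeneous" point concrete. The variable names are only for illustration; the dtype=object variant is shown as the one way to keep the original Python objects, at the cost of storing references instead of raw values:

```python
import numpy as np

mixed = np.array([1, 2, 3, 4, "a"])

# Every element was converted to a string, so the array has ONE dtype.
print(mixed.dtype.kind)    # 'U' means a unicode string dtype
print(repr(mixed[0]))      # the integer 1 is now stored as the string '1'

# With dtype=object the array stores references to the original Python
# objects. The array is still homogeneous: every element is "an object".
as_objects = np.array([1, 2, 3, 4, "a"], dtype=object)
print(type(as_objects[0]), type(as_objects[4]))
```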

How to compare numpy arrays of tuples?

Here's an MWE that illustrates the issue I have:
import numpy as np
arr = np.full((3, 3), -1, dtype="i,i")
doesnt_work = arr == (-1, -1)
n_arr = np.full((3, 3), -1, dtype=int)
works = n_arr == 10
arr is supposed to be an array of tuples, but it doesn't behave as expected.
works is an array of booleans, as expected, but doesnt_work is False. Is there a way to get numpy to do elementwise comparisons on more complex types, or do I have to resort to list comprehension, flatten and reshape?
There's a second problem:
f = arr[(0, 0)] == (-1, -1)
f is False, because arr[(0,0)] is of type numpy.void rather than a tuple. So even if the componentwise comparison worked, it would give the wrong result. Is there a clever numpy way to do this or should I just resort to list comprehension?
Both problems are actually the same problem! And are both related to the custom data type you created when you specified dtype="i,i".
If you run arr.dtype you will get dtype([('f0', '<i4'), ('f1', '<i4')]). That is, two signed integers placed in one contiguous block of memory. This is not a python tuple. Thus it is clear why the naive comparison fails, since (-1,-1) is a python tuple and is not represented in memory the same way that the numpy data type is.
However if you compare with a_comp = np.array((-1,-1), dtype="i,i") you get the exact behavior you are expecting!
You can read more about how the custom dtype stuff works on the numpy docs:
https://numpy.org/doc/stable/reference/arrays.dtypes.html
Oh, and to address what np.void is: it comes from the idea of a void C pointer, which essentially means an address to a contiguous block of memory of unspecified type. Provided you (the programmer) know what is going to be stored in that memory (in this case two back-to-back integers), it's fine as long as you are careful (compare with the same custom data type).
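A minimal sketch of the comparison described in this answer, assuming a reasonably recent NumPy. The array is built with np.zeros and one element is changed so that the resulting mask is not trivially all True; the per-field comparison is included as an alternative that does not rely on structured equality at all:

```python
import numpy as np

arr = np.zeros((3, 3), dtype="i,i")   # fields are auto-named 'f0' and 'f1'
arr['f0'] = -1
arr['f1'] = -1
arr[0, 0] = (5, -1)                   # make one element differ

# Compare against a 0-d array of the SAME structured dtype, not a tuple.
target = np.array((-1, -1), dtype=arr.dtype)
mask = arr == target                  # elementwise boolean array

# Alternative: compare field by field and combine the results.
mask_by_field = (arr['f0'] == -1) & (arr['f1'] == -1)

# A single element is a numpy.void; .item() converts it to a Python tuple.
as_tuple = arr[1, 1].item()

print(mask)
print(as_tuple == (-1, -1))
```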

Writing data frame with object dtype to HDF5 only works after converting to string

I have a large dataframe and I want to write it to disk for quick retrieval. I believe to_hdf(...) infers the data type of the columns and sometimes gets it wrong. I wonder what the correct way is to cope with this.
import pandas as pd
import numpy as np
length = 10
df = pd.DataFrame({"a": np.random.randint(1e7, 1e8, length),})
# df.loc[1, "a"] = "abc"
# df["a"] = df["a"].astype(str)
print(df.dtypes)
df.to_hdf("df.hdf5", key="data", format="table")
Uncommenting various lines leads me to the following.
Just filling the column with numbers leads to data type int32, and the data is stored without problem.
Setting one element to abc changes the data type to object, but it seems that to_hdf internally infers another data type and throws an error: TypeError: object of type 'int' has no len()
Explicitly converting the column to str leads to success, and to_hdf stores the data.
Now I am wondering what is happening in the second case, and whether there is a way to prevent it. The only way I found was to go through all columns, check if they are dtype('O') and explicitly convert them to str.
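The workaround described above (find the object columns, convert them to str) could be sketched as follows. The column names and values are hypothetical, and the to_hdf call is left as a comment because it needs the optional PyTables dependency:

```python
import pandas as pd

# Hypothetical frame: column "a" mixes integers and a string -> dtype object.
df = pd.DataFrame({"a": [1, "abc", 3], "b": [1.0, 2.0, 3.0]})

# Collect the columns whose dtype is object ...
object_cols = [col for col in df.columns if df[col].dtype == object]

# ... and convert each of them explicitly to strings.
for col in object_cols:
    df[col] = df[col].astype(str)

print(object_cols)
print(df["a"].tolist())

# df.to_hdf("df.hdf5", key="data", format="table")  # requires PyTables
```

Note that this turns the numbers in a mixed column into strings as well, so they would need to be converted back after loading.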
Instead of using hdf5, I have found a generic pickling library which seems to be perfect for the job: joblib
Storing and loading data is straightforward:
import joblib
joblib.dump(df, "file.jl")
df2 = joblib.load("file.jl")

Evaluate several elements of numpy object array

I have an ndarray A that stores objects of the same type, in particular various LinearNDInterpolator objects. For example's sake assume it's just 2:
>>> A
array([ <scipy.interpolate.interpnd.LinearNDInterpolator object at 0x7fe122adc750>,
<scipy.interpolate.interpnd.LinearNDInterpolator object at 0x7fe11daee590>], dtype=object)
I want to be able to do two things. First, I'd like to evaluate all objects in A at a certain point and get back an ndarray of A.shape with all the values in it. Something like
>> A[[0,1]](1,1) =
array([ 1, 2])
However, I get
TypeError: 'numpy.ndarray' object is not callable
Is it possible to do that?
Second, I would like to change the interpolation values without constructing new LinearNDInterpolator objects (since the nodes stay the same). I.e., something like
A[[0,1]].values = B
where B is an ndarray containing the new values for every element of A.
Thank you for your suggestions.
The same issue, but with simpler functions:
In [221]: A=np.array([add,multiply])
In [222]: A[0](1,2) # individual elements can be called
Out[222]: 3
In [223]: A(1,2) # but not the array as a whole
---------------------------------------------------------------------------
TypeError: 'numpy.ndarray' object is not callable
We can iterate over a list of functions, or that array as well, calling each element on the parameters. Done right we can even zip a list of functions and a list of parameters.
In [224]: ll=[add,multiply]
In [225]: [x(1,2) for x in ll]
Out[225]: [3, 2]
In [226]: [x(1,2) for x in A]
Out[226]: [3, 2]
Another test, the callable function:
In [229]: callable(A)
Out[229]: False
In [230]: callable(A[0])
Out[230]: True
Can you change the interpolation values for individual Interpolators? If so, just iterate through the list and do that.
In general, dtype object arrays function like lists. They contain the same kind of object pointers. Most operations require the same sort of iteration. Unless you need to organize the elements in multiple dimensions, dtype object arrays have few, if any, advantages over lists.
Another thought - the normal array dtype is numeric or fixed length strings. These elements are not callable, so there's no need to implement a .__call__ method on these arrays. They could write something like that to operate on object dtype arrays, but the core action is a Python call. So such a function would just hide the kind of iteration that I outlined.
In another recent question I showed how to use np.char.upper to apply a string method to every element of an S dtype array. But my time tests showed that this did not speed anything up.
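As a small sketch of the iteration outlined above, here is one way to call every element of an object array and get the results back in the array's own shape. The functions from the operator module are stand-ins for the interpolator objects, and call_each is a hypothetical helper name, not a NumPy function:

```python
import operator
import numpy as np

# 2x2 object array of callables (stand-ins for the interpolators).
A = np.array([[operator.add, operator.mul],
              [operator.sub, max]], dtype=object)

def call_each(funcs, *args):
    """Call every element of an object array; return results in its shape."""
    results = [f(*args) for f in funcs.flat]   # plain Python iteration
    return np.array(results).reshape(funcs.shape)

out = call_each(A, 1, 2)
print(out)
```

This is still a Python-level loop, so it is a convenience rather than a speedup.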

arcpy TypeError: narray.fields require

I am receiving a really unhelpful error message, 'TypeError: narray.fields require', when doing the following:
I have a pandas data frame which I have converted to a numpy array using
df.as_matrix()
this is the numpy array "npArrayIN" shape: (3, 10)
I then need to create a feature class - here is the call to the arcpy function which has the list of 10 fields I want to create but which crashes returning the error. All numbers are floating point.
arcpy.da.NumPyArrayToFeatureClass(npArrayIN, outputShape, ("TID","X","Y","Z","H","D","WGS84Lat","WGS84Long","OFFSETA", "OFFSETB"), spRef)
Any suggestions gratefully received.
Thanks
Have you tried it with the "X","Y","Z" as the 1st three columns instead of leading it with "TID"?
Also, you may want to try it with only the xyz columns.
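One more thing that may be worth checking: as far as I know, NumPyArrayToFeatureClass expects a structured array with named fields, while df.as_matrix() returns a plain 2-D array with no fields, which would fit the wording of the error. A hedged sketch of building a structured array from a plain (3, 10) float array follows; the data is a made-up stand-in for npArrayIN, and the arcpy call itself is not shown because it cannot be run outside ArcGIS:

```python
import numpy as np
from numpy.lib import recfunctions as rfn

names = ("TID", "X", "Y", "Z", "H", "D",
         "WGS84Lat", "WGS84Long", "OFFSETA", "OFFSETB")

# Stand-in for npArrayIN: a plain float array of shape (3, 10).
plain = np.arange(30, dtype=float).reshape(3, 10)

# One float64 field per column name.
dtype = np.dtype([(name, "f8") for name in names])

# Each row of the plain array becomes one record with ten named fields.
structured = rfn.unstructured_to_structured(plain, dtype)

print(structured.shape)
print(structured.dtype.names)
print(structured["X"])
```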