Why 'float object not iterable error' if it's an array of integers? - pandas

I have a dataframe column made of arrays that I already slice using indexes stored in other columns. I take the part of each array that depends on a position and an index: the position is a list of tuples with start and end points (which can be equal), and the index is a single value. Both are stored as columns as well. The code is the following:
df['relative_cells_array'] = df.apply(lambda x: x['cells_array'][:, x['position'][x['relative_track']][0]:x['position'][x['relative_track']][1]+1] if x['relative_track']<=len(x['position']) else np.nan, axis=1)
This works. The problem comes when I use other, modified arrays: in this case the array uses spatial binomial weights to interpolate values. Because of the standardization, this transformation of the original array produces floats when dividing by the neighbouring cells. I convert the values to integers and print the array, but I still get the error; after trying other things I also got an error about the tuple (position is a list of tuples). Why did it work before?
The code for this is the following:
df['relative_cells_array_weighted1'] = df.apply(lambda x: [[int(y) for y in sublist] for sublist in x['cells_weighted1'][:, x['position'][x['relative_track']][0]:x['position'][x['relative_track']][1]+1]] if x['relative_track']<=len(x['position']) else np.nan, axis=1)
df['relative_average_weighted1_cell_reading'] = df['relative_cells_array_weighted1'].apply(lambda x: [num for sublist in x for num in sublist])
This is the error: TypeError: 'float' object is not iterable
And after making some changes I got the tuple error as well (I don't remember the exact changes; I used ChatGPT).
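One likely cause, though the post doesn't confirm it: the first apply writes np.nan (a plain float) into relative_cells_array_weighted1 for rows where relative_track is out of range, and the second apply then tries to iterate over that float, which raises 'float' object is not iterable. A minimal sketch of a guard, assuming that column layout (the column names come from the post, the helper itself is hypothetical):

import numpy as np
import pandas as pd

def flatten_cells(cells):
    # Rows that fell into the `else np.nan` branch hold a float, not a nested list;
    # return NaN for them instead of trying to iterate.
    if isinstance(cells, float) and np.isnan(cells):
        return np.nan
    return [num for sublist in cells for num in sublist]

# hypothetical usage on the column built above:
# df['relative_average_weighted1_cell_reading'] = df['relative_cells_array_weighted1'].apply(flatten_cells)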

Selecting two sets of columns from a dataFrame with all rows

I have a dataFrame with 28 columns (features) and 600 rows (instances). I want to select all rows, but only columns from 0-12 and 16-27. Meaning that I don't want to select columns 12-15.
I wrote the following code, but it doesn't work and throws a syntax error at : in 0:12 and 16:. Can someone help me understand why?
X = df.iloc[:,[0:12,16:]]
I know there are other ways for selecting these rows, but I am curious to learn why this one does not work, and how I should write it to work (if there is a way).
For now, I have written it as:
X = df.iloc[:,0:12]
X = X + df.iloc[:,16:]
Which seems to return an incorrect result, because I have already treated the NaN values of df, but when I use this code, X includes lots of NaNs!
Thanks for your feedback in advance.
You can use np.r_ to concatenate the slices into a single array of integer positions (note that np.r_ needs explicit endpoints, so an open-ended slice like 16: won't work):
x = df.iloc[:, np.r_[0:13, 16:28]]
iloc has these allowed inputs (from the docs):
An integer, e.g. 5.
A list or array of integers, e.g. [4, 3, 0].
A slice object with ints, e.g. 1:7.
A boolean array.
A callable function with one argument (the calling Series or DataFrame) and that returns valid output for indexing (one of the above). This is useful in method chains, when you don’t have a reference to the calling object, but would like to base your selection on some value.
What you're passing to iloc in X = df.iloc[:,[0:12,16:]] is not a list of integers or a slice of ints, but an attempt to put slices inside a list, which isn't even valid Python syntax and is why you get a SyntaxError. You need to convert those slices to a flat list of integer positions, and a convenient way to do that is the numpy.r_ index helper:
X = df.iloc[:, np.r_[0:13, 16:28]]
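To see what np.r_ actually builds, here is a small check (illustrative only, just echoing the slices above):

import numpy as np

# np.r_ turns slice notation into a flat array of integer positions,
# which is one of the inputs iloc accepts
cols = np.r_[0:13, 16:28]
print(cols)
# [ 0  1  2  3  4  5  6  7  8  9 10 11 12 16 17 18 19 20 21 22 23 24 25 26 27]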

Isn't this a row vector? [duplicate]

I know that a numpy array has an attribute called shape that returns [No. of rows, No. of columns]; shape[0] gives you the number of rows and shape[1] gives you the number of columns.
a = numpy.array([[1,2,3,4], [2,3,4,5]])
a.shape
>> (2, 4)
a.shape[0]
>> 2
a.shape[1]
>> 4
However, if my array has only one row, then it returns [No. of columns, ], and shape[1] is out of range. For example:
a = numpy.array([1,2,3,4])
a.shape
>> (4,)
a.shape[0]
>> 4  # this is the number of columns
a.shape[1]
>> IndexError: tuple index out of range
Now how do I get the number of rows of a numpy array if the array may have only one row?
Thank you
The concept of rows and columns applies when you have a 2D array. However, the array numpy.array([1,2,3,4]) is a 1D array, so it has only one dimension, and shape rightly returns a single-valued tuple.
For a 2D version of the same array, consider the following instead:
>>> a = numpy.array([[1,2,3,4]]) # notice the extra square braces
>>> a.shape
(1, 4)
Rather than converting this to a 2D array, which may not be an option every time, one could either check the len() of the tuple returned by shape or just catch the index error, as such:
import numpy
a = numpy.array([1,2,3,4])
print(a.shape)
# (4,)
print(a.shape[0])
try:
    print(a.shape[1])
except IndexError:
    print("only 1 column")
Or you could just try to assign this to a variable for later use (or return, or what have you) if you know you will only have 1- or 2-dimensional shapes:
try:
    shape = (a.shape[0], a.shape[1])
except IndexError:
    shape = (1, a.shape[0])
print(shape)
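Another option, not from the original answer but a common idiom, is to normalize the array to 2D with np.atleast_2d, so shape always has two entries (this sketch assumes a 1D input should count as a single row):

import numpy as np

def n_rows(a):
    # atleast_2d turns a 1D array of length n into shape (1, n),
    # so shape[0] is always the row count
    return np.atleast_2d(a).shape[0]

print(n_rows(np.array([1, 2, 3, 4])))      # 1
print(n_rows(np.array([[1, 2], [3, 4]])))  # 2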

Python: AttributeError: "'numpy.float64' object has no attribute 'tanh'"

I have seen a couple of questions with similar titles; however, I'm afraid none of them could satisfactorily answer my question, which is: how do I take tanh (or, let's say, exp) of a numpy ndarray? For instance, a piece of my code looks similar to this:
import numpy as np
from numpy import ndarray,zeros,array,dot,exp
from functools import reduce  # reduce is used below; on Python 3 it must be imported
import itertools
def zetta_G(x,spr_g,theta_g,c_g):
    #this function computes estimated g:
    #c_g is basically a matrix of dim equal to g and whose elements contains list of centers that describe the fuzzy system for each element of g:
    m,n=c_g.shape[0],c_g.shape[1]
    #creating an empty matrix of dim mxn to hold regressors:
    zetta_g=zeros((m,n),dtype=ndarray)
    #creating an empty matrix of dim mxn to hold estimated g:
    z_g=np.zeros((m,n),dtype=ndarray)
    #for filling rows
    for k in range(m):
        #for filling columns
        for p in range(n):
            #container to hold-length being equal to number of inputs(e1,e2,e3 etc)
            Mu=[[] for i in range(len(x))]
            for i in range(len(x)):
                #filling that with number of zeros equal to len of center
                Mu[i]=np.zeros(len(c_g[k][p]))
            #creating an empty list for holding rules
            M=[]
            #piece of code for creating rules-all possible combinations
            for i in range(len(x)):
                for j in range(len(c_g[k][p])):
                    Mu[i][j]=exp(-.5*((x[i]-c_g[k][p][j])/spr_g[k][p])**2)
            b=list(itertools.product(*Mu))
            for i in range(len(b)):
                M.append(reduce(lambda x,y:x*y,b[i]))
            M=np.array(M)
            S=np.sum(M)
            #import pdb;pdb.set_trace()
            zetta_g[k][p]=M/S
            z_g[k][p]=dot(M/S,theta_g[k][p])
    return zetta_g,z_g

if __name__=='__main__':
    x=[1.2,.2,.4]
    cg11,cg12,cg13,cg21,cg22,cg23,cg31,cg32,cg33=[-10,-8,-6,-4,-2,0,2,4,6,8,10],[-10,-8,-6,-4,-2,0,2,4,6,8,10],[-10,-8,-6,-4,-2,0,2,4,6,8,10],[-10,-8,-6,-4,-2,0,2,4,6,8,10],[-10,-8,-6,-4,-2,0,2,4,6,8,10],[-12,-9,-6,-3,0,3,6,9,12],[-6.5,-4.5,-2.5,0,2.5,4.5,6.5],[-5,-4,-3,-2,-1,0,1,2,3,4,5],[-3.5,-2.5,-1.5,0,1.5,2.5,3.5]
    C,spr_f=array([[-10,-8,-6,-4,-2,0,2,4,6,8,10],[-10,-8,-6,-4,-2,0,2,4,6,8,10],[-10,-8,-6,-4,-2,0,2,4,6,8,10]]),[2.2,2,2.1]
    c_g=array([[cg11,cg12,cg13],[cg21,cg22,cg23],[cg31,cg32,cg33]])
    spr_g=array([[2,2.1,2],[2.1,2.2,3],[2.5,1,1.5]])
    theta_g=np.zeros((c_g.shape[0],c_g.shape[1]),dtype=ndarray)
    #import pdb;pdb.set_trace()
    N=0
    for i in range(c_g.shape[0]):
        for j in range(c_g.shape[1]):
            length=len(c_g[i][j])**len(x)
            theta_g[i][j]=np.random.sample(length)
            N=N+(len(c_g[i][j]))**len(x)
    zetta_g,z_g=zetta_G(x,spr_g,theta_g,c_g)
    #zetta_G accepts the following args -- x: a list of a certain dim; spr_g: a matrix of dimension similar to theta_g and c_g; theta_g and c_g are numpy matrices with lists as individual elements
    print(zetta_g)
    print(z_g)
    inv=np.tanh(z_g)
    print(inv)
In [89]: a=np.array([[1],[3],[2]],dtype=np.ndarray)
In [90]: a
Out[90]:
array([[1],
       [3],
       [2]], dtype=object)
Note that the dtype is object, not ndarray. If the dtype isn't one of the recognized numeric or string types, it is object, a generic pointer, just like the elements of a list.
In [91]: np.tanh(a)
AttributeError: 'int' object has no attribute 'tanh'
np.tanh is trying to delegate the task to the elements of the array. Math on object-dtype arrays is commonly performed by list-like iteration over the elements; it does not use numpy's fast compiled numeric routines.
If a is an ordinary numeric array:
In [95]: np.tanh(np.array([[1],[3],[2]]))
Out[95]:
array([[0.76159416],
       [0.99505475],
       [0.96402758]])
With object dtype arrays, your ability to do numeric calculations is limited. Some things work, others don't. It's hit-or-miss.
Here's a first stab at cleaning up your code; it's not tested.
def zetta_G(x,spr_g,theta_g,c_g):
    m,n=c_g.shape[0],c_g.shape[1]
    #creating an empty matrix of dim mxn to hold regressors:
    zetta_g=zeros((m,n),dtype=object)
    #creating an empty matrix of dim mxn to hold estimated g:
    z_g=np.zeros((m,n),dtype=object)
    #for filling rows
    for k in range(m):
        #for filling columns
        for p in range(n):
            #container to hold-length being equal to number of inputs(e1,e2,e3 etc)
            Mu = np.zeros((len(x), len(c_g[k,p])))
            #creating an empty list for holding rules
            for i in range(len(x)):
                Mu[i,:]=exp(-.5*((x[i]-c_g[k,p,:])/spr_g[k,p])**2)
            # probably can calc Mu without any loop
            M = []
            b=list(itertools.product(*Mu))
            for i in range(len(b)):
                M.append(reduce(lambda x,y:x*y,b[i]))
            M=np.array(M)
            S=np.sum(M)
            zetta_g[k,p]=M/S
            z_g[k,p]=dot(M/S,theta_g[k,p])
    return zetta_g,z_g
Running your code and adding some .shape displays, I see that z_g is (3,3) and contains just single numbers, so it can be initialized as a plain 2d float array:
z_g=np.zeros((m,n))
theta_g is (3,3), but with variable length array elements
print([i.shape for i in theta_g.flat])
[(1331,), (1331,), (1331,), (1331,), (1331,), (729,), (343,), (1331,), (343,)]
zetta_g matches these shapes.
If I change:
x=np.array([1.2,.2,.4])
I can calculate Mu without a loop with:
Mu = exp(-.5*((x[:,None]-np.array(c_g[k,p])[None,:])/spr_g[k,p])**2)
c_g is a (3,3) array with variable length lists; I can vectorize the
(x[i]-c_g[k,p][j])
expression with:
x[:,None]-np.array(c_g[k,p])[None,:]
Not a big time saver here since x has only 3 elements and the c_g elements are only 7-11 long. But cleaner.
In this running code I don't see a tanh, so I don't know what kinds of arrays are using that.
You set the dtype of the array's elements to dtype=np.ndarray. Replace that with, let's say, dtype=np.float64 or any other numeric type.
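A minimal sketch of that fix, assuming the values really can be stored as plain floats (this snippet is illustrative and comes from neither answer):

import numpy as np

# object dtype: np.tanh asks each element for a .tanh() method and fails
a_obj = np.array([[1], [3], [2]], dtype=object)
# np.tanh(a_obj)  # AttributeError: 'int' object has no attribute 'tanh'

# numeric dtype: np.tanh runs as fast, compiled element-wise math
a_num = np.array([[1], [3], [2]], dtype=np.float64)
print(np.tanh(a_num))
# [[0.76159416]
#  [0.99505475]
#  [0.96402758]]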

How to find if any column in an array has duplicate values

Let's say I have a numpy matrix A
A = array([[ 0.5,  0.5,  3.7],
           [ 3.8,  2.7,  3.7],
           [ 3.3,  1.0,  0.2]])
I would like to know whether there are at least two rows i and i' such that A[i, j] = A[i', j] for some column j.
In the example A, i=0 and i'=1 for j=2 and the answer is yes.
How can I do this?
I tried this:
def test(A, n):
    for j in range(n):
        i = 0
        while i < n:
            a = A[i, j]
            for s in range(i+1, n):
                if A[s, j] == a:
                    return True
            i += 1
    return False
Is there a faster/better way?
There are a number of ways of checking for duplicates. The idea is to use as few loops in the Python code as possible to do this. I will present a couple of ways here:
Use np.unique. You would still have to loop over the columns since it wouldn't make sense for unique to accept an axis argument because each column could have a different number of unique elements. While it still requires a loop, unique allows you to find the positions and other stats of repeated elements:
def test(A):
    for i in range(A.shape[1]):
        if np.unique(A[:, i]).size < A.shape[0]:
            return True
    return False
With this method, you basically check if the number of unique elements in a column is equal to the size of the column. If not, there are duplicates.
Use np.sort, np.diff and np.any. This is a fully vectorized solution that does not require any loops because you can specify an axis for each of these functions:
def test(A):
    return np.any(np.diff(np.sort(A, axis=0), axis=0) == 0)
This literally reads "if any of the column-wise differences in the column-wise sorted array are zero, return True". A zero difference in the sorted array means that there are identical elements. axis=0 makes sort and diff operate on each column individually.
You never need to pass in n since the size of the matrix is encoded in the attribute shape. If you need to look at the subset of a matrix, just pass in the subset using indexing. It will not copy the data, just return a view object with the required dimensions.
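As a quick check, here is the vectorized version applied to the example matrix from the question (purely illustrative; the intermediate values are easy to verify by hand):

import numpy as np

A = np.array([[0.5, 0.5, 3.7],
              [3.8, 2.7, 3.7],
              [3.3, 1.0, 0.2]])

# sort each column, then difference down each column;
# a zero difference means two equal values in that column
sorted_cols = np.sort(A, axis=0)
diffs = np.diff(sorted_cols, axis=0)
print(diffs)
# [[2.8 0.5 3.5]
#  [0.5 1.7 0. ]]
print(np.any(diffs == 0))  # True, because column 2 holds 3.7 twice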
A solution without numpy would look like this: First, swap columns and rows with zip()
zipped = zip(*A)
then check whether any of the resulting rows (the original columns) has duplicates. You can check for duplicates by turning a list into a set, which discards duplicates, and comparing the lengths.
has_duplicates = any(len(set(row)) != len(row) for row in zip(*A))
This is most probably slower and more memory intensive than the pure numpy solution, but it may help for clarity.
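For completeness, a tiny illustrative run of that one-liner on the same example matrix (not part of the original answer):

A = [[0.5, 0.5, 3.7],
     [3.8, 2.7, 3.7],
     [3.3, 1.0, 0.2]]

# zip(*A) yields the columns; a set drops duplicates, so a shorter set signals a repeat
has_duplicates = any(len(set(col)) != len(col) for col in zip(*A))
print(has_duplicates)  # True, because the third column contains 3.7 twice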

Apply to each element in a Pandas dataframe

Since each series in the data frame is made of tuples, I need to convert them into single numbers. Basically I have something like this:
price_table['Col1'].apply(lambda x: x[0])
But I actually need to do this for each column. x itself is a tuple, but it has only one number inside, so I need to return x[0] to get the value as a float instead of a tuple.
In R, I would put axis = c(1,2), but here it seems that putting two numbers in axis doesn't work:
price_table.apply(lambda x: x[0],axis = 1)
TypeError: <lambda>() got an unexpected keyword argument 'axis'
Is there anyway to apply this simple function to each element in the data frame?
Thanks in advance.
For me the following works well:
price_table['Col1'].apply(lambda x: x[0], 1)
I do not use the axis keyword, but I do not know the reason why.
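The question actually asks for an element-wise apply over the whole frame; the usual tool for that is DataFrame.applymap (renamed DataFrame.map in pandas 2.1). This is a sketch, not taken from the answer above, assuming every cell holds a one-element tuple:

import pandas as pd

# hypothetical frame where every cell is a one-element tuple
price_table = pd.DataFrame({
    'Col1': [(1.0,), (2.5,)],
    'Col2': [(3.0,), (4.5,)],
})

# applymap applies the function to every single cell of the DataFrame
unpacked = price_table.applymap(lambda x: x[0])
print(unpacked)
#    Col1  Col2
# 0   1.0   3.0
# 1   2.5   4.5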