Python Pandas: itemsize throws AttributeError: 'Series' object has no attribute 'itemsize'

I am new to Python Pandas. As part of learning it, I wanted to create a pandas.Series and print the itemsize of the Series, but doing so raised an AttributeError.
Following is my code:
import pandas as pd
data = [1, 2, 3, 4]
s = pd.Series(data)
print(s.itemsize)
This however does not return the itemsize and throws the following error:
Traceback (most recent call last):
File "c:\Users\USER\filename.py", line 32, in <module>
print(s.itemsize)
^^^^^^^^^^
File "C:\Users\USER\AppData\Local\Programs\Python\Python311\Lib\site-packages\pandas\core\generic.py", line 5902, in __getattr__
return object.__getattribute__(self, name)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'Series' object has no attribute 'itemsize'
What might the reason for this be?

You are probably confusing it with numpy.ndarray.itemsize from NumPy.
A pandas Series does not expose itemsize directly; in pandas it is pandas.Series.dtype.itemsize:
import pandas as pd
data = [1, 2, 3, 4]
s = pd.Series(data)
print(s.dtype.itemsize)
Output:
# 8
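If you want the NumPy attribute itself, you can also ask the array backing the Series; a minimal sketch (same Series as above, to_numpy() is available since pandas 0.24):
import pandas as pd

s = pd.Series([1, 2, 3, 4])
arr = s.to_numpy()          # the underlying numpy.ndarray
print(arr.itemsize)         # 8 for a 64-bit integer dtype
print(arr.dtype.itemsize)   # equivalent spelling via the dtype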

Related

Exporting pandas df with column of tuples to BQ throws pyarrow error

I have the following pandas dataframe:
import pandas as pd
df = pd.DataFrame({"id": [1, 2, 3], "items": [('a', 'b'), ('a', 'b', 'c'), tuple('d')]})
print(df)
   id      items
0   1     (a, b)
1   2  (a, b, c)
2   3       (d,)
After registering my GCP/BQ credentials in the normal way...
import os
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "path_to_my_creds.json"
... I try to export it to a BQ table:
import pandas_gbq
pandas_gbq.to_gbq(df, "my_table_name", if_exists="replace")
but I keep getting the following error:
Traceback (most recent call last):
File "<string>", line 4, in <module>
File "/Users/max.epstein/opt/anaconda3/envs/rec2env/lib/python3.7/site-packages/pandas_gbq/gbq.py", line 1205, in to_gbq
...
File "/Users/max.epstein/opt/anaconda3/envs/rec2env/lib/python3.7/site-packages/google/cloud/bigquery/_pandas_helpers.py", line 342, in bq_to_arrow_array
return pyarrow.Array.from_pandas(series, type=arrow_type)
File "pyarrow/array.pxi", line 915, in pyarrow.lib.Array.from_pandas
File "pyarrow/array.pxi", line 312, in pyarrow.lib.array
File "pyarrow/array.pxi", line 83, in pyarrow.lib._ndarray_to_array
File "pyarrow/error.pxi", line 122, in pyarrow.lib.check_status
pyarrow.lib.ArrowTypeError: Expected bytes, got a 'tuple' object
I have tried converting the tuple column to string with df = df.astype({"items":str}) and adding a table_schema param to the pandas_gbq.to_gbq... line but I keep getting this same error.
I have also tried replacing the pandas_gbq.to_gbq... line with the bq_client.load_table_from_dataframe method described here but still get the same pyarrow.lib.ArrowTypeError: Expected bytes, got a 'tuple' object error...
I think this is a quirk of pandas dtypes being separate from Python types: astype converts the values, but the column keeps the generic object dtype. Try also converting the dtypes to match the types after the astype statement, so that:
df = df.astype({"items": str})
is replaced with:
df = df.astype({"items": str})
df = df.convert_dtypes()
Let me know if this works.
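A quick way to check whether the conversion took effect is to print the dtypes at each step; a minimal sketch using the dataframe from the question:
import pandas as pd

df = pd.DataFrame({"id": [1, 2, 3], "items": [('a', 'b'), ('a', 'b', 'c'), tuple('d')]})
print(df.dtypes)                # items: object (holding tuples)
df = df.astype({"items": str})
print(df.dtypes)                # items: still object, but now holding strings
df = df.convert_dtypes()
print(df.dtypes)                # items: the dedicated "string" dtype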

numpy - why does np.multiply.reduce([], axis=None) result in 1?

Why does multiply-reducing an empty array-like, which contains no elements, result in 1?
np.multiply.reduce([], axis=None)
# 1
A ufunc may have an identity attribute:
In [200]: np.multiply.identity
Out[200]: 1
In [201]: np.multiply.reduce([])
Out[201]: 1.0
which can be overridden in a reduce with the initial argument:
In [202]: np.multiply.reduce([], initial=10)
Out[202]: 10.0
In [203]: np.multiply.reduce([1,2,3], initial=10)
Out[203]: 60
In [204]: np.multiply.reduce([1,2,3], initial=None)
Out[204]: 6
and if initial is None, an empty array produces an error:
In [205]: np.multiply.reduce([], initial=None)
Traceback (most recent call last):
File "<ipython-input-205-1c3b1c890fd6>", line 1, in <module>
np.multiply.reduce([], initial=None)
ValueError: zero-size array to reduction operation multiply which has no identity
np.max wraps a ufunc (np.maximum) that has no identity, so it errors on an empty array unless initial is supplied:
In [211]: np.max([])
Traceback (most recent call last):
File "<ipython-input-211-93f3814168a1>", line 1, in <module>
np.max([])
File "<__array_function__ internals>", line 5, in amax
File "/usr/local/lib/python3.8/dist-packages/numpy/core/fromnumeric.py", line 2733, in amax
return _wrapreduction(a, np.maximum, 'max', axis, None, out,
File "/usr/local/lib/python3.8/dist-packages/numpy/core/fromnumeric.py", line 87, in _wrapreduction
return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
ValueError: zero-size array to reduction operation maximum which has no identity
In [212]: np.max([], initial=-1)
Out[212]: -1.0
Python reduce
In [222]: from functools import reduce
In [223]: reduce?
Docstring:
reduce(function, sequence[, initial]) -> value
Apply a function of two arguments cumulatively to the items of a sequence,
from left to right, so as to reduce the sequence to a single value.
For example, reduce(lambda x, y: x+y, [1, 2, 3, 4, 5]) calculates
((((1+2)+3)+4)+5). If initial is present, it is placed before the items
of the sequence in the calculation, and serves as a default when the
sequence is empty.
Type: builtin_function_or_method
So with a multiply lambda:
In [224]: reduce(lambda x,y: x*y,[1,2,3])
Out[224]: 6
For an empty list, an error is the default behavior:
In [225]: reduce(lambda x,y: x*y,[])
Traceback (most recent call last):
File "<ipython-input-225-780706778563>", line 1, in <module>
reduce(lambda x,y: x*y,[])
TypeError: reduce() of empty sequence with no initial value
But with a supplied initial value:
In [227]: reduce(lambda x,y: x*y,[],1)
Out[227]: 1
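The same empty-product convention appears elsewhere in Python: math.prod returns the multiplicative identity for an empty iterable, just as reduce returns the supplied initial value unchanged. A minimal sketch:
from functools import reduce
import math

print(math.prod([]))                          # 1 - the empty product is the identity
print(reduce(lambda x, y: x * y, [], 1))      # 1 - initial value returned unchanged
print(reduce(lambda x, y: x * y, [2, 3], 1))  # 6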

model.predict(sample) returning TypeError: cannot perform reduce with flexible type

I keep running into this error when I try to predict based on a fitted model.
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

training, testing = train_test_split(gesture, test_size=0.2, random_state=0)
x = training.drop('CLASS', axis=1)  # remove the CLASS column from the training dataframe
y = testing.drop('CLASS', axis=1)   # remove the CLASS column from the testing dataframe
f_train = x.values.tolist()
l_train = training['CLASS'].values.tolist()  # list of class identifiers from the training dataframe
f_test = y.values.tolist()
knn = KNeighborsRegressor(n_neighbors=5)
knn.fit(f_train, l_train)
predictions = knn.predict(f_test)
The error occurs in the last line of the above code and the error message is given below:
Traceback (most recent call last):
File "C:\Users\Umair Khan\Dropbox\`Shift betweeen PCs\Work\EMG Hand Gesture\Codes\ML_on_CSV.py", line 39, in <module>
predictions = knn.predict(f_test)
File "C:\Users\Umair Khan\AppData\Local\Programs\Python\Python37-32\lib\site-packages\sklearn\neighbors\_regression.py", line 185, in predict
y_pred = np.mean(_y[neigh_ind], axis=1)
File "<__array_function__ internals>", line 6, in mean
File "C:\Users\Umair Khan\AppData\Local\Programs\Python\Python37-32\lib\site-packages\numpy\core\fromnumeric.py", line 3335, in mean
out=out, **kwargs)
File "C:\Users\Umair Khan\AppData\Local\Programs\Python\Python37-32\lib\site-packages\numpy\core\_methods.py", line 151, in _mean
ret = umr_sum(arr, axis, dtype, out, keepdims)
TypeError: cannot perform reduce with flexible type
f_test is a list of lists, like this: [[16, 30, 35, 250, -1, 0.5, 35, 0.03, 0.02], [16, 30, 35, 250, -1, 0.5, 35, 0.03, 0.02]]
I have also tried passing an array to predict(sample), but the issue remains:
predictions = knn.predict(np.array(f_test).astype(np.float))
We need to see more of the error traceback, and info on the function inputs, particularly their shape and dtype.
I've seen this error message when working with structured arrays, but it's not obvious where those would arise in your code.
In [15]: np.ones((2,), dtype='i,i')
Out[15]: array([(1, 1), (1, 1)], dtype=[('f0', '<i4'), ('f1', '<i4')])
In [16]: np.sum(np.ones((2,), dtype='i,i'))
....
---> 87 return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
88
89
TypeError: cannot perform reduce with flexible type
Solved: I changed the dtype of l_train from string to float and the error disappeared. f_train and f_test were already of type float.
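A minimal sketch of that fix, assuming the CLASS column holds numeric labels stored as strings (as in the question):
from sklearn.neighbors import KNeighborsRegressor

# Convert the string labels to float before fitting
l_train = training['CLASS'].astype(float).values.tolist()

knn = KNeighborsRegressor(n_neighbors=5)
knn.fit(f_train, l_train)
predictions = knn.predict(f_test)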

Subclass pandas DataFrame with required argument

I'm working on a new data structure that subclasses pandas DataFrame. I want to force my new data structure to always have new_property, so that it can be processed safely later on.
However, I'm running into an error when using my new data structure, because the constructor gets called by some internal pandas function without the required property.
Here is my new data structure.
import pandas as pd

class MyDataFrame(pd.DataFrame):
    @property
    def _constructor(self):
        return MyDataFrame

    _metadata = ['new_property']

    def __init__(self, data, new_property, index=None, columns=None, dtype=None, copy=True):
        super(MyDataFrame, self).__init__(data=data,
                                          index=index,
                                          columns=columns,
                                          dtype=dtype,
                                          copy=copy)
        self.new_property = new_property
Here is an example that causes the error:
data1 = {'a': [1, 2, 3], 'b': [4, 5, 6], 'c': [15, 25, 30], 'd': [1, 1, 2]}
df1 = MyDataFrame(data1, new_property='value')
df1[['a', 'b']]
Here is the error message
Traceback (most recent call last):
File "C:\ProgramData\Anaconda3\lib\site-
packages\IPython\core\interactiveshell.py", line 2881, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-33-b630fbf14234>", line 1, in <module>
df1[['a', 'b']]
File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\frame.py", line 2053, in __getitem__
return self._getitem_array(key)
File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\frame.py", line 2098, in _getitem_array
return self.take(indexer, axis=1, convert=True)
File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\generic.py", line 1670, in take
result = self._constructor(new_data).__finalize__(self)
TypeError: __init__() missing 1 required positional argument: 'new_property'
Is there a fix for this, or an alternative design that still enforces new_property on my data structure?
Thanks in advance!
This question has been answered by a brilliant pandas developer. See this issue for more details. Pasting the answer here.
class MyDataFrame(pd.DataFrame):
    @property
    def _constructor(self):
        return MyDataFrame._internal_ctor

    _metadata = ['new_property']

    @classmethod
    def _internal_ctor(cls, *args, **kwargs):
        kwargs['new_property'] = None
        return cls(*args, **kwargs)

    def __init__(self, data, new_property, index=None, columns=None, dtype=None, copy=True):
        super(MyDataFrame, self).__init__(data=data,
                                          index=index,
                                          columns=columns,
                                          dtype=dtype,
                                          copy=copy)
        self.new_property = new_property
data1 = {'a': [1, 2, 3], 'b': [4, 5, 6], 'c': [15, 25, 30], 'd': [1, 1, 2]}
df1 = MyDataFrame(data1, new_property='value')
df1[['a', 'b']].new_property
Out[121]: 'value'
MyDataFrame(data1)
TypeError: __init__() missing 1 required positional argument: 'new_property'
I know this is an old issue, but I wanted to extend hlu's answer.
When implementing the answer described by hlu, I was getting the following error when just trying to print the subclassed DataFrame: AttributeError: 'internal_constructor' object has no attribute '_from_axes'
To fix this, I used a callable object instead of the function in hlu's answer, so that the _from_axes method can be implemented on the callable.
Because _internal_constructor is a class rather than a function, there is no classmethod decorator to apply; instead we instantiate it with the caller's class, making that class available when _internal_constructor is called.
class MyDataFrame(pd.DataFrame):
    @property
    def _constructor(self):
        return MyDataFrame._internal_constructor(self.__class__)

    class _internal_constructor(object):
        def __init__(self, cls):
            self.cls = cls

        def __call__(self, *args, **kwargs):
            kwargs['my_required_argument'] = None
            return self.cls(*args, **kwargs)

        def _from_axes(self, *args, **kwargs):
            return self.cls._from_axes(*args, **kwargs)
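For completeness, a usage sketch, assuming MyDataFrame also defines _metadata and an __init__ that takes my_required_argument, analogous to hlu's version above:
data1 = {'a': [1, 2, 3], 'b': [4, 5, 6]}
df1 = MyDataFrame(data1, my_required_argument='value')
print(df1[['a', 'b']].my_required_argument)  # 'value' survives the internal reconstruction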

Incompatible indexer with Series

Why do I get an error:
import pandas as pd
a = pd.Series(index=[4,5,6], data=0)
print a.loc[4:5]
a.loc[4:5] += 1
Output:
4    0
5    0
dtype: int64
Traceback (most recent call last):
File "temp1.py", line 9, in <module>
a.loc[4:5] += 1
File "lib\site-packages\pandas\core\indexing.py", line 88, in __setitem__
self._setitem_with_indexer(indexer, value)
File "lib\site-packages\pandas\core\indexing.py", line 177, in _setitem_with_indexer
value = self._align_series(indexer, value)
File "lib\site-packages\pandas\core\indexing.py", line 206, in _align_series
raise ValueError('Incompatible indexer with Series')
ValueError: Incompatible indexer with Series
Pandas 0.12.
I think this is a bug; you can work around it by using a tuple index:
import pandas as pd
a = pd.Series(index=[4,5,6], data=0)
print a.loc[4:5]
a.loc[4:5,] += 1
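For reference, the trailing comma in a.loc[4:5,] turns the key into the tuple (slice(4, 5),), which takes a different path through the indexing code. In recent pandas versions (and Python 3) the original statement appears to work directly, so the workaround should only be needed on old releases; a sketch:
import pandas as pd

a = pd.Series(index=[4, 5, 6], data=0)
print(a.loc[4:5])  # label-based slice; both endpoints are inclusive
a.loc[4:5] += 1    # works on recent pandas; on 0.12 use the tuple-index workaround above
print(a)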