pandas: df.to_csv() creates AttributeError - pandas

I'm trying to export a pandas dataframe with df.to_csv(), which should be easy enough. Unfortunately, this code:
df.to_csv(r'C:/Users/my/path/to/file.csv', index=False, encoding='utf-8')
Gives me this error:
AttributeError: '_io.BufferedReader' object has no attribute 'to_csv'
What am I doing wrong? I'm working in a Jupyter notebook on a Mac, in case that's important. Sorry for such a noob question; I know this should be super easy.
I googled similar issues where some attribute is reported missing, but none of the ones I found solved my problem.
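The error message itself says that df is an '_io.BufferedReader', which is what open(path, 'rb') returns, not a DataFrame. A minimal sketch of the likely cause and a fix, assuming df was bound to a file handle somewhere earlier in the notebook (the sample file and the names src_path, handle and out_path are hypothetical):

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample file standing in for the asker's data.
tmp_dir = tempfile.mkdtemp()
src_path = os.path.join(tmp_dir, "input.csv")
with open(src_path, "w", encoding="utf-8") as fh:
    fh.write("a,b\n1,2\n3,4\n")

# Suspected cause: the name holds a binary file handle, not a DataFrame.
with open(src_path, "rb") as handle:
    handle_type = type(handle).__name__            # class behind '_io.BufferedReader'
    handle_has_to_csv = hasattr(handle, "to_csv")  # file handles have no to_csv
    # Fix: parse the handle (or the path) into a DataFrame first.
    df = pd.read_csv(handle)

# Now df really is a DataFrame, so to_csv is available.
out_path = os.path.join(tmp_dir, "file.csv")
df.to_csv(out_path, index=False, encoding="utf-8")
```

Running type(df) in the notebook right before the failing call should confirm whether this is what happened.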

Related

Plots from Excel with pandas and seaborn: 'ufunc 'isfinite' not supported for the input types'

I am trying to set up a template for plotting my test data. I am fairly new to Python, and although I have googled my question quite a lot, nothing I found helped. I have an Excel table with data in two columns that I want to plot against each other. My code looks as follows:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
file='C:/Documents/Test/test_file.xlsx'
df1=pd.read_excel(file,sheet_name='sheet1',header=0, engine="openpyxl")
plt.figure()
sns.lineplot(data=df1,x="eps",y="sigma",sort=False,linewidth=0.8)
The Excel file has, as mentioned, a header row with eps and sigma as the x and y values. The values that follow are floats; when I check the data types with df1.dtypes, the result is 'float64'. Does anyone have an idea what is not working? I get the error 'ufunc 'isfinite' not supported for the input types'.
The goal is to plot the two Excel columns against each other with pandas and seaborn and save the image.
This might be a library issue. I've been running into the same problem with the example datasets and even with a call as simple as:
sns.lineplot(x=[1], y=[1])
I'll update if I find a solution.
Edit: There seems to be an issue with NumPy that is causing this problem in seaborn. The solution is to downgrade NumPy to 1.23 until 1.24.1 is released.
https://github.com/mwaskom/seaborn/issues/3192
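To see whether an environment has the NumPy release this answer points at, something like the following could be run first. It is only a sketch: it assumes, from the wording of the answer, that 1.24.0 is the only affected release, and the helper names release_tuple and looks_affected are made up for illustration.

```python
import re

import numpy as np


def release_tuple(version):
    """Return (major, minor, patch) parsed from the start of a version string."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def looks_affected(version):
    # Assumption taken from the answer: only the 1.24.0 release is affected.
    return release_tuple(version) == (1, 24, 0)


if looks_affected(np.__version__):
    print("NumPy", np.__version__, "may trigger the 'isfinite' error; consider pinning 1.23.x")
else:
    print("NumPy", np.__version__, "is not the release named in the answer")
```

If the check reports 1.24.0, pinning with pip install "numpy<1.24" matches the downgrade suggested above.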

TypeError when using modin with pd.cut(df[column],300)

I first swap in Modin for pandas to distribute the work over multiple cores:
import modin.pandas as pd
from modin.config import Engine
Engine.put("dask")
After initializing my dataframe, I attempt to use:
df['bins'] = pd.cut(df[column],300)
I get this error:
TypeError: ('Could not serialize object of type function.', '<function PandasDataframe._build_mapreduce_func.<locals>._map_reduce_func at 0x7fbe78580680>')
I would be glad to get help.
I can't seem to get Modin to work out of the box the way I expected.
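For comparison, this is what the same binning call does in plain pandas. It is only a baseline sketch with made-up data (the column name "value" is hypothetical) and it does not address the Modin/Dask serialization error itself, but it can help confirm that the call and the data are fine before digging into the engine.

```python
import numpy as np
import pandas as pd  # plain pandas on purpose, not modin.pandas

# Hypothetical data: 1000 evenly spaced values in one column.
df = pd.DataFrame({"value": np.linspace(0.0, 1.0, 1000)})
column = "value"

# Same call as in the question: split the column's range into 300 equal-width bins.
df["bins"] = pd.cut(df[column], 300)

# The result is a categorical column whose categories are the 300 intervals.
n_bins = df["bins"].cat.categories.size
print(n_bins, df["bins"].dtype)
```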

fft gives KeyError: 'ALIGNED' in Pandas

The error only occurs when I use scipy fftpack on my data (read from Excel).
Plotting the data normally works just fine. I have seen suggestions to turn the data into an array, but I tried this and it did not work.
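A KeyError on 'ALIGNED' usually suggests that a pandas object, rather than a NumPy array, reached the FFT routine. The sketch below assumes that is the case here; the signal and the column name are invented, and numpy.fft is swapped in for scipy fftpack only to keep the example dependency-light. The converted array can be passed to scipy.fftpack.fft in the same way.

```python
import numpy as np
import pandas as pd

# Hypothetical signal: a 5 Hz sine sampled at 100 Hz for one second.
fs = 100
t = np.arange(fs) / fs
df = pd.DataFrame({"signal": np.sin(2 * np.pi * 5 * t)})

# Convert the column to a plain float ndarray before any FFT call.
samples = df["signal"].to_numpy(dtype=float)

# numpy.fft used here in place of scipy.fftpack (see the note above).
spectrum = np.fft.fft(samples)
freqs = np.fft.fftfreq(samples.size, d=1 / fs)

# Frequency of the strongest component, sign dropped.
peak_hz = abs(freqs[np.argmax(np.abs(spectrum))])
print(type(samples).__name__, peak_hz)
```

If the conversion was already tried, it is worth checking with type(...) that the object actually passed to the FFT call is the converted array and not the original column.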

Is the default data structure of sklearn datasets pandas or NumPy?

I'm working through an exercise in https://www.oreilly.com/library/view/hands-on-machine-learning/9781492032632/ and am finding unexpected behavior on my computer when I fetch a dataset. The following code returns
numpy.ndarray
on the author's Google Colab page, but returns
pandas.core.frame.DataFrame
on my local Jupyter notebook. As far as I know, my environment uses exactly the same library versions as the author's. I can easily convert the data to a NumPy array, but since I'm using this book as a guide for novices, I'd like to know what could be causing this discrepancy.
from sklearn.datasets import fetch_openml
mnist = fetch_openml('mnist_784', version=1)
mnist.keys()
type(mnist['data'])
The author's Google Colab notebook is at the following link; scroll down to the "MNIST" heading. Thanks!
https://colab.research.google.com/github/ageron/handson-ml2/blob/master/03_classification.ipynb#scrollTo=LjZxzwOs2Q2P
Just to close off this question, the comment by Ben Reiniger, namely to add as_frame=False, is correct. For example:
mnist = fetch_openml('mnist_784', version=1, as_frame=False)
The OP has already made this change to the Colab code in the link.
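The effect of as_frame can be seen without downloading MNIST. The sketch below uses load_iris as an offline stand-in (an assumption made for illustration; it is not the dataset from the question) and also shows the manual conversion with to_numpy().

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris

# Stand-in dataset bundled with scikit-learn, so no download is needed.
iris_frame = load_iris(as_frame=True)    # data comes back as a pandas DataFrame
iris_array = load_iris(as_frame=False)   # data comes back as a NumPy ndarray

print(type(iris_frame["data"]))
print(type(iris_array["data"]))

# Manual conversion, for code that already holds the DataFrame.
as_array = iris_frame["data"].to_numpy()
```

Passing as_frame explicitly keeps the result the same regardless of which default the installed scikit-learn version uses.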

floats in NumPy structured array and native string formatting with .format()

Can anyone tell me why this NumPy record is having trouble with Python's new-style string formatting? All floats in the record choke on "{:f}".format(record).
Thanks for your help!
In [334]: type(tmp)
Out[334]: numpy.core.records.record
In [335]: tmp
Out[335]: ('XYZZ', 2001123, -23.823917388916016)
In [336]: tmp.dtype
Out[336]: dtype([('sta', '|S6'), ('ondate', '<i8'), ('lat', '<f4')])
# Some formatting works fine
In [337]: '{0.sta:6.6s} {0.ondate:8d}'.format(tmp)
Out[337]: 'XYZZ    2001123'
# Any float has trouble
In [338]: '{0.sta:6.6s} {0.ondate:8d} {0.lat:11.6f}'.format(tmp)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
/Users/jkmacc/python/pisces/<ipython-input-338-e5f6bcc4f60f> in <module>()
----> 1 '{0.sta:6.6s} {0.ondate:8d} {0.lat:11.6f}'.format(tmp)
ValueError: Unknown format code 'f' for object of type 'str'
This question was answered on the NumPy user mailing list under "floats coerced to string with "{:f}".format() ?":
It seems that np.int64/32 and np.str inherit their respective native Python __format__(), but np.float32/64 doesn't get __builtin__.float.__format__(). That's not intuitive, but I see now why this works:
In [8]: '{:6.6s} {:8d} {:11.6f}'.format(tmp.sta, tmp.ondate, float(tmp.lat))
Out[8]: 'XYZZ    2001123  -23.820000'
Thanks!
-Jon
EDIT:
np.float32/int32 inherits from native Python types if your system is 32-bit. Same for 64-bit. A mismatch will generate the same problem as the original post.
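The workaround above can be reproduced on a small record. This is a sketch under a few assumptions: the record is rebuilt by hand from the values shown in the question, and the 'sta' field uses a unicode dtype ('U6') instead of '|S6' so that the 's' format code works on Python 3. Recent NumPy versions may format float32 fields directly, but the explicit conversions are harmless either way.

```python
import numpy as np

# Record rebuilt by hand from the question; 'U6' replaces '|S6' (assumption, see above).
dtype = [("sta", "U6"), ("ondate", "<i8"), ("lat", "<f4")]
table = np.array([("XYZZ", 2001123, -23.823917388916016)], dtype=dtype).view(np.recarray)
tmp = table[0]

# Convert each field to the matching native Python type before formatting.
line = "{:6.6s} {:8d} {:11.6f}".format(str(tmp.sta), int(tmp.ondate), float(tmp.lat))
print(repr(line))
```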