Pandas and correlation, can't make them work

I am trying to generate a correlation matrix using corr()
from datascience import *
import pandas as pd
import numpy as np
import scipy
import matplotlib.pyplot as plots
plots.style.use("fivethirtyeight")
premier = Table.read_table("Documents//Stats PL1.csv")
premier.corr()
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-21-9531b81a92c8> in <module>
----> 1 premier.corr()
~\anaconda3\lib\site-packages\datascience\tables.py in __getattr__(self, attr)
233 else:
234 msg = "'{0}' object has no attribute '{1}'".format(type(self).__name__, attr)
--> 235 raise AttributeError(msg)
236
237 ####################
AttributeError: 'Table' object has no attribute 'corr'
I have already reviewed all the pandas instructions and installed and uninstalled all the packages. This is not the first function that fails (the other one was df.to_excel), and reinstalling didn't solve that one either.
I really don't know where to look for an answer. Thanks for all the help!
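The traceback shows that premier is a datascience Table, not a pandas DataFrame, and Table has no corr method (the same mix-up would likely also explain df.to_excel failing if df was a Table). A minimal sketch, assuming the file is loaded with pandas instead, for example pd.read_csv("Documents//Stats PL1.csv"), or converted with the datascience method Table.to_df(). The column names and values below are made up for illustration:

```python
import pandas as pd

# Hypothetical stand-in for the CSV; with the real file you would use
# df = pd.read_csv("Documents//Stats PL1.csv") or df = premier.to_df().
df = pd.DataFrame({
    "goals": [10, 20, 30, 40],
    "shots": [50, 90, 160, 200],
    "team": ["A", "B", "C", "D"],  # non-numeric column
})

# corr() is a pandas DataFrame method. Keeping only numeric columns avoids
# an error on text columns such as "team" in newer pandas versions.
corr_matrix = df.select_dtypes("number").corr()
print(corr_matrix)
```

The result is a square DataFrame with one row and one column per numeric column, and 1.0 on the diagonal.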

Related

ImportError: cannot import name '_backports' from 'matplotlib.cbook'

---------------------------------------------------------------------------
ImportError Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_22516\254480426.py in <module>
2 import matplotlib.pyplot as plt
3 from mpl_toolkits.mplot3d import Axes3D, art3d # NOQA
----> 4 from matplotlib.cbook import _backports
5 from collections import defaultdict
6 import types
ImportError: cannot import name '_backports' from 'matplotlib.cbook' (C:\Users\saidt\anaconda3\envs\tensorflow\lib\site-packages\matplotlib\cbook\__init__.py)
I cannot solve this problem. I tried several fixes suggested in other posts (uninstalling, reinstalling, changing the matplotlib version), but I still have this issue. I even checked the matplotlib.cbook documentation (https://matplotlib.org/stable/api/cbook_api.html), but there is no _backports there. I don't know where to look for the solution; any help would be really appreciated.
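Since _backports is a private helper that the failing script imports from matplotlib.cbook, and the current cbook API has no such member, reinstalling the same matplotlib version is unlikely to bring it back. A small stdlib-only check, offered as a diagnostic sketch rather than a fix, reports whether a dotted module path is importable in the active environment before the script runs. The helper name module_available is hypothetical:

```python
import importlib.util


def module_available(name):
    """Return True if the (possibly dotted) module `name` can be found.

    For dotted names, find_spec imports the parent packages as a side effect.
    """
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # Raised when a parent package is missing or is not a package.
        return False


# The script in the question expects this private module; on a matplotlib
# install that does not ship it, this prints False.
print(module_available("matplotlib.cbook._backports"))
```

If it prints False, the code doing the import (or the matplotlib version it was written for) is the thing to change, not the installation.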

When I import numpy and pandas in Jupyter it gives an error; the same happens in Spyder, but Spyder works after starting a new kernel

import numpy as np
NameError Traceback (most recent call last)
<ipython-input-1-0aa0b027fcb6> in <module>
----> 1 import numpy as np
~\numpy.py in <module>
1 from numpy import*
2
----> 3 arr = array([1,2,3,4])
NameError: name 'array' is not defined
This is a "NameError", caused by the line
arr=array([1,2,3,4])
You should try something like this:
arr=np.array([1,2,3,4])
I found the error. It was a very bad mistake: there was a file named numpy.py in my user folder on C: (the ~\numpy.py in the traceback), so when importing numpy, Python loaded that file instead of the numpy module. I deleted it and everything worked fine.
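To confirm this kind of shadowing before deleting anything, one can ask Python where it would load numpy from. A minimal sketch using importlib.util.find_spec, which for a top-level name locates the file without executing it, so it still works when the shadowing file is broken. The helper name module_origin is hypothetical:

```python
import importlib.util


def module_origin(name):
    """Return the file Python would load for a top-level module, or None."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None


# A path inside your own folder (such as ~\numpy.py) instead of a
# site-packages directory means a local file is shadowing the real package.
print(module_origin("numpy"))
```

Renaming the local file (and deleting any __pycache__ copy of it next to it) lets the real package be found again.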
Try this:
arr=np.array([1,2,3,4])
Since you imported numpy as np, an array has to be created with this syntax:
arr=np.array([1,2,3])

How can I install GEOS?

I have problems installing basemap.
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap
I get the following error:
---------------------------------------------------------------------------
FileNotFoundError Traceback (most recent call last)
<ipython-input-1-db2649dcf0a1> in <module>
2 import numpy as np
3 import matplotlib.pyplot as plt
----> 4 from mpl_toolkits.basemap import Basemap
~/opt/anaconda3/lib/python3.7/site-packages/mpl_toolkits/basemap/__init__.py in <module>
154 # create dictionary that maps epsg codes to Basemap kwargs.
155 pyproj_datadir = os.environ['PROJ_LIB']
--> 156 epsgf = open(os.path.join(pyproj_datadir,'epsg'))
157 epsg_dict={}
158 for line in epsgf:
FileNotFoundError: [Errno 2] No such file or directory: '/Users/andreamathis/opt/anaconda3/share/proj/epsg'
It looks like the file 'epsg' is missing. Has anybody encountered this error before and does anyone know how to solve it?
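The traceback shows Basemap reading the PROJ_LIB environment variable at import time and opening the file epsg inside that directory, so the error means the directory PROJ_LIB points to has no epsg file. One workaround, offered as an assumption rather than a confirmed fix, is to point PROJ_LIB at a directory that does contain epsg before importing Basemap. The helper set_proj_lib and the candidate directories are hypothetical; search your own Anaconda installation for the epsg file to find the right one:

```python
import os


def set_proj_lib(candidate_dirs):
    """Point PROJ_LIB at the first directory that contains an 'epsg' file.

    Returns the chosen directory, or None if no candidate has the file.
    """
    for d in candidate_dirs:
        if os.path.isfile(os.path.join(d, "epsg")):
            os.environ["PROJ_LIB"] = d
            return d
    return None


# Hypothetical candidates inside the active conda environment; adjust them
# to wherever the epsg file actually lives on your machine.
prefix = os.environ.get("CONDA_PREFIX")
if prefix:
    found = set_proj_lib([
        os.path.join(prefix, "share", "proj"),
        os.path.join(prefix, "share", "basemap"),
    ])
    print("PROJ_LIB ->", found)

# Only after this should Basemap be imported:
# from mpl_toolkits.basemap import Basemap
```

Because the variable is read when the module is first imported, setting it after the failed import requires restarting the kernel.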

Unable to use seaborn.countplot

I'm trying to plot some graphs using the latest version of PyCharm as my Python IDE.
As an interpreter, I'm using Anaconda with Python 3.4.3-0.
Using conda install, I have installed the newest versions of pandas (0.17.0), seaborn (0.6.0), numpy (1.10.1), matplotlib (1.4.3) and ipython (4.0.1).
The file nesarc_pds.csv contains the following:
IDNUM,S1Q2I
39191,1
39787,1
40082,1
40189,1
40226,1
40637,1
41306,1
41627,1
41710,1
42113,1
42120,1
42720,1
42909,1
43092,1
7,2
15,2
25,2
40,2
46,2
49,2
57,2
63,2
68,2
100,2
104,2
116,2
125,2
136,2
137,2
145,2
168,2
3787,9
6554,9
7616,9
11686,9
12431,9
14889,9
17694,9
19440,9
20141,9
21540,9
22476,9
24207,9
25762,9
29045,9
29731,9
So, that being said, this is my code:
import pandas as pd
import numpy
import seaborn as snb
import matplotlib.pyplot as plt
data = pd.read_csv("nesarc_pds.csv", low_memory=False)
#converting variable to numeric
pd.to_numeric(data["S1Q2I"], errors='coerce')
#setting a new dataset...
sub1=data[(data["S1Q2I"]==1) & (data["S3BQ1A5"]==1)]
sub2 = sub1.copy()
#setting the missing data 9 = unknown into NaN
sub2["S1Q2I"] = sub2["S1Q2I"].replace(9, numpy.nan)
#setting date to categorical type
sub2["S1Q2I"] = sub2["S1Q2I"].astype('category')
#plotting
snb.countplot(x="S1Q2I", data=sub2)
plt.xlabel("blablabla")
plt.title("lalala")
And this is the error:
Traceback (most recent call last):
File "C:/Users/LPForGE_1/PycharmProjects/guido/haha.py", line 49, in <module>
snb.countplot(x="S1Q2I", data=sub2)
File "C:\Anaconda3\lib\site-packages\seaborn\categorical.py", line 2544, in countplot
errcolor)
File "C:\Anaconda3\lib\site-packages\seaborn\categorical.py", line 1263, in __init__
self.establish_colors(color, palette, saturation)
File "C:\Anaconda3\lib\site-packages\seaborn\categorical.py", line 300, in establish_colors
l = min(light_vals) * .6
ValueError: min() arg is an empty sequence
Any help would be really appreciated. I have pretty much run out of ideas trying to understand how to solve this.
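One plausible cause, offered as an assumption rather than a confirmed diagnosis: pd.to_numeric returns a new Series, so calling it without assigning the result leaves S1Q2I unchanged. If the columns were read as text (the full NESARC file may contain blank cells), comparing them with == 1 matches nothing, the subset is empty, and seaborn is left with no levels to color, which would be consistent with min(light_vals) failing on an empty sequence. The sketch below uses a made-up frame to show the empty filter and the fix of assigning the conversion back, plus an emptiness check before plotting:

```python
import numpy as np
import pandas as pd

# Hypothetical frame whose columns were read as text (an assumption about
# the real file, e.g. because some cells are blank).
data = pd.DataFrame({"S1Q2I": ["1", "1", "2", "9", " "],
                     "S3BQ1A5": ["1", "2", "1", "1", "1"]})

# Comparing text to the integer 1 matches nothing, so the subset is empty.
print(len(data[(data["S1Q2I"] == 1) & (data["S3BQ1A5"] == 1)]))  # 0

# pd.to_numeric returns a new Series; assign it back to change the column.
for col in ["S1Q2I", "S3BQ1A5"]:
    data[col] = pd.to_numeric(data[col], errors="coerce")

sub = data[(data["S1Q2I"] == 1) & (data["S3BQ1A5"] == 1)].copy()
# Mirrors the question: 9 -> NaN, then categorical (the replace is a no-op
# here because only rows with S1Q2I == 1 remain).
sub["S1Q2I"] = sub["S1Q2I"].replace(9, np.nan).astype("category")

# Guard against handing seaborn an empty frame.
if sub.empty:
    raise ValueError("subset is empty; check the filter values and dtypes")
print(len(sub))  # 1
```

With a non-empty sub, the question's snb.countplot(x="S1Q2I", data=sub) call has something to draw.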

How to resample a time series Pandas dataframe?

I am trying to resample 1-minute data to daily data. I have tried the following code in IPython:
import pandas as pd
import numpy as np
from pandas import Series, DataFrame, Panel
import matplotlib.pyplot as plt
%matplotlib inline
data = pd.read_csv("DATALOG_22_01_2014.csv",\
names = ['DATE','TIME','HUM1','TMP1','HUM2','TMP2','HUM3','TMP3','WS','WD'])
data.set_index(['DATE','TIME'])
data.resample('D',how=mean)
But I got the following error
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-75-aa63b6b16877> in <module>()
----> 1 data.resample('D', how=mean)
NameError: name 'mean' is not defined
Could you help me?
Thank you
Hugo
Try
data.resample('D', how='mean')
instead. Right now you're asking Python to pass the mean object to the resample method as the how argument, but you don't have one defined.
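Two further details matter here. resample needs a DatetimeIndex (or a datetime column passed with on=), and set_index returns a new frame, so the unassigned data.set_index(['DATE','TIME']) call in the question leaves data unchanged. Also, recent pandas releases no longer accept the how= argument; the aggregation is called as a method on the resampler instead. A minimal sketch under those assumptions, with made-up rows in the question's DATE/TIME layout and ISO-formatted dates assumed:

```python
import pandas as pd

# Hypothetical rows in the question's DATE/TIME layout (values made up).
data = pd.DataFrame({
    "DATE": ["2014-01-22", "2014-01-22", "2014-01-23", "2014-01-23"],
    "TIME": ["23:58:00", "23:59:00", "00:00:00", "00:01:00"],
    "TMP1": [20.0, 22.0, 10.0, 12.0],
})

# resample() needs datetimes, so combine the two text columns first.
data["TIMESTAMP"] = pd.to_datetime(data["DATE"] + " " + data["TIME"])
data = data.set_index("TIMESTAMP")  # set_index returns a new frame

# Daily means of the numeric columns; .mean() replaces the old how='mean'.
daily = data[["TMP1"]].resample("D").mean()
print(daily)
```

Here the two readings on 2014-01-22 average to 21.0 and the two on 2014-01-23 average to 11.0.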