How to resample a time series Pandas dataframe?

I am trying to resample 1-minute data to daily values. I tried the following code in IPython:
import pandas as pd
import numpy as np
from pandas import Series, DataFrame, Panel
import matplotlib.pyplot as plt
%matplotlib inline
data = pd.read_csv("DATALOG_22_01_2014.csv",\
names = ['DATE','TIME','HUM1','TMP1','HUM2','TMP2','HUM3','TMP3','WS','WD'])
data.set_index(['DATE','TIME'])
data.resample('D',how=mean)
But I got the following error
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-75-aa63b6b16877> in <module>()
----> 1 data.resample('D', how=mean)
NameError: name 'mean' is not defined
Could you help me?
Thank you
Hugo

Try
data.resample('D', how='mean')
instead. Right now you're asking Python to pass the mean object to the resample method as the how argument, but you don't have one defined.
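For readers on a current pandas release, where the how keyword is no longer accepted, a minimal sketch of the same idea follows. It assumes the DATE and TIME columns hold text that pd.to_datetime can parse; the sample rows and their format are hypothetical, since the real CSV is not shown. It also assigns the result of set_index, which the question's code discards.

```python
import pandas as pd

# Hypothetical rows standing in for the CSV; the real DATE/TIME format is unknown.
data = pd.DataFrame({
    "DATE": ["2014-01-22", "2014-01-22", "2014-01-23", "2014-01-23"],
    "TIME": ["00:00:00", "00:01:00", "00:00:00", "00:01:00"],
    "TMP1": [20.0, 22.0, 18.0, 19.0],
})

# Combine the two text columns into one timestamp and use it as the index.
# set_index returns a new DataFrame, so the result has to be assigned.
data["timestamp"] = pd.to_datetime(data["DATE"] + " " + data["TIME"])
data = data.set_index("timestamp").drop(columns=["DATE", "TIME"])

# Name the aggregation as a method call instead of passing how=...
daily = data.resample("D").mean()
print(daily)
```

resample only works on a datetime-like index, which is why the timestamp column is built first.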

Related

Cannot plot a graph using matplotlib (ImportError)

Exception has occurred: ImportError
dlopen(/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/PIL/_imaging.cpython-39-darwin.so, 0x0002): symbol not found in flat namespace '_xcb_connect'
File "/Users/showrov/Desktop/Machine learning/Preprosessing/import_dataset.py", line 2, in <module>
import matplotlib.pyplot as plt
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import sys
print(sys.version)
data=pd.read_csv('Data_customer.csv')
print(data)
plt.plot(data[:2],data[:2])
data[:2] returns the first 2 rows, not columns. To plot, you need to pass columns.
Select a column by name, e.g. data['columnName'], or by position with the iloc method.
For example, data.iloc[:, 1:2] selects the 2nd column.
For more information about indexing operations, see the pandas documentation on indexing.
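The difference between row slicing and column selection can be sketched as follows. The DataFrame and its column names are hypothetical, since the contents of Data_customer.csv are not shown.

```python
import pandas as pd

# Hypothetical data; the column names are assumptions, not from Data_customer.csv.
data = pd.DataFrame({"age": [25, 32, 47], "salary": [40000, 52000, 61000]})

first_rows = data[:2]          # slicing with [:2] selects ROWS 0 and 1
by_name = data["salary"]       # one column, selected by label (a Series)
by_position = data.iloc[:, 1]  # the same column, selected by position
as_frame = data.iloc[:, 1:2]   # still the second column, but kept as a DataFrame

print(first_rows.shape, by_name.shape, as_frame.shape)
```

Either by_name or by_position is a one-dimensional column of the kind plt.plot expects for its x and y arguments.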

Importing numpy and pandas raises an error in Jupyter and Spyder

When I import numpy and pandas in Jupyter, I get an error. The same happens in Spyder, but there it works after starting a new kernel.
import numpy as np
NameError Traceback (most recent call last)
<ipython-input-1-0aa0b027fcb6> in <module>
----> 1 import numpy as np
~\numpy.py in <module>
1 from numpy import*
2
----> 3 arr = array([1,2,3,4])
NameError: name 'array' is not defined
The NameError is raised by this line:
arr=array([1,2,3,4])
Try something like this instead:
arr=np.array([1,2,3,4])
I found the error. It was a very bad mistake: I had a program named numpy.py among my own files, so when importing numpy, Python loaded that file instead of the numpy module. I deleted it and everything worked fine.
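One way to spot this kind of shadowing is to ask Python which file it would import for a given module name. The helper below, module_origin, is a hypothetical name introduced for illustration; it relies only on the standard library's importlib.util.find_spec.

```python
import importlib.util

def module_origin(name):
    """Return the file a top-level module would be imported from, or None."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None

# If this prints a path inside your own project or home folder instead of
# site-packages, a local file is shadowing the installed package.
print(module_origin("numpy"))
```

In the traceback above, the path ~\numpy.py is the hint that a local file was picked up.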
Try this:
arr=np.array([1,2,3,4])
As you are using numpy as np, to create an array the following syntax is needed:
arr=np.array([1,2,3])

AttributeError: 'list' object has no attribute 'astype' (Python 3.6, pandas)

I am trying to execute the following code:
sapmle2000submission.astype('int32').dtypes
which raises an error:
AttributeError Traceback (most recent call last)
in ()
----> 1 sapmle2000submission.astype('int32').dtypes
AttributeError: 'list' object has no attribute 'astype'
Can someone please help me to figure out why?
It looks like the object sapmle2000submission is a list, not a pandas Series.
You can convert it to a Series and specify its dtype:
import pandas as pd
sapmle2000submission_series = pd.Series(sapmle2000submission, dtype='int32')
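As a small self-contained illustration, assuming the list holds whole-number values (its real contents are not shown in the question), the original astype call works once the list is wrapped in a Series:

```python
import pandas as pd

# Hypothetical contents; the original list is not shown in the question.
sapmle2000submission = [1.0, 2.0, 3.0]

# A plain list has no .astype method, which is what the AttributeError reports.
print(hasattr(sapmle2000submission, "astype"))

# Wrapping the list in a Series makes .astype and .dtypes available.
series = pd.Series(sapmle2000submission).astype("int32")
print(series.dtypes)
```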

NameError: name 'pd' is not defined

I am attempting to run the following in Jupyter:
import pandas as pd
import matplotlib.pyplot as plt # plotting
import numpy as np # dense matrices
from scipy.sparse import csr_matrix # sparse matrices
%matplotlib inline
However when loading the dataset with
wiki = pd.read_csv('people_wiki.csv')
# add id column
wiki['id'] = range(0, len(wiki))
wiki.head(10)
the following error persists
NameError Traceback (most recent call last)
<ipython-input-1-56330c326580> in <module>()
----> 1 wiki = pd.read_csv('people_wiki.csv')
2 # add id column
3 wiki['id'] = range(0, len(wiki))
4 wiki.head(10)
NameError: name 'pd' is not defined
Any suggestions appreciated
Select Restart & Clear Output and run the cells again from the beginning.
I had the same issue, and as Ivan suggested in the comment, this resolved it.
If you came here from a duplicate, notice also that your code needs to contain
import pandas as pd
in the first place. If you are using a notebook like Jupyter and it's already there, or if you just added it, you probably need to re-evaluate the cell, as suggested in the currently top-voted answer by martin-martin.
The Python version needs to be 3.6 or above; I think you have been using Python 2.7. Select your Python environment version at the top right.
Be sure to load / import Pandas first
When stepping through the Anaconda Navigator demo, I found that pressing "play" on the first line before inputting the second line resolved the issue.
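The effect of running cells out of order can be reproduced outside Jupyter. In this rough sketch a plain dict stands in for the kernel's namespace; it is an illustration only and not how Jupyter is implemented.

```python
# A dict stands in for the kernel's namespace (an illustration, not Jupyter itself).
fresh_kernel = {}

try:
    # Running the second cell first: 'pd' has never been bound in this namespace.
    exec("wiki = pd.DataFrame({'id': range(3)})", fresh_kernel)
except NameError as err:
    message = str(err)
    print(message)  # name 'pd' is not defined

# Running the import cell first, then the second cell, succeeds.
exec("import pandas as pd", fresh_kernel)
exec("wiki = pd.DataFrame({'id': range(3)})", fresh_kernel)
print(len(fresh_kernel["wiki"]))  # 3
```

Restarting the kernel empties that namespace again, which is why the import cell has to be re-run afterwards.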

Unable to use seaborn.countplot

I'm trying to plot some graphs using the latest version of PyCharm as a Python IDE.
As an interpreter, I'm using Anaconda with Python 3.4.3-0.
Using conda install, I have installed the newest versions of pandas (0.17.0), seaborn (0.6.0), numpy (1.10.1), matplotlib (1.4.3), and ipython (4.0.1).
Inside the nesarc_pds.csv I have this:
IDNUM,S1Q2I
39191,1
39787,1
40082,1
40189,1
40226,1
40637,1
41306,1
41627,1
41710,1
42113,1
42120,1
42720,1
42909,1
43092,1
7,2
15,2
25,2
40,2
46,2
49,2
57,2
63,2
68,2
100,2
104,2
116,2
125,2
136,2
137,2
145,2
168,2
3787,9
6554,9
7616,9
11686,9
12431,9
14889,9
17694,9
19440,9
20141,9
21540,9
22476,9
24207,9
25762,9
29045,9
29731,9
So, that being said, this is my code:
import pandas as pd
import numpy
import seaborn as snb
import matplotlib.pyplot as plt
data = pd.read_csv("nesarc_pds.csv", low_memory=False)
#converting variable to numeric
pd.to_numeric(data["S1Q2I"], errors='coerce')
#setting a new dataset...
sub1=data[(data["S1Q2I"]==1) & (data["S3BQ1A5"]==1)]
sub2 = sub1.copy()
#setting the missing data 9 = unknown into NaN
sub2["S1Q2I"] = sub2["S1Q2I"].replace(9, numpy.nan)
#setting date to categorical type
sub2["S1Q2I"] = sub2["S1Q2I"].astype('category')
#plotting
snb.countplot(x="S1Q2I", data=sub2)
plt.xlabel("blablabla")
plt.title("lalala")
And then.....this is the error:
Traceback (most recent call last):
File "C:/Users/LPForGE_1/PycharmProjects/guido/haha.py", line 49, in <module>
snb.countplot(x="S1Q2I", data=sub2)
File "C:\Anaconda3\lib\site-packages\seaborn\categorical.py", line 2544, in countplot
errcolor)
File "C:\Anaconda3\lib\site-packages\seaborn\categorical.py", line 1263, in __init__
self.establish_colors(color, palette, saturation)
File "C:\Anaconda3\lib\site-packages\seaborn\categorical.py", line 300, in establish_colors
l = min(light_vals) * .6
ValueError: min() arg is an empty sequence
Any help would be really nice. I pretty much exhausted my intelligence trying to understand how to solve this.