Error while trying to get .year from a column - pandas

When I try to get all years from my database, Python throws an error:
df = pd.to_datetime(df.index)
df['year'] = [i.year for i in df.index]
Output:
AttributeError: 'DatetimeIndex' object has no attribute 'index'

Use DatetimeIndex.year:
df['year'] = pd.to_datetime(df.index).year
In your solution, assign the converted values back to df.index instead of overwriting df:
df.index = pd.to_datetime(df.index)
df['year'] = [i.year for i in df.index]
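A minimal runnable sketch of both fixes, assuming a hypothetical frame whose index holds date strings (the data and the year_loop column name are made up for illustration). The point is that df stays a DataFrame; only the index is converted.

```python
import pandas as pd

# Hypothetical frame whose index holds date strings
df = pd.DataFrame({'value': [1, 2, 3]},
                  index=['2019-03-01', '2020-07-15', '2021-12-31'])

# Option 1: convert the index on the fly and use the vectorized .year attribute
df['year'] = pd.to_datetime(df.index).year

# Option 2: convert the index in place first; df itself is never overwritten
df.index = pd.to_datetime(df.index)
df['year_loop'] = [i.year for i in df.index]

print(df)
```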

Related

Dataframe index with isclose function

I have a dataframe with numerical values between 0 and 1. I am trying to create simple summary statistics (manually). When I use a boolean comparison I can get the index, but when I try to use math.isclose the function does not work and raises an error.
For example:
import pandas as pd
df1 = pd.DataFrame({'col1':[0,.05,0.74,0.76,1], 'col2': [0,
0.05,0.5, 0.75,1], 'x1': [1,2,3,4,5], 'x2':
[5,6,7,8,9]})
result75 = df1.index[round(df1['col2'],2) == 0.75].tolist()
value75 = df1['x2'][result75]
print(value75.mean())
This gives the correct result, but occasionally the result is NaN, so I tried:
result75 = df1.index[math.isclose(round(df1['col2'],2), 0.75, abs_tol = 0.011)].tolist()
value75 = df1['x2'][result75]
print(value75.mean())
This results in the following error message:
TypeError: cannot convert the series to <class 'float'>
Both are of type "bool", so I'm not sure what is going wrong here...
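math.isclose only accepts two scalar numbers, so passing a whole Series makes Python try to convert it to a float, which raises the TypeError. A boolean mask built from vectorized comparisons works: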
rows_meeting_condition = df1[(df1['col2'] > 0.74) & (df1['col2'] < 0.76)]
print(rows_meeting_condition['x2'].mean())
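Another option, not part of the original answer: NumPy's np.isclose is the element-wise counterpart of math.isclose and returns a boolean array that can be used directly as a row mask. A sketch using the question's df1, with rtol=0 so only the 0.011 absolute tolerance from the question applies:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'col1': [0, .05, 0.74, 0.76, 1],
                    'col2': [0, 0.05, 0.5, 0.75, 1],
                    'x1': [1, 2, 3, 4, 5],
                    'x2': [5, 6, 7, 8, 9]})

# Element-wise |value - 0.75| <= atol (rtol=0 disables the relative term)
mask75 = np.isclose(df1['col2'], 0.75, rtol=0, atol=0.011)
print(df1.loc[mask75, 'x2'].mean())

# On col1, both 0.74 and 0.76 lie within 0.011 of 0.75
mask_col1 = np.isclose(df1['col1'], 0.75, rtol=0, atol=0.011)
print(df1.loc[mask_col1, 'x2'].mean())
```

Because the tolerance is explicit, this avoids the round(..., 2) == 0.75 step, which can miss values due to floating-point representation.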

Pandas 'Timestamp' object is not subscriptable error

I have a data frame and I am trying to figure out the days of the week for a data set.
df['day_of_week'] = df['CMPLNT_TO_DT'].dt.day_name()
TypeError: 'Timestamp' object is not subscriptable
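Your problem is an incorrect assignment: df has been reassigned to a Timestamp, so it is no longer a DataFrame. With a real DataFrame column, .dt.day_name() works:
import pandas as pd
df = pd.DataFrame()
df['date']=pd.date_range('2021/5/9', '2021/5/14')
df['date'].dt.day_name()
Output: the day names Sunday, Monday, Tuesday, Wednesday, Thursday, Friday.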
But if df is a single Timestamp:
df = pd.Timestamp('2017-01-01T12')
df['CMPLNT_TO_DT'].dt.day_name()
Output:
TypeError: 'Timestamp' object is not subscriptable
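The problem is not with .dt.day_name(); indexing the Timestamp by itself already fails:
df = pd.Timestamp('2017-01-01T12')
df['CMPLNT_TO_DT']
which again raises:
TypeError: 'Timestamp' object is not subscriptable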
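For reference, a sketch of the working version, assuming CMPLNT_TO_DT is a column in a real DataFrame (the two sample dates are hypothetical). If the column was read as strings, pd.to_datetime converts it so the .dt accessor becomes available; a lone Timestamp has no columns, but it does have its own day_name() method.

```python
import pandas as pd

# Hypothetical sample: column name from the question, dates made up
df = pd.DataFrame({'CMPLNT_TO_DT': ['2017-01-01 12:00', '2017-01-02 08:30']})

# Ensure the column holds datetimes; .dt only works on datetime-like Series
df['CMPLNT_TO_DT'] = pd.to_datetime(df['CMPLNT_TO_DT'])
df['day_of_week'] = df['CMPLNT_TO_DT'].dt.day_name()
print(df)

# A single Timestamp: call the method directly instead of indexing it
ts = pd.Timestamp('2017-01-01T12')
print(ts.day_name())
```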

What is the most common way to convert a Timestamp to an int

I tried multiple ways, but they always give me an error. The most common error I get is:
AttributeError: 'Timestamp' object has no attribute 'astype'
Here is the line where I try to convert my element:
df.index.map(lambda x: x - oneSec if (pandas.to_datetime(x).astype(int) / 10**9) % 1 else x)
I tried x.astype(int) or x.to_datetime().astype(int)
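I think Index.where is the better tool here: it works on the whole index at once, keeping values where the condition is True and replacing the rest, so no per-element int conversion is needed: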
import pandas as pd
df = pd.DataFrame(index=(['2019-01-10 15:00:00','2019-01-10 15:00:01']))
df.index = pd.to_datetime(df.index)
mask = df.index.second == 1
print (mask)
[False True]
df.index = df.index.where(~mask, df.index - pd.Timedelta(1, unit='s'))
print (df)
Empty DataFrame
Columns: []
Index: [2019-01-10 15:00:00, 2019-01-10 15:00:00]
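To answer the title question directly: a pandas Timestamp has no astype method, but its value attribute is the integer number of nanoseconds since the Unix epoch, and timestamp() returns POSIX seconds as a float (pandas treats a naive Timestamp as UTC here). A minimal sketch with a hypothetical timestamp, including an integer version of the question's fractional-second test:

```python
import pandas as pd

# Hypothetical timestamp; naive, so interpreted as UTC
ts = pd.Timestamp('2019-01-10 15:00:01')

# .value: integer nanoseconds since 1970-01-01 00:00:00 UTC
ns = ts.value

# Whole POSIX seconds via integer division
seconds = ns // 10**9

# .timestamp(): POSIX seconds as a float
posix = ts.timestamp()

# Integer equivalent of the question's "(... / 10**9) % 1" check
has_fraction = ns % 10**9 != 0

print(ns, seconds, posix, has_fraction)
```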

Get day of year from a string date in pandas dataframe

I want to turn my date strings into day of year. I tried this code:
import pandas as pd
import datetime
data = pd.DataFrame()
data = pd.read_csv(xFilename, sep=",")
and got this DataFrame:
Index Date Tmin Tmax
0 1950-01-02 -16.508 -2.096
1 1950-01-03 -6.769 0.875
2 1950-01-04 -1.795 8.859
3 1950-01-05 1.995 9.487
4 1950-01-06 -17.738 -9.766
I tried this:
convert = lambda x: x.DatetimeIndex.dayofyear
data['Date'].map(convert)
and got this error:
AttributeError: 'str' object has no attribute 'DatetimeIndex'
I expect the new column to map 1950-01-02 to 2, 1950-01-03 to 3, and so on.
Thanks for your help, and sorry, I'm new to Python.
I think you need to pass the parse_dates parameter to read_csv so the Date column is parsed as datetimes, and then use Series.dt.dayofyear:
data = pd.read_csv(xFilename, parse_dates=["Date"])
data['dayofyear'] = data['Date'].dt.dayofyear
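A self-contained sketch of the same approach, with three rows from the question inlined through io.StringIO in place of the real file (xFilename). It also shows pd.to_datetime for the case where the column has already been read as strings:

```python
import io
import pandas as pd

# Three rows from the question, inlined so the sketch runs without the file
csv_text = """Date,Tmin,Tmax
1950-01-02,-16.508,-2.096
1950-01-03,-6.769,0.875
1950-01-04,-1.795,8.859
"""

# Parse Date while reading, then use the .dt accessor
data = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"])
data['dayofyear'] = data['Date'].dt.dayofyear

# If Date was read as plain strings, convert it before using .dt
raw = pd.read_csv(io.StringIO(csv_text))
raw['dayofyear'] = pd.to_datetime(raw['Date']).dt.dayofyear

print(data)
```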

Why am I returned an object when using std() in Pandas?

The printed averages of the spreads come out grouped and calculated correctly. Why do I get this as the result for the std_deviation column, instead of the standard deviation of the spread grouped by ticker?
<pandas.core.groupby.SeriesGroupBy object at 0x000000000484A588>
df = pd.read_csv('C:\\Users\\William\\Desktop\\tickdata.csv',
dtype={'ticker': str, 'bidPrice': np.float64, 'askPrice': np.float64, 'afterHours': str},
usecols=['ticker', 'bidPrice', 'askPrice', 'afterHours'],
nrows=3000000
)
df = df[df.afterHours == "False"]
df = df[df.bidPrice != 0]
df = df[df.askPrice != 0]
df['spread'] = (df.askPrice - df.bidPrice)
df['std_deviation'] = df['spread'].std(ddof=0)
df = df.groupby(['ticker'])
print(df['std_deviation'])
print(df['spread'].mean())
UPDATE: I'm no longer getting an object back, but now I'm trying to figure out how to display the standard deviation by ticker:
df['spread'] = (df.askPrice - df.bidPrice)
df2 = df.groupby(['ticker'])
print(df2['spread'].mean())
df = df.set_index('ticker')
print(df['spread'].std(ddof=0))
UPDATE2: I got the dataset I needed using:
df = df[df.afterHours == "False"]
df = df[df.bidPrice != 0]
df = df[df.askPrice != 0]
df['spread'] = (df.askPrice - df.bidPrice)
print(df.groupby(['ticker'])['spread'].mean())
print(df.groupby(['ticker'])['spread'].std(ddof=0))
This line:
df = df.groupby(['ticker'])
assigns df to a DataFrameGroupBy object, and
df['std_deviation']
is a SeriesGroupBy object (of the column).
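It's a good idea not to "shadow" a variable, i.e. reassign it to a completely different data type. Use a different variable name for the groupby result!
Note also that df['spread'].std(ddof=0) in the original code computes a single standard deviation over all rows, not one per ticker; the per-ticker version is df.groupby(['ticker'])['spread'].std(ddof=0), as in UPDATE2.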
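A sketch of the per-ticker statistics with the groupby kept in its own variable (grouped, a name chosen here for illustration). The tick data is hypothetical, with prices chosen so the spreads are exact in floating point. transform is one way to attach each ticker's standard deviation back onto every row, in case a std_deviation column is still wanted:

```python
import pandas as pd

# Hypothetical tick data; column names follow the question
df = pd.DataFrame({
    'ticker': ['AAA', 'AAA', 'BBB', 'BBB'],
    'bidPrice': [10.0, 10.0, 20.0, 20.0],
    'askPrice': [10.5, 11.5, 20.25, 20.25],
})
df['spread'] = df['askPrice'] - df['bidPrice']

# The GroupBy object gets its own name, so df remains a DataFrame
grouped = df.groupby('ticker')['spread']

# One row per ticker: mean and population standard deviation (ddof=0)
stats = pd.DataFrame({'mean': grouped.mean(), 'std': grouped.std(ddof=0)})
print(stats)

# Per-row column: each row receives its own ticker's standard deviation
df['std_deviation'] = grouped.transform(lambda s: s.std(ddof=0))
print(df)
```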