What is the most common way to convert a Timestamp to an int - pandas

I tried multiple ways, but they always give me an error. The most common error I get is:
AttributeError: 'Timestamp' object has no attribute 'astype'
Here is the line where I try to convert my element :
df.index.map(lambda x: x - oneSec if (pandas.to_datetime(x).astype(int) / 10**9) % 1 else x)
I also tried x.astype(int) and x.to_datetime().astype(int).

I think Index.where is what you need here:
import pandas as pd

df = pd.DataFrame(index=['2019-01-10 15:00:00','2019-01-10 15:00:01'])
df.index = pd.to_datetime(df.index)
mask = df.index.second == 1
print (mask)
[False True]
df.index = df.index.where(~mask, df.index - pd.Timedelta(1, unit='s'))
print (df)
Empty DataFrame
Columns: []
Index: [2019-01-10 15:00:00, 2019-01-10 15:00:00]
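
As for the title question itself: a single Timestamp has no astype, but it carries its epoch value directly. A minimal sketch (attribute availability can vary slightly across pandas versions):

import pandas as pd

ts = pd.Timestamp('2019-01-10 15:00:01')
print(ts.value)             # nanoseconds since the epoch, as a plain int
print(int(ts.timestamp()))  # POSIX seconds, truncated to int

On a whole DatetimeIndex, df.index.asi8 gives the same nanosecond integers in vectorized form.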

Related

Error while trying to get .year from a column

When I try to get all the years from my database, Python throws an error:
df = pd.to_datetime(df.index)
df['year'] = [i.year for i in df.index]
Output:
AttributeError: 'DatetimeIndex' object has no attribute 'index'
Use DatetimeIndex.year:
df['year'] = pd.to_datetime(df.index).year
In your solution, assign back to df.index first:
df.index = pd.to_datetime(df.index)
df['year'] = [i.year for i in df.index]
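
A minimal runnable sketch of the first approach (the sample dates here are made up for illustration):

import pandas as pd

df = pd.DataFrame({'val': [1, 2]}, index=['2019-01-10', '2020-03-05'])
df['year'] = pd.to_datetime(df.index).year  # vectorized, no Python loop
print(df)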

Pandas Data frame column condition check based on length of the value

I have a pandas data frame which is created by reading an Excel file. The Excel file has a column called serial number. I then pass each serial number to another function which connects to an API and fetches the result set for that serial number.
My Code -:
def create_excel(filename):
    try:
        data = pd.read_excel(filename, usecols=[4,18,19,20,26,27,28],
                             converters={'Serial Number': '{:0>32}'.format})
    except Exception as e:
        sys.exit("Error reading %s: %s" % (filename, e))
    data["Subject Organization"].fillna("N/A", inplace=True)
    df = data[data['Subject Organization'].str.contains("Fannie", case=False)]
    #df['Serial Number'].apply(lambda x: '000'+x if len(x) == 29 else '00'+x if len(x) == 30 else '0'+x if len(x) == 31 else x)
    print(df)
    df.to_excel(r'Data.xlsx', index=False)
    output = df['Serial Number'].apply(lambda x: fetch_by_ser_no(x))
    df2 = pd.DataFrame(output)
    df2.columns = ['Output']
    df5 = pd.concat([df, df2], axis=1)
The problem I am facing: I want to check whether the result returned by fetch_by_ser_no() is blank, and if it is, pad the serial number to 34 characters by adding two more leading zeros and call the function again.
How can I do this without creating multiple data frames?
Any help is appreciated. Thanks.
You can try if ... else ... inside the lambda:
output = df['Serial Number'].apply(lambda x: 'ok' if fetch_by_ser_no(x) else 'badly')
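
To cover the retry-with-padding part of the question, a minimal sketch; fetch_with_retry is a hypothetical helper name, and it assumes fetch_by_ser_no() returns a falsy value (empty string or None; an empty frame would need a different check) when the lookup comes back blank:

def fetch_with_retry(ser_no):
    result = fetch_by_ser_no(ser_no)  # first attempt, serial number as-is
    if not result:
        # blank result: pad to 34 characters with leading zeros and retry once
        result = fetch_by_ser_no(ser_no.zfill(34))
    return result

output = df['Serial Number'].apply(fetch_with_retry)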

Function giving error when run on the same dataframe more than once

Function giving error when run on the same data frame more than once. It works fine the first time, but when run again on the same df it gives me this error:
IndexError: single positional indexer is out-of-bounds
def update_data(df):
    df.drop(df.columns[[-1, -2, -3]], axis=1, inplace=True)
    df.loc['Total'] = df.sum()
    df.iloc[-1, 0] = 'Group'
    df = df.set_index(list(df)[0])
    for i in range(1, 21):
        df.iloc[-1, i] = 100 + (100 * (
            (df.iloc[-1, i] - df.iloc[-1, 0]) / abs(df.iloc[-1, 0])))
    df.iloc[-1, 0] = 100
    xax = list(df.columns.values)
    yax = df.values[-1].tolist()
    d = {'period': xax, 'level': yax}
    index_level = pd.DataFrame(d)
    index_level['level'] = index_level['level'].round(3)
    return index_level
Using inplace=True in a function mutates the input data frame. Of course it fails the second time: your function presumes the data is in a certain shape at the start, and the first call has already broken that assumption.
df = pd.DataFrame([{'x': 0}])

def change(df):
    df.drop(columns=['x'], inplace=True)
    return len(df)

change(df)
Out[346]: 1
df
Out[347]:
Empty DataFrame
Columns: []
Index: [0]
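
One way out is to drop inplace=True and have the function work on its own copy, so every call sees the same input. A sketch of that pattern applied to the top of update_data (not the asker's exact code):

def update_data(df):
    df = df.copy()                                  # leave the caller's frame intact
    df = df.drop(df.columns[[-1, -2, -3]], axis=1)  # no inplace=True
    # ... rest of the function unchanged ...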

pandas: filter rows with list elements beginning with string?

I have the following dataframe.
d = pd.DataFrame({'a': [['foo', 'bar'], ['bar'], ['fah', 'baz']]})
I'd like to return just the rows whose lists in a contain a value beginning with f - i.e. the first and third rows.
This is what I've tried:
d[d.a.is_in('f')]
Use any with a generator expression inside a list comprehension:
d = d[[any(y.startswith('f') for y in x) for x in d['a']]]
print (d)
a
0 [foo, bar]
2 [fah, baz]
Detail (converted to a list just for display):
print ([list(y.startswith('f') for y in x) for x in d['a']])
[[True, False], [False], [True, False]]
Solution using .apply(), iterating over the individual list elements, checking with .startswith() and evaluating the length of the resultant list:
import pandas as pd
df = pd.DataFrame({'a': [['foo', 'bar'], ['bar'], ['fah', 'baz']]})
df = df[df.a.apply(lambda x: len([el for el in x if el.startswith('f')]) > 0)]
print(df)
which results in:
a
0 [foo, bar]
2 [fah, baz]
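
Another option, sketched as an alternative rather than a drop-in for either answer: explode the lists into one row per element, test each element with the vectorized string method, then collapse back to one flag per original row (explode requires pandas >= 0.25):

mask = df['a'].explode().str.startswith('f').groupby(level=0).any()
print(df[mask])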

Why am I returned an object when using std() in Pandas?

The print for the average of the spreads comes out grouped and calculated correctly. Why do I get the following returned as the result for the std_deviation column, instead of the standard deviation of the spread grouped by ticker?
pandas.core.groupby.SeriesGroupBy object at 0x000000000484A588
df = pd.read_csv('C:\\Users\\William\\Desktop\\tickdata.csv',
                 dtype={'ticker': str, 'bidPrice': np.float64, 'askPrice': np.float64, 'afterHours': str},
                 usecols=['ticker', 'bidPrice', 'askPrice', 'afterHours'],
                 nrows=3000000)
df = df[df.afterHours == "False"]
df = df[df.bidPrice != 0]
df = df[df.askPrice != 0]
df['spread'] = (df.askPrice - df.bidPrice)
df['std_deviation'] = df['spread'].std(ddof=0)
df = df.groupby(['ticker'])
print(df['std_deviation'])
print(df['spread'].mean())
UPDATE: I'm no longer getting an object back, but I'm now trying to figure out how to display the standard deviation by ticker:
df['spread'] = (df.askPrice - df.bidPrice)
df2 = df.groupby(['ticker'])
print(df2['spread'].mean())
df = df.set_index('ticker')
print(df['spread'].std(ddof=0))
UPDATE2: I got the dataset I needed using:
df = df[df.afterHours == "False"]
df = df[df.bidPrice != 0]
df = df[df.askPrice != 0]
df['spread'] = (df.askPrice - df.bidPrice)
print(df.groupby(['ticker'])['spread'].mean())
print(df.groupby(['ticker'])['spread'].std(ddof=0))
This line:
df = df.groupby(['ticker'])
assigns df to a DataFrameGroupBy object, and
df['std_deviation']
is a SeriesGroupBy object (of the column).
It's a good idea not to "shadow" / re-assign one variable to a completely different datatype. Try to use a different variable name for the groupby!
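
As a sketch of how to get both statistics per ticker in one pass, using named aggregation (the lambda keeps the population standard deviation to match the ddof=0 used above):

stats = df.groupby('ticker')['spread'].agg(
    mean='mean',
    std_deviation=lambda s: s.std(ddof=0),  # population std, matching ddof=0
)
print(stats)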