Regarding ValueError: NaTType does not support strftime - dataframe

My dataframe has 2 levels for columns and I need to convert column level[1] from datetime into strings, but the column headers contain some 'NaT's, so my strftime call is failing.
df.columns=['a','b','d','d']+[x.strftime('%m-%d-%Y') for x in df.columns.levels[1][:-1]]
This gives me the error:
ValueError: NaTType does not support strftime
Based on discussions on similar topic, I tried using
[x.datetime.strftime('%m-%d-%Y') for x in df.columns.levels[1][:-1]]
but then I get an error saying
AttributeError: 'Timestamp' object has no attribute 'datetime'
Is there anything I am missing? Please help.
Thank you!

You can add a condition when converting with:
[x.strftime('%m-%d-%Y') if not pd.isnull(x) else "01-01-1970"
 for x in df.columns.levels[1][:-1]]
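Applied to a standalone example (with hypothetical dates, including a NaT), the guarded comprehension looks like this:

```python
import pandas as pd

# Hypothetical column level: two real dates plus a missing value (NaT).
dates = pd.to_datetime(["2023-01-15", "2023-02-20", None])

# Guard each element: format real timestamps, substitute a placeholder for NaT.
labels = [x.strftime('%m-%d-%Y') if not pd.isnull(x) else "01-01-1970"
          for x in dates]
print(labels)  # ['01-15-2023', '02-20-2023', '01-01-1970']
```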

Related

Convert dataframe column from Object to numeric

Hello, I have a conversion question. I'm using some code to conditionally add a value to a new column in my dataframe (df). The new column ('new_col') is created with dtype object. How do I convert 'new_col' to float for aggregation in the code that follows? I'm new to Python and have tried several functions and methods. Any help would be greatly appreciated.
conds = [(df['sc1']=='UP_MJB'),(df['sc1']=='UP_MSCI')]
actions = [df['st1'],df['st2']]
df['new_col'] = np.select(conds,actions,default=df['sc1'])
Tried astype(float), got a ValueError. Talked to a teammate and tried pd.to_numeric(np.select(conds,actions,default=df['sc1'])). That worked.
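A minimal sketch of that approach, using made-up data matching the column names in the question (note that to_numeric is a top-level pandas function, pd.to_numeric, not a DataFrame method):

```python
import numpy as np
import pandas as pd

# Hypothetical data matching the column names in the question.
df = pd.DataFrame({
    'sc1': ['UP_MJB', 'UP_MSCI', 'OTHER'],
    'st1': ['1.5', '2.5', '3.5'],
    'st2': ['10.0', '20.0', '30.0'],
})

conds = [(df['sc1'] == 'UP_MJB'), (df['sc1'] == 'UP_MSCI')]
actions = [df['st1'], df['st2']]

# np.select returns an object array here, so coerce to float afterwards;
# errors='coerce' turns non-numeric defaults (like 'OTHER') into NaN.
df['new_col'] = pd.to_numeric(np.select(conds, actions, default=df['sc1']),
                              errors='coerce')
print(df['new_col'].dtype)  # float64
```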

Filtering DataFrame with static date value

I am trying to filter DataFrame to get all dates greater than '2012-09-15'
I tried the solution from another post, which suggested using
data.filter(data("date").lt(lit("2015-03-14")))
but I am getting an error:
TypeError: 'DataFrame' object is not callable
What is the solution for this?
You need square brackets around "date", i.e.
data.filter(data["date"] < lit("2015-03-14"))
Calling data("date") treats data as a function (rather than a DataFrame).
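The question is about Spark, but the same square-bracket column access applies in pandas, where the comparison produces a boolean mask. A minimal pandas sketch with made-up dates:

```python
import pandas as pd

# Hypothetical data; in pandas the comparison yields a boolean mask,
# and square brackets select the matching rows.
data = pd.DataFrame(
    {'date': pd.to_datetime(['2012-09-14', '2015-06-01', '2016-01-01'])})

later = data[data['date'] > '2012-09-15']
print(len(later))  # 2
```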

TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'Index'

I have the following pandas dataframe:
and I need to group the columns by quarter using the resample function
df_copy=df_copy.resample('Q',axis=1).mean()
however when I apply the function I get the following error:
TypeError: Only valid with DatetimeIndex, TimedeltaIndex or
PeriodIndex, but got an instance of 'Index'
Please, can you help me group the years into quarters? This function is driving me crazy. Before applying it, I also converted the column labels (2000-1, 2000-2, 2000-3, ...) from strings to datetime in order to resample.
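The error indicates the column labels are still a plain Index (of strings) rather than a DatetimeIndex. A sketch with made-up monthly columns, converting the labels first and transposing to resample along the columns (which avoids the deprecated axis=1 argument in recent pandas):

```python
import numpy as np
import pandas as pd

# Hypothetical monthly columns stored as strings, which resample rejects.
df_copy = pd.DataFrame(np.arange(12).reshape(2, 6),
                       columns=['2000-1', '2000-2', '2000-3',
                                '2000-4', '2000-5', '2000-6'])

# Convert the column labels to real datetimes first.
df_copy.columns = pd.to_datetime(df_copy.columns)

# Transpose, resample by quarter on the (now datetime) index, transpose back.
quarterly = df_copy.T.resample('Q').mean().T
print(quarterly)
```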

TypeError: <class 'datetime.time'> is not convertible to datetime

The problem is somewhat simple. My objective is to compute the days difference between two dates, say A and B.
These are my attempts:
df['daydiff'] = df['A']-df['B']
df['daydiff'] = ((df['A']) - (df['B'])).dt.days
df['daydiff'] = (pd.to_datetime(df['A'])-pd.to_datetime(df['B'])).dt.days
These worked for me before, but for some reason I keep getting this error this time:
TypeError: class 'datetime.time' is not convertible to datetime
When I export the df to excel, then the date works just fine. Any thoughts?
Use pd.Timestamp to handle the awkward differences in your formatted times.
df['A'] = df['A'].apply(pd.Timestamp) # will handle parsing
df['B'] = df['B'].apply(pd.Timestamp) # will handle parsing
df['day_diff'] = (df['A'] - df['B']).dt.days
Of course, if you don't want to change the format of the df['A'] and df['B'] within the DataFrame that you are outputting, you can do this in a one-liner.
df['day_diff'] = (df['A'].apply(pd.Timestamp) - df['B'].apply(pd.Timestamp)).dt.days
This will give you the days between as an integer.
When I applied the solution offered by emmet02, I got TypeError: Cannot convert input [00:00:00] of type as well. It basically means the dataframe contains missing timestamp values, represented as [00:00:00], and this value is rejected by the pandas.Timestamp function.
To address this, apply a suitable missing-value strategy to clean your data set before using
df.apply(pd.Timestamp)
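Putting emmet02's answer together on made-up data (two columns of date strings, no missing values):

```python
import pandas as pd

# Hypothetical frame with date strings in columns A and B.
df = pd.DataFrame({'A': ['2023-03-10', '2023-05-01'],
                   'B': ['2023-03-01', '2023-04-01']})

# Parse both columns, then take the difference in whole days.
df['day_diff'] = (df['A'].apply(pd.Timestamp)
                  - df['B'].apply(pd.Timestamp)).dt.days
print(df['day_diff'].tolist())  # [9, 30]
```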

python read data like -.345D-4

I am wondering how to read values like -.345D+1 with numpy/scipy.
The values are floats written in Fortran-style "D" exponent notation, with the leading zero omitted.
I have tried the numpy.loadtxt and got errors like
ValueError: invalid literal for float(): -.345D+01
Many thanks :)
You could write a converter and use the converters keyword. If cols are the indices of the columns where you expect this format:
converters = dict.fromkeys(cols, lambda x: float(x.replace("D", "E")))
np.loadtxt(yourfile, converters=converters)