Calculating Root-Mean-Square of pandas dataframe column

I have 50 residual values in a pandas dataframe column df['Residuals']; they display in the format 00:00:00.0000 but hold actual values such as:
00:00:04.7328
00:00:01.4252
and so on. I want to calculate the RMS value of these times in seconds, but cannot convert them from this format to a plain decimal. The dtype of the values listed above is m8[ns], which I am unfamiliar with. My question is: how can I convert from this m8[ns] format to an integer and then run the calculations?

The first thing to check is the dtype: whether it is <m8[ns] (a timedelta, exposing TimedeltaProperties) or <M8[ns] (a datetime, exposing DatetimeProperties).
In the case of <m8[ns]:
df['Residuals'].dt.seconds + df['Residuals'].dt.microseconds*1e-6
should get you the answer.
In the case of <M8[ns]:
df['Residuals'].dt.second + df['Residuals'].dt.microsecond*1e-6 # without 's'
should get you the answer.
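To tie this back to the RMS calculation, here is a minimal sketch for the <m8[ns] (timedelta) case. Note that dt.total_seconds() is a one-step alternative to the seconds/microseconds arithmetic above, and it also accounts for any days component:
import numpy as np
import pandas as pd

# Sample data mirroring the question's residuals (a timedelta64[ns] column)
df = pd.DataFrame({'Residuals': pd.to_timedelta(['00:00:04.7328', '00:00:01.4252'])})

seconds = df['Residuals'].dt.total_seconds()  # float64 seconds, e.g. 4.7328
rms = np.sqrt((seconds ** 2).mean())          # root-mean-square in seconds
print(rms)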

Related

Pandas resample based on datetime column with duplicate datetimes, then plot

Suppose I have two columns time (e.g. 2019-02-13T22:31:47.000000000) and amount (e.g. 15). The time column might have duplicates.
What's the best way to resample amount into daily/monthly/yearly then plot?
I tried:
df.resample('M', on='time').sum().plot(x='time', y='amount')
but it says:
raise KeyError(key) from err
if is_scalar(key) and isna(key) and not self.hasnans:
KeyError: 'time'
Already verified that time is datetime (without null values):
df['time'].isnull().any()
False
Amount is float as well.
The documentation says
The object must have a datetime-like index
So, try:
df.set_index('time').resample('M').sum().plot(y='amount')
Note that after set_index('time') the column becomes the index, so it must not be passed as x= to plot.
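A small self-contained sketch of that fix, with made-up sample data (duplicate timestamps are fine, since resample groups rows by the datetime index):
import pandas as pd

df = pd.DataFrame({
    'time': pd.to_datetime(['2019-02-13 22:31:47',
                            '2019-02-13 22:31:47',   # duplicate timestamp
                            '2019-03-01 10:00:00']),
    'amount': [15.0, 5.0, 7.5],
})

monthly = df.set_index('time').resample('M').sum()
monthly.plot(y='amount')  # 'time' is the index now, so no x= argument is needed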

Pandas Python, format as a currency

Suppose I have a column with currency values as well as blanks. I want the blanks represented as 0.00 and the currency shown to two decimal places. How would I do this using pandas?
If you just want to alter the visual appearance you can use
df.fillna(0).apply(lambda x: ["{0:.2f}".format(item) for item in x])
This will fill all np.nan with 0 and then convert the full dataframe to strings with the given format specification. However, this is a rather blunt approach, since you lose the ability to do calculations with your data.
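If the numbers need to stay numeric, here is a sketch of a dtype-preserving alternative (the column name 'price' is made up for illustration): round the data itself, and restrict the two-decimal formatting to display only:
import numpy as np
import pandas as pd

df = pd.DataFrame({'price': [12.5, np.nan, 3.0]})

df['price'] = df['price'].fillna(0).round(2)        # still float64, still computable
pd.options.display.float_format = '{:.2f}'.format   # two decimals for display only
print(df)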

dataframe not reading float values

I have a dataframe that contains times in float format, for example 12.0, 12.25, 12.75, with 27 columns. I have an if statement which checks whether a user-given time is in the dataframe, but it only recognizes the 12.0-formatted times. I am checking, in the column "Timestamp" of the dataframe df4, whether the given time (return_time) is present, to get the corresponding index so that I can change its value in each column and then write it to a csv file.
if return_time in df4["Timestamp"]:
    idx = df4[df4["Timestamp"] == return_time].index.values
    df4.loc[idx, i] = "CHARGING"
    df4.to_csv("test.csv")
I should have gotten 27 different times as a result, to store in each column of the csv, but out of the 27 I only get the few that end in .0. It doesn't recognize the other values like .25, .50 or .75.
You don't need to pull the raw array out of the index to use .loc[...]; just use idx = df4[df4["Timestamp"] == return_time].index. Also note that `in` on a Series tests the index labels, not the values, which is why only the whole-number times appear to match; test against the values instead, e.g. (df4["Timestamp"] == return_time).any().
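A sketch of the corrected flow (df4, return_time and the loop variable i come from the question; the values below are made up):
import pandas as pd

df4 = pd.DataFrame({'Timestamp': [12.0, 12.25, 12.75]})
return_time, i = 12.25, 'Car1'  # 'Car1' is a hypothetical stand-in for the asker's column variable

# Compare against the column's values; `in` on a Series tests index labels.
if (df4['Timestamp'] == return_time).any():
    idx = df4[df4['Timestamp'] == return_time].index
    df4.loc[idx, i] = 'CHARGING'
    df4.to_csv('test.csv')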

Changing Excel Dates (As integers) mixed with timestamps in single column - Have tried str.extract

I have a dataframe with a column of dates; unfortunately my import (using read_excel) brought some dates in as datetimes and others as Excel serial integers.
What I am seeking is a column with dates only in format %Y-%m-%d
From research, Excel's day zero is 1900-01-00, so I could add these integers as day offsets. I have tried to use str.extract with a regex to separate the columns into two, one of datetimes, the other of integers; however, the result is NaN.
Here is an input code example
df = pd.DataFrame({'date_from': [pd.Timestamp('2022-09-10 00:00:00'), 44476, pd.Timestamp('2021-02-16 00:00:00')],
                   'date_to': [pd.Timestamp('2022-12-11 00:00:00'), 44455, pd.Timestamp('2021-12-16 00:00:00')]})
An attempt to first separate the columns by extracting the integers (dates imported from MS Excel):
df.date_from.str.extract(r'(\d\d\d\d\d)')
However, this gives NaN.
The reason I have tried to separate the integers out of the column is that I get an error when trying to act on the Excel dates within the mixed column (in other words, an error using the following code):
def convert_excel_time(excel_time):
    return pd.to_datetime('1900-01-01') + pd.to_timedelta(excel_time, 'D')
Any guidance on how I might get a column of dates only? I find the datetime modules and aspects of pandas and python the most frustrating of all to get to grips with!
thanks
You can convert the values to timedeltas with to_timedelta and errors='coerce' (non-integers become NaT), add the anchor Timestamp d, then convert the datetimes with errors='coerce', and finally combine the two with Series.fillna in a custom function:
def f(x):
    # https://stackoverflow.com/a/9574948/2901002
    d = pd.Timestamp(1899, 12, 30)
    timedeltas = pd.to_timedelta(x, unit='d', errors='coerce')
    dates = pd.to_datetime(x, errors='coerce')
    return (timedeltas + d).fillna(dates)

cols = ['date_from', 'date_to']
df[cols] = df[cols].apply(f)
print(df)
   date_from    date_to
0 2022-09-10 2022-12-11
1 2021-10-07 2021-09-16
2 2021-02-16 2021-12-16
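As a design note, the anchor pd.Timestamp(1899, 12, 30) rather than 1900-01-01 compensates for Excel's serial-date scheme: day 1 is 1900-01-01, and Excel erroneously treats 1900 as a leap year, so shifting the epoch back two days makes the addition line up with the dates Excel displays.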

Convert floats to ints in pandas dataframe

I have a pandas dataframe with a column 'Distance' of datatype float64.
Distance
14.827379
0.754254
0.2284546
1.833768
I want to convert these numbers to whole numbers (14, 0, 0, 1). I tried this, but I get the error "ValueError: Cannot convert NA to integer":
df['distance(kmint)'] = result['Distance'].astype('int')
Any help would be appreciated!!
I filtered out the NaN's from the dataframe using this:
result = result[np.isfinite(result['distance(km)'])]
Then, I was able to convert from float to int.
An alternative approach would be to convert the NaN values as part of your data import and cleaning process. A more generalized solution could involve specifying the values that are NaN in the read_table command by setting the na_values flag. What you want to make sure of is that there isn't some malformed data like 1.5km in one of your fields that is getting picked up as a NaN value.
pandas.read_table(..., na_values=None, keep_default_na=True, na_filter=True, ...)
Subsequently, once the dataframe is populated and the NaN values are identified properly, you can use the fillna method to substitute in zeros or the values that you identified as your distances.
Finally, it would probably be best to use notnull rather than isfinite when filtering before converting the column over to integers.
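Putting the import-and-clean route together in a short sketch (the CSV text and the 'n/a' marker are made up for illustration):
import io
import pandas as pd

csv = io.StringIO("Distance\n14.827379\n0.754254\nn/a\n1.833768\n")
result = pd.read_csv(csv, na_values=['n/a'])  # 'n/a' parsed as NaN on import

result['Distance'] = result['Distance'].fillna(0)            # substitute zeros for NaN
result['distance(kmint)'] = result['Distance'].astype(int)   # truncates toward zero
print(result)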