dataframe not reading float values - pandas

I have a dataframe with 27 columns that contains times as floats, for example 12.0, 12.25, 12.75. I have an if statement that checks whether a user-given time is in the dataframe, but it only recognizes the 12.0-style times. I am checking a specific column "Timestamp" of the dataframe df4 for the given time (return_time) and getting the corresponding index, so that I can change its value in each column and then write the result to a CSV file.
if return_time in df4["Timestamp"]:
    idx = df4[df4["Timestamp"] == return_time].index.values
    df4.loc[idx, i] = "CHARGING"
df4.to_csv("test.csv")
I should have gotten 27 different times as a result, one stored in each column of the CSV, but out of the 27 I only get the few that end in .0. It doesn't recognize the other endings like .25, .50, or .75.

You don't need to get the values of the index that you want to select with .loc[...]. Just use idx = df4[df4["Timestamp"] == return_time].index
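A minimal sketch of the corrected block, assuming df4, return_time, and the loop variable i from the question. Note that `in` on a Series tests the index labels, not the values, which is likely why only the whole-number times appeared to match:
mask = df4["Timestamp"] == return_time
if mask.any():                    # membership test on the values, not the index
    idx = df4[mask].index         # .loc accepts the Index object directly
    df4.loc[idx, i] = "CHARGING"
df4.to_csv("test.csv")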


NaN output when multiplying row and column of dataframe in pandas

I have two data frames. The first one looks like this:
and the second one like so:
I am trying to multiply the values in the "number of donors" column of the second data frame (96 values) with the values in the first row of the first data frame, columns 0-95 (also 96 values).
Below is the code I have for multiplying the two right now, but as you can see the values are all NaN:
Does anyone know how to fix this?
Your second dataframe has dtype object; you must convert it to float:
df_sls.iloc[0, 3:-1].astype(float)
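A minimal sketch of the full multiplication, assuming the hypothetical frame names df_first and df_sls and that both selections contain 96 values in matching order:
# df_first and df_sls are hypothetical stand-ins for the two frames in the question
row = df_first.iloc[0, 0:96].astype(float)          # first row, columns 0-95
donors = df_sls['number of donors'].astype(float)   # 96 donor counts

# Multiply by position; .to_numpy() sidesteps index/label alignment,
# which is another common source of all-NaN results when multiplying Series
result = row.to_numpy() * donors.to_numpy()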

Changing Excel Dates (As integers) mixed with timestamps in single column - Have tried str.extract

I have a dataframe with a column of dates; unfortunately my import (using read_excel) brought some dates in as datetimes and others as Excel serial dates (integers).
What I am seeking is a column with dates only, in the format %Y-%m-%d.
From research, Excel starts at 1900-01-00, so I could add these integers. I have tried to use str.extract with a regex to separate the column into two: one of datetimes, the other of integers. However, the result is NaN.
Here is an input code example
df = pd.DataFrame({'date_from': [pd.Timestamp('2022-09-10 00:00:00'), 44476, pd.Timestamp('2021-02-16 00:00:00')],
                   'date_to': [pd.Timestamp('2022-12-11 00:00:00'), 44455, pd.Timestamp('2021-12-16 00:00:00')]})
My attempt first separates the columns by extracting the integers (the dates imported from MS Excel):
df.date_from.str.extract(r'(\d\d\d\d\d)')
However, this gives NaN.
The reason I have tried to separate the integers out of the column is that I get an error when trying to act on the Excel dates within the mixed column (in other words, an error when using the following code):
def convert_excel_time(excel_time):
    return pd.to_datetime('1900-01-01') + pd.to_timedelta(excel_time, 'D')
Any guidance on how I might get a column of dates only? I find the datetime modules and aspects of pandas and python the most frustrating of all to get to grips with!
thanks
You can convert the values to timedeltas with to_timedelta and errors='coerce' (non-integers become NaT), add an Excel-epoch Timestamp called d, then convert the original values to datetimes, again with errors='coerce', and finally combine the two results with Series.fillna in a custom function. Note that the epoch is 1899-12-30 rather than 1900-01-01: Excel counts serial 1 as 1900-01-01 and incorrectly treats 1900 as a leap year, so two days are subtracted:
def f(x):
    # https://stackoverflow.com/a/9574948/2901002
    d = pd.Timestamp(1899, 12, 30)                              # Excel's day-zero epoch
    timedeltas = pd.to_timedelta(x, unit='d', errors='coerce')  # integers parse; datetimes become NaT
    dates = pd.to_datetime(x, errors='coerce')                  # existing datetimes parse cleanly
    return (timedeltas + d).fillna(dates)

cols = ['date_from', 'date_to']
df[cols] = df[cols].apply(f)
print(df)
   date_from    date_to
0 2022-09-10 2022-12-11
1 2021-10-07 2021-09-16
2 2021-02-16 2021-12-16
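As a quick sanity check on the 1899-12-30 epoch (a standalone snippet, separate from the answer above), the two serials from the question map to the dates shown in the output:
import pandas as pd

print(pd.Timestamp(1899, 12, 30) + pd.to_timedelta(44476, unit='d'))  # 2021-10-07
print(pd.Timestamp(1899, 12, 30) + pd.to_timedelta(44455, unit='d'))  # 2021-09-16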

replace values in column in data frame with average value

I am doing data analysis with Python. I want to replace all values < 120 in one column with average_steam (average_steam = 123).
To access the column in the data frame I write data.steamin, which gives me all the values.
The code I tried is:
average_steam = data.steamin.mean()
print(average_steam)
data.steamin.replace(data.steamin <= 120, average_steam, inplace=True)
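Series.replace matches concrete values rather than a boolean condition, so the call above does not do what it appears to. A minimal sketch of a working approach, assuming the column really is data.steamin, uses Series.mask (or, equivalently, boolean .loc assignment):
import pandas as pd

# Hypothetical data standing in for the asker's frame
data = pd.DataFrame({'steamin': [100.0, 150.0, 119.0, 130.0]})

average_steam = data['steamin'].mean()

# Replace every value <= 120 with the average; other rows are left unchanged
data['steamin'] = data['steamin'].mask(data['steamin'] <= 120, average_steam)

# Equivalent alternative:
# data.loc[data['steamin'] <= 120, 'steamin'] = average_steam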

How to filter a Dataframe based on an ID-Column which corresponds to a second Dataframe containing conditions for each ID efficiently?

I have a DataFrame with one ID column and two data columns X, Y containing numeric values. For each ID there are several rows of data.
I have a second DataFrame with the same ID column and two numeric columns specifying the lower and upper limit of the X values for each ID.
I want to use the second DataFrame to filter the first so that it only has rows whose X values lie within the X_min-X_max range of the specific ID.
I can solve this by looping over the second DataFrame and filtering the groupby(ID) elements of the first, but that is slow for a large number of IDs. Is there an efficient way to solve this?
Example code with the data in df, the ranges in df_ranges, and the expected result in df_result. The real data frame is obviously a lot bigger.
import pandas as pd

x = [2.1, 2.2, 2.6, 2.4, 2.8, 3.5, 2.8, 3.2]
y = [3.1, 3.5, 3.4, 2.7, 2.1, 2.7, 4.1, 4.3]
ID = [0]*4 + [0.1]*4
x_min = [2.0, 3.0]
x_max = [2.5, 3.4]
IDs = [0, 0.1]

df = pd.DataFrame({'ID': ID, 'X': x, 'Y': y})
df_ranges = pd.DataFrame({'ID': IDs, 'X_min': x_min, 'X_max': x_max})
df_result = df.iloc[[0, 1, 3, 7], :]
Possible Solution:
def filter_ranges(grp, df_ranges):
    x_min = df_ranges.loc[df_ranges.ID == grp.name, 'X_min'].values[0]
    x_max = df_ranges.loc[df_ranges.ID == grp.name, 'X_max'].values[0]
    return grp.loc[(grp.X >= x_min) & (grp.X <= x_max), :]

target_df_grp = df.groupby('ID').apply(filter_ranges, df_ranges=df_ranges)
Try this:
merged = df.merge(df_ranges, on='ID')
target_df = merged[(merged.X >= merged.X_min) & (merged.X <= merged.X_max)][['ID', 'X', 'Y']]  # here the desired filter is applied
print(target_df) will give:
    ID    X    Y
0  0.0  2.1  3.1
1  0.0  2.2  3.5
3  0.0  2.4  2.7
7  0.1  3.2  4.3
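The merge-based filter is fully vectorized, so it avoids the per-group Python overhead of groupby().apply() and should scale much better for a large number of IDs. The trade-off is the intermediate merged frame, which carries the X_min/X_max columns on every row and costs some extra memory.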

Calculating Root-Mean-Square of pandas dataframe column

I have 50 residual values under df['Residuals'] in a pandas dataframe column, displayed in the format 00:00:00.0000, such as:
00:00:04.7328
00:00:01.4252
and so on. I want to calculate the RMS value of these times in seconds but cannot convert them from this format to a plain decimal. The dtype of the values listed above is m8[ns], which I am unfamiliar with. My question is: how can I convert from this m8[ns] format to a number and then run the calculations?
The first thing to pay attention to is the dtype: whether it is <m8[ns] (timedelta64, exposing TimedeltaProperties via .dt) or <M8[ns] (datetime64, exposing DatetimeProperties via .dt).
In the case of <m8[ns]:
df['Residuals'].dt.seconds + df['Residuals'].dt.microseconds*1e-6
should get you the answer.
In the case of <M8[ns]:
df['Residuals'].dt.second + df['Residuals'].dt.microsecond*1e-6 # without 's'
should get you the answer.
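To finish the original RMS calculation, here is a minimal sketch for the timedelta case, using .dt.total_seconds() as a more direct alternative to summing seconds and microseconds (sample values taken from the question):
import numpy as np
import pandas as pd

# Sample residuals from the question, parsed as timedelta64[ns]
df = pd.DataFrame({'Residuals': pd.to_timedelta(['00:00:04.7328', '00:00:01.4252'])})

seconds = df['Residuals'].dt.total_seconds()  # each residual in decimal seconds
rms = np.sqrt((seconds ** 2).mean())          # root-mean-square
print(rms)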