Performing math operations when plotting Pandas Dataframe columns - numpy

I'd like to plot products, ratios, etc. of columns in a Pandas DataFrame without first creating a new column containing that product, ratio, etc. E.g.,
[df['A']/df['A']].plot()
doesn't work. For the following code:
x = np.array([[1,2,3],[4,5,6]])
df = pd.DataFrame(x,columns=['A','B','C'])
[df['A']/df['B']].plot()
I get the following error message: "AttributeError: 'list' object has no attribute 'plot' "

The problem is the square brackets in this line:
[df['A']/df['B']].plot()
df['A']/df['B'] already returns a pandas Series, but wrapping it in [...] puts that Series inside a Python list, and a list has no plot method.
To plot the result without adding it to the DataFrame, drop the brackets, for example:
import pandas as pd
import numpy as np
x = np.array([[1,2,3],[4,5,6]])
df = pd.DataFrame(x,columns=['A','B','C'])
df['A'].div(df['B']).plot()
which returns a <matplotlib.axes._subplots.AxesSubplot> object
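A minimal sketch of the difference, using the question's data. It assumes matplotlib is installed; the Agg backend is chosen only so the snippet runs without a display. df.eval is shown as one more way to express the ratio, and the temporary DataFrame is a convenience for plotting two derived series at once without touching df.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import numpy as np
import pandas as pd

x = np.array([[1, 2, 3], [4, 5, 6]])
df = pd.DataFrame(x, columns=["A", "B", "C"])

# Square brackets build a Python list that merely contains the Series.
wrapped = [df["A"] / df["B"]]

# Parentheses only group the expression, so the result stays a Series.
ratio = (df["A"] / df["B"])
product = df["A"] * df["C"]

# DataFrame.eval evaluates a string expression over the columns.
ratio_eval = df.eval("A / B")

# Plot several derived series together via a temporary frame; df is unchanged.
ax = pd.DataFrame({"A/B": ratio, "A*C": product}).plot()
```

Here ratio holds 0.5 and 0.8, and df still has only the columns A, B and C.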

Related

Pandas Rolling Operation on Categorical column

The code I am trying to execute:
for cat_name in df['movement_state'].cat.categories:
    transformed_df[f'{cat_name} Count'] = grouped_df['movement_state'].rolling(rolling_window_size, closed='both').apply(lambda s, cat=cat_name: s.value_counts()[cat])
    transformed_df[f'{cat_name} Ratio'] = grouped_df['movement_state'].rolling(rolling_window_size, closed='both').apply(lambda s, cat=cat_name: s.value_counts(normalize=True)[cat])
For reproduction purposes just assume the following:
import numpy as np
import pandas as pd
d = {'movement_state': pd.Categorical(np.random.choice(['moving', 'standing', 'parking'], 20))}
grouped_df = pd.DataFrame.from_dict(d)
rolling_window_size = 3
I want to do rolling window operations on my GroupBy object. I am selecting the column movement_state beforehand. This column is categorical, as shown below.
grouped_df['movement_state'].dtypes
# Output
CategoricalDtype(categories=['moving', 'parking', 'standing'], ordered=False)
If I execute it, I get these error messages:
pandas.core.base.DataError: No numeric types to aggregate
TypeError: cannot handle this type -> category
ValueError: could not convert string to float: 'standing'
In this snippet of rolling.py from the pandas source code, I read that the data must be converted to float64 before Cython can process it:
def _prep_values(self, values: ArrayLike) -> np.ndarray:
    """Convert input to numpy arrays for Cython routines"""
    if needs_i8_conversion(values.dtype):
        raise NotImplementedError(
            f"ops for {type(self).__name__} for this "
            f"dtype {values.dtype} are not implemented"
        )
    else:
        # GH #12373 : rolling functions error on float32 data
        # make sure the data is coerced to float64
        try:
            if isinstance(values, ExtensionArray):
                values = values.to_numpy(np.float64, na_value=np.nan)
            else:
                values = ensure_float64(values)
        except (ValueError, TypeError) as err:
            raise TypeError(f"cannot handle this type -> {values.dtype}") from err
My question to you
Is it possible to count the values of a categorical column in a pandas DataFrame using the rolling method as I tried to do?
A possible workaround I came up with is to use the codes of the categorical column instead of the string values. But this way, s.value_counts()[cat] would raise a KeyError if the window I am looking at does not contain every possible value.
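One way around the float64 restriction, offered as a sketch rather than the only answer, is to turn the categorical column into numeric indicator columns with pd.get_dummies and take a rolling sum of those. The random data below is seeded and given explicit categories for repeatability, which is an assumption on top of the question's reproduction code. The sketch also leaves out closed='both' and uses a plain fixed window. The last part shows a KeyError-free variant of the codes-based workaround, using a hypothetical helper name moving_code.

```python
import numpy as np
import pandas as pd

# Reproduction data in the spirit of the question (seeded, explicit categories).
rng = np.random.default_rng(0)
states = rng.choice(["moving", "standing", "parking"], 20)
grouped_df = pd.DataFrame(
    {"movement_state": pd.Categorical(states, categories=["moving", "parking", "standing"])}
)
rolling_window_size = 3

# One integer indicator column per category. For a categorical input,
# get_dummies creates a column for every category, even ones that never
# occur in a given window, so no KeyError can arise.
dummies = pd.get_dummies(grouped_df["movement_state"], dtype=int)

# Rolling sum of the indicators = count of each category in the window.
counts = dummies.rolling(rolling_window_size).sum()

# A full window always holds rolling_window_size rows, so divide to get ratios.
ratios = counts / rolling_window_size

transformed_df = pd.concat(
    [counts.add_suffix(" Count"), ratios.add_suffix(" Ratio")], axis=1
)

# Codes-based variant: compare integer codes directly instead of indexing
# value_counts(), which cannot fail when a category is absent from a window.
codes = grouped_df["movement_state"].cat.codes
moving_code = list(grouped_df["movement_state"].cat.categories).index("moving")
moving_count = codes.rolling(rolling_window_size).apply(
    lambda w: (w == moving_code).sum(), raw=True
)
print(transformed_df.head())
```

The first rolling_window_size - 1 rows are NaN because a fixed window needs that many observations by default.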

How can I get an interpolated value from a Pandas data frame?

I have a simple Pandas data frame with two columns, 'Angle' and 'rff'. I want to get an interpolated 'rff' value based on entering an Angle that falls between two Angle values (i.e. between two index values) in the data frame. For example, I'd like to enter 3.4 for the Angle and then get an interpolated 'rff'. What would be the best way to accomplish that?
import pandas as pd
data = [[1.0,45.0], [2,56], [3,58], [4,62],[5,70]] #Sample data
s= pd.DataFrame(data, columns = ['Angle', 'rff'])
print(s)
s = s.set_index('Angle') #Set 'Angle' as index
print(s)
result = s.at[3.0, "rff"]
print(result)
You may use numpy:
import numpy as np
np.interp(3.4, s.index, s.rff)
#59.6
You could use numpy for this:
import numpy as np
import pandas as pd
data = [[1.0,45.0], [2,56], [3,58], [4,62],[5,70]] #Sample data
s= pd.DataFrame(data, columns = ['Angle', 'rff'])
print(s)
print(np.interp(3.4, s.Angle, s.rff))
# 59.6
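A short sketch combining both answers with a pandas-only alternative. np.interp expects the x-coordinates to be increasing, so the index is sorted first as a precaution. The second approach inserts the query angle into the index and fills the gap with interpolate(method="index"), which interpolates linearly using the index values as x; it is a swapped-in technique, not something either answer proposed.

```python
import numpy as np
import pandas as pd

data = [[1.0, 45.0], [2, 56], [3, 58], [4, 62], [5, 70]]  # sample data from the question
s = pd.DataFrame(data, columns=["Angle", "rff"]).set_index("Angle")

# np.interp assumes increasing x-coordinates, so sort defensively.
s = s.sort_index()
value_np = np.interp(3.4, s.index, s["rff"])

# Pandas-only alternative: add 3.4 to the index (creating a NaN row),
# then fill it by linear interpolation over the numeric index values.
value_pd = (
    s.reindex(s.index.union([3.4]))
    .interpolate(method="index")
    .loc[3.4, "rff"]
)
print(value_np, value_pd)
```

Both approaches give 58 + 0.4 * (62 - 58), i.e. about 59.6, up to floating-point rounding.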

How do you append a column and drop a column with pandas dataframes? Can't figure out why it won't print the dataframe afterwards

The DataFrame that I am working with has a datetime object that I changed to a date object. I attempted to append the date object to be the last column in the DataFrame. I also wanted to drop the datetime object column.
Neither the append nor the drop operation works as expected. Nothing prints out afterwards. It should print the entire DataFrame (shortened, since it is long).
My code:
import pandas as pd
import numpy as np
df7=pd.read_csv('kc_house_data.csv')
print(df7)
mydates = pd.to_datetime(df7['date']).dt.date
print(mydates)
df7.append(mydates)
df7.drop(['date'], axis=1)
print(df7)
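Why drop and append? In your code, df7.append(mydates) and df7.drop(['date'], axis=1) each return a new DataFrame and leave df7 unchanged, so print(df7) shows the original data. (DataFrame.append also adds rows, not columns, and was removed in pandas 2.0.) You can overwrite the column instead: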
df7['date'] = pd.to_datetime(df7['date']).dt.date
import pandas as pd
import numpy as np
# read csv, convert column type
df7=pd.read_csv('kc_house_data.csv')
df7['date'] = pd.to_datetime(df7['date']).dt.date
print(df7)
Drop a column using df7.drop('date', axis=1, inplace=True).
Append a column using df7['date'] = mydates.
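A self-contained sketch of the drop-then-append sequence. The two-row CSV is a hypothetical stand-in for kc_house_data.csv, which is not shown in the question; only its date column matters here. It demonstrates that drop() without reassignment does not change df7, while reassigning the result and then adding the column does.

```python
import io
import pandas as pd

# Hypothetical stand-in for kc_house_data.csv; the real file is not shown here.
csv_text = "date,price\n2014-10-13,221900\n2014-12-09,538000\n"
df7 = pd.read_csv(io.StringIO(csv_text))

mydates = pd.to_datetime(df7["date"]).dt.date

# drop() returns a new DataFrame; discarding the result leaves df7 as it was.
df7.drop(columns="date")
print("date" in df7.columns)  # True

# Keep the result (or pass inplace=True) for the drop to take effect.
df7 = df7.drop(columns="date")

# Assigning to a column name that does not exist appends it as the last column.
df7["date"] = mydates
print(df7)
```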

How to create a DataFrame with index names different from `row` and write data into (`index`, `column`) pairs in Julia?

How can I create a DataFrame with Julia with index names that are different from Row and write values into a (index,column) pair?
I do the following in Python with pandas:
import pandas as pd
df = pd.DataFrame(index = ['Maria', 'John'], columns = ['consumption','age'])
df.loc['Maria', 'age'] = 52
I would like to do the same in Julia. How can I do this? The documentation shows a DataFrame similar to the one I would like to construct but I cannot figure out how.

Average of selected rows in csv file

In a CSV file, how can I calculate the average of selected rows in a column? [Image: Columns]
I did this:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
#Read the csv file:
df = pd.read_csv("D:\\xxxxx\\mmmmm.csv")
#Separate the columns and get the average:
# Skid:
S = df['Skid Number after milling'].mean()
But this just gave me the average of the entire column.
Thank you for the help!
To select rows of a pandas DataFrame or Series by position, use the .iloc indexer.
For example, df['A'].iloc[3:5] selects the fourth and fifth rows of column "A". Indexing starts at 0, and the position after the colon is excluded. The result is a pandas Series.
You can do the same with NumPy: df["A"].values[3:5]
This returns a NumPy array directly.
The mean of the selected rows can therefore be calculated with either:
df['A'].iloc[3:5].mean()
or
df["A"].values[3:5].mean()
See also the pandas documentation on indexing and selecting data.
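A small sketch of the options above on made-up data. The column name "A" and its values are hypothetical; in the question the column would be 'Skid Number after milling'. The boolean-mask line is an additional technique for averaging rows chosen by a condition rather than by position.

```python
import pandas as pd

# Hypothetical values; the real CSV from the question is not shown.
df = pd.DataFrame({"A": [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]})

# Positional slice: 0-based rows 3 and 4 (end position 5 is excluded).
mean_pos = df["A"].iloc[3:5].mean()  # (40 + 50) / 2

# Same selection as a NumPy array (to_numpy() is the newer spelling of .values).
mean_np = df["A"].to_numpy()[3:5].mean()

# Boolean mask: average only the rows that meet a condition.
mean_mask = df.loc[df["A"] >= 50, "A"].mean()  # (50 + 60) / 2
```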