Row-wise average returns NaN - pandas

Here is my data frame
Where I wrote 1 or 2, I would like to get the mean/median of the previous column.
For instance, for DXC.N, the expected output where I wrote 1 is mean(nan, (-0.44..), 0.1127.., (-0.15..), (-0.19..), nan).
For EFX, the expected output where I wrote 2 is mean(nan, -0.14.., 0.06.., 0.13.., 0.007, nan).
I tried the following, but it returns only NaNs:
DF['Column8']=DF.groupby('Column1')['Column8'].mean()
Thanks,

I think you need something more like this:
You want the mean of Column7, not Column8, right?
# use transform() because it returns a Series whose length matches your original DataFrame
DF['Column8']=DF.groupby('Column1')['Column7'].transform('mean')
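To see why the original attempt produced only NaN, here is a minimal sketch. The column names follow the question, but the values are made up for illustration. It compares assigning the result of mean() directly with assigning the result of transform('mean').

```python
import numpy as np
import pandas as pd

# Hypothetical data: column names follow the question, values are invented.
DF = pd.DataFrame({
    "Column1": ["DXC.N", "DXC.N", "DXC.N", "EFX", "EFX"],
    "Column7": [np.nan, -0.4, 0.1, 0.2, 0.4],
})

# mean() returns one value per group, indexed by the group key.
per_group = DF.groupby("Column1")["Column7"].mean()

# Assignment aligns on index labels: 'DXC.N'/'EFX' never match 0..4,
# so every row of the new column ends up NaN.
DF["naive"] = per_group

# transform('mean') broadcasts each group's mean back to the original rows.
# NaN values inside a group are skipped by default.
DF["Column8"] = DF.groupby("Column1")["Column7"].transform("mean")

print(DF)
```

If the median is wanted instead, passing 'median' to transform() should work the same way.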

Related

pandas dataframe - how to find multiple column names with minimum values

I have a dataframe (small sample shown below, it has more columns), and I want to find the column names with the minimum values.
Right now, I have the following code to deal with it:
finaldf['min_pillar_score'] = finaldf.iloc[:, 2:9].idxmin(axis="columns")
This works fine, but does not return multiple values of column names in case there is more than one instance of minimum values. How can I change this to return multiple column names in case there is more than one instance of the minimum value?
Please note, I want row wise results, i.e. minimum column names for each row.
Thanks!
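Try the code below and see if it's in the output format you anticipated. It produces the intended result at least.
The result will be stored in mins.
# object dtype so that each cell can hold a list of column names
mins = df.idxmin(axis="columns").astype(object)
for i, r in df.iterrows():
    # keep every column whose value equals the row minimum
    mins[i] = list(r[r == r[mins[i]]].index)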
The question "Get column name where value is something in pandas dataframe" might also be helpful.
EDIT: adding an image of the output and the full code context.
Assuming this input as df:
   A  B  C  D
0  5  8  9  5
1  0  0  1  7
2  6  9  2  4
3  5  2  4  2
4  4  7  7  9
You can use the underlying numpy array to get the minimum value of the whole frame, then compare every value to that minimum and keep the columns that contain a match:
s = df.eq(df.to_numpy().min()).any()
list(s[s].index)
output: ['A', 'B']
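The snippet above matches against the minimum of the whole frame. Since the question asks for per-row results, a row-wise variant might look like the sketch below. It reuses the sample input from this answer; the names is_min and min_cols are just illustrative.

```python
import pandas as pd

# Same sample input as above.
df = pd.DataFrame({
    "A": [5, 0, 6, 5, 4],
    "B": [8, 0, 9, 2, 7],
    "C": [9, 1, 2, 4, 7],
    "D": [5, 7, 4, 2, 9],
})

# Compare every cell with the minimum of its own row (axis=0 aligns on the row index).
is_min = df.eq(df.min(axis=1), axis=0)

# For each row, collect the names of the columns that hold the row minimum.
min_cols = pd.Series(
    [list(df.columns[mask]) for mask in is_min.to_numpy()],
    index=df.index,
)

print(min_cols)
```

Ties show up as lists with more than one column name, which is what idxmin() alone cannot return.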

NaN output when multiplying row and column of dataframe in pandas

I have two data frames. The first one looks like this:
and the second one like so:
I am trying to multiply the values in the "number of donors" column of the second data frame (96 values) with the values in the first row of the first data frame, columns 0-95 (also 96 values).
Below is the code I have for multiplying the two right now, but as you can see the values are all NaN:
Does anyone know how to fix this?
Your second dataframe has dtype object; you must convert it to float:
df_sls.iloc[0,3:-1].astype(float)
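The original frames were only posted as images, so the following is a sketch with invented stand-in data rather than a reproduction of the question. It shows the float conversion, and also a second common cause of all-NaN products in pandas: arithmetic between two Series aligns on index labels, so labels that do not match yield NaN even when the dtypes are fine.

```python
import pandas as pd

# Hypothetical stand-ins: a row stored as strings (object dtype) and a numeric column.
row = pd.Series(["1.5", "2.0", "3.0"], index=[10, 11, 12], dtype=object)
donors = pd.Series([10, 20, 30], name="number of donors")  # labels 0, 1, 2

# Convert the object values to float, as suggested above.
row_as_float = row.astype(float)

# Series * Series aligns on labels; none of them overlap here, so everything is NaN.
aligned = row_as_float * donors

# Multiplying the underlying arrays ignores labels and works by position.
positional = row_as_float.to_numpy() * donors.to_numpy()

print(aligned)
print(positional)
```

If the result is still NaN after converting to float, checking whether the two index sets actually overlap may be worth a try.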

Removing values of a certain object type from a dataframe column in Pandas

I have a pandas dataframe where some values are integers and other values are an array. I simply want to drop all of the rows that contain the array (object datatype I believe) in my "ORIGIN_AIRPORT_ID" column, but I have not been able to figure out how to do so after trying many methods.
Here is what the first 20 rows of my dataframe look like. The values that show up like a list are the ones I want to remove. The dataset has a couple million rows, so I just need code that removes all of the array-like values in that specific dataframe column, if that makes sense.
df = df[df.origin_airport_ID.str.contains(',') == False]
Next time, consider giving us a data sample as text instead of an image. It's easier for us to test your example.
Original data:
    ITIN_ID             ORIGIN_AIRPORT_ID
0  20194146                         10397
1  20194147                         10397
2  20194148                         10397
3  20194149  [10397, 10398, 10399, 10400]
4  20194150                         10397
In your case, you can use the pd.to_numeric function:
df['ORIGIN_AIRPORT_ID'] = pd.to_numeric(df['ORIGIN_AIRPORT_ID'], errors='coerce')
It replaces every cell that cannot be converted into a number with NaN (Not a Number), so we get:
    ITIN_ID  ORIGIN_AIRPORT_ID
0  20194146            10397.0
1  20194147            10397.0
2  20194148            10397.0
3  20194149                NaN
4  20194150            10397.0
To remove these rows, just use .dropna():
df = df.dropna().astype('int')
This results in your desired DataFrame:
    ITIN_ID  ORIGIN_AIRPORT_ID
0  20194146              10397
1  20194147              10397
2  20194148              10397
4  20194150              10397
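Put together, the steps above can be sketched as follows. This assumes the list-like cells are stored as strings (the question does not say exactly how they are stored). As a small variation, it filters with a mask on the one column instead of calling dropna() on the whole frame, so rows with NaN in other columns are not dropped by accident.

```python
import pandas as pd

# Sample data from above; the list-like cell is assumed to be a string.
df = pd.DataFrame({
    "ITIN_ID": [20194146, 20194147, 20194148, 20194149, 20194150],
    "ORIGIN_AIRPORT_ID": [10397, 10397, 10397, "[10397, 10398, 10399, 10400]", 10397],
})

# Cells that cannot be parsed as a number become NaN.
numeric_id = pd.to_numeric(df["ORIGIN_AIRPORT_ID"], errors="coerce")

# Keep only the rows where this one column parsed successfully.
clean = df[numeric_id.notna()].copy()
clean["ORIGIN_AIRPORT_ID"] = clean["ORIGIN_AIRPORT_ID"].astype("int64")

print(clean)
```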

Parse dictionary inside dataframe

One column of my df holds either (1) a nested dictionary or (2) NaN as its value.
The dicts have 2 key-value pairs, like this one:
{'value': '1', 'info': {....}}
I only want the value of 'value'; the value of 'info' is not useful. Where the cell is NaN, we can leave NaN.
What is the easiest way to achieve this?
BTW, I tried df_september_p1['that_column_name']==np.nan
and df_september_p1['that_column_name']=='nan',
which yield the same Boolean values. The weird thing is that the 2nd row shows NaN as its value, but the result is False for the 2nd row. I don't get why.
You can use Series.str.get, which works well with dictionaries and with missing values (NaN):
df_september_p1['val'] = df_september_p1['that_column_name'].str.get('value')
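A small self-contained sketch of this, using made-up dictionaries shaped like the one in the question. It also touches on the side question: NaN never compares equal to anything, including itself, so == np.nan is always False, and isna() is the usual way to test for missing values.

```python
import numpy as np
import pandas as pd

# Hypothetical column: dicts with 'value' and 'info' keys, plus a missing entry.
s = pd.Series([
    {"value": "1", "info": {"a": 1}},
    np.nan,
    {"value": "2", "info": {}},
])

# .str.get looks up the key in each dict and leaves NaN where the cell is missing.
vals = s.str.get("value")

# Equality against NaN is always False, even for the missing row.
eq_nan = s == np.nan

# isna() flags the missing row correctly.
missing = s.isna()

print(vals)
print(eq_nan.tolist(), missing.tolist())
```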

Is there a pandas function to get the variable names in a column?

I'm thinking of a hypothetical dataframe (df) with around 50 columns and 30000 rows, and one hypothetical column, e.g. Toy = ['Ball','Doll','Horse',...,'Sheriff',etc].
Now I only have the name of the column (Toy), and I want to know which values the column contains, without duplicates.
I'm thinking of an output like that of the .describe() function
df['Toy'].describe()
but with more info, because now I'm only getting this output:
count     30904
unique        7
top      "Doll"
freq      16562
Name: Toy, dtype: object
In other words, how do I get the 7 values in this column? I was thinking of something like copying the column and deleting the duplicated values, but I'm pretty sure there is a shorter way. Do you know the right code, or whether I should use another library?
Thank you so much!
You can use the unique() function to list all the unique values in a column. In your case, to list the unique values in the Toy column of the dataframe df, the syntax would be:
df["Toy"].unique()
You can also use .drop_duplicates(), which returns a pandas Series:
df["Toy"].drop_duplicates()
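A quick sketch of the difference between the two, on a tiny made-up Toy column (the real data from the question is not available):

```python
import pandas as pd

# Hypothetical data with repeated values.
df = pd.DataFrame({"Toy": ["Ball", "Doll", "Horse", "Doll", "Ball", "Sheriff"]})

# unique() returns an array of the distinct values, in order of first appearance.
unique_values = df["Toy"].unique()

# drop_duplicates() returns a Series and keeps the index of each first occurrence.
unique_series = df["Toy"].drop_duplicates()

print(list(unique_values))
print(unique_series)
```

unique() is usually enough when only the values matter; drop_duplicates() helps when it is also useful to know where each value first occurs.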