Null value of mean in dataframe columns after imputation - pandas

I have a dataframe where I imputed nulls with the mean of the group ID. The null count after applying the imputation is 0 for all such columns. However, when I run describe, the mean of a few of the imputed columns shows as NaN. The null count for those columns is 0 and the data type is 'float16' (changed from 'float64' to save on in-memory computation). How could this possibly happen?
Thanks in advance!
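For context, here is a minimal sketch of the described workflow; the column names ID and value are made up, and the real data determines the outcome. One thing worth noting is that float16 can only represent magnitudes up to roughly 65504, so any larger value becomes inf after the cast and can surface as inf or NaN in later statistics such as describe():

import numpy as np
import pandas as pd

# hypothetical frame standing in for the real data: impute nulls with the per-ID mean
df = pd.DataFrame({
    "ID":    [1, 1, 1, 2, 2, 2],
    "value": [10.0, np.nan, 30.0, 70000.0, np.nan, 80000.0],
})
df["value"] = df["value"].fillna(df.groupby("ID")["value"].transform("mean"))
print(df["value"].isnull().sum())    # 0 -- the imputation leaves no nulls

# downcast to float16: values beyond ~65504 (the float16 maximum) overflow to inf
df["value"] = df["value"].astype("float16")
print(df["value"].describe())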

Related

NaN output when multiplying row and column of dataframe in pandas

I have two data frames. The first one looks like this:
and the second one like so:
I am trying to multiply the values in the 'number of donors' column of the second data frame (96 values) with the values in the first row of the first data frame, columns 0-95 (also 96 values).
Below is the code I have for multiplying the two right now, but as you can see the values are all NaN:
Does anyone know how to fix this?
Your second dataframe has dtype object; you must convert it to float:
df_sls.iloc[0,3:-1].astype(float)
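A minimal sketch of that fix, with hypothetical stand-ins for the two frames (the real layout comes from the question's data); the underlying arrays are multiplied so the result does not depend on the two Series sharing index labels:

import numpy as np
import pandas as pd

# hypothetical stand-ins for the frames shown in the question
df_first = pd.DataFrame(np.arange(96, dtype=float).reshape(1, -1))        # one row, columns 0-95
df_sls = pd.DataFrame({"number of donors": [str(v) for v in range(96)]})  # values stored as strings (dtype object)

# cast the object column to float, then multiply the raw arrays
donors = df_sls["number of donors"].astype(float).to_numpy()
result = df_first.iloc[0].to_numpy() * donors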

Is there a way to combine two columns in a dataset, keeping the larger float64 using Pandas?

I'll try to keep it simple, but these are very large datasets I am working with.
Theoretically I am trying to combine Columns A and B of my data frame.
But if A has a value in a row then B doesn't, and vice versa; that hole is filled with NaN:
A {1,2,NaN,4,5}
B {NaN,NaN,3,NaN,NaN}
I need A to equal {1,2,3,4,5}
EDIT:
Using
df.rename(columns={"a": "b"})
before you concatenate your data allows them to be combined easily if the only thing needed is to layer values over the NaNs.
df['A'] = df['A'].fillna(df['B'])
What this code does is fill all missing values of column A with the values found in column B.
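A quick sketch with the values from the question:

import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": [1, 2, np.nan, 4, 5],
    "B": [np.nan, np.nan, 3, np.nan, np.nan],
})

# fill the holes in A with whatever B has in the same rows
df["A"] = df["A"].fillna(df["B"])
print(df["A"].tolist())   # [1.0, 2.0, 3.0, 4.0, 5.0]

df['A'].combine_first(df['B']) is an equivalent alternative.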
For more options see: https://datascience.stackexchange.com/questions/17769/how-to-fill-missing-value-based-on-other-columns-in-pandas-dataframe

Filtering on a dataframe column with a combination of values

I have a dataframe which has 2 columns named TABLEID and STATID
There are different values in both columns.
When I filter the dataframe on the values '101PC' and 'ST101', it gives me 14K records, and when I filter on '102HT' and 'ST102', it also gives me 14K records. The issue is that when I try to combine both filters as below, it gives me a blank dataframe. I was expecting 28K records in the resultant dataframe. Any help is much appreciated.
df[df[['TABLEID','STATID']].apply(tuple, axis = 1).isin([('101PC', 'ST101'), ('102HT','ST102')])]
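For reference, a minimal sketch with made-up data showing the same tuple/isin pattern, together with the equivalent written as explicit masks; whether it explains the empty result depends on the actual data (for example stray whitespace or case differences in the values), so this is only a reproduction of the intended technique:

import pandas as pd

# made-up frame with the two ID columns from the question
df = pd.DataFrame({
    "TABLEID": ["101PC", "101PC", "102HT", "103XX"],
    "STATID":  ["ST101", "ST101", "ST102", "ST103"],
})

# pair-wise membership test: keep rows whose (TABLEID, STATID) pair is in the list
pairs = [("101PC", "ST101"), ("102HT", "ST102")]
out = df[df[["TABLEID", "STATID"]].apply(tuple, axis=1).isin(pairs)]

# equivalent filter written as explicit boolean masks
mask = ((df["TABLEID"] == "101PC") & (df["STATID"] == "ST101")) | \
       ((df["TABLEID"] == "102HT") & (df["STATID"] == "ST102"))
out2 = df[mask]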

Sorting a PySpark DataFrame's columns

Good morning,
I'd like to ask whether I can change the order of a dataframe's columns based on the number of null values in each, in PySpark.
For example: one column contains 5 null values, the second contains 3, and the third contains 4. The new dataframe must be sorted like this: [second column, third column, first column].
I hope you can help me. Thank you.
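A sketch of one way to do this, assuming a running SparkSession and a hypothetical three-column frame matching the example (5, 3 and 4 nulls respectively):

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# hypothetical data: "first" has 5 nulls, "second" has 3, "third" has 4
df = spark.createDataFrame(
    [(None, None, None), (None, None, None), (None, None, None),
     (None, 1, None), (None, 2, 1), (1, 3, 2), (2, 4, 3), (3, 5, 4)],
    schema="first int, second int, third int",
)

# count the nulls in every column in a single pass
null_counts = df.select(
    [F.count(F.when(F.col(c).isNull(), c)).alias(c) for c in df.columns]
).first().asDict()

# select the columns in ascending order of their null counts
ordered = sorted(df.columns, key=lambda c: null_counts[c])
df_sorted = df.select(ordered)   # [second, third, first]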

Difference in count of groupby and its indices

What is the difference between the length of a groupby object and the length of that groupby object's indices? I expected both statements to return the same number.
len(Fees.groupby(['InstituteCode','Code','ProgramType','Status','AcademicYear']))
8000
len(Fees.groupby(['InstituteCode','Code','ProgramType','Status','AcademicYear']).indices)
7433
Why do I get different numbers? Does it mean I have only 7433 distinct records for the given list of columns?
This was because the "Code" column was null for 568 records. Those were skipped in groupby. It became clear when I checked for null values using...
df.apply(lambda x: x.isnull().sum())
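A small sketch of the behaviour described above; rows whose grouping key is null are silently dropped from groupby unless dropna=False is passed (available since pandas 1.1):

import numpy as np
import pandas as pd

df = pd.DataFrame({"Code": ["A", "B", np.nan, "B"], "Fee": [10, 20, 30, 40]})

print(df.apply(lambda x: x.isnull().sum()))     # Code has 1 null
print(len(df.groupby("Code").indices))          # 2 -- the null key is dropped
print(len(df.groupby("Code", dropna=False)))    # 3 -- NaN kept as its own group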