Remove NaN values from pandas dataframes inside a list - pandas

I have a number of dataframes inside a list. I am trying to remove NaN values. I tried to do it in a for loop:
for i in list_of_dataframes:
    i.dropna()
It didn't work, but Python didn't raise an error either. If I apply the code
list_of_dataframes[0] = list_of_dataframes[0].dropna()
to each dataframe individually it works, but I have too many of them. There must be a way that I just can't figure out. What are the possible solutions?
Thanks a lot

dropna() returns a new DataFrame instead of modifying the original. You didn't assign the new DataFrames with the dropped values to anything, so there was no effect.
Try this:
for i in range(len(list_of_dataframes)):
    list_of_dataframes[i] = list_of_dataframes[i].dropna()
Or, more conveniently:
for df in list_of_dataframes:
    df.dropna(inplace=True)
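
A third option, not mentioned in the answer above, is to build a new list with a list comprehension. The sketch below uses made-up sample data in place of the asker's list_of_dataframes and leaves the original frames untouched:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the asker's list_of_dataframes.
list_of_dataframes = [
    pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [4.0, 5.0, np.nan]}),
    pd.DataFrame({"a": [np.nan, 2.0], "b": [1.0, 2.0]}),
]

# dropna() returns a new DataFrame, so collect the results in a new list.
cleaned = [df.dropna() for df in list_of_dataframes]

# Number of rows left in each cleaned frame.
row_counts = [len(df) for df in cleaned]
print(row_counts)
```

Assigning the result back (list_of_dataframes = cleaned) would replace the original list if the uncleaned frames are no longer needed.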

Related

Put dataframes below each other

I have two data frames that I want to put below each other (see screenshots)
series1 = df_new[['hour', 'minute','value']]
series2 = df_new[['hour', 'minute','value.1']]
I tried to use the command
a= pd.concat([series1, series2])
but I get this instead of the dataframes below each other, and I have no idea why I get NaN values. Can you help me with that?
Try renaming the column first, so that both frames have the same column names:
pd.concat([series1,series2.rename(columns={'value.1':'value'})], ignore_index=True)
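
To see why the NaN values appear: pd.concat aligns on column names, so 'value' and 'value.1' end up as two separate columns, each half empty. A small sketch with invented data for df_new (the real data is not shown in the question):

```python
import pandas as pd

# Hypothetical stand-in for the asker's df_new.
df_new = pd.DataFrame({
    "hour": [1, 2],
    "minute": [0, 30],
    "value": [10.0, 20.0],
    "value.1": [30.0, 40.0],
})
series1 = df_new[["hour", "minute", "value"]]
series2 = df_new[["hour", "minute", "value.1"]]

# Column names differ, so concat keeps both columns and fills the gaps with NaN.
misaligned = pd.concat([series1, series2])

# After renaming, the columns line up and the rows are simply stacked.
stacked = pd.concat(
    [series1, series2.rename(columns={"value.1": "value"})],
    ignore_index=True,
)

print(misaligned)
print(stacked)
```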

Pandas dataframe being treated as a series object after using groupby

I am conducting an analysis of a dataset. To find my results, I use this line of code:
new_df = df_ncis.groupby(['state', 'year'])['totals'].mean()
The object returned by this statement is a Series, when it should be a dataframe. I don't understand why this happened, or how to solve this issue. Also, one of the columns of the new object is missing its name. Here is the github link for the project: https://github.com/louishrm/gundataUS.
Any help would be great.
You are selecting ['totals'] from the grouped object, and a single-column selection returns a Series.
Try this instead:
new_df = df_ncis[['state', 'year', 'totals']].groupby(['state', 'year']).mean()
which will give you a DataFrame, with state and year as the index and totals as the column.
Or, if you want to select the single column as a DataFrame directly (note the double brackets):
new_df = df_ncis.groupby(['state', 'year'])[['totals']].mean()
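
The difference between single and double brackets can be checked on a toy frame. The data below is invented; only the column names come from the question. The reset_index() call is an extra step not in the answer above, shown as one way to turn the group keys back into ordinary named columns:

```python
import pandas as pd

# Hypothetical miniature version of df_ncis.
df_ncis = pd.DataFrame({
    "state": ["A", "A", "B"],
    "year": [2000, 2000, 2001],
    "totals": [10, 20, 30],
})

# Single brackets select one column, so the result is a Series.
as_series = df_ncis.groupby(["state", "year"])["totals"].mean()

# Double brackets select a list of columns, so the result is a DataFrame.
as_frame = df_ncis.groupby(["state", "year"])[["totals"]].mean()

# reset_index() moves the group keys from the index into regular columns.
flat = as_series.reset_index()

print(type(as_series).__name__, type(as_frame).__name__)
print(flat)
```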

Pandas str slice in combination with Pandas str index

I have a DataFrame containing a single column with a list of file names. I want to find all rows in the DataFrame whose value starts with a prefix from a set of known prefixes.
I know I can run a simple for loop, but I want to run in a Dataframe to check speeds and run benchmarks - it's also a nice exercise.
What I had in mind is combining str.slice with str.index but I can't get it to work. This is what I have in mind:
import pandas as pd
file_prefixes = {...}
file_df = pd.DataFrame(list_of_file_names)
file_df.loc[file_df.file.str.slice(start=0, stop=upload_df.file.str.index('/')-1).isin(file_prefixes), :] # this doesn't work as the index returns a dataframe
My hope is that this code will return all rows whose value starts with a file prefix from the set above.
In summary, I would like help with 2 things:
Combining slice and index
Thoughts about better ways to achieve this
Thanks
I would use str.startswith, which accepts a tuple of prefixes:
file_df.loc[file_df.file.str.startswith(tuple(file_prefixes)), :]
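
A runnable sketch of this, with invented file names and prefixes (the question elides the real ones). It also shows one possible replacement for the slice-plus-index idea: split on the first '/' and test the leading part with isin. Note that the two are not equivalent in general, since startswith also matches names such as 'uploads_old/x':

```python
import pandas as pd

# Hypothetical prefixes and file names for illustration only.
file_prefixes = {"uploads", "tmp"}
file_df = pd.DataFrame(
    {"file": ["uploads/a.txt", "tmp/b.txt", "other/c.txt"]}
)

# Approach from the answer: startswith with a tuple of prefixes.
starts_mask = file_df["file"].str.startswith(tuple(file_prefixes))

# Alternative: take the text before the first '/' and test set membership.
first_part = file_df["file"].str.split("/", n=1).str[0]
exact_mask = first_part.isin(file_prefixes)

matches = file_df.loc[starts_mask]
print(matches)
```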

What's the cleanest way for assigning a new pandas dataframe column to a single value?

Working with a dataframe df I wanted to create a new column A and assign it to a single value (a string in my case)
df['A'] = value
gave a warning that suggested using loc.
However, the solution below still gave the same warning:
df.loc[:,'A']=value
Doing some research I found the solution below which does not generate a warning:
df=df.assign(A =value)
Is this the generally accepted way of creating a new column and assigning it a single value? Are there other possibilities using loc?
pandas version '0.20.1'
EDIT: this is the warning message obtained for the 2 first methods
"A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead"
As explained by @EdChum and @ScottBoston:
Since df was derived using a mask on some original dataframe
df = df_original[boolean_mask]
to avoid the warning with the two first methods, use instead df=df_original[boolean_mask].copy()
df.assign does not need this because it returns a new DataFrame instead of modifying the existing one.
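
A minimal sketch of the .copy() fix, using made-up data for df_original and boolean_mask. Whether the warning appears without .copy() depends on the pandas version, so the example shows only the pattern that avoids it:

```python
import pandas as pd

# Hypothetical original frame and mask.
df_original = pd.DataFrame({"x": [1, 2, 3, 4]})
boolean_mask = df_original["x"] > 2

# Take an explicit copy so that df is an independent DataFrame.
df = df_original[boolean_mask].copy()

# Plain column assignment now targets the copy only.
df["A"] = "some value"

print(df)
```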

Pass pandas sub dataframe to master dataframe

I have a dataframe which I am doing some work on
d={'x':[2,8,4,-5,4,5,-3,5],'y':[-.12,.35,.3,.15,.4,-.5,.6,.57]}
df=pd.DataFrame(d)
df['x_even']=df['x']%2==0
For subdf, I take all rows where x is negative, then square x and multiply y by 100:
subdf=df[df.x<0]
subdf['x']=subdf.x**2
subdf['y']=subdf.y*100
subdf's work is completed. I am not sure how I can incorporate these changes into the master dataframe (df).
It looks like your current code should give you a SettingWithCopyWarning.
To avoid this you could do the following:
df.loc[df.x<0, 'y'] = df.loc[df.x<0, 'y']*100
df.loc[df.x<0, 'x'] = df.loc[df.x<0, 'x']**2
This changes df in place without raising a warning, and there is no need to merge anything back.
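
The same update can be written with the mask computed once. This is a sketch on the question's own data; the name neg is introduced here for illustration. Because the mask is stored before x is squared, the order of the two assignments no longer matters:

```python
import pandas as pd

d = {
    "x": [2, 8, 4, -5, 4, 5, -3, 5],
    "y": [-0.12, 0.35, 0.3, 0.15, 0.4, -0.5, 0.6, 0.57],
}
df = pd.DataFrame(d)
df["x_even"] = df["x"] % 2 == 0

# Compute the row mask once, before x is changed.
neg = df["x"] < 0

# Update only the masked rows, directly in the master frame.
df.loc[neg, "x"] = df.loc[neg, "x"] ** 2
df.loc[neg, "y"] = df.loc[neg, "y"] * 100

print(df)
```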
pd.merge(subdf,df,how='outer')
This does what I was asking for. Thanks for the tip Primer