Put dataframes below each other

I have two data frames that I want to put below each other (see screenshots):
series1 = df_new[['hour', 'minute', 'value']]
series2 = df_new[['hour', 'minute', 'value.1']]
I tried the command
a = pd.concat([series1, series2])
but instead of the dataframes stacked below each other I get NaN values, and I have no idea why. Can you help me with that?

Try:
pd.concat([series1, series2.rename(columns={'value.1': 'value'})], ignore_index=True)
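For context, the NaN values appear because pd.concat aligns frames by column name: 'value' and 'value.1' are treated as different columns, so each frame gets NaN in the column it lacks. A minimal sketch with made-up data standing in for the two slices of df_new:

```python
import pandas as pd

# Hypothetical stand-ins for the two slices of df_new
series1 = pd.DataFrame({'hour': [1, 2], 'minute': [0, 30], 'value': [10.0, 20.0]})
series2 = pd.DataFrame({'hour': [3, 4], 'minute': [15, 45], 'value.1': [30.0, 40.0]})

# Column names differ, so concat keeps both 'value' and 'value.1'
# and fills the mismatched cells with NaN
misaligned = pd.concat([series1, series2])

# Renaming 'value.1' to 'value' aligns the columns, so the rows stack cleanly
stacked = pd.concat(
    [series1, series2.rename(columns={'value.1': 'value'})],
    ignore_index=True,
)
```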


SettingWithCopyWarning on self.obj[item_labels[indexer[info_axis]]] = value: I cannot overcome this issue

I am struggling in assigning new columns in my data frame, based on existing columns in the same data frame.
I keep getting this SettingWithCopyWarning that is driving me crazy:
C:\ProgramData\Anaconda3\envs\PLAXIS_V20.3_python37\lib\site-packages\pandas\core\indexing.py:1596: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
self.obj[key] = _infer_fill_value(value)
C:\ProgramData\Anaconda3\envs\PLAXIS_V20.3_python37\lib\site-packages\pandas\core\indexing.py:1783: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
self.obj[item_labels[indexer[info_axis]]] = value
The code that I am trying to run is the following:
dfe = df_sampled.loc[mask]
af=dfe["Dist{}{}UKPN mm".format(i,j)]
af.apply(lambda x: ((x*0.001)/ref)*100)
dfe.loc[:,"Dist{}_{}UKPN perc".format(i,j)]=af
My goal is to create a new column on the dfe dataframe (which is a sub-dataframe of another dataframe), based on an existing column of dfe multiplied by scalar values (e.g. to convert it into a percentage).
I hope someone can help me understand how to overcome this.
Thanks
F.
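A common way around this warning (the usual pandas idiom, not taken from this thread) is to take an explicit .copy() of the slice, and to assign the result of apply back rather than discarding it, since apply returns a new Series instead of modifying in place. A sketch with made-up data, indices, mask and ref standing in for the real ones:

```python
import pandas as pd

i, j = 0, 1  # hypothetical loop indices
ref = 2.0    # hypothetical reference value
col_mm = "Dist{}{}UKPN mm".format(i, j)

# Hypothetical stand-ins for df_sampled and mask
df_sampled = pd.DataFrame({col_mm: [1200.0, 3400.0, 560.0]})
mask = df_sampled[col_mm] > 1000

# .copy() makes dfe an independent DataFrame, so later assignments no
# longer write into a view of df_sampled and the warning goes away
dfe = df_sampled.loc[mask].copy()

# apply returns a new Series; assign it instead of discarding the result
af = dfe[col_mm].apply(lambda x: ((x * 0.001) / ref) * 100)
dfe["Dist{}_{}UKPN perc".format(i, j)] = af
```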

Pandas dataframe being treated as a series object after using groupby

I am conducting an analysis of a dataset. To find my results, I use this line of code:
new_df = df_ncis.groupby(['state', 'year'])['totals'].mean()
The object returned by this statement is a Series, when it should be a dataframe. I don't understand why this happened, or how to solve this issue. Also, one of the columns of the new object is missing its name. Here is the github link for the project: https://github.com/louishrm/gundataUS.
Any help would be great.
You are selecting a single column with ['totals'], which returns a Series.
Try this instead:
new_df = df_ncis[['state', 'year', 'totals']].groupby(['state', 'year']).mean()
which will give you a dataframe with your 3 columns.
or if you want it as a dataframe of one column (Note the double brackets)
new_df = df_ncis.groupby(['state', 'year'])[['totals']].mean()
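The Series/DataFrame distinction can be seen concretely with toy data in place of df_ncis; one more option is .reset_index(), which turns the (state, year) group keys of the result back into ordinary columns:

```python
import pandas as pd

# Toy stand-in for df_ncis
df_ncis = pd.DataFrame({
    'state': ['CA', 'CA', 'NY', 'NY'],
    'year': [2019, 2019, 2020, 2020],
    'totals': [10, 20, 30, 50],
})

# Single brackets select one column -> Series
as_series = df_ncis.groupby(['state', 'year'])['totals'].mean()

# Double brackets select a one-column DataFrame -> DataFrame
as_frame = df_ncis.groupby(['state', 'year'])[['totals']].mean()

# reset_index converts the MultiIndex (state, year) back into columns
flat = as_frame.reset_index()
```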

Remove NaN values from pandas dataframes inside a list

I have a number of dataframes inside a list. I am trying to remove NaN values. I tried to do it in a for loop:
for i in list_of_dataframes:
    i.dropna()
It didn't work, but Python didn't raise an error either. If I apply the code
list_of_dataframes[0] = list_of_dataframes[0].dropna()
to each dataframe individually it works, but I have too many of them. There must be a way that I just can't figure out. What are the possible solutions?
Thanks a lot
dropna() returns a new DataFrame by default; you never assigned the result back to anything, so the loop had no effect.
Try this:
for i in range(len(list_of_dataframes)):
    list_of_dataframes[i] = list_of_dataframes[i].dropna()
Or, more conveniently:
for df in list_of_dataframes:
    df.dropna(inplace=True)
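A list comprehension is another compact option; it rebuilds the list with the cleaned frames rather than mutating the existing one:

```python
import numpy as np
import pandas as pd

# Toy list of dataframes containing NaNs
list_of_dataframes = [
    pd.DataFrame({'a': [1.0, np.nan, 3.0]}),
    pd.DataFrame({'a': [np.nan, 5.0]}),
]

# dropna returns a new DataFrame, so collect the results into a new list
list_of_dataframes = [df.dropna() for df in list_of_dataframes]
```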

Access pandas DataFrame attributes inside chained methods

Good afternoon,
I have a few .csv files to transform into pandas DataFrames. Although they contain the same type of data in the same columns, they have different column names. I am trying to do all the small transformations on the fly so that I can concatenate the DataFrames all at once. The problem is that, as far as I know, there is no way to access the attributes of a DataFrame on the fly: first you assign it to a variable, then you access the data, like this:
df = pd.read_csv("my_csv.csv")
df = df.rename(columns=dict(zip(df.columns, [my_columns])))
So I was wondering if anyone knows a way to do something like the following:
(pd.read_csv("my_csv.csv")
.rename(columns=dict(zip(SELF.columns, [my_columns])))
)
where SELF references the DataFrame that has been just created.
So far I have unsuccessfully tried lambda functions; I know they can be used to subset the DataFrame with conditions on the just-created object, like [lambda x: x.ColumnA > 20].
Thank you in advance.
EDIT:
I was able to do what I was looking for with the help of .pipe() I did the following:
def rename_columns(self, columns):
    return self.rename(columns=dict(zip(self.columns, columns)))

(pd.DataFrame([{'a': 1}, {'a': 1}, {'a': 1}, {'a': 1}, {'a': 1}])
 .pipe(rename_columns, ['b'])
)
You can use .set_axis for this:
(pd.DataFrame(np.random.randn(5, 5))
.set_axis(['A', 'B', 'C', 'D', 'E'], axis=1, inplace=False)
)
At the time of this answer, inplace defaulted to True and was slated to change; in modern pandas (2.0+) the inplace argument has been removed from set_axis, which now always returns a new object. axis=1 operates on columns.
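If the set_axis signature is a concern across pandas versions, the .pipe trick from the edit also works with a plain lambda, so the whole chain stays version-agnostic. A sketch, with my_columns and the toy frame standing in for the real CSV data:

```python
import pandas as pd

my_columns = ['hour', 'minute', 'value']  # hypothetical target names

# The lambda receives the just-created DataFrame, so its .columns
# attribute is accessible mid-chain without a named variable
df = (
    pd.DataFrame([[1, 0, 10], [2, 30, 20]], columns=['h', 'm', 'v'])
    .pipe(lambda d: d.rename(columns=dict(zip(d.columns, my_columns))))
)
```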

Pass pandas sub dataframe to master dataframe

I have a dataframe which I am doing some work on
d={'x':[2,8,4,-5,4,5,-3,5],'y':[-.12,.35,.3,.15,.4,-.5,.6,.57]}
df=pd.DataFrame(d)
df['x_even']=df['x']%2==0
Now I want a sub-dataframe: all rows where x is negative, with x squared and y multiplied by 100:
subdf=df[df.x<0]
subdf['x']=subdf.x**2
subdf['y']=subdf.y*100
subdf's work is complete. I am not sure how to incorporate these changes back into the master dataframe (df).
It looks like your current code should raise a SettingWithCopyWarning.
To avoid this, you could do the following:
df.loc[df.x<0, 'y'] = df.loc[df.x<0, 'y']*100
df.loc[df.x<0, 'x'] = df.loc[df.x<0, 'x']**2
Which will change your df, without raising a warning and there is no need to merge anything back.
pd.merge(subdf, df, how='outer')
This does what I was asking for. Thanks for the tip, Primer.
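For completeness, the .loc approach from the answer can be checked end to end on the sample data. Computing the mask once keeps the two assignments consistent, and y must be updated before x, since squaring x destroys the sign that the mask relies on:

```python
import pandas as pd

d = {'x': [2, 8, 4, -5, 4, 5, -3, 5],
     'y': [-.12, .35, .3, .15, .4, -.5, .6, .57]}
df = pd.DataFrame(d)
df['x_even'] = df['x'] % 2 == 0

# Compute the mask once; update y first, then x, because x**2 erases the sign
neg = df.x < 0
df.loc[neg, 'y'] = df.loc[neg, 'y'] * 100
df.loc[neg, 'x'] = df.loc[neg, 'x'] ** 2
```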