How to combine two data columns in pandas?

I have two tables and I want to merge them into one. I tried merge, concat and join in pandas, but each gives a new table with 20 rows, whereas I want the combined table to have 10 rows. How can I do this with pandas DataFrames?

You need concat with axis=1:
df = pd.concat([df1, df2], axis=1)
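A minimal sketch of why the row count changes, assuming two hypothetical 10-row tables named df1 and df2 with different columns: the default axis=0 stacks rows, while axis=1 places the tables side by side.

```python
import pandas as pd

# Hypothetical tables with 10 rows each and different columns.
df1 = pd.DataFrame({"x": range(10)})
df2 = pd.DataFrame({"y": range(10, 20)})

stacked = pd.concat([df1, df2])               # default axis=0: rows appended, 20 rows
side_by_side = pd.concat([df1, df2], axis=1)  # axis=1: columns appended, 10 rows

print(stacked.shape)       # (20, 2)
print(side_by_side.shape)  # (10, 2)
```

With axis=1 the rows are matched on index labels, so this works directly when both tables share the same index (here the default 0..9).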

Related

DataFrame Groupby apply on second dataframe?

I have 2 dataframes df1, df2. Both have id as a column. I want to compute a new column, weighted_average, in df1 that is a function of the values in df2 with the same id.
First, I think I should do df1.groupby("id"). Is it possible to use GroupBy.apply(...) and have it use values from df2? In the examples I've seen, it usually just operates on df1 values.
If both DataFrames have the same ids in the same positions and the same length, you can do something like:
df2["new column name"] = df1["column name"].apply(...)
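If the ids are not aligned row by row, one alternative (not the approach in the answer above) is to compute the weighted average per id on df2 with a groupby and then map it onto df1 by id. This is a sketch that assumes df2 has hypothetical "value" and "weight" columns:

```python
import pandas as pd

# Hypothetical data: df2 holds several (value, weight) rows per id.
df1 = pd.DataFrame({"id": [1, 2, 3]})
df2 = pd.DataFrame({
    "id": [1, 1, 2, 2, 3],
    "value": [10.0, 20.0, 5.0, 15.0, 7.0],
    "weight": [1.0, 3.0, 1.0, 1.0, 2.0],
})

# Weighted average per id: sum(value * weight) / sum(weight), computed on df2 only.
weighted_sum = (df2["value"] * df2["weight"]).groupby(df2["id"]).sum()
weight_total = df2["weight"].groupby(df2["id"]).sum()
wavg = weighted_sum / weight_total  # Series indexed by id

# Look up each df1 id in the per-id result; ids missing from df2 become NaN.
df1["weighted_average"] = df1["id"].map(wavg)
print(df1)
```

Grouping df2 and mapping the result avoids any dependence on df1 and df2 having the same row order.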

Python pandas: joining selected columns from one DataFrame and adding them to another

I want to take one column from DataFrame 'dfGS' and add it to DataFrame 'df3'.
If I join the full dfGS to df3 it works fine, but when I specify only one column to join I get KeyError: 'Ticker':
df3 = pd.merge(df3, dfGS['Shares to Trade'], how="inner", on='Ticker')
Ticker is the correct key column in both DataFrames, so I'm not sure where I am going wrong.
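dfGS['Shares to Trade'] selects only that one column, so the Ticker key is no longer present on the right-hand side and merge raises KeyError. Select the key column together with the column you want. I would do it like this: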
df3 = df3.merge(dfGS[['Shares to Trade',"Ticker"]], how="inner", left_on="Ticker", right_on="Ticker")
OR
df3 = df3.merge(dfGS[['Shares to Trade',"Ticker"]], how="inner", on="Ticker")
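A small runnable sketch of the fix, using hypothetical tickers and values: only the key column plus the wanted column are passed to merge, so extra dfGS columns (here a made-up "Sector") are not carried over.

```python
import pandas as pd

# Hypothetical data standing in for the question's df3 and dfGS.
df3 = pd.DataFrame({"Ticker": ["AAPL", "MSFT", "GOOG"], "Price": [190.0, 410.0, 140.0]})
dfGS = pd.DataFrame({
    "Ticker": ["MSFT", "AAPL", "GOOG"],
    "Shares to Trade": [5, 10, 3],
    "Sector": ["Tech", "Tech", "Tech"],
})

# Keep the join key alongside the column to bring over; "Sector" is left out.
result = df3.merge(dfGS[["Ticker", "Shares to Trade"]], how="inner", on="Ticker")
print(result)
```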

Combine two DataFrames to send an automated message [duplicate]

Is there a way to conveniently merge two DataFrames side by side?
Both DataFrames have 30 rows but a different number of columns: say df1 has 20 columns and df2 has 40 columns.
How can I easily get a new DataFrame with 30 rows and 60 columns?
df3 = pd.someSpecialMergeFunct(df1, df2)
or maybe there is some special parameter in append:
df3 = pd.append(df1, df2, left_index=False, right_index=False, how='left')
P.S. If possible, I would like duplicate column names to be resolved automatically.
Thanks!
You can use the concat function for this (axis=1 is to concatenate as columns):
pd.concat([df1, df2], axis=1)
See the pandas docs on merging/concatenating: http://pandas.pydata.org/pandas-docs/stable/merging.html
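A sketch with hypothetical frames shaped like the question (30 rows, 20 and 40 columns, overlapping names "c0".."c19"). concat itself keeps duplicate names; two ways to make them unique are the keys= argument, which yields MultiIndex columns, or renaming one side first with add_suffix:

```python
import numpy as np
import pandas as pd

# Hypothetical frames: 30 rows each, 20 and 40 columns, names overlapping on c0..c19.
df1 = pd.DataFrame(np.zeros((30, 20)), columns=[f"c{i}" for i in range(20)])
df2 = pd.DataFrame(np.ones((30, 40)), columns=[f"c{i}" for i in range(40)])

side_by_side = pd.concat([df1, df2], axis=1)  # 30 x 60, but c0..c19 appear twice

# Option 1: label each block, producing MultiIndex columns like ("right", "c0").
keyed = pd.concat([df1, df2], axis=1, keys=["left", "right"])

# Option 2: rename one side before concatenating.
suffixed = pd.concat([df1, df2.add_suffix("_r")], axis=1)
```

With keys=, each original block can still be pulled out as a whole, for example keyed["right"].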
I came across your question while trying to do something similar.
Once I had sliced my DataFrames, I first made sure their indexes were the same; in your case, both DataFrames need to be indexed from 0 to 29. Then I merged the two DataFrames on the index:
df1.reset_index(drop=True).merge(df2.reset_index(drop=True), left_index=True, right_index=True)
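A sketch of why resetting the index matters, using hypothetical slices whose indexes no longer match: concat with axis=1 aligns on index labels and pads with NaN, whereas resetting both indexes first pairs the rows by position.

```python
import pandas as pd

# Hypothetical slices that no longer share an index.
df1 = pd.DataFrame({"a": [1, 2, 3]}, index=[10, 11, 12])
df2 = pd.DataFrame({"b": [4, 5, 6]}, index=[0, 1, 2])

misaligned = pd.concat([df1, df2], axis=1)  # 6 rows: no labels match, gaps filled with NaN

aligned = df1.reset_index(drop=True).merge(
    df2.reset_index(drop=True), left_index=True, right_index=True
)  # 3 rows: both frames now use 0..2, so rows pair up by position
```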
If you want to combine two DataFrames that share a common key column, you can merge on it:
df_concat = pd.merge(df1, df2, on='common_column_name', how='outer')
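A small sketch of what how='outer' does, with hypothetical frames sharing a "key" column: every key from either side is kept, and values missing on one side become NaN.

```python
import pandas as pd

# Hypothetical frames that share the "key" column but only partially overlap.
left = pd.DataFrame({"key": ["a", "b", "c"], "x": [1, 2, 3]})
right = pd.DataFrame({"key": ["b", "c", "d"], "y": [20, 30, 40]})

# Outer join keeps the union of keys: "a" has no y, "d" has no x.
out = pd.merge(left, right, on="key", how="outer")
print(out)
```

Use how='inner' instead if only keys present in both frames should survive.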
The other answers didn't work for me when I arrived here from Google.
What I did instead was to set the new columns in place in the original df.
# list(df2) gives you the column names of df2
# you then use these as the column names for df
df[list(df2)] = df2
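A sketch of this assignment with hypothetical frames. It works cleanly when both frames share the same index; note that pandas aligns the assigned columns on index labels, so rows whose labels are missing from df2 end up as NaN.

```python
import pandas as pd

# Same default 0..2 index on both frames, so rows line up one-to-one.
df = pd.DataFrame({"a": [1, 2, 3]})
df2 = pd.DataFrame({"b": [4, 5, 6], "c": [7, 8, 9]})
df[list(df2)] = df2  # adds columns "b" and "c" to df

# Hypothetical mismatched index: values are matched by label, not by position.
df_shift = pd.DataFrame({"a": [1, 2, 3]})
shifted = pd.DataFrame({"b": [4, 5, 6]}, index=[1, 2, 3])
df_shift[list(shifted)] = shifted  # label 0 gets NaN, label 3 is dropped
```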
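Another way is to use a scikit-learn Pipeline.
Use one pipeline to transform your numerical data, for example:
from sklearn.impute import SimpleImputer
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import OneHotEncoder
# DataFrameSelector is not part of scikit-learn: it is a small custom transformer you define
# yourself. num_cols / cat_cols stand for your lists of numerical / categorical column names.
num_pipeline = Pipeline([
    ("select_numeric", DataFrameSelector(num_cols)),
    ("imputer", SimpleImputer(strategy="median")),
])
And another for categorical data:
cat_pipeline = Pipeline([
    ("select_cat", DataFrameSelector(cat_cols)),
    ("cat_encoder", OneHotEncoder(sparse_output=False)),  # use sparse=False before scikit-learn 1.2
])
Then use a FeatureUnion to combine these transformations:
preprocess_pipeline = FeatureUnion(transformer_list=[
    ("num_pipeline", num_pipeline),
    ("cat_pipeline", cat_pipeline),
])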
Read more here - https://scikit-learn.org/stable/modules/generated/sklearn.pipeline.FeatureUnion.html
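A self-contained sketch of the same pipeline, assuming a typical minimal DataFrameSelector (a hypothetical helper, not a scikit-learn class) and a made-up two-column DataFrame. OneHotEncoder is left at its default sparse output and the result is densified afterwards, which avoids depending on the sparse/sparse_output parameter name:

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.impute import SimpleImputer
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import OneHotEncoder


class DataFrameSelector(BaseEstimator, TransformerMixin):
    """Hypothetical helper (not in scikit-learn): return the chosen DataFrame columns."""

    def __init__(self, attribute_names):
        self.attribute_names = attribute_names

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X[self.attribute_names]


# Hypothetical data: one numeric column with a gap, one categorical column.
df = pd.DataFrame({"age": [25.0, np.nan, 40.0], "city": ["Paris", "Lyon", "Paris"]})

num_pipeline = Pipeline([
    ("select_numeric", DataFrameSelector(["age"])),
    ("imputer", SimpleImputer(strategy="median")),  # NaN -> median of 25 and 40
])
cat_pipeline = Pipeline([
    ("select_cat", DataFrameSelector(["city"])),
    ("cat_encoder", OneHotEncoder()),  # default output is a sparse matrix
])
preprocess_pipeline = FeatureUnion(transformer_list=[
    ("num_pipeline", num_pipeline),
    ("cat_pipeline", cat_pipeline),
])

features = preprocess_pipeline.fit_transform(df)
# FeatureUnion stacks the blocks column-wise; convert to dense if any block was sparse.
features = features.toarray() if hasattr(features, "toarray") else np.asarray(features)
print(features)  # columns: age, city=Lyon, city=Paris
```

Note that the result is a NumPy array rather than a DataFrame, so the original column names are not carried through.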
This solution also works if df1 and df2 have different indices:
df1.loc[:, df2.columns] = df2.to_numpy()
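The same positional idea (ignore df2's index and pair rows by order) can also be sketched with set_axis and concat; this is a swapped-in alternative to the .loc assignment above, shown with hypothetical frames:

```python
import pandas as pd

# Hypothetical frames with different indices but the same number of rows.
df1 = pd.DataFrame({"a": [1, 2, 3]}, index=["x", "y", "z"])
df2 = pd.DataFrame({"b": [4, 5, 6], "c": [7, 8, 9]})  # default 0..2 index

# Relabel df2's rows with df1's index (by position), then join side by side.
combined = pd.concat([df1, df2.set_axis(df1.index)], axis=1)
print(combined)
```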

How to resample a dataframe with different functions applied to each column if we have more than 20 columns?

I know this question has been asked before. The answer is as follows:
df.resample('M').agg({'col1': np.sum, 'col2': np.mean})
But I have 27 columns, and I want to sum the first 25 and average the remaining two. Should I write ('col1' - 'col25': np.sum) for the 25 columns and ('col26': np.mean, 'col27': np.mean) for the other two?
My DataFrame contains hourly data and I want to convert it to monthly data. I tried something like the following, but it doesn't make sense:
for i in col_list:
df = df.resample('M').agg({i-2: np.sum, 'col26': np.mean, 'col27': np.mean})
Is there any shortcut for this situation?
You can try this, without a for loop:
sum_col = ['col1','col2','col3','col4', ...]
sum_df = df.resample('M')[sum_col].sum()
mean_col = ['col26','col27']
mean_df = df.resample('M')[mean_col].mean()
df = sum_df.join(mean_df)  # join the two monthly results on their shared date index
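A variant that avoids typing 25 names is to build the agg dictionary from column positions with dict comprehensions. This sketch assumes hypothetical hourly data with columns col1..col27 and resamples to daily ("D") so it runs the same across pandas versions; for monthly data use the month-end alias ('M', which pandas 2.2 renamed to 'ME').

```python
import numpy as np
import pandas as pd

# Hypothetical hourly data: two days, 27 columns named col1..col27, all ones.
idx = pd.date_range("2024-01-01", periods=48, freq=pd.offsets.Hour())
cols = [f"col{i}" for i in range(1, 28)]
df = pd.DataFrame(np.ones((48, 27)), index=idx, columns=cols)

# Sum the first 25 columns and average the remaining two.
agg_map = {c: "sum" for c in df.columns[:25]}
agg_map.update({c: "mean" for c in df.columns[25:]})

daily = df.resample("D").agg(agg_map)
print(daily[["col1", "col25", "col26", "col27"]])
```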

How to concat 3 DataFrames so that each occupies its own sequential columns

I'm trying to concatenate three individual DataFrames (e.g. df1, df2, df3) into a new DataFrame, say df4, in which each individual DataFrame occupies its own columns in left-to-right order.
I've tried using concat with axis = 1 to do this, but it appears not possible to automate this with a single action.
Table1_updated = pd.DataFrame(columns=['3P','2PG-3Io','3Io'])
Table1_updated=pd.concat([get_table1_3P,get_table1_2P_max_3Io,get_table1_3Io])
Note that, apart from get_table1_2P_max_3Io, which has two columns, each of the other DataFrames has one column.
Ultimately, I would like all three combined side by side in a single DataFrame.
I believe you first need concat with axis=1, and then you can change the column order with a list of column names:
Table1_updated=pd.concat([get_table1_3P,get_table1_2P_max_3Io,get_table1_3Io], axis=1)
Table1_updated = Table1_updated[['3P','2PG-3Io','3Io']]
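A sketch with hypothetical stand-in frames (two one-column frames and one two-column frame sharing the same index): concat with axis=1 already places the frames left to right in list order, and a list of names selects any other order.

```python
import pandas as pd

# Hypothetical stand-ins for the three tables, all with the default 0..2 index.
first = pd.DataFrame({"A": [1, 2, 3]})
middle = pd.DataFrame({"B1": [4, 5, 6], "B2": [7, 8, 9]})
last = pd.DataFrame({"C": [10, 11, 12]})

# axis=1 puts the frames side by side, in the order they appear in the list.
combined = pd.concat([first, middle, last], axis=1)

# Indexing with a list of column names gives any other order you need.
reordered = combined[["C", "A", "B1", "B2"]]
```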