Dataframe add column with calculation - dataframe

I have a DataFrame with multiple columns and I am trying to add a new column that holds the sum of two of these columns. Is there a function that can do this across the whole DataFrame without looping?
Original: [image of the original dataframe]
Desired: [image of the desired dataframe]

Why do all columns have the same name?
In pandas, we can create a new column and put the sum of the values of the other two columns in it with the following code:
df['sum'] = df['col_1'] + df['col_2']
But first you need to change the names of the columns so that they are not the same.
And then you can arrange the columns as desired.
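As a minimal sketch of the steps above, assuming a small made-up table (the values and the names `value`, `col_1`, and `col_2` are illustrative, not taken from the question), the rename, the sum, and the reordering could look like this:

```python
import pandas as pd

# Hypothetical data: both columns share the same name, as in the question.
df = pd.DataFrame([[1, 10], [2, 20], [3, 30]], columns=["value", "value"])

# Give the columns unique names so each one can be addressed on its own.
df.columns = ["col_1", "col_2"]

# Column arithmetic is element-wise, so no explicit loop is needed.
df["sum"] = df["col_1"] + df["col_2"]

# Reorder the columns as desired by selecting them in the new order.
df = df[["sum", "col_1", "col_2"]]

totals = df["sum"].tolist()
```

For more than two columns, `df[["col_1", "col_2"]].sum(axis=1)` gives the same row-wise total without chaining `+`.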

Related

Pyspark dynamic column selection from dataframe

I have a dataframe with multiple columns such as t_orno, t_pono, t_sqnb, t_pric, and so on (it's a table with many columns).
The second dataframe contains the names of certain columns from the first dataframe, e.g.:
columnname
t_pono
t_pric
:
:
I need to select only those columns from the first dataframe whose names are present in the second. In the above example, that is t_pono and t_pric.
How can this be done?
Let's say you have the following columns (which can be obtained using df.columns, which returns a list):
df1_cols = ["t_orno", "t_pono", "t_sqnb", "t_pric"]
df2_cols = ["columnname", "t_pono", "t_pric"]
To get only those columns from the first dataframe that are present in the second one, you can do set intersection (and I cast it to a list, so it can be used to select data):
list(set(df1_cols).intersection(df2_cols))
And we get the result (the order may differ, because sets are unordered):
["t_pono", "t_pric"]
To put it all together and select only those columns:
select_columns = list(set(df1_cols).intersection(df2_cols))
new_df = df1.select(*select_columns)
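If the column order of the first dataframe matters, a list comprehension is one alternative to the set intersection. The sketch below is plain Python and assumes the wanted names have already been collected into a list; `wanted` and its contents are hypothetical.

```python
# Column names of the first dataframe, in their original order.
df1_cols = ["t_orno", "t_pono", "t_sqnb", "t_pric"]

# Hypothetical list of names taken from the second dataframe.
# "t_missing" stands for a name that does not exist in the first dataframe.
wanted = ["t_pric", "t_pono", "t_missing"]

# A set makes the membership test cheap; the comprehension keeps df1's order
# and silently skips names that are not present in df1.
wanted_set = set(wanted)
select_columns = [c for c in df1_cols if c in wanted_set]
```

The resulting list can then be passed to `df1.select(*select_columns)` exactly as in the answer above.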

How can I query a column of a dataframe on a specific value and get the values of two other columns corresponding to that value

I have a data frame where the first column contains various countries' ISO codes, while the other 2 columns contain dataset numbers and Linkedin profile links.
Please refer to the image.
I need to query the data frame's first "FBC" column on the "IND" value and get the corresponding values of the "no" and "Linkedin" columns.
Can somebody please suggest a solution?
Using query():
If you want just the no and Linkedin values:
df = df.query("FBC.eq('IND')")[["no", "Linkedin"]]
If you want all 3:
df = df.query("FBC.eq('IND')")
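The same filter can also be written with a boolean mask and `.loc`, which avoids the query string. This is a sketch on made-up rows; only the column names `FBC`, `no`, and `Linkedin` come from the question.

```python
import pandas as pd

# Hypothetical rows standing in for the data shown in the image.
df = pd.DataFrame(
    {
        "FBC": ["IND", "USA", "IND"],
        "no": [1, 2, 3],
        "Linkedin": ["link_a", "link_b", "link_c"],
    }
)

# Boolean mask: True for the rows whose FBC value is "IND".
mask = df["FBC"] == "IND"

# .loc takes the row mask and the list of columns to keep.
result = df.loc[mask, ["no", "Linkedin"]]
```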

Conditional row number Pandas

I need to add a row number to my dataframe based on a certain condition. The image below shows the input data frame.
I need a row number column in my dataframe, as illustrated in the image below (Rank column).
Whenever a "RequestResubmitted" value is found within a group, I want to reset the rank to 1 again.
Let us use cumsum to create the sub-group key, then groupby + cumcount:
s = df.groupby([df['Word Order Code'], df['Status Code'].eq('Request Submitted').cumsum()]).cumcount() + 1
df['rank'] = s
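To see why this works, here is a small sketch on made-up rows. The column names follow the answer; the codes and statuses are hypothetical. The cumulative sum increases by one at every 'Request Submitted' row, so each such row starts a new sub-group and the count restarts at 1.

```python
import pandas as pd

# Hypothetical input; the real data is only available as an image.
df = pd.DataFrame(
    {
        "Word Order Code": ["A", "A", "A", "A", "B", "B"],
        "Status Code": [
            "Open",
            "Request Submitted",
            "Approved",
            "Request Submitted",
            "Open",
            "Closed",
        ],
    }
)

# Sub-group key: running count of 'Request Submitted' rows seen so far.
sub_key = df["Status Code"].eq("Request Submitted").cumsum()

# Number the rows within each (code, sub-group) pair, starting at 1.
df["rank"] = df.groupby([df["Word Order Code"], sub_key]).cumcount() + 1

ranks = df["rank"].tolist()
```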

Convert Series to Dataframe where series index is Dataframe column names

I am selecting row by row as follows:
for i in range(num_rows):
    row = df.iloc[i]
As a result, I get a Series object where row.index.values contains the names of the df columns.
But I want a dataframe with only one row that keeps the original dataframe columns in place.
When I do row.to_frame() instead of 1x85 dataframe (1 row, 85 cols) I get 85x1 dataframe where index contains names of columns and row.columns
outputs
Int64Index([0], dtype='int64').
But all I want is just original data-frame columns with only one row. How do I do it?
Or how do I convert the row.index values to row.columns values and change the 85x1 dimension to 1x85?
You just need to add .T (transpose):
row.to_frame().T
Alternatively, change your for loop by adding [] so that iloc returns a one-row dataframe directly:
for i in range(num_rows):
    row = df.iloc[[i]]
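A short sketch comparing the two options on a made-up three-column frame (the column names and values are hypothetical):

```python
import pandas as pd

# Hypothetical frame with mixed column types.
df = pd.DataFrame({"a": [1, 2], "b": [0.5, 1.5], "c": ["x", "y"]})

# Scalar position: iloc returns a Series indexed by the column names.
row = df.iloc[0]

# Option 1: turn the Series into a 3x1 frame, then transpose it to 1x3.
as_frame = row.to_frame().T

# Option 2: a list of positions makes iloc return a 1x3 DataFrame directly,
# and each column keeps its original dtype.
direct = df.iloc[[0]]
```

Both results have the original column names. The second form is usually the simpler choice inside a loop, since it skips the intermediate Series.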

Merging DataFrames on a specific column together

I tried to perform my self-created function on a for loop.
Some remarks in advance:
ma_strategy is my function and requires three inputs
ticker_list is a list of strings
result is a pandas DataFrame with 7 columns, and I can access the column 'return_cum' with result['return_cum']. The rows of this column contain floating point numbers.
My intention is the following:
The for loop should iterate over the items in my ticker_list and should save the 'return_cum' columns in a DataFrame. Then the different 'return_cum' columns should be stored together so that at the end I get a DataFrame with all the 'return_cum' columns of my ticker list.
How can I achieve that goal?
My approach is:
for i in ticker_list:
    result = ma_strategy(i, 20, 5)
    x = result['return_cum'].to_frame()
But at this stage I need some help.
If I understood you correctly, this should work:
result_df = pd.DataFrame()
for i in ticker_list:
    result = ma_strategy(i, 20, 5)
    # Store each ticker's cumulative return under its own column name.
    result_df[i + '_return_cum'] = result['return_cum']
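An alternative sketch collects the columns first and joins them once with pd.concat. The `ma_strategy` below is only a hypothetical stand-in that returns a frame with a 'return_cum' column, and the tickers are made up; the real function is the asker's own.

```python
import pandas as pd


def ma_strategy(ticker, long_window, short_window):
    # Hypothetical stand-in for the asker's function: it only needs to
    # return a DataFrame that contains a 'return_cum' column.
    scale = len(ticker)
    return pd.DataFrame(
        {
            "price": [100.0, 101.0, 102.0],
            "return_cum": [0.1 * scale, 0.2 * scale, 0.3 * scale],
        }
    )


ticker_list = ["AAA", "BBBB"]  # made-up tickers

# Map each new column name to that ticker's 'return_cum' Series.
columns = {
    f"{ticker}_return_cum": ma_strategy(ticker, 20, 5)["return_cum"]
    for ticker in ticker_list
}

# axis=1 places the Series side by side; the dict keys become column names.
result_df = pd.concat(columns, axis=1)
```

pd.concat aligns the Series on their index, so tickers with different date ranges end up with NaN where data is missing.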