Merging Dataframe within a for loop

Merging Dataframe within a for loop - pandas

I tried to perform my self-created function on a for loop, but it does not work as expected.
Some remarks in advance:
ma_strategy is my function and requires three inputs
ticker_list is a list with strings
result is a pandas Dataframe with 7 columns and I can call the column 'return_cum' with result['return_cum']. The rows of this column are containing floating point numbers.
These for loops doesn't work:
for i in ticker_list:
result = ma_strategy(i, 20, 5)
x = result['return_cum']
sample_returns = pd.DataFrame
y = pd.merge(x.to_frame(),sample_returns, left_index=True)
for i in ticker_list:
result = ma_strategy(i, 20, 5)
x = result[['return_cum']]
sample_returns = pd.DataFrame
y = pd.concat([sample_returns, x], axis=1)
My intention is the following:
The for loop should iterate over the items in my ticker_list and should save the 'return_cum' columns in x. Then the 'return_cum' columns should be stored in y together so that at the end I get a DataFrame with all the 'return_cum' columns of my ticker list.
How can I achieve that goal? I tried pd.concoat and merge, but nothing works.
Thanks for your help!

Related

Contatenate rows in Pandas

I have 12 months sales data for each month. I want to analyze the dataset as a whole.
I have tried using the concat function but It produces not a number (NaN) in my dataframe fields.
In R, cbind function solves this. How do i approach this differently in Python?
I tried using df.concat function to bind the rows cos all the column names are the same for the datasets.
What other options can i explore?
sales_1 = pd.read_csv('Sales_January_2019.csv')
sales_2 = pd.read_csv('Sales_February_2019.csv')
sales_3 = pd.read_csv('Sales_March_2019.csv')
sales_4 = pd.read_csv('Sales_April_2019.csv')
sales_5 = pd.read_csv('Sales_May_2019.csv')
sales_6 = pd.read_csv('Sales_June_2019.csv')
sales_7 = pd.read_csv('Sales_July_2019.csv')
sales_8 = pd.read_csv('Sales_August_2019.csv')
sales_9 = pd.read_csv('Sales_September_2019.csv')
sales_10 = pd.read_csv('Sales_October_2019.csv')
sales_11 = pd.read_csv('Sales_November_2019.csv')
sales_12 = pd.read_csv('Sales_December_2019.csv')
I expect all data frame to be merged into one since the column names are the same for all

perhaps
# using concat with the list of the DF that you already read-in to combine into a single DF
pd.concat([sales_1 ,sales_2 ,sales_3 ,sales_4 ,sales_5 ,sales_6 ,sales_7 ,sales_8 ,sales_9 ,sales_10 ,sales_11 ,sales_12 ])

cancatenate multiple dfs with same dimensions and apply functions to cell values of all dfs and store result in the cell

df1 = pd.DataFrame(np.random.randint(0,9,size=(2, 2)))
df2 = pd.DataFrame(np.random.randint(0,9,size=(2, 2)))
Lets say after concatenate df1 and df2(real case I have many dfs with 700*200 size) in a way that I get something like below table(I dont need to see this table, just for explanation)
col a
col b
row a
[1.4]
[7,8]
row b
[9,2]
[2,0]
Then i want to pass each cell values to below compute function and add the result it from to the cell
def compute(row, column, cell_values):
baseline_df = [2, 4, 6, 7, 8]
result = baseline_df
for values in cell_values:
if (column-row) != dict[values]: # dict contain specific values
result = baseline_df
else:
result = result.apply(func, value=values)
return result.loc[column-row]
def func(df, value):
# operation
result_df = df*value
return result_df
What I want is get df1 and df2 , concatenate and apply above function and get the results. In a really fast way.
In the actual use case , df quite big and if it run for all cells it would take significant amount of time, i need a faster way to perform this.
Note:
This is my idea of doing this. I hope you understand what my requirements are. Please let me know if that is not clear.
currently, i am using something like below, just get the max value of the cell and do the calculation(func)later
This will just give the max value of all cells combined,
dfs = pd.concat(grid).max(level=0)
Final result should be something like this after calculation(same 2d array with new cell data)
col a
col b
row a
0.1
0.7
row b
0.9
0,6
Different approaches are also welcome

Getting same value from list in dataframe column using Python

I have dataframe in which there 3 columns, Now, I added one more column and in which I am adding unique values using random function.
I created list variable and using for loop I am adding random string in that list variable
after that, I created another loop in which I am extracting value of list and adding it in column's value.
But, Same value is adding in each row everytime.
df = pd.read_csv("test.csv")
lst = []
for i in range(20):
randColumn = ''.join(random.choice(string.ascii_uppercase + string.digits)
for i in range(20))
lst.append(randColumn)
for j in lst:
df['randColumn'] = j
print(df)
#Output.......
A B C randColumn
0 1 2 3 WHI11NJBNI8BOTMA9RKA
1 4 5 6 WHI11NJBNI8BOTMA9RKA
Could you please help me to fix this that Why each row has same value from list.

Updated to work correctly with any type of column in df.
If I got your question clearly, you can use method zip of rdd to achieve your goals.
from pyspark.sql import SparkSession, Row
import pyspark.sql.types as t
lst = []
for i in range(2):
rand_column = ''.join(random.choice(string.ascii_uppercase + string.digits) for i in range(20))
# Adding random strings as Row to list
lst.append(Row(random=rand_column))
# Making rdd from random strings array
random_rdd = sparkSession.sparkContext.parallelize(lst)
res = df.rdd.zip(random_rdd).map(lambda rows: Row(**(rows[0].asDict()), **(rows[1].asDict()))).toDF()

Merging DataFrames on a specific column together

I tried to perform my self-created function on a for loop.
Some remarks in advance:
ma_strategy is my function and requires three inputs
ticker_list is a list with strings result is a pandas Dataframe with 7 columns and I can call the column 'return_cum' with result['return_cum']. - The rows of this column are containing floating point numbers.
My intention is the following:
The for loop should iterate over the items in my ticker_list and should save the 'return_cum' columns in a DataFrame. Then the different 'return_cum' columns should be stored together so that at the end I get a DataFrame with all the 'return_cum' columns of my ticker list.
How can I achieve that goal?
My approach is:
for i in ticker_list:
result = ma_strategy(i, 20, 5)
x = result['return_cum'].to_frame()
But at this stage I need some help.

If i inderstood you correctly this should work:
result_df =pd.DataFrame()
for i in ticker_list:
result= ma_strategy(i, 20,5)
resault_df[i + '_return_cum'] = result['return_cum']

Repeat elements in pandas dataframe so equal number of each unique element

I have a pandas dataframe with multiple different feature columns. I have one particular column which can take on a variety of integer value. I want to manipulate the dataframe in such a way that there is an equal number of each of these integer value.
Before;
df['key'] = [1,1,1,3,4,5,5]
After;
df['key'] = [1,1,1,3,3,3,4,4,4,5,5,5]
I want this to be applied to every key in the dataframe.

So here's an ugly way that I've coded up a solution, but I feel like it goes against the entire reason to use pandas dataframes.
for idx, i in enumerate(data['key'].value_counts()):
if i == max(data['key'].value_counts()):
pass
else:
scaling = (max(data['key'].value_counts()) // i) - 1
data2 = pd.concat([data[data['key'] == idx]]*scaling, ignore_index=True)
data = pd.concat([data, data2], ignore_index=True)

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

Merging Dataframe within a for loop - pandas

Related

Contatenate rows in Pandas

cancatenate multiple dfs with same dimensions and apply functions to cell values of all dfs and store result in the cell

Getting same value from list in dataframe column using Python

Merging DataFrames on a specific column together

Repeat elements in pandas dataframe so equal number of each unique element

Categories

Resources