I have a dataset with 71 columns and 113 rows. Each column holds an array of values. I want to split these arrays into separate columns, then rename the new columns with the original column name as a prefix.
!wget https://raw.githubusercontent.com/pranavn91/sample/master/audioonly.csv
import pandas as pd
audio = pd.read_csv("audioonly.csv")
zcr = pd.DataFrame(audio['zcr'].str.split().values.tolist())
zcr.columns = ['zcr_' + str(col) for col in zcr.columns]
I can do it for each column individually and combine as single dataframe.
Please propose a faster method.
You can use concat with a list comprehension:
audio_exploded = pd.concat([pd.DataFrame(audio[col].str.split().values.tolist())
                              .add_prefix(f'{col}_')
                            for col in audio.columns],
                           axis=1)
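As a minimal sketch of the same idea on made-up data: the small `audio` frame below is a hypothetical stand-in for audioonly.csv, assuming each cell is a whitespace-separated string. It uses `str.split(expand=True)`, which returns the split values as a DataFrame directly.

```python
import pandas as pd

# Hypothetical stand-in for audioonly.csv: every cell is a whitespace-separated string.
audio = pd.DataFrame({
    "zcr": ["0.1 0.2 0.3", "0.4 0.5 0.6"],
    "rms": ["1 2", "3 4"],
})

# Split every column into its own block of columns, prefix each block with the
# source column name, then glue the blocks together side by side.
audio_exploded = pd.concat(
    [audio[col].str.split(expand=True).add_prefix(f"{col}_") for col in audio.columns],
    axis=1,
)

print(audio_exploded.columns.tolist())
```

The resulting values are still strings; if numbers are needed, a final `.astype(float)` should convert them, provided every piece parses as a number.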
I would like to concatenate all the columns into a comma-delimited string in pandas.
But as you can see, it is a very laborious task, since I typed all the column indices manually.
de = data[3]+","+data[4]+","+data[5]+....+","+data[1511]
Do you have any idea how to avoid the above procedure in pandas with Python 3?
First convert all columns to strings with DataFrame.astype, then join the values of each row:
df = data.astype(str).apply(','.join, axis=1)
Or, after converting to strings, append ',' to every value, sum across each row, and finally strip the trailing ',' with Series.str.rstrip:
df = data.astype(str).add(',').sum(axis=1).str.rstrip(',')
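The row-wise join can be tried on a small frame. The `data` frame below is made up for illustration (integer column labels, as in the question), and the `loc` slice is only an assumption about how one might restrict the join to a range of columns.

```python
import pandas as pd

# Made-up frame with integer column labels, similar in spirit to the question.
data = pd.DataFrame({
    0: ["x", "y"],
    3: [1, 2],
    4: ["a", "b"],
    5: [0.5, 1.5],
})

# Label-based slice: columns 3 through 5 inclusive, so column 0 is left out.
subset = data.loc[:, 3:5]

# Cast everything to str, then join the values of each row with commas.
joined = subset.astype(str).apply(",".join, axis=1)

print(joined.tolist())
```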
I'm trying to understand how to concat three individual dataframes (i.e df1, df2, df3) into a new dataframe say df4 whereby each individual dataframe has its own column left to right order.
I've tried using concat with axis = 1 to do this, but it appears not possible to automate this with a single action.
Table1_updated = pd.DataFrame(columns=['3P','2PG-3Io','3Io'])
Table1_updated=pd.concat([get_table1_3P,get_table1_2P_max_3Io,get_table1_3Io])
Note that with the exception of get_table1_2P_max_3Io, which has two columns, all other dataframes have one column
For example,
get_table1_3P =
get_table1_2P_max_3Io =
get_table1_3Io =
Ultimately, I would like to see the following:
I believe you need to concat first and then change the order with a list of column names:
Table1_updated=pd.concat([get_table1_3P,get_table1_2P_max_3Io,get_table1_3Io], axis=1)
Table1_updated = Table1_updated[['3P','2PG-3Io','3Io']]
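A small sketch of the concat-then-reorder approach follows. The table contents and the second column name of get_table1_2P_max_3Io ("extra") are hypothetical, since the question does not show them. Because `pd.concat(..., axis=1)` aligns rows on the index, the sketch resets each index first, which is an assumption that the rows should be paired by position.

```python
import pandas as pd

# Hypothetical stand-ins for the three tables; the real values are not shown in the question.
get_table1_3P = pd.DataFrame({"3P": [1.0, 2.0]})
get_table1_2P_max_3Io = pd.DataFrame({"2PG-3Io": [3.0, 4.0], "extra": [5.0, 6.0]})
get_table1_3Io = pd.DataFrame({"3Io": [7.0, 8.0]})

# The input order does not matter, because columns are selected by name afterwards.
frames = [get_table1_3Io, get_table1_3P, get_table1_2P_max_3Io]

# Reset each index so rows are paired by position rather than by index label.
combined = pd.concat([f.reset_index(drop=True) for f in frames], axis=1)

# Keep only the wanted columns, in the wanted left-to-right order.
Table1_updated = combined[["3P", "2PG-3Io", "3Io"]]

print(Table1_updated)
```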
I am selecting row by row as follows:
for i in range(num_rows):
    row = df.iloc[i]
As a result I get a Series object where row.index.values contains the names of the df columns.
But what I want instead is a dataframe with only one row and the original dataframe columns in place.
When I do row.to_frame(), instead of a 1x85 dataframe (1 row, 85 cols) I get an 85x1 dataframe whose index contains the column names, and row.columns outputs Int64Index([0], dtype='int64').
All I want is the original dataframe columns with only one row. How do I do it?
Or how do I convert the row.index values to column values and change the 85x1 shape to 1x85?
You just need to add .T to transpose the result:
row.to_frame().T
Alternatively, change your for loop to index with a list ([[i]]), which returns a one-row dataframe directly:
for i in range(num_rows):
    row = df.iloc[[i]]
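The difference between the two forms can be checked on a toy frame (the `df` below is made up). One detail worth noting: a single row pulled out as a Series gets one common dtype, so transposing it back may lose the original per-column dtypes, while `iloc[[i]]` keeps them.

```python
import pandas as pd

# Toy frame with one integer column and one float column.
df = pd.DataFrame({"a": [1, 2], "b": [3.5, 4.5]})

as_series = df.iloc[0]                # Series indexed by the column names
as_frame = df.iloc[[0]]               # 1-row DataFrame, per-column dtypes preserved
transposed = df.iloc[0].to_frame().T  # 1-row DataFrame built from the Series

print(type(as_series).__name__, as_frame.shape, transposed.shape)
print(as_frame.dtypes.tolist(), transposed.dtypes.tolist())
```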
I have a dataframe as
df = pd.DataFrame(np.random.randn(5,4),columns=list('ABCD'))
I can use the following to do a traditional calculation like mean(), sum(), etc.
df.loc['calc'] = df[['A','D']].iloc[2:4].mean(axis=0)
Now I have two questions:
1. How can I apply a formula (like exp(mean()) or 2.5*mean()/sqrt(max())) to columns 'A' and 'D' for rows 2 to 4?
2. How can I append a row to the existing df where two values are the mean() of A and D and the other two are the result of a specific formula applied to C and B?
Q1:
You can use .apply() and lambda functions.
df.iloc[2:4,[0,3]].apply(lambda x: np.exp(np.mean(x)))
df.iloc[2:4,[0,3]].apply(lambda x: 2.5*np.mean(x)/np.sqrt(max(x)))
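To make the results easy to verify by hand, here is the same pattern on fixed numbers instead of random ones (the values are made up). Note that `iloc[2:4]` selects the rows at positions 2 and 3 only, because the end of a positional slice is exclusive; `iloc[2:5]` would be needed to include position 4.

```python
import numpy as np
import pandas as pd

# Fixed, made-up values so the formulas can be checked by hand.
df = pd.DataFrame({
    "A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "B": [0.0, 0.0, 0.0, 0.0, 0.0],
    "C": [0.0, 0.0, 0.0, 0.0, 0.0],
    "D": [4.0, 9.0, 16.0, 25.0, 36.0],
})

# Rows at positions 2 and 3, columns A (position 0) and D (position 3).
subset = df.iloc[2:4, [0, 3]]

# Each lambda receives one column of the subset as a Series.
exp_mean = subset.apply(lambda x: np.exp(x.mean()))
scaled = subset.apply(lambda x: 2.5 * x.mean() / np.sqrt(x.max()))

print(scaled)
```

For column A the subset is [3, 4], so the second formula gives 2.5 * 3.5 / 2 = 4.375; for column D it is [16, 25], giving 2.5 * 20.5 / 5 = 10.25.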
Q2:
You can use dictionaries and combine them and add it as a row.
First one is mean, the second one is some custom function.
ad = dict(df[['A', 'D']].mean())
bc = dict(df[['B', 'C']].apply(lambda x: x.sum()*45))
Combine them:
ad.update(bc)
df = pd.concat([df, pd.DataFrame([ad])], ignore_index=True)  # DataFrame.append was removed in pandas 2.0
I am trying to run my own function inside a for loop.
Some remarks in advance:
ma_strategy is my function and requires three inputs.
ticker_list is a list of strings.
result is a pandas DataFrame with 7 columns, and I can access the column 'return_cum' with result['return_cum']. The rows of this column contain floating point numbers.
My intention is the following:
The for loop should iterate over the items in my ticker_list and should save the 'return_cum' columns in a DataFrame. Then the different 'return_cum' columns should be stored together so that at the end I get a DataFrame with all the 'return_cum' columns of my ticker list.
How can I achieve that goal?
My approach is:
for i in ticker_list:
    result = ma_strategy(i, 20, 5)
    x = result['return_cum'].to_frame()
But at this stage I need some help.
If I understood you correctly, this should work:
result_df = pd.DataFrame()
for i in ticker_list:
    result = ma_strategy(i, 20, 5)
    result_df[i + '_return_cum'] = result['return_cum']
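An alternative that avoids growing the frame column by column is to collect the Series in a dict and concatenate once. In the sketch below, `ma_strategy` is a hypothetical stand-in with made-up output (the real function is not shown), and its parameter names are assumptions; only the 'return_cum' column name comes from the question.

```python
import pandas as pd


def ma_strategy(ticker, param_a, param_b):
    """Hypothetical stand-in for the real function: returns a frame with a 'return_cum' column."""
    base = float(len(ticker))  # made-up numbers, just so each ticker differs
    return pd.DataFrame({"return_cum": [base, base + 1.0, base + 2.0]})


ticker_list = ["AAA", "BBBB"]

# The dict keys become the column names when concatenating Series along axis=1.
result_df = pd.concat(
    {f"{ticker}_return_cum": ma_strategy(ticker, 20, 5)["return_cum"] for ticker in ticker_list},
    axis=1,
)

print(result_df)
```

Like the loop version, this aligns the columns on their index, so tickers with different date ranges would end up with NaN where data is missing.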