Convert a one-column pandas dataframe to 3 columns based on index

I have:
   col1
0     1
1     2
2     3
3     4
4     5
5     6
...
I want every 3 rows of the original dataframe to become a single row in the new dataframe:
   col1  col2  col3
0     1     2     3
1     4     5     6
...
Any suggestions?

The dataframe's values are a numpy array, which can be reshaped with numpy's reshape method; then build a new dataframe from the reshaped values. Assuming your existing dataframe is df:
df_2 = pd.DataFrame(df.values.reshape(2, 3), columns=['col1', 'col2', 'col3'])
This creates a new dataframe with two rows and three columns:
   col1  col2  col3
0     1     2     3
1     4     5     6
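If you don't want to hard-code the number of rows, a sketch of the same idea (df as above) that lets numpy infer it, assuming len(df) is a multiple of 3:
import pandas as pd
# -1 tells numpy to infer the number of rows from the data, so this works
# for any dataframe whose length is a multiple of 3 (here 6 rows -> 2 x 3).
df_2 = pd.DataFrame(df.values.reshape(-1, 3), columns=['col1', 'col2', 'col3'])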

You can use set_index and unstack to get the right shape, and add_prefix to change the column names:
print(df.set_index([df.index//3, df.index%3+1])['col1'].unstack().add_prefix('col'))
   col1  col2  col3
0     1     2     3
1     4     5     6
If the original index is not consecutive but you still want to reshape every 3 rows, replace df.index with np.arange(len(df)) in both places inside set_index, as in the sketch below.
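A minimal sketch of that variant (assuming the same single-column df as above):
import numpy as np
# Positional index, independent of whatever index df actually has.
pos = np.arange(len(df))
print(df.set_index([pos // 3, pos % 3 + 1])['col1'].unstack().add_prefix('col'))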

You can convert the column to a numpy array and then reshape it.
In [27]: np.array(df['col1']).reshape(len(df) // 3, 3)
Out[27]:
array([[1, 2, 3],
       [4, 5, 6]])
In [..]: reshaped_cols = np.array(df['col1']).reshape(len(df) // 3, 3)
pd.DataFrame(data=reshaped_cols, columns=['col1', 'col2', 'col3'])
Out[30]:
   col1  col2  col3
0     1     2     3
1     4     5     6

Related

Convert multiple columns in pandas dataframe to array of arrays

I have the following dataframe:
   col1  col2  col3
1     1     2     3
2     4     5     6
3     7     8     9
4    10    11    12
I want to create a new column that is an array of arrays: a single inner array built from specific columns, cast to float.
So given the column names, say "col2" and "col3", the output dataframe would look like this:
   col1  col2  col3  new
1     1     2     3  [[2,3]]
2     4     5     6  [[5,6]]
3     7     8     9  [[8,9]]
4    10    11    12  [[11,12]]
What I have so far works, but seems clumsy and I believe there's a better way. I'm fairly new to pandas and numpy.
selected_columns = ["col2", "col3"]
df[selected_columns] = df[selected_columns].astype(float)
df['new'] = df.apply(lambda r: tuple(r[selected_columns]), axis=1).apply(np.array)
df['new'] = df.apply(lambda r: tuple(r[["new"]]), axis=1).apply(np.array)
Appreciate your help, Thanks!
Using agg:
cols = ['col2', 'col3']
df['new'] = df[cols].agg(list, axis=1)
Using numpy:
df['new'] = df[cols].to_numpy().tolist()
Output:
   col1  col2  col3  new
1     1     2     3  [2, 3]
2     4     5     6  [5, 6]
3     7     8     9  [8, 9]
4    10    11    12  [11, 12]
2D lists
cols = ['col2', 'col3']
df['new'] = df[cols].agg(lambda x: [list(x)], axis=1)
# or
df['new'] = df[cols].to_numpy()[:,None].tolist()
Output:
   col1  col2  col3  new
1     1     2     3  [[2, 3]]
2     4     5     6  [[5, 6]]
3     7     8     9  [[8, 9]]
4    10    11    12  [[11, 12]]
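The question also asks for the values to be cast to float; a sketch that adds the cast (reusing the astype(float) step from the question) before building the 2D lists:
cols = ['col2', 'col3']
# Cast to float first, then wrap each row's values in an outer list.
df['new'] = df[cols].astype(float).to_numpy()[:, None].tolist()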

How to split a series with array to multiple series?

One column of my dataset contains numpy arrays as elements. I want to split it into multiple columns, each one holding one value of the array.
Data now are like:
   column1              column2  column3
0        1  np.array([1,2,3,4])      4.5
1        2  np.array([5,6,7,8])      3
I want to convert it into:
   column1  col1  col2  col3  col4  column3
0        1     1     2     3     4      4.5
1        2     5     6     7     8      3
Another possible solution, based on pandas.DataFrame.from_records:
out = pd.DataFrame.from_records(
    df['col'], columns=[f'col{i+1}' for i in range(len(df.loc[0, 'col']))])
Output:
   col1  col2  col3  col4
0     1     2     3     4
1     5     6     7     8
As an alternative:
df = pd.DataFrame(data={'col':[np.array([1,2,3,4]),np.array([5,6,7,8])]})
new_df = pd.DataFrame(df.col.tolist(), index= df.index) #explode column to new dataframe and get index from old df.
new_df.columns = ["col_{}".format(i) for i in range(1,len(new_df.columns) + 1)]
'''
   col_1  col_2  col_3  col_4
0      1      2      3      4
1      5      6      7      8
'''
I hope I've understood your question well. You can leverage the result_type="expand" of the .apply method:
df = df.apply(
    lambda x: {f"col{k}": vv for v in x for k, vv in enumerate(v, 1)},
    result_type="expand",
    axis=1,
)
print(df)
Prints:
   col1  col2  col3  col4
0     1     2     3     4
1     5     6     7     8
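To reproduce the full layout from the question (column1 and column3 kept around the new columns), a sketch that expands the array column and concatenates; the column name column2 is taken from the question:
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'column1': [1, 2],
    'column2': [np.array([1, 2, 3, 4]), np.array([5, 6, 7, 8])],
    'column3': [4.5, 3],
})

# Expand the array column into its own frame, then concatenate and reorder
# so the new columns sit between column1 and column3.
expanded = pd.DataFrame(df['column2'].tolist(), index=df.index,
                        columns=[f'col{i}' for i in range(1, 5)])
out = pd.concat([df.drop(columns='column2'), expanded], axis=1)
out = out[['column1', 'col1', 'col2', 'col3', 'col4', 'column3']]
print(out)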

How to split the pandas column into two

import pandas as pd
df = pd.DataFrame({'col1': ['asdf', 'xy', 'q'], 'col2': [(2, 1), (2, 2), (3, 3)]})
I have the dataframe above. I wish to split the column col2 as below:
import pandas as pd
df = pd.DataFrame({'col1': ['asdf', 'xy', 'q'], 'col2': [2, 2, 3], 'col3': [1, 2, 3]})
Is it possible?
You can use to_list and 2D assignment:
df[['col2', 'col3']] = df['col2'].tolist()
output:
   col1  col2  col3
0  asdf     2     1
1    xy     2     2
2     q     3     3
Or, if you want to remove 'col2' and assign to other names, use pop:
df[['col3', 'col4']] = df.pop('col2').tolist()
output:
   col1  col3  col4
0  asdf     2     1
1    xy     2     2
2     q     3     3
Yet another possible solution:
df['col2'], df['col3'] = zip(*df.col2)
Output:
   col1  col2  col3
0  asdf     2     1
1    xy     2     2
2     q     3     3
You can use the pandas.Series constructor:
df[['col2','col3']] = df['col2'].apply(pd.Series)
print(df)
# Output:
   col1  col2  col3
0  asdf     2     1
1    xy     2     2
2     q     3     3

pandas rolling calc count function

import pandas as pd
import numpy as np
df = pd.DataFrame({'col1': np.arange(5), 'col2': np.arange(3, 8)})
   col1  col2
0     0     3
1     1     4
2     2     5
3     3     6
4     4     7
I want to get a column col3 that, for each row, looks at the following rows (a rolling tail(3)-style window) and returns the count of rows where col1 >= 3 and col2 >= 3. The result I want looks like this:
   col1  col2  col3  reason
0     0     3     0
1     1     4     1   [(3>=3 and 6>=3)]
2     2     5     2   [(3>=3 and 6>=3), (4>=3 and 7>=3)]
3     3     6   NaN
4     4     7   NaN
Hope to get your reply as soon as possible
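A minimal sketch of one way to compute such a column, assuming the intent is to count, for each row, how many of the next two rows satisfy both conditions (that reading reproduces the table above, including the trailing NaNs):
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': np.arange(5), 'col2': np.arange(3, 8)})

# Rows where both conditions hold.
ok = ((df['col1'] >= 3) & (df['col2'] >= 3)).astype(int)

# Forward-looking count over the next two rows: reverse the mask, take a
# rolling sum of window 2, reverse back, then shift so the window starts
# at the row after the current one. Rows without two following rows get NaN.
df['col3'] = ok[::-1].rolling(2).sum()[::-1].shift(-1)
print(df)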

Dropping Rows with a does not equal condition

I'm attempting to create a new dataframe that drops a certain segment of records from an existing dataframe.
df2 = df[df['AgeSeg'] != '0-1']
When I look at df2, the records with the '0-1' age segment are still there.
[Screenshot: output with the 0-1 records still in it.]
I would expect the new dataframe to not have them. What am I doing wrong?
You can use isin (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.isin.html)
Simple example:
import pandas as pd
df = pd.DataFrame({'col1': [1, 2, 3, 2, 9], 'col2': [4, 5, 6, 3, 0]})
df = df[df['col1'].isin([2]) != True]
df before:
   col1  col2
0     1     4
1     2     5
2     3     6
3     2     3
4     9     0
df after:
   col1  col2
0     1     4
2     3     6
4     9     0
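Applied to the AgeSeg example from the question (a sketch; df and the '0-1' label are taken from the question), the same idea reads a little more directly with ~, which negates the boolean mask and is equivalent to the != True comparison above:
# Keep only the rows whose AgeSeg is not '0-1'.
df2 = df[~df['AgeSeg'].isin(['0-1'])]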