How to split the pandas column into two


import pandas as pd
df = pd.DataFrame({'col1': ['asdf', 'xy', 'q'], 'col2': [(2, 1), (2, 2), (3, 3)]})
I have the dataframe above. I wish to split the column col2 as shown below:
import pandas as pd
df = pd.DataFrame({'col1': ['asdf', 'xy', 'q'], 'col2': [2, 2, 3], 'col3': [1, 2, 3]})
Is it possible?

You can use tolist and 2D assignment:
df[['col2', 'col3']] = df['col2'].tolist()
output:
col1 col2 col3
0 asdf 2 1
1 xy 2 2
2 q 3 3
Or, if you want to remove 'col2' and assign to another name, using pop:
df[['col3', 'col4']] = df.pop('col2').tolist()
output:
col1 col3 col4
0 asdf 2 1
1 xy 2 2
2 q 3 3

Yet another possible solution:
df['col2'], df['col3'] = zip(*df.col2)
Output:
col1 col2 col3
0 asdf 2 1
1 xy 2 2
2 q 3 3

You can use the pandas.Series constructor:
df[['col2','col3']] = df['col2'].apply(pd.Series)
# Output :
print(df)
col1 col2 col3
0 asdf 2 1
1 xy 2 2
2 q 3 3
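The tolist-based answers above assign the unpacked values by position. As a minimal sketch of a variant (the intermediate name parts is only illustrative, not from the answers), you can build a small DataFrame from the tuples with an explicit index=df.index, which keeps the rows aligned even when the dataframe does not have a default 0..n-1 index:

```python
import pandas as pd

df = pd.DataFrame({'col1': ['asdf', 'xy', 'q'],
                   'col2': [(2, 1), (2, 2), (3, 3)]})

# Unpack the tuples into their own two-column frame, reusing df's index
# so the later join aligns row by row.
parts = pd.DataFrame(df['col2'].tolist(), index=df.index,
                     columns=['col2', 'col3'])

# Drop the original tuple column and attach the two new columns.
out = df.drop(columns='col2').join(parts)
print(out)
```

This assumes every tuple in col2 has exactly two elements.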

Related

Reshaping a pandas dataframe in a specific manner

Consider the code below:
import pandas as pd
d = {'col1': [1, 2, 3 ,4 ,5, 5, 6, 5], 'col2': [3, 4, 3 ,4 , 5, 6 , 6, 5], 'col3': [5, 6, 3 ,4 , 5, 6 ,6, 5], 'col4': [7, 8, 3 , 4 , 5, 4 , 6, 4], }
df = pd.DataFrame(data=d)
df=df.T
This code gives me the following output:
# 0 1 2 3 4 5 6 7
# col1 1 2 3 4 5 5 6 5
# col2 3 4 3 4 5 6 6 5
# col3 5 6 3 4 5 6 6 5
# col4 7 8 3 4 5 4 6 4
I would like to reshape the dataframe in such a way that the columns are rearranged as shown below:
# 0 1
# col1 1 2
# col2 3 4
# col3 5 6
# col4 7 8
# col1 3 4
# col2 3 4
# col3 3 4
# col4 3 4
# col1 5 5
# col2 5 6
# col3 5 6
# col4 5 4
# col1 6 5
# col2 6 5
# col3 6 5
# col4 6 4
The code should allow some room for modification so that one can choose two columns as in the above example or three columns or four columns and so on. Any ideas how to implement this?

Try this:
import pandas as pd
d = {'col1': [1, 2, 3 ,4 ,5, 5, 6, 5], 'col2': [3, 4, 3 ,4 , 5, 6 , 6, 5], 'col3': [5, 6, 3 ,4 , 5, 6 ,6, 5], 'col4': [7, 8, 3 , 4 , 5, 4 , 6, 4], }
df = pd.DataFrame(data=d)
df = df.T
number = 2  # Here you can choose the number of columns
df1 = df.iloc[:, :number]
# Start at the second block: the first block is already in df1
for x in range(number, len(df.columns), number):
    df1 = pd.concat([df1, df.iloc[:, x:x + number].T.reset_index(drop=True).T])
print(df1)

A much faster way is to use numpy, especially since the number of columns is even.
You are reshaping into a 2-column dataframe; this is achieved with np.reshape:
import numpy as np
data = np.reshape(df.to_numpy(), (-1, 2))
data
array([[1, 2],
[3, 4],
[5, 5],
[6, 5],
[3, 4],
[3, 4],
[5, 6],
[6, 5],
[5, 6],
[3, 4],
[5, 6],
[6, 5],
[7, 8],
[3, 4],
[5, 4],
[6, 4]])
The current index has length 4; after reshaping, the new index needs (length of current index) * (number of columns / 2) entries:
index = np.tile(df.index, df.columns.size//2)
index
array(['col1', 'col2', 'col3', 'col4', 'col1', 'col2', 'col3', 'col4',
'col1', 'col2', 'col3', 'col4', 'col1', 'col2', 'col3', 'col4'],
dtype=object)
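All that is left is to create a new dataframe. Note that a plain row-major reshape keeps each original row's values together, so the row order (and therefore the tiled labels) differs from the layout requested in the question: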
pd.DataFrame(data, index = index)
0 1
col1 1 2
col2 3 4
col3 5 5
col4 6 5
col1 3 4
col2 3 4
col3 5 6
col4 6 5
col1 5 6
col2 3 4
col3 5 6
col4 6 5
col1 7 8
col2 3 4
col3 5 4
col4 6 4
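To get the block-by-block layout the question asks for while staying in numpy, one option is to reshape into three dimensions and swap the row and block axes before flattening. This is a sketch, not part of the original answer; the names n_rows, n_blocks, blocks and result are illustrative, and it assumes the number of columns is an exact multiple of number:

```python
import numpy as np
import pandas as pd

d = {'col1': [1, 2, 3, 4, 5, 5, 6, 5], 'col2': [3, 4, 3, 4, 5, 6, 6, 5],
     'col3': [5, 6, 3, 4, 5, 6, 6, 5], 'col4': [7, 8, 3, 4, 5, 4, 6, 4]}
df = pd.DataFrame(data=d).T

number = 2                        # columns per block
n_rows, n_cols = df.shape         # (4, 8) for this data
n_blocks = n_cols // number       # assumes n_cols % number == 0

# Axes after the first reshape: (original row, block, position in block).
blocks = df.to_numpy().reshape(n_rows, n_blocks, number)

# Put the block axis first so each block lists all original rows in order,
# then flatten back to two dimensions.
data = blocks.transpose(1, 0, 2).reshape(-1, number)

# Each block repeats the full original index once.
index = np.tile(df.index, n_blocks)
result = pd.DataFrame(data, index=index)
print(result)
```

Changing number to 4 gives two blocks of four columns each with the same code.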
Another option is to use the idea of even and odd columns to reshape the data with pyjanitor's pivot_longer function, collating even (0) and odd (1) positions into separate columns:
# pip install git+https://github.com/pyjanitor-devs/pyjanitor.git
import pandas as pd
import janitor
(df.set_axis((df.columns % 2).astype(str), axis=1)
.pivot_longer(ignore_index=False,
names_to = ['0', '1'],
names_pattern=['0', '1'])
)
0 1
col1 1 2
col2 3 4
col3 5 6
col4 7 8
col1 3 4
col2 3 4
col3 3 4
col4 3 4
col1 5 5
col2 5 6
col3 5 6
col4 5 4
col1 6 5
col2 6 5
col3 6 5
col4 6 4
Again, the numpy approach is much faster.

Groupby agg keep blank value

Let's say I have a dataframe called df:
A 10
A 20
15
20
B 10
B 10
The result I want is
A 30
35
B 20

I imagine your blanks are actually NaNs; if so, use dropna=False:
df.groupby('col1', dropna=False).sum()
If they really are empty strings, then it should work with the default.
Example:
df = pd.DataFrame({'col1': ['A', 'A', float('nan'), float('nan'), 'B', 'B'],
'col2': [10, 20, 15, 20, 10, 10]})
df.groupby('col1', dropna=False).sum()
output:
col2
col1
A 30
B 20
NaN 35
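For the other case mentioned above, where the blanks are real empty strings, here is a small sketch showing that the default groupby already keeps them as a group. The sample data mirrors the question with '' in place of the blanks, and sort=False is an optional extra that keeps the groups in order of first appearance:

```python
import pandas as pd

df = pd.DataFrame({'col1': ['A', 'A', '', '', 'B', 'B'],
                   'col2': [10, 20, 15, 20, 10, 10]})

# Empty strings are ordinary keys, so no dropna argument is needed.
out = df.groupby('col1').sum()

# Keep the groups in the order they first appear instead of sorted order.
out_ordered = df.groupby('col1', sort=False).sum()
print(out_ordered)
```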

Group by a custom group and aggregate the columns.
Suppose your dataframe has 2 columns, 'col1' and 'col2':
>>> df
col1 col2
0 A 10 # <- group 1
1 A 20 # <- group 1
2 15 # <- group 2
3 20 # <- group 2
4 B 10 # <- group 3
5 B 10 # <- group 3
grp = df.iloc[:, 0].ne(df.iloc[:, 0].shift()).cumsum()
out = df.groupby(grp, as_index=False).agg({'col1': 'first', 'col2': 'sum'})
Output result:
>>> out
col1 col2
0 A 30
1 35
2 B 20
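One caveat with the consecutive-run key above: if the blanks are NaN rather than empty strings, each NaN compares as different from the previous NaN, so every blank row starts its own group. A hedged sketch of a workaround is to fill the blanks before building the key. The names key, grp and naive are illustrative, and the NaN sample data is an assumption about the asker's frame:

```python
import pandas as pd

df = pd.DataFrame({'col1': ['A', 'A', float('nan'), float('nan'), 'B', 'B'],
                   'col2': [10, 20, 15, 20, 10, 10]})

# Without filling, the two NaN rows end up in separate groups (4 groups).
naive = df['col1'].ne(df['col1'].shift()).cumsum()

# Fill the blanks first so consecutive blanks compare as equal.
key = df['col1'].fillna('')
grp = key.ne(key.shift()).cumsum().rename('block')

out = (df.assign(col1=key)
         .groupby(grp)
         .agg({'col1': 'first', 'col2': 'sum'})
         .reset_index(drop=True))
print(out)
```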

convert from one column pandas dataframe to 3 columns based on index

I have:
col1
0 1
1 2
2 3
3 4
4 5
5 6
...
I want, every 3 rows of the original dataframe to become a single row in the new dataframe:
col1 col2 col3
0 1 2 3
1 4 5 6
...
Any suggestions?

The values of the dataframe are an array that can be reshaped using numpy's reshape method. Then, create a new dataframe from the reshaped values. Assuming your existing dataframe is df:
df_2 = pd.DataFrame(df.values.reshape(-1, 3), columns=['col1', 'col2', 'col3'])
For the six rows shown, this creates a new dataframe of 2 rows and 3 columns (the -1 lets numpy infer the row count, so the number of rows must be a multiple of 3):
col1 col2 col3
0 1 2 3
1 4 5 6

You can use set_index and unstack to get the right shape, and add_prefix to change the column names:
print (df.set_index([df.index//3, df.index%3+1])['col1'].unstack().add_prefix('col'))
col1 col2 col3
0 1 2 3
1 4 5 6
In case the original index is not made of consecutive values but you still want to reshape every 3 rows, replace df.index with np.arange(len(df)) in both places inside set_index.
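The np.arange variant described above could look like the following sketch. The non-consecutive index values and the names pos and wide are made up for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical frame whose index is not 0..n-1.
df = pd.DataFrame({'col1': [1, 2, 3, 4, 5, 6]},
                  index=[10, 20, 30, 40, 50, 60])

# Row positions stand in for df.index when computing the target row and column.
pos = np.arange(len(df))

wide = (df.set_index([pos // 3, pos % 3 + 1])['col1']
          .unstack()
          .add_prefix('col'))
print(wide)
```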

You can convert the column to a numpy array and then reshape it.
In [27]: np.array(df['col1']).reshape( len(df) // 3 , 3 )
Out[27]:
array([[1, 2, 3],
[4, 5, 6]])
In [..] :reshaped_cols = np.array(df['col1']).reshape( len(df) // 3 , 3 )
pd.DataFrame( data = reshaped_cols , columns = ['col1' , 'col2' , 'col3' ] )
Out[30]:
col1 col2 col3
0 1 2 3
1 4 5 6

adding multiple lists into one column DataFrame pandas

l = {'col1': [[1,2,3], [4,5,6]]}
df = pd.DataFrame(data = l)
col1
0 [1, 2, 3]
1 [4, 5, 6]
Desired output:
col1
0 1
1 2
2 3
3 4
4 5
5 6

You can use explode (note that it repeats the original index label for each exploded row):
df.explode('col1')
col1
0 1
0 2
0 3
1 4
1 5
1 6
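If you want the fresh 0..5 index shown in the desired output, explode accepts ignore_index=True (available since pandas 1.1). A short sketch using the question's data:

```python
import pandas as pd

l = {'col1': [[1, 2, 3], [4, 5, 6]]}
df = pd.DataFrame(data=l)

# ignore_index=True renumbers the rows instead of repeating 0, 0, 0, 1, 1, 1.
out = df.explode('col1', ignore_index=True)
print(out)
```

The exploded column has object dtype; call out['col1'].astype(int) if you need integers.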

You can use np.ravel to flatten the list of lists:
import numpy as np, pandas as pd
l = {'col1': [[1,2,3], [4,5,6]]}
df = pd.DataFrame(np.ravel(*l.values()),columns=l.keys())
>>> df
col1
0 1
1 2
2 3
3 4
4 5
5 6

Dropping Rows with a does not equal condition

I'm attempting to create a new dataframe that drops a certain segment of records from an existing dataframe.
df2=df[df['AgeSeg']!='0-1']
when I look at df2, the records with '0-1' Age Segment are still there.
Output with 0-1 records still in it.
I would expect the new dataframe to not have them. What am I doing wrong?

You can use isin (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.isin.html)
Simple example:
import pandas as pd
df = pd.DataFrame({'col1': [1, 2, 3, 2, 9], 'col2': [4, 5, 6, 3, 0]})
df = df[~df['col1'].isin([2])]
df before:
col1 col2
0 1 4
1 2 5
2 3 6
3 2 3
4 9 0
df after:
col1 col2
0 1 4
2 3 6
4 9 0
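The question does not show the raw data, so the following is only a guess at one common cause: if some AgeSeg values carry stray whitespace, != '0-1' will not match them. The sample values and the names naive and clean are hypothetical; the sketch shows how stripping before comparing changes the result:

```python
import pandas as pd

# Hypothetical data: two of the '0-1' labels have leading or trailing spaces.
df = pd.DataFrame({'AgeSeg': ['0-1', ' 0-1', '0-1 ', '2-5', '6-10'],
                   'n': [1, 2, 3, 4, 5]})

# The direct comparison only removes the exact match.
naive = df[df['AgeSeg'] != '0-1']

# Stripping whitespace first removes all three variants.
clean = df[df['AgeSeg'].str.strip() != '0-1']
print(clean)
```

Inspecting df['AgeSeg'].unique() is a quick way to check whether this assumption applies to your data.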
