How to convert my columns into rows using Pandas

I have my resulting data frame as below. How do I convert the columns into rows? My column headers are numbers, which have to be avoided.
https://i.stack.imgur.com/RvwVD.png

IIUC, try this:
import pandas as pd

row = [*'ABCDE']
row2 = [1, 2, 3, 4, 5]
df = pd.DataFrame([row, row2])
print(df)
Input dataframe:
   0  1  2  3  4
0  A  B  C  D  E
1  1  2  3  4  5
Use this code:
df_out = df.T.set_index(0)
print(df_out)
Output:
   1
0
A  1
B  2
C  3
D  4
E  5
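Since the numeric header was the complaint, an optional clean-up on top of that (the column name 'value' is just a placeholder I picked) could be:
import pandas as pd

df = pd.DataFrame([[*'ABCDE'], [1, 2, 3, 4, 5]])

out = df.T.set_index(0)
out.index.name = None      # drop the leftover '0' index label
out.columns = ['value']    # replace the numeric column header with a name
print(out)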

You can use a simple reshape to do this.
There are many other ways to do this; here is one.
I think this is what your data frame looks like:
df = pd.DataFrame({0: ['neg', 0.015], 1: ['neu', 0.006], 2: ['pos', 0.014]})
You can do a simple reshape using the lines below:
import pandas as pd
import numpy as np

pd.DataFrame(np.array(df.iloc[1:]).reshape(-1, 1))
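As a variant, here is a sketch that keeps the text labels as the index instead of discarding them (assuming the same reconstructed frame as above):
import pandas as pd

df = pd.DataFrame({0: ['neg', 0.015], 1: ['neu', 0.006], 2: ['pos', 0.014]})

# Use the label row as the index and the value row as the single data column.
out = pd.DataFrame(df.iloc[1].values, index=df.iloc[0].values)
print(out)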

If I understand you correctly, you want to turn your first row into your header?
This is the way I would do it:
df.T.set_index(0).T
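Run against the toy frame from the first answer, that one-liner looks like this (a sketch; your real frame may have different headers):
import pandas as pd

df = pd.DataFrame([[*'ABCDE'], [1, 2, 3, 4, 5]])

# Transpose, promote the former first row to the index, transpose back:
# the letters end up as column headers and the numbers as the only data row.
out = df.T.set_index(0).T
print(out)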

Related

Trying to convert column to be row indexes, set_index error

data_new.set_index('Usual Mode of Transport to Work')
(running in a Jupyter notebook)
I am trying to convert a column to be the row index; however, it shows up as NaN. How do I resolve it? Thanks. I'm a beginner in Python.
Let's start with a toy dataframe:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randint(0,5,size=(5, 4)), columns=list('ABCD'))
print(df)
   A  B  C  D
0  3  1  2  1
1  2  2  3  4
2  2  4  4  1
3  1  0  3  2
4  1  2  4  0
Now, let's set column A as the index
df.set_index('A')
   B  C  D
A
3  1  2  1
2  2  3  4
2  4  4  1
1  0  3  2
1  2  4  0
This sets the index correctly but doesn't save the newly indexed dataframe in the original variable, i.e. df. So when you check the value of df you still see the original dataframe.
To save the new indexing, you can do one of the following
df = df.set_index('A')
or
df.set_index('A', inplace=True)
Coming to the NaN values, I believe this has something to do with using a Jupyter notebook. Since Jupyter lets you jump between cells, execution does not necessarily follow the linear order of a traditional script, which can get confusing. You can use the "Variable View" in Jupyter to cross-check that you are passing the value you intend to. I hope this helps you figure out the NaN issue.
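Applied to the column from your question, a minimal sketch (the CSV file name is hypothetical; substitute whatever produced data_new in your notebook) would be:
import pandas as pd

data_new = pd.read_csv('transport.csv')   # hypothetical source file

# Either reassign the result ...
data_new = data_new.set_index('Usual Mode of Transport to Work')
# ... or modify the frame in place instead:
# data_new.set_index('Usual Mode of Transport to Work', inplace=True)

print(data_new.head())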

How can I group different rows based on their values?

I have a data frame in pandas like this:
Attributes1  Attributes value1  Attributes2  Attributes value2
a            1                  b            4
b            2                  a            5
Does anyone know how I can get a new data frame like the one below?
a  b
1  2
5  4
Thank you!
Try:
x = pd.DataFrame(
    df.apply(
        lambda x: dict(
            zip(x.filter(regex=r"Attributes\d+$"), x.filter(like="value"))
        ),
        axis=1,
    ).to_list()
)
print(x)
Prints:
   a  b
0  1  4
1  5  2
Use the transpose() function to transpose the rows to columns, then use the groupby() function to group the columns with the same name.
Also, in the future, please add what you've tried to do to solve the problem as well.
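The transpose/groupby suggestion above is only described in prose, and a literal transpose plus groupby is awkward here. A concrete alternative along the same reshaping lines, using concat plus pivot instead (the 'attr' and 'val' names are placeholders I introduce, and the frame is the two-row example from the question), might look like:
import pandas as pd

df = pd.DataFrame({'Attributes1': ['a', 'b'], 'Attributes value1': [1, 2],
                   'Attributes2': ['b', 'a'], 'Attributes value2': [4, 5]})

# Stack each (attribute, value) pair of columns on top of one another,
# then pivot so every attribute name becomes its own column.
long = pd.concat(
    [df[[f'Attributes{i}', f'Attributes value{i}']]
       .set_axis(['attr', 'val'], axis=1)
     for i in (1, 2)]
)
out = long.pivot(columns='attr', values='val')
print(out)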
We can do wide_to_long then pivot
# Reshape the paired wide columns into long format, then pivot.
# s.pivot(*s) unpacks the column names ('index', 'Attributes',
# 'Attributes value') as pivot's index/columns/values arguments;
# on pandas 2.x these must be passed as keyword arguments instead.
s = pd.wide_to_long(df.reset_index(),
                    ['Attributes', 'Attributes value'],
                    i='index',
                    j='a').reset_index().drop(['a'], axis=1)
s = s.pivot(*s)
Out[22]:
Attributes  a  b
index
0           1  4
1           5  2

Converting a pandas crosstab into a stacked dataframe (a regular table)

Given a pandas crosstab, how do you convert that into a stacked dataframe?
Assume you have a stacked dataframe and first convert it into a crosstab. Now I would like to revert back to the original stacked dataframe. I searched for an existing question that addresses this requirement, but could not find one that hits it exactly. In case I have missed any, please leave a note in the comments.
I would like to document the best practice here, so thank you for your support.
I know that pandas.DataFrame.stack() would be the best approach, but one needs to be careful about the "level" the stacking is applied to.
Input: Crosstab:
Label  a  b  c  d  r
ID
1      0  1  0  0  0
2      1  1  0  1  1
3      1  0  0  0  1
4      1  0  0  1  0
6      1  0  0  0  0
7      0  0  1  0  0
8      1  0  1  0  0
9      0  1  0  0  0
Output: Stacked DataFrame:
    ID Label
0    1     b
1    2     a
2    2     b
3    2     d
4    2     r
5    3     a
6    3     r
7    4     a
8    4     d
9    6     a
10   7     c
11   8     a
12   8     c
13   9     b
Step-by-step Explanation:
First, let's make a function that creates our data. Note that it randomly generates the stacked dataframe, so the final output may differ from what I have given below.
Helper Function: Make the Stacked And Crosstab DataFrames
import numpy as np
import pandas as pd

# Make stacked dataframe
def _create_df():
    """
    This dataframe will be used to create a crosstab
    """
    B = np.array(list('abracadabra'))
    A = np.arange(len(B))
    AB = list()
    for i in range(20):
        a = np.random.randint(1, 10)
        b = np.random.randint(1, 10)
        AB += [(a, b)]
    AB = np.unique(np.array(AB), axis=0)
    AB = np.unique(np.array(list(zip(A[AB[:, 0]], B[AB[:, 1]]))), axis=0)
    AB_df = pd.DataFrame({'ID': AB[:, 0], 'Label': AB[:, 1]})
    return AB_df

original_stacked_df = _create_df()

# Make crosstab
crosstab_df = pd.crosstab(original_stacked_df['ID'],
                          original_stacked_df['Label']).reindex()
What to expect?
You would expect a function that regenerates the stacked dataframe from the crosstab. I will provide my own solution in the answer section. If you can suggest something better, that would be great.
Other References:
Closest stackoverflow discussion: pandas stacking a dataframe
Misleading stackoverflow question-topic: change pandas crossstab dataframe into plain table format:
You can just do stack
df[df.astype(bool)].stack().reset_index().drop(0,1)
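Spelled out on a tiny hand-built crosstab (the frame below is illustrative only, not the one from the question), that one-liner is roughly:
import pandas as pd

# Two IDs, three labels, 0/1 membership flags.
crosstab_df = pd.DataFrame({'a': [0, 1], 'b': [1, 1], 'c': [0, 0]},
                           index=pd.Index([1, 2], name='ID'))
crosstab_df.columns.name = 'Label'

# Keep only the non-zero cells, stack them into (ID, Label) pairs,
# then drop the leftover count column (keyword form of drop(0, 1)).
stacked = (crosstab_df[crosstab_df.astype(bool)]
           .stack()
           .reset_index()
           .drop(columns=0))
print(stacked)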
The following produces the desired outcome.
def crosstab2stacked(crosstab):
    stacked = crosstab.stack(dropna=True).reset_index()
    stacked = stacked[stacked.replace(0, np.nan)[0].notnull()].drop(columns=[0])
    return stacked
# Make original dataframe
original_stacked_df = _create_df()
# Make crosstab dataframe
crosstab_df = pd.crosstab(original_stacked_df['ID'],
                          original_stacked_df['Label']).reindex()
# Reconstruct stacked dataframe
recon_stacked_df = crosstab2stacked(crosstab=crosstab_df)
Check if original == reconstructed:
np.alltrue(original_stacked_df == recon_stacked_df)
Output: True

Pandas: Imputing Missing Values to Data Frame

Suppose I have a data frame with some missing values, as below:
import pandas as pd
df = pd.DataFrame([[1,3,'NA',2], [0,1,1,3], [1,2,'NA',1]], columns=['W', 'X', 'Y', 'Z'])
print(df)
The variable Y is missing two values. Say I run some imputation model and come up with an estimate of what the two values should be:
to_impute = [2,1]
What is the best way of replacing the two NA's with those two values? I know of ways that are fairly roundabout, e.g. looping over to_impute and using df.iloc to add each value. But I'm hoping there is a concise and non-iterative way.
(This is something that is easy in R, and I'm hoping it can be easy in Pandas.)
In pandas, NA should be NaN. First you need to replace it, then we can use fillna:
import numpy as np

df.Y = df.Y.replace('NA', np.nan)
df.Y = df.Y.fillna(pd.Series([1, 2], index=df.index[df.Y.isnull()]))
df
Out[1375]:
   W  X    Y  Z
0  1  3  1.0  2
1  0  1  1.0  3
2  1  2  2.0  1
Let us treat your NA as str
df.loc[df.Y=='NA','Y']=[1,2]
df
Out[1380]:
   W  X  Y  Z
0  1  3  1  2
1  0  1  1  3
2  1  2  2  1
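For completeness, a small end-to-end sketch that combines both ideas and actually uses the to_impute list from the question (assignment follows the order in which the missing rows appear):
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 3, 'NA', 2], [0, 1, 1, 3], [1, 2, 'NA', 1]],
                  columns=['W', 'X', 'Y', 'Z'])
to_impute = [2, 1]

# Turn the 'NA' strings into real NaN, then assign the imputed values
# to the missing positions in one vectorised step.
df['Y'] = df['Y'].replace('NA', np.nan)
df.loc[df['Y'].isna(), 'Y'] = to_impute
print(df)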

Pandas: Create several rows from column that is a list

Let's say I have something like this:
df = pd.DataFrame({'key':[1,2,3], 'type':[[1,3],[1,2,3],[1,2]], 'value':[5,1,8]})
key       type  value
  1     [1, 3]      5
  2  [1, 2, 3]      1
  3        [1]      8
Where one of the columns contains a list of items.
I would like to create several rows for each row that contains multiple types.
Obtaining this:
key  type  value
  1     1      5
  1     3      5
  2     1      1
  2     2      1
  2     3      1
  3     1      8
I've been playing with apply with axis=1 but I can't find a way to return more than 1 row per row of the DataFrame.
Extracting all the different 'types' and then looping and concatenating seems ugly.
Any ideas?
Thanks!
import itertools
import pandas as pd
import numpy as np

def melt_series(s):
    # Repeat each index value once per list element, then flatten the lists
    # into a single long Series that keeps the original row alignment.
    lengths = s.str.len().values
    flat = [i for i in itertools.chain.from_iterable(s.values.tolist())]
    idx = np.repeat(s.index.values, lengths)
    return pd.Series(flat, idx, name=s.name)

melt_series(df.type).to_frame().join(df.drop(columns='type')).reindex(columns=df.columns)
setup
df = pd.DataFrame({'key': [1, 2, 3],
                   'type': [[1, 3], [1, 2, 3], [1, 2]],
                   'value': [5, 1, 8]})
df
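On pandas 0.25 and later, DataFrame.explode does this directly, so the helper above is no longer needed. A minimal sketch:
import pandas as pd

df = pd.DataFrame({'key': [1, 2, 3],
                   'type': [[1, 3], [1, 2, 3], [1, 2]],
                   'value': [5, 1, 8]})

# Each list element gets its own row; the other columns are repeated.
out = df.explode('type').reset_index(drop=True)
print(out)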