Two-level header in pandas? - pandas

I created a new dataframe from an old one and now I have something like this:
df = pd.DataFrame({0:[1,5,1,1,3]}, index=[243,254,507,1903,2358]).rename_axis('uid')
print (df)
0
uid
243 1
254 5
507 1
1903 1
2358 3
I don't really understand what it means. Is that a double header with the first header having just one index and the second having the other one? How can I transform this dataframe into having a single header, with names ['userID' , 'counts'] ?

Here is one column DataFrame with column 0 and index name uid.
So need:
df = df.reset_index()
df.columns = ['userID' , 'counts']
print (df)
userID counts
0 243 1
1 254 5
2 507 1
3 1903 1
4 2358 3
Another solution:
df = df.rename_axis('userID').squeeze().reset_index(name='counts')

Related

I want to remove specific rows and restart the values from 1

I have a dataframe that looks like this:
Time Value
1 5
2 3
3 3
4 2
5 1
I want to remove the first two rows and then restart time from 1. The dataframe should then look like:
Time Value
1 3
2 2
3 1
I attach the code:
file = pd.read_excel(r'C:......xlsx')
df = file0.loc[(file0['Time']>2) & (file0['Time']<11)]
df = df.reset_index()
Now what I get is:
index Time Value
0 3 3
1 4 2
2 5 1
Thank you!
You can use .loc[] accessor and reset_index() method:
df=df.loc[2:].reset_index(drop=True)
Finally use list comprehension:
df['Time']=[x for x in range(1,len(df)+1)]
Now If you print df you will get your desired output:
Time Value
0 1 3
1 2 2
2 3 1
You can use df.loc to extract the subset of dataframe, Reset the index and then change the value of Time column.
df = df.loc[2:].reset_index(drop=True)
df['Time'] = df.index + 1
print(df)
you have two ways to do that.
first :
df[2:].assign(time = df.time.values[:-2])
Which returns your desired output.
time
value
1
3
2
2
3
1
second :
df = df.set_index('time')
df['value'] = df['value'].shift(-2)
df.dropna()
this return your output too but turn the numbers to float64
time
value
1
3.0
2
2.0
3
1.0

Sort data in Pandas dataframe alphabetically

I have a dataframe where I need to sort the contents of one column (comma separated) alphabetically:
ID Data
1 Mo,Ab,ZZz
2 Ab,Ma,Bt
3 Xe,Aa
4 Xe,Re,Fi,Ab
Output:
ID Data
1 Ab,Mo,ZZz
2 Ab,Bt,Ma
3 Aa,Xe
4 Ab,Fi,Re,Xe
I have tried:
df.sort_values(by='Data')
But this does not work
You can split, sorting and then join back:
df['Data'] = df['Data'].apply(lambda x: ','.join(sorted(x.split(','))))
Or use list comprehension alternative:
df['Data'] = [','.join(sorted(x.split(','))) for x in df['Data']]
print (df)
ID Data
0 1 Ab,Mo,ZZz
1 2 Ab,Bt,Ma
2 3 Aa,Xe
3 4 Ab,Fi,Re,Xe
IIUC get_dummies
s=df.Data.str.get_dummies(',')
df['n']=s.dot(s.columns+',').str[:-1]
df
Out[216]:
ID Data n
0 1 Mo,Ab,ZZz Ab,Mo,ZZz
1 2 Ab,Ma,Bt Ab,Bt,Ma
2 3 Xe,Aa Aa,Xe
3 4 Xe,Re,Fi,Ab Ab,Fi,Re,Xe
IIUC you can use a list comprehension:
[','.join(sorted(i.split(','))) for i in df['Data']]
#['Ab,Mo,ZZz', 'Ab,Bt,Ma', 'Aa,Xe', 'Ab,Fi,Re,Xe']
using explode and sort_values
df["Sorted_Data"] = (
df["Data"].str.split(",").explode().sort_values().groupby(level=0).agg(','.join)
)
print(df)
ID Data Sorted_Data
0 1 Mo,Ab,ZZz Ab,Mo,ZZz
1 2 Ab,Ma,Bt Ab,Bt,Ma
2 3 Xe,Aa Aa,Xe
3 4 Xe,Re,Fi,Ab Ab,Fi,Re,Xe
Using row iteration:
for index, row in df.iterrows():
row['Data'] = ','.join(sorted(row['Data'].split(',')))
In [29]: df
Out[29]:
Data
0 Ab,Mo,ZZz
1 Ab,Bt,Ma
2 Aa,Xe
3 Ab,Fi,Re,Xe

Compare two data frames for different values in a column

I have two dataframe, please tell me how I can compare them by operator name, if it matches, then add the values ​​of quantity and time to the first data frame.
In [2]: df1 In [3]: df2
Out[2]: Out[3]:
Name count time Name count time
0 Bob 123 4:12:10 0 Rick 9 0:13:00
1 Alice 99 1:01:12 1 Jone 7 0:24:21
2 Sergei 78 0:18:01 2 Bob 10 0:15:13
85 rows x 3 columns 105 rows x 3 columns
I want to get:
In [5]: df1
Out[5]:
Name count time
0 Bob 133 4:27:23
1 Alice 99 1:01:12
2 Sergei 78 0:18:01
85 rows x 3 columns
Use set_index and add them together. Finally, update back.
df1 = df1.set_index('Name')
df1.update(df1 + df2.set_index('Name'))
df1 = df1.reset_index()
Out[759]:
Name count time
0 Bob 133.0 04:27:23
1 Alice 99.0 01:01:12
2 Sergei 78.0 00:18:01
Note: I assume time columns in both df1 and df2 are already in correct date/time format. If they are in string format, you need to convert them before running above commands as follows:
df1.time = pd.to_timedelta(df1.time)
df2.time = pd.to_timedelta(df2.time)

How to add a new row to pandas dataframe with non-unique multi-index

df = pd.DataFrame(np.arange(4*3).reshape(4,3), index=[['a','a','b','b'],[1,2,1,2]], columns=list('xyz'))
where df looks like:
Now I add a new row by:
df.loc['new',:]=[0,0,0]
Then df becomes:
Now I want to do the same but with a different df that has non-unique multi-index:
df = pd.DataFrame(np.arange(4*3).reshape(4,3), index=[['a','a','b','b'],[1,1,2,2]], columns=list('xyz'))
,which looks like:
and call
df.loc['new',:]=[0,0,0]
The result is "Exception: cannot handle a non-unique multi-index!"
How could I achieve the goal?
Use append or concat with helper DataFrame:
df1 = pd.DataFrame([[0,0,0]],
columns=df.columns,
index=pd.MultiIndex.from_arrays([['new'], ['']]))
df2 = df.append(df1)
df2 = pd.concat([df, df1])
print (df2)
x y z
a 1 0 1 2
1 3 4 5
b 2 6 7 8
2 9 10 11
new 0 0 0

How to expand one row to multiple rows according to its value in Pandas

This is a DataFrame I have for example. Please refer the image link.
Before:
Before
d = {1: ['2134',20, 1,1,1,0], 2: ['1010',5, 1,0,0,0], 3: ['3457',15, 0,1,1,0]}
columns=['Code', 'Price', 'Bacon','Onion','Tomato', 'Cheese']
df = pd.DataFrame.from_dict(data=d, orient='index').sort_index()
df.columns = columns
What I want to do is expanding a single row into multiple rows. Then the Dataframe should be look like the image of below link. The intention is using some columns(from 'Bacon' to 'Cheese') as categories.
After:
After
I tried to find the answer, but failed. Thanks.
You can first reshape with set_index and stack, then filter by query and get_dummies from column level_2 and last reindex columns for add missing with no 1 and reset_index:
df = df.set_index(['Code', 'Price']) \
.stack() \
.reset_index(level=2, name='val') \
.query('val == 1') \
.level_2.str.get_dummies() \
.reindex(columns=df.columns[2:], fill_value=0) \
.reset_index()
print (df)
Code Price Bacon Onion Tomato Cheese
0 2134 20 1 0 0 0
1 2134 20 0 1 0 0
2 2134 20 0 0 1 0
3 1010 5 1 0 0 0
4 3457 15 0 1 0 0
5 3457 15 0 0 1 0
You can use stack and transpose to do this operation and format accordingly.
df = df.stack().to_frame().T
df.columns = ['{}_{}'.format(*c) for c in df.columns]
Use pd.melt to put all the food in one column and then pd.get_dummies to expand the columns.
df1 = pd.melt(df, id_vars=['Code', 'Price'])
df1 = df1[df1['value'] == 1]
df1 = pd.get_dummies(df1, columns=['variable'], prefix='', prefix_sep='').sort_values(['Code', 'Price'])
df1.reindex(columns=df.columns, fill_value=0)
Edited after I saw how jezrael used reindex to both add and drop a column.