Keys as Rows (Pandas Dataframe from dictionary) - pandas

I have this dictionary:
d = {'a': (1,2,3), 'b': (4,5,6)}
I would like to turn it into a dataframe where each key appears as a row alongside each of its values, like the table below:
Keys Values
a 1
a 2
a 3
b 4
b 5
b 6
Any ideas?

Here is my suggestion.
Create your dataframe with the following command (note that it uses your dictionary d, not the built-in dict):
df = pd.DataFrame({'Keys': list(d.keys()), 'Values': list(d.values())})
Then explode the dataframe on the 'Values' column:
df = df.explode(column='Values').reset_index(drop=True)
The result looks like this:
Keys Values
0 a 1
1 a 2
2 a 3
3 b 4
4 b 5
5 b 6
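For completeness, here is the same explode approach as one self-contained script. This is only a sketch of the steps above; the final astype(int) is my own addition, because explode leaves the 'Values' column with object dtype.

```python
import pandas as pd

d = {'a': (1, 2, 3), 'b': (4, 5, 6)}

# One row per key, with the whole tuple stored in a single cell.
df = pd.DataFrame({'Keys': list(d.keys()), 'Values': list(d.values())})

# explode() gives every element of the tuple its own row;
# reset_index(drop=True) replaces the repeated index with 0..n-1.
df = df.explode('Values').reset_index(drop=True)

# Assumption: the values are all integers, so the object column
# produced by explode() can be converted back to int.
df['Values'] = df['Values'].astype(int)

print(df)
```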

d = {'a': (1,2,3), 'b': (4,5,6)}
df = pd.DataFrame(d).unstack().droplevel(1).reset_index().rename({'index':'Keys', 0:'Values'}, axis=1)
Output:
>>> df
Keys Values
0 a 1
1 a 2
2 a 3
3 b 4
4 b 5
5 b 6
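If the tuples might not all have the same length, pd.DataFrame(d) raises an error, so the unstack route would not apply. A possible alternative, offered as a sketch rather than a replacement for the answers above, is to flatten the dictionary into (key, value) pairs first:

```python
import pandas as pd

d = {'a': (1, 2, 3), 'b': (4, 5, 6)}

# Flatten the dictionary into one (key, value) pair per element.
rows = [(key, value) for key, values in d.items() for value in values]

df = pd.DataFrame(rows, columns=['Keys', 'Values'])
print(df)
```

This keeps the 'Values' column as integers and does not depend on the tuples being equally long.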

Related

creating a new dataframe from 3 other dataframes but columns must have specific order without specifying the name of the columns

Assume we have 3 dataframes named df1, df2, df3. Each of these dataframes has 100 rows and 15 columns. I want to create a new dataframe that has the first column of df1, then the first column of df2, then the first column of df3. Then it should have the second column of df1, the second column of df2, the second column of df3, and so on until all 15 columns of each of the three dataframes are included. For example
df1
A B C ... O
1 1 1 1
1 1 1 1
... ... ... ...
df2
A B C ... O
2 2 2 2
2 2 2 2
... ... ... ...
df3
A B C ... O
3 3 3 3
3 3 3 3
... ... ... ...
The expected output should be something like the following
dfnew
A_df1 A_df2 A_df3 B_df1 B_df2 B_df3 ... O_df1 O_df2 O_df3
1 2 3 1 2 3 1 2 3
1 2 3 1 2 3 1 2 3
... ... ... ...
My issue is that I cannot use the names of the columns to specify them. For example I know how to do it like this
# create a list of the dataframes
dfs = [df1, df2, df3]
# concatenate the dataframes along the columns axis (axis=1)
dfnew = pd.concat(dfs, axis=1)
# specify the column names for the new dataframe
column_names = ["column1", "column2", ..., "column15"]
# reorder by selecting the columns by name, since pd.concat itself
# does not accept a `columns` argument
dfnew = dfnew[column_names]
but I cannot use the column names because they will change every time. Plus, it seems like there could be a faster way than hard-coding them, perhaps using .loc.
Example
data1 = {'A': {0: 1, 1: 1}, 'B': {0: 1, 1: 1}, 'C': {0: 1, 1: 1}}
df1 = pd.DataFrame(data1)
df2 = df1.replace(1, 2).copy()
df3 = df1.replace(1, 3).copy()
df1
A B C
0 1 1 1
1 1 1 1
df2
A B C
0 2 2 2
1 2 2 2
df3
A B C
0 3 3 3
1 3 3 3
Code
dfs = (pd.concat([df1, df2, df3], axis=1, keys=['df1', 'df2', 'df3'])
.sort_index(level=1, axis=1).swaplevel(0, 1, axis=1))
dfs
A B C
df1 df2 df3 df1 df2 df3 df1 df2 df3
0 1 2 3 1 2 3 1 2 3
1 1 2 3 1 2 3 1 2 3
dfs.set_axis(dfs.columns.map('_'.join), axis=1)
A_df1 A_df2 A_df3 B_df1 B_df2 B_df3 C_df1 C_df2 C_df3
0 1 2 3 1 2 3 1 2 3
1 1 2 3 1 2 3 1 2 3
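Note that sort_index orders the columns by label, so the result follows alphabetical order rather than the original column order. If the original order matters, a purely positional variant might look like the sketch below. It assumes every frame has the same number of columns in the same order; the names frames, order and wide, and the _dfN suffix scheme, are only illustrative.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'A': [1, 1], 'B': [1, 1], 'C': [1, 1]})
df2 = df1.replace(1, 2)
df3 = df1.replace(1, 3)
frames = [df1, df2, df3]

n_frames = len(frames)
n_cols = frames[0].shape[1]

# After a plain concat the column positions are
# [df1 columns..., df2 columns..., df3 columns...].
# Laying the positions out as an (n_frames, n_cols) grid and reading it
# column by column interleaves them: 0, 3, 6, 1, 4, 7, 2, 5, 8.
order = np.arange(n_frames * n_cols).reshape(n_frames, n_cols).T.ravel()

wide = pd.concat(frames, axis=1).iloc[:, order]

# Optional: tag each label with the position of the frame it came from.
wide.columns = [
    f"{col}_df{pos // n_cols + 1}" for pos, col in zip(order, wide.columns)
]
print(wide)
```

No column name is ever typed out, so this keeps working when the names change.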

pandas grouping based on conditions on other columns

import pandas as pd
df = pd.DataFrame(columns=['A','B'])
df['A']=['A','B','A','A','B','B','B']
df['B']=[2,4,3,5,6,7,8]
df
A B
0 A 2
1 B 4
2 A 3
3 A 5
4 B 6
5 B 7
6 B 8
df.columns=['id','num']
df
id num
0 A 2
1 B 4
2 A 3
3 A 5
4 B 6
5 B 7
6 B 8
I would like to group by the id column, but with a condition on the num column.
The final dataframe should have two columns, is_even_count and is_odd_count, where is_even_count counts only the even numbers in num for each group and is_odd_count counts only the odd numbers.
My expected output is
is_even_count is_odd_count
A 1 2
B 3 1
How can I do this in pandas?
Take num modulo 2, compare the result with 1, and use map to turn the resulting booleans into the column names:
d = {True:'is_odd_count', False:'is_even_count'}
df = df.groupby(['id', (df['num'] % 2 == 1).map(d)]).size().unstack(fill_value=0)
print (df)
num is_even_count is_odd_count
id
A 1 2
B 3 1
Another solution with crosstab (starting again from the original df):
df = pd.crosstab(df['id'], (df['num'] % 2 == 1).map(d))
Alternative with numpy.where (this needs import numpy as np):
a = np.where(df['num'] % 2 == 1, 'is_odd_count', 'is_even_count')
df = pd.crosstab(df['id'], a)
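One more variant, as a sketch: build two boolean helper columns and sum them per group, since summing booleans counts the True values. Unlike the unstack and crosstab versions, both columns are always present, even if the data happens to contain no odd (or no even) numbers. The variable name counts is just illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'id': ['A', 'B', 'A', 'A', 'B', 'B', 'B'],
    'num': [2, 4, 3, 5, 6, 7, 8],
})

counts = (
    # Boolean flags: True where num is even / odd.
    df.assign(is_even_count=df['num'] % 2 == 0,
              is_odd_count=df['num'] % 2 == 1)
      .groupby('id')[['is_even_count', 'is_odd_count']]
      # Summing booleans counts the True values in each group.
      .sum()
)
print(counts)
```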

Re-index to insert missing rows in a multi-indexed dataframe

I have a MultiIndexed DataFrame with three levels of indices. I would like to expand my third level to contain all values in a given range, but only for the existing values in the two upper levels.
For example, assume the first level is name, the second level is date and the third level is hour. I would like to have rows for all 24 possible hours (even if some are currently missing), but only for the already existing names and dates. The values in new rows can be filled with zeros.
So a simple example input would be:
>>> import pandas as pd
>>> df = pd.DataFrame([[1,1,1,3],[2,2,1,4], [3,3,2,5]], columns=['A', 'B', 'C','val'])
>>> df.set_index(['A', 'B', 'C'], inplace=True)
>>> df
val
A B C
1 1 1 3
2 2 1 4
3 3 2 5
If the required values for C are [1,2,3], the desired output would be:
val
A B C
1 1 1 3
2 0
3 0
2 2 1 4
2 0
3 0
3 3 1 0
2 5
3 0
I know how to achieve this using groupby and applying a defined function for each group, but I was wondering if there was a cleaner way of doing this with reindex (I couldn't make this one work for a MultiIndex case, but perhaps I'm missing something)
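You can build the full index explicitly, create a zero-filled frame on it, and then copy the existing values in with update: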
partial_indices = [ i[0:2] for i in df.index.values ]
C_reqd = [1, 2, 3]
final_indices = [j+(i,) for j in partial_indices for i in C_reqd]
index = pd.MultiIndex.from_tuples(final_indices, names=['A', 'B', 'C'])
df2 = pd.DataFrame(pd.Series(0, index), columns=['val'])
df2.update(df)
Output
df2
val
A B C
1 1 1 3.0
2 0.0
3 0.0
2 2 1 4.0
2 0.0
3 0.0
3 3 1 0.0
2 5.0
3 0.0
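Since the question asked about reindex specifically: the same full index can also be passed straight to reindex with fill_value=0, which keeps val as integers instead of the floats that update produces. This is a sketch under the assumption that the existing index has no duplicate entries; pairs, full_index and result are illustrative names.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 1, 1, 3], [2, 2, 1, 4], [3, 3, 2, 5]],
    columns=['A', 'B', 'C', 'val'],
).set_index(['A', 'B', 'C'])

c_required = [1, 2, 3]

# Existing (A, B) combinations, de-duplicated in order of appearance.
pairs = df.index.droplevel('C').unique()

# Every existing (A, B) pair combined with every required C value.
full_index = pd.MultiIndex.from_tuples(
    [(a, b, c) for a, b in pairs for c in c_required],
    names=df.index.names,
)

# Rows missing from df are created and filled with 0.
result = df.reindex(full_index, fill_value=0)
print(result)
```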

Pandas: Delete duplicated items in a specific column

I have a pandas dataframe (here represented using Excel):
Now I would like to delete all duplicates (1) in a specific column (B).
How can I do it ?
For this example, the result would look like that:
You can use duplicated for boolean mask and then set NaNs by loc, mask or numpy.where:
df.loc[df['B'].duplicated(), 'B'] = np.nan
df['B'] = df['B'].mask(df['B'].duplicated())
df['B'] = np.where(df['B'].duplicated(), np.nan,df['B'])
Alternatively, if you need to remove the whole rows that are duplicated in column B:
df = df.drop_duplicates(subset=['B'])
Sample:
df = pd.DataFrame({
'B': [1,2,1,3],
'A':[1,5,7,9]
})
print (df)
A B
0 1 1
1 5 2
2 7 1
3 9 3
df.loc[df['B'].duplicated(), 'B'] = np.nan
print (df)
A B
0 1 1.0
1 5 2.0
2 7 NaN
3 9 3.0
df = df.drop_duplicates(subset=['B'])
print (df)
A B
0 1 1
1 5 2
3 9 3
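Both duplicated and drop_duplicates accept a keep parameter that controls which occurrence counts as the original. The sketch below shows the three options with mask; the variable names are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 5, 7, 9], 'B': [1, 2, 1, 3]})

# Default keep='first': every occurrence after the first becomes NaN.
first_kept = df['B'].mask(df['B'].duplicated())

# keep='last': every occurrence before the last becomes NaN.
last_kept = df['B'].mask(df['B'].duplicated(keep='last'))

# keep=False: all occurrences of a repeated value become NaN.
none_kept = df['B'].mask(df['B'].duplicated(keep=False))

print(pd.DataFrame({'first': first_kept, 'last': last_kept, 'none': none_kept}))
```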

issue merging two dataframes with pandas by summing element by element [duplicate]

I'm trying to merge two DataFrames, summing the values of a column.
>>> print(df1)
id name weight
0 1 A 0
1 2 B 10
2 3 C 10
>>> print(df2)
id name weight
0 2 B 15
1 3 C 10
I need to sum the weight values for rows that match on the common columns while merging.
merge = pd.merge(df1, df2, how='inner')
The output should look something like the following.
id name weight
1 2 B 25
2 3 C 20
This solution works also if you want to sum more than one column. Assume data frames
>>> df1
id name weight height
0 1 A 0 5
1 2 B 10 10
2 3 C 10 15
>>> df2
id name weight height
0 2 B 25 20
1 3 C 20 30
You can concatenate them and group by index columns.
>>> pd.concat([df1, df2]).groupby(['id', 'name']).sum().reset_index()
id name weight height
0 1 A 0 5
1 2 B 35 30
2 3 C 30 45
In [41]: pd.merge(df1, df2, on=['id', 'name']).set_index(['id', 'name']).sum(axis=1)
Out[41]:
id name
2 B 25
3 C 20
dtype: int64
If you set the common columns as the index, you can just sum the two dataframes, much simpler than merging:
In [30]: df1 = df1.set_index(['id', 'name'])
In [31]: df2 = df2.set_index(['id', 'name'])
In [32]: df1 + df2
Out[32]:
weight
id name
1 A NaN
2 B 25
3 C 20
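The NaN for (1, A) appears because that row exists only in df1. Depending on what you want, you could either drop such rows (which matches the inner-join output in the question) or treat the missing side as zero with add and fill_value. A sketch using the question's data, with left, right, common and total as illustrative names:

```python
import pandas as pd

df1 = pd.DataFrame({'id': [1, 2, 3], 'name': ['A', 'B', 'C'], 'weight': [0, 10, 10]})
df2 = pd.DataFrame({'id': [2, 3], 'name': ['B', 'C'], 'weight': [15, 10]})

left = df1.set_index(['id', 'name'])
right = df2.set_index(['id', 'name'])

# Keep only the rows present in both frames (inner-join behaviour).
common = (left + right).dropna()

# Keep every row and treat a missing counterpart as 0.
total = left.add(right, fill_value=0)

print(common)
print(total)
```

In both cases the weight column comes back as float, because the alignment step introduces NaN before it is dropped or filled.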