How to add dictionary keys and values to specific columns in an empty dataframe - pandas

I have a dict with 3 keys, and each value is a list of numpy arrays.
I'd like to append this dictionary to an empty dataframe so that the first value of each numpy array goes in column 'x', the second value of each array goes in column 'y', and the dict key goes in the final column 'z', like so:
my_dict = {0: [array([5.4, 3.9, 1.3, 0.4]), array([4.9, 3. , 1.4, 0.2]),array([4.6, 3.6, 1. , 0.2]), array([4.6, 3.2, 1.4, 0.2]), array([4.7, 3.2, 1.6, 0.2])],
1: [array([6.1, 2.9, 4.7, 1.4]), array([5.9, 3. , 4.2, 1.5]), array([7.4, 2.8, 6.1, 1.9])],
2: [array([7. , 3.2, 4.7, 1.4]), array([5.6, 2.7, 4.2, 1.3])]}
I'd like to get the below df:
x y z
0 5.4 3.9 0
1 4.9 3. 0
2 4.6 3.6 0
3 4.6 3.2 0
4 4.7 3.2 0
5 6.1 2.9 1
6 5.9 3. 1
7 7.4 2.8 1
8 7. 3.2 2
9 5.6 2.7 2
It's a bit tricky. How can I do it?

This will do it:
data = [j[:2].tolist() + [k] for k, v in my_dict.items() for j in v]
df = pd.DataFrame(data, columns=list('xyz'))
df
x y z
0 5.4 3.9 0
1 4.9 3.0 0
2 4.6 3.6 0
3 4.6 3.2 0
4 4.7 3.2 0
5 6.1 2.9 1
6 5.9 3.0 1
7 7.4 2.8 1
8 7.0 3.2 2
9 5.6 2.7 2

Try this:
target_df = pd.DataFrame(columns=['x', 'y', 'z'])  # empty dataframe
for k, v in my_dict.items():
    for val in v:
        # one-row frame: first two array values plus the dict key
        d = {'x': [val[0]], 'y': [val[1]], 'z': [k]}
        target_df = pd.concat([target_df, pd.DataFrame(d)], ignore_index=True)
print(target_df) then gives the desired dataframe:
x y z
0 5.4 3.9 0
1 4.9 3.0 0
2 4.6 3.6 0
3 4.6 3.2 0
4 4.7 3.2 0
5 6.1 2.9 1
6 5.9 3.0 1
7 7.4 2.8 1
8 7.0 3.2 2
9 5.6 2.7 2
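Calling pd.concat inside a loop copies the accumulated frame on every iteration, so for larger dictionaries it may be cheaper to build the columns with NumPy first and create the frame once. A minimal sketch, assuming every array has at least two elements; the names stacked and keys are illustrative and do not come from the answers above:

```python
import numpy as np
import pandas as pd
from numpy import array

my_dict = {
    0: [array([5.4, 3.9, 1.3, 0.4]), array([4.9, 3.0, 1.4, 0.2]),
        array([4.6, 3.6, 1.0, 0.2]), array([4.6, 3.2, 1.4, 0.2]),
        array([4.7, 3.2, 1.6, 0.2])],
    1: [array([6.1, 2.9, 4.7, 1.4]), array([5.9, 3.0, 4.2, 1.5]),
        array([7.4, 2.8, 6.1, 1.9])],
    2: [array([7.0, 3.2, 4.7, 1.4]), array([5.6, 2.7, 4.2, 1.3])],
}

# Stack all arrays into one 2-D array, one row per original array.
stacked = np.vstack([arr for arrays in my_dict.values() for arr in arrays])

# Repeat each key once per array stored under it, in the same order.
keys = np.repeat(list(my_dict.keys()), [len(v) for v in my_dict.values()])

# Columns 0 and 1 of the stacked array become x and y; the keys become z.
df = pd.DataFrame({"x": stacked[:, 0], "y": stacked[:, 1], "z": keys})
print(df)
```

This relies on the dictionary keeping its insertion order, which holds for regular dicts in Python 3.7 and later.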

Related

How to shift all the values from a certain point of the dataframe to the right?

Example:
I have this dataset
A B C D E
0 0.1 0.2 0.3 0.4 0.5
1 1.1 1.2 1.3 1.4 1.5
2 2.1 2.2 2.4 2.5 2.6
3 3.1 3.2 3.4 3.5 3.6
4 4.1 4.2 4.4 4.5 4.6
5 5.1 5.2 5.3 5.4 5.5
What I would like to have is:
A B C D E
0 0.1 0.2 0.3 0.4 0.5
1 1.1 1.2 1.3 1.4 1.5
2 2.1 2.2 2.4 2.5 2.6
3 3.1 3.2 3.4 3.5 3.6
4 4.1 4.2 4.4 4.5 4.6
5 5.1 5.2 5.3 5.4 5.5
So I need to shift only certain rows and only certain columns to the right.
Not all the lines and columns have to be affected by that shift. I hope it's clear, thank you.
Pandas handles this well. Use .loc to select the rows and columns, then .shift(1, axis=1) to move the selected values one column to the right.
import pandas as pd
df.loc[2:4, ['C','D']] = df.loc[2:4, ['C','D']].shift(1, axis=1)
If you share your dataframe code to define df, I can fully test the loc/shift solution.
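For reference, here is a self-contained sketch of the .loc/.shift idea, run on a frame rebuilt by hand from the values printed in the question (the variable name shifted is illustrative). Note one assumption about behaviour that may or may not match what the asker wants: shifting only columns C and D leaves NaN in C and drops the old D values, because nothing outside the selection moves.

```python
import pandas as pd

# Frame rebuilt from the values shown in the question.
df = pd.DataFrame(
    {
        "A": [0.1, 1.1, 2.1, 3.1, 4.1, 5.1],
        "B": [0.2, 1.2, 2.2, 3.2, 4.2, 5.2],
        "C": [0.3, 1.3, 2.4, 3.4, 4.4, 5.3],
        "D": [0.4, 1.4, 2.5, 3.5, 4.5, 5.4],
        "E": [0.5, 1.5, 2.6, 3.6, 4.6, 5.5],
    }
)

shifted = df.copy()
# .loc slices are label-based and inclusive, so 2:4 covers rows 2, 3 and 4.
# Within the selection, C moves into D and C is left as NaN.
shifted.loc[2:4, ["C", "D"]] = shifted.loc[2:4, ["C", "D"]].shift(1, axis=1)
print(shifted)
```

To carry the old D values into E as well, include 'E' in the column list on both sides of the assignment.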

Is it possible to do pandas groupby transform rolling mean?

Is it possible for pandas to do something like:
df.groupby("A").transform(pd.rolling_mean,10)
You can do this without the transform or apply:
df = pd.DataFrame({'grp':['A']*5+['B']*5,'data':[1,2,3,4,5,2,4,6,8,10]})
df.groupby('grp')['data'].rolling(2, min_periods=1).mean()
Output:
grp
A    0    1.0
     1    1.5
     2    2.5
     3    3.5
     4    4.5
B    5    2.0
     6    3.0
     7    5.0
     8    7.0
     9    9.0
Name: data, dtype: float64
Update per comment:
df = pd.DataFrame({'grp':['A']*5+['B']*5,'data':[1,2,3,4,5,2,4,6,8,10]},
index=[*'ABCDEFGHIJ'])
df['avg_2'] = df.groupby('grp')['data'].rolling(2, min_periods=1).mean()\
.reset_index(level=0, drop=True)
Output:
grp data avg_2
A A 1 1.0
B A 2 1.5
C A 3 2.5
D A 4 3.5
E A 5 4.5
F B 2 2.0
G B 4 3.0
H B 6 5.0
I B 8 7.0
J B 10 9.0
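If you do want to go through transform, as the question asks, one option is to pass a function that applies the rolling mean to each group. This is a sketch of that variant, not part of the answer above. transform returns a result aligned to the original index, so no reset_index step is needed:

```python
import pandas as pd

df = pd.DataFrame(
    {"grp": ["A"] * 5 + ["B"] * 5, "data": [1, 2, 3, 4, 5, 2, 4, 6, 8, 10]},
    index=list("ABCDEFGHIJ"),
)

# The lambda receives each group's 'data' Series and returns a Series of
# the same length, which transform places back on the original index.
df["avg_2"] = df.groupby("grp")["data"].transform(
    lambda s: s.rolling(2, min_periods=1).mean()
)
print(df)
```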

pandas increment cell value by 0.1

I have a pandas data frame that looks similar to below:
A
0 1
1 NaN
2 2
3 NaN
4 NaN
5 3
6 4
8 NaN
9 5
10 NaN
What I want it is:
A
0 1
1 1.1
2 2
3 2.1
4 2.2
5 3
6 4
8 4.1
9 5
10 5.1
I want to fill the missing values incrementally, in steps of 0.1 from the last valid value. I have been playing with np.arange but I cannot work out how to piece everything together. I feel I am on the right path but would appreciate some help. Thank you.
In []: import pandas as pd
In []: import numpy as np
In []: np.arange(1, 2, 0.1)
Out[]: array([1. , 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9])
In []: def up(x):
  ...:     return x.astype(str) + '.' + np.arange(len(x)).astype(str)
In []: data = pd.DataFrame([[1,0],[0,1],[1,0],[0,1]], columns=["A", "B"])
In []: out = data.apply(up).values
array([['1.0', '0.0'],
['0.1', '1.1'],
['1.2', '0.2'],
['0.3', '1.3']], dtype=object)
In []: df = pd.DataFrame(out)
A B
0 1.0 0.0
1 0.1 1.1
2 1.2 0.2
3 0.3 1.3
This one is a bit tricky:
s = df.A.isnull().astype(int).diff().ne(0).cumsum()[df.A.isnull()]  # group id for each run of consecutive NaN values
df.A.fillna(df.A.ffill() + s.groupby(s).cumcount().add(1).mul(0.1))  # position within each run, times 0.1, added to the forward-filled value
Out[1764]:
0 1.0
1 1.1
2 2.0
3 2.1
4 2.2
5 3.0
6 4.0
8 4.1
9 5.0
10 5.1
Name: A, dtype: float64
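The same idea can be written in smaller steps, which may be easier to follow. This is a sketch under the assumption that the first value in the column is not NaN; the names is_nan, block and step are illustrative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"A": [1, np.nan, 2, np.nan, np.nan, 3, 4, np.nan, 5, np.nan]},
    index=[0, 1, 2, 3, 4, 5, 6, 8, 9, 10],
)

is_nan = df["A"].isna()

# Every valid value starts a new block; the NaNs after it share its id.
block = (~is_nan).cumsum()

# Running count of NaNs inside each block: 0 for the valid value itself,
# then 1, 2, ... for the NaNs that follow it.
step = is_nan.astype(int).groupby(block).cumsum()

# Forward-fill the last valid value and add 0.1 per step.
filled = df["A"].ffill() + 0.1 * step
print(filled)
```

Because the result comes from floating-point addition, compare it with a tolerance rather than with exact equality.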

Unifying columns in the same Pandas dataframe to one column

Hi I would like to unify columns in the same dataframe to one column such as:
col1 col2
1 1.4 1.5
2 2.3 2.6
3 3.6 6.7
to
col1&2
1 1.4
1 1.5
2 2.3
2 2.6
3 3.6
3 6.7
Thanks for your help
Use stack, drop the inner index level (the old column names) with reset_index, and finally build a one-column DataFrame with to_frame:
df = df.stack().reset_index(level=1, drop=True).to_frame('col1&2')
print (df)
col1&2
1 1.4
1 1.5
2 2.3
2 2.6
3 3.6
3 6.7
Or:
df = pd.DataFrame({'col1&2': df.values.reshape(1,-1).ravel()}, index=np.repeat(df.index, 2))
print (df)
col1&2
1 1.4
1 1.5
2 2.3
2 2.6
3 3.6
3 6.7
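Both approaches can be checked side by side on the sample data. The sketch below rebuilds the frame by hand and, as a small generalisation that is not in the answer above, uses df.shape[1] instead of the hard-coded 2 so the reshape variant works for any number of columns:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"col1": [1.4, 2.3, 3.6], "col2": [1.5, 2.6, 6.7]},
    index=[1, 2, 3],
)

# Variant 1: stack the columns row by row, then drop the column-name level.
stacked = df.stack().reset_index(level=1, drop=True).to_frame("col1&2")

# Variant 2: flatten the values row by row and repeat each index label
# once per original column.
reshaped = pd.DataFrame(
    {"col1&2": df.to_numpy().ravel()},
    index=np.repeat(df.index, df.shape[1]),
)

print(stacked)
print(reshaped)
```

One difference worth knowing: stack has historically dropped NaN values by default, while the reshape variant keeps them, so the two may differ on frames with missing data.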

pandas multiindex selection with ranges

I have a pandas DataFrame like
y m A B
1990 1 3.4 5
2 4 4.9
...
1990 12 4.0 4.5
...
2000 1 2.3 8.1
2 3.7 5.0
...
2000 12 2.4 9.1
I would like to select months 2-12 from the second index level (m) and years 1991-2000 from the first. I don't seem to get the MultiIndex slicing correct. E.g. I tried
idx = pd.IndexSlice
dfa = df.loc[idx[1:,1:],:]
but that does not seem to slice on the first index. Any suggestions on an elegant solution?
Cheers, Mike
Without sample code to reproduce your df it is difficult to guess, but if your df is similar to:
import pandas as pd
from io import StringIO
df = pd.read_csv(StringIO("""y m A B
1990 1 3.4 5
1990 2 4 4.9
1990 12 4.0 4.5
2000 1 2.3 8.1
2000 2 3.7 5.0
2000 12 2.4 9.1"""), sep=r'\s+')
df
y m A B
0 1990 1 3.4 5.0
1 1990 2 4.0 4.9
2 1990 12 4.0 4.5
3 2000 1 2.3 8.1
4 2000 2 3.7 5.0
5 2000 12 2.4 9.1
Then this code will extract what you need. Note that range excludes its upper bound, so months 2-12 are range(2,13) and years 1991-2000 are range(1991,2001):
print(df.loc[df['y'].isin(range(1991,2001)) & df['m'].isin(range(2,13))])
      y   m    A    B
4  2000   2  3.7  5.0
5  2000  12  2.4  9.1
If however your df is indexed by y and m, then this will select the same rows:
df.set_index(['y','m'],inplace=True)
years = df.index.get_level_values(0).isin(range(1991,2001))
months = df.index.get_level_values(1).isin(range(2,13))
df.loc[years & months]
           A    B
y    m
2000 2   3.7  5.0
     12  2.4  9.1
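The pd.IndexSlice approach from the question can also work if the slices use index labels rather than positions. A hedged sketch on the same six-row sample, assuming the frame is indexed by (y, m) and sorted; label slices are inclusive on both ends, and the end labels do not have to be present in the index:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "y": [1990, 1990, 1990, 2000, 2000, 2000],
        "m": [1, 2, 12, 1, 2, 12],
        "A": [3.4, 4.0, 4.0, 2.3, 3.7, 2.4],
        "B": [5.0, 4.9, 4.5, 8.1, 5.0, 9.1],
    }
)

# Slicing several MultiIndex levels at once requires a sorted index.
df = df.set_index(["y", "m"]).sort_index()

idx = pd.IndexSlice
# Years 1991 through 2000 on level y, months 2 through 12 on level m.
dfa = df.loc[idx[1991:2000, 2:12], :]
print(dfa)
```

In the original attempt, idx[1:,1:] selects labels greater than or equal to 1 on both levels, which matches every row here, so it looks as if nothing was sliced.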