Given this input:
df = pd.DataFrame({'C1': [6, np.nan, 16, np.nan], 'C2': [17, np.nan, 1, np.nan],
                   'D1': [8, np.nan, np.nan, 6], 'D2': [15, np.nan, np.nan, 12]}, index=[1,1,2,2])
I'd like to combine columns beginning with the same letter (the Cs and Ds), as well as rows with the same index (1 and 2), and extract the non-null values into the simplest representation without duplicates, which I think is something like:
{1: {'C': [6.0, 17.0], 'D': [8.0, 15.0]}, 2: {'C': [16.0, 1.0], 'D': [6.0, 12.0]}}
Using stack or groupby gets me part of the way there, but I feel like there is a more efficient way to do it.
You can rename the columns to their first letter with a lambda function, reshape with DataFrame.stack, aggregate each (index, letter) group into a list, and then build the nested dictionary with a dict comprehension:
s = df.rename(columns=lambda x: x[0]).stack().groupby(level=[0,1]).agg(list)
d = {level: s.xs(level).to_dict() for level in s.index.levels[0]}
print (d)
{1: {'C': [6.0, 17.0], 'D': [8.0, 15.0]}, 2: {'C': [16.0, 1.0], 'D': [6.0, 12.0]}}
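For readers who want to see the grouping spelled out, here is a minimal plain-loop sketch of the same idea. It is not the method above, just an explicit equivalent under the assumption that the first character of each column name is the group key; the names result and inner are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'C1': [6, np.nan, 16, np.nan], 'C2': [17, np.nan, 1, np.nan],
                   'D1': [8, np.nan, np.nan, 6], 'D2': [15, np.nan, np.nan, 12]},
                  index=[1, 1, 2, 2])

result = {}
# One group of rows per distinct index label (1 and 2).
for idx, rows in df.groupby(level=0):
    inner = {}
    for col in rows.columns:
        # Collect the non-null values of this column under its first letter.
        inner.setdefault(col[0], []).extend(rows[col].dropna().tolist())
    result[int(idx)] = inner

print(result)
```

This walks the columns in order within each index group, so values from C1 come before values from C2, and so on.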
I have two data frames: one (df) with a single column holding lists of integers, and another (df2) with two columns, o1 and o2. I am trying to treat the elements in each list of df as row indices into df2 and collect the matching o1 and o2 values as two new columns. Any input would be appreciated. Please note that the string 'A1' appears twice in column o1 of df2, and, as you can see in my desired output dataframe, the duplicates are removed in column o1.
import numpy as np
import pandas as pd
# the rows have different lengths, so recent NumPy versions need dtype=object here
array_1 = np.array([[0, 2, 3], [3, 4, 6], [1,2,3,6]], dtype=object)
#dataframe 1
df = pd.DataFrame({ 'A': array_1})
#dataframe 2
df2 = pd.DataFrame({ 'o1': ['A1', 'B1', 'A1', 'C1', 'D1', 'E1', 'F1'], 'o2': [15, 17, 18, 19, 20, 7, 8]})
#desired output
df_output = pd.DataFrame({ 'A': array_1, 'o1': [['A1', 'C1'], ['C1', 'D1', 'F1'], ['B1','A1','C1','F1']],
'o2': [[15, 18, 19], [19, 20, 8], [17,18,19,8]] })
# note: row 0 of df contains 0 and 2, which both map to 'A1' in df2; the output shows only one 'A1' because the duplicate is removed.
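I believe you can explode df['A'], use the exploded values to look up rows in df2, aggregate per original row, and finally join the result back to df. Deduplicating with set() does not preserve order, so the order within each list may differ from the desired output: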
s = df['A'].explode()
df_output= df.join(df2.loc[s].groupby(s.index).agg(lambda x: list(set(x))))
Output:
A o1 o2
0 [0, 2, 3] [C1, A1] [18, 19, 15]
1 [3, 4, 6] [F1, D1, C1] [8, 19, 20]
2 [1, 2, 3, 6] [F1, B1, C1, A1] [8, 17, 18, 19]
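If the order of first appearance matters, one option is to deduplicate with dict.fromkeys instead of set. The following is a sketch of that variation, not part of the answer above; the names lookup and keys are illustrative, and it assumes every list element is a valid row label of df2.

```python
import pandas as pd

df = pd.DataFrame({'A': [[0, 2, 3], [3, 4, 6], [1, 2, 3, 6]]})
df2 = pd.DataFrame({'o1': ['A1', 'B1', 'A1', 'C1', 'D1', 'E1', 'F1'],
                    'o2': [15, 17, 18, 19, 20, 7, 8]})

# One row per list element; the index still points at the original row of df.
s = df['A'].explode().astype(int)

# Look up the matching rows of df2, in the order the elements appear.
lookup = df2.loc[s.to_numpy()]

# Group the looked-up rows by the original row number of df.
keys = s.index.to_numpy()

# dict.fromkeys drops duplicates while keeping first-seen order.
agg = lookup.groupby(keys).agg(lambda x: list(dict.fromkeys(x)))

df_output = df.join(agg)
print(df_output)
```

With this change the lists come out in the same order as the indices in column A, which matches the desired output in the question.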
I have the pandas DataFrame below, which contains two columns. The first column holds the original values, including missing (NaN) values; the second column holds the imputed values that fill the NaN positions of the first column. How can I plot both columns in the same graph so that it shows the original values together with the filled values?
Data=pd.DataFrame([[3.83092724, np.nan],
[ np.nan, 3.94103207],
[ np.nan, 3.86621724],
[3.48386179, np.nan],
[ np.nan, 3.7430167 ],
[3.2382959 , np.nan],
[3.9143139 , np.nan],
[4.46676265, np.nan],
[ np.nan, 3.9340262 ],
[3.650658 , np.nan],
[ np.nan, 3.10590516],
[4.19497691, np.nan],
[4.11873876, np.nan],
[4.15286075, np.nan],
[4.67441617, np.nan],
[4.50631534, np.nan],
[ np.nan, 4.01349688],
[ np.nan, 3.48459778],
[ np.nan, 3.83495488],
[ np.nan, 3.10590516],
[ np.nan, 4.09355884],
[4.8433281 , np.nan],
[ np.nan, 3.33450675],
[4.86672126, np.nan],
[ np.nan, 3.2382959 ],
[ np.nan, 3.48210011],
[ np.nan, 3.00958811],
[ np.nan, 3.05774663]], columns=['original', 'filled'])
You need markers; otherwise the chart makes no sense when an individual original value is surrounded by missing values, because a single point cannot be drawn as a line.
We first plot the original values. Then, in the filled column, every missing value directly adjacent to an existing filled value is replaced with the original value at that position, so that the dashed line connects each run of filled values to the neighbouring original values. Finally we plot this amended filled column as a dashed line.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df=pd.DataFrame([[3.83092724, np.nan],
[ np.nan, 3.94103207],
[ np.nan, 3.86621724],
[3.48386179, np.nan],
[ np.nan, 3.7430167 ],
[3.2382959 , np.nan],
[3.9143139 , np.nan],
[4.46676265, np.nan],
[ np.nan, 3.9340262 ],
[3.650658 , np.nan],
[ np.nan, 3.10590516],
[4.19497691, np.nan],
[4.11873876, np.nan],
[4.15286075, np.nan],
[4.67441617, np.nan],
[4.50631534, np.nan],
[ np.nan, 4.01349688],
[ np.nan, 3.48459778],
[ np.nan, 3.83495488],
[ np.nan, 3.10590516],
[ np.nan, 4.09355884],
[4.8433281 , np.nan],
[ np.nan, 3.33450675],
[4.86672126, np.nan],
[ np.nan, 3.2382959 ],
[ np.nan, 3.48210011],
[ np.nan, 3.00958811],
[ np.nan, 3.05774663]], columns=['original', 'filled'])
_,ax = plt.subplots()
df.original.plot(marker='o', ax=ax)
m = (df.filled.isna()&df.filled.shift(1).notna()) | (df.filled.isna()&df.filled.shift(-1).notna())
df.filled.fillna(df.loc[m,'original']).plot(ls='--', ax=ax, color=ax.get_lines()[0].get_color())
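To see what the mask m does without plotting anything, here is a small sketch on a made-up five-row frame. The toy values and the names toy and amended are illustrative and not taken from the question.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({'original': [1.0, np.nan, np.nan, 4.0, 5.0],
                    'filled':   [np.nan, 2.0, 3.0, np.nan, np.nan]})

# True where 'filled' is missing but the previous or next row has a filled value.
m = toy.filled.isna() & (toy.filled.shift(1).notna() | toy.filled.shift(-1).notna())

# Only the masked positions take their value from 'original'.
amended = toy.filled.fillna(toy.loc[m, 'original'])
print(amended.tolist())
```

Rows 0 and 3 border the filled run, so they receive the original values 1.0 and 4.0 and the dashed line can reach them. Row 4 is not adjacent to any filled value and stays NaN.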
The above is a clean solution for the general case. If the original values are drawn with a solid, opaque line and the filled values use a line width no greater than that of the original values, you can simply draw the completely filled column first and then draw the original values on top of it:
df.filled.fillna(df.original).plot(ax=ax, color='blue', ls='--')
df.original.plot(marker='o', ax=ax, color='blue')
I have a matrix that looks like this:
>> X
>>
[[5.1 1.4]
[4.9 1.4]
[4.7 1.3]
[4.6 1.5]
[5. 1.4]]
I want to get its first column as an array: [5.1, 4.9, 4.7, 4.6, 5.]
However, when I try to get it with X[:,0], I get
>> [[5.1]
[4.9]
[4.7]
[4.6]
[5. ]]
which is something different. How can I get it as a one-dimensional array?
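You can use a list comprehension for this kind of thing. This assumes X is a regular NumPy array (np.ndarray), for which X[:,0] is already one-dimensional; the comprehension turns it into a Python list.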
import numpy as np
X = np.array([[5.1, 1.4], [4.9, 1.4], [4.7, 1.3], [4.6, 1.5], [5.0, 1.4]])
X_0 = [i for i in X[:,0]]
print(X_0)
Output:
[5.1, 4.9, 4.7, 4.6, 5.0]
Almost there! Just reshape your result. Note that this gives a two-dimensional row of shape (1, 5), not a flat one-dimensional array:
X[:,0].reshape(1,-1)
Outputs:
[[5.1 4.9 4.7 4.6 5. ]]
Full code:
import numpy as np
X=np.array([[5.1 ,1.4],[4.9 ,1.4], [4.7 ,1.3], [4.6 ,1.5], [5. , 1.4]])
print(X)
print(X[:,0].reshape(1,-1))
With regular numpy array:
In [3]: x = np.arange(15).reshape(5,3)
In [4]: x
Out[4]:
array([[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8],
[ 9, 10, 11],
[12, 13, 14]])
In [5]: x[:,0]
Out[5]: array([ 0, 3, 6, 9, 12])
With np.matrix (use discouraged if not actually deprecated)
In [6]: X = np.matrix(x)
In [7]: X
Out[7]:
matrix([[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8],
[ 9, 10, 11],
[12, 13, 14]])
In [8]: print(X)
[[ 0 1 2]
[ 3 4 5]
[ 6 7 8]
[ 9 10 11]
[12 13 14]]
In [9]: X[:,0]
Out[9]:
matrix([[ 0],
[ 3],
[ 6],
[ 9],
[12]])
In [10]: X[:,0].T
Out[10]: matrix([[ 0, 3, 6, 9, 12]])
To get a 1-D array, convert to a regular array and ravel, or do it in one step with .A1:
In [11]: X[:,0].A1
Out[11]: array([ 0, 3, 6, 9, 12])
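As a quick check on the data from the question, the two routes mentioned above can be sketched side by side. This assumes X really is an np.matrix, as the column-shaped result of X[:,0] suggests; the names col, flat_a and flat_b are illustrative.

```python
import numpy as np

# Assumption: the question's X is an np.matrix, not a plain ndarray.
X = np.matrix([[5.1, 1.4], [4.9, 1.4], [4.7, 1.3], [4.6, 1.5], [5.0, 1.4]])

col = X[:, 0]                        # still 2-D: a (5, 1) matrix
flat_a = np.asarray(col).ravel()     # convert to ndarray, then flatten
flat_b = col.A1                      # same result in one step

print(col.shape, flat_a.shape, flat_b.shape)
```

Both flat_a and flat_b are plain one-dimensional ndarrays, so later indexing behaves as it would for a regular array.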
I have a dataframe with 1000 columns. I want to replace every -9 value in every column with that row's df['a'] value.
df = pd.DataFrame({'a': [1, 2, 3, 4, 5], 'b': [6, -9, 8, np.nan, -9], 'c': [-9, 19, -9, -9, -9]})
What I want is
df = pd.DataFrame({'a': [1, 2, 3, 4, 5], 'b': [6, 2, 8, np.nan, 5], 'c': [1, 19, 3, 4, 5]})
I have tried
df.replace(-9, df['a'], inplace = True)
And
df.replace(-9, np.nan, inplace = True)
df.fillna(df.a, inplace = True)
But they don't change the df.
My solution right now is to use a for loop:
df.replace(-9, np.nan, inplace = True)
col_list = list(df)
for i in col_list:
    df[i].fillna(df['a'], inplace = True)
This solution works, but it also replaces any np.nan values. Any ideas as to how I can replace just the -9 values without first converting it into np.nan? Thanks.
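I think you need DataFrame.mask, which replaces values where the condition is True and leaves existing NaN values untouched: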
df = df.mask(df == -9, df['a'], axis=0)
print (df)
a b c
0 1 6.0 1
1 2 2.0 19
2 3 8.0 3
3 4 NaN 4
4 5 5.0 5
Or:
df = pd.DataFrame(np.where(df == -9, df['a'].values[:, None], df), columns=df.columns)
print (df)
a b c
0 1.0 6.0 1.0
1 2.0 2.0 19.0
2 3.0 8.0 3.0
3 4.0 NaN 4.0
4 5.0 5.0 5.0
You can also do something like this, assigning column by column with boolean indexing:
import numpy as np
import pandas as pd
df_tar = pd.DataFrame({'a': [1, 2, 3, 4, 5], 'b': [6, 2, 8, np.nan, 5], 'c': [1, 19, 3, 4, 5]})
df = pd.DataFrame({'a': [1, 2, 3, 4, 5], 'b': [6, -9, 8, np.nan, -9], 'c': [-9, 19, -9, -9, -9]})
df.loc[df['b']==-9,'b']=df.loc[df['b']==-9,'a']
df.loc[df['c']==-9,'c']=df.loc[df['c']==-9,'a']
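With 1000 columns, writing one line per column is not practical. A minimal sketch of the same boolean-indexing idea in a loop follows; it assumes 'a' is the reference column and that every other column should be processed. The names col and hit are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 5],
                   'b': [6, -9, 8, np.nan, -9],
                   'c': [-9, 19, -9, -9, -9]})

# Visit every column except the reference column 'a'.
for col in df.columns.drop('a'):
    hit = df[col] == -9                  # NaN == -9 is False, so NaN rows are skipped
    df.loc[hit, col] = df.loc[hit, 'a']  # copy the row's 'a' value into the -9 cells

print(df)
```

Because the comparison with -9 is False for NaN, existing missing values are left alone, which is the behaviour the question asks for.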