pandas increment cell value by 0.1 - pandas

I have a pandas data frame that looks similar to below:
A
0 1
1 NaN
2 2
3 NaN
4 NaN
5 3
6 4
8 NaN
9 5
10 NaN
What I want is:
A
0 1
1 1.1
2 2
3 2.1
4 2.2
5 3
6 4
8 4.1
9 5
10 5.1
I want to fill the missing values incrementally by 0.1. I have been playing with np.arange but I cannot work out how to piece everything together. I feel I am on the right path but would appreciate some help. Thank you!
In []: import pandas as pd
In []: import numpy as np
In []: np.arange(1, 2, 0.1)
Out[]: array([1. , 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9])
In []: def up(x):
           return x.astype(str) + '.' + np.arange(len(x)).astype(str)
In []: data = pd.DataFrame([[1,0],[0,1],[1,0],[0,1]], columns=["A", "B"])
In []: out = data.apply(up).values
array([['1.0', '0.0'],
['0.1', '1.1'],
['1.2', '0.2'],
['0.3', '1.3']], dtype=object)
In []: df = pd.DataFrame(out, columns=data.columns)
A B
0 1.0 0.0
1 0.1 1.1
2 1.2 0.2
3 0.3 1.3

It is a little tricky to get there:
s = df.A.isnull().astype(int).diff().ne(0).cumsum()[df.A.isnull()]  # create a group id for the NaN values; consecutive NaNs share the same id
df.A.fillna(df.A.ffill() + s.groupby(s).cumcount().add(1).mul(0.1))  # then use fillna, adding 0.1 per position within each run of NaNs to the last valid value
Out[1764]:
0 1.0
1 1.1
2 2.0
3 2.1
4 2.2
5 3.0
6 4.0
8 4.1
9 5.0
10 5.1
Name: A, dtype: float64
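For comparison, here is a shorter variant (my own alternative, not the code from the answer above), written as a self-contained sketch on the question's data. It assumes each valid value should be followed by +0.1, +0.2, ... for the NaNs after it: a new group starts at every non-null value, so cumcount gives each row its position after the last valid value (0 for the valid value itself).

```python
import numpy as np
import pandas as pd

# Data from the question (note the index skips 7)
df = pd.DataFrame(
    {'A': [1, np.nan, 2, np.nan, np.nan, 3, 4, np.nan, 5, np.nan]},
    index=[0, 1, 2, 3, 4, 5, 6, 8, 9, 10],
)

# A new group starts at every non-null value, so each group is one
# valid value followed by the NaNs that come after it
groups = df['A'].notna().cumsum()

# Position within the group: 0 for the valid value, then 1, 2, ... for the NaNs
step = df.groupby(groups).cumcount()

# Forward-fill the last valid value and add 0.1 per step
result = df['A'].ffill() + step * 0.1
print(result)
```

Leading NaNs (before the first valid value) stay NaN, since there is nothing to forward-fill from.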

Related

Operations with multiple dataframes partially sharing indexes in pandas

I have two dataframes: (i) one has a two-level row index and a two-level column header, and (ii) the other has a single-level index and header. The second level of each axis in the first dataframe corresponds to the matching axis of the second dataframe. I need to multiply the two dataframes according to that relationship between the axes.
Dataframe 1:
Dataframe 2:
Expected result (multiplication by index/header):
Try using pd.DataFrame.mul with the level parameter:
import pandas as pd
df = pd.DataFrame([[9,10,2,1,6,5],
[4, 0,3,4,6,6],
[9, 3,9,1,2,3],
[3, 5,9,3,9,0],
[4,4,8,5,10,5],
[5, 3,1,8,5,6]])
df.columns = pd.MultiIndex.from_arrays([[2020]*3+[2021]*3,[1,2,3,1,2,3]])
df.index = pd.MultiIndex.from_arrays([[1]*3+[2]*3,[1,2,3,1,2,3]])
print(df)
print('\n')
df2 = pd.DataFrame([[.1,.3,.6],[.4,.4,.3],[.5,.4,.1]], index=[1,2,3], columns=[1,2,3])
print(df2)
print('\n')
df_out = df.mul(df2, level=1)
print(df_out)
Output:
2020 2021
1 2 3 1 2 3
1 1 9 10 2 1 6 5
2 4 0 3 4 6 6
3 9 3 9 1 2 3
2 1 3 5 9 3 9 0
2 4 4 8 5 10 5
3 5 3 1 8 5 6
1 2 3
1 0.1 0.3 0.6
2 0.4 0.4 0.3
3 0.5 0.4 0.1
2020 2021
1 2 3 1 2 3
1 1 0.9 3.0 1.2 0.1 1.8 3.0
2 1.6 0.0 0.9 1.6 2.4 1.8
3 4.5 1.2 0.9 0.5 0.8 0.3
2 1 0.3 1.5 5.4 0.3 2.7 0.0
2 1.6 1.6 2.4 2.0 4.0 1.5
3 2.5 1.2 0.1 4.0 2.0 0.6
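To see what the level-aligned multiplication computes, here is a manual cross-check (an illustration only, not part of the answer): each cell of df is multiplied by the df2 entry selected by the second level of its row label and the second level of its column label. It relies only on .loc lookups with repeated labels, not on level alignment.

```python
import pandas as pd

df = pd.DataFrame([[9, 10, 2, 1, 6, 5],
                   [4, 0, 3, 4, 6, 6],
                   [9, 3, 9, 1, 2, 3],
                   [3, 5, 9, 3, 9, 0],
                   [4, 4, 8, 5, 10, 5],
                   [5, 3, 1, 8, 5, 6]])
df.columns = pd.MultiIndex.from_arrays([[2020] * 3 + [2021] * 3, [1, 2, 3, 1, 2, 3]])
df.index = pd.MultiIndex.from_arrays([[1] * 3 + [2] * 3, [1, 2, 3, 1, 2, 3]])
df2 = pd.DataFrame([[.1, .3, .6], [.4, .4, .3], [.5, .4, .1]],
                   index=[1, 2, 3], columns=[1, 2, 3])

# Pick one factor per cell: rows by df's inner row level, columns by df's
# inner column level (.loc repeats rows/columns for repeated labels)
factors = df2.loc[df.index.get_level_values(1),
                  df.columns.get_level_values(1)].to_numpy()

# Element-wise product, keeping df's MultiIndex labels
manual = pd.DataFrame(df.to_numpy() * factors, index=df.index, columns=df.columns)
print(manual)
```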

Fill missing values with the previous value in each group

I have three columns of data: the first column is a category id, and the second and third columns contain some missing values. I want to group by the id in the first column and then, within each group, fill the missing values in the third column using the 'ffill' method.
I found a good idea here: Pandas: filling missing values by weighted average in each group!, but it didn't solve my problem because the output it produced was not what I wanted.
The following code builds an example:
import pandas as pd
import numpy as np
df = pd.DataFrame({'name': ['A','A', 'B','B','B','B', 'C','C','C'],'value': [1, np.nan, np.nan, 2, 3, 1, 3, np.nan, 3],
'sss':[1, np.nan, 3, np.nan, np.nan, np.nan, 2, np.nan, np.nan]})
Out[13]:
name value sss
0 A 1.0 1.0
1 A NaN NaN
2 B NaN 3.0
3 B 2.0 NaN
4 B 3.0 NaN
5 B 1.0 NaN
6 C 3.0 2.0
7 C NaN NaN
8 C 3.0 NaN
Fill in missing values with a previous value after grouping
Then I ran the following code, but it produces strange results:
df["sss"] = df.groupby("name").transform(lambda x: x.fillna(axis = 0,method = 'ffill'))
df
Out[13]:
name value sss
0 A 1.0 1.0
1 A NaN 1.0
2 B NaN NaN
3 B 2.0 2.0
4 B 3.0 3.0
5 B 1.0 1.0
6 C 3.0 3.0
7 C NaN 3.0
8 C 3.0 3.0
The result I want is this:
Out[13]:
name value sss
0 A 1.0 1.0
1 A NaN 1.0
2 B NaN 3.0
3 B 2.0 3.0
4 B 3.0 3.0
5 B 1.0 3.0
6 C 3.0 2.0
7 C NaN 2.0
8 C 3.0 2.0
Can someone point out where I went wrong?

Make all values after a label have the same value of that label

I have a data frame:
import numpy as np
import pandas as pd
np.random.seed(42)
df = pd.DataFrame(np.random.randint(0, 10, size=(5, 2)), columns=['col1', 'col2'])
Which generates the following frame:
col1 col2
0 6 3
1 7 4
2 6 9
3 2 6
4 7 4
I want to replace all values from row 2 onward with the values in row 1. So I typed:
df.loc[2:] = df.loc[1:1]
But the resulting frame is filled with NaN:
col1 col2
0 6.0 3.0
1 7.0 4.0
2 NaN NaN
3 NaN NaN
4 NaN NaN
I know I can use fillna(method='ffill') to get what I want, but why doesn't the broadcasting work, and why is the result NaN? Expected result:
col1 col2
0 6 3
1 7 4
2 7 4
3 7 4
4 7 4
Edit: pandas version 0.24.2
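df.loc[1:1] is not empty: .loc slices by label and includes both endpoints, so it returns row 1 as a one-row DataFrame. The NaNs come from index alignment. When the right-hand side of the assignment is a DataFrame, pandas aligns it on the row labels, and since it only contains label 1, rows 2 to 4 get NaN (there is also no 'Value' column in this frame). Assign a Series instead, which is aligned on the columns and broadcast to every row: df.loc[2:] = df.loc[1].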

pandas aggregate include all groups

I have an issue with groupby aggregation: I need to add groups that are not present in the dataframe but should be included in the desired output. An example:
import pandas as pd
from io import StringIO
csvdata = StringIO("""day,sale
1,1
2,4
2,10
4,7
5,2.3
7,4.4
2,3.4""")
# days 3 and 6 are intentionally missing here, but I'd like them in the output
df = pd.read_csv(csvdata, sep=",")
df1=df.groupby(['day'])['sale'].agg('sum').reset_index().rename(columns={'sale':'dailysale'})
df1
How can I get the following? Thank you!
1 1.0
2 17.4
3 0.0
4 7.0
5 2.3
6 0.0
7 4.4
You can chain Series.reindex with the full range of days after aggregating with sum:
df1 = (df.groupby(['day'])['sale']
.sum()
.reindex(range(1, 8), fill_value=0)
.reset_index(name='dailysale'))
print (df1)
day dailysale
0 1 1.0
1 2 17.4
2 3 0.0
3 4 7.0
4 5 2.3
5 6 0.0
6 7 4.4
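The range(1, 8) above hard-codes days 1 to 7. If the full range isn't known in advance, one variant (an assumption on my part, not from the answer) is to build it from the data's own minimum and maximum day:

```python
from io import StringIO

import pandas as pd

csvdata = StringIO("""day,sale
1,1
2,4
2,10
4,7
5,2.3
7,4.4
2,3.4""")
df = pd.read_csv(csvdata)

# Every day from the first to the last one seen, including days without sales
all_days = range(df['day'].min(), df['day'].max() + 1)

df1 = (df.groupby('day')['sale']
         .sum()
         .reindex(all_days, fill_value=0)  # missing days get 0
         .rename_axis('day')               # make sure the index is named 'day'
         .reset_index(name='dailysale'))
print(df1)
```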
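Another idea is to use an ordered categorical: grouping by a categorical column includes every category, so the sum adds the missing days as rows:
df['day'] = pd.Categorical(df['day'], categories=range(1, 8), ordered=True)
# observed=False keeps categories with no rows (newer pandas versions are moving the default to observed=True)
df1 = df.groupby(['day'], observed=False)['sale'].sum().reset_index(name='dailysale')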
print (df1)
day dailysale
0 1 1.0
1 2 17.4
2 3 0.0
3 4 7.0
4 5 2.3
5 6 0.0
6 7 4.4

Move strings within a mixed string and float column to new column in Pandas

Can't seem to find the answer anywhere. I have a column 'q' within my dataframe that has both strings and floats. I would like to remove the string values from 'q' and move them into an existing string column 'comments'. Any help is appreciated.
I have tried:
df['comments']=[isinstance(x, str) for x in df.q]
I have also tried some str methods on q, but to no avail. Any direction on this would be appreciated.
If the series is:
s=pd.Series([1.0,1.1,1.2,1.3,'this','is',1.4,'a',1.5,'comment'])
s
Out[24]:
0 1
1 1.1
2 1.2
3 1.3
4 this
5 is
6 1.4
7 a
8 1.5
9 comment
dtype: object
then the floats alone can be extracted with:
[e if type(e) is float else np.nan for e in s]
Out[25]: [1.0, 1.1, 1.2, 1.3, nan, nan, 1.4, nan, 1.5, nan]
and the comments with:
[e if type(e) is not float else '' for e in s]
Out[26]: ['', '', '', '', 'this', 'is', '', 'a', '', 'comment']
This is what you are trying to do.
But element-wise iteration over a pandas Series does not scale well, so extract only the floats with:
pd.to_numeric(s,errors='coerce')
Out[27]:
0 1.0
1 1.1
2 1.2
3 1.3
4 NaN
5 NaN
6 1.4
7 NaN
8 1.5
9 NaN
dtype: float64
and then combine floats and comments into two columns:
pd.to_numeric(s,errors='coerce').to_frame('floats').merge(s.loc[pd.to_numeric(s,errors='coerce').isnull()].to_frame('comments'), left_index=True, right_index=True, how='outer')
Out[71]:
floats comments
0 1.0 NaN
1 1.1 NaN
2 1.2 NaN
3 1.3 NaN
4 NaN this
5 NaN is
6 1.4 NaN
7 NaN a
8 1.5 NaN
9 NaN comment
There is a side effect: pd.to_numeric(s, errors='coerce') converts strings that contain float literals (such as '12.345') to floats instead of keeping them as strings.
pd.to_numeric(pd.Series([1.0,1.1,1.2,1.3,'this','is',1.4,'a',1.5,'comment','12.345']), errors='coerce')
Out[73]:
0 1.000
1 1.100
2 1.200
3 1.300
4 NaN
5 NaN
6 1.400
7 NaN
8 1.500
9 NaN
10 12.345 <--- this is now the float 12.345 not str
dtype: float64
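If you don't want to convert strings with float literals into floats, you can also use the str.isnumeric() method. It returns NaN for the non-string entries and False for any string containing a non-digit character (including a decimal point), so comparing with False selects those strings: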
df = pd.DataFrame({'q':[1.5,2.5,3.5,'a', 'b', 5.1,'3.55','1.44']})
df['comments'] = df.loc[df['q'].str.isnumeric()==False, 'q']
In [4]: df
Out[4]:
q comments
0 1.5 NaN
1 2.5 NaN
2 3.5 NaN
3 a a
4 b b
5 5.1 NaN
6 3.55 3.55 <-- strings are not converted into floats
7 1.44 1.44
Or something like this:
criterion = df.q.apply(lambda x: isinstance(x,str))
df['comments'] = df.loc[criterion, 'q']
Again, it won't convert strings into floats.
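Neither snippet removes the strings from q, which the question also asks for. Below is a minimal sketch of the full move, assuming a hypothetical comments column that already holds some text. The isinstance test catches every string, including an all-digit one such as '12', which str.isnumeric() reports as True and would therefore leave in q.

```python
import pandas as pd

# Hypothetical frame: 'q' mixes floats and strings, 'comments' already exists
df = pd.DataFrame({
    'q': [1.5, 2.5, 'a', 'b', 5.1, '3.55', '12'],
    'comments': ['', 'ok', '', '', '', '', ''],
})

# True where the cell holds a Python string
is_str = df['q'].map(lambda x: isinstance(x, str))

# Copy the strings into 'comments', leaving the other rows' comments untouched
df['comments'] = df['comments'].where(~is_str, df['q'])

# Blank out the strings in 'q' and convert what remains to float64
df['q'] = pd.to_numeric(df['q'].where(~is_str))
print(df)
```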