pandas rolling calc count function - pandas

import pandas as pd
import numpy as np
df = pd.DataFrame({'col1': np.arange(5), 'col2': np.arange(3, 8)})
col1 col2
0 3
1 4
2 5
3 6
4 7
I want to add a column col3 that applies a rolling window of 3 rows (the current row and the two rows after it) and returns the count of rows in that window where col1>=3 and col2>=3.
The final result I want looks like this:
col1 col2 col3 reason
0 3 0
1 4 1 [(3>=3 and 6>=3)]
2 5 2 [(3>=3 and 6>=3),(4>=3 and 7>=3)]
3 6 nan
4 7 nan
Hope to get your reply as soon as possible
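One possible approach, sketched under the assumption that each window covers the current row and the two rows after it (which is what the expected col3 values imply): build a boolean mask for the condition, take a rolling sum over 3 rows, and shift the result up by 2 so that each row sees its forward-looking window. The frame below is built to match the table in the question, and the name mask is only illustrative.

```python
import numpy as np
import pandas as pd

# Frame matching the table shown in the question (col1 = 0..4, col2 = 3..7)
df = pd.DataFrame({'col1': np.arange(5), 'col2': np.arange(3, 8)})

# True where both conditions hold on a row
mask = (df['col1'] >= 3) & (df['col2'] >= 3)

# rolling(3).sum() counts matches over the current row and the 2 rows before it;
# shift(-2) moves each count up so it describes the current row and the 2 rows after it.
# Rows without a full window of 3 end up as NaN.
df['col3'] = mask.astype(int).rolling(3).sum().shift(-2)

print(df)
# col3 is 0.0, 1.0, 2.0, NaN, NaN
```

The result is a float column because of the NaN values at the end.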

Related

How to Select Rows and Columns of Dataframe?

I have a dataframe with 4 columns and 6 rows. I want to select the 2nd and 4th columns and the 1st and 6th rows of this dataframe and create a new dataframe. How can I do this?
You can use the code below. Keep in mind that positional indexing in pandas starts at 0, so the first row and the first column are both at position 0.
>>> import pandas as pd
>>> dictA = {'col1': ['tom', 10,20,56,2,3,4],'col2': ['tom', 10,20,56,2,3,4],'col3': ['tom', 10,20,56,2,3,4],'col4': ['tom', 10,20,56,2,3,4]}
>>> dfA = pd.DataFrame(dictA)
>>> dfA
col1 col2 col3 col4
0 tom tom tom tom
1 10 10 10 10
2 20 20 20 20
3 56 56 56 56
4 2 2 2 2
5 3 3 3 3
6 4 4 4 4
>>> new_df = dfA.iloc[[0,5],[1,3]]
>>> new_df
col2 col4
0 tom tom
5 3 3
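For comparison, here is a minimal sketch of the same selection done by position and by label. The 6x4 frame and its values are made up for illustration; only the shape comes from the question.

```python
import pandas as pd

# Hypothetical 6-row, 4-column frame (values are arbitrary)
df = pd.DataFrame({f"col{i}": range(i * 10, i * 10 + 6) for i in range(1, 5)})

# 1st and 6th rows are positions 0 and 5; 2nd and 4th columns are positions 1 and 3
by_position = df.iloc[[0, 5], [1, 3]]

# Same selection by label (the default row labels happen to equal the positions here)
by_label = df.loc[[0, 5], ["col2", "col4"]]

print(by_position)
```

iloc always counts positions, while loc uses the index and column labels, so the two only agree on rows when the index is the default 0..n-1 range.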

convert from one column pandas dataframe to 3 columns based on index

I have:
col1
0 1
1 2
2 3
3 4
4 5
5 6
...
I want every 3 rows of the original dataframe to become a single row in the new dataframe:
col1 col2 col3
0 1 2 3
1 4 5 6
...
Any suggestions?
The values of the dataframe are a NumPy array that can be reshaped using numpy's reshape method. Then create a new dataframe from the reshaped values. Assuming your existing dataframe is df:
df_2 = pd.DataFrame(df.values.reshape(-1, 3), columns=['col1', 'col2', 'col3'])
Passing -1 lets NumPy infer the number of rows, so this works whenever len(df) is a multiple of 3. For the six-row example it creates a new dataframe with two rows and 3 columns:
col1 col2 col3
0 1 2 3
1 4 5 6
You can use set_index and unstack to get the right shape, and add_prefix to change the column names:
print (df.set_index([df.index//3, df.index%3+1])['col1'].unstack().add_prefix('col'))
col1 col2 col3
0 1 2 3
1 4 5 6
If the original index does not consist of consecutive integers but you still want to reshape every 3 rows, replace df.index with np.arange(len(df)) in both places in set_index.
You can convert the column to a NumPy array and then reshape it.
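A minimal sketch of the positional variant, assuming a made-up frame whose index is not consecutive (the index values and the names pos and wide are only illustrative):

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a non-consecutive index
df = pd.DataFrame({'col1': [1, 2, 3, 4, 5, 6]}, index=[10, 20, 35, 40, 41, 99])

# Row positions 0..n-1, used instead of the index labels
pos = np.arange(len(df))

# pos // 3 is the target row, pos % 3 + 1 is the target column number
wide = df.set_index([pos // 3, pos % 3 + 1])['col1'].unstack().add_prefix('col')

print(wide)
```

Because the grouping is computed from positions, the original index labels no longer affect which rows end up together.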
In [27]: np.array(df['col1']).reshape( len(df) // 3 , 3 )
Out[27]:
array([[1, 2, 3],
[4, 5, 6]])
In [..] :reshaped_cols = np.array(df['col1']).reshape( len(df) // 3 , 3 )
pd.DataFrame( data = reshaped_cols , columns = ['col1' , 'col2' , 'col3' ] )
Out[30]:
col1 col2 col3
0 1 2 3
1 4 5 6

adding multiple lists into one column DataFrame pandas

l = {'col1': [[1,2,3], [4,5,6]]}
df = pd.DataFrame(data = l)
col1
0 [1, 2, 3]
1 [4, 5, 6]
Desired output:
col1
0 1
1 2
2 3
3 4
4 5
5 6
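You can use explode. Note that it repeats the original index (0, 0, 0, 1, 1, 1), so chain reset_index(drop=True) if you need the 0 to 5 index from the desired output: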
df.explode('col1')
col1
0 1
0 2
0 3
1 4
1 5
1 6
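If the 0 to 5 index from the desired output matters, a minimal sketch is to reset the index after exploding. The name flat is only illustrative, and the cast to int is optional (explode leaves the column with object dtype).

```python
import pandas as pd

df = pd.DataFrame({'col1': [[1, 2, 3], [4, 5, 6]]})

# explode repeats the source row's index label for every element,
# so drop it and renumber the rows from 0
flat = df.explode('col1').reset_index(drop=True)

# Optional: convert the object column back to integers
flat['col1'] = flat['col1'].astype(int)

print(flat)
```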
You can use np.ravel to flatten the list of lists:
import numpy as np, pandas as pd
l = {'col1': [[1,2,3], [4,5,6]]}
df = pd.DataFrame(np.ravel(*l.values()),columns=l.keys())
>>> df
col1
0 1
1 2
2 3
3 4
4 5
5 6

Make all values after a label have the same value of that label

I have a data frame:
import numpy as np
import pandas as pd
np.random.seed(42)
df = pd.DataFrame(np.random.randint(0, 10, size=(5, 2)), columns=['col1', 'col2'])
Which generates the following frame:
col1 col2
0 6 3
1 7 4
2 6 9
3 2 6
4 7 4
I want to replace all values from row 2 onward with the values in row 1, so I type:
df.loc[2:] = df.loc[1:1]
But the resulting frame is filled with nan:
col1 col2
0 6.0 3.0
1 7.0 4.0
2 NaN NaN
3 NaN NaN
4 NaN NaN
I know I can use fillna(method='ffill') to get what I want, but why did the broadcasting not work, and why is the result NaN? Expected result:
col1 col2
0 6 3
1 7 4
2 7 4
3 7 4
4 7 4
Edit: pandas version 0.24.2
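df.loc[1:1] is not empty: label slices with .loc include both endpoints, so it is a one-row DataFrame with index label 1. Because the right-hand side is a DataFrame, pandas aligns it on the index before assigning, and rows 2 to 4 have no matching label, so they are filled with NaN. Assign the raw values instead so that they broadcast: df.loc[2:] = df.loc[1].values.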
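A small sketch to check what happens here, using the frame printed in the question (built from explicit values rather than from the random seed). It shows that df.loc[1:1] is a one-row frame, and that assigning a plain NumPy array instead of a DataFrame skips index alignment, so the values broadcast. The name one_row is only illustrative.

```python
import pandas as pd

# Same values as the frame printed in the question
df = pd.DataFrame({'col1': [6, 7, 6, 2, 7], 'col2': [3, 4, 9, 6, 4]})

# .loc label slices include both endpoints, so this holds exactly one row (label 1)
one_row = df.loc[1:1]
print(one_row.shape)  # (1, 2)

# A NumPy array carries no index, so there is nothing to align on:
# the two values of row 1 are broadcast across rows 2, 3 and 4
df.loc[2:] = df.loc[1].to_numpy()

print(df)
```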

Append Pandas Series to DataFrame as a column [duplicate]

I have a pandas dataframe (df) with columns ['key','col1','col2','col3'] and a pandas series (sr) whose index holds the same values as 'key' in the dataframe. I want to add the series to the dataframe as a new column called col4, matched on 'key'. I have the following code:
for index, row in segmention.iterrows():
    df[df['key']==row['key']]['col4']=sr.loc[row['key']]
The code is very slow. I assume there is a better and more efficient way to do this. Could you please help?
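If I understand correctly, you can simply do:
df['col4'] = sr
Note that this aligns sr with the index of df, so it only matches on 'key' if the keys are the index of df.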
Use map, as EdChum mentioned:
df['col4'] = df['key'].map(sr)
print (df)
col1 col2 col3 key col4
0 4 7 1 A 2
1 5 8 3 B 4
2 6 9 5 C 1
Or assign with set_index:
df = df.set_index('key')
df['col4'] = sr
print (df)
col1 col2 col3 col4
key
A 4 7 1 2
B 5 8 3 4
C 6 9 5 1
If you don't need to align the data in the Series by key, use .values (note the difference: 2,4,1 vs 4,1,2):
df['col4'] = sr.values
print (df)
col1 col2 col3 key col4
0 4 7 1 A 4
1 5 8 3 B 1
2 6 9 5 C 2
Sample:
df = pd.DataFrame({'col1':[4,5,6],
                   'col2':[7,8,9],
                   'col3':[1,3,5],
                   'key':list('ABC')})
print (df)
col1 col2 col3 key
0 4 7 1 A
1 5 8 3 B
2 6 9 5 C
sr = pd.Series([4,1,2], index=list('BCA'))
print (sr)
B 4
C 1
A 2
dtype: int64
df['col4'] = df['key'].map(sr)
print (df)
col1 col2 col3 key col4
0 4 7 1 A 2
1 5 8 3 B 4
2 6 9 5 C 1
df = df.set_index('key')
df['col4'] = sr
print (df)
col1 col2 col3 col4
key
A 4 7 1 2
B 5 8 3 4
C 6 9 5 1
This is really a good use case for join, where the left dataframe aligns a column with the index of the right dataframe/series. Make sure your Series has a name for it to work:
sr.name = 'some name'
df.join(sr, on='key')
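A self-contained sketch of the join approach, reusing the sample values from the answer above. Naming the series col4 is an assumption made here so that the new column gets the name the question asks for.

```python
import pandas as pd

df = pd.DataFrame({'col1': [4, 5, 6],
                   'col2': [7, 8, 9],
                   'col3': [1, 3, 5],
                   'key': list('ABC')})

# The Series name becomes the name of the joined column
sr = pd.Series([4, 1, 2], index=list('BCA'), name='col4')

# Match df['key'] against the index of sr; join returns a new frame
joined = df.join(sr, on='key')

print(joined)
# col4 is 2, 4, 1 for keys A, B, C
```

join does not modify df in place, so assign the result back if you want to keep it.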