Given the following pandas DataFrame:
|---+---+---+---|
| A | B | C | D |
|---+---+---+---|
| 0 | 1 | 0 | 0 |
| 1 | 0 | 0 | 0 |
| 1 | 0 | 0 | 0 |
| 1 | 0 | 0 | 0 |
| 1 | 0 | 0 | 0 |
| 1 | 0 | 0 | 0 |
| 0 | 0 | 1 | 0 |
| 0 | 1 | 0 | 0 |
| 0 | 0 | 0 | 1 |
| 1 | 0 | 0 | 0 |
|---+---+---+---|
How would you draw a bar chart of the 0 and 1 counts per column, with 0 being failure and 1 success?
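With pandas and matplotlib, use melt + crosstab:
import pandas as pd
from matplotlib import pyplot as plt

# Reshape to long form: one row per (column name, value) pair
dfm = df.melt()
# Count how often each value occurs in each original column
plot_df = (
    pd.crosstab(dfm['variable'], dfm['value'])
    .rename(columns={0: 'failure', 1: 'success'})
)
plot_df.plot.bar()
plt.tight_layout()
plt.show()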
plot_df:
value     failure  success
variable
A               4        6
B               8        2
C               9        1
D               9        1
Or with Seaborn use sns.countplot after melt:
import pandas as pd
import seaborn as sns
from matplotlib import pyplot as plt
df = pd.DataFrame({
    'A': [0, 1, 1, 1, 1, 1, 0, 0, 0, 1], 'B': [1, 0, 0, 0, 0, 0, 0, 1, 0, 0],
    'C': [0, 0, 0, 0, 0, 0, 1, 0, 0, 0], 'D': [0, 0, 0, 0, 0, 0, 0, 0, 1, 0]
})
dfm = df.melt()
ax = sns.countplot(data=dfm, x='variable', hue='value')
ax.legend(labels=['failure', 'success'])
plt.tight_layout()
plt.show()
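Because the columns hold only 0 and 1, the same counts can also be computed without melt. This is a minimal sketch, assuming every cell is exactly 0 or 1; the name counts is illustrative and not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [0, 1, 1, 1, 1, 1, 0, 0, 0, 1], 'B': [1, 0, 0, 0, 0, 0, 0, 1, 0, 0],
    'C': [0, 0, 0, 0, 0, 0, 1, 0, 0, 0], 'D': [0, 0, 0, 0, 0, 0, 0, 0, 1, 0]
})

# Assumption: every cell is 0 or 1, so a column's sum is its success count
success = df.sum()
failure = len(df) - success

# One row per original column, one column per outcome
counts = pd.DataFrame({'failure': failure, 'success': success})
print(counts)
```

The resulting frame should hold the same numbers as plot_df, so calling counts.plot.bar() would be one way to draw the same grouped bars.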
I am computing a simple crosstab for the purpose of a transition matrix like this:
test_df = pd.DataFrame({'from': ['A', 'A', 'B', 'C'], 'to': ['A', 'B', 'B', None]},
                       columns=['from', 'to'])
pd.crosstab(test_df['from'], test_df['to'], dropna=False)
It produces the following matrix:
A | B
---------
A 1 | 1
---------
B 0 | 1
I want it to include all transitions, even if they're 0, like the following:
A | B | C
-------------
A 1 | 1 | 0
-------------
B 0 | 1 | 0
-------------
C 0 | 0 | 0
Is there some setting I am missing to do this? I tried checking the options and couldn't find anything.
Use DataFrame.reindex at the end:
# Collect every label that appears in either column (stack drops the missing value)
i = test_df[['from', 'to']].stack().unique()
new_df = (pd.crosstab(test_df['from'], test_df['to'], dropna=False)
            .reindex(index=i, columns=i, fill_value=0))
print(new_df)
to    A  B  C
from
A     1  1  0
B     0  1  0
C     0  0  0
Another approach is DataFrame.pivot_table:
(test_df.pivot_table(index='from', columns='to', aggfunc='size', fill_value=0)
        .reindex(index=i, columns=i, fill_value=0))
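The reindex idea can be wrapped in a small helper so the label set is always derived from both columns. This is a sketch only; transition_matrix and its parameter names are hypothetical, and it assumes missing values should not become a label of their own.

```python
import pandas as pd

def transition_matrix(df, src='from', dst='to'):
    """Square count matrix over every label seen in either column."""
    # Union of labels from both columns, in order of first appearance,
    # with missing values dropped
    labels = pd.unique(pd.concat([df[src], df[dst]]).dropna())
    counts = pd.crosstab(df[src], df[dst])
    # Add the rows/columns that crosstab left out and fill them with 0
    return counts.reindex(index=labels, columns=labels, fill_value=0)

test_df = pd.DataFrame({'from': ['A', 'A', 'B', 'C'], 'to': ['A', 'B', 'B', None]},
                       columns=['from', 'to'])
tm = transition_matrix(test_df)
print(tm)
```

Taking the union of both columns matters here because C only ever appears as a source, never as a destination.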
I want to get a total count where the range of columns scanned depends on each row's month. For each row (I), the code should check the month in an If condition and use the matching For loop to count the columns. For example, if the row's month is 1, apply For k As Integer = 4 To dt.Columns.Count - 1; if the row's month is 2, apply For k As Integer = 4 To dt.Columns.Count - 2; and so on for the second row and every row after it. Once all rows have gone through the If/Else condition, the total count should be returned. How can I achieve this?
I tried the code below, but it does not behave as described above: it only returns the count for the first condition. Please guide me on this:
For I As Integer = 0 To dt.Rows.Count - 1
    'If dt.Rows(I).Item("Month").ToString = "1" Or dt.Rows(I).Item("Month").ToString = "3" Or dt.Rows(I).Item("Month").ToString = "5" Or dt.Rows(I).Item("Month").ToString = "7" Or dt.Rows(I).Item("Month").ToString = "8" Or dt.Rows(I).Item("Month").ToString = "10" Or dt.Rows(I).Item("Month").ToString = "12" Then
    For k As Integer = 4 To dt.Columns.Count - 1
        If dt.Rows(I).Item(k).ToString() = "1" Then
            count1 += 1
        Else
            count1 = 0
        End If
        If count1 > 13 Then
            Dx = True
        End If
    Next k
    'ElseIf dt.Rows(I).Item("Month").ToString() = "2" Or dt.Rows(I).Item("Month").ToString() = "4" Or dt.Rows(I).Item("Month").ToString() = "6" Or dt.Rows(I).Item("Month").ToString() = "9" Or dt.Rows(I).Item("Month").ToString() = "11" Then
    'For k As Integer = 4 To dt.Columns.Count - 2
    '    If dt.Rows(I).Item(k).ToString() = "1" Then
    '        count1 += 1
    '    Else
    '        count1 = 0
    '    End If
    '    If total > 13 Then
    '        Dx = True
    '    End If
    'Next k
    'End If
Next I
DataTable (each numbered column represents a day of the month; month 11 uses 30 day columns and month 12 uses 31):
----------------------------------------------------------------------------
Id | year | month | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | till 31
----------------------------------------------------------------------------
kek | 2019 | 10 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 |
kek | 2019 | 11 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 |
kek | 2019 | 12 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 |
Expected output:
If the count of consecutive 1s from 11/11 through 12/12 is more than 13, then Dx should be True.
In your current code you're resetting count1 to zero in each Else.
Also this code:
total += count1
If total > 13 Then
Dx = True
End If
...looks like it should be outside the loop.
It seems to me that you need this:
For I As Integer = 0 To dt.Rows.Count - 1
    ' 30-day months skip the last column; 31-day months use all of them
    Dim offset = 2
    If {"1", "3", "5", "7", "8", "10", "12"}.Contains(dt.Rows(I).Item("Month").ToString()) Then
        offset = 1
    End If
    For k As Integer = 4 To dt.Columns.Count - offset
        If dt.Rows(I).Item(k).ToString() = "1" Then
            count1 += 1
        End If
    Next
Next
If count1 > 13 Then
    Dx = True
End If
If you want to get fancier then try LINQ:
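' Enumerable.Range takes (start, count), so the 4 skipped columns are subtracted;
' offset is 0 for 31-day months and 1 otherwise, matching the loop above
Dim query = _
    From dr In dt.Rows.OfType(Of DataRow)()
    Let offset = If({"1", "3", "5", "7", "8", "10", "12"}.Contains(dr.Item("Month").ToString()), 0, 1)
    From k In Enumerable.Range(4, dt.Columns.Count - 4 - offset)
    Where dr.Item(k).ToString() = "1"
    Select 1
Dim total = query.Sum()
If total > 13 Then
    Dx = True
End If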
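Both snippets above count every 1, whereas the expected output in the question talks about consecutive 1s that continue from one month's row into the next. As a language-neutral illustration of that reading, here is a minimal sketch in Python rather than VB.NET. It assumes rows arrive in chronological order and that each row carries its day flags as a list padded to 31 entries; longest_run and the sample data are hypothetical.

```python
import calendar

def longest_run(rows):
    """Longest streak of consecutive 1s across month rows.

    rows: iterable of (year, month, flags) in chronological order, where
    flags is a list of 0/1 values for day 1, day 2, ... (may be padded to 31).
    """
    best = current = 0
    for year, month, flags in rows:
        # Only look at the days that actually exist in this month
        days_in_month = calendar.monthrange(year, month)[1]
        for flag in flags[:days_in_month]:
            if flag == 1:
                current += 1
                best = max(best, current)
            else:
                current = 0  # a 0 breaks the streak
    return best

# Hypothetical data: 1s from 11 November through 12 December 2019
rows = [
    (2019, 11, [0] * 10 + [1] * 20 + [0]),  # 31st entry is padding, ignored
    (2019, 12, [1] * 12 + [0] * 19),
]
dx = longest_run(rows) > 13
print(longest_run(rows), dx)
```

The streak counter is deliberately not reset between rows, so a run that ends on the last day of one month continues on the first day of the next.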
I've a datagridview like this:
------------------------
| S.N |Data1 | Data2|
| 1 | - | 10 |
| 2 | 4 | 2 |
| 3 | 2 | - |
| 4 | 9 | - |
I want result like this:
------------------------
| S.N |Data1 | Data2|
| 1 | - | 10 |
| 2 | 4 | 2 |
| 3 | 2 | - |
| 4 | 9 | - |
| total | 15 | 12 |
-----------------------
I've tried this:
Dim data1 As Double = 0
Dim data2 As Double = 0
For i As Integer = 0 To DataGridView1.RowCount - 1
    data1 += Val(CDbl(DataGridView1.Rows(i).Cells(1).Value))
    data2 += Val(CDbl(DataGridView1.Rows(i).Cells(2).Value))
Next
Dim rows As String() = {"Total", data1, data2}
DataGridView1.Rows.Add(rows)
But it throws an error.
Can I extract only the numeric values from the DataGridView, sum them, and add the totals as the last row?
I have now found an answer:
' Requires Imports System.Text.RegularExpressions at the top of the file
Dim data1 As Double = 0
Dim data2 As Double = 0
For j As Integer = 0 To DataGridView1.RowCount - 1
    ' Only add cells that consist of digits and spaces
    If Regex.IsMatch(DataGridView1.Rows(j).Cells(1).Value, "^[0-9 ]+$") Then
        data1 += Val(CDbl(DataGridView1.Rows(j).Cells(1).Value))
    End If
    If Regex.IsMatch(DataGridView1.Rows(j).Cells(2).Value, "^[0-9 ]+$") Then
        data2 += Val(CDbl(DataGridView1.Rows(j).Cells(2).Value))
    End If
Next
Dim rows As String() = {"Total", data1, data2}
DataGridView1.Rows.Add(rows)
This checks for nulls and ensures the data is numeric.
Dim data1 As Double = 0
Dim data2 As Double = 0
For i As Integer = 0 To DataGridView1.RowCount - 1
    ' Each column is validated against its own cell before it is added
    If Not IsDBNull(DataGridView1.Rows(i).Cells(1).Value) AndAlso IsNumeric(DataGridView1.Rows(i).Cells(1).Value) Then data1 += Val(CDbl(DataGridView1.Rows(i).Cells(1).Value))
    If Not IsDBNull(DataGridView1.Rows(i).Cells(2).Value) AndAlso IsNumeric(DataGridView1.Rows(i).Cells(2).Value) Then data2 += Val(CDbl(DataGridView1.Rows(i).Cells(2).Value))
Next
Dim rows As String() = {"Total", data1, data2}
DataGridView1.Rows.Add(rows)
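The underlying technique (treat anything non-numeric as missing, sum what is left, append a total row) is not specific to DataGridView. As a rough illustration outside VB.NET, here is a pandas sketch; the grid frame simply mirrors the table in the question, and the variable names are illustrative assumptions.

```python
import pandas as pd

# Same cells as the grid in the question; "-" marks a non-numeric cell
grid = pd.DataFrame({
    'S.N': [1, 2, 3, 4],
    'Data1': ['-', '4', '2', '9'],
    'Data2': ['10', '2', '-', '-'],
})

# errors='coerce' turns anything that is not a number into NaN
numeric = grid[['Data1', 'Data2']].apply(pd.to_numeric, errors='coerce')
# sum() skips NaN by default, so only the numeric cells are added
totals = numeric.sum()

total_row = {'S.N': 'total', 'Data1': totals['Data1'], 'Data2': totals['Data2']}
result = pd.concat([grid, pd.DataFrame([total_row])], ignore_index=True)
print(result)
```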
Excel Formulas I am trying to replicate in pandas:
Click here to download workbook
* Look at columns D, E and F
entsig and exsig are manual and can be changed. In real life they would be derived from the value of another column or a comparison of two other columns
ent = 1 if entsig previous = 1 and in = 0
in = 1 if ent previous = 1 or (in previous = 1 and ex = 0)
ex = 1 if exsig previous = 1 and in previous = 1
so either ent, in, or ex will always be = 1 but never more than one of them
import pandas as pd
df = pd.DataFrame(
    [[0,0,0,0,0], [1,0,0,0,0], [1,0,0,0,0], [1,0,0,0,0], [0,0,0,0,0],
     [0,1,0,0,0], [0,1,0,0,0], [1,0,0,0,0], [1,0,0,0,0], [0,0,0,0,0],
     [0,0,0,0,0], [0,0,0,0,0], [0,1,0,0,0], [0,1,0,0,0], [0,1,0,0,0],
     [0,0,0,0,0], [0,0,0,0,0], [1,0,0,0,0], [1,0,0,0,0], [1,0,0,0,0],
     [1,1,0,0,0], [0,1,0,0,0], [0,1,0,0,0], [0,1,0,0,0]],
    columns=['entsig', 'exsig','ent', 'in', 'ex'])
# Note: chained assignment such as df['ent'][mask] = 1 depends on older pandas
# behaviour; with copy-on-write, df.loc[mask, 'ent'] = 1 is needed instead.
# First pass: repeatedly propagate the 1s forward
for i in df.index:
    df['ent'][(df.entsig.shift(1)==1) & (df['ent'].shift(1) == 0) & (df['in'].shift(1) == 0)]=1
    df['ex'][(df.exsig.shift(1)==1) & (df['in'].shift(1)==1)]=1
    df['in'][(df.ent.shift(1)==1) | ((df['in'].shift(1)==1) & (df['ex']==0))]=1
# Second pass: clear the flags that overlap
for j in df.index:
    df['ent'][df['in'] == 1]=0
    df['in'][df['ex']==1]=0
    df['ex'][df['ex'].shift(1)==1]=0
df
results in
    entsig  exsig  ent  in  ex
0        0      0    0   0   0
1        1      0    0   0   0
2        1      0    1   0   0
3        1      0    0   1   0
4        0      0    0   1   0
5        0      1    0   1   0
6        0      1    0   0   1
7        1      0    0   0   0
8        1      0    1   0   0
9        0      0    0   1   0
10       0      0    0   1   0
11       0      0    0   1   0
12       0      1    0   1   0
13       0      1    0   0   1
14       0      1    0   0   0
15       0      0    0   0   0
16       0      0    0   0   0
17       1      0    0   0   0
18       1      0    1   0   0
19       1      0    0   1   0
20       1      1    0   1   0
21       0      1    0   0   1
22       0      1    0   0   0
23       0      1    0   0   0
Question
How can I make this code faster? It runs slowly because it loops over the index, but I have not been able to come up with a solution that avoids loops. Any ideas or comments are appreciated.
If we can assume every group of 1's in entsig is followed by at least one 1 in
exsig, then you could compute ent, ex and in like this:
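def ent_in_ex(df):
    # A signal switching from 0 to 1 marks the row before an entry/exit
    entsig_mask = (df['entsig'].diff().shift(1) == 1)
    exsig_mask = (df['exsig'].diff().shift(1) == 1)
    df.loc[entsig_mask, 'ent'] = 1
    df.loc[exsig_mask, 'ex'] = 1
    # "in" is 1 between an entry (lagged by one row) and the next exit
    df['in'] = df['ent'].shift(1).cumsum().subtract(df['ex'].cumsum(), fill_value=0)
    return df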
If we can make this assumption, then ent_in_ex is significantly faster:
In [5]: %timeit orig(df)
10 loops, best of 3: 185 ms per loop
In [6]: %timeit ent_in_ex(df)
100 loops, best of 3: 2.23 ms per loop
In [95]: orig(df).equals(ent_in_ex(df))
Out[95]: True
where orig is the original code:
def orig(df):
    for i in df.index:
        df['ent'][(df.entsig.shift(1)==1) & (df['ent'].shift(1) == 0) & (df['in'].shift(1) == 0)]=1
        df['ex'][(df.exsig.shift(1)==1) & (df['in'].shift(1)==1)]=1
        df['in'][(df.ent.shift(1)==1) | ((df['in'].shift(1)==1) & (df['ex']==0))]=1
    for j in df.index:
        df['ent'][df['in'] == 1]=0
        df['in'][df['ex']==1]=0
        df['ex'][df['ex'].shift(1)==1]=0
    return df
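If the assumption about exsig cannot be made, one fallback is a single forward pass over NumPy arrays that applies the three shifted conditions from the question's code row by row. This is a sketch, not a vectorized solution: ent_in_ex_loop is a hypothetical name, and it assumes the intended rules are exactly those three conditions evaluated in the order ent, ex, in.

```python
import numpy as np
import pandas as pd

def ent_in_ex_loop(df):
    """One pass over the rows; each row only looks at the previous row."""
    entsig = df['entsig'].to_numpy()
    exsig = df['exsig'].to_numpy()
    n = len(df)
    ent = np.zeros(n, dtype=int)
    in_ = np.zeros(n, dtype=int)
    ex = np.zeros(n, dtype=int)
    for i in range(1, n):
        # Enter only when flat on the previous row and entsig fired there
        if entsig[i - 1] == 1 and ent[i - 1] == 0 and in_[i - 1] == 0:
            ent[i] = 1
        # Exit only when in a position on the previous row and exsig fired there
        if exsig[i - 1] == 1 and in_[i - 1] == 1:
            ex[i] = 1
        # Stay in after an entry, or while already in and not exiting now
        if ent[i - 1] == 1 or (in_[i - 1] == 1 and ex[i] == 0):
            in_[i] = 1
    out = df.copy()
    out['ent'], out['in'], out['ex'] = ent, in_, ex
    return out

# Signal columns copied from the question's DataFrame
signals = pd.DataFrame({
    'entsig': [0,1,1,1,0, 0,0,1,1,0, 0,0,0,0,0, 0,0,1,1,1, 1,0,0,0],
    'exsig':  [0,0,0,0,0, 1,1,0,0,0, 0,0,1,1,1, 0,0,0,0,0, 1,1,1,1],
})
out = ent_in_ex_loop(signals)
print(out)
```

The loop still runs in Python, but it touches each row once instead of re-evaluating whole-column masks on every iteration.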