Matplotlib pcolor not plotting correctly - matplotlib

I am trying to create a heat map from a DataFrame (df) of IDs (rows) and Positions (columns) at which a motif is possible. The value is 1 if the motif is present and 0 if it is not, for example:
ID Position 1 2 3 4 5 6 7 8 9 10 ...etc
A 0 1 0 0 0 1 0 0 0 1
B 1 0 1 0 1 0 0 1 0 0
C 0 0 0 1 0 0 1 0 1 0
D 1 0 1 0 0 0 1 0 1 0
I then multiply the transpose of this matrix by the matrix itself to count how many times a motif at one position co-occurs with motifs at other positions, using the code:
df.T.dot(df)
To obtain the Data Frame:
POS 1 2 3 4 5 6 7 8 9 10 ...
1 2 0 2 0 1 0 1 1 1 0
2 0 1 0 0 0 1 0 0 0 1
3 2 0 2 0 1 0 1 1 1 0
4 0 0 0 1 0 0 1 0 1 0
5 1 0 1 0 1 0 0 1 0 0
6 0 1 0 0 0 1 0 0 0 1
7 1 0 1 1 0 0 2 0 2 0
8 1 0 1 0 1 0 0 1 0 0
9 1 0 1 1 0 0 2 0 2 0
10 0 1 0 0 0 1 0 0 0 1
...
This is symmetric about the diagonal. However, when I try to create the heat map using
pylab.pcolor(df)
I get an asymmetric map that does not seem to represent the dot-product matrix. I don't have enough reputation to post an image, though.
Does anyone know why this might be occurring? Thanks
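The co-occurrence step in the question can be reproduced on the four sample rows as a quick sanity check. This is only a sketch: the DataFrame below is rebuilt by hand from the table in the question, and the names motifs and cooc are made up for illustration.

```python
import pandas as pd

# Sample rows copied from the question; "motifs" and "cooc" are illustrative names.
motifs = pd.DataFrame(
    [
        [0, 1, 0, 0, 0, 1, 0, 0, 0, 1],  # A
        [1, 0, 1, 0, 1, 0, 0, 1, 0, 0],  # B
        [0, 0, 0, 1, 0, 0, 1, 0, 1, 0],  # C
        [1, 0, 1, 0, 0, 0, 1, 0, 1, 0],  # D
    ],
    index=list("ABCD"),
    columns=range(1, 11),
)

# Position-by-position co-occurrence counts (10 x 10).
cooc = motifs.T.dot(motifs)
print(cooc)

# Check that the matrix equals its own transpose before plotting it.
print((cooc.values == cooc.values.T).all())
```

If this check passes but the plot still looks lopsided, one thing worth checking (an assumption about the cause, since the plot is not shown) is that pcolor draws the first row at the bottom of the axes, so the main diagonal runs from bottom-left to top-right rather than from top-left to bottom-right. It is also worth confirming that the result of df.T.dot(df), and not the original df, is what gets passed to pcolor.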

Related

How to turn a list of events into a matrix to display in pandas

I have a list of events and I want to display on a graph how many happen per hour on each day of the week, as shown below:
Example of the graph I want
(each line is a day, the x axis is the time of day, the y axis is the number of events)
As I am new to pandas I am not sure what the best way to do it is, but here is my attempt:
x = [(rts[k].getDay(), rts[k].getHour(), 1) for k in rts]
df = pd.DataFrame(x[:30]) # Subset of 30 events
dfGrouped = df.groupby([0, 1]).sum() # Group them by day and hour
#Format to display
pd.DataFrame(np.random.randn(24, 7), index=range(0,24), columns=['Mo', 'Tu', 'We', 'Th', 'Fr', 'Sa', 'Su'])
The question is: how can I go from my grouped DataFrame to the 24x7 matrix required for the display?
I tried as_matrix, but that gives me only a one-dimensional array, whereas I want the index of my DataFrame to be the index of my matrix.
print(dfGrouped)
      2
0 1
0 19  1
  23  1
1 10  2
  18  3
  22  1
2 17  1
3 8   2
  9   3
  11  3
  13  1
  19  1
4 7   1
  9   1
  14  1
  15  1
  18  1
5 1   2
  7   1
  13  1
  19  1
6 12  1
Thanks for your help :)
Antoine
I think you need unstack to reshape the data, then rename to map the column names through a dict and, if necessary, reindex to add the missing hours to the index (older pandas versions used reindex_axis for this, but it was removed in pandas 1.0):
df1 = df.groupby([0, 1])[2].sum().unstack(0, fill_value=0)
# set column names
df = pd.DataFrame(x[:30], columns=['days', 'hours', 'val'])
d = {0: 'Mo', 1: 'Tu', 2: 'We', 3: 'Th', 4: 'Fr', 5: 'Sa', 6: 'Su'}
df1 = df.groupby(['days', 'hours'])['val'].sum().unstack(0, fill_value=0)
df1 = df1.rename(columns=d).reindex(range(24), fill_value=0)
print (df1)
days Mo Tu We Th Fr Sa Su
hours
0 0 0 0 0 0 0 0
1 0 0 0 0 0 2 0
2 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0
4 0 0 0 0 0 0 0
5 0 0 0 0 0 0 0
6 0 0 0 0 0 0 0
7 0 0 0 0 1 1 0
8 0 0 0 2 0 0 0
9 0 0 0 3 1 0 0
10 0 2 0 0 0 0 0
11 0 0 0 3 0 0 0
12 0 0 0 0 0 0 1
13 0 0 0 1 0 1 0
14 0 0 0 0 1 0 0
15 0 0 0 0 1 0 0
16 0 0 0 0 0 0 0
17 0 0 1 0 0 0 0
18 0 3 0 0 1 0 0
19 1 0 0 1 0 1 0
20 0 0 0 0 0 0 0
21 0 0 0 0 0 0 0
22 0 1 0 0 0 0 0
23 1 0 0 0 0 0 0
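For anyone who wants to run the reshaping end to end, here is a self-contained sketch. The rts objects from the question are not available, so the (day, hour, count) tuples below are typed in by hand from the grouped output shown in the question; the names events and day_names are illustrative only.

```python
import pandas as pd

# Hypothetical stand-in for the question's event list: (day, hour, count)
# tuples copied from the grouped output shown in the question.
events = [
    (0, 19, 1), (0, 23, 1),
    (1, 10, 2), (1, 18, 3), (1, 22, 1),
    (2, 17, 1),
    (3, 8, 2), (3, 9, 3), (3, 11, 3), (3, 13, 1), (3, 19, 1),
    (4, 7, 1), (4, 9, 1), (4, 14, 1), (4, 15, 1), (4, 18, 1),
    (5, 1, 2), (5, 7, 1), (5, 13, 1), (5, 19, 1),
    (6, 12, 1),
]
df = pd.DataFrame(events, columns=['days', 'hours', 'val'])
day_names = {0: 'Mo', 1: 'Tu', 2: 'We', 3: 'Th', 4: 'Fr', 5: 'Sa', 6: 'Su'}

df1 = (
    df.groupby(['days', 'hours'])['val'].sum()
      .unstack(0, fill_value=0)           # days become columns, hours stay as rows
      .rename(columns=day_names)          # 0..6 -> 'Mo'..'Su'
      .reindex(range(24), fill_value=0)   # add the hours that had no events
)
print(df1)
```

Note that unstack only creates columns for days that actually occur in the data. If a weekday could be missing entirely, a further reindex on the columns would be needed to keep the result at 24x7.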

How to populate columns depending on the value found?

I have a pandas DataFrame with customer IDs and columns for the months (1, 2, 3, ...).
I also have a column with the number of months since the last purchase.
I am using the following to populate the relevant month columns:
dt.loc[dt.month == 1, '1'] = 1
dt.loc[dt.month == 2, '2'] = 1
dt.loc[dt.month == 3, '3'] = 1
and so on.
How can I populate the columns in a better way, without writing 12 statements?
Use pd.get_dummies:
pd.get_dummies(dt.month)
Consider the dataframe dt
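import numpy as np
import pandas as pd

# 10 random months in 1..12; the output below is one possible draw
dt = pd.DataFrame(dict(
    month=np.random.randint(1, 13, (10)),
    a=range(10)
))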
a month
0 0 8
1 1 3
2 2 8
3 3 11
4 4 3
5 5 4
6 6 1
7 7 5
8 8 3
9 9 11
Add columns like this
dt.join(pd.get_dummies(dt.month))
a month 1 3 4 5 8 11
0 0 8 0 0 0 0 1 0
1 1 3 0 1 0 0 0 0
2 2 8 0 0 0 0 1 0
3 3 11 0 0 0 0 0 1
4 4 3 0 1 0 0 0 0
5 5 4 0 0 1 0 0 0
6 6 1 1 0 0 0 0 0
7 7 5 0 0 0 1 0 0
8 8 3 0 1 0 0 0 0
9 9 11 0 0 0 0 0 1
If you wanted the column names to be strings
dt.join(pd.get_dummies(dt.month).rename(columns='month {}'.format))
a month month 1 month 3 month 4 month 5 month 8 month 11
0 0 8 0 0 0 0 1 0
1 1 3 0 1 0 0 0 0
2 2 8 0 0 0 0 1 0
3 3 11 0 0 0 0 0 1
4 4 3 0 1 0 0 0 0
5 5 4 0 0 1 0 0 0
6 6 1 1 0 0 0 0 0
7 7 5 0 0 0 1 0 0
8 8 3 0 1 0 0 0 0
9 9 11 0 0 0 0 0 1
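The outputs above only contain columns for months that happen to occur in the data. If all twelve month columns are wanted, as in the question, one option is to reindex the dummy columns. The sketch below uses the month values from the sample output instead of random ones, and the names dummies and result are illustrative.

```python
import pandas as pd

# Month values copied from the sample output above (fixed rather than random).
dt = pd.DataFrame({'a': range(10),
                   'month': [8, 3, 8, 11, 3, 4, 1, 5, 3, 11]})

# One indicator column per month that occurs, as integers rather than booleans.
dummies = pd.get_dummies(dt['month'], dtype=int)

# Make sure every month 1..12 has a column, filling absent months with 0.
dummies = dummies.reindex(columns=range(1, 13), fill_value=0)

# Use string column names '1'..'12', matching the question's column labels.
dummies.columns = [str(m) for m in dummies.columns]

result = dt.join(dummies)
print(result)
```

Unlike the dt.loc assignments in the question, which leave non-matching cells untouched, this fills every non-matching cell with 0. The dtype=int argument is there because recent pandas versions return boolean columns from get_dummies by default.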

How to create dummy variables on Ordinal columns in Python

I am new to Python. I have created dummy columns for a categorical column using pandas get_dummies. How can I create dummy columns for an ordinal column (say, a column Rating with values 1, 2, 3, ..., 10)?
Consider the dataframe df
df = pd.DataFrame(dict(Cats=list('abcdcba'), Ords=[3, 2, 1, 0, 1, 2, 3]))
df
Cats Ords
0 a 3
1 b 2
2 c 1
3 d 0
4 c 1
5 b 2
6 a 3
pd.get_dummies
works the same on either column
with df.Cats
pd.get_dummies(df.Cats)
a b c d
0 1 0 0 0
1 0 1 0 0
2 0 0 1 0
3 0 0 0 1
4 0 0 1 0
5 0 1 0 0
6 1 0 0 0
with df.Ords
pd.get_dummies(df.Ords)
0 1 2 3
0 0 0 0 1
1 0 0 1 0
2 0 1 0 0
3 1 0 0 0
4 0 1 0 0
5 0 0 1 0
6 0 0 0 1
with both
pd.get_dummies(df)
Ords Cats_a Cats_b Cats_c Cats_d
0 3 1 0 0 0
1 2 0 1 0 0
2 1 0 0 1 0
3 0 0 0 0 1
4 1 0 0 1 0
5 2 0 1 0 0
6 3 1 0 0 0
Notice that it split out Cats but not Ords
Let's expand on this by adding another Cats2 column and calling pd.get_dummies
pd.get_dummies(df.assign(Cats2=df.Cats))
Ords Cats_a Cats_b Cats_c Cats_d Cats2_a Cats2_b Cats2_c Cats2_d
0 3 1 0 0 0 1 0 0 0
1 2 0 1 0 0 0 1 0 0
2 1 0 0 1 0 0 0 1 0
3 0 0 0 0 1 0 0 0 1
4 1 0 0 1 0 0 0 1 0
5 2 0 1 0 0 0 1 0 0
6 3 1 0 0 0 1 0 0 0
Interesting, it splits both object columns but not the numeric one.
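Since get_dummies on the whole frame skips the numeric column, one way to get dummies for the ordinal column as well is to name the columns explicitly through the columns parameter. A minimal sketch with the same df as above (the name encoded is illustrative):

```python
import pandas as pd

df = pd.DataFrame(dict(Cats=list('abcdcba'), Ords=[3, 2, 1, 0, 1, 2, 3]))

# Listing the columns explicitly makes get_dummies encode the numeric
# Ords column too; each dummy is named <column>_<value>.
encoded = pd.get_dummies(df, columns=['Cats', 'Ords'], dtype=int)
print(encoded)
```

Keep in mind that one-hot encoding discards the ordering of the ordinal values. Whether that matters depends on what the dummies are used for.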

Truth table with 5 inputs and 3 outputs

I have to make a truth table with 5 inputs and 3 outputs, something like this:
A B C D E red green blue
0 0 0 0 0 0 0 0
0 0 0 0 1 0 0 1
0 0 0 1 0 0 0 1
.
.
.
.
1 1 0 1 0 0 1 1
.
.
.
1 1 1 1 1 1 0 1
etc. (32 rows in total; the values in the red, green and blue columns are the binary representation of the number of 1's in each row, e.g. the row 1 1 0 1 0 contains three 1's, and three in binary is 0 1 1).
I would like to present the result in the Atanua (http://sol.gfxile.net/atanua/index.html) tool (so, for example, when I press button E the blue light will shine, when I press A, B and D the green and blue lights will shine, and so on). But there is a requirement that I can only use AND, OR and NOT gates, and each gate can have at most two inputs. Although I am using Karnaugh maps to minimize the functions, with so many rows the result for each output is still very long (especially for the last one).
I tried to simplify further by adding all three output Boolean functions into one, and the minimization went well:
A + B + C + D
It seems to work fine (but as there is only one output light, it only works for the red, green and blue columns separately). My concern is that I would like to have three outputs (three lights, not one). Is that even possible after this kind of minimization? Is there a good way to do it in Atanua? Or do I have to build 3 separate Boolean functions, no matter how long they are (and they are long even after minimization)?
EDIT: the whole truth table :)
A B C D E R G B
0 0 0 0 0 0 0 0
0 0 0 0 1 0 0 1
0 0 0 1 0 0 0 1
0 0 0 1 1 0 1 0
0 0 1 0 0 0 0 1
0 0 1 0 1 0 1 0
0 0 1 1 0 0 1 0
0 0 1 1 1 0 1 1
0 1 0 0 0 0 0 1
0 1 0 0 1 0 1 0
0 1 0 1 0 0 1 0
0 1 0 1 1 0 1 1
0 1 1 0 0 0 1 0
0 1 1 0 1 0 1 1
0 1 1 1 0 0 1 1
0 1 1 1 1 1 0 0
1 0 0 0 0 0 0 1
1 0 0 0 1 0 1 0
1 0 0 1 0 0 1 0
1 0 0 1 1 0 1 1
1 0 1 0 0 0 1 0
1 0 1 0 1 0 1 1
1 0 1 1 0 0 1 1
1 0 1 1 1 1 0 0
1 1 0 0 0 0 1 0
1 1 0 0 1 0 1 1
1 1 0 1 0 0 1 1
1 1 0 1 1 1 0 0
1 1 1 0 0 0 1 1
1 1 1 0 1 1 0 0
1 1 1 1 0 1 0 0
1 1 1 1 1 1 0 1
And the Karnaugh-map result for each color (~ is NOT, + is OR, and AND is written by placing literals next to each other):
RED:
BCDE+ACDE+ABDE+ABCE+ABCD
GREEN:
~A~BDE+~AC~DE+~ACD~E+~BCD~E+~AB~CE+B~CD~E+BC~D~E+A~B~CE+A~B~CD+A~BC~D+AB~C~D
BLUE:
~A~B~C~DE+~A~B~CD~E+~A~BC~D~E+~A~BCDE+~AB~C~D~E+~AB~CDE+~ABC~DE+~ABCD~E+A~B~C~D~E+A~B~CDE+A~BC~DE+A~BCD~E+AB~C~DE+AB~CD~E+ABC~D~E+ABCDE
I have to admit that the formulas are somewhat ugly, but they are not too complicated to implement with logic gates, because you can reuse parts.
A -----+------+------------- - - -
NOT |
+------|--AND- ~AB
| | |
AND-----|---|-- ~A~B
+--AND-+ |
| +--|---|-- A~B
NOT AND--|-- AB
B -----+------+---+---------- - - -
Here, as an example, I created all combinations of [not]A and [not]B. You can do the same for C and D. You can then get any combination of [not]A, [not]B, [not]C and [not]D by combining one wire from each "box" with an AND gate (e.g. for ABCD, take the AB wire AND the CD wire).
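Before wiring the gates, the three formulas can be checked against the truth table by brute force. The following is a rough sketch, not part of the original answer: eval_sop is a small helper written for this example that evaluates a sum-of-products string using only AND, OR and NOT, and the expected outputs are the bits of the number of 1's among the inputs, as described in the question.

```python
from itertools import product

# Formulas copied from the answer above (~ is NOT, + is OR, adjacency is AND).
RED = "BCDE+ACDE+ABDE+ABCE+ABCD"
GREEN = ("~A~BDE+~AC~DE+~ACD~E+~BCD~E+~AB~CE+B~CD~E+BC~D~E"
         "+A~B~CE+A~B~CD+A~BC~D+AB~C~D")
BLUE = ("~A~B~C~DE+~A~B~CD~E+~A~BC~D~E+~A~BCDE+~AB~C~D~E+~AB~CDE"
        "+~ABC~DE+~ABCD~E+A~B~C~D~E+A~B~CDE+A~BC~DE+A~BCD~E"
        "+AB~C~DE+AB~CD~E+ABC~D~E+ABCDE")


def eval_sop(expr, values):
    """Evaluate a sum-of-products string for a dict of 0/1 inputs."""
    for term in expr.split("+"):          # OR over the product terms
        term_true = True
        negate = False
        for ch in term:
            if ch == "~":                 # NOT applies to the next letter
                negate = True
                continue
            bit = values[ch]
            if negate:
                bit = 1 - bit
            term_true = term_true and bit == 1   # AND within a term
            negate = False
        if term_true:
            return 1
    return 0


mismatches = []
for bits in product((0, 1), repeat=5):
    values = dict(zip("ABCDE", bits))
    ones = sum(bits)
    # red, green, blue are the three bits of the count of 1's
    expected = ((ones >> 2) & 1, (ones >> 1) & 1, ones & 1)
    actual = (eval_sop(RED, values), eval_sop(GREEN, values),
              eval_sop(BLUE, values))
    if actual != expected:
        mismatches.append((bits, expected, actual))

print("mismatches:", mismatches)
```

An empty mismatch list means all three formulas agree with the 32-row table. The same loop can be reused to test any shorter formula found later.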

Extract columns from row values in Python

I am using this dataframe:
dfPredET.head(5)
id Class
1 Class_2
2 Class_1
3 Class_6
4 Class_2
5 Class_1
and I would like to transform it to indicate whether each instance belongs to a class (1) or not (0):
id Class_1 Class_2 Class_3 Class_4 Class_5 Class_6 Class_7 Class_8 Class_9
1 0 1 0 0 0 0 0 0 0
2 1 0 0 0 0 0 0 0 0
3 0 0 0 0 0 1 0 0 0
4 0 1 0 0 0 0 0 0 0
5 1 0 0 0 0 0 0 0 0
Can I do that using the pivot() function? And how?
Use get_dummies:
In [7]:
pd.get_dummies(df)
Out[7]:
id Class_Class_1 Class_Class_2 Class_Class_6
0 1 0 1 0
1 2 1 0 0
2 3 0 0 1
3 4 0 1 0
4 5 1 0 0
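The output above has doubled prefixes (Class_Class_1) and only contains the classes that occur in the data. To get closer to the layout asked for in the question, one possible approach is to encode just the Class column and reindex to the full list of classes. This is a sketch: the five rows are copied from the question, and the list of nine classes is assumed from the desired header shown there.

```python
import pandas as pd

# First five rows from the question.
dfPredET = pd.DataFrame({
    'id': [1, 2, 3, 4, 5],
    'Class': ['Class_2', 'Class_1', 'Class_6', 'Class_2', 'Class_1'],
})

# Assumed full set of classes, taken from the desired output in the question.
all_classes = ['Class_{}'.format(i) for i in range(1, 10)]

# Encoding the Series (not the frame) keeps the plain class names as column
# labels; reindex adds the classes that do not occur, filled with 0.
dummies = (pd.get_dummies(dfPredET['Class'], dtype=int)
             .reindex(columns=all_classes, fill_value=0))

result = pd.concat([dfPredET[['id']], dummies], axis=1)
print(result)
```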