Is there a way to specify the width of individual columns in a matplotlib table?
The first column in my table contains just 2-3 digit IDs, and I'd like this column to be smaller than the others, but I can't seem to get it to work.
Let's say I have a table like this:
import matplotlib.pyplot as plt
fig = plt.figure()
table_ax = fig.add_subplot(1,1,1)
table_content = [["1", "Daisy", "ill"],
                 ["2", "Topsy", "healthy"]]
table_header = ('ID', 'Name','Status')
the_table = table_ax.table(cellText=table_content, loc='center', colLabels=table_header, cellLoc='left')
fig.show()
(Never mind the weird cropping; it doesn't happen in my real table.)
What I've tried is this:
prop = the_table.properties()
cells = prop['child_artists']
for cell in cells:
    text = cell.get_text()
    if text == "ID":
        cell.set_width(0.1)
    else:
        try:
            int(text)
            cell.set_width(0.1)
        except TypeError:
            pass
The above code seems to have no effect; the columns are still all equally wide. (cell.get_width() returns 0.3333333333, so I would think that width is indeed the cell width.) So what am I doing wrong?
Any help would be appreciated!
I've been searching the web over and over again looking for solutions to similar problems. I've found some answers and used them, but I didn't find them very straightforward. By chance, I just found the table method get_celld while simply trying out different table methods.
It returns a dictionary whose keys are tuples giving each cell's (row, column) position in the table. So by writing
cellDict=the_table.get_celld()
cellDict[(0,0)].set_width(0.1)
you address the upper-left cell. Looping over rows or columns is then fairly easy.
A bit of a late answer, but hopefully it helps others.
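A minimal sketch of the get_celld() approach, assuming the table from the question: every cell in column 0 (including its header cell) gets a narrow width and the other two columns share the rest. The values in col_widths are illustrative and are fractions of the axes width. If you know the widths up front, Axes.table() also accepts a colWidths argument at creation time.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt

fig, table_ax = plt.subplots()
table_ax.axis("off")
table_content = [["1", "Daisy", "ill"],
                 ["2", "Topsy", "healthy"]]
table_header = ("ID", "Name", "Status")
the_table = table_ax.table(cellText=table_content, loc="center",
                           colLabels=table_header, cellLoc="left")

# Illustrative widths (fractions of the axes width); column 0 holds the IDs.
col_widths = {0: 0.1, 1: 0.45, 2: 0.45}

# get_celld() maps (row, col) -> Cell; row 0 is the column-header row.
for (row, col), cell in the_table.get_celld().items():
    cell.set_width(col_widths[col])

fig.canvas.draw()  # cell positions are laid out at draw time from the widths
```

Because every cell in a column is set to the same width, the column boundaries line up when the table is drawn.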
Just for completeness: the column header cells run from (0,0) to (0, ncols-1), and the row header cells (present only if you pass rowLabels) run from (1,-1) to (nrows,-1).
                   +--------------------+--------------------+
                   | ColumnHeader (0,0) | ColumnHeader (0,1) |
+------------------+--------------------+--------------------+
| rowHeader (1,-1) | Value (1,0)        | Value (1,1)        |
+------------------+--------------------+--------------------+
| rowHeader (2,-1) | Value (2,0)        | Value (2,1)        |
+------------------+--------------------+--------------------+
The code to print every cell's key and its text:
for key, cell in the_table.get_celld().items():
    print(str(key[0]) + ", " + str(key[1]) + "\t" + cell.get_text().get_text())
The condition text == "ID" is always False, because cell.get_text() returns a Text object rather than a string:
for cell in cells:
    text = cell.get_text()
    print(text, text == "ID")  # <==== here
    if text == "ID":
        cell.set_width(0.1)
    else:
        try:
            int(text)
            cell.set_width(0.1)
        except TypeError:
            pass
On the other hand, addressing the cells directly works: try cells[0].set_width(0.5).
EDIT: Text objects have a get_text() method of their own, so getting down to a cell's string can be done like this:
text = cell.get_text().get_text() # yup, looks weird
if text == "ID":
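Putting both answers together, here is a sketch of the question's loop that compares strings instead of Text objects. Two things differ from the original, as assumptions of this sketch: the cells come from get_celld() rather than properties(), and the exception caught is ValueError, because once text is a real string, int("Daisy") raises ValueError rather than TypeError.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt

fig, table_ax = plt.subplots()
table_ax.axis("off")
table_content = [["1", "Daisy", "ill"],
                 ["2", "Topsy", "healthy"]]
table_header = ("ID", "Name", "Status")
the_table = table_ax.table(cellText=table_content, loc="center",
                           colLabels=table_header, cellLoc="left")

for cell in the_table.get_celld().values():
    text = cell.get_text().get_text()  # Cell -> Text artist -> str
    if text == "ID":
        cell.set_width(0.1)
    else:
        try:
            int(text)  # succeeds only for the numeric ID cells
            cell.set_width(0.1)
        except ValueError:  # e.g. int("Daisy")
            pass

fig.canvas.draw()
```

Only the header "ID" and the numeric cells are narrowed; every other cell keeps the default width of 1/3.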
I am generating an excel file from Python using XlsxWriter.
I'm trying to "layer" formatting depending on the cell value. In pseudocode:
for rowId in rows:
    for colId in cols:
        value = table.loc[rowId, colId]
        if value < 0:
            # Make font red
for rowId in rows:
    for colId in cols:
        value = table.loc[rowId, colId]
        if value > 1e6:
            # Make bg orange
for rowId in rows:
    for colId in cols:
        value = table.loc[rowId, colId]
        if value is False:
            # Make bg purple
In practice I'm finding it hard to layer formatting without either creating a format object for each combination of attributes, or creating a format object per cell.
Any ideas of how this could be achieved?
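One pattern that avoids both a format per cell and a hand-written format per combination is sketched below under assumptions: merge the properties of every rule that matches a value into one dict, and create a Format only once per distinct combination, caching it. layered_formats, RULES and _is_number are hypothetical names; the only XlsxWriter call assumed is Workbook.add_format(), which takes a dict of properties. Rules apply in order, so a later rule wins when two set the same property. The `v is False` test mirrors the pseudocode and only matches a plain Python False. XlsxWriter's worksheet.conditional_format() is another option worth looking at, since Excel can apply several conditional formats to the same cell.

```python
import numbers


def _is_number(value):
    """True for real numbers (including NumPy scalars), False for bools."""
    return isinstance(value, numbers.Real) and not isinstance(value, bool)


# Hypothetical rules mirroring the pseudocode: (predicate, format properties).
RULES = [
    (lambda v: _is_number(v) and v < 0, {"font_color": "red"}),
    (lambda v: _is_number(v) and v > 1e6, {"bg_color": "orange"}),
    (lambda v: v is False, {"bg_color": "purple"}),
]


def layered_formats(workbook, rules):
    """Return a function mapping a cell value to a shared, merged Format.

    Each distinct combination of properties creates exactly one Format via
    workbook.add_format(); later rules override earlier ones on conflicts.
    """
    cache = {}

    def format_for(value):
        props = {}
        for predicate, rule_props in rules:
            if predicate(value):
                props.update(rule_props)  # layer this rule's properties
        if not props:
            return None  # no rule matched: write the cell unformatted
        key = tuple(sorted(props.items()))  # hashable id of the combination
        if key not in cache:
            cache[key] = workbook.add_format(dict(props))
        return cache[key]

    return format_for


# Usage with XlsxWriter (not run here):
#   workbook = xlsxwriter.Workbook("out.xlsx")
#   worksheet = workbook.add_worksheet()
#   format_for = layered_formats(workbook, RULES)
#   worksheet.write(row, col, value, format_for(value))
```

The number of Format objects is bounded by the number of property combinations that actually occur, not by the number of cells.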
I have a DataFrame, let's say xyz. I have written code to find the percentage of null values in each column of the DataFrame. My code is below:
round(100*(xyz.isnull().sum()/len(xyz.index)), 2)
Let's say I got the following results:
abc 26.63
def 36.58
ghi 78.46
I want to drop column ghi because it has more than 70% of null values.
I achieved it using the following code:
xyz = xyz.drop(xyz.loc[:,round(100*(xyz.isnull().sum()/len(xyz.index)), 2)>70].columns, 1)
But I don't understand how this code works. Can anyone please explain it?
The code is doing the following:
xyz.drop( [...], 1)
removes the specified labels along a given axis, either rows or columns. In this particular case, df.drop(..., 1) means you're dropping along axis 1, i.e. columns. (In pandas 2.0 and later the axis must be passed as a keyword, i.e. df.drop(..., axis=1).)
xyz.loc[:, ... ].columns
returns the column labels (as an Index) selected by your slicing condition.
round(100*(xyz.isnull().sum()/len(xyz.index)), 2)>70
This expression marks each null with isnull(), adds them up per column with sum(), and normalizes by the number of rows, effectively computing the percentage of NaN in each column. The percentage is then rounded to two decimal places, and the comparison returns True where the percentage of NaN is more than 70%. Hence you get a Boolean Series mapping each column to True/False.
Putting everything together: first you produce a Boolean array that marks which columns are more than 70% NaN; then .loc uses Boolean indexing to select only the columns you want to drop (NaN % > 70%); then .columns recovers the names of those columns, which are passed to .drop.
Hopefully this clears things up!
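To see each piece on its own, here is the same expression split into named steps on a small made-up DataFrame. The data and the names null_pct, too_sparse and cols_to_drop are illustrative, not from the question.

```python
import numpy as np
import pandas as pd

# Made-up data: "ghi" is 80% null, "abc" and "def" are 20% null.
xyz = pd.DataFrame({
    "abc": [1, 2, np.nan, 4, 5],
    "def": [1, np.nan, 3, 4, 5],
    "ghi": [np.nan, np.nan, np.nan, np.nan, 5],
})

# Step 1: percentage of nulls per column (a Series indexed by column name).
null_pct = round(100 * (xyz.isnull().sum() / len(xyz.index)), 2)

# Step 2: Boolean Series, True where a column is more than 70% null.
too_sparse = null_pct > 70

# Step 3: Boolean indexing on the column axis, then take the labels.
cols_to_drop = xyz.loc[:, too_sparse].columns

# Step 4: drop those columns, passing the axis by keyword.
xyz = xyz.drop(cols_to_drop, axis=1)
print(list(xyz.columns))  # ['abc', 'def']
```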
If your code is hard to understand, you can just use dropna with thresh instead, since pandas already covers this case:
df=df.dropna(axis=1,thresh=round(len(df)*0.3))
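A note on thresh: it is the minimum number of non-null values a column needs in order to be kept. Dropping columns that are more than 70% null is the same as keeping those with at least 30% non-null values, rounded up. With round() instead of a ceiling, a 7-row frame gets thresh=2, so a column that is 5/7 (about 71%) null would survive. A sketch using an integer ceiling, on made-up data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "abc": [1, 2, np.nan, 4, 5, 6, 7],                     # 1/7 null
    "ghi": [np.nan, np.nan, np.nan, np.nan, np.nan, 6, 7],  # 5/7 null
})

n = len(df)
# ceil(0.3 * n) computed with integers to avoid floating-point surprises.
min_non_null = -(-3 * n // 10)
cleaned = df.dropna(axis=1, thresh=min_non_null)
print(list(cleaned.columns))  # ['abc']
```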
I was looking everywhere for an answer, but I just couldn't find one for this problem (maybe I was just too stupid to apply other answers, because I'm new to R).
I have two data frames with different numbers of rows. I want to create a plot containing a single bar per data frame. Both bars should have the same length, and the counts of the different values should be stacked on top of each other. For example, I want to compare the proportions of gender in these two data sets.
t1<-data.frame(cbind(c(1:6), factor(c(1,2,2,1,2,2))))
t2<-data.frame(cbind(c(1:4), factor(c(1,2,2,1))))
1 represents male, 2 represents female
I want to create two bars next to each other showing that the gender ratio is 2:4 in the first data frame and 2:2 in the second.
My attempt looked like this:
ggplot() + geom_bar(aes(1, t1$X2, position = "fill")) + geom_bar(aes(1, t2$X2, position = "fill"))
That leads to the error: "Error: stat_count() must not be used with a y aesthetic."
First you should merge the two data frames. To identify where each row came from, add a column with an ID (like "t1" and "t2") to both data frames. Keep in mind that the column names must be the same in both frames so that you can use rbind.
t1$data <- "t1"
t2$data <- "t2"
t <- (rbind(t1,t2))
Now you can make the plot:
ggplot(t, aes(x = data, fill = factor(X2))) +
  geom_bar(position = "fill")  # count each gender, stacked and scaled to equal height
There is an inconsistency with DataFrames that I can't explain. In the following, I'm not looking for a workaround (I already found one) but for an explanation of what is going on under the hood and how it explains the output.
One of my colleagues, whom I talked into using Python and pandas, has a DataFrame "data" with 12,000 rows.
"data" has a column "length" that contains numbers from 0 to 20. She wants to divide the DataFrame into groups by length range: 0 to 9 in group 1, 10 to 14 in group 2, and 15 and more in group 3. Her solution was to add another column, "group", and fill it with the appropriate values. She wrote the following code:
data['group'] = np.nan
mask = data['length'] < 10
data['group'][mask] = 1
mask2 = (data['length'] > 9) & (data['length'] < 15)
data['group'][mask2] = 2
mask3 = data['length'] > 14
data['group'][mask3] = 3
This code is not good, of course. The reason is that you don't know at run time whether data['group'][mask3], for example, will be a view, and thus actually change the DataFrame, or a copy, leaving the DataFrame unchanged. It took me quite some time to explain this to her, since she argued correctly that she is doing an assignment, not a selection, so the operation should always return a view.
But that was not the strange part. The part that even I couldn't understand is this:
After performing this set of operations, we checked whether the assignments took place in two different ways:
By typing data in the console and examining the DataFrame summary. It told us we had a few thousand null values. The number of null values was the same as the size of mask3, so we assumed the last assignment was made on a copy and not on a view.
By typing data.group.value_counts(). That returned 3 values: 1, 2 and 3 (surprise). We then typed data.group.value_counts().sum() and it summed up to 12,000!
So by method 2, the group column contained no null values and all the values we wanted it to have. But by method 1, it didn't!
Can anyone explain this?
See the pandas documentation on returning a view versus a copy.
You don't want to set values this way, for exactly the reason you pointed out: since you don't know whether it's a view, you don't know that you are actually changing the data. pandas 0.13 will warn (or raise) when you attempt this, but it is easiest and best to just access it like this:
data.loc[mask3,'group'] = 3
which guarantees an in-place set (setitem).
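For completeness, a sketch of the colleague's whole grouping rewritten with .loc, on a small made-up frame standing in for the 12,000-row one. pd.cut is shown as an alternative (a suggestion of this sketch, not from the answer) that does the binning in one call; its bins are right-inclusive by default, so (9, 14] matches the 10 to 14 group.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the 12,000-row "data" frame.
data = pd.DataFrame({"length": [0, 5, 9, 10, 14, 15, 20]})

# Each assignment is a single .loc call on `data`, so it modifies `data`.
data["group"] = np.nan
data.loc[data["length"] < 10, "group"] = 1
data.loc[(data["length"] > 9) & (data["length"] < 15), "group"] = 2
data.loc[data["length"] > 14, "group"] = 3

# Alternative: bin in one step; (-inf, 9] -> 1, (9, 14] -> 2, (14, inf) -> 3.
binned = pd.cut(data["length"], bins=[-np.inf, 9, 14, np.inf], labels=[1, 2, 3])

print(data["group"].tolist())        # [1.0, 1.0, 1.0, 2.0, 2.0, 3.0, 3.0]
print(data["group"].isnull().sum())  # 0
```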
I am using LazyHighCharts in Rails 3. I need to show different colors within a single column.
When I use
series = {
  :type=> 'bar',
  :name=> [],
  :data=> @rooms,
  :color=> 'pink'
}
it displays the whole column in pink. Suppose I have 5 rows in a column and I want to show the first row in pink and the other four in green. Can anyone suggest a solution for this?
Thanks
I don't know if you have resolved this issue, but this is how I resolved it:
# In the controller, before setting data on the series option
data = []
your_array.each do |t|
  data << {:y=>t.value, :color=>"#"+("%06x" % (rand * 0xffffff))}
end
Then pass data to the series option:
f.series({:name=>"Subcurso", :data=>data} )