XGBoost plot_importance cannot show full feature names - matplotlib

I am running an XGBoost regressor on a dataset and then plotting the feature importance.
Because my feature names are long (20-60 characters), they are "trimmed" in the saved image (on the left side of the image).
Code:
import os
from xgboost import plot_importance
import matplotlib.pyplot as plt
...
plot_importance(search.best_estimator_)
plt.savefig(os.path.join(self.model_save_directory, "feature_importance.png"))
I tried making the image wider with:
plt.rcParams["figure.figsize"] = (28, 7)
Now more characters of each feature name are visible, but still not all of them. On the other hand, the height of the image has shrunk a lot.

I was able to show the full feature names with:
plot_importance(search.best_estimator_, max_num_features=None)
plt.xticks(rotation=90)
# bbox_inches='tight' enlarges the saved area so the long y-axis labels are not cut off
plt.savefig(os.path.join(self.model_save_directory, "feature_importance.png"), bbox_inches='tight')
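Why bbox_inches='tight' is the part that matters: plot_importance draws a horizontal bar chart with the feature names as y-tick labels, and long labels run past the left edge of the figure. The tight bounding box grows the saved image to include them. A minimal sketch of the effect with plain matplotlib (no XGBoost), using made-up feature names and scores:

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend, nothing is shown on screen
import matplotlib.pyplot as plt

# Hypothetical long feature names and scores, standing in for a real model
names = [f"a_rather_long_and_descriptive_feature_name_{i:02d}" for i in range(10)]
scores = list(range(10, 0, -1))

fig, ax = plt.subplots(figsize=(6, 4))
ax.barh(names, scores)  # names become y-tick labels, as in plot_importance
ax.set_xlabel("F score")


def saved_png_size(**savefig_kwargs):
    """Save the figure to an in-memory PNG and return its (width, height) in pixels."""
    buf = io.BytesIO()
    fig.savefig(buf, format="png", dpi=100, **savefig_kwargs)
    buf.seek(0)
    height, width = plt.imread(buf).shape[:2]
    return width, height


default_size = saved_png_size()                   # 600 x 400, labels clipped
tight_size = saved_png_size(bbox_inches="tight")  # wider, labels included
print(default_size, tight_size)
```

Calling fig.tight_layout() before saving is another option; it shrinks the axes to make room for the labels instead of enlarging the image.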


How to save histogram plots from Tensorboard 2 to disk, just like you can do with scalars?

I am using Tensorboard 2 to visualize my training data and I am able to save scalar plots to disk. However, I am unable to find a way to do this for histogram plots (tf.summary.histogram).
Is it possible to save histogram plots from Tensorboard 2 to disk, just like it is possible to do with scalars? I have looked through the documentation and it seems like this is not supported, but I wanted to confirm with the community before giving up. Any help or suggestions would be greatly appreciated.
There is an open issue to add a download button for histograms. However, this issue has been open for more than 4 years, so I doubt it will be resolved soon.
A workaround is to use the URL that TensorBoard itself uses to fetch the data.
A short example:
# writing some data to tensorboard
from torch.utils.tensorboard import SummaryWriter
import numpy as np
writer = SummaryWriter('./tmp')
writer.add_histogram('hist', np.arange(10), 0)
writer.close()  # flush the event file to disk
Start TensorBoard (tensorboard --logdir ./tmp) and open it in the browser (here localhost:6006).
Get the data as JSON using the URL template
http://<tb-host>/data/plugin/histograms/histograms?run=<run-name>&tag=<tag-name>
For this example: http://localhost:6006/data/plugin/histograms/histograms?run=.&tag=hist
Now you can download the data as JSON (saved here as histograms.json).
Quick comparison with matplotlib:
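import json
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
with open('histograms.json', 'r') as f:
    # first event; each bucket is [left edge, right edge, count]
    d = pd.DataFrame(json.load(f)[0][2])
fig, axes = plt.subplots(1, 2, figsize=(10, 3))
axes[0].bar(d[1], d[2])
axes[0].set_title('tb')
axes[1].hist(np.arange(10))  # the same data that was written to TensorBoard
axes[1].set_title('original')
plt.show()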
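If you need this for many runs or tags, the download step can be scripted. A minimal sketch using only the standard library, assuming TensorBoard is running at the given host and that each event in the response has the [wall_time, step, buckets] layout that the [0][2] indexing above relies on. The helper names are hypothetical:

```python
import json
import urllib.parse
import urllib.request


def histogram_url(host, run, tag):
    """Build the histograms-plugin data URL from the template above."""
    query = urllib.parse.urlencode({"run": run, "tag": tag})
    return f"http://{host}/data/plugin/histograms/histograms?{query}"


def download_histograms(host, run, tag, path="histograms.json"):
    """Fetch the JSON for one run/tag from a running TensorBoard and save it to disk."""
    with urllib.request.urlopen(histogram_url(host, run, tag)) as resp:
        payload = json.load(resp)
    with open(path, "w") as f:
        json.dump(payload, f)
    return payload


def buckets(payload, index=0):
    """Split one event's buckets into left edges, right edges and counts.

    Assumes each bucket is [left, right, count].
    """
    rows = payload[index][2]
    lefts = [r[0] for r in rows]
    rights = [r[1] for r in rows]
    counts = [r[2] for r in rows]
    return lefts, rights, counts
```

With TensorBoard running, download_histograms("localhost:6006", ".", "hist") writes the same histograms.json that the comparison code reads.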

Extract curve from image

I have trouble with a seemingly easy image processing task. I need to extract a curve from a medical image (the attached image was fetched from an open-source database).
Raw image
I have about 50 more images like the attached one; they vary slightly from one another. I'm afraid that is too small a dataset for an ML approach.
I tried the following approach, a modification of which works well for blood vessels:
import numpy as np
import cv2 as cv
from matplotlib import pyplot as plt
# read as a single-channel grayscale image
img_gray = cv.imread('./img.png', cv.IMREAD_GRAYSCALE)
img_gray[img_gray < 70] = 0
# boost contrast; clip before casting so values above 255 do not wrap around
img_gray = cv.blur(np.clip(img_gray * 1.5, 0, 255), (5, 5)).astype(np.uint8)
img_gray = 255 - img_gray
_, img_gray = cv.threshold(img_gray, 150, 255, cv.THRESH_BINARY_INV)
plt.axis('off')
plt.imshow(img_gray, cmap='gray')
plt.savefig('output.png', dpi=600, pad_inches=0, bbox_inches='tight')
Then we have the following result:
Processed curve
Any ideas on how to improve the filtering?
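Once the filtering yields a clean binary mask, the curve itself can be read off column by column. A minimal sketch of one possible follow-up step (not part of the approach above), assuming the curve pixels are the non-zero ones in a 2-D mask such as the thresholded img_gray (invert the mask first if not); mask_to_curve is a hypothetical helper:

```python
import numpy as np


def mask_to_curve(mask):
    """Return (xs, ys): for each column containing curve pixels, their mean row index.

    `mask` is a 2-D array whose non-zero pixels belong to the curve.
    Columns with no curve pixels are skipped.
    """
    mask = np.asarray(mask) > 0
    rows = np.arange(mask.shape[0])[:, None]  # column vector of row indices
    counts = mask.sum(axis=0)                 # curve pixels per column
    xs = np.nonzero(counts)[0]
    ys = (mask * rows).sum(axis=0)[xs] / counts[xs]
    return xs, ys
```

Averaging per column also smooths out a curve that is several pixels thick; the result can be plotted with plt.plot(xs, ys) on top of the original image.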

Matplotlib: emoji font does not work when using backend_pdf

I want to use the emoji font "Symbola.ttf" to label my plots. This works when I use plt.show(), but not when using backend_pdf: only two emojis are shown, in a mixed-up order.
Example images: one rendered with plt.show(), one saved with backend_pdf.
Here is the code that produces these examples:
import matplotlib.backends.backend_pdf
import matplotlib.pyplot as plt
import emoji
from matplotlib.font_manager import FontProperties
# EMOJI_UNICODE is the old emoji-package API; newer releases use emoji.EMOJI_DATA instead
emojis = [emoji.EMOJI_UNICODE[e] for e in list(emoji.EMOJI_UNICODE.keys())[620:630]]
prop = FontProperties(fname='./Symbola.ttf', size=30)
# backend_pdf plot
pdf = matplotlib.backends.backend_pdf.PdfPages("output.pdf")
plt.xticks(range(len(emojis)), emojis, fontproperties=prop)
pdf.savefig()
pdf.close()
# plt.show() plot
plt.xticks(range(len(emojis)), emojis, fontproperties=prop)
plt.show()
I'm running this on a Linux machine.
I think I have found the problem: my copy of Symbola.ttf seems to have been broken. With a working copy of the .ttf file, everything works great.
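Since the culprit turned out to be a broken font file, a quick sanity check is to ask FreeType which characters the file actually maps to glyphs. A minimal sketch using matplotlib's FT2Font; missing_glyphs is a hypothetical helper, and multi-codepoint emoji are checked one codepoint at a time:

```python
from matplotlib import font_manager
from matplotlib.ft2font import FT2Font


def missing_glyphs(font_path, text):
    """Return the characters of `text` (deduplicated, in order) that the font does not map."""
    charmap = FT2Font(font_path).get_charmap()  # {codepoint: glyph index}
    return [ch for ch in dict.fromkeys(text) if ord(ch) not in charmap]


# Example with the DejaVu Sans font that ships with matplotlib
dejavu = font_manager.findfont("DejaVu Sans")
print(missing_glyphs(dejavu, "abc"))
```

For the question's case, missing_glyphs('./Symbola.ttf', ''.join(emojis)) would list the emojis the file cannot render.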

Figures with lots of data points in matplotlib

I generated the attached image using matplotlib (PNG format). I would like to use EPS or PDF, but with all the data points the figure is really slow to render on screen. Other than plotting fewer points, is there any way to optimize it so that it loads faster?
I think you have three options:
As you mentioned yourself, you can plot fewer points. For the plot you showed in your question I think it would be fine to only plot every other point.
As @tcaswell stated in his comment, you can draw a line instead of individual points, which is rendered more efficiently.
You could rasterize the blue dots. Matplotlib allows you to selectively rasterize single artists, so if you pass rasterized=True to the plotting command you will get a bitmapped version of the points in the output file. This will be much faster to load, at the price of limited zooming due to the resolution of the bitmap. (Note that the axes and all other elements of the plot remain vector graphics and text.)
First, if you want to show a "trend" in your plot and the x, y arrays you are plotting are "huge", you could randomly sub-sample a fraction of your data:
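import numpy as np
import matplotlib.pyplot as plt
# example data standing in for your own (huge) x, y arrays
x = np.random.uniform(size=400000)
y = np.random.uniform(size=400000)
fraction = 0.50
# keep each point independently with probability `fraction`
keep = np.random.rand(len(x)) < fraction
x_resampled = x[keep]
y_resampled = y[keep]
plt.scatter(x_resampled, y_resampled, s=6)
plt.show()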
Second, have you considered using log-scale in the x-axis to increase visibility?
In this example, only the plotting area is rasterized; the axes are still in vector format:
import numpy as np
import matplotlib.pyplot as plt
x = np.random.uniform(size=400000)
y = np.random.uniform(size=400000)
# rasterized=True turns only this scatter into a bitmap inside the PDF
plt.scatter(x, y, marker='x', rasterized=True)
plt.savefig("norm.pdf", format='pdf')
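To see what rasterized=True buys, you can compare the PDF size with and without it. A minimal sketch, assuming 100,000 random points (fewer than above, to keep it quick) and an in-memory buffer instead of a file; scatter_pdf_bytes is a hypothetical helper:

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless; the PDF itself is written by backend_pdf
import matplotlib.pyplot as plt
import numpy as np


def scatter_pdf_bytes(rasterized, n=100_000, seed=0):
    """Return the size in bytes of a PDF containing an n-point scatter plot."""
    rng = np.random.default_rng(seed)
    x, y = rng.uniform(size=n), rng.uniform(size=n)
    fig, ax = plt.subplots()
    ax.scatter(x, y, marker="x", rasterized=rasterized)
    buf = io.BytesIO()
    fig.savefig(buf, format="pdf")
    plt.close(fig)  # free the figure; only the byte count is needed
    return buf.getbuffer().nbytes


vector_size = scatter_pdf_bytes(rasterized=False)
raster_size = scatter_pdf_bytes(rasterized=True)
print(f"vector: {vector_size} bytes, rasterized: {raster_size} bytes")
```

The rasterized points are stored as a bitmap at the savefig dpi, so passing a higher dpi to savefig trades file size for zoomability.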

inline images have low quality

I'm loading a TIF file with scikit-image and displaying it inline in an IPython notebook (version 2.2.0). This works; however, the image is quite small when first displayed, and when I resize it with the draggable handle at the bottom right of the image, it just rescales the image while keeping the original resolution, so it is very blurry when enlarged. It's basically as if IPython converts my image into a thumbnail on the fly.
I've tried using matplotlib's plt.imshow() as well, which has the exact same result. I'm starting the notebook with ipython notebook --pylab inline.
from skimage import io
import matplotlib.pyplot as plt
image_stack = io.MultiImage("my_image.tif")
image = image_stack[0] # it's a multi-page TIF, this gets the first image in the stack
io.imshow(image) # or plt.imshow(image)
io.show() # or plt.show()
To change the figure resolution of "%matplotlib inline" in the notebook, do:
import matplotlib as mpl
mpl.rcParams['figure.dpi']= dpi
I recommend setting the dpi somewhere between 150 and 300 if you are going to download or print the notebook. Make sure that %matplotlib inline runs before mpl.rcParams['figure.dpi'] = dpi; otherwise the magic command resets the dpi to its default value (credit to @fabioedoardoluigialberto for noticing this).
The snippet below only changes the dpi of figures saved through the savefig method, not of inline-generated figures.
import matplotlib as mpl
mpl.rc("savefig", dpi=dpi)
According to https://www.reddit.com/r/IPython/comments/1kg9e2/ipython_tip_for_getting_better_quality_inline/
You could also execute this magic in your cell:
%config InlineBackend.figure_format = 'svg'
The print quality will look significantly better. You can also change svg to retina to use higher-resolution PNGs (not as nice). Note, however, that if your figure becomes too complicated, the SVG will be much larger than the retina PNG.
The resolution of inline matplotlib figures is downscaled a bit from what you would see in a GUI window or saved image, presumably to save space in the notebook file. To change it, you can do:
import matplotlib as mpl
mpl.rc("figure", dpi=dpi)
Here dpi is a number that controls the size/resolution of the inline plots. I believe the inline default is 80, and the default elsewhere in matplotlib is 100.
The reason scaling the resulting plot by dragging the handle doesn't work is that the plot is rendered as a png, so scaling it zooms but does not change the intrinsic resolution.
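The pixel size of a rendered PNG is the figure size in inches times the dpi, which is why raising the dpi gives the inline image more real pixels, while dragging the handle only stretches the existing ones. A minimal sketch that renders off-screen and measures the result; png_pixel_size is a hypothetical helper and the image data is a placeholder:

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend, renders to an in-memory PNG
import matplotlib.pyplot as plt
import numpy as np


def png_pixel_size(figsize, dpi):
    """Render a small imshow figure to PNG and return its (width, height) in pixels."""
    fig = plt.figure(figsize=figsize, dpi=dpi)
    ax = fig.add_subplot(111)
    ax.imshow(np.arange(100).reshape(10, 10))  # placeholder image data
    buf = io.BytesIO()
    fig.savefig(buf, format="png", dpi=dpi)  # same dpi as the figure
    plt.close(fig)
    buf.seek(0)
    height, width = plt.imread(buf).shape[:2]
    return width, height


print(png_pixel_size((4, 3), dpi=80))   # (320, 240)
print(png_pixel_size((4, 3), dpi=160))  # (640, 480)
```

Doubling either the dpi or the figsize doubles the pixel count along each axis.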
Assuming this is the same thing that happens in the IPython notebook (with %matplotlib inline) when you drag to resize the image, the fix is fairly simple.
If you just create a figure with a larger default size, the resolution increases along with it (see the question "Change resolution of imshow in ipython"). For example:
import matplotlib.pyplot as plt
fig = plt.figure(figsize=(10, 10))
ax = fig.add_subplot(111)
ax.imshow(array)  # array: your image data
Something like this should increase the resolution of the thing you are trying to plot. This seemed to work for me with your code:
from skimage import io
import matplotlib.pyplot as plt
%matplotlib inline
image_stack = io.MultiImage("my_image.tif")
image = image_stack[0]
fig = plt.figure(figsize=(20, 20))  # create an empty 20x20-inch figure to plot into
io.imshow(image)
io.show()