How to plot ROC-AUC figures without using scikit-learn

I have the following list containing multiple tuples of (TP, FP, FN):
[(12, 0, 0), (5, 2, 2), (10, 0, 1), (7, 1, 1), (13, 0, 0), (7, 2, 2), (11, 0, 2)]
Each tuple holds the counts for a single image, so I have 7 images scored on an object detection task. I then calculate precision and recall for each image (tuple) using the following function:
def calculate_recall_precision(data):
    precisions_bundle = []
    recalls_bundle = []
    for tp, fp, fn in data:
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        precisions_bundle.append(precision)
        recalls_bundle.append(recall)
    return (precisions_bundle, recalls_bundle)
This function returns a tuple which contains two lists. The first one is precision values for each image and the second one is recall values for each image.
Now my main goal is to plot ROC and AUC curves using only matplotlib. Please note that I do not want to use scikit-learn library.

You can simply use the matplotlib.pyplot.plot method. For example:
from pathlib import Path
import numpy as np
import matplotlib.pyplot as plt
def plot_PR(precision_bundle, recall_bundle, save_path: Path = None):
    # Precision on the y axis against recall on the x axis
    line = plt.plot(recall_bundle, precision_bundle, linewidth=2, markersize=6)
    line = plt.title('Precision/Recall curve', size=18, weight='bold')
    line = plt.ylabel('Precision', size=15)
    line = plt.xlabel('Recall', size=15)
    # Dashed reference line from (0, 1) to (1, 0)
    random_classifier_line_x = np.linspace(0, 1, 10)
    random_classifier_line_y = np.linspace(1, 0, 10)
    _ = plt.plot(random_classifier_line_x, random_classifier_line_y, color='firebrick', linestyle='--')
    if save_path:
        outname = save_path / 'PR_curve_thresh_opt.png'
        _ = plt.savefig(outname, dpi=100, bbox_inches='tight')
    return line
and then just use it as plot_PR(precision_bundle, recall_bundle).
Note: here I also added a dashed line for a random classifier and the option to save the figure in case you want to.
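The tuples in the question carry no true negatives, so a false positive rate (and therefore a ROC curve) cannot be computed from them; the area under the precision/recall points is the closest quantity available. Below is a minimal sketch, assuming the per-image (recall, precision) pairs are treated as points on a single curve. That is an approximation, since a proper curve comes from sweeping a confidence threshold. The helper name pr_auc and the sample points are hypothetical, and the area is a hand-written trapezoidal rule:

```python
import numpy as np

def pr_auc(precisions, recalls):
    """Area under precision-vs-recall points via the trapezoidal rule."""
    r = np.asarray(recalls, dtype=float)
    p = np.asarray(precisions, dtype=float)
    # Sort by recall so the x values are non-decreasing before integrating
    order = np.argsort(r, kind="stable")
    r, p = r[order], p[order]
    # Sum of trapezoid areas between consecutive points
    return float(np.sum(np.diff(r) * (p[1:] + p[:-1]) / 2.0))

# Hypothetical points, deliberately given out of order
area = pr_auc([0.5, 1.0, 1.0], [1.0, 0.0, 0.5])
print(area)
```

Sorting by recall before plotting also keeps plt.plot from drawing a zigzag when the points arrive in image order.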

Related

How to limit scrolling outside bounds in pyqtgraph?

I've set up a chart for candlestick display using PyQtGraph.
I've made a simplified example below.
I'm trying to figure out how to limit the viewable/scrollable range on the chart for the y1 and y2 axis.
I want to limit them to equal the ymin and ymax settings in the boundingRect() function.
If I run the chart, it starts off with the bounds set correctly, but it lets you manually scroll around the chart outside the bounds set within boundingRect().
I want to prevent scrolling beyond what boundingRect() allows.
I want to be able to scroll along the X axis without issue but I want the Y axis to dynamically limit the bounds to the Y axis of the candlesticks that are currently viewable.
For starters I can't figure out how to force limits on scrollable bounds in a way that is compatible with what I have written below.
QPainterPath or QRectF doesn't seem to have a function to limit the scrollable view that I can find. Or at the very least I can't figure out the proper syntax.
Then I need to figure out how to return the axis range of the currently viewable candles in order to dynamically set the scrollable/viewable limits. Haven't gotten that far yet.
Any help is appreciated.
import pyqtgraph as pg
from pyqtgraph import QtCore, QtGui, QtWidgets
import numpy as np
data = np.array([ ## fields are (time, open, close).
(1., 10, 13),
(2., 13, 17),
(3., 17, 14),
(4., 14, 15),
(5., 15, 9),
(6., 9, 15),
(7., 15, 5),
(8., 5, 7),
(9., 7, 3),
(10., 3, 10),
(11., 10, 15),
(12., 15, 25),
(13., 25, 20),
(14., 20, 17),
(15., 17, 30),
(16., 30, 32),
(17., 32, 35),
(18., 35, 28),
(19., 28, 27),
(20., 27, 25),
(21., 25, 29),
(22., 29, 35),
(23., 35, 40),
(24., 40, 45),
])
class CandlestickItem(pg.GraphicsObject):
    global data
    _boundingRect = QtCore.QRectF()
    # ...
    def __init__(self):
        pg.GraphicsObject.__init__(self)
        self.flagHasData = False
    def set_data(self, data):
        self.data = data
        self.flagHasData = True
        self.generatePicture()
        self.informViewBoundsChanged()
    def generatePicture(self):
        self.picture = QtGui.QPicture()
        path = QtGui.QPainterPath()
        p = QtGui.QPainter(self.picture)
        p.setPen(pg.mkPen('w'))
        w = (self.data[1][0] - self.data[0][0]) / 3.
        for (t, open, close) in self.data:
            rect = QtCore.QRectF(t-w, open, w*2, close-open)
            path.addRect(rect)
            if open > close:
                p.setBrush(pg.mkBrush('r'))
            else:
                p.setBrush(pg.mkBrush('g'))
            p.drawRect(rect)
        p.end()
        self._boundingRect = path.boundingRect()
    def paint(self, p, *args):
        if self.flagHasData:
            p.drawPicture(0, 0, self.picture)
    def boundingRect(self):
        # data =data
        # xmin = np.nanmin(data[:,0])
        xmax = np.nanmax(data[:,0])
        xmin = xmax - 5
        ymin = np.nanmin(data[:,2])
        ymax = np.nanmax(data[:,2])
        return QtCore.QRectF(xmin, ymin, xmax-xmin, ymax-ymin)
item = CandlestickItem()
plt = pg.plot()
plt.addItem(item)
plt.setWindowTitle('pyqtgraph example: customGraphicsItem')
item.set_data(data)
if __name__ == '__main__':
    import sys
    if (sys.flags.interactive != 1) or not hasattr(QtCore, 'PYQT_VERSION'):
        QtWidgets.QApplication.instance().exec_()

If I understood you correctly, you want to prevent the user from zooming or scrolling outside the boundingRect of your candlestick plot. That can be achieved by setting limits on the ViewBox.
EDIT: the y view is now limited to the min and max values within the current view, given by x_slice_start and x_slice_end.
Here is the modified code that does that:
import math
import numpy as np
import pyqtgraph as pg
from pyqtgraph import QtCore, QtGui, QtWidgets
data = np.array([ ## fields are (time, open, close).
(1., 10, 13),
(2., 13, 17),
(3., 17, 14),
(4., 14, 15),
(5., 15, 9),
(6., 9, 15),
(7., 15, 5),
(8., 5, 7),
(9., 7, 3),
(10., 3, 10),
(11., 10, 15),
(12., 15, 25),
(13., 25, 20),
(14., 20, 17),
(15., 17, 30),
(16., 30, 32),
(17., 32, 35),
(18., 35, 28),
(19., 28, 27),
(20., 27, 25),
(21., 25, 29),
(22., 29, 35),
(23., 35, 40),
(24., 40, 45),
])
class CandlestickItem(pg.GraphicsObject):
    global data
    _boundingRect = QtCore.QRectF()
    # ...
    def __init__(self):
        pg.GraphicsObject.__init__(self)
        self.picture = QtGui.QPicture()
        self.flagHasData = False
    def set_data(self, data):
        self.data = data
        self.flagHasData = True
        self.generatePicture()
        self.informViewBoundsChanged()
    def generatePicture(self):
        self.picture = QtGui.QPicture()
        path = QtGui.QPainterPath()
        p = QtGui.QPainter(self.picture)
        p.setPen(pg.mkPen('w'))
        w = (self.data[1][0] - self.data[0][0]) / 3.
        for (t, open, close) in self.data:
            rect = QtCore.QRectF(t - w, open, w * 2, close - open)
            path.addRect(rect)
            if open > close:
                p.setBrush(pg.mkBrush('r'))
            else:
                p.setBrush(pg.mkBrush('g'))
            p.drawRect(rect)
        p.end()
        self._boundingRect = path.boundingRect()
    def paint(self, p, *args):
        if self.flagHasData:
            p.drawPicture(0, 0, self.picture)
    def boundingRect(self):
        return QtCore.QRectF(self.picture.boundingRect())
    def viewTransformChanged(self):
        super(CandlestickItem, self).viewTransformChanged()
        br = self.boundingRect()
        # Get coords of view mapped to data
        mapped_view = self.mapRectToView(self.viewRect())
        # Get start and end of x slice
        x_slice_start = int(mapped_view.x()) - 1
        x_slice_end = x_slice_start + (math.ceil(mapped_view.width()) + 1)
        if x_slice_start < 0:
            x_slice_start = 0
        if x_slice_end > data.shape[0]:
            x_slice_end = data.shape[0]
        # Get data in x interval
        y_slice = data[x_slice_start:x_slice_end]
        try:
            ymin = np.nanmin(y_slice[:, 2])
            ymax = np.nanmax(y_slice[:, 2])
            if ymin != ymax:
                # xMax is the right edge of the item (x + width), not its width
                self.getViewBox().setLimits(xMin=br.x(), xMax=br.right(), yMin=ymin, yMax=ymax)
        except ValueError:
            # An empty slice has no min/max; keep the previous limits
            pass
        return br
item = CandlestickItem()
plt = pg.plot()
plt.addItem(item)
plt.setWindowTitle('pyqtgraph example: customGraphicsItem')
item.set_data(data)
if __name__ == '__main__':
    import sys
    if (sys.flags.interactive != 1) or not hasattr(QtCore, 'PYQT_VERSION'):
        QtWidgets.QApplication.instance().exec_()
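The range computation inside viewTransformChanged can be tried without Qt. The following is a minimal sketch under a few assumptions: the helper name visible_y_range is hypothetical, it selects rows by comparing the time column with the visible x interval instead of slicing by position, and it looks at both the open and close columns, whereas the code above uses only the close column.

```python
import numpy as np

def visible_y_range(data, x_start, x_end):
    """Return (ymin, ymax) over open/close of rows with x_start <= time <= x_end."""
    data = np.asarray(data, dtype=float)
    mask = (data[:, 0] >= x_start) & (data[:, 0] <= x_end)
    if not mask.any():
        return None  # nothing visible, so the caller should keep its old limits
    visible = data[mask, 1:3]  # columns 1 and 2 are open and close
    return float(np.nanmin(visible)), float(np.nanmax(visible))

candles = np.array([(1., 10, 13), (2., 13, 17), (3., 17, 14), (4., 14, 15), (5., 15, 9)])
print(visible_y_range(candles, 2, 4))
```

The returned pair could then be passed as yMin and yMax to setLimits, in the same way as ymin and ymax above.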

How to convert a list of indices into a cell list (numpy array of lists) in numpy with vectorized implementation?

Cell list is a data structure that maintains lists of data points in an N-D meshgrid. For example, the following list of 2d indices:
ind = [(0, 1), (1, 0), (0, 1), (0, 0), (0, 0), (0, 0), (1, 1)]
is converted to the following 2x2 cell list:
cell = [[[3, 4, 5], [0, 2]],
        [[1, ], [6, ]]
       ]
using an O(n) algorithm:
# create an empty 2x2 cell list
cell = [[[] for _ in range(2)] for _ in range(2)]
for i in range(len(ind)):
    # nested lists need chained indexing; cell[a, b] only works on numpy arrays
    cell[ind[i][0]][ind[i][1]].append(i)
Is there a vectorized way in numpy that can convert the list of indices (ind) into the cell structure described above?

I don't think there is a good pure numpy solution, but you can either use pythran or, if you don't want to touch a compiler, scipy.sparse; cf. this Q&A, which is essentially a 1D version of your problem.
[stb_pthr.py] simplified from Most efficient way to sort an array into bins specified by an index array?
import numpy as np
#pythran export sort_to_bins(int[:], int)
def sort_to_bins(idx, mx=-1):
    if mx==-1:
        mx = idx.max() + 1
    cnts = np.zeros(mx + 1, int)
    for i in range(idx.size):
        cnts[idx[i] + 1] += 1
    for i in range(1, cnts.size):
        cnts[i] += cnts[i-1]
    res = np.empty_like(idx)
    for i in range(idx.size):
        res[cnts[idx[i]]] = i
        cnts[idx[i]] += 1
    return res, cnts[:-1]
Compile: pythran stb_pthr.py
Main script:
import numpy as np
try:
    from stb_pthr import sort_to_bins
    HAVE_PYTHRAN = True
except ImportError:
    HAVE_PYTHRAN = False
from scipy import sparse
def fallback(flat, maxind):
    res = sparse.csr_matrix((np.zeros_like(flat),flat,np.arange(len(flat)+1)),
                            (len(flat), maxind)).tocsc()
    return res.indices, res.indptr[1:-1]
if not HAVE_PYTHRAN:
    sort_to_bins = fallback
def to_cell(data, shape=None):
    data = np.asanyarray(data)
    if shape is None:
        *shape, = (1 + c.max() for c in data.T)
    flat = np.ravel_multi_index((*data.T,), shape)
    reord, bnds = sort_to_bins(flat, np.prod(shape))
    return np.frompyfunc(np.split(reord, bnds).__getitem__, 1, 1)(
        np.arange(np.prod(shape)).reshape(shape))
ind = [(0, 1), (1, 0), (0, 1), (0, 0), (0, 0), (0, 0), (1, 1)]
print(to_cell(ind))
from timeit import timeit
ind = np.transpose(np.unravel_index(np.random.randint(0, 100, (1_000_000)), (10, 10)))
if HAVE_PYTHRAN:
    print(timeit(lambda: to_cell(ind), number=10)*100)
    sort_to_bins = fallback # !!! MUST REMOVE THIS LINE AFTER TESTING
print(timeit(lambda: to_cell(ind), number=10)*100)
Sample run, output is answer to OP's toy example and timings (in ms) for the pythran and scipy solutions on a 1,000,000 points example:
[[array([3, 4, 5]) array([0, 2])]
[array([1]) array([6])]]
11.411489499732852
29.700406698975712
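For comparison, here is a minimal sketch that needs only numpy. It is not the method timed above: it swaps the counting sort for a stable argsort, so it runs in O(n log n) rather than O(n). The function name to_cell_argsort is hypothetical, and the only Python-level loop is over cells, not over points.

```python
import numpy as np

def to_cell_argsort(ind, shape):
    """Group point indices by cell using a stable argsort on flattened cell ids."""
    ind = np.asarray(ind)
    flat = np.ravel_multi_index(tuple(ind.T), shape)   # one cell id per point
    order = np.argsort(flat, kind="stable")            # point indices grouped by cell
    counts = np.bincount(flat, minlength=int(np.prod(shape)))
    pieces = np.split(order, np.cumsum(counts)[:-1])   # one index array per cell
    cell = np.empty(len(pieces), dtype=object)
    for k, piece in enumerate(pieces):                 # loop over cells only
        cell[k] = piece
    return cell.reshape(shape)

ind = [(0, 1), (1, 0), (0, 1), (0, 0), (0, 0), (0, 0), (1, 1)]
cell = to_cell_argsort(ind, (2, 2))
print(cell)
```

The stable sort keeps the indices inside each cell in their original order, which matches the toy example from the question.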

Multicolored graph based on data frame values

I'm plotting a chart based on the data frame below. I want to show the graph line in a different colour depending on the Condition column. I'm trying the following code, but it shows only one colour throughout the graph.
import numpy as np
import pandas as pd
df = pd.DataFrame(dict(
    Day=pd.date_range('2018-01-01', periods = 60, freq='D'),
    Utilisation = np.random.rand(60) * 100))
df = df.astype(dtype= {"Utilisation":"int64"})
df['Condition'] = np.where(df.Utilisation < 10, 'Winter',
                           np.where(df.Utilisation < 30, 'Summer', 'Spring'))
condition_map = {'Winter': 'r', 'Summer': 'k', 'Spring': 'b'}
df[['Utilisation','Day']].set_index('Day').plot(figsize=(10,4), rot=90,
                                                color=df.Condition.map(condition_map))

So, I assume you want a graph for each condition.
I would use groupby to separate the data.
import matplotlib.pyplot as plt
# Color setting
season_color = {'Winter': 'r', 'Summer': 'k', 'Spring': 'b'}
# Create figure and axes
f, ax = plt.subplots(figsize = (10, 4))
# Loop over and plot each group of data
for cond, data in df.groupby('Condition'):
    ax.plot(data.Day, data.Utilisation, color = season_color[cond], label = cond)
# Fix datelabels
f.autofmt_xdate()
f.legend()
f.show()
If you truly want the date ticks to be rotated 90 degrees, use autofmt_xdate(rotation = 90)
Update:
If you want to plot everything as a single line, it's a bit trickier, since a line can only have one color associated with it.
You could plot a line between each pair of points and split a line if it crosses a "color boundary", or check out this pyplot example: multicolored line
Another possibility is to plot a lot of scatter points between each pair of points and create your own colormap that represents your color boundaries.
To create a colormap (and norm) I use from_levels_and_colors:
import matplotlib.colors
colors = ['#00BEC5', '#a0c483', '#F9746A']
boundaries = [0, 10, 30, 100]
cm, nrm = matplotlib.colors.from_levels_and_colors(boundaries, colors)
To connect each point with the next you could shift the dataframe, but here I just zip the original df with a sliced version:
from itertools import islice
f, ax = plt.subplots(figsize = (10,4))
for (i,d0), (i,d1) in zip(df.iterrows(), islice(df.iterrows(), 1, None)):
    # hourly points between two consecutive days, coloured by interpolated value
    d_range = pd.date_range(d0.Day, d1.Day, freq = 'h')
    y_val = np.linspace(d0.Utilisation, d1.Utilisation, d_range.size)
    ax.scatter(d_range, y_val, c = y_val, cmap = cm, norm = nrm)
f.autofmt_xdate()
f.show()
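The "multicolored line" idea mentioned above can be sketched with matplotlib.collections.LineCollection. This is a minimal illustration under stated assumptions, not a drop-in replacement. It uses stand-in data with a plain day index on the x axis instead of dates, it colours each segment by the value at its starting point without splitting segments at a boundary, and it reuses the boundaries 0, 10, 30, 100 with the r/k/b colours from the question.

```python
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.colors as mcolors
from matplotlib.collections import LineCollection

# Stand-in data: day index on x (plain numbers, not dates), utilisation on y
rng = np.random.default_rng(0)
x = np.arange(60, dtype=float)
y = rng.integers(0, 100, size=60).astype(float)

# One segment per pair of consecutive points, shape (n - 1, 2, 2)
points = np.column_stack([x, y]).reshape(-1, 1, 2)
segments = np.concatenate([points[:-1], points[1:]], axis=1)

# Three colour bands: [0, 10), [10, 30), [30, 100)
cmap, norm = mcolors.from_levels_and_colors([0, 10, 30, 100], ['r', 'k', 'b'])

lc = LineCollection(segments, cmap=cmap, norm=norm, linewidth=2)
lc.set_array(y[:-1])  # colour each segment by the value at its starting point

fig, ax = plt.subplots(figsize=(10, 4))
ax.add_collection(lc)
# add_collection does not rescale the axes, so set the limits explicitly
ax.set_xlim(x.min(), x.max())
ax.set_ylim(0, 100)
```

Call plt.show() or fig.savefig(...) afterwards to see the result. Unlike the scatter approach, this draws one artist for the whole line.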

Colormap is not categorizing the data properly

Here is my script to plot data from a GeoTIFF file using basemap. The data is categorical and there are 13 categories within this domain. The problem is that some categories get bunched up into one colour and thus some resolution is lost.
Unfortunately, I do not know how to fix this. I read that plt.cm.get_cmap is better for discrete datasets, but I have not gotten it to work.
gtif = 'some_dir'
ds = gdal.Open(gtif)
data = ds.ReadAsArray()
gt = ds.GetGeoTransform()
proj = ds.GetProjection()
xres = gt[1]
yres = gt[5]
xmin = gt[0] + xres
xmax = gt[0] + (xres * ds.RasterXSize) - xres
ymin = gt[3] + (yres * ds.RasterYSize) + yres
ymax = gt[3] - yres
xy_source = np.mgrid[xmin:xmax+xres:xres, ymax+yres:ymin:yres]
ds = None
fig2 = plt.figure(figsize=[12, 11])
ax2 = fig2.add_subplot(111)
ax2.set_title("Land use plot")
bm2 = Basemap(ax=ax2,projection='cyl',llcrnrlat=ymin,urcrnrlat=ymax,llcrnrlon=xmin,urcrnrlon=xmax,resolution='l')
bm2.drawcoastlines(linewidth=0.2)
bm2.drawcountries(linewidth=0.2)
data_new=np.copy(data)
data_new[data_new==255] = 0
nbins = np.unique(data_new).size
cb =plt.cm.get_cmap('jet', nbins+1)
img2 =bm2.imshow(np.flipud(data_new), cmap=cb)
ax2.set_xlim(3, 6)
ax2.set_ylim(50,53)
plt.show()
labels = [str(i) for i in np.unique(data_new)]
cb2=bm2.colorbar(img2, "right", size="5%", pad='3%', label='NOAH Land Use Category')
cb2.set_ticklabels(labels)
cb2.set_ticks(np.unique(data_new))
Here are the categories that are found within the domain (numbered classes):
np.unique(data_new)
array([ 0, 1, 4, 5, 7, 10, 11, 12, 13, 14, 15, 16, 17], dtype=uint8)
Thanks so much for any help here. I have also attached the output image that shows the mismatch.

First, this colormap problem is independent of the use of basemap. The following is therefore applicable to any matplotlib plot.
The problem here is that creating a colormap from n values distributes those values equally over the colormap range. Because the category values themselves are not equally spaced, some values from the image fall into the same color range within the colormap.
To prevent this, one can generate a colormap with the initial number of categories as shown below.
import matplotlib.pyplot as plt
import numpy as np
import matplotlib.colors
# generate some data
data = np.array( [ 0, 1, 4, 5, 7, 10]*8 )
np.random.shuffle(data)
data = data.reshape((8,6))
# generate colormap and norm
unique = np.unique(data)
vals = np.arange(int(unique.max()+1))/float(unique.max())
cols = plt.cm.jet(vals)
cmap = matplotlib.colors.ListedColormap(cols)  # cols already holds unique.max()+1 colors
norm=matplotlib.colors.Normalize(vmin=-0.5, vmax=unique.max()+0.5)
fig, ax = plt.subplots(figsize=(5,5))
im = ax.imshow(data, cmap=cmap, norm=norm)
for i in range(data.shape[0]):
    for j in range(data.shape[1]):
        ax.text(j,i,data[i,j], color="w", ha="center", va="center")
cb = fig.colorbar(im, ax=ax, norm=norm)
cb.set_ticks(unique)
plt.show()
This can be extended to exclude the colors not present in the image as follows:
import matplotlib.pyplot as plt
import numpy as np
import matplotlib.colors
# generate some data
data = np.array( [ 0, 1, 4, 5, 7, 10]*8 )
np.random.shuffle(data)
data = data.reshape((8,6))
unique, newdata = np.unique(data, return_inverse=1)
newdata = newdata.reshape(data.shape)
# generate colormap and norm
new_unique = np.unique(newdata)
vals = np.arange(int(new_unique.max()+1))/float(new_unique.max())
cols = plt.cm.jet(vals)
cmap = matplotlib.colors.ListedColormap(cols)  # cols already holds new_unique.max()+1 colors
norm=matplotlib.colors.Normalize(vmin=-0.5, vmax=new_unique.max()+0.5)
fig, ax = plt.subplots(figsize=(5,5))
im = ax.imshow(newdata, cmap=cmap, norm=norm)
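for i in range(newdata.shape[0]):
    for j in range(newdata.shape[1]):
        ax.text(j,i,data[i,j], color="w", ha="center", va="center")
cb = fig.colorbar(im, ax=ax, norm=norm)
# pin one tick per category index before relabelling with the original values
cb.set_ticks(new_unique)
cb.ax.set_yticklabels(unique)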
plt.show()
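To see why categories collide in the question's setup, the lookup can be imitated with numpy alone. This is a minimal sketch of roughly what a linear normalisation followed by an N-entry colormap lookup does; matplotlib's exact edge handling is simplified here, and the names n_colors and slots are made up for the illustration.

```python
import numpy as np

categories = np.array([0, 1, 4, 5, 7, 10, 11, 12, 13, 14, 15, 16, 17])
n_colors = categories.size + 1  # mirrors get_cmap('jet', nbins + 1) in the question

# Linear normalisation to [0, 1], then lookup into n_colors equal-width slots
normalised = (categories - categories.min()) / (categories.max() - categories.min())
slots = np.minimum((normalised * n_colors).astype(int), n_colors - 1)

for value, slot in zip(categories, slots):
    print(value, "->", slot)

# The remedy used above: replace each value by its rank among the unique values
unique, ranks = np.unique(categories, return_inverse=True)
print(ranks)
```

Unevenly spaced values such as 0 and 1 end up in the same slot, while the ranks are consecutive integers and therefore get one color each.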

Adding an arbitrary line to a matplotlib plot in ipython notebook

I'm rather new to both python/matplotlib and using it through the ipython notebook. I'm trying to add some annotation lines to an existing graph and I can't figure out how to render the lines on a graph. So, for example, if I plot the following:
import numpy as np
np.random.seed(5)
x = arange(1, 101)
y = 20 + 3 * x + np.random.normal(0, 60, 100)
p = plot(x, y, "o")
I get the following graph:
So how would I add a vertical line from (70,100) up to (70,250)? What about a diagonal line from (70,100) to (90,200)?
I've tried a few things with Line2D() resulting in nothing but confusion on my part. In R I would simply use the segments() function which would add line segments. Is there an equivalent in matplotlib?

You can directly plot the lines you want by feeding the plot command with the corresponding data (boundaries of the segments):
plot([x1, x2], [y1, y2], color='k', linestyle='-', linewidth=2)
(of course you can choose the color, line width, line style, etc.)
From your example:
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(5)
x = np.arange(1, 101)
y = 20 + 3 * x + np.random.normal(0, 60, 100)
plt.plot(x, y, "o")
# draw vertical line from (70,100) to (70, 250)
plt.plot([70, 70], [100, 250], 'k-', lw=2)
# draw diagonal line from (70, 90) to (90, 200)
plt.plot([70, 90], [90, 200], 'k-')
plt.show()

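It's not too late for the newcomers.
plt.axvline(x, color='r') # vertical
plt.axhline(y, color='r') # horizontal
axvline can also be limited to part of the y range using ymin and ymax (and axhline likewise with xmin and xmax). Note that these are fractions of the axes (0 to 1), not data coordinates.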

Using vlines:
import numpy as np
np.random.seed(5)
x = arange(1, 101)
y = 20 + 3 * x + np.random.normal(0, 60, 100)
p = plot(x, y, "o")
vlines(70,100,250)
The basic call signatures are:
vlines(x, ymin, ymax)
hlines(y, xmin, xmax)

Rather than abusing plot or annotate, which will be inefficient for many lines, you can use matplotlib.collections.LineCollection:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.collections import LineCollection
np.random.seed(5)
x = np.arange(1, 101)
y = 20 + 3 * x + np.random.normal(0, 60, 100)
plt.plot(x, y, "o")
# Takes list of lines, where each line is a sequence of coordinates
l1 = [(70, 100), (70, 250)]
l2 = [(70, 90), (90, 200)]
lc = LineCollection([l1, l2], color=["k","blue"], lw=2)
plt.gca().add_collection(lc)
plt.show()
It takes a list of lines [l1, l2, ...], where each line is a sequence of N coordinates (N can be more than two).
The standard formatting keywords are available, accepting either a single value, in which case the value applies to every line, or a sequence of M values, in which case the value for the ith line is values[i % M].

Matplotlib now allows for 'annotation lines' as the OP was seeking. The annotate() function allows several forms of connecting paths, and a headless and tailless arrow, i.e., a simple line, is one of them.
ax.annotate("",
xy=(0.2, 0.2), xycoords='data',
xytext=(0.8, 0.8), textcoords='data',
arrowprops=dict(arrowstyle="-",
connectionstyle="arc3, rad=0"),
)
The documentation says that passing an empty string as the first argument draws only the arrow, without any text.
From the OP's example:
%matplotlib notebook
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(5)
x = np.arange(1, 101)
y = 20 + 3 * x + np.random.normal(0, 60, 100)
plt.plot(x, y, "o")
# draw vertical line from (70,100) to (70, 250)
plt.annotate("",
xy=(70, 100), xycoords='data',
xytext=(70, 250), textcoords='data',
arrowprops=dict(arrowstyle="-",
connectionstyle="arc3,rad=0."),
)
# draw diagonal line from (70, 90) to (90, 200)
plt.annotate("",
xy=(70, 90), xycoords='data',
xytext=(90, 200), textcoords='data',
arrowprops=dict(arrowstyle="-",
connectionstyle="arc3,rad=0."),
)
plt.show()
Just as in the approach in gcalmettes's answer, you can choose the color, line width, line style, etc.
Here is an alteration to a portion of the code that would make one of the two example lines red, wider, and not 100% opaque.
# draw vertical line from (70,100) to (70, 250)
plt.annotate("",
xy=(70, 100), xycoords='data',
xytext=(70, 250), textcoords='data',
arrowprops=dict(arrowstyle="-",
edgecolor = "red",
linewidth=5,
alpha=0.65,
connectionstyle="arc3,rad=0."),
)
You can also add curvature to the connecting line by adjusting the connectionstyle.
