How to convert PASCAL VOC to YOLO - training-data

I was trying to develop some way to convert annotations between formats, and it's quit hard to find information but here I have :
This one is PASCAL VOC
<width>800</width>
<height>450</height>
<depth>3</depth>
<bndbox>
<xmin>474</xmin>
<ymin>2</ymin>
<xmax>726</xmax> <!-- shape_width = 252 -->
<ymax>449</ymax> <!-- shape_height = 447 -->
</bndbox>
convert to YOLO darknet
2 0.750000 0.501111 0.315000 0.993333
note initial 2 it's a category

My classmates and I have created a python package called PyLabel to help others with this task and other labelling tasks.
This would be the basic code to convert from voc to coco:
!pip install pylabel
from pylabel import importer
dataset = importer.ImportVOC(path=path_to_annotations)
dataset.exporter.ExportToYoloV5()
You can find sample notebooks and the source code here https://github.com/pylabel-project/pylabel

using some math: (also can be useful to COCO)
categ_index [(xmin + xmax) / 2 / image_width] [(ymin + ymax) / 2 / image_height] [(xmax - xmin) / image_width] [(ymax - ymin) / image_height]
in js code
const categ_index = 2;
const { width: image_width, height: image_height } = {
width: 800,
height: 450,
};
const { xmin, ymin, xmax, ymax } = {
xmin: 474,
ymin: 2,
xmax: 727,
ymax: 449,
};
const x_coord = (xmin + xmax) / 2 / image_width;
const y_coord = (ymin + ymax) / 2 / image_height;
const shape_width = (xmax - xmin) / image_width;
const shape_height = (ymax - ymin) / image_height;
console.log(`${categ_index} ${x_coord.toFixed(7)} ${y_coord.toFixed(7)} ${shape_width.toFixed(7)} ${shape_height.toFixed(7)}`);
// output
// 2 0.7506250 0.5011111 0.3162500 0.9933333

I use the following snippet to convert Pascal_VOC to YOLO. Yolo uses normalized coordinates so it is important to have the height and width of your image. Otherwise you can't calculate it.
Here is my snippet:
# Convert Pascal_Voc bb to Yolo
def pascal_voc_to_yolo(x1, y1, x2, y2, image_w, image_h):
return [((x2 + x1)/(2*image_w)), ((y2 + y1)/(2*image_h)), (x2 - x1)/image_w, (y2 - y1)/image_h]
I wrote an article about object detection format and how to convert them. You can check my blog post on Medium: https://christianbernecker.medium.com/convert-bounding-boxes-from-coco-to-pascal-voc-to-yolo-and-back-660dc6178742
Have fun!

Related

convert a .csv file to yolo darknet format

I have a few annotations that is originally in .csv format. I would need to convert it to yolo darknet format inorder to train my model with yolov4.
my .csv file :
YOLO format is : object-class x y width height
where, object_class, widht, height is know from my .csv format. But finding x,y is confusing .Note that x and y are center of rectangle (are not top-left corner).
Any help would be appreciated :)
You can use this function to convert bounding boxes to the yolo format. Of course you will need to write some code to read the csv. Just use this function as a template for your needs.
This function was extracted from the labelimg app:
https://github.com/tzutalin/labelImg/blob/master/libs/yolo_io.py
def BndBox2YoloLine(self, box, classList=[]):
xmin = box['xmin']
xmax = box['xmax']
ymin = box['ymin']
ymax = box['ymax']
xcen = float((xmin + xmax)) / 2 / self.imgSize[1]
ycen = float((ymin + ymax)) / 2 / self.imgSize[0]
w = float((xmax - xmin)) / self.imgSize[1]
h = float((ymax - ymin)) / self.imgSize[0]
# PR387
boxName = box['name']
if boxName not in classList:
classList.append(boxName)
classIndex = classList.index(boxName)
return classIndex, xcen, ycen, w, h

matplotlib: How to autoscale font-size so that text fits some bounding box

Problem
My matplotlib application generates user-defined dynamic images and so things like page title text can be of varying length. I want to be able to specify a bounding box to matplotlib and then have it auto-scale the font size so that the text fits within that bounding box. My application only uses the AGG backend.
My hack solution
I am the least sharp tool in the toolbox, but here is what I came up with for a solution to this problem. I brute force start at a fontsize of 50 and then iterate downward until I think I can fit the text into the box.
def fitbox(fig, text, x0, x1, y0, y1, **kwargs):
"""Fit text into a NDC box."""
figbox = fig.get_window_extent().transformed(
fig.dpi_scale_trans.inverted())
# need some slop for decimal comparison below
px0 = x0 * fig.dpi * figbox.width - 0.15
px1 = x1 * fig.dpi * figbox.width + 0.15
py0 = y0 * fig.dpi * figbox.height - 0.15
py1 = y1 * fig.dpi * figbox.height + 0.15
# print("px0: %s px1: %s py0: %s py1: %s" % (px0, px1, py0, py1))
xanchor = x0
if kwargs.get('ha', '') == 'center':
xanchor = x0 + (x1 - x0) / 2.
yanchor = y0
if kwargs.get('va', '') == 'center':
yanchor = y0 + (y1 - y0) / 2.
txt = fig.text(
xanchor, yanchor, text,
fontsize=50, ha=kwargs.get('ha', 'left'),
va=kwargs.get('va', 'bottom'),
color=kwargs.get('color', 'k')
)
for fs in range(50, 1, -2):
txt.set_fontsize(fs)
tbox = txt.get_window_extent(fig.canvas.get_renderer())
# print("fs: %s tbox: %s" % (fs, str(tbox)))
if (tbox.x0 >= px0 and tbox.x1 < px1 and tbox.y0 >= py0 and
tbox.y1 <= py1):
break
return txt
So then I can call this function like so
fitbox(fig, "Hello there, this is my title!", 0.1, 0.99, 0.95, 0.99)
Question/Feedback Request
Does matplotlib offer a better built-in solution for this problem?
Any significant downsides to this approach? The performance does not feel like a game breaker. I should likely make this function allow the specification of coordinates within a single axes and not the overall figure. Perhaps that already works :)
As an aside, I like how some other plotting applications allow the specifying of font-size in non-dimensional display coordinates. For example, PyNGL. So you can set it to fontsize=0.04 for example.
Thank you.
My implementation of auto-fit text in a box:
def text_with_autofit(self, txt, xy, width, height, *,
transform=None,
ha='center', va='center',
min_size=1, adjust=0,
**kwargs):
if transform is None:
if isinstance(self, Axes):
transform = self.transData
if isinstance(self, Figure):
transform = self.transFigure
x_data = {'center': (xy[0] - width/2, xy[0] + width/2),
'left': (xy[0], xy[0] + width),
'right': (xy[0] - width, xy[0])}
y_data = {'center': (xy[1] - height/2, xy[1] + height/2),
'bottom': (xy[1], xy[1] + height),
'top': (xy[1] - height, xy[1])}
(x0, y0) = transform.transform((x_data[ha][0], y_data[va][0]))
(x1, y1) = transform.transform((x_data[ha][1], y_data[va][1]))
# rectange region size to constrain the text
rect_width = x1 - x0
rect_height = y1- y0
fig = self.get_figure() if isinstance(self, Axes) else self
dpi = fig.dpi
rect_height_inch = rect_height / dpi
fontsize = rect_height_inch * 72
if isinstance(self, Figure):
text = self.text(*xy, txt, ha=ha, va=va, transform=transform,
**kwargs)
if isinstance(self, Axes):
text = self.annotate(txt, xy, ha=ha, va=va, xycoords=transform,
**kwargs)
while fontsize > min_size:
text.set_fontsize(fontsize)
bbox = text.get_window_extent(fig.canvas.get_renderer())
if bbox.width < rect_width:
break;
fontsize -= 1
if fig.get_constrained_layout():
text.set_fontsize(fontsize + adjust + 0.5)
if fig.get_tight_layout():
text.set_fontsize(fontsize + adjust)
return text

Tensorflow eager execution - nested gradient tape

I have been testing my WGAN-GP algorithm on TensorFlow using the traditional graph implementation. Recently, I happened to get to know TensorFlow Eager Execution and tried to convert my code to run on the eager execution mode.
Let me first show you the previous code:
self.x_ = self.g_net(self.z)
self.d = self.d_net(self.x, reuse=False)
self.d_ = self.d_net(self.x_)
self.d_loss = tf.reduce_mean(self.d) - tf.reduce_mean(self.d_)
epsilon = tf.random_uniform([], 0.0, 1.0)
x_hat = epsilon * self.x + (1 - epsilon) * self.x_
d_hat = self.d_net(x_hat)
ddx = tf.gradients(d_hat, x_hat)[0]
ddx = tf.sqrt(tf.reduce_sum(tf.square(ddx), axis=1))
ddx = tf.reduce_mean(tf.square(ddx - 1.0) * scale)
self.d_loss = self.d_loss + ddx
self.d_adam = tf.train.AdamOptimizer().minimize(self.d_loss, var_list=self.d_net.vars)
Then it is converted to:
self.x_ = self.g_net(self.z)
epsilon = tf.random_uniform([], 0.0, 1.0)
x_hat = epsilon * self.x + (1 - epsilon) * self.x_
with tf.GradientTape(persistent=True) as temp_tape:
temp_tape.watch(x_hat)
d_hat = self.d_net(x_hat)
ddx = temp_tape.gradient(d_hat, x_hat)[0]
ddx = tf.sqrt(tf.reduce_sum(tf.square(ddx), axis=1))
ddx = tf.reduce_mean(tf.square(ddx - 1.0) * 10)
with tf.GradientTape() as d_tape:
d = self.d_net(x)
d_ = self.d_net(x_)
loss_d = tf.reduce_mean(d) - tf.reduce_mean(d_) + ddx
grad_d = d_tape.gradient(loss_d, self.d_net.variables)
self.d_adam.apply_gradients(zip(grad_d, self.d_net.variables))
I tried several alternative ways to implement WGAN-GP loss, but in any way the d_loss is diverging! I hope there is someone who could enlighten me by pointing out my mistake(s).
Furthermore, I wonder whether I could use Keras layers with my previous loss and optimizer implementation. Thank you in advance!

How does Tensorflow Batch Normalization work?

I'm using tensorflow batch normalization in my deep neural network successfully. I'm doing it the following way:
if apply_bn:
with tf.variable_scope('bn'):
beta = tf.Variable(tf.constant(0.0, shape=[out_size]), name='beta', trainable=True)
gamma = tf.Variable(tf.constant(1.0, shape=[out_size]), name='gamma', trainable=True)
batch_mean, batch_var = tf.nn.moments(z, [0], name='moments')
ema = tf.train.ExponentialMovingAverage(decay=0.5)
def mean_var_with_update():
ema_apply_op = ema.apply([batch_mean, batch_var])
with tf.control_dependencies([ema_apply_op]):
return tf.identity(batch_mean), tf.identity(batch_var)
mean, var = tf.cond(self.phase_train,
mean_var_with_update,
lambda: (ema.average(batch_mean), ema.average(batch_var)))
self.z_prebn.append(z)
z = tf.nn.batch_normalization(z, mean, var, beta, gamma, 1e-3)
self.z.append(z)
self.bn.append((mean, var, beta, gamma))
And it works fine both for training and testing phases.
However I encounter problems when I try to use the computed neural network parameters in my another project, where I need to compute all the matrix multiplications and stuff by myself. The problem is that I can't reproduce the behavior of the tf.nn.batch_normalization function:
feed_dict = {
self.tf_x: np.array([range(self.x_cnt)]) / 100,
self.keep_prob: 1,
self.phase_train: False
}
for i in range(len(self.z)):
# print 0 layer's 1 value of arrays
print(self.sess.run([
self.z_prebn[i][0][1], # before bn
self.bn[i][0][1], # mean
self.bn[i][1][1], # var
self.bn[i][2][1], # offset
self.bn[i][3][1], # scale
self.z[i][0][1], # after bn
], feed_dict=feed_dict))
# prints
# [-0.077417567, -0.089603029, 0.000436493, -0.016652612, 1.0055743, 0.30664611]
According to the formula on the page https://www.tensorflow.org/versions/r1.2/api_docs/python/tf/nn/batch_normalization:
bn = scale * (x - mean) / (sqrt(var) + 1e-3) + offset
But as we can see,
1.0055743 * (-0.077417567 - -0.089603029)/(0.000436493^0.5 + 1e-3) + -0.016652612
= 0.543057
Which differs from the value 0.30664611, computed by Tensorflow itself.
So what am I doing wrong here and why I can't just calculate batch normalized value myself?
Thanks in advance!
The formula used is slightly different from:
bn = scale * (x - mean) / (sqrt(var) + 1e-3) + offset
It should be:
bn = scale * (x - mean) / (sqrt(var + 1e-3)) + offset
The variance_epsilon variable is supposed to scale with the variance, not with sigma, which is the square-root of variance.
After the correction, the formula yields the correct value:
1.0055743 * (-0.077417567 - -0.089603029)/((0.000436493 + 1e-3)**0.5) + -0.016652612
# 0.30664642276945747

Is there any way to use bivariate colormaps in matplotlib?

In other words, I want to make a heatmap (or surface plot) where the color varies as a function of 2 variables. (Specifically, luminance = magnitude and hue = phase.) Is there any native way to do this?
Some examples of similar plots:
Several good examples of exactly(?) what I want to do.
More examples from astronomy, but with non-perceptual hue
Edit: This is what I did with it: https://github.com/endolith/complex_colormap
imshow can take an array of [r, g, b] entries. So you can convert the absolute values to intensities and phases - to hues.
I will use as an example complex numbers, because for it it makes the most sense. If needed, you can always add numpy arrays Z = X + 1j * Y.
So for your data Z you can use e.g.
imshow(complex_array_to_rgb(Z))
where (EDIT: made it quicker and nicer thanks to this suggestion)
def complex_array_to_rgb(X, theme='dark', rmax=None):
'''Takes an array of complex number and converts it to an array of [r, g, b],
where phase gives hue and saturaton/value are given by the absolute value.
Especially for use with imshow for complex plots.'''
absmax = rmax or np.abs(X).max()
Y = np.zeros(X.shape + (3,), dtype='float')
Y[..., 0] = np.angle(X) / (2 * pi) % 1
if theme == 'light':
Y[..., 1] = np.clip(np.abs(X) / absmax, 0, 1)
Y[..., 2] = 1
elif theme == 'dark':
Y[..., 1] = 1
Y[..., 2] = np.clip(np.abs(X) / absmax, 0, 1)
Y = matplotlib.colors.hsv_to_rgb(Y)
return Y
So, for example:
Z = np.array([[3*(x + 1j*y)**3 + 1/(x + 1j*y)**2
for x in arange(-1,1,0.05)] for y in arange(-1,1,0.05)])
imshow(complex_array_to_rgb(Z, rmax=5), extent=(-1,1,-1,1))
imshow(complex_array_to_rgb(Z, rmax=5, theme='light'), extent=(-1,1,-1,1))
imshow will take an NxMx3 (rbg) or NxMx4 (grba) array so you can do your color mapping 'by hand'.
You might be able to get a bit of traction by sub-classing Normalize to map your vector to a scaler and laying out a custom color map very cleverly (but I think this will end up having to bin one of your dimensions).
I have done something like this (pdf link, see figure on page 24), but the code is in MATLAB (and buried someplace in my archives).
I agree a bi-variate color map would be useful (primarily for representing very dense vector fields where your kinda up the creek no matter what you do).
I think the obvious extension is to let color maps take complex arguments. It would require specialized sub-classes of Normalize and Colormap and I am going back and forth on if I think it would be a lot of work to implement. I suspect if you get it working by hand it will just be a matter of api wrangling.
I created an easy to use 2D colormap class, that takes 2 NumPy arrays and maps them to an RGB image, based on a reference image.
I used #GjjvdBurg's answer as a starting point. With a bit of work, this could still be improved, and possibly turned into a proper Python module - if you want, feel free to do so, I grant you all credits.
TL;DR:
# read reference image
cmap_2d = ColorMap2D('const_chroma.jpeg', reverse_x=True) # , xclip=(0,0.9))
# map the data x and y to the RGB space, defined by the image
rgb = cmap_2d(data_x, data_y)
# generate a colorbar image
cbar_rgb = cmap_2d.generate_cbar()
The ColorMap2D class:
class ColorMap2D:
def __init__(self, filename: str, transpose=False, reverse_x=False, reverse_y=False, xclip=None, yclip=None):
"""
Maps two 2D array to an RGB color space based on a given reference image.
Args:
filename (str): reference image to read the x-y colors from
rotate (bool): if True, transpose the reference image (swap x and y axes)
reverse_x (bool): if True, reverse the x scale on the reference
reverse_y (bool): if True, reverse the y scale on the reference
xclip (tuple): clip the image to this portion on the x scale; (0,1) is the whole image
yclip (tuple): clip the image to this portion on the y scale; (0,1) is the whole image
"""
self._colormap_file = filename or COLORMAP_FILE
self._img = plt.imread(self._colormap_file)
if transpose:
self._img = self._img.transpose()
if reverse_x:
self._img = self._img[::-1,:,:]
if reverse_y:
self._img = self._img[:,::-1,:]
if xclip is not None:
imin, imax = map(lambda x: int(self._img.shape[0] * x), xclip)
self._img = self._img[imin:imax,:,:]
if yclip is not None:
imin, imax = map(lambda x: int(self._img.shape[1] * x), yclip)
self._img = self._img[:,imin:imax,:]
if issubclass(self._img.dtype.type, np.integer):
self._img = self._img / 255.0
self._width = len(self._img)
self._height = len(self._img[0])
self._range_x = (0, 1)
self._range_y = (0, 1)
#staticmethod
def _scale_to_range(u: np.ndarray, u_min: float, u_max: float) -> np.ndarray:
return (u - u_min) / (u_max - u_min)
def _map_to_x(self, val: np.ndarray) -> np.ndarray:
xmin, xmax = self._range_x
val = self._scale_to_range(val, xmin, xmax)
rescaled = (val * (self._width - 1))
return rescaled.astype(int)
def _map_to_y(self, val: np.ndarray) -> np.ndarray:
ymin, ymax = self._range_y
val = self._scale_to_range(val, ymin, ymax)
rescaled = (val * (self._height - 1))
return rescaled.astype(int)
def __call__(self, val_x, val_y):
"""
Take val_x and val_y, and associate the RGB values
from the reference picture to each item. val_x and val_y
must have the same shape.
"""
if val_x.shape != val_y.shape:
raise ValueError(f'x and y array must have the same shape, but have {val_x.shape} and {val_y.shape}.')
self._range_x = (np.amin(val_x), np.amax(val_x))
self._range_y = (np.amin(val_y), np.amax(val_y))
x_indices = self._map_to_x(val_x)
y_indices = self._map_to_y(val_y)
i_xy = np.stack((x_indices, y_indices), axis=-1)
rgb = np.zeros((*val_x.shape, 3))
for indices in np.ndindex(val_x.shape):
img_indices = tuple(i_xy[indices])
rgb[indices] = self._img[img_indices]
return rgb
def generate_cbar(self, nx=100, ny=100):
"generate an image that can be used as a 2D colorbar"
x = np.linspace(0, 1, nx)
y = np.linspace(0, 1, ny)
return self.__call__(*np.meshgrid(x, y))
Usage:
Full example, using the constant chroma reference taken from here as a screenshot:
# generate data
x = y = np.linspace(-2, 2, 300)
xx, yy = np.meshgrid(x, y)
ampl = np.exp(-(xx ** 2 + yy ** 2))
phase = (xx ** 2 - yy ** 2) * 6 * np.pi
data = ampl * np.exp(1j * phase)
data_x, data_y = np.abs(data), np.angle(data)
# Here is the 2D colormap part
cmap_2d = ColorMap2D('const_chroma.jpeg', reverse_x=True) # , xclip=(0,0.9))
rgb = cmap_2d(data_x, data_y)
cbar_rgb = cmap_2d.generate_cbar()
# plot the data
fig, plot_ax = plt.subplots(figsize=(8, 6))
plot_extent = (x.min(), x.max(), y.min(), y.max())
plot_ax.imshow(rgb, aspect='auto', extent=plot_extent, origin='lower')
plot_ax.set_xlabel('x')
plot_ax.set_ylabel('y')
plot_ax.set_title('data')
# create a 2D colorbar and make it fancy
plt.subplots_adjust(left=0.1, right=0.65)
bar_ax = fig.add_axes([0.68, 0.15, 0.15, 0.3])
cmap_extent = (data_x.min(), data_x.max(), data_y.min(), data_y.max())
bar_ax.imshow(cbar_rgb, extent=cmap_extent, aspect='auto', origin='lower',)
bar_ax.set_xlabel('amplitude')
bar_ax.set_ylabel('phase')
bar_ax.yaxis.tick_right()
bar_ax.yaxis.set_label_position('right')
for item in ([bar_ax.title, bar_ax.xaxis.label, bar_ax.yaxis.label] +
bar_ax.get_xticklabels() + bar_ax.get_yticklabels()):
item.set_fontsize(7)
plt.show()
I know this is an old post, but want to help out others that may arrive late. Below is a python function to implement complex_to_rgb from sage. Note: This implementation isn't optimal, but it is readable. See links: (examples)(source code)
Code:
import numpy as np
def complex_to_rgb(z_values):
width = z_values.shape[0]
height = z_values.shape[1]
rgb = np.zeros(shape=(width, height, 3))
for i in range(width):
row = z_values[i]
for j in range(height):
# define value, real(value), imag(value)
zz = row[j]
x = np.real(zz)
y = np.imag(zz)
# define magnitued and argument
magnitude = np.hypot(x, y)
arg = np.arctan2(y, x)
# define lighness
lightness = np.arctan(np.log(np.sqrt(magnitude) + 1)) * (4 / np.pi) - 1
if lightness < 0:
bot = 0
top = 1 + lightness
else:
bot = lightness
top = 1
# define hue
hue = 3 * arg / np.pi
if hue < 0:
hue += 6
# set ihue and use it to define rgb values based on cases
ihue = int(hue)
# case 1
if ihue == 0:
r = top
g = bot + hue * (top - bot)
b = bot
# case 2
elif ihue == 1:
r = bot + (2 - hue) * (top - bot)
g = top
b = bot
# case 3
elif ihue == 2:
r = bot
g = top
b = bot + (hue - 2) * (top - bot)
# case 4
elif ihue == 3:
r = bot
g = bot + (4 - hue) * (top - bot)
b = top
# case 5
elif ihue == 4:
r = bot + (hue - 4) * (top - bot)
g = bot
b = top
# case 6
else:
r = top
g = bot
b = bot + (6 - hue) * (top - bot)
# set rgb array values
rgb[i, j, 0] = r
rgb[i, j, 1] = g
rgb[i, j, 2] = b
return rgb