convert a .csv file to yolo darknet format - object-detection

I have a few annotations that are originally in .csv format. I need to convert them to YOLO Darknet format in order to train my model with YOLOv4.
My .csv file:
YOLO format is: object-class x y width height
where object_class, width, and height are known from my .csv file. But finding x and y is confusing. Note that x and y are the center of the rectangle (not the top-left corner).
Any help would be appreciated :)

You can use this function to convert bounding boxes to the yolo format. Of course you will need to write some code to read the csv. Just use this function as a template for your needs.
This function was extracted from the labelimg app:
https://github.com/tzutalin/labelImg/blob/master/libs/yolo_io.py
def BndBox2YoloLine(self, box, classList=[]):
    xmin = box['xmin']
    xmax = box['xmax']
    ymin = box['ymin']
    ymax = box['ymax']
    xcen = float((xmin + xmax)) / 2 / self.imgSize[1]
    ycen = float((ymin + ymax)) / 2 / self.imgSize[0]
    w = float((xmax - xmin)) / self.imgSize[1]
    h = float((ymax - ymin)) / self.imgSize[0]
    # PR387
    boxName = box['name']
    if boxName not in classList:
        classList.append(boxName)
    classIndex = classList.index(boxName)
    return classIndex, xcen, ycen, w, h
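As a rough sketch of how the function above could be wired up to a CSV file: the column names (`class`, `xmin`, `ymin`, `xmax`, `ymax`, `img_width`, `img_height`) and the helper names `box_to_yolo` and `csv_to_yolo_lines` are assumptions made for illustration, since the actual layout of the .csv in the question is not shown. If your file stores a top-left corner plus width and height instead, compute xmax = xmin + width and ymax = ymin + height first.

```python
import csv
import io


def box_to_yolo(xmin, ymin, xmax, ymax, img_w, img_h):
    """Convert pixel corner coordinates to normalized YOLO center/size values."""
    xcen = (xmin + xmax) / 2.0 / img_w
    ycen = (ymin + ymax) / 2.0 / img_h
    w = (xmax - xmin) / img_w
    h = (ymax - ymin) / img_h
    return xcen, ycen, w, h


def csv_to_yolo_lines(csv_file, class_list):
    """Read rows from an open CSV file and return one YOLO label line per row.

    class_list is extended in place so class indices stay stable across files.
    """
    lines = []
    for row in csv.DictReader(csv_file):
        name = row["class"]  # hypothetical column name
        if name not in class_list:
            class_list.append(name)
        xcen, ycen, w, h = box_to_yolo(
            float(row["xmin"]), float(row["ymin"]),
            float(row["xmax"]), float(row["ymax"]),
            float(row["img_width"]), float(row["img_height"]),
        )
        lines.append(
            f"{class_list.index(name)} {xcen:.6f} {ycen:.6f} {w:.6f} {h:.6f}"
        )
    return lines


# Made-up sample data, only to show the expected shape of the input.
SAMPLE_CSV = (
    "class,xmin,ymin,xmax,ymax,img_width,img_height\n"
    "car,100,50,300,150,640,480\n"
    "person,0,0,64,48,640,480\n"
)

classes = []
for line in csv_to_yolo_lines(io.StringIO(SAMPLE_CSV), classes):
    print(line)
```

Darknet expects one .txt file per image, so in practice you would group the rows by image file name and write each group to its own label file.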

Related

Automatic annotation for yolo not working

I am trying to generate annotations for image files that I have created for training. I paste an object image on top of a background image and record the x, y coordinates of the location where the object image is pasted.
The bounding box for the pasted object is calculated as (x, (x+w), y, (y+h)):
box = (x, (w+w), y , (y+h)) # w,h are width and height of the object image
I am converting this to yolo annotation using this function :
def convert_boxes_to_yolo(box, frame):
    # frame is a tuple containing background image width and height
    # x = box[0][0]
    # y = box[0][1]
    # w = box[1][0] - box[0][0]
    # h = box[1][1] - box[0][1]
    x, y, w, h = box
    print(frame.shape)
    xc = float((x + w/2.0) / frame.shape[1])
    yc = float((y + h/2.0) / frame.shape[0])
    wc = float(w / frame.shape[1])
    hc = float(h / frame.shape[0])
    return (str(xc), str(yc), str(wc), str(hc))
I am using this code to plot the bounding box, which looks correct:
import cv2
import matplotlib.pyplot as plt
from PIL import Image  # needed for Image.fromarray below
img = cv2.imread('Omen_6_image_generated.png')
dh, dw, _ = img.shape
#dh, dw = (35, 400)
fl = open('Omen_6_image_generated.txt', 'r')
data = fl.readlines()
fl.close()
for dt in data:
    # Split string to float
    _, x, y, w, h = map(float, dt.split())
    # Taken from https://github.com/pjreddie/darknet/blob/810d7f797bdb2f021dbe65d2524c2ff6b8ab5c8b/src/image.c#L283-L291
    # via https://stackoverflow.com/questions/44544471/how-to-get-the-coordinates-of-the-bounding-box-in-yolo-object-detection#comment102178409_44592380
    l = int((x - w / 2) * dw)
    r = int((x + w / 2) * dw)
    t = int((y - h / 2) * dh)
    b = int((y + h / 2) * dh)
    if l < 0:
        l = 0
    if r > dw - 1:
        r = dw - 1
    if t < 0:
        t = 0
    if b > dh - 1:
        b = dh - 1
    cv2.rectangle(img, (l, t), (r, b), (0, 0, 255), 1)
image = Image.fromarray(img.astype('uint8'), 'RGB')
image.show()
The bounding box is plotted correctly, but online annotation tools are not able to parse the file.
For example, the plotting code correctly plots the bounding box for the shared image and annotation file below, but an annotation tool like https://www.makesense.ai/ is not able to parse it. The same image also looks wrong when opened in labelImg.
Link to both the image and the YOLO file:
https://drive.google.com/drive/folders/13ZTVrzswtcvXRBo6kJAhiITxx-IzOi-_?usp=sharing

UTM to WGS 84 conversion issue

I am trying to use ESRI's latest land use data and downloaded a tif image from
https://www.arcgis.com/apps/instant/media/index.html?appid=fc92d38533d440078f17678ebc20e8e2
for example, shown here
https://lulctimeseries.blob.core.windows.net/lulctimeseriespublic/lc2021/16R_20210101-20220101.tif
When I load the image into ArcGIS Pro, the longitude difference between corners A and B is 6 degrees, which is expected. The tif image is in the WGS 84 / UTM zone 16N projection. I want to find the longitude and latitude of each pixel in the tif file, so I converted it to WGS 84 coordinates. However, after the conversion (with both GDAL and ArcGIS), the longitude span between A and B is larger than 6 degrees. It looks like the transformer treats each grid cell in the image as an equal distance rather than an equal longitude/latitude step. Am I doing something wrong here?
def wgs_transformer(img):
    """Given a geoio image, return two transformers: image CRS to WGS 84 and WGS 84 to image CRS.
    They can be used to translate between pixels and lat/long.
    """
    assert isinstance(img, geoio.base.GeoImage), 'img is not geoio.base.GeoImage type object'
    old_cs = osr.SpatialReference()
    old_cs.ImportFromWkt(img.ds.GetProjectionRef())
    # create the new coordinate system
    new_cs = osr.SpatialReference()
    new_cs.ImportFromEPSG(4326)
    # create a transform object to convert between coordinate systems
    transform_img_wgs = osr.CoordinateTransformation(old_cs, new_cs)
    transform_wgs_img = osr.CoordinateTransformation(new_cs, old_cs)
    return transform_img_wgs, transform_wgs_img
def pixels_to_latlong(img, px, py, transform, arr=None, return_band=False):
    """Given pixel x, y coordinates, return latitude and longitude.
    Args:
        img: geoio.base.GeoImage
        px: float, x coordinate
        py: float, y coordinate
        transform: osgeo.osr.CoordinateTransformation
        arr: np array, band info of the img
        return_band: bool, if True, return band info for the pixel
    Returns:
        latitude, longitude
    """
    assert isinstance(img, geoio.base.GeoImage), 'img is not geoio.base.GeoImage type object'
    assert isinstance(transform, osgeo.osr.CoordinateTransformation), 'transform has to be osgeo.osr.CoordinateTransformation object'
    band, width, height = img.shape
    assert 0 <= px < width and 0 <= py < height, f'px {px}, py {py} are beyond the img shape ({width}, {height})'
    if return_band:
        assert isinstance(arr, np.ndarray), 'arr needs to be numpy.ndarray'
    lat, long, _ = transform.TransformPoint(*img.raster_to_proj(px, py))
    if return_band:
        band_val = arr[py-1, px-1]
        return lat, long, band_val
    else:
        return lat, long, 0

osr.TransformPoint flips x and y if file in ENVI RAW is provided

I want to transform the corner coordinates of a UTM-projected file into long/lat. If I read the coordinates from a GeoTIFF, the function works as expected. But after converting the GeoTIFF to an ENVI raw file (with gdal_translate), the coordinates are flipped:
from osgeo import gdal
from osgeo import osr
def corner2longlat(fname):
    dataset = gdal.Open(fname, gdal.GA_ReadOnly)
    ds_geotrans = dataset.GetGeoTransform()
    X = ds_geotrans[0]
    Y = ds_geotrans[3] + dataset.RasterYSize * ds_geotrans[5]
    srs_projection = dataset.GetProjectionRef()
    srs = osr.SpatialReference()
    srs.ImportFromWkt(srs_projection)
    srsLatLong = srs.CloneGeogCS()
    ct = osr.CoordinateTransformation(srs, srsLatLong)
    latlon = ct.TransformPoint(X, Y)
    print(srs_projection)
    print([latlon[0], latlon[1]])
image_tif = "all_bands.tif"
image_raw = "all_bands.raw"
corner2longlat(image_tif) gives then:
PROJCS["WGS 84 / UTM zone 43N",GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563,AUTHORITY["EPSG","7030"]],AUTHORITY["EPSG","6326"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AUTHORITY["EPSG","4326"]],PROJECTION["Transverse_Mercator"],PARAMETER["latitude_of_origin",0],PARAMETER["central_meridian",75],PARAMETER["scale_factor",0.9996],PARAMETER["false_easting",500000],PARAMETER["false_northing",0],UNIT["metre",1,AUTHORITY["EPSG","9001"]],AXIS["Easting",EAST],AXIS["Northing",NORTH],AUTHORITY["EPSG","32643"]]
[41.463751886708124, 74.99976050569691]
corner2longlat(image_raw) gives then:
PROJCS["WGS 84 / UTM zone 43N",GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563,AUTHORITY["EPSG","7030"]],AUTHORITY["EPSG","6326"]],PRIMEM["Greenwich",0],UNIT["Degree",0.0174532925199433]],PROJECTION["Transverse_Mercator"],PARAMETER["latitude_of_origin",0],PARAMETER["central_meridian",75],PARAMETER["scale_factor",0.9996],PARAMETER["false_easting",500000],PARAMETER["false_northing",0],UNIT["metre",1,AUTHORITY["EPSG","9001"]],AXIS["Easting",EAST],AXIS["Northing",NORTH]]
[74.99976050569691, 41.463751886708124]
Does anybody have an idea how that could happen?
Thank you in advance
Lukas

KITTI dataset crop labelled point cloud

I am trying to train my model to recognize cars, pedestrians and cyclists, which requires car, pedestrian and cyclist point clouds as training data. I downloaded the dataset from KITTI (http://www.cvlibs.net/datasets/kitti/eval_object.php?obj_benchmark=3d), both the labels (http://www.cvlibs.net/download.php?file=data_object_label_2.zip) and the velodyne points (http://www.cvlibs.net/download.php?file=data_object_velodyne.zip). However, the object labels don't seem to match this set of data. I attempted to crop the point cloud to extract the object point clouds, but I only obtain blank 3D space. This is my cropping function in MATLAB. Is there any mistake in my code? Is there any other training and testing data set of pedestrian, cyclist and car point clouds available elsewhere?
function pc = crop_pc3d (pt_cloud, x, y, z, height, width, length)
    %keep points only between (x,y,z) and (x+length, y+width,z+height)
    %Initialization
    y_min = y; y_max = y + width;
    x_min = x; x_max = x + length;
    z_min = z; z_max = z + height;
    %Get ROI
    x_ind = find( pt_cloud.Location(:,1) < x_max & pt_cloud.Location(:,1) > x_min );
    y_ind = find( pt_cloud.Location(:,2) < y_max & pt_cloud.Location(:,2) > y_min );
    z_ind = find( pt_cloud.Location(:,3) < z_max & pt_cloud.Location(:,3) > z_min );
    crop_ind_xy = intersect(x_ind, y_ind);
    crop_ind = intersect(crop_ind_xy, z_ind);
    %Update point cloud
    pt_cloud = pt_cloud.Location(crop_ind, :);
    pc = pointCloud(pt_cloud);
end
The labels are given in the camera (image) coordinate frame. So, in order to use them with the point cloud, they need to be transformed into the Velodyne coordinate frame.
For this transformation, use the calibration matrices provided with the camera calibration data.
Calibration data is provided on the KITTI site:
http://www.cvlibs.net/datasets/kitti/eval_object.php?obj_benchmark=3d
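The transformation described above can be sketched in plain Python. KITTI's calibration files give `R0_rect` (3x3) and `Tr_velo_to_cam` (3x4, a rotation plus a translation) such that x_rect = R0_rect * Tr_velo_to_cam * x_velo, so going from a label position to the Velodyne frame means inverting both steps. The numeric values below are illustrative placeholders, not real calibration data, and the function names are hypothetical; read the actual matrices from the calib file that belongs to each frame.

```python
def transpose(m):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def mat_vec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def velo_to_cam(p_velo, r0_rect, r_velo_to_cam, t_velo_to_cam):
    """Velodyne point -> rectified camera coordinates."""
    p_cam = [c + t for c, t in zip(mat_vec(r_velo_to_cam, p_velo), t_velo_to_cam)]
    return mat_vec(r0_rect, p_cam)


def cam_to_velo(p_rect, r0_rect, r_velo_to_cam, t_velo_to_cam):
    """Rectified camera coordinates (e.g. a label location) -> Velodyne frame."""
    # Both matrices are rotations, so their inverses are their transposes.
    p_cam = mat_vec(transpose(r0_rect), p_rect)
    shifted = [c - t for c, t in zip(p_cam, t_velo_to_cam)]
    return mat_vec(transpose(r_velo_to_cam), shifted)


# Placeholder calibration: identity rectification and the nominal axis swap
# (camera x = -velo y, camera y = -velo z, camera z = velo x).
R0_RECT = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
R_VELO_TO_CAM = [[0.0, -1.0, 0.0], [0.0, 0.0, -1.0], [1.0, 0.0, 0.0]]
T_VELO_TO_CAM = [0.0, -0.08, -0.27]  # made-up offsets in metres

label_xyz = velo_to_cam([10.0, 2.0, -1.0], R0_RECT, R_VELO_TO_CAM, T_VELO_TO_CAM)
print(cam_to_velo(label_xyz, R0_RECT, R_VELO_TO_CAM, T_VELO_TO_CAM))
```

Once the box centre is in the Velodyne frame, the cropping limits can be built around it in the same axes as the point cloud. Note that the labelled boxes also carry a rotation about the vertical axis, which an axis-aligned crop ignores.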

Description of parameters of GDAL SetGeoTransform

Can anyone help me with the parameters of SetGeoTransform? I'm creating raster layers with GDAL, but I can't find a description of the 3rd and 5th parameters of SetGeoTransform. They should define the x and y axes for cells. I tried to find something about it here and here, but found nothing.
I need a description of these two parameters. Are the values in degrees, radians, meters, or something else?
The geotransform is used to convert from map to pixel coordinates and back using an affine transformation. The 3rd and 5th parameters are used (together with the 2nd and 4th) to define the rotation if your image isn't 'north up'.
But most images are north up, and then both the 3rd and 5th parameters are zero.
The affine transform consists of six coefficients returned by
GDALDataset::GetGeoTransform() which map pixel/line coordinates into
georeferenced space using the following relationship:
Xgeo = GT(0) + Xpixel*GT(1) + Yline*GT(2)
Ygeo = GT(3) + Xpixel*GT(4) + Yline*GT(5)
See the section on affine geotransform at:
https://gdal.org/tutorials/geotransforms_tut.html
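The relationship quoted above can be written out as a small sketch. The helper names `pixel_to_geo` and `geo_to_pixel` and the sample coefficients are made up for illustration; `gt` is assumed to be a six-element sequence in the order returned by GetGeoTransform().

```python
def pixel_to_geo(gt, x_pixel, y_line):
    """Apply the affine geotransform: pixel/line -> georeferenced x/y."""
    x_geo = gt[0] + x_pixel * gt[1] + y_line * gt[2]
    y_geo = gt[3] + x_pixel * gt[4] + y_line * gt[5]
    return x_geo, y_geo


def geo_to_pixel(gt, x_geo, y_geo):
    """Invert the affine geotransform: georeferenced x/y -> pixel/line."""
    det = gt[1] * gt[5] - gt[2] * gt[4]
    if det == 0:
        raise ValueError("geotransform is not invertible")
    dx = x_geo - gt[0]
    dy = y_geo - gt[3]
    x_pixel = (gt[5] * dx - gt[2] * dy) / det
    y_line = (-gt[4] * dx + gt[1] * dy) / det
    return x_pixel, y_line


# Illustrative north-up raster: 30-unit pixels, origin at the top-left corner,
# so the 3rd and 5th coefficients (indices 2 and 4) are zero.
north_up = (500000.0, 30.0, 0.0, 4600000.0, 0.0, -30.0)
print(pixel_to_geo(north_up, 10, 20))
print(geo_to_pixel(north_up, 500300.0, 4599400.0))
```

Because every coefficient multiplies a pixel or line count, all four of GT(1), GT(2), GT(4) and GT(5) carry the same units: map units (degrees or metres, depending on the projection) per pixel.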
I did it as in the code below.
As a result, I was able to achieve the same thing as with SetGeoTransform.
# new file
dst = gdal.GetDriverByName('GTiff').Create(OUT_PATH, xsize, ysize, band_num, dtype)
# old file
ds = gdal.Open(fpath)
wkt = ds.GetProjection()
gcps = ds.GetGCPs()
dst.SetGCPs(gcps, wkt)
...
dst.FlushCache()
dst = None
Given the information from the aforementioned GDAL data model docs, the 3rd and 5th parameters of SetGeoTransform (x_skew and y_skew respectively) can be calculated from two control points (p1, p2) with known x and y in both "geo" and "pixel" coordinate spaces. p1 should be above and to the left of p2 in pixel space.
x_skew = sqrt((p1.geox-p2.geox)**2 + (p1.geoy-p2.geoy)**2) / (p1.pixely - p2.pixely)
y_skew = sqrt((p1.geox-p2.geox)**2 + (p1.geoy-p2.geoy)**2) / (p1.pixelx - p2.pixelx)
In short, this is the ratio of the Euclidean distance between the points in geo space to the height (or width) of the image in pixel space.
The units of the parameters are "geo" length / "pixel" length.
Here is a demonstration using the corners of the image stored as control points (GCPs):
from osgeo import gdal
from math import sqrt
ds = gdal.Open(fpath)
gcps = ds.GetGCPs()
assert gcps[0].Id == 'UpperLeft'
p1 = gcps[0]
assert gcps[2].Id == 'LowerRight'
p2 = gcps[2]
y_skew = (
    sqrt((p1.GCPX-p2.GCPX)**2 + (p1.GCPY-p2.GCPY)**2) /
    (p1.GCPPixel - p2.GCPPixel)
)
x_skew = (
    sqrt((p1.GCPX-p2.GCPX)**2 + (p1.GCPY-p2.GCPY)**2) /
    (p1.GCPLine - p2.GCPLine)
)
x_res = (p2.GCPX - p1.GCPX) / ds.RasterXSize
y_res = (p2.GCPY - p1.GCPY) / ds.RasterYSize
ds.SetGeoTransform([
    p1.GCPX,
    x_res,
    x_skew,
    p1.GCPY,
    y_skew,
    y_res,
])