I am trying to train my model to recognize cars, pedestrians and cyclists, so it needs car, pedestrian and cyclist point clouds as training data. I downloaded the dataset from KITTI, both the labels and the Velodyne point clouds. However, the object labels don't seem to belong to this set of data. I attempted to crop the point cloud to extract the object point clouds, but I only get empty 3D space. This is my cropping function in MATLAB. Is there any mistake in my code? Is there any other training and testing data set with pedestrian, cyclist and car point clouds available elsewhere?
function pc = crop_pc3d (pt_cloud, x, y, z, height, width, length)
%keep points only between (x,y,z) and (x+length, y+width,z+height)
y_min = y; y_max = y + width;
x_min = x; x_max = x + length;
z_min = z; z_max = z + height;
%Get ROI
x_ind = find( pt_cloud.Location(:,1) < x_max & pt_cloud.Location(:,1) > x_min );
y_ind = find( pt_cloud.Location(:,2) < y_max & pt_cloud.Location(:,2) > y_min );
z_ind = find( pt_cloud.Location(:,3) < z_max & pt_cloud.Location(:,3) > z_min );
crop_ind_xy = intersect(x_ind, y_ind);
crop_ind = intersect(crop_ind_xy, z_ind);
%Update point cloud
pt_cloud = pt_cloud.Location(crop_ind, :);
pc = pointCloud(pt_cloud);

The labels are given in the camera (image) coordinate frame. So, in order to use them with the point cloud, they need to be transformed into the Velodyne coordinate frame.
For this transformation, use the calibration matrices provided with the camera calibration data.
The calibration data is provided on the KITTI website.
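As a rough sketch of that transformation in Python (NumPy only): the KITTI object calibration files contain `R0_rect` and `Tr_velo_to_cam`, and a Velodyne point maps into the rectified camera frame as x_cam = R0_rect * Tr_velo_to_cam * x_velo, so a label location can be mapped back by inverting that chain. The matrices below are placeholder values, not a real calibration file, and the function names, label location and box half-extents are made up for illustration. The crop is axis-aligned, so it ignores the label's rotation_y.

```python
import numpy as np

def to_homogeneous_4x4(mat):
    """Pad a 3x3 rotation or a 3x4 rigid transform to a 4x4 homogeneous matrix."""
    out = np.eye(4)
    rows, cols = mat.shape
    out[:rows, :cols] = mat
    return out

def cam_rect_to_velo(points_cam, R0_rect, Tr_velo_to_cam):
    """Map Nx3 points from the rectified camera frame to the Velodyne frame."""
    velo_to_rect = to_homogeneous_4x4(R0_rect) @ to_homogeneous_4x4(Tr_velo_to_cam)
    rect_to_velo = np.linalg.inv(velo_to_rect)
    pts_h = np.hstack([points_cam, np.ones((len(points_cam), 1))])
    return (rect_to_velo @ pts_h.T).T[:, :3]

def crop_box(points, lo, hi):
    """Keep the points inside the axis-aligned box [lo, hi] using one boolean mask."""
    mask = np.all((points >= lo) & (points <= hi), axis=1)
    return points[mask]

# Placeholder calibration (NOT real KITTI values).
# Camera: x right, y down, z forward. Velodyne: x forward, y left, z up.
R0_rect = np.eye(3)
Tr_velo_to_cam = np.array([[0.0, -1.0, 0.0, 0.0],
                           [0.0, 0.0, -1.0, -0.08],
                           [1.0, 0.0, 0.0, -0.27]])

# Hypothetical label location (x, y, z) in the rectified camera frame.
label_center_cam = np.array([[1.0, 1.5, 10.0]])
center_velo = cam_rect_to_velo(label_center_cam, R0_rect, Tr_velo_to_cam)

# Random stand-in for a Velodyne scan, and hypothetical box half-extents.
cloud = np.random.default_rng(0).uniform(-20, 20, size=(1000, 3))
half = np.array([2.0, 1.0, 1.0])
obj_points = crop_box(cloud, center_velo[0] - half, center_velo[0] + half)
```

Cropping with the untransformed camera-frame coordinates would look in the wrong part of the scan, which is one plausible reason for getting an empty result.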


Calculating the size of an object using opencv and numpy poly1d

I'm looking to use a small numpy array to generate a curve that I can use to predict the height measurement at unknown points. I have several points that I am using to create a poly1d. I know it's possible: we use software that does it just fine at work, and when I used a different image as a test, plugging the values into Excel and getting the polynomial, it worked fine. But on a different calibrated image I get drastically different results.
Here is the image that I'm trying to measure.
The stick on the front of the pole contains known measurements. From bottom to top, they are 3'6" (42"), 6'6" (78"), 9'8" (116") and 13' (156").
The picture has been through opencv undistort with a calibrated camera.
This is the function that actually performs the logic. x and y are gathered by cv2 EVENT_LBUTTONUP, and sent to this function.
Checking the lengths of the array is just to help me figure out why this isn't working, trying to generate a line to show the curve fit.
dist = self.firstClick-y
if len(self.yData) > 4:
if len(self.yData) == 4:
array = np.array(self.xData)
array = np.expand_dims(array, axis=0)
array=np.append(array, [self.yData], axis=0)
x = array[:,0]
y = array[:,1]
self.poly = np.poly1d(np.polyfit(x, y, 2))
poly1d = np.poly1d(self.poly)
xp = np.linspace(-2, 20, 1)
_ = plt.plot(x, y, '.', xp, self.poly(xp), '-', xp, self.poly(xp), '--')
When I run this code, my values tend to quickly go into the tens of thousands when I'm attempting to collect the measurement at 18' 11", (the lowest wire).
Any help would be appreciated, I've been up all night trying to fit this curve.
Sorry, I should have included the code used to display and scale the image.
self.img = cv2.imread(imagePath, cv2.IMREAD_ANYCOLOR)
self.scale_percent = 30
self.width = int(self.img.shape[1] * self.scale_percent/100)
self.height = int(self.img.shape[0] * self.scale_percent/100)
dsize = (self.width, self.height)
self.output = cv2.resize(self.img, dsize)
img = self.output
cv2.imshow('image', img)
cv2.setMouseCallback('image', self.click_event)
I just called this function to display the image and the below code to calibrate the values.
if self.firstClick == 0:
    self.firstClick = y
    cv2.putText(self.output, "Pole Base", (x, y), font, 1, (255, 255, 0), 2)
    cv2.imshow('image', self.output)
elif self.firstClick != 0 and self.secondClick == 0:
    self.secondClick = y
    print("The difference in first and second clicks is", self.firstClick - self.secondClick)
    first = self.firstClick - self.secondClick
    inch = first/42
    foot = inch*12
    self.foot = foot
    print("One foot is currently: ", foot)
self.firstLine = 3.5*12
self.secondLine = 6.5*12
self.thirdLine = 9.67*12
self.fourthLine = 13*12
self.xData = np.array([self.firstLine, self.secondLine, self.thirdLine, self.fourthLine])
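For reference, here is a minimal sketch of the fit described above, kept separate from the GUI code. Only the four known heights (42", 78", 116", 156") come from the post; the pixel offsets are invented placeholders, not measurements from the image, and the variable names are illustrative. It fits height as a quadratic function of the pixel offset from the pole base and evaluates the curve on a dense grid. Note that `np.linspace(start, stop, 1)` returns a single sample, so a plotted "curve" built from it has only one point.

```python
import numpy as np

# Known heights of the calibration marks, in inches (from the post).
heights_in = np.array([42.0, 78.0, 116.0, 156.0])

# Hypothetical pixel offsets (base_y - click_y) for those marks; placeholders only.
pixel_offsets = np.array([100.0, 180.0, 258.0, 335.0])

# Fit height as a quadratic function of pixel offset.
coeffs = np.polyfit(pixel_offsets, heights_in, 2)
height_from_pixels = np.poly1d(coeffs)

# Evaluate the fitted curve on a dense grid for plotting or inspection.
grid = np.linspace(pixel_offsets.min(), pixel_offsets.max(), 50)
curve = height_from_pixels(grid)

# Residuals at the calibration marks show how well the quadratic matches them.
residuals = height_from_pixels(pixel_offsets) - heights_in
```

Both arrays passed to `np.polyfit` are one-dimensional and have the same length, one entry per calibration mark. A polynomial fitted this way is only constrained inside the calibrated range, so evaluating it well above the highest mark (156") is extrapolation and may drift quickly.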

How to do FFT convolve? How to do normalization?

In Python, we can do a convolution by numpy.fft. For example, if we try to calculate gravitational lensing signal of the SIS model, we could define $\kappa$ as
$\kappa = \frac{\theta_{\rm E}}{2|\theta|}$,
then we can calculate the deflection angle $\alpha$ by a convolution, $\alpha(\theta) = \frac{1}{\pi} \int d^2\theta' \, \kappa(\theta') \frac{\theta-\theta'}{|\theta-\theta'|^2}$. Theoretically, the deflection is $\alpha(\theta) = \theta_{\rm E}\frac{\theta}{|\theta|}$. But when I try to calculate that with numpy.fft, I am puzzled by the normalization factors. For example,
npix = 2048 #mesh grid number
thetaE = 0.5 #a constant
dtheta = thetaE/10 #grid resolution
theta_x, theta_y, theta = mesh_theta(npix, dtheta) #assign grid position for calculation, theta will be an array of 2048*2048
kappa = thetaE/2./theta #define kappa mesh, 2048*2048
kern_alpha_x, kern_alpha_y = kernal_alpha(theta_x, theta_y) #define kernal mesh, 2048*2048
###zero padding should be used###
kappa_fft = np.fft.fft2(kappa)
kern_alpha_x_fft = np.fft.fft2(kern_alpha_x)
kern_alpha_y_fft = np.fft.fft2(kern_alpha_y)
alpha_x = np.fft.fftshift(np.fft.ifft2(kappa_fft*kern_alpha_x_fft)).real
alpha_y = np.fft.fftshift(np.fft.ifft2(kappa_fft*kern_alpha_y_fft)).real
As shown above, $\alpha_{\rm x}$ and $\alpha_{\rm y}$ can be calculated by convolving $\kappa$ with $K_{\alpha_{\rm x}}$ and $K_{\alpha_{\rm y}}$, and the deflection is $|\alpha| = \sqrt{\alpha_{\rm x}^2+\alpha_{\rm y}^2}$. However, when I check alpha_x and alpha_y, it seems that a normalization factor is missing. If I multiply by dtheta*dtheta, as in np.sqrt(alpha_x**2 + alpha_y**2)*dtheta*dtheta, then the result seems to be right. Must this normalization dtheta*dtheta be used, and if so, why? Thanks.
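One way to see where the factor comes from: on a grid, the integral $\int d^2\theta'$ becomes a sum over cells, each of area dtheta*dtheta, while the fft2/ifft2 pair only computes the plain sum. The sketch below checks this on a case with a known answer (two normalized 2D Gaussians, whose convolution is again a Gaussian) rather than on the lensing kernel itself. The function names and grid parameters are illustrative assumptions, not part of the code above.

```python
import numpy as np

def fft_convolve_continuous(f, g, dx):
    """Approximate the continuous 2-D convolution of f and g.

    f and g are sampled on the same n0 x n1 grid with spacing dx, with the
    coordinate origin at index (n0 // 2, n1 // 2).
    """
    n0, n1 = f.shape
    shape = (2 * n0, 2 * n1)  # zero padding avoids circular wrap-around
    spectrum = np.fft.rfft2(f, s=shape) * np.fft.rfft2(g, s=shape)
    full = np.fft.irfft2(spectrum, s=shape)
    i0, i1 = n0 // 2, n1 // 2  # offset of the part aligned with the input grid
    # The FFT gives a plain sum; the cell area dx*dx turns it into an integral.
    return full[i0:i0 + n0, i1:i1 + n1] * dx * dx

def gaussian2d(x, y, sigma):
    """Normalized isotropic 2-D Gaussian."""
    return np.exp(-(x**2 + y**2) / (2 * sigma**2)) / (2 * np.pi * sigma**2)

n, dx = 128, 0.1
coords = (np.arange(n) - n // 2) * dx
x, y = np.meshgrid(coords, coords, indexing="ij")

f = gaussian2d(x, y, 0.5)
g = gaussian2d(x, y, 0.5)

numeric = fft_convolve_continuous(f, g, dx)
exact = gaussian2d(x, y, np.sqrt(0.5))  # variances add under convolution
max_err = np.max(np.abs(numeric - exact))
```

Without the `dx * dx` factor the result would be too large by 1/dx**2, which matches the behaviour described in the question. In 1D the corresponding factor would be a single dx.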

Find 7 vertices of a box using openCV

I don't know if this question has been asked here before. If it has, I'm sorry.
I have a box positioned so that its height, width and length (H, W, L) are all visible. I understand the steps for getting vertices, but most of the examples on the net only describe how to get 4 vertices from a 2D plane. So my question is: how can we get 7 vertices (like in the picture above) and handle them in numpy? How do we differentiate between upper points and lower points?
I will be using Python for this.
Here's my attempt to get the 8 corners of the 3d rectangle. I masked on the saturation channel of the HSV color space since that separates out white.
I used findContours to get the contour of the box and then used approxPolyDP to get a six-point approximation (the six visible corners).
From there I approximated the two "hidden" corners via a parallelogram approximation. For each point I looked two points behind and created a fourth point that would make a parallelogram with that side. I then took the centroid of these parallelogram points to guess the corner. I hoped that taking the centroid of the points would help even out the error between the parallelogram assumption and the perspective warping, but it did a poor job.
If you need a better approximation there are probably ways to estimate the perspective warping to get the corners.
import cv2
import numpy as np
import random
def tup(point):
    return (int(point[0]), int(point[1]));
# load image
img = cv2.imread("box.jpg");
# reduce size to fit on screen
scale = 0.25;
h,w = img.shape[:2];
h = int(scale*h);
w = int(scale*w);
img = cv2.resize(img, (w,h));
copy = np.copy(img);
# convert to hsv
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV);
h,s,v = cv2.split(hsv);
# make mask
mask = cv2.inRange(s, 30, 255);
# dilate and erode to get rid of small holes
kernel = np.ones((5,5), np.uint8);
mask = cv2.dilate(mask, kernel, iterations = 1);
mask = cv2.erode(mask, kernel, iterations = 1);
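# contours: OpenCV 3.x returns (image, contours, hierarchy), OpenCV 2 and 4 return (contours, hierarchy)
# taking the last two return values works with all of these versions
contours, _ = cv2.findContours(mask, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)[-2:];
contour = contours[0]; # just take the first one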
# approx until 6 points
num_points = 999999;
step_size = 0.01;
percent = step_size;
while num_points >= 6:
    # get number of points
    epsilon = percent * cv2.arcLength(contour, True);
    approx = cv2.approxPolyDP(contour, epsilon, True);
    num_points = len(approx);
    # increment
    percent += step_size;
# step back and get the points
# there could be more than 6 points if our step size misses it
percent -= step_size * 2;
epsilon = percent * cv2.arcLength(contour, True);
approx = cv2.approxPolyDP(contour, epsilon, True);
# draw contour
cv2.drawContours(img, [approx], -1, (0,0,200), 2);
# draw points
for point in approx:
    point = point[0]; # drop extra layer of brackets
    center = (int(point[0]), int(point[1]));
    cv2.circle(img, center, 4, (150, 200, 0), -1);
# do parallelogram approx to get the two "hidden" corners to complete our 3d rectangle
proposals = [];
size = len(approx);
for a in range(size):
    # get points backwards
    two = approx[a - 2][0];
    one = approx[a - 1][0];
    curr = approx[a][0];
    # get vector from one -> two
    dx = two[0] - one[0];
    dy = two[1] - one[1];
    hidden = [curr[0] + dx, curr[1] + dy];
    proposals.append([hidden, curr, a, two]);
    # debug draw
    c = np.copy(copy);
    cv2.circle(c, tup(two), 4, (255, 0, 0), -1);
    cv2.circle(c, tup(one), 4, (0,255,0), -1);
    cv2.circle(c, tup(curr), 4, (0,0,255), -1);
    cv2.circle(c, tup(hidden), 4, (255,255,0), -1);
    cv2.line(c, tup(two), tup(one), (0,0,200), 1);
    cv2.line(c, tup(curr), tup(hidden), (0,0,200), 1);
    cv2.imshow("Mark", c);
    cv2.waitKey(0); # pause so the debug window is actually drawn
# draw proposals
for point in proposals:
    point = point[0];
    center = (point[0], point[1]);
    cv2.circle(img, tup(center), 4, (200, 100, 0), -1);
# group points and sum up points
hidden_corners = [[0,0], [0,0]];
for point in proposals:
    # get index and update hidden corners
    index = point[2] % 2;
    pos = point[0];
    hidden_corners[index][0] += pos[0];
    hidden_corners[index][1] += pos[1];
# divide to get centroid
hidden_corners[0][0] /= 3.0;
hidden_corners[0][1] /= 3.0;
hidden_corners[1][0] /= 3.0;
hidden_corners[1][1] /= 3.0;
# draw new points
for point in proposals:
    # unpack
    pos = point[0];
    parent = point[1];
    index = point[2] % 2;
    source = point[3];
    # draw
    color = [random.randint(0, 150) for a in range(3)];
    cv2.line(img, tup(hidden_corners[index]), tup(parent), (0,0,200), 2);
    cv2.line(img, tup(pos), tup(parent), color, 1);
    cv2.line(img, tup(pos), tup(source), color, 1);
    cv2.circle(img, tup(hidden_corners[index]), 4, (200, 200, 0), -1);
# show
cv2.imshow("Image", img);
cv2.imshow("Mask", mask);
cv2.waitKey(0); # keep the windows open until a key is pressed

Description of parameters of GDAL SetGeoTransform

Can anyone help me with the parameters of SetGeoTransform? I'm creating raster layers with GDAL, but I can't find a description of the 3rd and 5th parameters of SetGeoTransform. They should define the x and y axes for the cells. I tried to find something about them here and here, but found nothing.
I need a description of these two parameters. Is the value in degrees, radians or meters? Or something else?
The geotransform is used to convert from map to pixel coordinates and back using an affine transformation. The 3rd and 5th parameter are used (together with the 2nd and 4th) to define the rotation if your image doesn't have 'north up'.
But most images are north up, and then both the 3rd and 5th parameter are zero.
The affine transform consists of six coefficients returned by
GDALDataset::GetGeoTransform() which map pixel/line coordinates into
georeferenced space using the following relationship:
Xgeo = GT(0) + Xpixel*GT(1) + Yline*GT(2)
Ygeo = GT(3) + Xpixel*GT(4) + Yline*GT(5)
See the section on affine geotransform at:
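To make the relationship above concrete, here is a small pure-Python sketch that applies the six coefficients and inverts them. It does not call GDAL; the function names and the example coefficients are illustrative assumptions. It also shows the units: GT(1), GT(2), GT(4) and GT(5) are all in georeferenced units (for example meters or degrees, depending on the projection) per pixel.

```python
def pixel_to_geo(gt, pixel, line):
    """Apply the affine geotransform gt (6 coefficients) to a pixel/line position."""
    x_geo = gt[0] + pixel * gt[1] + line * gt[2]
    y_geo = gt[3] + pixel * gt[4] + line * gt[5]
    return x_geo, y_geo

def geo_to_pixel(gt, x_geo, y_geo):
    """Invert the geotransform by solving the 2x2 linear system."""
    det = gt[1] * gt[5] - gt[2] * gt[4]
    dx = x_geo - gt[0]
    dy = y_geo - gt[3]
    pixel = (gt[5] * dx - gt[2] * dy) / det
    line = (-gt[4] * dx + gt[1] * dy) / det
    return pixel, line

# North-up example: 10 map units per pixel, origin at (100, 200).
# The 3rd and 5th coefficients are zero, so x depends only on the pixel column
# and y only on the line.
north_up = (100.0, 10.0, 0.0, 200.0, 0.0, -10.0)
print(pixel_to_geo(north_up, 2, 3))

# Rotated example: non-zero 3rd and 5th coefficients mix columns and lines.
rotated = (100.0, 8.0, 6.0, 200.0, 6.0, -8.0)
print(pixel_to_geo(rotated, 2, 3))
```

In the rotated example, moving one line down changes Xgeo by GT(2) = 6 map units and moving one column right changes Ygeo by GT(4) = 6 map units, which is exactly the mixing that disappears in a north-up image.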
I used the code below.
With it I was able to achieve the same result as with SetGeoTransform.
# new file
dst = gdal.GetDriverByName('GTiff').Create(OUT_PATH, xsize, ysize, band_num, dtype)
# old file
ds = gdal.Open(fpath)
wkt = ds.GetProjection()
gcps = ds.GetGCPs()
dst.SetGCPs(gcps, wkt)
dst = None  # close the dataset so it is written to disk
Given the information in the aforementioned GDAL data model docs, the 3rd and 5th parameters of SetGeoTransform (x_skew and y_skew respectively) can be calculated from two control points (p1, p2) with known x and y in both "geo" and "pixel" coordinate spaces. p1 should be above-left of p2 in pixel space.
x_skew = sqrt((p1.geox-p2.geox)**2 + (p1.geoy-p2.geoy)**2) / (p1.pixely - p2.pixely)
y_skew = sqrt((p1.geox-p2.geox)**2 + (p1.geoy-p2.geoy)**2) / (p1.pixelx - p2.pixelx)
In short this is the ratio of Euclidean distance between the points in geospace to the height (or width) of the image in pixelspace.
The units of the parameters are "geo"length/"pixel"length.
Here is a demonstration using the corners of the image stored as control points (gcps):
from osgeo import gdal  # recent GDAL releases expose the bindings under the osgeo package
from math import sqrt
ds = gdal.Open(fpath)
gcps = ds.GetGCPs()
assert gcps[0].Id == 'UpperLeft'
p1 = gcps[0]
assert gcps[2].Id == 'LowerRight'
p2 = gcps[2]
y_skew = (
    sqrt((p1.GCPX-p2.GCPX)**2 + (p1.GCPY-p2.GCPY)**2) /
    (p1.GCPPixel - p2.GCPPixel)
)
x_skew = (
    sqrt((p1.GCPX-p2.GCPX)**2 + (p1.GCPY-p2.GCPY)**2) /
    (p1.GCPLine - p2.GCPLine)
)
x_res = (p2.GCPX - p1.GCPX) / ds.RasterXSize
y_res = (p2.GCPY - p1.GCPY) / ds.RasterYSize

Convert from latitude, longitude to x, y

I want to convert a GPS location (latitude, longitude) into x,y coordinates.
I found many links about this topic and applied them, but none of them gives me the correct answer!
I am following these steps to test the answer:
(1) First, I take two positions and calculate the distance between them using maps.
(2) Then I convert the two positions into x,y coordinates.
(3) Then I calculate the distance between the two points again, this time in x,y coordinates,
and see whether it gives me the same result as in step (1).
One of the solutions I found is the following, but it doesn't give me the correct answer!
latitude = Math.PI * latitude / 180;
longitude = Math.PI * longitude / 180;
// adjust position by radians
latitude -= 1.570795765134; // subtract 90 degrees (in radians)
// and switch z and y
xPos = (app.radius) * Math.sin(latitude) * Math.cos(longitude);
zPos = (app.radius) * Math.sin(latitude) * Math.sin(longitude);
yPos = (app.radius) * Math.cos(latitude);
I also tried this link, but it still doesn't work well for me!
Any help on how to convert from (latitude, longitude) to (x,y)?
No exact solution exists
There is no isometric map from the sphere to the plane. When you convert lat/lon coordinates from the sphere to x/y coordinates in the plane, you cannot hope that all lengths will be preserved by this operation. You have to accept some kind of deformation. Many different map projections do exist, which can achieve different compromises between preservations of lengths, angles and areas. For smallish parts of earth's surface, transverse Mercator is quite common. You might have heard about UTM. But there are many more.
The formulas you quote compute x/y/z, i.e. a point in 3D space. But even there you'd not get correct distances automatically. The shortest distance between two points on the surface of the sphere would go through that sphere, whereas distances on the earth are mostly geodesic lengths following the surface. So they will be longer.
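The gap between the two kinds of distance can be checked numerically. The following sketch assumes a spherical earth of radius 6371 km and uses illustrative function names; it converts lat/lon to 3D points, measures the straight-line (chord) distance through the sphere, and recovers the surface (great-circle) distance from it.

```python
import math

EARTH_RADIUS_KM = 6371.0  # spherical-earth assumption

def to_xyz(lat_deg, lon_deg):
    """Convert latitude/longitude in degrees to a 3-D point on the sphere."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return (EARTH_RADIUS_KM * math.cos(lat) * math.cos(lon),
            EARTH_RADIUS_KM * math.cos(lat) * math.sin(lon),
            EARTH_RADIUS_KM * math.sin(lat))

def chord_km(p, q):
    """Straight-line distance through the sphere between two (lat, lon) pairs."""
    return math.dist(to_xyz(*p), to_xyz(*q))

def great_circle_km(p, q):
    """Distance along the surface, recovered from the chord: arc = 2 R asin(chord / 2R)."""
    return 2 * EARTH_RADIUS_KM * math.asin(chord_km(p, q) / (2 * EARTH_RADIUS_KM))

# Two points on the equator, a quarter of the way around the globe.
a, b = (0.0, 0.0), (0.0, 90.0)
print(chord_km(a, b), great_circle_km(a, b))
```

For nearby points the two values are almost identical; the difference only becomes noticeable over long distances such as the quarter-circumference example here.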
Approximation for small areas
If the part of the surface of the earth which you want to draw is relatively small, then you can use a very simple approximation. You can simply use the horizontal axis x to denote longitude λ, the vertical axis y to denote latitude φ. The ratio between these should not be 1:1, though. Instead you should use cos(φ0) as the aspect ratio, where φ0 denotes a latitude close to the center of your map. Furthermore, to convert from angles (measured in radians) to lengths, you multiply by the radius of the earth (which in this model is assumed to be a sphere).
x = r λ cos(φ0)
y = r φ
This is simple equirectangular projection. In most cases, you'll be able to compute cos(φ0) only once, which makes subsequent computations of large numbers of points really cheap.
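A minimal Python sketch of these two formulas, assuming a spherical earth of radius 6371 km. The function names are illustrative, and the two test coordinates are the reference corners quoted further down in this thread. It compares the planar distance after projection with the haversine (great-circle) distance as a sanity check.

```python
import math

EARTH_RADIUS_M = 6371000.0  # spherical-earth assumption

def equirectangular_xy(lat_deg, lon_deg, lat0_deg):
    """Project lat/lon (degrees) to planar x/y in meters; lat0 is a latitude near the map center."""
    x = EARTH_RADIUS_M * math.radians(lon_deg) * math.cos(math.radians(lat0_deg))
    y = EARTH_RADIUS_M * math.radians(lat_deg)
    return x, y

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters, used here only as a reference value."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

# Two points a few hundred meters apart.
lat_a, lon_a = -22.814895, -47.072892
lat_b, lon_b = -22.816419, -47.070563
lat0 = (lat_a + lat_b) / 2  # cos(lat0) is computed from the center latitude

xa, ya = equirectangular_xy(lat_a, lon_a, lat0)
xb, yb = equirectangular_xy(lat_b, lon_b, lat0)

planar = math.hypot(xb - xa, yb - ya)
reference = haversine_m(lat_a, lon_a, lat_b, lon_b)
print(planar, reference)
```

Note that latitude and longitude are converted to radians before being used, both inside cos and as the angle multiplied by the radius. Over an area this small the planar and great-circle distances should agree closely; the approximation degrades as the area grows or approaches the poles.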
I want to share how I handled the problem. I used the equirectangular projection just like @MvG said, but this method gives you X and Y positions relative to the globe (or the entire map), which means that you get global positions. In my case, I wanted to convert coordinates in a small area (a square of about 500 m), so I related the projected point to 2 other points, getting their global positions and relating them to local (on-screen) positions, like this:
First, I choose 2 points (top-left and bottom-right) around the area where I want to project, just like this picture:
Once I have the global reference area in lat and lng, I do the same for screen positions. The objects containing this data are shown below.
//top-left reference point
var p0 = {
    scrX: 23.69, // Minimum X position on screen
    scrY: -0.5, // Minimum Y position on screen
    lat: -22.814895, // Latitude
    lng: -47.072892 // Longitude
};
//bottom-right reference point
var p1 = {
    scrX: 276, // Maximum X position on screen
    scrY: 178.9, // Maximum Y position on screen
    lat: -22.816419, // Latitude
    lng: -47.070563 // Longitude
};
var radius = 6371; //Earth Radius in Km
//## Now I can calculate the global X and Y for each reference point ##\\
// This function converts lat and lng coordinates to GLOBAL X and Y positions
function latlngToGlobalXY(lat, lng){
    //Calculates x based on cos of average of the latitudes
    //(lat and lng are in degrees here; the constant factors cancel in the ratios computed later)
    let x = radius*lng*Math.cos((p0.lat + p1.lat)/2);
    //Calculates y based on latitude
    let y = radius*lat;
    return {x: x, y: y};
}
// Calculate global X and Y for top-left reference point
p0.pos = latlngToGlobalXY(p0.lat, p0.lng);
// Calculate global X and Y for bottom-right reference point
p1.pos = latlngToGlobalXY(p1.lat, p1.lng);
// This gives me the X and Y in relation to map for the 2 reference points.
// Now we have the global AND screen areas and then we can relate both for the projection point.
// This function converts lat and lng coordinates to SCREEN X and Y positions
function latlngToScreenXY(lat, lng){
    //Calculate global X and Y for projection point
    let pos = latlngToGlobalXY(lat, lng);
    //Calculate the percentage of Global X position in relation to total global width
    pos.perX = ((pos.x-p0.pos.x)/(p1.pos.x - p0.pos.x));
    //Calculate the percentage of Global Y position in relation to total global height
    pos.perY = ((pos.y-p0.pos.y)/(p1.pos.y - p0.pos.y));
    //Returns the screen position based on reference points
    return {
        x: p0.scrX + (p1.scrX - p0.scrX)*pos.perX,
        y: p0.scrY + (p1.scrY - p0.scrY)*pos.perY
    };
}
//# The usage is like this #\\
var pos = latlngToScreenXY(-22.815319, -47.071718);
$point = $("#point-to-project");
$point.css("left", pos.x+"em");
$point.css("top", pos.y+"em");
As you can see, I made this in javascript, but the calculations can be translated to any language.
P.S. I'm applying the converted positions to an HTML element whose id is "point-to-project". To use this piece of code on your project, you shall create this element (styled as position absolute) or change the "usage" block.
Since this page showed up at the top of Google when I searched for this same problem, I would like to provide a more practical answer. The answer by MvG is correct but rather theoretical.
I have made a track plotting app for the fitbit ionic in javascript. The code below is how I tackled the problem.
var gpsFix = false;
var circumferenceAtLat = 0;
function locationSuccess(pos){
gpsFix = true;
circumferenceAtLat = Math.cos(pos.coords.latitude*0.01745329251)*111305;
let x = Math.round((this.segments[i].start.x - this.bounds.minX)*this.scale);
let y = Math.round(this.bounds.maxY - this.segments[i].start.y)*this.scale; //heights needs to be inverted
let redraw = false;
//x or y bounds?
this.bounds.maxX = (position.x-this.bounds.minX)*1.1+this.bounds.minX; //increase by 10%
redraw = true;
this.bounds.minX = this.bounds.maxX-(this.bounds.maxX-position.x)*1.1;
redraw = true;
this.bounds.maxY = (position.y-this.bounds.minY)*1.1+this.bounds.minY; //increase by 10%
redraw = true;
this.bounds.minY = this.bounds.maxY-(this.bounds.maxY-position.y)*1.1;
redraw = true;
function reDraw(){
let xScale = device.screen.width / (this.bounds.maxX-this.bounds.minX);
let yScale = device.screen.height / (this.bounds.maxY-this.bounds.minY);
if(xScale<yScale) this.scale = xScale;
else this.scale = yScale;
//Loop through your objects to redraw all of them
For completeness, I'd like to add my Python adaptation of @allexrm's code, which worked really well. Thanks again!
import math

radius = 6371 #Earth Radius in KM
class referencePoint:
    def __init__(self, scrX, scrY, lat, lng):
        self.scrX = scrX
        self.scrY = scrY
        self.lat = lat
        self.lng = lng
# Top-left reference point
p0 = referencePoint(0, 0, 52.526470, 13.403215)
# Bottom-right reference point
p1 = referencePoint(2244, 2060, 52.525035, 13.405809)
# This function converts lat and lng coordinates to GLOBAL X and Y positions
def latlngToGlobalXY(lat, lng):
    # Calculates x based on cos of average of the latitudes
    # (lat and lng are in degrees here; the constant factors cancel in the ratios computed later)
    x = radius*lng*math.cos((p0.lat + p1.lat)/2)
    # Calculates y based on latitude
    y = radius*lat
    return {'x': x, 'y': y}
# Calculate global X and Y for top-left reference point
p0.pos = latlngToGlobalXY(p0.lat, p0.lng)
# Calculate global X and Y for bottom-right reference point
p1.pos = latlngToGlobalXY(p1.lat, p1.lng)
# This function converts lat and lng coordinates to SCREEN X and Y positions
def latlngToScreenXY(lat, lng):
    # Calculate global X and Y for projection point
    pos = latlngToGlobalXY(lat, lng)
    # Calculate the percentage of Global X position in relation to total global width
    perX = ((pos['x']-p0.pos['x'])/(p1.pos['x'] - p0.pos['x']))
    # Calculate the percentage of Global Y position in relation to total global height
    perY = ((pos['y']-p0.pos['y'])/(p1.pos['y'] - p0.pos['y']))
    # Returns the screen position based on reference points
    return {
        'x': p0.scrX + (p1.scrX - p0.scrX)*perX,
        'y': p0.scrY + (p1.scrY - p0.scrY)*perY
    }
pos = latlngToScreenXY(52.525607, 13.404572)
pos['x'] and pos['y'] contain the translated x and y coordinates of the lat and lng (52.525607, 13.404572).
I hope this is helpful for anyone looking like me for the proper solution to the problem of translating lat lng into a local reference coordinate system.
It's better to convert to UTM coordinates and treat those as x and y.
import utm
u = utm.from_latlon(12.917091, 77.573586)
The result will be (779260.623156606, 1429369.8665238516, 43, 'P')
The first two values can be treated as x,y coordinates. 43P is the UTM zone, which can be ignored for small areas (width up to 668 km).