Fast R-Cnn and problems with spatialScale in BrainScript - cntk

I have the following code:
model (features, rois) = {
convOut = convLayers (features)
roiOut = ROIPooling (convOut, rois, (9:9),spatialScale=64.0/196.0)
z = fcLayers (roiOut)
}.z
Original taken from: cntk\Examples\Image\Detection\FastRCNN\BrainScript
What is spatialScale in ROIPooling and how do I calculate it?
If have found this in the output from the cntk.exe.
Validating --> z.convOut.z.rn3.r.r = RectifiedLinear (z.convOut.z.rn3.r.r._) : [49 x 49 x 64 x *] -> [49 x 49 x 64 x *]
Validating --> rois = InputValue() : -> [4 x 1000 x *]
Validating --> z.roiOut = ROIPooling (z.convOut.z.rn3.r.r, rois) : [49 x 49 x 64 x *], [4 x 1000 x *] -> [9 x 9 x 64 x 1000 x *]

spatial scale is the ratio of the spatial resolution of the input to the ROI and the spatial resolution of the input image to the network. 1/16.0 is the value used in the original Fast and Faster R-CNN implementation, this value depend on the network.
Pretty much, spatial scale is the scale of the input to ROI relative to the original image.
Thanks,
Emad

Related

Percentage weighting given two variables to equal a target

I have a target of target = 11.82 with two variables
x = 9
y = 15
How do I find the percentage weighting that would blend x & y to equal my target? i.e. 55% of x and 45% of y - what function is most efficient way to calc a weighting to obtain my target?
Looking at it again, what I think you want is really two equations:
9x + 15y = 11.82
x + y = 1
Solving that system of equations is pretty fast on pen and paper (just do linear combination). Or you could use sympy to solve the system of linear equations:
>>> from sympy import *
>>> from sympy.solvers.solveset import linsolve
>>> x, y = symbols('x, y')
>>> linsolve([x + y - 1, 9 * x + 15 * y - 11.82], (x, y)) # make 0 on right by subtraction
FiniteSet((0.53, 0.47))
We can confirm this by substitution:
>>> 9 * 0.53 + 15 * 0.47
11.82

Why high order function is not popular in numpy

For example, when you do x * y + z
with high order function, it can be expressed as : on3(x,y,add,lambda (x,y,z): x * y + z) which in theory could save lots of computations.
My question is why such patterns are rare in numpy
If you say x * y + z with NumPy arrays, x * y will allocate a temporary array and then + z will allocate the final array. If you need to squeeze out every bit of performance, you can avoid the intermediate temporary array like this:
r = x * y
r += z
That will only allocate a single array, which is nearly optimal assuming you don't want to mutate the inputs. If you do want to mutate them, you could instead do this:
x *= y # or np.mul(x, y, out=x)
x += z # or np.add(x, z, out=x)
Then you allocate nothing.
The above may still not be optimal if the data do not fit in cache, because you have to traverse x twice. You can solve this problem using Numba, by writing a vectorized function:
import numba
#numba.vectorize
def fma(x, y, z):
return x * y + z
Now when you run fma(x, y, z) it will visit the first element of each array, run x * y + z on those three elements, detect the type of the result, allocate an output array of that type, and then do the calculation for the rest of the elements.
Putting it all together, doing a single pass over the inputs and allocating nothing:
fma(x, y, z, out=x) # can also use out=y or out=z

Python to fit a linear-plateau curve

I have curve that initially Y increases linearly with X, then reach a plateau at point C.
In other words, the curve can be defined as:
if X < C:
Y = k * X + b
else:
Y = k * C + b
The training data is a list of X ~ Y values. I need to determine k, b and C through a machine learning approach (or similar), since the data is noisy and refection point C changes over time. I want something more robust than get C through observing the current sample data.
How can I do it using sklearn or maybe scipy?
WLOG you can say the second equation is
Y = C
looks like you have a linear regression to fit the line and then a detection point to find the constant.
You know that in the high values of X, as in X > C you are already at the constant. So just check how far back down the values of X you get the same constant.
Then do a linear regression to find the line with value of X, X <= C
Your model is nonlinear
I think the smartest way to solve this is to do these steps:
find the maximum value of Y which is equal to k*C+b
M=max(Y)
drop this maximum value from your dataset
df1 = df[df.Y != M]
and then you have simple dataset to fit your X to Y and you can use sklearn for that

Negative values in Log likelihood of a bivariate gaussian

I am trying to implement a loss function which tries to minimize the negative log likelihood of obtaining ground truth values (x,y) from predicted bivariate gaussian distribution parameters. I am implementing this in tensorflow -
Here is the code -
def tf_2d_normal(self, x, y, mux, muy, sx, sy, rho):
'''
Function that implements the PDF of a 2D normal distribution
params:
x : input x points
y : input y points
mux : mean of the distribution in x
muy : mean of the distribution in y
sx : std dev of the distribution in x
sy : std dev of the distribution in y
rho : Correlation factor of the distribution
'''
# eq 3 in the paper
# and eq 24 & 25 in Graves (2013)
# Calculate (x - mux) and (y-muy)
normx = tf.sub(x, mux)
normy = tf.sub(y, muy)
# Calculate sx*sy
sxsy = tf.mul(sx, sy)
# Calculate the exponential factor
z = tf.square(tf.div(normx, sx)) + tf.square(tf.div(normy, sy)) - 2*tf.div(tf.mul(rho, tf.mul(normx, normy)), sxsy)
negRho = 1 - tf.square(rho)
# Numerator
result = tf.exp(tf.div(-z, 2*negRho))
# Normalization constant
denom = 2 * np.pi * tf.mul(sxsy, tf.sqrt(negRho))
# Final PDF calculation
result = -tf.log(tf.div(result, denom))
return result
When I am doing the training, I can see the loss value decreasing but it goes well past below 0. I can understand that should be because, we are minimizing the 'negative' likelihood. Even the loss values are decreasing, I can't get my results accurate. Can someone help in verifying, if the code that I have written for the loss function is correct or not.
Also is such a nature of loss desirable for training Neural Nets(specifically RNN)?
Thankss
I see you've found the sketch-rnn code from magenta, I'm working on something similar. I found this piece of code not to be stable by itself. You'll need to stabilize it using constraints, so the tf_2d_normal code can't be used or interpreted in isolation. NaNs and Infs will start appearing all over the place if your data isn't normalized properly in advance or in your loss function.
Below is a more stable loss function version I'm building with Keras. There may be some redundancy in here, it may not be perfect for your needs but I found it to be working and you can test/adapt it. I included some inline comments on how large negative log values can arise:
def r3_bivariate_gaussian_loss(true, pred):
"""
Rank 3 bivariate gaussian loss function
Returns results of eq # 24 of http://arxiv.org/abs/1308.0850
:param true: truth values with at least [mu1, mu2, sigma1, sigma2, rho]
:param pred: values predicted from a model with the same shape requirements as truth values
:return: the log of the summed max likelihood
"""
x_coord = true[:, :, 0]
y_coord = true[:, :, 1]
mu_x = pred[:, :, 0]
mu_y = pred[:, :, 1]
# exponentiate the sigmas and also make correlative rho between -1 and 1.
# eq. # 21 and 22 of http://arxiv.org/abs/1308.0850
# analogous to https://github.com/tensorflow/magenta/blob/master/magenta/models/sketch_rnn/model.py#L326
sigma_x = K.exp(K.abs(pred[:, :, 2]))
sigma_y = K.exp(K.abs(pred[:, :, 3]))
rho = K.tanh(pred[:, :, 4]) * 0.1 # avoid drifting to -1 or 1 to prevent NaN, you will have to tweak this multiplier value to suit the shape of your data
norm1 = K.log(1 + K.abs(x_coord - mu_x))
norm2 = K.log(1 + K.abs(y_coord - mu_y))
variance_x = K.softplus(K.square(sigma_x))
variance_y = K.softplus(K.square(sigma_y))
s1s2 = K.softplus(sigma_x * sigma_y) # very large if sigma_x and/or sigma_y are very large
# eq 25 of http://arxiv.org/abs/1308.0850
z = ((K.square(norm1) / variance_x) +
(K.square(norm2) / variance_y) -
(2 * rho * norm1 * norm2 / s1s2)) # z → -∞ if rho * norm1 * norm2 → ∞ and/or s1s2 → 0
neg_rho = 1 - K.square(rho) # → 0 if rho → {1, -1}
numerator = K.exp(-z / (2 * neg_rho)) # → ∞ if z → -∞ and/or neg_rho → 0
denominator = (2 * np.pi * s1s2 * K.sqrt(neg_rho)) + epsilon() # → 0 if s1s2 → 0 and/or neg_rho → 0
pdf = numerator / denominator # → ∞ if denominator → 0 and/or if numerator → ∞
return K.log(K.sum(-K.log(pdf + epsilon()))) # → -∞ if pdf → ∞
Hope you find this of value.

Transform a vector to another frame of reference

I have a green vehicle which will shortly collide with a blue object (which is 200 away from the cube)
It has a Kinect depth camera D at [-100,0,200] which sees the corner of the cube (grey sphere)
The measured depth is 464 at 6.34° in the X plane and 12.53° in the Y plane.
I want to calculate the position of the corner as it would appear if there was a camera F at [150,0,0], which would see this:
in other words transform the red vector into the yellow vector. I know that this is achieved with a transformation matrix but I can't find out how to compute the matrix from the D-F vector [250,0,-200] or how to use it; my high-school maths dates back 40 years.
math.se has a similar question but it doesn't cover my problem and I can't find anything on robotices.se either.
I realise that I should show some code that I've tried, but I don't know where to start. I would be very grateful if somebody could help me to solve this.
ROS provides the tf library which allows you to transform between frames. You can simply set a static transform between the pose of your camera and the pose of your desired location. Then, you can get the pose of any point detected by your camera in the reference frame of your desired point on your robot. ROS tf will do everything you need and everything I explain below.
The longer answer is that you need to construct a transformation tree. First, compute the static transformation between your two poses. A pose is a 7-dimensional transformation including a translation and orientation. This is best represented as a quaternion and a 3D vector.
Now, for all poses in the reference frame of your kinect, you must transform them to your desired reference frame. Let's call this frame base_link and your camera frame camera_link.
I'm going to go ahead and decide that base_link is the parent of camera_link. Technically these transformations are bidirectional, but because you may need a transformation tree, and because ROS cares about this, you'll want to decide who is the parent.
To convert rotation from camera_link to base_link, you need to compute the rotational difference. This can be done by multiplying the quaternion of base_link's orientation by the conjugate of camera_link's orientation. Here's a super quick Python example:
def rotDiff(self,q1: Quaternion,q2: Quaternion) -> Quaternion:
"""Finds the quaternion that, when applied to q1, will rotate an element to q2"""
conjugate = Quaternion(q2.qx*-1,q2.qy*-1,q2.qz*-1,q2.qw)
return self.rotAdd(q1,conjugate)
def rotAdd(self, q1: Quaternion, q2: Quaternion) -> Quaternion:
"""Finds the quaternion that is the equivalent to the rotation caused by both input quaternions applied sequentially."""
w1 = q1.qw
w2 = q2.qw
x1 = q1.qx
x2 = q2.qx
y1 = q1.qy
y2 = q2.qy
z1 = q1.qz
z2 = q2.qz
w = w1 * w2 - x1 * x2 - y1 * y2 - z1 * z2
x = w1 * x2 + x1 * w2 + y1 * z2 - z1 * y2
y = w1 * y2 + y1 * w2 + z1 * x2 - x1 * z2
z = w1 * z2 + z1 * w2 + x1 * y2 - y1 * x2
return Quaternion(x,y,z,w)
Next, you need to add the vectors. The naive approach is to simply add the vectors, but you need to account for rotation when calculating these. What you really need is a coordinate transformation. The position of camera_link relative to base_link is some 3D vector. Based on your drawing, this is [-250, 0, 200]. Next, we need to reproject the vectors to your points of interest into the rotational frame of base_link. I.e., all the points your camera sees at 12.53 degrees that appear at the z = 0 plane to your camera are actually on a 12.53 degree plane relative to base_link and you need to find out what their coordinates are relative to your camera as if your camera was in the same orientation as base_link.
For details on the ensuing math, read this PDF (particularly starting at page 9).
To accomplish this, we need to find your vector's components in base_link's reference frame. I find that it's easiest to read if you convert the quaternion to a rotation matrix, but there is an equivalent direct approach.
To convert a quaternion to a rotation matrix:
def Quat2Mat(self, q: Quaternion) -> rotMat:
m00 = 1 - 2 * q.qy**2 - 2 * q.qz**2
m01 = 2 * q.qx * q.qy - 2 * q.qz * q.qw
m02 = 2 * q.qx * q.qz + 2 * q.qy * q.qw
m10 = 2 * q.qx * q.qy + 2 * q.qz * q.qw
m11 = 1 - 2 * q.qx**2 - 2 * q.qz**2
m12 = 2 * q.qy * q.qz - 2 * q.qx * q.qw
m20 = 2 * q.qx * q.qz - 2 * q.qy * q.qw
m21 = 2 * q.qy * q.qz + 2 * q.qx * q.qw
m22 = 1 - 2 * q.qx**2 - 2 * q.qy**2
result = [[m00,m01,m02],[m10,m11,m12],[m20,m21,m22]]
return result
Now that your rotation is represented as a rotation matrix, it's time to do the final calculation.
Following the MIT lecture notes from my link above, I'll arbitrarily name the vector to your point of interest from the camera A.
Find the rotation matrix that corresponds with the quaternion that represents the rotation between base_link and camera_link and simply perform a matrix multiplication. If you're in Python, you can use numpy to do this, but in the interest of being explicit, here is the long form of the multiplication:
def coordTransform(self, M: RotMat, A: Vector) -> Vector:
"""
M is my rotation matrix that represents the rotation between my frames
A is the vector of interest in the frame I'm rotating from
APrime is A, but in the frame I'm rotating to.
"""
APrime = []
i = 0
for component in A:
APrime.append(component * M[i][0] + component * M[i][1] + component * m[i][2])
i += 1
return APrime
Now, the vectors from camera_link are represented as if camera_link and base_link share an orientation.
Now you may simply add the static translation between camera_link and base_link (or subtract base_link -> camera_link) and the resulting vector will be your point's new translation.
Putting it all together, you can now gather the translation and orientation of every point your camera detects relative to any arbitrary reference frame to gather pose data relevant to your application.
You can put all of this together into a function simply called tf() and stack these transformations up and down a complex transformation tree. Simply add all the transformations up to a common ancestor and subtract all the transformations down to your target node in order to find the transformation of your data between any two arbitrary related frames.
Edit: Hendy pointed out that it's unclear what Quaternion() class I refer to here.
For the purposes of this answer, this is all that's necessary:
class Quaternion():
def __init__(self, qx: float, qy: float, qz: float, qw: float):
self.qx = qx
self.qy = qy
self.xz = qz
self.qw = qw
But if you want to make this class super handy, you can define __mul__(self, other: Quaternion and __rmul__(self, other: Quaternion) to perform quaternion multiplication (order matters, so make sure to do both!). conjugate(self), toEuler(self), toRotMat(self), normalize(self) may also be handy additions.
Note that due to quirks in Python's typing, the above other: Quaternion is only for clarity. You'll need a longer-form if type(other) != Quaternion: raise TypeError('You can only multiply quaternions with other quaternions) error handling block to make that into valid python :)
The following definitions are not necessary for this answer, but they may prove useful to the reader.
import numpy as np
def __mul__(self, other):
if type(other) != Quaternion:
print("Quaternion multiplication only works with other quats")
raise TypeError
r1 = self.qw
r2 = other.qw
v1 = [self.qx,self.qy,self.qz]
v2 = [other.qx,other.qy,other.qz]
rPrime = r1*r2 - np.dot(v1,v2)
vPrimeA = np.multiply(r1,v2)
vPrimeB = np.multiply(r2,v1)
vPrimeC = np.cross(v1,v2)
vPrimeD = np.add(vPrimeA, vPrimeB)
vPrime = np.add(vPrimeD,vPrimeC)
x = vPrime[0]
y = vPrime[1]
z = vPrime[2]
w = rPrime
return Quaternion(x,y,z,w)
def __rmul__(self, other):
if type(other) != Quaternion:
print("Quaternion multiplication only works with other quats")
raise TypeError
r1 = other.qw
r2 = self.qw
v1 = [other.qx,other.qy,other.qz]
v2 = [self.qx,self.qy,self.qz]
rPrime = r1*r2 - np.dot(v1,v2)
vPrimeA = np.multiply(r1,v2)
vPrimeB = np.multiply(r2,v1)
vPrimeC = np.cross(v1,v2)
vPrimeD = np.add(vPrimeA, vPrimeB)
vPrime = np.add(vPrimeD,vPrimeC)
x = vPrime[0]
y = vPrime[1]
z = vPrime[2]
w = rPrime
return Quaternion(x,y,z,w)
def conjugate(self):
return Quaternion(self.qx*-1,self.qy*-1,self.qz*-1,self.qw)