TD3 not converging on obstacle avoidance - tensorflow

My objective is obstacle avoidance with TD3 on a UAV (drone). My environment currently has two spherical obstacles of radius 0.6 m and a target position; the task is to make the UAV reach the target position without colliding with the obstacles. I am running the simulation in PyBullet.
My reward function is:
reward = 1.25 + 0.25*reachibility_reward - (2/6)*obs_penalty - (1/6)*pos_err - 0.5*inside_reward
Here 1.25 is a constant term motivating the drone to keep hovering rather than crash immediately.
reachibility_reward is a one-time reward given once the drone gets reasonably close to the target; it is not given every step.
obs_penalty is the penalty applied when the drone hits an obstacle.
pos_err is the penalty the drone gets for being far from the target.
inside_reward is the penalty for not staying inside the environment. My environment spans from -3 to 3 on the x and y axes and from 0 to 3 on the z axis, since the drone cannot go underground.
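For concreteness, here is a minimal Python sketch of how these terms are combined each step (the 0.2 m safety margin around the obstacles and the exact variable names are illustrative, not my exact implementation):

    import numpy as np

    def compute_reward(drone_pos, target_pos, obstacle_centers,
                       reached_target, obstacle_radius=0.6):
        """Illustrative version of the shaped reward described above."""
        drone_pos = np.asarray(drone_pos)

        # Distance to the target, penalized every step.
        pos_err = np.linalg.norm(np.asarray(target_pos) - drone_pos)

        # One-time bonus once the drone first gets close to the target.
        reachibility_reward = 1.0 if reached_target else 0.0

        # Penalty that kicks in when the drone gets within a small margin
        # of an obstacle (radius 0.6 m).
        closest = min(np.linalg.norm(np.asarray(c) - drone_pos)
                      for c in obstacle_centers)
        obs_penalty = max(0.0, (obstacle_radius + 0.2) - closest)

        # Penalty for leaving the workspace: -3..3 in x and y, 0..3 in z.
        x, y, z = drone_pos
        inside_reward = float(abs(x) > 3.0 or abs(y) > 3.0 or not 0.0 <= z <= 3.0)

        return (1.25 + 0.25 * reachibility_reward - (2 / 6) * obs_penalty
                - (1 / 6) * pos_err - 0.5 * inside_reward)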
The plot of this reward function in 2-D is as follows:
Reward Plot in 2D
Here the darker regions are negative rewards and the lighter regions are positive rewards; the UAV should move from the darker gradient toward the lighter gradient. The two large blue discs are the obstacles, with a reward of around -2; the small bright circle is the target position, with a target reward of +2.5; and there is also a small gradient encouraging the drone to stay inside the environment.
The reward function seems fine to me, but when I plot the training on TensorBoard the results are:
actor loss and critic loss
In my understanding the actor loss should decrease continuously and the critic loss should approach zero, and that is what happened during the exploration phase (I run 2000 episodes of pure exploration before the actor and critic start training). After the exploration phase, however, the actor loss keeps increasing and the critic loss jumps up and down abruptly.
I have been trying to figure this out for the past 3 months but have not found anyone with RL expertise to discuss it with.
Can anyone tell me what I should check in my model so that the network starts learning?

Related

How to create a loss function for an unsupervised-learning model, where the output resembles the direct input for a game agent?

I'm trying to set up a deep neural network which predicts the next move for a game agent navigating a world. To control the game agent it takes two float inputs: the first controls the speed (0.0 = stop/do not move, 1.0 = max speed), and the second controls the steering (-1.0 = turn left, 0.0 = straight, +1.0 = turn right).
I designed the network so that it has two output neurons: one for the speed (with a sigmoid activation) and one for the steering (with a tanh activation). The actual input I want to feed the network is the pixel data plus some game-state values.
To train the network I would simply run a whole game (about 2000 frames/samples) and train the model once the game is over. Here is where I struggle: what would my loss function look like? While playing I collect all actions/outputs from the network, the game state, and the reward per frame/sample. When the game is done I also know whether the agent won or lost.
Edit:
This post http://karpathy.github.io/2016/05/31/rl/ got me inspired. Maybe I could use the discounted (move, turn) value pairs, multiply them by (-1) if the game agent lost and (+1) if it won, and then use these values as gradients to update the network's weights?
It would be nice if someone could help me out here.
All the best,
Tobs.
The problem you are describing belongs to reinforcement learning, where an agent interacts with an environment and collects data: the game state, its actions, and the reward/score it gets at the end. There are many approaches.
The one you are describing is a policy-gradient method. The objective is E[\sum r], where r is the score, which has to be maximized, and its gradient is A*grad(log(p_theta)), where A is the advantage function (here +1/-1 for winning/losing) and p_theta is the probability of choosing the action under the policy parameterized by theta (the neural network). If the agent won, the update moves the parameters in favor of that policy because of the +1, and vice versa.
Note: there are many ways to design A; in this case +1/-1 is chosen.
You can read about this in more detail here.
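As a rough TensorFlow sketch of that update (the network size, the fixed standard deviation, and treating the two continuous outputs as the mean of a Gaussian are assumptions made for illustration, not part of the method itself):

    import tensorflow as tf

    STATE_DIM, STD = 8, 0.1  # assumed sizes for the sketch

    # Tiny policy network: state -> raw (speed, steering), squashed below.
    policy = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(STATE_DIM,)),
        tf.keras.layers.Dense(32, activation="relu"),
        tf.keras.layers.Dense(2),
    ])
    optimizer = tf.keras.optimizers.Adam(1e-3)

    def update(states, actions, advantage):
        """states: (T, STATE_DIM), actions: (T, 2) taken during the game,
        advantage: +1 if the game was won, -1 if lost (optionally discounted)."""
        states = tf.convert_to_tensor(states, tf.float32)
        actions = tf.convert_to_tensor(actions, tf.float32)
        with tf.GradientTape() as tape:
            raw = policy(states)
            mean = tf.stack([tf.sigmoid(raw[:, 0]),    # speed in [0, 1]
                             tf.tanh(raw[:, 1])],      # steering in [-1, 1]
                            axis=1)
            # Gaussian log-probability of the actions taken (constants dropped).
            log_prob = -0.5 * tf.reduce_sum(((actions - mean) / STD) ** 2, axis=1)
            loss = -tf.reduce_mean(advantage * log_prob)  # maximize A * log p_theta
        grads = tape.gradient(loss, policy.trainable_variables)
        optimizer.apply_gradients(zip(grads, policy.trainable_variables))

Calling update once per finished game with advantage = +1 or -1 reproduces the A*grad(log(p_theta)) rule described above.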

Why does a decrease in cross-sectional area increase the pressure

When the cross section of the flow tube decreases, the flow speed increases, and therefore the pressure decreases.
Can someone explain to me why this is true? I would think that as the cross-section decreases the pressure would increase.
This is related to the continuity equation of fluid mechanics (assuming the fluid is incompressible).
If we have two cross-sections with areas A1 and A2 and velocities V1 and V2 respectively, then according to the continuity equation
A1*V1 = A2*V2, or equivalently
V2 = (A1*V1)/A2.
So V2 is inversely proportional to A2: velocity increases as the area decreases.
Further, fluid mechanics gives us Bernoulli's theorem, which states that the sum of all forms of energy at any cross-section is constant.
So if the velocity (i.e. kinetic energy) increases at some section, the pressure (i.e. pressure energy) there must decrease.
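A tiny numerical example of both steps (all the input values are made up just to show the arithmetic):

    # Continuity + Bernoulli for horizontal, incompressible flow (water).
    rho = 1000.0                           # kg/m^3
    A1, V1, P1 = 0.02, 1.5, 120_000.0      # m^2, m/s, Pa
    A2 = 0.01                              # narrower cross-section, m^2

    V2 = A1 * V1 / A2                      # continuity: A1*V1 = A2*V2  -> 3.0 m/s
    P2 = P1 + 0.5 * rho * (V1**2 - V2**2)  # Bernoulli -> 116625 Pa (pressure drops)

    print(V2, P2)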
Think of it this way: what is pressure in the first place?
Pressure is the force acting perpendicular to a unit area, right? So the fluid particles are exerting force on that unit area.
Imagine ten people standing side by side in an elevator. They are too many to fit across its width, so they press against the walls, exerting a large force on them; a large force per unit area means a large pressure. Now imagine the same ten people arranging themselves into five rows of two instead of one row of ten. They will be far more comfortable, they won't push against the walls nearly as much, and the walls will feel a smaller force and hence a smaller pressure. This is an example of how physics isn't just numbers that dictate what happens: Bernoulli's equation predicts the pressure drop based on the same kind of logic. Science works :D

What is meaning of "parameter optimization of SVM by PSO"?

I can change the parameters C and epsilon manually to obtain an optimized result, but I found that there is such a thing as parameter optimization of SVM by PSO (or some other optimization algorithm), with no concrete algorithm given. What does it mean: how can PSO automatically optimize the SVM parameters? I have read several papers on this topic, but I'm still not sure.
Particle Swarm Optimization is a technique that uses the ML parameters (SVM parameters, in your case) as its features.
Each "particle" in the swarm is characterized by those parameter values. For instance, you might have initial coordinates of
     degree  epsilon  gamma  C
p1   3       0.001    0.25   1.0
p2   3       0.003    0.20   0.9
p3   2       0.0003   0.30   1.2
p4   4       0.010    0.25   0.5
...
pn   ...
The "fitness" of each particle (p1-p4 shown here out of a population of n particles) is measured by the accuracy of the resulting model: the PSO algorithm trains and tests a model for each particle, returning that model's error rate as the value analogous to that from the training loss function (which it how the value is computed).
On each iteration, particles move toward the fittest neighbours. The process repeats until a maximum (hopefully the global one) appears as a convergence point. This process is simply one from the familiar gradient descent family.
There are two basic PSO variants. In gbest (global best), every particle affects every other particle, sort of a universal gravitation principle. It converges quickly, but may well miss a global max in favor of a local max that happened to be nearer to the swarm's original center. In lbest (local best), a particle responds to only its k closest neighbors. This can form localized clusters; it converges more slowly, but is more likely to find the global max in a non-convex space.
I'll try to briefly explain enough to answer your clarification questions. If that doesn't work, I'm afraid you'll probably have to find someone to discuss this in front of a white board.
To use PSO, you have to decide which SVM parameters you'll try to optimize and how many particles you want to use. PSO is a meta-algorithm, so its features are the SVM parameters. The PSO parameters are the population (how many particles you want to use), the update neighbourhood (the lbest size and a distance function; gbest is the all-inclusive case), and the velocity (a learning rate for the SVM parameters).
For a bit of illustration, let's assume the particle table above, extended to a population of 20 particles. We'll use lbest with a neighbourhood of 4, and a velocity of 0.1. We choose (randomly, in a grid, or however we think might give us nice results) the initial values of degree, epsilon, gamma, and C for each of the 20 particles.
Each iteration of PSO works like this:
# Train the model described by each particle's "position"
for each of the 20 particles:
    train an SVM with the SVM input and that particle's parameter values
    test the SVM; return its error rate as the PSO loss-function value

# Update the particle positions
for each of the 20 particles:
    find the nearest 4 neighbours (using the PSO distance function)
    identify the neighbour with the lowest loss (SVM error rate)
    adjust this particle's features (degree, epsilon, gamma, C) 0.1 of the way
        toward that neighbour's features; 0.1 is our learning rate / velocity
    (yes, changing degree is unlikely to work without a special case in the
        update routine, since it is a discrete value)
Continue iterating through PSO until the particles have converged to your liking.
gbest is simply lbest with an infinite neighbourhood; in that case, you don't need a distance function on the particle space.
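Here is a small Python sketch of the loop above, using scikit-learn's SVR as the model. The dataset, the parameter ranges, and leaving degree out (because it is discrete) are choices made only for the illustration, and the update is the simplified "move 0.1 of the way toward the fittest neighbour" rule described above rather than textbook PSO with inertia and personal bests:

    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import SVR

    N_PARTICLES, NEIGHBOURS, VELOCITY, ITERS = 20, 4, 0.1, 30
    rng = np.random.default_rng(0)
    X, y = make_regression(n_samples=200, n_features=5, noise=5.0, random_state=0)

    # Each particle is a point in (epsilon, gamma, C) space.
    particles = np.column_stack([rng.uniform(0.001, 0.1, N_PARTICLES),   # epsilon
                                 rng.uniform(0.01, 1.0, N_PARTICLES),    # gamma
                                 rng.uniform(0.1, 2.0, N_PARTICLES)])    # C

    def loss(p):
        eps, gamma, C = p
        # Negative cross-validated R^2 stands in for the "error rate" above.
        return -cross_val_score(SVR(epsilon=eps, gamma=gamma, C=C), X, y, cv=3).mean()

    for _ in range(ITERS):
        losses = np.array([loss(p) for p in particles])            # train/test each particle
        for i in range(N_PARTICLES):
            d = np.linalg.norm(particles - particles[i], axis=1)   # PSO distance function
            hood = np.argsort(d)[1:NEIGHBOURS + 1]                 # 4 nearest neighbours
            best = hood[np.argmin(losses[hood])]                   # fittest neighbour
            particles[i] += VELOCITY * (particles[best] - particles[i])

    print(particles[np.argmin([loss(p) for p in particles])])      # best (epsilon, gamma, C)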

Where can I find a detailed resource on object state prediction for use with dead reckoning?

I have a server and a client.
I have 40 OpenGL cubes. Their state is described by a 3D vector for position and a 3x3 rotation matrix (or a quaternion).
How can I send intermediate packets and predict the object state on the client between those packets (extrapolation)?
For the object position I can use a linear predictor on velocity.
How to predict quaternion states?
The easiest thing, parallel to what you're doing with linear velocity, is to use a linear predictor on angular velocity.
If you have two quaternions, q_0 and q_t, representing global orientations that are t seconds apart, you can compute the finite difference between the two quaternions and use that to find an angular velocity that can be used for extrapolation.
Make sure that the inner-product between q_0 and q_t is non-negative. If it's negative, negate all the components of one of the quaternions. This makes sure that we're not trying to go the long way around. If your bodies are rotating really fast relative to your sampling, this is a problem and you'll need a more complicated model that accounts for the previous angular velocity and makes assumptions about maximum possible acceleration. We'll assume that's not the case.
Then we compute the relative difference quaternion, dq = q_t * q_0' (where q_0' is the quaternion rotational inverse/conjugate). If you have the luxury of fixed-size steps, you can stop here and predict the next orientation t seconds into the future: q_2t = dq*q_t.
If we can't step forward by integer multiples of t, we compute the angle of rotation from dq. Quaternions and angular velocities are both variations on the "axis-angle" representation of a change in orientation. If you rotate by Θ around the unit-length axis [x,y,z], the quaternion representation of that is q = [cos(Θ/2), sin(Θ/2)x, sin(Θ/2)y, sin(Θ/2)z] (using the quaternion convention where the w component comes first). If you rotate by Θ around [x,y,z] over t seconds, the angular velocity is v = [Θx,Θy,Θz]/t, so v = Θ[q.x,q.y,q.z]/(t*||[q.x,q.y,q.z]||). We can compute the angle two ways: Θ = 2acos(q.w) = 2asin(||[q.x,q.y,q.z]||); these are always equal because of step 1. Numerically it is nicer to use the sine form, since we need m = ||[q.x,q.y,q.z]|| for the next step anyway.
If m is large enough, then we just find the angular velocity:
v = 2asin(m)[dq.x,dq.y,dq.z]/(m*t)
However, if m is not large enough, we face numeric issues dividing by a near-zero value, so we use the Taylor expansion of the sinc() function around zero, which happens to be very accurate in this case. Remember that m = sin(Θ/2). With m < 1e-4, we can accurately compute asin(m)/m = 6/(6 - m*m). Then you just multiply the result by 2*[dq.x,dq.y,dq.z]/t and you have your angular velocity. Phew.
Extrapolating is then a matter of multiplying your angular velocity times the time that has passed. Then you go backwards, converting the angular change to a quaternion and multiplying it onto q_t.
It seems like there must be an easier way...
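To make the steps above concrete, here is a minimal NumPy sketch (quaternions stored as [w, x, y, z], matching the w-first convention above; the function names are mine):

    import numpy as np

    def quat_mul(a, b):
        aw, ax, ay, az = a
        bw, bx, by, bz = b
        return np.array([aw*bw - ax*bx - ay*by - az*bz,
                         aw*bx + ax*bw + ay*bz - az*by,
                         aw*by - ax*bz + ay*bw + az*bx,
                         aw*bz + ax*by - ay*bx + az*bw])

    def quat_conj(q):
        return np.array([q[0], -q[1], -q[2], -q[3]])

    def angular_velocity(q0, qt, t):
        """Angular velocity (rad/s) taking orientation q0 to qt over t seconds."""
        if np.dot(q0, qt) < 0.0:            # step 1: don't go the long way around
            qt = -qt
        dq = quat_mul(qt, quat_conj(q0))    # relative difference quaternion
        m = np.linalg.norm(dq[1:])          # m = sin(theta/2)
        if m < 1e-4:                        # near zero: asin(m)/m ~ 6/(6 - m*m)
            return 2.0 * dq[1:] * (6.0 / (6.0 - m * m)) / t
        return 2.0 * np.arcsin(m) * dq[1:] / (m * t)

    def extrapolate(qt, w, dt):
        """Advance orientation qt by angular velocity w for dt seconds."""
        angle = np.linalg.norm(w) * dt
        if angle < 1e-12:
            return qt
        axis = w / np.linalg.norm(w)
        dq = np.concatenate(([np.cos(angle / 2.0)], np.sin(angle / 2.0) * axis))
        q = quat_mul(dq, qt)
        return q / np.linalg.norm(q)        # renormalize against drift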

Continuous collision detection between two moving tetrahedra

My question is fairly simple. I have two tetrahedra, each with a current position, a linear speed in space, an angular velocity and a center of mass (center of rotation, actually).
Given this data, I am trying to find a (fast) algorithm which would precisely determine (1) whether they will collide at some point in time, and if so, (2) after how much time they collide and (3) the point of collision.
Most people would solve this with triangle-triangle collision detection, but that wastes a few CPU cycles on redundant operations, such as checking the same edge of one tetrahedron against the same edge of the other tetrahedron while testing different triangle pairs. This only means I'll optimize things a bit; nothing to worry about.
The problem is that I am not aware of any public CCD (continuous collision detection) triangle-triangle algorithm which takes self-rotation in account.
Therefore, I need an algorithm which takes the following data as input:
vertex data for three triangles
position and center of rotation/mass
linear velocity and angular velocity
And would output the following:
Whether there is a collision
After how much time the collision occurred
In which point in space the collision occurred
Thanks in advance for your help.
The commonly used discrete collision detection would check the triangles of each shape for collision, over successive discrete points in time. While straightforward to compute, it could miss a fast moving object hitting another one, due to the collision happening between discrete points in time tested.
Continuous collision detection would first compute the volumes traced by each triangle over an infinity of time. For a triangle moving at constant speed and without rotation, this volume could look like a triangular prism. CCD would then check for collision between the volumes, and finally trace back if and at what time the triangles actually shared the same space.
When angular velocity is introduced, the volume traced by each triangle no longer looks like a prism. It might look more like the shape of a screw, like a strand of DNA, or some other non-trivial shapes you might get by rotating a triangle around some arbitrary axis while dragging it linearly. Computing the shape of such volume is no easy feat.
One approach might first compute the sphere that contains an entire tetrahedron when it is rotating at the given angular velocity vector, if it was not moving linearly. You can compute a rotation circle for each vertex, and derive the sphere from that. Given a sphere, we can now approximate the extruded CCD volume as a cylinder with the radius of the sphere and progressing along the linear velocity vector. Finding collisions of such cylinders gets us a first approximation for an area to search for collisions in.
A second, complementary approach might attempt to approximate the actual volume traced by each triangle by breaking it down into small, almost-prismatic sub-volumes. It would take the triangle positions at two increments of time, and add surfaces generated by tracing the triangle vertices at those moments. It's an approximation because it connects a straight line rather than an actual curve. For the approximation to avoid gross errors, the duration between each successive moments needs to be short enough such that the triangle only completes a small fraction of a rotation. The duration can be derived from the angular velocity.
The second approach creates many more polygons! You can use the first approach to limit the search volume, and then use the second to get higher precision.
If you're solving this for a game engine, you might find the precision of the above sufficient (though I would still shudder at the computational cost). If, rather, you're writing a CAD program or working on your thesis, you might find it less than satisfying. In the latter case, you might want to refine the second approach, perhaps with a better geometric description of the volume occupied by a turning, moving triangle -- when limited to a small turn angle.
I have spent quite a lot of time wondering about geometry problems like this one, and it seems like accurate solutions, despite their simple statements, are way too complicated to be practical, even for analogous 2D cases.
But intuitively I see that such solutions do exist when you consider linear translation velocities and linear angular velocities. Don't think you'll find the answer on the web or in any book because what we're talking about here are special, yet complex, cases. An iterative solution is probably what you want anyway -- the rest of the world is satisfied with those, so why shouldn't you be?
If you were trying to collide non-rotating tetrahedra, I'd suggest taking the Minkowski sum and performing a ray check, but that won't work with rotation.
The best I can come up with is to perform swept-sphere collision using their bounding spheres to give you a range of times to check using bisection or what-have-you.
Here's an outline of a closed-form mathematical approach. Each element of this will be easy to express individually, and the final combination of these would be a closed form expression if one could ever write it out:
1) The equation of motion for each point of a tetrahedron is fairly simple in its own coordinate system. The center of mass (CM) just moves smoothly along a straight line, and the corner points rotate around an axis through the CM, assumed to be the z-axis here, so the equation for each corner point (parameterized by time t) is p = vt + x + r(sin(wt+s)i + cos(wt+s)j), where v is the velocity vector of the center of mass; r is the radius of the projection onto the x-y plane; i, j, and k are the x, y, and z unit vectors; and x and s account for the starting position and phase of rotation at t = 0.
2) Note that each object has its own coordinate system in which its motion is easy to represent, but to compare the two objects you'll need to rotate each trajectory into a common coordinate system, which may as well be the coordinate system of the screen. (Note, though, that the different coordinate systems are fixed in space and do not travel with the tetrahedra.) So determine the rotation matrices and apply them to each trajectory (i.e. the corner points and CM of each tetrahedron).
3) Now you have an equation for each trajectory, all within the same coordinate system, and you need to find the times of intersection. These can be found by testing whether any of the line segments from the corner points to the CM of one tetrahedron intersects any of the triangles of the other. This also has a closed-form expression, as can be found here.
Layering these steps will make for terribly ugly equations, but it wouldn't be hard to solve them computationally (although with the rotation of the tetrahedra you need to be sure not to get stuck in a local minimum). Another option might be to plug it into something like Mathematica to do the cranking for you. (Not all problems have easy answers.)
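As a small sketch of steps 1 and 2 combined (the argument names are mine; R is the fixed rotation matrix from the body's own coordinate system into the common one):

    import numpy as np

    def corner_position(t, v, x, r, w, s, R):
        """Step 1's trajectory p = v t + x + r (sin(wt+s) i + cos(wt+s) j),
        rotated into the common frame as in step 2.
        v, x : 3-vectors (CM velocity and starting offset) in the body's frame
        r    : radius of the corner's circle about the z (rotation) axis
        w, s : angular rate and phase at t = 0
        R    : 3x3 rotation from the body's frame into the common frame"""
        p_local = v * t + x + r * np.array([np.sin(w*t + s), np.cos(w*t + s), 0.0])
        return R @ p_local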
Sorry, I'm not a math boffin and have no idea what the correct terminology is; I hope my poor terms don't hide my meaning too much.
Pick some arbitrary timestep.
Compute the bounds of each shape in two dimensions perpendicular to the axis it is moving on for the timestep.
For a timestep:
If the shafts swept out by those bounds for any two objects intersect, halve the timestep and recurse into it.
It's a kind of binary search of increasingly fine precision to discover the point at which a finite intersection occurs.
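A rough sketch of that recursion (swept_bounds_a, swept_bounds_b, and boxes_intersect are hypothetical stand-ins for whatever swept bounding volumes and overlap test you use):

    def earliest_overlap(swept_bounds_a, swept_bounds_b, boxes_intersect,
                         t0, t1, eps=1e-4):
        """Bisect [t0, t1] down to width eps, keeping the earliest sub-interval
        in which the two swept bounds overlap; returns that interval or None."""
        if not boxes_intersect(swept_bounds_a(t0, t1), swept_bounds_b(t0, t1)):
            return None
        if t1 - t0 <= eps:
            return (t0, t1)
        mid = 0.5 * (t0 + t1)
        # Prefer the earlier half; fall back to the later half.
        return (earliest_overlap(swept_bounds_a, swept_bounds_b, boxes_intersect,
                                 t0, mid, eps)
                or earliest_overlap(swept_bounds_a, swept_bounds_b, boxes_intersect,
                                    mid, t1, eps))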
Your problem can be cast into a linear programming problem and solved exactly.
First, suppose (p0,p1,p2,p3) are the vertices at time t0 and (q0,q1,q2,q3) are the vertices at time t1 for the first tetrahedron. Then in 4d space-time they fill the following 4d closed volume:
V = { (r,t) | (r,t) = a0 (p0,t0) + … + a3 (p3,t0) + b0 (q0,t1) + … + b3 (q3,t1) }
Here the a0...a3 and b0…b3 parameters are in the interval [0,1] and sum to 1:
a0+a1+a2+a3+b0+b1+b2+b3=1
The second tetrahedron is similarly a convex polytope (add a ' to everything above to define V', the 4d volume for that moving tetrahedron).
Now the intersection of two convex polytopes is again a convex polytope. The first time this happens is given by the following linear programming problem:
If (p0,p1,p2,p3) moves to (q0,q1,q2,q3)
and (p0’,p1’,p2’,p3’) moves to (q0’,q1’,q2’,q3’)
then the first time of intersection happens at points/times (r,t):
Minimize t0*(a0+a1+a2+a3)+t1*(b0+b1+b2+b3) subject to
0 <= ak <= 1, 0 <= bk <= 1, 0 <= ak' <= 1, 0 <= bk' <= 1, for k = 0..3
a0*(p0,t0) + … + a3*(p3,t0) + b0*(q0,t1) + … + b3*(q3,t1)
= a0’*(p0’,t0) + … + a3’*(p3’,t0) + b0’*(q0’,t1) + … + b3’*(q3’,t1)
The last is actually 4 equations, one for each dimension of (r,t).
This is a total of 20 linear constraints on the 16 values ak, bk, ak', and bk'.
If there is a solution, then
(r,t)= a0*(p0,t0) + … + a3*(p3,t0) + b0*(q0,t1) + … + b3*(q3,t1)
is the point of first intersection. Otherwise they do not intersect.
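A sketch of this LP with scipy.optimize.linprog, assuming you already have the vertex positions of both tetrahedra at t0 and t1 (the function name and array layout are mine):

    import numpy as np
    from scipy.optimize import linprog

    def first_intersection(P, Q, Pp, Qp, t0, t1):
        """P, Q: (4, 3) vertex arrays of tetrahedron 1 at times t0 and t1.
        Pp, Qp: the same for tetrahedron 2 (the primed vertices).
        Returns (point, time) of first intersection, or None."""
        # Variables: [a0..a3, b0..b3, a0'..a3', b0'..b3'], all in [0, 1].
        c = np.concatenate([np.full(4, t0), np.full(4, t1), np.zeros(8)])

        A_eq, b_eq = [], []
        # Convex-combination constraints: sum(a)+sum(b) = 1, sum(a')+sum(b') = 1.
        A_eq.append(np.concatenate([np.ones(8), np.zeros(8)])); b_eq.append(1.0)
        A_eq.append(np.concatenate([np.zeros(8), np.ones(8)])); b_eq.append(1.0)
        # The two space-time points coincide: 3 spatial equations + 1 time equation.
        for d in range(3):
            A_eq.append(np.concatenate([P[:, d], Q[:, d], -Pp[:, d], -Qp[:, d]]))
            b_eq.append(0.0)
        A_eq.append(np.concatenate([np.full(4, t0), np.full(4, t1),
                                    np.full(4, -t0), np.full(4, -t1)]))
        b_eq.append(0.0)

        res = linprog(c, A_eq=np.array(A_eq), b_eq=b_eq, bounds=[(0.0, 1.0)] * 16)
        if not res.success:
            return None                   # the swept 4d volumes never intersect
        a, b = res.x[:4], res.x[4:8]
        t = t0 * a.sum() + t1 * b.sum()   # the minimized objective
        return a @ P + b @ Q, t           # point r of first contact, and its time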
Thought about this in the past but lost interest... The best way to go about solving it would be to abstract out one object.
Make a coordinate system where the first tetrahedron is the center (barycentric coordinates or a skewed system with one point as the origin) and abstract out the rotation by making the other tetrahedron rotate around that center. This should give you parametric equations if you parameterize the rotation by time.
Add the movement of the center of mass towards the first and its spin and you have a set of equations for movement relative to the first (distance).
Solve for t where the distance equals zero.
Obviously, the more effects you add with this method (like wind resistance), the messier the equations get, but it's still probably the simplest approach (almost every other collision technique uses this kind of abstraction). The biggest problem is that if you add any effects that have feedback with no analytical solution, the whole equation becomes unsolvable.
Note: if you go the route of a skewed system, watch out for pitfalls with distance: you must be in the right octant! This method favors vectors and quaternions, though, while barycentric coordinates favor matrices, so pick whichever your system uses most effectively.