I am working on a side project that models the inverted pendulum problem and solves it
with a reinforcement learning algorithm, specifically Q-learning. I have already engineered a simple MDP solver for a grid world - easy stuff.
However, after days of scouring research papers I am still struggling to figure out how to approach this problem. Nothing explains how to build up a framework for representing it.
When modelling the problem, can a standard Markov Decision Process be used? Or must it be a POMDP?
What is represented in each state (i.e. what state information is passed to the agent)? The coordinates, velocity and angle of the pendulum, etc.?
What actions can the agent take? Is it a continuous range of velocities in the + or - x direction?
Advice on this is greatly appreciated.
"Reinforcement Learning: An Introduction" by Richard S. Sutton and Andrew G. Barto is the default book on reinforcement learning and they also talk about the cart pole problem (http://webdocs.cs.ualberta.ca/~sutton/book/the-book.html). Sutton also offers the C code of the cart pole problem: http://webdocs.cs.ualberta.ca/~sutton/book/code/pole.c
Of course there are many implementations of the problem online: https://github.com/stober/cartpole
There are multiple versions of the problem, depending on how hard you want to make it.
You can model it as an MDP or a POMDP.
The state can consist of position, velocity, angle and angular velocity, or any subset of these.
You can discretize the state space, or you can use function approximation.
The actions can simply be the minimum and maximum acceleration (discrete), or anything in between (discrete or continuous).
Start easy and work your way up to the more difficult problems!
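To make that concrete, here is a minimal tabular Q-learning sketch in Python along those lines - discretized state bins, two discrete actions - assuming the classic cart-pole dynamics with the constants from Sutton's pole.c linked above; the bin edges, reward and hyperparameters are illustrative choices, not canonical ones.

```python
import math
import numpy as np

# Classic cart-pole constants (matching Sutton's pole.c)
GRAVITY, M_CART, M_POLE = 9.8, 1.0, 0.1
POLE_LEN, FORCE, DT = 0.5, 10.0, 0.02

def step(state, action):
    """One Euler step of the cart-pole dynamics (equations as in pole.c)."""
    x, x_dot, theta, theta_dot = state
    f = FORCE if action == 1 else -FORCE
    total_m = M_CART + M_POLE
    tmp = (f + M_POLE * POLE_LEN * theta_dot ** 2 * math.sin(theta)) / total_m
    theta_acc = (GRAVITY * math.sin(theta) - math.cos(theta) * tmp) / (
        POLE_LEN * (4.0 / 3.0 - M_POLE * math.cos(theta) ** 2 / total_m))
    x_acc = tmp - M_POLE * POLE_LEN * theta_acc * math.cos(theta) / total_m
    return (x + DT * x_dot, x_dot + DT * x_acc,
            theta + DT * theta_dot, theta_dot + DT * theta_acc)

# Discretize the continuous state into bins so tabular Q-learning applies
BINS = [np.linspace(-2.4, 2.4, 3),    # cart position
        np.linspace(-2.0, 2.0, 3),    # cart velocity
        np.linspace(-0.21, 0.21, 6),  # pole angle (rad), ~12 degrees
        np.linspace(-2.0, 2.0, 3)]    # pole angular velocity

def discretize(state):
    return tuple(int(np.digitize(v, b)) for v, b in zip(state, BINS))

Q = {}                                # (discrete state, action) -> value
ALPHA, GAMMA, EPS = 0.1, 0.99, 0.1    # learning rate, discount, exploration

for episode in range(2000):
    state = (0.0, 0.0, np.random.uniform(-0.05, 0.05), 0.0)
    for t in range(500):
        s = discretize(state)
        if np.random.rand() < EPS:    # epsilon-greedy action selection
            a = np.random.randint(2)
        else:
            a = max((0, 1), key=lambda act: Q.get((s, act), 0.0))
        state = step(state, a)
        failed = abs(state[0]) > 2.4 or abs(state[2]) > 0.21
        reward = -1.0 if failed else 1.0
        s2 = discretize(state)
        target = reward + (0.0 if failed else
                           GAMMA * max(Q.get((s2, b), 0.0) for b in (0, 1)))
        q = Q.get((s, a), 0.0)
        Q[(s, a)] = q + ALPHA * (target - q)   # Q-learning update
        if failed:
            break
```

With coarse bins like these the agent learns a rough balancing policy; tightening the angle bins or switching to function approximation is the natural next step up in difficulty.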
Can deep learning classification be used to precisely label/classify both an object and one of its features? For example, to identify a bottle (like Grants Whiskey) and the liquid level in the bottle (in 10 percent steps - like 50% full). Is this a problem best solved with one of the deep learning frameworks (TensorFlow etc.), or is some other approach more effective?
Well, this should be quite possible if the liquor is strongly colored. If not (e.g. gin, vodka), I'd say you have no chance with today's technologies when observing the object from a natural viewing angle and distance.
For colored liquor, I'd train two detectors. One for detecting the bottle, and a second one to detect the liquor given the bottle. The ratio between the two will be your percentage.
Some of the proven state-of-the-art deep learning-based object detectors (just Google them):
Multibox
YOLO
Faster R-CNN
Or non-deep-learning-based:
Deformable part model
EDIT:
I was asked to elaborate more. Here is an example:
The first detector draws, for example, a box in the image at [0.1, 0.2, 0.5, 0.6] (min_height, min_width, max_height, max_width), which is the relative location of your bottle.
Now you crop the bottle from the original image and feed it to the second detector. The second detector draws, e.g., [0.2, 0.3, 0.7, 0.8] in your cropped bottle image; this box indicates the fluid it has detected. Now (0.7 - 0.2) * (0.8 - 0.3) = 0.25 is the relative area of the fluid with respect to the area of the bottle, which is what OP is asking for.
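If it helps, here is the same arithmetic as a tiny Python sketch; the function name and box values are just illustrative:

```python
def box_area(box):
    """Area of a box given as (min_height, min_width, max_height, max_width)
    in coordinates relative to the image it was detected in."""
    min_h, min_w, max_h, max_w = box
    return (max_h - min_h) * (max_w - min_w)

# The liquid box is detected inside the *cropped* bottle image, so its
# relative area is directly the fill estimate from the example above:
print(box_area((0.2, 0.3, 0.7, 0.8)))  # (0.7 - 0.2) * (0.8 - 0.3) = 0.25
```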
EDIT 2:
I entered this reply assuming OP wants to use deep learning. I'd agree other methods should be considered if OP is still unsure about deep learning. For bottle detection, deep learning-based methods have been shown to outperform traditional methods by a large margin. Bottle detection happens to be one of the classes in the PASCAL VOC challenge. See the results comparison here: http://rodrigob.github.io/are_we_there_yet/build/detection_datasets_results.html#50617363616c20564f43203230313020636f6d7034
For the liquid detection, however, deep learning might be slightly overkill. E.g. if you know what color you are looking for, even a simple color filter will give you "something".
The rule of thumb for deep learning is: if it is visible in the image, i.e. an expert can tell you the answer solely based on the image, then the chances are very high that you can learn this with deep learning, given enough annotated data.
However, you are quite unlikely to have the data required for such a task, so I would ask myself whether I can simplify the problem. For example, you could take gin, vodka and so on and use SIFT to find the bottle again in a new scene, then use RANSAC for bottle detection and cut the bottle out of the image.
Then I would try gradient features to find the edge at the liquid level. Finally you can calculate the percentage as (liquid edge - bottle bottom) / (bottle top - bottle bottom).
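As a rough sketch of that last step (illustrative only, assuming an upright, tightly cropped grayscale bottle image as a NumPy array), the liquid edge can be taken as the row with the strongest vertical brightness change:

```python
import numpy as np

def fill_fraction(bottle_gray):
    """Estimate the fill level of a cropped, upright grayscale bottle image.
    The liquid surface is assumed to be the strongest horizontal edge."""
    rows = bottle_gray.mean(axis=1)               # average brightness per row
    edge = int(np.argmax(np.abs(np.diff(rows))))  # row of sharpest change
    bottom, top = bottle_gray.shape[0] - 1, 0
    # Image rows grow downward, so this mirrors
    # (liquid edge - bottle bottom) / (bottle top - bottle bottom):
    return (bottom - edge) / (bottom - top)
```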
Identifying the bottle label should not be hard to do - it's even available "on tap" for cheap (these guys actually use it to identify wine bottle labels on their website): https://aws.amazon.com/blogs/aws/amazon-rekognition-image-detection-and-recognition-powered-by-deep-learning/
Regarding the liquid level, this might be a problem AWS wouldn't be able to solve straight away - but I'm sure it's possible to make a custom CNN for it. Alternatively, you could use good old humans on Amazon Mechanical Turk to do the work for you.
(Note: I do not work for Amazon)
I'm working on ACO and am a little confused about the probability of choosing the next city. I have read some papers and books but the idea of choosing is still unclear. I am looking for a simple explanation of how this path building works.
Also, how do the heuristics and pheromone come into this decision making?
Since we have the same pheromone values on every edge in the beginning, and the heuristic (closeness) values remain constant, how will different ants make different decisions based on these values?
Maybe it is too late to answer the question, but lately I have been working with ACO and it was also a bit confusing for me. I didn't find much information on Stack Overflow about ACO when I needed it, so I have decided to answer this question since this info may be useful for people working on ACO now or in the future.
Swarm Intelligence Algorithms are a set of techniques based on the emergent social and cooperative behavior of organisms grouped in colonies, swarms, etc.
Ant Colony Optimization (ACO) is a collective intelligence optimization algorithm inspired by ant colonies. In nature, ants of some species initially wander randomly until they find a food source, then return to their colony laying down a pheromone trail. If other ants find this trail, they are likely not to keep travelling at random, but instead to follow the trail, reinforcing it if they eventually find food.
In the Ant Colony Optimization algorithm the agents (ants) are placed on different nodes (usually the number of ants equals the number of nodes). Each ant chooses the next node (city) using the equation known as the transition rule, which gives the probability for ant k to go from city i to city j on the t-th tour:

p_ij^k(t) = [τ_ij(t)]^α [η_ij]^β / Σ_l [τ_il(t)]^α [η_il]^β

where the sum runs over the cities l that ant k has not yet visited. In the equation, τ_ij represents the pheromone trail and η_ij the visibility between the two cities (usually the inverse of their distance, 1/d_ij), while α and β are adjustable parameters that control the relative weight of trail intensity and visibility.
At the beginning there is the same amount of pheromone on all the edges. Based on the transition rule, which in turn is based on the pheromone and the visibility, some paths will be more likely to be chosen than others.
When the algorithm runs, each agent (ant) performs a tour (visits each node); the best tour found so far is then reinforced with a new quantity of pheromone, which makes that tour more likely to be chosen by the ants next time.
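To make the transition rule concrete, here is a small Python sketch of how one ant picks its next city (the names and the α, β values are illustrative):

```python
import random

def next_city(current, unvisited, tau, dist, alpha=1.0, beta=2.0):
    """Pick the next city with the ACO transition rule.

    tau[i][j]  : pheromone on edge (i, j)
    dist[i][j] : distance between cities; visibility is eta = 1 / dist
    """
    weights = [tau[current][j] ** alpha * (1.0 / dist[current][j]) ** beta
               for j in unvisited]
    # Roulette-wheel selection: sample proportionally to the weights
    return random.choices(unvisited, weights=weights, k=1)[0]
```

Note that the choice is sampled, not maximized: even with identical pheromone on every edge, two ants can (and usually will) pick different cities, which is exactly what makes different ants explore different tours at the start.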
You can find more information about ACO in the following links:
http://dataworldblog.blogspot.com.es/2017/06/ant-colony-optimization-part-1.html
http://dataworldblog.blogspot.com.es/2017/06/graph-optimization-using-ant-colony.html
http://hdl.handle.net/10609/64285
Regards,
Ester
I am currently developing a hair strand system for my project. Currently I am using Verlet integration to simulate gravity and wind.
The wind is currently just a vector, but I want to make it more realistic.
Are there any papers or articles I should read about this? Thanks.
It depends on how deep you want to go with the simulation. I suppose that you want something more interesting than uniform wind with varying direction and intensity.
I would suggest adding turbulent velocity to each strand with 3D Curl/Simplex noise. Even animated Perlin noise might be cheap and fast enough for your needs, but you might be able to get more dramatic effects with curl noise.
The original paper for curl noise is here: http://www.cs.ubc.ca/~rbridson/docs/bridson-siggraph2007-curlnoise.pdf
You can also find several implementations of it, but the basic idea is still the same - perturbing particles according to an underlying flow-field.
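As a toy illustration of that idea (a 2D analogue using a sinusoid potential instead of the 3D Perlin/simplex potentials used in the paper; all constants here are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
K = rng.normal(size=(4, 2))            # random wave vectors of the potential
PHASE = rng.uniform(0, 2 * np.pi, 4)   # random phases

def wind_velocity(x, y, t):
    """Divergence-free turbulent wind from a scalar potential psi:
    v = (d psi / dy, -d psi / dx), with psi a sum of drifting sinusoids."""
    vx = vy = 0.0
    for (kx, ky), p in zip(K, PHASE):
        c = np.cos(kx * x + ky * y + p + t)  # derivative of sin(...)
        vx += ky * c                         #  d psi / dy
        vy += -kx * c                        # -d psi / dx
    return vx, vy
```

Because the velocity is the curl of a potential, the field is divergence-free, so strands swirl along flow lines instead of piling up; you would add this velocity (scaled by gust strength) to each strand vertex on top of your base wind vector.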
I am currently working on a school project involving a robot that has to navigate a corn field.
We need to build the complete software in NI LabVIEW.
Because of the tasks the robot has to be able to perform, the robot has to know its position.
As sensors we have a 6-DOF IMU, some unreliable wheel encoders and a 2D laser scanner (SICK TIM351).
So far I have been unable to find any suitable algorithms or tutorials, and thus I am really stuck on this problem.
I am wondering if anyone has ever attempted to make SLAM work in LabVIEW, and if so, are there any examples or explanations of how to do this?
Or is there perhaps a toolkit for LabVIEW that contains this function/algorithm?
Kind regards,
Jesse Bax
3rd year mechatronic student
As Slavo mentioned, there's the LabVIEW Robotics module that contains algorithms like A* for pathfinding. But as far as I am aware, there's not much in it that can help you solve the SLAM problem. The SLAM problem consists of the following parts: landmark extraction, data association, state estimation and updating of state.
For landmark extraction, you have to pick one or multiple features that you want the robot to recognize. This can for example be a corner or a line (a wall in 3D). You can for example use clustering, split-and-merge, or the RANSAC algorithm. I believe your laser scanner extracts and stores the points in a list sorted by angle, which makes the split-and-merge algorithm very feasible (see the sketch below). RANSAC is the most accurate of the three, but it also has higher complexity. I recommend starting with some optimal data points for testing the line extraction: for example, put your laser scanner in a small room with straight walls, perform one scan and save it to an array or a file. Make sure the contour is a bit more complex than just four walls, and remove noise either before or after the measurement.
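Here is a minimal sketch of the split step of split-and-merge in Python, assuming the scan points arrive as an angle-sorted Nx2 NumPy array; the merge step (re-joining nearly collinear neighbouring segments) is left out:

```python
import numpy as np

def split_lines(points, threshold=0.05):
    """Recursively split an angle-sorted scan into line segments.
    Returns (start, end) index pairs into `points` for each segment."""
    def split(lo, hi):
        v = points[hi] - points[lo]
        w = points[lo:hi + 1] - points[lo]
        # Perpendicular distance of each point to the chord lo -> hi
        d = np.abs(v[0] * w[:, 1] - v[1] * w[:, 0]) / np.linalg.norm(v)
        k = lo + int(np.argmax(d))
        if d.max() > threshold and lo < k < hi:
            return split(lo, k) + split(k, hi)
        return [(lo, hi)]
    return split(0, len(points) - 1)
```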
I haven't read up on good methods for data association, but you could for example just consider a landmark new if it is more than a certain distance away from any existing landmark, and update the old landmark otherwise.
State estimation and updating of state can be achieved with the complementary filter or the Extended Kalman Filter (EKF). The EKF is the de facto standard for nonlinear state estimation [1] and tends to work very well in practice. The theory behind the EKF is quite tough, but it should be a tad easier to implement. I would recommend using the MathScript module if you are going to program the EKF. The point of these two filters is to estimate the position of the robot from the wheel encoders and the landmarks extracted from the laser scanner.
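For reference, the generic EKF predict/update pair looks like this in Python (a sketch of the standard equations, not LabVIEW/MathScript syntax; in SLAM the state x would stack the robot pose and the landmark positions):

```python
import numpy as np

def ekf_predict(x, P, f, F, Q):
    """Prediction: f is the motion model (e.g. wheel odometry),
    F its Jacobian at x, Q the process noise covariance."""
    return f(x), F @ P @ F.T + Q

def ekf_update(x, P, z, h, H, R):
    """Correction with a landmark measurement z: h is the measurement
    model, H its Jacobian at x, R the measurement noise covariance."""
    y = z - h(x)                        # innovation
    S = H @ P @ H.T + R                 # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)      # Kalman gain
    return x + K @ y, (np.eye(len(x)) - K @ H) @ P
```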
As the SLAM problem is a big task, I would recommend programming it in multiple smaller SubVIs, so that you can properly test each part without too much added complexity.
There are also a lot of good papers on SLAM:
http://www.cs.berkeley.edu/~pabbeel/cs287-fa09/readings/Durrant-Whyte_Bailey_SLAM-tutorial-I.pdf
http://ocw.mit.edu/courses/aeronautics-and-astronautics/16-412j-cognitive-robotics-spring-2005/projects/1aslam_blas_repo.pdf
The book "Probabalistic Robotics".
https://wiki.csem.flinders.edu.au/pub/CSEMThesisProjects/ProjectSmit0949/Thesis.pdf
LabVIEW provides the LabVIEW Robotics module. There are also plenty of templates for the Robotics module. First you can check the Starter Kit 2.0 template, which will provide you with a simple working self-driving robot project. You can build on such a template and develop your own application from a working model rather than from scratch.
I figured someone probably asked this question before but I wasn't able to find an answer.
I'm writing a physics library for my game engine (2D, currently in ActionScript 3, but easily translatable to C-based languages).
I'm having problems finding a good formula to calculate the inertia of my game objects.
The thing is, there are plenty of proven formulas to calculate inertia around a centroid of a convex polygon, but my structure is slightly different: I have game-objects with their own local space. You can add convex shapes such as circles and convex polygons to this local space to form complex objects. The shapes themselves again have their own local space. So there are three layers: World, object & shape space.
I would have no problems calculating the inertia of each individual polygon in the shape with the formulas provided in the moments of inertia Wikipedia article, or the ones provided in an awesome collision detection & response article.
But I'm wondering how to relate this to my object structure: do I simply add up all the inertias of the shapes of the object? That's what another writer does to calculate the inertia of triangulated polygons; he adds all the moments of inertia of the triangles. Or is there more to it?
I find this whole inertia concept quite difficult to understand as I don't have a strong physics background. So if anyone could provide me with an answer, preferably with the logic behind inertia around a given centroid, I would be very thankful. I actually study I.T. - Game development at my university, but to my great frustration none of the teachers in their ranks are experienced in the area of physics.
Laurens, the physics is much simpler if you stay in two dimensional space. In 2D space, rotations are described by a scalar, resistance to rotation (moment of inertia) is described by a scalar, and rotations are additive and commutative. Things get hairy (much, much hairier) in three dimensional space.
When you connect two objects, the combined object has its own center of mass. To calculate the moment of inertia of this combined object, you need to sum the moments of inertia of the individual objects and also add an offset term given by the Steiner parallel axis theorem for each individual object. This offset term is the mass of the object times the square of the distance to the composite center of mass.
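In code that could look like the following sketch (2D, Python for brevity even though OP uses ActionScript; `parts` and the tuple layout are made-up names for illustration):

```python
def composite_inertia(parts):
    """Moment of inertia of a composite 2D object about its centre of mass.

    parts: list of (mass, (cx, cy), I_local) tuples, where I_local is the
    shape's moment of inertia about its own centroid (from the standard
    formulas) and (cx, cy) is that centroid in object space.
    """
    total_m = sum(m for m, _, _ in parts)
    com_x = sum(m * c[0] for m, c, _ in parts) / total_m
    com_y = sum(m * c[1] for m, c, _ in parts) / total_m
    inertia = 0.0
    for m, (cx, cy), i_local in parts:
        d2 = (cx - com_x) ** 2 + (cy - com_y) ** 2  # squared offset to COM
        inertia += i_local + m * d2                 # parallel axis theorem
    return total_m, (com_x, com_y), inertia
```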
The primary reason you need to know the moment of inertia is so that you can simulate the response to torques that act on your object. This is fairly straightforward in 2D physics. Rotational behavior is an analog to Newton's second law: instead of F = ma you use T = Iα. (Things once again are much hairier in 3D space.) You need to find the external forces and torques, solve for linear acceleration and rotational acceleration, and then integrate numerically.
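A sketch of that integration step (semi-implicit Euler, purely illustrative):

```python
def integrate_rotation(angle, angular_vel, torque, inertia, dt):
    """Advance a 2D rigid body's rotation one timestep using T = I * alpha."""
    angular_acc = torque / inertia     # alpha = T / I
    angular_vel += angular_acc * dt    # integrate acceleration -> velocity
    angle += angular_vel * dt          # integrate velocity -> angle
    return angle, angular_vel
```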
A good beginner's book on game physics is probably in order. You can find a list of recommended texts in this question at the gamedev sister site.
For linear motion you can just add them. Inertia is proportional to mass. Adding the masses of your objects and calculating the inertia of the sum is equivalent to adding their individual inertias.
For rotation it gets more complicated: you need to find the centre of mass.
Read up on Newton's laws of motion. You'll need to understand them if you're writing a physics engine. The laws themselves are very short, but understanding them requires more context, so google around.
You should specifically try to understand the concepts: Mass, Inertia, Force, Acceleration, Momentum, Velocity, Kinetic energy. They're all related.