The forward-backward algorithm - hidden-markov-models

I am following Chapter 9 of "Speech and Language Processing" by Jurafsky and Martin and trying to implement the forward-backward algorithm.
As far as I understand, it does not estimate the transition probabilities from the start state or into the terminal state. But the authors refer to a spreadsheet by Jason Eisner in which those probabilities are in fact estimated.
In Rabiner's tutorial paper, only the transition probability from the start state is estimated.
What is going on?
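To make the discrepancy concrete, here is a minimal NumPy sketch (my own notation, not from any of the three sources) of how the boundary re-estimates look once the per-timestep state posteriors gamma are available from forward-backward. Rabiner's update uses only the posterior at t=1 for the start probabilities; Eisner's spreadsheet additionally re-estimates the transition into the terminal state.

```python
import numpy as np

def reestimate_boundary_probs(gamma):
    """Re-estimate start/end transition probabilities from state
    posteriors gamma of shape (T, N), where
    gamma[t, i] = P(state i at time t | observations).

    Returns (pi, a_end):
      pi[i]    -- P(start -> state i): Rabiner's pi_i = gamma_1(i)
      a_end[i] -- P(state i -> end): the extra quantity in Eisner's
                  spreadsheet, expected count of ending in state i
                  divided by expected total count of visits to i.
    """
    pi = gamma[0]                          # posterior at t = 1
    a_end = gamma[-1] / gamma.sum(axis=0)  # final posterior / total visits
    return pi, a_end

# Toy posteriors for a 2-state model over T = 3 observations
gamma = np.array([[0.8, 0.2],
                  [0.5, 0.5],
                  [0.3, 0.7]])
pi, a_end = reestimate_boundary_probs(gamma)
```

Under this reading, the book's presentation simply omits the two boundary updates, while Eisner estimates both and Rabiner only the first; the core E-step is the same in all three.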


Phase measurement: 3D visualization of the unwrapped phase function

Recently I tried the phase-shifting profilometry method to reconstruct a 3D surface.
Input Images
Object's Phase Function
Everything went smoothly until I found that the reconstructed surface comes out as a tilted (diagonal) plane when visualized, due to the phase-unwrapping algorithm.
3D object Visualize
I want to ask whether there is any method to make the surface horizontal (i.e., parallel to the XY plane).
Sorry that I cannot post images here ("I need at least 10 reputation to post images"), so the images are at the links below.
1: https://i.stack.imgur.com/R8BMt.gif
2: https://i.stack.imgur.com/QueKS.png
3: https://i.stack.imgur.com/6jysU.png
Thank you very much!
Ju Lee,
what you are actually plotting is the phase map, from which the 3D object can be computed. The easiest way (which is only an approximation) to get a depth map is to take the phase difference against the phase map of a reference plane (without an object on it):
depth(x,y) ~ a*(phase_map(x,y) - phase_map_reference(x,y)),
where a is a scaling factor you have to determine experimentally. This "easy" procedure is roughly adapted from Takeda's famous 1983 paper: https://doi.org/10.1364/AO.22.003977.
See formula 22 therein for small phase differences.
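The approximation above amounts to a per-pixel subtraction and scaling; a minimal NumPy sketch (the scale factor a and the toy arrays are placeholders you would replace with your calibrated value and measured phase maps):

```python
import numpy as np

def depth_from_phase(phase_map, phase_map_reference, a):
    """Approximate depth map from the unwrapped phase of the object
    scene and of an empty reference plane (small-phase approximation,
    in the spirit of Takeda 1983): depth ~ a * (phase - phase_ref)."""
    return a * (phase_map - phase_map_reference)

# Toy example: a flat reference plane and a small bump on top of it
ref = np.zeros((4, 4))
obj = ref.copy()
obj[1:3, 1:3] = 0.5          # phase shift caused by the object
depth = depth_from_phase(obj, ref, a=2.0)
```

Because the reference-plane phase is subtracted, the tilted carrier plane cancels out and only the object's relief remains, which directly addresses the "diagonal surface" problem in the question.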
A more accurate procedure giving the full 3D pointcloud directly from the phase can be found in Zhang's 2006 paper https://doi.org/10.1364/OE.14.009120.
For that, however, you have to calibrate the camera-projector system and compute an "absolute" phase map. This typically takes a lot of work, but references for it are linked in Zhang's paper.
Have fun!

Calculating gradients in optical flow optimization (example: for incremental Horn and Schunck method)

I am having trouble understanding how gradients are calculated for the incremental Horn and Schunck method (and actually not only for that method, but more generally in iterative optimization methods for optical flow / deformable image registration). I use the Horn and Schunck example here because there is a paper I can refer to that shows what is unclear to me.
I was reading the following paper http://campar.in.tum.de/pub/zikic2010revisiting/zikic2010revisiting.pdf, which states that the incremental Horn and Schunck method is actually the same as applying a Gauss-Newton optimization scheme to the original problem. In incremental Horn and Schunck, the solution of one Horn and Schunck iteration is used as the initial estimate for the following one: the displacement field is used to warp the source image, then the next Horn and Schunck iteration calculates an incremental step. Afterwards, the initial estimate and the step are added and used as the initialization for the next iteration. So far, so good; one can do this, even if I wouldn't say it is intuitive that this procedure of splitting things up and putting them back together should be correct.
Now the paper states that this (at first sight heuristic) approach can be derived as a Gauss-Newton optimization, which means it should have a more solid mathematical foundation. And here I find a pattern that I have come across more than once but cannot reason about.
In the term
I_T(x) - I_S(x + U(x)),
I_T is the target image, I_S is the source image, and U(x) is the deformation field that is optimized.
When linearizing this energy term around the current deformation field U(x) (equation 19 in the paper, slightly changed), we get for a step h(x):
linearized(h(x)) ≡ I_T(x) - I_S(x + U(x)) - grad(I_S(x + U(x))) * h(x)
Now the question is: what is grad(I_S(x + U(x)))? The authors argue that this is the same gradient as in the incremental Horn and Schunck method, but I would say the gradient needs to be taken with respect to U(x), i.e., I would take the numerical gradient of I_S at position x + U(x). However, I have often seen that instead the image is warped according to U(x), and then the numerical gradient of the warped image is taken at position x. This also seems to be what incremental Horn and Schunck does, but it doesn't seem correct to me.
Is it an approximation that nobody talks about? Am I missing something? Or were all the implementations I saw, which use numerical gradients of warped images when iteratively optimizing for optical flow, simply doing the wrong thing?
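To make the two options concrete, here is a 1-D NumPy/SciPy sketch (my own construction, not from the paper). Option A samples grad(I_S) at the warped positions x + U(x); Option B warps I_S first and differentiates the warped signal. By the chain rule, B equals A times (1 + dU/dx), so the two coincide exactly only where U is locally constant; that factor is the approximation in question.

```python
import numpy as np
from scipy.ndimage import map_coordinates

n = 200
x = np.linspace(0.0, 2 * np.pi, n)
dx = x[1] - x[0]
I_S = np.sin(3 * x)             # 1-D "image"
grad_I = np.gradient(I_S, dx)   # numerical gradient of the unwarped image
idx = np.arange(n, dtype=float)

def option_a(U_px):
    """Sample grad(I_S) at the warped positions x + U (U in samples)."""
    return map_coordinates(grad_I, [idx + U_px], order=1)

def option_b(U_px):
    """Warp I_S first, then differentiate the warped signal at x."""
    warped = map_coordinates(I_S, [idx + U_px], order=1)
    return np.gradient(warped, dx)

# Constant displacement: both options agree (away from the boundary)
U_const = np.full(n, 5.0)
a1, b1 = option_a(U_const), option_b(U_const)

# Spatially varying displacement: they differ by the factor (1 + dU/dx)
U_var = 10.0 * np.sin(x)
a2, b2 = option_a(U_var), option_b(U_var)
```

With U_const the two results match; with U_var they visibly disagree wherever U changes quickly, which supports reading the warped-image gradient as an approximation that is exact only for locally constant displacements.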
Many thanks to anybody who could help me to get a bit enlightened.

Should deep learning classification be used to classify details such as liquid level in a bottle

Can deep learning classification be used to precisely label/classify both an object and one of its features? For example, to identify the bottle (like Grants Whiskey) and the liquid level in the bottle (in 10-percent steps - like 50% full). Is this a problem that can best be solved using one of the deep learning frameworks (TensorFlow etc.), or is some other approach more effective?
Well, this should be quite possible if the liquor is well colored. If not (e.g. gin, vodka), I'd say you have no chance with today's technology when observing the object from a natural viewing angle and distance.
For colored liquor, I'd train two detectors. One for detecting the bottle, and a second one to detect the liquor given the bottle. The ratio between the two will be your percentage.
Some of the proven state-of-the-art deep learning-based object detectors (just Google them):
Multibox
YOLO
Faster RCNN
Or non-deep-learning-based:
Deformable part model
EDIT:
I was asked to elaborate. Here is an example:
The box detector draws, e.g., a box in the image at [0.1, 0.2, 0.5, 0.6] (min_height, min_width, max_height, max_width), which is the relative location of your bottle.
Now you crop the bottle from the original image and feed it to the second detector. The second detector draws, e.g., [0.2, 0.3, 0.7, 0.8] in your cropped bottle image; this box indicates the fluid it has detected. Now (0.7 - 0.2) * (0.8 - 0.3) = 0.25 is the relative area of the fluid with respect to the area of the cropped bottle image, which is what OP is asking for.
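The ratio computation from the example is a one-liner; a minimal sketch using the answer's box convention (all coordinates relative, in [0, 1]):

```python
def fill_ratio(liquid_box):
    """Relative area of the detected liquid box within the cropped
    bottle image; box is (min_h, min_w, max_h, max_w) in [0, 1]."""
    min_h, min_w, max_h, max_w = liquid_box
    return (max_h - min_h) * (max_w - min_w)

ratio = fill_ratio((0.2, 0.3, 0.7, 0.8))   # the worked example above
```

Note that an area ratio and a height-only ratio are different quantities; for a fill *level* in percent, (max_h - min_h) of the liquid box relative to the bottle height may be the more natural choice.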
EDIT 2:
I entered this reply assuming OP wants to use deep learning. I'd agree other methods should be considered if OP is still unsure about deep learning. For bottle detection, deep learning-based methods have been shown to outperform traditional methods by a large margin. Bottle detection happens to be one of the classes in the PASCAL VOC challenge. See the results comparison here: http://rodrigob.github.io/are_we_there_yet/build/detection_datasets_results.html#50617363616c20564f43203230313020636f6d7034
For the liquid detection, however, deep learning might be overkill. E.g., if you know what color you are looking for, even a simple color filter will give you "something".
The rule of thumb for deep learning is: if it is visible in the image, i.e., an expert can tell you the answer based solely on the image, then the chances are very high that you can learn this with deep learning, given enough annotated data.
However, you are quite unlikely to have the data required for such a task, so I would ask myself whether I can simplify the problem. For example, you could take gin, vodka and so on and use SIFT to find the bottle again in a new scene, then RANSAC to verify the bottle detection and cut the bottle out of the image.
Then I would try gradient features to find the edge at the liquid level. Finally you can calculate the percentage as (liquid edge - bottle bottom) / (bottle top - bottle bottom).
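The gradient-based liquid-level step can be sketched without any learning at all: on a cropped, roughly upright grayscale bottle image, the liquid surface shows up as the row with the strongest change in brightness. A minimal NumPy sketch (the synthetic image and the row-profile heuristic are my own choices, not a tested pipeline):

```python
import numpy as np

def liquid_level_percentage(bottle):
    """Estimate the fill level from a cropped grayscale bottle image
    (rows run top to bottom). The liquid surface is taken to be the
    row with the largest mean vertical intensity gradient."""
    row_profile = bottle.mean(axis=1)             # mean brightness per row
    edge_strength = np.abs(np.diff(row_profile))  # vertical gradient
    liquid_row = int(np.argmax(edge_strength))    # strongest horizontal edge
    n_rows = bottle.shape[0]
    # percentage = (bottom - liquid edge) / (bottom - top)
    return (n_rows - 1 - liquid_row) / (n_rows - 1)

# Synthetic bottle: bright empty part on top, dark liquid from row 60 down
bottle = np.full((100, 40), 0.9)
bottle[60:, :] = 0.2
level = liquid_level_percentage(bottle)   # roughly 40% full
```

A real image would first need the SIFT/RANSAC cropping described above, plus some smoothing of the row profile so that label edges don't outvote the liquid surface.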
Identifying the bottle label should not be hard to do - it's even available "on tap" for cheap (these guys actually use it to identify wine bottle labels on their website): https://aws.amazon.com/blogs/aws/amazon-rekognition-image-detection-and-recognition-powered-by-deep-learning/
Regarding the liquid level, this might be a problem AWS wouldn't be able to solve straight away - but I'm sure it's possible to make a custom CNN for it. Alternatively, you could use good old humans on Amazon Mechanical Turk to do the work for you.
(Note: I do not work for Amazon)

Deep neural network diverges after convergence

I implemented the A3C network in https://arxiv.org/abs/1602.01783 in TensorFlow.
At this point I'm 90% sure the algorithm is implemented correctly. However, the network diverges after convergence. See the attached image, obtained from a toy example where the maximum episode reward is 7.
When it diverges, the policy network starts giving a single action very high probability (>0.9) for most states.
What should I check for this kind of problem? Is there any reference for it?
Note that in Figure 1 of the original paper the authors say:
"For asynchronous methods we average over the best 5 models from 50 experiments."
That can mean that in a lot of cases the algorithm does not work that well. From my experience, A3C often diverges, even after convergence. Careful learning-rate scheduling can help. Or do what the authors did: train several agents with different seeds and pick the one performing best on your validation data. You could also employ early stopping when the validation error begins to increase.
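The early-stopping suggestion can be as simple as snapshotting the best validation score and halting after it fails to improve for a patience window; a minimal, framework-agnostic sketch (class and variable names are my own):

```python
class EarlyStopper:
    """Stop training once the validation reward has not improved for
    `patience` consecutive evaluations; remember the best score seen."""
    def __init__(self, patience=5):
        self.patience = patience
        self.best = float("-inf")
        self.bad_evals = 0

    def step(self, val_reward):
        """Record one validation result; return True when training should stop."""
        if val_reward > self.best:
            self.best = val_reward
            self.bad_evals = 0
        else:
            self.bad_evals += 1
        return self.bad_evals >= self.patience

# A trajectory like the one in the question: converges to 7, then diverges
stopper = EarlyStopper(patience=3)
rewards = [2, 4, 6, 7, 7, 6, 5, 3, 1]
stopped_at = next(i for i, r in enumerate(rewards) if stopper.step(r))
```

Combined with periodic checkpointing, this lets you keep the weights from the best evaluation instead of the diverged final ones.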

Framework of Cart Pole w/ Reinforcement Learning

I am working on a side project that models the inverted pendulum problem and solves it
with a reinforcement learning algorithm, most notably Q-learning. I have already engineered a simple MDP solver for a grid world - easy stuff.
However, I am struggling to figure out how to do this after days of scouring research papers. Nothing explains how to build up a framework for representing the problem.
When modelling the problem, can a standard Markov Decision Process be used? Or must it be a POMDP?
What is represented in each state (i.e. what state info is passed to the agent)? The coordinates, velocity, angle of the pendulum, etc.?
What actions can the agent take? Is it a continuous range of velocities in + or - x direction?
Advice on this is greatly appreciated.
"Reinforcement Learning: An Introduction" by Richard S. Sutton and Andrew G. Barto is the default book on reinforcement learning and they also talk about the cart pole problem (http://webdocs.cs.ualberta.ca/~sutton/book/the-book.html). Sutton also offers the C code of the cart pole problem: http://webdocs.cs.ualberta.ca/~sutton/book/code/pole.c
Of course there are many implementations of the problem online: https://github.com/stober/cartpole
There are multiple ways to frame the problem, depending on how hard you want to make it.
You can model it as an MDP or a POMDP.
The state can consist of position, velocity, angle and angular velocity, or any subset of these.
You can discretize the state space, or you can use function approximation.
The actions can be simply min and max acceleration (discrete), or something in between (discrete or continuous).
Start easy and work your way up to the more difficult problems!
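As a concrete starting point for the "discretize the state space" route, here is a minimal sketch (the bin counts and ranges are my own rough guesses, not canonical values) that maps the four-dimensional continuous state to a single Q-table row index:

```python
import numpy as np

# 7 interior bin edges per dimension -> 8 bins each, for
# (cart position, cart velocity, pole angle, pole angular velocity);
# tune the ranges to your simulator
BIN_EDGES = [
    np.linspace(-2.4, 2.4, 9)[1:-1],    # cart position
    np.linspace(-3.0, 3.0, 9)[1:-1],    # cart velocity
    np.linspace(-0.21, 0.21, 9)[1:-1],  # pole angle (radians)
    np.linspace(-3.5, 3.5, 9)[1:-1],    # pole angular velocity
]

def discretize(state):
    """Map a continuous 4-D state to a flat index usable as a Q-table row."""
    flat = 0
    for value, edges in zip(state, BIN_EDGES):
        flat = flat * 8 + int(np.digitize(value, edges))
    return flat

# Q-table: one row per discrete state, one column per action (left / right)
q_table = np.zeros((8 ** 4, 2))
idx = discretize((0.0, 0.1, 0.05, -0.1))   # some near-upright state
```

With this in place, tabular Q-learning from the grid-world project carries over directly; values outside the ranges simply fall into the outermost bins.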