How can I adjust detecting latency in ML kit text recognition? - google-mlkit

I want to adjust the detection latency because I get too many words per second. I've tried to lower it by changing variables like FPS, maxFrameMs, minFrameMs, etc. (almost everything related to latency) in the VisionProcessorBase.java and InferenceInfoGraphic.java classes, but it didn't work.
How can I lower the speed of detection?

Those variables are only used for measuring and displaying latency statistics, not for configuring the camera.
You can play with the REQUESTED_FPS constant and the #selectPreviewFpsRange method inside the CameraSource.java class to lower the rate at which frames are fed to the detector.

Related

Creating a good training set for one-class detection

I am training a one-class (hands) object detector on the egohands data set. My problem is that it detects way too many things as hands. It feels like it is detecting everything that is skin-colored as a hand.
I assume the most likely explanation for this is that my training set is poor, as every single image in the set contains hands, and almost no other skin-toned elements appear in the images. I guess it is necessary to also present the network with images that do not contain what you are trying to detect?
I just want to verify that my assumptions are right before investing lots of time into creating a better training set. Therefore I am very grateful for every hint about what I am doing wrong.
Object detection preprocessing is a critical step; take extra care here, as detection networks are sensitive to geometric transformations.
Some proven data augmentation methods include (a sketch follows the list):
1. Random geometric transformations for random cropping (with constraints),
2. Random expansion,
3. Random horizontal flip,
4. Random resize (with random interpolation),
5. Random color jittering for brightness, hue, saturation, and contrast.
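For example, a minimal augmentation pipeline along these lines could look as follows. This is only a sketch, assuming the albumentations library and Pascal VOC-style boxes; none of the names or sizes come from the original answer:

    import numpy as np
    import albumentations as A

    # Sketch of the augmentations listed above (crop with constraints, flip,
    # resize, color jitter); the 512x512 target size is just an example.
    transform = A.Compose(
        [
            A.RandomSizedBBoxSafeCrop(height=512, width=512),  # random crop that keeps boxes
            A.HorizontalFlip(p=0.5),                           # random horizontal flip
            A.RandomScale(scale_limit=0.2, p=0.5),             # random resize
            A.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2, hue=0.05, p=0.5),
            A.Resize(height=512, width=512),
        ],
        bbox_params=A.BboxParams(format="pascal_voc", label_fields=["labels"]),
    )

    # Dummy sample so the snippet runs end to end.
    image = np.random.randint(0, 255, (720, 1280, 3), dtype=np.uint8)
    boxes = [[100, 100, 300, 250]]   # one hand box in [x_min, y_min, x_max, y_max]
    labels = ["hand"]
    augmented = transform(image=image, bboxes=boxes, labels=labels)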

Should the size of the photos be the same for deep learning?

I have lots of images (about 40 GB).
My images are small, but they don't all have the same size.
My images aren't of natural things; I generated them from a signal, so all pixels are important and I can't crop or delete any pixels.
Is it possible to use deep learning for this kind of images with different shapes?
All pixels are important, please take this into consideration.
I want a model which does not depend on a fixed size input image. Is it possible?
Without knowing what you're trying to learn from the data, it's tough to give a definitive answer:
You could pad all the data at the beginning (or end) of the signal so they're all the same size. This allows you to keep all the important pixels, but adds irrelevant information to the image that the network will most likely ignore.
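A minimal sketch of that padding idea (the helper name and sizes are illustrative, not from the answer):

    import numpy as np

    # Zero-pad every image up to the largest height/width in the set, so no
    # original pixels are lost and all inputs end up the same shape.
    def pad_to(img, target_h, target_w):
        pad_h = target_h - img.shape[0]
        pad_w = target_w - img.shape[1]
        assert pad_h >= 0 and pad_w >= 0, "target must be at least the image size"
        pad = ((0, pad_h), (0, pad_w)) + ((0, 0),) * (img.ndim - 2)
        return np.pad(img, pad)

    images = [np.random.rand(28, 40), np.random.rand(32, 25)]   # stand-in signal images
    H = max(im.shape[0] for im in images)
    W = max(im.shape[1] for im in images)
    batch = np.stack([pad_to(im, H, W) for im in images])       # shape (2, 32, 40)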
I've also had good luck with activations, where you take a pretrained network and pull features from the image at a certain part of the network regardless of size (as long as it's larger than the network input size), then run them through a classifier.
https://www.mathworks.com/help/deeplearning/ref/activations.html#d117e95083
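The link above is the MATLAB version; a rough PyTorch analogue of the same idea (my own sketch, assuming torchvision's pretrained ResNet-18, which is not part of the original answer) looks like this:

    import torch
    import torchvision

    # Keep only the convolutional part plus the global average pool, so the
    # output is a fixed-length feature vector for any sufficiently large input.
    backbone = torchvision.models.resnet18(
        weights=torchvision.models.ResNet18_Weights.IMAGENET1K_V1
    )
    features = torch.nn.Sequential(*list(backbone.children())[:-1])  # drop the final fc layer
    features.eval()

    with torch.no_grad():
        img = torch.rand(1, 3, 113, 201)        # arbitrary-size input image
        vec = features(img).flatten(1)          # (1, 512) feature vector
    # `vec` can then be fed to a separate classifier (SVM, small MLP, ...).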
Or you could window your data, and only process smaller chunks at one time.
https://www.mathworks.com/help/audio/examples/cocktail-party-source-separation-using-deep-learning-networks.html
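A small sketch of that windowing idea (the window and stride sizes are just examples, not taken from the linked page):

    import numpy as np

    # Slide a fixed-size window over each image and process the chunks
    # independently, so the model itself only ever sees one input shape.
    def windows(img, win_h, win_w, stride_h, stride_w):
        h, w = img.shape[:2]
        for top in range(0, h - win_h + 1, stride_h):
            for left in range(0, w - win_w + 1, stride_w):
                yield img[top:top + win_h, left:left + win_w]

    img = np.random.rand(100, 257)                    # stand-in variable-size image
    chunks = list(windows(img, 64, 64, 32, 32))       # overlapping 64x64 chunks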

Particles passing through each other

I am writing code to simulate particle movement (currently 2D, hopefully 3D soon).
The thing is, if I use a relatively large timestep, particles end up passing through each other.
Do you have any suggestion that would allow me to correct that without using a really small step?
(it is in C++ if that makes much difference).
The use of a timestep to advance the clock introduces model artifacts which can destroy the model's validity, as is happening in your case. Use discrete event scheduling instead. This paper from the Winter Simulation Conference 2005 describes how to implement movement in a discrete event framework. Your model will not only be more accurate, it will probably run much faster as well.
So you will have to do some sort of collision detection to see whether two objects would collide during the step.
Depending on your data structure, the detection could take many forms. If you just have a list of points, you would have to check each particle against every other one, which is O(N^2) per step (adding the movement vector to each particle to create a larger spatial footprint). The actual overlap test could be done with the GJK algorithm.
Using some spatial data structure could reduce the complexity by only running GJK on a pruned set of particles, i.e. skipping pairs that cannot possibly overlap.
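As a concrete illustration of the swept test (the question is in C++, but the math is the same; this is my own sketch for circular particles, not code from the answer):

    import numpy as np

    # Find the first time t in [0, dt] at which two moving circles touch,
    # instead of only checking for overlap at the end of the step.
    def time_of_impact(p1, v1, r1, p2, v2, r2, dt):
        d = np.asarray(p2, float) - np.asarray(p1, float)   # relative position
        v = np.asarray(v2, float) - np.asarray(v1, float)   # relative velocity
        r = r1 + r2
        c = d @ d - r * r
        if c <= 0.0:
            return 0.0                       # already overlapping
        a = v @ v
        b = 2.0 * (d @ v)
        disc = b * b - 4.0 * a * c
        if a == 0.0 or disc < 0.0:
            return None                      # paths never bring them into contact
        t = (-b - np.sqrt(disc)) / (2.0 * a)
        return t if 0.0 <= t <= dt else None

    # Two particles heading straight at each other: they touch at t = 0.4,
    # well inside the step, so the collision can be resolved instead of missed.
    print(time_of_impact([0, 0], [1, 0], 0.1, [1, 0], [-1, 0], 0.1, dt=1.0))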

Correcting SLAM drift error using GPS measurements

I'm trying to figure out how to correct drift errors introduced by a SLAM method using GPS measurements. I have two point sets in Euclidean 3D space taken at fixed moments in time:
The red dataset is introduced by GPS and contains no drift errors, while blue dataset is based on SLAM algorithm, it drifts over time.
The idea is that SLAM is accurate over short distances but eventually drifts, while GPS is accurate over long distances and inaccurate over short ones. So I would like to figure out how to fuse the SLAM data with GPS in a way that takes the best accuracy from both measurements. At the very least, how should I approach this problem?
Since your GPS looks like it is very locally biased, I'm assuming it is low-cost and doesn't use any correction techniques, e.g. that it is not differential. As you are probably aware, GPS errors are not Gaussian. The authors of this paper show that a good way to model GPS noise is as v + eps, where v is a locally constant "bias" vector (it is usually constant for a few meters, and then changes more or less smoothly or abruptly) and eps is Gaussian noise.
Given this information, one option would be to use Kalman-based fusion, e.g. you add the GPS noise and bias to the state vector, and define your transition equations appropriately and proceed as you would with an ordinary EKF. Note that if we ignore the prediction step of the Kalman, this is roughly equivalent to minimizing an error function of the form
measurement_constraints + some_weight * GPS_constraints
and that gives you a more straightforward second option. For example, if your SLAM is visual, you can just use the sum of squared reprojection errors (i.e. the bundle adjustment error) as the measurement constraints, and define your GPS constraints as ||x - x_{gps}||, where the x are 2D or 3D GPS positions (you might want to ignore the altitude with a low-cost GPS).
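To make the weighted-cost idea concrete, here is a toy sketch of my own (not from the answer): it uses simple relative-motion residuals as a stand-in for the real measurement constraints, with fabricated data.

    import numpy as np
    from scipy.optimize import least_squares

    def residuals(flat_x, slam_xy, gps_xy, gps_weight):
        x = flat_x.reshape(-1, 2)
        # Measurement constraints: preserve SLAM's consecutive displacements,
        # since SLAM is accurate over short distances.
        slam_res = (x[1:] - x[:-1]) - (slam_xy[1:] - slam_xy[:-1])
        # GPS constraints: pull each pose toward its (noisy but drift-free) fix.
        gps_res = gps_weight * (x - gps_xy)
        return np.concatenate([slam_res.ravel(), gps_res.ravel()])

    # Fabricated 2D trajectory so the snippet runs end to end.
    t = np.linspace(0.0, 1.0, 50)
    truth = np.stack([10.0 * t, np.sin(3.0 * t)], axis=1)
    slam_xy = truth + np.cumsum(np.random.normal(0, 0.05, truth.shape), axis=0)  # drifting
    gps_xy = truth + np.random.normal(0, 0.5, truth.shape)                       # noisy

    sol = least_squares(residuals, slam_xy.ravel(), args=(slam_xy, gps_xy, 0.1))
    fused = sol.x.reshape(-1, 2)   # drift removed at large scale, local shape preserved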
If your SLAM is visual and feature-point based (you didn't really say what type of SLAM you are using, so I assume the most widespread type), then fusion with any of the methods above can lead to "inlier loss": you make a sudden, violent correction and increase the reprojection errors, which means that you lose inliers in SLAM's tracking. So you have to re-triangulate points, and so on. Plus, note that even though the paper I linked to above presents a model of the GPS errors, it is not a very accurate model, and assuming that the distribution of GPS errors is unimodal (necessary for the EKF) seems a bit adventurous to me.
So, I think a good option is to use barrier-term optimization. Basically, the idea is this: since you don't really know how to model GPS errors, assume that you have more confidence in SLAM locally, and minimize a function S(x) that captures the quality of your SLAM reconstruction. Let x_opt denote the minimizer of S. Then, fuse with GPS data as long as it does not deteriorate S(x_opt) by more than a given threshold. Mathematically, you'd want to minimize
some_coef / (thresh - S(x)) + ||x - x_{gps}||
and you'd initialize the minimization with x_opt. A good choice for S is the bundle adjustment error, since by not degrading it, you prevent inlier loss. There are other choices of S in the literature, but they are usually meant to reduce computational time and add little in terms of accuracy.
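A tiny numerical sketch of that barrier objective (my own illustration, using a stand-in quadratic for S; all names and values are made up):

    import numpy as np
    from scipy.optimize import minimize

    x_opt = np.array([1.0, 2.0])        # pretend minimizer of the SLAM cost S
    x_gps = np.array([1.5, 2.8])        # GPS fix we would like to move toward

    def S(x):
        # Stand-in for the bundle adjustment / reprojection error around x_opt.
        return float(np.sum((x - x_opt) ** 2))

    thresh = S(x_opt) + 0.25            # how much degradation of S we tolerate
    coef = 1e-3

    def objective(x):
        s = S(x)
        if s >= thresh:                 # outside the barrier: infeasible
            return np.inf
        return coef / (thresh - s) + np.linalg.norm(x - x_gps)

    # Initialize at x_opt, as suggested above; the barrier keeps S(x) < thresh.
    res = minimize(objective, x_opt, method="Nelder-Mead")
    print(res.x, S(res.x))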
This, unlike the EKF, does not have a nice probabilistic interpretation, but produces very nice results in practice (I have used it for fusion with other things than GPS too, and it works well). You can for example see this excellent paper that explains how to implement this thoroughly, how to set the threshold, etc.
Hope this helps. Please don't hesitate to tell me if you find inaccuracies/errors in my answer.

Automatic feature extraction from chess board positions

I am working on a project where I take a chess board position (a FEN string converted to binary) and its evaluation score and feed it to a neural network. My aim is to make the neural network differentiate between good and bad positions.
How I encode the position: there are 12 unique pieces in chess, i.e. pawn, rook, knight, bishop, queen and king for white as well as black. I encode each piece using 4 bits, with 0000 denoting an empty square. So the 64 squares are encoded into 256 bits, and I use 6 more bits to denote game state like whose turn it is to move, castling status, etc.
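For reference, a minimal sketch of this kind of encoding (my own illustration, assuming the python-chess library for FEN parsing; the exact layout of the 6 state bits is just one possible choice, not necessarily the one I use):

    import numpy as np
    import chess  # python-chess, used only to parse the FEN

    PIECE_CODES = {  # 1..12 for the twelve piece types, 0 = empty square
        (chess.PAWN, chess.WHITE): 1, (chess.KNIGHT, chess.WHITE): 2,
        (chess.BISHOP, chess.WHITE): 3, (chess.ROOK, chess.WHITE): 4,
        (chess.QUEEN, chess.WHITE): 5, (chess.KING, chess.WHITE): 6,
        (chess.PAWN, chess.BLACK): 7, (chess.KNIGHT, chess.BLACK): 8,
        (chess.BISHOP, chess.BLACK): 9, (chess.ROOK, chess.BLACK): 10,
        (chess.QUEEN, chess.BLACK): 11, (chess.KING, chess.BLACK): 12,
    }

    def encode_fen(fen):
        board = chess.Board(fen)
        bits = []
        for square in chess.SQUARES:               # 64 squares * 4 bits = 256 bits
            piece = board.piece_at(square)
            code = 0 if piece is None else PIECE_CODES[(piece.piece_type, piece.color)]
            bits.extend(int(b) for b in format(code, "04b"))
        bits.append(int(board.turn))               # side to move
        for color in (chess.WHITE, chess.BLACK):   # 4 castling-rights bits
            bits.append(int(board.has_kingside_castling_rights(color)))
            bits.append(int(board.has_queenside_castling_rights(color)))
        bits.append(int(board.ep_square is not None))  # en passant available
        return np.array(bits, dtype=np.float32)    # 262-dimensional input vector

    x = encode_fen(chess.STARTING_FEN)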
Problem: since the input space for chess positions is neither smooth nor unimodal (one small change in the board position can result in a huge change in the evaluation score), the neural network doesn't learn well. Now, the next logical thing is to somehow extract useful features (like material difference, center control, etc.) and feed them to the network.
I do not want to hand pick the features as I want the network to learn everything by itself. Therefore I am thinking of extracting features automatically using autoencoders. Is there any better way to accomplish this?
Summary: what is the best way to automatically extract features from a chess board position so that they can be fed into a neural network?
UPDATE: to generate training data, I have modified Stockfish to dump its evaluation process into a log file. So every new move (position) it considers is written to the file as a FEN string along with its eval score.
Neural networks can approximate any function. The only consideration is the dimensionality of the search space, which constrains the amount of data you need to get a good approximation.
For a supervised network (you mention autoencoders, so I assume you use some variant of backpropagation), it's difficult for me to imagine how you intend to do the training using single positions, because you need similar positions in your training set. Maybe your approach is different, but I'm convinced that the second strategy (using features) is more promising. I think using raw positions would require a huge amount of training data to get good results.
For features, take a look here, and at the classical work of Shannon.
I also took useful information from the source code of Crafty.
But you have to extract this information from the FEN string.
Autoencoders are a way to reduce the data (good because it improves performance). Using Principal Component Analysis seems to work better, as reported here.
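A minimal sketch of that reduction step (my own illustration, assuming scikit-learn; the shapes and component count are arbitrary):

    import numpy as np
    from sklearn.decomposition import PCA

    # Stand-in batch of encoded positions: 10,000 samples of the 262-bit vector.
    X = np.random.randint(0, 2, size=(10000, 262)).astype(np.float32)

    pca = PCA(n_components=64)              # keep 64 principal components
    X_reduced = pca.fit_transform(X)        # feed X_reduced to the network instead of X
    print(X_reduced.shape, pca.explained_variance_ratio_.sum())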
I hope this can help you.