Full Page Text Recognition Dataset Creation - tensorflow

I have been reading OCR papers such as this one: https://arxiv.org/pdf/1704.08628.pdf, and I am having trouble finding out how these datasets are actually generated.
In the linked paper, they use a regressor to predict the start location (a point) and height of a line of text. Then, based on that starting point and height, a second network performs OCR and end-of-line detection. I realize this is a very simplified explanation, but it follows that their dataset consists (at least in part) of full-page text images annotated with where each line begins, along with a transcription of the text on each line. Alternatively, they could have just used the lower-left point of each bounding box as the start point and the height of the box as the line height (avoiding the need to re-annotate if the data was previously prepared using bounding boxes).
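To make that concrete, something like the following is my own sketch of what that bounding-box conversion could look like; the (x, y, w, h) convention with the origin at the top-left and y pointing down is an assumption, not something taken from the paper.

# Hypothetical conversion of bounding-box annotations into the
# (start point, line height, transcription) targets described above.
# Assumes boxes are (x, y, w, h) with (x, y) the top-left corner and
# the y-axis pointing down, as in most image libraries.

def box_to_line_target(box, transcription):
    x, y, w, h = box
    start_point = (x, y + h)   # lower-left corner of the box
    line_height = h            # box height stands in for line height
    return {"start": start_point, "height": line_height, "text": transcription}

page_annotations = [
    box_to_line_target((120, 45, 900, 38), "First line of the page"),
    box_to_line_target((120, 95, 870, 40), "Second line of the page"),
]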
So how is a dataset like this actually created? Looking at other datasets, it seems like there is some software that can create XML files containing the ground truth relevant to each image. Can someone point me in the right direction? I've been googling around and finding lots of tools for annotating text with sentiment etc. and other tools for annotating images for segmentation (for something like a YOLO network), but I'm coming up empty for creating something like the Maurdor dataset used in the linked paper.
Thank you

So after submitting this, the related threads window showed me many threads that my googling did not turn up. This software (http://www.prima.cse.salford.ac.uk/tools) seems to be what I was looking for, but I would still love to hear other ideas.

Related

Inflation layer not working in certain geometries in ANSYS meshing tool

I am trying to implement an inflation layer between two geometries in my mesh using ANSYS, and I am confused about the procedure.
I found online (see the answer from Gopinath N K on 1/17/22) that in the ANSYS meshing tool you cannot combine face meshing with inflation. So I tried removing the face sizings, thinking that was what was being referred to, but it gave mixed results, which I'll explain below.
Second, I saw here that to create inflation I might need to employ named selections instead of selecting the two geometries (a body and a face), but this also gave mixed results.
As to my mixed results, I successfully got an inflation layer to work for a cylindrical body inside another cylindrical one (see images below). The blue larger cylinder is the body (red arrow), and the green circles are the edges of the small cylinder inside (green arrows). I created this inflation layer successfully.
However, when I try to create an inflation layer between the Rotating Zone (larger cylinder) and the Stationary Zone, the inflation layer fails. This occurs as soon as I select the rectangular larger body. I didn't bother to finish selecting the other faces since next to Active it says "No, Invalid Method". The same thing occurs if I select the Structured Zone (smallest cylinder) and the faces of the wing (angled plate subtracted from the Structured Zone). So I really have no clue what is causing this, since it seems to occur as soon as I select the outer larger body geometry. Maybe I'm not selecting the right set of faces, or there is something else that is leading to this.
Thank you
So it turns out that the message saying "No, Invalid Method" is referring to a Hex Dominant method I created. There are certain mesh methods that inflation does not like to work with, and I haven't been able to find any reason why. I hope anyone using the ANSYS Mesher finds this helpful.

Reverse Image search (for image duplicates) on local computer

I have a bunch of poor-quality photos that I extracted from a PDF. Somebody I know has the good-quality photos somewhere on her computer (Mac), but it's my understanding that it will be difficult to find them.
I would like to:
loop through each poor quality photo
perform a reverse image search using each poor quality photo as the query image and using this person's computer as the database to search for the higher quality images
and create a copy of each high quality image in one destination folder.
Example pseudocode:
for each image in poorQualityImages:
    search ./macComputer for a higherQualityImage of image
    copy higherQualityImage to ./higherQualityImages
I need to perform this action once.
I am looking for a tool, github repo or library which can perform this functionality more so than a deep understanding of content based image retrieval.
There's a post on reddit where someone was trying to do something similar
imgdupes is a program which seems like it almost achieves this, but I do not want to delete the duplicates; I want to copy the highest quality duplicate to a destination folder.
Update
Emailed my previous image processing prof and he sent me this
Off the top of my head, nothing out of the box.
No guaranteed solution here, but you can narrow the search space. You’d need a little program that outputs the MSE or SSIM similarity index between two images, and then write another program or shell script that scans the hard drive and computes the MSE between each image on the hard drive and each query image, then check the images with the top X percent similarity score.
Something like that. Still not maybe guaranteed to find everything you want. And if the low quality images are of different pixel dimensions than the high quality images, you’d have to do some image scaling to get the similarity index. If the poor quality images have different aspect ratios, that’s even worse.
So I think it’s not hard but not trivial either. The degree of difficulty is partly dependent on the nature of the corruption in the low quality images.
UPDATE
Github project I wrote which achieves what I want
What you are looking for is called image hashing. In this answer you will find a basic explanation of the concept, as well as a go-to github repo for a plug-and-play application.
Basic concept of Hashing
From the repo page: "We have developed a new image hash based on the Marr wavelet that computes a perceptual hash based on edge information with particular emphasis on corners. It has been shown that the human visual system makes special use of certain retinal cells to distinguish corner-like stimuli. It is the belief that this corner information can be used to distinguish digital images that motivates this approach. Basically, the edge information attained from the wavelet is compressed into a fixed length hash of 72 bytes. Binary quantization allows for relatively fast hamming distance computation between hashes. The following scatter plot shows the results on our standard corpus of images. The first plot shows the distances between each image and its attacked counterpart (e.g. the intra distances). The second plot shows the inter distances between altogether different images. While the hash is not designed to handle rotated images, notice how slight rotations still generally fall within a threshold range and thus can usually be matched as identical. However, the real advantage of this hash is for use with our mvp tree indexing structure. Since it is more descriptive than the dct hash (being 72 bytes in length vs. 8 bytes for the dct hash), there are much fewer false matches retrieved for image queries."
Another blogpost for an in-depth read, with an application example.
Available Code and Usage
A github repo can be found here. There are obviously more to be found.
After importing the package you can use it to generate and compare hashes:
>>> from PIL import Image
>>> import imagehash
>>> hash = imagehash.average_hash(Image.open('test.png'))
>>> print(hash)
d879f8f89b1bbf
>>> otherhash = imagehash.average_hash(Image.open('other.bmp'))
>>> print(otherhash)
ffff3720200ffff
>>> print(hash == otherhash)
False
>>> print(hash - otherhash)
36
The demo script find_similar_images, also on the mentioned github repo, illustrates how to find similar images in a directory.
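To tie this back to your actual goal (copying the best match into a destination folder rather than deleting anything), a rough sketch along these lines might work; the directory paths and the Hamming-distance threshold of 10 are placeholders you would have to adapt.

import os
import shutil
from PIL import Image
import imagehash

QUERY_DIR = "./poorQualityImages"        # assumed locations
SEARCH_ROOT = "/Users/someone/Pictures"
DEST_DIR = "./higherQualityImages"
THRESHOLD = 10                           # max Hamming distance to accept; tune this

os.makedirs(DEST_DIR, exist_ok=True)

# Hash every query image once.
query_hashes = {}
for name in os.listdir(QUERY_DIR):
    path = os.path.join(QUERY_DIR, name)
    try:
        query_hashes[name] = imagehash.average_hash(Image.open(path))
    except OSError:
        continue  # skip files PIL cannot open

# Walk the search tree and keep the closest candidate per query image.
best = {name: (None, THRESHOLD + 1) for name in query_hashes}
for root, _, files in os.walk(SEARCH_ROOT):
    for fname in files:
        fpath = os.path.join(root, fname)
        try:
            h = imagehash.average_hash(Image.open(fpath))
        except OSError:
            continue
        for name, qhash in query_hashes.items():
            dist = qhash - h  # Hamming distance between the two hashes
            if dist < best[name][1]:
                best[name] = (fpath, dist)

# Copy the best match for each query image, if one was close enough.
for name, (match, dist) in best.items():
    if match is not None and dist <= THRESHOLD:
        shutil.copy2(match, DEST_DIR)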
Premise
I'll focus my answer on the image processing part, as I believe implementation details, e.g. traversing a file system, are not the core of your problem. Also, all that follows is just my humble opinion; I am sure that there are better ways to retrieve your image of which I am not aware. Anyway, I agree with what your prof said and I'll follow the same line of thought, so I'll share some ideas on possible similarity indexes you might use.
Answer
MSE and SSIM - This is a possible solution, as suggested by your prof. As I assume the low quality images also have a different resolution than the good ones, remember to downsample the good ones (and not upsample the bad ones).
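A minimal sketch of this comparison with scikit-image, assuming grayscale comparison is acceptable (the resize is the downsampling step mentioned above):

import numpy as np
from PIL import Image
from skimage.metrics import mean_squared_error, structural_similarity

def compare(bad_path, good_path):
    bad = np.asarray(Image.open(bad_path).convert("L"))
    # Downsample the good image to the bad image's resolution.
    good = np.asarray(Image.open(good_path).convert("L").resize(
        (bad.shape[1], bad.shape[0])))
    mse = mean_squared_error(bad, good)                       # lower is more similar
    ssim = structural_similarity(bad, good, data_range=255)   # higher is more similar
    return mse, ssim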
Image subtraction (1-norm distance) - Subtract two images -> if they are equal you'll get a black image. If they are slightly different, the non-black pixels (or the sum of the pixel intensity) can be used as a similarity index. This is actually the 1-norm distance.
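For instance, with NumPy, assuming the two images have already been brought to the same shape as above:

import numpy as np

def l1_distance(img_a, img_b):
    # Cast to a signed type so the subtraction cannot wrap around,
    # then sum the absolute per-pixel differences (the 1-norm).
    diff = np.abs(img_a.astype(np.int32) - img_b.astype(np.int32))
    return diff.sum()  # 0 for identical images; larger means less similar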
Histogram distance - You can refer to this paper: https://www.cse.huji.ac.il/~werman/Papers/ECCV2010.pdf. Comparing the two images' histograms could potentially be robust for your task. Check out this question too: Comparing two histograms
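One simple way to try this is OpenCV's built-in histogram comparison; the bin count and the correlation metric below are just one reasonable choice, not what the paper above uses:

import cv2

def histogram_similarity(path_a, path_b, bins=8):
    a = cv2.imread(path_a)
    b = cv2.imread(path_b)
    # 3D colour histograms over the B, G, R channels, normalized so that
    # images of different sizes remain comparable.
    hist_a = cv2.calcHist([a], [0, 1, 2], None, [bins] * 3, [0, 256] * 3)
    hist_b = cv2.calcHist([b], [0, 1, 2], None, [bins] * 3, [0, 256] * 3)
    cv2.normalize(hist_a, hist_a)
    cv2.normalize(hist_b, hist_b)
    # Correlation: 1.0 means identical histograms, lower means less similar.
    return cv2.compareHist(hist_a, hist_b, cv2.HISTCMP_CORREL)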
Embedding learning - As I see you included tensorflow, keras or pytorch as tags, let's consider deep learning. This paper came to my mind: https://arxiv.org/pdf/1503.03832.pdf. The idea is to learn a mapping from the image space to a Euclidean space, i.e. to compute an embedding of the image. In the embedding hyperspace, images are points. This paper learns an embedding function by minimizing the triplet loss. The triplet loss is meant to maximize the distance between images of different classes and minimize the distance between images of the same class. You could train the same model on a dataset like ImageNet. You could augment the dataset by lowering the quality of the images, in order to make the model "invariant" to differences in image quality (e.g. down-sampling followed by up-sampling, image compression, adding noise, etc.). Once you can compute embeddings, you can compute the Euclidean distance (as a substitute for the MSE). This might work better than using MSE/SSIM as similarity indexes. Repo of FaceNet: https://github.com/timesler/facenet-pytorch. Another general purpose approach (not related to faces) which might help you: https://github.com/zegami/image-similarity-clustering.
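As a lighter-weight variant of this idea, you could even skip the triplet training and use a CNN pretrained on ImageNet as a fixed feature extractor; this sketch (my suggestion, not the FaceNet recipe) uses torchvision's ResNet-18 and compares embeddings by Euclidean distance.

import torch
from PIL import Image
from torchvision import models, transforms

# ResNet-18 pretrained on ImageNet, with the classification head removed
# so the model outputs a 512-dimensional embedding per image.
# (torchvision >= 0.13; older versions use pretrained=True instead of weights=...)
backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()
backbone.eval()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

def embed(path):
    img = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        return backbone(img).squeeze(0)

def embedding_distance(path_a, path_b):
    # Smaller Euclidean distance = more similar images.
    return torch.dist(embed(path_a), embed(path_b)).item()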
Siamese networks for predicting similarity score - I am referring to this paper on face verification: http://bmvc2018.org/contents/papers/0410.pdf. The siamese network takes two images as input and outputs a value in [0, 1]. We can interpret the output as the probability that the two images belong to the same class. You can train a model of this kind to predict 1 for image pairs of the form (good quality image, artificially degraded image). To degrade the image, again, you can combine e.g. down-sampling followed by up-sampling, image compression, adding noise, etc. Let the model predict 0 for image pairs of different classes (e.g. different images). The output of the network can be used as a similarity index.
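A minimal PyTorch sketch of such a siamese model; the architecture and sizes are arbitrary choices, and the training loop on degraded/different pairs is omitted:

import torch
import torch.nn as nn

class SiameseNet(nn.Module):
    def __init__(self):
        super().__init__()
        # Shared encoder applied to both images.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, 128),
        )
        # Head maps the absolute difference of the two embeddings to [0, 1].
        self.head = nn.Sequential(nn.Linear(128, 1), nn.Sigmoid())

    def forward(self, img_a, img_b):
        emb_a = self.encoder(img_a)
        emb_b = self.encoder(img_b)
        return self.head(torch.abs(emb_a - emb_b)).squeeze(1)

# Training would use nn.BCELoss() on pairs labelled 1 (same image, one copy
# artificially degraded) and 0 (different images).
model = SiameseNet()
similarity = model(torch.rand(1, 3, 128, 128), torch.rand(1, 3, 128, 128))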
Remark 1
These different approaches can also be combined. They all provide you with similarity indexes, so you can very easily average the outcomes.
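One caveat when averaging: the indexes live on different scales and point in different directions (lower MSE is better, higher SSIM is better), so a quick normalization step like this rough sketch helps:

import numpy as np

def combine_scores(mse_scores, ssim_scores):
    # Average two similarity indexes after mapping each of them
    # to [0, 1], with 1 meaning "most similar".
    mse = np.asarray(mse_scores, dtype=float)
    ssim = np.asarray(ssim_scores, dtype=float)
    # Min-max normalize; flip MSE because for MSE smaller is better.
    mse_n = 1.0 - (mse - mse.min()) / (mse.max() - mse.min() + 1e-12)
    ssim_n = (ssim - ssim.min()) / (ssim.max() - ssim.min() + 1e-12)
    return (mse_n + ssim_n) / 2.0  # one combined score per candidate image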
Remark 2
If you only need to do it once, the effort of implementing and training deep models might not be justified. I would not suggest it. Still, you can consider it if you can't find any other solution and that Mac is REALLY FULL of images and a manual search is not possible.
If you look at the documentation of imgdupes you will see there is the following option:
--dry-run
dry run (do not delete any files)
So if you run imgdupes with --dry-run you will get a listing of all the duplicate images but it will not actually delete anything. You should be able to process that output to move the images around as you need.
Try the similar image finder I have developed to address this problem.
There is an explanation and the algorithm there, so you can implement your own version if needed.

How to design a neural network for input of various size?

I'm wondering how to design a neural network where the input data can have different shapes, given that the network has a fixed number of nodes in the input layer.
Typical cases are when I want to train an image classification network on pictures with unknown (varying) resolution, or when I want to classify texts of varying length.
For images I can certainly have some preprocessing pipeline which resizes the image, but I may lose some information with it; in the case of text, the "resizing" would be even harder to perform.
Is there any trick, how to design such a network?
Three possibilities come to mind.
The easiest is zero-padding. Basically, you take a rather big input size and just add zeroes if your concrete input is too small. Of course, this is pretty limited and certainly not useful if your input ranges from a few words to full texts.
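For instance, with the Keras padding utility (an arbitrary choice of library; the fixed length of 100 is also arbitrary):

from tensorflow.keras.preprocessing.sequence import pad_sequences

# Three "texts" of different lengths, already converted to integer token ids.
sequences = [
    [12, 7, 33],
    [5, 18, 2, 44, 9, 61],
    [8],
]

# Pad (or truncate) everything to a fixed length of 100 with trailing zeros,
# so the batch fits a network with a fixed-size input layer.
padded = pad_sequences(sequences, maxlen=100, padding="post", value=0)
print(padded.shape)  # (3, 100)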
Recurrent NNs (RNN) are a very natural NN to choose if you have texts of varying size as input. You input words as word vectors (or embeddings) just one after another and the internal state of the RNN is supposed to encode the meaning of the full string of words. This is one of the earlier papers.
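Staying with Keras for the sake of a sketch, a minimal variable-length text model could look like this; the Embedding layer's mask_zero flag tells the LSTM to ignore the zero-padding, and the vocabulary size, dimensions and binary output are assumptions.

from tensorflow.keras import layers, models

vocab_size = 10000  # assumed vocabulary size

model = models.Sequential([
    layers.Input(shape=(None,)),             # variable-length token-id sequences
    layers.Embedding(vocab_size, 64, mask_zero=True),
    layers.LSTM(64),                          # final state summarizes the text
    layers.Dense(1, activation="sigmoid"),    # e.g. a binary text classifier
])
model.compile(optimizer="adam", loss="binary_crossentropy")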
Another possibility is using recursive NNs. This is basically a form of preprocessing in which a text is recursively reduced to a smaller number of word vectors until only one is left - your input, which is supposed to encode the whole text. This makes a lot of sense from a linguistic point of view if your input consists of sentences (which can vary a lot in size), because sentences are structured recursively. For example, the word vector for "the man", should be similar to the word vector for "the man who mistook his wife for a hat", because noun phrases act like nouns, etc. Often, you can use linguistic information to guide your recursion on the sentence. If you want to go way beyond the Wikipedia article, this is probably a good start.
In the case of images, instead of asking your NN to recognize what's in the whole image, you can ask it what's in a particular, e.g. 256x256, part of your picture. You train for that, and then apply it to partly overlapping rolling windows over your whole image. If your to-be-recognized pattern varies a lot in size, you can resize the image and run your NN again.
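A bare-bones sketch of that rolling-window idea, assuming some classify_patch function that accepts fixed 256x256 crops (the 50% overlap is an arbitrary choice, and edge handling is glossed over):

import numpy as np

def sliding_window_predictions(image, classify_patch, size=256, stride=128):
    # Run a fixed-size classifier over partly overlapping crops of an
    # arbitrarily large image and collect its prediction per window.
    h, w = image.shape[:2]
    results = []
    for top in range(0, max(h - size, 0) + 1, stride):
        for left in range(0, max(w - size, 0) + 1, stride):
            patch = image[top:top + size, left:left + size]
            results.append(((top, left), classify_patch(patch)))
    return results

# Example with a dummy classifier on a random "image".
image = np.random.rand(1024, 768, 3)
preds = sliding_window_predictions(image, lambda p: p.mean())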

Methods of labeling human muscles on tensorflow

I want to be able to label all of the muscles on an athlete's body. I have a lot of images in which the athletes are in almost the same body pose, but the issue I am running into is that drawing a box around a muscle is inaccurate, as it ends up overlapping other muscles. Drawing exact outlines around them is a bit difficult, as there are a lot of smaller muscles, and this creates inconsistency over 20-30 images. I was wondering if there is a way to feed in a human anatomy model and then have TensorFlow go in and label all of the muscles in given pictures.
Or I was wondering if you all had a different idea on how to approach this problem that I'm running into.
I don't have anybody else to ask and I've been researching this for a while, so if I missed or overlooked something please forgive me.
The way I see it, you need to combine this with some preprocessing steps to normalize your target object in the image, such as:
identify the human,
identify the pose or skeleton (for which there are nowadays many open-source tools, such as openpose-plus),
use the pose estimation results to label the limbs or parts of the body, from which you can proceed either with hand-crafted image processing or another segmentation model, as in the sketch after this list.
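As a rough illustration of the last two steps, here is a sketch that uses MediaPipe Pose as a stand-in for openpose-plus: it detects skeleton keypoints and derives a crude box around the left upper arm from the shoulder and elbow, which you could then refine with image processing or a segmentation model. The region logic is entirely a placeholder.

import cv2
import mediapipe as mp

mp_pose = mp.solutions.pose

def rough_upper_arm_box(image_bgr):
    # Return a rough pixel box around the left upper arm, derived from pose
    # keypoints. Placeholder logic; a real pipeline would refine this region.
    h, w = image_bgr.shape[:2]
    with mp_pose.Pose(static_image_mode=True) as pose:
        result = pose.process(cv2.cvtColor(image_bgr, cv2.COLOR_BGR2RGB))
    if not result.pose_landmarks:
        return None  # no person detected
    lm = result.pose_landmarks.landmark
    shoulder = lm[mp_pose.PoseLandmark.LEFT_SHOULDER.value]
    elbow = lm[mp_pose.PoseLandmark.LEFT_ELBOW.value]
    xs = [int(shoulder.x * w), int(elbow.x * w)]   # landmarks are normalized
    ys = [int(shoulder.y * h), int(elbow.y * h)]
    margin = 20  # pixels of padding around the shoulder-elbow segment
    return (min(xs) - margin, min(ys) - margin,
            max(xs) + margin, max(ys) + margin)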

Rendered LaTeX file's pictures come out after the section that precedes them in the LaTeX file

So I have a LaTeX file with some plots. However, there's a problem with the figure floats. I have a picture right before a page break, and LaTeX pushes the too-big plot down to the next page. Fine enough. But then it decides to be clever and pushes the section after the plot up into the free space that was left when the picture was pushed down. The result is, as you can imagine, pretty confusing if you are discussing plots.
Here is some code
\subsection{Part A2.1.6}
The Xsi2-distribution with h=40 and 1\% significance gives 63.7(22.2) which
is significantly smaller than both Ljung-Box applied to the linear and
quadratic residuals, from which we can conclude that the residuals are not iid,
and that the linear and quadratic models are not as good as wanted.
\begin{figure}[h!]
\hspace*{-2.5cm}
\centering
\includegraphics[scale=0.40]{plot7_MEGAPLOT.png}
\hspace*{-2.5cm}
\vspace*{-0.5cm}
\caption{Plot of raw and deseasonalized data}
\end{figure}
\\
\section{Part A2.2}
\subsection{Part A2.2.1}
The main qualitative difference between the plots/roots is that we have an
converging oscillating function for the complex roots while the real values
gives us a weakly oscillating converging function.
It sounds like you might need the placeins package to prevent floats crossing a section barrier.
I've summarised this and most of the other solutions for handling float placements here.
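For instance, a minimal version of the placeins approach; the [section] option makes every \section an implicit float barrier, or you can place \FloatBarrier by hand right before the heading the figure must not drift past:

% in the preamble
\usepackage[section]{placeins}

% ...or, without the package option, insert a manual barrier:
\FloatBarrier
\section{Part A2.2}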
This is quite normal. If there is not enough space for a figure, then the space shouldn't be left blank; it should be filled with whatever text comes next in the document. (You don't see quarter-empty pages in professionally published books, for example.)
If you don't want the figures to float, then you can use [H], but I don't recommend it because, as you've discovered, it leaves lots of blank space.
My recommendation to everyone using floats is to not give them a placement argument at all (the default is [tbp]) or use [htbp] and let LaTeX put things where it likes. Getting good spacing once the document is finished is as much a problem of tweaking the surrounding material as it is playing with the float parameters.
By the way, no discussion of how LaTeX handles floats is complete without a link to Robin Fairbairn's FAQ entry on the subject.
What you don't say is what you expect to see. If the plot is too big for "here", then LaTeX has to put it on the next page (or a page of floats). That leaves some space which, as Will says, should be filled with something. What effect are you hoping to see?