Tesseract OCR: is it possible to force a specific pattern? - structure

I'm using Tesseract and I want to develop an app that is able to recognize a sequence of characters. The results are good but not excellent.
The character sequence I want to read always has a specific pattern, let's say:
number number number char char - (e.g.: 123AB)
Is there a way to "tell" the ocr engine that the structure is always fixed, in order to improve the results of the recognition?
Thank you in advance.

Try Tesseract's bazaar pattern matching with a user pattern such as:
\d\d\d\c\c

You can use the "tessedit_char_whitelist" parameter to restrict the characters Tesseract is allowed to output.
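Whichever engine-side hint is used, the recognized text can also be checked against the fixed structure afterwards. The sketch below is only an illustration in Python: the function names are made up, the letters are assumed to be plain ASCII, and the command line assumes a Tesseract build whose CLI accepts --user-patterns and -c (check tesseract --help-extra on your installation before relying on it).

```python
import re
import string

# Three digits followed by two letters, e.g. 123AB (ASCII letters assumed).
FIXED_PATTERN = re.compile(r"\d{3}[A-Za-z]{2}")

# Characters the engine would be allowed to emit (assumption: digits + ASCII letters).
WHITELIST = string.digits + string.ascii_letters


def matches_fixed_pattern(text: str) -> bool:
    """Return True if the OCR output has exactly the expected structure."""
    return FIXED_PATTERN.fullmatch(text.strip()) is not None


def tesseract_command(image_path: str, patterns_path: str) -> list:
    """Build a hypothetical CLI invocation; patterns_path would hold \\d\\d\\d\\c\\c."""
    return [
        "tesseract", image_path, "stdout",
        "--user-patterns", patterns_path,
        "-c", "tessedit_char_whitelist=" + WHITELIST,
    ]
```

A result that fails matches_fixed_pattern can then be rejected or re-scanned instead of being accepted silently.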

Related

Add spacing between numbers after some specific digits in React Native

I need to show dashes as hints and have equal spacing after certain digits in a TextInput, similar to the attached picture.
Do I need to use separate text inputs for these or is it possible to achieve this in a single input text field? The user should be able to enter the number in a single go.
If you know the phone number format, you can do it manually in one input: store the value as plain digits, but display it with spaces. If you need to support various formats from different countries, you can use this package: https://www.npmjs.com/package/react-native-phone-number-input
The best way to achieve that is to use a masked input, for example the react-native-mask-input library.
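The "store digits, display with spaces" idea is independent of the UI framework. Here is a minimal, language-neutral sketch in Python; the function name and the 3-3-4 grouping are assumptions for illustration, and in React Native the same logic would run in the input's change handler.

```python
def format_digits(raw: str, groups=(3, 3, 4), sep: str = " ") -> str:
    """Strip everything but digits, then re-insert separators for display."""
    digits = "".join(ch for ch in raw if ch.isdigit())
    parts = []
    pos = 0
    for size in groups:
        if pos >= len(digits):
            break
        parts.append(digits[pos:pos + size])
        pos += size
    if pos < len(digits):
        # Digits beyond the configured groups are kept as a trailing chunk.
        parts.append(digits[pos:])
    return sep.join(parts)
```

Because the function first strips separators, it can be applied to its own output on every keystroke without the spacing drifting.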

IEnumString searching substrings - possible?

I've implemented auto completion to a combobox like this article shows. Is it possible to make it search for substrings instead of just the beginning of the words?
http://www.codeproject.com/Articles/2371/IAutoComplete-and-custom-IEnumString-implementatio
I haven't found any way to customize how IEnumString/IAutoComplete compares the strings. Is it possible?
The built-in search options help a bit, but it is complete chaos. To find in-string matches you need to set the flag AcoWordFilter. But this prevents numbers from being matched! However, there is a trick to get numbers to match: precede them with a double quote, as in "3, to find a string containing or starting with 3. Some more chaos? With AcoWordFilter you also need to prefix other characters that are not considered part of a "word", e.g. you need to prefix parentheses with a ", but then you will not find parentheses at the first position!
So the solution is either to create your own implementation of IAutoComplete or to let the user switch between the modes (a bit awkward).
I don't think the MS engineers are especially proud of such chaos. How about one more option: AcoSearchAnywhere?
After retrieving the Edit control's IAutoComplete interface, query it for an IAutoComplete2 interface. By calling its SetOptions member, you can disable prefix filtering by specifying the ACO_NOPREFIXFILTERING flag from AUTOCOMPLETEOPTIONS.
This is available on Windows Vista and later. If you need a solution that works with pre-Vista versions, you'll have to write your own.
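If you do end up writing your own, the matching rule itself is small. The following Python sketch only illustrates the filtering step (the function name is hypothetical and nothing here touches the COM interfaces): it keeps every candidate that contains the typed text anywhere, ignoring case.

```python
def substring_matches(candidates, typed):
    """Return candidates containing the typed text anywhere, case-insensitively."""
    needle = typed.casefold()
    if not needle:
        return []
    return [c for c in candidates if needle in c.casefold()]
```

Digits and punctuation such as parentheses need no special prefix with this approach, since the comparison is a plain substring test.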

Find All in a Textbox

I am working on an application that searches a text file and builds a list of every occurrence of a string (or a variation of it). It is like a Find All function in a text editor, from which I can build a list of what was found, such as
S350
S250
S270
S5000
What can I use to do this search? The pattern has one value that does not change (the S in this case) followed by up to 4 digits.
RegEx seems like a good choice.
Something like S(\d{1,4})? might work for you.
Expresso is my preferred regular expression composer.
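As a rough illustration of the "find all" step, here is a Python sketch using the standard re module. The word boundaries are an added assumption so that longer runs such as S12345 are not partially matched; the function name is made up.

```python
import re

# A literal S followed by one to four digits, as a whole word.
CODE = re.compile(r"\bS\d{1,4}\b")


def find_all_codes(text: str) -> list:
    """Return every S-code in the text, in order of appearance."""
    return CODE.findall(text)
```

Called on the contents of the text file, this yields a list such as S350, S250, S270, S5000 that can be shown to the user directly.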

Change Url using Regex

I have a URL, for example:
http://i.myhost.com/myimage.jpg
I want to change this URL to
http://i.myhost.com/myimageD.jpg
(add D after the image name and before the dot),
i.e. I want to add some characters after the image name and before the dot using a regex.
What is the best way to do it using a regex?
Try using ^(.*)\.([a-zA-Z]{3,5}) and replacing with \1D.\2 (the dot is matched outside both groups, so it has to be written back in the replacement). I'm assuming the extension is 3-5 letters, but you can modify it to suit. E.g. if it's just jpg images then you can put that instead of the [a-zA-Z]{3,5}.
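To sanity-check a substitution like this, here is a minimal Python sketch. Python's re module stands in for whichever regex engine you actually use, the function name is made up, and the trailing $ is an added assumption so that only a final extension matches. Note that the dot sits outside both capture groups, so the replacement has to write it back.

```python
import re

# Group 1: everything before the last dot; group 2: a 3-5 letter extension.
EXT_PATTERN = re.compile(r"^(.*)\.([a-zA-Z]{3,5})$")


def add_suffix(url: str, suffix: str = "D") -> str:
    """Insert suffix between the file name and its extension."""
    return EXT_PATTERN.sub(
        lambda m: m.group(1) + suffix + "." + m.group(2), url
    )
```

A URL with no file extension at the end is returned unchanged, because the pattern simply does not match.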
This sounds like a homework question, given that the solution must use a regex. On that assumption, here is an outline to get you going.
If all you have is a URL, then @mathematical.coffee's solution will suit. However, if you have a chunk of text containing one or more URLs and you have to locate and change just those, then you'll need something a little more involved.
Look at the structure of a URL: {protocol}{address}{item}; where
{protocol} is "http://", "ftp://" etc.;
{address} is a name, e.g. "www.google.com", or a number, e.g. "74.125.237.116" - there will always be at least one dot in the address; and
{item} is "/name" where name is quite flexible - there will be zero or more items, you can think of them as directories and a file but this isn't strictly true. Also the sequence of items can end in a "/" (including when there are zero of them).
To make a regex which matches a URL start by matching each part. In the case of the items you'll want to match the last in the sequence separately - you'll have zero or more "directories" and one "file", the latter must be of the form "name.extension".
Once you have regexes for each part you just concatenate them to produce a regex for the whole. To form the replacement pattern you can surround parts of your regex with parentheses and refer to those parts using \number in the replacement string - see @mathematical.coffee's solution for an example.
The best way to learn regexes is to use an editor which supports them and just experiment. The exact syntax may not be the same as NSRegularExpression, but they are mostly pretty similar for the basic stuff and you can translate from one to another easily.
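The outline above (match each part, concatenate, capture, replace) could be sketched as follows. This is written in Python rather than with NSRegularExpression, and every sub-pattern is a simplifying assumption chosen for illustration, not a full URL grammar.

```python
import re

PROTOCOL = r"(?:https?|ftp)://"                  # {protocol}
ADDRESS = r"[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+"   # {address}: at least one dot
DIRECTORIES = r"(?:/[^/\s]+)*"                   # zero or more "directory" items
FILE_NAME = r"/[^/\s.]+"                         # last item: the name part
EXTENSION = r"\.[A-Za-z]{3,5}\b"                 # last item: the extension part

# Group 1 is everything up to the file name, group 2 is the extension.
URL_WITH_FILE = re.compile(
    "(" + PROTOCOL + ADDRESS + DIRECTORIES + FILE_NAME + ")(" + EXTENSION + ")"
)


def add_suffix_to_urls(text: str, suffix: str = "D") -> str:
    """Insert suffix before the extension of every URL that ends in a file."""
    return URL_WITH_FILE.sub(lambda m: m.group(1) + suffix + m.group(2), text)
```

URLs that do not end in a name.extension item, such as a bare host name, are left untouched because the file part of the pattern is mandatory.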

Objective-C RegexKitLite match one string or another

I'm trying to use RegexKitLite for string matching in Objective-C and I'm having some problems with it. What I'm trying to do is search a large string for substrings matching:
"http://[something].jpg"
"http://[something].png"
Basically, I want to find all links to images from the original string. What I have currently is:
NSString *regexString = @"http://[a-zA-Z0-9._%+-/]+\.jpg";
Now this is working for .jpg images, but of course it doesn't match .png images. I would really like to use one regexString that would match either, but I can't figure out how.
Reading some regex tutorials for other languages, I think it is something along the lines of:
NSString *regexString = @"http://[a-zA-Z0-9._%+-/]+\.(?:jpg|png)";
But I can't quite get it right.
Any help would be greatly appreciated.
You don't need a non-capturing group around the file extensions. It's good practice to use them, but it could be causing an error here. (Does the library support it?)
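Also, I simplified your regex slightly by using a predefined character class. Note that backslashes must be doubled inside an Objective-C string literal so that the regex engine actually receives \w and \.
NSString *regexString = @"http://[\\w.%+-/]+\\.(jpg|png)";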
You can also add any file extensions that you want, for example: (jpg|png|gif|...).
Updated: Apple now includes regular expression support with NSRegularExpression, which is available in OS X v10.7 and later.
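For comparison, the same alternation can be tried out quickly in Python. This sketch uses Python's re module, not RegexKitLite or NSRegularExpression, and the function name is made up. The hyphen is moved to the end of the character class so it is a literal rather than part of the +-/ range, and the group is non-capturing because Python's findall would otherwise return only the captured extension.

```python
import re

# http:// followed by URL characters, ending in .jpg or .png
IMAGE_LINK = re.compile(r"http://[\w.%+/-]+\.(?:jpg|png)")


def find_image_links(text: str) -> list:
    """Return every http:// link in the text that ends in .jpg or .png."""
    return IMAGE_LINK.findall(text)
```

Adding another extension is then a matter of extending the alternation, e.g. (?:jpg|png|gif).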