Defining the whitelist of characters
Restricting recognition to a subset of characters instead of the full set can greatly improve recognition accuracy, increase speed, and reduce resource consumption. A list of characters can be automatically identified from an image using the built-in Aspose.OCR mechanisms.
Predefined character subsets
To select a predefined set of characters for the Aspose.OCR engine to look for, pass one of the following values to the setAllowedCharacters method of the RecognitionSettings object:
Subset | Action |
---|---|
CharactersAllowedType.ALL | Try to recognize all characters. |
CharactersAllowedType.LATIN_ALPHABET | Only recognize Latin / English text (A to Z and a to z), without accented characters. |
CharactersAllowedType.DIGITS | Recognize only binary, octal, decimal, or hexadecimal numbers (0-9 and A to F). |
Characters that do not match the provided subset are ignored.
// Create the OCR engine instance
AsposeOCR api = new AsposeOCR();
// Restrict recognition to the predefined DIGITS subset
RecognitionSettings recognitionSettings = new RecognitionSettings();
recognitionSettings.setAllowedCharacters(CharactersAllowedType.DIGITS);
// Recognize the image and print the extracted text
RecognitionResult result = api.RecognizePage("source.png", recognitionSettings);
System.out.println("Recognition result:\n" + result.recognitionText + "\n\n");
Custom characters list
You can specify your own list of characters to be recognized in the constructor of the AsposeOCR class or in the setAllowedCharacters method of the RecognitionSettings object. The characters are provided as a case-sensitive string.
Characters that do not match the provided list are ignored.
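As a rough illustration of what a case-sensitive custom list means in practice, the plain-Java sketch below builds a whitelist string and shows which characters of a sample text would match it. The CharacterWhitelist class and its build and keepAllowed methods are hypothetical helpers written for this example; they are not part of Aspose.OCR and only mimic the matching rule described above, not how the engine works internally.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper for illustration only; not part of Aspose.OCR.
class CharacterWhitelist {

    // Joins the fragments into one whitelist string, dropping duplicate
    // characters while keeping first-seen order.
    static String build(String... fragments) {
        Set<Character> seen = new LinkedHashSet<>();
        for (String fragment : fragments) {
            for (char c : fragment.toCharArray()) {
                seen.add(c);
            }
        }
        StringBuilder sb = new StringBuilder(seen.size());
        for (char c : seen) {
            sb.append(c);
        }
        return sb.toString();
    }

    // Keeps only the characters that appear in the whitelist.
    // The comparison is case-sensitive: 'a' does not match 'A'.
    static String keepAllowed(String text, String whitelist) {
        StringBuilder sb = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (whitelist.indexOf(c) >= 0) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Digits, uppercase hexadecimal letters, and a dash.
        String whitelist = build("0123456789", "ABCDEF", "-");
        System.out.println(whitelist);

        // Lowercase letters, the colon, and the space are not in the list.
        System.out.println(keepAllowed("id: 7F-a3", whitelist));

        // The whitelist string is what would be passed to setAllowedCharacters,
        // as described in the section above.
    }
}
```

Because matching is case-sensitive, a list meant to cover both cases of a letter has to include both forms, for example "ABCDEFabcdef".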