Defining the whitelist of characters

Restricting recognition to a subset of characters, instead of the full character set, can improve recognition accuracy, increase speed, and reduce resource consumption. You can choose one of the predefined subsets built into Aspose.OCR or provide your own list of characters.

Predefined character subsets

To restrict recognition to a predefined set of characters, pass one of the following values to the setAllowedCharacters method of the RecognitionSettings object:

Subset                                Description
CharactersAllowedType.ALL             Try to recognize all characters.
CharactersAllowedType.LATIN_ALPHABET  Only recognize Latin/English text (A to Z and a to z), without accented characters.
CharactersAllowedType.DIGITS          Only recognize binary, octal, decimal, or hexadecimal numbers (0 to 9 and A to F).

Characters that do not match the provided subset are ignored.

import com.aspose.ocr.*;

AsposeOCR api = new AsposeOCR();
// Limit recognition to digits (0-9 and A-F); all other characters are ignored
RecognitionSettings recognitionSettings = new RecognitionSettings();
recognitionSettings.setAllowedCharacters(CharactersAllowedType.DIGITS);
RecognitionResult result = api.RecognizePage("source.png", recognitionSettings);
System.out.println("Recognition result:\n" + result.recognitionText + "\n\n");
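To make the character ranges in the table concrete, the following plain-Java sketch models each predefined subset as a membership test. It does not call Aspose.OCR. The class SubsetRules, its Subset enum and its allows method are hypothetical, and they encode only the ranges the table lists. For example, DIGITS also admits the hexadecimal letters A to F. The table does not say whether lowercase a to f count as hexadecimal digits, so the sketch assumes uppercase only.

```java
import java.util.function.IntPredicate;

/** Hypothetical model of the predefined character subsets; not part of Aspose.OCR. */
class SubsetRules {
    // Names mirror the CharactersAllowedType constants, but this enum is our own
    enum Subset { ALL, LATIN_ALPHABET, DIGITS }

    static IntPredicate rule(Subset subset) {
        switch (subset) {
            case ALL:
                // Every character is a candidate
                return c -> true;
            case LATIN_ALPHABET:
                // Unaccented Latin letters only: A-Z and a-z
                return c -> (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
            case DIGITS:
                // 0-9 plus A-F (assumed uppercase), enough for binary, octal, decimal and hex numbers
                return c -> (c >= '0' && c <= '9') || (c >= 'A' && c <= 'F');
            default:
                throw new IllegalArgumentException("Unknown subset: " + subset);
        }
    }

    static boolean allows(Subset subset, char c) {
        return rule(subset).test(c);
    }

    public static void main(String[] args) {
        System.out.println(allows(Subset.DIGITS, 'F'));              // true
        System.out.println(allows(Subset.DIGITS, 'G'));              // false
        System.out.println(allows(Subset.LATIN_ALPHABET, '\u00E9')); // false: accented e
    }
}
```

The model shows why DIGITS is a good fit for hexadecimal codes but not for arbitrary text. A letter such as G falls outside the subset and would be ignored.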

Custom characters list

You can specify your own list of characters to be recognized, either in the constructor of the AsposeOCR class or in the setAllowedCharacters method of the RecognitionSettings object. The characters are provided as a case-sensitive string.

Characters that do not match the provided list are ignored.
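Because the list is case-sensitive, a list such as "0123456789abcdef" does not admit "A" to "F". The plain-Java sketch below models that membership check. The class CustomWhitelist and its methods are hypothetical and do not call Aspose.OCR. In the library, the same string would be passed to the AsposeOCR constructor or to setAllowedCharacters, as described above. The keepAllowed helper only post-filters a string to show which characters fall outside the list. The OCR engine instead restricts what it tries to recognize, which is not the same as deleting characters afterwards.

```java
import java.util.Set;
import java.util.stream.Collectors;

/** Hypothetical model of a case-sensitive custom character list; not part of Aspose.OCR. */
class CustomWhitelist {
    private final Set<Integer> allowed;

    CustomWhitelist(String characters) {
        // Store code points exactly as given: 'a' and 'A' are different entries
        this.allowed = characters.codePoints().boxed().collect(Collectors.toSet());
    }

    boolean allows(int codePoint) {
        return allowed.contains(codePoint);
    }

    /** Keeps only the characters that appear in the list (illustration only). */
    String keepAllowed(String text) {
        StringBuilder sb = new StringBuilder();
        text.codePoints().filter(this::allows).forEach(sb::appendCodePoint);
        return sb.toString();
    }

    public static void main(String[] args) {
        CustomWhitelist hex = new CustomWhitelist("0123456789abcdef");
        // Uppercase F, 'x' and the space are not in the list, so only "00ff" remains
        System.out.println(hex.keepAllowed("0xFF 0xff"));
    }
}
```

To accept hexadecimal digits in either case, the list would need to contain both forms, for example "0123456789abcdefABCDEF".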