Defining the whitelist of characters

Restricting recognition to a subset of characters instead of the full set can greatly improve recognition accuracy, increase speed, and reduce resource consumption. A list of characters can also be identified automatically from an image using the built-in Aspose.OCR mechanisms.

Predefined character subsets

To restrict recognition to a predefined set of characters, pass one of the following values to the setAllowedCharacters method of the RecognitionSettings object:

Subset	Action
CharactersAllowedType.ALL	Try to recognize all characters.
CharactersAllowedType.LATIN_ALPHABET	Only recognize Latin / English text (`A` to `Z` and `a` to `z`), without accented characters.
CharactersAllowedType.DIGITS	Recognize only binary, octal, decimal, or hexadecimal numbers (`0-9` and `A` to `F`).

Characters that do not match the provided subset are ignored.

import com.aspose.ocr.*;

// Create the OCR engine
AsposeOCR api = new AsposeOCR();
// Restrict recognition to the predefined DIGITS subset
RecognitionSettings recognitionSettings = new RecognitionSettings();
recognitionSettings.setAllowedCharacters(CharactersAllowedType.DIGITS);
// Recognize the image and print the extracted text
RecognitionResult result = api.RecognizePage("source.png", recognitionSettings);
System.out.println("Recognition result:\n" + result.recognitionText + "\n\n");

Custom characters list

You can specify your own list of characters to be recognized either in the constructor of the AsposeOCR class or in the setAllowedCharacters method of the RecognitionSettings object. The characters are provided as a case-sensitive string, so `A` and `a` must be listed separately if both should be recognized.

Characters that do not match the provided list are ignored.
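As a rough mental model of what a case-sensitive character list means, the following plain-Java sketch keeps only the characters that appear in an allowed string. It does not call Aspose.OCR and is not a description of how the engine works internally; the class `WhitelistModel` and the method `keepAllowed` are hypothetical names introduced only for this illustration.

```java
final class WhitelistModel {
    // Hypothetical helper: returns the characters of `text` that also occur in `allowed`.
    // The comparison is exact, so 'A' and 'a' are treated as different characters.
    static String keepAllowed(String text, String allowed) {
        StringBuilder kept = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (allowed.indexOf(c) >= 0) {
                kept.append(c);
            }
        }
        return kept.toString();
    }

    public static void main(String[] args) {
        // Lowercase letters and the hyphen are not in the list, so they are dropped.
        System.out.println(keepAllowed("Abc-123", "ABC123")); // prints "A123"
    }
}
```

In this model, adding `abc` to the allowed string would be required for the lowercase letters to survive.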

Through AsposeOCR constructor

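import com.aspose.ocr.*;

// Pass the case-sensitive list of allowed characters to the constructor
AsposeOCR api = new AsposeOCR("AÁBCDEÉFG12345");
// Recognize the image with default settings and print the extracted text
RecognitionResult result = api.RecognizePage("source.png", new RecognitionSettings());
System.out.println("Recognition result:\n" + result.recognitionText + "\n\n");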

Through recognition settings

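import com.aspose.ocr.*;

// Create the OCR engine
AsposeOCR api = new AsposeOCR();
// Pass the case-sensitive list of allowed characters through the settings
RecognitionSettings recognitionSettings = new RecognitionSettings();
recognitionSettings.setAllowedCharacters("AÁBCDEÉFG12345");
// Recognize the image and print the extracted text
RecognitionResult result = api.RecognizePage("source.png", recognitionSettings);
System.out.println("Recognition result:\n" + result.recognitionText + "\n\n");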
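Typing a long character list by hand is error-prone, so it may help to assemble the string programmatically. The sketch below builds a deduplicated list from character ranges plus a few extra symbols; the resulting string could then be passed to setAllowedCharacters as in the example above. The class `AllowedCharacters` and its methods are hypothetical helpers, not part of Aspose.OCR.

```java
import java.util.LinkedHashSet;
import java.util.Set;

final class AllowedCharacters {
    // Hypothetical helper: adds every character from `first` to `last`, inclusive.
    // An int counter avoids overflow when `last` is the maximum char value.
    static void addRange(Set<Character> target, char first, char last) {
        for (int c = first; c <= last; c++) {
            target.add((char) c);
        }
    }

    // Hypothetical helper: uppercase Latin letters, decimal digits, and any extra symbols.
    // A LinkedHashSet keeps insertion order and silently drops duplicates.
    static String upperAlphanumericPlus(String extras) {
        Set<Character> chars = new LinkedHashSet<>();
        addRange(chars, 'A', 'Z');
        addRange(chars, '0', '9');
        for (char c : extras.toCharArray()) {
            chars.add(c);
        }
        StringBuilder list = new StringBuilder(chars.size());
        for (char c : chars) {
            list.append(c);
        }
        return list.toString();
    }

    public static void main(String[] args) {
        // The duplicate 'A' in the extras is ignored.
        System.out.println(upperAlphanumericPlus("-/A"));
    }
}
```

A list like this might suit inputs such as serial numbers, where lowercase letters are not expected; whether that assumption holds depends on the images being processed.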
