Defining the whitelist of characters

Restricting recognition to a subset of characters, instead of the full character set of the selected language, can greatly improve recognition accuracy, increase speed, and reduce resource consumption. A list of characters can be automatically identified from an image using the built-in Aspose.OCR mechanisms.

Predefined character sets

To restrict recognition to a predefined set of characters, assign one of the following values to the allowed_characters property of the recognition settings:

Subset	Action
characters_allowed_type::ALL	Try to recognize all characters.
characters_allowed_type::LATIN_ALPHABET	Only recognize Latin / English text (`A` to `Z` and `a` to `z`), without accented characters.
characters_allowed_type::DIGITS	Recognize only binary, octal, decimal, or hexadecimal numbers (`0-9` and `A` to `F`).

Characters that do not match the provided subset are ignored.

// Image to recognize and a buffer that receives the recognized text
std::string image_path = "source.png";
const size_t len = 4096;
wchar_t buffer[len] = { 0 };
// Restrict recognition to the DIGITS subset
RecognitionSettings settings;
settings.allowed_characters = characters_allowed_type::DIGITS;
size_t res_len = aspose::ocr::page_settings(image_path.c_str(), buffer, len, settings);
std::wcout << buffer;
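
To make the effect of the predefined subsets concrete, the following sketch filters a string the way the table describes. It is only an illustration in plain C++: the `subset` enum and the `in_subset` and `keep_subset` helpers are hypothetical names, not part of Aspose.OCR, and the library's internal behavior may differ (for example, whether lowercase `a` to `f` count as hexadecimal digits is an assumption made here).

```cpp
#include <string>

// Hypothetical stand-in for the predefined subsets; not the Aspose.OCR type.
enum class subset { all, latin_alphabet, digits };

// Returns true if `c` belongs to the subset as described in the table.
// Assumption: DIGITS covers 0-9 and uppercase A-F only.
bool in_subset(wchar_t c, subset s) {
    switch (s) {
    case subset::latin_alphabet:
        return (c >= L'A' && c <= L'Z') || (c >= L'a' && c <= L'z');
    case subset::digits:
        return (c >= L'0' && c <= L'9') || (c >= L'A' && c <= L'F');
    case subset::all:
        return true;
    }
    return true;
}

// Drops every character outside the subset, mirroring "ignored" characters.
std::wstring keep_subset(const std::wstring& text, subset s) {
    std::wstring out;
    for (wchar_t c : text) {
        if (in_subset(c, s)) out.push_back(c);
    }
    return out;
}
```

With these helpers, `keep_subset(L"Room 4B-12", subset::digits)` keeps only `4B12`, while the same input with `subset::latin_alphabet` keeps `RoomB`.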

Custom characters list

You can specify your own list of characters to be recognized. To define the exact set of characters the Aspose.OCR engine will look for, pass the list as the last argument of aspose::ocr::page_abc().

The characters are provided as a case-sensitive string. Characters that do not match the provided list are ignored.

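// Image to recognize and a buffer that receives the recognized text
std::string image_path = "source.png";
const size_t len = 4096;
wchar_t buffer[len] = { 0 };
// The last argument is the case-sensitive list of allowed characters
size_t size = aspose::ocr::page_abc(image_path.c_str(), buffer, len, L"AÁBCDEÉFG12345");
std::wcout << buffer << L"\n";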
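
Because the list is case-sensitive, a lowercase letter is not matched by its uppercase form. The sketch below shows that rule on a plain string. `keep_listed` is a hypothetical helper written for this illustration; it does not call Aspose.OCR and only mimics the visible effect of a custom character list.

```cpp
#include <string>

// Hypothetical helper: keeps only the characters that occur in `allowed`.
// The comparison is exact, so L'a' does not match L'A'.
std::wstring keep_listed(const std::wstring& text, const std::wstring& allowed) {
    std::wstring out;
    for (wchar_t c : text) {
        if (allowed.find(c) != std::wstring::npos) out.push_back(c);
    }
    return out;
}
```

For example, `keep_listed(L"Abc 123", L"ABCDEFG12345")` yields `A123`: the lowercase `b` and `c` and the space are dropped because they are not in the list.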
